Connect Ceph Infrastructure
Production Ceph cluster
• 2 Dell R510s
• Dual Intel X5660s
• 96 GB RAM
• 2 PERC H800s, each with 2 MD1200 shelves
• Total of 56 disks per host
• 420TB raw, using 2x replication
• Ceph version 0.72 (Emperor)
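As a quick sanity check on the capacity figures above, a minimal sketch of the arithmetic (the per-disk size is inferred, not stated on the slide):

```python
# Rough capacity arithmetic for the production cluster described above.
hosts = 2
disks_per_host = 56
raw_tb = 420                 # total raw capacity quoted on the slide
replication_factor = 2       # 2x replication

total_disks = hosts * disks_per_host       # 112 disks
tb_per_disk = raw_tb / total_disks         # ~3.75 TB, i.e. roughly 4 TB drives
usable_tb = raw_tb / replication_factor    # ~210 TB of usable space

print(f"{total_disks} disks, ~{tb_per_disk:.2f} TB each, ~{usable_tb:.0f} TB usable")
```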
Ceph features currently used by Connect: CephFS
• Distributed network filesystem
• Fully POSIX compatible
• Backs the CI Connect “Stash” service
• Accessible via POSIX, HTTP, and Globus on all Connect instances
• In our experience, generally pretty good, but:
  • Requires custom kernels (3.12+)
  • Problems with large numbers (1000s) of files in directories (see the sketch below)
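Because CephFS is mounted as a regular POSIX filesystem on the Connect nodes, ordinary filesystem code works against Stash unchanged. A minimal sketch, assuming a hypothetical mount path; os.scandir() streams entries rather than building one huge list, which helps with the very large directories mentioned above:

```python
import os

# Hypothetical mount point for the Stash CephFS filesystem on a Connect node.
STASH_ROOT = "/stash/user/alice/public"

total_bytes = 0
file_count = 0
with os.scandir(STASH_ROOT) as entries:      # streams entries one at a time
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            total_bytes += entry.stat(follow_symlinks=False).st_size
            file_count += 1

print(f"{file_count} files, {total_bytes / 1e9:.2f} GB under {STASH_ROOT}")
```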
Ceph features currently used by Connect: RADOS Block Device (RBD)
• Allows us to carve off a portion of the Ceph pool and expose it to a machine as a block device (e.g., “/dev/rbd1”)
• The block device can be formatted with any filesystem of our choosing (XFS, EXT4, Btrfs, ZFS, ...)
• Currently using RBD as the backend storage for FAXbox (see the sketch below)
  • Simply an XRootD service running on top of a normal RBD device
  • Files in XRootD are also available via Globus, POSIX, and HTTP
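For illustration, a minimal sketch of creating and writing to an RBD image with Ceph's librbd Python bindings; the pool and image names are made up, and in practice the image backing FAXbox would be mapped on a host with `rbd map` and formatted with a regular filesystem:

```python
import rados
import rbd

# Connect to the cluster with the standard config file and default keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("rbd")              # pool name is an assumption
    try:
        # Create a 4 GiB image with a hypothetical name.
        rbd.RBD().create(ioctx, "faxbox-test", 4 * 1024**3)

        # Open the image and write a few bytes at offset 0.
        image = rbd.Image(ioctx, "faxbox-test")
        try:
            image.write(b"hello from librbd", 0)
            print("image size:", image.size(), "bytes")
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```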
Testbed Ceph cluster
• Scrappy, triply redundant deployment built out of retired machines for testing new Ceph releases and features
• 14 disk nodes with 6 disks each
  • Fairly ancient dual-CPU, dual-core Opteron boxes
  • (6) 500–750GB disks each
• 3 redundant head nodes and 3x replication
• Continues to grow as we retire old hardware
Services running on Testbed
• Using latest stable Ceph version (v0.80 Firefly)
• Currently testing RADOSGW:
  • Implements the Amazon S3 API and the OpenStack Swift API on top of the Ceph object store
  • Knowledge Lab group at UChicago successfully using RADOSGW to store job inputs/outputs via the S3 API (see the sketch below)
  • All files also get stored on Amazon S3 as a backup
  • Very cost effective – since transferring into S3 is free, they only have to pay Amazon to keep the data on disk
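A minimal sketch of talking to RADOSGW through its S3-compatible API with boto3; the endpoint URL, credentials, and bucket name are placeholders, not the actual Connect or Knowledge Lab configuration:

```python
import boto3

# RADOSGW speaks the S3 protocol, so a standard S3 client works once it is
# pointed at the gateway endpoint instead of Amazon.
s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.org:7480",   # placeholder gateway URL
    aws_access_key_id="RADOSGW_ACCESS_KEY",           # placeholder credentials
    aws_secret_access_key="RADOSGW_SECRET_KEY",
)

bucket = "job-outputs"                                 # hypothetical bucket name
s3.create_bucket(Bucket=bucket)

# Upload a job output file and list what is now in the bucket.
s3.upload_file("results.tar.gz", bucket, "run-001/results.tar.gz")
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"], obj["Size"])
```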
Upcoming experiments – Tiered storage
• In the tiered storage scenario, two pools are created (see the sketch below):
  • A “hot” cache pool with recently accessed data living on SSDs, no replication
  • A “cold” pool on traditional HDDs, using an erasure coding scheme to maximize available disk space at the cost of performance
• Ideal for scenarios where the majority of data is written, popular for a while, and then seldom accessed afterwards
• Compare to, say, HDFS deployments where 2/3 of storage is immediately lost to replication
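A minimal sketch of wiring up such a two-tier layout with Ceph's cache-tiering commands, driven from Python here for illustration; the pool names, PG counts, and the size=1 setting for the unreplicated SSD tier are assumptions rather than a tested configuration, and it presumes CRUSH rules already place the hot pool on SSDs and the cold pool on HDDs:

```python
import subprocess

def ceph(*args):
    """Run a ceph CLI command and fail loudly if it errors."""
    subprocess.run(["ceph", *args], check=True)

# Cold tier: erasure-coded pool on the HDDs (PG count of 128 is a guess).
ceph("osd", "pool", "create", "cold-pool", "128", "128", "erasure")

# Hot tier: pool on the SSDs; size 1 matches the slide's "no replication"
# note, trading safety for cache capacity.
ceph("osd", "pool", "create", "hot-pool", "128", "128")
ceph("osd", "pool", "set", "hot-pool", "size", "1")

# Stack the hot pool in front of the cold pool as a writeback cache,
# so clients only ever address the cold pool by name.
ceph("osd", "tier", "add", "cold-pool", "hot-pool")
ceph("osd", "tier", "cache-mode", "hot-pool", "writeback")
ceph("osd", "tier", "set-overlay", "cold-pool", "hot-pool")
```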