R&D Activities on Storage in CERN-IT’s FIO group
Helge Meinhard / CERN-IT
HEPiX Fall 2009, LBNL
27 October 2009
Outline
Follow-up of two presentations at the Umeå meeting:
• iSCSI technology (Andras Horvath)
• Lustre evaluation project (Arne Wiebalck)
iSCSI - Motivation
Three approaches:
• Possible replacement for rather expensive setups with Fibre Channel SANs (used e.g. for physics databases with Oracle RAC and for the backup infrastructure) or proprietary high-end NAS appliances
  • Potential cost saving
• Possible replacement for bulk disk servers (Castor)
  • Potential gain in availability, reliability and flexibility
• Possible use for applications for which small disk servers have been used in the past
  • Potential gain in flexibility and cost saving
The focus is on functionality, robustness and large-scale deployment rather than on ultimate performance.
iSCSI terminology
• iSCSI is a set of protocols for block-level access to storage
  • Similar to FC
  • Unlike NAS (e.g. NFS)
• “Target”: storage unit listening to block-level requests
  • Appliances available on the market
  • Do-it-yourself: put a software stack on a storage node, e.g. our storage-in-a-box nodes
• “Initiator”: unit sending block-level requests (e.g. read, write) to the target
  • Most modern operating systems feature an iSCSI initiator stack: Linux (RHEL 4, RHEL 5), Windows (a minimal initiator-side sketch follows below)
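As an illustration (not from the original slides): on Linux the initiator stack mentioned above is typically open-iscsi, driven by the iscsiadm command. The sketch below discovers the targets behind a portal and logs into each of them; the portal address is a placeholder and error handling is omitted.

```python
import subprocess

PORTAL = "192.0.2.10"  # hypothetical address of the iSCSI target portal

# SendTargets discovery: ask the portal which targets it exports.
out = subprocess.run(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
    check=True, capture_output=True, text=True).stdout

# Each discovery line looks like "192.0.2.10:3260,1 iqn.2009-10.ch.cern:example".
for line in out.splitlines():
    portal, iqn = line.split()
    portal = portal.split(",")[0]  # keep "ip:port", drop the ",<tpgt>" suffix
    # Log into the target; the exported volume then appears as a local /dev/sdX.
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", iqn, "-p", portal, "--login"],
        check=True)
```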
Hardware used
• Initiators: a number of different servers, including
  • Dell M610 blades
  • Storage-in-a-box servers
  • All running SLC5
• Targets:
  • Dell Equallogic PS5000E (12 drives, 2 controllers with 3 GigE ports each)
  • Dell Equallogic PS6500E (48 drives, 2 controllers with 4 GigE ports each)
  • Infortrend A12E-G2121 (12 drives, 1 controller with 2 GigE ports)
  • Storage-in-a-box: various models with multiple GigE or 10GigE interfaces, running Linux
• Network (where required): private, HP ProCurve 3500 and 6600
Target stacks under Linux
• Red Hat Enterprise Linux 5 comes with tgtd (a sketch of exporting a volume with tgtd follows below)
  • Single-threaded
  • Does not scale well
• Tests with IET (iSCSI Enterprise Target)
  • Multi-threaded
  • No performance limitation seen in our tests
  • Required a newer kernel to work out of the box (Fedora and Ubuntu Server worked for us)
• In the context of the collaboration between CERN and Caspur, work is going on to understand the steps needed to backport IET to RHEL 5
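For illustration only (not from the slides): with the RHEL 5 tgtd stack, a target can be exported at run time via the tgtadm CLI. A minimal sketch, assuming tgtd is already running; the IQN and the backing device are placeholders.

```python
import subprocess

IQN = "iqn.2009-10.ch.cern:storage.disk1"  # hypothetical target name
DEVICE = "/dev/sdb"                        # hypothetical backing volume

def tgtadm(*args):
    # Thin wrapper around the tgtadm CLI that ships with the tgtd stack.
    subprocess.run(["tgtadm", "--lld", "iscsi", *args], check=True)

# Create target ID 1 with the chosen IQN.
tgtadm("--op", "new", "--mode", "target", "--tid", "1", "-T", IQN)
# Attach the backing device as LUN 1 (LUN 0 is the controller LUN).
tgtadm("--op", "new", "--mode", "logicalunit", "--tid", "1", "--lun", "1", "-b", DEVICE)
# Accept connections from any initiator; a real setup would restrict this.
tgtadm("--op", "bind", "--mode", "target", "--tid", "1", "-I", "ALL")
```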
Performance comparison
• 8 kB random I/O test with the Oracle tool Orion
Performance measurement
• 1 server, 3 storage-in-a-box servers as targets
• Each target exporting 14 JBOD disks over 10GigE
Almost production status…
• Two storage-in-a-box servers with hardware RAID 5, running SLC5 and tgtd over GigE
  • The initiator provides multipathing and software RAID 1 (see the sketch below)
  • Used for some grid services
  • No issues
• Two Infortrend boxes (JBOD configuration)
  • Again, the initiator provides multipathing and software RAID 1
  • Used as backend storage for the Lustre MDT (see the next part)
• Tools for setup, configuration and monitoring are in place
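A minimal sketch of that initiator-side layering (not from the slides; device and array names are placeholders): dm-multipath hides the redundant paths to each target behind one device, and mdadm mirrors the two resulting devices with software RAID 1.

```python
import subprocess

# Hypothetical multipath devices, one per iSCSI target, created by dm-multipath
# from the redundant network paths to each box.
MPATH_A = "/dev/mapper/mpath0"
MPATH_B = "/dev/mapper/mpath1"

# Show the paths behind each multipath device (informational only).
subprocess.run(["multipath", "-ll"], check=True)

# Mirror the two multipath devices with Linux software RAID 1 on the initiator.
subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
     MPATH_A, MPATH_B],
    check=True)
```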
Being worked on
• Large deployment of Equallogic ‘Sumos’ (48 drives of 1 TB each, dual controllers, 4 GigE ports per controller): 24 systems, 48 front-end nodes
• Experience encouraging, but there are issues
  • Controllers don’t support DHCP, so manual configuration is required
  • Buggy firmware
  • Problems with the batteries on the controllers
  • Support not yet fully integrated into Dell’s structures
• Remarkable stability
  • We have failed every network and server component that can fail; the boxes kept running
• Remarkable performance
Equallogic performance
• 16 servers, 8 Sumos, 1 GigE per server, iozone
Appliances vs. home-made
• Appliances
  • Stable
  • Performant
  • Highly functional (Equallogic: snapshots, relocation without server involvement, automatic load balancing, …)
• Home-made with storage-in-a-box servers
  • Inexpensive
  • Complete control over the configuration
  • Can run things other than the target software stack
  • The function can be selected at software installation time (iSCSI target vs. classical disk server with rfiod or xrootd)
Ideas (testing partly started)
• Two storage-in-a-box servers as a highly redundant setup
  • Running target and initiator stacks at the same time
  • Mounting half the disks locally, half on the other machine
  • A heartbeat detects failures and moves functionality from one box to the other (e.g. by resetting an IP alias)
• Several storage-in-a-box servers as targets
  • Exporting their disks either as JBOD or as RAID
  • A front-end server creates a software RAID (e.g. RAID 6) over volumes from all storage-in-a-box servers (see the sketch below)
  • Any one (or, with software RAID 6, any two) of the storage-in-a-box servers can fail entirely and the data remain available
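A minimal sketch of the second idea, under the assumption that six storage-in-a-box servers each export one iSCSI volume that appears on the front-end as /dev/sdb … /dev/sdg (hypothetical device names): the front-end aggregates them into a single RAID 6 array, so losing any two exported volumes, i.e. any two whole servers, does not lose data.

```python
import subprocess

# Hypothetical block devices, one iSCSI volume per storage-in-a-box server.
ISCSI_VOLUMES = ["/dev/sd%s" % letter for letter in "bcdefg"]

# One software RAID 6 array across all exported volumes: with six members the
# usable capacity is that of four, and any two members (i.e. any two whole
# storage-in-a-box servers) may fail without data loss.
subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=6",
     "--raid-devices=%d" % len(ISCSI_VOLUMES)] + ISCSI_VOLUMES,
    check=True)
```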
Lustre Evaluation Project
Tasks and goals:
• Evaluate Lustre as a candidate for storage consolidation
  • Home directories
  • Project space
  • Analysis space
  • HSM
• Reduce the service catalogue
• Increase the overlap between service teams
• Integrate with the CERN fabric management tools
Areas of interest (1/2)
• Installation
  • Quattorized installation of Lustre instances
  • Client RPMs for SLC5
• Backup
  • LVM-based snapshots for the metadata (see the sketch below)
  • Tested with TSM, set up for the PPS instance
  • The changelogs feature of v2.0 is not yet usable
• Strong authentication
  • v2.0: early adaptation, full Kerberos in Q1/2011
  • Tested and used by other sites (not by us yet)
• Fault tolerance
  • Lustre comes with built-in failover
  • PPS MDS iSCSI setup (next slide)
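Not from the slides, but as an illustration of the LVM-based metadata backup idea: snapshot the logical volume backing the MDT, mount the snapshot read-only and hand it to the backup client. Volume-group, volume and mount-point names are placeholders; dsmc is the TSM backup-archive client.

```python
import subprocess

VG, LV = "vg_mdt", "mdt"              # hypothetical volume group / volume backing the MDT
SNAP, MOUNTPOINT = "mdt_snap", "/mnt/mdt_snap"

def run(*cmd):
    subprocess.run(list(cmd), check=True)

# Create a copy-on-write snapshot of the MDT volume (the COW size is a guess).
run("lvcreate", "--size", "10G", "--snapshot", "--name", SNAP, "/dev/%s/%s" % (VG, LV))
# Mount the snapshot read-only (in practice this may need the ldiskfs filesystem
# type) and hand it to the TSM backup-archive client, then clean up.
run("mount", "-o", "ro", "/dev/%s/%s" % (VG, SNAP), MOUNTPOINT)
try:
    run("dsmc", "incremental", MOUNTPOINT)
finally:
    run("umount", MOUNTPOINT)
    run("lvremove", "-f", "/dev/%s/%s" % (VG, SNAP))
```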
FT: MDS PPS Setup
Diagram (not reproduced): MDS with its MDT on Dell PowerEdge M600 blade servers (16 GB), Dell Equallogic iSCSI arrays (16x 500 GB SATA) over a private iSCSI network, plus OSS and client (CLT) nodes.
• Fully redundant against component failure
• iSCSI for the shared storage
• Linux device mapper + md for mirroring
• Quattorized
• Needs testing
Areas of interest (2/2)
• Special performance & optimization
  • Small files: “numbers dropped from slides”
  • Postmark benchmark (not done yet)
• HSM interface
  • Active development, driven by CEA
  • Access to the Lustre HSM code (to be tested with TSM/CASTOR)
• Life cycle management (LCM) & tools
  • Support for day-to-day operations?
  • Limited support for setup, monitoring and management
Findings and Thoughts
• No strong authentication as of now
  • Foreseen for Q1/2011
• Strong client/server coupling
  • Recovery
• Very powerful users
  • Striping, pools
• Missing support for life cycle management
  • No user-transparent data migration
  • Lustre/kernel upgrades are difficult
• Moving targets on the roadmap
  • v2.0 not yet stable enough for testing
Summary
• Some desirable features are not there (yet)
  • Wish list communicated to Sun
  • Sun is interested in the evaluation
• Some more tests to be done
  • Kerberos, small files, HSM
  • Documentation