Storage Technology Case Study: NetApp



  1. Storage Technology Case Study: NetApp
     Tyler Bletsch, 16 February 2009

  2. Physical hardware

  3. Protocols
     • NAS: Network-attached storage (files)
       • CIFS: Mostly Windows, works in *NIX
       • NFS: Mostly *NIX, works in Windows
       • HTTP: Read-only, supported everywhere
     • SAN: Storage area network (blocks)
       • iSCSI: SCSI over IP
       • Fibre Channel (FCP): SCSI over a dedicated optical fabric

  4. What does the filer do?
     [Diagram: FCP and iSCSI carry block I/O; CIFS, NFS, and HTTP carry file I/O.
     Both paths lead down to the disks through layers marked "?", which are
     filled in on the next slide.]

  5. What does the filer do?
     [Diagram: the filer's layering. FCP and iSCSI block I/O is served from LUNs;
     CIFS, NFS, and HTTP file I/O is served from the file system. Both live in
     volumes, which are built on aggregates, which are built on disks.]

  6. Disks
     • Disks: dumb arrays of 512-byte sectors
       • Prone to failure
       • Small (1500 GB is small)
       • Slow
         • High seek time
         • Comparatively low throughput
     • Block device: only has two operations
       • Read block
       • Write block
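To make that interface concrete, here is a minimal Python sketch of a block device exposing exactly those two operations (the class and its dimensions are illustrative, not from the slides):

```python
# Minimal block-device sketch: numbered fixed-size sectors and exactly two
# operations. The 512-byte sector matches the slide; the rest is made up.
SECTOR_SIZE = 512

class BlockDevice:
    def __init__(self, num_sectors):
        self.sectors = [bytes(SECTOR_SIZE)] * num_sectors  # all zeroes

    def read_block(self, n):
        return self.sectors[n]

    def write_block(self, n, data):
        assert len(data) == SECTOR_SIZE  # whole sectors only
        self.sectors[n] = data

disk = BlockDevice(1024)
disk.write_block(7, b"x" * SECTOR_SIZE)
print(disk.read_block(7)[:4])  # b'xxxx'
```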

  7. Aggregates (1)
     [Diagram: a RAID-DP group. Each stripe holds data blocks plus two parity
     blocks, e.g. A B C P1(A,B,C) P2(A,B,C), then D E F with P1(D,E,F) and
     P2(D,E,F), and so on.]
     • How to combine disks in a fault-tolerant way?
     • RAID (Redundant Array of Inexpensive Disks)
     • More specifically: "RAID-DP" ("Dual Parity")
       • An implementation of RAID 6
       • For each RAID group with N disks, 2 disks store parity of the other N-2
       • Any two disks can fail without data loss
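As a rough illustration of how parity buys fault tolerance, the sketch below computes a single XOR row parity and rebuilds a lost block from it. This is a simplification: RAID-DP's second parity disk holds a diagonal parity, which is what allows two simultaneous failures, and that construction is not shown here.

```python
# Simplified parity sketch (single row parity only; RAID-DP adds a second,
# diagonal parity to survive two disk failures).
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

A, B, C = b"\x01" * 4, b"\x02" * 4, b"\x04" * 4
P1 = xor_blocks([A, B, C])           # parity block for the stripe

# The disk holding B dies: XOR the survivors with the parity to rebuild it.
rebuilt_B = xor_blocks([A, C, P1])
assert rebuilt_B == B
```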

  8. Aggregates (2)
     • Aggregate: a set of disks combined into one or more RAID groups
     • Fault-tolerant virtual block device built out of real block devices
     • Analogous to "md" devices in Linux, "RAID groups" in Windows

  9. Volumes
     • An aggregate is a block device; it only has two operations:
       • Read block
       • Write block
     • We want to make this useful by (a) virtually chopping up aggregates, and
       (b) adding the concept of directories, files, and LUNs
     • Answer: volumes
     • Analogous to "LVM" on Linux or "Logical Disk Manager" in Windows

  10. Traditional vs. Flexible Volumes
     • Traditional volumes: slice up the aggregate
       • Example: "Volume X is mapped to the first 30 GB of aggregate A."
       • An empty 30 GB volume uses 30 GB of real disk.
     • Flexible volumes: lazily allocate as needed
       • Example: "Volume X gets blocks from aggregate A as they're needed,
         up to a maximum of 30 GB."
       • An empty 30 GB volume uses almost no space.
     • Thin provisioning: create volumes with more total space than you have
       real disk capacity
       • Why?
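One plausible answer to that "Why?": most volumes never fill up, so promising more logical space than you physically own defers buying disks until data actually lands. A minimal Python sketch of the lazy-allocation idea (class names and sizes are illustrative, not ONTAP's):

```python
# Thin-provisioning sketch: a flexible volume records a size limit but only
# claims aggregate blocks when they are actually written.
class Aggregate:
    def __init__(self, total_blocks):
        self.free = total_blocks

    def allocate(self):
        assert self.free > 0, "aggregate out of real space"
        self.free -= 1

class FlexVolume:
    def __init__(self, aggr, max_blocks):
        self.aggr, self.max_blocks, self.blocks = aggr, max_blocks, {}

    def write_block(self, n, data):
        assert n < self.max_blocks      # logical limit
        if n not in self.blocks:        # lazily claim real space
            self.aggr.allocate()
        self.blocks[n] = data

aggr = Aggregate(total_blocks=100)
vols = [FlexVolume(aggr, max_blocks=60) for _ in range(3)]  # 180 "promised"
print(aggr.free)                 # 100: empty volumes consume nothing
vols[0].write_block(0, b"data")
print(aggr.free)                 # 99: space is used only as data lands
```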

  11. From volumes to file systems
     • Traditional systems: the volume is a block device, and a file system
       gets written on top
       • Strict layering
     • NetApp/ZFS: the concepts of volume and file system are combined
       • So there's no additional "formatting" step
     • We can now do NAS: just export volumes
       • Linux NFS: mount -t nfs myfiler:/vol/myvolume /mnt/stuff
       • Windows CIFS: \\myfiler\myvolume

  12. SAN on NetApp
     • LUN: "Logical Unit Number" (worst name ever)
       • A virtual block device mounted on a client via a storage area network (SAN)
       • Fibre Channel or iSCSI (SCSI over IP)
     • Why a SAN?
       • You can boot from it (with the right hardware)
       • Very fast for block-oriented apps
         • E.g. databases, video-on-demand systems
       • Historical: Fibre Channel used to be faster and more reliable than IP
     • Volumes can also store LUNs
     • NetApp feature: LUNs can be thin provisioned like volumes
       • E.g. capacity = 30 GB, real disk usage = 10 GB
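One way to picture a LUN living inside a volume: it is essentially one big file whose bytes the filer serves to the client as raw blocks. A toy Python sketch of that view (entirely illustrative; this is not NetApp's actual data path):

```python
import io

# Toy LUN: a byte range inside a volume, addressed in fixed-size blocks.
class LUN:
    BLOCK = 512

    def __init__(self, backing_file):
        self.f = backing_file            # stand-in for a file in a volume

    def read_block(self, n):
        self.f.seek(n * self.BLOCK)
        return self.f.read(self.BLOCK)

    def write_block(self, n, data):
        self.f.seek(n * self.BLOCK)
        self.f.write(data)

lun = LUN(io.BytesIO(bytes(8 * 512)))    # 8-block toy LUN
lun.write_block(3, b"q" * 512)
print(lun.read_block(3)[:4])             # b'qqqq'
```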

  13. The complete picture
     • Looking back:
       • A LUN is a virtual block device …
       • in a virtual file system (volume) …
       • in a virtual block device (aggregate) …
       • spread over some real block devices (disks).
     [Diagram repeated from slide 5: FCP/iSCSI block I/O to LUNs, CIFS/NFS/HTTP
     file I/O to the file system, volumes on aggregates on disks.]

  14. Block indirection: NetApp's big secret
     • How does thin provisioning work?
     • Block indirection layer
     [Diagram: in the traditional model, file blocks 0-2 of myfile.txt map
     directly to consecutive volume blocks 6000-6002. In the NetApp WAFL model,
     the file system keeps a block pointer table ("see 6056", "see 6059",
     "see 6002"), so file blocks can point anywhere in the volume.]
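A minimal sketch of the indirection idea in Python (class names are illustrative; real WAFL metadata is far richer than this):

```python
# Block-indirection sketch: file blocks reach data through a pointer table,
# and writes never overwrite in place ("write anywhere").
class Volume:
    def __init__(self, first_free=6056):
        self.blocks = {}                 # physical block number -> data
        self.next_pbn = first_free

    def allocate(self, data):
        pbn, self.next_pbn = self.next_pbn, self.next_pbn + 1
        self.blocks[pbn] = data
        return pbn

class File:
    def __init__(self, volume):
        self.vol = volume
        self.table = {}                  # logical block -> physical block

    def write(self, lbn, data):
        self.table[lbn] = self.vol.allocate(data)  # repoint, don't overwrite

    def read(self, lbn):
        return self.vol.blocks[self.table[lbn]]

f = File(Volume())
f.write(0, b"v1")
f.write(0, b"v2")            # the old block still exists; the table moved on
print(f.read(0), f.table)    # b'v2' {0: 6057}
```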

  15. Block indirection benefits (1)
     • Free snapshots
       • Make a copy-on-write duplicate of the block table
       • Cheap in-place backup that lets you recover from many disasters
         without getting tapes involved
     • Example:
       • rm important.doc
       • cp ~/.snapshot/20080130.130852/important.doc .
     [Diagram: the snapshot's block table still points at blocks 6056, 6059,
     and 6102; after an overwrite of block 2, the real working copy points at
     6056, 6059, and a newly written block 6155.]
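Because of the pointer table, a snapshot really is just a frozen copy of the pointers. A minimal copy-on-write sketch reusing the block numbers from the slide (plain dicts stand in for the real structures):

```python
# Copy-on-write snapshot sketch: a snapshot is a frozen copy of the pointer
# table; overwrites allocate fresh blocks, so old data stays reachable.
blocks = {6056: b"A", 6059: b"B", 6102: b"C"}   # physical blocks
working = {0: 6056, 1: 6059, 2: 6102}           # live block pointer table

snapshot = dict(working)        # "free" snapshot: just copy the pointers

blocks[6155] = b"C-modified"    # an overwrite of logical block 2 goes to a
working[2] = 6155               # new physical block; 6102 is untouched

print(blocks[snapshot[2]])      # b'C': the snapshot still sees the old data
working[2] = snapshot[2]        # recovery is just copying a pointer back
```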

  16. Block indirection benefits (2)
     • Thin clones
       • Copy a snapshot, but make its block table writable
       • Idea: clone a LUN with an OS installed and tell hundreds of machines
         to boot from the clones
     [Diagram: the original and the clone each have their own block table.
     Block 0 (6056) stays shared; the original's modified block 1 points at
     6001, and the clone's modified block 2 points at 6155.]
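A clone works the same way as a snapshot, except the copied table accepts writes. A sketch of the divergence (illustrative dicts again); unmodified blocks stay shared, which is why hundreds of boot clones cost almost nothing:

```python
# Thin-clone sketch: copy a snapshot's pointer table, then let it diverge.
blocks = {6056: b"boot", 6059: b"os", 6102: b"data"}
golden = {0: 6056, 1: 6059, 2: 6102}   # snapshot of a master LUN

clone = dict(golden)                   # writable copy of the pointers

blocks[6155] = b"os-patched"           # the clone patches logical block 1;
clone[1] = 6155                        # only that one block is new storage

shared = [pbn for lbn, pbn in clone.items() if golden[lbn] == pbn]
print(shared)                          # [6056, 6102]: still shared
```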

  17. Block indirection benefits (3)
     • Data deduplication
       • Add a "data hash" field to the block table
       • When a block is written, hash it
       • If the hash is already in the block table, just point this block entry
         at the same underlying data
     [Diagram: one-million-zeroes.txt before and after dedup. Before: blocks
     0-2 point at 6056, 6059, and 6102. After: all three point at 6056.]
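The same table makes dedup cheap: keep a hash of each stored block and route identical writes to the existing copy. A minimal sketch (SHA-256 is my choice here; real systems also need collision verification and reference counting, omitted for brevity):

```python
import hashlib

blocks, by_hash, table = {}, {}, {}    # data, hash index, pointer table
next_pbn = 6056

def write(lbn, data):
    global next_pbn
    h = hashlib.sha256(data).digest()
    if h not in by_hash:               # first time we've seen this content
        by_hash[h] = next_pbn
        blocks[next_pbn] = data
        next_pbn += 1
    table[lbn] = by_hash[h]            # identical blocks share one copy

for i in range(3):
    write(i, b"\x00" * 4096)           # one-million-zeroes.txt, in miniature

print(table)                           # {0: 6056, 1: 6056, 2: 6056}
```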

  18. Demonstration
     • NetApp simulator
       • Runs as a Linux process, same as the real thing
     • Tasks
       • Check out the hardware
       • Make an aggregate
       • Make a volume
       • Mount the volume
       • Make data, snapshot, delete data, recover

  19. Demonstration
     • Even more tasks
       • Volume clone
       • Create LUN
       • Snapshot
       • LUN clone

  20. Additional topics
     • SnapMirror: continuous remote backup with constant Recovery Point
       Objective (RPO) monitoring
       • "You're mirroring filer1:/vol/stuff to filer2; filer2 is 31 seconds behind"
     • Cluster fail-over
       • Two filers connected with InfiniBand, each connected to the other's disks
       • One can go down and the other takes over automatically
       • Zero-downtime failure recovery, hardware upgrades, etc.
     • True clustering
       • Next-generation filer software "GX"
       • Combines tons of filers and shelves into a single namespace with a
         backend IP LAN
       • The next step in storage scalability
