Everything You Wanted to Know About Storage, but Were Afraid to Ask. Do you have a cell phone, PDA, or smartphone? Do you have a digital camera? Do you have a PC? What do all of these devices have in common? How do you protect your data? Digital Footprint Calculator.
Everything You Wanted to Know About Storage, but Were Afraid to Ask
Digital Footprint Calculator http://www.emc.com/digital_universe/downloads/web/personal-ticker.htm
RAID 0 • Data is striped across the HDDs in a RAID set • The stripe size is specified at a host level for software RAID and is vendor specific for hardware RAID • When the number of drives in the array increases, performance improves because more data can be read or written simultaneously • Used in applications that need high I/O throughput • Does not provide data protection and availability in the event of drive failures
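The striping described above can be sketched as a simple address calculation. This is a minimal illustration, assuming round-robin placement of fixed-size strips; the function name `locate_block` and its parameters are hypothetical and do not come from any particular RAID implementation.

```python
def locate_block(logical_block: int, num_disks: int, strip_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes strips of `strip_blocks` blocks are placed round-robin across disks.
    """
    strip_index = logical_block // strip_blocks       # which strip, counted across the whole set
    offset_in_strip = logical_block % strip_blocks    # position inside that strip
    disk = strip_index % num_disks                    # strips rotate across the disks
    stripe_row = strip_index // num_disks             # full stripes that precede this strip
    return disk, stripe_row * strip_blocks + offset_in_strip


if __name__ == "__main__":
    # Consecutive logical blocks spread over 4 disks, 2 blocks per strip.
    for block in range(10):
        print(block, locate_block(block, num_disks=4, strip_blocks=2))
```

Because consecutive strips land on different disks, a large sequential request touches several drives at once, which is where the throughput gain comes from.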
RAID 1 • Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of data. • In addition to providing complete data redundancy, mirroring enables faster recovery from disk failure. • Mirroring involves duplication of data — the amount of storage capacity needed is twice the amount of data being stored. Therefore, mirroring is considered expensive • It is preferred for mission-critical applications that cannot afford data loss
Nested RAID • Mirroring can be implemented with striped RAID by mirroring entire stripes of disks to stripes on other disks • RAID 0+1 and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1 • These types of RAID require an even number of disks, the minimum being four. • RAID 0+1 is also called mirrored stripe. • This means that the process of striping data across HDDs is performed initially and then the entire stripe is mirrored.
Nested RAID • RAID 1+0 is also called striped mirror • The basic element of RAID 1+0 is that data is first mirrored and then both copies of data are striped across multiple HDDs in a RAID set • Some applications that benefit from RAID 1+0 include the following: • High transaction rate Online Transaction Processing (OLTP) • Database applications that require high I/O rate, random access, and high availability
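One way to see the practical difference between the two nested layouts is to count which multi-disk failures each can survive. The sketch below is a simplified model, not a description of any vendor's array: it assumes RAID 1+0 pairs adjacent disks as mirrors, and that in RAID 0+1 a single failed disk takes its whole stripe set offline. All function names are hypothetical.

```python
from itertools import combinations


def survives_raid10(failed: set[int], num_disks: int) -> bool:
    """RAID 1+0: disks (0,1), (2,3), ... are mirrored pairs.

    Data is lost only if both disks of the same pair have failed.
    """
    return all(not {d, d + 1} <= failed for d in range(0, num_disks, 2))


def survives_raid01(failed: set[int], num_disks: int) -> bool:
    """RAID 0+1: the first half of the disks is one stripe set, the second half its mirror.

    Assumes a stripe set is unusable once any of its disks fails, so data
    survives only if at least one stripe set is fully intact.
    """
    half = num_disks // 2
    set_a = set(range(half))
    set_b = set(range(half, num_disks))
    return not (failed & set_a) or not (failed & set_b)


def count_survivable(check, num_disks: int, failures: int) -> int:
    """Count how many combinations of `failures` failed disks the layout survives."""
    return sum(check(set(c), num_disks) for c in combinations(range(num_disks), failures))


if __name__ == "__main__":
    print("RAID 1+0:", count_survivable(survives_raid10, 4, 2), "of 6 double failures survived")
    print("RAID 0+1:", count_survivable(survives_raid01, 4, 2), "of 6 double failures survived")
```

Under these assumptions both layouts survive any single failure, but RAID 1+0 tolerates more of the possible double failures than RAID 0+1.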
RAID 3 • RAID 3 stripes data for high performance and uses parity for improved fault tolerance. • Parity information is stored on a dedicated drive so that data can be reconstructed if a drive fails • RAID 3 is used in applications that involve large sequential data access, such as video streaming.
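Parity-based reconstruction can be illustrated with bytewise XOR, the usual parity operation in single-parity RAID. The sketch below is a toy example with hypothetical helper names and made-up data; real arrays perform this in the controller.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]   # strips on three data drives
    parity = xor_blocks(data)                          # stored on the dedicated parity drive
    rebuilt = rebuild_missing([data[0], data[2]], parity)  # the drive holding data[1] has failed
    print(rebuilt == data[1])
```

XOR works here because XORing a value with itself cancels it out, so combining the parity with every surviving strip leaves exactly the lost strip.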
RAID 4 • Stripes data across all disks except the parity disk at the block level • Parity information is stored on a dedicated disk • Unlike RAID 3 , data disks can be accessed independently so that specific data elements can be read or written on a single disk without read or write of an entire stripe
RAID 5 • RAID 5 is a very versatile RAID implementation • The difference between RAID 4 and RAID 5 is the parity location. • In RAID 4, parity is written to a dedicated drive, while in RAID 5, parity is distributed across all disks • The distribution of parity in RAID 5 overcomes the write bottleneck of a dedicated parity drive. • RAID 5 is preferred for messaging, medium-performance media serving, and relational database management system (RDBMS) implementations in which database administrators (DBAs) optimize data access
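The effect of distributing parity can be shown by printing a stripe layout. This is only a sketch: the rotation rule used here (parity moves one disk to the left on each stripe) is an assumption chosen for illustration, and real implementations use various rotation schemes. The function name `raid5_layout` is hypothetical.

```python
def raid5_layout(num_disks: int, num_stripes: int) -> list[list[str]]:
    """Return one row per stripe; each cell is a data strip (D#) or parity strip (P#)."""
    layout = []
    next_data = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and shifts left each stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data}")
                next_data += 1
        layout.append(row)
    return layout


if __name__ == "__main__":
    for row in raid5_layout(num_disks=4, num_stripes=4):
        print("  ".join(f"{cell:>3}" for cell in row))
```

With four disks and four stripes, every disk ends up holding exactly one parity strip, so parity writes are spread evenly instead of queuing on one drive as in RAID 4.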
RAID 6 • RAID 6 works the same way as RAID 5 except that RAID 6 includes a second parity element • Maintaining two parities enables RAID 6 to survive the failure of two disks in a RAID group.
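The capacity cost of each protection scheme covered so far can be compared with a little arithmetic. The sketch below assumes equally sized disks and counts mirroring as halving capacity, single-parity levels as costing one disk, and RAID 6 as costing two; the function name and the string labels for the levels are hypothetical.

```python
def usable_capacity_tb(level: str, num_disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID set built from `num_disks` disks of `disk_tb` TB each."""
    mirrored = {"1", "0+1", "1+0"}                            # every block is stored twice
    parity_disks = {"0": 0, "3": 1, "4": 1, "5": 1, "6": 2}   # disks' worth of parity
    if level in mirrored:
        return num_disks * disk_tb / 2
    if level not in parity_disks:
        raise ValueError(f"unknown RAID level: {level}")
    return (num_disks - parity_disks[level]) * disk_tb


if __name__ == "__main__":
    for level in ["0", "1+0", "5", "6"]:
        print(f"RAID {level}: {usable_capacity_tb(level, num_disks=8, disk_tb=2.0)} TB usable")
```

The comparison makes the trade-off concrete: RAID 0 gives up nothing and protects nothing, mirroring gives up half, and parity levels give up one or two disks regardless of how large the set is.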
Hot Spare • A hot spare refers to a spare HDD in a RAID array that temporarily replaces a failed HDD of a RAID set. • When the failed HDD is replaced with a new HDD, one of two things happens: either the hot spare permanently takes the place of the failed HDD, so a new hot spare must be configured on the array, or data from the hot spare is copied to the new HDD and the hot spare returns to its idle state, ready to replace the next failed drive. • A hot spare should be large enough to accommodate data from a failed drive. • Some systems implement multiple hot spares to improve data availability. • A hot spare can be configured as automatic or user initiated, which specifies how it will be used in the event of disk failure
What Is an Intelligent Storage System • Intelligent storage systems are RAID arrays that: • Are highly optimized for I/O processing • Have large amounts of cache for improving I/O performance • Have operating environments that provide: – Intelligence for managing cache – Array resource allocation – Connectivity for heterogeneous hosts – Advanced array-based local and remote replication options
Components of an Intelligent Storage System • An intelligent storage system consists of four key components: front end, cache, back end, and physical disks.
Components of an Intelligent Storage System • The front end provides the interface between the storage system and the host. • It consists of two components: front-end ports and front-end controllers • The front-end ports enable hosts to connect to the intelligent storage system, and have processing logic that executes the appropriate transport protocol, such as SCSI, Fibre Channel, or iSCSI, for storage connections • Front-end controllers route data to and from cache via the internal data bus. When cache receives write data, the controller sends an acknowledgment back to the host
Components of an Intelligent Storage System • Controllers optimize I/O processing by using command queuing algorithms • Command queuing is a technique implemented on front-end controllers • It determines the execution order of received commands and can reduce unnecessary drive head movements and improve disk performance
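To show how reordering queued commands can cut head movement, the sketch below compares first-in-first-out execution with a shortest-seek-first ordering. Shortest-seek-first is just one illustrative policy chosen here, not necessarily what any given controller uses, and the function names and block addresses are hypothetical.

```python
def head_travel(start: int, order: list[int]) -> int:
    """Total distance the head moves when servicing addresses in the given order."""
    total, position = 0, start
    for address in order:
        total += abs(address - position)
        position = address
    return total


def shortest_seek_first(start: int, pending: list[int]) -> list[int]:
    """Reorder pending commands so the nearest address is always serviced next."""
    remaining = list(pending)
    order, position = [], start
    while remaining:
        nearest = min(remaining, key=lambda address: abs(address - position))
        remaining.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


if __name__ == "__main__":
    queue = [95, 10, 60, 5, 90]          # commands in arrival order
    reordered = shortest_seek_first(50, queue)
    print("arrival order travel:  ", head_travel(50, queue))
    print("reordered:             ", reordered)
    print("reordered order travel:", head_travel(50, reordered))
```

For this queue the reordered sequence moves the head less than half as far as the arrival order does, which is the kind of saving command queuing is after.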
Intelligent Storage System: Cache • Cache is an important component that enhances the I/O performance in an intelligent storage system. • Cache improves storage system performance by isolating hosts from the mechanical delays associated with physical disks, which are the slowest components of an intelligent storage system. Accessing data from a physical disk usually takes a few milliseconds • Accessing data from cache takes less than a millisecond. Write data is placed in cache and then written to disk
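The read and write behavior described above can be sketched with a small in-memory model. This is an illustration under stated assumptions: the class name `ArrayCache` is hypothetical, a plain dictionary stands in for the physical disks, and least-recently-used eviction is assumed because the slides do not name a replacement policy.

```python
from collections import OrderedDict


class ArrayCache:
    """Toy cache in front of a 'disk' dictionary (block number -> data)."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.disk = disk
        self.pages = OrderedDict()      # block -> (data, dirty flag), oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.pages:          # read hit: served without touching disk
            self.hits += 1
            self.pages.move_to_end(block)
            return self.pages[block][0]
        self.misses += 1                 # read miss: fetch from disk, keep a copy
        data = self.disk[block]
        self._store(block, data, dirty=False)
        return data

    def write(self, block, data) -> str:
        # The write is acknowledged once it is in cache; disk is updated later.
        self._store(block, data, dirty=True)
        return "ack"

    def _store(self, block, data, dirty: bool) -> None:
        self.pages[block] = (data, dirty)
        self.pages.move_to_end(block)
        while len(self.pages) > self.capacity:
            old_block, (old_data, old_dirty) = self.pages.popitem(last=False)
            if old_dirty:                # uncommitted data must reach disk before eviction
                self.disk[old_block] = old_data

    def flush(self) -> None:
        """Write every uncommitted page to disk."""
        for block, (data, dirty) in list(self.pages.items()):
            if dirty:
                self.disk[block] = data
                self.pages[block] = (data, False)
```

A short usage example: after `cache.write(2, "B")` returns `"ack"`, the dictionary standing in for disk still holds the old value until `cache.flush()` runs or the page is evicted, which is exactly the window that cache protection has to cover.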
Cache Data Protection • Cache mirroring: Each write to cache is held in two different memory locations on two independent memory cards • Cache vaulting: Cache is exposed to the risk of uncommitted data loss due to power failure • To address this, storage vendors use battery power to write the cache content to a set of physical disks used to dump the contents of cache during a power failure
Intelligent Storage System: Back End • The back end consists of two components: back-end ports and back-end controllers • Physical disks are connected to ports on the back end. • The back-end controller communicates with the disks when performing reads and writes and also provides additional, but limited, temporary data storage. • The algorithms implemented on back-end controllers provide error detection and correction, along with RAID functionality. • Multiple controllers also facilitate load balancing
Intelligent Storage System: Physical Disks • Disks are connected to the back-end with either SCSI or a Fibre Channel interface
What Are LUNs • Physical drives or groups of RAID-protected drives can be logically split into volumes known as logical volumes, commonly referred to as Logical Unit Numbers (LUNs)
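Splitting a RAID set into LUNs can be pictured as handing out address ranges from a pool. The sketch below is a deliberately simple model with hypothetical names (`RaidSet`, `create_lun`); it allocates contiguous ranges in gigabytes and ignores everything a real array does around masking, mapping, and resizing.

```python
class RaidSet:
    """Toy RAID set whose usable capacity is carved into LUNs."""

    def __init__(self, capacity_gb: int):
        self.capacity_gb = capacity_gb
        self.next_free_gb = 0
        self.luns = {}                    # LUN id -> (start offset in GB, size in GB)

    def free_gb(self) -> int:
        return self.capacity_gb - self.next_free_gb

    def create_lun(self, lun_id: int, size_gb: int) -> tuple[int, int]:
        """Allocate the next contiguous range of the RAID set to a new LUN."""
        if lun_id in self.luns:
            raise ValueError(f"LUN {lun_id} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the RAID set")
        self.luns[lun_id] = (self.next_free_gb, size_gb)
        self.next_free_gb += size_gb
        return self.luns[lun_id]


if __name__ == "__main__":
    raid_set = RaidSet(capacity_gb=1000)
    print(raid_set.create_lun(0, 200))    # first LUN starts at offset 0
    print(raid_set.create_lun(1, 500))    # second LUN starts where the first ends
    print(raid_set.free_gb(), "GB left")
```

Each host then sees its LUN as if it were a separate disk, even though several LUNs share the same underlying drives.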
High-end Storage Systems • High-end storage systems, referred to as active-active arrays, are generally aimed at large enterprises for centralizing corporate data • These arrays are designed with a large number of controllers and cache memory • An active-active array implies that the host can perform I/Os to its LUNs across any of the available paths
Midrange Storage Systems • Also referred to as active-passive arrays • Host can perform I/Os to LUNs only through active paths • Other paths remain passive until the active path fails • Midrange arrays typically have two controllers, each with cache, RAID controllers, and disk drive interfaces • Designed for small and medium enterprises • Less scalable than high-end arrays
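The difference between active-active and active-passive access can be sketched as a path-selection rule. This is a simplified model with hypothetical names and path states; real multipathing software and array failover involve considerably more than this.

```python
def usable_paths(paths: dict[str, str], mode: str) -> list[str]:
    """Return the paths a host may send I/O down.

    `paths` maps a path name to its state: "active", "passive", or "failed".
    `mode` is "active-active" or "active-passive".
    """
    if mode == "active-active":
        # Any path that has not failed can carry I/O to the LUN.
        return [name for name, state in paths.items() if state != "failed"]
    if mode == "active-passive":
        active = [name for name, state in paths.items() if state == "active"]
        if active:
            return active
        # No active path left: fail over to the first passive path, if any.
        return [name for name, state in paths.items() if state == "passive"][:1]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(usable_paths({"A": "active", "B": "active"}, "active-active"))
    print(usable_paths({"A": "active", "B": "passive"}, "active-passive"))
    print(usable_paths({"A": "failed", "B": "passive"}, "active-passive"))
```

In the active-passive case the second path carries no I/O until the first one fails, which is why such arrays offer redundancy but not the same aggregate throughput.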
DAS Direct-Attached Storage (DAS) • storage connects directly to servers • applications access data from DAS using block-level access protocols • Examples: • internal HDD of a host, • tape libraries, and • directly connected external HDD
DAS Direct-Attached Storage (DAS) • DAS is classified as internal or external, based on the location of the storage device with respect to the host. • Internal DAS: storage device internally connected to the host by a serial or parallel bus • Internal DAS has distance limitations for high-speed connectivity, • can support only a limited number of devices, and • occupies a large amount of space inside the host
DAS Direct-Attached Storage (DAS) • External DAS: server connects directly to the external storage device • communication usually takes place over the SCSI or FC protocol • overcomes the distance and device count limitations of internal DAS, and • provides centralized management of storage devices.
DAS Benefits • Ideal for local data provisioning • Quick deployment for small environments • Simple to deploy • Reliability • Low capital expense • Low complexity
DAS Connectivity Options • Hosts and storage devices communicate via protocols • ATA/IDE and SATA – Primarily for internal bus • SCSI – Parallel (primarily for internal bus) – Serial (external bus) • FC – High-speed network technology
DAS Connectivity Options • protocols are implemented on the HDD controller • a storage device is also known by the name of the protocol it supports
DAS Management • LUN creation, filesystem layout, and data addressing • Internal – Host (or 3rd party software) provides: • Disk partitioning (Volume management) • File system layout
DAS Management • External: • Array-based management • Lower TCO for managing data and storage infrastructure
DAS Challenges • Limited scalability: • Number of connectivity ports to hosts • Number of addressable disks • Distance limitations • For internal DAS, maintenance requires downtime • Limited ability to share resources (unused resources cannot be easily reallocated): – Array front-end ports, storage space – Resulting in islands of over- and under-utilized storage pools
Introduction to SCSI • SCSI–3 is the latest version of SCSI
SCSI Architecture Primary commands common to all devices
SCSI Architecture Standard rules for device communication and information sharing