Mass-Storage Structure Operating System Concepts chapter 12 CS 355 Operating Systems Dr. Matthew Wright
Background: Magnetic Disks • Rotate 60 to 200 times per second • Transfer rate: rate at which data flows between drive and computer • Positioning time (random-access time): time to move disk arm to desired cylinder (seek time) and time for desired sector to rotate under the disk head (rotational latency)
Disk Address Structure • Disks are addressed as a large 1-dimensional array of logical blocks (usually 512 bytes per logical block). • This array is mapped onto the sectors of the disk, usually with sector 0 on the outermost cylinder, then through that track, then through that cylinder, and then through the other cylinders working toward the center of the disk. • Converting logical block addresses to cylinder, track, and sector numbers is difficult because: • Most disks have some defective sectors, which are replaced by spare sectors elsewhere on the disk. • The number of sectors per track might not be constant. • Constant linear velocity (CLV): tracks farther from the center hold more bits, so the disk rotates more slowly when reading these outer tracks (and faster on the inner tracks) to keep the data rate constant (CDs and DVDs commonly use this method) • Constant angular velocity (CAV): rotational speed is constant, so bit density decreases from inner tracks to outer tracks to keep the data rate constant
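As a rough illustration of the idea (not of any real driver interface), a sketch of the conversion for an idealized CAV disk with a fixed geometry, no spare sectors, and no remapping; the constants and function name below are assumptions made for the example:

SECTORS_PER_TRACK = 63     # assumed fixed geometry; real drives vary sectors per track
HEADS_PER_CYLINDER = 16    # i.e., tracks per cylinder

def block_to_chs(block: int) -> tuple[int, int, int]:
    """Map a logical block number to (cylinder, head, sector) on the idealized disk."""
    blocks_per_cylinder = SECTORS_PER_TRACK * HEADS_PER_CYLINDER
    cylinder = block // blocks_per_cylinder
    head = (block % blocks_per_cylinder) // SECTORS_PER_TRACK
    sector = block % SECTORS_PER_TRACK
    return cylinder, head, sector

print(block_to_chs(0))       # (0, 0, 0): sector 0 on the outermost cylinder
print(block_to_chs(100000))  # a block farther toward the center of the disk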
Disk Scheduling: FCFS • Simple, but generally doesn’t provide the fastest service • Example: suppose the read/write heads start on cylinder 53, and the disk queue has requests for I/O to blocks on the following cylinders: • 98, 183, 37, 122, 14, 124, 65, 67 Diagram shows read/write head movement to service the requests in FCFS order. Total head movement spans 640 cylinders.
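A minimal sketch that reproduces the 640-cylinder total for this example (illustrative Python; the function name is ours, not an OS interface):

def fcfs_movement(start: int, requests: list[int]) -> int:
    """Total head movement when requests are serviced strictly in arrival order."""
    total, pos = 0, start
    for cyl in requests:
        total += abs(cyl - pos)
        pos = cyl
    return total

print(fcfs_movement(53, [98, 183, 37, 122, 14, 124, 65, 67]))  # 640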
Disk Scheduling: SSTF • Shortest Seek Time First (SSTF): service the requests closest to the current position of the read/write heads • This is similar to SJF scheduling, and could starve some requests. • Example: heads at cylinder 53; disk request queue contains: • 98, 183, 37, 122, 14, 124, 65, 67 Diagram shows read/write head movement to service the requests SSTF. Total head movement spans 236 cylinders.
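A sketch of the greedy SSTF selection, reproducing the 236-cylinder total for the same example (illustrative Python only):

def sstf_movement(start: int, requests: list[int]) -> int:
    """Total head movement when the closest pending request is always serviced next."""
    pending = list(requests)
    total, pos = 0, start
    while pending:
        nxt = min(pending, key=lambda c: abs(c - pos))  # closest request to the head
        total += abs(nxt - pos)
        pos = nxt
        pending.remove(nxt)
    return total

print(sstf_movement(53, [98, 183, 37, 122, 14, 124, 65, 67]))  # 236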
Disk Scheduling: SCAN • SCAN algorithm: disk heads start at one end, move toward the other end, then return, servicing requests along the way in each direction • Example: heads at cylinder 53 moving toward 0; request queue: • 98, 183, 37, 122, 14, 124, 65, 67 Diagram shows read/write head movement to service the requests with the SCAN algorithm. Total head movement spans 236 cylinders.
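A sketch of SCAN for this example, assuming the head sweeps all the way down to cylinder 0 before reversing (as in the diagram); the helper is illustrative only:

def scan(start: int, requests: list[int]):
    """Return (service order, total head movement) for SCAN moving toward cylinder 0 first."""
    lower = sorted(c for c in requests if c <= start)   # serviced while moving toward 0
    upper = sorted(c for c in requests if c > start)    # serviced after reversing at cylinder 0
    order = lower[::-1] + upper
    total = start + (upper[-1] if upper else 0)         # down to cylinder 0, then up to the farthest request
    return order, total

print(scan(53, [98, 183, 37, 122, 14, 124, 65, 67]))    # total is 53 + 183 = 236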
Disk Scheduling: C-SCAN • Circular SCAN (C-SCAN): Disk heads start at one end and move toward the other end, servicing requests along the way. The disk heads then return immediately to the first end without servicing requests, and the sweep repeats. • Example: heads at cylinder 53; request queue: • 98, 183, 37, 122, 14, 124, 65, 67 Diagram shows read/write head movement to service the requests with the C-SCAN algorithm. Total head movement spans 383 cylinders.
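A sketch of C-SCAN for this example, assuming a disk with cylinders 0-199; whether (and how) the wrap-around seek is counted varies by convention, so the computed total can differ slightly from the figure quoted above:

MAX_CYLINDER = 199   # assumed disk with cylinders 0..199

def cscan(start: int, requests: list[int], count_wrap: bool = True):
    """Return (service order, total head movement) for C-SCAN sweeping upward from start."""
    upper = sorted(c for c in requests if c >= start)   # serviced on the upward sweep
    lower = sorted(c for c in requests if c < start)    # serviced after the wrap to cylinder 0
    total = MAX_CYLINDER - start                        # sweep up to the last cylinder
    if count_wrap:
        total += MAX_CYLINDER                           # return seek from the end back to cylinder 0
    if lower:
        total += lower[-1]                              # then up to the last low request
    return upper + lower, total

print(cscan(53, [98, 183, 37, 122, 14, 124, 65, 67]))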
Disk Scheduling: LOOK and C-LOOK • Like the SCAN or C-SCAN algorithms, but the heads only go as far as the last request in each direction. • Example: heads at cylinder 53; request queue: • 98, 183, 37, 122, 14, 124, 65, 67 Diagram shows read/write head movement to service the requests with the C-LOOK algorithm. Total head movement spans 322 cylinders.
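A sketch of C-LOOK for this example, reproducing the 322-cylinder total; it assumes at least one request lies at or above the starting cylinder, as in the example:

def clook(start: int, requests: list[int]):
    """Return (service order, total head movement) for C-LOOK sweeping upward from start."""
    upper = sorted(c for c in requests if c >= start)   # serviced on the upward sweep
    lower = sorted(c for c in requests if c < start)    # serviced after jumping back
    total = upper[-1] - start                           # up to the farthest request (183)
    if lower:
        total += upper[-1] - lower[0]                   # jump back to the smallest request (14)
        total += lower[-1] - lower[0]                   # then up to the last low request (37)
    return upper + lower, total

print(clook(53, [98, 183, 37, 122, 14, 124, 65, 67]))   # total is 322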
Selecting a Disk-Scheduling Algorithm • Which algorithm to choose? • SSTF is common and better than FCFS. • SCAN and C-SCAN perform better for systems that place a heavy load on the disk. • Performance depends on the number and types of requests, and the file-allocation method. • In general, either SSTF or LOOK is a reasonable choice for the default algorithm. • The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary. • Why not let the controller built into the disk hardware manage the scheduling? • The disk hardware can take into account both seek time and rotational latency. • The OS may choose to mandate the disk scheduling to guarantee priority of certain types of I/O.
Disk Management • The Operating System may also be responsible for tasks such as disk formatting, booting from disk, and bad-block recovery. • Low-level formatting divides a disk into sectors, and is usually performed when the disk is manufactured. • Logical formatting creates a file system on the disk, and is done by the OS. • The OS maintains the boot blocks (or boot partition) that contain the bootstrap loader. • Bad blocks: disk blocks may fail • An error-correcting code (ECC) stored with each block can detect and possibly correct an error (if so, it is called a soft error). • Disks contain spare sectors which are substituted for bad sectors. • If the system cannot recover from the error, it is called a hard error, and manual intervention may be required.
Swap-Space Management • Recall that virtual memory uses disk space as an extension of main memory; this disk space is called the swap space, even on systems that implement paging rather than pure swapping. • Swap space can be: • A file in the normal file system: easy to implement, but slow in practice • A separate (raw) disk partition: requires a swap-space manager, but can be optimized for speed rather than storage efficiency • Linux allows the administrator to choose whether the swap space is in a file or in a raw disk partition.
RAID Structure • RAID: Redundant Array of Independent Disks or Redundant Array of Inexpensive Disks • In systems with large numbers of disks, disk failures are common. • Redundancy allows the recovery of data when disk(s) fail. • Mirroring: A logical disk consists of two physical disks, and every write is carried out on both disks. • Bit-level striping: Splits the bits of each byte across multiple disks, which improves the transfer rate. • Block-level striping: Splits blocks of a file across multiple disks, which improves the access rate for large files and allows for concurrent reads of small files. • A nonvolatile RAM (NVRAM) cache can be used to protect data waiting to be written in case a power failure occurs. • Are disk failures really independent? • What if multiple disks fail simultaneously?
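A minimal sketch of block-level striping, assuming a hypothetical array of four disks; the round-robin placement below (block i goes to disk i mod n) is the usual mapping, and all names are illustrative:

NUM_DISKS = 4   # assumed array size for the illustration

def place_block(logical_block: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    return logical_block % NUM_DISKS, logical_block // NUM_DISKS

for b in range(8):
    disk, offset = place_block(b)
    print(f"logical block {b} -> disk {disk}, offset {offset}")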
RAID Levels • RAID level 0: non-redundant striping • Data striped at the block level, with no redundancy. • RAID level 1: mirrored disks • Two copies of data stored on different disks. • Data not striped. • Easy to recover data if one disk fails • RAID 0 + 1: combines RAID levels 0 and 1 • Provides both performance and reliability.
RAID Levels • RAID level 2: error-correcting codes • Data striped across disks at the bit level. • Disks labeled P store extra bits that can be used to reconstruct data if one disk fails. • Requires fewer disks than RAID level 1. • Requires computation of the error-correction bits at every write, and failure recovery requires lots of reads and computation.
RAID Levels • RAID level 3: bit-interleaved parity • Data striped across disks at the bit level. • Since disk controllers can detect whether a sector has been read correctly, a single parity bit can be used for error detection and correction. • As good as RAID level 2 in practice, but less expensive. • Still requires extra computation for parity bits. • RAID level 4: block-interleaved parity • Data striped across disks at the block level. • Stores parity blocks on a separate disk, which can be used to reconstruct the blocks on a single failed disk.
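A small sketch of block-interleaved parity: the parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the survivors. The block contents below are made up purely for illustration:

from functools import reduce

def parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR across all blocks in the stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
p = parity(data)                     # parity block, stored on the parity disk

# Simulate losing data block 1 and rebuilding it from the parity and the survivors:
rebuilt = parity([data[0], data[2], p])
assert rebuilt == data[1]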
RAID Levels • RAID level 5: block-interleaved distributed parity • Data striped across disks at the block level. • Spreads data and parity blocks across all disks. • Avoids possible overuse of a single parity disk, which could happen with RAID level 4. • RAID level 6: P + Q redundancy scheme • Like RAID level 5, but stores extra redundant information to guard against simultaneous failures of multiple disks. • Uses error-correcting codes such as Reed-Solomon codes.
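One way to picture the RAID 5 parity rotation (the exact placement rule varies by implementation; the rotation below is just one common choice, shown for a hypothetical five-disk array):

NUM_DISKS = 5   # assumed array size for the illustration

def parity_disk(stripe: int) -> int:
    """Disk holding the parity block for a given stripe, rotating from stripe to stripe."""
    return (NUM_DISKS - 1) - (stripe % NUM_DISKS)

for s in range(6):
    print(f"stripe {s}: parity on disk {parity_disk(s)}")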
RAID Implementation • RAID can be implemented at various levels: • In the kernel or at the system-software level • By the host bus-adapter hardware • By storage-array hardware • In the Storage Area Network (SAN) by disk-virtualization devices • Some RAID implementations include a hot spare: an extra disk that is not used until one disk fails, at which time the system automatically restores data onto the spare disk.
Stable-Storage Implementation • Stable storage: storage that never loses stored information. • Write-ahead logging (used to implement atomic transactions) requires stable storage. • To implement stable storage: • Replicate information on more than one nonvolatile storage device with independent failure modes. • Update information in a controlled manner to ensure that a failure during an update will not leave all copies in a damaged state, and so that we can safely recover from a failure. • Three possible outcomes of a disk write: • Successful completion: all of the data written successfully • Partial failure: only some of the data written successfully • Total failure: the failure occurs before the write starts, so the previous data remains intact
Stable-Storage Implementation • Strategy: maintain two (identical) physical blocks for each logical block, on different disks, with error-detection bits for each block • A write operation proceeds as: • Write the information to the first physical block. • When the first write completes, then write the same information to the second physical block. • When the second write completes, then declare the operation successful. • During failure recovery, examine each pair of physical blocks: • If both are the same and neither contains a detectable error, then do nothing. • If one block contains a detectable error, then replace its contents with the other block. • If neither block contains a detectable error, but the values differ, then replace the contents of the first block with that of the second. • As long as both copies don’t fail simultaneously, we guarantee that a write operation will either succeed completely or result in no change.
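A sketch of the recovery pass over one mirrored block pair, assuming each physical copy carries error-detection bits (modeled here as a CRC); the Copy type and field names are illustrative, not a real disk interface:

import zlib
from dataclasses import dataclass

@dataclass
class Copy:
    data: bytes
    checksum: int

    def valid(self) -> bool:
        return zlib.crc32(self.data) == self.checksum

def recover(first: Copy, second: Copy) -> None:
    """Make the two physical copies of a logical block consistent again."""
    ok1, ok2 = first.valid(), second.valid()
    if ok1 and ok2:
        if first.data != second.data:
            # Interrupted between the two physical writes: per the rule above,
            # copy the second block's contents over the first.
            first.data, first.checksum = second.data, second.checksum
    elif ok1:
        second.data, second.checksum = first.data, first.checksum
    elif ok2:
        first.data, first.checksum = second.data, second.checksum
    # If both copies are damaged, the failure cannot be recovered at this layer.

new, old = b"new contents", b"old contents"
first = Copy(new, zlib.crc32(new))    # crash happened after writing the first copy...
second = Copy(old, zlib.crc32(old))   # ...but before updating the second
recover(first, second)
assert first.data == second.data      # the pair is consistent again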
Tertiary Storage • Most OSs handle removable disks almost exactly like fixed disks: a new cartridge is formatted and an empty file system is generated on the disk. • Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device. • Usually the tape drive is reserved for the exclusive use of that application. • Since the OS does not provide file-system services, the application must decide how to use the array of blocks. • Since every application makes up its own rules for how to organize a tape, a tape full of data can generally be used only by the program that created it. • The issue of naming files on removable media is especially difficult when we want to write data to a removable cartridge on one computer and then use the cartridge in another computer. • Contemporary OSs generally leave the name-space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data. • Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way.