
Midterm 2: April 28th


1. Midterm 2: April 28th
• Material:
  • Query processing and Optimization, Chapters 12 and 13 (ignore 12.5.5, 12.7, 13.4.4, and 13.5)
  • Transactions, Chapter 14
  • Concurrency Control, Chapter 15 (ignore 15.7 to 15.10)
  • Recovery System, Chapter 16 (ignore 16.8 and 16.9)
  • Google File System
  • LRU-K, article by the O'Neils and Weikum
  • Continuous Media, article by Ghandeharizadeh & Muntz (first 11 pages)
  • COSAR-CQN

2. Enterprise Data Management
Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

3. Challenge: Managing Data is Expensive
• Cost of managing data is $100K/TB/year:
  • Down time is estimated at thousands of dollars per minute.
• Loss of data results in lost productivity:
  • 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.
• 50% of companies that lose their data due to a disaster never re-open; 90% go out of business within 2 years!

4. Centralize Management of Storage
• Before: data stored locally at each client.
• After: data stored across the network at a central location.
[Figure: clients with local data vs. clients sharing central data over a network]

5. Centralize Management of Storage
• Advantages:
  • Many clients share storage and data: data remains available when a client fails.

6. Centralize Management of Storage
• Advantages:
  • Many clients share storage and data.
  • Redundancy is implemented in one place, protecting all clients from disk failure.

7. Centralize Management of Storage
• Advantages:
  • Many clients share storage and data.
  • Redundancy is implemented in one place, protecting all clients from disk failure.
  • Centralized backup: the administrator need not know how many clients are on the network sharing storage.

8. Centralize Management of Storage
• Advantages:
  • Many clients share storage and data.
  • Redundancy is implemented in one place, protecting all clients from disk failure.
  • Centralized backup: the administrator need not know how many clients are on the network sharing storage.
[Figure: data sharing, high availability, and centralized backup over the network]

9. Network Failures
• What about network failures?
  • Two host bus adapters per server,
  • Each server connected to a different switch.

10. Centralize Management of Storage
• Storage Area Network (SAN):
  • Block-level access,
  • Writes to storage are immediate,
  • Specialized hardware including switches, host bus adapters, disk chassis, battery-backed caches, etc.,
  • Expensive,
  • Supports transaction processing systems.
• Network Attached Storage (NAS):
  • File-level access,
  • Writes to storage might be delayed,
  • Generic hardware,
  • Inexpensive,
  • Not appropriate for transaction processing systems.

11. Storage Area Network
• Centralize management of storage:
  • Storage Area Networks (SANs),
  • Redundancy in data to tolerate disk failures,
  • Regular backup,
  • Disaster recovery.

12. Concepts and Terminology
• Virtualization:
  • Available storage is represented as one HUGE disk drive, e.g., a SAN with a thousand 1.5 TB disks provides 1.5 Petabytes of storage,
  • Available storage is partitioned into Logical Unit Numbers (LUNs),
  • A LUN is presented to one or more servers,
  • A LUN appears as a disk drive to a server.
• The SAN places blocks across physical disks intelligently to balance load.
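As a rough illustration of the block-placement idea, the following sketch stripes a virtualized store's logical blocks across physical disks round-robin. The class and method names are illustrative assumptions, not a real SAN interface; production systems place blocks far more intelligently than pure striping.

```python
# Toy model of SAN virtualization: many physical disks presented as one
# large logical address space, with logical blocks striped across disks.
# All names here are illustrative assumptions.

class VirtualizedStorage:
    def __init__(self, num_disks, blocks_per_disk):
        self.num_disks = num_disks
        self.blocks_per_disk = blocks_per_disk

    def capacity_blocks(self):
        # Total logical capacity seen by the servers.
        return self.num_disks * self.blocks_per_disk

    def locate(self, logical_block):
        """Map a logical block number to (disk, offset) by round-robin striping."""
        disk = logical_block % self.num_disks      # spread consecutive blocks across disks
        offset = logical_block // self.num_disks   # position of the block on that disk
        return disk, offset

san = VirtualizedStorage(num_disks=4, blocks_per_disk=1000)
print(san.locate(0))   # (0, 0)
print(san.locate(5))   # (1, 1)
```

Round-robin placement means consecutive logical blocks land on different disks, so a sequential scan keeps all spindles busy, which is one simple way to balance load.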

13. Question
• Is it possible to present the same LUN to two different servers simultaneously?

14. Question
• Is it possible to present the same LUN to two different servers simultaneously? YES!
• Can two different servers read and write the files stored on the presented LUN?

15. Question
• Is it possible to present the same LUN to two different servers simultaneously? YES!
• Can two different servers read and write the files stored on the presented LUN? Yes!
• What are the consequences?
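One consequence worth spelling out: without coordination between the servers (e.g., a cluster file system or a lock manager), two servers writing the same LUN can silently lose updates. A toy sketch of the classic lost-update interleaving, with the shared LUN modeled as a Python dict and each server's cached copy as a local variable:

```python
# Two servers each cache block 0 of a shared LUN, update their cached
# copy, and write it back. The second write-back overwrites the first,
# losing server A's update. Values and names are illustrative.

lun = {0: 100}       # shared block 0 initially holds 100

# Both servers read (cache) the block before either writes back.
cache_a = lun[0]
cache_b = lun[0]

cache_a += 10        # server A's update
cache_b += 20        # server B's update

lun[0] = cache_a     # A writes back: block becomes 110
lun[0] = cache_b     # B writes back: block becomes 120 -- A's update is lost

print(lun[0])        # 120, not the 130 a serial execution would produce
```

This is exactly the anomaly that the concurrency-control material on the midterm (Chapter 15) is designed to prevent.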

16. Concepts: Backup
• Snapshot: the state of a LUN at one instant in time.
• Copy-on-write:
  • A snapshot consists of the original blocks of a LUN,
  • Every time an application writes a block, the SAN generates a new copy for the current LUN (the snapshot maintains the original),
  • Advantage: copies of blocks in support of backup are generated on demand.

17. Copy-on-Write
• Original LUN and snapshot taken midnight Sunday morning.
[Figure: LUN with blocks 1 through 7; the snapshot references the same blocks]

18. Copy-on-Write
• Original LUN and snapshot taken midnight Sunday morning.
• Writing block 5 changes the current LUN: block 5 gets a new physical copy, while the snapshot keeps the old block 5.
• As blocks are written, the physical blocks of the snapshot materialize.
[Figure: current LUN holds blocks 1-4, the new 5, 6, 7; the snapshot holds the old 5]
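The copy-on-write scheme on slides 16-18 can be sketched as follows. This is a minimal model under stated assumptions (blocks held in a dict, illustrative class and method names), not an actual SAN implementation: taking a snapshot costs nothing up front, and the snapshot's physical blocks materialize only when the current LUN first overwrites them.

```python
# Toy block-level LUN with copy-on-write snapshots. On the first write to
# a block after a snapshot, the original block is preserved for the
# snapshot before the current LUN is updated.

class COWLun:
    def __init__(self, blocks):
        self.blocks = dict(enumerate(blocks))   # current LUN contents
        self.snapshot = None

    def take_snapshot(self):
        # Free at snapshot time: old blocks materialize lazily on write.
        self.snapshot = {}

    def write(self, n, data):
        if self.snapshot is not None and n not in self.snapshot:
            self.snapshot[n] = self.blocks[n]   # preserve original on first write
        self.blocks[n] = data

    def read_snapshot(self, n):
        # A block unchanged since the snapshot is read from the current LUN.
        if self.snapshot is not None and n in self.snapshot:
            return self.snapshot[n]
        return self.blocks[n]

lun = COWLun(["b1", "b2", "b3", "b4", "b5", "b6", "b7"])
lun.take_snapshot()                 # midnight Sunday morning
lun.write(4, "new b5")              # application overwrites block 5 (index 4)
print(lun.blocks[4])                # "new b5": the current LUN sees the write
print(lun.read_snapshot(4))         # "b5": the snapshot still sees the original
```

The advantage the slide names falls out directly: backup copies of blocks are generated on demand, only for blocks that actually change.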

19. Hot Standby
• An inexpensive server maintained on the side to assume the responsibility of a failed server.
• Goal: minimize downtime.
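A hot standby's takeover decision can be sketched as a heartbeat monitor. The function below is a toy model under simplifying assumptions (heartbeats as a list of booleans rather than timed network messages, no failback), meant only to show the control flow:

```python
# Toy hot-standby failover: the standby serves requests from the first
# missed heartbeat onward. Names and the list-of-booleans input are
# illustrative assumptions, not a real cluster protocol.

def failover(heartbeats, primary="primary", standby="standby"):
    """Return which server is active at each tick: the primary while its
    heartbeats arrive, the standby from the first missed heartbeat on."""
    active = primary
    schedule = []
    for beat in heartbeats:
        if active == primary and not beat:  # missed heartbeat: take over
            active = standby
        schedule.append(active)
    return schedule

print(failover([True, True, False, True]))
# ['primary', 'primary', 'standby', 'standby']
```

Note the standby keeps serving even after the primary's heartbeats resume; rejoining a recovered primary safely is a separate (and harder) problem.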

20. Summary
• SAN and NAS are shared-disk architectures.
• SAN is appropriate for transaction processing systems.
• Hardware alone is not a substitute for a parallel, high-performance transaction processing system, e.g., Teradata, Oracle RAC, etc.
