The Hadoop Distributed File System

The Hadoop Distributed File System. PaoMin Wu, University at Buffalo. Architecture: the NameNode stores the metadata of the system and keeps the entire namespace in RAM; DataNodes store block replicas, which hold the application data; user applications access the file system through the HDFS client.

Presentation Transcript


  1. The Hadoop Distributed File System. PaoMin Wu, University at Buffalo

  2. ARCHITECTURE
     1. NameNode
        • stores the metadata of the system
        • keeps the entire namespace in RAM
     2. DataNode
        • stores block replicas
        • stores application data
     3. HDFS Client
        • User applications access the file system using the HDFS client
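The split between metadata and data on this slide can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not HDFS code: the class names `ToyNameNode` and `ToyDataNode`, the helper functions, the tiny block size, and the "take the first three DataNodes" placement rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: metadata only, all in memory."""
    replication: int = 3
    # file path -> ordered list of block ids
    namespace: dict = field(default_factory=dict)
    # block id -> names of the DataNodes holding a replica
    block_locations: dict = field(default_factory=dict)
    next_block_id: int = 0

    def add_block(self, path, datanodes):
        """Allocate a block for `path` and choose DataNodes for its replicas."""
        block_id = self.next_block_id
        self.next_block_id += 1
        # Assumed placement rule: simply take the first `replication` nodes.
        targets = datanodes[: self.replication]
        self.namespace.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = [dn.name for dn in targets]
        return block_id, targets

    def locate(self, path):
        """Return (block id, replica holders) for each block of the file."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block contents only."""
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes


def client_write(namenode, datanodes, path, data, block_size=4):
    """Client asks the NameNode where to write, then sends data to DataNodes."""
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        block_id, targets = namenode.add_block(path, datanodes)
        for dn in targets:
            dn.blocks[block_id] = chunk


def client_read(namenode, datanodes_by_name, path):
    """Client asks the NameNode for locations, then reads from one replica."""
    out = b""
    for block_id, holders in namenode.locate(path):
        out += datanodes_by_name[holders[0]].blocks[block_id]
    return out
```

In this sketch the file contents never pass through the NameNode; it only answers "which blocks, on which nodes", which mirrors the role division the slide describes.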

  3. HDFS Client Process

  4. ARCHITECTURE
     4. Image and Journal
        • Namespace image = file system metadata
        • Persistent record of the image = checkpoint
     5. CheckpointNode (NameNode)
        • Protects file system metadata
     6. BackupNode (NameNode)
        • Capable of creating periodic checkpoints
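The relationship between image, journal, and checkpoint can be sketched as "load the last image, replay the journal, write a new image". The code below is a minimal illustration under stated assumptions: the JSON encoding, the operation names (`create`, `add_block`, `delete`), and the functions `apply_op` and `make_checkpoint` are hypothetical and do not reflect the on-disk formats HDFS actually uses.

```python
import json


def apply_op(namespace, op):
    """Apply one journal record (hypothetical format) to the namespace."""
    kind, path = op["op"], op["path"]
    if kind == "create":
        namespace[path] = []
    elif kind == "add_block":
        namespace[path].append(op["block"])
    elif kind == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {kind}")


def make_checkpoint(image_json, journal_lines):
    """Merge the previous checkpoint with the journal into a new checkpoint.

    image_json    -- serialized namespace from the last checkpoint
    journal_lines -- serialized operations logged since that checkpoint
    """
    namespace = json.loads(image_json)
    for line in journal_lines:
        apply_op(namespace, json.loads(line))
    # The new checkpoint already reflects every replayed operation,
    # so the journal can be truncated once it is safely stored.
    return json.dumps(namespace, sort_keys=True)
```

Replaying an empty journal returns the same image, which is why periodic checkpoints keep the journal, and with it the restart time, short.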

  5. FILE I/O OPERATIONS AND REPLICA MANAGEMENT

  6. FILE I/O OPERATIONS AND REPLICA MANAGEMENT

  7. Sort Benchmark

  8. Future Work
     • Problem: the NameNode contains all important information
     • Solution: allow multiple namespaces (and NameNodes) to share the physical storage within a cluster

  9. MapReduce: Simplified Data Processing on Large Clusters. PaoMin Wu, University at Buffalo

  10. Introduction
      • key/value pair
      • execution across a set of machines
      • handling machine failures
      • managing the required inter-machine communication
      • runs on a large cluster
      • powerful interface
      • automatic parallelization
      • distribution of large-scale computations

  11. Programming Model
      • Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs.
      • The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key.
      • The intermediate values are supplied to the user's reduce function via an iterator.
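The programming model above can be sketched with a word count run on a single machine. This is a minimal illustration, not the slide's own example and not a distributed implementation: `run_mapreduce` is a hypothetical driver that stands in for the framework, and grouping with an in-memory dictionary is an assumption made for brevity.

```python
from collections import defaultdict


def map_fn(key, value):
    """User-written map: (document name, text) -> intermediate (word, 1) pairs."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """User-written reduce: an intermediate key and an iterator over its values."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver standing in for the framework."""
    groups = defaultdict(list)
    # Map phase: collect every intermediate pair, grouped by key.
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: each key's values reach the user code through an iterator.
    return dict(reduce_fn(k, iter(vs)) for k, vs in sorted(groups.items()))
```

Calling `run_mapreduce([("d1", "a b a"), ("d2", "b c")], map_fn, reduce_fn)` counts each word across both inputs. In the real system the map and reduce calls run on many machines and the grouping happens across the network; only the two user functions would stay the same.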

  12. Example:

  13. Execution Overview:

  14. Backup Tasks:

  15. Conclusions
      • Restricting the programming model is beneficial
      • Network bandwidth is a scarce resource
      • Redundant execution can help

  16. References:
      • The Hadoop Distributed File System. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler. Yahoo!, Sunnyvale, California, USA. {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com
      • MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. Google, Inc. jeff@google.com, sanjay@google.com
