The Hadoop Distributed File System

PaoMin Wu, University at Buffalo

ARCHITECTURE

1. NameNode
• stores metadata of the system
• keeps the entire namespace in RAM
2. DataNode
• holds block replicas
• stores application data
3. HDFS Client
• User applications access the file system using the HDFS client
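The division of labour between the three components can be sketched in a few lines of Python. This is a toy model, not HDFS code: the class names, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen to keep the illustration small.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4      # bytes per block; tiny on purpose (assumed value)
REPLICATION = 2     # replicas per block (assumed value)


@dataclass
class DataNode:
    """Stores application data as block replicas."""
    name: str
    blocks: dict = field(default_factory=dict)      # block_id -> bytes


@dataclass
class NameNode:
    """Keeps the whole namespace (metadata only) in memory."""
    datanodes: list
    namespace: dict = field(default_factory=dict)   # path -> [block_id, ...]
    locations: dict = field(default_factory=dict)   # block_id -> [DataNode, ...]
    next_block_id: int = 0

    def allocate_block(self, path):
        block_id = self.next_block_id
        self.next_block_id += 1
        # Round-robin targets: a stand-in for a real placement policy.
        targets = [self.datanodes[(block_id + i) % len(self.datanodes)]
                   for i in range(REPLICATION)]
        self.namespace.setdefault(path, []).append(block_id)
        self.locations[block_id] = targets
        return block_id, targets


class Client:
    """Applications talk to the file system only through the client."""

    def __init__(self, namenode):
        self.namenode = namenode

    def write(self, path, data: bytes):
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id, targets = self.namenode.allocate_block(path)
            for datanode in targets:
                datanode.blocks[block_id] = data[offset:offset + BLOCK_SIZE]

    def read(self, path) -> bytes:
        chunks = []
        for block_id in self.namenode.namespace[path]:
            replica = self.namenode.locations[block_id][0]  # any replica works
            chunks.append(replica.blocks[block_id])
        return b"".join(chunks)


datanodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(datanodes)
client = Client(namenode)
client.write("/demo.txt", b"hello hdfs")
restored = client.read("/demo.txt")
```

Note that the NameNode never touches file contents here: it only records which blocks make up a path and where their replicas live, while the bytes themselves sit on the DataNodes.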

HDFS Client Process

ARCHITECTURE

4. Image and Journal
• Namespace image = file system metadata
• Persistent record of the image = checkpoint
5. CheckpointNode (NameNode)
• Protects file system metadata
6. BackupNode (NameNode)
• Capable of creating periodic checkpoints
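The relationship between image, journal, and checkpoint can be illustrated with a small sketch. It assumes a JSON-encoded image and three hypothetical operation types (`mkdir`, `create`, `delete`); the real on-disk formats are different. The point is only that a new checkpoint is the old checkpoint with the journal replayed on top of it.

```python
import json


def apply_record(image, record):
    """Apply one journal record to an in-memory namespace image."""
    kind, path = record["op"], record["path"]
    if kind in ("mkdir", "create"):
        image[path] = {"type": "dir" if kind == "mkdir" else "file"}
    elif kind == "delete":
        image.pop(path, None)
    return image


def checkpoint(old_checkpoint: str, journal: list) -> str:
    """Merge a stored checkpoint with the journal into a new checkpoint."""
    image = json.loads(old_checkpoint)
    for line in journal:
        apply_record(image, json.loads(line))
    return json.dumps(image, sort_keys=True)


# Persistent state: an (empty) checkpoint plus a journal of later changes.
stored_checkpoint = json.dumps({})
journal = [
    json.dumps({"op": "mkdir", "path": "/logs"}),
    json.dumps({"op": "create", "path": "/logs/a.txt"}),
    json.dumps({"op": "create", "path": "/tmp.txt"}),
    json.dumps({"op": "delete", "path": "/tmp.txt"}),
]

new_checkpoint = checkpoint(stored_checkpoint, journal)
journal = []                      # the journal can be truncated afterwards
recovered = json.loads(new_checkpoint)
```

Creating checkpoints periodically keeps the journal short, which in turn shortens the replay needed when the namespace has to be rebuilt.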

FILE I/O OPERATIONS AND REPLICA MANAGEMENT

Sort Benchmark

Future Work

Problem: The NameNode contains all important information.
Solution: Allow multiple namespaces (and NameNodes) to share the physical storage within a cluster.

MapReduce: Simplified Data Processing on Large Clusters
PaoMin Wu, University at Buffalo

Introduction

• key/value pair
• execution across a set of machines
• handling machine failures
• managing the required inter-machine communication
• runs on a large cluster
• powerful interface
• automatic parallelization
• distribution of large-scale computations

Programming Model

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key. The intermediate values are supplied to the user's reduce function via an iterator.

Example:
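The slide's own example did not survive in this transcript. Word counting, the example used in the MapReduce paper, can be sketched as follows. The `run_mapreduce` driver is a hypothetical single-process stand-in for the library: it only groups intermediate values by key and hands each group to the reduce function as an iterator.

```python
from collections import defaultdict


def map_fn(key, value):
    """key: document name, value: document contents."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """key: a word, values: an iterator over the counts emitted for it."""
    return sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the framework (illustration only)."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)      # group values by key
    return {k: reduce_fn(k, iter(vs)) for k, vs in sorted(intermediate.items())}


documents = [("doc1", "the quick fox"), ("doc2", "the lazy dog the end")]
counts = run_mapreduce(documents, map_fn, reduce_fn)
```

Only `map_fn` and `reduce_fn` are user code; partitioning, scheduling, and fault handling are what the real framework adds around them.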

Execution Overview:

Backup Tasks:
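In the MapReduce paper, backup tasks address stragglers: when a job is close to completion, the master schedules backup executions of the tasks still in progress, and a task counts as done as soon as either copy finishes. The effect can be shown with a small deterministic simulation. All durations and the backup start time below are made-up numbers, and `job_finish_time` is a hypothetical helper, not part of any framework.

```python
def job_finish_time(primary_durations, backup_duration, backup_start, use_backups):
    """Return when the whole job ends, given per-task running times.

    A task still running at `backup_start` gets a backup copy that needs
    `backup_duration` time units; the task ends when either copy ends.
    """
    finish_times = []
    for duration in primary_durations:
        finish = duration
        if use_backups and duration > backup_start:
            finish = min(duration, backup_start + backup_duration)
        finish_times.append(finish)
    return max(finish_times)      # the job ends with its slowest task


durations = [10, 11, 9, 10, 45]   # the last task is a straggler
without_backups = job_finish_time(durations, backup_duration=10,
                                  backup_start=12, use_backups=False)
with_backups = job_finish_time(durations, backup_duration=10,
                               backup_start=12, use_backups=True)
```

With these numbers the job ends at time 45 without backups and at time 22 with them, because the backup copy of the straggler finishes long before the original does.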

Conclusions

• Restricting the programming model is beneficial
• Network bandwidth is a scarce resource
• Redundant execution can help

References:

• Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler. The Hadoop Distributed File System. Yahoo!, Sunnyvale, California, USA. {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com
• Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google, Inc. jeff@google.com, sanjay@google.com
