
ecs289m Fall, 2009 “Facebook and Hadoop” Lecture #04


Presentation Transcript


  1. ecs289m Fall, 2009 “Facebook and Hadoop” Lecture #04. S. Felix Wu, Computer Science Department, University of California, Davis. wu@cs.ucdavis.edu, http://www.cs.ucdavis.edu/~wu/

  2. Hadoop at Facebook • Production cluster: 4800 cores, 600 machines, 16 GB RAM per machine (April 2009); 8000 cores, 1000 machines, 32 GB RAM per machine (July 2009); 4 SATA disks of 1 TB each per machine; 2-level network hierarchy, 40 machines per rack; total cluster size is 2 PB, projected to be 12 PB in Q3 2009 • Test cluster: 800 cores, 16 GB RAM each • Cloudera: the “Red Hat” for Hadoop cloud computing

  3. Yahoo! Hadoop Clusters • Yahoo! has ~10,000 machines running Hadoop • The largest cluster is currently 1600 nodes • Nearly 1 petabyte of user data (compressed, unreplicated) • Runs roughly 10,000 research jobs / week

  4. Web Crawl Problem Detection • The Problem • Yahoo! crawls billions of pages per day; how do you detect when one site has a problem? • The Solution • We load the crawl logs into Hadoop (via a map-reduce job) • We aggregate reports by site over time and flag sites where the crawl behavior has changed • This generates a report to customer service every day • They contact webmasters and get sites fixed
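
A minimal sketch of the kind of aggregation job this describes, assuming a made-up tab-separated log format of "site<TAB>httpStatus"; flagging sites by a fixed error-rate threshold stands in for the real change-over-time detection, which the slide does not detail:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CrawlHealth {
      // Hypothetical log format: "site<TAB>httpStatus"
      public static class SiteMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text site = new Text();
        private final IntWritable error = new IntWritable();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          String[] fields = value.toString().split("\t");
          if (fields.length < 2) return;                    // skip malformed lines
          site.set(fields[0]);
          error.set(fields[1].startsWith("5") ? 1 : 0);     // treat 5xx responses as errors
          ctx.write(site, error);
        }
      }

      public static class FlagReducer extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        protected void reduce(Text site, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          long total = 0, errors = 0;
          for (IntWritable v : values) { total++; errors += v.get(); }
          double rate = (double) errors / total;
          if (rate > 0.10) {                                // arbitrary threshold for this sketch
            ctx.write(site, new Text("FLAG error-rate=" + rate));
          }
        }
      }
    }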

  5. Facebook Data Flow [diagram: Web Servers, Scribe Servers, Network Storage, Oracle RAC, Hadoop Cluster, MySQL]

  6. Facebook Hadoop and Hive Usage statistics: • 15 TB of uncompressed data ingested per day • 55 TB of compressed data scanned per day • 3200+ jobs on the production cluster per day • 80M compute minutes per day • Barrier to entry is reduced: 80+ engineers have run jobs on the Hadoop platform • Analysts (non-engineers) starting to use Hadoop through Hive

  7. What Is Hadoop? • A distributed computing framework • For clusters of computers • Thousands of compute nodes • Petabytes of data • Open source, written in Java • Google’s MapReduce and GFS inspired Yahoo!’s Hadoop

  8. What Is Hadoop? • The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. Hadoop includes: • Hadoop Common utilities • Avro: a data serialization system that integrates with scripting languages • Chukwa: a data collection system for managing large distributed systems • HBase: a scalable, distributed database for large tables • HDFS: a distributed file system • Hive: data summarization and ad hoc querying • MapReduce: distributed processing on compute clusters • Pig: a high-level data-flow language for parallel computation • ZooKeeper: a coordination service for distributed applications

  9. GFS – Google File System • “Failures” are the norm • Multi-GB files are common • Append rather than overwrite • Random writes are rare • Can we relax the consistency?

  10. [figure-only slide]

  11. The Master • Maintains all file system metadata: namespace, access control info, file-to-chunk mappings, chunk (and replica) locations, etc. • Periodically communicates with chunkservers through HeartBeat messages to give instructions and check state

  12. The Master • Makes sophisticated chunk placement and replication decisions using global knowledge • For reading and writing, a client contacts the Master to get chunk locations, then deals directly with chunkservers • The Master is therefore not a bottleneck for reads/writes

  13. Chunkservers • Files are broken into chunks. Each chunk has an immutable, globally unique 64-bit chunk handle • The handle is assigned by the Master at chunk creation • Chunk size is 64 MB • Each chunk is replicated on 3 (by default) chunkservers
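
As a back-of-the-envelope illustration of these numbers (the 10 GB file size below is only an example, not from the slides):

    public class ChunkMath {
      public static void main(String[] args) {
        long chunkSize = 64L << 20;                              // 64 MB GFS chunk size
        long fileSize  = 10L << 30;                              // a 10 GB file (illustrative)
        long chunks    = (fileSize + chunkSize - 1) / chunkSize; // ceiling division
        System.out.println(chunks + " chunks");                  // 160 chunks
        System.out.println(chunks * 3 + " replicas stored");     // 480 with 3x replication
      }
    }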

  14. Clients • Linked into applications via the file system API • Communicate with the Master and chunkservers for reading and writing: Master interactions only for metadata, chunkserver interactions for data • Cache only metadata information; the data itself is too large to cache

  15. Chunk Locations • The Master does not keep a persistent record of which chunkservers hold which chunks and replicas • Instead, it polls chunkservers at startup and whenever chunkservers join or leave • It stays up to date by controlling the placement of new chunks and through the HeartBeat messages it uses to monitor chunkservers

  16. [figure-only slide]

  17. Atomic Commitment • All replicas make the change or none of them do! • Asynchronous updates → inconsistency • “Commit state” versus “write state” • Before you really write “it” to the public record, you already have it “committed”

  18. Atomic commit protocols: one-phase atomic commit protocol • The coordinator tells the participants whether to commit or abort • What is the problem with that? • It does not allow one of the servers to decide to abort – it may have discovered a deadlock, or it may have crashed and been restarted • The decision could be commit or abort – participants record it in permanent store

  19. Atomic commit protocols: two-phase atomic commit protocol • Designed to allow any participant to choose to abort a transaction • Phase 1: each participant votes. If it votes to commit, it is prepared and cannot change its mind; in case it crashes, it must save its updates in permanent store • Phase 2: the participants carry out the joint decision • The decision could be commit or abort – participants record it in permanent store

  20. Two-phase commit (2PC) [diagram: the Coordinator asks each Server “What is your result?” and, after collecting the replies, announces the final consensus to all Servers]

  21. Failure model • Commit protocols are designed to work in an asynchronous system (e.g., messages may take a very long time); servers and the coordinator may crash; messages may be lost • Assume corrupt and duplicated messages are removed • No Byzantine faults – servers either crash or obey their requests • 2PC is an example of a protocol for reaching consensus, because crash failures of processes are masked by replacing a crashed process with a new process whose state is set from information saved in permanent storage and information held by other processes

  22. 2PC • Voting phase: the coordinator asks all servers if they can commit; if yes, a server records its updates in permanent storage and then votes • Completion phase: the coordinator tells all servers to commit or abort
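
A minimal in-memory sketch of the two phases (the interface and class names are our own, not from any library; a real implementation also needs persistent logging of votes and decisions, plus timeouts):

    import java.util.List;

    public class TwoPhaseCommit {
      // Hypothetical participant interface: vote first, then obey the joint decision.
      interface Participant {
        boolean canCommit();   // phase 1: a "yes" vote means the participant is prepared
        void doCommit();       // phase 2: carry out the joint decision
        void doAbort();
      }

      // Coordinator side of the protocol.
      static void run(List<Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {                 // voting phase
          if (!p.canCommit()) { allYes = false; break; }
        }
        for (Participant p : participants) {                 // completion phase
          if (allYes) p.doCommit(); else p.doAbort();
        }
      }
    }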

  23. [2PC state diagrams. Coordinator: INIT → (send canCommit) → WAIT → (send doCommit or doAbort) → COMMIT or ABORT. Server: INIT → (send vote) → READY → (receive doCommit or doAbort) → COMMIT or ABORT, then send haveCommitted]

  24. [figure-only slide]

  25. [2PC state diagram repeated: INIT, WAIT/READY, ABORT, and COMMIT states, with the canCommit, vote, doCommit, doAbort, and haveCommitted messages]

  26. Failures • Some servers missed “canCommit”. • Coordinator missed some “votes”. • Some servers missed “doAbort” or “doCommit”.

  27. Failures/Crashes • Some servers crashed before/after “canCommit”. • Coordinator crashed before/after receiving some “votes”. • Some servers crashed before/after receiving “doAbort” or “doCommit”.

  28. [2PC state diagram] Assume the coordinator crashed after “canCommit” messages have been sent: (0). Some servers have not received the vote requests. WAIT/INIT (1). All good servers are in the WAIT state. WAIT/INIT (2). Some servers are in either ABORT or COMMIT state. ABORT/COMMIT (3). All servers are in either ABORT or COMMIT state. ABORT/COMMIT

  29. [2PC state diagram] Assume the coordinator crashed after “canCommit” messages have been sent: (0). Some servers have not received the vote requests. ABORT (1). All good servers are in the WAIT state. ABORT (2). Some servers are in either ABORT or COMMIT state. ABORT (3). All servers are in either ABORT or COMMIT state. ABORT/COMMIT

  30. [2PC state diagram] Assume the coordinator crashed after “canCommit” messages have been sent: (0). Some servers have not received the vote requests. ABORT (1). All good servers are in the WAIT state. ABORT (2). Some servers are in either ABORT or COMMIT state. ABORT/COMMIT (3). All servers are in either ABORT or COMMIT state. ABORT/COMMIT

  31. [diagram: a Coordinator and a row of Servers]

  32. [diagram: the Coordinator is COMMITTED; some Servers are COMMITTED and have WRITTEN, while others are still WAITING]

  33. [2PC state diagram] Assume the coordinator crashed after “canCommit” messages have been sent: (0). Some servers have not received the vote requests. ABORT (1). All good servers are in the WAIT state. ??? (2). Some servers are in either ABORT or COMMIT state. ABORT/COMMIT (3). All servers are in either ABORT or COMMIT state. ABORT/COMMIT

  34. 2PC • Concept widely used! • The only “holding” condition is …

  35. 3PC (Skeen & Stonebraker, 1983) [state diagram: INIT (uncertain) → (vote) → WAIT, from which the protocol either goes to ABORT (aborted) or to Pre-COMMIT (committable) and, after an ACK, to COMMIT (committed)]

  36. [the diagram from slide 32 revisited: the Coordinator is COMMITTED; some Servers are COMMITTED and have WRITTEN, while others are in PRE-COMMIT/WAITING??]

  37. HDFS Architecture [diagram: the Client sends a filename to the NameNode (1), receives the BlockId and the DataNodes holding it (2), and then reads the data directly from those DataNodes (3); the NameNode also tracks cluster membership] • NameNode: maps a file to a file-id and a list of DataNodes • DataNode: maps a block-id to a physical location on disk • Secondary NameNode: periodic merge of the transaction log
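
A minimal client-side sketch of this read path using the standard org.apache.hadoop.fs API (the path below is a placeholder); the metadata lookup goes to the NameNode, after which the bytes are streamed from the DataNodes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();         // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);             // client handle; metadata calls go to the NameNode
        Path p = new Path("/user/felix/data.txt");        // placeholder path
        try (FSDataInputStream in = fs.open(p)) {         // NameNode returns the block locations here
          IOUtils.copyBytes(in, System.out, 4096, false); // the data itself comes from the DataNodes
        }
      }
    }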

  38. [figure-only slide]

  39. Map and Reduce • The ideas of Map and Reduce are 40+ years old • Present in all functional programming languages • See, e.g., APL, Lisp, and ML • An alternate name for Map: Apply-All • Higher-order functions take function definitions as arguments, or return a function as output • Map and Reduce are higher-order functions
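
As a quick illustration in Java (the language Hadoop itself is written in) rather than a functional language, both map and reduce are available as higher-order operations on streams:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class HigherOrder {
      public static void main(String[] args) {
        List<Integer> v = Arrays.asList(1, 2, 3, 4, 5);
        // map: apply a function to every element
        List<Integer> w = v.stream().map(x -> x + 1).collect(Collectors.toList()); // [2, 3, 4, 5, 6]
        // reduce: fold all elements into a single value
        int sum = v.stream().reduce(0, Integer::sum);                              // 15
        System.out.println(w + " " + sum);
      }
    }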

  40. Map: A Higher-Order Function • F(x: int) returns r: int • Let V be an array of integers • W = map(F, V) • W[i] = F(V[i]) for all i • i.e., apply F to every element of V

  41. Map Examples in Haskell • map (+1) [1,2,3,4,5] == [2, 3, 4, 5, 6] • map toLower "abcDEFG12!@#" == "abcdefg12!@#" • map (`mod` 3) [1..10] == [1, 2, 0, 1, 2, 0, 1, 2, 0, 1]

  42. Word Count Example • Read text files and count how often words occur. • The input is text files • The output is a text file • each line: word, tab, count • Map: Produce pairs of (word, count) • Reduce: For each word, sum up the counts.
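
A minimal sketch of this job against the org.apache.hadoop.mapreduce API (the class names are our own); a driver that wires it together is sketched after slide 46:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
      public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);                    // emit (word, 1)
          }
        }
      }

      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          context.write(key, new IntWritable(sum));      // emit (word, total count)
        }
      }
    }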

  43. [worked example: the sentence “I am a tiger, you are also a tiger” is split across three map tasks; each map emits (word, 1) pairs such as I,1 am,1 a,1 tiger,1; after sorting by key, the reduce tasks sum the counts and produce a,2 also,1 am,1 are,1 I,1 tiger,2 you,1]

  44. Grep Example • Search input files for a given pattern • Map: emits a line if the pattern is matched • Reduce: copies results to the output
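
A sketch of the map side, assuming the pattern is handed to the job through its Configuration under a made-up key "grep.pattern"; the reduce step can simply be the identity:

    import java.io.IOException;
    import java.util.regex.Pattern;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class GrepMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      private Pattern pattern;

      @Override
      protected void setup(Context context) {
        // "grep.pattern" is a made-up configuration key that the driver would set
        pattern = Pattern.compile(context.getConfiguration().get("grep.pattern", ".*"));
      }

      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        if (pattern.matcher(line.toString()).find()) {
          context.write(line, NullWritable.get());   // emit the matching line as-is
        }
      }
    }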

  45. Inverted Index Example • Generate an inverted index of words from a given set of files • Map: parses a document and emits <word, docId> pairs • Reduce: takes all pairs for a given word, sorts the docId values, and emits a <word, list(docId)> pair
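
A sketch under the assumption that the input file name can serve as the docId:

    import java.io.IOException;
    import java.util.Set;
    import java.util.TreeSet;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class InvertedIndex {
      public static class IndexMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Text word = new Text();
        private final Text docId = new Text();
        @Override
        protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
          docId.set(((FileSplit) context.getInputSplit()).getPath().getName()); // file name as docId
          for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, docId);                        // emit <word, docId>
          }
        }
      }

      public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text word, Iterable<Text> docIds, Context context)
            throws IOException, InterruptedException {
          Set<String> ids = new TreeSet<>();                   // sorted, de-duplicated doc ids
          for (Text id : docIds) ids.add(id.toString());
          context.write(word, new Text(String.join(",", ids))); // emit <word, list(docId)>
        }
      }
    }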

  46. Execution on Clusters 1. Input files are split (M splits) 2. Master and workers are assigned 3. Map tasks run 4. Intermediate data is written to disk (R regions) 5. Intermediate data is read and sorted 6. Reduce tasks run 7. Results are returned
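
A minimal driver for the WordCount classes sketched after slide 42, showing where the reduce-task count (the R regions above) and the input/output paths get configured (the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class); // optional local aggregation on the map side
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setNumReduceTasks(2);                            // R = 2 output partitions
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/felix/input"));   // placeholder paths
        FileOutputFormat.setOutputPath(job, new Path("/user/felix/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }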

  47. <Key, Value> Pairs [diagram: Map reads rows of raw data and emits (key, value) pairs; the pairs are grouped by key; Reduce takes each key with its list of values and writes the output key/values]

  48. [diagram: input splits 0–4 in HDFS → map tasks → sort/copy → merge → reduce tasks → output files part0 and part1 in HDFS]

  49. [figure-only slide]

  50. [figure-only slide]
