
A MapReduce-Based Maximum-Flow Algorithm for Large Small-World Network Graphs

  1. A MapReduce-Based Maximum-Flow Algorithm for Large Small-World Network Graphs Felix Halim, Roland H.C. Yap, Yongzheng Wu

  2. Outline • Background and Motivation • Overview • Maximum-Flow algorithm (The Ford-Fulkerson Method) • MapReduce (MR) Framework • Parallelizing the Ford-Fulkerson Method • Incremental update, bi-directional search, multiple excess paths • MapReduce Optimizations • Stateful extension, trading off space vs. number of rounds • Experimental Results • Conclusion

  3. Background and Motivation • Large small-world network graphs naturally arise in • the World Wide Web, social networks, biology, etc. • Shown to have small diameter and to be robust • Maximum-flow algorithm on large graphs • Isolate a group of spam sites on the WWW • Community identification • Defense against Sybil attacks • Challenge: computation, storage, and memory requirements exceed a single machine's capacity • Our approach: cloud/cluster computing • Run max-flow on top of the MapReduce framework

  4. The Maximum-Flow Overview • Given a flow network G = (V, E), a source s, and a sink t • Residual network Gf = (V, Ef) • Augmenting path • A simple path from s to t in the residual network • To compute the maximum flow • The Ford-Fulkerson method, O(|f*| |E|): While an augmenting path p is found in Gf, Augment the flow along the path p
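The sequential method on this slide can be sketched in a few lines. Below is a minimal in-memory version that finds the augmenting path with BFS (the Edmonds-Karp variant); this is an illustrative baseline only, not the paper's MapReduce algorithm, and the `max_flow` function and its dict-of-dicts graph representation are assumptions for the sketch:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).
    capacity: dict-of-dicts adjacency, capacity[u][v] = edge capacity."""
    # Residual capacities start at the capacities; reverse edges start at 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path s -> t in the residual network Gf.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow          # no augmenting path left: flow is maximum
        # Collect the path edges and find the bottleneck residual capacity.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:        # augment: decrease forward, increase reverse
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck
```

For example, on a small network with `capacity = {'s': {'a': 10, 'b': 5}, 'a': {'b': 15, 't': 10}, 'b': {'t': 10}, 't': {}}`, the loop finds two augmenting paths and returns 15.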

  5. Maximum-Flow Example • [Figure: residual network Gf with an augmenting path p; legend: source node, sink node, intermediate node, residual = C – F; the flow is then augmented along p]

  6. Maximum-Flow Example • [Figure: residual network after the first augmentation, with the next augmenting path p and the augmented result]

  7. Maximum-Flow Example • [Figure: residual network after the second augmentation, with the next augmenting path p and the augmented result]

  8. Maximum-Flow Example • [Figure: final flow/capacity labels on each edge, e.g. 7/9, 6/6, 7/10, 8/8] • Max-Flow = 14

  9. MapReduce Framework Overview • Introduced by Google in 2004 • Open source implementation: Hadoop • Operates on very large datasets • Consisting of key/value pairs • Across thousands of commodity machines • On the order of terabytes of data • Abstracts away distributed computing problems • Data partitioning and distribution, load balancing • Scheduling, fault tolerance, communication, etc.

  10. MapReduce Model • User-defined Map and Reduce functions (stateless) • Input: a list of key/value pairs (k1/v1) • The user's map function is applied to each key/value pair • Produces a list of intermediate key/value pairs • Output: a list of key/value pairs (k2/v2) • The intermediate values are grouped by key • The user's reduce function is applied to each group • Each tuple is independent • Can be processed in isolation, in a massively parallel manner • The total input can be far larger than the workers' total memory
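The model on this slide can be captured by a toy in-memory runner: map each input pair independently, group intermediate pairs by key, then reduce each group. This is a minimal sketch for illustration, not Hadoop's API; `map_reduce`, `wc_map`, and `wc_reduce` are invented names:

```python
from itertools import groupby

def map_reduce(inputs, mapper, reducer):
    """Run one MapReduce round in memory (toy illustration of the model).
    inputs: list of (k1, v1) pairs; mapper/reducer are user-defined."""
    # Map phase: each input pair is processed independently.
    intermediate = []
    for k1, v1 in inputs:
        intermediate.extend(mapper(k1, v1))
    # Shuffle phase: group the intermediate values by key.
    intermediate.sort(key=lambda kv: kv[0])
    output = []
    # Reduce phase: the reducer sees each key with all of its values.
    for k2, group in groupby(intermediate, key=lambda kv: kv[0]):
        output.extend(reducer(k2, [v for _, v in group]))
    return output

# Word count (the classic example) expressed in this model.
def wc_map(doc_name, contents):
    return [(w, 1) for w in contents.split()]

def wc_reduce(word, counts):
    return [(word, sum(counts))]
```

For instance, `map_reduce([("d1", "a b a"), ("d2", "b c")], wc_map, wc_reduce)` yields `[("a", 2), ("b", 2), ("c", 1)]`.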

  11. Max-Flow on MR - Observations • Input: small-world graph with small diameter D • A vertex in the graph is modeled as an input tuple in MR • A naïve translation of the Ford-Fulkerson method to MapReduce requires O(|f*| D) MR rounds • Breadth-first search on MR requires O(D) MR rounds • BFSMR using 20 machines takes 9 rounds and 6 hours for • a SWN with ~400M vertices and ~31B edges • It would take years to compute a max-flow (|f*| > 1000) • Question: how to minimize the number of rounds?
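The O(D) round count for BFS on MR can be pictured with a synchronous frontier expansion, where each iteration of the loop stands in for one MR round (an assumption made purely for illustration; `bfs_rounds` is an invented helper, not part of the paper):

```python
def bfs_rounds(adj, source):
    """Count synchronous frontier expansions (one stand-in MR round each)
    needed for BFS to reach every vertex reachable from source.
    adj: dict mapping a vertex to a list of its neighbors."""
    visited = {source}
    frontier = {source}
    rounds = 0
    while frontier:
        # One "round": every frontier vertex emits its unvisited neighbors.
        nxt = {v for u in frontier for v in adj.get(u, []) if v not in visited}
        visited |= nxt
        frontier = nxt
        if nxt:
            rounds += 1
    # Equals the eccentricity of source, which is at most the diameter D.
    return rounds
```

On a path graph a–b–c–d this returns 3 rounds from a; on a small-world graph with small diameter the count stays small even as the vertex count grows, which is what makes O(D) rounds tolerable.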

  12. FF1: A Parallel Ford-Fulkerson • Goal: minimize the number of rounds through parallelism • Do more work per round, avoid spilling tasks to the next round • Use speculative execution to increase parallelism • Incremental updates • Bi-directional search • Doubles the parallelism (number of active vertices) • Effectively halves the expected number of rounds • Maintains parallelism (a large number of active vertices) • Multiple excess paths (k) → most effective • Each vertex stores k excess paths (avoiding becoming inactive) • Result: • Lowers the number of rounds required from O(|f*| D) to ~D • Large number of augmenting paths per round (MR bottleneck)
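The multiple-excess-paths idea can be pictured with a toy per-vertex record: a vertex keeps up to k alternative excess paths from the source, so when one path is saturated by an accepted augmenting path elsewhere, a surviving alternative keeps the vertex active without waiting for extra rounds. This is a hypothetical sketch of the bookkeeping only; `Vertex`, `offer`, and `drop_saturated` are invented names, not the paper's data structure:

```python
class Vertex:
    """Toy record for the multiple-excess-paths idea: store up to k
    candidate paths from the source s (each path as a list of edges)."""
    def __init__(self, k):
        self.k = k
        self.excess_paths = []   # up to k candidate excess paths

    def offer(self, path):
        # Store another excess path if there is room and it is new.
        if len(self.excess_paths) < self.k and path not in self.excess_paths:
            self.excess_paths.append(path)
            return True
        return False

    def drop_saturated(self, saturated_edges):
        # Discard paths using an edge saturated by an accepted augmenting
        # path; any surviving path keeps this vertex active.
        self.excess_paths = [p for p in self.excess_paths
                             if not any(e in saturated_edges for e in p)]

    @property
    def active(self):
        return bool(self.excess_paths)
```

With k = 1 a single saturation deactivates the vertex until a later round re-supplies it; with larger k the vertex usually stays active, which is why the slide calls this the most effective way to keep the number of active vertices high.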

  13. MR Optimizations • Goal: minimize MR bottlenecks and overheads • FF2: external worker(s) as a stateful MR extension • The reducer for vertex t is the bottleneck in FF1 • An external process is used to handle augmenting-path acceptance • FF3: the Schimmy method [Lin10] • Avoid shuffling the master graph • FF4: eliminate object instantiations • FF5: minimize MR shuffle (communication) costs • Monitor the extended excess paths for saturation and resend as needed • Recompute/reprocess the graph instead of shuffling the delta (not in paper)

  14. Experimental Results • Facebook sub-graphs • Cluster setup • Hadoop v0.21-RC-0 installed on 21 nodes, each with 8 hyper-threaded cores (2× Intel E5520 @ 2.27 GHz), 3 hard disks (500 GB SATA each), and CentOS 5.4 (64-bit)

  15. Handling Large Max-Flow Values • FF5MR is able to process FB6 in a very small number of MR rounds (close to the graph diameter).

  16. MapReduce Optimizations • FF1 (parallel FF) to FF5 (MR optimized) vs. BFSMR

  17. Shuffle Bytes Reduction • The bottleneck in MR is shuffle bytes; FF5 minimizes the number of bytes shuffled

  18. Scalability (#Machines and Graph Size)

  19. Conclusion • We showed how to parallelize a sequential max-flow algorithm, minimizing the number of MR rounds • Incremental Updates, Bi-directional Search, and Multiple Excess Paths • We showed MR optimizations for max-flow • Stateful extension, minimize communication costs • Computing max-flow on Large Small-World Graph • Is practical using FF5MR

  20. Q & A • Thank you

  21. Backup Slides • Related Works • Multiple Excess Paths – Results • Edge processed / second – Results • MapReduce example (word count) • MapReduce execution flow • FF1 Map Function • FF1 Reduce Function • MR

  22. Related Work • The push-relabel algorithm • Distributed (no global view of the graph required) • Has been developed for SMP architectures • Needs sophisticated heuristics to push the flows • Not suitable for the MapReduce model • Needs locks and pulls information from neighbors • Low number of active vertices • Pushing flow to a wrong sub-graph can lead to a huge number of rounds

  23. Multiple Excess Paths - Effectiveness • The larger k is, the fewer MR rounds are required (keeping the number of active vertices high)

  24. # Edges processed / sec • The larger the graph, the more effective

  25. MapReduce Example – Word Count

map(key, value)            // key: document name, value: document contents
  foreach word w in value
    EmitIntermediate(w, 1)

reduce(key, values)        // key: a word, values: list of counts
  freq = 0
  foreach v in values
    freq = freq + v
  Emit(key, freq)

  26. MapReduce Execution Flow

  27. MapReduce Simple Applications • Word Count • Distributed Grep • Count URL access frequency • Reverse Web-Link Graph • Term-Vector per host • Inverted Index • Distributed Sort All these tasks can be completed in one MR job

  28. General MR Optimizations • External worker as a stateful extension for MR • (Dedicated) external workers outside the mappers/reducers • Immediately process requests • No need to wait until the mappers/reducers complete • Flexible synchronization point • Minimize shuffling of intermediate tuples • Can avoid shuffling by re-processing/re-computing in the reduce • Use flags in the data structure to prevent re-shuffling • Eliminate object instantiation • Use binary serialization
