Differentiated Graph Computation and Partitioning on Skewed Graphs (PowerLyra)
Rong Chen, JiaXin Shi, Yanzhe Chen, and Haibo Chen
Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University
http://ipads.se.sjtu.edu.cn/projects/powerlyra.html
Big Data Everywhere
100 hours of video uploaded every minute; 1.11 billion users; 6 billion photos; 400 million tweets/day.
How do we understand and use Big Data?
Big Data, Big Learning
Big Learning: machine learning and data mining (e.g., NLP) on Big Data.
Example Algorithms
PageRank (a centrality measure): iterate until convergence, where α is the random reset probability and L[j] is the number of links on page j.
Example: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
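The update rule referenced on this slide can be reconstructed from the compute() pseudocode that appears later in the deck (which uses 0.15 + 0.85 * sum, i.e., α = 0.15):

```latex
R[i] \;=\; \alpha \;+\; (1 - \alpha) \sum_{j \in \mathrm{in\_nbrs}(i)} \frac{R[j]}{L[j]},
\qquad \alpha = 0.15
```

Here R[i] is the rank of page i, the sum runs over pages j linking to i, and each in-neighbor contributes its rank divided by its out-degree L[j].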
Background: Graph Algorithms
Graph algorithms share three traits: dependent data, local accesses, and iterative computation.
Coding graph algorithms as vertex-centric programs that process vertices in parallel and communicate along edges: the "Think as a Vertex" philosophy.
Think as a Vertex
Example: PageRank. The compute() implementation for a vertex:
1. aggregate the values of its neighbors
2. update its own value
3. activate its neighbors

    compute(v):
        double sum = 0
        double value, last = v.get()
        foreach (n in v.in_nbrs)
            sum += n.value / n.nedges
        value = 0.15 + 0.85 * sum
        v.set(value)
        activate(v.out_nbrs)
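A minimal sequential sketch of this vertex-centric loop, run to convergence. The graph, vertex names, and the convergence test are illustrative assumptions; the update rule mirrors the compute() pseudocode above.

```python
def pagerank(out_edges, alpha=0.15, tol=1e-6, max_iters=100):
    """out_edges: dict mapping each vertex to its list of out-neighbors."""
    vertices = list(out_edges)
    # Build the reverse adjacency list (in-neighbors) once.
    in_nbrs = {v: [] for v in vertices}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            in_nbrs[v].append(u)
    rank = {v: 1.0 for v in vertices}
    for _ in range(max_iters):
        new = {}
        for v in vertices:
            # 1. aggregate the values of the in-neighbors
            s = sum(rank[n] / len(out_edges[n]) for n in in_nbrs[v])
            # 2. update the vertex's own value
            new[v] = alpha + (1 - alpha) * s
        # 3. "activate neighbors" degenerates to looping until convergence here
        if max(abs(new[v] - rank[v]) for v in vertices) < tol:
            return new
        rank = new
    return rank

# Tiny 5-vertex example in the spirit of the slide's figure.
g = {1: [2, 3], 2: [3], 3: [1], 4: [3], 5: [3]}
ranks = pagerank(g)
```

Vertex 3 has the most in-links, so it ends up with the highest rank; vertices with no in-links settle at exactly α = 0.15.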
Graphs in the Real World
Hallmark property: skewed power-law degree distributions ("most vertices have relatively few neighbors while a few have many neighbors"). A low-degree vertex sits in a star-like motif around a high-degree vertex.
Twitter following graph: 1% of the vertices are adjacent to nearly half of the edges.
Existing Graph Models
[Figure: a sample graph with vertices A and B placed under each model; e.g., one master with 5 mirrors vs. one master with 2 mirrors.]

    Model        Graph Placement   Comp. Pattern   Comm. Cost        Dynamic Comp.   Load Balance
    Pregel       edge-cuts         local           <= #edge-cuts     no              no
    GraphLab     edge-cuts         local           <= 2 x #mirrors   yes             no
    PowerGraph   vertex-cuts       distributed     <= 5 x #mirrors   yes             yes
Existing Graph Cuts
[Figure: edge-cut vs. vertex-cut examples, annotated with masters and mirrors. Edge-cut suffers imbalance and duplicated edges; vertex-cut (random and greedy) avoids duplicated edges but can leave a "flying master" with no local edges.]
Issues of Graph Partitioning
Edge-cut: imbalance and replicated edges.
Vertex-cut: does not exploit locality.
- Random: high replication factor
- Greedy: long ingress time, unfair to low-degree vertices
- Constrained: imbalance, poor placement of low-degree vertices
(Measured on the Twitter follower graph: 48 machines, |V| = 42M, |E| = 1.47B.)
Principle of PowerLyra
Two challenges are vital to the performance of a distributed graph computation system:
1. How to make resources locally accessible?
2. How to evenly parallelize workloads?
These goals conflict: one size does not fit all. Hence, differentiated graph computation and partitioning:
- Low-degree vertices: exploit locality
- High-degree vertices: exploit parallelism
Computation Model: High-degree Vertices
Goal: exploit parallelism. Follow the GAS ("Gather, Apply, Scatter") model [PowerGraph, OSDI'12], which splits the monolithic compute() into three phases:

    compute(v):
        double sum = 0
        double value, last = v.get()
        foreach (n in v.in_nbrs)
            sum += n.value / n.nedges
        value = 0.15 + 0.85 * sum
        v.set(value)
        activate(v.out_nbrs)

becomes:

    gather(n):
        return n.value / n.nedges

    apply(v, acc):
        value = 0.15 + 0.85 * acc
        v.set(value)

    scatter(v):
        activate(v.out_nbrs)
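A sketch of how the monolithic compute() splits into the three GAS phases, following the signatures on the slide. The Vertex class and the single-machine driver loop are illustrative assumptions, not PowerGraph's actual API.

```python
class Vertex:
    def __init__(self, vid):
        self.vid, self.value = vid, 1.0
        self.in_nbrs, self.out_nbrs = [], []

    @property
    def nedges(self):
        return len(self.out_nbrs)

def gather(n):
    # Runs once per in-neighbor; the runtime sums the results.
    return n.value / n.nedges

def apply(v, acc):
    # Runs once with the summed accumulator.
    v.value = 0.15 + 0.85 * acc

def scatter(v, active):
    # Activates out-neighbors for the next superstep.
    active.update(n.vid for n in v.out_nbrs)

def gas_step(vertices):
    # Gather for all vertices first (BSP-style), then apply and scatter.
    accs = {v.vid: sum(gather(n) for n in v.in_nbrs) for v in vertices}
    active = set()
    for v in vertices:
        apply(v, accs[v.vid])
        scatter(v, active)
    return active

# Usage: a 4-vertex example; vertex 4 has no in-edges.
vs = {i: Vertex(i) for i in (1, 2, 3, 4)}
def edge(a, b):
    vs[a].out_nbrs.append(vs[b]); vs[b].in_nbrs.append(vs[a])
edge(1, 2); edge(2, 3); edge(3, 1); edge(4, 1)
active = gas_step(list(vs.values()))
```

The point of the split is that gather() calls are independent, so for a high-degree vertex they can run in parallel across the machines holding its mirrors.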
Computation Model: High-degree Vertices (distributed GAS execution)
1. The master calls gather(); the master and all mirrors gather locally in parallel.
2. The mirrors send their partial accumulators to the master.
3. The master calls apply() on the combined accumulator.
4. The master sends the updated value back to the mirrors.
5. The master calls scatter(); the master and mirrors scatter along their local edges.
Computation Model: Low-degree Vertices
Observation: most algorithms only gather or scatter in one direction (e.g., PageRank: Gather/IN and Scatter/OUT).
Goal: exploit locality.
- One-direction locality (avoids replicated edges): the master holds all of a low-degree vertex's in-edges.
- Local gather + distributed scatter: gather() and apply() run entirely at the master; only scatter() contacts the mirrors.
- Communication cost: <= 1 x #mirrors.
Computation Model: Generality
Some algorithms gather or scatter in both directions (e.g., Gather/IN and Scatter/ALL).
Solution: adaptive degradation for gathering or scattering. This is easy to check at runtime without overhead, since the user explicitly defines the access direction in code.
Graph Partitioning: Low-degree Vertices (Low-cut)
Place one direction of a vertex's edges (e.g., its in-edges) on its hash-based machine.
Simple, but best:
- lower replication factor
- one-direction locality
- efficiency (ingress/runtime)
- balance (#edges)
- fewer flying masters
(Evaluated on a synthetic regular graph: 48 machines, |V| = 10M, |E| = 93M; see https://github.com/graphlab-code/graphlab/blob/master/src/graphlab/graph/distributed_graph.hpp)
Graph Partitioning: High-degree Vertices (High-cut)
Distribute a high-degree vertex's edges (e.g., its in-edges) according to the other endpoint of each edge (e.g., the source). The upper bound on replication introduced by placing all edges of a high-degree vertex is #machines.
[Figure: existing vertex-cut vs. High-cut, annotated with low/high masters and mirrors; High-cut avoids the extra low-degree mirrors that existing vertex-cuts create.]
Graph Partitioning: Hybrid Vertex-cut
- The user defines a threshold (θ) and the direction of locality.
- Edges are first grouped on the hash-based machine of the vertex.
- Low-degree vertices (Low-cut): done! High-degree vertices (High-cut): their edges are re-assigned by the other endpoint.
[Figure: example with θ = 3 and direction IN; after construction, the in-edges of high-degree vertices are reassigned.]
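A sketch of hybrid vertex-cut under stated assumptions: edges arrive as (src, dst) pairs, the locality direction is IN, and θ is the in-degree threshold. The hash function and the two-pass structure are illustrative simplifications of the streaming ingress described on the slide.

```python
def hybrid_cut(edges, num_machines, theta):
    """Assign each edge to a machine: Low-cut by dst hash, High-cut by src hash."""
    h = lambda v: v % num_machines          # placeholder hash function
    # First pass: count in-degrees to classify vertices.
    in_deg = {}
    for src, dst in edges:
        in_deg[dst] = in_deg.get(dst, 0) + 1
    # Second pass: place edges.
    placement = {}                          # machine id -> list of edges
    for src, dst in edges:
        if in_deg[dst] <= theta:
            m = h(dst)                      # low-degree: all in-edges at dst's machine
        else:
            m = h(src)                      # high-degree: distribute by the source endpoint
        placement.setdefault(m, []).append((src, dst))
    return placement

# Usage: theta = 3; vertex 3 has in-degree 4, so it is high-degree.
edges = [(1, 3), (2, 3), (4, 3), (5, 3), (1, 2)]
p = hybrid_cut(edges, num_machines=2, theta=3)
```

Low-degree vertex 2 keeps its single in-edge on its hash machine, while high-degree vertex 3's in-edges are spread across machines by source, bounding its replication by #machines.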
Heuristic for Hybrid-cut
Inspired by heuristics for edge-cut: choose the best master location for a vertex according to where its neighbors have already been located.
- Considering neighbors in one direction is enough.
- Only applied to low-degree vertices.
- Parallel ingress: periodically synchronize the private mapping table (global vertex-id to machine).
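A sketch of this heuristic under illustrative assumptions: a low-degree vertex's master goes to the machine already holding the most of its in-neighbors (falling back to hashing when none are placed yet), while high-degree vertices always use plain hashing. The function names, tie-breaking rule, and mapping-table representation are assumptions.

```python
from collections import Counter

def choose_master(v, in_nbrs, location, num_machines, theta):
    """location: the (periodically synchronized) vertex-id -> machine mapping table."""
    h = lambda x: x % num_machines          # placeholder hash function
    if len(in_nbrs) > theta:
        return h(v)                         # high-degree: hash-based placement only
    placed = [location[n] for n in in_nbrs if n in location]
    if not placed:
        return h(v)                         # no neighbor placed yet: fall back to hash
    # Pick the machine hosting the most in-neighbors (ties: smallest machine id).
    best, _ = max(Counter(placed).items(), key=lambda kv: (kv[1], -kv[0]))
    return best

# Usage: neighbors 1 and 2 live on machine 0, neighbor 3 on machine 1.
loc = {1: 0, 2: 0, 3: 1}
```

Because the mapping table is only synchronized periodically, different ingress workers may see slightly stale locations; the heuristic tolerates this since it only biases placement rather than enforcing it.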
Optimization
Challenge: graph computation usually exhibits poor data-access (cache) locality, due to irregular traversal of neighboring vertices along edges [Lumsdaine et al., Challenges in parallel graph processing, 2007].
What about (cache) locality in communication?
Problem: a mismatch of message orders between sender and receiver.
Locality-conscious Layout
General idea: match sender and receiver orders through the hybrid vertex-cut.
- Tradeoff: ingress time vs. runtime.
- Fully decentralized: each machine relays out its local vertex array (of global vertex-ids) in four steps.
1. Zoning: split each machine's vertex array into four zones: high-degree masters (H), low-degree masters (L), high-degree mirrors, and low-degree mirrors.
2. Grouping: within the two mirror zones, group mirrors by the machine hosting their master (h1/h2/h3, l1/l2/l3).
3. Sorting: sort the vertices within each zone and group by global vertex-id.
4. Rolling: roll the mirror groups so that machine i's groups start with those of machine i+1, aligning the order in which masters send with the order in which mirrors receive.
[Figures: example layout of global vertex-ids on three machines M1-M3 after each step.]
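The four steps above can be sketched as a single sort key per local vertex record. The record format (zone, master_machine, vid), the zone ordering, and the rolling formula are illustrative assumptions consistent with the step descriptions.

```python
def layout_order(machine_id, num_machines, records):
    """records: list of (zone, master_machine, vid) tuples held by this machine.

    zone is one of "h-master", "l-master", "h-mirror", "l-mirror".
    Returns the records in locality-conscious order:
    zoning (zone rank) -> grouping + rolling (rolled master position) -> sorting (vid).
    """
    zone_rank = {"h-master": 0, "l-master": 1, "h-mirror": 2, "l-mirror": 3}

    def roll(m):
        # Rolling: machine i's mirror groups start at machine i+1 (mod P).
        return (m - machine_id - 1) % num_machines

    return sorted(records,
                  key=lambda r: (zone_rank[r[0]], roll(r[1]), r[2]))

# Usage: machine 0 of 3 holds one high-master and three mirrors.
recs = [("l-mirror", 1, 5), ("h-mirror", 2, 3),
        ("h-mirror", 1, 4), ("h-master", 0, 9)]
order = layout_order(0, 3, recs)
```

With every machine applying the same rule, the sequence in which machine 1 walks its masters matches the sequence in which machine 0 stores the corresponding mirrors, so incoming messages hit memory sequentially.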
Evaluation
Experiment setup:
- 48-node EC2-like cluster (each node: 4 cores, 12GB RAM, 1 GigE NIC)
- Graph algorithms: PageRank, Approximate Diameter, Connected Components
- Data sets: 5 real-world graphs and 5 synthetic power-law graphs (varying α with 10 million vertices fixed; smaller α produces denser graphs)
Runtime Speedup
PageRank (Gather: IN / Scatter: OUT), 48 machines; baseline: PowerGraph with the default Grid partitioning.
- Power-law graphs: Hybrid 2.02X ~ 2.96X; Ginger 2.17X ~ 3.26X
- Real-world graphs: Hybrid 1.40X ~ 2.05X; Ginger 1.97X ~ 5.53X
Runtime Speedup
48 machines; baseline: PowerGraph with the default Grid partitioning.
- Approximate Diameter (Gather: OUT / Scatter: NONE): Hybrid 1.93X ~ 2.48X; Ginger 1.97X ~ 3.15X
- Connected Components (Gather: NONE / Scatter: ALL): Hybrid 1.44X ~ 1.88X; Ginger 1.50X ~ 2.07X
Communication Cost
[Figure: communication cost on power-law and real-world graphs; annotated values include 188MB, 394MB, 170MB, and 79.4%.]
Effectiveness of Hybrid Graph Partitioning
[Figures: ingress time on power-law graphs (48 machines) and real-world graphs (48 machines), and scalability on the Twitter graph.]
Effectiveness of Hybrid Graph Computation
[Figure: runtime comparison isolating the hybrid computation model.]
Scalability
[Figures: performance with increasing data size and with an increasing number of machines.]
Conclusion
PowerLyra is:
- a new hybrid graph analytics engine that embraces the best of both worlds of existing frameworks;
- an efficient hybrid graph partitioning algorithm that adopts different heuristics for different vertices;
- up to 5.53X and 3.26X faster than PowerGraph with its default partitioning on real-world and synthetic graphs, respectively.
http://ipads.se.sjtu.edu.cn/projects/powerlyra.html
Thanks! Questions?
PowerLyra: http://ipads.se.sjtu.edu.cn/projects/powerlyra.html
Institute of Parallel and Distributed Systems: http://ipads.se.sjtu.edu.cn