
GraphLab A New Framework for Parallel Machine Learning



Presentation Transcript


  1. GraphLab: A New Framework for Parallel Machine Learning. Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joseph Gonzalez, Danny Bickson, Joe Hellerstein.

  2. Exponential Parallelism. Parallel performance is increasing exponentially while sequential performance has become constant, and data keeps growing (13 million Wikipedia pages, 3.6 billion photos on Flickr). [Figure: processor speed (GHz) vs. release date; sequential performance rose exponentially and has now flattened, while parallel performance continues to increase exponentially.]

  3. Parallel Programming is Hard • Designing efficient parallel algorithms is hard • Race conditions and deadlocks • Parallel memory bottlenecks • Architecture-specific concurrency • Difficult to debug • ML experts repeatedly address the same parallel design challenges • Avoid these problems by using high-level abstractions (rather than leaving them to graduate students).

  4. MapReduce – Map Phase. Embarrassingly parallel, independent computation; no communication needed. [Figure: CPUs 1–4 each map one data item.]

  5. MapReduce – Map Phase. Embarrassingly parallel, independent computation; no communication needed. [Figure: each CPU moves on to the next data item independently.]

  6. MapReduce – Map Phase. Embarrassingly parallel, independent computation; no communication needed. [Figure: the map phase continues over the remaining data items.]

  7. MapReduce – Reduce Phase. Fold/aggregation of the mapped values. [Figure: CPUs 1–2 aggregate the per-item results.]
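
A minimal C++ sketch of the map and fold phases illustrated on slides 4–7; the function names and the doubling operation are made up for illustration, and this is not a real MapReduce or GraphLab API:

```cpp
// Minimal sketch of the map/fold pattern (hypothetical names, not a real API).
// Each map call is independent ("embarrassingly parallel", no communication);
// the reduce phase folds the mapped values into one aggregate.
#include <iostream>
#include <numeric>
#include <vector>

double map_fn(double x) { return x * 2.0; }                 // independent per-item work
double reduce_fn(double acc, double x) { return acc + x; }  // fold / aggregation

int main() {
    std::vector<double> data = {12.9, 42.3, 21.3, 25.8};    // values shown in the figure
    std::vector<double> mapped;
    for (double x : data) mapped.push_back(map_fn(x));      // map phase: no communication
    double total = std::accumulate(mapped.begin(), mapped.end(), 0.0, reduce_fn);
    std::cout << "aggregate = " << total << "\n";
}
```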

  8. Related Data. Interdependent computation: not MapReduce-able.

  9. Parallel Computing and ML • Not all algorithms are efficiently data parallel • Data-parallel: feature extraction, cross validation • Complex parallel structure: kernel methods, belief propagation, SVM, tensor factorization, sampling, deep belief networks, neural networks, Lasso.

  10. Common Properties • 1) Sparse data dependencies (e.g. sparse primal SVM, tensor/matrix factorization) • 2) Local computations (e.g. sampling, belief propagation) • 3) Iterative updates (e.g. expectation maximization, optimization). [Figure: operations A and B each touching only a local region of the data.]

  11. Gibbs Sampling has all three properties: 1) sparse data dependencies, 2) local computations, 3) iterative updates. [Figure: 3x3 grid MRF over variables X1–X9.]

  12. GraphLab is the Solution • Designed specifically for ML needs • Express data dependencies • Iterative • Simplifies the design of parallel programs: • Abstract away hardware issues • Automatic data synchronization • Addresses multiple hardware architectures • Implementation here is multi-core • Distributed implementation in progress

  13. GraphLab A New Framework for Parallel Machine Learning

  14. The GraphLab Model: Data Graph, Shared Data Table, Update Functions and Scopes, Scheduling.

  15. Data Graph. A graph with data associated with every vertex and edge. [Figure: MRF over X1–X11; vertex data such as x3 (sample value) and C(X3) (sample counts); edge data such as Φ(X6,X9) (binary potential).]
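
A rough sketch of what "data associated with every vertex and edge" could look like in C++; the type names (VertexData, EdgeData, DataGraph) are assumptions for this example, not the actual GraphLab types:

```cpp
// Illustrative data graph: data attached to every vertex and every edge.
#include <vector>

struct VertexData {              // e.g. x3: current sample value, C(X3): sample counts
    int sample = 0;
    std::vector<int> counts;
};

struct EdgeData {                // e.g. Phi(X6, X9): binary potential table
    std::vector<double> potential;
};

struct Edge {
    int source, target;
    EdgeData data;
};

struct DataGraph {
    std::vector<VertexData> vertices;  // data on every vertex
    std::vector<Edge> edges;           // data on every edge
};

int main() {
    DataGraph g;
    g.vertices.resize(3);                                        // X1, X2, X3
    g.edges.push_back({0, 1, EdgeData{{1.0, 0.5, 0.5, 1.0}}});   // potential on edge (X1, X2)
}
```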

  16. Update Functions. Update functions are operations that are applied to a vertex and transform the data in the scope of that vertex. Gibbs update: read the samples on adjacent vertices, read the edge potentials, and compute a new sample for the current vertex.
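
A hedged sketch of a Gibbs update function for a pairwise MRF following the three steps above; the function signature, the adjacency-list representation, and the potential phi are illustrative assumptions, not the GraphLab update-function API:

```cpp
#include <random>
#include <vector>

// Binary potential phi(s, t); a real model would store one table per edge.
double phi(int s, int t) { return (s == t) ? 2.0 : 1.0; }

// Update the sample at vertex v given the current samples of its neighbors.
void gibbs_update(int v,
                  std::vector<int>& sample,                        // per-vertex sample
                  const std::vector<std::vector<int>>& neighbors,  // adjacency lists
                  int num_states,
                  std::mt19937& rng) {
    std::vector<double> weight(num_states, 1.0);
    for (int s = 0; s < num_states; ++s)
        for (int u : neighbors[v])           // 1) read samples on adjacent vertices
            weight[s] *= phi(s, sample[u]);  // 2) evaluate the edge potentials
    std::discrete_distribution<int> dist(weight.begin(), weight.end());
    sample[v] = dist(rng);                   // 3) draw a new sample for this vertex
}

int main() {
    std::mt19937 rng(0);
    std::vector<int> sample = {0, 1, 0, 1};                                  // current samples
    std::vector<std::vector<int>> nbrs = {{1, 2}, {0, 3}, {0, 3}, {1, 2}};   // a 4-cycle MRF
    for (int sweep = 0; sweep < 10; ++sweep)
        for (int v = 0; v < 4; ++v) gibbs_update(v, sample, nbrs, 2, rng);
}
```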

  17. Update Function Schedule. [Figure: a shared queue of vertex tasks (a–k); CPU 1 and CPU 2 pull tasks off the schedule and apply the update function.]

  18. Update Function Schedule. [Figure: as CPUs finish their current vertices, they pull the next tasks off the schedule.]

  19. Static Schedule. The scheduler determines the order of update function evaluations. Synchronous schedule: every vertex is updated simultaneously. Round-robin schedule: every vertex is updated sequentially.
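
The two static schedules can be sketched as plain sequential loops; the parallel engine is omitted and the function names are illustrative, not GraphLab's:

```cpp
#include <functional>
#include <vector>

// Round-robin schedule: every vertex is updated in a fixed sequential order.
void round_robin(int num_vertices, int num_sweeps,
                 const std::function<void(int)>& update) {
    for (int s = 0; s < num_sweeps; ++s)
        for (int v = 0; v < num_vertices; ++v) update(v);
}

// Synchronous schedule: every vertex update reads the same old state, and all
// new values are committed together at the end of the sweep.
void synchronous(int num_sweeps, std::vector<double>& state,
                 const std::function<double(int, const std::vector<double>&)>& update) {
    for (int s = 0; s < num_sweeps; ++s) {
        std::vector<double> next(state.size());
        for (std::size_t v = 0; v < state.size(); ++v)
            next[v] = update(static_cast<int>(v), state);
        state = next;  // commit all updates "simultaneously"
    }
}
```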

  20. Need for Dynamic Scheduling. [Figure: part of the graph has converged while another part is converging slowly; effort should be focused on the slowly converging region.]

  21. Dynamic Schedule. [Figure: update functions push new vertex tasks onto the schedule while CPU 1 and CPU 2 work through it.]

  22. Dynamic Schedule. Update functions can insert new tasks into the schedule, as in Residual BP [Elidan et al.], Splash BP [Gonzalez et al.], and Wildfire BP [Selvatici et al.]. Schedulers include a FIFO queue, a priority queue, and the splash schedule; different algorithms are obtained simply by changing a flag: --scheduler=fifo, --scheduler=priority, --scheduler=splash.
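
A sketch of the dynamic-scheduling idea, assuming a plain FIFO queue of vertex tasks (standing in for --scheduler=fifo; keying a priority queue on residuals would correspond to --scheduler=priority). This is illustrative, not GraphLab's scheduler implementation:

```cpp
#include <functional>
#include <queue>

// Drain a FIFO task queue; the update function may enqueue follow-up tasks,
// which is how residual/splash/wildfire-style dynamic schedules arise.
void run_fifo(std::queue<int> tasks,
              const std::function<void(int, std::queue<int>&)>& update) {
    while (!tasks.empty()) {
        int v = tasks.front();
        tasks.pop();
        update(v, tasks);  // the update function can push new vertex tasks here
    }
}
```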

  23. Global Information What if we need global information? Algorithm Parameters? Sufficient Statistics? Sum of all the vertices?

  24. Shared Data Table (SDT) • Global constant parameters, e.g. Constant: temperature; Constant: total # of samples.

  25. Sync Operation • Sync is a fold/reduce operation over the graph • Accumulate performs an aggregation over the vertices • Apply makes a final modification to the accumulated data • Example: compute the average of all the vertices, with accumulate = add and apply = divide by |V|. [Figure: the sync sweeps over the vertex values, sums them, then applies the division.]
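
A sketch of the sync operation's accumulate/apply structure, computing the average vertex value; the vertex values and function names are made up for this example:

```cpp
#include <iostream>
#include <vector>

double accumulate_fn(double acc, double vertex_value) { return acc + vertex_value; }
double apply_fn(double acc, std::size_t num_vertices) {
    return acc / static_cast<double>(num_vertices);  // final modification: divide by |V|
}

int main() {
    std::vector<double> vertex_values = {1, 0, 6, 5, 3, 2, 8, 9};  // example vertex data
    double acc = 0.0;
    for (double v : vertex_values) acc = accumulate_fn(acc, v);    // fold over all vertices
    double average = apply_fn(acc, vertex_values.size());          // apply step
    std::cout << "average = " << average << "\n";
}
```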

  26. Shared Data Table (SDT) • Global constant parameters • Global computation (sync operation), e.g. Constant: temperature; Constant: total # of samples; Sync: log-likelihood; Sync: sample statistics.

  27. Safety and Consistency

  28. Write-Write Race. If adjacent update functions write simultaneously, one write can be lost. [Figure: the left update and the right update both write a value; the final value depends on which write lands last.]
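
A tiny stand-alone demonstration of why unsynchronized writes are dangerous (illustrative only, not GraphLab code). Two threads increment the same value without a lock, so increments can be lost and the final result depends on the interleaving:

```cpp
#include <iostream>
#include <thread>

int shared_vertex_value = 0;  // data that both "update functions" write

void unsafe_update() {
    for (int i = 0; i < 100000; ++i)
        ++shared_vertex_value;  // unsynchronized read-modify-write: a data race
}

int main() {
    std::thread left(unsafe_update);
    std::thread right(unsafe_update);
    left.join();
    right.join();
    std::cout << "final value = " << shared_vertex_value
              << " (often less than 200000 because writes were lost)\n";
}
```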

  29. Race Conditions + Deadlocks • This is just one of many possible races • Race-free code is extremely difficult to write • The GraphLab design ensures race-free operation.

  30. Scope Rules. Full consistency: guaranteed safety for all update functions.

  31. Full Consistency. Only update functions at least two vertices apart may run in parallel, which reduces the opportunities for parallelism.

  32. Obtaining More Parallelism. Not all update functions modify the entire scope, so full consistency is often stronger than necessary. Edge consistency suffices in many cases: belief propagation only uses edge data, and Gibbs sampling only needs to read adjacent vertices.

  33. Edge Consistency.

  34. Obtaining More Parallelism. Moving from full to edge to vertex consistency trades safety for parallelism. Vertex consistency suffices for "map"-like operations, such as feature extraction on vertex data.

  35. Vertex Consistency.
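
A compact, illustrative summary of what each consistency model lets an update function on a vertex write, based on slides 30–35; the enum and struct are assumptions for this sketch, not GraphLab's actual interface:

```cpp
enum class Consistency { Full, Edge, Vertex };

struct ScopeAccess {
    bool write_center_vertex;
    bool write_adjacent_edges;
    bool write_adjacent_vertices;
};

ScopeAccess access(Consistency c) {
    switch (c) {
        case Consistency::Full:   return {true, true, true};   // safest, least parallelism
        case Consistency::Edge:   return {true, true, false};  // neighbor vertices read-only
        case Consistency::Vertex: return {true, false, false}; // "map"-like operations
    }
    return {false, false, false};
}
```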

  36. Sequential Consistency. GraphLab guarantees sequential consistency: for every parallel execution, there exists a sequential execution of update functions which produces the same result. [Figure: a timeline comparing a parallel execution on CPU 1 and CPU 2 with an equivalent sequential execution on CPU 1.]

  37. The GraphLab Model (recap): Data Graph, Shared Data Table, Update Functions and Scopes, Scheduling.

  38. Experiments

  39. Experiments • Shared-memory implementation in C++ using Pthreads • Tested on a 16-processor machine: 4x quad-core AMD Opteron 8384, 64 GB RAM • Applications: belief propagation + parameter learning, Gibbs sampling, CoEM, Lasso, compressed sensing, SVM, PageRank, tensor factorization.

  40. Graphical Model Learning: 3D retinal image denoising. Data graph: 256x64x64 vertices. Update function: belief propagation. Sync on edge potentials: accumulate computes inference statistics, apply takes a gradient step.

  41. Graphical Model Learning: 15.5x speedup on 16 CPUs. [Figure: speedup vs. number of CPUs, close to optimal, for the splash schedule and the approximate priority schedule.]

  42. Graphical Model Learning. Standard parameter learning takes a gradient step only after inference has completed; with GraphLab, the gradient step is taken while inference is running. Runtime drops from 2100 sec (iterated inference + gradient step) to 700 sec (simultaneous), about 3x faster.

  43. Gibbs Sampling • Two methods for sequential consistency: edge-consistency scopes with a sweep schedule, graphlab(gibbs, edge, sweep); or vertex-consistency scopes with graph-coloring-based scheduling, graphlab(gibbs, vertex, colored). [Figure: CPUs 1–3 sampling non-adjacent vertices at time steps t0–t3.]
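
A sketch of the graph-coloring idea behind graphlab(gibbs, vertex, colored): greedily color the MRF so adjacent vertices get different colors, then all vertices of one color can be sampled in parallel because they are non-adjacent. This is a generic greedy coloring, not GraphLab's scheduler code:

```cpp
#include <vector>

// Greedy graph coloring: assign each vertex the smallest color not used by
// any already-colored neighbor.
std::vector<int> greedy_color(const std::vector<std::vector<int>>& neighbors) {
    const int n = static_cast<int>(neighbors.size());
    std::vector<int> color(n, -1);
    for (int v = 0; v < n; ++v) {
        std::vector<bool> used(n, false);
        for (int u : neighbors[v])
            if (color[u] >= 0) used[color[u]] = true;  // colors taken by neighbors
        int c = 0;
        while (used[c]) ++c;                           // smallest free color
        color[v] = c;
    }
    return color;
}
// Sampling then proceeds color by color: vertices that share a color have no
// edges between them, so their Gibbs updates can run in parallel safely.
```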

  44. Gibbs Sampling • Protein-protein interaction networks [Elidan et al. 2006] • Pairwise MRF, 14K vertices, 100K edges • 10x speedup • The colored schedule reduces locking overhead. [Figure: speedup vs. number of CPUs for the round-robin and colored schedules.]

  45. CoEM (Rosie Jones, 2005): named entity recognition task. Is "Dog" an animal? Is "Catalina" a place? [Figure: bipartite graph linking noun phrases (the dog, Australia, Catalina Island) to contexts (<X> ran quickly, travelled to <X>, <X> is pleasant).]

  46. CoEM (Rosie Jones, 2005). [Figure: speedup vs. number of CPUs on the small and large datasets, approaching optimal; 15x faster while using 6x fewer CPUs than the baseline.]

  47. Lasso L1 regularized Linear Regression Shooting Algorithm (Coordinate Descent) Due to the properties of the update, full consistency is needed

  48. Lasso L1 regularized Linear Regression Shooting Algorithm (Coordinate Descent) Due to the properties of the update, full consistency is needed

  49. Lasso L1 regularized Linear Regression Shooting Algorithm (Coordinate Descent) Due to the properties of the update, full consistency is needed. Finance dataset from Kogan et al. [2009].

  50. Full Consistency. [Figure: Lasso speedup vs. number of CPUs on the sparse and dense datasets, compared to optimal.]
