Multithreaded Clustering for Multi-level Hypergraph Partitioning

Ümit V. Çatalyürek (1,2), Mehmet Deveci (1,3), Kamer Kaya (1), Bora Uçar (4)
(1) Dept. of Biomedical Informatics, The Ohio State University
(2) Dept. of Electrical & Computer Engineering, The Ohio State University
(3) Dept. of Computer Science & Engineering, The Ohio State University
(4) CNRS and LIP, ENS Lyon

Introduction
• Hypergraph partitioning is used for the parallelization of complex and irregular applications, providing:
  • balanced load distribution
  • a good communication pattern
• Other applications:
  • VLSI design
  • Sparse matrix reordering
  • Static and dynamic load balancing
  • Cryptosystem design
  • …
Ç., Deveci, Kaya, Uçar “Multithreaded Clustering for Multi-level Hypergraph Partitioning”

Hypergraph Partitioning
• Hypergraph: H = (V, N), where V is the set of vertices and N is the set of nets.
• A net is a subset of vertices.
• Each net n has a cost c(n), and each vertex v has a weight w(v).
• λ_n: connectivity of net n, i.e., the number of parts that net n connects.
• Objective: find a K-way partition Π = {V_1, …, V_K} of the vertices that
  • minimizes the cut size: $\chi(\Pi) = \sum_{n \in N} c(n)\,(\lambda_n - 1)$
  • provides load balance: $W_k \le (1 + \varepsilon)\, W_{avg}$ for $k = 1, \ldots, K$, where $W_k = \sum_{v \in V_k} w(v)$ and $W_{avg} = \frac{1}{K} \sum_{v \in V} w(v)$.
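
Both objectives can be evaluated directly. The Python sketch below assumes unit net costs and vertex weights, a hypothetical five-vertex hypergraph `PINS` (the net memberships in the slide figures are not recoverable), a hypothetical 3-way assignment `part`, and an assumed imbalance tolerance `eps`.

```python
# Hypothetical example hypergraph: vertices v1..v5 -> 0..4, nets n1..n5.
PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]


def cut_size(pins, cost, part):
    """Connectivity-1 cut size: sum over nets of c(n) * (lambda_n - 1)."""
    total = 0
    for net, c in zip(pins, cost):
        lam = len({part[v] for v in net})  # number of parts the net connects
        total += c * (lam - 1)
    return total


def part_loads(weight, part, k):
    """W_k: total vertex weight assigned to each part."""
    loads = [0] * k
    for v, w in enumerate(weight):
        loads[part[v]] += w
    return loads


def is_balanced(weight, part, k, eps):
    """True if W_k <= (1 + eps) * W_avg holds for every part."""
    w_avg = sum(weight) / k
    return all(load <= (1 + eps) * w_avg for load in part_loads(weight, part, k))


part = [0, 1, 0, 1, 2]  # hypothetical: P1 = {v1, v3}, P2 = {v2, v4}, P3 = {v5}
print(cut_size(PINS, [1] * 5, part))        # 5
print(is_balanced([1] * 5, part, 3, 0.25))  # True
```

Every net here spans exactly two parts, so each contributes c(n) = 1 to the cut.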

Hypergraph Partitioning: Example
[Figure: a hypergraph with vertices v1 to v5 and nets n1 to n5, partitioned into three parts P1, P2, and P3.]

Multi-level Approach
• Three phases:
  • Coarsening: obtain successively smaller hypergraphs that are similar to the original, until either a minimum vertex count is reached or the reduction in the number of vertices falls below a threshold.
  • Initial partitioning: find a solution for the smallest hypergraph.
  • Uncoarsening: project the initial solution back to the finer hypergraphs and refine it iteratively until a solution for the original hypergraph is obtained.
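
The coarsening loop and its two stopping criteria can be sketched as follows. This is a minimal Python illustration, not PaToH's code: `cluster_step`, `min_vertices`, and `min_reduction` are hypothetical names, and any clustering algorithm (matching-based or agglomerative) could be plugged in as `cluster_step`.

```python
def coarsen(num_vertices, pins, cluster_step, min_vertices, min_reduction):
    """Cluster and contract repeatedly until a stopping criterion fires.

    cluster_step(num_vertices, pins) is a hypothetical callback that returns
    a cluster id per vertex. Returns (fine-to-coarse maps per level,
    coarsest vertex count, coarsest pins).
    """
    levels = []
    while num_vertices > min_vertices:  # criterion 1: minimum vertex count
        cluster = cluster_step(num_vertices, pins)
        ids = {c: i for i, c in enumerate(sorted(set(cluster)))}
        coarse_n = len(ids)
        # Criterion 2: stop when the relative reduction is below the threshold.
        if (num_vertices - coarse_n) / num_vertices < min_reduction:
            break
        coarse_pins = []
        for net in pins:
            coarse_net = sorted({ids[cluster[v]] for v in net})
            if len(coarse_net) > 1:  # a single-pin net can never be cut
                coarse_pins.append(coarse_net)
        levels.append([ids[c] for c in cluster])
        num_vertices, pins = coarse_n, coarse_pins
    return levels, num_vertices, pins


def pair_up(n, pins):
    """Hypothetical clustering step: merge vertices 2i and 2i + 1."""
    return [v // 2 for v in range(n)]


levels, n, pins = coarsen(8, [[0, 1], [1, 2], [3, 4, 5], [6, 7], [0, 7]],
                          pair_up, min_vertices=2, min_reduction=0.1)
print(len(levels), n, pins)  # 2 2 [[0, 1], [0, 1]]
```

The stored fine-to-coarse maps are what the uncoarsening phase would use to project a partition back to the finer levels.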

Parallelization of Coarsening
• Why coarsening?
  • Coarsening is an important phase of the multi-level approach.
  • Its worst-case time complexity is higher than that of the other phases.
  • The quality of coarsening affects the run time of the other phases: a good coarsening requires fewer local moves in the uncoarsening phase.
  • It affects the quality of the partitioning result.
• Two classes of clustering algorithms are parallelized:
  • Matching-based: only two vertices can be clustered together; faster.
  • Agglomerative: any number of vertices can be clustered together; better quality.

Clustering Algorithms in PaToH: Heavy Connectivity Matching
• An unmatched vertex u is matched with the unmatched adjacent vertex v that has maximum connectivity to u.
• Creates the adjacency list on the fly.
• Then traverses the adjacency list and picks the most heavily connected vertex.
• Removes the matched vertices of a net from its pins for efficiency.

Heavy Connectivity Matching: Example
• u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1, so v* = v3.
• u = v2, adj = {v1, v3, v4, v5}: v1 and v3 are already matched; conn(v4) = 2, conn(v5) = 2, so v* = v4.
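
A minimal sequential sketch of heavy connectivity matching in Python. Because the net memberships in the figure cannot be recovered, `PINS` is an assumed hypergraph whose pairwise connectivities agree with every conn(·) value shown in the example slides. Unit net costs, visiting vertices in the order v1, …, v5, and breaking ties toward the smaller vertex index are also assumptions. Under these assumptions the sketch reproduces the matches (v1, v3) and (v2, v4), leaving v5 unmatched. Instead of physically removing matched pins from nets, it simply skips them.

```python
def build_vertex_nets(num_vertices, pins):
    """Invert the pin lists: vertex_nets[v] lists the nets containing v."""
    vertex_nets = [[] for _ in range(num_vertices)]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].append(n)
    return vertex_nets


def heavy_connectivity_matching(num_vertices, pins):
    vertex_nets = build_vertex_nets(num_vertices, pins)
    match = [-1] * num_vertices  # match[u] = partner of u, or -1
    for u in range(num_vertices):
        if match[u] != -1:
            continue
        # Build u's adjacency on the fly, counting shared nets (unit costs).
        conn = {}
        for n in vertex_nets[u]:
            for v in pins[n]:
                if v != u and match[v] == -1:  # skip already-matched pins
                    conn[v] = conn.get(v, 0) + 1
        if conn:
            # Most heavily connected unmatched neighbour; ties -> smaller id.
            best = max(conn, key=lambda v: (conn[v], -v))
            match[u], match[best] = best, u
    return match


# Assumed hypergraph: vertices v1..v5 -> 0..4, nets n1..n5.
PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]
print(heavy_connectivity_matching(5, PINS))  # [2, 3, 0, 1, -1]
```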

Clustering Algorithms in PaToH: Sequential Agglomerative Clustering
• An unmatched vertex u is matched with an unmatched vertex v or with a cluster of vertices.
• Traverses the adjacent vertices and computes their connectivity to u.
• Calculates the connectivity of clusters.
• Picks the most heavily connected vertex or vertex cluster that satisfies the maxW (maximum cluster weight) criterion.

Agglomerative Clustering: Example
• u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1, so v* = v3.
• u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; summed over clusters, conn(v1+v3) = 3, conn(v4) = 2, conn(v5) = 2, so v* = v1 (u joins the cluster {v1, v3}).
• u = v4, adj = {v1, v2, v3, v5}: conn(v1) = 1, conn(v2) = 2, conn(v3) = 2, conn(v5) = 1; conn(v1+v2+v3) = 5, but that cluster violates the maxW criterion, so v* = v5.
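
A sequential agglomerative sketch in Python that follows the example above. Several choices here are assumptions rather than PaToH internals: clusters are represented by a root vertex, connectivity to a cluster is the sum of conn(u, v) over its members (this matches conn(v1+v3) = 3 on the slide), and net costs and vertex weights are 1. The value maxW = 3 is chosen so that the maxW check rejects adding v4 to {v1, v2, v3}, as in the example. `PINS` is the same assumed hypergraph as in the matching sketch.

```python
def agglomerative_clustering(num_vertices, pins, weight, max_w):
    vertex_nets = [[] for _ in range(num_vertices)]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].append(n)
    cluster = list(range(num_vertices))  # cluster[v] = root vertex of v's cluster
    cw = list(weight)                    # cw[r] = weight of the cluster rooted at r
    size = [1] * num_vertices            # size[r] = number of vertices rooted at r
    for u in range(num_vertices):
        if cluster[u] != u or size[u] > 1:
            continue  # u is already part of a cluster
        # Connectivity of u to each neighbouring cluster: the sum of
        # conn(u, v) over the cluster's members (unit net costs).
        conn = {}
        for n in vertex_nets[u]:
            for v in pins[n]:
                if v != u:
                    conn[cluster[v]] = conn.get(cluster[v], 0) + 1
        # Heaviest vertex or cluster that still respects the maxW limit.
        for r in sorted(conn, key=lambda r: (-conn[r], r)):
            if cw[r] + weight[u] <= max_w:
                cluster[u] = r
                cw[r] += weight[u]
                size[r] += 1
                break
    return cluster


PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]  # assumed hypergraph
print(agglomerative_clustering(5, PINS, [1] * 5, max_w=3))  # [2, 2, 2, 4, 4]
```

The output encodes the clusters {v1, v2, v3} (root v3) and {v4, v5} (root v5), the same grouping as the example.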

Multithreaded Clustering Algorithms
• Matching-based algorithms:
  • Parallel lock-based
  • Parallel resolution-based
• Parallel agglomerative algorithm

Parallel Lock-based Matching
• Similar to the sequential algorithm, but uses an atomic CHECKANDLOCK operation.
• Both the current vertex u and the candidate vertex v must be locked before they can be matched.
• If a better candidate is found, the previous candidate is unlocked.

Parallel Lock-based Matching: Example (u = v1 and u = v2 processed concurrently)
• u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1, so v* = v3.
• u = v2, adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; v3 is locked, so v* = v4.
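
A sketch of the lock-based scheme with Python threads. It is an illustration under stated assumptions, not the OpenMP implementation. A non-blocking `threading.Lock.acquire(blocking=False)` stands in for the atomic CHECKANDLOCK. Vertices are split into contiguous blocks, one per thread, and releasing u when it finds no partner is an assumed policy. Matched pairs keep both locks, so no other thread can touch them again. With one thread the result equals the sequential matching. With more threads the result may vary between runs but is always a valid matching.

```python
import threading


def lock_based_matching(num_vertices, pins, num_threads=2):
    vertex_nets = [[] for _ in range(num_vertices)]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].append(n)
    match = [-1] * num_vertices
    locks = [threading.Lock() for _ in range(num_vertices)]

    def check_and_lock(v):
        # Non-blocking acquire plays the role of the atomic CHECKANDLOCK.
        return locks[v].acquire(blocking=False)

    def worker(lo, hi):
        for u in range(lo, hi):
            if not check_and_lock(u):
                continue  # u is matched or held by another thread
            conn = {}
            for n in vertex_nets[u]:
                for v in pins[n]:
                    if v != u and match[v] == -1:
                        conn[v] = conn.get(v, 0) + 1
            best, best_conn = -1, 0
            for v, c in conn.items():
                if c > best_conn and check_and_lock(v):
                    if best != -1:
                        locks[best].release()  # better candidate: free the old one
                    best, best_conn = v, c
            if best == -1:
                locks[u].release()  # assumed policy: leave u available to others
            else:
                # Both locks stay held, so matched vertices are never relocked.
                match[u], match[best] = best, u

    bounds = [num_vertices * t // num_threads for t in range(num_threads + 1)]
    threads = [threading.Thread(target=worker, args=(bounds[t], bounds[t + 1]))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return match


PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]  # assumed hypergraph
print(lock_based_matching(5, PINS, num_threads=1))  # [2, 3, 0, 1, -1]
```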

Parallel Resolution-based Matching
• Runs the sequential algorithm in parallel, without locks.
• Checks for conflicts at the end.
• Each conflict costs a pair and therefore reduces the cardinality and quality of the matching.
• To reduce the number of conflicts, the match status of vertices is checked more frequently.

Parallel Resolution-based Matching: Example
• u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1, so v* = v3.
• u = v2 (concurrently), adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2, so v* = v3 as well; a conflict is detected at v1.
• u = v4, adj = {v1, v2, v3, v5}: conn(v5) = 1, so v* = v5.
• u = v5, adj = {v2, v4}: conn(v4) = 1, so v* = v4.
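
A Python sketch of the resolution-based idea under stated assumptions. Threads write `match` without any locking, then a sequential pass unmatches every vertex whose chosen partner does not point back to it. The "check more frequently" step is modelled as re-checking candidates just before committing. `resolve_conflicts` is a hypothetical helper name. Feeding it the state left by the race in the example (v1 chose v3, then v2 overwrote v3's entry) shows v1 losing its partner while (v2, v3) and (v4, v5) survive.

```python
import threading


def resolve_conflicts(match):
    """Unmatch every u whose chosen partner does not point back to u."""
    conflicts = [u for u, v in enumerate(match) if v != -1 and match[v] != u]
    resolved = list(match)
    for u in conflicts:
        resolved[u] = -1
    return resolved, len(conflicts)


def resolution_based_matching(num_vertices, pins, num_threads=2):
    vertex_nets = [[] for _ in range(num_vertices)]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].append(n)
    match = [-1] * num_vertices

    def worker(lo, hi):
        for u in range(lo, hi):
            if match[u] != -1:
                continue  # already matched by some thread
            conn = {}
            for n in vertex_nets[u]:
                for v in pins[n]:
                    if v != u and match[v] == -1:
                        conn[v] = conn.get(v, 0) + 1
            # Re-check just before committing to shrink the race window.
            candidates = [v for v in conn if match[v] == -1]
            if candidates:
                best = max(candidates, key=lambda v: (conn[v], -v))
                match[u] = best  # unsynchronized writes: two threads may pick
                match[best] = u  # the same vertex and create a conflict

    bounds = [num_vertices * t // num_threads for t in range(num_threads + 1)]
    threads = [threading.Thread(target=worker, args=(bounds[t], bounds[t + 1]))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return resolve_conflicts(match)


# State after the race in the example: v1 -> v3, v2 <-> v3, v4 <-> v5.
print(resolve_conflicts([2, 2, 1, 4, 3]))  # ([-1, 2, 1, 4, 3], 1)
```

Because the check reads a snapshot taken before any entry is reset, every surviving entry belongs to a mutually consistent pair.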

Parallel Agglomerative Clustering
• Traverses the neighbors of u and creates its adjacency list.
• Sums up the connectivity to each vertex cluster.
• If the candidate vertex or vertex cluster is available:
  • double-checks that it has not been matched in the meantime and that the vertex weight criterion still holds;
  • selects it as the best candidate and unlocks the previous best candidate.

Parallel Agglomerative Clustering: Example
• u = v1, adj = {v2, v3, v4}: conn(v2) = 1, conn(v3) = 2, conn(v4) = 1, so v* = v3.
• u = v2 (concurrently), adj = {v1, v3, v4, v5}: conn(v1) = 1, conn(v3) = 2, conn(v4) = 2, conn(v5) = 2; v3 is locked, so v* = v4.
• u = v5, adj = {v2, v4}: conn(v2) = 2, conn(v4) = 1, conn(v2+v4) = 3, so v* = v2 (u joins the cluster {v2, v4}).
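
A threaded Python sketch of the lock-then-double-check pattern described above. The design is assumed for illustration: clusters are identified by a root vertex, `acquire(blocking=False)` stands in for the atomic lock, and a candidate cluster is locked through its root. After the candidate is locked, the code double-checks that the candidate is still a root, meaning it has not joined another cluster in the meantime, and that the merge still respects maxW. With one thread it reproduces the sequential agglomerative result for the assumed `PINS` and maxW = 3.

```python
import threading


def parallel_agglomerative(num_vertices, pins, weight, max_w, num_threads=2):
    vertex_nets = [[] for _ in range(num_vertices)]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].append(n)
    cluster = list(range(num_vertices))  # cluster[v] = root vertex of v's cluster
    cw = list(weight)                    # cw[r] = weight of the cluster rooted at r
    size = [1] * num_vertices            # size[r] = number of vertices rooted at r
    locks = [threading.Lock() for _ in range(num_vertices)]

    def worker(lo, hi):
        for u in range(lo, hi):
            if not locks[u].acquire(blocking=False):
                continue  # u is held by another thread
            if cluster[u] != u or size[u] > 1:  # already clustered
                locks[u].release()
                continue
            conn = {}  # connectivity of u to each neighbouring cluster root
            for n in vertex_nets[u]:
                for v in pins[n]:
                    if v != u:
                        conn[cluster[v]] = conn.get(cluster[v], 0) + 1
            best, best_conn = -1, 0
            for r, c in conn.items():
                if c <= best_conn or not locks[r].acquire(blocking=False):
                    continue
                # Double check under the lock: r is still a root and the
                # merged cluster respects the maxW limit.
                if cluster[r] == r and cw[r] + weight[u] <= max_w:
                    if best != -1:
                        locks[best].release()  # unlock previous best candidate
                    best, best_conn = r, c
                else:
                    locks[r].release()
            if best != -1:
                cluster[u] = best
                cw[best] += weight[u]
                size[best] += 1
                locks[best].release()
            locks[u].release()

    bounds = [num_vertices * t // num_threads for t in range(num_threads + 1)]
    threads = [threading.Thread(target=worker, args=(bounds[t], bounds[t + 1]))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return cluster


PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]  # assumed hypergraph
print(parallel_agglomerative(5, PINS, [1] * 5, max_w=3, num_threads=1))  # [2, 2, 2, 4, 4]
```

Only non-blocking acquires are used, so threads never wait on each other and cannot deadlock. Two threads may instead both fail to merge, which is one way a parallel run can lose some clustering quality.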

Experimental Setup
• All the algorithms are implemented in PaToH, in C with OpenMP.
• Compiled with icc version 11.3 and the -O3 flag.
• All the algorithms are tested on an in-house cluster of 64 nodes:
  • each node has two Intel Xeon E5520 processors (quad-core, clocked at 2.27 GHz, with hyper-threading).
• The experiments are run on 70 large hypergraphs corresponding to matrices from the UFL Sparse Matrix Collection.

Comparison Metrics
• Clustering metrics:
  • Cardinality: the number of clustering decisions.
  • Quality: the sum of the similarities of all vertex pairs that reside in the same cluster.
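
Both metrics can be computed from a root-based cluster array, as in the sketches above. This Python version relies on two assumptions. First, the similarity of a pair is taken as the number of nets the two vertices share (unit costs), which is consistent with the conn(·) values in the examples. Second, a cardinality decision is counted for every vertex merged into another vertex's cluster, which for a matching equals the number of matched pairs. `match_to_cluster` is a hypothetical helper.

```python
from itertools import combinations


def match_to_cluster(match):
    """Represent a matching as clusters rooted at the smaller vertex id."""
    return [v if m == -1 else min(v, m) for v, m in enumerate(match)]


def clustering_metrics(cluster, pins):
    vertex_nets = [set() for _ in cluster]
    for n, net in enumerate(pins):
        for v in net:
            vertex_nets[v].add(n)
    # Cardinality: each vertex merged into another vertex's cluster is one decision.
    cardinality = sum(1 for v, r in enumerate(cluster) if r != v)
    members = {}
    for v, r in enumerate(cluster):
        members.setdefault(r, []).append(v)
    # Quality: pair similarity = number of shared nets (assumed unit net costs).
    quality = sum(len(vertex_nets[a] & vertex_nets[b])
                  for group in members.values()
                  for a, b in combinations(group, 2))
    return cardinality, quality


PINS = [[0, 1, 2], [0, 2, 3], [1, 2, 3], [1, 3, 4], [1, 4]]  # assumed hypergraph
print(clustering_metrics([2, 2, 2, 4, 4], PINS))                    # (3, 6)
print(clustering_metrics(match_to_cluster([2, 3, 0, 1, -1]), PINS))  # (2, 4)
```

On this toy input the agglomerative clustering makes more decisions and reaches a higher total similarity than the matching.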

Comparison with Maximum Weighted Matching
• The matching algorithms are compared against Gabow’s maximum weighted matching algorithm.
• Because Gabow’s algorithm has $O(n^3)$ complexity, this comparison uses 289 small matrices (at most 10000 rows), a dataset different from the one used in the other experiments.
• The table gives the relative performance of the algorithms with respect to Gabow’s algorithm.
• The parallel algorithms are 17-26% worse in terms of quality.
• The proposed lock-based parallelization does not hamper the performance of the sequential algorithm.
• The resolution-based algorithm is outperformed by the other two.

Performance Profiles Compared against Gabow’s Algorithm
• A point (x, y) in the graph means that, with probability y, the quality of the matching found is at least Qmax/x, where Qmax is the quality obtained by Gabow’s algorithm.
• The probability of obtaining a matching whose quality is at least Qmax/1.25 is about 75%, versus 45% for the resolution-based algorithm.
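
The profile value at a given x is simply the fraction of test instances whose quality reaches Qmax/x. A minimal Python sketch follows. The quality values in the usage line are made-up toy numbers, not results from the slides.

```python
def performance_profile(quality, q_max, x):
    """Fraction of instances whose quality is at least q_max / x.

    quality[i] is an algorithm's matching quality on instance i and
    q_max[i] is the reference (e.g. maximum weighted) quality.
    """
    hits = sum(1 for q, qm in zip(quality, q_max) if q * x >= qm)
    return hits / len(quality)


# Toy data (not from the experiments): four instances with Qmax = 10.
print(performance_profile([8, 10, 6, 9], [10] * 4, 1.25))  # 0.75
```

Comparing `q * x >= qm` instead of `q >= qm / x` avoids a division. Sweeping x over a grid gives the whole curve, which starts at the fraction of instances where Qmax is matched exactly (x = 1).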

Resolution-Based Algorithm Conflicts
• Average number of matched vertices and conflicts for the proposed resolution-based algorithm.
• The number of conflicts increases with the number of threads.
• However, conflicts are rare compared to the cardinality of the matching.
• The maximum conflict/match ratio in a single graph is only 0.7%, and the mean is only 0.008%.

Matching Speedup
• Speedups of 5.87, 5.82, and 5.23 for the resolution-based, parallel agglomerative, and lock-based algorithms, respectively.
• 5-7% overhead due to OpenMP and atomic operations.

Matching Speedup Profiles
• Matching speedup profiles for #threads = 8.
• The resolution-based algorithm is the most scalable of the proposed algorithms.
• The probability of obtaining a speedup of at least 6.6 is 33% for the resolution-based algorithm, versus 20% and 16% for the parallel agglomerative and lock-based algorithms.

Overall PaToH Speedup
• Only the coarsening phase is parallelized.
• The best achievable speedup for the overall execution time follows from Amdahl’s law: $S(p) = \frac{1}{(1 - r) + r/p}$, where p is the number of threads and r is the ratio of total coarsening time to total sequential execution time.
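
The Amdahl bound is easy to evaluate for a given coarsening fraction. In this Python sketch the value r = 0.5 in the usage line is illustrative only, not a measurement from the experiments.

```python
def amdahl_speedup(r, p):
    """Best overall speedup when only a fraction r of the sequential run time
    (here, coarsening) is sped up p-fold and the rest stays sequential."""
    return 1.0 / ((1.0 - r) + r / p)


# Illustrative only: if coarsening were half of the sequential time,
# 8 threads could give at most about a 1.78x overall speedup.
print(round(amdahl_speedup(0.5, 8), 2))  # 1.78
```

If the coarsening phase itself reaches only a speedup s < p, substituting s for p gives the overall speedup actually achievable with that coarsening speedup.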

Overall Speedup for Matching Algorithms
• The lock-based algorithm is more efficient, as its speedup is closer to the ideal.
• Although the resolution-based algorithm scales better, it obtains lower-quality matchings, which increases the running time of the initial partitioning and uncoarsening phases.

Overall Speedup for Agglomerative Clustering
• 6%, 6%, and 10% slower than the best possible parallel execution time for 2, 4, and 8 threads, respectively.

Overall Running Time
• Overall execution times normalized with respect to that of sequential agglomerative clustering.
• The matching time of agglomerative clustering is 25% higher, but its overall execution time is 13% lower.

Minimum Cut Size
• Figure annotations: at most 1% difference; at most 3% difference.
• Minimum cut sizes are almost equal to those of the sequential algorithms.

Conclusion
• Clustering algorithms are the most time-consuming part of a multi-level hypergraph partitioner.
• We have presented two different multithreaded implementations of matching-based clustering algorithms and a multithreaded agglomerative clustering algorithm.
• We have presented several sets of experiments.
• The proposed algorithms achieve decent speedups.
• They perform as well as their sequential counterparts, sometimes even better.
• We observe that higher-quality clustering helps the partitioner obtain better cut sizes and reduces the time of the other phases.

Thanks
• For more information:
  • Email umit@bmi.osu.edu
  • Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
• Research at the HPC Lab is funded by
