Distributed Sorting Methods for Efficient Data Arrangement

Distributed Algorithms

a11 a12 ….a1n b11 b12 ….b1n a21 a22 ….a2n b21 b22 ….b2n …... …... an1 an2 ….ann bn1 bn2 ….bnn 1. Matrix multiplication c11 c12 ….c1n • Method 1: mimic vector computation • Method 2: divide-and-conquer c21 c22 ….c2n x = …... cn1 cn2 ….cnn

c11 c1n c12 b11 b1n b12 c2n c21 c22 b22 b21 b2n … … … … … … cn2 cn1 cnn bnn bn1 bn2 a11 a12 ….a1n a11 a12 ….a1n a11 a12 ….a1n a21 a22 ….a2n a21 a22 ….a2n a21 a22 ….a2n …... …... …... an1 an2 ….ann an1 an2 ….ann an1 an2 ….ann P1 P2 …... Pn 1 MIMD: • Master parcels out the computation to n processors • Each processor performs its own task independently • Master collects the results from n processors 2 …...

A11 A12 C11 C12 A21 A22 C21 C22 B11 B12 C11 = A11 B11 +A12 B21 C12 = A11 B12 +A12 B22 …... = B21 B22

S1 S2 Sd 2. Distributed Sorting 2.1. Introduction 1 Category for sorting 2 Distributed sorting P = X=X1 X2 … … Xd Y= Y1 Y2 … … Yd … …

S1 S3 S2 S4 X Y

Suppose, there are d ranked sites S1, S2, S3, …, Sd. For a given data distribution X={X1, X2, … , Xd}, a global distributed sorting is a process to find a new distribution Y={Y1,Y2, … , Yd} such that each Yi is ordered and furthermore every element of Yi is not greater than any element of Yj if i<j. 3 Two kinds of distributed sorting • Based on merge easy-split and hard join • Based on selection hard-split and easy join

503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 61 87 503 512 170 275 897 908 154 426 509 653 612 677 703 756 odd 1) Distributed sortingBased on merge S1 S2 S3 S4 local sort

61 87 170 275 653 677 703 756 503 512 897 908 154 426 509 612 61 87 170 275 503 512 897 908 154 426 509 612 653 677 703 756 even S1 S2 S3 S4

61 87 170 275 653 677 703 756 61 87 170 275 653 677 703 756 154 426 503 509 512 612 897 908 154 426 503 509 512 612 897 908 odd S1 S2 S3 S4

61 87 154 170 703 756 897 908 275 426 503 509 512 612 653 677 61 87 154 170 275 426 503 509 512 612 653 677 703 756 897 908 S1 S2 S3 S4

local sort odd S1 S3 S2 S4 even 1 2 3 4 5

0 1 2 3 4 5 6 7 8 9 10 11 min

S1 S2 S3 S4 300th 200th 100th smallest 100,000,000data/site 400,000,000data 2)Distributed sorting based on selection … … … …

503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 61 87 154 170 275 426 503 509 512 612 653 677 703 765 897 908 4th 8th 12th Select Split Sort

key key For 4 machines and 400,000,000 data, distributed sorting find 3 smallest elements • 100,000,000th smallest element • 200,000,000th smallest element • 300,000,000th smallest element Generally speaking, for d machines and N data, distributed sorting find d-1 smallest elements • Kth smallest element • 2Kth smallest element • … … • (d-1)Kth smallest element K=N/d

2.2 Distributed sortingbased on selection 1. Algorithm Let, there be d sites, N data, each site have k=N/d elements phase 1: All sites cooperatively select d-1 elements, kth, 2kth, … (d-1)kth smallest elements. phase 2: Each site splits its own data into d parts according to d-1 particular elements, sends elements whose values fall into [kj-1, kj] to site j. phase 3: Each site sorts its local data independently.

503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 703 765 897 908 87,61 503 512 512 612 653 677 275 426 503 509 61 87 154 170 Y1 4th 170 275 908,899 Y2 8th sort 154 426,509 653 Y3 12th 612,677 703,765 Y4 X1 X2 select X3 X4

87,61 503 512 703 765 897 908 512 612 653 677 61 87 154 170 908 899 756 703 512 653 612 677 275 426 503 509 87 61 170 154 503 275 426 509 Y1 4th 170 275 908,899 Y2 8th 154 426,509 653 Y3 12th 612,677 703,765 Y4

How select ith smallest element? How select (d-1) smallest elements?

87 61 87 61 170 275 426 154 2 503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 154 1 |XL|=6 170 275 426 3 1 170 275 426 XE 503 512 908 897 653 509 612 677 765 703 |XG|=9 distributed selection 4th smallest traditional selection

(d-1)k 3k 2k k One by one 2. (m, n)-Selection It is a process in which m required elements are selected from n elements. X={x1, x2 , … … … … … … … … … … …, xn} K={k1, k2, … …, km} |X|=n Once

87 61 170 275 426 154 503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 |XL|=6 XE 503 512 908 897 653 509 612 677 765 703 |XG|=9 K 4 8 12 X

1) Algorithm (traditional one) • If k=[ ] then return • Choose a partitioning element v, divide X into XL, XE, XG which contain elements of X which are smaller, equal, greater than v respectively • Split K into K1, K2 where K1= {Ki | Ki<= |BL|}, K2= {Ki | Ki = Kj - (|XL|+|XE|), Kj > |XL|+|XE|}. |K|-|K1|-|K2| elements in XE have been selected out in XE • Recursively call this algorithm on (XL, K1) and/or (XG, K2) if K1 or K2 is not empty.

87 61 87 61 170 275 426 154 2 503 87 512 61 908 170 897 275 653 426 154 509 612 677 756 703 154 1 |XL|=6 170 275 426 3 4 1 1 170 275 426 XE 503 3 1 512 908 897 653 509 612 677 765 703 509 512 908 897 653 612 677 765 703 |XG|=9 K 4 8 12 1 5

2) Distributed (m, n)-selection • Counting the total number of elements in all sites in one scan of the distributed system(see chapter4) • Choosing a partitioning element v from all data • Each site partitions its own data into XLi, XEi, XGi according to v. The global size of XL, XE, XG are obtained by a scan from leaves to root • By these three global sizes, the root divides K into K1, K2. If K1 or K2 is not empty, go to (2), apply the algorithm to (XL, K1) and/or (XG, K2) spawn ?

3)Distributed sorting based on (m, n)-selection • Apply (m, n)-selection algorithm to X and K, K={k1, k2, … …, kd-1}, ki = i*|X|/d to select k1th, k2th, … …, kd-1th smallest elements • Each site splits its own data into d parts according to d-1 particular elements, sends elements whose values fall into [kj-1, kj] to site j. • Each site sorts its local data independently.

Compare these two kinds of distributed sorting.

Distributed Sorting Methods for Efficient Data Arrangement

Distributed Sorting Methods for Efficient Data Arrangement

Presentation Transcript

Distributed Control Algorithms

Distributed Algorithms – 2g1513

Distributed Algorithms

Distributed Algorithms

Distributed Algorithms

Distributed Algorithms

UBI529 Distributed Algorithms

Distributed Algorithms – 2g1513

Distributed Algorithms 2g1513

Distributed Systems: Distributed algorithms

Distributed Algorithms

UBI529 Distributed Algorithms

Distributed Algorithms (22903)

Distributed Algorithms – 2g1513

IOA: Distributed Algorithms  Distributed Programs

Distributed Agreement Algorithms

DISTRIBUTED ALGORITHMS

Distributed Algorithms

DISTRIBUTED ALGORITHMS

DISTRIBUTED ALGORITHMS