Sum Selection in Arrays. Allan Grønlund Jørgensen. Qualification Exam
Progress Report
Fault Tolerance:
• Priority Queues Resilient to Memory Faults, with Moruz, Mølhave (WADS 07)
• Optimal Resilient Dictionaries, with Brodal, Fagerberg, Finocchi, Grandoni, Italiano, Moruz, Mølhave (ESA07)
• Comparison Based Dictionaries: Fault Tolerance versus I/O Efficiency, with Brodal and Mølhave (Manuscript-ICALP08)
Sum Selection:
• A Linear Time Algorithm for the k Maximal Sums Problem, with Brodal (MFCS 07)
• Sum Selection, with Brodal (Manuscript-ICALP08)
Outline • Introduction • The k maximal sums problem • Length constrained k maximal sums problem • Sum selection problem • Summary and plans for the future
The Maximum Sum Problem • Given an array of numbers, find the largest sum of a contiguous subarray • Example: for -3 7 -12 1 6 -3 5 -2 the answer is (4,7,9), i.e., A[4..7] = 1 6 -3 5 has sum 9
Kadane’s Algorithm (’77) • Scan the array from the left and in step i update: • the largest suffix sum (the largest sum ending at A[i]) • the largest sum so far (the largest sum in A[1,…,i])
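A minimal Python sketch of the scan above. It assumes 1-based indices, to match the (4,7,9) notation on the previous slide, and that the empty sum is not allowed; the name `max_sum` is hypothetical.

```python
def max_sum(a):
    """Kadane-style scan over a non-empty list a.

    Returns (i, j, s) with 1-based indices such that a[i..j] has the largest sum s.
    Only O(1) additional space is used.
    """
    suffix_sum, suffix_start = a[0], 1  # largest sum ending at the current index
    best = (1, 1, a[0])                 # largest sum in a[1..current index]
    for j in range(2, len(a) + 1):
        x = a[j - 1]
        if suffix_sum >= 0:
            suffix_sum += x             # extending a non-negative suffix never hurts
        else:
            suffix_sum, suffix_start = x, j  # a negative suffix is dropped
        if suffix_sum > best[2]:
            best = (suffix_start, j, suffix_sum)
    return best


print(max_sum([-3, 7, -12, 1, 6, -3, 5, -2]))  # (4, 7, 9)
```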
Outline • Introduction • The k maximal sums problem • Length constrained k maximal sums problem • Sum selection problem • Summary and plans for the future
The k Maximal Sums Problem • Given an array of numbers, find the k largest subarray sums (the subarrays may overlap) • Example with k=2: for -3 7 -12 1 6 -3 5 -2 the answer is 9 and 8
Goal Optimal O(n+k) time algorithm outputting the k maximal sums
Main Idea (Intuition) • Build all sums and insert them into a heap-ordered binary tree • Find the k largest sums using Frederickson’s heap selection algorithm (’93) in O(k) time
Example (k=4) [Figure: a heap-ordered binary tree with the k=4 largest nodes marked in red] Frederickson’s algorithm finds the red nodes in O(k) time (in no particular order)
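Frederickson’s O(k) heap selection is intricate. The sketch below swaps in the much simpler best-first search, which finds the same k nodes in O(k log k) time and returns them in decreasing order rather than in no particular order. It assumes the heap-ordered binary tree is stored in array layout; `k_largest_in_heap` is a hypothetical name.

```python
import heapq


def k_largest_in_heap(tree, k):
    """Return the k largest keys of a max-heap-ordered binary tree stored in
    array layout (children of position p are 2p+1 and 2p+2).

    A node can only be among the k largest if its parent is, so we expand a
    frontier ordered by key. O(k log k) time, a stand-in for Frederickson's O(k).
    """
    if not tree or k <= 0:
        return []
    result = []
    frontier = [(-tree[0], 0)]  # heapq is a min-heap, so keys are negated
    while frontier and len(result) < k:
        neg_key, p = heapq.heappop(frontier)
        result.append(-neg_key)
        for c in (2 * p + 1, 2 * p + 2):
            if c < len(tree):
                heapq.heappush(frontier, (-tree[c], c))
    return result


print(k_largest_in_heap([9, 8, 6, 4, 3, 5, 1], 4))  # [9, 8, 6, 5]
```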
The Iheap • It is a heap ordered binary tree • Supports insertions in amortized constant time
Inserting 7 in an Iheap [Figure: an Iheap with rightmost path 9, 5, 4, 3 and subtrees T1, T2, T3, T4, before and after inserting 7]
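A minimal Python sketch of Iheap insertion, assuming the rightmost path is kept on a stack: keys on the path that are smaller than the new key are popped, the topmost popped node (with everything below it) becomes the new node’s left subtree, and the new node becomes the right child of what remains of the path. Each key is popped at most once, which gives amortized O(1) insertions; the mechanics are the same as Cartesian tree construction. The class and helper names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


class IHeap:
    """Max-heap-ordered binary tree with amortized O(1) insertion (sketch)."""

    def __init__(self):
        self.root = None
        self.path = []  # rightmost path from the root down; keys never increase

    def insert(self, key):
        node = Node(key)
        cut = None
        # Pop every node on the rightmost path whose key is smaller than the new key.
        while self.path and self.path[-1].key < key:
            cut = self.path.pop()
        node.left = cut  # the popped segment keeps its shape below the new node
        if self.path:
            self.path[-1].right = node  # the new node extends the rightmost path
        else:
            self.root = node            # the new key is the maximum
        self.path.append(node)


def build_iheap(keys):
    heap = IHeap()
    for key in keys:
        heap.insert(key)
    return heap


def keys_in(node):
    return [] if node is None else [node.key] + keys_in(node.left) + keys_in(node.right)


def is_heap_ordered(node):
    return node is None or all(
        child is None or (child.key <= node.key and is_heap_ordered(child))
        for child in (node.left, node.right))


heap = build_iheap([9, 5, 4, 3, 7])  # rightmost path 9, 5, 4, 3, then insert 7
print(heap.root.key, heap.root.right.key, heap.root.right.left.key)  # 9 7 5
```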
Main Issue • There are n(n+1)/2 = Θ(n²) sums • Constructing all Θ(n²) sums and inserting them into a heap-ordered binary tree takes Θ(n²) time
Grouping Sums • The sums are grouped by their end index in the array • Example for -3 7 -12 1 6 -3 5 -2, writing (i,j,s) for the sum s of A[i..j]: Q4 = {(1,4,-7), (2,4,-4), (3,4,-11), (4,4,1)}
Constructing Q5 from Q4 • Add A[5] = 6 to every sum in Q4 and add the new one-element sum (5,5,6) • Q4 = {(1,4,-7), (2,4,-4), (3,4,-11), (4,4,1)} • Q5 = {(1,5,-1), (2,5,2), (3,5,-5), (4,5,7), (5,5,6)}
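The grouping can be made concrete with a direct Python sketch that builds each Q set from the previous one; `build_q_sets` is a hypothetical name. It does Θ(n²) work in total, which is exactly the cost the following slides avoid.

```python
def build_q_sets(a):
    """Return [Q_1, ..., Q_n], where Q_j lists the triples (i, j, s) such that
    s = a[i] + ... + a[j] (1-based indices): all sums ending at index j."""
    q_sets, prev = [], []
    for j, x in enumerate(a, start=1):
        # Extend every sum ending at j - 1 by a[j], then add the one-element sum.
        cur = [(i, j, s + x) for (i, _, s) in prev] + [(j, j, x)]
        q_sets.append(cur)
        prev = cur
    return q_sets


q = build_q_sets([-3, 7, -12, 1, 6, -3, 5, -2])
print(q[3])  # [(1, 4, -7), (2, 4, -4), (3, 4, -11), (4, 4, 1)]
print(q[4])  # [(1, 5, -1), (2, 5, 2), (3, 5, -5), (4, 5, 7), (5, 5, 6)]
```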
Main Idea Continued • Represent each Q set as a heap ordered binary tree H • Combine all heaps by assembling them into one big heap using dummy infinity keys
The Assembled Heap [Figure: heaps H1 to H5 joined into one heap by dummy nodes with key ∞]
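Representing Q Sets • Each set $Q_j$ is represented by a pair $\langle d_j, H_j\rangle$ • $H_j$ is an Iheap containing all $j$ sums of $Q_j$, each stored without the offset $d_j$ • $d_j$ is a number that must be added to every element of $H_j$ to obtain the sums in $Q_j$ • This gives the following construction equation: $\langle d_0, H_0\rangle = \langle 0, \emptyset\rangle$ and $\langle d_{j+1}, H_{j+1}\rangle = \langle d_j + A[j+1],\ H_j \cup \{-d_j\}\rangle$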
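Example for the array -3 7 -12, starting from $\langle d_0, H_0\rangle = \langle 0, \emptyset\rangle$ and applying $\langle d_{j+1}, H_{j+1}\rangle = \langle d_j + A[j+1],\ H_j \cup \{-d_j\}\rangle$: • $\langle d_1, H_1\rangle = \langle -3, \{0\}\rangle$, so $Q_1 = \{-3\}$ • $\langle d_2, H_2\rangle = \langle 4, \{0, 3\}\rangle$, so $Q_2 = \{4, 7\}$ • $\langle d_3, H_3\rangle = \langle -8, \{0, 3, -4\}\rangle$, so $Q_3 = \{-8, -5, -12\}$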
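A sketch of the construction equation, with plain Python lists standing in for the Iheaps. Copying H_j on every step keeps all versions available, but at O(j) cost per step, which is the cost partial persistence avoids. Note that d_j is simply the prefix sum A[1] + ... + A[j]. The name `build_pairs` is hypothetical.

```python
def build_pairs(a):
    """Return [(d_0, H_0), ..., (d_n, H_n)] following
    <d_0, H_0> = <0, {}> and <d_{j+1}, H_{j+1}> = <d_j + A[j+1], H_j ∪ {-d_j}>.
    Q_j is recovered as {d_j + h : h in H_j}."""
    pairs = [(0, [])]
    for x in a:
        d, h = pairs[-1]
        pairs.append((d + x, h + [-d]))  # h + [...] copies H_j, so every version survives
    return pairs


pairs = build_pairs([-3, 7, -12])
for j, (d, h) in enumerate(pairs[1:], start=1):
    print(j, d, h, [d + v for v in h])
# 1 -3 [0] [-3]
# 2 4 [0, 3] [4, 7]
# 3 -8 [0, 3, -4] [-8, -5, -12]
```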
Analysis of Pair Construction • Building each pair takes amortized constant time (one insertion into an Iheap) • But the insertion destroys the old version of the heap, although every $H_j$ is still needed • Solution: partial persistence (Driscoll et al. ’89) [Figure: inserting 7 turns version i of the Iheap into version i+1]
Summary • Build all pairs in O(n) time • Join them into a single heap in O(n) time • Use Frederickson’s algorithm to find the k+n-1 largest elements and discard the n-1 dummies in O(n+k) time • Result: an O(n+k) time algorithm
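The assembly step can be sketched in Python, with several simplifications: each Q set is built explicitly as a sorted path (Θ(n²) work) instead of as a persistent Iheap, and best-first search in O(k log k) time stands in for Frederickson’s algorithm. Only joining n heaps under n-1 dummy nodes with key +∞, then selecting k+n-1 keys and discarding the dummies, mirrors the slides. All names are hypothetical.

```python
import heapq
import math


class Node:
    """Node of a max-heap-ordered tree (hypothetical helper type)."""

    def __init__(self, key, children=()):
        self.key = key
        self.children = list(children)


def path_heap(keys):
    """A heap-ordered tree holding keys: a path sorted in decreasing order."""
    root = None
    for key in sorted(keys):  # smallest key ends up at the bottom
        root = Node(key, [root] if root is not None else [])
    return root


def assemble(roots):
    """Join the given heaps into one heap using dummy nodes with key +inf."""
    level = list(roots)
    while len(level) > 1:
        joined = [Node(math.inf, level[i:i + 2]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            joined.append(level[-1])  # odd heap out moves up unchanged
        level = joined
    return level[0]


def largest(root, count):
    """Best-first search: the `count` largest keys of the tree, decreasing."""
    out, frontier, tie = [], [(-root.key, 0, root)], 1
    while frontier and len(out) < count:
        neg_key, _, node = heapq.heappop(frontier)
        out.append(-neg_key)
        for child in node.children:
            heapq.heappush(frontier, (-child.key, tie, child))  # tie avoids comparing nodes
            tie += 1
    return out


def k_maximal_sums(a, k):
    if not a:
        return []
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    # Q_j holds the sums a[i+1..j] = prefix[j] - prefix[i] for 0 <= i < j.
    roots = [path_heap([prefix[j] - prefix[i] for i in range(j)]) for j in range(1, len(a) + 1)]
    # n heaps need n - 1 dummies, and every dummy beats every real sum.
    top = largest(assemble(roots), k + len(roots) - 1)
    return [key for key in top if key != math.inf]


print(k_maximal_sums([-3, 7, -12, 1, 6, -3, 5, -2], 2))  # [9, 8]
```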
Space Reduction • The current algorithm uses O(n+k) time and O(n+k) additional space • The input array is considered read-only • Kadane’s algorithm uses O(1) additional space • Goal: reduce the additional space usage to O(k)
Higher Dimensions • For an m × n matrix, we get … • In general, we get … • The problem can be reduced to the 1D case
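As a hedged illustration of the reduction to the 1D case, restricted to k = 1: fixing a pair of rows collapses the matrix into a 1D array of column sums, on which Kadane’s algorithm runs. This sketch takes O(m²n) time; it is not the k maximal sums algorithm itself, and the names are hypothetical.

```python
def max_sum_1d(a):
    """Kadane: largest sum of a non-empty contiguous subarray."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(cur + x, x)
        best = max(best, cur)
    return best


def max_sum_2d(matrix):
    """Largest sum of a non-empty submatrix of an m x n matrix (the k = 1 case)."""
    m, n = len(matrix), len(matrix[0])
    best = matrix[0][0]
    for top in range(m):
        col_sums = [0] * n
        for bottom in range(top, m):
            # col_sums[c] is the sum of column c over rows top..bottom, so every
            # submatrix spanning exactly these rows is a subarray of col_sums.
            for c in range(n):
                col_sums[c] += matrix[bottom][c]
            best = max(best, max_sum_1d(col_sums))
    return best


print(max_sum_2d([[1, -2], [-3, 4]]))  # 4
```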
Outline • Introduction • The k maximal sums problem • Length constrained k maximal sums problem • Sum selection problem • Summary and plans for the future
Length Constrained k Maximal Sums Problem • Each sum must be an aggregate of at least l numbers and at most u numbers • Example with l=3 and u=5 for 12 7 -666 8 7 -6 4 -2: the best sum overall is 19 (12 7), while the best valid sum is 13 (8 7 -6 4)
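A brute-force reference for the length-constrained problem, handy for checking faster versions on small inputs: it enumerates every sum of length between l and u via prefix sums and sorts them, which is far from the O(n+k) goal. The name `k_max_sums_length` is hypothetical.

```python
def k_max_sums_length(a, k, l, u):
    """Brute force: the k largest sums a[i..j] with l <= j - i + 1 <= u (1-based),
    in decreasing order."""
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    sums = [prefix[j] - prefix[i - 1]                      # sum of a[i..j]
            for j in range(1, len(a) + 1)
            for i in range(max(1, j - u + 1), j - l + 2)]  # lengths u down to l
    return sorted(sums, reverse=True)[:k]


print(k_max_sums_length([12, 7, -666, 8, 7, -6, 4, -2], 1, 3, 5))  # [13]
```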
Goal Optimal O(n+k) time algorithm outputting the k maximal sums with length between l and u
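First Approach • Use the same idea as before, but redefine the Q sets to match the length constraint • The construction equation is almost identical, but it also requires a deletion: when moving from $Q_j$ to $Q_{j+1}$, the sum that would become longer than u must be removed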
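Constructing Q Sets Using Deletions (l=3, u=6) • Array: -5 17 42 -10 0 12 -10 666 • Q6 = {(1,6,56), (2,6,61), (3,6,44), (4,6,2)} • To get Q7, add A[7] = -10 to every sum in Q6, giving (1,7,46), (2,7,51), (3,7,34), (4,7,-8); delete (1,7,46), whose length 7 exceeds u; and insert the new sum (5,7,2) of length l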
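The insert/delete pattern can be sketched with the offset representation from the k maximal sums algorithm: with prefix sums P[j] = A[1] + ... + A[j], the window stores -P[i-1] for every valid start i, and adding P[j] yields the sums. A sorted Python list maintained with `bisect` stands in for the heap, so each update costs O(u-l) here rather than O(log(u-l)); the function name is hypothetical.

```python
import bisect


def q_sets_with_deletions(a, l, u):
    """Yield (j, Q_j), where Q_j lists (in decreasing order) the sums a[i..j]
    with l <= j - i + 1 <= u, using 1-based indices and prefix sums P."""
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    window = []  # sorted list of -P[i-1] for the valid starts i; stands in for a heap
    for j in range(1, len(a) + 1):
        new_start = j - l + 1  # a[new_start..j] has length exactly l
        if new_start >= 1:
            bisect.insort(window, -prefix[new_start - 1])
        old_start = j - u      # a[old_start..j] would have length u + 1
        if old_start >= 1:
            window.pop(bisect.bisect_left(window, -prefix[old_start - 1]))
        yield j, sorted((prefix[j] + v for v in window), reverse=True)


for j, q in q_sets_with_deletions([-5, 17, 42, -10, 0, 12, -10, 666], 3, 6):
    if j in (6, 7):
        print(j, q)
# 6 [61, 56, 44, 2]
# 7 [51, 34, 2, -8]
```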
Result • Same algorithm as before, using the new way of constructing the next heap • Deleting an element from a heap of size n that supports constant-time insertion takes O(log n) time • Each Q set holds at most u-l+1 sums, giving an O(n log(u-l) + k) time algorithm
A Better Way of Constructing the Q Sets (u=8, l=4) • Divide the array into slabs of size u-l+1 • For each slab, build two sets of heaps: one from the left (L) and one from the right (R) • For each index j, group all sums of length between l and u that end at j+l-1, using the sets above and two constants • The start indices of these sums form a window of u-l+1 consecutive positions ending at j, so the window covers a suffix of one slab (R) and a prefix of the next (L) [Figure: worked example for j=3 in slab 2]
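For k = 1 each heap in the L and R sets can be replaced by its maximum, which turns the slab construction into prefix and suffix maxima per slab (the same decomposition as the classic sliding-window-maximum trick). The following O(n) sketch returns only the single best sum, whereas the slides keep heaps so that k sums can be selected; the 0-based start indexing and the names are assumptions.

```python
def max_sum_length(a, l, u):
    """Largest sum a[i..e] with l <= e - i + 1 <= u (1 <= l <= u), using slabs of
    size s = u - l + 1 over the start positions. Returns None if len(a) < l."""
    n, s = len(a), u - l + 1
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)
    m = n - l + 1                       # 0-based starts t = 0..m-1 (1-based i = t + 1)
    if m <= 0:
        return None
    v = [-prefix[t] for t in range(m)]  # a[t+1..e] sums to prefix[e] + v[t]
    left, right = v[:], v[:]
    for t in range(1, m):               # "L": maxima from the left end of each slab
        if t % s:
            left[t] = max(left[t - 1], v[t])
    for t in range(m - 2, -1, -1):      # "R": maxima from the right end of each slab
        if (t + 1) % s:
            right[t] = max(right[t + 1], v[t])
    best = None
    for hi in range(m):                 # starts hi-s+1..hi, all ending at e = hi + l
        lo = hi - s + 1
        # lo <= 0: the window is a prefix of slab 0; otherwise it is one whole slab
        # or a suffix of one slab (R) followed by a prefix of the next (L).
        window_max = left[hi] if lo <= 0 else max(right[lo], left[hi])
        candidate = prefix[hi + l] + window_max
        best = candidate if best is None else max(best, candidate)
    return best


print(max_sum_length([12, 7, -666, 8, 7, -6, 4, -2], 3, 5))  # 13
```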
Result • Same algorithm, using the new way to group sums • Building the L and R sets takes O(u-l) time for each slab, so O(n) time over all slabs • O(n+k) time algorithm
Outline • Introduction • The k maximal sums problem • Length constrained k maximal sums problem • Sum selection problem • Summary and plans for the future
Sum Selection • Given an array of numbers, find the k’th largest sum • Example with k=5 for 42 -8 7 2 -52 • The 15 sums in sorted order: -52 -51 -50 -43 -9 -8 -1 1 2 7 9 34 41 42 43 • The 5th largest sum is 9
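A brute-force reference for sum selection, which enumerates all n(n+1)/2 sums and sorts them in O(n² log n) time; it is only meant for checking small examples such as the one above. The name `kth_largest_sum` is hypothetical.

```python
def kth_largest_sum(a, k):
    """Brute force: the k'th largest of the n(n+1)/2 contiguous sums of a."""
    sums = []
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]  # running == a[i] + ... + a[j]
            sums.append(running)
    return sorted(sums, reverse=True)[k - 1]


print(kth_largest_sum([42, -8, 7, 2, -52], 5))  # 9
```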
First Solution • Use the algorithm for the k maximal sums problem to find the k largest sums and output the smallest of these • The algorithm uses O(n+k) time • What if k is large? (k can be as large as n(n+1)/2)
Lower Bound • Reduction from the Cartesian Sum Problem (X+Y) • A lower bound of Ω(|Y| + |Y| log(k/|Y|)) (Frederickson and Johnson ’82) [Figure: example sets X and Y]
Reduction [Figure: an example X+Y instance transformed into an array for sum selection]
Result • An Ω(n + n log(k/n)) lower bound for the sum selection problem
Goal • An optimal O(n + n log(k/n)) time algorithm for selecting the k’th largest sum
Algorithm • Reduction to selection in sorted arrays and weight-balanced search trees • Frederickson and Johnson (’82) already solved selection in n sorted arrays in optimal O(n + n log(k/n)) time • Adapt this algorithm so that it also works on weight-balanced trees
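Selection in sorted arrays can be illustrated with a simple heap over the array heads. This is not Frederickson and Johnson’s algorithm: it runs in O(n + k log n) time rather than the optimal bound. It assumes each array is sorted in decreasing order and that k does not exceed the total number of elements; the name is hypothetical.

```python
import heapq


def kth_largest_in_sorted_arrays(arrays, k):
    """k'th largest element over arrays that are each sorted in decreasing order.

    Keeps the current head of every array in a heap and discards the overall
    maximum k - 1 times: O(n + k log n) time for n arrays, a simple stand-in for
    Frederickson and Johnson's optimal algorithm. Assumes 1 <= k <= total size.
    """
    heads = [(-arr[0], idx, 0) for idx, arr in enumerate(arrays) if arr]
    heapq.heapify(heads)  # heapq is a min-heap, so keys are negated
    for _ in range(k - 1):
        _, idx, pos = heapq.heappop(heads)
        if pos + 1 < len(arrays[idx]):
            heapq.heappush(heads, (-arrays[idx][pos + 1], idx, pos + 1))
    return -heads[0][0]


print(kth_largest_in_sorted_arrays([[9, 5, 1], [8, 7], [6, 2]], 3))  # 7
```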
Block Heap • A heap-ordered binary tree where each node stores B sorted elements • Inserting a block of B elements takes O(B) time [Figure: example with B=3 and blocks 54,49,42; 39,31,25; 23,22,21; 24,12,7; 17,13,11; 10,5,1; 9,6,3]
Reducing Sum Selection to Selection in Arrays and Trees • Divide the array into slabs of size k/n • Each index j is associated with two data structures that together cover all sums ending at index j • The first, a weight-balanced tree named WB_j, holds all sums starting in the current slab • The second, a block heap named BH_j, holds the rest • Extending within a slab inserts into WB; extending to a new slab inserts a block of k/n elements into BH [Figure: worked example of both cases]
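A minimal sketch of the split, under stated assumptions: plain Python lists stand in for the weight-balanced tree WB_j and the block heap BH_j, the slab size is a hypothetical parameter `b` (k/n on the slide), and elements are assumed to be stored in the offset form -P[i-1] (P being prefix sums), so that the sums ending at j are P[j] plus each stored value. When j opens a new slab, the previous slab’s values enter BH as one sorted block.

```python
def decompose(a, b):
    """For every end index j (1-based), split the sums ending at j into
    WB_j (start in j's slab) and BH_j (start in an earlier slab, as sorted blocks).

    Values are stored in offset form -P[i-1] (P = prefix sums), so the sums
    ending at j are P[j] + value. Returns a list of (P[j], WB_j, BH_j).
    """
    p, wb, bh, out = 0, [], [], []
    for j, x in enumerate(a, start=1):
        if (j - 1) % b == 0 and wb:             # j opens a new slab
            bh.append(sorted(wb, reverse=True))  # one block of b values enters BH
            wb = []
        wb.append(-p)                           # start i = j contributes -P[j-1]
        p += x                                  # p is now P[j]
        out.append((p, list(wb), list(bh)))     # snapshot; blocks are never modified
    return out


print(decompose([42, -10, 0, 12, -10], 3)[-1])  # (34, [-32, -44], [[0, -32, -42]])
```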
Reducing the Problem • One insertion into a tree per step and one insertion into a block heap every k/n steps • This gives n trees of size at most k/n and n block heaps • Join all block heaps together and use Frederickson’s algorithm to find the 4n blocks with the largest minimum • What remains is n trees and O(n) sorted arrays
Result • Selection in O(n) trees and sorted arrays storing O(k) elements in total can be done in O(n + n log(k/n)) time • This gives an O(n + n log(k/n)) time algorithm for sum selection
Outline • Introduction • The k maximal sums problem • Length constrained k maximal sums problem • Sum selection problem • Summary and plans for the future
Summary of Results Sum Selection:
Summary of Results Fault Tolerant Data Structures:
Progress and Future [Timeline figure over time, with milestones PhD Start, Qualification Exam and MIT] • Fault Tolerance: Priority Queue, Searching, Dictionary, I/O Eff. Search, I/O Eff. Sorting, Cache Oblivious • Sums in Arrays: k Max Sums, (l,u) k Max Sums, Sum Selection, Selection in arbitrary Trees