15-211 Fundamental Structures of Computer Science Sorting – Part II February 25, 2003 Ananda Guna
Announcements • Work on Homework #4 • Due on Monday, March 17, 11:59pm • You should have started by now! • Quiz #2 is Tuesday, Feb. 25 • Study the Huffman and LZW algorithms • The midterm is Tuesday, March 4 • Review for the midterm is Thursday
Master Theorem THEOREM: The recurrence T(n) = aT(n/b) + cn, T(1) = c, where a, b, and c are all constants, solves to: • T(n) = Θ(n) if a < b • T(n) = Θ(n log n) if a = b • T(n) = Θ(n^(log_b a)) if a > b
Recurrences • Divide-and-conquer algorithms often lead to recurrences of the following form: • T(n) = aT(n/b) + cn • T(1) = c (Here a, b, and c are constants > 0.) • For merge sort: a = b = 2 • What if a = b = 3 for merge sort (a three-way merge)? How would that affect c? (A worked application follows.)
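A quick worked application of the theorem (added for reference; not on the original slide): for ordinary merge sort, a = b = 2, so we are in the a = b case and T(n) = Θ(n log n). For a three-way merge sort, a = b = 3: still the a = b case, so T(n) = Θ(n log n); what changes is the constant c, since merging three runs does more work per element.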
Solving General Recurrences • We can solve this by the repeated substitution method • T(n) = aT(n/b) + cn • = a(aT(n/b^2) + cn/b) + cn • = a(a(aT(n/b^3) + cn/b^2) + cn/b) + cn • = ... • = a^(k+1) T(n/b^(k+1)) + cn[(a/b)^k + ... + (a/b)^2 + (a/b) + 1] • We will solve this in class; a sketch of the finish follows.
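A sketch of the finish (added here; the slide defers the details to class): stop when n/b^(k+1) = 1, i.e., k + 1 = log_b n. The bracketed sum is geometric with ratio a/b: • if a < b, the sum is bounded by a constant, so T(n) = Θ(n) • if a = b, every term is 1 and there are Θ(log n) terms, so T(n) = Θ(n log n) • if a > b, the largest term dominates, and cn(a/b)^(log_b n) = c·a^(log_b n) = Θ(n^(log_b a)) These are exactly the three cases of the Master Theorem above.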
Sorting Recap • Selection sort: always O(n^2) • Insertion sort: the total time is O(n + #inversions). This is O(n^2) in the worst case and on average, but can be much smaller if the input is almost sorted (see the sketch below). • Bubble sort: between insertion and selection in running time. • These are all easy to code but O(n^2) on average • We can do better
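A minimal insertion sort sketch in Java (my illustration; the slides give no code). Each shift in the inner loop removes exactly one inversion, which is where the O(n + #inversions) bound comes from:

static void insertionSort(int[] a) {
    // O(n + I) time, where I is the number of inversions in a.
    for (int i = 1; i < a.length; i++) {
        int key = a[i];
        int j = i - 1;
        // Each shift fixes exactly one inversion involving key.
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}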
Sorting Recap ctd. • Better algorithms • Heapsort: O(n log n) worst case. Can be done in place in an array. • Mergesort: O(n log n) worst case. Simple divide-and-conquer: split into left and right halves, recursively sort both halves, then merge the results. The running time is described by the recurrence T(n) = 2T(n/2) + cn, which solves to O(n log n). • Quicksort: O(n^2) worst case but O(n log n) average case. • If you always pick the leftmost element as the pivot, then this is a lot like inserting into a binary search tree • The cost is like the sum of the depths of the nodes • How can we avoid the worst case for any data set? Hint: randomize (see the sketch below) • Why is quicksort faster in practice than mergesort? Hint: a faster inner loop
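A minimal randomized quicksort sketch in Java (my illustration, not the course's reference code). Choosing the pivot at random means no fixed input can reliably trigger the O(n^2) case:

import java.util.Random;

class QuickSort {
    private static final Random RAND = new Random();

    // Sorts a[lo..hi]; expected O(n log n) for any input.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = lo + RAND.nextInt(hi - lo + 1); // random pivot index
        swap(a, p, hi);                          // move pivot to the end
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)            // Lomuto partition
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);                          // pivot to its final position
        quickSort(a, lo, i - 1);
        quickSort(a, i + 1, hi);
    }

    private static void swap(int[] a, int x, int y) {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}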
How fast can we sort? • We have seen several sorting algorithms with O(N log N) running time. • Can we do better than N log N? • In fact, Ω(N log N) is a general lower bound for comparison-based sorting. • A proof appears in Weiss. • Informally, we can argue as follows...
Decision tree for sorting [Figure: the decision tree for sorting three elements a, b, c — each internal node compares two elements (a<b?, a<c?, b<c?), and each leaf is one of the 3! = 6 possible orderings.] • A decision tree for sorting N elements has N! leaves. • So the tree has height at least log(N!), and log(N!) = Θ(N log N).
Our lower bound argument • We make the following observations: • For any two different permutations P1, P2 of the input, the algorithm must at some point make a comparison that causes it to do different things on the two permutations (otherwise it couldn't sort both). • Each comparison has only two outcomes (it's a YES/NO question). • So the algorithm must be able to reach n! distinct outcomes, and there must be some permutation that causes it to ask at least log(n!) questions. • We have comparison-based algorithms that run in O(n log n), and we just argued the Ω(n log n) lower bound • So comparison-based sorting is Θ(n log n) (a short bound on log(n!) follows).
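A short justification of log(n!) = Θ(n log n) (added here; the slides point to Weiss for the full proof): log(n!) = log n + log(n-1) + ... + log 1 ≤ n log n, and log(n!) ≥ log n + log(n-1) + ... + log(n/2) ≥ (n/2) log(n/2) = Ω(n log n). Hence log(n!) = Θ(n log n).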
Summary on the sorting bound • If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N). • A decision tree is a representation of the possible comparisons required to solve a problem.
Non-comparison-based sorting • If we can use more than just comparisons of pairs of elements, we can sometimes sort more quickly. • A simple example is bucket sort. • In bucket sort, we require the additional knowledge that all elements are non-negative integers less than a specified maximum value.
Bucket sort [Figure: the input 1, 3, 3, 1, 2 is dropped into buckets labeled 1, 2, 3, then read out in bucket order.]
Implementing Bucket Sort • Assume all values are integers in the range 0..k for some small k. • Make an array of k + 1 linked lists (one bucket per possible value) • Insert each item into array[item.value()] • Make one pass over the buckets, collecting all items in order • This is an O(N + k) algorithm (see the sketch below)
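A minimal Java sketch of this (my illustration; the slides give no code). It assumes integer keys in 0..k and keeps a list per bucket, so the sort is stable:

import java.util.ArrayList;
import java.util.List;

class BucketSort {
    // Sorts values in 0..k in O(N + k) time; stable.
    static void bucketSort(int[] a, int k) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int v = 0; v <= k; v++) buckets.add(new ArrayList<>());
        for (int x : a) buckets.get(x).add(x);  // drop each item in its bucket
        int i = 0;
        for (List<Integer> b : buckets)         // collect buckets in order
            for (int x : b) a[i++] = x;
    }
}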
Bucket sort characteristics • Runs in O(N) time. • Easy to implement each bucket as a linked list. • Is stable: • If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
Radix Sort • If your integers come from a larger range, do a bucket sort on each digit • Start by sorting on the low-order digit using a STABLE bucket sort • Then do the next-lowest digit, and so on • If the items are b digits long (or b bytes long for strings), then the time to sort N items is O(Nb)
Radix sort example • A sorting algorithm that goes beyond comparisons: radix sort. • Input (3-bit numbers): 010 000 101 001 111 011 100 110 (= 2 0 5 1 7 3 4 6) • After sorting on bit 0: 010 000 100 110 101 001 111 011 • After sorting on bit 1: 000 100 101 001 010 110 111 011 • After sorting on bit 2: 000 001 010 011 100 101 110 111 (= 0 1 2 3 4 5 6 7) • Each sorting step must be stable.
Radix sort characteristics • Each sorting step can be performed via bucket sort, and is thus O(N). • If the numbers are all b bits long, then there are b sorting steps. • Hence, radix sort is O(bN). • The bucket-sort-based version uses O(N + k) extra space; a binary MSD variant (radix-exchange sort) can be implemented in place, partitioning on one bit at a time much like quicksort. • A sketch of a byte-at-a-time version follows.
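A minimal LSD radix sort sketch in Java (my illustration; it assumes non-negative 32-bit ints, processed one byte per pass with a stable counting sort):

class RadixSort {
    // LSD radix sort on non-negative ints: 4 passes, 256 buckets each.
    static void radixSort(int[] a) {
        int n = a.length;
        int[] src = a, dst = new int[n];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int x : src) count[((x >>> shift) & 0xFF) + 1]++;
            for (int d = 0; d < 256; d++) count[d + 1] += count[d];
            for (int x : src)                   // stable: equal bytes keep order
                dst[count[(x >>> shift) & 0xFF]++] = x;
            int[] t = src; src = dst; dst = t;  // swap roles for the next pass
        }
        // After an even number of passes the sorted data is back in 'a'.
    }
}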
Not just for binary numbers • Radix sort can be used for decimal numbers and alphanumeric strings. • Input: 032 224 016 015 031 169 123 252 • After sorting on the 1s digit: 031 032 252 123 224 015 016 169 • After sorting on the 10s digit: 015 016 123 224 031 032 252 169 • After sorting on the 100s digit: 015 016 031 032 123 169 224 252
Thursday and Next Week • We will do a review for the midterm on Thursday • The midterm test is Tuesday, March 4 • We will post some old exams on Bb. • We will have online office hours next week • Work on HW4 – ask questions early