SORTING AND SEARCHING • Sorting • Algorithm Analysis • Searching Sorting and Searching
Sorting
• Algorithm Analysis
• Sorting and Searching are the most frequent operations
• Elementary Sorting
  • bubble sort
  • selection sort
  • insertion sort
• Merge Sort
• We’ll explain what these symbols mean in a few slides …
The Importance Of Algorithm Analysis
• “Performance” of an algorithm often refers to how quickly it executes
• Performance matters! Can observe and/or analyze, then tune or revise algorithm
• Algorithm analysis is so important that every Brown CS student is required to take at least one course covering the topic!
• CS16: Introduction to Algorithms and Data Structures
  • a toolkit of useful algorithms
  • order of magnitude performance characterizations (i.e., expected runtime analysis)
• CS157: Design and Analysis of Algorithms
  • considers exact upper and lower bounds of performance for more advanced algorithms
  • employs sophisticated mathematical analysis
Analysis of Algorithms – Things to Consider
• Computing resources consumed
  • running time
  • memory space
  • network resources
  • power
  • …
• Implementation of algorithm
  • machine (e.g., Intel Core i7, AMD Phenom X4, etc.)
  • language (Java, C, C++, etc.)
• For given input, time and space used
  • can depend on implementation
• Size of input data, denoted N, e.g.
  • number of elements to sort
  • number of nodes in tree to be visited
• Worst-case time complexity T(N)
  • maximum running time of algorithm over all possible inputs of size N
Big-O Notation - OrderOf()
• How to abstract from implementation?
• Big-O notation
  • O(N) means, for example, that each element is accessed once
    • N elements * 1 access/element = N accesses
  • O(N^2) means, for example, that each element is accessed N times
    • N elements * N accesses/element = N^2 accesses
• Only consider “asymptotic behavior”, i.e., when N >> 1 (N is much greater than 1)
  • N is unimportant compared to N^2
• Disregard constant factors:
  • newer machine might run all programs twice as fast
  • line of code in one language may be several lines in another
• Remember, only largest N expression without constants matters
  • 3N^2 is O(N^2)
  • N/2 is O(N)
  • 4N^2 + 2N is O(N^2)
• useful sum that recurs frequently in analysis: $1 + 2 + 3 + \dots + N = \sum_{i=1}^{N} i = N(N+1)/2$, which is O(N^2)
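As a worked check of the "largest expression without constants" rule, one way to see that 4N^2 + 2N is O(N^2) is to bound it by a constant multiple of N^2. The constants c = 6 and N_0 = 1 below are just one convenient choice, not the only one:

```latex
4N^2 + 2N \;\le\; 4N^2 + 2N^2 \;=\; 6N^2 \quad \text{for all } N \ge 1,
\qquad \text{so } 4N^2 + 2N = O(N^2) \text{ with } c = 6,\; N_0 = 1 .
```

The step 2N ≤ 2N^2 holds because N ≤ N^2 whenever N ≥ 1.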
f(N) on linear graph paper
[Figure: curves for log N, N, N log N, N^2, and 2^N; x-axis: N, y-axis: f(N), both on linear scales]
• Note: Base for logs often depends on the data structures we are using in our algorithms (for example, base 2 for binary trees)
f(N) on log-log graph paper
[Figure: curves for log N, N, N log N, N^2, and 2^N on logarithmic axes]
• x-axis: log N
• y-axis: log f(N)
• the diagram of c·f(N) is obtained by “shifting” the diagram of f(N) up by log c
Bubble Sort
• Iterate through sequence, compare each element to right neighbor.
• Exchange adjacent elements if necessary.
• Keep passing through sequence until no exchanges are required (up to N times).
• Each pass causes largest element to bubble into place: 1st pass, largest; 2nd pass, 2nd largest, ...
• Therefore get a sorted sub-array on the right and can stop one position sooner each pass (more efficient than brute force bubbling through entire array each pass…)
Example:
  49  2 36 55  4 72 23   Before a pass
   2 36 49 55  4 72 23   Middle of first pass
   2 36 49  4 55 23 72   After one pass
Worst-Case Time, Bubble Sort
• N is number of objects in sequence
Instruction                                  # executions
  int temp;
  i = N;                                     // 1
  sorted = false;                            // 1
  while ((i > 1) && (!sorted)) {             // N
      sorted = true;                         // N - 1
      for (int j = 1; j < i; j++) {          // body: (N-1) + (N-2) + ... + 2 + 1 = N(N-1)/2
          if (a[j-1] > a[j]) {
              temp = a[j-1];                 // exchange
              a[j-1] = a[j];                 // exchange
              a[j] = temp;                   // exchange
              sorted = false;
          }
      }
      i--;                                   // N - 1
  }
Worst-case analysis (sorted in inverse order):
• while-loop is iterated N-1 times
• iteration i executes 2 + 6(i - 1) instructions
Total: 2 + N + 2(N-1) + 6[(N-1) + ... + 2 + 1] = 3N + 6N(N-1)/2 = 3N^2 + ... = O(N^2)
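The worst-case count can be checked empirically. The sketch below wraps the same loop in a hypothetical class `BubbleSortCount` (the class name and the comparison counter are additions for illustration, not part of the slides) and counts how many times the inner comparison runs:

```java
class BubbleSortCount {
    // Bubble sort with early exit, as on the slide.
    // Returns the number of element comparisons performed.
    static long bubbleSort(int[] a) {
        long comparisons = 0;
        int i = a.length;
        boolean sorted = false;
        while (i > 1 && !sorted) {
            sorted = true;
            for (int j = 1; j < i; j++) {
                comparisons++;
                if (a[j - 1] > a[j]) {
                    int temp = a[j - 1];   // exchange adjacent elements
                    a[j - 1] = a[j];
                    a[j] = temp;
                    sorted = false;
                }
            }
            i--;                           // largest element is now in place
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int n = 8;
        int[] reversed = new int[n];
        for (int k = 0; k < n; k++) {
            reversed[k] = n - k;           // inverse order: the worst case
        }
        long counted = bubbleSort(reversed);
        System.out.println(counted + " comparisons; N(N-1)/2 = " + (n * (n - 1) / 2));
    }
}
```

For an inversely ordered input of N = 8 elements the counter reaches 28 = N(N-1)/2, while an already sorted input stops after a single pass of N - 1 comparisons.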
Insertion Sort
• Like inserting a new card into a partially sorted hand by bubbling to the left into sorted subarray on left; little less brute-force than bubble sort
• add one element a[i] at a time
• find proper position, j+1, to the left by shifting to the right a[i-1], a[i-2], ..., a[j+1] left neighbors, until a[j] < a[i]
• move a[i] into vacated a[j+1]
• After iteration i < n, the original a[0] ... a[i] are in sorted order, but not necessarily in final position
Example (inserting the 9):
  2 4 36 55  9 72 23   i marks the 9
  2 4  _ 36 55 72 23   36 and 55 shifted right; j marks the 4, j+1 the vacated slot
  2 4  9 36 55 72 23   9 moved into a[j+1]
Time Complexity of Insertion Sort
Implementation:
  for (int i = 1; i < n; i++) {
      int toInsert = a[i];
      int j = i - 1;
      // compare against toInsert, not a[i]: a[i] is overwritten by the first shift
      while ((j >= 0) && (a[j] > toInsert)) {
          a[j+1] = a[j];        // move a[j] forward
          j--;
      }
      a[j+1] = toInsert;        // move toInsert to a[j+1]
  }
Analysis
• Most executed instructions are ones for move in while-loop within the for-loop.
• Such instructions are executed worst case (inverse order) 1 + 2 + ... + (N-2) + (N-1) times.
• Time complexity: O(N^2) worst-case; again, constants do not matter for Big-O.
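To make the analysis concrete, here is a minimal sketch that counts the moves in the while-loop. The class name `InsertionSortCount` and the shift counter are assumptions added for illustration:

```java
class InsertionSortCount {
    // Insertion sort on an int array.
    // Returns how many elements were shifted one position to the right.
    static long insertionSort(int[] a) {
        long shifts = 0;
        for (int i = 1; i < a.length; i++) {
            int toInsert = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > toInsert) {
                a[j + 1] = a[j];   // shift the larger neighbor right
                shifts++;
                j--;
            }
            a[j + 1] = toInsert;   // drop the element into the vacated slot
        }
        return shifts;
    }

    public static void main(String[] args) {
        int[] worst = {6, 5, 4, 3, 2, 1};
        int[] best = {1, 2, 3, 4, 5, 6};
        System.out.println("inverse order: " + insertionSort(worst) + " shifts");
        System.out.println("sorted input:  " + insertionSort(best) + " shifts");
    }
}
```

With N = 6 elements in inverse order the count is 1 + 2 + ... + 5 = 15, and an already sorted input needs no shifts at all, which is why the best and worst cases differ so much for this algorithm.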
Selection Sort
• Find smallest element and put it in a[0].
• Find 2nd smallest element and put it in a[1], etc.
• Less data movement (no bubbling)
Pseudocode:
  for (int i = 0; i < n; i++) {
      find minimum element a[min] in subsequence a[i...n-1]
      exchange a[min] and a[i]
  }
• After iteration i, a[0] ... a[i] are in final position.
[Figure: array a = 2 4 36 55 5 72 23 with markers i and min]
Time Complexity of Selection Sort
  for (int i = 0; i < n-1; i++) {
      int min = i;
      for (int j = i + 1; j < n; j++) {
          if (a[j] < a[min]) {
              min = j;          // remember index of smallest element so far
          }
      }
      int temp = a[min];        // exchange a[min] and a[i]
      a[min] = a[i];
      a[i] = temp;
  }
Worst Case Analysis
• Most executed instructions are those in inner for loop (if)
• Each such instruction is executed (N-1) + (N-2) + ... + 2 + 1 times
• Time complexity: O(N^2)
Comparison of Elementary Sorting Algorithms
Note: The differences in Best and Worst case performance result from the state (ordering) of the input before sorting
                         Selection   Insertion   Bubble
Comparisons   Best       n^2/2       n           n
              Average    n^2/2       n^2/4       n^2/2
              Worst      n^2/2       n^2/2       n^2/2
Movements     Best       0           0           0
              Average    n           n^2/4       n^2/2
              Worst      n           n^2/2       n^2/2
Merge Sort
• Divide-and-Conquer algorithm
• Time complexity: O(N log N)
• Simple recursive formulation
• Stable: preserves initial order of equal keys
Merging Two Sorted Lists
• Time: O(M + N)
  A = 1 5 9 25            (M elements)
  B = 2 3 17              (N elements)
  C = 1 2 3 5 9 17 25     (M + N elements)
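A minimal sketch of the merge step on plain int arrays follows. The class name `MergeDemo` and the array-based signature are assumptions for illustration; the slides do not show the merge code itself:

```java
class MergeDemo {
    // Merges two sorted arrays into one sorted array in O(M + N) time.
    static int[] merge(int[] a, int[] b) {
        int[] c = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            // On ties, take from a first so equal keys keep their order (stable).
            c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        }
        while (i < a.length) c[k++] = a[i++];   // copy what is left of a
        while (j < b.length) c[k++] = b[j++];   // copy what is left of b
        return c;
    }

    public static void main(String[] args) {
        int[] c = merge(new int[]{1, 5, 9, 25}, new int[]{2, 3, 17});
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Each element of A and B is copied exactly once, which is where the O(M + N) bound comes from.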
Outline of Recursive (Top Down) Merge Sort
• Partition sequence into two sub-sequences of N/2 elements.
• Recursively partition and sort each sub-sequence.
• Merge the sorted sub-sequences.
Recursive Merge Sort
• listSequence is sequence to sort.
• first and last are smallest and largest indices of sequence.
  public class Sorts {
      // other code here
      public void mergeSort(ItemSequence listSequence, int first, int last) {
          if (first < last) {
              int middle = (first + last) / 2;
              // recursively mergeSort sub-sequence
              mergeSort(listSequence, first, middle);
              // recursively mergeSort sub-sequence
              mergeSort(listSequence, middle + 1, last);
              // merge sorted sub-sequences back together,
              // code for merge elided
              merge(listSequence, first, middle, last);
          }
      }
  }
Bottom-Up Merge Sort
  7 2 3 5 8 4 1 6        sub-sequences of size 1
  2 7 | 3 5 | 4 8 | 1 6  merged into size 2
  2 3 5 7 | 1 4 6 8      merged into size 4
  1 2 3 4 5 6 7 8        merged into size 8
• This should remind you of ... a tree!!!
Bottom-Up Merge Sort
  for k = 1, 2, 4, 8, ... , N/2 {
      merge all pairs of consecutive sub-sequences of size k
      into sorted sub-sequences of size 2k.
  }
• Number of iterations is log_2 N, rounded up to nearest integer
  • if there are 1000 elements in the list, only 10 iterations are required!!!
• Each iteration (merges of successively fewer but larger sub-sequences) takes O(N) time
• Total time T(N) = O(N log_2 N)
• Sequential Data Access:
  • merge sort accesses data sequentially
  • particularly useful for sorting linked lists and data stored on disk
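The loop above can be sketched in Java on an int array. The class name `BottomUpMergeSort`, the scratch buffer, and the half-open index ranges are assumptions made for this illustration. The outer loop runs while k < N rather than up to N/2, so that lengths that are not powers of two are also handled:

```java
class BottomUpMergeSort {
    // Sorts a in place by merging runs of size 1, 2, 4, ... (bottom-up).
    static void sort(int[] a) {
        int n = a.length;
        int[] buffer = new int[n];                    // scratch space for merging
        for (int k = 1; k < n; k *= 2) {              // about log2(N) iterations
            for (int lo = 0; lo < n - k; lo += 2 * k) {
                int mid = lo + k;                     // start of the right run
                int hi = Math.min(lo + 2 * k, n);     // right run may be short
                merge(a, buffer, lo, mid, hi);
            }
        }
    }

    // Merges the sorted runs a[lo..mid) and a[mid..hi) back into a[lo..hi).
    private static void merge(int[] a, int[] buffer, int lo, int mid, int hi) {
        int i = lo, j = mid, out = lo;
        while (i < mid && j < hi) {
            buffer[out++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        }
        while (i < mid) buffer[out++] = a[i++];
        while (j < hi) buffer[out++] = a[j++];
        System.arraycopy(buffer, lo, a, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 3, 5, 8, 4, 1, 6};
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

Every pass over k touches each element a constant number of times, so each iteration is O(N), matching the analysis above.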
Time Complexity of Recursive Merge Sort
• $T(N) = 2\,T(N/2) + N$ for $N \ge 2$ (2 recursive mergeSorts, plus $N$ for the merge), with $T(1) = 1$
  (1) $T(N) = 2\,T(N/2) + N$
      $T(N) = 2\,[\,2\,T(N/4) + N/2\,] + N$
  (2) $T(N) = 4\,T(N/4) + 2 \cdot (N/2) + N$
  (3) $T(N) = 8\,T(N/8) + 4 \cdot (N/4) + 2 \cdot (N/2) + N$
      ...
  (i) $T(N) = 2^i\,T(N/2^i) + N + N + \dots + N$ (with $i$ terms equal to $N$)
• Stops when $N/2^i = 1$, i.e., $i = \log_2 N$
• $T(N) = 2^{\log_2 N} + N \log_2 N = O(N \log_2 N)$
Sorting and Searching
• Part II: Searching
• Using:
  • sequences (arrays, arraylists)
  • linked lists
  • trees
  • hash tables
Searching In Different Structures
• Searching is one of the most common and fundamental computer tasks. We learned how to sort so that we could search faster.
[Figure: a search function takes a key as input and outputs value(s) or an object. Structures shown: 1. Sequences (key1 ... key4, each with a value); 2. Linked Lists (key1, key2, ..., each with a value); 3. Binary Trees (nodes holding a key and a value)]
Simple Searches in Sequences
• Searching in an unordered sequence:
  • dumb Linear Search
  • time = N when:
    • duplicates must be found
    • worst-case scenario arises
  • time = N/2 for average case in successful searches
  • time = N for unsuccessful search
• Searching in ordered sequence:
  • use binary search (e.g. telephone book)
  • average and worst-case time = log_2 N for successful and unsuccessful searches
  • duplicates can be found by checking adjacent elements
• Using binary search on ordered sequences thus offers logarithmic rather than linear run-time.
• But we don’t always use sequences (e.g. arrays, arraylists, etc.) to store our data
  • insertion and deletion require data movement!
  • linked lists obviate most data movement at cost of extra pointers
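Binary search on an ordered array can be sketched as follows. The class name `BinarySearchDemo` and the convention of returning -1 for an unsuccessful search are assumptions for this example:

```java
class BinarySearchDemo {
    // Returns an index of key in the sorted array a, or -1 if key is absent.
    // Each step halves the remaining range, so at most about log2(N) steps run.
    static int binarySearch(int[] a, int key) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // written this way to avoid int overflow
            if (a[mid] == key) {
                return mid;
            } else if (a[mid] < key) {
                lo = mid + 1;               // key can only be in the right half
            } else {
                hi = mid - 1;               // key can only be in the left half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 5, 9, 17, 25};
        System.out.println(binarySearch(sorted, 9));
        System.out.println(binarySearch(sorted, 4));
    }
}
```

If duplicates matter, the neighbors of the returned index can be checked, as the slide notes.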
Simple Searches in Linked Lists (1/2)
• Searching unordered linked list
  • Linear search; can’t do binary search.
  • Time = N if duplicates must be found or for worst case
  • Time = N / 2 on average for successful searches.
[Figure: linked list with head and tail references; nodes (key1, value), (key2, value), (key3, value)]
Simple Searches in Linked Lists (2/2)
• Searching ordered linked list
  • Time = N / 2 for average case of successful, unsuccessful, and duplicate searches
• A linked list is an easy-to-implement dynamic data structure, but the tradeoff is an increase in running time because we can’t directly access the ith element of the list:
  • O(N) >> O(log_2 N)
[Figure: linked list reached from first, with nodes Aisha, Sam, David, ending in null]
Binary Trees
• Offer advantages of both sequences and linked lists
  • logarithmic search time for mostly balanced trees
    • proportional to depth of tree
    • log_2 N on average
  • easy insertion
    • just ask if node belongs in the left or right subtree!
• What’s the catch?
  • extra reference in each node
  • somewhat more complicated deletions as shown in the Trees lecture
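A minimal binary search tree sketch shows why search time is proportional to depth and why insertion is easy. The class `BstDemo`, its `Node` type with int keys and String values, and the choice to overwrite the value of an existing key are assumptions for illustration; the tree is not rebalanced:

```java
class BstDemo {
    static class Node {
        final int key;
        String value;
        Node left, right;   // the extra references each node has to carry

        Node(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inserts key/value and returns the (possibly new) root of the subtree.
    static Node insert(Node root, int key, String value) {
        if (root == null) return new Node(key, value);
        if (key < root.key) {
            root.left = insert(root.left, key, value);     // belongs in left subtree
        } else if (key > root.key) {
            root.right = insert(root.right, key, value);   // belongs in right subtree
        } else {
            root.value = value;                            // key already present
        }
        return root;
    }

    // Follows one path from the root: the number of steps is at most the tree depth.
    static String search(Node root, int key) {
        Node current = root;
        while (current != null) {
            if (key == current.key) return current.value;
            current = (key < current.key) ? current.left : current.right;
        }
        return null;   // unsuccessful search
    }
}
```

A possible use: start with `BstDemo.Node root = null;`, call `root = BstDemo.insert(root, 50, "fifty");` for each key, then look keys up with `BstDemo.search(root, 50)`. Inserting keys in sorted order would produce a degenerate, list-like tree, which is why the slide says "mostly balanced".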
n-ary Trees
• Anything faster than binary tree?
  • n-ary tree (n references per node)
  • searches in log_n N time
• But
  • difference between logs is small because log (corresponds to tree depth) grows slowly, e.g., log_2 10^6 ≈ 20; log_10 10^6 = 6
  • requires more complex node processing
  • not in common use (in this form)
• Is there a scheme that offers advantages of both log_2 and log_n?
Hybrid Trees
• n-ary at root, binary thereafter.
[Figure: Alphanumeric Keys; root with one branch per first letter (A, B, C, D, E, F, G, ...); each branch leads to a binary tree, e.g., a binary tree with all keys with first letter = A and a binary tree with all keys with first letter = E]
Most Efficient Data Structures
• Most efficient search would be 1-level tree where we could access each key immediately:
[Figure: one-level tree whose root leads directly to key1, key2, ..., key n, each with its value 1, value 2, ..., value n]
• Implement this dream-tree as an array. Then if each key is an integer in range of the array’s size, use it as the index. Use the contents of the array at that location and you’re done!
• Sounds perfect, right?
Problems with Different-Sized Keys
• Creating a table to look up CS15 students based on some ID number would be a tremendous waste of space
  • if ID number is one letter followed by five digits (e.g., D00011), there are 26 * 10^5 combinations!
  • do not want to allocate 2,600,000 words for no more than 250 students (1 word = 4 bytes)
  • array would be rather sparse...
• What about using social security number?
  • would need to allocate 10^9 words, about 4 gigabytes, for no more than 250 students!
• Thus, two major problems:
  • how can we deal with arbitrarily long keys, both numeric and alpha-numeric?
  • how can we build a small, dense (i.e., space-efficient) array that we can index into to find keys and values?
Introduction to Hashing (1/2)
• Hashing refers to deriving an array index from an arbitrarily large key using a hash function.
• Index leads to a value or an object. Therefore, two-step process:
[Figure: key → hash function → index → hash table → value]
Introduction to Hashing (2/2)
• Hash table typically holds several hundred to several thousand entries.
[Figure: array of links to instances of the class TA, indexed 0 to N - 1; Hash(‘Sam’) = 1, Hash(‘David’) = 3, Hash(‘Aisha’) = 5; unused entries are null]
Collisions
• Problem: Normally have more keys than entries in our table. Therefore it is inevitable that two keys hash to same position…
  • e.g., Hash(‘Sam’) = 4
  • and, Hash(‘Ardra’) = 4
• Called a collision: multiple keys hashed to the same index
• But arrays won’t let us store multiple elements at one location!
• This did look too good to be true...
Handling Collisions
• Since, by design, we can’t avoid collisions, we use buckets to catch extra entries
• Consider stupid hash that returns integer value of first letter of each name
• Each entry in table could be a reference to bucket
  • implement bucket with unsorted linked list
  • then we only need to search within (small) linked lists
  • called “chaining”
  • for longer lists, we could sort bucket, or use binary trees; hybrid tree!
[Figure: table entries 55 ... 65 ... 90, each holding the head of a linked list; the lists store “Aisha”, “Sam”, “David”]
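Chaining can be sketched as a table of linked-list buckets. The class `ChainedHashTable`, its `put`/`get` methods, and the String keys and values are assumptions for this illustration. The hash used is the deliberately poor first-letter hash described above, so names that start with the same letter collide and share a bucket. The table size is assumed to be positive:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    private static class Entry {
        final String key;
        String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // One unsorted linked list (bucket) per table position.
    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    ChainedHashTable(int tableSize) {
        for (int i = 0; i < tableSize; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Poor hash on purpose: integer value of the first letter, reduced with %.
    private int indexFor(String key) {
        int first = key.isEmpty() ? 0 : key.charAt(0);
        return first % buckets.size();
    }

    void put(String key, String value) {
        LinkedList<Entry> bucket = buckets.get(indexFor(key));
        for (Entry e : bucket) {
            if (e.key.equals(key)) {   // key already stored: replace its value
                e.value = value;
                return;
            }
        }
        bucket.add(new Entry(key, value));
    }

    String get(String key) {
        // Only the (hopefully small) bucket is searched linearly.
        for (Entry e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```

With this hash, "Aisha" and "Ardra" land in the same bucket, yet both remain retrievable because the bucket is searched by full key.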
Building a Good Hash Function
• Good hash functions
  • take into account all information in key
  • fill out hash table as uniformly/densely as possible
  • avoid collisions?!? Unfortunately, unrealistic
• Thus, a function that uses only first character (or any character) is terrible hashing function.
  • not many Q’s or Z’s, lots of A’s, M’s, etc.
• % (remainder) provides simple method of ensuring that any integer can be brought within desired range (pick prime as table size for best results).
• E.g., take a string, chop it into sections of 4 letters each, then take value of 32 bits that make up each 4-letter section and XOR them together, then % that result by table size
• Almost any reasonable function that uses all bits will do, so choose a fast one!
  • example: hashValue = (key.getValue()) % 101;
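The 4-letter XOR scheme can be sketched as below. The class name `XorFoldHash` is hypothetical, and the sketch assumes each character contributes only its low 8 bits (fine for ASCII names); `Math.floorMod` is used instead of a bare `%` so that a negative 32-bit value still maps into the table range:

```java
class XorFoldHash {
    // Chops key into sections of 4 characters, packs each section into 32 bits,
    // XORs the sections together, and reduces the result modulo tableSize.
    static int hash(String key, int tableSize) {
        int folded = 0;
        for (int start = 0; start < key.length(); start += 4) {
            int end = Math.min(start + 4, key.length());
            int chunk = 0;
            for (int i = start; i < end; i++) {
                // Assumption: one byte per character (low 8 bits only).
                chunk = (chunk << 8) | (key.charAt(i) & 0xFF);
            }
            folded ^= chunk;
        }
        return Math.floorMod(folded, tableSize);   // always in 0 .. tableSize - 1
    }

    public static void main(String[] args) {
        int tableSize = 101;   // prime table size, as in the slide's example
        System.out.println(hash("Aisha", tableSize));
        System.out.println(hash("Sam", tableSize));
        System.out.println(hash("David", tableSize));
    }
}
```

One design caveat of plain XOR folding: identical 4-letter sections cancel out, so a key such as "abcdabcd" folds to 0. That is acceptable for a sketch but worth knowing when keys are repetitive.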
Announcements
• T-Shirt Design Contest! Email your designs to the HTAs
  "Clothes make the man. Naked people have little or no influence on society." - Mark Twain
• Tetris Design Checks start tomorrow!
• Tetris on-time handin 11/21 at 11:59PM