Quadratic Probing
Outline Problems with linear probing and primary clustering Outline of quadratic probing • insertions, searching • restrictions • deletions • weaknesses
Quadratic Probing Primary clustering occurs with linear probing because every search follows the same linear pattern: • If a bin is inside a cluster, then the next bin must either: • Also be in that cluster, or • Expand the cluster Instead of searching forward in a linear fashion, consider searching forward using a quadratic function
Quadratic Probing Suppose that an element should appear in bin h: • if bin h is occupied, then check the following sequence of bins, with every index taken modulo M: h + 1², h + 2², h + 3², h + 4², h + 5², ..., that is, h + 1, h + 4, h + 9, h + 16, h + 25, ... [Figure: example with M = 17]
Quadratic Probing If one of the bins h + i² falls into a cluster, this does not imply that the next one, h + (i + 1)², will as well
Quadratic Probing For example, suppose an element is to be inserted in bin 23 of a hash table with 31 bins. The sequence in which the bins would be checked is: 23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0
Quadratic Probing Even if two initial bins are close, the sequences of bins checked afterwards differ greatly. Again with M = 31 bins, compare the first 16 bins checked when starting from bins 22 and 23: • from 22: 22, 23, 26, 0, 7, 16, 27, 9, 24, 10, 29, 19, 11, 5, 1, 30 • from 23: 23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0
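The two probe sequences above can be reproduced with a short C++ sketch. It assumes bins are indexed with std::size_t and that M is an odd prime; the function name quadratic_probe_sequence is hypothetical, not part of the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first (M + 1) / 2 bins that quadratic probing examines when the
// initial bin is h in a table of M bins: (h + i*i) mod M for i = 0, ..., (M - 1) / 2.
// When M is an odd prime these bins are all distinct.
std::vector<std::size_t> quadratic_probe_sequence(std::size_t h, std::size_t M) {
    assert(M > 0 && h < M);
    std::vector<std::size_t> bins;
    for (std::size_t i = 0; i <= (M - 1) / 2; ++i) {
        bins.push_back((h + i * i) % M);
    }
    return bins;
}
```

Calling quadratic_probe_sequence(22, 31) and quadratic_probe_sequence(23, 31) yields the two 16-bin rows listed on the slide.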
Quadratic Probing Thus, quadratic probing solves the problem of primary clustering. Unfortunately, there is a second problem that must be dealt with: • Suppose we have M = 8 bins: 1² ≡ 1, 2² ≡ 4, 3² ≡ 1 (mod 8) • In this case, we check bin h + 1 a second time after having checked only one other bin
Quadratic Probing Unfortunately, there is no guarantee that (h + i²) mod M will cycle through all of 0, 1, ..., M − 1. Solution: • Require that M be an odd prime • In this case, the bins (h + i²) mod M for i = 0, ..., (M − 1)/2 are all distinct, so exactly (M + 1)/2 different bins are checked before any bin repeats
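The claim that these (M + 1)/2 bins are distinct follows from a short argument, sketched here assuming M is an odd prime and 0 ≤ j < i ≤ (M − 1)/2:

```latex
\begin{aligned}
(h + i^2) \equiv (h + j^2) \pmod{M}
  &\iff M \mid i^2 - j^2 = (i - j)(i + j) \\
  &\implies M \mid (i - j) \ \text{or}\ M \mid (i + j)
  \qquad \text{(since $M$ is prime)}
\end{aligned}
```

Here 0 < i − j < M and 0 < i + j ≤ M − 2, so M divides neither factor and the two bins cannot coincide. Conversely, (M − i)² ≡ i² (mod M), so values of i beyond (M − 1)/2 only revisit these (M + 1)/2 bins.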
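Quadratic Probing Example The offsets i² mod M for i = 0, 1, 2, ...; in each row the final value is the first repeat: • M = 11: 0, 1, 4, 9, 16 ≡ 5, 25 ≡ 3, 36 ≡ 3 • M = 13: 0, 1, 4, 9, 16 ≡ 3, 25 ≡ 12, 36 ≡ 10, 49 ≡ 10 • M = 17: 0, 1, 4, 9, 16, 25 ≡ 8, 36 ≡ 2, 49 ≡ 15, 64 ≡ 13, 81 ≡ 13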
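A C++ sketch that counts how many distinct offsets i² mod M can ever occur (the function name is hypothetical). It makes no assumption about M, so it covers both the prime table sizes in the example and the M = 8 case, where only the offsets 0, 1 and 4 appear.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct offsets (i * i) mod M. Because (i + M)^2 ≡ i^2 (mod M),
// trying i = 0, ..., M - 1 covers every offset that can ever occur, so from any
// initial bin h quadratic probing reaches at most this many different bins.
std::size_t distinct_quadratic_offsets(std::size_t M) {
    assert(M > 0);
    std::set<std::size_t> offsets;
    for (std::size_t i = 0; i < M; ++i) {
        offsets.insert((i * i) % M);
    }
    return offsets.size();
}
```

For M = 11, 13 and 17 this gives 6, 7 and 9, that is, (M + 1)/2 in each case; for M = 8 it gives only 3.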
Quadratic Probing Thus, quadratic probing avoids primary clustering • Unfortunately, we are not guaranteed that we will use all the bins. In practice, if the hash function is reasonable, this is not a significant problem until the load factor λ approaches 1
Quadratic Probing For example, in a hash table with M = 19 bins using quadratic probing, insert the following random 3-digit numbers: 086, 198, 466, 709, 973, 981, 374, 766, 473, 342, 191, 393, 300, 011, 538, 913, 220, 844, 565, using each number modulo 19 as its initial bin
Quadratic Probing The first two fall into their correct bins: 086 → 10, 198 → 8 The next one already causes a collision: 466 → 10 → 11 The next four cause no collisions: 709 → 6, 973 → 4, 981 → 12, 374 → 13 Then another collision: 766 → 6 → 7
Quadratic Probing At this point, there are two clusters and the load factor is λ = 8/19 ≈ 0.42
Quadratic Probing The next three also go into their appropriate bins: 473 → 17, 342 → 0, 191 → 1 Then there is one more collision, 393 → 13 → 14, and 300 falls into its correct bin: 300 → 15
Quadratic Probing With five more insertions, the load factor is λ = 13/19 ≈ 0.68, with one large cluster
Quadratic Probing At this point, insertions become more tedious: • 011 → 11 → 12 → 15 → 1 → 8 → 17 → 9 • 538 → 6 → 7 → 10 → 15 → 3 • 913 → 1 → 2 • 220 → 11 → ⋯ → 9 → 3 → 18 • 844 → 8 → 9 → 12 → 17 → 5
Quadratic Probing To show how quadratic probing works, consider the addition of 538, starting in bin 6: the first four bins checked (6, 7, 10, 15) all fall within the same cluster; however, the fifth bin checked, 3, falls far outside the cluster
Quadratic Probing At this point, the array is almost full (only bin 16 is open) and the load factor is λ = 18/19 ≈ 0.95. If we try to add the last number, 565, the sequence of bins checked is 14 → 15 → 18 → 4 → 11 → 1 → 12 → 6 → 2 → 0, which never hits bin 16
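The whole M = 19 walkthrough can be replayed with a minimal C++ sketch. It assumes non-negative int keys, uses -1 to mark an empty bin, and stops after (M + 1)/2 probes; the names InsertResult and quadratic_insert are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Outcome of one insertion: whether it succeeded and which bins were examined.
struct InsertResult {
    bool inserted;                    // false if no empty bin was reached
    std::vector<std::size_t> probes;  // bins examined in order; on success the
                                      // last one is where the key was stored
};

// Initial bin is key mod M; quadratic probing then checks (h + i*i) mod M for
// i = 1, ..., (M - 1) / 2. For prime M any further probes would only revisit
// bins already checked, so the loop stops there.
InsertResult quadratic_insert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t M = table.size();
    const std::size_t h = static_cast<std::size_t>(key) % M;
    InsertResult result{false, {}};
    for (std::size_t i = 0; i <= (M - 1) / 2; ++i) {
        const std::size_t bin = (h + i * i) % M;
        result.probes.push_back(bin);
        if (table[bin] == -1) {  // empty bin: store the key here
            table[bin] = key;
            result.inserted = true;
            break;
        }
    }
    return result;
}
```

Inserting the first 18 numbers into std::vector<int>(19, -1) reproduces the probe sequences from the slides, and inserting 565 afterwards returns inserted == false with probes 14, 15, 18, 4, 11, 1, 12, 6, 2, 0. When typing the keys into C++, drop the leading zeros: 011 is an octal literal (nine) and 086 does not compile.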
Quadratic versus Linear Probing We can compare the number of probes required with that of linear probing: • 086 → 10 • 198 → 8 • 466 → 10 → 11 • 709 → 6 • 973 → 4 • 981 → 12 • 374 → 13 • 766 → 6 → 7 • 473 → 17 • 342 → 0 • 191 → 1 • 393 → 13 → 14 • 300 → 15 • 011 → 11 → 12 → 13 → 14 → 15 → 16 • 538 → 6 → 7 → 8 → 9 • 913 → 1 → 2 • 220 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18 • 844 → 8 → 9 → 10 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3 • 565 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3 → 4 → 5
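To make the comparison concrete, here is a minimal C++ sketch of linear probing that counts the bins examined per insertion. As before, it assumes non-negative int keys and -1 for an empty bin; linear_insert is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear probing: examine key mod M, then each following bin (wrapping around
// to 0) until an empty bin (-1) is found. Returns the number of bins examined,
// or 0 if every bin is already occupied.
std::size_t linear_insert(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const std::size_t M = table.size();
    const std::size_t h = static_cast<std::size_t>(key) % M;
    for (std::size_t i = 0; i < M; ++i) {
        const std::size_t bin = (h + i) % M;
        if (table[bin] == -1) {
            table[bin] = key;
            return i + 1;
        }
    }
    return 0;
}
```

With the 19 keys above (again written without leading zeros), the counts sum to 51 bins for the first 18 insertions, compared with 44 for the quadratic-probing sequences listed earlier; 565 then lands in bin 5 after 11 probes, 62 bins in total.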
Erase We have seen how to perform insertions; next come deletions. With linear probing, when we deleted the contents of a bin, we had to search ahead to determine whether any entries had to be moved back: • easy with linear probing; we simply moved from bin to bin until an empty bin was located
Erase The nonlinear probe sequences of quadratic probing do not allow us to do this efficiently • For example, suppose we delete 466, which is currently in bin 11: • The two other entries whose probe sequences pass through bin 11 are 011 and 220 • We cannot (efficiently) find these entries
Erase • Solution: • associate with each bin a field which is either UNOCCUPIED, OCCUPIED, or ERASED
Erase Initially, all bins are marked UNOCCUPIED When a bin is filled, it is marked OCCUPIED If a bin is emptied (as a result of a remove), it is marked ERASED • Note that a bin marked ERASED may once again be filled (and hence marked OCCUPIED)
Erase For example, given a hash table with M = 11 bins, use quadratic probing to enter the values 135, 909, 246, 894, 518, 365
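Erase This results in the following occupied bins (all others are unoccupied): 1: 894, 2: 518, 3: 135, 4: 246, 6: 365, 7: 909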
Erase Next, erase 135, 909, and 246, which marks bins 3, 7, and 4 as ERASED
Erase In searching for 894, we skip over bins 3, 4, and 7, which are marked as erased, and find 894 in bin 1
Erase In searching for 575 (which also starts at bin 3), we would examine bins 3, 4, 7, 1, 8. The last bin is unoccupied, so 575 is not in the hash table
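The three bin states can be combined into a small table with lazy deletion. This is a minimal C++ sketch, not the course's implementation: it assumes non-negative int keys, does not detect duplicate insertions, and the class and member names (QuadraticHashTable, locate, npos) are hypothetical; only UNOCCUPIED, OCCUPIED and ERASED come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bin states as named on the slides.
enum class BinState { UNOCCUPIED, OCCUPIED, ERASED };

class QuadraticHashTable {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit QuadraticHashTable(std::size_t M)
        : keys_(M, 0), states_(M, BinState::UNOCCUPIED) {
        assert(M > 0);
    }

    // Stores key in the first bin on its probe sequence that is not OCCUPIED,
    // so ERASED bins are reused. Simplification: duplicates are not detected.
    bool insert(int key) {
        for (std::size_t i = 0; i <= max_i(); ++i) {
            const std::size_t b = bin(key, i);
            if (states_[b] != BinState::OCCUPIED) {
                keys_[b] = key;
                states_[b] = BinState::OCCUPIED;
                return true;
            }
        }
        return false;  // no free bin among the (M + 1) / 2 probed
    }

    // Returns the bin holding key, or npos. ERASED bins are skipped; the
    // search stops at the first UNOCCUPIED bin.
    std::size_t locate(int key) const {
        for (std::size_t i = 0; i <= max_i(); ++i) {
            const std::size_t b = bin(key, i);
            if (states_[b] == BinState::UNOCCUPIED) return npos;
            if (states_[b] == BinState::OCCUPIED && keys_[b] == key) return b;
        }
        return npos;
    }

    bool find(int key) const { return locate(key) != npos; }

    // Marks the key's bin ERASED; no other entries are moved.
    bool erase(int key) {
        const std::size_t b = locate(key);
        if (b == npos) return false;
        states_[b] = BinState::ERASED;
        return true;
    }

    BinState state(std::size_t b) const { return states_[b]; }

private:
    std::size_t max_i() const { return (keys_.size() - 1) / 2; }

    // i-th bin on the quadratic probe sequence of key.
    std::size_t bin(int key, std::size_t i) const {
        const std::size_t M = keys_.size();
        return (static_cast<std::size_t>(key) % M + i * i) % M;
    }

    std::vector<int> keys_;
    std::vector<BinState> states_;
};
```

Replaying the example (insert the six values, then erase 135, 909 and 246) leaves 894 reachable in bin 1 and reports 575 as absent; a later insert of 575 would reuse the ERASED bin 3.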
Secondary Clustering The phenomenon of primary clustering does not occur with quadratic probing However, if multiple items all hash to the same initial bin, the same sequence of numbers will be followed • This is termed secondary clustering • The effect is less significant than that of primary clustering
Secondary Clustering Secondary clustering may be a problem if the hash function does not produce an even distribution of entries One solution to secondary clustering is double hashing: associating with each element an initial bin (defined by one hash function) and a skip (defined by a second hash function)
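Double hashing can be sketched in C++ as follows. The slides do not specify the second hash function; this sketch assumes h1(k) = k mod M and a hypothetical skip h2(k) = 1 + (k mod (M − 1)), which with prime M always lies in 1..M − 1 and is therefore coprime to M, so every bin is eventually visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Double hashing: the i-th bin examined is (h1 + i * skip) mod M.
// Assumptions (illustrative only): M is prime, h1 = key mod M,
// skip = 1 + key mod (M - 1).
std::vector<std::size_t> double_hash_sequence(std::size_t key, std::size_t M,
                                              std::size_t count) {
    assert(M > 1);
    const std::size_t h1 = key % M;
    const std::size_t skip = 1 + key % (M - 1);
    std::vector<std::size_t> bins;
    for (std::size_t i = 0; i < count; ++i) {
        bins.push_back((h1 + i * skip) % M);
    }
    return bins;
}
```

With M = 11, the keys 14 and 25 share the initial bin 3, but their skips are 5 and 6, so they continue 3, 8, 2, ... and 3, 9, 4, ... respectively instead of following the same sequence.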
Example Insert the 6 elements 14, 107, 31, 118, 34, 112 into an initially empty hash table of size 11 using quadratic probing Let the hash function be the number modulo 11
Insert 14, 107, 31, 118, 34, 112 The first three fall into bins 3, 8, and 9, respectively
Insert 14, 107, 31, 118, 34, 112 118 also falls into bin 8 (occupied) Thus, we check: • 8 + 1 = 9 - occupied • 8 + 4 = 12 ≡ 1 - unoccupied
Insert 14, 107, 31, 118, 34, 112 34 falls into bin 1, which is occupied, thus we check: • 1 + 1 = 2 - unoccupied
Insert 14, 107, 31, 118, 34, 112 112 falls into bin 2, which is now occupied, thus we check: • 2 + 1 = 3 - occupied • 2 + 4 = 6 - unoccupied
Insert 14, 107, 31, 118, 34, 112 At this point, the hash table is over half full (6 of 11 bins) We are no longer guaranteed that a new element can be inserted Solution: increase the size of the table (perhaps only after an insertion fails) • Problem: the new size must also be prime
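Choosing the new prime size can be sketched in C++ as below. The growth policy (roughly doubling, then rounding up to a prime) is an assumption, not something the slides prescribe, and the function names are hypothetical. After resizing, every stored element has to be reinserted, since its initial bin k mod M changes with M.

```cpp
#include <cassert>
#include <cstddef>

// Trial-division primality test; adequate for hash-table sizes.
bool is_prime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Smallest prime >= n, e.g. next_prime(2 * M) as a candidate new table size.
std::size_t next_prime(std::size_t n) {
    while (!is_prime(n)) ++n;
    return n;
}
```

For the 11-bin table above, next_prime(22) gives 23.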
Erase To erase an element, we simply find it and mark its bin as ERASED In our example, to remove 118 we begin in bin 8 and continue to check bins 9 and then 1 Mark that bin as having had an element erased
Find To find an element, we start by checking the bin it should initially have been placed in, and then continue checking bins following quadratic probing until either: • we find it, or • we find a bin that is neither occupied nor erased
Find We find 14 in bin 3 We don’t find 34 in bin 1 (marked as erased), so we check bin 1 + 1 = 2 and find it
Find We search for 19 starting in bin 8 (19 mod 11 = 8) Not finding it, we check: • 8 + 1 = 9 - occupied • 8 + 4 = 12 ≡ 1 - erased • 8 + 9 = 17 ≡ 6 - occupied • 8 + 16 = 24 ≡ 2 - occupied • 8 + 25 = 33 ≡ 0 - unoccupied: not found
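The three searches can be traced with a minimal C++ sketch that records every bin examined. It assumes non-negative int keys; Bin and find_traced are hypothetical names, and the table state after erasing 118 is written out by hand in the usage below.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class BinState { UNOCCUPIED, OCCUPIED, ERASED };

struct Bin {
    BinState state = BinState::UNOCCUPIED;
    int key = 0;
};

// Searches for key with quadratic probing, appending each bin examined to
// `examined`. ERASED bins are skipped; the search stops at the first
// UNOCCUPIED bin or after (M + 1) / 2 probes. Returns true if key is found.
bool find_traced(const std::vector<Bin>& table, int key,
                 std::vector<std::size_t>& examined) {
    const std::size_t M = table.size();
    assert(M > 0 && key >= 0);
    const std::size_t h = static_cast<std::size_t>(key) % M;
    for (std::size_t i = 0; i <= (M - 1) / 2; ++i) {
        const std::size_t b = (h + i * i) % M;
        examined.push_back(b);
        if (table[b].state == BinState::UNOCCUPIED) return false;
        if (table[b].state == BinState::OCCUPIED && table[b].key == key) return true;
    }
    return false;
}
```

With 14, 34, 112, 107 and 31 OCCUPIED in bins 3, 2, 6, 8 and 9 and bin 1 ERASED, searching for 14 examines bin 3, searching for 34 examines bins 1 and 2, and searching for 19 examines bins 8, 9, 1, 6, 2, 0 before reporting "not found".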
These slides are provided for the ECE 250 Algorithms and Data Structures course. The material in it reflects Douglas W. Harder’s best judgment in light of the information available to him at the time of preparation. Any reliance on these course slides by any party for any other purpose are the responsibility of such parties. Douglas W. Harder accepts no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which it was intended.