1.27k likes | 1.42k Views
Hash Tables. Briana B. Morrison Adapted from William Collins. Sequential Search. Given a vector of integers: v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44} What is the best case for sequential search? O(1) when value is the first element What is the worst case?
E N D
Hash Tables Briana B. Morrison Adapted from William Collins
Sequential Search • Given a vector of integers: v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44} • What is the best case for sequential search? • O(1) when value is the first element • What is the worst case? • O(n) when value is last element, or value is not in the list • What is the average case? • O(1/2 * n) which is O(n) Hashing
Binary Search • Given a vector of integers: v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76} • What is the best case for binary search? • O(1) when element is the middle element • What is the worst case? • O(log n) when element is first, last, or not in list • What is the average case? • O(log n) Hashing
Map vs. Hashmap • What are the differences between a map and a hashmap? • Interface • Efficiency • Applications • Implementation Hashing
CONTIGUOUS array?vector?deque? heap? LINKED Linked? list? map? BUT NONE OF THESE WILL GIVE CONSTANT AVERAGE TIME FOR SEARCHES, INSERTIONS AND REMOVALS. Hashing
To make these values fit into the table, we need to mod by the table size; i.e., key % 1000. 210 256 816 OOPS! Hashing
Hash Codes • Suppose we have a table of size N • A hash code is: • A number in the range 0 to N-1 • We compute the hash code from the key • You can think of this as a “default position” when inserting, or a “position hint” when looking up • A hash function is a way of computing a hash code • Desire: The set of keys should spread evenly over the N values • When two keys have the same hash code: collision Hashing
Hash Functions • A hash function should be quick and easy to compute. • A hash function should achieve an even distribution of the keys that actually occur across the range of indices for both random and non-random data. • Calculation should involve the entire search key. Hashing
Examples of Hash Functions • Usually involves taking the key, chopping it up, mix the pieces together in various ways • Examples: • Truncation – ignore part of key, use the remaining part as the index • Folding – partition the key into several parts and combine the parts in a convenient way (adding, etc.) • After calculating the index, use modular arithmetic. Divide by the size of the index range, and take the remainder as the result Hashing
Example Hash Function Hashing
Devising Hash Functions • Simple functions often produce many collisions • ... but complex functions may not be good either! • It is often an empirical process • Adding letter values in a string: same hash for strings with same letters in different order • Better approach: size_t hash = 0; for (size_t i = 0; i < s.size(); ++i) hash = hash * 31 + s[i]; Hashing
Devising Hash Functions (2) • The String hash is good in that: • Every letter affects the value • The order of the letters affects the value • The values tend to be spread well over the integers Hashing
Devising Hash Functions (3) • Guidelines for good hash functions: • Spread values evenly: as if “random” • Cheap to compute • Generally, number of possible values much greater than table size Hashing
Memory address: We reinterpret the memory address of the key object as an integer Good in general, except for numeric and string keys Integer cast: We reinterpret the bits of the key as an integer Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines) Component sum: We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows) Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines) Hash Code Maps Hashing
Polynomial accumulation: We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits)a0 a1 … an-1 We evaluate the polynomial p(z)= a0+a1 z+a2 z2+ … … +an-1zn-1 at a fixed value z, ignoring overflows Especially suitable for strings (e.g., the choice z =33gives at most 6 collisions on a set of 50,000 English words) Polynomial p(z) can be evaluated in O(n) time using Horner’s rule: The following polynomials are successively computed, each from the previous one in O(1) time p0(z)= an-1 pi(z)= an-i-1 +zpi-1(z) (i =1, 2, …, n -1) We have p(z) = pn-1(z) Hash Code Maps (cont.) Hashing