Algorithms Hashing
Hash Functions • Hash functions have many uses • The best-known use is building hash tables
Hashing versus Direct Addressing • Direct addressing is often used for tables. • A unique key for an item is used as its address in an array • If the universe of key values (U) is very large, it may be impractical or impossible to use direct addressing • The set K of keys actually stored may be small relative to U, so much of the allocated space may be wasted • Hashing is used for more efficient storage in data structures called hash tables
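A minimal sketch of direct addressing, assuming integer keys drawn from U = {0, 1, ..., |U| − 1}. The class and method names are hypothetical. The point is that the table allocates one slot per possible key, so its size is proportional to |U| even when only a few keys are stored.

```python
class DirectAddressTable:
    """Direct addressing: the key itself is the array index.

    Assumes keys are integers in 0 .. universe_size - 1.
    """

    def __init__(self, universe_size):
        # One slot per possible key: space grows with |U|, not with |K|.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value

    def search(self, key):
        # Returns None when no element with this key is stored.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(1000)   # 1000 slots allocated...
table.insert(42, "answer")         # ...to store a single key
print(table.search(42))            # answer
print(table.search(7))             # None
```

Every operation is a single array access, but 999 of the 1000 slots here sit empty, which is the waste that hashing avoids.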
Hash Tables • When the set K of keys stored in a dictionary is much smaller than the universe U of possible keys, a hash table requires much less storage than a direct-address table. • Storage requirements are $\Theta(|K|)$ • Searching still takes $O(1)$ time on average.
Hash Table Notation • The element with key k is stored in • Slot k with direct addressing • Slot h(k) with hashing • The hash function h() is used to compute the slot from the key k • The function h maps the universe U of keys into the slots of a hash table T[0..m-1], that is, $h: U \to \{0, 1, \ldots, m-1\}$
Terminology • An element with key k hashes to slot h(k). • The value h(k) is the hash value of key k.
[Figure: the hash function h maps the actual keys K = {k1, k2, k3, k4, k5}, drawn from the universe U, into slots of table T; h(k2) = h(k5), so keys k2 and k5 collide.]
Collisions • When two keys hash to the same slot a collision results. • Since |U| > m, collisions are unavoidable. • The simplest (and often effective) collision resolution technique is chaining.
[Figure: collision resolution by chaining; each slot of T points to a linked list of the keys from K that hash to it, so k2 and k5 share one chain.]
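A minimal sketch of chaining, assuming Python's built-in `hash()` reduced modulo m stands in for h(k), and a Python list stands in for each slot's linked list. The class name is hypothetical, and unlike a bare insert-at-head, this version replaces the value when the key is already present.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining.

    Slot T[j] holds a list (the "chain") of (key, value) pairs whose keys hash to j.
    """

    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, key):
        # Assumption: built-in hash() mod m plays the role of h(k).
        return hash(key) % self.m

    def insert(self, key, value):
        chain = self.table[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # otherwise extend the chain

    def search(self, key):
        # Only the one chain the key hashes to is scanned.
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        j = self._h(key)
        self.table[j] = [(k, v) for (k, v) in self.table[j] if k != key]


t = ChainedHashTable(m=4)
t.insert(1, "one")
t.insert(5, "five")      # hash(5) % 4 == 1, so this collides with key 1
print(t.table[1])        # [(1, 'one'), (5, 'five')]
print(t.search(5))       # five
```

A search costs one hash computation plus a scan of a single chain, which is why the average cost depends on how evenly h spreads the keys.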
Hash Functions • A good hash function satisfies (approximately) the assumption of simple uniform hashing: • each key is equally likely to hash to any of the m slots • the hash value for a particular key is independent of the hash value for any other key. • It is typically not possible to check this condition because • we do not know the probability distribution from which the keys are drawn. • The keys may not be drawn independently.
Example: If the keys are known to be random real numbers k independently and uniformly distributed in the range 0 < k < 1, the hash function $h(k) = \lfloor km \rfloor$ satisfies the condition of simple uniform hashing.
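A quick empirical check of this example, assuming h(k) = ⌊km⌋ and using Python's `random.random()` as the source of uniform keys (it returns values in [0, 1), and a rare 0.0 still maps to slot 0). The function name `h`, the seed, and m = 10 are choices made for illustration.

```python
import math
import random
from collections import Counter


def h(k, m):
    # h(k) = floor(k * m) sends a real key 0 <= k < 1 to a slot in 0 .. m-1.
    return math.floor(k * m)


random.seed(1)
m = 10
counts = Counter(h(random.random(), m) for _ in range(100_000))
print(sorted(counts.items()))   # each of the 10 slots receives roughly 10,000 keys
```

The near-equal counts only reflect the assumed key distribution; the same h would cluster badly if the keys were, say, all close to 0.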
The Division Method • This is a heuristic hash function that is often effective. • The hash value is the remainder of k divided by m: $h(k) = k \bmod m$ • Avoid values of m that are powers of 2: if $m = 2^p$, then h(k) is just the p lowest-order bits of k • A good choice for m is often a prime that is not too close to an exact power of 2
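A small illustration of why a power of 2 is a risky m for the division method h(k) = k mod m. The keys below are hypothetical: they differ only above their lowest 8 bits. The prime 701 is an assumed example of a prime not close to a power of 2 (it lies between 512 and 1024).

```python
def division_hash(k, m):
    # Division method: the hash value is the remainder of k divided by m.
    return k % m


# Hypothetical keys that all end in the same low byte, 0x12.
keys = [0x012, 0x112, 0x212, 0x312]

# m = 2**8: k mod 256 keeps only the low 8 bits, so every key lands in slot 18.
print([division_hash(k, 256) for k in keys])   # [18, 18, 18, 18]

# m = 701 (prime): the high-order bits now influence the slot.
print([division_hash(k, 701) for k in keys])   # [18, 274, 530, 85]
```

With m = 256 the hash ignores everything but the last byte, so any pattern in those bits becomes a pattern of collisions.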
The Multiplication Method • A two-step method: • Multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. • Multiply this value by m and take the floor of the result. • More precisely, $h(k) = \lfloor m \, (kA \bmod 1) \rfloor$, where $kA \bmod 1 = kA - \lfloor kA \rfloor$ is the fractional part of kA. • Some values of A work better than others (see text).
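A floating-point sketch of the two steps, h(k) = ⌊m (kA mod 1)⌋. The constant A = (√5 − 1)/2 ≈ 0.618 is an assumption here (it is the value Knuth suggests); the key 123456 and m = 2^14 are illustrative. Because the value of m is not critical for this method, a power of 2 is acceptable.

```python
import math

# Assumption: A = (sqrt(5) - 1) / 2, one commonly suggested constant in (0, 1).
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(k, m, a=A):
    frac = (k * a) % 1.0           # step 1: fractional part of kA
    return math.floor(m * frac)    # step 2: scale by m and take the floor


print(multiplication_hash(123456, 2**14))   # 67
```

Floating point loses precision for very large keys; practical versions usually do the same computation in fixed-point integer arithmetic.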
Perfect Hashing • Static hashing: once the set of keys is stored in the table, the set never changes. • Set of reserved words in a programming language • Set of file names on a read-only CD • Perfect hashing: the worst-case number of memory accesses required to perform a search is $O(1)$
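One simple way to obtain perfect hashing for a static set of n distinct integer keys is to use a table of size m = n² and draw random functions from the universal family h(k) = ((ak + b) mod p) mod m until none of the keys collide. This construction is an assumption for illustration, not taken from the slides. It is the quadratic-space variant; the two-level scheme achieves linear space. With m = n², each random draw is collision-free with probability greater than 1/2, so few retries are expected. The prime p = 2³¹ − 1 and all function names are hypothetical choices, and keys are assumed to lie in [0, p).

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1; assumes every key is an integer in [0, P)


def make_hash(a, b, m):
    # Universal family: h(k) = ((a*k + b) mod p) mod m
    return lambda k: ((a * k + b) % P) % m


def build_perfect_table(keys, rng=random):
    """Retry random universal hash functions until the n distinct keys
    land in n different slots of a table with m = n**2 slots."""
    keys = list(keys)
    m = max(1, len(keys) ** 2)
    while True:
        a = rng.randrange(1, P)
        b = rng.randrange(0, P)
        h = make_hash(a, b, m)
        table = [None] * m
        for k in keys:
            slot = h(k)
            if table[slot] is not None:
                break              # collision: discard h and draw another
            table[slot] = k
        else:
            return h, table        # no collisions: h is perfect for these keys


def lookup(h, table, key):
    # One hash computation and one probe: O(1) memory accesses in the worst case.
    return table[h(key)] == key


random.seed(0)
keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
h, table = build_perfect_table(keys)
print(lookup(h, table, 37))   # True
print(lookup(h, table, 38))   # False
```

The retry loop runs only once, when the static key set is loaded; after that every search is a single probe.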