
the hash table


adolph

Presentation Transcript


  1. the hash table

  2. hash table

  3. hash table A hash table consists of two major components …

  4. hash table … a bucket array

  5. hash table … and a hash function

  6. hash table Performance is expected to be O(1)

  7. bucket array

  8. bucket array hash table
     • A bucket array is an array A of size N
     • A[i] is a bucket, i.e. a collection of <key,value> pairs
     • N is the capacity of A
     • <k,e> is inserted in A[k]
     • if keys are unique integers, well distributed in the range 0 .. N-1, then each bucket holds at most one entry
     • consequently O(1) for get, insert, delete
     • downside: space is proportional to N
       • if N is much larger than n (number of entries) we waste space
     • downside: keys must be in range 0 .. N-1
       • this may not be the case (think matric number)
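
The bucket array described above can be sketched in Java as follows. This is a minimal illustration, not code from the slides: the class name `DirectAddressTable` and its methods are hypothetical, and it assumes keys are unique integers in 0 .. N-1, so each bucket holds at most one value.

```java
// Hypothetical sketch: a bucket array indexed directly by integer keys in 0 .. N-1.
final class DirectAddressTable<V> {
    private final Object[] buckets; // buckets[k] holds the value for key k, or null if empty

    DirectAddressTable(int capacity) {
        buckets = new Object[capacity]; // space is proportional to N, however few entries we store
    }

    // Keys outside 0 .. N-1 cannot be stored at all: the second downside on the slide.
    private void check(int key) {
        if (key < 0 || key >= buckets.length) {
            throw new IllegalArgumentException("key out of range: " + key);
        }
    }

    void put(int key, V value) { check(key); buckets[key] = value; } // O(1)

    @SuppressWarnings("unchecked")
    V get(int key) { check(key); return (V) buckets[key]; }           // O(1)

    void remove(int key) { check(key); buckets[key] = null; }         // O(1)

    public static void main(String[] args) {
        DirectAddressTable<String> table = new DirectAddressTable<>(11);
        table.put(1, "D");
        table.put(7, "Q");
        System.out.println(table.get(7));
        System.out.println(table.get(2)); // empty bucket
    }
}
```

Each operation is a single array access, which is where the O(1) claim comes from; the price is an array of N slots even when only a handful are used.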

  9. bucket array hash table
     Bucket array of size 11 (indices 0 .. 10) for the entries (1,D), (3,C), (3,F), (6,C) and (7,Q). If the hashed keys are unique and in the range [0,10] then each bucket holds at most one entry. Otherwise we have a collision and need to deal with it; here (3,C) and (3,F) both map to bucket 3.

  10. collision bucket array hash table When two different entries map to the same bucket we have a collision.

  11. collision bucket array hash table When two different entries map to the same bucket we have a collision. It’s good to avoid collisions.

  12. hash function

  13. hash function hash table
      A hash function h maps each key to an integer in the range [0,N-1]. Given entry <k,e>, h(k) is the index into the bucket array: store entry <k,e> in A[h(k)].
      • h is a good hash function if
        • h maps keys so as to minimise collisions
        • h is easy to compute/program
        • h is fast to compute
      • h(k) has two actions
        • map k to a hash code
        • map hash code into range [0,N-1]
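
The two actions of h(k) can be sketched as two small Java methods. The names `hashCodeOf`, `compress` and `h` are hypothetical, and using Java's built-in `hashCode()` for the first step is an assumption made only to keep the example short.

```java
// Hypothetical sketch of the two actions of a hash function h(k).
final class TwoStepHash {
    // Action 1: map the key to an integer hash code (assumption: reuse Java's hashCode()).
    static int hashCodeOf(Object key) {
        return key.hashCode();
    }

    // Action 2: map the hash code into the range [0, n-1].
    // Math.floorMod never returns a negative result for a positive n.
    static int compress(int code, int n) {
        return Math.floorMod(code, n);
    }

    // h(k) = compress(hashCodeOf(k)): the index of the bucket for key k.
    static int h(Object key, int n) {
        return compress(hashCodeOf(key), n);
    }

    public static void main(String[] args) {
        int n = 11;
        System.out.println(h("spot", n)); // some index in 0 .. 10
        System.out.println(h(-3, n));     // a negative hash code still yields a valid index
    }
}
```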

  14. hash function hash codes in java hash table But care should be taken as this might not be “good”
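
One way to see why a built-in hash code might not be "good": Java's `String.hashCode()` gives some distinct short strings the same code. The sketch below is an illustration chosen for this transcript, not the example from the original slide (whose content did not survive); the class and method names are hypothetical.

```java
// Hypothetical demo: two different strings with the same String.hashCode().
final class JavaHashCodeDemo {
    // True when a and b are different strings that share a hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // String.hashCode() is s[0]*31^(n-1) + ... + s[n-1], so
        // "Aa" = 65*31 + 97 and "BB" = 66*31 + 66 are both 2112.
        System.out.println("Aa".hashCode());
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB"));
    }
}
```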

  15. a bit of maths … that you know (af2)

  16. af2
      • Let A and B be sets
      • A function is a mapping from elements of A to elements of B
      • and is a subset of A×B, i.e. can be defined by a set of tuples!

  17. af2
      • A is the domain
      • B is the codomain
      • f(x) = y
        • y is the image of x
        • x is a preimage of y
      • There may be more than one preimage of y
      • There is only one image of x, otherwise not a function
      • There may be an element in the codomain with no preimage
      • Range of f is the set of all images of elements of A, i.e. the set of all results

  18. Injection (aka one-to-one, 1-1) af2
      [Figure: two mapping diagrams, one labelled "not an injection", the other "injection"]
      If an injection then preimages are unique.

  19. Injection (aka one-to-one, 1-1) af2
      • Ideally we want our hash function to
        • be injective (no collisions)
        • have a small codomain and range
        • may need to compress range
      [Figure: two mapping diagrams, one labelled "not an injection", the other "injection"]
      If an injection then preimages are unique.
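
Since a function can be defined by a set of tuples, a Java `Map` is a natural stand-in, and injectivity becomes a simple check that no image occurs twice. The mappings below are made up for illustration and are not the ones drawn on the slide; `InjectionCheck` and `isInjective` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a function as a set of (x, y) tuples, stored in a Map.
final class InjectionCheck {
    // A function is injective when no two elements of the domain share an image.
    static <K, V> boolean isInjective(Map<K, V> f) {
        Set<V> images = new HashSet<>();
        for (V y : f.values()) {
            if (!images.add(y)) {
                return false; // y already has a preimage: a "collision"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> f = new HashMap<>();
        f.put("a", "x");
        f.put("b", "y");
        f.put("c", "z");
        System.out.println(isInjective(f)); // every image has exactly one preimage

        f.put("d", "x"); // now "x" has two preimages, a and d
        System.out.println(isInjective(f));
    }
}
```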

  20. back to ads2

  21. hash code & hash function Just to clear this up (but let’s not make too big a deal about it) …

  22. hash code & hash function Just to clear this up (but let’s not make too big a deal about it) … We assume the hash code is an integer in the codomain. The hash function brings hash codes into the range [0,N-1]. We will examine just a few hash functions, acting on strings.

  23. Polynomial hash codes hash code & hash function
      Assume we have a key s that is a character String. Here is a really dumb hash code:
          public int dumbHash(String s){
              int code = 0;
              for (int i=0;i<s.length();i++) code = code + s.charAt(i);
              return code;
          }
      • What would we get for
        • dumbHash("spot")
        • dumbHash("pots")
        • dumbHash("tops")
        • dumbHash("post")
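
To answer the question on the slide, the method can be wrapped in a small runnable class. The wrapper class `DumbHashDemo` is hypothetical; the method body is the one from the slide.

```java
// Runnable wrapper around the slide's dumbHash; the class name is hypothetical.
final class DumbHashDemo {
    // Sum of the character codes: the order of the characters is ignored.
    static int dumbHash(String s) {
        int code = 0;
        for (int i = 0; i < s.length(); i++) {
            code = code + s.charAt(i);
        }
        return code;
    }

    public static void main(String[] args) {
        // 's' = 115, 'p' = 112, 'o' = 111, 't' = 116, so every anagram sums to 454.
        for (String word : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(word + " -> " + dumbHash(word));
        }
    }
}
```

All four words are anagrams, so they collide: addition is commutative and the position of each character plays no part in the result.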

  24. Polynomial hash codes hash code & hash function Take into consideration the “position” of the elements of the key. So, this doesn’t look any different from an everyday number: it’s to the base a and the coefficients are the components of the key.

  25. Polynomial hash codes hash code & hash function Good values for a appear to be 33, 37, 39, 41
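
A polynomial hash code can be sketched with Horner's rule. The formula on the original slide did not survive transcription, so the form used here (first character as the highest-order coefficient, evaluated in `long` arithmetic) is an assumption based on the usual textbook definition; `PolynomialHash` and `polyHash` are hypothetical names.

```java
// Hypothetical sketch of a polynomial hash code to the base a, via Horner's rule.
final class PolynomialHash {
    // Computes s[0]*a^(n-1) + s[1]*a^(n-2) + ... + s[n-1].
    // long arithmetic wraps around silently on overflow for long strings.
    static long polyHash(String s, long a) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * a + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Unlike a plain sum, the anagrams now get different codes.
        for (String word : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(word + " -> " + polyHash(word, 33));
        }
    }
}
```

With a = 1 the polynomial collapses back to a plain sum of characters, which is why the base needs to be larger than 1 for position to matter.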

  26. Polynomial hash codes hash code & hash function
      Yikes! Look at that range!
      • Small scale experiments on unix dictionary
        • a = 33
        • 25104 words/strings
        • minimum hash value -9165468936209580338
        • maximum hash value 8952279818009261254
        • collision count 7
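
An experiment of this kind could be reproduced along the following lines. This is a sketch under assumptions: the slide does not say how collisions were counted, so here the count is the number of distinct words minus the number of distinct hash values, and all names are hypothetical. The negative minimum on the slide is consistent with 64-bit arithmetic wrapping around on overflow.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical sketch of a collision-counting experiment over a word list.
final class CollisionCount {
    // Polynomial hash to the base 33, in wrapping long arithmetic.
    static long polyHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * 33 + s.charAt(i);
        }
        return h;
    }

    // Plain sum of characters, for comparison.
    static long dumbHash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Assumed definition: distinct words minus distinct hash values.
    static int countCollisions(List<String> words, ToLongFunction<String> hash) {
        Set<String> distinctWords = new HashSet<>(words);
        Set<Long> codes = new HashSet<>();
        for (String w : distinctWords) {
            codes.add(hash.applyAsLong(w));
        }
        return distinctWords.size() - codes.size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spot", "pots", "tops", "post");
        System.out.println(countCollisions(words, CollisionCount::polyHash));
        System.out.println(countCollisions(words, CollisionCount::dumbHash));
    }
}
```

To run it over a real dictionary, the word list would be read from a file instead of the four-word list used here.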

  27. Cyclic shift hash codes hash code & hash function Start moving bits around
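
The code on the following slides did not survive transcription, so the sketch below is a stand-in rather than the deck's own function: a common textbook variant that rotates the running code left by 5 bits before adding each character. The shift of 5 and the names `CyclicShiftHash` and `cyclicShiftHash` are assumptions.

```java
// Hypothetical sketch of a cyclic shift hash code (5-bit left rotation per character).
final class CyclicShiftHash {
    static int cyclicShiftHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Rotate left by 5: bits shifted out at the top re-enter at the bottom.
            h = (h << 5) | (h >>> 27);
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Position matters, so anagrams generally get different codes.
        for (String word : new String[] {"spot", "pots", "tops", "post"}) {
            System.out.println(word + " -> " + cyclicShiftHash(word));
        }
    }
}
```

The rotation plays the role that multiplication by a plays in the polynomial hash code, but uses only shifts, an OR and an addition.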

  28. Cyclic shift hash codes hash code & hash function

  29. Cyclic shift hash codes hash code & hash function Thanks to Arash Partow

  30-36. Cyclic shift hash codes hash code & hash function [figures only]

  37. Compression Functions hash code & hash function So, you think you’ve found something that produces a good hash code … How do we compress its range to fit into our machine?

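  38. Compression Functions hash code & hash function
      Assume we want to limit storage to buckets in range [0,N-1].
      The division method (NOTE: keep N prime):
          int i = (int) Math.floorMod(hash(s), (long) N);
          S[i] = s;
      … ideally, but there may be collisions. Math.floorMod is used rather than % because hash(s) may be negative, and % would then give a negative index.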
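
The division method can be sketched as a runnable class. The polynomial `hash` used here and the class name `DivisionMethod` are assumptions for illustration; the point of the example is the difference between `%` and `Math.floorMod` on a negative hash code.

```java
// Hypothetical sketch of the division method for compressing a hash code.
final class DivisionMethod {
    // Assumed hash code: polynomial to the base 33 in wrapping long arithmetic.
    static long hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = h * 33 + s.charAt(i);
        }
        return h;
    }

    // Division method: always a valid index in [0, n-1], even for negative codes.
    static int compress(long code, int n) {
        return (int) Math.floorMod(code, (long) n);
    }

    public static void main(String[] args) {
        int n = 11; // keep N prime
        String[] table = new String[n];
        String s = "spot";
        table[compress(hash(s), n)] = s; // a later key may collide and overwrite this

        // Java's % keeps the sign of the dividend; floorMod does not.
        System.out.println(-7L % n);
        System.out.println(compress(-7L, n));
    }
}
```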

  39. Compression Functions hash code & hash function
      Assume we want to limit storage to buckets in range [0,N-1].
      The multiply add and divide (MAD) method
      • N is prime
      • a > 1 is scaling factor
      • b ≥ 0 is a shift
      • a % N ≠ 0
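
The MAD formula itself is missing from the transcript. The sketch below assumes the common form (a·code + b) mod N with the constraints listed on the slide; the class and method names are hypothetical.

```java
// Hypothetical sketch of the multiply, add and divide (MAD) compression method.
final class MadCompression {
    // Assumed form: (a * code + b) mod n, with n prime, a > 1, b >= 0, a % n != 0.
    // The product a * code may wrap around in long arithmetic; floorMod still
    // returns an index in [0, n-1].
    static int mad(long code, long a, long b, int n) {
        if (a <= 1 || b < 0 || a % n == 0) {
            throw new IllegalArgumentException("need a > 1, b >= 0 and a % n != 0");
        }
        return (int) Math.floorMod(a * code + b, (long) n);
    }

    public static void main(String[] args) {
        System.out.println(mad(10, 3, 5, 11)); // (3*10 + 5) mod 11
        System.out.println(mad(-4, 3, 5, 11)); // negative codes are handled too
    }
}
```

If a were a multiple of N, every code would map to b mod N, which is why the slide requires a % N ≠ 0.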

  40. hash tables Collision handling schemes

  41. Collision handling schemes hash tables Separate Chaining

  42. Collision handling schemes Separate Chaining hash tables
      • bucket[i] is a small map
        • implemented as a list
      bucket[i] should be a short list. It may be sorted. It might be something other than a list.

  43. Collision handling schemes Separate Chaining hash tables
      Let N be the number of buckets and n the amount of data stored; the load factor is n/N.
      • Upside:
        • simple
      • Downside:
        • requires auxiliary data structures (to resolve collisions)
        • this may put additional burden on space

  44. Collision handling schemes Separate Chaining hash tables
      A simple view: an array whose elements are linked lists.
      Locations 0 .. 7, all lists empty.

  45. Collision handling schemes Separate Chaining hash tables
      put(Jon,plumber), hash(Jon) = 3
      Locations 0 .. 7, all lists empty.

  46. Collision handling schemes Separate Chaining hash tables
      put(Jon,plumber), hash(Jon) = 3
      Location 3: Jon,plumber. All other lists empty.

  47. Collision handling schemes Separate Chaining hash tables
      put(Fred,painter), hash(Fred) = 6
      Location 3: Jon,plumber. All other lists empty.

  48. Collision handling schemes Separate Chaining hash tables
      put(Fred,painter), hash(Fred) = 6
      Location 3: Jon,plumber. Location 6: Fred,painter. All other lists empty.

  49. Collision handling schemes Separate Chaining hash tables
      put(Joe,prof), hash(Joe) = 1
      Location 3: Jon,plumber. Location 6: Fred,painter. All other lists empty.
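
The walk-through above can be sketched as a small chained table in Java. This is an illustration only: `ChainedTable` and its methods are hypothetical, and `slideHash` simply hard-codes the three hash values quoted on the slides (Jon = 3, Fred = 6, Joe = 1) so that the example lands in the same buckets.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of separate chaining: an array of linked lists.
final class ChainedTable {
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();
    private final ToIntFunction<String> hash;
    private int size; // n, the number of entries stored

    ChainedTable(int n, ToIntFunction<String> hash) {
        for (int i = 0; i < n; i++) buckets.add(new LinkedList<>());
        this.hash = hash;
    }

    // Stand-in hash reproducing the values quoted on the slides.
    static int slideHash(String key) {
        switch (key) {
            case "Jon":  return 3;
            case "Fred": return 6;
            case "Joe":  return 1;
            default:     return key.hashCode();
        }
    }

    private LinkedList<Entry> chainFor(String key) {
        return buckets.get(Math.floorMod(hash.applyAsInt(key), buckets.size()));
    }

    void put(String key, String value) {
        LinkedList<Entry> chain = chainFor(key);
        for (Entry e : chain) {
            if (e.key.equals(key)) { e.value = value; return; } // replace existing entry
        }
        chain.add(new Entry(key, value)); // colliding keys share the same list
        size++;
    }

    String get(String key) {
        for (Entry e : chainFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    int chainLength(int i) { return buckets.get(i).size(); }

    double loadFactor() { return (double) size / buckets.size(); } // n/N

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(8, ChainedTable::slideHash);
        table.put("Jon", "plumber");
        table.put("Fred", "painter");
        table.put("Joe", "prof");
        System.out.println(table.get("Jon"));
        System.out.println(table.loadFactor());
    }
}
```

Lookups scan only the one chain selected by the hash, so they stay fast as long as the load factor, and therefore the typical chain length, stays small.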
