
Preliminaries


rossa

Presentation Transcript


  1. Preliminaries
     • Advantages
       • Hash tables can insert(), remove(), and find() with average complexity close to O(1).
       • Relatively easy to program.
     • Disadvantages
       • There is no convenient way to traverse a hash table in key order.
       • At least double the memory is required.
       • If the hash table becomes too full (load factor > 50%), insert(), remove(), and find() slow down and can degrade toward O(N) as the table approaches full.
       • The hash key must be designed carefully.

  2. Design of Hash Keys
     • A hash table is a collection of elements that performs lookups using an appropriately selected hash function.
     • Definition of a hash function
       • A function that, when applied to a key value, computes a hash key used as an index to locate the data element.
     • Design issue: how do we choose hash functions?
       • Goal: The hash function must compute values that appear random and span the entire hash table.
       • Goal: The hash function must be quick to calculate.
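
As a rough illustration of the two goals above (well-spread values, cheap to compute), here is a minimal sketch of a polynomial string hash. The function name `hash_key` and the multiplier 31 are assumptions made for this example; they do not come from the slides.

```python
def hash_key(key: str, table_size: int, base: int = 31) -> int:
    """Map a string key to an index in range(table_size).

    Hypothetical helper for illustration; the base of 31 is an assumed,
    commonly used multiplier, not a value taken from the slides.
    """
    h = 0
    for ch in key:
        # Horner's rule: fold each character in and reduce modulo the
        # table size so the running value stays small.
        h = (h * base + ord(ch)) % table_size
    return h


if __name__ == "__main__":
    for word in ("apple", "banana", "cherry"):
        print(word, hash_key(word, 101))
```

Every character contributes to the result, which helps spread similar keys across the table, and the loop costs one multiply and one modulo per character.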

  3. Additional Design Considerations
     • What if the hash function produces the same index for different keys (a collision)?
       • Open addressing: (h1(key) + h2(key, tries)) % tableSize
         • Examples: linear probing, secondary probing, quadratic probing, double hashing
       • Separate chaining
     • How big should the hash table be?
       • Answer: At least twice as big as the number of elements the table is to store.
       • Answer: A prime length.
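
The two sizing answers above can be combined into one rule: pick the smallest prime that is at least twice the expected element count. The sketch below is one way to do that; `is_prime` and `choose_table_size` are hypothetical helper names, and trial division is used only because it is simple.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for table-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def choose_table_size(expected_items: int) -> int:
    """Smallest prime that is at least twice the expected item count."""
    size = max(2, 2 * expected_items)
    while not is_prime(size):
        size += 1
    return size


if __name__ == "__main__":
    print(choose_table_size(50))  # 101
```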

  4. Collision Resolution: Open Addressing
     • Linear probing (h2(key, tries) = tries)
       • Characteristics: primary clustering; deletions are difficult
     • Secondary probing (h2(key, tries) = constant * tries)
       • Characteristics: primary clustering; deletions are difficult
     • Quadratic probing (h2(key, tries) = tries^2)
       • Characteristics: secondary clustering (keys that hash to the same initial index follow the same probe sequence); incomplete use of the hash table; deletions are difficult
     • Double hashing (h2(key, tries) = (second hash function of the key) * tries)
       • Characteristics: eliminates clustering; deletions are difficult
     • Clustering: the tendency for sections of the table to fill up, so that newly inserted keys become increasingly likely to land in those sections.
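
To make the probing variants above concrete, the following sketch lists the slots visited by (h1 + h2(tries)) % tableSize for a given starting index. The function name `probe_sequence`, the table size of 11, the starting index 3, and the second-hash value 4 are all assumptions chosen for the example.

```python
from typing import Callable


def probe_sequence(h1: int, step: Callable[[int], int], table_size: int) -> list:
    """Slots visited by (h1 + step(tries)) % table_size for tries = 0, 1, ...

    `step` plays the role of h2(key, tries) with the key already fixed.
    """
    return [(h1 + step(tries)) % table_size for tries in range(table_size)]


if __name__ == "__main__":
    size, start = 11, 3  # assumed example values
    linear = probe_sequence(start, lambda t: t, size)
    quadratic = probe_sequence(start, lambda t: t * t, size)
    double = probe_sequence(start, lambda t: 4 * t, size)  # 4 = assumed second hash

    print("linear   ", linear[:5])     # [3, 4, 5, 6, 7]
    print("quadratic", quadratic[:5])  # [3, 4, 7, 1, 8]
    print("double   ", double[:5])     # [3, 7, 0, 4, 8]
    # Quadratic probing reaches only 6 of the 11 slots, which is the
    # "incomplete use of the hash table" noted above.
    print(len(set(quadratic)), len(set(double)))  # 6 11
```

With a prime table size and a nonzero second hash, double hashing visits every slot, while quadratic probing from a fixed start reaches only about half of them.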

  5. Separate Chaining
     • Compute the hash key.
     • If a collision occurs, insert the key at the front of the chain (linked list).
     • Advantages
       • Hash table grows as needed
       • Performance is less sensitive to a full hash table
       • Deletion is easy
       • No clustering
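
A minimal sketch of separate chaining as described above might look like the following. The class name `ChainedHashTable` is hypothetical, Python lists stand in for the linked-list chains, and Python's built-in `hash()` is assumed as the hash function.

```python
class ChainedHashTable:
    """Separate chaining sketch; each bucket holds a chain of keys."""

    def __init__(self, table_size: int = 11):
        # One (initially empty) chain per table slot.
        self.buckets = [[] for _ in range(table_size)]

    def _index(self, key) -> int:
        # Assumption: the built-in hash() serves as the hash function.
        return hash(key) % len(self.buckets)

    def insert(self, key) -> None:
        chain = self.buckets[self._index(key)]
        if key not in chain:
            chain.insert(0, key)  # new keys go at the front of the chain

    def find(self, key) -> bool:
        return key in self.buckets[self._index(key)]

    def remove(self, key) -> bool:
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)  # no tombstones needed, unlike open addressing
            return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable(5)
    for k in (1, 6, 11):          # all three collide at index 1
        table.insert(k)
    print(table.buckets[1])       # [11, 6, 1]
    table.remove(6)
    print(table.find(6), table.find(11))  # False True
```

Deleting a key only touches its own chain, which is why deletion is easy here compared with the open addressing schemes.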

  6. Performance
     • There are charts in the text describing the performance of
       • Linear, quadratic, and double hash probing
       • Open addressing versus separate chaining
     • Let $F$ = load factor (the fraction of the table that is full).
       • Probability of one collision = $F$
       • Probability of two collisions = $F^2$
       • Expected collisions: $E = F + 2F^2 + 3F^3 + \dots = \sum_{i=0}^{\infty} i F^i = \frac{F}{(1-F)^2}$
       • If $F = 0.5$: $E = \frac{1}{2} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{8} + \dots = \frac{1}{2} + \frac{1}{2} + \frac{3}{8} + \frac{4}{16} + \dots = \frac{1/2}{(1 - 1/2)^2} = 2$
       • If $F = 0.75$: $E = \frac{F}{(1-F)^2} = \frac{3/4}{1/16} = 12$
       • If $F = 0.9$: $E = \frac{F}{(1-F)^2} = \frac{9/10}{1/100} = 90$
     • Hash tables are often used for file system folders.
     • They complement databases, which can use B-trees for sequential processing and a hash table for rapid searching.
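
The closed form F/(1-F)^2 above can be checked numerically against partial sums of the series. This is a small sketch under the slide's own model; the function names `expected_collisions` and `partial_sum` are made up for the example.

```python
def expected_collisions(load_factor: float) -> float:
    """Closed form F / (1 - F)^2 from the slide, valid for 0 <= F < 1."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return load_factor / (1 - load_factor) ** 2


def partial_sum(load_factor: float, terms: int) -> float:
    """First `terms` terms of F + 2*F^2 + 3*F^3 + ..."""
    return sum(i * load_factor ** i for i in range(1, terms + 1))


if __name__ == "__main__":
    for f in (0.5, 0.75, 0.9):
        print(f, round(expected_collisions(f), 6), round(partial_sum(f, 2000), 6))
```

The estimate grows quickly as the load factor approaches 1, which matches the advice to keep the table at most about half full.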
