Hashing & HashMaps

Presentation Transcript


  1. Hashing & HashMaps. CS-2851, Dr. Mark L. Hornick

  2. Let’s review the worst-case performance characteristics of previously covered data structures: ArrayList (JCF class), SortedArrayList (uses binary searching), LinkedList (JCF class), and BinaryTree. For each one, consider get(), add(), and contains().

  6. Let’s review the worst-case performance characteristics of previously covered data structures
     ArrayList
     • get() – O(1); constant-time access
     • add() – O(n); due to shifting
     • contains() – O(n); sequential search
     SortedArrayList
     • get() – O(1); constant-time access
     • add() – O(log n + n) → O(n); a binary search first finds where to add, then elements must be shifted to the right
     • contains() – O(log n); search based on the Splitting Rule (binary search)
     LinkedList
     • get() – O(n); sequential access
     • add() – O(1) once the insertion point has been determined, but really O(n), because that is how long it takes to find the location to insert
     • contains() – O(n); sequential search
     BinaryTree
     • get() – not supported due to lack of indexing (but do we always need it?)
     • add() – O(log n) when the tree is balanced, due to sorting built into the tree structure; an unbalanced tree can degrade to O(n)
     • contains() – O(log n) when the tree is balanced, for the same reason; O(n) if the tree is unbalanced
     What about memory usage?

  7. Is there anything faster at everything?

  8. Map definition
     • A map is a collection in which each Entry element has two parts:
       • a unique key part
       • a value part (which may not be unique)
     • Each unique key “maps” to a corresponding value
     • Example: Morse code map, in which each character maps to a (unique) sequence of dots and dashes
     • Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
     • Example: a phonebook, where each number (each key) maps to a person
     (Diagram: an Entry with a key part and a value part)
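The Morse code example can be sketched with the JCF `java.util.HashMap` class. This is only an illustration of the key/value idea; the class name `MorseMapDemo` and the three letters chosen are assumptions, not part of the original slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: characters are the unique keys, Morse sequences the values.
final class MorseMapDemo {
    static Map<Character, String> buildMorseMap() {
        Map<Character, String> morse = new HashMap<>();
        morse.put('S', "...");
        morse.put('O', "---");
        morse.put('E', ".");
        return morse;
    }

    public static void main(String[] args) {
        Map<Character, String> morse = buildMorseMap();
        // Look up each value by its key.
        System.out.println(morse.get('S') + " " + morse.get('O') + " " + morse.get('S'));
        // Putting an existing key replaces its value; no second entry for 'E' is created.
        morse.put('E', ".");
        System.out.println(morse.size());
    }
}
```

Because keys are unique, a second `put` with the same key leaves the number of entries unchanged.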

  9. What is a Key?
     • A key is just something that uniquely identifies a particular instance of a value/object
     • A key can be a number, a string, or an object, so long as it is unique
     • If two values/objects have the same key, then they are (theoretically) equal
       • Only one ID per MSOE student, so if the IDs match, it must (by definition) be the same student
       • If the equals() method comparing two keys returns true, then the objects are equal too, by definition

  10. What if an object doesn’t possess a specific unique attribute?
     • Scenario: pretend MSOE IDs didn’t exist
     • Can any of the attributes of a student, taken together, be unique?
       • …even though any individual attribute may not exhibit this uniqueness?
     • Exercise

  11. A key can be generated from a unique combination of non-unique attributes
     All of an object’s attributes can be used to generate the key
     • That is, the object itself is the key
     Or the key can be generated from just a subset of an object’s attributes
     • Provided that subset is unique
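A minimal sketch of a key built from a subset of attributes follows. The class `StudentKey` and the choice of last name, first name, and birth date are hypothetical; the sketch simply assumes that this combination is unique for the students being stored.

```java
import java.util.Objects;

// Hypothetical composite key: no single field is unique, but the combination is assumed to be.
final class StudentKey {
    private final String lastName;
    private final String firstName;
    private final String birthDate; // kept as text for brevity, e.g. "2001-04-17"

    StudentKey(String lastName, String firstName, String birthDate) {
        this.lastName = lastName;
        this.firstName = firstName;
        this.birthDate = birthDate;
    }

    // Two keys are equal when every attribute in the chosen subset matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentKey)) return false;
        StudentKey that = (StudentKey) other;
        return lastName.equals(that.lastName)
                && firstName.equals(that.firstName)
                && birthDate.equals(that.birthDate);
    }

    // The hashcode is computed from the same attributes that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(lastName, firstName, birthDate);
    }
}
```

Keeping `equals()` and `hashCode()` based on the same fields means equal keys always produce the same hashcode.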

  12. OK, so what role do keys play in making a faster data structure? What if each unique key corresponded to a unique index within an array of Entries? (Diagram: a key maps to an index; the Entry at that index holds the key and its value)

  13. Hash definition
     • A hash is a transformation of a key into a numeric value that maps to the index of an array (or table)
     • This is done in two steps:
       • generate a numeric hashcode from the key (which is not necessarily numeric)
         • If the key is already numeric and unique (like an ID), then the key can be used as the hashcode
       • transform the hashcode into an array index
     (Diagram: Key → hashcode → index)
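The two steps can be sketched as two small methods. The names `HashSteps`, `hashcodeOf`, and `indexOf` are illustrative assumptions; the second step uses `Math.floorMod` as one possible way to map any `int` into the table's index range.

```java
// Hypothetical helper showing the two-step transformation: key -> hashcode -> index.
final class HashSteps {
    // Step 1: generate a numeric hashcode from the key (the key need not be numeric).
    static int hashcodeOf(Object key) {
        return key.hashCode();
    }

    // Step 2: transform the hashcode into an index in the range 0..tableLength-1.
    static int indexOf(int hashcode, int tableLength) {
        return Math.floorMod(hashcode, tableLength);
    }

    static int indexForKey(Object key, int tableLength) {
        return indexOf(hashcodeOf(key), tableLength);
    }

    public static void main(String[] args) {
        System.out.println(indexForKey("Anne", 1024));
        // An Integer key is its own hashcode, so only step 2 changes it.
        System.out.println(indexForKey(2000, 1024));
    }
}
```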

  14. HashMap definition
     • A HashMap<E> is an array-based collection of Entry<E> elements, each of which has:
       • a value part (which could be anything)
       • a unique key part (somehow derived from value)
     • Each Entry is at a specific index in the array, where the index is determined from the hashcode of the key
     • Example: a map of Students, in which each key is the (unique) student ID, and each (non-unique?) value is a reference to the Student object itself
     (Diagram: an Entry<E> with a key part and a value part of type E)

  15. How do you generate a hashcode? In Java, all classes have a built-in hashCode() method defined in the Object class. (Diagram: Key → hashcode)

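  16. Classes that don’t override hashCode() inherit the Object class’s hashCode() method, which typically returns a value derived from the object’s memory address (its identity rather than its contents)
     • Is this a repeatable hashcode? No! Two separately created objects with identical contents will generally get different hashcodes
     (Diagram: memory address of Object → hashcode)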
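A small sketch can make the problem visible. `PlainStudent` is a hypothetical class that deliberately overrides neither `equals()` nor `hashCode()`; `String` is used for contrast because it does override both.

```java
// Hypothetical class that inherits equals() and hashCode() from Object unchanged.
final class PlainStudent {
    final String id;

    PlainStudent(String id) {
        this.id = id;
    }
}

final class IdentityHashDemo {
    public static void main(String[] args) {
        PlainStudent a = new PlainStudent("12345");
        PlainStudent b = new PlainStudent("12345");

        // Inherited equals() compares identity, so two separate objects are never equal.
        System.out.println(a.equals(b));
        // The inherited hashcode is the identity hashcode, unrelated to the id field.
        System.out.println(a.hashCode() == System.identityHashCode(a));
        // The two hashcodes will usually differ, although that is not guaranteed.
        System.out.println(a.hashCode() + " vs " + b.hashCode());

        // String overrides hashCode(), so equal contents give equal hashcodes.
        System.out.println(new String("12345").hashCode() == new String("12345").hashCode());
    }
}
```

Used as keys, `a` and `b` would be treated as two different students even though they carry the same id.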

  17. A given key should always generate the same hashcode
     • So that the hashcode computation can be repeated at any time, and always result in the same value
     • …and therefore, the same index
     Q: If keys are unique, does this guarantee that the hashcodes generated from the keys are also unique?
     (Diagram: Key → hashcode → index)

  18. Exercise
     • Generate a hashcode from a String of characters
     • What approach should you use?

  19. How do you generate a hashcode? In Java, many classes override Object’s hashCode() method in order to generate hashcodes from an object’s contents
     Integer class
     • Integer’s hashCode() method simply returns the underlying int value
     String class
     • Look at the javadoc for String.hashCode
     (Diagram: Key → hashcode)
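The javadoc for `String.hashCode` documents the hashcode as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed with int arithmetic. The sketch below re-implements that formula for illustration; the class and method names are assumptions, and real code should simply call `hashCode()`.

```java
// Hypothetical re-implementation of the formula documented for String.hashCode().
final class StringHashSketch {
    static int polynomialHash(String s) {
        int h = 0;
        // Horner's rule: multiplying the running total by 31 at each step
        // yields s[0]*31^(n-1) + ... + s[n-1]; int overflow simply wraps around.
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("Anne") == "Anne".hashCode());
        // Integer's hashcode is just its int value.
        System.out.println(Integer.valueOf(42).hashCode());
    }
}
```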

  20. Writing your own hashCode()
     • A key should uniquely identify an object
     • Hashcodes generated from keys should be as unique as possible
       • to avoid collisions
     • Depending on the hashcode algorithm, different keys can generate the same hashcode
     (Diagram: Key → hashcode → index)
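A well-known instance of two different keys sharing a hashcode comes from `String.hashCode()` itself: "Aa" gives 65*31 + 97 = 2112 and "BB" gives 66*31 + 66 = 2112. The class name below is only illustrative.

```java
// Hypothetical demo: two distinct String keys with the same hashcode.
final class SameHashcodeDemo {
    static boolean sameHashcode(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 65*31 + 97
        System.out.println("BB".hashCode()); // 66*31 + 66
        // The keys differ, yet their hashcodes match.
        System.out.println(!"Aa".equals("BB") && sameHashcode("Aa", "BB"));
    }
}
```

So unique keys do not guarantee unique hashcodes, even with a carefully designed algorithm.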

  21. How do you transform a hashcode into an array index? Assume you have an array with length = 1024. An array index in the range 0…1023 can be computed as follows using modulo arithmetic: int index = hashCode(123456789) % 1024; The resulting index = 933.
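One detail worth checking when applying modulo arithmetic in Java: `hashCode()` may return a negative `int`, and the `%` operator then yields a negative remainder, which is not a valid array index. The sketch below uses `Math.floorMod` as one way to stay in range. The helper name `indexFor` is an assumption, and because the slide's `hashCode(...)` function is unspecified, the sketch works on raw hashcode values rather than reproducing the slide's index.

```java
// Hypothetical helper: map any int hashcode to a valid index for a table of the given length.
final class ModuloIndexDemo {
    static int indexFor(int hashcode, int tableLength) {
        // floorMod always returns a value in 0..tableLength-1 for a positive tableLength.
        return Math.floorMod(hashcode, tableLength);
    }

    public static void main(String[] args) {
        int negativeHashcode = -7;
        // The % operator keeps the sign of the left operand, so this prints a negative number.
        System.out.println(negativeHashcode % 1024);
        // floorMod wraps the same hashcode into the valid index range.
        System.out.println(indexFor(negativeHashcode, 1024));
    }
}
```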

  22. More hashing examples (for a table 1024 in length)
     • 123456789 indexes to 933
     • 428671256 indexes to 500
     • 884739816 indexes to 234

  23. Exercise: a table of length 1024 (indices 0…1023) has size 3. It holds Anne at index xxx, Susan at index yyy, and Ed at index zzz; every other slot, including 0 and 1023, is null. What are the index values xxx, yyy, and zzz?

  24. Hashing can result in Collisions
     • 123456789 indexes to 933
     • 428671256 indexes to 500
     • 884739816 indexes to 234
     • 403578063 also indexes to 933
     • When two different keys yield the same index (even from different hashcodes), that is called a collision
     • Keys that yield the same index are called synonyms
     • Special handling is required
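The parenthetical "even from different hashcodes" follows directly from the modulo step: any two hashcodes that differ by a multiple of the table length land on the same index. A brief sketch, with hashcode values chosen purely for illustration:

```java
// Hypothetical demo: different hashcodes can still be synonyms for a given table length.
final class SynonymDemo {
    static boolean areSynonyms(int hashcodeA, int hashcodeB, int tableLength) {
        return Math.floorMod(hashcodeA, tableLength) == Math.floorMod(hashcodeB, tableLength);
    }

    public static void main(String[] args) {
        // 1957 = 933 + 1024, so both reduce to index 933 in a table of length 1024.
        System.out.println(areSynonyms(933, 1957, 1024));
        // Neighbouring hashcodes land on different indices.
        System.out.println(areSynonyms(933, 934, 1024));
    }
}
```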

  25. Hashing is inefficient when there are a lot of collisions
     • Ideally, we want the hashing algorithm to generate indices “sprinkled” randomly throughout the underlying table
     • The Uniform Hashing Assumption states that
       • Each key is equally likely to hash to any one of the table addresses, independently of where the other keys have hashed

  26. Even if this assumption is true, collisions still occur
     • This is due to the finite set of indices in a table
     • The bigger the table, the less likely a collision is to occur
     • But tables cannot be made infinitely large
     • An infinite number of keys cannot be mapped into a finite set of indices
     • So collision handlers have to be implemented
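The slides do not say which collision handler is used, so the following is only a sketch of one common technique, separate chaining, in which each table slot holds a list of all entries whose keys hash to that index. The class `ChainedTable` and its methods are hypothetical and are not the JCF `HashMap` implementation.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical separate-chaining table; assumes length > 0 and non-null keys.
final class ChainedTable<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final List<LinkedList<Entry<K, V>>> buckets = new ArrayList<>();
    private int size;

    ChainedTable(int length) {
        for (int i = 0; i < length; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Key -> hashcode -> index, then return the chain stored at that index.
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Replaces the value if the key is already present; otherwise appends to the chain.
    V put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucket.add(new Entry<>(key, value));
        size++;
        return null;
    }

    // Searches only the chain for the key's index, using equals() to tell synonyms apart.
    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

For example, the keys "Aa" and "BB" have the same `String` hashcode and therefore share a chain, yet `get("Aa")` and `get("BB")` still return their own values because the chain is searched with `equals()`.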
