Lecturer: Duc Dung Nguyen, PhD.
Contact: nddung@hcmut.edu.vn
October 24, 2016
Faculty of Computer Science and Engineering
Hochiminh city University of Technology
1 Basic concepts
2 Hash functions
3 Collision resolution
• L.O.5.2 - Describe hashing functions using pseudocode and give examples to show their algorithms.
• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show their algorithms.
• L.O.5.4 - Implement hash tables using C/C++.
• L.O.5.5 - Analyze the complexity and develop experiments (programs) to evaluate the methods supplied for hash tables.
• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize their computational complexity.
Basic concepts
• Sequential search: O(n)
• Binary search: O(log₂ n)
→ Both require several key comparisons before the target is found
Is there a search algorithm whose complexity is O(1)?
YES: hashing, which computes the address of the data directly from its key (O(1) in the ideal case).
• Home address: address produced by a hash function
• Prime area: memory that contains all the home addresses
• Synonyms: a set of keys that hash to the same location
• Collision: the location of the data to be inserted is already occupied by a synonym's data
• Ideal hashing:
• No collisions
• Compact address space
Hash functions
Direct Hashing
The address is the key itself:
hash(Key) = Key
• Advantage: there are no collisions.
• Disadvantage: the address space (storage size) is as large as the key space
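A minimal C++ sketch of direct hashing, assuming non-negative integer keys below a known bound `keySpace` and string payloads (`DirectTable` and its members are hypothetical names for illustration). The table allocates one slot per possible key, which is exactly the storage cost listed as the disadvantage.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Direct hashing sketch: the key itself is the address, so the table
// needs one slot per possible key (keys assumed to lie in [0, keySpace)).
struct DirectTable {
    std::vector<std::optional<std::string>> slots;

    explicit DirectTable(std::size_t keySpace) : slots(keySpace) {}

    // hash(Key) = Key
    static std::size_t hash(std::size_t key) { return key; }

    void insert(std::size_t key, const std::string& value) {
        assert(key < slots.size());
        slots[hash(key)] = value;  // no collision is possible
    }

    std::optional<std::string> search(std::size_t key) const {
        assert(key < slots.size());
        return slots[hash(key)];  // empty optional if nothing is stored there
    }
};
```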
Modulo division
Address = Key mod listSize
• Fewer collisions if listSize is a prime number
• Example:
Numbering system to handle 1,000,000 employees
Data space to store up to 300 employees
listSize = 307, the smallest prime greater than 300
hash(121267) = 121267 mod 307 = 2
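Modulo division in C++, as a minimal sketch (`modHash` is a hypothetical name). The second key used below, 121574 = 121267 + 307, shows how two different employee numbers become synonyms with the same home address.

```cpp
#include <cassert>

// Modulo-division hashing: Address = Key mod listSize.
// A prime listSize (e.g. 307) tends to spread keys more evenly.
unsigned long modHash(unsigned long key, unsigned long listSize) {
    assert(listSize > 0);
    return key % listSize;
}
```

For example, `modHash(121267, 307)` and `modHash(121574, 307)` both return 2, so those two keys collide.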
Digit extraction
Address = selected digits from Key
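A minimal C++ sketch of digit extraction (`digitExtract` is a hypothetical helper). The key is kept as a string so that leading zeros survive, and the selected positions are a parameter; choosing the first, third and fourth digits in the usage below is an illustrative assumption, not a rule from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Digit-extraction sketch: build the address from selected digits of the key.
// Positions are 1-based from the left; which positions to use is a design choice.
unsigned digitExtract(const std::string& key, const std::vector<std::size_t>& positions) {
    unsigned address = 0;
    for (std::size_t pos : positions) {
        assert(pos >= 1 && pos <= key.size());
        const char d = key[pos - 1];
        assert(d >= '0' && d <= '9');
        address = address * 10 + static_cast<unsigned>(d - '0');
    }
    return address;
}
```

With positions {1, 3, 4}, key 379452 maps to 394 and key 121267 maps to 112.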
Mid-square
Address = middle digits of Key²
Example:
9452 * 9452 = 89340304→3403
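A minimal C++ sketch of the mid-square method for 4-digit keys, matching the example (`midSquare4` is a hypothetical name). The square has at most 8 digits; dividing by 100 drops the two low digits and `% 10000` keeps the middle four.

```cpp
#include <cassert>
#include <cstdint>

// Mid-square sketch for 4-digit keys: square the key (at most 8 digits,
// read as zero-padded) and keep digits 3..6, i.e. the middle four.
std::uint32_t midSquare4(std::uint32_t key) {
    assert(key <= 9999);
    const std::uint64_t sq = static_cast<std::uint64_t>(key) * key;  // 9452 -> 89340304
    return static_cast<std::uint32_t>((sq / 100) % 10000);           // -> 3403
}
```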
• Disadvantage: the square of the key (Key²) can be too large.
• Variations: use only a portion of the key
Example:
379452: 379 * 379 = 143641 → 364
121267: 121 * 121 = 014641 → 464
045128: 045 * 045 = 002025 → 202
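The variation can be sketched the same way, assuming 6-digit keys: square only the first three digits, read the square as a zero-padded 6-digit number, and keep its third to fifth digits. This reproduces the three examples (`midSquarePrefix` is a hypothetical name).

```cpp
#include <cassert>
#include <cstdint>

// Mid-square variation: square only the first three digits of a 6-digit key,
// treat the square as a zero-padded 6-digit number, and keep digits 3..5.
std::uint32_t midSquarePrefix(std::uint32_t key) {
    assert(key <= 999999);
    const std::uint32_t prefix = key / 1000;   // 379452 -> 379
    const std::uint32_t sq = prefix * prefix;  // 379 * 379 = 143641 (at most 6 digits)
    return (sq / 10) % 1000;                   // 143641 -> 364
}
```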
Rotation
• Hashing keys that are identical except for the last character may create synonyms
• The key is rotated before hashing
(Figure: original keys and their rotated keys)
• Used in combination with fold shift.
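A minimal C++ sketch of rotation combined with fold shift. Two details are assumptions rather than taken from the slides: the rotation moves the rightmost digit to the front, and fold shift splits the key into 3-digit parts, adds them, and keeps the last three digits of the sum (`rotateRight` and `foldShift` are hypothetical names).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rotation sketch (assumed direction): move the rightmost digit to the front,
// e.g. "600101" -> "160010".
std::string rotateRight(const std::string& key) {
    assert(!key.empty());
    return key.back() + key.substr(0, key.size() - 1);
}

// Fold-shift sketch: split the key into 3-digit parts, add them,
// and keep the last three digits of the sum (part width 3 is an assumption).
unsigned foldShift(const std::string& key) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < key.size(); i += 3)
        sum += static_cast<unsigned>(std::stoul(key.substr(i, 3)));
    return sum % 1000;
}
```

Without rotation, keys 600101 and 600102 fold to the adjacent addresses 701 and 702; after rotation they fold to 170 and 270, because the differing last digit has become the leading digit.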
Pseudorandom hashing
Address = (a × Key + c) mod listSize
• For maximum efficiency, a and c should be prime numbers
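A minimal C++ sketch, assuming the pseudorandom generator takes the common linear form Address = (a × Key + c) mod listSize; the constants a = 17, c = 7 and listSize = 307 used below are illustrative choices (`pseudoRandomHash` is a hypothetical name).

```cpp
#include <cassert>
#include <cstdint>

// Pseudorandom hashing sketch: Address = (a * Key + c) mod listSize.
// a and c are parameters; prime values are recommended.
std::uint64_t pseudoRandomHash(std::uint64_t key, std::uint64_t a,
                               std::uint64_t c, std::uint64_t listSize) {
    assert(listSize > 0);
    return (a * key + c) % listSize;  // assumes a * key + c does not overflow
}
```

With these values, key 121267 maps to (17 × 121267 + 7) mod 307 = 2061546 mod 307 = 41.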
Collision resolution
• Except for direct hashing, none of the hash functions above is a one-to-one mapping
→ Requiring collision resolution methods
• Each collision resolution method can be used independently with each hash function
• Open addressing
• Linked list resolution
• Bucket hashing
Open addressing: when a collision occurs, the table is searched for an unoccupied element in which to place the new element.
Hash and probe function:
hp : U × {0, 1, 2, …, m − 1} → {0, 1, 2, …, m − 1}
(set of keys × probe numbers → addresses)
Open Addressing
Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T
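A minimal C++ sketch of such an insert, assuming the table is a `std::vector<std::optional<int>>` of size m and that the probe function hp is passed in as a parameter (both hypothetical choices, as are the names `Probe` and `hashInsert` in this form). It tries hp(k, 0), hp(k, 1), … and stores k in the first empty slot, reporting overflow after m unsuccessful probes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <vector>

// Probe function hp(k, i): key and probe number -> address in [0, m).
using Probe = std::function<int(int, int)>;

// Open-addressing insert (sketch): probe until an empty slot is found.
// Returns the address where k was stored; throws after m failed probes.
int hashInsert(std::vector<std::optional<int>>& T, int k, const Probe& hp) {
    const int m = static_cast<int>(T.size());
    for (int i = 0; i < m; ++i) {
        const int j = hp(k, i);
        assert(j >= 0 && j < m);
        auto& slot = T[static_cast<std::size_t>(j)];
        if (!slot.has_value()) {
            slot = k;
            return j;
        }
    }
    throw std::overflow_error("hash table overflow");
}
```

Any of the probe functions that follow (linear, quadratic, double hashing) can be passed as `hp`.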
Linear Probing
• When a home address is occupied, go to the next address (the current address + 1):
hp(k, i) = (h(k) + i) mod m
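Assuming h(k) = k mod m (any hash function would do), the linear probe sequence can be generated as follows; `linearProbe` and `linearSequence` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Linear probing, assuming h(k) = k mod m: hp(k, i) = (h(k) + i) mod m.
int linearProbe(int k, int i, int m) {
    assert(m > 0 && k >= 0 && i >= 0);
    return (k % m + i) % m;
}

// The first `count` addresses that linear probing visits for key k.
std::vector<int> linearSequence(int k, int count, int m) {
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) seq.push_back(linearProbe(k, i, m));
    return seq;
}
```

With m = 11, key 25 probes 3, 4, 5, 6, and key 9 wraps around the end of the table: 9, 10, 0, 1.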
• Advantages:
• quite simple to implement
• data tend to remain near their home address (significant for disk addresses)
• Disadvantages:
• produces primary clustering
Quadratic Probing
• The address increment is the collision probe number squared:
hp(k, i) = (h(k) + i²) mod m
• Advantages:
• works much better than linear probing because it avoids primary clustering
• Disadvantages:
• time required to square numbers
• produces secondary clustering: keys with the same home address follow the same probe sequence,
h(k1) = h(k2) ⇒ hp(k1, i) = hp(k2, i) for every i
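A minimal C++ sketch of the quadratic probe sequence, again assuming h(k) = k mod m (`quadraticProbe` and `quadraticSequence` are hypothetical helpers). Keys 25 and 14 share home address 3 for m = 11, so they produce the identical sequence 3, 4, 7, 1: this is secondary clustering.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing, assuming h(k) = k mod m: hp(k, i) = (h(k) + i*i) mod m.
int quadraticProbe(int k, int i, int m) {
    assert(m > 0 && k >= 0 && i >= 0);
    return (k % m + i * i) % m;  // fine for small i; i*i can overflow for large i
}

// The first `count` addresses that quadratic probing visits for key k.
std::vector<int> quadraticSequence(int k, int count, int m) {
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) seq.push_back(quadraticProbe(k, i, m));
    return seq;
}
```

Note that i² mod m does not reach every address: for a prime m only (m + 1)/2 distinct offsets occur, so an insert based on quadratic probing can report overflow while free slots remain.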
Double Hashing
• Using two hash functions:
hp(k, i) = (h1(k) + i·h2(k)) mod m
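A minimal C++ sketch, assuming h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)) with m prime, a common textbook choice rather than one fixed by the slides (`doubleHashProbe` and `doubleHashSequence` are hypothetical names). Because h2 is never 0, the sequence always moves, and with m prime it eventually visits every address.

```cpp
#include <cassert>
#include <vector>

// Double hashing: hp(k, i) = (h1(k) + i * h2(k)) mod m, with the assumed
// h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)).
int doubleHashProbe(int k, int i, int m) {
    assert(m > 1 && k >= 0 && i >= 0);
    const int h1 = k % m;
    const int h2 = 1 + k % (m - 1);  // never 0, so the probe always moves
    return (h1 + i * h2) % m;
}

// The first `count` addresses that double hashing visits for key k.
std::vector<int> doubleHashSequence(int k, int count, int m) {
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) seq.push_back(doubleHashProbe(k, i, m));
    return seq;
}
```

For m = 11, keys 25 and 14 share home address 3 but follow different sequences (3, 9, 4, 10 versus 3, 8, 2, 7), which avoids the secondary clustering of quadratic probing.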
Key Offset
• The new address is a function of the collision address and the key:
offset = ⌊key / listSize⌋
newAddress = (collisionAddress + offset) mod listSize
hp(k, i) = (hp(k, i − 1) + ⌊k/m⌋) mod m
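A minimal C++ sketch of the key-offset probe sequence, assuming the home address comes from modulo division and non-negative keys, so integer division gives the floor (`keyOffsetSequence` is a hypothetical name).

```cpp
#include <cassert>
#include <vector>

// Key offset: offset = floor(key / listSize);
// newAddress = (collisionAddress + offset) mod listSize.
std::vector<long> keyOffsetSequence(long key, int count, long listSize) {
    assert(listSize > 0 && key >= 0);
    std::vector<long> seq;
    long address = key % listSize;        // home address (modulo division assumed)
    const long offset = key / listSize;   // integer division = floor for key >= 0
    for (int i = 0; i < count; ++i) {
        seq.push_back(address);
        address = (address + offset) % listSize;
    }
    return seq;
}
```

For key 166702 and listSize 307, the home address is 1 and the offset is 543, so the probes are 1, 237, 166. A key smaller than listSize gets offset 0 and would keep probing the same address, so a practical implementation needs a fallback for that case (for example, a minimum offset of 1); this sketch does not include one.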
• Major disadvantage of Open Addressing: each collision resolution increases the probability of future collisions.
→ Use linked lists to store synonyms
Linked list resolution
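A minimal C++ sketch of linked list resolution, assuming non-negative integer keys, h(k) = k mod m, and insertion at the front of each list (`ChainedTable` and its members are hypothetical names). Each home address holds a `std::list` of its synonyms, so an insert never has to look for another address.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Linked list resolution sketch: each home address stores a linked list
// of all keys that hash to it (its synonyms).
class ChainedTable {
public:
    explicit ChainedTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    void insert(int key) { buckets_[home(key)].push_front(key); }

    bool contains(int key) const {
        for (int k : buckets_[home(key)])  // search only the synonyms of key
            if (k == key) return true;
        return false;
    }

    std::size_t chainLength(int key) const { return buckets_[home(key)].size(); }

private:
    // h(k) = k mod m; keys are assumed to be non-negative.
    std::size_t home(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<int>> buckets_;
};
```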
Bucket hashing
• Hashing data to buckets that can hold multiple pieces of data.
• Each bucket has an address, and collisions are postponed until the bucket is full
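A minimal C++ sketch of bucket hashing, assuming non-negative integer keys, h(k) = k mod m, a bucket capacity of 3, and an overflow policy of trying the next bucket linearly; the capacity and the overflow policy are assumptions, and `BucketTable` and `kBucketSize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

constexpr std::size_t kBucketSize = 3;  // assumed bucket capacity

// Bucket hashing sketch: each address holds up to kBucketSize keys; a
// collision only needs resolving once the home bucket is full.
class BucketTable {
public:
    explicit BucketTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    // Returns the index of the bucket that received the key,
    // or nullopt if every bucket is full.
    std::optional<std::size_t> insert(int key) {
        assert(key >= 0);
        const std::size_t m = buckets_.size();
        const std::size_t home = static_cast<std::size_t>(key) % m;
        for (std::size_t i = 0; i < m; ++i) {
            const std::size_t addr = (home + i) % m;  // assumed linear overflow
            if (buckets_[addr].size() < kBucketSize) {
                buckets_[addr].push_back(key);
                return addr;
            }
        }
        return std::nullopt;
    }

private:
    std::vector<std::vector<int>> buckets_;
};
```

With m = 5, keys 3, 8 and 13 all fit in bucket 3; the fourth synonym, 18, is the first real collision and moves to bucket 4.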