Chapter 9
Hash
Data Structures and Algorithms
Luong The Nhan, Tran Giang Son
Faculty of Computer Science and Engineering
University of Technology, VNU-HCM
Outline
• Basic concepts
• Hash functions: Direct hashing, Modulo division, Digit extraction, Mid-square, Folding, Rotation, Pseudo-random
• Collision resolution: Open addressing, Linked list resolution, Bucket hashing
Outcomes
• L.O.5.1 - Depict the following concepts: hash table,
key, collision, and collision resolution.
• L.O.5.2 - Describe hash functions using pseudocode
and give examples to show their algorithms.
• L.O.5.3 - Describe collision resolution methods using
pseudocode and give examples to show their algorithms.
• L.O.5.4 - Implement hash tables using C/C++.
• L.O.5.5 - Analyze the complexity and develop
experiments (programs) to evaluate the methods provided for
hash tables.
• L.O.1.2 - Analyze algorithms and use Big-O notation to
characterize the computational complexity of algorithms
composed by using the following control structures:
sequence, branching, and iteration (not recursion).
Basic concepts
• Sequential search: O(n)
• Binary search: O(log₂ n)
→ Both require several key
comparisons before the
target is found.
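Is there a search algorithm
whose complexity is O(1)?
Yes: hashing, which computes the address of the target directly from its key (O(1) when no collision occurs).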
Figure: Each key has only one address
• Ideal hashing:
  • No location collision
  • Compact address space
• Collision: the location where the data are to be
inserted is already occupied by a synonym (another key that hashes to the same address).
Hash functions
Direct Hashing
The address is the key itself:
hash(Key) = Key
• Advantage: there is no collision.
• Disadvantage: the address space (storage
size) is as large as the key space.
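A minimal sketch of direct hashing in C++, assuming integer keys drawn from a small, known range. The names `KEY_SPACE`, `directHash`, and `DirectTable`, as well as the string payload, are illustrative and do not come from the slides. The table reserves one slot per possible key, which is exactly the storage cost noted above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical key space: keys are integers in [0, KEY_SPACE).
constexpr std::size_t KEY_SPACE = 100;

// Direct hashing: the address is the key itself.
std::size_t directHash(std::size_t key) {
    assert(key < KEY_SPACE);  // a key outside the key space has no slot
    return key;
}

// One slot per possible key, so no two distinct keys share an address.
struct DirectTable {
    std::array<std::optional<std::string>, KEY_SPACE> slots{};

    void insert(std::size_t key, const std::string& value) {
        slots[directHash(key)] = value;
    }

    // Returns the stored value, or an empty optional if the key is absent.
    std::optional<std::string> search(std::size_t key) const {
        return slots[directHash(key)];
    }
};
```

Both operations touch a single slot, so insertion and search take constant time regardless of how many keys are stored.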
Modulo division
Address = Key mod listSize
• Collisions tend to be fewer if listSize is a prime number.
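The formula above can be sketched in C++ as follows. The function name `moduloHash` and the sample values (list size 307, a prime) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Modulo-division hashing: Address = Key mod listSize.
// Addresses fall in the range [0, listSize).
std::size_t moduloHash(unsigned long key, std::size_t listSize) {
    assert(listSize > 0);  // the remainder is undefined for a zero divisor
    return static_cast<std::size_t>(key % listSize);
}
```

With listSize = 307, the key 121267 hashes to address 2 (121267 = 395 × 307 + 2). Keys that differ by a multiple of listSize, such as 2 and 309, are synonyms and collide.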
Folding
The key is divided into parts whose size
matches the address size.
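One way to combine the parts is fold shift: the parts are added and any carry beyond the address size is discarded. The sketch below assumes this variant, decimal keys given as digit strings, and an address of `partSize` digits. The name `foldShift` and its parameters are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fold shift (assumed variant): split the key's decimal digits into parts of
// partSize digits, add the parts, and keep only the low-order partSize digits.
unsigned long foldShift(const std::string& keyDigits, std::size_t partSize) {
    assert(partSize > 0 && partSize <= 9);
    unsigned long modulus = 1;
    for (std::size_t d = 0; d < partSize; ++d) modulus *= 10;

    unsigned long sum = 0;
    for (std::size_t pos = 0; pos < keyDigits.size(); pos += partSize) {
        // substr clamps the length, so the last part may be shorter.
        unsigned long part = std::stoul(keyDigits.substr(pos, partSize));
        sum = (sum + part) % modulus;  // discard any carry beyond the address size
    }
    return sum;
}
```

For the key 123456789 and a three-digit address, the parts are 123, 456, and 789. Their sum is 1368, and dropping the carry gives address 368.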
Rotation
• Hashing keys that are identical except for
the last character may create synonyms.
• The key is rotated before hashing.
Figure: original key and rotated key
• Used in combination with fold shift.
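A small sketch of rotation followed by fold shift, assuming decimal keys given as digit strings and a rotation that moves the last digit to the front. The names `rotateKey` and `foldShiftDigits` are illustrative, and the fold-shift helper is repeated here only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Rotate the key one position to the right: the last character moves to the front.
std::string rotateKey(std::string key) {
    if (key.size() > 1) {
        std::rotate(key.begin(), key.end() - 1, key.end());
    }
    return key;
}

// Fold shift on a digit string (assumed helper): add the parts of partSize
// digits and keep only the low-order partSize digits of the sum.
unsigned long foldShiftDigits(const std::string& digits, std::size_t partSize) {
    assert(partSize > 0 && partSize <= 9);
    unsigned long modulus = 1;
    for (std::size_t d = 0; d < partSize; ++d) modulus *= 10;

    unsigned long sum = 0;
    for (std::size_t pos = 0; pos < digits.size(); pos += partSize) {
        sum = (sum + std::stoul(digits.substr(pos, partSize))) % modulus;
    }
    return sum;
}
```

For the keys 600101 and 600102, which differ only in the last digit, fold shift with two-digit parts gives the adjacent addresses 62 and 63. After rotation the keys become 160010 and 260010, and the same fold shift gives 26 and 36, so the addresses are spread further apart.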
Pseudo-random
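The key is passed through a pseudo-random number generator, typically of the form y = ax + c with x the key, and the result is scaled into the address range.
For maximum efficiency, a and c should be
prime numbers.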
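A hedged sketch of pseudo-random hashing in C++. It assumes the common linear form y = a·key + c followed by modulo division to scale y into the list. The function name `pseudoRandomHash` and the default coefficients a = 17 and c = 7 (both prime) are illustrative choices, not values taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pseudo-random hashing sketch: y = a * key + c, then Address = y mod listSize.
// The linear form and the defaults a = 17, c = 7 are assumptions.
std::size_t pseudoRandomHash(std::uint64_t key, std::size_t listSize,
                             std::uint64_t a = 17, std::uint64_t c = 7) {
    assert(listSize > 0);
    std::uint64_t y = a * key + c;  // unsigned arithmetic wraps on overflow
    return static_cast<std::size_t>(y % listSize);
}
```

For key 121267 and listSize 307, y = 17 × 121267 + 7 = 2061546, and 2061546 mod 307 = 41.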
Collision resolution
• Except for direct hashing, none of the
hash functions is a one-to-one mapping
→ Collision resolution methods are required.
• Each collision resolution method can be
used independently of the hash function.
• A rule of thumb: a hashed list should not
be allowed to become more than 75% full.
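The rule of thumb can be checked with a few lines of C++. The sketch below assumes the usual definition of load factor (filled elements divided by list size). The names `loadFactor` and `exceedsRuleOfThumb` are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Load factor: the fraction of the list that is occupied.
double loadFactor(std::size_t filled, std::size_t listSize) {
    assert(listSize > 0);
    return static_cast<double>(filled) / static_cast<double>(listSize);
}

// True when the list is more than 75% full.
// Integer form of filled / listSize > 3 / 4, which avoids rounding issues.
bool exceedsRuleOfThumb(std::size_t filled, std::size_t listSize) {
    return filled * 4 > listSize * 3;
}
```

A list of 100 elements holding 75 keys is still within the rule, while the 76th key exceeds it. An implementation might respond at that point by allocating a larger list and rehashing, although the slides do not prescribe a specific action.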
• As data are added and collisions are
resolved, hashing tends to cause data to
group within the list.
→ Clustering: data are unevenly distributed
across the list.
• A high degree of clustering increases the
number of probes needed to locate an element.
→ Clustering should be minimized.
• Primary clustering: data become clustered
around a home address.
• Secondary clustering: data become grouped
along a collision path throughout a list.
Open addressing
When a collision occurs, an unoccupied element is searched for
in which to place the new element.
Algorithm hashInsert(ref T <array>, val k <key>)
Inserts key k into table T.
Algorithm hashSearch(val T <array>, val k <key>)
Searches for key k in table T.
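The two algorithm headers above can be sketched in C++ as follows. This is one possible realization, not the slides' own pseudocode: it assumes unsigned integer keys, an empty slot represented by an empty `std::optional`, and a probe function `probe(k, i, m) = (k mod m + i) mod m`. The `Table` alias and the `probe` helper are hypothetical names. Deletion is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Table = std::vector<std::optional<unsigned>>;  // empty optional = free slot

// Assumed probe function: home address k mod m, then step forward by i.
std::size_t probe(unsigned k, std::size_t i, std::size_t m) {
    assert(m > 0);
    return (k % m + i) % m;
}

// Inserts key k into table T; returns the slot used, or nullopt if T is full.
std::optional<std::size_t> hashInsert(Table& T, unsigned k) {
    const std::size_t m = T.size();
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t j = probe(k, i, m);
        if (!T[j].has_value()) {
            T[j] = k;
            return j;
        }
    }
    return std::nullopt;  // every slot was probed and occupied: table overflow
}

// Searches for key k in table T; returns its slot, or nullopt if k is absent.
std::optional<std::size_t> hashSearch(const Table& T, unsigned k) {
    const std::size_t m = T.size();
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t j = probe(k, i, m);
        if (!T[j].has_value()) return std::nullopt;  // a free slot ends the probe path
        if (*T[j] == k) return j;
    }
    return std::nullopt;
}
```

With a table of 7 slots, key 10 lands in slot 3. Key 17 also has home address 3, so it collides and is placed in slot 4, where a later search finds it after two probes.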
Linear Probing
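• When a home address is occupied, go to
the next address (the current address + 1):
hp(k, i) = (h(k) + i) mod m
where h(k) is the home address of key k, i = 0, 1, ..., m − 1 is the probe number, and m is the table size.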
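The probe formula can be traced with a short C++ sketch. The names `linearProbe` and `probeSequence` are illustrative, and the home address h(k) is passed in directly so that the example does not depend on any particular hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// hp(k, i) = (h(k) + i) mod m, with home = h(k) supplied by the caller.
std::size_t linearProbe(std::size_t home, std::size_t i, std::size_t m) {
    assert(m > 0);
    return (home + i) % m;
}

// The full probe sequence for one key: i = 0, 1, ..., m - 1.
// Each slot of the table is visited exactly once, wrapping around at the end.
std::vector<std::size_t> probeSequence(std::size_t home, std::size_t m) {
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < m; ++i) {
        seq.push_back(linearProbe(home, i, m));
    }
    return seq;
}
```

For a home address of 5 in a table of size 7, the addresses are probed in the order 5, 6, 0, 1, 2, 3, 4.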
• Advantages:
  • quite simple to implement
  • data tend to remain near their home
    address (significant for disk addresses)
• Disadvantages:
  • produces primary clustering