Data Structure and Algorithms CO2003 Chapter 9 Hash


Lecturer: Duc Dung Nguyen, PhD.

Contact: nddung@hcmut.edu.vn

October 24, 2016

Faculty of Computer Science and Engineering

Hochiminh city University of Technology


1 Basic concepts

2 Hash functions

3 Collision resolution


• L.O.5.2 - Describe hashing functions using pseudocode and give examples to show their algorithms.

• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show their algorithms.

• L.O.5.4 - Implement hash tables using C/C++.

• L.O.5.5 - Analyze the complexity and develop experiments (programs) to evaluate methods supplied for hash tables.

• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize the computational


Basic concepts


• Sequential search: O(n)

• Binary search: O(log₂ n)

→ Requiring several key comparisons before the target is found


Is there a search algorithm whose complexity is O(1)?


YES


• Home address: address produced by a hash function

• Prime area: memory that contains all the home addresses

• Synonyms: a set of keys that hash to the same location

• Collision: the location of the data to be inserted is already occupied by the synonym data

• Ideal hashing:

  • No location collision

  • Compact address space
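The terms above can be made concrete with a minimal sketch. The hash function used here (key mod table size) and the names `homeAddress` and `areSynonyms` are assumptions chosen for illustration, not definitions from the slides.

```cpp
#include <cstddef>

// Hypothetical hash function for illustration: key mod tableSize.
// The returned index is the key's home address inside the prime area.
std::size_t homeAddress(unsigned long key, std::size_t tableSize) {
    return key % tableSize;
}

// Two distinct keys are synonyms when they hash to the same home address.
bool areSynonyms(unsigned long a, unsigned long b, std::size_t tableSize) {
    return a != b && homeAddress(a, tableSize) == homeAddress(b, tableSize);
}
```

With a table of size 7, keys 10 and 17 both have home address 3, so they are synonyms; inserting 17 after 10 finds the location occupied, which is a collision.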


Hash functions


Direct Hashing

The address is the key itself:

hash(Key) = Key


• Advantage: there is no collision.

• Disadvantage: the address space (storage size) is as large as the key space
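A minimal sketch of direct hashing, assuming a small integer key space of 0..99 and string payloads; the class name `DirectTable` and the constant `kKeySpace` are hypothetical. It shows both points above: no two keys share a slot, and the table needs one slot per possible key.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>

// Assumed key space for this sketch: integer keys 0..99.
constexpr std::size_t kKeySpace = 100;

class DirectTable {
public:
    // hash(Key) = Key: the key is used directly as the array index.
    static std::size_t hash(std::size_t key) { return key; }

    bool insert(std::size_t key, const std::string& value) {
        if (key >= kKeySpace) return false;  // key outside the address space
        slots_[hash(key)] = value;
        return true;
    }

    std::optional<std::string> find(std::size_t key) const {
        if (key >= kKeySpace) return std::nullopt;
        return slots_[hash(key)];
    }

private:
    // One slot per possible key: storage is as large as the key space.
    std::array<std::optional<std::string>, kKeySpace> slots_{};
};
```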


Modulo division

Address = Key mod listSize

• Fewer collisions if listSize is a prime number

• Example:

Numbering system to handle 1,000,000 employees

Data space to store up to 300 employees

hash(121267) = 121267 mod 307 = 2
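The example can be reproduced with a one-line function. The name `moduloHash` is an assumption; the list size 307 comes from the slide and is a prime slightly larger than the 300 required slots.

```cpp
#include <cstddef>

// Address = Key mod listSize.
std::size_t moduloHash(unsigned long key, std::size_t listSize) {
    return key % listSize;
}
```

Calling `moduloHash(121267, 307)` gives 2, matching the example above.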

Digit extraction

Address = selected digits from Key
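A minimal sketch of selecting digits from a key. The slide does not say which digits are selected, so the padding width and the 1-based positions passed in are assumptions of this example, as is the function name `digitExtractionHash`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds an address from selected digits of the key.
// `width` pads the key with leading zeros; `positions` are 1-based digit
// positions counted from the left. Both are assumptions of this sketch.
unsigned long digitExtractionHash(unsigned long key, std::size_t width,
                                  const std::vector<std::size_t>& positions) {
    std::string digits = std::to_string(key);
    if (digits.size() < width) {
        digits = std::string(width - digits.size(), '0') + digits;
    }

    unsigned long address = 0;
    for (std::size_t pos : positions) {
        if (pos == 0 || pos > digits.size()) continue;  // skip invalid positions
        address = address * 10 + static_cast<unsigned long>(digits[pos - 1] - '0');
    }
    return address;
}
```

For instance, selecting the first, third, and fourth digits of 379452 yields 394.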

Midsquare

Address = middle digits of Key²

Example:

9452 * 9452 = 89340304 → 3403
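The example can be sketched as follows, assuming the square fits in `unsigned long long` and the window of `count` digits is centred in the decimal representation of the square. The name `midsquareHash` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Address = middle `count` digits of key * key.
// Assumes key * key fits in unsigned long long.
unsigned long midsquareHash(unsigned long long key, std::size_t count) {
    const std::string sq = std::to_string(key * key);
    if (count >= sq.size()) return std::stoul(sq);
    const std::size_t start = (sq.size() - count) / 2;  // centre the window
    return std::stoul(sq.substr(start, count));
}
```

`midsquareHash(9452, 4)` squares the key to 89340304 and returns 3403.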

• Disadvantage: the size of Key² is too large.

• Variations: use only a portion of the key

Example:

379452: 379 * 379 = 143641 → 364

121267: 121 * 121 = 014641 → 464

045128: 045 * 045 = 002025 → 202
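A sketch of this variation, mirroring the three examples: square the first three digits of a six-digit key, pad the square to six digits, and keep digits 3 to 5. The fixed widths are read off the examples and are otherwise assumptions, as is the name `midsquarePrefixHash`.

```cpp
#include <cstddef>
#include <string>

// Midsquare on a portion of the key: square the first three digits of a
// six-digit key, pad the square to six digits, and take digits 3..5.
unsigned long midsquarePrefixHash(unsigned long key) {
    std::string digits = std::to_string(key);
    if (digits.size() < 6) {
        digits = std::string(6 - digits.size(), '0') + digits;
    }
    const unsigned long prefix = std::stoul(digits.substr(0, 3));

    std::string sq = std::to_string(prefix * prefix);
    if (sq.size() < 6) {
        sq = std::string(6 - sq.size(), '0') + sq;
    }
    return std::stoul(sq.substr(2, 3));  // digits 3..5 of the padded square
}
```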

Rotation

• Hashing keys that are identical except for the last character may create synonyms

• The key is rotated before hashing

• Used in combination with fold shift.

[Figure: original key and rotated key]
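A minimal sketch of rotating a key, assuming the rotation moves the last digit to the front. The direction, the single-digit shift, and the sample keys are assumptions of this example; the name `rotateKey` is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Rotates the key's digits right by one: the last digit moves to the front.
// The direction and the single-digit shift are assumptions of this sketch.
std::string rotateKey(std::string key) {
    if (key.size() > 1) {
        std::rotate(key.begin(), key.end() - 1, key.end());
    }
    return key;
}
```

Keys such as 600101 and 600102 differ only in the last digit; after rotation they become 160010 and 260010, so the difference sits in the leading digit before any further hashing step is applied.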

For maximum efficiency, a and c should be prime numbers
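The formula that a and c belong to is not reproduced on this slide. A minimal sketch, assuming the method has the common linear form address = (a · key + c) mod listSize; the function name and the sample values a = 17, c = 7, listSize = 307 are illustrative assumptions.

```cpp
#include <cstddef>

// Assumed form: address = (a * key + c) mod listSize.
// Assumes a * key + c does not overflow unsigned long long.
std::size_t pseudoRandomHash(unsigned long long key, unsigned long long a,
                             unsigned long long c, std::size_t listSize) {
    return static_cast<std::size_t>((a * key + c) % listSize);
}
```

Under these assumptions, `pseudoRandomHash(121267, 17, 7, 307)` evaluates to 41.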


Collision resolution

• Except for direct hashing, none of the other methods is a one-to-one mapping

→ Requiring collision resolution methods

• Each collision resolution method can be used independently with each hash function


• Open addressing

• Linked list resolution

• Bucket hashing

Open Addressing

When a collision occurs, an unoccupied element is searched for in which to place the new element.

Hash and probe function:

hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}

where U is the set of keys, the second argument is the probe number, and the result is an address.

Algorithm hashInsert(ref T <array>, val k <key>)

Inserts key k into table T
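Only the header of hashInsert appears here, so the body below is an assumption that follows the usual open-addressing pattern: try hp(k, 0), hp(k, 1), ... until an empty slot is found or all m probes are used. The type aliases `Table` and `Probe` are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// An empty slot is std::nullopt; an occupied slot holds its key.
using Table = std::vector<std::optional<unsigned long>>;
// Probe function hp(k, i): key and probe number in, address out.
using Probe = std::function<std::size_t(unsigned long, std::size_t)>;

// Inserts key k into table T by trying hp(k, 0), hp(k, 1), ... in turn.
// Returns the address used, or std::nullopt if all m probes hit occupied slots.
std::optional<std::size_t> hashInsert(Table& T, unsigned long k, const Probe& hp) {
    const std::size_t m = T.size();
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t j = hp(k, i);
        if (!T[j].has_value()) {
            T[j] = k;
            return j;
        }
    }
    return std::nullopt;  // no free slot along this probe sequence
}
```

Any of the probe functions in this chapter can be passed as `hp`, which reflects the point that a collision resolution method can be combined with any hash function.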


Linear Probing

• When a home address is occupied, go to the next address (the current address + 1):

hp(k, i) = (h(k) + i) mod m
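The probe formula translates directly into code. This sketch assumes h(k) = k mod m as the underlying hash function; the name `linearProbe` is hypothetical.

```cpp
#include <cstddef>

// hp(k, i) = (h(k) + i) mod m, with h(k) = k mod m assumed for this sketch.
std::size_t linearProbe(unsigned long k, std::size_t i, std::size_t m) {
    return (k % m + i) % m;
}
```

For m = 7 and k = 12 the probe sequence starts 5, 6, 0: each probe moves one address forward and wraps around at the end of the table.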


• Advantages:

• quite simple to implement

• data tend to remain near their home address (significant for disk addresses)

• Disadvantages:

• produces primary clustering

Quadratic Probing

• The address increment is the collision probe number squared:

hp(k, i) = (h(k) + i²) mod m

• Advantages:

• works much better than linear probing

• Disadvantages:

• time required to square numbers

• produces secondary clustering

h(k1) = h(k2) → hp(k1, i) = hp(k2, i)
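A minimal sketch of the quadratic probe, again assuming h(k) = k mod m; the name `quadraticProbe` is hypothetical. It also makes the secondary clustering condition easy to check.

```cpp
#include <cstddef>

// hp(k, i) = (h(k) + i^2) mod m, with h(k) = k mod m assumed for this sketch.
std::size_t quadraticProbe(unsigned long k, std::size_t i, std::size_t m) {
    return (k % m + i * i) % m;
}
```

For m = 7, keys 5 and 12 share the home address 5, and both follow the same sequence 5, 6, 2, 0: synonyms never separate, which is the secondary clustering stated above.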

Double Hashing

• Using two hash functions:

hp(k, i) = (h1(k) + i · h2(k)) mod m
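The two hash functions are not specified here. This sketch assumes one common choice, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)), which keeps h2 non-zero so that every probe moves to a different address; the name `doubleHashProbe` is hypothetical.

```cpp
#include <cstddef>

// hp(k, i) = (h1(k) + i * h2(k)) mod m.
// Assumed for this sketch: h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1)), m >= 2.
std::size_t doubleHashProbe(unsigned long k, std::size_t i, std::size_t m) {
    const std::size_t h1 = k % m;
    const std::size_t h2 = 1 + k % (m - 1);  // never zero, so the probe always advances
    return (h1 + i * h2) % m;
}
```

For m = 7, keys 5 and 12 share the home address 5, but their second probes differ (4 and 6), so synonyms no longer follow the same sequence.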

Key Offset

• The new address is a function of the collision address and the key.

offset = ⌊key / listSize⌋

newAddress = (collisionAddress + offset) mod listSize

hp(k, i) = (hp(k, i − 1) + ⌊k / m⌋) mod m
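The formulas can be sketched as a single step function, reading the brackets as integer (floor) division. The name `keyOffsetNext` and the sample key are assumptions of this example.

```cpp
#include <cstddef>

// newAddress = (collisionAddress + floor(key / listSize)) mod listSize.
std::size_t keyOffsetNext(unsigned long key, std::size_t collisionAddress,
                          std::size_t listSize) {
    const std::size_t offset = key / listSize;  // integer division acts as floor
    return (collisionAddress + offset) % listSize;
}
```

With listSize = 307, the key 166702 has home address 1 and offset 543, so a collision at address 1 leads to `keyOffsetNext(166702, 1, 307)`, which is 237. Note that a key smaller than listSize gives an offset of 0, in which case this step returns the same address.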

Linked list resolution

• Major disadvantage of Open Addressing: each collision resolution increases the probability of future collisions.

→ use linked lists to store synonyms
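A minimal sketch of linked list resolution, assuming integer keys, key mod size as the hash function, and a non-zero table size; the class name `ChainedTable` is hypothetical. Each home address holds the head of a list, and synonyms are stored in that list instead of taking other slots in the prime area.

```cpp
#include <algorithm>
#include <cstddef>
#include <forward_list>
#include <vector>

class ChainedTable {
public:
    // Assumes size > 0.
    explicit ChainedTable(std::size_t size) : heads_(size) {}

    // Synonyms are pushed onto the list at their shared home address.
    void insert(unsigned long key) {
        heads_[key % heads_.size()].push_front(key);
    }

    bool contains(unsigned long key) const {
        const auto& chain = heads_[key % heads_.size()];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

private:
    std::vector<std::forward_list<unsigned long>> heads_;
};
```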

Bucket hashing

• Hashing data to buckets that can hold multiple pieces of data.

• Each bucket has an address, and collisions are postponed until the bucket is full
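A minimal sketch of bucket hashing, assuming three slots per bucket, integer keys, and key mod bucket count as the hash function; the names `Bucket`, `BucketTable`, and `kBucketCapacity` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kBucketCapacity = 3;  // assumed slots per bucket

struct Bucket {
    std::array<unsigned long, kBucketCapacity> items{};
    std::size_t count = 0;
};

class BucketTable {
public:
    // Assumes bucketCount > 0.
    explicit BucketTable(std::size_t bucketCount) : buckets_(bucketCount) {}

    // Returns false when the home bucket is full; only then is there a
    // collision that another resolution method would have to handle.
    bool insert(unsigned long key) {
        Bucket& b = buckets_[key % buckets_.size()];
        if (b.count == kBucketCapacity) return false;
        b.items[b.count++] = key;
        return true;
    }

private:
    std::vector<Bucket> buckets_;
};
```

With 7 buckets, keys 5, 12, and 19 all land in bucket 5 without a collision; the fourth synonym, 26, is the first to be rejected.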
