Data Structure and Algorithms CO2003 Chapter 9 Hash


Lecturer: Duc Dung Nguyen, PhD.

Contact: nddung@hcmut.edu.vn

October 24, 2016

Faculty of Computer Science and Engineering

Ho Chi Minh City University of Technology

1 Basic concepts

2 Hash functions

3 Collision resolution

• L.O.5.2 - Describe hashing functions using pseudocode and give examples to show their algorithms.

• L.O.5.3 - Describe collision resolution methods using pseudocode and give examples to show their algorithms.

• L.O.5.4 - Implement hashing tables using C/C++.

• L.O.5.5 - Analyze the complexity and develop experiments (programs) to evaluate the methods supplied for hashing tables.

• L.O.1.2 - Analyze algorithms and use Big-O notation to characterize the computational

Basic concepts

• Sequential search: O(n)

• Binary search: O(log₂ n)

→ Both require several key comparisons before the target is found

Is there a search algorithm whose complexity is O(1)?

YES


• Home address: the address produced by a hash function.

• Prime area: the memory that contains all the home addresses.

• Synonyms: a set of keys that hash to the same location.

• Collision: the location of the data to be inserted is already occupied by synonym data.

• Ideal hashing:

  • No location collision

  • Compact address space

Hash functions

Direct Hashing

The address is the key itself:

hash(Key) = Key


• Advantage: there is no collision.

• Disadvantage: the address space (storage size) is as large as the key space
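The trade-off above can be sketched as a small direct-address table. This is only an illustration: the class name `DirectTable`, its members, and the choice of string values are assumptions, not part of the lecture.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Direct hashing sketch: the key itself is the address, so the table needs
// one slot for every possible key. All names are illustrative.
class DirectTable {
public:
    explicit DirectTable(std::size_t keySpace)
        : used_(keySpace, false), values_(keySpace) {}

    // hash(Key) = Key
    static std::size_t hash(std::size_t key) { return key; }

    void insert(std::size_t key, const std::string& value) {
        assert(key < values_.size());  // the key must lie inside the key space
        used_[hash(key)] = true;
        values_[hash(key)] = value;
    }

    // Returns a pointer to the stored value, or nullptr if the slot is empty.
    const std::string* find(std::size_t key) const {
        assert(key < values_.size());
        return used_[hash(key)] ? &values_[hash(key)] : nullptr;
    }

    // Storage size equals the size of the key space.
    std::size_t capacity() const { return values_.size(); }

private:
    std::vector<bool> used_;
    std::vector<std::string> values_;
};
```

Two different keys can never share a slot, which is why there is no collision, but `capacity()` grows with the key space rather than with the number of stored items.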


Modulo division

Address = Key mod listSize

• Fewer collisions if listSize is a prime number

• Example:

Numbering system to handle 1,000,000 employees

Data space to store up to 300 employees, so listSize = 307 (the smallest prime above 300)

hash(121267) = 121267 mod 307 = 2
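A minimal sketch of modulo-division hashing, using the numbers from the example. The function name `hashModulo` is an assumption made for illustration.

```cpp
#include <cassert>

// Modulo-division hashing: Address = Key mod listSize.
// The function name is illustrative.
unsigned long hashModulo(unsigned long key, unsigned long listSize) {
    assert(listSize > 0);  // a modulus of zero is undefined
    return key % listSize;
}
```

With `listSize = 307`, `hashModulo(121267, 307)` gives 2, matching the example. Any key that differs from 121267 by a multiple of 307 is a synonym of it.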

Digit extraction

Address = selected digits from Key
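One way to sketch digit extraction is to pick fixed digit positions out of the key's decimal form. The function name, the 0-based positions counted from the left, and the sample selection {0, 2, 3} are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Digit extraction sketch: the address is built from selected decimal digits
// of the key. Positions are 0-based from the left (an assumption).
unsigned long hashDigitExtraction(unsigned long key,
                                  const std::vector<std::size_t>& positions) {
    const std::string digits = std::to_string(key);
    unsigned long address = 0;
    for (std::size_t pos : positions) {
        assert(pos < digits.size());  // every position must exist in the key
        address = address * 10 + static_cast<unsigned long>(digits[pos] - '0');
    }
    return address;
}
```

For instance, selecting the first, third, and fourth digits of 379452 yields 394.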

Mid-square

Address = middle digits of Key²

Example:

9452 × 9452 = 89340304 → 3403

• Disadvantage: Key² can be too large to handle.

• Variation: square only a portion of the key.

Example:

379452: 379 × 379 = 143641 → 364
121267: 121 × 121 = 014641 → 464
045128: 045 × 045 = 002025 → 202
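Both mid-square examples can be reproduced with a short sketch. The function names and the two parameters (`dropLow`, the number of low-order digits discarded, and `keep`, the number of digits kept) are assumptions chosen to match the digits selected in the examples.

```cpp
#include <cassert>

// Returns 10^n; n is assumed small enough not to overflow.
unsigned long long tenToThe(unsigned n) {
    unsigned long long p = 1;
    while (n-- > 0) p *= 10;
    return p;
}

// Mid-square sketch: square the key, drop `dropLow` low-order decimal digits,
// then keep the next `keep` digits as the address.
unsigned long long hashMidSquare(unsigned long long key,
                                 unsigned dropLow, unsigned keep) {
    assert(key <= 0xFFFFFFFFULL);  // keeps key * key inside 64 bits
    const unsigned long long squared = key * key;
    return (squared / tenToThe(dropLow)) % tenToThe(keep);
}
```

`hashMidSquare(9452, 2, 4)` gives 3403. For the variation, squaring only the first three digits of 379452 and calling `hashMidSquare(379, 1, 3)` gives 364.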

Rotation

• Hashing keys that are identical except for the last character may create synonyms.

• The key is rotated before hashing.

• Used in combination with fold shift.

(Table: original key | rotated key)
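A minimal sketch of rotation, assuming the rotation moves the last character of the key to the front. The function name and the sample keys are illustrative and are not the values from the original table.

```cpp
#include <cassert>
#include <string>

// Rotation sketch: move the last character of the key to the front, so keys
// that differ only in their last character differ in their leading position.
std::string rotateKey(const std::string& key) {
    if (key.size() < 2) return key;  // nothing to rotate
    return key.back() + key.substr(0, key.size() - 1);
}
```

Under this assumption, the hypothetical keys "600101" and "600102" become "160010" and "260010", which a later hashing step is more likely to spread apart.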


For maximum efficiency, a and c should be prime numbers
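The definition of a and c does not survive in this text. Assuming they are the coefficients of a linear pseudo-random generator, so that address = (a × Key + c) mod listSize, a sketch could look like this. The function name and the sample values a = 17 and c = 7 are illustrative assumptions.

```cpp
#include <cassert>

// Pseudo-random hashing sketch under the assumed form
//   address = (a * key + c) mod listSize.
// Keys are assumed small enough that a * key + c does not overflow.
unsigned long long hashPseudoRandom(unsigned long long key,
                                    unsigned long long a,
                                    unsigned long long c,
                                    unsigned long long listSize) {
    assert(listSize > 0);
    return (a * key + c) % listSize;
}
```

With the primes a = 17 and c = 7, `hashPseudoRandom(121267, 17, 7, 307)` evaluates to 41.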

Collision resolution

• Except for direct hashing, none of the other hash functions is a one-to-one mapping

→ Collision resolution methods are required

• Each collision resolution method can be used independently with each hash function

• Open addressing

• Linked list resolution

• Bucket hashing

Open Addressing

When a collision occurs, the table is searched for an unoccupied element in which to place the new element.

Hash and probe function:

hp : U × {0, 1, 2, ..., m − 1} → {0, 1, 2, ..., m − 1}

where U is the set of keys, the first set {0, 1, 2, ..., m − 1} holds the probe numbers, and the second holds the addresses.

Algorithm hashInsert(ref T <array>, val k <key>)

Inserts key k into table T
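Only the header of hashInsert appears here, so the following is a sketch of how such an insert is commonly written, not the lecture's own body. It assumes non-negative integer keys, an `EMPTY` marker of -1 for unoccupied slots, and a caller-supplied probe function `hp(k, i)`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

const long EMPTY = -1;  // assumed marker for an unoccupied slot

// Inserts key k into table T by trying hp(k, 0), hp(k, 1), ... in turn.
// Returns the address used, or -1 if all m probes hit occupied slots.
long hashInsert(std::vector<long>& T, long k,
                const std::function<std::size_t(long, std::size_t)>& hp) {
    assert(k >= 0);  // negative keys would clash with the EMPTY marker
    const std::size_t m = T.size();
    for (std::size_t i = 0; i < m; ++i) {
        const std::size_t j = hp(k, i);
        assert(j < m);  // the probe function must return a valid address
        if (T[j] == EMPTY) {
            T[j] = k;
            return static_cast<long>(j);
        }
    }
    return -1;  // table overflow: no free slot found in m probes
}
```

Passing the probe function as a parameter lets the same insert routine be reused with any of the probing strategies.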

Linear Probing

• When a home address is occupied, go to the next address (the current address + 1):

hp(k, i) = (h(k) + i) mod m

• Advantages:

• quite simple to implement

• data tend to remain near their home address (significant for disk addresses)

• Disadvantages:

• produces primary clustering
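The linear probe formula can be sketched as follows, assuming h(k) = k mod m as the underlying hash function. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Linear probing sketch: hp(k, i) = (h(k) + i) mod m,
// with h(k) = k mod m assumed as the home-address function.
std::size_t linearProbe(std::size_t k, std::size_t i, std::size_t m) {
    assert(m > 0);
    return (k % m + i) % m;
}
```

For k = 121267 and m = 307 the probe sequence starts 2, 3, 4. Because every key that lands anywhere in an occupied run walks the same consecutive addresses, the run keeps growing, which is the primary clustering noted above.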

Quadratic Probing

• The address increment is the collision probe number squared:

hp(k, i) = (h(k) + i²) mod m

• Advantages:

• works much better than linear probing

• Disadvantages:

• time required to square numbers

• produces secondary clustering

h(k₁) = h(k₂) → hp(k₁, i) = hp(k₂, i)
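A sketch of the quadratic probe, again assuming h(k) = k mod m. The function name is illustrative, and the probe number i is assumed small enough that i * i does not overflow.

```cpp
#include <cassert>
#include <cstddef>

// Quadratic probing sketch: hp(k, i) = (h(k) + i^2) mod m,
// with h(k) = k mod m assumed as the home-address function.
std::size_t quadraticProbe(std::size_t k, std::size_t i, std::size_t m) {
    assert(m > 0);
    return (k % m + i * i) % m;
}
```

For k = 121267 and m = 307 the sequence starts 2, 3, 6, 11. The synonym 121574 (= 121267 + 307) follows exactly the same sequence, which illustrates secondary clustering.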

Double hashing

• Using two hash functions:

hp(k, i) = (h₁(k) + i · h₂(k)) mod m
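The two hash functions are not specified here. A sketch under one common choice, h₁(k) = k mod m and h₂(k) = 1 + (k mod (m − 1)), might look like this; both functions and the function name are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Double hashing sketch: hp(k, i) = (h1(k) + i * h2(k)) mod m.
// Assumed hash functions: h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1)),
// so the step h2(k) is never zero.
std::size_t doubleHashProbe(std::size_t k, std::size_t i, std::size_t m) {
    assert(m > 1);
    const std::size_t h1 = k % m;
    const std::size_t h2 = 1 + k % (m - 1);
    return (h1 + i * h2) % m;
}
```

With m = 7, the synonyms 10 and 17 share home address 3 but diverge on the next probe (addresses 1 and 2), because their step sizes differ.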

Key offset

• The new address is a function of the collision address and the key.

offset = ⌊key / listSize⌋

newAddress = (collisionAddress + offset) mod listSize

hp(k, i) = (hp(k, i − 1) + ⌊k / m⌋) mod m
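The key-offset step can be sketched directly from the two formulas, reading the brackets as integer division. The function name and the sample keys are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Key-offset sketch: the next address depends on the collision address
// and on the key through offset = key / listSize (integer quotient).
std::size_t keyOffsetNext(std::size_t key, std::size_t collisionAddress,
                          std::size_t listSize) {
    assert(listSize > 0);
    const std::size_t offset = key / listSize;
    return (collisionAddress + offset) % listSize;
}
```

For key 166702 with listSize 307, the home address is 1 and the offset is 543, so the next address is (1 + 543) mod 307 = 237. One caveat of this sketch: when the offset is a multiple of listSize (for example, any key smaller than listSize), the address does not move, so a practical version would need a fallback step.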

Linked list resolution

• Major disadvantage of open addressing: each collision resolution increases the probability of future collisions.

→ Use linked lists to store synonyms
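A minimal sketch of linked list resolution, assuming modulo-division hashing and unsigned integer keys. The class name `ChainedTable` and its members are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Linked list resolution sketch: every address in the prime area owns a
// linked list that holds all synonyms hashed to that address.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t listSize) : chains_(listSize) {
        assert(listSize > 0);
    }

    // Synonyms never take another key's home address; they join the chain.
    void insert(unsigned long key) {
        chains_[key % chains_.size()].push_front(key);
    }

    bool contains(unsigned long key) const {
        for (unsigned long stored : chains_[key % chains_.size()]) {
            if (stored == key) return true;
        }
        return false;
    }

    std::size_t chainLength(std::size_t address) const {
        assert(address < chains_.size());
        return chains_[address].size();
    }

private:
    std::vector<std::list<unsigned long>> chains_;
};
```

Inserting the synonyms 10, 17, and 3 into a table of size 7 leaves a chain of length 3 at address 3 and every other address untouched, so one collision does not raise the chance of collisions elsewhere.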

Bucket hashing

• Hashing data to buckets that can hold multiple pieces of data.

• Each bucket has an address, and collisions are postponed until the bucket is full.
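Bucket hashing can be sketched as a table of fixed-capacity buckets, assuming modulo-division hashing. The class name `BucketTable`, its members, and the policy of simply reporting a full bucket are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bucket hashing sketch: each address is a bucket holding up to bucketSize
// keys, so a collision only arises once the home bucket is full.
class BucketTable {
public:
    BucketTable(std::size_t bucketCount, std::size_t bucketSize)
        : buckets_(bucketCount), bucketSize_(bucketSize) {
        assert(bucketCount > 0 && bucketSize > 0);
    }

    // Returns true if the key fit into its home bucket. A false result means
    // the bucket is full and some collision resolution method is still needed.
    bool insert(unsigned long key) {
        std::vector<unsigned long>& bucket = buckets_[key % buckets_.size()];
        if (bucket.size() == bucketSize_) return false;
        bucket.push_back(key);
        return true;
    }

    bool contains(unsigned long key) const {
        for (unsigned long stored : buckets_[key % buckets_.size()]) {
            if (stored == key) return true;
        }
        return false;
    }

private:
    std::vector<std::vector<unsigned long>> buckets_;
    std::size_t bucketSize_;
};
```

With 7 buckets of size 2, the synonyms 10 and 17 both fit at address 3, and only the third synonym, 3, is rejected.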

Date posted: 29/03/2017, 18:21
