Distributed Database Management Systems: Lecture 35

Distributed Database Management Systems: Lecture 35. The main topics covered in this chapter include: query optimization and fragmented queries; joins replaced by semijoins; three major QO algorithms; distributed query processing algorithms;...

Page 1

Distributed Database Management Systems

Lecture 35

Page 2

In the previous lecture

Page 4

Semijoin-based Algorithms

Page 5

• Reduces cost of join queries

• Semijoin is …….

• A join of two relations can be replaced by a semijoin (SJ) of one or both relations.

Page 8

• Ignoring Tmsg, semijoin is better if

  – size(Π_A(S)) + size(R ⋉_A S) < size(R)

• Join is better if …

• Semijoin is better if …
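The semijoin condition on this page, size(Π_A(S)) + size(R ⋉_A S) < size(R), can be tried on toy data. The Python sketch below is only an illustration: the relations `R` and `S`, the helper names, and the choice to measure size as a plain count of tuples or values (every item has length 1) are assumptions, not part of the lecture.

```python
def project(rel, attr):
    """Distinct values of attr in rel, i.e. the projection on attr."""
    return {row[attr] for row in rel}


def semijoin(r, s, attr):
    """R semijoin S on attr: the tuples of R that have a match in S."""
    keys = project(s, attr)
    return [row for row in r if row[attr] in keys]


def semijoin_is_better(r, s, attr):
    """Check size(proj_A(S)) + size(R semijoin S) < size(R), unit-size items."""
    return len(project(s, attr)) + len(semijoin(r, s, attr)) < len(r)


# Hypothetical data: R has 10 tuples, S references only two join values.
R = [{"A": i, "payload": f"r{i}"} for i in range(10)]
S = [{"A": 1}, {"A": 3}, {"A": 3}]

# Hypothetical counter-case: every tuple of R finds a match in S_all.
S_all = [{"A": i} for i in range(10)]

print(semijoin_is_better(R, S, "A"))
print(semijoin_is_better(R, S_all, "A"))
```

With the selective `S`, shipping the two join values and the two matching tuples is cheaper than shipping all of `R`. With `S_all` nothing is filtered out, so the semijoin only adds the cost of shipping the projection.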

Page 9

• SJ with more than two tables is more complex

• SJ can be applied to each individual join; consider

Page 10

• EMP ⋈ ASG ⋈ PROJ = EMP’ ⋈ ASG’ ⋈ PROJ

where

• EMP’ = EMP ⋉ ASG and

• ASG’ = ASG ⋉ PROJ

or rather, reducing EMP further,

• EMP” = EMP ⋉ (ASG ⋉ PROJ)
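The semijoin program on this page can be traced on a small instance. The following Python sketch uses invented tuples for EMP, ASG and PROJ (the data, the helper functions, and the reduction of each relation to its join attributes are all assumptions) to show that EMP” is never larger than EMP’ and that the reduced relations give the same join result.

```python
def semijoin(r, s, attr):
    """Tuples of r whose attr value also appears in s."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]


def join(r, s, attr):
    """Equi-join of r and s on a shared attribute (nested loops)."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]


# Hypothetical instance, keeping only the join attributes.
EMP = [{"eNo": e} for e in ("E1", "E2", "E3", "E4")]
ASG = [
    {"eNo": "E1", "pNo": "P1"},
    {"eNo": "E2", "pNo": "P2"},
    {"eNo": "E3", "pNo": "P9"},  # P9 does not exist in PROJ
]
PROJ = [{"pNo": "P1"}, {"pNo": "P2"}]

emp1 = semijoin(EMP, ASG, "eNo")    # EMP'  = EMP semijoin ASG
asg1 = semijoin(ASG, PROJ, "pNo")   # ASG'  = ASG semijoin PROJ
emp2 = semijoin(EMP, asg1, "eNo")   # EMP'' = EMP semijoin (ASG semijoin PROJ)

full = join(join(EMP, ASG, "eNo"), PROJ, "pNo")
reduced = join(join(emp2, asg1, "eNo"), PROJ, "pNo")

print(len(EMP), len(emp1), len(emp2))
print(full == reduced)
```

In this instance E4 has no assignment and E3 is assigned to a project missing from PROJ, so EMP’ drops one tuple and EMP” drops two.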

Page 12

• SELECT eName
  FROM EMP, ASG, PROJ
  WHERE EMP.eNo = ASG.eNo
    AND ASG.pNo = PROJ.pNo
    AND EMP.city = PROJ.city

Page 14

• Full reducer may be hard to find

• … SJs to reduce relation size

Page 15

Distributed Query Processing Algorithms

Page 16

• Three main representative algorithms are

  – Distributed INGRES Algorithm

  – R* Algorithm

  – SDD-1 Algorithm

Page 18

• Optimizer of the master site makes inter-site decisions

• … decisions

• … time & communication time

Page 19

• Optimizer, based on stats of the DB and size of intermediate results, decides about

  – Join ordering

  – Join algorithm (nested-loop / merge join)

  – Access path (indexed / sequential)

Page 20

• Inter-site transfers

  – Ship-whole

    • Entire relation transferred

    • Stored in a temp relation

    • In case of merge-join approach, tuples can be processed as they arrive

  – Fetch-as-needed

    • External relation is sequentially scanned

    • Join attribute value is sent to the site of the other relation

    • Relevant tuples scanned at other site and sent to first site

Page 21

• Inter-site transfers: comparison

  – Ship-whole

    • larger data transfer

    • smaller number of messages

    • better if relations are small

  – Fetch-as-needed

    • number of messages = O(cardinality of external relation)

    • data transfer per message is minimal

    • better if relations are large and the join selectivity is good

Page 23

1- Move outer relation tuples to the site of the inner relation

• Can be joined as they arrive

Page 24

2- Move inner relation to the site of outer relation

• Cannot join as they arrive; they need to be stored

• Total Cost = LT (retrieve card(S) tuples from S)
             + CT (size(S))
             + LT (store card(S) tuples as T)
             + LT (retrieve card(R) tuples from R)
             + LT (retrieve s tuples from T) * card(R)

Page 25

3- Fetch inner tuples as needed

• For each tuple in R, send join attribute value to site of S

• Retrieve matching inner tuples …

Page 26

• Total Cost = LT (retrieve card(R) tuples from R)
             + CT (length(A) * card(R))
             + LT (retrieve s tuples from S) * card(R)
             + CT (s * length(S)) * card(R)
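To see how the cost formulas for strategies 2 and 3 trade off, they can be evaluated with concrete numbers. The Python sketch below is a rough illustration under stated assumptions: LT is modelled as a constant cost per tuple, CT as a fixed per-message cost plus a per-unit transfer cost, and every parameter value is invented. Only the structure of the two formulas comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class Params:
    t_local: int  # assumed local cost to read or write one tuple
    t_msg: int    # assumed fixed cost of one message
    t_tr: int     # assumed cost per unit of data transferred


def LT(p, n_tuples):
    """Local processing time for n_tuples (assumed linear)."""
    return p.t_local * n_tuples


def CT(p, n_units):
    """Communication time for one transfer of n_units (assumed model)."""
    return p.t_msg + p.t_tr * n_units


def cost_move_inner(p, card_r, card_s, size_s, s):
    """Strategy 2: ship inner relation S, store it as T, then join."""
    return (LT(p, card_s) + CT(p, size_s) + LT(p, card_s)
            + LT(p, card_r) + LT(p, s) * card_r)


def cost_fetch_as_needed(p, card_r, len_a, len_s, s):
    """Strategy 3: send each join value, fetch s matching inner tuples."""
    return (LT(p, card_r) + CT(p, len_a * card_r)
            + LT(p, s) * card_r + CT(p, s * len_s) * card_r)


p = Params(t_local=1, t_msg=10, t_tr=1)

# Hypothetical case 1: small outer R (5 tuples), large inner S (1000 tuples).
small_outer = (
    cost_move_inner(p, card_r=5, card_s=1000, size_s=10000, s=2),
    cost_fetch_as_needed(p, card_r=5, len_a=1, len_s=10, s=2),
)

# Hypothetical case 2: large outer R (1000 tuples), small inner S (5 tuples).
large_outer = (
    cost_move_inner(p, card_r=1000, card_s=5, size_s=50, s=2),
    cost_fetch_as_needed(p, card_r=1000, len_a=1, len_s=10, s=2),
)

print(small_outer, large_outer)
```

Under these assumed parameters, fetching as needed wins when the outer relation is small and the inner one is large, and shipping the inner relation wins in the opposite case, because the per-message cost is paid once per outer tuple.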

Page 27

4- Move both inner and outer relations to another site

• Example: a join of PROJ (external) and ASG (internal) on pNo

Page 28

1- Ship PROJ to site of ASG

2- Ship ASG to site of PROJ

3- Fetch ASG tuples as needed for each tuple of PROJ

4- Move both to a third site

Optimization involves costing for each possibility.

Page 31

• Based on the Hill Climbing Algorithm

• … result to the user site from the final result site is not considered

• … time or response time

Page 32

• Inputs include

  – Query graph

  – Locations of relations

  – Relations’ statistics

Page 33

1- Do the initial local processing

2- Select the initial best plan (ES0)

  – Calculate cost of moving all relations to a single site

  – Plan with the least cost is ES0

Page 34

3- Split ES0 into ES1 and ES2

  – ES1: Sending one of the relations to the other site; relations joined there

  – ES2: Sending the result back to the site in ES0

Page 35

4- Replace ES0 with ES1 and ES2 when

cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)

5- Recursively apply steps 3 and 4 on ES1 and ES2, until no improvement
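The acceptance test in step 4 can be written as a small selection routine. The Python sketch below is a simplified illustration, not the full algorithm: the function name `best_split`, the candidate labels, and all cost figures are invented, and the recursion of step 5 is left out.

```python
def best_split(cost_es0, candidates):
    """Pick the cheapest candidate split that beats ES0.

    candidates maps a label to (cost_es1, cost_local_join, cost_es2).
    Returns (label, total_cost), or (None, cost_es0) if nothing improves.
    """
    best_label, best_cost = None, cost_es0
    for label, (cost_es1, cost_join, cost_es2) in candidates.items():
        total = cost_es1 + cost_join + cost_es2
        if total < best_cost:  # step 4: strict improvement required
            best_label, best_cost = label, total
    return best_label, best_cost


# Hypothetical costs for two ways of splitting an ES0 that costs 100.
candidates = {
    "split A": (30, 10, 40),   # total 80, improves on ES0
    "split B": (50, 20, 40),   # total 110, worse than ES0
}

chosen = best_split(100, candidates)
unchanged = best_split(70, candidates)

print(chosen, unchanged)
```

When no candidate is cheaper than the current plan, the routine returns the plan unchanged, which corresponds to the "until no improvement" stopping condition.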

Page 36

• Example

• “Find the salaries of engineers working on the CAD/CAM project”

• Involves EMP, PAY, PROJ and ASG

Π_sal (PAY ⋈_title (EMP ⋈_eNo (ASG ⋈_pNo (σ_pName = ‘CAD/CAM’ (PROJ)))))

Page 37

Relation  Size  Site
EMP       8     1
PAY       4     2
PROJ      1     3
ASG       10    4

Assume Tmsg = 0 and TTR = 1

Length of a tuple is 1, so size(R) = card(R)

Page 38

• Considering only transfers

Page 41

• Cost for site 2 = 19

• Cost for site 3 = 22

• Cost for site 4 = 13

• So site 4 is our ES0

• Move all relations to site 4.
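The costs above follow directly from the relation table (EMP 8 at site 1, PAY 4 at site 2, PROJ 1 at site 3, ASG 10 at site 4) with Tmsg = 0 and TTR = 1. The short Python sketch below recomputes them; the variable and function names are illustrative choices. It also evaluates site 1, which is not listed on this page.

```python
# Relation sizes and sites as given in the lecture's example table.
sizes = {"EMP": 8, "PAY": 4, "PROJ": 1, "ASG": 10}
sites = {"EMP": 1, "PAY": 2, "PROJ": 3, "ASG": 4}

T_MSG = 0  # per-message cost, as assumed in the lecture
T_TR = 1   # per-unit transfer cost, as assumed in the lecture


def cost_move_all_to(target):
    """Transfer cost of shipping every remote relation to the target site."""
    return sum(
        T_MSG + T_TR * size
        for rel, size in sizes.items()
        if sites[rel] != target
    )


costs = {site: cost_move_all_to(site) for site in sorted(set(sites.values()))}
es0_site = min(costs, key=costs.get)

print(costs)     # {1: 15, 2: 19, 3: 22, 4: 13}
print(es0_site)  # 4
```

Site 4 holds ASG, the largest relation, so moving everything there leaves the most data in place. This is why it gives the cheapest initial plan ES0.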

Page 42

Thanks

Date posted: 05/07/2022, 13:42
