Distributed Database Management Systems: Lecture 33

The main topics covered in this lecture include: data localization for hybrid fragmentation; query optimization; hybrid fragmentation (HyF) contains both types of fragmentation; query optimization (QO) refers to producing a query execution plan (QEP) that represents an execution strategy;...

In the previous lecture

• Final phase of query decomposition (QD)
• Data localization for horizontal (HF), vertical (VF), and derived (DF) fragmentation

In today’s lecture

• Data localization for hybrid fragmentation
• Query optimization

Reduction for HyF

• Hybrid fragmentation (HyF) contains both types of fragmentation, horizontal and vertical
• EMP1 = σ_{eNo ≤ "E4"}(Π_{eNo, eName}(EMP))
• EMP2 = σ_{eNo > "E4"}(Π_{eNo, eName}(EMP))
• EMP3 = Π_{eNo, title}(EMP)
• Query: SELECT eName FROM EMP
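
As a rough illustration of the reduction, the sketch below builds the three fragments from a few made-up EMP rows and compares the generic localized query, which rebuilds EMP as (EMP1 ∪ EMP2) ⋈ EMP3, with a reduced query that skips EMP3 because it holds no attribute the query needs. The sample rows and the helper names `project`, `localized_query`, and `reduced_query` are assumptions made for this example, not part of the lecture.

```python
# Hypothetical sample data; only the attribute names follow the slide.
EMP = [
    {"eNo": "E1", "eName": "J. Doe", "title": "Elect. Eng."},
    {"eNo": "E4", "eName": "J. Miller", "title": "Programmer"},
    {"eNo": "E5", "eName": "B. Casey", "title": "Syst. Anal."},
]


def project(rows, attrs):
    """Projection with set semantics: duplicates removed, input order kept."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


# Hybrid fragments: EMP1 and EMP2 are vertical then horizontal, EMP3 is vertical only.
EMP1 = [r for r in project(EMP, ["eNo", "eName"]) if r["eNo"] <= "E4"]
EMP2 = [r for r in project(EMP, ["eNo", "eName"]) if r["eNo"] > "E4"]
EMP3 = project(EMP, ["eNo", "title"])


def localized_query():
    """Generic localized form: rebuild EMP from all fragments, then project eName."""
    titles = {r["eNo"]: r["title"] for r in EMP3}
    rebuilt = [dict(r, title=titles[r["eNo"]]) for r in EMP1 + EMP2]
    return project(rebuilt, ["eName"])


def reduced_query():
    """Reduced form: EMP3 contributes nothing to eName, so it is never read."""
    return project(EMP1 + EMP2, ["eName"])


print(localized_query() == reduced_query())  # True
```

Both forms return the same names, but the reduced form avoids the join and never touches the site holding EMP3.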

Summary of what we have done so far

• Data localization: applies the global query to fragments; increases the optimization level
• So, next is cost-based optimization

Components of the Optimizer

• Search space: the set of equivalent alternative execution plans
• Cost model: predicts the cost of an execution plan
• Search strategy: produces the best plan

Search Space

• The search space consists of equivalent query trees produced using transformation rules
• The optimizer concentrates on join trees, since join cost has the greatest effect on the total cost

• Example:

SELECT eName, resp
FROM EMP, ASG, PROJ
WHERE EMP.eNo = ASG.eNo AND ASG.pNo = PROJ.pNo

• With N relations, the number of alternative join trees is O(N!), based on properties of relations
• So, restrictions are applied
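
To get a feel for the growth, the following sketch enumerates join orders as permutations of the relation names, which corresponds to one linear join tree per permutation. The function name `linear_join_orders` is made up for this example, and the count ignores bushy trees, so it is only a lower bound on the full search space.

```python
from itertools import permutations
from math import factorial


def linear_join_orders(relations):
    """Return every ordering of the relations (one linear join tree each)."""
    return list(permutations(relations))


orders = linear_join_orders(["EMP", "ASG", "PROJ"])
print(len(orders))    # 6 orders for 3 relations
print(factorial(10))  # 3628800 orders for 10 relations
```

Some of these orders, such as joining EMP with PROJ first, have no join predicate between the first two relations and would require a Cartesian product, which is one reason restrictions are applied.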

1- Heuristics
- Perform selection and projection on base relations
- Avoid Cartesian products

2- Shape of tree
- Linear tree: at least one operand of each operator node is a base relation
- Bushy tree: may have operators whose operands are all intermediate tables; allows parallel execution

Search Strategy

• The most popular is dynamic programming (DP)
• It starts with base relations and keeps adding relations, calculating the cost at each step
• DP is almost exhaustive, so it produces the best plan
• Too expensive with more relations
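
The idea of starting from base relations and adding one relation at a time can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular system: it considers only linear trees, skips Cartesian products, and uses the sum of intermediate result sizes as the cost. The function `dp_join_order`, the cardinalities, and the selectivity values are all hypothetical.

```python
from itertools import combinations


def dp_join_order(cards, selectivity):
    """Dynamic programming over sets of relations (linear trees only).

    cards:       {relation name: cardinality}
    selectivity: {frozenset({a, b}): join selectivity factor between a and b}
    Returns (cost, result cardinality, join order), or None if the relations
    cannot all be connected without a Cartesian product.
    """
    names = sorted(cards)
    # best[S] = cheapest (cost, cardinality, order) found for relation set S
    best = {frozenset([n]): (0.0, float(cards[n]), (n,)) for n in names}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            s = frozenset(subset)
            for r in subset:  # r is the base relation joined last
                rest = s - {r}
                if rest not in best:
                    continue
                cost, card, order = best[rest]
                preds = [selectivity[frozenset((r, o))]
                         for o in rest if frozenset((r, o)) in selectivity]
                if not preds:  # no join predicate: would be a Cartesian product
                    continue
                out = card * cards[r]
                for p in preds:
                    out *= p
                total = cost + out  # assumed cost: sum of intermediate sizes
                if s not in best or total < best[s][0]:
                    best[s] = (total, out, order + (r,))
    return best.get(frozenset(names))


# Hypothetical statistics for the EMP, ASG, PROJ example.
cards = {"EMP": 400, "ASG": 1000, "PROJ": 100}
selectivity = {
    frozenset({"EMP", "ASG"}): 0.001,
    frozenset({"ASG", "PROJ"}): 0.01,
}
cost, card, order = dp_join_order(cards, selectivity)
print(order, round(cost))
```

With these made-up numbers the cheapest plan joins EMP and ASG first and adds PROJ last. The table `best` holds one entry per subset of relations, which is why the approach becomes expensive as the number of relations grows.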

Cost Model

• Uses the cost of operators and statistics of the base data to predict the size of intermediate tables
• Cost is considered as total time and response time

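• Total time = CPU time + I/O time + transmission (tr) time
• In a WAN, the major cost is transmission time
• Initially the ratio of transmission time to I/O time was 20:1; for a LAN it is 1:1.6
• Response time = CPU time + I/O time + transmission time
• Difference? Total time adds up all the work done; response time counts only the work that cannot proceed in parallel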

• T_CPU = time for a CPU instruction
• T_I/O = time for a disk I/O
• T_MSG = fixed time for initiating and receiving a message
• T_TR = time to transmit a data unit from one site to another
• TT = 2 T_MSG + T_TR * (x + y)
• RT = max{T_MSG + T_TR * x, T_MSG + T_TR * y}

[Figure: Site 1 sends x units and Site 2 sends y units to Site 3]
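
The two formulas can be checked with a small calculation. The sketch below assumes two transfers, of x and y data units, that arrive at the same site and can proceed in parallel; the function names and the sample values for the message and transmission times are made up for this example.

```python
def total_time(t_msg, t_tr, x, y):
    """TT: both messages and all transmitted units are added up."""
    return 2 * t_msg + t_tr * (x + y)


def response_time(t_msg, t_tr, x, y):
    """RT: the two transfers run in parallel, so only the slower one counts."""
    return max(t_msg + t_tr * x, t_msg + t_tr * y)


# Hypothetical values: T_MSG = 10, T_TR = 2 per unit, x = 100 units, y = 200 units.
print(total_time(10, 2, 100, 200))     # 620
print(response_time(10, 2, 100, 200))  # 410
```

Minimizing total time reduces overall resource use, while minimizing response time favors plans that exploit parallel transfers.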

Database Statistics

• The major factor is the size of the intermediate tables
• If the intermediate results are to

• For each relation R[A1, A2, …, An] fragmented as R1, …, Rr:
1. The length of each attribute: length(Ai)
2. The number of distinct values for each attribute in each fragment: card(Π_{Ai}(Rj))
3. The maximum and minimum values in the domain of each attribute: min(Ai), max(Ai)
4. The cardinality of each domain, card(dom[Ai]), and the cardinality of each fragment, card(Rj)
5. The join selectivity factor for some of the relations: SF_J(R, S) = card(R ⋈ S) / (card(R) * card(S))

Cardinalities of Intermediate Results

• SF_S(A < value) = (value - min(A)) / (max(A) - min(A))
• SF_S(p(Ai) ∧ p(Aj)) = SF_S(p(Ai)) * SF_S(p(Aj))
• SF_S(p(Ai) ∨ p(Aj)) = SF_S(p(Ai)) + SF_S(p(Aj)) - (SF_S(p(Ai)) * SF_S(p(Aj)))
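
A small numeric sketch of these selectivity factors follows. It assumes attribute values are uniformly distributed between min(A) and max(A), so the fraction of tuples with A below a value is (value - min(A)) / (max(A) - min(A)), and it assumes the two predicates are independent. The function names and sample numbers are made up for this example.

```python
def sf_less_than(value, min_a, max_a):
    """Selectivity of A < value under a uniform-distribution assumption."""
    return (value - min_a) / (max_a - min_a)


def sf_and(sf_i, sf_j):
    """Selectivity of a conjunction of two independent predicates."""
    return sf_i * sf_j


def sf_or(sf_i, sf_j):
    """Selectivity of a disjunction of two independent predicates."""
    return sf_i + sf_j - sf_i * sf_j


# Hypothetical attribute with min(A) = 0 and max(A) = 100.
p1 = sf_less_than(25, 0, 100)  # 0.25
p2 = sf_less_than(50, 0, 100)  # 0.5
print(sf_and(p1, p2))          # 0.125
print(sf_or(p1, p2))           # 0.625
```

Multiplying a selectivity factor by card(R) gives the estimated number of tuples that satisfy the predicate.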

Cardinality of Projection

• Hard to determine precisely
• Two cases when it is trivial:
1- When the projection is on a single attribute A: card(Π_A(R)) = card(A)
2- When the primary key (PK) is included in the projected attributes A: card(Π_A(R)) = card(R)

• Semijoin:
SF_SJ(R ⋉_A S) = card(Π_A(S)) / card(dom[A])
card(R ⋉_A S) = SF_SJ(S.A) * card(R)
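
As a worked instance of the semijoin estimate, the sketch below plugs in made-up statistics. The function names and numbers are assumptions for this example only.

```python
def semijoin_selectivity(distinct_a_in_s, domain_size):
    """SF_SJ: share of the domain of A that actually appears in S."""
    return distinct_a_in_s / domain_size


def semijoin_cardinality(card_r, distinct_a_in_s, domain_size):
    """Estimated number of tuples of R that survive the semijoin with S on A."""
    return semijoin_selectivity(distinct_a_in_s, domain_size) * card_r


# Hypothetical statistics: card(R) = 1000, 50 distinct A values in S, card(dom[A]) = 200.
print(semijoin_cardinality(1000, 50, 200))  # 250.0
```

Only the number of distinct A values in S matters here, so the estimate can be computed from the stored statistics without reading either relation.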

• Union: hard to estimate
• Only limits are possible: the upper bound is card(R) + card(S) and the lower bound is max{card(R), card(S)}
• Difference: like union, the upper bound is card(R) for (R - S), and the lower bound is 0
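
The limits for union and difference can be written down directly. In this sketch the union is bounded below by max{card(R), card(S)} and above by card(R) + card(S), and the difference R - S is bounded by 0 and card(R), as stated on the slide. The function names are made up for this example.

```python
def union_bounds(card_r, card_s):
    """(lower, upper) limits on card(R ∪ S)."""
    return max(card_r, card_s), card_r + card_s


def difference_bounds(card_r, card_s):
    """(lower, upper) limits on card(R - S), as given on the slide."""
    return 0, card_r


print(union_bounds(300, 500))       # (500, 800)
print(difference_bounds(300, 500))  # (0, 300)
```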

Centralized Query Optimization

Title	Data Localization for Hybrid Fragmentation and Query Optimization
School	Standard University
Subject	Distributed Database Management Systems
Type	Lecture
Year of publication	2023
City	Standard City
