Distributed Database Management Systems: Lecture 35. The main topics covered in this chapter include: query optimization and fragmented queries; joins replaced by semijoins; three major QO algorithms; distributed query processing algorithms;...
Distributed Database Management Systems
Lecture 35
In the previous lecture
Semijoin-based Algorithms
• Reduces cost of join queries
• Semijoin is …….
• A join of two relations can be replaced by semijoins (SJ) of one or both relations.
• Ignoring Tmsg, semijoin is better if
– size(Π_A(S)) + size(R ⋉_A S) < size(R)
• Join is better if
– …
• Semijoin is better if …
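The size test above can be written as a one-line check. This is a minimal sketch, assuming all sizes are in the same unit and Tmsg is ignored as on the slide; the function name and the sample numbers are illustrative and not from the lecture.

```python
def semijoin_is_better(size_proj_a_s, size_r_semijoin_s, size_r):
    """True when shipping the projection of S on A plus the reduced R
    is cheaper than shipping all of R (message overhead ignored)."""
    return size_proj_a_s + size_r_semijoin_s < size_r

# Hypothetical sizes: the semijoin removes most of R, so it pays off.
print(semijoin_is_better(100, 300, 1000))   # True
# Hypothetical sizes: the semijoin removes little, so a plain join is cheaper.
print(semijoin_is_better(600, 500, 1000))   # False
```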
• SJ with more than two tables will be more complex
• SJ can be applied to each individual join; consider
• EMP ⋈ ASG ⋈ PROJ = EMP’ ⋈ ASG’ ⋈ PROJ
where
• EMP’ = EMP ⋉ ASG and
• ASG’ = ASG ⋉ PROJ
rather
• EMP” = EMP ⋉ (ASG ⋉ PROJ)
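The semijoin program above can be checked on a toy instance. This is a sketch under stated assumptions: relations are lists of dicts, the sample tuples are invented for illustration, and `semijoin` and `join` are hypothetical helper names rather than anything defined in the lecture.

```python
def semijoin(left, right, attr):
    """left ⋉ right: keep tuples of `left` whose `attr` value occurs in `right`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]

def join(left, right, attr):
    """Equi-join of two lists of dicts on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Invented sample data.
EMP = [{"eNo": 1, "eName": "A"}, {"eNo": 2, "eName": "B"}, {"eNo": 3, "eName": "C"}]
ASG = [{"eNo": 1, "pNo": 10}, {"eNo": 2, "pNo": 99}]
PROJ = [{"pNo": 10, "pName": "CAD/CAM"}]

asg_r = semijoin(ASG, PROJ, "pNo")     # ASG' = ASG ⋉ PROJ
emp_r = semijoin(EMP, ASG, "eNo")      # EMP' = EMP ⋉ ASG
emp_rr = semijoin(EMP, asg_r, "eNo")   # EMP'' = EMP ⋉ (ASG ⋉ PROJ)

full = join(join(EMP, ASG, "eNo"), PROJ, "pNo")
via_sj = join(join(emp_r, asg_r, "eNo"), PROJ, "pNo")
via_sj2 = join(join(emp_rr, asg_r, "eNo"), PROJ, "pNo")

print(full == via_sj == via_sj2)   # same join result in all three cases
print(len(emp_r), len(emp_rr))     # EMP'' is never larger than EMP'
```

On this data EMP’ keeps two tuples while EMP” keeps one, which is why the nested semijoin can reduce the operand further before it is shipped.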
• Select eName
From EMP, ASG, PROJ
Where EMP.eNo = ASG.eNo
and ASG.pNo = PROJ.pNo
and EMP.city = PROJ.city
• A full reducer may be hard to find
SJs to reduce relation size
Distributed Query Processing Algorithms
• Three main representative algorithms are
– Distributed INGRES Algorithm
– R* Algorithm
– SDD-1 Algorithm.
• The optimizer of the master site makes inter-site decisions
decisions
time & communication time
• The optimizer, based on DB statistics and the size of intermediate results, decides about
– Join ordering
– Join algorithm (nested-loop / merge join)
– Access path (indexed / sequential).
• Inter-site transfers
– Ship-whole
• Entire relation transferred
• Stored in a temp relation
• In case of merge-join approach, tuples can be processed as they arrive
– Fetch-as-needed
• External relation is sequentially scanned
• Join attribute value is sent to other relation
• Relevant tuples scanned at other site and sent to first site
• Inter-site transfers: comparison
– Ship-whole
• larger data transfer
• smaller number of messages
• better if relations are small
– Fetch-as-needed
• number of messages = O(cardinality of external relation)
• data transfer per message is minimal
• better if relations are large and the join selectivity is good.
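The trade-off in this comparison can be illustrated with a rough traffic model. This is only a sketch: it assumes ship-whole sends the relation as a single bulk message, and that fetch-as-needed uses one request and one reply per tuple of the external relation. The function names, parameters, and sample numbers are all illustrative assumptions.

```python
def ship_whole(card_inner, tuple_len):
    """Assumed model: the whole relation travels in one bulk transfer."""
    return {"messages": 1, "data": card_inner * tuple_len}

def fetch_as_needed(card_external, key_len, matches_per_key, tuple_len):
    """Assumed model: one request (join value) and one reply (matching
    tuples) per tuple of the external relation."""
    messages = 2 * card_external
    data = card_external * (key_len + matches_per_key * tuple_len)
    return {"messages": messages, "data": data}

# Hypothetical case: large inner relation, few external tuples, selective join.
print(ship_whole(card_inner=10_000, tuple_len=100))
print(fetch_as_needed(card_external=50, key_len=4, matches_per_key=2, tuple_len=100))
```

With these made-up numbers, fetch-as-needed sends far less data but many more messages, which matches the slide's rule of thumb.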
1- Move outer relation tuples to the site of the inner relation
• Can be joined as they arrive
2- Move inner relation to the site of outer relation
• Cannot join as they arrive; they need to be stored
• Total Cost = LT(retrieve card(S) tuples from S)
+ CT(size(S))
+ LT(store card(S) tuples as T)
+ LT(retrieve card(R) tuples from R)
+ LT(retrieve s tuples from T) * card(R).
3- Fetch inner tuples as needed
• For each tuple in R, send join attribute value to site of S
• Retrieve matching inner tuples
• Total Cost = LT(retrieve card(R) tuples from R)
+ CT(length(A) * card(R))
+ LT(retrieve s tuples from S) * card(R)
+ CT(s * length(S)) * card(R)
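The Total Cost formulas for strategies 2 and 3 can be evaluated side by side. This is a minimal sketch: it assumes LT is linear in the number of tuples and CT is a fixed message cost plus a per-unit transfer cost. Those two cost models, the default coefficients, and the sample statistics are assumptions for illustration, not values from the lecture.

```python
def LT(n_tuples, cost_per_tuple=1.0):
    """Assumed local processing time: linear in tuples touched."""
    return cost_per_tuple * n_tuples

def CT(n_units, t_msg=0.0, t_tr=1.0):
    """Assumed communication time: message overhead plus transfer time."""
    return t_msg + t_tr * n_units

def cost_move_inner(card_r, card_s, size_s, s):
    """Strategy 2: ship inner relation S to the site of outer relation R."""
    return (LT(card_s) + CT(size_s) + LT(card_s)
            + LT(card_r) + LT(s) * card_r)

def cost_fetch_as_needed(card_r, s, length_a, length_s):
    """Strategy 3: for each tuple of R, fetch the s matching tuples of S."""
    return (LT(card_r) + CT(length_a * card_r)
            + LT(s) * card_r + CT(s * length_s) * card_r)

# Hypothetical statistics: R is small, S is large, s = 2 matches per R tuple.
card_r, card_s, length_s, length_a, s = 100, 1000, 50, 4, 2
print(cost_move_inner(card_r, card_s, card_s * length_s, s))   # 52300.0
print(cost_fetch_as_needed(card_r, s, length_a, length_s))     # 10700.0
```

Under these made-up statistics, fetching as needed is cheaper because only a small part of S is ever transferred. With a nonzero `t_msg` the per-tuple messages of strategy 3 become more expensive and the balance can shift.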
4- Move both inner and outer relations to another site
• Example: a query consisting of the join of PROJ (ext) and ASG (int) on pNo
1- Ship PROJ to site of ASG
2- Ship ASG to site of PROJ
3- Fetch ASG tuples as needed for each tuple of PROJ
4- Move both to a third site
Optimization involves costing for each possibility.
• Based on the Hill Climbing Algorithm
• Cost of transferring the result to the user site from the final result site is not considered
time or response time
• Inputs include
– Query graph
– Locations of relations
– Relations’ statistics.
1- Do the initial local processing
2- Select the initial best plan (ES0)
– Calculate cost of moving all relations to a single site
– Plan with the least cost is ES0
3- Split ES0 into ES1 and ES2
– ES1: Sending one of the relations to the other site; relations joined there
– ES2: Sending the result back to the site in ES0.
4- Replace ES0 with ES1 and ES2 when
cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)
5- Recursively apply steps 3 and 4 on ES1 and ES2, until no improvement
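The replacement test in step 4 is a single comparison. A minimal sketch follows, assuming the four costs have already been estimated elsewhere; the function name and the sample costs are hypothetical.

```python
def should_split(cost_es0, cost_es1, cost_local_join, cost_es2):
    """Step 4: accept the split only if it is strictly cheaper than ES0."""
    return cost_es1 + cost_local_join + cost_es2 < cost_es0

# Hypothetical costs: 5 + 3 + 8 = 16 < 20, so the split is accepted.
print(should_split(cost_es0=20, cost_es1=5, cost_local_join=3, cost_es2=8))
# Hypothetical costs: 10 + 5 + 8 = 23 >= 20, so ES0 is kept.
print(should_split(cost_es0=20, cost_es1=10, cost_local_join=5, cost_es2=8))
```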
• Example
• “Find the salaries of engineers working on CAD/CAM project”
• Involves EMP, PAY, PROJ and ASG
Π_sal(PAY ⋈_title (EMP ⋈_eNo (ASG ⋈_pNo (σ_pName = ‘CAD/CAM’ (PROJ)))))
Relation  Size  Site
EMP       8     1
PAY       4     2
PROJ      1     3
ASG       10    4
Assume Tmsg = 0 and TTR = 1
Length of a tuple is 1, so size(R) = card(R)
• Considering only transfers
• Cost for site 2 = 19
• Cost for site 3 = 22
• Cost for site 4 = 13
• So site 4 is our ES0
• Move all relations to site 4.
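The ES0 selection for this example can be reproduced with a few lines. This is a sketch using the sizes and sites from the table, with Tmsg = 0 and TTR = 1 as stated; the dictionary layout and function name are illustrative. The cost for site 1 is not listed on the slide and is simply what the same arithmetic gives.

```python
# name -> (size, site), taken from the example table
relations = {"EMP": (8, 1), "PAY": (4, 2), "PROJ": (1, 3), "ASG": (10, 4)}

def cost_to_gather_at(site, relations, t_msg=0, t_tr=1):
    """Cost of shipping every relation not already at `site` to `site`."""
    return sum(t_msg + t_tr * size
               for size, home in relations.values() if home != site)

sites = sorted({home for _, home in relations.values()})
costs = {site: cost_to_gather_at(site, relations) for site in sites}
best_site = min(costs, key=costs.get)

print(costs)      # {1: 15, 2: 19, 3: 22, 4: 13}
print(best_site)  # 4
```

Site 4 wins because it already holds ASG, the largest relation, so only EMP, PAY and PROJ (8 + 4 + 1 = 13) need to move.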
Thanks