
Distributed Database Management Systems: Lecture 30




Document information: 28 pages, 82.08 KB

Distributed Database Management Systems: Lecture 30. The main topics covered in this chapter include: basic concepts of query optimization; query processing (QP) in centralized and distributed DBs; how the query processor transforms complex queries into concise and simple ones;...



In the previous lecture:
• …-based concurrency control (CC)


• Query processing is a critical performance problem, especially in a DDBS environment


• The main function of the query processor (QP) is to transform an SQL query into an equivalent relational algebra query (relational algebra being a lower-level language)

• The transformation must achieve both correctness and efficiency


• Consider the tables
–EMP(eNo, eName, title)
–ASG(eNo, pNo, resp, dur)
–PROJ(pNo, pName, budget, loc)
• Query: Get the names of employees who are managing a project


SELECT eName
FROM EMP, ASG
WHERE EMP.eNo = ASG.eNo
AND resp = 'Manager'
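The SQL above can be tried on an in-memory SQLite database with Python's standard `sqlite3` module. This is a minimal sketch: only the EMP and ASG schemas come from the lecture, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database with the lecture's EMP and ASG schemas.
# The rows inserted below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (eNo TEXT PRIMARY KEY, eName TEXT, title TEXT);
CREATE TABLE ASG (eNo TEXT, pNo TEXT, resp TEXT, dur INTEGER);
INSERT INTO EMP VALUES ('E1', 'J. Doe', 'Elect. Eng.'),
                       ('E2', 'M. Smith', 'Syst. Anal.'),
                       ('E3', 'A. Lee', 'Mech. Eng.');
INSERT INTO ASG VALUES ('E1', 'P1', 'Manager', 12),
                       ('E2', 'P1', 'Analyst', 24),
                       ('E3', 'P2', 'Manager', 10);
""")

# The lecture's query, using plain ASCII quotes around 'Manager'.
query = """
SELECT eName
FROM EMP, ASG
WHERE EMP.eNo = ASG.eNo
  AND resp = 'Manager'
"""
# Sort in Python: SQL gives no row-order guarantee without ORDER BY.
managers = sorted(row[0] for row in conn.execute(query))
print(managers)  # ['A. Lee', 'J. Doe']
conn.close()
```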


$$\Pi_{eName}\left(\sigma_{resp = \text{'Manager'} \wedge EMP.eNo = ASG.eNo}(EMP \times ASG)\right)$$

$$\Pi_{eName}\left(EMP \bowtie_{eNo} \sigma_{resp = \text{'Manager'}}(ASG)\right)$$

• The second expression clearly needs fewer computing resources, since it avoids the Cartesian product and applies the selection to ASG before the join
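The difference between the two expressions can be checked on a toy instance. A minimal sketch, assuming small hypothetical EMP and ASG tuples: both expressions return the same names, but the first materializes every EMP × ASG pair while the second joins only the ASG tuples that survive the selection.

```python
from itertools import product

# Hypothetical toy tuples: EMP(eNo, eName) and ASG(eNo, pNo, resp).
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller")]
ASG = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
       ("E2", "P2", "Programmer"), ("E3", "P2", "Manager"),
       ("E4", "P3", "Engineer")]

# Expression 1: Cartesian product, then selection, then projection on eName.
cross = list(product(EMP, ASG))
result1 = {e[1] for e, a in cross if a[2] == "Manager" and e[0] == a[0]}

# Expression 2: select the managers in ASG first, then join with EMP on eNo.
asg_managers = [a for a in ASG if a[2] == "Manager"]
result2 = {e[1] for e in EMP for a in asg_managers if e[0] == a[0]}

print(len(cross))         # 20 intermediate tuples in expression 1
print(len(asg_managers))  # 2 ASG tuples left to join in expression 2
print(result1 == result2, sorted(result1))  # True ['A. Lee', 'J. Doe']
```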


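• The same query in a DDBS
• Suppose EMP and ASG are horizontally fragmented (HF) as follows:
–$EMP_1 = \sigma_{eNo \le \text{'E3'}}(EMP)$
–$EMP_2 = \sigma_{eNo > \text{'E3'}}(EMP)$
–$ASG_1 = \sigma_{eNo \le \text{'E3'}}(ASG)$
–$ASG_2 = \sigma_{eNo > \text{'E3'}}(ASG)$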
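Horizontal fragmentation applies a selection predicate to each fragment. A minimal sketch for EMP, assuming hypothetical tuples; `fragment` is an illustrative helper, and ASG would be split the same way. eNo values are compared as strings, which matches the 'E3' predicate only while employee numbers have a single digit.

```python
# Hypothetical EMP tuples (eNo, eName).
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"),
       ("E4", "J. Miller"), ("E5", "B. Casey")]

def fragment(relation, predicate):
    """Horizontal fragment: the tuples of relation that satisfy predicate."""
    return [t for t in relation if predicate(t)]

EMP1 = fragment(EMP, lambda t: t[0] <= "E3")  # eNo <= 'E3'
EMP2 = fragment(EMP, lambda t: t[0] > "E3")   # eNo > 'E3'

# The two predicates are complementary, so the fragments are disjoint
# and their union reconstructs EMP.
assert not set(EMP1) & set(EMP2)
assert sorted(EMP1 + EMP2) == sorted(EMP)
print([t[0] for t in EMP1], [t[0] for t in EMP2])  # ['E1', 'E2', 'E3'] ['E4', 'E5']
```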


• Further suppose these fragments are stored at sites 1, 2, 3 and 4 (ASG1 at site 1, ASG2 at site 2, EMP1 at site 3, EMP2 at site 4), and the result is required at site 5


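• Strategy 1 (process at the fragment sites):
–Site 1: $ASG_1' = \sigma_{resp = \text{'Manager'}}(ASG_1)$, sent to site 3
–Site 2: $ASG_2' = \sigma_{resp = \text{'Manager'}}(ASG_2)$, sent to site 4
–Site 3: $EMP_1' = EMP_1 \bowtie_{eNo} ASG_1'$, sent to site 5
–Site 4: $EMP_2' = EMP_2 \bowtie_{eNo} ASG_2'$, sent to site 5
–Site 5: $result = EMP_1' \cup EMP_2'$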


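• Strategy 2 (process at the result site): all fragments are shipped to site 5, which computes

$$result = (EMP_1 \cup EMP_2) \bowtie_{eNo} \sigma_{resp = \text{'Manager'}}(ASG_1 \cup ASG_2)$$

–Fragments shipped: ASG1 from site 1, ASG2 from site 2, EMP1 from site 3, EMP2 from site 4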


Let's assume:
• size(EMP) = 400, size(ASG) = 1000
• Tuple access cost = 1 unit, tuple transfer cost = 10 units
• There are 20 managers
• Data is distributed evenly across all sites
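With these figures, the strategy that selects and joins at the fragment sites and the strategy that ships every fragment to site 5 can be costed with simple arithmetic. A minimal sketch; beyond the lecture's numbers it makes hypothetical assumptions: ASG is indexed on resp, so the selection reads only the 20 matching tuples; EMP is indexed on eNo and each join probe costs 2 tuple accesses; each manager tuple joins exactly one EMP tuple; no index is available at site 5, so the join there is a nested loop; and the final union is treated as free.

```python
# Figures from the lecture.
SIZE_EMP, SIZE_ASG = 400, 1000
ACCESS, TRANSFER = 1, 10   # cost units per tuple access / per tuple transfer
MANAGERS = 20              # ASG tuples with resp = 'Manager' (10 per fragment)

def local_processing_cost() -> int:
    """Select at sites 1-2, join at sites 3-4, union at site 5 (assumed free)."""
    select = MANAGERS * ACCESS       # assumed index on resp: only matches are read
    ship_asg = MANAGERS * TRANSFER   # ASG1', ASG2' sent to sites 3 and 4
    join = MANAGERS * 2 * ACCESS     # assumed index on eNo: 2 accesses per probe
    ship_emp = MANAGERS * TRANSFER   # EMP1', EMP2' (one tuple per manager) to site 5
    return select + ship_asg + join + ship_emp

def ship_all_cost() -> int:
    """Ship every fragment to site 5 and evaluate the query there."""
    ship_all = (SIZE_EMP + SIZE_ASG) * TRANSFER  # every tuple crosses the network
    select = SIZE_ASG * ACCESS                   # full scan of ASG, no index
    join = SIZE_EMP * MANAGERS * ACCESS          # nested loop: EMP x 20 managers
    return ship_all + select + join

print(local_processing_cost(), ship_all_cost())  # 460 23000
```

Under these assumptions, moving the selection and join to the data is about fifty times cheaper, mostly because far fewer tuples are transferred.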


• Communication cost will dominate in a WAN
• It is not as dominant in LANs, so the total cost should be considered there
• The query optimizer (QO) can also aim to maximize throughput
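The effect of the network can be seen by splitting a plan's cost into local processing and communication. A minimal sketch; the plan (9,000 tuple accesses, 1,400 tuples shipped) and the per-tuple transfer costs of 100 units for a WAN and 1 unit for a LAN are hypothetical, with a tuple access costing 1 unit.

```python
def cost_breakdown(accesses: int, transfers: int,
                   access_cost: int, transfer_cost: int) -> tuple[int, int]:
    """Return (local processing cost, communication cost) of a plan."""
    return accesses * access_cost, transfers * transfer_cost

# Hypothetical plan: 9,000 tuple accesses and 1,400 tuples shipped.
ACCESSES, TRANSFERS = 9_000, 1_400

for network, transfer_cost in [("WAN", 100), ("LAN", 1)]:
    local, comm = cost_breakdown(ACCESSES, TRANSFERS, 1, transfer_cost)
    share = comm / (local + comm)
    print(f"{network}: local={local}, comm={comm}, comm share={share:.0%}")
# WAN: local=9000, comm=140000, comm share=94%
# LAN: local=9000, comm=1400, comm share=13%
```

In the WAN case minimizing communication alone would pick nearly the same plan as minimizing total cost; in the LAN case local processing dominates, so the optimizer has to weigh both.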


Operators’ Complexity
• Select, Project (without duplicate elimination): O(n)
• Project (with duplicate elimination), Group: O(n log n)
• Join, Semi-Join, Division, Set Operators: O(n log n)
• Cartesian Product: O(n²)
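The O(n log n) figure for joins corresponds to sort-based evaluation: sorting both inputs dominates, and the merge pass is linear in the input size plus the output size (if almost all tuples share one key, the output approaches a Cartesian product). A minimal sort-merge equi-join sketch; `sort_merge_join` is an illustrative helper and the tuples are hypothetical.

```python
def sort_merge_join(left, right, key_left, key_right):
    """Equi-join two lists of tuples where key_left(l) == key_right(r)."""
    left = sorted(left, key=key_left)     # O(n log n)
    right = sorted(right, key=key_right)  # O(m log m)
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of tuples sharing this key on each side.
            i_end = i
            while i_end < len(left) and key_left(left[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Emit every pairing within the two runs, then move past them.
            out.extend(lt + rt for lt in left[i:i_end] for rt in right[j:j_end])
            i, j = i_end, j_end
    return out

# Hypothetical tuples: EMP(eNo, eName) and ASG(eNo, pNo, resp).
EMP = [("E2", "M. Smith"), ("E1", "J. Doe"), ("E3", "A. Lee")]
ASG = [("E1", "P1", "Manager"), ("E2", "P1", "Analyst"),
       ("E2", "P2", "Programmer"), ("E4", "P3", "Engineer")]

joined = sort_merge_join(EMP, ASG, key_left=lambda t: t[0], key_right=lambda t: t[0])
for row in joined:
    print(row)
```

E3 has no assignment and E4 no employee, so neither appears in the output; E2 matches two ASG tuples and appears twice.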


Characterization of Query Processors


• Types of Optimization
–Exhaustive search: compute the cost of every strategy to find the optimal one
–This may be very costly when there are many alternatives and many fragments
–Heuristics
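The trade-off between exhaustive search and heuristics can be illustrated on join ordering. A minimal sketch: the cardinalities, join selectivity factors and the cost model (sum of intermediate result sizes for a left-deep join order) are hypothetical, and the greedy rule in `greedy_order` is just one possible heuristic, not the lecture's. Exhaustive search costs all n! orders, while the greedy rule builds a single order step by step.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical cardinalities and join selectivity factors (not from the lecture).
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {frozenset({"EMP", "ASG"}): Fraction(1, 400),
       frozenset({"ASG", "PROJ"}): Fraction(1, 100)}

def join_size(size, joined, rel):
    """Estimated size of an intermediate result over `joined` joined with rel.
    Pairs without a join predicate get selectivity 1, i.e. a Cartesian product."""
    sel = Fraction(1)
    for r in joined:
        sel *= SEL.get(frozenset({r, rel}), Fraction(1))
    return size * CARD[rel] * sel

def plan_cost(order):
    """Cost of a left-deep join order: the sum of its intermediate result sizes."""
    size, joined, cost = Fraction(CARD[order[0]]), {order[0]}, Fraction(0)
    for rel in order[1:]:
        size = join_size(size, joined, rel)
        joined.add(rel)
        cost += size
    return cost

def greedy_order():
    """Heuristic: start from the smallest relation, then keep adding the
    relation that yields the smallest next intermediate result."""
    order = [min(CARD, key=CARD.get)]
    size = Fraction(CARD[order[0]])
    while len(order) < len(CARD):
        rest = [r for r in CARD if r not in order]
        nxt = min(rest, key=lambda r: join_size(size, set(order), r))
        size = join_size(size, set(order), nxt)
        order.append(nxt)
    return tuple(order)

# Exhaustive search: cost all n! orders (6 here) and keep the cheapest.
best = min(permutations(CARD), key=plan_cost)
greedy = greedy_order()
print(best, plan_cost(best))      # ('EMP', 'ASG', 'PROJ') 2000
print(greedy, plan_cost(greedy))  # ('PROJ', 'ASG', 'EMP') 2000
```

Here the heuristic happens to reach a plan of the same cost while examining far fewer candidates; in general a heuristic gives no optimality guarantee.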


• Optimization Timing
–Static (compile time): the sizes of intermediate tables are not always known; the optimization cost is justified by repeated execution
–Dynamic (run time): the sizes of intermediate tables are known; re-optimization may be required


• Statistics
–Relation/Fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
–Attribute: cardinality of the domain, actual number of distinct values
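Statistics like these feed the size estimates of intermediate results. A minimal sketch using two common estimates that assume uniformly distributed values: a selection A = v on R returns about card(R) / V(R, A) tuples, where V(R, A) is the number of distinct values of A, and a join returns SF_J × card(R) × card(S), where the join selectivity factor SF_J (here taken as the fraction of the Cartesian product that survives) is approximated by 1 / max(V(R, A), V(S, A)). The helper names and the statistic values are hypothetical; 50 distinct resp values were chosen so the selection estimate matches the lecture's 20 managers.

```python
from fractions import Fraction

def selection_size(card_r: int, distinct_a: int) -> Fraction:
    """card(sigma_{A = v}(R)) ~ card(R) / V(R, A), assuming uniform values."""
    return Fraction(card_r, distinct_a)

def join_selectivity(distinct_r: int, distinct_s: int) -> Fraction:
    """SF_J ~ 1 / max(V(R, A), V(S, A)) for an equi-join on A (uniformity assumed)."""
    return Fraction(1, max(distinct_r, distinct_s))

def join_size(card_r: int, card_s: int, sf_j: Fraction) -> Fraction:
    """card(R join S) = SF_J * card(R) * card(S)."""
    return sf_j * card_r * card_s

# Hypothetical statistics: card(ASG) = 1000 with 50 distinct resp values,
# card(EMP) = 400, and 400 distinct eNo values in both EMP and ASG.
managers = selection_size(1000, 50)
sf = join_selectivity(400, 400)
print(managers, sf, join_size(400, 1000, sf))  # 20 1/400 1000
```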


• Decision Sites
–Centralized: simple, but needs knowledge about the entire distributed database
–Distributed: sites cooperate to determine the schedule; each needs only local information
–Hybrid: one site determines the global schedule, and each site optimizes its local subqueries


• Other factors like:


Distributed query processing layers:
SQL query on distributed relations
→ QUERY DECOMPOSITION (uses the GLOBAL SCHEMA) → algebraic query on distributed relations
→ DATA LOCALIZATION (uses the FRAGMENT SCHEMA) → fragment query
→ GLOBAL OPTIMIZATION (uses STATISTICS ON FRAGMENTS) → optimized fragment query with communication operations
→ LOCAL OPTIMIZATION (uses the LOCAL SCHEMAS) → optimized local query

Posted: 05/07/2022, 13:40
