Database Course Report: Query Optimization
Introduction to Query Processing
Translating SQL Queries into Relational Algebra
Rules for equivalent RAEs
Using Heuristics in Query Optimization
Cost-based query optimization
Summary
Introduction to Query Processing
Query processing:
The process by which the query results are retrieved from a high-level query, such as one written in SQL, or OQL in an ODBMS.
Query optimization:
The process of choosing a suitable execution strategy for processing a query.
Two internal representations of a query:
Query Tree
Query Graph
Processing a high-level query
Query in a high-level language
→ Scanning, parsing, and validating
→ Intermediate form of query
→ Query optimizer
→ Execution plan
→ Query code generator
→ Code to execute the query
→ Runtime database processor
→ Result of query
Translating SQL Queries into Relational Algebra
SELECT A1, A2, … FROM R
$\pi_{A_1, A_2, \ldots}(R)$
SELECT * FROM R, S
WHERE c
$R \bowtie_c S$
Query block: the basic unit that can be translated into the algebraic operators and optimized.
A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
Nested queries within a query are identified as
separate query blocks.
Aggregate operators (MAX, MIN, SUM, and COUNT)
in SQL must be included in the extended algebra.
SELECT LNAME, FNAME
Query Trees and Query Graphs
Query tree:
A tree data structure that corresponds to a relational algebra expression
It represents the input relations of the query as leaf nodes
of the tree, and represents the relational algebra operations
as internal nodes
An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation
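The bottom-up execution described above can be sketched in a few lines of Python. This is only an illustration, not how a real DBMS is built: the class names `Leaf` and `Operation`, the list-of-dicts relation format, and the sample PROJECT rows are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Relation = List[Dict[str, object]]


@dataclass
class Leaf:
    """Leaf node: an input relation of the query."""
    relation: Relation

    def execute(self) -> Relation:
        return self.relation


@dataclass
class Operation:
    """Internal node: a relational algebra operation over its children."""
    label: str
    apply: Callable[..., Relation]
    children: list

    def execute(self) -> Relation:
        # The operands are available once the child subtrees have executed;
        # the node is then replaced by the relation the operation returns.
        operands = [child.execute() for child in self.children]
        return self.apply(*operands)


# Hypothetical sample data for illustration only.
PROJECT = [
    {"PNUMBER": 10, "PLOCATION": "Stafford"},
    {"PNUMBER": 20, "PLOCATION": "Houston"},
    {"PNUMBER": 30, "PLOCATION": "Stafford"},
]

# Tree for: project PNUMBER over (select PLOCATION = 'Stafford' over PROJECT)
tree = Operation(
    "project PNUMBER",
    lambda r: [{"PNUMBER": t["PNUMBER"]} for t in r],
    [
        Operation(
            "select PLOCATION = 'Stafford'",
            lambda r: [t for t in r if t["PLOCATION"] == "Stafford"],
            [Leaf(PROJECT)],
        )
    ],
)

result = tree.execute()
```

Here the selection node runs first because it is the only node whose operand (the leaf) is available; the projection node then runs on its output.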
Example: For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.
Relational algebra:
$\pi_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((\sigma_{PLOCATION='Stafford'}(PROJECT)) \bowtie_{DNUM=DNUMBER} (DEPARTMENT)) \bowtie_{MGRSSN=SSN} (EMPLOYEE))$
SQL query:
Q2: SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION='Stafford';
Equivalent Relational Expressions
Two relational algebra expressions are equivalent if they produce the same results (tuples) on the same input relations
- Although their tuples/attributes may be ordered differently
An equivalence rule says that expressions of two forms are equivalent
An expression of the first form can be replaced by one of the second form, or vice versa
Rules for equivalent RAEs
1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations:
$\sigma_{c_1 \wedge c_2 \wedge \cdots \wedge c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\cdots(\sigma_{c_n}(R))\cdots))$
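2. Commutativity of σ: The σ operation is commutative:
$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$
3. Cascade of π: In a cascade (sequence) of π operations, all but the last one can be ignored:
$\pi_{L_1}(\pi_{L_2}(\cdots(\pi_{L_n}(R))\cdots)) \equiv \pi_{L_1}(R)$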
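4. Commuting σ with π: If the selection condition c involves only the attributes $A_1, \ldots, A_n$ in the projection list, the two operations can be commuted:
$\pi_{A_1, A_2, \ldots, A_n}(\sigma_c(R)) \equiv \sigma_c(\pi_{A_1, A_2, \ldots, A_n}(R))$
5. Commutativity of ⋈ (and ×):
$R \bowtie_\theta S \equiv S \bowtie_\theta R$
$R \times S \equiv S \times R$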
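6. Commuting σ with ⋈ (or ×): If all the attributes in the selection condition c belong to only one of the relations being joined, say R:
$\sigma_c(R \bowtie_\theta S) \equiv (\sigma_c(R)) \bowtie_\theta S$
If c can be written as $c_1 \wedge c_2$, where $c_1$ involves only attributes of R and $c_2$ involves only attributes of S:
$\sigma_c(R \bowtie_\theta S) \equiv (\sigma_{c_1}(R)) \bowtie_\theta (\sigma_{c_2}(S))$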
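7. Commuting π with ⋈ (or ×): Let $L = L_1 \cup L_2$, where $L_1$ contains attributes of R and $L_2$ contains attributes of S. If the join condition θ involves only attributes in L:
$\pi_L(R \bowtie_\theta S) \equiv (\pi_{L_1}(R)) \bowtie_\theta (\pi_{L_2}(S))$
Otherwise, let $L_3$ be the attributes of R and $L_4$ the attributes of S that appear in θ but not in L:
$\pi_L(R \bowtie_\theta S) \equiv \pi_L((\pi_{L_1 \cup L_3}(R)) \bowtie_\theta (\pi_{L_2 \cup L_4}(S)))$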
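8. Commutativity of set operations: ∪ and ∩ are commutative (− is not):
$R \cup S \equiv S \cup R$
$R \cap S \equiv S \cap R$
9. Associativity of ⋈, ×, ∪, and ∩: writing θ for any one of these four operations:
$(R \;\theta\; S) \;\theta\; T \equiv R \;\theta\; (S \;\theta\; T)$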
For R ⋈ S ⋈ T: if R ⋈ S is cheaper to compute or yields a smaller intermediate result than S ⋈ T, execute R ⋈ S first (choose a join order).
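10. Commuting σ with set operations (∪, ∩, −): writing θ for any one of these three operations:
$\sigma_c(R \;\theta\; S) \equiv (\sigma_c(R)) \;\theta\; (\sigma_c(S))$
11. The π operation commutes with ∪:
$\pi_L(R \cup S) \equiv (\pi_L(R)) \cup (\pi_L(S))$
12. Converting a (σ, ×) sequence into ⋈: If the condition c of a σ that follows a × corresponds to a join condition:
$\sigma_c(R \times S) \equiv R \bowtie_c S$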
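A quick way to build intuition for the equivalence rules is to check a few of them on small relations. The sketch below does this for Rules 1, 2, 5, and 6. The helper names (`select`, `join`, `same`), the list-of-dicts relation format, and the sample relations R and S are all assumptions made for the example; passing on one dataset illustrates a rule but does not prove it.

```python
def select(pred, rel):
    """sigma: keep the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]


def join(pred, r, s):
    """theta-join implemented as a nested loop over both relations."""
    return [{**a, **b} for a in r for b in s if pred({**a, **b})]


def same(r, s):
    """Equivalence test that ignores tuple and attribute order."""
    def norm(rel):
        return {frozenset(t.items()) for t in rel}
    return norm(r) == norm(s)


# Hypothetical sample relations R(A, B) and S(C, D).
R = [{"A": 1, "B": 10}, {"A": 2, "B": 20}, {"A": 3, "B": 30}]
S = [{"C": 10, "D": "x"}, {"C": 30, "D": "y"}]


def c1(t):
    return t["A"] >= 2          # involves only attributes of R


def c2(t):
    return t["B"] < 30          # involves only attributes of R


def theta(t):
    return t["B"] == t["C"]     # join condition


# Rule 1: cascade of sigma
rule1 = same(select(lambda t: c1(t) and c2(t), R), select(c1, select(c2, R)))
# Rule 2: commutativity of sigma
rule2 = same(select(c1, select(c2, R)), select(c2, select(c1, R)))
# Rule 5: commutativity of join
rule5 = same(join(theta, R, S), join(theta, S, R))
# Rule 6: pushing a selection on R below the join
rule6 = same(select(c1, join(theta, R, S)), join(theta, select(c1, R), S))
```

All four flags evaluate to `True` on this data. Comparing relations as sets of tuples matches the remark that equivalent expressions may order their tuples or attributes differently.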
Using Heuristics in Query Optimization
Two main approaches to finding an execution plan:
Reduce the number of operations …
Estimate the cost of each operation …
Each relational algebra expression (E) is represented by a query tree (Q).
Outline of a Heuristic Algebraic Optimization Algorithm
1. Break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations (Rule 1).
2. Move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition (Rules 2, 4, 6, and 10).
3. Rearrange the leaf nodes of the tree using the following criteria (Rules 5 and 9, concerning commutativity and associativity of binary operations):
– Position the leaf node relations with the most restrictive SELECT operations so they are executed first in the query tree representation.
– Make sure that the ordering of leaf nodes does not cause CARTESIAN PRODUCT operations.
4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition (Rule 12).
5. Break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed (Rules 3, 4, 7, and 11).
6. Identify subtrees that represent groups of operations that can be executed by a single algorithm.
SELECT Pname
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname='Smith'
$\pi_{Pname}(\sigma_{(Dnum=Dnumber) \wedge (Mgr\_ssn=Ssn) \wedge (Lname='Smith')}(P \times D \times E))$
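As a rough illustration of why the heuristics matter, the sketch below evaluates this query twice: once literally as written (Cartesian product first, then one big selection), and once after pushing the `Lname='Smith'` selection down and turning the products into joins. The table contents are hypothetical and the function names `canonical` and `optimized` are assumptions for the example. Only the size of the largest intermediate result is compared, not real I/O cost.

```python
from itertools import product

# Hypothetical toy instances; attribute names follow the example query.
PROJECT = [
    {"Pname": "ProductX", "Dnum": 5},
    {"Pname": "ProductY", "Dnum": 5},
    {"Pname": "Reorganization", "Dnum": 1},
]
DEPARTMENT = [
    {"Dnumber": 5, "Mgr_ssn": "333"},
    {"Dnumber": 1, "Mgr_ssn": "888"},
]
EMPLOYEE = [
    {"Ssn": "333", "Lname": "Wong"},
    {"Ssn": "888", "Lname": "Smith"},
    {"Ssn": "999", "Lname": "Smith"},
]


def canonical():
    """Evaluate the initial tree: selection over P x D x E, then projection."""
    prod = [{**p, **d, **e} for p, d, e in product(PROJECT, DEPARTMENT, EMPLOYEE)]
    rows = [
        t for t in prod
        if t["Dnum"] == t["Dnumber"]
        and t["Mgr_ssn"] == t["Ssn"]
        and t["Lname"] == "Smith"
    ]
    return sorted({t["Pname"] for t in rows}), len(prod)


def optimized():
    """Select on EMPLOYEE first, then join upward instead of taking products."""
    smiths = [e for e in EMPLOYEE if e["Lname"] == "Smith"]
    dept_mgr = [{**d, **e} for d in DEPARTMENT for e in smiths
                if d["Mgr_ssn"] == e["Ssn"]]
    joined = [{**p, **t} for p in PROJECT for t in dept_mgr
              if p["Dnum"] == t["Dnumber"]]
    largest = max(len(smiths), len(dept_mgr), len(joined))
    return sorted({t["Pname"] for t in joined}), largest


answer_a, largest_a = canonical()
answer_b, largest_b = optimized()
```

Both plans return the same answer, but the canonical plan materializes 3 × 2 × 3 = 18 tuples while the largest intermediate result of the rewritten plan has 2 tuples.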
Cost-based query optimization
The cost of executing a query includes the
following components:
• Access cost to secondary storage
• Disk storage cost
• Computation cost
• Memory usage cost
• Communication cost
The cost of an operation depends on the size and other statistics of its inputs.
Database-system catalogs store statistics about database relations.
These statistics are used to estimate statistics on the results of various relational operations.
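Catalog Information
$n_r$: number of tuples in a relation r.
$b_r$: number of blocks containing tuples of r.
$s_r$: size of a tuple of r, in bytes.
$f_r$: blocking factor of r, i.e., the number of tuples of r that fit into one block.
V(A, r): number of distinct values that appear in r for attribute A; same as the size of $\pi_A(r)$.
If the tuples of r are stored together physically in a file, then:
$b_r = \lceil n_r / f_r \rceil$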
Selection Size Estimation
The size estimate of the result of a selection
operation depends on the selection predicate
single equality predicate
single comparison predicate
combinations of predicates
Equality selection $\sigma_{A=v}(r)$
SC(A, r): number of records that will satisfy the selection
$\lceil SC(A, r)/f_r \rceil$: number of blocks that these records will occupy
E.g., the binary search cost estimate becomes
$E = \lceil \log_2(b_r) \rceil + \lceil SC(A, r)/f_r \rceil - 1$
Equality condition on a key attribute: SC(A, r) = 1
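The estimates for an equality selection can be tried out numerically. The sketch below assumes the usual binary-search cost of ⌈log2(b_r)⌉ block reads to find the first match plus the remaining blocks holding matching records. The helper `uniform_sc` additionally assumes that values of A are uniformly distributed, which is an assumption of this example rather than something stated above. All numbers are made up.

```python
import math


def blocks(n_r: int, f_r: int) -> int:
    """b_r = ceil(n_r / f_r): blocks needed when tuples of r are stored together."""
    return math.ceil(n_r / f_r)


def uniform_sc(n_r: int, distinct_values: int) -> float:
    """ASSUMPTION: uniform distribution, so SC(A, r) = n_r / V(A, r)."""
    return n_r / distinct_values


def binary_search_cost(b_r: int, sc: float, f_r: int) -> int:
    """Block reads: locate the first match, then read the rest of the matches."""
    return math.ceil(math.log2(b_r)) + math.ceil(sc / f_r) - 1


# Hypothetical relation: 10,000 tuples, 20 tuples per block, 50 distinct A values.
n_r, f_r, v_a = 10_000, 20, 50
b_r = blocks(n_r, f_r)                               # 500 blocks
sc = uniform_sc(n_r, v_a)                            # 200 matching records
cost_non_key = binary_search_cost(b_r, sc, f_r)      # 9 + 10 - 1 = 18
cost_key = binary_search_cost(b_r, 1, f_r)           # 9 + 1 - 1 = 9
```

For a key attribute the second term collapses to a single block, so the cost is just the binary search itself.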
Selections Involving Comparisons
Selections of the form $\sigma_{A \le v}(r)$ (the case of $\sigma_{A \ge v}(r)$ is symmetric)
Let c denote the estimated number of tuples satisfying the condition
If min(A, r) and max(A, r) are available in the catalog:
c = 0 if v < min(A, r)
c = $n_r$ if v ≥ max(A, r)
$c = n_r \cdot \dfrac{v - \min(A, r)}{\max(A, r) - \min(A, r)}$ otherwise
In the absence of statistical information, c is assumed to be $n_r / 2$.
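A small sketch of the comparison estimate for a condition of the form A ≤ v follows. The function name and the sample numbers are assumptions. The guard that returns n_r when v reaches max(A, r) is included so the estimate never exceeds the relation size.

```python
from typing import Optional


def comparison_estimate(n_r: int, v: float,
                        min_a: Optional[float] = None,
                        max_a: Optional[float] = None) -> float:
    """Estimated number of tuples of r satisfying A <= v."""
    if min_a is None or max_a is None:
        # No min/max statistics in the catalog: assume half of the tuples qualify.
        return n_r / 2
    if v < min_a:
        return 0
    if v >= max_a:
        # Guard: every tuple qualifies, so cap the estimate at n_r.
        return n_r
    # Linear interpolation between min(A, r) and max(A, r).
    return n_r * (v - min_a) / (max_a - min_a)


# Hypothetical statistics: 10,000 tuples, A ranges over [0, 100].
est_quarter = comparison_estimate(10_000, 25, 0, 100)
est_below = comparison_estimate(10_000, -5, 0, 100)
est_above = comparison_estimate(10_000, 250, 0, 100)
est_unknown = comparison_estimate(10_000, 25)
```

The interpolation treats values of A as evenly spread over the range, which is why a threshold a quarter of the way through the range selects about a quarter of the tuples.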
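Complex Selections
The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation r satisfies $\theta_i$. If $s_i$ is the number of satisfying tuples in r, the selectivity of $\theta_i$ is given by $s_i / n_r$.
Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n}(r)$. Assuming the conditions are independent, the estimated number of tuples in the result is:
$n_r \cdot \dfrac{s_1 \cdot s_2 \cdots s_n}{n_r^{\,n}}$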
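Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \cdots \vee \theta_n}(r)$. The estimated number of tuples in the result is:
$n_r \cdot \left(1 - \left(1 - \dfrac{s_1}{n_r}\right)\left(1 - \dfrac{s_2}{n_r}\right) \cdots \left(1 - \dfrac{s_n}{n_r}\right)\right)$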
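The selectivity-based estimates for combined predicates can be sketched as below. Both functions assume the individual conditions are statistically independent. The disjunction is computed as "one minus the probability that a tuple fails every condition". The function names and the sample sizes are assumptions for the example.

```python
from math import isclose, prod
from typing import Sequence


def conjunction_estimate(n_r: int, sizes: Sequence[int]) -> float:
    """Tuples satisfying all conditions: n_r times the product of selectivities."""
    return n_r * prod(s / n_r for s in sizes)


def disjunction_estimate(n_r: int, sizes: Sequence[int]) -> float:
    """Tuples satisfying at least one condition, under independence."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))


# Hypothetical relation of 1,000 tuples; two conditions matching 100 and 500 tuples.
n_r = 1_000
both = conjunction_estimate(n_r, [100, 500])     # 1000 * 0.1 * 0.5, about 50
either = disjunction_estimate(n_r, [100, 500])   # 1000 * (1 - 0.9 * 0.5), about 550
```

With a single condition both estimates reduce to s_1, which is a quick sanity check on the formulas.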
Estimation of the Size of Joins
Size of Cartesian product:
The Cartesian product r × s contains $n_r \cdot n_s$ tuples
Each tuple of r × s occupies $s_r + s_s$ bytes
Let r(R) and s(S) be relations:
If R ∩ S = ∅, then r ⨝ s is the same as r × s
If R ∩ S is a key for R, then a tuple of s will join with at most one tuple from r
-> The number of tuples in r ⨝ s is no greater than the number of tuples in s
If R ∩ S is a foreign key in S referencing R, then the number of tuples in r ⨝ s is exactly the same as the number of tuples in s
R ∩ S = {A} is not a key for R or S
If every tuple t in r produces tuples in r ⨝ s, the number of tuples in r ⨝ s is estimated as:
$\dfrac{n_r \cdot n_s}{V(A, s)}$
If the reverse is true, the estimate obtained is:
$\dfrac{n_r \cdot n_s}{V(A, r)}$
-> The lower of these two estimates is probably the more accurate one
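The two join-size estimates and the "take the lower one" rule can be combined into one small function. The function name and the statistics used in the example are hypothetical. The sketch covers only the case where the common attribute A is not a key of either relation.

```python
def join_size_estimate(n_r: int, n_s: int, v_a_r: int, v_a_s: int) -> float:
    """Estimated tuples in the natural join of r and s on a non-key attribute A."""
    # Each tuple of r is assumed to match n_s / V(A, s) tuples of s.
    estimate_from_r = n_r * n_s / v_a_s
    # Each tuple of s is assumed to match n_r / V(A, r) tuples of r.
    estimate_from_s = n_r * n_s / v_a_r
    # The lower of the two estimates is probably the more accurate one.
    return min(estimate_from_r, estimate_from_s)


# Hypothetical statistics: n_r = 10,000, n_s = 5,000,
# V(A, r) = 10,000 and V(A, s) = 2,500.
estimate = join_size_estimate(10_000, 5_000, v_a_r=10_000, v_a_s=2_500)
cartesian_size = 10_000 * 5_000
```

Here the two candidate estimates are 20,000 and 5,000 tuples, so the function returns 5,000, far below the 50,000,000 tuples of the Cartesian product.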
Summary
The main heuristic is to apply first the operations that reduce the size of intermediate results, by:
Performing SELECT operations as early as possible to reduce the number of tuples
Performing PROJECT operations as early as possible to reduce the number of attributes
The SELECT and JOIN operations that are most restrictive (that is, those that result in relations with the fewest tuples or with the smallest absolute size) should be executed before other similar operations.
This is done by reordering the leaf nodes of the tree among themselves while avoiding Cartesian products.