
Chapter 4: Algorithms for Query Processing and Optimization

Ho Chi Minh City University of Technology Faculty of Computer Science and Engineering

Database Management Systems

(CO3021)

Computer Science Program

Dr Võ Thị Ngọc Châu (chauvtn@hcmut.edu.vn)

Course outline
• Chapter 1 Overall Introduction to Database Management Systems
• Chapter 2 Disk Storage and Basic File Structures
• Chapter 3 Indexing Structures for Files
• Chapter 4 Algorithms for Query Processing and Optimization
• Chapter 5 Introduction to Transaction Processing Concepts and Theory
• Chapter 6 Concurrency Control Techniques
• Chapter 7 Database Recovery Techniques

References
• [1] R. Elmasri, S. R. Navathe, Fundamentals of Database Systems, 6th Edition, Pearson/Addison-Wesley, 2011
  R. Elmasri, S. R. Navathe, Fundamentals of Database Systems, 7th Edition, Pearson, 2016
• [2] H. Garcia-Molina, J. D. Ullman, J. Widom, Database System Implementation, Prentice-Hall, 2000
• [3] H. Garcia-Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Prentice-Hall, 2002
• [4] A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts, 3rd Edition, McGraw-Hill, 1999
• [Internet] …

Content
• 4.1 Introduction to Query Processing
• 4.2 Translating SQL Queries into Relational Algebra
• 4.3 Algorithms for External Sorting
• 4.4 Algorithms for SELECT and JOIN Operations
• 4.5 Algorithms for PROJECT and SET Operations
• 4.6 Implementing Aggregate Operations and Outer Joins
• 4.7 Combining Operations using Pipelining
• 4.8 Using Heuristics in Query Optimization
• 4.9 Using Selectivity and Cost Estimates in Query Optimization
• 4.10 Overview of Query Optimization in Oracle

4.1 Introduction to Query Processing
CREATE TABLE EMPLOYEE (
    Fname VARCHAR(15) NOT NULL,
    Minit CHAR,
    Lname VARCHAR(15) NOT NULL,
    Ssn CHAR(9) NOT NULL,
    …
        ON DELETE SET NULL ON UPDATE CASCADE,
    CONSTRAINT EMPDEPTFK
        FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
        ON DELETE SET DEFAULT ON UPDATE CASCADE);

How would you obtain such results?
Retrieve the SSN, last name, and department number of all the employees who work in department 1, or who were born after 01/01/1955 with a salary higher than 30000
SELECT SSN, LNAME, DNO
FROM EMPLOYEE
WHERE DNO = 1 OR
      (BDATE > '01/01/1955'
       AND SALARY > 30000);

Typical steps when processing a high-level query
SELECT SSN, LNAME, DNO
FROM EMPLOYEE
WHERE DNO = 1 OR
      (BDATE > '01/01/1955'
       AND SALARY > 30000);

• A query is expressed in a high-level query language such as SQL
• The query is first scanned, parsed, and validated
  – The scanner identifies the query tokens (SQL keywords, attribute names, and relation names) that appear in the query text
  – The parser checks the query syntax to determine whether it is formulated according to the syntax (grammar) rules of the language
  – The validator checks that all attribute and relation names are valid and semantically meaningful names in the database schema

• The query is represented in an intermediate form, i.e., an internal representation
  – Query Tree
  – Query Graph
• The DBMS must then devise an execution strategy or query plan for retrieving the results of the query from the database files
  – An execution plan includes details about the access methods available for each relation and the algorithms to be used in computing the relational operators represented in the tree
  – A query has many possible execution plans, and the process of choosing a suitable one for processing a query is known as query optimization

• The query optimizer module has the task of producing a good execution plan
• The code generator generates the code to execute that plan
• The runtime database processor has the task of running (executing) the query code, whether in compiled or interpreted mode, to produce the query result
  – If a runtime error results, an error message is generated by the runtime database processor

• Query Tree
  – A tree data structure that corresponds to an extended relational algebra expression
  – It represents the input relations of the query as leaf nodes of the tree
  – It represents the relational algebra operations as internal nodes
  – An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation
  – The order of execution of operations starts at the leaf nodes and ends at the root node
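The bottom-up execution of a query tree described above can be sketched in Python. This is only a minimal illustration, not how any particular DBMS implements it: the class names `Relation`, `Select`, and `Project`, the sample rows, and the use of ISO date strings are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Relation:
    """Leaf node: an input relation of the query."""
    rows: List[Row]

    def execute(self) -> List[Row]:
        return self.rows


@dataclass
class Select:
    """Internal node: selection applied to the result of its child."""
    predicate: Callable[[Row], bool]
    child: object

    def execute(self) -> List[Row]:
        # The operand (child result) must be available before this node runs.
        return [row for row in self.child.execute() if self.predicate(row)]


@dataclass
class Project:
    """Internal node: projection onto a list of attributes."""
    attributes: List[str]
    child: object

    def execute(self) -> List[Row]:
        return [{a: row[a] for a in self.attributes} for row in self.child.execute()]


# Hypothetical EMPLOYEE rows; dates are ISO strings so they compare correctly.
employee = Relation([
    {"SSN": "111", "LNAME": "Smith", "DNO": 1, "BDATE": "1950-03-02", "SALARY": 25000},
    {"SSN": "222", "LNAME": "Wong", "DNO": 5, "BDATE": "1960-07-19", "SALARY": 40000},
    {"SSN": "333", "LNAME": "Zelaya", "DNO": 4, "BDATE": "1968-01-19", "SALARY": 25000},
])

# Tree for: SELECT SSN, LNAME, DNO FROM EMPLOYEE
#           WHERE DNO = 1 OR (BDATE > '01/01/1955' AND SALARY > 30000)
tree = Project(
    ["SSN", "LNAME", "DNO"],
    Select(
        lambda r: r["DNO"] == 1 or (r["BDATE"] > "1955-01-01" and r["SALARY"] > 30000),
        employee,
    ),
)

result = tree.execute()  # evaluation starts at the leaf and ends at the root
print(result)
```

Each `execute` call first evaluates its child, so the leaf relation is read before the selection, and the selection before the projection at the root.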

• Query Graph
  – Relations in the query are represented by relation nodes, which are displayed as single circles
  – Constant values, typically from the query selection conditions, are represented by constant nodes, which are displayed as double circles or ovals
  – Selection and join conditions are represented by the graph edges
  – The attributes to be retrieved from each relation are displayed in square brackets above each relation

Query Tree
[Figure: representation of the example query, with retrieved attributes [SSN, LNAME, DNO] and conditions DNO=1, BDATE>'01/01/1955', SALARY>30000]

4.2 Translating SQL Queries into Relational Algebra
• An SQL query is first translated into an equivalent extended relational algebra expression (represented as a query tree data structure) that is then optimized

SQL clause                              Relational operation                            Meaning
FROM a single table                     (none)                                          Input table
FROM table1, table2                     table1 × table2                                 Cartesian product
FROM table1 JOIN table2 ON conditions   table1 ⋈_{conditions} table2                    Theta join
WHERE conditions                        σ_{conditions}                                  Selection
SELECT an attribute list                π_{an attribute list}                           Projection
SELECT a function list …                <a grouping attribute list> ℑ <function list>   Aggregation
  [GROUP BY a grouping attribute list]

• Query block: the basic unit that can be translated into the algebraic operators and optimized
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block
• Nested queries within a query are identified as separate query blocks
• Aggregate operators (MAX, MIN, COUNT, SUM) must be included in the extended algebra

Retrieve the names of employees (from any department in the company) who earn a salary that is greater than the highest salary in department 5
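The request above naturally splits into two query blocks: an inner block that computes the highest salary in department 5, and an outer block that compares each employee's salary with that value. The sketch below uses Python's standard `sqlite3` module to show this. The SQL text is one plausible formulation rather than the slide's own query, and the table contents are hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)"
)
# Hypothetical sample data, for illustration only.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", 30000, 5),
        ("Wong", "Franklin", 40000, 5),
        ("Borg", "James", 55000, 1),
        ("Zelaya", "Alicia", 25000, 4),
    ],
)

# Inner query block: an aggregate over department 5.
(max_salary,) = conn.execute(
    "SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5"
).fetchone()

# Outer query block: uses the inner block's result as a constant.
outer = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > ?", (max_salary,)
).fetchall()

# The same request written as a single nested query.
nested = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5)"
).fetchall()

print(max_salary, outer, nested)
```

Because the inner block does not depend on the outer one, it can be evaluated once and its result reused, which is why it is convenient to treat it as a separate unit.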


4.3 Algorithms for External Sorting
• Sorting is one of the primary algorithms used in query processing
  – the ORDER BY clause
  – sort-merge algorithms for JOIN and set operations
  – duplicate elimination algorithms for the PROJECT operation
  – DISTINCT in the SELECT clause
• External sorting: refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory

• Sort-Merge strategy: starts by sorting small subfiles (runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn
  – Sorting phase: nR = ⌈b / nB⌉
  – Merging phase: dM = min(nB − 1, nR)
                   nP = ⌈log_{dM}(nR)⌉
  b: number of file blocks
  nB: available buffer space (in blocks)
  nR: number of initial runs
  dM: degree of merging
  nP: number of merge passes
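As a worked instance of these formulas, the short Python sketch below computes the number of initial runs, the degree of merging, and the number of merge passes. The function name `sort_merge_parameters` and the input values (b = 1024 blocks, nB = 5 buffers) are illustrative assumptions. The number of passes is counted with integer arithmetic instead of a floating-point logarithm to avoid rounding issues.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def sort_merge_parameters(b: int, nB: int):
    """Return (nR, dM, nP) for a file of b blocks and nB buffer blocks."""
    if nB < 3:
        raise ValueError("at least 3 buffer blocks are needed to merge runs")
    nR = ceil_div(b, nB)      # sorting phase: number of initial runs
    dM = min(nB - 1, nR)      # merging phase: degree of merging
    # nP = ceil(log_dM(nR)): count how many dM-way passes shrink nR runs to 1.
    nP, runs = 0, nR
    while runs > 1:
        runs = ceil_div(runs, dM)
        nP += 1
    return nR, dM, nP


# Illustrative values: 1024 file blocks, 5 buffer blocks.
print(sort_merge_parameters(1024, 5))
```

With these inputs the 205 initial runs shrink to 52, 13, 4, and finally 1 run, i.e. four 4-way merge passes.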

set i ← 1; j ← b;     /* size of the file in blocks */
k ← nB;               /* size of buffer in blocks */
m ← ⌈j/k⌉;            /* the number of runs */
/* Sort phase */
while (i <= m) do
{
    read next k blocks of the file into the buffer or, if there are less than k blocks remaining, then read in the remaining blocks;
    sort the records in the buffer and write as a temporary subfile;
    i ← i + 1;
}
/* Merge phase: merge subfiles until only one remains */
set i ← 1;
p ← ⌈log_{k-1}(m)⌉;   /* p: number of passes in the merging phase */
j ← m;                /* the number of runs */
while (i <= p) do
{
    n ← 1;
    q ← ⌈j/(k-1)⌉;    /* number of subfiles to write in this pass */
    while (n <= q) do
    {
        read next k-1 subfiles or remaining subfiles (from previous pass) one block at a time;
        merge and write as new subfile one block at a time;
        n ← n + 1;
    }
    j ← q;
    i ← i + 1;
}
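The two phases of the algorithm above can be simulated in memory with a few lines of Python. This is a sketch under simplifying assumptions: a "block" is modelled as a small Python list of keys, runs are kept as lists instead of temporary files on disk, and the function name `external_sort` is made up for the example. The standard-library `heapq.merge` plays the role of the (nB − 1)-way merge.

```python
import heapq
from typing import List, Tuple


def external_sort(blocks: List[List[int]], nB: int) -> Tuple[List[int], int]:
    """Sort-merge over a list of blocks; returns (sorted keys, merge passes)."""
    if nB < 3:
        raise ValueError("at least 3 buffer blocks are needed to merge runs")

    # Sort phase: read nB blocks at a time, sort them, write one run.
    runs = []
    for i in range(0, len(blocks), nB):
        buffered = [key for block in blocks[i:i + nB] for key in block]
        runs.append(sorted(buffered))

    # Merge phase: merge up to nB - 1 runs at a time until one run remains.
    passes = 0
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + nB - 1]))
            for i in range(0, len(runs), nB - 1)
        ]
        passes += 1

    return (runs[0] if runs else []), passes


# Hypothetical file of 7 blocks with 2 keys each, sorted with 3 buffer blocks.
file_blocks = [[9, 4], [7, 1], [8, 2], [6, 3], [5, 0], [11, 10], [13, 12]]
sorted_keys, merge_passes = external_sort(file_blocks, nB=3)
print(sorted_keys, merge_passes)
```

Here 7 blocks with nB = 3 give 3 initial runs, and 2-way merging needs 2 passes, in agreement with ⌈log_2(3)⌉ = 2.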


4.4 Algorithms for SELECT and JOIN Operations
CREATE TABLE EMPLOYEE (
    Fname VARCHAR(15) NOT NULL,
    Lname VARCHAR(15) NOT NULL,
    Ssn CHAR(9) NOT NULL,
    …
    CONSTRAINT EMPDEPTFK FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)
        ON DELETE SET DEFAULT ON UPDATE CASCADE);
CREATE TABLE DEPARTMENT (
    Dname VARCHAR(15) NOT NULL,
    Dnumber INT NOT NULL,
    Mgr_ssn CHAR(9) NOT NULL,
    Mgr_start_date DATE,
    PRIMARY KEY (Dnumber),
    UNIQUE (Dname),
    FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn) );
CREATE TABLE WORKS_ON (
    Essn CHAR(9) NOT NULL,
    Pno INT NOT NULL,
    Hours DECIMAL(3,1) NOT NULL,
    PRIMARY KEY (Essn, Pno),
    FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn),
    …

Given the tables, some examples for selection:
• OP1: σ_{SSN='123456789'}(EMPLOYEE)
• OP2: σ_{DNUMBER>5}(DEPARTMENT)
• OP3: σ_{DNO=5}(EMPLOYEE)
• OP4: σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
• OP4': σ_{DNO=5 OR SALARY>30000 OR SEX='F'}(EMPLOYEE)
• OP5: σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)
• OP6: σ_{DNO IN (3, 27, 49)}(EMPLOYEE)
• OP7: σ_{((Salary*Commission_pct) + Salary) > 5000}(EMPLOYEE)
SELECT * FROM TABLE WHERE CONDITIONs;

Implementing the SELECT Operation: Search Methods for Simple Selection
• S1 Linear search (brute force): Retrieve every record in the file, and test whether its attribute values satisfy the selection condition
• S2 Binary search: If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used
• S3 Using a primary index or hash key to retrieve a single record: If the selection condition involves an equality comparison on a key attribute with a primary index (or a hash key), use the primary index (or the hash key) to retrieve the record

• S4 Using a primary index to retrieve multiple records: If the comparison condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent records in the (ordered) file
• S5 Using a clustering index to retrieve multiple records: If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition
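The idea behind searching an ordered file, both for an equality condition and for a range condition such as OP2, can be sketched with Python's standard `bisect` module. The sorted list `dnumbers` stands in for a DEPARTMENT file ordered on its key Dnumber. The list contents and the helper names `find_equal` and `find_greater` are assumptions made for this illustration, and a real system would work on disk blocks and index entries rather than on an in-memory list.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional

# Hypothetical ordered file: Dnumber values in sorted order.
dnumbers = [1, 3, 4, 5, 7, 9, 12]


def find_equal(keys: List[int], value: int) -> Optional[int]:
    """Binary search for key = value; returns the position or None."""
    i = bisect_left(keys, value)
    return i if i < len(keys) and keys[i] == value else None


def find_greater(keys: List[int], value: int) -> List[int]:
    """Range condition key > value: locate the boundary for the equality
    condition, then return all subsequent entries of the ordered file."""
    return keys[bisect_right(keys, value):]


print(find_equal(dnumbers, 5))     # position of Dnumber = 5
print(find_greater(dnumbers, 5))   # all Dnumber > 5, as in OP2
```

Only the boundary has to be searched for; everything after it qualifies because the file is ordered on the search key.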

• S6 Using a secondary (B+-tree) index: On an equality comparison, this search method can be used to retrieve a single record if the indexing field has unique values (is a key) or to retrieve multiple records if the indexing field is not a key. In addition, it can be used to retrieve records on conditions involving >, ≥, <, or ≤ (for range queries)

• S7.a Using a bitmap index: If the selection condition involves a set of values for an attribute, the corresponding bitmaps for each value can be OR-ed to give the set of record identifiers that qualify
• S7.b Using a functional index: If there is a functional index defined, this index can be used to retrieve all the records that qualify
CREATE INDEX income_ix
ON EMPLOYEE (Salary + (Salary*Commission_pct));
This index can be used for OP7
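The OR-ing of bitmaps for a set of values, as needed for a condition like OP6 (DNO IN (3, 27, 49)), can be illustrated with Python integers used as bit vectors. This is a sketch only: the column contents and the helper names `build_bitmaps` and `select_in` are hypothetical, and bit i of a bitmap simply marks whether record i has the given value.

```python
from typing import Dict, Iterable, List

# Hypothetical Dno values of EMPLOYEE records 0..7.
dno_column = [3, 5, 27, 3, 49, 5, 27, 1]


def build_bitmaps(column: List[int]) -> Dict[int, int]:
    """One bitmap per distinct value; bit rid is set if record rid has it."""
    bitmaps: Dict[int, int] = {}
    for rid, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << rid)
    return bitmaps


def select_in(bitmaps: Dict[int, int], values: Iterable[int]) -> List[int]:
    """OR the bitmaps of the requested values and list the qualifying rids."""
    combined = 0
    for value in values:
        combined |= bitmaps.get(value, 0)
    return [rid for rid in range(combined.bit_length()) if (combined >> rid) & 1]


bitmaps = build_bitmaps(dno_column)
print(select_in(bitmaps, (3, 27, 49)))
```

The result is the list of record identifiers whose Dno is 3, 27, or 49, obtained without looking at the data records themselves.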

• S8 Conjunctive selection using an individual index: If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition
• S9 Conjunctive selection using a composite index: If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or hash structure) exists on the combined field, we can use the index directly

• S10 Conjunctive (AND) selection by intersection of record pointers: This method is possible if secondary indexes are available on all (or some of) the fields involved in equality comparison conditions in the conjunctive condition and if the indexes include record pointers (rather than block pointers). Each index can be used to retrieve the record pointers that satisfy the individual condition. The intersection of these sets of record pointers gives the record pointers that satisfy the conjunctive condition, which are then used to retrieve those records directly. If only some of the conditions have secondary indexes, each retrieved record is further tested to determine whether it satisfies the remaining conditions
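The intersection of record pointers can be sketched for OP4 (DNO=5 AND SALARY>30000 AND SEX='F') as follows. The example assumes secondary indexes on Dno and Sex only, so the remaining condition on Salary is tested on each retrieved record. All index contents, record identifiers, and the function name `conjunctive_select` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical secondary indexes: value -> set of record pointers (rids).
dno_index: Dict[int, Set[int]] = {5: {1, 2, 4, 7}, 4: {3, 5}}
sex_index: Dict[str, Set[int]] = {"F": {2, 3, 7}, "M": {1, 4, 5}}

# Stand-in for the data file: rid -> Salary of that record.
salary_of: Dict[int, int] = {1: 28000, 2: 43000, 3: 25000, 4: 38000, 5: 31000, 7: 30000}


def conjunctive_select(dno: int, sex: str, min_salary: int) -> List[int]:
    """Rids satisfying Dno = dno AND Sex = sex AND Salary > min_salary."""
    # Intersect the pointer sets obtained from the two indexes.
    candidates = dno_index.get(dno, set()) & sex_index.get(sex, set())
    # Salary has no index here: fetch each candidate and test it.
    return sorted(rid for rid in candidates if salary_of[rid] > min_salary)


print(conjunctive_select(5, "F", 30000))
```

Only the records in the intersection are fetched from the data file, which is the point of the method: the indexes narrow the candidates before any data block is read.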

• Disjunctive (OR) selection conditions: With a disjunctive selection condition, the records satisfying the disjunctive condition are the union of the records satisfying the individual conditions. Hence, if any one of the conditions does not have an access path, we are compelled to use the brute force, linear search approach. Only if an access path exists on every simple condition in the disjunction can we optimize the selection by retrieving the records satisfying each condition (or their record identifiers) and then applying the union operation to eliminate duplicates

Algorithms for SELECT
OP4': σ_{Dno=5 OR Salary>30000 OR Sex='F'}(EMPLOYEE)
• Linear search: Each block of the data file is loaded into the buffer. The records are then checked there against all the conditions in the OR
• With an access path on every condition: Result 1, Result 2, and Result 3 (one per condition) are retrieved separately and then combined by union
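The choice between the two strategies for a disjunctive condition can be sketched as below. To keep the example short it handles equality conditions only (unlike OP4', which also contains a range condition), and the records, the helper `build_index`, and the function `disjunctive_select` are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]
Index = Dict[object, Set[int]]


def build_index(records: List[Record], attr: str) -> Index:
    """Hypothetical secondary index: value -> set of record ids."""
    index: Index = {}
    for rid, rec in enumerate(records):
        index.setdefault(rec[attr], set()).add(rid)
    return index


def disjunctive_select(
    records: List[Record],
    conditions: List[Tuple[str, object]],
    indexes: Dict[str, Index],
) -> Tuple[List[int], str]:
    """Evaluate attr1 = v1 OR attr2 = v2 OR ...; returns (rids, method used)."""
    if all(attr in indexes for attr, _ in conditions):
        # An access path exists on every condition: union of the rid sets.
        rids: Set[int] = set()
        for attr, value in conditions:
            rids |= indexes[attr].get(value, set())
        return sorted(rids), "index union"
    # Otherwise fall back to the brute force linear search.
    matches = [
        rid
        for rid, rec in enumerate(records)
        if any(rec[attr] == value for attr, value in conditions)
    ]
    return matches, "linear search"


employees = [
    {"Dno": 5, "Sex": "M"},
    {"Dno": 4, "Sex": "F"},
    {"Dno": 1, "Sex": "M"},
    {"Dno": 5, "Sex": "F"},
]
conditions = [("Dno", 5), ("Sex", "F")]
all_indexed = {"Dno": build_index(employees, "Dno"), "Sex": build_index(employees, "Sex")}
partly_indexed = {"Dno": build_index(employees, "Dno")}

print(disjunctive_select(employees, conditions, all_indexed))
print(disjunctive_select(employees, conditions, partly_indexed))
```

Both calls return the same records; only the method differs. Using a set for the union removes the duplicate that arises when a record satisfies more than one condition.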

• Whenever a single condition specifies the selection, we can only check whether an access path exists on the attribute involved in that condition. If an access path exists, the method corresponding to that access path is used; otherwise, the "brute force" linear search approach of method S1 is used
• The query optimizer must choose the appropriate method for executing each SELECT operation in a query
  – This optimization uses formulas that estimate the costs for each available access method
  – The optimizer chooses the access method with the lowest estimated cost

Given EMPLOYEE and DEPARTMENT tables:
• OP1: σ_{SSN='123456789'}(EMPLOYEE)
• OP2: σ_{DNUMBER>5}(DEPARTMENT)
• OP3: σ_{DNO=5}(EMPLOYEE)
• OP4: σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
• OP4': σ_{DNO=5 OR SALARY>30000 OR SEX='F'}(EMPLOYEE)
• OP5: σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)
• OP6: σ_{DNO IN (3, 27, 49)}(EMPLOYEE)
• OP7: σ_{((Salary*Commission_pct) + Salary) > 5000}(EMPLOYEE)
Which search method should be used?

Implementing the JOIN Operation:
• Join (EQUIJOIN, NATURAL JOIN)
  – two-way join: a join on two files
    e.g., R ⋈_{A=B} S
  – multi-way join: a join involving more than two files
    e.g., R ⋈_{A=B} S ⋈_{C=D} T
• Examples
  OP8: EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
  OP9: DEPARTMENT ⋈_{MGR_SSN=SSN} EMPLOYEE
SELECT * FROM R JOIN S ON A=B;

• J1 Nested-loop join (brute force)
• J2 Single-loop join (using an access structure to retrieve the matching records)

• J1 Nested-loop join (brute force): For each record t in R (outer loop), retrieve every record s from S (inner loop) and test whether the two records satisfy the join condition t[A] = s[B]
for each record t in each block of R
    for each record s in each block of S
        if (t[A] = s[B])
            add (t, s) into the result
How many block accesses are needed with a (memory) buffer? Which table (the large one or the small one) should be on the outer loop?
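The nested-loop pseudocode above translates almost directly into Python. In this sketch a file is a list of blocks and a block is a list of records (dictionaries). The sample EMPLOYEE and DEPARTMENT rows and the function name `nested_loop_join` are assumptions made for the example, and no buffering is modelled.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Block = List[Record]


def nested_loop_join(
    r_blocks: List[Block], s_blocks: List[Block], a: str, b: str
) -> List[Tuple[Record, Record]]:
    """Brute force join of R and S on the condition t[a] = s[b]."""
    result = []
    for r_block in r_blocks:            # outer loop over blocks of R
        for t in r_block:
            for s_block in s_blocks:    # inner loop over blocks of S
                for s in s_block:
                    if t[a] == s[b]:
                        result.append((t, s))
    return result


# Hypothetical data in the spirit of OP8 (join on DNO = DNUMBER).
employee_blocks = [
    [{"Lname": "Smith", "Dno": 5}, {"Lname": "Borg", "Dno": 1}],
    [{"Lname": "Zelaya", "Dno": 4}],
]
department_blocks = [
    [{"Dname": "Research", "Dnumber": 5}, {"Dname": "Headquarters", "Dnumber": 1}],
]

joined = nested_loop_join(employee_blocks, department_blocks, "Dno", "Dnumber")
for t, s in joined:
    print(t["Lname"], s["Dname"])
```

Every record of R is compared with every record of S, which is why the number of times the inner file has to be re-read dominates the cost.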

for each record t in each block of R
    for each record s in each block of S
        if (t[A] = s[B])
            add (t, s) into the result
OP8: EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
The number of blocks of EMPLOYEE: bE = 2000 blocks
The number of blocks of DEPARTMENT: bD = 10 blocks
Buffer size: nB = 7 blocks

• J1 Nested-loop join (brute force):
OP8: EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
bE = 2000 blocks, bD = 10 blocks, nB = 7 blocks
Buffer: nB − 2 blocks for the outer file, 1 block for the inner file, 1 block for the result
J1.1 EMPLOYEE on the outer loop
Cost = bE + bD * ⌈bE/(nB−2)⌉ = 2000 + 10 * ⌈2000/(7−2)⌉
Cost = 6000 block accesses
J1.2 DEPARTMENT on the outer loop
Cost = bD + bE * ⌈bD/(nB−2)⌉ = 10 + 2000 * ⌈10/(7−2)⌉
Cost = 4010 block accesses
Smaller file on the outer loop!
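The two cost figures can be checked with a small Python function that evaluates the same formula. The function name `nested_loop_cost` is an assumption for this sketch. As in the slide's formula, only block reads of the two input files are counted, and the cost of writing the result is left out.

```python
def nested_loop_cost(b_outer: int, b_inner: int, nB: int) -> int:
    """Block accesses for a nested-loop join with nB buffer blocks.

    nB - 2 blocks hold a chunk of the outer file; the inner file is read
    once for every such chunk.
    """
    if nB < 3:
        raise ValueError("at least 3 buffer blocks are assumed")
    chunks = -(-b_outer // (nB - 2))   # ceiling of b_outer / (nB - 2)
    return b_outer + b_inner * chunks


bE, bD, nB = 2000, 10, 7
print(nested_loop_cost(bE, bD, nB))   # J1.1: EMPLOYEE on the outer loop
print(nested_loop_cost(bD, bE, nB))   # J1.2: DEPARTMENT on the outer loop
```

The smaller file needs fewer chunks, so the larger file is re-read fewer times, which explains the advantage of putting the smaller file on the outer loop.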

• J2 Single-loop join (using an access structure to retrieve the matching records): If an index (or hash key) exists for one of the two join attributes (say, B of S), retrieve each record t in R (loop over R) and then use the access structure to retrieve directly all matching records s from S that satisfy s[B] = t[A]
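A single-loop join can be sketched in Python by letting a dictionary play the role of the access structure on S.B. This is an in-memory stand-in for an index or hash key, not a disk-based structure, and the sample rows and the names `build_index` and `single_loop_join` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]


def build_index(s_records: List[Record], b: str) -> Dict[object, List[Record]]:
    """Stand-in for an index (or hash key) on attribute b of S."""
    index: Dict[object, List[Record]] = defaultdict(list)
    for s in s_records:
        index[s[b]].append(s)
    return index


def single_loop_join(
    r_records: List[Record], s_index: Dict[object, List[Record]], a: str
) -> List[Tuple[Record, Record]]:
    """Loop over R once; use the access structure to find matches in S."""
    return [(t, s) for t in r_records for s in s_index.get(t[a], [])]


# Hypothetical data in the spirit of OP8 (join on DNO = DNUMBER).
employees = [
    {"Lname": "Smith", "Dno": 5},
    {"Lname": "Borg", "Dno": 1},
    {"Lname": "Zelaya", "Dno": 4},
]
departments = [
    {"Dname": "Research", "Dnumber": 5},
    {"Dname": "Headquarters", "Dnumber": 1},
]

dnumber_index = build_index(departments, "Dnumber")
pairs = single_loop_join(employees, dnumber_index, "Dno")
for t, s in pairs:
    print(t["Lname"], s["Dname"])
```

Compared with the nested-loop join, S is never scanned in full: each record of R triggers one lookup through the access structure.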

Title: Algorithms for Query Processing and Optimization
Instructor: Dr. Võ Thị Ngọc Châu
Institution: Ho Chi Minh City University of Technology
Subject: Database Management Systems
Type: Lecture notes
Academic year: 2020-2021
City: Ho Chi Minh City
