The non-standard EXPLAIN statement (see Table 17-1) is the vital way to find out what the optimizer has done. We haven't mentioned it up to now because this book's primary goal has been to show what you can do before the fact. But EXPLAIN is the way to measure whether your estimates correspond to DBMS reality. In many shops, it's customary to get an EXPLAIN for every SQL statement before submitting it for execution. That is quite reasonable. What's perhaps less reasonable is the custom of trying out every transformation one can think of and submitting them all for explanation. That is mere floundering. Understanding principles, in other words, estimating what's
rows.) That is a narrow search, and usually it's faster to perform a narrow search with a B-tree rather than scan all rows in the table. Therefore the rule-based optimizer makes a plan: find matching rows using the index on column2.
index for column2. Those facts change everything. The equals operation will match on 60% of the rows, so it's not a narrow search. And the whole table can be scanned using two page reads, whereas an index lookup would take three page reads (one to look up in the index, two more to fetch the data later). Therefore the cost-based optimizer makes a different plan: find matching rows using a table scan.
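The difference between the two plans can be sketched in a few lines. The page counts are the ones from the example above (two page reads for a table scan, three for an index lookup). The function names and the simple "fewest page reads wins" rule are assumptions made for illustration, not a description of any DBMS's actual optimizer.

```python
def rule_based_plan(has_index: bool) -> str:
    """Fixed assumption: an equals search is narrow, so use the index if one exists."""
    return "index lookup" if has_index else "table scan"


def cost_based_plan(table_pages: int, index_pages: int, data_pages_fetched: int) -> str:
    """Pick whichever access path needs fewer page reads (simplified cost model)."""
    scan_cost = table_pages
    index_cost = index_pages + data_pages_fetched
    return "table scan" if scan_cost <= index_cost else "index lookup"


# Figures from the example: the whole table fits in two pages; an index
# lookup reads one index page and then two data pages.
print(rule_based_plan(has_index=True))
print(cost_based_plan(table_pages=2, index_pages=1, data_pages_fetched=2))
```

With these figures the rule-based function picks the index and the cost-based function picks the scan, which is the contrast the example describes.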
optimizers, as you can see from Table 17-1. The claims don't mean much by themselves. What's important is whether the optimizer estimates cost correctly and how it acts on the

DBMS       Claims to be CBO  "Explains" the Access Plan  "Updates" Statistics for the Optimizer
Informix   Yes               SET EXPLAIN                 UPDATE STATISTICS
Ingres     Yes               EXECUTE QEP                 optimizedb utility
InterBase  Yes               SELECT … PLAN               SET STATISTICS
Microsoft  Yes               EXPLAIN                     UPDATE STATISTICS
MySQL      No                EXPLAIN                     ANALYZE TABLE
Oracle     Yes               EXPLAIN PLAN
Claims to be CBO column
This column is "Yes" if the DBMS's documentation makes the claim that it operates with a cost-based optimizer.
"Explains" the Access Plan column
This column shows the non-standard statement provided by the DBMS so that you can examine the access plan the optimizer will use to resolve an SQL statement. For example, if you want to know how Oracle will resolve a specific SELECT statement, just execute an EXPLAIN PLAN FOR statement for the SELECT.
"Updates" Statistics for the Optimizer column
This column shows the non-standard statement or utility the DBMS provides so that you can update volatile information, or statistics, for the optimizer. For example, if your DBMS is MySQL and you've just added many rows to a table and want the optimizer to know about them, just execute an ANALYZE TABLE statement for the table.
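As a rough illustration of the two kinds of statement described above, the sketch below uses SQLite through Python's sqlite3 module. SQLite is not one of the DBMSs listed in Table 17-1; it is used here only because it is easy to run, and its EXPLAIN QUERY PLAN and ANALYZE statements play the same two roles. The names Table1, column2, and idx_col2 are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (column1 INTEGER, column2 INTEGER)")
conn.execute("CREATE INDEX idx_col2 ON Table1 (column2)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(i, i % 100) for i in range(1000)],
)

# "Explain": ask the optimizer how it would resolve the statement.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Table1 WHERE column2 = 5"
).fetchall()
plan_details = [row[-1] for row in plan]  # the last column holds the readable plan text
for detail in plan_details:
    print(detail)

# "Update statistics": refresh the volatile information the optimizer uses.
conn.execute("ANALYZE")
stat_rows = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stat_rows)
```

The exact wording of the plan text varies between SQLite versions, so treat the printed output as something to read rather than to parse.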
This glossary contains only terms that specifically are used for SQL optimization. For terms that apply to the subject of SQL in general, consult the 1,000-term glossary on our Web site,
ourworld.compuserve.com/homepages/OCELOTSQL/glossary.htm
Before the definition there may be a "Used by" note. For example, "Used by: Microsoft, Sybase" indicates that Microsoft and Sybase authorities prefer the term and/or definition. The words "Used by: this book only" indicate a temporary and non-standard term that exists only for this book's purposes.
Second normal form table, a 1NF table that contains only columns that are dependent upon the entire primary key.
3NF
Third normal form table, a 2NF table whose non-key columns are also mutually independent; that is, each column can be updated independently of all the rest.
Application Programming Interface, the method by which a programmer writing an application program can make requests of the operating system or another application.
applet
A Java program that can be downloaded and executed by a browser.
B
B-tree
A structure for storing index keys; an ordered, hierarchical, paged assortment of index keys. Some people say the "B" stands for "Balanced."
back compression
Making index keys shorter by throwing bytes away from the back.
See also [compression]
See also [front compression]
See [B-tree]
BDB
Berkeley DB, an embedded database system that is bundled with MySQL.
Used by: Microsoft
A pointer in an index key. If the data is in a clustered index, the bookmark is a clustered index key. If the data is not in a clustered index, the bookmark is an RID.
bytecode
An intermediate form of code in which executable Java programs are represented. Bytecode is higher level than machine code, but lower level than source code.
cartesian explosion
The effect on the size of a cartesian join's product as the joined tables grow in size. During a join, the DBMS creates a temporary table to hold the join result. The temporary table's size is the product of the sizes of the two original tables, which means the processing time goes up geometrically if Table1 and Table2 get bigger.
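The growth described here is easy to see with a small sketch. The helper name cartesian_join and the generated row labels are illustrative assumptions; the only point is that the result size is the product of the two input sizes.

```python
from itertools import product


def cartesian_join(table1, table2):
    """Pair every row of table1 with every row of table2."""
    return list(product(table1, table2))


for n in (2, 10, 100):
    table1 = [f"T1_row{i}" for i in range(n)]
    table2 = [f"T2_row{i}" for i in range(n)]
    result = cartesian_join(table1, table2)
    print(n, "rows per table ->", len(result), "rows in the product")
```

Growing each table tenfold grows the product a hundredfold.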
cartesian join
The set of all ordered pairs {a, b} where a is a member of set A and b is a member of set B. In database terms, a cartesian product joins all rows in Table1 with all rows in Table2. Thus if Table1 has the values {T_a1, T_b1} and Table2 has the values {T_a2, T_b2} then the cartesian product is {(T_a1, T_a2) (T_a1, T_b2) (T_b1, T_a2) (T_b1, T_b2)}. Cartesian products are useful for explanation, but when we see an operation which "goes cartesian," we usually criticize the optimizer. Also known as
A sort order that considers 'SMITH' and 'Smith' to be two different strings.
CBO
Cost-based optimizer, an optimizer that uses volatile data (e.g., the row and column values that have been inserted) and an override (e.g., that the database contents are more important than the fixed optimizing assumptions) to determine the optimal query resolution plan. A cost-based optimizer is a type of rule-based optimizer that has additional volatile information available to it so that it can override a fixed assumption.
CGI
Common Gateway Interface, a standard way for a Web server to pass a Web user's request to an application program and to receive data back to forward to the user.
cluster
Used by: Microsoft, Sybase
A structure for storing data in a specific order; that is, an
permanently according to some column value, such as the primary key.
clustered index
An index that the DBMS uses to determine the order of data rows, according to values in one or more columns, called the cluster key. With a strong-clustered index, the data pages are the index's leaves and are thus always in order. With a weak-clustered index, data pages are separate from index leaf pages and the rows need not be 100% in order.
See [compound index]
composite table
A table that contains column values derived from two or more other tables.
compound index
An index whose keys contain values derived from more than one data column.
compression
Making index keys shorter by throwing bytes away from the front or from the back.
Two transactions that have overlapping start or end times. To prevent concurrent transactions from interfering with each other, the DBMS may arrange a lock.
connection pooling
A facility that allows connections to a data source to be stored and reused. A Java term.
cursor
A marker that indicates the current position within a result set.
data source
A repository for storing data. An ODBC/JDBC term.
DBA
performs all activities related to maintaining a successful database environment.
dbc
Database connection, an ODBC resource.
DBMS
Database Management System, a program that lets one or more computer users create and access the data in a
A condition that arises when two or more transactions are waiting for one another to release locks.
denormalize
Break the normalization rules deliberately, in an attempt to gain speed or save space.
{A,C,C,D,D}. The number of distinct keys, that is, the number that a SELECT DISTINCT statement would return, is three: {A,C,D}. Now recall from your arithmetic classes that reciprocal = 1/N; that is, the reciprocal of a number N is the number that would yield one when multiplied by N. The reciprocal of 3 is 1/3; therefore, the density for our example is 1/3. It's possible to get a preliminary guess of the number of rows that column1 = <literal> will return by multiplying the table's cardinality
WARNING: The following definitions of density from vendor manuals or other texts are imprecise or confusing: "the average number of rows which are duplicates" (Sybase); "density is inversely proportional to index sensitivity" (various).
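The arithmetic in the density example can be checked with a short sketch. The function names are illustrative. The sentence about the preliminary guess is cut off in this text after "multiplying the table's cardinality", so the second function assumes the multiplier is the density; that is an assumption, not something the surviving text states.

```python
from fractions import Fraction


def density(keys):
    """Reciprocal of the number of distinct key values."""
    return Fraction(1, len(set(keys)))


def estimated_rows(keys):
    """Preliminary guess for column1 = <literal>: cardinality times density (assumed)."""
    return len(keys) * density(keys)


keys = ["A", "C", "C", "D", "D"]
print(density(keys))         # 1/3
print(estimated_rows(keys))  # 5/3
```

Fractions are used instead of floats so the 1/3 from the example is shown exactly.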
dependence
A concept used in normalization. If the value of column1 uniquely determines the value of column2, then column2 is functionally dependent on column1. If the value of column1 limits the possible values in column2 to a specific set, then column2 is set dependent on column1.
dictionary sort
dictionary sort with tie-breaking
A dictionary sort with multiple passes to sort accented characters differently from unaccented characters and uppercase letters differently from lowercase letters.
Dirty Read
A problem arising with concurrent transactions. The Dirty Read problem occurs when a transaction reads a row that has been changed but not committed by another transaction. The result is that Transaction #2's work is based on a change that never really happened. You can avoid Dirty Read by using an isolation level of READ
The table that the DBMS examines first when evaluating a join or subquery expression.
Enterprise Bean
A server component written in Java.
env
A lock that may not coexist with any other lock on the same object.
expanding update
A data-change statement that increases the size of a row.
extent
A group of pages that are allocated together, as part of the initial creation of an object or when an existing extent
A plan to process a subquery: make everything one level, that is, transform the query to a join, then process as a join.
flattened (query)
See [flatten (a query)]
FPU
Floating Point Unit, a microprocessor that manipulates numbers faster than the basic microprocessor used by a computer.
[2] Used by: Informix. What Informix users call partitioning.
See also [back compression]
functionally dependent
A concept used in normalization. If the value of column1 uniquely determines the value of column2, then column2 is functionally dependent on column1.
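One way to make the definition concrete is to test it against sample rows. The sketch below is an illustration only: the function name and the dictionary-based rows are assumptions, and a check on data can only show that a dependence is violated, since dependence is a property of the design rather than of whatever rows happen to be present.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # Remember the first value seen for this key; any later mismatch breaks it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"column1": 1, "column2": "x"},
    {"column1": 2, "column2": "y"},
    {"column1": 1, "column2": "x"},
]
print(is_functionally_dependent(rows, "column1", "column2"))

# The same column1 value now appears with two different column2 values.
rows.append({"column1": 2, "column2": "z"})
print(is_functionally_dependent(rows, "column1", "column2"))
```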
G
granularity (of a lock)
The size of the locked area: database, file, table, page, row, or column.
A method for producing a joined table. Given two input tables Table1 and Table2, processing is as follows:
(a) For each row in Table1, produce a hash. Assign the hash to a hash bucket.
(b) For each row in Table2, produce a hash. Check if the hash is already in the hash bucket. If it is: there's a join. If it is not: there's no join.
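Steps (a) and (b) can be sketched as follows. The function name, the dictionary-based rows, and the key parameters are illustrative assumptions. The final equality check is an addition to the two steps as written: two different values can produce the same hash, so a real implementation has to compare the actual values before declaring a join.

```python
from collections import defaultdict


def hash_join(table1, table2, key1, key2):
    """Join two lists of dict rows where table1[key1] equals table2[key2]."""
    # (a) Hash each Table1 row and assign it to a hash bucket.
    buckets = defaultdict(list)
    for row1 in table1:
        buckets[hash(row1[key1])].append(row1)

    # (b) Hash each Table2 row and check whether that bucket is occupied.
    joined = []
    for row2 in table2:
        for row1 in buckets.get(hash(row2[key2]), []):
            # Different values can share a hash, so compare the real values.
            if row1[key1] == row2[key2]:
                joined.append((row1, row2))
    return joined


table1 = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
table2 = [{"ref": 2, "qty": 10}, {"ref": 3, "qty": 20}]
print(hash_join(table1, table2, "id", "ref"))
```

Each table is read once, which is the attraction of the method compared with comparing every row against every other row.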
A structure for storing data in an unstructured manner. When you add something to a heap, it goes wherever there is free space, which probably means at the end. Existing data is not moved to make free space available for new data.
heap-organized table
See [heap]
histogram
Detailed information on the distribution of values over a column; information stored for the sake of the optimizer.
expression Table1 LEFT JOIN Table2 the inner table must be Table2 and for Table1 RIGHT JOIN Table2 the inner table must be Table1.
L
latch
A low-level on-off mechanism that ensures two processes or threads can't access the same object at the same time.
See also [lock]
leaf (page of an index)
A page at the bottom level of a B-tree (the page at the top level is the root). Typically a leaf page contains pointers to the data pages (if it's a non-clustered index) or to the data itself (if it's a clustered index).
A method the DBMS uses to prevent concurrent transactions from interfering with one another. Physically, a lock is one of three things: a latch, a mark on the wall, or a RAM record.
locking level
See [granularity (of a lock)]
lock mode
The type of lock a DBMS has arranged for a transaction. Options are exclusive, shared, or update.
Transaction #1's change never happened. You can avoid Lost Update by using an isolation level of READ UNCOMMITTED or higher.
LRU
Least-Recently-Used, an algorithm that replaces the page that hasn't been accessed for the longest time.
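A minimal sketch of the replacement policy, assuming a cache that holds a fixed number of pages. The class name, the capacity parameter, and the use of Python's OrderedDict are choices made for the illustration and say nothing about how any particular DBMS implements its buffer pool.

```python
from collections import OrderedDict


class LRUPageCache:
    """Keeps at most `capacity` pages; evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest access first, newest last

    def access(self, page_id):
        """Touch a page; return the id of the evicted page, or None."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # now the most recently used
            return None
        evicted = None
        if len(self.pages) >= self.capacity:
            evicted, _ = self.pages.popitem(last=False)  # drop the oldest
        self.pages[page_id] = True
        return evicted


cache = LRUPageCache(capacity=2)
evictions = [cache.access(p) for p in (1, 2, 1, 3)]
print(evictions)
```

Page 1 is touched again before page 3 arrives, so page 2 is the one that has gone longest without access and is the one replaced.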
M
mark on the wall
An ITL slot or mark put against a row by the DBMS. By putting a mark right beside the row accessed by a
materialize
See [materialization]
materialized view
A view whose rows take up space. When you select from a view, the DBMS can elect to do one of two things: (a) it can get the rows from the original table, convert any derived columns, and pass the results to the application or (b) it can create a temporary table and put the rows from the original table(s) into the temporary table, then select from the temporary copy. The latter case results in a materialized view. Materialization is often necessary when there is no one-to-one correspondence between the original table's rows and the view's rows (because there is a grouping) or when many tables are affected and concurrency would be harmed (because there is a join).
The process of finding a home for an expanding update. When a page overflows due to a data change that increases the length of a variable-length column, a row must be
Notice the for loop nested within a for loop.
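The listing this note refers to is not reproduced in this text. As a stand-in, a nested-loop join can be sketched as below. The function name, the matches callback, and the sample tables are illustrative assumptions, not the book's original listing.

```python
def nested_loop_join(outer_table, inner_table, matches):
    """Compare every outer row with every inner row."""
    joined = []
    for outer_row in outer_table:        # outer loop
        for inner_row in inner_table:    # inner loop, nested in the outer one
            if matches(outer_row, inner_row):
                joined.append((outer_row, inner_row))
    return joined


table1 = [1, 2, 3]
table2 = [2, 3, 4]
print(nested_loop_join(table1, table2, lambda a, b: a == b))
```

The inner table is scanned once for every row of the outer table, which is why the choice of outer table matters.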
A problem arising with concurrent transactions. The Non-repeatable Read … by using an isolation level of REPEATABLE READ or higher.
The process of designing a database so that its tables follow the rules specified by relational theory. In practice, this usually means that all database tables are in third normal form.
process, according to rules based on relational theory. In a normalized table, one set of columns is the primary key (which uniquely identifies a row of the table) and all other columns are functionally dependent upon the entire primary key.
Locking that assumes conflict is unlikely. Generally, this means avoiding locks and checking for conflict between two transactions only after data changes have been made.
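One common way to check for conflict without holding locks is to keep a version number with each row and compare it at write time. The sketch below assumes that approach; the Row class, the version field, and the function names are all hypothetical and are not taken from any DBMS.

```python
class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every successful write


def read(row):
    """Read without taking a lock; remember the version seen."""
    return row.value, row.version


def write_if_unchanged(row, new_value, version_seen):
    """Apply the change only if nobody else changed the row in the meantime."""
    if row.version != version_seen:
        return False  # conflict detected; the caller must re-read and retry
    row.value = new_value
    row.version += 1
    return True


row = Row(100)
value1, seen1 = read(row)  # Transaction #1 reads
value2, seen2 = read(row)  # Transaction #2 reads the same row
first_ok = write_if_unchanged(row, value1 + 10, seen1)
second_ok = write_if_unchanged(row, value2 + 20, seen2)
print(first_ok, second_ok, row.value)
```

The first write succeeds and the second is rejected, so Transaction #2 finds out about the conflict only when it tries to save its change.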
outer table
The table in the outer loop of a nested-loop join. When you write an SQL statement with an inner join, the outer table is determined by the DBMS based on its join strategy for that
of the join determines the outer table: for the join expression Table1 LEFT JOIN Table2 the outer table must be Table1 and for Table1 RIGHT JOIN Table2 the outer table must be Table2.
out-of-place update
Used by: Microsoft, Sybase
A data change that causes a row to move.
A fixed-size hopper that stores rows of data or index keys; a minimal unit for disk I/O. Depending on the DBMS, a page is also called a data block, a block, a blocking unit, a control interval, or a row group.
on separate disks. Informix calls this fragmentation.
A problem arising with concurrent transactions. The Phantom problem occurs when a transaction reads multiple rows twice; once before and once after another transaction does a data change that affects the search condition in the first transaction's reads. The result is that Transaction #1 gets a different (larger) result set back from its second read. You can avoid Phantoms by using an isolation level of SERIALIZABLE.
PL/SQL
Used by: Oracle
stored procedures
positioned delete
A DELETE statement that allows you to delete the row at the current cursor position. Syntax: DELETE … WHERE CURRENT OF <cursor>
positioned update
An UPDATE statement that allows you to update the row at the current cursor position. Syntax: UPDATE … WHERE CURRENT OF <cursor>
precompiler
A utility you use to "compile" SQL code before you compile the host program, that is, a utility that converts SQL statements in a host program to statements that a compiler can understand. A remnant of embedded-SQL days; there is no such thing as an SQL compiler.
prepared statement
An SQL statement that has been parsed and planned, for example, with ODBC's SQLPrepare function.