Microsoft SQL Server 2008 R2 Unleashed- P129 pptx



NOTE

This estimate assumes that the data rows have not been forwarded. In a heap table, when a row has been forwarded, the original row location contains a pointer to the new location of the data row; therefore, an additional page read is required to retrieve the actual data row. The actual I/O cost would be one page greater per row than the estimated I/O cost for any rows that have been forwarded.

When a nonclustered index is used to retrieve the data rows from a heap table, you see a query plan similar to the one shown in Figure 35.2. Notice that in SQL Server 2008, the bookmark lookup operator is replaced by a RID lookup, essentially as a join with the RIDs returned by the nonclustered index seek.

FIGURE 35.2 An execution plan for a nonclustered index seek against a heap table

If the table is clustered, the row bookmark is the clustered key for the data row. The number of I/Os to retrieve the data row depends on the depth of the clustered index tree because SQL Server has to use the clustered index to find each row. The logical I/O cost of finding a row using the nonclustered index on a clustered table is therefore as follows:

Number of nonclustered index levels
+ Number of leaf pages to be scanned
+ Number of qualifying rows × Number of page reads to find a single row via the clustered index


For example, consider a heap table with a nonclustered index on last name. Assume that the index holds 800 rows per page (they’re really big last names!), and 1,700 names are within the range you are looking for. If the index is three levels deep, the estimated logical I/O cost for the nonclustered index would be as follows:

3 (index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
+ 1,700 (data page reads)
= 1,706 total logical I/Os

Now, assume that the table has a clustered index on it, and the size of the nonclustered index is the same. If the clustered index is three levels deep, including the data page, the estimated logical I/O cost of using the nonclustered index would be as follows:

3 (nonclustered index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
+ 5,100 (1,700 rows × 3 clustered page reads per row)
= 5,106 (total logical I/Os)
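The two estimates above follow the same formula and differ only in the number of page reads needed per qualifying row. As a rough sketch in Python (the function and parameter names are illustrative assumptions, not anything defined by SQL Server), the arithmetic can be reproduced like this:

```python
import math


def leaf_pages(matching_rows: int, rows_per_leaf_page: int) -> int:
    """Leaf pages to scan: matching rows / rows per page, rounded up."""
    return math.ceil(matching_rows / rows_per_leaf_page)


def nonclustered_lookup_cost(index_levels: int, matching_rows: int,
                             rows_per_leaf_page: int, reads_per_row: int) -> int:
    """Estimated logical I/Os for a nonclustered index with bookmark lookups.

    reads_per_row is 1 for a heap (one data page read per RID) or the
    depth of the clustered index for a clustered table.
    """
    return (index_levels
            + leaf_pages(matching_rows, rows_per_leaf_page)
            + matching_rows * reads_per_row)


# Figures from the last-name example in the text.
heap_cost = nonclustered_lookup_cost(3, 1700, 800, reads_per_row=1)
clustered_cost = nonclustered_lookup_cost(3, 1700, 800, reads_per_row=3)
print(heap_cost, clustered_cost)
```

The sketch ignores forwarded rows, in line with the assumption stated in the earlier note.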

NOTE

Although the I/O cost is greater for bookmark lookups in a nonclustered index when a clustered index exists on the table, the cost savings during row inserts, updates, and deletes using the clustered index as the bookmark are substantial, whereas the couple of extra logical I/Os per row during retrieval do not substantially impact query performance.

For a unique nonclustered index using an equality operator, the I/O cost is estimated as the number of index levels traversed to access the bookmark plus the number of I/Os required to access the data page via the bookmark.

When a nonclustered index is used to retrieve the data rows on a table with a clustered index, you see a query plan similar to the one shown in Figure 35.3. Notice that in SQL Server 2008, the bookmark lookup operator is replaced by a clustered index seek, essentially as a join between the clustered index and the clustered index keys returned by the nonclustered index seek.

Covering Nonclustered Index Cost

When analyzing a query, the Query Optimizer considers any possibility to take advantage of index covering. Index covering is a method of using the leaf level of a nonclustered index to resolve a query when all the columns referenced in the query (in both the column list and WHERE clause, as well as any GROUP BY columns) are included in the index leaf row as either index key columns or included columns.

Index covering can save a significant amount of I/O because the query doesn’t have to access the data page to return the requested information. In most cases, a nonclustered index that covers a query is faster than a similarly defined clustered index on the table because of the greater number of rows per page in the index leaf level compared to the number of rows per page in the table itself. (As the nonclustered leaf row size approaches the data row size, the I/O cost savings are minimal, if any.)

FIGURE 35.3 An execution plan for a nonclustered index seek against a table with a clustered index

If index covering can take place in a query, the Query Optimizer considers it and estimates the I/O cost of using the nonclustered index to cover the query. The estimated I/O cost of index covering is as follows:

Number of index levels
+ Number of leaf level index pages to scan

The number of leaf-level pages to scan is based on the estimated number of matching rows divided by the number of leaf index rows per page. For example, if index covering could be used on the nonclustered index on title_id for the query in the previous example, the I/O cost would be the following:

3 (nonclustered index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
= 6 total logical I/Os
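To see how much I/O covering removes, the covering estimate can be placed next to the bookmark lookup estimates from the earlier example. This is only a sketch of the arithmetic; `covering_index_cost` is a hypothetical helper name, and the inputs are the 1,700 rows, 800 rows per leaf page, and three index levels used in the text.

```python
import math


def covering_index_cost(index_levels: int, matching_rows: int,
                        rows_per_leaf_page: int) -> int:
    """Estimated logical I/Os when the index leaf rows satisfy the query.

    No data page reads are added because the data rows are never visited.
    """
    leaf_pages_to_scan = math.ceil(matching_rows / rows_per_leaf_page)
    return index_levels + leaf_pages_to_scan


covered_cost = covering_index_cost(3, 1700, 800)

# Earlier estimates for the same search without covering.
heap_lookup_cost = 1706
clustered_lookup_cost = 5106

print(covered_cost, heap_lookup_cost - covered_cost,
      clustered_lookup_cost - covered_cost)
```

Under these assumptions the covered query avoids every per-row lookup, which is where nearly all of the cost in the noncovered estimates comes from.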

NOTE

For more information on index covering and when it can take place, as well as the included columns feature introduced in SQL Server 2005, see Chapter 34.

When index covering is used to retrieve the data rows, you might see a query plan similar to the one shown in Figure 35.4. If the entire leaf level of the index is searched, it displays as an index scan, as shown in this example.

FIGURE 35.4 An execution plan for a covered index scan without limits on the search

Other times, if the index keys can be searched to limit the range, you might see an index seek used, as shown in Figure 35.5. Note that the difference here from a normal index lookup is the lack of the RID or clustered index lookup because SQL Server does not need to go to the data row to find the needed information.

FIGURE 35.5 An execution plan for a covered index seek with limits on the search

Table Scan Cost

If no usable index exists that can be matched with a SARG or a join clause, the Query Optimizer’s only option is to perform a table scan. The estimate of the total I/O cost is simply the number of pages in the table, which is stored in the system catalogs and can be viewed by querying the used_page_count column of the sys.dm_db_partition_stats dynamic management view (DMV):

select used_page_count
   from sys.dm_db_partition_stats
   where object_id = object_id('sales_noclust')
     and (index_id = 0     -- data pages for heap table
          or index_id = 1) -- data pages for clustered table
go

used_page_count
--------------------
1244

Keep in mind that there are instances (for example, large range retrievals on a nonclustered index column) in which a table scan might be cheaper than a candidate index in terms of total logical I/O. For example, in the previous nonclustered index example, if the index does not cover the query, it costs between 1,706 and 5,106 logical I/Os to retrieve the matching rows using the nonclustered index, depending on whether a clustered index exists on the table. If the total number of pages in the table is less than either of these values, a table scan would be more efficient in terms of total logical I/Os than using a nonclustered index.

When a table scan is used to retrieve the data rows from a heap table, you see a query plan similar to the one shown in Figure 35.6.

FIGURE 35.6 A table scan on a heap table

When a table scan is used to retrieve the data rows from a clustered table, you see a query plan similar to the one shown in Figure 35.7. Notice that it displays as a clustered index scan because the table is the leaf level of the clustered index.

FIGURE 35.7 A table scan on a clustered table

Using Multiple Indexes

SQL Server allows the creation of multiple indexes on a table. If a query has multiple SARGs that can each be efficiently searched using an available index, the Query Optimizer can use more than one index to resolve the query, through either an index intersection or an index union.

Index Intersection

Index intersection is a mechanism that allows SQL Server to use multiple indexes on a table when you have two or more SARGs in a query and each can be efficiently satisfied using an index as the access path. Consider the following example:

-- First, create 2 additional indexes on sales to support the query
create index ord_date_idx on sales(ord_date)
create index qty_idx on sales(qty)
go

select * from sales
   where qty = 816
     and ord_date = '1/2/2008'

In this example, two additional nonclustered indexes are created on the sales table: one on the qty column and one on the ord_date column. The Query Optimizer considers the option of searching the index leaf rows of each index to find the rows that meet each of the search conditions and joining on the matching bookmarks (either the clustered index key or RIDs if it’s a heap table) for each result set. It then performs a merge join on the bookmarks and uses the output from that to retrieve the actual data rows for all the bookmarks that are in both result sets.

The index intersection strategy is applied only when the cost of retrieving the bookmarks for both indexes and then retrieving the data rows is less than that of retrieving the qualifying data rows using only one of the indexes or using a table scan.
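The bookmark matching at the heart of an index intersection can be illustrated outside the database engine. The following Python sketch is not how SQL Server implements its merge join internally; it only shows the idea of walking two sorted bookmark lists in step and keeping the bookmarks found in both. The bookmark values are made-up (stor_id, ord_num, title_id) tuples.

```python
def merge_join_bookmarks(left, right):
    """Return the bookmarks present in both sorted, duplicate-free lists."""
    i = j = 0
    matches = []
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            matches.append(left[i])  # row satisfies both search conditions
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1  # left bookmark has no partner; advance left
        else:
            j += 1  # right bookmark has no partner; advance right
    return matches


# Hypothetical bookmarks returned by each nonclustered index seek.
qty_bookmarks = sorted([
    ("6380", "A1", "BU1032"),
    ("7066", "B7", "PS2091"),
    ("7131", "C3", "MC3021"),
])
ord_date_bookmarks = sorted([
    ("7066", "B7", "PS2091"),
    ("8042", "D9", "TC7777"),
])

common = merge_join_bookmarks(qty_bookmarks, ord_date_bookmarks)
print(common)
```

Only the bookmarks in `common` would then be used to fetch data rows, which is why the strategy pays off when the overlap between the two result sets is small.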

You can go through the same analysis as the Query Optimizer to determine whether an index intersection makes sense. For example, the sales table has a clustered index on stor_id, ord_num, and title_id, and this clustered index is the bookmark used to retrieve the data rows for the matching data rows found via the nonclustered indexes. Assume the following statistics:

There are 1,200 rows estimated to match where qty = 816

There are approximately 215 index rows per leaf page for the index on qty

There are 212 rows estimated to match where ord_date = ‘1/2/2008’

There are approximately 185 index rows per leaf page for the index on ord_date

The Query Optimizer estimates that the overlap between the two result sets is 1 row

The number of levels in the index on qty is 3

The number of levels in the index on ord_date is 3

The number of levels in the clustered index on the sales table is 3

The sales table is 1,252 pages in size

Using this information, you can calculate the I/O cost for the different strategies the Query Optimizer can consider.

A table scan would cost 1,252 pages.

A standard data row retrieval via the nonclustered index on qty would have the following approximate cost:

2 index page reads (root and intermediate pages to locate first leaf page)
+ 6 leaf page reads (1,200 rows / 215 rows per page)
+ 3,600 (1,200 rows × 3 pages per bookmark lookup via the clustered index)
= 3,608 pages

A standard data row retrieval via the nonclustered index on ord_date would have the following approximate cost:

2 nonclustered index page reads (root and intermediate pages)
+ 2 nonclustered leaf page reads (212 rows / 185 rows per page)
+ 636 (212 rows × 3 pages per bookmark lookup via clustered index)
= 640 pages

The index intersection is estimated to have the following cost:

8 pages (1 root page + 1 intermediate page + the 6 leaf pages to find all the bookmarks for the 1,200 matching index rows on qty)
+ 4 pages (1 root page + 1 intermediate page + 2 leaf pages to find all the bookmarks for the 212 matching index rows on ord_date)
+ 3 page reads to find the 1 estimated overlapping row between the two indexes using the clustered index
= 15 pages

As you can see from these examples, the index intersection strategy is definitely the cheapest approach. If at any point the estimated intersection cost reaches 640 pages, SQL Server just uses the single index on ord_date and checks both search criteria against the 212 matching rows for ord_date. If the estimated cost of using an index in any way ever exceeds 1,252 pages, a table scan is likely to be performed, with the criteria checked against all rows.
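The comparison of the four strategies can be recomputed from the statistics listed earlier. The Python below is a sketch of that arithmetic only; the constant and function names are assumptions chosen for readability, and the real optimizer weighs more than logical page reads.

```python
import math

NON_LEAF_READS = 2       # root + intermediate page of each 3-level index
CLUSTERED_LEVELS = 3     # page reads per bookmark lookup
TABLE_PAGES = 1252       # size of the sales table


def leaf_pages(rows: int, rows_per_page: int) -> int:
    return math.ceil(rows / rows_per_page)


def bookmark_scan_cost(rows: int, rows_per_page: int) -> int:
    """Page reads to collect the bookmarks from one nonclustered index."""
    return NON_LEAF_READS + leaf_pages(rows, rows_per_page)


def single_index_cost(rows: int, rows_per_page: int) -> int:
    """Collect the bookmarks, then look up every matching data row."""
    return bookmark_scan_cost(rows, rows_per_page) + rows * CLUSTERED_LEVELS


def intersection_cost(overlap_rows: int) -> int:
    """Scan both indexes, then look up only the overlapping rows."""
    return (bookmark_scan_cost(1200, 215)     # qty index
            + bookmark_scan_cost(212, 185)    # ord_date index
            + overlap_rows * CLUSTERED_LEVELS)


costs = {
    "table scan": TABLE_PAGES,
    "qty index": single_index_cost(1200, 215),
    "ord_date index": single_index_cost(212, 185),
    "index intersection": intersection_cost(overlap_rows=1),
}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

Raising `overlap_rows` shows where the balance tips: each additional overlapping row adds three page reads, so under these assumptions the intersection reaches the 640-page cost of the ord_date index at roughly 210 overlapping rows.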

When an index intersection is used to retrieve the data rows from a table with a clustered index, you see a query plan similar to the one shown in Figure 35.8.

If the table does not have a clustered index (that is, a heap table like the sales_noclust table in the bigpubs2008 database) and has supporting nonclustered indexes for an index intersection, you see a query plan similar to the one shown in Figure 35.9.

Notice that in the example shown in Figure 35.9, the Query Optimizer performs a hash join rather than a merge join on the RIDs returned by each nonclustered index seek and uses the results from the hash join to perform an RID lookup to retrieve the matching data rows.

NOTE

To duplicate the query plan shown in Figure 35.9, you need to create the following two

additional indexes on the sales_noclust table:

create index ord_date_idx on sales_noclust(ord_date)

create index qty_idx on sales_noclust(qty)


FIGURE 35.8 An execution plan for an index intersection on a clustered table

FIGURE 35.9 An execution plan for an index intersection on a heap table

The Index Union Strategy

You see a strategy similar to an index intersection applied when you have an OR condition

between your SARGs, as in the following query:

select * from sales
   where title_id = 'DR8514'
      or ord_date = '2006-01-01 00:00:00.000'

For a query like this, SQL Server can process each part separately, using the index that matches the SARG, but after combining the results with a merge join, it removes any duplicate bookmarks for rows that match both search arguments. It then uses the unique bookmarks to retrieve the result rows from the base table.

When the index union strategy is used on a table with a clustered index, you see a query plan similar to the one shown in Figure 35.10. Notice the addition of the stream aggregation step, which differentiates it from the index intersection query plan. The stream aggregation step performs a grouping on the bookmarks returned by the merge join to eliminate the duplicate bookmarks.

The following steps describe how SQL Server determines whether to use the index union strategy:

1. Estimate the cost of a table scan and the cost of using the index union strategy. If the cost of the index union strategy exceeds the cost of a table scan, stop here and simply perform a table scan. Otherwise, continue with the succeeding steps to perform the index union strategy.

2. Break the query into multiple parts, as in this example:

   select * from sales where title_id = 'DR8514'
   select * from sales where ord_date = '2006-01-01 00:00:00.000'

3. Match each part with an available index.

4. Execute each piece and perform a join on the row bookmarks.

5. Remove any duplicate bookmarks.

6. Use the resulting list of unique bookmarks to retrieve all qualifying rows from the base table.

If any one of the OR clauses needs to be resolved via a table scan for any reason, SQL Server simply uses a table scan to resolve the whole query rather than applying the index union strategy.

FIGURE 35.10 An execution plan for an index union strategy on a clustered table
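The decision steps for the index union strategy can be mirrored in a few lines of Python. This is a conceptual sketch rather than SQL Server's implementation: `index_union` is a hypothetical function, a branch of `None` stands for an OR clause with no usable index, and the bookmark values and cost figures in the usage lines are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Bookmark = Tuple[str, ...]  # e.g. clustered key values, or a RID for a heap


def index_union(branches: Sequence[Optional[Sequence[Bookmark]]],
                union_cost: int,
                table_scan_cost: int) -> Optional[List[Bookmark]]:
    """Return the unique bookmarks to look up, or None to signal a table scan."""
    # Step 1, plus the rule that one unindexable OR clause forces a table scan.
    if union_cost > table_scan_cost or any(b is None for b in branches):
        return None
    unique = set()
    for bookmarks in branches:     # steps 2-4: one bookmark list per OR clause
        unique.update(bookmarks)   # step 5: the set drops duplicate bookmarks
    return sorted(unique)          # step 6 would fetch these rows


# Hypothetical bookmarks for each OR clause of the example query.
title_matches = [("6380", "A1", "DR8514"), ("7066", "B7", "DR8514")]
date_matches = [("7066", "B7", "DR8514"), ("8042", "D9", "TC7777")]

to_fetch = index_union([title_matches, date_matches],
                       union_cost=20, table_scan_cost=1252)
fallback = index_union([title_matches, None],
                       union_cost=20, table_scan_cost=1252)
print(to_fetch, fallback)
```

The row that matches both search arguments appears once in `to_fetch`, which corresponds to the work the stream aggregation step does in the query plan.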
