NOTE
This estimate assumes that the data rows have not been forwarded. In a heap table,
when a row has been forwarded, the original row location contains a pointer to the new
location of the data row; therefore, an additional page read is required to retrieve the
actual data row. The actual I/O cost would be one page greater per row than the
estimated I/O cost for any rows that have been forwarded.
When a nonclustered index is used to retrieve the data rows from a heap table, you see a
query plan similar to the one shown in Figure 35.2. Notice that in SQL Server 2008, the
bookmark lookup operator is replaced by a RID lookup, which is essentially a join with the
RIDs returned by the nonclustered index seek.
FIGURE 35.2 An execution plan for a nonclustered index seek against a heap table
If the table is clustered, the row bookmark is the clustered key for the data row. The
number of I/Os to retrieve the data row depends on the depth of the clustered index tree
because SQL Server has to use the clustered index to find each row. The logical I/O cost of
finding a row using the nonclustered index on a clustered table is therefore as follows:
Number of nonclustered index levels
+ Number of leaf pages to be scanned
+ Number of qualifying rows × Number of page reads to find a single row via the clustered
index
For example, consider a heap table with a nonclustered index on last name. Assume that
the index holds 800 rows per page (they’re really big last names!), and 1,700 names are
within the range you are looking for. If the index is three levels deep, the estimated logical
I/O cost for the nonclustered index would be as follows:
3 (index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
+ 1,700 (data page reads)
= 1,706 total logical I/Os
Now, assume that the table has a clustered index on it, and the size of the nonclustered
index is the same. If the clustered index is three levels deep, including the data page, the
estimated logical I/O cost of using the nonclustered index would be as follows:
3 (nonclustered index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
+ 5,100 (1,700 rows × 3 clustered page reads per row)
= 5,106 (total logical I/Os)
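The two estimates differ only in the number of page reads needed per qualifying row. The following Python sketch reproduces the arithmetic under that assumption. The function and parameter names are hypothetical, and the forwarded_rows adjustment simply adds one page read per forwarded heap row, as described in the earlier note. It is a rough model of the estimate, not SQL Server's actual costing code.

```python
import math

def nonclustered_lookup_io(index_levels, matching_rows, rows_per_leaf_page,
                           reads_per_row=1, forwarded_rows=0):
    """Estimate logical I/Os for a range retrieval through a nonclustered index.

    reads_per_row is 1 for a heap (one data page read per RID) or the depth of
    the clustered index when the bookmark is a clustered key. forwarded_rows
    adds one extra page read for each forwarded row in a heap.
    """
    # A partially filled leaf page still costs a full page read, so round up.
    leaf_pages = math.ceil(matching_rows / rows_per_leaf_page)
    return (index_levels + leaf_pages
            + matching_rows * reads_per_row + forwarded_rows)

heap_cost = nonclustered_lookup_io(3, 1700, 800)
clustered_cost = nonclustered_lookup_io(3, 1700, 800, reads_per_row=3)
print(heap_cost, clustered_cost)  # 1706 5106
```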
NOTE
Although the I/O cost is greater for bookmark lookups in a nonclustered index when a
clustered index exists on the table, the cost savings during row inserts, updates, and
deletes using the clustered index as the bookmark are substantial, whereas the couple of
extra logical I/Os per row during retrieval do not substantially impact query
performance.
For a unique nonclustered index using an equality operator, the I/O cost is estimated as
the number of index levels traversed to access the bookmark plus the number of I/Os
required to access the data page via the bookmark.
When a nonclustered index is used to retrieve the data rows on a table with a clustered
index, you see a query plan similar to the one shown in Figure 35.3. Notice that in SQL
Server 2008, the bookmark lookup operator is replaced by a clustered index seek, essentially
as a join between the clustered index and the clustered index keys returned by the
nonclustered index seek.
FIGURE 35.3 An execution plan for a nonclustered index seek against a table with a
clustered index
Covering Nonclustered Index Cost
When analyzing a query, the Query Optimizer considers any possibility to take advantage
of index covering. Index covering is a method of using the leaf level of a nonclustered
index to resolve a query when all the columns referenced in the query (in both the
column list and WHERE clause, as well as any GROUP BY columns) are included in the index
leaf row as either index key columns or included columns.
Index covering can save a significant amount of I/O because the query doesn’t have to
access the data page to return the requested information. In most cases, a nonclustered
index that covers a query is faster than a similarly defined clustered index on the table
because of the greater number of rows per page in the index leaf level compared to the
number of rows per page in the table itself. (As the nonclustered leaf row size approaches
the data row size, the I/O cost savings are minimal, if any.)
If index covering can take place in a query, the Query Optimizer considers it and estimates
the I/O cost of using the nonclustered index to cover the query. The estimated I/O cost of
index covering is as follows:
Number of index levels
+ Number of leaf level index pages to scan
The number of leaf-level pages to scan is based on the estimated number of matching
rows divided by the number of leaf index rows per page. For example, if index covering
could be used on the nonclustered index on title_id for the query in the previous
example, the I/O cost would be the following:
3 (nonclustered index levels)
+ 3 (leaf pages: 1,700 leaf rows/800 rows per page)
= 6 total logical I/Os
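The covering estimate is just the index depth plus the number of leaf pages touched, because no data page is ever visited. A minimal Python sketch using the figures from this example (3 index levels, 1,700 matching rows, 800 leaf rows per page) follows. The function name is hypothetical and the model is only the simplified estimate described here.

```python
import math

def covering_index_io(index_levels, matching_rows, rows_per_leaf_page):
    """Estimate logical I/Os when the index leaf level alone answers the query."""
    # Round up: a partially scanned leaf page still costs one page read.
    leaf_pages = math.ceil(matching_rows / rows_per_leaf_page)
    return index_levels + leaf_pages

covering_cost = covering_index_io(3, 1700, 800)
print(covering_cost)  # 6
```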
NOTE
For more information on index covering and when it can take place, as well as the
included columns feature introduced in SQL Server 2005, see Chapter 34.
When index covering is used to retrieve the data rows, you might see a query plan similar
to the one shown in Figure 35.4. If the entire leaf level of the index is searched, it displays
as an index scan, as shown in this example.
FIGURE 35.4 An execution plan for a covered index scan without limits on the search
Other times, if the index keys can be searched to limit the range, you might see an index
seek used, as shown in Figure 35.5. Note that the difference here from a normal index
lookup is the lack of the RID or clustered index lookup because SQL Server does not need
to go to the data row to find the needed information.
FIGURE 35.5 An execution plan for a covered index seek with limits on the search
Table Scan Cost
If no usable index exists that can be matched with a SARG or a join clause, the Query
Optimizer’s only option is to perform a table scan. The estimate of the total I/O cost is
simply the number of pages in the table, which is stored in the system catalogs and can be
viewed by querying the used_page_count column of the sys.dm_db_partition_stats
dynamic management view (DMV):
select used_page_count
from sys.dm_db_partition_stats
where object_id = object_id('sales_noclust')
and (index_id = 0 -- data pages for heap table
or index_id = 1) -- data pages for clustered table
go
used_page_count
--------------------
1244
Keep in mind that there are instances (for example, large range retrievals on a
nonclustered index column) in which a table scan might be cheaper than a candidate index in
terms of total logical I/O. For example, in the previous nonclustered index example, if the
index does not cover the query, it costs between 1,706 and 5,106 logical I/Os to retrieve
the matching rows using the nonclustered index, depending on whether a clustered index
exists on the table. If the total number of pages in the table is less than either of these
values, a table scan would be more efficient in terms of total logical I/Os than using a
nonclustered index.
When a table scan is used to retrieve the data rows from a heap table, you see a query
plan similar to the one shown in Figure 35.6.
FIGURE 35.6 A table scan on a heap table
When a table scan is used to retrieve the data rows from a clustered table, you see a query
plan similar to the one shown in Figure 35.7. Notice that it displays as a clustered index
scan because the table is the leaf level of the clustered index.
FIGURE 35.7 A table scan on a clustered table
Using Multiple Indexes
SQL Server allows the creation of multiple indexes on a table. If a query has multiple
SARGs that can each be efficiently searched using an available index, the Query Optimizer
can consider using more than one of those indexes to resolve the query.
Index Intersection
Index intersection is a mechanism that allows SQL Server to use multiple indexes on a
table when you have two or more SARGs in a query and each can be efficiently satisfied
using an index as the access path. Consider the following example:
-- First, create 2 additional indexes on sales to support the query
create index ord_date_idx on sales(ord_date)
create index qty_idx on sales(qty)
go
select * from sales
where qty = 816
and ord_date = '1/2/2008'
In this example, two additional nonclustered indexes are created on the sales table: one
on the qty column and one on the ord_date column. The Query Optimizer considers the
option of searching the index leaf rows of each index to find the rows that meet each of
the search conditions and joining on the matching bookmarks (either the clustered index
key or RIDs if it’s a heap table) for each result set. It then performs a merge join on the
bookmarks and uses the output from that to retrieve the actual data rows for all the
bookmarks that are in both result sets.
The index intersection strategy is applied only when the cost of retrieving the bookmarks
for both indexes and then retrieving the data rows is less than that of retrieving the
qualifying data rows using only one of the indexes or using a table scan.
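The mechanics can be illustrated with a toy model in Python. Everything here is hypothetical: the four-row table, the integer bookmarks standing in for clustered keys or RIDs, and the dictionary "indexes". A set intersection stands in for the merge join on bookmarks, so this shows the idea rather than SQL Server's implementation.

```python
# Toy "table": bookmark -> row. Bookmarks are hypothetical stand-ins for
# clustered index keys or RIDs.
table = {
    1: {"qty": 816, "ord_date": "2008-01-02"},
    2: {"qty": 816, "ord_date": "2008-03-15"},
    3: {"qty": 10, "ord_date": "2008-01-02"},
    4: {"qty": 25, "ord_date": "2008-06-30"},
}

def build_index(column):
    """Map each column value to the sorted list of bookmarks that hold it."""
    index = {}
    for bookmark in sorted(table):
        index.setdefault(table[bookmark][column], []).append(bookmark)
    return index

qty_idx = build_index("qty")
ord_date_idx = build_index("ord_date")

def index_intersection(qty, ord_date):
    """Return the bookmarks that satisfy both search conditions."""
    # Seek each index independently; only bookmarks are read at this point.
    qty_bookmarks = set(qty_idx.get(qty, []))
    date_bookmarks = set(ord_date_idx.get(ord_date, []))
    # Keep only the bookmarks present in both result sets.
    return sorted(qty_bookmarks & date_bookmarks)

bookmarks = index_intersection(816, "2008-01-02")
rows = [table[b] for b in bookmarks]  # one base-table visit per surviving bookmark
print(bookmarks)  # [1]
```

Only one base-table lookup is needed here, even though each index on its own matches two rows.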
You can go through the same analysis as the Query Optimizer to determine whether an
index intersection makes sense. For example, the sales table has a clustered index on
stor_id, ord_num, and title_id, and this clustered index is the bookmark used to retrieve
the data rows that match the index rows found via the nonclustered indexes. Assume the
following statistics:
There are 1,200 rows estimated to match where qty = 816
There are approximately 215 index rows per leaf page for the index on qty
There are 212 rows estimated to match where ord_date = ‘1/2/2008’
There are approximately 185 index rows per leaf page for the index on ord_date
The Query Optimizer estimates that the overlap between the two result sets is 1 row
The number of levels in the index on qty is 3
The number of levels in the index on ord_date is 3
The number of levels in the clustered index on the sales table is 3
The sales table is 1,252 pages in size
Using this information, you can calculate the I/O cost for the different strategies the
Query Optimizer can consider.
A table scan would cost 1,252 pages.
A standard data row retrieval via the nonclustered index on qty would have the following
approximate cost:
2 index page reads (root and intermediate pages to locate first leaf page)
+ 6 leaf page reads (1,200 rows / 215 rows per page)
+ 3,600 (1,200 rows × 3 pages per bookmark lookup via the clustered index)
= 3,608 pages
A standard data row retrieval via the nonclustered index on ord_date would have the
following approximate cost:
2 nonclustered index page reads (root and intermediate pages)
+ 2 nonclustered leaf page reads (212 rows / 185 rows per page)
+ 636 (212 rows × 3 pages per bookmark lookup via clustered index)
= 640 pages
The index intersection is estimated to have the following cost:
8 pages (1 root page + 1 intermediate page + the 6 leaf pages to find all the bookmarks
for the 1,200 matching index rows on qty)
+ 4 pages (1 root page + 1 intermediate page + 2 leaf pages to find all the bookmarks for
the 212 matching index rows on ord_date)
+ 3 page reads to find the 1 estimated overlapping row between the two indexes using the
clustered index
= 15 pages
As you can see from these examples, the index intersection strategy is definitely the
cheapest approach. If at any point the estimated intersection cost reaches 640 pages, SQL
Server just uses the single index on ord_date and checks both search criteria against the
212 matching rows for ord_date. If the estimated cost of using an index in any way ever
exceeds 1,252 pages, a table scan is likely to be performed, with the criteria checked
against all rows.
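The comparison among the four strategies can be reproduced with a few lines of Python, using the statistics assumed for this example. The helper name and the dictionary of strategies are hypothetical, and the model counts two non-leaf page reads (root and intermediate) per index seek, as the calculations here do. It is a sketch of the estimate, not of the optimizer's real cost model.

```python
import math

CLUSTERED_DEPTH = 3      # page reads per bookmark lookup via the clustered index
TABLE_PAGES = 1252       # size of the sales table in pages

def index_seek_pages(matching_rows, rows_per_leaf_page, non_leaf_levels=2):
    """Pages read in one nonclustered index to collect the matching bookmarks."""
    return non_leaf_levels + math.ceil(matching_rows / rows_per_leaf_page)

qty_seek = index_seek_pages(1200, 215)   # 2 + 6 = 8 pages
date_seek = index_seek_pages(212, 185)   # 2 + 2 = 4 pages
overlap_rows = 1                         # estimated rows matching both SARGs

costs = {
    "table scan": TABLE_PAGES,
    "qty index": qty_seek + 1200 * CLUSTERED_DEPTH,
    "ord_date index": date_seek + 212 * CLUSTERED_DEPTH,
    "index intersection": qty_seek + date_seek + overlap_rows * CLUSTERED_DEPTH,
}

cheapest = min(costs, key=costs.get)
print(costs)
print(cheapest)  # index intersection
```

Raising overlap_rows shows how quickly the intersection loses its advantage: each additional overlapping row adds three page reads.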
When an index intersection is used to retrieve the data rows from a table with a clustered
index, you see a query plan similar to the one shown in Figure 35.8.
FIGURE 35.8 An execution plan for an index intersection on a clustered table
If the table does not have a clustered index (that is, a heap table like the sales_noclust
table in the bigpubs2008 database) and has supporting nonclustered indexes for an index
intersection, you see a query plan similar to the one shown in Figure 35.9.
FIGURE 35.9 An execution plan for an index intersection on a heap table
Notice that in the example shown in Figure 35.9, the Query Optimizer performs a hash
join rather than a merge join on the RIDs returned by each nonclustered index seek and
uses the results from the hash join to perform a RID lookup to retrieve the matching
data rows.
NOTE
To duplicate the query plan shown in Figure 35.9, you need to create the following two
additional indexes on the sales_noclust table:
create index ord_date_idx on sales_noclust(ord_date)
create index qty_idx on sales_noclust(qty)
The Index Union Strategy
You see a strategy similar to an index intersection applied when you have an OR condition
between your SARGs, as in the following query:
select * from sales
where title_id = 'DR8514'
or ord_date = '2006-01-01 00:00:00.000'
With the index union strategy, SQL Server executes each part separately, using the index
that matches the SARG, but after combining the results with a merge join, it removes any
duplicate bookmarks for rows that match both search arguments. It then uses the unique
bookmarks to retrieve the result rows from the base table.
When the index union strategy is used on a table with a clustered index, you see a query
plan similar to the one shown in Figure 35.10. Notice the addition of the stream
aggregation step, which differentiates it from the index intersection query plan. The stream
aggregation step performs a grouping on the bookmarks returned by the merge join to
eliminate the duplicate bookmarks.
FIGURE 35.10 An execution plan for an index union strategy on a clustered table
The following steps describe how SQL Server determines whether to use the index union
strategy:
1. Estimate the cost of a table scan and the cost of using the index union strategy. If
the cost of the index union strategy exceeds the cost of a table scan, stop here and
simply perform a table scan. Otherwise, continue with the succeeding steps to
perform the index union strategy.
2. Break the query into multiple parts, as in this example:
select * from sales where title_id = 'DR8514'
select * from sales where ord_date = '2006-01-01 00:00:00.000'
3. Match each part with an available index.
4. Execute each piece and perform a join on the row bookmarks.
5. Remove any duplicate bookmarks.
6. Use the resulting list of unique bookmarks to retrieve all qualifying rows from the
base table.
If any one of the OR clauses needs to be resolved via a table scan for any reason, SQL
Server simply uses a table scan to resolve the whole query rather than applying the index
union strategy.
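Steps 2 through 6 can be illustrated with a toy model in Python. The table contents, the integer bookmarks, and the dictionary "indexes" are all hypothetical, and a set union stands in for the merge join plus stream aggregation. The cost comparison in step 1 is not modeled, so treat this as a sketch of the idea only.

```python
# Toy "table": bookmark -> row. Bookmarks are hypothetical stand-ins for
# clustered index keys or RIDs.
table = {
    1: {"title_id": "DR8514", "ord_date": "2006-01-01"},
    2: {"title_id": "DR8514", "ord_date": "2006-02-14"},
    3: {"title_id": "BU1032", "ord_date": "2006-01-01"},
    4: {"title_id": "BU1032", "ord_date": "2006-03-09"},
}

def build_index(column):
    """Map each column value to the sorted list of bookmarks that hold it."""
    index = {}
    for bookmark in sorted(table):
        index.setdefault(table[bookmark][column], []).append(bookmark)
    return index

title_idx = build_index("title_id")
ord_date_idx = build_index("ord_date")

def index_union(title_id, ord_date):
    """Return the unique bookmarks that satisfy either search condition."""
    # Steps 2-4: resolve each OR branch through its own index.
    title_bookmarks = title_idx.get(title_id, [])
    date_bookmarks = ord_date_idx.get(ord_date, [])
    # Step 5: a row matching both branches must be returned only once.
    return sorted(set(title_bookmarks) | set(date_bookmarks))

bookmarks = index_union("DR8514", "2006-01-01")
rows = [table[b] for b in bookmarks]  # step 6: fetch each qualifying row once
print(bookmarks)  # [1, 2, 3]
```

Row 1 matches both conditions but appears once in the result, which is the job the stream aggregation step performs in the real plan.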