a single row; however, from a rows-returned-per-millisecond perspective, it’s one of the slowest
query paths.
A common myth is that seeks can only return single rows, and that seeking multiple rows must
therefore be very slow compared to scans. As the next two query paths indicate, that’s not true.
Query Path 3: Range Seek Query
The third query path selects a narrow range of consecutive values using a between operator in the
WHERE clause:
SELECT * FROM Production.WorkOrder
WHERE WorkOrderID between 10000 and 10010;
The Query Optimizer must first determine whether there’s a suitable index to select the range. In this
case it’s the same key column in the clustered index as in Query Path 2.
A range seek query has an interesting query execution plan. The seek predicate (listed in the index seek
properties), which defines how the query is navigating the b-tree, has both a start and an end, as shown
in Figure 64-6. This means the operation is seeking the first row and then quickly scanning and
returning every row to the end of the range, as illustrated in Figure 64-7.
To further investigate the range seek query path, this next query pushes the range to the limit by
selecting every row in the table. Both queries are tested just to prove that between is logically the same as
>= with <=:
SELECT * FROM Production.WorkOrder
WHERE WorkOrderID >= 1 and WorkOrderID <= 72591;
SELECT * FROM Production.WorkOrder
WHERE WorkOrderID between 1 and 72591;
At first blush it would seem that this query should generate the same query execution plan as the first
query path (select * from table), but, just like the narrow range query, the between operator
needs a consecutive range of rows, which causes the Query Optimizer to select an index seek to return
ordered rows.
Keep in mind that there’s no guarantee that another row won’t be added after the query plan is
generated and before it’s executed. Therefore, for range queries, an index seek is the fastest possible way to
ensure that only the correct rows are selected.
FIGURE 64-6
The clustered index seek’s seek predicate has a start and an end, indicating the range of rows searched
for using the b-tree index.
Index seeks and index scans both perform well when returning large sets of data. The minor difference
between the two queries’ durations listed in the performance chart (refer to Table 64-1) is more likely
due to variance in my computer’s performance. There were some iterations of the index seek that
performed faster than some iterations of the index scan.
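One way to check this comparison on your own system is to turn on I/O and timing statistics and run both forms back to back. The following is a minimal sketch, assuming the AdventureWorks Production.WorkOrder table used throughout this chapter; the upper bound of 72591 is taken from the queries above and may differ in other versions of the sample database.

```sql
-- Report logical reads and elapsed time per statement
-- (shown on the Messages tab in Management Studio).
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Query Path 1: no WHERE clause, expected to use a clustered index scan
SELECT * FROM Production.WorkOrder;

-- Query Path 3 pushed to the limit: expected to use a clustered index seek
-- whose seek predicate spans the whole range
SELECT * FROM Production.WorkOrder
  WHERE WorkOrderID BETWEEN 1 AND 72591;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Because durations vary from run to run, comparing the logical reads is usually more telling than comparing the elapsed times of a single execution.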
FIGURE 64-7
An index seek operation has the option of seeking to find the first row, and then sequentially scanning
a block of data.
[Figure: clustered index PK_WorkOrder_WorkOrderID, with the seek and scan steps labeled]
Query Path 4: Filter by non-key column
The previous query paths were simple to solve because the filter column matched the clustered index
key column and all the data was available from one index; but what if that isn’t the case?
Consider this query:
SELECT * FROM Production.WorkOrder
WHERE StartDate = '2003-06-25';
There’s no index with a key column of StartDate. This means that the Query Optimizer can’t choose
a fast b-tree index and must resort to scanning the entire table and then manually searching for rows
that match the WHERE clause. Without an index, this query path is 23 times slower than the clustered
index seek query path.
The cost isn’t the filter operation alone (which is only 7 percent of the total query cost). The real cost is
having to scan in every row and pass 72,592 rows to the filter operation, as shown in the query
execution plan in Figure 64-8.
Note that this query execution plan suggests a missing index. Management Studio will even generate the
code to create the missing index using the context menu, not that I’d suggest using that as an indexing
strategy. (Too often the missing index is not the best index, and it often wants to build a non-clustered
index that includes every column.)
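The same missing-index suggestions are also exposed through dynamic management views, which makes it easier to review them critically instead of accepting a generated script. The following is a minimal sketch, assuming SQL Server 2005 or later and permission to view server state; the column aliases are illustrative only.

```sql
-- List missing-index suggestions recorded for the current database,
-- highest estimated impact first. These are hints to evaluate, not a design.
SELECT mid.statement            AS TableName,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
  FROM sys.dm_db_missing_index_details AS mid
    JOIN sys.dm_db_missing_index_groups AS mig
      ON mig.index_handle = mid.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS migs
      ON migs.group_handle = mig.index_group_handle
  WHERE mid.database_id = DB_ID()
  ORDER BY migs.avg_user_impact DESC;
```

A long included_columns list for a suggestion is a sign that it is the "include every column" kind of index described above.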
FIGURE 64-8
Query path 4 (filter by non-key column) passes every row from an index scan to a filter operation to
manually select the rows.
Query Path 5: Bookmark Lookup
This bookmark lookup query path is a two-edged sword. For infrequent queries, it’s the perfect query
path, but for the handful of queries that consume the majority of the server’s CPU, this query path will
kill performance.
To demonstrate a bookmark lookup query path, the following query filters by ProductID while
returning all the base table’s columns:
SELECT *
FROM Production.WorkOrder
WHERE ProductID = 757;
those rows.
There is an index on the ProductID column, so the Query Optimizer has two possible options:
■ Scan the entire clustered index to access all the columns, and then filter the results to find the right rows. Essentially, this would be the same as query path 4.
■ Perform an index seek on the IX_WorkOrder_ProductID index to fetch the 11 rows. In the process, it learns the WorkOrderID values for those 11 rows (because the clustered index key columns are in the leaf level of the non-clustered index). Then it can index seek those 11 rows from the clustered index to fetch the other columns.
This jump, from the non-clustered index used to find the rows to the clustered index to
complete the columns needed for the query, is called a bookmark lookup and is shown in
Figure 64-9.
FIGURE 64-9
The non-clustered index is missing a column. To solve the query, SQL Server has to perform
a bookmark lookup (the dashed line) from the non-clustered index to the clustered index. This
illustration shows a single row. In reality it’s often hundreds or thousands of rows scattered throughout
the clustered index.
[Figure: the PK_WorkOrder_WorkOrderID and IX_WorkOrder_ProductID indexes]
The real cost of the bookmark lookup is that the rows are typically scattered throughout the clustered
index. Locating the 11 rows in the non-clustered index was a single page hit, but those 11 rows might
be on 11 different pages in the clustered index. With a larger number of selected rows the problem
intensifies. Selecting 1,000 rows with a bookmark lookup might mean reading 3–4 pages from the
non-clustered index and then reading more than a thousand pages from the clustered index b-tree and
leaf level. Eventually, SQL Server will decide that the bookmark lookup is more expensive than just
scanning the clustered index.
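To see the trade-off between the two options for yourself, each path can be forced with a table hint and the logical reads compared. This is a sketch for experimentation only, assuming the index names used in this chapter (PK_WorkOrder_WorkOrderID and IX_WorkOrder_ProductID); index hints are not suggested here as a production technique.

```sql
SET STATISTICS IO ON;

-- Option 1: force the clustered index; with no usable key in the WHERE clause,
-- this is expected to scan and filter (as in Query Path 4)
SELECT *
  FROM Production.WorkOrder WITH (INDEX(PK_WorkOrder_WorkOrderID))
  WHERE ProductID = 757;

-- Option 2: force the non-clustered index; fetching the remaining columns
-- is expected to require a bookmark (key) lookup into the clustered index
SELECT *
  FROM Production.WorkOrder WITH (INDEX(IX_WorkOrder_ProductID))
  WHERE ProductID = 757;

SET STATISTICS IO OFF;
```

Repeating the comparison with a ProductID value that matches many more rows should show the lookup's logical reads climbing toward, and eventually past, those of the scan.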
In the Zen mindset of indexing, the best query path is one that can return all the data by navigating a
single index. The bookmark lookup has to navigate two indexes, which is wasteful.
The query execution plan for a bookmark lookup shows the two indexes as data sources for a nested
loop join (as shown in Figure 64-10). For each row returned by the seek of the non-clustered index, the
nested loop join requests the matching rows from the clustered index by calling the key lookup.
If you think of SQL Server as having tables with indexes, this query execution plan appears confusing;
but if you think of SQL Server as a collection of indexes with varying amounts of data, then fetching
data from two indexes and joining the results makes sense.
FIGURE 64-10
The query execution plan shows the bookmark lookup as an index seek being joined with a key
lookup.
It’s frequently said that Select * is wrong because it returns too many columns; the extra data is
considered wasteful. I agree that Select * is wrong, but the real reason isn’t the extra network traffic;
it’s the bookmark lookup that is almost always generated by a Select *.
bookmark lookup problem; the difference is that this query requests only one column that’s not
available from the non-clustered index:
SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE ProductID = 757;
Consider the performance difference (again, refer to Table 64-1) between this query path and the
select * bookmark lookup query path. Their performance is nearly identical.
It doesn’t take many columns to force a bookmark lookup; a single column missing from the
non-clustered index means SQL Server must also look to the clustered index to solve the query.
There are only two ways to avoid the bookmark lookup problem:
■ Filter by the clustered index key columns so the query can be satisfied using the clustered index (Query Path 2 or 3).
■ Design a covering index (the next query path).
Query Path 6: Covering Index
If a non-clustered index includes every column required by the query (and that means every column
referenced by the query: SELECT columns, JOIN ON condition columns, GROUP BY columns, WHERE
clause columns, and windowing columns), then SQL Server’s Query Optimizer can choose to solve the
query using only that non-clustered index. When this occurs the index is said to cover the needs of
the query; in other words, it’s a covering index.
An index by itself isn’t a covering index; rather, it becomes a covering index for a specific query when
the Query Optimizer can solve the query using only the non-clustered index.
Query Path 5’s second query selected the StartDate column. Because StartDate isn’t part of the
IX_WorkOrder_ProductID index, SQL Server was forced to use an evil bookmark lookup. To solve
the problem, the following code adds StartDate to the IX_WorkOrder_ProductID index so the
index can cover the query:
DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;
CREATE INDEX IX_WorkOrder_ProductID
ON Production.WorkOrder (ProductID)
INCLUDE (StartDate);
The INCLUDE option (added in SQL Server 2005) adds the StartDate column to the leaf level of the
IX_WorkOrder_ProductID index. The Query Optimizer can now solve the queries with an index
seek (as shown in Figure 64-11):
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757; -- 9 rows
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 945; -- 1,105 rows
FIGURE 64-11
With the StartDate column included in the index, the queries are solved with an index seek: a
perfect covering index.
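To confirm which columns an index declares as key columns and which as included columns, the catalog views can be queried directly. The following is a minimal sketch, assuming the IX_WorkOrder_ProductID index has been re-created with the INCLUDE clause as above; the column aliases are illustrative only.

```sql
-- Show the declared key and included columns of one index.
-- is_included_column = 0 marks a key column; 1 marks an INCLUDE column.
SELECT i.name   AS IndexName,
       c.name   AS ColumnName,
       ic.key_ordinal,
       ic.is_included_column
  FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id
     AND ic.index_id  = i.index_id
    JOIN sys.columns AS c
      ON c.object_id = ic.object_id
     AND c.column_id = ic.column_id
  WHERE i.object_id = OBJECT_ID('Production.WorkOrder')
    AND i.name = 'IX_WorkOrder_ProductID'
  ORDER BY ic.is_included_column, ic.key_ordinal;
```

With the index defined as above, the result should list ProductID as a key column and StartDate as an included column.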
A nuance of the non-clustered index structure proves to be useful when designing covering indexes.
This next query filters by the non-clustered index key and returns the clustered index key value:
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 757;
The IX_WorkOrder_ProductID non-clustered index has the ProductID column as the key column,
so that data is available in the b-tree.
Even though the clustered index key, WorkOrderID, doesn’t show up anywhere in the
IX_WorkOrder_ProductID dialogs in Management Studio, it’s there. WorkOrderID is
in the index.
The next query is a rare example of a covering index. Compared to the previous query path, this query
adds the StartDate column in the WHERE clause. Conventional wisdom would say that this query
requires an index scan because it filters by a non-key column (StartDate is an included column in
the index and not a key column):
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 945
AND StartDate = '2002-01-04';
In this case the index seek operator uses the b-tree index (keyed by ProductID) to seek the rows
matching ProductID = 945. This can be seen in the index seek properties as the seek predicate (as
illustrated in Figure 64-12).
But then, the index seek operator continues to select the correct rows by filtering the rows by the
included column (AND StartDate = '2002-01-04'). In the index seek properties, the predicate is
filtering by the StartDate column.
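The two kinds of predicates can also be inspected without the graphical plan. The following is a minimal sketch, assuming the query is run from Management Studio or sqlcmd (GO is a batch separator understood by those tools, not a T-SQL statement) and that the covering index created earlier is still in place.

```sql
-- Return the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO
SELECT WorkOrderID
  FROM Production.WorkOrder
  WHERE ProductID = 945
    AND StartDate = '2002-01-04';
GO
SET SHOWPLAN_TEXT OFF;
GO
```

In the plan text, the Index Seek line should show the ProductID comparison inside SEEK:() and the StartDate comparison inside WHERE:(), matching the seek predicate and predicate described above.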
The performance difference between the bookmark lookup solution and the covering index is dramatic.
When comparing the Query Optimizer cost and the logical reads (refer to Table 64-1), the query paths
that use a covering index are about 12 times more efficient. The duration appears less so due to my
limited hardware.
Query Path 7: Filter by 2 x NC Indexes
A common indexing dilemma is how to index for multiple WHERE clause criteria. Is it better to create
one composite index that includes both key columns? Or do two single-key column indexes perform
better? Query Paths 7 through 9 evaluate the options.
The following code reconfigures the indexes: one index keyed on ProductID, and one with
StartDate:
DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;
CREATE INDEX IX_WorkOrder_ProductID
ON Production.WorkOrder (ProductID);
CREATE INDEX IX_WorkOrder_StartDate
ON Production.WorkOrder (StartDate);
With these indexes in place, this query filters by both key columns:
SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE ProductID = 757
FIGURE 64-12
The index seek operator can have a seek predicate, which uses the b-tree, and a predicate, which
functions as a non-indexed filter.
To use both indexes, SQL Server uses a merge join to request rows from each index seek and then
correlate the data to return the rows that meet both criteria, as shown in Figure 64-13.