Microsoft SQL Server 2008 Study Guide, part 138


a single row; however, from a rows-returned-per-millisecond perspective, it’s one of the slowest query paths.

A common myth is that seeks can only return single rows and that’s why seeking multiple rows would be very slow compared to scans. As the next two query paths indicate, that’s not true.

Query Path 3: Range Seek Query

The third query path selects a narrow range of consecutive values using a BETWEEN operator in the WHERE clause:

SELECT * FROM Production.WorkOrder
WHERE WorkOrderID BETWEEN 10000 AND 10010;

The Query Optimizer must first determine whether there’s a suitable index to select the range. In this case it’s the same key column in the clustered index as in Query Path 2.

A range seek query has an interesting query execution plan. The seek predicate (listed in the index seek properties), which defines how the query is navigating the b-tree, has both a start and an end, as shown in Figure 64-6. This means the operation is seeking the first row and then quickly scanning and returning every row to the end of the range, as illustrated in Figure 64-7.

To further investigate the range seek query path, this next query pushes the range to the limit by selecting every row in the table. Both queries are tested just to prove that BETWEEN is logically the same as >= with <=:

SELECT * FROM Production.WorkOrder
WHERE WorkOrderID >= 1 AND WorkOrderID <= 72591;

SELECT * FROM Production.WorkOrder
WHERE WorkOrderID BETWEEN 1 AND 72591;

At first blush it would seem that this query should generate the same query execution plan as the first query path (SELECT * FROM table), but, just like the narrow range query, the BETWEEN operator needs a consecutive range of rows, which causes the Query Optimizer to select an index seek to return ordered rows.

Keep in mind that there’s no guarantee that another row won’t be added after the query plan is generated and before it’s executed. Therefore, for range queries, an index seek is the fastest possible way to ensure that only the correct rows are selected.


FIGURE 64-6

The clustered index seek’s seek predicate has a start and an end, indicating the range of rows searched for using the b-tree index.

Index seeks and index scans both perform well when returning large sets of data. The minor difference between the two queries’ durations listed in the performance chart (refer to Table 64-1) is more likely due to variance in my computer’s performance. There were some iterations of the index seek that performed faster than some iterations of the index scan.
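One way to repeat this comparison on your own hardware is sketched below. It assumes the AdventureWorks sample database is installed under the name AdventureWorks (the name is an assumption and may differ on your server). With I/O and time statistics switched on, the Messages tab reports logical reads and elapsed time for each statement, so the scan and the full-range seek can be compared directly. Expect the timings to vary from run to run.

```sql
-- Assumption: the sample database is named AdventureWorks; adjust as needed.
USE AdventureWorks;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- First query path: no WHERE clause, expected to use a clustered index scan
SELECT * FROM Production.WorkOrder;

-- Range seek pushed to the whole table, expected to use a clustered index seek
SELECT * FROM Production.WorkOrder
WHERE WorkOrderID BETWEEN 1 AND 72591;
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Running each statement several times and comparing the logical reads rather than a single duration gives a steadier picture, because logical reads do not depend on how busy the machine happens to be.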


FIGURE 64-7

An index seek operation has the option of seeking to find the first row, and then sequentially scanning on a block of data.

[Figure labels: Clustered Index PK_WorkOrder_WorkOrderID; Seek; Scan]

Query Path 4: Filter by non-key column

The previous query paths were simple to solve because the filter column matched the clustered index key column and all the data was available from one index; but what if that isn’t the case?

Consider this query:

SELECT * FROM Production.WorkOrder
WHERE StartDate = '2003-06-25';

There’s no index with a key column of StartDate. This means that the Query Optimizer can’t choose a fast b-tree index and must resort to scanning the entire table and then manually searching for rows that match the WHERE clause. Without an index, this query path is 23 times slower than the clustered index seek query path.

The cost isn’t the filter operation alone (which is only 7 percent of the total query cost). The real cost is having to scan in every row and pass 72,592 rows to the filter operation, as shown in the query execution plan in Figure 64-8.

Note that this query execution plan suggests a missing index. Management Studio will even generate the code to create the missing index using the context menu, not that I’d suggest using that as an indexing strategy. (Too often the missing index is not the best index, and it often wants to build a non-clustered index that includes every column.)
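For readers who want to review the optimizer's suggestions critically instead of accepting the generated script, the missing-index dynamic management views hold similar information. The query below is a minimal sketch, not a recommended indexing workflow: it lists the suggestions recorded for the current database since the last restart so that each one can be judged against the workload. It requires the VIEW SERVER STATE permission.

```sql
-- Lists missing-index suggestions for the current database.
-- Treat the output as hints to evaluate, not as indexes to create blindly.
SELECT d.statement          AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.avg_user_impact DESC;
```

A long included_columns list is a sign of the "include every column" tendency described above.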


FIGURE 64-8

Query path 4 (filter by non-key column) passes every row from an index scan to a filter operation to manually select the rows.

Query Path 5: Bookmark Lookup

This bookmark lookup query path is a two-edged sword. For infrequent queries, it’s the perfect query path, but for the handful of queries that consume the majority of the server’s CPU, this query path will kill performance.

To demonstrate a bookmark lookup query path, the following query filters by ProductID while returning all the base table’s columns:

SELECT *
FROM Production.WorkOrder
WHERE ProductID = 757;


those rows.

There is an index on the ProductID column, so the Query Optimizer has two possible options:

■ Scan the entire clustered index to access all the columns, and then filter the results to find the right rows. Essentially, this would be the same as query path 4.

■ Perform an index seek on the IX_WorkOrder_ProductID index to fetch the 11 rows. In the process, it learns the WorkOrderID values for those 11 rows (because the clustered index key columns are in the leaf level of the non-clustered index). Then it can index seek those 11 rows from the clustered index to fetch the other columns.

This jump, from the non-clustered index used to find the rows to the clustered index to complete the columns needed for the query, is called a bookmark lookup and is shown in Figure 64-9.

FIGURE 64-9

The non-clustered index is missing a column. To solve the query, SQL Server has to perform a bookmark lookup (the dashed line) from the non-clustered index to the clustered index. This illustration shows a single row. In reality it’s often hundreds or thousands of rows scattered throughout the clustered index.

[Figure labels: PK_WorkOrder_WorkOrderID; IX_WorkOrder_ProductID]

The real cost of the bookmark lookup is that the rows are typically scattered throughout the clustered index. Locating the 11 rows in the non-clustered index was a single page hit, but those 11 rows might be on 11 different pages in the clustered index. With a larger number of selected rows the problem intensifies. Selecting 1,000 rows with a bookmark lookup might mean reading 3–4 pages from the non-clustered index and then reading more than a thousand pages from the clustered index b-tree and leaf level. Eventually, SQL Server will decide that the bookmark lookup is more expensive than just scanning the clustered index.
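The trade-off between the lookup and the scan can be observed by forcing each plan with an index hint and comparing the logical reads. This is a sketch for experimentation only, assuming the AdventureWorks index names used in this chapter; index hints are not a suggestion for production code.

```sql
SET STATISTICS IO ON;
GO
-- Force the non-clustered index: an index seek plus one key lookup per row
SELECT *
FROM Production.WorkOrder WITH (INDEX(IX_WorkOrder_ProductID))
WHERE ProductID = 757;

-- Force the clustered index: ProductID is not its key, so every row is scanned
SELECT *
FROM Production.WorkOrder WITH (INDEX(PK_WorkOrder_WorkOrderID))
WHERE ProductID = 757;
GO
SET STATISTICS IO OFF;
GO
```

Repeating the test with a ProductID value that matches many more rows should show the logical reads of the forced lookup climbing toward, and eventually past, those of the scan.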


In the Zen mindset of indexing, the best query path is one that can return all the data by navigating a single index. The bookmark lookup has to navigate two indexes, which is wasteful.

The query execution plan for a bookmark lookup shows the two indexes as data sources for a nested loop join (as shown in Figure 64-10). For each row returned by the seek of the non-clustered index, the nested loop join requests the matching rows from the clustered index by calling the key lookup.

If you think of SQL Server as having tables with indexes, this query execution plan appears confusing; but if you think of SQL Server as a collection of indexes with varying amounts of data, then fetching data from two indexes and joining the results makes sense.

FIGURE 64-10

The query execution plan shows the bookmark lookup as an index seek being joined with a key lookup.

It’s frequently said that SELECT * is wrong because it returns too many columns; the extra data is considered wasteful. I agree that SELECT * is wrong, but the real reason isn’t the extra network traffic, it’s the bookmark lookup that is almost always generated by a SELECT *.


bookmark lookup problem; the difference is that this query requests only one column that’s not available from the non-clustered index:

SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757;

Consider the performance difference (again, refer to Table 64-1) between this query path and the SELECT * bookmark lookup query path. Their performance is nearly identical.

It doesn’t take many columns to force a bookmark lookup; a single column missing from the non-clustered index means SQL Server must also look to the clustered index to solve the query.

There are only two ways to avoid the bookmark lookup problem:

■ Filter by the clustered index key columns so the query can be satisfied using the clustered index (Query path 2 or 3)

■ Design a covering index (the next query path)

Query Path 6: Covering Index

If a non-clustered index includes every column required by the query (and that means every column referenced by the query: SELECT columns, JOIN ON condition columns, GROUP BY columns, WHERE clause columns, and windowing columns), then SQL Server’s Query Optimizer can choose to solve the query using only that non-clustered index. When this occurs the index is said to cover the needs of the query; in other words, it’s a covering index.

An index by itself isn’t a covering index; rather, it becomes a covering index for a specific query when the Query Optimizer can solve the query using only the non-clustered index.

Query Path 5’s second query selected the StartDate column. Because StartDate isn’t part of the IX_WorkOrder_ProductID index, SQL Server was forced to use an evil bookmark lookup. To solve the problem, the following code adds StartDate to the IX_WorkOrder_ProductID index so the index can cover the query:

DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;

CREATE INDEX IX_WorkOrder_ProductID
ON Production.WorkOrder (ProductID)
INCLUDE (StartDate);

The INCLUDE option (added in SQL Server 2005) adds the StartDate column to the leaf level of the IX_WorkOrder_ProductID index. The Query Optimizer can now solve the queries with an index seek (as shown in Figure 64-11):

SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757; -- 9 rows

SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 945; -- 1,105 rows

FIGURE 64-11

With the StartDate column included in the index, the queries are solved with an index seek: a perfect covering index.
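To check which columns an index carries as key columns and which as included columns, the catalog views can be queried directly. The following is a minimal sketch against the Production.WorkOrder table; it only reads metadata and changes nothing.

```sql
-- Lists every index on Production.WorkOrder with its declared columns.
-- is_included_column = 1 marks columns added through INCLUDE.
SELECT i.name               AS index_name,
       c.name               AS column_name,
       ic.key_ordinal,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('Production.WorkOrder')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

After the CREATE INDEX statement above, StartDate should appear for IX_WorkOrder_ProductID with is_included_column set to 1 and ProductID with a key_ordinal of 1.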

A nuance of the non-clustered index structure proves to be useful when designing covering indexes. This next query filters by the non-clustered index key and returns the clustered index key value:

SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 757;

The IX_WorkOrder_ProductID non-clustered index has the ProductID column as the key column, so that data is available in the b-tree.

Even though the clustered index key, WorkOrderID, doesn’t show up anywhere in the IX_WorkOrder_ProductID dialogs in Management Studio, it’s there. WorkOrderID is


in the index.

The next query is a rare example of a covering index. Compared to the previous query path, this query adds the StartDate column in the WHERE clause. Conventional wisdom would say that this query requires an index scan because it filters by a non-key column (StartDate is an included column in the index and not a key column):

SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 945
AND StartDate = '2002-01-04';

In this case the index seek operator uses the b-tree index (keyed by ProductID) to seek the rows matching ProductID = 945. This can be seen in the index seek properties as the seek predicate (as illustrated in Figure 64-12).

But then, the index seek operator continues to select the correct rows by filtering the rows by the included column (AND StartDate = '2002-01-04'). In the index seek properties, the predicate is filtering by the StartDate column.
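The same distinction can be inspected without the graphical plan by asking SQL Server to return the actual plan as XML. The sketch below assumes the IX_WorkOrder_ProductID index still includes StartDate, as created earlier in this query path.

```sql
-- Returns the actual execution plan as an extra XML result set.
SET STATISTICS XML ON;
GO
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 945
  AND StartDate = '2002-01-04';
GO
SET STATISTICS XML OFF;
GO
```

In the returned plan, the Index Seek operator should contain a SeekPredicates element referencing ProductID and a separate Predicate element referencing StartDate, matching the two properties described above.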

The performance difference between the bookmark lookup solution and the covering index is dramatic. When comparing the Query Optimizer cost and the logical reads (refer to Table 64-1), the query paths that use a covering index are about 12 times more efficient. The duration appears less so due to my limited hardware.

Query Path 7: Filter by 2 x NC Indexes

A common indexing dilemma is how to index for multiple WHERE clause criteria. Is it better to create one composite index that includes both key columns? Or do two single-key column indexes perform better? Query Paths 7 through 9 evaluate the options.

The following code reconfigures the indexes: one index keyed on ProductID, and one with StartDate:

DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;

CREATE INDEX IX_WorkOrder_ProductID
ON Production.WorkOrder (ProductID);

CREATE INDEX IX_WorkOrder_StartDate
ON Production.WorkOrder (StartDate);

With these indexes in place, this query filters by both key columns:

SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE ProductID = 757


FIGURE 64-12

The index seek operator can have a seek predicate, which uses the b-tree; and a predicate, which functions as a non-indexed filter.

To use both indexes, SQL Server uses a merge join to request rows from each index seek and then correlate the data to return the rows that meet both criteria, as shown in Figure 64-13.
