Microsoft SQL Server 2008 Study Guide, Part 139



FIGURE 64-13

Filtering by two indexes adds a merge join into the mix

Examining the performance stats in Table 64-1, the multiple-index approach has a Query Optimizer cost of 12 and uses four logical reads.

For infrequent queries, Query Path 7, with its multiple indexes, is more than adequate and much better than no index at all. However, for the few queries that run constantly, the next query path is a better solution for multiple criteria.

Query Path 8: Filter by Ordered Composite Index

For raw performance, the fastest solution to the "multiple WHERE clause criteria" problem is a single composite index, as demonstrated in Query Path 8.

Creating a composite index with ProductID and StartDate as key columns sets up the test:


DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;
DROP INDEX Production.WorkOrder.IX_WorkOrder_StartDate;
CREATE INDEX IX_WorkOrder_ProductID
  ON Production.WorkOrder (ProductID, StartDate);

Rerunning the same query:

SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757
  AND StartDate = '2002-01-04';

The query execution plan, shown in Figure 64-14, is a simple single-index seek operation, and it performs wonderfully.

FIGURE 64-14

Filtering two criteria using a composite index performs like greased lightning
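You can reproduce the logical-read comparison from Table 64-1 on your own copy of AdventureWorks2008. One way is to turn on SET STATISTICS IO before running the query; SQL Server then reports the reads per table in the Messages output. This is a minimal sketch that assumes the composite index above has already been created:

```sql
-- Report page reads for each statement run in this session
SET STATISTICS IO ON;

SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757
  AND StartDate = '2002-01-04';

-- The Messages output shows "logical reads" for the WorkOrder table;
-- compare that figure with the same query run against the two
-- single-column indexes from Query Path 7
SET STATISTICS IO OFF;
```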


Query Path 9: Filter by Unordered Composite Index

One common indexing myth is that the order of the index key columns doesn't matter; that is, SQL Server can use an index as long as the column appears anywhere in the index. Like most myths, it's a half-truth.

Searching a b-tree index requires values for the leading key columns, in the order of those columns. Searching for col1, col2, and so on works great. Searching for the columns out of order (e.g., col2 without col1) requires scanning all the leaf-level data.

Query Path 9 demonstrates the inefficiency of filtering by an unordered composite index. In the following example, StartDate is the second key in the composite index, so the data is there. Will the query use the index?

SELECT WorkOrderID FROM Production.WorkOrder WHERE StartDate = '2002-01-04';

The Query Optimizer uses the IX_WorkOrder_ProductID composite non-clustered index, as shown in Figure 64-15, because it's narrower than the clustered index and fits more rows on a page. However, because the filter is on the second column, SQL Server can't navigate the index's b-tree. Instead, it is forced to scan every row and filter within the scan operation to select the correct rows. Essentially, it's doing the same thing as scanning a telephone book by hand for everyone with a first name of Paul.

Query Path 10: Non-SARGable Expressions

SQL Server's Query Optimizer examines the conditions in the query's WHERE clause to determine which indexes are actually useful. If SQL Server can optimize a WHERE condition using an index, the condition is called a search argument, or SARG for short. However, not every condition is a "SARGable" search argument.

The final query path walks through a series of anti-patterns: WHERE clauses whose conditions can't use b-tree indexes and so fall back to an index scan.

■ Wrapping the column in an expression forces SQL Server to evaluate the expression for every row before it can determine whether the row passes the WHERE clause criteria:

SELECT WorkOrderID FROM Production.WorkOrder WHERE ProductID + 2 = 759;

■ The solution to this non-SARGable issue is to apply a little algebra and rewrite the query with the expression on the other side of the equals sign:

SELECT WorkOrderID FROM Production.WorkOrder WHERE ProductID = 759 - 2;


FIGURE 64-15

Filtering by the second key column of an index forces an index scan

■ Multiple conditions that are ANDed together are SARGs, but ORed conditions might not be useful for the b-tree:

WHERE ProductID = 757
  OR StartDate = '2002-01-04';

■ Negative search conditions (<>, !>, !<, NOT EXISTS, NOT IN, NOT LIKE) are not easily optimizable. Proving that a row exists is easy, but proving that it doesn't exist requires examining every row:

WHERE ProductID NOT IN (400, 800, 950);

However, a few negative conditions can sometimes be SARGable, so it's worth testing. Often it's the number of rows returned, not the negative condition, that forces a scan.


■ Conditions that begin with wildcards aren't SARGable. An index can quickly locate WorkOrderID = 757, but it must scan every row to find the WorkOrderIDs ending in 7:

SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE WorkOrderID LIKE '%7';

■ If the WHERE clause applies a function to the column, such as a string or date function, SQL Server must scan so that it can test every row with the function applied to the data:

SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE DATENAME(dw, StartDate) = 'Monday';

SQL Server 2008 does include some optimizations that can avoid the scan when working with date conversions.
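Two of the anti-patterns above can often be rewritten into SARGable form. The sketch below shows one common approach, not the only one.

- The ORed criteria become a UNION of two separately indexable queries. This assumes an index whose leading key is StartDate exists, such as IX_WorkOrder_StartDate from Query Path 7; Query Path 8 dropped that index.
- The date filter compares the bare column against a half-open range instead of wrapping it in a function.
- The CAST-to-date variant relies on the SQL Server 2008 date-conversion optimization mentioned above, so check the plan to confirm that it seeks.

```sql
-- ORed criteria on two different columns, rewritten as a UNION of two seeks.
-- UNION (not UNION ALL) returns a row matching both conditions only once;
-- WorkOrderID is unique, so the result matches the OR version.
SELECT WorkOrderID FROM Production.WorkOrder WHERE ProductID = 757
UNION
SELECT WorkOrderID FROM Production.WorkOrder WHERE StartDate = '2002-01-04';

-- Function wrapped around the column: not SARGable
SELECT WorkOrderID FROM Production.WorkOrder
WHERE DATEDIFF(day, StartDate, '2002-01-04') = 0;

-- Same calendar day as a half-open range on the bare column: SARGable
SELECT WorkOrderID FROM Production.WorkOrder
WHERE StartDate >= '2002-01-04' AND StartDate < '2002-01-05';

-- SQL Server 2008 can also seek on a CAST of a datetime column to date
SELECT WorkOrderID FROM Production.WorkOrder
WHERE CAST(StartDate AS date) = '2002-01-04';
```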

The type of access (index scan vs. index seek) affects both the performance of reading data from the single table and join performance. The type of join SQL Server chooses depends on whether or not the data is ordered. Typically, if the data is read as efficiently as possible from the single table, it is then passed to an efficient join. Conversely, inefficient table access is compounded by the inefficient join that follows it.

A Comprehensive Indexing Strategy

An indexing strategy deals with the overall application, rather than fixing isolated problems to the detriment of the whole. In my consulting practice, I've found that the key to indexing is knowing when you need a bookmark lookup vs. when to design indexes to avoid bookmark lookups.

Identifying key queries

Analyzing a full query workload, covering a couple of days of operations plus the nightly or weekend workloads, will likely reveal a pattern. There may be a few hundred distinct queries, but most of the CPU time is spent on the top handful of them. I've tuned systems on which 95 percent of the CPU time was spent on only five queries. Those top queries demand flat-out performance, while the other queries might be able to afford a bookmark lookup.

To identify those top queries:

1. Create a Profiler trace to capture all queries or stored procedures:

■ Profiler events: TSQL SQL:BatchCompleted and Stored Procedures RPC:Completed

■ Profiler columns: TextData, ApplicationName, CPU, Reads, Writes, Duration, SPID, EndTime, DatabaseName, and RowCounts

It's terribly important not to filter the trace to capture only long-running queries, such as the common suggestion to capture only queries with a duration > 1 sec. Every query must be captured.

2. Test the trace definition using Profiler for a few moments, and then stop the trace. Be sure to filter out applications or databases not being analyzed.


3. In the trace properties, add a stop time to the trace definition so that it captures a full day's and night's workload, and set up the trace to write to a file.

4. Generate a trace script using the File ➪ Export ➪ Script Trace Definition ➪ For SQL Server 2005–2008 menu command.

5. Check the script. You may need to edit it to supply a filename and path, and double-check the start and stop times. Execute the trace script on the production server.

6. Pull the trace file into Profiler, and then save it to a table using the File ➪ Save As ➪ Trace Table menu command.

7. Profiler exports the TextData column as an nText data type, and that just won't do. The following code converts it to an nVarChar(max) column, which is much friendlier with string functions:

alter table trace
alter column textdata nvarchar(max);

8. Run the following aggregate query to summarize the query load. This query assumes that the trace data was saved to a table creatively named trace:

-- Group batches by their first word(s), e.g., the procedure name after EXEC
select substring(textdata, 1, charindex(' ', textdata, 6)) as 'Query',
  count(*) as 'count',
  sum(duration) as 'SumDuration',
  avg(duration) as 'AvgDuration',
  max(duration) as 'MaxDuration',
  cast(sum(duration) as numeric(20,2))
    / (select sum(duration) from trace) as 'Percentage',
  sum(rowcounts) as 'SumRows'
from trace
group by substring(textdata, 1, charindex(' ', textdata, 6))
order by sum(duration) desc;

The top queries will become obvious.
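Loading a large trace file through Profiler can be slow. An alternative to the Save As command in step 6 is the built-in sys.fn_trace_gettable function, which reads the file directly into a table. In the sketch below, the file path is hypothetical and must point at the file written by the server-side trace. The function returns TextData as ntext, so step 7 still applies:

```sql
-- Hypothetical path: use the file name supplied to the trace script in step 5.
-- DEFAULT as the second argument reads all rollover files.
SELECT *
INTO trace
FROM sys.fn_trace_gettable(N'D:\Traces\Workload.trc', DEFAULT);
```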

Table CRUD analysis

For each table involved in one of the top queries, it's important to gather, in a single presentation, the top queries and stored procedures that hit that table. Plot the access using a CRUD (create, retrieve, update, delete) matrix, as shown in Table 64-2. For simplicity, this example analyzes a fictitious OrderDetail table and examines only three fictitious procedures.

The abbreviations are as follows: S for a selected column, O for an ORDER BY column, W for a column referenced in the WHERE clause, and G for a GROUP BY column.

The next step is to design the fewest indexes that satisfy the table's needs. This process first determines the clustered index. It then creates indexes for every procedure and query that accesses the table, as shown in the following list and Table 64-3. The numbers in the chart indicate the ordinal position of the column in the index, and an included column is listed as I.


TABLE 64-2

Table CRUD Usage Analysis

TABLE 64-3

Table Strategic Index Plan


Because the OrderDetail table is often selected using the OrderID column, and this column can also gather multiple rows into a single data page, OrderID is the best candidate for the clustered index (CI). The clustered index will consist of one column, OrderID. A 1 therefore goes in the ordered row for the clustered index, indicating that OrderID is its first column.

The clustered index satisfies the pGetOrder procedure.

The pCheckQuantity procedure verifies the quantity on hand prior to shipping. It filters the rows by ShipRequestDate and OrderID. A non-clustered index on ShipRequestDate indexes both the ShipRequestDate column and the OrderID column, because the clustered index key is present in the leaf level of the non-clustered index. The procedure needs only four columns, so adding ProductID and Quantity as included columns lets Ix1 completely cover the query and significantly improves performance.

The third procedure can be satisfied by adding a non-clustered index, Ix2, on the OrderDetailID column.

This example has only three procedures and may seem simplistic. However, if the plan focuses on the top queries, most production tables will in fact have only a handful of queries or stored procedures.

The Database Engine Tuning Advisor is a SQL Server 2008 utility that can analyze a single query or a set of queries and recommend indexes and partitions to improve performance.

My indexing strategy is based on knowing when to use a bookmark lookup vs. when to avoid one. The Database Engine Tuning Advisor doesn't know whether a given query should have a bookmark lookup, so it can't follow the strategy. Therefore, I recommend that you avoid the Database Engine Tuning Advisor. If you understand how queries work, you don't need the Advisor anyway.

Selecting the clustered index

Selecting the clustered index is a critical piece of the performance puzzle, perhaps the most important piece of the physical schema. A clustered index can affect performance in several ways:

■ When an index seek operation finds a row using a clustered index, the data is right there and no bookmark lookup is needed. This makes the column used to select the row, probably the primary key, an ideal candidate for a clustered index.

■ Clustered indexes gather rows with the same or similar values onto the smallest possible number of data pages, reducing the number of pages required to retrieve a set of rows. Clustered indexes are therefore excellent for columns that are often used to select a range of rows, such as secondary table foreign keys like OrderDetail.OrderID.

■ Inserting data in the middle of a clustered index is always a bad idea. The resulting page splits can cripple performance, so carefully consider the actual data usage for every clustered index.

The MOC (Microsoft Official Curriculum) used to teach that the primary purpose of the clustered index was gathering together similar rows (the second bullet above). When I wrote the SQL Server 2000 Bible, I also believed that was the primary reason for a clustered index. I now believe that avoiding a bookmark lookup is a stronger case for designing a clustered index, and the second bullet only sometimes applies.


Creating base indexes

Even before tuning, the locations of a few indexes are easy to determine. These base indexes are the first step in building a solid set of indexes. Keep the following steps in mind when building them:

1. Create a clustered index for every table. For primary tables, cluster on the column most likely used to select the row, probably the primary key.

For secondary tables that are most commonly retrieved as a set of related rows, create a clustered index on the most important foreign key to group those related rows together.

2. Create non-clustered indexes for the columns of every foreign key, except for the foreign key that was indexed in step 1. Use only the foreign key values as index keys. I've developed a script that will create a non-clustered index for every foreign key (download it from www.sqlserverbible.com).

3. Create a single-column index for every column expected to be referenced in a WHERE clause, an ORDER BY, or a GROUP BY.

This indexing plan is far from perfect, and it's definitely not a final one. It does provide an initial compromise between no indexes and tuned indexes, and it can serve as a baseline performance measurement for future index tuning.

Additional tuning will likely involve creating composite indexes and removing unnecessary indexes.

Best Practice

When planning indexes, there's a subtle tension between serving the needs of select queries and those of update queries. An index may improve query performance, but it carries a cost: when a row is inserted, updated, or deleted, the indexes must be updated as well. Nonetheless, write operations need some indexing. An update or delete must locate the row before writing to it, and useful indexes make that row easier to find, which speeds up the write. Therefore, when planning indexes, be careful to create the fewest indexes that accomplish the job.

SQL Server exposes index usage statistics through dynamic management views. Specifically, sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats reveal how indexes are being used. In addition, four dynamic management objects reveal indexes that the Query Optimizer looked for but didn't find: sys.dm_db_missing_index_groups, sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_columns, and sys.dm_db_missing_index_details.

Specialty Indexes

Beyond the standard clustered and non-clustered indexes, SQL Server offers two types of indexes that I refer to as specialty indexes. Filtered indexes, new in SQL Server 2008, include less data. Indexed views, available since SQL Server 2000, build out custom sets of data. Both are considered high-end performance-tuning indexes.

Filtered indexes

Until SQL Server 2008, every non-clustered index indexed every key value and every row. Filtered indexes allow adding a WHERE clause to the CREATE INDEX statement. This option is available only for non-clustered indexes (how could a clustered index not include some rows?). A filtered index includes fewer rows at the leaf level and also fewer values in the intermediate levels. It's this reduction in the intermediate levels that reduces the reads for any index seek.

One place to employ a filtered index in AdventureWorks2008 is the ScrapReasonID column in the Production.WorkOrder table. Fortunately for Adventure Works, they scrapped only 612 (0.8%) parts over the life of the database. The ScrapReasonID foreign key allows nulls for work orders that were not scrapped. The existing IX_WorkOrder_ScrapReasonID includes every row, so it stores all those null values, each with a pointer to its WorkOrder row. The current index uses 109 pages.

The following script re-creates the index with a WHERE clause that excludes all the NULL values:

DROP INDEX Production.WorkOrder.IX_WorkOrder_ScrapReasonID;
CREATE INDEX IX_WorkOrder_ScrapReasonID
  ON Production.WorkOrder (ScrapReasonID)
  WHERE ScrapReasonID IS NOT NULL;

The new index uses only two pages. Interestingly, selecting all the work orders with a non-null scrap reason performs about the same with the filtered index as with the non-filtered one. That's because there aren't enough intermediate levels to make a significant difference. For a much larger table, the difference would be worth testing, and the filtered index would most likely provide a benefit.

Because of their compact size, filtered indexes not only reduce disk usage but are also easier to maintain.

Best Practice

When designing a covering index (see index kata #6) to solve a specific query, probably one of the top handful by CPU duration according to the indexing strategy, consider filtering it. This pays off when the covering index works with a relatively small subset of data and the overall table is large.

Filtered indexes can also help when a unique index must allow multiple rows with null values. A normal unique index allows only a single row to have a null value in the key columns. A unique index whose WHERE clause excludes nulls, however, permits an unlimited number of null values.
