FIGURE 64-13
Filtering by two indexes adds a merge join into the mix
Examining the performance statistics in Table 64-1, the multiple-index path has a Query Optimizer cost of 12 and
uses four logical reads.
For infrequent queries, Query Path 7, with its multiple indexes, is more than adequate, and much better
than no index at all. However, for those few queries that run constantly, the next query path is a better
solution for multiple criteria.
Query Path 8: Filter by Ordered Composite Index
For raw performance, the fastest solution to the “multiple WHERE clause criteria” problem is a single
composite index, as demonstrated in Query Path 8.
Creating a composite index with ProductID and StartDate as key columns sets up the test:
DROP INDEX Production.WorkOrder.IX_WorkOrder_ProductID;
DROP INDEX Production.WorkOrder.IX_WorkOrder_StartDate;
CREATE INDEX IX_WorkOrder_ProductID
ON Production.WorkOrder (ProductID, StartDate);
Rerunning the same query:
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757
AND StartDate = '2002-01-04';
The query execution plan, shown in Figure 64-14, is a simple single-index seek operation, and it performs
wonderfully.
FIGURE 64-14
Filtering two criteria using a composite index performs like greased lightning
Query Path 9: Filter by Unordered Composite Index
One common indexing myth is that the order of the index key columns doesn’t matter; that is,
SQL Server can use an index so long as the column is anywhere in the index. Like most myths, it’s a
half-truth.
Seeking in a b-tree index requires values for the leading key columns, in the order of the columns.
Searching for col1, col2, and so on works great, but searching for the columns out of order (e.g., col2
without col1) requires scanning all the leaf-level data.
Query Path 9 demonstrates the inefficiency of filtering by an unordered composite index. In the
following example, StartDate is the second key in the composite index, so the data is there. Will the query
use the index?
SELECT WorkOrderID FROM Production.WorkOrder WHERE StartDate = '2002-01-04';
The Query Optimizer uses the IX_WorkOrder_ProductID composite non-clustered index, as shown
in Figure 64-15, because it’s narrower than the clustered index, enabling more rows to fit on a page. But
because the filter is by the second column, it can’t use the b-tree of the index; instead, SQL Server is
forced to scan every row and manually filter (in the scan operation) to select the correct rows.
Essentially, it’s doing the exact same operation as manually scanning a telephone book for everyone with a
first name of Paul.
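The contrast between the two access patterns can be checked with a pair of queries. This is a minimal sketch, assuming the composite index on (ProductID, StartDate) from Query Path 8 is still in place; SET STATISTICS IO is a standard session option that reports logical reads. The plans and read counts depend on your copy of the database, so none are claimed here.

```sql
SET STATISTICS IO ON;

-- Filters on the leading key column (ProductID): the b-tree can be
-- navigated, so an index seek is possible.
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE ProductID = 757;

-- Filters only on the second key column (StartDate): there is no value
-- for the leading column, so the index can only be scanned.
SELECT WorkOrderID
FROM Production.WorkOrder
WHERE StartDate = '2002-01-04';

SET STATISTICS IO OFF;
```

Comparing the logical reads reported on the Messages tab for the two statements, along with their execution plans, shows whether each one used a seek or a scan.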
FIGURE 64-15
Filtering by the second key column of an index forces an index scan
Query Path 10: Non-SARGable Expressions
SQL Server’s Query Optimizer examines the conditions within the query’s WHERE clause to determine
which indexes are actually useful. If SQL Server can optimize the WHERE condition using an index,
the condition is referred to as a search argument, or SARG for short. However, not every condition is a
“SARGable” search argument.
The final query path walks through a series of anti-patterns: WHERE clauses designed with conditions
that can’t use b-tree indexes and that fall back to an index scan.
■ Wrapping the column in an expression forces SQL Server to evaluate the expression for every row before it can determine whether the row passes the WHERE clause criteria:
SELECT WorkOrderID FROM Production.WorkOrder WHERE ProductID + 2 = 759;
■ The solution to this non-SARGable issue is to apply a little algebra and rewrite the query with the expression on the other side of the equals sign:
SELECT WorkOrderID FROM Production.WorkOrder WHERE ProductID = 759 - 2;
■ Multiple conditions that are ANDed together are SARGs, but ORed conditions might not be
useful for the b-tree:
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757
OR StartDate = '2002-01-04';
■ Negative search conditions (<>, !>, !<, NOT EXISTS, NOT IN, NOT LIKE) are not
easily optimizable. It’s easy to prove that a row exists, but to prove it doesn’t exist requires
examining every row:
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID NOT IN (400, 800, 950);
However, sometimes a few negative values can be SARGable, so it’s worth testing. Often, it’s
the number of rows returned that forces a scan, not the negative condition.
■ Conditions that begin with wildcards aren’t SARGable. An index can quickly locate
WorkOrderID = 757, but must scan every row to find any WorkOrderIDs ending in 7:
SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE WorkOrderID LIKE '%7';
■ If the WHERE clause applies a function, such as a string function, to the column, a table scan is required so every row can be tested with the function applied to the data:
SELECT WorkOrderID, StartDate FROM Production.WorkOrder WHERE DateName(dw, StartDate) = 'Monday';
SQL Server 2008 does include some optimizations that can avoid the scan when working with date conversions.
The type of access (index scan vs. index seek) not only affects the performance of reading data from the single table; it also affects join performance. The type of join chosen by SQL Server depends on whether or not the data is ordered. Typically, if the data is being read as efficiently
as possible from the single table, the data will then be passed to an efficient join. However, inefficient table
access is compounded by the subsequent inefficient join performance.
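Two of the anti-patterns above often have a SARGable rewrite. The sketch below is illustrative and not from the original text: the YEAR() filter is a hypothetical example of a function wrapped around a column, and the UNION rewrite assumes an index leads on each of the two columns. Test both forms against your own data before adopting either.

```sql
-- Non-SARGable: a function is applied to the column.
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE YEAR(StartDate) = 2002;

-- SARGable equivalent: the bare column is compared with a half-open range.
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE StartDate >= '20020101'
  AND StartDate <  '20030101';

-- ORed conditions rewritten as a UNION so that each branch can seek
-- its own index. UNION removes the rows that satisfy both branches.
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE ProductID = 757
UNION
SELECT WorkOrderID, StartDate
FROM Production.WorkOrder
WHERE StartDate = '2002-01-04';
```

Because WorkOrderID is unique, the UNION form returns the same rows as the ORed query; whether it is actually cheaper depends on the indexes available and the number of rows each branch returns.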
A Comprehensive Indexing Strategy
An index strategy deals with the overall application, rather than fixing isolated problems to the
detriment of the whole. In my consulting practice, I’ve found that the key to indexing is knowing when you
need a bookmark lookup vs. when to design indexes to avoid bookmark lookups.
Identifying key queries
Analyzing a full query workload, which includes a couple of days of operations (and nightly or weekend
workloads), will likely reveal that although there may be a few hundred distinct queries, the majority
of the CPU time is spent on the top handful of queries. I’ve tuned systems on which 95 percent of the
CPU time was spent on only five queries. Those top queries demand flat-out performance, while the
other queries might be able to afford a bookmark lookup.
To identify those top queries:
1. Create a Profiler trace to capture all queries or stored procedures:
■ Profiler events: SQL:BatchCompleted (TSQL category) and RPC:Completed (Stored Procedures category)
■ Profiler columns: TextData, ApplicationName, CPU, Reads, Writes, Duration,
SPID, EndTime, DatabaseName, and RowCounts
It’s terribly important not to filter the trace to capture only long-running queries (a common suggestion is to set the filter to capture only queries with a duration > 1 sec). Every query
must be captured, because a fast query that runs constantly can consume more total time than a slow query that runs rarely.
2. Test the trace definition using Profiler for a few moments, and then stop the trace. Be sure to
filter out applications or databases not being analyzed.
3. In the trace properties, add a stop time to the trace definition (so it will capture a full day’s
and night’s workload), and set up the trace to write to a file.
4. Generate a trace script using the File ➪ Export ➪ Script Trace Definition ➪ For SQL Server
2005–2008 menu command.
5. Check the script. You may need to edit it to supply a filename and path, and double-check the
start and stop times. Execute the trace script on the production server.
6. Pull the trace file into Profiler and then save it to a table using the File ➪ Save As ➪ Trace
Table menu command.
7. Profiler exports the TextData column as an nText data type, and that just won’t do. The
following code creates an nVarChar(max) column, which is much friendlier with string
functions:
alter table trace
alter column textdata nvarchar(max);
8. Run the following aggregate query to summarize the query load. This query assumes that the
trace data was saved to a table creatively named trace:
select substring(textdata, 1, charindex(' ', textdata, 6)),
count(*) as 'count',
sum(duration) as 'SumDuration',
avg(duration) as 'AvgDuration',
max(duration) as 'MaxDuration',
cast(sum(duration) as numeric(20,2))
/ (select sum(duration) from trace) as 'Percentage',
sum(rowcounts) as 'SumRows'
from trace
group by substring(textdata, 1, charindex(' ', textdata, 6))
order by sum(duration) desc;
The top queries will become obvious.
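Steps 6 and 7 can also be done in T-SQL rather than through the Profiler menus. The following is a sketch under stated assumptions: the file path is hypothetical and must point at the file written by your server-side trace, and the target table is named trace to match the aggregate query. sys.fn_trace_gettable is the built-in function that reads a trace file as a rowset.

```sql
-- Hypothetical path: replace with the file produced by the trace script.
-- DEFAULT tells the function to read all rollover files in the set.
SELECT CAST(TextData AS nvarchar(max)) AS TextData,  -- avoids the ntext column
       ApplicationName,
       CPU,
       Reads,
       Writes,
       Duration,
       SPID,
       EndTime,
       DatabaseName,
       RowCounts
INTO trace
FROM sys.fn_trace_gettable(N'C:\Traces\workload.trc', DEFAULT);
```

Casting TextData during the load means that the separate ALTER COLUMN step is not needed for a table created this way.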
Table CRUD analysis
For each table involved with one of the top queries, it’s important to collect in one
presentation the top queries and stored procedures that hit that table. Plot the access using a CRUD
(create, retrieve, update, delete) matrix, as shown in Table 64-2. This example analyzes a fictitious
OrderDetail table and examines only three fictitious procedures for simplicity.
The abbreviations are as follows: S for a selected column, O for an ORDER BY column, W for a column
referenced in the WHERE clause, and G for a GROUP BY column.
The next step is to design the fewest number of indexes that satisfies the table’s needs. This process first
determines the clustered index and then creates indexes for every procedure and query that accesses the
table, as shown in the following list and Table 64-3. The numbers in the chart indicate the ordinal
position of the column in the index. An included column is listed as I.
TABLE 64-2
Table CRUD Usage Analysis
NonStockProduct S
ExtendedPrice S
TABLE 64-3
Table Strategic Index Plan
NonStockProduct
UnitPrice
ExtendedPrice
ShipDate
ShipComment
Because the OrderDetail table is often selected using the OrderID column, and this column can also
be used to gather multiple rows into a single data page, OrderID is the best candidate for the clustered
index (CI). The clustered index will consist of one column, OrderID, so a 1 goes in the
OrderID row for the clustered index, indicating that it’s the first column of the clustered index.
The clustered index satisfies the pGetOrder procedure.
The pCheckQuantity procedure verifies the quantity on hand prior to shipping. It filters the rows
by ShipRequestDate and OrderID. Creating a non-clustered index with ShipRequestDate will
index both the ShipRequestDate column and the OrderID column, as the clustered index key is present
in the leaf node of the non-clustered index. Because the procedure needs only four columns, adding
ProductID and Quantity as included columns will enable Ix1 to completely cover the needs of the
query and significantly improve performance.
The third procedure can be satisfied by adding a non-clustered index, Ix2, with the OrderDetailID
column.
Although this example had only three procedures, and may seem simplistic, if the plan focuses on the
top queries, most production tables will, in fact, have only a handful of queries or stored procedures.
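Expressed as DDL, the plan for the fictitious OrderDetail table might look like the sketch below. The dbo schema and the full index names are assumptions made for illustration, and the script assumes that the table does not already have a clustered index (for example, from a clustered primary key).

```sql
-- CI: cluster on OrderID so the rows of one order share data pages.
CREATE CLUSTERED INDEX CI_OrderDetail_OrderID
  ON dbo.OrderDetail (OrderID);

-- Ix1: keyed on ShipRequestDate. The clustered key (OrderID) is carried
-- in the leaf level automatically, and the two included columns make the
-- index cover the four columns that pCheckQuantity needs.
CREATE NONCLUSTERED INDEX Ix1_OrderDetail_ShipRequestDate
  ON dbo.OrderDetail (ShipRequestDate)
  INCLUDE (ProductID, Quantity);

-- Ix2: supports the third procedure, which locates rows by OrderDetailID.
CREATE NONCLUSTERED INDEX Ix2_OrderDetail_OrderDetailID
  ON dbo.OrderDetail (OrderDetailID);
```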
The Database Engine Tuning Advisor is a SQL Server 2008 utility that can analyze a single
query or a set of queries and recommend indexes and partitions to improve performance.
My indexing strategy is based on knowing when to use a bookmark lookup vs. when to avoid a
bookmark lookup. The Database Engine Tuning Advisor doesn’t know whether a given query should or should
not have a bookmark lookup, so it can’t follow the strategy. Therefore, I recommend that you avoid
the Database Engine Tuning Advisor. If you understand how queries work, you don’t need the Advisor
anyway.
Selecting the clustered index
Selecting the clustered index is a critical piece of the performance puzzle, perhaps the most important
piece of the physical schema. A clustered index can affect performance in several ways:
■ When an index seek operation finds a row using a clustered index, the data is right
there, with no bookmark lookup. This makes the column used to select the row, probably
the primary key, an ideal candidate for a clustered index.
■ Clustered indexes gather rows with the same or similar values into the smallest possible number
of data pages, thus reducing the number of data pages required to retrieve a set of rows.
Clustered indexes are therefore excellent for columns that are often used to select a range of rows,
such as secondary table foreign keys like OrderDetail.OrderID.
■ Inserting data in the middle of a clustered index is always a bad idea. The page splits can
cripple performance, so carefully consider the actual data usage for every clustered index.
The MOC (Microsoft Official Curriculum) used to teach that the primary purpose of the
clustered index was gathering together similar rows (the second bullet above). When I
wrote the SQL Server 2000 Bible, I also believed that was the primary reason for a clustered index.
I now believe that avoiding a bookmark lookup is a stronger case for designing a clustered index, and
the second bullet only sometimes applies.
Creating base indexes
Even before tuning, the locations of a few indexes are easy to determine. These base indexes are the first
step in building a solid set of indexes. Here are a few steps to keep in mind when building these base
indexes:
1. Create a clustered index for every table. For primary tables, cluster on the column most likely
used to select the row, probably the primary key.
For secondary tables that are most commonly retrieved by a set of related rows, create a clustered index for the most important foreign key to group those related rows together.
2. Create non-clustered indexes for the columns of every foreign key, except for the foreign
key that was indexed in step 1. Use only the foreign key values as index keys. I’ve developed a script that will create a non-clustered index for every foreign key (download from
www.sqlserverbible.com).
3. Create a single-column index for every column expected to be referenced in a WHERE clause,
an ORDER BY, or a GROUP BY.
While this indexing plan is far from perfect, and it’s definitely not a final indexing plan, it provides an
initial compromise between no indexes and tuned indexes, and can serve as a baseline performance
measurement to compare against future index tuning.
Additional tuning will likely involve creating composite indexes and removing unnecessary indexes.
Best Practice
When planning indexes, there’s a subtle tension between serving the needs of select queries vs. update
queries. While an index may improve query performance, there’s a performance cost because when
a row is inserted, updated, or deleted, the indexes must be updated as well. Nonetheless, some indexing is
necessary for write operations. The update or delete operation must locate the row prior to performing the
write operation, and useful indexes facilitate locating that row, thereby speeding up write operations.
Therefore, when planning indexes, be careful to include the fewest number of indexes to accomplish the job.
SQL Server exposes index usage statistics via dynamic management views and functions. Specifically,
sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats uncover information about how indexes are being used. In addition, there are four dynamic management objects that
reveal indexes that the Query Optimizer looked for but didn’t find: sys.dm_db_missing_index_groups,
sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_columns, and
sys.dm_db_missing_index_details.
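The two queries below sketch how these objects might be read; note that all of their names carry a dm_db_ prefix. The column selection and ordering are illustrative choices, not a prescribed report. The counters are reset when the SQL Server service restarts, so they reflect only the workload since then.

```sql
-- Reads vs. writes for every index on user tables in the current database.
-- Indexes with no row in the usage view have not been touched since startup.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name AS IndexName,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
  ON s.object_id = i.object_id
 AND s.index_id = i.index_id
 AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY TableName, IndexName;

-- Indexes the Query Optimizer looked for but didn't find.
SELECT d.statement AS TableName,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
  ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY gs.avg_user_impact DESC;
```

An index with many user_updates and few or no seeks, scans, or lookups is a candidate for the "removing unnecessary indexes" step of tuning.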
Specialty Indexes
Beyond the standard clustered and non-clustered indexes, SQL Server offers two types of indexes I
refer to as specialty indexes. Filtered indexes, new in SQL Server 2008, include less data, and indexed
views, available since SQL Server 2000, build out custom sets of data. Both are considered high-end
performance-tuning indexes.
Filtered indexes
Until SQL Server 2008, every non-clustered index indexed every key value and every row. Filtered
indexes allow adding a WHERE clause to the CREATE INDEX statement. This option is available only for
non-clustered indexes (how could a clustered index not include some rows?). A filtered index not only
includes fewer rows at the leaf level, but also includes fewer values in the intermediate levels. It’s this
reduction in intermediate levels that causes the reads to be fewer for any index seek.
An example of employing a filtered index in AdventureWorks2008 is the ScrapReasonID
column in the Production.WorkOrder table. Fortunately for Adventure Works, they scrapped only 612
(0.8%) parts over the life of the database. The existing IX_WorkOrder_ScrapReasonID includes every
row. The ScrapReasonID foreign key in the Production.WorkOrder table allows nulls for work
orders that were not scrapped. The index includes all the null values with pointers to the WorkOrder
rows with NULL ScrapReasonIDs. The current index uses 109 pages.
The following script recreates the index with a WHERE clause that excludes all the NULL values:
DROP INDEX Production.WorkOrder.IX_WorkOrder_ScrapReasonID;
CREATE INDEX IX_WorkOrder_ScrapReasonID
ON Production.WorkOrder(ScrapReasonID)
WHERE ScrapReasonID IS NOT NULL;
The new index uses only two pages. Interestingly, the difference isn’t noticeable between using the
filtered or a non-filtered index when selecting all the work orders with a scrap reason that’s not null.
That’s because there aren’t enough intermediate levels to make a significant difference. For a much larger
table, the difference would be worth testing, and most likely the filtered index would provide a benefit.
Filtered indexes, because of their compact size, not only reduce the disk usage but are also easier to
maintain.
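One way to compare index sizes before and after filtering is to read the page counts from sys.dm_db_partition_stats. This is a sketch of one possible measurement, not necessarily the method behind the page counts quoted above; used_page_count includes allocation pages, so the figures may differ slightly from other tools.

```sql
-- Page and row counts for every index on Production.WorkOrder,
-- along with the filter predicate for any filtered index.
SELECT i.name AS IndexName,
       i.has_filter,
       i.filter_definition,
       ps.used_page_count,
       ps.row_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON ps.object_id = i.object_id
 AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID(N'Production.WorkOrder')
ORDER BY i.index_id;
```

Running the query before and after recreating the index shows the change in both pages and rows for IX_WorkOrder_ScrapReasonID.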
Best Practice
When designing a covering index (see index kata #6) to solve a specific query, probably one that
represents the top handful of CPU duration according to the indexing strategy, consider filtering the
covering index if it works with a relatively small subset of data and the overall table is large.
Another situation that might benefit from filtered indexes is building a unique index that includes
multiple rows with null values. A normal unique index allows only a single row to include a null value in the
key columns. However, building a unique index that excludes null in the WHERE clause creates a unique
index that permits an unlimited number of null values.
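The unique-but-nullable case can be sketched as follows. The dbo.Employee table, its columns, and the index name are hypothetical, invented only for this illustration. Filtered indexes also require the standard session settings (such as QUOTED_IDENTIFIER ON and ANSI_NULLS ON), which are the defaults in Management Studio.

```sql
-- Hypothetical table: BadgeNumber is optional but must be unique when present.
CREATE TABLE dbo.Employee (
  EmployeeID  int         NOT NULL PRIMARY KEY,
  BadgeNumber varchar(20) NULL
);

-- Uniqueness is enforced only for the rows that pass the filter.
CREATE UNIQUE NONCLUSTERED INDEX UX_Employee_BadgeNumber
  ON dbo.Employee (BadgeNumber)
  WHERE BadgeNumber IS NOT NULL;

INSERT INTO dbo.Employee (EmployeeID, BadgeNumber) VALUES (1, NULL);
INSERT INTO dbo.Employee (EmployeeID, BadgeNumber) VALUES (2, NULL);   -- allowed: NULL rows are outside the index
INSERT INTO dbo.Employee (EmployeeID, BadgeNumber) VALUES (3, 'A100');
-- Inserting a second row with BadgeNumber 'A100' would fail with a
-- duplicate-key error, because non-null values are still checked.
```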