FIGURE 35.29 A graphical execution plan of a query using parallel query techniques
Common Query Optimization Problems
So you’ve written a query and examined the query plan, and performance isn’t what you
expected. It might appear that SQL Server isn’t choosing the query plan that
you expect. Is something wrong with the query or with the Query Optimizer? Before
delving into a detailed discussion about how to debug and analyze query plans (covered
in detail in Chapter 36), the following sections look at some of the most common
problems and SQL coding issues that can lead to poor query plan selection.
Out-of-Date or Insufficient Statistics
Admittedly, having out-of-date or unavailable statistics is not as big a problem as it was in
SQL Server releases prior to 7.0. Back in those days, the first question asked when
someone was complaining of poor performance was, “When did you last update
statistics?” If the answer was “Huh?” we usually found the culprit.
With the Auto-Update Statistics and Auto-Create Statistics features in SQL Server 2008,
this problem is not as prevalent as it used to be. When a query is compiled and the
statistics it needs are out of date or missing, SQL Server updates or creates them and
then optimizes the query plan based on the new statistics.
NOTE
If statistics are missing or out of date, the first query that detects this
condition might run a bit more slowly because it updates or creates the statistics first,
especially if the table is relatively large or has been configured for FULLSCAN when
indexes are updated.
However, SQL Server 2008 provides the AUTO_UPDATE_STATISTICS_ASYNC database
option. When this option is set to ON, queries do not wait for the statistics to be
updated before compiling. Instead, the out-of-date statistics are put on a queue for
updating by a worker thread in a background process, and the query and any other
concurrent queries compile immediately, using the existing out-of-date statistics.
Because queries do not wait for updated statistics, response times are more
predictable, but the out-of-date statistics may cause the Query Optimizer to choose a
less efficient query plan. Any queries invoked after the updated statistics are ready will use
the updated statistics in generating a query plan. This may cause the recompilation of
any cached plans that depend on the older statistics.
You should consider setting the AUTO_UPDATE_STATISTICS_ASYNC option to ON when
any of your applications have experienced client request timeouts caused by queries
waiting for updated statistics or when it is acceptable for your application to run
queries with less efficient query plans due to outdated statistics so that you can
maintain predictable query response times.
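As a minimal sketch of enabling the asynchronous option described in the note, the statements below assume the bigpubs2008 sample database used elsewhere in this chapter. The asynchronous setting has an effect only when AUTO_UPDATE_STATISTICS is also ON, so both options are set here.

```sql
-- Assumption: bigpubs2008 is the sample database used in this chapter.
-- AUTO_UPDATE_STATISTICS must be ON for the ASYNC option to have any effect.
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS_ASYNC ON;
go

-- Confirm the current settings for the database.
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'bigpubs2008';
go
```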
You could have insufficient statistics to properly optimize a query if the sample size used
when the statistics were generated wasn’t large enough. Depending on the nature of your
data and size of the table, the statistics might not accurately reflect the actual data
distribution and cardinality. If you suspect that this is the case, you can update statistics by
specifying the FULLSCAN option or a larger sample size, so SQL Server examines more records to
derive the statistics.
For more information on understanding and managing index statistics, see Chapter 34.
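The following sketch shows the two ways of increasing the sample size mentioned above. It assumes the sales table and its ord_date_idx index from the bigpubs2008 sample database; the 50 percent sample is an arbitrary value chosen for illustration.

```sql
-- Rebuild all statistics on the sales table by reading every row.
UPDATE STATISTICS sales WITH FULLSCAN;
go

-- Alternatively, read a larger sample of the rows (50 percent is arbitrary).
UPDATE STATISTICS sales WITH SAMPLE 50 PERCENT;
go

-- Inspect the header, density, and histogram to judge whether the
-- statistics now reflect the data distribution.
DBCC SHOW_STATISTICS ('sales', 'ord_date_idx');
go
```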
Poor Index Design
Poor index design is another reason, and often a primary reason, why queries might not
optimize as you expect them to. If no supporting indexes exist for a query, or if a query
contains SARGs that cannot be optimized effectively to use the available indexes, SQL
Server ends up performing a table scan, an index scan, or a hash or merge
join strategy that is less efficient. If this appears to be the problem, you need to reevaluate
your indexing decisions or rewrite the query so it can take advantage of an available
index. For more information on designing useful indexes, see Chapter 34.
Search Argument Problems
It’s the curse of SQL that there are a number of ways to write a query and get the same
result set. Some queries, however, might not be as efficient as others. A good
understanding of the Query Optimizer can help you avoid writing search arguments that SQL Server
can’t optimize effectively. The following sections highlight some of the common
“gotchas” encountered in SQL Server SARGs that can lead to poor or unexpected query
performance.
Using Optimizable SARGs
As mentioned previously, in the section “Identifying Search Arguments,” the Query
Optimizer uses search arguments to help it narrow down the set of rows to evaluate. A
search argument takes the form of a WHERE clause predicate that compares a column with a constant. The
SARGs that optimize most effectively are those that compare a column with a constant
value that is not an expression or a variable, and with no operation performed against the
column itself. The following is an example:
SELECT column1
FROM table1
WHERE column1 = 123
You should try to avoid using any negative logic in your SARGs (for example, !=, <>,
NOT IN) or performing operations on, or applying functions to, the columns in the SARG.
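To illustrate the advice about functions on columns, here is a hedged sketch using the sales table from the bigpubs2008 sample database. The year 2008 is an arbitrary value, and whether the second form actually results in an index seek depends on the indexes and statistics available.

```sql
-- Harder to optimize: the function is applied to the column itself, so the
-- predicate cannot be matched directly against an index on ord_date.
SELECT stor_id, ord_date, qty
FROM sales
WHERE DATEPART(year, ord_date) = 2008;

-- Equivalent range predicate: the column is left untouched and is compared
-- with constants, which makes it an optimizable SARG.
SELECT stor_id, ord_date, qty
FROM sales
WHERE ord_date >= '20080101'
  AND ord_date <  '20090101';
```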
No SARGs
You need to watch out for queries in which the SARG might have been left out
inadvertently, such as this:
select title_id from titles
A SQL query with no search argument (that is, no WHERE clause) always performs a table or
clustered index scan unless a nonclustered index can be used to cover the query. (See
Chapter 34 for a discussion of index covering.) If you don’t want the query to affect the
entire table, you need to be sure to specify a valid SARG that matches an index on the
table to avoid table scans.
Unknown Values in WHERE Clauses
You need to watch out for expressions in which the search value in the SARG cannot be
evaluated until runtime. In these expressions, often the search value is a local variable or
subquery that can be materialized to a single value.
SQL Server treats these expressions as SARGs but can’t use the statistics histogram to
estimate the number of matching rows because it doesn’t have a value to compare against the
histogram values during query optimization. The values for the expressions aren’t known
until the query is actually executed. In this situation, the Query Optimizer uses the index
density information. The Query Optimizer is generally able to better estimate the number
of rows affected by a query when it can compare a known value against the statistics
histogram than when it has to use the index density to estimate the average number of
rows that match an unknown value. This is especially true if the data in a table isn’t
distributed evenly. When you can, you should try to avoid using constant expressions that
can’t be evaluated until runtime so that the statistics histogram can be used rather than
the density value.
To avoid using constant expressions in WHERE clauses that can’t be evaluated until runtime,
you should consider putting the queries into stored procedures and passing in the
constant expression as a parameter. Because the Query Optimizer evaluates the value of a
parameter prior to optimization, SQL Server evaluates the expression prior to optimizing
the stored procedure.
For best results when writing queries inside stored procedures, you should use stored
procedure parameters rather than local variables in your SARGs whenever possible. This
strategy allows the Query Optimizer to optimize the query by using the statistics
histogram, comparing the parameter value against the statistics histogram to estimate the
number of matching rows. If you use local variables as SARGs in stored procedures, the
Query Optimizer is restricted to using index density, even if the local variable is assigned
the value of a parameter.
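The difference between a parameter and a local variable can be sketched with two small procedures. The procedure names are hypothetical and exist only for this illustration; the sales table and its qty column are assumed from the bigpubs2008 sample database. Comparing the estimated row counts in the two execution plans is one way to see the effect.

```sql
-- Hypothetical procedure: the parameter is used directly in the SARG, so its
-- value is known when the plan is compiled and the histogram can be used.
CREATE PROCEDURE dbo.sales_by_qty_param
    @qty smallint
AS
    SELECT stor_id, ord_num, qty
    FROM sales
    WHERE qty = @qty;
go

-- Hypothetical procedure: the same value is copied to a local variable first.
-- The variable's value is unknown at optimization time, so the row estimate
-- comes from index density instead of the histogram.
CREATE PROCEDURE dbo.sales_by_qty_local
    @qty smallint
AS
    DECLARE @local_qty smallint;
    SET @local_qty = @qty;

    SELECT stor_id, ord_num, qty
    FROM sales
    WHERE qty = @local_qty;
go

-- Run both with the same value and compare the estimated rows in each plan.
EXEC dbo.sales_by_qty_param @qty = 100;
EXEC dbo.sales_by_qty_local @qty = 100;
go
```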
Other types of constructs for which it is difficult for the Query Optimizer to accurately
estimate the number of qualifying rows or the data distribution using the statistics
histogram include aggregations in subqueries, scalar expressions, user-defined functions,
and noninline table-valued functions.
Data Type Mismatches
Another common problem is data type mismatches. If you attempt to join tables on
columns of different data types, the Query Optimizer might not be able to effectively use
indexes to evaluate the join. This can result in a less efficient join strategy because SQL
Server has to convert all values first before it can process the query. You should avoid
this situation by maintaining data type consistency across the join key columns in your
database.
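A small sketch of the mismatch follows. Both temporary tables are hypothetical and were invented for this illustration; they are not part of the sample database. Altering the column is only one possible fix and is shown on empty tables.

```sql
-- Hypothetical tables, invented for illustration only.
CREATE TABLE #store_lookup (stor_id int         NOT NULL PRIMARY KEY,
                            region  varchar(20) NOT NULL);
CREATE TABLE #store_orders (stor_id char(4)     NOT NULL,
                            qty     smallint    NOT NULL);
go

-- Mismatched join keys: char has lower data type precedence than int, so
-- #store_orders.stor_id is implicitly converted before it can be compared.
SELECT l.region, o.qty
FROM #store_orders o
     INNER JOIN #store_lookup l ON o.stor_id = l.stor_id;
go

-- One possible fix: give both join key columns the same data type.
ALTER TABLE #store_orders ALTER COLUMN stor_id int NOT NULL;
go

-- The same join now compares int with int and needs no conversion.
SELECT l.region, o.qty
FROM #store_orders o
     INNER JOIN #store_lookup l ON o.stor_id = l.stor_id;
go

DROP TABLE #store_orders;
DROP TABLE #store_lookup;
go
```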
Large Complex Queries
For complex queries with a large number of tables and join conditions, the number of
possible execution plans can be enormous. The full optimization phase of the Query
Optimizer has a time limit to restrict how long it spends analyzing all the possible query
plans. There is no known general and effective shortcut to arrive at the optimal plan. To
cope with such a large selection of plans, SQL Server 2008 applies a number of
heuristics to very large queries in an attempt to come up with an efficient query plan
within the time available. When it is not possible to analyze the entire set of plan
alternatives and the heuristics are applied, it is not uncommon for a suboptimal query
plan to be chosen.
When is your query large enough to be a concern? Answering this question is difficult
because the answer depends on the number of tables involved, the form of filter and join
predicates, and the operations performed. If a query involves more than 12 tables, it is
likely that the Query Optimizer has to rely on heuristics and shortcuts to generate a
query plan and may miss some optimal strategies.
In general, you get better query plans if you can simplify your queries as much as
possible.
Triggers
If you are using triggers on INSERT, UPDATE, or DELETE, it is possible that your triggers can
cause performance problems. You might think that INSERT, UPDATE, or DELETE is
performing poorly when actually it is the trigger that needs to be tuned. In addition, you might
have triggers that fire other triggers. If you suspect that you are having performance
problems with the triggers, you can monitor the SQL they are executing and the response
time, as well as execution plans generated for statements within triggers using SQL Server
Profiler. For more information on monitoring performance with SQL Server Profiler, see
Chapter 6, “SQL Server Profiler.” You can also see the query plans for statements executed
in triggers by using SSMS if you enable the Include Actual Execution Plan option. For
more information on using SSMS to view and analyze query plans, see Chapter 36.
Managing the Optimizer
Because the Query Optimizer might sometimes make poor decisions as to how to best
process a query, you need to know how and when you may need to override the Query
Optimizer and force SQL Server to process a query in a specific manner.
How often does SQL Server require manual intervention to execute a query optimally?
Considering the overwhelming number of query types and circumstances in which those
queries are run, SQL Server does a surprisingly effective job of query optimization in most
instances. For all but the most grueling, complex query operations, experience has shown
that SQL Server’s Query Optimizer is quite clever and very, very good at wringing the
best performance out of any hardware platform. For this reason, you should treat the
material covered in this chapter as a collection of techniques to be used only where other
methods of getting optimal query performance have already failed.
Before indiscriminately applying the techniques discussed in this section, remember one
very important point: use of these features can effectively hide serious fundamental design
or coding flaws in your database, application, or queries. In fact, if you’re tempted to use
these features (with a few moderate exceptions), it should serve as an indicator that the
problems might lie elsewhere in the application or queries.
If you are satisfied that no such flaws exist and that SQL Server is choosing the wrong
plan to optimize your query, you can use the methods discussed in this section to override
two of the Query Optimizer’s key decisions:
Choosing which index, if any, to resolve the query
Choosing the join strategy to apply in a multitable query
The other decision made by the Query Optimizer is the locking strategy to apply. Using
table hints to override locking strategies is discussed in Chapter 37, “Locking and
Performance.”
Throughout this and following sections, one point must remain clear in your mind: these
options should be used only in exceptional cases to cope with specific optimization problems
in specific queries in specific applications. There are therefore no standard or global rules
to follow because the application of these features, by definition, means that normal SQL
Server behavior isn’t taking place.
The practical result of this idea is that you should test every option in your environment,
with your data and your queries, and use the techniques and methods discussed in this
chapter and the other performance-related chapters to optimize and fine-tune the
performance of your queries. The fastest-performing query wins, so you shouldn’t be afraid to
experiment with different alternatives. However, you shouldn’t think that these statements and
features are globally applicable or fit general categories of problems, either! There are, in
fact, only three rules: Test, test, and test!
TIP
As a general rule, Query Optimizer and table hints should be used only as a last resort,
when all other methods to get the Query Optimizer to generate a more efficient query
plan have failed. Always try to find other ways to rewrite the queries to encourage the
Query Optimizer to choose a better plan. This includes adding additional SARGs,
substituting known values for unknown values in SARGs, breaking up queries,
converting subqueries to joins or joins to
subqueries, and so on. Essentially, you should try other coding variations on the query
itself to get the same result in a different way and see whether one of the variations
ends up using the more efficient query plan that you expect it to.
In reality, about the only time you should use these hints is when you’re testing the
performance of a query and want to see if the Query Optimizer is actually choosing the
best execution plan. You can enable the various query analysis options, such as
STATISTICS PROFILE and STATISTICS IO, and then see how the query plan and
statistics change as you apply various hints to the query. You can examine the output
to determine whether the I/O cost and/or runtime improves or gets worse if you force
one index over another or if you force a specific join strategy or join order.
The problem with hard-coding table and Query Optimizer hints into application queries
is that the hints prevent the Query Optimizer from modifying the query plan as the data
in the tables changes over time. Also, if subsequent service packs or releases of SQL
Server incorporate improved optimization algorithms or strategies, the queries with
hard-coded hints will not be able to take advantage of them.
If you find that you must incorporate any of these hints to solve query performance
problems, you should be sure to document which queries and stored procedures
contain Query Optimizer and table hints. It’s a good idea to periodically go back and test
the queries to determine whether the hints are still appropriate. You might find that,
over time, as the data values in the table change, the forced query plan generated
because of the hints is no longer the most efficient query plan, and the Query
Optimizer now generates a more efficient query plan on its own.
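The testing approach described in the tip can be sketched as follows. The query and date range are arbitrary, and the example assumes that the ord_date_idx index on the sales table in bigpubs2008 is keyed on ord_date (its name suggests so, but the index definition is not shown in this chapter).

```sql
SET STATISTICS IO ON;
SET STATISTICS PROFILE ON;
go

-- Baseline: let the Query Optimizer choose the access path.
SELECT stor_id, ord_date, qty
FROM sales
WHERE ord_date >= '20080101' AND ord_date < '20080201';
go

-- Test run: the same query with an index forced, for comparison only.
-- Assumption: ord_date_idx is keyed on the ord_date column.
SELECT stor_id, ord_date, qty
FROM sales WITH (INDEX (ord_date_idx))
WHERE ord_date >= '20080101' AND ord_date < '20080201';
go

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;
go
```

If the logical reads reported for the hinted query are not clearly lower than for the baseline, that is a sign the hint should not be kept.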
Optimizer Hints
You can specify three types of hints in a query to override the decisions made by the
Query Optimizer:
Table hints
Join hints
Query hints
The following sections examine and describe each type of hint.
Forcing Index Selection with Table Hints
In addition to locking hints that can be specified for each table in a query, SQL Server 2008
allows you to provide table-level hints that enable you to specify the index SQL Server
should use for accessing the table. The syntax for specifying an index hint is as follows:
SELECT column_list FROM tablename WITH (INDEX (indid | index_name [, ...]) )
This syntax allows you to specify multiple indexes. You can specify an index by name or
by ID. It is recommended that you specify indexes by name because the IDs for nonclustered
indexes can change if they are dropped and re-created in a different order than that in
which they were created originally. You can specify an index ID of 0, or the table name
itself, to force a table scan.
When you specify multiple indexes in the hint list, all the indexes listed are used to
retrieve the rows from the table, forcing an index intersection or index covering via an
index join. If the collection of indexes listed does not cover the query, a regular row fetch
is performed after all the indexed columns are retrieved.
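The following sketch shows the three forms of index hint discussed above against the sales table in bigpubs2008. The index names come from the sample database; the assumption that qty_idx and ord_date_idx are keyed on the qty and ord_date columns is inferred from their names, and the search values are arbitrary.

```sql
-- Force a single nonclustered index by name.
SELECT stor_id, qty
FROM sales WITH (INDEX (qty_idx))
WHERE qty > 50;

-- Force two indexes; both are used to retrieve the qualifying rows.
SELECT ord_date, qty
FROM sales WITH (INDEX (ord_date_idx, qty_idx))
WHERE qty > 50
  AND ord_date >= '20080101';

-- Index ID 0 forces a scan of the base table (a clustered index scan
-- when the table has a clustered index).
SELECT stor_id, qty
FROM sales WITH (INDEX (0))
WHERE qty > 50;
```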
To get a list of indexes on a table, you can use sp_helpindex. However, the stored procedure
doesn’t display the index ID. To get a list of all user-defined tables and the names of the
indexes defined on them, you can execute a query against the sys.indexes catalog view
similar to the one shown in Listing 35.6, which was run against the bigpubs2008 database.
LISTING 35.6 Query Against sys.indexes Catalog View to Get Index Names and IDs
select 'Table name' = convert(char(20), object_name(object_id)),
       'Index name' = convert(char(30), name),
       'Index ID' = index_id,
       'Index Type' = type_desc
from sys.indexes
where object_id > 99 -- only system tables have id less than 99
  and index_id between 1 and 254 /* do not include rows for text columns
                                    or tables without a clustered index */
  /* do not include auto statistics */
  and is_hypothetical = 0
  and objectproperty(object_id, 'IsUserTable') = 1
order by 1, 3
go
Table name     Index name                  Index ID Index Type
-------------- --------------------------- -------- ------------
authors        UPKCL_auidind               1        CLUSTERED
authors        aunmind                     2        NONCLUSTERED
employee       employee_ind                1        CLUSTERED
employee       PK_emp_id                   2        NONCLUSTERED
jobs           PK__jobs__job_id__25319086  1        CLUSTERED
PARTS          PK__PARTS__09746778         1        CLUSTERED
PARTS          UQ__PARTS__0A688BB1         2        NONCLUSTERED
pub_info       UPKCL_pubinfo               1        CLUSTERED
publishers     UPKCL_pubind                1        CLUSTERED
roysched       titleidind                  2        NONCLUSTERED
sales          UPKCL_sales                 1        CLUSTERED
sales          titleidind                  2        NONCLUSTERED
sales          ord_date_idx                7        NONCLUSTERED
sales          qty_idx                     8        NONCLUSTERED
sales_big      ci_sales_big                1        CLUSTERED
sales_big      idx1                        2        NONCLUSTERED
sales_noclust  idx1                        2        NONCLUSTERED
sales_noclust  ord_date_idx                3        NONCLUSTERED
sales_noclust  qty_idx                     4        NONCLUSTERED
stores         UPK_storeid                 1        CLUSTERED
stores         nc1_stores                  2        NONCLUSTERED
titleauthor    UPKCL_taind                 1        CLUSTERED
titleauthor    auidind                     2        NONCLUSTERED
titleauthor    titleidind                  3        NONCLUSTERED
titles         UPKCL_titleidind            1        CLUSTERED
titles         titleind                    2        NONCLUSTERED
titles         ytd_sales_filtered          11       NONCLUSTERED
SQL Server 2008 introduces the new FORCESEEK table hint, which provides an additional
query optimization option. This hint specifies that the Query Optimizer use only an index
seek operation as the access path to the data in the table or view referenced in the query
rather than an index or table scan. If a query plan contains table or index scan operators,
forcing an index seek operation may yield better query performance. This is especially true
when inaccurate cardinality or cost estimations cause the optimizer to favor scan
operations at plan compilation time.
Before using the FORCESEEK table hint, you should make sure that statistics on the table
are current and accurate. Also, you should evaluate the query for items that can cause
poor cardinality or cost estimates and remove these items if possible. For example, replace
local variables with parameters or literals and limit the use of multistatement table-valued
functions and table variables in the query.
Also, be aware that if you specify the FORCESEEK hint in addition to an index hint, the
FORCESEEK hint can cause the optimizer to use an index other than one specified in the
index hint.
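A minimal sketch of the FORCESEEK hint follows. It assumes that the sales table in bigpubs2008 has an index whose leading key column is stor_id, so that the predicate can be satisfied by a seek; if no index supports a seek for the predicate, the query fails to compile rather than falling back to a scan.

```sql
-- Assumption: an index on sales leads with stor_id, making a seek possible.
SELECT stor_id, ord_date, qty
FROM sales WITH (FORCESEEK)
WHERE stor_id BETWEEN 'B100' AND 'B599';
```

As with the other hints, this is best tried only after updating statistics and comparing the I/O cost against the unhinted query.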
Forcing Join Strategies with Join Hints
Join hints let you force the type of join that should be used between two tables. The join
hints correspond with the three types of join strategies:
LOOP
MERGE
HASH
You can specify join hints only when you use the ANSI-style join syntax, that is, when
you actually use the keyword JOIN in the query. The hint is specified between the type of
join and the keyword JOIN, which means you can’t leave out the keyword INNER for an
inner join. Thus, the syntax for the FROM clause when using join hints is as follows:
FROM table1 {INNER | OUTER} {LOOP | MERGE | HASH} JOIN table2
The following example forces SQL Server to use a hash join:
select st.stor_name, ord_date, qty
from stores st INNER HASH JOIN sales s on st.stor_id = s.stor_id
where st.stor_id between 'B100' and 'B599'
You can also specify a global join hint for all joins in a query by using a query processing
hint.
Specifying Query Processing Hints
SQL Server 2008 enables you to specify additional query hints to control how your queries
are optimized and processed. You specify query hints at the end of a query by using the
OPTION keyword. There can be only one OPTION clause per query, but you can specify
multiple hints in an OPTION clause, as shown in the following syntax:
OPTION (hint1 [, hintn])
Query hints are grouped into four categories: GROUP BY, UNION, join, and miscellaneous.
GROUP BY Hints
GROUP BY hints specify how GROUP BY or COMPUTE operations should be
performed. The following GROUP BY hints can be specified:
HASH GROUP: This option forces the Query Optimizer to use a hashing function to
perform the GROUP BY operation.
ORDER GROUP: This option forces the Query Optimizer to use a sorting operation
to perform the GROUP BY operation.
Only one GROUP BY hint can be specified at a time.
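As a short sketch of the two GROUP BY hints, the following queries aggregate the sales table from bigpubs2008; the column alias is chosen for illustration. Comparing the two execution plans shows whether a hash-based or a sort-based aggregate is used.

```sql
-- Force a hash-based aggregate for the GROUP BY.
SELECT stor_id, SUM(qty) AS total_qty
FROM sales
GROUP BY stor_id
OPTION (HASH GROUP);

-- Force a sort-based (ordered) aggregate for the same query.
SELECT stor_id, SUM(qty) AS total_qty
FROM sales
GROUP BY stor_id
OPTION (ORDER GROUP);
```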
UNION Hints
The UNION hints specify how UNION operations should be performed. The
following UNION hints can be specified:
MERGE UNION: This option forces the Query Optimizer to use a merge operation to
perform the UNION operation.
HASH UNION: This option forces the Query Optimizer to use a hash operation to
perform the UNION operation.
CONCAT UNION: This option forces the Query Optimizer to use the concatenation
method to perform the UNION operation.
Only one UNION hint can be specified at a time, and it must come after the last query in
the UNION. The following is an example of forcing concatenation for a UNION:
select stor_id from sales where stor_id like 'B19%'
UNION
select title_id from titles where title_id like 'C19%'
OPTION (CONCAT UNION)
Join Hints
The join hint specified in the OPTION clause specifies that all join operations in
the query are performed as the type of join specified in the hint. The join hints that can
be specified in the query hints correspond to the table-level join hints:
LOOP JOIN
MERGE JOIN
HASH JOIN
If you also specify a join hint for a specific pair of tables, the table-level hints specified
must be compatible with the query-level join hint.
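For comparison with the table-level example shown earlier, here is a sketch of the same stores and sales join from bigpubs2008 with the join strategy requested at the query level instead.

```sql
-- The query-level hint applies to every join in the query, so the join
-- itself is written without a LOOP, MERGE, or HASH keyword.
SELECT st.stor_name, s.ord_date, s.qty
FROM stores st
     INNER JOIN sales s ON st.stor_id = s.stor_id
WHERE st.stor_id BETWEEN 'B100' AND 'B599'
OPTION (HASH JOIN);
```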
Miscellaneous Hints
The following miscellaneous hints can be used to override various
query operations:
FORCE ORDER: This option tells the Query Optimizer to join the tables in the order
in which they are listed in the FROM clause and not to determine the optimal join
order.