Microsoft SQL Server 2008 R2 Unleashed - P134

FIGURE 35.29 A graphical execution plan of a query using parallel query techniques

Common Query Optimization Problems

So you’ve written a query and examined the query plan, and performance isn’t what you expected. It might appear that SQL Server isn’t choosing the appropriate query plan that you expect. Is something wrong with the query or with the Query Optimizer? Before delving into a detailed discussion about how to debug and analyze query plans (covered in detail in Chapter 36), the following sections look at some of the most common problems and SQL coding issues that can lead to poor query plan selection.

Out-of-Date or Insufficient Statistics

Admittedly, having out-of-date or unavailable statistics is not as big a problem as it was in SQL Server releases prior to 7.0. Back in those days, the first question asked when someone was complaining of poor performance was, “When did you last update statistics?” If the answer was “Huh?” we usually found the culprit.

With the Auto-Update Statistics and Auto-Create Statistics features in SQL Server 2008, this problem is not as prevalent as it used to be. If a query detects that statistics are out of date or missing, it causes them to be updated or created and then optimizes the query plan based on the new statistics.
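
To see how current the statistics on a table are, you can query the sys.stats catalog view together with the STATS_DATE function. The following is a minimal sketch rather than a listing from this chapter; it assumes the sales table of the bigpubs2008 sample database, so substitute your own table name.

```sql
-- Sketch: list each statistics object on the sales table (an assumed example
-- table) along with the date it was last updated.
SELECT  s.name                              AS stats_name,
        s.auto_created,                     -- 1 = created by Auto-Create Statistics
        STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('sales')
ORDER BY last_updated;
```

A NULL or very old last_updated value on a heavily modified table is a reasonable starting point for investigation, although it does not by itself prove that the statistics are misleading the Query Optimizer.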

NOTE

If statistics are missing or out of date, the first running query that detects this condition might run a bit more slowly as it updates or creates the statistics first, especially if the table is relatively large, and also if it has been configured for FULLSCAN when indexes are updated.

However, SQL Server 2008 provides the AUTO_UPDATE_STATISTICS_ASYNC database option. When this option is set to ON, queries do not wait for the statistics to be updated before compiling. Instead, the out-of-date statistics are put on a queue for updating by a worker thread in a background process, and the query and any other concurrent queries compile immediately, using the existing out-of-date statistics. Although there is no delay for updated statistics, the out-of-date statistics may cause the Query Optimizer to choose a less efficient query plan, but the response times are more predictable. Any queries invoked after the updated statistics are ready will use the updated statistics in generating a query plan. This may cause the recompilation of any cached plans that depend on the older statistics.

You should consider setting the AUTO_UPDATE_STATISTICS_ASYNC option to ON when any of your applications have experienced client request timeouts caused by queries waiting for updated statistics or when it is acceptable for your application to run queries with less efficient query plans due to outdated statistics so that you can maintain predictable query response times.
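
As a minimal sketch of how the asynchronous option might be enabled and verified, the following statements use the bigpubs2008 sample database as an assumed target; replace the database name with your own. The asynchronous option has an effect only when AUTO_UPDATE_STATISTICS is also ON.

```sql
-- Sketch: enable asynchronous statistics updates for one database.
-- bigpubs2008 is an assumed example database name.
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS_ASYNC ON;
GO

-- Confirm the current settings.
SELECT  name,
        is_auto_create_stats_on,
        is_auto_update_stats_on,
        is_auto_update_stats_async_on
FROM    sys.databases
WHERE   name = 'bigpubs2008';
GO
```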

You could have insufficient statistics to properly optimize a query if the sample size used when the statistics were generated wasn’t large enough. Depending on the nature of your data and size of the table, the statistics might not accurately reflect the actual data distribution and cardinality. If you suspect that this is the case, you can update statistics by specifying the FULLSCAN option or a larger sample size, so SQL Server examines more records to derive the statistics.
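
The two approaches just described might look like the following sketch. The sales table and the ord_date_idx index are taken from the bigpubs2008 examples in this chapter; the 50 percent sample is an arbitrary illustrative value, not a recommendation.

```sql
-- Sketch: rebuild all statistics on the sales table by reading every row.
UPDATE STATISTICS sales WITH FULLSCAN;

-- Sketch: rebuild the statistics for a single index using a larger sample.
-- The 50 percent figure is only an illustrative assumption.
UPDATE STATISTICS sales ord_date_idx WITH SAMPLE 50 PERCENT;
```

FULLSCAN gives the most accurate histogram but can take noticeably longer on large tables, so a larger sample is sometimes a workable compromise.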

For more information on understanding and managing index statistics, see Chapter 34.

Poor Index Design

Poor index design is another reason, often a primary reason, why queries might not optimize as you expect them to. If no supporting indexes exist for a query, or if a query contains SARGs that cannot be optimized effectively to use the available indexes, SQL Server ends up performing either a table scan, an index scan, or another hash or merge join strategy that is less efficient. If this appears to be the problem, you need to reevaluate your indexing decisions or rewrite the query so it can take advantage of an available index. For more information on designing useful indexes, see Chapter 34.

Search Argument Problems

It’s the curse of SQL that there are a number of ways to write a query and get the same result set. Some queries, however, might not be as efficient as others. A good understanding of the Query Optimizer can help you avoid writing search arguments that SQL Server can’t optimize effectively. The following sections highlight some of the common “gotchas” encountered in SQL Server SARGs that can lead to poor or unexpected query performance.

Using Optimizable SARGs

As mentioned previously, in the section “Identifying Search Arguments,” the Query Optimizer uses search arguments to help it narrow down the set of rows to evaluate. The search argument is in the form of a WHERE clause that equates a column to a constant. The SARGs that optimize most effectively are those that compare a column with a constant value that is not an expression or a variable, and with no operation performed against the column itself. The following is an example:

SELECT column1
FROM table1
WHERE column1 = 123

You should try to avoid using any negative logic in your SARGs (for example, !=, <>, not in) or performing operations on, or applying functions to, the columns in the SARG.
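
To make this advice concrete, the following sketch contrasts predicates that operate on the column with equivalent predicates that leave the column untouched. It assumes the sales table from the bigpubs2008 database with its ord_date and qty columns; the date range and quantity values are arbitrary.

```sql
-- Harder to optimize: a function is applied to the column, so an index on
-- ord_date generally cannot be used for a seek.
SELECT stor_id, ord_date, qty
FROM sales
WHERE DATEPART(year, ord_date) = 2008;

-- Equivalent range predicate on the bare column.
SELECT stor_id, ord_date, qty
FROM sales
WHERE ord_date >= '20080101' AND ord_date < '20090101';

-- Harder to optimize: arithmetic is performed on the column.
SELECT stor_id, ord_date, qty
FROM sales
WHERE qty * 2 > 100;

-- Equivalent predicate with the arithmetic moved to the constant side.
SELECT stor_id, ord_date, qty
FROM sales
WHERE qty > 50;
```

In each pair the second form compares the unmodified column with a constant, which is the shape the preceding paragraphs describe as optimizing most effectively.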

No SARGs

You need to watch out for queries in which the SARG might have been left out inadvertently, such as this:

select title_id from titles

A SQL query with no search argument (that is, no WHERE clause) always performs a table or clustered index scan unless a nonclustered index can be used to cover the query. (See Chapter 34 for a discussion of index covering.) If you don’t want the query to affect the entire table, you need to be sure to specify a valid SARG that matches an index on the table to avoid table scans.

Unknown Values in WHERE Clauses

You need to watch out for expressions in which the search value in the SARG cannot be evaluated until runtime. In these expressions, often the search value is a local variable or subquery that can be materialized to a single value.

SQL Server treats these expressions as SARGs but can’t use the statistics histogram to estimate the number of matching rows because it doesn’t have a value to compare against the histogram values during query optimization. The values for the expressions aren’t known until the query is actually executed. In this situation, the Query Optimizer uses the index density information. The Query Optimizer is generally able to better estimate the number of rows affected by a query when it can compare a known value against the statistics histogram than when it has to use the index density to estimate the average number of rows that match an unknown value. This is especially true if the data in a table isn’t distributed evenly. When you can, you should try to avoid using constant expressions that can’t be evaluated until runtime so that the statistics histogram can be used rather than the density value.

To avoid using constant expressions in WHERE clauses that can’t be evaluated until runtime, you should consider putting the queries into stored procedures and passing in the constant expression as a parameter. Because the Query Optimizer evaluates the value of a parameter prior to optimization, SQL Server evaluates the expression prior to optimizing the stored procedure.

For best results when writing queries inside stored procedures, you should use stored procedure parameters rather than local variables in your SARGs whenever possible. This strategy allows the Query Optimizer to optimize the query by using the statistics histogram, comparing the parameter value against the statistics histogram to estimate the number of matching rows. If you use local variables as SARGs in stored procedures, the Query Optimizer is restricted to using index density, even if the local variable is assigned the value of a parameter.
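
The difference between the two coding styles can be sketched with a pair of procedures. The procedure names are hypothetical, and the sketch assumes the sales table of the bigpubs2008 database with a smallint qty column.

```sql
-- Sketch: the parameter is used directly in the SARG, so its value is
-- available to the Query Optimizer when the plan is compiled.
CREATE PROCEDURE sales_by_qty_param   -- hypothetical name
    @qty smallint
AS
SELECT stor_id, ord_date, qty
FROM sales
WHERE qty = @qty;
GO

-- Sketch: the parameter is copied to a local variable first. The value of
-- the local variable is not known at optimization time, so the row estimate
-- falls back to index density.
CREATE PROCEDURE sales_by_qty_local   -- hypothetical name
    @qty smallint
AS
DECLARE @local_qty smallint;
SET @local_qty = @qty;
SELECT stor_id, ord_date, qty
FROM sales
WHERE qty = @local_qty;
GO
```

Comparing the estimated row counts in the execution plans of the two procedures for the same input value is one way to observe the effect in your own environment.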

Other types of constructs for which it is difficult for the Query Optimizer to accurately estimate the number of qualifying rows or the data distribution using the statistics histogram include aggregations in subqueries, scalar expressions, user-defined functions, and noninline table-valued functions.

Data Type Mismatches

Another common problem is data type mismatches. If you attempt to join tables on columns of different data types, the Query Optimizer might not be able to effectively use indexes to evaluate the join. This can result in a less efficient join strategy because SQL Server has to convert all values first before it can process the query. You should avoid this situation by maintaining data type consistency across the join key columns in your database.
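
One way to look for candidate mismatches is to compare the definitions of identically named columns across user tables. The following query is only a heuristic sketch: it assumes that join key columns share the same name in both tables, which may not hold in your schema.

```sql
-- Sketch: find columns that share a name across user tables but differ in
-- data type or length. Assumes join keys are named the same in each table.
SELECT  OBJECT_NAME(c1.object_id)  AS table_1,
        OBJECT_NAME(c2.object_id)  AS table_2,
        c1.name                    AS column_name,
        TYPE_NAME(c1.user_type_id) AS type_1,
        c1.max_length              AS length_1,
        TYPE_NAME(c2.user_type_id) AS type_2,
        c2.max_length              AS length_2
FROM    sys.columns AS c1
JOIN    sys.columns AS c2
        ON  c2.name = c1.name
        AND c2.object_id > c1.object_id      -- report each pair only once
WHERE   OBJECTPROPERTY(c1.object_id, 'IsUserTable') = 1
  AND   OBJECTPROPERTY(c2.object_id, 'IsUserTable') = 1
  AND  (c1.user_type_id <> c2.user_type_id
        OR c1.max_length <> c2.max_length)
ORDER BY column_name, table_1, table_2;
```

Not every row returned is a problem, because columns with the same name are not necessarily join keys, but the list is a reasonable place to start a review.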

Large Complex Queries

For complex queries with a large number of tables and join conditions, the number of possible execution plans can be enormous. The full optimization phase of the Query Optimizer has a time limit to restrict how long it spends analyzing all the possible query plans. There is no known general and effective shortcut to arrive at the optimal plan. To deal with such a large selection of plans, SQL Server 2008 implements a number of heuristics to deal with very large queries and attempt to come up with an efficient query plan within the time available. When it is not possible to analyze the entire set of plan alternatives and the heuristics are applied, it is not uncommon to encounter suboptimal query plans being chosen.

When is your query large enough to be a concern? Answering this question is difficult because the answer depends on the number of tables involved, the form of filter and join predicates, and the operations performed. If a query involves more than 12 tables, it is likely that the Query Optimizer is having to rely on heuristics and shortcuts to generate a query plan and may miss some optimal strategies.

In general, you get more optimal query plans if you can simplify your queries as much as possible.

Triggers

If you are using triggers on INSERT, UPDATE, or DELETE, it is possible that your triggers can cause performance problems. You might think that INSERT, UPDATE, or DELETE is performing poorly when actually it is the trigger that needs to be tuned. In addition, you might have triggers that fire other triggers. If you suspect that you are having performance problems with the triggers, you can monitor the SQL they are executing and the response time, as well as execution plans generated for statements within triggers, using SQL Server Profiler. For more information on monitoring performance with SQL Server Profiler, see Chapter 6, “SQL Server Profiler.” You can also see the query plans for statements executed in triggers by using SSMS if you enable the Include Actual Execution Plan option. For more information on using SSMS to view and analyze query plans, see Chapter 36.

Managing the Optimizer

Because the Query Optimizer might sometimes make poor decisions as to how to best process a query, you need to know how and when you may need to override the Query Optimizer and force SQL Server to process a query in a specific manner.

How often does SQL Server require manual intervention to execute a query optimally? Considering the overwhelming number of query types and circumstances in which those queries are run, SQL Server does a surprisingly effective job of query optimization in most instances. For all but the most grueling, complex query operations, experience has shown that SQL Server’s Query Optimizer is quite clever, and very, very good at wringing the best performance out of any hardware platform. For this reason, you should treat the material covered in this chapter as a collection of techniques to be used only where other methods of getting optimal query performance have already failed.

Before indiscriminately applying the techniques discussed in this section, remember one very important point: use of these features can effectively hide serious fundamental design or coding flaws in your database, application, or queries. In fact, if you’re tempted to use these features (with a few moderate exceptions), it should serve as an indicator that the problems might lie elsewhere in the application or queries.

If you are satisfied that no such flaws exist and that SQL Server is choosing the wrong plan to optimize your query, you can use the methods discussed in this section to override the following decisions made by the Query Optimizer:

Choosing which index, if any, to resolve the query

Choosing the join strategy to apply in a multitable query

The other decision made by the Query Optimizer is the locking strategy to apply. Using table hints to override locking strategies is discussed in Chapter 37, “Locking and Performance.”

Throughout this and following sections, one point must remain clear in your mind: these options should be used only in exception cases to cope with specific optimization problems in specific queries in specific applications. There are therefore no standard or global rules to follow because the application of these features, by definition, means that normal SQL Server behavior isn’t taking place.

The practical result of this idea is that you should test every option in your environment, with your data and your queries, and use the techniques and methods discussed in this chapter and the other performance-related chapters to optimize and fine-tune the performance of your queries. The fastest-performing query wins, so you shouldn’t be afraid to experiment with different alternatives, but you shouldn’t think that these statements and features are globally applicable or fit general categories of problems, either! There are, in fact, only three rules: Test, test, and test!

TIP

As a general rule, Query Optimizer and table hints should be used only as a last resort, when all other methods to get the Query Optimizer to generate a more efficient query plan have failed. Always try to find other ways to rewrite the queries to encourage the Query Optimizer to choose a better plan. This includes adding additional SARGs, replacing unknown values with known values in SARGs, breaking up queries, converting subqueries to joins or joins to subqueries, and so on. Essentially, you should try other coding variations on the query itself to get the same result in a different way and try to see if one of the variations ends up using the more efficient query plan that you expect it to.

In reality, about the only time you should use these hints is when you’re testing the performance of a query and want to see if the Query Optimizer is actually choosing the best execution plan. You can enable the various query analysis options, such as STATISTICS PROFILE and STATISTICS IO, and then see how the query plan and statistics change as you apply various hints to the query. You can examine the output to determine whether the I/O cost and/or runtime improves or gets worse if you force one index over another or if you force a specific join strategy or join order.

The problem with hard-coding table and Query Optimizer hints into application queries is that the hints prevent the Query Optimizer from modifying the query plan as the data in the tables changes over time. Also, if subsequent service packs or releases of SQL Server incorporate improved optimization algorithms or strategies, the queries with hard-coded hints will not be able to take advantage of them.

If you find that you must incorporate any of these hints to solve query performance problems, you should be sure to document which queries and stored procedures contain Query Optimizer and table hints. It’s a good idea to periodically go back and test the queries to determine whether the hints are still appropriate. You might find that, over time, as the data values in the table change, the forced query plan generated because of the hints is no longer the most efficient query plan, and the Query Optimizer now generates a more efficient query plan on its own.
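
The testing approach described in this tip might be sketched as follows. The script assumes the sales table and its qty_idx index from the bigpubs2008 database; the search value is arbitrary. Run both queries and compare the logical reads and plan output reported for each.

```sql
-- Sketch: compare the optimizer's own plan with a hinted plan.
SET STATISTICS IO ON;
SET STATISTICS PROFILE ON;
GO

-- 1. Let the Query Optimizer choose the access path.
SELECT stor_id, ord_date, qty
FROM sales
WHERE qty = 100;
GO

-- 2. Force a specific index for comparison only.
SELECT stor_id, ord_date, qty
FROM sales WITH (INDEX (qty_idx))
WHERE qty = 100;
GO

SET STATISTICS PROFILE OFF;
SET STATISTICS IO OFF;
GO
```

If the hinted version does not clearly reduce I/O or runtime, that is a sign the hint should not go into application code.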

Optimizer Hints

You can specify three types of hints in a query to override the decisions made by the Query Optimizer:

Table hints

Join hints

Query hints

The following sections examine and describe each type of hint.

Forcing Index Selection with Table Hints

In addition to locking hints that can be specified for each table in a query, SQL Server 2008 allows you to provide table-level hints that enable you to specify the index SQL Server should use for accessing the table. The syntax for specifying an index hint is as follows:

SELECT column_list FROM tablename WITH (INDEX (indid | index_name [, ...]) )

This syntax allows you to specify multiple indexes. You can specify an index by name or by ID. It is recommended that you specify indexes by name as the IDs for nonclustered indexes can change if they are dropped and re-created in a different order than that in which they were created originally. You can specify an index ID of 0, or the table name itself, to force a table scan.

When you specify multiple indexes in the hint list, all the indexes listed are used to retrieve the rows from the table, forcing an index intersection or index covering via an index join. If the collection of indexes listed does not cover the query, a regular row fetch is performed after all the indexed columns are retrieved.
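
The following sketch shows the three forms of the index hint just described. It assumes the sales table in the bigpubs2008 database and its ord_date_idx and qty_idx indexes; the search values are arbitrary.

```sql
-- Sketch: force a single index by name.
SELECT stor_id, ord_date, qty
FROM sales WITH (INDEX (ord_date_idx))
WHERE ord_date >= '20080101' AND ord_date < '20090101';

-- Sketch: index ID 0 forces a scan of the table.
SELECT stor_id, ord_date, qty
FROM sales WITH (INDEX (0))
WHERE ord_date >= '20080101' AND ord_date < '20090101';

-- Sketch: list two indexes to force both to be used together.
SELECT stor_id, ord_date, qty
FROM sales WITH (INDEX (ord_date_idx, qty_idx))
WHERE ord_date >= '20080101' AND ord_date < '20090101'
  AND qty = 100;
```

Because stor_id is not necessarily part of either nonclustered index, the last query may still need a row fetch after the two indexes are combined.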

To get a list of indexes on a table, you can use sp_helpindex. However, the stored procedure doesn’t display the index ID. To get a list of all user-defined tables and the names of the indexes defined on them, you can execute a query against the sys.indexes catalog view similar to the one shown in Listing 35.6, which was run against the bigpubs2008 database.

LISTING 35.6 Query Against sys.indexes Catalog View to Get Index Names and IDs

select 'Table name' = convert(char(20), object_name(object_id)),
       'Index name' = convert(char(30), name),
       'Index ID' = index_id,
       -- Reconstructed line: the Index Type column was lost at a page break;
       -- type_desc is assumed because it matches the Index Type output below
       'Index Type' = convert(char(15), type_desc)
from sys.indexes
where object_id > 99 -- only system tables have id less than 99
and index_id between 1 and 254 /* do not include rows for text columns
                                   or tables without a clustered index */
/* do not include auto statistics */
and is_hypothetical = 0
and objectproperty(object_id, 'IsUserTable') = 1
order by 1, 3
go

Table name           Index name                     Index ID Index Type
-------------------- ------------------------------ -------- ------------
authors              UPKCL_auidind                         1 CLUSTERED
authors              aunmind                               2 NONCLUSTERED
employee             employee_ind                          1 CLUSTERED
employee             PK_emp_id                             2 NONCLUSTERED
jobs                 PK__jobs__job_id__25319086            1 CLUSTERED
PARTS                PK__PARTS__09746778                   1 CLUSTERED
PARTS                UQ__PARTS__0A688BB1                   2 NONCLUSTERED
pub_info             UPKCL_pubinfo                         1 CLUSTERED
publishers           UPKCL_pubind                          1 CLUSTERED
roysched             titleidind                            2 NONCLUSTERED
sales                UPKCL_sales                           1 CLUSTERED
sales                titleidind                            2 NONCLUSTERED
sales                ord_date_idx                          7 NONCLUSTERED
sales                qty_idx                               8 NONCLUSTERED
sales_big            ci_sales_big                          1 CLUSTERED
sales_big            idx1                                  2 NONCLUSTERED
sales_noclust        idx1                                  2 NONCLUSTERED
sales_noclust        ord_date_idx                          3 NONCLUSTERED
sales_noclust        qty_idx                               4 NONCLUSTERED
stores               UPK_storeid                           1 CLUSTERED
stores               nc1_stores                            2 NONCLUSTERED
titleauthor          UPKCL_taind                           1 CLUSTERED
titleauthor          auidind                               2 NONCLUSTERED
titleauthor          titleidind                            3 NONCLUSTERED
titles               UPKCL_titleidind                      1 CLUSTERED
titles               titleind                              2 NONCLUSTERED
titles               ytd_sales_filtered                   11 NONCLUSTERED

SQL Server 2008 introduces the new FORCESEEK table hint, which provides an additional query optimization option. This hint specifies that the Query Optimizer use only an index seek operation as the access path to the data in the table or view referenced in the query rather than an index or table scan. If a query plan contains table or index scan operators, forcing an index seek operation may yield better query performance. This is especially true when inaccurate cardinality or cost estimations cause the optimizer to favor scan operations at plan compilation time.

Before using the FORCESEEK table hint, you should make sure that statistics on the table are current and accurate. Also, you should evaluate the query for items that can cause poor cardinality or cost estimates and remove these items if possible. For example, replace local variables with parameters or literals and limit the use of multistatement table-valued functions and table variables in the query.

Also, be aware that if you specify the FORCESEEK hint in addition to an index hint, the FORCESEEK hint can cause the optimizer to use an index other than one specified in the index hint.
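
A minimal sketch of the FORCESEEK hint follows, again assuming the sales table and qty_idx index from the bigpubs2008 database and an arbitrary search value.

```sql
-- Sketch: ask for an index seek and let the optimizer pick the index.
SELECT stor_id, ord_date, qty
FROM sales WITH (FORCESEEK)
WHERE qty = 100;

-- Sketch: combine FORCESEEK with an index hint. As noted above, check the
-- resulting plan, because the index actually used may not be the one named.
SELECT stor_id, ord_date, qty
FROM sales WITH (FORCESEEK, INDEX (qty_idx))
WHERE qty = 100;
```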

Forcing Join Strategies with Join Hints

Join hints let you force the type of join that should be used between two tables. The join hints correspond with the three types of join strategies:

LOOP

MERGE

HASH

You can specify join hints only when you use the ANSI-style join syntax, that is, when you actually use the keyword JOIN in the query. The hint is specified between the type of join and the keyword JOIN, which means you can’t leave out the keyword INNER for an inner join. Thus, the syntax for the FROM clause when using join hints is as follows:

FROM table1 {INNER | OUTER} {LOOP | MERGE | HASH} JOIN table2

The following example forces SQL Server to use a hash join:

select st.stor_name, ord_date, qty
from stores st INNER HASH JOIN sales s on st.stor_id = s.stor_id
where st.stor_id between 'B100' and 'B599'
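
The other join hints follow the same pattern. The sketch below reuses the stores and sales tables from the preceding example; note that for an outer join the hint is placed after the full join type, such as LEFT OUTER.

```sql
-- Sketch: force a nested loops join for an inner join.
select st.stor_name, s.ord_date, s.qty
from stores st INNER LOOP JOIN sales s on st.stor_id = s.stor_id
where st.stor_id between 'B100' and 'B599'

-- Sketch: force a merge join for a left outer join.
select st.stor_name, s.ord_date, s.qty
from stores st LEFT OUTER MERGE JOIN sales s on st.stor_id = s.stor_id
where st.stor_id between 'B100' and 'B599'
```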

You can also specify a global join hint for all joins in a query by using a query processing hint.

Specifying Query Processing Hints

SQL Server 2008 enables you to specify additional query hints to control how your queries are optimized and processed. You specify query hints at the end of a query by using the OPTION keyword. There can be only one OPTION clause per query, but you can specify multiple hints in an OPTION clause, as shown in the following syntax:

OPTION (hint1 [, ... hintn])

Query hints are grouped into four categories: GROUP BY, UNION, join, and miscellaneous.

GROUP BY Hints

GROUP BY hints specify how GROUP BY or COMPUTE operations should be performed. The following GROUP BY hints can be specified:

HASH GROUP: This option forces the Query Optimizer to use a hashing function to perform the GROUP BY operation.

ORDER GROUP: This option forces the Query Optimizer to use a sorting operation to perform the GROUP BY operation.

Only one GROUP BY hint can be specified at a time.
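
As a sketch of the two GROUP BY hints, the following queries aggregate the sales table from the bigpubs2008 database; the total_qty alias is illustrative.

```sql
-- Sketch: force a hash-based aggregate.
select stor_id, sum(qty) as total_qty
from sales
group by stor_id
OPTION (HASH GROUP)

-- Sketch: force a sort-based aggregate for the same query.
select stor_id, sum(qty) as total_qty
from sales
group by stor_id
OPTION (ORDER GROUP)
```

Both queries return the same rows; only the operator used to perform the grouping should differ in the execution plan.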

UNION Hints

The UNION hints specify how UNION operations should be performed. The following UNION hints can be specified:

MERGE UNION: This option forces the Query Optimizer to use a merge operation to perform the UNION operation.

HASH UNION: This option forces the Query Optimizer to use a hash operation to perform the UNION operation.

CONCAT UNION: This option forces the Query Optimizer to use the concatenation method to perform the UNION operation.

Only one UNION hint can be specified at a time, and it must come after the last query in the UNION. The following is an example of forcing concatenation for a UNION:

select stor_id from sales where stor_id like 'B19%'
UNION
select title_id from titles where title_id like 'C19%'
OPTION (CONCAT UNION)

Join Hints

The join hint specified in the OPTION clause specifies that all join operations in the query are performed as the type of join specified in the hint. The join hints that can be specified as query hints correspond to the table-level join hints:

LOOP JOIN

MERGE JOIN

HASH JOIN

If you also specify a join hint for a specific pair of tables, the table-level hints specified must be compatible with the query-level join hint.
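
A query-level join hint might be written as in the following sketch, which reuses the stores and sales tables from the earlier join example. The second query illustrates combining more than one hint in a single OPTION clause; the total_qty alias is illustrative.

```sql
-- Sketch: request hash joins for every join in the query.
select st.stor_name, s.ord_date, s.qty
from stores st join sales s on st.stor_id = s.stor_id
where st.stor_id between 'B100' and 'B599'
OPTION (HASH JOIN)

-- Sketch: several query hints in one OPTION clause.
select st.stor_name, sum(s.qty) as total_qty
from stores st join sales s on st.stor_id = s.stor_id
group by st.stor_name
OPTION (HASH JOIN, HASH GROUP)
```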

Miscellaneous Hints

The following miscellaneous hints can be used to override various query operations:

FORCE ORDER: This option tells the Query Optimizer to join the tables in the order in which they are listed in the FROM clause and not to determine the optimal join order.
