Microsoft SQL Server 2008 R2 Unleashed- P133 pdf



CHAPTER 35 Understanding Query Optimization

Query Text                                 Query Hash          Query Plan Hash
-----------------------------------------  ------------------  ------------------
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B
select * from titles where ytd_sales = 99  0x9AB21AC5889FE2D0  0xE889B5D23D917DFD
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B

This query hash or query plan hash value can be used in a query to aggregate performance
statistics for like queries. For example, the following query returns the average processing
time and logical reads for the same queries that were returned in Listing 35.2:

SELECT
    SUM(total_worker_time) / SUM(execution_count) / 1000.0 AS "Avg CPU Time(ms)",
    SUM(total_logical_reads) / SUM(execution_count) AS "Avg Reads"
FROM
    sys.dm_exec_query_stats
WHERE query_hash = 0x9AB21AC5889FE2D0
go

Avg CPU Time(ms)  Avg Reads
----------------  ---------
164.092000        7
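The arithmetic behind this aggregate can be sketched outside the database. The following Python fragment is only an illustration, not how SQL Server computes the result: the rows in SAMPLE_STATS are made-up values, chosen so that the first hash reproduces the averages shown above, and the function name averages_by_query_hash is hypothetical. It sums the counters per query hash and divides by the summed execution count, converting total_worker_time from microseconds to milliseconds.

```python
from collections import defaultdict

# Made-up rows shaped like a few sys.dm_exec_query_stats columns:
# (query_hash, execution_count, total_worker_time in microseconds, total_logical_reads)
SAMPLE_STATS = [
    ("0x9AB21AC5889FE2D0", 2, 300_000, 14),
    ("0x9AB21AC5889FE2D0", 1, 192_276, 7),
    ("0x1111111111111111", 4, 8_000, 40),
]


def averages_by_query_hash(rows):
    """Sum the counters per query hash, then divide by the summed execution count."""
    totals = defaultdict(lambda: [0, 0, 0])  # executions, worker time, logical reads
    for query_hash, executions, worker_us, reads in rows:
        entry = totals[query_hash]
        entry[0] += executions
        entry[1] += worker_us
        entry[2] += reads
    return {
        query_hash: {
            "avg_cpu_ms": worker_us / executions / 1000.0,  # microseconds -> milliseconds
            "avg_reads": reads / executions,
        }
        for query_hash, (executions, worker_us, reads) in totals.items()
    }


if __name__ == "__main__":
    for query_hash, stats in averages_by_query_hash(SAMPLE_STATS).items():
        print(query_hash, stats)
```

Because the totals are summed before dividing, plans that executed more often carry more weight in the average, which is the same behavior as SUM(...) / SUM(execution_count) in the query.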

Listing 35.5 provides a sample query using the query hash value to return information
about the top 25 queries ranked by average processing time.

LISTING 35.5 Returning Top 25 Queries Using Query Hash

SELECT TOP 25 query_stats.query_hash AS "Query Hash",
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count)
        AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM
    (SELECT QS.*,
        SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
            ((CASE QS.statement_end_offset
                WHEN -1 THEN DATALENGTH(ST.text)
                ELSE QS.statement_end_offset END
              - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;
GO

sys.dm_exec_plan_attributes

If you want to get information about specific attributes of a specific query plan, you use
sys.dm_exec_plan_attributes. This DMV takes a plan_handle as an input parameter (see
Listing 35.1 for an example of a query that you can use to retrieve a query's plan handle)
and returns one row for each attribute associated with the query plan. These attributes
include information such as the ID of the database context the query plan was generated
in, the ID of the user who generated the query plan, session SET options in effect at the
time the plan was generated, and so on. Many of these attributes are used as part of the
cache lookup key for the plan (indicated by the value 1 in the is_cache_key column).
Following is an example of the output for sys.dm_exec_plan_attributes:

SELECT convert(varchar(30), attribute) AS attribute,
       convert(varchar(12), value) AS value,
       is_cache_key
FROM
    sys.dm_exec_plan_attributes (0x06000400EBC44D2AB880A006000000000000000000000000)
WHERE is_cache_key = 1
go

attribute                      value        is_cache_key
------------------------------ ------------ ------------
set_options                    187          1
objectid                       709739755    1
dbid                           4            1
dbid_execute                   0            1
user_id                        -2           1
language_id                    0            1
date_format                    1            1
date_first                     7            1
compat_level                   100          1
status                         0            1
required_cursor_options        0            1
acceptable_cursor_options      0            1
merge_action_type              0            1
is_replication_specific        0            1
optional_spid                  0            1
optional_clr_trigger_dbid      0            1
optional_clr_trigger_objid     0            1

Note the attributes flagged as cache keys for the plan. If one of these properties does not
match the state of the current user session, the plan cannot be reused for that session, and
a new plan must be compiled and stored in the plan cache. If you see multiple plans in
cache for what appears to be the same query, you can determine the key differences
between them by comparing the columns associated with the plan's cache keys to see
where the differences lie.
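The comparison described above amounts to a key-by-key diff of two attribute sets. The following Python sketch assumes the cache-key rows for two plans have already been fetched into dictionaries. The helper name differing_cache_keys is hypothetical, the values in plan_a are taken from the sample output above, and the differing values in plan_b are invented purely for illustration.

```python
def differing_cache_keys(plan_a, plan_b):
    """Return {attribute: (value_a, value_b)} for attributes whose values differ."""
    differences = {}
    for attribute in sorted(set(plan_a) | set(plan_b)):
        value_a = plan_a.get(attribute)  # None if the attribute is missing
        value_b = plan_b.get(attribute)
        if value_a != value_b:
            differences[attribute] = (value_a, value_b)
    return differences


# Cache-key attributes for two cached plans of what appears to be the same query.
plan_a = {"set_options": 187, "dbid": 4, "user_id": -2,
          "language_id": 0, "date_format": 1, "date_first": 7}
# Hypothetical second plan: same query, different session settings.
plan_b = {"set_options": 4283, "dbid": 4, "user_id": -2,
          "language_id": 0, "date_format": 1, "date_first": 1}

if __name__ == "__main__":
    for attribute, (a, b) in differing_cache_keys(plan_a, plan_b).items():
        print(f"{attribute}: {a} vs {b}")
```

Any attribute reported by the diff is a candidate explanation for why the second session could not reuse the first plan.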

TIP

If SQL Server has been running for a while, with a lot of activity, the number of plans in
the plan cache can become quite large, resulting in a large number of rows being
returned by the plan cache DMVs. To run your own tests to determine which query
plans get cached and when specific query plans are reused, you should clear out the
cache occasionally. You can use the DBCC FREEPROCCACHE command to clear all
cached plans from memory. If you want to clear only the cached plans for objects or
queries in a specific database, you execute the following command:

DBCC FLUSHPROCINDB (dbid)

Keep in mind that you should run these commands only in a test environment. Running
these commands in production servers could impact the performance of the currently
running applications.

Other Query Processing Strategies

In addition to the optimization strategies covered so far, SQL Server also has some
additional strategies it can apply for special types of queries. These strategies are used to help
further reduce the cost of executing various types of queries.

Predicate Transitivity

You might be familiar with the transitive property from algebra. The transitive property
simply states that if A=B and B=C, then A=C. SQL Server supports the transitive property
in its query predicates. Predicate transitivity enables SQL Server to infer a join equality
from two given equalities. Consider the following example:

SELECT *

FROM table1 t1

join table2 t2 on t1.column1 = t2.column1

join table3 t3 on t2.column1 = t3.column1

Using the principle of predicate transitivity, SQL Server is able to infer that t1.column1 is
equal to t3.column1. This capability provides the Query Optimizer with another join
strategy to consider when optimizing this query. This might result in a much cheaper
execution plan.

The transitive property can also be applied to SARGs used on join columns. Consider the
following query:

select *
from sales s
join stores st on s.stor_id = st.stor_id
and s.stor_id = 'B199'

Again, using transitive closure, it follows that st.stor_id is also equal to 'B199'. SQL
Server recognizes this and can compare the search value against the statistics on both
tables to more accurately estimate the number of matching rows from each table.
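The inference in both examples can be sketched as grouping operands into equivalence classes. The following Python fragment is a minimal illustration using a union-find structure. It is an assumption about one way to compute a transitive closure and says nothing about how the Query Optimizer is implemented internally. The function name infer_equalities is hypothetical.

```python
from itertools import combinations


def infer_equalities(equalities):
    """Given (a, b) equality predicates, return the pairs implied but not stated."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in equalities:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two equivalence classes

    classes = {}
    for operand in list(parent):
        classes.setdefault(find(operand), set()).add(operand)

    stated = {frozenset(pair) for pair in equalities}
    inferred = set()
    for members in classes.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in stated:
                inferred.add((a, b))
    return inferred


if __name__ == "__main__":
    # Join example: t1 = t2 and t2 = t3 imply t1 = t3.
    print(infer_equalities([("t1.column1", "t2.column1"),
                            ("t2.column1", "t3.column1")]))
    # SARG example: s.stor_id = st.stor_id and s.stor_id = 'B199'.
    print(infer_equalities([("s.stor_id", "st.stor_id"),
                            ("s.stor_id", "'B199'")]))
```

Treating the constant 'B199' as just another operand is what lets the same closure produce the extra search argument on st.stor_id.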

Group by Optimization

One way SQL Server can process GROUP BY results is to retrieve the matching detailed data
rows into a worktable and then sort the rows and calculate the aggregates on the groups
formed. In SQL Server 2008, the Query Optimizer also may choose to use hashing to
organize the data into groups and then compute the aggregates.

The hash aggregation strategy uses the same basic method for grouping and calculating
aggregates as for a hash join. At the point where the probe input row is checked to
determine whether it already exists in the hash bucket, the aggregate is computed if a hash
match is found. The following pseudocode summarizes the hash aggregation strategy:

create a hash table
for each row in the input table
    read the row
    hash the key value
    search the hash table for matches
    if match found
        aggregate the value into the old record
    else
        insert the hashed key into the hash bucket
scan and output the hash table contents
drop the hash table
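The pseudocode above translates almost line for line into a runnable sketch. The following Python version is an illustration only: it uses a dictionary as the hash table, assumes SUM as the aggregate, and takes rows as (key, value) pairs. The function name hash_aggregate and the sample store IDs and quantities are hypothetical.

```python
def hash_aggregate(rows):
    """Group (key, value) rows by key and SUM the values using a hash table."""
    hash_table = {}                     # create a hash table
    for key, value in rows:             # for each row in the input: read the row
        if key in hash_table:           # the dict hashes the key and searches for a match
            hash_table[key] += value    # match found: aggregate into the old record
        else:
            hash_table[key] = value     # no match: insert the key as a new group
    return list(hash_table.items())     # scan and output the hash table contents


if __name__ == "__main__":
    # Hypothetical (stor_id, qty) rows.
    sales = [("7066", 50), ("7131", 20), ("7066", 75), ("8042", 10), ("7131", 25)]
    print(hash_aggregate(sales))
```

Each input row is read once and no sort is performed, so the groups come out in the order their keys were first encountered rather than in key order.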

For some join queries that contain GROUP BY clauses, SQL Server might perform the
grouping operation before processing the join. This could reduce the size of the input table to
the join and lower the overall cost of executing the query.

NOTE

One important point to keep in mind is that regardless of the GROUP BY strategy
employed, the rows are not guaranteed to be returned in sorted order by the grouping
column(s) as they were in earlier releases. If the results must be returned in a specific
sort order, you need to use the ORDER BY clause with GROUP BY to ensure ordered
results. You might want to get into the habit of doing this regularly.

Queries with DISTINCT

When the DISTINCT clause is specified in a query, SQL Server can eliminate duplicate rows
by sorting the result set in a worktable to identify and remove the duplicates, similar
to how a worktable is used for GROUP BY queries. In SQL Server 2008, the Query Optimizer
can also employ a hashing strategy similar to that used for GROUP BY to return only the
distinct rows before the final result set is determined.

In addition, if the Query Optimizer can determine at compile time that there will be no
possibility of duplicate rows in the result set (for example, each row contains the table's
primary key), the strategies for removing duplicate rows are skipped altogether.

Queries with UNION

When you specify UNION in a query, SQL Server merges the result sets, applying one of the
merge or concatenation operators with sorting strategies to remove any duplicate rows.
Figure 35.25 shows an example similar to the OR strategy where the rows are concatenated
and then sorted to remove any duplicates.

If you specify UNION ALL in a query, SQL Server simply appends the result sets together. No
intermediate sorting or merge step is needed to remove duplicates. Figure 35.26 shows the
same query as in Figure 35.25, except that a UNION ALL is specified.

When you know that you do not need to worry about duplicate rows in a UNION result set,
always specify UNION ALL to eliminate the extra overhead required for sorting.

When a UNION is used to merge large result sets together, SQL Server 2008 may opt to use a
merge join or hash match operation to remove any duplicate rows. Figure 35.27 shows an
example of a UNION query where the rows are concatenated, and then a hash match
operation is used to remove any duplicates.
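The difference in work between these strategies can be seen in a small sketch. The following Python functions are simplified stand-ins, not SQL Server's operators: union_all only concatenates, union_sort concatenates and then sorts so duplicates become adjacent, and union_hash concatenates and drops rows already seen in a hash set. All three function names are hypothetical, and rows are modeled as tuples.

```python
def union_all(*inputs):
    """Append the result sets together; no duplicate removal."""
    result = []
    for rows in inputs:
        result.extend(rows)
    return result


def union_sort(*inputs):
    """Concatenate, sort, and keep a row only if it differs from the previous one."""
    result = []
    for row in sorted(union_all(*inputs)):
        if not result or result[-1] != row:
            result.append(row)
    return result


def union_hash(*inputs):
    """Concatenate and use a hash set to discard rows that were already output."""
    seen = set()
    result = []
    for row in union_all(*inputs):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


if __name__ == "__main__":
    first = [(1, "x"), (2, "y")]
    second = [(2, "y"), (3, "z"), (1, "x")]
    print(union_all(first, second))   # five rows, duplicates kept
    print(union_sort(first, second))  # three distinct rows, in sorted order
    print(union_hash(first, second))  # three distinct rows, in first-seen order
```

The concatenation step is common to all three; only the duplicate-removal step adds cost, which is the overhead UNION ALL avoids.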

Parallel Query Processing

The query processor in SQL Server 2008 includes parallel query processing, an execution
strategy that can improve the performance of complex queries on computers with more
than one processor.

SQL Server inserts exchange operators into each parallel query to build and manage the
query execution plan. The exchange operator is responsible for providing process
management, data redistribution, and flow control. The exchange operators appear in
query plans as the Distribute Streams, Repartition Streams, and Gather Streams
logical operators. One or more of these operators can appear in the execution plan output
of a query plan for a parallel query.

FIGURE 35.25 An execution plan for a UNION query.

FIGURE 35.26 An execution plan for a UNION ALL query.

FIGURE 35.27 An execution plan for a UNION query, using a hash match to eliminate
duplicate rows.

Whereas a parallel query execution plan can use more than one thread, a serial execution
plan, used by a nonparallel query, uses only a single thread for its execution. Prior to
query execution time, SQL Server determines whether the current system state and
configuration allow for parallel query execution. If parallel query execution is justified, SQL
Server determines the optimal number of threads, called the degree of parallelism, and
distributes the query workload execution across those threads. The parallel query uses the
same number of threads until the query completes. SQL Server reexamines the optimal
degree of parallelism each time a query execution plan is retrieved from the procedure
cache. Individual instances of the same query could be assigned a different degree of
parallelism.

SQL Server calculates the degree of parallelism for each instance of a parallel query
execution by using the following criteria:

How many processors does the computer running SQL Server have, and how many
are allocated to SQL Server?
If two or more processors are allocated to SQL Server, it can use parallel queries.
The degree of parallelism is inversely related to CPU usage. The Query Optimizer
assigns a lower degree of parallelism if the CPUs are already busy.

Is sufficient memory available for parallel query execution?
Queries, like other processes, require resources to execute, particularly memory.
A parallel query demands more memory than a serial query. More
importantly, as the degree of parallelism increases, so does the amount of memory
required. The Query Optimizer carefully considers this in developing a query
execution plan. The Query Optimizer could either adjust the degree of parallelism or use a
serial plan to complete the query.

What is the type of query being executed?
Queries that consume a large number of CPU cycles justify using a parallel execution plan. Some
examples are joins of large tables, substantial aggregations, and sorting of large result
sets. The Query Optimizer determines whether to use a parallel or serial plan by
checking the value of the cost threshold for parallelism.

Are a sufficient number of rows processed in the given stream?
If the Query Optimizer determines that the number of rows in a stream is too low, it
does not execute a parallel plan. This prevents scenarios where the parallel execution
costs exceed the benefits of executing a parallel plan.

Regardless of the answers to the previous questions, the Query Optimizer does not use a
parallel execution plan for a query if any one of the following conditions is true:

The serial execution cost of the query is not high enough to consider an alternative
parallel execution plan.

A serial execution plan exists that is estimated to be faster than any possible parallel
execution plan for the particular query.

The query contains scalar or relational operators that cannot be run in parallel.

Parallel Query Configuration Options

Two server configuration options, maximum degree of parallelism and cost
threshold for parallelism, affect the consideration for a parallel query. Although doing so
is not recommended, you can change the default settings for each. For single processor
machines, these settings are ignored.

The maximum degree of parallelism option limits the number of threads to use in a
parallel plan execution. The range of possible values is 0 to 32. This value is configured to
0 by default, which allows the Query Optimizer to use up to the actual number of CPUs
allocated to SQL Server. If you want to suppress parallel processing completely, set the
value to 1.

The cost threshold for parallelism option establishes a threshold value the Query
Optimizer uses to consider parallel query execution plans. If the estimated cost to
execute a serial plan is greater than the value set for the cost threshold for parallelism, the
Query Optimizer considers a parallel plan. This value is defined by the estimated time, in seconds, to
execute the serial plan. The range of values for this setting is 0 to 32767. The default value
is 5. If the maximum degree of parallelism is set to 1, or if the computer has a single
processor, the cost threshold for parallelism value is ignored.
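How the two options interact can be summarized in a few lines. The following Python sketch is a deliberately simplified model of the rules stated above, not the Query Optimizer's actual logic, which also weighs memory, CPU load, and row counts. The function names considers_parallel_plan and thread_limit are hypothetical, and capping the thread limit at the CPU count when the option is set higher is an assumption of this sketch.

```python
def considers_parallel_plan(serial_cost, cost_threshold=5, max_dop=0, cpu_count=4):
    """Return True if a parallel plan is even considered under the stated rules."""
    if cpu_count < 2:
        return False  # single processor: both settings are ignored
    if max_dop == 1:
        return False  # a value of 1 suppresses parallel processing
    return serial_cost > cost_threshold  # serial estimate must exceed the threshold


def thread_limit(max_dop, cpu_count):
    """Upper bound on threads for a parallel plan; 0 means use all allocated CPUs."""
    if max_dop == 0:
        return cpu_count
    return min(max_dop, cpu_count)  # assumption: never more threads than CPUs


if __name__ == "__main__":
    print(considers_parallel_plan(serial_cost=12))             # above the default of 5
    print(considers_parallel_plan(serial_cost=3))              # below the default of 5
    print(considers_parallel_plan(serial_cost=12, max_dop=1))  # parallelism suppressed
    print(thread_limit(max_dop=2, cpu_count=8))
```

Passing this check only makes a parallel plan eligible. The optimizer can still choose a serial plan if it estimates the serial plan to be faster.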

You can modify the settings for the maximum degree of parallelism and the cost
threshold for parallelism server configuration options either by using the
sp_configure system stored procedure or through SSMS. To set the values with
sp_configure, run it from a query window in SSMS or from SQLCMD, as
follows:

USE master
go
exec sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
exec sp_configure 'max degree of parallelism', 2
exec sp_configure 'cost threshold for parallelism', 15
RECONFIGURE
GO

To set these configuration options via SSMS, right-click the SQL Server instance in the
Object Explorer and then click Properties. In the Server Properties dialog, select the
Advanced page. The parallelism options are near the bottom, as shown in Figure 35.28.

Identifying Parallel Queries

You can identify when a parallel execution plan is being chosen by displaying the
graphical execution plan in SSMS. The graphical execution plan uses icons to represent the
execution of specific statements and queries in SQL Server. The execution plan output for
every parallel query has at least one of these three logical operators:

Distribute Streams: Receives a single input stream of records and distributes
multiple output streams. The contents and form of the record are unchanged. All
records enter through the same single input stream and appear in one of the output
streams, preserving the relative order.

Gather Streams: Assembles multiple input streams of records and yields a single
output stream. The relative order of the records, contents, and form is maintained.

Repartition Streams: Accepts multiple input streams and produces multiple
streams of records. The record contents and format are unchanged.
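The data movement performed by the three operators can be modeled with plain lists. The following Python sketch is only a conceptual illustration: real exchange operators run across threads, whereas here each stream is a list processed sequentially. The function names, the round-robin distribution, and the modulo partitioning on an integer key are all assumptions chosen to keep the example deterministic.

```python
def distribute_streams(records, stream_count):
    """One input stream in, several output streams out (round-robin here)."""
    outputs = [[] for _ in range(stream_count)]
    for position, record in enumerate(records):
        outputs[position % stream_count].append(record)
    return outputs


def repartition_streams(streams, stream_count, key):
    """Several streams in, several streams out, regrouped by an integer key."""
    outputs = [[] for _ in range(stream_count)]
    for stream in streams:
        for record in stream:
            outputs[key(record) % stream_count].append(record)
    return outputs


def gather_streams(streams):
    """Several input streams in, a single output stream out."""
    return [record for stream in streams for record in stream]


if __name__ == "__main__":
    records = list(range(6))
    split = distribute_streams(records, 2)                        # two producer streams
    regrouped = repartition_streams(split, 3, key=lambda r: r)    # three consumer streams
    print(split, regrouped, gather_streams(regrouped))
```

In every step the records themselves are unchanged and each record lands in exactly one output stream. Only the assignment of records to streams differs.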


FIGURE 35.28 Setting SQL Server parallelism options

Figure 35.29 shows a portion of a sample query plan that uses parallel query techniques,
both repartition streams and gather streams.

Parallel Queries on Partitioned Objects

SQL Server 2008 provides improved query processing performance for partitioned objects
when running parallel plans, including changes in the way parallel and serial plans are
represented and enhancements to the partitioning information provided in both
compile-time and runtime execution plans. SQL Server 2008 also automates and improves the
thread partitioning strategy for parallel query execution plans on partitioned objects.

In addition to the performance improvements, query plan information has been improved
as well in SQL Server 2008, now providing the following information related to
partitioned objects:

The partitions accessed by the query, available in runtime execution plans.

An optional Partitioned attribute indicating that an operation, such as a seek,
scan, insert, update, merge, or delete, is performed on a partitioned table.

Summary information that provides a total count of the partitions accessed. This
information is available only in runtime plans.
