Microsoft SQL Server 2008 R2 Unleashed, Part 130


When the index union strategy is used on a heap table (such as the sales_noclust table), you see a query plan similar to the one shown in Figure 35.11. Notice that the merge join is replaced with a concatenation operation, and the stream aggregate is replaced with a distinct sort operation. Although the steps are slightly different from the index intersection strategy, the result is similar: a list of unique RIDs is returned, and they are used to retrieve the matching data rows in the table itself.
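To check which operators the optimizer actually chose for an OR across two columns on a heap, you can display the estimated plan as text. The sketch below assumes, hypothetically, that sales_noclust carries nonclustered indexes on qty and ord_date just as the sales table does; the search values are illustrative and are not taken from the figure.

```sql
-- Display the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GOs.
-- Assumption: sales_noclust has nonclustered indexes on qty and ord_date.
set showplan_text on
go
select * from sales_noclust
where qty = 816
   or ord_date = '1/2/2008'
go
-- In the plan text, look for a Concatenation operator feeding a
-- Sort(DISTINCT ...) operator, followed by RID Lookup operations on the heap.
set showplan_text off
go
```

If the predicates are not selective enough, the optimizer may simply scan the heap instead, in which case no Concatenation operator appears.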

When the OR in the query involves only a single column and a nonclustered index exists on the column, the Query Optimizer in SQL Server 2008 typically resolves the query with an index seek against the nonclustered index and then a bookmark lookup to retrieve the data rows. Consider the following query:

select * from sales
where ord_date in ('6/15/2005', '9/28/2008', '6/25/2008')

This query is the same as the following:

select * from sales
where ord_date = '6/15/2005'
or ord_date = '9/28/2008'
or ord_date = '6/25/2008'

To process this query, SQL Server performs a single index seek that looks for each of the search values and then joins the list of bookmarks returned with either the clustered index or the RIDs of the target table. No removal of duplicates is necessary because each OR condition matches a distinct set of rows. Figure 35.12 shows an example of the query plan for multiple OR conditions against a single column.

Index Joins

Besides using the index intersection and index union strategies, another way of using multiple indexes on a single table is to join two or more indexes to create a covering index. This is similar to an index intersection, except that the final bookmark lookup is not required because the merged index rows contain all the necessary information.

FIGURE 35.11 An execution plan for an index union strategy on a heap table


FIGURE 35.12 An execution plan using index seek to retrieve rows for an OR condition on a single column

For example, consider the following query:

select stor_id from sales
where qty = 816
and ord_date = '1/2/2008'

Again, the sales table contains indexes on both the qty and ord_date columns. Each of these indexes contains the clustered index key as a bookmark, and the clustered index key includes the stor_id column. In this instance, when the Query Optimizer merges the two indexes using a merge join, joining them on the matching clustered index keys, the index rows in the merge set have all the information needed to resolve the query because stor_id is part of the nonclustered indexes. There is no need to perform a bookmark lookup on the data page. By joining the two index result sets, SQL Server creates the same effect as having one covering index on qty, ord_date, and stor_id on the table. If you use the same numbers as in the “Index Intersection” section presented earlier, the cost of the index join would be as follows:

8 pages (1 root page + 1 intermediate page + 6 leaf pages to find all the bookmarks for the 1,200 matching index rows on qty)
+ 4 pages (1 root page + 1 intermediate page + 2 leaf pages to find all the bookmarks for the 212 matching index rows on ord_date)
= 12 pages total


Figure 35.13 shows an example of the execution plan for an index join. Notice that it does not include the bookmark lookup present in the index intersection execution plan (refer to Figure 35.8).
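The page counts above are an estimate built from assumed index depths. To see what SQL Server actually reads for the index join query, you can turn on I/O statistics. This is a sketch only; the numbers you get depend on the data and indexes in your copy of the sales table.

```sql
-- Report scan counts and logical/physical reads per table in the Messages output.
set statistics io on
go
-- Index join query from this section: both predicates are on indexed columns,
-- and stor_id is available from the clustered index key in each index row.
select stor_id from sales
where qty = 816
  and ord_date = '1/2/2008'
go
set statistics io off
go
```

Because both index seeks are against the same table, their reads are reported together on the line for sales rather than separately for each index.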

Optimizing with Indexed Views

In SQL Server 2008, when you create a unique clustered index on a view, the result set for the view is materialized and stored in the database with the same structure as a table that has a clustered index. Changes made to the data in the underlying tables of the view are automatically reflected in the view the same way as changes to a table are reflected in its indexes. In the Developer and Enterprise Editions of SQL Server 2008, the Query Optimizer automatically considers using the index on the view to speed up access for queries run directly against the view. The Query Optimizer in these editions also considers using the indexed view for searches against the underlying base tables, when appropriate.

NOTE

Although indexed views can be created in any edition of SQL Server 2008, they are considered for query optimization only in the Developer and Enterprise Editions of SQL Server 2008. In other editions of SQL Server 2008, indexed views are not used to optimize the query unless the view is explicitly referenced in the query and the NOEXPAND Query Optimizer hint is specified. For example, to force the Query Optimizer to consider using the sales_Qty_Rollup indexed view in the Standard Edition of SQL Server 2008, you execute the query as follows:

FIGURE 35.13 An execution plan for an index join


select * from sales_Qty_Rollup WITH (NOEXPAND)
where stor_id between 'B914' and 'B999'

The NOEXPAND hint is allowed only in SELECT statements, and the indexed view must be referenced directly in the query. (Only the Developer and Enterprise Editions consider using an indexed view that is not directly referenced in the query.) As always, you should use Query Optimizer hints with care. When the NOEXPAND hint is included in the query, the Query Optimizer cannot consider other alternatives for optimizing the query.
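Because automatic indexed view matching depends on the edition, it can help to confirm which edition a server is running before deciding whether NOEXPAND is needed at all. A minimal check using SERVERPROPERTY:

```sql
-- Edition returns the edition name as text.
-- EngineEdition returns 3 for the Enterprise-class engine (Enterprise,
-- Datacenter, Developer, and Evaluation), the editions in which the
-- optimizer can match indexed views without NOEXPAND.
select SERVERPROPERTY('Edition')       as edition,
       SERVERPROPERTY('EngineEdition') as engine_edition
```

On any other engine edition, plan on referencing the view directly with WITH (NOEXPAND) if you want its index to be used.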

Consider the following example, which creates an indexed view on the sales table containing stor_id and sum(qty) grouped by stor_id:

set quoted_identifier on
go
if object_id('sales_Qty_Rollup') is not null
    drop view sales_Qty_Rollup
go
create view sales_Qty_Rollup
with schemabinding
as
select stor_id, sum(qty) as total_qty, count_big(*) as id
from dbo.sales
group by stor_id
go
create unique clustered index idx1 on sales_Qty_Rollup (stor_id)
go

The creation of the clustered index on the view essentially creates a clustered table in the database with the three columns stor_id, total_qty, and id. As you would expect, the following query on the view itself retrieves the result rows directly from the clustered index on the view instead of having to scan or search the sales table itself:

select * from sales_Qty_Rollup

However, the following query on the sales table uses the indexed view sales_qty_rollup to retrieve the result set as well:

select stor_id, sum(qty)
from sales
group by stor_id


Essentially, the Query Optimizer recognizes the indexed view as another index on the sales table that covers the query. The execution plan in Figure 35.14 shows the indexed view being searched in place of the table.
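To confirm that sales_Qty_Rollup really is materialized, you can query its metadata. This sketch assumes the view was created in the dbo schema (the default for the script above when run as a dbo user).

```sql
-- IsIndexable = 1: the view definition qualifies for an index.
-- IsIndexed   = 1: the view actually has at least one index.
select OBJECTPROPERTY(OBJECT_ID('dbo.sales_Qty_Rollup'), 'IsIndexable') as is_indexable,
       OBJECTPROPERTY(OBJECT_ID('dbo.sales_Qty_Rollup'), 'IsIndexed')   as is_indexed

-- List the indexes on the view; idx1 should appear as a unique clustered index.
select name, type_desc, is_unique
from sys.indexes
where object_id = OBJECT_ID('dbo.sales_Qty_Rollup')
```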

NOTE

The seven SET options that must be set appropriately when an indexed view is created must also be set the same way for a session to be able to use the indexed view in queries. The required SET option settings are as follows:

SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF

If these SET options are not set appropriately for the session running a query that could make use of an indexed view, the indexed view is not used, and the table is searched instead.

For more information on indexed views, see Chapters 27, “Creating and Managing Views,” and 34, “Data Structures, Indexes, and Performance.”
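When an indexed view is unexpectedly ignored, a quick first check is whether the current session has the required settings. SESSIONPROPERTY accepts exactly these seven option names, so a sketch of that check looks like this:

```sql
-- SESSIONPROPERTY returns 1 when an option is ON and 0 when it is OFF.
-- For indexed view use, expect 1 in the first six columns and 0 in the last.
select SESSIONPROPERTY('ARITHABORT')              as arithabort,
       SESSIONPROPERTY('CONCAT_NULL_YIELDS_NULL') as concat_null_yields_null,
       SESSIONPROPERTY('QUOTED_IDENTIFIER')       as quoted_identifier,
       SESSIONPROPERTY('ANSI_NULLS')              as ansi_nulls,
       SESSIONPROPERTY('ANSI_PADDING')            as ansi_padding,
       SESSIONPROPERTY('ANSI_WARNINGS')           as ansi_warnings,
       SESSIONPROPERTY('NUMERIC_ROUNDABORT')      as numeric_roundabort
```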

FIGURE 35.14 An execution plan showing an indexed view being searched to satisfy a query on a base table


You might find rare situations in which using the indexed view in the Enterprise, Datacenter, or Developer Editions of SQL Server 2008 leads to poor query performance, and you might want to keep the Query Optimizer from using the indexed view. To force the Query Optimizer to ignore the indexed view(s) and optimize the query using the indexes on the underlying base tables, you specify the EXPAND VIEWS query option, as follows:

select * from sales_Qty_Rollup
OPTION (EXPAND VIEWS)
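EXPAND VIEWS is not limited to queries that name the view. Because it tells the optimizer not to substitute any indexed view for any part of the query, it also applies to the base-table aggregate shown earlier, which would otherwise be answered from sales_Qty_Rollup. A sketch:

```sql
-- Same aggregate as the earlier base-table query, but indexed view matching
-- is disabled, so the plan aggregates rows from sales (or its indexes) directly.
select stor_id, sum(qty)
from sales
group by stor_id
option (expand views)
```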

Optimizing with Filtered Indexes

SQL Server 2008 introduces the capability to define filtered indexes and statistics on a subset of rows rather than on the entire rowset in a table. This is done by specifying simple predicates in a WHERE clause of the CREATE INDEX statement to restrict the set of rows included in the index. Filtered statistics help solve a common problem in estimating the number of matching rows when the estimates become skewed due to a large number of duplicate values (or NULLs) in an index or due to data correlation between columns. Filtered indexes provide query optimization benefits when you frequently query specific subsets of your data rows.
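Filtered statistics use the same WHERE syntax, this time on CREATE STATISTICS. As a sketch, using the titles table from the example that follows and a hypothetical statistics name, statistics restricted to nonzero ytd_sales values could be created and inspected like this:

```sql
-- Hypothetical statistics name; only rows with nonzero ytd_sales are considered,
-- so the histogram is not dominated by the many rows where ytd_sales = 0.
create statistics ytd_sales_nonzero_stats
on titles (ytd_sales)
where ytd_sales <> 0
go
-- Display the header, density vector, and histogram of the new statistics.
dbcc show_statistics ('titles', ytd_sales_nonzero_stats)
go
```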

If a filtered index exists on a table, the optimizer recognizes when a search predicate is compatible with the filtered index and considers using the filtered index to optimize the query if the selectivity is good.

For example, the titles table in the bigpubs2008 database contains a large percentage of rows where ytd_sales is 0. A nonclustered index typically doesn’t help for searches in which ytd_sales is 0 because the selectivity isn’t adequate, and a table scan would be performed. A better approach, then, is to create a filtered index on ytd_sales that excludes the rows with a value of 0, which reduces the size of the index and makes it more efficient.

To see the difference, first create an unfiltered index on ytd_sales on the titles table:

create index ytd_sales_unfiltered on titles (ytd_sales)

Then, execute the following two queries:

select * from titles where ytd_sales = 0
select * from titles where ytd_sales = 10

As you can see in the query plan displayed in Figure 35.15, the query where ytd_sales = 0 still uses a table scan instead of the index because the selectivity is poor, whereas the query where ytd_sales = 10 uses the index.


Now, drop the unfiltered index and re-create a filtered index that excludes values of 0:

drop index titles.ytd_sales_unfiltered
go
create index ytd_sales_filtered on titles (ytd_sales)
where ytd_sales <> 0

Rerun the queries and examine the query plans again. Figure 35.16 shows that the query where ytd_sales = 0 still uses a table scan as before, but the query where ytd_sales = 10 is able to use the filtered index.

In this case, it may be beneficial to define the filtered index instead of a normal index on ytd_sales because, by excluding all the rows with ytd_sales values of 0, the filtered index requires less space and is more efficient, especially if the majority of the queries against the table search for nonzero ytd_sales values.

FIGURE 35.15 An execution plan showing the index not being used due to poor selectivity

FIGURE 35.16 An execution plan showing the filtered index being used
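To verify which indexes on titles are filtered, and with what predicate, you can query the sys.indexes catalog view, which in SQL Server 2008 exposes the has_filter and filter_definition columns:

```sql
-- has_filter is 1 for a filtered index; filter_definition holds the
-- predicate text as SQL Server stores it (NULL for unfiltered indexes).
select name, type_desc, has_filter, filter_definition
from sys.indexes
where object_id = OBJECT_ID('titles')
```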

NOTE

For more information on creating and using filtered indexes, see Chapter 34.

Join Selection

The job of the Query Optimizer is incredibly complex. The Query Optimizer can consider literally thousands of options when determining the optimal execution plan. The statistics are simply one of the tools that the Query Optimizer can use to help in the decision-making process.

In addition to examining the statistics to determine the most efficient access paths for SARGs and join clauses, the Query Optimizer must consider the optimum order in which to access the tables, the appropriate join algorithms to use, the appropriate sorting algorithms, and many other details too numerous to list here. The goal of the Query Optimizer during join selection is to determine the most efficient join strategy.

As mentioned at the beginning of this chapter, delving into the detailed specifics of the various join strategies and their costing algorithms is beyond the scope of a single chapter on optimization. In addition, some of these costing algorithms are proprietary and not publicly available. The goal of this section, then, is to present an overview of the most common query processing algorithms that the Query Optimizer uses to determine an efficient execution plan.

Join Processing Strategies

If you are familiar with SQL, you have probably used joins between tables in many of your queries. A join occurs any time the SQL Server Query Optimizer has to compare two inputs to determine an output. The join can occur between one table and another table, between an index and a table, or between an index and another index (as described previously, in the section “Index Intersection”).

The SQL Server Query Optimizer uses three primary types of join strategies when it must compare two inputs: nested loops joins, merge joins, and hash joins. The Query Optimizer must consider each one of these algorithms to determine the most appropriate and efficient algorithm for a given situation.

Any of the three supported join algorithms can be used for most join operations, although merge and hash joins require at least one equality predicate in the join condition. The Query Optimizer examines all the possible alternatives, assigns costs to each, and chooses the least expensive join algorithm for a given situation. Merge and hash joins often greatly improve the query processing performance for very large data tables and data warehouses.
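Because the optimizer costs all three algorithms, one way to see the trade-offs for a particular join is to force each algorithm in turn with a query-level hint and compare the plans and I/O. The sketch assumes a stores table that shares stor_id with sales and has a stor_name column. Treat the hints as an experiment only, because each one removes the other two algorithms from consideration.

```sql
set statistics io on
go
-- OPTION (LOOP JOIN | MERGE JOIN | HASH JOIN) restricts every join in the
-- query to the named algorithm; the optimizer still picks the join order.
select st.stor_name, s.qty
from sales s
join stores st on s.stor_id = st.stor_id
option (loop join)

select st.stor_name, s.qty
from sales s
join stores st on s.stor_id = st.stor_id
option (merge join)

select st.stor_name, s.qty
from sales s
join stores st on s.stor_id = st.stor_id
option (hash join)
go
set statistics io off
go
```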

Nested Loops Joins

The nested loops join algorithm is by far the simplest of the three join algorithms. The nested loops join uses one input as the “outer” loop and the other input as the “inner” loop. As you might expect, SQL Server processes the outer input one row at a time. For each row in the outer input, the inner input is searched for matching rows.

Figure 35.17 illustrates a query that uses a nested loops join.

Note that in the graphical execution plan, the outer loop is represented as the top input table, and the inner loop is represented as the bottom input table. In most instances, the Query Optimizer chooses the input table with the fewest qualifying rows as the outer loop to limit the number of iterative lookups against the inner table. However, the Query Optimizer may choose the input table with the greater number of qualifying rows as the outer table if the I/O cost of searching that table first and then performing the iterative loops on the other table is lower than the alternative.

The nested loops join is the easiest join strategy for which to estimate the I/O cost. The cost of the nested loops join is calculated as follows:

Number of I/Os to read in outer input
+ Number of matching rows × Number of I/Os per lookup on inner input
= Total logical I/O cost for query
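As a worked illustration of the formula, with hypothetical numbers chosen only for the arithmetic: suppose reading the outer input costs 20 page I/Os, 100 outer rows qualify, and each lookup into the inner input's index costs 3 I/Os (one root, one intermediate, and one leaf page).

```latex
\begin{aligned}
\text{Total logical I/O} &= 20 + 100 \times 3 \\
                         &= 20 + 300 = 320
\end{aligned}
```

With 10,000 qualifying outer rows and the same inner lookup cost, the lookup term alone grows to 10,000 × 3 = 30,000 I/Os, which is why this strategy loses its appeal as the outer input grows.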

FIGURE 35.17 An execution plan for a nested loops join

The Query Optimizer evaluates the I/O costs for the various possible join orders as well as […] join order. The nested loops join is efficient for queries that typically affect only a small number of rows. As the number of rows in the outer loop increases, the effectiveness of the nested loops join strategy diminishes. The reason is the increased number of logical I/Os required as the number of qualifying rows increases.

Also, if there are no useful indexes on the join columns, the nested loops join is not an efficient join strategy because it requires a table scan lookup on the inner table for each row in the outer table. Lacking useful indexes for the join, the Query Optimizer often opts to perform a merge or hash join.

Merge Joins

The merge join algorithm is much more effective than the nested loops join for dealing with large data volumes or when the lack of limiting SARGs or useful indexes on SARGs leads to a table scan of one or both tables involved in the join. A merge join works by retrieving one row from each input and comparing them, matching on the join column(s). Figure 35.18 illustrates a query that uses a merge join.

A merge join requires that both inputs be sorted on the merge columns: the columns specified in the equality (ON) clauses of the join predicate. A merge join does not work if either input is unsorted. In the query shown in Figure 35.18, both tables have a clustered index on stor_id, so the merge column (stor_id) is already sorted for each table. If the merge columns are not already sorted, a separate sort operation may be required before the merge join operation. When the inputs are sorted, the merge join operation retrieves a row from each input and compares them, returning the rows if they are equal. If the inputs are not equal, the lower-value row is discarded, and another row is obtained from that input. This process repeats until all rows have been processed.

Usually, the Query Optimizer chooses a merge join strategy, as in this example, when the

data volume is large and both columns are contained in an existing presorted index, such

FIGURE 35.18 An execution plan for a merge join
