When the index union strategy is used on a heap table (such as the sales_noclust table),
you see a query plan similar to the one shown in Figure 35.11. Notice that the merge join
is replaced with a concatenation operation, and the stream aggregate is replaced with a
distinct sort operation. Although the steps are slightly different from the index
intersection strategy, the result is similar: a list of unique RIDs is returned, and they are
used to retrieve the matching data rows in the table itself.
When the OR in the query involves only a single column and a nonclustered index exists
on the column, the Query Optimizer in SQL Server 2008 typically resolves the query with
an index seek against the nonclustered index and then a bookmark lookup to retrieve the
data rows. Consider the following query:
select * from sales
where ord_date in ('6/15/2005', '9/28/2008', '6/25/2008')
This query is the same as the following:
select * from sales
where ord_date = '6/15/2005'
or ord_date = '9/28/2008'
or ord_date = '6/25/2008'
To process this query, SQL Server performs a single index seek that looks for each of the
search values and then joins the list of bookmarks returned with either the clustered index
or the RIDs of the target table. No removal of duplicates is necessary because each OR
condition matches a distinct set of rows. Figure 35.12 shows an example of the query plan
for multiple OR conditions against a single column.
Index Joins
Besides using the index intersection and index union strategies, another way of using
multiple indexes on a single table is to join two or more indexes to create a covering
index. This is similar to an index intersection, except that the final bookmark lookup is
not required because the merged index rows contain all the necessary information.
FIGURE 35.11 An execution plan for an index union strategy on a heap table
FIGURE 35.12 An execution plan using index seek to retrieve rows for an OR condition on a
single column
Consider the following query:
select stor_id from sales
where qty = 816
and ord_date = '1/2/2008'
Again, the sales table contains indexes on both the qty and ord_date columns. Each of
these indexes contains the clustered index as a bookmark, and the clustered index
contains the stor_id column. In this instance, when the Query Optimizer merges the two
indexes using a merge join, joining them on the matching clustered indexes, the index
rows in the merge set have all the information needed to resolve the query because
stor_id is part of the nonclustered indexes. There is no need to perform a bookmark
lookup on the data page. By joining the two index result sets, SQL Server creates the same
effect as having one covering index on qty, ord_date, and stor_id on the table. If you use
the same numbers as in the “Index Intersection” section presented earlier, the cost of the
index join would be as follows:
8 pages (1 root page + 1 intermediate page + the 6 leaf pages to find all the
bookmarks for the 1,200 matching index rows on qty)
+ 4 pages (1 root page + 1 intermediate page + 2 leaf pages to find all the bookmarks
for the 212 matching index rows on ord_date)
= 12 pages in total
Figure 35.13 shows an example of the execution plan for an index join. Notice that it does
not include the bookmark lookup present in the index intersection execution plan (refer
to Figure 35.8).
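For comparison, the single covering index that the index join emulates could also be created explicitly. The following is only a sketch: the index name idx_sales_qty_date_stor is hypothetical, and it assumes the sales table and columns used in the preceding example. Whether an extra index is worth its maintenance cost depends on the workload.

```sql
-- Hypothetical covering index; the name is an assumption made for illustration.
-- With this index available, the optimizer can resolve the query from one index.
create nonclustered index idx_sales_qty_date_stor
    on sales (qty, ord_date, stor_id)
go
-- Report logical reads so the cost can be compared with the index join
set statistics io on
go
select stor_id from sales
where qty = 816
  and ord_date = '1/2/2008'
go
set statistics io off
go
```

Comparing the logical reads reported with and without the index shows whether the index join was already close to the cost of a true covering index.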
Optimizing with Indexed Views
In SQL Server 2008, when you create a unique clustered index on a view, the result set for
the view is materialized and stored in the database with the same structure as a table that
has a clustered index. Changes made to the data in the underlying tables of the view are
automatically reflected in the view the same way as changes to a table are reflected in its
indexes. In the Developer and Enterprise Editions of SQL Server 2008, the Query
Optimizer automatically considers using the index on the view to speed up access for
queries run directly against the view. In these editions, the Query Optimizer also
considers using the indexed view for searches against the underlying base table, when
appropriate.
FIGURE 35.13 An execution plan for an index join
NOTE
Although indexed views can be created in any edition of SQL Server 2008, they are
considered for query optimization only in the Developer and Enterprise Editions of SQL
Server 2008. In other editions of SQL Server 2008, indexed views are not used to
optimize the query unless the view is explicitly referenced in the query and the NOEXPAND
Query Optimizer hint is specified. For example, to force the Query Optimizer to consider
using the sales_Qty_Rollup indexed view in the Standard Edition of SQL Server
2008, you execute the query as follows:
select * from sales_Qty_Rollup WITH (NOEXPAND)
where stor_id between 'B914' and 'B999'
The NOEXPAND hint is allowed only in SELECT statements, and the indexed view must be
referenced directly in the query. (Only the Developer and Enterprise Editions consider
using an indexed view that is not directly referenced in the query.) As always, you
should use Query Optimizer hints with care. When the NOEXPAND hint is included in the
query, the Query Optimizer cannot consider other alternatives for optimizing the query.
Consider the following example, which creates an indexed view on the sales table,
containing stor_id and sum(qty) grouped by stor_id:
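set quoted_identifier on
go
-- Drop the view if it already exists so that the script can be rerun
if object_id('sales_Qty_Rollup') is not null
drop view sales_Qty_Rollup
go
-- SCHEMABINDING and two-part table names are required for an indexed view
create view sales_Qty_Rollup
with schemabinding
as
select stor_id, sum(qty) as total_qty, count_big(*) as id
from dbo.sales
group by stor_id
go
-- The unique clustered index materializes the view's result set
create unique clustered index idx1 on sales_Qty_Rollup (stor_id)
go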
The creation of the clustered index on the view essentially creates a clustered table in the
database with the three columns stor_id, total_qty, and id. As you would expect, the
following query on the view itself uses a clustered index seek on the view to retrieve the
result rows from the view instead of having to scan or search the sales table itself:
select * from sales_Qty_Rollup
where stor_id between 'B914' and 'B999'
However, the following query on the sales table uses the indexed view sales_qty_rollup
to retrieve the result set as well:
select stor_id, sum(qty)
from sales
where stor_id between 'B914' and 'B999'
group by stor_id
Essentially, the Query Optimizer recognizes the indexed view as another index
on the sales table that covers the query. The execution plan in Figure 35.14 shows the
indexed view being searched in place of the table.
NOTE
The seven required SET options must be set appropriately when the indexed view is
created, and they must also be set the same way for a session to be able to use the
indexed view in queries. The required SET option settings are as follows:
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF
If these SET options are not set appropriately for the session running a query that
could make use of an indexed view, the indexed view is not used, and the table is
searched instead.
For more information on indexed views, see Chapters 27, “Creating and Managing
Views,” and 34, “Data Structures, Indexes, and Performance.”
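One way to check a session's current settings before relying on an indexed view is the SESSIONPROPERTY function, which returns 1 when an option is ON and 0 when it is OFF. The following is a minimal sketch rather than a required procedure; the column aliases are arbitrary.

```sql
-- Every value should be 1 except numeric_roundabort, which should be 0
select sessionproperty('ARITHABORT')              as arithabort,
       sessionproperty('CONCAT_NULL_YIELDS_NULL') as concat_null_yields_null,
       sessionproperty('QUOTED_IDENTIFIER')       as quoted_identifier,
       sessionproperty('ANSI_NULLS')              as ansi_nulls,
       sessionproperty('ANSI_PADDING')            as ansi_padding,
       sessionproperty('ANSI_WARNINGS')           as ansi_warnings,
       sessionproperty('NUMERIC_ROUNDABORT')      as numeric_roundabort
go
```

If any value differs from the required setting, issuing the corresponding SET statement in the session before running the query brings it back in line.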
FIGURE 35.14 An execution plan showing an indexed view being searched to satisfy a query
on a base table
You might find rare situations when using the indexed view in the Enterprise, Datacenter,
or Developer Editions of SQL Server 2008 leads to poor query performance, and you might
want to avoid having the Query Optimizer use the indexed view. To force the Query
Optimizer to ignore the indexed view(s) and optimize the query using the indexes on the
underlying base tables, you specify the EXPAND VIEWS query option, as follows:
select * from sales_Qty_Rollup
where stor_id between 'B914' and 'B999'
OPTION (EXPAND VIEWS)
Optimizing with Filtered Indexes
SQL Server 2008 introduces the capability to define filtered indexes and statistics on a
subset of rows rather than on the entire rowset in a table. This is done by specifying
simple predicates in the index create statement to restrict the set of rows included in the
index. Filtered statistics help solve a common problem in estimating the number of
matching rows when the estimates become skewed due to a large number of duplicate
values (or NULLs) in an index or due to data correlation between columns. Filtered indexes
provide query optimization benefits when you frequently query specific subsets of your
data rows.
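Filtered statistics use the same kind of WHERE clause on CREATE STATISTICS. The following is a sketch only: the statistics name ytd_sales_nonzero_stats is hypothetical, and it assumes the titles table and ytd_sales column from the bigpubs2008 example used later in this section.

```sql
-- Hypothetical statistics name; covers only the rows with nonzero ytd_sales
create statistics ytd_sales_nonzero_stats
    on titles (ytd_sales)
    where ytd_sales <> 0
go
-- Display the statistics header, density vector, and histogram
dbcc show_statistics ('titles', ytd_sales_nonzero_stats)
go
```

Because the heavily duplicated value 0 is excluded, the histogram steps are spent on the remaining nonzero values.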
If a filtered index exists on a table, the optimizer recognizes when a search predicate is
compatible with the filtered index; it considers using the filtered index to optimize the
query if the selectivity is good.
For example, the titles table in the bigpubs2008 database contains a large percentage of
rows where ytd_sales is 0. A nonclustered index typically doesn’t help for searches in
which ytd_sales is 0 because the selectivity isn’t adequate, and a table scan would be
performed. A better approach is to create a filtered index on ytd_sales that excludes
the values of 0, which reduces the size of the index and makes it more efficient.
For example, first create an unfiltered index on ytd_sales on the titles table:
create index ytd_sales_unfiltered on titles (ytd_sales)
Then, execute the following two queries:
select * from titles where ytd_sales = 0
select * from titles where ytd_sales = 10
As you can see by the query plan displayed in Figure 35.15, a query where ytd_sales = 0
still uses a table scan instead of the index because the selectivity is poor, whereas it uses
the index for ytd_sales = 10.
Now, drop the unfiltered index and re-create a filtered index that excludes values of 0:
drop index titles.ytd_sales_unfiltered
go
create index ytd_sales_filtered on titles (ytd_sales)
where ytd_sales <> 0
Re-run the queries and examine the query plan again. Figure 35.16 shows that the
query where ytd_sales = 0 still uses a table scan as before, but the query where
ytd_sales = 10 is able to use the filtered index.
In this case, it may be beneficial to define the filtered index instead of a normal index on
ytd_sales because the filtered index will require less space and be a more efficient index
by excluding all the rows with ytd_sales values of 0, especially if the majority of the
queries against the table are searching for ytd_sales values that are nonzero.
FIGURE 35.15 An execution plan showing index not being used due to poor selectivity
FIGURE 35.16 An execution plan showing the filtered index being used
NOTE
For more information on creating and using filtered indexes, see Chapter 34.
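To confirm which indexes on a table are filtered and what predicate each one uses, you can query the sys.indexes catalog view. This is a small sketch, assuming the titles table from the example above.

```sql
-- has_filter is 1 for filtered indexes; filter_definition holds the predicate
select name, type_desc, has_filter, filter_definition
from sys.indexes
where object_id = object_id('titles')
go
```

For the ytd_sales_filtered index created earlier, has_filter should be 1 and filter_definition should show the predicate on ytd_sales.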
Join Selection
The job of the Query Optimizer is incredibly complex. The Query Optimizer can consider
literally thousands of options when determining the optimal execution plan. The statistics
are simply one of the tools that the Query Optimizer can use to help in the
decision-making process.
In addition to examining the statistics to determine the most efficient access paths for
SARGs and join clauses, the Query Optimizer must consider the optimum order in which
to access the tables, the appropriate join algorithms to use, the appropriate sorting
algorithms, and many other details too numerous to list here. The goal of the Query
Optimizer during join selection is to determine the most efficient join strategy.
As mentioned at the beginning of this chapter, delving into the detailed specifics of the
various join strategies and their costing algorithms is beyond the scope of a single chapter
on optimization. In addition, some of these costing algorithms are proprietary and not
publicly available. The goal of this section, then, is to present an overview of the most
common query processing algorithms that the Query Optimizer uses to determine an
efficient execution plan.
Join Processing Strategies
If you are familiar with SQL, you are probably very familiar with using joins between
tables in creating SQL queries. A join occurs any time the SQL Server Query Optimizer has
to compare two inputs to determine an output. The join can occur between one table and
another table, between an index and a table, or between an index and another index (as
described previously, in the section “Index Intersection”).
The SQL Server Query Optimizer uses three primary types of join strategies when it must
compare two inputs: nested loops joins, merge joins, and hash joins. The Query Optimizer
must consider each one of these algorithms to determine the most appropriate and
efficient algorithm for a given situation.
Each of the three supported join algorithms could be used for any join operation. The
Query Optimizer examines all the possible alternatives, assigns costs to each, and chooses
the least expensive join algorithm for a given situation. Merge and hash joins often
greatly improve the query processing performance for very large data tables and data
warehouses.
Nested Loops Joins
The nested loops join algorithm is by far the simplest of the three join algorithms. The
nested loops join uses one input as the “outer” loop and the other input as the “inner”
loop. As you might expect, SQL Server processes the outer input one row at a time. For
each row in the outer input, the inner input is searched for matching rows.
Figure 35.17 illustrates a query that uses a nested loops join.
Note that in the graphical execution plan, the outer loop is represented as the top input
table, and the inner loop is represented as the bottom input table. In most instances, the
Query Optimizer chooses the input table with the fewest qualifying rows to be
the outer loop to limit the number of iterative lookups against the inner table. However,
the Query Optimizer may choose the input table with the greater number of qualifying
rows as the outer table if the I/O cost of searching that table first and then performing the
iterative loops on the other table is lower than the alternative.
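To see how the join algorithm affects I/O, you can compare the plan the Query Optimizer picks on its own with one forced to use nested loops. The following is a sketch for experimentation only. It assumes a stores table with a stor_id column that joins to sales, which is an assumption because the text does not name the second table.

```sql
set statistics io on
go
-- Let the Query Optimizer choose the join algorithm
select s.stor_id, s.qty
from sales s
join stores st on st.stor_id = s.stor_id
where s.qty = 816
go
-- Force a nested loops join for comparison (experimentation only)
select s.stor_id, s.qty
from sales s
join stores st on st.stor_id = s.stor_id
where s.qty = 816
option (loop join)
go
set statistics io off
go
```

Join hints override the optimizer's costing, so they are better suited to investigating a plan than to production code.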
The nested loops join is the easiest join strategy for which to estimate the I/O cost. The
cost of the nested loops join is calculated as follows:
Number of I/Os to read in outer input
+ Number of matching rows × Number of I/Os per lookup on inner input
= Total logical I/O cost for query
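As a worked illustration of the formula, the following batch plugs in three made-up numbers. All values are hypothetical and chosen only to show the arithmetic; they do not come from the sample database.

```sql
-- All three numbers are hypothetical, chosen only to illustrate the formula
declare @outer_io      int = 10   -- I/Os to read in the outer input
declare @matching_rows int = 100  -- matching rows from the outer input
declare @io_per_lookup int = 3    -- I/Os per lookup on the inner input

-- 10 + (100 x 3) = 310 logical I/Os
select @outer_io + @matching_rows * @io_per_lookup as total_logical_io
go
```

Doubling the number of matching rows roughly doubles the total, which is why the choice of outer input matters so much for this strategy.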
FIGURE 35.17 An execution plan for a nested loops join
The Query Optimizer evaluates the I/O costs for the various possible join orders to
determine the most efficient join order. The nested loops join is efficient for queries that
typically affect only a small number of rows. As the number of rows in the outer loop
increases, the effectiveness of the nested loops join strategy diminishes. The reason is the
increased number of logical I/Os required as the number of qualifying rows increases.
Also, if there are no useful indexes on the join columns, the nested loops join is not an
efficient join strategy because it requires a table scan lookup on the inner table for each
row in the outer table. Lacking useful indexes for the join, the Query Optimizer often opts
to perform a merge or hash join.
Merge Joins
The merge join algorithm is much more effective than the nested loops join for dealing
with large data volumes or when the lack of limiting SARGs or useful indexes on SARGs
leads to a table scan of one or both tables involved in the join. A merge join works by
retrieving one row from each input and comparing them, matching on the join
column(s). Figure 35.18 illustrates a query that uses a merge join.
A merge join requires that both inputs be sorted on the merge columns, that is, the
columns specified in the equality (ON) clauses of the join predicate. A merge join does not
work unless both inputs are sorted. In the query shown in Figure 35.18, both tables have a
clustered index on stor_id, so the merge column (stor_id) is already sorted for each
table. If the merge columns are not already sorted, a separate sort operation may be
required before the merge join operation. When the input is sorted, the merge join
operation retrieves a row from each input and compares them, returning the rows if they are
equal. If the inputs are not equal, the lower-value row is discarded, and another row is
obtained from that input. This process repeats until all rows have been processed.
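To look at a merge join plan without running the query, you can request the estimated plan as text and force the algorithm with a query hint. This is a sketch under the same assumption as before: a stores table with a stor_id column is taken to be the second input, which the text does not confirm.

```sql
-- Return the estimated plan as text instead of executing the query
set showplan_text on
go
-- Force a merge join; if an input is not already ordered on stor_id,
-- the plan includes a Sort operator ahead of the Merge Join
select s.stor_id, s.qty
from sales s
join stores st on st.stor_id = s.stor_id
option (merge join)
go
set showplan_text off
go
```

When both inputs are read from indexes ordered on stor_id, no Sort operator should appear, which matches the presorted case described above.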
Usually, the Query Optimizer chooses a merge join strategy, as in this example, when the
data volume is large and both columns are contained in an existing presorted index, such
FIGURE 35.18 An execution plan for a merge join