In addition, the functions in the computed column must be deterministic. A deterministic function is one that returns the same result every time it is called with the same set of input parameters.
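To make the definition concrete, here is a minimal Python sketch. It is an illustration only, not SQL Server's own determinism analysis, and the function names are hypothetical. `title_upper` behaves like a deterministic expression. `title_with_sequence` depends on hidden state, much as GETDATE() depends on the clock. Repeated calls can expose nondeterminism but can never prove determinism, so a database engine has to reason about how a function is defined rather than test it empirically.

```python
import itertools

def title_upper(title: str) -> str:
    """Deterministic: the same input always yields the same output."""
    return title.upper()

# Hidden state that changes on every call (hypothetical example).
_sequence = itertools.count()

def title_with_sequence(title: str) -> str:
    """Nondeterministic: the output depends on how often it has been called."""
    return f"{title}#{next(_sequence)}"

def same_result_every_time(fn, arg, calls: int = 3) -> bool:
    """Call fn repeatedly with one argument and report whether results agree.

    This can expose nondeterminism but cannot prove determinism.
    """
    return len({fn(arg) for _ in range(calls)}) == 1

print(same_result_every_time(title_upper, "kitchens"))          # True
print(same_result_every_time(title_with_sequence, "kitchens"))  # False
```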
When you create a clustered index on a computed column, it is no longer a virtual column in the table. The computed value for the column is stored in the data rows of the table. If you create a nonclustered index on a computed column, the computed value is stored in the nonclustered index rows but not in the data rows, unless you also have a clustered index on the computed column.
Be aware of the overhead involved with indexes on computed columns. Updates to the columns that the computed columns are based on result in updates to the index on the computed column as well.
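The maintenance overhead can be pictured with a small Python model. The class `TableWithComputedIndex` and its counters are hypothetical, and SQL Server's storage engine is far more involved. The model shows only the bookkeeping shape: changing a base column that alters the computed value forces a delete and an insert in the index on that computed value.

```python
from collections import defaultdict

class TableWithComputedIndex:
    """Toy table whose computed column has a secondary index (hypothetical)."""

    def __init__(self, compute):
        self.compute = compute          # expression defining the computed column
        self.rows = {}                  # row_id -> base column value
        self.index = defaultdict(set)   # computed value -> row_ids
        self.index_writes = 0           # count of index maintenance operations

    def insert(self, row_id, value):
        self.rows[row_id] = value
        self.index[self.compute(value)].add(row_id)
        self.index_writes += 1

    def update(self, row_id, new_value):
        old_key = self.compute(self.rows[row_id])
        new_key = self.compute(new_value)
        self.rows[row_id] = new_value
        if old_key != new_key:
            # The computed value changed, so the index entry must move:
            # remove the old entry, then add the new one.
            self.index[old_key].discard(row_id)
            if not self.index[old_key]:
                del self.index[old_key]
            self.index[new_key].add(row_id)
            self.index_writes += 2

    def lookup(self, key):
        return sorted(self.index.get(key, ()))

t = TableWithComputedIndex(compute=len)  # computed column = length of base value
t.insert(1, "abc")
t.update(1, "abcd")
print(t.index_writes)  # 3
print(t.lookup(4))     # [1]
```

In this model, an update that leaves the computed value unchanged costs no index writes. An update that changes it costs two.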
Indexes on computed columns can be useful when you need an index on large character fields. As discussed earlier, the smaller an index, the more efficient it is. You could create a computed column on the large character field by using the CHECKSUM() function. CHECKSUM() generates a 4-byte integer that is relatively unique for character strings but not absolutely unique. (Different character strings can generate the same checksum, so when searching against the checksum, you need to include the character string as an additional search argument to ensure that you are matching the right row.) The benefit is that you can create an index on the 4-byte integer generated by CHECKSUM() and use it to search against the character string, instead of having to create an index on the large character column itself. Listing 34.7 shows an example of applying this solution.
LISTING 34.7 Using an Index on a Computed Checksum Column
-- The first statement is used to disable any previously created
-- DDL triggers in the database which would prevent creating a new constraint
DISABLE TRIGGER ALL ON DATABASE
go
-- First add the computed column to the table
alter table titles add title_checksum as CHECKSUM(title)
go
-- Next, create an index on the computed column
create index NC_titles_titlechecksum on titles(title_checksum)
go
-- In your queries, include both the checksum column and the title column in
-- your search argument
select title_id, ytd_sales
from titles
where title_checksum = checksum('Fifty Years in Buckingham Palace Kitchens')
and title = 'Fifty Years in Buckingham Palace Kitchens'
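The reason for the second search argument can be shown with a Python model. `toy_checksum` is a hypothetical, deliberately weak stand-in (a sum of character codes, so anagrams collide). SQL Server's CHECKSUM() uses a different algorithm, but it is likewise not collision free. The lookup seeks on the small integer first and then rechecks the full string, mirroring the two predicates in Listing 34.7.

```python
from collections import defaultdict

def toy_checksum(text: str) -> int:
    """Hypothetical stand-in for CHECKSUM(): sum of code points, kept to 4 bytes."""
    return sum(ord(ch) for ch in text) & 0xFFFFFFFF

# Synthetic titles keyed by title_id; "listen" and "silent" are anagrams,
# so they collide under toy_checksum.
titles = {
    "T1": "listen",
    "T2": "silent",
    "T3": "Fifty Years in Buckingham Palace Kitchens",
}

# The "index": small integer checksums instead of the long title strings.
checksum_index = defaultdict(list)
for title_id, title in titles.items():
    checksum_index[toy_checksum(title)].append(title_id)

def find_by_title(title: str) -> list:
    # Search argument 1: seek on the checksum (may include collisions).
    candidates = checksum_index.get(toy_checksum(title), [])
    # Search argument 2: compare the full string to discard false matches.
    return [tid for tid in candidates if titles[tid] == title]

print(checksum_index[toy_checksum("silent")])  # ['T1', 'T2']
print(find_by_title("silent"))                 # ['T2']
```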
SQL Server 2008 also supports persisted computed columns. With persisted computed columns, SQL Server stores the computed values in the table without requiring an index on the computed column. Like indexed computed columns, persisted computed columns are updated when any other columns on which the computed column depends are updated. Persisted computed columns allow you to create an index on a computed column that is defined with a deterministic, but imprecise, expression. This option enables you to create an index on a computed column when SQL Server cannot determine with certainty whether a function that returns a computed column expression (for example, a CLR function that is created in the Microsoft .NET Framework) is both deterministic and precise.
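Floating-point arithmetic shows the difference between deterministic and precise. SQL Server treats expressions involving float or real as imprecise. In the Python sketch below (function names hypothetical), each function is deterministic. Even so, two algebraically equivalent evaluation orders produce different binary results, so a stored value may not match the same expression computed another way.

```python
def sum_left_to_right(a: float, b: float, c: float) -> float:
    # Deterministic: the same inputs always give the same result.
    return (a + b) + c

def sum_right_to_left(a: float, b: float, c: float) -> float:
    # Algebraically identical, but binary floating point rounds differently.
    return a + (b + c)

print(sum_left_to_right(0.1, 0.2, 0.3) == sum_left_to_right(0.1, 0.2, 0.3))  # True
print(sum_left_to_right(0.1, 0.2, 0.3) == sum_right_to_left(0.1, 0.2, 0.3))  # False
```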
Filtered Indexes and Statistics
As discussed earlier in this chapter, a nonclustered index contains a row for every row in the table, including rows with heavily duplicated key values, for which the nonclustered index is not an effective way to find those rows. For these situations, SQL Server 2008 introduces filtered indexes. Filtered indexes are an optimized form of nonclustered indexes, created by specifying a search predicate when defining the index. This search predicate acts as a filter, so the index is built on only the data rows that match the predicate. This reduces the size of the index and produces an index well suited to covering queries that return only a small percentage of rows from a well-defined subset of data within your table.
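The size advantage can be sketched with a toy Python model. It does not reflect how SQL Server stores B-tree pages. It represents each index as a sorted list of (key, row locator) entries over synthetic data in which most orders predate the filter cutoff. The names `CUTOFF`, `sales`, and `seek_equal` are hypothetical.

```python
import bisect
from datetime import date, timedelta

CUTOFF = date(2008, 9, 1)  # filter predicate: ord_date >= '2008-09-01'

# Synthetic sales rows (row_id, ord_date): 900 older orders, 100 recent ones.
sales = [(rid, date(2007, 6, 1)) for rid in range(1, 901)]
sales += [(rid, date(2008, 9, 1 + rid % 30)) for rid in range(901, 1001)]

# Full-table index: one (key, row locator) entry per row.
full_index = sorted((d, rid) for rid, d in sales)
# Filtered index: entries only for rows that satisfy the filter predicate.
filtered_index = sorted((d, rid) for rid, d in sales if d >= CUTOFF)

def seek_equal(index, value):
    """Binary-search the sorted entries for key == value; return row ids."""
    lo = bisect.bisect_left(index, (value,))
    hi = bisect.bisect_left(index, (value + timedelta(days=1),))
    return [rid for _, rid in index[lo:hi]]

print(len(full_index), len(filtered_index))           # 1000 100
print(seek_equal(filtered_index, date(2008, 9, 15)))  # [914, 944, 974]
```

For keys inside the filter range, both indexes return the same rows. The filtered index holds one tenth of the entries.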
Filtered indexes can provide the following advantages over full-table indexes:
Improved query performance and plan quality: A well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. Filtered statistics are more accurate than full-table statistics because they cover only the rows contained in the filtered index.
Reduced index maintenance costs: A filtered index is maintained only when data modifications affect the data values contained in the index. Also, because a filtered index contains only the frequently accessed data, the smaller size of the index reduces the cost of updating the statistics.
Reduced index storage costs: Creating a filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary. You can replace a full-table nonclustered index with multiple filtered indexes without significantly increasing the storage requirements.
Following are some of the situations in which filtered indexes can be useful:
When a column contains mostly NULL values, but your queries search only for rows where data values are NOT NULL.
When a column contains a large number of duplicate values, but your queries
When you want to enforce uniqueness on a subset of values, for example, a column on which you want to allow NULL values. A unique constraint allows only one NULL value; however, a filtered index can be defined as unique over only the rows that are NOT NULL.
When queries retrieve only a particular range of data values and you want to index these values but not the entire table. For example, you have a table that contains a large number of historical values, but you want to search only values for the current year or quarter. You can create a filtered index on the desired range of values and possibly even use the INCLUDE option to add columns so your index fully covers your queries.
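The uniqueness case can be modeled in a few lines of Python. In T-SQL it corresponds to a CREATE UNIQUE INDEX statement with a WHERE column IS NOT NULL filter. The functions and the synthetic `isbns` values are hypothetical. `None` stands in for NULL, and the first function follows the unique-constraint behavior described above, where a second NULL counts as a duplicate.

```python
def violates_unique_constraint(values) -> bool:
    """Unique constraint: NULL (None) counts as a value, so two NULLs collide."""
    seen = set()
    for value in values:
        if value in seen:
            return True
        seen.add(value)
    return False

def violates_filtered_unique_index(values) -> bool:
    """Unique index filtered on IS NOT NULL: NULL rows are not in the index,
    so only the non-NULL values must be distinct."""
    return violates_unique_constraint(v for v in values if v is not None)

isbns = ["0-111", None, "0-222", None]          # synthetic column values
print(violates_unique_constraint(isbns))      # True: two NULLs
print(violates_filtered_unique_index(isbns))  # False: non-NULL values are distinct
```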
Now, you may be asking, “Can't some of the preceding solutions be accomplished using indexed views?” Yes, they can, but filtered indexes provide a better alternative. The most significant advantage is that filtered indexes can be used in any edition of SQL Server 2008, whereas indexed views are chosen by the optimizer only in the Developer, Enterprise, and Datacenter Editions unless you use the NOEXPAND hint in other editions. In addition, filtered indexes have reduced index maintenance costs (the query processor uses fewer CPU resources to update a filtered index than an indexed view); the Query Optimizer considers using a filtered index in more situations than the equivalent indexed view; you can perform online rebuilds of filtered indexes (online index rebuilds are not supported for indexed views); and filtered indexes can be nonunique, whereas the clustered index on an indexed view must be unique.
Based on these advantages, it is recommended that you use filtered indexes instead of indexed views when possible. Consider replacing indexed views with filtered indexes when the view references only one table, the view query doesn't return computed columns, and the view predicate uses simple comparison logic and doesn't contain a view.
Creating and Using Filtered Indexes
To define filtered indexes, you use the normal CREATE INDEX command but include a WHERE condition as a search predicate to specify which data rows the filtered index should include. In the current implementation, you can specify only simple search predicates such as IN; the comparison operators IS NULL, IS NOT NULL, =, <>, !=, >, >=, !>, <, <=, !<; and the logical operator AND. In addition, filtered indexes cannot be created on computed columns, user-defined data types, hierarchyid, or spatial types.
For example, assume you need to search the sales table in the bigpubs2008 database only for sales since 9/1/2008. The majority of the rows in the sales table have order dates prior to 9/1/2008. To create a filtered index on the ord_date column, you would execute a command like the following:
create index ord_date_filt on sales (ord_date)
WHERE ord_date >= '2008-09-01 00:00:00.000'
Now, let's look at a couple of queries that may or may not use the new filtered index. First, let's consider the following query looking for any sales for 9/15/2008:
select * from sales
where ord_date = '9/15/2008'
If you look at the execution plan in Figure 34.28, you can see that the filtered index, ord_date_filt, is used to locate the qualifying row values. The clustered index, UPKCL_sales, is used as the row locator to retrieve the data rows (as described earlier in the “Nonclustered Indexes” section).
FIGURE 34.28 Query plan for a query that uses a filtered index
NOTE
For more information on understanding and analyzing query plans, see Chapter 36.
If you run the following query using a data value that's outside the range of values stored in the filtered index, you see that the filtered index is not used (see Figure 34.29):
select * from sales
where ord_date = '8/15/2008'
FIGURE 34.29 Query plan for a query using a value not in the filtered index
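The difference between the two queries comes down to an implication check: the filtered index can answer a query only if every row the query could return also satisfies the index filter. The Python function below is a deliberately simplified, hypothetical model of that check for a lower-bounded date predicate. It does not reproduce SQL Server's actual matching logic. Passing the check is necessary but not sufficient, because the optimizer still weighs the filtered index against alternatives such as a clustered index scan.

```python
from datetime import date

FILTER_LOW = date(2008, 9, 1)  # index filter: ord_date >= '2008-09-01'

def filter_covers_query(query_low: date) -> bool:
    """Hypothetical check: are all rows matching the query in the filtered index?

    query_low is the smallest ord_date the query can return, for example the
    literal in "ord_date = X" or "ord_date >= X".
    """
    return query_low >= FILTER_LOW

print(filter_covers_query(date(2008, 9, 15)))  # True: inside the filter range
print(filter_covers_query(date(2008, 8, 15)))  # False: those rows are not indexed
```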
Now let's consider a query that you might expect would use the filtered index but does not:
select stor_id, qty from sales
where ord_date > '9/15/2008'
You might expect this query to use the filtered index because the data values are within the range of values for the filtered index. However, because of the number of rows that match, SQL Server determines that using the filtered nonclustered index to locate the matching rows and then retrieving the data rows through the clustered index row locators requires more I/Os than simply performing a clustered index scan of the entire table (the same query plan as shown in Figure 34.29).
In this case, you might want to use included columns on the filtered index so that the data values for the query can be retrieved using index covering, without incurring the extra cost of using the row locators to retrieve the actual data rows. The following example creates a filtered index on ord_date that includes stor_id and qty:
create index ord_date_filt2 on sales (ord_date)
INCLUDE (qty, stor_id)
WHERE ord_date >= '2008-09-01 00:00:00.000'
If you rerun the same query and examine the query plan, you see that the filtered index is used this time, and SQL Server uses index covering (see Figure 34.30). You can tell that it's using index covering with the ord_date_filt2 index because the clustered index is not used to retrieve the data rows. Using the row locators is unnecessary because all the information requested by the query can be retrieved from the index leaf rows, which also contain the values of the included columns.
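Here is a small Python model of index covering. The rows are synthetic and all names are hypothetical. One list mimics ord_date_filt (key plus row locator). The other mimics ord_date_filt2 (key, row locator, and the included qty and stor_id). Both are scanned linearly for brevity instead of seeking. The point is the lookup count: the covering version never revisits the base table.

```python
from datetime import date

CUTOFF = date(2008, 9, 1)

# Synthetic base table, clustered on row_id: row_id -> (ord_date, stor_id, qty).
base_rows = {
    1: (date(2008, 8, 20), "7066", 5),
    2: (date(2008, 9, 10), "7067", 3),
    3: (date(2008, 9, 20), "7131", 7),
    4: (date(2008, 9, 25), "8042", 2),
}

# Like ord_date_filt: key plus row locator only.
plain_index = sorted(
    (d, rid) for rid, (d, _, _) in base_rows.items() if d >= CUTOFF
)
# Like ord_date_filt2: key plus row locator plus INCLUDE (qty, stor_id).
covering_index = sorted(
    (d, rid, qty, stor) for rid, (d, stor, qty) in base_rows.items() if d >= CUTOFF
)

def query_with_lookups(after: date):
    """select stor_id, qty ... where ord_date > after, via row locators."""
    results, lookups = [], 0
    for d, rid in plain_index:
        if d > after:
            _, stor, qty = base_rows[rid]  # extra trip to the data row
            lookups += 1
            results.append((stor, qty))
    return results, lookups

def query_covered(after: date):
    """Same query answered from the index leaf entries alone."""
    results = [(stor, qty) for d, _, qty, stor in covering_index if d > after]
    return results, 0

print(query_with_lookups(date(2008, 9, 15)))  # ([('7131', 7), ('8042', 2)], 2)
print(query_covered(date(2008, 9, 15)))       # ([('7131', 7), ('8042', 2)], 0)
```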
Creating and Using Filtered Statistics
Similar to filtered indexes, SQL Server 2008 also lets you create filtered statistics. Like filtered indexes, filtered statistics are created over a subset of rows in the table based on a specified filter predicate. Creating a filtered index on a column automatically creates the corresponding filtered statistics. In addition, filtered statistics can be created explicitly by including the WHERE clause with the CREATE STATISTICS statement.
FIGURE 34.30 Query plan using index covering on a filtered index with included columns
Filtered statistics can be used to avoid a common issue with statistics where the cardinality estimation is skewed due to a large number of NULL or duplicate values, or due to a data correlation between columns. For example, let's consider the titles table in the bigpubs2008 database. All the cooking books (type = 'trad_cook' or 'mod_cook') are published by a single publisher (pub_id = '0877'). However, SQL Server stores column-level statistics on each of these columns independently of each other. Based on the statistics, SQL Server estimates there are six rows in the titles table where pub_id = '0877', and five rows where the type is either 'trad_cook' or 'mod_cook'.
However, let's assume you were to execute the following query:
select * from titles where pub_id = '0877'
and type in ('trad_cook', 'mod_cook')
When the Query Optimizer estimates the selectivity of this query, where each search predicate is part of an AND condition, it assumes the conditions are independent of one another and estimates the number of matching rows by taking the intersection of the two conditions. Essentially, it multiplies the selectivity of each of the two conditions together to determine the total selectivity. The titles table contains 537 rows, so the selectivity of each is 0.011 (6/537) and 0.009 (5/537), which, when multiplied together, comes out to approximately 0.0001. The optimizer therefore estimates that at most a single row will match. However, because all five cooking books are published by pub_id '0877', in actuality a total of five rows match.
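The estimate above can be reproduced with plain arithmetic. This Python snippet uses only the row counts from the text. It models the independence assumption, not SQL Server's full cardinality estimator.

```python
TOTAL_ROWS = 537   # rows in titles (the denominator used in the text)
PUB_ROWS = 6       # estimated rows where pub_id = '0877'
COOK_ROWS = 5      # estimated rows where type in ('trad_cook', 'mod_cook')

sel_pub = PUB_ROWS / TOTAL_ROWS
sel_cook = COOK_ROWS / TOTAL_ROWS

# Independence assumption: AND-ed predicates multiply their selectivities.
combined_sel = sel_pub * sel_cook
estimated_rows = combined_sel * TOTAL_ROWS

# Correlated reality: every cooking title belongs to pub_id '0877',
# so the intersection is simply the five cooking titles.
actual_rows = COOK_ROWS

print(f"{sel_pub:.3f} {sel_cook:.3f} {combined_sel:.4f}")       # 0.011 0.009 0.0001
print(f"estimated {estimated_rows:.3f} rows vs actual {actual_rows}")  # estimated 0.056 rows vs actual 5
```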
Now, in this example, the difference between one row and five rows is likely not significant enough to make a big difference in query performance, but a similar estimation error could be quite large with other data sets, leading the optimizer to possibly choose an inappropriate, and considerably more expensive, query plan.
Filtered statistics can help solve this problem by letting you capture these types of data correlations in your column statistics. For example, to capture the fact that all cooking books are also published by the same publisher, you could create the filtered statistics using the following statement:
create statistics pub_id_type on titles (pub_id, type)
where pub_id = '0877' and type in ('trad_cook', 'mod_cook')
When these filtered statistics are defined and the same query is run, SQL Server uses the filtered statistics to determine that the query will match five rows instead of only one. Although this solution could require defining a number of filtered statistics, it can be an effective way to fix your most critical queries where cardinality misestimates caused by data correlation or data skew are leading the Query Optimizer to choose poorly performing query plans.
Choosing Indexes: Query Versus Update Performance
I/O is the primary factor in determining query performance. The challenge for a database designer is to build a physical data model that provides efficient data access. Creating indexes on database tables allows SQL Server to access data with fewer I/Os. Defining useful indexes during the logical and physical data modeling step is crucial. The SQL Server Query Optimizer relies heavily on index key distribution and index density to determine which indexes to use for a query. The Query Optimizer in SQL Server can use multiple indexes in a query (through index intersection) to reduce the I/O required to retrieve information. In the absence of indexes, the Query Optimizer performs a table scan, which can be costly from an I/O standpoint.
Although indexes provide a means for faster access to data, they slow down data modification statements due to the extra overhead of having to maintain the index during inserts, updates, and deletes.
In a DSS environment, defining many indexes can help your queries and does not create much of a performance issue because the data is relatively static and doesn't get updated frequently. You typically load the data, create the indexes, and forget about it until the next data load. As long as you have the necessary indexes to support the user queries and they're getting decent response time, the penalties of having too many indexes in a DSS environment are the space wasted for indexes that possibly won't be used, the additional time required to create the excessive indexes, and the additional time required to back up and run DBCC checks on the data.
In an OLTP environment, on the other hand, too many indexes can lead to significant performance degradation, especially if the number of indexes on a table exceeds four or five. Think about it for a second. Every single-row insert is at least one data page write and one or more index page writes (depending on whether a page split occurs) for every index on the table. With eight nonclustered indexes, that would be a minimum of nine writes to the database for a single-row insert. Therefore, for an OLTP environment, you want as few indexes as possible: typically only the indexes required to support the update and delete operations and your critical queries, and to enforce your uniqueness constraints.
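The write-count arithmetic can be written as a tiny Python helper. The function name is hypothetical. It counts one data page write plus one leaf-page write per nonclustered index, giving a lower bound, because page splits only add writes.

```python
def min_writes_per_insert(nonclustered_indexes: int) -> int:
    """Lower bound on page writes for a single-row insert.

    One write for the data page plus one leaf-page write per nonclustered
    index; page splits would only add to this count.
    """
    return 1 + nonclustered_indexes

for n in (0, 4, 8):
    print(n, min_writes_per_insert(n))
# 0 1
# 4 5
# 8 9
```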
The natural solution, in a perfect world, would be to create a lot of indexes for a DSS environment and as few indexes as possible in an OLTP environment. Unfortunately, in the real world, you typically have an environment that must support both DSS and OLTP applications. How do you resolve the competing indexing requirements of the two environments? Meeting the indexing needs of DSS and OLTP applications requires a bit of a balancing act, with no easy solution. It often involves making hard decisions as to which DSS queries might have to live with table scans and which updates have to contend with additional overhead.
One solution is to have two separate databases: one for DSS applications and another for OLTP applications. Obviously, this approach requires some method of keeping the databases in sync. The method chosen depends on how up-to-date the DSS database has to be. If you can afford some lag time, you could consider using a dump-and-load mechanism, such as Log Shipping or periodic full database restores. If the DSS system requires up-to-the-minute data, you might want to consider using replication or database mirroring.
Another possible alternative is to have only the required indexes in place during normal processing periods to support the OLTP requirements. At the end of the business day, you can create the indexes necessary to support the DSS queries and reports, and they can run as batch jobs after normal processing hours. When the DSS reports are complete, you can drop the additional indexes, and you're ready for the next day's processing. Note that this solution assumes that the time required to create the additional indexes is offset by the time saved by the faster running of the DSS queries. If the additional indexes do not result in substantial time savings, they are probably not necessary and need not be created in the first place. In that case, the queries need to be examined more closely to select the indexes that best support them.
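The break-even condition for the nightly build-and-drop approach can be stated as a one-line Python check. The function name and the timings in the usage lines are hypothetical, chosen only to illustrate both outcomes.

```python
def nightly_indexes_pay_off(build_min: float, drop_min: float,
                            reports_with_min: float,
                            reports_without_min: float) -> bool:
    """True if building, using, and dropping the extra indexes beats
    running the reports without them."""
    return build_min + drop_min + reports_with_min < reports_without_min

# Hypothetical timings in minutes.
print(nightly_indexes_pay_off(30, 1, 20, 120))   # True: 51 < 120
print(nightly_indexes_pay_off(30, 1, 100, 120))  # False: 131 is not < 120
```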
As you can see, it is important to choose indexes carefully to provide a good balance between data search and data modification performance. The application environment usually governs the choice of indexes. For example, if the application is mainly OLTP with transactions requiring fast response time, creating too many indexes might have an adverse impact on performance. On the other hand, the application might be a DSS with few transactions doing data modifications. In that case, it makes sense to create a number of indexes on the columns frequently used in queries.
Identifying Missing Indexes
When developing an index design for your database and applications, you should make sure you create appropriate indexes for the various queries that will be executed against your tables. However, it can be quite a chore to identify all the queries you may need to create indexes for. Fortunately, SQL Server 2008 provides a couple of tools to help you identify any indexes you may need in your database: the Database Engine Tuning Advisor and the missing index dynamic management objects.
The Database Engine Tuning Advisor
The Database Engine Tuning Advisor is a tool that can analyze a SQL script file or a set of queries captured in a SQL Profiler trace and recommend changes to your indexing scheme. After performing its analysis, the Database Engine Tuning Advisor provides recommendations for new or more effective indexes, indexed views, and partitioning schemes, along with the estimated improvement in execution time should the recommendation be implemented. You can choose to implement the recommendations immediately or later, or you can save the SQL statements to a script file. For detailed information on using the Database Engine Tuning Advisor, see Chapter 55.
Although the Database Engine Tuning Advisor is a useful tool, and its recommendations have improved since it was introduced in SQL Server 2005, it still has some limitations. For one, because the Database Engine Tuning Advisor gathers statistics by sampling the data, repeatedly running the tool on the same workload may produce different results as different samples are used. In addition, if you impose constraints, such as specifying maximum disk space for tuning recommendations, the Database Engine Tuning Advisor may be forced to drop certain existing indexes, and the resulting recommendation may produce a negative expected improvement. The Database Engine Tuning Advisor may also not make recommendations under the following circumstances:
The table being tuned contains fewer than 10 data pages.
The recommended indexes would not offer enough improvement in query performance over the current physical database design.
The user who runs the Database Engine Tuning Advisor is not a member of the db_owner database role or the sysadmin fixed server role.
Missing Index Dynamic Management Objects
In addition to the Database Engine Tuning Advisor, SQL Server 2008 provides the missing index dynamic management objects to help identify potentially missing indexes in your database. This set of dynamic management objects, first introduced in SQL Server 2005, consists of the following:
sys.dm_db_missing_index_group_stats: Returns summary information about missing index groups, such as the performance improvements that could be gained by implementing a specific group of missing indexes.
sys.dm_db_missing_index_groups: Returns information about a specific group of missing indexes, such as the group identifier and the identifiers of all missing indexes contained in that group.
sys.dm_db_missing_index_details: Returns detailed information about a missing index; for example, it returns the name and identifier of the table where the index is missing, and the columns and column types that should make up the missing index.
sys.dm_db_missing_index_columns: Returns information about the database table columns that are missing an index.
After running a typical workload on SQL Server, you can query the dynamic management objects to retrieve information about possible missing indexes. Listing 34.8 provides a sample query that displays the missing index information for a query on the sales table that was run between 10:30 and 10:40 p.m. on 2/21/2010.
LISTING 34.8 Querying the Missing Index Dynamic Management Objects
SELECT
mig.index_group_handle as handle,
convert(varchar(30), statement) AS table_name,
convert(varchar(12), column_name) AS Column_name,
convert(varchar(10), column_usage) as ColumnUsage,
avg_user_impact as avg_impact
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mig.index_handle = mid.index_handle
inner join sys.dm_db_missing_index_group_stats AS migs
ON migs.group_handle = mig.index_group_handle
where mid.object_id = object_id('sales')
and last_user_seek between '2010-02-21 22:30' and '2010-02-21 22:40'
ORDER BY mig.index_group_handle, mig.index_handle, column_id;
GO
handle table_name                     Column_name  ColumnUsage avg_impact
------ ------------------------------ ------------ ----------- ----------
2      [bigpubs2008].[dbo].[sales]    stor_id      INCLUDE     87.46
2      [bigpubs2008].[dbo].[sales]    qty          INEQUALITY  87.46
In this example, the optimizer recommends an index on the qty column to support an inequality operator. It also recommends that the stor_id column be specified as an included column in the index. The avg_impact value indicates that implementing this index could reduce the average cost of the affected queries by an estimated 87.46%.
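Turning rows like these into an index definition is a manual step. The Python sketch below shows one way to assemble CREATE INDEX text from output shaped like Listing 34.8. The function and the `missing_ix_<handle>` naming are hypothetical. Placing EQUALITY columns before INEQUALITY columns is a common heuristic and an assumption here: the dynamic management objects do not prescribe a key order. Review the generated statement before running it.

```python
from collections import defaultdict

# Rows shaped like the Listing 34.8 output:
# (handle, table_name, column_name, column_usage)
rows = [
    (2, "[bigpubs2008].[dbo].[sales]", "stor_id", "INCLUDE"),
    (2, "[bigpubs2008].[dbo].[sales]", "qty", "INEQUALITY"),
]

def build_create_index(rows):
    """Assemble one CREATE INDEX statement per missing index group.

    Assumption: EQUALITY columns precede INEQUALITY columns in the key.
    Index names are hypothetical.
    """
    groups = defaultdict(
        lambda: {"table": "", "EQUALITY": [], "INEQUALITY": [], "INCLUDE": []}
    )
    for handle, table, column, usage in rows:
        group = groups[handle]
        group["table"] = table
        group[usage].append(column)

    statements = []
    for handle in sorted(groups):
        group = groups[handle]
        keys = group["EQUALITY"] + group["INEQUALITY"]
        stmt = f"CREATE INDEX missing_ix_{handle} ON {group['table']} ({', '.join(keys)})"
        if group["INCLUDE"]:
            stmt += f" INCLUDE ({', '.join(group['INCLUDE'])})"
        statements.append(stmt)
    return statements

for stmt in build_create_index(rows):
    print(stmt)
# CREATE INDEX missing_ix_2 ON [bigpubs2008].[dbo].[sales] (qty) INCLUDE (stor_id)
```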
Although the missing index feature provides some helpful information for identifying potentially missing indexes in your database, it too has a few limitations:
It is not intended to fine-tune the existing indexes, only to recommend additional indexes when no useful index is found that can be used to satisfy a search or join condition.
It reports only included columns for some queries. You need to determine whether the included columns should be specified as additional index key columns instead.
It may return different costs for the same missing index group for different executions.
It does not suggest filtered indexes.
It is unable to provide recommendations for clustered indexes, indexed views, or table partitioning (you should use the Database Engine Tuning Advisor instead for these recommendations).
Probably the key limitation is that although the missing index feature is helpful for identifying indexes that may be useful for you to define, it's not a substitute for a well-thought-out index design.
Missing Index Feature Versus Database Engine Tuning Advisor
The missing index dynamic management objects are a lightweight, server-side, always-on feature for identifying and correcting potential indexing oversights. The Database Engine Tuning Advisor, on the other hand, is a comprehensive client-side tool that can be used to assess the physical database design and recommend new physical design structures for improving performance, including not only indexes but also indexed views or partitioning schemes.
The Database Engine Tuning Advisor and the missing index feature can return different recommendations, even for a single-query workload. The reason is that the index key column recommendations from the missing index dynamic management objects are not order sensitive. The Database Engine Tuning Advisor recommendations, on the other hand, include an ordering of the key columns for indexes to optimize query performance.