Microsoft SQL Server 2008 R2 Unleashed- P126 doc

In addition, the functions in the computed column must be deterministic. A deterministic function is one that returns the same result every time it is called with the same set of input parameters.

When you create a clustered index on a computed column, it is no longer a virtual column in the table. The computed value for the column is stored in the data rows of the table. If you create a nonclustered index on a computed column, the computed value is stored in the nonclustered index rows but not in the data rows, unless you also have a clustered index on the computed column.

Be aware of the overhead involved with indexes on computed columns. Updates to the columns that the computed columns are based on result in updates to the index on the computed column as well.

Indexes on computed columns can be useful when you need an index on large character fields. As discussed earlier, the smaller an index, the more efficient it is. You could create a computed column on the large character field by using the CHECKSUM() function. CHECKSUM() generates a 4-byte integer that is relatively unique for character strings but not absolutely unique. (Different character strings can generate the same checksum, so when searching against the checksum, you need to include the character string as an additional search argument to ensure that you are matching the right row.) The benefit is that you can create an index on the 4-byte integer generated by CHECKSUM() that can be used to search against the character string instead of having to create an index on the large character column itself. Listing 34.7 shows an example of applying this solution.

LISTING 34.7 Using an Index on a Computed Checksum Column

-- The first statement is used to disable any previously created
-- DDL triggers in the database which would prevent creating a new constraint
DISABLE TRIGGER ALL ON DATABASE
go
-- First add the computed column to the table
alter table titles add title_checksum as CHECKSUM(title)
go
-- Next, create an index on the computed column
create index NC_titles_titlechecksum on titles(title_checksum)
go
-- In your queries, include both the checksum column and the title column in
-- your search argument
select title_id, ytd_sales
from titles
where title_checksum = checksum('Fifty Years in Buckingham Palace Kitchens')
and title = 'Fifty Years in Buckingham Palace Kitchens'
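
Before relying on an index over a computed column, it can help to confirm that SQL Server considers the column deterministic and indexable. The following is a minimal sketch, assuming the title_checksum column from Listing 34.7 exists and that titles lives in the dbo schema of the current database. COLUMNPROPERTY returns 1 when a property holds and 0 when it does not.

```sql
-- Sketch: inspect the computed column added in Listing 34.7.
-- Assumption: titles is in the dbo schema of the current database.
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.titles'), 'title_checksum', 'IsComputed')      AS is_computed,
    COLUMNPROPERTY(OBJECT_ID('dbo.titles'), 'title_checksum', 'IsDeterministic') AS is_deterministic,
    COLUMNPROPERTY(OBJECT_ID('dbo.titles'), 'title_checksum', 'IsPrecise')       AS is_precise,
    COLUMNPROPERTY(OBJECT_ID('dbo.titles'), 'title_checksum', 'IsIndexable')     AS is_indexable;
```

A result of NULL for any column usually means the table or column name did not resolve.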

SQL Server 2008 also supports persisted computed columns. With persisted computed columns, SQL Server stores the computed values in the table without requiring an index on the computed column. Like indexed computed columns, persisted computed columns are updated when any other columns on which the computed column depends are updated.

Persisted computed columns allow you to create an index on a computed column that is defined with a deterministic, but imprecise, expression. This option enables you to create an index on a computed column when SQL Server cannot determine with certainty whether a function that returns a computed column expression (for example, a CLR function that is created in the Microsoft .NET Framework) is both deterministic and precise.
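
As a rough illustration of the persisted option, the sketch below adds a computed column whose expression is deterministic but imprecise because it involves the float data type, and then indexes it. The column name ytd_sales_est, the index name, and the 1.1 multiplier are hypothetical and chosen only for this example.

```sql
-- Hypothetical example: ytd_sales_est and NC_titles_ytd_sales_est are illustrative names.
-- A float expression is deterministic but imprecise.
ALTER TABLE titles
    ADD ytd_sales_est AS (CAST(ytd_sales AS float) * 1.1e0) PERSISTED;
GO
-- Because the column is PERSISTED, the index is allowed
-- even though the expression is imprecise.
CREATE INDEX NC_titles_ytd_sales_est ON titles (ytd_sales_est);
GO
```

Without the PERSISTED keyword, the CREATE INDEX statement would be expected to fail because the column could not be used as an index key.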

Filtered Indexes and Statistics

As discussed earlier in this chapter, a nonclustered index contains a row for every row in the table, even rows with a large number of duplicate key values where the nonclustered index will not be an effective method for finding those rows. For these situations, SQL Server 2008 introduces filtered indexes. Filtered indexes are an optimized form of nonclustered indexes, created by specifying a search predicate when defining the index. This search predicate acts as a filter to create the index on only the data rows that match the search predicate. This reduces the size of the index and essentially creates an index that covers your queries, which return only a small percentage of rows from a well-defined subset of data within your table.

Filtered indexes can provide the following advantages over full-table indexes:

- A well-designed filtered index improves query performance and execution plan quality because it is smaller than a full-table nonclustered index and has filtered statistics. Filtered statistics are more accurate than full-table statistics because they cover only the rows contained in the filtered index.

- A filtered index is maintained only when data modifications affect the data values contained in the index. Also, because a filtered index contains only the frequently accessed data, the smaller size of the index reduces the cost of updating the statistics.

- A filtered index can reduce disk storage for nonclustered indexes when a full-table index is not necessary. You can replace a full-table nonclustered index with multiple filtered indexes without significantly increasing the storage requirements.

Following are some of the situations in which filtered indexes can be useful:

- When a column contains mostly NULL values, but your queries search only for rows where data values are NOT NULL.

- When a column contains a large number of duplicate values, but your queries search only for the less common values.

- When you want to enforce uniqueness on a subset of values, for example, a column on which you want to allow NULL values. A unique constraint allows only one NULL value; however, a filtered index can be defined as unique over only the rows that are NOT NULL.

- When queries retrieve only a particular range of data values and you want to index these values but not the entire table. For example, you have a table that contains a large number of historical values, but you want to search only values for the current year or quarter. You can create a filtered index on the desired range of values and possibly even use the INCLUDE option to add columns so your index fully covers your queries.
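
The uniqueness case in the list above can be sketched as follows. The employee_badge table, its columns, and the index name are hypothetical and exist only to illustrate the technique; they are not part of the bigpubs2008 database.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.employee_badge
(
    emp_id   int         NOT NULL PRIMARY KEY,
    badge_no varchar(10) NULL      -- many employees have no badge yet
);
GO
-- Uniqueness is enforced only for rows where badge_no is NOT NULL,
-- so any number of rows may leave badge_no as NULL.
CREATE UNIQUE NONCLUSTERED INDEX UQ_employee_badge_badge_no
    ON dbo.employee_badge (badge_no)
    WHERE badge_no IS NOT NULL;
GO
INSERT INTO dbo.employee_badge (emp_id, badge_no) VALUES (1, NULL);
INSERT INTO dbo.employee_badge (emp_id, badge_no) VALUES (2, NULL);   -- succeeds: NULLs fall outside the filter
INSERT INTO dbo.employee_badge (emp_id, badge_no) VALUES (3, 'B100');
INSERT INTO dbo.employee_badge (emp_id, badge_no) VALUES (4, 'B100'); -- fails with a duplicate key error
```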

Now, you may be asking, “Can’t some of the preceding solutions be accomplished using indexed views?” Yes, they can, but filtered indexes provide a better alternative. The most significant advantage is that filtered indexes can be used in any edition of SQL Server 2008, whereas indexed views are chosen by the optimizer only in the Developer, Enterprise, and Datacenter Editions unless you use the NOEXPAND hint in other editions. In addition, filtered indexes have reduced index maintenance costs (the query processor uses fewer CPU resources to update a filtered index than an indexed view); the Query Optimizer considers using a filtered index in more situations than the equivalent indexed view; you can perform online rebuilds of filtered indexes (online index rebuilds are not supported for indexed views); and filtered indexes can be nonunique, whereas indexed views must be unique.

Based on these advantages, it is recommended that you use filtered indexes instead of indexed views when possible. Consider replacing indexed views with filtered indexes when the view references only one table, the view query doesn’t return computed columns, and the view predicate uses simple comparison logic and doesn’t contain a view.

Creating and Using Filtered Indexes

To define filtered indexes, you use the normal CREATE INDEX command but include a WHERE condition as a search predicate to specify which data rows the filtered index should include. In the current implementation, you can specify only simple search predicates such as IN; the comparison operators IS NULL, IS NOT NULL, =, <>, !=, >, >=, !>, <, <=, !<; and the logical operator AND. In addition, filtered indexes cannot be created on computed columns, user-defined data types, hierarchyid, or spatial types.

For example, assume you need to search only the sales table in the bigpubs2008 database for sales since 9/1/2008. The majority of the rows in the sales table have order dates prior to 9/1/2008. To create a filtered index on the ord_date column, you would execute a command like the following:

create index ord_date_filt on sales (ord_date)
WHERE ord_date >= '2008-09-01 00:00:00.000'
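
To confirm that the index was created with the intended filter, you can look at the has_filter and filter_definition columns of the sys.indexes catalog view. This is a small sketch that assumes the ord_date_filt index above was created in the current database.

```sql
-- List the filtered indexes on the sales table along with their filter predicates.
SELECT name, type_desc, has_filter, filter_definition
FROM sys.indexes
WHERE object_id = OBJECT_ID('sales')
  AND has_filter = 1;
```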

FIGURE 34.29 Query plan for a query using a value not in the filtered index

Now, let’s look at a couple of queries that may or may not use the new filtered index. First, let’s consider the following query looking for any sales for 9/15/2008:

select * from sales
where ord_date = '9/15/2008'

If you look at the execution plan in Figure 34.28, you can see that the filtered index, ord_date_filt, is used to locate the qualifying row values. The clustered index, UPKCL_sales, is used as the row locator to retrieve the data rows (as described earlier in the “Nonclustered Indexes” section).

NOTE

For more information on understanding and analyzing query plans, see Chapter 36.

If you run the following query using a data value that’s outside the range of values stored in the filtered index, you see that the filtered index is not used (see Figure 34.29):

select * from sales
where ord_date = '8/15/2008'

FIGURE 34.28 Query plan for a query that uses a filtered index

Now let’s consider a query that you might expect would use the filtered index but does not:

select stor_id, qty from sales
where ord_date > '9/15/2008'

You might expect that this query would use the filtered index because the data values are within the range of values for the filtered index. However, due to the number of rows that match, SQL Server determines that using the filtered nonclustered index to locate the matching rows and then retrieving the data rows using the clustered index row locators requires more I/Os than simply performing a clustered index scan of the entire table (the same query plan as shown in Figure 34.29).
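
One way to see the cost difference for yourself is to run the query twice, once as written and once with an index hint, and compare the logical reads reported by SET STATISTICS IO. The following is only a sketch: it assumes the ord_date_filt index exists, and it writes the date as '20080915' so the literal does not depend on the session's date format setting.

```sql
SET STATISTICS IO ON;
GO
-- Plan chosen by the optimizer (a clustered index scan, per the discussion above)
SELECT stor_id, qty FROM sales
WHERE ord_date > '20080915';
GO
-- Same query, forced to use the filtered index so the logical reads can be compared
SELECT stor_id, qty FROM sales WITH (INDEX (ord_date_filt))
WHERE ord_date > '20080915';
GO
SET STATISTICS IO OFF;
GO
```

The hint is useful for experiments like this one; leaving it in production code would override the optimizer's cost-based choice.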

In this case, you might want to use included columns on the filtered index so that the data values for the query can be retrieved using index covering without incurring the extra cost of using the row locators to retrieve the actual data rows. The following example creates a filtered index on ord_date that includes stor_id and qty:

create index ord_date_filt2 on sales (ord_date)
INCLUDE (qty, stor_id)
WHERE ord_date >= '2008-09-01 00:00:00.000'

If you rerun the same query and examine the query plan, you see that the filtered index is used this time, and SQL Server uses index covering (see Figure 34.30). You can tell that it’s using index covering with the ord_date_filt2 index because there is no use of the clustered index to retrieve the data rows. Using the row locators is unnecessary because all the information requested by the query can be retrieved from the index leaf rows, which contain the values of the included columns as well.

Creating and Using Filtered Statistics

Similar to the way you use filtered indexes, SQL Server 2008 also lets you create filtered statistics. Like filtered indexes, filtered statistics are also created over a subset of rows in the table based on a specified filter predicate. Creating a filtered index on a column autocreates the corresponding filtered statistics. In addition, filtered statistics can be created explicitly by including the WHERE clause with the CREATE STATISTICS statement.

FIGURE 34.30 Query plan using index covering on a filtered index with included columns

Filtered statistics can be used to avoid a common issue with statistics where the cardinality estimation is skewed due to a large number of NULL or duplicate values, or due to a data correlation between columns. For example, let’s consider the titles table in the bigpubs2008 database. All the cooking books (type = 'trad_cook' or 'mod_cook') are published by a single publisher (pub_id = '0877'). However, SQL Server stores column-level statistics on each of these columns independent of each other. Based on the statistics, SQL Server estimates there are six rows in the titles table where pub_id = '0877', and five rows where the type is either 'trad_cook' or 'mod_cook'.

However, let’s assume you were to execute the following query:

select * from titles where pub_id = '0877'
and type in ('trad_cook', 'mod_cook')

When the Query Optimizer estimates the selectivity of this query where each search predicate is part of an AND condition, it assumes the conditions are independent of one another and estimates the number of matching rows by taking the intersection of the two conditions. Essentially, it multiplies the selectivity of each of the two conditions together to determine the total selectivity. The selectivity of each is 0.011 (6/537) and 0.009 (5/537), which, when multiplied together, comes out to approximately 0.0001, so the optimizer estimates at most only a single row will match. However, because all five cooking books are published by pub_id '0877', in actuality a total of five rows match.
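
The arithmetic behind that estimate can be reproduced directly. The sketch below simply recomputes the figures quoted above (537 rows in titles, 6 rows for the publisher, 5 rows for the two cooking types); it illustrates the independence assumption and is not a reproduction of the optimizer's internal code.

```sql
DECLARE @total_rows float = 537;  -- rows in titles, per the text
DECLARE @pub_rows   float = 6;    -- estimated rows where pub_id = '0877'
DECLARE @type_rows  float = 5;    -- estimated rows for the two cooking types

SELECT
    @pub_rows  / @total_rows AS pub_selectivity,        -- about 0.011
    @type_rows / @total_rows AS type_selectivity,       -- about 0.009
    (@pub_rows / @total_rows) * (@type_rows / @total_rows)
        AS combined_selectivity,                        -- about 0.0001
    (@pub_rows / @total_rows) * (@type_rows / @total_rows) * @total_rows
        AS estimated_rows;                              -- about 0.06, well under one row
```

The final column works out to 30/537, which is why the estimate lands at a single row even though five rows actually qualify.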

Now, in this example, the difference between one row and five rows is likely not significant enough to make a big difference in query performance, but a similar estimation error could be quite large with other data sets, leading the optimizer to possibly choose an inappropriate, and considerably more expensive, query plan.

Filtered statistics can help solve this problem by letting you capture these types of data correlations in your column statistics. For example, to capture the fact that all cooking books are also published by the same publisher, you could create the filtered statistics using the following statement:

create statistics pub_id_type on titles (pub_id, type)
where pub_id = '0877' and type in ('trad_cook', 'mod_cook')

When these filtered statistics are defined and the same query is run, SQL Server uses the filtered statistics to determine that the query will match five rows instead of only one.

Although using this solution could require having to define a number of filtered statistics, it can be effective to help fix your most critical queries where cardinality estimates due to data correlation or data skew issues are causing the Query Optimizer to choose poorly performing query plans.
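
To check which filtered statistics exist on a table and what they cover, you can query sys.stats and view the statistics header. This sketch assumes the pub_id_type statistics shown earlier have been created on the titles table.

```sql
-- List filtered statistics on titles and their filter predicates
SELECT name, has_filter, filter_definition
FROM sys.stats
WHERE object_id = OBJECT_ID('titles')
  AND has_filter = 1;
GO
-- Show the statistics header, which includes the filter expression
-- and the unfiltered row count
DBCC SHOW_STATISTICS ('titles', pub_id_type) WITH STAT_HEADER;
GO
```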

Choosing Indexes: Query Versus Update Performance

I/O is the primary factor in determining query performance. The challenge for a database designer is to build a physical data model that provides efficient data access. Creating indexes on database tables allows SQL Server to access data with fewer I/Os. Defining useful indexes during the logical and physical data modeling step is crucial. The SQL Server Query Optimizer relies heavily on index key distribution and index density to determine which indexes to use for a query. The Query Optimizer in SQL Server can use multiple indexes in a query (through index intersection) to reduce the I/O required to retrieve information. In the absence of indexes, the Query Optimizer performs a table scan, which can be costly from an I/O standpoint.

Although indexes provide a means for faster access to data, they slow down data modification statements due to the extra overhead of having to maintain the index during inserts, updates, and deletes.

In a DSS environment, defining many indexes can help your queries and does not create much of a performance issue because the data is relatively static and doesn’t get updated frequently. You typically load the data, create the indexes, and forget about it until the next data load. As long as you have the necessary indexes to support the user queries and they’re getting decent response time, the penalties of having too many indexes in a DSS environment are the space wasted for indexes that possibly won’t be used, the additional time required to create the excessive indexes, and the additional time required to back up and run DBCC checks on the data.

In an OLTP environment, on the other hand, too many indexes can lead to significant performance degradation, especially if the number of indexes on a table exceeds four or five. Think about it for a second. Every single-row insert is at least one data page write and one or more index page writes (depending on whether a page split occurs) for every index on the table. With eight nonclustered indexes, that would be a minimum of nine writes to the database for a single-row insert. Therefore, for an OLTP environment, you want as few indexes as possible: typically only the indexes required to support the update and delete operations and your critical queries, and to enforce your uniqueness constraints.
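
A quick way to spot tables that may be carrying too many indexes for an OLTP workload is to count the nonclustered indexes per table. The query below is a sketch; the threshold of five is taken from the rule of thumb above and is an assumption you would adjust for your own environment.

```sql
-- Find user tables in the current database with more than five nonclustered indexes.
SELECT OBJECT_NAME(i.object_id) AS table_name,
       COUNT(*)                 AS nonclustered_index_count
FROM sys.indexes AS i
WHERE i.type = 2                                        -- 2 = nonclustered
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1    -- skip system objects
GROUP BY i.object_id
HAVING COUNT(*) > 5
ORDER BY nonclustered_index_count DESC;
```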

The natural solution, in a perfect world, would be to create a lot of indexes for a DSS environment and as few indexes as possible in an OLTP environment. Unfortunately, in the real world, you typically have an environment that must support both DSS and OLTP applications. How do you resolve the competing indexing requirements of the two environments? Meeting the indexing needs of DSS and OLTP applications requires a bit of a balancing act, with no easy solution. It often involves making hard decisions as to which DSS queries might have to live with table scans and which updates have to contend with additional overhead.

One solution is to have two separate databases: one for DSS applications and another for OLTP applications. Obviously, this approach requires some method of keeping the databases in sync. The method chosen depends on how up-to-date the DSS database has to be. If you can afford some lag time, you could consider using a dump-and-load mechanism, such as Log Shipping or periodic full database restores. If the DSS system requires up-to-the-minute concurrency, you might want to consider using replication or database mirroring.

Another possible alternative is to have only the required indexes in place during normal processing periods to support the OLTP requirements. At the end of the business day, you can create the indexes necessary to support the DSS queries and reports, and they can run as batch jobs after normal processing hours. When the DSS reports are complete, you can drop the additional indexes, and you’re ready for the next day’s processing. Note that this solution assumes that the time required to create the additional indexes is offset by the time saved by the faster running of the DSS queries. If the additional indexes do not result in substantial time savings, they are probably not necessary and need not be created in the first place. The queries need to be more closely examined to select the appropriate indexes to best support your queries.
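
The create-report-drop cycle described above might look like the following in a nightly batch job. This is only a sketch: the index name NC_sales_qty_report and the report query are hypothetical placeholders, and the sales table is the one used in the earlier examples.

```sql
-- After normal processing hours: build the reporting index (hypothetical name)
CREATE INDEX NC_sales_qty_report ON sales (qty) INCLUDE (stor_id);
GO
-- Run the DSS reports here, for example:
SELECT stor_id, SUM(qty) AS total_qty
FROM sales
WHERE qty > 100
GROUP BY stor_id;
GO
-- Before the next business day: drop the index so that OLTP inserts
-- and updates do not pay to maintain it
DROP INDEX NC_sales_qty_report ON sales;
GO
```

Timing the job with and without the index is the simplest way to check whether the build cost is actually recovered by the faster reports.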

As you can see, it is important to choose indexes carefully to provide a good balance between data search and data modification performance. The application environment usually governs the choice of indexes. For example, if the application is mainly OLTP with transactions requiring fast response time, creating too many indexes might have an adverse impact on performance. On the other hand, the application might be a DSS with few transactions doing data modifications. In that case, it makes sense to create a number of indexes on the columns frequently used in queries.

Identifying Missing Indexes

When developing an index design for your database and applications, you should make sure you create appropriate indexes for the various queries that will be executed against your tables. However, it can be quite a chore to identify all the queries you may need to create indexes for. Fortunately, SQL Server 2008 provides a couple of tools to help you identify any indexes you may need in your database: the Database Engine Tuning Advisor and the missing index dynamic management objects.

The Database Engine Tuning Advisor

The Database Engine Tuning Advisor is a tool that can analyze a SQL script file or a set of queries captured in a SQL Profiler trace and recommend changes to your indexing scheme. After performing its analysis, the Database Engine Tuning Advisor provides recommendations for new or more effective indexes, indexed views, and partitioning schemes, along with the estimated improvement in execution time should the recommendation be implemented. You can choose to implement the recommendations immediately or later, or you can save the SQL statements to a script file. For detailed information on using the Database Engine Tuning Advisor, see Chapter 55.

Although the Database Engine Tuning Advisor is a useful tool, and improvements have been made since it was introduced in SQL Server 2005 to improve its recommendations, it does still have some limitations. For one, because the Database Engine Tuning Advisor gathers statistics by sampling the data, repeatedly running the tool on the same workload may produce different results as different samples are used. In addition, if you impose constraints, such as specifying maximum disk space for tuning recommendations, the Database Engine Tuning Advisor may be forced to drop certain existing indexes, and the resulting recommendation may produce a negative expected improvement. The Database Engine Tuning Advisor may also not make recommendations under the following circumstances:

- The table being tuned contains fewer than 10 data pages.

- The recommended indexes would not offer enough improvement in query performance over the current physical database design.

- The user who runs the Database Engine Tuning Advisor is not a member of the db_owner database role or the sysadmin fixed server role.

Missing Index Dynamic Management Objects

In addition to the Database Engine Tuning Advisor, SQL Server 2008 provides the missing index dynamic management objects to help identify potentially missing indexes in your database. The missing index dynamic management objects are the following:

- sys.dm_db_missing_index_group_stats returns summary information about missing index groups, such as the performance improvements that could be gained by implementing a specific group of missing indexes.

- sys.dm_db_missing_index_groups returns information about a specific group of missing indexes, such as the group identifier and identifiers of all missing indexes contained in that group.

- sys.dm_db_missing_index_details returns detailed information about a missing index; for example, it returns the name and identifier of the table where the index is missing, and the columns and column types that should make up the missing index.

- sys.dm_db_missing_index_columns returns information about the database table columns missing an index.

After running a typical workload on SQL Server, you can query the dynamic management objects to retrieve information about possible missing indexes. Listing 34.8 provides a sample query that displays the missing index information for a query on the sales table that was run between 10:30 and 10:40 p.m. on 2/21/2010.

LISTING 34.8 Querying the Missing Index Dynamic Management Objects

SELECT
    mig.index_group_handle as handle,
    convert(varchar(30), statement) AS table_name,
    convert(varchar(12), column_name) AS Column_name,
    convert(varchar(10), column_usage) as ColumnUsage,
    avg_user_impact as avg_impact
FROM sys.dm_db_missing_index_details AS mid
CROSS APPLY sys.dm_db_missing_index_columns (mid.index_handle)
INNER JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
inner join sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
where mid.object_id = object_id('sales')
and last_user_seek between '2010-02-21 22:30' and '2010-02-21 22:40'
ORDER BY mig.index_group_handle, mig.index_handle, column_id;
GO

handle table_name                  Column_name ColumnUsage avg_impact
------ --------------------------- ----------- ----------- ----------
2      [bigpubs2008].[dbo].[sales] stor_id     INCLUDE     87.46
2      [bigpubs2008].[dbo].[sales] qty         INEQUALITY  87.46

In this example, the optimizer recommends an index on the qty column to support an inequality operator. It is also recommended that the stor_id column be specified as an included column in the index. This index is estimated to improve performance by 87.46%.
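
Translated into DDL, that recommendation would look roughly like the following. The index name NC_sales_qty is a hypothetical choice, and the dbo schema is taken from the table_name column in the output above.

```sql
-- Sketch of the index suggested by the missing index output:
-- qty as the key column (INEQUALITY), stor_id as an included column (INCLUDE).
CREATE NONCLUSTERED INDEX NC_sales_qty   -- hypothetical name
    ON dbo.sales (qty)
    INCLUDE (stor_id);
```

Treat this as a candidate to test against your workload, not as something to create automatically.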

Although the missing index feature provides some helpful information for identifying potentially missing indexes in your database, it too has a few limitations:

- It is not intended to fine-tune the existing indexes, only to recommend additional indexes when no useful index is found that can be used to satisfy a search or join condition.

- It reports only included columns for some queries. You need to determine whether the included columns should be specified as additional index key columns instead.

- It may return different costs for the same missing index group for different executions.

- It does not suggest filtered indexes.

- It is unable to provide recommendations for clustered indexes, indexed views, or table partitioning (you should use the Database Engine Tuning Advisor instead for these recommendations).

Probably the key limitation is that although the missing index feature is helpful for identifying indexes that may be useful for you to define, it’s not a substitute for a well-thought-out index design.

Missing Index Feature Versus Database Engine Tuning Advisor

The missing indexes dynamic management objects are a lightweight, server-side, always-on feature for identifying and correcting potential indexing oversights. The Database Engine Tuning Advisor, on the other hand, is a comprehensive client-side tool that can be used to assess the physical database design and recommend new physical design structures for improving performance, including not only indexes, but also indexed views or partitioning schemes.

The Database Engine Tuning Advisor and missing indexes feature can possibly return different recommendations, even for a single-query workload. The reason is that the missing indexes dynamic management objects’ index key column recommendations are not order sensitive. On the other hand, the Database Engine Tuning Advisor recommendations include ordering of the key columns for indexes to optimize query performance.

Date posted: 05/07/2014, 02:20