Microsoft SQL Server 2008 R2 Unleashed - P122



LISTING 34.4 DBCC SHOW_STATISTICS Output for the aunmind Index on the authors Table

dbcc show_statistics (authors, aunmind)
go

Name     Updated              Rows  Rows Sampled  Steps  Density  Average key length  String Index  Filter Expression  Unfiltered Rows
-------  -------------------  ----  ------------  -----  -------  ------------------  ------------  -----------------  ---------------
aunmind  Mar 14 2010 10:20PM  172   172           148    1        24.06977            YES           NULL               172

(1 row(s) affected)

All density   Average Length  Columns
------------  --------------  --------------------------
0.00625       6.406977        au_lname
0.005813953   13.06977        au_lname, au_fname
0.005813953   24.06977        au_lname, au_fname, au_id

(3 row(s) affected)

RANGE_HI_KEY    RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
--------------  ----------  -------  -------------------  --------------
Ahlberg         0           2        0                    1
Alexander       0           1        0                    1
Amis            0           1        0                    1
Arendt          0           1        0                    1
Arnosky         0           1        0                    1
Bate            0           1        0                    1
Bauer           0           1        0                    1
Benchley        0           1        0                    1
Bennet          0           1        0                    1
Blotchet-Halls  0           1        0                    1
del Castillo    0           1        0                    1
Dillard         0           1        0                    1
Doctorow        0           1        0                    1
Doyle           0           1        0                    1
Durrenmatt      2           1        2                    1
Eastman         0           1        0                    1
Gringlesby      0           1        0                    1
Grisham         0           1        0                    1
Gunning         0           1        0                    1
Hill            0           1        0                    1
Hutchins        3           2        3                    1
Ionesco         0           1        0                    1
Ishiguro        0           1        0                    1
Tyler           0           1        0                    1
Van Allsburg    0           1        0                    1
Van der         0           1        0                    1
Van der Meer    0           1        0                    1
von Goethe      0           1        0                    1
Walker          0           1        0                    1
Warner          0           1        0                    1
White           0           2        0                    1
Wilder          0           1        0                    1
Williams        0           2        0                    1
Wilson          0           1        0                    1
Yates           0           1        0                    1
Yokomoto        0           1        0                    1
Young           0           1        0                    1

Looking at the output, you can determine that the statistics were last updated on March 14, 2010. At the time the statistics were generated, the table had 172 rows, and all 172 rows were sampled to generate the statistics (no filtering was applied). The average key length is 24.06977 bytes. From the All density information, you can see that this index is highly selective. (A low density means high selectivity; index densities are covered shortly.) After the general information and the index densities, the index histogram is displayed.
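The three parts of the output described above (header, density information, histogram) can also be requested one at a time. The following is a minimal sketch that assumes the same authors table and aunmind index used in Listing 34.4; the WITH options shown are standard DBCC SHOW_STATISTICS options.

```sql
-- Header row only: name, last update time, row counts, steps
DBCC SHOW_STATISTICS (authors, aunmind) WITH STAT_HEADER;

-- "All density" rows only
DBCC SHOW_STATISTICS (authors, aunmind) WITH DENSITY_VECTOR;

-- Histogram steps only
DBCC SHOW_STATISTICS (authors, aunmind) WITH HISTOGRAM;
go
```

Requesting a single result set can make the output easier to read when only the histogram or only the densities are of interest.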

The Statistics Histogram

Up to 200 sample values can be stored in the statistics histogram. Each sample value is called a step. The sample value stored in each step is the endpoint of a range of values. Three values are stored for each step:

RANGE_ROWS: This indicates how many other rows are inside the range between the current step and the step prior, not including the step values themselves.

EQ_ROWS: This is the number of rows that have the same value as the sample value. In other words, it is the number of duplicate values for the step.

Range density: This indicates the number of distinct values within the range. The range density information is actually displayed in two separate columns, DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS:

    DISTINCT_RANGE_ROWS is the number of distinct values between the current step and the step prior, not including the step values themselves.

    AVG_RANGE_ROWS is the average number of rows per distinct value within the range of the step.


In the output in Listing 34.4, distinct key values in the first column of the index are stored as the sample values in the histogram. Because most of the values for au_lname are unique, most of the range values are 0. You can see that there is a duplicate in the index key for the last name of Hutchins (EQ_ROWS is 2). For comparison purposes, Listing 34.5 shows a snippet of the DBCC SHOW_STATISTICS output for the titleidind index on the sales table in bigpubs2008.

LISTING 34.5 DBCC SHOW_STATISTICS Output for the titleidind Index on the sales Table in the bigpubs2008 Database

dbcc show_statistics (sales, 'titleidind')
go

Name        Updated              Rows    Rows Sampled  Steps  Density      Average key length  String Index  Filter Expression  Unfiltered Rows
----------  -------------------  ------  ------------  -----  -----------  ------------------  ------------  -----------------  ---------------
titleidind  Mar 14 2010 10:39PM  168725  152432        188    0.003537365  26.40519            YES           NULL               168725

All density   Average Length  Columns
------------  --------------  --------------------------
0.001858736   6               title_id
5.99844E-06   10              title_id, stor_id
5.926804E-06  26.4007         title_id, stor_id, ord_num

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS   DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  --------  -------------------  --------------
BI0194        0           274.8199  0                    1
BI2184        639.6047    312.9337  2                    277.1448
BI2915        893.1208    271.811   3                    261.8779
BI3976        637.2789    260.778   2                    276.137
BI8448        1685.068    281.8409  6                    300.0652
BU1111        616.3464    276.8259  2                    267.0668
BU7832        357.0157    299.8948  1                    296.2236
CH0249        1067.558    279.8349  3                    313.0259
CH0639        1019.879    284.8499  3                    299.0454
CH0671        316.3136    259.7751  1                    262.4521
CH0847        1333.867    266.796   5                    295.557
CH1260        1069.884    287.8589  3                    313.7079
CH1380        612.8576    311.9307  2                    265.5551
CH1692        974.525     275.8229  3                    285.7469
CH2080        329.1057    285.8529  1                    273.066
CH2240        715.1943    273.817   2                    309.8983
CH2256        352.364     310.9277  1                    292.364
CH2360        630.3014    293.8768  2                    273.1136
CH2480        626.8126    311.9307  2                    271.6019
CH2574        679.1439    279.8349  2                    294.2774
CH2610        334.9203    280.8379  1                    277.8905
CH2706        343.0607    300.8978  1                    284.6448
CH2856        326.7799    287.8589  1                    271.1362
FI9853        623.3239    295.8828  2                    270.0902
FI9965        625.6497    323.9666  2                    271.098
LC1680        629.1384    286.8559  2                    272.6097
LC5292        647.7451    265.793   2                    280.6721
MC3021        610.5318    244.7302  2                    264.5473
NF2924        652.3968    266.796   2                    282.6877
NF8918        669.8406    310.9277  2                    290.2462
PC9999        665.1889    275.8229  2                    288.2306
PS2106        709.3798    259.7751  2                    307.3788
TC3218        617.5093    291.8708  2                    267.5707
TC4203        29.23513    293.8768  0                    284.9097

As you can see in this example, there are a greater number of rows per range and a greater number of duplicates for each step value. In addition, 188 steps in the histogram are used, and the sample values for the 168,725 rows in the table are distributed across those 188 step values. Finally, in this example, 152,432 rows, rather than the whole table, were sampled to generate the statistics.

How the Statistics Histogram Is Used

The histogram steps are used for SARGs only when a constant expression is compared against an indexed column and the value of the constant expression is known at query compile time. The following SARG examples show where histogram steps can be used:

where col_a = getdate()
where cust_id = 12345
where monthly_sales < 10000 / 12
where l_name like 'Smith' + '%'

Some constant expressions cannot be evaluated until query runtime. They include search arguments that contain local variables or subqueries and also join clauses, such as the following:

where price = @avg_price
where total_sales > (select sum(qty) from sales)
where titles.pub_id = publishers.pub_id
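To make the distinction concrete, the following sketch runs the same search against the sales table twice, once with a literal and once with a local variable. It assumes the bigpubs2008 sales table from Listing 34.5, and the varchar(6) data type for the variable is an assumption made for illustration. Comparing the estimated row counts in the estimated execution plans for the two batches is one way to see which source of statistics was used.

```sql
-- Literal: the value is known at compile time, so the histogram can be consulted
select * from sales
where title_id = 'BI3976'
go

-- Local variable: the value is not known when the batch is compiled,
-- so the histogram step for BI3976 cannot be looked up
declare @title_id varchar(6)   -- data type assumed for illustration
set @title_id = 'BI3976'

select * from sales
where title_id = @title_id
go
```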


For these types of statements, you need some other way of estimating the number of matching rows. In addition, because histogram steps are kept only on the first column of the index, SQL Server must use a different method for determining the number of matching rows for SARGs that specify multiple column values for a composite index, such as the following:

select * from sales
where title_id = 'BI3976'
and stor_id = 'P648'

When the histogram is not used or cannot be used, SQL Server uses the index density values to estimate the number of matching rows.

Index Densities

SQL Server stores the density values of each column in the index for use in queries where the SARG value is not known until runtime or when the SARG is on multiple columns of the index. For composite keys, SQL Server stores the density for the first column of the composite key; for the first and second columns; for the first, second, and third columns; and so on. This information is shown in the All density section of the DBCC SHOW_STATISTICS output in Listings 34.4 and 34.5.

Index density essentially represents the inverse of the number of unique values of the key. The density of each key is calculated by using the following formula:

Key density = 1.00 / Count of distinct key values in the table

Therefore, the density for the au_lname column in the authors table in the bigpubs2008 database is calculated as follows:

Select Density = 1.00/ (select count(distinct au_lname) from authors)
go

Density
---------------
0.0062500000000

The density for the combination of the columns au_lname and au_fname is as follows:

Select Density = 1.00/ (select count(distinct au_lname + au_fname) from authors)
go

Density
---------------
0.0058139534883
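Concatenating the two columns works for this data, but in general two different name pairs can concatenate to the same string (for example, 'ab' + 'c' and 'a' + 'bc'). A slightly more defensive sketch of the same calculation counts distinct column pairs instead. It assumes the same authors table, and the derived-table alias is arbitrary.

```sql
-- Density for (au_lname, au_fname), counting distinct pairs rather than
-- distinct concatenated strings
SELECT Density = 1.00 / COUNT(*)
FROM (SELECT DISTINCT au_lname, au_fname
      FROM authors) AS name_pairs;
go
```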

Notice that, unlike with the selectivity ratio, a smaller index density indicates a more selective index. As the density value approaches 1, the index becomes less selective and essentially useless. When the index selectivity is poor, the Query Optimizer might choose to do a table scan or a leaf-level index scan rather than perform an index seek because it is more cost-effective.


TABLE 34.8 Index Densities for the titleidind Index on the sales Table

Key Columns                  Index Density
title_id                     0.001858736
title_id, stor_id            5.99844E-06 (.00000599844)
title_id, stor_id, ord_num   5.926804E-06 (.000005926804)

TIP

Watch out for database indexes that have poor selectivity. Such indexes are often more of a detriment to the performance of the system than they are a help. Not only are they usually not used for data retrieval, but they also slow down your data modification statements because of the additional index overhead. You should identify such indexes and consider dropping them.

Typically, the density value should become smaller (that is, more selective) as you add more columns to the key. For example, in Listing 34.5, the densities get progressively smaller (and thus, more selective) as additional columns are factored in, as shown in Table 34.8.

Estimating Rows Using Index Statistics

How does the Query Optimizer use the index statistics to estimate the number of rows that match the SARGs in a query?

SQL Server uses the histogram information when searching for a known value being compared to the leading column of the index key, especially when the search spans a range or when there are duplicate values in the key. Consider this query on the sales table in the bigpubs2008 database:

select * from sales
where title_id = 'BI3976'

Because there are duplicates of title_id in the table, SQL Server uses the histogram on title_id (refer to Listing 34.5) to estimate the number of matching rows. For the value of BI3976, it would look at the EQ_ROWS value, which is 260.778. This indicates that there are approximately 261 rows in the table that have a title_id value of BI3976.

When an exact match for the search argument is not found as a step in the histogram, SQL Server uses the AVG_RANGE_ROWS value for the next step greater than the search value. For example, SQL Server would estimate that for a search value of 'BI4184', on average, it would match approximately 300.0652 rows because that is the AVG_RANGE_ROWS value for the step value of 'BI8448', which is the next step greater than 'BI4184'.


When the query is a range retrieval that spans multiple steps, SQL Server sums the RANGE_ROWS and EQ_ROWS values between the endpoints of the range retrieval. For example, when we use the histogram in Listing 34.5, if the search argument were where title_id <= 'BI3976', the row estimate would be 274.8199 + 639.6047 + 312.9337 + 893.1208 + 271.811 + 637.2789 + 260.778, or 3290.3470 rows.
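The histogram lookups described in this section can be reproduced by hand. The following sketch copies the first five steps of Listing 34.5 into a table variable and applies the rules as the text states them. It is a simplified model for illustration, not the optimizer's actual implementation; the table variable, its column types, and the @search variable are assumptions.

```sql
-- First five histogram steps, hand-copied from Listing 34.5 (illustrative only)
DECLARE @histogram TABLE (
    range_hi_key        varchar(6)    NOT NULL PRIMARY KEY,
    range_rows          decimal(12,4) NOT NULL,
    eq_rows             decimal(12,4) NOT NULL,
    distinct_range_rows int           NOT NULL,
    avg_range_rows      decimal(12,4) NOT NULL
);

INSERT INTO @histogram VALUES
    ('BI0194',    0.0000, 274.8199, 0,   1.0000),
    ('BI2184',  639.6047, 312.9337, 2, 277.1448),
    ('BI2915',  893.1208, 271.8110, 3, 261.8779),
    ('BI3976',  637.2789, 260.7780, 2, 276.1370),
    ('BI8448', 1685.0680, 281.8409, 6, 300.0652);

-- Equality search: find the first step at or above the search value.
-- An exact match uses EQ_ROWS; otherwise AVG_RANGE_ROWS of that step is used.
DECLARE @search varchar(6) = 'BI4184';

SELECT TOP (1)
       range_hi_key,
       CASE WHEN range_hi_key = @search THEN eq_rows
            ELSE avg_range_rows
       END AS estimated_rows
FROM @histogram
WHERE range_hi_key >= @search
ORDER BY range_hi_key;

-- Range search (title_id <= 'BI3976'): sum RANGE_ROWS and EQ_ROWS
-- for every step up to and including the endpoint
SELECT SUM(range_rows + eq_rows) AS estimated_rows
FROM @histogram
WHERE range_hi_key <= 'BI3976';
go
```

With @search set to 'BI4184', the first query returns the BI8448 step with an estimate of 300.0652; setting it to 'BI3976' returns 260.7780 instead. The second query returns 3290.3470, matching the sum worked out in the text.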

As mentioned previously, when the histogram cannot be used, SQL Server uses just the index density to estimate the number of matching rows. The formula is straightforward for an equality search; it looks like this:

Row Estimate = Number of Rows in Table × Index Density

For example, to estimate the number of matching rows for any given title_id in the sales table, multiply the number of rows in the sales table by the index density for the title_id key (0.001862197), as follows:

select count(*) * 0.001862197 as 'Row Estimate'
from sales
go

Row Estimate
-------------
314.199188825

If a query specifies both the title_id and stor_id as SARGs, and if the SARG for title_id is a constant expression that can be evaluated at optimization time, SQL Server uses both the index density on title_id and stor_id as well as the histogram on title_id to estimate the number of matching rows. For some data values, the estimated number of matching rows for title_id and stor_id calculated using the index density could be greater than the estimated number of rows that match the specific title_id, as determined by the histogram. SQL Server uses whichever is the smaller of the two to calculate the row estimate.

Multiplying the number of rows in the sales table by the index density for title_id, stor_id (5.997505E-06), you can see that it is nearly unique, essentially matching only a single row:

select count(*) * 5.997505E-06 as 'Row Estimate'
from sales
go

Row Estimate
--------------
1.011929031125

In this example, SQL Server would use the index density on title_id and stor_id to estimate the number of matching rows. In this case, it is estimated that the query will return, on average, one matching row.
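The "smaller of the two" rule can be written out as a short calculation. This is only a sketch of the comparison described above, using the figures quoted in the text and in Listing 34.5; the variable names are assumptions, and the real optimizer logic is more involved.

```sql
DECLARE @table_rows    int   = 168725;    -- Rows, from the statistics header in Listing 34.5
DECLARE @histogram_est float = 260.778;   -- EQ_ROWS for title_id = 'BI3976'
DECLARE @density_est   float = @table_rows * 5.997505E-06;  -- density for title_id, stor_id

-- Use whichever estimate is smaller
SELECT CASE WHEN @density_est < @histogram_est
            THEN @density_est
            ELSE @histogram_est
       END AS row_estimate;
go
```

Here the density-based estimate (about one row) is far below the histogram estimate of roughly 261 rows, so the density-based figure is the one that would be used.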


Generating and Maintaining Index and Column Statistics

At this point, you might ask, “How do the index statistics get created?” and “How are they maintained?” The index statistics are first created when you create the index on a table that already contains data rows or when you run the UPDATE STATISTICS command. Index statistics can also be automatically updated by SQL Server. SQL Server can be configured to constantly monitor the update activity on the indexed key values in a database and update the statistics through an internal process, when appropriate.
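As a brief illustration of the manual route, the following sketch shows a few common forms of the UPDATE STATISTICS command. It assumes the authors and sales tables and the aunmind and titleidind indexes used in the earlier listings; the 50 percent sample size is an arbitrary value chosen for illustration.

```sql
-- Refresh every statistics object on the table using default sampling
UPDATE STATISTICS authors;

-- Refresh only the aunmind statistics, reading every row
UPDATE STATISTICS authors aunmind WITH FULLSCAN;

-- Refresh the titleidind statistics from an explicit sample
UPDATE STATISTICS sales titleidind WITH SAMPLE 50 PERCENT;
go
```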

Auto-Update Statistics

To automatically update statistics, an internal SQL Server process monitors the updates to a table’s columns to determine when statistics should be updated. SQL Server internally keeps track of the number of modifications made to a column via column modification counters (colmodctrs). SQL Server uses information about the table and the colmodctrs to determine whether statistics are out of date and need to be updated. Statistics are considered out of date in the following situations:

When the table size has gone from 0 to >0 rows

When the number of rows in the table at the time the statistics were gathered was 500 or fewer and the colmodctr of the leading column of the statistics object has changed by more than 500

When the table had more than 500 rows at the time the statistics were gathered and the colmodctr of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table

If the statistics are defined on a temporary table, there is an additional threshold for updating statistics every six column modifications if the table contains fewer than 500 rows.
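The second and third rules above reduce to simple arithmetic. The following sketch applies them to the row counts reported in Listings 34.4 and 34.5 (172 rows for authors, 168,725 rows for sales). The derived table and its column names are assumptions made for illustration; the query does not read any real modification counters.

```sql
-- Modification threshold implied by the rules above, given the row count
-- at the time the statistics were last gathered
SELECT t.table_name,
       t.rows_at_last_update,
       CASE WHEN t.rows_at_last_update <= 500
            THEN 500
            ELSE 500 + 0.20 * t.rows_at_last_update
       END AS colmodctr_threshold
FROM (VALUES ('authors', 172),
             ('sales',   168725)) AS t (table_name, rows_at_last_update);
go
```

Under these rules, the authors statistics would be considered out of date after more than 500 modifications to the leading column, and the sales statistics after more than 34,245 (500 plus 20 percent of 168,725).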

The colmodctrs are incremented in the following situations:

When a row is inserted into the table

When a row is deleted from the table

When an indexed column is updated

Whenever the index statistics have been updated for a column, the colmodctr for that column is reset to 0.

When SQL Server generates an update of the column statistics, it generates the new statistics based on a sampling of the data values in the table. Sampling helps minimize the overhead of the AutoStats process. The sampling is random across the data pages, and the values are taken from the table or the smallest nonclustered index on the columns needed to generate the statistics. After a data page containing a sampled row has been read from disk, all the rows on the data page are used to update the statistical information.


CAUTION

Having up-to-date statistics on tables helps ensure that optimum execution plans are being generated for queries at all times. In most cases, you would want SQL Server to automatically keep the statistics updated. However, it is possible that Auto-Update Statistics can cause an update of the index statistics to run at inappropriate times in a production environment or in a high-volume environment to run too often. If this problem is occurring, you might want to turn off the AutoStats feature and set up a scheduled job to update statistics during off-peak periods. Do not forget to update statistics periodically; otherwise, the resulting performance problems might end up being much worse than the momentary ones caused by the AutoStats process.

To determine how often the AutoStats process is being run, you can use SQL Server Profiler to determine when an automatic update of index statistics is occurring by monitoring the Auto Stats event in the Performance event class. (For more information on using SQL Server Profiler, see Chapter 6.)

If necessary, it is possible to turn off the AutoStats behavior by using the sp_autostats system stored procedure. This stored procedure allows you to turn the automatic updating of statistics on or off for a specific index or all the indexes of a table. The following command turns off the automatic update of statistics for an index named aunmind on the authors table:

Exec sp_autostats 'authors', 'OFF', 'aunmind'

When you run sp_autostats and simply supply the table name, it displays the current setting for the table as well as the database. Following are the settings for the authors table:

Exec sp_autostats 'authors'
go

Global statistics settings for [bigpubs2008]:
Automatic update statistics: ON
Automatic create statistics: ON

settings for table [authors]

Index Name                 AUTOSTATS  Last Updated
-------------------------  ---------  -----------------------
[UPKCL_auidind]            ON         2009-10-19 01:23:47.263
[aunmind]                  OFF        2010-03-14 22:20:52.177
[_WA_Sys_state_4AB81AF0]   ON         2009-10-19 01:23:47.263
[au_fname]                 ON         2009-10-19 01:23:47.280
[phone]                    ON         2009-10-19 01:23:47.293
[address]                  ON         2009-10-19 01:23:47.310
[city]                     ON         2009-10-19 01:23:47.310
[zip]                      ON         2009-10-19 01:23:47.310
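Similar information is available from the catalog views, which can be more convenient when the result needs to be filtered or joined. The following is a minimal sketch, assuming the same authors table; a no_recompute value of 1 corresponds to an AUTOSTATS setting of OFF in the sp_autostats output.

```sql
-- Per-statistics auto-update setting and last update time for the authors table
SELECT s.name AS stats_name,
       s.auto_created,
       s.no_recompute,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('authors')
ORDER BY s.name;
go
```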


There are three other ways to disable auto-updating of statistics for an index:

Specify the STATISTICS_NORECOMPUTE clause when creating the index.

Specify the NORECOMPUTE option when running the UPDATE STATISTICS command.

Specify the NORECOMPUTE option when creating statistics with the CREATE STATISTICS command. (You learn more about this command in the “Creating Statistics” section, later in the chapter.)
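Each of the three options listed above might look like the following. This is a sketch against the authors table; the index name ix_authors_city_demo and the statistics name st_authors_state_city_demo are hypothetical and chosen only for illustration.

```sql
-- 1. At index creation time (index name is hypothetical)
CREATE NONCLUSTERED INDEX ix_authors_city_demo
    ON authors (city)
    WITH (STATISTICS_NORECOMPUTE = ON);

-- 2. When updating existing statistics
UPDATE STATISTICS authors aunmind WITH FULLSCAN, NORECOMPUTE;

-- 3. When creating column statistics (statistics name is hypothetical)
CREATE STATISTICS st_authors_state_city_demo
    ON authors (state, city)
    WITH NORECOMPUTE;
go
```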

You can also turn AutoStats on or off for the entire database by setting the database option in SQL Server Management Studio; to do this, right-click the database in Object Explorer to bring up the Database Properties dialog, select the Options page, and set the Auto Update Statistics option to True or False. You can also disable or enable the AutoStats option for a database by using the ALTER DATABASE command:

ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS { ON | OFF }

NOTE

What actually happens when you execute sp_autostats or use the NORECOMPUTE option in the UPDATE STATISTICS command to turn off auto-update statistics for a specific index or table? SQL Server internally sets a flag in the system catalog to inform the internal SQL Server process not to update the index statistics for the table or index that has had the option turned off using any of these commands. To re-enable Auto Update Statistics, you either run UPDATE STATISTICS without the NORECOMPUTE option or execute the sp_autostats system stored procedure and specify the value 'ON' for the second parameter.

Asynchronous Statistics Updating

In versions prior to SQL Server 2005, when SQL Server determined that the statistics being examined to optimize a query were out of date, the query would wait for the statistics update to complete before compilation of the query plan would continue. This is still the default behavior in SQL Server 2008. However, the database option AUTO_UPDATE_STATISTICS_ASYNC can be enabled to support asynchronous statistics updating.

When the AUTO_UPDATE_STATISTICS_ASYNC option is enabled, queries do not have to wait for the statistics to be updated before compiling. Instead, SQL Server puts the out-of-date statistics on a queue to be updated by a worker thread, which runs as a background process. The query and any other concurrent queries compile immediately by using the existing out-of-date statistics. Because there is no delay for updated statistics, query response times are more predictable, even if the out-of-date statistics may cause the Query Optimizer to choose a less-efficient query plan. Queries that start after the updated statistics are ready use the updated statistics.
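A minimal sketch of enabling the option and confirming the setting follows, assuming the bigpubs2008 database used throughout the examples. The asynchronous option has an effect only while AUTO_UPDATE_STATISTICS is also ON, so both are set here.

```sql
-- Asynchronous updating applies only when automatic updating is enabled
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE bigpubs2008 SET AUTO_UPDATE_STATISTICS_ASYNC ON;
go

-- Confirm the current settings
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'bigpubs2008';
go
```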
