LISTING 34.4 DBCC SHOW_STATISTICS Output for the aunmind Index on the authors
Table
dbcc show_statistics (authors, aunmind)
go

Name     Updated              Rows  Rows Sampled  Steps  Density
Average key length  String Index  Filter Expression  Unfiltered Rows
-------  -------------------  ----  ------------  -----  -------
------------------  ------------  -----------------  ---------------
aunmind  Mar 14 2010 10:20PM  172   172           148    1
24.06977            YES           NULL               172

(1 row(s) affected)

All density  Average Length  Columns
-----------  --------------  -------------------------
0.00625      6.406977        au_lname
0.005813953  13.06977        au_lname, au_fname
0.005813953  24.06977        au_lname, au_fname, au_id

(3 row(s) affected)

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
Ahlberg 0 2 0 1
Alexander 0 1 0 1
Amis 0 1 0 1
Arendt 0 1 0 1
Arnosky 0 1 0 1
Bate 0 1 0 1
Bauer 0 1 0 1
Benchley 0 1 0 1
Bennet 0 1 0 1
Blotchet-Halls 0 1 0 1
del Castillo 0 1 0 1
Dillard 0 1 0 1
Doctorow 0 1 0 1
Doyle 0 1 0 1
Durrenmatt 2 1 2 1
Eastman 0 1 0 1
Gringlesby 0 1 0 1
Grisham 0 1 0 1
Gunning 0 1 0 1
Hill 0 1 0 1
Hutchins 3 2 3 1
Ionesco 0 1 0 1
Ishiguro 0 1 0 1
Tyler 0 1 0 1
Van Allsburg 0 1 0 1
Van der 0 1 0 1
Van der Meer 0 1 0 1
von Goethe 0 1 0 1
Walker 0 1 0 1
Warner 0 1 0 1
White 0 2 0 1
Wilder 0 1 0 1
Williams 0 2 0 1
Wilson 0 1 0 1
Yates 0 1 0 1
Yokomoto 0 1 0 1
Young 0 1 0 1
Looking at the output, you can determine that the statistics were last updated on March
14, 2010. At the time the statistics were generated, the table had 172 rows, and all 172
rows were sampled to generate the statistics (no filtering was applied). The average key
length is 24.06977 bytes. From the All density information, you can see that this index is
highly selective. (A low density means high selectivity; index densities are covered shortly.)
After the general information and the index densities, the index histogram is displayed.
The Statistics Histogram
Up to 200 sample values can be stored in the statistics histogram. Each sample value is
called a step. The sample value stored in each step is the endpoint of a range of values. Three
values are stored for each step:
RANGE_ROWS: This indicates how many other rows are inside the range between the
current step and the step prior, not including the step values themselves.
EQ_ROWS: This is the number of rows that have the same value as the sample value.
In other words, it is the number of duplicate values for the step.
Range density: This indicates the number of distinct values within the range. The
range density information is actually displayed in two separate columns,
DISTINCT_RANGE_ROWS and AVG_RANGE_ROWS:
DISTINCT_RANGE_ROWS is the number of distinct values between the current step
and the step prior, not including the step values themselves.
AVG_RANGE_ROWS is the average number of rows per distinct value within the
range of the step.
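One way to picture a histogram step is as a small record with one field per column of the DBCC SHOW_STATISTICS histogram output. The following is a minimal sketch in Python, not SQL Server's internal representation; the class name HistogramStep and the helper rows_covered are illustrative names chosen here, and the two sample rows are copied from Listing 34.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistogramStep:
    """One histogram step; fields mirror the DBCC SHOW_STATISTICS columns."""
    range_hi_key: str           # sample value: upper endpoint of the range
    range_rows: float           # rows strictly between the prior step and this one
    eq_rows: float              # rows equal to range_hi_key
    distinct_range_rows: float  # distinct values strictly inside the range
    avg_range_rows: float       # average rows per distinct value in the range

    def rows_covered(self) -> float:
        """Rows accounted for by this step: inside the range plus on the endpoint."""
        return self.range_rows + self.eq_rows


# Two rows copied from Listing 34.4.
durrenmatt = HistogramStep("Durrenmatt", 2, 1, 2, 1)
hutchins = HistogramStep("Hutchins", 3, 2, 3, 1)

print(hutchins.rows_covered())
```

Adding rows_covered across every step of a histogram built from a full scan should account for every row in the table, which is a quick sanity check when reading the output.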
In the output in Listing 34.4, distinct key values in the first column of the index are
stored as the sample values in the histogram. Because most of the values for au_lname are
unique, most of the range values are 0. You can see that there is a duplicate in the index
key for the last name of Hutchins (EQ_ROWS is 2). For comparison purposes, Listing 34.5
shows a snippet of the DBCC SHOW_STATISTICS output for the titleidind index on the
sales table in bigpubs2008.
LISTING 34.5 DBCC SHOW_STATISTICS Output for the titleidind Index on the sales
Table in the bigpubs2008 Database
dbcc show_statistics (sales, 'titleidind')
go
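Name        Updated              Rows    Rows Sampled  Steps  Density
Average key length  String Index  Filter Expression  Unfiltered Rows
----------  -------------------  ------  ------------  -----  -----------
------------------  ------------  -----------------  ---------------
titleidind  Mar 14 2010 10:39PM  168725  152432        188    0.003537365
26.40519            YES           NULL               168725

All density   Average Length  Columns
------------  --------------  --------------------------
0.001858736   6               title_id
5.99844E-06   10              title_id, stor_id
5.926804E-06  26.4007         title_id, stor_id, ord_num

RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------  ----------  -------  -------------------  --------------
BI0194 0 274.8199 0 1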
BI2184 639.6047 312.9337 2 277.1448
BI2915 893.1208 271.811 3 261.8779
BI3976 637.2789 260.778 2 276.137
BI8448 1685.068 281.8409 6 300.0652
BU1111 616.3464 276.8259 2 267.0668
BU7832 357.0157 299.8948 1 296.2236
CH0249 1067.558 279.8349 3 313.0259
CH0639 1019.879 284.8499 3 299.0454
CH0671 316.3136 259.7751 1 262.4521
CH0847 1333.867 266.796 5 295.557
CH1260 1069.884 287.8589 3 313.7079
CH1380 612.8576 311.9307 2 265.5551
CH1692 974.525 275.8229 3 285.7469
CH2080 329.1057 285.8529 1 273.066
CH2240 715.1943 273.817 2 309.8983
CH2256 352.364 310.9277 1 292.364
CH2360 630.3014 293.8768 2 273.1136
CH2480 626.8126 311.9307 2 271.6019
CH2574 679.1439 279.8349 2 294.2774
CH2610 334.9203 280.8379 1 277.8905
CH2706 343.0607 300.8978 1 284.6448
CH2856 326.7799 287.8589 1 271.1362
FI9853 623.3239 295.8828 2 270.0902
FI9965 625.6497 323.9666 2 271.098
LC1680 629.1384 286.8559 2 272.6097
LC5292 647.7451 265.793 2 280.6721
MC3021 610.5318 244.7302 2 264.5473
NF2924 652.3968 266.796 2 282.6877
NF8918 669.8406 310.9277 2 290.2462
PC9999 665.1889 275.8229 2 288.2306
PS2106 709.3798 259.7751 2 307.3788
TC3218 617.5093 291.8708 2 267.5707
TC4203 29.23513 293.8768 0 284.9097
As you can see in this example, there are more rows per range and more duplicates
for each step value. Also, 188 steps in the histogram are used, and
the sample values for the 168,725 rows in the table are distributed across those 188 step
values. In this example, 152,432 rows, rather than the whole table, were sampled to
generate the statistics.
How the Statistics Histogram Is Used
The histogram steps are used for SARGs only when a constant expression is compared
against an indexed column and the value of the constant expression is known at query
compile time. The following SARG examples show where histogram steps can be used:
where col_a = getdate()
where cust_id = 12345
where monthly_sales < 10000 / 12
where l_name like 'Smith' + '%'
Some constant expressions cannot be evaluated until query runtime. They include
search arguments that contain local variables or subqueries and also join clauses, such as
the following:
where price = @avg_price
where total_sales > (select sum(qty) from sales)
where titles.pub_id = publishers.pub_id
For these types of statements, you need some other way of estimating the number of
matching rows. In addition, because histogram steps are kept only on the first column of
the index, SQL Server must use a different method for determining the number of
matching rows for SARGs that specify multiple column values for a composite index, such as
the following:
select * from sales
where title_id = 'BI3976'
and stor_id = 'P648'
When the histogram is not used or cannot be used, SQL Server uses the index density
values to estimate the number of matching rows.
Index Densities
SQL Server stores the density values of each column in the index for use in queries where
the SARG value is not known until runtime or when the SARG is on multiple columns of
the index. For composite keys, SQL Server stores the density for the first column of the
composite key; for the first and second columns; for the first, second, and third columns;
and so on. This information is shown in the All density section of the DBCC
SHOW_STATISTICS output in Listings 34.4 and 34.5.
Index density essentially represents the inverse of the number of unique values of the key. The
density of each key is calculated by using the following formula:
Key density = 1.00 / Count of distinct key values in the table
Therefore, the density for the au_lname column in the authors table in the bigpubs2008
database is calculated as follows:
Select Density = 1.00/ (select count(distinct au_lname) from authors)
go
Density
---------------
0.0062500000000
The density for the combination of the columns au_lname and au_fname is as follows:
Select Density = 1.00/ (select count(distinct au_lname + au_fname) from authors)
go
Density
---------------
0.0058139534883
Notice that, unlike with the selectivity ratio, a smaller index density indicates a more
selective index. As the density value approaches 1, the index becomes less selective and
essentially useless. When the index selectivity is poor, the Query Optimizer might choose
to do a table scan or a leaf-level index scan rather than perform an index seek because it is
more cost-effective.
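The density formula is easy to reproduce outside the database. The following Python sketch assumes the key values have already been fetched into memory; the function name key_density and the four sample rows are hypothetical and serve only to illustrate the arithmetic. Composite keys are represented as tuples rather than concatenated strings, so that two different name pairs such as ('ab', 'c') and ('a', 'bc') are never counted as the same key.

```python
def key_density(key_values):
    """Return 1 / (number of distinct keys); a key may be a value or a tuple."""
    distinct = len(set(key_values))
    if distinct == 0:
        raise ValueError("density is undefined for an empty key set")
    return 1.0 / distinct


# Hypothetical sample rows: (au_lname, au_fname)
authors = [
    ("White", "Johnson"),
    ("White", "Anne"),
    ("Green", "Marjorie"),
    ("Carson", "Cheryl"),
]

lname_density = key_density(lname for lname, _ in authors)  # 3 distinct last names
full_density = key_density(authors)                         # 4 distinct name pairs
print(lname_density, full_density)
```

With 160 distinct last names, as in the authors table, the same function returns 0.00625, which matches the first All density value in Listing 34.4.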
TABLE 34.8 Index Densities for the titleidind Index on the sales Table
Key Column(s)               Index Density
title_id                    0.001858736
title_id, stor_id           5.99844E-06 (.00000599844)
title_id, stor_id, ord_num  5.926804E-06 (.000005926804)
TIP
Watch out for database indexes that have poor selectivity. Such indexes are often more
of a detriment to the performance of the system than they are a help. Not only are they
usually not used for data retrieval, but they also slow down your data modification
statements because of the additional index overhead. You should identify such indexes
and consider dropping them.
Typically, the density value should become smaller (that is, more selective) as you add
more columns to the key. For example, in Listing 34.5, the densities get progressively
smaller (and thus, more selective) as additional columns are factored in, as shown in
Table 34.8.
Estimating Rows Using Index Statistics
How does the Query Optimizer use the index statistics to estimate the number of rows
that match the SARGs in a query?
SQL Server uses the histogram information when searching for a known value being
compared to the leading column of the index key, especially when the search
spans a range or when there are duplicate values in the key. Consider this query on the
sales table in the bigpubs2008 database:
select * from sales
where title_id = 'BI3976'
Because there are duplicates of title_id in the table, SQL Server uses the histogram on
title_id (refer to Listing 34.5) to estimate the number of matching rows. For the value of
BI3976, it would look at the EQ_ROWS value, which is 260.778. This indicates that there are
approximately 261 rows in the table that have a title_id value of BI3976.
When an exact match for the search argument is not found as a step in the histogram,
SQL Server uses the AVG_RANGE_ROWS value for the next step greater than the search value.
For example, SQL Server would estimate that for a search value of 'BI4184', on average, it
would match approximately 300.0652 rows because that is the AVG_RANGE_ROWS value for
the step value of 'BI8448', which is the next step greater than 'BI4184'.
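The equality lookup described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in the text, not the optimizer's actual code: the function name estimate_equal is hypothetical, the steps are the first five rows of Listing 34.5, and Python's plain string ordering stands in for the column's collation. Values that sort past the last step or before the first one are edge cases this sketch does not try to model faithfully.

```python
from bisect import bisect_left

# (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS)
# First five steps of Listing 34.5.
STEPS = [
    ("BI0194", 0.0, 274.8199, 0, 1.0),
    ("BI2184", 639.6047, 312.9337, 2, 277.1448),
    ("BI2915", 893.1208, 271.811, 3, 261.8779),
    ("BI3976", 637.2789, 260.778, 2, 276.137),
    ("BI8448", 1685.068, 281.8409, 6, 300.0652),
]


def estimate_equal(steps, value):
    """Estimate rows matching `column = value` from a sorted list of steps."""
    keys = [step[0] for step in steps]
    i = bisect_left(keys, value)  # first step whose key is >= value
    if i == len(steps):
        return 0.0                # assumption: nothing beyond the last step
    key, _, eq_rows, _, avg_range_rows = steps[i]
    # Exact hit on a step: use EQ_ROWS. Otherwise the value falls inside the
    # range that ends at the next greater step: use that step's AVG_RANGE_ROWS.
    return eq_rows if key == value else avg_range_rows


print(estimate_equal(STEPS, "BI3976"))
print(estimate_equal(STEPS, "BI4184"))
```

The first call hits the BI3976 step directly and uses its EQ_ROWS value; the second falls between BI3976 and BI8448 and uses the AVG_RANGE_ROWS value of the BI8448 step.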
When the query is a range retrieval that spans multiple steps, SQL Server sums the
RANGE_ROWS and EQ_ROWS values between the endpoints of the range retrieval. For example,
when we use the histogram in Listing 34.5, if the search argument were where title_id
<= 'BI3976', the row estimate would be 274.8199 + 639.6047 + 312.9337 + 893.1208 +
271.811 + 637.2789 + 260.778, or 3290.3470 rows.
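The range sum can be checked with a short Python sketch. The function name estimate_up_to_step is hypothetical, and the sketch deliberately handles only the case in the text, where the upper bound of the range is itself a step value; a bound that falls inside a range would need a partial-range adjustment that is not modeled here.

```python
# (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS) for the first steps of Listing 34.5.
STEPS = [
    ("BI0194", 0.0, 274.8199),
    ("BI2184", 639.6047, 312.9337),
    ("BI2915", 893.1208, 271.811),
    ("BI3976", 637.2789, 260.778),
    ("BI8448", 1685.068, 281.8409),
]


def estimate_up_to_step(steps, step_key):
    """Estimate rows for `column <= step_key`, where step_key is a step value."""
    if step_key not in [key for key, _, _ in steps]:
        raise ValueError("this sketch only handles bounds that are step values")
    # Every step at or below the bound contributes its whole range plus
    # the rows equal to its endpoint.
    return sum(range_rows + eq_rows
               for key, range_rows, eq_rows in steps
               if key <= step_key)


total = estimate_up_to_step(STEPS, "BI3976")
print(round(total, 4))
```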
As mentioned previously, when the histogram cannot be used, SQL Server uses just the
index density to estimate the number of matching rows. The formula is straightforward
for an equality search; it looks like this:
Row Estimate = Number of Rows in Table × Index Density
For example, to estimate the number of matching rows for any given title_id in the
sales table, multiply the number of rows in the sales table by the index density for the
title_id key (0.001862197), as follows:
select count(*) * 0.001862197 as 'Row Estimate'
from sales
go
Row Estimate
---------------
314.199188825
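The same arithmetic can be reproduced without a database connection. In this Python sketch the function name density_row_estimate is hypothetical; the row count of 168,725 comes from Listing 34.5 and the density is the value used in the query above.

```python
def density_row_estimate(table_rows, density):
    """Equality estimate used when the histogram cannot be consulted."""
    return table_rows * density


rows = density_row_estimate(168725, 0.001862197)
print(round(rows, 6))
```

Because the density is one over the number of distinct keys, this estimate is simply the average number of rows per distinct title_id; it is the same for every search value, which is why it is less precise than a histogram lookup when the data is skewed.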
If a query specifies both the title_id and stor_id as SARGs, and if the SARG for
title_id is a constant expression that can be evaluated at optimization time, SQL Server
uses both the index density on title_id and stor_id as well as the histogram on
title_id to estimate the number of matching rows. For some data values, the estimated
number of matching rows for title_id and stor_id calculated using the index density
could be greater than the estimated number of rows that match the specific title_id, as
determined by the histogram. SQL Server uses whichever is the smaller of the two to
calculate the row estimate.
Multiplying the number of rows in the sales table by the index density for title_id,
stor_id (5.997505E-06), you can see that it is nearly unique, essentially matching only a
single row:
select count(*) * 5.997505E-06 as 'Row Estimate'
from sales
go
Row Estimate
---------------
1.011929031125
In this example, SQL Server would use the index density on title_id and stor_id to
estimate the number of matching rows. In this case, it is estimated that the query will return,
on average, one matching row.
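The "smaller of the two" rule can be written down as a one-line comparison. The Python sketch below is only a model of the rule as the text states it; the function names are hypothetical, the histogram figure is the EQ_ROWS value for BI3976 from Listing 34.5, and the density is the composite value used in the query above.

```python
def density_row_estimate(table_rows, density):
    """Rows expected per distinct key: table rows times index density."""
    return table_rows * density


def combined_estimate(histogram_rows, table_rows, composite_density):
    """Pick the smaller of the histogram and composite-density estimates."""
    return min(histogram_rows,
               density_row_estimate(table_rows, composite_density))


TABLE_ROWS = 168725        # Rows value from Listing 34.5
HISTOGRAM_ROWS = 260.778   # EQ_ROWS for title_id = 'BI3976'

estimate = combined_estimate(HISTOGRAM_ROWS, TABLE_ROWS, 5.997505e-06)
print(round(estimate, 6))
```

Here the composite density estimate of roughly one row is far below the 260.778 rows the histogram predicts for the title alone, so the density estimate wins. The histogram value would win only for a title_id with fewer estimated rows than the average for a title_id and stor_id pair.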
Generating and Maintaining Index and Column Statistics
At this point, you might ask, “How do the index statistics get created?” and “How are they
maintained?” The index statistics are first created when you create the index on a table
that already contains data rows or when you run the UPDATE STATISTICS command. Index
statistics can also be automatically updated by SQL Server. SQL Server can be configured to
constantly monitor the update activity on the indexed key values in a database and
update the statistics through an internal process, when appropriate.
Auto-Update Statistics
To automatically update statistics, an internal SQL Server process monitors the updates to
a table’s columns to determine when statistics should be updated. SQL Server internally
keeps track of the number of modifications made to a column via column modification
counters (colmodctrs). SQL Server uses information about the table and the colmodctrs to
determine whether statistics are out of date and need to be updated. Statistics are
considered out of date in the following situations:
When the table size has gone from 0 to >0 rows
When the number of rows in the table at the time the statistics were gathered was
500 or fewer and the colmodctr of the leading column of the statistics object has
changed by more than 500
When the table had more than 500 rows at the time the statistics were gathered and
the colmodctr of the leading column of the statistics object has changed by more
than 500 + 20% of the number of rows in the table
If the statistics are defined on a temporary table, there is an additional threshold for
updating statistics every six column modifications if the table contains fewer than 500 rows.
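The three out-of-date rules for permanent tables can be expressed as a small decision function. The Python sketch below is an illustration of the rules as listed, not SQL Server's implementation; the function and parameter names are hypothetical, the 20% term is assumed to apply to the row count recorded when the statistics were gathered, and the extra temporary-table threshold is not modeled.

```python
def statistics_out_of_date(rows_when_gathered, rows_now, colmodctr):
    """Apply the three auto-update rules for a permanent table.

    rows_when_gathered: table row count when the statistics were last built
    rows_now:           current table row count
    colmodctr:          modifications to the leading column since then
    """
    if rows_when_gathered == 0:
        # Rule 1: the table has gone from 0 rows to more than 0 rows.
        return rows_now > 0
    if rows_when_gathered <= 500:
        # Rule 2: small table, more than 500 modifications.
        return colmodctr > 500
    # Rule 3: larger table, more than 500 + 20% of the rows.
    # (Assumption: 20% of the row count at the time of the last update.)
    return colmodctr > 500 + rows_when_gathered / 5


print(statistics_out_of_date(1000, 1000, 700))  # exactly at the threshold
print(statistics_out_of_date(1000, 1000, 701))  # one modification past it
```

For a 1,000-row table the threshold is 500 + 200 = 700 modifications, so the first call reports the statistics as still current and the second reports them as out of date.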
The colmodctrs are incremented in the following situations:
When a row is inserted into the table
When a row is deleted from the table
When an indexed column is updated
Whenever the index statistics have been updated for a column, the colmodctr for that
column is reset to 0.
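The counter behavior just described (increment on insert, delete, and update of an indexed column; reset when statistics are rebuilt) can be modeled with a tiny class. This Python sketch is purely illustrative; the class name ColumnModCounter and its method names are hypothetical and do not correspond to anything exposed by SQL Server.

```python
class ColumnModCounter:
    """Toy model of a colmodctr for one column."""

    def __init__(self):
        self.count = 0

    def row_inserted(self):
        self.count += 1

    def row_deleted(self):
        self.count += 1

    def column_updated(self, column_is_indexed):
        # Only updates to an indexed column advance the counter.
        if column_is_indexed:
            self.count += 1

    def statistics_updated(self):
        # Rebuilding the statistics resets the counter to 0.
        self.count = 0


ctr = ColumnModCounter()
ctr.row_inserted()
ctr.row_deleted()
ctr.column_updated(column_is_indexed=True)
ctr.column_updated(column_is_indexed=False)
print(ctr.count)
```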
When SQL Server generates an update of the column statistics, it generates the new
statistics based on a sampling of the data values in the table. Sampling helps minimize the
overhead of the AutoStats process. The sampling is random across the data pages, and the
values are taken from the table or the smallest nonclustered index on the columns needed
to generate the statistics. After a data page containing a sampled row has been read from
disk, all the rows on the data page are used to update the statistical information.
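The page-level sampling idea, where pages are chosen at random and every row on a chosen page is used, can be sketched as follows. This Python example is a simplified model under stated assumptions: a table is represented as a list of pages, each page is a list of rows, and the function name, the fraction parameter, and the seed parameter are all hypothetical.

```python
import random


def sample_rows_by_page(pages, fraction, seed=None):
    """Pick a random subset of pages and return every row on those pages."""
    if not pages:
        return []
    rng = random.Random(seed)
    # Sample at least one page, and never more pages than exist.
    n_pages = min(len(pages), max(1, round(len(pages) * fraction)))
    chosen = rng.sample(range(len(pages)), n_pages)
    # Once a page has been read, all of its rows contribute to the sample.
    return [row for i in sorted(chosen) for row in pages[i]]


# Hypothetical table: 20 pages of 10 rows each; a row is (page_no, slot_no).
pages = [[(p, s) for s in range(10)] for p in range(20)]
sample = sample_rows_by_page(pages, 0.25, seed=1)
print(len(sample))
```

Sampling whole pages keeps the I/O cost proportional to the number of pages read rather than the number of rows examined, which is the overhead saving the text refers to.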
CAUTION
Having up-to-date statistics on tables helps ensure that optimum execution plans are
being generated for queries at all times. In most cases, you would want SQL Server to
automatically keep the statistics updated. However, it is possible that Auto-Update
Statistics can cause an update of the index statistics to run at inappropriate times in a
production environment or to run too often in a high-volume environment. If this
problem is occurring, you might want to turn off the AutoStats feature and set up a
scheduled job to update statistics during off-peak periods. Do not forget to update statistics
periodically; otherwise, the resulting performance problems might end up being much
worse than the momentary ones caused by the AutoStats process.
To determine how often the AutoStats process is being run, you can use SQL Server
Profiler to monitor the Auto Stats event in the Performance event class, which shows
when an automatic update of index statistics is occurring. (For more information on
using SQL Server Profiler, see Chapter 6.)
If necessary, it is possible to turn off the AutoStats behavior by using the sp_autostats
system stored procedure. This stored procedure allows you to turn the automatic updating
of statistics on or off for a specific index or all the indexes of a table. The following
command turns off the automatic update of statistics for an index named aunmind on the
authors table:
Exec sp_autostats 'authors', 'OFF', 'aunmind'
When you run sp_autostats and simply supply the table name, it displays the current
setting for the table as well as the database. Following are the settings for the authors table:
Exec sp_autostats 'authors'
go
Global statistics settings for [bigpubs2008]:
Automatic update statistics: ON
Automatic create statistics: ON
settings for table [authors]
Index Name                AUTOSTATS  Last Updated
------------------------  ---------  -----------------------
[UPKCL_auidind] ON 2009-10-19 01:23:47.263
[aunmind] OFF 2010-03-14 22:20:52.177
[_WA_Sys_state_4AB81AF0] ON 2009-10-19 01:23:47.263
[au_fname] ON 2009-10-19 01:23:47.280
[phone] ON 2009-10-19 01:23:47.293
[address] ON 2009-10-19 01:23:47.310
[city] ON 2009-10-19 01:23:47.310
[zip] ON 2009-10-19 01:23:47.310
There are three other ways to disable auto-updating of statistics for an index:
Specify the STATISTICS_NORECOMPUTE clause when creating the index.
Specify the NORECOMPUTE option when running the UPDATE STATISTICS command.
Specify the NORECOMPUTE option when creating statistics with the CREATE STATISTICS
command. (You learn more about this command in the “Creating Statistics” section,
later in the chapter.)
You can also turn AutoStats on or off for the entire database by setting the database
option in SQL Server Management Studio; to do this, right-click the database in Object
Explorer to bring up the Database Properties dialog, select the Options page, and set the
Auto Update Statistics option to True or False. You can also disable or enable the AutoStats
option for a database by using the ALTER DATABASE command:
ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS { ON | OFF }
NOTE
What actually happens when you execute sp_autostats or use the NORECOMPUTE
option in the UPDATE STATISTICS command to turn off auto-update statistics for a
specific index or table? SQL Server internally sets a flag in the system catalog to inform
the internal SQL Server process not to update the index statistics for the table or index
that has had the option turned off using any of these commands. To re-enable Auto
Update Statistics, you either run UPDATE STATISTICS without the NORECOMPUTE option
or execute the sp_autostats system stored procedure and specify the value 'ON' for
the second parameter.
Asynchronous Statistics Updating
In versions prior to SQL Server 2005, when SQL Server determined that the statistics being
examined to optimize a query were out of date, the query would wait for the statistics update
to complete before compilation of the query plan would continue. This is still the default
behavior in SQL Server 2008. However, the database option AUTO_UPDATE_STATISTICS_ASYNC
can be enabled to support asynchronous statistics updating.
When the AUTO_UPDATE_STATISTICS_ASYNC option is enabled, queries do not have to wait
for the statistics to be updated before compiling. Instead, SQL Server puts the out-of-date
statistics on a queue to be updated by a worker thread, which runs as a background
process. The query and any other concurrent queries compile immediately by using the
existing out-of-date statistics. Because there is no delay for updated statistics, query
response times are more predictable, even if the out-of-date statistics may cause the Query
Optimizer to choose a less-efficient query plan. Queries that start after the updated
statistics are ready use the updated statistics.
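The difference between waiting for a refresh and queuing one can be shown with a toy model. The Python sketch below is single-threaded for clarity, so the background worker is simulated by an explicit method call; the class name AsyncStats, its attributes, and its methods are all hypothetical, and a statistics "version" number stands in for the statistics themselves.

```python
from collections import deque


class AsyncStats:
    """Toy model of asynchronous statistics updating for one statistics object."""

    def __init__(self):
        self.version = 1           # which statistics a compile would see
        self.out_of_date = True    # pretend the statistics are already stale
        self.refresh_queue = deque()

    def compile_query(self):
        """Compile immediately with whatever statistics exist right now."""
        if self.out_of_date and not self.refresh_queue:
            # Queue one refresh request; do not wait for it to finish.
            self.refresh_queue.append("refresh")
        return self.version

    def run_worker_once(self):
        """Stand-in for the background worker that rebuilds statistics."""
        if self.refresh_queue:
            self.refresh_queue.popleft()
            self.version += 1
            self.out_of_date = False


stats = AsyncStats()
first = stats.compile_query()    # uses the stale statistics, queues a refresh
second = stats.compile_query()   # still stale; no duplicate request is queued
stats.run_worker_once()          # background refresh completes
third = stats.compile_query()    # now sees the updated statistics
print(first, second, third)
```

The first two compilations return without waiting and both see the old version; only a query compiled after the worker has finished sees the new one, which mirrors the behavior described above.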