CHAPTER 34 Data Structures, Indexes, and Performance
sp_estimate_data_compression_savings
[ @schema_name = ] 'schema_name'
, [ @object_name = ] 'object_name'
, [ @index_id = ] index_id
, [ @partition_number = ] partition_number
, [ @data_compression = ] 'data_compression'
You can estimate the data compression savings for a table for either row or page
compression by specifying either 'ROW' or 'PAGE' as the value for the @data_compression
parameter. You can also estimate the size of a compressed table if compression were
disabled by specifying 'NONE' as the value for @data_compression. You can also use the
sp_estimate_data_compression_savings procedure to estimate the space savings for
compression on a specific index or partition. The following example estimates the space
savings if page compression were applied to the sales_big table in the bigpubs2008 database
versus row compression:
use bigpubs2008
go
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'PAGE'
go
object_name schema_name index_id partition_number
    size_with_current_compression_setting(KB)
    size_with_requested_compression_setting(KB)
    sample_size_with_current_compression_setting(KB)
    sample_size_with_requested_compression_setting(KB)
----------- ----------- -------- ----------------
sales_big   dbo         1        1
    116512
    39128
    40016
    13440
sales_big   dbo         2        1
    36648
    22128
    10904
    6584
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'ROW'
go
object_name schema_name index_id partition_number
    size_with_current_compression_setting(KB)
    size_with_requested_compression_setting(KB)
    sample_size_with_current_compression_setting(KB)
    sample_size_with_requested_compression_setting(KB)
----------- ----------- -------- ----------------
sales_big   dbo         1        1
    116512
    97936
    40344
    33912
sales_big   dbo         2        1
    36648
    27176
    10992
    8152
You can see in this example that the space savings from page compression would be
significant, with an estimated reduction in the size of the table itself (index_id = 1)
from 113MB (116,512KB) to 38MB (39,128KB), a savings of more than 66%. Row compression
would not provide nearly as significant a savings, with an estimated reduction in size
from 113MB to only 95MB (97,936KB), only a 16% savings.
If you compress the table, you can compare the estimated space savings to the actual size.
For example, let's look at the initial size of the sales_big table:
use bigpubs2008
go
select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count / 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.sales_big') AND index_id = 1
GO
pages compressed_pages
----- ----------------
14519 0
size_in_MB
----------
113.742187
Now, implement page compression on the sales_big table:
ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)
Now, re-examine the size of the sales_big table:
select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(),
OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count / 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.sales_big') AND index_id = 1
GO
pages compressed_pages
----- ----------------
4452  4451
size_in_MB
----------
34.906250
In this example, you can see that the table was reduced in size significantly, from 14,519
pages to 4,452 pages (113.7MB to 34.9MB), pretty much right in line with the estimated
space savings. You can also see that compression was applied to nearly the entire table:
4,451 of the 4,452 pages were compressed.
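The percentages quoted in this section can be rechecked with simple arithmetic. The following is only a sketch: the literals are copied from the sample output shown earlier, the column aliases are invented for illustration, and nothing in it reads the table itself.

```sql
-- Recompute the savings percentages from the sizes reported in the sample output.
-- The literals come from the text above; the aliases are illustrative only.
SELECT
    -- estimated savings for PAGE compression (KB before and after)
    CAST(100.0 * (116512 - 39128) / 116512 AS decimal(5,1)) AS est_page_savings_pct,
    -- estimated savings for ROW compression (KB before and after)
    CAST(100.0 * (116512 - 97936) / 116512 AS decimal(5,1)) AS est_row_savings_pct,
    -- actual savings after the rebuild (leaf pages before and after)
    CAST(100.0 * (14519 - 4452) / 14519 AS decimal(5,1))    AS actual_page_savings_pct;
```

The results should come out at roughly 66%, 16%, and 69%, which suggests the actual page compression result in this example was slightly better than the estimate.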
Be aware that you may not always receive the space savings predicted due to the effects of
fill factor and the actual size of the rows. For example, if you have a row that is 8,000 bytes
long and compression reduces its size by 40%, still only one row can fit on the data page,
so there is no space savings for that page. If the results of running
sp_estimate_data_compression_savings indicate that the table will grow, this indicates that
many of the rows in the table are using nearly the full precision of the data types, and the
addition of the small overhead needed for the compressed format is more than the savings
from compression. In this case, there is clearly no advantage to enabling compression.
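To see which compression setting is actually in effect after a rebuild, you can inspect the data_compression_desc column of sys.partitions. The query below is a minimal sketch that assumes the sales_big table used in the preceding examples; the column aliases are illustrative.

```sql
-- Sketch: list the compression setting for each index and partition of sales_big.
USE bigpubs2008;
GO
SELECT i.index_id,
       i.name                  AS index_name,
       p.partition_number      AS ptn,
       p.data_compression_desc AS compression,   -- NONE, ROW, or PAGE
       p.rows                  AS row_count
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id
 AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.sales_big')
ORDER BY i.index_id, p.partition_number;
GO
```

Because ALTER TABLE ... REBUILD changes only the table itself (the heap or clustered index), a nonclustered index such as index_id 2 would be expected to still report NONE until it is rebuilt with its own compression option.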
Managing Data Compression with SSMS
The preceding examples show the T-SQL commands you can use to evaluate and manage
row and page compression in SQL Server 2008. SSMS provides a Data Compression Wizard
for evaluating and performing data compression activities. To invoke the Data
Compression Wizard, right-click on the table in the Object Explorer, select Storage, and
then select Manage Compression. Click Next to move past the Welcome page to bring up
the Select Compression Type page, as shown in Figure 34.11.
On the Select Compression Type page, you can choose the compression type to use at the
partition level or use the same compression type for all partitions. You can also see the
estimated savings for the selected compression type by clicking the Calculate button. After
you click Calculate, the wizard displays the current partition size and requested
compressed size in the corresponding columns (note that it might take a few moments to
do the calculation).
After making your selections, click Next to display the Select an Output Option page.
Here, you have the opportunity to have the wizard generate a script of commands you can
run manually to implement the selected compression type. If you choose to generate a
script, you have the option to save the script to a file, to the Clipboard, or to a new query
window in SSMS. You also have the option to run the compression changes immediately
or schedule a SQL Agent job to run the changes at a specified time.
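A script for applying page compression would contain statements along the following lines. This is a hand-written sketch rather than captured wizard output, and it assumes you want either the table alone or the table plus all of its indexes compressed.

```sql
USE bigpubs2008;
GO
-- Option 1: rebuild only the table itself (heap or clustered index),
-- applying page compression to every partition.
ALTER TABLE dbo.sales_big REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
GO
-- Option 2: rebuild every index on the table, including the clustered index.
-- Nonclustered indexes do not inherit the table's compression setting,
-- so they are compressed only if rebuilt with the option themselves.
ALTER INDEX ALL ON dbo.sales_big REBUILD
WITH (DATA_COMPRESSION = PAGE);
GO
```

Saving such a script instead of running the change immediately lets you review it and schedule the rebuild for a quieter period, since a rebuild of a large table can take some time.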
Understanding Table Structures
A table is logically defined as a set of columns with certain properties, such as the data
type, nullability, constraints, and so on. Information about data types, column properties,
constraints, and other information related to defining and creating tables can be found in
Chapters 24, "Creating and Managing Tables," and 27, "Creating and Managing Views."
FIGURE 34.11 The Data Compression Wizard’s Select Compression Type page
Internally, a table is contained in one or more partitions. A partition is a user-defined unit of
data organization. By default, a table has at least one partition that contains all the table pages.
This partition resides in a single filegroup, as described earlier. When a table has multiple
partitions, the data is partitioned horizontally so that groups of rows are mapped into individual
partitions, based on a specified column. The partitions can be placed in one or more filegroups
in the database. The table is treated as a single logical entity when queries or updates are
performed on the data. Figure 34.12 shows the organization of a table in SQL Server 2008.
Each table has one row in the sys.objects catalog view, and each table and index in a
database is represented by a single row in the sys.indexes catalog view. Each partition of
a table or index is represented by one or more rows in the sys.partitions catalog view.
Each partition can have three types of data, each stored on its own set of pages: in-row
data pages, row-overflow pages, and LOB data pages. Each of these types of pages has an
allocation unit, which is contained in the sys.allocation_units view. There is always at
least one allocation unit for the in-row data. The following sample query shows how to
view the partition and allocation information for the databaselog and currency tables in
the AdventureWorks2008R2 database:
use AdventureWorks2008R2
go
SELECT convert(varchar(15), o.name) AS table_name,
       p.index_id as indid,
       convert(varchar(30), i.name) AS index_name,
       convert(varchar(18), au.type_desc) AS allocation_type,
       au.data_pages as d_pgs,
       partition_number as ptn
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.objects AS o ON p.object_id = o.object_id
JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'databaselog' OR o.name = N'currency'
ORDER BY o.name, p.index_id;
table_name   indid index_name                    allocation_type    d_pgs ptn
------------ ----- ----------------------------- ------------------ ----- ---
Currency     1     PK_Currency_CurrencyCode      IN_ROW_DATA        1     1
Currency     2     AK_Currency_Name              IN_ROW_DATA        1     1
DatabaseLog  0     NULL                          IN_ROW_DATA        753   1
DatabaseLog  0     NULL                          LOB_DATA           0     1
DatabaseLog  0     NULL                          ROW_OVERFLOW_DATA  0     1
DatabaseLog  2     PK_DatabaseLog_DatabaseLogID  IN_ROW_DATA        3     1
FIGURE 34.12 Table organization in SQL Server 2008 (a table contains partitions 1 through n;
each partition holds a heap or B-tree with data, LOB, and row-overflow allocation units)
In this example, you can see that the DatabaseLog table (which is a heap table) has three
allocation units associated with the table (LOB, row-overflow, and in-row data) and one
allocation unit for the nonclustered index PK_DatabaseLog_DatabaseLogID. The currency
table (which is a clustered table) has a single in-row allocation unit for each of the table
(index_id = 1) and the nonclustered index (AK_Currency_Name).
In SQL Server 2008, there are two types of tables: heap tables and clustered tables. Let's
look at how they are stored.
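To see which of these two storage types each table in a database uses, you can look at the row with index ID 0 or 1 in sys.indexes. The following is a small sketch against AdventureWorks2008R2; the column aliases are illustrative.

```sql
-- Sketch: classify each user table as a heap (index_id = 0)
-- or a clustered table (index_id = 1).
USE AdventureWorks2008R2;
GO
SELECT SCHEMA_NAME(o.schema_id) AS table_schema,
       o.name                   AS table_name,
       i.type_desc              AS storage_type   -- HEAP or CLUSTERED
FROM sys.objects AS o
JOIN sys.indexes AS i
  ON i.object_id = o.object_id
WHERE o.type = 'U'              -- user tables only
  AND i.index_id IN (0, 1)      -- a table has one or the other, never both
ORDER BY storage_type, table_schema, table_name;
GO
```

Based on the earlier output, DatabaseLog would be expected to appear as HEAP and Currency as CLUSTERED.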
Heap Tables
A table without a clustered index is a heap table. There is no imposed ordering of the
data rows for a heap table. Additionally, there is no direct linkage between the pages in a
heap table.
By default, a heap has a single partition. Heaps have one row in sys.partitions, with an
index ID of 0, for each partition used by the heap. When a heap has multiple partitions,
each partition has a heap structure that contains the data for that specific partition. For
example, if a heap has four partitions, there are four heap structures (one in each
partition) and four rows in sys.partitions.
Depending on the data types in the heap, each heap structure has one or more allocation
units to store and manage the data for each partition. At a minimum, each heap has one
IN_ROW_DATA allocation unit per partition. The heap also has one LOB_DATA allocation unit
per partition if it contains large object columns. It also has one ROW_OVERFLOW_DATA
allocation unit per partition if it contains variable-length columns that exceed the 8,060-byte
row size limit.
To access the contents of a heap, SQL Server uses the IAM pages. In SQL Server 2008, each
heap table has at least one IAM page. The address of the first IAM page is available in the
undocumented sys.system_internals_allocation_units system view. The column
first_iam_page points to the first IAM page in the chain of IAM pages that manage the
space allocated to the heap in a specific partition. The following query returns the first
IAM pages for each of the allocation units for the heap table DatabaseLog in
AdventureWorks2008R2:
use AdventureWorks2008R2
go
select p.partition_number as ptn,
       type_desc,
       filegroup_id,
       first_iam_page
from sys.system_internals_allocation_units i
inner join
     sys.partitions p
     on p.hobt_id = i.container_id
where p.object_id = OBJECT_ID('DatabaseLog')
  and index_id = 0
go
ptn type_desc          filegroup_id first_iam_page
--- ------------------ ------------ --------------
1   IN_ROW_DATA        1            0xAA0000000100
1   LOB_DATA           1            0xB90000000100
1   ROW_OVERFLOW_DATA  1            0x000000000000
Note that the value 0x000000000000 for the first_iam_page for ROW_OVERFLOW_DATA
indicates that no extents have yet been allocated for storing row-overflow data.
NOTE
The sys.system_internals_allocation_units system view is reserved for Microsoft
SQL Server internal use only. Future compatibility and availability of this view are not
guaranteed.
The data pages and rows in the heap are not sorted in any specific order and are not linked.
The IAM page registers which extents are used by the table. SQL Server can then simply scan
the allocated extents referenced by the IAM page, in physical order. This essentially avoids
the problem of page chain fragmentation during reads because SQL Server always reads full
extents in sequential order. Using the IAM pages to set the scan sequence also means that
rows from the heap often are not returned in the order in which they were inserted.
As discussed earlier, each IAM page can map a maximum of 63,903 extents for a table. As a table
uses extents beyond the range of those 63,903 extents, more IAM pages are created for the
heap table as needed. A heap table also has at least one IAM page for each file on which
the heap table has extents allocated. Figure 34.13 illustrates the structure of a heap and
how its contents are traversed using the IAM pages.
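To put the 63,903-extent figure in perspective, the range covered by a single IAM page can be worked out with a little arithmetic. This sketch assumes the standard SQL Server values of 8 pages per extent and 8KB per page; the column aliases are illustrative.

```sql
-- Sketch: how much of a file one IAM page can map,
-- assuming 8 pages per extent and 8KB per page.
SELECT 63903 * 8                 AS pages_per_iam,   -- pages covered by one IAM page
       63903 * 8 * 8 / 1048576.0 AS gb_per_iam;      -- KB covered, converted to GB
```

The result is a little over half a million pages, or just under 4GB per IAM page, which is why a large heap, or one spread across several files, needs a chain of IAM pages.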
Clustered Tables
A clustered table is a table that has a clustered index defined on it. When you create a
clustered index, the data rows in the table are physically sorted in the order of the columns in
the index key. The data pages are chained together in a doubly linked list (each page points
to the next page and to the previous page). Normally, data pages are not linked. Only
index pages within a level are linked in this manner to allow for ordered scans of the data
in an index level. Because the data pages of a clustered table constitute the leaf level of the
clustered index, they are chained as well. This allows for an ordered table scan. The page
pointers are stored in the page header. Figure 34.14 shows a simplified example of the data
pages of a clustered table. (Note that the figure shows only the data pages.)
FIGURE 34.13 The structure of a heap table (the first IAM page recorded in
sys.system_internals_allocation_units leads to an IAM page in each file, File A and File B,
which maps the data pages allocated to the heap in that file)
Page 1: [Previous] Albert, Lynn,… | Alexis, Amy,… | Cox, Nancy,… | Dean, Beth,… [Next]
Page 2: [Previous] Eddy, Elizabeth,… | Franks, Anabelle,… | Hunt, Sally,… | Martin, Emma,… [Next]
Page 3: [Previous] Smith, David,… | Toms, Mike,… | Watson, Tom,… [Next]
FIGURE 34.14 The data page structure of a clustered table
TIP
More details on the structure and maintenance of clustered tables are provided in the
remainder of this chapter.
Understanding Index Structures
When you run a query against a table that has no indexes, SQL Server has to read every
page of the table, looking at every row on each page to find out whether each row satisfies
the search arguments. SQL Server has to scan all the pages because there's no way of
knowing whether any rows found are the only rows that satisfy the search arguments.
This search method is referred to as a table scan.
A table scan is not an efficient way to retrieve data unless you really need to retrieve all
rows. The Query Optimizer in SQL Server always calculates the cost of performing a table
scan and uses that as a baseline when evaluating other access methods. The various access
methods and query plan cost analysis are discussed in more detail in Chapter 35,
"Understanding Query Optimization."
Suppose that a table is stored on 10,000 pages; even if only one row is to be returned or
modified, all the pages must be searched, resulting in a scan of approximately 80MB of
data (that is, 10,000 pages × 8KB per page = 80,000KB).
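The same arithmetic can be applied to a real table to get a rough idea of how much data a full scan would read. The following sketch uses the sales_big table from earlier in the chapter as an assumed example; in_row_data_page_count reports the data (leaf-level) pages of the heap or clustered index, and the aliases are illustrative.

```sql
USE bigpubs2008;
GO
-- The arithmetic from the text: 10,000 pages at 8KB each.
SELECT 10000 * 8          AS scan_kb,
       10000 * 8 / 1024.0 AS scan_mb;

-- The same estimate for an actual table: count the data pages
-- of the heap (index_id = 0) or clustered index (index_id = 1).
SELECT SUM(in_row_data_page_count)              AS data_pages,
       SUM(in_row_data_page_count) * 8 / 1024.0 AS approx_scan_mb
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.sales_big')
  AND index_id IN (0, 1);
GO
```

Note that 80,000KB works out to about 78MB when divided by 1,024, which the text rounds to approximately 80MB.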
Indexes are structures stored separately from the actual data pages; they contain pointers
to data pages or data rows. Indexes are used to speed up access to the data; they are also
the mechanism used to enforce the uniqueness of key values.
Indexes in SQL Server are balanced trees (B-trees; see Figure 34.15). There is a single root
page at the top of the tree, which branches out into N pages at each intermediate level
until it reaches the bottom (leaf level) of the index. The leaf level has one row stored for
each row in the table. The index tree is traversed by following pointers from the
upper-level pages down through the lower-level pages. Each level of the index is linked as a
doubly linked list.
FIGURE 34.15 The basic structure of a B-tree index (Level 2 is the root, Level 1 is the
intermediate level, and Level 0 is the leaf level)
An index can have many intermediate levels, depending on the number of rows in the
table, index type, and index key width. The maximum number of columns in an index is
16; the maximum width of an index row is 900 bytes.
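To see how many levels an existing index actually has, you can query sys.dm_db_index_physical_stats in DETAILED mode, which returns one row per index level. The sketch below assumes the sales_big table from earlier in the chapter.

```sql
-- Sketch: show the depth of each index on sales_big and the
-- number of pages and rows at every level (leaf level = 0).
USE bigpubs2008;
GO
SELECT index_id,
       index_depth,    -- total number of levels in the index
       index_level,    -- 0 = leaf; the highest value is the root
       page_count,
       record_count
FROM sys.dm_db_index_physical_stats
       (DB_ID(), OBJECT_ID('dbo.sales_big'), NULL, NULL, 'DETAILED')
WHERE alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY index_id, index_level DESC;
GO
```

For each index, the row with the highest index_level should show a page_count of 1 (the root page), with the page counts growing at each level down to the leaf.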
To provide a more efficient mechanism to identify and locate specific rows within a table
quickly and easily, SQL Server supports two types of indexes: clustered and nonclustered.
Clustered Indexes
When you create a clustered index, all rows in the table are sorted and stored in the
clustered index key order. Because the rows are physically sorted by the index key, you can
have only one clustered index per table. You can think of the structure of a clustered index
as being similar to a filing cabinet: the data pages are like folders in a file drawer in
alphabetical order, and the data rows are like the records in the file folder, also in sorted order.
You can think of the intermediate levels of the index tree as the file drawers, also in
alphabetical order, that assist you in finding the appropriate file folder. Figure 34.16 shows an
example of a clustered index tree structure.
In Figure 34.16, note that the data page chain is in clustered index order. However, the
rows on each page might not be physically sorted in clustered index order, depending on
when rows were inserted or deleted in the page. SQL Server still keeps the proper sort
order of the rows via the row IDs and the row offset table. A clustered index is useful for
range-retrieval queries or searches against columns with duplicate values because the rows
within the range are physically located in the same page or on adjacent pages.
The data pages of the table are also the leaf level of a clustered index. To find all clustered
index key values, SQL Server must eventually scan all the data pages.
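As a small illustration of range retrieval on a clustered index, the following sketch builds a throwaway temporary table. The table, index, and column names are hypothetical; the sample names simply echo those shown in Figure 34.14.

```sql
-- Hypothetical temp table; all names here are for illustration only.
CREATE TABLE #customers
(
    last_name  varchar(30) NOT NULL,
    first_name varchar(30) NOT NULL
);

-- Only one clustered index is allowed per table because it
-- determines the order in which the rows themselves are stored.
CREATE CLUSTERED INDEX ci_customers_name
    ON #customers (last_name, first_name);

INSERT INTO #customers (last_name, first_name)
VALUES ('Smith', 'David'), ('Albert', 'Lynn'), ('Cox', 'Nancy'),
       ('Dean', 'Beth'), ('Alexis', 'Amy'), ('Eddy', 'Elizabeth');

-- Range retrieval: the qualifying rows are stored together in key order,
-- so they can be read from the same page or from adjacent pages.
SELECT last_name, first_name
FROM #customers
WHERE last_name >= 'A' AND last_name < 'D'
ORDER BY last_name, first_name;

DROP TABLE #customers;
```

Although the rows are inserted in no particular order, the range query returns the Albert, Alexis, and Cox rows, which the clustered index keeps next to one another.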
SQL Server performs the following steps when searching for a value using a clustered index:
1. Queries the system catalogs for the page address for the root page of the index. (For
   a clustered index, the root_page column in
   sys.system_internals_allocation_units points to the top of the clustered index
   for a specific partition.)
2. Compares the search value against the key values stored on the root page.
3. Finds the highest key value on the page where the key value is less than or equal to
   the search value.
4. Follows the page pointer stored with the key to the appropriate page at the next
   level down in the index.
5. Continues following page pointers (that is, repeats steps 3 and 4) until the data page
   is reached.
6. Searches the rows on the data page to locate any matches for the search value. If no
   matching row is found on that data page, the table contains no matching values.
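The starting point for step 1 can be seen by querying the same undocumented view used earlier for heaps. This is only a sketch: it assumes the Currency table lives in the Sales schema of AdventureWorks2008R2, and because the view is reserved for internal use, its columns may change between versions.

```sql
-- Sketch: find the root page of the clustered index on the Currency table.
-- Assumes the table is Sales.Currency; the view is undocumented.
USE AdventureWorks2008R2;
GO
SELECT p.partition_number AS ptn,
       au.type_desc,
       au.root_page,        -- top of the clustered index B-tree (step 1)
       au.first_page        -- first page in the leaf-level page chain
FROM sys.system_internals_allocation_units AS au
JOIN sys.partitions AS p
  ON p.hobt_id = au.container_id
WHERE p.object_id = OBJECT_ID('Sales.Currency')
  AND p.index_id = 1                    -- the clustered index
  AND au.type_desc = 'IN_ROW_DATA';
GO
```

The page addresses are returned as 6-byte binary values in the same format as the first_iam_page values shown earlier. From the root page, SQL Server follows the page pointers as described in steps 2 through 6.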