Microsoft SQL Server 2008 R2 Unleashed


CHAPTER 34 Data Structures, Indexes, and Performance

sp_estimate_data_compression_savings
    [ @schema_name = ] 'schema_name'
  , [ @object_name = ] 'object_name'
  , [ @index_id = ] index_id
  , [ @partition_number = ] partition_number
  , [ @data_compression = ] 'data_compression'

You can estimate the data compression savings for a table for either row or page compression by specifying either 'ROW' or 'PAGE' as the value for the @data_compression parameter. You can also estimate the size of a currently compressed table if compression were removed by specifying 'NONE' as the value for @data_compression. You can also use the sp_estimate_data_compression_savings procedure to estimate the space savings for compression on a specific index or partition. The following example estimates the space savings if page compression were applied to the sales_big table in the bigpubs2008 database versus row compression:

use bigpubs2008
go
exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'PAGE'
go

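object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
-----------  -----------  --------  ----------------  -----------------------------------------  -------------------------------------------  ------------------------------------------------  --------------------------------------------------
sales_big    dbo          1         1                 116512                                     39128                                        40016                                             13440
sales_big    dbo          2         1                 36648                                      22128                                        10904                                             6584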

exec sp_estimate_data_compression_savings 'dbo', 'sales_big', null, null, 'ROW'
go

object_name  schema_name  index_id  partition_number  size_with_current_compression_setting(KB)  size_with_requested_compression_setting(KB)  sample_size_with_current_compression_setting(KB)  sample_size_with_requested_compression_setting(KB)
-----------  -----------  --------  ----------------  -----------------------------------------  -------------------------------------------  ------------------------------------------------  --------------------------------------------------
sales_big    dbo          1         1                 116512                                     97936                                        40344                                             33912
sales_big    dbo          2         1                 36648                                      27176                                        10992                                             8152

You can see in this example that the space savings from page compression would be significant, with an estimated reduction in the size of the table itself (index_id = 1) from 113MB (116,512KB) to 38MB (39,128KB), a savings of more than 66%. Row compression would not provide nearly as significant a savings, with an estimated reduction in size from 113MB to only 95MB (97,936KB), only a 16% savings.

If you compress the table, you can compare the estimated space savings to the actual size. For example, let’s look at the initial size of the sales_big table:

use bigpubs2008
go
select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count/ 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('dbo.sales_big') AND index_id=1
GO

pages                compressed_pages
-------------------- --------------------
14519                0

size_in_MB
----------
113.742187

Now, implement page compression on the sales_big table:

ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)

Now, re-examine the size of the sales_big table:

select sum(page_count) as pages, sum(compressed_page_count) as compressed_pages
from sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('sales_big'), 1, null, 'DETAILED')
where index_level = 0
SELECT SUM(used_page_count/ 128.0) AS size_in_MB
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('dbo.sales_big') AND index_id=1
GO

pages                compressed_pages
-------------------- --------------------
4452                 4451

size_in_MB
----------
34.906250

In this example, you can see that the table was reduced in size significantly, from 14,519 pages to 4,452 pages (113.7MB to 34.9MB), pretty much right in line with the estimated space savings. You can also see that compression was reasonably effective, compressing 4,451 of 4,452 pages.
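
The percentages quoted for the estimates and the rebuild can be checked with a few lines of arithmetic. The following Python sketch is not part of the original example: the helper names savings_pct and pages_to_mb are hypothetical, and it assumes the standard 8KB SQL Server page size.

```python
# Illustrative arithmetic only; the helper names are hypothetical.
PAGE_KB = 8  # SQL Server pages are 8KB


def savings_pct(before, after):
    """Percentage reduction going from 'before' to 'after' (same units)."""
    return (before - after) / before * 100.0


def pages_to_mb(pages):
    """Convert a page count to megabytes."""
    return pages * PAGE_KB / 1024.0


# Estimates reported by sp_estimate_data_compression_savings for index_id = 1 (KB)
est_page = savings_pct(116512, 39128)
est_row = savings_pct(116512, 97936)

# Leaf-level page counts measured before and after the rebuild
actual = savings_pct(14519, 4452)

print(f"estimated PAGE savings: {est_page:.1f}%")
print(f"estimated ROW savings:  {est_row:.1f}%")
print(f"actual PAGE savings:    {actual:.1f}%")
print(f"leaf pages after rebuild: {pages_to_mb(4452):.2f} MB")
```

The leaf-level figure computed from page_count is slightly smaller than the size_in_MB value returned by sys.dm_db_partition_stats, because used_page_count also counts the non-leaf pages of the clustered index.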

Be aware that you may not always receive the space savings predicted because of the effects of fill factor and the actual size of the rows. For example, if you have a row that is 8,000 bytes long and compression reduces its size by 40%, still only one row fits on the data page, so there is no space savings for that page. If the results of running sp_estimate_data_compression_savings indicate that the table will grow, many of the rows in the table are using nearly the full precision of their data types, and the small overhead needed for the compressed format is more than the savings from compression. In this case, there is no advantage to enabling compression.
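
The 8,000-byte row case can be worked through numerically. The sketch below is a simplified model rather than SQL Server's actual allocation logic: it assumes an 8,192-byte page with a 96-byte header and a 2-byte row-offset entry per row, ignores fill factor, and uses hypothetical function names.

```python
# Simplified page model (an assumption for illustration):
# 8,192-byte page minus a 96-byte header, plus 2 bytes of
# row-offset array per row. Fill factor is ignored.
USABLE_BYTES = 8192 - 96
SLOT_BYTES = 2


def rows_per_page(row_bytes):
    """How many rows of the given size fit on one page."""
    return USABLE_BYTES // (row_bytes + SLOT_BYTES)


def pages_needed(row_count, row_bytes):
    """Pages required to hold row_count rows (ceiling division)."""
    per_page = rows_per_page(row_bytes)
    return -(-row_count // per_page)


def compressed(row_bytes, reduction_pct):
    """Row size after shrinking by reduction_pct percent."""
    return row_bytes * (100 - reduction_pct) // 100


# 8,000-byte rows reduced by 40%: still one row per page, no pages saved
wide_before = pages_needed(1000, 8000)
wide_after = pages_needed(1000, compressed(8000, 40))

# 4,000-byte rows reduced by 40%: three rows per page instead of two
mid_before = pages_needed(1000, 4000)
mid_after = pages_needed(1000, compressed(4000, 40))

print(wide_before, wide_after)
print(mid_before, mid_after)
```

Under these assumptions the same 40% reduction saves nothing for the 8,000-byte rows but cuts the page count by about a third for the 4,000-byte rows, which is why the estimate depends on row size and not only on the compression ratio.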

Managing Data Compression with SSMS

The preceding examples show the T-SQL commands you can use to evaluate and manage row and page compression in SQL Server 2008. SSMS provides a Data Compression Wizard for evaluating and performing data compression activities. To invoke the Data Compression Wizard, right-click the table in the Object Explorer, select Storage, and then select Manage Compression. Click Next to move past the Welcome page and bring up the Select Compression Type page, as shown in Figure 34.11.

On the Select Compression Type page, you can choose the compression type to use at the partition level or use the same compression type for all partitions. You can also see the estimated savings for the selected compression type by clicking the Calculate button. After you click Calculate, the wizard displays the current partition size and the requested compressed size in the corresponding columns (note that it might take a few moments to do the calculation).

After making your selections, click Next to display the Select an Output Option page. Here, you have the opportunity to have the wizard generate a script of commands you can run manually to implement the selected compression type. If you choose to generate a script, you have the option to save the script to a file, to the Clipboard, or to a new query window in SSMS. You also have the option to run the compression changes immediately or schedule a SQL Agent job to run the changes at a specified time.

Understanding Table Structures

A table is logically defined as a set of columns with certain properties, such as the data type, nullability, constraints, and so on. Information about data types, column properties, constraints, and other information related to defining and creating tables can be found in Chapters 24, “Creating and Managing Tables,” and 27, “Creating and Managing Views.”

FIGURE 34.11 The Data Compression Wizard’s Select Compression Type page


Internally, a table is contained in one or more partitions. A partition is a user-defined unit of data organization. By default, a table has at least one partition that contains all the table pages. This partition resides in a single filegroup, as described earlier. When a table has multiple partitions, the data is partitioned horizontally so that groups of rows are mapped into individual partitions, based on a specified column. The partitions can be placed in one or more filegroups in the database. The table is treated as a single logical entity when queries or updates are performed on the data. Figure 34.12 shows the organization of a table in SQL Server 2008.

Each table has one row in the sys.objects catalog view, and each table and index in a database is represented by a single row in the sys.indexes catalog view. Each partition of a table or index is represented by one or more rows in the sys.partitions catalog view.

Each partition can have three types of data, each stored on its own set of pages: in-row data pages, row-overflow pages, and LOB data pages. Each of these types of pages has an allocation unit, which is contained in the sys.allocation_units view. There is always at least one allocation unit for the in-row data. The following sample query shows how to view the partition and allocation information for the DatabaseLog and Currency tables in the AdventureWorks2008R2 database:

use AdventureWorks2008R2
go
SELECT convert(varchar(15), o.name) AS table_name,
       p.index_id as indid,
       convert(varchar(30), i.name) AS index_name,
       convert(varchar(18), au.type_desc) AS allocation_type,
       au.data_pages as d_pgs,
       partition_number as ptn
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.objects AS o ON p.object_id = o.object_id
JOIN sys.indexes AS i ON p.index_id = i.index_id AND i.object_id = p.object_id
WHERE o.name = N'databaselog' OR o.name = N'currency'
ORDER BY o.name, p.index_id;

table_name   indid index_name                    allocation_type    d_pgs ptn
------------ ----- ----------------------------- ------------------ ----- ---
Currency     1     PK_Currency_CurrencyCode      IN_ROW_DATA        1     1
Currency     2     AK_Currency_Name              IN_ROW_DATA        1     1
DatabaseLog  0     NULL                          IN_ROW_DATA        753   1
DatabaseLog  0     NULL                          LOB_DATA           0     1
DatabaseLog  0     NULL                          ROW_OVERFLOW_DATA  0     1
DatabaseLog  2     PK_DatabaseLog_DatabaseLogID  IN_ROW_DATA        3     1

FIGURE 34.12 Table organization in SQL Server 2008
[The figure shows a table divided into partitions 1 through n; each partition contains a heap or B-tree with data, LOB, and row-overflow pages.]

In this example, you can see that the DatabaseLog table (which is a heap table) has three allocation units associated with the table (LOB, row-overflow, and in-row data) and one allocation unit for the nonclustered index PK_DatabaseLog_DatabaseLogID. The Currency table (which is a clustered table) has a single in-row allocation unit for the table (index_id = 1) and another for the nonclustered index (AK_Currency_Name).

In SQL Server 2008, there are two types of tables: heap tables and clustered tables. Let’s look at how they are stored.

Heap Tables

A table without a clustered index is a heap table. There is no imposed ordering of the data rows for a heap table. Additionally, there is no direct linkage between the pages in a heap table.

By default, a heap has a single partition. Heaps have one row in sys.partitions, with an index ID of 0, for each partition used by the heap. When a heap has multiple partitions, each partition has a heap structure that contains the data for that specific partition. For example, if a heap has four partitions, there are four heap structures (one in each partition) and four rows in sys.partitions.

Depending on the data types in the heap, each heap structure has one or more allocation units to store and manage the data for each partition. At a minimum, each heap has one IN_ROW_DATA allocation unit per partition. The heap also has one LOB_DATA allocation unit per partition if it contains large object columns. It also has one ROW_OVERFLOW_DATA allocation unit per partition if it contains variable-length columns that exceed the 8,060-byte row size limit.

To access the contents of a heap, SQL Server uses the IAM pages. In SQL Server 2008, each heap table has at least one IAM page. The address of the first IAM page is available in the undocumented sys.system_internals_allocation_units system view. The column first_iam_page points to the first IAM page in the chain of IAM pages that manage the space allocated to the heap in a specific partition. The following query returns the first IAM pages for each of the allocation units for the heap table DatabaseLog in AdventureWorks2008R2:

use AdventureWorks2008R2
go
select p.partition_number as ptn,
       type_desc,
       filegroup_id,
       first_iam_page
from sys.system_internals_allocation_units i
inner join sys.partitions p
      on p.hobt_id = i.container_id
where p.object_id = OBJECT_ID('DatabaseLog')
  and index_id = 0
go

ptn type_desc          filegroup_id first_iam_page
--- ------------------ ------------ --------------
1   IN_ROW_DATA        1            0xAA0000000100
1   LOB_DATA           1            0xB90000000100
1   ROW_OVERFLOW_DATA  1            0x000000000000

Note that the value 0x000000000000 for the first_iam_page for ROW_OVERFLOW_DATA indicates that no extents have yet been allocated for storing row-overflow data.

NOTE

The sys.system_internals_allocation_units system view is reserved for Microsoft SQL Server internal use only. Future compatibility and availability of this view is not guaranteed.

The data pages and rows in the heap are not sorted in any specific order and are not linked. The IAM page registers which extents are used by the table. SQL Server can then simply scan the allocated extents referenced by the IAM page, in physical order. This essentially avoids the problem of page chain fragmentation during reads because SQL Server always reads full extents in sequential order. Using the IAM pages to set the scan sequence also means that rows from the heap often are not returned in the order in which they were inserted.
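
To make the scan-order point concrete, here is a toy Python model of an IAM-driven scan. It is a sketch under simplifying assumptions, not SQL Server's storage engine: the page numbers, row labels, and function names are all hypothetical, and the IAM is reduced to a sorted list of allocated extents, each covering eight pages.

```python
# Toy model of a heap scan driven by the IAM (illustrative only).
PAGES_PER_EXTENT = 8

# Hypothetical heap contents: page number -> rows. The row labels
# record the order in which the rows were inserted.
pages = {
    17: ["row1", "row4"],
    3: ["row2"],
    40: ["row3", "row5"],
}


def build_iam(pages):
    """Return the allocated extents in physical order, like an IAM bitmap."""
    return sorted({page_no // PAGES_PER_EXTENT for page_no in pages})


def iam_scan(pages, iam):
    """Yield rows by reading each allocated extent in physical order."""
    for extent in iam:
        first = extent * PAGES_PER_EXTENT
        for page_no in range(first, first + PAGES_PER_EXTENT):
            # Pages in the extent that hold no heap rows contribute nothing
            yield from pages.get(page_no, [])


iam = build_iam(pages)
scan_order = list(iam_scan(pages, iam))
print(iam)
print(scan_order)
```

In this model row2 comes back first because it sits in the lowest-numbered extent, even though it was the second row inserted.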

As discussed earlier, each IAM can map a maximum of 63,903 extents for a table. As a table uses extents beyond the range of those 63,903 extents, more IAM pages are created for the heap table as needed. A heap table also has at least one IAM page for each file on which the heap table has extents allocated. Figure 34.13 illustrates the structure of a heap and how its contents are traversed using the IAM pages.

FIGURE 34.13 The structure of a heap table
[The figure shows a row in sys.system_internals_allocation_units (columns allocation_unit_id, type, filegroup_id, container_id, first_page, root_page, first_iam_page, total_pages, used_pages, and data_pages) together with IAM pages in File A and File B and the data pages they map.]

Clustered Tables

A clustered table is a table that has a clustered index defined on it. When you create a clustered index, the data rows in the table are physically sorted in the order of the columns in the index key. The data pages are chained together in a doubly linked list (each page points to the next page and to the previous page). Normally, data pages are not linked. Only index pages within a level are linked in this manner to allow for ordered scans of the data in an index level. Because the data pages of a clustered table constitute the leaf level of the clustered index, they are chained as well. This allows for an ordered table scan. The page pointers are stored in the page header. Figure 34.14 shows a simplified example of the data pages of a clustered table. (Note that the figure shows only the data pages.)

[Figure 34.14: data pages of a clustered table linked by Next and Previous pointers, with rows in key order: Alexis, Amy; Cox, Nancy; Dean, Beth; Eddy, Elizabeth; Franks, Anabelle; Hunt, Sally; Martin, Emma; Smith, David; Toms, Mike; Watson, Tom.]
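
The page chain of a clustered table can be sketched as a doubly linked list. The Python below is an illustration only: the DataPage class and helper functions are hypothetical, the last names come from Figure 34.14, and the way the rows are grouped into three pages is an assumption.

```python
# Illustrative model of the doubly linked data-page chain.
class DataPage:
    def __init__(self, rows):
        self.rows = rows   # rows on this page, kept in key order here
        self.next = None   # pointer to the next page in the chain
        self.prev = None   # pointer to the previous page in the chain


def link(pages):
    """Chain the pages together and return (first, last)."""
    for left, right in zip(pages, pages[1:]):
        left.next = right
        right.prev = left
    return pages[0], pages[-1]


def scan_forward(first):
    """Ordered scan: follow next pointers from the first page."""
    page = first
    while page is not None:
        yield from page.rows
        page = page.next


def scan_backward(last):
    """Reverse-ordered scan: follow previous pointers from the last page."""
    page = last
    while page is not None:
        yield from reversed(page.rows)
        page = page.prev


# Page grouping is assumed for the example
first, last = link([
    DataPage(["Alexis", "Cox", "Dean"]),
    DataPage(["Eddy", "Franks", "Hunt", "Martin"]),
    DataPage(["Smith", "Toms", "Watson"]),
])

ascending = list(scan_forward(first))
descending = list(scan_backward(last))
print(ascending)
print(descending)
```

Because each page carries both pointers, the same chain supports an ordered scan in either direction without consulting the upper levels of the index.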

TIP

More details on the structure and maintenance of clustered tables are provided in the remainder of this chapter.

Understanding Index Structures

When you run a query against a table that has no indexes, SQL Server has to read every page of the table, looking at every row on each page to find out whether each row satisfies the search arguments. SQL Server has to scan all the pages because there’s no way of knowing whether any rows found are the only rows that satisfy the search arguments. This search method is referred to as a table scan.

A table scan is not an efficient way to retrieve data unless you really need to retrieve all rows. The Query Optimizer in SQL Server always calculates the cost of performing a table scan and uses that as a baseline when evaluating other access methods. The various access methods and query plan cost analysis are discussed in more detail in Chapter 35, “Understanding Query Optimization.”

Suppose that a table is stored on 10,000 pages; even if only one row is to be returned or modified, all the pages must be searched, resulting in a scan of approximately 80MB of data (that is, 10,000 pages × 8KB per page = 80,000KB).
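
The scan size in this example is simple arithmetic, shown here as a short Python sketch (the function name is hypothetical). It also shows why the text says "approximately" 80MB: 80,000KB is a little over 78MB when divided by 1,024.

```python
# Illustrative arithmetic for the table scan example.
PAGE_KB = 8  # each SQL Server page is 8KB


def scan_size_kb(page_count):
    """Kilobytes read when every page of the table is scanned."""
    return page_count * PAGE_KB


scan_kb = scan_size_kb(10_000)
scan_mb = scan_kb / 1024

print(scan_kb)   # kilobytes read by the scan
print(scan_mb)   # the same amount expressed in megabytes
```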

Indexes are structures stored separately from the actual data pages; they contain pointers to data pages or data rows. Indexes are used to speed up access to the data; they are also the mechanism used to enforce the uniqueness of key values.

Indexes in SQL Server are balanced trees (B-trees; see Figure 34.15). There is a single root page at the top of the tree, which branches out into N pages at each intermediate level until it reaches the bottom (leaf level) of the index. The leaf level has one row stored for each row in the table. The index tree is traversed by following pointers from the upper-level pages down through the lower-level pages. Each level of the index is linked as a doubly linked list.

FIGURE 34.15 The basic structure of a B-tree index
[The figure shows three levels: Level 2 (Root), Level 1 (Intermediate), and Level 0 (Leaf).]

An index can have many intermediate levels, depending on the number of rows in the table, index type, and index key width. The maximum number of columns in an index is 16; the maximum width of an index row is 900 bytes.
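
To see how row count and key width drive the number of levels, the following Python sketch estimates the depth of an index. It is a rough model under stated assumptions, not SQL Server's actual page layout: it uses 8,096 usable bytes per page, a hypothetical fixed overhead of 11 bytes per index row, one leaf row per table row, and completely full pages.

```python
import math

# Rough model of index depth (all constants are assumptions).
USABLE_BYTES = 8192 - 96   # page size minus the 96-byte page header
ROW_OVERHEAD = 11          # hypothetical per-index-row overhead, in bytes


def index_levels(row_count, key_bytes):
    """Estimate the number of levels, counting the leaf and the root."""
    per_page = USABLE_BYTES // (key_bytes + ROW_OVERHEAD)
    # Leaf level: one index row per table row
    pages = max(1, math.ceil(row_count / per_page))
    levels = 1
    # Each higher level holds one index row per page of the level below
    while pages > 1:
        pages = math.ceil(pages / per_page)
        levels += 1
    return levels


narrow = index_levels(1_000_000, 4)    # 4-byte key, such as an int
wide = index_levels(1_000_000, 900)    # key at the 900-byte maximum

print(narrow, wide)
```

Under these assumptions a million-row index on a 4-byte key needs three levels, while the same index on a 900-byte key needs seven, so every lookup reads more than twice as many pages.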

To provide a more efficient mechanism to identify and locate specific rows within a table quickly and easily, SQL Server supports two types of indexes: clustered and nonclustered.

Clustered Indexes

When you create a clustered index, all rows in the table are sorted and stored in the clustered index key order. Because the rows are physically sorted by the index key, you can have only one clustered index per table. You can think of the structure of a clustered index as being similar to a filing cabinet: the data pages are like folders in a file drawer in alphabetical order, and the data rows are like the records in the file folder, also in sorted order. You can think of the intermediate levels of the index tree as the file drawers, also in alphabetical order, that assist you in finding the appropriate file folder. Figure 34.16 shows an example of a clustered index tree structure.

In Figure 34.16, note that the data page chain is in clustered index order. However, the rows on each page might not be physically sorted in clustered index order, depending on when rows were inserted or deleted in the page. SQL Server still keeps the proper sort order of the rows via the row IDs and the row offset table. A clustered index is useful for range-retrieval queries or searches against columns with duplicate values because the rows within the range are physically located in the same page or on adjacent pages.

The data pages of the table are also the leaf level of a clustered index. To find all clustered index key values, SQL Server must eventually scan all the data pages.

SQL Server performs the following steps when searching for a value using a clustered index:

1. Queries the system catalogs for the page address for the root page of the index. (For a clustered index, the root_page column in sys.system_internals_allocation_units points to the top of the clustered index for a specific partition.)

2. Compares the search value against the key values stored on the root page.

3. Finds the highest key value on the page where the key value is less than or equal to the search value.

4. Follows the page pointer stored with the key to the appropriate page at the next level down in the index.

5. Continues following page pointers (that is, repeats steps 3 and 4) until the data page is reached.

6. Searches the rows on the data page to locate any matches for the search value. If no matching row is found on that data page, the table contains no matching values.
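
The steps above can be sketched in Python as a walk down a small in-memory tree. This is an illustration, not SQL Server's implementation: the IndexPage and DataPage classes, the seek function, and the sample tree are hypothetical, and the sketch stops at a single data page instead of following the page chain for duplicate keys.

```python
from bisect import bisect_right


class IndexPage:
    """Root or intermediate page: sorted (lowest key on child, child) entries."""
    def __init__(self, entries):
        self.keys = [key for key, _ in entries]
        self.children = [child for _, child in entries]


class DataPage:
    """Leaf page of the clustered index: the data rows themselves."""
    def __init__(self, rows):
        self.rows = rows  # list of (key, remaining columns)


def seek(root, value):
    """Return (matching rows, pages read) for a search value."""
    page = root            # step 1: start at the root page
    pages_read = 1
    while isinstance(page, IndexPage):
        # Steps 2 and 3: highest key less than or equal to the search value
        # (falls back to the first entry when the value sorts before all keys)
        pos = max(bisect_right(page.keys, value) - 1, 0)
        # Steps 4 and 5: follow the pointer down one level and repeat
        page = page.children[pos]
        pages_read += 1
    # Step 6: search the rows on the data page
    matches = [row for row in page.rows if row[0] == value]
    return matches, pages_read


p1 = DataPage([("Alexis", "Amy"), ("Cox", "Nancy"), ("Dean", "Beth")])
p2 = DataPage([("Eddy", "Elizabeth"), ("Franks", "Anabelle"), ("Hunt", "Sally")])
p3 = DataPage([("Smith", "David"), ("Toms", "Mike"), ("Watson", "Tom")])
root = IndexPage([
    ("Alexis", IndexPage([("Alexis", p1), ("Eddy", p2)])),
    ("Smith", IndexPage([("Smith", p3)])),
])

found, reads = seek(root, "Hunt")
missing, _ = seek(root, "Zed")
print(found, reads)
print(missing)
```

In this three-level example every lookup reads exactly three pages (root, intermediate, data page) whether or not a match exists, in contrast to a table scan, which reads every page.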
