Microsoft SQL Server 2008 R2 Unleashed- P118 doc


CHAPTER 34 Data Structures, Indexes, and Performance

If SQL Server had to search throughout an entire database file to find free extents, it wouldn’t be efficient. Instead, SQL Server uses two special types of pages to record which extents have been allocated to tables or indexes and whether each is a mixed or uniform extent:

Global allocation map pages (GAMs)

Shared global allocation map pages (SGAMs)

Global and Shared Global Allocation Map Pages

The allocation map pages track whether extents have been allocated to objects and indexes and whether the allocation is for mixed extents or uniform extents. As mentioned in the preceding section, there are two types of GAMs:

Global allocation map (GAM): The GAM keeps track of all allocated extents in a database, regardless of what they are allocated to. The structure of the GAM is straightforward: each bit in the page outside the page header represents one extent in the file, where 1 means that the extent is not allocated, and 0 means that the extent is allocated. Nearly 8,000 bytes (64,000 bits) are available in a GAM page after the header and other overhead bytes are taken into account. Therefore, a single GAM covers approximately 64,000 extents, or 4GB (64,000 * 64KB) of data.

Shared global allocation map (SGAM): The SGAM keeps track of mixed extents that have free space available. An SGAM has a structure similar to a GAM, with each bit representing an extent. A value of 1 means that the extent is a mixed extent and there is free space (at least one unused page) available on the extent. A value of 0 means that the extent is not currently allocated, that the extent is a uniform extent, or that the extent is a mixed extent with no free pages.

Table 34.6 summarizes the meaning of the bits in GAMs and SGAMs.

When SQL Server needs to allocate a uniform extent, it simply searches the GAM for a bit with a value of 1 and sets it to 0 to indicate it has been allocated. To find a mixed extent with free pages, it searches the SGAM for a bit set to 1. When all pages in a mixed extent are used, its corresponding SGAM bit is set to 0. When a new mixed extent needs to be allocated, SQL Server searches the GAM for an extent whose bit is set to 1 and sets the bit to 0, and the corresponding SGAM bit is set to 1. There is some more processing involved as well, such as spreading the data evenly across database files, but the allocation algorithms are still relatively simple.
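The allocation steps just described can be sketched with two small bit lists. This is a toy model and not SQL Server's implementation: the function names are hypothetical, the bitmaps are ordinary Python lists, and the sketch ignores multiple files and proportional fill.

```python
# Toy model of the GAM and SGAM bitmaps for a file with a few extents.
# GAM bit:  1 = extent is free, 0 = extent is allocated.
# SGAM bit: 1 = mixed extent with at least one free page, 0 = anything else.

def allocate_uniform(gam):
    """Claim the first free extent as a uniform extent."""
    extent = gam.index(1)      # raises ValueError when no extent is free
    gam[extent] = 0            # mark the extent as allocated
    return extent


def allocate_mixed(gam, sgam):
    """Claim the first free extent as a mixed extent with free pages."""
    extent = gam.index(1)
    gam[extent] = 0            # the extent is now allocated ...
    sgam[extent] = 1           # ... and still has free pages
    return extent


def mark_mixed_full(sgam, extent):
    """Record that every page of a mixed extent is now in use."""
    sgam[extent] = 0


def describe(gam_bit, sgam_bit):
    """Translate a (GAM, SGAM) bit pair into the extent state it encodes."""
    if gam_bit == 1:
        return "free"
    if sgam_bit == 1:
        return "mixed extent with free pages"
    return "uniform extent or full mixed extent"


gam = [1, 1, 1, 1]
sgam = [0, 0, 0, 0]
uniform = allocate_uniform(gam)
mixed = allocate_mixed(gam, sgam)
print(uniform, describe(gam[uniform], sgam[uniform]))
print(mixed, describe(gam[mixed], sgam[mixed]))
```

Note that a uniform extent and a full mixed extent are indistinguishable from these two bits alone; both read as GAM 0, SGAM 0.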

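TABLE 34.6 Meaning of the GAM and SGAM Bits

Use of Extent                            GAM Bit   SGAM Bit
Free, not allocated                      1         0
Uniform extent or full mixed extent      0         0
Mixed extent with free pages             0         1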


SQL Server is able to easily locate GAM pages in a database because the first GAM page is located at the third page in the file (page number 2). The next GAM page is 511,230 pages after the first GAM, at page number 511232. The fourth page (page number 3) in each database file is the SGAM page, and the next SGAM page is likewise 511,230 pages after the first SGAM, at page number 511233.

Page Free Space Pages

A page free space (PFS) page records whether each page is allocated and the amount of free space available on the page. Each PFS covers 8,088 contiguous pages in the file. For each of the 8,088 pages, the PFS has a 1-byte record that contains a bitmap indicating whether the page is empty, 1 to 50% full, 51 to 80% full, 81 to 95% full, or more than 95% full. The first PFS page in a file is located at page number 1, the second PFS page is located at page 8088, and each additional PFS page is located every 8,088 pages after that. SQL Server uses PFS pages to find free pages on extents and to find pages with space available on extents when a new row needs to be added to a table or index.
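The PFS placement and fullness bands described above reduce to a little arithmetic. The following is a minimal sketch using only the numbers given in the text; the function names are hypothetical, and treating page 0 as covered by the PFS at page 1 is an assumption.

```python
PFS_INTERVAL = 8088  # pages tracked by one PFS page, per the text


def pfs_page_for(page_id):
    """Return the page number of the PFS page that tracks page_id."""
    if page_id < PFS_INTERVAL:
        return 1                                   # first PFS page is page 1
    return (page_id // PFS_INTERVAL) * PFS_INTERVAL  # 8088, 16176, ...


def fullness_band(percent_full):
    """Map a page's fill percentage to the band a PFS record distinguishes."""
    if percent_full == 0:
        return "empty"
    if percent_full <= 50:
        return "1 to 50% full"
    if percent_full <= 80:
        return "51 to 80% full"
    if percent_full <= 95:
        return "81 to 95% full"
    return "more than 95% full"


print(pfs_page_for(20000), fullness_band(72))
```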

Figure 34.6 shows the layout of GAM, SGAM, and PFS pages in a database file. Note that every file has a single file header located at page 0.

Index Allocation Map Pages

Index allocation map (IAM) pages keep track of the extents used by a heap or index. Each heap table and index has at least one IAM page for each file where it has extents. An IAM cannot reference pages in other database files; if the heap or index spreads to a new database file, a new IAM for the heap or index is created in that file. IAM pages are allocated as needed and are spread randomly throughout the database files.

An IAM page contains a small header that has the address of the first extent in the range of pages being mapped by the IAM. It also contains eight page pointers that keep track of index or heap pages that are in mixed extents. These pointers might or might not contain any information, depending on whether any data has been deleted from the tables and the page(s) released. Remember, an index or heap will have no more than eight pages in mixed extents (after eight pages, it begins using uniform extents), so only the first IAM page stores this information. The remainder of the IAM page is for the allocation bitmap. The IAM bitmap works similarly to the GAM, indicating which extents over the range of extents covered by the IAM are used by the heap or index the IAM belongs to. If a bit is on, the corresponding extent is allocated to the table.

[Figure: page 0 is the file header, page 1 is the first PFS page, page 2 is the first GAM page, and page 3 is the first SGAM page. Additional PFS pages recur at regular intervals through the file, and the second GAM and SGAM pages are at pages 511232 and 511233.]

FIGURE 34.6 The layout of GAM, SGAM, and PFS pages in a database file

Each IAM covers a possible range of 63,903 extents (511,224 pages), covering a 4GB section of a file. Each bit represents an extent within that range, whether or not the extent is allocated to the object that the IAM belongs to. If the bit is set to 1, the relative extent in the range is allocated to the index or heap. If the bit is set to 0, the extent is either not allocated or might be allocated to another heap or index.

For example, assume that an IAM page resides at page 649 in the file. If the bit pattern in the first byte of the IAM is 1010 0100, the first, third, and sixth extents within the range of the IAM are allocated to the heap or index. The second, fourth, fifth, seventh, and eighth extents are not.
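The example above can be checked mechanically. This short sketch reads the pattern left to right exactly as it is written in the text; the function name is hypothetical, and the physical bit order inside a real IAM page is not something the sketch tries to model.

```python
def allocated_extents(bit_pattern):
    """Return the 1-based positions of the bits that are set to 1."""
    bits = bit_pattern.replace(" ", "")
    return [position + 1 for position, bit in enumerate(bits) if bit == "1"]


print(allocated_extents("1010 0100"))  # → [1, 3, 6]
```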

NOTE

For a heap table, the data pages and rows within them are not stored in any specific order. Unlike versions of SQL Server prior to 7.0, the pages in a heap structure are not linked together in a page chain. The only logical connection between data pages is the information recorded in the IAM pages, which are linked together. The structure of heap tables is examined in more detail later in this chapter.

Differential Changed Map Pages

The seventh page (page number 6), and every 511,232nd page thereafter, in the database file is the differential changed map (DCM) page. This page keeps track of which extents in a file have been modified since the last full database backup. When an extent has been modified, its corresponding bit in the DCM is turned on. This information is used when a differential backup is performed on the database. A differential backup copies only the extents changed since the last full backup was made. Using the DCM, SQL Server can quickly tell which extents need to be backed up by examining the bits on the DCM pages for each data file in the database. When a full backup is performed for the database, all the bits are set back to 0.

Bulk Changed Map Pages

The eighth page (page number 7), and every 511,232nd page thereafter, in the database file is the bulk changed map (BCM). When you perform a minimally logged or bulk-logged operation in SQL Server 2008 under the BULK_LOGGED recovery model, SQL Server logs only the fact that the operation occurred and doesn’t log the actual data changes. The operation is still fully recoverable because SQL Server keeps track of what extents were actually modified by the bulk operation in the BCM page. Similar to the DCM page, each bit on a BCM page represents an extent within its range, and if the bit is set to 1, that indicates that the corresponding extent has been changed by a minimally logged bulk operation since the last full database backup. All the bits on the BCM page are reset to 0 whenever a full database backup or log backup occurs.

When you initiate a log backup for a database using the BULK_LOGGED recovery model, SQL Server scans the BCM pages and backs up all the modified extents along with the contents of the transaction log itself. You should be aware that the log file itself might be small, but the backup of the log can be many times larger if a large bulk operation has been performed since the last log backup.


Data Compression

SQL Server 2008 introduced a new data compression feature that is available in Enterprise and Datacenter Editions. Data compression helps to reduce both storage and memory requirements because the data is compressed both on disk and when brought into the SQL Server data cache.

When compression is enabled and data is written to disk, it is compressed and stored in the designated compressed format. When the data is read from disk into the buffer cache, it remains in its compressed format. This helps reduce both storage requirements and memory requirements. It also reduces I/O because more data can be stored on a data page when it’s compressed. When the data is passed to another component of SQL Server, however, the Database Engine then has to uncompress the data on the fly. In other words, every time data has to be passed to or from the buffer cache, it has to be compressed or uncompressed. This requires extra CPU overhead. However, in most cases, the amount of I/O and buffer cache saved by compression more than makes up for the CPU costs, boosting the overall performance of SQL Server.

Data compression can be applied on the following database objects:

Tables (clustered or heap)

As the DBA, you need to evaluate which of the preceding objects in your database could benefit from compression and then decide whether you want to compress them using either row-level or page-level compression. Compression is enabled or disabled at the object level. There is no single option you can enable that turns compression on or off for all objects in the database. Fortunately, other than turning compression on or off for the preceding objects, you don’t have to do anything else to use data compression. SQL Server handles data compression transparently without your having to re-architect your database or your applications.

Row-Level Compression

Row-level compression isn’t true data compression. Instead, space savings are achieved by using a more efficient storage format for fixed-length data to use the minimum amount of space required. For example, the int data type uses 4 bytes of storage regardless of the value stored, even NULL. However, only a single byte is required to store a value of 100. Row-level compression allows fixed-length values to use only the amount of storage space required.

Row-level compression saves space and reduces I/O by

Reducing the amount of metadata required to store data rows

Storing fixed-length numeric data types as if they were variable-length data types, using only as many bytes as necessary to store the actual value

Storing CHAR data types as variable-length data types

Not storing NULL or 0 values
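The idea of storing an integer in only as many bytes as its value needs can be illustrated as follows. This is a simplified sketch that assumes a plain signed two's-complement encoding; the function name is hypothetical, and the exact on-disk encoding SQL Server uses is not described in the text.

```python
def bytes_needed(value):
    """Smallest number of bytes that can hold value as a signed integer.

    NULL (None) and 0 take no bytes, mirroring the last point in the list.
    """
    if value is None or value == 0:
        return 0
    size = 1
    # Grow the width until value fits in the signed range for that width.
    while not (-(1 << (8 * size - 1)) <= value < (1 << (8 * size - 1))):
        size += 1
    return size


for sample in (None, 0, 100, 1000, 2**31 - 1):
    print(sample, bytes_needed(sample))
```

Under this assumption, the value 100 from the example needs 1 byte instead of the 4 bytes a fixed-length int always occupies.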


Row-level data compression provides less compression than page-level data compression, but it also incurs less overhead, reducing the amount of CPU resources required to implement it.

Row-level compression can be enabled when creating a table or index or by using the ALTER TABLE or ALTER INDEX commands and specifying the WITH (DATA_COMPRESSION = ROW) option. The following example enables row compression on the titles table in the bigpubs2008 database:

ALTER TABLE titles REBUILD WITH (DATA_COMPRESSION=ROW)

Additionally, if a table or index is partitioned, you can apply compression at the partition level.

When row-level compression is applied to a table, a new row format is used that is unlike the standard data row format discussed previously, which has a fixed-length data section separate from a variable-length data section (see Figure 34.3). This new row format is referred to as column descriptor, or CD, format. The name of this row format refers to the fact that every column has description information contained in the row itself. Figure 34.7 illustrates a representative view of the CD format (a definitive view is difficult because, except for the header, the number of bytes in each region is completely dependent on the values in the data row).

The row header is always 1 byte in length and contains information similar to Status Bits A in a normal data row:

Bit 0: This bit indicates the type of record (1 = CD record format).

Bit 1: This bit indicates whether the row contains versioning information.

Bits 2–4: This three-bit value indicates what kind of information is stored in the row (such as primary record, ghost record, forwarding record, index record).

Bit 5: This bit indicates whether the row contains a Long data region (with values greater than 8 bytes in length).

Bits 6 and 7: These bits are not used.

Header (1 byte) | CD Region | Short Data Region | Long Data Region | Special Information

FIGURE 34.7 A representative structure of a CD format row

The CD region consists of two parts. The first is either a 1- or 2-byte value indicating the number of short columns (8 bytes or less). If the most significant bit of the first byte is set to 0, it’s a 1-byte field representing up to 127 columns; if it’s 1, it’s a 2-byte field representing up to 32,767 columns. Following the first 1 or 2 bytes is the CD array. The CD array uses 4 bits for each column in the table to represent information about the length of the column. A bit representation of 0 indicates the column is NULL. A bit representation of the values 1 to 9 indicates the column is 0 to 8 bytes in length, respectively. A bit representation of 10 (0xa) indicates that the corresponding column value is a long data value and uses no space in the short data region. A bit representation of 11 (0xb) represents a bit column with a value of 1, and a bit representation of 12 (0xc) indicates that the corresponding value is a 1-byte symbol representing a value in the page compression dictionary (the page compression dictionary is discussed next in the page-level compression section).
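The 4-bit CD array codes listed above can be expressed as a small lookup. This is an illustrative sketch only: the function names are hypothetical, and the order in which two entries are packed into one byte (low nibble first here) is an assumption that the text does not specify.

```python
def describe_cd_entry(code):
    """Explain one 4-bit CD array entry using the codes listed in the text."""
    if code == 0:
        return "NULL"
    if 1 <= code <= 9:
        return f"{code - 1} byte(s) in the short data region"
    if code == 0xA:
        return "long data value"
    if code == 0xB:
        return "bit column with a value of 1"
    if code == 0xC:
        return "1-byte page dictionary symbol"
    return "not described in the text"


def split_nibbles(cd_bytes):
    """Split each byte into two 4-bit entries (low nibble first: assumed)."""
    entries = []
    for byte in cd_bytes:
        entries.append(byte & 0x0F)
        entries.append(byte >> 4)
    return entries


for code in split_nibbles(bytes([0x21, 0xA0])):
    print(code, describe_cd_entry(code))
```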

The short data region contains each of the short data values. However, because accessing the last columns can be expensive if there are hundreds of columns in the table, columns are grouped into clusters of 30 columns. At the beginning of the short data region, there is an area called the short data cluster array. Each entry in the array is a single byte, which indicates the sum of the sizes of all the data in the previous cluster in the short data region; the value is essentially a pointer to the first column of the cluster (no row offset is needed for the first cluster because it starts immediately after the CD region).

Any data value in the row longer than 8 bytes is stored in the long data region. This can include LOB and row-overflow pointers. Long data needs an actual offset value to allow SQL Server to locate each value. This offset array looks similar to the offset array used in the standard data row structure. The long data region consists of three parts: an offset array, a long data cluster array, and the long data. The long data cluster array is similar to the short data cluster array; it has one entry for each 30-column cluster (except for the last one) and serves to limit the cost of locating columns near the end of a long list of columns.

The special information section at the end of the row contains three optional pieces of information. The existence of any or all of this information is indicated by bits in the first 1-byte header at the beginning of the row. The three special pieces of information are

Forwarding pointer: This pointer is used in a heap when a row is forwarded due to an update (forward pointers are discussed later in this chapter).

Back pointer: If the row is a forwarded row, it contains a pointer back to the original row location.

Versioning information: If snapshot isolation is being used, 14 bytes of versioning information are appended to the row.

Page-Level Compression

Page-level compression is an implementation of true data compression, using both column prefix and dictionary-based compression. Data is compressed by storing repeating values or common prefixes only once and then referencing those values from other columns and rows. When you implement page compression for a table, row compression is applied as well. Page-level compression offers increased data compression over row-level compression alone but at the expense of greater CPU utilization. It works using these techniques:

First, row-level data compression is applied to fit as many rows as it can on a single page.


Next, column prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference, which is stored in the compression information (CI) structure stored after the page header.

Finally, dictionary compression is applied on the page. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI.

Page compression is applied only after a page is full and if SQL Server determines that compressing a page will save a meaningful amount of space.

The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, compression is more efficient. If the data consists mostly of random, distinct values, fewer benefits are gained from using page-level compression.

Column prefix compression looks at the column values on a single page and chooses a common prefix that can be used to reduce the storage space required for values in that column. The longest value in the column that contains the prefix is chosen as the anchor value. A row that represents the prefix values for each column is created and stored in the CI structure that immediately follows the page header. Each column is then stored as a delta from the anchor value, where repeated prefix values in the column are replaced by a reference to the corresponding prefix. If the value in a row does not exactly match the selected prefix value, a partial match can still be indicated.

For example, consider a page that contains the data rows shown in Figure 34.8 before prefix compression.

After you apply column prefix compression on the page, the CI structure is stored after the page header, holding the prefix values for each column. The columns then are stored as the difference between the prefix and column value, as shown in Figure 34.9.

In the first column in the first data row, the value 4b represents that the first four characters of the prefix (aaab) are present at the beginning of the column for that row, followed by the character b. If you append the character b to the first four characters of the prefix, it rebuilds the original value of aaabb. For any column values that are [empty], the column matches the prefix value exactly. Any column value that starts with 0 means that none of the first characters of the column match the prefix. For the fourth column, there is no common prefix value in the columns, so no prefix value is stored in the CI structure.
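The delta notation described above (a matched-prefix length followed by the remaining characters) can be reproduced with a short sketch. It uses the textual notation from the figures rather than SQL Server's binary storage, the function names are hypothetical, and the matched length is capped at 9 so that it always fits in one digit.

```python
def prefix_encode(value, anchor):
    """Encode value as '<matched prefix length><remaining characters>'."""
    if value == anchor:
        return ""                    # shown as [empty] in the figures
    matched = 0
    limit = min(len(value), len(anchor), 9)   # keep the count to one digit
    while matched < limit and value[matched] == anchor[matched]:
        matched += 1
    return f"{matched}{value[matched:]}"


def prefix_decode(encoded, anchor):
    """Rebuild the original value from its delta and the anchor value."""
    if encoded == "":
        return anchor
    matched = int(encoded[0])
    return anchor[:matched] + encoded[1:]


anchor = "aaabccc"
for value in ("aaabb", "aaabccc", "aaaccc"):
    delta = prefix_encode(value, anchor)
    print(value, repr(delta), prefix_decode(delta, anchor))
```

With the anchor aaabccc, the value aaabb encodes to 4b, matching the walk-through in the text.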

Page Header

Data Rows:

aaabb     aaaab     abcd    abc
aaabccc   bbbbb     abcd    mno
aaaccc    aaaacc    bbbb    xyz

FIGURE 34.8 Sample page of a table before prefix compression

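[Figure only partially recoverable: after the page header, the CI structure holds the anchor values (aaabccc for the first column), and the data rows hold deltas such as 4b, 0bbbb, and [empty].]

FIGURE 34.9 Sample page of a table after prefix compression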

After column prefix compression is applied to every column individually on the page, SQL Server then looks to apply dictionary compression. Dictionary compression looks for repeated values anywhere on the page and also stores them in the CI structure after the column prefix values. Dictionary compression values replace repeated values anywhere on a page. Figure 34.10 illustrates the same page shown previously after dictionary compression has been applied.

The dictionary is stored as a set of these duplicate values and a symbol to represent these values in the columns on the page. As you can see in this example, 4b is repeated in multiple columns in multiple rows, and the value is replaced by the symbol 0 throughout the page. The value 0bbbb is replaced by the symbol 1. SQL Server recognizes that the value stored in the column is a symbol and not a data value by examining the coding in the CD array, as discussed earlier.
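Dictionary substitution as described above can be sketched in a few lines. This is a simplified illustration and not SQL Server's algorithm: the function names are hypothetical, the sample rows are illustrative values chosen so that 4b and 0bbbb become symbols 0 and 1 as in the text, and skipping [empty] cells (represented by empty strings) is an assumption.

```python
from collections import Counter


def build_dictionary(rows):
    """Collect values that occur more than once on the page, in first-seen order."""
    counts = Counter(value for row in rows for value in row if value != "")
    dictionary = []
    for row in rows:
        for value in row:
            if value != "" and counts[value] > 1 and value not in dictionary:
                dictionary.append(value)
    return dictionary


def apply_dictionary(rows, dictionary):
    """Replace repeated values with ('sym', index); keep others as ('val', value).

    The tag plays the role of the CD array code that marks an entry as a symbol.
    """
    symbol = {value: index for index, value in enumerate(dictionary)}
    return [
        [("sym", symbol[v]) if v in symbol else ("val", v) for v in row]
        for row in rows
    ]


# Illustrative prefix-compressed page; "" stands for an [empty] cell.
page = [
    ["4b", "4b", "", "abc"],
    ["", "0bbbb", "", "mno"],
    ["3ccc", "", "0bbbb", "xyz"],
]
dictionary = build_dictionary(page)
print(dictionary)
print(apply_dictionary(page, dictionary)[0])
```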

Not all pages contain both the prefix record and a dictionary. Having them both depends on whether the data has enough repeating values or patterns to warrant either a prefix record or a dictionary.

Page Header

CI structure:

Anchor record:   aaabccc   aaaacc   abcd   [NULL]
Dictionary:      4b   0bbbb

Data Rows:

0         0         [empty]   abcd
[empty]   1         [empty]   mno
3ccc      [empty]   1         xyz

FIGURE 34.10 Sample page of a table after dictionary compression


The CI Record

The CI record is the only main structural change to a page when it is page compressed versus a page that uses row compression only. As shown in the previous examples, the CI record is located immediately after the page header. There is no entry for the CI record in the row offset table because its location is always the same. A bit is set in the page header to indicate whether the page is page compressed. When this bit is present, SQL Server knows to look for the CI record. The CI record contains the data elements shown in Table 34.7.

Implementing Page Compression

Page compression can be implemented for a table at the time it is created or by using the

ALTER TABLE command, as in the following example:

ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)

Unlike row compression, which is applied immediately on the rows, page compression isn’t applied until the page is full. The rows cannot be compressed until SQL Server can determine what encodings for prefix and dictionary substitution are going to be used to replace the actual data. When you enable page compression for a table or a partition, SQL Server examines every full page to determine the possible space savings. Any pages that are not full are not considered for compression. During the compression analysis, the prefix and dictionary values are created, and the column values are modified to reflect the prefix and dictionary values. Then row compression is applied. If the new compressed page can hold at least five additional rows, or 25% more rows than the page currently holds, the page is compressed. If neither one of these criteria is met, the compressed version of the page is discarded.

TABLE 34.7 Data Elements Within the CI Record

Name            Description

Header          This structure contains 1 byte to keep track of information about the CI. Bit 0 is the version (currently always 0), Bit 1 indicates the presence of a column prefix anchor record, and Bit 2 indicates the presence of a compression dictionary.

PageModCount    This value keeps track of the number of changes to the page to determine whether the compression on the page should be reevaluated and the CI record rebuilt.

Offsets         This element contains values to help SQL Server find the dictionary. It contains the offset of the end of the column prefix anchor record and the offset of the end of the CI record itself.

Anchor Record   This record looks exactly like a regular CD record (see Figure 34.7). Values stored are the common prefix values for each column, some of which might be NULL.

Dictionary      The first 2 bytes represent the number of entries in the dictionary, followed by an offset array of 2-byte entries, which indicate the end offset of each dictionary entry, and then the actual dictionary values.

New rows inserted into a compressed page are compressed as they are inserted. However, new entries are not added to the prefix list or dictionary based on a single new row. The prefix values and dictionary symbols are rebuilt only on an all-or-nothing basis. After the page is changed a sufficient number of times, SQL Server evaluates whether to rebuild the CI record. The PageModCount field in the CI record is used to keep track of the number of changes to the page since the CI record was last built or rebuilt. This value is updated every time a row is updated, deleted, or inserted. If SQL Server encounters a full page during a data modification and the PageModCount is greater than 25 or the PageModCount divided by the number of rows on the page is greater than 25%, SQL Server reapplies the compression analysis on the page. Again, the new compressed page replaces the existing page only if recompressing the page creates room for five additional rows, or 25% more rows than the page currently holds.
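The two thresholds in the preceding paragraph can be written as simple predicates. This is a sketch of the rules as stated in the text, not of SQL Server's internal code: the function and parameter names are hypothetical, and reading "25% more rows" as at least 25% of the current row count is an assumption.

```python
def should_reevaluate(page_mod_count, row_count):
    """True when a full page qualifies for a new compression analysis."""
    if page_mod_count > 25:
        return True
    return row_count > 0 and page_mod_count / row_count > 0.25


def keep_compressed_page(rows_now, rows_if_compressed):
    """True when the compressed page gains enough room to be worth keeping."""
    extra_rows = rows_if_compressed - rows_now
    return extra_rows >= 5 or extra_rows >= 0.25 * rows_now


print(should_reevaluate(page_mod_count=10, row_count=30))
print(keep_compressed_page(rows_now=100, rows_if_compressed=104))
```

In the first call, 10 changes on a 30-row page exceed the 25% ratio, so the page would be reanalyzed; in the second, room for 4 more rows on a 100-row page meets neither criterion, so the compressed version would be discarded.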

In B-tree structures (nonclustered indexes or a clustered table), only the leaf-level and data pages are considered for compression. When you insert a new row into a leaf or data page, if the compressed row fits, it is inserted and nothing more is done. If it doesn’t fit, SQL Server attempts to recompress the page and then recompress the row based on the new CI record. If the row fits after recompression, it is inserted and nothing more is done. If the row still doesn’t fit, the page needs to be split. When a compressed page is split, the CI record is copied to the new page exactly as it was, along with the rows moved to the new page. However, the PageModCount value is set to 25, so that when the new page gets full, it will be immediately analyzed for recompression. Leaf and data pages are also checked for recompression whenever you run an index rebuild or shrink operation.

If you enable compression on a heap table, pages are evaluated for compression only during rebuild and shrink operations. Also, if you drop a clustered index on a table, turning it into a heap, SQL Server runs compression analysis on any full pages. Compression is avoided during normal data modification operations on a heap to avoid changes to the Row IDs, which are used as the row locators for any indexes on the heap. (See the “Understanding Index Structures” section later in this chapter for a discussion of row locators.) Although the PageModCount is still maintained, SQL Server essentially ignores it and never tries to recompress a page based on the PageModCount value.

Evaluating Page Compression

Before choosing to implement page compression, you should determine if the overhead of page compression will provide sufficient benefit in space savings. To determine how changing the compression state will affect a table or an index, you can use the SQL Server 2008 sp_estimate_data_compression_savings stored procedure, which is available only in the editions of SQL Server that support data compression. This stored procedure evaluates the effects of compression by sampling up to 5,000 pages in the table and creating a copy of these 5,000 pages of the table in tempdb, performing the compression, and then using the sample to estimate the overall size for the table after compression. The syntax for sp_estimate_data_compression_savings is as follows:
