Microsoft SQL Server 2008 R2 Unleashed - P117


The Data Rows

Following the page header, starting at byte 96 on the page, are the actual data rows. Each data row has a unique row number within the page. Data rows in SQL Server cannot cross page boundaries. The maximum available space in a SQL Server page is 8,060 bytes of in-row data. When a data row is logged in the transaction log (for an insert, for example), additional logging information is stored on the log page along with the data row. Because log pages are 8,192 bytes in size and also have a 96-byte header, a log page has only 8,096 bytes of available space. If you want to store the data row and logging information on a single log page, the in-row data cannot be more than 8,060 bytes in size. This, in effect, limits the maximum in-row data row size for a table in SQL Server 2008 to 8,060 bytes as well.

Page Header Fields   Description
PageID               Unique identifier for the page. It consists of two parts: the file ID number and page number.
NextPage             File number and page number of the next page in the chain (0 if the page is the last or only page in the chain or if the page belongs to a heap table).
PrevPage             File number and page number of the previous page in the chain (0 if the page is the first or only page in the chain or if the page belongs to a heap table).
ObjectID             ID of the object to which this page belongs.
PartitionID          ID of the partition of which this page is a part.
AllocUnitID          ID of the allocation unit that contains this page.
LSN                  Log sequence number (LSN) value used for changes and updates to this page.
SlotCnt              Total number of rows (slots) used on the page.
Level                Level at which this page resides in an index tree (0 indicates a leaf page or data page).
IndexID              ID of the index this page belongs to (0 indicates that it is a data page).
freedata             Byte offset where the available free space starts on the page.
Pminlen              Minimum size of a data row. Essentially, this is the number of bytes in the fixed-length portion of the data rows.
FreeCnt              Number of free bytes available on the page.
reservedCnt          Number of bytes reserved by all transactions.
Xactreserved         Number of bytes reserved by the most recently started transaction.
tornBits             Bit string containing 1 bit per sector for detecting torn page writes (or checksum information if torn_page_detection is not on).
flagBits             Two-byte bitmap that contains additional information about the page.


NOTE

Although 8,060 bytes is the maximum size of in-row data, 8,060 bytes is not the maximum row size limit in SQL Server 2008. Data rows can also have row-overflow and large object (LOB) data stored on separate pages, as you see later in this chapter.

The number of rows stored on a page depends on the size of each row. For a table that has all fixed-length, non-nullable columns, the size of the row and the number of rows that can be stored on a page are always the same. If the table has any variable or nullable fields, the number of rows stored on the page depends on the size of each row. SQL Server attempts to fit as many rows as possible in a page. Smaller row sizes allow SQL Server to fit more rows on a page, which reduces page I/O and allows more data pages to fit in memory. This helps improve system performance by reducing the number of times SQL Server has to read data in from disk.

Because each data row also incurs some overhead bytes in addition to the actual data, the maximum amount of actual data that can be stored in a single row on a page is slightly less than 8,060 bytes. The actual amount of overhead required per row depends on whether the table contains any variable-length columns. If you attempt to create a table with a minimum row size, including data and row overhead, that exceeds 8,060 bytes, you receive an error message. Remember that a multibyte character set data type such as nchar or nvarchar requires 2 bytes per character, so an nchar(4000) column requires 8,000 bytes. The following example shows the error:

CREATE TABLE customer_info2
(cust_no INT, cust_address NCHAR(25), info NCHAR(4000))
go

Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'customer_info2' failed because the minimum row size would be 8061, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
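The arithmetic behind that error can be checked by hand. The following Python sketch adds the fixed-length column widths of customer_info2 to the 7 bytes of overhead reported in the message. The helper names and the small width table are illustrative assumptions, not part of SQL Server.

```python
# Illustrative sketch: names here are hypothetical, not SQL Server APIs.
MAX_IN_ROW_BYTES = 8060


def fixed_width(type_name: str, length: int = 0) -> int:
    """Return the storage width in bytes of a fixed-length column."""
    if type_name == "int":
        return 4
    if type_name == "char":
        return length
    if type_name == "nchar":
        return 2 * length  # 2 bytes per character
    raise ValueError(f"type not covered by this sketch: {type_name}")


def minimum_row_size(columns, overhead: int = 7) -> int:
    """Sum the column widths and add the per-row overhead bytes."""
    return sum(fixed_width(name, length) for name, length in columns) + overhead


customer_info2 = [("int", 0), ("nchar", 25), ("nchar", 4000)]
size = minimum_row_size(customer_info2)
print(size, size > MAX_IN_ROW_BYTES)  # 8061 True
```

The data alone needs 4 + 50 + 8,000 = 8,054 bytes, so the 7 bytes of overhead push the row 1 byte past the limit.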

If the table contains variable-length or nullable columns, you can create a table for which the minimum row size is less than 8,060 bytes, but the data rows could conceivably exceed 8,060 bytes. SQL Server allows the table to be created. If you then try to insert a row that exceeds 8,060 bytes of data and overhead, the data that exceeds the 8,060-byte limit for in-row data is stored in a row-overflow page.

The Structure of Data Rows

The data for all fixed-length data fields in a table is stored at the beginning of the row. All variable-length data columns are stored after the fixed-length data. Figure 34.3 shows the structure of the data row in SQL Server.


[Figure 34.3: Structure of a data row. Fields in order: Status Byte A (1 byte); Status Byte B (1 byte, not used); Length of Fixed Length Data (2 bytes); Fixed Length Data Columns (n bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset (2 x number of variable columns); Variable Length Data Columns (n bytes).]

The total size of each data row is the sum of the sizes of its columns plus the row overhead. Seven bytes of overhead is the minimum for any data row:

• 1 byte for status byte A.
• 1 byte for status byte B (in SQL Server 2008, only 1 bit is used, indicating that the record is a ghost-forwarded row).
• 2 bytes to store the length of the fixed-length columns.
• 2 bytes to store the number of columns in the row.
• 1 byte for every multiple of 8 columns (ceiling(numcols / 8)) in the table for the NULL bitmap. A 1 in the bitmap indicates that the corresponding column's value is NULL in that row.

The values stored in status byte A are as follows:

• Bit 0: This bit provides version information. In SQL Server 2008, it's always 0.
• Bits 1 through 3: This 3-bit value indicates the nature of the row. 0 indicates that the row is a primary record, 1 indicates that the row has been forwarded, 2 indicates a forwarded stub, 3 indicates an index record, 4 indicates a blob fragment, 5 indicates a ghost index record, 6 indicates a ghost data record, and 7 indicates a ghost version record. (Many of these topics, such as forwarded and ghost records, are discussed in further detail later in this chapter.)
• Bit 4: This bit indicates that a NULL bitmap exists. This is somewhat unnecessary in SQL Server 2008 because a NULL bitmap is always present, even if no NULLs are allowed in the table.
• Bit 5: This bit indicates that one or more variable-length columns exist in the row.
• Bit 6: This bit indicates that the row contains versioning information.
• Bit 7: This bit is not currently used in SQL Server 2008.
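The bit layout of status byte A can be unpacked with a few shifts and masks. This Python sketch assumes that bit 0 is the least significant bit of the byte; the function name and the dictionary keys are hypothetical and chosen only for illustration.

```python
# Record types for the 3-bit value in bits 1 through 3, as listed in the text.
RECORD_TYPES = {
    0: "primary record",
    1: "forwarded record",
    2: "forwarded stub",
    3: "index record",
    4: "blob fragment",
    5: "ghost index record",
    6: "ghost data record",
    7: "ghost version record",
}


def decode_status_byte_a(value: int) -> dict:
    """Split status byte A into its documented fields (bit 0 = lowest bit)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("status byte must fit in one byte")
    return {
        "version": value & 0x01,                            # bit 0
        "record_type": RECORD_TYPES[(value >> 1) & 0x07],   # bits 1-3
        "has_null_bitmap": bool(value & 0x10),              # bit 4
        "has_variable_columns": bool(value & 0x20),         # bit 5
        "has_versioning_info": bool(value & 0x40),          # bit 6
    }


# 0x30 has bits 4 and 5 set: a primary record with a NULL bitmap
# and at least one variable-length column.
print(decode_status_byte_a(0x30))
```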

If the table contains any variable-length columns, the following additional overhead bytes are included in each data row:

• 2 bytes to store the number of variable-length columns in the row.
• 2 bytes times the number of variable-length columns for the offset array. This is essentially a table in the row identifying where each variable-length column can be found within the variable-length column block.
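Taken together, the fixed and variable-length overhead rules reduce to a short formula. The following Python sketch is one way to express it; `row_overhead` is a hypothetical helper name, and the function models only the overhead bytes listed in this section.

```python
import math


def row_overhead(num_columns: int, num_variable_columns: int = 0) -> int:
    """Return the per-row overhead in bytes for a data row."""
    if num_columns < 1 or not 0 <= num_variable_columns <= num_columns:
        raise ValueError("invalid column counts")
    null_bitmap = math.ceil(num_columns / 8)  # 1 byte per 8 columns
    overhead = (
        1      # status byte A
        + 1    # status byte B
        + 2    # length of the fixed-length data
        + 2    # number of columns
        + null_bitmap
    )
    if num_variable_columns > 0:
        overhead += 2                         # number of variable-length columns
        overhead += 2 * num_variable_columns  # column offset array
    return overhead


print(row_overhead(3))     # 7: the minimum overhead
print(row_overhead(9, 2))  # 14: 2-byte NULL bitmap plus 6 bytes for variable columns
```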

Within each block of fixed-length or variable-length data, the data columns are stored in the column order in which they were defined when the table was created. In other words, all fixed-length fields are stored in column ID order in the fixed-length block, and all nullable or variable-length fields are stored in column ID order in the variable-length block.

Storage of the sql_variant Data Type

The sql_variant data type can contain a value of any column data type in SQL Server except for text, ntext, image, variable-length columns with the MAX qualifier, and timestamp. For example, a sql_variant in one row could contain character data; in another row, an integer value; and in yet another row, a float value. Because they can contain any type of value, sql_variant columns are always considered variable length. The format of a sql_variant column is as follows:

• Byte 1 indicates the actual data type being stored in the sql_variant.
• Byte 2 indicates the sql_variant version, which is always 1 in SQL Server 2008.
• The remainder of the sql_variant column contains the data value and, for some data types, information about the data value.

The data type value in byte 1 corresponds to the values in the xtype column in the systypes database system table. For example, if the first byte contains a hex 38, that corresponds to the xtype value of 56, which is the int data type.

Some data types stored in a sql_variant column require additional information bytes stored at the beginning of the data value (after the sql_variant version byte). The data types requiring additional information bytes and the values in these information bytes are as follows:

• Numeric and decimal data types require 1 byte for the precision and 1 byte for the scale.
• Character strings require 2 bytes to store the maximum length and 4 bytes for the collation ID.
• Binary and varbinary data values require 2 bytes to store the maximum length.
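The sql_variant layout described above can be illustrated by splitting a raw value into its parts. In this Python sketch, only the xtype value 56 (int) comes from the text; the other xtype values in the table are the commonly listed system type IDs and should be verified against systypes. The function name, the sample bytes, and the little-endian encoding of the sample integer are assumptions made for illustration.

```python
# Number of extra information bytes that follow the version byte, by xtype.
# Assumed xtype values; verify against the systypes table before relying on them.
EXTRA_INFO_BYTES = {
    106: 2,  # decimal: precision + scale
    108: 2,  # numeric: precision + scale
    167: 6,  # varchar: 2-byte max length + 4-byte collation ID
    175: 6,  # char
    231: 6,  # nvarchar
    239: 6,  # nchar
    165: 2,  # varbinary: 2-byte max length
    173: 2,  # binary
}


def split_sql_variant(raw: bytes):
    """Return (xtype, version, info_bytes, value_bytes) for a raw sql_variant."""
    if len(raw) < 2:
        raise ValueError("a sql_variant needs at least a type and a version byte")
    xtype, version = raw[0], raw[1]
    info_len = EXTRA_INFO_BYTES.get(xtype, 0)  # types not listed carry no extra bytes
    info = raw[2:2 + info_len]
    value = raw[2 + info_len:]
    return xtype, version, info, value


# Hypothetical sample: type byte 0x38 (xtype 56, int), version 1, then the value.
sample = bytes([0x38, 0x01]) + (42).to_bytes(4, "little")
xtype, version, info, value = split_sql_variant(sample)
print(xtype, version, info, int.from_bytes(value, "little"))
```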

Storage of Sparse Columns

A new storage feature introduced in SQL Server 2008 is sparse columns. Sparse columns are ordinary columns that use an optimized storage format for NULL values. Sparse columns reduce the space requirements for NULL values at the cost of more overhead to retrieve non-NULL values. A rule of thumb is to consider using sparse columns when you expect at least 90% of the rows to contain NULL values. Prime candidates are tables that have many columns where most of the attributes are NULL for most rows, for example, when different attributes apply to different subsets of rows and, for each row, only a subset of columns is populated with values.

The sparse columns feature significantly increases the number of possible columns in a table from 1,024 to 30,000. However, not all 30,000 can contain values. The number of actual populated columns you can have depends on the number of bytes of data in the rows. With sparse columns, storage of NULL values is optimized, requiring no space at all for storing NULL values. This is unlike nonsparse columns, which, as you saw earlier, do need space even for NULL values (a fixed-length NULL value requires the full column width, and a variable-length NULL requires at least 2 bytes of storage in the column offset array).

Name                    Number of Bytes             Description
Complex column header   2                           A value of 05 indicates the column is a sparse vector.
Sparse column count     2                           Number of sparse columns.
Column ID set           2 × # of sparse columns     The column IDs of each column with a value stored in the sparse vector.
Column offset table     2 × # of sparse columns     The offset of the ending position of each sparse column.
Sparse data             Depends on actual values    The actual data values for each sparse column, stored in column ID order.

Although sparse columns themselves require no space for NULL values, there is some fixed overhead space required to allow rows to contain sparse columns. This space is needed to add the sparse vector to the end of the data row. A sparse vector is added to the end of a data row only if at least one sparse column is defined on the table.

The sparse vector is used to keep track of the physical storage of sparse columns in the row. It is stored as the last variable-length column in the row. No bit is stored in the NULL bitmap for the sparse vector column, but it is included in the count of the variable columns (refer to Figure 34.3 for the general structure of a data row). The bytes stored in the sparse vector are shown in Table 34.5.

With the required overhead space for the sparse vector, the maximum size of all fixed-length non-NULL sparse columns is reduced to 8,019 bytes per row.

As you can see, the contents of a sparse vector are like a data row structure within a data row. If you refer to Figure 34.3, you can see that the structure of the sparse vector is similar to the shaded structure of a data row. One of the main differences is that the sparse vector stores no information for any sparse columns that contain NULL values. Also, fixed-length and variable-length columns are stored the same within the sparse vector. However, if you have any variable-length columns in the sparse vector that are too large to fit in the 8,019-byte limit of the data row, they are stored on row-overflow pages.
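The layout in Table 34.5 implies a simple size formula for the sparse vector. The Python sketch below applies it to the lengths of the non-NULL sparse values in one row. The function name is hypothetical, and the sketch does not model what happens when every sparse column in the row is NULL.

```python
def sparse_vector_size(value_lengths) -> int:
    """Return the sparse vector size in bytes for the non-NULL sparse values."""
    lengths = list(value_lengths)
    if not lengths:
        raise ValueError("this sketch covers rows with at least one non-NULL sparse value")
    n = len(lengths)
    return (
        2              # complex column header
        + 2            # sparse column count
        + 2 * n        # column ID set
        + 2 * n        # column offset table
        + sum(lengths) # sparse data
    )


# One 4-byte value and one 10-byte value: 4 + 8 + 14 = 26 bytes.
print(sparse_vector_size([4, 10]))
```

Each non-NULL sparse value therefore costs 4 bytes beyond its data, which is why sparse columns pay off only when most values are NULL.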

The Row Offset Table

The location of a row within a page is identified by the row offset table, which is located at the end of the page. To find a specific row within a page, SQL Server looks up the starting byte address for a given row ID in the row offset table, which contains the offset of the row from the beginning of the page (refer to Figure 34.2). Each entry in the row offset table is 2 bytes in size, so for each row in a table, an additional 2 bytes of space is added in from the end of the page for the row offset entry.
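Because every row also consumes a 2-byte entry in the row offset table, a rough estimate of how many rows fit on a page divides the space after the 96-byte header by the row size plus 2. The Python sketch below shows that estimate for fixed-size rows; the constant and function names are hypothetical.

```python
PAGE_SIZE = 8192      # bytes in a page
PAGE_HEADER = 96      # bytes used by the page header
SLOT_ENTRY = 2        # bytes per row offset table entry


def rows_per_page(row_size: int) -> int:
    """Estimate how many rows of a given size (data plus overhead) fit on a page."""
    if not 1 <= row_size <= 8060:
        raise ValueError("in-row size must be between 1 and 8,060 bytes")
    return (PAGE_SIZE - PAGE_HEADER) // (row_size + SLOT_ENTRY)


print(rows_per_page(100))   # 79
print(rows_per_page(8060))  # 1
```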

The row offset table keeps track of the logical order of rows on a data page. If a table has a clustered index defined on it, the data rows are stored in clustered index order. However, they may not be physically stored in clustered key order on the page itself. Instead, the row offset array indicates the logical clustered key order of the data rows. For example, row offset slot 0 refers to the first row in the clustered index key order, slot 1 refers to the second row, slot 2 refers to the third row, and so on. The physical location of the rows on the page may be in any order, depending on when rows on the page were inserted or deleted.
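The difference between logical and physical row order can be seen by reading the row offset table from the end of a page. This Python sketch assumes that slot 0 occupies the last 2 bytes of the page, that later slots grow backward from there, and that each entry is a little-endian unsigned 16-bit offset. The page contents are made up for the demonstration.

```python
import struct

PAGE_SIZE = 8192


def slot_offsets(page: bytes, slot_count: int) -> list:
    """Return the row start offsets in slot (logical) order."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected a full 8,192-byte page")
    return [
        struct.unpack_from("<H", page, PAGE_SIZE - 2 * (slot + 1))[0]
        for slot in range(slot_count)
    ]


# Hypothetical page: three rows physically at offsets 96, 200, and 300,
# but the first row in key order is the one stored last on the page.
page = bytearray(PAGE_SIZE)
for slot, offset in enumerate([300, 96, 200]):
    struct.pack_into("<H", page, PAGE_SIZE - 2 * (slot + 1), offset)

logical = slot_offsets(bytes(page), 3)
print(logical)          # logical (slot) order
print(sorted(logical))  # physical order on the page
```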

Row-Overflow Pages

While the maximum in-row size is 8,060 bytes per row, SQL Server 2008 allows actual rows to exceed this size for tables that contain varchar, nvarchar, varbinary, sql_variant, or common language runtime (CLR) user-defined type columns. Although the length of each one of these columns must still fall within the limit of 8,000 bytes, the combined width of the row can exceed the 8,060-byte limit.

When a combination of varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns exceeds the 8,060-byte limit, SQL Server moves the record column with the largest width to another page in the ROW_OVERFLOW_DATA allocation unit, while maintaining a 24-byte pointer to the row-overflow page on the original page. Moving large records to another page occurs dynamically as records are lengthened based on update operations. Update operations that shorten records may cause records to be moved back to the original page in the IN_ROW_DATA allocation unit.

Row-overflow pages are used only under certain circumstances. For one, the row itself has to exceed 8,060 bytes; it does not matter how full the data page itself is. If a row is less than 8,060 bytes and there's not enough space in the data page, normal page splitting occurs to store the row. Also, each column in a table must be completely on the row or completely off it. A variable-length column cannot have some of its data on the regular data page and some of its data on the overflow page. One row can span multiple row-overflow pages, depending on how many large variable-length columns there are.
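The row-overflow rules can be modeled as a loop that pushes the widest remaining column off the row until the row fits. The Python sketch below is a simplified model under stated assumptions, not SQL Server's actual algorithm: `fixed_bytes` stands for everything in the row other than the variable-length values (fixed columns plus overhead), and each moved column is replaced by the 24-byte pointer.

```python
MAX_IN_ROW = 8060
OVERFLOW_POINTER = 24


def plan_row_overflow(fixed_bytes: int, variable_lengths):
    """Return (indexes of columns moved off-row, resulting in-row size)."""
    in_row = list(variable_lengths)
    moved = []
    while fixed_bytes + sum(in_row) > MAX_IN_ROW:
        # Only columns still in the row and wider than a pointer are worth moving.
        candidates = [
            i for i in range(len(in_row))
            if i not in moved and in_row[i] > OVERFLOW_POINTER
        ]
        if not candidates:
            raise ValueError("row cannot be made to fit in 8,060 bytes")
        largest = max(candidates, key=lambda i: in_row[i])
        in_row[largest] = OVERFLOW_POINTER  # whole column moves; pointer stays
        moved.append(largest)
    return moved, fixed_bytes + sum(in_row)


print(plan_row_overflow(100, [5000, 4000]))  # ([0], 4124)
print(plan_row_overflow(100, [3000, 3000]))  # ([], 6100)
```

The model moves whole columns only, matching the rule that a column is either completely on the row or completely off it.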

Be aware that having data rows that require a row-overflow page increases the I/O cost of retrieving the data row. Querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data, also slow processing time because these records are processed synchronously instead of asynchronously.

Therefore, when you design a table with multiple varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns, you might want to consider the percentage of rows that are likely to require row overflow and the frequency with which this overflow data is likely to be queried. If there are likely to be frequent queries on many rows of row-overflow data, you should consider normalizing the table so that some columns are moved to another table, reducing the overall row size so that the rows fit within 8,060 bytes. The data can then be recombined in a query using an asynchronous JOIN operation.

TIP

Because of the performance implications, row-overflow pages are intended to be a solution for situations in which most of your data rows fit completely on your data pages and you only occasionally have rows that require a row-overflow page. Row-overflow pages allow SQL Server to handle the large data rows effectively without requiring a redesign of your table. However, if you find more than a few of your rows exceed the in-row size, you probably should look into using the LOB data types or redesigning your table.

LOB Data Pages

If you want to store large amounts of text or binary data, you can use the text, ntext, and image data types, as well as the varchar(max), nvarchar(max), and varbinary(max) data types. (For information about how to use these data types, see Chapter 24, “Creating and Managing Tables,” and Chapter 38, “Database Design and Performance.”) Each column for a row of these data types can store up to 2GB (minus 2 bytes) of data. By default, the LOB values are not stored as part of the data row, but as a collection of pages on their own. For each LOB column, the data page contains a 16-byte pointer, which points to the location of the initial page of the LOB data. A row with several LOB columns has one pointer for each column.

The pages that hold LOB data are 8KB in size, just like any other page in SQL Server. An individual LOB page can hold LOB data for multiple columns and also from multiple rows. A LOB data page can even contain a mix of LOB data. This helps reduce the storage requirements for the LOB data, especially when smaller amounts of data are stored in these columns. For example, if SQL Server could store data for only a single column for a single row on a single LOB data page and the data value consisted of only a single character, it would still use an entire 8KB data page to store that data! Definitely not an efficient use of space.

A LOB data page can hold LOB data for only a single table, however. A table with a LOB column has a single set of pages to hold all its LOB data.

LOB information is presented externally (to the user) as a long string of bytes. Internally, however, the information is stored within a set of pages. The pages are not necessarily organized sequentially but are logically organized as a B-tree structure. (B-tree structures are covered in more detail later in this chapter.) If an operation addresses some information in the middle of the data, SQL Server can navigate through the B-tree to find the data. In previous versions, SQL Server had to follow the entire page chain from the beginning to find the desired information.

If the amount of data in the LOB field is less than 32KB, the 16-byte pointer in the data row points to an 84-byte root structure in the LOB B-tree. This root structure points to the pages and location where the actual LOB data is stored (see Figure 34.4). The data itself can be placed anywhere within the LOB pages for the table. The root structure keeps track of the location of the information in a logical manner. If the data is less than 64 bytes, it is stored in the root structure itself.

If the amount of LOB data exceeds 32KB, SQL Server allocates intermediate B-tree index nodes that point to the LOB pages. In this situation, the intermediate node pages are stored on pages not shared between different occurrences of LOB columns; the intermediate node pages store nodes for only one LOB column in a single data row.
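The size thresholds described above can be summarized as a small classification function. In this Python sketch, the function name and the returned labels are hypothetical. The text does not say how a value of exactly 32KB is handled, so treating it as the root-structure case is an assumption.

```python
ROOT_INLINE_LIMIT = 64        # bytes: smaller values live in the root structure
ROOT_ONLY_LIMIT = 32 * 1024   # bytes: larger values need intermediate nodes


def lob_layout(length_bytes: int) -> str:
    """Describe how a LOB value of the given length is organized."""
    if length_bytes < 0:
        raise ValueError("length cannot be negative")
    if length_bytes < ROOT_INLINE_LIMIT:
        return "data stored in the root structure"
    if length_bytes <= ROOT_ONLY_LIMIT:  # exactly 32KB: assumed, see lead-in
        return "root structure points to LOB pages"
    return "intermediate B-tree nodes point to LOB pages"


for size in (10, 5000, 100_000):
    print(size, "->", lob_layout(size))
```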

Storing LOB Data in the Data Row

To further conserve space and help minimize I/O, SQL Server 2008 supports storing LOB data in the actual data row. When the LOB data is stored outside the data row pages, at a minimum, SQL Server needs to perform one additional page read per row to get the LOB data.

Why would you want to store LOB data in the row? Why not just store the data in a varchar(8000)? Well, primarily because there is an upper limit of 8KB if the data is stored within the data row (not counting the other columns). Using a LOB data type, you can store more than 2 billion bytes of text. If you know most of your records will be small, but on occasion, some very large values will be stored, the text in row option provides optimum performance and better space efficiency for the majority of your LOB values, while providing the flexibility you need for the occasional large values. This option also provides the benefit of keeping the data all in a single column instead of having to split it across multiple columns or rows when the data exceeds the size limit of a single row.

[Figure 34.4: Diagram labels: Data Page (Header, LOB Pointer in Data Row); LOB Data Page (Header, Root structure).]

If you want to enable the text in row option for a table with a LOB column, use the sp_tableoption stored procedure:

exec sp_tableoption pub_info, 'text in row', 512

This example enables up to 512 bytes of LOB data in the pub_info table to be stored in the data row. The maximum amount of LOB data that can be stored in a data row is 7,000 bytes. When a LOB value exceeds the specified size, rather than store the 16-byte pointer in the data row as it would normally, SQL Server stores the 24-byte root structure that contains the pointers to the separate chunks of LOB data for the row in the LOB column.

The option value passed to sp_tableoption can also be just ON. If no size is specified, the option is enabled with a default size of 256 bytes. To disable the text in row option, you can set its value to 0 or OFF with sp_tableoption. When the option is turned off, all LOB data stored in the row is moved off to LOB pages and replaced with the standard 16-byte pointer. This can be a time-consuming process for a large table.

Also, you should keep in mind that just because this option is enabled, it doesn't always mean that the LOB data will be stored in the row. All other data columns that are not LOB take priority over LOB data for storage in the data row. If a variable-length column grows and there is not enough space left in the row or page for the LOB data, the LOB data is moved off the page.
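The effect of the text in row setting on the space a LOB value takes in the data row can be sketched as follows. This Python model uses the sizes given in the text (16-byte pointer, 24-byte root structure, 256-byte default, 7,000-byte maximum). The function name is hypothetical, and the model ignores the rule that non-LOB columns take priority for space in the row.

```python
LOB_POINTER = 16          # bytes in the row when text in row is off
ROOT_STRUCTURE = 24       # bytes in the row when the value exceeds the limit
DEFAULT_LIMIT = 256       # limit used when the option is simply ON
MAX_LIMIT = 7000          # largest allowed text in row limit


def lob_bytes_in_row(lob_length: int, limit: int = DEFAULT_LIMIT) -> int:
    """Return how many bytes a LOB column occupies in the data row."""
    if lob_length < 0 or not 0 <= limit <= MAX_LIMIT:
        raise ValueError("invalid length or text in row limit")
    if limit == 0:                 # option disabled (0 or OFF)
        return LOB_POINTER
    if lob_length <= limit:        # value fits: stored in the row itself
        return lob_length
    return ROOT_STRUCTURE          # value too large: root structure in the row


print(lob_bytes_in_row(100, 512))  # 100
print(lob_bytes_in_row(600, 512))  # 24
print(lob_bytes_in_row(600, 0))    # 16
```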

Storage of MAX Data

An alternative to the text and image data types in SQL Server 2008 is the option of defining variable-length data using the MAX specifier. When you use the MAX specifier with varchar, nvarchar, and varbinary columns, SQL Server determines automatically whether to store the data as a regular varchar, nvarchar, or varbinary value or as a LOB. Essentially, if the actual length is less than 8,000 bytes, SQL Server treats it as if it were one of the regular variable-length data types, including using row-overflow pages if necessary. If the MAX column exceeds 8,000 bytes, it is stored like LOB data.

Index Pages

Index information is stored on index pages. An index page has the same layout as a data page. The difference is the type of information stored on the page. Generally, a row in an index page contains the index key and a pointer to the page or row at the next (lower) level.

The actual information stored in an index page depends on the index type and whether it is a leaf-level page. A leaf-level clustered index page is the data page itself; you've already seen its structure. The information stored on other index pages is as follows:

• Clustered indexes, nonleaf pages: Each index row contains the index key and a pointer (the file ID and a page address) to a page in the index tree at the next lower level.
• Nonclustered index, nonleaf pages: Each index row contains the index key and a page-down pointer (the file ID and a page address) to a page in the index tree at the next lower level. For nonunique indexes, the nonleaf row also contains the row locator information for the corresponding data row.
• Nonclustered index, leaf pages: Rows on this level contain an index key and a reference to a data row. For heap tables, this is the Row ID; for clustered tables, this is the clustered key for the corresponding data row.

The actual structure and content of index rows, as well as the structure of the index tree, are discussed in more detail later in this chapter.

Space Allocation Structures

When a table or index needs more space in a database, SQL Server needs a way to determine where space is available in the database to be allocated. If the table or index is still fewer than eight pages in size, SQL Server must find a mixed extent with one or more pages available that can be allocated. If the table or index is eight pages or larger in size, SQL Server must find a free uniform extent that can be allocated to the table or index.

Extents

If SQL Server allocated space one page at a time as pages were needed for a table (or an index), SQL Server would be spending a good portion of its time just allocating pages, and the data would likely be scattered noncontiguously throughout the database. Scanning such a table would not be very efficient. For these reasons, pages for each object are grouped together and allocated in extents; an extent consists of eight logically contiguous pages.

When a table or index is created, it is initially allocated a page on a mixed extent. If no mixed extents are available in the database, a new mixed extent is allocated. A mixed extent can be shared by up to eight objects (each page in the extent can be assigned to a different table or index).

After the table grows to at least eight pages in size, all future allocations to the table are done as uniform extents.

Figure 34.5 shows the use of mixed and uniform extents.

[Figure 34.5: Mixed and uniform extents.
Mixed extent (page addresses 8 through 15): Table 2, Table 1, Table 2, Index 1, Table 1, Table 3, Index 1, Table 1.
Uniform extent (page addresses 16 through 23): Table 1 on all eight pages.]
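The extent rules can be expressed as simple integer arithmetic. This Python sketch assumes that extents are aligned on page addresses that are multiples of eight, which matches the page addresses 8 through 15 and 16 through 23 used in Figure 34.5. The function names are hypothetical.

```python
PAGES_PER_EXTENT = 8


def extent_pages(page_address: int) -> list:
    """Return the page addresses that share an extent with the given page."""
    if page_address < 0:
        raise ValueError("page address cannot be negative")
    first = (page_address // PAGES_PER_EXTENT) * PAGES_PER_EXTENT
    return list(range(first, first + PAGES_PER_EXTENT))


def next_allocation(pages_allocated: int) -> str:
    """Return the kind of extent used for an object's next allocation."""
    if pages_allocated < 0:
        raise ValueError("page count cannot be negative")
    # Objects smaller than eight pages take single pages from mixed extents.
    return "mixed" if pages_allocated < PAGES_PER_EXTENT else "uniform"


print(extent_pages(10))    # pages 8 through 15
print(next_allocation(3))  # mixed
print(next_allocation(8))  # uniform
```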
