The Data Rows
Following the page header, starting at byte 96 on the page, are the actual data rows. Each
data row has a unique row number within the page. Data rows in SQL Server cannot cross
page boundaries. The maximum available space in a SQL Server page is 8,060 bytes of in-row
data. When a data row is logged in the transaction log (for an insert, for example),
additional logging information is stored on the log page along with the data row. Because log
pages are 8,192 bytes in size and also have a 96-byte header, a log page has only 8,096 bytes
of available space. If you want to store the data row and logging information on a single log
page, the in-row data cannot be more than 8,060 bytes in size. This, in effect, limits the
maximum in-row data row size for a table in SQL Server 2008 to 8,060 bytes as well.
Page Header Fields   Description
PageID               Unique identifier for the page. It consists of two parts: the file ID
                     number and page number.
NextPage             File number and page number of the next page in the chain (0 if the page
                     is the last or only page in the chain or if the page belongs to a heap
                     table).
PrevPage             File number and page number of the previous page in the chain (0 if the
                     page is the first or only page in the chain or if the page belongs to a
                     heap table).
ObjectID             ID of the object to which this page belongs.
PartitionID          ID of the partition of which this page is a part.
AllocUnitID          ID of the allocation unit that contains this page.
LSN                  Log sequence number (LSN) value used for changes and updates to this page.
SlotCnt              Total number of rows (slots) used on the page.
Level                Level at which this page resides in an index tree (0 indicates a leaf page
                     or data page).
IndexID              ID of the index this page belongs to (0 indicates that it is a data page).
freedata             Byte offset where the available free space starts on the page.
Pminlen              Minimum size of a data row. Essentially, this is the number of bytes in the
                     fixed-length portion of the data rows.
FreeCnt              Number of free bytes available on the page.
reservedCnt          Number of bytes reserved by all transactions.
Xactreserved         Number of bytes reserved by the most recently started transaction.
tornBits             Bit string containing 1 bit per sector for detecting torn page writes (or
                     checksum information if torn_page_detection is not on).
flagBits             Two-byte bitmap that contains additional information about the page.
NOTE
Although 8,060 bytes is the maximum size of in-row data, 8,060 bytes is not the
maximum row size limit in SQL Server 2008. Data rows can also have row-overflow and
large object (LOB) data stored on separate pages, as you see later in this chapter.
The number of rows stored on a page depends on the size of each row. For a table that has
all fixed-length, non-nullable columns, the size of the row and the number of rows that
can be stored on a page are always the same. If the table has any variable-length or nullable
fields, the number of rows stored on the page depends on the size of each row. SQL Server
attempts to fit as many rows as possible in a page. Smaller row sizes allow SQL Server to fit
more rows on a page, which reduces page I/O and allows more data pages to fit in
memory. This helps improve system performance by reducing the number of times SQL
Server has to read data in from disk.
Because each data row also incurs some overhead bytes in addition to the actual data, the
maximum amount of actual data that can be stored in a single row on a page is slightly
less than 8,060 bytes. The actual amount of overhead required per row depends on
whether the table contains any variable-length columns. If you attempt to create a table
with a minimum row size including data and row overhead that exceeds 8,060 bytes, you
receive an error message as shown in the following example (remember that a multibyte
character set data type such as nchar or nvarchar requires 2 bytes per character, so an
nchar(4000) column requires 8,000 bytes):
CREATE TABLE customer_info2
(cust_no INT, cust_address NCHAR(25), info NCHAR(4000))
go
Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'customer_info2' failed because the
minimum row size would be 8061, including 7 bytes of internal
overhead. This exceeds the maximum allowable table row size of 8060
bytes.
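The 8,061-byte figure in the message can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the 7 bytes of overhead reported in the message; the helper name nchar_bytes and the variable names are illustrative and not part of SQL Server.

```python
MAX_IN_ROW_BYTES = 8060  # in-row limit stated in the text


def nchar_bytes(length: int) -> int:
    """nchar stores 2 bytes per character."""
    return 2 * length


# Column widths for customer_info2: INT is 4 bytes, NCHAR(n) is 2n bytes.
columns = {
    "cust_no": 4,
    "cust_address": nchar_bytes(25),
    "info": nchar_bytes(4000),
}
overhead = 7  # internal overhead quoted in the error message

min_row = sum(columns.values()) + overhead
print(min_row, min_row > MAX_IN_ROW_BYTES)  # 8061 True
```

Because every column is fixed length, the minimum row size is also the actual row size, so the table can never fit and the CREATE TABLE fails.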
If the table contains variable-length or nullable columns, you can create a table for which
the minimum row size is less than 8,060 bytes, but the data rows could conceivably
exceed 8,060 bytes. SQL Server allows the table to be created. If you then try to insert a
row that exceeds 8,060 bytes of data and overhead, the data that exceeds the 8,060-byte
limit for in-row data is stored in a row-overflow page.
The Structure of Data Rows
The data for all fixed-length data fields in a table is stored at
the beginning of the row. All variable-length data columns are stored after the
fixed-length data. Figure 34.3 shows the structure of the data row in SQL Server.
[Figure 34.3: Structure of a data row. Fields: Status Byte A (1 byte); Status Byte B (1 byte,
not used); Length of Fixed-Length Data (2 bytes); Fixed-Length Data Columns (n bytes);
Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable-Length
Columns (2 bytes); Column Offset (2 x number of variable columns); Variable-Length Data
Columns (n bytes).]
The total size of each data row is the sum of the sizes of its columns plus the
row overhead. Seven bytes of overhead is the minimum for any data row:
1 byte for status byte A.
1 byte for status byte B (in SQL Server 2008, only 1 bit is used, indicating that the
record is a ghost-forwarded row).
2 bytes to store the length of the fixed-length columns.
2 bytes to store the number of columns in the row.
1 byte for every multiple of 8 columns (ceiling(numcols / 8)) in the table for the
NULL bitmap. A 1 in the bitmap indicates that the column value is NULL in that row.
The values stored in status byte A are as follows:
Bit 0: This bit provides version information. In SQL Server 2008, it’s always 0.
Bits 1 through 3: This 3-bit value indicates the nature of the row. 0 indicates that
the row is a primary record, 1 indicates that the row has been forwarded, 2 indicates
a forwarded stub, 3 indicates an index record, 4 indicates a blob fragment, 5
indicates a ghost index record, 6 indicates a ghost data record, and 7 indicates a ghost
version record. (Many of these topics, such as forwarded and ghost records, are
discussed in further detail later in this chapter.)
Bit 4: This bit indicates that a NULL bitmap exists. This is somewhat unnecessary in
SQL Server 2008 because a NULL bitmap is always present, even if no NULLs are
allowed in the table.
Bit 5: This bit indicates that one or more variable-length columns exist in the row.
Bit 6: This bit indicates the row contains versioning information.
Bit 7: This bit is not currently used in SQL Server 2008.
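To make the bit layout concrete, here is a small sketch that decodes a status byte A value into the fields listed above. It assumes bit 0 is the least significant bit; the function name decode_status_a and the dictionary keys are illustrative only.

```python
# Record types for bits 1 through 3, as listed in the text.
RECORD_TYPES = {
    0: "primary record",
    1: "forwarded record",
    2: "forwarded stub",
    3: "index record",
    4: "blob fragment",
    5: "ghost index record",
    6: "ghost data record",
    7: "ghost version record",
}


def decode_status_a(value: int) -> dict:
    """Split status byte A into its fields (assumes bit 0 is the lowest bit)."""
    return {
        "version": value & 0x01,                       # bit 0
        "record_type": RECORD_TYPES[(value >> 1) & 0x07],  # bits 1-3
        "has_null_bitmap": bool(value & 0x10),         # bit 4
        "has_variable_columns": bool(value & 0x20),    # bit 5
        "has_versioning_info": bool(value & 0x40),     # bit 6
    }


print(decode_status_a(0x30))
print(decode_status_a(0x10))
```

Under this assumption, 0x30 describes a primary record with a NULL bitmap and at least one variable-length column, and 0x10 describes a primary record with a NULL bitmap and fixed-length columns only.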
If the table contains any variable-length columns, the following additional overhead bytes
are included in each data row:
2 bytes to store the number of variable-length columns in the row.
2 bytes times the number of variable-length columns for the offset array. This is
essentially a table in the row identifying where each variable-length column can be
found within the variable-length column block.
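The overhead rules above can be combined into one formula. The following is a minimal sketch under the assumption that the listed items are the only per-row overhead; row_overhead is a hypothetical helper, not a SQL Server function.

```python
import math


def row_overhead(num_cols: int, num_var_cols: int = 0) -> int:
    """Per-row overhead in bytes, following the rules listed in the text."""
    overhead = 1                        # status byte A
    overhead += 1                       # status byte B
    overhead += 2                       # length of the fixed-length data
    overhead += 2                       # number of columns
    overhead += math.ceil(num_cols / 8) # NULL bitmap, 1 bit per column
    if num_var_cols > 0:
        overhead += 2                   # number of variable-length columns
        overhead += 2 * num_var_cols    # column offset array
    return overhead


print(row_overhead(3))       # 7
print(row_overhead(10, 2))   # 14
```

A three-column, all fixed-length table gives the 7-byte minimum quoted earlier; ten columns with two variable-length columns need 14 bytes.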
Within each block of fixed-length or variable-length data, the data columns are stored in
the column order in which they were defined when the table was created. In other words,
all fixed-length fields are stored in column ID order in the fixed-length block, and all
variable-length fields are stored in column ID order in the variable-length block.
Storage of the sql_variant Data Type
The sql_variant data type can contain a value
of any column data type in SQL Server except for text, ntext, image, variable-length
columns with the MAX qualifier, and timestamp. For example, a sql_variant in one row
could contain character data; in another row, an integer value; and in yet another row, a
float value. Because they can contain any type of value, sql_variant columns are always
considered variable length. The format of a sql_variant column is as follows:
Byte 1 indicates the actual data type being stored in the sql_variant.
Byte 2 indicates the sql_variant version, which is always 1 in SQL Server 2008.
The remainder of the sql_variant column contains the data value and, for some
data types, information about the data value.
The data type value in byte 1 corresponds to the values in the xtype column in the
systypes database system table. For example, if the first byte contains a hex 38, that
corresponds to the xtype value of 56, which is the int data type.
Some data types stored in a sql_variant column require additional information bytes
stored at the beginning of the data value (after the sql_variant version byte). The data
types requiring additional information bytes and the values in these information bytes are
as follows:
Numeric and decimal data types require 1 byte for the precision and 1 byte for the
scale.
Character strings require 2 bytes to store the maximum length and 4 bytes for the
collation ID.
Binary and varbinary data values require 2 bytes to store the maximum length.
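As an illustration of the two-byte prefix, the sketch below splits a raw sql_variant value into its type byte, version byte, and remaining payload. The sample bytes are made up, the function name is hypothetical, and reading the int payload as little-endian is an assumption rather than something stated in the text.

```python
def parse_sql_variant_header(raw: bytes):
    """Return (xtype, version, payload) for a raw sql_variant value."""
    xtype = raw[0]      # byte 1: data type, matches systypes.xtype
    version = raw[1]    # byte 2: sql_variant version (1 in SQL Server 2008)
    payload = raw[2:]   # info bytes (for some types) followed by the value
    return xtype, version, payload


# Hypothetical sample: hex 38 (xtype 56, int), version 1, then a 4-byte value.
sample = bytes([0x38, 0x01]) + (42).to_bytes(4, "little")
xtype, version, payload = parse_sql_variant_header(sample)
print(xtype, version)                      # 56 1
print(int.from_bytes(payload, "little"))   # 42 (little-endian is assumed)
```

For types that carry information bytes (numeric, character, binary), the payload would begin with those bytes before the value itself.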
Storage of Sparse Columns
A new storage feature introduced in SQL Server 2008 is sparse
columns. Sparse columns are ordinary columns that use an optimized storage format for
NULL values. Sparse columns reduce the space requirements for NULL values at the cost of
more overhead to retrieve non-NULL values. A rule of thumb is to consider using sparse
columns when you expect at least 90% of the rows to contain NULL values. Prime
candidates are tables that have many columns where most of the attributes are NULL for most
rows; for example, when different attributes apply to different subsets of rows and, for
each row, only a subset of columns is populated with values.
The sparse columns feature significantly increases the number of possible columns in a
table from 1,024 to 30,000. However, not all 30,000 can contain values. The number of
actual populated columns you can have depends on the number of bytes of data in the
rows. With sparse columns, storage of NULL values is optimized, requiring no space at all
for storing NULL values. This is unlike nonsparse columns, which, as you saw earlier, do
need space even for NULL values (a fixed-length NULL value requires the full column width,
and a variable-length NULL requires at least 2 bytes of storage in the column offset array).
Name                    Number of Bytes            Description
Complex column header   2                          A value of 05 indicates the column is a sparse
                                                   vector.
Sparse column count     2                          Number of sparse columns.
Column ID set           2 × # of sparse columns    The column IDs of each column with a value
                                                   stored in the sparse vector.
Column offset table     2 × # of sparse columns    The offset of the ending position of each
                                                   sparse column.
Sparse data             Depends on actual values   The actual data values for each sparse column,
                                                   stored in column ID order.
Although sparse columns themselves require no space for NULL values, there is some fixed
overhead space required to allow rows to contain sparse columns. This space is needed to
add the sparse vector to the end of the data row. A sparse vector is added to the end of a
data row only if at least one sparse column is defined on the table.
The sparse vector is used to keep track of the physical storage of sparse columns in the
row. It is stored as the last variable-length column in the row. No bit is stored in the NULL
bitmap for the sparse vector column, but it is included in the count of the variable
columns (refer to Figure 34.3 for the general structure of a data row). The bytes stored in
the sparse vector are shown in Table 34.5.
With the required overhead space for the sparse vector, the maximum size of all
fixed-length non-NULL sparse columns is reduced to 8,019 bytes per row.
As you can see, the contents of a sparse vector are like a data row structure within a data
row. If you refer to Figure 34.3, you can see that the structure of the sparse vector is
similar to the shaded structure of a data row. One of the main differences is that the
sparse vector stores no information for any sparse columns that contain NULL values. Also,
fixed-length and variable-length columns are stored the same within the sparse vector.
However, if you have any variable-length columns in the sparse vector that are too large
to fit in the 8,019-byte limit of the data row, they are stored on row-overflow pages.
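The layout in Table 34.5 implies a simple size formula for the sparse vector. The following is a rough sketch that counts only the non-NULL sparse columns in a row; sparse_vector_size is an illustrative helper, and a row in which every sparse column is NULL is not modeled because the text does not describe that case.

```python
def sparse_vector_size(value_sizes):
    """Bytes used by a sparse vector holding the given non-NULL value sizes."""
    n = len(value_sizes)
    header = 2          # complex column header
    count = 2           # sparse column count
    column_ids = 2 * n  # column ID set
    offsets = 2 * n     # column offset table
    return header + count + column_ids + offsets + sum(value_sizes)


# Two non-NULL sparse values of 4 and 10 bytes.
print(sparse_vector_size([4, 10]))  # 26
```

Each non-NULL sparse value therefore costs 4 bytes beyond its data, which is why sparse columns pay off only when most values are NULL.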
The Row Offset Table
The location of a row within a page is identified by the row offset table, which is located at
the end of the page. To find a specific row within a page, SQL Server looks up the starting
byte address for a given row ID in the row offset table, which contains the offset of the
row from the beginning of the page (refer to Figure 34.2). Each entry in the row offset
table is 2 bytes in size, so for each row in a table, an additional 2 bytes of space is added
in from the end of the page for the row offset entry.
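Because every row costs its own size plus a 2-byte row offset entry, the number of fixed-size rows that fit on a page can be estimated as follows. This is a simplified sketch that ignores fill factor and any other reserved space; rows_per_page is a hypothetical helper.

```python
PAGE_SIZE = 8192    # bytes per page
HEADER_SIZE = 96    # bytes used by the page header
SLOT_SIZE = 2       # bytes per row offset table entry


def rows_per_page(row_size: int) -> int:
    """Estimate how many rows of row_size bytes (data plus overhead) fit."""
    usable = PAGE_SIZE - HEADER_SIZE  # 8,096 bytes
    return usable // (row_size + SLOT_SIZE)


print(rows_per_page(100))    # 79
print(rows_per_page(8060))   # 1
```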
The row offset table keeps track of the logical order of rows on a data page. If a table has
a clustered index defined on it, the data rows are stored in clustered index order.
However, they may not be physically stored in clustered key order on the page itself.
Instead, the row offset array indicates the logical clustered key order of the data rows. For
example, row offset slot 0 refers to the first row in the clustered index key order, slot 1
refers to the second row, slot 2 refers to the third row, and so on. The physical location of
the rows on the page may be in any order, depending on when rows on the page were
inserted or deleted.
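The difference between physical and logical order can be shown with a toy page. In the sketch below, the byte offsets and key values are invented for illustration; only the idea that slot order gives key order comes from the text.

```python
# Rows as physically laid out on the page: byte offset -> clustered key.
# The offsets and keys are made-up values.
rows_by_offset = {
    96: "Delta",
    140: "Alpha",
    180: "Charlie",
    220: "Bravo",
}

# Row offset table: slot 0 points to the first row in key order, and so on.
slot_array = [140, 220, 180, 96]

logical_order = [rows_by_offset[offset] for offset in slot_array]
print(logical_order)  # ['Alpha', 'Bravo', 'Charlie', 'Delta']
```

Reading the slots in sequence returns the rows in clustered key order even though they sit on the page in insertion order.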
Row-Overflow Pages
While the maximum in-row size is 8,060 bytes per row, SQL Server 2008 allows actual
rows to exceed this size for tables that contain varchar, nvarchar, varbinary,
sql_variant, or common language runtime (CLR) user-defined type columns. Although
the length of each one of these columns must still fall within the limit of 8,000 bytes, the
combined width of the row can exceed the 8,060-byte limit.
When a combination of varchar, nvarchar, varbinary, sql_variant, or CLR user-defined
type columns exceeds the 8,060-byte limit, SQL Server moves the record column with the
largest width to another page in the ROW_OVERFLOW_DATA allocation unit, while
maintaining a 24-byte pointer to the row-overflow page on the original page. Moving large records
to another page occurs dynamically as records are lengthened based on update operations.
Update operations that shorten records may cause records to be moved back to the
original page in the IN_ROW_DATA allocation unit.
Row-overflow pages are used only under certain circumstances. For one, the row itself has
to exceed 8,060 bytes; it does not matter how full the data page itself is. If a row is less
than 8,060 bytes and there’s not enough space in the data page, normal page splitting
occurs to store the row. Also, each column in a table must be completely on the row or
completely off it. A variable-length column cannot have some of its data on the regular
data page and some of its data on the overflow page. One row can span multiple
row-overflow pages depending on how many large variable-length columns there are.
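The widest-column-first behavior described above can be modeled roughly as follows. This is a simplified sketch, not SQL Server's actual algorithm: it assumes a column is always moved whole, that each moved column leaves a 24-byte pointer, and that the function and parameter names are illustrative.

```python
MAX_IN_ROW = 8060    # in-row limit in bytes
POINTER_SIZE = 24    # pointer left in the row for an off-row column


def place_columns(fixed_and_overhead: int, var_cols: dict):
    """Return (in_row_size, names_moved_off_row) for one row."""
    in_row = dict(var_cols)  # column name -> bytes currently held in the row
    moved = []

    def row_size():
        return fixed_and_overhead + sum(in_row.values())

    while row_size() > MAX_IN_ROW:
        # Only columns still in the row and wider than a pointer can help.
        candidates = [c for c in in_row
                      if c not in moved and in_row[c] > POINTER_SIZE]
        if not candidates:
            raise ValueError("row cannot be made to fit in-row")
        widest = max(candidates, key=lambda c: in_row[c])
        in_row[widest] = POINTER_SIZE  # whole column moves off the row
        moved.append(widest)
    return row_size(), moved


print(place_columns(100, {"a": 5000, "b": 4000, "c": 200}))  # (4324, ['a'])
```

In the example the row would be 9,300 bytes, so column a moves to a row-overflow page and the in-row portion shrinks to 4,324 bytes.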
Be aware that having data rows that require a row-overflow page increases the I/O cost of
retrieving the data row. Querying and performing other select operations, such as sorts or
joins on large records that contain row-overflow data, also slow processing time because
these records are processed synchronously instead of asynchronously.
Therefore, when you design a table with multiple varchar, nvarchar, varbinary,
sql_variant, or CLR user-defined type columns, you might want to consider the
percentage of rows that are likely to require row overflow and the frequency with which this
overflow data is likely to be queried. If there are likely to be frequent queries on many
rows of row-overflow data, you should consider normalizing the table so that some
columns are moved to another table, reducing the overall row size so that the rows fit
within 8,060 bytes. The data can then be recombined in a query using an asynchronous
JOIN operation.
TIP
Because of the performance implications, row-overflow pages are intended to be a
solution for situations in which most of your data rows fit completely on your data pages and
you only occasionally have rows that require a row-overflow page. Row-overflow pages
allow SQL Server to handle the large data rows effectively without requiring a redesign
of your table. However, if you find more than a few of your rows exceed the in-row size,
you probably should look into using the LOB data types or redesigning your table.
LOB Data Pages
If you want to store large amounts of text or binary data, you can use the text, ntext, and
image data types, as well as the varchar(max), nvarchar(max), and varbinary(max) data
types. (For information about how to use these data types, see Chapter 24, “Creating and
Managing Tables,” and Chapter 38, “Database Design and Performance.”) Each column for
a row of these data types can store up to 2GB (minus 2 bytes) of data. By default, the LOB
values are not stored as part of the data row, but as a collection of pages on their own. For
each LOB column, the data page contains a 16-byte pointer, which points to the location
of the initial page of the LOB data. A row with several LOB columns has one pointer for
each column.
The pages that hold LOB data are 8KB in size, just like any other page in SQL Server. An
individual LOB page can hold LOB data for multiple columns and also from multiple
rows. A LOB data page can even contain a mix of LOB data. This helps reduce the storage
requirements for the LOB data, especially when smaller amounts of data are stored in
these columns. For example, if SQL Server could store data for only a single column for a
single row on a single LOB data page and the data value consisted of only a single
character, it would still use an entire 8KB data page to store that data! Definitely not an efficient
use of space.
A LOB data page can hold LOB data for only a single table, however. A table with a LOB
column has a single set of pages to hold all its LOB data.
LOB information is presented externally (to the user) as a long string of bytes. Internally,
however, the information is stored within a set of pages. The pages are not necessarily
organized sequentially but are logically organized as a B-tree structure. (B-tree structures
are covered in more detail later in this chapter.) If an operation addresses some
information in the middle of the data, SQL Server can navigate through the B-tree to find the data.
In previous versions, SQL Server had to follow the entire page chain from the beginning to
find the desired information.
If the amount of the data in the LOB field is less than 32KB, the 16-byte pointer in the
data row points to an 84-byte root structure in the LOB B-tree. This root structure points
to the pages and location where the actual LOB data is stored (see Figure 34.4). The data
itself can be placed anywhere within the LOB pages for the table. The root structure keeps
track of the location of the information in a logical manner. If the data is less than 64
bytes, it is stored in the root structure itself.
If the amount of LOB data exceeds 32KB, SQL Server allocates intermediate B-tree index
nodes that point to the LOB pages. In this situation, the intermediate node pages are
stored on pages not shared between different occurrences of LOB columns; the
intermediate node pages store nodes for only one LOB column in a single data row.
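The size thresholds above can be summarized in a small classifier. This is only a sketch of the rules as stated: it takes 32KB to mean 32 × 1,024 bytes, and it treats a value of exactly 32KB as needing intermediate nodes, which the text leaves unspecified. The function name lob_layout is illustrative.

```python
ROOT_INLINE_LIMIT = 64       # below this, data lives in the root structure
DIRECT_LIMIT = 32 * 1024     # assumed meaning of "32KB"


def lob_layout(size_bytes: int) -> str:
    """Describe how a LOB value of the given size is organized."""
    if size_bytes < ROOT_INLINE_LIMIT:
        return "stored in the root structure"
    if size_bytes < DIRECT_LIMIT:
        return "root structure points to LOB pages"
    # The boundary case of exactly 32KB is an assumption of this sketch.
    return "intermediate B-tree nodes point to LOB pages"


for size in (40, 5000, 100000):
    print(size, lob_layout(size))
```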
Storing LOB Data in the Data Row
To further conserve space and help minimize I/O, SQL Server 2008 supports storing LOB
data in the actual data row. When the LOB data is stored outside the data row pages, at a
minimum, SQL Server needs to perform one additional page read per row to get the LOB
data.
Why would you want to store LOB data in the row? Why not just store the data in a
varchar(8000)? Well, primarily because there is an upper limit of 8KB if the data is stored
within the data row (not counting the other columns). Using a LOB data type, you can
store more than 2 billion bytes of text. If you know most of your records will be small, but
on occasion, some very large values will be stored, the text in row option provides
optimum performance and better space efficiency for the majority of your LOB values,
while providing the flexibility you need for the occasional large values. This option also
provides the benefit of keeping the data all in a single column instead of having to split it
across multiple columns or rows when the data exceeds the size limit of a single row.
[Figure 34.4: A LOB pointer in a data row on the data page, pointing to the root structure,
which in turn points to the LOB data pages.]
If you want to enable the text in row option for a table with a LOB column, use the
sp_tableoption stored procedure:
exec sp_tableoption pub_info, 'text in row', 512
This example enables up to 512 bytes of LOB data in the pub_info table to be stored in
the data row. The maximum amount of LOB data that can be stored in a data row is 7,000
bytes. When a LOB value exceeds the specified size, rather than store the 16-byte pointer
in the data row as it would normally, SQL Server stores the 24-byte root structure that
contains the pointers to the separate chunks of LOB data for the row in the LOB column.
The second parameter to sp_tableoption can be just the option ON. If no size is specified,
the option is enabled with a default size of 256 bytes. To disable the text in row option,
you can set its value to 0 or OFF with sp_tableoption. When the option is turned off, all
LOB data stored in the row is moved off to LOB pages and replaced with the standard
16-byte pointer. This can be a time-consuming process for a large table.
Also, you should keep in mind that just because this option is enabled it doesn’t always
mean that the LOB data will be stored in the row. All other data columns that are not LOB
take priority over LOB data for storage in the data row. If a variable-length column grows
and there is not enough space left in the row or page for the LOB data, the LOB data is
moved off the page.
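The text in row rules can be sketched as two small helpers: one that turns the option value into a byte limit, and one that checks whether a LOB value is eligible for in-row storage. The function names are hypothetical, and the check ignores the competition for row space with non-LOB columns described above.

```python
MAX_TEXT_IN_ROW = 7000       # largest in-row LOB size given in the text
DEFAULT_TEXT_IN_ROW = 256    # limit used when the option is simply ON


def resolve_limit(option):
    """Translate a text in row option value into a byte limit (0 = disabled)."""
    if option == "OFF" or option == 0:
        return 0
    if option == "ON":
        return DEFAULT_TEXT_IN_ROW
    if not isinstance(option, int) or not 0 < option <= MAX_TEXT_IN_ROW:
        raise ValueError("unsupported text in row value: %r" % (option,))
    return option


def may_store_in_row(lob_bytes: int, limit: int) -> bool:
    """True if the value is small enough to be a candidate for in-row storage."""
    return limit > 0 and lob_bytes <= limit


limit = resolve_limit(512)
print(may_store_in_row(300, limit))   # True
print(may_store_in_row(600, limit))   # False
```

A True result means only that the value qualifies by size; the row must still have room after the non-LOB columns are placed.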
Storage of MAX Data
An alternative to the text and image data types in SQL Server 2008 is the option of
defining variable-length data using the MAX specifier. When you use the MAX specifier with
varchar, nvarchar, and varbinary columns, SQL Server determines automatically whether
to store the data as a regular varchar, nvarchar, or varbinary value or as a LOB.
Essentially, if the actual length is 8,000 bytes or less, SQL Server treats it as if it were
one of the regular variable-length data types, including using row-overflow pages if
necessary. If the MAX column exceeds 8,000 bytes, it is stored like LOB data.
Index Pages
Index information is stored on index pages. An index page has the same layout as a data
page. The difference is the type of information stored on the page. Generally, a row in
an index page contains the index key and a pointer to the page or row at the next
(lower) level.
The actual information stored in an index page depends on the index type and whether it
is a leaf-level page. A leaf-level clustered index page is the data page itself; you’ve already
seen its structure. The information stored on other index pages is as follows:
Clustered indexes, nonleaf pages: Each index row contains the index key and a
pointer (the file ID and a page address) to a page in the index tree at the next lower
level.
Nonclustered index, nonleaf pages: Each index row contains the index key and
a page-down pointer (the file ID and a page address) to a page in the index tree at
the next lower level. For nonunique indexes, the nonleaf row also contains the row
locator information for the corresponding data row.
Nonclustered index, leaf pages: Rows on this level contain an index key and a
reference to a data row. For heap tables, this is the Row ID; for clustered tables, this
is the clustered key for the corresponding data row.
The actual structure and content of index rows, as well as the structure of the index
tree, are discussed in more detail later in this chapter.
Space Allocation Structures
When a table or index needs more space in a database, SQL Server needs a way to
determine where space is available in the database to be allocated. If the table or index is still
fewer than eight pages in size, SQL Server must find a mixed extent with one or more
pages available that can be allocated. If the table or index is eight pages or larger in size,
SQL Server must find a free uniform extent that can be allocated to the table or index.
Extents
If SQL Server allocated space one page at a time as pages were needed for a table (or an
index), SQL Server would be spending a good portion of its time just allocating pages, and
the data would likely be scattered noncontiguously throughout the database. Scanning such
a table would not be very efficient. For these reasons, pages for each object are grouped
together and allocated in extents; an extent consists of eight logically contiguous pages.
When a table or index is created, it is initially allocated a page on a mixed extent. If no
mixed extents are available in the database, a new mixed extent is allocated. A mixed
extent can be shared by up to eight objects (each page in the extent can be assigned to a
different table or index).
As the table grows to at least eight pages in size, all future allocations to the table are done
as uniform extents.
Figure 34.5 shows the use of mixed and uniform extents.
[Figure 34.5: Mixed and uniform extents. The mixed extent (page addresses 8 through 15) holds
pages for Table 2, Table 1, Table 2, Index 1, Table 1, Table 3, Index 1, and Table 1. The
uniform extent (page addresses 16 through 23) holds only Table 1 pages.]
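The extent arithmetic can be sketched in a few lines. This sketch assumes extents are aligned on page numbers that are multiples of eight, which matches the page addresses in Figure 34.5; the function names are illustrative only.

```python
PAGES_PER_EXTENT = 8


def extent_of(page_no: int) -> int:
    """Extent number containing the page (assumes 8-page alignment)."""
    return page_no // PAGES_PER_EXTENT


def extent_pages(extent_no: int) -> range:
    """Page numbers that make up the given extent."""
    start = extent_no * PAGES_PER_EXTENT
    return range(start, start + PAGES_PER_EXTENT)


def next_allocation(pages_allocated: int) -> str:
    """Kind of allocation used for an object that already has this many pages."""
    if pages_allocated < PAGES_PER_EXTENT:
        return "page from a mixed extent"
    return "uniform extent"


print(extent_of(11))                 # 1
print(list(extent_pages(2)))         # [16, 17, 18, 19, 20, 21, 22, 23]
print(next_allocation(3), "|", next_allocation(8))
```

Pages 8 through 15 fall in extent 1 and pages 16 through 23 in extent 2, matching the mixed and uniform extents in the figure.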