Microsoft SQL Server 2008 R2 Unleashed - P120



[Figure 34.16 content: root page 24 holds the keys Albert (pointer to page 14) and Hunt (pointer to page 15). Intermediate page 14 holds Albert, Cox, and Eddy (pointers to pages 8, 9, and 10); intermediate page 15 holds Hunt, Smith, and Watson (pointers to pages 11, 12, and 13). Pages 8 through 13 are the data (leaf) pages.]

FIGURE 34.16 The structure of a clustered index

By default, a clustered index has a single partition and thus has at least one row in sys.partitions with index_id = 1. When a clustered index has multiple partitions, a separate B-tree structure contains the data for each specific partition.

Depending on the data types in the clustered index, each clustered index structure has one or more allocation units in which to store and manage the data for a specific partition. At a minimum, each clustered index has one IN_ROW_DATA allocation unit per partition. If the table contains any LOB data, the clustered index also has one LOB_DATA allocation unit per partition, and it has one ROW_OVERFLOW_DATA allocation unit per partition if the table contains any variable-length columns that exceed the 8,060-byte row size limit.

Clustered Index Row Structure

The structure of a clustered index row is similar to the structure of a data row except that it contains only key columns; this structure is detailed in Figure 34.17.


[Figure 34.17 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Key Data (n bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Row Locator, made up of File ID (2 bytes), Page Number (4 bytes), and Slot Number (2 bytes); Number of Variable-Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable-Length Key Data (n bytes). Shaded areas in the original figure represent data present only when the index contains nullable or variable-length columns.]

FIGURE 34.17 Clustered index row structure

Notice that unlike a data row, index rows do not contain the status byte B or the 2 bytes to hold the length of fixed-length data fields. Instead of storing the length of the fixed-length data, which also indicates where the fixed-length portion of a row ends and the variable-length portion begins, the page header pminlen value is used to help describe an index row. The pminlen value is the minimum length of the index row, which is essentially the sum of the size of all fixed-width fields and overhead. Therefore, if no variable-length or nullable fields are in the index key, pminlen also indicates the width of each index row.

The null bitmap field and the field for the number of columns in the index row are present only when an index key contains nullable columns. The number of columns value is needed only to determine how many bits are needed in the null bitmap and therefore how many bytes are required to store the null bitmap (1 byte per eight columns). The data contents of a clustered index row include the key values along with a 6-byte down-page pointer (the first 2 bytes are the file ID, and the last 4 bytes are the page number). The down-page pointer is the last value in the fixed-data portion of the row.
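To make the byte counts above concrete, the following Python sketch adds up the fields of a clustered index row as this section describes them. It is an illustration only: the function names and parameters are hypothetical, and the model assumes the optional fields appear exactly under the conditions stated in the text, which may not capture every detail of the on-disk format.

```python
import math


def null_bitmap_bytes(num_columns: int) -> int:
    """Bytes needed for a null bitmap that uses 1 bit per column."""
    return math.ceil(num_columns / 8)


def clustered_index_row_size(fixed_key_bytes: int,
                             num_columns: int,
                             has_nullable: bool = False,
                             variable_key_lengths: tuple = ()) -> int:
    """Estimate the size of a clustered index row (hypothetical helper).

    Assumes the layout described in the text: status byte A, fixed-length
    key data, a 6-byte down-page pointer, and optional null-bitmap and
    variable-length sections.
    """
    size = 1                    # status byte A
    size += fixed_key_bytes     # fixed-length key data
    size += 6                   # down-page pointer: 2-byte file ID + 4-byte page number
    if has_nullable:
        size += 2               # number of columns
        size += null_bitmap_bytes(num_columns)
    if variable_key_lengths:
        size += 2                               # number of variable-length columns
        size += 2 * len(variable_key_lengths)   # column offset array
        size += sum(variable_key_lengths)       # variable-length key data
    return size


# A single non-nullable 4-byte integer key: 1 + 4 + 6 bytes.
print(clustered_index_row_size(fixed_key_bytes=4, num_columns=1))  # 11
```

With no nullable or variable-length key columns, every row comes out the same width, which is the case in which pminlen alone describes the row.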

Nonunique Clustered Indexes

When a clustered index is defined on a table, the clustered index keys are used as row locators to identify the data rows being referenced by nonclustered indexes (more on this topic in the following section on nonclustered indexes). Because the clustered keys are used as unique row pointers, there needs to be a way to uniquely refer to each row in the table. If the clustered index is defined as a unique index, the key itself uniquely identifies every row. If the clustered index was not created as a unique index, SQL Server adds a 4-byte integer field, called a uniqueifier, to the data row to make each key unique when necessary. When is the uniqueifier necessary? SQL Server adds the uniqueifier to a row when the row is added to a table and that new row contains a key that is a duplicate of the key for an already-existing row.

The uniqueifier is added to the variable-length data area of the data row, which also results in the addition of the variable-length overhead bytes. Therefore, each duplicate row in a clustered index has a minimum of 4 bytes of overhead added for the uniqueifier. If the row had no variable-length keys previously, a total of 8 bytes of overhead is added to the row: the uniqueifier itself (4 bytes) plus the overhead bytes required for the variable data (storing the number of variable columns requires 2 bytes, and the column offset array requires 2 bytes).
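The following Python sketch illustrates when a uniqueifier is needed and how much overhead it adds, using the byte counts given above. The function names are hypothetical, and the numbering of the uniqueifier values (1, 2, and so on for each successive duplicate) is an assumption made for illustration; the text states only that a 4-byte integer is added to duplicate rows.

```python
from collections import defaultdict

UNIQUEIFIER_BYTES = 4    # the uniqueifier itself
VAR_COUNT_BYTES = 2      # "number of variable columns" field
OFFSET_ENTRY_BYTES = 2   # one entry in the column offset array


def uniqueifier_overhead(has_variable_columns: bool) -> int:
    """Bytes added to a duplicate row, following the figures in the text."""
    if has_variable_columns:
        # The variable-length section already exists; the text gives
        # 4 bytes as the minimum overhead in this case.
        return UNIQUEIFIER_BYTES
    return UNIQUEIFIER_BYTES + VAR_COUNT_BYTES + OFFSET_ENTRY_BYTES


def assign_uniqueifiers(keys):
    """Pair each inserted key with a uniqueifier, or None when not needed.

    The first row for a key gets no uniqueifier; later duplicates get an
    increasing integer (an assumed numbering scheme).
    """
    seen = defaultdict(int)
    result = []
    for key in keys:
        count = seen[key]
        result.append((key, None if count == 0 else count))
        seen[key] += 1
    return result


print(assign_uniqueifiers(["Smith", "Jones", "Smith", "Smith"]))
print(uniqueifier_overhead(has_variable_columns=False))  # 8
```

Only the second and third Smith rows carry a uniqueifier here, which is why a nonunique clustered index with few duplicates costs little extra space.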

Nonclustered Indexes

A nonclustered index is a separate index structure, independent of the physical sort order of the data rows in the table. You can have up to 999 nonclustered indexes per table.

A nonclustered index is similar to the index in the back of a book. To find the pages on which a specific subject is discussed, you look up the subject in the index and then go to the pages referenced in the index. This method is efficient as long as the subject is discussed on only a few pages. If the subject is discussed on many pages, or if you want to read about many subjects, it can be more efficient to read the entire book.

A nonclustered index works similarly to the book index. From the index’s perspective, the data rows are randomly spread throughout the table. The nonclustered index tree contains the index key values, in sorted order. There is a row at the leaf level of the index for each data row in the table. Each leaf-level row contains a data row locator to locate the actual data row in the table.

[Figure 34.18 content: non-leaf index pages point down to leaf index pages. Each leaf row pairs a key value with a page:row locator (for example, 8:1 or 13:2) that points into the heap's data pages 8 through 13, where the rows are stored in no particular order.]

FIGURE 34.18 A nonclustered index on a heap table


If no clustered index is created for the table, the data row locator for the leaf level of the index is an actual pointer to the data page and the row number within the page where the row is located (see Figure 34.18).

Nonclustered indexes on clustered tables use the associated clustered index key value for the record as the data row locator. When SQL Server reaches the leaf level of a nonclustered index, it uses the clustered index key to start searching through the clustered index to find the actual data row (see Figure 34.19). This adds some I/O to the search itself, but the benefit is that if a page split occurs in a clustered table, or if a data row is moved (for example, as a result of an update), the nonclustered index row locator stays the same. As long as the clustered index key value itself is not modified, no data row locators in the nonclustered index have to be updated.

[Figure 34.19 content: a search for WHERE firstname = 'Sally' descends the nonclustered index on first name to a leaf row, then uses the clustered key stored in that row (the last name) to descend the clustered index to the data page.]

FIGURE 34.19 A nonclustered index on a clustered table

[Figure 34.20 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Key Data (n bytes); Row Locator, including File ID (2 bytes) and Slot Number (2 bytes); Variable-Length Key Data (n bytes). The remaining labels are truncated in this copy.]

FIGURE 34.20 The structure of a nonclustered index leaf row for a heap table

SQL Server performs the following steps when searching for a value by using a nonclustered index:

1. Queries the system catalog to determine the page address for the root page of the index.

2. Compares the search value against the index key values on the root page.

3. Finds the highest key value on the page where the key value is less than or equal to the search value.

4. Follows the down-page pointer to the next level down in the nonclustered index tree.

5. Continues following page pointers (that is, repeats steps 3 and 4) until the nonclustered index leaf page is reached.

6. Searches the index key rows on the leaf page to locate any matches for the search value. If no matching row is found on the leaf page, the table contains no matching values.

7. If a match is found on the leaf page, SQL Server follows the data row locator to the data row on the data page.
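The steps above can be sketched in Python against a small in-memory stand-in for the index pages. Everything here is hypothetical: the page numbers, the dictionary layout, and the function name are invented for illustration, and the sketch assumes all matches for a key sit on a single leaf page. It is not how SQL Server represents pages internally.

```python
from bisect import bisect_right

# Hypothetical index pages. Nonleaf rows are (key, child page number);
# leaf rows are (key, (data page, row number)) locators into a heap.
INDEX_PAGES = {
    30: {"leaf": False, "rows": [("Albert", 20), ("Eddy", 21), ("Smith", 22)]},
    20: {"leaf": True, "rows": [("Albert", (8, 1)), ("Alexis", (11, 2)),
                                ("Cox", (12, 1)), ("Dean", (13, 2))]},
    21: {"leaf": True, "rows": [("Eddy", (9, 1)), ("Franks", (10, 1)),
                                ("Hunt", (11, 1)), ("Martin", (9, 2))]},
    22: {"leaf": True, "rows": [("Smith", (8, 2)), ("Toms", (12, 2)),
                                ("Watson", (13, 1))]},
}
ROOT_PAGE = 30  # step 1: assumed to come from the system catalog


def find_row_locators(search_value, pages=INDEX_PAGES, root=ROOT_PAGE):
    """Walk from the root to a leaf page and return matching row locators."""
    page = pages[root]
    while not page["leaf"]:
        keys = [key for key, _ in page["rows"]]
        # Steps 2-3: highest key on the page that is <= the search value.
        # If the value sorts before every key, fall back to the first row.
        i = max(bisect_right(keys, search_value) - 1, 0)
        # Steps 4-5: follow the down-page pointer to the next level.
        page = pages[page["rows"][i][1]]
    # Step 6: scan the leaf page; an empty list means no matching rows.
    return [locator for key, locator in page["rows"] if key == search_value]


print(find_row_locators("Hunt"))   # [(11, 1)]
print(find_row_locators("Zed"))    # []
```

Step 7 would then use each returned locator to fetch the row from its data page. On a clustered table the locator would instead be a clustered key value, and a second descent through the clustered index would follow.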

Nonclustered Index Leaf Row Structures

In nonclustered indexes, if the row locator is a row ID, it is stored at the end of the fixed-length data portion of the row. The rest of the structure of a nonclustered index leaf row is similar to a clustered index row. Figure 34.20 shows the structure of a nonclustered leaf row for a heap table.

If the row locator is a clustered index key value, the row locator resides in either the fixed or variable portion of the row, depending on whether the clustered key columns were defined as fixed or variable length. Figure 34.21 shows the structure of a nonclustered leaf row for a clustered table.

When the row locator is a clustered key value and the clustered and nonclustered indexes share columns, the data value for the key is stored only once in the nonclustered index row. For example, if your clustered index key is on lastname and you have a nonclustered index defined on both firstname and lastname, the index rows do not store the value of lastname twice.

[Figure 34.21 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Nonclustered Key Data (n bytes); Variable-Length Nonclustered Key Data (n bytes); Row Locator, made up of Non-Overlapping Fixed-Length Clustered Key Data (n bytes) and Non-Overlapping Variable-Length Clustered Key Data (n bytes). The remaining labels are truncated in this copy.]

FIGURE 34.21 The structure of a nonclustered index leaf row for a clustered table

[Figure 34.22 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Key Data (n bytes); Page-Down Pointer, including File ID (2 bytes); Column Offset. The remaining labels are truncated in this copy.]

FIGURE 34.22 The structure of a nonclustered nonleaf index row for a unique index

Nonclustered Index Nonleaf Row Structures

The nonclustered index nonleaf rows are similar in structure to clustered index nonleaf rows in that they contain a page-down pointer to a page at the next level down in the index tree. The nonleaf rows don’t need to point to data rows; they only need to provide the path to traverse the index tree to a leaf row. If the nonclustered index is defined as unique, the nonleaf index key row contains only the index key value and page-down pointer. Figure 34.22 shows the structure of a nonleaf index row for a unique nonclustered index.

If the nonclustered index is not defined as a unique index, the nonleaf rows also contain the row locator information for the corresponding data row. Storing the row locator in the nonleaf index row ensures each index key row is unique (because the row locator, by its nature, must be unique). Ensuring each index key row is unique allows any corresponding nonleaf index rows to be located and deleted more easily when the data row is deleted. For a heap table, the row locator is the corresponding data row’s page and row pointer, as shown in Figure 34.23.

[Figure 34.23 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Key Data (n bytes); Page-Down Pointer, including File ID (2 bytes); Row Locator, including File ID (2 bytes) and Slot Number (2 bytes). The remaining labels are truncated in this copy.]

FIGURE 34.23 The structure of a nonclustered nonleaf index row for a nonunique index on a heap table

[Figure 34.24 labels, as recovered from the figure: Status Byte A (1 byte); Fixed-Length Nonclustered Key Data (n bytes); Variable-Length Nonclustered Key Data (n bytes); Row Locator, made up of Non-Overlapping Fixed-Length Clustered Key Data (n bytes) and Non-Overlapping Variable-Length Clustered Key Data (n bytes); Page-Down Pointer, including File ID (2 bytes). The remaining labels are truncated in this copy.]

FIGURE 34.24 The structure of a nonclustered nonleaf index row for a nonunique index on a clustered table

If the table is clustered, the clustered key values are stored in the nonleaf index rows of the nonunique nonclustered index just as they are in the leaf rows, as shown in Figure 34.24.

As you can see, it’s possible for the index pointers and row overhead to exceed the size of the index key itself. This is why, for I/O and storage reasons, it is always recommended that you keep your index keys as small as possible.


Data Modification and Performance

Now that you have a better understanding of the storage structures in SQL Server, it’s time to look at how SQL Server maintains and manages those structures when data modifications are taking place in the database.

Inserting Data

When you add a data row to a heap table, SQL Server adds the row to the heap wherever space is available. SQL Server uses the IAM and PFS pages to identify whether any pages with free space are available in the extents already allocated to the table. If no free pages are found, SQL Server uses the information from the GAM and SGAM pages to locate a free extent and allocate it to the table.

For clustered tables, the new data row is inserted to the appropriate location on the appropriate data page relative to the clustered index key order. If no more room is available on the destination page, SQL Server needs to link a new page in the page chain to make room available and add the row. This is called a page split.

In addition to modifying the affected data pages when adding rows, SQL Server needs to update all nonclustered indexes to add a pointer to the new record. If a page split occurs, this incurs even more overhead because the clustered index needs to be updated to store the pointer for the new page added to the table. Fortunately, because the clustered key is used as the row locator in nonclustered indexes when a table is clustered, even though the page and row IDs have changed, the nonclustered index row locators for rows moved by a page split do not have to be updated as long as the clustered key column values remain the same.

Page Splits

When a page split occurs, SQL Server looks for an available page to link into the page chain. It first tries to find an available page in the same extent as the pages it will be linked to. If no free pages exist in the same extent, it looks at the IAM to determine whether there are any free pages in any other extents already allocated to the table or index. If no free pages are found, a new extent is allocated to the table.

When a new page is found or allocated to the table and linked into the page chain, the original page is “split.” Approximately half the rows are moved to the new page, and the rest remain on the original page (see Figure 34.25). Whether the new page goes before or after the original page when the split is made depends on the amount of data to be moved. In an effort to minimize logging, SQL Server moves the smaller rows to the new page. If the smaller rows are at the beginning of the page, SQL Server places the new page before the original page and moves the smaller rows to it. If the larger rows are at the beginning of the page, SQL Server keeps them on the original page and moves the smaller rows to the new page after the original page.


[Figure 34.25 content: new row DDDD must be inserted into page 1:201, which already holds AAAA, BBBB, CCCC, EEEE, and FFFF. After the page split, page 1:201 holds AAAA, BBBB, and CCCC, and new page 1:307 holds DDDD, EEEE, and FFFF.]

FIGURE 34.25 Page splitting due to inserts

After determining where the new row goes between the existing rows and whether the new page is to be added before or after the original page, SQL Server has to move rows to the new page. The simplified algorithm for determining the split point is as follows:

1. Place the first row (with the lowest clustered key value) at the beginning of the first page.

2. Place the last row (with the highest clustered key value) on the second page.

3. Place the row with the next lowest clustered key value on the first page after the existing row(s).

4. Place the next-to-last row (with the second highest clustered key value) on the second page.

5. Continue alternating back and forth until the space between the two pages is balanced or one of the pages is full.
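The simplified split-point algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not SQL Server's actual implementation: the function name, the (key, size) row representation, and the page capacity value are hypothetical, and the sketch reads "balanced" as simply running out of rows while alternating.

```python
def split_rows(rows, page_capacity):
    """Distribute rows across two pages by alternating from both ends.

    rows: list of (key, size_in_bytes) tuples sorted by clustered key,
          already including the new row.
    Returns (first_page_rows, second_page_rows), each in key order.
    """
    first, second_reversed = [], []
    used_first = used_second = 0
    lo, hi = 0, len(rows) - 1
    take_low = True
    while lo <= hi:
        low_fits = used_first + rows[lo][1] <= page_capacity
        high_fits = used_second + rows[hi][1] <= page_capacity
        if not low_fits and not high_fits:
            raise ValueError("rows do not fit on two pages")
        if (take_low and low_fits) or not high_fits:
            # Steps 1 and 3: next-lowest key goes on the first page.
            first.append(rows[lo])
            used_first += rows[lo][1]
            lo += 1
        else:
            # Steps 2 and 4: next-highest key goes on the second page.
            second_reversed.append(rows[hi])
            used_second += rows[hi][1]
            hi -= 1
        take_low = not take_low  # step 5: keep alternating
    # Rows were taken from the high end downward; restore key order.
    return first, second_reversed[::-1]


# Rows from Figure 34.25 with an assumed size of 100 bytes each.
rows = [(key, 100) for key in ["AAAA", "BBBB", "CCCC", "DDDD", "EEEE", "FFFF"]]
first_page, second_page = split_rows(rows, page_capacity=500)
print([key for key, _ in first_page])   # ['AAAA', 'BBBB', 'CCCC']
print([key for key, _ in second_page])  # ['DDDD', 'EEEE', 'FFFF']
```

With equal-size rows, the result matches the outcome shown in Figure 34.25. When one page fills first, the remaining rows all go to the other page.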

In some situations a double split can occur. If the new row has to go between two existing rows on a page, but the new row is too large to fit on either page with any of the existing rows, a new page is added after the original. The new row is added to the new page, a second new page is added after that, and the remaining original rows are inserted into the second new page. An example of a double split is shown in Figure 34.26.


[Figure 34.26 content: a large new row DDDD must be inserted into page 1:201, which already holds AAAA, BBBB, CCCC, EEEE, and FFFF. After the double split, page 1:201 holds AAAA, BBBB, and CCCC; new page 1:307 holds only the large DDDD row; and second new page 1:308 holds EEEE and FFFF.]

FIGURE 34.26 Double page split due to large row insert

NOTE

Although page splits are expensive when they occur, they do generate free space in the split pages for future inserts into those pages. Page splits also help keep the index tree balanced as rows are added to the table. However, if you monitor the system with Performance Monitor and are seeing hundreds of page splits per second, you might want to consider rebuilding the clustered index on the table and applying a lower fill factor to provide more free space in the existing pages. This can help improve system performance until eventually the pages fill up and start splitting again. For this reason, some shops supporting high-volume online transaction processing (OLTP) environments with a lot of insert activity rebuild the indexes with a lower fill factor on a daily basis.
