Microsoft SQL Server 2008 Study Guide, Part 137


Indexes only become useful as they serve the needs of a query, so designing indexes means thinking about how the query will navigate the indexes to reach the data. "Zen and the Art of Indexing" means that you see the query path in your mind’s eye and design the shortest path from the query to the data.

What’s New With Indexes?

Indexing is critical to SQL Server performance, and Microsoft has steadily invested in SQL Server’s indexing capabilities. Back in SQL Server 2005, my favorite new feature was included columns for non-clustered indexes, which made non-clustered indexes more efficient as covering indexes.

With SQL Server 2008, Microsoft has again added several significant new indexing features.

Filtered indexes mean that a non-clustered index can be created that indexes only a subset of the data. This is perfect for situations like a manufacturing orders table in which only 2% of the orders are active.
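As a minimal sketch of a filtered index, assume the AdventureWorks2008 Production.WorkOrder table used later in this chapter, and assume that a NULL EndDate marks an active order. Both the index name and that interpretation of EndDate are illustrative assumptions, not taken from the book:

```sql
-- Hypothetical filtered index: only rows with no EndDate are stored in it,
-- so it stays small even when the table is large.
CREATE NONCLUSTERED INDEX IX_WorkOrder_Active_DueDate
    ON Production.WorkOrder (DueDate)
    WHERE EndDate IS NULL;

-- A query whose WHERE clause matches the index filter can use the small index.
-- The date literal is an arbitrary example value.
SELECT WorkOrderID, DueDate
FROM Production.WorkOrder
WHERE EndDate IS NULL
  AND DueDate < '20080101';
```

The smaller the active subset is relative to the whole table, the more a filter like this should pay off in index size and maintenance cost.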

The new star-join optimization uses bitmap filters for performance gains of up to seven times when joining a single fact table with several lookup (dimension) tables.

The new FORCESEEK table hint, as the name implies, forces the Query Optimizer to choose a seek operation instead of a scan.
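A short sketch of the hint syntax, using the Production.WorkOrder table described later in this chapter. The ProductID value is an arbitrary example:

```sql
-- FORCESEEK is written as a table hint in the FROM clause.
-- Here the optimizer is restricted to seek operations on Production.WorkOrder,
-- which would typically mean a seek on the ProductID index.
SELECT WorkOrderID, ProductID
FROM Production.WorkOrder WITH (FORCESEEK)
WHERE ProductID = 757;   -- arbitrary example value
```

A hint overrides the optimizer's own cost estimate, so it is usually worth comparing the hinted and unhinted execution plans before keeping it.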

Indexing Basics

You can’t master indexing without a solid understanding of how indexes work. Please don’t skip this section. To apply the strategies described later in this chapter, you must grok the b-tree.

The b-tree index

Conventional wisdom says that SQL Server has two types of indexes: clustered and non-clustered; but a closer look reveals that SQL Server has in fact only one type of index: the b-tree, or balanced tree, index, because internally both clustered and non-clustered indexes are b-tree indexes.

B-tree indexes exist on index pages and have a root level, one or more intermediate levels, and a leaf or node level. The columns actually sorted by the b-tree index are called the index’s key columns, as shown in Figure 64-1. The difference between clustered and non-clustered indexes is the amount and type of data stored at the leaf level.

While this chapter discusses the strategies of designing and optimizing indexes and does include some code examples that demonstrate creating indexes, the sister Chapter 20, "Creating the Physical Database Schema," details the actual syntax and Management Studio methods of creating indexes.

Over time, indexes typically become fragmented, which significantly hurts performance. For more information on index maintenance, turn to Chapter 42, "Maintaining the Database."


After you’ve read this chapter, I highly recommend digging deeper into the internals of SQL Server’s

indexes with my favorite SQL Server book, Kalen Delaney’s SQL Server 2008 Internals (Microsoft

Press, 2009).

FIGURE 64-1

The b-tree index is the most basic element of SQL Server. This figure illustrates a simplified view of a clustered index with an identity column as the clustered index key. The first name is the data column.

[Figure: Balanced Tree Index. Root level: 1-6, 7-12. Intermediate level: 1-3, 4-6, 7-9, 10-12. Leaf level (Key Columns with Data Columns): 1 Matt, 2 Paul, 3 Beth, 4 Nick, 5 Steve, 6 Zack, 7 Tom, 8 Hank, 9 Greg, 10 Susan, 11 Albert, 12 Ingrid.]

Clustered indexes

In SQL Server, when all the data columns are attached to the b-tree index’s leaf level, it’s called a clustered index, and some might call it a table or base table (refer to Figure 64-1). A clustered index is often called the physical sort order of the table, which is mostly, or at least logically, true.

Logically, the clustered index pages will have the data in the clustered index sort order; but physically, on the disk, those pages are a linked list: each page links to the next page and the previous page in the list. In a perfect world the pages would be in the same order as the list, but in reality they are often moved around due to page splits and fragmentation (more on page splits later in this chapter). In this case, the links probably jump around a bit.

A table may only have one physical sort order, and therefore, only one clustered index. The quintessential example of a clustered index is a telephone book (the old-fashioned printed kind, not the Internet search type). The telephone book itself is a clustered index. The last name and first name columns are the index keys, and the rest of the data (address, phone number) is attached to the index.

A telephone book even simulates a b-tree index. Open a telephone book to the middle. Choose the side with the name you want to find, and then split that side in half. In a few halves and splits, you’ll be at the page with the name you’re looking for. Your eye can now quickly scan that page and find the last name and first name you want. Because the address and phone number are printed right next to the names, no more searching is needed.


Non-clustered indexes

SQL Server can also create non-clustered indexes, which are similar to the indexes in the back of a book. This type of index is keyed, or sorted, by the keywords, and the page numbers are pointers to the book’s content.

Internally, SQL Server non-clustered indexes are b-tree indexes and point to the base table, which is either a clustered index or a heap. If the base table is a clustered index, then the clustered index keys (every sort-by column) are included at every level of the non-clustered index b-tree and leaf level. If the base table is a heap, then the heap RID (row ID) is used.

For example, the non-clustered index illustrated in Figure 64-2 uses the first name column as its key column, so that’s the data sorted by the b-tree. The non-clustered index points to the base table by including the clustered index key column. In Figure 64-2, the clustered index key column is the identity column used in Figure 64-1.

Since SQL Server 2005, additional unsorted columns can be included in the leaf level. The employee’s title and department columns could be added to the previous index, which is extremely useful in designing covering indexes (described in the next section).
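The following sketch shows included columns on a first-name index. The dbo.Employee table, its column names, and the index name are hypothetical and exist only to make the example self-contained:

```sql
-- Hypothetical table: identity clustered key plus a few data columns.
CREATE TABLE dbo.Employee (
    EmployeeID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED,
    FirstName  NVARCHAR(50) NOT NULL,
    Title      NVARCHAR(50) NULL,
    Department NVARCHAR(50) NULL
);

-- FirstName is the sorted key column; Title and Department are stored,
-- unsorted, at the leaf level only.
CREATE NONCLUSTERED INDEX IX_Employee_FirstName
    ON dbo.Employee (FirstName)
    INCLUDE (Title, Department);

-- Every column this query needs is in the index (EmployeeID is there as the
-- clustered key), so the base table does not have to be visited.
SELECT EmployeeID, FirstName, Title, Department
FROM dbo.Employee
WHERE FirstName = N'Beth';
```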

A SQL Server table may have up to 999 non-clustered indexes, but I’ve never seen a well-normalized table that required more than a dozen well-designed indexes.

FIGURE 64-2

This simplified illustration of a non-clustered index has a b-tree index with first name as the key column. The non-clustered index includes pointers to the clustered index key column.

[Figure: Balanced Tree Index keyed on first name. Root level: A-M, N-Z. Intermediate level: A-G, H-M, N-St, Su-Z. Leaf level (Key Columns with Clustered Keys or Heap RowID): Albert 11, Beth 3, Greg 9, Hank 8, Ingrid 12, Matt 1, Nick 4, Paul 2, Steve 5, Susan 10, Tom 7, Zack 6. Other labels: (2005) Included Columns; (2008) Filtered.]

Composite indexes

A composite index is a clustered or non-clustered index that is keyed, or sorted, on multiple columns. Composite indexes are common in production.


The order of the columns in a composite index is important. In order for a search to take advantage of a composite index, it must include the index columns from left to right. If the composite index is lastname, firstname, a search for firstname can’t seek quickly through the b-tree index, but a search for lastname, or lastname and firstname, will use the b-tree.
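The left-to-right rule can be sketched with a hypothetical dbo.Contact table. The table, index, and sample values are illustrative only:

```sql
-- Hypothetical table used only to illustrate composite key order.
CREATE TABLE dbo.Contact (
    ContactID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Contact PRIMARY KEY CLUSTERED,
    LastName  NVARCHAR(50) NOT NULL,
    FirstName NVARCHAR(50) NOT NULL
);

-- Composite index sorted by LastName first, then FirstName.
CREATE NONCLUSTERED INDEX IX_Contact_LastName_FirstName
    ON dbo.Contact (LastName, FirstName);

-- Can seek: the predicate supplies the leading key column.
SELECT ContactID FROM dbo.Contact
WHERE LastName = N'Nielsen';

-- Can seek: both key columns are supplied, left to right.
SELECT ContactID FROM dbo.Contact
WHERE LastName = N'Nielsen' AND FirstName = N'Paul';

-- Cannot seek: FirstName alone skips the leading column, so the best the
-- optimizer can do with this index is scan it.
SELECT ContactID FROM dbo.Contact
WHERE FirstName = N'Paul';
```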

Various methods of indexing for multiple columns are examined in Query Paths 9 through

11 later in this chapter.

A similar problem is searching for words within a column but not at the beginning of the text string stored in the column. For these word searches, SQL Server can use Integrated Full-Text Search (iFTS), covered in Chapter 19, "Using Integrated Full-Text Search."

Unique indexes and constraints

Because primary keys are the unique method of identifying any row, indexes and primary keys are intertwined; in fact, a primary key must be indexed. By default, creating a primary key automatically creates a unique clustered index, but it can optionally create a unique non-clustered index instead.

A unique index limits data to being unique so it’s like a constraint; and a unique constraint builds a unique index to quickly check the data. In fact, a unique constraint and a unique index are the exact same thing: creating either one builds a unique constraint/index.

The only difference between a unique constraint/index and a primary key is that a primary key cannot allow nulls, whereas a unique constraint/index can permit a single null value.
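A small sketch of these rules follows. The dbo.Member table and its constraint names are hypothetical:

```sql
-- Hypothetical table: the primary key defaults to a unique clustered index,
-- and the UNIQUE constraint builds a unique non-clustered index.
CREATE TABLE dbo.Member (
    MemberID INT NOT NULL
        CONSTRAINT PK_Member PRIMARY KEY,
    Email    NVARCHAR(100) NULL
        CONSTRAINT UQ_Member_Email UNIQUE
);

-- The first NULL is accepted by the unique constraint.
INSERT INTO dbo.Member (MemberID, Email) VALUES (1, NULL);

-- A second NULL counts as a duplicate and is rejected.
INSERT INTO dbo.Member (MemberID, Email) VALUES (2, NULL);

-- Both constraints appear as indexes in the catalog.
SELECT name, type_desc, is_unique, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Member');
```

Writing PRIMARY KEY NONCLUSTERED instead would build the primary key as a unique non-clustered index.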

The page split problem

Every b-tree index must maintain the key column data in the correct sort order. Inserts, updates, and deletes will affect that data. As the data is inserted or modified, if the index page to which a value needs to be added is full, then SQL Server must split the page into two less-than-full pages so it can insert the value in the correct position. Turning again to the telephone book example, if several new Nielsens moved into the area and the Nie page 515 had to now accommodate 20 additions, a simulated page split would take several steps:

1. Cut page 515 in half making two pages; call them 515a and 515b.

2. Print out and tape the new Nielsens to page 515a.

3. Tape page 515b inside the back cover of the telephone book.

4. Make a note on page 515a that the Nie listing continues on page 515b located at the end of the book, and a note on page 515b indicating that the listing continues on page 515a.

Page splits cause several performance-related problems:

■ The page split operation is expensive because it involves several steps and moving data. I’ve personally seen page splits reduce an intensive insert process’s performance by 90 percent.

■ If, after the page split, there still isn’t enough room, then the page will be split again. This can occur repeatedly depending on certain circumstances.

■ The data structure is left fragmented and can no longer be read in a single contiguous pass.

■ The data structure has more empty space, which means less data is read with every page read and less data is stored in the buffer per page.
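One way to observe the last two effects (this is a suggestion added here, not a technique described in this section) is to query the sys.dm_db_index_physical_stats dynamic management function. The sketch below assumes it is run in the AdventureWorks2008 database against the Production.WorkOrder table used later in this chapter:

```sql
-- Fragmentation and page fullness for every index on Production.WorkOrder.
-- SAMPLED mode is used because LIMITED mode does not report page fullness.
SELECT i.name AS IndexName,
       ps.page_count,
       ps.avg_fragmentation_in_percent,    -- out-of-order pages
       ps.avg_page_space_used_in_percent   -- how full each page is
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'Production.WorkOrder'), NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

High fragmentation combined with low page fullness is the pattern that repeated page splits would be expected to leave behind.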


Index selectivity

Another aspect of index tuning is the selectivity of the index. An index that is very selective has more distinct index values and selects fewer data rows per index value. A primary key or unique index has the highest possible selectivity; each index key only relates to one row.

An index with only a few distinct values spread across a large table is less selective. Indexes that are less selective may not even be useful as indexes. A column with three values spread throughout the table is a poor candidate for an index. A bit column has very low selectivity and is rarely worth indexing on its own.

SQL Server uses its internal index statistics to track the selectivity of an index. DBCC SHOW_STATISTICS reports the last date on which the statistics were updated, and basic information about the index statistics, including the usefulness of the index. A low density indicates that the index is very selective. A high density indicates that a given index node points to several table rows and that the index may be less useful, as shown in this code sample:

Use CHA2;

DBCC Show_Statistics (Customer, IxCustomerName);

Result (formatted and abridged; the full listing includes details for every value in the index):

Statistics for INDEX 'IxCustomerName'

Updated  Rows  Sampled  Steps  Density  key length
-------  ----  -------  -----  -------  ----------

All density    Average Length  Columns
-------------  --------------  -------------------
3.0303031E-2   6.6904764       LastName
2.3809524E-2   11.547619       LastName, FirstName

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
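The "All density" figure is 1 divided by the number of distinct key values, so 3.0303031E-2 corresponds to 33 distinct LastName values and 2.3809524E-2 to 42 distinct LastName, FirstName combinations. As a rough cross-check, the same number can be computed by hand. The sketch below uses Production.WorkOrder and its ProductID column from later in this chapter, chosen only as a convenient example:

```sql
-- Manual density calculation for a single candidate key column.
SELECT COUNT(*)                  AS TotalRows,
       COUNT(DISTINCT ProductID) AS DistinctValues,
       -- density = 1 / distinct values; lower means more selective
       1.0 / NULLIF(COUNT(DISTINCT ProductID), 0)            AS Density,
       -- average number of rows a single key value points to
       1.0 * COUNT(*) / NULLIF(COUNT(DISTINCT ProductID), 0) AS AvgRowsPerValue
FROM Production.WorkOrder;
```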

Sometimes changing the order of the key columns can improve the selectivity of an index and its performance. Be careful, however, because other queries may depend on the order for their performance.

Unordered heaps

It’s also possible to create a table without a clustered index, in which case the data is stored in an unordered heap. Instead of being identified by the clustered index key columns, the rows are identified internally using the heap’s RowID. The RowID is an actual physical location composed of three values, FileID:PageNum:SlotNum, and cannot be directly queried. Any non-clustered indexes store the heap’s RowID in all levels of the index to point to the heap instead of using the clustered index key columns to point to the clustered index.

Because a heap does not include a clustered index, a heap’s primary key must be a non-clustered index.
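A minimal sketch of a heap with a non-clustered primary key follows. The dbo.EventLog table is hypothetical:

```sql
-- Hypothetical table: with no clustered index defined, the rows are stored
-- as a heap and the primary key is enforced by a non-clustered index.
CREATE TABLE dbo.EventLog (
    EventLogID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_EventLog PRIMARY KEY NONCLUSTERED,
    LoggedAt   DATETIME NOT NULL,
    Message    NVARCHAR(400) NOT NULL
);

-- The heap itself is listed with index_id = 0 and type_desc = 'HEAP';
-- the primary key appears as a separate NONCLUSTERED index.
SELECT name, index_id, type_desc, is_primary_key
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.EventLog');
```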


Why Use Heaps?

I believe heaps add no value and nearly always require a bookmark lookup (explained in Query Path 5), so I avoid creating heaps.

Developers who like heaps tend to be the same developers who prefer natural primary keys (as opposed to surrogate primary keys). Natural primary keys are nearly always unordered. When natural primary keys are used for clustered indexes they generate a lot of page splits, which kills performance. Heaps simply add new rows at the end of the heap and they avoid the natural primary key page split problem.

Some developers claim that heaps are faster than clustered indexes for inserts. This is true only when the clustered index is designed in a way that generates page splits. Comparing insert performance between heaps and clustered surrogate primary keys, there is little measurable difference, or the clustered index is slightly faster.

Heaps are organized by RIDs, or row IDs (which include file, page, and row). Any seek operation (detailed soon) into a heap must use a non-clustered index and a bookmark lookup (detailed in Query Path 5 later in this chapter).

Query operations

Although there are dozens of logical and physical query execution operations, SQL Server uses three primary operations to actually fetch the data:

■ Table scan: Reads the entire heap and, most likely, passes all the data to a secondary filter operation.

■ Index scan: Reads the entire leaf level (every row) of the clustered index or non-clustered index. The index scan operation might filter the rows and return only those rows that meet the criteria, or it might pass all the rows to another filter operation depending on the complexity of the criteria. The data may or may not be ordered.

■ Index seek: Locates specific row(s) data using the b-tree and returns only the selected rows in an ordered list, as illustrated in Figure 64-3.

The Query Optimizer chooses the fetch operation with the least cost. Sequentially reading the data is a very efficient task, so an index scan and filter operation may actually be cheaper than an index seek with a bookmark lookup (see Query Path 5 below) involving hundreds of random I/O index seeks. It’s all about correctly guessing the number of rows touched and returned by each operation in the query execution plan.
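To see which fetch operation the optimizer picks without actually running a query, one option is SET SHOWPLAN_TEXT. The sketch below assumes the AdventureWorks2008 Production.WorkOrder table described later in this chapter and a client such as Management Studio or sqlcmd that understands the GO batch separator. The literal values are arbitrary:

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Predicate on the clustered key: a clustered index seek would be expected.
SELECT WorkOrderID, OrderQty
FROM Production.WorkOrder
WHERE WorkOrderID = 1234;

-- Predicate on a column with no index (OrderQty): a clustered index scan
-- with a filter predicate would be expected.
SELECT WorkOrderID, OrderQty
FROM Production.WorkOrder
WHERE OrderQty = 10;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

While the option is on, the queries are compiled but not executed, and the plan text is returned in place of the rows.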

Path of the Query

Indexes exist to serve queries; an index by itself serves no purpose. The best way to understand how to design efficient indexes is to observe and learn from the various possible paths queries take through the indexes to locate data.


FIGURE 64-3

An index-seek operation navigates the b-tree index, selects a beginning row, and then scans all the required rows.

[Figure labels: Clustered Index Seek; Seek; Scan]

There are ten kata (a Japanese word for martial arts choreographed patterns or movements), or query paths, with different combinations of indexes combined with index seeks and scans. These kata begin with a simple index scan and progress toward more complex query paths.

Not every query path is an efficient query path. There are nine good paths, and three paths that should be avoided.

A good test table for observing the twelve query paths in the AdventureWorks2008 database is the Production.WorkOrder table. It has 72,591 rows, only 10 columns, and a single-column clustered primary key. Here’s the table definition:

CREATE TABLE [Production].[WorkOrder](
    [WorkOrderID] [int] IDENTITY(1,1) NOT NULL,
    [ProductID] [int] NOT NULL,
    [OrderQty] [int] NOT NULL,
    [StockedQty] AS (isnull([OrderQty]-[ScrappedQty],(0))),
    [ScrappedQty] [smallint] NOT NULL,
    [StartDate] [datetime] NOT NULL,
    [EndDate] [datetime] NULL,
    [DueDate] [datetime] NOT NULL,
    [ScrapReasonID] [smallint] NULL,
    [ModifiedDate] [datetime] NOT NULL,
    CONSTRAINT [PK_WorkOrder_WorkOrderID] PRIMARY KEY CLUSTERED ([WorkOrderID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];


As installed, the WorkOrder table has three indexes, each with one column as identified in the index name:

■ PK_WorkOrder_WorkOrderID (clustered)

■ IX_WorkOrder_ProductID (non-unique, non-clustered)

■ IX_WorkOrder_ScrapReasonID (non-unique, non-clustered)

Performance data for each kata, listed in Table 64-1, was captured by watching the T-SQL ➪ SQL:StmtCompleted and Performance ➪ Showplan XML Statistics Profile events in Profiler, and examining the query execution plan.

The key performance indicators are the query execution plan optimizer costs (Cost), and the number of logical reads (Reads).

For the duration column, I ran each query multiple times and averaged the results. Of course, your SQL Server machine is probably beefier than my notebook. I urge you to run the script on your own SQL Server instance, take your own performance measurements, and study the query execution plans.

The Rows per ms column is calculated from the number of rows returned and the average duration.

Before executing each query path, the following code clears the buffers:

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;
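For readers who prefer not to set up a Profiler trace, a rough alternative (not the method used to build Table 64-1) is to wrap each query in SET STATISTICS IO and SET STATISTICS TIME. The sketch below also issues a CHECKPOINT first, because DBCC DROPCLEANBUFFERS only removes clean pages from the buffer pool. It should be run on a test server only, since clearing the caches slows down every other session:

```sql
-- Write dirty pages to disk so that DROPCLEANBUFFERS can empty the buffer pool.
CHECKPOINT;
DBCC FREEPROCCACHE;       -- discard cached execution plans
DBCC DROPCLEANBUFFERS;    -- discard cached data pages

SET STATISTICS IO ON;     -- reports logical and physical reads per table
SET STATISTICS TIME ON;   -- reports CPU and elapsed time in ms

-- Query under test (Query Path 2 is used here as an example).
SELECT *
FROM Production.WorkOrder
WHERE WorkOrderID = 1234;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The read counts and timings appear on the Messages tab in Management Studio rather than in the result grid.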

Query Path 1: Fetch All

The first query path sets a baseline for performance by simply requesting all the data from the base

table:

SELECT *

FROM Production.WorkOrder;

Without a WHERE clause and every column selected, the query must read every row from the clustered index. A clustered index scan (illustrated in Figure 64-4) sequentially reads every row.

This query is the longest query of all the query paths, so it might seem to be a slow query, but when comparing the number of rows returned per millisecond, the index scan returns the highest number of rows per millisecond of any query path.

Query Path 2: Clustered Index Seek

The second query path adds a WHERE clause to the first query and filters the result to a single row using a clustered key value:

SELECT *

FROM Production.WorkOrder

WHERE WorkOrderID = 1234;


TABLE 64-1

Query Path Performance

Path | Kata | Plan | Rows | Cost | Reads | Index | Duration (ms) | Rows per ms
1 | Fetch All | C Ix Scan | 72,591 | .485 | 526 | | 1,196 | 60.71
2 | Clustered Index Seek | C Ix Seek | 1 | .003 | 2 | | 7 | .14
3 | Range Seek Query (narrow) | C Ix Seek (Seek keys start-end) | | | | | |
  | Range Seek Query (wide) | C Ix Seek (Seek keys start-end) | 72,591 | .485 | 526 | | 1,257 | 57.73
4 | Filter by non-Key Column | C Ix Scan -> filter (predicate) | 55 | .519 | 526 | NC (include all columns) | 170 | .32
5 | Bookmark Lookup (Select *) | NC Ix Seek -> BML | 9 | .037 | 29 | | 226 | .04
  | Bookmark Lookup (Select clustered key, non-key col) | NC Ix Seek -> BML | 9 | .037 | 29 | | 128 | .07
6 | Covering Index (narrow) | NC Ix Seek (Seek Predicate) | | | | | |
  | Covering Index (wide) | | 1,105 | .005 | 6 | | 106 | 10.46
  | NC Seek Selecting Clustered Key (narrow) | | | | | | |
  | NC Seek Selecting Clustered Key (wide) | | 1,105 | .004 | 4 | | 46 | 24.02
  | Filter by Include Column | NC Ix Seek (Seek Predicate + Predicate) | | | | | |
7 | Filter by 2 x NC Indexes | 2 x NC Ix Seek (Predicate) -> Merge Join | | | | | |
8 | Filter by Ordered NC Composite Index | NC Ix Seek (Seek Predicate w/ 2 prefixes) | | | | | |
9 | Filter by Unordered NC Composite Index | NC Ix Scan | 118 | .209 | 173 | NC by missing key, include C Key | 72 | 1.64
10 | Filter by Expression | NC Ix Scan | 9 | .209 | 173 | | 111 | .08


FIGURE 64-4

The clustered index scan sequentially reads all the rows from the clustered index.

[Figure label: Clustered Index PK_WorkOrder_WorkOrderID]

The Query Optimizer has two clues that there’s only one row that meets the WHERE clause criteria: statistics and the fact that WorkOrderID is the primary key constraint so it must be unique. WorkOrderID is also the clustered index key, so the Query Optimizer knows there’s a great index available to locate a single row. The clustered index seek operation navigates the clustered index b-tree and quickly locates the desired row, as illustrated in Figure 64-5.

FIGURE 64-5

A clustered index seek navigates the b-tree index and locates the row in a snap.

[Figure label: Clustered Index PK_WorkOrder_WorkOrderID]
