Abbreviated result when executed in the AdventureWorks database:

1509580416   Person   Person   2   IX_Person_LastName   NONCLUSTERED   1   NONE
Estimating data compression
Because every object can yield a different compression ratio, it's useful to have some idea of how much compression is possible before actually performing the compression. Toward this end, SQL Server 2008 includes the ability to pre-estimate the potential data reduction of data compression using the sp_estimate_data_compression_savings system stored procedure.
Specifically, this system stored procedure will copy 5% of the data to be compressed into tempdb and compress it. The 5% is not a random sample but every twentieth page, so it should give consistent results:
EXEC sp_estimate_data_compression_savings
    @schema_name = 'Production',
    @object_name = 'BillOfMaterials',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'page';
The result displays the following columns for each object (base table and index):

■ object_name
■ schema_name
■ index_id
■ partition_number
■ size_with_current_compression_setting(KB)
■ size_with_requested_compression_setting(KB)
■ sample_size_with_current_compression_setting(KB)
■ sample_size_with_requested_compression_setting(KB)
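To compare the current and requested sizes programmatically rather than eyeball them, the procedure's result set can be captured with INSERT...EXEC. The following is a minimal sketch, assuming a temp table whose column count and types mirror the output above (the temp table and its column names are illustrative):

CREATE TABLE #CompressionEstimate (
    object_name SYSNAME,
    schema_name SYSNAME,
    index_id INT,
    partition_number INT,
    current_size_KB BIGINT,
    requested_size_KB BIGINT,
    sample_current_KB BIGINT,
    sample_requested_KB BIGINT
);

INSERT #CompressionEstimate
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'Production',
        @object_name = 'BillOfMaterials',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = 'page';

-- Projected savings as a percentage of the current size
SELECT object_name, index_id,
    100.0 - (requested_size_KB * 100.0
             / NULLIF(current_size_KB, 0)) AS SavingsPct
FROM #CompressionEstimate;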
The Data Compression Wizard, shown in Figure 67-4, uses this same system stored procedure to estimate the compression. Select the type of compression to estimate and press the Calculate button.
Enabling data compression
Data compression alters the structure of the data on the disk, so it makes sense that data compression is enabled using a CREATE or ALTER statement.
Using the UI, the only way to adjust an object's data compression is by using the same Data Compression Wizard used previously to estimate the compression gain.
FIGURE 67-4
The Data Compression Wizard will estimate the compression ratio and apply the selected type of data compression.
With T-SQL, compression may be initially set when the object is created by adding the data compression setting to the CREATE statement with the following option:
WITH (DATA_COMPRESSION = [none, row, or page])
Use the following to create a new table with row compression:
CREATE TABLE CTest (col1 INT, Col2 CHAR(100))
WITH (Data_Compression = Row);
To change the compression setting for an existing object, use the ALTER statement:
ALTER object REBUILD
    WITH (DATA_COMPRESSION = [none, row, or page])
For instance, the following code changes the BillOfMaterials table to page compression:
ALTER TABLE Production.BillOfMaterials
    REBUILD WITH (DATA_COMPRESSION = PAGE);
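Note that each index carries its own compression setting; rebuilding the table does not change its nonclustered indexes. A hedged example using ALTER INDEX (the index name here is hypothetical):

ALTER INDEX IX_BillOfMaterials_Example -- hypothetical index name
    ON Production.BillOfMaterials
    REBUILD WITH (DATA_COMPRESSION = PAGE);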
Whole Database Compression
I'm a big fan of data compression, so I've expended some effort in trying to make compression more accessible to the busy DBA by creating two stored procedures that automate estimating and applying data compression for the whole database.
The first stored procedure, db_compression_estimate, estimates the row and page compression gain for every object and index in the database. For AdventureWorks2008 on my VPC it runs in about 2:35, producing row and page compression estimates for every object and index.
The db_compression (@minCompression) stored procedure automatically compresses using a few intelligent choices: It checks the size of the object and the current compression setting, and compares it to potential row and page compression gains. If the object is eight pages or less, no compression is applied. For larger objects, the stored procedure calls sp_estimate_data_compression_savings to estimate the savings with row and page compression. If the estimated gain is equal to or greater than the @minCompression parameter (default 25%), it enables row or page compression, whichever offers the greater gain. If row and page have the same gain, then it enables row compression.
If the estimated gain is less than the @minCompression parameter, then it alters the object to set compression to none.
If the stored procedure is rerun and the gains have changed, it will change the object to the compression method (or no compression) that is now the recommended option.
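The core of that decision reduces to a simple comparison. The following sketch illustrates the logic only; it is not the actual procedure code, and the variable names and sample values are hypothetical:

DECLARE @minCompression FLOAT = 0.25,  -- default minimum gain: 25%
        @rowGain        FLOAT = 0.30,  -- estimated row-compression gain
        @pageGain       FLOAT = 0.42;  -- estimated page-compression gain

IF @rowGain < @minCompression AND @pageGain < @minCompression
    PRINT 'REBUILD WITH (DATA_COMPRESSION = NONE)';
ELSE IF @rowGain >= @pageGain  -- a tie goes to the cheaper row compression
    PRINT 'REBUILD WITH (DATA_COMPRESSION = ROW)';
ELSE
    PRINT 'REBUILD WITH (DATA_COMPRESSION = PAGE)';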
The db_compression_estimate and db_compression stored procedures may be downloaded from www.sqlserverbible.com or codeplex.com. This is the first version of these stored procedures; check back or watch my blog for any updates.
Data compression strategies
Data compression is new to SQL Server and at this early stage, applying compression is more an art than a science. With this in mind, here are my recommendations on how to best use data compression:
1. Establish a performance baseline.
2. Run the db_compression stored procedure.
3. If specific procedures or queries run noticeably slower, decide, on a case-by-case basis, if the space savings and I/O reduction are worth the performance hit, and adjust compression as needed.
4. Carefully monitor the use of data compression on high-transaction tables, in case the CPU overhead exceeds the I/O performance gains.
In practice I've seen row compression alone offer disk space gains of up to 50%, but sometimes it actually increases the size of the data. Seldom does row compression alone beat page compression, but they often provide the same result. When row compression and page compression offer the same compression ratio, it's better to apply only row compression and save the CPU from having to perform the additional page compression.
For small lookup tables that are frequently accessed by queries, use row compression but avoid page compression; the CPU overhead versus the compression benefit isn't worth it in this case.
If the object is partitioned using partition tables (covered in the next chapter), carefully consider data compression on a per-partition basis, especially for sliding window-style partitioning.
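For example, a single partition can be rebuilt with its own compression setting, which suits sliding window scenarios where only the older, colder partitions are page compressed. A sketch assuming a hypothetical partitioned table:

ALTER TABLE dbo.SalesHistory   -- hypothetical partitioned table
    REBUILD PARTITION = 3      -- rebuild only the third partition
    WITH (DATA_COMPRESSION = PAGE);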
Summary
Data compression is the sleeper feature of SQL Server 2008. With both row compression and page compression, including both prefix and dictionary compression, SQL Server offers the granularity to tune data compression. Using data compression carefully, you'll be able to push the envelope for an I/O-bound, high-transaction database.
The next chapter continues the thread of technologies used for highly scalable database design with a look at several types of partitioning.
IN THIS CHAPTER
■ Scaling out with multiple tables and multiple servers
■ Distributed partition views
■ Table partitioning
■ Custom partitioning design
Divide and conquer. Dividing a terabyte table can be as effective as dividing an enemy tank division or dividing the opposing political party.
Dividing data brings several benefits:
■ It's significantly easier to maintain, back up, and defragment a divided data set.
■ The divided data sets mean smaller indexes, fewer intermediate pages, and faster performance.
■ The divided data sets can reside on separate physical servers, thus scaling out, lowering costs, and improving performance.
However, dividing, or partitioning, data has its own set of problems to conquer. E. F. Codd recognized the potential issues with physical partitioning of data in October 1985 in his famous "Is Your DBMS Really Relational?" article, which outlined 12 rules, or criteria, for a relational database. Rule 11 specifically deals with partitioned data:
Rule 11: Distribution independence
The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.
In layperson's terms, rule 11 says that if the complete set of data is spread over multiple tables or multiple servers, then the software must be able to search for any piece of that data regardless of its physical location.
There are several ways to try to solve this problem. SQL Server offers a couple of technologies that handle partitioning: partitioned views and partitioned tables. And later in this chapter, I offer a design pattern that I've had some success with.
Partitioning Strategies
The partitions are most effective when the partition key is a column often used to select a range of data, so that a query has a good chance of addressing only one of the segments. For example:
■ A company manages sales from five distinct sales offices; splitting the order table by sales region will likely enable each sales region's queries to access only that region's partition.
■ A manufacturing company partitions a large activity-tracking table into several smaller tables, one for each department, knowing that each of the production applications tends to query a single department's data.
■ A financial company has several terabytes of historical data and must be able to easily query across current and old data. However, the majority of current activity deals with only the current data. Segmenting the data by era enables the current-activity queries to access a much smaller table.
Best Practice
Very large, frequently accessed tables, with data that can logically be divided horizontally for the most common queries, are the best candidates for partitioning. If the table doesn't meet these criteria, don't partition the table.
In the access of data, the greatest bottleneck is reading the data from the drive. The primary benefit of partitioning tables is that a smaller partitioned table will have a greater percentage of the table cached in memory.
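To check how much of each object is currently cached, the buffer pool can be tallied per object. This is a rough sketch that counts in-row and row-overflow pages for the current database:

SELECT OBJECT_NAME(p.object_id) AS ObjectName,
       COUNT(*) AS CachedPages   -- 8KB pages in the buffer pool
    FROM sys.dm_os_buffer_descriptors AS bd
        JOIN sys.allocation_units AS au
            ON bd.allocation_unit_id = au.allocation_unit_id
        JOIN sys.partitions AS p
            ON au.container_id = p.hobt_id
    WHERE bd.database_id = DB_ID()
        AND au.type IN (1, 3)  -- IN_ROW_DATA and ROW_OVERFLOW_DATA
    GROUP BY p.object_id
    ORDER BY CachedPages DESC;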
Partitioning can be considered from two perspectives:
■ Horizontal partitioning means splitting the table by rows. For example, if you have a large 5,000-row spreadsheet and split it so that rows 1 through 2,500 remain in the original spreadsheet and rows 2,501 through 5,000 move to a new additional spreadsheet, that move would illustrate horizontal partitioning.
■ Vertical partitioning splits the table by columns, segmenting some columns into a different table. Sometimes this makes sense from a logical modeling point of view, if the vertical partitioning segments columns that belong only to certain subtypes. But strictly speaking, vertical partitioning is less common and not considered a best practice.
All the partitioning methods discussed in this chapter involve horizontal partitioning.
A Brief History of SQL Server Partitioning
Microsoft introduced partitioned views and distributed partitioned views with SQL Server 2000 and improved their performance with SQL Server 2005, but the big news regarding partitioning in SQL Server 2005 was the new partitioned tables.
SQL Server 2008 doesn't change the feature set or syntax for partitioned views or partitioned tables, but the new version significantly improves how the Query Processor uses parallelism with partitioned tables.
Considerable research is still ongoing regarding SQL Server scale-out and partitioning. Microsoft has already publicly demonstrated Synchronicity, an incredible scale-out middle-layer technology for SQL Server.
Partitioned Views
Of the possible ways to partition data using SQL Server, the most straightforward solution is partitioned views.
To partition a view is to split the table into two or more smaller separate tables based on a partition key and then make the data accessible, meeting Codd's eleventh rule, using a view. The individual tables can all exist on the same server, making them local partitioned views.
With the data split into several partition tables, of course, each individual table may be directly queried. A more sophisticated and flexible approach is to access the whole set of data by querying a view that unites all the partition tables; this type of view is called a partitioned view.
The SQL Server query processor is designed specifically to handle such a partitioned view. If a query accesses the union of all the partition tables, the query processor will retrieve data only from the required partition tables.
A partitioned view not only handles selects; data can be inserted, updated, and deleted through the partitioned view. The query processor will engage only the individual table(s) necessary.
SQL Server supports two types of partition views: local and distributed.
■ A local-partition view unites data from multiple local partition tables on a single server.
■ A distributed-partition view, also known as a federated database, spreads the partition tables across multiple servers and connects them using linked servers and views that include distributed queries.
The individual tables underneath the partitioned view are called partition tables, not to be confused with partitioned tables, a completely different technology covered in the next major section of this chapter.
Local-partition views
Local-partition views access only local tables. For a local-partition view to be configured, the following elements must be in place (see the sketch after this list):
■ The data must be segmented into multiple tables according to a single column, known as the partition key.
■ Each partition table must have a check constraint restricting the partition-key data to a single value. SQL Server uses the check constraint to determine which tables are required by a query.
■ The partition key must be part of the primary key.
■ The partition view must include a union statement that pulls together data from all the partition tables.
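Here is a minimal sketch showing all four elements together; the SalesNorth and SalesSouth tables are illustrative, not from the sample databases:

-- Two partition tables; the check constraint on the partition key
-- tells the query processor which table holds which rows
CREATE TABLE dbo.SalesNorth (
    Region CHAR(5) NOT NULL CHECK (Region = 'North'),
    SaleID INT NOT NULL,
    Amount MONEY NOT NULL,
    PRIMARY KEY (Region, SaleID)  -- partition key is part of the PK
);
CREATE TABLE dbo.SalesSouth (
    Region CHAR(5) NOT NULL CHECK (Region = 'South'),
    SaleID INT NOT NULL,
    Amount MONEY NOT NULL,
    PRIMARY KEY (Region, SaleID)
);
go
-- The partitioned view unites the partition tables
CREATE VIEW dbo.SalesAll
AS
SELECT Region, SaleID, Amount FROM dbo.SalesNorth
UNION ALL
SELECT Region, SaleID, Amount FROM dbo.SalesSouth;
go
-- Only dbo.SalesSouth is read for this query
SELECT SaleID, Amount FROM dbo.SalesAll WHERE Region = 'South';
-- Inserts route to the correct partition table via the check constraints
INSERT dbo.SalesAll (Region, SaleID, Amount) VALUES ('North', 1, 100);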
Segmenting the data
To implement a partitioned-view design for a database and segment the data in a logical fashion, the first step is to move the data into the partitioned tables.
As an example, the Order and OrderDetail tables in the OBXKites sample database can be partitioned by sales location. In the sample database, the data breaks down as follows:
SELECT LocationCode, Count(OrderNumber) AS Count
    FROM Location
        JOIN [Order]
            ON [Order].LocationID = Location.LocationID
    GROUP BY LocationCode;
Result:
To partition the sales data, the Order and OrderDetail tables will be split into a table for each location. The first portion of the script creates the partition tables. They differ from the original tables only in the primary-key definition, which becomes a composite primary key consisting of the original primary key and the LocationCode. In the OrderDetail table the LocationCode column is added so it can serve as the partition key, and the OrderID column foreign-key constraint points to the partition table.
The script then progresses to populating the tables from the non-partitioned tables. To select the correct OrderDetail rows, the table needs to be joined with the OrderCH table; a sketch of this population step appears after the table definitions below.
For brevity's sake, only the Cape Hatteras (CH) location is shown here. The chapter's sample code script includes similar code for the Jockey Ridge and Kill Devil Hills locations. The differences between the partition table and the original tables, and the code that differs among the various partitions, are shown in bold:
-- Order Table
CREATE TABLE dbo.OrderCH (
LocationCode CHAR(5) NOT NULL,
OrderID UNIQUEIDENTIFIER NOT NULL -- Not PK
ROWGUIDCOL DEFAULT (NEWID()),
OrderNumber INT NOT NULL,
ContactID UNIQUEIDENTIFIER NULL
FOREIGN KEY REFERENCES dbo.Contact,
OrderPriorityID UNIQUEIDENTIFIER NULL
FOREIGN KEY REFERENCES dbo.OrderPriority,
EmployeeID UNIQUEIDENTIFIER NULL
FOREIGN KEY REFERENCES dbo.Contact,
LocationID UNIQUEIDENTIFIER NOT NULL
FOREIGN KEY REFERENCES dbo.Location,
OrderDate DATETIME NOT NULL DEFAULT (GETDATE()),
Closed BIT NOT NULL DEFAULT (0) -- set to true when Closed
)
ON [Primary]
go
-- PK
ALTER TABLE dbo.OrderCH
ADD CONSTRAINT
PK_OrderCH PRIMARY KEY NONCLUSTERED
(LocationCode, OrderID)
-- Check Constraint
ALTER TABLE dbo.OrderCH
ADD CONSTRAINT
OrderCH_PartitionCheck CHECK (LocationCode = 'CH')
go
-- Order Detail Table
CREATE TABLE dbo.OrderDetailCH (
LocationCode CHAR(5) NOT NULL,
OrderDetailID UNIQUEIDENTIFIER NOT NULL -- Not PK
ROWGUIDCOL DEFAULT (NEWID()),
OrderID UNIQUEIDENTIFIER NOT NULL, -- Not FK
ProductID UNIQUEIDENTIFIER NULL
FOREIGN KEY REFERENCES dbo.Product,
NonStockProduct NVARCHAR(256),
Quantity NUMERIC(7,2) NOT NULL,
UnitPrice MONEY NOT NULL,
ExtendedPrice AS Quantity * UnitPrice,
ShipRequestDate DATETIME,
ShipDate DATETIME,
ShipComment NVARCHAR(256)
)
ON [Primary]
go
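The population step described earlier might look like the following sketch. It assumes the original OBXKites Order and OrderDetail columns match the partition tables (ExtendedPrice is computed, so it is not inserted):

-- Populate the CH partition tables from the original tables
INSERT dbo.OrderCH (LocationCode, OrderID, OrderNumber, ContactID,
        OrderPriorityID, EmployeeID, LocationID, OrderDate, Closed)
    SELECT 'CH', O.OrderID, O.OrderNumber, O.ContactID,
            O.OrderPriorityID, O.EmployeeID, O.LocationID, O.OrderDate, O.Closed
        FROM dbo.[Order] AS O
            JOIN dbo.Location AS L
                ON O.LocationID = L.LocationID
        WHERE L.LocationCode = 'CH';

-- OrderDetail rows are selected by joining through OrderCH
INSERT dbo.OrderDetailCH (LocationCode, OrderDetailID, OrderID, ProductID,
        NonStockProduct, Quantity, UnitPrice, ShipRequestDate, ShipDate, ShipComment)
    SELECT 'CH', OD.OrderDetailID, OD.OrderID, OD.ProductID,
            OD.NonStockProduct, OD.Quantity, OD.UnitPrice,
            OD.ShipRequestDate, OD.ShipDate, OD.ShipComment
        FROM dbo.OrderDetail AS OD
            JOIN dbo.OrderCH AS OCH
                ON OD.OrderID = OCH.OrderID;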