FIGURE 68-3
The partition function is used by the partition scheme to place the data in separate filegroups
[Diagram: boundaries at 1/1/2002, 1/1/2003, 1/1/2004, and 1/1/2005 are defined by the partition function; partition locations Part01 through Part05 are defined by the partition scheme; the table is created with CREATE TABLE ( ) ON the partition scheme.]
The boundary values only specify the dividing points between ranges; they don't define the upper or lower values for the whole table.
A boundary value can only exist in one partition. The ranges are defined as left or right. If a row has a partition column value that is the same as a boundary value, then SQL Server needs to know in which partition to put the row.
Left ranges mean that data equal to the boundary is included in the partition to the left of the boundary. A boundary of '12/31/2004' would create two partitions. The lower partition would include all data up to and including '12/31/2004', and the right partition would include any data greater than '12/31/2004'.
Right ranges mean that data equal to the boundary goes into the partition on the right of the boundary value. To separate at the new year starting 2008, a right range would set the boundary at '1/1/2008'. Any value less than the boundary goes into the left, or lower, partition. Any data with a date equal to or later than the boundary goes into the next partition. These two functions use left and right ranges to create the same result:
CREATE PARTITION FUNCTION pfYears(DateTime)
AS RANGE LEFT FOR VALUES ('12/31/2001', '12/31/2002', '12/31/2003', '12/31/2004');
or
CREATE PARTITION FUNCTION pfYearsRT(DateTime)
AS RANGE RIGHT FOR VALUES ('1/1/2002', '1/1/2003', '1/1/2004', '1/1/2005');
These functions both create four defined boundaries, and thus five partitions.
SQL Server 2008's table partitions are declarative, meaning the table is segmented by data values. A hash partition segments the data randomly; SQL Server does not have hash partitioning. You can create a hash function on a computed column, but your client application needs to understand this computation to allow for partition elimination. Another option to randomly spread the data across multiple disk subsystems is to define the table using a filegroup and then add multiple files to the filegroup. See Figure 68-4.
FIGURE 68-4
The partition configuration can be viewed in Object Explorer
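For example, a rough sketch of the computed-column approach might look like the following; the pfHash4 function, psHash4 scheme, and modulo-four CHECKSUM bucket are illustrative choices, not a built-in hash feature:
CREATE PARTITION FUNCTION pfHash4 (INT)
AS RANGE LEFT FOR VALUES (0, 1, 2);   -- four buckets: 0, 1, 2, and 3

CREATE PARTITION SCHEME psHash4
AS PARTITION pfHash4 ALL TO ([Primary]);

CREATE TABLE dbo.AccountHashed (
  AccountID INT NOT NULL,
  AccountName VARCHAR(50) NOT NULL,
  -- The computed bucket must be PERSISTED to serve as the partition column
  HashBucket AS (ABS(CHECKSUM(AccountID)) % 4) PERSISTED NOT NULL
) ON psHash4 (HashBucket);
Queries must filter on HashBucket (or repeat the same expression) for the optimizer to eliminate partitions, which is exactly the computation the client application would have to understand.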
Three catalog views expose information about partition functions: sys.partition_functions, sys.partition_range_values, and sys.partition_parameters.
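For example, a query along these lines lists each partition function, its parameter data type, and its boundary values:
SELECT pf.name AS FunctionName,
    t.name AS ParameterType,
    prv.boundary_id,
    prv.value AS BoundaryValue
  FROM sys.partition_functions AS pf
    JOIN sys.partition_parameters AS pp
      ON pp.function_id = pf.function_id
    JOIN sys.types AS t
      ON t.user_type_id = pp.user_type_id
    LEFT JOIN sys.partition_range_values AS prv
      ON prv.function_id = pf.function_id
  ORDER BY pf.name, prv.boundary_id;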
Creating partition schemes
The partition scheme builds on the partition function to specify the physical locations for the partitions. The physical partitions may all be located in the same filegroup or spread over several filegroups.
The first example partition scheme, named psYearsAll, uses the pfYearsRT partition function and places all the partitions in the Primary filegroup:
CREATE PARTITION SCHEME psYearsAll
AS PARTITION pfYearsRT ALL TO ([Primary]);
To place the table partitions in their own filegroups, omit the ALL keyword and list the filegroups individually. This creates five partitions to match the four boundary values specified in the function:
CREATE PARTITION SCHEME psYearsFiles
AS PARTITION pfYearsRT
TO (Part01, Part02, Part03, Part04, Part05);
The partition functions and schemes must be created using T-SQL code, but once they've been created you can view them in Management Studio's Object Explorer under the database Storage node.
To examine information about partition schemes programmatically, query sys.partition_schemes.
Creating the partition table
Once the partition function and partition scheme are in place, actually creating the table is a piece of cake (pun intended). I recommend creating a partition table with a non-clustered primary key. Adding a clustered index to the table will partition the table based on the partition scheme. The WorkOrder table's Table Properties page also displays the partition scheme being used by the table.
Partition functions and partition schemes don't have owners, so when referring to partition schemes or partition functions, you don't need to use the four-part name or the schema owner in the name.
The following table is similar to the AdventureWorks WorkOrder table in the Production schema:
CREATE TABLE dbo.WorkOrder (
  WorkOrderID INT IDENTITY NOT NULL
    CONSTRAINT WorkOrderPK PRIMARY KEY NONCLUSTERED,
  ProductID INT NOT NULL,
  OrderQty INT NOT NULL,
  StockedQty INT NOT NULL,
  ScrappedQty INT NOT NULL,
  StartDate DATETIME NOT NULL,
  EndDate DATETIME NOT NULL,
  DueDate DATETIME NOT NULL,
  ScrapReasonID INT NULL,
  ModifiedDate DATETIME NOT NULL
  );
CREATE CLUSTERED INDEX ix_WorkOrder_DueDate
ON dbo.WorkOrder (DueDate)
ON psYearsAll(DueDate);
The next script inserts 7,259,100 rows into the WorkOrder table in 2 minutes and 42 seconds, as confirmed by the database Summary page:
DECLARE @Counter INT;
SET @Counter = 0;
WHILE @Counter < 100
BEGIN
SET @Counter = @Counter + 1;
INSERT dbo.WorkOrder (ProductID, OrderQty, StockedQty, ScrappedQty,
StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate)
SELECT ProductID, OrderQty, StockedQty, ScrappedQty,
StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate
FROM AdventureWorks.Production.WorkOrder;
END;
It's possible for multiple partition schemes to share a single partition function. Architecturally, this might make sense if several tables should be partitioned using the same boundaries, because this improves the consistency of the partitions. To verify which tables use which partition schemes, based on which partition functions, use the Object Dependencies dialog for the partition function or partition scheme. You can find it using the partition function's context menu.
To see information about how the partitions are being used, look at sys.partitions and sys.partition_counts.
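For example, this simple query against sys.partitions returns the row count for each partition of the WorkOrder table's clustered index (index_id 1):
SELECT p.partition_number, p.rows
  FROM sys.partitions AS p
  WHERE p.object_id = OBJECT_ID('dbo.WorkOrder')
    AND p.index_id = 1
  ORDER BY p.partition_number;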
Querying partition tables
The nice thing about partition tables is that no special code is required to query either across multiple underlying partitions or from only one partition. The Query Optimizer automatically uses the right partitions to retrieve the data.
The $PARTITION operator can return the partition's integer identifier when used with the partition function. The next code snippet counts the number of rows in each partition:
SELECT $PARTITION.pfYearsRT(DueDate) AS Partition,
COUNT(*) AS Count
FROM WorkOrder
GROUP BY $PARTITION.pfYearsRT(DueDate)
ORDER BY Partition;
Result:
Partition   Count
----------- -----------
The next query selects data for one year, so the data should be located in only one partition. Examining the query execution plan (not shown here) reveals that the Query Optimizer used a high-speed clustered index scan on partition ID PtnIds1005:
SELECT WorkOrderID, ProductID, OrderQty, StockedQty, ScrappedQty
  FROM dbo.WorkOrder
  WHERE DueDate BETWEEN '1/1/2002' AND '12/31/2002';
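The $PARTITION function can also be used directly in the WHERE clause to target a single partition by number; a sketch, assuming the 2002 data falls in partition 2 of the RANGE RIGHT pfYearsRT function:
SELECT WorkOrderID, ProductID, OrderQty, StockedQty, ScrappedQty
  FROM dbo.WorkOrder
  WHERE $PARTITION.pfYearsRT(DueDate) = 2;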
Altering partition tables
Partition tables are easily modified, both to keep up with changing data and to enable performance testing of various partition schemes. Even though the commands are simple, modifying the design of a partition table never executes very quickly, as you can imagine.
Merging partitions
Merge and split surgically modify the table partition design. The ALTER PARTITION FUNCTION ... MERGE RANGE command effectively removes one of the boundaries from the partition function and merges two partitions. For example, to remove the boundary between 2003 and 2004 in the pfYearsRT partition function, and combine the data from 2003 and 2004 into a single partition, use the following ALTER command:
ALTER PARTITION FUNCTION pfYearsRT() MERGE RANGE ('1/1/2004');
Sure enough, following the merge operation, the previous count-rows-per-partition query now returns three partitions, and scripting the partition function from Object Explorer creates a script with three boundaries in the partition function code.
If multiple tables share the partition scheme and partition function being modified, then all of those tables are affected by the change.
Splitting partitions
To split an existing single partition, the first step is to designate the next filegroup to be used by the partition scheme. This is done using the ALTER PARTITION SCHEME ... NEXT USED command. If you specified extra filegroups when creating the scheme, you will get a message stating that the next filegroup used will be one of those extra filegroups. Then the partition function can be modified to specify the new boundary, using the ALTER PARTITION FUNCTION ... SPLIT RANGE command to insert a new boundary into the partition function. It's the ALTER PARTITION FUNCTION command that actually performs the work.
This example segments the 2003–2004 work order data into two partitions. The new partition will include only data for July 2004, the last month with data in the AdventureWorks table:
ALTER PARTITION SCHEME psYearsFiles NEXT USED [Primary];
ALTER PARTITION FUNCTION pfYearsRT()
  SPLIT RANGE ('7/1/2004');
Switching tables
Switching tables is the cool capability to move an entire table into a partition within a partitioned table, or to remove a single partition so that it becomes a stand-alone table. This is very useful when importing new data, but note a few restrictions:
■ Every index for the partition table must be a partitioned index.
■ The new table must have the same columns (excluding identity columns), indexes, and constraints (including foreign keys) as the partition table, except that the new table cannot be partitioned.
■ The source partition table cannot be the target of a foreign key.
■ Neither table can be published using replication, or have schema-bound views.
■ The new table must have a check constraint restricting the data range to the new partition, so SQL Server doesn't have to re-verify the data range (and it needs to be validated; there's no point loading and then creating the constraint with NOCHECK).
■ Both the stand-alone table and the partition that will receive the stand-alone table must be on the same filegroup.
■ The receiving partition or table must be empty.
In essence, switching a partition is rearranging the database metadata to reassign the existing table as a partition. No data is actually moved, which makes table switching nearly instantaneous regardless of the table's size.
Prepping the new table
The WorkOrderNEW table will be created to demonstrate switching. It will hold August 2004 data from AdventureWorks:
CREATE TABLE dbo.WorkOrderNEW (
WorkOrderID INT IDENTITY NOT NULL,
ProductID INT NOT NULL,
OrderQty INT NOT NULL,
StockedQty INT NOT NULL,
ScrappedQty INT NOT NULL,
StartDate DATETIME NOT NULL,
EndDate DATETIME NOT NULL,
DueDate DATETIME NOT NULL,
ScrapReasonID INT NULL,
ModifiedDate DATETIME NOT NULL
)
ON Part05;
Indexes identical to those on the partition table are created on the new table:
ALTER TABLE dbo.WorkOrderNEW
  ADD CONSTRAINT WorkOrderNEWPK
  PRIMARY KEY NONCLUSTERED (WorkOrderID, DueDate);
GO
CREATE CLUSTERED INDEX ix_WorkOrderNEW_DueDate
  ON dbo.WorkOrderNEW (DueDate);
The following adds the mandatory constraint:
ALTER TABLE dbo.WorkOrderNEW
  ADD CONSTRAINT WONewPT
  CHECK (DueDate BETWEEN '8/1/2004' AND '8/31/2004');
Now import the new data from AdventureWorks, reusing the January 2004 data shifted forward seven months:
INSERT dbo.WorkOrderNEW (ProductID, OrderQty, StockedQty, ScrappedQty,
    StartDate, EndDate, DueDate, ScrapReasonID, ModifiedDate)
  SELECT ProductID, OrderQty, StockedQty, ScrappedQty,
      DATEADD(mm,7,StartDate), DATEADD(mm,7,EndDate), DATEADD(mm,7,DueDate),
      ScrapReasonID, DATEADD(mm,7,ModifiedDate)
    FROM AdventureWorks.Production.WorkOrder
    WHERE DueDate BETWEEN '1/1/2004' AND '1/31/2004';
The new table now has 3,158 rows.
Prepping the partition table
The original partition table, built earlier in this section, has a non-partitioned, non-clustered primary key. Because one of the rules of switching into a partitioned table is that every index must be partitioned, the first task for this example is to drop and rebuild the WorkOrder table's primary key so it will be partitioned:
ALTER TABLE dbo.WorkOrder
  DROP CONSTRAINT WorkOrderPK;
ALTER TABLE dbo.WorkOrder
  ADD CONSTRAINT WorkOrderPK
  PRIMARY KEY NONCLUSTERED (WorkOrderID, DueDate)
  ON psYearsAll(DueDate);
Next, the partition table needs an empty partition:
ALTER PARTITION SCHEME psYearsFiles NEXT USED [Primary];
ALTER PARTITION FUNCTION pfYearsRT() SPLIT RANGE ('8/1/2004');
Performing the switch
The ALTER TABLE ... SWITCH TO command will move the new table into a specific partition. To determine the empty target partition, select the database Summary page ➪ Disk Usage report:
ALTER TABLE WorkOrderNEW
SWITCH TO WorkOrder PARTITION 5
Switching out
The same technology can be used to switch a partition out of the partition table so that it becomes a stand-alone table. Because no merge is taking place, this is much easier than switching in. The following code takes the first partition out of the WorkOrder partition table and reconfigures the database metadata so it becomes its own table:
ALTER TABLE WorkOrder
  SWITCH PARTITION 1 TO WorkOrderArchive;
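For the switch-out to succeed, WorkOrderArchive must already exist, be empty, sit on the same filegroup as partition 1 (the Primary filegroup under the psYearsAll scheme), and mirror the WorkOrder table's columns and indexes. A possible setup looks like this (the constraint and index names are illustrative):
CREATE TABLE dbo.WorkOrderArchive (
  WorkOrderID INT NOT NULL,
  ProductID INT NOT NULL,
  OrderQty INT NOT NULL,
  StockedQty INT NOT NULL,
  ScrappedQty INT NOT NULL,
  StartDate DATETIME NOT NULL,
  EndDate DATETIME NOT NULL,
  DueDate DATETIME NOT NULL,
  ScrapReasonID INT NULL,
  ModifiedDate DATETIME NOT NULL,
  -- Matches the partitioned primary key on WorkOrder
  CONSTRAINT WorkOrderArchivePK
    PRIMARY KEY NONCLUSTERED (WorkOrderID, DueDate)
) ON [Primary];

-- Matches the clustered index on WorkOrder
CREATE CLUSTERED INDEX ix_WorkOrderArchive_DueDate
  ON dbo.WorkOrderArchive (DueDate)
  ON [Primary];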
Rolling partitions
With a little imagination, the technology to create and merge existing partitions can be used to create rolling partition designs.
Rolling partitions are useful for time-based partition functions, such as partitioning a year of data into months. Each month, the rolling partition expands for a new month. To build a 13-month rolling partition, perform these steps each month:
1. Add a new boundary.
2. Point the boundary to the next used filegroup.
3. Merge the oldest two partitions to keep all the data.
Switching tables into and out of partitions can enhance rolling partition designs by switching in fully populated staging tables and switching out the oldest tables to an archive location.
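A sketch of the monthly maintenance for such a design, assuming a hypothetical pfMonths function and psMonths scheme partitioned by month on a DATETIME column (the names, dates, and filegroup are illustrative):
-- 1. Designate the filegroup the next new partition should use
ALTER PARTITION SCHEME psMonths
  NEXT USED [PartNextMonth];

-- 2. Add a boundary for the upcoming month
ALTER PARTITION FUNCTION pfMonths()
  SPLIT RANGE ('9/1/2004');

-- 3. Merge the two oldest partitions, keeping the data but holding the
--    partition count at 13
ALTER PARTITION FUNCTION pfMonths()
  MERGE RANGE ('8/1/2003');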
Indexing partitioned tables
Large tables mean large indexes, so non-clustered indexes can optionally be partitioned.
Creating partitioned indexes
Partitioned non-clustered indexes must include the column used by the partition function in the index, and must be created using the same ON clause as the partitioned clustered index:
CREATE INDEX WorkOrder_ProductID
ON WorkOrder (ProductID, DueDate)
ON psYearsFiles(DueDate);
Maintaining partitioned indexes
One of the advantages of partitioned indexes is that they can be individually maintained. The following example rebuilds the newly added fifth partition:
ALTER INDEX WorkOrder_ProductID
ON dbo.WorkOrder REBUILD
PARTITION = 5
Removing partitioning
To remove the partitioning of any table, drop the clustered index and add a new clustered index without the partitioning ON clause. When dropping the clustered index, you must add the MOVE TO option to actually consolidate the data onto the specified filegroup, thus removing the partitioning from the table:
DROP INDEX ix_WorkOrder_DueDate
  ON dbo.WorkOrder WITH (MOVE TO [Primary]);
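If the table should keep a clustered index, recreate it without the partition scheme in the ON clause, for example:
CREATE CLUSTERED INDEX ix_WorkOrder_DueDate
  ON dbo.WorkOrder (DueDate)
  ON [Primary];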
Data-Driven Partitioning
The third method doesn't involve any Microsoft partitioning technology. Instead, it's an architectural pattern that I've used in large, heavy-transaction databases. It's rather simple, but very fast.
A data-driven partitioning scheme segments the data onto different servers based on a partition key. Each server has the same database schema, but stores only the data for its assigned partition key values or ranges. For example, server A could hold accounts 1–999, server B could hold accounts 1,000–1,999, and server C could hold all accounts greater than or equal to 2,000.
A partition mapping table stores the server name for each partition key value or range of values. In the previous example, the partition mapping table would hold the from and to account numbers and the server name.
The middle tier reads and caches the partition mapping table, and for every database access it checks the partition mapping table to determine which server holds the needed data.
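A minimal sketch of such a mapping table and lookup (the table, column, and server names are illustrative):
CREATE TABLE dbo.PartitionMap (
  AccountFrom INT NOT NULL,
  AccountTo INT NOT NULL,
  ServerName SYSNAME NOT NULL,
  CONSTRAINT PartitionMapPK PRIMARY KEY (AccountFrom)
);

INSERT dbo.PartitionMap (AccountFrom, AccountTo, ServerName)
  VALUES (1, 999, 'ServerA'),
         (1000, 1999, 'ServerB'),
         (2000, 2147483647, 'ServerC');

-- The middle tier caches this table and routes each request:
DECLARE @AccountID INT = 1500;
SELECT ServerName
  FROM dbo.PartitionMap
  WHERE @AccountID BETWEEN AccountFrom AND AccountTo;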
This method works best when the data is self-contained and the complete query can be solved using only the subset of data on one server. If the servers need to do much cross-server querying to solve the queries, then the benefits are likely lost.
What's nice about data-driven partitioning is that it's very easy to scale out. Adding another server only requires moving some data and updating the partition mapping table.
Not every database will have to scale to higher magnitudes of capacity, but when a project does grow into the terabytes, SQL Server 2008 provides some advanced technologies to tackle the growth. However, even these advanced technologies are no substitute for Smart Database Design.
Key points on partitioning include the following:
■ Partitioned views use a UNION ALL to merge data from several user-created base tables. Each partition table must include the partition key and a constraint.
■ The Query Processor can carefully choose the minimum number of underlying tables when selecting through a partitioned view, but not when updating.
■ Distributed partitioned views add distributed queries to combine data from multiple servers.
■ Partitioned tables are a completely different technology than partitioned views and use a partition function, partition scheme, and clustered index to partition a single table.
■ Data-driven partitioning is an architectural pattern that involves custom coding, but it delivers the best possible scale-out performance and flexibility.
The next chapter wraps up this part covering optimization with a new feature for SQL Server 2008 Enterprise Edition that's getting quite a bit of buzz.