As you have learned from earlier chapters, tables can be stored as a heap, where rows
are stored in no particular order, or as a clustered index, where rows are stored in
the order defined by the index. You can use compression on both types of table.
Nonclustered indexes are stored separately from the table on which they are defined.
Nonclustered indexes can also be created on views, a situation referred to as “indexed
views.” You can use compression on nonclustered indexes for tables and views. Finally,
data compression can be configured for individual partitions that tables and indexes are
stored across. For example, if a table is partitioned into current and historical partitions,
you can choose to enable data compression on the historical partition only.
To enable compression on a new table or index, use the CREATE TABLE or CREATE INDEX statement with the DATA_COMPRESSION = ROW | PAGE option. If you are enabling compression on an existing table or index, use the DATA_COMPRESSION
option with the ALTER TABLE or ALTER INDEX statement. Enabling compression
causes a rebuild of the object and is therefore a time- and resource-intensive
operation. Enabling data compression on a table has no effect on that table’s
nonclustered indexes; each nonclustered index must be compressed separately. The
syntax for enabling data compression is shown in Examples 7.2 through 7.6.
Example 7.2 Enabling Data Compression on a New Table—Syntax
CREATE TABLE [database_name].[schema_name].table_name
(<Column Definition List>)
WITH (DATA_COMPRESSION = ROW | PAGE)
Example 7.3 Enabling Data Compression on an Existing Table—Syntax
ALTER TABLE [database_name].[schema_name].table_name
REBUILD WITH (DATA_COMPRESSION = ROW | PAGE | NONE)
Example 7.4 Enabling Data Compression on a New Nonclustered Index—Syntax
CREATE NONCLUSTERED INDEX index_name
ON table_name (<Column List>)
WITH (DATA_COMPRESSION = ROW | PAGE)
Example 7.5 Enabling Data Compression on an Existing Nonclustered
Index—Syntax
ALTER INDEX index_name
ON table_name
REBUILD WITH (DATA_COMPRESSION = ROW | PAGE | NONE)
Example 7.6 Enabling Data Compression on a Partitioned Table—Syntax
ALTER TABLE partitioned_table_name
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = ROW | PAGE | NONE)
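As a concrete illustration of the syntax in Examples 7.2 through 7.5, the following sketch uses a hypothetical table, dbo.SalesHistory, and a hypothetical index, IX_SalesHistory_CustomerID; neither exists in any sample database. It assumes an edition of SQL Server that supports data compression. The final query reads the resulting compression setting of each index and partition from the sys.partitions catalog view.

```sql
-- Hypothetical heap created with row compression.
CREATE TABLE dbo.SalesHistory
(SalesID int NOT NULL,
 CustomerID int NOT NULL,
 SaleDate datetime NOT NULL,
 Notes char(200) NULL)
WITH (DATA_COMPRESSION = ROW);
GO
-- Switch the existing table to page compression (this rebuilds the table).
ALTER TABLE dbo.SalesHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
-- Nonclustered indexes are compressed separately from the table.
CREATE NONCLUSTERED INDEX IX_SalesHistory_CustomerID
ON dbo.SalesHistory (CustomerID)
WITH (DATA_COMPRESSION = ROW);
GO
-- Inspect the compression setting of each index and partition.
-- The heap appears with index_id 0 and a NULL index name.
SELECT i.name AS index_name, p.index_id, p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.SalesHistory');
GO
-- Clean up the hypothetical table.
DROP TABLE dbo.SalesHistory;
GO
```

Because the table and the index carry independent settings, the query should report one row per index, each with its own data_compression_desc value.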
Row versus Page Compression
Row compression attempts to reduce disk space by storing all fixed-length data types as variable length, including numeric data types. This can reduce the size of each individual row, allowing you to fit more rows on a page. Row compression uses compression metadata to describe the offset of each value within the row. However, a space saving is not always achieved. For example, when values stored in columns of a fixed-length data type consume the entire length of the column, no space saving occurs. In fact, in this scenario more space is used, because the compression metadata overhead must still be written to the page.
Row compression has no effect on the smallest possible data types, such as tinyint, smalldatetime, date, and uniqueidentifier. It also has no effect on data types that are already stored as variable length, such as varchar, nvarchar, and varbinary. Finally, special data types like text, image, xml, table, sql_variant, and cursor are not affected by row-level compression. The bit data type is always negatively affected because, together with the metadata overhead, each bit column requires four bits of storage, whereas without compression up to eight bit columns share a single byte.
Page compression applies the following compression techniques to each data page:
■ Row compression
■ Prefix compression
■ Dictionary compression
These techniques are applied in order when the data page becomes full. This is why page compression has a profound negative effect on write performance. Page compression goes further than row compression when attempting to save space. When you enable page compression, row compression is automatically enabled. Prefix compression identifies repeating values in each column and stores the repeating value once in the compression information (CI) structure that follows the page header. The repeating value throughout the column is then replaced by a reference to the value in the CI structure. The reference can also indicate a partial match. Dictionary compression is applied after prefix compression. This type of compression identifies repeating values anywhere on the page and stores these values once in the CI structure. Repeating values throughout the page are replaced by a reference. Dictionary compression is not limited to a single column; it is applied to the entire page.
Estimating Space Savings
Using sp_estimate_data_compression_savings
It can be difficult to decide whether or not to implement data compression.
To take the guesswork out of this decision, SQL Server 2008 provides a handy
stored procedure: sp_estimate_data_compression_savings. This stored procedure
takes the schema name, the name of the table or indexed view, an optional index
number (specify NULL for all indexes or 0 for the heap), an optional partition
number, and the type of data compression to calculate the estimate for. This stored
procedure is also useful if you have a compressed table and want to know how much
space the table would consume uncompressed. The following columns are included in the
results of the sp_estimate_data_compression_savings stored procedure:
■ Object_name: The name of the table or the indexed view for which you are calculating the savings.
■ Schema_name: The schema that this table or view belongs to.
■ Index_id: The index number; 0 stands for the heap, 1 for the clustered index, and other numbers for nonclustered indexes.
■ Partition_number: The number of the partition; 1 stands for a nonpartitioned table or index.
■ Size_with_current_compression_setting (KB): The current size of the object.
■ Size_with_requested_compression_setting (KB): The estimated size of the object without fragmentation or padding.
■ Sample_size_with_current_compression_setting (KB): The size of the sample using the existing compression setting.
■ Sample_size_with_requested_compression_setting (KB): The size of the sample using the requested compression setting.
In Example 7.7, we use sp_estimate_data_compression_savings with the
Purchasing.PurchaseOrderDetail table.
Example 7.7 Estimating Compression Savings
USE AdventureWorks;
GO
-- NULL for the index and partition arguments means all indexes and all partitions
EXECUTE sp_estimate_data_compression_savings
    'Purchasing', 'PurchaseOrderDetail', NULL, NULL, 'PAGE';
GO
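To compare the estimate across several indexes, the result set can be captured in a temporary table and turned into a percentage. The sketch below assumes the procedure returns the eight columns listed above, in that order. The temporary table #CompressionEstimate and its column names are illustrative choices, not part of SQL Server.

```sql
USE AdventureWorks;
GO
-- Illustrative temp table; INSERT ... EXECUTE matches columns by position.
CREATE TABLE #CompressionEstimate
(ObjectName sysname,
 SchemaName sysname,
 IndexID int,
 PartitionNumber int,
 CurrentSizeKB bigint,
 RequestedSizeKB bigint,
 SampleCurrentSizeKB bigint,
 SampleRequestedSizeKB bigint);
GO
-- Capture the PAGE compression estimate for every index and partition.
INSERT #CompressionEstimate
EXECUTE sp_estimate_data_compression_savings
    'Purchasing', 'PurchaseOrderDetail', NULL, NULL, 'PAGE';
GO
-- Percentage saved per index; NULLIF avoids division by zero for empty objects.
SELECT IndexID, PartitionNumber, CurrentSizeKB, RequestedSizeKB,
       100.0 * (CurrentSizeKB - RequestedSizeKB)
             / NULLIF(CurrentSizeKB, 0) AS PercentSaved
FROM #CompressionEstimate;
GO
DROP TABLE #CompressionEstimate;
GO
```

A negative PercentSaved value would indicate that the requested setting is estimated to use more space than the current one, which can happen when the metadata overhead outweighs the savings.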
Using Sparse Columns
Sparse columns reduce the amount of space taken up by null values. However, sparse columns increase the time it takes to retrieve values that are not null. Most columns that allow nulls can be marked as sparse. The best practice is to use sparse columns when the technique saves at least 20 to 40 percent of space and you are not concerned about reduced read performance for non-null values. Columns are marked as sparse within the CREATE TABLE or ALTER TABLE statements, as shown in Examples 7.8 through 7.10.
Example 7.8 Creating a Sparse Column in a New Table—Syntax
CREATE TABLE [database_name].[schema_name].table_name
(Column1 int PRIMARY KEY,
Column2 varchar(50) SPARSE NULL)
Example 7.9 Marking an Existing Column as Sparse—Syntax
ALTER TABLE [database_name].[schema_name].table_name
ALTER COLUMN Column2 ADD SPARSE
Example 7.10 Marking an Existing Column as Non-Sparse—Syntax
ALTER TABLE [database_name].[schema_name].table_name
ALTER COLUMN Column2 DROP SPARSE
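The syntax in Examples 7.8 and 7.9 can be tried on a small throwaway table. The following sketch uses a hypothetical table, dbo.ProductNotes, whose name and columns are invented for illustration, and assumes that most rows would leave both note columns null. The is_sparse column of the sys.columns catalog view shows which columns are currently marked as sparse.

```sql
-- Hypothetical table; one column is declared sparse at creation time.
CREATE TABLE dbo.ProductNotes
(ProductID int PRIMARY KEY,
 DiscontinuedNote varchar(200) SPARSE NULL,
 ReviewComment varchar(200) NULL);
GO
-- Mark a second nullable column as sparse after the table exists.
ALTER TABLE dbo.ProductNotes
ALTER COLUMN ReviewComment ADD SPARSE;
GO
-- is_sparse is 1 for sparse columns and 0 otherwise.
SELECT name, is_sparse
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.ProductNotes');
GO
-- Clean up the hypothetical table.
DROP TABLE dbo.ProductNotes;
GO
```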
New & Noteworthy…
Using Column Sets
Sparse columns are often used with a new feature of SQL Server called column sets. A column set is like a computed column that, when queried, returns an XML fragment representing all values stored in all sparse columns within a single table. A column set, similar to a computed column, consumes no storage space except for table metadata. Unlike a computed column, you can update a column set by updating the XML returned by the column set. This makes column sets especially useful for storing a large number of properties that are often null.
Consider using column sets when it is difficult to work with a large number of
columns individually, and many of the values in these columns are null. Column sets can offer improved performance except in situations where many indexes are
defined on the table. Example 7.11 demonstrates the use of column sets:
Example 7.11 Using Column Sets
CREATE TABLE Planets
(PlanetID int IDENTITY PRIMARY KEY,
PlanetName nvarchar(50) SPARSE NULL,
PlanetType nvarchar(50) SPARSE NULL,
Radius int SPARSE NULL,
PlanetDescription XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
GO
INSERT Planets (PlanetName, PlanetType, Radius) VALUES
('Earth', NULL, NULL),
('Jupiter', 'Gas Giant', 71492),
('Venus', NULL, 6051);
GO
SELECT PlanetDescription FROM Planets
Results:
PlanetDescription
-----------------
<PlanetName>Earth</PlanetName>
<PlanetName>Jupiter</PlanetName><PlanetType>Gas Giant</PlanetType><Radius>71492</Radius>
<PlanetName>Venus</PlanetName><Radius>6051</Radius>
UPDATE Planets
SET PlanetDescription = '<PlanetName>Earth</PlanetName><PlanetType>Terrestrial Planet</PlanetType><Radius>6371</Radius>'
WHERE PlanetName = 'Earth';
GO
SELECT * FROM Planets
Results:
PlanetID    PlanetDescription
----------- -----------------
1           <PlanetName>Earth</PlanetName><PlanetType>Terrestrial Planet</PlanetType><Radius>6371</Radius>
2           <PlanetName>Jupiter</PlanetName><PlanetType>Gas Giant</PlanetType><Radius>71492</Radius>
3           <PlanetName>Venus</PlanetName><Radius>6051</Radius>
DROP TABLE Planets;
GO