declare @smallBox GEOMETRY = 'polygon((0 0, 0 2, 2 2, 2 0, 0 0))';
declare @largeBox GEOMETRY = 'polygon((1 1, 1 4, 4 4, 4 1, 1 1))';
declare @line GEOMETRY = 'linestring(0 2, 4 4)';
select @smallBox
union all
select @largeBox
union all
select @smallBox.STIntersection(@largeBox)
union all
select @line
Spatial Data Types: Where to Go from Here?
The preceding sections provide only a brief introduction to spatial data types and how to
work with geometry and geography data. For more information on working with spatial
data, in addition to Books Online, you might want to visit the Microsoft SQL Server 2008
Spatial Data page at http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx.
This page provides links to whitepapers and other technical documents related to working
with spatial data in SQL Server 2008.
In addition, all examples here deal with spatial data only as data values and coordinates.
Spatial data is often most useful when it can be displayed visually, such as on a map. SQL
Server 2008 R2 Reporting Services provides new map controls and a map wizard for
creating map reports based on spatial data. For more information, see Chapter 53, “SQL
Server 2008 Reporting Services.”
Change Data Capture
In SQL Server 2008, Microsoft introduced a new feature called Change Data Capture
(CDC), which is designed to make it much easier and less resource intensive to identify
and retrieve changed data from tables in an online transaction processing (OLTP)
database. In a nutshell, CDC captures and records INSERT, UPDATE, and DELETE activity in an
OLTP database and stores it in a form that is easily consumed by an application, such as a
SQL Server Integration Services (SSIS) package.
In the past, capturing data changes for your tables for auditing or extract, transform, and
load (ETL) purposes required using replication, time stamp columns, triggers, complex
queries, or expensive third-party tools. None of these other methods are easy to
implement, and many of them use a lot of server resources, negatively affecting the
performance of the OLTP server.
Change Data Capture provides a more efficient mechanism for capturing the data
changes in a table.
NOTE
Change Data Capture is available only in the SQL Server 2008 Developer, Enterprise,
and Datacenter Editions.
The source of change data for Change Data Capture is the SQL Server transaction log. As
inserts, updates, and deletes are applied to tables, entries that describe those changes are
added to the transaction log. When Change Data Capture is enabled for a database, a SQL
Server Agent capture job is created to invoke the sp_replcmds system procedure. This
procedure is an internal server function and is the same mechanism used by transactional
replication to harvest changes from the transaction log.
NOTE
If replication is already enabled for the database, the transactional log reader used for
replication is also used for CDC. This strategy significantly reduces log contention when
both replication and Change Data Capture are enabled for the same database.
The principal task of the Change Data Capture process is to scan the log and identify
changes to data rows in any tables configured for Change Data Capture. As these changes
are identified, the process writes column data and transaction-related information to the
Change Data Capture tables. The changes can then be read from these change tables to be
applied as needed.
The Change Data Capture Tables
When CDC is enabled for a database and one or more tables, an associated Change Data
Capture table is created for each table being monitored. The Change Data Capture tables
are used to store the changes made to the data in corresponding source tables, along with
some metadata used to track the changes. By default, the name of the CDC change table is
schemaname_tablename_CT and is based on the name of the source table.
The first five columns of a Change Data Capture change table are metadata columns and
contain additional information relevant to the recorded change:
__$start_lsn: Identifies the commit log sequence number (LSN) assigned to the
change. This value can be used to determine the order of the transactions.
__$end_lsn: Is currently not used and in SQL Server 2008 is always NULL.
__$seqval: Can be used to order changes that occur within the same transaction.
__$operation: Records the operation associated with the change: 1 = delete, 2 =
insert, 3 = update before image (delete), and 4 = update after image (insert).
__$update_mask: Is a variable bit mask with one defined bit for each captured
column to identify what columns were changed. For insert and delete entries, the
update mask always has all bits set. Update rows have the bits set only for the
columns that were modified.
The remaining columns in the Change Data Capture change table are identical to the
columns from the source table in name and type and are used to store the column data
gathered from the source table when an insert, update, or delete operation is performed
on the table.
For every row inserted into the source table, a single row is inserted into the
change table, and this row contains the column values inserted into the source table.
Every row deleted from the source table is also inserted as a single row into the change
table but contains the column values in the row before the delete operation. An update
operation is captured as a delete followed by an insert, so two rows are captured for each
update: one row entry to capture the column values before the update, and a second row
entry to capture the column values after the update.
In addition to the Change Data Capture tables, the following Change Data Capture
metadata tables are also created:
cdc.change_tables: Contains one row for each change table in the database, created when
Change Data Capture is enabled on a source table.
cdc.index_columns: Contains one row for each index column used by Change Data
Capture to uniquely identify rows in the source table. By default, these are the columns
of the primary key of the source table, but a different unique index on the source
table can be specified when Change Data Capture is enabled on the source table. A
primary key or unique index is required on the source table only if Net Change
Tracking is enabled.
cdc.captured_columns: Contains one row for each column tracked in each source
table. By default, all columns of the source table are captured, but you can include or
exclude columns when enabling Change Data Capture for a table by specifying a
column list.
cdc.ddl_history: Contains a row for each Data Definition Language (DDL) change
made to any table enabled for Change Data Capture. You can use this table to
determine when a DDL change occurred on a source table and what the change was.
cdc.lsn_time_mapping: Contains a row for each transaction stored in a change
table and is used to map between log sequence number (LSN) commit values and the
actual time the transaction was committed.
Although you can query the Change Data Capture tables directly, it is not recommended.
Instead, you should use the Change Data Capture functions, which are discussed later.
All these objects associated with a CDC instance are created in the special schema called
cdc when Change Data Capture is enabled for a database.
Enabling CDC for a Database
Before you can begin capturing data changes for a table, you must first enable the
database for Change Data Capture. You do this by running the stored procedure
sys.sp_cdc_enable_db within the desired database context. When a database is enabled
for Change Data Capture, the cdc schema, the cdc user, the metadata tables, and the system
functions used to query for change data are created.
NOTE
To determine whether a database is already enabled for CDC, you can check the value
in the is_cdc_enabled column in the sys.databases catalog view. A value of 1
indicates that CDC is enabled for the specified database.
The following SQL code enables CDC for the AdventureWorks2008R2 database and then
checks that CDC is enabled by querying the sys.databases catalog view:
use AdventureWorks2008R2
go
exec sys.sp_cdc_enable_db
go
select is_cdc_enabled
from sys.databases
where name = 'AdventureWorks2008R2'
go
is_cdc_enabled
--------------
1
NOTE
Although the examples presented here are run against the AdventureWorks2008R2
database, they can also be run against the AdventureWorks2008 database. However, you
should be aware that some of the column values displayed may not be exactly the same.
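As a minimal sketch, the enable and verify steps can be combined so that the script is safe to rerun. It relies only on the sys.sp_cdc_enable_db procedure and the is_cdc_enabled column described above; the last two queries simply confirm that the cdc schema and the cdc user now exist in the database.

```sql
USE AdventureWorks2008R2;
GO
-- Enable CDC only if the current database is not already enabled.
IF NOT EXISTS (SELECT 1
               FROM sys.databases
               WHERE name = DB_NAME()
                 AND is_cdc_enabled = 1)
BEGIN
    EXEC sys.sp_cdc_enable_db;
END;
GO
-- Confirm that the cdc schema and the cdc database user were created.
SELECT name FROM sys.schemas WHERE name = N'cdc';
SELECT name FROM sys.database_principals WHERE name = N'cdc';
GO
```

To reverse this step, sys.sp_cdc_disable_db can be run in the same database context. Be aware that it disables CDC for the whole database and drops the associated change data capture objects.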
Enabling CDC for a Table
When the database is enabled for Change Data Capture, you can use the
sys.sp_cdc_enable_table stored procedure to enable a Change Data Capture instance for
any tables in that database. The sp_cdc_enable_table stored procedure supports the
following parameters:
@source_schema: Specifies the name of the schema in which the source table
resides.
@source_name: Specifies the name of the source table.
@role_name: Indicates the name of the database role used to control access to
Change Data Capture tables. If this parameter is set to NULL, no role is used to limit
access to the change data. If the specified role does not exist, SQL Server creates a
database role with the specified name.
@capture_instance: Specifies the name of the capture instance used to name the
instance-specific Change Data Capture objects. By default, this is the source schema
name plus the source table name in the format schemaname_sourcename. A source
table can have a maximum of two capture instances.
@supports_net_changes: Is set to 1 or 0 to indicate whether support for querying
for net changes is to be enabled for this capture instance. If this parameter is set to 1,
the source table must have a defined primary key, or an alternate unique index must
be specified for the @index_name parameter.
@index_name: Specifies the name of a unique index to use to uniquely identify rows
in the source table.
@captured_column_list: Specifies the source table columns to be included in the
change table. By default, all columns are included in the change table.
@filegroup_name: Specifies the filegroup to be used for the change table created for
the capture instance. If this parameter is NULL or not specified, the default filegroup
is used. If possible, it is recommended you create a separate filegroup from your
source tables for the Change Data Capture change tables.
@allow_partition_switch: Indicates whether the SWITCH PARTITION command of
ALTER TABLE can be executed against a table that is enabled for Change Data
Capture. The default is 1 (enabled). If any partition switches occur, Change Data
Capture does not track the changes resulting from the switch. This causes data
inconsistencies when the change data is consumed.
The @source_schema, @source_name, and @role_name parameters are the only required
parameters. All the others are optional and apply default values if not specified.
To implement basic change data tracking for a table, let’s first create a copy of the
Customer table to play around with:
select * into MyCustomer from Sales.Customer
alter table MyCustomer add primary key (CustomerID)
Now, to enable CDC on the MyCustomer table, you can execute the following:
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'MyCustomer',
@role_name = NULL
NOTE
If this is the first time you are enabling CDC for a table in the database, you may see
the following messages, which indicate that SQL Server is enabling the SQL Agent jobs
to begin capturing the data changes in the database:
Job 'cdc.AdventureWorks2008R2_capture' started successfully.
Job 'cdc.AdventureWorks2008R2_cleanup' started successfully.
The Capture job that is created generally runs continuously and is used to move
changed data to the CDC tables from the transaction log. The Cleanup job runs on a
scheduled basis to remove older data from the CDC tables so that they don’t grow too
large. By default, it automatically removes data that is more than three days old. The
properties of these jobs can be viewed and modified using the sys.sp_cdc_help_jobs
and sys.sp_cdc_change_job procedures, respectively.
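The following sketch shows one way the two job procedures might be used. The retention and polling values are illustrative assumptions rather than recommendations, and the calls assume that SQL Server Agent is running and that CDC is already enabled for the database.

```sql
USE AdventureWorks2008R2;
GO
-- List the capture and cleanup jobs with their current settings.
EXEC sys.sp_cdc_help_jobs;
GO
-- Keep change data for two days instead of the default three.
-- @retention is expressed in minutes (2880 minutes = 2 days).
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 2880;
GO
-- Ask the capture job to wait 10 seconds between log scan cycles
-- (illustrative value), then restart the job so that it picks up
-- the new setting.
EXEC sys.sp_cdc_change_job
    @job_type        = N'capture',
    @pollinginterval = 10;
EXEC sys.sp_cdc_stop_job  @job_type = N'capture';
EXEC sys.sp_cdc_start_job @job_type = N'capture';
GO
```

Running sys.sp_cdc_help_jobs again afterward is a simple way to confirm that the new values were stored.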
To determine whether or not a source table has been enabled for Change Data Capture, you
can query the is_tracked_by_cdc column in the sys.tables catalog view for that table:
select is_tracked_by_cdc
from sys.tables
where name = 'MyCustomer'
go
is_tracked_by_cdc
-----------------
1
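The optional parameters of sys.sp_cdc_enable_table can be illustrated by adding a second, narrower capture instance to the same table. This is only a sketch: the role name cdc_reader and the instance name dbo_MyCustomer_slim are hypothetical names chosen for the example, and the column list is an arbitrary subset that includes the primary key column, which net change support requires.

```sql
USE AdventureWorks2008R2;
GO
-- Create a second capture instance that tracks only three columns.
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'MyCustomer',
    @role_name            = N'cdc_reader',           -- hypothetical role; created if it does not exist
    @capture_instance     = N'dbo_MyCustomer_slim',  -- hypothetical capture instance name
    @supports_net_changes = 1,                       -- relies on the primary key on CustomerID
    @captured_column_list = N'CustomerID, TerritoryID, ModifiedDate';
GO
-- Remove the extra capture instance again. The default
-- dbo_MyCustomer capture instance is not affected.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyCustomer',
    @capture_instance = N'dbo_MyCustomer_slim';
GO
```

Because a source table can have at most two capture instances, dropping the extra one keeps a slot free, which is useful when a new instance has to be created after a schema change.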
TIP
To get information on which tables are configured for CDC and what the settings for
each are, you can execute the sys.sp_cdc_help_change_data_capture stored
procedure. It reports the name and ID of the source and change tables, the CDC
table properties, the columns included in the capture, and the date the CDC was
enabled/created for the source table.
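As a brief illustration, the help procedure can be limited to a single source table, and sys.sp_cdc_get_captured_columns can list the columns tracked by one capture instance. The sketch assumes the dbo.MyCustomer table and its default dbo_MyCustomer capture instance created earlier.

```sql
USE AdventureWorks2008R2;
GO
-- Settings for every capture instance defined on dbo.MyCustomer.
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'MyCustomer';
GO
-- Columns captured by the default capture instance.
EXEC sys.sp_cdc_get_captured_columns
    @capture_instance = N'dbo_MyCustomer';
GO
```

Calling sys.sp_cdc_help_change_data_capture without parameters reports on every CDC-enabled table in the current database.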
Querying the CDC Tables
After you enable change data tracking for a table, SQL Server begins capturing any data
changes for the table in the Change Data Capture tables. To identify the data changes, you
need to query the Change Data Capture tables. Although you can query the Change Data
Capture tables directly, it is recommended that you use the CDC functions instead. The
main CDC table-valued functions (TVFs) are
cdc.fn_cdc_get_all_changes_capture_instance
cdc.fn_cdc_get_net_changes_capture_instance
NOTE
The Change Data Capture change table and associated CDC table-valued functions
created along with it constitute what is referred to as a capture instance. A capture
instance is created for every source table that is enabled for CDC.
Each capture instance is given a unique name based on the schema and table names.
For example, if the table named sales.products is CDC enabled, the capture instance
created is named sales_products. The name of the CDC change table within the
capture instance is sales_products_CT, and the names of the two associated CDC query
functions are cdc.fn_cdc_get_all_changes_sales_products and
cdc.fn_cdc_get_net_changes_sales_products.
Both of the CDC table-valued functions require two parameters to define the range of log
sequence numbers to use as the lower and upper bounds to determine which records are
to be included in the returned result set. A third required parameter, the
row_filter_option, specifies the content of the metadata columns as well as the rows to
be returned in the result set. Two values can be specified for the row_filter_option of the
cdc.fn_cdc_get_all_changes_capture_instance function: "all" and "all update old".
If "all" is specified, the function returns all changes within the specified log sequence
number (LSN) range. For changes due to an update operation, only the row containing the
new values after the update is returned. If "all update old" is specified, the function
returns all changes within the specified LSN range. For changes due to an update
operation, this option returns both the before and after update copies of the row.
For the cdc.fn_cdc_get_net_changes_capture_instance function, three values can be
specified for the row_filter_option parameter: "all", "all with mask", and "all with merge".
If "all" is specified, the function returns the LSN of the final change to the row and the
operation needed to apply the change to the row in the __$start_lsn and
__$operation metadata columns. The __$update_mask column is always NULL. If "all
with mask" is specified, the function returns the LSN of the final change to the row and
the operation needed to apply the change to the row. In addition, if __$operation equals 4
(that is, the row contains the after update values), the columns actually modified in the
update are identified by the bit mask returned in the __$update_mask column.
If the "all with merge" option is passed, the function returns the LSN of the final change
to the row and the operation needed to apply the change to the row. The __$operation
column will have one of two values: 1 for delete and 5 to indicate that the operation
needed to apply the change is either an insert or an update. The __$update_mask column is
always NULL.
So how do you determine what LSNs to specify to return the rows you need? Fortunately,
SQL Server provides several functions to help determine the appropriate LSN values for use
in querying the TVFs:
sys.fn_cdc_get_min_lsn: Returns the smallest LSN associated with a capture
instance validity interval. The validity interval is the time interval for which change
data is currently available for its capture instances.
sys.fn_cdc_get_max_lsn: Returns the largest LSN in the validity interval.
sys.fn_cdc_map_time_to_lsn and sys.fn_cdc_map_lsn_to_time: Are used to
correlate LSN values with a standard time value.
sys.fn_cdc_increment_lsn and sys.fn_cdc_decrement_lsn: Can be used to make
an incremental adjustment to an LSN value. This adjustment is sometimes necessary
to ensure that changes are not duplicated in consecutive query windows.
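The role of sys.fn_cdc_increment_lsn in consecutive query windows can be sketched as follows. The example assumes the dbo_MyCustomer capture instance already exists, and the @last_lsn variable is a hypothetical stand-in for an upper bound that an extraction process would have saved from its previous run (how it is persisted is left out here).

```sql
USE AdventureWorks2008R2;
GO
-- Hypothetical: upper bound saved by the previous extraction run.
-- NULL means that there has been no previous run.
DECLARE @last_lsn BINARY(10) = NULL;

DECLARE @min_lsn  BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_MyCustomer');
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10);

-- Start one LSN past the previous upper bound so that the change
-- at @last_lsn is not returned a second time.
SET @from_lsn = CASE
                    WHEN @last_lsn IS NULL THEN @min_lsn
                    ELSE sys.fn_cdc_increment_lsn(@last_lsn)
                END;

-- The lower bound must stay inside the validity interval. If the cleanup
-- job has already removed older rows, fall back to the current minimum
-- (changes older than that can no longer be retrieved).
IF @from_lsn < @min_lsn
    SET @from_lsn = @min_lsn;

SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- Query only when the window is valid and not empty.
IF @from_lsn IS NOT NULL AND @to_lsn IS NOT NULL AND @from_lsn <= @to_lsn
BEGIN
    SELECT *
    FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer(@from_lsn, @to_lsn, 'all');
END;

-- @to_lsn would then be saved as the @last_lsn value for the next run.
GO
```

The guard before the query matters because the change functions raise an error when either bound is NULL or falls outside the validity interval of the capture instance.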
So, before you can start querying the CDC tables, you need to generate some records in
them by running some data modifications against the source tables. First, you need to run
the statements in Listing 42.21 against the MyCustomer table to generate some records in
the dbo_MyCustomer_CT Change Data Capture change table.
LISTING 42.21 Some Data Modifications to Populate the MyCustomer CDC Capture Table
delete MyCustomer where CustomerID = 22
Insert MyCustomer (PersonID, StoreID, TerritoryID,
AccountNumber, rowguid, ModifiedDate) Values (20778, null, 9,
'AW' + RIGHT('00000000'
+ convert(varchar(8), IDENT_Current('MyCustomer')), 8),
NEWID(),
GETDATE())
declare @ident int
select @ident = SCOPE_IDENTITY()
update MyCustomer
set TerritoryID = 3,
ModifiedDate = GETDATE()
where CustomerID = @ident
Now that you have some rows in the CDC capture table, you can start retrieving them.
First, you need to identify the min and max LSN values to pass to the
cdc.fn_cdc_get_all_changes_dbo_MyCustomer function. This can be done using the
sys.fn_cdc_get_min_lsn and sys.fn_cdc_get_max_lsn functions. Listing 42.22 puts all
these pieces together to return the records stored in the CDC capture table.
LISTING 42.22 Querying the MyCustomer CDC Capture Table
USE AdventureWorks2008R2
GO
-- declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using "all update old" parameter
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all update old');
GO
__$start_lsn           __$seqval              __$operation __$update_mask
    CustomerID PersonID StoreID TerritoryID AccountNumber
    rowguid                              ModifiedDate
---------------------- ---------------------- ------------ --------------
0x00000039000014400004 0x00000039000014400002 1            0x7F
    22         NULL     494     3           AW00000022
    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
0x00000039000014410004 0x00000039000014410003 2            0x7F
    30119      20778    NULL    9           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 3            0x48
    30119      20778    NULL    9           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:44.267
0x000000390000144C0004 0x000000390000144C0002 4            0x48
    30119      20778    NULL    3           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
Because the option "all update old" is specified in Listing 42.22, all the rows in the
dbo_MyCustomer_CT capture table are returned, including the deleted row, inserted row,
and both the before and after copies of the row updated.
If you want to return only the final version of each row within the LSN range (and
@supports_net_changes was set to 1 when CDC was enabled for the table), you can use
the cdc.fn_cdc_get_net_changes_capture_instance function, as shown in Listing 42.23.
LISTING 42.23 Querying the MyCustomer CDC Capture Table for Net Changes
USE AdventureWorks2008R2
GO
-- declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the first LSN for table changes
SELECT @from_lsn = sys.fn_cdc_get_min_lsn('dbo_MyCustomer')
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get net changes in the range using "all with merge" parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
(@from_lsn, @to_lsn, 'all with merge');
GO
__$start_lsn           __$operation __$update_mask
    CustomerID PersonID StoreID TerritoryID AccountNumber
    rowguid                              ModifiedDate
---------------------- ------------ --------------
0x00000039000014400004 1            NULL
    22         NULL     494     3           AW00000022
    9774AED6-D673-412D-B481-2573E470B478 2008-10-13 11:15:07.263
0x000000390000144C0004 5            NULL
    30119      20778    NULL    3           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
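The operation codes and the update mask can be made easier to read with the sys.fn_cdc_get_column_ordinal and sys.fn_cdc_is_bit_set functions. The sketch below assumes the dbo_MyCustomer capture instance used in the listings; the output column aliases are names chosen for this example. In the Listing 42.22 output, the update rows carry the mask 0x48 (binary 1001000), which sets bits 4 and 7. These appear to correspond to TerritoryID and ModifiedDate, the fourth and seventh captured columns, which are the two columns changed by the UPDATE in Listing 42.21.

```sql
USE AdventureWorks2008R2;
GO
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_MyCustomer'),
        @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- Look up the bit positions of the two columns of interest
-- within the update mask of this capture instance.
DECLARE @territory_ord INT = sys.fn_cdc_get_column_ordinal('dbo_MyCustomer', 'TerritoryID'),
        @modified_ord  INT = sys.fn_cdc_get_column_ordinal('dbo_MyCustomer', 'ModifiedDate');

SELECT CASE __$operation
           WHEN 1 THEN 'delete'
           WHEN 2 THEN 'insert'
           WHEN 3 THEN 'update (before image)'
           WHEN 4 THEN 'update (after image)'
       END AS operation_name,
       CustomerID,
       TerritoryID,
       ModifiedDate,
       -- For insert and delete rows every bit is set, so these flags
       -- are meaningful mainly for update rows (operations 3 and 4).
       sys.fn_cdc_is_bit_set(@territory_ord, __$update_mask) AS territory_changed,
       sys.fn_cdc_is_bit_set(@modified_ord,  __$update_mask) AS modified_date_changed
FROM cdc.fn_cdc_get_all_changes_dbo_MyCustomer(@from_lsn, @to_lsn, 'all update old')
ORDER BY __$start_lsn, __$seqval;
GO
```

Looking up the ordinals once in variables, rather than calling sys.fn_cdc_get_column_ordinal for every row, keeps the query cheaper on large change sets.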