CHAPTER 42 What’s New for Transact-SQL in SQL Server 2008
For typical ETL-type applications, querying for change data is an ongoing process. The application periodically requests all the changes that have occurred since the last request and applies them to the target. For these types of queries, you can use the sys.fn_cdc_increment_lsn function to determine the next LSN in sequence after the max LSN boundary of the previous query. To demonstrate this, let’s first execute some additional data modifications against the MyCustomer table:
Insert MyCustomer (PersonID, StoreID, TerritoryID,
    AccountNumber, rowguid, ModifiedDate)
Values (20779, null, 12,
    'AW' + RIGHT('00000000'
        + convert(varchar(8), IDENT_Current('MyCustomer')), 8),
    NEWID(),
    GETDATE())
delete MyCustomer where CustomerID = 30119
The max LSN from the previous examples is 0x000000390000144C0004. We want to increment from this LSN to find the next set of changes. In Listing 42.24, you pass this value to the sys.fn_cdc_increment_lsn function to set the min LSN value you’ll use as the lower bound for the cdc.fn_cdc_get_net_changes_dbo_MyCustomer function.
LISTING 42.24 Using sys.fn_cdc_increment_lsn to Return the Net Changes to the MyCustomer CDC Capture Table Since the Last Retrieval
-- declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the next LSN after the previous max LSN
SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using the 'all with merge' parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
    (@from_lsn, @to_lsn, 'all with merge');
GO
$start_lsn             $operation $update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
---------------------- ---------- ------------ ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
0x00000039000017D30004 5          NULL         30120      20779    NULL    12          AW00030120    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
0x00000039000017E50004 1          NULL         30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

If you want to retrieve the changes captured during a specific time period, you can use the sys.fn_cdc_map_time_to_lsn function, as shown in Listing 42.25.
LISTING 42.25 Retrieving All Changes to MyCustomer During a Specific Time Period
DECLARE @begin_time datetime, @end_time datetime,
        @begin_lsn binary(10), @end_lsn binary(10);
SET @begin_time = '2010-04-27 22:38:48.250';
SET @end_time = '2010-04-27 23:52:36.500';
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
    ('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
    ('largest less than or equal', @end_time);
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
    (@begin_lsn, @end_lsn, 'all');
GO

$start_lsn             $operation $update_mask CustomerID PersonID StoreID TerritoryID AccountNumber rowguid                              ModifiedDate
---------------------- ---------- ------------ ---------- -------- ------- ----------- ------------- ------------------------------------ -----------------------
0x000000390000144C0004 4          NULL         30119      20778    NULL    3           AW00030119    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
0x00000039000017D30004 2          NULL         30120      20779    NULL    12          AW00030120    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
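Because ETL queries like the one in Listing 42.24 run repeatedly, the upper-bound LSN of each pass has to be remembered for the next one. The following is a minimal sketch of that loop, assuming a hypothetical control table named dbo.CdcEtlWatermark (it is not part of CDC) and the dbo_MyCustomer capture instance used in the listings. On the first run it starts at sys.fn_cdc_get_min_lsn; after that it applies sys.fn_cdc_increment_lsn to the saved LSN. It skips the query when no new LSNs exist, because the change functions raise an error for an invalid range. It does not handle a saved LSN that has already aged out of the change table through cleanup.

```sql
-- Hypothetical control table: remembers the last LSN applied per capture instance
IF OBJECT_ID(N'dbo.CdcEtlWatermark') IS NULL
    CREATE TABLE dbo.CdcEtlWatermark
    (
        capture_instance sysname    NOT NULL PRIMARY KEY,
        last_lsn         binary(10) NOT NULL
    );
GO

DECLARE @last_lsn binary(10), @from_lsn binary(10), @to_lsn binary(10);

SELECT @last_lsn = last_lsn
FROM dbo.CdcEtlWatermark
WHERE capture_instance = N'dbo_MyCustomer';

-- First run: start at the oldest LSN still available for the capture instance.
-- Later runs: start at the next LSN after the last one that was applied.
IF @last_lsn IS NULL
    SET @from_lsn = sys.fn_cdc_get_min_lsn(N'dbo_MyCustomer');
ELSE
    SET @from_lsn = sys.fn_cdc_increment_lsn(@last_lsn);

SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- Query only when the range is non-empty
IF @from_lsn <= @to_lsn
BEGIN
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer(@from_lsn, @to_lsn, N'all with merge');

    -- (apply the returned rows to the target here)

    -- Save the upper bound so the next run starts just after it
    MERGE dbo.CdcEtlWatermark AS w
    USING (SELECT N'dbo_MyCustomer' AS capture_instance, @to_lsn AS last_lsn) AS s
        ON w.capture_instance = s.capture_instance
    WHEN MATCHED THEN
        UPDATE SET last_lsn = s.last_lsn
    WHEN NOT MATCHED THEN
        INSERT (capture_instance, last_lsn) VALUES (s.capture_instance, s.last_lsn);
END
GO
```

When reading the results, note that with the 'all with merge' option $operation is 1 for a delete and 5 for an insert or update, which is why Listing 42.24 reports 5 and 1; with 'all', as in Listing 42.25, inserts are reported as 2 and updates as 4.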
CDC and DDL Changes to Source Tables
One of the common challenges when capturing data changes from your source tables is how to handle DDL changes to those tables. This can be an issue if the downstream consumer of the changes has not made the same DDL changes to its destination tables.
Enabling Change Data Capture on a source table in SQL Server 2008 does not prevent DDL changes from occurring. However, Change Data Capture does help to mitigate the effect on downstream consumers by allowing the result sets returned from the CDC capture tables to remain unchanged even as the column structure of the underlying source table changes. Essentially, the capture process responsible for populating the change table ignores any new columns that were not present when the source table was enabled for Change Data Capture. If a tracked column is dropped, NULL values are supplied for the column in the subsequent change entries.
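To see this behavior, you could add a column to the source table and then list the columns that the existing capture instance tracks. A minimal sketch, assuming the dbo_MyCustomer capture instance used earlier; the LoyaltyTier column is hypothetical and added only for illustration:

```sql
-- Hypothetical column, added only to demonstrate the behavior
ALTER TABLE dbo.MyCustomer ADD LoyaltyTier tinyint NULL;
GO
-- Lists the columns tracked by the capture instance; LoyaltyTier is not expected here
EXEC sys.sp_cdc_get_captured_columns @capture_instance = N'dbo_MyCustomer';
GO
```

The new column should likewise be absent from the result set of cdc.fn_cdc_get_net_changes_dbo_MyCustomer, because that function's columns were fixed when the capture instance was created.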
However, if the data type of a tracked column is modified, the data type change is also propagated to the change table to ensure that the capture mechanism does not introduce data loss in tracked columns as a result of mismatched data types. When a column is modified, the capture process posts any detected changes to the cdc.ddl_history table. Downstream consumers of the change data that may need to be alerted to column changes (and make similar adjustments to their destination tables) can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications to the source table columns.
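A minimal sketch of how a downstream consumer might check for such changes, again assuming the dbo_MyCustomer capture instance. The first statement calls the stored procedure; the second reads the cdc.ddl_history table directly, filtered to the MyCustomer source table:

```sql
-- DDL changes recorded for the capture instance
EXEC sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_MyCustomer';
GO
-- The same information read from the CDC metadata table, oldest change first
SELECT ddl_time, ddl_lsn, ddl_command
FROM cdc.ddl_history
WHERE source_object_id = OBJECT_ID(N'dbo.MyCustomer')
ORDER BY ddl_lsn;
GO
```

A consumer could run this before each load and compare ddl_lsn with the last LSN it processed to decide whether its destination tables need attention.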
So how do you modify the capture instance to recognize any added or dropped columns in the source table? Unfortunately, the only way to do this is to disable CDC on the table and re-enable it. However, in an active source environment where it’s not possible to suspend processing while CDC is disabled and re-enabled, any changes made between disabling and re-enabling CDC could be lost.
Fortunately, CDC allows two capture instances to be associated with a single source table. This makes it possible to create a second capture instance for the table that reflects the new column structure. The capture process then captures changes to the same source table into two distinct change tables having two different column structures. While the original change table continues to feed current operational programs, the new change table feeds environments that have been modified to incorporate the new column data. Allowing the capture mechanism to populate both change tables in tandem provides a way to transition smoothly from one table structure to the other without any loss of change data. When the transition to the new table structure has been fully effected, the obsolete capture instance can be removed.
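The transition described above can be sketched as three steps. This assumes that dbo_MyCustomer_v2 is a hypothetical name for the new capture instance and that MyCustomer has a primary key, which @supports_net_changes = 1 requires. The last step should run only after every consumer has switched to the new change table.

```sql
-- Step 1: create a second capture instance that includes the current columns
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'MyCustomer',
    @role_name            = NULL,
    @capture_instance     = N'dbo_MyCustomer_v2',  -- hypothetical name
    @supports_net_changes = 1;
GO
-- Step 2: confirm that both capture instances are now defined for the table
EXEC sys.sp_cdc_help_change_data_capture
    @source_schema = N'dbo',
    @source_name   = N'MyCustomer';
GO
-- Step 3: once all consumers read from dbo_MyCustomer_v2, remove the old instance
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyCustomer',
    @capture_instance = N'dbo_MyCustomer';
GO
```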
Change Tracking
In addition to Change Data Capture, SQL Server 2008 also introduces Change Tracking. Change Tracking is a lightweight solution that provides an efficient change tracking mechanism for applications. Although they are similar in name, Change Tracking and Change Data Capture serve different purposes.
Change Data Capture is an asynchronous mechanism that uses the transaction log to record all the changes to a data row and store them in change tables. All intermediate versions of a row are available in the change tables. The captured information is stored in a relational format that can be queried by client applications such as ETL processes.
Change Tracking, in contrast, is a synchronous mechanism that tracks modifications to a table but stores only the fact that a row has been modified and when. It does not keep track of how many times the row has changed or the values of any of the intermediate changes. However, because it records that a row has changed, an application can check whether data has changed and obtain the latest version of the row directly from the table itself rather than querying a change capture table.
NOTE
Unlike Change Data Capture, which is available only in the Enterprise, Datacenter, and Developer Editions of SQL Server, Change Tracking is available in all editions.
Change Tracking operates by using tracking tables that store a primary key and version number for each row in a table that has been enabled for Change Tracking. Applications can then check whether a row has changed by looking up the row in the tracking table by its primary key and seeing whether the version number differs from the one recorded when the row was first retrieved.
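The per-row lookup described above corresponds to the VERSION form of the CHANGETABLE function. A minimal sketch, assuming Change Tracking has already been enabled for MyCustomer as described under "Implementing Change Tracking." SYS_CHANGE_VERSION is NULL for a row with no tracked change, or whose tracking information has been cleaned up.

```sql
-- Current Change Tracking version for each of the first five customers
SELECT C.CustomerID, C.TerritoryID, CT.SYS_CHANGE_VERSION
FROM dbo.MyCustomer AS C
CROSS APPLY CHANGETABLE(VERSION dbo.MyCustomer, (CustomerID), (C.CustomerID)) AS CT
WHERE C.CustomerID <= 5;
```

An application that saved the version alongside each row it read could compare it with SYS_CHANGE_VERSION to decide whether that row needs to be refetched.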
One of the common uses of Change Tracking is for applications that have to synchronize data with SQL Server. Change Tracking can be used as a foundation for both one-way and two-way synchronization applications.
One-way synchronization applications, such as a client or mid-tier caching application, can be built to use Change Tracking. The caching application, which requires data from a SQL Server database to be cached in other data stores, can use Change Tracking to determine when changes have been made to the database tables and keep the cache up-to-date by retrieving data from the modified rows only.
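A minimal sketch of such a cache refresh, assuming the cache stores the last synchronization version it used (the value 0 below is a placeholder). Inserted and updated rows are re-read from the table, while deleted rows only need their keys evicted from the cache.

```sql
DECLARE @last_synchronization_version bigint = 0;  -- placeholder: version saved with the cache

-- Rows inserted or updated since the last refresh: fetch their current values
SELECT C.CustomerID, C.PersonID, C.StoreID, C.TerritoryID
FROM CHANGETABLE(CHANGES dbo.MyCustomer, @last_synchronization_version) AS CT
JOIN dbo.MyCustomer AS C
    ON C.CustomerID = CT.CustomerID
WHERE CT.SYS_CHANGE_OPERATION IN ('I', 'U');

-- Rows deleted since the last refresh: remove these keys from the cache
SELECT CT.CustomerID
FROM CHANGETABLE(CHANGES dbo.MyCustomer, @last_synchronization_version) AS CT
WHERE CT.SYS_CHANGE_OPERATION = 'D';
```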
Two-way synchronization applications can also be built to use Change Tracking. A typical example of a two-way synchronization application is the occasionally connected application, such as a sales application that runs on a laptop and is disconnected from the central SQL Server database while the salesperson is out in the field. Initially, the client application queries and updates its local data store from the SQL Server database. When it reconnects with the database later, the application synchronizes with the database, and data changes flow from the laptop to the database and from the database to the laptop. Because data changes happen in both locations while the client application is disconnected, the two-way synchronization application must be able to detect conflicts. A conflict occurs if the same data is changed in both data stores in the time between synchronizations. The client application can use Change Tracking to detect conflicts by identifying rows whose version number has changed since the last synchronization. The application can then implement a mechanism to resolve the conflicts so that data changes are not lost.
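One way to detect a conflict when pushing a client change to the server is to make the UPDATE conditional on the row's Change Tracking version. A minimal sketch; the three variable values are hypothetical stand-ins for data the client application would supply:

```sql
DECLARE @last_sync_version bigint = 2;  -- hypothetical: version saved at the client's last sync
DECLARE @customer_id int = 4;           -- hypothetical: row edited on the laptop
DECLARE @new_territory_id int = 6;      -- hypothetical: value entered on the laptop

UPDATE C
SET TerritoryID = @new_territory_id
FROM dbo.MyCustomer AS C
WHERE C.CustomerID = @customer_id
  -- apply the change only if the server row has not changed since the last sync
  AND @last_sync_version >= ISNULL(
        (SELECT CT.SYS_CHANGE_VERSION
         FROM CHANGETABLE(VERSION dbo.MyCustomer, (CustomerID), (C.CustomerID)) AS CT), 0);

IF @@ROWCOUNT = 0
    PRINT 'Not applied: the row changed on the server since the last sync, or it no longer exists.';
```

When no row is updated, the application would fall back to its own conflict-resolution logic rather than silently overwriting the server's data.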
Implementing Change Tracking
To use Change Tracking, you must first enable it for the database and then enable it at the table level for each table whose changes you want to track. Change Tracking can be enabled via T-SQL statements or through SQL Server Management Studio.
To enable Change Tracking for a database in SSMS, right-click the database in Object Explorer to bring up the Properties dialog and select the Change Tracking page. To enable Change Tracking, set the Change Tracking option to True (see Figure 42.6). On this page, you can also configure the retention period, which determines how long SQL Server retains the Change Tracking information for each data row, and whether to automatically clean up the Change Tracking information when the retention period has been exceeded.
FIGURE 42.6 Enabling Change Tracking for a database
Change Tracking can also be enabled with the ALTER DATABASE command:
ALTER DATABASE AdventureWorks2008R2
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
After enabling Change Tracking at the database level, you can then enable Change Tracking for the tables whose changes you want to track. To enable Change Tracking for a table in SSMS, right-click the table in Object Explorer to bring up the Properties dialog and select the Change Tracking page. Set the Change Tracking option to True to enable Change Tracking (see Figure 42.7). The TRACK_COLUMNS_UPDATED option specifies whether SQL Server should also store in the internal Change Tracking table information about which specific columns were updated. Column tracking allows an application to synchronize only when specific columns are updated. This capability can improve the efficiency and performance of the synchronization process, but at the cost of additional storage overhead. This option is set to OFF by default.
FIGURE 42.7 Enabling Change Tracking for a table
Change Tracking can also be enabled via T-SQL with the ALTER TABLE command:
USE [AdventureWorks2008R2]
GO
ALTER TABLE [dbo].[MyCustomer]
ENABLE CHANGE_TRACKING WITH(TRACK_COLUMNS_UPDATED = ON)
TIP
To determine which tables and databases have Change Tracking enabled, you can use the sys.change_tracking_databases and sys.change_tracking_tables catalog views.
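For example, the following queries, run in the AdventureWorks2008R2 database, list the database-level settings and the tracked tables. The columns selected are only a subset of what each view returns:

```sql
USE AdventureWorks2008R2;
GO
-- Databases with Change Tracking enabled, with their retention settings
SELECT DB_NAME(database_id) AS database_name,
       is_auto_cleanup_on,
       retention_period,
       retention_period_units_desc
FROM sys.change_tracking_databases;

-- Tables in the current database with Change Tracking enabled
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS table_name,
       is_track_columns_updated_on,
       min_valid_version,
       begin_version
FROM sys.change_tracking_tables;
GO
```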
Identifying Tracked Changes
After Change Tracking is enabled for a table, any data modification statement that affects rows in the table causes Change Tracking information to be recorded for each modified row. To query for the rows that have changed and to obtain information about the changes, you can use the built-in Change Tracking functions.
Change Tracking records only the primary key values of the changed rows (plus, if you enabled the TRACK_COLUMNS_UPDATED option, information about which columns were updated), which allows you to identify the rows that have been changed. To identify the changed rows, use the CHANGETABLE(CHANGES ...) Change Tracking function. The CHANGETABLE(CHANGES ...) function takes two parameters: the first is the table name, and the second is the last synchronization version number. If you pass 0 for the last synchronization version parameter, you get a list of all the rows that have been modified since version 0, which means all the changes to the table since Change Tracking was first enabled. Typically, however, you do not want all the rows that have changed since the beginning of Change Tracking, but only those rows that have changed since the last time you retrieved the changed rows.
Rather than working out the version numbers yourself, you can use the CHANGE_TRACKING_CURRENT_VERSION() function to obtain the current version, which you then use the next time you query for changes. The version returned represents the version of the last committed transaction.
Before an application can obtain changes for the first time, it must execute a query to obtain the initial data from the table and a query to retrieve the initial synchronization version using the CHANGE_TRACKING_CURRENT_VERSION() function. The version number that is retrieved is passed to the CHANGETABLE(CHANGES ...) function the next time it is invoked.
The following example illustrates how to obtain the initial synchronization version and
initial data set:
USE AdventureWorks2008R2
Go
declare @synchronization_version bigint
-- Obtain the initial synchronization version.
set @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();
select change_tracking_version = @synchronization_version;
-- Obtain initial data set.
select CustomerID, TerritoryID
from MyCustomer
where CustomerID <= 5
go
change_tracking_version
-----------------------
0

CustomerID  TerritoryID
----------- -----------
1           1
2           1
3           4
4           4
5           4
As you can see, because no updates have been performed since Change Tracking was enabled, the initial version is 0. Now let’s perform some updates on these rows to effect some changes:
update MyCustomer set TerritoryID = 5 where CustomerID = 4
update MyCustomer set TerritoryID = 4 where CustomerID = 5
Now you can use the CHANGETABLE(CHANGES ...) function to find the rows that have changed since the last version (0):
declare @last_synchronization_version bigint
set @last_synchronization_version = 0
SELECT
    CT.CustomerID as CustID, CT.SYS_CHANGE_OPERATION,
    CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
    CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
Go

CustID      SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- -------------------- ------------------ ------------------
4           U                    0x0000000004000000 NULL
5           U                    0x0000000004000000 NULL
You can see in these results that this query returns the CustomerID values of the two rows that were changed. However, most applications also want the data from these rows. To return the data, you can join the results from CHANGETABLE(CHANGES ...) with the data in the user table. For example, the following query joins with the MyCustomer table to obtain the values for the PersonID, StoreID, and TerritoryID columns. Note that the query uses an OUTER JOIN to make sure that the change information is returned for any rows that may have been deleted from the user table. Also, at the same time you are retrieving the data rows, you want to retrieve the current version to use the next time the application comes back to retrieve the latest changes:
declare @last_synchronization_version bigint
set @last_synchronization_version = 0
select current_version = CHANGE_TRACKING_CURRENT_VERSION()
SELECT
CT.CustomerID as CustID,
C.PersonID,
C.StoreID,
C.TerritoryID,
CT.SYS_CHANGE_OPERATION,
CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
MyCustomer C
RIGHT OUTER JOIN
CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
on C.CustomerID = CT.CustomerID
go
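current_version
--------------------
2

CustID      PersonID    StoreID     TerritoryID SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- ----------- ----------- ----------- -------------------- ------------------ ------------------
4           NULL        932         5           U                    0x0000000004000000 NULL
5           NULL        1026        4           U                    0x0000000004000000 NULL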
You can see in the output from this query that the current version is now 2. The next time the application issues a query to identify the rows that have been changed since this query, it will pass the value 2 as the @last_synchronization_version to the CHANGETABLE(CHANGES ...) function.
CAUTION
The version number is NOT specific to a table or user session. The Change Tracking version number is maintained across the entire database for all users and change-tracked tables. Whenever a data modification is performed by any user on any table that has Change Tracking enabled, the version number is incremented.
For example, immediately after running an update on change-tracked table A in the current application and incrementing the version to 3, another application could run an update on change-tracked table B and increment the version to 4, and so on. This is why you should always capture the current version number whenever you retrieve the latest set of changes from the change-tracked tables.
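A related technique, not used in the chapter's examples, is to read the current version and the changes inside a single snapshot-isolation transaction, so the version you save corresponds to the changes you actually received even while other sessions keep modifying tracked tables. A minimal sketch, assuming snapshot isolation may be enabled for the database (the ALTER DATABASE statement) and reusing the version value 2 from the previous example:

```sql
-- Assumption: snapshot isolation is allowed for the database (a one-time setting)
ALTER DATABASE AdventureWorks2008R2 SET ALLOW_SNAPSHOT_ISOLATION ON;
GO
USE AdventureWorks2008R2;
GO
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    DECLARE @last_synchronization_version bigint = 2;

    -- Both queries run in the same snapshot transaction, so the saved version
    -- is intended to line up with the set of changes returned below
    SELECT current_version = CHANGE_TRACKING_CURRENT_VERSION();

    SELECT CT.CustomerID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES dbo.MyCustomer, @last_synchronization_version) AS CT;
COMMIT TRAN;
GO
```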
If an application has not synchronized with the database in a while, its stored version number may no longer be valid if the Change Tracking retention period has expired for any row modifications that have occurred since that version. To validate the version number, you can use the CHANGE_TRACKING_MIN_VALID_VERSION() function. This function returns the minimum valid version that a client can have and still obtain valid results from CHANGETABLE(). Your client applications should check the last synchronization version against the value returned by this function. If the last synchronization version is less than the version returned by this function, that version is invalid, and the client application has to reinitialize all the data rows from the table. The following T-SQL code snippet can be used to validate the last_synchronization_version:
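-- Check individual table.
DECLARE @last_synchronization_version bigint = 2;  -- version saved by the client
IF (@last_synchronization_version <
    CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
BEGIN
    -- Handle invalid version and do not enumerate changes.
    -- Client must be reinitialized.
    PRINT 'Last synchronization version is no longer valid; reinitialize the client.';
END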
Identifying Changed Columns
In addition to information about which rows were changed and the operation that caused the change (insert, update, or delete, reported as I, U, or D in the SYS_CHANGE_OPERATION column), the CHANGETABLE(CHANGES ...) function also provides information on which columns were modified if you enabled the TRACK_COLUMNS_UPDATED option. You can use this information to determine whether any action is needed in your client application based on which columns changed.
To identify whether a specific column has changed, you can use the CHANGE_TRACKING_IS_COLUMN_IN_MASK(column_id, change_columns) function. This