Trang 1Nielsen c59.tex V4 - 07/21/2009 3:58pm Page 1282
Part VIII Monitoring and Auditing
Here’s the catch: The context isn’t automatic; it must be added to each and every DML command. In
addition, it uses a WITH clause, just like a common table expression, so the syntax is confusing. While
I’m glad it’s possible to capture the context, I’m not a huge fan of the implementation.
The following code creates the varbinary variable and passes it to Change Tracking as part of an
UPDATE command:
DECLARE @AppContext VARBINARY(128)
  = CAST('Maui/Pn' AS VARBINARY(128));
WITH Change_Tracking_Context (@AppContext)
UPDATE HumanResources.Department
  SET GroupName = 'Certified Master w/Context'
  WHERE Name = 'Row Three';
When querying ChangeTable, the sys_change_context column returns the context data. The
CAST() function converts it to readable text:
SELECT CT.SYS_CHANGE_VERSION, CT.DepartmentID, CT.SYS_CHANGE_OPERATION,
    d.Name, d.GroupName, d.ModifiedDate,
    CAST(CT.SYS_CHANGE_CONTEXT AS VARCHAR(128)) AS ApplicationContext
  FROM ChangeTable (Changes HumanResources.Department, 5) AS CT
    LEFT OUTER JOIN HumanResources.Department d
      ON d.DepartmentID = CT.DepartmentID
  ORDER BY CT.SYS_CHANGE_VERSION;
Removing Change Tracking
It’s as easy to remove Change Tracking as it is to enable it: Disable it from every table, and then remove
it from the database.
If the goal is to reduce Change Tracking by a single table, then the same ALTER command that enabled
Change Tracking can disable it:
ALTER TABLE HumanResources.Department Disable Change_tracking;
When Change Tracking is disabled from a table, all stored ChangeTable data (the PKs and columns
updated) are lost.
If the goal is to remove Change Tracking from the database, then Change Tracking must first
be removed from every table in the database. One way to accomplish this is to leverage the
sp_MSforeachtable stored procedure:
EXEC sp_MSforeachtable
  'ALTER TABLE ? Disable Change_tracking;';
However, after much testing, I can only warn that sp_MSforeachtable often fails to
remove Change Tracking from every table.
A less elegant, but more reliable, method of ensuring that Change Tracking is completely removed from
every table in the database is to read through the sys.change_tracking_tables catalog view and build
one ALTER TABLE command for each tracked table:
DECLARE @SQL NVARCHAR(MAX) = '';
SELECT @SQL = @SQL + 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) +
    ' Disable Change_tracking; '
  FROM sys.change_tracking_tables ct
    JOIN sys.tables t
      ON ct.object_id = t.object_id
    JOIN sys.schemas s
      ON t.schema_id = s.schema_id;
PRINT @SQL;
EXEC sp_executesql @SQL;
Only after Change Tracking is disabled from every table can Change Tracking be removed from the
database:
ALTER DATABASE AdventureWorks2008
SET Change_tracking = off;
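As a quick check after the removal, the catalog views can confirm that nothing is left behind. This is a minimal sketch, assuming it runs in the AdventureWorks2008 database used throughout the chapter; both queries should return no rows once Change Tracking is fully removed.

```sql
-- Tables in the current database that still have Change Tracking enabled
SELECT OBJECT_SCHEMA_NAME(object_id) AS SchemaName,
       OBJECT_NAME(object_id) AS TableName
  FROM sys.change_tracking_tables;

-- Change Tracking settings for the current database, if it is still enabled
SELECT DB_NAME(database_id) AS DatabaseName,
       is_auto_cleanup_on,
       retention_period,
       retention_period_units_desc
  FROM sys.change_tracking_databases
  WHERE database_id = DB_ID();
```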
Even though Change Tracking is removed from the database, it doesn’t reset the Change Tracking
version number, so if Change Tracking is restarted it won’t cause a synchronization nightmare.
Summary
Designing a DIY synchronization system involves triggers that either update row timestamps or write
keys to a table. Change Tracking does all the hard work, adds auto cleanup, is relatively easy to set up
and use, and reliably returns the net changes. Without question, using Change Tracking sets you up for
success with ETL processes and mobile device synchronization.
Microsoft introduces several new auditing and monitoring technologies with SQL Server 2008. The
next chapter continues exploring these new technologies with Change Tracking’s big brother, Change
Data Capture.
Change Data Capture
IN THIS CHAPTER
High-end BI ETL
Leveraging the T-Log
I know almost nothing about the CDC in Atlanta. The little I do know about
the Centers for Disease Control comes from watching Dustin Hoffman in the
movie Outbreak. Fortunately for me and you, this chapter is about the other
CDC: Change Data Capture.
There’s power hidden in the transaction log (T-Log), and Change Data Capture
(CDC) harnesses the transaction log to capture data changes with the least
possible impact on performance.
CDC asynchronously captures data from the transaction log after the transaction
is complete, so it doesn’t affect the original transaction’s performance. CDC can
track any data from the T-Log, including any DML INSERT, UPDATE, DELETE, and
MERGE command, and DDL CREATE, ALTER, and DROP.
Changes are stored in change tables, which are tables created by CDC with the same
columns as the tracked tables plus a few extra CDC-specific columns. All the
changes are captured, so CDC can return all the intermediate values or just
the net changes.
Because CDC gathers its data by reading the log, the data in the change tables is
organized the same way the transaction log is organized: by T-log log sequence
numbers, known as LSNs. (Kalen Delaney told a joke about Oracle’s founder
Larry Ellison being inside SQL Server: just look at the transaction log and
there’s LSN! Ha!)
There are only a few drawbacks to CDC:
■ Cost: It requires Enterprise Edition.
■ Code: Personally, this really irks me. CDC uses system stored
procedures instead of standardized ALTER statements.
■ Code: There’s no UI for configuring change data capture in Management Studio.
■ T-Log I/O: The transaction log will experience about twice as much I/O because CDC reads from the log.
■ Performance hit: Although it can vary greatly, expect an approximate 10% performance hit
on the OLTP server running CDC.
■ Disk space: Because CDC essentially stores a copy of every transaction’s data, there’s the potential that it can grow like a transaction log gone wild.
Where change data capture shines is in gathering data for ETL from a high-traffic OLTP database to a
data warehouse. Of the possible options, change data capture has the least performance hit, and it does
a great job of providing the right set of data for the Business Intelligence ETL (extract-transform-load).
When you think big-dollar BI, think change data capture.
Enabling CDC
Change Data Capture is enabled at the database level first, and then for every table that needs to be
tracked. Because change data capture reads from the transaction log, one might think that CDC requires
the database to be set to full recovery model so that the transaction log is kept. However, SQL Server
doesn’t truncate the log until after the transactions have been read by CDC, so CDC will work with any
recovery model, even simple.
Also, and this is very important, change data capture uses SQL Agent jobs to capture and clean up the
data, so SQL Agent must be running or data will not be captured.
Enabling the database
To enable the database, execute the sys.sp_cdc_enable_db system stored procedure in the current
database. It has no parameters:
EXEC sys.sp_cdc_enable_db;
The is_cdc_enabled column in sys.databases can be used to determine which databases have
CDC enabled on them:
SELECT * FROM sys.databases WHERE is_cdc_enabled = 1;
The sys.sp_cdc_enable_db procedure creates six system tables in the current database:
■ cdc.captured_columns: Stores metadata for tracked table’s columns
■ cdc.change_tables: Stores metadata for tracked tables
■ cdc.ddl_history: Tracks DDL activity
■ cdc.index_columns: Tracks table indexes
■ cdc.lsn_time_mapping: Used for calculating clean-up time
■ dbo.systranschemas: Tracks schema changes
These are listed in Object Explorer under the Database ➪ Tables ➪ System Tables node.
Enabling tables
Once the database has been prepared for CDC, tables may be set up for CDC using the
sys.sp_cdc_enable_table stored procedure, which has several options:
■ @source_schema: The tracked table’s schema
■ @source_name: The name of the table to be tracked
■ @role_name: The role with permission to view CDC data
The last six parameters are optional:
■ @capture_instance: May be used to create multiple capture instances for the table. This is
useful if the schema is changed.
■ @supports_net_changes: Allows seeing just the net changes, and requires the primary
key. The default is true.
■ @index_name: The name of the unique index, if there’s no primary key for the table (but
you’d never do that, right?)
■ @captured_column_list: Determines which columns are tracked. The default is to track
all columns.
■ @filegroup_name: The filegroup the CDC will be stored on. If not specified, then the
change table is created on the default filegroup.
■ @allow_partition_switch: Allows ALTER TABLE SWITCH PARTITION on
a CDC-enabled table
Note that the last parameter, @allow_partition_switch, was changed late in development of SQL
Server 2008, and some sources incorrectly list it as @partition_switch.
The following batch configures CDC to track changes made to the HumanResources.Department
table:
EXEC sys.sp_cdc_enable_table
  @source_schema = 'HumanResources',
  @source_name = 'Department',
  @role_name = null;
With the first table that’s enabled, SQL Server generates two SQL Agent jobs, where dbname is the name of the database:
■ cdc.dbname_capture
■ cdc.dbname_cleanup
With every table that’s enabled for CDC, SQL Server creates a change table (for example,
cdc.HumanResources_Department_CT) and records the table’s metadata in these system tables:
■ cdc.change_tables
■ cdc.index_columns
■ cdc.captured_columns
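To see what enabling a table actually created, the catalog views and CDC help procedures can be queried directly. The following is a minimal sketch, assuming the current database is AdventureWorks2008 and that HumanResources.Department was enabled as shown above; the column aliases are only illustrative.

```sql
-- Tables in the current database that are enabled for CDC
SELECT s.name AS SchemaName, t.name AS TableName, t.is_tracked_by_cdc
  FROM sys.tables t
    JOIN sys.schemas s
      ON t.schema_id = s.schema_id
  WHERE t.is_tracked_by_cdc = 1;

-- Capture instance settings for the Department table
EXEC sys.sp_cdc_help_change_data_capture
  @source_schema = 'HumanResources',
  @source_name = 'Department';

-- The capture and cleanup jobs and their settings
EXEC sys.sp_cdc_help_jobs;
```

If the first query returns no rows, or the jobs are missing, the table was probably not enabled in the database that is currently in use.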
For an excellent article on tuning the performance of change data capture under various loads, see http://msdn.microsoft.com/en-us/library/dd266396.aspx
Working with Change Data Capture
It isn’t difficult to work with change data capture. The trick is to understand the transaction log’s log
sequence numbers.
Assuming AdventureWorks2008 has been freshly installed, the following scripts make some data
changes so there will be some activity in the log for change data capture to gather:
INSERT HumanResources.Department (Name, GroupName)
  VALUES ('CDC New Row', 'SQL Rocks'),
         ('Test Two', 'CDC Rocks');
UPDATE HumanResources.Department
  SET Name = 'Changed Name'
  WHERE Name = 'CDC New Row';
INSERT HumanResources.Department (Name, GroupName)
  VALUES ('Row Three', 'PBM Rocks'),
         ('Row Four', 'TVP Rocks');
UPDATE HumanResources.Department
  SET GroupName = 'T-SQL Rocks'
  WHERE Name = 'Test Two';
DELETE FROM HumanResources.Department
  WHERE Name = 'Row Four';
With five transactions complete, there should be some activity in the log. The following DMVs can
reveal information about the log:
SELECT * FROM sys.dm_cdc_log_scan_sessions;
SELECT * FROM sys.dm_repl_traninfo;
SELECT * FROM sys.dm_cdc_errors;
Examining the log sequence numbers
The data changes are organized in the change tables by log sequence number (LSN). Converting a
given date time to LSN is essential to working with change data capture. The
sys.fn_cdc_map_time_to_lsn function is designed to do just that. The first parameter defines the LSN search (called
LSN boundary options), and the second parameter is the point in time. Possible searches are as follows:
■ smallest greater than
■ smallest greater than or equal
■ largest less than
■ largest less than or equal
Each of the search options defines how the function will locate the nearest LSN in the change tables.
The following sample query defines a range beginning with January 1, 2009, and ending with December 31, 2009, and returns
the LSNs that bound that range:
SELECT
  sys.fn_cdc_map_time_to_lsn
    ('smallest greater than or equal', '20090101')
    AS BeginLSN,
  sys.fn_cdc_map_time_to_lsn
    ('largest less than or equal', '20091231')
    AS EndLSN;
Result:
BeginLSN                EndLSN
----------------------  ----------------------
0x0000002F000001330040  0x0000003B000002290001
The sys.fn_cdc_get_min_lsn() and sys.fn_cdc_get_max_lsn() functions serve as anchor
points to begin the walk through the log. The min function requires a capture instance name (by default,
the schema and table names joined by an underscore) and returns the oldest log
entry. The max function has no parameters and returns the most recent LSN in the change tables:
DECLARE
  @BeginLSN VARBINARY(10) =
    sys.fn_cdc_get_min_lsn('HumanResources_Department');
SELECT @BeginLSN;
DECLARE
  @EndLSN VARBINARY(10) =
    sys.fn_cdc_get_max_lsn();
SELECT @EndLSN;
There’s not much benefit to knowing the hexadecimal LSN values by themselves, but the LSNs can be
passed to other functions to select data from the change tables.
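As a rough illustration of passing the anchor LSNs along, the sketch below feeds them to the query function that CDC generates for the Department table. It assumes the default capture instance name, HumanResources_Department, and that the capture job has already processed some changes.

```sql
-- Anchor the range at the oldest and newest LSNs available for the capture instance
DECLARE @BeginLSN BINARY(10) =
  sys.fn_cdc_get_min_lsn('HumanResources_Department');
DECLARE @EndLSN BINARY(10) =
  sys.fn_cdc_get_max_lsn();

-- Return every captured change between the two anchors
SELECT __$start_lsn, __$operation, DepartmentID, Name, GroupName
  FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
    (@BeginLSN, @EndLSN, 'all')
  ORDER BY __$start_lsn;
```

Anchoring on the min and max LSNs avoids guessing at dates, which is convenient when simply checking that changes are being captured at all.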
Querying the change tables
Change data capture creates a function for each table being tracked using the name
cdc.fn_cdc_get_all_changes concatenated with the schema and name of the table. The following script uses the
sys.fn_cdc_map_time_to_lsn function to determine the LSN range values, store them in variables,
and then pass the variables to the Department table’s custom change data capture function:
-- with variables
DECLARE
  @BeginLSN VARBINARY(10) =
    sys.fn_cdc_map_time_to_lsn
      ('smallest greater than or equal', '20090101'),
  @EndLSN VARBINARY(10) =
    sys.fn_cdc_map_time_to_lsn
      ('largest less than or equal', '20091231');
SELECT __$start_lsn, __$seqval, __$operation,
    __$update_mask, DepartmentID, Name, GroupName, ModifiedDate
  FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
    (@BeginLSN, @EndLSN, 'all')
  ORDER BY __$start_lsn;
Result (abridged; the Name column is not shown):
__$start_lsn            __$seqval               __$operation
----------------------  ----------------------  ------------
0x0000005400001D6E0008  0x0000005400001D6E0003  2
0x0000005400001D6E0008  0x0000005400001D6E0006  2
0x0000005400001D700007  0x0000005400001D700002  4
0x0000005400001D7D0008  0x0000005400001D7D0003  2
0x0000005400001D7D0008  0x0000005400001D7D0006  2
0x0000005400001D7F0004  0x0000005400001D7F0002  4
0x0000005400001D810005  0x0000005400001D810003  1

__$update_mask  DepartmentID  GroupName    ModifiedDate
--------------  ------------  -----------  -----------------------
0x0F            17            SQL Rocks    2009-03-07 11:21:48.720
0x0F            18            CDC Rocks    2009-03-07 11:21:48.720
0x02            17            SQL Rocks    2009-03-07 11:21:48.720
0x0F            19            PBM Rocks    2009-03-07 11:21:55.387
0x0F            20            TVP Rocks    2009-03-07 11:21:55.387
0x04            18            T-SQL Rocks  2009-03-07 11:21:48.720
0x0F            20            TVP Rocks    2009-03-07 11:21:55.387
It’s also possible to pass the functions directly to the table’s change data capture function. This is
essentially the same code as the previous query, but slightly simpler, which is usually a good thing:
SELECT *
  FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
    (sys.fn_cdc_map_time_to_lsn
       ('smallest greater than or equal', '20090101'),
     sys.fn_cdc_map_time_to_lsn
       ('largest less than or equal', '20091231'),
     'all') as CDC
  ORDER BY __$start_lsn
You can also convert an LSN directly to a time using the fn_cdc_map_lsn_to_time() function. The
next query extends the previous query by returning the time of the transaction:
-- with lsn converted to time
SELECT
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) as StartLSN, *
  FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
    (sys.fn_cdc_map_time_to_lsn
       ('smallest greater than or equal', '20090101'),
     sys.fn_cdc_map_time_to_lsn
       ('largest less than or equal', '20091231'),
     'all') as CDC
  ORDER BY __$start_lsn
The __$operation column returned by the change data capture custom table functions identifies the
type of DML that caused the data change. Similar to a DML trigger, the data can be the before (deleted
table) or after (inserted table) image of an update.
The default 'all' parameter directs CDC to only return the after, or new, image from an update
operation. The 'all update old' option, shown in the following example, tells CDC to return a row for
both the before update image and the after update image.
This query uses a row constructor subquery to spell out the meaning of the operation:
SELECT
    sys.fn_cdc_map_lsn_to_time(__$start_lsn) as StartLSN,
    Operation.Description as 'Operation',
    DepartmentID, Name, GroupName
  FROM cdc.fn_cdc_get_all_changes_HumanResources_Department
    (sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',
       '20090101'),
     sys.fn_cdc_map_time_to_lsn('largest less than or equal',
       '20091231'),
     'all update old') as CDC
    JOIN
      (VALUES
         (1, 'delete'),
         (2, 'insert'),
         (3, 'update/deleted'), -- 'all update old' option to view
         (4, 'update/inserted')
      ) as Operation(OperationID, Description)
      ON CDC.__$operation = Operation.OperationID
  ORDER BY __$start_lsn
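The queries so far return every captured change. For comparison, here is a hedged sketch of asking for only the net changes. It assumes the table was enabled with net-change support (the default when the table has a primary key), in which case CDC also generates cdc.fn_cdc_get_net_changes_HumanResources_Department for the default capture instance.

```sql
-- Same LSN range as the earlier queries
DECLARE
  @BeginLSN BINARY(10) =
    sys.fn_cdc_map_time_to_lsn
      ('smallest greater than or equal', '20090101'),
  @EndLSN BINARY(10) =
    sys.fn_cdc_map_time_to_lsn
      ('largest less than or equal', '20091231');

-- At most one row per changed source row, reflecting its final state in the range
SELECT __$start_lsn, __$operation, DepartmentID, Name, GroupName
  FROM cdc.fn_cdc_get_net_changes_HumanResources_Department
    (@BeginLSN, @EndLSN, 'all')
  ORDER BY DepartmentID;
```

The net-changes function is usually the better fit for an ETL load that only needs the final state of each row, while the all-changes function suits auditing, where the intermediate values matter.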