Microsoft SQL Server 2008 R2 Unleashed


CHAPTER 42 What’s New for Transact-SQL in SQL Server 2008

For typical ETL-type applications, querying for change data is an ongoing process: the application makes periodic requests for all the changes that have occurred since the last request and that need to be applied to the target. For these types of queries, you can use the sys.fn_cdc_increment_lsn function to determine the next lowest LSN boundary that is greater than the max LSN boundary of the previous query. To demonstrate this, let's first execute some additional data modifications against the MyCustomer table:

Insert MyCustomer (PersonID, StoreID, TerritoryID,
    AccountNumber, rowguid, ModifiedDate)
Values (20779, null, 12,
    'AW' + RIGHT('00000000'
        + convert(varchar(8), IDENT_CURRENT('MyCustomer')), 8),
    NEWID(),
    GETDATE())

delete MyCustomer where CustomerID = 30119

The max LSN from the previous examples is 0x000000390000144C0004. We want to increment from this LSN to find the next set of changes. In Listing 42.24, you pass this value to the sys.fn_cdc_increment_lsn function to set the min LSN value that you'll use as the lower bound with the cdc.fn_cdc_get_net_changes_dbo_MyCustomer function.

LISTING 42.24 Using sys.fn_cdc_increment_lsn to Return the Net Changes to the MyCustomer CDC Capture Table Since the Last Retrieval

-- declare variables to represent beginning and ending lsn
DECLARE @from_lsn BINARY(10), @to_lsn BINARY(10)
-- get the next lowest LSN after the previous max LSN
SELECT @from_lsn = sys.fn_cdc_increment_lsn(0x000000390000144C0004)
-- get the last LSN for table changes
SELECT @to_lsn = sys.fn_cdc_get_max_lsn()
-- get all changes in the range using the 'all with merge' parameter
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
    (@from_lsn, @to_lsn, 'all with merge');
GO

__$start_lsn           __$operation __$update_mask CustomerID
    PersonID StoreID TerritoryID AccountNumber
    rowguid                              ModifiedDate
---------------------- ------------ -------------- ----------
    -------- ------- ----------- -------------
    ------------------------------------ -----------------------
0x00000039000017D30004 5            NULL           30120
    20779    NULL    12          AW00030120
    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
0x00000039000017E50004 1            NULL           30119
    20778    NULL    3           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263

If you want to retrieve the changes captured during a specific time period, you can use the sys.fn_cdc_map_time_to_lsn function, as shown in Listing 42.25.

LISTING 42.25 Retrieving All Changes to MyCustomer During a Specific Time Period

DECLARE @begin_time datetime,
        @end_time datetime,
        @begin_lsn binary(10),
        @end_lsn binary(10);
SET @begin_time = '2010-04-27 22:38:48.250'
SET @end_time = '2010-04-27 23:52:36.500'
SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn
    ('smallest greater than', @begin_time);
SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn
    ('largest less than or equal', @end_time);
SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer
    (@begin_lsn, @end_lsn, 'all');
GO

__$start_lsn           __$operation __$update_mask CustomerID
    PersonID StoreID TerritoryID AccountNumber
    rowguid                              ModifiedDate
---------------------- ------------ -------------- ----------
    -------- ------- ----------- -------------
    ------------------------------------ -----------------------
0x000000390000144C0004 4            NULL           30119
    20778    NULL    3           AW00030119
    2385A86E-6FD2-4815-8BFE-B3F4DF4AEA74 2010-04-27 22:38:48.263
0x00000039000017D30004 2            NULL           30120
    20779    NULL    12          AW00030120
    CE8BBAA1-04C0-4A81-9A7E-85B4EDB5C36D 2010-04-27 23:52:36.477
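
In an ongoing ETL process, the LSN that is hard-coded in Listing 42.24 would normally come from wherever the application saved the upper bound of its previous request. The following is a minimal sketch of that pattern rather than a definitive implementation. The bookkeeping table dbo.CdcSyncState and its columns are hypothetical (they are not part of CDC), and the sketch assumes the table already holds one row for the dbo_MyCustomer capture instance.

```sql
-- Hypothetical bookkeeping table (an assumption, not a CDC object):
-- CREATE TABLE dbo.CdcSyncState
--     (capture_instance sysname PRIMARY KEY, last_lsn binary(10) NOT NULL);

DECLARE @last_lsn binary(10), @from_lsn binary(10), @to_lsn binary(10);

-- Upper bound saved by the previous request
SELECT @last_lsn = last_lsn
FROM dbo.CdcSyncState
WHERE capture_instance = N'dbo_MyCustomer';

-- New range: the LSN just after the previous upper bound, up to the current max LSN
SET @from_lsn = sys.fn_cdc_increment_lsn(@last_lsn);
SET @to_lsn = sys.fn_cdc_get_max_lsn();

-- Query only when the range is non-empty; a lower bound above the
-- upper bound is not a valid range for the net changes function
IF @from_lsn <= @to_lsn
BEGIN
    SELECT *
    FROM cdc.fn_cdc_get_net_changes_dbo_MyCustomer(@from_lsn, @to_lsn, N'all');

    -- Remember the upper bound for the next request
    UPDATE dbo.CdcSyncState
    SET last_lsn = @to_lsn
    WHERE capture_instance = N'dbo_MyCustomer';
END
```

If no row exists in the bookkeeping table, @from_lsn is NULL, the comparison is not true, and nothing is queried. A real process would also need to decide what to do when the saved LSN is older than the oldest change data still retained.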

CDC and DDL Changes to Source Tables

One of the common challenges when capturing data changes from your source tables is how to handle DDL changes to the source tables. This can be an issue if the downstream consumer of the changes has not reflected the same DDL changes for its destination tables. Enabling Change Data Capture on a source table in SQL Server 2008 does not prevent DDL changes from occurring. However, Change Data Capture does help to mitigate the effect on the downstream consumers by allowing the delivered result sets that are returned from the CDC capture tables to remain unchanged even as the column structure of the underlying source table changes. Essentially, the capture process responsible for populating the change table ignores any new columns not present when the source table was enabled for Change Data Capture. If a tracked column is dropped, NULL values are supplied for the column in the subsequent change entries.

However, if the data type of a tracked column is modified, the data type change is also propagated to the change table to ensure that the capture mechanism does not introduce data loss in tracked columns as a result of mismatched data types. When a column is modified, the capture process posts any detected changes to the cdc.ddl_history table. Downstream consumers of the change data from the source tables that may need to be alerted of the column changes (and make similar adjustments to the destination tables) can use the stored procedure sys.sp_cdc_get_ddl_history to identify any modifications to the source table columns.
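
As a brief illustration, a downstream consumer might check the DDL history as sketched here. The capture instance name dbo_MyCustomer is an assumption inferred from the function names used in the earlier listings; substitute the name of your own capture instance.

```sql
-- List the DDL changes recorded for one capture instance.
-- N'dbo_MyCustomer' is an assumed capture instance name.
EXEC sys.sp_cdc_get_ddl_history @capture_instance = N'dbo_MyCustomer';
```

Each row returned describes one DDL change to the source table, including the DDL command that was issued and when it occurred.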

So how do you modify the capture instance to recognize any added or dropped columns in the source table? Unfortunately, the only way to do this is to disable CDC on the table and re-enable it. However, in an active source environment where it's not possible to suspend processing while CDC is being disabled and re-enabled, there is the possibility of data loss between when CDC is disabled and re-enabled.

Fortunately, CDC allows two capture instances to be associated with a single source table. This makes it possible to create a second capture instance for the table that reflects the new column structure. The capture process then captures changes to the same source table into two distinct change tables having two different column structures. While the original change table continues to feed current operational programs, the new change table feeds environments that have been modified to incorporate the new column data. Allowing the capture mechanism to populate both change tables in tandem provides a mechanism for smoothly transitioning from one table structure to the other without any loss of change data. When the transition to the new table structure has been fully effected, the obsolete capture instance can be removed.
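
The transition just described might look something like the following sketch. The capture instance names are assumptions: dbo_MyCustomer is inferred from the function names used in the earlier listings, and dbo_MyCustomer_v2 is a hypothetical name for the new instance. Passing NULL for @role_name (no gating role) is also an assumption; use whatever role was specified when the table was originally enabled.

```sql
-- 1. Add a second capture instance that reflects the current
--    column structure of the source table.
EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'MyCustomer',
    @role_name            = NULL,                  -- assumption: no gating role
    @capture_instance     = N'dbo_MyCustomer_v2',  -- hypothetical name
    @supports_net_changes = 1;

-- 2. Later, after all consumers have switched to the new change table,
--    remove the obsolete capture instance.
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyCustomer',
    @capture_instance = N'dbo_MyCustomer';         -- assumed original name
```

The two steps would normally be run at different times, with both capture instances active in between.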


Change Tracking

In addition to Change Data Capture, SQL Server 2008 also introduces Change Tracking. Change Tracking is a lightweight solution that provides an efficient change tracking mechanism for applications. Although they are similar in name, the purposes of Change Tracking and Change Data Capture are different.

Change Data Capture is an asynchronous mechanism that uses the transaction log to record all the changes to a data row and store them in change tables. All intermediate versions of a row are available in the change tables. The information captured is stored in a relational format that can be queried by client applications such as ETL processes.

Change Tracking, in contrast, is a synchronous mechanism that tracks modifications to a table but stores only the fact that a row has been modified and when. It does not keep track of how many times the row has changed or the values of any of the intermediate changes. However, because Change Tracking records that a row has changed, you can check whether data has changed and obtain the latest version of the row directly from the table itself rather than querying a change capture table.

NOTE

Unlike Change Data Capture, which is available only in the Enterprise, Datacenter, and Developer Editions of SQL Server, Change Tracking is available in all editions.

Change Tracking operates by using tracking tables that store a primary key and version number for each row in a table that has been enabled for Change Tracking. Applications can then check whether a row has changed by looking up the row in the tracking table by its primary key and seeing whether the version number is different from when the row was first retrieved.

One of the common uses of Change Tracking is for applications that have to synchronize data with SQL Server. Change Tracking can be used as a foundation for both one-way and two-way synchronization applications.

One-way synchronization applications, such as a client or mid-tier caching application, can be built to use Change Tracking. The caching application, which requires data from a SQL Server database to be cached in other data stores, can use Change Tracking to determine when changes have been made to the database tables and keep the cache up to date by retrieving data from the modified rows only.

Two-way synchronization applications can also be built to use Change Tracking. A typical example of a two-way synchronization application is the occasionally connected application, such as a sales application that runs on a laptop and is disconnected from the central SQL Server database while the salesperson is out in the field. Initially, the client application queries and updates its local data store from the SQL Server database. When it reconnects with the database later, the application synchronizes with the database, and data changes flow from the laptop to the database and from the database to the laptop. Because data changes happen in both locations while the client application is disconnected, the two-way synchronization application must be able to detect conflicts. A conflict occurs if the same data is changed in both data stores in the time between synchronizations. The client application can use Change Tracking to detect conflicts by identifying rows whose version number has changed since the last synchronization. The application can implement a mechanism to resolve the conflicts so that the data changes are not lost.
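
One way such a conflict check might be written is sketched here, assuming Change Tracking is already enabled on the MyCustomer table as described in the sections that follow. The sketch uses the CHANGETABLE(VERSION ...) form of the Change Tracking function, which returns the latest change version for a single row. The variable values and the new TerritoryID are illustrative assumptions only.

```sql
DECLARE @last_sync_version bigint = 2;  -- version saved at the previous synchronization (example value)
DECLARE @customer_id int = 4;           -- row the client wants to update (example value)

-- Apply the client's change only if the row has not been modified
-- on the server since the client last synchronized.
UPDATE C
SET TerritoryID = 6                     -- example value sent by the client
FROM dbo.MyCustomer AS C
WHERE C.CustomerID = @customer_id
  AND @last_sync_version >= ISNULL(
        (SELECT CT.SYS_CHANGE_VERSION
         FROM CHANGETABLE(VERSION dbo.MyCustomer,
                          (CustomerID), (C.CustomerID)) AS CT),
        0);

-- No row updated means the row is missing or was changed after the
-- client's last synchronization.
IF @@ROWCOUNT = 0
    PRINT 'Conflict detected (or row not found); resolve before retrying.';
```

How the conflict is resolved (server wins, client wins, or a merge) is left to the application.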

Implementing Change Tracking

To use Change Tracking, you must first enable it for the database and then enable it at the table level for any tables for which you want to track changes. Change Tracking can be enabled via T-SQL statements or through SQL Server Management Studio.

To enable Change Tracking for a database in SSMS, right-click on the database in Object Explorer to bring up the Properties dialog and select the Change Tracking page. To enable Change Tracking, set the Change Tracking option to True (see Figure 42.6). Also on this page, you can configure the retention period for how long SQL Server retains the Change Tracking information for each data row and whether to automatically clean up the Change Tracking information when the retention period has been exceeded.

FIGURE 42.6 Enabling Change Tracking for a database


Change Tracking can also be enabled with the ALTER DATABASE command:

ALTER DATABASE AdventureWorks2008R2
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

After enabling Change Tracking at the database level, you can then enable Change Tracking for the tables for which you want to track changes. To enable Change Tracking for a table in SSMS, right-click on the table in Object Explorer to bring up the Properties dialog and select the Change Tracking page. Set the Change Tracking option to True to enable Change Tracking (see Figure 42.7). The TRACK_COLUMNS_UPDATED option specifies whether SQL Server should store in the internal Change Tracking table any extra information about which specific columns were updated. Column tracking allows an application to synchronize only when specific columns are updated. This capability can improve the efficiency and performance of the synchronization process, but at the cost of additional storage overhead. This option is set to OFF by default.

FIGURE 42.7 Enabling Change Tracking for a table

Change Tracking can also be enabled via T-SQL with the ALTER TABLE command:

USE [AdventureWorks2008R2]
GO
ALTER TABLE [dbo].[MyCustomer]
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON)

TIP

To determine which tables and databases have Change Tracking enabled, you can use the sys.change_tracking_databases and sys.change_tracking_tables catalog views.
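
For example, queries along the following lines could be used to inspect the two catalog views. This is a small sketch; the column aliases are arbitrary, and the second query reports only tables in the current database.

```sql
-- Databases with Change Tracking enabled, with their retention settings
SELECT DB_NAME(database_id) AS database_name,
       is_auto_cleanup_on,
       retention_period,
       retention_period_units_desc
FROM sys.change_tracking_databases;

-- Tables in the current database with Change Tracking enabled
SELECT OBJECT_SCHEMA_NAME(object_id) AS table_schema,
       OBJECT_NAME(object_id) AS table_name,
       is_track_columns_updated_on,
       min_valid_version
FROM sys.change_tracking_tables;
```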

Identifying Tracked Changes

After Change Tracking is enabled for a table, any data modification statements that affect rows in the table cause Change Tracking information for each modified row to be recorded. To query for the rows that have changed and to obtain information about the changes, you can use the built-in Change Tracking functions.

Unless you enabled the TRACK_COLUMNS_UPDATED option, only the values of the primary key column are recorded with the change information to allow you to identify the rows that have been changed. To identify the changed rows, use the CHANGETABLE(CHANGES ...) Change Tracking function. The CHANGETABLE(CHANGES ...) function takes two parameters: the first is the table name, and the second is the last synchronization version number. If you pass 0 for the last synchronization version parameter, you get a list of all the rows that have been modified since version 0, which means all the changes to the table since Change Tracking was first enabled. Typically, however, you do not want all the rows that have changed from the beginning of Change Tracking, but only those rows that have changed since the last time you retrieved the changed rows.

Rather than having to keep track of the version numbers, you can use the CHANGE_TRACKING_CURRENT_VERSION() function to obtain the current version that will be used the next time you query for changes. The version returned represents the version of the last committed transaction.

Before an application can obtain changes for the first time, the application must first execute a query to obtain the initial data from the table and a query to retrieve the initial synchronization version using the CHANGE_TRACKING_CURRENT_VERSION() function. The version number that is retrieved is passed to the CHANGETABLE(CHANGES ...) function the next time it is invoked.

The following example illustrates how to obtain the initial synchronization version and

initial data set:

USE AdventureWorks2008R2
Go
declare @synchronization_version bigint
-- Obtain the initial synchronization version.
set @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();
Select change_tracking_version = @synchronization_version;
-- Obtain initial data set.
select CustomerID, TerritoryID
from MyCustomer
where CustomerID <= 5
go

change_tracking_version
-----------------------
0

CustomerID  TerritoryID
----------- -----------
1           1
2           1
3           4
4           4
5           4

As you can see, because no updates have been performed since Change Tracking was enabled, the initial version is 0. Now let's perform some updates on these rows to effect some changes:

update MyCustomer set TerritoryID = 5 where CustomerID = 4
update MyCustomer set TerritoryID = 4 where CustomerID = 5

Now you can use the CHANGETABLE(CHANGES ...) function to find the rows that have changed since the last version (0):

declare @last_synchronization_version bigint
set @last_synchronization_version = 0
SELECT CT.CustomerID as CustID,
       CT.SYS_CHANGE_OPERATION,
       CT.SYS_CHANGE_COLUMNS,
       CT.SYS_CHANGE_CONTEXT
FROM CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
Go

CustID      SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- -------------------- ------------------ ------------------
4           U                    0x0000000004000000 NULL
5           U                    0x0000000004000000 NULL

You can see in these results that this query returns the CustomerIDs of the two rows that were changed. However, most applications want the data from these rows as well. To return the data, you can join the results from CHANGETABLE(CHANGES ...) with the data in the user table. For example, the following query joins with the MyCustomer table to obtain the values for the PersonID, StoreID, and TerritoryID columns. Note that the query uses an OUTER JOIN to make sure that the change information is returned for any rows that may have been deleted from the user table. At the same time you are retrieving the data rows, you also want to retrieve the current version to use the next time the application comes back to retrieve the latest changes:

declare @last_synchronization_version bigint
set @last_synchronization_version = 0
select current_version = CHANGE_TRACKING_CURRENT_VERSION()
SELECT
    CT.CustomerID as CustID,
    C.PersonID,
    C.StoreID,
    C.TerritoryID,
    CT.SYS_CHANGE_OPERATION,
    CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT
FROM
    MyCustomer C
    RIGHT OUTER JOIN
    CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
    on C.CustomerID = CT.CustomerID
go

current_version
--------------------
2

CustID      PersonID    StoreID     TerritoryID
    SYS_CHANGE_OPERATION SYS_CHANGE_COLUMNS SYS_CHANGE_CONTEXT
----------- ----------- ----------- -----------
    -------------------- ------------------ ------------------
4           NULL        932         5
    U                    0x0000000004000000 NULL
5           NULL        1026        4
    U                    0x0000000004000000 NULL

You can see in the output from this query that the current version is now 2. The next time the application issues a query to identify the rows that have been changed since this query, it will pass the value of 2 as the @last_synchronization_version to the CHANGETABLE(CHANGES ...) function.

CAUTION

The version number is NOT specific to a table or user session. The Change Tracking version number is maintained across the entire database for all users and change tracked tables. Whenever a data modification is performed by any user on any table that has Change Tracking enabled, the version number is incremented.

For example, immediately after running an update on change tracked table A in the current application and incrementing the version to 3, another application could run an update on change tracked table B and increment the version to 4, and so on. This is why you should always capture the current version number whenever you are retrieving the latest set of changes from the change tracked tables.
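
Because the version can advance between statements, one defensive pattern is to capture the current version first and then ignore any changes newer than that version, leaving them for the next synchronization. The following is a sketch of that idea under stated assumptions, not the only way to do it: the variable names and the starting value of 2 are illustrative, and the filter relies on the SYS_CHANGE_VERSION column returned by CHANGETABLE(CHANGES ...).

```sql
DECLARE @last_synchronization_version bigint = 2;  -- value saved by the previous sync (example)
DECLARE @next_synchronization_version bigint;

-- Capture the database-wide version before enumerating changes
SET @next_synchronization_version = CHANGE_TRACKING_CURRENT_VERSION();

-- Enumerate only the changes up to the captured version; anything
-- committed after the capture is picked up by the next synchronization
SELECT CT.CustomerID,
       CT.SYS_CHANGE_OPERATION,
       CT.SYS_CHANGE_VERSION
FROM CHANGETABLE(CHANGES MyCustomer, @last_synchronization_version) AS CT
WHERE CT.SYS_CHANGE_VERSION <= @next_synchronization_version;

-- The application stores this value and passes it in next time
SELECT @next_synchronization_version AS next_synchronization_version;
```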

If an application has not synchronized with the database in a while, the stored version number may no longer be valid if the Change Tracking retention period has expired for any row modifications that have occurred since that version. To validate the version number, you can use the CHANGE_TRACKING_MIN_VALID_VERSION() function. This function returns the minimum valid version that a client can have and still obtain valid results from CHANGETABLE(). Your client applications should check the last synchronization version obtained against the value returned by this function. If the last synchronization version is less than the version returned by this function, that version is invalid, and the client application has to reinitialize all the data rows from the table. The following T-SQL code snippet can be used to validate the last_synchronization_version:

-- Check individual table.
IF (@last_synchronization_version <
    CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('MyCustomer')))
BEGIN
    -- Handle invalid version and do not enumerate changes.
    -- Client must be reinitialized.
    PRINT 'Last synchronization version is no longer valid; reinitialize the client.'
END

Identifying Changed Columns

In addition to information about which rows were changed and the operation that caused the change (insert, update, or delete, reported as I, U, or D in the SYS_CHANGE_OPERATION column), the CHANGETABLE(CHANGES ...) function also provides information on which columns were modified if you enabled the TRACK_COLUMNS_UPDATED option. You can use this information to determine whether any action is needed in your client application based on which columns changed.

To identify whether a specific column has changed, you can use the CHANGE_TRACKING_IS_COLUMN_IN_MASK(column_id, change_columns) function. This
