Figure 3.7: OLE DB Destination Editor properties
Although I did not use it in this example, there is also the Rows per batch option, which batches the load process so that any failure rolls back only to the previous commit, rather than rolling back the entire load.
It is worth noting that there are other Fast Load options that you cannot see here. In SSIS, these options are presented to you only in the Properties window for the destination object. Additional fast load properties include:
• FIRE_TRIGGERS, which forces any triggers on the destination table to fire. By default, fast (bulk) loading bypasses triggers.
• ORDER, which speeds performance when loading tables with clustered indexes, by indicating that the incoming data is pre-sorted to match the physical order of the clustered index.
These properties can be manually keyed into the FastLoadOptions property value. In this example, I also used the FIRE_TRIGGERS fast load option, as shown in Figure 3.8.
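As a sketch, the FastLoadOptions property accepts a comma-separated list of options; the first two entries here correspond to the Table lock and Check constraints check boxes in the editor, and the column name used in ORDER is hypothetical:

```
TABLOCK,CHECK_CONSTRAINTS,FIRE_TRIGGERS,ORDER(OrderID ASC)
```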
Figure 3.8: Additional options for Fast Loading data within SSIS destination objects
It is almost time to execute this simple data migration package. First, however, I would like to add a data viewer to the process. A data viewer lets you, the package designer, view the data as it flows through from the source to the destination object. To add a data viewer, simply right-click on the green data flow path and select "Data Viewer". This will bring up the "Configure Data Viewer" screen, as shown in Figure 3.9. The data viewer can take several forms: Grid, Histogram, Scatter Plot, and Column Chart. In this example, I chose Grid.
Figure 3.9: Selecting a data viewer
When we execute the package, the attached data viewer displays the flow of data. You can detach the data viewer to allow the records to flow through without interaction. The data viewer is useful while developing a package, to ensure that the data you are expecting to see is indeed there. Of course, you will want to remove any data viewers before deploying the package to production via a scheduled job. Figure 3.10 shows the data viewer, as well as the completed package, as the 64,488 records are migrated.
Figure 3.10: Completed package execution with data viewer
If this were indeed an archive process, the final step would be to delete the data from the source table. I will not cover this step in detail, except to say that it too can be automated in the SSIS package, with a simple DELETE statement matching the criteria we used for the source query when migrating the data.
I am always careful when deleting data from a table, not because I am fearful of removing the wrong data (good backup practices and transactions are safety measures), but because I am mindful of how the delete might affect my server. For example, how will log growth be affected by deleting potentially millions of records at a time? Should I batch the delete process? Will there be enough space for log growth when accounting for each individual delete? How long will it take? Over the years, questions such as these have taught me to tread carefully when handling the delete process.
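The batching concern above can be addressed with a loop that deletes in small chunks, giving the transaction log room to breathe between batches. This is only a minimal sketch; the table name, column, batch size, and date cutoff are hypothetical, and would need to match the criteria used in the source query for the migration:

```sql
-- Delete archived rows in batches to limit log growth per transaction
DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.SalesArchiveSource
    WHERE OrderDate < '2008-01-01';

    -- Stop once no more qualifying rows remain
    IF @@ROWCOUNT = 0
        BREAK;

    -- In SIMPLE recovery, log space from each batch can be reused;
    -- in FULL recovery, schedule log backups between batches
END;
```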
Data comparison tools
Now that we have investigated how to bulk load and move data with BCP and SSIS, it is time to turn our attention to other very popular and efficient ways to get selected data from source to target. Sometimes you do not have to resort to the truncate-and-load processes that are prevalent in many data load facilities, such as BCP or SSIS. Sometimes, merging the data from one location to another is a much quicker way of synchronizing two data stores.
Some third-party tools that perform these comparisons incur a cost, but offer substantial savings in terms of space and time, because you do not need to store multiple gigabytes of data in output files, or transfer those same large files across slow network connections.
With data comparison, you are migrating a much smaller subset of transactions; for example, those that have occurred over the last day, or even the last hour. This is similar in nature to log shipping, in the sense that only new transactions are migrated, but with the added benefit of maintaining much more control over the target data. For example, once the data is migrated from source to target via a data comparison tool, you can add indexes to the target that did not exist on the source. This is not possible with log shipping, as I will discuss shortly. Several tools come to mind immediately for performing this data comparison and "merge" process:
• Native Change Data Capture (SQL Server 2008 only) – this new technology allows you to capture data changes and push them to a target in near-real time. I have been anxiously awaiting such a technology in SQL Server, but I would have to say that I have not found CDC to be fully realized in SQL Server 2008, and I don't cover it in this book. Don't get me wrong, it is there and it works but, much akin to table partitions and plan guides, it is a bit daunting and not very intuitive.
• T-SQL Scripts – pre-SQL Server 2008, many DBAs developed their own ways of merging data from one source to another, using T-SQL constructs such as EXCEPT and EXISTS. Essentially, such code tests for the existence of data in a receiving table and acts appropriately upon learning the results. This was not difficult code to produce; it was just time consuming.
• TableDiff – this is another tool that has been around for many years. It was designed to help compare replication sets for native SQL Server replication, but it is also a handy tool for comparing and synchronizing data.
• Third-party data comparison tools – there are several available on the market, but I am most familiar with Red Gate's SQL Data Compare. Where tablediff.exe is excellent for comparing one table to another, SQL Data Compare allows you to compare entire databases, or subsets of objects, and the data therein. The process can be scripted and automated to ensure data is synchronized between data sources. It is particularly valuable for synchronizing your production environment with your test and dev environments, or for reporting.
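The T-SQL approach mentioned in the list above can be sketched as a pair of statements: one to insert missing rows, one to update changed rows. The database, table, and column names here are hypothetical, purely for illustration:

```sql
-- Insert rows whose keys are missing from the target
INSERT INTO Target.dbo.Customers (CustomerID, CustomerName)
SELECT s.CustomerID, s.CustomerName
FROM   Source.dbo.Customers AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   Target.dbo.Customers AS t
                   WHERE  t.CustomerID = s.CustomerID);

-- Update target rows whose non-key columns differ from the source;
-- the nested EXCEPT is NULL-safe, unlike a simple <> comparison
UPDATE t
SET    t.CustomerName = s.CustomerName
FROM   Target.dbo.Customers AS t
JOIN   Source.dbo.Customers AS s
       ON s.CustomerID = t.CustomerID
WHERE  EXISTS (SELECT s.CustomerName EXCEPT SELECT t.CustomerName);
```

As the text notes, none of this is difficult to write; the cost lies in producing and maintaining a pair of statements like this for every table to be synchronized.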
I will cover TableDiff.exe in this chapter. While I do not cover SQL Data Compare here, I would highly recommend trying it out if you do a lot of data migration and synchronization:
http://www.red-gate.com/products/SQL_Data_Compare/index.htm
However, before I present the sample solution using TableDiff, we need to discuss briefly the concept of uniqueness.