because both the containers were running under one transaction that was started by the package, and when one of the tasks in a container fails, the transaction rolls back all the work done by previous tasks. One lesson to learn from this exercise is that the parent container, which is the package in this case, must have its TransactionOption property set to Required to start a transaction, and the child containers need to have at least the Supported value for this property.
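The all-or-nothing behavior described above can be modeled outside SSIS. The following is a minimal sketch, not SSIS code: it uses Python's built-in sqlite3 module in place of SQL Server, the helper names (make_db, load_all, count_rows) are hypothetical, and the column layout of the three tables is an assumption based on the queries used in this chapter. Three inserts run inside one transaction, and the third fails because the mandatory VIN value is missing.

```python
import sqlite3


def make_db():
    """Create an in-memory database; Vehicle.VIN is mandatory (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE NewCustomer (CustomerID TEXT, FirstName TEXT)")
    conn.execute("CREATE TABLE EmailAddress (CustomerID TEXT, Email TEXT)")
    conn.execute(
        "CREATE TABLE Vehicle (CustomerID TEXT, VIN TEXT NOT NULL, Series TEXT, Model TEXT)"
    )
    return conn


def load_all(conn):
    """Run three inserts as one unit of work; return True only if all succeed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO NewCustomer (CustomerID, FirstName) VALUES ('N501', 'Ann')"
            )
            conn.execute(
                "INSERT INTO EmailAddress (CustomerID, Email) VALUES ('N501', 'ann@example.com')"
            )
            # VIN is omitted on purpose, so this statement violates NOT NULL
            conn.execute(
                "INSERT INTO Vehicle (CustomerID, Series, Model) "
                "VALUES ('N501', 'X11 Series', 'Saloon')"
            )
        return True
    except sqlite3.IntegrityError:
        return False


def count_rows(conn, table):
    """Return the number of rows currently stored in the given table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling load_all on a fresh database returns False, and count_rows then reports zero rows in every table: the two inserts that succeeded are undone together with the failed one, which is the same outcome you observed in the exercise.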
Exercise (Case III: Transaction Spanning over Multiple Packages)
In the last part of this exercise, you will use a transaction to roll back the inconsistent data when your loading process uses multiple packages. When you have multiple packages to process, you use the Execute Package task to embed them inside a single package to run them. The Execute Package task is basically a wrapper task that enables a package to be used inside another package. The Execute Package task is covered in Chapter 5.
26. Right-click the SSIS Packages node in the Solution Explorer window and choose New SSIS Package from the context menu. You will see that the new package has been added with the default name of Package1.dtsx and the screen is switched to the new package. Note that the Designer shows these two packages as tabs.
27. Go to Package.dtsx, right-click the localhost.Campaign Connection Manager, and choose Copy. Switch back to Package1.dtsx and paste this connection manager in the Connection Managers area.
28. Again go to Package.dtsx and cut the Sequence Container 1 with the Loading Vehicle task, return to Package1.dtsx, and paste this container on the Control Flow. You will see a validation error about the connection manager on the Loading Vehicle task. This is because the ID for the localhost.Campaign Connection Manager has been changed.
29. Double-click the Loading Vehicle task icon to open the editor. In the Connection field, choose localhost.Campaign Connection Manager from the drop-down list and click OK. You’ve divided the first package into two separate packages. To run these two packages as a single job, you need to create a new package and call these two packages using the Execute Package task.
30. Right-click the SSIS Packages node in the Solution Explorer window and choose New SSIS Package from the context menu. When the new blank package is loaded, drop two Execute Package tasks on the Control Flow surface.
31. Rename the first Execute Package task Package and the second task Package1. Join Package to Package1 using an on-success precedence constraint.
32. Double-click the Package icon to open the editor. Go to the Package page and change the Location field value to File System.
33. Click in the Connection field and then click the drop-down arrow and choose <New Connection >. In the File Connection Manager Editor’s File field, type C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package.dtsx and click OK. You will see Package.dtsx displayed in the Connection field. Click OK to close the Execute Package Task Editor.
34. As in the last two steps, open the editor for the Package1 task, change the Location to File System, and add a file connection manager in the Connection field pointing to C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package1.dtsx as the existing file. Close the Execute Package Task Editor after making these changes.
35. Click anywhere on the blank surface of the Control Flow panel and press F4 to open the Properties window for the package. Scroll down and locate the Transactions section and set the TransactionOption property to Required. This will run both the Execute Package tasks, and hence the child packages, in the context of a single transaction. However, before proceeding any further, verify that the TransactionOption is set to the default value (Supported) on the Package and Package1 tasks and on the Package1.dtsx package. The Package.dtsx package will have this property set to Required, which is okay, as this will also enable it to join the transaction started by Package2.dtsx. At this time, your package will look like the one shown in Figure 8-7.
Figure 8-7 Calling multiple packages using the Execute Package tasks
36. Go to the Solution Explorer window, right-click Package2.dtsx, and then select Execute Package from the context menu. You will see that Package.dtsx executes successfully and then Package1.dtsx executes but fails, and its components turn red.
37. Switch to SQL Server Management Studio and run the command you created in Step 10 in the first sequence of steps to see the results. You will see that still no record has been added to the tables, despite the fact that Package.dtsx executed successfully. This is because both the packages were running under one transaction. When the Loading Vehicle task failed in the Package1.dtsx package, the transaction rolled back not only all the tasks in this package but also the tasks in the other package, Package.dtsx.
Review
You’ve seen how you can use a transaction to combine various tasks, containers, and even packages so that they behave as a single atomic unit that will commit or roll back as a whole. You’ve worked with the Sequence container to combine a set of tasks as a logical unit and have learned a new trick of copying and pasting tasks among packages to increase productivity.
While all the preceding is useful when you want to use distributed transactions, you cannot use distributed transactions in all situations. Sometimes you may need to use native transaction support. Native transactions are those native to the RDBMS that is used. A simple case could be that you create and populate a temporary table in one task and want to use it later in another task. This kind of requirement cannot be met using the distributed transaction support. In SSIS, when you configure a task, you specify a connection manager on each task. So, when a task is run, a connection is opened specifically for that task, and this connection is closed when the defined operation on the task has been performed. Closing the connection works against native transactions, which need the same connection to be retained across all the tasks involved. SSIS provides a Boolean property on the connection manager, intuitively named RetainSameConnection, that allows you to keep a connection open across all the involved tasks. To use this property, click the connection manager, set RetainSameConnection to True, and then use this connection manager in all the tasks that participate in the native transaction. One of the main benefits of using a native transaction is that you can build a logic-based commit or rollback of the transaction, which is otherwise not possible with distributed transactions, as they can commit or roll back only on success or failure of the tasks involved.
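To see why the temporary-table case needs a retained connection, consider the following minimal sketch. It is not SSIS code: it uses Python's built-in sqlite3 module, where a temporary table likewise lives only as long as the connection that created it, and the function names and the staging table are hypothetical. The first runner opens a new connection per "task", the way tasks behave by default; the second reuses one connection, which is the effect RetainSameConnection is meant to give you.

```python
import os
import sqlite3
import tempfile


def create_staging(conn):
    """First task: create and populate a temporary table on this connection."""
    conn.execute("CREATE TEMP TABLE staging (CustomerID TEXT)")
    conn.execute("INSERT INTO staging VALUES ('N501')")
    conn.commit()


def read_staging(conn):
    """Second task: count the rows in the temporary table."""
    return conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]


def run_with_separate_connections(db_path):
    """Each task opens and closes its own connection."""
    first = sqlite3.connect(db_path)
    create_staging(first)
    first.close()  # the temporary table disappears with the connection
    second = sqlite3.connect(db_path)
    try:
        return read_staging(second)
    except sqlite3.OperationalError:
        return None  # the second task cannot see the temporary table
    finally:
        second.close()


def run_with_retained_connection(db_path):
    """Both tasks share one connection, so the temporary table survives."""
    conn = sqlite3.connect(db_path)
    try:
        create_staging(conn)
        return read_staging(conn)
    finally:
        conn.close()


def demo():
    """Run both variants against a throwaway database file."""
    with tempfile.TemporaryDirectory() as folder:
        db_path = os.path.join(folder, "demo.db")
        return (
            run_with_separate_connections(db_path),
            run_with_retained_connection(db_path),
        )
```

With separate connections the second task finds no table at all, whereas the retained connection returns the row that the first task inserted.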
Restarting Packages with Checkpoints
If you’re like most other information analysts and update your data warehouse every night, this feature will be of much interest to you. After having set up logging for your packages, every morning you’d be checking the logs for the last night’s update process to see how the update went. You usually expect that the update process has been successful, but what if the update process has failed? You will have to rerun your package during the daytime, and I know you wouldn’t be happy about this, because doing this work during business hours has some serious implications. Your users will not get the latest updates and will experience poor performance of the involved database servers while you rerun the update process. If you’ve worked with DTS 2000 packages, you know that DTS 2000 doesn’t support restarting a package from the point of failure. You have to rerun the package from the start or manually run the tasks individually, which is quite involved and sometimes impossible to do. This is where Integration Services comes to the rescue by providing improved functionality for restarting a package.
By using checkpoints with Integration Services packages, you can restart your failed packages from the point of failure and can save the work that has completed successfully. Integration Services writes all the information that is required to restart a failed package in a checkpoint file. This file is created whenever you run a package the first time after a successful completion, and it is deleted when the package successfully completes. However, if an Integration Services package fails and is configured to use checkpoints, the checkpoint file is not deleted; instead, it is updated with information that is required to rerun the package from that point. When you rerun your package, Integration Services checks two things before executing the package: whether the package is configured to use checkpoints and whether the checkpoint file exists, that is, whether the package failed while executing last time. If it finds that the package is configured to use checkpoints and the checkpoint file exists, it reads the checkpoint file associated with the package, gets the required information from the file, and restarts the package from the point of failure.
The checkpoint file contains all the necessary information for a package to restart at the point of failure, such as the execution results of all the completed units of work, the current values of variables involved, and package configuration information.
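The life cycle of the checkpoint file can be modeled in a few lines. The following is a minimal sketch of the idea only, not the way Integration Services implements it: the real checkpoint file is an XML file with its own layout, while this model stores a JSON list of completed task names. The run_package and demo functions and the first two task names are hypothetical.

```python
import json
import os
import tempfile


def _save(path, completed):
    """Write the names of the completed tasks to the checkpoint file."""
    with open(path, "w") as f:
        json.dump({"completed": completed}, f)


def run_package(tasks, checkpoint_path):
    """Run (name, action) pairs in order, skipping tasks recorded as completed.

    Returns (names executed in this run, True if the package succeeded).
    """
    completed = []
    if os.path.exists(checkpoint_path):  # a previous run failed
        with open(checkpoint_path) as f:
            completed = json.load(f)["completed"]
    _save(checkpoint_path, completed)  # the file exists while the package runs
    executed = []
    for name, action in tasks:
        if name in completed:
            continue  # finished in an earlier run, do not repeat it
        try:
            action()
        except Exception:
            return executed, False  # the file is kept for the next run
        completed.append(name)
        executed.append(name)
        _save(checkpoint_path, completed)
    os.remove(checkpoint_path)  # successful completion deletes the file
    return executed, True


def demo():
    """Fail on the last task, fix the cause, then rerun from the point of failure."""
    state = {"vin_supplied": False}

    def load_vehicle():
        if not state["vin_supplied"]:
            raise ValueError("VIN is mandatory")

    tasks = [
        ("Loading Customer", lambda: None),
        ("Loading Email", lambda: None),
        ("Loading Vehicle", load_vehicle),
    ]
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "checkpoints.chk")
        first = run_package(tasks, path)
        file_kept = os.path.exists(path)
        state["vin_supplied"] = True
        second = run_package(tasks, path)
        file_removed = not os.path.exists(path)
    return first, file_kept, second, file_removed
```

In the demo, the first run executes the first two tasks and fails on the third, leaving the file in place; the second run executes only the failed task and then removes the file.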
You decide the key positions in your package that would be good candidates for the point of restart and can be written as checkpoints in the file. For example, you would definitely designate a checkpoint immediately after the task that loads a large data set or downloads multiple large files from an FTP site. In case of failure of the package after successfully downloading files or completing loading the data set, the package will be restarted after these tasks, as the checkpoint defines the starting place. As mentioned earlier, the checkpoint file also contains the package configuration information, that is, the information about the configurations under which the package was running. This avoids reloading the package configurations, as they are read from the checkpoint file, and hence maintains the original configurations under which the package was running at the time of failure.
To enable your package to record checkpoint information, you set the following properties at the package level:
• CheckpointUsage  You can access this property in the Checkpoints section of the package Properties window. This property can have one of three values: Never, Always, or IfExists. The default value is Never, which means the checkpoints are not enabled and no checkpoint file will be created; hence, the package will always start processing from the beginning whenever it is executed. The second value is Always, which, if selected, will make the package always use a checkpoint file. If the package has failed in the previous execution and you’ve somehow deleted or lost the checkpoint file, the package will fail to execute. The third possible value is IfExists, which, when selected, makes the package use a checkpoint file if it exists and start the package from the point of failure in the previous execution. You can reuse a checkpoint file over and over for the same package. However, if the checkpoint file doesn’t exist, the package will always start from the beginning. The checkpoint file is specific to a package. Before executing a package, SSIS checks if the PackageID in the checkpoint file is the same as that of the package. If there is a mismatch, SSIS won’t execute the package.
• SaveCheckpoints  After enabling your package to use checkpoints, you can set this property to True to indicate that checkpoints should be saved.
• CheckpointFileName  Using this property, you can specify the path and the file into which you would like to save checkpoints.
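The three CheckpointUsage values, together with the PackageID check, amount to a small decision table. The following sketch restates that table as a function. It is an illustration of the rules described above and not part of any SSIS API; the decide_start function and its return strings are hypothetical names.

```python
from enum import Enum


class CheckpointUsage(Enum):
    NEVER = "Never"
    ALWAYS = "Always"
    IF_EXISTS = "IfExists"


def decide_start(usage, file_exists, file_package_id=None, package_id=None):
    """Return 'from_beginning', 'from_checkpoint', or 'refuse'."""
    if usage is CheckpointUsage.NEVER:
        return "from_beginning"  # checkpoints are not enabled
    if not file_exists:
        # Always insists on a checkpoint file; IfExists falls back to a full run
        return "refuse" if usage is CheckpointUsage.ALWAYS else "from_beginning"
    if file_package_id != package_id:
        return "refuse"  # the checkpoint file belongs to another package
    return "from_checkpoint"
```

For example, decide_start(CheckpointUsage.IF_EXISTS, file_exists=False) returns 'from_beginning', while the same call with CheckpointUsage.ALWAYS returns 'refuse'.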
Along with these properties, you also need to set the FailPackageOnFailure property, available in the Execution section of the Properties window on the package and the containers, to True to specify that the package will fail when a failure occurs. This property helps in setting the checkpoints on the tasks that you want to make points of restart. If you do not set this property on any task or container in the package, the checkpoint file will not include any information for the containers on failure, and the package will restart from the beginning. It is interesting to note the following points concerning the smallest unit that can be restarted:
• The smallest unit that can be restarted is a task.
• The Data Flow task, which is a special task in Integration Services enclosing the data flow engine, can consist of several data flow transformations. This task is considered similar to any other Control Flow task as far as checkpoints are concerned and cannot be started from halfway where it failed. If you have massive pipeline operations in your package and you’re concerned about rerunning packages, it is better that you divide up the data transformation work between multiple Data Flow tasks.
• The Foreach Loop Container is also considered an atomic unit of work that will either commit or restart completely to iterate over all the values provided by the enumerator used.
• When used with the For Loop Container, the checkpoint file will save the last value of the variable, and hence the loop will restart from the same point where it left off.
The use of an atomic unit of work actually calls for a discussion on transactions and checkpoints, as transactions convert the tasks and the packages involved into an atomic unit of work. Let’s understand the checkpoints and their operation within the scope of a transaction in the following Hands-On exercise.
Hands-On: Restarting a Failed Package Using Checkpoints
In this exercise, you will simulate a package failure and configure your package with checkpoints to restart it from the point of failure.
Method
You will use the package you developed in the last exercise and apply checkpoint configurations to it. In the second step, you will use transactions over the package to see its behavior.
Exercise (Apply Checkpoint Configurations to Your Package)
In the first part of this Hands-On, you configure the Integration Services package to use the checkpoints and execute the package to see its execution behavior.
1. Open BIDS and create a new Integration Services Project with the following details:
Name  Restarting failed package
Location  C:\SSIS\Projects
2. When a blank project is created, delete the Package.dtsx package under the SSIS Packages node in the Solution Explorer window. Then, right-click the SSIS Packages node and choose Add Existing Package from the context menu.
3. In the Add Copy Of Existing Package dialog box, select Package Location as the File System. In the Package Path field, type C:\SSIS\Projects\Maintaining data Integrity with Transactions\Package.dtsx and click OK to add this package. Once the package has been added, open it in the Designer.
4. Drop an Execute SQL task from the Toolbox onto the Designer surface outside the Sequence container and rename this task Loading Vehicle. Double-click the task icon to open the editor. In the General page’s Connection field, choose the localhost.Campaign Connection Manager and type the following SQL statement in the SQLStatement field:
INSERT INTO Vehicle (CustomerID, Series, Model) VALUES ('N501', 'X11 Series', 'Saloon')
You already know that this SQL statement is without the mandatory VIN field; hence it will fail the Loading Vehicle task. Join the Sequence Container with the Loading Vehicle task using an on-success precedence constraint.
5. Click anywhere on the blank surface of the Designer and press F4 to open the Properties of the package. First, make sure that the package is not configured to use transactions. Scroll down and locate the TransactionOption property, and change its value to Supported.
6. Scroll up in the Properties window and locate the Checkpoints section. Specify the following settings in this section:
CheckpointUsage  IfExists
CheckpointFileName  C:\SSIS\Projects\Restarting failed package\checkpoints.chk
SaveCheckpoints  True
7. Because we want to include the restart information of the Loading Vehicle task in the checkpoint file, click the Loading Vehicle task on the Designer surface. You will see that the context of the Properties window changes to show the properties of the Loading Vehicle task. Locate the FailPackageOnFailure property in the Execution section and change its value to True.
8. Press F5 to execute the package. You already know the result of the execution. The Sequence Container and the two Execute SQL tasks in it successfully execute and turn green, but the Loading Vehicle task fails and shows up in red. Press Shift-F5 to switch back to designer mode.
9. Let’s see what has happened in the background while the package was executing. Open SQL Server Management Studio and run the following query to see the records imported into the database:
SELECT n.[CustomerID], [FirstName], [SurName], [Email], [Type], [VIN], [Series], [Model]
FROM [Campaign].[dbo].[NewCustomer] n
LEFT OUTER JOIN [Campaign].[dbo].[EmailAddress] e
ON n.CustomerID = e.CustomerID
LEFT OUTER JOIN [Campaign].[dbo].[Vehicle] v
ON n.CustomerID = v.CustomerID
You will see that the customer information and its e-mail information have been loaded while the vehicle information fields have null values.
Using Windows Explorer, navigate to the C:\SSIS\Projects\Restarting failed package folder and note that the checkpoints.chk file has been created. Open this XML-formatted file and note that it contains information about the failure of the package and the container involved in the failure.
10. Change the SQL statement of the Loading Vehicle task to include the VIN information with the following query:
INSERT INTO Vehicle (CustomerID, VIN, Series, Model) VALUES
('N501', 'UV123WX456YZ789', 'X11 Series', 'Saloon')
11. Again execute the package. This time you will see that only the Loading Vehicle task is executed and the earlier two tasks and the Sequence container did not run at all (see Figure 8-8). This is because the package reads the checkpoint file before executing and finds the information about where to start executing. Press Shift-F5 to switch back to design mode.
Figure 8-8 Restarting package with checkpoints
12. Explore to the C:\SSIS\Projects\Restarting failed package folder and note that the checkpoints.chk file does not exist.
13. Switch to SQL Server Management Studio and run the script specified in Step 9 to see the result set. You will see one record containing customer, e-mail, and vehicle information. Run the following queries to clear the tables:
DELETE [Campaign].[dbo].[NewCustomer]
DELETE [Campaign].[dbo].[EmailAddress]
DELETE [Campaign].[dbo].[Vehicle]
Exercise (Effect of Transaction on Checkpoints)
To set transactions on this package, we need to set the TransactionOption value to Required. So, let’s do it.
14. Click anywhere on the blank surface of the Control Flow panel and press F4 to open the Properties window. Scroll down and locate the TransactionOption property in the Transactions section. Set it to the Required value so that it starts a transaction. But SSIS doesn’t allow you to do this and throws an error as shown in Figure 8-9.
Figure 8-9 Error thrown while trying to use transactions alongside checkpoints
This behavior is different from Integration Services 2005, in which you could use transactions and checkpoints in the same package and Integration Services left proper usage and management of both of them to you. In that case the transactions roll back the information of the checkpoint file and cause that package to execute all over again. This is actually applicable to containers in simple packages also. But there is a potential for error or misbehavior when you are using Integration Services 2005 with checkpoints and transactions in a complex package; that is, if your package consists of a complex container hierarchy and a subcontainer commits before the parent container fails, the subcontainers do not get rolled back and also do not get recorded in the checkpoint file. This causes those subcontainers to be executed again when the parent container is restarted. Similarly, the Foreach Loop container does not record any information in the checkpoint file about the iterations it may have already done before failing and gets executed all over again when restarted. So, when you’re planning to use checkpoints alongside transactions, use caution and test thoroughly. Integration Services 2008 R2, by contrast, stops you from doing that altogether due to the complexity and risk involved, and you can’t use transactions and checkpoints in your packages at the same time.
Review
You’ve seen in this exercise that checkpoints can help you restart a package precisely from the task where the package failed. You also understand that you need to be careful while using transactions and checkpoints on packages with complex container hierarchies in Integration Services 2005. On the other hand, Integration Services 2008 R2 doesn’t allow you to implement checkpoints and transactions at the same time.
Expressions and Variables
You learned about variables and property expressions in Chapter 3 and have used them in various Hands-On exercises in subsequent chapters. With DTS 2000, use of variables was considered an advanced feature that allowed you to add some dynamic behavior to your packages. However, use of variables in Integration Services is made easier and has been tied into SSIS package design so much that the packages developed without using variables are reduced to ad hoc data operations, most of which can be done using the SQL Server Import and Export Wizard. On the other hand, use of property expressions is a new feature in Integration Services that provides an ability to set values for component properties dynamically using variables that are updated at run time by other tasks. Property expressions allow you to evaluate values generated at run time by other tasks and use the evaluated values to update properties exposed by the concerned task at run time. This is quite a powerful feature, as it allows you to read and evaluate the values that exist only at run time and modify the property or behavior of other tasks in the package.
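The mechanism can be pictured as a late-binding step that runs just before a task executes. The following minimal sketch models that step in Python. It is not the SSIS expression language or object model: the apply_property_expressions function is hypothetical, the expressions are plain Python callables, and the property and variable names are chosen only for illustration.

```python
def apply_property_expressions(task_properties, expressions, variables):
    """Return a copy of task_properties with each expression evaluated at run time."""
    resolved = dict(task_properties)
    for prop, expression in expressions.items():
        # Each expression reads the current variable values and yields a property value
        resolved[prop] = expression(variables)
    return resolved


# Design-time values of a mail-sending task (illustrative names)
design_time = {"ToLine": "", "Subject": "Special offer"}

# Expressions attached to individual properties
expressions = {
    "ToLine": lambda v: v["Email"],
    "Subject": lambda v: "Special offer for " + v["FirstName"],
}

# Variables as another task might have set them during execution
run_time_variables = {"FirstName": "Ann", "Email": "ann@example.com"}

personalized = apply_property_expressions(design_time, expressions, run_time_variables)
```

Here personalized["Subject"] evaluates to 'Special offer for Ann', while the design-time dictionary is left unchanged, so the same task definition can be reused for the next set of variable values.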
Though you’ve used variables and expressions in the Hands-On exercises earlier, here you will do another exercise that uses variables and particularly property expressions extensively to update properties of the Send Mail task to generate personalized mails.