Hands-On Microsoft SQL Server 2008 Integration Services
Script Task
The Script task lets you write a piece of code to extend the package functionality and get the job done. The ability to use your own code is provided by the Script task in the control flow and by the Script component in the data flow. To provide an IDE, the Script task and the Script component use Visual Studio Tools for Applications (VSTA) in Integration Services 2008. The previous version, Integration Services 2005, used Visual Studio for Applications (VSA) as its IDE, so if you have scripts that were written in SSIS 2005, you will need to upgrade them for the new environment. Refer to Chapter 14 for details on migration issues.
Using VSTA, you can write scripts with Microsoft Visual Basic 2008 or Microsoft Visual C# 2008. If your business rules are already coded in a different .NET-compliant language, or you prefer to write code in a different language, you can compile it into a custom assembly and call it from within the Script task, which can call external .NET assemblies quite easily. The Script task also allows you to leverage the powerful .NET libraries. The code written in VSTA is completely integrated with Integration Services; for example, the breakpoints in VSTA work seamlessly with breakpoints in Integration Services. Before you run a package containing a Script task, make sure that the VSTA engine is installed on the computer.
So, whether you want to achieve extra functionality or use existing code, the Script task provides enough facilities to allow you to accomplish your goals.
Hands-On: Scripting the Handshake Functionality
In a classical data warehousing scenario, it is quite common to use control or handshake files to let the processes know whether the operations upstream or downstream have completed. In our test scenario, we have a mainframe process that copies the files into a folder and stamps the handshake file with different strings based on the current status of the process. As the files could be big, depending upon what has been extracted (daily, weekly, monthly, or yearly data), we do not want to start loading a file that is still being written by this upstream process. Before loading the data file, the mainframe process stamps the handshake file with the LOADING string, and on completion of data loading into the data file, it stamps the file with the OK string. When the SSIS package starts, we want to check the string in the handshake file first; if it is OK, the package should stamp the handshake file with the PROCESSING string and start processing. After completion of processing, it should stamp the file with the UPDATED string so that the mainframe process knows that the data of the previous day has been processed.
As you can see, it is not easy to implement the preceding requirements in SSIS using prebuilt components, but they can be implemented quite easily using the Script task. The logic required is shown in Figure 11-1. In our scenario, we will use the HandshakeFile.txt file as the control file, saved in the C:\SSIS\RawFiles folder, and the RawDataTxt.csv file as the data file, saved in the same folder. You've already imported this data file in Chapter 2. The logic of the branches shown in the figure will be implemented with the help of a variable, HandshakeMessage, that will be created in our package. In this exercise, we will focus more on the Script task; you should be able to implement the rest of the items yourself by now.
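The branching described above can be sketched as a small function that is independent of SSIS. This is only an illustration, not the code used in the exercise: the module name HandshakeLogic, the function NextAction, and the labels it returns are hypothetical, and the pairing of UPDATED with "do nothing" and of every other status with "check again later" is one reading of the work flow.

```vb
Imports System

' Hypothetical helper: maps the handshake status to the branch the package follows.
Public Module HandshakeLogic

    ' Returns an illustrative label: "LOAD", "NOTHING", or "WAIT".
    Public Function NextAction(ByVal status As String) As String
        Select Case status
            Case "OK"
                ' The upstream file is complete: stamp PROCESSING, load the data,
                ' and stamp UPDATED when the load has finished.
                Return "LOAD"
            Case "UPDATED"
                ' Assumed reading: the previous data has already been processed.
                Return "NOTHING"
            Case Else
                ' LOADING, PROCESSING, or anything unexpected: check again later.
                Return "WAIT"
        End Select
    End Function

End Module
```

In the package itself, the same decision is made by the Script task setting the HandshakeMessage variable and by precedence constraints that test its value.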
Exercise (Working with Script Task GUI)
We will add a Script task in the package to read the handshake file.
1. Create a new project in BIDS with the following details:
    Template    Integration Services Project
    Name        Programming SSIS
    Location    C:\SSIS\Projects
Figure 11-1 Work flow for handshake exercise
[Flowchart: the decision "Status in the handshake file?" has branches labeled OK, UPDATED, and ELSE. The actions shown are: stamp PROCESSING and load data, then stamp UPDATED in the handshake file and finish; do nothing; check status after 1 hour.]
2. When the blank solution is created, rename the package Extending SSIS with Script Task.dtsx and click OK in the confirmation dialog box.
3. Create a variable called HandshakeMessage at the package scope.
4. Drop a Script task from the Toolbox onto the Control Flow pane and double-click it to open the Script Task Editor.
5. Set the Name and the Description as follows in the General page:
    Name          Determine workflow using Handshake
    Description   This task sets HandshakeMessage variable that is used to determine the package control flow
6. You can set the preferred programming language in the ScriptLanguage field. Change it to Microsoft Visual Basic 2008 as shown in Figure 11-2.
Figure 11-2 Script Task Editor
7. The entry point is the method that is called when the Script task runs. Specify the entry point name in the EntryPoint field. When you click Edit Script, the VSTA development environment is launched and a script project is created from the script templates based on the language you've specified in the ScriptLanguage field. This auto-generated script contains the ScriptMain class as the default class, which in turn contains a public subroutine or method called Main that acts as the entry point for the script. If you later change either one, make sure that the name of the public subroutine or method always matches the value specified in the EntryPoint field.
8. The next two fields, ReadOnlyVariables and ReadWriteVariables, allow you to list the variables that exist in the package and that you want to access in your script. You can specify multiple variables in either of these fields by using a comma-separated list. As the names indicate, you get either read-only access or read/write access to the variables, depending upon which field you list them in. This is a convenient way to exchange values between the package and your script. Though the script is a part of the package and gets saved within the package, it has its own object model, ScriptObjectModel, represented by the Dts object, which allows interaction with the package objects outside the Script task. We will study the Dts object in a bit more detail later when we work on the script; for the time being, just remember that the Dts object makes package objects such as variables accessible to you while working in the script. The variables listed in the ReadOnlyVariables and ReadWriteVariables fields are referenced using the Variables property of the Dts object, and the Script task locks the variables for read or read/write access transparently. The code example in this case would be as follows:
    Dim strScriptVariable As String = Dts.Variables("varPkgVariable").Value.ToString
There is another way to access variables from within the script: you can use the VariableDispenser property of the Dts object. With this method, you don't use the ReadOnlyVariables and ReadWriteVariables fields in the Script task GUI; rather, you lock the variables for read or read/write access in your code. While using the VariableDispenser property is a standard way of accessing variables within scripts, it requires a bit more code than the earlier method, as the following example shows:
    Dim vars As Variables = Nothing
    Dts.VariableDispenser.LockOneForRead("varPkgVariable", vars)
    strScriptVariable = vars("varPkgVariable").Value.ToString()
    vars.Unlock()
You may prefer to use the earlier method, as it is more convenient, though the Dts.VariableDispenser method is the recommended one, for two reasons. First, with the Dts.Variables method you specify variables in the ReadOnlyVariables and ReadWriteVariables fields in the GUI; if the variables do not exist at design time and get created only at run time, you can't use this method. In such cases, you must use the Dts.VariableDispenser method, as it locks the variables only when the code needs them and doesn't need to know beforehand whether the variables exist. Second, because you control the release of locks in the second method, you can release them more quickly for access by other concurrent processes, which can make it the more efficient method. The first method, Dts.Variables, does not release locks until the task has completed, and in some cases that will block other processes.
We will use the Dts.Variables method in our script to keep things simple, so let's specify the variable in the Script task GUI as a first step. As we will need to update the variable value, we will use the ReadWriteVariables field. To specify a variable, click in the ReadWriteVariables field, click the ellipsis button, and select the check box next to the User::HandshakeMessage variable. Click OK to return and find the variable listed in the ReadWriteVariables field, as shown in Figure 11-2.
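For comparison, a write through the VariableDispenser might look like the following sketch. It is not part of the exercise. It assumes the code sits inside the Main subroutine of the auto-generated ScriptMain class (where the Dts object and the Imports statement for Microsoft.SqlServer.Dts.Runtime are available) and that the HandshakeMessage variable from step 3 exists; the value written is arbitrary.

```vb
' Sketch only: runs inside Main of the auto-generated ScriptMain class.
Dim vars As Variables = Nothing
Try
    ' Lock the variable for writing only for as long as the update takes.
    Dts.VariableDispenser.LockOneForWrite("HandshakeMessage", vars)
    vars("HandshakeMessage").Value = "ELSE"
Finally
    ' Release the lock even if the assignment throws.
    If vars IsNot Nothing Then
        vars.Unlock()
    End If
End Try
```

Wrapping the update in Try...Finally is a design choice of this sketch; it keeps a failed assignment from leaving the variable locked for other tasks.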
Exercise (Understanding the Auto-Generated Script)
In this part of the exercise, you will open the scripting environment and understand the various parts of the auto-generated code.
9. Click Edit Script. This will invoke the VSTA-based Script task development environment. The Script task creates a blank scripting project within the VSTA environment (Figure 11-3) using the language you specified in the GUI. This VSTA project gets saved as a ScriptProject item inside the package XML code. Once this project has been created, you can't go back and change the language: if you now close the VSTA environment, you will see that the ScriptLanguage field has been grayed out.
10. Look in the Project Explorer. By default, the project creates a ScriptMain item (ScriptMain.vb if you've selected the Microsoft Visual Basic 2008 language, or ScriptMain.cs for Microsoft Visual C# 2008). This does not limit you in any way. You can create more items in your project, such as classes, modules, and code files, and if you need to reference other managed assemblies in your code, you can do that by adding a reference to your project. You can add a reference to an external assembly by right-clicking the project in the Project Explorer or from the Project menu. Click the Project menu to see what other items you can add to the scripting project. The auto-generated code and the code that you add to create the required functionality are attached to the Script task in which they reside. All the items that you add to the scripting project are persisted inside the package, and you can organize them in folders as you would normally do in other development projects.
Figure 11-3 Default Script task blank project
Note that a lot of default code has been auto-generated for you. It starts with a comment about what the code is for and tells you that ScriptMain is the entry point class of the script. It is good practice to add comments at various levels within your script. Replace the first line of the comment (Microsoft SQL Server Integration Services Script Task) with the project heading (Script for Handshake functionality).
After the comment lines, Imports statements have been added. These statements reference the .NET Framework system libraries and make it easier to call the functions in those libraries. Here you can add more system libraries as required in your code. Listed next are some of the more frequently used .NET Framework classes:
• System.Data  Used to work with the ADO.NET architecture
• System.IO  Used to work with the file system and streams
• System.Windows.Forms  Used to create forms and message boxes
• System.Text.RegularExpressions  Used to work with regular expressions
• System.Environment  Used to retrieve information about the local computer, the current user, and environmental settings
• System.Net  Used to provide network communications
• System.DirectoryServices  Used to work with Active Directory
• System.Threading  Used to write multithreaded programs
The code contains a class that is named ScriptMain by default. If you scroll to the end of the code, you will notice that a public subroutine called Main has also been created. At run time, Integration Services calls the ScriptMain.Main subroutine to execute the Script task. Two more subroutines have been added to the ScriptMain class, ScriptMain_Startup and ScriptMain_Shutdown, along with an enumeration, ScriptResults, that is used to enumerate the execution results. The ScriptMain_Startup subroutine doesn't contain much code, while the ScriptMain_Shutdown subroutine unlocks the variables that have been locked by the Dts.Variables collection. You can add more code to these subroutines if you want to perform some logic at the startup or shutdown of the Script task. Finally, note that the Main subroutine is the place where you add your code, as indicated by a comment line in the auto-generated code. The only line contained in the Main subroutine sets the TaskResult. It is important to set the execution result explicitly, as in some instances you may want to force a failure based on an outcome in the script, for example when you're building a slightly more complex work flow within a package. The Script task uses the TaskResult property to return status information to the run-time engine, which in turn can use the return status to determine the path of the workflow. Dts.TaskResult gets its value from the ScriptResults enumeration that has also been added to the script.
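As an illustration of forcing the execution result, the Main subroutine could look something like the sketch below. It is not the code of this exercise: the check for a missing control file is a hypothetical condition chosen for the example, and the code assumes it replaces the body of Main in the auto-generated ScriptMain class, where Dts and ScriptResults are defined.

```vb
Public Sub Main()
    ' Hypothetical condition for illustration: fail when the control file is missing.
    Dim handshakePath As String = "C:\SSIS\RawFiles\HandshakeFile.txt"

    If System.IO.File.Exists(handshakePath) Then
        Dts.TaskResult = ScriptResults.Success
    Else
        ' Forcing Failure lets a failure precedence constraint decide what runs next.
        Dts.TaskResult = ScriptResults.Failure
    End If
End Sub
```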
Exercise (Adding Your Code in the Script Project)
We have used the Dts object quite a few times in the previous sections, but it is time to learn a bit more about it before we add code to the script. Integration Services uses the ScriptObjectModel class (Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptObjectModel) to access the package objects within the code written for a Script task. Developers use the properties and methods of the ScriptObjectModel class to access objects such as connections, variables, and events defined elsewhere in the package. The global Dts object is simply an instance of the ScriptObjectModel class and hence is used to interact with the package and the Integration Services run-time engine. Be aware that the Dts object is available only in the Script task and not in the Script component, as it is inherited from the Script task's namespace. It has seven properties and one method, as shown in Figure 11-4 and explained next.
• Connections  Accesses connection managers defined in the package. A connection manager accessed in this way stores information such as the user name and password needed to connect to the data source, so you don't have to provide these details in your script. Connection managers can also be used to access data directly in the script to perform any data-related operation.
• Events  Lets the Script task fire errors, warnings, and informational messages.
• ExecutionValue  Returns additional information, such as a value from your Script task, to the run-time engine. This value in turn can be used to determine the control flow path.
• Log  The only method of the Dts object; it logs information to any of the enabled log providers.
• TaskResult  Returns the success or failure of the Script task to the run-time engine and determines the control flow path within your package. As explained earlier, setting the TaskResult property is the main way to control the package workflow, as you can force the value based on the outcome in the script and not just on the execution status of the script.
• Transaction  Provides the transaction within which the task's container is running.
• Variables  Provides access to the variables listed in the ReadOnlyVariables and ReadWriteVariables task properties for use within the script.
• VariableDispenser  Provides an alternative way to access package variables within the script. Both the Variables and the VariableDispenser properties have been discussed in detail earlier in the exercise.
Figure 11-4 The properties and method of the Dts object
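To make the Events property and the Log method a little more concrete, here is a minimal sketch of how they might be called. It assumes the code runs inside Main of the auto-generated ScriptMain class; the sub-component name and the message texts are arbitrary labels chosen for the example, and the Log call produces output only if logging has been enabled for the task in the package.

```vb
' Sketch only: runs inside Main of the auto-generated ScriptMain class.
Dim fireAgain As Boolean = True

' Raise an informational message; "Handshake check" is an arbitrary label.
Dts.Events.FireInformation(0, "Handshake check", _
    "Reading the handshake file.", String.Empty, 0, fireAgain)

' Write an entry to the enabled log providers; no binary data is attached.
Dim emptyBytes(0) As Byte
Dts.Log("Handshake file has been read.", 0, emptyBytes)

Dts.TaskResult = ScriptResults.Success
```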
Now that you've learned all you need to write a script that can interact with the package, let's create the logic needed to fulfill the requirement of this Hands-On exercise: creating a different workflow for our package based on the message in the handshake file.
11. Add the following code below the comment "Add your code here."
    Dim strFileReader As String
    strFileReader = My.Computer.FileSystem.ReadAllText("C:\SSIS\RawFiles\HandshakeFile.txt")
    Dts.Variables("HandshakeMessage").Value = strFileReader
    'If the mainframe process is still loading or the previous day's processing is still running
    'Change the handshake message to ELSE for special handling in the right most branch of the logic
    If (strFileReader = "PROCESSING" Or strFileReader = "LOADING") Then
        Dts.Variables("HandshakeMessage").Value = "ELSE"
    ElseIf strFileReader = "OK" Then
        My.Computer.FileSystem.WriteAllText("C:\SSIS\RawFiles\HandshakeFile.txt", "PROCESSING", False)
    End If
This code reads the HandshakeFile.txt file that resides in the RawFiles folder. In real-life coding, you would use a connection manager to connect to the file, probably with a variable that provides flexibility as to where the file has been placed. The variable could be populated at run time by some other process before the script runs, giving you flexibility and control over the process. To keep things simple, I have used a direct connection to the file here. If you refer back to Figure 11-1 at this stage to remind yourself of the logic we need to develop, you will understand this code better. The code reads the HandshakeFile.txt file and passes the value to the HandshakeMessage variable; if the value is OK, it writes PROCESSING to the HandshakeFile.txt file to implement the middle branch of the logic. For the scenario where the process has been invoked while the mainframe is still loading or the previous processing has not yet finished, it sets the value of the HandshakeMessage variable to ELSE to implement the rightmost branch of the logic.
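A variant that takes the file location from a package variable, as suggested above, might look like the following sketch. It is not the code used in the rest of the exercise. The variable name HandshakeFilePath is hypothetical and would have to be created in the package and listed in the ReadOnlyVariables field; trimming and upper-casing the text is an added precaution of this sketch, in case the file ends with a line break or uses a different letter case.

```vb
' Sketch only: runs inside Main of the auto-generated ScriptMain class.
' HandshakeFilePath is a hypothetical package variable holding the full file path.
Dim handshakePath As String = Dts.Variables("HandshakeFilePath").Value.ToString()

' Normalize the text so that "ok" followed by a line break still matches "OK".
Dim status As String = System.IO.File.ReadAllText(handshakePath).Trim().ToUpperInvariant()
Dts.Variables("HandshakeMessage").Value = status

If status = "PROCESSING" OrElse status = "LOADING" Then
    ' Upstream load or a previous run is still in progress.
    Dts.Variables("HandshakeMessage").Value = "ELSE"
ElseIf status = "OK" Then
    ' Claim the file before the data flow starts.
    System.IO.File.WriteAllText(handshakePath, "PROCESSING")
End If

Dts.TaskResult = ScriptResults.Success
```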
12. Add a Data Flow task to the package and, from the Script task, drag the precedence constraint and drop it onto the Data Flow task. Double-click the precedence constraint to open the editor. Select Expression in the Evaluation Operation field and type the following in the Expression field, as shown in Figure 11-5:
    @HandshakeMessage == "OK"
There is no space between the two equal (=) signs. Click Test to test the expression. Close the success message and the Precedence Constraint Editor by clicking OK twice.
13. Configure the Data Flow task to import the RawDataTxt.csv file into the Campaign database. The main items you will set up in this task are described next. If you have difficulty setting up this task, refer to the code provided for this book, as the complete code for this exercise has been included.
    Source        Flat File Source to read RawDataTxt.csv file
    Destination   OLE DB Destination to import into Campaign.dbo.RawDataTxt table
14. Drop another Script task below the Data Flow task and connect the two with a success precedence constraint. This Script task will write the UPDATED message to the HandshakeFile.txt file if the HandshakeMessage variable has a value of OK, and will complete the middle branch of our required work flow.
15. Configure the Script task as follows:
    ScriptLanguage   Microsoft Visual Basic 2008
    EntryPoint       Main
Figure 11-5 Setting the Expression in the Precedence Constraint Editor