Hands-On Microsoft SQL Server 2008 Integration Services, Part 54



… import. We have to write custom code to parse the good rows of data into the correct columns. So, let’s start configuring our first Script component.

1. Add a new Integration Services package in the Programming SSIS project and rename it as Extending Data Flow with Script Component.dtsx.

2. Add a Flat File Connection Manager for the C:\SSIS\RawFiles\Sales.txt file with default settings and rename it to Sales.

3. Add a Data Flow task in the newly created package and double-click it to go to the Data Flow tab. Drag and drop the Script component onto the Data Flow Designer surface. The Select Script Component Type dialog box will pop up on the screen (see Figure 11-10). Select the radio button to configure the component as a Source and click OK to place this task as a source on the Data Flow Designer surface.

4. Rename the component as Script Source Component and double-click to open the Script Transformation Editor. Change the ScriptLanguage to Microsoft Visual Basic 2008.

5. Rename the Output 0 in the Inputs And Outputs page to SalesOutput. Note that you have only one output available in this tab and no inputs. Though you can add more outputs if you wish to create a source with multiple outputs, you can’t add an input here. This is because sources have no inputs. In the code, you refer to outputs (and inputs) by their names suffixed with Buffer. For example, SalesOutput will be referenced in the code with the name SalesOutputBuffer.

Figure 11-10 Selecting the Script component type in the Data Flow Designer


6. Expand the SalesOutput and click the Output Columns node. Now you can add columns to SalesOutput by clicking Add Column. Click this button four times to add four columns with the following details, as shown in Figure 11-11.

SalesAmount four-byte signed integer [DT_I4]

Figure 11-11 Adding columns in the Script Source Component Outputs


Remember that when you decide on a data type for the columns here, it is efficient to choose the correct data types; for instance, we used length 20 in this case instead of the default 50. The space saved by using a data type that is just large enough to fit the data means more rows can fit into the data buffers, and hence SSIS can work faster. It is like drawing water from a well: you need fewer trips if you fill each bucket to capacity.

7. Now go to the Connection Managers page and add the Sales Connection Manager. Change the name from Connection to Sales in the Name field. This name is exposed inside the script and makes accessing the package connection managers quite easy in your script.

8. Now click the Edit Script button in the Script page to open the scripting environment. As you can see, this environment is quite different from the one you’ve seen earlier in the Script task. So, let’s spend some time here to understand the various parts of this auto-generated code.

First of all, notice in the Project Explorer window that there are three project items in the ScriptComponent project: BufferWrapper, ComponentWrapper, and Main.

BufferWrapper

The classes in the BufferWrapper project item provide methods for working with the data flow buffers and typed properties for each column. Double-click the BufferWrapper.vb project item to see the auto-generated code. It contains a public class for each output buffer (in our case, only one, SalesOutputBuffer) and two typed write-only properties for each column: one with the column name, to refer to the column in the code, and the other with the column name suffixed with _IsNull, to set the column value to null. Scroll all the way down to see an AddRow method, which adds an empty row to the output buffer, and a SetEndOfRowset method, which determines, using the EndOfRowset function, that the current buffer is the last buffer of data and passes this information to the data flow engine. Note that the ScriptBuffer class serves as the base class for the read-only classes representing the input and the outputs. You are not supposed to edit this auto-generated code, as it will be overwritten when you modify the Script component.

ComponentWrapper

The classes in the ComponentWrapper project item provide methods and properties to process data and to interact with the package objects. Double-click ComponentWrapper.vb to see the auto-generated code. This project item creates a UserComponent class that is inherited from the ScriptComponent class. The ComponentWrapper has an overridden implementation of the PrimeOutput method that is called only once at run time. The PrimeOutput method prepares the outputs to accept new rows and is used for the outputs where you add new rows to the output buffers, such as a source or a transformation with asynchronous output. This method then passes the processing to the CreateNewOutputRows method, the FinishOutputs method, and MarkOutputsAsFinished, which sets the end of rowset on the last output buffer. Note that the ComponentWrapper item also contains Connections and Variables collection classes. As with BufferWrapper, you are not supposed to edit this auto-generated code, as it will be overwritten when you modify the Script component.

Main

This project item contains the ScriptMain class, which inherits from the UserComponent class. In contrast to the other two project items, where you don’t write your code directly against them, you will be writing your code here using methods and properties provided by the derived classes in this project item. Double-click main.vb to see the auto-generated code for the configurations you’ve done earlier in the metadata design mode. Also, bear in mind that the code in the Main.vb item is generated only when you click the Edit Script button for the first time. Later, when you make changes to the Script component (for example, you may add more outputs), the Main.vb code is not updated with those changes and you have to add methods and properties manually to perform the additional functions. This behavior also means that the code you write in the Main.vb project item is persisted within the Script component and the auto-generation of code doesn’t affect it.

Note that the Imports statements reference two wrappers. These are the Primary Interop Assemblies for their respective namespaces. The run-time wrapper provides the classes and interfaces used to access the Control Flow components in the run time, while the pipeline wrapper provides the classes and interfaces used to create custom Data Flow components. This implies that the wrappers help your custom script to access package objects such as variables and connection managers, in the same way as the Dts global object provides access in the Script task.

The ScriptMain class is the entry-point class here, but there is no Main() subroutine such as you’ve seen in the Script task. In a script Source component, most of the work is performed in the CreateNewOutputRows() subroutine. The CreateNewOutputRows method is used along with the AddRow method to add new rows to the output and is primarily used when you’re writing code for a source or for an asynchronous output. You will study the synchronous and asynchronous outputs in the next part of this Hands-On exercise. There are two more subroutines, PreExecute() and PostExecute(). As the names indicate, these subroutines run any one-time tasks that are needed before and after the Script component processes its inputs and outputs, such as initializing or closing connections.


9. As we are going to use StreamReader to read our nonstandard text file, add the following namespace in the Imports statements:

Imports System.IO

Declare the following in the ScriptMain class:

Private textReader As StreamReader
Private SalesFile As String

10. Next, to use the package connection manager Sales in your script (the connection manager you specified in the Connection Managers page in the Script component GUI), you can use the AcquireConnections method along with the IDTSConnectionManager100 interface, which returns a reference to the required connection manager. You can override the AcquireConnections method to retrieve the connection information from the Sales Connection Manager. Add the following subroutine below the previous statements, as shown in Figure 11-12.

Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
    Dim connMgr As IDTSConnectionManager100 = Me.Connections.Sales
    SalesFile = CType(connMgr.AcquireConnection(Nothing), String)
End Sub

11. Now that the previous step has stored the connection string for the Sales.txt file in the SalesFile variable, you can initialize the textReader to open that file. As this is a one-time operation that needs to be performed before the component starts reading the file, we will use the PreExecute method. Add the following lines in the PreExecute subroutine:

MyBase.PreExecute()
textReader = New StreamReader(SalesFile)

12. Having opened the text reader, you need to close it after all the rows have been processed. As this is a one-time operation that needs to be performed after the rows are processed, you will use the PostExecute method for this. Add the following code in the PostExecute subroutine:

textReader.Close()

13. Finally, you can create new output rows as the text reader reads the text file. Add the following piece of code in the CreateNewOutputRows subroutine (refer to Figure 11-12).

Dim textLine As String
textLine = textReader.ReadLine

' Keep reading until the end of the file is reached
Do While textLine IsNot Nothing
    If textLine.StartsWith("BEGIN_RECORD") Then
        ' A new record starts: add an empty row to the output buffer
        SalesOutputBuffer.AddRow()
    ElseIf textLine.StartsWith("FirstName: ") Then
        ' Strip the label and keep only the value
        SalesOutputBuffer.FirstName = textLine.Remove(0, 11)
    ElseIf textLine.StartsWith("LastName: ") Then
        SalesOutputBuffer.LastName = textLine.Remove(0, 10)
    ElseIf textLine.StartsWith("Title: ") Then
        SalesOutputBuffer.Title = textLine.Remove(0, 7)
    ElseIf textLine.StartsWith("SalesAmount: ") Then
        ' Explicit conversion to match the DT_I4 output column
        SalesOutputBuffer.SalesAmount = CInt(textLine.Remove(0, 13))
    End If
    textLine = textReader.ReadLine
Loop

Figure 11-12 Code listing for the Script Source Component


This code reads a line of text using the textReader.ReadLine function and stores it in the textLine string variable. The read is inside a Do While loop that continues until there is nothing left to read. Each line that has been read passes through a chain of If...ElseIf conditions and, depending on which condition matches, the code either adds a new row to the output buffer using the AddRow method or writes the data into the matching column; a line that matches no condition is simply skipped. Last, it is a good practice to use a SetEndOfRowset method that determines that the current buffer is the last buffer of data, so that the downstream components know that no more rows are expected. One example of SetEndOfRowset is shown later in the chapter (refer to Figure 11-17). Close the scripting environment and click the OK button on the Script Transformation Editor, as you’ve created a Script Source component.

14. Though we will debug the package when it is ready, running the package now to test the script source will be a good idea and will confirm that the code we have built so far is working. To run the package and see how the file has been read, we will use a Row Count transformation, as it can consume pipeline rows without requiring a destination. Also, we will need a variable to configure this transformation.

15. Create a variable varRecords at the package scope of the Int32 data type with 0 as its value. Drop the Row Count transformation below the Script Source component and join both of them with the data flow path.

16. Add a grid-type data viewer to the path leading to the Row Count transformation to see the records in a tabular format. To add a data viewer, double-click the data flow path, go to the Data Viewers page, click Add, and choose the Grid type data viewer.

17. Execute the package and you will see the SalesOutput Data Viewer showing the records that we wanted to read in a tabular format, as shown in Figure 11-13. If you open the file at this point, you can appreciate that the Script component has read the data from a nonstandard text file and put it into a format that is much easier to read and to operate on. Also, it has nicely ignored the comments in the text file that we didn’t want to read anyway.

Figure 11-13 SalesOutput Data Viewer showing the data in a grid format
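To experiment with the parsing logic from step 13 outside SSIS, the following minimal sketch mirrors the same prefix matching in a plain console program. It is not the Script component itself: the SalesRow class is a hypothetical stand-in for a row of SalesOutputBuffer, ParseSales is an illustrative name, and the sample lines (including the comment line and the END_RECORD marker) are made up, since the real layout of Sales.txt is only known from the prefixes used in the script.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical stand-in for one row of SalesOutputBuffer.
Public Class SalesRow
    Public FirstName As String
    Public LastName As String
    Public Title As String
    Public SalesAmount As Integer
End Class

Public Module SalesParserSketch
    ' Same prefix matching as CreateNewOutputRows, but rows are collected
    ' in a list instead of being added to an SSIS output buffer.
    Public Function ParseSales(ByVal reader As TextReader) As List(Of SalesRow)
        Dim rows As New List(Of SalesRow)()
        Dim current As SalesRow = Nothing
        Dim textLine As String = reader.ReadLine()
        Do While textLine IsNot Nothing
            If textLine.StartsWith("BEGIN_RECORD") Then
                current = New SalesRow()   ' plays the role of AddRow()
                rows.Add(current)
            ElseIf current Is Nothing Then
                ' Anything before the first record is skipped.
            ElseIf textLine.StartsWith("FirstName: ") Then
                current.FirstName = textLine.Remove(0, 11)
            ElseIf textLine.StartsWith("LastName: ") Then
                current.LastName = textLine.Remove(0, 10)
            ElseIf textLine.StartsWith("Title: ") Then
                current.Title = textLine.Remove(0, 7)
            ElseIf textLine.StartsWith("SalesAmount: ") Then
                current.SalesAmount = CInt(textLine.Remove(0, 13))
            End If
            textLine = reader.ReadLine()
        Loop
        Return rows
    End Function

    Public Sub Main()
        ' Made-up sample data; the real Sales.txt may look different.
        Dim lines() As String = { _
            "# a comment line that matches no prefix", _
            "BEGIN_RECORD", _
            "FirstName: Ann", _
            "LastName: Lee", _
            "Title: Sales Manager", _
            "SalesAmount: 60000", _
            "END_RECORD"}
        Dim sample As String = String.Join(Environment.NewLine, lines)
        For Each row As SalesRow In ParseSales(New StringReader(sample))
            ' Prints: Ann Lee, Sales Manager: 60000
            Console.WriteLine("{0} {1}, {2}: {3}", _
                row.FirstName, row.LastName, row.Title, row.SalesAmount)
        Next
    End Sub
End Module
```

Because lines that match none of the prefixes fall through every branch, comment lines and the made-up END_RECORD marker are dropped without any extra code, which is the same reason the Script Source component ignores the comments in the file.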


Script Component as a Transformation

In this section, you will learn to configure a Script component as a transformation. You will be writing code for a script transformation more often than writing code for a script source or a script destination. So, it is more important to understand the components involved in designing code for a transformation. You have imported Sales data into the pipeline using a script source; now in this part of the exercise, you will derive bonuses paid out to the employees based on title and the sales amount that each employee achieves. Business also wants to know the total sales amount and the total bonus disbursed as separate reports. The business rules are defined in this way: Bonuses will be paid to all employees who achieve their targets. The target for a Sales representative is ten thousand dollars, that for a Sales Manager is fifty thousand dollars, and that for a Vice President is one hundred thousand dollars. A bonus is paid at a fixed rate of 2 percent of the sales amount, and business wants to know who has been paid bonuses and how much. Second, business also wants to know the aggregated sales and bonus amounts.
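The business rules above can be sketched as a small, SSIS-independent function. This is only an illustration under stated assumptions: the function names are hypothetical, the exact spelling of the titles in the data is assumed, reaching the target exactly is assumed to count as achieving it, and an unrecognized title is assumed never to earn a bonus.

```vb
Imports System

Public Module BonusRuleSketch
    ' Sales targets from the business rules.
    ' The title spellings are assumptions about the data.
    Public Function TargetFor(ByVal title As String) As Decimal
        Select Case title
            Case "Sales Representative"
                Return 10000D
            Case "Sales Manager"
                Return 50000D
            Case "Vice President"
                Return 100000D
            Case Else
                ' Assumption: unknown titles can never reach a target.
                Return Decimal.MaxValue
        End Select
    End Function

    ' Indicator: has the employee achieved the target for the title?
    Public Function IsBonusPaid(ByVal title As String, _
                                ByVal salesAmount As Integer) As Boolean
        Return salesAmount >= TargetFor(title)
    End Function

    ' Bonus at a fixed rate of the sales amount, or zero below target.
    Public Function BonusFor(ByVal title As String, _
                             ByVal salesAmount As Integer, _
                             ByVal rate As Decimal) As Decimal
        If IsBonusPaid(title, salesAmount) Then
            Return salesAmount * rate
        End If
        Return 0D
    End Function

    Public Sub Main()
        Dim rate As Decimal = 0.02D   ' 2 percent, as stated in the rules
        ' A Sales Manager with 60,000 is above the 50,000 target.
        Console.WriteLine(IsBonusPaid("Sales Manager", 60000))    ' True
        ' A Vice President with 60,000 is below the 100,000 target.
        Console.WriteLine(IsBonusPaid("Vice President", 60000))   ' False
        Console.WriteLine(BonusFor("Sales Manager", 60000, rate) = 1200D) ' True
    End Sub
End Module
```

Passing the rate as a parameter rather than hard-coding it keeps the sketch in line with the exercise, where the multiplier comes from a package variable.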

From this description, you can see that you need to derive an indicator for bonuses paid and derive the bonus amount per employee if he or she has achieved the target. Second, you need to aggregate the sales amount and the bonus amount and write those values into a separate file. It is also evident that while the first requirement can be derived using row-based operations, the second requirement is a complete rowset-based operation. This also leads us to a brief discussion of synchronous and asynchronous components. These components will be covered in detail in Chapter 15, but here just keep in mind that a synchronous component is one in which the output is synchronous to the input; for instance, the rows get processed as they come and row-level operations are performed, such as deriving a column value based on the other column values. On the other hand, the asynchronous components are the ones that perform operations on the complete rowset instead of one row, such as aggregations and sorting operations. These operations need all the rows before they can provide outputs. Moreover, the number of output rows can be different from (generally fewer than) the number of input rows, in contrast to the synchronous component, which outputs the same number of rows it receives at the input. One last but very important difference from the point of view of writing code: The synchronous components work on the same buffers of data and do not create or write data to new buffers; they simply add or change data in the same buffer. On the other hand, asynchronous components block inputs, collect all the input rows, perform the required operations, and write outputs to new buffers, which means they work with more than one buffer at a time. This also explains why the asynchronous components generally need more memory.

So, we will create two outputs for our script transformation. They will be, as you can guess, synchronous and asynchronous to meet both the requirements.
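The difference between the two kinds of output can be pictured with plain collections, without any SSIS objects. In this hedged sketch, all class and method names are hypothetical, a List stands in for a data flow buffer, and every row receives a bonus regardless of target to keep the example short. DeriveInPlace behaves like a synchronous output: it changes the rows it is given and the row count stays the same. Aggregate behaves like an asynchronous output: it has to read every input row before it can write a single row to a new, separate result.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical input row; the list that holds it stands in for a buffer.
Public Class SaleRow
    Public SalesAmount As Integer
    Public BonusAmount As Decimal
End Class

' Hypothetical row of the separate, aggregated output.
Public Class TotalsRow
    Public TotalSales As Long
    Public TotalBonus As Decimal
End Class

Public Module SyncAsyncSketch
    ' Synchronous style: row-by-row work on the same "buffer".
    Public Sub DeriveInPlace(ByVal rows As List(Of SaleRow), _
                             ByVal rate As Decimal)
        For Each r As SaleRow In rows
            r.BonusAmount = r.SalesAmount * rate
        Next
    End Sub

    ' Asynchronous style: consume all rows, then emit one new row.
    Public Function Aggregate(ByVal rows As List(Of SaleRow)) As TotalsRow
        Dim totals As New TotalsRow()
        For Each r As SaleRow In rows
            totals.TotalSales += r.SalesAmount
            totals.TotalBonus += r.BonusAmount
        Next
        Return totals
    End Function

    Public Sub Main()
        Dim rows As New List(Of SaleRow)()
        For Each amount As Integer In New Integer() {12000, 60000, 150000}
            Dim r As New SaleRow()
            r.SalesAmount = amount
            rows.Add(r)
        Next

        DeriveInPlace(rows, 0.02D)
        Console.WriteLine(rows.Count)          ' 3: same rows, same count

        Dim totals As TotalsRow = Aggregate(rows)
        Console.WriteLine(totals.TotalSales)   ' 222000: one output row
    End Sub
End Module
```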


Scripting a Synchronous Transformation

In this part you will create an IsBonusPaid indicator based on the targets and will derive the bonus using the sales amount. Though this kind of operation doesn’t require scripting in SSIS, as the Derived Column transformation is quite capable of performing such operations, this example will show you how you can implement complex business requirements that otherwise can’t be met using the preconfigured components in Integration Services.

18. When you are ready to proceed, create a variable for the bonus calculation at the package scope with the following details:

19. Delete the data flow path between Row Count and the Script Source component. Drop a Script component in between the Script Source component and the Row Count transformation. Choose Transformation and click OK in the Select Script Component Type dialog box. Rename this as the Script Transformation Component.

20. Join this new transformation component with the source and the Row Count transformation. Double-click the newly added script transformation component to open the editor. Change the ScriptLanguage to Microsoft Visual Basic 2008.

21. Just as with a Script task, you can make the package variables available to your script using the ReadOnlyVariables or ReadWriteVariables fields. Select the User::varBonusMultiplier variable in the ReadOnlyVariables field.

22. Note that you have one additional page this time, called Input Columns, in which you can select the columns you want to work with. In this page, when you select an input column, you can specify the Usage Type as ReadOnly or ReadWrite. This is quite a handy feature from a data security perspective, as it won’t let the columns marked with the ReadOnly usage type get accidentally updated. For example, when you derive data using your custom code and output that derived data in a new output column, it is advisable to mark the input column’s usage type as ReadOnly. You can assign an alias to the output column in the Output Alias column in case this is a ReadWrite column and you’re going to update this column. Also, keep in mind that you don’t need to select all of the available fields in the input; rather, it is efficient to select only the fields you’re going to work with. The unselected fields will be passed on to the downstream component as is, without any changes. Typically, the synchronous component works only on the columns used in the derivations in a buffer, the rest of the columns in the buffer stay untouched, and the buffer is passed on to the next component in the data flow.

23. Select the Title and SalesAmount columns and assign them a ReadOnly usage type.

24. Go to the Inputs and Outputs page. Note that you have two items here, Input 0 and Output 0. This is because a transformation has both inputs and outputs. Rename the Input 0 to XfrInput and the Output 0 to SynchXfrOutput. Expand the XfrInput to see the Title and SalesAmount fields that you’ve selected in the Input Columns page. Note that you can add more outputs here but not inputs. This is because a Script component can have only one input when used as a transformation or a destination. Also, remember that a script source doesn’t have any input. Click the SynchXfrOutput. Here are two important properties that need to be understood properly, as they can be quite useful on some occasions.

The first property is the ExclusionGroup, which is set to 0 by default. This implies that all the input rows are sent to all the outputs. If you add more than one output to your Script component and you do not want all the rows to be sent to all the outputs (i.e., you want to split the rows between outputs based on some criteria), you will need to set the ExclusionGroup to the same value on all the outputs to indicate that you want to split the rows among them; this may be any arbitrary nonzero value. In this case you will use the generated DirectRowTo<OutputName> method in your code to decide where to direct the different types of rows. For instance, along with the usual data split, you can use this method to direct error rows to one of the outputs and have a made-up error output, which otherwise is not provided by the Script component.

The second property is SynchronousInputID, which by default contains the ID of the input for the first output. This tells the Data Flow task to add rows from the input buffer to the output buffer of the component. When you add a second output or more outputs to your component, you need to set this ID yourself, as it is not set automatically. If you set the value of SynchronousInputID to None, the output becomes asynchronous, in which case you must add the rows to the output buffer after applying transformation logic on the input rows. You will use this attribute while adding an asynchronous output to this component in the next part of this Hands-On exercise.

25. With this component, you will be creating two more derived columns to meet the exercise requirements. So, add two output columns in the SynchXfrOutput with the following details:

BonusAmount numeric [DT_NUMERIC]

As you won’t be connecting to any outside data source, you don’t need to add any connection manager in the Connection Managers page.
