Hands-On Microsoft SQL Server 2008 Integration Services part 43 potx

What happens with partial cache mode is that the partial reference data rows are loaded into the cache based on the cache size value and as the lookup operation applies, the input data r

Trang 1

specify in the Cache Size fields Two Cache Size fields have been provided in the Partial caching section—one is for a 32-bit system and the other one applies to a 64-bit system What happens with partial cache mode is that the partial reference data rows are loaded into the cache based on the cache size value and as the lookup operation applies, the input data rows are matched against the reference data in the cache If a match is found, the row is directed to the match output, and

if a match is not found in the cache, a query is sent to the OLE DB source to find

if a match exists Again, if a match is found the row is sent to the match output; however, if a match is not found in the reference database the non-matching row

is directed to the error output or no-match output, depending on how you’ve chosen to handle the no-match rows This trip to an external reference database to find a match or no match could be quite expensive and can hurt the performance

of a lookup operation If your reference data is not changing while the package is running and you’ve sufficient free memory available on the server, you can choose

to store the no-matching rows in the cache using the option “Enable cache for rows with no matching entries.” You can also apply a limit on the amount of cache

no matching rows can use by specifying a percentage value in the Allocation From Cache field The next time the no-matching row with the same key appears in the input data set, the lookup transformation already knows about this, as it has it in the cache and doesn’t do a trip out to the database This process goes on till the cache is completely filled up, at which time the Lookup transformation removes the least frequently used rows from the cache

You can also improve the query performance by selecting the Modify The SQL Statement check box Using this option, you can use an SQL statement to select a subset of the reference data set For example, if you have selected a table or a view

as the reference data set in the Connection page that contains address details for the whole country and your input records contain records for a state only, you don’t need to load all the reference data set in the memory; rather, you need to load only the particular state’s address details To limit the reference data set dynamically, you can also use values or parameters in the WHERE clause of the SQL statement and use the Parameters button to map parameters to input columns When you use parameters in the WHERE clause mapped to input columns, these parameters are first evaluated from the input data set at run time and limit the data set that needs

to be referenced and hence increases overall performance

No cache

c This option does not load reference data set into the cache; rather,

it makes a connection to the external data source using an OLE DB Connection Manager to perform the lookup operation This option works like partial cache mode with a difference that all the input rows are matched against data in the database table, other than the last retrieved row, which always resides in the cache

Trang 2

When you choose this option, the Partial Caching section in the Advanced option

is disabled as there is no cache to configure However, the Custom query can be

used to modify the SQL statement that is applied in the Connection page

You can specify the connection settings for the reference data set in the Connection

page The graphical interface in this page changes to match the options you have

selected in the General page When you select full cache mode, you can choose a

Connection type of either a Cache Connection Manager or an OLE DB Connection

Manager If you choose the Cache Connection Manager option, the Connection page

has only one field in which you can specify the Cache Connection Manager However,

if you choose an OLE DB Connection Manager in the case of full cache mode or are

using a partial cache mode or no cache mode, the interface of the Connection page

changes to allow you to specify settings for an OLE DB Connection Manager In this

case, the reference data set can be a table or view in a database or can be a result of an

SQL statement So, you specify an OLE DB Connection Manager, select a table or

view or an SQL query if you prefer to use an SQL query in the Connection page In

this page, you can either select an existing table or create a new table by clicking the

New button provided for the “Use a table or a view” option and can populate this table

at run time before this Lookup transformation executes If you select “Use results of

an SQL query,” you can either type your SQL statement directly in the Option field

or you can build the query with the Query Builder, which you can invoke by clicking

Build Query You can also read the query from a file by clicking Browse

You specify the join columns in the Columns tab and can also select reference columns

that you want to include in the data flow To make a join, drag a column from Available

Input Columns and drop it on the column you want to join with in the Available Lookup

Columns list You won’t be able to map or join two columns that don’t have matching

data types To select a column to include in the output, check the box provided for the

Available Lookup Columns As you select a column, it will be listed in the lower half

of the editor window, where you can select the lookup operation and assign an output

alias as well to the selected column You can add the selected column as a new column or

replace the values of the same column in the input columns

Finally, note a few cautions when you add this transformation in the data flow The

Lookup transformation’s lookup operation is case-sensitive—i.e., the string comparison

in the lookup is case sensitive—while this may not be the case with your database if

you’re using a case-insensitive collation So you need to be careful with the input data

set case, vis-à-vis the case of the reference data set, and you also need to understand

where the comparison is made If you’re using full cache mode, the comparison is

made within the lookup transformation and is case sensitive; however, if you are using

no-cache mode, then the lookup operation is made in the database engine and, based

Trang 3

on the database collation, could be case-insensitive Similarly, you need to be clear how you are configuring your cache in the partial cache mode and where the comparison will happen One solution may be to convert both the string data sets to either lowercase

or uppercase for this transformation Another reason for lookup failures could be the null values in data If the lookup operation matches columns that contain null values,

it will fail when you are using no or partial caching and the comparison happens in the database engine instead of in the lookup transformation A possible solution for this could be either to use full caching mode or to remove null values by modifying the SQL statement Last, you can join (map) columns in the lookup transformation with any data type other than DT_R4, DT_R8, DT_TEXT, DT_NTEXT, or DT_IMAGE because they are not supported data types for joins

Hands-On: Updating PersonContact Data

You receive contact details for persons interested in your products from different sources As a policy, you standardize data as you insert it into a PersonContact table by performing lookups for address details on the basis of postal code, which is a mandatory requirement for the data to be accepted This data mart will be used for marketing purposes, and hence you want to make sure that all the contacts have a Salutation and there are no duplicated contacts in the table

Method

In this exercise, you receive data in two different file formats, the Microsoft Excel file format and the flat file format, and you will load this data into a PersonContact table in the Campaign database after applying the required transformations In the data flow of this package, you’ll combine the data from these two files, format it to the required data types, generate a salutation in the data flow, perform a lookup on the postal code to get the correct city, and delete duplicate records before loading it into the table Here is the step-by-step process:

Add data flow sources to get the data in the data flow

C Convert the data to match the data types outputted by both the data flow sources C

and then combine the data from both data streams

Derive a salutation

C Enhance address details

C

Trang 4

Delete duplicates and insert data into the PersonContact table.

c

Execute the package to see the results

c

Exercise (Add Data Flow Sources)

You will create a new Integration Services project that will subsequently be used for

all the packages developed in this chapter You will add a data flow in the package and

then will add data flow sources in the data flow

1 Create a new Integration Services project with the following details:

Name Data Flow transformations

Location C:\SSIS\Projects

2 When the blank project is created, rename the package as Updating

PersonContact.dtsx.

3 Drop the Data Flow task from the Toolbox onto the Control Flow Designer

surface and then double-click it to go to the Data Flow tab

4 In the Data Flow tab, drag and drop a Flat File source and an Excel source from

the Toolbox onto the Designer surface

5 Double-click the Flat File source to open the Flat File Source Editor Click

the New button opposite the Flat File Connection Manager field to create a

connection manager

6 In the Flat File Connection Manager Editor, type PersonDetails01 in the

Connection Manager Name field Specify C:\SSIS\RawFiles\PersonDetails01.txt

in the File Name field Select the check box for the “Column names in the first

data row” option The rest of the options will be automatically filled for you with

default, as shown in Figure 10-1

7 Go to the Columns page and the columns will be listed for you as the Flat File

Connection Manager selects {CR}{LF} as the row delimiter and comma {,} as

column delimiter values automatically

8 Go to the Advanced page where you can specify the column data type details In

this exercise, our final destination is the PersonContact table, so you can check

the schema of this table and update column data types accordingly The following

table shows these details, which you need to apply to the column properties in

this page For example, to specify settings for the Postcode column, click this

column and then in the right side pane, go to the OutputColumnWidth field and

change it to 12 Make sure DataType is set to String [DT_STR] Similarly apply

Trang 5

the following settings to all the columns After having applied all the settings, click OK to close the page and return to the Flat File Source Editor

Figure 10-1 Flat File Connection Manager settings

Trang 6

9 Go to the Columns page and check out that all the Available External Columns

have been selected Click OK to close the editor Right-click the Flat File source

and rename it PersonDetails01.

10 Right-click the Excel source and rename it PersonDetails02 Double-click

PersonDetails02 to open the Excel Source Editor Click the New button provided opposite to the OLE DB Connection Manager field

11. In the Excel Connection Manager dialog box, specify C:\SSIS\RawFiles\

PersonDetails02.xls in the Excel File Path field Leave Microsoft Excel 97-2003

specified in the Excel Version field, and leave the check box selected for “First

row has column names.” Click OK to create this connection manager

12. In the Excel Source Editor, leave “Table or view selected” in the Data Access

Mode field Select PersonDetails02$ in the “Name of the Excel sheet” field from

the drop-down list and go to the Columns page to see that all the Available

External Columns have been selected Click OK to close this editor

13 Rename the Excel Connection Manager to PersonDetails02.

Exercise (Combine Two Data Streams)

In this part of the exercise, you will add a Data Conversion transformation to convert

the data type of columns coming from the Excel source to match with the data types

of columns coming through Flat File source and then use Union All transformation to

combine these two data streams

14. Right-click the Excel source and choose Show Advanced Editor from the context

menu Go to the Input and Output Properties tab, expand the Excel Source Output and check out the properties of columns listed in External Columns and Output

Columns Note that these columns are of the Unicode string [DT_ WSTR] data

type with column width (Length) equals 255 These columns need to be converted

to the data type so that you can combine them with the columns from the Flat File

source Close the Advanced Editor

15. Drop a Data Conversion transformation from the Toolbox onto the Data Flow

surface just below the Excel source Join the Excel source with this transformation using the green arrow

16. Double-click the Data Conversion transformation to open its editor As you

want to convert the data type of all the columns, select the check boxes provided

in front of all the columns in the Available Input Columns As you select the

columns, a row for each column will be added in the lower grid section, where

you can configure the changes

17. For the FirstName Input Column, note its Output Alias, which means the converted column will be added to the transformation output as Copy of FirstName Click

in the Data Type field and change it to string [DT_STR] from the drop-down list

Trang 7

and change the Length to 50 A Code Page will be selected by default based on the code page of the computer; however, you can change it if you need to by selecting

a different code page from the drop-down list Similarly, change settings for all the columns as shown in the Figure 10-2

18. Click OK to close the transformation Rename the Data Conversion transformation

Converting PersonDetails02.

19. Drop the Union All transformation from the Toolbox on the Data Flow surface between PersonDetails01 and Converting PersonDetails02 Join both of these components to the Union All transformations using green arrows

Figure 10-2 Converting Excel source data using Data Conversion transformation

Trang 8

20. Double-click the Union All transformation to open the editor Here, you will

see Output Column Name columns automatically mapped to Union All Input

1 columns, whereas you will need to map Union All Input 2 columns yourself

Click in the FirstName column under Union All Input 2 and select the converted

Copy of FirstName column If you select FirstName in this column, you will get

an exception error for incompatible data type Map all the output columns to the

converted columns as you did for FirstName column (Figure 10-3)

21 Click OK to close the editor Rename the Union All transformation Merging

PersonDetails01 and PersonDetails02.

Figure 10-3 Combining two data streams using the Union All transformation

Trang 9

Exercise (Derive Salutation)

You will use a Derived Column transformation in this part to derive a salutation You need to derive a salutation as Mr., Mrs., or Miss

22. Drop a Derived Column transformation from the Toolbox below the Merging PersonDetails01 and PersonDetails02 and join both of them by dragging the green arrow onto the Derived Column transformation

23. Double-click the Derived Column transformation to open the editor This transformation has an Expression Builder interface Click in the Derived Column

Name field and type Salutation By default, <add as new column> will be displayed

in the Derived Column field, which you can change to replace an existing column from the drop-down list

24. Type the following expression in the Expression field (refer to Figure 10-4): Gender == "m" ? "Mr." : (Gender == "f" && Married == "y" ? "Mrs." : "Miss")

Figure 10-4 Deriving a salutation using Derived Column transformation

Trang 10

In this expression, you are using a conditional operator that returns one of the two expressions based on the evaluation of the Boolean expression

25. Select string [DT_STR] from the drop-down list in the Data Type field and specify

4 in the Length column if not automatically specified If you can’t change the data

type here, you can in the Input and Output Properties tab of the Advanced Editor

Click OK to close the editor Rename this transformation Deriving Salutation.

Exercise (Enrich Address Details)

One of the requirements you will address in this exercise is to standardize the address detail

by including City on the basis of the Postcode available in the data You’ve a table called

PostalCity in the Campaign database that contains a list of cities against postal codes, and

you will perform a lookup operation to get the correct city against a postal code As a lookup operation is a case-sensitive operation in SSIS, you will convert the Postcode column in

both places—i.e., in the pipeline as well as in the reference data to UPPERCASE Let’s

first use a Character Map transformation to convert the pipeline data

26. Drop a Character Map transformation from the Toolbox on the Data Flow

Designer Join the Character Map transformation with Deriving Salutation by

dragging and dropping the green arrow from the latter

27. Double-click the Character Map transformation to open the editor Click in the

check box next to the Postcode column to select it A row will be added in the

lower grid area with Input Column selected as Postcode

28. Check that the Destination column in the lower grid has only two possible values: New Column and In-place Change Select the In-Place Change value in the

Destination column Note that the Output Alias is also changed to Postcode (see

Figure 10-5)

29. Invoke the drop-down list of available operations in the Operation field Select

Uppercase by clicking the check box in front of it Click OK to complete your

selection

30 Click OK to close the editor Rename Character Map transformation Uppercasing

Postcode.

31. Now drop a Lookup transformation on the Data Flow surface after the Uppercasing Postcode transformation Join both of these transformations using a green data

flow path

32. Double-click the Lookup transformation to open the editor Select the Partial

Cache mode in the General page

33. In our data stream, some of the postal codes will not find matches for cities, so

they will fail the Lookup transformation You will configure this transformation

to divert these mismatching records to a flat file so that you can review them later

on Select the “Redirect rows to no match output” option under the “Specify how

to handle rows with no matching entries” field

Định dạng
Số trang	10
Dung lượng	474,83 KB