Hands-On Microsoft SQL Server 2008 Integration Services, Part 40


did in the control flow—drag the output of a data flow component and drop it onto the input of the next component, and the line formed connecting the data flow components on the Data Flow Designer surface is called the data flow path. They may look similar on the Designer, but there are major differences between a precedence constraint and a data flow path, as each represents a different functionality in its own right.

Note that the data flow path line is thinner than the precedence constraint line and can be either green or red, depending on whether it represents an Output Path or an Error Output Path. In the Control Flow, when you connect a task to another using a precedence constraint and click the task again, you will see another green arrow, indicating that you can configure multiple precedence constraints for a task. However, in the data flow, the data flow paths are limited to the number of outputs and error outputs available to the source component. There is another important difference: the data flow path actually simulates a pipe connecting the pipeline components in the data flow (remember that the data flow is also known as the pipeline) through which data flows, whereas a precedence constraint specifies a condition under which the next task can be executed in the workflow.

When you click a component, for example an OLE DB source, in the Data Flow Designer surface, you may see a combination of green and red arrows, depending upon the outputs available from the component. Some components have both output and error output paths; some have only one; and some, such as destinations, have no output. Our example component, the OLE DB source, has both an output and an error output available and hence shows both green and red arrows. After you connect a component to another component using a data flow path on the Data Flow Designer, you can configure the properties of the data flow path using the Data Flow Path Editor. This editor can be opened by choosing the Edit command from the context menu or simply by double-clicking the path. Once in the Data Flow Path Editor, you can configure properties such as the name, description, and annotation of the path on the General page; you can see the metadata of the data columns flowing through the path on the Metadata page; and you can add data viewers on the Data Viewers page. We will configure the properties of the data flow path in the following Hands-On exercise.

Hands-On: An Introduction to the Data Flow Task

The purpose of this Hands-On exercise is to introduce you to the Data Flow task and to show how you can monitor the data flowing through the package by exporting data from the [Person].[Contact] table of the AdventureWorks database to a flat file.

Method

We will not do much research in this package but will keep it simple, as this is just an introduction to the data flow. You will drop a Data Flow task on the control flow and then go on to configure it. As you know by now, the Data Flow task has its own development and designer environment in BIDS, which opens when you double-click the Data Flow task or click the Data Flow tab.

Exercise (Configure an OLE DB Connection Manager and Add a Data Flow Task)

To begin this exercise, create a new package, configure a connection manager for connecting to the AdventureWorks database, and then add a Data Flow task to the package.

1. Start BIDS. Create a new project with the following details:

Template: Integration Services Project
Name: Introduction to Data Flow
Location: C:\SSIS\Projects

2. When the blank solution is created, go to Solution Explorer and rename the package.dtsx file to My First Data Flow.dtsx.

3. As we will be exporting data from the AdventureWorks database, we need a connection manager to establish a connection to the database. Right-click anywhere in the Connection Managers area and choose New OLE DB Connection from the context menu. In the Configure OLE DB Connection Manager dialog box, click New to specify settings in the Connection Manager dialog box. Specify localhost or your computer name in the Server Name field and leave the Use Windows Authentication radio button selected. Choose the AdventureWorks database from the drop-down list in the Select Or Enter A Database Name field. Test the connection to the database using the Test Connection button before closing the open windows by clicking OK twice.
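Behind the dialog boxes, the connection manager you just configured resolves to an OLE DB connection string. The sketch below assembles one like it; the provider name (SQL Server Native Client 10 for SQL Server 2008) and the server name are assumptions for illustration, so adjust them for your environment.

```python
# A sketch of the OLE DB connection string the connection manager stores.
# Provider and server names here are assumed examples, not read from a package.

def build_oledb_connection_string(server, database, windows_auth=True):
    """Assemble an OLE DB connection string like the one SSIS stores."""
    parts = [
        "Provider=SQLNCLI10.1",      # SQL Server Native Client 10 (SQL Server 2008)
        f"Data Source={server}",
        f"Initial Catalog={database}",
    ]
    if windows_auth:
        # The Use Windows Authentication radio button maps to this keyword
        parts.append("Integrated Security=SSPI")
    return ";".join(parts)

print(build_oledb_connection_string("localhost", "AdventureWorks"))
```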

4. Go to the Toolbox; drag and drop the Data Flow task onto the Control Flow Designer surface. Right-click the Data Flow task and choose Rename from the context menu. Rename the Data Flow task as Export PersonContact.

5. Double-click Export PersonContact and you will be taken to the Data Flow tab, where the Data Flow Task field displays the currently selected task: Export PersonContact. Using this field, you can select the required Data Flow task from the drop-down list when your package has multiple Data Flow tasks.

6. Go to the Toolbox, and you will notice that the available list of tasks in the Toolbox has changed. The Data Flow tab has a different set of Integration Services components that are designed to handle data flow operations and are divided into three sections: Data Flow sources, Data Flow transformations, and Data Flow destinations. See the list of components available under each section.

Exercise (Add an OLE DB Source and a Flat File Data Flow Destination)

Now you can build your first data flow using an OLE DB source and a Flat File destination.

7. From the Data Flow Sources section in the Toolbox, drag and drop the OLE DB Source onto the Data Flow Designer surface. Double-click the OLE DB source to open the OLE DB Source Editor. You will see that the OLE DB Connection Manager field has automatically picked up the already configured connection manager. Expand the Data Access Mode list to see the available options. Leave the Table Or View option selected.

8. When you click in the Name Of The Table Or The View field, the Data Flow source connects using the connection manager settings and displays a list of tables and views. Select the [Person].[Contact] table from the list. Click Preview to see the first 200 rows from the selected table. Once done, close the preview window.

9. Click the Columns page in the left pane of the editor window. Note that all the external columns have been automatically selected. Uncheck the last five columns—PasswordHash, PasswordSalt, AdditionalContactInfo, rowguid, and ModifiedDate—as we do not want to output these columns. The Output Column shows the names given to the output columns of the OLE DB source, though you can change these names if you wish to do so (see Figure 9-6).
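Unchecking columns in the editor is equivalent to selecting only the remaining columns instead of the whole table. The sketch below builds that equivalent query; the full column list is assumed from the step's description of [Person].[Contact], not read from the database.

```python
# Sketch of the column selection in step 9: drop the last five columns
# of [Person].[Contact]. The column list is an assumption for illustration.

all_columns = [
    "ContactID", "NameStyle", "Title", "FirstName", "MiddleName",
    "LastName", "Suffix", "EmailAddress", "EmailPromotion", "Phone",
    "PasswordHash", "PasswordSalt", "AdditionalContactInfo",
    "rowguid", "ModifiedDate",
]

excluded = {"PasswordHash", "PasswordSalt", "AdditionalContactInfo",
            "rowguid", "ModifiedDate"}

# The remaining columns are what the OLE DB source will actually output.
selected = [c for c in all_columns if c not in excluded]
query = f"SELECT {', '.join(selected)} FROM [Person].[Contact]"
print(query)
```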

10. Go to the Error Output page and note that the default setting for any error or truncation in the data of each column is to fail the component. This is fine for the time being. Click OK to close the OLE DB Source Editor.

11. Right-click the OLE DB source and choose the Show Advanced Editor context menu command. This opens the Advanced Editor dialog box for the OLE DB source, in which you can see its properties exposed on four tabs. The Connection Managers tab shows the connection manager you configured in earlier steps. Click the Component Properties tab and specify the following properties:

Name: Person_Contact
Description: OLE DB source fetching data from the [Person].[Contact] table of the AdventureWorks database

Go to the Column Mappings tab to see the mappings of the columns from Available External Columns to Available Output Columns. Go to the Input and Output Properties tab and expand the outputs listed there. You will see the External Columns, the Output Columns, and the Error Output Columns. If you click a column, you will see the properties of the column on the right side. Depending upon the category of the column you're viewing, you will see different levels and types of properties, some of which you may be able to change as well. Click OK to close the Advanced Editor and you will see that the OLE DB source has been renamed. If you hover your mouse over the OLE DB source, you will see the description appear as a screen tip. Make a habit of clearly defining the name and description properties of Integration Services components, as this helps in self-documenting the package and goes a long way toward reminding you what a component does, especially when you open a package after several months to modify some details.

Figure 9-6 You can select the external columns and assign output names for them.


12. Go to the Toolbox and scroll down to the Data Flow Destinations section. Drag and drop the Flat File destination from the Toolbox onto the Designer surface just below the Person_Contact OLE DB source. Click Person_Contact and you will see a green arrow and a red arrow emerging from the source. Drag the green arrow over to the Flat File destination to connect the components together.

13. Double-click the Flat File destination to invoke the Flat File Destination Editor. In the Connection Manager page, click the New button shown opposite the Flat File Connection Manager field. You will be asked to choose a format for the flat file to which you want to output data. Select the Delimited radio button if it is not already selected and click OK. This opens the Flat File Connection Manager Editor dialog box. Specify C:\SSIS\RawFiles\PersonContact.txt in the File Name field and check the box for the Column Names In The First Data Row option. All other fields will be filled in automatically for you with default values. Click the Columns page in the left pane of the dialog box and see that all the columns you selected in the OLE DB source have been added. This list of columns is actually taken by the Flat File destination's input from the output columns of the OLE DB source. If you go to the Advanced page, you will see the available columns and their properties.

Click OK to add this newly configured connection manager to the Flat File Connection Manager field of the Flat File Destination Editor dialog box. Leave the Overwrite Data In The File option checked.
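The delimited format configured here produces a comma-separated file whose first row carries the column names. A minimal sketch of that output shape follows; the column subset and sample rows are invented for illustration, while the real file is written by the package to C:\SSIS\RawFiles\PersonContact.txt.

```python
# Sketch of the delimited flat file layout the Flat File destination writes:
# comma-delimited, column names in the first data row. Sample data is invented.
import csv
import io

columns = ["ContactID", "FirstName", "LastName", "EmailAddress"]  # subset for brevity
rows = [
    (1, "Gustavo", "Achong", "gustavo0@adventure-works.com"),
    (2, "Catherine", "Abel", "catherine0@adventure-works.com"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(columns)   # the "Column Names In The First Data Row" option
writer.writerows(rows)

print(buffer.getvalue())
```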

14. Go to the Mappings page to review the mappings between the Available Input Columns and the Available Destination Columns. Click OK to close the Flat File destination. Rename the Flat File destination to PersonContact Flat File.

Exercise (Configure the Data Flow Path and Execute the Package)

In this part, you will configure the data flow path that you used to connect the two components in the last exercise so that you can view the flow of data at run time.

15. Double-click the green line connecting the Person_Contact and PersonContact Flat File components to open the Data Flow Path Editor. In the General page of the editor, you can specify a unique Name for the path, type in a Description, and annotate the path. The PathAnnotation property provides four options for annotation: Never, to disable path annotation; AsNeeded, to enable annotation; SourceName, to annotate using the value of the Source Name field; and PathName, to annotate using the value specified in the Name field.

16. The Metadata page of the Data Flow Path Editor shows you the metadata of the data flowing through the path. You can see the name, data type, precision, scale, length, code page, sort key position, comparison flags, and source component of each column. The source component is the name of the component that generated the column. You can also copy this metadata to the clipboard if needed.


17. In the Data Viewers page, you can add data viewers to see the actual data that is flowing through the data flow path. This is an excellent debugging tool, especially when you're trying to find out what happened to the data. Let's add a data viewer. Click Add to configure a data viewer. In the General tab of the Configure Data Viewer dialog box, choose how you want to view the data by selecting from the Grid, Histogram, Scatter Plot (x,y), and Column Chart types of data viewers. Depending upon your choice of data viewer type, the second tab changes appropriately.

Grid: Shows the data columns and rows in a grid. You can select the data columns to be included in the grid in the Grid tab.

Histogram: Select a numerical column in the Histogram tab to model the histogram when you choose this data viewer type.

Scatter Plot (x,y): Select this option and the second tab changes to Scatter Plot (x,y), in which you can select a numerical column each for the x-axis and the y-axis. The two columns that you select here will be plotted against each other to draw a point for each record on the scatter plot.

Column Chart: Visualizes the data as column charts of counts of distinct data values. For example, if you are dealing with persons and use city as a column in the data, the column chart can show the number of persons for each city drawn as columns on the chart.

For our exercise, choose Grid as the data viewer type and leave all the columns selected in the Grid tab. Click OK to return to the Data Flow Path Editor dialog box, where you will see that a grid-type data viewer has been added to the Data Viewers list. Click OK to close this editor and you will see a Data Viewer icon alongside the data flow path on the Designer.
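What the Column Chart viewer computes is simply a count of distinct values in a chosen column. The sketch below reproduces that calculation for the city example; the city values are invented sample data.

```python
# Sketch of the Column Chart data viewer's calculation: counts of distinct
# values in one column. City values here are invented sample data.
from collections import Counter

cities = ["Seattle", "Bothell", "Seattle", "Redmond", "Seattle", "Bothell"]

counts = Counter(cities)
for city, n in counts.most_common():
    # One column per distinct value; here a bar of '#' stands in for the chart
    print(f"{city:8} {'#' * n}")
```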

18. The package configuration is complete now, but before we execute the package, it is worth exploring two properties of the data flow engine that affect the flow of data buffer by buffer through it. Right-click anywhere on the Data Flow Designer surface and choose Properties. Scroll down to the Misc section in the Properties window and note the following two properties:

DefaultBufferMaxRows: 10000
DefaultBufferSize: 10485760

These properties define the default size of the buffer as 10 MB and the default maximum number of rows a buffer can contain as 10,000. These settings give you control to optimize the flow of data through the pipeline.
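The interplay of these two properties can be sketched as follows: rows per buffer are capped both by DefaultBufferMaxRows and by how many rows of a given width fit into DefaultBufferSize. This is a simplified model of the engine's behavior, and the row widths used are assumed example values.

```python
# Simplified sketch of buffer sizing in the data flow engine.
# Row widths below are assumed example values, not measured ones.

DEFAULT_BUFFER_MAX_ROWS = 10_000
DEFAULT_BUFFER_SIZE = 10_485_760   # 10 MB

def rows_per_buffer(row_width_bytes):
    """Approximate rows the engine can place in one buffer."""
    by_size = DEFAULT_BUFFER_SIZE // row_width_bytes
    return min(DEFAULT_BUFFER_MAX_ROWS, by_size)

# Narrow rows hit the row-count cap; wide rows hit the size cap.
print(rows_per_buffer(200))    # capped at 10,000 rows
print(rows_per_buffer(4096))   # fewer rows fit in 10 MB
```

This is why, as you will see during execution, the number of rows per buffer varies with the width of the rows flowing through the path.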


19. Press the F5 key to execute the package. As the package starts executing, you will see a Grid Data Viewer window. As the package executes and starts sending data down the pipeline, the data viewer attaches to the data flow and shows the data in the buffer flowing between the two components. If you look at the status bar at the bottom of the data viewer window, you can see the count of the total number of rows that have passed through the data viewer, the total number of buffers when the viewer is detached, and the rows displayed in the current buffer. At the top of the data viewer window, you can see three buttons: Copy Data allows you to copy the data currently shown in the data viewer to the Clipboard; Detach toggles to Attach when clicked and allows you to detach the data viewer from the data flow, letting the data continue to flow through the path without being paused; and the green arrow button allows you to move data through the data flow buffer by buffer. When the package is executed, the data is moved in chunks (buffer by buffer) limited by the default buffer size and the default buffer maximum rows, 10 MB and 10,000 by default. Clicking the green arrow button allows the data in the first buffer to pass through while the data in the second buffer is held up for you to view (see Figure 9-7). Click the green arrow to see the data in the next buffer.

Figure 9-7 Data Viewer showing the data flow in the grid


20. After a couple of seconds, you will see the data in the next buffer. This time the total rows will be shown as a little less than 20,000 and the rows displayed will be a little less than 10,000. The total number of rows is also shown next to the data flow path on the Designer surface. The number of rows per buffer may vary for a different data flow, depending upon the width of the rows. Click Detach to complete the execution of the package.

21. Press Shift-F5 to stop debugging the package. Press Ctrl-Shift-S to save all the items in this project.

Review

In this exercise, you built a basic data flow for a package to extract data from a database table to a flat file. You also used a data viewer to watch the data flowing past and learned how to adjust the data buffer settings to fine-tune the flow of data buffer by buffer through the data flow.

Summary

You are now familiar with the components of the data flow and know how data is accessed from an external source by a data flow source, passes through the data flow transformations, and is then loaded into the data flow destinations. You have studied the data flow sources, data flow destinations, and data flow path in detail in this chapter and have briefly learned about the data flow transformations. In the next chapter, you will learn more about data flow transformations by studying them in detail and doing Hands-On exercises using most of the transformations.


Data Flow Transformations

In This Chapter

Row Transformations
Split and Join Transformations
Rowset Transformations
Audit Transformations
Business Intelligence Transformations
Summary
