
Review

Event handlers are like subpackages that respond to events occurring at run time, and if properly designed and developed, they can let you sleep soundly most nights. In this exercise, you created simple event handlers for the OnTaskFailed and OnPostExecute events and saw how these event handlers wake up when those events occur and perform the work they are configured to do. You can create quite complex event handlers to perform alternative actions when a particular event happens. However, you also need to be careful when using event handlers, because events are escalated to the parent container at the cost of CPU cycles unless you filter out the events that should not be sent to the parent container. Last, you can disable event handlers during design and testing by using the DisableEventHandlers property on the objects on which the event handlers have been defined.
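If you prefer to flip that switch outside the designer, for example from a deployment or test script, the DisableEventHandlers property is also exposed on containers in the SSIS object model. The following C# sketch is only an illustration; the package path and the task name are assumptions, not values from the exercise.

using Microsoft.SqlServer.Dts.Runtime;

class ToggleEventHandlers
{
    static void Main()
    {
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\SSIS\Packages\Events.dtsx", null);

        // Silence every event handler defined at the package level
        package.DisableEventHandlers = true;

        // Or silence handlers on a single task only (task name is hypothetical)
        TaskHost mailTask = (TaskHost)package.Executables["Mailing Opportunities"];
        mailTask.DisableEventHandlers = true;

        app.SaveToXml(@"C:\SSIS\Packages\Events_test.dtsx", package, null);
    }
}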

As a Data Source for Reporting Services Report

This excellent feature allows you to use an SSIS package as a data source inside a Reporting Services report. You will study the DataReader Destination of the Data Flow task in Chapter 9; it can act as a data source for external applications. The DataReader Destination is not actually a destination that writes data to some data store; rather, it is an interface that ASP.NET applications can use to connect to and read data from an SSIS package. This feature is not enabled by default in Reporting Services, so you have to enable it before using it. The following describes the steps you need to perform.

Enable SSIS as a Data Source

To enable SSIS as a data source, you need to configure two Reporting Services configuration files: RSReportDesigner.config and RSReportServer.config.

1. First, locate the RSReportDesigner.config file, which in a default installation exists in the C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies folder.

2. Open this file using any text editor or Visual Studio 2008; the latter displays the XML in a nicely indented format that is easy to work with. Locate the Extensions section and note that there are three different types of extensions: Render, Data, and Designer extensions. Because SSIS is being added as a data source, add the following code to the Data element, as shown in Figure 8-15:

<Extension Name="SSIS"

Type="Microsoft.SqlServer.Dts.DtsClient.DtsConnection,Microsoft.SqlServer Dts.DtsClient, Version=10.0.0.0, Culture=neutral, PublicKeyToken=89845dcd808 0cc91"/>


3. Then add the following code to the Designer element:

<Extension Name="SSIS"

Type="Microsoft.ReportingServices.QueryDesigners.

GenericQueryDesigner,Microsoft.ReportingServices.QueryDesigners"/>

4. Now locate and open the RSReportServer.config file, which in a default installation should be located in the C:\Program Files\Microsoft SQL Server\MSRS10_50.MSSQLSERVER\Reporting Services\ReportServer folder. In the file, locate the Extensions section and add the following code to the Data element:

<Extension Name="SSIS" Type="Microsoft.SqlServer.Dts.DtsClient

.DtsConnection,Microsoft.SqlServer.Dts.DtsClient, Version=10.0.0.0,

Culture=neutral, PublicKeyToken=89845dcd8080cc91"/>

5. You may have to restart the Reporting Services service for the changes to take effect.

Figure 8-15 Editing the RSReportDesigner.Config file



Using SSIS as a Data Source

After you have enabled SSRS to use SSIS as a data source, you can use this feature as follows:

1. First, build an SSIS package in which the data flow task has a DataReader Destination as its destination. The name you assign to the DataReader Destination will be used in Reporting Services, so keep it short. In our example, I'm using the default name, DataReaderDest.

2. Now start a new Reporting Services project in BIDS and add a data source. In the Type field, you will see SSIS listed as a data source. Select SSIS as shown in Figure 8-16, and then add -f C:\SSIS\Packages\Sales.dtsx in the connection string field. The SSRS data source will execute the package using the DTEXEC command and needs only the connection path in the connection string.

Figure 8-16 Using SSIS as a data source in the SSRS Report and specifying the connection path


3. Next, add a report and select the data source created in the preceding step. Then, in the query field, specify the name of the DataReader Destination, which in our case is DataReaderDest, as shown in Figure 8-17. You have now done everything required to use the SSIS DataReader Destination as a source in an SSRS report. Proceed to complete the report. When you run the report or preview it, you will see the data served at run time.

This method is easily configurable, though Microsoft does not recommend it. The primary concern is security, as the package runs under the Reporting Services service account, which might be an issue in some situations, so you need to be careful when you deploy this method. However, this example is very useful in demonstrating that the DataReader Destination can serve data dynamically to external ASP.NET applications.

Figure 8-17 Specifying the DataReader Destination name in the Query String
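To give you a feel for what such an external client might look like, the following C# sketch reads rows from the DataReader Destination of the Sales.dtsx package through the DtsClient provider. The package path and the destination name come from the preceding example; everything else is a minimal illustration rather than production code.

using System;
using System.Data;
using Microsoft.SqlServer.Dts.DtsClient;

class ReadFromPackage
{
    static void Main()
    {
        // The connection string is just the dtexec-style switch pointing at the package
        using (DtsConnection connection = new DtsConnection())
        {
            connection.ConnectionString = @"-f C:\SSIS\Packages\Sales.dtsx";
            connection.Open();

            // CommandText is the name of the DataReader Destination in the data flow
            DtsCommand command = new DtsCommand(connection);
            command.CommandText = "DataReaderDest";

            IDataReader reader = command.ExecuteReader(CommandBehavior.Default);
            while (reader.Read())
            {
                Console.WriteLine(reader.GetValue(0));   // first column of each row
            }
        }
    }
}

This is essentially what the SSRS data source does on your behalf: it runs the package and streams whatever arrives at the named DataReader Destination.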



Summary

This was an absorbing chapter, as you've worked through various advanced features provided in Integration Services to enhance the service quality of your packages. You've worked with logging and log providers and have enabled logging for a package in the Hands-On exercise. Then you worked with transactions and configured your package to use transactions to handle three different kinds of scenarios for processing transactional data. You also used checkpoints to restart your package from the point of failure and learned about the effect of using transactions on checkpoints. Later in the chapter, you extended the Contacting Opportunities package to create personalized e-mails using variables and property expressions. In the last Hands-On exercise, you created event handlers in your package and saw how they become active when a specified event happens.

With this chapter, you've concluded the workflow configurations of Integration Services projects and their administration. In the next two chapters, you will learn all about the data flow engine and how to transform data using Integration Services Data Flow transformations. Until now, your packages did not include any data flow or Data Flow tasks; however, from now on, we will focus mainly on data flow components and their configurations and will not do much in the control flow. Note that you cannot exclude control flow from your packages, as it must be part of an Integration Services package.


Data Flow Components

In This Chapter

From Control Flow to Data Flow
Data Flow Component Interfaces
Considerations for Bringing Data into Data Flow
Data Flow Sources
Data Flow Transformations
Data Flow Destinations
Data Flow Paths
Summary



Integration Services provides several data-oriented services and the related components that can be configured in the data flow task designer. The data flow within a package extracts data from a data source; performs transformations, computations, and derivations; and then loads the transformed data to the data destination. The assembly used to represent a data flow is called a data flow pipeline. The data flow engine of Integration Services is a data-driven execution engine that provides a steady flow of data streams onto which atomic computations or processes are applied.

The data flow comprises three main components: data flow sources, data flow transformations, and data flow destinations. Additionally, these components use data flow paths to move the data from one component to the next in the data flow.

This chapter discusses the separation of data flow from the control flow and the benefits thus achieved. It then discusses the data flow components, consisting of data flow sources, data flow transformations, data flow destinations, and data flow paths, plus the inputs, outputs, error outputs, and the external columns, to explain their functions. At the end of the chapter, you will work through a Hands-On exercise to use these components.

From Control Flow to Data Flow

An Integration Services package relies on control flow and data flow as integral parts, though other parts such as event handlers and log providers also support the package. The data flow consists of three major components, data flow sources, data flow transformations, and data flow destinations, which are connected via data flow paths. One of the key enhancements in Integration Services over DTS 2000 is the separation of the control flow and data flow engines. The immediate predecessor to SSIS, SQL Server 2000 Data Transformation Services, had all the control and data flow features available in one engine and hence was difficult to use for creating complex packages to meet some stringent data requirements. Integration Services introduces two engines, the Integration Services run-time engine and the Integration Services data flow engine, which separate data flow from control flow.

Inherently, the requirements of control flow and data flow are different for operational and performance reasons. The requirements that drive the creation of an ETL package are to extract data, perform some transformations, and load; but when you get to the design board, you realize that the package needs to populate variables along the way, wait for data to be available that might be populated by some other part of the package, perform operations in a particular order, move data from buffer to buffer, and so on. This ordering of tasks, availability of data, and ability to move data as fast as possible from one buffer to another after applying transformations drive the designers in two opposing directions: applying control to the package work flow and applying transformations that are fast enough to the data flow. Although DTS 2000 fits the bill for tackling these operations, it lacks the ability to strike a perfect balance between control and data flow requirements for the packages you design for various purposes.

The provision of a separate run-time engine and data flow engine provides better control over packages and enhances the way packages can be designed with a separate data flow. The run-time engine controls the order of the package workflow and provides associated services, such as the ability to define an alternative workflow on the occurrence of an event, to apply breakpoints on the tasks, to manage connections to the data sources, to log data, and to manage transactions and variables. The data flow engine provides high-performance data movement functionality, transformations optimized for speed, and great extensibility by allowing you to create custom transformations on top of the prebuilt data sources, transformations, and destinations, which together are capable of meeting most of your requirements. You can include multiple Data Flow tasks in a package, with each Data Flow task able to support multiple data sources, transformations, and destinations.

When you drop the Data Flow task on the Control Flow designer surface, you invoke the data flow engine. The Data Flow task is a special task provided in Integration Services that allows you to create data movements and transformations for data in a package. The Data Flow task replaces the DTS 2000 tasks, such as the Data Transformation task and the Data Driven Query task, which used to provide data movement and manipulation functionality. Unlike other tasks, double-clicking the Data Flow task doesn't invoke a Properties or Editor dialog box; instead, it opens another designer surface represented by the Data Flow tab in BI Development Studio (BIDS). The Data Flow task, accessed through the Data Flow tab, provides many additional components, such as data sources, transformations, and destinations, to build the data movement and transformation part of your package. The following section covers the components available in the Data Flow tab.

Data Flow Component Interfaces

The Data Flow task consists of source and destination adapters that extract and load data between heterogeneous data stores; transformations that modify, summarize, cleanse, and extend data; and the paths that link these components together by connecting the output of one component to the input of another component, thus providing a sequence to the data flow. Some components accept multiple inputs and can have multiple outputs. As mentioned, the Data Flow task can have multiple data sources, transformations, and destinations. These components are designed for high performance, with a focus on data movement and manipulation efficiency.

All these components are available in the Toolbox window of the Data Flow tab and can be added to the data flow just like control flow tasks. These components are categorized in three ways: data flow sources, data flow transformations, and data flow destinations. Data flow sources are used to extract data, data flow transformations help in modifying the extracted data, and this modified data is finally loaded into the data silos using data flow destinations. While these components perform their functions, they use data flow paths to move data from a source component to the destination component. These data flow paths are similar to pipes in a pipeline. A data flow path is quite interesting and helpful for debugging and visualizing the data; you can see the data and metadata flowing through a data flow path using data viewers. Data flow paths are covered in detail later in the chapter.

Each of the data flow components may have one or more inputs, outputs, or error outputs associated with it. A data flow source reads the data from the external data source, such as a flat file or a table of a relational database, using its external interface, and then makes this data available to the downstream components. A data flow source uses outputs to send the data in one or more columns via the data flow path to the inputs of the transformations or destinations. A transformation receives data at its inputs and sends data out at its outputs. Finally, a destination receives data at its inputs and sends data out through its external interface to the destination. These components can optionally be configured to have error outputs as well, which can contain all the input columns plus two extra columns, one for an error code and one for an error column, to indicate the reason for failure and the failing column. Let's discuss these interfaces in detail.

External Metadata

This interface writes or reads data to and from external data stores (for example, an Excel worksheet) and keeps a copy of their metadata. At design time, when you create a data flow and add sources to your package, the metadata of the data from the sources is copied to the external columns on the data flow sources. Similarly, a data flow destination keeps a copy of the destination metadata in its external columns as a snapshot of the destination data store and compares the data schema with this snapshot before writing to the data store. Because the external columns keep a schema snapshot of the external data stores, they help in the process of package validation. So, if a data store that doesn't exist at design time is to be created at run time, or if you make changes to a data store without updating the package, you will get validation errors or warnings displayed by SSIS components. Some of these warnings can be handled by using the ValidateExternalMetadata property on the components, which works like the DelayValidation property on tasks and delays the validation of the component until run time.
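You normally set ValidateExternalMetadata in a component's Advanced Editor, but the same flag is reachable from the pipeline object model if you want to flip it for every component in a data flow. The following C# sketch is illustrative only; the package path and the assumption that the first executable in the package is the Data Flow task are made up for the example.

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class DisableExternalValidation
{
    static void Main()
    {
        Application app = new Application();
        Package package = app.LoadPackage(@"C:\SSIS\Packages\Sales.dtsx", null);

        // Assumes the first executable in the package is the Data Flow task
        TaskHost dataFlowHost = (TaskHost)package.Executables[0];
        MainPipe pipeline = (MainPipe)dataFlowHost.InnerObject;

        foreach (IDTSComponentMetaData100 component in pipeline.ComponentMetaDataCollection)
        {
            // Like DelayValidation on a task, this defers schema checks to run time
            component.ValidateExternalMetadata = false;
        }

        app.SaveToXml(@"C:\SSIS\Packages\Sales_deferred.dtsx", package, null);
    }
}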

Inputs

Inputs receive data from the data flow path in the input columns. The components keep a copy of the metadata in the input interface as well, and the data received at the inputs can be validated against the cached schema. You can configure an input to fail the component, ignore the failure, or redirect a row in case it receives errors in the input columns. In an Integration Services data flow, only the destinations and transformations have inputs. You might think that because the data flow sources bring data into the data flow, they have inputs, but this is not the case: the data flow sources use external columns to bring data into the data flow. Inputs and outputs (discussed in the next section) are used when data flows from one data flow component to another data flow component. Whenever external data sources or storages are involved, the data flows directly through the data flow sources or data flow destinations via the external interface.

Outputs

The data is sent out from the outputs of the sources and transformations through the data flow path to the inputs of the downstream components, which could be transformations or destinations. The output columns' data can be validated against the external column schema, depending on the data flow component's configuration. The output columns are exposed in the Input and Output Properties tab of the Advanced Editor, where you can configure them.

Error Outputs

When the data flows through the processing logic of a component, errors and mismatches can occur. These errors could be related to a data type mismatch between the input data and the corresponding metadata of an external column, or the incoming data may be longer than the length defined in the corresponding external column. These kinds of mismatches may cause data rows to raise errors or truncations. You can configure the error handling in the component to fail the component on an error, ignore the error, or redirect the failing rows to an error output, and you can specify this error handling behavior for each column using one of the three options. Whenever the failing rows are redirected to an error output, two additional columns, ErrorColumn and ErrorCode, are also added to indicate the failing column and the code of the failure.
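Those per-column choices correspond to the ErrorRowDisposition and TruncationRowDisposition properties in the pipeline object model. As a rough illustration, and assuming you already hold an IDTSComponentMetaData100 reference for a component that supports an error output, a helper like the following would switch every input column to redirect failing rows:

using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

static class ErrorOutputHelper
{
    // Redirect both error rows and truncated rows for every column on the
    // component's first input instead of failing the whole component.
    public static void RedirectErrorRows(IDTSComponentMetaData100 component)
    {
        IDTSInput100 input = component.InputCollection[0];
        foreach (IDTSInputColumn100 column in input.InputColumnCollection)
        {
            column.ErrorRowDisposition = DTSRowDisposition.RD_RedirectRow;       // conversion and data-type errors
            column.TruncationRowDisposition = DTSRowDisposition.RD_RedirectRow;  // values longer than the column allows
        }
    }
}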

Considerations for Bringing Data into Data Flow

Integration Services uses the data flow engine to extract data from a data source, transform and modify the data once it is in the data flow, and then load it into an external data store. The data flow engine uses data flow sources for extracting data from a data store. Figure 9-1 shows how the data is extracted and parsed by a data source; it also shows how an erroneous record will be handled.

In Figure 9-1, two records demonstrate how data flows through a data flow source. As the data flow source reads the data from the external source, it translates the source
