Hands-On Microsoft SQL Server 2008 Integration Services (Part 38)


Figure 9-5 Flat File Advanced Editor showing FastParse and UseBinaryFormat properties

OLE DB Source

You will be using an OLE DB source whenever you need to extract data from a relational database that is OLE DB–compliant, such as Microsoft SQL Server, an Oracle database server, IBM's DB2 database server, an Access database, or even an Excel file. An OLE DB provider is used in an OLE DB Connection Manager to connect to the database, and an OLE DB source uses the OLE DB Connection Manager to bring the data into the data flow. As a reminder, if you use an Integration Services Data Source inside an OLE DB Connection Manager, you can make that Data Source or Data Source View available through the OLE DB Connection Manager to all the packages within an Integration Services project. Not all OLE DB providers are the same, so you should check the documentation of the OLE DB provider that you want to use in your package, as some limitations are associated with different types of OLE DB providers. The OLE DB source has one regular output and one error output.

This data flow component has a custom user interface that is similar to the Excel source adapter discussed earlier, and its properties can also be configured using the Advanced Editor. When you open the OLE DB Source Editor, it takes you to the Connection Manager page, where you can specify an OLE DB Connection Manager from the drop-down list provided in the OLE DB Connection Manager field. The OLE DB source provides the following four data access mode options:

Table Or View: Specify the name of the table or the view from which you want to extract data.

Table Name Or View Name Variable: Specify the name of the variable that holds the name of the table or the view.

SQL Command: Write an SQL statement to extract data. You have the option of using a parameterized SQL query here (see the sketch after this list).

SQL Command From Variable: Specify the name of the variable that holds the SQL statement to be passed on to the OLE DB source.
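
As an aside, the following is a minimal sketch of such a parameterized query, issued here through the .NET OLE DB provider to show the positional ? placeholder syntax the SQL Command access mode uses. The connection string, provider, and the Production.Product table are assumptions based on the AdventureWorks sample, not part of the book's text.

```csharp
// Minimal sketch: a parameterized OLE DB query using the positional '?'
// placeholders that the SQL Command access mode expects. The connection
// string and the Production.Product table are assumptions (AdventureWorks).
using System;
using System.Data.OleDb;

class OleDbParameterizedQueryDemo
{
    static void Main()
    {
        var connectionString = "Provider=SQLNCLI10;Data Source=localhost;" +
                               "Initial Catalog=AdventureWorks;Integrated Security=SSPI;";
        using (var connection = new OleDbConnection(connectionString))
        {
            var command = new OleDbCommand(
                "SELECT ProductID, Name FROM Production.Product WHERE Color = ?",
                connection);
            command.Parameters.AddWithValue("@color", "Red"); // binds to the first '?'
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
            }
        }
    }
}
```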

For more details on these options, refer to the section "Excel Source," where they have been discussed in detail.

In the Columns page, you can see the Output Columns mappings to the External Columns. The matching name will be written for you in the Output Column, which you can change if you wish.

In the Error Output page, you can specify how the OLE DB source should handle an error or a truncation for each column.

Raw File Source

All the prebuilt Integration Services sources other than the Raw File source require a connection manager to connect to a data source. The Raw File source doesn't require a connection manager; it establishes a direct connection to the file containing the raw data and brings that raw data into the data flow. The raw data written in the raw file is native to the source and requires no parsing or translation during import, so the Raw File source can extract the data much faster. You can use raw files to stage data because of their fast read and write operations; for example, in scenarios where you export data from your system for later transformation and loading back into a similar system, raw files may be an ideal choice. The Raw File source has only one output and no error output.

The Raw File source has been given a custom user interface in this version of Integration Services, but that doesn't mean you can't use the Advanced Editor to configure its properties. The custom UI is very simple and contains only two pages:

Connection Manager: Though this page is named like the ones in other components where you select a connection manager, this component doesn't use any SSIS connection manager; rather, it connects to the raw file directly. In the Access Mode field, you can specify a filename or choose to get the filename from a variable. Depending on your choice of access mode, the interface changes the available fields to collect the relevant information. Choosing File Name lets you specify the raw filename in the File Name field, and choosing File Name From Variable lets you specify the name of the variable that holds the raw filename in the Variable Name field.

Column Mappings: Shows the mappings of external columns to output columns and allows you to rename the output columns.

Script Component Source

The preconfigured data flow sources in Integration Services have only one output available; for example, the Flat File source, Excel source, and OLE DB source all have single outputs. If you need to output data to more than one downstream component in the data flow, you can't do that using these preconfigured components. The only option you have in this case, or in other similar cases when the existing source components do not meet your requirements, is to create your own data flow source. You can write a custom component, but an easier option is to use the Script Component as a source. The Script Component is not shown under Data Flow Sources as a data source, but it can be configured as one. When you drop the Script Component onto the Data Flow Designer surface, you are asked to select whether you want to configure it as a source, a transformation, or a destination. Based on your choice, the Script Component customizes the interface and options appropriate for the purpose. As a source, the Script Component doesn't have any input and starts with only one output. You can add additional outputs using the Add Output button in the Inputs and Outputs page. Configuring a Script Component as a data flow source is covered in Chapter 11.
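
To give a flavor of what this looks like, here is a minimal sketch of a Script Component configured as a source, written in C#; the output name Output 0 and its Id and Name columns are assumptions for illustration, defined beforehand in the Inputs and Outputs page.

```csharp
// Minimal sketch of a Script Component source (SSIS 2008, C#). SSIS generates
// the Output0Buffer class from the output named "Output 0"; the Id and Name
// columns are hypothetical and must be defined in the Inputs and Outputs page.
using Microsoft.SqlServer.Dts.Pipeline;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void CreateNewOutputRows()
    {
        // Each AddRow call pushes one new row into the data flow.
        Output0Buffer.AddRow();
        Output0Buffer.Id = 1;
        Output0Buffer.Name = "First row";
    }
}
```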


XML Source

The XML source reads XML data from an XML file or from a variable containing XML data and brings that data into the data flow. This source has a custom user interface to edit properties but also uses the Advanced Editor for configuration of some of its properties.

When you open the XML Source Editor, the Connection Manager page appears, where you specify how you want to connect to and access data from the XML data file. Depending upon the option you choose, the interface changes to collect the relevant information:

XML file location: Lets you specify the location and the filename of the XML data file in the XML Location field.

XML file from variable: Allows you to use a variable to specify the XML data filename and location. You then provide the name of the variable containing the XML data file details in the Variable Name field.

XML data from variable: Accesses XML data directly from a variable; you specify the name of the variable in the Variable Name field.

Next you can choose schema options: use an inline schema or provide an XML schema definition file in the XSD format. When the XML data file itself contains the XSD schema that validates its structure and data, you will use the inline schema option; otherwise, you will have to supply an external schema definition (XSD) file. If you don't have an XSD file, you can generate one by clicking the Generate XSD button and providing a location and name for the XSD file. This file is required to interpret the relationships among the elements in the XML data file.
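
As an aside, the validation an XSD enables can be reproduced outside the designer with the .NET XmlReader, which can be handy when debugging a schema; the file names products.xml and products.xsd below are assumptions for illustration.

```csharp
// Minimal sketch: validating an XML file against an external XSD with .NET,
// similar in spirit to what the XML source does when you supply an XSD file.
// The file names are hypothetical.
using System;
using System.Xml;
using System.Xml.Schema;

class XsdValidationDemo
{
    static void Main()
    {
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, "products.xsd");
        settings.ValidationEventHandler += (s, e) => Console.WriteLine(e.Message);

        using (var reader = XmlReader.Create("products.xml", settings))
        {
            while (reader.Read()) { } // reading the document triggers validation
        }
    }
}
```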

In the Columns page, you can map output columns to external columns, and in the Error Output page, you can specify how the XML source should handle errors and truncations of data for each column. This source can have multiple regular outputs and multiple error outputs.

Data Flow Transformations

Once the data has been captured by the source adapters and passed on to the data flow path, you can modify this data using a wide range of data flow transformations provided in SSIS. You can use data flow transformations to aggregate column values, update column values, add columns to the data flow, merge data, and accomplish many more data modifications. The data flow transformations can have single or multiple inputs or outputs, depending upon the type of transformation you choose. A data flow transformation receives data on its input columns from the output columns of a data flow source or another transformation. After applying transformations to the data, the data flow transformation provides the output through its output columns to the input columns of the downstream component, which can be another data flow transformation or a data flow destination. Some of the data flow transformations can also have error outputs; data rows that fail to transform are sent to the error output columns, where they can be dealt with separately. Data flow transformations do not have external columns, but you can create one with external columns programmatically.

The following tables list the 29 data flow transformations, grouped into categories on the basis of the function they perform in Integration Services. Future service packs or add-ons may bring in more transformations. These transformations provide a rich set of functionality in many areas, such as data cleansing, data standardization, BI functionality, loading slowly changing data warehouse dimension tables, pivoting and unpivoting, and a facility to write scripts using the Script Component transformation. However, if you still need functionality that can't be met by the preconfigured components, you can write custom transformations with synchronous or asynchronous outputs. Transformations with synchronous outputs modify data inline, that is, as the data rows flow through the component one at a time, whereas transformations with asynchronous outputs cannot process each row independently of all other rows; for example, an Aggregate transformation needs all the rows before it can perform an operation across them. You will learn more about synchronous and asynchronous components and programming options in Chapter 11.
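
To make the synchronous case concrete, the following is a minimal sketch of a Script Component transformation that modifies each row inline as it passes through; the input name Input 0 and the FullName column are assumptions for illustration.

```csharp
// Minimal sketch of a synchronous Script Component transformation (SSIS 2008, C#).
// SSIS generates Input0Buffer from the input named "Input 0"; the FullName
// column is hypothetical. Rows are modified in place, one at a time, with no
// buffering of other rows, which is what makes the output synchronous.
using Microsoft.SqlServer.Dts.Pipeline;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.FullName_IsNull)
        {
            // Standardize the value without needing to see any other row.
            Row.FullName = Row.FullName.Trim().ToUpperInvariant();
        }
    }
}
```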

The following sections contain brief definitions of the data flow transformations; the details of configuration and usage exercises will be covered in the next chapter.

Business Intelligence Transformations

This category groups together the transformations that allow you to perform business intelligence operations such as data cleansing, data standardization, text mining, and running DMX prediction queries.

Transformation | Description
Fuzzy Grouping | Performs data cleansing and standardizes values in column data
Fuzzy Lookup | Uses fuzzy matching to cleanse or standardize data
Slowly Changing Dimension | Configures the updating of slowly changing dimensions in data warehouse dimension tables
Term Extraction | Extracts terms from text in the input columns and writes the extracted terms to an output column
Term Lookup | Performs a lookup and count of the terms in the input that are defined in a lookup table
Data Mining Query | Runs DMX queries for performing prediction queries against data mining models

Row Transformations

The transformations in this category allow you to update column values or create new columns on a row-by-row basis.

Transformation | Description
Character Map | Applies string functions to string data type columns
Copy Column | Creates new columns in the transformation output by copying input columns
Data Conversion | Converts the data type of a column and optionally copies the converted data to a new output column
Derived Column | Creates new derivations of data by applying expressions that use a combination of input columns, variables, functions, and operators; the results can be used to modify an existing column or can be copied to a new column in the output
Export Column | Exports data from a pipeline column into a file; especially useful for exporting DT_TEXT, DT_NTEXT, or DT_IMAGE data type data from the pipeline into a file
Import Column | Reads data from a file and adds it to the columns in the data flow
Script Component | Hosts and runs a script that can be used to transform data
OLE DB Command | Updates, inserts, or deletes rows using SQL commands in a data flow


Rowset Transformations

The transformations in this category work on collections of rows and allow you to perform operations such as aggregate, sort, sample, pivot, and unpivot on row sets.

Transformation | Description
Aggregate | Performs aggregate functions such as average, sum, and count and copies the results to the output columns
Sort | Sorts input columns in ascending or descending order and copies the sorted data to the output columns
Percentage Sampling | Creates a sample data set by randomly selecting a specified percentage of input rows
Row Sampling | Creates a sample data set by randomly selecting a specified exact number of input rows
Pivot | Pivots the input data on a column value to get a less normalized but more compact view of the data
Unpivot | Creates a more normalized version of a denormalized table

Split and Join Transformations

This category groups the transformations that allow you to distribute rows to different outputs, multicast input rows, join multiple inputs into one output, and perform lookup operations.

Transformation | Description
Conditional Split | Routes data rows to different outputs depending on the content of the data
Multicast | Distributes the input data set to multiple outputs
Union All | Merges multiple input data sets into one output
Merge | Merges two sorted data sets into a single output data set
Merge Join | Joins two sorted input data sets using a FULL, LEFT, or INNER join into a single output
Cache Transform | Prepares a cache file to be used by a Lookup transformation by writing data in the pipeline to a Cache connection manager
Lookup | Performs lookups by joining data in the input columns with reference data set columns

Auditing Transformations

These transformations are used to add audit information and count rows.

Transformation | Description
Audit | Includes system environment information in the data flow
Row Count | Counts the rows flowing through the data flow and writes the final count to a variable

Data Flow Destinations

A data flow destination is the last component in the Integration Services data flow and writes the data it receives to an external data store. This component has no output columns, though it may have error output columns for redirecting error rows. A data flow destination writes the data received at its input columns to an external data store or to an in-memory data set via its external columns interface. You can configure the data flow destination to handle errors in data in the same way as you configure a data flow source: you can choose to fail the component, ignore the error, or redirect the error rows to the error output columns. If the data flow destination can't write some of the data rows to the external data store due to errors in the data or data not matching the external columns, it can redirect those rows to the error output columns, depending on how it is configured to redirect error rows.

Integration Services provides 12 preconfigured destinations plus a Script Component that can be used as a destination. The following tables list the available destinations and the functions they perform.


Destination | Description
ADO NET destination | Writes data to ADO.NET–compliant databases
Data Mining Model Training destination | Passes data through the data mining model algorithms to train data mining models
DataReader destination | Exposes the data in the data flow using the ADO.NET DataReader interface for applications such as SQL Server Reporting Services
Dimension Processing destination | Loads and processes an SQL Server Analysis Services dimension
Excel destination | Writes data from a data flow to an Excel workbook
Flat File destination | Writes data from a data flow to a flat file
OLE DB destination | Loads data from a data flow into OLE DB–compliant data stores
Partition Processing destination | Loads and processes an SQL Server Analysis Services partition
Raw File destination | Writes data to a raw file
Recordset destination | Creates and populates an in-memory ADO recordset
Script Component | Hosts and runs a script that can be used to load data; though not shown under Data Flow Destinations in the Toolbox, the Script Component can be used as a data flow destination and is covered in Chapter 11
SQL Server Compact destination | Inserts rows into an SQL Server Compact database
SQL Server destination | Bulk-loads data into an SQL Server table or view

All these destinations have an input, and some of them have an error output, which should meet most of your requirements for writing data to external stores. However, if you find that the existing data flow destinations do not do what you want, you can build a custom destination to suit your requirements using the Integration Services object model. The following table lists the inputs and outputs available in each data flow destination.


Data Flow Destination | Input | Output | Error Output | Custom UI | Connection Manager
ADO NET destination | 1 | No | Yes | Yes | ADO.NET Connection Manager
Data Mining Model Training destination | 1 | No | No | Yes | Analysis Services Connection Manager
Dimension Processing destination | 1 | No | No | Yes | Analysis Services Connection Manager
Flat File destination | 1 | No | No | Yes | Flat File Connection Manager
OLE DB destination | 1 | No | 1 | Yes | OLE DB Connection Manager
Partition Processing destination | 1 | No | No | Yes | Analysis Services Connection Manager
SQL Server Compact destination | 1 | No | 1 | No | OLE DB Connection Manager
SQL Server destination | 1 | No | No | Yes | OLE DB Connection Manager

ADO NET Destination

You can use an ADO NET destination to load data into a table or view of an ADO.NET–compliant database. If you have a database into which you can load data using either an ADO NET destination or an OLE DB destination, it is generally preferable to use the OLE DB destination because of its performance and resilience; however, an ADO NET destination does provide more options for loading data into databases when an OLE DB destination cannot be used. The ADO.NET Destination Editor has three pages: Connection Manager, Mappings, and Error Output. The Connection Manager page enables you to specify an ADO.NET Connection Manager and the table or view into which the data is to be loaded; this page also gives you options to create a new connection manager or a new table. In the Mappings page, you can map input columns to the destination columns. Finally, in the Error Output page, you can configure the error response conditions.
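
For orientation, here is a minimal sketch of the kind of parameterized ADO.NET insert such a destination performs through its managed provider; the connection string and the dbo.Product staging table are assumptions for illustration.

```csharp
// Minimal sketch of a parameterized ADO.NET insert, the style of statement an
// ADO NET destination issues through its managed provider. The connection
// string and the dbo.Product table are hypothetical.
using System.Data.SqlClient;

class AdoNetLoadDemo
{
    static void Main()
    {
        var connectionString = "Data Source=localhost;Initial Catalog=Staging;" +
                               "Integrated Security=SSPI;";
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            var command = new SqlCommand(
                "INSERT INTO dbo.Product (ProductID, Name) VALUES (@id, @name)",
                connection);
            command.Parameters.AddWithValue("@id", 1);
            command.Parameters.AddWithValue("@name", "Widget");
            command.ExecuteNonQuery(); // one row written to the destination table
        }
    }
}
```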

Data Mining Model Training Destination

Data changes with time, and the type of information that can be gathered from the data changes along with it. Mining data for useful and relevant information requires regular data mining model training so that the model can keep up with the challenges of ever-changing data. The Data Mining Model Training destination passes data through the data mining model algorithms to train the data mining models.
