Hands-On Microsoft SQL Server 2008 Integration Services, Part 11

the FileUsageType property of the connection manager to indicate how you want to use the File Connection Manager, that is, whether you want to create or use an existing file or a folder.

Flat File Connection Manager

This connection manager provides access to data in a flat file. It is used to extract data from a flat-file source or load data to a destination and can use delimited, fixed-width, or ragged-right format. This connection manager accesses only one file. If you want to reference multiple flat files, you must use a Multiple Flat Files Connection Manager.

FTP Connection Manager

Use this connection manager whenever you want to upload or download files using File Transfer Protocol (FTP). It enables you to connect to an FTP server using anonymous authentication or basic authentication. The default port used for an FTP connection is 21. The FTP Connection Manager can send and receive files using active or passive mode. In active mode the server initiates the data connection; in passive mode the client initiates it.
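The difference between the two transfer modes can be illustrated outside SSIS. The following is a minimal sketch using Python's standard ftplib module rather than the FTP Connection Manager itself; the helper name make_ftp_client and the host ftp.example.com are placeholders, not anything defined by Integration Services.

```python
import ftplib


def make_ftp_client(passive: bool) -> ftplib.FTP:
    """Return an unconnected FTP client set to passive or active mode."""
    client = ftplib.FTP()     # no host argument, so no connection is opened
    client.set_pasv(passive)  # True: client opens the data connection
                              # False: server opens it (active mode)
    return client


passive_client = make_ftp_client(passive=True)
active_client = make_ftp_client(passive=False)
default_port = ftplib.FTP_PORT  # control-connection port used when none is given

# Connecting would look like this (ftp.example.com is a placeholder host):
# passive_client.connect("ftp.example.com", default_port)
# passive_client.login()  # anonymous login when no user is supplied
```

Passive mode is usually the safer choice when the client sits behind a firewall, because the client opens both connections itself.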

HTTP Connection Manager

Whenever you want to upload or download files using HTTP (port 80), use this connection manager. It enables you to connect to a web server using HTTP. The Web Service task provided in Integration Services uses this connection manager. Like the FTP Connection Manager, the HTTP Connection Manager allows connections using anonymous authentication or basic authentication.

MSMQ Connection Manager

When you’re working with mainframe systems or with systems built on a messaging architecture, you will need to use Message Queuing within your packages, for which you will have to use an MSMQ Connection Manager. For example, if you want to use the Message Queue task in Integration Services, you need to add an MSMQ Connection Manager. An MSMQ Connection Manager enables a package to connect to a message queue.

Analysis Services Connection Manager

If you are creating an Analysis Services project or database as part of your solution, you may want to update or process Analysis Services objects as part of your SSIS jobs. One simple example could be that your SSIS packages update the data mart nightly, after which you may want to process the cube and dimensions to include the latest data in the SSAS database. For reasons such as these, you may include the Analysis Services Connection Manager in your SSIS packages. This connection manager provides access to Analysis Services objects such as cubes and dimensions by allowing you to connect to an Analysis Services database or an Analysis Services project in the same solution, though you can connect to an Analysis Services project only at design time. You will use the Analysis Services Connection Manager with the Analysis Services Processing task, Analysis Services Execute DDL task, or Data Mining Model Training destination objects in your package.

Multiple Files Connection Manager

When you have to connect to multiple files within your Script task or Script component scripts, you will use the Multiple Files Connection Manager. When you add this connection manager, you can add multiple files or folders to be referenced. Those files and folders show up as a pipe-delimited list in the ConnectionString property of this connection manager. To specify multiple files or folders, you can also use wildcards. Suppose, for example, that you want to use all the text files in the C:\SSIS folder. You could add the Multiple Files Connection Manager by choosing only one file in the C:\SSIS folder, going to the Properties window of the connection manager, and setting the value of the ConnectionString property to C:\SSIS\*.txt.

Similar to the File Connection Manager, the Multiple Files Connection Manager has a FileUsageType property to indicate the usage type, that is, whether you want to create or use an existing file or a folder.
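To make the two ways of naming files concrete, here is a small Python sketch of how a pipe-delimited list and a wildcard can both resolve to the same set of files. It only illustrates the idea and is not how Integration Services implements the ConnectionString property; the function expand_connection_string and the temporary folder standing in for C:\SSIS are assumptions made for the example.

```python
import glob
import os
import tempfile


def expand_connection_string(connection_string: str) -> list[str]:
    """Expand a pipe-delimited list of paths and wildcards into file paths."""
    files: list[str] = []
    for pattern in connection_string.split("|"):
        pattern = pattern.strip()
        if pattern:
            # glob returns only paths that exist; sort for a stable order
            files.extend(sorted(glob.glob(pattern)))
    return files


# Demo in a throwaway folder standing in for C:\SSIS
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.txt", "b.txt", "c.csv"):
        open(os.path.join(folder, name), "w").close()
    safe_folder = glob.escape(folder)
    wildcard = os.path.join(safe_folder, "*.txt")
    matched = [os.path.basename(p) for p in expand_connection_string(wildcard)]
    # The same two files written out explicitly as a pipe-delimited list
    explicit = "|".join(os.path.join(safe_folder, n) for n in ("a.txt", "b.txt"))
    listed = [os.path.basename(p) for p in expand_connection_string(explicit)]

print(matched, listed)
```

Both forms select a.txt and b.txt here; the wildcard has the advantage that files added to the folder later are picked up without editing the list.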

Multiple Flat Files Connection Manager

As you can reference only one flat file using the Flat File Connection Manager, you use the Multiple Flat Files Connection Manager when you need to reference more than one flat file. You can access data in flat files having delimited, fixed-width, or ragged-right format. In the GUI of this connection manager, you can select multiple files by using the Browse button and highlighting multiple files. These files are then listed as a pipe-delimited list in the connection manager. You can also use wildcards to specify multiple files. Suppose, for example, that you want to use all the flat files in the C:\SSIS folder. To do this, you would add C:\SSIS\*.txt in the File Names field to choose multiple files. However, note that all these files must have the same format.

So, when you have multiple flat files to import from a folder, you have two options. One is to loop over the files using a Foreach Loop Container, read the filenames, and pass those filenames one by one to the Flat File Connection Manager so that the files can be imported iteratively. The second option is to use a Multiple Flat Files Connection Manager, where you don’t need a looping construct; rather, this connection manager reads all the files, collates the data, and passes the data directly to the downstream components in a single iteration, as if the data were coming from a single source such as a database table instead of multiple flat files.

Both options are useful in particular scenarios. For example, if you have to import several files from the same folder and you’re not much concerned about auditing and lineage (that is, where the data is coming from), you can use the Multiple Flat Files Connection Manager method. This method bulk-imports the data quite quickly compared with the looping construct that deals with each file in turn. The speed is paid for in resource utilization: as all the files are read within the same batch, CPU utilization and memory requirements are quite high, although only for a short duration, depending upon the file sizes. On the other hand, the iterative method deals with one file at a time, requiring less CPU and memory, but for a longer duration. Based on the file sizes, the lineage and auditing requirements, the resource availability on your server, and the time window available to import data, you can choose one of these two methods.
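The trade-off between the two methods can be sketched in plain Python. This is only an analogy under stated assumptions, not SSIS code: import_file_by_file stands in for the Foreach Loop approach and keeps the source filename with every row, while import_as_one_batch stands in for the Multiple Flat Files approach and holds every row in memory at once. Both function names and the sample files are invented for the illustration.

```python
import csv
import glob
import os
import tempfile
from typing import Iterator


def import_file_by_file(pattern: str) -> Iterator[tuple[str, list[str]]]:
    """Looping approach: one file at a time, each row tagged with its source."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                yield os.path.basename(path), row


def import_as_one_batch(pattern: str) -> list[list[str]]:
    """Collated approach: all rows in one in-memory batch, source file not kept."""
    batch: list[list[str]] = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            batch.extend(csv.reader(handle))
    return batch


with tempfile.TemporaryDirectory() as folder:
    for name, text in (("a.txt", "1,x\n2,y\n"), ("b.txt", "3,z\n")):
        with open(os.path.join(folder, name), "w", newline="") as handle:
            handle.write(text)
    pattern = os.path.join(glob.escape(folder), "*.txt")
    with_lineage = list(import_file_by_file(pattern))
    one_batch = import_as_one_batch(pattern)

print(with_lineage)
print(one_batch)
```

The first result can answer "which file did this row come from?", which is the lineage information the collated batch gives up.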

ODBC Connection Manager

This connection manager enables an Integration Services package to connect to a wide range of relational database management systems (RDBMS) using the Open Database Connectivity (ODBC) protocol.

OLE DB Connection Manager

This connection manager enables an Integration Services package to connect to a data source using an OLE DB provider. OLE DB was introduced as a successor to ODBC and is designed to be faster, more efficient, and more stable than ODBC; it is an open specification for accessing several kinds of data. Many of the Integration Services tasks and data flow components use the OLE DB Connection Manager. For example, the OLE DB source adapter and OLE DB destination adapter use the OLE DB Connection Manager to extract and load data, and the Execute SQL task can use an OLE DB Connection Manager to connect to an SQL Server database and run queries.

SMO Connection Manager

SQL Server Management Objects (SMO) is a collection of objects that can be programmed to manage SQL Server. SMO is the successor to SQL-DMO, a set of APIs you use to create and manage SQL Server database objects. SMO performs better, is more scalable, and is easier to use than SQL-DMO. The SMO Connection Manager enables an Integration Services package to connect to an SMO server and hence enables you to manage SQL Server objects using SMO scripts. For example, Integration Services transfer tasks use an SMO connection to transfer objects from one server to another.

SMTP Connection Manager

An SMTP Connection Manager enables an Integration Services package to connect to a Simple Mail Transfer Protocol (SMTP) server. For example, when you want to send an e-mail notification from a package, you can use the Send Mail task and configure it to use an SMTP Connection Manager to connect to an SMTP server.

SQL Server Compact Edition Connection Manager

When you need to connect to an SQL Server Compact database, you will use an SQL Server Compact Connection Manager. The SQL Server Compact Destination adapter uses this connection to load data into a table in an SQL Server Compact Edition database. If you’re running the package that uses this connection manager on a 64-bit server, you will need to run it in 32-bit mode, as the SQL Server Compact Edition provider is available only in a 32-bit version.

WMI Connection Manager

Windows Management Instrumentation (WMI) enables you to access management information in enterprise systems such as networks, computers, managed devices, and other managed components using the Web-Based Enterprise Management (WBEM) standard. Using a WMI Connection Manager, your Integration Services package can manage and automate administrative tasks in an enterprise environment.

Microsoft Connector 1.0 for SAP BI

You can import and export data between Integration Services and SAP BI by using Microsoft Connector 1.0 for SAP BI. Using this connector in Integration Services, you can integrate a non-SAP data source with SAP BI or use SAP BI as a data source in your data integration application. The Microsoft Connector for SAP BI is a set of managed components that transfers data from and to an SAP NetWeaver BI version 7 system in both Full and Delta modes via standard interfaces. This connector is not installed in the default installation; rather, it is an add-in to Integration Services, and you have to download the installation files separately from the Microsoft SQL Server 2008 Feature Pack download web page. The SAP BI Connector can be installed on an Enterprise or a Developer Edition of SQL Server 2008 Integration Services; however, you can transfer data between SAP BI 7.0 and any version of SQL Server from SQL Server 2000 onward. The SAP BI connector provides three main components:

SAP BI Source

SAP BI Destination

SAP BI Connection Manager

As you can guess, the SAP BI Source can be used to extract data from an SAP BI system, the SAP BI Destination can be used to load data into an SAP BI system, and the SAP BI Connection Manager helps to manage the RFC connection between the Integration Services package and SAP BI. When you install the SAP BI connector, the SAP BI Connection Manager is displayed in the list of connection managers; however, you will need to add the SAP BI Source and SAP BI Destination manually. You can do this by right-clicking Data Flow Sources in the Toolbox, selecting the Choose Items option, and selecting SAP BI Source from the list in the SSIS Data Flow Items tab. Similarly, you can add the SAP BI Destination by right-clicking Data Flow Destinations in the Toolbox. Figure 3-2 shows the SAP BI Connection Manager in the Add SSIS Connection Manager dialog box, the SAP BI Source in the Data Flow Sources section, and the SAP BI Destination in the Data Flow Destinations section of the Toolbox.

Microsoft Connector for Oracle by Attunity

The Microsoft connectors for Oracle and Teradata are developed by Attunity and have been implemented in the same fashion as the SAP BI connector. That is, when you install these connectors, you get a connection manager, a Source component, and a Destination component, though you will have to manually add the source and destination components to the Data Flow Designer Toolbox. Refer to Figure 3-2 to see how these components have been implemented. The Oracle connector has been developed to achieve optimal performance when transferring data from or to an Oracle database using Integration Services. The connector is implemented as a set of managed components and is available for the Enterprise and Developer Editions of SQL Server 2008 Integration Services only. The Attunity Oracle Connector supports Oracle 9.2.0.4 and higher-version databases and requires Oracle client software version 10.x or 11.x to be installed on the same computer where SSIS will be using this connector. With this connector, you can use the following:

Fast Load: Bulk Load Destination using OCI (Oracle Call Interface) Direct Path

Arrayed Load: Bulk Load Destination in batches, with the entire batch inserted under the same transaction

Bulk Extract Source: using OCI Array Binding

Microsoft Connector for Teradata by Attunity

The Microsoft Connector for Teradata is a set of managed components developed to achieve optimal performance for transferring data from or to a Teradata database using Integration Services. The connector is available for the Enterprise and Developer Editions of SQL Server 2008 Integration Services only. The SSIS components for Teradata (that is, Teradata Source, Teradata Destination, and Teradata Connection Manager; see Figure 3-2) use the Teradata Parallel Connector (TPC) for connectivity. The Microsoft Connector for Teradata supports

Teradata Database version 2R6.0

Teradata Database version 12.0

To use this connector, you will have to install Teradata Parallel Transporter (TPT) version 12.0 and the Teradata ODBC driver (version 12 recommended) on the same computer where SSIS will be using this connector. You can use this connector for

Bulk Load Destination using TPT FastLoad

Incremental Load Destination using TPT Tpump

Bulk Extract Source using TPT

Figure 3-2 SSIS connection managers and data flow sources and destinations

Data Sources and Data Source Views

We have talked about connection managers that can be added to packages. However, you might have noticed two folders, Data Sources and Data Source Views, in your project in Solution Explorer. These folders can also contain data source connections. However, these are only design-time objects and aren’t available at run time. The connection managers embedded in the packages are used at run time.

Data Sources

You can create design-time data source objects in Integration Services, Analysis Services, and Reporting Services projects in BIDS. A data source is a connection to a data store, for example, a database. You can create a data source by right-clicking the Data Sources node and selecting the New Data Source option. This will start the Data Source Wizard, which will help you create a data source. The data source object is thus created outside the package, and you reference it later in the package. Once a data source is created, it can be referenced by multiple packages. You can reference a data source in a package by right-clicking in the Connection Managers area and selecting the New Connection from Data Source option from the context menu.

When you reference a data source inside a package, it is added as a connection manager and is used at run time. This approach of creating a data source outside a package and then referencing it or embedding it in the package as a connection manager has several benefits. You can provide a consistent approach in your packages to make managing connections easier. You can update all the connection managers used in various packages that reference a data source by simply making a change in one place only (in the data source itself), as the data source provides synchronization between itself and the connection managers. Last, you can delete a data source at any time without affecting the connection managers in the packages. This is possible because there is no dependency between the two. Connection managers don’t need data sources to be able to work, as they are complete in themselves. The only link between a data source and the connection managers that reference it is that the connection managers are synchronized with it from time to time or when changes occur. Data sources and data source views are only design-time objects that help in the management of connection managers across several packages. During run time, the package doesn’t need a data source to be present, as it uses the connection managers that are embedded in it anyway. Data sources are not used when building packages programmatically.

Data Source View

A data source view, built on a data source, is a named, saved subset that defines the underlying schema of a relational data source. A data source view can include metadata that defines sources, destinations, and lookup tables for SSIS tasks, transformations, and data adapters. While a data source is a connection to a data store, data source views are used to reference more specific objects such as tables or views or their subsets. As you can apply filters on a data source view, you can in fact create multiple data source view objects from a data source. For example, a data source can reference a database, while different data source views can be created to reference its different tables or views. To use a data source view in a package, you must first add the data source to the package.

Using data source views can be beneficial. You can use a data source view in multiple packages, and refreshing a data source view reflects the changes in its underlying data sources. Data source views can also cache metadata of the data sources on which they are built, and you can extend a data source view by adding calculated columns, new relationships, and so on. You can consider this an additional abstraction layer for polishing the data model or aligning the metadata with your package requirements. This can be a very powerful facility when you’re dealing with third-party databases or working with systems where it is not easy for you to make a change.

A data source view can be referenced by data flow components such as the OLE DB source and the Lookup transformation. To reference a data source view, you instantiate the data source and then refer to the data source view in the component. Figure 3-3 shows an OLE DB source referencing a CampaignZone1 data source view, where Campaign is a data source. Once you add a data source view to a package, it is resolved to an SQL statement and stored in a property of the component using it. You create a data source view by using the Data Source View Wizard and then modify it in the Data Source View Designer. Data source views are not used when building packages programmatically.

SSIS Variables

Variables are used to store values. They enable SSIS objects to communicate with each other in the package, as well as between parent and child packages, at run time. You can use variables in a variety of ways: for example, you can load the results of an Execute SQL task into a variable, change the way a package works by dynamically updating its parameters at run time using variables, control looping within a package by using a loaded variable, raise an error when a variable is altered, use them in scripts, or evaluate them as an expression.

Figure 3-3 Referencing a data source view inside an OLE DB source

DTS 2000 provides global variables, for which users set the values in a single area in the package and then use those values over and over. This allows users to extend the dynamic abilities of packages. As the global variables are defined at the package level, managing all the variables in a single place sometimes becomes quite challenging for complex packages. SSIS has improved on this shortcoming by assigning a scope to variables. Scopes are discussed in greater detail a bit later in the chapter, in the section “User-Defined Variables.”
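As a rough mental model of scoping, ahead of the fuller discussion later in the chapter, the Python sketch below layers dictionaries so that a child container sees its own variables plus those of its parents, while the parent does not see the child's. It assumes that this parent-to-child visibility is the behavior being described; the variable names (User::BatchDate, User::CurrentFile, User::RowCount) are invented for the example.

```python
from collections import ChainMap

# Variables defined at package scope are visible to every container below it.
package_scope = ChainMap({"User::BatchDate": "2008-06-01"})

# A child container adds its own variables on top of its parent's.
loop_scope = package_scope.new_child({"User::CurrentFile": "a.txt"})
task_scope = loop_scope.new_child({"User::RowCount": 0})

visible_in_task = sorted(task_scope)        # task sees its own and inherited names
visible_in_package = sorted(package_scope)  # package sees only its own

print(visible_in_task)
print(visible_in_package)
```

Compared with a single flat list of global variables, each container here carries only the names that are relevant to it.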

Integration Services provides two types of variables, system variables and user-defined variables, that you can configure and use in your packages. System variables are made available in the package and provide environmental information or the state of the system at run time. You don’t have to create the system variables, as they are provided for you, and hence you can use them in your packages straightaway. However, you must create a user-defined variable before you can use it in your package. To see the variables available in a package in BIDS, either go to the Variables window or go to the Package Explorer tab and expand the Variables folder.

System Variables

The preconfigured variables provided in Integration Services are called system variables. While you create user-defined variables to meet the needs of your packages, you cannot create additional system variables. They are read-only; however, you can configure them to raise an event when they change their value. System variables store informative values about the packages and their objects, which can be used in expressions to customize packages, containers, tasks, and event handlers. Different containers have different system variables available to them. For example, PackageID is available in the package scope, whereas TaskID is available in the Data Flow Task scope. Some of the more frequently used system variables for different containers are defined in Table 3-1. Using these system variables, you can extract interesting information from the packages on the fly. For example, at run time you can use system variables to log who started which package at what time. This is exactly what you are going to do in the following Hands-On exercise.
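Before working through the exercise in SSIS itself, it may help to see the shape of such a log record. The Python sketch below writes one row to an in-memory SQLite table from values named after the PackageName, UserName, and StartTime system variables. The table name PackageLog, the sample values, and the function log_package_start are all assumptions for illustration; the exercise builds the real log with Integration Services tasks.

```python
import sqlite3
from datetime import datetime

# Hypothetical stand-ins for values SSIS supplies at run time.
system_variables = {
    "System::PackageName": "LoadCampaigns",
    "System::UserName": "DOMAIN\\etl_user",
    "System::StartTime": datetime(2008, 6, 1, 2, 0, 0),
}


def log_package_start(conn: sqlite3.Connection, variables: dict) -> None:
    """Insert one row recording which package was started, by whom, and when."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PackageLog "
        "(PackageName TEXT, UserName TEXT, StartTime TEXT)"
    )
    conn.execute(
        "INSERT INTO PackageLog VALUES (?, ?, ?)",
        (
            variables["System::PackageName"],
            variables["System::UserName"],
            variables["System::StartTime"].isoformat(),
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
log_package_start(conn, system_variables)
rows = conn.execute(
    "SELECT PackageName, UserName, StartTime FROM PackageLog"
).fetchall()
print(rows)
```

The parameter placeholders (?) play the same role that parameter mapping plays when a task passes variable values into an SQL statement.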

Hands-On: Using System Variables to Create Custom Logs

This exercise demonstrates how you can create a custom log for an Integration Services package.
