the FileUsageType property of the connection manager to indicate how you want to use the File Connection Manager, that is, whether you want to create or use an existing file or a folder.
Flat File Connection Manager
This connection manager provides access to data in a flat file. It is used to extract data from a flat-file source or load data to a destination and can use delimited, fixed-width, or ragged-right format. This connection manager accesses only one file. If you want to reference multiple flat files, you must use a Multiple Flat Files Connection Manager.
FTP Connection Manager
Use this connection manager whenever you want to upload or download files using File Transfer Protocol (FTP). It enables you to connect to an FTP server using anonymous authentication or basic authentication. The default port used for an FTP connection is 21. The FTP Connection Manager can send and receive files using active or passive mode. In active mode the server initiates the data connection back to the client; in passive mode the client initiates the data connection to the server.
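The difference between the two transfer modes can be sketched outside SSIS with Python's standard ftplib module. This is only an illustration of the protocol setting, not how the FTP Connection Manager is implemented; the helper name make_ftp_client and the host ftp.example.com are assumptions made for the example.

```python
from ftplib import FTP


def make_ftp_client(passive: bool) -> FTP:
    """Create an unconnected FTP client with the requested transfer mode."""
    client = FTP()            # no host given, so no network connection is opened here
    client.set_pasv(passive)  # True: client opens the data connection; False: server does
    return client


active_client = make_ftp_client(passive=False)
passive_client = make_ftp_client(passive=True)

# To actually transfer files you would connect and log in, for example:
#   passive_client.connect("ftp.example.com", 21)  # hypothetical host, default FTP port
#   passive_client.login()                         # anonymous authentication
```

Passive mode is usually the safer choice when the client sits behind a firewall, because the server never has to open a connection back to the client.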
HTTP Connection Manager
Whenever you want to upload or download files using HTTP (port 80), use this connection manager. It enables you to connect to a web server using HTTP. The Web Service task provided in Integration Services uses this connection manager. Like the FTP Connection Manager, the HTTP Connection Manager allows connections using anonymous authentication or basic authentication.
MSMQ Connection Manager
When you’re working with mainframe systems or on systems with a messaging architecture, you will need to use Message Queuing within your packages, for which you will have to use an MSMQ Connection Manager. For example, if you want to use the Message Queue task in Integration Services, you need to add an MSMQ Connection Manager. An MSMQ Connection Manager enables a package to connect to a message queue.
Analysis Services Connection Manager
If you are creating an Analysis Services project or database as part of your solution, you may want to update or process Analysis Services objects as part of your SSIS jobs. One simple example could be that your SSIS packages update the data mart nightly, after which you may want to process the cube and dimensions to include the latest data in the SSAS database. For reasons such as these, you may include the Analysis Services Connection Manager in your SSIS packages. This connection manager provides access to Analysis Services objects such as cubes and dimensions by allowing you to connect to an Analysis Services database or an Analysis Services project in the same solution, though you can connect to an Analysis Services project only at design time. You will use the Analysis Services Connection Manager with the Analysis Services Processing task, Analysis Services Execute DDL task, or Data Mining Model Training destination objects in your package.
Multiple Files Connection Manager
When you have to connect to multiple files within your Script task or Script component scripts, you will use the Multiple Files Connection Manager. When you add this connection manager, you can add multiple files or folders to be referenced. Those multiple files and folders show up as a pipe-delimited list in the ConnectionString property of this connection manager. To specify multiple files or folders, you can also use wildcards. Suppose, for example, that you want to use all the text files in the C:\SSIS folder. You could add the Multiple Files Connection Manager by choosing only one file in the C:\SSIS folder, going to the Properties window of the connection manager, and setting the value of the ConnectionString property to C:\SSIS\*.txt.
Similar to the File Connection Manager, the Multiple Files Connection Manager has a FileUsageType property to indicate the usage type, that is, whether you want to create or use an existing file or a folder.
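To make the shape of such a ConnectionString value concrete, the following Python sketch splits a pipe-delimited list and expands any wildcards in it. It is an approximation written for illustration only: the function expand_connection_string is hypothetical, and the exact matching rules SSIS applies are not shown here. A temporary folder stands in for C:\SSIS.

```python
import glob
import os
import tempfile


def expand_connection_string(connection_string: str) -> list[str]:
    """Split a pipe-delimited list of paths and expand any wildcards."""
    matches: list[str] = []
    for pattern in connection_string.split("|"):
        pattern = pattern.strip()
        if pattern:
            matches.extend(sorted(glob.glob(pattern)))
    return matches


# Demo: a temporary folder plays the role of C:\SSIS.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.txt", "b.txt", "notes.log"):
        open(os.path.join(folder, name), "w").close()
    found = expand_connection_string(os.path.join(folder, "*.txt"))
    names = [os.path.basename(path) for path in found]
```

Here names contains only the two .txt files; notes.log does not match the wildcard and is ignored.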
Multiple Flat Files Connection Manager
Because you can reference only one flat file using the Flat File Connection Manager, you use the Multiple Flat Files Connection Manager when you need to reference more than one flat file. You can access data in flat files having delimited, fixed-width, or ragged-right format. In the GUI of this connection manager, you can select multiple files by using the Browse button and highlighting multiple files. These files are then listed as a pipe-delimited list in the connection manager. You can also use wildcards to specify multiple files. Suppose, for example, that you want to use all the flat files in the C:\SSIS folder. To do this, you would add C:\SSIS\*.txt in the File Names field to choose multiple files. However, note that all these files must have the same format.
So, when you have multiple flat files to import from a folder, you have two options. One is to loop over the files using a Foreach Loop Container, read the filenames, and pass those filenames one by one to the Flat File Connection Manager so that the files can be imported iteratively. The second option is to use a Multiple Flat Files Connection Manager, where you don’t need a looping construct; rather, this connection manager reads all the files, collates the data, and passes the data directly to the downstream components in a single iteration, as if the data were coming from a single source such as a database table instead of multiple flat files.
Both options are useful in particular scenarios. For example, if you have to import several files from the same folder and you’re not much concerned about auditing and lineage (that is, where the data is coming from), you can use the Multiple Flat Files Connection Manager method. This method bulk-imports the data quite quickly compared with the looping construct that deals with each file in turn. The cost of speed is paid in terms of resource utilization. As all the files are read within the same batch, the CPU utilization and memory requirements are quite high in this case, although for a short duration, depending upon the file sizes. On the other hand, the iterative method deals with one file at a time, requiring less CPU and memory, but for a longer duration. Based on the file sizes, the lineage and auditing requirements, the resource availability on your server, and the time window available to import data, you can choose one of these two methods.
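The trade-off between the two options can be sketched in a few lines of Python. This is a conceptual model only, not SSIS code: the file names and contents are invented, and the functions load_iteratively and load_collated are hypothetical stand-ins for the Foreach Loop approach and the Multiple Flat Files Connection Manager approach respectively.

```python
import csv
import io

# Hypothetical in-memory "files" that share one format (same header, same delimiter).
FILES = {
    "sales_jan.txt": "id,amount\n1,10\n2,20\n",
    "sales_feb.txt": "id,amount\n3,30\n",
}


def read_rows(text: str) -> list[dict]:
    """Parse one delimited file into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def load_iteratively(files: dict) -> list[dict]:
    """Option 1: one file per iteration; each row keeps its source file name."""
    loaded = []
    for name, text in files.items():
        for row in read_rows(text):
            loaded.append({**row, "source_file": name})
    return loaded


def load_collated(files: dict) -> list[dict]:
    """Option 2: all files read as one stream; no per-file lineage is kept."""
    rows = []
    for text in files.values():
        rows.extend(read_rows(text))
    return rows


with_lineage = load_iteratively(FILES)
collated = load_collated(FILES)
```

Both functions return the same three data rows, but only the iterative version can say which file each row came from, which is the auditing and lineage point made above.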
ODBC Connection Manager
This connection manager enables an Integration Services package to connect to a wide range of relational database management systems (RDBMS) using the Open Database Connectivity (ODBC) protocol.
OLE DB Connection Manager
This connection manager enables an Integration Services package to connect to a data source using an OLE DB provider. OLE DB was designed as a successor to ODBC, intended to be faster, more efficient, and more stable; it is an open specification for accessing several kinds of data. Many of the Integration Services tasks and data flow components use the OLE DB Connection Manager. For example, the OLE DB source adapter and OLE DB destination adapter use the OLE DB Connection Manager to extract and load data, and the Execute SQL task can use an OLE DB Connection Manager to connect to an SQL Server database to run queries.
SMO Connection Manager
SQL Server Management Objects (SMO) is a collection of objects that can be programmed to manage SQL Server. SMO is an upgrade to SQL-DMO, a set of APIs you use to create and manage SQL Server database objects. SMO performs better, is more scalable, and is easier to use than SQL-DMO. The SMO Connection Manager enables an Integration Services package to connect to an SMO server and hence enables you to manage SQL Server objects using SMO scripts. For example, Integration Services transfer tasks use an SMO connection to transfer objects from one server to another.
SMTP Connection Manager
An SMTP Connection Manager enables an Integration Services package to connect to a Simple Mail Transfer Protocol (SMTP) server. For example, when you want to send an e-mail notification from a package, you can use the Send Mail task and configure it to use an SMTP Connection Manager to connect to an SMTP server.
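As a rough analogy for what a notification sent through an SMTP server involves, the following Python sketch builds a message with the standard email library. It does not reproduce the Send Mail task; the addresses, the package name LoadDataMart, the host smtp.example.com, and the helper build_notification are all assumptions made for the example.

```python
from email.message import EmailMessage


def build_notification(package_name: str, status: str) -> EmailMessage:
    """Build a plain-text notification about a package run."""
    message = EmailMessage()
    message["From"] = "ssis@example.com"   # hypothetical sender
    message["To"] = "dba@example.com"      # hypothetical recipient
    message["Subject"] = f"Package {package_name}: {status}"
    message.set_content(f"The package {package_name} finished with status {status}.")
    return message


notification = build_notification("LoadDataMart", "Success")

# Sending requires a reachable SMTP server, for example:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com", 25) as server:  # hypothetical host
#       server.send_message(notification)
```

The connection details (server name and port) live apart from the message itself, which mirrors the split between the connection manager and the task that uses it.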
SQL Server Compact Edition Connection Manager
When you need to connect to an SQL Server Compact database, you will use an SQL Server Compact Connection Manager. The SQL Server Compact Destination adapter uses this connection to load data into a table in an SQL Server Compact Edition database. If you’re running the package that uses this connection manager on a 64-bit server, you will need to run it in 32-bit mode, as the SQL Server Compact Edition provider is available only in a 32-bit version.
WMI Connection Manager
Windows Management Instrumentation (WMI) enables you to access management information in enterprise systems such as networks, computers, managed devices, and other managed components using the Web-Based Enterprise Management (WBEM) standard. Using a WMI Connection Manager, your Integration Services package can manage and automate administrative tasks in an enterprise environment.
Microsoft Connector 1.0 for SAP BI
You can import and export data between Integration Services and SAP BI by using Microsoft Connector 1.0 for SAP BI. Using this connector in Integration Services, you can integrate a non-SAP data source with SAP BI or can use SAP BI as a data source in your data integration application. The Microsoft Connector for SAP BI is a set of managed components that transfers data from and to an SAP NetWeaver BI version 7 system in both Full and Delta modes via standard interfaces. This connector is not installed in the default installation; rather, it is an add-in to Integration Services, and you have to download the installation files separately from the Microsoft SQL Server 2008 Feature Pack download web page. The SAP BI Connector can be installed on an Enterprise or a Developer Edition of SQL Server 2008 Integration Services; however, you can transfer data between SAP BI 7.0 and any version of SQL Server from SQL Server 2000 onward. The SAP BI connector provides three main components:
• SAP BI Source
• SAP BI Destination
• SAP BI Connection Manager
As you can guess, the SAP BI Source can be used to extract data from an SAP BI system, the SAP BI Destination can be used to load data into an SAP BI system, and the SAP BI Connection Manager helps to manage the RFC connection between the Integration Services package and SAP BI. When you install the SAP BI connector, the SAP BI Connection Manager is displayed in the list of connection managers; however, you will need to add the SAP BI Source and SAP BI Destination manually. You can do this by right-clicking the Data Flow Sources in the Toolbox, selecting the Choose Items option, and selecting SAP BI Source from the list in the SSIS Data Flow Items tab. Similarly, you can add the SAP BI Destination by right-clicking the Data Flow Destinations in the Toolbox. Figure 3-2 shows the SAPBI Connection Manager in the Add SSIS Connection Manager dialog box, the SAP BI Source in the Data Flow Sources section, and the SAP BI Destination in the Data Flow Destinations section of the Toolbox.
Microsoft Connector for Oracle by Attunity
The Microsoft connectors for Oracle and Teradata are developed by Attunity and have been implemented in the same fashion as the SAP BI connector. That is, when you install these connectors, you get a connection manager, a Source component, and a Destination component, though you will have to manually add the source and destination components to the Data Flow Designer Toolbox. Refer to Figure 3-2 to see how these components have been implemented. The Oracle connector has been developed to achieve optimal performance when transferring data from or to an Oracle database using Integration Services. The connector is implemented as a set of managed components and is available for Enterprise and Developer Editions of SQL Server 2008 Integration Services only. The Attunity Oracle Connector supports Oracle 9.2.0.4 and higher-version databases and requires Oracle client software version 10.x or 11.x to be installed on the same computer where SSIS will be using this connector. This connector provides:
• Fast Load: Bulk Load Destination using OCI (Oracle Call Interface) Direct Path
• Arrayed Load: Bulk Load Destination in batches, where the entire batch is inserted under the same transaction
• Bulk Extract Source using OCI Array Binding
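The idea behind an arrayed load, inserting rows in batches with each batch under one transaction, can be illustrated with Python's built-in sqlite3 module. This is a sketch of the batching pattern only: SQLite stands in for Oracle, no OCI call is involved, and the table name target and the function arrayed_load are hypothetical.

```python
import sqlite3


def arrayed_load(connection, rows, batch_size):
    """Insert rows in batches; each batch is committed as one transaction."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with connection:  # commits on success, rolls back the whole batch on error
            connection.executemany(
                "INSERT INTO target (id, name) VALUES (?, ?)", batch
            )
        batches += 1
    return batches


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"row{i}") for i in range(1, 6)]
batch_count = arrayed_load(connection, rows, batch_size=2)
row_count = connection.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

With five rows and a batch size of two, the rows go in as three batches, so a failure in one batch would undo only that batch rather than the entire load.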
Microsoft Connector for Teradata by Attunity
The Microsoft Connector for Teradata is a set of managed components developed to achieve optimal performance for transferring data from or to a Teradata database using Integration Services. The connector is available for the Enterprise and Developer Editions of SQL Server 2008 Integration Services only. The SSIS components for Teradata (that is, Teradata Source, Teradata Destination, and Teradata Connection Manager; see Figure 3-2) use the Teradata Parallel Connector (TPC) for connectivity. The Microsoft Connector for Teradata supports:
• Teradata Database version 2R6.0
• Teradata Database version 2R6.1
• Teradata Database version 2R6.2
• Teradata Database version 12.0
Figure 3-2 SSIS connection managers and data flow sources and destinations
To use this connector, you will have to install Teradata Parallel Transporter (TPT) version 12.0 and the Teradata ODBC driver (version 12 recommended) on the same computer where SSIS will be using this connector. You can use this connector for:
• Bulk Load Destination using TPT FastLoad
• Incremental Load Destination using TPT Tpump
• Bulk Extract Source using TPT
Data Sources and Data Source Views
We have talked about connection managers that can be added in the packages. However, you might have noticed two folders, Data Sources and Data Source Views, in your project in Solution Explorer. These folders can also contain data source connections. However, these are only design-time objects and aren’t available at run time. The connection managers embedded in the packages are used at run time.
Data Sources
You can create design-time data source objects in Integration Services, Analysis Services, and Reporting Services projects in BIDS. A data source is a connection to a data store, for example, a database. You can create a data source by right-clicking the Data Sources node and selecting the New Data Source option. This will start the Data Source Wizard, which will help you create a data source. So, the data source object gets created outside the package and you reference it later in the package. Once a data source is created, it can be referenced by multiple packages. You can reference a data source in a package by right-clicking in the Connection Managers area and selecting the New Connection from Data Source option from the context menu.
When you reference a data source inside a package, it is added as a connection manager and is used at run time. This approach of creating a data source outside a package and then referencing or embedding it in the package as a connection manager has several benefits. You can provide a consistent approach in your packages to make managing connections easier. You can update all the connection managers used in various packages that reference a data source by making a change in one place only, in the data source itself, as the data source provides synchronization between itself and the connection managers. Last, you can delete a data source at any time without affecting the connection managers in the packages. This is possible because there is no dependency between the two. Connection managers don’t need data sources to be able to work, as they are complete in themselves. The only link between a data source and the connection managers that reference it is that the connection managers get synchronized from time to time or when changes occur. The data sources and the data source views are only design-time objects that help in management of the connection managers across several packages. During run time, the package doesn’t need a data source to be present, as it uses the connection managers that get embedded in it anyway. Data sources are not used when building packages programmatically.
Data Source View
A data source view, built on a data source, is a named, saved subset that defines the underlying schema of a relational data source. A data source view can include metadata that can define sources, destinations, and lookup tables for SSIS tasks, transformations, and data adapters. While a data source is a connection to a data store, the data source views are used to reference more specific objects such as tables or views or their subsets. As you can apply filters on a data source view, you can in fact create multiple data source view objects from a data source. For example, a data source can reference a database, while different data source views can be created to reference its different tables or views. To use a data source view in a package, you must first add the data source to the package.
Using data source views can be beneficial. You can use a data source view in multiple packages, and refreshing a data source view reflects the changes in its underlying data sources. Data source views can also cache metadata of the data sources on which they are built, and you can extend a data source view by adding calculated columns, new relationships, and so on. You can consider this an additional abstraction layer for polishing the data model or aligning the metadata with your package requirements. This can be a very powerful facility when you’re dealing with third-party databases or working with systems where it is not easy for you to make a change.
The data source view can be referenced by data flow components such as the OLE DB source and the Lookup transformation. To reference a data source view, you instantiate the data source and then refer to the data source view in the component. Figure 3-3 shows an OLE DB source referencing a CampaignZone1 data source view, where Campaign is a data source. Once you add a data source view to a package, it is resolved to an SQL statement and stored in a property of the component using it. You create a data source view by using the Data Source View Wizard and then modify it in the Data Source View Designer. Data source views are not used when building packages programmatically.
SSIS Variables
Variables are used to store values. They enable SSIS objects to communicate with each other in the package as well as between parent and child packages at run time. You can use variables in a variety of ways: for example, you can load results of an Execute SQL task to a variable, change the way a package works by dynamically updating its parameters at run time using variables, control looping within a package by using a loaded variable, raise an error when a variable is altered, use them in scripts, or evaluate them as an expression.
Figure 3-3 Referencing a data source view inside an OLE DB source
DTS 2000 provides global variables, for which users set the values in a single area in the package and then use those values over and over. This allows users to extend the dynamic abilities of packages. As the global variables are defined at the package level, managing all the variables in a single place sometimes becomes quite challenging for complex packages. SSIS has improved on this shortcoming by assigning a scope to the variables. Scopes are discussed in greater detail a bit later in the chapter in the section “User-Defined Variables.”
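The effect of scoping can be pictured with a small Python model. It is a conceptual sketch only, and it assumes the common rule that a variable defined in the nearest (narrowest) scope takes precedence over one with the same name in an outer scope; the variable names BatchSize and SourceFolder are invented for the example.

```python
from collections import ChainMap

# Hypothetical variables defined at two scopes of a package.
package_scope = {"BatchSize": 500, "SourceFolder": "C:\\SSIS"}
task_scope = {"BatchSize": 50}  # same name redefined in a narrower scope

# A task looks in its own scope first, then in its parent container's scope.
visible_in_task = ChainMap(task_scope, package_scope)
visible_in_package = ChainMap(package_scope)

task_batch_size = visible_in_task["BatchSize"]        # taken from the task scope
task_source_folder = visible_in_task["SourceFolder"]  # inherited from the package scope
package_batch_size = visible_in_package["BatchSize"]  # the package-level value
```

Compared with a single global list, this lets a variable live next to the container that uses it while package-level values remain visible everywhere.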
Integration Services provides two types of variables, system variables and user-defined variables, that you can configure and use in your packages. System variables are made available in the package and provide environmental information or the state of the system at run time. You don’t have to create the system variables, as they are provided for you, and hence you can use them in your packages straightaway. However, you must create a user-defined variable before you can use it in your package. To see the variables available in a package in BIDS, either go to the Variables window or go to the Package Explorer tab and expand the Variables folder.
System Variables
The preconfigured variables provided in Integration Services are called system variables. While you create user-defined variables to meet the needs of your packages, you cannot create additional system variables. They are read-only; however, you can configure them to raise an event when they change their value. System variables store informative values about the packages and their objects, which can be used in expressions to customize packages, containers, tasks, and event handlers. Different containers have different system variables available to them. For example, PackageID is available in the package scope, whereas TaskID is available in the Data Flow Task scope. Some of the more frequently used system variables for different containers are defined in Table 3-1. Using these system variables, you can extract interesting information from the packages on the fly. For example, at run time using system variables, you can log who started which package at what time. This is exactly what you are going to do in the following Hands-On exercise.
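Before the exercise, the shape of such a log entry can be sketched in Python. This is not SSIS code: the dictionary merely stands in for values the runtime would supply, and the keys PackageName, UserName, and StartTime are assumed here for illustration, as is the function format_log_entry.

```python
from datetime import datetime


def format_log_entry(system_variables: dict) -> str:
    """Format who started which package at what time as one log line."""
    started = system_variables["StartTime"].strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"{started} | package={system_variables['PackageName']} "
        f"| started_by={system_variables['UserName']}"
    )


# Hypothetical values standing in for what the runtime would provide.
sample = {
    "PackageName": "LoadDataMart",
    "UserName": "DOMAIN\\etl_user",
    "StartTime": datetime(2009, 1, 15, 2, 0, 0),
}
entry = format_log_entry(sample)
```

Because the values are read-only and supplied for you, the logging logic only has to read and format them.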
Hands-On: Using System Variables to Create Custom Logs
This exercise demonstrates how you can create a custom log for an Integration Services package.