Hands-On Microsoft SQL Server 2008 Integration Services, Part 39



To train the data mining models using this destination, you need a connection to SQL Server Analysis Services, where the mining structure and the mining models reside. For this, you can use the Analysis Services Connection Manager to connect to an instance of Analysis Services or to an Analysis Services project. The Data Mining Model Training Editor has two tabs, Connection and Columns, in which you can configure the required properties. In the Connection tab, you specify the connection manager for Analysis Services in the Connection Manager field and then specify the mining structure that contains the mining models you want this data to train. Once you select a mining structure in the Mining structure field, the list of mining models is displayed in the Mining models area, and this destination adapter will train all the models contained within the specified mining structure. In the Columns tab, you can map available input columns to the mining structure columns. The processing of the mining model requires the data to be sorted, which you can achieve by adding a Sort transformation before the Data Mining Model Training destination.

DataReader Destination

When your ADO.NET–compliant application needs to access data from the data flow of an SSIS package, you can use the DataReader destination. Integration Services can provide data straight from the pipeline to your ADO.NET application in cases where processing needs to happen on demand, when users request it through the ADO.NET DataReader interface. The SSIS data processing extension facilitates provision of the data via the DataReader destination. An excellent use of the DataReader destination is as a data source for an SSRS report.

The DataReader destination doesn't have a custom UI but uses the Advanced Editor to expose all its properties, organized in three tabs. You can specify the Name, Description, LocaleID, and ValidateExternalMetadata properties in the Common Properties section of the Component Properties tab. In the Custom Properties section, you can specify a ReadTimeout value in milliseconds and, if this value is exceeded, choose to fail the component through the FailOnTimeout field.

In the Input Columns tab, you can select the columns you want to output, assign each of them an output alias, and specify a usage type of READONLY or READWRITE from the drop-down list box. Finally, the Input And Output Properties tab lists only the input column details, as the DataReader destination has only one input and no error output.
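As a rough illustration of how a .NET client might read from this destination through the SSIS data processing extension, the sketch below uses the DtsConnection and DtsCommand classes from the Microsoft.SqlServer.Dts.DtsClient assembly. The package path and the destination component name are assumptions made for the example, and the exact API surface of the DtsClient assembly should be verified against your installation.

```csharp
// Hedged sketch: streaming rows from a DataReader destination into a .NET client.
// Assumes a package at C:\Packages\ExtractCustomers.dtsx containing a DataReader
// destination named "DataReaderDest".
using System;
using System.Data;
using Microsoft.SqlServer.Dts.DtsClient;

class DataReaderClientSketch
{
    static void Main()
    {
        using (DtsConnection connection = new DtsConnection())
        {
            // The -f switch points the data processing extension at the package file.
            connection.ConnectionString = @"-f C:\Packages\ExtractCustomers.dtsx";
            connection.Open();

            using (DtsCommand command = new DtsCommand(connection))
            {
                // The command text is simply the name of the DataReader destination
                // component in the package's data flow.
                command.CommandText = "DataReaderDest";

                IDataReader reader = command.ExecuteReader(CommandBehavior.Default);
                while (reader.Read())
                {
                    // Each row arrives straight from the SSIS pipeline.
                    Console.WriteLine(reader.GetValue(0));
                }
            }
        }
    }
}
```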

Dimension Processing Destination

One of the frequent uses of Integration Services is to load data warehouse dimensions using the dimension processing destination. This destination can be used to load and process an SQL Server Analysis Services dimension. Being a destination, it has no output and one input, and it does not support an error output.

The dimension processing destination has a custom user interface, but the Advanced Editor can also be used to modify properties that are not available in the custom editor.

In the Dimension Processing Destination Editor, the properties are grouped logically in three different pages. In the Connection Manager page, you can specify the connection manager for Analysis Services to connect to the Analysis Services server or an Analysis Services project. Using this connection manager, the Dimension Processing Destination Editor accesses all the dimensions in the source and displays them as a list for you to select the one you want to process. Next, you can choose the processing method from the add (incremental), full, or update options. In the Mappings page, you can map the Available Input Columns to the Available Destination Columns using a drag-and-drop operation.
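For context, those three processing methods correspond to Analysis Services processing types. The hedged sketch below shows the same choices applied through Analysis Management Objects (AMO) rather than through this destination; the server, database, and dimension names are assumptions made for the example.

```csharp
// Hedged sketch: the add (incremental), full, and update choices map to the
// ProcessAdd, ProcessFull, and ProcessUpdate processing types in Analysis Services.
using Microsoft.AnalysisServices;

class DimensionProcessingSketch
{
    static void Main()
    {
        Server server = new Server();
        server.Connect("Data Source=localhost");

        // Database and dimension names are placeholders.
        Database database = server.Databases.FindByName("AdventureWorksDW");
        Dimension dimension = database.Dimensions.FindByName("Customer");

        // ProcessAdd    ~ add (incremental)
        // ProcessFull   ~ full
        // ProcessUpdate ~ update
        dimension.Process(ProcessType.ProcessUpdate);

        server.Disconnect();
    }
}
```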

The Advanced page allows you to configure error handling in the dimension processing destination. You can choose from several options to configure the way you want the errors to be handled:

• By default, this destination will use the default Analysis Services error handling, which you can change by unchecking the Use Default Error Configuration check box.

• When the dimension processing destination processes a dimension to populate values from the underlying columns, an unacceptable key value may be encountered. In such cases, you can use the Key Error Action field to specify that the record be discarded by selecting the DiscardRecord value, or you can convert the unacceptable key value to the UnknownMember value. UnknownMember is a property of the Analysis Services dimension indicating that the supporting column doesn't have a value.

• Next, you can specify the processing error limits and can choose either to ignore errors or to stop on error. If you select the Stop On Error option, you can then specify the error threshold using the Number Of Errors option. You can also specify the on-error action, either to stop processing or to stop logging when the error threshold is reached, by selecting the StopProcessing or StopLogging value.

• You can also specify specific error conditions such as these:

  • When the destination raises a Key Not Found error, you can select it to be IgnoreError or ReportAndStop, whereas by default it is ReportAndContinue.

  • Similarly, you can configure the Duplicate Key error, for which the default action is IgnoreError. You can set it to ReportAndStop or ReportAndContinue if you wish.

  • When a null key is converted to the UnknownMember value, you can choose ReportAndStop or ReportAndContinue. By default, the destination will IgnoreError.

  • When a null key value is not allowed in the data, this destination will ReportAndContinue by default. However, you can set it to IgnoreError or ReportAndStop.

• You can specify a path for the error log using the Browse button.

Excel Destination

Using the Excel destination, you can output data straight to an Excel workbook, into worksheets or ranges. You use an Excel Connection Manager to connect to an Excel workbook. Like the Excel source, the Excel destination treats the worksheets and ranges in an Excel workbook as tables or views. The Excel destination has one regular input and one error output.

This destination has its own custom user interface that you can use to configure its properties; the Advanced Editor can also be used to modify the remaining properties. The Excel Destination Editor lists its properties in three different pages.

In the Connection Manager page, you can select the name of the connection manager from the drop-down list in the OLE DB Connection Manager field. Then you can choose one of these three data access mode options:

• Table or view: Lets the Excel destination load data into an Excel worksheet or named range; specify the name of the worksheet or the range in the Name Of The Excel Sheet field.

• Table name or view name variable: Indicates that the name of the table or view is contained within a variable that you specify in the Variable Name field.

• SQL command: Allows you to load the results of an SQL statement into an Excel file.

In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. In the Error Output page, you can configure the behavior of the Excel destination for errors and truncations. You can ignore the failure, redirect the data, or fail the component for each of the columns in case of an error or a truncation.

Flat File Destination

Every now and then, you may need to output data from disparate sources to a text file, as this is the most convenient method to share data with external systems. You can build an Integration Services package to connect to those disparate sources, extract data using customized extraction rules, and output the required data set to a text file using the Flat File destination adapter. This destination requires a Flat File Connection Manager to connect to a text file. When you configure a Flat File Connection Manager, you also configure various properties to specify the type of the file and how the data will reside in the file. For example, you can choose the format of the file to be delimited, fixed width, or ragged right (also called mixed format). You also specify how the columns and rows will be delimited and the data type of each column. In this way, the Flat File Connection Manager provides a basic structure for the file, which the destination adapter uses as is. This destination has one input and no error output.

The Flat File destination has a simple customized user interface, though you can also use the Advanced Editor to configure some of the properties. In the Flat File Destination Editor, you can specify the connection manager you want to use for this destination in the Flat File Connection Manager field and select the check box for "Overwrite data in the file" if you want to overwrite the existing data in the flat file. Next, you are given an opportunity to provide a block of text in the Header field, which can be added before the data as a header to the file. In the Mappings page, you can map Available Input Columns to the Available Destination Columns.

OLE DB Destination

You can use the OLE DB destination when you want to load your transformed data into OLE DB–compliant databases, such as Microsoft SQL Server, Oracle, or Sybase database servers. This destination adapter requires an OLE DB Connection Manager with an appropriate OLE DB provider to connect to the data destination. The OLE DB destination has one regular input and one error output.

This destination adapter has a custom user interface that can be used to configure most of the properties; alternatively, you can use the Advanced Editor. In the OLE DB Destination Editor, you can specify an OLE DB connection manager in the Connection Manager page. If you haven't configured an OLE DB Connection Manager in the package yet, you can create a new connection by clicking New. Once you've specified the OLE DB Connection Manager, you can select the data access mode from the drop-down list. Depending on the option you choose, the editor interface changes to collect the relevant information. Here you have five options to choose from:

• Table or view: You can load data into a table or view in the database specified by the OLE DB Connection Manager. Select the table or the view from the drop-down list in the Name Of The Table Or The View field. If you don't already have a table in the database where you want to load data, you can create a new table by clicking New. An SQL statement for creating a table is generated for you when you click New. The columns use the same data type and length as the input columns, which you can change if you want. However, if you provide the wrong data type or a shorter column length, you will not be warned and may get errors at run time. If you are happy with the CREATE TABLE statement, all you need to do is provide a table name, replacing the [OLE DB Destination] string after CREATE TABLE in the SQL statement.

• Table or view—fast load: The data is loaded into a table or view as in the preceding option; however, you can configure additional options when you select the fast load data access mode. The additional fast load options are:

  • Keep identity: During loading, the OLE DB destination needs to know whether it has to keep the identity values coming in with the data or assign unique values itself to the columns configured as identity columns.

  • Keep nulls: Tells the OLE DB destination to keep the null values in the data.

  • Table lock: Acquires a table lock during the bulk load operation to speed up the loading process. This option is selected by default.

  • Check constraints: Checks the constraints at the destination table during the data loading operation. This option is selected by default.

  • Rows per batch: Specifies the number of rows in a batch in this box. The loading operation handles the incoming rows in batches, and the setting in this box affects the buffer size, so you should test for a suitable value based on the memory available to this process at run time on your server.

  • Maximum insert commit size: Specify a value in this box to indicate the maximum size that the OLE DB destination commits during loading. The default value of 2147483647 indicates that this many rows are considered a single batch and are handled together; that is, they commit or fail as a single batch. Use this setting carefully, taking into consideration how busy your system is and how many rows you want to handle in a single batch. A smaller value means more commits, and hence the overall loading will take more time; however, if the server is a transactional server hosting other applications, this might be a good way to share resources on the server. If the server is a dedicated reporting or data mart server, or you're loading at a time when other activities on the server are less active, using a higher value in this box will reduce the overall loading time.

Make sure you use the fast load data access mode when loading double-byte character set (DBCS) data; otherwise, you may get corrupted data loaded into your table or view. DBCS is a set of characters in which each character is represented by two bytes. Environments using ideographic writing systems such as Japanese, Korean, and Chinese use DBCS, as these languages contain more characters than can be represented by 256 code points. These double-byte characters are commonly called Unicode characters. Examples of data types that support Unicode data in SQL Server are nchar, nvarchar, and ntext, whereas Integration Services has the DT_WSTR and DT_NTEXT data types to support Unicode character strings.

• Table name or view name variable: This access mode works like the Table or view access mode, except that here you supply the name of a variable in the Variable Name field that contains the name of the table or the view.

• Table name or view name variable—fast load: This access mode works like the Table or view—fast load access mode, except that here you supply the name of a variable in the Variable Name field that contains the name of the table or the view. You still specify the fast load options in this data access mode.

• SQL command: Load the result set of an SQL statement using this option. You can provide the SQL query in the SQL Command Text dialog box or build a query by clicking Build Query.

In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation, and in the Error Output page, you can specify the behavior when an error occurs.

Partition Processing Destination

The partition processing destination is used to load and process an SQL Server Analysis Services partition, and it works like the dimension processing destination. This destination has a custom user interface that is like the one for the dimension processing destination. This destination adapter requires the Analysis Services Connection Manager to connect to the cubes and their partitions that reside on an Analysis Services server or in an Analysis Services project.

The Partition Processing Destination Editor has three pages for configuring properties. In the Connection Manager page, you can specify an Analysis Services Connection Manager and choose from the three processing methods: Add (incremental), for incremental processing; Full, which is the default option and performs full processing of the partition; and Data only, which performs update processing of the partition. In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. In the Advanced page, you can configure error-handling options for when various types of errors occur. The error-handling options are similar to those available on the Advanced page of the dimension processing destination.


Raw File Destination

Sometimes you may need to stage data between processes, for which you will want to extract data at the fastest possible speed. For example, if you have multiple packages that work on a data set one after another (that is, a package needs to export the data at the end of its operation for the next package to continue its work on the data), a raw file destination and raw file source combination can be an excellent choice. The raw file destination writes raw data to the destination raw file in an SSIS native format that doesn't require translation. This raw data can be imported back into the system using the raw file source discussed earlier. Using the raw file destination to export data and the raw file source to import it back into the system results in high performance for the staging or export/import operation. However, if you have binary large object (BLOB) data that needs to be handled in this fashion, the Raw File destination cannot help you, as it doesn't support BLOB objects.

The Raw File Destination Editor has two pages that expose the configurable properties. The Connection Managers page allows you to select an access mode, File name or File name from variable, to specify how the filename information is provided. You can either specify the filename and path directly in the File Name field or use a variable to pass these details. Note that the Raw File destination doesn't use a connection manager to connect to the raw file, and hence you don't specify a connection manager in this page; it connects to the raw file directly, using the specified filename or by reading the filename from a variable.

Next, you can choose from the following four options to write data to a file in the Write Option field:

• Append: Lets you use an existing file and append data to the already existing data. This option requires that the metadata of the appended data match the metadata of the existing data in the file.

• Create Always: This is the default option; it always creates a new file using the filename details provided either directly in the File Name field or indirectly in a variable specified in the Variable Name field.

• Create Once: In situations where you are using the data flow inside repeating logic, that is, inside a loop container, you may want to create a new file in the first iteration of the loop and then append data to the file in the second and subsequent iterations. You can meet this requirement by using this option.

• Truncate And Append: If you have an existing raw file that you want to write the data into, but you want to delete the existing data before the new data is written, you can use this option to truncate the existing file first and then append the data to it.

In all these options, wherever you use an existing file, the metadata of the data being loaded to the destination must match the metadata of the specified file.

In the Columns tab, you can select the columns you want to write into the raw file and assign them an output alias as well.

Recordset Destination

Sometimes you may need to take a record set from the data flow and pass it over to other elements in the package. Of course, in this instance you do not want to write to external storage and then read from it unnecessarily. You can achieve this by using a variable and the Recordset destination, which populates an in-memory ADO record set into the variable at run time.

This destination adapter doesn't have its own custom user interface but uses the Advanced Editor to expose its properties. When you double-click this destination, the Advanced Editor for Recordset Destination opens and displays properties organized in three tabs. In the Component Properties tab, you can specify the name of the variable that will hold the record set in the Variable Name field. In the Input Columns tab, you can select the columns you want to extract into the variable and assign an alias to each of the selected columns, along with specifying whether each is a read-only or a read-write column. As this destination has only one input and no error output, the Input And Output Properties tab lists only the input columns.
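A common follow-up step is to shred that record set in a downstream Script Task (or enumerate it with a Foreach ADO enumerator). The sketch below assumes a Script Task whose ReadOnlyVariables list includes a variable named User::CustomerRecordset; the variable name and the way each row is reported are illustrative only.

```csharp
// Hedged sketch: body of the Main method in a Script Task's generated ScriptMain
// class, reading the ADO record set produced by the Recordset destination.
using System.Data;
using System.Data.OleDb;

public void Main()
{
    DataTable table = new DataTable();
    OleDbDataAdapter adapter = new OleDbDataAdapter();

    // This Fill overload accepts the in-memory ADO recordset object held in the variable.
    adapter.Fill(table, Dts.Variables["User::CustomerRecordset"].Value);

    bool fireAgain = true;
    foreach (DataRow row in table.Rows)
    {
        // The available columns depend on the input columns selected in the destination.
        Dts.Events.FireInformation(0, "RecordsetShred", row[0].ToString(),
                                   string.Empty, 0, ref fireAgain);
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}
```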

Script Component Destination

You can use the script component as a data flow destination when you choose Destination in the Select Script Component Type dialog box. When deployed as a destination, this component supports only one input and no output; as you know, data flow destinations don't have outputs. The script component as a destination is covered in Chapter 11.

SQL Server Compact Destination

Integration Services stretches out to give you an SQL Server Compact destination, enabling your packages to write data straight to an SQL Server Compact database table. This destination uses the SQL Server Compact Connection Manager to connect to an SQL Server Compact database. The SQL Server Compact Connection Manager lets your package connect to a compact database file, and then you can specify the table you want to update in the SQL Server Compact destination.

You need to create an SQL Server Compact Connection Manager before you can configure an SQL Server Compact destination. This destination does not have a custom user interface and hence uses the Advanced Editor to expose its properties. When you double-click this destination, the Advanced Editor for SQL Server Compact Destination opens with four tabs. Choose the connection manager for a Compact database in the Connection Manager tab. Specify the name of the table you want to update in the Table Name field under the Custom Properties section of the Component Properties tab.

In the Column Mappings tab, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation. The Input and Output Properties tab shows you the External Columns and Input Columns in the Input collection and the Output Columns in the Error Output collection. The SQL Server Compact destination has one input and supports an error output.

SQL Server Destination

We have looked at two different ways to import data into SQL Server: using the Bulk Insert task in Chapter 5 and the OLE DB destination earlier in this chapter. Though both are capable of importing data into SQL Server, they suffer from some limitations. The Bulk Insert task is a faster way to import data, but it is part of the control flow, not the data flow, and doesn't let you transform data before import. The OLE DB destination is part of the data flow and lets you transform the data before import; however, it isn't the fastest method for importing data into SQL Server. The SQL Server destination combines the benefits of both components: it lets you transform the data before import and uses the speed of the Bulk Insert task to import data into local SQL Server tables and views. The SQL Server destination can write data into a local SQL Server only. So, if you want to import data faster into an SQL Server table or a view on the same server where the package is running, use an SQL Server destination rather than an OLE DB destination. Being a destination adapter, it has one input only and does not support an error output.

The SQL Server destination has a custom user interface, though you can also use the Advanced Editor to configure its properties. In the Connection Manager page of the SQL Destination Editor, you can specify a connection manager, a data source, or a data source view in the Connection Manager field to connect to an SQL Server database. Then select a table or view from the drop-down list in the Use A Table Or View field. You also have the option to create a new connection manager or a new table or view by clicking the New buttons provided. In the Mappings page, you can map Available Input Columns to the Available Destination Columns using a drag-and-drop operation.

You specify the Bulk Insert options in the Advanced page of the SQL Destination Editor dialog box. You can configure the following ten options in this page (an illustrative ADO.NET comparison follows the list):

• Keep identity: This option is not checked by default. Check this box to keep the identity values coming in with the data rather than using the unique values assigned by SQL Server.

• Keep nulls: This option is not checked by default. Check this box to retain the null values.

• Table lock: This option is checked by default. Uncheck this option if you don't want to lock the table during loading. This option may impact the availability of the tables being loaded to other applications or users. If you want to allow concurrent use of the SQL Server tables that are being loaded by this destination, uncheck this box; however, if you are running this package at a quiet time, that is, when no other applications or users are accessing the tables being loaded, or you do not want to allow concurrent use of those tables, it is better to leave the default setting.

• Check constraints: This option is checked by default. This means any constraint on the table being loaded will be checked during loading. If you're confident the data being loaded does not break any constraints and want a faster import, you may uncheck this box to save the processing overhead of checking constraints.

• Fire triggers: This option is not checked by default. Check this box to let the bulk insert operation execute insert triggers on the target tables during loading. Selecting to execute insert triggers on the destination table may affect the performance of the loading operation.

• First row: Specify a value for the first row from which the bulk insert will start.

• Last row: Specify a value in this field for the last row to insert.

• Maximum number of errors: Provide a value for the maximum number of rows that cannot be imported due to errors in the data before the bulk insert operation stops. Leave the First Row, Last Row, and Maximum Number Of Errors fields blank to indicate that you do not want to specify any limits. However, if you're using the Advanced Editor, use a value of –1 to indicate the same.

• Timeout: Specify the number of seconds in this field before the bulk insert operation times out.

• Order columns: Specify a comma-delimited list of columns in this field to sort the data on, in ascending or descending order.
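For readers more familiar with ADO.NET, most of these switches have direct counterparts in the SqlBulkCopyOptions flags, as the hedged sketch below illustrates. The connection string and table name are placeholders, and the remaining options (first row, last row, maximum number of errors, order columns) have no direct flag equivalent here.

```csharp
// Hedged analogy: the bulk insert switches on the Advanced page roughly map to
// ADO.NET's SqlBulkCopyOptions flags. Connection string and table name are assumed.
using System.Data;
using System.Data.SqlClient;

class BulkLoadSketch
{
    static void Load(DataTable rows)
    {
        const string connectionString =
            "Data Source=(local);Initial Catalog=Staging;Integrated Security=SSPI";

        SqlBulkCopyOptions options =
            SqlBulkCopyOptions.KeepIdentity        // Keep identity
            | SqlBulkCopyOptions.KeepNulls         // Keep nulls
            | SqlBulkCopyOptions.TableLock         // Table lock
            | SqlBulkCopyOptions.CheckConstraints  // Check constraints
            | SqlBulkCopyOptions.FireTriggers;     // Fire triggers

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString, options))
        {
            bulkCopy.DestinationTableName = "dbo.Customer";
            bulkCopy.BulkCopyTimeout = 30;         // comparable to the Timeout option
            bulkCopy.WriteToServer(rows);
        }
    }
}
```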

Data Flow Paths

First, think of how you connect tasks in the control flow. You click the first task in the control flow to highlight the task and display a green arrow, representing output from the task. Then you drag the green arrow onto the next task in the work flow to create a connection between the tasks, represented by the green line by default. The green line, called a precedence constraint, enables you to define the conditions under which the following tasks can be executed. In the data flow, you connect the components in the same way you
