
Microsoft SQL Server 2008 Study Guide, Part 90


Public Overrides Sub CreateNewOutputRows()
    For i As Integer = 1 To 20
        Output0Buffer.AddRow()
        Output0Buffer.RandomInt = CInt(Rnd() * 100)
    Next
End Sub

This example works for a single output with the default name Output 0 containing a single integer column RandomInt. Notice how each output is exposed as name + "Buffer" with embedded spaces removed from the name. New rows are added using the AddRow method, and columns are populated by referring to them as output properties. An additional property is exposed for each column with the suffix _IsNull (e.g., Output0Buffer.RandomInt_IsNull) to mark a value as NULL.

Reading data from an external source requires some additional steps, including identifying the connection managers that will be referenced within the script on the Connection Managers page of the editor. Then, in the script, additional methods must be overridden: AcquireConnections and ReleaseConnections to open and close any connections, and PreExecute and PostExecute to open and close any record sets, data readers, and so on (database sources only). Search for the topic ‘‘Extending the Data Flow with the Script Component’’ in SQL Server Books Online for full code samples and related information.

Destinations

Data Flow destinations provide a place to write the data transformed by the Data Flow task. Configuring destinations is similar to configuring sources, including both basic and advanced editors, and the three common steps:

■ Connection Manager: Specify the particular table, file(s), view, or query to which data will be written. Several destinations will accept a table name from a variable.

■ Columns: Map the columns from the data flow (input) to the appropriate destination columns.

■ Error Output: Specify what to do should a row fail to insert into the destination: ignore the row, cause the component to fail (default), or redirect the problem row to error output.

The available destinations are as follows:

■ OLE DB: Writes rows to a table, view, or SQL command (ad hoc view) for which an OLE DB driver exists. Table/view names can be selected directly in the destination or read from a string variable, and each can be selected with or without fast load. Fast load can decrease runtime by an order of magnitude or more depending on the particular data set and selected options. Options for fast load are as follows:

  ■ Keep identity: When the target table contains an identity column, either this option must be chosen to allow the identity to be overwritten with inserted values (à la SET IDENTITY_INSERT ON) or the identity column must be excluded from mapped columns so that new identity values can be generated by SQL Server.

  ■ Keep nulls: Choose this option to load null values instead of any column defaults that would normally apply.

  ■ Table lock: Keeps a table-level lock during execution.

  ■ Check constraints: Enables CHECK constraints (such as a valid range on an integer column) for inserted rows. Note that other types of constraints, including UNIQUE, PRIMARY KEY, FOREIGN KEY, and NOT NULL, cannot be disabled. Loading data with CHECK constraints disabled will result in those constraints being marked as ‘‘not trusted’’ by SQL Server.

  ■ Rows per batch: Specifying a batch size provides a hint to building the query plan, but it does not change the size of the transaction used to put rows in the destination table.

  ■ Maximum insert commit size: Similar to the BatchSize property of the Bulk Insert task (see ‘‘Control flow tasks’’ earlier in the chapter), the maximum insert commit size is the largest number of rows included in a single transaction. The default value is very large (maximum integer value), allowing most any load task to be committed in a single transaction.
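For context, the T-SQL below is a minimal sketch (table, column, and constraint names are hypothetical) of the server-side counterparts of two of these options: explicitly supplying identity values, and detecting and re-validating CHECK constraints left ‘‘not trusted’’ after a load performed with constraint checking turned off.

-- Counterpart of the Keep identity option: allow explicit values in an identity column
SET IDENTITY_INSERT dbo.SalesStaging ON;
INSERT INTO dbo.SalesStaging (SalesID, Amount) VALUES (1001, 250.00);
SET IDENTITY_INSERT dbo.SalesStaging OFF;

-- After a load with CHECK constraints disabled, the constraints are marked not trusted
SELECT name, is_not_trusted
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('dbo.SalesStaging');

-- Re-validate existing rows so SQL Server trusts the constraint again
ALTER TABLE dbo.SalesStaging WITH CHECK CHECK CONSTRAINT CK_SalesStaging_Amount;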

■ SQL Server: This destination uses the same fast-loading mechanism as the Bulk Insert task, but is restricted in that the package must execute on the SQL Server that contains the target table/view. Speed can exceed OLE DB fast loading in some circumstances.
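As a rough comparison, a minimal BULK INSERT statement (file path, table name, and option values are hypothetical) shows the bulk-loading knobs that the fast-load options above mirror, such as table locking and batch size:

BULK INSERT dbo.SalesStaging
FROM 'C:\loads\sales.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the flat file
    ROWTERMINATOR = '\n',    -- row delimiter
    TABLOCK,                 -- comparable to the Table lock fast-load option
    BATCHSIZE = 50000        -- rows committed per batch, comparable to Maximum insert commit size
);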

■ ADO.NET: Uses an ADO.NET connection manager to write data to a selected table or view.

■ DataReader: Makes the data flow available via an ADO.NET DataReader, which can be opened by other applications, notably Reporting Services, to read the output from the package.

■ Flat File: Writes the data flow to a file specified by a Flat File connection manager. Because the file is described in the connection manager, limited options are available in the destination: Choose whether to overwrite any existing file and provide file header text if desired.

■ Excel: Sends rows from the data flow to a sheet or range in a workbook using an Excel connection manager. Note that versions of Excel prior to 2007 can handle at most 65,536 rows and 256 columns of data, the first row of which is consumed by header information. Excel 2007 format supports 1,048,576 rows and 16,384 columns. Strings are required to be Unicode, so any DT_STR types need to be converted to DT_WSTR before reaching the Excel destination.

■ Raw: Writes rows from the data flow to an Integration Services format suitable for fast loads by a raw source component. It does not use a connection manager; instead, specify the AccessMode by choosing to supply a filename via direct input or a string variable. Set the WriteOption property to an appropriate value:

  ■ Append: Adds data to an existing file, assuming the new data matches the previously written format.

  ■ Create always: Always starts a new file.

  ■ Create once: Creates initially and then appends on subsequent writes. This is useful for loops that write to the same destination many times in the same package.

  ■ Truncate and append: Keeps the existing file's meta-data, but replaces the data.

Raw files cannot handle BLOB data, which excludes any of the large data types, including text, varchar(max), and varbinary(max).

■ Recordset: Writes the data flow to a variable. Stored as a recordset, the object variable is suitable for use as the source of a Foreach loop or other processing within the package.

■ SQL Server Compact: Writes rows from the data flow into a SQL Mobile database table. Configure by identifying the SQL Server Mobile connection manager that points to the appropriate SDF file, and then enter the name of the table on the Component Properties tab.

■ Dimension Processing and Partition Processing: These tasks enable the population of Analysis Services cubes without first populating the underlying relational data source. Identify the Analysis Services connection manager of interest, choose the desired dimension or partition, and then select a processing mode:

  ■ Add/Incremental: Minimal processing required to add new data.

  ■ Full: Complete reprocess of structure and data.

  ■ Update/Data-only: Replaces data without updating the structure.

■ Data Mining Model Training: Provides training data to an existing data mining structure, thus preparing it for prediction queries. Specify the Analysis Services connection manager and the target mining structure in that database. Use the Columns tab to map the training data to the appropriate mining structure attributes.

■ Script: A script can also be used as a destination, using a similar process to that already described for using a script as a source. Use a script as a destination to format output in a manner not allowed by one of the standard destinations. For example, a file suitable for input to a COBOL program could be generated from a standard data flow. Start by dragging a script component onto the design surface, choosing Destination from the pop-up Select Script Component Type dialog. Identify the input columns of interest and configure the script properties as described previously. After pressing the Edit Script button to access the code, the primary routine to be coded is named after the Input name with a _ProcessInputRow suffix (e.g., Input0_ProcessInputRow). Note the row object passed as an argument to this routine, which provides the input column information for each row (e.g., Row.MyColumn and Row.MyColumn_IsNull). Connection configuration and preparation is the same as described in the source topic. Search for the topic ‘‘Extending the Data Flow with the Script Component’’ in SQL Server Books Online for full code samples and related information.

Transformations

Between the source and the destination, transformations provide functionality to change the data from what was read into what is needed. Each transformation requires one or more data flows as input and provides one or more data flows as output. Like sources and destinations, many transformations provide a way to configure error output for rows that fail the transformation. In addition, many transformations provide both a basic and an advanced editor to configure the component, with normal configurations offered by the basic editor when available.

The standard transformations available in the Data Flow task are as follows:

■ Aggregate: Functions rather like a GROUP BY query in SQL, generating Min, Max, Average, and so on, on the input data flow. Due to the nature of this operation, Aggregate does not pass through the data flow, but outputs only aggregated rows. Begin on the Aggregations tab by selecting the columns to include and adding the same column multiple times in the bottom pane if necessary. Then, for each column, specify the output column name (Output Alias), the operation to be performed (such as Group by or Count), and any comparison flags for determining value matches (e.g., Ignore case). For columns being distinct counted, performance hints can be supplied for the exact number (Distinct Count Keys) or an approximate number (Distinct Count Scale) of distinct values that the transform will encounter. The scale ranges are as follows:

  ■ Low: Approximately 500,000 values

  ■ Medium: Approximately 5,000,000 values

  ■ High: Approximately 25,000,000 values

Likewise, performance hints can be specified for the Group By columns by expanding the Advanced section of the Aggregations tab, entering either an exact (Keys) or an approximate (Keys Scale) count of different values to be processed. Alternately, you can specify performance hints for the entire component, instead of individual columns, on the Advanced tab, along with the amount to expand memory when additional memory is required.
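As a point of reference, the output of an Aggregate transform corresponds to a T-SQL query along these lines (table and column names are hypothetical), with one aggregation operation per selected column:

SELECT
    Region,                                       -- Group by
    COUNT(*) AS OrderCount,                       -- Count
    COUNT(DISTINCT CustomerID) AS CustomerCount,  -- Count distinct
    SUM(Amount) AS TotalAmount,                   -- Sum
    AVG(Amount) AS AvgAmount                      -- Average
FROM dbo.Orders
GROUP BY Region;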

■ Audit: Adds execution context columns to the data flow, enabling data to be written with audit information about when it was written and where it came from. Available columns are ExecutionInstanceGUID, PackageID, PackageName, VersionID, ExecutionStartTime, MachineName, UserName, TaskName, and TaskID.

■ Cache: Places selected columns from a data flow into a cache for later use by a Lookup transform. Identify the Cache connection manager and then map the data flow columns into the cache columns as necessary. The cache is a write once, read many data store: All the data to be included in the cache must be written by a single Cache transform but can then be used by many Lookup transforms.

■ Character Map: Allows strings in the data flow to be transformed by a number of operations: Byte reversal, Full width, Half width, Hiragana, Katakana, Linguistic casing, Lowercase, Simplified Chinese, Traditional Chinese, and Uppercase. Within the editor, choose the columns to be transformed, adding a column multiple times in the lower pane if necessary. Each column can then be given a destination of a New column or In-place change (replaces the contents of a column). Then choose an operation and the name for the output column.

■ Conditional Split: Enables rows of a data flow to be split between different outputs depending on the contents of the row. Configure by entering output names and expressions in the editor. When the transform receives a row, each expression is evaluated in order, and the first one that evaluates to true will receive that row of data. When none of the expressions evaluate to true, the default output (named at the bottom of the editor) receives the row. Once configured, as data flows are connected to downstream components, an Input Output Selection pop-up appears, and the appropriate output can be selected. Unmapped outputs are ignored and can result in data loss.
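The routing behaves like a first-match-wins CASE expression in T-SQL (column names and thresholds here are hypothetical): each row goes to the output of the first expression that evaluates to true, and to the default output when none do.

SELECT
    OrderID,
    CASE
        WHEN Amount >= 10000 THEN 'LargeOrders'   -- first expression evaluated
        WHEN Amount >= 1000  THEN 'MediumOrders'  -- evaluated only if the first is false
        ELSE 'DefaultOutput'                      -- rows matching no expression
    END AS OutputName
FROM dbo.Orders;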

■ Copy Column: Adds a copy of an existing column to the data flow. Within the editor, choose the columns to be copied, adding a column multiple times in the lower pane if necessary. Each new column can then be given an appropriate name (Output Alias).

■ Data Conversion: Adds a copy of an existing column to the data flow, enabling data type conversions in the process. Within the editor, choose the columns to be converted, adding a column multiple times in the lower pane if necessary. Each new column can then be given an appropriate name (Output Alias) and data type. Conversions between code pages are not allowed. Use the advanced editor to enable locale-insensitive fast parsing algorithms by setting the FastParse property to true on each output column.

■ Data Mining Query: Runs a DMX query for each row of the data flow, enabling rows to be associated with predictions, such as the likelihood that a new customer will make a purchase or the probability that a transaction is fraudulent. Configure by specifying an Analysis Services connection manager, choosing the mining structure and highlighting the mining model to be queried. On the Query tab, click the Build New Query button and map columns in the data flow to the columns of the model (a default mapping is created based on column name). Then specify the columns to be added to the data flow in the lower half of the pane (usually a prediction function) and give the output an appropriate name (Alias).

■ Derived Column: Uses expressions to generate values that can either be added to the data flow or replace existing columns. Within the editor, construct Integration Services expressions to produce the desired value, using type casts to change data types as needed. Assign each expression to either replace an existing column or be added as a new column. Give new columns an appropriate name and data type.

■ Export Column: Writes large object data types (DT_TEXT, DT_NTEXT, or DT_IMAGE) to file(s) specified by a filename contained in the data flow. For example, large text objects could be extracted into different files for inclusion in a website or text index. Within the editor, specify two columns for each extract defined: a large object column and a column containing the target filename. A file can receive any number of objects. Set Append/Truncate/Exists options to indicate the desired file create behavior.

■ Fuzzy Grouping: Identifies duplicate rows in the data flow using exact matching for any data type and/or fuzzy matching for string data types (DT_STR and DT_WSTR). Configure the task to examine the key columns within the data flow that identify a unique row. Several columns are added to the output as a result of this transform:

  ■ Input key (default name _key_in): A sequential number assigned to identify each input row.

  ■ Output key (default name _key_out): The Input key of the row this row matches (or its own Input key if not a duplicate). One way to cull the duplicate rows from the data flow is to define a downstream conditional split on the condition [_key_in] == [_key_out].

  ■ Similarity score (default name _score): A measure of the similarity of the entire row, on a scale of 0 to 1, to the first row of the set of duplicates.

  ■ Group Output (default name <column>_clean): For each key column selected, this is the value from the first row of the set of duplicates (that is, the value from the row indicated by _key_out).

  ■ Similarity Output (default name _Similarity_<column>): For each key column selected, this is the similarity score for that individual column versus the first row of the set of duplicates.

Within the editor, specify an OLE DB connection manager, where the transform will have permissions to create a temporary table. Then configure each key column by setting its Output, Group Output, and Similarity Output names. In addition, set the following properties for each column:

  ■ Match Type: Choose between Fuzzy and Exact Match types for each string column (non-string data types always match exactly).

  ■ Minimum Similarity: Smallest similarity score allowed for a match. Leaving fuzzy match columns at the default of 0 enables similarity to be controlled from the slider on the Advanced tab of the editor.

  ■ Numerals: Specify whether leading or trailing numerals are significant in making comparisons. The default of Neither specifies that leading and trailing numerals are not considered in matches.

  ■ Comparison Flags: Choose settings appropriate to the type of strings being compared.

■ Fuzzy Lookup: Similar to the Lookup transform, except that when an exact lookup fails, a fuzzy lookup is attempted for any string columns (DT_STR and DT_WSTR). Specify an OLE DB connection manager and table name where values will be looked up, and a new or existing index to be used to cache fuzzy lookup information. On the Columns tab, specify a join between the data flow and the reference table, and which columns from the reference table will be added to the data flow. On the Advanced tab, select the similarity required for finding a match: The lower the number, the more liberal the matches become. In addition to the specified columns added to the data flow, match meta-data is added as follows:

  ■ _Similarity: Reports the similarity between all of the values compared.

  ■ _Confidence: Reports the confidence level that the chosen match was the correct one compared to other possible matches in the lookup table.

  ■ _Similarity_<column name>: Similarity for each individual column.

The advanced editor has settings of MinimumSimilarity and FuzzyComparisonFlags for each individual column.

■ Import Column: Reads large object data types (DT_TEXT, DT_NTEXT, or DT_IMAGE) from files specified by a filename contained in the data flow, adding the text or image objects as a new column in the data flow. Configure in the advanced editor by identifying each column that contains a filename to be read on the Input Columns tab. Then, on the Input and Output Properties tab, create a new output column for each filename column to contain the contents of the files as they are read, giving the new column an appropriate name and data type. In the output column properties, note the grayed-out ID property, and locate the properties for the corresponding input (filename) column. Set the input column's FileDataColumnID property to the output column's ID value to tie the filename and contents columns together. Set the ExpectBOM property to true for any DT_NTEXT data being read that has been written with byte-order marks.

■ Lookup: Finds rows in a database table or cache that match the data flow and includes selected columns in the data flow, much like a join between the data flow and a table or cache. For example, a product ID could be added to the data flow by looking up the product name in the master table. Note that all lookups are case sensitive regardless of the collation of the underlying database. Case can be effectively ignored by converting the associated text values to a single case before comparison (e.g., using the UPPER function in a derived column expression).

The Lookup transform operates in three possible modes:

  ■ No cache: Runs a query against the source database for each lookup performed. No cache is kept in memory in order to minimize the number of database accesses, but each lookup reflects the latest value stored in the database.

  ■ Full cache: Populates an in-memory cache from either the database or a Cache connection manager (see Cache transform and connection manager descriptions earlier in this chapter) and relies solely on that cache for lookups during execution. This minimizes the disk accesses required but may exceed available memory for very large data sets, which can dramatically reduce performance. Because no error message appears as performance degrades, it is useful to monitor resource usage while processing sample datasets to determine whether the cache size will work for the range of data sizes expected in production uses.

  ■ Partial cache: Populates an in-memory cache with a subset of the data available from the database, and then issues queries against the database for any values not found within the in-memory cache. This method provides a compromise between speed and available memory. Whenever possible, this mode should be used with a query that fills the cache with the most likely rows encountered. For example, many warehousing applications are more likely to access values recently added to the database.

Start the Lookup transform configuration process by selecting the cache mode and the connection type for Full Cache transforms. The most common handling of rows with no matching entries is to ‘‘Redirect rows to no match output’’ for further processing, but the context may require one of the other options. On the Connections page, choose the connection manager containing the reference data, and the table or query to retrieve that data from (for database connections). Usually, the best choice is a query that returns only the columns used in the lookup, which avoids reading and storing unused columns.

On the Columns tab, map the join columns between the data flow and the reference table by dragging and dropping lines between corresponding columns. Then check the reference table columns that should be added to the data flow, adjusting names as desired in the bottom pane.

The Advanced tab provides an opportunity to optimize memory performance of the Lookup transform for Partial Cache mode, and to modify the query used for row-by-row lookups. Set the size for in-memory caching based on the number of rows that will be loaded; these values often require testing to refine. ‘‘Enable cache for rows with no matching entries’’ enables data from row-by-row lookups that fail to be saved in the in-memory cache along with the data originally read at the start of the transform, thus avoiding repeated database accesses for missing values. Review the custom query to ensure that the row-by-row lookup statement is properly built.
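As an illustration of a lean, case-insensitive reference query (table and column names are hypothetical), the sketch below returns only the columns the lookup needs and upper-cases the join key; the data flow side would apply UPPER to its own column in a Derived Column expression before the Lookup:

-- Reference query for the Lookup transform: only the join key and the value to add
SELECT
    UPPER(ProductName) AS ProductNameKey,  -- normalized join key
    ProductID                              -- column added to the data flow
FROM dbo.ProductMaster;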

■ Merge: Combines the rows of two sorted data flows into a single data flow. For example, if some of the rows of a sorted data flow are split by an error output or Conditional Split transform, then they can be merged again. The upstream sort must have used the same key columns for both flows, and the data types of columns to be merged must be compatible. Configure by dragging two different inputs to the transform and mapping columns together in the editor. See the Union All description later in this list for the unsorted combination of flows.

■ Merge Join: Provides SQL join functionality between data flows sorted on the join columns. Configure by dragging the two flows to be joined to the transform, paying attention to which one is connected to the left input if a left outer join is desired. Within the editor, choose the join type, map the join columns, and choose which columns are to be included in the output.
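The result is equivalent to an ordinary T-SQL join between the two inputs (table and column names are hypothetical); which flow is connected to the left input determines which side is preserved by a left outer join.

-- Both data flow inputs must already be sorted on the join key (CustomerID)
SELECT o.OrderID, o.CustomerID, c.CustomerName
FROM dbo.Orders AS o                 -- flow connected to the left input
LEFT OUTER JOIN dbo.Customers AS c   -- flow connected to the right input
    ON o.CustomerID = c.CustomerID;  -- mapped join columns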

■ Multicast: Copies every row of an input data flow to many different outputs. Once an output has been connected to a downstream component, a new output will appear for connection to the next downstream component. Only the names of the outputs are configurable.

■ OLE DB Command: Executes a SQL statement (such as UPDATE or DELETE) for every row in a data flow. Configure by specifying an OLE DB connection manager to use when executing the command, and then switch to the Component Properties tab and enter the SQL statement using question marks for any parameters (e.g., UPDATE MyTable SET Col1 = ? WHERE Col2 = ?). On the Column Mappings tab, associate a data flow column with each parameter in the SQL statement.

■ Percentage Sampling: Splits a data flow by randomly sampling the rows for a given percentage. For example, this could be used to separate a data set into training and testing sets for data mining. Within the editor, specify the approximate percentage of rows to allocate to the selected output, while the remaining rows are sent to the unselected output. If a sampling seed is provided, the transform will always select the same rows from a given data set.

■ Pivot: Denormalizes a data flow, similar to the way an Excel pivot table operates, making attribute values into columns. For example, a data flow with three columns, Quarter, Region, and Revenue, could be transformed into a data flow with columns for Quarter, Western Region, and Eastern Region, thus pivoting on Region.
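In relational terms, that example corresponds to a T-SQL PIVOT query such as the following (table and region values are hypothetical):

SELECT Quarter, [Western] AS WesternRegion, [Eastern] AS EasternRegion
FROM (
    SELECT Quarter, Region, Revenue
    FROM dbo.RegionalSales
) AS src
PIVOT (
    SUM(Revenue) FOR Region IN ([Western], [Eastern])  -- attribute values become columns
) AS p;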

■ Row Count: Counts the number of rows in a data flow and places the result into a variable. Configure by populating the VariableName property.

■ Row Sampling: Nearly identical to the Percentage Sampling transform, except that the approximate number of rows to be sampled is entered, rather than the percentage of rows.

■ Script: Using a script as a transformation enables transformations with very complex logic to act on a data flow. Start by dragging a script component onto the design surface, choosing Transformation from the pop-up Select Script Component Type dialog. Within the editor's Input Columns tab, mark the columns that will be available in the script, and indicate which will be ReadWrite versus ReadOnly. On the Inputs and Outputs tab, add any output columns that will be populated by the script above and beyond the input columns.

On the Script page of the editor, list the read and read/write variables to be accessed within the script, separated by commas, in the ReadOnlyVariables and ReadWriteVariables properties, respectively. Click the Edit Script button to expose the code itself, and note that the primary method to be coded overrides <inputname>_ProcessInputRow, as shown in this simple example:

Public Overrides Sub Input0_ProcessInputRow _
    (ByVal Row As Input0Buffer)

    'Source system indicates missing dates with old values,
    'replace those with NULLs. Also determine if given time
    'is during defined business hours.
    If Row.TransactionDate < #1/1/2000# Then
        Row.TransactionDate_IsNull = True
        Row.PrimeTimeFlag_IsNull = True
    Else
        'Set flag for prime time transactions
        If Weekday(Row.TransactionDate) > 1 _
            And Weekday(Row.TransactionDate) < 7 _
            And Row.TransactionDate.Hour > 7 _
            And Row.TransactionDate.Hour < 17 Then
            Row.PrimeTimeFlag = True
        Else
            Row.PrimeTimeFlag = False
        End If
    End If

End Sub

This example uses one ReadWrite input column (TransactionDate) and one output (PrimeTimeFlag), with the input name left with the default of Input 0. Each column is exposed as a property of the Row object, as is the additional property with the suffix _IsNull to test or set the column value as NULL. The routine is called once for each row in the data flow.

■ Slowly Changing Dimension: Compares the data in a data flow to a dimension table, and, based on the roles assigned to particular columns, maintains the dimension. This component is unusual in that it does not have an editor; instead, a wizard guides the steps to define column roles and interactions with the dimension table. At the conclusion of the wizard, several components are placed on the design surface to accomplish the dimension maintenance task.

■ Sort: Sorts the rows in a data flow by selected columns. Configure by selecting the columns to sort by. Then, in the lower pane, choose the sort type, the sort order, and the comparison flags appropriate to the data being sorted.

■ Term Extraction: Builds a new data flow based on terms it finds in a Unicode text column (DT_WSTR or DT_NTEXT). This is the training part of text mining, whereby strings of a particular type are used to generate a list of commonly used terms, which is later used by the Term Lookup component to identify similar strings. For example, the text of saved RSS documents could be used to find similar documents in a large population. Configure by identifying the column containing the Unicode text to be analyzed. If a list of terms to be excluded has been built, then identify the table and column on the Exclusions tab. The Advanced tab controls the extraction algorithm, including whether terms are single words or phrases (articles, pronouns, etc., are never included), the scoring algorithm, minimum frequency before extraction, and maximum phrase length.

■ Term Lookup: Provides a ‘‘join’’ between a Unicode text column (DT_WSTR or DT_NTEXT) in the data flow and a reference table of terms built by the Term Extraction component. One row appears in the output data flow for each term matched. The output data flow also contains two columns in addition to the selected input columns: Term and Frequency. Term is the noun or noun phrase that was matched, and Frequency is the number of occurrences in the data flow column. Configure the transform by specifying the OLE DB connection manager and table that contains the list of terms. Use the Term Lookup tab to check the input columns that should be passed through to the output data flow, and then map the input Unicode text column to the Term column of the reference table by dragging and dropping between those columns in the upper pane.

■ Union All: Combines rows from multiple data flows into a single data flow, assuming the source columns are of compatible types. Configure by connecting as many data flows as needed to the component. Then, using the editor, ensure that the correct columns from each data flow are mapped to the appropriate output column.

■ Unpivot: Makes a data flow more normalized by turning columns into attribute values. For example, a data flow with one row for each quarter and a column for revenue by region could be turned into a three-column data flow: Quarter, Region, and Revenue.
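Again in relational terms, the same reshaping can be expressed as a T-SQL UNPIVOT (table and column names are hypothetical), reversing the PIVOT example shown earlier:

SELECT Quarter, Region, Revenue
FROM dbo.QuarterlyRevenue
UNPIVOT (
    Revenue FOR Region IN (WesternRegion, EasternRegion)  -- columns become attribute values
) AS u;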

Maintainable and Manageable Packages

Integration Services enables applications to be created with relatively little effort, which is a great advantage from a development perspective, but can be a problem if quickly developed systems are deployed without proper planning. Care is required to build maintainable and manageable applications regardless of the implementation. Fortunately, Integration Services is designed with many features that support long-term maintainability and manageability.

Designing before developing is especially important when first getting started with Integration Services, as practices established early are often reused in subsequent efforts, especially logging, auditing, and overall structure. Perhaps the key advantage to developing with Integration Services is the opportunity to centralize everything about a data processing task in a single place, with clear precedence between steps, and opportunities to handle errors as they occur. Centralization greatly increases maintainability compared to the traditional ‘‘script here, program there, stored procedure somewhere else’’ approach.

Other topics to consider during design include the following:

■ Identify repeating themes for possible package reuse. Many tasks that repeat the same activities on objects with the same metadata are good candidates for placing in reused subpackages.

■ Appropriate logging strategies are the key to operational success. When an error occurs, who will be responsible for noticing and how will they know? For example, how will someone know whether a package was supposed to run but did not for some reason? What level of logging is appropriate? (More is not always better; too many irrelevant details mask true problems.) What kinds of environment and package state information will be required to understand why a failure has occurred after the fact? (For more information about logging, see the next section.)

■ Auditing concepts may be useful for both compliance and error-recovery operations. What type of information should be associated with data created by a package? If large quantities of information are required, then consider adding the details to an audit or lineage log, adding only an ID to affected records. Alternately, the Audit transform described earlier in this chapter can be used to put audit information on each row.

■ For packages that run on multiple servers or environments, what configuration details change for those environments? Which storage mode (registry, SQL, XML, etc.) will be most effective at distributing configuration data? (See the ‘‘Package configurations’’ section later in this chapter.)

■ Determine how to recover from a package failure. Will manual intervention be required before the package can run again? For example, a package that loads data may be able to use transactions to ensure that rerunning a package does not load duplicate rows.

■ Consider designing checkpoint restartable logic for long-running packages. (See the ‘‘Checkpoint restart’’ section later in this chapter.)

■ Determine the most likely failure points in a package. What steps will be realistically taken to address a failure? Add those steps to the package if possible, using error data flows and task constraints now to avoid labor costs later.

Good development practices help increase maintainability as well. Give packages, tasks, components, and other visible objects meaningful names. Liberal use of annotations to note non-obvious meanings and motivations will benefit future developers, too. Finally, use version-control software to maintain a history of package and related file versions.

Logging

Because many packages are destined for unattended operation, generating an execution log is an excellent method for tracking operations and collecting debug information. To configure logging for a package, right-click on the package design surface and choose Logging. On the Providers and Logs
