    For i = 1 To 20
        Output0Buffer.AddRow()
        Output0Buffer.RandomInt = CInt(Rnd() * 100)
    Next
End Sub

This example works for a single output with the default name Output 0 containing a single integer column RandomInt. Notice how each output is exposed as its name plus "Buffer," with embedded spaces removed from the name. New rows are added using the AddRow method, and columns are populated by referring to them as output properties. An additional property is exposed for each column with the suffix _IsNull (e.g., Output0Buffer.RandomInt_IsNull) to mark a value as NULL.
Reading data from an external source requires some additional steps, including identifying the connection managers that will be referenced within the script on the Connection Managers page of the editor. Then, in the script, additional methods must be overridden: AcquireConnections and ReleaseConnections to open and close any connections, and PreExecute and PostExecute to open and close any record sets, data readers, and so on (database sources only). Search for the topic "Extending the Data Flow with the Script Component" in SQL Server Books Online for full code samples and related information.
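As an illustration of that sequence, the following is a minimal sketch of a script source that reads rows through an ADO.NET connection manager. The connection manager name (MyADOConnection), the query, and the output column (CustomerName on the default Output 0) are assumptions for this sketch, and the designer-generated attributes and standard Imports of the component scaffolding are omitted.

Imports System.Data.SqlClient

Public Class ScriptMain
    Inherits UserComponent

    ' Sketch only: assumes an ADO.NET (SqlClient) connection manager named
    ' "MyADOConnection" was added on the Connection Managers page, and that
    ' Output 0 defines a string column named CustomerName.
    Private conn As SqlConnection
    Private reader As SqlDataReader

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)
        ' Obtain the open connection from the referenced connection manager
        conn = CType(Me.Connections.MyADOConnection.AcquireConnection(Transaction), SqlConnection)
    End Sub

    Public Overrides Sub PreExecute()
        ' Open a data reader over the source query
        Dim cmd As New SqlCommand("SELECT Name FROM dbo.Customer", conn)
        reader = cmd.ExecuteReader()
    End Sub

    Public Overrides Sub CreateNewOutputRows()
        ' Copy each source row into the output buffer
        While reader.Read()
            Output0Buffer.AddRow()
            Output0Buffer.CustomerName = reader.GetString(0)
        End While
    End Sub

    Public Overrides Sub PostExecute()
        ' Close the data reader opened in PreExecute
        reader.Close()
    End Sub

    Public Overrides Sub ReleaseConnections()
        ' Return the connection to the connection manager
        Me.Connections.MyADOConnection.ReleaseConnection(conn)
    End Sub
End Class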
Destinations
Data Flow destinations provide a place to write the data transformed by the Data Flow task. Configuring destinations is similar to configuring sources, including both basic and advanced editors, and the three common steps:
■ Connection Manager: Specify the particular table, file(s), view, or query to which data will be written. Several destinations will accept a table name from a variable.
■ Columns: Map the columns from the data flow (input) to the appropriate destination columns.
■ Error Output: Specify what to do should a row fail to insert into the destination: ignore the row, cause the component to fail (default), or redirect the problem row to error output.
The available destinations are as follows:
■ OLE DB: Writes rows to a table, view, or SQL command (ad hoc view) for which an OLE DB driver exists. Table/view names can be selected directly in the destination or read from a string variable, and each can be selected with or without fast load. Fast load can decrease runtime by an order of magnitude or more depending on the particular data set and selected options. Options for fast load are as follows:
■ Keep identity: When the target table contains an identity column, either this option must be chosen to allow the identity to be overwritten with inserted values (à la SET IDENTITY_INSERT ON) or the identity column must be excluded from mapped columns so that new identity values can be generated by SQL Server.
■ Keep nulls: Choose this option to load null values instead of any column defaults that would normally apply.
■ Table lock: Keeps a table-level lock during execution.
■ Check constraints: Enables CHECK constraints (such as a valid range on an integer column) for inserted rows. Note that other types of constraints, including UNIQUE, PRIMARY KEY, FOREIGN KEY, and NOT NULL, cannot be disabled. Loading data with CHECK constraints disabled will result in those constraints being marked as "not trusted" by SQL Server.
■ Rows per batch: Specifying a batch size provides a hint for building the query plan, but it does not change the size of the transaction used to put rows in the destination table.
■ Maximum insert commit size: Similar to the BatchSize property of the Bulk Insert task (see "Control flow tasks" earlier in the chapter), the maximum insert commit size is the largest number of rows included in a single transaction. The default value is very large (the maximum integer value), allowing most any load task to be committed in a single transaction.
■ SQL Server: This destination uses the same fast-loading mechanism as the Bulk Insert task but is restricted in that the package must execute on the SQL Server that contains the target table/view. Speed can exceed OLE DB fast loading in some circumstances.
■ ADO.NET: Uses an ADO.NET connection manager to write data to a selected table or view.
■ DataReader: Makes the data flow available via an ADO.NET DataReader, which can be opened by other applications, notably Reporting Services, to read the output from the package.
■ Flat File: Writes the data flow to a file specified by a Flat File connection manager. Because the file is described in the connection manager, limited options are available in the destination: Choose whether to overwrite any existing file, and provide file header text if desired.
■ Excel: Sends rows from the data flow to a sheet or range in a workbook using an Excel connection manager. Note that versions of Excel prior to 2007 can handle at most 65,536 rows and 256 columns of data, the first row of which is consumed by header information. Excel 2007 format supports 1,048,576 rows and 16,384 columns. Strings are required to be Unicode, so any DT_STR types need to be converted to DT_WSTR before reaching the Excel destination.
■ Raw: Writes rows from the data flow to an Integration Services format suitable for fast loads by a raw source component. It does not use a connection manager; instead, specify the AccessMode by choosing to supply a filename via direct input or a string variable. Set the WriteOption property to an appropriate value:
■ Append: Adds data to an existing file, assuming the new data matches the previously written format.
■ Create always: Always starts a new file.
■ Create once: Creates the file initially and then appends on subsequent writes. This is useful for loops that write to the same destination many times in the same package.
■ Truncate and append: Keeps the existing file's metadata, but replaces the data.
Raw files cannot handle BLOB data, which excludes any of the large data types, including text, varchar(max), and varbinary(max).
■ Recordset: Writes the data flow to a variable. Stored as a recordset, the object variable is suitable for use as the source of a Foreach loop or other processing within the package.
■ SQL Server Compact: Writes rows from the data flow into a SQL Server Compact database table. Configure by identifying the SQL Server Compact connection manager that points to the appropriate .SDF file, and then enter the name of the table on the Component Properties tab.
■ Dimension Processing and Partition Processing: These tasks enable the population of Analysis Services cubes without first populating the underlying relational data source. Identify the Analysis Services connection manager of interest, choose the desired dimension or partition, and then select a processing mode:
■ Add/Incremental: Minimal processing required to add new data.
■ Full: Complete reprocess of structure and data.
■ Update/Data-only: Replaces data without updating the structure.
■ Data Mining Model Training: Provides training data to an existing data mining structure, thus preparing it for prediction queries. Specify the Analysis Services connection manager and the target mining structure in that database. Use the Columns tab to map the training data to the appropriate mining structure attributes.
■ Script: A script can also be used as a destination, using a similar process to that already described for using a script as a source. Use a script as a destination to format output in a manner not allowed by one of the standard destinations. For example, a file suitable for input to a COBOL program could be generated from a standard data flow. Start by dragging a script component onto the design surface, choosing Destination from the pop-up Select Script Component Type dialog. Identify the input columns of interest and configure the script properties as described previously. After pressing the Edit Script button to access the code, the primary routine to be coded is named after the input name with a _ProcessInputRow suffix (e.g., Input0_ProcessInputRow). Note the row object passed as an argument to this routine, which provides the input column information for each row (e.g., Row.MyColumn and Row.MyColumn_IsNull). Connection configuration and preparation is the same as described in the source topic. Search for the topic "Extending the Data Flow with the Script Component" in SQL Server Books Online for full code samples and related information.
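The following is a minimal sketch of such a destination script, writing each input row as a fixed-width text record. The file path and the input columns AccountID (a string) and Amount (a decimal) are assumptions for illustration, and the designer-generated scaffolding is again omitted; a real package would typically take the path from a variable or connection manager.

Imports System.IO

Public Class ScriptMain
    Inherits UserComponent

    ' Sketch only: assumes the default input name Input 0 with a string column
    ' AccountID and a decimal column Amount; the output path is hypothetical.
    Private writer As StreamWriter

    Public Overrides Sub PreExecute()
        ' Open the target file once, before rows begin arriving
        writer = New StreamWriter("C:\Extract\accounts.dat", False)
    End Sub

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Emit one fixed-width record per row, writing blanks for NULL values
        Dim account As String = If(Row.AccountID_IsNull, "", Row.AccountID).PadRight(10)
        Dim amount As String = If(Row.Amount_IsNull, "", Row.Amount.ToString("0.00")).PadLeft(12)
        writer.WriteLine(account & amount)
    End Sub

    Public Overrides Sub PostExecute()
        ' Flush and close the file after the last row has been processed
        writer.Close()
    End Sub
End Class

Opening the writer in PreExecute and closing it in PostExecute keeps file handling out of the per-row method, the same preparation pattern described for script sources.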
Transformations
Between the source and the destination, transformations provide functionality to change the data from what was read into what is needed. Each transformation requires one or more data flows as input and provides one or more data flows as output. Like sources and destinations, many transformations provide a way to configure error output for rows that fail the transformation. In addition, many transformations provide both a basic and an advanced editor to configure the component, with normal configurations offered by the basic editor when available.

The standard transformations available in the Data Flow task are as follows:
■ Aggregate: Functions rather like a GROUP BY query in SQL, generating Min, Max, Average, and so on, on the input data flow. Due to the nature of this operation, Aggregate does not pass through the data flow, but outputs only aggregated rows. Begin on the Aggregations tab by selecting the columns to include, adding the same column multiple times in the bottom pane if necessary. Then, for each column, specify the output column name (Output Alias), the operation to be performed (such as Group by or Count), and any comparison flags for determining value matches (e.g., Ignore case). For columns being distinct counted, performance hints can be supplied for the exact number (Distinct Count Keys) or an approximate number (Distinct Count Scale) of distinct values that the transform will encounter. The scale ranges are as follows:
■ Low: Approximately 500,000 values.
■ Medium: Approximately 5,000,000 values.
■ High: Approximately 25,000,000 values.
Likewise, performance hints can be specified for the Group By columns by expanding the Advanced section of the Aggregations tab, entering either an exact (Keys) or an approximate (Keys Scale) count of different values to be processed. Alternately, you can specify performance hints for the entire component, instead of individual columns, on the Advanced tab, along with the amount to expand memory when additional memory is required.
■ Audit: Adds execution context columns to the data flow, enabling data to be written with audit information about when it was written and where it came from. Available columns are ExecutionInstanceGUID, PackageID, PackageName, VersionID, ExecutionStartTime, MachineName, UserName, TaskName, and TaskID.
■ Cache: Places selected columns from a data flow into a cache for later use by a Lookup transform. Identify the Cache connection manager and then map the data flow columns into the cache columns as necessary. The cache is a write-once, read-many data store: All the data to be included in the cache must be written by a single Cache transform, but it can then be used by many Lookup transforms.
■ Character Map: Allows strings in the data flow to be transformed by a number of operations: Byte reversal, Full width, Half width, Hiragana, Katakana, Linguistic casing, Lowercase, Simplified Chinese, Traditional Chinese, and Uppercase. Within the editor, choose the columns to be transformed, adding a column multiple times in the lower pane if necessary. Each column can then be given a destination of a New column or In-place change (replaces the contents of a column). Then choose an operation and the name for the output column.
■ Conditional Split: Enables rows of a data flow to be split between different outputs depending on the contents of the row. Configure by entering output names and expressions in the editor. When the transform receives a row, each expression is evaluated in order, and the first one that evaluates to true will receive that row of data. When none of the expressions evaluate to true, the default output (named at the bottom of the editor) receives the row. Once configured, as data flows are connected to downstream components, an Input Output Selection pop-up appears, and the appropriate output can be selected. Unmapped outputs are ignored and can result in data loss.
■ Copy Column: Adds a copy of an existing column to the data flow. Within the editor, choose the columns to be copied, adding a column multiple times in the lower pane if necessary. Each new column can then be given an appropriate name (Output Alias).
■ Data Conversion: Adds a copy of an existing column to the data flow, enabling data type conversions in the process. Within the editor, choose the columns to be converted, adding a column multiple times in the lower pane if necessary. Each new column can then be given an appropriate name (Output Alias) and data type. Conversions between code pages are not allowed. Use the advanced editor to enable locale-insensitive fast parsing algorithms by setting the FastParse property to true on each output column.
■ Data Mining Query: Runs a DMX query for each row of the data flow, enabling rows to be associated with predictions, such as the likelihood that a new customer will make a purchase or the probability that a transaction is fraudulent. Configure by specifying an Analysis Services connection manager, choosing the mining structure, and highlighting the mining model to be queried. On the Query tab, click the Build New Query button and map columns in the data flow to the columns of the model (a default mapping is created based on column name). Then specify the columns to be added to the data flow in the lower half of the pane (usually a prediction function) and give the output an appropriate name (Alias).
■ Derived Column: Uses expressions to generate values that can either be added to the data flow or replace existing columns. Within the editor, construct Integration Services expressions to produce the desired value, using type casts to change data types as needed. Assign each expression to either replace an existing column or be added as a new column. Give new columns an appropriate name and data type.
■ Export Column: Writes large object data types (DT_TEXT, DT_NTEXT, or DT_IMAGE) to file(s) specified by a filename contained in the data flow. For example, large text objects could be extracted into different files for inclusion in a website or text index. Within the editor, specify two columns for each extract defined: a large object column and a column containing the target filename. A file can receive any number of objects. Set the Append/Truncate/Exists options to indicate the desired file create behavior.
■ Fuzzy Grouping: Identifies duplicate rows in the data flow using exact matching for any data type and/or fuzzy matching for string data types (DT_STR and DT_WSTR). Configure the task to examine the key columns within the data flow that identify a unique row. Several columns are added to the output as a result of this transform:
■ Input key (default name _key_in): A sequential number assigned to identify each input row.
■ Output key (default name _key_out): The input key of the row this row matches (or its own input key if it is not a duplicate). One way to cull the duplicate rows from the data flow is to define a downstream conditional split on the condition [_key_in] == [_key_out].
■ Similarity score (default name _score): A measure of the similarity of the entire row, on a scale of 0 to 1, to the first row of the set of duplicates.
■ Group Output (default name <column>_clean): For each key column selected, this is the value from the first row of the set of duplicates (that is, the value from the row indicated by _key_out).
■ Similarity Output (default name _Similarity_<column>): For each key column selected, this is the similarity score for that individual column versus the first row of the set of duplicates.
Within the editor, specify an OLE DB connection manager, where the transform will have permissions to create a temporary table. Then configure each key column by setting its Output, Group Output, and Similarity Output names. In addition, set the following properties for each column:
■ Match Type: Choose between Fuzzy and Exact match types for each string column (non-string data types always match exactly).
■ Minimum Similarity: The smallest similarity score allowed for a match. Leaving fuzzy match columns at the default of 0 enables similarity to be controlled from the slider on the Advanced tab of the editor.
■ Numerals: Specify whether leading or trailing numerals are significant in making comparisons. The default of Neither specifies that leading and trailing numerals are not considered in matches.
■ Comparison Flags: Choose settings appropriate to the type of strings being compared.
■ Fuzzy Lookup: Similar to the Lookup transform, except that when an exact lookup fails, a fuzzy lookup is attempted for any string columns (DT_STR and DT_WSTR). Specify an OLE DB connection manager and table name where values will be looked up, and a new or existing index to be used to cache fuzzy lookup information. On the Columns tab, specify a join between the data flow and the reference table, and which columns from the reference table will be added to the data flow. On the Advanced tab, select the similarity required for finding a match: The lower the number, the more liberal the matches become. In addition to the specified columns added to the data flow, match metadata is added as follows:
■ _Similarity: Reports the similarity between all of the values compared.
■ _Confidence: Reports the confidence level that the chosen match was the correct one compared to other possible matches in the lookup table.
■ _Similarity_<column name>: Similarity for each individual column.
The advanced editor has MinimumSimilarity and FuzzyComparisonFlags settings for each individual column.
■ Import Column: Reads large object data types (DT_TEXT, DT_NTEXT, or DT_IMAGE) from files specified by a filename contained in the data flow, adding the text or image objects as a new column in the data flow. Configure in the advanced editor by identifying each column that contains a filename to be read on the Input Columns tab. Then, on the Input and Output Properties tab, create a new output column for each filename column to contain the contents of the files as they are read, giving the new column an appropriate name and data type. In the output column properties, note the grayed-out ID property, and locate the properties for the corresponding input (filename) column. Set the input column's FileDataColumnID property to the output column's ID value to tie the filename and contents columns together. Set the ExpectBOM property to true for any DT_NTEXT data being read that has been written with byte-order marks.
■ Lookup: Finds rows in a database table or cache that match the data flow and includes selected columns in the data flow, much like a join between the data flow and a table or cache. For example, a product ID could be added to the data flow by looking up the product name in the master table. Note that all lookups are case sensitive regardless of the collation of the underlying database. Case can be effectively ignored by converting the associated text values to a single case before comparison (e.g., using the UPPER function in a derived column expression).
The Lookup transform operates in three possible modes:
■ No cache: Runs a query against the source database for each lookup performed. Because no cache is kept in memory, the number of database accesses is not reduced, but each lookup reflects the latest value stored in the database.
■ Full cache: Populates an in-memory cache from either the database or a Cache connection manager (see the Cache transform and connection manager descriptions earlier in this chapter) and relies solely on that cache for lookups during execution. This minimizes the disk accesses required but may exceed available memory for very large data sets, which can dramatically reduce performance. Because no error message appears as performance degrades, it is useful to monitor resource usage while processing sample data sets to determine whether the cache size will work for the range of data sizes expected in production uses.
■ Partial cache: Populates an in-memory cache with a subset of the data available from the database, and then issues queries against the database for any values not found within the in-memory cache. This method provides a compromise between speed and available memory. Whenever possible, this mode should be used with a query that fills the cache with the most likely rows encountered. For example, many warehousing applications are more likely to access values recently added to the database.
Start the Lookup transform configuration process by selecting the cache mode and, for Full cache mode, the connection type. The most common handling of rows with no matching entries is to "Redirect rows to no match output" for further processing, but the context may require one of the other options. On the Connections page, choose the connection manager containing the reference data, and the table or query to retrieve that data from (for database connections). Usually, the best choice is a query that returns only the columns used in the lookup, which avoids reading and storing unused columns.
On the Columns tab, map the join columns between the data flow and the reference table by dragging and dropping lines between corresponding columns. Then check the reference table columns that should be added to the data flow, adjusting names as desired in the bottom pane.
The Advanced tab provides an opportunity to optimize memory performance of the Lookup transform for Partial cache mode, and to modify the query used for row-by-row lookups. Set the size for in-memory caching based on the number of rows that will be loaded; these values often require testing to refine. The "Enable cache for rows with no matching entries" option allows rows whose row-by-row lookups fail to be saved in the in-memory cache along with the data originally read at the start of the transform, thus avoiding repeated database accesses for missing values. Review the custom query to ensure that the row-by-row lookup statement is properly built.
■ Merge: Combines the rows of two sorted data flows into a single data flow. For example, if some of the rows of a sorted data flow are split by an error output or Conditional Split transform, then they can be merged again. The upstream sort must have used the same key columns for both flows, and the data types of columns to be merged must be compatible. Configure by dragging two different inputs to the transform and mapping columns together in the editor. See the Union All description later in this list for the unsorted combination of flows.
■ Merge Join: Provides SQL join functionality between data flows sorted on the join columns. Configure by dragging the two flows to be joined to the transform, paying attention to which one is connected to the left input if a left outer join is desired. Within the editor, choose the join type, map the join columns, and choose which columns are to be included in the output.
■ Multicast: Copies every row of an input data flow to many different outputs. Once an output has been connected to a downstream component, a new output will appear for connection to the next downstream component. Only the names of the outputs are configurable.
■ OLE DB Command: Executes a SQL statement (such as UPDATE or DELETE) for every row in a data flow. Configure by specifying an OLE DB connection manager to use when executing the command, and then switch to the Component Properties tab and enter the SQL statement using question marks for any parameters (e.g., UPDATE MyTable SET Col1 = ? WHERE Col2 = ?). On the Column Mappings tab, associate a data flow column with each parameter in the SQL statement.
■ Percentage Sampling: Splits a data flow by randomly sampling the rows for a given percentage. For example, this could be used to separate a data set into training and testing sets for data mining. Within the editor, specify the approximate percentage of rows to allocate to the selected output, while the remaining rows are sent to the unselected output. If a sampling seed is provided, the transform will always select the same rows from a given data set.
■ Pivot: Denormalizes a data flow, similar to the way an Excel pivot table operates, making attribute values into columns. For example, a data flow with three columns, Quarter, Region, and Revenue, could be transformed into a data flow with columns for Quarter, Western Region, and Eastern Region, thus pivoting on Region.
■ Row Count: Counts the number of rows in a data flow and places the result into a variable. Configure by populating the VariableName property.
■ Row Sampling: Nearly identical to the Percentage Sampling transform, except that the approximate number of rows to be sampled is entered, rather than the percentage of rows.
■ Script: Using a script as a transformation enables transformations with very complex logic to act on a data flow. Start by dragging a script component onto the design surface, choosing Transformation from the pop-up Select Script Component Type dialog. Within the editor's Input Columns tab, mark the columns that will be available in the script, and indicate which will be ReadWrite versus ReadOnly. On the Inputs and Outputs tab, add any output columns that will be populated by the script above and beyond the input columns.
On the Script page of the editor, list the read-only and read/write variables to be accessed within the script, separated by commas, in the ReadOnlyVariables and ReadWriteVariables properties, respectively. Click the Edit Script button to expose the code itself, and note that the primary method to be coded overrides <inputname>_ProcessInputRow, as shown in this simple example:
Public Overrides Sub Input0_ProcessInputRow _
    (ByVal Row As Input0Buffer)

    ' Source system indicates missing dates with old values;
    ' replace those with NULLs. Also determine if the given time
    ' is during defined business hours.
    If Row.TransactionDate < #1/1/2000# Then
        Row.TransactionDate_IsNull = True
        Row.PrimeTimeFlag_IsNull = True
    Else
        ' Set flag for prime time transactions
        If Weekday(Row.TransactionDate) > 1 _
            And Weekday(Row.TransactionDate) < 7 _
            And Row.TransactionDate.Hour > 7 _
            And Row.TransactionDate.Hour < 17 Then
            Row.PrimeTimeFlag = True
        Else
            Row.PrimeTimeFlag = False
        End If
    End If
End Sub
This example uses one ReadWrite input (TransactionDate) and one output (PrimeTimeFlag), with the input name left at the default of Input 0. Each column is exposed as a property of the Row object, as is the additional property with the suffix _IsNull to test or set the column value as NULL. The routine is called once for each row in the data flow.
■ Slowly Changing Dimension: Compares the data in a data flow to a dimension table and, based on the roles assigned to particular columns, maintains the dimension. This component is unusual in that it does not have an editor; instead, a wizard guides the steps to define column roles and interactions with the dimension table. At the conclusion of the wizard, several components are placed on the design surface to accomplish the dimension maintenance task.
■ Sort: Sorts the rows in a data flow by selected columns. Configure by selecting the columns to sort by. Then, in the lower pane, choose the sort type, the sort order, and the comparison flags appropriate to the data being sorted.
■ Term Extraction: Builds a new data flow based on terms it finds in a Unicode text column (DT_WSTR or DT_NTEXT). This is the training part of text mining, whereby strings of a particular type are used to generate a list of commonly used terms, which is later used by the Term Lookup component to identify similar strings. For example, the text of saved RSS documents could be used to find similar documents in a large population. Configure by identifying the column containing the Unicode text to be analyzed. If a list of terms to be excluded has been built, then identify the table and column on the Exclusions tab. The Advanced tab controls the extraction algorithm, including whether terms are single words or phrases (articles, pronouns, etc., are never included), the scoring algorithm, the minimum frequency before extraction, and the maximum phrase length.
■ Term Lookup: Provides a "join" between a Unicode text column (DT_WSTR or DT_NTEXT) in the data flow and a reference table of terms built by the Term Extraction component. One row appears in the output data flow for each term matched. The output data flow also contains two columns in addition to the selected input columns: Term and Frequency. Term is the noun or noun phrase that was matched, and Frequency is the number of occurrences in the data flow column. Configure the transform by specifying the OLE DB connection manager and table that contains the list of terms. Use the Term Lookup tab to check the input columns that should be passed through to the output data flow, and then map the input Unicode text column to the Term column of the reference table by dragging and dropping between those columns in the upper pane.
■ Union All: Combines rows from multiple data flows into a single data flow, assuming the source columns are of compatible types. Configure by connecting as many data flows as needed to the component. Then, using the editor, ensure that the correct columns from each data flow are mapped to the appropriate output column.
■ Unpivot: Makes a data flow more normalized by turning columns into attribute values. For example, a data flow with one row for each quarter and a column for revenue by region could be turned into a three-column data flow: Quarter, Region, and Revenue.
Maintainable and Manageable Packages
Integration Services enables applications to be created with relatively little effort, which is a great advantage from a development perspective, but can be a problem if quickly developed systems are deployed without proper planning. Care is required to build maintainable and manageable applications regardless of the implementation. Fortunately, Integration Services is designed with many features that support long-term maintainability and manageability.
Designing before developing is especially important when first getting started with Integration Services, as practices established early are often reused in subsequent efforts, especially logging, auditing, and overall structure. Perhaps the key advantage to developing with Integration Services is the opportunity to centralize everything about a data processing task in a single place, with clear precedence between steps, and opportunities to handle errors as they occur. Centralization greatly increases maintainability compared to the traditional "script here, program there, stored procedure somewhere else" approach.

Other topics to consider during design include the following:
■ Identify repeating themes for possible package reuse. Many tasks that repeat the same activities on objects with the same metadata are good candidates for placing in reused subpackages.
■ Appropriate logging strategies are the key to operational success. When an error occurs, who will be responsible for noticing, and how will they know? For example, how will someone know whether a package was supposed to run but did not for some reason? What level of logging is appropriate? (More is not always better; too many irrelevant details mask true problems.) What kinds of environment and package state information will be required to understand why a failure has occurred after the fact? (For more information about logging, see the next section.)
■ Auditing concepts may be useful for both compliance and error-recovery operations. What type of information should be associated with data created by a package? If large quantities of information are required, then consider adding the details to an audit or lineage log, adding only an ID to affected records. Alternately, the Audit transform described earlier in this chapter can be used to put audit information on each row.
■ For packages that run on multiple servers or environments, what configuration details change for those environments? Which storage mode (registry, SQL, XML, etc.) will be most effective at distributing configuration data? (See the "Package configurations" section later in this chapter.)
■ Determine how to recover from a package failure. Will manual intervention be required before the package can run again? For example, a package that loads data may be able to use transactions to ensure that rerunning a package does not load duplicate rows.
■ Consider designing checkpoint-restartable logic for long-running packages. (See the "Checkpoint restart" section later in this chapter.)
■ Determine the most likely failure points in a package. What steps will realistically be taken to address a failure? Add those steps to the package if possible, using error data flows and task constraints now to avoid labor costs later.
Good development practices help increase maintainability as well. Give packages, tasks, components, and other visible objects meaningful names. Liberal use of annotations to note non-obvious meanings and motivations will benefit future developers, too. Finally, use version-control software to maintain a history of package and related file versions.
Logging
Because many packages are destined for unattended operation, generating an execution log is an excellent method for tracking operations and collecting debug information. To configure logging for a package, right-click on the package design surface and choose Logging. On the Providers and Logs