Bust a Move with Your SSIS – Passing Package Variables
Expert Reference Series of White Papers
Integration Services (SSIS), the next generation of the Extract, Transform, and Load (ETL) feature included with Microsoft SQL Server 2005, has befuddled many database administrators. This is due to the new programming paradigm and the complexity of the development environment. SSIS includes many new capabilities bundled with a Visual Studio front end. This paper explores the creation of sample development data using the most basic features of this new interface. The challenge of declaring and passing package variables contains enough material for an hour's skill-building session with this new tool.
Extract, Transform, and Load (ETL)
The challenge of the changes to the ETL feature is comparable to a rock band taking over a square dance. It's time to let go, learn to dance to a new beat, and bust some moves with this great new tool, SSIS. The complexities of this new tool have caused many administrators to retreat to the familiar tempo of DTS. However, as we explore this tool, you will find the creativity available in this new environment compelling enough to bust some moves of your own.
The goal of this paper is to demonstrate the generation of a table of random data to be used in a development environment, exposing the nuances of passing variable values into the various task objects that make up the package.
One of the challenges of application development in a database is having data available for testing and reporting. Adding to this challenge are the typical privacy requirements that prevent the use of actual patient personal identification. In my experience, it is always better to develop with test data that is as reflective of the production data as possible, especially when interviewing end users using a prototype of the system under development. I will walk through the steps needed to create an SSIS package that builds a table of test data.
Producing the end result of the package, the Person table, involves just a couple of challenges. The package will be configured with package-level variables that are passed from one task to another. This poses the challenges of declaring the variables and referencing them throughout the package. Our goal is to generate random patient name and SSN information to be inserted into the Person table, whose schema is shown below in Step 1. The purpose of this table is to replace a table containing an extract of live production data to facilitate application testing. Overwriting the contents of a table with randomly generated data obscures the private details in a database, which are protected by legal constraints, while allowing proper acceptance testing and comparison to an existing system prior to cutover to the new system.
Bill Kenworthy, Global Knowledge Instructor, MCDBA
The project at hand starts with the generation of a simple database containing five tables and two stored procedures. An SSIS solution is created to use this database and populate the Person table with random data in three columns: FirstName, LastName, and SSN.
The name data is generated from the contents of two driving tables, FName and LName. These tables contain the seed data for name generation. The result of the execution of the name generation is a table containing an Identity column and columns for First and Last Name. The SSN generation fills a separate table and requires no seed data.
The name generation takes place inside a For Next Loop that uses a package variable to control the number of rows generated. Once the loop completes, execution passes to a SQL Task that runs the SSN generation stored procedure. After this task completes, a Merge Join is the last major data manipulation activity. The Merge Join, contained in a Data Flow Task, combines the two staging tables into the final product. The moves that make this package possible are creating package variables, passing the variable values to the task objects, and paying careful attention to matching column data types. A data dictionary for the development database is contained in Resource A.
The Project
1. Generate tables and stored procedures for the project
2. Create an SSIS project and add the needed variables
3. Add a SQL Task that truncates the working tables
4. Add a For Next Loop and configure looping parameters
5. Call a stored procedure in the loop, passing a parameter
6. Generate the SSN data using an Execute SQL Task
7. Add a Data Flow Task
8. Configure a data merge task
The tasks to be accomplished in this project
The following sections describe the individual steps of creating the package; however, the reader will find the configuration entries in the screenshots valuable in duplicating this demonstration. The database and its objects are the foundation upon which we build our transformation. The solution requires 2 seed tables, 2 staging tables used to hold temporary results, and a final table holding the merged contents of the two staging tables. The database diagram is shown in Figure 1.
Step 1 Generate tables and stored procedures for the project
The data dictionary for this database is contained in the Resources section of this document. The script for generating the schema and stored procedures is in Resource B. The code has hard-coded references to the DEV database; the script should be run in the context of a database with that name.
Figure 1 Database Schema for the project
Step 2 Open an Integration Services Project.
Open your project, then right-click any clear space on the Control Flow pane and choose Variables from the context menu to open the variable declaration dialog box. Add two Int32 variables, Counter and MaxRows, with values of 0 and 1000, as shown in Figure 2. The Counter variable is used to pass the current loop index into the SQL Task contained in the loop task that will be added to the project in Step 4. MaxRows is the number of rows to be inserted into the Person table. Figure 3 shows the dialog box with appropriate entries.
Note: References to variables in this environment are case sensitive.
Resource C contains a reference to a topic in SQL Server 2005 Books Online that describes variables and links to how-to topics.
Step 3 Add a SQL Task truncating working tables
Add a SQL Task to the Control Flow window as the first task in the project and set its parameters as shown in Figure 3. Note the configuration of the ConnectionType as ADO.NET. Although this SQL Task doesn't pass parameters in the SQLStatement property, I like to keep the settings of similar objects consistent. Specifying ADO.NET as the connection type allows parameters to be referenced using the @ naming convention. The SQL query simply truncates the Person and Name tables. An appropriate reference to this object in Books Online is listed in Resource C.
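As a minimal sketch of what the SQLStatement property of this task might contain, the two statements below empty the working tables named above. The dbo schema is an assumption carried over from the script in Resource B, and the exact statement shown in Figure 3 may differ.

```sql
-- Sketch of the SQLStatement for the truncation task (assumes the dbo schema).
-- TRUNCATE TABLE removes all rows and resets any IDENTITY column to its seed,
-- so each package run starts numbering rows from the beginning again.
TRUNCATE TABLE [dbo].[Person];
TRUNCATE TABLE [dbo].[Name];
```

No GO separator appears in the sketch, because GO is a batch separator understood by client tools rather than a Transact-SQL statement.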
Step 4 Add a For Next Loop and configure looping parameters
The For Loop container defines a repeating control flow in a package. In this package, the For Loop is used to repeat the execution of the MakeAName stored procedure until the number of rows configured in the MaxRows variable has been inserted into the Name table. The For Loop container uses three elements to define the loop: init, eval, and assign (increment) control values. As you can see in Figure 4, the variable @Counter is used for indexing in the loop. This reference is case sensitive and must match a package variable name, with the @ prefix necessary in this property page. For example, the variable @MaxRows matches the MaxRows package variable. An appropriate reference to this object in Books Online is listed in Resource C.
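For readers more at home in Transact-SQL, the three loop elements behave much like the parts of a WHILE loop. The sketch below is an analogy only, not something the package itself runs. The start value of 0 matches the Counter variable declared in Step 2, but the less-than comparison and the increment of 1 are assumptions, since the actual expressions appear only in Figure 4.

```sql
-- T-SQL analogy for the For Loop container (illustrative only).
DECLARE @Counter int, @MaxRows int;
SET @MaxRows = 1000;                   -- package variable MaxRows
SET @Counter = 0;                      -- init: runs once before the first pass

WHILE @Counter < @MaxRows              -- eval: the loop continues while this is true
BEGIN
    EXEC [dbo].[MakeAName] @Counter;   -- the task placed inside the loop (Step 5)
    SET @Counter = @Counter + 1;       -- assign: runs at the end of each pass
END;
```

The separate SET statements are deliberate: SQL Server 2005 does not accept an initial value in a DECLARE statement.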
Figure 2 Variables dialog box
Figure 3 SQL Task to truncate the working tables
Figure 4 Configuring the For Loop Task
Step 5 Call a stored procedure in the loop passing a parameter.
The SQL Task, configured with the connection type ADO.NET, calls the stored procedure MakeAName. Note my preference for the ConnectionType property. Each connection type supports a different syntax for passing parameters: ADO.NET supports the @ reference, while other connection types use a ? (question mark). I prefer the @ syntax because it is consistent with the syntax used in Transact-SQL. An appropriate reference to this object in Books Online is listed in Resource C.
Figure 5 Property page for the SQL Task embedded in the Loop Task
Figure 6 Parameter map entry passes variable value from loop.
The second part of configuring this SQL Task is the mapping required to tie the variable referenced in the SQL statement to the package variable value being passed into the SQL Task by its parent container. The first three columns are selected from combo box choices; the developer enters the Parameter Name value by hand.
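The following is a sketch of how the statement and its parameter mapping might fit together with the ADO.NET connection type. The marker name @Counter is an assumption (the real entries are in Figures 5 and 6); it is passed positionally so that the stored procedure's own parameter name does not need to be known here.

```sql
-- SQLStatement with ConnectionType = ADO.NET: a named parameter marker.
-- The Parameter Mapping row ties the package variable User::Counter
-- to the parameter name @Counter used in this statement.
EXEC [dbo].[MakeAName] @Counter;

-- With a connection type that uses the ? marker (for example OLE DB),
-- the same call would be written as:
--   EXEC [dbo].[MakeAName] ?;
-- and the Parameter Name in the mapping would be the ordinal 0.
```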
Figure 7 Control Flow diagram of the project to this point.
Test it!
Now the project is at a point where it can be tested. Your package should resemble the package shown in Figure 7. If your package errors out when you run the debugger, consult the Execution Results view for error messages and resolve the errors.
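One way to check the run outside the debugger is to count the rows the loop produced. This sketch assumes the loop writes to the Name table, as described in Step 4, and that each pass of the loop inserts exactly one row.

```sql
-- Run in the DEV database after the package completes.
-- Under the assumptions above, the count should match the
-- value of the MaxRows package variable.
SELECT COUNT(*) AS NameRows
FROM [dbo].[Name];
```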
Step 6 Generating the SSN data using an Execute SQL task.
Figure 8 Configuration of the SQL Task that follows execution of the loop
Step 7 Add a data flow task
Assemble the data flow objects as shown in Figure 9; the properties to be set are in the table in Resource D. Appropriate references to the objects used in this data flow, for lookup in Books Online, are listed in Resource C. The Merge Join properties are detailed in Step 8.
Figure 9 The Data Flow contains 6 objects
Step 8 Configure a data merge task
The entire configuration of the Merge Join is shown in Figure 10. This is the only property page for the Merge Join.
Data typing is strong in this environment. The data type of each column in the output table must match that of the corresponding column in the Person table. Figure 11 shows that the FirstName output column has been configured as a Unicode string; the LastName column in this DataReader and the SSN column in the SSN DataReader should be set to Unicode as well.
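The effect of the Merge Join can be pictured as a Transact-SQL join between the two staging tables. This is a sketch for orientation only: the join on the Id identity columns, the inner join type, and the Name table's column names are assumptions, since the actual settings appear only in Figure 10. The CAST mirrors the Unicode requirement described above, because the SSN staging column is declared as char(11) in Resource B.

```sql
-- T-SQL picture of what the Merge Join produces (assumed join key and columns).
SELECT n.[FirstName],
       n.[LastName],
       CAST(s.[SSN] AS nvarchar(11)) AS [SSN]   -- non-Unicode char converted to Unicode
FROM [dbo].[Name] AS n
INNER JOIN [dbo].[SSN] AS s
    ON s.[Id] = n.[Id]
ORDER BY n.[Id];
```

Unlike this query, the Merge Join transformation requires both of its inputs to be sorted on the join columns before they reach it.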
Figure 10 Configuration of the Merge Join Object
Figure 11 Configuring the datareader column datatype properties
Figure 12 Final control flow of the project
The finished project should have a Control Flow diagram as shown in Figure 12. In this example, annotations have been added to label each task in the flow.
Figure 13 shows a few of the rows from the Person table populated using the SSIS package. A weakness in my calls to the RAND() SQL function inside the MakeASSN procedure shows up as a lot of commonality in the second and third segments of the data in the SSN column. In this snapshot of the data, you see that the value 71 is very popular in the second segment of the string, and there is a modal distribution in the last four characters of the string. There are clumps of similar values; '7137' shows up in rows 98-103. I think the inclusion of a Common Language Runtime (CLR) assembly with a function to generate a random SSN string would be a significant improvement to the package and provide a performance increase.
Figure 13 The data generated by
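The clumping is consistent with how RAND() behaves when it is given seeds that are close together: nearby integer seeds tend to return values that share their leading digits. Whether MakeASSN seeds RAND() this way is an assumption, since the procedure body is not shown in this part of the paper. The sketch below first lets you observe the effect and then shows one Transact-SQL alternative based on NEWID(); it is not the CLR function suggested above.

```sql
-- Adjacent seeds: compare the leading digits of the three results.
SELECT RAND(98) AS Seed98, RAND(99) AS Seed99, RAND(100) AS Seed100;

-- Alternative sketch: derive each segment from a fresh NEWID() value.
-- The modulo is applied before ABS so that ABS never receives the minimum int.
SELECT RIGHT('000'  + CAST(ABS(CHECKSUM(NEWID()) % 1000)  AS varchar(3)), 3) + '-'
     + RIGHT('00'   + CAST(ABS(CHECKSUM(NEWID()) % 100)   AS varchar(2)), 2) + '-'
     + RIGHT('0000' + CAST(ABS(CHECKSUM(NEWID()) % 10000) AS varchar(4)), 4) AS SSN;
```

The generated string is 11 characters long, which fits the char(11) SSN column defined in Resource B. NEWID() is evaluated once per row, so the same expression can be used in a set-based INSERT without a loop.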
I've presented a common development scenario, generating representative test data, and a possible solution to this requirement. The solution demonstrates a control flow containing a looping task, several SQL tasks, and a data flow using a merge object. Declaring package variables and passing variable values between tasks requires careful attention to detail when configuring the various tasks to share values among them. SSIS presents a flexible programming structure with no practical limit to extension. This flexibility brings with it a finer structure for controlling a group of tasks, the complexity of which bears careful experimentation. The environment provides many opportunities for tapping into the power of the .NET Framework but brings with it some new baggage, such as case sensitivity, connection type requirements, and strict type casting.
About the Author:
Bill Kenworthy has been working with SQL Server since version 6.0. His love for database challenges is reflected in his writing. Bill lives with his wife and 2 dogs at the end of a dirt road in northern Washington State.
Resources:
A. Data Dictionary for the project
Staging Tables
FName, LName
Two seed tables; the number of rows is not necessarily equal. These two tables contain the first and last name values that will be selected randomly and inserted into a row in the Name table.
Name, SSN
Working tables holding Name and SSN working data.
Production Table
Person
Stores Patient Name and SSN data
Stored procedures
MakeAName requires an integer variable that is used to seed the RAND() function. The procedure inserts a row into the Dev.dbo.Person table, providing values for the FirstName and LastName columns. The name values are randomly selected from the staging tables.
MakeASSN populates the SSN table with a unique combination of characters generated by the RAND() function. The stored procedure checks the size of the Person table and inserts the same number of rows into the staging table.
B. Script for creation of the database objects
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[dbo].[FName]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[FName](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [nvarchar](50) NULL
) ON [PRIMARY]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[dbo].[LName]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[LName](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [LastName] [nvarchar](50) NULL
) ON [PRIMARY]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id =
OBJECT_ID(N'[dbo].[SSN]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[SSN](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [SSN] [char](11) NULL
) ON [PRIMARY]