MCTS SQL Server 2008
Exam 432
ETL Techniques

Exam objectives in this chapter:
■ Bulk Copying Data
■ Distributed Queries
■ SQL Server Integration Services
■ Alternative ETL Solutions

Exam objectives review:
■ Summary of Exam Objectives
■ Exam Objectives Fast Track
■ Exam Objectives Frequently Asked Questions
■ Self Test
■ Self Test Quick Answer Key
Chapter 8 • ETL Techniques
Introduction
The ETL (extract, transform, load) process is generally done using SQL Server Integration Services (SSIS). Other methods for ETL in SQL Server are available, such as BCP, SELECT INTO, and INSERT/SELECT. You can also write custom code using ASP.NET or other languages.
This chapter covers the methods for performing ETL with SQL Server, from BCP to SQL Server Integration Services.
Distributed queries and transactions are also covered in this chapter, as well as setting up and using a linked server for cross-server communications and queries. Setting up systems to work correctly with distributed queries can be difficult, and this chapter will help to demystify the process.
Understanding ETL
Organizations often have more than just one database. Organizations also often need to share data between those databases in-house as well as to exchange data with business partners. Therefore, organizations need tools to move data between databases, across platforms, and often even between different businesses. As the data moves from one database to another, there is also often a need to perform some manipulation or transformation on the data. The process of extracting data from a data source, transforming the data to meet your needs, and loading the data into a destination is known more generally as extract, transform, and load, or more simply, ETL.
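The three ETL stages can be illustrated with a minimal sketch. It is not SQL Server specific: it assumes Python's standard csv and sqlite3 modules as stand-ins for a flat-file source and a destination database, and the sample data, the payments table, and all function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content standing in for data received from a partner.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2, bob ,7.25
"""


def extract(text):
    """Extract: read rows from the delimited source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and capitalize names, convert amounts to whole cents."""
    return [
        (int(r["id"]), r["name"].strip().title(), round(float(r["amount"]) * 100))
        for r in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER, name TEXT, cents INTEGER)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run all three stages and return what ended up in the destination."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT id, name, cents FROM payments ORDER BY id").fetchall()


if __name__ == "__main__":
    # An in-memory SQLite database is used here only as a stand-in destination.
    print(run_etl(sqlite3.connect(":memory:")))
```

Keeping the stages as separate functions mirrors the definition above: each stage can be replaced (a different source, a different transformation, a different destination) without touching the others.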
SQL Server provides a wide array of tools to move data. Exactly what data these tools move and how they do it varies significantly. The data could be an entire database, as with the Copy Database Wizard; a specific set of rows, as with the BCP, BULK INSERT, and OPENROWSET commands; or an extremely complex set of data movements as defined by an SSIS package. In this section, we will review all of these tools. We'll start by looking at tools for bulk copying data.
Bulk Copying Data
When you want to move a lot of data into or out of SQL Server, you will likely find that the most efficient and best-performing tools are those that facilitate the bulk loading of data. The bulk loading tools are typically used to move data between flat files (e.g., a CSV file) and SQL Server, but it is possible to use the bulk copy operations with data sources other than flat files by using certain programming techniques. Regardless of what the data source is, though, the whole point of using the bulk copy tools is performance. Organizations use bulk loading techniques when they want to move a large number of rows as quickly as possible, and with as little impact on their servers as possible. Optimizing that performance can be a bit of an art, but we'll talk about some performance techniques at the end of this section. Before we get into performance, however, let's look at the first of the bulk load tools, BCP.
Using BCP
BCP (short for Bulk Copy Program) is a command line utility that you can use to import data from a file into SQL Server, export data from SQL Server into a file, and generate format files for use by BCP and other bulk copy tools. BCP has been available in many versions of SQL Server. Before tools like SSIS existed, BCP was the only available means for easily getting data into or out of SQL Server.
BCP is limited to working with flat files and SQL Server. If you want to get data from another data source (e.g., an Oracle database) into SQL Server, BCP is not the most direct means for this type of data transfer. SSIS would be a better tool for this type of transfer. BCP is a useful tool, however, if you have flat files (delimited, fixed format, etc.) to either extract data from or load data into.
Exam Warning
Remember that BCP can work with only text files and SQL Server It can’t
be used to move data directly from one SQL Server to another The data
must first be exported from one database to a flat file and then
imported to the destination from the flat file.
So let’s start with the basic structure of the BCP utility:
bcp {dbtable | query} {in | out | queryout | format} datafile [option, n]
You can find explanations of these arguments in Table 8.1.
Table 8.1 Required BCP Syntax Arguments

Syntax Element: Description

dbtable: SQL table or view. This is the qualified object name. For example, to import into the Person table in the Person schema of the AdventureWorks2008 database, this would be [AdventureWorks2008].[Person].[Person]. The bcp command allows you to tell it which SQL Server instance to connect to, but does not allow you to specify which database. If the table you wish to work with is not in the connection's default database, you will need to qualify the object name using the database.schema.object format. If you will be importing to a view, you need to make sure that the view supports inserts. dbtable can be used with the in, out, or format options. This means that you can import to the table (in) or export from it (out).

query: A SQL query that selects data from a table or view. You need to enclose the query in double quotes (") and use single quotes for any character literals inside the query. query can be used with the queryout and format options. You cannot import to a query.

in: Imports data from the data file into the dbtable specified.

out: Exports data to the data file from the dbtable specified.

queryout: Exports data to the data file from the specified query.
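The pairing rules in Table 8.1 (dbtable with in, out, or format; query with queryout or format) can be sketched as a small helper that assembles a bcp argument list and rejects invalid combinations. This is an illustrative sketch only: the function name build_bcp_args, the server names, and the data file name are hypothetical, and the helper only builds the command, it does not run bcp. The -S (server), -T (trusted connection), and -c (character mode) switches are standard bcp options.

```python
# Modes that Table 8.1 allows for each kind of source.
DBTABLE_MODES = {"in", "out", "format"}
QUERY_MODES = {"queryout", "format"}


def build_bcp_args(source, mode, datafile, options=(), is_query=False):
    """Return a bcp argument list, enforcing the source/mode pairing rules.

    source   -- qualified table or view name, or the query text
    mode     -- one of: in, out, queryout, format
    datafile -- path of the flat file to read from or write to
    options  -- extra bcp switches, passed through unchanged
    """
    allowed = QUERY_MODES if is_query else DBTABLE_MODES
    if mode not in allowed:
        kind = "query" if is_query else "dbtable"
        raise ValueError(f"a {kind} cannot be used with the {mode!r} option")
    # As a list element, the query needs no extra shell quoting; on a command
    # line typed by hand it would be enclosed in double quotes.
    return ["bcp", source, mode, datafile, *options]


if __name__ == "__main__":
    table = "AdventureWorks2008.Person.Person"
    # BCP cannot copy directly between servers, so the transfer takes two
    # steps: export to a flat file, then import from it.
    # SourceServer and DestServer are hypothetical instance names.
    export_cmd = build_bcp_args(table, "out", "person.dat", ["-S", "SourceServer", "-T", "-c"])
    import_cmd = build_bcp_args(table, "in", "person.dat", ["-S", "DestServer", "-T", "-c"])
    print(" ".join(export_cmd))
    print(" ".join(import_cmd))
```

Returning a list rather than a single string means the result could be handed to a process launcher such as subprocess.run without any manual quoting of the query text.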