We are loading data from the text file back into the SQL_Conn table, which currently holds 58K rows.
The -h TABLOCK hint forces a lock on the receiving table, which is one of the requirements for guaranteeing minimally logged transactions. The -b option tells BCP to batch the transactions every n rows, in this case every 50,000 rows. If there are any issues during the BCP load process, any rollback that occurs will only roll back to the last committed batch. So, say I wanted to load 100,000 records, and I batched the BCP load every 20,000 records. If there were an issue while loading record 81,002, I would know that 80,000 records were successfully imported; I would lose 1,002 records, as they would roll back to the last 20,000 mark, which would be 80,000 records.
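As a rough sketch of how these switches fit together, a single load command might look like the following (the fully qualified table name, the file path, and the -n and -T switches are assumptions carried over from the earlier examples, not the author's exact command):

bcp DBA_Rep.dbo.SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -h "TABLOCK" -b 50000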
The batch file takes one parameter, which is the number of times to run the BCP command in order to load the required number of rows into the table. How did I choose 20 iterations? Simple math: 20 * 58,040 = 1,160,800 records.
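A minimal sketch of such a batch file, assuming the load command shown above, might look like this (%1 is the iteration count; the author's actual batch file is not reproduced here):

@echo off
rem Sketch only: run the BCP load %1 times (for example, 20 iterations).
set /a i=0
:loop
if %i% GEQ %1 goto done
bcp DBA_Rep.dbo.SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -h "TABLOCK" -b 50000
set /a i=%i%+1
goto loop
:done
echo Completed %1 BCP iterations.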
As you can see in Figure 3.2, this is exactly the number of rows that is now in the
SQL_Conn table, after 20 iterations of the BCP command, using the 58,040 records
in the f1_out.txt file as the source.
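A simple count query is enough to confirm the total (a sketch; the exact statement shown in Figure 3.2 is not reproduced here):

USE DBA_Rep
SELECT COUNT(*) AS TotalRows FROM SQL_Conn
-- Expected: 20 iterations * 58,040 rows = 1,160,800 rows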
Figure 3.2: Query to count SQL_Conn after loading over 1 million records

NOTE
For what it is worth, I have also used this batch file to load a terabyte's worth of data, to test how we could effectively manage such a large data store.
If you re-run the BCP command in Listing 3.2, to output the query results to a file, you will find that the process takes more than a minute for a million rows, as opposed to the previous 3 seconds for 58K rows, indicating that the output rate remains consistent (58,040 / 3 = 19,346 records per second * 60 seconds = 1.16 million). I am still seeing nearly 20,000 records per second despite the increase in data, attesting to the efficiency of the old tried and true BCP.
Filtering the output using queryout
Rather than working with the entire table, you can use the queryout option of BCP to limit the data you will be exporting, by way of a filtered T-SQL query. Suppose I want to export data only from a particular time period, say for a run_date greater than October 1st of 2008. The query is shown in Listing 3.4.

Select * from dba_rep..SQL_Conn where run_date > '10/01/2008'
Listing 3.4: Query to filter BCP output
There are many duplicate rows in the SQL_Conn table, and no indexes defined, so
I would expect that this query would take many seconds, possibly half a minute, to execute. The BCP command is shown in Listing 3.5.
bcp "Select * from dba_rep SQL_Conn
where run_date > '10/01/2008'"
queryout
"C:\Writing\Simple Talk Book\Ch3\bcp_query_dba_rep.txt" -n –T
Listing 3.5: BCP output statement limiting rows to a specific date range, using the queryout option
As you can see in Figure 3.3, this supposedly inefficient query ran through more than a million records and dumped out 64,488 of them to a file in 28 seconds, averaging over 2,250 records per second.
Figure 3.3: BCP with queryout option
Of course, at this point I could fine-tune the query, or make recommendations for re-architecting the source table to add indexes if necessary, before moving this type of process into production. However, I am satisfied with the results and can move safely on to the space age of data migration in SSIS.
SSIS
We saw an example of an SSIS package in the previous chapter, when discussing the DBA Repository. The repository is loaded with data from several source servers, via a series of data flow objects in an SSIS package (Populate_DBA_Rep). Let's dig a little deeper into an SSIS data flow task. Again, we'll use the SQL_Conn table, which we loaded with 1 million rows of data in the previous section, as the source, and use SSIS to selectively move data to an archive table, a process that happens frequently in the real world.
Figure 3.4 shows the data flow task, "SQL Connections Archive", which will copy the data from the source SQL_Conn table to the target archive table,
SQL_Conn_Archive, in the same DBA_Rep database. There is only a single connection manager object. This is quite a simple example of using SSIS to migrate data, but it is an easy solution to build on.
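The definition of the archive table is not shown in this section; a minimal sketch, assuming SQL_Conn_Archive simply mirrors the structure of SQL_Conn, would be:

-- Create an empty, index-free copy of SQL_Conn to act as the archive target
SELECT TOP 0 * INTO SQL_Conn_Archive FROM SQL_Conn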
Figure 3.4: Simple SSIS data flow
Inside the SQL Connections Archive data flow, there are two data flow objects, an OLE DB Source and an OLE DB Destination, as shown in Figure 3.5.
Figure 3.5: Source and destination OLE DB objects in SSIS
We'll use the OLE DB source to execute the query in Listing 3.4 against the source SQL_Conn table, to return the same 64,488 records we dumped out to a file previously. Instead of a file, however, the results will be sent to the OLE DB destination object, which writes them to the SQL_Conn_Archive table. Figure 3.6 shows the Source Editor of the OLE DB source object, including the qualified query to extract the rows from the source table, SQL_Conn.
For the Data Access Mode, notice that I am using "SQL command"; the other options are "Table or view", "Table name or view name variable" and "SQL command from variable". I am using SQL command here so as to have control over which fields and which subset of data I wish to move, which is often a criterion for real-world requests. Notice that I am filtering the data with a WHERE clause, selecting only transactions with a run_date greater than '10/01/08'.
Figure 3.6: Source Editor for SQL_Conn query
Figure 3.7 shows the editor for the OLE DB Destination object, where we define the target table, SQL_Conn_Archive, to which the rows will be copied. There are a few other properties of the destination object that are worth noting. I have chosen to use the Fast Load option for the data access mode, and I have enabled the Table Lock option, which, as you might recall from the BCP section, is required to ensure minimally logged transactions.
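For comparison only (this is not part of the SSIS package), the same idea expressed in T-SQL would be a table-locked insert into the archive table; whether it is actually minimally logged also depends on factors such as the database recovery model, which this sketch assumes rather than demonstrates:

-- Rough T-SQL analogue of the fast load with Table Lock enabled
INSERT INTO SQL_Conn_Archive WITH (TABLOCK)
SELECT * FROM SQL_Conn WHERE run_date > '10/01/2008'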