We are loading data from the text file back into the SQL_Conn table, which currently holds 58K rows.
The -h TABLOCK hint forces a lock on the receiving table, which is one of the requirements for guaranteeing minimally logged transactions. The -b option tells BCP to batch the transactions every n rows, in this case every 50,000 rows. If there are any issues during the BCP load process, any rollback that occurs will only roll back to the last committed batch. So, say I wanted to load 100,000 records, and I batched the BCP load every 20,000 records. If there were an issue while loading record 81,002, I would know that 80,000 records were successfully imported; I would lose 1,002 records, as they would roll back to the last 20,000 mark, which would be 80,000 records.
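As a rough sketch of how these switches fit together, a single load command might look like the following (the fully qualified table name, the file path, and the -n and -T switches are assumptions carried over from the earlier examples, not the author's exact command):

bcp DBA_Rep.dbo.SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -h "TABLOCK" -b 50000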
The batch file takes one parameter, which is the number of times to run the BCP command in order to load the required number of rows into the table. How did I choose 20 iterations? Simple math: 20 * 58,040 = 1,160,800 records.
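A minimal sketch of such a batch file, assuming the load command shown above, might look like this (%1 is the iteration count; the author's actual batch file is not reproduced here):

@echo off
rem Sketch only: run the BCP load %1 times (for example, 20 iterations).
set /a i=0
:loop
if %i% GEQ %1 goto done
bcp DBA_Rep.dbo.SQL_Conn in "C:\Writing\Simple Talk Book\Ch3\f1_out.txt" -n -T -h "TABLOCK" -b 50000
set /a i=%i%+1
goto loop
:done
echo Completed %1 BCP iterations.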
As you can see in Figure 3.2, this is exactly the number of rows that is now in the
SQL_Conn table, after 20 iterations of the BCP command, using the 58,040 records
in the f1_out.txt file as the source.
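A simple count query is enough to confirm the total (a sketch; the exact statement shown in Figure 3.2 is not reproduced here):

USE DBA_Rep
SELECT COUNT(*) AS TotalRows FROM SQL_Conn
-- Expected: 20 iterations * 58,040 rows = 1,160,800 rows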
Figure 3.2: Query to count SQL_Conn after loading over 1 million records

NOTE
For what it is worth, I have also used this batch file to load a terabyte's worth of data, to test how we could effectively manage such a large data store.
If you re-run the BCP command in Listing 3.2, to output the query results to a file, you will find that the process takes more than a minute for a million rows, as opposed to the previous 3 seconds for 58K rows, indicating that the output rate remains consistent (58,040 / 3 = 19,346 records per second * 60 seconds = 1.16 million). I am still seeing nearly 20,000 records per second despite the increase in data, attesting to the efficiency of the old tried and true BCP.
Filtering the output using queryout
Rather than working with the entire table, you can use the queryout option of BCP to limit the data you will be exporting, by way of a filtered T-SQL query. Suppose I want to export data only from a particular time period, say for a run_date greater than October 1st of 2008. The query is shown in Listing 3.4.

Select * from dba_rep..SQL_Conn where run_date > '10/01/2008'
Listing 3.4: Query to filter BCP output
There are many duplicate rows in the SQL_Conn table, and no indexes defined, so
I would expect that this query would take many seconds, possibly half a minute, to execute. The BCP command is shown in Listing 3.5.
bcp "Select * from dba_rep SQL_Conn
where run_date > '10/01/2008'"
queryout
"C:\Writing\Simple Talk Book\Ch3\bcp_query_dba_rep.txt" -n –T
Listing 3.5: BCP output statement limiting rows to a specific date range, using the queryout option
As you can see in Figure 3.3, this supposedly inefficient query ran through more than a million records and dumped out 64,488 of them to a file in 28 seconds, averaging over 2,250 records per second.
Figure 3.3: BCP with queryout option
Of course, at this point I could fine-tune the query, or make recommendations for re-architecting the source table to add indexes if necessary, before moving this type of process into production. However, I am satisfied with the results and can move safely on to the space age of data migration in SSIS.
SSIS
We saw an example of an SSIS package in the previous chapter, when discussing the DBA Repository. The repository is loaded with data from several source servers, via a series of data flow objects in an SSIS package (Populate_DBA_Rep). Let's dig a little deeper into an SSIS data flow task. Again, we'll use the SQL_Conn table, which we loaded with 1 million rows of data in the previous section, as the source, and use SSIS to selectively move data to an archive table, a process that happens frequently in the real world.
Figure 3.4 shows the data flow task, "SQL Connections Archive", which will copy the data from the source SQL_Conn table to the target archive table,
SQL_Conn_Archive, in the same DBA_Rep database. There is only a single connection manager object. This is quite a simple example of using SSIS to migrate data, but it is an easy solution to build on.
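The definition of the archive table is not shown in this section; a minimal sketch, assuming SQL_Conn_Archive simply mirrors the structure of SQL_Conn, would be:

-- Create an empty, index-free copy of SQL_Conn to act as the archive target
SELECT TOP 0 * INTO SQL_Conn_Archive FROM SQL_Conn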
Figure 3.4: Simple SSIS data flow
Inside the SQL Connections Archive data flow, there are two data flow objects, an OLE DB Source and an OLE DB Destination, as shown in Figure 3.5.
Figure 3.5: Source and destination OLE DB objects in SSIS
We'll use the OLE DB source to execute the query in Listing 3.4 against the source SQL_Conn table, to return the same 64,488 records we dumped out to a file previously. Instead of a file, however, the results will be sent to the OLE DB destination object, which writes them to the SQL_Conn_Archive table. Figure 3.6 shows the Source Editor of the OLE DB source object, including the qualified query to extract the rows from the source table, SQL_Conn.
For the Data Access Mode, notice that I am using "SQL command"; the other options are "Table or view", "Table name or view name variable" and "SQL command from variable". I am using SQL command here so as to have control over which fields and which subset of data I wish to move, which is often a criterion for real-world requests. Notice that I am filtering the data with a WHERE clause, selecting only transactions with a run_date greater than '10/01/08'.
Figure 3.6: Source Editor for SQL_Conn query
Figure 3.7 shows the editor for the OLE DB Destination object, where we define the target table, SQL_Conn_Archive, to which the rows will be copied. There are a few other properties of the destination object that are worth noting. I have chosen to use the Fast Load option for the data access mode, and I have enabled the Table Lock option, which, as you might recall from the BCP section, is required to ensure minimally logged transactions.
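For comparison only (this is not part of the SSIS package), the same idea expressed in T-SQL would be a table-locked insert into the archive table; whether it is actually minimally logged also depends on factors such as the database recovery model, which this sketch assumes rather than demonstrates:

-- Rough T-SQL analogue of the fast load with Table Lock enabled
INSERT INTO SQL_Conn_Archive WITH (TABLOCK)
SELECT * FROM SQL_Conn WHERE run_date > '10/01/2008'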