Use the following T-SQL statement to load the data from the data file into the new table:
BULK INSERT AdventureWorks2008.Person.PersonCopy FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar');
After running the statement, however, you get a simple report back from SQL Server:
(19972 row(s) affected)
Of course, you could use the same format files (either traditional or XML) that we discussed earlier; a short sketch follows below. So as you can see, the BULK INSERT statement is very similar in functionality to the BCP command-line utility.
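For instance, BULK INSERT can consume a format file through its FORMATFILE option. The following is only a sketch; the format file name and path are placeholders rather than a file created earlier in the chapter:
-- The format file describes the layout of the data file,
-- so DATAFILETYPE is not needed here.
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (FORMATFILE = 'C:\bcp\Person.xml');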
From the previous two sections, you should have a pretty good idea about the mechanics of bulk inserting data. You may be wondering what all the parameters we haven't discussed are for. Mostly, they have to do with performance. In the next two sections, we'll discuss a few pointers on maximizing the performance of your bulk loads. We'll start by looking at how the transaction log is used during bulk operations. But first, get your hands dirty and try a BULK INSERT.
EXERCISE 8.2
Using BULK INSERT
In this exercise, you will import the data file that you created previously in Exercise 8.1 back into SQL Server. This exercise assumes that you have administrative privileges on the SQL Server instance you are working with, that you have the AdventureWorks2008 sample database installed on your SQL Server instance, and that you are running the exercise from the same computer where the SQL Server instance is installed.
1. Launch SQL Server Management Studio and open a new query window in the AdventureWorks2008 database.
2. Create the target table by running the following T-SQL statement:
SELECT TOP 0 * INTO AdventureWorks2008.Person.PersonCopy FROM AdventureWorks2008.Person.Person;
3. Use the following T-SQL statement to load the data from the data file into the new table:
BULK INSERT AdventureWorks2008.Person.PersonCopy FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar');
4. Run the following query to view the imported data:
SELECT * FROM AdventureWorks2008.Person.PersonCopy;
Recovery Model and Bulk Operations
Every SQL Server database has an option that determines its recovery model. The recovery model of the database determines how the transaction log can be used for backups, and how much detail is recorded in the live log for bulk operations. A database's recovery model can be set to FULL, BULK_LOGGED, or SIMPLE.
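Before changing anything, you can check which recovery model a database currently uses. The following is a minimal sketch that queries the sys.databases catalog view:
-- Returns FULL, BULK_LOGGED, or SIMPLE for the sample database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'AdventureWorks2008';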
The FULL recovery model specifies that all transactions, including bulk operations, will be fully logged in the transaction log. The problem with having the FULL recovery model turned on when you are doing bulk operations is that every record that is inserted gets completely logged in the database's transaction log. If you are loading a large number of records, you can run into problems: the inserts can fill the database's transaction log, and the logging activity itself can slow down the bulk operation. The FULL recovery model does make it possible to do point-in-time restores, even partway through a bulk operation, using the transaction log in the event of a failure.
The BULK_LOGGED recovery model records all regular transactions fully, just like the FULL recovery model. Bulk operations are minimally logged, however. What does that mean? Rather than recording the details of every row that was written, the transaction log tracks only which data pages and extents were modified by the bulk operation. The upside is that you don't bloat the log with a large number of inserts, and because less I/O is being performed against the log, performance can increase. The downside is that the transaction log alone no longer has all the information required to recover the database to a consistent state.
When you back up a transaction log that contains information about bulk operations, the actual data extents that were modified by the bulk operation are included in the log backup. That sounds weird, but it's true. The log backup actually contains extents from the data files, thereby making it possible to restore the transaction log backup and get back all the data that the bulk operation inserted as well. You should also note that the live log can remain small (because it doesn't have to log every insert performed as part of the bulk load), but the log backup will be large because it contains the actual database extents that were modified.
However, when you are using the BULK_LOGGED recovery model, there is some exposure to loss. If a catastrophic failure were to occur after the bulk operation completed, but before you had a chance to back up the log or the database, you could lose the data that was loaded. This implies that when you are using the BULK_LOGGED recovery model, you must perform at least a transaction log backup of the database immediately after the bulk operation completes. A transaction log backup is enough, but it doesn't hurt to do full or differential database backups as well.
Regardless of whether you are using the FULL or BULK_LOGGED recovery model, SQL Server will keep all entries in the transaction log until they are backed up using a BACKUP LOG statement, thereby ensuring that you can back up a contiguous chain of all transactions that have occurred on your database and that you can then restore the database using the transaction log backups. This is true even with the BULK_LOGGED recovery model, as long as you back up the log immediately after a bulk operation occurs.
The SIMPLE recovery model is not typically recommended for production databases. The big reason is that SQL Server can clear entries from the log even though they may not have been backed up yet. However, as far as how the log works with bulk operations, SIMPLE behaves the same as BULK_LOGGED. After a bulk operation is performed, however, you do not have the option of doing a log backup; you must follow up with a full or differential database backup.
So what recovery model should you be using? SIMPLE isn't a viable option for critical production databases because it doesn't allow you to back up the transaction log. FULL is the best option in terms of recoverability because it allows you to back up the log, and the log contains all the details. BULK_LOGGED, however, can offer performance and maintenance benefits when doing bulk operations. The answer, then, is really a mixture of FULL and BULK_LOGGED. It is generally recommended that you leave your production databases with a FULL recovery model. When doing a bulk operation, you would first run a statement to change the recovery model to BULK_LOGGED, do the bulk load, run another statement to change the recovery model back to FULL, and then back up the transaction log.
A couple of other requirements must be met for minimal logging to occur. Minimal logging requires that the target table not be replicated and that a TABLOCK be placed on the table by the bulk operation. It also requires that the target table not have any indices on it, unless the table is empty. If the table already has data in it and it has one or more indices, it may be better to drop the indices before the load, and then rebuild them after. Of course, this should be tested in your own environment.
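One quick way to see whether the target table carries indices that could prevent minimal logging is to query the sys.indexes catalog view. This is only a sketch against the PersonCopy table created earlier; a table with no indices at all shows up as a single row with type_desc = 'HEAP':
-- List indices (if any) on the target table before the load
SELECT i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('Person.PersonCopy');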
The following sample code shows an example of a minimally logged
BULK INSERT:
ALTER DATABASE AdventureWorks2008 SET RECOVERY BULK_LOGGED;
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv' WITH (DATAFILETYPE='widechar', TABLOCK);
ALTER DATABASE AdventureWorks2008 SET RECOVERY FULL;
BACKUP LOG AdventureWorks2008 TO DISK='C:\…\SomeFile.bak';
Note that the preceding code is only a sample. The AdventureWorks2008 database actually uses the SIMPLE recovery model by default. Although the code shown in this example would work, it assumes that a full database backup has already been performed; log backups can't be run unless a full backup has been performed. If you do try the preceding code, you might want to set the recovery model back to SIMPLE when you are done.
Using the right recovery model and BCP options to enable minimal logging can help improve performance by not writing as much detail to the live transaction log for a database. These steps reduce the amount of work the hard drives must do and can accelerate the performance of your bulk loading. They can also make the load more manageable by not bloating the transaction log with a large amount of data. This bloat alone could actually cause a bulk load to fail if the log filled to capacity.
Figure 8.1 shows a Performance Monitor chart of the Percent Log Used counter for the AdventureWorks2008 database. The chart shows the log utilization for two bulk loads. The first load was not minimally logged; the second load was. You can see the dramatic difference in log utilization between the two modes.
There are other ways to optimize performance, though. In the next section, we will cover some ways to optimize the performance of bulk load operations.
Optimizing Bulk Load Performance
The whole point of performing bulk loads is performance. Well, performance and convenience, but performance is probably the critical part. You want to get as much data into the server as fast as you can, and with as little impact on the server as possible. As we discussed in the previous topic, configuring your bulk loads to be minimally logged can significantly improve the performance and decrease the negative impact of bulk loads. However, you have other options that you can use to help manage bulk loads as well as improve their performance. These options include breaking the data into multiple batches, and presorting the data to match the clustered index on the target table.
Both BCP and BULK INSERT support breaking the load of large files down into smaller batches. The default behavior is that a single batch is used. Each batch equates to a transaction. Therefore, the default is that the bulk operation is performed as a single transaction. One big problem with this approach is that either the entire load succeeds or the entire load fails. It also means that the transaction log information that is maintained for the bulk load can't be cleared from the log until the bulk operation completes.
You can optimize the loading of your bulk data by breaking it down into smaller batches. This allows you to fail only the batch, rather than the whole load, if an error occurs. When you restart the process, you can restart with the specific batch (using the first-row options). It also allows the log to be cleared if backup operations run during the bulk load time frame. Finally, it allows you to break a larger data file into pieces and have it be run by multiple clients in parallel. Of course, if you didn't have a performance problem to start with, using batches can actually make things worse, so you really need to test with the options to find the optimal settings for your situation.
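As a rough sketch of the batching option (the batch size of 5,000 rows is arbitrary and chosen only for illustration), a batched BULK INSERT might look like the following. If the load fails partway through, the batches that already committed stay in the table, and the FIRSTROW option (or the -F switch of BCP) can be used to restart from the appropriate row:
-- Commit the load in batches of 5,000 rows, each as its own transaction
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', BATCHSIZE=5000);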
You can also help improve the performance of your bulk loads by making sure that the data in the data file is sorted in the same order as the clustered index key on the target table. If you know this is the case, you can tell the bulk operation that the data is presorted by using the ORDER hint of the BCP utility or the BULK INSERT statement. This can improve the performance of bulk loads into tables with clustered indexes.
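The following sketch assumes the target table has a clustered index on BusinessEntityID, the clustered primary key column of Person.Person. (The PersonCopy table created earlier with SELECT INTO is a heap, so against that table the hint would simply be ignored; the example is purely illustrative.) With BCP, the equivalent is the -h "ORDER(BusinessEntityID ASC)" hint.
-- Tell BULK INSERT that the data file is already sorted by the clustered key
BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', ORDER (BusinessEntityID ASC), TABLOCK);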
In addition, it may be beneficial to drop nonclustered indices on the table before the load, and re-create them after the load. If the table is empty to start with, this may not help, but if the table has data in it before the load, then it could provide a performance improvement. Of course, you should test this with your own databases. A sketch of that approach is shown below; the index name IX_PersonCopy_LastName is hypothetical and used only for illustration.
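-- Drop the (hypothetical) nonclustered index before the load
DROP INDEX IX_PersonCopy_LastName ON Person.PersonCopy;

BULK INSERT AdventureWorks2008.Person.PersonCopy
FROM 'C:\bcp\Person.tsv'
WITH (DATAFILETYPE='widechar', TABLOCK);

-- Re-create the index once the load completes
CREATE NONCLUSTERED INDEX IX_PersonCopy_LastName
ON Person.PersonCopy (LastName);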