need to describe the structure of the file to BCP. When you use character or Unicode files, however, you must describe the structure of the data in the file to BCP. For example, if it is a delimited file, the delimiters need to be specified for BCP to recognize them.
Head of the Class…
Dealing with Characters and Collations
The file storage type that is used can be a common problem when sharing data between different database systems, operating systems, and organizations. If you receive a data file that has the data stored as character data, the way those characters are encoded can be an issue. In SQL Server, collations describe the encoding of character data as well as how that data is sorted and compared.
The Unicode character set can represent thousands of character symbols, which is sufficient for representing characters from all the major languages, alphabets, and cultures in the world.
However, non-Unicode character sets typically can represent only 256 possible symbols. So when you create SQL Server instances, databases, and character columns, you need to specify the character set that has the 256 characters you want.
When you are transferring data between two systems, the two systems may have elected to use different sets of characters for their non-Unicode data. BCP gives you a number of ways to deal with the differences. You can use command line arguments to let bcp know that the data file contains either character (-c) or Unicode (-w) data. You can also specify the code page (or character set) that the data file was encoded with by including the -C argument. Finally, you can do column-specific collation assignments using bcp format files.
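The code page problem can be made concrete with a small Python sketch. It only illustrates single-byte versus Unicode encodings and is not a picture of how bcp works internally; code pages 1252 and 437, the helper name fits_in_code_page, and the Greek sample word are assumptions chosen for the example.

```python
# Illustration only: a single-byte code page versus a Unicode encoding.
# Code pages 1252 and 437 are arbitrary choices for this sketch.

def fits_in_code_page(text, code_page):
    """Return True if every character in text exists in the code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True

name = "S\u00e1nchez"  # "Sánchez" with a precomposed á

# A single-byte code page stores each character in one byte.
single_byte = name.encode("cp1252")

# UTF-16 stores each of these characters in two bytes.
unicode_bytes = name.encode("utf-16-le")

# Reading the bytes back with a different code page changes the text.
misread = single_byte.decode("cp437")

print(len(single_byte), len(unicode_bytes))
print(name, "read with the wrong code page becomes", misread)
print(fits_in_code_page("Ελληνικά", "cp1252"))
```

The last two lines show both failure modes: text that is silently altered when the reader assumes the wrong code page, and text that cannot be stored in the chosen code page at all.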
You have probably worked with either comma-separated value (csv) or tab-separated value (tsv) files in the past. They store data as values with a delimiter (a comma, a tab, or something else) between each of the values. The rows typically end with a line feed (“\n”) or a carriage return and a line feed (“\r\n”). The following example exports the same data that you got before from the AdventureWorks2008.Person.Person table, but this time you’ll use a nonnative Unicode format (-w) for the data, and you will specify a comma as the delimiter (-t,):
bcp AdventureWorks2008.Person.Person out person.csv -w -t, -T
If you were to open the person.csv file that is created by the preceding statement, it would look similar to the following (the output has been trimmed for readability). Notice that the field values are separated by commas, as was specified on the command line:
1,EM,0,,Ken,J,Sánchez,,0,,<IndividualSurvey
2,EM,0,,Terri,Lee,Duffy,,1,,<IndividualSurvey
3,EM,0,,Roberto,,Tamburello,,0,,<IndividualSurvey
Now try to import the data back into the same AdventureWorks2008.Person.PersonCopy table that you used before. Because it already has data in it, you will truncate the table first. To do that, you can run the following statement in a query window in SQL Server Management Studio:
TRUNCATE TABLE AdventureWorks2008.Person.PersonCopy;
Next, you’ll try to load the data into the newly truncated table. Review the script and the output. Notice that you receive an error:
bcp AdventureWorks2008.Person.PersonCopy in person.csv -w -t, -T
Starting copy
SQLState = 22005, NativeError = 0
Error = [Microsoft][SQL Server Native Client 10.0]Invalid character
value for cast specification
The cause of the error is that the actual data has commas in it (this is common in fields that contain human-entered notes or comments). BCP reads a comma in the data as if it were the field delimiter. This throws off the parsing of the rest of the file and causes errors.
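A short Python sketch shows why an embedded comma breaks the import. The sample row is hypothetical (it is not taken from Person.Person), and the plain split below only imitates a reader that treats every comma as a field terminator.

```python
# Hypothetical three-field row whose last field contains a comma.
expected_field_count = 3
row = "7,Dylan,Prefers email, not phone"

# Treat every comma as a field terminator.
fields = row.split(",")

print(len(fields), "fields found,", expected_field_count, "expected")
print(fields)
```

The note is cut in two, so the reader sees one field too many. In a wider table the extra field shifts every later value into the wrong column, where it may not convert to the column's data type.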
If you had to stay with a nonnative file, you could specify an alternate field terminator. When picking either field or row terminators, you want to select a character, or a character sequence, that doesn’t occur in the data itself. In this case you could try a tab (“\t”) or something like a pipe character (|) that almost never occurs in human-entered data. If you ran the preceding example with no -t option, the default tab delimiter would have been used to delimit the fields, and because there are luckily no tabs in the actual data, it should work. Here is what that command would look like:
bcp AdventureWorks2008.Person.Person out Person.tsv -w -T
The data file produced by the preceding statement would be tab delimited. You could then successfully import it into your Person.PersonCopy table using a very similar statement:
bcp AdventureWorks2008.Person.PersonCopy in Person.tsv -w -T
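One way to check a candidate terminator before exporting is to scan the data for it. The helper below is a hypothetical sketch (the function name, the candidate list, and the sample values are made up for illustration) and is not part of bcp.

```python
def unused_terminators(values, candidates=("\t", "|", ",")):
    """Return the candidate terminators that never appear in any value."""
    return [c for c in candidates if not any(c in v for v in values)]

# Hypothetical field values, one of which contains a comma.
sample_values = ["Ken", "S\u00e1nchez", "Prefers email, not phone"]

safe = unused_terminators(sample_values)
print(safe)
```

For these values the tab and the pipe are reported as safe and the comma is not. A check like this is only as good as the sample it scans, so it should run over the full data set rather than a few rows.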
You can see where getting BCP to work with your data could be problematic. It has already become a problem pulling data from one of your own SQL tables. It can get even more troublesome when you have to make data that has come from business partners load successfully into your own tables. As the data formatting specification becomes more complex, you need the power of format files. In the next section we’ll talk about format files.
Using Format Files
Format files allow you to more explicitly describe the structure of the data file and how it maps to the corresponding SQL Server table or view. For native data files or simple character or Unicode data file types, you can probably specify all the information that BCP needs to parse the files just using the command line switches. However, if the files use fixed field widths rather than delimiters, or if different fields use different delimiters, the command line options fall short. There are also times when the data file you are using has a different number of columns than the target table you want to load the data into. In those situations format files become a requirement.
A common situation where format files are needed is when the target object has an identity column that generates primary key values, but the data file does not include the values for the column. There will be a mismatch between the number of columns in the data file and the target table.
Creating a format file is easiest when you have BCP do the initial work for you. There are a number of ways you can do this task. You could run the bcp command with insufficient input and have it prompt you for the details, or you could specify the details needed on the command line and ask it to generate a format file for you by using the “format” and “-f” options. Finally, you could have it produce a newer XML format file by including not only the “format” and “-f” options but also the “-x” option.
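If you script these runs, the two command-line variants differ only by the “-x” option. The Python sketch below builds the argument list for each; the function name, the table name MyDatabase.dbo.MyTable, and the file names are hypothetical. It assumes a character data file (-c) and a trusted connection (-T), and it puts “nul” where the data file path would normally go because generating a format file moves no data.

```python
def format_file_command(table, format_path, xml=False):
    """Build the argument list for a bcp run that only writes a format file."""
    # "nul" takes the place of the data file path; no data is copied.
    args = ["bcp", table, "format", "nul", "-c", "-f", format_path, "-T"]
    if xml:
        args.append("-x")  # ask for the XML format file instead
    return args

# Hypothetical table and file names, for illustration only.
classic = format_file_command("MyDatabase.dbo.MyTable", "MyTable.fmt")
as_xml = format_file_command("MyDatabase.dbo.MyTable", "MyTable.xml", xml=True)

print(" ".join(classic))
print(" ".join(as_xml))
```

On a machine with the bcp utility installed, either list could be handed to subprocess.run to execute the command.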
To demonstrate using format files, you will start with a simple table that has three columns in it. The following script generates the table and loads it with some sample data:
USE AdventureWorks2008;
CREATE TABLE dbo.Presidents
(PresidentID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
FirstName varchar(50) NOT NULL,
LastName varchar(50) NOT NULL);
INSERT INTO Presidents VALUES ('George','Washington')
INSERT INTO Presidents VALUES ('John','Adams')
INSERT INTO Presidents VALUES ('Thomas','Jefferson')
Next, you will have BCP create a format file for a character type data file and have it name the format file Presidents.fmt. Because you are only generating a format file and not really moving any data, there is no data file. That explains the “nul” where the data file path would normally be:
bcp AdventureWorks2008.dbo.Presidents format nul -T -c -f Presidents.fmt
The file that is produced by the preceding command looks like this:
10.0
3
1 SQLCHAR 0 12 "\t" 1 PresidentID ""
2 SQLCHAR 0 50 "\t" 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS
Let’s break the preceding format file down. The first row states the version of BCP that the format file is from (v10.0 is SQL Server 2008’s BCP utility). The second row lists how many fields there are in the data file. In this case there are three fields. The next three rows describe each of the data fields and the corresponding SQL table column that each one maps to.
Table 8.3 explains each of the elements of a format file field definition, using the second field definition in the format file as the example:

| Purpose | Sample Value | Description |
|---|---|---|
| Host file field order | 2 | Indicates the ordinal position of the field as it is in the data file. |
| Host field data type | SQLCHAR | The storage type of the data in the data file. In our example everything is just SQLCHAR because the file is a character file. |
| Host field prefix length | 0 | Can be zero unless the field contains NULLs. Learn more in the SQL Server 2008 documentation. |
| Host field data length | 50 | The length of the host file data field in bytes. The FirstName field in the original table was 50 characters, or 50 bytes, wide. |
| Host file field terminator | "\t" | The character that will be used in the data file to indicate the end of the field. The "\t" value here means that the tab character is the field terminator. |
| Server column number | 2 | The position of the destination column in the target database object. |
| Server column name | FirstName | The name of the destination column in the target database object. |
| Server column collation | SQL_Latin1_General_CP1_CI_AS | The collation of the destination column in the target database object. |

Table 8.3 Format File Field Definition
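To tie the table back to the file itself, here is a minimal Python sketch that reads the format file shown earlier and labels each element. The function name and the dictionary keys are invented for illustration. The sketch assumes whitespace-separated elements with no spaces inside the terminator or the column name, and it is not how bcp itself parses the file.

```python
# The format file text from this section, kept literal with a raw string.
FORMAT_FILE = r'''10.0
3
1 SQLCHAR 0 12 "\t" 1 PresidentID ""
2 SQLCHAR 0 50 "\t" 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS
'''

def parse_format_file(text):
    """Split a non-XML format file into its version and field definitions."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    version = lines[0].strip()
    field_count = int(lines[1])
    fields = []
    for ln in lines[2:2 + field_count]:
        (order, data_type, prefix_len, data_len,
         terminator, col_num, col_name, collation) = ln.split()
        fields.append({
            "host_field_order": int(order),
            "host_data_type": data_type,
            "prefix_length": int(prefix_len),
            "data_length": int(data_len),
            "terminator": terminator.strip('"'),   # still escaped, e.g. \t
            "server_column_number": int(col_num),
            "server_column_name": col_name,
            "collation": collation.strip('"'),     # "" becomes empty
        })
    return version, fields

version, fields = parse_format_file(FORMAT_FILE)
print(version, len(fields))
print(fields[1])
```

The second entry in the returned list corresponds to the sample values in Table 8.3.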
So now that you have a format file, use it during an export from the AdventureWorks2008.dbo.Presidents table:
bcp AdventureWorks2008.dbo.Presidents out Presidents.tsv -T -f Presidents.fmt