The Real MCTS SQL Server 2008 Exam 70-432 Prep Kit


... need to describe the structure of the file to BCP. When you use character or Unicode files, however, you must describe the structure of the data in the file to BCP. For example, if it is a delimited file, the delimiters need to be specified for BCP to recognize them.

Head of the Class…

Dealing with Characters and Collations

The file storage type that is used can be a common problem when sharing data between different database systems, operating systems, and organizations. If you receive a data file that has the data stored as character data, the way those characters are encoded can be an issue. In SQL Server you use collations to describe the encoding of character data as well as how that data is sorted and compared.

The Unicode character set is able to represent many thousands of character symbols. It is sufficient for representing characters from all the major languages, alphabets, and cultures in the world.

However, non-Unicode character sets typically can represent only 256 possible symbols. So when you create SQL Server instances, databases, and character columns, you need to specify the character set that has the 256 characters you want.

When you are transferring data between two systems, it is possible that the two systems have elected to use different sets of characters for their non-Unicode data. BCP gives you a number of ways to deal with the differences. You can use the command line arguments to let bcp know that the data file contains either character (-c) or Unicode (-w) data. You can also specify the code page (or character set) that the data file was encoded with by including the -C argument. Finally, you can do column-specific collation assignments using bcp format files.
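The code page issue described in this sidebar can be sketched in a few lines of Python. This is only an illustration, not part of BCP: the Python codec names cp1252, cp437, and utf-16-le are assumptions chosen for the demonstration, standing in for whatever code pages two systems might actually use.

```python
# Illustrative sketch only: Python codecs stand in for the code pages
# that two systems exchanging a character data file might have chosen.
text = "Sánchez"

# A single-byte code page (Windows-1252 here) stores one byte per character.
cp1252_bytes = text.encode("cp1252")

# UTF-16 little-endian stores each of these characters in two bytes.
utf16_bytes = text.encode("utf-16-le")

# Decoding with the code page the file was written in recovers the text.
right = cp1252_bytes.decode("cp1252")

# Decoding the same bytes with a different code page (cp437 is an assumed
# example) raises no error, but the accented character comes back wrong.
wrong = cp1252_bytes.decode("cp437")

print(len(cp1252_bytes), len(utf16_bytes))
print(right == text, wrong == text)
```

The mismatch produces no error, only different characters, which is why the code page has to be stated explicitly rather than guessed.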

You have probably worked with either comma-separated value (csv) or tab-separated value (tsv) files in the past. They store data as values with a delimiter (a comma, a tab, or something else) between each of the values. The rows typically end with a line feed (“\n”) or a carriage return and a line feed (“\r\n”). The following example exports the same data that you got before from the AdventureWorks2008.Person.Person table, but this time you’ll use a nonnative Unicode format (-w) for the data, and you will specify a comma as the delimiter (-t “,”):

bcp AdventureWorks2008.Person.Person out person.csv -w -t, -T

If you were to open the person.csv file that is created by the preceding statement, it would look similar to the following (the output has been trimmed for readability). Notice that the field values are separated by commas, as was specified in the command line:

1,EM,0,,Ken,J,Sánchez,,0,,<IndividualSurvey

2,EM,0,,Terri,Lee,Duffy,,1,,<IndividualSurvey

3,EM,0,,Roberto,,Tamburello,,0,,<IndividualSurvey

Now try to import the data back into the same AdventureWorks2008.Person.PersonCopy table that you used before. Because it already has data in it, you will truncate the table first. To do that, you can run the following statement in a query window in SQL Server Management Studio:

TRUNCATE TABLE AdventureWorks2008.Person.PersonCopy;

Next, you’ll try to load the data into the newly truncated table. Review the script and the output. Notice that you receive an error:

bcp AdventureWorks2008.Person.PersonCopy in person.csv -w -t, -T

Starting copy

SQLState = 22005, NativeError = 0

Error = [Microsoft][SQL Server Native Client 10.0]Invalid character value for cast specification

The cause of the error is that the actual data has commas in it (this is common in fields that contain human-entered notes or comments). BCP reads a comma in the data as if it were the delimiter of the field. This throws off the reading of the rest of the file and causes errors.
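The delimiter collision described above can be reproduced outside of BCP. The following is a minimal Python sketch under stated assumptions: the two rows, their four-column layout, and the `split_fields` helper are hypothetical and are not AdventureWorks data or part of any BCP API.

```python
def split_fields(line, delimiter=","):
    """Split a line the way a simple delimited reader does: on every delimiter."""
    return line.split(delimiter)

# Hypothetical four-column rows: id, type, first name, free-text note.
EXPECTED_FIELDS = 4
clean_row = "1,EM,Ken,Likes cycling"
noisy_row = "2,EM,Terri,Prefers email, not phone"  # the note contains a comma

for row in (clean_row, noisy_row):
    fields = split_fields(row)
    status = "ok" if len(fields) == EXPECTED_FIELDS else "misaligned"
    print(len(fields), status)
```

The second row splits into five fields instead of four, so every value after the embedded comma lands in the wrong column. That is the same kind of misalignment that leads to the cast error shown earlier.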

If you had to stay with a nonnative file, you could specify an alternate field terminator. When picking either field or row terminators, you want to select a character, or a character sequence, that doesn’t occur in the data itself. In this case you could try a tab (“\t”) or something like a pipe character (|) that almost never occurs in human-entered data. If you ran the preceding example with no -t option, the default tab delimiter would have been used to delimit the fields, and because there are luckily no tabs in the actual data, it should work. Here is what that command would look like:


bcp AdventureWorks2008.Person.Person out Person.tsv -w -T

The data file produced by the preceding statement would be tab delimited. You could then successfully import it into your Person.PersonCopy table using a very similar statement:

bcp AdventureWorks2008.Person.PersonCopy in Person.tsv -w -T
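Rather than relying on luck, you can check candidate terminators against the data before exporting. The sketch below is one hedged way to do that in Python. The `safe_terminators` function and the sample rows are hypothetical, and in practice the values would come from a query against the source table.

```python
def safe_terminators(rows, candidates=("\t", "|", ",")):
    """Return the candidate terminators that never appear inside any field value."""
    return [
        candidate
        for candidate in candidates
        if not any(candidate in value for row in rows for value in row)
    ]

# Hypothetical sample rows; each tuple is one row of field values.
rows = [
    ("1", "Ken", "Prefers email, not phone"),
    ("2", "Terri", "Available 9|5 on weekdays"),
]

# Only the tab survives: the comma and the pipe both occur in the data.
print(safe_terminators(rows))
```

A check like this only covers the rows it has seen, so the chosen terminator should be validated again whenever the data changes.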

You can see where getting BCP to work with your data could be problematic. It has already become a problem pulling data from one of your own SQL tables. It can get even more troublesome when you have to make data that has come from business partners load successfully into your own tables. As the data formatting specification becomes more complex, you need the power of format files. In the next section we’ll talk about format files.

Using Format Files

Format files allow you to more explicitly describe the structure of the data file and how it maps to the corresponding SQL Server table or view. For native data files or simple character or Unicode data file types, you can probably specify all the information that BCP needs to parse the files just using the command line switches. However, if the files use fixed field widths rather than delimiters, or if different fields use different delimiters, the command line options fall short. There are also times when the data file you are using has a different number of columns than the target table you want to load the data into. In those situations format files become a requirement.

A common situation where format files are needed is when the target object has an identity column that generates primary key values, but the data file does not include the values for the column. There will be a mismatch between the number of columns in the data file and the target table.

Creating a format file is easiest when you have BCP do the initial work for you. There are a number of ways you can do this task. You could run the bcp command with insufficient input and have it prompt you for the details, or you could specify the details needed on the command line but ask that it generate a format file for you by using the “format” and “-f” options. Finally, you could have it produce a newer XML format file by including not only the “format” and “-f” options but also the “-x” option.

To demonstrate using format files, you will start with a simple table that has three columns in it. The following script generates the table and loads it with some sample data:


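USE AdventureWorks2008;

CREATE TABLE dbo.Presidents
(
    PresidentID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    FirstName varchar(50) NOT NULL,
    LastName varchar(50) NOT NULL
);

-- PresidentID is an identity column, so only the name columns are supplied.
INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('George', 'Washington');
INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('John', 'Adams');
INSERT INTO dbo.Presidents (FirstName, LastName) VALUES ('Thomas', 'Jefferson');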

Next, you will have BCP create a format file for a character type data file and name the format file Presidents.fmt. Because you are only generating a format file and not really moving any data, there is no data file. That explains the “nul” where the data file path would normally be:

bcp AdventureWorks2008.dbo.Presidents format nul -T -c -f Presidents.fmt

The file that is produced by the preceding command looks like this:

10.0

3

1 SQLCHAR 0 12 "\t" 1 PresidentID ""

2 SQLCHAR 0 50 "\t" 2 FirstName SQL_Latin1_General_CP1_CI_AS

3 SQLCHAR 0 50 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS

Let’s break the preceding format file down. The first row states the version of BCP that the format file is from (v10.0 is SQL Server 2008’s BCP utility). The second row lists how many fields there are in the data file. In this case there are three fields. The next three rows describe each of the data fields and the corresponding SQL table column they map to.


Table 8.3 explains each of the elements of a format file field definition, using the second field definition in the format file as the sample:

Purpose | Sample Value | Description
Host file field order | 2 | Indicates the ordinal position of the field as it is in the data file.
Host field data type | SQLCHAR | The storage type of the data in the data file. In our example everything is just SQLCHAR because the file is a character file.
Host field prefix length | 0 | Can be zero unless the field contains NULLs. Learn more in the SQL Server 2008 documentation.
Host field data length | 50 | The length of the host file data field in bytes. The FirstName column in the original table was 50 characters, or 50 bytes, wide.
Host file field terminator | “\t” | The character that will be used in the data file to indicate the end of the field. The “\t” value here means that the tab character is the field terminator.
Server column num | 2 | The position of the destination column in the target database object.
Server column name | FirstName | The name of the destination column in the target database object.
Server column collation | SQL_Latin1_General_CP1_CI_AS | The collation of the destination column in the target database object.

Table 8.3 Format File Field Definition
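To make the layout of the format file concrete, here is a small Python sketch that reads the sample Presidents.fmt contents into named fields, one per element in Table 8.3. It is an illustration only, not a BCP feature. The `FormatField` and `parse_format_file` names are hypothetical, and the parser assumes that no element contains embedded whitespace, which holds for this sample but not necessarily for every format file.

```python
from typing import NamedTuple

class FormatField(NamedTuple):
    host_order: int       # ordinal position of the field in the data file
    host_type: str        # storage type, e.g. SQLCHAR
    prefix_length: int
    data_length: int      # length in bytes
    terminator: str       # kept as written in the file, e.g. \t
    server_order: int     # position of the destination column
    server_name: str
    collation: str        # empty for non-character columns

def parse_format_file(text):
    """Parse a non-XML format file; assumes whitespace-free elements."""
    lines = [line for line in text.splitlines() if line.strip()]
    version = lines[0].strip()
    field_count = int(lines[1])
    fields = []
    for line in lines[2:2 + field_count]:
        p = line.split()
        fields.append(FormatField(
            int(p[0]), p[1], int(p[2]), int(p[3]),
            p[4].strip('"'), int(p[5]), p[6], p[7].strip('"'),
        ))
    return version, fields

# The sample format file from the text (raw string keeps the backslashes).
SAMPLE = r'''10.0
3
1 SQLCHAR 0 12 "\t" 1 PresidentID ""
2 SQLCHAR 0 50 "\t" 2 FirstName SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 50 "\r\n" 3 LastName SQL_Latin1_General_CP1_CI_AS
'''

version, fields = parse_format_file(SAMPLE)
print(version, len(fields))
print(fields[1])
```

The second parsed entry corresponds to the row described in Table 8.3: host field 2, type SQLCHAR, prefix length 0, data length 50, a tab terminator, and the FirstName server column.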

Now that you have a format file, use it during an export from the AdventureWorks2008.dbo.Presidents table. Enter the following command as a single line:

bcp AdventureWorks2008.dbo.Presidents out Presidents.tsv -T -f Presidents.fmt
