Microsoft SQL Server 2008 Tutorial, Part 75


@datasrc = 'C:\SQLServerBible\CHA1_Schedule.xls',
@provstr = 'Excel 5.0';

Excel spreadsheets are not multi-user spreadsheets. SQL Server can’t perform a distributed query that accesses an Excel spreadsheet while that spreadsheet is open in Excel.

Linking to MS Access

Not surprisingly, SQL Server links easily to MS Access databases. SQL Server uses the OLE DB Jet provider to connect to Jet and request data from the MS Access .mdb file.

FIGURE 31-3

Prior to the conversion to SQL Server, the Cape Hatteras Adventures company was managing its tour schedule in the CHA1_Schedule.xls spreadsheet.

FIGURE 31-4

Tables are defined within the Excel spreadsheet as named ranges. The CHA1_Schedule spreadsheet has five named ranges.

Because Access is a database, there’s no trick to preparing it for linking, as there is with Excel. Each Access table will appear as a table under the Linked Servers node in Management Studio.

The Cape Hatteras Adventures customer/prospect list was stored in Access prior to upsizing the database to SQL Server. The following code from the CHA2_Convert.sql script links to the CHA1_Customers.mdb Access database so SQL Server can retrieve the data and populate the SQL Server tables:

EXEC sp_addlinkedserver
'CHA1_Customers',
'Access 2003',
'Microsoft.Jet.OLEDB.4.0',
'C:\SQLServerBible\CHA1_Customers.mdb';

If you are having difficulty with a distributed query, one of the first places to check is the security context. Excel expects that connections do not establish a security context, so the non-mapped user login should be set to no security context:

EXEC sp_addlinkedsrvlogin
@rmtsrvname = 'CHA1_Schedule',
@useself = 'false';

Developing Distributed Queries

Once the link to the external data source is established, SQL Server can reference the external data within queries. Table 31-2 shows the four basic syntax methods that are available, which differ in query-processing location and setup method.

TABLE 31-2

Distributed Query Method Matrix

Link Setup                          Query-Execution Location
                                    Local SQL Server      External Data Source (Pass-Through)
Linked Server                       Four-part name        Four-part name, OpenQuery()
Ad Hoc Link Declared in the Query   OpenDataSource()      OpenRowSet()

Distributed queries and Management Studio

Management Studio doesn’t supply a graphic method for initiating a distributed query. There’s no way to drag a linked server or remote table into the Query Designer. However, the distributed query can be entered manually in the SQL pane and then executed as a query.

In the Query Editor, the name of the linked server can be dragged in from the Object Explorer.

Distributed views

Views are saved SQL SELECT statements. While I don’t recommend building a client/server application based on views, they are useful for ad hoc queries. Because most users (and even developers) are unfamiliar with the various methods of performing distributed queries, wrapping a distributed query inside a view might be a good idea.

Local-distributed queries

A local-distributed query sounds like an oxymoron, but it’s a query that pulls the external data into SQL Server and then processes the query at the local SQL Server. Because the processing occurs at the local SQL Server, local-distributed queries use T-SQL syntax and are sometimes called T-SQL distributed queries.

Using the four-part name

If the data is in another SQL Server, then a complete four-part name is required:

Server.Database.Schema.ObjectName

The four-part name may be used in any SELECT or data-modification query. On my writing computer is a second instance of SQL Server called [SQL2008RC0\London]. The object’s owner name is required if the query accesses an external SQL Server.

The following query retrieves the Person table from that second instance:

SELECT LastName, FirstName

FROM [SQL2008RC0\London].Family.dbo.Person;

Result:

When performing an INSERT, UPDATE, or DELETE command as a distributed query, either the four-part name or a distributed query function must be substituted for the table name. For example, the following SQL code, extracted from the CHA2_Convert.sql script that populates the CHA2 sample database, uses the four-part name as the source for an INSERT command. The query retrieves base camps from the Excel spreadsheet and inserts them into SQL Server:

INSERT BaseCamp(Name)
SELECT DISTINCT [Base Camp]
FROM CHA1_Schedule...[Base_Camp]
WHERE [Base Camp] IS NOT NULL;

If you’ve already executed CHA2_Convert.sql and populated your copy of CHA2, then you may want to re-execute CHA2_Create.sql in order to start with an empty database.

As another example of using the four-part name for a distributed query, the following code updates the Family database on the second SQL Server instance:

UPDATE [SQL2008RC0\London].Family.dbo.Person
SET LastName = 'Wilson'
WHERE PersonID = 1;

OpenDataSource()

Using the OpenDataSource() function is functionally the same as using a four-part name to access a linked server, except that the OpenDataSource() function defines the link within the function instead of referencing a pre-defined linked server. While defining the link in code bypasses the linked server requirement, if the link location changes, then the change will affect every query that uses OpenDataSource(). In addition, OpenDataSource() won’t accept variables as parameters.

The OpenDataSource() function is substituted for a server in the four-part name and may be used within any DML statement.

The syntax for the OpenDataSource() function seems simple enough:

OPENDATASOURCE ( provider_name, init_string )

However, there’s more to it than first appears. The init string is a semicolon-delimited string containing several parameters (the exact parameters used depend on the external data source and are not described here; see Books Online for a full overview). The potential parameters within the init string include data source, location, extended properties, connection timeout, user ID, password, and catalog. The init string must define the entire external data-source connection, and the security context, within a function. No quotes are required around the parameters within the init string. The common error committed in building OpenDataSource() distributed queries is mixing the commas and semicolons.
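The comma-versus-semicolon rule can be sketched in a few lines of client-side Python. This is only an illustration, not part of the book's scripts: the helper names build_init_string and build_opendatasource are hypothetical, and the sketch assumes trusted, hard-coded values (it does no quote escaping). Because OpenDataSource() won’t accept variables, assembling the statement text outside SQL Server is one plausible way to keep the two delimiters straight.

```python
def build_init_string(params):
    """Join key=value pairs with semicolons, as the init string expects."""
    return ";".join(f"{key}={value}" for key, value in params.items())


def build_opendatasource(provider, params, object_path):
    """Assemble an OPENDATASOURCE(...) fragment (hypothetical helper).

    Commas separate the two function arguments; semicolons separate the
    parameters inside the single init-string argument.
    """
    init_string = build_init_string(params)
    return f"OPENDATASOURCE('{provider}', '{init_string}'){object_path}"


# Trusted, hard-coded values only: this sketch does no quote escaping.
sql_fragment = build_opendatasource(
    "Microsoft.Jet.OLEDB.4.0",
    {"Data Source": r"C:\SQLServerBible\CHA1_Customers.mdb"},
    "...Customers",
)
print(sql_fragment)
```

The only comma in the generated fragment sits between the provider name and the init string; everything inside the init string is joined with semicolons.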

If OpenDataSource() is connecting to another SQL Server using Windows authentication, then authentication delegation via Kerberos security is required.

A relatively straightforward example of the OpenDataSource() function is using it as a means of accessing a table within another SQL Server instance:

SELECT FirstName, Gender
FROM OPENDATASOURCE(
'SQLOLEDB',
'Data Source=SQL2008VPC\London;User ID=Joe;Password=j'
).Family.dbo.Person;

Result:


The following example of a distributed query that uses OpenDataSource() references the Cape Hatteras Adventures sample database. Because an Access location contains only one database and the tables don’t require the owner to specify the table, the database and owner are omitted from the four-part name:

SELECT ContactFirstName, ContactLastName
FROM OPENDATASOURCE(
'Microsoft.Jet.OLEDB.4.0',
'Data Source = C:\SQLServerBible\CHA1_Customers.mdb'
)...Customers;

Result:

ContactFirstName ContactLastName
---------------- ---------------

To illustrate using OpenDataSource() in an update query, the following query example will update any rows inside the CHA1_Schedule.xls Excel 2000 spreadsheet. A named range was previously defined as Tours '=Sheet1!$E$5:$E$24', which now appears to the SQL query as a table within the data source. Rather than update an individual spreadsheet cell, this query performs an UPDATE operation that affects every row in which the tour column is equal to Gauley River Rafting and updates the Base Camp column to the value Ashville.

The distributed SQL Server query will use OLE DB to call the Jet engine, which will open the Excel spreadsheet file. Because the spreadsheet is opened by a user, the file is now unavailable to anyone else. Excel is a single-user database. The OpenDataSource() function supplies only the server name in a four-part name; as with Access, the database and owner values are omitted:

UPDATE OpenDataSource(
'Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;
User ID=Admin;Password=;Extended properties=Excel 5.0'
)...Tour
SET [Base Camp] = 'Ashville'
WHERE Tour = 'Gauley River Rafting';

Figure 31-5 illustrates the query execution plan for the distributed UPDATE query, beginning at the right with a Remote Scan operation that returns all 19 rows from the Excel named range. The data is then processed within SQL Server. The details of the Remote Update logical operation reveal that the distributed UPDATE query actually updated only two rows.

To complete the example, the following query reads from the same Excel spreadsheet and verifies that the update took place. Again, the OpenDataSource() function is only pointing the distributed query to an external server:

SELECT *
FROM OpenDataSource(
'Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\SQLServerBible\CHA1_Schedule.xls;
User ID=Admin;Password=;Extended properties=Excel 5.0'
)...Tour
WHERE Tour = 'Gauley River Rafting';


FIGURE 31-5

The query execution plan for the distributed query using OpenDataSource()

Result:


Pass-through distributed queries

A pass-through query executes a query at the external data source and returns the result to SQL Server. The primary reason for using a pass-through query is to reduce the amount of data being passed from the server (the external data source) to the client (SQL Server). Rather than pull a million rows into SQL Server so that it can use 25 of them, it may be better to select those 25 rows from the external data source.
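The difference in data movement can be sketched in Python, with an in-memory SQLite database standing in for the external data source. That stand-in is an assumption made purely for illustration (the sources discussed here are Excel, Access, or another SQL Server), and the table contents, including the "Other Tour" rows, are made up. The sketch counts how many rows cross from the external source to the caller under each approach.

```python
import sqlite3

# Stand-in for the external data source (illustrative assumption only).
external = sqlite3.connect(":memory:")
external.execute("CREATE TABLE Tour (Tour TEXT, BaseCamp TEXT)")
sample_rows = (
    [("Gauley River Rafting", "Ashville")] * 2   # 2 matching rows
    + [("Other Tour", "Elsewhere")] * 17         # 17 made-up non-matching rows
)
external.executemany("INSERT INTO Tour VALUES (?, ?)", sample_rows)


def local_processing(conn, tour):
    """Pull every row across, then filter on the 'local' side."""
    transferred = conn.execute("SELECT Tour, BaseCamp FROM Tour").fetchall()
    matches = [row for row in transferred if row[0] == tour]
    return len(transferred), matches


def pass_through(conn, tour):
    """Let the external source apply the filter; only matches cross over."""
    transferred = conn.execute(
        "SELECT Tour, BaseCamp FROM Tour WHERE Tour = ?", (tour,)
    ).fetchall()
    return len(transferred), transferred


local_count, local_rows = local_processing(external, "Gauley River Rafting")
remote_count, remote_rows = pass_through(external, "Gauley River Rafting")
print(local_count, remote_count)  # prints: 19 2
```

Both approaches return the same two rows; only the number of rows transferred differs.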

Be aware that the pass-through query will use the query syntax of the external data source. If the external data source is Oracle or Access, then PL/SQL or Access SQL must be used in the pass-through query.

In the case of a pass-through query that modifies data, the remote data type determines whether the update is performed locally or remotely:

■ When another SQL Server is being updated, the remote SQL Server will perform the update.

■ When non–SQL Server data is being updated, the data providers determine where the update will be performed. Often, the pass-through query merely selects the correct rows remotely. The selected rows are returned to SQL Server, modified inside SQL Server, and then returned to the remote data source for the update.

Two forms of local distributed queries exist, one for linked servers and one for external data sources defined in the query; likewise, two forms of explicitly declaring pass-through distributed queries exist as well. OpenQuery() uses an established linked server, and OpenRowSet() declares the link within the query.

Using the four-part name

If the distributed query is accessing another SQL Server, then the four-part name becomes a hybrid distributed query method. Depending on the FROM clause and the WHERE clause, SQL Server will attempt to pass as much of the query as possible to the external SQL Server to improve performance.

When building a complex distributed query using the four-part name, it’s difficult to predict how much of the query SQL Server will pass through. I’ve seen SQL Server take a single query and, depending on the WHERE clause, pass the whole query through, turn each table into a separate pass-through query, or pass through only one table.

OpenQuery()

For pass-through queries, the OpenQuery() function leverages a linked server, so it’s the easiest to develop. It also handles changes in server configuration without changing the code.

The OpenQuery() function is used within the SQL DML statement as a table. The function accepts only two parameters: the name of the linked server and the pass-through query. The next query uses OpenQuery() to retrieve data from the CHA1_Schedule Excel spreadsheet:

SELECT *
FROM OPENQUERY(CHA1_Schedule,
'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"');

Result:


The OpenQuery() pass-through query requires almost no processing by SQL Server. The Remote Scan returns exactly two rows to SQL Server. The WHERE clause is executed by the Jet engine as it reads from the Excel spreadsheet.

In the next example, the OpenQuery() requests the Jet engine to extract only the two rows requiring the update. The actual UPDATE operation is performed in SQL Server, and the result is written back to the external data set. In effect, the pass-through query is performing only the SELECT portion of the UPDATE command:

UPDATE OPENQUERY(CHA1_Schedule,
'SELECT * FROM Tour WHERE Tour = "Gauley River Rafting"')
SET [Base Camp] = 'Ashville';

OpenRowSet()

The OpenRowSet() function is the pass-through counterpart to the OpenDataSource() function. Both require the remote data source to be fully specified in the distributed query. OpenRowSet() adds a parameter to specify the pass-through query:

SELECT ContactFirstName, ContactLastName
FROM OPENROWSET ('Microsoft.Jet.OLEDB.4.0',
'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
'SELECT * FROM Customers WHERE CustomerID = 1');

Result:

ContactFirstName ContactLastName
---------------- ---------------

Best Practice

Of the four distributed-query methods, the best option is the OpenQuery() function. With OpenQuery(), you have specific control over which data will be processed where. In addition, it has the advantage of predefined links, making the query more robust if the server configuration changes.

To perform an update using the OpenRowSet() function, use the function in place of the table being modified. The following code sample modifies the customer’s last name in an Access database. The WHERE clause of the UPDATE command is handled by the pass-through portion of the OpenRowSet() function:

UPDATE OPENROWSET ('Microsoft.Jet.OLEDB.4.0',
'C:\SQLServerBible\CHA1_Customers.mdb'; 'Admin'; '',
'SELECT * FROM Customers WHERE CustomerID = 1')
SET ContactLastName = 'Wilson';

Distributed Transactions

Transactions are key to data integrity. If the logical unit of work includes modifying data outside the local SQL Server, then a standard transaction is unable to handle the atomicity of the transaction. If a failure should occur in the middle of the transaction, then a mechanism must be in place to roll back the partial work; otherwise, a partial transaction will be recorded and the database will be left in an inconsistent state.

Chapter 66, "Managing Transactions, Locking, and Blocking," explores the ACID properties of a database and transactions.

Distributed Transaction Coordinator

SQL Server uses the Distributed Transaction Coordinator (DTC) to handle multiple server transactions, commits, and rollbacks. The DTC service uses a two-phase commit scheme for multiple server transactions. The two-phase commit ensures that every server is available and handling the transaction by performing the following steps:

1. Each server is sent a "prepare to commit" message.

2. Each server performs the first phase of the commit, ensuring that it is capable of committing the transaction.

3. Each server replies when it has finished preparing for the commit.

4. Only after every participating server has responded positively to the "prepare to commit" message is the actual commit message sent to each server.
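The four steps can be modeled with a small Python simulation. It is a conceptual sketch only and does not call DTC: the Participant class, the two_phase_commit function, and the server names are hypothetical, and the rollback branch reflects the usual two-phase commit behavior (any negative reply cancels the work everywhere) rather than anything spelled out in the steps above.

```python
class Participant:
    """A hypothetical server taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Steps 2 and 3: perform the first phase and reply with a vote.
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Step 1: send "prepare to commit" to each server and collect the replies.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Step 4: only now is the actual commit message sent.
        for p in participants:
            p.commit()
        return True
    # Assumed behavior: any negative reply rolls back every participant.
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("local"), Participant("London")]
committed = two_phase_commit(healthy)

broken = [Participant("local"), Participant("London", can_commit=False)]
aborted = two_phase_commit(broken)
print(committed, aborted)  # prints: True False
```

In the second run, the local participant prepared successfully but is still rolled back, which is the atomicity guarantee the two-phase scheme exists to provide.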

If the logical unit of work only involves reading from the external SQL Server, then the DTC is not required. Only when remote updates are occurring is a transaction considered a distributed transaction.

The Distributed Transaction Coordinator is a separate service from SQL Server. Because DTC is a Windows service, it can be started or stopped from the Windows Services console.

Only one instance of DTC runs per server regardless of how many SQL Server instances may be installed or running on that server. The actual service name is msdtc.exe, and it consumes only about 2.5 MB of memory.

DTC must be running when a distributed transaction is initiated or the transaction will fail.

Developing distributed transactions

Distributed transactions are similar to local transactions with a few extensions to the syntax:

SET XACT_ABORT ON;

BEGIN DISTRIBUTED TRANSACTION;

In case of error, the xact_abort connection option will cause the current transaction, rather than only the current T-SQL statement, to be rolled back. The xact_abort ON option is required for any distributed transactions accessing a remote SQL Server and for most other OLE DB connections as well; but if xact_abort ON is not in the code, then SQL Server will automatically convert the transaction to xact_abort ON as soon as a distributed query is executed.

The BEGIN DISTRIBUTED TRANSACTION command, which determines whether the DTC service is available, is not strictly required. If a transaction is initiated with only BEGIN TRAN, then the transaction is escalated to a distributed transaction, and DTC is checked as soon as a distributed query is executed. It’s considered a better practice to use BEGIN DISTRIBUTED TRANSACTION so that DTC is checked at the beginning of the transaction. When DTC is not running, an 8501 error is raised automatically:
