Microsoft SQL Server 2008 Study Guide, Part 53


The system stored procedure sp_xml_preparedocument takes an optional third argument that accepts namespace declarations. If the XML document contains namespace declarations, this parameter can be used to specify the namespaces declared in the XML document. The following example shows how to do this:

DECLARE @hdoc INT
DECLARE @xml VARCHAR(MAX)
SET @xml = '
<itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
  <itm:Item ItemNumber="D001" Quantity="1" Price="900.0000" />
  <itm:Item ItemNumber="Z001" Quantity="1" Price="200.0000" />
</itm:Items>'

-- Step 1: initialize XML Document Handle
EXEC sp_xml_preparedocument
    @hdoc OUTPUT,
    @xml,
    '<itm:Items xmlns:itm="http://www.sqlserverbible.com/items"/>'

-- Step 2: Call OPENXML()
SELECT * FROM OPENXML(@hdoc, 'itm:Items/itm:Item')
WITH (
    ItemNumber CHAR(4) '@ItemNumber',
    Quantity INT '@Quantity',
    Price MONEY '@Price'
)

-- Step 3: Free document handle
EXEC sp_xml_removedocument @hdoc

/*
ItemNumber Quantity Price
---------- -------- ------
D001       1        900.00
Z001       1        200.00
*/

Because OPENXML() needs a three-step process to shred each XML document, it is not suitable for set-based operations. It cannot be called from a scalar or table-valued function. If a table has an XML column and a piece of information is to be extracted from more than one row, OPENXML() requires a WHILE loop. Row-by-row processing has significant overhead and will typically be much slower than a set-based operation. In such cases, XQuery is a better choice than OPENXML().

Using OPENXML() may be expensive in terms of memory usage too. It uses the MSXML parser internally, through a COM invocation, which may not be cheap. A call to sp_xml_preparedocument parses the XML document and stores it in the internal cache of SQL Server. The MSXML parser can use up to one-eighth of the total memory available to SQL Server. Every document handle initialized by sp_xml_preparedocument should be released by calling the sp_xml_removedocument procedure to avoid memory leaks.
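As a rough illustration of the set-based alternative, the same document can be shredded with the nodes() and value() methods of the XML data type in a single statement, with no document handle to prepare or release. This is a minimal sketch, assuming SQL Server 2008 (for the inline DECLARE initialization); the aliases t and i are arbitrary names chosen for this example.

```sql
-- Same sample document as in the OPENXML() example, held in an XML variable
DECLARE @xml XML = N'
<itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
  <itm:Item ItemNumber="D001" Quantity="1" Price="900.0000" />
  <itm:Item ItemNumber="Z001" Quantity="1" Price="200.0000" />
</itm:Items>';

-- Declare the namespace prefix once, then shred every Item node in one query
WITH XMLNAMESPACES ('http://www.sqlserverbible.com/items' AS itm)
SELECT
    i.value('@ItemNumber', 'CHAR(4)') AS ItemNumber,
    i.value('@Quantity',   'INT')     AS Quantity,
    i.value('@Price',      'MONEY')   AS Price
FROM @xml.nodes('/itm:Items/itm:Item') AS t(i);
```

Because nodes() is a rowset function, the same pattern can be applied to an XML column across many rows with CROSS APPLY, which is what makes it usable in set-based queries.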


XSD and XML Schema Collections

XSD (XML Schema Definition) is a W3C-recommended language for describing and validating XML documents. SQL Server supports a subset of the XSD specification and can validate XML documents against XSD schemas.

SQL Server implements support for XSD schemas in the form of XML schema collections. An XML SCHEMA COLLECTION is a SQL Server database object, just like tables or views. It can be created from an XML schema definition. Once a schema collection is created, it can be associated with an XML column or variable. An XML column or variable that is bound to a schema collection is called typed XML. SQL Server strictly validates typed XML documents when the value of the column or variable is modified, either by an assignment operation or by an XML DML operation (insert/update/delete).

Creating an XML Schema collection

An XML schema collection can be created with the CREATE XML SCHEMA COLLECTION statement. It creates a new XML schema collection with the specified name using the schema definition provided. The following example shows an XML schema that describes a customer information XML document and implements a number of validation rules:

CREATE XML SCHEMA COLLECTION CustomerSchema AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="9999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="CustomerName" use="optional">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="40"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO

This schema defines a top-level element "Customer" with two attributes: CustomerID and CustomerName. The CustomerID attribute is made mandatory by setting the use attribute to required, and a restriction of minimum value and maximum value is applied to it. The CustomerName attribute is made optional by setting the use attribute to optional, and a restriction is applied on the length of this attribute.


Creating typed XML columns and variables

Once the schema collection is created, typed XML columns and variables can be created that are bound to the schema collection. The following example creates a typed XML variable:

DECLARE @x XML(CustomerSchema)

Similarly, a typed XML column can be created as follows:

-- Create a table with a TYPED XML column
CREATE TABLE TypedXML(
    ID INT,
    CustomerData XML(CustomerSchema))

Typed XML columns can be added to existing tables by using the ALTER TABLE ADD statement:

-- add a new typed XML column
ALTER TABLE TypedXML ADD Customer2 XML(CustomerSchema)

Typed XML parameters can be used as input and output parameters of stored procedures. They can also be used as input parameters and return values of scalar functions.
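A typed XML parameter might be used along the following lines. This is only a sketch: the procedure name dbo.SaveCustomer is hypothetical, and the example assumes that the CustomerSchema collection and the TypedXML table shown above already exist in the current database.

```sql
-- Hypothetical procedure: the parameter is validated against CustomerSchema
-- before the procedure body runs
CREATE PROCEDURE dbo.SaveCustomer
    @Customer XML(CustomerSchema)
AS
BEGIN
    SET NOCOUNT ON;

    -- Store the validated document, using the CustomerID attribute as the row ID
    INSERT INTO TypedXML (ID, CustomerData)
    SELECT @Customer.value('(/Customer/@CustomerID)[1]', 'INT'),
           @Customer;
END;
GO

-- The string argument is converted to typed XML and validated on the way in
EXEC dbo.SaveCustomer
    @Customer = N'<Customer CustomerID="1001" CustomerName="Jacob"/>';
```

Declaring the parameter as typed XML moves validation to the procedure boundary, so the body does not need to re-check the document structure.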

Performing validation

When a value is assigned to a typed XML column or variable, SQL Server performs all the validations defined in the schema collection against the new value being inserted or assigned. The insert/assignment operation succeeds only if the validation succeeds. The following code generates an error because the value being assigned to the CustomerID attribute is outside the range defined for it:

DECLARE @x XML(CustomerSchema)
SELECT @x = '<Customer CustomerID="19909" CustomerName="Jacob"/>'

-- Msg 6926, Level 16, State 1, Line 2
-- XML Validation: Invalid simple type value: '19909'. Location: /*:Customer[1]/@*:CustomerID

SQL Server performs the same set of validations whether a new value is being assigned or the existing value is modified by using XML DML operations (insert/update/delete).

An existing untyped XML column can be changed to a typed XML column by using the ALTER TABLE ALTER COLUMN command. SQL Server will validate the XML values stored in each row for that column and check whether the values validate successfully against the schema collection being bound to the column. The ALTER COLUMN operation will succeed only if all the existing values are valid as per the rules defined in the schema collection. The same process happens if a typed XML column is altered and bound to a different schema collection. The operation can succeed only if all the existing values are valid as per the rules defined in the new schema collection.
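The conversion from untyped to typed XML can be sketched as follows. The table dbo.UntypedXML is a hypothetical name introduced only for this illustration, and the CustomerSchema collection created earlier is assumed to exist.

```sql
-- Hypothetical table with an untyped XML column
CREATE TABLE dbo.UntypedXML (
    ID INT,
    CustomerData XML
);
GO

-- This row satisfies the rules in CustomerSchema
INSERT INTO dbo.UntypedXML (ID, CustomerData)
VALUES (1, N'<Customer CustomerID="1001" CustomerName="Jacob"/>');
GO

-- Bind the column to the schema collection; every existing row is validated,
-- and the statement fails if any stored value breaks a rule
ALTER TABLE dbo.UntypedXML
    ALTER COLUMN CustomerData XML(CustomerSchema);
GO
```

If a row contained, say, a CustomerID above 9999, the ALTER COLUMN statement would be expected to fail and the column would remain untyped.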


XML DOCUMENT and CONTENT

A typed XML column or variable can accept two flavors of XML values: DOCUMENT and CONTENT. DOCUMENT is a complete XML document with a single top-level element. CONTENT usually is an XML fragment and can have more than one top-level element. Depending upon the requirement, a typed XML column or variable can be defined as DOCUMENT or CONTENT when it is bound with the schema collection.

The following code snippet shows examples of XML variables declared as DOCUMENT and CONTENT:

-- XML Document
DECLARE @x XML(DOCUMENT CustomerSchema)
SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'
GO

-- XML Content
DECLARE @x XML(CONTENT CustomerSchema)
SELECT @x = '
<Customer CustomerID="1002" CustomerName="Steve"/>'

If a content model is not specified, SQL Server assumes CONTENT when creating the typed XML column or variable.
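The difference between the two content models shows up when a value has more than one top-level element. The following sketch assumes the CustomerSchema collection from earlier in this section; the variable names @content and @doc are illustrative.

```sql
-- CONTENT: a fragment with two top-level Customer elements is accepted
DECLARE @content XML(CONTENT CustomerSchema);
SET @content = N'
<Customer CustomerID="1001" CustomerName="Jacob"/>
<Customer CustomerID="1002" CustomerName="Steve"/>';

-- DOCUMENT: the same fragment is expected to be rejected with a validation
-- error, because a document must have exactly one top-level element
DECLARE @doc XML(DOCUMENT CustomerSchema);
SET @doc = N'
<Customer CustomerID="1001" CustomerName="Jacob"/>
<Customer CustomerID="1002" CustomerName="Steve"/>';
```

Choosing DOCUMENT is therefore a simple way to enforce a single root element when each value represents one complete record.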

Altering XML Schema collections

There are times when you might need to alter the definition of a given schema collection. This usually happens when the business requirement changes or you need to fix a missing or incorrect validation rule.

However, altering schema collections is painful in SQL Server. Once created, the definition of a schema cannot be altered. The schema demonstrated earlier in this section defines customer name as an optional attribute. If the business requirement changes and this attribute has to be made mandatory, that will be a lot of work.

Because the definition of a schema collection cannot be altered, if a new schema definition is wanted, the existing schema collection should be dropped by executing the DROP XML SCHEMA COLLECTION statement and then re-created with the new definition.

Note that a schema collection cannot be dropped unless all the references to it are removed. All columns bound to the given schema collection should be dropped, changed to untyped XML, or altered and bound to another schema collection before dropping the schema collection. Similarly, any XML parameters or return values that refer to the schema collection in stored procedures or functions should be removed or altered as well.
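The steps described above could look roughly like this for the requirement of making CustomerName mandatory. The sketch assumes that the TypedXML table with its CustomerData and Customer2 columns is the only object bound to CustomerSchema; any procedures or functions that reference the collection would have to be dropped or altered first.

```sql
-- Step 1: detach the columns by reverting them to untyped XML
ALTER TABLE TypedXML ALTER COLUMN CustomerData XML;
ALTER TABLE TypedXML ALTER COLUMN Customer2 XML;
GO

-- Step 2: drop the old schema collection
DROP XML SCHEMA COLLECTION CustomerSchema;
GO

-- Step 3: re-create it, this time with CustomerName marked as required
CREATE XML SCHEMA COLLECTION CustomerSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="9999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="CustomerName" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="40"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

-- Step 4: rebind the columns; existing rows are validated against the new rules
ALTER TABLE TypedXML ALTER COLUMN CustomerData XML(CustomerSchema);
ALTER TABLE TypedXML ALTER COLUMN Customer2 XML(CustomerSchema);
GO
```

Step 4 fails if any stored document lacks a CustomerName attribute, so existing data may need to be corrected while the columns are untyped.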

What's in the "collection"?

An XML schema collection can contain multiple schema definitions. In most production use cases, it will likely have only one schema definition, but it is valid to have more than one schema definition in a single schema collection.


When a schema collection contains more than one schema definition, SQL Server will allow XML values that validate with any of the schema definitions available within the schema collection.

For example, a feed aggregator that stores valid RSS and ATOM feeds in a single column can create a schema collection containing two schema definitions, one for RSS and one for ATOM. SQL Server will then allow both RSS and ATOM feeds to be stored in the given column. The following XML schema collection defines two top-level elements, Customer and Order:

CREATE XML SCHEMA COLLECTION CustomerOrOrder AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID"/>
      <xs:attribute name="CustomerName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="Order">
    <xs:complexType>
      <xs:attribute name="OrderID"/>
      <xs:attribute name="OrderNumber"/>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO

A typed XML column or variable bound to this schema collection can store a Customer element, an Order element, or both (if the XML column or variable is defined as CONTENT). The following sample code presents an example to demonstrate this:

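DECLARE @x XML(CustomerOrOrder)

-- a Customer element
SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'

-- an Order element
SELECT @x = '<Order OrderID="121" OrderNumber="10001"/>'

-- both elements (allowed because the variable defaults to CONTENT)
SELECT @x = '
<Customer CustomerID="1001" CustomerName="Jacob"/>
<Order OrderID="121" OrderNumber="10001"/>'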

A new schema definition can be added to an existing schema collection by using the ALTER XML SCHEMA COLLECTION ADD statement:

ALTER XML SCHEMA COLLECTION CustomerOrOrder ADD '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Item">
    <xs:complexType>
      <xs:attribute name="ItemID"/>
      <xs:attribute name="ItemNumber"/>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO


Before creating or altering an XML schema collection, it is important to check whether a schema collection with the given name exists. XML schema collections are stored in a set of internal tables and are accessible through a number of system catalog views. sys.xml_schema_collections can be queried to determine whether a given XML schema collection exists:

IF EXISTS(
    SELECT name FROM sys.xml_schema_collections
    WHERE schema_id = schema_id('dbo') AND name = 'CustomerSchema'
) DROP XML SCHEMA COLLECTION CustomerSchema

What’s new in SQL Server 2008 for XSD

SQL Server 2008 added a number of enhancements to the XSD implementation of the previous version. The XSD implementation of the date, time, and dateTime data types required time zone information in SQL Server 2005, so a date value had to look like 2009-03-14Z or 2009-03-14+05:30, where the Z and +05:30 indicate the time zone. This requirement has been removed in SQL Server 2008. The XSD processor now accepts date, time, and dateTime values with or without time zone information.

Though the date, time, and dateTime data type implementation in SQL Server 2005 required time zone information, the XML document did not preserve it. It normalized the value to a UTC date/time value and stored it. SQL Server 2008 added enhancements to preserve the time zone information. If the date, time, or dateTime value contains time zone information, then SQL Server 2008 preserves it.
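The relaxed handling of time zones can be tried with a small schema that declares an xs:date attribute. This is a sketch under stated assumptions: the collection name DateDemoSchema and the Order/OrderDate names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical schema with a single xs:date attribute
CREATE XML SCHEMA COLLECTION DateDemoSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Order">
    <xs:complexType>
      <xs:attribute name="OrderDate" type="xs:date" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

DECLARE @x XML(DateDemoSchema);

-- No time zone: accepted by SQL Server 2008 (SQL Server 2005 required one)
SET @x = N'<Order OrderDate="2009-03-14"/>';

-- With a time zone offset: accepted, and the offset is kept in SQL Server 2008
SET @x = N'<Order OrderDate="2009-03-14+05:30"/>';

-- Inspect the stored value
SELECT @x AS StoredValue;
```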

Unions and lists are two powerful data models of XSD. Union types are simple types created with the union of two or more atomic types. List types are simple types that can store a space-delimited list of atomic values. Both of these types supported only atomic values in their implementation in SQL Server 2005. SQL Server 2008 enhanced these types so that lists of union types and unions of list types can be created.

Lax validation is the most important addition to the XSD validation in SQL Server 2008. In SQL Server 2005, wildcard elements could be validated either with "skip" (does not validate at all) or "strict" (performs a strict, or full, validation). SQL Server 2008 added "lax" validation, whereby the schema processor performs the validation only if the declaration for the target namespace is found in the schema collection.
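A wildcard with lax validation might be declared as in the following sketch. The collection name LaxDemoSchema and the Envelope, Amount, and Unknown element names are hypothetical; the behavior noted in the comments is what the description above leads one to expect, not captured output.

```sql
-- Hypothetical schema: Envelope accepts any child elements, validated laxly;
-- Amount is declared globally as an integer
CREATE XML SCHEMA COLLECTION LaxDemoSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Envelope">
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Amount" type="xs:integer"/>
</xs:schema>';
GO

DECLARE @x XML(LaxDemoSchema);

-- Amount has a declaration, so it is validated as an integer;
-- Unknown has no declaration, so lax validation lets it through unchecked
SET @x = N'<Envelope><Amount>5</Amount><Unknown>anything</Unknown></Envelope>';

-- Expected to fail: Amount is declared, and "abc" is not a valid integer
SET @x = N'<Envelope><Amount>abc</Amount></Envelope>';
```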

This chapter provides only a brief overview of the XSD implementation in SQL Server. Detailed coverage of all the XSD features is beyond the scope of this chapter.

Understanding XML Indexes

SQL Server does not allow an XML column to be part of a regular index (SQL Index). To optimize queries that extract information from XML columns, SQL Server supports a special type of index called an XML index. The query processor can use an XML index to optimize XQuery, just like it uses SQL indexes to optimize SQL queries.

SQL Server supports four different types of XML indexes. Each XML column can have a primary XML index and three different types of secondary XML indexes.


A primary XML index is a clustered index created in document order, on an internal table known as a node table. It contains information about all tags, paths, and values within the XML instance in each row. A primary XML index can be created only on a table that already has a clustered index on the primary key. The primary key of the base table is used to join the XQuery results with the base table. The primary XML index contains one row for each node in the XML instance. The query processor will use the primary XML index to execute every query, except for cases where the whole document is retrieved.

Just like SQL indexes, XML indexes should be created and used wisely. The size of a primary XML index may be around three times the size of the XML data stored in the base table, although this may vary based on the structure of the XML document. Document order is important for XML, and the primary XML index is created in such a way that document order and the structural integrity of the XML document are maintained in the query result.

If an XML column has a primary XML index, three additional types of secondary XML indexes can be created on the column. The additional index types are PROPERTY, VALUE, and PATH indexes. Based upon the specific query requirements, one or more of the index types may be used. Secondary indexes are non-clustered indexes created on the internal node table.

A PATH XML index is created on the internal node table and indexes the path and value of each XML element and attribute. PATH indexes are good for operations in which nodes with specific values are filtered or selected.

A PROPERTY XML index is created on the internal node table and contains the primary key of the table, the path to elements and attributes, and their values. The advantage of a PROPERTY XML index over a PATH XML index is that it helps to search multi-valued properties in the same XML instance.

A VALUE XML index is just like the PATH XML index, but contains the value and path of each XML element and attribute (instead of path and value). VALUE indexes are helpful in cases where wildcards are used in the path expression.

XML indexes are a great addition to the XML capabilities of SQL Server. Wise usage of XML indexes helps optimize queries that use XQuery to fetch information from XML columns.
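The index types described above can be created as sketched below. The table dbo.CustomerDocs and all index names are hypothetical, chosen only for this illustration; note the clustered primary key, which the primary XML index requires.

```sql
-- Hypothetical base table; the clustered primary key is a prerequisite
CREATE TABLE dbo.CustomerDocs (
    ID INT NOT NULL
        CONSTRAINT PK_CustomerDocs PRIMARY KEY CLUSTERED,
    CustomerData XML NOT NULL
);
GO

-- Primary XML index: builds the internal node table for the column
CREATE PRIMARY XML INDEX PXML_CustomerDocs_CustomerData
    ON dbo.CustomerDocs (CustomerData);
GO

-- Secondary indexes are built on top of the primary XML index
CREATE XML INDEX IXML_CustomerDocs_Path
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR PATH;

CREATE XML INDEX IXML_CustomerDocs_Value
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR VALUE;

CREATE XML INDEX IXML_CustomerDocs_Property
    ON dbo.CustomerDocs (CustomerData)
    USING XML INDEX PXML_CustomerDocs_CustomerData
    FOR PROPERTY;
GO
```

In practice it is usually worth creating only the secondary indexes that match the workload, since each one adds storage and maintenance cost.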

XML Best Practices

SQL Server comes with a wide range of XML-related functionalities, and the correct usage of these functionalities is essential for building a good system. A feature may be deemed "good" only if it is applied in an area where it is really required. If not, it might result in unnecessary overhead or add unwanted complexity to an otherwise simpler task.

■ XML should be used only where it is really required. Using XML where relational data would best be suited is not a good idea. Similarly, using a relational model where XML might run better won't produce the desired results.

■ XML is good for storing semi-structured or unstructured data. XML is a better choice if the physical order of values is significant and the data represents a hierarchy. If the values are valid XML documents and need to be queried, storing them in an XML column will be a better choice than VARCHAR, NVARCHAR, or VARBINARY columns.


■ If the structure of the XML documents is defined, using typed XML columns will be a better choice. Typed XML columns provide better metadata information and allow SQL Server to optimize queries running over typed XML columns. Furthermore, typed XML provides storage optimization and static type checking.

■ Creating a primary XML index and a secondary XML index (or more, depending on the workload) might help improve XQuery performance. A primary XML index usually uses up to three times the storage space of the data in the base table. This indicates that, just like SQL indexes, XML indexes also should be used wisely. Keep in mind that a full-text index can be created on an XML column. A wise combination of a full-text index with XML indexes might be a better choice in many situations.

■ Creating property tables to promote multi-valued properties from the XML column may be a good idea in many cases. One or more property tables may be created from the data in an XML column, and these tables can be indexed to improve performance further.

■ Two common mistakes that add a lot of overhead to XQuery processing are usage of wildcards in the path expression and using a parent node accessor to read information from upper-level nodes.

■ Using specific markups instead of generic markups will enhance performance significantly. Generic markups do not perform well and do not allow XML index lookups to be done efficiently.

■ Attribute-centric markup is a better choice than element-centric markup. Processing information from attributes is much more efficient than processing information from elements. Attribute-centric markups take less storage space than element-centric markups, and the evaluation of predicates is more efficient because the attribute's value is stored in the same row as its markup in the primary XML index.

■ An in-place update of the XML data type gives better performance in most cases. If the update operation requires modifying the value of one or more elements or attributes, it is a better practice to modify those elements and attributes using XML DML functions, rather than replace the whole document.

■ Using the exist() method to check for the existence of a value is much more efficient than using the value() method. Parameterizing XQuery and XML DML expressions is much more efficient than executing dynamic SQL statements.
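The last two practices in the list above can be illustrated with a short sketch. The table variable @docs, its sample rows, and the variable @id are hypothetical and serve only to keep the example self-contained.

```sql
-- Hypothetical sample data in a table variable
DECLARE @docs TABLE (ID INT PRIMARY KEY, CustomerData XML);
INSERT INTO @docs (ID, CustomerData)
VALUES (1, N'<Customer CustomerID="1001" CustomerName="Jacob"/>'),
       (2, N'<Customer CustomerID="1002" CustomerName="Steve"/>');

-- Parameterized existence check: exist() with sql:variable() instead of
-- building the XQuery string dynamically
DECLARE @id INT = 1002;
SELECT ID
FROM @docs
WHERE CustomerData.exist('/Customer[@CustomerID = sql:variable("@id")]') = 1;

-- In-place update: change one attribute with XML DML rather than
-- replacing the whole document
UPDATE @docs
SET CustomerData.modify(
    'replace value of (/Customer/@CustomerName)[1] with "Steven"')
WHERE ID = 2;

-- Confirm the change
SELECT ID, CustomerData FROM @docs;
```

The same exist() and modify() calls work against a permanent table; a table variable is used here only so the sketch leaves nothing behind.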

Summary

SQL Server 2008 is fully equipped with a wide range of XML capabilities to support the XML processing requirements needed by almost every modern application. SQL Server 2008 added a number of enhancements to the XML features supported by previous versions. Key points to take away from this chapter include the following:

■ SQL Server 2008 is equipped with a number of XML processing capabilities, including support for generating, loading, querying, validating, modifying, and indexing XML documents.

■ The XML data type can be used to store XML documents. It supports the following methods: value(), exist(), query(), modify(), and nodes().


■ An XML data type column or variable that is associated with a schema collection is called typed XML. SQL Server validates typed XML columns and variables against the rules defined in the schema.

■ The OPENROWSET() function can be used with the BULK rowset provider to load an XML document from a disk file.

■ XML output can be generated from the result of a SELECT query using FOR XML. FOR XML can be used with the AUTO, RAW, EXPLICIT, and PATH directives to achieve different levels of control over the structure and format of the XML output.

■ The query() method of the XML data type supports XQuery FLWOR operations, which allow complex restructuring and manipulation of XML documents. SQL Server 2008 added support for the let clause in FLWOR operations.

■ The XML data type supports XML DML operations through the modify() method. It allows performing insert, update, and delete operations on XML documents.

■ SQL Server 2008 added support for inserting an XML data type value into another XML document.

■ WITH XMLNAMESPACES can be used to process XML documents that have namespace declarations.

■ SQL Server supports XSD in the form of XML schema collections. SQL Server 2008 added a number of enhancements to the XSD support available with previous versions. These enhancements include full support for the date, time, and dateTime data types, support for lax validation, and support for creating unions of list types and lists of union types.

■ SQL Server supports a special category of indexes called XML indexes to index XML columns. A primary XML index and up to three secondary indexes (PATH, VALUE, and PROPERTY) can be created on an XML column.


Using Integrated Full-Text Search

IN THIS CHAPTER

Setting up full-text index catalogs with Management Studio or T-SQL code

Maintaining full-text indexes

Using full-text indexes in queries

Performing fuzzy word searches

Searching text stored in binary objects

Full-text search performance

Several years ago I wrote a word search for a large database of legal texts. For word searches, the database parsed all the documents and built a word-frequency table as a many-to-many association between the word table and the document table. It worked well, and word searches became lightning-fast. As much fun as writing your own word search can be, fortunately, you have a choice.

SQL Server includes a structured word/phrase indexing system called Full-Text Search. More than just a word parser, Full-Text Search actually performs linguistic analysis by determining base words and word boundaries, and by conjugating verbs for different languages. It runs circles around the simple word index system that I built.

ANSI Standard SQL uses the LIKE operator to perform basic word searches and even wildcard searches. For example, the following code uses the LIKE operator to query the Aesop's Fables sample database:

USE Aesop;

SELECT Title
FROM Fable
WHERE Fabletext LIKE '%Lion%'
  AND Fabletext LIKE '%bold%';

Result:

Title
--------------------------
The Hunter and the Woodman

The main problem with performing SQL Server WHERE LIKE searches is the slow performance. Indexes are searchable from the beginning of the word,
