The system stored procedure sp_xml_preparedocument takes an optional third argument that
accepts namespace declarations. If the XML document contains namespace declarations, this
parameter can be used to specify the namespaces declared in the XML document. The following example
shows how to do this:
DECLARE @hdoc INT
DECLARE @xml VARCHAR(MAX)
SET @xml = '
<itm:Items xmlns:itm="http://www.sqlserverbible.com/items">
<itm:Item ItemNumber="D001" Quantity="1" Price="900.0000" />
<itm:Item ItemNumber="Z001" Quantity="1" Price="200.0000" />
</itm:Items>'
-- Step 1: initialize XML Document Handle
EXEC sp_xml_preparedocument
@hdoc OUTPUT,
@xml,
'<itm:Items xmlns:itm="http://www.sqlserverbible.com/items"/>'
-- Step 2: Call OPENXML()
SELECT * FROM OPENXML(@hdoc, 'itm:Items/itm:Item')
WITH (
ItemNumber CHAR(4) '@ItemNumber',
Quantity INT '@Quantity',
Price MONEY '@Price'
)
-- Step 3: Free document handle
EXEC sp_xml_removedocument @hdoc
/*
ItemNumber Quantity Price
---------- -------- ------
D001       1        900.00
Z001       1        200.00
*/
Because OPENXML() needs a three-step process to shred each XML document, it is not suitable for
set-based operations. It cannot be called from a scalar or table-valued function. If a table has an XML
column, and a piece of information is to be extracted from more than one row, a WHILE loop is
needed with OPENXML(). Row-by-row processing has significant overhead and will typically be much
slower than a set-based operation. In such cases, XQuery is a better choice than OPENXML().
Using OPENXML() may be expensive in terms of memory usage too. It uses the MSXML parser
internally, through a COM invocation, which may not be cheap. A call to sp_xml_preparedocument
parses the XML document and stores it in the internal cache of SQL Server. The MSXML parser
uses one-eighth of the total memory available to SQL Server. Every document handle initialized by
sp_xml_preparedocument should be released by calling the sp_xml_removedocument procedure
to avoid memory leaks.
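To illustrate the set-based alternative mentioned above, here is a minimal sketch that shreds the XML
stored in every row with a single query, using the nodes() and value() methods of the XML data type
instead of a WHILE loop around OPENXML(). The table variable @Orders and its column names are
hypothetical, introduced only for this illustration.

```sql
-- Hypothetical table variable holding one XML document per row
DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, ItemData XML);

INSERT INTO @Orders (OrderID, ItemData)
VALUES
(1, '<Items><Item ItemNumber="D001" Quantity="1" Price="900.0000" /></Items>'),
(2, '<Items><Item ItemNumber="Z001" Quantity="1" Price="200.0000" /></Items>');

-- One set-based query shreds every row; no document handle is needed
SELECT
    o.OrderID,
    i.value('@ItemNumber', 'CHAR(4)') AS ItemNumber,
    i.value('@Quantity', 'INT') AS Quantity,
    i.value('@Price', 'MONEY') AS Price
FROM @Orders AS o
CROSS APPLY o.ItemData.nodes('/Items/Item') AS x(i);
```

Because no handle is prepared, there is nothing to release afterward, and the query can be embedded in
a view or a table-valued function.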
XSD and XML Schema Collections
XSD (XML Schema Definition) is a W3C-recommended language for describing and validating XML
documents. SQL Server supports a subset of the XSD specification and can validate XML documents against
XSD schemas.
SQL Server implements support for XSD schemas in the form of XML schema collections. An XML
SCHEMA COLLECTION is a SQL Server database object just like tables or views. It can be created from
an XML schema definition. Once a schema collection is created, it can be associated with an XML
column or variable. An XML column or variable that is bound to a schema collection is called typed
XML. SQL Server strictly validates typed XML documents when the value of the column or variable is
modified either by an assignment operation or by an XML DML operation (insert/update/delete).
Creating an XML Schema collection
An XML schema collection can be created with the CREATE XML SCHEMA COLLECTION statement. It
creates a new XML schema collection with the specified name using the schema definition provided.
The following example shows an XML schema that describes a customer information XML document
and implements a number of validation rules:
CREATE XML SCHEMA COLLECTION CustomerSchema AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="9999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="CustomerName" use="optional">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="40"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO
This schema defines a top-level element "Customer" with two attributes: CustomerID and
CustomerName. The CustomerID attribute is made mandatory by setting its use attribute to required.
A restriction of minimum value and maximum value is applied to the CustomerID attribute. The
CustomerName attribute is made optional by setting its use attribute to optional.
A restriction is applied on the length of this attribute.
Creating typed XML columns and variables
Once the schema collection is created, typed XML columns and variables can be created that are bound
to the schema collection. The following example creates a typed XML variable:
DECLARE @x XML(CustomerSchema)
Similarly, a typed XML column can be created as follows:
-- Create a table with a TYPED XML column
CREATE TABLE TypedXML(
ID INT,
CustomerData XML(CustomerSchema))
Typed XML columns can be added to existing tables by using the ALTER TABLE ADD statement:
-- add a new typed XML column
ALTER TABLE TypedXML ADD Customer2 XML(CustomerSchema)
Typed XML parameters can be used as input and output parameters of stored procedures. They can also
be used as input parameters and return values of scalar functions.
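As a rough sketch of a typed XML parameter, the following stored procedure accepts a value bound to
CustomerSchema and returns the customer name through an output parameter. The procedure name
dbo.GetCustomerName and its parameter names are hypothetical, and the sketch assumes that the
CustomerSchema collection shown earlier already exists in the current database.

```sql
-- Hypothetical procedure; assumes the CustomerSchema collection already exists
CREATE PROCEDURE dbo.GetCustomerName
    @Customer XML(CustomerSchema),   -- typed XML input parameter
    @Name VARCHAR(40) OUTPUT
AS
BEGIN
    -- The argument was validated against CustomerSchema before the body runs
    SET @Name = @Customer.value('(/Customer/@CustomerName)[1]', 'VARCHAR(40)');
END
GO

DECLARE @n VARCHAR(40);
EXEC dbo.GetCustomerName
    @Customer = '<Customer CustomerID="1001" CustomerName="Jacob"/>',
    @Name = @n OUTPUT;
SELECT @n AS CustomerName;
```

Passing a value that violates the schema, such as a CustomerID outside the allowed range, should fail
validation before the procedure body executes.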
Performing validation
When a value is assigned to a typed XML column or variable, SQL Server will perform all the
validations defined in the schema collection against the new value being inserted or assigned. The
insert/assignment operation will succeed only if the validation succeeds. The following code generates an error
because the value being assigned to the CustomerID attribute is outside the range defined for it:
DECLARE @x XML(CustomerSchema)
SELECT @x = '<Customer CustomerID="19909" CustomerName="Jacob"/>'
/*
Msg 6926, Level 16, State 1, Line 2
XML Validation: Invalid simple type value: '19909' Location: /*:Customer[1]/@*:CustomerID
*/
SQL Server will perform the same set of validations if a new value is being assigned or the existing value
is modified by using XML DML operations (insert/update/delete).
An existing untyped XML column can be changed to a typed XML column by using the ALTER TABLE
ALTER COLUMN command. SQL Server will validate the XML values stored in each row for that column,
and check whether the values validate successfully against the schema collection being bound to the column.
The ALTER COLUMN operation will succeed only if all the existing values are valid as per the rules
defined in the schema collection. The same process happens if a typed XML column is altered and the
column is bound to a different schema collection. The operation can succeed only if all the existing
values are valid as per the rules defined in the new schema collection.
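The conversion described above can be sketched as follows. The table name CustomerStaging is
hypothetical, and the sketch assumes that the CustomerSchema collection created earlier exists.

```sql
-- Hypothetical table with an untyped XML column
CREATE TABLE CustomerStaging (ID INT, CustomerData XML);
GO

INSERT INTO CustomerStaging (ID, CustomerData)
VALUES (1, '<Customer CustomerID="1001" CustomerName="Jacob"/>');
GO

-- Bind the column to the schema collection. Every stored value is validated,
-- and the statement fails if any row does not conform to CustomerSchema.
ALTER TABLE CustomerStaging
ALTER COLUMN CustomerData XML(CustomerSchema);
```

If one of the rows held an invalid value, for example a CustomerID of 19909, the ALTER COLUMN
statement would be expected to fail and the column would remain untyped.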
XML DOCUMENT and CONTENT
A typed XML column or variable can accept two flavors of XML values: DOCUMENT and CONTENT.
DOCUMENT is a complete XML document with a single top-level element. CONTENT usually is an XML
fragment and can have more than one top-level element. Depending upon the requirement, a typed
XML column or variable can be defined as DOCUMENT or CONTENT when it is bound with the schema
collection.
The following code snippet shows examples of XML variables declared as DOCUMENT and CONTENT:
-- XML Document
DECLARE @x XML(DOCUMENT CustomerSchema)
SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'
GO
-- XML Content
DECLARE @x XML(CONTENT CustomerSchema)
SELECT @x = '
<Customer CustomerID="1001" CustomerName="Jacob"/>
<Customer CustomerID="1002" CustomerName="Steve"/>'
If a content model is not specified, SQL Server assumes CONTENT when creating the typed XML column
or variable.
Altering XML Schema collections
There are times when you might need to alter the definition of a given schema collection. This
usually happens when the business requirement changes or you need to fix a missing or incorrect
validation rule.
However, altering schema collections is a big pain in SQL Server. Once created, the definition of a
schema cannot be altered. The schema demonstrated earlier in this section defines customer name as an
optional attribute. If the business requirement changes and this attribute has to be made mandatory, that
will be a lot of work.
Because the definition of a schema collection cannot be altered, if a new schema definition is wanted,
the existing schema collection should be dropped by executing the DROP XML SCHEMA COLLECTION
statement and then created again with the new definition.
Note that a schema collection cannot be dropped unless all the references to it are removed. All columns
bound to the given schema collection should be dropped, changed to untyped XML, or altered
and bound to another schema collection before dropping the schema collection. Similarly, any XML
parameters or return values that refer to the schema collection in stored procedures or functions should
be removed or altered as well.
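The drop-and-recreate sequence described above might look like the following sketch, which makes the
CustomerName attribute mandatory. It assumes that the CustomerData column of the TypedXML table is the
only remaining reference to CustomerSchema; any other typed columns, parameters, or return values
bound to the collection would have to be handled the same way first.

```sql
-- Step 1: release the reference by changing the typed column to untyped XML
ALTER TABLE TypedXML ALTER COLUMN CustomerData XML;
GO

-- Step 2: drop the old schema collection
DROP XML SCHEMA COLLECTION CustomerSchema;
GO

-- Step 3: re-create the collection, this time with CustomerName required
CREATE XML SCHEMA COLLECTION CustomerSchema AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="9999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="CustomerName" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:maxLength value="40"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

-- Step 4: bind the column again; existing rows are validated against the new rules
ALTER TABLE TypedXML ALTER COLUMN CustomerData XML(CustomerSchema);
```

Step 4 fails if any existing row lacks a CustomerName attribute, so the data may need to be cleaned
up while the column is still untyped.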
What's in the "collection"?
An XML schema collection can contain multiple schema definitions. In most production use cases, it will
likely have only one schema definition, but it is valid to have more than one schema definition in a
single schema collection.
When a schema collection contains more than one schema definition, SQL Server will allow XML values
that validate with any of the schema definitions available within the schema collection.
For example, a feed aggregator that stores valid RSS and ATOM feeds in a single column can create a
schema collection containing two schema definitions, one for RSS and one for ATOM. SQL Server will
then allow both RSS and ATOM feeds to be stored in the given column. The following XML schema
collection defines two top-level elements, Customer and Order:
CREATE XML SCHEMA COLLECTION CustomerOrOrder AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:attribute name="CustomerID"/>
      <xs:attribute name="CustomerName"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="Order">
    <xs:complexType>
      <xs:attribute name="OrderID"/>
      <xs:attribute name="OrderNumber"/>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO
A typed XML column or variable bound to this schema collection can store a Customer element, an
Order element, or both (if the XML column or variable is defined as CONTENT). The following sample
code demonstrates this:
DECLARE @x XML(CustomerOrOrder)
-- A single Customer element
SELECT @x = '<Customer CustomerID="1001" CustomerName="Jacob"/>'
-- A single Order element
SELECT @x = '<Order OrderID="121" OrderNumber="10001"/>'
-- Both elements (allowed because @x defaults to CONTENT)
SELECT @x = '
<Customer CustomerID="1001" CustomerName="Jacob"/>
<Order OrderID="121" OrderNumber="10001"/>'
A new schema definition can be added to an existing schema collection by using the ALTER XML
SCHEMA COLLECTION ADD statement:
ALTER XML SCHEMA COLLECTION CustomerOrOrder ADD '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Item">
    <xs:complexType>
      <xs:attribute name="ItemID"/>
      <xs:attribute name="ItemNumber"/>
    </xs:complexType>
  </xs:element>
</xs:schema>'
GO
Before creating or altering an XML schema collection, it is important to check whether a schema
collection with the given name exists. XML schema collections are stored in a set of internal tables and are
accessible through a number of system catalog views. sys.xml_schema_collections can be queried
to determine whether a given XML schema collection exists:
IF EXISTS(
SELECT name FROM sys.xml_schema_collections
WHERE schema_id = schema_id('dbo') AND name = 'CustomerSchema'
) DROP XML SCHEMA COLLECTION CustomerSchema
What’s new in SQL Server 2008 for XSD
SQL Server 2008 added a number of enhancements to the XSD implementation of the previous version.
The XSD implementation of the date, time, and dateTime data types required time zone information in
SQL Server 2005, so a date value had to look like 2009-03-14Z or 2009-03-14+05:30, where the
Z and +05:30 indicate the time zone. This requirement has been removed in SQL Server 2008. The
XSD processor now accepts date, time, and dateTime values with or without time zone information.
Though the date, time, and dateTime data type implementation in SQL Server 2005 required time
zone information, the XML document did not preserve it. It normalized the value to a UTC date/time
value and stored it. SQL Server 2008 added enhancements to preserve the time zone information.
If the date, time, or dateTime value contains time zone information, then SQL Server 2008
preserves it.
Unions and lists are two powerful data models of XSD. Union types are simple types created with the
union of two or more atomic types. List types are simple types that can store a space-delimited list of
atomic values. Both of these types supported only atomic values in their implementation in SQL Server
2005. SQL Server 2008 enhanced these types so that lists of union types and unions of list types can be
created.
Lax validation is the most important addition to the XSD validation in SQL Server 2008. In SQL Server
2005, wildcard elements could be validated either with "skip" (does not validate at all) or "strict"
(performs a strict, or full, validation). SQL Server 2008 added "lax" validation, whereby the schema
processor performs the validation only if the declaration for the target namespace is found in the
schema collection.
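A minimal sketch of lax validation follows. The collection name EnvelopeSchema and the element names
Envelope and Note are hypothetical, chosen only for this illustration; the wildcard is declared with
the standard XSD attribute processContents="lax".

```sql
-- Hypothetical schema collection: Envelope may contain any child elements,
-- which are validated only when a matching declaration exists (lax)
CREATE XML SCHEMA COLLECTION EnvelopeSchema AS '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Envelope">
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

DECLARE @x XML(EnvelopeSchema);
-- No declaration for <Note> exists in the collection, so it is accepted without validation
SET @x = '<Envelope><Note priority="high">call back</Note></Envelope>';
SELECT @x AS AcceptedValue;
```

With processContents="strict", the same assignment would be expected to fail because no declaration
for the Note element can be found.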
This chapter provides only a brief overview of the XSD implementation in SQL Server. Detailed coverage
of all the XSD features is beyond the scope of this chapter.
Understanding XML Indexes
SQL Server does not allow an XML column to be part of a regular index (SQL index). To optimize
queries that extract information from XML columns, SQL Server supports a special type of index called
an XML index. The query processor can use an XML index to optimize XQuery, just like it uses SQL
indexes to optimize SQL queries.
SQL Server supports four different types of XML indexes. Each XML column can have a primary XML
index and three different types of secondary XML indexes.
A primary XML index is a clustered index created in document order, on an internal table known as a
node table. It contains information about all tags, paths, and values within the XML instance in each row.
A primary XML index can be created only on a table that already has a clustered index on the primary
key. The primary key of the base table is used to join the XQuery results with the base table. The
primary XML index contains one row for each node in the XML instance. The query processor will use the
primary XML index to execute every query, except for cases where the whole document is retrieved.
Just like SQL indexes, XML indexes should be created and used wisely. The size of a primary XML
index may be around three times the size of the XML data stored in the base table, although this may
vary based on the structure of the XML document. Document order is important for XML, and the
primary XML index is created in such a way that document order and the structural integrity of
the XML document are maintained in the query result.
If an XML column has a primary XML index, three additional types of secondary XML indexes can be
created on the column. The additional index types are PROPERTY, VALUE, and PATH indexes. Based
upon the specific query requirements, one or more of the index types may be used. Secondary indexes
are non-clustered indexes created on the internal node table.
A PATH XML index is created on the internal node table and indexes the path and value of each XML
element and attribute. PATH indexes are good for operations in which nodes with specific values are
filtered or selected.
A PROPERTY XML index is created on the internal node table and contains the primary key of the table,
the path to elements and attributes, and their values. The advantage of a PROPERTY XML index over a
PATH XML index is that it helps to search multi-valued properties in the same XML instance.
A VALUE XML index is just like the PATH XML index, and contains the value and path of each XML
element and attribute (instead of path and value). VALUE indexes are helpful in cases where wildcards
are used in the path expression.
XML indexes are a great addition to the XML capabilities of SQL Server. Wise usage of XML indexes
helps optimize queries that use XQuery to fetch information from XML columns.
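The index types described in this section can be created as in the following sketch. The table name
CustomerDocs and all index names are hypothetical; note the clustered primary key, which the primary
XML index requires.

```sql
-- Hypothetical table; the clustered primary key is a prerequisite
-- for creating a primary XML index
CREATE TABLE CustomerDocs (
    ID INT NOT NULL CONSTRAINT PK_CustomerDocs PRIMARY KEY CLUSTERED,
    CustomerData XML
);
GO

-- The primary XML index must exist before any secondary XML index
CREATE PRIMARY XML INDEX PXML_CustomerDocs_CustomerData
ON CustomerDocs (CustomerData);
GO

-- Secondary XML indexes are built on top of the primary XML index
CREATE XML INDEX IXML_CustomerDocs_Path
ON CustomerDocs (CustomerData)
USING XML INDEX PXML_CustomerDocs_CustomerData FOR PATH;

CREATE XML INDEX IXML_CustomerDocs_Value
ON CustomerDocs (CustomerData)
USING XML INDEX PXML_CustomerDocs_CustomerData FOR VALUE;

CREATE XML INDEX IXML_CustomerDocs_Property
ON CustomerDocs (CustomerData)
USING XML INDEX PXML_CustomerDocs_CustomerData FOR PROPERTY;
```

In practice, it is usually enough to create only the secondary index types that match the workload,
because each one adds storage and maintenance cost.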
XML Best Practices
SQL Server comes with a wide range of XML-related functionalities, and the correct usage of these
functionalities is essential for building a good system. A feature may be deemed "good" only if it is applied
in an area where it is really required. If not, it might result in unnecessary overhead or add unwanted
complexity to an otherwise simpler task.
■ XML should be used only where it is really required. Using XML where relational data would
be best suited is not a good idea. Similarly, using a relational model where XML might work
better won't produce the desired results.
■ XML is good for storing semi-structured or unstructured data. XML is a better choice if the
physical order of values is significant and the data represents a hierarchy. If the values are
valid XML documents and need to be queried, storing them in an XML column will be a better
choice than VARCHAR, NVARCHAR, or VARBINARY columns.
■ If the structure of the XML documents is defined, using typed XML columns will be a better
choice. Typed XML columns provide better metadata information and allow SQL Server to
optimize queries running over typed XML columns. Furthermore, typed XML provides storage
optimization and static type checking.
■ Creating a primary XML index and a secondary XML index (or more, depending on the
workload) might help improve XQuery performance. An XML primary index usually uses up to
three times the storage space of the data in the base table. This indicates that, just like SQL
indexes, XML indexes also should be used wisely. Keep in mind that a full-text index can be
created on an XML column. A wise combination of a full-text index with XML indexes might
be a better choice in many situations.
■ Creating property tables to promote multi-valued properties from the XML column may be
a good idea in many cases. One or more property tables may be created from the data in an
XML column, and these tables can be indexed to improve performance further.
■ Two common mistakes that add a lot of overhead to XQuery processing are usage of wildcards
in the path expression and using a parent node accessor to read information from upper-level
nodes.
■ Using specific markups instead of generic markups will enhance performance significantly.
Generic markups do not perform well and do not allow XML index lookups to be done
efficiently.
■ Attribute-centric markup is a better choice than element-centric markup. Processing
information from attributes is much more efficient than processing information from elements.
Attribute-centric markups take less storage space than element-centric markups, and the
evaluation of predicates is more efficient because the attribute's value is stored in the same row
as its markup in the primary XML index.
■ An in-place update of the XML data type gives better performance in most cases. If the update
operation requires modifying the value of one or more elements or attributes, it is a better
practice to modify those elements and attributes using XML DML functions, rather than
replace the whole document.
■ Using the exist() method to check for the existence of a value is much more efficient than
using the value() method. Parameterizing XQuery and XML DML expressions is much more
efficient than executing dynamic SQL statements.
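The last two practices in the list above can be sketched as follows: an existence check with exist()
that is parameterized through sql:variable(), and an in-place update through modify(). The table
variable @Customers and the variables @name and @newName are hypothetical names used only for this
illustration.

```sql
-- Hypothetical table variable with an untyped XML column
DECLARE @Customers TABLE (ID INT PRIMARY KEY, CustomerData XML);

INSERT INTO @Customers (ID, CustomerData)
VALUES
(1, '<Customer CustomerID="1001" CustomerName="Jacob"/>'),
(2, '<Customer CustomerID="1002" CustomerName="Steve"/>');

DECLARE @name VARCHAR(40), @newName VARCHAR(40);
SET @name = 'Jacob';
SET @newName = 'Jacob S';

-- exist() with a parameterized predicate, instead of value() or dynamic SQL
SELECT ID
FROM @Customers
WHERE CustomerData.exist('/Customer[@CustomerName = sql:variable("@name")]') = 1;

-- In-place update of a single attribute, instead of replacing the whole document
UPDATE @Customers
SET CustomerData.modify('
    replace value of (/Customer/@CustomerName)[1]
    with sql:variable("@newName")')
WHERE ID = 1;

SELECT ID, CustomerData FROM @Customers;
```

Because the XQuery text stays constant and only the variable values change, the same query plan can
be reused across calls.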
Summary
SQL Server 2008 is fully equipped with a wide range of XML capabilities to support the XML processing
requirements needed by almost every modern application. SQL Server 2008 added a number of
enhancements to the XML features supported by previous versions. Key points to take away from this
chapter include the following:
■ SQL Server 2008 is equipped with a number of XML processing capabilities, including support
for generating, loading, querying, validating, modifying, and indexing XML documents.
■ The XML data type can be used to store XML documents. It supports the following methods:
value(), exist(), query(), modify(), and nodes().
■ An XML data type column or variable that is associated with a schema collection is called typed
XML. SQL Server validates typed XML columns and variables against the rules defined in the
schema.
■ The OPENROWSET() function can be used with the BULK row set provider to load an XML
document from a disk file.
■ XML output can be generated from the result of a SELECT query using FOR XML. FOR XML
can be used with the AUTO, RAW, EXPLICIT, and PATH directives to achieve different levels
of control over the structure and format of the XML output.
■ The query() method of the XML data type supports XQuery FLWOR operations, which allow
complex restructuring and manipulation of XML documents. SQL Server 2008 added support
for the let clause in FLWOR operations.
■ The XML data type supports XML DML operations through the modify() method. It allows
performing insert, update, and delete operations on XML documents.
■ SQL Server 2008 added support for inserting an XML data type value into another XML
document.
■ WITH XMLNAMESPACES can be used to process XML documents that have namespace
declarations.
■ SQL Server supports XSD in the form of XML schema collections. SQL Server 2008 added a
number of enhancements to the XSD support available with previous versions. These
enhancements include full support for the date, time, and dateTime data types, support for lax
validation, and support for creating unions of list types and lists of union types.
■ SQL Server supports a special category of indexes called XML indexes to index XML columns.
A primary XML index and up to three secondary indexes (PATH, VALUE, and PROPERTY) can
be created on an XML column.
Using Integrated Full-Text Search
IN THIS CHAPTER
Setting up full-text index catalogs with Management Studio or T-SQL code
Maintaining full-text indexes
Using full-text indexes in queries
Performing fuzzy word searches
Searching text stored in binary objects
Full-text search performance
Several years ago I wrote a word search for a large database of legal texts.
For word searches, the database parsed all the documents and built a
word-frequency table as a many-to-many association between the word
table and the document table. It worked well, and word searches became
lightning-fast. As much fun as writing your own word search can be, fortunately,
you have a choice.
SQL Server includes a structured word/phrase indexing system called Full-Text
Search. More than just a word parser, Full-Text Search actually performs
linguistic analysis by determining base words and word boundaries, and by conjugating
verbs for different languages. It runs circles around the simple word index system
that I built.
ANSI Standard SQL uses the LIKE operator to perform basic word searches and
even wildcard searches. For example, the following code uses the LIKE operator
to query the Aesop's Fables sample database:
USE Aesop;
SELECT Title
FROM Fable
WHERE Fabletext LIKE '%Lion%'
AND Fabletext LIKE '%bold%';
Result:
Title
--------------------------
The Hunter and the Woodman
The main problem with performing SQL Server WHERE LIKE searches is
the slow performance. Indexes are searchable from the beginning of the word,