However, things are different when using the .NET System.Xml classes. Loading a combined schema or DTD and the XML data content (that is, an inline schema), or an XML document that references an external schema or DTD, into any of the XML "storage" objects such as XmlDocument, XmlDataDocument, and XPathDocument does not automatically validate that document. There is also no property that we can set to make it do this.
Instead, we load the document via an XmlTextReader object to which we have attached an XmlValidatingReader. The Load method of the XmlDocument and XmlDataDocument objects can accept an XmlValidatingReader as the single parameter instead of a file path and name. Meanwhile, the constructor for the XPathDocument object can accept an XmlValidatingReader as the single parameter.
So all we have to do is set up our XmlValidatingReader and XmlTextReader combination, and then pass this to the Load method or the constructor function (depending on which document object we're creating). The document will then be validated as it is loaded:
'create XmlTextReader, load XML document and create Validator
objXTReader = New XmlTextReader(strXMLPath)
Dim objValidator As New XmlValidatingReader(objXTReader)
objValidator.ValidationType = ValidationType.Schema
'use the validator/reader combination to create XPathDocument object
Dim objXPathDoc As New XPathDocument(objValidator)
'use the validator/reader combination to create XmlDocument object
Dim objXmlDoc As New XmlDocument()
objXmlDoc.Load(objValidator)
The XmlValidatingReader can also be used to validate XML held in a String. So we can validate XML that's already loaded into an object or application by simply extracting it as a String object (using the GetXml method with a DataSet object, or the OuterXml property to get a document fragment, for example) and applying the XmlValidatingReader to this.
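As a minimal sketch of validating XML held in a String, the following console module extracts the markup from an XmlDocument through OuterXml and feeds it to an XmlValidatingReader via a StringReader. The schema and document content, the module name, and the handler name are all hypothetical and chosen only to make the sketch self-contained. It assumes the same .NET Framework classes used elsewhere in this chapter.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module ValidateStringSketch

  'running total of validation errors reported by the handler
  Private intErrors As Integer = 0

  Sub OnValidationError(sender As Object, e As ValidationEventArgs)
    intErrors += 1
    Console.WriteLine(e.Severity.ToString() & ": " & e.Message)
  End Sub

  Sub Main()
    'hypothetical schema: a <Book> element may contain only an <ISBN>
    Dim strSchema As String = _
      "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
      "<xs:element name=""Book""><xs:complexType><xs:sequence>" & _
      "<xs:element name=""ISBN"" type=""xs:string""/>" & _
      "</xs:sequence></xs:complexType></xs:element></xs:schema>"

    'XML that is already loaded into an object (note the extra element)
    Dim objDoc As New XmlDocument()
    objDoc.LoadXml("<Book><ISBN>1861003234</ISBN><Extra/></Book>")

    'extract it as a String
    Dim strXml As String = objDoc.OuterXml

    'load the schema from its own string into a schema collection
    Dim objSchemaCol As New XmlSchemaCollection()
    objSchemaCol.Add("", New XmlTextReader(New StringReader(strSchema)))

    'wrap the XML string in a reader and attach the validator
    Dim objXTReader As New XmlTextReader(New StringReader(strXml))
    Dim objValidator As New XmlValidatingReader(objXTReader)
    objValidator.ValidationType = ValidationType.Schema
    objValidator.Schemas.Add(objSchemaCol)
    AddHandler objValidator.ValidationEventHandler, AddressOf OnValidationError

    Try
      'reading each node is what triggers validation
      While objValidator.Read()
      End While
    Finally
      objValidator.Close()
    End Try

    Console.WriteLine(intErrors & " validation error(s) found")
  End Sub

End Module
```

The only difference from reading a disk file is the StringReader passed to the XmlTextReader constructor. Everything after that point is the same validator setup.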
Validating XML in a DataSet Object
Like the XML document objects, a DataSet does not automatically validate XML that you provide for the ReadXml method against any schema that is already in place within the DataSet or which is inline with the XML (that is, in the same document as the XML data content). In a DataSet, the schema is used solely to provide information about the intended structure of the data. It's not used for actual validation at all.
When we load the schema, the DataSet uses it as a specification for the table names, column names, data types, etc. Then, when we load the XML data content, it arranges the data in the appropriate tables and columns as new data rows.
If a value or element is encountered that doesn't match the schema, it is ignored and that particular column in the current data row is left empty.
This makes sense, because the DataSet is designed to work with structured relational data, and so any superfluous content in the source file cannot be part of the correct data model. So, you should think of schemas in a DataSet as being a way to specify the data structure (rather than inferring the structure from the data, as happens if no schema is present). Don't think of this as a way of validating the data.
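The behavior described above can be sketched with a small console module. For brevity, the table structure is defined in code rather than loaded from an XSD file, which is an assumption of this sketch. The table, column, and element names are hypothetical. The XML contains a <Price> element that the structure does not define.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module DataSetSchemaSketch

  Sub Main()
    'define the structure in code (assumption: stands in for loading a schema)
    Dim objDataSet As New DataSet("BookList")
    Dim objTable As DataTable = objDataSet.Tables.Add("Books")
    objTable.Columns.Add("ISBN", GetType(String))
    objTable.Columns.Add("Title", GetType(String))

    'the <Price> element has no matching column in the structure
    Dim strXml As String = _
      "<BookList><Books>" & _
      "<ISBN>1861003234</ISBN><Title>Sample</Title><Price>9.99</Price>" & _
      "</Books></BookList>"

    'IgnoreSchema reads the data into the existing structure only
    objDataSet.ReadXml(New StringReader(strXml), XmlReadMode.IgnoreSchema)

    'no exception is raised for the unexpected element
    Console.WriteLine("Rows loaded: " & objTable.Rows.Count)
    Console.WriteLine("Has Price column: " & _
      objTable.Columns.Contains("Price").ToString())
  End Sub

End Module
```

No error is reported for the unexpected element, which is why the schema in a DataSet should not be relied on for validation.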
A Document Validation Example
We've provided an example page, 'Validating XML documents with an XmlValidatingReader object' (validating-xml.aspx), that demonstrates how you can validate an XML document. When first opened, it displays a list of source documents that you can use in a drop-down list, and it performs validation against the selected document. As you can see from the screenshot, it reports no validation errors in a valid document:
Note that you must run the page in a browser on the web server itself to be able to open the XML document and schema using the physical paths in the hyperlinks in the page.
However, if you select the "well-formed but invalid" document, it reports a series of validation errors:
In this case the XML document contains an extra child element within one of the <Books> elements, which is not permitted in the schema that we're using to validate it (you can view the document and the schema using the hyperlinks we provide in the page):
The code that performs the validation is shown in the next listings. We start by creating the paths to the schema and XML document. In this example, the document name comes from the drop-down list named selXMLFile that is defined earlier in the page - the filename itself is the value attribute of the selected item:
'create physical path to sample files (in same folder as ASPX page)
Dim strCurrentPath As String = Request.PhysicalPath
Dim strXMLPath As String = Left(strCurrentPath, _
InStrRev(strCurrentPath, "\")) & selXMLFile.SelectedItem.Value
Dim strSchemaPath As String = Left(strCurrentPath, _
InStrRev(strCurrentPath, "\")) & "booklist-schema.xsd"
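The Left and InStrRev combination above strips the file name from the page path by hand. As an alternative sketch (not the approach used in the example page), the System.IO.Path class can do the same job. Because Request.PhysicalPath is only available inside a running ASP.NET page, the page path here is a stand-in built from the temporary folder, which is an assumption made purely so the module runs on its own.

```vbnet
Imports System
Imports System.IO

Module PathSketch

  Sub Main()
    'stand-in for Request.PhysicalPath (assumption: any full file path will do)
    Dim strCurrentPath As String = _
      Path.Combine(Path.GetTempPath(), "validating-xml.aspx")

    'folder that holds the page, without the page file name
    Dim strFolder As String = Path.GetDirectoryName(strCurrentPath)

    'build the full path to a file in the same folder
    Dim strSchemaPath As String = Path.Combine(strFolder, "booklist-schema.xsd")

    Console.WriteLine(strSchemaPath)
  End Sub

End Module
```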
We then declare a variable to hold the number of validation errors we find. This is followed by code to create an XmlTextReader object, specifying the XML document as the source. We also display a hyperlink to this document:
'variable to count number of validation errors found
Dim intValidErrors As Integer = 0
'create the new XmlTextReader object and load the XML document
objXTReader = New XmlTextReader(strXMLPath)
outXMLDoc.innerHTML = "Loaded file: <a href=""" & strXMLPath _
& """>" & strXMLPath & "</a><br />"
Creating the XmlValidatingReader and Specifying the Schema
The next step is to create our XmlValidatingReader object with the XmlTextReader as the source, and specify the validation type to suit our schema (we could, of course, have used Auto to automatically validate against any type of schema or DTD):
'create an XMLValidatingReader for this XmlTextReader
Dim objValidator As New XmlValidatingReader(objXTReader)
'set the validation type to use an XSD schema
objValidator.ValidationType = ValidationType.Schema
Our schema is in a separate document and there is no link or reference to it in the XML document, so we need to specify which schema we want to use. We create a new XmlSchemaCollection, and add our schema to it using the Add method of the XmlSchemaCollection. Then we add this collection to the Schemas property of the XmlValidatingReader, and display a link to the schema:
'create a new XmlSchemaCollection
Dim objSchemaCol As New XmlSchemaCollection()
'add the booklist-schema.xsd schema to it
objSchemaCol.Add("", strSchemaPath)
'assign the schema collection to the XmlValidatingReader
objValidator.Schemas.Add(objSchemaCol)
outXMLDoc.innerHTML += "Validating against: <a href=""" _
& strSchemaPath & """>" & strSchemaPath & "</a>"
Specifying the Validation Event Handler
The XmlValidatingReader will raise an event whenever it encounters a validation error in the document, as the XmlTextReader reads it from our disk file. If we don't handle this event specifically, it will be raised to the default error handler. In our case, this is the Try...Catch construct we include in our example page.
However, it's often better to handle the validation events separately from other (usually fatal) errors such as the XML file not actually existing on disk. To specify our own event handler for the ValidationEventHandler event in Visual Basic we use the AddHandler statement, and pass it the event we want to handle and a pointer to our handler routine (which is named ValidationError in this example):
'add the event handler for any validation errors found
AddHandler objValidator.ValidationEventHandler, AddressOf ValidationError
In C#, we can add the validation event handler using the following syntax:
objValidator.ValidationEventHandler += new
ValidationEventHandler(ValidationError);
Reading the Document and Catching Parser Errors
We are now ready to read the XML document from the disk file. In our case, we're only reading through to check for validation errors. In an application, you would have code here to perform whatever tasks you need against the XML, or alternatively use the XmlValidatingReader as the source for the Load method of an XmlDocument or XmlDataDocument object, or in the constructor for an XPathDocument object:
Try
'iterate through the XML document, validating each node as it is read
While objValidator.Read()
'we do nothing here as we are only validating
End While
'display count of errors found
outXMLDoc.innerHTML += "Validation complete " & intValidErrors _
& " error(s) found"
Catch objError As Exception
'will occur if there is a read error or the document cannot be parsed
outXMLDoc.innerHTML += "Read/Parser error: " & objError.Message
Finally
'must remember to always close the XmlTextReader after use
objXTReader.Close()
End Try
That's all we need to do to validate the document. The remaining part of the code in this page is the event handler that we specified for the Validation event. We'll look at this next.
The ValidationEvent Handler
The XmlValidatingReader raises the Validation event whenever a validation error is discovered in the XML document, and we've specified that our event handler named ValidationError will be called when this event is raised. This event handler receives the usual reference to the object that raised the event, plus a ValidationEventArgs object containing information about the event.
In the event handler, we first increment our error counter, and then check what kind of error it is by using the Severity property of the ValidationEventArgs object. We display a message describing the error, and the line number and character position if available (although these are generally included in the error message anyway):
Public Sub ValidationError(objSender As Object, _
objArgs As ValidationEventArgs)
'event handler called when a validation error is found
intValidErrors += 1 'increment count of errors
'check the severity of the error
Dim strSeverity As String
If objArgs.Severity = 0 Then strSeverity = "Error"
If objArgs.Severity = 1 Then strSeverity = "Warning"
'display a message
outXMLDoc.innerHTML += "Validation error: " & objArgs.Message _
& "<br /> Severity level: '" & strSeverity
If objXTReader.LineNumber > 0 Then
outXMLDoc.innerHTML += "Line: " & objXTReader.LineNumber _
& ", character: " & objXTReader.LinePosition
End If
End Sub
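The handler above compares the Severity property with the numbers 0 and 1. As a hedged alternative sketch, the same check can be written against the named XmlSeverityType values, and the position can be read from the exception attached to the event arguments rather than from the reader. The one-element schema, the sample XML, and the handler name are hypothetical and exist only so that the module runs on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module SeveritySketch

  Sub OnValidation(sender As Object, e As ValidationEventArgs)
    'compare against the named enumeration values instead of 0 and 1
    Dim strSeverity As String
    Select Case e.Severity
      Case XmlSeverityType.Error
        strSeverity = "Error"
      Case Else
        strSeverity = "Warning"
    End Select
    Console.WriteLine(strSeverity & ": " & e.Message)

    'the exception attached to the event also carries position details
    If Not e.Exception Is Nothing AndAlso e.Exception.LineNumber > 0 Then
      Console.WriteLine("Line: " & e.Exception.LineNumber & _
        ", character: " & e.Exception.LinePosition)
    End If
  End Sub

  Sub Main()
    'hypothetical schema: <Count> must hold an integer
    Dim strSchema As String = _
      "<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">" & _
      "<xs:element name=""Count"" type=""xs:int""/></xs:schema>"

    'hypothetical document that breaks the schema
    Dim strXml As String = "<Count>not-a-number</Count>"

    Dim objSchemaCol As New XmlSchemaCollection()
    objSchemaCol.Add("", New XmlTextReader(New StringReader(strSchema)))

    Dim objValidator As New XmlValidatingReader( _
      New XmlTextReader(New StringReader(strXml)))
    objValidator.ValidationType = ValidationType.Schema
    objValidator.Schemas.Add(objSchemaCol)
    AddHandler objValidator.ValidationEventHandler, AddressOf OnValidation

    Try
      While objValidator.Read()
      End While
    Finally
      objValidator.Close()
    End Try
  End Sub

End Module
```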
We saw the validation error messages in the previous screenshot using a well-formed but invalid document. We've also provided an XML document that is not well-formed so that you can see the parser error that is raised in this case and trapped by our Try...Catch construct. This also prevents the remainder of the document from being read:
In this case, as you can verify if you try to open the XML document using the hyperlink, there is an illegal closing tag for one of the <Books> elements:
We've spent a lot of time looking at how we can read and write XML documents, access them in a range of ways, and validate the content against a schema or DTD. However, we haven't looked at how we can edit XML documents, or how we create new ones. The example page for this section, 'Creating and Editing the Content of XML Documents' (edit-xml.aspx), fills these gaps in our coverage.
The example page loads an XML document named bookdetails.xml and demonstrates four different techniques we can use for editing and creating documents:
Selecting a node, extracting the content, and deleting that node from the document
Creating a new empty document and adding a declaration and comment to it
Importing (that is, copying) a node from the original document into the new document
Selecting, editing and inserting new nodes and content into the original document
The next screenshot shows the page when you run it. You can see the four stages of the process, though the second and third are combined into one section of the output in the page:
Note that you must run the page in a browser on the web server itself to be able to open the XML documents using the physical paths in the hyperlinks in the page.
The Code for this Example Page
The page contains the customary <div> elements to display the results and messages, and details of any errors that we encounter. It also creates the paths to the existing and new documents, and displays a hyperlink to the existing document. This is identical to the previous example, and we aren't repeating the code here. Instead, we start with the part that loads the existing document into a new XmlDocument object:
Dim objXMLDoc As New XmlDocument()
Try
objXMLDoc.Load(strXMLPath)
Catch objError As Exception
outError.innerHTML = "Error while accessing document.<br />" _
& objError.Message & "<br />" & objError.Source
Exit Sub ' and stop execution
End Try
Selecting, Displaying, and Deleting a Node
To select a specific node in our document we can use an XPath expression. In our example the expression is descendant::Book[ISBN="1861003234"], which, when the current node is the root element of the document, selects the <Book> node with the specified value for its <ISBN> child node.
We use this expression in the SelectSingleNode method, and it returns a reference to the node we want. To display this node and its content, we just have to reference its OuterXml property:
'specify XPath expression to select a book element
Dim strXPath As String = "descendant::Book[ISBN=" & Chr(34) _
& "1861003234" & Chr(34) & "]"
'get a reference to the matching <Book> node
Dim objNode As XmlNode
objNode = objXMLDoc.SelectSingleNode(strXPath)
'display node and content using the OuterXml property
outResult1.InnerHtml = "XPath expression '<b>" & strXPath _
& "</b>' returned:<br />" _
& Server.HtmlEncode(objNode.OuterXml) & "<br />"
If we only want the content of the node, we can use the InnerXml property, and if we only want the text values of all the nodes concatenated together we can use the InnerText property.
To delete the node from the document, we call the RemoveChild method of the parent node (the root of the document, which is returned by the DocumentElement property of the document object), and pass it a reference to the node to be deleted:
'delete this node using RemoveChild method from document element
objXMLDoc.DocumentElement.RemoveChild(objNode)
outResult1.InnerHtml += "Removed node from document.<br />"
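The selection and removal steps can be tried outside the page with a sketch like the following. The two-book document is a hypothetical stand-in for bookdetails.xml, and the <Title> elements are invented for illustration. Two small departures from the listing above are assumptions of this sketch: it checks for Nothing before using the result, and it removes the node through its own ParentNode so that it works even when the node is not a direct child of the root element.

```vbnet
Imports System
Imports System.Xml

Module SelectAndRemoveSketch

  Sub Main()
    'hypothetical two-book document standing in for bookdetails.xml
    Dim objDoc As New XmlDocument()
    objDoc.LoadXml("<BookList>" & _
      "<Book><ISBN>1861003234</ISBN><Title>First</Title></Book>" & _
      "<Book><ISBN>1861003382</ISBN><Title>Second</Title></Book>" & _
      "</BookList>")

    Dim objNode As XmlNode = _
      objDoc.SelectSingleNode("descendant::Book[ISBN='1861003234']")

    'SelectSingleNode returns Nothing when there is no match
    If objNode Is Nothing Then
      Console.WriteLine("No matching node")
      Return
    End If

    Console.WriteLine(objNode.OuterXml)   'the element with its own markup
    Console.WriteLine(objNode.InnerXml)   'the child markup only
    Console.WriteLine(objNode.InnerText)  'the concatenated text values

    'remove the node via its own parent, whatever that parent is
    objNode.ParentNode.RemoveChild(objNode)

    Console.WriteLine(objDoc.SelectNodes("descendant::Book").Count & _
      " book(s) left")
  End Sub

End Module
```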
Creating a New Document and Adding Nodes
We create a new empty XML document simply by instantiating an XmlDocument (or XmlDataDocument) object. Then we can create nodes and insert them into this document using code like the following. Here, we're creating a new XML declaration (the <?xml version="1.0"?> node) and inserting it into the new document with the InsertBefore method:
'create new empty XmlDocument object
Dim objNewDoc As New XmlDocument()
'create a new XmlDeclaration object
Dim objDeclare As XmlDeclaration
objDeclare = objNewDoc.CreateXmlDeclaration("1.0", Nothing, Nothing)
'and add it as the first node in the new document
objDeclare = objNewDoc.InsertBefore(objDeclare, objNewDoc.DocumentElement)
The second and third parameters of the CreateXmlDeclaration method are used to specify the encoding used in the document, and the standalone value (in other words, whether the document depends on external markup declarations such as an external DTD). We set both to Nothing, so we'll get neither of these optional attributes in our XML declaration. With no encoding attribute (and no byte-order mark), an XML parser will assume UTF-8 when it loads the document. The standalone attribute only matters when the document has external markup declarations, in which case the parser treats a missing value as "no".
When we create the new node, we get a reference to it back from the CreateXmlDeclaration method, and we use this as the first parameter to the InsertBefore method. The second parameter is a reference to the node that we want to insert before, and in this case we specify the DocumentElement property of the document. Notice that DocumentElement does not yet refer to a root element, as the document doesn't have one. It returns Nothing, and InsertBefore responds to a Nothing reference node by adding the new node at the end of the (currently empty) list of child nodes. This sounds confusing, but you can think of it as a reference to the placeholder where the root element will reside.
Next we create a new Comment element, and insert this into the new document after the XML declaration element:
'create a new XmlComment object
Dim objComment As XmlComment
objComment = objNewDoc.CreateComment("New document created " & Now())
'and add it as the second node in the new document
objComment = objNewDoc.InsertAfter(objComment, objDeclare)
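The following sketch repeats these steps in a standalone module so that the state of DocumentElement can be inspected along the way. It passes Nothing to InsertBefore explicitly, which is what DocumentElement evaluates to in an empty document. The <BookList> root element added at the end is hypothetical and is there only to complete the document.

```vbnet
Imports System
Imports System.Xml

Module NewDocumentSketch

  Sub Main()
    Dim objNewDoc As New XmlDocument()

    'an empty document has no root element yet
    Console.WriteLine("DocumentElement is Nothing: " & _
      (objNewDoc.DocumentElement Is Nothing).ToString())

    'with a Nothing reference node, InsertBefore adds the node at the end
    Dim objDeclare As XmlDeclaration = _
      objNewDoc.CreateXmlDeclaration("1.0", Nothing, Nothing)
    objNewDoc.InsertBefore(objDeclare, Nothing)

    'the comment goes in after the declaration
    Dim objComment As XmlComment = _
      objNewDoc.CreateComment("New document created")
    objNewDoc.InsertAfter(objComment, objDeclare)

    'hypothetical root element so that the document is complete
    Dim objRoot As XmlElement = objNewDoc.CreateElement("BookList")
    objNewDoc.AppendChild(objRoot)

    Console.WriteLine(objNewDoc.OuterXml)
  End Sub

End Module
```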
Importing Nodes into the New Document
To get some content into the new document we just created, our example page imports a node from the existing document we loaded from disk at the start of the page. We again use an XPath expression with the SelectSingleNode method to get a reference to the <Book> element we want to import:
strXPath = "descendant::Book[ISBN=" & Chr(34) & "1861003382" & Chr(34) & "]"
objNode = objXMLDoc.SelectSingleNode(strXPath)
Now we declare a new XmlNode variable to hold the imported node, and call the ImportNode method of the target document to copy the node from the original document. The second parameter to the ImportNode method specifies if we want a "deep" copy - in other words if we want to import all the content of the node as well as the value:
'create a variable to hold the imported node object
Dim objImportedNode As XmlNode
'import node and all children into new document as unattached fragment
objImportedNode = objNewDoc.ImportNode(objNode, True)
Once we've got our new node into the document, we have to insert it into the tree - it is only an unattached fragment at the moment. We use the InsertAfter method as before, using the reference we've already got to the new node, and the reference we created earlier to our Comment node so that the imported node becomes the root element of the new document:
'insert new unattached node into document after the comment node
objNewDoc.InsertAfter(objImportedNode, objComment)
'display the contents of the new document
outResult2.InnerHtml = "Created new XML document and inserted " _
& "into it the node selected by<br />" _
& "the XPath expression '" & strXPath & "'" _
& "Content of new document is:<br />" _
& Server.HtmlEncode(objNewDoc.OuterXml)
We finish this section (in the code above) by displaying the contents of the new document. We've got a reference to the XmlDocument object that contains it, so we just query the OuterXml property to get the complete content. You can see the new document displayed in the example page shown previously.
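To see what "unattached fragment" means in practice, the sketch below imports a node between two in-memory documents and inspects the copy before and after it is attached. The source document content is hypothetical. For brevity the copy is attached with AppendChild rather than InsertAfter, because the target document in this sketch has no comment node to insert after.

```vbnet
Imports System
Imports System.Xml

Module ImportNodeSketch

  Sub Main()
    'hypothetical source document
    Dim objSource As New XmlDocument()
    objSource.LoadXml( _
      "<BookList><Book><ISBN>1861003382</ISBN></Book></BookList>")
    Dim objNode As XmlNode = objSource.SelectSingleNode("descendant::Book")

    Dim objTarget As New XmlDocument()

    'deep copy: the <ISBN> child comes across with the <Book> element
    Dim objImported As XmlNode = objTarget.ImportNode(objNode, True)

    'the copy belongs to the target document but is not yet in its tree
    Console.WriteLine(objImported.OwnerDocument Is objTarget)
    Console.WriteLine(objImported.ParentNode Is Nothing)

    'attach it; in an empty document it becomes the root element
    objTarget.AppendChild(objImported)
    Console.WriteLine(objTarget.DocumentElement.Name)

    'the source document still has its own copy of the node
    Console.WriteLine(objSource.SelectNodes("descendant::Book").Count)
  End Sub

End Module
```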
Inserting and Updating Nodes in a Document
The final part of our example page edits some values in the original document. This time we need an XPath expression that will match more than one node, and so we use the SelectNodes method of the document to return an XmlNodeList object containing references to all the matching nodes (in our example all the <ISBN> nodes). Then we can display the number of matches found:
strXPath = "descendant::ISBN"
'get a reference to the matching nodes as a collection
Dim colNodeList As XmlNodeList
colNodeList = objXMLDoc.SelectNodes(strXPath)
'display the number of matches found
outResult3.InnerHtml = "Found " & colNodeList.Count _
& " nodes matching the " _
& "XPath expression '" & strXPath & "'<br />" _
& "Editing and inserting new content<br />"
Our plan is to add an attribute to all of the <ISBN> elements, and replace the text content (value) of these elements with two new elements that contain the information in a different form. After declaring some variables that we'll need, we iterate through the collection of <ISBN> nodes using a For Each loop:
Dim strNodeValue, strNewValue, strShortCode As String
'create a variable to hold an XmlAttribute object
Dim objAttr As XmlAttribute
'iterate through all the nodes found
For Each objNode In colNodeList
Within the loop, we first create a new attribute named "formatting" and set the value to "hyphens" (all our <ISBN> nodes will have the same value for this attribute). Then we can add this attribute to the <ISBN> element node by calling the SetAttributeNode method. However, there is a minor hitch: the members of an XmlNodeList are XmlNode objects, which don't have a SetAttributeNode method. We get round this in Visual Basic by casting the object to an XmlElement object using the CType (convert type) function:
'create an XmlAttribute named 'formatting'
objAttr = objXMLDoc.CreateAttribute("formatting")
'set the value of the XmlAttribute to 'hyphens'
objAttr.Value = "hyphens"
'and add it to this ISBN element - have to cast the object
'to an XmlElement as XmlNode doesn't have this method
CType(objNode, XmlElement).SetAttributeNode(objAttr)
To change the content of the <ISBN> elements, we just have to set the InnerXml property. This is much easier than using the InsertBefore and InsertAfter methods we demonstrated previously, and provides a valid alternative when the content we want to insert is available as a string (recall that we had references to the element node and its new content node when we used InsertBefore previously).
Our code extracts the existing ISBN value, creates the new "short code" from it, formats the existing ISBN with hyphens, and then creates a string containing the new content for the element. The final step is to insert these values into the <ISBN> node by setting its InnerXml property, before going round to do the next one:
'get text value of this ISBN element
strNodeValue = objNode.InnerText
'create short and long strings to replace content
strShortCode = Right(strNodeValue, 4)
strNewValue = Left(strNodeValue, 1) & "-" _
& Mid(strNodeValue, 2, 6) & "-" _
& Mid(strNodeValue, 8, 2) & "-" _
& Right(strNodeValue, 1)
'insert into element by setting the InnerXml property
objNode.InnerXml = "<LongCode>" & strNewValue _
& "</LongCode><ShortCode>" _
& strShortCode & "</ShortCode>"
Next
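The whole loop can be sketched as a standalone module as follows. It differs from the page code in three ways, all of which are choices made for this sketch rather than the example page's approach: it uses the SetAttribute method of XmlElement, which creates the attribute and sets its value in one call; it copies the matching nodes into a list before editing, as a defensive step because the document is changed while the results are processed; and it uses Substring in place of Left, Mid, and Right. The two-book document is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module EditNodesSketch

  Sub Main()
    'hypothetical document with two ISBN elements
    Dim objDoc As New XmlDocument()
    objDoc.LoadXml("<BookList>" & _
      "<Book><ISBN>1861003234</ISBN></Book>" & _
      "<Book><ISBN>1861003382</ISBN></Book>" & _
      "</BookList>")

    'take a snapshot of the matches before changing the document
    Dim colNodes As New List(Of XmlNode)()
    For Each objFound As XmlNode In objDoc.SelectNodes("descendant::ISBN")
      colNodes.Add(objFound)
    Next

    For Each objNode As XmlNode In colNodes
      Dim strValue As String = objNode.InnerText

      'skip anything that is not a 10-character code (defensive assumption)
      If strValue.Length <> 10 Then Continue For

      'SetAttribute creates the attribute and sets its value in one call
      DirectCast(objNode, XmlElement).SetAttribute("formatting", "hyphens")

      'same split as the page code: 1, 6, 2, and 1 characters
      Dim strLong As String = strValue.Substring(0, 1) & "-" & _
        strValue.Substring(1, 6) & "-" & _
        strValue.Substring(7, 2) & "-" & _
        strValue.Substring(9, 1)

      'the last four characters form the short code
      Dim strShort As String = strValue.Substring(6, 4)

      'replace the text content with the two new elements
      objNode.InnerXml = "<LongCode>" & strLong & "</LongCode>" & _
        "<ShortCode>" & strShort & "</ShortCode>"
    Next

    Console.WriteLine(objDoc.OuterXml)
  End Sub

End Module
```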
We end the page by writing the complete edited XML document to a disk file and displaying a hyperlink to it so that you can view it:
'write the updated document to a disk file
objXMLDoc.Save(strNewPath)
'display a link to view the updated document
outResult3.InnerHTML += "Saved updated document: <a href=""" _
& strNewPath & """>" & strNewPath & "</a>"
Viewing the Results
If you open both documents, the original and the edited version, you can see the effects of our editing process. The first contains the <Book> node with the <ISBN> value 1861003234, while it is not present in the second one (they are in order by ISBN code). You can also see the updated <ISBN> elements in the second document:
In this example, we've demonstrated several techniques for working with an XML document using the System.Xml classes provided in .NET. Some of the techniques use the XML DOM methods as defined by W3C, and some are specific "extensions" available with the XmlDocument (and other) objects. In general, these extensions make common tasks a lot easier. For example, the ability to access the InnerText, InnerXml, and OuterXml of a node makes it remarkably easy to edit or insert content and markup.
We have by no means covered all the possibilities for accessing XML documents, as you'll see if you examine the list of properties, methods, and events for each of the relevant objects in the SDK. However, by now, you should have a flavor for what is possible, and how easy it is to achieve.
Using XSL and XSLT Transformations
To finish this chapter, we need to come back to a topic that we first looked at in the data management introduction chapter.