Fundamental XML for Developers
Dr. Timothy M. Chester, Texas A&M University
Timothy M. Chester is:
• Senior IT Manager, Texas A&M University
– Application Development, Systems Integration, Developer Tools
& Training
• Lecturer, Texas A&M College of Business
– Courses on Business Programming Fundamentals (VB.NET, C#), XML & Advanced Web Development.
Texas A&M University
– Interested in how XML can affect both software development and legacy system integration
• Demonstrates some of the basics of working with the DOM, XSLT, Schema, WSDL, and SOAP.
[Figure labels: Other Web Services; Partner Web Service]
Introducing XML
• XML stands for Extensible Markup Language. A markup language specifies the structure and content of a document.
• Because it is extensible, XML can be used to create a wide variety of document types.
between SGML and HTML – easier to learn than
The Limits of HTML
• HTML was designed for formatting text on a Web page. It was not designed for dealing with the content of a Web page. Additional features have been added to HTML, but they do not solve data description or cataloging issues in an HTML document.
• Because HTML is not extensible, it cannot be modified to meet specific needs. Browser developers have added features that make HTML more robust, but this has resulted in a confusing mix of different HTML standards.
Document begins with a declaration that specifies XML version 1.0.
Element message is a child element of root element myMessage.
• XML documents
– Must contain exactly one root element
• Attempting to create more than one root element is an error
– Elements must be nested properly
• Incorrect: <x><y> hello </x></y>
• Correct: <x><y> hello </y></x>
– Must be well-formed
Introduction to XML Markup (cont.)
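The well-formedness rules above can be checked with any conforming parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is chosen here for illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for bad nesting, multiple root elements, and similar errors.
        return False


print(is_well_formed("<x><y> hello </y></x>"))  # properly nested
print(is_well_formed("<x><y> hello </x></y>"))  # improperly nested
print(is_well_formed("<a/><b/>"))               # more than one root element
```

Only the first call reports a well-formed document; the other two inputs break the nesting rule and the single-root rule respectively.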
XML Parsers
• An XML processor (also called an XML parser) evaluates the document to make sure it conforms to all XML specifications for structure and syntax.
• XML parsers are strict. It is this rigidity
built into XML that ensures XML code
Parsers and Well-formed XML Documents (cont.)
• XML parsers support
– Document Object Model (DOM)
• Builds tree structure containing document data in memory
– Simple API for XML (SAX)
• Generates events when tags, comments, etc. are encountered
– (Events are notifications to the application)
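The difference between the two models can be seen by parsing the same document both ways. This is a minimal sketch using Python's standard xml.dom.minidom and xml.sax modules (the slides themselves use MSXML); the class name TagRecorder is an assumption made for this example.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"

# DOM: the whole document is loaded into an in-memory tree first.
dom = minidom.parseString(DOC)
root = dom.documentElement
dom_names = [root.tagName] + [
    node.tagName for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]


# SAX: the parser notifies the handler as it meets each start and end tag.
class TagRecorder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = TagRecorder()
xml.sax.parseString(DOC.encode("utf-8"), recorder)

print(dom_names)
print(recorder.events)
```

With DOM the application asks the tree for what it needs after parsing; with SAX the application only sees the stream of events and must keep any state itself.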
Parsing an XML Document with MSXML
• XML document
– Contains data
– Does not contain formatting information
– Load XML document into Internet Explorer 5.0
• Document is parsed by msxml.
• Places plus (+) or minus (-) signs next to container elements
– Plus sign indicates that all child elements are hidden
– Clicking plus sign expands container element
» Displays children
– Minus sign indicates that all child elements are visible
– Clicking minus sign collapses container element
» Hides children
• Error generated if document is not well formed
XML document shown in IE6.
Characters vs. Markup
• XML must differentiate between
– Markup text
• Enclosed in angle brackets (< and >)
– e.g., child elements
– Character data
• Text between start tag and end tag
– Welcome to XML!
– Elements versus attributes
White Space, Entity References and Built-in Entities
• Whitespace characters
– Spaces, tabs, line feeds and carriage returns
• Significant (preserved by application)
• Insignificant (not preserved by application)
– Normalization
» Whitespace collapsed into single whitespace character
» Sometimes whitespace removed entirely
<markup>   This    is   character    data   </markup>
after normalization, becomes
<markup>This is character data</markup>
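An application that treats whitespace as insignificant might normalize character data along these lines. This is a sketch only: the function name normalize_whitespace is hypothetical, and it removes leading and trailing whitespace entirely, which is one of the two behaviors listed above.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs and line breaks into one space
    and drop leading and trailing whitespace."""
    return " ".join(text.split())


print(normalize_whitespace("  This   is\tcharacter\n data  "))
```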
White Space, Entity References and Built-in Entities (cont.)
• Entity references
– Allow the use of XML-reserved characters
• Begin with ampersand (&) and end with semicolon (;)
– Prevent character data from being misinterpreted as markup
White Space, Entity References and Built-in Entities (cont.)
• Built-in entities
– Ampersand (&amp;)
– Left-angle bracket (&lt;)
– Right-angle bracket (&gt;)
– Apostrophe (&apos;)
– Quotation mark (&quot;)
– Mark up characters "<>&" in element message
<message> &lt;&gt;&amp; </message>
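The round trip between reserved characters and entity references can be sketched with Python's standard library, using xml.sax.saxutils.escape to produce the references and ElementTree to read them back. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "<>&"
escaped = escape(raw)  # replaces &, < and > with entity references
document = f"<message>{escaped}</message>"

# The parser turns the entity references back into character data.
parsed_text = ET.fromstring(document).text

print(escaped)
print(parsed_text)
```

Writing the raw characters directly between the tags would make the parser read them as markup and reject the document.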
• XML Document Object Model (DOM)
– Build tree structure in memory for XML documents
– DOM-based parsers parse these structures
• Exist in several languages (Java, C, C++, Python, Perl, C#, VB.NET, VB, etc.)
<message from = "Paul" to = "Tem" >
<body> Hi, Tim! </body>
</message>
• Node created for element message
Creating Nodes
• Create XML document at run time
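As a rough illustration of building a document at run time, the sketch below recreates the message element shown earlier with Python's xml.dom.minidom. The slides target MSXML and .NET, so the specific calls here are an assumption about how the same idea looks in another DOM implementation.

```python
from xml.dom import minidom

doc = minidom.Document()

# Create the root element and its attributes.
message = doc.createElement("message")
message.setAttribute("from", "Paul")
message.setAttribute("to", "Tem")
doc.appendChild(message)

# Create a child element holding a text node.
body = doc.createElement("body")
body.appendChild(doc.createTextNode("Hi, Tim!"))
message.appendChild(body)

xml_text = message.toxml()
print(xml_text)
```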
Traversing the DOM
• Use DOM to traverse XML document
– Output element nodes
– Output attribute nodes
– Output text nodes
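A traversal that outputs element, attribute and text nodes might look like the following. It is a sketch in Python's xml.dom.minidom; the function name walk and the (depth, kind, value) tuples are choices made for this example.

```python
from xml.dom import minidom

DOC = '<message from="Paul" to="Tem"><body>Hi, Tim!</body></message>'


def walk(node, depth=0, out=None):
    """Collect (depth, kind, value) for element, attribute and text nodes."""
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, "element", node.tagName))
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append((depth + 1, "attribute", f"{attr.name}={attr.value}"))
    elif node.nodeType == node.TEXT_NODE:
        out.append((depth, "text", node.data))
    # Attributes are reached through node.attributes, not through childNodes.
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


for depth, kind, value in walk(minidom.parseString(DOC).documentElement):
    print("  " * depth + f"{kind}: {value}")
```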
DOM Components
• Manipulate XML document
• XML Path Language (XPath)
– Syntax for locating information in XML document
• e.g., attribute values
– String-based language of expressions
• Not structural language like XML
– Used by other XML technologies
• XSLT
• XML document
– Tree structure with nodes
– Each node represents part of XML document
• Seven types
– Root
– Element
– Attribute
– Text
– Comment
– Processing instruction
– Namespace
• Attributes and namespaces are not children of their parent node
– They describe their parent node
XPath node types
• root
– string-value: determined by concatenating the string-values of all text-node descendants in document order
– expanded-name: none
– Description: represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processing-instruction children
• element
– string-value: determined by concatenating the string-values of all text-node descendants in document order
– expanded-name: the element tag, including the namespace prefix (if applicable)
– Description: represents an XML element and may contain element, text, comment or processing-instruction children
• attribute
– Description: represents an attribute of an element
XPath node types (Part 2)
• text
– string-value: … contained in the text node
– Description: … data content of an element
• comment
– string-value: … (not including <!-- and -->)
• processing instruction
– expanded-name: the target of the processing instruction
– Description: represents an XML processing instruction
• namespace
– expanded-name: … prefix
– Description: represents an XML namespace
Location Paths
• Location path
– Expression specifying how to navigate XPath tree
– Composed of location steps
• Each location step composed of
– Axis
– Node test
– Predicate
• XPath searches are made relative to context node
• Axis
– Indicates which nodes are included in search
• Relative to context node
– Dictates node ordering in set
• Forward axes select nodes that follow context node
• Reverse axes select nodes that precede context node
Node Tests
• Node tests
– Refine set of nodes selected by axis
• Rely upon axis's principal node type
– Corresponds to type of node axis can select
Node-set Operators and Functions (cont.)
• Location-path expressions
– Combine node-set operators and functions
• Select all head and body child element nodes
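In full XPath, selecting the head and body children is usually written with the union operator, as in head | body. The sketch below uses Python's xml.etree.ElementTree, which supports only a limited XPath subset without the union operator, so the two paths are evaluated separately and concatenated. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for illustration.
DOC = """<html>
  <head><title>Demo</title></head>
  <body><p id="intro">Hello</p><p>World</p></body>
</html>"""

root = ET.fromstring(DOC)  # root plays the role of the context node

# Child axis with a name test: direct children named head, then body.
head_and_body = root.findall("head") + root.findall("body")

# A predicate narrows the selected set: p descendants whose id is "intro".
intro = root.findall(".//p[@id='intro']")

print([element.tag for element in head_and_body])
print([element.text for element in intro])
```

Each path is read relative to the context node: the axis picks candidate nodes, the node test keeps those with the right name, and the predicate filters what remains.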
• Extensible Stylesheet Language (XSL)
– Used to format XML documents
– Consists of two parts
• XSLT processor
– Microsoft Internet Explorer 6
– Java 2 Standard Edition
– Microsoft.NET System.Xml Namespace
• XSLT document
– XML document with root element stylesheet
– template element
• Matches specific XML document nodes
• Uses XPath expression in attribute match
Templates (cont.)
• XSLT
– Two trees of nodes
• Source tree corresponds to original XML document
• Result tree contains nodes produced by transformation
– Transforms intro.xml into HTML document
Iteration and Sorting
Conditional Processing
• Perform conditional processing
– Such as if statement
– Use element choose
• Allows alternate conditional statements
• Similar to switch statement
• Has child elements when and otherwise
– when element content used if condition is met
– otherwise element content used if no conditions in
XSLT and XPath
• XPath Expression
– locates elements, attributes and text in XML document
Working with Namespaces
• Name collision occurs when elements from two or more documents share the same name.
• Name collision isn't a problem if you are not concerned with validation. The document content only needs to be well-formed.
• However, name collision will keep a document from being validated.
Name Collision
This figure shows two documents, each with a Name element.
Using Namespaces to Avoid Name Collision
This figure shows how to use a namespace to avoid collision.
Declaring a Namespace
• A namespace is a defined collection of element and attribute names.
• Names that belong to the same namespace must be unique. Elements can share the same name if they reside in different namespaces.
• Namespaces must be declared before they can be used.
Declaring a Namespace (cont.)
• A namespace can be declared in the prolog or as an element attribute. The syntax to declare a namespace in the prolog is:
<?xml:namespace ns="URI" prefix="prefix"?>
• Where URI is a Uniform Resource Identifier that assigns a unique name to the namespace, and prefix is a string of letters that associates each element or attribute in the document with the declared namespace.
<text:file filename = "book.xml" >
   <text:description> A book list </text:description>
</text:file>

<image:file filename = "funny.jpg" >
   <image:description> A funny picture </image:description>
   <image:size width = "200" height = "100" />
</image:file>

</directory>
<file filename = "book.xml" >
   <description> A book list </description>
</file>

<image:file filename = "funny.jpg" >
   <image:description> A funny picture </image:description>
   <image:size width = "200" height = "100" />
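To see how prefixes keep the two file elements apart, the sketch below parses a complete version of the directory listing with Python's xml.etree.ElementTree. The opening lines of the original listing are not shown above, so the two namespace URIs are placeholders invented for this example. The declarations use the xmlns:prefix attribute form, which is the form defined by the W3C Namespaces in XML recommendation.

```python
import xml.etree.ElementTree as ET

# The two URIs are placeholders invented for this sketch.
DOC = """<directory xmlns:text="urn:example:text"
           xmlns:image="urn:example:image">
  <text:file filename="book.xml">
    <text:description>A book list</text:description>
  </text:file>
  <image:file filename="funny.jpg">
    <image:description>A funny picture</image:description>
    <image:size width="200" height="100"/>
  </image:file>
</directory>"""

# Map the prefixes used in queries to the namespace URIs.
NS = {"text": "urn:example:text", "image": "urn:example:image"}
root = ET.fromstring(DOC)

text_files = root.findall("text:file", NS)
image_files = root.findall("image:file", NS)

print([f.get("filename") for f in text_files])
print([f.get("filename") for f in image_files])
print(text_files[0].tag)  # the parser stores the URI, not the prefix
```

Both elements are named file, but the parser keeps them distinct because each name is qualified by a different namespace URI.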
• A schema is an XML document that defines the content
and structure of one or more XML documents.
• To avoid confusion, the XML document containing the
content is called the instance document.
• It represents a specific instance of the structure defined
in the schema.
Comparing Schemas and DTDs
This figure compares schemas and DTDs
Schema Dialects
• There is no single schema form.
• Several schema “dialects” have been
developed in the XML language.
• Support for a particular schema depends
on the XML parser being used for
validation.
Starting a Schema File
• A schema is always placed in a separate XML document that is referenced by the instance document.
Schema Types
• XML Schema recognizes two categories of element types: complex and simple.
• A complex type element has one or more
attributes, or is the parent to one or more child elements.
• A simple type element contains only
character data and has no attributes.
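The rule above can be applied mechanically to an element in an instance document. In practice the type is declared in the schema, so the following is only an illustration of the definition: the function element_category is hypothetical and inspects an instance element with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def element_category(element: ET.Element) -> str:
    """Classify an instance element using the complex/simple rule above."""
    has_attributes = bool(element.attrib)
    has_children = len(element) > 0
    return "complex" if has_attributes or has_children else "simple"


doc = ET.fromstring(
    '<file filename="book.xml"><description>A book list</description></file>'
)
print(element_category(doc))                      # has an attribute and a child
print(element_category(doc.find("description")))  # character data only
```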
Schema Types (cont.)
This figure shows types of elements
Understanding Data Types
• XML Schema supports two kinds of data types: built-in and user-derived.
• A built-in data type is part of the XML Schema specifications and is available to all XML Schema authors.
• A user-derived data type is created by the XML Schema author for specific data values in the instance document.
Understanding Data Types (cont.)
• A primitive data type, also called a base type, is one of 19 fundamental data types not defined in terms of other types.
• A derived data type is one of 25 data types that the XML Schema developers created based on the 19 primitive data types.
• Think "TypeLib for SOAP"
• WSDL = Web Service Description Language
• Uniform representation for services
– Transport Protocol neutral
– Access Protocol neutral (not only SOAP)
• Describes:
– Schema for Data Types
– Call Signatures (Message)
– Interfaces (Port Types)
– Endpoint Mappings (Bindings)
• Think "Yahoo!" for WebServices
• UDDI = Universal Description, Discovery and Integration
• WebService-Programmable "Yellow Pages"
• Advertise Sites and Services
• May point to DISCO resources
• Initiative driven by Microsoft, IBM, Ariba
• A lightweight protocol for exchanging information
in a distributed, heterogeneous environment
– It enables cross-platform interoperability
SOAP Overview
• Guiding principle: "Invent no new technology"
• Builds on key Internet standards
– SOAP ≈ HTTP + XML
– Submitted to W3C
• The SOAP specification defines:
– The SOAP message format
– How to send messages
– How to receive responses
SOAP Message Structure
[Figure labels: The complete SOAP message; Protocol binding headers; <Envelope> encloses payload; <Header> encloses headers]
SOAP Message Format
• An XML document using the SOAP schema:
</AddResult>
</soap:Body>
</soap:Envelope>
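Only the closing tags of the original message survive above, so the following sketch builds a comparable response envelope from scratch with Python's xml.etree.ElementTree. The envelope namespace is the SOAP 1.1 one; the service namespace urn:example:calculator, the AddResponse wrapper and the result value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:calculator"                      # hypothetical service namespace

# Ask ElementTree to serialize the envelope namespace with the soap prefix.
ET.register_namespace("soap", SOAP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
response = ET.SubElement(body, f"{{{APP_NS}}}AddResponse")  # assumed wrapper element
result = ET.SubElement(response, f"{{{APP_NS}}}AddResult")
result.text = "3"                                           # illustrative value

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

The message is an ordinary XML document: an Envelope root containing a Body, which in turn carries the application-specific payload.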
Questions
• Harvey Deitel's "XML: How to Program"
• Prentice Hall XML Reference
• Microsoft Academic Resource Kit