
© Copyright IBM Corporation 2004. Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

Welcome to:

3.1

What Is XML?


Unit Objectives

After completing this unit, you should be able to:

Describe the basic rules of XML

Describe what it means for an XML document to be well-formed

List the components that make up an XML document

Differentiate between XML and HTML

Describe the internationalization support in XML

Define some best practices for XML


What Is XML?

At its core, XML is text formatted to follow a well-defined set of rules

XML documents consist primarily of tags and text

If you've ever seen the source to an HTML document, then the XML structure should look familiar

This text may be stored/represented in:

A normal file stored on disk

A message being sent over HTTP

A character string in a programming language

A CLOB (character large object) in a database

Any other way textual data can be used

XML documents do not need to exist as documents; they may be:

Byte streams sent between applications

Fields in a database record

Collections of XML Infoset information items

For simplicity, they will be referred to as though they are documents and files
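As a minimal sketch of the point above, the same XML text can reach a parser as a character string or as a byte stream and produce the same tree. The sample markup and variable names are illustrative assumptions, not part of the course material; the parser is Python's standard `xml.etree.ElementTree` module.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample text; any well-formed XML would do.
xml_text = "<greeting lang='en'>Hello</greeting>"

# 1. XML held as a character string in the program.
from_string = ET.fromstring(xml_text)

# 2. The same XML arriving as a byte stream (for example, a file or an HTTP body).
byte_stream = io.BytesIO(xml_text.encode("utf-8"))
from_stream = ET.parse(byte_stream).getroot()

# Both routes yield an element with the same tag, text, and attribute.
print(from_string.tag, from_string.text, from_string.get("lang"))
print(from_stream.tag, from_stream.text, from_stream.get("lang"))
```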


XML documents should be thought of as a hierarchical tree structure.
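One way to see the tree structure is to walk a parsed document and indent each element by its depth. This is a sketch only: the `catalog` document and the `outline` helper are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are assumptions for illustration.
doc = """<catalog>
  <item>
    <name>Widget</name>
    <price>9.99</price>
  </item>
</catalog>"""

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(doc)
print("\n".join(outline(root)))
```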

Example Tree Representation of XML

[Figure: tree diagram of an XML document; the only node labels that survive are "Tom and "The Right Stuff"]

A Simple XML Document - Basic Structure

<?xml version="1.0"?>              "Optional" first line; only required if encoding IS NOT UTF-8 or UTF-16*
<book>                             Root element
  <title>
    Alphabet from A to Z
  </title>                         First child element with data
  <isbn number="1112-23-4356" />   Empty element (no data)
  <author>                         Begin element tag
    <firstName> Boreng </firstName>
    <lastName> Riter </lastName>   Nested child elements
  </author>                        End element tag
  <chapter title="Letter A">
    The letter A is the first in
    the alphabet. It is also the
    first of five vowels.
  </chapter>                       Element containing an attribute and parsed character data (PCDATA) [TBD]
  <!-- The rest of the letter
       chapters are missing -->    Comment
  <chapter title="Letter Z">
    The letter Z is the last
    letter in the alphabet
  </chapter>                       Last element in document
</book>


A Simple XML Document - Basic Nomenclature

The XML instance on the previous page consists of:

One main element book

Subelements title, isbn, author, chapter, and comment

Author contains other subelements firstName and lastName

ISBN and chapter contain attributes number and title, respectively

Title, firstName, and lastName contain only strings:

Elements that contain numbers, strings, dates, and so forth (TBD) but no subelements (or attributes) are said to have simple types

ISBN and chapter carry attributes; author has subelements:

Elements that contain subelements or carry attributes are said to have complex types

Attributes always have simple types (that is, they are numbers, strings, dates, and so forth).

TBD: In a later chapter we describe XML Schemas, which have access to a collection of built-in simple types
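The simple/complex distinction can be sketched in a few lines: an element counts as complex if it has subelements or attributes, and as simple otherwise. The `kind` helper is a hypothetical name, and the document is a condensed version of the book example, not a schema-aware check.

```python
import xml.etree.ElementTree as ET

# Condensed version of the book example from the slides.
book = ET.fromstring(
    "<book>"
    "<title>Alphabet from A to Z</title>"
    '<isbn number="1112-23-4356"/>'
    "<author><firstName>Boreng</firstName><lastName>Riter</lastName></author>"
    '<chapter title="Letter A">The letter A is the first in the alphabet.</chapter>'
    "</book>"
)

def kind(element):
    """Classify an element as 'complex' if it has children or attributes, else 'simple'."""
    has_children = len(element) > 0
    has_attributes = len(element.attrib) > 0
    return "complex" if has_children or has_attributes else "simple"

# iter() visits the root and every descendant element.
kinds = {element.tag: kind(element) for element in book.iter()}
for tag, classification in kinds.items():
    print(f"{tag}: {classification}")
```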


Basics of Well-formed XML (1 of 2)

XML documents are considered to be well-formed when they adhere to a set of five rules that define basic XML syntax and structure, plus a sixth for worldwide conformity

1. There must be a single root element:

All other elements are nested inside the root element

2. Elements must be properly terminated:

For every opening tag "< >" there must be a matching closing tag "</ >"

The exception is an empty (no content or body) tag "< />"

3. Elements must be properly nested underneath a parent tag (except for the single root element):

A nested tag pair may not overlap another tag

There is no limit to the nesting level of child elements


Basics of Well-formed XML (2 of 2)

4. Tag names are case sensitive:

All tag and attribute names must comply with XML naming rules; attribute values and data must follow the XML rules for content and escaping.

5. Attributes, extra information that can be provided for elements, must be properly quoted:

That is, all attribute values must be in quotes.

6. The first line should/must contain the special tag that identifies the version of the XML specification to apply:

XML 1.0 is currently the most common.
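A rough way to try these rules is to hand small fragments to a parser and see which ones it rejects. The sketch below assumes Python's standard `xml.etree.ElementTree`; the `is_well_formed` helper and the sample fragments are illustrative, and the parser reports only that a fragment is rejected, not which numbered rule it broke.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical fragments, one per rule; only the first is well-formed.
samples = {
    "single root": "<colors><color>red</color></colors>",
    "two roots (rule 1)": "<color>red</color><color>green</color>",
    "unterminated tag (rule 2)": "<colors><color>red</colors>",
    "overlapping tags (rule 3)": "<a><b></a></b>",
    "case mismatch (rule 4)": "<Color>red</color>",
    "unquoted attribute (rule 5)": "<book isbn=34323></book>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```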


Element Rules - Rule 1 Single Root Element

All XML documents must have a single root element

Correct:

<?xml version="1.0"?>
<colors>
  <color> red </color>
  <color> green </color>
</colors>

colors is the root element for this XML.

Incorrect:

<?xml version="1.0"?>
<color> red </color>
<color> green </color>

color represents multiple root elements.


Element Rules - Rule 2 Element Tag Rules

Elements consist of start and end tags

End tag is identified by the /

Example: <color> red </color>

Elements may contain attributes within the start tag

Example: <book isbn=" 34323 "></book>

Note: The attribute is isbn

Empty elements contain no child elements or data

These elements can be represented with a special shorthand notation

Example:

<record key=" 123 "></record>

Can be shortened to:

<record key=" 123 " /> (preferred)

Or, if the element has no data as: <record />
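To a parser, the long and short forms of an empty element are the same thing. The sketch below checks this with Python's standard `xml.etree.ElementTree`; the `summary` helper is a hypothetical name used only for the comparison.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<record key="123"></record>')
short_form = ET.fromstring('<record key="123" />')

def summary(element):
    """Collect the parts of an element that matter when comparing two forms."""
    return (element.tag, dict(element.attrib), element.text, len(element))

# Both forms parse to the same tag, attributes, (absent) text, and child count.
print(summary(long_form) == summary(short_form))
```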


Element Rules - Rule 3 Element Nesting

Elements must be properly nested

The end tags of inner elements must occur before the end tags of outer elements

Any number of child elements or data may be nested within the start and end tags of an element


Element Nesting Example

All elements are properly nested:

<?xml version="1.0"?>
<shirt>
  <style> Polo </style>
  <color> red </color>
  <size> large </size>
</shirt>

The element tags are mixed up and not ordered (only the closing tags of this example survive):

[...]
</style>
</size></color>

Best Practice: Use indentation to represent the document's hierarchy.

Important if your document will likely be read by humans

Computers and programs don't usually care


Element Rules - Rule 4 XML Naming Rules

XML name construction:

The first character must be A-Z, a-z, or _ (underscore)

Any number of subsequent letters, numbers, hyphens,

periods, colons, and underscore characters.

XML names are case sensitive.

Names cannot contain spaces.

Names must not have a prefix of xml in any case combination (such names are reserved).

Best Practice: Brevity in tag names is not necessary.

Use descriptive names for elements and attributes.

<Queue> or <que> is far better than <q>.

Best Practice: Maintain standard naming conventions and quoting.

Camelback, dot, and underscore notation are all common (for example, camelBackNotation, dot.notation, and underscore_notation).
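The naming rules as stated on this slide can be sketched as a regular expression. This follows the slide's simplified description only; the full XML 1.0 grammar admits a wider range of Unicode characters. The names `NAME_PATTERN` and `is_legal_name` are assumptions for illustration.

```python
import re

# Simplified pattern taken from the slide: the first character is A-Z, a-z, or _;
# later characters may be letters, digits, hyphens, periods, colons, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._:\-]*")

def is_legal_name(name):
    """Check a candidate name against the slide's simplified rules."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" in any case are reserved
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["camelBackNotation", "dot.notation", "underscore_notation",
                  "2ndItem", "first name", "XmlData"]:
    print(candidate, is_legal_name(candidate))
```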


Rule 4 Tag Naming - Samples


Rule 4 Element Content (1 of 2): General

An XML instance is composed of elements expressed in tag pairs (except for empty tags), plus optional attributes that always have quoted values, and optional data that appears between the element start tag and the element end tag

Mixed content - element content that contains data (PCDATA is shown) and other elements

Example (snippet):

<title><ref> XML </ref> Example </title>

<chapter>

Chapter information

<para> What is XML </para>

<para> What is HTML </para>

More chapter information

</chapter>
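Mixed content is easy to overlook when reading a parsed tree, because the data around child elements is stored separately from the children. The sketch below uses Python's standard `xml.etree.ElementTree`, which keeps leading data in `.text` and data following a child in that child's `.tail`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The chapter snippet from the slide, written on one line to avoid extra whitespace.
chapter = ET.fromstring(
    "<chapter>"
    "Chapter information"
    "<para>What is XML</para>"
    "<para>What is HTML</para>"
    "More chapter information"
    "</chapter>"
)

leading_text = chapter.text                          # data before the first child
paragraphs = [p.text for p in chapter.findall("para")]
trailing_text = chapter[-1].tail                     # data after the last child
all_pieces = list(chapter.itertext())                # every piece of data, in document order

print(leading_text, paragraphs, trailing_text)
```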


Rule 4 Element Content (2 of 2): Data

Element data content is handled in one of two ways:

1 Parsed Character Data (PCDATA): is examined by the XML parser to discover XML content embedded within it

2 Character Data (CDATA): is delimited by the special syntax <![CDATA[ ]]> and is not processed by the parser


Rule 4 PCDATA - Parsed Character Data

Predefined entities exist to address ambiguous syntax situations, situations where the literal would be interpreted as part of the XML document syntax rather than its content

Examples:

<range> &gt; 6 &amp; &lt; 20 </range>

<quotes characters="' &quot; '"/>

Entity    Description       Character
&lt;      "less than"       <
&gt;      "greater than"    >
&amp;     "ampersand"       &
&apos;    "apostrophe"      '
&quot;    "quote"           "
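In practice the predefined entities are usually produced by a library rather than typed by hand. As a sketch, Python's standard `xml.sax.saxutils` module offers `escape` for element data and `quoteattr` for attribute values; the variable names below are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Element data containing characters that would otherwise be read as markup.
raw_range = "> 6 & < 20"
escaped_range = escape(raw_range)
range_element = ET.fromstring(f"<range>{escaped_range}</range>")

# An attribute value containing both kinds of quote characters.
raw_quotes = "' \" '"
quotes_element = ET.fromstring(f"<quotes characters={quoteattr(raw_quotes)}/>")

print(escaped_range)
print(range_element.text == raw_range)
print(quotes_element.get("characters") == raw_quotes)
```

Parsing the escaped text gives back the original characters, which is the round trip the entity table describes.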


Rule 4 CDATA - Character Data

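Syntax: <![CDATA[ text ]]>

Note: The text may be anything except the literal string "]]>", which ends the section. Entity references are not expanded inside a CDATA section, so "]]&gt;" works only in ordinary parsed content; inside CDATA, split "]]>" across two adjacent CDATA sections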

CDATA is not parsed and is treated as-is

Useful for embedding other languages within the XML

HTML documents

XML documents

JavaScript source

Or any other text with a lot of special characters

Generally speaking the escaping rules inside a CDATA section are those of the embedded language

For example, to escape an ampersand in Javascript use &#38;


Rule 4 CDATA Examples

These script elements contain JavaScript:

[...]
{ return 1 } else
{ return 0 } }
]]></script>

This nameXML element stores actual XML to be treated as text:

<nameXML>
<![CDATA[
<name common="freddy" breed="springer-spaniel">
Sir Frederick of Ledyard's End
</name>
]]>
</nameXML>
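The effect of the nameXML example can be checked with a parser: the markup inside the CDATA section arrives as plain text, not as a child element. This is a sketch using Python's standard `xml.etree.ElementTree`, with the example condensed onto one line; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = (
    "<nameXML><![CDATA["
    '<name common="freddy" breed="springer-spaniel">'
    "Sir Frederick of Ledyard's End"
    "</name>"
    "]]></nameXML>"
)

outer = ET.fromstring(doc)

# The CDATA body is text of nameXML, so nameXML has no child elements.
child_count = len(outer)
embedded_text = outer.text

# The stored text is itself XML and can be parsed in a second, separate step.
inner = ET.fromstring(embedded_text)
print(child_count, inner.tag, inner.get("common"))
```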


Element Rules - Rule 5 Element Attributes

Attributes are used to attach information to elements

Attributes consist of a name="value" pair, where the name is a legal XML name

This is often referred to as a "key-value" pair

Attributes are placed in the start tag of the element to which they apply

An element may have several attributes, each uniquely named

Examples:

<title type="section" number="1" >XML overview</title>

<title type="boat" state="FL" >Yacht</title>

Notice the different usage of the attribute "type" in the two elements; semantically they are not the same

Attributes must have a value

Values must be quoted with either double or single quotes

Convention is to stick with one or the other
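As a sketch of how these attribute rules look to a program, the example below reads the attributes of the two title elements and shows that a repeated attribute name or a missing value is rejected. It assumes Python's standard `xml.etree.ElementTree`; the `parses` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Double and single quotes are both accepted around attribute values.
section_title = ET.fromstring('<title type="section" number="1">XML overview</title>')
boat_title = ET.fromstring("<title type='boat' state='FL'>Yacht</title>")

print(section_title.attrib)       # attributes are exposed as a name-to-value mapping
print(boat_title.get("type"))

def parses(text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

duplicate_name = parses('<title type="section" type="boat">Yacht</title>')
missing_value = parses("<title type>Yacht</title>")
print(duplicate_name, missing_value)
```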


Element Rules - Rule 6 XML Declaration (1 of 2)

The XML Declaration is an optional first line in all XML documents:

<?xml version="1.0"?>

<?xml version="1.0" encoding="UTF-8"?>

<?xml version="1.0" standalone="yes"?>

If this declaration is used, the version attribute is mandatory.

The encoding attribute indicates the character encoding used in the document; if UTF-8 or UTF-16 is used it may be omitted.

ASCII is a subset of UTF-8 and need not be declared.

Comments are not allowed before this statement.

The XML Declaration follows the syntax of a Processing Instruction or PI, which is described on a subsequent chart, but it is considered to be unique and is treated separately in the 1.0 XML specification.

GENERAL NOTE OF CAUTION: You cannot always rely on a browser or tool to completely/correctly enforce the specifications. Nor are the specifications always written in language that, to a particular reader, is unambiguous. Still, the best advice is, when in doubt, refer to the specification, which for XML is www.w3.org/XML.
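Three of the points above (the declaration is optional, version is mandatory when it is present, and nothing may precede it) can be tried against a parser. This is a sketch with Python's standard `xml.etree.ElementTree`; the `parses` helper and the sample documents are illustrative, and another tool may be more lenient, as the note of caution says.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the parser accepts the document bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

no_declaration = b"<colors><color>red</color></colors>"
with_declaration = b'<?xml version="1.0" encoding="UTF-8"?><colors/>'
missing_version = b'<?xml encoding="UTF-8"?><colors/>'
comment_first = b'<!-- note --><?xml version="1.0"?><colors/>'

checks = {
    "no declaration": parses(no_declaration),
    "with declaration": parses(with_declaration),
    "declaration without version": parses(missing_version),
    "comment before declaration": parses(comment_first),
}
for label, ok in checks.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```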

Element Rules - Rule 6 XML Declaration (2 of 2)

The standalone attribute is included here for completeness: it is used to indicate if this XML document depends on information declared externally to this document (in a DTD or XSL file (TBD), for example); value may be yes or no.

A value of "yes" indicates there are no external markup declarations; if there are no external markup declarations, the declaration has no meaning.

A value of "no" indicates there are or may be such external markup declarations; if there are such declarations but there is no standalone declaration, "no" is assumed, so the attribute is typically not used.

In any event, the inclusion in the XML instance of references to external entities, such as those in an embedded DTD, does not change its standalone status.

A bigger issue associated with the standalone attribute is that of defining or setting values in any entity that may be external to the XML instance.

Arguably, the principal reason for using XML is that it explicitly defines the elements it includes. If attribute values are overridden, then the XML instance before us is no longer declarative.

<!-- --> Defines a comment

A space after the beginning and before the trailing hyphens is recommended but not required

<?xml version="1.0"?>
<!-- This is a comment. They can go anywhere
     inside an XML document except within an element tag.
-->
<book>
  <!-- Here is another comment -->
</book>

Improper usage:

<chapter <!-- comment --> > Some text </chapter>

or before the XML Declaration statement.
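A short sketch of how a parser treats comments, assuming Python's standard `xml.etree.ElementTree`: a comment between tags is accepted (and, by default, discarded), while a comment inside an element tag makes the document unparseable. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A comment between tags is fine; the default parser simply drops it.
book = ET.fromstring("<book><!-- Here is another comment --></book>")
child_count = len(book)

# A comment inside an element tag is a syntax error.
try:
    ET.fromstring("<chapter <!-- comment --> > Some text </chapter>")
    comment_in_tag_accepted = True
except ET.ParseError:
    comment_in_tag_accepted = False

print(child_count, comment_in_tag_accepted)
```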


Internationalization and Encoding (1 of 2)

Support for different character encodings is provided through the encoding attribute of the XML Declaration

<?xml version="1.0" encoding="charset"?>

The encoding attribute indicates the set of characters that are permitted in the document

In the absence of an encoding declaration, Unicode UTF-8 or UTF-16 characters may be used

Documents exchanged via network may be presented to the processor in an encoding format other than the specified encoding as long as the transport protocol (for example, HTTP) indicates the encoding used


Internationalization and Encoding (2 of 2)

It is very important that the editor and operating system used to write and save an XML document support the encoding specified in the XML Declaration

Sample encoding declarations:

ASCII (subset of UTF-8)

<?xml version="1.0" encoding="ISO-8859-1"?>

16 bit UNICODE

<?xml version="1.0" encoding="UTF-16"?>

<?xml version="1.0" encoding="ISO-10646-UCS-2"?>
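The need for the declared encoding to match the bytes actually saved can be shown with a small experiment. This is a sketch using Python's standard `xml.etree.ElementTree`; the `word` element and the sample text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # "café": the last character is outside ASCII

# Saved as ISO-8859-1 and declared as such: the parser decodes it correctly.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?><word>' + text + "</word>").encode("iso-8859-1")
declared_text = ET.fromstring(declared).text

# Saved as UTF-8 with no declaration: UTF-8 is assumed, so this also works.
utf8_default = ("<word>" + text + "</word>").encode("utf-8")
utf8_text = ET.fromstring(utf8_default).text

# Saved as ISO-8859-1 but not declared: the bytes are not valid UTF-8.
undeclared = ("<word>" + text + "</word>").encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False

print(declared_text == text, utf8_text == text, undeclared_accepted)
```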


Processing Instruction

Syntax <? target arg*?>

Processing Instruction is often abbreviated as PI in documentation

A feature inherited from SGML

Used to embed application-specific instructions in documents.

The target name immediately follows "<?" and is used to associate the PI with an application

May include zero or more arguments

May be preceded by comments

For example, <?xml-stylesheet href="common.css" type="text/css"?>, which is a generally available stylesheet for simple formatting
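As a sketch of how an application sees a PI, the example below parses a tiny document with Python's standard `xml.dom.minidom` and reads the target and the argument string. The `book` root element is an assumption added so the document is complete.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml-stylesheet href="common.css" type="text/css"?><book/>'
)

# Processing instructions appear as their own nodes alongside the root element.
instructions = [
    node for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
pi = instructions[0]

print(pi.target)  # the name that follows "<?"
print(pi.data)    # everything after the target, as one uninterpreted string
```

The parser hands over the arguments as a single string; interpreting them as name="value" pairs is left to the application named by the target.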


Well-formed versus Valid

A well-formed XML document:

Consists of XML elements that are nested within one another

Has a unique root element

Follows the XML naming conventions

Follows the XML rules for quoting attributes

Has tags that are properly terminated

All XML parsers check for well-formedness

A valid XML document has an associated vocabulary and obeys the

structural rules specified by that vocabulary

Associated vocabulary is typically defined by either a DTD or an XML Schema

XML parsers may be validating or non-validating depending upon whether or not they can apply an associated grammar

Studio is an example of a tool whose XML capabilities include validation
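The difference between the two levels can be sketched with a toy vocabulary check. This is not a DTD or XML Schema validator: the `REQUIRED_CHILDREN` table and the `check` function are hypothetical stand-ins for a real grammar, used only to show a document that is well-formed yet not valid.

```python
import xml.etree.ElementTree as ET

# Toy vocabulary (hypothetical): each parent tag maps to the child tags it must contain.
REQUIRED_CHILDREN = {"author": {"firstName", "lastName"}}

def check(text):
    """Return 'not well-formed', 'well-formed but not valid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    for element in root.iter():
        required = REQUIRED_CHILDREN.get(element.tag, set())
        present = {child.tag for child in element}
        if not required <= present:
            return "well-formed but not valid"
    return "valid"

complete = "<author><firstName>Boreng</firstName><lastName>Riter</lastName></author>"
incomplete = "<author><firstName>Boreng</firstName></author>"
broken = "<author><firstName>Boreng</author>"

for sample in (complete, incomplete, broken):
    print(check(sample))
```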

Posted: 25/01/2021, 15:38
