
How XML can be used

XML can keep data separated from your HTML

HTML pages are used to display data. Data is often stored inside HTML pages. With XML, this data can now be stored in a separate XML file. This way you can concentrate on using HTML for formatting and display, and be sure that changes in the underlying data will not force changes to any of your HTML code.

XML can be used to store data inside HTML documents

XML data can also be stored inside HTML pages as Data Islands. You can still concentrate on using HTML for formatting and displaying the data.

XML can be used as a format to exchange information

In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by different types of applications.

XML can be used to store data in files or in databases

Applications can be written to store and retrieve information from the store, and generic applications can be used to display the data.

XML Example

<?xml version="1.0"?>

<note>

<heading>Reminder</heading>

<body>Don't forget me this weekend!!!</body>

</note>

Line-by-line code explanation

<?xml version="1.0"?>

The XML declaration should always be included. It defines the XML version of the document. In this case, the document conforms to the 1.0 specification of XML.

<note>

Defines the first element (the root element) of the document

<body>Don't forget me this weekend!!!</body>

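The lines between the root tags define the child elements of the root (to, from, heading and body in the full example; the listing above shows heading and body)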

</note>

The last line defines the end of the root element
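As a rough illustration of the explanation above, the note document can be handed to a parser and its root and child elements listed. This is only a sketch: it assumes Python's standard xml.etree.ElementTree module, which the original text does not use, and the name NOTE_XML is made up for the example.

```python
import xml.etree.ElementTree as ET

# The note document from the example above, kept as bytes so the
# XML declaration on the first line is accepted by the parser.
NOTE_XML = b"""<?xml version="1.0"?>
<note>
<heading>Reminder</heading>
<body>Don't forget me this weekend!!!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is <note>; iterating over it yields its child elements.
print(root.tag)
for child in root:
    print(child.tag, "=", child.text)
```

The parser only succeeds because every element is closed and everything sits inside the single root element.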


What an XML doc looks like

Let's save the piece of code above as note.xml (by the way, an XML document should have xml as its extension) and open it in IE. Below is what you actually see in the browser.

XML Syntax - General Idea

1 All XML elements must have a closing tag

In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

<p>This is a paragraph

<p>This is another paragraph

In XML all elements must have a closing tag, like this:

<p>This is a paragraph</p>

<p>This is another paragraph</p>

2 XML tags are case sensitive

XML tags are case sensitive. Opening and closing tags must therefore be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>

Important: Tags should begin with either a letter, an underscore (_) or a colon (:), followed by some combination of letters, numbers, periods (.), colons, underscores, or hyphens (-) but no white space, with the exception that no tag should begin with any form of "xml". It is also a good idea not to use a colon as the first character in a tag name, even if it is legal. Using a colon first could be confusing. Here are some examples of legal and illegal tags:

Legal tags      Illegal tags

<first-name>    <first - name>

<last.name>     <last name>
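The naming rule above can be approximated with a small check. This is a simplified sketch, not the full XML grammar: it only accepts ASCII letters and digits (real XML names allow many more Unicode characters), and the function name is_legal_tag_name is hypothetical.

```python
import re

# ASCII-only approximation of the naming rule: first a letter,
# underscore or colon, then letters, digits, periods, colons,
# underscores or hyphens. White space is never allowed.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_legal_tag_name(name: str) -> bool:
    """Return True if name follows the simplified naming rule."""
    if name.lower().startswith("xml"):
        return False  # names beginning with any form of "xml" are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["first-name", "last.name", "first - name", "last name"]:
    print(candidate, "->", is_legal_tag_name(candidate))
```

A parser applies the real rule for you; a helper like this is only useful when tag names are generated from user input before the document is built.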

3 All XML elements must be properly nested

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

4 All XML documents must have a root tag

All XML documents must contain a single tag pair to define the root element. All other elements must be nested within the root element. All elements can have sub (children) elements. Sub elements must be in pairs and correctly nested within their parent element, e.g.

<root>

<child>

<subchild>...</subchild>

</child>

</root>

5 Attribute values must always be quoted

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted, e.g.

Correct                  Incorrect

<?xml version="1.0"?>    <?xml version=1.0?>

Avoid using attributes?

Attributes are handy in HTML. But in XML you should try to avoid them (you could easily substitute attributes by elements - I will show you later so you could get the idea!!!). Why? Below are some of the problems with using attributes:

Attributes cannot contain multiple values

Attributes are not expandable

Attributes are more difficult to manipulate by program code

Attribute values are not easy to test against a DTD

Let me clear up your doubt by looking at the following example:

An XML example

<note>

<subject>Reminder</subject>

<body>Don't forget me this weekend</body>

</note>

If you look at the <date> element above, how do you interpret it??? Is this the 12th of November or the 11th of December???

Now, let's see how you can expand the <date> element:

<note>

<date>


</date>

<body>Don't forget me this weekend</body>

</note>

Got the idea???
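To make the point concrete, here is a hedged sketch of how program code can read a date that has been expanded into child elements. The child element names <day> and <month> and their values are assumptions made for this illustration only (they are not taken from the listing above), and the parsing again relies on Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical expanded form: <day> and <month> are assumed names,
# chosen here only to show why separate elements are unambiguous.
EXPANDED = """<note>
<date><day>12</day><month>11</month></date>
<body>Don't forget me this weekend</body>
</note>"""

note = ET.fromstring(EXPANDED)

# Each part of the date is addressed by name, so there is no
# guessing about which number is the day and which is the month.
day = int(note.findtext("date/day"))
month = int(note.findtext("date/month"))
print(f"day={day}, month={month}")
```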

XML Well-formed

If you have read the XML Syntax - General Idea section above, by now you should have a fair idea about XML in general. So, I am going to move on to a more interesting topic, that is, well-formed XML. XML documents considered well formed should satisfy three simple rules:

The document must contain one or more elements

It must contain a uniquely named element, no part of which appears in the content of any other element, known as the root element

All other elements within the root element must be correctly nested

So, according to these rules, the following are examples of well formed documents:

example1.xml

<empty_tag></empty_tag>

example2.xml

<class>Mammalia</class>

example3.xml

<root>

<class>Mammalia</class>

</root>

example4.xml

<empty_tag/>

Note: example1.xml and example4.xml are the same.

The following is an example of a document that is not well formed:

bad_example.xml

<bad_parent>

<naughty_child>Some text info

</bad_parent>

</naughty_child>


Explanation: If you look carefully, you can see that the element <naughty_child> overshoots the end of the <bad_parent> element, which should encapsulate the <naughty_child> element completely (according to rule 3 above).
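The examples above can be checked mechanically, because a conforming parser refuses any document that is not well formed. The following is a minimal sketch assuming Python's xml.etree.ElementTree; the helper name is_well_formed and the EXAMPLES dictionary are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The documents from this section, written on one line each.
EXAMPLES = {
    "example1.xml": "<empty_tag></empty_tag>",
    "example2.xml": "<class>Mammalia</class>",
    "example3.xml": "<root><class>Mammalia</class></root>",
    "example4.xml": "<empty_tag/>",
    "bad_example.xml": "<bad_parent><naughty_child>Some text info"
                       "</bad_parent></naughty_child>",
}

for name, text in EXAMPLES.items():
    print(name, "->", is_well_formed(text))
```

Note that this only tests well-formedness. Checking a document against a DTD (validity) is a separate step that this sketch does not attempt.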

XML doc structure

Physically, documents are composed of a set of entities (we will talk about this topic in a bit) that are identified by unique names. All documents begin with a root or document entity. All other entities are optional.

As opposed to physical structure, XML documents have a logical structure as well. Logically, documents are composed of declarations, elements, comments, character references and processing instructions, all of which are indicated in the document by explicit markup.

Data vs Markup

All XML documents may be understood in terms of the data they contain and the markup that describes that data.

Data is typically "character data" (i.e. anything within the boundaries of valid Unicode such as letters, numbers, punctuation and so on) but can also be binary data.

Markup includes tags, comments, processing instructions, DTDs, references and so forth.

For example: <name>John Smith</name>

Explanation: The <name> and </name> tags comprise the markup and "John Smith" comprises the character data.
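A parser makes the same split between markup and data. As a small sketch (assuming Python's xml.etree.ElementTree; the variable names are hypothetical), the tag name comes from the markup and the text comes from the character data:

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<name>John Smith</name>")

markup_name = element.tag      # taken from the <name> ... </name> tags
character_data = element.text  # the text between the two tags

print(markup_name, "|", character_data)
```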

XML Declaration

To begin an XML document, it is a good idea to include the XML declaration on the very first line of the document. Though the XML declaration is optional, the W3C specification (World Wide Web Consortium - the group that developed XML) suggests that we should include it to indicate the version of XML used to construct the document, so that an appropriate parser or parsing process can be matched to the document.

Essentially, the XML declaration is a processing instruction that notifies the processing agent (browser) that the following document has been marked up as an XML document. It will look something like the following:

<?xml version="1.0"?>

OR having a white space in between as shown below.

<?xml version = "1.0" ?>

We will talk more about the gory details of processing instructions later; for now we concentrate on explaining how the XML declaration works, okie!

All processing instructions, including the XML declaration, should have the following syntax:

<?name ?>

It must begin with <? and end with ?>. Following the initial <?, you will find the name of the processing instruction, which in this case is "xml".

The XML processing instruction requires that you specify a version attribute and allows you to specify optional encoding and standalone attributes.

In its full regalia, the XML declaration might look like the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
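The values given in the declaration can be read back after parsing. The sketch below assumes Python's xml.dom.minidom module, whose document object exposes version, encoding and standalone; the <book/> root and the name DECLARED are placeholders for the example. The XML grammar fixes the order of the attributes as version, then encoding, then standalone, which is the order used here.

```python
from xml.dom import minidom

# A full declaration followed by a minimal placeholder root element.
DECLARED = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><book/>'

doc = minidom.parseString(DECLARED)

# minidom copies the three declaration attributes onto the document node.
print("version:   ", doc.version)
print("encoding:  ", doc.encoding)
print("standalone:", doc.standalone)
```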

The Version Attribute

As we have mentioned before, if you do decide to use the optional XML declaration, you must define the version attribute. As of this writing, the current version of XML is 1.0.

If you include the optional attributes, version must be specified first, followed by encoding and then standalone.

The STANDALONE Attribute

The standalone attribute specifies whether the document has any markup declarations that are defined in a separate document. Thus, if standalone is set to "yes", the document is effectively self-contained and there are no extra markup declarations in external DTDs. However, setting standalone to "no" leaves the issue open: the document may or may not access external DTDs.

For example:

standalone_yes.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<book>

<title>Professional XML Design and Implementation</title>

<author>Paul Spencer</author>


</book>

standalone_no.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<!DOCTYPE book SYSTEM "book.dtd">

<book>

<title>Professional XML Design and Implementation</title>

<author>Paul Spencer</author>

</book>

Note: As you can see, standalone="no" means the XML document may rely on an external DTD. In this case, the book.dtd file is used to validate the document.

The ENCODING Attribute

All XML parsers must support the UTF-8 and UTF-16 Unicode encodings (UTF-8 is compatible with ASCII). However, XML parsers may support a larger set of encodings.

Character Data

XML defines the text between the start and end tags to be character data and the text within the tags to be markup.

The "<" and ">" characters are reserved for the start and end of a tag, respectively. Thus character data may be any legal (Unicode) character, except that "<" cannot be used literally (nor can "&", which starts an entity reference), and it is safest to escape ">" too. The following example is incorrect:

Alternative solution:

Here is the question you might ask yourself: how am I supposed to know which characters are legal or illegal to use? Well, not to worry - XML provides a couple of useful entity references that you can use:

Character    Entity Reference    Meaning

>            &gt;                Greater than

<            &lt;                Less than

&            &amp;               Ampersand

"            &quot;              Double quote

'            &apos;              Apostrophe (single quote)
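You rarely have to type these entity references by hand, since most XML libraries can do the substitution. As a sketch, Python's xml.sax.saxutils module provides escape and unescape; they handle &, < and > by default, and the QUOTES mapping below (a name made up for this example) adds the two quote characters.

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for the two quote characters; escape() already
# handles "&", "<" and ">" on its own.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'if a < b & b > c then say "hi"'

# Replace every reserved character with its entity reference.
escaped = escape(raw, QUOTES)
print(escaped)

# Reverse the substitution to get the original text back.
restored = unescape(escaped, {ref: char for char, ref in QUOTES.items()})
print(restored)
```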

Obviously, the &lt; entity reference is useful for character data. The other entity references can be used within markup in cases in which there could be confusion, such as:

Which should be written as:

By and large, tags make up the majority of XML markup. A tag is pretty much anything between a < sign and a > sign that is not inside a comment or a CDATA section (read on to the next section, please!).

CDATA

CDATA also means character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.

As we have said already, it is a pretty good rule of thumb to consider anything outside of tags to be character data and anything inside of tags to be markup. But alas, how am I going to show the ">" or any other reserved characters in the browser? And worse still, if I decide to have lots of reserved characters show up in the browser, do I have to key in all the funny entity reference symbols???

Of course not. XML provides a wonderful feature that you can use: the special case of CDATA blocks, provided as a convenience measure when you want to include large blocks of special characters as character data.

By including a CDATA block, you actually tell the XML processor (browser) to treat everything inside the CDATA section just like any other ordinary character data (that means all tags and entity references are ignored by an XML processor). Let's say you want to display an XML document in the browser. You could construct your XML document as follows:

<example>

&lt;document&gt;

&lt;name&gt;Trina Thach&lt;/name&gt;

&lt;email&gt;trina@technomusic.org&lt;/email&gt;

&lt;/document&gt;

</example>

As you can see, you would be forced to use entity references for all the tags.

What a mess!

Thus, to avoid the inconvenience of translating all special characters, you can use a CDATA block to specify that all of its content should be considered character data, whether or not it looks like a tag or entity reference.

Now, allow me to show you how easy it is by applying a CDATA block within an XML document:

<example>

<![CDATA[

<document>

<name>Trina Thach</name>

<email>trina@technomusic.org</email>

</document>

]]>

</example>

See how readable and legible it is???

As you might have guessed, the character string ]]> is not allowed within a CDATA block, as it would signal the end of the CDATA block.
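The effect of a CDATA block is easy to observe with a parser. In this sketch (assuming Python's xml.etree.ElementTree; CDATA_DOC is a made-up name and the document is shortened to one line), the tags inside the CDATA section come back as plain text and no child elements are created:

```python
import xml.etree.ElementTree as ET

# A shortened version of the CDATA example above.
CDATA_DOC = "<example><![CDATA[<name>Trina Thach</name>]]></example>"

example = ET.fromstring(CDATA_DOC)

# Nothing inside the CDATA section was treated as markup, so
# <example> has no child elements, only text.
print(len(example))
print(example.text)
```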

PCDATA

PCDATA means parsed character data. Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is the text that will be parsed by a parser. In other words, the tags inside the text will be treated as markup and entities will be expanded.

Comments

Not only will you sometimes want to include tags in your XML document that you want the XML processor to ignore (display as character data), but sometimes you will want to put character data in your document that you want the XML processor to ignore (not display at all). This type of text is called comment text.

In HTML, you specified comments using the <!-- and --> syntax. Well, I have some good news. In XML, comments are done in just the same way! So the following would be a valid XML comment:

<!-- Begin the Names -->

<name>Jim Nelson</name>

<name>Sam Sanger</name>

<name>Les Moore</name>

<!-- End the names -->
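To see that comments do not reach the application as data, the names can be parsed and listed. XML comments are delimited by <!-- and -->. This sketch assumes Python's xml.etree.ElementTree, which drops comments by default, and wraps the names in a hypothetical <names> root element so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# <names> is a hypothetical root element added for this illustration.
NAMES_DOC = """<names>
<!-- Begin the Names -->
<name>Jim Nelson</name>
<name>Sam Sanger</name>
<name>Les Moore</name>
<!-- End the names -->
</names>"""

names_root = ET.fromstring(NAMES_DOC)

# Only the <name> elements appear as children; the comments are skipped.
names = [element.text for element in names_root]
print(names)
```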

When using comments in your XML documents, however, you should keep in mind a couple of rules.

You should not have "--" within the text of your comment (and the text should not end with "-"), as it might be confusing to the XML processor.

Never ever place a comment within a tag. Thus, the following code would be poorly-formed XML:

<name <!-- The name --> >Peter Williams </name>

Likewise, never place a comment inside of an entity declaration and never place a comment before the XML declaration, which must always be the first line in any XML document.

Comments can be used to comment out tag sets. Thus, in the following case, all the names will be ignored except for Barbara Tropp:

Processing Instructions

We have already seen a processing instruction: the XML declaration. And if you recall, when we introduced the XML declaration we promised to return to the concept of processing instructions to explain them as a category.

So here we are.

A processing instruction is a bit of information meant for the application using the XML document. That is, processing instructions are not really of interest to the XML parser. Instead, the instructions are passed intact straight to the application using the parser.

The application can then pass this on to another application or interpret it itself. All processing instructions follow the generic format of:

<?name_of_application_instruction_is_for instructions?>

As you might imagine, you cannot use any combination of "xml" as the name_of_application_instruction_is_for, since "xml" is reserved. However, you might have something like:

<?JAVA_OBJECT JAR_FILE= "/java/myjar.jar"?>
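An application can pick up a processing instruction from the parsed document and decide what to do with it. The sketch below assumes Python's xml.dom.minidom, which keeps processing instructions as nodes with a target and a data part; the <root/> element and the name PI_DOC are placeholders added so that the example is a complete document.

```python
from xml.dom import minidom

# The processing instruction from above, followed by a placeholder root.
PI_DOC = b'<?JAVA_OBJECT JAR_FILE= "/java/myjar.jar"?><root/>'

pi_doc = minidom.parseString(PI_DOC)

# The instruction comes before the root element, so it is the
# first child of the document node.
pi = pi_doc.firstChild

print("target:", pi.target)  # which application the instruction is for
print("data:  ", pi.data)    # the instructions, passed through as-is
```

The parser does not interpret the data part at all. It is up to the application to make sense of it.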

XML Syntax - Entities

Actually, I should have left this topic until we talk about writing valid documents rather than well-formed documents. Nevertheless, some issues make sense within this section, because entities must be well-formed as well as valid. So, what I am going to do is to introduce entities in terms of their

Title	How XML Can Be Used
School	University of Technology
Major	Computer Science
Type	Essay
Year of publication	2023
City	Hanoi
