Publisher: Addison-Wesley
Pub Date: July 23, 2002
ISBN: 0-201-70344-0
Pages: 304
The combination of Extensible Markup Language (XML) and its related interlinking standards brings a range of exciting possibilities to the realm of Internet content management. This practical reference book documents these critical standards, shifting theory into practice for today's developers who are creating tomorrow's useful, efficient, and information-rich applications and Web sites.
Blending advanced reference material with practical guidelines, this authoritative guide presents a historical overview, current developments, and future perspectives in three detailed sections. Part I provides a conceptual framework highlighting current and emerging linking technologies, hypermedia concepts, and the rationale behind the "open" Web of tomorrow. Part II covers the specifics behind the emerging core standards, and then Part III examines how these technologies can be applied and how the concepts can be put to efficient use within the world of Web site management and Web publishing.

Both detailed and authoritative, this book presents the most thorough documentation of XML's linking standards available, and it examines how today's enabling technologies are likely to change the Web of tomorrow.
Topics covered in-depth include:
- Hypermedia concepts and alternatives to the Web
- XML Namespaces, XML Base, XInclude, XML
- XPath, XLink, and XPointer concepts, strengths, and limitations
- Emerging tools, applications, and environments
- Migration strategies, from conventional models to more sophisticated linking techniques
- Future perspectives on the XPath, XLink, and XPointer standards
2.3 Usage Scenarios: Hypermedia Support for Information Utilization
References
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, please contact: U.S. Corporate and Government Sales
6.1 Snapshot of W3C's technical reports page
6.2 Container nodes, node points, and character points
7.2 Inline extended link with arcs
7.3 Out-of-line extended link
7.4 Out-of-line extended link with arcs
7.5 XLink and linkbases
6.2 XPointer Character Escaping (Example 1)
6.3 XPointer Character Escaping (Example 2)
7.1 Relation Between XLink Link and Element Types
7.2 XLink Element Type Relationships
7.3 Attribute Use Patterns for XLink Element Types
It gives me great pleasure to write a foreword to this book, which explains how to bring some of the rich results of years of hypermedia research to the World Wide Web, now the most important hypermedia publishing platform on the planet. It is wonderful to see XLink, XPointer, and XPath outside of the charmed circle of standards development and to see their purpose and application to hypertext clearly explained by authors and researchers who really understand them. You may have already read the intentionally terse language of some of the many standards defining the Web and wondered, "Why does the standard say to do things this way?"

If this has been your experience, relax and prepare to take a painless tour with skillful guides who will lead you through possibilities that are new to the Web, who will show you how they can be applied, and who will alert you to some of the problems that remain.

In a sense, this book celebrates the "marriage" of hypertext research, which has sometimes been remote from the workaday world, and the World Wide Web, which has become an integral part of commerce and entertainment. Before you start to enjoy the wedding feast on offer, I, like the Ancient Mariner, would hold you with a glittering eye for a while and talk about history before sending you on to the feast to learn the latest thing.
History comes to mind for a few reasons, one of which is that the title of this book contains the word "transclusion." This word is newer than "hypertext"; it is less well known and still clearly bears the maker's stamp of Ted Nelson, neologist extraordinary to the trade. Vannevar Bush's article about the Memex first sketched a clear vision of technologically assisted reading, writing, and thinking; but Ted Nelson and Douglas Engelbart moved those concepts into the world of digital computers, their natural home where they now thrive. In its invocation of Nelson, the title of this book harks back to the origins of hypertext in the text-processing experimentation of the sixties, when the idea of "wasting" computer power on documents and not using it for mathematical problems was revolutionary and somewhat subversive.
unusual today. Partly, this reflected an ambition to provide an all-encompassing solution; partly, it reflected the way systems were constructed in those days; but above all, it was a practical necessity. In this new area of computer applications, the researcher must have some solution to all the subproblems simultaneously (such as display, printing, data management, authoring, and composition support) in order to have a system at all. Later research systems produced more in-depth knowledge of particular issues but were also more narrowly targeted at exploring specific concepts and were, often, unwieldy to deploy at other sites or to integrate with other systems.
The Web bravely ignored many of the hard problems framed by researchers, and it did so by building a simple structure that would be useful immediately, integrate with as many systems as possible, and spread easily. The history of the Web over the last decade is deeply "intertwingled" with many research concepts; with diverse social communities, from physical scientists to humanists; and with the complex influences on the systems that inspired its creators. But the Web has grown so fast that some connections have been obscured, and some opportunities that are no longer hard problems have been delayed. This book explains some of these connections, including why they are important and how they will change things in the years to come.

The marriage is on. Come learn about new ways to link and to navigate that are now ready for "worldwide" application, and enjoy the cake!
David G. Durand
The Web has been growing and evolving at a phenomenal rate since its emergence in the early 1990s. Part of this evolution has been the development of increasingly sophisticated technologies and their utilization in developing complex applications. While the technical foundations have remained relatively unchanged—URLs and HTTP have remained stable for some time, and only HTML has changed frequently—the emergence of XML as a new format for the representation of content, along with a sequence of related developments such as XLink, has heralded a substantial change in the way content can be managed. The most significant of these changes is with respect to the hypermedia functionality that is enabled by these new technologies, particularly the richer linking and navigation models.
The title of this book includes the word "transclusion." This word—from Ted Nelson's work on the Xanadu system [Nelson 95]—describes an approach to including content via references that retain the original context. "Transclusion" and "transcopyright" are the two basic features of the Xanadu system, and the Web will definitely become more Xanadu-like in the coming years. Furthermore, the Web's new hypermedia functionality will make its structure more complex but also richer, more usable, and more informative. We believe this book will provide an effective guide to this development in the coming years.
Purpose of the Book
Our purpose in writing this book has been to explore and illustrate the possible hypermedia functionality introduced into the Web's architecture by XML and the accompanying XLink and XPointer standards. Today's focus in the use of XML is its application-specific, data-structuring capabilities. However, we believe that by effective use of XLink and XPointer, in conjunction with XML, hypermedia-rich applications can be created that will be more usable and effective than the current content-based HTML hypermedia model.
XPointer-enabled Web from both a conceptual point of view and a practical perspective. A conceptual view allows us to understand the types of advanced changes enabled by these technologies, and the implications of these changes for creating effective, maintainable, and usable applications. A practical perspective allows us to understand how these technologies are actually applied by developers, as well as to examine issues related to current tools, environments, and standardization processes.
The Book's Audience
We believe that XML, XLink, and XPointer and, in particular, the new hypermedia functionality enabled by these technologies will fundamentally change the Web. This book focuses on understanding and leveraging these changes and should therefore be interesting and useful for many people.
Web authors, developers, and project managers. So far, this group has been limited by HTML's primitive linking mechanism, and for many applications an understanding of this new hypermedia functionality will be beneficial. It will enable them to produce more sophisticated applications, both in terms of the way the content that underpins their site is managed and in terms of the functionality that can be created in the application front-end. This book provides an overview of the technology and presents concrete implementation strategies. To assist Web authors, developers, and project managers in being backwards-compatible, the book also provides transition strategies.
Web users. In many cases, Web users are very interested in what the future of Web technology can bring them. In particular, updated features are often the main motivation for upgrading to a newer version of a browser or other software, so Web users should be well informed about the improvements available with the most recent software.
Students. In courses as diverse as information studies, software engineering, information systems, and library studies, students will benefit from understanding how the Web is likely to evolve in the future—
The Book's Content
In this preface, we discuss the changes in the Web and the role that emerging standards can play in developing a richer and more usable Web. In the introduction, we elaborate on this idea by exploring the emerging standards and, in particular, consider what we mean by information linking and the role it plays within the Web. The introduction provides a context for the broad focus of the book.
The rest of the book is divided into three main parts. Part I focuses on a conceptual framework. It explores the Web we might wish to develop and the emerging linking technologies that may go some way toward providing it. We start in chapter 1 with a consideration of current technology. We focus on the limitations inherent in this technology, particularly with respect to linking and the implications for information handling, navigation, and retrieval. Chapter 2 provides information about the motivation for the types of changes we are promoting. We start by exploring linking issues in much more detail, looking at hypermedia concepts and some of the historical hypermedia developments, which provides useful insights into how information might be better managed. We also provide relevant definitions that clarify much of the terminology used in the rest of the book. This chapter concludes with a typical scenario that illustrates the types of Web changes that might currently be desirable. Chapter 3 begins the process of considering the new and emerging technologies that enable the vision we have begun to establish in the first two chapters. Rather than describing the technologies from the syntactic level (where their applicability may be difficult to put into the context of the discussions in the previous chapter), we first consider standards such as XPath, XPointer, and XLink from a conceptual viewpoint, looking at the types of support they provide for sophisticated linking and content management. This discussion is supported by XML fragment examples as a way of introducing these concepts through a process of illustration.
Then, Part II of the book gets down to the specific details of the new

readers familiar only with the more "traditional" Web technologies, such as HTML and HTTP, should first read this chapter.
In chapters 5, 6, and 7, we look in detail at three of the key technologies that enable our vision: XPath, XPointer, and XLink. In each case, rather than simply presenting the standard, we explain the concepts and, wherever appropriate, the strengths, limitations, and ambiguities of the standard. As such, it is important that these chapters be read in conjunction with the relevant standards. This, in turn, raises an important point: The XPointer and XLink standards have been evolving continually during the writing of this book and are likely to continue to evolve. This means that you will need to be careful in interpreting some of the comments here. In particular, at the time of this writing, the current status and version of the most relevant standards are as follows:
XML Path Language (XPath): W3C Recommendation (16 November 1999) [Clark & DeRose 99]

XML Pointer Language (XPointer): W3C Candidate Recommendation (11 September 2001) [DeRose+ 01b]

XML Linking Language (XLink): W3C Recommendation (27 June 2001) [DeRose+ 01a]
This means that the standards as they are today are not going to change; but since adoption has been slow so far, actual implementations may differ from these standards, and the standards may have to be reworked.[1] Currently, there is no sign that this is going to happen, but readers should regularly check the W3C Web site at http://www.w3.org—in particular, the technical reports page at http://www.w3.org/TR/—to look at the latest versions of the standards. We will also track standards development on the book's Web site—http://transcluding.com.
discussions are in the context of current practical limitations imposed by available infrastructure, environments, and tools (or lack of tools). In chapter 10, everything is drawn together, and we make some final comments, particularly with regard to our own perspectives on the future of XLink and XPointer.
Acknowledgments
The authors would like to acknowledge the assistance of a number of people in the preparation of this book. Obviously, the W3C in general and the developers of the XPointer and XLink standards in particular deserve special mention. Specifically, we wish to acknowledge the efforts of Steve DeRose, Eve Maler, David Orchard, and Ron Daniel in developing and promoting these key standards.
We would also like to acknowledge the original ground-breaking work of Theodor Holm Nelson on early hypertext systems. Many of the concepts that are only now being woven into the framework of the Web were originally proposed by Ted 30 or more years ago. His contribution to the field is without parallel, and his vision for hypermedia is one that we are still trying to appreciate and live up to.
The assistance and support of the Addison-Wesley editorial staff has been excellent. In particular, we would like to acknowledge the assistance of Mary O'Brien and Marilyn Rash, who never gave up on us, even when we were missing deadline after deadline. Thanks!

And on a personal note, the support of Catherine Lowe and Jacqueline Schwerzmann has been beyond value.
Dr. Erik Wilde is lecturer and senior researcher at the Swiss Federal Institute of Technology in Zürich (ETH Zürich), Switzerland. To find out more about Erik and his activities, visit his Web site at http://dret.net.

Dr. David Lowe is an associate professor and associate dean (teaching and learning) in the faculty of engineering at the University of Technology, Sydney. He has active research interests in the areas of Web development and technologies, hypermedia, and software engineering. In particular, he focuses on Web development processes, Web project

of Technology, Sydney, P.O. Box 123, Broadway, NSW 2007, Australia, or mailto:david.lowe@uts.edu.au.
The World Wide Web has undergone astounding growth since its emergence in the early 1990s. There is a plethora of statistics that attest to this expansion—the number of users, the number of pages that are available, business expenditure on Web technologies, consumer expenditure through e-commerce sites, and so forth.[2] These statistics usually focus on technical growth and tend not to capture the more fundamental and unprecedented changes in business, the world economy, and, perhaps most significantly, social structures.
And these changes will accelerate as we continue to head toward an ever richer online environment. Commercial interactions and support for business processes will become more complex and, at the same time, more central to both business and government activity. We will see progressively more pervasive, sophisticated, and diverse user experiences as we move toward the emerging vision of a semantic Web (i.e., a Web that supports automated retrieval, analysis, and management of resources).
Of importance in this rapidly evolving environment is the convergence of a substantial number of emerging technologies and standards. These technologies (or maybe acronyms would be a better name!) include, for example, RDF, SMIL, WAP, WebML, DOM, CSS, PICS, PNG, SVG, WAI, and many more. A quick look through the World Wide Web Consortium's (W3C's) list of technical recommendations, proposed recommendations, and working drafts (see http://www.w3.org/TR/) illustrates the breadth of work being considered.
One of the most fundamental, widely discussed, and far-reaching technologies is the Extensible Markup Language (XML). Viewed simplistically, XML provides a mechanism for representing in a powerful way the data that underpins the Web. But a representation of the data is not sufficient to enable systems and users to interact with, utilize, and communicate with that data—a representation of the ways in which different data items are interrelated is also required. Effectively, some form of linking model is necessary. For this model to be useful for the
This book is intended to help Web developers understand the evolving standards supporting linking within XML and the implications of these standards for managing information and constructing sophisticated applications. In particular, we consider the ways in which these standards will lead to a fundamentally richer Web environment and user experience.
Information Linking
Linking is a fundamental concept that forms an important part of the theoretical foundations of the Web. Without linking, the Web is just an extremely large collection of (albeit very sophisticated) distributed information and applications. With linking, the Web becomes a single complex system.
Linking allows us to associate semantically related items of information so that we can support sophisticated techniques for locating those items. But it goes way beyond that. We can link information to tools for manipulating that information. We can link the various steps in processes (such as the steps in buying a book online). But we can also do more sophisticated linking, such as implementing dynamic links that change depending on the context (time, user, history, etc.) or constructing new documents by merging content or applications from diverse (but linked) locations. Linking effectively allows us to create a complexly structured network of distributed resources—a "Web."
The concept of linking information resources has been around for a considerable period of time, predating the Web by at least 45 years. The concept of associations between items of information (at least as a technically supported aid to information management) was originally introduced by Vannevar Bush [1945] in the 1940s. The concept essentially remained an obscure idea until the 1960s, when it was revived by farsighted researchers such as Ted Nelson [1993] and Doug Engelbart [1988]. Indeed, it was Ted Nelson who coined the terms "hypertext" and "transclusion." His Xanadu system encapsulates many of the sophisticated information structuring and management concepts now being investigated for the Web. Engelbart's work envisaged the user and
human capabilities.
This work then spawned a growing body of research and development of a number of systems within the hypertext community. These systems evolved during the 1970s and 1980s and gradually came to include very diverse and sophisticated concepts: set-based association, multiple source and destination links, dynamically adapted links, generic links that are sourced from all content satisfying certain criteria, spatial representations of the link associations, and so forth. This richness in linking concepts reflected the maturing ideas of how information can be managed and, in particular, how we interact with this information.
The Web
Then, in the 1980s, Tim Berners-Lee started experimenting with these concepts. In 1990, he developed (at CERN, the European Organization for Nuclear Research) a relatively simple implementation that was initially intended to allow him and his colleagues within the high-energy physics community to collaborate through rapid information sharing [Berners-Lee 92]. In the next decade, Berners-Lee's ideas became the catalyst, along with various related convergent technologies, for a frenzy of business and consumer activity that has completely transformed the world economy and is fundamentally changing our social structure.
The model originally proposed by Berners-Lee—that of a simple communication protocol (Hypertext Transfer Protocol, or HTTP) that allows documents to be requested from remote servers, coupled with a document format (Hypertext Markup Language, or HTML) that supports references to related documents—ignored almost all of the sophisticated linking concepts that had evolved during the previous 30 years.
The linking model adopted by Berners-Lee was very simple: single-source, single-destination links embedded into the source document. This, however, is not a criticism of the choice to adopt this model. Indeed, it was almost certainly partly this choice of a simple model that led to the success of the Web.
attractive. As the number of users increases, the amount of information increases, and the likelihood of additional users (and providers of information) increases. The simplicity of the original model adopted by Berners-Lee made it very easy for people to provide content and set up servers and clients. This in turn rapidly led to a critical mass and the subsequent rapid adoption and evolution of the Web. A more complex model incorporating some of the sophisticated information linking and management functionalities found in other hypertext systems would have slowed the adoption, possibly to the point where the critical mass was not reached.
It is worth pointing out that, apart from the Web, the only commercially successful hypertext systems have been applied to particular niche markets. Hence, there has been a much stronger justification for the effort required to create sufficient content. Hypercard and Storyspace are two good examples of systems that have managed to develop small but active niche markets. But even these have been tiny efforts in comparison with the Web.
However—and here is a key point—while the simplicity of the original Web model may have led to its initial success (and what a success it was!), it also meant that much of the richness that had been developed in previous hypertext and information management systems was lost. This was not originally a great problem; but as the Web matured and evolved, these limitations began to place constraints on the ways in which the Web could be used. As just one simple example, the lack of any form of state (i.e., memory about the history of interaction between a server and a client) originally led to concepts such as cookies and server session variables, then complicated the issues of secure transactions, and ultimately made systems that adapted to users' specific needs unnecessarily complex. Much of the technical evolution and innovation over the last few years has been a consequence of trying to circumvent or remove limitations such as these. In some cases, this has included the integration of richer hypermedia systems into the Web architecture, for example, the development of systems such as Webcosm [Hall+ 96] and Hyperwave [Maurer 96] (both to be discussed later in the book). This has,
More recently, various other Web developments have gained attention as ways to circumvent the Web's original limitations. One of the most

vocabulary and grammar for documents—the allowable elements and their valid usage. Essentially, the result is that XML documents support (or at least support much better than HTML documents) aspects such as data exchange, automated processing of content, and customized views of the content.
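As a minimal sketch (the element names and content here are invented for illustration, not taken from any particular application), an XML document type can declare its own vocabulary and constrain how elements nest, which is what enables the automated processing described above:

```xml
<?xml version="1.0"?>
<!-- A hypothetical vocabulary: a book has a title and one or more chapters. -->
<!DOCTYPE book [
  <!ELEMENT book (title, chapter+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
]>
<book>
  <title>An Example Book</title>
  <chapter>Current Technology</chapter>
</book>
```

A validating parser can reject any document that violates these declarations, so consumers of the vocabulary can rely on its structure.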
Accompanying XML and supporting it in various ways is a series of related standards and technologies. For example, the Extensible Stylesheet Language (XSL) is composed of several components: XSL Transformations (XSLT) supports the conversion of XML documents into other (possibly XML) documents, and XSL Formatting Objects (XSL-FO) supports turning the results of a transformation into a form that can be rendered. Extensible HTML (XHTML) is the reformulation of HTML as an application of XML, thereby supporting the transition from HTML to XML. XML Schema is a schema language for XML documents, which in the future will probably replace XML's built-in mechanism for defining schemas.[3] Other examples include XML Information Set, XML Query, and XForms.
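As a small sketch of the transformation idea (the source element name `headline` is invented; the namespace URIs are the standard XSLT and XHTML ones), an XSLT template can rewrite elements of one vocabulary into another:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Rewrite each <headline> element of the source document
       as an XHTML <h1>, keeping its content. -->
  <xsl:template match="headline">
    <h1 xmlns="http://www.w3.org/1999/xhtml">
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```

The same source document can be run through different stylesheets to produce different renderings or exchange formats.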
In effect, the XML family of standards provides an opportunity to introduce into the Web environment some of the richness that was missing from the original Web architecture. An enormous amount has
as information management, business processes, and the user experience—especially when combined with related technologies such as Resource Description Framework (RDF), Cascading Style Sheets (CSS), and Document Object Model (DOM). However, one aspect that has been overlooked to a certain extent is linking. A key piece of the maturing XML jigsaw is the linking model within XML—as provided by the emerging W3C recommendations on XLink and XPointer. This linking model provides a much more sophisticated approach to linking than the original Web model. Indeed, it comes close to the sophistication of the various hypertext models that pre-dated the Web—but within the context of the distributed diversity offered by the evolving Web.
The linking model that underpins XML is being developed by the W3C XML Linking Working Group. As stated by the W3C, "the objective of the XML Linking Working Group is to design advanced, scalable, and maintainable hyperlinking and addressing functionality for XML." The model that has been developed is composed of several components. The first of these is the XML Pointer Language (XPointer). XPointer builds on the XML Path Language (XPath) to provide a mechanism for addressing XML-based Web resources by defining a powerful way to address fragments of an XML document. These fragments may be single XML elements or a collection of elements, text strings, attributes, and so forth that have been merged into one composite.
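As a hedged illustration (the document, element names, and URI are invented; the pointer syntax follows the XPointer drafts current at the time of writing), an XPointer appended to a URI as a fragment identifier can address part of a document:

```xml
<!-- A hypothetical target document, report.xml: -->
<report>
  <section id="intro">
    <para>The Web emerged in the early 1990s.</para>
    <para>Linking predates the Web by decades.</para>
  </section>
</report>

<!-- An XPath-based pointer part selecting the second para of the
     section whose id attribute is "intro":
       http://example.org/report.xml#xpointer(//section[@id='intro']/para[2])

     Shorthand bare-name form, selecting the element with ID "intro":
       http://example.org/report.xml#intro -->
```

Note how the `xpointer()` part is essentially an XPath expression, which is why XPath is covered before XPointer later in the book.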
The second component is the XML Linking Language (XLink). XLink is used to describe complex associations between information resources identified using URIs, possibly with added XPointers. These associations can be simple directional embedded links between two resources (as with HTML) or much more complex associations. Examples include multidirectional links, multiple-destination links, and inlining of content from one source to another, as well as automatically merged fragments sourced from multiple disparate documents.
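As a sketch (the `xlink:` attributes and namespace come from the XLink recommendation, but the element names `citation`, `related`, `doc`, and `go` and all URIs are invented for this example), a simple link and an out-of-line extended link might look like this:

```xml
<!-- A simple link, roughly equivalent to an HTML <a> element: -->
<citation xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://example.org/report.xml"
          xlink:show="replace"
          xlink:actuate="onRequest">the full report</citation>

<!-- An out-of-line extended link associating three remote resources,
     with arcs declaring which traversals are possible: -->
<related xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <doc xlink:type="locator" xlink:href="http://example.org/a.xml" xlink:label="paper"/>
  <doc xlink:type="locator" xlink:href="http://example.org/b.xml" xlink:label="review"/>
  <doc xlink:type="locator" xlink:href="http://example.org/c.xml" xlink:label="data"/>
  <go xlink:type="arc" xlink:from="paper" xlink:to="review"/>
  <go xlink:type="arc" xlink:from="paper" xlink:to="data"/>
</related>
```

Unlike an HTML anchor, the extended link lives outside the documents it connects, so links can be added to resources the author cannot modify.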
Conclusions
In this book we look at the richer linking enabled by the emerging XLink and XPointer standards (as well as XPath, which is the foundation of

Ultimately, we see this book as a motivation for integrating hypermedia concepts into Web-related projects. These concepts may not necessarily always or only map to XLink constructs (though this mapping is the focus of much of this book). At present, many of them can be supported using DHTML (Dynamic HTML); tomorrow they will be supportable by XLink; and in five years' time they may be supported by some new technology. The key point is that these concepts should be captured somehow, so that they can be made available in whatever form is supported by the current "publishing system." We don't intend to try to sell XLink as the hypermedia technology to end all other hypermedia technologies, but as one step in the evolution of the Web and a good reason to start thinking about the Web as a hypermedia system.
[1] This is not the way standards are supposed to develop, but it may happen. For example, HTML standards for some time more or less simply tracked what the two major browser providers had already implemented.
Current Technology
Hypermedia Concepts and Alternatives to the Web
Conceptual Viewpoint
discussing the Web's current linking model and its limitations in section 1.3, "Information Linking in the Web."
1.1 The Internet Environment
One of the main results of the Web's success was to bring the Internet into the consciousness of the general public. Before the advent of the Web, the Internet was mainly used by academic institutions for research purposes. Within ten years, however, the Internet has become the backbone of the information society, propelled mainly by the success of the Web and electronic mail. The public often confuses the Internet and the Web because much too often the terms are used interchangeably. However, there is a clear difference between them, defined as follows:
Internet— The entirety of all computers that are interconnected (using various physical networking technologies) and that employ the Internet protocol suite on top of their networking systems. The Internet protocol suite implements a wide-area, packet-switched network that interconnects networks using different protocols and connection characteristics.
World Wide Web— A distributed hypermedia system built on top of some of the services provided by the Internet, the most important being the naming service provided by the Domain Name System (DNS), and
1.1.1 Connecting to the Internet
Most computers today are connected to a computer network in one form or another. On the other hand, most individual home computers are not running constantly, or if they are, they are not permanently connected to a computer network, mainly for economic reasons (because connection costs are often based on connection time). It is therefore possible to differentiate between two different connection modes:
Dial-up connections. This kind of connection is opened on demand, for example, if a user wants to view Web content or send or receive e-mails. In most cases, the initial connection between the user and the user's Internet service provider (ISP) is established over a phone line using a modem. The basic modem connection establishes a data path between the user and the ISP, which is then used to send Internet data packets back and forth, often using the Point-to-Point
Permanent connection. For many applications (such as servers, which must be available all the time), a permanent connection to the Internet is required or desirable. One popular technology allowing home users to achieve this is a cable modem, which works over a cable television network. Another technology is asymmetric digital subscriber line (ADSL), standardized in ITU-T G.992.2 [ITU 99], which works over phone lines. For corporate users, ISPs often offer leased lines, which have much greater capacities than modem connections. For internal distribution inside a company, local area networks (LANs) are used to interconnect all computers to form a so-called intranet.
The decision about whether to connect a particular computer permanently to the Internet is dictated by a number of issues, such as the purpose of the computer, the remotely accessed services running on the computer, and available ISPs and their connection costs. While today many home computers still use dial-up connections, this will change with the expansion of existing services and the introduction of new offerings, such as cheap wireless services, ADSL, cable modems, satellite connections, and new buildings providing Internet connectivity as a basic service in the same way water, electricity, and phone lines are provided today.
behave correctly as an Internet host. The most basic requirement is that Internet hosts must be able to send and receive data packets (called datagrams) using the Internet Protocol (IP), which is standardized in Internet RFC 791 [Postel 81a]. IP provides the functions necessary to deliver a package of bits (an Internet datagram) from a source to a destination over an interconnected system of networks. And herein lies the strength of the Internet: it does not depend on a specific underlying physical network; it can be used on top of virtually any networking technology.
In the context of Web technologies, the most important protocol is the Transmission Control Protocol (TCP), which is standardized in RFC 793 [Postel 81b]. TCP is layered on top of IP to provide a reliable, connection-oriented, flow-controlled transport service for applications. IP is capable of sending datagrams but does not guarantee that these datagrams will be delivered. IP datagrams can get lost, they can be duplicated, or they can arrive at the receiver side in a different order from the one in which they were sent. TCP deals with all these possibilities and provides applications with a transport service that, for many application scenarios, is better suited than IP's service. TCP does so by assigning sequence numbers to individual packets and employing a number of elaborate mechanisms to make sure that the underlying network is not overloaded. Many Internet application protocols use TCP as the transport protocol, the most relevant being the Web's Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP), which is used for the exchange of electronic mail.
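The core idea behind TCP's reordering and de-duplication can be illustrated with a deliberately simplified sketch. This is a toy model, not the actual TCP algorithm, and the packet contents are invented for illustration:

```python
# Toy model of TCP-style reassembly: packets carry sequence numbers,
# may arrive out of order or duplicated, and the receiver restores
# the original byte stream before handing it to the application.
packets = [(2, b"lo, "), (1, b"Hel"), (3, b"Web"), (2, b"lo, ")]

# De-duplicate, then sort by sequence number to recover the stream.
stream = b"".join(data for _, data in sorted(set(packets)))
print(stream.decode())  # Hello, Web
```

Real TCP numbers bytes rather than packets and also handles retransmission and congestion control, but the principle of restoring order from sequence numbers is the same.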
Apart from providing applications with transport protocols, the Internet environment also includes services. By far the most important service is the Domain Name System, which is standardized in RFCs 1034 and 1035 [Mockapetris 87a, 87b]. DNS implements a globally distributed database, which is used to map domain names to IP addresses. On the protocol level (IP or TCP), Internet hosts are addressed by IP addresses, which are 32-bit numbers, often written in the so-called dotted decimal notation. In this notation, IP addresses have four decimal numbers, such as 129.132.66.9, the IP address of the Web server of this book's Web site. However, since such a number is hard to remember,[2] a naming system has been introduced that makes it possible to use hierarchically structured names for Internet hosts. The DNS name for this book's Web site is transcluding.com; and whenever a browser is trying to access this name, it must first be resolved to the IP address using a DNS request. Therefore, most Web browser interactions with the Internet involve two steps: first resolving the domain name by requesting its IP address from a DNS server,[3] and then connecting to this address and requesting a Web page.
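The relationship between dotted decimal notation and the underlying 32-bit number can be made concrete with a short sketch (Python is used here purely for illustration and is not part of the book's examples):

```python
import socket
import struct

def dotted_to_int(address: str) -> int:
    """Convert a dotted decimal IPv4 address to its 32-bit number."""
    return struct.unpack(">I", socket.inet_aton(address))[0]

def int_to_dotted(number: int) -> str:
    """Convert a 32-bit number back to dotted decimal notation."""
    return socket.inet_ntoa(struct.pack(">I", number))

ip = "129.132.66.9"
n = dotted_to_int(ip)
print(n)                 # the single 32-bit number behind the four decimals
print(int_to_dotted(n))  # round-trips back to the dotted form

# The resolution step itself would be, for example:
# socket.gethostbyname("transcluding.com")  # requires network access
```

Each of the four decimal numbers is simply one byte of the 32-bit address, which is why no component can exceed 255.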
1.2 The World Wide Web
In the previous section, we briefly described how the Internet is used to support applications that use computer communications in general and how the Internet supports the Web in particular. Apart from the underlying supporting infrastructure provided by ISPs, what are the most important technologies that make up the Web? Initially, these were the Uniform Resource Locator (URL) for addressing resources, HTML for representing documents, and HTTP for the communication between a server and a client.
Today, these basic technologies are still the same—even though the concept of the URL has been extended to the Uniform Resource Identifier (URI). However, many technologies have been introduced to supplement this basic architecture. The most important technologies for the topic of this book are described in chapter 4. The basic idea of the Web is very simple: a browser requests a document (or some other resource) from a server using the HTTP protocol, which then provides an appropriate response, also using the HTTP protocol. The response may or may not contain the requested document. The requested document will typically be written using an appropriate language (such as HTML), and sections of the document's information (known as anchors) will contain references (known as links) to other documents. The user reads the document and can then, if desired, select one of the anchors. The browser then interprets the reference by extracting the URI of the referenced document (part of which is the name of the server on which the document resides) and requesting the referenced document from that server.
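This request/response exchange can be sketched with Python's standard library. The server and the document it returns are stand-ins invented for this sketch, not the book's examples; a real browser would first resolve the server's DNS name, whereas here we bind directly to a local address:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class Handler(BaseHTTPRequestHandler):
    """A toy server: every GET request is answered with a small HTML document."""
    def do_GET(self):
        body = b"<html><body><a href='doc2.html'>a link</a></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port on localhost; no DNS step is needed here.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/index.html"
with urlopen(url) as response:   # the browser's role: send GET, read response
    status = response.status
    html = response.read().decode()
server.shutdown()

print(status)
print(html)
```

The client side is exactly the two HTTP steps described above: open a TCP connection to the server and issue a GET request for the resource named in the URI.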
However, this diversity of new content formats has also led to many incompatibilities. The interpretation of a specific content format must be implemented in the user's browser, and if there are dozens of content formats all requiring special plug-ins or external viewers, then the Web becomes less universal than it was designed to be. After a period of uncontrolled competition between Netscape and Microsoft in the mid-1990s, during which both companies continually improved HTML and the Web architecture with new inventions incompatible with the contender's products, the situation has greatly improved. The World Wide Web Consortium (W3C), an industry consortium headed by the Web's inventor, Tim Berners-Lee, now strives to define common standards; and the major players are all part of the standardization efforts.
The idea of the semantic Web, as described by Berners-Lee in 1999, is a Web in which all of the information available is not only machine-readable (as it is now) but also machine-understandable. To achieve this goal, the basic standards underlying many Web technologies must be able to carry as much semantic information as possible. Put simply, this means that the meaning of the content needs to be somehow encapsulated in the representation of that information.
Essentially, the two most important issues when trying to develop technologies for a semantically richer Web are how to attach meaning to Web content, as can be done by using the Resource Description Framework (RDF) described in chapter 4, and how to develop a linking model that supports a more diverse range of linking possibilities than the one currently supported by the Web. This book focuses on the second issue, and chapters 2 and 3 explore why Web linking should be improved and how it can be improved.
1.3 Information Linking in the WWW
models developed in a diverse range of hypertext systems over the last 30 years. It has, however, been sufficient for the initial development and adoption of the Web. At the end of this section and in the next chapter, we discuss the limitations that this model imposes and the need for a richer model if the Web is to continue to grow and evolve.
1.3.1 The Web's Linking Model
One of the key factors in the Web's growth in the last decade has been the way in which it enables users to move about and explore an enormous information space. Several key characteristics have contributed to this ability. The first is the distributed nature of the Web—the HTTP protocol supports access to information from remote sources in a seamless fashion. The second is the way in which the information is interlinked—interlinking is supported through several mechanisms, but most notably through the linking mechanisms associated with HTML and URIs.
At its simplest, a link between items of information requires three components: a source, a destination, and the connection between these two (shown in Figure 1.1). Although these might be quite obvious, it is worth dissecting them a little. The mechanism for representing link sources (and possible destinations) is HTML's <a> element. Effectively, everything between the start and end tags of this element forms a marked section of text, also called an anchor. For example, consider the following HTML fragments:
Figure 1.1 Basic linking components
Textual data has been widely used to construct and manipulate information. We can define
<a href="doc2.html#example">anchors</a> within this text, which are used as the basis for link sources and destinations
<link rel="stylesheet" href="trans.css" type="text/css"> </head>
<body bgcolor="#EEEEEE">
<img src="header.gif" alt="Linking Example">
.
</body>
This example illustrates three other forms of linking. The first is where a document is associated with another one—in this case, a stylesheet. The association is not one based on the content of the document, but rather on the way in which the document is to be presented. This type of relationship is not one that would be explicitly used by (or even be apparent to) a user. Rather, it is used by the software manipulating the document—most likely a browser.[4] Nevertheless, the association between the document and its stylesheet could provide useful information to tools that facilitate access to information (especially where the stylesheet contains named styles). For example, a search engine could use this type of association to allow a user to retrieve all Web pages that contain text paragraphs formatted as margin-note.
The second case of linking in the example just given is a little more subtle. We have several meta tags containing name-content pairs, providing metadata about the document—information that describes the document rather than being part of the document itself. This metadata is typically not displayed by a browser but is used in analyzing or searching the document. In effect, we have information that has been explicitly associated with the document content. This can be viewed as a degenerate case of linking and is usually not even considered to be linking.
The third case is where an image has been embedded into a document. This is similar to the first case just mentioned. Both represent an association between two different files. The difference, however, is in the implied semantics of the association. With the image, the semantics are (usually) interpreted as "when showing this document, show the image embedded into it at this location." The semantics of the association to the stylesheet are that the stylesheet is used to format elements of the document. In other words, the various mechanisms for creating associations in HTML have very different implied semantics.
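These different association mechanisms can be extracted mechanically, which is a useful way to see that they all reduce to references between resources even though their semantics differ. The sketch below uses Python only for illustration; the sample document reuses file names from the fragments above, while the meta tag's name and content are hypothetical:

```python
from html.parser import HTMLParser

class AssociationExtractor(HTMLParser):
    """Collect the kinds of associations HTML can express: navigational
    links (<a href>), stylesheet associations (<link>), metadata (<meta>),
    and embedded content (<img src>)."""

    def __init__(self):
        super().__init__()
        self.associations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.associations.append(("link", attrs["href"]))
        elif tag == "link" and "href" in attrs:
            self.associations.append((attrs.get("rel", "related"), attrs["href"]))
        elif tag == "meta" and "name" in attrs:
            self.associations.append(("metadata", attrs.get("content", "")))
        elif tag == "img" and "src" in attrs:
            self.associations.append(("embed", attrs["src"]))

document = """
<head><link rel="stylesheet" href="trans.css" type="text/css">
<meta name="keywords" content="linking"></head>
<body><img src="header.gif" alt="Linking Example">
<a href="doc2.html#example">anchors</a></body>
"""

parser = AssociationExtractor()
parser.feed(document)
print(parser.associations)
```

Note that the extractor can only record that an association exists and which attribute it came from; the implied semantics ("display this image here" versus "format the document with this stylesheet") are not expressed in the markup itself, which is precisely the point made above.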
1.3.3 Shortcomings of the Web Linking Model
becomes more sophisticated. We should again emphasize that this is not a criticism of the model that was adopted—the simplicity of this model is one of the key reasons the Web was originally so successful. A richer, more flexible, but consequently more complex approach to linking would have made Web technologies and concepts more difficult to understand and manage and hence would have hindered the early development of the Web. Nevertheless, this simplicity now needs to be addressed as part of the ongoing evolution and maturing of the Web.
So what are these limitations? We shall list some of the more significantconstraints and then discuss each of them in turn:
Embedded anchors limit the ability to link from or into read-only
material
Embedded unidirectional links make backtracking extremely difficult.

The lack of overlapping anchors creates problems.
Links lack any form of inherent contextualization
Single-source, single-destination links inhibit the representation ofcertain types of associations
Untyped links inhibit intelligent browsing and navigation
Embedded and untyped links restrict effective content managementand flexible information structuring schemes
Inability to Link from or into Read-Only Material
One of the most significant problems is the difficulty associated with linking either from or to content that is read-only or beyond the control of the link author. This is particularly problematic when content is being reused or is stored in a read-only format (for example, on a CD-ROM). Consider the following scenarios:
sequence of material collected from different Web sites. Ideally, he would create a trail through this existing material, either by adding a link to the end of each section, which links to the next, or by creating a new composite page (or node) that contains the relevant extracts (but without "cutting-and-pasting" the material—and hence infringing on the original author's rights).[5] In both cases, the existing HTML model of linking (and document composition) does not allow this.
Users may often wish to annotate Web pages belonging to someone else with their own comments and observations. For example, Joe Linker is searching for information on XML linking to include in his educational site. He locates a page on a remote server containing relevant information, but finds a small section within the main document with which he disagrees. He would like to add a link (that only he or others using his link set will see) from the section in question to a critique that he has written and stored on his own server. The Web, however, provides no mechanism to add anchors into content maintained by others.
Joe Linker finds some additional information on XLink that would be good to provide as a resource, but unfortunately the material is in the middle of a very long Web page containing predominantly unrelated information. The information on XLink is not tagged with a named anchor that can be used as a link destination, so Joe has no way of linking to the information other than to link to the overall page—a rather unsatisfactory solution.
In each case, the scenario described is common but cannot be implemented using the current linking model supported by HTML. There are, of course, ways around these problems (i.e., technical "kludges" such as the use of frames to present related material), but they are invariably cumbersome, often ineffective, and expensive to maintain.
Difficulty in Backtracking through Links