Publisher: Addison-Wesley
Pub Date: July 23, 2002
ISBN: 0-201-70344-0
Pages: 304
The combination of Extensible Markup Language (XML) and its related interlinking standards brings a range of exciting possibilities to the realm of Internet content management. This practical reference book documents these critical standards, shifting theory into practice for today's developers who are creating tomorrow's useful, efficient, and information-rich applications and Web sites.
Blending advanced reference material with practical guidelines, this authoritative guide presents a historical overview, current developments, and future perspectives in three detailed sections. Part I provides a conceptual framework highlighting current and emerging linking technologies, hypermedia concepts, and the rationale behind the "open" Web of tomorrow. Part II covers the specifics behind the emerging core standards, and then Part III examines how these technologies can be applied and how the concepts can be put to efficient use within the world of Web site management and Web publishing.

Both detailed and authoritative, this book presents the most thorough documentation of XML's linking standards available, and it examines how today's enabling technologies are likely to change the Web of tomorrow.
Topics covered in-depth include:
- Hypermedia concepts and alternatives to the Web
- XML Namespaces, XML Base, XInclude, XML
- XPath, XLink, and XPointer concepts, strengths, and limitations
- Emerging tools, applications, and environments
- Migration strategies, from conventional models to more sophisticated linking techniques
- Future perspectives on the XPath, XLink, and XPointer standards
2.3 Usage Scenarios: Hypermedia Support for Information Utilization
References
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more information, please contact: U.S. Corporate and Government Sales
6.1 Snapshot of W3C's technical reports page
6.2 Container nodes, node points, and character points
7.2 Inline extended link with arcs
7.3 Out-of-line extended link
7.4 Out-of-line extended link with arcs
7.5 XLink and linkbases
6.2 XPointer Character Escaping (Example 1)
6.3 XPointer Character Escaping (Example 2)
7.1 Relation Between XLink Link and Element Types
7.2 XLink Element Type Relationships
7.3 Attribute Use Patterns for XLink Element Types
It gives me great pleasure to write a foreword to this book, which explains how to bring some of the rich results of years of hypermedia research to the World Wide Web, now the most important hypermedia publishing platform on the planet. It is wonderful to see XLink, XPointer, and XPath outside of the charmed circle of standards development and to see their purpose and application to hypertext clearly explained by authors and researchers who really understand them. You may have already read the intentionally terse language of some of the many standards defining the Web and wondered, "Why does the standard say to do things this way?"

If this has been your experience, relax and prepare to take a painless tour with skillful guides who will lead you through possibilities that are new to the Web, who will show you how they can be applied, and who will alert you to some of the problems that remain.

In a sense, this book celebrates the "marriage" of hypertext research, which has sometimes been remote from the workaday world, and the World Wide Web, which has become an integral part of commerce and entertainment. Before you start to enjoy the wedding feast on offer, I, like the Ancient Mariner, would hold you with a glittering eye for a while and talk about history before sending you on to the feast to learn the latest thing.
History comes to mind for a few reasons, one of which is that the title of this book contains the word "transclusion." This word is newer than "hypertext"; it is less well known and still clearly bears the maker's stamp of Ted Nelson, neologist extraordinary to the trade. Vannevar Bush's article about the Memex first sketched a clear vision of technologically assisted reading, writing, and thinking; but Ted Nelson and Douglas Engelbart moved those concepts into the world of digital computers, their natural home where they now thrive. In its invocation of Nelson, the title of this book harks back to the origins of hypertext in the text-processing experimentation of the sixties, when the idea of "wasting" computer power on documents and not using it for mathematical problems was revolutionary and somewhat subversive.
unusual today. Partly, this reflected an ambition to provide an all-encompassing solution; partly, it reflected the way systems were constructed in those days; but above all, it was a practical necessity. In this new area of computer applications, the researcher must have some solution to all the subproblems simultaneously (such as display, printing, data management, authoring, and composition support) in order to have a system at all. Later research systems produced more in-depth knowledge of particular issues but were also more narrowly targeted at exploring specific concepts and were, often, unwieldy to deploy at other sites or to integrate with other systems.
The Web bravely ignored many of the hard problems framed by researchers, and it did so by building a simple structure that would be useful immediately, integrate with as many systems as possible, and spread easily. The history of the Web over the last decade is deeply "intertwingled" with many research concepts; with diverse social communities, from physical scientists to humanists; and with the complex influences on the systems that inspired its creators. But the Web has grown so fast that some connections have been obscured, and some opportunities that are no longer hard problems have been delayed. This book explains some of these connections, including why they are important and how they will change things in the years to come.

The marriage is on. Come learn about new ways to link and to navigate that are now ready for "worldwide" application, and enjoy the cake!
David G. Durand
The Web has been growing and evolving at a phenomenal rate since its emergence in the early 1990s. Part of this evolution has been the development of increasingly sophisticated technologies and their utilization in developing complex applications. While the technical foundations have remained relatively unchanged—URLs and HTTP have remained stable for some time, and only HTML has changed frequently—the emergence of XML as a new format for the representation of content, along with a sequence of related developments such as XLink, has heralded a substantial change in the way content can be managed. The most significant of these changes is with respect to the hypermedia functionality that is enabled by these new technologies, particularly the richer linking and navigation models.
The title of this book includes the word "transclusion." This word—from Ted Nelson's work on the Xanadu system [Nelson 95]—describes an approach to including content via references that retain the original context. "Transclusion" and "transcopyright" are the two basic features of the Xanadu system, and the Web will definitely become more Xanadu-like in the coming years. Furthermore, the Web's new hypermedia functionality will make its structure more complex but also richer, more usable, and more informative. We believe this book will provide an effective guide to this development in the coming years.
Purpose of the Book
Our purpose in writing this book has been to explore and illustrate the possible hypermedia functionality introduced into the Web's architecture by XML and the accompanying XLink and XPointer standards. Today's focus in the use of XML is its application-specific, data-structuring capabilities. However, we believe that by effective use of XLink and XPointer, in conjunction with XML, hypermedia-rich applications can be created that will be more usable and effective than the current content-based HTML hypermedia model.
XPointer-enabled Web from both a conceptual point of view and a practical perspective. A conceptual view allows us to understand the types of advanced changes enabled by these technologies, and the implications of these changes for creating effective, maintainable, and usable applications. A practical perspective allows us to understand how these technologies are actually applied by developers, as well as to examine issues related to current tools, environments, and standardization processes.
The Book's Audience
We believe that XML, XLink, and XPointer and, in particular, the new hypermedia functionality enabled by these technologies will fundamentally change the Web. This book focuses on understanding and leveraging these changes and should therefore be interesting and useful for many people.
Web authors, developers, and project managers. So far, this group has been limited by HTML's primitive linking mechanism, and for many applications an understanding of this new hypermedia functionality will be beneficial. It will enable them to produce more sophisticated applications, both in terms of the way the content that underpins their site is managed and in terms of the functionality that can be created in the application front-end. This book provides an overview of the technology and presents concrete implementation strategies. To assist Web authors, developers, and project managers in being backwards-compatible, the book also provides transition strategies.
Web users. In many cases, Web users are very interested in what the future of Web technology can bring them. In particular, updated features are often the main motivation for upgrading to a newer version of a browser or other software, so Web users should be well informed about the improvements available with the most recent software.
Students. In courses as diverse as information studies, software engineering, information systems, and library studies, students will benefit from understanding how the Web is likely to evolve in the future—
The Book's Content
In this preface, we discuss the changes in the Web and the role that emerging standards can play in developing a richer and more usable Web. In the introduction, we elaborate on this idea by exploring the emerging standards and, in particular, consider what we mean by information linking and the role it plays within the Web. The introduction provides a context for the broad focus of the book.
The rest of the book is divided into three main parts. Part I focuses on a conceptual framework. It explores the Web we might wish to develop and the emerging linking technologies that may go some way toward providing it. We start in chapter 1 with a consideration of current technology. We focus on the limitations inherent in this technology, particularly with respect to linking and the implications for information handling, navigation, and retrieval. Chapter 2 provides information about the motivation for the types of changes we are promoting. We start by exploring linking issues in much more detail, looking at hypermedia concepts and some of the historical hypermedia developments, which provides useful insights into how information might be better managed. We also provide relevant definitions that clarify much of the terminology used in the rest of the book. This chapter concludes with a typical scenario that illustrates the types of Web changes that might currently be desirable. Chapter 3 begins the process of considering the new and emerging technologies that enable the vision we have begun to establish in the first two chapters. Rather than describing the technologies from the syntactic level (where their applicability may be difficult to put into the context of the discussions in the previous chapter), we first consider standards such as XPath, XPointer, and XLink from a conceptual viewpoint, looking at the types of support they provide for sophisticated linking and content management. This discussion is supported by XML fragment examples as a way of introducing these concepts through a process of illustration.
Then, Part II of the book gets down to the specific details of the new

readers familiar only with the more "traditional" Web technologies, such as HTML and HTTP, should first read this chapter.
In chapters 5, 6, and 7, we look in detail at three of the key technologies that enable our vision: XPath, XPointer, and XLink. In each case, rather than simply presenting the standard, we explain the concepts and, wherever appropriate, the strengths, limitations, and ambiguities of the standard. As such, it is important that these chapters be read in conjunction with the relevant standards. This, in turn, raises an important point: The XPointer and XLink standards have been evolving continually during the writing of this book and are likely to continue to evolve. This means that you will need to be careful in interpreting some of the comments here. In particular, at the time of this writing, the current status and version of the most relevant standards are as follows:
XML Path Language (XPath): W3C Recommendation (16 November 1999) [Clark & DeRose 99]

XML Pointer Language (XPointer): W3C Candidate Recommendation (11 September 2001) [DeRose+ 01b]

XML Linking Language (XLink): W3C Recommendation (27 June 2001) [DeRose+ 01a]
This means that the standards as they are today are not going to change; but since adoption has been slow so far, actual implementations may differ from these standards, and the standards may have to be reworked.[1] Currently, there is no sign that this is going to happen, but readers should regularly check the W3C Web site at http://www.w3.org—in particular, the technical reports page at http://www.w3.org/TR/—to look at the latest versions of the standards. We will also track standards development on the book's Web site—http://transcluding.com.
discussions are in the context of current practical limitations imposed by available infrastructure, environments, and tools (or lack of tools). In chapter 10, everything is drawn together, and we make some final comments, particularly with regard to our own perspectives on the future of XLink and XPointer.
Acknowledgments
The authors would like to acknowledge the assistance of a number of people in the preparation of this book. Obviously, the W3C in general and the developers of the XPointer and XLink standards in particular deserve special mention. Specifically, we wish to acknowledge the efforts of Steve DeRose, Eve Maler, David Orchard, and Ron Daniel in developing and promoting these key standards.
We would also like to acknowledge the original ground-breaking work of Theodor Holm Nelson on early hypertext systems. Many of the concepts that are only now being woven into the framework of the Web were originally proposed by Ted 30 or more years ago. His contribution to the field is without parallel, and his vision for hypermedia is one that we are still trying to appreciate and live up to.
The assistance and support of the Addison-Wesley editorial staff has been excellent. In particular, we would like to acknowledge the assistance of Mary O'Brien and Marilyn Rash, who never gave up on us, even when we were missing deadline after deadline. Thanks!

And on a personal note, the support of Catherine Lowe and Jacqueline Schwerzmann has been beyond value.
Dr. Erik Wilde is lecturer and senior researcher at the Swiss Federal Institute of Technology in Zürich (ETH Zürich), Switzerland. To find out more about Erik and his activities, visit his Web site at http://dret.net.

Dr. David Lowe is an associate professor and associate dean (teaching and learning) in the faculty of engineering at the University of Technology, Sydney. He has active research interests in the areas of Web development and technologies, hypermedia, and software engineering. In particular, he focuses on Web development processes, Web project

of Technology, Sydney, P.O. Box 123, Broadway, NSW 2007, Australia, or mailto:david.lowe@uts.edu.au.
The World Wide Web has undergone astounding growth since its emergence in the early 1990s. There is a plethora of statistics that attest to this expansion—the number of users, the number of pages that are available, business expenditure on Web technologies, consumer expenditure through e-commerce sites, and so forth.[2] These statistics usually focus on technical growth and tend not to capture the more fundamental and unprecedented changes in business, the world economy, and, perhaps most significantly, social structures.
And these changes will accelerate as we continue to head toward an ever richer online environment. Commercial interactions and support for business processes will become more complex and, at the same time, more central to both business and government activity. We will see progressively more pervasive, sophisticated, and diverse user experiences as we move toward the emerging vision of a semantic Web (i.e., a Web that supports automated retrieval, analysis, and management of resources).
Of importance in this rapidly evolving environment is the convergence of a substantial number of emerging technologies and standards. These technologies (or maybe acronyms would be a better name!) include, for example, RDF, SMIL, WAP, WebML, DOM, CSS, PICS, PNG, SVG, WAI, and many more. A quick look through the World Wide Web Consortium's (W3C's) list of technical recommendations, proposed recommendations, and working drafts (see http://www.w3.org/TR/) illustrates the breadth of work being considered.
One of the most fundamental, widely discussed, and far-reaching technologies is the Extensible Markup Language (XML). Viewed simplistically, XML provides a mechanism for representing in a powerful way the data that underpins the Web. But a representation of the data is not sufficient to enable systems and users to interact with, utilize, and communicate with that data—a representation of the ways in which different data items are interrelated is also required. Effectively, some form of linking model is necessary. For this model to be useful for the
This book is intended to help Web developers understand the evolving standards supporting linking within XML and the implications of these standards for managing information and constructing sophisticated applications. In particular, we consider the ways in which these standards will lead to a fundamentally richer Web environment and user experience.
Information Linking
Linking is a fundamental concept that forms an important part of the theoretical foundations of the Web. Without linking, the Web is just an extremely large collection of (albeit very sophisticated) distributed information and applications. With linking, the Web becomes a single complex system.
Linking allows us to associate semantically related items of information so that we can support sophisticated techniques for locating those items. But it goes way beyond that. We can link information to tools for manipulating that information. We can link the various steps in processes (such as the steps in buying a book online). But we can also do more sophisticated linking, such as implementing dynamic links that change depending on the context (time, user, history, etc.) or constructing new documents by merging content or applications from diverse (but linked) locations. Linking effectively allows us to create a complexly structured network of distributed resources—a "Web."
The concept of linking information resources has been around for a considerable period of time, predating the Web by at least 45 years. The concept of associations between items of information (at least as a technically supported aid to information management) was originally introduced by Vannevar Bush [1945] in the 1940s. The concept essentially remained an obscure idea until the 1960s, when it was revived by farsighted researchers such as Ted Nelson [1993] and Doug Engelbart [1988]. Indeed, it was Ted Nelson who coined the terms "hypertext" and "transclusion." His Xanadu system encapsulates many of the sophisticated information structuring and management concepts now being investigated for the Web. Engelbart's work envisaged the user and
human capabilities.
This work then spawned a growing body of research and development of a number of systems within the hypertext community. These systems evolved during the 1970s and 1980s and gradually came to include very diverse and sophisticated concepts: set-based association, multiple source and destination links, dynamically adapted links, generic links that are sourced from all content satisfying certain criteria, spatial representations of the link associations, and so forth. This richness in linking concepts reflected the maturing ideas of how information can be managed and, in particular, how we interact with this information.
The Web
Then, in the 1980s, Tim Berners-Lee started experimenting with these concepts. In 1990, he developed (at CERN, the European Organization for Nuclear Research) a relatively simple implementation that was initially intended to allow him and his colleagues within the high-energy physics community to collaborate through rapid information sharing [Berners-Lee 92]. In the next decade, Berners-Lee's ideas became the catalyst, along with various related convergent technologies, for a frenzy of business and consumer activity that has completely transformed the world economy and is fundamentally changing our social structure.
The model originally proposed by Berners-Lee—that of a simple communication protocol (Hypertext Transfer Protocol, or HTTP) that allows documents to be requested from remote servers, coupled with a document format (Hypertext Markup Language, or HTML) that supports references to related documents—ignored almost all of the sophisticated linking concepts that had evolved during the previous 30 years.
The linking model adopted by Berners-Lee was very simple: single-source, single-destination links embedded into the source document. This, however, is not a criticism of the choice to adopt this model. Indeed, it was almost certainly partly this choice of a simple model that led to the success of the Web.
attractive. As the number of users increases, the amount of information increases, and the likelihood of additional users (and providers of information) increases. The simplicity of the original model adopted by Berners-Lee made it very easy for people to provide content and set up servers and clients. This in turn rapidly led to a critical mass and the subsequent rapid adoption and evolution of the Web. A more complex model incorporating some of the sophisticated information linking and management functionalities found in other hypertext systems would have slowed the adoption, possibly to the point where the critical mass was not reached.
It is worth pointing out that, apart from the Web, the only commercially successful hypertext systems have been applied to particular niche markets. Hence, there has been a much stronger justification for the effort required to create sufficient content. Hypercard and Storyspace are two good examples of systems that have managed to develop small but active niche markets. But even these have been tiny efforts in comparison with the Web.
However—and here is a key point—while the simplicity of the original Web model may have led to its initial success (and what a success it was!), it also meant that much of the richness that had been developed in previous hypertext and information management systems was lost. This was not originally a great problem; but as the Web matured and evolved, these limitations began to place constraints on the ways in which the Web could be used. As just one simple example, the lack of any form of state (i.e., memory about the history of interaction between a server and a client) originally led to concepts such as cookies and server session variables, then complicated the issues of secure transactions, and ultimately made systems that adapted to users' specific needs unnecessarily complex. Much of the technical evolution and innovation over the last few years has been a consequence of trying to circumvent or remove limitations such as these. In some cases, this has included the integration of richer hypermedia systems into the Web architecture, for example, the development of systems such as Webcosm [Hall+ 96] and Hyperwave [Maurer 96] (both to be discussed later in the book). This has,
More recently, various other Web developments have gained attention as ways to circumvent the Web's original limitations. One of the most

vocabulary and grammar for documents—the allowable elements and their valid usage. Essentially, the result is that XML documents support (or at least support much better than HTML documents) aspects such as data exchange, automated processing of content, and customized views of the content.
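As a minimal sketch (the element names and content here are invented for illustration, not taken from any particular application), an XML document type can declare its own vocabulary and constrain how elements nest, which is what enables the automated processing described above:

```xml
<?xml version="1.0"?>
<!-- A hypothetical vocabulary: a book has a title and one or more chapters. -->
<!DOCTYPE book [
  <!ELEMENT book (title, chapter+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT chapter (#PCDATA)>
]>
<book>
  <title>An Example Book</title>
  <chapter>Current Technology</chapter>
</book>
```

A validating parser can reject any document that violates these declarations, so consumers of the vocabulary can rely on its structure.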
Accompanying XML and supporting it in various ways is a series of related standards and technologies. For example, the Extensible Stylesheet Language (XSL) is composed of several components: XSL Transformations (XSLT) supports the conversion of XML documents into other (possibly XML) documents, and XSL Formatting Objects (XSL-FO) supports turning the results of a transformation into a form that can be rendered. Extensible HTML (XHTML) is the reformulation of HTML as an application of XML, thereby supporting the transition from HTML to XML. XML Schema is a schema language for XML documents, which in the future will probably replace XML's built-in mechanism for defining schemas.[3] Other examples include XML Information Set, XML Query, and XForms.
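As a small sketch of the transformation idea (the source element name `headline` is invented; the namespace URIs are the standard XSLT and XHTML ones), an XSLT template can rewrite elements of one vocabulary into another:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Rewrite each <headline> element of the source document
       as an XHTML <h1>, keeping its content. -->
  <xsl:template match="headline">
    <h1 xmlns="http://www.w3.org/1999/xhtml">
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
</xsl:stylesheet>
```

The same source document can be run through different stylesheets to produce different renderings or exchange formats.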
In effect, the XML family of standards provides an opportunity to introduce into the Web environment some of the richness that was missing from the original Web architecture. An enormous amount has
as information management, business processes, and the user experience—especially when combined with related technologies such as Resource Description Framework (RDF), Cascading Style Sheets (CSS), and Document Object Model (DOM). However, one aspect that has been overlooked to a certain extent is linking. A key piece of the maturing XML jigsaw is the linking model within XML—as provided by the emerging W3C recommendations on XLink and XPointer. This linking model provides a much more sophisticated approach to linking than the original Web model. Indeed, it comes close to the sophistication of the various hypertext models that pre-dated the Web—but within the context of the distributed diversity offered by the evolving Web.
The linking model that underpins XML is being developed by the W3C XML Linking Working Group. As stated by the W3C, "the objective of the XML Linking Working Group is to design advanced, scalable, and maintainable hyperlinking and addressing functionality for XML." The model that has been developed is composed of several components. The first of these is the XML Pointer Language (XPointer). XPointer builds on the XML Path Language (XPath) to provide a mechanism for addressing XML-based Web resources by defining a powerful way to address fragments of an XML document. These fragments may be single XML elements or a collection of elements, text strings, attributes, and so forth that have been merged into one composite.
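As a hedged illustration (the document, element names, and URI are invented; the pointer syntax follows the XPointer drafts current at the time of writing), an XPointer appended to a URI as a fragment identifier can address part of a document:

```xml
<!-- A hypothetical target document, report.xml: -->
<report>
  <section id="intro">
    <para>The Web emerged in the early 1990s.</para>
    <para>Linking predates the Web by decades.</para>
  </section>
</report>

<!-- An XPath-based pointer part selecting the second para of the
     section whose id attribute is "intro":
       http://example.org/report.xml#xpointer(//section[@id='intro']/para[2])

     Shorthand bare-name form, selecting the element with ID "intro":
       http://example.org/report.xml#intro -->
```

Note how the `xpointer()` part is essentially an XPath expression, which is why XPath is covered before XPointer later in the book.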
The second component is the XML Linking Language (XLink). XLink is used to describe complex associations between information resources identified using URIs, possibly with added XPointers. These associations can be simple directional embedded links between two resources (as with HTML) or much more complex associations. Examples include multidirectional links, multiple-destination links, and inlining of content from one source to another, as well as automatically merged fragments sourced from multiple disparate documents.
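As a sketch (the `xlink:` attributes and namespace come from the XLink recommendation, but the element names `citation`, `related`, `doc`, and `go` and all URIs are invented for this example), a simple link and an out-of-line extended link might look like this:

```xml
<!-- A simple link, roughly equivalent to an HTML <a> element: -->
<citation xmlns:xlink="http://www.w3.org/1999/xlink"
          xlink:type="simple"
          xlink:href="http://example.org/report.xml"
          xlink:show="replace"
          xlink:actuate="onRequest">the full report</citation>

<!-- An out-of-line extended link associating three remote resources,
     with arcs declaring which traversals are possible: -->
<related xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <doc xlink:type="locator" xlink:href="http://example.org/a.xml" xlink:label="paper"/>
  <doc xlink:type="locator" xlink:href="http://example.org/b.xml" xlink:label="review"/>
  <doc xlink:type="locator" xlink:href="http://example.org/c.xml" xlink:label="data"/>
  <go xlink:type="arc" xlink:from="paper" xlink:to="review"/>
  <go xlink:type="arc" xlink:from="paper" xlink:to="data"/>
</related>
```

Unlike an HTML anchor, the extended link lives outside the documents it connects, so links can be added to resources the author cannot modify.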
Conclusions
In this book we look at the richer linking enabled by the emerging XLink and XPointer standards (as well as XPath, which is the foundation of

Ultimately, we see this book as a motivation for integrating hypermedia concepts into Web-related projects. These concepts may not necessarily always or only map to XLink constructs (though this mapping is the focus of much of this book). At present, many of them can be supported using DHTML (Dynamic HTML); tomorrow they will be supportable by XLink; and in five years' time they may be supported by some new technology. The key point is that these concepts should be captured somehow, so that they can be made available in whatever form is supported by the current "publishing system." We don't intend to try to sell XLink as the hypermedia technology to end all other hypermedia technologies, but as one step in the evolution of the Web and a good reason to start thinking about the Web as a hypermedia system.
[1] This is not the way standards are supposed to develop, but it may happen. For example, HTML standards for some time more or less simply tracked what the two major browser providers had already implemented.
Current Technology
Hypermedia Concepts and Alternatives to the Web
Conceptual Viewpoint
discussing the Web's current linking model and its limitations in section 1.3, "Information Linking in the Web."
1.1 The Internet Environment
One of the main results of the Web's success was to bring the Internet into the consciousness of the general public. Before the advent of the Web, the Internet was mainly used by academic institutions for research purposes. Within ten years, however, the Internet has become the backbone of the information society, propelled mainly by the success of the Web and electronic mail. The public often confuses the Internet and the Web because much too often the terms are used interchangeably. However, there is a clear difference between them, defined as follows:
Internet— The entirety of all computers that are interconnected (using various physical networking technologies) and that employ the Internet protocol suite on top of their networking systems. The Internet protocol suite implements a wide-area, packet-switched network that interconnects networks using different protocols and connection characteristics.
World Wide Web— A distributed hypermedia system built on top of some of the services provided by the Internet, the most important being the naming service provided by the Domain Name System (DNS), and
1.1.1 Connecting to the Internet
Most computers today are connected to a computer network in one form or another. On the other hand, most individual home computers are not running constantly, or if they are, they are not permanently connected to a computer network, mainly for economic reasons (because connection costs are often based on connection time). It is therefore possible to differentiate between two different connection modes:
Dial-up connections. This kind of connection is opened on demand, for example, if a user wants to view Web content or send or receive e-mails. In most cases, the initial connection between the user and the user's Internet service provider (ISP) is established over a phone line using a modem. The basic modem connection establishes a data path between the user and the ISP, which is then used to send Internet data packets back and forth, often using the Point-to-Point
Permanent connection. For many applications (such as servers, which must be available all the time), a permanent connection to the Internet is required or desirable. One popular technology allowing home users to achieve this is a cable modem, which works over a cable television network. Another technology is asymmetric digital subscriber line (ADSL), standardized in ITU-T G.992.2 [ITU 99], which works over phone lines. For corporate users, ISPs often offer leased lines, which have much greater capacities than modem connections. For internal distribution inside a company, local area networks (LANs) are used to interconnect all computers to form a so-called intranet.
The decision about whether to connect a particular computer permanently to the Internet is dictated by a number of issues, such as the purpose of the computer, the remotely accessed services running on the computer, and available ISPs and their connection costs. While today many home computers still use dial-up connections, this will change with the expansion of existing services and the introduction of new offerings, such as cheap wireless services, ADSL, cable modems, satellite connections, and new buildings providing Internet connectivity as a basic service in the same way water, electricity, and phone lines are provided today.
behave correctly as an Internet host. The most basic requirement is that Internet hosts must be able to send and receive data packets (called datagrams) using the Internet Protocol (IP), which is standardized in Internet RFC 791 [Postel 81a]. IP provides the functions necessary to deliver a package of bits (an Internet datagram) from a source to a destination over an interconnected system of networks. And herein lies the strength of the Internet: it does not depend on a specific underlying physical network; it can be used on top of virtually any networking technology.
In the context of Web technologies, the most important protocol is the Transmission Control Protocol (TCP), which is standardized in RFC 793 [Postel 81b]. TCP is layered on top of IP to provide a reliable, connection-oriented, flow-controlled transport service for applications. IP is capable of sending datagrams but does not guarantee that these datagrams will be delivered. IP datagrams can get lost, they can be duplicated, or they can arrive at the receiver side in a different order from the one in which they were sent. TCP deals with all these possibilities and provides applications with a transport service that, for many application scenarios, is better suited than IP's service. TCP does so by assigning sequence numbers to individual packets and employing a number of elaborate mechanisms to make sure that the underlying network is not overloaded. Many Internet application protocols use TCP as the transport protocol, the most relevant being the Web's Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP), which is used for the exchange of electronic mail.
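The core idea behind TCP's reordering and de-duplication can be illustrated with a deliberately simplified sketch. This is a toy model, not the actual TCP algorithm, and the packet contents are invented for illustration:

```python
# Toy model of TCP-style reassembly: packets carry sequence numbers,
# may arrive out of order or duplicated, and the receiver restores
# the original byte stream before handing it to the application.
packets = [(2, b"lo, "), (1, b"Hel"), (3, b"Web"), (2, b"lo, ")]

# De-duplicate, then sort by sequence number to recover the stream.
stream = b"".join(data for _, data in sorted(set(packets)))
print(stream.decode())  # Hello, Web
```

Real TCP numbers bytes rather than packets and also handles retransmission and congestion control, but the principle of restoring order from sequence numbers is the same.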
Apart from providing applications with transport protocols, the Internet environment also includes services. By far the most important service is the Domain Name System, which is standardized in RFCs 1034 and 1035 [Mockapetris 87a, 87b]. DNS implements a globally distributed database, which is used to map domain names to IP addresses. On the protocol level (IP or TCP), Internet hosts are addressed by IP addresses, which are 32-bit numbers, often written in the so-called dotted decimal notation. In this notation, IP addresses have four decimal numbers, such as 129.132.66.9, the IP address of the Web server of this book's Web site. However, since such a number is hard to remember,[2] a naming system has been introduced that makes it possible to use hierarchically structured names for Internet hosts. The DNS name for this book's Web site is transcluding.com; and whenever a browser is trying to access this name, it must first be resolved to the IP address using a DNS request. Therefore, most Web browser interactions with the Internet involve two steps: first resolving the domain name by requesting its IP address from a DNS server,[3] and then connecting to this address and requesting a Web page.
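The relationship between dotted decimal notation and the underlying 32-bit number can be made concrete with a short sketch (Python is used here purely for illustration and is not part of the book's examples):

```python
import socket
import struct

def dotted_to_int(address: str) -> int:
    """Convert a dotted decimal IPv4 address to its 32-bit number."""
    return struct.unpack(">I", socket.inet_aton(address))[0]

def int_to_dotted(number: int) -> str:
    """Convert a 32-bit number back to dotted decimal notation."""
    return socket.inet_ntoa(struct.pack(">I", number))

ip = "129.132.66.9"
n = dotted_to_int(ip)
print(n)                 # the single 32-bit number behind the four decimals
print(int_to_dotted(n))  # round-trips back to the dotted form

# The resolution step itself would be, for example:
# socket.gethostbyname("transcluding.com")  # requires network access
```

Each of the four decimal numbers is simply one byte of the 32-bit address, which is why no component can exceed 255.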
1.2 The World Wide Web
In the previous section, we briefly described how the Internet is used to support applications that use computer communications in general and how the Internet supports the Web in particular. Apart from the underlying supporting infrastructure provided by ISPs, what are the most important technologies that make up the Web? Initially, these were the Uniform Resource Locator (URL) for addressing resources, HTML for representing documents, and HTTP for the communication between a server and a client.
Today, these basic technologies are still the same—even though the concept of the URL has been extended to the Uniform Resource Identifier (URI). However, many technologies have been introduced to supplement this basic architecture. The most important technologies for the topic of this book are described in chapter 4. The basic idea of the Web is very simple: a browser requests a document (or some other resource) from a server using the HTTP protocol, which then provides an appropriate response, also using the HTTP protocol. The response may or may not contain the requested document. The requested document will typically be written using an appropriate language (such as HTML), and sections of the document's information (known as anchors) will contain references (known as links) to other documents. The user reads the document and can then, if desired, select one of the anchors. The browser then interprets the reference by extracting the URI of the referenced document (part of which is the name of the server on which the document resides) and requesting the referenced document from that server.
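This request/response exchange can be sketched with Python's standard library. The server and the document it returns are stand-ins invented for this sketch, not the book's examples; a real browser would first resolve the server's DNS name, whereas here we bind directly to a local address:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class Handler(BaseHTTPRequestHandler):
    """A toy server: every GET request is answered with a small HTML document."""
    def do_GET(self):
        body = b"<html><body><a href='doc2.html'>a link</a></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Bind to an ephemeral port on localhost; no DNS step is needed here.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/index.html"
with urlopen(url) as response:   # the browser's role: send GET, read response
    status = response.status
    html = response.read().decode()
server.shutdown()

print(status)
print(html)
```

The client side is exactly the two HTTP steps described above: open a TCP connection to the server and issue a GET request for the resource named in the URI.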
However, this diversity of new content formats has also led to many incompatibilities. The interpretation of a specific content format must be implemented in the user's browser, and if there are dozens of content formats all requiring special plug-ins or external viewers, then the Web becomes less universal than it was designed to be. After a period of uncontrolled competition between Netscape and Microsoft in the mid-1990s, during which both companies continually improved HTML and the Web architecture with new inventions incompatible with the contender's products, the situation has greatly improved. The World Wide Web Consortium (W3C), an industry consortium headed by the Web's inventor, Tim Berners-Lee, now strives to define common standards; and the major players are all part of the standardization efforts.
The idea of the semantic Web, as described by Berners-Lee in 1999, is a Web in which all of the information available is not only machine-readable (as it is now) but also machine-understandable. To achieve this goal, the basic standards underlying many Web technologies must be able to carry as much semantic information as possible. Put simply, this means that the meaning of the content needs to be somehow encapsulated in the representation of that information.
Essentially, the two most important issues when trying to develop technologies for a semantically richer Web are how to attach meaning to Web content, as can be done by using the Resource Description Framework (RDF) described in chapter 4, and how to develop a linking model that supports a more diverse range of linking possibilities than the one currently supported by the Web. This book focuses on the second issue, and chapters 2 and 3 explore why Web linking should be improved and how it can be improved.
1.3 Information Linking in the WWW
models developed in a diverse range of hypertext systems over the last 30 years. It has, however, been sufficient for the initial development and adoption of the Web. At the end of this section and in the next chapter, we discuss the limitations that this model imposes and the need for a richer model if the Web is to continue to grow and evolve.
1.3.1 The Web's Linking Model
One of the key factors in the Web's growth in the last decade has been the way in which it enables users to move about and explore an enormous information space. Several key characteristics have contributed to this ability. The first is the distributed nature of the Web—the HTTP protocol supports access to information from remote sources in a seamless fashion. The second is the way in which the information is interlinked—interlinking is supported through several mechanisms, but most notably through the linking mechanisms associated with HTML and URIs.
At its simplest, a link between items of information requires three components: a source, a destination, and the connection between these two (shown in Figure 1.1). Although these might be quite obvious, it is worth dissecting them a little. The mechanism for representing link sources (and possible destinations) is HTML's <a> element. Effectively, everything between the start and end tags of this element forms a marked section of text, also called an anchor. For example, consider the following HTML fragments:
Figure 1.1 Basic linking components
Textual data has been widely used to construct and manipulate information. We can define
<a href="doc2.html#example">anchors</a> within this text, which are used as the basis for link sources and destinations
<link rel="stylesheet" href="trans.css" type="text/css"> </head>
<body bgcolor="#EEEEEE">
<img src="header.gif" alt="Linking Example">
.
</body>
This example illustrates three other forms of linking. The first is where a document is associated with another one—in this case, a stylesheet. The association is not one based on the content of the document, but rather on the way in which the document is to be presented. This type of relationship is not one that would be explicitly used by (or even be apparent to) a user. Rather, it is used by the software manipulating the document—most likely a browser.[4] Nevertheless, the association between the document and its stylesheet could provide useful information to tools that facilitate access to information (especially where the stylesheet contains named styles). For example, a search engine could use this type of association to allow a user to retrieve all Web pages that contain text paragraphs formatted as margin-note.
The second case of linking in the example just given is a little more subtle. We have several meta tags containing name-content pairs, providing metadata about the document—information that describes the document rather than being part of the document itself. This metadata is typically not displayed by a browser but is used in analyzing or searching the document. In effect, we have information that has been explicitly associated with the document content. This can be viewed as a degenerate case of linking and is usually not even considered to be linking.
The third case is where an image has been embedded into a document. This is similar to the first case just mentioned. Both represent an association between two different files. The difference, however, is in the implied semantics of the association. With the image, the semantics are (usually) interpreted as "when showing this document, show the image embedded into it at this location." The semantics of the association to the stylesheet are that the stylesheet is used to format elements of the document. In other words, the various mechanisms for creating associations in HTML have very different implied semantics.
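These different association mechanisms can be extracted mechanically, which is a useful way to see that they all reduce to references between resources even though their semantics differ. The sketch below uses Python only for illustration; the sample document reuses file names from the fragments above, while the meta tag's name and content are hypothetical:

```python
from html.parser import HTMLParser

class AssociationExtractor(HTMLParser):
    """Collect the kinds of associations HTML can express: navigational
    links (<a href>), stylesheet associations (<link>), metadata (<meta>),
    and embedded content (<img src>)."""

    def __init__(self):
        super().__init__()
        self.associations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.associations.append(("link", attrs["href"]))
        elif tag == "link" and "href" in attrs:
            self.associations.append((attrs.get("rel", "related"), attrs["href"]))
        elif tag == "meta" and "name" in attrs:
            self.associations.append(("metadata", attrs.get("content", "")))
        elif tag == "img" and "src" in attrs:
            self.associations.append(("embed", attrs["src"]))

document = """
<head><link rel="stylesheet" href="trans.css" type="text/css">
<meta name="keywords" content="linking"></head>
<body><img src="header.gif" alt="Linking Example">
<a href="doc2.html#example">anchors</a></body>
"""

parser = AssociationExtractor()
parser.feed(document)
print(parser.associations)
```

Note that the extractor can only record that an association exists and which attribute it came from; the implied semantics ("display this image here" versus "format the document with this stylesheet") are not expressed in the markup itself, which is precisely the point made above.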
1.3.3 Shortcomings of the Web Linking Model
becomes more sophisticated. We should again emphasize that this is not a criticism of the model that was adopted—the simplicity of this model is one of the key reasons the Web was originally so successful. A richer, more flexible, but consequently more complex approach to linking would have made Web technologies and concepts more difficult to understand and manage and hence would have hindered the early development of the Web. Nevertheless, this simplicity now needs to be addressed as part of the ongoing evolution and maturing of the Web.
So what are these limitations? We shall list some of the more significantconstraints and then discuss each of them in turn:
Embedded anchors limit the ability to link from or into read-only
material
Embedded unidirectional links make backtracking extremely difficult.

The lack of overlapping anchors creates problems.
Links lack any form of inherent contextualization
Single-source, single-destination links inhibit the representation ofcertain types of associations
Untyped links inhibit intelligent browsing and navigation
Embedded and untyped links restrict effective content managementand flexible information structuring schemes
Inability to Link from or into Read-Only Material
One of the most significant problems is the difficulty associated with linking either from or to content that is read-only or beyond the control of the link author. This is particularly problematic when content is being reused or is stored in a read-only format (for example, on a CD-ROM). Consider the following scenarios:
sequence of material collected from different Web sites. Ideally, he would create a trail through this existing material, either by adding a link to the end of each section, which links to the next, or by creating a new composite page (or node) that contains the relevant extracts (but without "cutting-and-pasting" the material—and hence infringing on the original author's rights).[5] In both cases, the existing HTML model of linking (and document composition) does not allow this.
Users may often wish to annotate Web pages belonging to someone else with their own comments and observations. For example, Joe Linker is searching for information on XML linking to include in his educational site. He locates a page on a remote server containing relevant information, but finds a small section within the main document with which he disagrees. He would like to add a link (that only he or others using his link set will see) from the section in question to a critique that he has written and stored on his own server. The Web, however, provides no mechanism to add anchors into content maintained by others.
Joe Linker finds some additional information on XLink that would be good to provide as a resource, but unfortunately the material is in the middle of a very long Web page containing predominantly unrelated information. The information on XLink is not tagged with a named anchor that can be used as a link destination, so Joe has no way of linking to the information other than to link to the overall page—a rather unsatisfactory solution.
In each case, the scenario described is common but cannot be implemented using the current linking model supported by HTML. There are, of course, ways around these problems (i.e., technical "kludges" such as the use of frames to present related material), but they are invariably cumbersome, often ineffective, and expensive to maintain.
Difficulty in Backtracking through Links