CONTENT DESCRIPTION MODEL AND FRAMEWORK
FOR EFFICIENT CONTENT DISTRIBUTION
ZHANG SHUTAO
(B Eng (Hons.) NUS)
HT00-6864A
A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING NATIONAL UNIVERSITY OF SINGAPORE
2005
I owe my deepest gratitude and appreciation to my thesis supervisor, Dr Chi Chi-Hung, for giving me the opportunity to work with him and my lab mates. I thank him for his continued guidance, insight, patience, encouragement, and above all, his confidence in me, without which this thesis would not have been possible. I am grateful to him for all the time and effort he has spent in helping me improve my research and this document. I would also like to thank Dr Chi Chi-Hung for advising me on how to choose my career path at this important stage of life.
I sincerely thank all my lab mates for offering me much-needed assistance and for sharing their invaluable insights during my research. Special thanks to my dear friend Wang Hong-Guang for his sincere help and encouragement during the most difficult time of my research. I also want to thank Yuan Jun-Li and Li Qi-Ming for sharing their valuable advice on my research experiment.
Finally, I would like to express my immeasurable appreciation to my wife, my parents and my parents-in-law for their love, trust, inspiration and understanding.
Contents
Summary
List of Figures
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Framework for Customized Content Delivery
2.2 Content Description Model
2.3 Client Descriptions
2.4 Server Side Approaches
2.5 Existing Software Tools
2.6 Summary
Chapter 3 A General Content Description Model
3.1 General Settings
3.2 Proposed Content Description Model
3.2.1 Web Objects
3.2.2 Object Description Scheme
3.2.3 Discussion
Chapter 4 A Framework for Efficient Content Distribution
4.1 Design Objectives
4.2 Overall Architecture
4.3 Server Operations
4.4 Proxy Operations
4.4.1 Mapping User Descriptions to Content Descriptions
4.4.2 Managing Local Content Descriptions
4.5 User Operations
4.6 Summary
Chapter 5 A Case Study on the Framework
5.1 Simulation Setup
5.2 Web Object Size
5.3 Web Object Latency
5.4 XHTML Page Latency
5.5 Summary
Chapter 6 Conclusion
Reference
Summary
Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points with different capabilities. To accommodate the needs arising from heterogeneous user preferences and device capabilities, Web intermediaries, called proxies, have started to perform various functions, including Web content caching and image transcoding, on the Web content before it is distributed to the users. As different functions require different content semantic information, which we refer to as content descriptions, Web servers host a large number of content descriptions to help proxies perform these functions.
Under the heterogeneous environment, efficient content distribution has become a problem due to a few challenging issues. First of all, it is not clear how a proxy should decide which functions to perform given any user preferences and device capabilities, because it is not easy, if possible at all, for every proxy to understand every type of device and user, and users may not know all the functions provided by proxies, either. If this is not properly handled, we may end up delivering unacceptable content to users. Secondly, to provide semantic information about different attributes of Web content, the server may need to store a large number of content descriptions. Delivering all the descriptions of a Web page to a proxy when the Web page is requested may be highly inefficient, because the proxy may need only a small fraction of the content descriptions to perform the desired functions. Thirdly, repeatedly delivering the same content descriptions to the same proxy is unnecessary. So far, however, there is no mechanism for a proxy to properly cache and reuse content descriptions that it has already retrieved.
In this thesis, we propose a content description model and framework for efficient content distribution. The content description model employs ideas from the Resource Description Framework [3] and External Annotation [2], which allow flexible descriptions of Web content. The model also allows a server to efficiently select any subset of the descriptions of any Web page and deliver them to a proxy. The framework consists of several algorithms: for the proxies to map user preferences and device capabilities to a set of functions to be performed, for the server to select and deliver the necessary content descriptions to the proxy, and for the proxy to efficiently cache and reuse the content descriptions.
To evaluate the performance of our framework, we conduct a simulation study with certain simplifications (the details are given in Chapter 5). We employ real-world Web objects identified from network traces, and study how our content description model and framework reduce the size of the Web objects, the delay in retrieving Web objects, the number of chunks in HTTP responses, and the delay of entire Web pages. We give some preliminary results and some discussion.
List of Figures
2.1 ICAP Response Modification
2.2 ICAP Request Modification
2.3 InfoPyramid Model
3.1 General Settings
3.2 Description for a Simple XHTML Page
4.1 The Framework Overview
4.2 Mapping User Descriptions to Functions
4.3 Mapping Functions to Set of Attribute Descriptions
4.4 Caching and Validation for Content Descriptions
4.5 Managing Local Attribute Descriptions
5.1 A Sample Content Selection Flow
5.2 Web Object Size Reduction
5.3 Web Object Latency Reductions
5.4 Chunk Number Distribution for Web Objects
5.5 HTML Chunk Number Reduction
5.6 XHTML Page Latency
5.7 Effect of Different Parallel Connections with User Description D1
5.8 Effect of Different Parallel Connections with User Description D2
5.9 Effect of Different Parallel Connections with User Description D3
5.10 Effect of Different Parallel Connections with User Description D4
5.11 Effect of Different Parallel Connections with User Description D5
Chapter 1
Introduction
The Internet keeps growing rapidly, according to recent surveys [7, 8, 9, 18]. The World-Wide Web (or Web for short), which is based on the Hypertext Transfer Protocol (HTTP), has become the main platform for information distribution on the Internet. Thompson et al. [9] conducted a study on InternetMCI’s backbone and found that Web traffic occupied more than half of the total Internet traffic.
Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points, including personal computers and workstations on traditional wired networks, and devices based on more recent wireless technologies.
Wireless devices such as smart phones, palm-top devices, and laptop computers are playing a very important role on the Internet. All these Web-accessing devices have various capabilities due to their widely diversified hardware computation power (e.g., processor speed, memory size, I/O capability), software configuration (e.g., operating system, Web browser, audio-visual applications), and network access methods (communication media and bandwidth).
Besides devices being heterogeneous in their capabilities, users may also have different preferences for Web access, which may vary in several aspects such as privacy, advertising, latency, and so on. Consequently, different users may require different treatments of the Web content, based on their own preferences and device capabilities.
To accommodate the needs of heterogeneous users and devices, network nodes between servers and end users have started to perform various functions on the Web content before it is distributed to the users. These network nodes are often referred to as active Web intermediaries or proxies; in the rest of this thesis, we call them proxies. Below are some examples of the functions that are widely supported by proxies.
Web content caching [39, 40, 41]
To achieve fast access to Web content for users who are spread across a large range of different networks, one can employ a Web caching proxy on a subnet that temporarily stores copies of selected content provided by some server, so that a local user can obtain the content quickly from the local proxy instead of the remote server. These proxies have become very important for speeding up Web content distribution.
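The caching behaviour described above can be sketched in a few lines. This is only an illustration under stated assumptions and not part of the thesis: the class name `CachingProxy`, the `fetch_from_server` callback and the fixed `ttl` lifetime are hypothetical.

```python
import time


class CachingProxy:
    """Toy caching proxy that serves local copies until they expire (hypothetical)."""

    def __init__(self, fetch_from_server, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch_from_server   # callable: url -> content (the remote server)
        self._ttl = ttl                   # assumed fixed lifetime of a copy, in seconds
        self._clock = clock               # injectable clock, convenient for testing
        self._store = {}                  # url -> (content, expiry time)

    def get(self, url):
        entry = self._store.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]               # cache hit: no trip to the remote server
        content = self._fetch(url)        # miss or expired copy: ask the server
        self._store[url] = (content, self._clock() + self._ttl)
        return content
```

A second `get` for the same URL within the lifetime is answered locally; once the copy has expired, the proxy contacts the server again.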
Content adaptation [2, 3, 4, 6]
Different device capabilities and user preferences pose different constraints on what kind of content is acceptable to the users. To deliver only acceptable content to users, proxies can perform content adaptations, which include image transcoding [41], content transformation [4], content filtering [42], and so on.
To perform these functions properly, a proxy usually requires some semantic information about the Web content. We will refer to this kind of semantic information as content descriptions in the rest of this thesis. To support various functions, many content description models and frameworks have been proposed to provide semantic information about different attributes of Web content. For example, the Edge Side Includes (ESI) [1] language was proposed to describe attributes such as the expected expiry time, or Time-To-Live (TTL), of Web content to support dynamic content caching. The Extensible Device Independent Markup Language (XDIME) was proposed by Volantis [43] to describe content layout, image color and other attributes to support content adaptation for mobile devices.
Under the heterogeneous environment, efficient content distribution has become a problem. In the following, we address the challenging issues related to this problem one by one.
First of all, it is not clear how a proxy should decide which functions to perform given any user preferences and device capabilities, because it is not easy, if possible at all, for every proxy to understand every type of device and user, and users may not know all the functions provided by proxies, either. If this is not properly handled, we may end up delivering unacceptable content to users.
Secondly, to provide semantic information about different attributes of Web content, the server may need to store a large number of content descriptions. Delivering all the descriptions of a Web page to a proxy when the Web page is requested may be highly inefficient, because the proxy may need only a small fraction of the content descriptions to perform the desired functions.
Thirdly, repeatedly delivering the same content descriptions to the same proxy is unnecessary. So far, however, there is no mechanism for a proxy to properly cache and reuse content descriptions that it has already retrieved.
In this thesis, we propose a content description model and framework for efficient content distribution. The content description model employs ideas from the Resource Description Framework [3] and External Annotation [2], which allow flexible descriptions of Web content. The model also allows a server to efficiently select any subset of the descriptions of any Web page and deliver them to a proxy. The framework consists of several algorithms: for the proxies to map user preferences and device capabilities to a set of functions to be performed, for the server to select and deliver the necessary content descriptions to the proxy, and for the proxy to efficiently cache and reuse the content descriptions.
To evaluate the performance of our framework, we conduct a simulation study with certain simplifications (the details are given in Chapter 5). We employ real-world Web objects identified from network traces, and study how our content description model and framework reduce the size of the Web objects, the delay in retrieving Web objects, the number of chunks in HTTP responses, and the delay of entire Web pages. We give some preliminary results and some discussion.
This thesis is organized as follows. In Chapter 2, we review existing content description models and frameworks. In Chapter 3, we give a general content description model to support various content descriptions. Subsequently, in Chapter 4, we propose a framework to support efficient content distribution in a heterogeneous environment. After that, in Chapter 5, we conduct a performance study by simulation to show the efficiency of the model and framework. We conclude this thesis in Chapter 6.
Chapter 2
Related Works
It is very important to adapt the Web content to suit the needs of different users. Because of this, Web servers and proxies have started to perform various functions on the content before delivering it to the users.
In the following, we outline approaches from different aspects. There are general frameworks for customized content distribution, content description models for providing Web content descriptions, mechanisms to support descriptions of device capabilities and user preferences, as well as existing software tools for content adaptation.
2.1 Framework for Customized Content Delivery
There are many frameworks for Web content customization. In the following, we introduce two well-known frameworks: the Internet Content Adaptation Protocol (ICAP) [14], followed by Open Pluggable Edge Services (OPES) [36].
ICAP, the Internet Content Adaptation Protocol, is a protocol designed to provide simple Web-object-based content vectoring for HTTP services. It is essentially a lightweight protocol for executing a “remote procedure call” on HTTP messages. In other words, ICAP clients can pass HTTP messages to ICAP servers for some kind of content modification. The ICAP server executes its own processes on the messages and sends a response back to the client, usually with modified messages. The modified messages may be either HTTP requests or responses. Figures 2.1 and 2.2 show the flow of HTTP messages under ICAP for response modification and request modification, respectively.
Figure 2.1 ICAP Response Modification
Figure 2.2 ICAP Request Modification
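To make the “remote procedure call” on HTTP messages concrete, the sketch below builds the bytes of an ICAP response-modification (RESPMOD) request in the wire format defined by RFC 3507. It is a minimal illustration only: the function name `build_respmod` and the example hosts are hypothetical, and optional ICAP features such as previews are left out.

```python
def build_respmod(icap_url: str, icap_host: str,
                  req_hdr: bytes, res_hdr: bytes, res_body: bytes) -> bytes:
    """Wrap an HTTP request header, response header and body in an ICAP RESPMOD request.

    req_hdr and res_hdr must each end with the blank line (CRLF CRLF) that
    terminates an HTTP header block.
    """
    if not res_body:
        raise ValueError("this sketch only handles a non-empty response body")
    # Offsets in the Encapsulated header count bytes from the start of the
    # encapsulated section, i.e. right after the blank line ending the ICAP headers.
    encapsulated = "req-hdr=0, res-hdr=%d, res-body=%d" % (
        len(req_hdr), len(req_hdr) + len(res_hdr))
    icap_head = (
        "RESPMOD %s ICAP/1.0\r\n"
        "Host: %s\r\n"
        "Encapsulated: %s\r\n"
        "\r\n" % (icap_url, icap_host, encapsulated)
    ).encode("ascii")
    # The encapsulated body is sent with chunked transfer coding: one chunk
    # holding the whole body, followed by the terminating zero-length chunk.
    chunked_body = b"%x\r\n" % len(res_body) + res_body + b"\r\n0\r\n\r\n"
    return icap_head + req_hdr + res_hdr + chunked_body
```

An ICAP client (typically the proxy) would send these bytes to the ICAP server and read back either a modified HTTP response or an indication that no modification is needed.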
As the above diagrams show, the ICAP server is a dedicated server that off-loads specific Internet-based content modification from the origin server, thereby freeing up resources on the origin server and standardizing the way in which content modification can be implemented.
Similar to ICAP, the OPES working group [36] is chartered to define a framework and protocols to authorize and invoke services that perform functions on Web objects. It extends the functionality of a caching proxy to provide additional services that mediate, modify, and monitor object requests and responses.
In general, both frameworks are proposed to support almost any Web service that modifies Web content. That means anyone can provide any function via these frameworks. However, applying functions to Web content only helps to adapt the content to the special needs of users. We cannot rely on such “special” functions to handle issues related to the efficiency of content delivery in a heterogeneous environment. In the next section, we look at content description models that facilitate customized Web content delivery.
2.2 Content Description Model
In this section, we review approaches to describing Web content to facilitate customized Web content delivery. We again discuss two well-known content description models, namely InfoPyramid [31] and the Resource Description Framework (RDF) [3].
InfoPyramid is a representation scheme for handling Web content (text, image, audio and video) hierarchically along the dimensions of fidelity/resolution (different quality in the same media type) and modality (different media types). This representation scheme is shown in Figure 2.3. The representation scheme includes methods for analyzing, filtering, translating, and manipulating the Web content.
Figure 2.3 InfoPyramid Model
For the InfoPyramid model, the content is authored in XML [44], which allows the author to provide more information to the system performing content modification, as only limited information about the content can be deduced from an HTML page directly. The content is later converted to HTML prior to delivery. The authored content is analyzed to extract information that will be useful in adaptation. Two types of content analysis are performed.
First, each component of the content is analyzed to determine its resource requirements. These requirements are content size, display size, streaming bit-rate, color requirements, compression formats, and hardware requirements.
Second, the semantics of the content are determined in the context of the entire document. After all this information has been obtained, different modules can be chosen to convert the content into different versions with various resolutions and modalities. This conversion is done offline, at content creation time. Then multiple versions of the content, along with any associated meta-data, are stored. When a request comes, the Web server determines the user device capabilities, selects the best fidelity and/or modality, and delivers the object in a suitable delivery format to the user.
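The request-time selection step can be illustrated with a small sketch. It is not the InfoPyramid implementation: the `Version` and `Device` records, the ordered list of preferred modalities and the size limit are assumptions made here purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    modality: str      # e.g. "video", "image", "text"
    fidelity: int      # higher means better quality within a modality
    size_bytes: int


@dataclass(frozen=True)
class Device:
    preferred_modalities: Sequence[str]   # most preferred first (assumed input)
    max_bytes: int                        # largest object the device accepts (assumed)


def select_version(versions: Sequence[Version], device: Device) -> Optional[Version]:
    """Pick the stored version that best fits the device, or None if nothing fits."""
    usable = [v for v in versions
              if v.modality in device.preferred_modalities
              and v.size_bytes <= device.max_bytes]
    if not usable:
        return None
    # Prefer the device's favourite modality first, then the highest fidelity.
    return min(usable,
               key=lambda v: (device.preferred_modalities.index(v.modality), -v.fidelity))
```

Because all versions are generated offline, the server only has to filter and rank them when a request arrives.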
The Resource Description Framework (RDF) is another general-purpose content description framework. This framework is based on XML and uses a collection of triples to provide descriptions. A triple consists of a subject, a predicate and an object. The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by the subject and the object of the triple. A set of such triples is called an RDF graph. This can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term "graph").
The assertion of an RDF graph amounts to asserting all the triples in it, so the meaning of an RDF graph is the conjunction (logical AND) of the statements corresponding to all the triples it contains. Note that the subject in a triple can be anything that can be referenced by a URI. External Annotation [2], proposed by W3C, suggests a way to reference any node of an XML document. A well-formed HTML page can be parsed into a tree, and External Annotation can then be used to create a URI for any node in the HTML parse tree. That means combining RDF and External Annotation yields a very flexible approach to providing any description of any node in a well-formed HTML Web page.
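The triple model and the "conjunction" reading of a graph can be illustrated with plain Python data structures. This is a sketch only: it does not use an RDF library, and the example URIs and predicate names are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str      # anything that can be referenced by a URI
    predicate: str    # the relationship being asserted
    obj: str          # the value or resource the subject is related to


def graph_holds(graph: Set[Triple], known_facts: Set[Triple]) -> bool:
    """An RDF graph is asserted iff every one of its triples holds (logical AND)."""
    return all(triple in known_facts for triple in graph)


def descriptions_of(graph: Set[Triple], subject: str) -> Set[Tuple[str, str]]:
    """Collect the (predicate, object) pairs that describe one subject."""
    return {(t.predicate, t.obj) for t in graph if t.subject == subject}


# A tiny graph describing a page and one node inside it (hypothetical URIs).
page = "http://example.org/page.html"
graph = {
    Triple(page, "author", "Anonymous Author"),
    Triple(page + "#intro", "language", "en"),
}
```

Since the subject may be a URI that points at a single node of a page, the same structure can describe a whole document or any fragment of it.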
Of the two content description models, the InfoPyramid approach provides a model to generate and organize Web content in different versions. This is a one-for-all approach: it tries to handle all types of content in the container object (usually an HTML object), including text, images, videos, etc. However, it relies on content descriptions (embedded in XML format) to determine the resource requirements of the content, so it may not work on HTML objects that have no extra content descriptions embedded. RDF is a general and flexible framework for providing content descriptions. Combining RDF and External Annotation is a very useful approach to providing arbitrary descriptions of Web content without changing the content at all. In fact, our new content description model uses this idea to provide flexibility.
2.3 User Descriptions
To deliver the best-fit presentation of content to the users, we first need descriptions of the user preferences and device capabilities. W3C has proposed the Composite Capability and Preference Profile (CC/PP) [10] to achieve this goal. The Wireless Application Protocol (WAP) Forum has proposed a similar approach named User Agent Profile (UAProf) [37] to handle user descriptions. Both CC/PP and UAProf are based on the Resource Description Framework (RDF) [38] and aim at describing and managing software and hardware profiles. In our framework for efficient content distribution, we can use CC/PP or UAProf to provide descriptions of user preferences and device capabilities.
2.4 Server Side Approaches
Besides descriptions of the clients, there are also approaches on the server side that address the issue of customized content delivery. Approaches in this category fall into two main streams: providing Web content descriptions, or giving instructions from the Web server on how to process Web content. We introduce examples of these two streams in the rest of this section.
W3C has proposed a working draft on content selection for device independence [17]. It specifies a processing model for general-purpose selection. Selection involves conditional processing of various parts of an XML information set according to the results of the evaluation of expressions. These logical expressions are associated with some parts of the information set and are processed at run time. Using this mechanism, some parts of the information set can be selected for further processing and others can be suppressed. The parts of the infoset affected and the expressions that govern processing are specified by means of XML-friendly syntax. This includes elements, attributes and XPath [45] expressions. When this selection mechanism is used with HTML objects, these logical expressions are embedded into the HTML objects and evaluated at run time to determine which parts to include.
ESI [1] uses a mechanism similar to W3C’s content selection. Logical expressions are embedded with ESI markup into HTML objects and evaluated at run time to determine which fragment will be selected. However, the main purpose of ESI selection is dynamic content assembly for different users.
Besides providing content descriptions on the server side, there are also approaches in which Web servers give explicit guidance that allows a proxy to make the best choice while modifying Web content. An example of this approach is server-directed transcoding [33] by Mogul et al. They proposed new HTTP header directives by which a Web server could give hints to a proxy on how to modify a Web object. They also proposed the use of applets (Java, Perl, etc.) to modify the Web object according to the Web server’s guidance.
In the above approaches, the Web server either gives instructions on how to customize Web content delivery, or provides descriptions of the content so that other Web intermediaries can perform the task. In the next section, we introduce several software tools that provide customized HTML content to clients.
2.5 Existing Software Tools
There are numerous software tools on the market that provide customized Web content according to different clients’ needs. These software tools include WebSphere Everyplace Mobile Portal [46], WebLogic Portal [47], etc. By examining the users’ hardware capabilities and preferences, they can transform HTML content into different markup languages such as WML, change the page layout to suit different screen sizes, filter out parts of HTML objects that clients are not interested in, and so on. In the above-mentioned software, content description is embedded in the content via special markup. User preferences are stored locally on the server when users register themselves with the server. Their device capabilities are retrieved from specialized external repositories such as Volantis [48]. As we can see, these commercial software tools are able to support certain content transformation functions, but the implementation is proprietary and not easily extensible to support other functions.
Different software tools may provide different sets of options, depending on the software design. If a new type of content emerges, however, clients have to wait for an update of the software to handle it. Thus extensibility and flexibility are a problem for these existing software tools.
2.6 Summary
This chapter lists some of the approaches relevant to customized content delivery to clients from different aspects. ICAP provides a framework in which almost any service for customized Web content delivery can be implemented. The service can be provided by redirecting HTTP requests or responses to dedicated ICAP servers. OPES provides a similar system. The InfoPyramid approach provides a model to generate and organize different versions of content in HTML objects. Different versions can be selected when a client sends a request for a particular HTML object. However, the generation of different versions of content relies on content descriptions and specific modules.
To support content customization for different clients, we need descriptions of the clients’ preferences and capabilities as well as of the content. On the client side, there are frameworks such as CC/PP and UAProf to handle descriptions of clients. On the server side, approaches like ESI provide content descriptions through their own markup languages, but their focus is on dynamic content assembly and caching. Other approaches like server-directed transcoding provide server guidance on how to perform content customization. However, they emphasize the transformation of embedded objects in HTML pages.
There are also existing software products like IBM WebSphere Mobile Portal and BEA WebLogic Portal that provide content adaptation according to the device capabilities of users. But different software provides a different set of options to clients, and there is no standard way to map all the clients’ preferences to the options provided by the software. From the above, there is no direct solution in the literature that addresses the efficiency issue in content delivery. We can make use of existing frameworks such as ICAP and CC/PP to support our model, but we need to add elements to our model to improve efficiency. In the next chapter, we explain our own content description model in detail.
Chapter 3
A General Content Description Model
As mentioned in previous chapters, we need a general content description model to provide content semantic information that supports different functions by different parties. In this chapter, we give the general settings for the content description model. Under these settings, we discuss the design considerations of the model. In particular, we look at how the content should be described, and how the descriptions should be organized and associated with the content. Finally, we show how to achieve these design goals.
3.1 General Settings
As illustrated in Figure 3.1, without loss of generality, we assume that there are three entities, namely, a Web server, a proxy, and a user. We say that a Web object is the smallest unit for content delivery. For example, a Web object can be a paragraph of text between a pair of <p> and </p> tags in an XHTML document. Every Web object is associated with a unique identifier, which we call the Web object identifier. We will make this notion more precise later in Section 3.2.
We assume that the server holds some Web content, which is a set of Web objects. The user may send requests for Web objects to the Web server through the proxy. Each time, the user may choose to use a different device for a different application, and sends a list of device capabilities and user preferences to the proxy to indicate the requirements on the Web objects imposed by the device and application for the current request.
The proxy, based on the user’s preferences and device capabilities as well as its local policies, determines a set of functions to perform on the content before delivering it to the user. Each function here refers to a set of logical operations to be performed on a Web object, e.g., caching and image transcoding.
Figure 3.1 General Settings
Next, the proxy requests from the server the Web objects and the content descriptions that are necessary for the corresponding functions. After that, the proxy performs the desired functions on the Web objects and delivers the results to the user.
Note that whether to perform some function on a certain Web object may not depend on the capabilities and preferences provided by the user at all, e.g., in the case of caching, but for some other functions, such decisions may indeed depend on the capabilities and preferences, e.g., image transcoding.
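The distinction between functions that ignore the user description and functions that depend on it can be sketched as below. This is a hypothetical illustration and not the mapping algorithm of Chapter 4: the key names `screen_width`, `caching_enabled` and `transcode_below_width`, and the default threshold of 480 pixels, are all assumptions.

```python
def choose_functions(user_description: dict, local_policy: dict) -> list:
    """Return the names of the functions a proxy would apply to a request."""
    functions = []
    # Caching depends only on the proxy's local policy, not on the user.
    if local_policy.get("caching_enabled", True):
        functions.append("caching")
    # Image transcoding depends on a device capability supplied by the user.
    width = user_description.get("screen_width")
    threshold = local_policy.get("transcode_below_width", 480)
    if width is not None and width < threshold:
        functions.append("image_transcoding")
    return functions
```

For a small-screen device this yields both functions, while a request that carries no capability information is only cached.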
Even for the same content property, there can be multiple types of descriptions. For example, to describe the expected expiry time of a Web object, ESI suggests a single Time-To-Live (TTL) value, whereas some others use more complex descriptions, such as a set of TTL values, each of which is associated with a certain probability that the Web object expires at that time. It is preferable that the content description model allows different types of descriptions for the same property.
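The two description types for the expiry property can be compared with a small sketch. The representation chosen here, a list of (TTL, probability) pairs, and the function names are assumptions for illustration; the thesis does not prescribe them.

```python
import math
from typing import List, Tuple

# A TTL description as (ttl_seconds, probability) pairs whose probabilities sum to 1.
TtlDistribution = List[Tuple[float, float]]


def from_single_ttl(ttl_seconds: float) -> TtlDistribution:
    """A single TTL value is the special case of one outcome with probability 1."""
    return [(ttl_seconds, 1.0)]


def prob_expired_by(dist: TtlDistribution, age_seconds: float) -> float:
    """Probability that the object has expired once it is age_seconds old."""
    total = sum(p for _, p in dist)
    if not math.isclose(total, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p for ttl, p in dist if ttl <= age_seconds)
```

A proxy that understands only the simple type can still use a single TTL value, while a more capable proxy can weigh the risk of serving a stale copy.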
Furthermore, when a proxy requests certain content descriptions that are needed to support its functions, the content description model should make it easy for the server to select only the desired descriptions. Hence, the content description model should provide a way to organize all the content descriptions so that the server can perform such selection efficiently for any proxy function.
To achieve these design goals, we need to decide how a Web object should be described in a general way, and how we should organize the descriptions such that we can efficiently locate the descriptions of a given object and select the desired ones. In the following sections, we describe our solutions to these problems.
3.2 Proposed Content Description Model
3.2.1 Web Objects
Without loss of generality, we assume that the Web content that the user is interested in is always stored in the form of a mark-up language that is well-formed, in the sense that (1) the tags always appear in pairs, and (2) after removing any pair of tags and the content between them, the remaining content is still well-formed. Note that the second requirement implies that the tags are properly nested, so that we can always compute a parse tree from a document encoded in such a mark-up language. XML and XHTML are examples of such mark-up languages. In the rest of this thesis we use XHTML as an example, but our content description model applies to any well-formed mark-up language. We also assume that each Web object is all the content enclosed by a pair of tags, and is always represented by a parse tree.
Recall that we require every Web object to be uniquely identified by some identifier (ID). For any XHTML page P, which includes a unique pair of tags “<html>” and “</html>”, let its URL be Up; then the ID for the Web object that represents P is Up#root(), where root() represents the root node of the parse tree of the page. Similarly, we can give the identifier for each element in P. For example, Up#root().child(1) is the ID for the first child node of the root of the parse tree, while Up#root().child(1).child(5) is the ID for the fifth child node of that first child node. Note that the above notation is similar to the annotation scheme proposed by W3C [50] for content transcoding.
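The identifier notation can be exercised with a short sketch that resolves an ID against the parse tree of a well-formed page. It is an illustration only: the function name `resolve` is hypothetical, the parse tree is built with Python's standard `xml.etree.ElementTree`, and only element nodes are counted as children, which is an assumption about how the thesis numbers child nodes.

```python
import re
import xml.etree.ElementTree as ET


def resolve(page_source: str, object_id: str) -> ET.Element:
    """Return the element named by an ID such as 'U#root().child(1).child(5)'."""
    _, _, path = object_id.partition("#")
    match = re.fullmatch(r"root\(\)((?:\.child\(\d+\))*)", path)
    if match is None:
        raise ValueError("malformed Web object ID: " + object_id)
    node = ET.fromstring(page_source)          # root() is the root of the parse tree
    for index in re.findall(r"\.child\((\d+)\)", match.group(1)):
        node = list(node)[int(index) - 1]      # child(1) is the first child (1-based)
    return node


PAGE = "<html><head><title>T</title></head><body><p>one</p><p>two</p></body></html>"
second_paragraph = resolve(PAGE, "http://example.org/p.html#root().child(2).child(2)")
```

Here `second_paragraph` is the second `<p>` element of the body, so each element of the page can be addressed without modifying the page itself.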
3.2.2 Object Description Scheme
In the proposed model, we give an object description scheme (ODS) to describe Web
objects Under our framework, every Web object, which is uniquely identified, is
associated with a number of descriptions Each description of a Web object is a tuple
<ID, attribute, value>, where ID is the unique identifier of the Web object, attribute is a string naming the property of the Web object we want to describe, and value is another string giving the value of that property. In particular, a description takes the following form:
<ods object='http://mywebsite/mywebpage.html#root()'>
<Author>Anonymous Author</Author>
</ods>
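As an illustration of how such a description maps onto <ID, attribute, value> tuples, the following sketch parses the example above with Python's standard XML parser. The function name description_tuples is hypothetical, and the sketch assumes that each child element of the ods element carries exactly one attribute name and a plain-text value.

```python
import xml.etree.ElementTree as ET

# The example description from the text, reproduced as a string.
ODS_XML = """<ods object='http://mywebsite/mywebpage.html#root()'>
<Author>Anonymous Author</Author>
</ods>"""


def description_tuples(ods_xml):
    """Return a list of (ID, attribute, value) tuples for one ods element."""
    ods = ET.fromstring(ods_xml)
    object_id = ods.attrib["object"]
    # Each child element is read as one attribute with a text value (assumption).
    return [(object_id, child.tag, (child.text or "").strip()) for child in ods]


print(description_tuples(ODS_XML))
# [('http://mywebsite/mywebpage.html#root()', 'Author', 'Anonymous Author')]
```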
For ease of selecting among different descriptions, the descriptions for a Web object are
organized according to the attributes of the Web object, such as the expected time of expiry
of a Web object. However, to accommodate different types of descriptions for an
attribute, we define a type to differentiate the various types of descriptions for the same
attribute. In this way, all the descriptions for a Web object are organized as a set of XML documents, each of which stores the descriptions for a particular attribute of a particular type. Such an XML document consists of three parts:
XML and XML Name Space Declarations
Since all the descriptions are in the form of XML, we need an XML declaration “<?xml version='1.0'?>” to indicate the beginning of an XML document. Furthermore, all the name spaces used in this XML document need to be declared too. These name spaces include the scheme’s default name space as well as the name spaces for property descriptions.
Attribute Meta-data Definition
This part specifies meta-data about the attribute we are describing in this XML document. The meta-data are specified as attributes of the tag “<ods:attribute>”. The attribute “attr” indicates which attribute we are describing in this document, and the attribute
“type” gives the type of the description for the attribute. The attribute “isDefault”
indicates whether this is the default description of this attribute for the Web
object, and the attribute “mode” gives the mode of the description. The meaning of these meta-data will
be further explained in Chapter 4.
Description Tuples
As shown above, a description tuple consists of an object ID, a property and its value. Note that the property can come from a namespace other than the default one; this allows reusing descriptions from existing content description frameworks. The value can contain both literals and markup from an XML namespace. All the description tuples must be enclosed
in the root element “ods:Desp”.
The following figure gives an example of the descriptions of the Web objects
in a simple XHTML page, where the property described is the objects' time-to-live (TTL).
attribute descriptions for multiple Web objects
Rule two: Ignore irrelevant attribute descriptions. Sometimes an attribute description attached to a
parent node does not apply to its child nodes. For example, suppose a Web page contains a news article in English with a few images among the text. The author would like to specify an attribute to show that the text is in English. This description
is put at the “root” node level, and obviously it does not apply to the image Web objects, so
we ignore it for those objects during processing.
Rule three: Local attributes have a higher priority than global attributes. If attribute
descriptions are attached to both the child and the parent nodes, then those attached to the child nodes override the ones attached to the parent node. This is a complement to rule one.
With the three rules defined above, it is clear which attributes apply to a particular piece of content. The next section discusses the language components and functionality explained in this chapter.
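The interplay of the rules can be sketched as a small resolution procedure. This is a minimal illustration and not the implementation used in this thesis. It assumes that rule one states that descriptions attached to a parent node are inherited by its child nodes, which is how rule three refers to it. The node paths, the attribute names and the IRRELEVANT table that decides which attributes do not apply to which kinds of object are all hypothetical.

```python
# Descriptions keyed by node path: a tuple of 1-based child indexes from the root.
# () is the root node; (2, 2) is the second child of the root's second child.
DESCRIPTIONS = {
    (): {"Language": "en", "TTL": "3600"},  # global (root-level) descriptions
    (2, 2): {"TTL": "60"},                  # local description on one node
}

# Hypothetical relevance table: attribute -> kinds of object it does not apply to.
IRRELEVANT = {"Language": {"image"}}


def effective_descriptions(path, kind):
    """Return the descriptions that apply to the node at `path` of the given kind."""
    merged = {}
    # Assumed rule one: walk from the root down so that the node inherits
    # the descriptions of all its ancestors.
    # Rule three: closer (local) descriptions overwrite more global ones,
    # because they are merged later.
    for depth in range(len(path) + 1):
        merged.update(DESCRIPTIONS.get(path[:depth], {}))
    # Rule two: drop inherited attributes that are irrelevant for this kind.
    return {a: v for a, v in merged.items() if kind not in IRRELEVANT.get(a, set())}


print(effective_descriptions((2, 1), "text"))   # inherits both root descriptions
print(effective_descriptions((2, 2), "image"))  # local TTL wins, Language ignored
```

Merging from the root downwards keeps the override order implicit: whatever is attached closest to the node is applied last.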
3.2.3 Discussion
The object description scheme given in the previous section is highly flexible. Since the values of the properties can be any string, we can even embed arbitrarily encoded text into them, as long as the proxy knows how to decode it and use it to process the objects. For example, in Edge Side Includes (ESI), we sometimes need to specify multiple alternatives for the same Web object by using the <esi:try> and <esi:attempt> tags.
In this case, we can encode the entire content between the <esi:try> and </esi:try> tags into a single string, and put it as a description of the object.
When the proxy requests the description, it will receive and decode this string and process the ESI entry as if it were embedded in the page.
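A minimal sketch of this idea follows. It assumes Base64 as the encoding agreed between server and proxy; the text does not prescribe a particular encoding. The property name EsiAlternatives, the function names and the example URLs are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical ESI fragment with one attempt and one fallback.
ESI_FRAGMENT = (
    "<esi:try><esi:attempt><esi:include src='http://example.org/a.html'/>"
    "</esi:attempt><esi:except>Unavailable</esi:except></esi:try>"
)


def encode_description(object_id, fragment):
    """Server side: wrap the encoded ESI fragment in a description of the object."""
    ods = ET.Element("ods", {"object": object_id})
    prop = ET.SubElement(ods, "EsiAlternatives")  # hypothetical property name
    prop.text = base64.b64encode(fragment.encode("utf-8")).decode("ascii")
    return ET.tostring(ods, encoding="unicode")


def decode_description(ods_xml):
    """Proxy side: recover the original ESI fragment from the description."""
    ods = ET.fromstring(ods_xml)
    encoded = ods.find("EsiAlternatives").text
    return base64.b64decode(encoded).decode("utf-8")


description = encode_description("http://example.org/page.html#root().child(2)", ESI_FRAGMENT)
assert decode_description(description) == ESI_FRAGMENT
```

Encoding the fragment keeps the description well-formed even though the esi prefix is not declared in the description document.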
Furthermore, as we have mentioned before, descriptions for different properties are put into different XML documents. In this way, if only the descriptions of one attribute, such as the lifetime of the objects, are required, the server can easily pick out the relevant XML document and deliver it to the proxy.
Chapter 4
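The selection step can be sketched as a lookup keyed by attribute and type. This is an illustration under stated assumptions: the class names, the type values "seconds" and "absolute", and the policy of preferring the document marked as default when no type is requested are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptionDoc:
    """One XML description document: a single attribute of a single type."""
    attr: str         # value of the "attr" meta-data, e.g. "TTL"
    desc_type: str    # value of the "type" meta-data
    is_default: bool  # value of the "isDefault" meta-data
    xml: str          # the serialized document (placeholder text here)


class DescriptionStore:
    """Hypothetical server-side index of description documents per page URL."""

    def __init__(self):
        self._docs = {}

    def add(self, page_url, doc):
        self._docs.setdefault(page_url, []).append(doc)

    def select(self, page_url, attr, desc_type=None):
        """Return only the documents for one attribute of one page."""
        candidates = [d for d in self._docs.get(page_url, []) if d.attr == attr]
        if desc_type is not None:
            return [d for d in candidates if d.desc_type == desc_type]
        # Assumption: with no type requested, prefer documents marked as default.
        defaults = [d for d in candidates if d.is_default]
        return defaults or candidates


PAGE_URL = "http://mywebsite/mywebpage.html"
store = DescriptionStore()
store.add(PAGE_URL, DescriptionDoc("TTL", "seconds", True, "<!-- TTL in seconds -->"))
store.add(PAGE_URL, DescriptionDoc("TTL", "absolute", False, "<!-- TTL as a date -->"))
store.add(PAGE_URL, DescriptionDoc("Language", "code", True, "<!-- language codes -->"))

for doc in store.select(PAGE_URL, "TTL"):
    print(doc.attr, doc.desc_type)
```

Only the TTL document is returned to the proxy; the Language document stays on the server.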
A Framework for Efficient Content Distribution
In previous chapters, we have introduced a content description model for Web objects.
We focused on how to use the model to provide content descriptions and to facilitate efficient selection of content descriptions. In this chapter, we propose a framework to improve the efficiency of content distribution in a heterogeneous environment. In the following parts
of the chapter, we first introduce the design objectives of the framework. After that we define the operations of Web servers, proxies and clients that support the design objectives. Finally we give a summary of the framework.
4.1 Design Objectives
The following are the objectives we would like to achieve with the framework.
System architecture
The functions discussed in previous chapters can be performed by the server, the proxy, or even the user. To improve efficiency, we need to come up with a system architecture in which the responsibilities of the server, the proxy, and the user are clearly defined.
Select the right functions to perform
A proxy may have a set of functions for different purposes. For a user with certain preferences and device capabilities, we need to select the right functions to perform on the content before delivering the result to the user. Otherwise, we may end up delivering
unacceptable content to the user.
Transfer necessary content descriptions only
Due to the variety of user preferences and device capabilities, Web servers need to maintain a large set of content descriptions to support various functions on Web objects. Different proxies may require different portions of the content descriptions on the server to support different functions. Hence we need a mechanism for a server to transfer only the descriptions a proxy actually needs.
Reuse existing content descriptions
Repeatedly transferring the same set of descriptions to a proxy is a waste of network bandwidth, especially when the volume of descriptions is large. Given that
descriptions for a Web object may expire, we need a mechanism to properly cache, validate and reuse existing descriptions retrieved from a server.
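The following sketch illustrates what caching, validating and reusing descriptions could look like at a proxy. It is not the mechanism defined by this framework. The class name, the fetch and revalidate callbacks, the version token and the lifetime returned by the server are all hypothetical.

```python
import time


class DescriptionCache:
    """Hypothetical proxy-side cache of description documents."""

    def __init__(self, fetch, revalidate, clock=time.monotonic):
        self._fetch = fetch            # fetch(key) -> (body, version, ttl_seconds)
        self._revalidate = revalidate  # revalidate(key, version) -> True if unchanged
        self._clock = clock
        self._entries = {}

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry["expires"]:
            return entry["body"]  # still fresh: reuse without contacting the server
        if entry is not None and self._revalidate(key, entry["version"]):
            # Expired but unchanged on the server: extend the lifetime and reuse.
            entry["expires"] = now + entry["ttl"]
            return entry["body"]
        # Missing or changed: transfer the description again.
        body, version, ttl = self._fetch(key)
        self._entries[key] = {
            "body": body, "version": version, "ttl": ttl, "expires": now + ttl,
        }
        return body
```

A proxy would call get() with a key such as a (URL, attribute, type) triple. The full description is transferred only on a miss or when revalidation reports a change.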
4.2 System Architecture
When we design the architecture to handle content delivery from the Web server to the user in a heterogeneous environment, we need to address the following issues: who provides the content descriptions, who performs the functions, and who provides the descriptions of user preferences and device capabilities. In the following, we first introduce the considerations for the system architecture, followed by the overall architecture design.
There are two design considerations we would like to take into account for the system architecture, namely scalability of the service and transparency to end users. We discuss them one by one.
Scalable service
Due to the rapid increase in the number of Internet hosts and end users, content delivery
needs to be scalable. This means maintaining roughly the same response time for
content delivery as the number of users requiring customized delivery from the same website increases.
Transparent to users
Due to the large number of users accessing the Web, it would be very inconvenient or even impractical to require a dramatic change to the software (e.g., browsers) that users employ to access the Web. In our system architecture, users only need to express their preferences and device capabilities; no other change to the end users' software is required.
With these design considerations in mind, we now address how to allocate the responsibilities among the Web server, the proxy and the end user.
For content descriptions, the Web server is the most appropriate host. Since content authors provide the Web content, they know their content better than anyone else. It is natural for the content authors to provide both the Web content and the content descriptions and to have all of them hosted on the Web server.
As for the question of who performs the functions on Web content, we have a few choices, namely the Web server, the proxy and even the end user. Each choice has its own benefits and constraints.
The Web server hosts the Web content and the content descriptions. If it performs the functions on Web content, the benefit is that it can refer to the content descriptions locally and perform the necessary functions. At the same time, it also has to obtain the user descriptions (preferences, capabilities) to decide which functions to perform. Since the user descriptions vary, the server may need to perform different sets of