Content description model and framework for efficient content distribution


CONTENT DESCRIPTION MODEL AND FRAMEWORK

FOR EFFICIENT CONTENT DISTRIBUTION

ZHANG SHUTAO

(B Eng (Hons.) NUS)

HT00-6864A

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE

DEPARTMENT OF COMPUTER SCIENCE

SCHOOL OF COMPUTING NATIONAL UNIVERSITY OF SINGAPORE

2005


I owe my deepest gratitude and appreciation to my thesis supervisor, Dr Chi Chi-Hung, for giving me the opportunity to work with him and my lab mates. I thank him for his continued guidance, insight, patience, encouragement, and above all, his confidence in me, without which this thesis would not have been possible. I am grateful to him for all the time and effort he has spent in helping me improve my research and this document. I would also like to thank Dr Chi Chi-Hung for giving me advice on how to choose my career path at this important stage of life.

I sincerely thank all my lab mates for offering me much-needed assistance and for sharing their invaluable insights during my research. Special thanks to my dear friend Wang Hong-Guang for his sincere help and encouragement during the most difficult time of my research. I also want to thank Yuan Jun-Li and Li Qi-Ming for sharing their valuable advice on my research experiments.

Finally, I would like to express my immeasurable appreciation to my wife, my parents and my parents-in-law for their love, trust, inspiration and understanding.


Contents

Summary

List of Figures

Chapter 1 Introduction

Chapter 2 Related Works

2.1 Framework for Customized Content Delivery

2.2 Content Description Model

2.3 Client Descriptions

2.4 Server Side Approaches

2.5 Existing Software Tools

2.6 Summary

Chapter 3 A General Content Description Model

3.1 General Settings

3.2 Proposed Content Description Model

3.2.1 Web Objects

3.2.2 Object Description Scheme

3.2.3 Discussion

Chapter 4 A Framework for Efficient Content Distribution

4.1 Design Objectives

4.2 Overall Architecture

4.3 Server Operations

4.4 Proxy Operations

4.4.1 Mapping User Descriptions to Content Descriptions

4.4.2 Managing Local Content Descriptions

4.5 User Operations

4.6 Summary

Chapter 5 A Case Study on the Framework

5.1 Simulation Setup

5.2 Web Object Size

5.3 Web Object Latency

5.4 XHTML Page Latency

5.5 Summary

Chapter 6 Conclusion

References

Summary

Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points with different capabilities. To accommodate the needs arising from heterogeneous user preferences and device capabilities, web intermediaries, called proxies, have started to perform various functions on the Web content before it is distributed to the users, including Web content caching and image transcoding. Different functions require different semantic information about the content, which we refer to as content descriptions. As a result, web servers host a large amount of content descriptions to help proxies perform these functions.

Under the heterogeneous environment, efficient content distribution has become a problem due to a few challenging issues.

First of all, it is not clear how a proxy should decide which functions to perform given any user preferences and device capabilities. It is not easy, if possible at all, for every proxy to understand every type of device and user, and the users may not know all the functions provided by proxies, either. If this is not properly handled, we may end up delivering unacceptable content to users.

Secondly, to provide semantic information about different attributes of Web content, the server may need to store a large amount of content descriptions. Delivering all the descriptions about a Web page to a proxy when the Web page is requested may be highly inefficient, because the proxy may need only a small fraction of the content descriptions to perform the desired functions.

Thirdly, repeatedly delivering the same content descriptions to the same proxy is unnecessary. So far, however, there is no mechanism for a proxy to properly cache and reuse the content descriptions it has already retrieved.

In this thesis, we propose a content description model and framework for efficient content distribution. The content description model employs ideas from the Resource Description Framework [3] and External Annotation [2], which allow flexible descriptions of Web content. The model also allows a server to efficiently select any subset of the descriptions of any Web page and deliver them to a proxy. The framework consists of several algorithms, which serve three purposes:

- for the proxy to map user preferences and device capabilities to a set of functions to be performed;
- for the server to select and deliver the necessary content descriptions to the proxy;
- for the proxy to efficiently cache and reuse the content descriptions.

To evaluate the performance of our framework, we conduct a simulation study with certain simplifications (the details are given in Chapter 5). We employ real-world Web objects identified from network traces. We then study how our content description model and framework reduce four quantities: the size of the Web objects, the delay in retrieving Web objects, the number of chunks in HTTP responses, and the delay of entire Web pages. We give some preliminary results and discussions.


List of Figures

2.1 ICAP Response Modification

2.2 ICAP Request Modification

2.3 InfoPyramid Model

3.1 General Settings

3.2 Description for a Simple XHTML Page

4.1 The Framework Overview

4.2 Mapping User Descriptions to Functions

4.3 Mapping Functions to Set of Attribute Descriptions

4.4 Caching and Validation for Content Descriptions

4.5 Managing Local Attribute Descriptions

5.1 A Sample Content Selection Flow

5.2 Web Object Size Reduction

5.3 Web Object Latency Reductions

5.4 Chunk Number Distribution for Web Objects

5.5 HTML Chunk Number Reduction

5.6 XHTML Page Latency

5.7 Effect of Different Parallel Connections with User Description D1

5.8 Effect of Different Parallel Connections with User Description D2

5.9 Effect of Different Parallel Connections with User Description D3

5.10 Effect of Different Parallel Connections with User Description D4

5.11 Effect of Different Parallel Connections with User Description D5

Chapter 1

Introduction

According to recent surveys [7, 8, 9, 18], the Internet keeps growing rapidly. The World-Wide Web (or Web in short) is based on the Hypertext Transfer Protocol (HTTP), and it has become the main platform for information distribution on the Internet. Thompson et al. [9] conducted a study on InternetMCI’s backbone and found that Web traffic accounted for more than half of the total Internet traffic.

Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points. These include personal computers and workstations on traditional wired networks, as well as devices based on more recent wireless technologies.

Wireless devices such as smart phones, palm-top devices, and laptop computers are playing a very important role on the Internet. All these Web-accessing devices have various capabilities. They differ widely in hardware computation power (e.g., processor speed, memory size, I/O capability), software configuration (e.g., operating system, Web browser, audio-visual applications), and network access methods (communication media and bandwidth).

Devices are not the only source of heterogeneity: users may also have different preferences for Web access, which may vary in aspects such as privacy, advertising, and latency. Consequently, different users may require different treatments of the Web content, based on their own preferences and device capabilities.

To accommodate the needs of heterogeneous users and devices, network nodes between servers and end users have started to perform various functions on the Web content before it is distributed to the users. These network nodes are often referred to as active web intermediaries or proxies; in the rest of this thesis, we call them proxies. Below are some examples of the functions that are widely supported by proxies.

Web content caching [39, 40, 41]

Users are spread across a large range of different networks. To give them fast access to Web content, one can employ a Web caching proxy on a subnet that temporarily stores copies of selected content provided by some server. A local user can then obtain the content quickly from the local proxy instead of the remote server. These proxies have become very important for speeding up Web content distribution.

Content adaptation [2, 3, 4, 6]

Different device capabilities and user preferences pose different constraints on what kind of content is acceptable to the users. To deliver only acceptable content to users, proxies can perform content adaptations, which include image transcoding [41], content transformation [4], content filtering [42], and so on.

To perform these functions properly, a proxy usually requires some semantic information about the Web content. We will refer to this kind of semantic information as content descriptions in the rest of this thesis. To support various functions, many content description models and frameworks have been proposed to provide semantic information about different attributes of Web content. For example:

- The Edge Side Includes (ESI) [1] language was proposed to describe attributes such as the expected expiry time, or Time-To-Live (TTL), of Web content, in order to support dynamic content caching.
- The Extensible Device Independent Markup Language (XDIME) was proposed by Volantis [43] to describe content layout, image color, and other attributes, in order to support content adaptation for mobile devices.

Under the heterogeneous environment, efficient content distribution has become a problem. In the following, we address the challenging issues related to this problem one by one.

First of all, it is not clear how a proxy should decide which functions to perform given any user preferences and device capabilities. It is not easy, if possible at all, for every proxy to understand every type of device and user, and the users may not know all the functions provided by proxies, either. If this is not properly handled, we may end up delivering unacceptable content to users.

Secondly, to provide semantic information about different attributes of Web content, the server may need to store a large amount of content descriptions. Delivering all the descriptions about a Web page to a proxy when the Web page is requested may be highly inefficient, because the proxy may need only a small fraction of the content descriptions to perform the desired functions.

Thirdly, repeatedly delivering the same content descriptions to the same proxy is unnecessary. So far, however, there is no mechanism for a proxy to properly cache and reuse the content descriptions it has already retrieved.

In this thesis, we propose a content description model and framework for efficient content distribution. The content description model employs ideas from the Resource Description Framework [3] and External Annotation [2], which allow flexible descriptions of Web content. The model also allows a server to efficiently select any subset of the descriptions of any Web page and deliver them to a proxy. The framework consists of several algorithms, which serve three purposes:

- for the proxy to map user preferences and device capabilities to a set of functions to be performed;
- for the server to select and deliver the necessary content descriptions to the proxy;
- for the proxy to efficiently cache and reuse the content descriptions.

To evaluate the performance of our framework, we conduct a simulation study with certain simplifications (the details are given in Chapter 5). We employ real-world Web objects identified from network traces. We then study how our content description model and framework reduce four quantities: the size of the Web objects, the delay in retrieving Web objects, the number of chunks in HTTP responses, and the delay of entire Web pages. We give some preliminary results and discussions.

This thesis is organized as follows. In Chapter 2, we review existing content description models and frameworks. In Chapter 3, we give a general content description model to support various content descriptions. Subsequently, in Chapter 4, we propose a framework to support efficient content distribution in a heterogeneous environment. After that, in Chapter 5, we conduct a simulation-based performance study to show the efficiency of the model and framework. We conclude this thesis in Chapter 6.

it is very important to adapt the Web content to suit the needs of different users. Because of this, Web servers and proxies have started to perform various functions on the content before delivering it to the users.

In the following, we outline approaches from different aspects:

- general frameworks for customized content distribution;
- content description models for providing Web content descriptions;
- mechanisms to support descriptions of device capabilities and user preferences;
- existing software tools for content adaptation.

2.1 Framework for Customized Content Delivery

There are many frameworks for Web content customization. In the following, we introduce two well-known frameworks: first the Internet Content Adaptation Protocol (ICAP) [14], and then Open Pluggable Edge Services (OPES) [36].

ICAP is a protocol designed to provide simple Web-object-based content vectoring for HTTP services. It is essentially a lightweight protocol for executing a “remote procedure call” on HTTP messages. In other words, ICAP clients can pass HTTP messages to ICAP servers for some kind of content modification. The ICAP server processes the messages and sends a response back to the client, usually containing modified messages. The modified messages may be either HTTP requests or responses. The following figures show the flow of HTTP messages under ICAP for response modification and request modification.

Figure 2.1 ICAP Response Modification


Figure 2.2 ICAP Request Modification

As the above diagrams show, the ICAP server is a dedicated server that off-loads specific Internet-based content modification from the origin server. This frees up resources on origin servers and standardizes the way in which content modification can be implemented.
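To make the response-modification flow concrete, the sketch below assembles the message an ICAP client might send for response modification (RESPMOD) under RFC 3507. The original HTTP request and response headers are encapsulated, and their byte offsets are listed in the `Encapsulated` header. The response body is chunk-encoded. This is a minimal illustration, not a complete ICAP client. The service URI and host names are hypothetical, and optional headers such as `Preview` and `Allow` are omitted.

```python
def chunked(body: bytes) -> bytes:
    """Encode a body as a single HTTP/1.1 chunk followed by the zero-length terminator."""
    return b"%x\r\n%s\r\n0\r\n\r\n" % (len(body), body)


def build_respmod(service_uri: str, icap_host: str,
                  req_hdr: bytes, res_hdr: bytes, res_body: bytes) -> bytes:
    """Assemble an ICAP RESPMOD request that encapsulates one HTTP exchange.

    req_hdr and res_hdr must each end with the blank line (CRLF CRLF)
    that terminates an HTTP header block.
    """
    # Offsets are measured from the start of the encapsulated part,
    # i.e. from the first byte after the ICAP header block.
    offsets = ["req-hdr=0", f"res-hdr={len(req_hdr)}"]
    body_offset = len(req_hdr) + len(res_hdr)
    if res_body:
        offsets.append(f"res-body={body_offset}")
        encoded_body = chunked(res_body)
    else:
        # RFC 3507 marks a missing body with null-body.
        offsets.append(f"null-body={body_offset}")
        encoded_body = b""
    icap_header = (
        f"RESPMOD {service_uri} ICAP/1.0\r\n"
        f"Host: {icap_host}\r\n"
        f"Encapsulated: {', '.join(offsets)}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_header + req_hdr + res_hdr + encoded_body


if __name__ == "__main__":
    request = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
    response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
    message = build_respmod("icap://icap.example.net/respmod", "icap.example.net",
                            request, response, b"<html>...</html>")
    print(message.decode("ascii"))
```

The ICAP server would answer with its own ICAP response, which encapsulates the (possibly modified) HTTP response in the same way.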

Similar to ICAP, the OPES working group [36] is chartered to define a framework and protocols to authorize and invoke services that perform functions on Web objects. OPES extends the functionality of a caching proxy to provide additional services that mediate, modify, and monitor object requests and responses.

In general, both frameworks are designed to support almost any web service that modifies Web content, which means anyone can provide any function via these frameworks. Applying such functions to Web content does help adapt it to the special needs of users. However, we cannot rely on any “special” function to handle the issues related to the efficiency of content delivery in a heterogeneous environment. In the next section, we look at content description models that describe web content to facilitate customized web content delivery.

2.2 Content Description Model

In this section, we review approaches to describing web content to facilitate customized web content delivery. Again, we discuss two well-known content description models, namely InfoPyramid [31] and the Resource Description Framework (RDF) [3].

InfoPyramid is a representation scheme for handling Web content (text, image, audio and video) hierarchically along two dimensions:

- fidelity/resolution: different quality in the same media type;
- modality: different media types.

This representation scheme is shown in Figure 2.3. It includes methods for analyzing, filtering, translating, and manipulating the Web content.

Figure 2.3 InfoPyramid Model


In the InfoPyramid model, the content is authored in XML [44]. This lets the author provide more information to the system performing content modification, since only limited information about the content can be deduced directly from an HTML page. The content is later converted to HTML prior to delivery. The authored content is analyzed to extract information that will be useful in adaptation. Two types of content analysis are performed.

First, each component of the content is analyzed to determine its resource requirements. These requirements include content size, display size, streaming bit-rate, color requirements, compression formats, and hardware requirements.

Second, the semantics of the content are determined in the context of the entire document. With all this information, different modules can be chosen to convert the content into different versions with various resolutions and modalities. This conversion is done offline, at content creation time. Multiple versions of the content are then stored along with any associated meta-data. When a request comes in, the web server determines the user's device capabilities, selects the best fidelity and/or modality, and delivers the object to the user in a suitable delivery format.
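The request-time selection step can be pictured as a search over the stored versions. The sketch below is our own simplified reading of that step, not InfoPyramid's actual algorithm. Each version is tagged with a modality, a fidelity level and a size. Among the versions the device can render within a size budget, the server picks the one in the most preferred modality with the highest fidelity. The `Version` type, the preference list, the size budget and the example sizes are hypothetical stand-ins for real device capabilities and content.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    modality: str  # media type of this version, e.g. "video", "image", "text"
    fidelity: int  # quality level within the modality (higher is richer)
    size_kb: int   # transfer size of this version


def select_version(versions: List[Version], supported: List[str],
                   max_kb: int) -> Optional[Version]:
    """Pick the best stored version a device can handle.

    supported lists the modalities the device can render, most preferred first;
    max_kb is the largest object the device or connection should receive.
    """
    candidates = [v for v in versions
                  if v.modality in supported and v.size_kb <= max_kb]
    if not candidates:
        return None
    preference = {modality: rank for rank, modality in enumerate(supported)}
    # Most preferred modality first, then the highest fidelity within it.
    return min(candidates, key=lambda v: (preference[v.modality], -v.fidelity))


# One content item stored in several fidelities and modalities (illustrative sizes).
PYRAMID = [
    Version("video", 2, 900), Version("video", 1, 400),
    Version("image", 2, 120), Version("image", 1, 40),
    Version("text", 0, 2),
]
```

For example, a device that renders only images and text and accepts at most 100 KB would receive `Version("image", 1, 40)`.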

The Resource Description Framework (RDF) is another general-purpose content description framework. It is based on XML and uses a collection of triples to provide descriptions. A triple consists of a subject, a predicate and an object. The assertion of an RDF triple says that some relationship, indicated by the predicate, holds between the things denoted by the subject and the object of the triple. A set of such triples is called an RDF graph. It can be illustrated by a node and directed-arc diagram, in which each triple is represented as a node-arc-node link (hence the term "graph").

Asserting an RDF graph amounts to asserting all the triples in it. The meaning of an RDF graph is therefore the conjunction (logical AND) of the statements corresponding to all the triples it contains. Note that the subject of a triple can be anything that can be referenced by a URI. External Annotation [2], proposed by W3C, suggests a way to reference any node of an XML document. A well-formed HTML page can be parsed into a tree, and External Annotation can then create a URI for any node in that parse tree. Combining RDF and External Annotation therefore gives a very flexible way to attach any description to any node in a well-formed HTML Web page.
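Below is a minimal sketch of the combination just described. It assumes an RDF graph is held as a plain set of (subject, predicate, object) triples whose subjects are URIs pointing into an HTML parse tree. A query is a triple pattern in which `None` acts as a wildcard. The node-addressing syntax in the URIs and the `ex:` predicate names are hypothetical placeholders. They are neither External Annotation's actual syntax nor a real RDF vocabulary; a real system would use an RDF library and proper URIs.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

PAGE = "http://www.example.com/index.html"  # hypothetical page URL

# Asserting this graph asserts every triple in it (their logical conjunction).
graph: Set[Triple] = {
    (PAGE + "#root()", "ex:ttl", "300"),
    (PAGE + "#root().child(2).child(1)", "ex:mediaType", "image/jpeg"),
    (PAGE + "#root().child(2).child(1)", "ex:transcodable", "true"),
    (PAGE + "#root().child(2).child(3)", "ex:isAdvertisement", "true"),
}


def match(g: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return the triples that agree with the pattern; None matches anything."""
    return {t for t in g
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}


def holds(g: Set[Triple], statements: Set[Triple]) -> bool:
    """Naive check with no inference: every statement must be asserted in g (logical AND)."""
    return statements <= g
```

An ad-filtering proxy, for instance, could call `match(graph, p="ex:isAdvertisement", o="true")` to find the nodes it should remove, without the page itself being changed.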

Of the two content description models, the InfoPyramid approach provides a model to generate and organize web content in different versions. It is an all-in-one approach that tries to handle all types of content in the container object (usually an HTML object), including text, images, and videos. However, it relies on content descriptions embedded in XML format to determine the resource requirements of the content. It may therefore not work on HTML objects that have no embedded content descriptions.

RDF, by contrast, is a general and flexible framework for providing content descriptions. Combining RDF and External Annotation is a very useful way to provide arbitrary descriptions of Web content without changing the content at all. In fact, our new content description model uses this idea to achieve flexibility.

2.3 User Descriptions

To deliver the best-fit presentation of content to users, we first need descriptions of the user preferences and device capabilities. W3C has proposed Composite Capability/Preference Profiles (CC/PP) [10] to achieve this goal. The Wireless Application Protocol (WAP) Forum has proposed a similar approach, the User Agent Profile (UAProf) [37], to handle user descriptions. Both CC/PP and UAProf are based on the Resource Description Framework (RDF) [38] and aim at describing and managing software and hardware profiles. In our framework for efficient content distribution, we can use CC/PP or UAProf to describe user preferences and device capabilities.

2.4 Server Side Approaches

Besides descriptions of the clients, there are also server-side approaches to customized content delivery. Approaches in this category fall into two main streams:

- providing web content descriptions;
- giving instructions from the web server on how to process web content.

We introduce examples of both streams in the rest of this section.


W3C has proposed a working draft on content selection for device independence [17]. It specifies a processing model for general-purpose selection. Selection involves conditional processing of various parts of an XML information set according to the results of evaluating expressions. These logical expressions are associated with parts of the information set and are processed at run time. Using this mechanism, some parts of the information set can be selected for further processing while others are suppressed. Both the affected parts of the infoset and the governing expressions are specified in an XML-friendly syntax of elements, attributes and XPath [45] expressions. When this selection mechanism is used with HTML objects, the logical expressions are embedded in the HTML objects and evaluated at run time to determine which parts to include.
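The sketch below illustrates run-time selection on a small XHTML fragment, under simplifying assumptions of our own. Each conditional element carries a hypothetical `expr` attribute holding a comparison of the form `name op number`. This comparison is evaluated against a dictionary of device properties, and elements whose condition is false are suppressed. The W3C draft itself defines its own selection vocabulary and uses XPath expressions; the attribute name and expression grammar here are illustrative only.

```python
import operator
import xml.etree.ElementTree as ET

# Comparison operators allowed in the hypothetical expr attribute.
OPERATORS = {
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt, "==": operator.eq,
}


def condition_holds(expr: str, context: dict) -> bool:
    """Evaluate 'name op number' (e.g. 'screen_width >= 320') against device properties."""
    name, op, value = expr.split()
    return OPERATORS[op](context[name], float(value))


def select(element: ET.Element, context: dict) -> ET.Element:
    """Suppress children whose condition is false; recurse into the ones that remain."""
    for child in list(element):  # iterate over a copy, since we remove while looping
        expr = child.get("expr")
        if expr is not None and not condition_holds(expr, context):
            element.remove(child)
        else:
            select(child, context)
    return element


PAGE = """<body>
  <p>Welcome</p>
  <img expr="screen_width &gt;= 320" src="photo-large.jpg"/>
  <p expr="screen_width &lt; 320">Photo omitted on small screens</p>
</body>"""

if __name__ == "__main__":
    small = select(ET.fromstring(PAGE), {"screen_width": 240})
    print([child.tag for child in small])  # ['p', 'p']
```

The same page yields `['p', 'img']` for a 1024-pixel-wide screen. The selection decision travels with the content rather than being hard-coded in the proxy.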

ESI [1] uses a mechanism similar to W3C’s content selection. Logical expressions are embedded with ESI markup into an HTML object and evaluated at run time to determine which fragment will be selected. The main purpose of ESI selection, however, is dynamic content assembly for different users.

Besides providing content descriptions on the server side, some approaches let web servers give explicit guidance so that a proxy can make the best choice when modifying web content. An example is server-directed transcoding [33] by Mogul et al. The authors proposed new HTTP header directives by which a web server can give hints to a proxy on how to modify a web object. They also proposed the use of applets (Java, Perl, etc.) to modify the web object according to the web server’s guidance.

In the above approaches, the web server either gives instructions on how to customize web content delivery, or provides descriptions of the content so that other web intermediaries can perform the task. In the next section, we introduce several software tools that provide customized HTML content to clients.

2.5 Existing Software Tools

There are numerous software tools on the market that customize Web content according to different clients’ needs. These tools include WebSphere Everyplace Mobile Portal [46] and WebLogic Portal [47]. Based on the users’ hardware capabilities and preferences, they can:

- transform HTML content into different markup languages such as WML;
- change the page layout to suit different screen sizes;
- filter out parts of HTML objects that clients are not interested in.

In these tools, content descriptions are embedded in the content via special markup. User preferences are stored locally on the server when users register with it. Device capabilities are retrieved from specialized external repositories such as Volantis [48]. These commercial tools can support certain content transformation functions, but their implementations are proprietary and not easily extended to support other functions.


Different software tools may provide different sets of options, depending on their design. If a new type of content emerges, clients have to wait for a software update before it can be handled. Extensibility and flexibility are therefore problems for these existing software tools.

2.6 Summary

This chapter has reviewed approaches relevant to customized content delivery from different aspects. ICAP provides a framework in which almost any service for customized web content delivery can be implemented. The service is provided by redirecting HTTP requests or responses to dedicated ICAP servers, and OPES provides a similar system. The InfoPyramid approach provides a model to generate and organize different versions of content in HTML objects. Different versions can be selected when a client requests a particular HTML object. However, generating the different versions relies on content descriptions and specific modules.

To support content customization for different clients, we need descriptions of the clients’ preferences and capabilities as well as of the content. On the client side, frameworks such as CC/PP and UAProf handle client descriptions. On the server side, approaches like ESI provide content descriptions through their own markup languages, but their focus is on dynamic content assembly and caching. Other approaches, like server-directed transcoding, give server guidance on how to customize content. However, they focus on transforming objects embedded in HTML pages.

Existing software products, like IBM WebSphere Mobile Portal and BEA WebLogic Portal, also provide content adaptation according to users’ device capabilities. However, different products offer different sets of options to clients. There is also no standard way to map all the clients’ preferences to the options a product provides. In short, no approach in the literature directly addresses the efficiency issue in content delivery. We can use existing frameworks such as ICAP and CC/PP to support our model, but we need to add elements to our model to improve efficiency. In the next chapter, we explain our own content description model in detail.


Chapter 3

A General Content Description Model

As mentioned in previous chapters, we need a general content description model that provides semantic information about content to support the different functions performed by different parties. In this chapter, we first give the general settings for the content description model. Under these settings, we discuss the design considerations of the model. In particular, we look at how the content should be described, and how the descriptions should be organized and associated with the content. Finally, we show how to achieve these design goals.

3.1 General Settings

As illustrated in Figure 3.1, without loss of generality, we assume that there are three entities: a Web server, a proxy, and a user. We say that a Web object is the smallest unit for content delivery. For example, a Web object can be a paragraph of text between a pair of <p> and </p> tags in an XHTML document. Every Web object is associated with a unique identifier, which we call a web object identifier. We will make this notion more precise in Section 3.2.

We assume that the server holds some Web content, which is a set of Web objects. The user may send requests for Web objects to the Web server through the proxy. For each request, the user may use a different device or application. The user sends the proxy a list of device capabilities and user preferences, indicating the requirements that the device and application impose on the Web objects for the current request.

Based on the user’s preferences and device capabilities as well as its own local policies, the proxy determines a set of functions to perform on the content before delivering it to the user. Each function here refers to a set of logical operations to be performed on a Web object, e.g., caching or image transcoding.

Figure 3.1 General Settings


Next, the proxy requests from the server the Web objects and the content descriptions that the corresponding functions need. The proxy then performs the desired functions on the Web objects and delivers the results to the user.

Note that, for some functions such as caching, whether to apply them to a certain Web object may not depend on the user’s capabilities and preferences at all. For other functions, such as image transcoding, the decision may indeed depend on them.
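The proxy's side of this setting can be summarized as two lookups. The first maps the user's description, together with local policy, to a set of functions. The second maps those functions to the content-description attributes the proxy must request from the server. The sketch below is a hypothetical illustration of that flow only. The rule table, the thresholds and the attribute names are placeholders chosen for the example, not the mapping algorithms of this thesis. As in the note above, caching is applied regardless of the user, while transcoding depends on the device.

```python
from typing import Callable, Dict, Set, Tuple

UserDescription = Dict[str, object]

# Local policy: functions performed regardless of the user (e.g. caching).
ALWAYS_ON: Set[str] = {"caching"}

# Hypothetical rules deciding whether a user-dependent function applies.
FUNCTION_RULES: Dict[str, Callable[[UserDescription], bool]] = {
    "image_transcoding": lambda d: d.get("screen_width", 10**6) < 640,
    "ad_filtering": lambda d: bool(d.get("block_ads", False)),
}

# Hypothetical content-description attributes each function needs.
REQUIRED_ATTRIBUTES: Dict[str, Set[str]] = {
    "caching": {"ttl"},
    "image_transcoding": {"image_size", "image_color"},
    "ad_filtering": {"is_advertisement"},
}


def plan_request(user: UserDescription) -> Tuple[Set[str], Set[str]]:
    """Return (functions to perform, description attributes to request from the server)."""
    functions = set(ALWAYS_ON)
    functions |= {name for name, applies in FUNCTION_RULES.items() if applies(user)}
    attributes: Set[str] = set()
    for name in functions:
        attributes |= REQUIRED_ATTRIBUTES[name]
    return functions, attributes
```

Only the attributes in the second set need to travel from the server. This is exactly the efficiency concern about delivering all descriptions of a page raised in Chapter 1.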


Even for the same content property, there can be multiple types of descriptions. For example, to describe the expected expiry time of a Web object, ESI suggests a single Time-To-Live (TTL) value. Other approaches use more complex descriptions, such as a set of TTL values, each associated with the probability that the Web object expires at that time. The content description model should therefore allow different types of descriptions for the same property.
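The sketch below illustrates two description types for one attribute. It lets a proxy ask the same question of either type: how likely is it that an object of a given age has already expired? For the example only, we make two assumptions. The single-TTL type means the object expires exactly at its TTL. The multi-valued type is a list of (TTL, probability) pairs whose probabilities sum to one. The type names `single-ttl` and `ttl-distribution` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Description = Dict[str, Union[str, int, List[Tuple[int, float]]]]


def expired_probability(description: Description, age: int) -> float:
    """Probability that an object of the given age (in seconds) has already expired."""
    kind = description["type"]
    if kind == "single-ttl":
        # One TTL: the object is fresh until exactly TTL seconds have passed.
        return 1.0 if age >= description["ttl"] else 0.0
    if kind == "ttl-distribution":
        # Several candidate TTLs, each with the probability that the object expires then.
        return sum((p for ttl, p in description["ttls"] if ttl <= age), 0.0)
    raise ValueError(f"unknown description type: {kind}")


esi_style = {"type": "single-ttl", "ttl": 300}
probabilistic = {"type": "ttl-distribution",
                 "ttls": [(60, 0.25), (600, 0.5), (3600, 0.25)]}
```

A proxy could, for instance, treat a cached copy as fresh while the returned probability stays below a locally chosen threshold. The threshold is a proxy policy, not part of the description.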

Furthermore, when a proxy requests the content descriptions it needs to support its functions, the model should make it easy for the server to select only the desired descriptions. Hence, the content description model should organize all the content descriptions so that the server can perform such selection efficiently for any proxy function.

To achieve these design goals, we need to decide two things. First, how a Web object should be described in a general way. Second, how the descriptions should be organized so that we can efficiently locate the descriptions of a given object and select the desired ones. In the following sections, we describe our solutions to these problems.

3.2 Proposed Content Description Model

3.2.1 Web Objects

Without loss of generality, we assume that the Web content the user is interested in is always stored in a well-formed mark-up language, in the sense that:

(1) the tags always appear in pairs; and
(2) after removing any pair of tags and the content between them, the remaining content is still well-formed.

The second requirement implies that the tags are properly nested, so we can always compute a parse tree from a document encoded in such a mark-up language. XML and XHTML are examples of such mark-up languages. In the rest of this thesis we use XHTML as an example, but our content description model applies to any well-formed mark-up language. We also assume that each Web object consists of all the content enclosed by a pair of tags, and is always represented by a parse tree.

Recall that we require every Web object to be uniquely identified by some identifier (ID). For any XHTML page P that includes a unique pair of tags “<html>” and “</html>”, let its URL be U_P; the ID of the Web object that represents P is then U_P#root(), where root() represents the root node of the parse tree of the page. Similarly, we can give an identifier to each element in P. For example, U_P#root().child(1) is the ID of the first child of the root node, while U_P#root().child(1).child(5) is the ID of the fifth child of that first child. Note that this notation is similar to the annotation scheme proposed by the W3C [50] for content transcoding.
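The identifier scheme can be illustrated with a short Python sketch that walks a parse tree and assigns each element an ID of the form U_P#root().child(i)...; it is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes that child positions are 1-based and that only element nodes are counted (text is skipped), and it uses the standard library's xml.etree.ElementTree as the parser; the function name assign_ids and the example URL are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def assign_ids(page_url: str, xhtml: str) -> dict[str, ET.Element]:
    """Map each element ID of the form <URL>#root().child(i)... to its node."""
    root = ET.fromstring(xhtml)
    ids: dict[str, ET.Element] = {}

    def walk(node: ET.Element, path: str) -> None:
        ids[path] = node
        # Only element children are numbered, starting from 1 as in child(1).
        for index, child in enumerate(node, start=1):
            walk(child, f"{path}.child({index})")

    walk(root, f"{page_url}#root()")
    return ids


page = "<html><head><title>t</title></head><body><p>a</p><p>b</p></body></html>"
ids = assign_ids("http://example.com/index.html", page)
print(ids["http://example.com/index.html#root().child(2).child(2)"].text)  # prints b
```

Here root().child(2).child(2) is the second <p> inside <body>. Counting only elements keeps the IDs unchanged when the whitespace between tags changes; a scheme that also numbered text nodes would need a parser that exposes them as separate nodes.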


3.2.2 Object Description Scheme

In the proposed model, we define an object description scheme (ODS) to describe Web objects. Under our framework, every uniquely identified Web object is associated with a number of descriptions. Each description of a Web object is a tuple <ID, attribute, value>, where ID is the unique identifier of the Web object, attribute is a string naming the property of the Web object we want to describe, and value is another string that specifies that property. In particular, a description takes the following form:

<Author>Anonymous Author</Author>

</ods>
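The tuple <ID, attribute, value> maps directly onto a small immutable record. The Python sketch below, with the hypothetical names Description and DescriptionIndex, keeps all three fields as strings as the scheme requires, and adds an in-memory index from object ID to descriptions so that the descriptions of a given object can be located without scanning every description; the index is an illustrative assumption, not part of the ODS definition.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """One ODS description tuple <ID, attribute, value>; all three are strings."""
    object_id: str
    attribute: str
    value: str


class DescriptionIndex:
    """Hypothetical in-memory index from object ID to that object's descriptions."""

    def __init__(self) -> None:
        self._by_id: dict[str, list[Description]] = defaultdict(list)

    def add(self, d: Description) -> None:
        self._by_id[d.object_id].append(d)

    def lookup(self, object_id: str, attribute: str | None = None) -> list[Description]:
        # .get() avoids creating empty entries for unknown IDs.
        found = self._by_id.get(object_id, [])
        if attribute is None:
            return list(found)
        return [d for d in found if d.attribute == attribute]
```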


For ease of selection, the descriptions of a Web object are organized according to its attributes, such as its expected time of expiry. However, to accommodate different kinds of descriptions for the same attribute, we also define a type that differentiates them. In this way, all the descriptions of a Web object are organized as a set of XML documents, each of which stores the descriptions for one particular attribute of one particular type. Each such XML document consists of three parts:

XML and XML Namespace Declarations

Since all the descriptions are in the form of XML, we need an XML declaration “<?xml version='1.0'?>” to indicate the beginning of an XML document. Furthermore, all the namespaces used in the XML document need to be declared. These include the scheme's default namespace as well as the namespaces used for property descriptions.

Attribute Meta-data Definition

This part specifies meta-data about the attribute described in the XML document. The meta-data are given as attributes of the tag “<ods:attribute>”: “attr” indicates which attribute of the Web object is described in this document, “type” gives the type of the description for that attribute, “isDefault” indicates whether this is the default description of this attribute for the Web object, and “mode” gives the mode of the description. The meaning of these meta-data is explained further in Chapter 4.

Description Tuples

As shown above, a description tuple consists of an object ID, a property and its value. Note that the property can come from a namespace other than the default one, which allows descriptions defined by existing content description frameworks to be reused. The value can be either a literal or markup from an XML namespace. All the description tuples must be enclosed in the root element “ods:Desp”.
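One way to produce a document with the three parts above is sketched below with Python's xml.etree.ElementTree. The namespace URI http://example.org/ods#, the placement of <ods:attribute> as the first child of the root <ods:Desp>, and the <ods:Description about="..."> wrapper around each tuple are all assumptions made for illustration, since the exact ODS syntax is not reproduced here; the function build_ods_document is likewise hypothetical. The values allowed for "mode" are defined in Chapter 4, so the string used in any call is only a placeholder. ElementTree adds an encoding pseudo-attribute to the XML declaration quoted above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thesis does not give the ODS namespace here.
ODS_NS = "http://example.org/ods#"
ET.register_namespace("ods", ODS_NS)  # serialize with the "ods:" prefix


def q(name: str) -> str:
    """Qualify a local name with the ODS namespace (ElementTree's {uri}name form)."""
    return f"{{{ODS_NS}}}{name}"


def build_ods_document(attr, desc_type, is_default, mode, tuples):
    """Build one ODS document for a single (attribute, type) pair.

    `tuples` is a list of (object_id, value) pairs that all describe `attr`.
    """
    root = ET.Element(q("Desp"))  # root element enclosing all description tuples
    # Attribute meta-data: attr, type, isDefault and mode.
    ET.SubElement(root, q("attribute"), {
        "attr": attr,
        "type": desc_type,
        "isDefault": "true" if is_default else "false",
        "mode": mode,
    })
    for object_id, value in tuples:
        # Assumed tuple layout: <ods:Description about="ID"><ods:ATTR>value</ods:ATTR>
        d = ET.SubElement(root, q("Description"), {"about": object_id})
        ET.SubElement(d, q(attr)).text = value
    # xml_declaration=True emits <?xml version='1.0' encoding='...'?> first.
    return ET.tostring(root, encoding="unicode", xml_declaration=True)
```

Keeping one (attribute, type) pair per document means the meta-data element appears exactly once, so a reader can decide from that single element whether the document is relevant.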

The following figure shows an example of the descriptions of the Web objects in a simple XHTML page, where the property described is the objects' time-to-live (TTL).


attribute descriptions for multiple Web objects

Rule two: Ignore irrelevant attribute descriptions. Sometimes an attribute description attached to a parent node does not apply to its child nodes. For example, a Web page contains a news article in English with a few images among the text. The author would like to specify an attribute indicating that the text is in English. This description is attached at the “root” node level, but it obviously does not apply to the image Web objects, so we ignore it when processing them.

Rule three: Local attributes have a higher priority than global attributes. If attribute descriptions are attached to both a child node and its parent node, those attached to the child node override the ones attached to the parent node. This rule complements rule one.

With the three rules defined above, it is clear which attributes apply to a particular piece of content. The next section discusses the language components and functionalities introduced in this chapter.
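Taken together, the rules amount to an inheritance pass over the parse tree. The Python sketch below assumes that rule one, only partly preserved in this copy, lets descriptions attached to a node propagate to its descendants, which is consistent with rule three being described as its complement. The node kinds ("page", "text", "image") and the APPLIES_TO table that drives rule two are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node; `kind` is a hypothetical content category."""
    kind: str
    own: dict[str, str] = field(default_factory=dict)  # descriptions attached here
    children: list[Node] = field(default_factory=list)


# Hypothetical table for rule two: the node kinds each attribute applies to.
# Attributes not listed apply to every kind of node.
APPLIES_TO = {"language": {"page", "text"}}


def effective_attributes(node, inherited=None):
    """Return (node, effective descriptions) pairs in document order."""
    # Rule three: the node's own descriptions override inherited ones.
    merged = {**(inherited or {}), **node.own}
    # Rule two: hide descriptions that are irrelevant to this kind of node.
    visible = {a: v for a, v in merged.items()
               if a not in APPLIES_TO or node.kind in APPLIES_TO[a]}
    result = [(node, visible)]
    for child in node.children:
        # Assumed rule one: descendants inherit the parent's merged descriptions.
        result.extend(effective_attributes(child, merged))
    return result
```

For a root page described as English with a TTL, an image child sees only the TTL, while a text child carrying its own language description keeps that value instead of the root's.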

3.2.3 Discussion

The object description scheme given in the previous section is highly flexible. Since the values of the properties can be any string, we can even embed arbitrarily encoded text in them, as long as the proxy knows how to decode it and use it to process the objects. For example, in Edge Side Includes (ESI), we sometimes need to specify multiple alternatives for the same Web object by using <esi:try> and <esi:attempt> tags. In this case, we can encode the entire content between the <esi:try> and </esi:try> tags into a single string and store it as a description of the object.


When the proxy requests the description, it receives and decodes this string and processes the ESI entry as if it had been embedded in the page.
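A minimal Python sketch of this idea follows. Base64 is used here as one possible encoding; the text above only requires that the proxy know how to decode the string. The helper names are hypothetical, and the extraction step assumes a single, non-nested <esi:try> block.

```python
import base64


def extract_try_block(page: str) -> str:
    """Return the markup between the first <esi:try> and its </esi:try>."""
    start_tag, end_tag = "<esi:try>", "</esi:try>"
    start = page.index(start_tag) + len(start_tag)  # raises ValueError if absent
    end = page.index(end_tag, start)
    return page[start:end]


def encode_esi_block(esi_markup: str) -> str:
    """Server side: pack the ESI markup into one opaque description value."""
    return base64.b64encode(esi_markup.encode("utf-8")).decode("ascii")


def decode_esi_block(encoded: str) -> str:
    """Proxy side: recover the original ESI markup from the description value."""
    return base64.b64decode(encoded.encode("ascii")).decode("utf-8")
```

Because the Base64 alphabet contains no "<" or "&", the encoded value can be placed in an XML description without escaping.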

Furthermore, as mentioned before, descriptions of different attributes are put into different XML documents. In this way, if only the descriptions of one attribute, such as the lifetime of the objects, are required, the server can easily pick out the relevant XML document and deliver it to the proxy.
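With this organization, selection becomes a lookup rather than a filter over all descriptions. The sketch below keeps one document per (attribute, type) pair and returns only the requested one. The class and method names, and the rule that the first stored type serves as the default until a document flagged with isDefault arrives, are assumptions for illustration.

```python
from __future__ import annotations


class DescriptionStore:
    """Hypothetical server-side store holding one ODS document per (attribute, type)."""

    def __init__(self) -> None:
        self._docs: dict[tuple[str, str], str] = {}
        self._default_type: dict[str, str] = {}

    def put(self, attribute: str, desc_type: str, document: str,
            is_default: bool = False) -> None:
        self._docs[(attribute, desc_type)] = document
        # Assumed policy: an explicit isDefault wins; otherwise the first type stays default.
        if is_default or attribute not in self._default_type:
            self._default_type[attribute] = desc_type

    def select(self, attribute: str, desc_type: str | None = None) -> str | None:
        """Return only the document for one attribute (default type if none given)."""
        if desc_type is None:
            desc_type = self._default_type.get(attribute)
            if desc_type is None:
                return None
        return self._docs.get((attribute, desc_type))
```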


Chapter 4

A Framework for Efficient Content Distribution

In the previous chapters, we introduced a content description model for Web objects, focusing on how to use the model to provide content descriptions and to facilitate their efficient selection. In this chapter, we propose a framework to improve the efficiency of content distribution in a heterogeneous environment. In the rest of the chapter, we first introduce the design objectives that the framework must meet to achieve efficiency. We then define the operations performed by Web servers, proxies and clients to support these objectives. Finally, we summarize the framework.

4.1 Design Objectives

The following are the objectives we would like the framework to achieve.


System architecture

The functions discussed in previous chapters can be performed by the server, the proxy, or even the user. To improve efficiency, we need to come up with a system architecture in which the responsibilities of the server, the proxy, and the user are clearly defined.

Select the right functions to perform

A proxy may have a set of functions for different purposes. For a user with certain preferences and device capabilities, we need to select the right functions to perform on the content before delivering the result to the user. Otherwise, we may end up delivering unacceptable content to the user.

Transfer necessary content descriptions only

Due to the variety of user preferences and device capabilities, Web servers need to maintain a large set of content descriptions to support various functions on Web objects. Different proxies may require different portions of the content descriptions on the server to support different functions. Hence, we need a mechanism for a server to transfer only the descriptions needed by a proxy.

Reuse existing content descriptions

Repeatedly transferring the same set of descriptions to a proxy wastes network bandwidth, especially when the volume of descriptions is large. Since the descriptions of a Web object may expire, we need a mechanism to properly cache, validate and reuse descriptions previously retrieved from a server.
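The last two objectives can be combined in a proxy-side cache, sketched below in Python. It assumes a hypothetical fetch(url, attributes) call to the server that returns, for only the requested attributes, each ODS document together with a lifetime in seconds. Fresh entries are reused, and missing or expired ones are requested again. Validation is simplified here to refetching on expiry; a fuller version could revalidate with the server instead of transferring the whole document again.

```python
from __future__ import annotations

import time


class DescriptionCache:
    """Hypothetical proxy-side cache of ODS documents keyed by (object URL, attribute)."""

    def __init__(self, fetch, clock=time.monotonic):
        # fetch(url, attributes) -> {attribute: (document, lifetime_seconds)}
        self._fetch = fetch
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[str, float]] = {}

    def get(self, url: str, attributes: list[str]) -> dict[str, str]:
        now = self._clock()
        result: dict[str, str] = {}
        missing: list[str] = []
        for attr in attributes:
            entry = self._entries.get((url, attr))
            if entry is not None and entry[1] > now:
                result[attr] = entry[0]  # reuse a copy that has not yet expired
            else:
                missing.append(attr)
        if missing:
            # Request only the descriptions this proxy needs and does not hold fresh.
            for attr, (doc, lifetime) in self._fetch(url, missing).items():
                self._entries[(url, attr)] = (doc, now + lifetime)
                result[attr] = doc
        return result
```

Injecting the clock keeps expiry testable without sleeping; in a deployment the lifetime would come from the descriptions' own expiry information.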

4.2 System Architecture

When designing the architecture that handles content delivery from the Web server to the user in a heterogeneous environment, we need to address the following issues: who provides the content descriptions, who performs the functions, and who provides descriptions of user preferences and device capabilities. In the following, we first introduce the considerations for the system architecture, followed by the overall architecture design.

We would like the system architecture to take two design considerations into account, namely scalable service and transparency to end users. We discuss them one by one.

Scalable service

Due to the rapid increase in Internet hosts and end users, content delivery needs to be scalable. That is, the response time for content delivery should remain roughly the same as the number of users requiring customized delivery from the same website increases.

Transparent to users


Because of the large number of users accessing the Web, it would be very inconvenient or even impractical to require a dramatic change to the software (e.g., browsers) they use to access the Web. In our system architecture, users only need to express their preferences and device capabilities; no other change to end-user software is required.

With these design considerations in mind, we now address how to allocate responsibilities among the Web server, the proxy and the end user.

The Web server is the most appropriate place to host the content descriptions. Since content authors provide the Web content, they know their content better than anyone else. It is therefore natural for the content authors to provide both the Web content and the content descriptions, and to host all of them on the Web server.

As for who performs the functions on Web content, there are a few choices, namely the Web server, the proxy and even the end user. Each choice has different benefits and constraints.

The Web server hosts the Web content and the content descriptions. If it performs the functions on Web content, the benefit is that it can refer to the content descriptions locally when performing the necessary functions. At the same time, it also has to obtain the user descriptions (preferences, capabilities) to decide which functions to perform. Since the user descriptions vary, the server may need to perform different sets of
