page the user has identified a local file. By clicking on the Upload button, the user asks the browser to send a PUT request to the server.
2.2.4 File Deletion – DELETE
With GET and PUT operations, HTTP becomes a serviceable protocol for simple file transfers. The DELETE operation completes this function by giving clients a way to delete objects from servers. The message exchange contains no surprises. As figure 2.13 shows, the client sends a DELETE message along with the URI of the object the server should remove. The server responds with a status code and, optionally, more data for the client.
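The DELETE exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server is modeled as an in-memory dictionary, and the names `OBJECTS`, `build_delete_request`, and `handle_delete`, along with the example host and path, are hypothetical.

```python
# Hypothetical in-memory object store standing in for a real server.
OBJECTS = {"/reports/old.html": b"<html>old report</html>"}


def build_delete_request(host, uri):
    """Return the text of a DELETE request for `uri` (step 1 of the exchange)."""
    return f"DELETE {uri} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def handle_delete(uri):
    """Remove the object and return a (status code, reason phrase) pair (step 2)."""
    if uri in OBJECTS:
        del OBJECTS[uri]
        return 200, "OK"
    return 404, "Not Found"
```

Deleting the same URI twice would yield 200 the first time and 404 the second, since the object no longer exists.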
Figure 2.12 The PUT request may be used to upload a file to a server. In this example the user wants to store the indicated file on the server.
2.3 Behind the Scenes
The basic HTTP operations generally occur as a direct result of end-user actions. Those four operations are not the only ones the protocol defines, however. Three additional operations, OPTIONS, HEAD, and TRACE, frequently take place behind the scenes. Clients use them to communicate with servers not so much to perform user actions but to prepare for or diagnose problems with the basic operations.
Although this section does not discuss it further, the HTTP specification also reserves a name for another operation, CONNECT. The standard does not define how CONNECT works, except to indicate that it is intended to support tunneling. (See section 2.4.3.) Future extensions to HTTP may define CONNECT in more detail.
2.3.1 Capabilities – OPTIONS
Clients can use an OPTIONS message to discover what capabilities a server supports. The exchange is the standard request and response, as figure 2.14 illustrates. If the client includes a URI, the server responds with the options relevant to that object. If the client sends an asterisk (*) as the URI, the server returns the general options that apply to all objects it maintains.
A client might use the OPTIONS message to determine the version of HTTP that the server supports or, in the case of a specific URI, which encoding methods the server can provide for the object. Such information would let the client adjust how it interacts with the server or how it actually requests a specific object.
Figure 2.13 The DELETE operation lets a client remove an object from a server. The URI identifies the object to delete. (Diagram: 1, the client sends DELETE URI; 2, the server replies 200 OK + Data.)
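A minimal Python sketch of how a server might answer OPTIONS follows. It assumes the reply carries the permitted methods in an `Allow` header, which is one kind of option information HTTP/1.1 defines. The method lists, the path, and the function name `handle_options` are invented for illustration.

```python
# Hypothetical capability tables; a real server would derive these from its configuration.
SERVER_METHODS = ["GET", "HEAD", "OPTIONS", "TRACE"]
OBJECT_METHODS = {
    "/upload/report.txt": ["GET", "HEAD", "PUT", "DELETE", "OPTIONS"],
}


def handle_options(uri):
    """Return (status code, headers) for an OPTIONS request on `uri`."""
    if uri == "*":
        # An asterisk asks about the server as a whole.
        methods = SERVER_METHODS
    else:
        # A specific URI asks about that object; fall back to the general list.
        methods = OBJECT_METHODS.get(uri, SERVER_METHODS)
    return 200, {"Allow": ", ".join(methods)}
```

A client could inspect the returned `Allow` value before attempting, say, a PUT to the same URI.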
2.3.2 Status – HEAD
The HEAD operation is just like a GET operation, except that the server does not return the actual object requested. As figure 2.15 shows, the server returns a status code but no data. (HEAD is short for “header,” as the server returns only message headers in response.) Clients can use a HEAD message when they want to verify that an object exists, but they don’t need to actually retrieve the object. Programs that verify links in Web pages, for example, can use the HEAD message to ensure that a link refers to a valid object without consuming the network bandwidth and server resources that a full retrieval would require. Cache servers can also use the HEAD operation; it gives them a way to see if an object has changed without actually retrieving the full object.
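The difference between GET and HEAD, and the link-checking use described above, might be sketched as follows. The server is simulated by a dictionary so the example needs no network; `OBJECTS`, `handle`, and `link_is_valid` are hypothetical names.

```python
# Hypothetical stored objects standing in for a Web server's content.
OBJECTS = {"/index.html": b"<html>hello</html>"}


def handle(method, uri):
    """Return (status, headers, body) for a GET or HEAD request."""
    body = OBJECTS.get(uri)
    if body is None:
        return 404, {"Content-Length": "0"}, b""
    headers = {"Content-Length": str(len(body))}
    # HEAD returns the same headers as GET but never the object itself.
    return 200, headers, body if method == "GET" else b""


def link_is_valid(uri):
    """Check that a link refers to an existing object without retrieving it."""
    status, _headers, _body = handle("HEAD", uri)
    return status == 200
```

Note that the HEAD response still reports the object's length in its headers, which is part of what lets a cache judge whether the object has changed.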
2.3.3 Path – TRACE
The TRACE message gives clients a way to check the network path to a server. When a server receives a TRACE, it responds simply by copying the TRACE message itself into the data for the response. Figure 2.16 shows the simplest case.
Figure 2.14 Clients can use an OPTIONS request to ask about a particular object or about the server itself. The server returns the options data in its response. (Diagram: 1, the client sends OPTIONS URI; 2, the server replies 200 OK + Options.)
TRACE messages are more useful when multiple servers are involved in responding to a request. An intermediate server, for example, may accept requests from clients but turn around and forward those requests on to additional servers. (Proxies and cache servers, described in the next section, are examples of such intermediate servers.) When an intermediate server is involved, TRACE works as in figure 2.17. The intermediate server modifies the request by inserting a Via option in the message. This Via option is part of the message that arrives at the destination server, and it is copied into the data of the server’s response. When the client receives the response, it can see the Via option in the data and identify any intermediate servers in the path. Section 3.2.34 describes this process in more detail.
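The following Python sketch walks a TRACE message through two intermediate servers and back. It is a simulation, not a network client. The function names and the host names `proxy1` and `proxy2` are hypothetical, and the Via value format ("1.1" followed by a name) is a simplifying assumption.

```python
def client_trace(uri, host):
    """Build the TRACE request the client sends."""
    return f"TRACE {uri} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def intermediate_forward(message, name):
    """Add this server's identity to the Via header, creating it if absent."""
    head = message.partition("\r\n\r\n")[0]
    lines = head.split("\r\n")
    for i, line in enumerate(lines):
        if line.lower().startswith("via:"):
            lines[i] = f"{line}, 1.1 {name}"
            break
    else:
        lines.append(f"Via: 1.1 {name}")
    return "\r\n".join(lines) + "\r\n\r\n"


def ultimate_server(message):
    """Echo the received TRACE message as the data of a 200 response."""
    return 200, message


def via_hops(echoed):
    """Extract the list of intermediate servers from the echoed message."""
    for line in echoed.split("\r\n"):
        if line.lower().startswith("via:"):
            return [hop.strip() for hop in line[4:].split(",")]
    return []
```

Because the destination server echoes exactly what it received, the client learns about intermediaries it never addressed directly.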
Figure 2.16 Servers respond to TRACE requests by echoing the request in their reply.
Figure 2.17 The TRACE request lets clients discover the path their messages follow through a network of intermediate servers. (Diagram: 1, the client sends TRACE to the intermediate server; 2, the intermediate server sends TRACE + Via to the ultimate server; 3 and 4, 200 OK + Message returns along the same path.)
and a single server. The HTTP protocol defines more complex interactions, however, that frequently involve multiple servers cooperating on a client’s behalf. In this section, we’ll look at the different ways that multiple servers may be involved in a communication exchange.
2.4.1 Virtual Hosts
Of all the enhancements that HTTP version 1.1 adds to version 1.0, one of the smallest is direct support for virtual hosts. But although the protocol change is small, this feature is a major benefit for the World Wide Web. Virtual host support addresses a key element of the Web’s architecture that the designers of version 1.0 did not anticipate: Web hosting providers.
The popularity of the Internet has created a tremendous demand for Web sites, as organizations ranging from corporations to individuals (and even pets!) establish a presence on the Web. In many cases, though, it is impractical or inefficient for the organization itself to own and operate the servers and network infrastructure a Web site requires. To meet this demand, traditional Internet Service Providers, telecommunications carriers, and specialized service providers can host Web sites on behalf of other organizations. A significant majority of sites on the Internet are modest and require few resources from the systems on which they run. Because such sites don’t require a dedicated server, most Web hosting providers actually run many separate Web sites on a single server, as figure 2.18 illustrates.
The problem facing a Web server hosting multiple Web sites is simply stated: When a client requests a Web page, how does the server know which site the client is attempting to access? Consider a client request for the Web page corresponding to http://www.company1.com/news.html. The client first resolves the host part, www.company1.com, to an IP address. Then, as figure 2.19 shows, it establishes a TCP connection and sends the HTTP command GET /news.html to that address. Note, though, that the Web server does not participate in the DNS resolution, so it doesn’t know which host the client intends to contact. The Web server has no way of knowing whether “news.html” refers to company1.com or company2.com.
Prior to HTTP 1.1, Web hosting providers had only two ways to solve this problem. They could require the Web sites to use unique URIs for all their pages. So if company1.com had a page named “news.html” on its site, company2.com could not use that same name within its pages. In practice, Web hosting providers implemented this solution by requiring a site identifier in all path names. For example, instead of the straightforward URI “http://www.company1.com/news.html,” the company1.com Web site might use the more complicated “http://www.company1.com/company1.com/news.html.” As an alternative, Web hosting providers could assign separate IP addresses to each site on their servers. The servers then determine which site a client has requested by examining the IP address to which the client connects. The drawback is that servers end up with multiple IP addresses, and IP addresses are scarce resources.
Figure 2.18 Virtual hosting lets many Web addresses share the same Web server. This configuration is typical in ISPs that provide Web hosting for small businesses and individuals.
Figure 2.19 Virtual hosts can make it difficult for the Web server to determine which Web site the client is trying to access. In this case the physical Web server has no idea which Web address the client requested because it did not participate in the DNS exchange that mapped the host name to its IP address.
With version 1.1, HTTP addresses the problem of virtual hosts with a simple addition to the client’s request. That addition is the Host header, in which the client must place the host name of the site it is requesting. As figure 2.20 shows, the server can easily determine the site to which a request applies, and it can return the appropriate resource.
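As a rough sketch of how a virtual hosting server might use the Host header, the Python fragment below selects a site by host name before looking up the path. The `SITES` table, its contents, and the function names are hypothetical; only the Host header itself and the 400 reply for a request that omits it come from HTTP/1.1.

```python
# Hypothetical content for two sites that share one physical server.
SITES = {
    "www.company1.com": {"/news.html": "Company 1 news"},
    "www.company2.com": {"/news.html": "Company 2 news"},
}


def parse_request(raw):
    """Split a raw request into (method, path, headers), header names lowercased."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


def serve(raw):
    """Return (status, body), choosing the site from the Host header."""
    _method, path, headers = parse_request(raw)
    host = headers.get("host")
    if host is None:
        return 400, "Bad Request"  # HTTP/1.1 requests must carry Host
    site = SITES.get(host)
    if site is None or path not in site:
        return 404, "Not Found"
    return 200, site[path]
```

The same path, /news.html, now yields different content depending solely on the Host value, with no need for site identifiers in the path or extra IP addresses.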
2.4.2 Redirection
While virtual host support allows a single server to support multiple Web sites easily, redirection offers a way for a single site to use multiple servers. Redirection lets a server redirect a client to another URI for an object. Figure 2.21 shows the process. First the client requests an object from the first Web server. Instead of returning the requested object, however, the server replies with a 301 Moved Permanently status code. The response also indicates a new URI for the object. The client recognizes this URI and, in step 3, reissues the request. This time the GET succeeds, and the second server returns the actual object.
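The client's side of redirection can be sketched in Python as a loop that follows the new URI until it receives something other than a 301. The two "servers" are simulated by a lookup table, and the URIs and function names are hypothetical. The sketch assumes the new URI arrives in the `Location` header, which is where HTTP/1.1 carries it.

```python
# Hypothetical responses from two servers: (status, headers, body).
RESPONSES = {
    "http://www.example.com/old.html": (
        301, {"Location": "http://www2.example.com/new.html"}, ""),
    "http://www2.example.com/new.html": (200, {}, "the object"),
}


def get(uri):
    """Simulate a single GET request."""
    return RESPONSES.get(uri, (404, {}, ""))


def fetch(uri, max_redirects=5):
    """Follow 301 responses and return (status, final URI, body)."""
    for _ in range(max_redirects + 1):
        status, headers, body = get(uri)
        if status == 301 and "Location" in headers:
            uri = headers["Location"]  # reissue the request to the new URI
            continue
        return status, uri, body
    raise RuntimeError("too many redirects")
```

The redirect limit is a design choice of this sketch: without it, two servers that redirect to each other would keep the client looping forever.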
Figure 2.20 The Host feature in HTTP version 1.1 lets clients explicitly identify the Web site they are accessing, so the virtual hosting Web server can return the right content. (Diagram: the client sends GET /news.html with Host: www.company1.com to a Web server hosting both www.company1.com and www.company2.com.)
Redirection is essential to the very dynamic Web environment. It provides a convenient way to support revisions within a Web site, relocation of content, and even the change of a corporate identity.
Note that the redirection does not have to specify a different host. Frequently, in fact, redirection is used to inform the client of a new path for the resource on the same host. Note also that there are other techniques for accomplishing the same effect. The server can, for example, answer the original request by providing a JavaScript object that automatically directs the client to a new location.
2.4.3 Proxies, Gateways, and Tunnels
Another way that HTTP servers can cooperate with each other is by acting as proxies, gateways, or tunnels. In each of these roles, the server that the client first contacts relays the request to a new server and then relays the second server’s response back to the client. Figure 2.22 shows a proxy server in operation.
In the figure, the client first sends its HTTP request directly to the proxy server. That server, however, cannot (or chooses not to) respond to the client immediately. Instead, it re-issues the request to a second server, which the figure labels the “origin server” (so called because it is the origin of the requested object). In the most basic case, the second GET has a URI identical to that of the first; it’s simply sent to a new server. That server treats the second GET as if it had come from a client and responds with the requested object. The proxy server then has the information the client originally requested, and it returns that object to the client in step 4.
Figure 2.21 A server redirects a client to tell the client that the object it requested is located elsewhere. When, in step 2, the client receives a 301 Moved Permanently response, it looks for a new URI in the response message and issues a new GET request for that URI.
Although figure 2.22 shows a single proxy server, HTTP allows multiple proxies to participate in satisfying a request. The proxies form a chain as in figure 2.23, handing off the request from one to the other until the requested object can be found. The proxies then pass that object back to the client in the reverse direction. As each server processes a request, it adds its own identity to the Via header in the request. By the time the request arrives at the final server, the Via header will have captured the path taken by the request through the server chain. The response follows the same process, with each intermediate system inserting its identity in the Via header. (Note that figure 2.23 shows only a partial Via header; for complete details, see section 3.2.50.)
Figure 2.22 A proxy server positions itself in between clients and servers. It forwards requests on behalf of clients and relays responses from the servers. (Diagram: 1, the Web browser sends GET to the proxy server; 2, the proxy server sends GET onward; 3 and 4, 200 OK returns along the same path.)
Figure 2.23 Proxy servers create or update the Via option as they relay requests or responses. This option may make it easier to diagnose network problems. (Diagram labels: Via: proxy1; GET URI with Via: proxy1, proxy2.)
Proxy servers perform several important functions for HTTP communications. The most common is in support of caching, which section 2.4.4 discusses in more detail. Other uses include enforcing policy for an organization. A corporation can direct all its internal clients to use a proxy server to access the public Internet, allowing the proxy server to filter that Internet access appropriately. Frequently this type of operation is part of a firewall. Proxy servers have also been used to provide anonymity to Web browsers, preventing servers from discovering identifying information about actual clients.
If, as is common, a proxy serves multiple origin servers, then the client must usually include the absolute URI in its requests. Without the full URI, the proxy may not be able to tell which server the client wishes to contact. Because this behavior is unusual for many clients, and because clients must know to send their requests to proxy servers rather than the ultimate destination, they must often be explicitly configured to use a proxy server. Chapter 5 describes some of the mechanisms that system administrators can use to automatically configure proxy servers for their users.
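The difference between a request sent directly to a server and one sent through a proxy might be sketched as follows. The function names are hypothetical; `urlsplit` is from Python's standard `urllib.parse` module.

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy):
    """Build the GET request line, using the absolute URI when a proxy is involved."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    target = url if via_proxy else path
    return f"GET {target} HTTP/1.1"


def proxy_pick_origin(line):
    """Return the origin server named in an absolute-URI request line."""
    _method, target, _version = line.split(" ")
    return urlsplit(target).netloc
```

With only the path, the second function would return an empty string, which is the proxy's dilemma the paragraph above describes.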
Gateways and tunnels operate very much like proxy servers; however, there are subtle differences. Gateways act as an endpoint to a server chain, but they still rely on other servers to provide all or part of the requested object. In many cases, gateways use a protocol other than HTTP to access the object. In figure 2.24, for example, the gateway uses the Structured Query Language to retrieve information from a database management system.
While gateways act as a definite endpoint to a server chain, tunnels are exactly the opposite. As figure 2.25 indicates, they are relatively transparent to the original client; the client may not even be aware that a tunnel exists. Tunnels do provide some service, however. In the example of figure 2.25, the tunnel establishes a secure connection to the actual server, adding security to the communication between client and server. Note that although HTTP 1.1 defines the operation of tunnels in general terms, as of this writing few practical implementations are available.
2.4.4 Cache Servers
Cache servers are a specialized type of proxy server whose main function is to improve Web performance. They do that by remembering the objects requested by clients and, if the same object is requested again (either by the same client or a different client), returning the object that they’ve remembered instead of re-requesting it from the origin server. Figures 2.26 and 2.27 show the process.
Figure 2.24 A gateway accepts HTTP requests and translates them to a different format such as SQL. The gateway also ensures that any reply is a proper HTTP response.
The first figure shows standard proxy operation. The key to a cache server’s operation is that it remembers the requested object, generally by saving a copy on its local disk or in its memory.
Figure 2.27 shows the payoff for the cache server. In this figure, a new client requests the same object as in figure 2.26. This time, however, the cache server does not need to contact the origin server. It simply returns the saved object from its local disk or memory.
Cache servers improve Web performance at both the client and the origin server. For the client, they shorten the distance to the object the client needs. As figures 2.26 and 2.27 illustrate, a cache server may be located on the same local area network as its clients. Local networks typically have higher bandwidth than wide area Internet connections, and the transmission delay across a local network is generally much less.
Cache servers also improve performance by reducing the load on the origin server. When a cache server returns an object to a client, that’s one less request to bother the origin server. Fewer requests mean that the origin server requires fewer processing and memory resources, as well as less bandwidth for its connection to the Internet.
Figure 2.26 Cache servers are proxy servers that relay requests and responses. In addition, they keep a local copy of any responses they receive. (Diagram: 1, the Web browser sends GET to the cache server; 2, the cache server sends GET across the Internet to the origin server; 3 and 4, 200 OK returns along the same path.)
One of the more complicated issues facing a cache server is knowing how long the objects it has stored in its cache remain valid. Given the dynamic nature of the Web, an object that an origin server returns at one moment may be superseded by a new object in the next moment. When that happens, the cache server must not return the object from its cache, but, rather, it must query the origin server to retrieve the new object.
As we’ll see in section 3.2, HTTP 1.1 includes several headers just to support cache servers. Those headers tell cache servers whether an object can be cached and, if so, how long it can be safely stored. Section 5.2 examines cache server operation in more detail, focusing on those aspects outside the scope of the HTTP specification itself.
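A cache server's core behavior, including the question of how long a stored object stays valid, can be sketched in Python. This is a simplified model under stated assumptions: the origin is a callable that returns the object together with a lifetime in seconds (standing in for whatever the caching headers convey), and the clock is injected so the behavior is easy to observe. The class name `CacheServer` and its parameters are hypothetical.

```python
class CacheServer:
    """Minimal cache: keep objects until their allowed lifetime runs out."""

    def __init__(self, fetch_from_origin, clock):
        # fetch_from_origin(uri) -> (body, max_age_seconds or None)
        self._fetch = fetch_from_origin
        self._clock = clock  # callable returning the current time in seconds
        self._store = {}     # uri -> (body, expiry time)

    def get(self, uri):
        now = self._clock()
        entry = self._store.get(uri)
        if entry is not None and now < entry[1]:
            return entry[0]  # still valid: answer from the local copy
        # Missing or expired: go back to the origin server.
        body, max_age = self._fetch(uri)
        if max_age:
            self._store[uri] = (body, now + max_age)
        else:
            # The origin did not allow caching; keep nothing.
            self._store.pop(uri, None)
        return body
```

A second request inside the lifetime never reaches the origin, while a request after expiry triggers a fresh retrieval.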
2.4.5 Counting and Limiting Page Views
Whenever an intermediate cache server processes client requests, the origin server can lose some control over its interactions with clients. In many ways that is a benefit, as cache servers reduce the load on origin servers and can significantly improve their performance. There are some disadvantages, though. For some Web sites, having a cache deliver pages to clients is a significant problem because it means the origin server does not know how often users view its content. When the site derives revenue from advertising, being able to count the number of site users may be critical to maximizing that revenue. As a consequence, many Web servers deliberately designate their content as non-cacheable, even when caching is otherwise both possible and desirable. The developers of HTTP have recognized this problem and introduced a technique that allows caching and yet still gives origin servers a way to count and, if desired, limit page views by the cache server clients. This technique is an extension to the base HTTP specification; it is documented in RFC 2227.
Figure 2.27 When a new client asks for the same object, the cache server returns its local copy instead of sending another request all the way to the origin server. This speeds up the response, and it saves bandwidth for the Internet connection. (Diagram: 5, a Web browser sends GET to the cache server; 6, the cache server replies 200 OK.)
The process begins when a proxy inserts a Meter header into a request message as it forwards the message on. (See section 3.2.35 for details of this header.) Steps 2 and 3 of figure 2.28 show this process. By inserting the header here, the proxy indicates its willingness to report on and limit the number of times it returns the resulting response from its cache.
Figure 2.28 Proxies insert the Meter header in requests passing through them. Servers ask for metering on a particular object by including the Meter header in their replies.
The origin server responds to this invitation by including a Meter header in its response. This header tells the proxies how to handle the object with respect to reporting and usage limitations.
Later, when another client requests the same object, the proxies that have a cached copy will need to validate that copy with the origin server. When they do, as figure 2.29 shows, they update the Meter header in their requests. This meter information is a report of the number of times the cached entry has been provided to clients.
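The bookkeeping behind this reporting can be sketched in Python. The sketch models only the counting idea: it does not reproduce the actual Meter header syntax, and the classes `Origin` and `MeteringProxy` and their methods are hypothetical names. It also assumes, for simplicity, that every validation succeeds.

```python
class Origin:
    """Origin server that tallies page views, including those reported by proxies."""

    def __init__(self):
        self.page_views = {}

    def fetch(self, uri):
        # A full retrieval counts as one view.
        self.page_views[uri] = self.page_views.get(uri, 0) + 1
        return f"content of {uri}"

    def validate(self, uri, reported_uses):
        # The proxy's report adds the views it served from its cache.
        self.page_views[uri] = self.page_views.get(uri, 0) + reported_uses
        return True  # assumption: the cached copy is still valid


class MeteringProxy:
    """Cache that counts how often it serves each object between reports."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.unreported = {}  # uri -> cache hits not yet reported

    def serve(self, uri):
        if uri in self.cache:
            self.unreported[uri] = self.unreported.get(uri, 0) + 1
            return self.cache[uri]
        self.cache[uri] = self.origin.fetch(uri)
        return self.cache[uri]

    def revalidate(self, uri):
        # Report the count along with the validation, then reset it.
        count = self.unreported.pop(uri, 0)
        if not self.origin.validate(uri, count):
            self.cache.pop(uri, None)
```

Resetting the count after each report keeps the origin's tally from double-counting views across successive validations.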
2.5 Cookies and State Maintenance
The HTTP protocol normally operates as if each client request is independent of all others. The server responds to any request strictly on the merits of that request, without reference to other requests from the client (or, for that matter, any other client). This type of operation is known as stateless because the server does not have to keep track of the state of its clients.
Because maintaining state requires server resources (memory, processing power, etc.), stateless operation is usually desirable. In some applications, however, the server needs to keep some state information about each of its clients. Users that successfully log in to a Web site, for example, shouldn’t have to log in again every time they view a different page on that site. A server can avoid this inconvenience by tracking the state of the client. The first time the client requests a page from the site, the server requires the user to log in. As the user continues to browse the site and make additional HTTP requests, however, the server remembers the previously successful login and refrains from requesting additional logins.
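One way a server might remember a login across requests is sketched below in Python. It assumes the server hands the client an opaque identifier in a cookie, the mechanism the next section introduces, and looks that identifier up on later requests. The `SESSIONS` table, the cookie name `sid`, and the function `handle_request` are hypothetical; `SimpleCookie` and `secrets` are from the Python standard library.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> user name (hypothetical server-side state)


def handle_request(headers, login_user=None):
    """Return (status, response headers, body) for one request."""
    cookie = SimpleCookie(headers.get("Cookie", ""))
    session_id = cookie["sid"].value if "sid" in cookie else None
    if session_id in SESSIONS:
        # The server recognizes the client: no new login is needed.
        return 200, {}, f"page for {SESSIONS[session_id]}"
    if login_user is None:
        # Simplified: a real 401 reply also carries a WWW-Authenticate header.
        return 401, {}, "please log in"
    # Successful login: remember it and hand the client an identifier.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = login_user
    reply = SimpleCookie()
    reply["sid"] = session_id
    return 200, {"Set-Cookie": reply["sid"].OutputString()}, f"page for {login_user}"
```

The server stores only a small table entry per logged-in user, which is the resource cost of state that the paragraph above mentions.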
2.5.1 Cookies
State maintenance requires one critical capability: Servers must be able to associate one HTTP request with another. The server must be able to tell, for example, that the user requesting a new page really is the same user that has already logged in, not a different user that has not been authorized. The mechanism that HTTP defines for state maintenance is
Figure 2.30 Servers can return state management cookies in their responses. Clients, if they wish, include those cookies in subsequent requests to the same server. (Diagram label: 3, HTTP Request + Cookie.)