HTTP Essentials
Protocols for Secure, Scaleable Web Sites
Stephen A. Thomas
Wiley Computer Publishing
John Wiley & Sons, Inc.
New York • Chichester • Weinheim • Brisbane • Singapore • Toronto
Publisher: Robert Ipsen
Editor: Margaret Eldridge
Managing Editor: Micheline Frederick
Text Design & Composition: Stephen Thomas
Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
This book is printed on acid-free paper.
Copyright © 2001 by Stephen A. Thomas. All rights reserved.
Published by John Wiley & Sons, Inc.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, email permreq@wiley.com.
This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
For the West Avenue Gang
CONTENTS
Chapter 1: Introduction
1.1 HTTP and the World Wide Web
1.2 Protocol Layers
1.3 Uniform Resource Identifiers
1.4 Organization of This Book
Chapter 2: HTTP Operation
2.1 Clients and Servers
2.1.1 Initiating Communication
2.1.2 Connections
2.1.3 Persistence
2.1.4 Pipelining
2.2 User Operations
2.2.1 Web Page Retrieval – GET
2.2.2 Web Forms – POST
2.2.3 File Upload – PUT
2.2.4 File Deletion – DELETE
2.3 Behind the Scenes
2.3.1 Capabilities – OPTIONS
2.3.2 Status – HEAD
2.3.3 Path – TRACE
2.4 Cooperating Servers
2.4.1 Virtual Hosts
2.4.2 Redirection
2.4.3 Proxies, Gateways, and Tunnels
2.4.4 Cache Servers
2.4.5 Counting and Limiting Page Views
2.5 Cookies and State Maintenance
2.5.1 Cookies
2.5.2 Cookie Attributes
2.5.3 Accepting Cookies
2.5.4 Returning Cookies
Chapter 3: HTTP Messages
3.1 The Structure of HTTP Messages
3.1.1 HTTP Requests
3.1.2 HTTP Responses
3.2 Header Fields
3.2.1 Accept
3.2.2 Accept-Charset
3.2.3 Accept-Encoding
3.2.4 Accept-Language
3.2.5 Accept-Ranges
3.2.6 Age
3.2.7 Allow
3.2.8 Authentication-Info
3.2.9 Authorization
3.2.10 Cache-Control
3.2.11 Connection
3.2.12 Content-Encoding
3.2.13 Content-Language
3.2.14 Content-Length
3.2.15 Content-Location
3.2.16 Content-MD5
3.2.17 Content-Range
3.2.18 Content-Type
3.2.19 Cookie
3.2.20 Cookie2
3.2.21 Date
3.2.22 ETag
3.2.23 Expect
3.2.24 Expires
3.2.25 From
3.2.26 Host
3.2.27 If-Match
3.2.28 If-Modified-Since
3.2.29 If-None-Match
3.2.30 If-Range
3.2.31 If-Unmodified-Since
3.2.32 Last-Modified
3.2.33 Location
3.2.34 Max-Forwards
3.2.35 Meter
3.2.36 Pragma
3.2.37 Proxy-Authenticate
3.2.38 Proxy-Authorization
3.2.39 Range
3.2.40 Referer
3.2.41 Retry-After
3.2.42 Server
3.2.43 Set-Cookie2
3.2.44 TE
3.2.45 Trailer
3.2.46 Transfer-Encoding
3.2.47 Upgrade
3.2.48 User-Agent
3.2.49 Vary
3.2.50 Via
3.2.51 Warning
3.2.52 WWW-Authenticate
3.3 Status Codes
3.3.1 Informational (1xx)
3.3.2 Successful (2xx)
3.3.3 Redirection (3xx)
3.3.4 Client Error (4xx)
3.3.5 Server Error (5xx)
Chapter 4: Securing HTTP
4.1 Web Authentication
4.1.1 Basic Authentication
4.1.2 Original Digest Authentication
4.1.3 Improved Digest Authentication
4.1.4 Protecting Against Replay Attacks
4.1.5 Mutual Authentication
4.1.6 Protection for Frequent Clients
4.1.7 Integrity Protection
4.2 Secure Sockets Layer
4.2.1 SSL and Other Protocols
4.2.2 Public Key Cryptography
4.2.3 SSL Operation
4.3 Transport Layer Security
4.3.1 Differences from SSL
4.3.2 Control of the Protocol
4.3.3 Upgrading to TLS within an HTTP Session
4.4 Secure HTTP
Chapter 5: Accelerating HTTP
5.1 Load Balancing
5.1.1 Locating Servers
5.1.2 Distributing Requests
5.1.3 Determining a Target Server
5.2 Advanced Caching
5.2.1 Caching Implementations
5.2.2 Proxy Auto Configuration Scripts
5.2.3 Web Proxy Auto-Discovery
5.2.4 Web Cache Communication Protocol
5.2.5 Network Element Control Protocol
5.2.6 Internet Cache Protocol
5.2.7 Hyper Text Caching Protocol
5.2.8 Cache Array Routing Protocol
5.3 Other Acceleration Techniques
5.3.1 Specialized SSL Processing
5.3.2 TCP Multiplexing
Appendix A: HTTP Versions
A.1 HTTP’s Evolution
A.2 HTTP Version Differences
A.3 HTTP 1.1 Support
Appendix B: Building Bullet-Proof Web Sites
B.1 The Internet Connection
B.1.1 Redundant Links
B.1.2 Multi-homing
B.1.3 Securing the Perimeter
B.2 Systems and Infrastructure
B.2.1 Reliability through Mirrored Web Sites
B.2.2 Local Load Balancing and Clustering
B.2.3 Multi-Layer Security Architectures
B.3 Applications
B.3.1 Web Application Dynamics
B.3.2 Application Servers
B.3.3 Database Management Systems
B.3.4 Application Security
B.3.5 Platform Security
B.4 Staying Vigilant
B.4.1 External Site Monitoring
B.4.2 Internal Network Management
B.4.3 Intrusion Detection
B.4.4 Maintenance and Upgrade Procedures
B.5 The Big Picture
B.5.1 Internet Connection
B.5.2 Web Systems
B.5.3 Applications
B.5.4 Database Management System
B.5.5 Network Management and Monitoring
B.5.6 Intrusion Detection System
References
General References
HTTP Specifications
Separate Security Protocols
Caching Protocols
Previous HTTP Versions
Glossary
Index
CHAPTER 1
Introduction —
HTTP, the Internet, and the Web
them (over 80 percent) feature a World Wide Web address. Even more remarkably, only 121 (61 percent) list a telephone number. If advertisements are a reflection of society, then here in the United States, at least, the Web has become an indispensable part of our lives.
This book is about what makes the Web tick. It explains the protocol that defines how Web browsers communicate with Web servers, the mechanisms that keep that communication secure from counterfeits and eavesdroppers, and the technologies that accelerate our Web experience. In this first chapter we’ll get a quick introduction to a few important concepts, including the relationship between the Hypertext Transfer Protocol (HTTP) and the Web, the notion of protocol layers, and the Web’s idea of an address. The final section outlines the rest of the text.
By the end of the book we’ll have covered all aspects of the Hypertext Transfer Protocol: its operation, message formats, security mechanisms, and acceleration techniques. We will also see how HTTP has evolved, and how newer implementations maintain backward compatibility with old systems. And finally, we will take what we’ve learned and apply it to building scalable, highly available, and secure Web site architectures.
1.1 HTTP and the World Wide Web
The Internet can trace its roots to research projects begun in
British physicist working in Switzerland, however, has arguably influenced today’s Internet more than any other person.
outlined the advantages of a hypertext-based, linked
with Robert Cailliau, created the first Web browsers and servers. Those browsers needed a protocol to regulate their communications; for that, Berners-Lee and Cailliau designed the first version of HTTP.
Since then, Web traffic has grown to dominate the Internet
file transfer, and remote login. Today, at least in the common vernacular, the World Wide Web is the Internet. And the Web continues to grow. In the fall of 2000, as this book is nearing completion, the Censorware Project reports that the Web has roughly:
The Hypertext Transfer Protocol has grown along with the Web. The original specification for HTTP fits comfortably on a single page and, at 656 words long, can be read and understood in just a few minutes. In contrast, the specification for
other documents that make up the HTTP standard, define the rules by which Web browsers, Web servers, proxies, and other Web systems establish and maintain communications with each other. The HTTP standards do not dictate what information the systems exchange once they establish communication. Indeed, one of HTTP’s greatest strengths is its ability to accommodate almost any kind of information exchange. Web pages, for example, are often created according to the rules for the Hypertext Markup Language, or HTML (also invented by Berners-Lee). But HTTP is equally adept at transferring remote printing instructions, program files, and multimedia objects. With the ubiquity of Web browsers, the pervasiveness of the Internet, and the power and flexibility of HTTP, the protocol Berners-Lee and Cailliau developed may ultimately become the foundation for all network-based computing.
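To make that content-neutrality concrete, here is a minimal Python sketch, assuming HTTP/1.1 message framing with the `Content-Type` and `Content-Length` headers that the book covers later. The helper `build_response` is hypothetical, not a library API; it simply shows that the same framing carries an HTML page or a few raw binary bytes without any change to the protocol itself.

```python
def build_response(body: bytes, content_type: str, status: str = "200 OK") -> bytes:
    """Frame an arbitrary payload as a raw HTTP/1.1 response message.

    build_response is a hypothetical helper for illustration only.
    HTTP does not care what the body contains; the Content-Type header
    just tells the receiver how to interpret the bytes.
    """
    headers = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    # Header lines end in CRLF, and a blank line separates headers from the body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body


# The same framing carries markup or arbitrary binary data.
html_page = build_response(b"<html><body>Hello</body></html>", "text/html")
program = build_response(bytes([0x7F, 0x45, 0x4C, 0x46]), "application/octet-stream")

print(html_page.split(b"\r\n")[0])  # b'HTTP/1.1 200 OK'
```

Only the header values differ between the two messages; the body bytes pass through untouched.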
1.2 Protocol Layers
To understand HTTP, it helps to know a little about the architecture of the Internet. We can look at the Internet’s architecture from two perspectives. From one view, the Internet is a loosely connected collection of networks of all sizes and types that cooperate to exchange information. Instead of considering physical systems, however, we’ll focus on the software that controls those systems. From that perspective, the Internet is a collection of different communication protocols; these protocols cooperate to provide services.
Providing services over the Internet is actually a very complex undertaking. To make the challenge more manageable, the Internet designers divided the work into different components and assigned those components to several different communications protocols. The designers further organized those protocols into layers.
system. The lowest-layer protocol controls the specific network technology, whether it’s an Ethernet LAN, a dial-up modem, a fiber optic link, or any other technology. One of the Internet’s greatest strengths is its ability to adapt to all types of network technology. Isolating the protocol for that technology within its own layer is one of the reasons for this flexibility; supporting a new network technology is simply a matter of implementing an appropriate low-layer protocol.
[Diagram: the protocol stack of a communication system, from top to bottom: Application; Transport Protocol (TCP); Internet Protocol (IP); Network Technology.]
Figure 1.1 Systems that communicate over the Internet use several protocols. Each protocol operates at its own layer in a protocol stack, fulfilling specific responsibilities. This figure shows the four protocol layers used in an HTTP exchange. HTTP itself is the application.
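One way to picture the stack in Figure 1.1 is as encapsulation: on the way down, each layer wraps what it receives from the layer above with its own header, and on the way up, each layer removes only its own header. The Python sketch below uses made-up text headers (`TCP|`, `IP|`, `NET|`) purely for illustration; real TCP and IP headers are binary structures with many more fields, and the function names are hypothetical.

```python
# Hypothetical, human-readable stand-ins for real protocol headers,
# listed from the top of the stack (transport) to the bottom (network technology).
LAYERS = ["TCP", "IP", "NET"]


def send_down(app_data: bytes) -> bytes:
    """Pass application data down the stack; each layer prepends its header."""
    frame = app_data
    for layer in LAYERS:
        frame = layer.encode("ascii") + b"|" + frame
    return frame


def receive_up(frame: bytes) -> bytes:
    """Pass a received frame up the stack; each layer strips its own header."""
    for layer in reversed(LAYERS):
        header, _, frame = frame.partition(b"|")
        # Each layer understands (and checks) only its own header.
        if header != layer.encode("ascii"):
            raise ValueError(f"expected {layer} header, got {header!r}")
    return frame


wire = send_down(b"GET / HTTP/1.1")
print(wire)              # b'NET|IP|TCP|GET / HTTP/1.1'
print(receive_up(wire))  # b'GET / HTTP/1.1'
```

Because each layer touches only its own header, replacing the bottom layer (say, Ethernet for a modem link) would not change the code for the layers above it.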
The protocol layer immediately above the network technology is the Internet Protocol, or IP. And even though IP may not be as famous as other protocols on the Internet, it can easily justify its name as the Internet Protocol. Not every system on the Internet uses the same network technology, and different systems rely on different transport and application protocols. Every system on the Internet, however, uses IP. The Internet Protocol’s main responsibility is taking individual packets of information and forwarding them to their destination. Most communications between systems require the exchange of many packets, and IP takes responsibility for every one.
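Forwarding can be sketched as a table lookup that is repeated independently for every packet: compare the destination address against a set of known networks and choose the most specific match (longest-prefix match). The routing table and next-hop names below are hypothetical, and real routers use much faster data structures; the sketch relies only on Python’s standard `ipaddress` module.

```python
import ipaddress

# Hypothetical routing table: destination network -> next hop.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("10.0.0.0/8"): "router-a",
    ipaddress.ip_network("10.1.0.0/16"): "router-b",
}


def next_hop(destination: str) -> str:
    """Choose where to forward one packet, using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    # The most specific network (the longest prefix) wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


# Each packet is handled on its own, based only on its destination address.
for packet_dest in ["10.1.2.3", "10.9.9.9", "192.0.2.7"]:
    print(packet_dest, "->", next_hop(packet_dest))
```

The `0.0.0.0/0` entry matches every address, so a packet always has somewhere to go even when no more specific route exists.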
The next protocol is the transport protocol. The Internet in general uses three different transport protocols, but Web communications in particular use one: the Transmission Control Protocol (TCP). While IP has responsibility for moving packets from one system to another, TCP makes that information transfer reliable. It ensures that the packets arrive in the right order, that none get lost in transit, and that no errors appear.
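A toy receiver shows how those three guarantees can be checked. Sequence numbers let it restore the original order and notice a missing piece, and a checksum lets it notice corruption. The segment layout here, a `(sequence, data, checksum)` tuple numbered per segment, is a hypothetical simplification: real TCP numbers bytes rather than segments, retransmits instead of giving up, and computes its checksum over a binary header and pseudo-header as well. The checksum function follows the 16-bit ones’ complement Internet checksum described in RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def make_segment(seq: int, data: bytes) -> tuple:
    """Build a hypothetical (sequence, data, checksum) segment."""
    return (seq, data, internet_checksum(data))


def reassemble(segments: list, expected_count: int) -> bytes:
    """Reorder segments, rejecting the stream if any are missing or damaged."""
    received = {}
    for seq, data, checksum in segments:
        if internet_checksum(data) != checksum:
            raise ValueError(f"segment {seq} is corrupted")
        received[seq] = data  # a duplicate simply overwrites the earlier copy
    missing = [s for s in range(expected_count) if s not in received]
    if missing:
        raise ValueError(f"segments lost in transit: {missing}")
    return b"".join(received[s] for s in range(expected_count))


# Segments arrive out of order, but the receiver restores the original order.
parts = [make_segment(i, chunk) for i, chunk in enumerate([b"Hel", b"lo, ", b"Web"])]
arrived = [parts[2], parts[0], parts[1]]
print(reassemble(arrived, expected_count=3))  # b'Hello, Web'
```

Where this sketch raises an error, a real TCP receiver would instead wait for the sender to retransmit the missing or damaged data.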
The final protocol layer is the application. This protocol actually does something meaningful with the information that’s exchanged, including organizing the exchange into conversations. The application protocol that most interests us here is, of course, HTTP, but there are many other application protocols on the Internet. There are application protocols for exchanging electronic mail, for setting up telephone calls, for authorizing dial-up sessions, and so on. And, as we noted earlier, HTTP traffic makes up the bulk of traffic on today’s Internet.
The internal protocol organization of a single system isn’t what’s important for communications. After all, it takes more than one system to have meaningful communications.
system into the diagram. Now we can start to see the way communication actually takes place. The figure shows black arrows between the different protocol layers within a system. Those arrows represent direct interaction. The application protocol in one system interacts directly with the transport protocol. That protocol, in turn, interacts directly with IP, and IP interacts with the protocol controlling the network technology. The different systems can directly interact with each other only through the network technology, however.
The gray arrows represent a logical interaction, and, as the figure indicates, each protocol layer logically interacts with its peer in the distant system. So even though the application in one system directly interacts only with TCP, the result of that interaction is a logical communication with the application in another system. In the case of HTTP, the HTTP implementation in one system (for example, a Web browser) is effectively communicating with the HTTP implementation in another (a Web server, perhaps).
The 7 Layer Stack?
Many theoretical descriptions of network communications rely on the Open Systems Interconnection Reference Model. That model, developed by the International Organization for Standardization (ISO) as a framework for protocol standards, defines seven protocol layers. The Internet’s developers, however, have never been slaves to abstract theory; instead, they’ve focused on making practical networks operate. In most cases, the four protocol layers of Figure 1.1 are sufficient and appropriate.
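The split between direct and logical interaction is visible in ordinary socket code. In the Python sketch below, both the “browser” and the “server” talk only to TCP through the standard `socket` module, yet the bytes one application writes are exactly the bytes its peer application reads; IP and the network technology are handled entirely by the operating system. The function names `tiny_server` and `fetch` are hypothetical, both ends run on the local loopback address, and the one-line reply is a simplified stand-in for a real Web server’s response.

```python
import socket
import threading


def tiny_server(listener: socket.socket) -> None:
    """Hypothetical 'Web server': accept one connection and send a minimal reply."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request headers.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        request_line = request.split(b"\r\n", 1)[0]
        body = b"You asked for: " + request_line
        conn.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n"
            + b"Content-Length: " + str(len(body)).encode("ascii")
            + b"\r\n\r\n" + body
        )


def fetch(port: int) -> bytes:
    """Hypothetical 'browser': write an HTTP request into a TCP connection."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(
            b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n"
            b"Connection: close\r\n\r\n"
        )
        chunks = []
        # The server closes the connection after replying; recv() then returns b"".
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


# Port 0 asks the operating system to pick any free port on the loopback interface.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
server = threading.Thread(target=tiny_server, args=(listener,))
server.start()
response = fetch(port)
server.join()
listener.close()
print(response.decode("ascii"))
```

Neither function builds an IP packet or touches the network hardware; each calls only TCP-level operations (`sendall`, `recv`), which is the direct interaction, while the HTTP text they exchange is the logical peer-to-peer conversation.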
To see this process in more detail, let’s look at how an HTTP message makes its way from your Web browser to a Web server on the Internet. Figure 1.3 shows the first four steps in
[Diagram: two communication systems, each with a protocol stack of Application, Transport Protocol (TCP), Internet Protocol (IP), and Network Technology.]
their protocols interface directly with other protocols within each individual system. Effectively, however, protocols at each layer communicate with their peers in the other system.