(http://www.ietf.org/rfc/rfc2045.txt). All official MIME types are listed here:
ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/ (you should actually browse the FTP directory). A MIME type is official when it has been registered with IANA (Internet Assigned Numbers Authority, http://www.iana.org/).
HTTP Protocol: RFC 2616
The Hypertext Transfer Protocol (HTTP) is the protocol used by browsers to request pages, and by servers to send the requested pages. It is the very heart of the Web. The version discussed here is 1.1, and it is formalized in RFC 2616 (http://www.ietf.org/rfc/rfc2616.txt).
Like many other Internet protocols, HTTP is text based. This means that it is possible to connect to an HTTP server manually and observe what happens when a connection is established.
The functionality of the HTTP protocol is quite simple: it is a request/response protocol, where the client requests a
resource (also informally called a page) and the server provides a response. This is a typical HTTP request:
GET /index.html HTTP/1.1
Host: www.apress.com
As you can see, the only pieces of information specified are the resource requested (/index.html), the protocol version (HTTP/1.1), and the host you are expecting to be connected to (www.apress.com). This last piece of
information is required by HTTP 1.1, and is important for virtual domains to work properly (where
a single IP address can serve several different domain names). This is a typical response message:
HTTP/1.1 200 OK
Date: Sat, 14 Sep 2002 10:58:19 GMT
Server: Apache/2.0.40 (Unix) DAV/2 PHP/4.2.3
Last-Modified: Fri, 04 May 2001 00:01:18 GMT
HTTP/1.1 200 OK
Date: Sat, 14 Sep 2002 11:12:48 GMT
Server: Apache/2.0.40 (Unix) DAV/2 PHP/4.2.3
Last-Modified: Tue, 24 Aug 1999 05:33:58 GMT
ETag: "5ba6c-ec-be34bd80"
What comes after the HTTP header is the binary information that makes up the GIF file.
Note This doesn't violate the earlier contention that the HTTP protocol is text based, because the binary
information represents the payload and is not part of the protocol itself.
Because HTTP is a textual protocol, you can connect to your Apache server directly using Telnet. For example:
[merc@localhost merc]$ telnet localhost 80
Date: Sun, 15 Sep 2002 06:48:00 GMT
Server: Apache/2.0.40 (Unix) DAV/2 PHP/4.2.3
Last-Modified: Sun, 15 Sep 2002 06:47:46 GMT
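The manual exchange shown above can also be scripted. The following is a minimal sketch in Python, not part of the original text: the function names (build_get_request, parse_response_head, fetch) are hypothetical, and the Connection: close header is an added assumption so that the server ends the connection after one response.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # required by HTTP/1.1 (virtual hosts)
        "Connection: close",   # assumption: ask for one response only
        "",                    # the empty line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes):
    """Split a raw response into (status_code, reason, headers)."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a plain TCP socket, as Telnet would."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Assuming a server is listening locally, parse_response_head(fetch("localhost")) would return the status code, the reason phrase, and a dictionary of headers such as server and last-modified.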
Encoding is the means of converting data into a different format, while retaining the content. This is an important
aspect for Apache security, because encoding can often be used to manipulate applications and to make them do things they are not supposed to.
Unicode and UTF-8 Encoding
The standard ASCII character set includes only 128 characters (codes 0 through 127), including letters, apostrophes, speech marks, tabs, the newline character, and other control characters. You can only represent writing in the English language using ASCII, because other languages need special letters (such as è, à, and so on) that are not included in the standard ASCII code. That is why several types of extended ASCII tables exist. They share the characters up to 127 with the ASCII code, and the symbols from 128 through 255 are used to define the extra characters exclusive to a particular language.
This system has its own limitations: A document can contain only one set of characters, and you can't insert
French, English, and Italian text in the same document. More importantly, some Asian languages need far more than the 128 extra symbols made available by the extended ASCII tables. This is why Unicode
(http://www.unicode.org/) was created: it's a bigger character set and includes symbols for every natural language. Note that the ISO/IEC 10646-1 format is compatible with the Unicode standard; that is, they both define the same set of characters.
Some programs may find Unicode hard to deal with, because it's a multi-byte character set. This means that every character is represented using two or four bytes, and this can cause great trouble for existing applications. For backward compatibility with older applications, the UTF-8 encoding standard is used.
UTF-8 encoding is a standard encoding format used to represent Unicode characters in a stream of bytes. UTF-8 encoding is described in detail by RFC 2279 (http://www.ietf.org/rfc/rfc2279.txt), which has since been superseded by RFC 3629.
Note Before HTML 4.0, the standard encoding format for web pages was ISO 8859-1, the first of a set of more
than ten different character sets that covered most European languages (they are identified by the
number after the dash: 8859-1, 8859-2, 8859-3, and so on). Now, more and more software is compatible with Unicode.
UTF-8 encoding of Unicode is convenient for several reasons, but especially because it is much easier to
communicate with old applications using this encoding. Also, null-terminated strings are not changed by UTF-8, and US-ASCII strings are written in UTF-8 with no modifications.
Note For advanced understanding of UTF-8 encoding, you may also refer to
http://www.cl.cam.ac.uk/~mgk25/unicode.html
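The properties of UTF-8 described above can be checked with a few lines of Python. This is an illustrative sketch only; the helper name utf8_bytes is hypothetical.

```python
def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 byte sequence for a string."""
    return text.encode("utf-8")


# Plain US-ASCII text is written in UTF-8 with no modifications.
ascii_text = "Tony & Anna"
print(utf8_bytes(ascii_text) == ascii_text.encode("ascii"))

# Characters outside ASCII take more than one byte each.
for char in ["è", "€"]:
    encoded = utf8_bytes(char)
    print(repr(char), "->", encoded.hex(" "), f"({len(encoded)} bytes)")

# No zero byte appears inside a multi-byte sequence, so
# null-terminated strings keep working.
print(0 in utf8_bytes("è€"))
```

The accented letter needs two bytes and the euro sign three, while the ASCII string is byte-for-byte identical in both encodings.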
To display any Unicode character on an HTML page, you can use a special numeric notation. Here, the euro symbol is represented by &#8364;:
<H1> This is the euro sign: &#8364; </H1>
Information related to this is documented at http://www.w3.org/TR/html401/charset.html.
This notation is also necessary to display those characters that are considered special by HTML. For example, when you want to display the string <BR>, you can use this notation to represent the characters so that the browser does not interpret it as a tag:
<H1> This is a tag: &#60;BR&#62; </H1>
HTML has a list of entities that can be used to represent a symbol. An entity is a name used to identify a particular
character. In the case of <, the entity is lt and the notation is &lt; (including the semicolon). The following line will output the words "This is a tag: <BR>":
<H1> This is a tag: &lt;BR&gt; </H1>
Note Refer to http://www.w3.org/TR/REC-html40/sgml/entities.html for a comprehensive list of
entities.
To summarize: the most modern character set you can use is Unicode, whereas UTF-8 is the most convenient way
of encoding Unicode for backward compatibility (as well as space saving when using English). If you want to display characters from your character set, you can use the &#NN; notation (where NN is the number allocated to the character/symbol), or the string &entity; (where entity is the entity name).
Note Remember that the trailing semicolon is critical when writing an entity.
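Both notations can be produced and decoded with Python's standard html module. The sketch below is illustrative and not taken from the original text; numeric_reference is a hypothetical helper name.

```python
import html


def numeric_reference(char: str) -> str:
    """Return the &#NN; notation for a single character."""
    return f"&#{ord(char)};"


# The numeric notation uses the character's Unicode number.
print(numeric_reference("€"))
print(numeric_reference("<") + "BR" + numeric_reference(">"))

# html.escape() produces the named entities for HTML's special characters.
print(html.escape("<BR>"))

# html.unescape() decodes both notations back into characters.
print(html.unescape("&#8364; and &lt;BR&gt;"))
```

Using a library function such as html.escape() is usually safer than building entity strings by hand, because it cannot forget the trailing semicolon.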
URL Encoding
Although UTF-8 and Unicode refer to the content of a page, URL encoding refers exclusively to URLs. On many
occasions you need to encode some of the characters in a URL. In the important RFC 1738
(http://www.w3.org/Addressing/rfc1738.txt) you can read:
Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded
character set, if the use of the corresponding character is unsafe, or if the corresponding character is
reserved for some other interpretation within the particular URL scheme.
A character might be deemed "unsafe" for several reasons, such as having a special meaning
in particular contexts. For example, you cannot use a space in a URL, and you must therefore encode it. For more detailed information, please read RFC 1738.
You cannot use non-ASCII characters in a URL, because there is no way to specify what character set you are using within the URL (and therefore determine how to render it properly).
A string is URL-encoded by substituting any "unsafe" characters with a percent sign (%) followed by two hexadecimal digits, which represent the character's corresponding US-ASCII code. For example, & becomes %26, the space becomes %20, and the string Tony & Anna becomes Tony%20%26%20Anna, or Tony+%26+Anna (the space can
be encoded with a + for historical reasons).
Note Using the + rather than %20 is only permitted in the query string, not in the network path portion of the
URI.
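The two encodings of Tony & Anna described above correspond to two functions in Python's urllib.parse module. This is a small illustrative sketch, not part of the original text.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "Tony & Anna"

# quote() encodes the space as %20, suitable for the path portion.
path_style = quote(text)
print(path_style)

# quote_plus() encodes the space as +, for use in query strings only.
query_style = quote_plus(text)
print(query_style)

# Each form must be decoded with the matching function.
print(unquote(path_style))
print(unquote_plus(query_style))
```

Decoding a query-string value with unquote() instead of unquote_plus() would leave the + characters in place, which is one reason to rely on library functions rather than hand-written decoders.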
What Happens When You Serve a Page
Knowing what happens when Apache serves a page will help you understand where some of the security issues arise, as well as how to fix them. The next sections detail what happens when a request is issued to the web server, analyzing four common cases.
A Static Page
A static page is the simplest case, in which the requested resource is served to the client as it is, without any processing or execution of scripts. The client connects to the web server, and makes the request (for example, http://www.apress.com/index.html).
The client displays the page as an HTML document, executing any JavaScript code according to its security preferences. For more information about security problems in browsers, please read
http://www.guninski.com/browsers.html and http://www.guninski.com/netscape.html.
A CGI Script with POST
A POST request is made when users submit data after they have filled out an online form, if the method used is POST. For example, consider a page called form.html that contains the following code:
<FORM ACTION="/here.pl" METHOD="POST">
<INPUT TYPE="TEXT" NAME="name" MAXLENGTH="10"> Name
<INPUT TYPE="TEXT" NAME="surname" MAXLENGTH="10"> Surname
<INPUT TYPE="submit">
</FORM>
Figure B-1 shows how it is displayed in the browser.
Figure B-1: A simple form.
Suppose that the user enters Tony & in the name field, and Mobily in the surname field.
After pressing the Submit Query button, here is what would happen:
The client URL-encodes the data keyed in by the user. The encoded result would be
name=Tony+%26&surname=Mobily. Note that:
The & character keyed in by the user is encoded into %26.
The fields are separated by the special character &.
The first half of the line (name=Tony+%26) is related to the name field, which is assigned the value Tony+%26 with the special character =. The space is converted into a +.
The second half of the line (surname=Mobily) is related to the surname field, which is assigned the value Mobily. It doesn't need any encoding.
The characters &, =, and + are special; if they weren't escaped, the CGI program wouldn't be able to tell the difference between the characters keyed in by the user, and the ones used to compose a correct query string.
Notice that after a POST command, the web server waits for additional information (in this case
name=Tony+%26&surname=Mobily) up to Content-Length bytes or the end of file, separated from the request's headers by an empty line.
The server runs the here.pl program after preparing its environment, that is, a set of environment
variables used to contain information about the page. The program receives the input from the user (which is URL-encoded) from its standard input. The output generated by the program here.pl is likely to be a web page, but it could just as easily be a GIF image or something else. The program also includes the header that defines the MIME type of the response.
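The POST steps above can be sketched from both sides in Python. This is an illustration under stated assumptions, not the book's here.pl (which is a Perl script): the function names encode_form and read_post_fields are hypothetical, and an in-memory stream stands in for the standard input that the web server would provide.

```python
import io
from urllib.parse import parse_qs, urlencode


def encode_form(fields: dict) -> str:
    """Client side: URL-encode the form fields as the browser would."""
    return urlencode(fields)


def read_post_fields(environ: dict, stdin) -> dict:
    """CGI side: read CONTENT_LENGTH bytes from standard input and decode."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii")
    return parse_qs(body, keep_blank_values=True)


# What the browser would send as the request body.
body = encode_form({"name": "Tony &", "surname": "Mobily"})
print(body)

# Simulated CGI environment (a real script would use os.environ
# and sys.stdin.buffer instead).
fake_environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
fields = read_post_fields(fake_environ, io.BytesIO(body.encode("ascii")))
print(fields)
```

Reading exactly CONTENT_LENGTH bytes, rather than reading until end of file, mirrors the server behavior described above. Note that parse_qs() returns a list of values per field, because a form may submit the same field name more than once.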
A CGI Script with GET
You can also submit a web form using HTTP's GET command. It is broadly similar to the POST method. The differences are:
1. The way the form is coded. A form that generates a GET request would start with:
<FORM action="http://localhost/here.pl" METHOD="GET">
Notice that the value of the attribute METHOD is GET instead of POST.
2. The request sent by the browser. The query string is still URL-encoded, but it follows the HTTP command
GET, appended to the requested resource after a question mark.
3. The way the script receives the information. The here.pl script receives the string
name=Tony+%26&surname=Mobily through the environment variable QUERY_STRING. The web server sets this variable just before running the script. The only problem with this method is that you have to be careful about the size of content stored in environment variables, because they may only hold between 1K and 4K, depending on the operating system you use. This limit is often reached in complex scripts.
4. The query string is visibly appended to the URL. Therefore, it appears in logs, and in the address bar of
the browser. This makes the query string vulnerable to manipulation and misuse by the user.
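The GET variant can be sketched in the same way. As before, this is a hypothetical Python illustration rather than the book's Perl script; the function names build_get_request_line and read_get_fields are assumptions, and the request line shown is what a browser would be expected to send for the form above.

```python
from urllib.parse import parse_qs, urlencode


def build_get_request_line(path: str, fields: dict) -> str:
    """Client side: append the URL-encoded fields to the resource name."""
    return f"GET {path}?{urlencode(fields)} HTTP/1.1"


def read_get_fields(environ: dict) -> dict:
    """CGI side: decode the QUERY_STRING variable set by the web server."""
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


print(build_get_request_line("/here.pl", {"name": "Tony &", "surname": "Mobily"}))

# Simulated CGI environment (a real script would read os.environ).
fake_environ = {"QUERY_STRING": "name=Tony+%26&surname=Mobily"}
print(read_get_fields(fake_environ))
```

Because the whole query string is part of the request line, it is also what ends up in the server's access log.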
A Dynamic Page
The server processes a dynamic page before sending it back to the browser. It can be seen as a faster substitute for the CGI mechanism; dynamic pages are very popular, because they allow web designers to insert code directly into HTML pages. A dynamic page written in PHP can look like this:
<H1> Welcome! </H1>
<?php print("Testing the script!<BR>\n"); ?>
The result page would be:
<H1> Welcome! </H1>
Testing the script!<BR>
This is what happens when a PHP dynamic page is requested:
The client connects to the web server, and makes the request (for example
http://www.mobily.com/dynamic.php). The request could be made in two ways: through a POST command or through a GET command. In both cases, PHP deals directly with the information coming from the request, and makes sure that the information is readily available to the script.
Other Request Types
The GET and POST requests are only two of the types available in HTTP. Others include:
HEAD: Only returns the headers in response to a request, without the response's body.
PUT: Used to store a resource on the web server.
DELETE: Used to delete a resource.
OPTIONS: Used to request information about communication options with the web server.
Please refer to RFC 2616 for more detailed information on HTTP headers.
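The difference between GET and HEAD can be observed with a short experiment. The sketch below is an assumption-laden illustration, not a description of Apache: it starts a throwaway server from Python's standard library on a free local port, and the names DemoHandler, send_request, and demo are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<H1> Welcome! </H1>\n"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def send_request(method: str, port: int):
    """Return (status, Content-Length header, body) for one request."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(method, "/")
        response = conn.getresponse()
        return response.status, response.getheader("Content-Length"), response.read()
    finally:
        conn.close()


def demo():
    """Run GET and HEAD against a temporary local server."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return send_request("GET", port), send_request("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns two tuples: both report the same status and Content-Length header, but only the GET result carries the body.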
A sound knowledge of all the technologies involved in Apache (and the web in general) is necessary to understand most of Apache's vulnerabilities. There is a considerable amount of information to study, and the fact that web technologies are always changing doesn't help. You should therefore keep your knowledge updated, constantly reading the available documentation and keeping an eye on emerging and promising standards.
Appendix C: Chapter Checkpoints
The checkpoints from each chapter are provided here for quick reference.
Chapter 1: Secure Installation and Configuration
Obtain the Apache package from a secure source (such as http://httpd.apache.org), or your distribution's FTP site or CD-ROM.
Check the integrity of the package you obtain (using GnuPG, MD5, or the tools provided by your distribution).
Be aware of exactly what each directive does, and what possible consequences it has for your server's security. You should configure Apache so that httpd.conf contains only the directives you actually need.
Apply all the basic security checks on your configuration: file permissions, protection of root's home page, deletion of any default files, disabling of any extra information on your server, and disabling of the TRACE method.
Make sure that you have protected important files (such as .htaccess) using mod_access; and make sure that you need to make only minimal modifications to your httpd.conf file (uncomment specific, prewritten lines) to block a particular IP address.
Learn a little about mod_rewrite, and use it to prevent people from using your web site's images.
Install and configure SSL (when required) using the latest SSL implementation available; obtain a valid
certificate from a Certificate Authority.
Test your installation's strength using an automatic auditing program (such as Nikto, Nessus, SAINT, or
SARA).
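The integrity check in the list above can be sketched with Python's hashlib module. This is a minimal illustration, assuming the download site publishes a hexadecimal digest next to the package; the function names are hypothetical, and MD5 is used only because the checklist mentions it (the algorithm parameter accepts stronger digests such as sha256). It does not replace verifying a GnuPG signature.

```python
import hashlib
import hmac
import io


def digest_stream(stream, algorithm: str = "md5") -> str:
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(65536), b""):
        digest.update(chunk)
    return digest.hexdigest()


def digest_file(path: str, algorithm: str = "md5") -> str:
    """Hash a file on disk without loading it into memory at once."""
    with open(path, "rb") as handle:
        return digest_stream(handle, algorithm)


def matches_published_digest(actual_hex: str, published_hex: str) -> bool:
    """Compare the computed digest with the one published by the source."""
    return hmac.compare_digest(actual_hex.lower(), published_hex.strip().lower())


# Demonstration on in-memory data instead of a real package file.
print(digest_stream(io.BytesIO(b"abc")))
```

For a downloaded archive, the check would be matches_published_digest(digest_file(path), published), where path and published are whatever file name and digest apply to your download.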
Chapter 2: Common Attacks
Familiarize yourself with common terms used in computer security: exploit, buffer overflow, DoS attack, root shell attack, root kit, script kiddie, and more.
Know how some representative exploits work, to gain a deeper understanding of the possible threats and their consequences.
Check http://httpd.apache.org daily to see if a new version of Apache has been released. If it has, update your server(s).
Be familiar with relevant web sites such as Apache Week, CVE, VulnWatch, SecurityFocus, CERT, X-Force ISS, and so on (see Appendix A for a detailed list and web addresses).
Subscribe to some of these web sites' newsletters and mailing lists, and read them daily.
Chapter 3: Logging
Be aware of all your logging options (and problems), and set an ideal environment to enable proper logging regardless of the solution you use. Also clearly document the logging architecture (even if it uses normal files).
Check logs regularly or delegate a program to do so; notify the offenders whenever possible.
Minimize the number of entries in the error_log. This might mean notifying CGI authors of warnings,
notifying referring webmasters that links have changed, and so on.
Make sure that there is always enough space for log files (automatic software helps by notifying you of low disk space situations).
If your environment is critical or attack-prone, log to a remote machine and encrypt the logging information.
In this case, be aware of all the pros and cons of every single remote-logging solution, and try to keep it simple.
Chapter 4: Cross-Site Scripting Attacks
Gain as much information and knowledge as possible about XSS.
Make sure that the web developers on your team are well aware of the problem and apply each piece of advice given in the section "How to Prevent XSS" (identify the user's input, specify the page's character set, don't allow HTML input, use existing library functions to perform XSS-critical operations such as URL encoding).
After developing a web site, allocate a number of hours to look for XSS vulnerabilities on an ongoing basis.
For critical sites, impose input checking in your server using a third-party module such as mod_parmguard (see Chapter 5).
Keep updated on XSS-related problems and browser vulnerabilities (see the section "Online Resources on XSS" for a list of interesting web sites).
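Two of the prevention points above (specifying the page's character set and using existing library functions on user input) can be illustrated with a short Python sketch. It is an example under stated assumptions, not the method from the "How to Prevent XSS" section: render_greeting is a hypothetical function, and UTF-8 is an assumed choice of character set.

```python
import html


def render_greeting(user_name: str) -> str:
    """Build a page that shows user input as text, never as markup."""
    # html.escape() turns <, >, & and quotes into entities.
    safe_name = html.escape(user_name, quote=True)
    return (
        "<HTML><HEAD>"
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">'
        "</HEAD><BODY>"
        f"<H1> Welcome, {safe_name}! </H1>"
        "</BODY></HTML>"
    )


# A hostile value is displayed literally instead of being executed.
print(render_greeting('<script>alert("x")</script>'))
```

Escaping at the point of output, with a library function rather than a hand-written filter, keeps the rule simple: whatever the user typed is shown as text.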
Chapter 5: Apache Security Modules
Look for modules that might suit your needs at http://modules.apache.org.
Check the modules' development status, vitality, and support before installing them.
Check the modules' quality by searching the Internet for other people's experience with them, check the modules' source code, read their documentation, and so on before installing them.
Test the modules you plan to use, and see if they suit your needs. Also, test the modules in a real-world environment, making sure that you can deactivate them quickly if you need to.
Constantly check your modules' development status and security upgrades. Subscribe to the modules' mailing lists for announcements and support.
Only use the modules you need.
Check your modules' messages and warnings periodically.