800 East 96th St., Indianapolis, Indiana, 46240 USA
Sams Teach Yourself CGI in 24 Hours, Second Edition
Copyright 2003 by Sams Publishing
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without written permission from the publisher. No patent liability is assumed with respect to the use of the information contained herein. Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use of the information contained herein.
International Standard Book Number: 0-672-32404-0
Library of Congress Catalog Card Number: 2002107939
Printed in the United States of America
First Printing: September 2002
05 04 03 4 3 2
Trademarks
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Sams Publishing cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
Warning and Disclaimer
Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information provided is on an “as is” basis. The author and the publisher shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
Contents at a Glance
Part I An Introduction to CGI
3 Downloading, Installing, and Debugging CGI Scripts
Part II Capturing User Input
Part III CGI Programming Languages and Tools
Part IV Building Basic CGI Applications
Part V Integrating Databases with CGI
Part VI Additional CGI Tips and Tricks
Part I An Introduction to CGI
Hour 1 Overview of CGI Programming
Types of Web Applications
A History of CGI
What Is a CGI Program?
How CGI Programs Work
How Resources Are Requested
Fulfilling the Request
Passing Data to a CGI Program
Pros and Cons of CGI
CGI Programming Languages
Perl
UNIX Shell
The C Language
Visual Basic
Python
Java
Summary
Q&A
Workshop
Quiz
Quiz Answers
2 Setting Up Your CGI Environment
The Web Server Itself
Hosting Your CGI Scripts
Running Your Own Web Server
Web Hosting
Web-Server Operating Systems
UNIX
Windows
The CGI Environment
Web Servers
Web-Server Directory Structure
How Scripts Are Executed
Setting Up Your CGI Development Environment
Step 1: Download a Web Server
Step 3: Download a Perl Interpreter
Step 4: Install the Perl Interpreter
Step 5: Get the Web Server Up and Running
Step 6: Test the Web Server
Step 7: Test a Perl CGI Script
What If Something Went Wrong?
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
3 Downloading, Installing, and Debugging CGI Scripts
Downloading Scripts from the Internet
Finding the Scripts You Need
What to Look for in Publicly Available Scripts
Installing a Downloaded Script
Example: Downloading and Installing a Guestbook Script
Configuring the Script
Installing the Files and Setting Permissions
Testing the Script
Customizing the Look and Feel
Debugging CGI Scripts
Finding the Source of an Error
Fixing Setup Errors
Tools and Techniques for Debugging Your Program Code
Compiled Versus Interpreted Languages
Running CGI Scripts from the Command Line
Using Print Statements for Debugging
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
4 Writing Your First CGI Program
Parts of CGI Programs
A Sample CGI Program
A URL-Redirection Program
How File Redirection Works
Pipes
Working with Files in Perl
Common Statements Used in Perl
The if Statement
Perl Expressions
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
Part II Capturing User Input
5 Creating HTML Forms
The <form> Tag
The action Attribute
The method Attribute
The enctype Attribute
The target Attribute
The <input> Tag
Text Input Fields
Password Fields
Check Boxes
Radio Buttons
Hidden Fields
File Upload Fields
Reset Buttons
Submit Buttons
Using Images as Submit Buttons
Other Form Fields
Text Areas
Select Lists
Workshop: Building an Entire Form
Elements in the Survey Form
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
6 Working with HTTP
HTTP Basics
What Takes Place During an HTTP Session
Step 1: Establish a TCP Connection
Step 2: The Web Browser Sends a Command to the Server
Step 3: The Web Browser Sends Request Headers
Step 4: The Web Server Responds
Step 5: The Web Server Sends Response Headers
Step 6: The Web Server Sends the Data to the Browser
Step 7: The Web Server Closes the TCP Connection
Request Methods
The GET Method
The POST Method
Choosing Between GET and POST
Server Response Codes
Response Headers
Cache-control
Content-length
Content-type
Expires
Pragma
Server
Set-Cookie
NPH Scripts
Content Types
How Servers Use Content Types
Content-Type Categories
Nonstandard Types
Secure Connections
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
7 Validating User Input
Using JavaScript for Form Validation
How JavaScript Works to Validate Forms
An Example of Form Validation
The Form Itself
The Event Handler
Designing Easily Validated Forms
Incorporating Validation into the Form-Processing Code
How a Form-Processing Program Works
An Example of Form Processing
The Main Script Logic
The Input Validation Subroutine
The Output Subroutine
The Form Creation Subroutine
The Full Source Code
Validating Values
Regular Expressions
An Example That Uses Regular Expressions
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
8 Creating an Email Feedback Form
What Kinds of Applications Involve Sending Email?
How Email Works
Mail Message Composition
Using Net::SMTP
Example: Sending Email from a CGI Script
Setting Things Up
The Application Logic
Validating the Form
Sending the Email Message
Using sendmail
Using sendmail with Perl
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
Part III CGI Programming Languages and Tools
9 Web Application Architecture
Application Design
Round-trip Scripts
Figuring Out Whether a Form Was Submitted
Declarative Programming
Handling Many Types of Requests
Sharing Code Among Scripts
Using CGI::Application
Installing CGI::Application
Creating CGI::Application Applications
An Example That Uses CGI::Application
Other CGI::Application Notes
Summary
Workshop
Q&A
Quiz
Exercises
Quiz Answers
10 Delving Further into Perl
CGI.pm
Accessing CGI.pm from Your Program
Decoding Form Data via CGI.pm
Named Image-Input Fields
Multiple Select Lists
File Upload Fields
Generating HTTP Headers and HTML Tags via CGI.pm
How to Generate HTTP Headers
How to Generate HTML Tags
Skipping the Object-Oriented Stuff
cgi-lib.pl
Replacing cgi-lib.pl with CGI.pm
Handling Errors with CGI::Carp
Sending Fatal Errors to the Browser
Resources for Perl Programmers
Perl Documentation
Perl Information on the Web
Perl Books
The Comprehensive Perl Archive Network
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
11 Other Popular CGI Programming Languages
Will My Favorite Language Work for CGI?
Writing CGI Programs Using the Bourne Shell
How Shell Scripts Work
Creating Gateways to UNIX Commands
Working with Query Strings
Writing a Program That Uses the Query String
Writing CGI Programs in C
The cgic Library
Printing Headers
Printing Output
Handling Form Input
A C Example
Writing CGI Programs in Python
A Python Example
Summary
Q&A
Workshop
Quiz
Quiz Answers
12 Pros and Cons of Alternate Technologies
Looking Back
Why CGI Alternatives Appeared
Getting Past CGI’s Limitations
J2EE
Servlets
JavaServer Pages
PHP
Examples Using PHP
ASP.NET
Business Objects
Code Blocks
HTML Pages
Macromedia ColdFusion
ColdFusion Sample Code
The Apache mod_perl Module
Porting Your Scripts from CGI to mod_perl
Summary
Q&A
Workshop
Quiz
Exercises
Part IV Building Basic CGI Applications
13 Using Flat Files for Data Storage
What Is a Database?
Flat-File Databases
Delimiting Data Using Characters
Delimiting Data Using Field Widths
File Operations
Retrieving Records from a Database
Inserting a Record into a Database
Deleting Records from a Database
Modifying a Record in a Database
File Locking
Building a Database Application
The Sample Database
Retrieving Records from the Database
Inserting a Record into the Database
Deleting Records from the Database
Modifying a Record in the Database
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
14 Creating a CGI-Based Message Board
The Structure of the Application
The File Format
The Display Script
Utility Subroutines
Opening the Topic File
Parsing a Topic File
Printing the Topic List
Printing a Topic
The Posting Script
Presenting the New Topic Form
Processing a New Topic Submission
Adding a Response
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
15 Session Management
Why Use Session Management?
Basic Authentication
Hidden Fields in Forms
Hidden Form Fields Example
Application Logic
Printing the Hidden Fields
Using Cookies
How Cookies Work
Using Cookies to Save User Information
Using Cookies to Retrieve User Information
Setting and Retrieving Cookies with JavaScript
Session Management with Cookies
The Catalog Page
The Checkout Form
Why CGI and Cookies Don’t Mix
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
16 Building a Simple Shopping Cart
How the Sample Shopping Cart Works
The Catalog
Printing the Catalog
Adding Items to the Shopping Cart
The Contents of the Shopping Cart
Printing the User’s Cart
Removing an Item from the Cart
Checkout
The Checkout Script for This Example
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
17 Content Management with CGI
Why Content Publishing?
Separating Content and Presentation
Building in an Editorial Process
Types of Content Publishing Systems
Data Storage for Content Publishing Systems
A Content Publishing Example
The Story-Input Program
The Story-Display Program
Free Content Management Systems
Mason
Zope
PostNuke
Red Hat Content Management Solution
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
Part V Integrating Databases with CGI
18 Working with Relational Databases
The Relational Database Model
Structured Query Language
Statements for Data Manipulation
Statements for Data Definition
Statements for Database Administration
Database Design
Characteristics of Good Databases
Symptoms of Bad Databases
The Design Process
Creating a Database
Choosing Which Database to Access
Creating a Table
Relational Data Types
String Data
Numeric Data
Temporal Data
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
19 How to Use the Structured Query Language
Structured Query Language
The SELECT Statement
Adding, Deleting, and Modifying Records
Database Interfaces
ODBC
DBI and DBD
A Sample Program Using DBI and DBD
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
20 Creating an Online Store
The Database Design
The Catalog Script
Opening and Closing Database Connections
Displaying the Product List
Adding Items to the Shopping Cart
The Shopping Cart Script
Displaying the Shopping Cart
Removing Items from the Shopping Cart
The Checkout Script
Storing Orders in the Database
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
Part VI Additional CGI Tips and Tricks
21 Handling Other Content Types
Content Types
Handling Binary Content
An Authenticated Download Application
Creating Your Own Ad Server
Tracking User Activity
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
22 Securing CGI Scripts
Why Security?
The Crack-a-Mac Contest
Risk Assessment
Securing Your Web Server
Keep Your Software Up-to-Date
Store Your CGI Scripts Together
File Permissions
Server Options That Are Bad for Security
Common CGI Security Holes
A Note on How CGI Works
The Buffer Overflow Problem
Don’t Send Raw Input to Shell Commands
Using File Paths Is Risky
Don’t Place the Perl Interpreter in cgi-bin
Security Hole with DOS Batch Files
Keep Your Server Information Private
Safe Programming
Running Shell Commands Without Using the Shell
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
23 Creating Custom Error Documents
What Is an Error Document?
Configuring Your Web Server for Custom Error Documents
The Apache Web Server
Using HTML to Create a Basic Error Document
Using CGI to Create an Advanced Error Document
Environment Variables for Error Documents
Linking Back from the Error Document to the Referring Page
Creating Custom Links from the Error Document
Handling a “Not Found” Error
Handling an “Unauthorized” Error
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
24 Server Side Includes
How Server Side Includes Work
Setting Up Your Web Server for SSI
Apache
Microsoft Internet Information Server
Using SSI Directives
flastmod
SSI Directives
#echo
#include
#fsize
#exec
#config
Designing Pages Using SSI
Using the #include Directive
Last Modified Dates
Using the #exec Directive
XSSI
printenv
set
if Directives
Summary
Q&A
Workshop
Quiz
Exercises
Quiz Answers
Part VII Appendixes
B Response Codes and Reason Phrases
C Environment Variables and Request Headers
D Summary of Regular Expressions
About the Author
Rafe Colburn is a software developer who works at TogetherSoft in Raleigh, North Carolina. He’s also the author of Special Edition Using SQL for Que Publishing and has contributed to a number of other computer books. His personal site, rc3.org Daily, can be found at http://rc3.org, and he can be reached via email at rafecolburn@rafe.us.
We Want to Hear from You!
As the reader of this book, you are our most important critic and commentator. We value your opinion and want to know what we’re doing right, what we could do better, what areas you’d like to see us publish in, and any other words of wisdom you’re willing to pass our way.
You can email or write me directly to let me know what you did or didn’t like about this book, as well as what we can do to make our books stronger.
Please note that I cannot help you with technical problems related to the topic of this book, and that due to the high volume of mail I receive, I might not be able to reply to every message.
When you write, please be sure to include this book’s title and author as well as your name and phone or email address. I will carefully review your comments and share them with the author and editors who worked on the book.
Email: webdev@samspublishing.com
Mail: Mark Taber
Associate Publisher
Sams Publishing
800 East 96th Street
Indianapolis, IN 46240 USA
Reader Services
For more information about this book or others from Sams Publishing, visit our Web site at www.samspublishing.com. Type the ISBN (excluding hyphens) or the title of the book in the Search box to find the book you’re looking for.
During the past few years, Web applications have undergone an amazing transition. Not that long ago they were a relative rarity. Sites might have an email feedback form, or a search engine, or maybe even a message board. From there, we moved to a realm in which hundreds of companies centered around Web applications sprang up. Since then, the craze has died down some and many of those companies are no longer around, but Web applications themselves aren’t going away; indeed, they’re more common than ever.
These days, not only does nearly every company have a Web site, but most of them feature Web applications as well. Corporate intranets are also filled with Web applications of all kinds. More importantly, most software companies offer some level of Web integration for their software. Large enterprise software companies like Siebel, SAP, and Oracle have Web interfaces for their products. The main reason that Web applications aren’t in the headlines every day anymore is that they’ve become so common.
The primary focus of this book is on using the Common Gateway Interface (CGI), which is built into nearly every Web server, to develop Web applications. The biggest advantage of CGI is that it supports nearly every popular programming language. So, if you already know how to program, you can probably get started writing Web applications right away. Even if you don’t program, you can copy existing CGI programs and modify them to suit your own needs.
The larger focus of this book is on teaching you how Web applications are designed and built. Although you may start out writing CGI programs, there’s a good chance that down the road you’ll be building applications using some other platform, like Active Server Pages, ColdFusion, or perhaps Java servlets. Even so, this book will provide you with the knowledge you need to understand how Web applications work in general, and some methods you can use to write Web applications that are easy to improve and maintain. Also, knowing CGI will enable you to quickly solve problems that you might not be able to solve using other technologies. For example, because CGI programs can be written in a wide variety of languages, it’s often easiest to write Web applications that communicate with other programs using CGI.
I should also point out that most of the examples in this book are written in Perl. Perl is a scripting language that’s available for most computing platforms, including Unix and Windows. It has a number of features that make it well-suited to CGI programming, and in fact, it’s the most popular language for writing CGI programs. The biggest advantage for readers of this book is that it’s easy to learn Perl in bits and pieces. You don’t need to understand any big concepts behind software design in order to write useful programs in Perl.
My goal in this book is to explain the Perl concepts that I use in the example programs, and not to go beyond that. I also used as few special Perl features as possible, so that you can apply the lessons in these examples to whatever programming language you use to write CGI programs. If you don’t know any programming languages, Perl is as good a starting point as any other language. I’d advise you to pick up a Perl book (like Laura Lemay’s Teach Yourself Perl in 21 Days), or at least check out the Perl information at www.perl.com.
Hopefully this book will be the first step in sending you off on a long and successful career as a professional or amateur developer of Web applications. If you have any comments or feedback, please send them to me at rafecolburn@rafe.us. Good luck!
Part I An Introduction to CGI
1 Overview of CGI Programming
2 Setting Up Your CGI Environment
3 Downloading, Installing, and Debugging CGI Scripts
4 Writing Your First CGI Program
a Web site? If the answer to any of those questions is yes, you’ve not only browsed a Web site, you’ve interacted with a Web application.
It didn’t take long for Web developers to realize that Web sites could provide a lot of functionality above and beyond browsable content. Increasingly, you’ll find that Web applications, which provide truly interactive functionality to your Web sites, are taking the place of static HTML content. If you want to accept and process user input, retrieve and present data from a database, or communicate with applications external to the Web server, you must use a Web application development platform. In this book, you will learn how to build interactive Web applications using the Common Gateway Interface (CGI), which is the predominant platform for deploying Web applications today.
This hour will provide the introductory and background information you need before you can start writing CGI programs. The following topics will be discussed:
• Types of Web applications
• A brief history of CGI
• The definition of a CGI program
• How CGI programs work
• Pros and cons of using CGI to write Web applications
• Programming languages you can use to implement CGI programs
Types of Web Applications
A Web application is like any other application, except that the interface for it is provided through the browser. Originally, Web applications were generally used for functions that are unique to the Web: site feedback forms, online discussion boards, and shopping carts for electronic commerce sites. However, the world of Web application programming has matured, and now there are Web-based replacements for many desktop applications. People use the Web to manage their calendars and contacts, to locate places using online mapping services, and to read their email.
Although you could use JavaScript to augment a Web page with “interactivity,” such as image rollovers and pull-down menus of links, you’d still be stuck with the static information the author originally placed on the page.
At the time NCSA released HTTPD, CGI was the only way to implement Web applications. Because CGI is so simple, and the source code to NCSA HTTPD was freely available, nearly every Web server developed after NCSA HTTPD supports CGI. If you’re interested, you can view the original CGI documentation at:
http://hoohoo.ncsa.uiuc.edu/cgi/
Today, all of the popular Web servers on both Windows and UNIX support CGI.
In the years since the people at NCSA initially defined CGI and implemented it in their Web server, lots of other methods of developing Web applications have been introduced. Some Web application platforms you might be familiar with include JavaServer Pages and servlets, Microsoft’s Active Server Pages, Macromedia’s ColdFusion, and PHP. These Web application development tools, among others, will be discussed in Hour 12, “Pros and Cons of Alternate Technologies.”
What Is a CGI Program?
A CGI program is executed by the Web server in response to a request made by the Web browser. The Web server acts as an intermediary between the browser and the CGI program: it passes the browser’s request to the CGI program, and it sends output from the program back to the Web browser for processing. For example, a program might accept a stock ticker symbol, look up the stock price associated with that symbol, and return it to the user as part of a dynamically generated Web page. Or, a program might accept a user’s comment and send it to the site’s Webmaster in an email message. Almost any programming language can be used to write a CGI program; CGI itself is the defined interface between the Web server and the external program you want to write.
Let me briefly discuss what a CGI program doesn’t do: it doesn’t interact with the user in a direct way. It doesn’t display or retrieve information from prompts, menus, or other interactive features. A CGI program doesn’t display graphics, either. Although it may generate binary data that is, in fact, an image, it doesn’t create any windows or otherwise interact with a graphical user interface.
To work properly, a CGI program must meet the following criteria:
• You must be able to execute the program from the command line by simply typing the program’s name. (For example, Java programs must be executed through the Java virtual machine, by typing java programname. This makes them unsuitable for use as CGI programs.)
• The program must generate a valid content-type header.
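To make the two criteria concrete, here is a minimal sketch of a CGI program. It is written in Python (one of the languages covered in Hour 11) rather than Perl, and it assumes the file is marked executable (for example, with chmod 755), sits in a directory your server treats as CGI, and can find Python 3 through /usr/bin/env. The function name render_response is hypothetical, chosen only for this illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI program: a content-type header, a blank line, then HTML."""
import sys


def render_response() -> str:
    """Build the complete output that the Web server will relay to the browser."""
    header = "Content-type: text/html\n"
    body = (
        "<html>\n"
        "<head><title>A simple example.</title></head>\n"
        "<body>This is a simple example.</body>\n"
        "</html>\n"
    )
    # The empty line between the header and the body marks the end of the headers.
    return header + "\n" + body


if __name__ == "__main__":
    sys.stdout.write(render_response())
```

Running the script from the command line prints exactly what the server would receive, which is a quick way to check both criteria before involving the Web server at all.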
Any type of content is fair game as output of a CGI program. For example, content types include HTML code, GIF images, plain text files, Microsoft Word documents, and audio files. The content-type header that’s supplied by the program indicates which sort of content is being returned, so that the browser can take the appropriate action. Later in the hour, I’ll discuss the details of how to create this header.
Basically, that’s it. As long as the Web server can execute the program, and the program generates valid output, it’s acceptable for use as a CGI program. Later in the hour, I’ll discuss what qualifies as valid output, and I’ll also discuss other capabilities generally associated with CGI programs.
How CGI Programs Work
Now I’m ready to get down to the nuts and bolts of how CGI programs work. The great thing about CGI is that it’s an extremely simple interface. If you’re familiar with UNIX-based operating systems, you will recognize the concepts that CGI is grounded in.
As I’ve already discussed, CGI is a set of conventions that allows Web servers and external programs to communicate. To illustrate how CGI programs work, I’m going to include a description of the entire HTTP session, so you can understand at a high level how it all fits together.
How Resources Are Requested
An HTTP session is initiated when a Web client (usually a Web browser) requests a resource from a Web server. As I’m sure you already know, these resources are identified using URLs. When you’re dealing with static HTML pages, the URL simply consists of the location of a file stored on a Web server. Let’s say you have a URL like this:
http://www.example.com/somedirectory/index.html
That URL corresponds to the file index.html in the somedirectory subdirectory of the Web server’s document root. If the document root is /home/httpd/htdocs, the path that corresponds to the URL is:
/home/httpd/htdocs/somedirectory/index.html
If the Web server can locate and read the file, the contents of the file will be sent back to the client that requested it.
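The mapping from a URL to a file can be sketched in a few lines. This is an illustration under simplifying assumptions, not how any particular server is implemented: it is written in Python rather than Perl, it ignores server features such as aliases and the special handling of cgi-bin, and the function name url_to_path is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_to_path(url: str, document_root: str) -> str:
    """Map the path part of a URL onto a file beneath the document root."""
    # urlsplit isolates "/somedirectory/index.html"; unquote decodes escapes like %20.
    url_path = unquote(urlsplit(url).path)
    # Normalizing an absolute path discards any leading "..", so the
    # result cannot climb above the document root.
    relative = posixpath.normpath("/" + url_path).lstrip("/")
    return posixpath.join(document_root, relative)


print(url_to_path("http://www.example.com/somedirectory/index.html",
                  "/home/httpd/htdocs"))  # /home/httpd/htdocs/somedirectory/index.html
```

Normalizing before joining is one simple way to keep a request such as /../etc/passwd inside the document root; real servers apply their own, more thorough checks.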
When the URL points to a CGI program, things get a bit more complicated. Let’s look at a URL that points to a CGI program:
http://www.example.com/cgi-bin/example.cgi
In this case, the resource being requested is a program named example.cgi. What it does is unimportant. What is important is that when the Web server determines that the requested resource is a CGI program, it executes that program and returns the output of the program to the client.
This process is very different from that used for static HTML files. For one thing, a lot more can go wrong. When a CGI program is requested, the Web server must determine the following:
1. Can it locate the requested program file?
2. Does it know that the requested file is a CGI program? (I’ll discuss this in Hour 2, “Setting Up Your CGI Environment.”)
3. Is it allowed to execute the program?
4. Did the program execute without any errors?
5. Was the output of the program a valid response to a Web request? (I’ll discuss this later in this section.)
Only if the answer to all of those questions is affirmative can the Web server successfully fulfill the request. If the answer to any of them is no, the server returns an error or behaves unexpectedly (for example, by sending the program’s source code to the browser instead of running it).
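The five questions can be read as a sequence of checks that the server applies before it returns anything. The following Python sketch walks through them in order. It is a simplified model rather than real server code: the function name serve_cgi is hypothetical, it skips the environment variables a real server passes to the program, and the rule used for question 2 (a file counts as a CGI program only if it sits directly inside a designated CGI directory) is an assumption made for illustration.

```python
import os
import subprocess
from typing import Tuple


def serve_cgi(script_path: str, cgi_dir: str) -> Tuple[str, bytes]:
    """Apply the five checks in order and return (status line, program output)."""
    # 1. Can the server locate the requested program file?
    if not os.path.isfile(script_path):
        return ("404 Not Found", b"")

    # 2. Does it know the file is a CGI program? Assumed rule for this sketch:
    #    only files directly inside cgi_dir count. (A misconfigured real server
    #    might instead send the file's source code as if it were a static page.)
    if os.path.dirname(os.path.abspath(script_path)) != os.path.abspath(cgi_dir):
        return ("403 Forbidden", b"")

    # 3. Is the server allowed to execute the program?
    if not os.access(script_path, os.X_OK):
        return ("403 Forbidden", b"")

    # 4. Did the program execute without errors?
    try:
        result = subprocess.run(
            [script_path],
            stdin=subprocess.DEVNULL,  # no request body in this sketch
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return ("500 Internal Server Error", b"")
    if result.returncode != 0:
        return ("500 Internal Server Error", b"")

    # 5. Is the output a valid response: header lines, then a blank line?
    normalized = result.stdout.replace(b"\r\n", b"\n")
    head, blank, _body = normalized.partition(b"\n\n")
    if not blank or b":" not in head.split(b"\n")[0]:
        return ("500 Internal Server Error", b"")
    return ("200 OK", result.stdout)
```

Each early return corresponds to one “no” answer in the list, which is why a CGI request has so many more ways to fail than a request for a static file.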
Fulfilling the Request
As stated earlier, a CGI program must supply a content-type header so the Web browser knows what type of output the program has returned. For static files, the server normally derives the content type from the extension of the requested file. Because the extension of the CGI program generally has no relation to the content type of the data that the program generates, the content type has to be specified within the program itself. Content types for Web content are specified using MIME types. MIME is a standard that is commonly associated with email, but the naming system used for identifying the type of data stored in a MIME attachment is exactly the same as the naming system for specifying content types on the Web. Table 1.1 contains a list of common content types.
Table 1.1 Common Web Content Types
If a CGI program generates HTML code, it produces the following content-type header:
Content-type: text/html
That information is received by the Web server, and included with the other headers that are sent back to the browser. The HTTP protocol specifies that headers are to be separated from the actual content by a blank line, that is, by two consecutive line breaks. When a browser receives two consecutive line breaks, it knows that the headers have ended and the content to be processed has begun. So, to conclude this example, if the CGI program example.cgi produces HTML code as its output, the full output of the program might be as follows:
Content-type: text/html

<html>
<head><title>A simple example.</title></head>
<body>This is a simple example.</body>
</html>

Content-Type: text/html
Client-Date: Thu, 30 May 2002 03:24:56 GMT
Client-Peer: 209.197.70.60:80

<html>
<head><title>A simple example.</title></head>
<body>This is a simple example.</body>
</html>
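Because the blank line is the only thing that separates headers from content, splitting CGI output takes little code. The sketch below is in Python, and the function name split_cgi_output is hypothetical. It performs the split the way the paragraph above describes, accepts either bare linefeeds or the carriage-return/linefeed pairs that HTTP itself uses, and lowercases header names because HTTP header names are case-insensitive.

```python
from typing import Dict, Tuple


def split_cgi_output(raw: str) -> Tuple[Dict[str, str], str]:
    """Split CGI output into (headers, body) at the first blank line."""
    # Accept CRLF or bare LF line endings.
    text = raw.replace("\r\n", "\n")
    head, blank, body = text.partition("\n\n")
    if not blank:
        raise ValueError("no blank line after the headers")
    headers: Dict[str, str] = {}
    for line in head.split("\n"):
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("malformed header line: %r" % line)
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return headers, body
```

Given the first listing above, this returns {"content-type": "text/html"} as the headers and everything from <html> onward as the body.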
HTTP Headers
Most of the information relevant to an HTTP transaction is visible to the user. The URL being requested and the information entered in a form are the visible parts of a request. Similarly, the HTML (or other) data returned by a request is displayed by the browser or saved to disk.
However, other information is exchanged that’s invisible to users. This information is exchanged between the browser and server, and is used to make their jobs easier.
Passing Data to a CGI Program
In my description of how a typical HTTP session works, I left out something very important! I didn’t explain how information is passed from the Web browser to the CGI program. There are several different ways that data can be passed to the Web server from a browser. For now, I’ll provide an overview of how this process works.
Generally speaking, the data passed to CGI programs is collected using HTML forms. (There are other ways to provide data as well.) In Hour 5, “Creating HTML Forms,” I discuss how to create forms for submitting information to CGI programs, and I also discuss some alternate methods of providing data to a CGI program. The data can also be embedded in URLs used within standard hyperlinks.
Before data can be passed to a CGI program, it has to be encoded to remove any characters that might break things. Most of the time, a technique called URL encoding is used.
URL encoding is a method of escaping certain characters that are significant to the Web server, so that the server doesn’t interpret them and they reach the CGI program intact. For example, the ? character is used to separate the filename in the URL from the query string. If the query string itself contains a ?, passing it without escaping it could cause problems. So, the ? is translated (into %3F) so that it doesn’t confuse the Web server.
Spaces are also confusing, so they’re converted into plus signs Since plus signs are used
to replace spaces, real plus signs have to be encoded as well Fortunately, these days youdon’t need to worry much about the mechanics of URL encoding, because there are pro-grams and libraries that will do your dirty work for you More details about how URLencoding works will be provided in Hour 6
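Here is what URL encoding looks like in practice, sketched with the quote_plus, unquote_plus, and urlencode functions from Python’s standard urllib.parse module, standing in for whatever library your own language provides. The sample string is arbitrary; notice how ?, +, and & become %XX escapes while spaces become plus signs.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

original = "what? 1+1 & more"

# Reserved characters (?, +, &) become %XX escapes; spaces become "+".
encoded = quote_plus(original)
print(encoded)  # what%3F+1%2B1+%26+more

# Decoding reverses both steps, so the round trip is lossless.
print(unquote_plus(encoded) == original)  # True

# urlencode() applies the same encoding to every name/value pair
# and joins the pairs with "&" to build a complete query string.
print(urlencode({"q": original, "page": "2"}))  # q=what%3F+1%2B1+%26+more&page=2
```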
Pros and Cons of CGI
There are advantages and disadvantages to writing your Web applications as CGI programs. In Hour 12, I’ll discuss some of the alternatives to CGI for Web application programming and how those alternatives compare to CGI. For now I’ll just talk about CGI.
You’re already familiar with one kind of header, the content-type header, which must be provided by every CGI program. This header tells the browser how to handle the content being returned by the CGI program. Other common headers are used to specify the types of content that browsers accept, or to indicate the name and version of the Web server or Web browser.
A list of the common HTTP headers appears in Appendix B, “Environment Variables and Request Headers.”
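The rule that a CGI program starts its output with a Content-Type header, followed by a blank line and then the content, fits in a few lines of Python. The helper name cgi_response is hypothetical; like most CGI examples, it ends each header line with a plain newline.

```python
import sys


def cgi_response(body: str, content_type: str = "text/html") -> str:
    """Build a complete CGI response: header, blank line, content."""
    # The empty line after the header tells the Web server where the
    # headers stop and the content begins.
    return f"Content-Type: {content_type}\n\n{body}"


if __name__ == "__main__":
    sys.stdout.write(cgi_response("<html><body>Hello from CGI</body></html>\n"))
```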
I’ll talk about the good stuff first: the advantages of CGI programming. The main advantage of CGI programming is that it’s the ultimate cross-platform technology. It works on Web servers running both Windows and UNIX, and with almost every Web server. So when you write CGI programs, you can be fairly certain that they’ll be portable to whatever environment you’ll want to run them in. The second major advantage of CGI is that it’s language independent. For the most part, you can write CGI programs in the language of your choice. There’s no need to learn a new programming language just to write CGI programs. If you choose a cross-platform language, like Perl, it’s trivial to port your programs from UNIX to Windows, or vice versa.
Another advantage of CGI is that it’s a very simple interface. It’s not necessary to have any special libraries to create a CGI program, or to write your programs to use a particular API. Instead, CGI programs rely on the standard UNIX concepts of standard input, standard output, and environment variables to communicate with the Web server.
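A minimal sketch of that interface in Python, using only the standard library: form data arrives in the QUERY_STRING environment variable for GET requests, or as CONTENT_LENGTH bytes on standard input for POST requests, and the response goes to standard output. The function names form_data and main are illustrative; urllib.parse.parse_qs handles the URL decoding described earlier.

```python
import os
import sys
from collections.abc import Mapping
from urllib.parse import parse_qs


def form_data(environ: Mapping[str, str], body: bytes = b"") -> dict[str, list[str]]:
    """Decode form fields from the CGI environment (GET) or request body (POST)."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # URL-encoded bodies are plain ASCII; latin-1 decodes any byte safely.
        raw = body.decode("latin-1")
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs undoes URL encoding and maps each field name to a list of values.
    return parse_qs(raw)


def main() -> None:
    body = b""
    if os.environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # Read exactly CONTENT_LENGTH bytes from standard input.
        body = sys.stdin.buffer.read(int(os.environ.get("CONTENT_LENGTH") or 0))
    fields = form_data(os.environ, body)
    # Everything written to standard output goes back through the Web server.
    print("Content-Type: text/plain")
    print()
    for name, values in fields.items():
        print(f"{name} = {', '.join(values)}")


if __name__ == "__main__":
    main()
```

Passing the environment and body into form_data, rather than reading them inside it, keeps the decoding logic testable outside a Web server.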
Now let’s take a look at the disadvantages of CGI. The single greatest disadvantage of CGI programs comes into play when you write your CGI programs in a scripting language. Every time a CGI program is requested, the interpreter for the scripting language has to be started, the script has to be evaluated, and then the script has to be executed. Having to start the Perl interpreter every time a Perl CGI script is requested is very inefficient. Whether this is a problem depends on how powerful your Web server is, how many requests there are for your CGI scripts, and how long it takes the CGI program to load. Generally speaking, the performance issue does not become a problem unless you run a very high-traffic Web site or you have an antiquated Web server.
People who write their CGI programs in a compiled language like C don’t have to deal with this problem, because there’s no extra overhead like that generated by an interpreter. In fact, it was once common to use small, fast-executing CGI programs as a gateway between the Web server and an application server process. That allowed the application server to work with Web servers that it couldn’t communicate with through a native interface.
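One rough way to see the per-request overhead described above is to time how long it takes to launch an interpreter that does nothing at all. This Python sketch (the function name startup_seconds is hypothetical) starts a fresh interpreter several times and reports the average. The figure depends entirely on your machine, and it covers only start-up, not the time needed to evaluate and run a real script.

```python
import subprocess
import sys
import time


def startup_seconds(runs: int = 5) -> float:
    """Average wall-clock time to start the Python interpreter and exit.

    Each run launches a fresh process, which is roughly what a Web server
    does for every request to an interpreted CGI script.
    """
    start = time.perf_counter()
    for _ in range(runs):
        # "pass" does no work, so the time measured is mostly start-up cost.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"average interpreter start-up: {startup_seconds() * 1000:.1f} ms")
```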
The other main complaint about CGI programs is that they don’t make things as easy on Web programmers as some newer Web application platforms do. When you write a CGI program, in addition to all of the program logic that creates the functionality you want, you also have to write code to generate the HTML for the page. Most of today’s more popular application servers allow you to embed program logic within a standard HTML page, which can save some work when you write the programs. These application servers are also easier to learn for people who know HTML but don’t know how to program. It is, however, harder to write structured, well-organized programs when you use this type of technology, so the choice is really one of preference. Neither is absolutely better than the other.
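Generating that HTML is ordinary string building inside the program. The following Python sketch (render_page is a made-up name) shows the idea. It escapes every inserted value with html.escape from the standard library, since text supplied by users can contain characters such as < and & that would otherwise be read as markup.

```python
import html


def render_page(title: str, items: list[str]) -> str:
    """Return a complete HTML page listing the given items."""
    # Escape every value that came from outside the program so that
    # characters like < and & can't break (or inject into) the markup.
    rows = "\n".join(f"<li>{html.escape(item)}</li>" for item in items)
    return (
        "<html>\n"
        f"<head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n<ul>\n{rows}\n</ul>\n</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print("Content-Type: text/html")
    print()
    print(render_page("Search results", ["Perl & CGI", "<b>bold?</b>"]), end="")
```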
CGI Programming Languages
As I’ve already stated, almost any programming language can be used to write CGI programs. Just because a language isn’t mentioned here doesn’t mean that it’s unsuitable for CGI programming. As long as programs written in the language can meet the criteria that I discussed earlier in this hour for CGI programs, it can be used for CGI programming. In this section of the hour, I’m going to discuss some of the more commonly used languages, but this is by no means a complete list.
Perl
Perl is the granddaddy of all of the languages used for CGI programming. Perl had the right mix of ease of use, features helpful to CGI programming, and popularity to become the dominant language for writing CGI programs when the original Web servers that support CGI were released. It’s not necessarily any better suited to CGI programming than any of a number of other languages, but it’s the language most CGI programmers use.
One factor that established the popularity of Perl as a CGI programming language was the availability of libraries that made it easy to write CGI programs. The CGI.pm module, which is used to make a number of CGI-related tasks easier, is now bundled with the Perl interpreter. The most important functionality provided by CGI.pm is seamless conversion of form input to a useful Perl data structure. It also provides tons of additional functionality to make it convenient to generate HTML code.
Another advantage of Perl is that a lot of CGI programs that have already been written in the language are available for download on the Internet. In many cases, you can download an existing script and adapt it to your purposes rather than writing a new script from scratch.
In this book, nearly all of the CGI programs will be written in Perl. Perl is easy to learn, especially if you already know how to program, and is generally considered to be the de facto standard for CGI programming. However, this book is really about proper Web application design, and isn’t written as a Perl tutorial. The most important thing for you to take away is the proper design techniques for creating Web applications.
UNIX Shell
When it comes to writing a simple CGI program, particularly one that is designed to interact with UNIX programs, writing it as a shell script is a common choice. Most people who write CGI programs using shell scripts do so because they’re system administrators who are already familiar with shell scripting. It’s easy to do a lot with a few lines of code in a shell script, particularly if it involves interfacing with other UNIX command-line programs.
For example, if you wanted to write a CGI program that returns the load average for the server (using the uptime command), writing it as a shell script would make a lot of sense. The disadvantage of writing your CGI programs as shell scripts is that, in my opinion, shell scripts are best suited to quick and dirty tasks. Other languages are better suited for writing complex CGI programs.
The C Language
The C programming language is perfectly acceptable for writing CGI scripts, as are any other compiled languages that can be used to create standard command-line executable files. The main advantage of writing CGI programs in a compiled language like C is that the performance is very good. The programs execute in less time than it takes to start the Perl interpreter to run a Perl script.
Unfortunately, there are a number of disadvantages involved with writing CGI programs in C as well. These are all general software development tradeoffs; they’re not specific to CGI programming. Any comparison of scripting languages to compiled languages will include the same reasons. For the benefit of people who haven’t read such a discussion, I’ll quickly cover the issues here.
Basically there are three areas where scripting languages have an advantage over compiled languages. The first area is speed of development. Scripting languages tend to have higher-level statements than compiled languages, which makes it easier to complete the tasks for which the scripting language was designed. For example, Perl has lots of tools that are designed to make it easy to manipulate text files. Writing a program to search through a text file for valid email addresses would require a lot fewer lines of Perl code than C code.
The second advantage is in debugging. When you work with a compiled language, you have to recompile your code every time you make a change to it. So, when you’re debugging a C program, you have to compile the program, run it, and if it doesn’t work, you have to change it, recompile it, and run it again. When you use a scripting language, you’re spared the compilation steps. You can simply test your program, and if it doesn’t work, you can make a change and test it again.
The third advantage of scripting languages is that they’re generally easier to learn than compiled languages. With most scripting languages, you can learn language constructs as you need them, rather than learning an overarching language philosophy first, or learning how to structure the programs. Many languages allow you to build more structure into your programs once you learn how, but still allow you to create simple programs at first.
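To give a sense of the first advantage, here is the email-address search mentioned above, written in Python rather than Perl to keep this hour’s added examples in one language. It is a sketch under a loud assumption: the regular expression is deliberately simple and catches common addresses, but it does not implement the full address grammar, so it is not a validator. The names EMAIL_PATTERN and find_emails are illustrative.

```python
import re

# Deliberately simple pattern: name@domain.tld. It is NOT a complete
# implementation of the email address grammar.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Write to webmaster@example.com or sales@example.org, not user@localhost."
    print(find_emails(sample))  # ['webmaster@example.com', 'sales@example.org']
```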
Now let’s talk about the advantages of compiled languages over scripting languages. The primary advantage is performance. Scripting languages must be processed by an interpreter and turned into machine-executable code every single time they’re run. When you write scripts that you only use occasionally for various tasks, the performance issue isn’t a big deal. In a high-demand Web environment where a CGI program has to process hundreds of requests per hour, the overhead can make using CGI scripts prohibitive.
Another advantage of most compiled languages, and C in particular, is that they’re suitable for just about any programming task. Most scripting languages are designed for a particular task, or type of task. There are many good general-purpose scripting languages, but even they aren’t as flexible as C. Most of the time, this flexibility doesn’t come into play because scripting languages are flexible enough for the task at hand, but in some cases, C is the only language that will do.
Visual Basic
Visual Basic is an incredibly popular language for writing client/server applications. Microsoft claims that there are more Visual Basic programmers than there are for any other language. Unfortunately, for a number of reasons, Visual Basic is poorly suited to the creation of CGI programs. While Visual Basic can be used with some Web servers through the WinCGI interface, that interface isn’t really much like the standard CGI interface, and it’s kind of awkward for writing Web applications.
Fortunately, there are other options for Visual Basic programmers. Microsoft’s Internet Information Server supports Active Server Pages (ASP), which allows you to embed application logic into a Web page. ASP supports a scripting language called VBScript, which is kind of a simplified version of Visual Basic. You can also write COM objects in Visual Basic that can be accessed from ASP pages. Microsoft’s .NET framework also makes it easy to integrate code written in Visual Basic (or other .NET languages) with ASP.NET pages. I’ll discuss both of these options in Hour 12.
Python
Python is an object-oriented scripting language that’s available for most popular operating systems. Like Perl, it’s a general-purpose language suitable for many tasks, including CGI programming. Most Python fans like it because it’s easy to write readable, maintainable programs using the language. It combines many of the advantages of scripting languages, like rapid development, with some of the advantages of compiled languages, like solid program structure. One of the nicest, and most controversial, features of the language is that it uses white space to define blocks in the source code. In other words, in order to work properly, Python code has to be formatted in a sensible manner. You can learn more about Python at http://www.python.org.
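A small Python example makes the point about white space: the indentation alone determines which statements belong to each branch, with no braces or end keywords. The function classify and its status-code ranges are just an illustration.

```python
def classify(status: int) -> str:
    # The indented lines form the body of each block; no braces or
    # "end" keywords are needed, so indentation must be consistent.
    if 200 <= status < 300:
        return "success"
    elif 300 <= status < 400:
        return "redirect"
    else:
        return "error"


if __name__ == "__main__":
    for code in (200, 302, 404):
        print(code, classify(code))
```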
Java
Java programmers are in the same boat as Visual Basic programmers when it comes to CGI programming. Earlier, I said that it’s impossible to write CGI programs in Java. The reason for this is that Java programs have to be executed using a Java Virtual Machine, so there’s no way to call a Java CGI program directly from the Web server. However, you can write wrapper CGI programs in another language that can be used to call the Java program using the Java Virtual Machine. If you do choose to go this route, you’re then faced with the overhead of starting a wrapper program, which in turn starts the Java Virtual Machine and then executes the Java program.
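The wrapper approach can be sketched in Python with the standard subprocess module. The sketch passes the request body to the child program on standard input, lets the child inherit the CGI environment variables, and copies the child’s output (headers and content) back to the Web server unchanged. The class path /usr/local/cgi-java and the class HelloCGI are hypothetical placeholders. The GATEWAY_INTERFACE check is one way to make the script do its work only when a Web server runs it, since servers set that variable for CGI programs.

```python
import os
import subprocess
import sys

# Hypothetical Java program to wrap; substitute your own class path and class.
JAVA_COMMAND = ["java", "-cp", "/usr/local/cgi-java", "HelloCGI"]


def run_wrapped(command: list[str], request_body: bytes = b"") -> bytes:
    """Run command as the 'real' CGI program and return its raw output.

    The child inherits this process's environment, so it sees the same
    CGI variables (REQUEST_METHOD, QUERY_STRING, and so on).
    """
    result = subprocess.run(
        command,
        input=request_body,      # forwarded as the child's standard input
        stdout=subprocess.PIPE,  # captured so it can be relayed verbatim
        check=True,              # raise if the child exits with an error
    )
    return result.stdout


def main() -> None:
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    body = sys.stdin.buffer.read(length) if length else b""
    sys.stdout.buffer.write(run_wrapped(JAVA_COMMAND, body))
    sys.stdout.buffer.flush()


# Only launch Java when actually invoked as a CGI program.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    main()
```

Every request now pays for two process start-ups (the wrapper and the Java Virtual Machine), which is exactly the overhead described above.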
Again, like Visual Basic, there are other options available for Java programmers. You can use Java servlets to write Web applications. Servlets are the Java equivalent of CGI programs. A servlet engine includes a Java Virtual Machine, and runs while the Web server is up. When a request is made, the servlet engine executes the Java program that was requested and sends the output of the servlet to the user as a response. There are also a number of application servers that use Java as their programming language. Any of these options is better for writing Web applications in Java than writing CGI programs in Java.
Summary
The purpose of this hour was to provide you with a wide-ranging overview of the CGI landscape. At this point you should understand, at least at a high level, how CGI programs work and what qualifies a program as a CGI program. You should also have a good idea which programming languages can be used to write CGI scripts, and the advantages and disadvantages of the most popular CGI programming languages. In the next hour, “Setting Up Your CGI Environment,” I’m going to tell you how to set up your computer so that you can begin writing and testing CGI scripts.
Q&A
Q I’ve heard of WebObjects, ColdFusion, JavaServer Pages, (insert application server here). How does it compare to CGI for writing Web applications?
A I’ll discuss most of the popular Web application servers in Hour 12. Each application server has its own strengths and weaknesses, and it’s impossible to generalize.
Q Does the browser that my site’s visitors use matter when I write CGI programs?
A No. CGI programs are browser-independent; what counts is the content generated by your CGI program, which is entirely up to you.
Q Are there any security issues related to running CGI programs?
A Yes, CGI programs can make your site vulnerable to intruders in many different ways if you’re not careful. Hour 22, “Securing CGI Scripts,” covers the most common security issues.
Q My Internet service provider (ISP) doesn’t allow me to write my own CGI programs. What can I do?
A Not a lot. Many ISPs don’t allow users to supply their own CGI programs for security reasons. If this is the case, you’ll have to find another ISP that does. Most Web-hosting services aimed at professional Web designers do allow their users to create their own CGI programs. You can find a list of some ISPs that support CGI at the Web site for this book at http://rc3.org/books/cgi-in-24-hours/
Workshop
The quiz questions are designed to strengthen the knowledge you gain each hour
Quiz
1 Which was the first Web server to support CGI?
2 What is the primary advantage of compiled languages over interpreted languages?
3 What is the name of the technique used to translate special characters in query strings to characters that are acceptable to the Web server?
Quiz Answers
1 The first Web server that supported CGI was NCSA HTTPD.
2 The primary advantage of compiled languages over interpreted languages is performance: they are stored in a machine-native form that enables them to be executed quickly.
3 URL encoding is the technique used to translate special characters so that they can be accepted by a Web server.