Chapter 9
Internet Site Functionality Design
i-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER:
Identify the issues that affect Internet site functionality (e.g., performance, security, and reliability) Content may include the following:
Files taking too long to load
Inability to open files
Cleaning out client-side cache
Server may cache information as well
Web page update settings in browsers
Describe different types of search indexes: static index/site map, keyword index, full-text index. Examples could include the following:
Searching your site
Searching content
Indexing your site for a search
Perhaps the most important aspect of implementing and maintaining a Web site is making sure that it is accessible and usable by your audience. Regardless of how wonderful the content is, if users cannot access the site in a timely and reliable way, they will go elsewhere for the information they seek. Therefore, it is important to know enough about the technologies that run the Internet to ensure that your site will meet the demands of its users.
In this chapter, you will learn about several critical topics that have an impact on a site’s functionality and usability:
Site functionality issues
Technology and content-type planning
Caching
Site indexing
Each of these major topics contributes to the overall usability of a Web site.
Site Functionality Issues
Internet users are a fickle bunch. Technological glitches not only harm functionality, they often cost sites their reputation for usability and reliability. Everyone has given up on a Web page because it is too slow or just plain broken. What do we do if www.amazon.com/ goes down? We go to www.barnesandnoble.com/. What do we do if a site requires ActiveX and our corporate security policy is to disallow ActiveX? We go to a different site. It is important to know the most common errors users experience and why they occur.
Functionality errors manifest themselves in three ways:
Users can’t get to the site at all
It takes too long to download and view a page
The document they request is missing or appears to be broken
In the following sections, we’ll take a look at the technological factors underneath each of these errors.
Connectivity Failure
The most basic Web browser error is when a user fails to get any information from your Web server. These attempts will generate a warning message in the browser such as “Host not found” or “Request timed out.” Such a warning message is shown in Figure 9.1: the user is trying to go to the Web site www.bahoozit.com, which does not exist.
Not all error messages indicate a connectivity problem. If the server gives a dreaded “404 File not found” error, for example, your client is connecting to the server, but the requested document cannot be found. If the host wasn’t found or the request timed out, there was never a full-fledged connection between the client and the server. Because connectivity errors mean that the server never gets a full connection to the client, such problems are often never logged on the server.
As explained in Chapter 2, several client queries and server responses need to succeed for a user to browse a Web page: the server’s domain name needs to be resolved into an IP address, and the client needs to make a successful request to the Web server at that address. If the user can’t get to a site at all, the problem could be caused by one of several factors:
The client’s network settings or DNS services are not working
The client’s connection to the Internet is down
The server’s hardware or software is malfunctioning or overwhelmed
The server’s connection to the Internet is down or overwhelmed
Available IP network connections between client and server are over-saturated
The server’s DNS records are corrupt or unavailable
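The two browser messages above correspond to different stages of the connection: “Host not found” means the DNS lookup failed, while “Request timed out” means an address was found but no connection was made. The following Python sketch, an illustration only, separates the two cases with the standard socket module. The function name diagnose and its injectable resolve/connect parameters are assumptions added so the logic can be exercised without a live network.

```python
import socket

def diagnose(host, port=80, timeout=5.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Roughly classify why a page could not be reached (illustrative sketch)."""
    # Stage 1: resolve the domain name into an IP address.
    try:
        addrs = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "host not found (DNS lookup failed)"
    ip = addrs[0][4][0]  # sockaddr of the first result; element 0 is the IP
    # Stage 2: open a TCP connection to the Web server at that address.
    try:
        conn = connect((ip, port), timeout=timeout)
    except socket.timeout:
        return f"request timed out connecting to {ip}"
    except OSError as exc:
        return f"could not connect to {ip}: {exc}"
    conn.close()
    return f"connected to {ip}:{port}"
```

Calling diagnose("www.bahoozit.com") on a machine with working DNS would be expected to fall into the first branch, since that name does not exist. A successful connection only rules out the network causes; the server may still return an error page.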
Determining the exact cause of failure requires some troubleshooting. For more information on troubleshooting, see Chapter 10.
Another common reason that users get an error message is that the domain name they entered is incorrect. The best way to counter this potential problem is to register a domain name that is short, descriptive of your organization, and easy to remember. If the domain name isn’t unique sounding, people can forget it and try similar names. Some organizations register multiple domain names that people might think of going to. George W. Bush, for his U.S. pres… will register common misspellings.
Download and View Time
One common reason a user doesn’t use a Web site is that it is too slow. How slow is too slow? Researchers at Yale claim that 10 seconds is the threshold of frustration. Users may wait longer than that if the information cannot readily be found elsewhere or if they are particularly interested in a site, but then again, they might not. So, depending on the patience level of the audience, pages should finish loading within 10 seconds of the time a user clicks a link.
In the following sections, you’ll learn:
The different stages of a request that can eat into those 10 seconds
How to estimate the time it takes to download a page
How available bandwidth limits download speeds
Examples and rules of thumb for download times
These sections will enable you to estimate whether your page is going to be too slow.
Stages of a Request
The 10 seconds a user will wait gets split up into several steps, and each step uses up a portion of that time The major steps are as follows:
1. DNS lookup and initial connection from client to Web server occurs
2. Request sits in the Web server queue, waiting to be serviced
3. Server generates response to the request (gets a file, runs a script)
4. Server transmits the data to the client
5. Client renders/displays the data
Combined, steps 1 and 5, which are the ones most clearly out of the control of the server, generally take a second or two. The time required for steps 2 and 3 depends on the server configuration, although they can often also be reduced to less than a second (you’ll learn more about this in “Planning Robust Back-End Service” later in this chapter). The bulk of the time, therefore, is spent on step 4, transmitting the data from server to client.
Step 5 can sometimes take longer than one second. Slow computers may take several seconds to parse and render HTML documents. Even fast computers can get bogged down by complex HTML code, such as nested tables.
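To see how the five stages share the 10-second budget, the following Python sketch adds up per-stage times and reports how much time is left over for step 4. The stage names and timings are hypothetical values chosen for illustration, not measurements.

```python
FRUSTRATION_THRESHOLD_S = 10.0  # the threshold of frustration cited above

def total_request_time(stages: dict[str, float]) -> float:
    """Sum of the time spent in every stage of one request."""
    return sum(stages.values())

def transmission_budget(stages: dict[str, float],
                        threshold: float = FRUSTRATION_THRESHOLD_S) -> float:
    """Seconds left for step 4 once the other stages are accounted for."""
    other = sum(t for name, t in stages.items() if name != "transmit")
    return threshold - other

# Hypothetical timings in seconds for a single page request.
stages = {
    "dns_and_connect": 0.8,  # step 1
    "queue": 0.2,            # step 2
    "generate": 0.3,         # step 3
    "transmit": 6.0,         # step 4
    "render": 1.0,           # step 5
}
print(round(total_request_time(stages), 2))   # 8.3
print(round(transmission_budget(stages), 2))  # 7.7
```

With these assumed numbers, steps 1, 2, 3, and 5 consume 2.3 seconds, leaving 7.7 seconds for transmission before the page crosses the threshold.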
Determining Transmission Time
Step 4 generally takes the longest amount of time, so it has the most impact on the apparent speed of the Web site. If the Web page takes too long to load, the user will leave. Therefore, it is important to be able to estimate how long a Web page will take to download for different types of users.
Transmission time is a function of how large the page is, divided by the speed at which it is downloaded. The size of the page is measured in kilobytes for the HTML, graphics, and multimedia files combined. The standard way to express this is as follows:
Time of Download = (Size of Page ÷ Available Bandwidth)
If a site has a 100K page, the time it will take someone to download it with a 5KB/s connection can be estimated. Using 100K for the size of the page and 5KB/s for the available bandwidth, the formula shows that it would take 20 seconds to download:
X seconds = (100 kilobytes ÷ 5 kilobytes per second) = 20 seconds
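The formula translates directly into code. This is a minimal Python sketch; the function name download_time_seconds is an illustrative choice, and both arguments are assumed to be in kilobytes so that the units cancel to seconds.

```python
def download_time_seconds(page_size_kb: float, bandwidth_kb_per_s: float) -> float:
    """Time of Download = Size of Page / Available Bandwidth (both in kilobytes)."""
    if bandwidth_kb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return page_size_kb / bandwidth_kb_per_s

print(download_time_seconds(100, 5))  # 20.0
```

Keeping both quantities in kilobytes avoids the bits-versus-bytes confusion discussed next; a speed quoted in kilobits per second must be divided by 8 first.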
Be careful not to confuse bytes and bits. People write about file sizes and download speeds using the terms kilobytes and kilobits. A byte is generally 8 bits. Also, a kilobyte in a file size usually means 1,024 bytes, whereas the “kilo” in a network speed (as in 56Kbps) usually means 1,000 bits. Unfortunately, some folks, especially advertisers, represent kilobits and kilobytes with inconsistent symbols. Kilobits are referred to as K, k, kb, and Kb. Kilobytes are referred to as K, k, kB, and KB. The symbols K and k are ambiguous! When looking at a number like 14K or 14k, a good rule of thumb is that modem-like devices are generally measured in terms of kilobits per second, and file sizes are almost always measured in kilobytes. Lacking any other clues, KB is likely to be kilobytes and kb (or Kb) kilobits. When writing, choose clear notation, such as KB and kb.
Bandwidth Bottlenecks
When data is downloaded, it flows in a pipeline from the server to the server’s Internet connection to the general Internet, then from the client’s network connection to the client. So the available bandwidth is the speed of the slowest segment of the pipeline. In 1999 in the United States, the slowest segment was generally the client’s network connection. If a U.S. user is visiting a server in Kenya, however, the slowest segment is likely going to be the slow connection between the Kenyan and U.S. national backbones.
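Because the pipeline is only as fast as its slowest segment, the available bandwidth is simply the minimum of the segment speeds. The Python sketch below models that idea; the segment names and speeds are hypothetical, and speeds are kept in kilobits per second until the final division by 8 bits per byte.

```python
def kbps_to_kb_per_s(kbps: float) -> float:
    """Convert kilobits per second to kilobytes per second (8 bits per byte)."""
    return kbps / 8

def available_bandwidth_kbps(segments: dict[str, float]) -> float:
    """End-to-end speed is limited by the slowest segment of the pipeline."""
    return min(segments.values())

# Hypothetical path from a server on a T1 to a dial-up client.
path = {
    "server_uplink": 1544.0,       # T1
    "internet_backbone": 45000.0,  # backbone link
    "client_link": 56.0,           # 56Kbps modem
}
slowest = available_bandwidth_kbps(path)
print(slowest, kbps_to_kb_per_s(slowest))  # 56.0 7.0
```

Speeding up any segment other than the slowest one would leave this estimate unchanged, which is why upgrading the server’s connection does little for dial-up visitors.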
Theoretical and Practical Download Speeds
The goal of Web designers should be to design pages that won’t take too long to download. Network connections, however, rarely perform exactly as advertised. Therefore, you should consider the following:
Know the theoretical speed of different devices
Take these speeds with a grain of salt
It is easy to determine the theoretical speed of any device. A 56Kbps modem, for example, should be able to download about 7KB per second. You can determine that with this formula:
(56Kbps ÷ 8 bits per byte) = 7KB/s
Table 9.1 lists the theoretical speeds of several types of network connections.
Real-world factors like initial connection times, intervening devices, and line noise slow downloads to below their advertised limits. Even with a fast server and a good ISP, a 56Kbps modem, for example, will rarely achieve that speed. 56Kbps modems operate at 33.6Kbps over analog phone lines. If an ISP has digital lines, there is a chance that its users will be able to get 56Kbps download speed, but uploads will stay at 33.6Kbps. A 14.4Kbps modem will often download at 1.5KB/s, a 28.8Kbps modem at 3KB/s, and a 56Kbps modem will optimistically download at 5KB/s, while an unloaded T1 dedicated line will download at 180KB/s.
DSL and cable modem users will notice large variances in their download speeds, anywhere from 384Kbps to 10Mbps. Even DSL services that are advertised at 384Kbps frequently get download speeds of 800Kbps (100KB/s) during unloaded times and 100Kbps or slower when the DSL network is saturated. For more information on benchmarking DSL and cable modems, see …
Example: A Page Viewers Might Abandon
Freshmeat (www.freshmeat.net), a popular Unix software directory, weighs in at 78K, almost all HTML. As you can see in Figure 9.2, modem users have a smaller Internet pipeline than DSL users do. It will take a 28.8K modem user about 26 seconds to download a page this size, whereas a DSL modem running at the advertised 384Kbps would receive it in about 2 seconds. Downloading this page hovers on the threshold of frustration for 56Kbps modem users; fickle users might get bored with waiting for the page and jump over to see if linuxapps.com is loading any quicker (at a slightly slimmer 75K).
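The Freshmeat figures can be reproduced by applying the download-time formula to the practical throughput rates quoted earlier in this chapter. In this Python sketch, the dictionary of rates is an editorial summary of those figures, and the DSL entry assumes the full advertised 384Kbps.

```python
# Practical throughput figures quoted earlier in this chapter, in KB/s.
# The DSL entry assumes the advertised 384Kbps divided by 8 bits per byte.
PRACTICAL_KB_PER_S = {
    "14.4Kbps modem": 1.5,
    "28.8Kbps modem": 3.0,
    "56Kbps modem": 5.0,
    "384Kbps DSL": 384 / 8,
}

def times_for_page(page_kb: float) -> dict[str, float]:
    """Estimated download time in seconds for each connection type."""
    return {link: page_kb / rate for link, rate in PRACTICAL_KB_PER_S.items()}

for link, seconds in times_for_page(78).items():
    print(f"{link:>15}: {seconds:5.1f} s")
```

For the 78K page this gives 26 seconds on a 28.8Kbps modem, about 15.6 seconds on a 56Kbps modem, and roughly 1.6 seconds over DSL, in line with the estimates in the paragraph above.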
Figure 9.2: Different download times, DSL vs. a 56K modem (www.freshmeat.net home page, 78KB; 384K DSL compared with a 56Kbps modem)
Example: A Page Viewers Would Not Abandon
Google (www.google.com) has a highly functional search page of only 12K. As you can see in Figure 9.3, even a 14.4Kbps modem user can download the page in less than the 10-second threshold of frustration. Because the page arrives in about 8 seconds even at that speed, it is unlikely that even impatient users would abandon it to try another search engine like www.hotbot.com (a lean 30K).
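Turning the formula around gives a page-size budget: the largest page that still loads within the threshold is the connection rate multiplied by 10 seconds. This Python sketch uses the 1.5KB/s practical rate for a 14.4Kbps modem from earlier in the chapter; the function names are illustrative.

```python
THRESHOLD_S = 10.0  # threshold of frustration cited in this chapter

def max_page_kb(rate_kb_per_s: float, threshold_s: float = THRESHOLD_S) -> float:
    """Largest page (in KB) that finishes downloading within the threshold."""
    return rate_kb_per_s * threshold_s

def within_threshold(page_kb: float, rate_kb_per_s: float,
                     threshold_s: float = THRESHOLD_S) -> bool:
    """True if the page downloads in no more than threshold_s seconds."""
    return page_kb / rate_kb_per_s <= threshold_s

# Google (12K) and HotBot (30K) over a 14.4Kbps modem at about 1.5KB/s
print(max_page_kb(1.5))           # 15.0
print(within_threshold(12, 1.5))  # True  (8 seconds)
print(within_threshold(30, 1.5))  # False (20 seconds)
```

A 15K budget for the slowest modems explains why the 12K Google page stays comfortably under the threshold.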
You can test out the probable download times of any page on the Internet with this free online tool: www2.imagiware.com.
Inability to Open or View Files
If people can’t use the files on your site, they will often feel frustrated and give up. Files that cannot be opened are either corrupt or somehow incompatible with certain software and hardware configurations. In this section, you will learn the following:
How a browser successfully recognizes a file
What stops a browser from opening a multimedia file
What stops a browser from opening an HTML file
How to identify and fix corrupt files
Figure 9.3: Download times for google.com
It is important for Web site owners to fix the broken files and mark incompatible ones with warnings as to who can and cannot use them.
Many times when someone says a file “won’t open,” it is because the file is simply not there. Broken links and missing files are quite common on the WWW. People move the files in their Web site around a lot, and the links to their old files are not automatically updated. See Chapter 10 on how to set up a system to counter this potential source of errors.
How a Browser Recognizes a File
Browsers sometimes fail to display a file or display it in a mangled fashion. To understand why they fail, like good doctors we need to first understand what happens with our patient when everything goes right and the browser succeeds in displaying a file. The technology that makes this happen is MIME file types.
MIME is an acronym that stands for Multipurpose Internet Mail Extensions. It allows Web browsers and e-mail clients to recognize and view lots of different types of files. Servers that deliver pages tag these pages as being certain file types. Clients display these file types as best they can. Read www.whatis.com/mime.htm for details.
In a foreign culture, even people who know the language need to be told when something is a joke. They often don’t pick up the subtle clues they need to change the context of their understanding from “serious” to “joke.” In a similar way, browsers need to be told explicitly what mode they should use to interpret each file. Browsers handle many different types of files. The first Web browser was designed to display only HTML. Later browsers learned to understand files from Gopher servers, FTP servers, and WWWAIS index servers. The next generation of browsers learned to display inline images like GIF and JPEG files. More recently, browsers can open Adobe Acrobat portable documents, Java applets, XML documents, and others.
When a browser downloads a file, the Web server tells the browser exactly what type of file it is. The server uses a configuration file (MIME.TYPES in Apache and Netscape servers) to figure out what files should be marked as being which file types. As you can see in Figure 9.4, MIME.TYPES has two fields: the field on the left names a content-type, and the field on the right contains all the file extensions that should trigger the Web server to mark a file as the corresponding content-type.
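The server-side lookup can be sketched in a few lines of Python. This sketch assumes the two-field layout just described (content-type first, then its extensions, as in Apache’s mime.types); Netscape servers use a different syntax that this parser does not handle. The sample table, the function names, and the application/octet-stream fallback are assumptions for illustration.

```python
SAMPLE_MIME_TYPES = """\
# content-type          extensions
text/html               html htm
image/gif               gif
image/jpeg              jpeg jpg jpe
application/msword      doc
"""

def parse_mime_types(text: str) -> dict[str, str]:
    """Build an extension -> content-type map from mime.types-style lines."""
    ext_to_type = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        content_type, *extensions = line.split()
        for ext in extensions:
            ext_to_type[ext.lower()] = content_type
    return ext_to_type

def content_type_for(filename: str, table: dict[str, str],
                     default: str = "application/octet-stream") -> str:
    """Content-type a server would announce for this file name."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return table.get(ext, default)

table = parse_mime_types(SAMPLE_MIME_TYPES)
print(content_type_for("report.DOC", table))  # application/msword
```

A file whose extension is missing from the table falls back to a generic binary type, which is one way a correctly uploaded file can still arrive at the browser looking “broken.”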
The browser uses the MIME information to decide what to do with a particular type of file. The browser could try to parse and display the file, save the file, or launch an external program to open the file. The client uses a flexible lookup table mapping “MIME-type” to “what to do.” In Figure 9.4, you can see a list of different MIME types and the extensions the browser associates with each. Figures 9.5 and 9.6 show the user configuring the exact mapping; the MIME type audio/x-pn-aiff is being mapped to the RealPlayer external program.
Figure 9.5: Configuring client MIME types
Web servers send a MIME header (the Content-Type header) with each file, specifying what type of file it is. The Web site administrator maintains a lookup table on the Web server that matches file extension to MIME-type. If you are adding a new file type to your site, add it to this lookup table.
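Python’s standard mimetypes module keeps the same kind of extension-to-type table, so it offers a quick way to see what “adding a new file type” means. This sketch registers VRML worlds (.wrl, model/vrml) as an example of a plug-in-based type; the change affects only the running Python process, not any Web server.

```python
import mimetypes

# Register a content-type for an extension in this process's lookup table.
mimetypes.add_type("model/vrml", ".wrl")

content_type, encoding = mimetypes.guess_type("world.wrl")
print(content_type)  # model/vrml
```

On an Apache server, the equivalent is a new line in mime.types or an AddType directive such as `AddType model/vrml .wrl` in the server configuration.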
Missing MIME-Types and Plug-Ins
After a browser uses MIME-types to recognize a file, it may use either an external program or a “plug-in” to open nonstandard file types. Plug-ins are mini-programs that work within the browser and add extra functionality, such as Shockwave or VRML browsing. If the browser comes upon a MIME-type that it doesn’t have in its lookup table, it may be unable to display the file. Likewise, if the MIME-type requires a plug-in, the browser may lack that plug-in and be unable to read the file.
If the browser doesn’t have a plug-in or external program capable of opening the file, a file can appear unreadable to the user. The file isn’t really unreadable; it is just not “openable” for that particular user. If the user had a stand-alone application or plug-in that could read the file, then the file would be readable. To assist the user, Microsoft and Netscape browsers check to see if there are any downloadable plug-ins available that can be used to view a new MIME-type.
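The client’s side of the process can be modeled as a lookup from MIME-type to an action, with a fallback when the type is unknown or its plug-in is missing. This Python sketch is a simplified model, not how any particular browser is implemented; the Action names, the HANDLERS table, and the decide function are hypothetical.

```python
from enum import Enum

class Action(Enum):
    DISPLAY = "display in the browser window"
    PLUGIN = "hand the data to a plug-in"
    EXTERNAL = "launch an external program"
    FIND_PLUGIN = "offer to find a downloadable plug-in"
    SAVE = "ask the user to save the file"

# Hypothetical client table: MIME type -> (action, handler name).
HANDLERS = {
    "text/html": (Action.DISPLAY, None),
    "image/gif": (Action.DISPLAY, None),
    "application/x-shockwave-flash": (Action.PLUGIN, "Shockwave Flash"),
    "audio/x-pn-aiff": (Action.EXTERNAL, "RealPlayer"),
}

def decide(content_type, installed_plugins):
    """Pick what to do with a response based only on its MIME type."""
    base = content_type.split(";", 1)[0].strip().lower()  # drop charset etc.
    action, handler = HANDLERS.get(base, (Action.SAVE, None))
    if action is Action.PLUGIN and handler not in installed_plugins:
        # Type is known but its plug-in is missing: file looks "unopenable".
        return Action.FIND_PLUGIN, handler
    return action, handler

print(decide("application/x-shockwave-flash", set()))
```

The two failure cases described above map onto the two fallbacks: an unknown type falls through to SAVE, and a known type without its plug-in leads to FIND_PLUGIN.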
Not all MIME-types have plug-ins for every platform. Some plug-ins exist only for Macintosh computers, others only for Windows. Therefore, users can be unable to open a special multimedia file because the plug-in needed to open that file simply does not exist for their platform.
Misconfigured MIME-Types
If the server sends the wrong MIME-type, the browser may try to use the wrong application to interpret the data. This will look to the user like a “broken” file.
If a document is supposed to be a Microsoft Word document, but the browser tries to open it as a plain text file, MIME is probably the reason. Check that the server is sending DOC files with the MIME header application/msword and that the browser is set to use WINWORD.EXE to open files of type application/msword.
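The server-side half of that check can be automated by comparing the Content-Type a server actually sends against the type you expect for each extension. In this Python sketch, the EXPECTED table and check_content_type are hypothetical; in practice, the header value would come from a real response, for example one fetched with urllib.request.

```python
import os

# Hypothetical expected mapping for this site's files.
EXPECTED = {
    ".doc": "application/msword",
    ".html": "text/html",
    ".pdf": "application/pdf",
}

def check_content_type(filename, header_value):
    """Return None if the Content-Type matches expectations, else a message."""
    ext = os.path.splitext(filename)[1].lower()
    expected = EXPECTED.get(ext)
    if expected is None:
        return f"no expected type recorded for {ext or 'files without an extension'}"
    actual = header_value.split(";", 1)[0].strip().lower()  # ignore parameters
    if actual != expected:
        return f"{filename}: server sent {actual}, expected {expected}"
    return None

print(check_content_type("report.doc", "text/plain"))
```

A message like “server sent text/plain, expected application/msword” points at the server’s lookup table rather than at the file itself.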
See Chapter 6 for more on configuring MIME on the client.
Malformed HTML
Browsers render documents with the MIME-type text/html internally, so users don’t need any plug-ins for normal Web documents. However, even HTML can be “not viewable” when one of the following conditions exists:
The HTML contains tags that the browser does not support
The page includes an incompatible Java or JavaScript program
There isn’t enough room to properly display the HTML
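Some invalid markup, such as a <TABLE> that is never closed, can be caught before a page is published. The following Python sketch uses the standard html.parser module to report unclosed and stray tags. It is a rough checker, not a validator: the VOID and OPTIONAL_END sets are simplifying assumptions that skip tags HTML allows to be left open.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"br", "img", "hr", "input", "meta", "link", "area", "base", "col"}
# Elements whose end tags HTML lets authors omit; skipped to avoid false alarms.
OPTIONAL_END = {"p", "li", "td", "th", "tr", "option", "dt", "dd"}

class TagBalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # open tags and where they started
        self.problems = []  # human-readable complaints

    def handle_starttag(self, tag, attrs):  # tag names arrive lowercased
        if tag not in VOID and tag not in OPTIONAL_END:
            self.stack.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag in VOID or tag in OPTIONAL_END:
            return
        if any(t == tag for t, _ in self.stack):
            # Pop back to the matching open tag; anything popped on the way
            # was left unclosed.
            while self.stack:
                open_tag, pos = self.stack.pop()
                if open_tag == tag:
                    break
                self.problems.append(f"<{open_tag}> opened at line {pos[0]} was not closed")
        else:
            self.problems.append(f"stray </{tag}> at line {self.getpos()[0]}")

def check_html(source):
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    for tag, pos in checker.stack:
        checker.problems.append(f"<{tag}> opened at line {pos[0]} was not closed")
    return checker.problems

print(check_html("<TABLE><TR><TD>cell</TD></TR>"))
```

For the sample, the checker reports that the table opened on line 1 was never closed, which is exactly the kind of error that can leave a page blank in some browsers.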
Nonstandard Tags
If the HTML uses nonstandard HTML tags, and the browser doesn’t support those tags, the page can be unusable; frame tags without a <NOFRAMES> alternative are a good example of this problem. If the HTML is invalid (for example, if it is missing closing tags), the browser may not know how to render the page and may render nothing for the entire malformed item. In the case of a malformed <TABLE>…</TABLE>, the entire page could be blank.
Java and JavaScript
HTML pages can now also include client-side scripting using JavaScript and, for those who only use Microsoft Internet Explorer browsers, VBScript. They can also include small Java applications called applets. Both JavaScript and Java have different versions, and not all browsers support all versions.
If a page that contains JavaScript works for the developers but fails to load properly for other users, check to see if the JavaScript is written so that it needs a recent browser.
Graphic Resolution
HTML is usually viewable on monitors of many different sizes. Paragraphs wrap to fit the available space. Some HTML tags (including <IMG> and <TABLE>) can specify absolute widths in terms of pixels. If a Web site uses a <TABLE WIDTH=900 ALIGN=CENTER> tag, then a user with a monitor resolution of 640x480 pixels will not be able to view most of the Web site. This can be even more destructive when the frame option is used and the ability …
Computer screens generally display between 72 and 96 pixels per inch, and there are still many monitors that only display 640x480 pixels. Therefore, when scanning in pictures, keep in mind that a Web browser shows each image pixel as one screen pixel, so a high-resolution scan (say, 300 dots per inch) is effectively displayed at about 72 pixels per inch. This means a 3.5-inch photograph scanned in at 300 dots per inch can end up displaying at 1,050 pixels wide, larger than the screen of a large number of browsers.
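The arithmetic behind the 1,050-pixel figure is simple enough to check in code. This Python sketch computes the on-screen width of a scan, how many inches it would span on a 72ppi screen, and how many times more pixels it carries than a screen-resolution version; the function names are illustrative.

```python
def displayed_width_px(width_inches: float, scan_dpi: float) -> int:
    """One image pixel per screen pixel, so width in pixels = inches * dpi."""
    return round(width_inches * scan_dpi)

def on_screen_inches(width_px: float, screen_ppi: float = 72) -> float:
    """Physical width the image occupies on a screen of the given density."""
    return width_px / screen_ppi

def pixel_count_ratio(scan_dpi: float, screen_ppi: float = 72) -> float:
    """How many times more pixels the scan has; area scales with the square."""
    return (scan_dpi / screen_ppi) ** 2

w = displayed_width_px(3.5, 300)
print(w, w > 640)                        # 1050 True
print(round(on_screen_inches(w), 1))     # 14.6
print(round(pixel_count_ratio(300), 1))  # 17.4
```

The last figure is why an oversized scan also costs bandwidth: at 300 dpi, the photograph holds roughly 17 times as many pixels as the same picture scanned at 72 ppi.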
The terms dots per inch (dpi) and pixels per inch (ppi) are often used interchangeably when discussing screen resolution. This is not technically correct, however: dots per inch is a printer resolution, whereas pixels per inch is a screen resolution.
Not only is the image larger than the viewable area of the browser window, it also requires extra bandwidth to download the larger graphic file. Sticking to a screen resolution image (72 to 96ppi) will help keep files small enough to transmit quickly.
Corrupt Files
File corruption can also stop some files from being opened. Corruption means that a working file has been changed so that its application can no longer understand, or parse, the file. In the Web server environment, files are rarely corrupted. Generally, “corrupt” files are really files that aren’t being opened with the right program or that have been misnamed or otherwise mangled by the user.
For example, suppose a user has a file called BIGDIARY.DOC and then puts this file in the compressed zip archive ARCHIVES.ZIP. To open ARCHIVES.ZIP, a user would need to have a program that could parse ZIP files. But if that user renames ARCHIVES.ZIP to ARCHIVES.DOC, Microsoft Word would claim that the file is corrupt.
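A misnamed file like ARCHIVES.DOC can often be identified by its first few bytes, which many formats fix to a known “magic number” regardless of the file name. This Python sketch checks a short, illustrative list of well-known signatures; it is not a complete file-type detector.

```python
import io
import zipfile

# Leading "magic" bytes of a few well-known formats.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP archive"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word 97-2003 .doc)"),
    (b"%PDF-", "PDF document"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
]

def sniff(data: bytes) -> str:
    """Guess a file's real format from its first bytes, ignoring its name."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"

def sniff_file(path: str) -> str:
    with open(path, "rb") as f:
        return sniff(f.read(16))

# Build an in-memory zip like ARCHIVES.ZIP; renaming it would not change these bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("BIGDIARY.DOC", "dear diary")
print(sniff(buf.getvalue()))  # ZIP archive
```

Calling sniff_file("ARCHIVES.DOC") on the renamed archive would still report a ZIP archive, which tells you to rename it rather than treat it as damaged.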
The best way to fix file corruption is to try to open the original file on the original computer. If the file is not corrupted, replace the corrupted version with the uncorrupted version and try again. If you’re transferring a file from one computer to another using FTP, set the FTP program to use ASCII when transferring text files (such as HTML, scripts, and files with the TXT extension) and BINARY when transferring binary files (such as files with the extensions EXE and DOC).
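The ASCII-versus-BINARY rule can be built into an upload script so that it is applied consistently. This Python sketch uses the standard ftplib module, where storlines sends in ASCII mode (TYPE A) and storbinary in binary mode (TYPE I). The TEXT_EXTENSIONS set is an illustrative assumption to adjust for your own site, and the upload function expects an already logged-in ftplib.FTP connection.

```python
import ftplib
import os

# Illustrative list of text extensions; everything else is sent as binary.
TEXT_EXTENSIONS = {".html", ".htm", ".txt", ".css", ".js", ".pl", ".cgi", ".sh"}

def transfer_mode(filename: str) -> str:
    """'ascii' for text files, 'binary' for everything else."""
    ext = os.path.splitext(filename)[1].lower()
    return "ascii" if ext in TEXT_EXTENSIONS else "binary"

def upload(ftp: ftplib.FTP, local_path: str, remote_name: str) -> None:
    """Upload one file using the transfer mode that matches its type."""
    cmd = f"STOR {remote_name}"
    with open(local_path, "rb") as fp:  # ftplib reads bytes in both modes
        if transfer_mode(local_path) == "ascii":
            ftp.storlines(cmd, fp)   # ASCII mode: line endings converted
        else:
            ftp.storbinary(cmd, fp)  # BINARY mode: bytes copied unchanged
```

Choosing the mode from the extension is only a heuristic; a binary file with a text-like extension would still be damaged, so the extension list matters.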
Technology and Content Planning
The best way to ensure a well-functioning Web site is to plan ahead. By planning ahead, administrators can address potential problems before their customers are screaming for blood. Also, comprehensive planning leads to optimal trade-offs between factors like high functionality and compatibility.
This section will address planning both the front end (what the users see) and the back end (what makes the site work behind the scenes). Specifically, it will consider the processes for the following:
Planning which content types (media) to use
Planning for what server and network resources may be needed
A well-thought-out and well-implemented plan for both the front and back end of a Web site will minimize the problems discussed in the previous sections.
Audience-Appropriate Media
A Web site’s content is more than just the words in an HTML document. The content also includes the graphics, video, and other multimedia files on your site. Some people will appreciate these glitzy multimedia effects; others will be unable or unwilling to view nonstandard or large multimedia files. Choices to either include or not include different content types will have consequences for who uses a Web site.
Keep the following in mind when choosing your content policies:
Determine the attitude and key technical attributes of your audience
Given the goals of your Web site, choose a content policy tailored to your audience
A site that follows these methods will serve its viewers in a strategic way and is therefore more likely to achieve its goals.
Audience Profiles
One simple yet beautiful strategy for building up or maintaining an audience is to use technologies that work for them. Before you can do that, though, you need to know who your audience is. In terms of what content types to use, you will especially want to consider their desires and technical capacity to use different media. This section breaks these attributes into two areas:
Desire for multimedia content
Client performance levels
This information should help you make informed decisions about the appropriate content and capabilities for your Web site.
Desire for Multimedia Content
It is difficult to know exactly what anyone wants without asking them directly. This section provides a rule of thumb for guessing when multimedia content would be desired. It also shows real-life examples of when such content is appropriate and presents a hypothetical example of when Shockwave multimedia would be a good idea.
The rule of thumb for multimedia content is this: does the functionality of the file directly serve the central purpose of the Web page and dramatically enhance the usability of the page? If the answer is yes, users will desire that content and may be willing to go through the effort to get to it. If the answer is no, the multimedia files will cause frustration if they delay users or ask them to modify their browsers in any way.
For example, when NASA first released pictures from Galileo at galileo.jpl.nasa.gov/images/io/ioimages.html, people went to the site and waited a long time to download the pictures. Visitors to the NASA site went there especially to view the pictures, so their motivation to wait was high. But when users go to Yahoo!, they don’t expect to be dazzled by a Flash graphic; they have come to find another Web site. Yahoo! keeps multimedia delays to a minimum and focuses on its functionality as a category browser. Although their approaches differ, Yahoo! and NASA are each providing the content their viewers want.
For more on designing usable Web sites, visit dmoz.org/Computers/Internet/WWW/Web_Usability/.
Think about who your audience is and why they come to your site. How much time are they willing to spend for multimedia content? Take, for example, Shockwave, which is a plug-in that allows users to play simple games and view animated pictures. Would your audience want to download a Shockwave plug-in in order to view your site? If you have a news site, nice pictures would be appealing, but a Shockwave game might not be compelling. But if your site provides Web-based tools for diagramming atoms, scientists would probably have enough motivation to download a plug-in. It really depends on how relevant the multimedia is to the purpose of the Web page.
Client Performance Levels
To make the best possible site, you’ll need some information about the Internet abilities of your audience. Audiences have different abilities and technological needs. Their abilities and needs will depend upon factors such as network and Internet connection speed, browser type and version, and operating system.
NETWORK SPEED
The speed of a user’s Internet connection affects her willingness to download big files. Users who get Internet connectivity by dialing in to an ISP with a modem have vastly slower Internet connections than those who have DSL, cable modems, or T1s. Those with especially slow network connections need text-oriented navigation and content because they may surf “images off” (meaning that they turn off the capability for their browser to display images, thus making the page load faster). Those who have fast Internet connections are more likely to want to download extras like software and music.
Web servers can record how long it takes viewers to download each file. You can use the formula outlined in “Download and View Time” earlier in this chapter to compute the average network speed of your viewers.
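Rearranging that formula gives Available Bandwidth = Size of Page ÷ Time of Download. This Python sketch assumes you have already pulled two values per request out of the server log: the bytes sent and the seconds taken. Apache’s mod_log_config, for example, can record these with the %B and %D format codes (%D is in microseconds, so it would need converting). The records below are hypothetical. Using the median rather than the mean is an editorial choice so that a few unusually fast or slow requests do not skew the result.

```python
from statistics import median

# Hypothetical log records: (bytes sent, seconds the server spent sending).
# Server-side time only approximates what the viewer experienced.
records = [
    (10_240, 2.0),  # 10 KB in 2 s
    (30_720, 2.0),  # 30 KB in 2 s
    (3_072, 1.0),   # 3 KB in 1 s
]

def kb_per_second(bytes_sent, seconds):
    """Size of Page / Time of Download, giving bandwidth in KB/s."""
    return (bytes_sent / 1024) / seconds

def typical_speed(records):
    """Median per-request speed in KB/s; zero-duration entries are skipped."""
    speeds = [kb_per_second(b, s) for b, s in records if s > 0]
    return median(speeds) if speeds else 0.0

print(typical_speed(records))  # 5.0
```

A typical speed of around 5KB/s would suggest an audience on 56Kbps modems, which in turn suggests keeping pages under roughly 50K.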
BROWSER VERSION
It is important to determine what browsers your visitors are using so you know what capabilities they have. Browsers come in more options than just Netscape Navigator and Microsoft Internet Explorer (IE). Each browser can implement different features, and different versions of a browser also have differing capabilities.
Although the latest browsers, like Netscape Navigator 4.7, implement cutting-edge features, using these features may break the Web site for other browsers. There are actually hundreds of different brands and versions of browsers, and they can all differ from each other in terms of their capabilities:
The Lynx text-mode browser can’t view Java applets
Only IE can use VBScript
Only recent versions of IE can display raw XML
Netscape Navigator 2 doesn’t format text according to <STYLE> tags
For a more complete listing of browser features, see www.browsercaps.com.
It’s often easier to upgrade a browser than it is to get a faster network connection, but there are still a lot of older browsers in use. A Web server can log the browser version used by each visitor to your Web site.
Lynx is a text browser for the World Wide Web. It comes installed on most Linux machines and was widely used at universities in the early days of the WWW. It remains the browser of choice for tens of thousands of users. For more information, see lynx.browser.org/.
OPERATING SYSTEMS
People surf the Web on many flavors of Microsoft Windows, Macintosh computers, Linux and other Unix systems, BeOS boxes, Amigas, and more. Browsers often send along the name of their operating system to the Web server so it too can be logged and analyzed.
Browsers are highly cross-platform, so the operating system is not usually an issue. However, if the Web site relies on special content types that need plug-ins, those plug-ins may not be available for all operating systems. Also, external plug-in programs may be limited to only a few operating systems.
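Browser and operating-system names both arrive in the User-Agent string that servers can log. The following Python sketch tallies a list of such strings with crude substring rules. The rules and sample strings are illustrative assumptions; real User-Agent strings are messy, and rule order matters (Internet Explorer’s string also contains “Mozilla,” so MSIE is tested first).

```python
from collections import Counter

# Crude, illustrative rules checked in order; the first match wins.
BROWSER_RULES = [("Lynx", "Lynx"), ("MSIE", "Internet Explorer"),
                 ("Mozilla", "Netscape/Mozilla")]
OS_RULES = [("Win", "Windows"), ("Mac", "Macintosh"), ("Linux", "Linux")]

def classify(user_agent, rules):
    for needle, label in rules:
        if needle in user_agent:
            return label
    return "other"

def tally(user_agents):
    """Count visits per browser family and per operating system."""
    browsers = Counter(classify(ua, BROWSER_RULES) for ua in user_agents)
    systems = Counter(classify(ua, OS_RULES) for ua in user_agents)
    return browsers, systems

agents = [
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
    "Mozilla/4.7 [en] (X11; I; Linux 2.2.12 i686)",
    "Lynx/2.8.2rel.1 libwww-FM/2.14",
]
browsers, systems = tally(agents)
print(browsers)
print(systems)
```

Run over a real log, tallies like these show how many visitors would lose access if the site required, say, VBScript or a Windows-only plug-in.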
EXAMPLE PERFORMANCE PROFILES
Suppose your audience consists of university computer science departments. How would you categorize their Internet capabilities and needs? Computer science labs typically have fast network connections, recent versions of Web browsers, and a mix of operating systems. If your audience consists of Windows gamers, you can’t assume that they have more than a 28.8Kbps modem, but you could assume they are running Windows with a recent browser and can download a Windows-only plug-in if there is a good reason to do so.
Sites designed for these two different audiences might very well differ in the media they use.
Content-Technology Policies
After gathering information about your audience, the next step is to draft a content type policy. The policy will guide the entire organization as to what content types to use on the Web site.
The goal of the content policy is to make sure the Web site can inform, entertain, and supply the target audience with the tools they want without putting up roadblocks. As you saw in the preceding section, different people will consider different content types a roadblock, so no policy will satisfy everyone.
An absence of policy can lead to confusion among the Web site developers and frustration among its audience, so it is worth considering the basic types of policies and how to implement one.
Types of Policies
Should the Web site use only content types that everyone is able to view? Should you use a technology if only 80 percent of your visitors have access to it? Here are four of the most common policies. These basic policies can be modified to reflect an organization’s goals and culture.
CAPTIVE AUDIENCE
With a captive audience policy, the Web site creators create the content in the format they want to use, and their audience must use only browsers that work with the content types they’ve chosen. Captive audience policies usually rely on the content creators having control of the browsers people have on their desktops. This is generally only the case in corporate intranets, and even then, only when there is strict control and standardization of computing resources. Where it is feasible, such a policy can lead to the full use of leading-edge, money-saving technologies.
LOWEST COMMON DENOMINATOR
A lowest common denominator policy is the opposite of the captive audience policy: a Web site designer creates a site that is functional for almost any browser, even those with extremely low capabilities. The idea here is to create the content so it looks good on the text-only Lynx browser, and it’ll work even better in everything else.
To demonstrate how a lowest common denominator policy works in practice, Figures 9.7 and 9.8 show the same Web page, a community site for Canadian activists, in both Lynx and Netscape Navigator. In Lynx, a text-mode browser, the site functions perfectly well, as you can see in Figure 9.7. Figure 9.8 shows the same content in Navigator, which offers all the Lynx features and more, including fonts, colors, and a background image.
FIGURE 9.8 The Web Networks Web site in a graphical browser
There is a techno-political movement that supports the lowest common denominator approach: www.anybrowser.org/campaign/.
85% POLICY
The 85% policy states that you should “use technologies that will reach many people, but don’t let the stragglers drag functionality down for other viewers.” A lot of sites don’t care deeply about reaching everyone. For example, college students generally put up a home page for fun. Their home pages wouldn’t be any more fun for them if they were built so that the tiny fraction of text-only browsers could view them. But a Shockwave party invitation might increase the fun considerably, so they will design the site so that the majority of visitors will be able to view it and take advantage of its features.
Businesses generally put up a Web site to sell something. A glitzy Web site may sell more than a plain one, even if the glitzy page is theoretically not accessible to those with slower modems.
ADAPTIVE CONTENT
Using an adaptive content policy, Web site developers don’t necessarily have to choose between accessibility and glamour. Instead, sites can deliver advanced features to clients that can use them and deliver standard features to those that can’t. This way, the whole audience is well served. Creating such a Web site, however, adds complexity and often cost.
There are two ways to create a Web site that provides high functionality to advanced clients while gracefully providing reduced functionality to less capable ones:
Differential content: Servers identify which clients can use advanced features and send pages with those features only to those clients. For example, you could create a Web page that recognizes older browsers and redirects them to a portion of the site that doesn’t use frames. Figure 9.9 shows a Web page in which users are asked which version of the Web site they want to visit.
Graceful degradation: Like subtle irony in The Simpsons, Web pages can sometimes include advanced features in a way that harmlessly passes over the heads of less-advanced browsers. This is called graceful degradation: if the browser can’t use advanced features like frames or JavaScript, these extra features are simply ignored.
The benefit of either approach is that everyone can use the site as they would like to use it; in other words, “the user is always right.” The cost is added complexity in maintaining multiple versions of documents or in creating documents that degrade well.
FIGURE 9.9 www.browsercaps.com asks what site version to use.
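The differential content approach comes down to a decision the server makes on each request. The Python sketch below assumes that decision is based only on the browser’s User-Agent header; the marker strings, the `/basic/` and `/full/` paths, and the names `choose_site_version` and `VersionRedirectHandler` are all hypothetical, and real browser detection needs a far more complete list.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical substrings treated (in this sketch only) as identifying
# browsers that cannot handle frames; a real list would be much longer.
BASIC_BROWSER_MARKERS = ("Lynx", "Links", "w3m", "Mozilla/2", "Mozilla/3")


def choose_site_version(user_agent: Optional[str]) -> str:
    """Return the path of the site version to send to this browser."""
    if not user_agent:
        # No User-Agent header: assume the least capable browser.
        return "/basic/"
    if any(marker in user_agent for marker in BASIC_BROWSER_MARKERS):
        return "/basic/"
    return "/full/"


class VersionRedirectHandler(BaseHTTPRequestHandler):
    """Redirects requests for the home page to the matching site version."""

    def do_GET(self) -> None:
        if self.path == "/":
            target = choose_site_version(self.headers.get("User-Agent"))
            self.send_response(302)  # temporary redirect
            self.send_header("Location", target)
            self.end_headers()
        else:
            self.send_error(404)
```

Passing the handler to `http.server.HTTPServer(("", 8000), VersionRedirectHandler)` and calling `serve_forever()` would serve the redirect locally. Because User-Agent strings are easy to misread, an alternative is simply to ask the user, as the site in Figure 9.9 does.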
Implementing a Policy
Once you have put all that effort into researching your audience and choosing the type of policy to use, it would be a waste to ignore the policy. There are two main factors in the success of a policy:
That it is specific
That it is adopted and used
CONCRETE AND SPECIFIC
Content policies serve as a style guide for content creators and Web site designers. The policy should give these people specific guidelines to follow. Here are some examples of guidelines you might include:
Limit main navigation pages to a maximum size of 80K
Mark links to pages that are larger than 150K
Do not use HTML that requires a browser more recent than Netscape 3
Your guidelines may be more or less restrictive than these samples, but they should be just as specific.
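Specific guidelines like these can be checked automatically. The Python sketch below is one hypothetical way to do it: it assumes “K” means 1,024 bytes and that a page’s size is just the size of its HTML file (a stricter check would add images and other embedded files). The function names and report keys are made up for illustration.

```python
import os
from typing import Dict, List, Set

# Limits from the sample guidelines; "K" is assumed to mean 1,024 bytes.
NAV_PAGE_LIMIT = 80 * 1024          # main navigation pages: at most 80K
LARGE_PAGE_THRESHOLD = 150 * 1024   # links to pages above 150K must be marked


def collect_page_sizes(root: str) -> Dict[str, int]:
    """Map each .html/.htm file under root (as a relative path) to its size in bytes."""
    sizes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                full_path = os.path.join(dirpath, name)
                sizes[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
    return sizes


def check_policy(sizes: Dict[str, int], nav_pages: Set[str]) -> Dict[str, List[str]]:
    """List navigation pages over the limit and pages whose links need a size marker."""
    too_big_nav = sorted(p for p in nav_pages if sizes.get(p, 0) > NAV_PAGE_LIMIT)
    needs_marker = sorted(p for p, size in sizes.items() if size > LARGE_PAGE_THRESHOLD)
    return {
        "oversized_navigation_pages": too_big_nav,
        "link_size_marker_needed": needs_marker,
    }
```

For example, `check_policy(collect_page_sizes("site"), {"index.html"})` would report whether the home page breaks the 80K rule and which pages need a size warning next to their links.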
KNOWN AND ADOPTED
Many people contribute to the health of a site and play a role in creating its content, so there needs to be agreement on what technologies to use. This policy might be handed down by the CEO, or it might be developed collaboratively. Either way, it should be written down, and new employees should be trained in its use.
Planning Robust Back-End Service
Plan the Web site’s back end so that it is robust and can meet the demands placed on it by good fortune. If you don’t, the consequences can be quite severe. An amusing television commercial illustrates this. In a self-help group, a man says, “I just can’t get over my feeling of being stupid,” and the group facilitator says, “Nobody is stupid, Bob.” Bob then reports how he blew a multimillion-dollar marketing campaign by not warning the server guys, and the site couldn’t take the hits. The closing comment is “That is stupid, Bob!” Don’t be stupid like Bob. ☺
There are two ways in which the failure of the server operations brings down a site:
Too many hits overload the server
A critical component dies or malfunctions
The following sections cover strategies for minimizing these two possibilities
Abundance Equals Performance
Web servers are often no heavier or bulkier than a simple word processor, yet they can almost magically serve up millions of documents to people all over the world. Just as mysteriously, they can bog down and serve documents slowly. The actual performance of a Web server depends on network speed, RAM, processor and hard-disk speed, software, and operating system. That said, there is a general theory that you can use to plan in advance how much of these resources your server(s) will need. In the following sections, we’ll try to demystify Web server performance by providing an overview of these topics:
Basic theory on Web server performance
A strategy for high performance
Finding performance blocks
The bottom line is that an overloaded server will seem slower, so it is important to always operate servers with spare capacity. Because surges of interest can generate demand spikes, it is desirable to have plenty of spare capacity.
Even with extremely fast computers that have enough RAM, an individual request can only be fulfilled as quickly as the client can receive the response (see the section on proxy servers later in this chapter for caveats to this).
Performance Theory
Web servers are built to handle many simultaneous requests, much as a busy restaurant is designed to handle a constant flow of diners. People wait in the lobby until there is a free table. Then a waiter leads them to a table and serves them until they are done and leave. Web servers generally have 10 to 200 semi-independent processes or threads that can each fulfill one request at a time. Each process or thread is called a child of the Web server. Incoming requests cool their heels in a pool of unassigned requests (the lobby). When a Web server process (a waiter) is unoccupied, it is assigned to handle a request from the pool.
The following sections use the words threads and processes because the multithreaded, multiprocess model is the most widely used in today’s software. Apache, for example, is the most widely used Web server, and it uses threads, processes, or both, depending on the underlying operating system. But not all Web servers rely on the multithreaded or multiprocess model. See the World Wide Web Consortium’s list of servers at www.w3.org/Servers.html for more information on the different types of servers.
Mathematicians describe the properties of these pools of waiting people in queueing theory. (Queue is the British word for line.) One thing that queueing theory predicts is that the length of the queue depends on the relative sizes of the incoming and outgoing flows. Queueing theory uses the term utilization rate to signify quantities such as the number of new requests per second divided by the number of requests that can be fulfilled each second:
Utilization Rate = Rate of New Requests ÷ Maximum Rate Fulfilled
For example, a Web server that can finish 10 requests each second and gets 8 requests per second will have a utilization rate of 0.8. This Web server’s queue will stay short because, on average, the server finishes requests faster than they arrive. Even if there are occasionally more than 10 requests in a second, the server will quickly be able to recover and bring the queue back down. If the Web server gets 11 requests a second, however, then in the long run the queue will grow by about 1 request per second. In 10 minutes, the queue will hold about 600 waiting requests, and each new request will wait about a minute (600 requests ÷ 10 requests per second) before being served. So if our example Web server goes from 8 requests to 11 requests a second, the performance degradation is massive, not incremental.
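The arithmetic in this example can be checked with a few lines of Python. The sketch below uses a simple “fluid” approximation that ignores the randomness of real traffic (an assumption, not part of the theory described above). For comparison, it also includes the standard M/M/1 queueing result, which assumes random (Poisson) arrivals and exponentially distributed service times, to show why a server at 0.8 still builds a short queue and why the queue explodes as utilization nears 1. All function names are hypothetical.

```python
def utilization_rate(new_requests_per_sec: float, max_fulfilled_per_sec: float) -> float:
    """Utilization Rate = Rate of New Requests / Maximum Rate Fulfilled."""
    return new_requests_per_sec / max_fulfilled_per_sec


def fluid_queue_length(seconds: float, new_requests_per_sec: float,
                       max_fulfilled_per_sec: float) -> float:
    """Queue length after `seconds`, ignoring randomness (fluid approximation)."""
    growth_per_sec = new_requests_per_sec - max_fulfilled_per_sec
    return max(0.0, float(growth_per_sec * seconds))


def queue_wait_seconds(queue_length: float, max_fulfilled_per_sec: float) -> float:
    """How long a newly arrived request waits behind the current queue."""
    return queue_length / max_fulfilled_per_sec


def mm1_mean_requests_in_system(rho: float) -> float:
    """Mean number of requests in an M/M/1 system: rho / (1 - rho), valid for rho < 1."""
    if not 0 <= rho < 1:
        raise ValueError("M/M/1 has no steady state when utilization >= 1")
    return rho / (1 - rho)


if __name__ == "__main__":
    print(utilization_rate(8, 10))                   # 0.8
    backlog = fluid_queue_length(600, 11, 10)        # 10 minutes at 11 requests/s
    print(backlog, queue_wait_seconds(backlog, 10))  # 600.0 60.0
    for rho in (0.5, 0.8, 0.9, 0.99):
        print(rho, round(mm1_mean_requests_in_system(rho), 1))
```

Under the M/M/1 assumptions, a utilization of 0.8 gives an average of 4 requests in the system, while 0.99 gives about 99, which illustrates why utilization should stay well below 1 even at peak times.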
Keep Utilization Rate Low
When the utilization rate approaches 1.0, the queue will grow and the performance of the server will start to degrade. The key is to ensure that the utilization rate (even at peak times) is well below 1. There are two ways to lower the utilization rate:
Reduce the number of incoming requests
Increase the maximum rate of fulfilled requests
Reducing the number of incoming requests is simple: immediately reject or discard requests after a threshold has been reached. This is often unacceptable, however, and is used only as a safety measure to make sure overloaded Web sites don’t lock up.
Therefore, it is often necessary to increase the rate of fulfilled requests. This rate is the number of children actively fulfilling requests divided by the average length of time it takes to fulfill each request. The equation looks like this:
Rate of Fulfilled Requests = Active Children ÷ Time per Request
Note that the equation is accurate only when both the number of active children and the time per request are fairly constant. So if a Web server generally has 20 busy children and the average request is fulfilled in 4 seconds, the Web server has a rate of about 5 requests per second. It may be possible to decrease the utilization rate by increasing the number of effective children.
Removing Blocks to High Performance
The preceding section indicated that increasing the number of effective children can increase the maximum capacity of the Web server and thereby increase the performance of the system. Increasing the number of effective child processes requires a balance of resources. The word effective is very important: simply increasing the number of children may actually reduce the number of effective children. If there are more waiters than tables, they’ll just stumble into each other, and fewer people will get served.
When planning your server environment, you need a balance of elements such as network bandwidth, RAM, disk I/O, and database connections. Any one of these can easily become a bottleneck. The most common limiting factor for Web servers is the lack of network bandwidth to serve people at peak times. See the sidebar “Choosing the Right Amount of Bandwidth for a Server” for more on this.
If there is plenty of bandwidth, it will usually take experimentation to determine what is limiting the number of effective children. As an example of calculating the right balance, let’s consider an Apache Web server, one of the most commonly used Web server packages. After bandwidth, the lack of RAM is the most frequent limiting factor for Apache. Each child process will use some amount of memory (5MB of RAM, for example), so a server with 500MB of RAM available for Web serving can support only 100 of these children. Even doubling the CPU speed will not significantly increase the speed of a system that is RAM bound, and vice versa.
The use of a swap file or virtual memory is usually unacceptable for Web servers. The time it takes to swap memory to the hard disk increases the average length of time it takes to fulfill each request.
There are other possible constraints on the number of effective children. If the children execute computationally intensive scripts or programs, the CPU may be the bottleneck. If each process consumes some other limited resource, such as database connections or disk I/O, that resource can limit the usefulness of adding more children.
Choosing the Right Amount of Bandwidth for a Server
The formula for determining current bandwidth needs is the maximum request rate multiplied by the average download speed of each request. If a Web server gets a maximum of 5 requests a second, with an average download speed of 3K/s, then it needs only 15K/s of bandwidth, such as a fast ISDN line. When a site is nearing its capacity, it is likely beginning to slow down, which causes users to give up, thereby reducing the bandwidth required. Adding bandwidth to a busy site reduces response time, so fewer people quit.
This means that, after increasing bandwidth, a site will often see its use jump up and will need to increase bandwidth again. The exact utilization rate at which performance degrades depends on the network hardware and configuration. Request rates often increase in a linear fashion for a while and then spike sharply when the site is listed in a popular magazine or search engine, or when a community site reaches a critical mass of users.
The key is to be able to increase a site’s capacity easily. How easy is it to get your ISP to add an extra T1 of bandwidth, or to change ISPs? If it takes a week of lead time, you may face a week of unacceptably bad performance. In this case, the network planner should buy bandwidth in advance of possible marketing successes.
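The sidebar’s formula and the question of how many lines to buy can be sketched in Python. This sketch assumes “K/s” means kilobytes per second, converts at 8 kilobits per kilobyte, and ignores protocol overhead. The T1 rate of 1.544 Mbit/s is standard, but the function names and the headroom multiplier are illustrative assumptions.

```python
import math

T1_KBITS_PER_SEC = 1544  # a T1 line carries 1.544 Mbit/s


def required_bandwidth_kbytes(max_requests_per_sec: float,
                              avg_download_kbytes_per_sec: float) -> float:
    """Bandwidth need = maximum request rate x average download speed per request."""
    return max_requests_per_sec * avg_download_kbytes_per_sec


def t1_lines_needed(kbytes_per_sec: float, headroom: float = 1.0) -> int:
    """Number of T1 lines for a load in KB/s, with an optional safety multiplier."""
    kbits_per_sec = kbytes_per_sec * 8 * headroom
    return math.ceil(kbits_per_sec / T1_KBITS_PER_SEC)


if __name__ == "__main__":
    need = required_bandwidth_kbytes(5, 3)
    print(need, "KB/s =", need * 8, "kbit/s")  # 15 KB/s = 120 kbit/s
    print(t1_lines_needed(400, headroom=2.0))  # 5
```

The sidebar’s 15K/s works out to 120 kbit/s, which fits within the 128 kbit/s of a two-channel (BRI) ISDN line. Buying bandwidth in advance corresponds to choosing a headroom factor above 1.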
Redundancy Equals Reliability
Your back-end service is only as good as its weakest link. If your organization’s name servers don’t work, no one will be able to get to your site, and it won’t matter that your site has plenty of bandwidth.
Don’t forget to ensure that network services like DNS and e-mail are also redundant. Crackers can disable or clog poorly configured servers. When DNS stops working, no one can find your site. When e-mail goes down, most organizations shut down. In addition to the security measures covered in Chapter 7, consider getting redundant mail and DNS servers so you’ll be covered in case of crackers, earthquakes, or other emergencies.
It is necessary to plan what would happen if any resource broke down and to make sure no crucial point can fail without a backup. There are two ways most organizations ensure that they can recover from a malfunctioning component:
Owning spares for the component
Knowing someone will fix the broken component immediately
Whichever strategy or combination of strategies you use, be sure to weigh the cost of the strategy against the potential cost of downtime.
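Weighing a strategy against downtime is a simple expected-cost comparison. The Python sketch below shows one way to frame it; every figure in the example (outage frequency, repair times, cost per hour, cost of a spare) is hypothetical, and real estimates would come from your own failure history and business losses.

```python
def annual_downtime_cost(outages_per_year: float, hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of downtime for one component."""
    return outages_per_year * hours_per_outage * cost_per_hour


def strategy_pays_off(strategy_cost_per_year: float,
                      downtime_cost_without: float,
                      downtime_cost_with: float) -> bool:
    """True if the downtime a strategy avoids is worth more than the strategy costs."""
    return downtime_cost_without - downtime_cost_with > strategy_cost_per_year


if __name__ == "__main__":
    # Hypothetical figures: a failed disk takes 48 hours to replace by mail
    # order, or 2 hours if a spare is kept on the shelf.
    without_spare = annual_downtime_cost(2, 48, 500)  # 48000
    with_spare = annual_downtime_cost(2, 2, 500)      # 2000
    print(strategy_pays_off(5000, without_spare, with_spare))  # True
```

The same comparison applies to a service contract that promises immediate repair: replace the spare’s cost with the contract’s yearly fee and the repair time with the contracted response time.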