INTRODUCTION
PART I FRONT END
CHAPTER 1 A Refresher on Web Browsers
CHAPTER 2 Utilizing Client-Side Caching
CHAPTER 3 Content Compression
CHAPTER 4 Keeping the Size Down with Minification
CHAPTER 5 Optimizing Web Graphics and CSS
CHAPTER 6 JavaScript, the Document Object Model, and Ajax
PART II BACK END
CHAPTER 7 Working with Web Servers
CHAPTER 8 Tuning MySQL
CHAPTER 9 MySQL in the Network
CHAPTER 10 Utilizing NoSQL Solutions
CHAPTER 11 Working with Secure Sockets Layer (SSL)
CHAPTER 12 Optimizing PHP
PART III APPENDIXES
APPENDIX A TCP Performance
APPENDIX B Designing for Mobile Platforms
APPENDIX C Compression
INDEX
OPTIMIZING THE FRONT END AND THE BACK END
Peter Smith
John Wiley & Sons, Inc.
Indianapolis, IN 46256
www.wiley.com
Copyright © 2013 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108
of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization
through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011,
fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or
promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work
is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional
services. If professional assistance is required, the services of a competent professional person should be sought. Neither
the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is
referred to in this work as a citation and/or a potential source of further information does not mean that the author or the
publisher endorses the information the organization or Web site may provide or recommendations it may make. Further,
readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this
work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the
United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard
print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD
or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com.
For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2012949514
Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are
trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries,
and may not be used without written permission. All other trademarks are the property of their respective owners. John
Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
PETER G. SMITH has been a full-time Linux consultant, web developer, and system administrator, with a particular interest in performance, for the past 13 years. Over the years, he has helped a wide range of clients in areas such as front-end performance, load balancing and scalability, and database optimization. Past open source projects include modules for Apache and OSCommerce, a cross-platform IRC client, and contributions to The Linux Documentation Project (TLDP).
ABOUT THE TECHNICAL EDITOR
JOHN PELOQUIN is a software engineer with back-end and front-end experience ranging across web applications of all sizes. Peloquin earned his B.A. in Mathematics from the University of California at Berkeley, and is currently a lead engineer for a healthcare technology startup, where he makes heavy use of MySQL, PHP, and JavaScript. He has edited Professional JavaScript for Web Developers, 3rd Edition by Nicholas Zakas (Indianapolis: Wiley, 2012) and JavaScript 24-Hour Trainer by Jeremy McPeak (Indianapolis: Wiley, 2010). When he is not coding or collecting errata, Peloquin is often found engaged in mathematics, philosophy, or juggling.
Mary Beth Wakefield
FREELANCER EDITORIAL MANAGER
A LOT OF PEOPLE HAVE BEEN INVOLVED in making this book happen. I’d like to thank everyone at Wiley for their hard work, especially Carol Long for having faith in my original idea and helping me to develop it, and Kevin Shafer, my Project Editor, who patiently helped turn my manuscript into a well-rounded book. Special thanks are also due to John Peloquin, whose technical review proved invaluable.
I’d also like to take the opportunity to thank my friends and family for being so supportive over the past few months.
THE PAST DECADE has seen an increased interest in website performance, with businesses of all sizes realizing that even modest changes in page loading times can have a significant effect on their profits. The move toward a faster web has been driven largely by Yahoo! and Google, which have both carried out extensive research on the subject of website performance, and have worked hard to make webmasters aware of the benefits.
This book provides valuable information that you must know about website performance optimization, from database replication and web server load balancing to JavaScript profiling and the latest features of Cascading Style Sheets 3 (CSS3). You can discover (perhaps surprising) ways in which your website is under-performing, and learn how to scale out your system as the popularity of your site increases.
WHY SPEED IS IMPORTANT
At first glance, it may seem as if website loading speeds aren’t terribly important. Of course, it puts off users if they must wait 30 seconds for your page to load. But if loading times are relatively low, isn’t that enough? Does shaving off a couple of seconds from loading times actually make that much of a difference? Numerous pieces of research have been carried out on this subject, and the results are quite surprising.
In 2006, Google experimented with reducing the size of its Maps homepage (from 100 KB to 70–80 KB). Within a week, traffic had increased by 10 percent, according to ZDNet (http://www.zdnet.com/blog/btl/googles-marissa-mayer-speed-wins/3925?p=3925). Google also found that a half-second increase in loading times for search results had led to a 20 percent drop in sales. That same year, Amazon.com came to similar conclusions, after experiments showed that for each 100-millisecond increase in loading time, sales dropped by 1 percent (http://ai.stanford.edu/~ronnyk/IEEEComputer2007OnlineExperiments.pdf).
The fact that there is a correlation between speed and sales perhaps isn’t too surprising, but the extent to which even a tiny difference in loading times can have such a noticeable impact on sales certainly is.
But that’s not the only worry. Not only do slow websites lose traffic and sales, but work at Stanford University suggests that slow websites are also considered less credible (http://captology.stanford.edu/pdf/p61-fogg.pdf). It seems that, as Internet connections have become faster, the willingness of users to wait has started to wane. If you want your site to be busy and well liked, it pays to be fast.
If all this weren’t enough, there’s now yet another reason to ensure that your site runs quickly. In 2010, Google announced that loading times would play a role in how it ranked sites; that is, faster sites will rank higher (http://googlewebmastercentral.blogspot.com/2010/04/using-site-speed-in-web-search-ranking.html). However, loading times carry a relatively low weight at the moment, and other factors (relevance, backlinks, and so on) are still much more important.
Hopefully, you are now convinced of the need for speed. So, let’s take a look at some of the reasons why sites are slow.
Why Sites Are Slow
The most common reason why websites run slowly is that they simply weren’t designed with speed in mind. Typically, the first step in the creation of a site is for a graphic designer to create templates based on the ideas of the site owner (who is often not technically minded). The graphic designer’s main goal is an attractive-looking interface regardless of size, and the nontechnical site owner generally wants lots of bells and whistles, again without appreciating the performance impact.
The next step is for a programmer to make things work behind the scenes, which typically involves a server-side scripting language (such as PHP or Perl) and a back-end database. Sadly, performance is often low on the programmer’s agenda, too, especially when his or her boss wants to see visible results fast. It simply isn’t worth the programmer’s time to compress the bloated graphics created by the designer, or to convert them to sprites.
Another often overlooked fact is that much of the development and testing of a new website will probably be carried out on a development server under low load. A database query that takes a couple of seconds to run may not be a problem when the site has only a couple of users. But when the site goes live, that same query could well slow down the site to a crawl. Tools such as Apache Benchmark can simulate heavy traffic, so that problems like this show up before launch.
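To make the idea of simulated load concrete, here is a minimal Python sketch of a load generator. It is not a substitute for Apache Benchmark; it only illustrates the principle of issuing many requests concurrently and timing them. The function names, the default numbers, and the http://localhost:8080/ URL are all assumptions chosen for illustration.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def run_load(task, total_requests=100, concurrency=10):
    """Call task() total_requests times, using `concurrency` worker threads."""

    def timed_call(_):
        started = time.monotonic()
        task()
        return time.monotonic() - started

    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    wall_seconds = time.monotonic() - wall_start

    return {
        "requests": total_requests,
        "wall_seconds": wall_seconds,
        "mean_seconds": sum(latencies) / len(latencies),
        "slowest_seconds": latencies[-1],
    }


def fetch_homepage():
    """Hypothetical task: fetch one page from a development server."""
    # The URL is an assumption; point it at your own test server.
    with urllib.request.urlopen("http://localhost:8080/") as response:
        response.read()
```

Calling run_load(fetch_homepage, total_requests=500, concurrency=50) against a development server gives a rough picture of how response times change as concurrency rises. Because the task is passed in as a function, the same harness can also exercise a single slow database call in isolation.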
There is also the issue of caching. Those involved in the creation and development of a site typically already have primed caches. (That is, images and external JavaScript/CSS used by the site will already be cached in their browsers.) This causes the site to load much faster than it would for first-time visitors.
Other factors affecting the speed of a website are connection speed and computer “power.” Developers typically have powerful computers and a good Internet connection, and it’s easy to forget that plenty of people (especially in rural locations) still use dial-up modems and computers that are 10 years old. Care must be taken to ensure that such users are accommodated.
The Compromise between Functionality and Speed
The creation of a website is often a battle between the designers who want looks and functionality, and the programmers who want performance. (Sadly, “battle” tends to be a more apt description than “collaboration.”) Inevitably, some compromises must be made. Both sides tend to be guilty of tunnel vision here, but it’s worth trying to develop a rounded view of the situation. Although speed is important, it’s not the “be all and end all.” In your quest for more and more savings, be wary of stripping down your website too much.
Scaling Up versus Scaling Out
There are two basic approaches to scaling your website:
➤ Scaling up (sometimes referred to as scaling vertically) means keeping the same number of servers but upgrading the server hardware. For example, you may run your whole setup from a single server. As your site gets more traffic, you discover that the server is beginning to struggle, so you throw in another stick of RAM or upgrade the CPU, which is scaling up.
➤ With scaling out (also referred to as scaling horizontally), you increase the number of machines in your setup. For example, in the previous scenario, you could place your database on its own server, or use a load balancer to split web traffic across two web servers.
So, which method is best? You’ll hear a lot of criticism of vertical scaling, but in reality, it is a viable solution for many. The majority of websites do not achieve overnight success. Rather, the user base steadily increases over the years. For these sites, vertical scaling is perfectly fine. Advances in hardware mean that each time you want to upgrade, a machine with more CPU cores, or more memory, or faster disks will be available.
Scaling up isn’t without its problems, though. You pay a premium for top-of-the-range hardware. The latest monster server will usually cost more than two mid-range servers with the same overall power. Also, additional CPU cores and RAM don’t tend to result in a linear increase in performance. For example, no matter how much RAM you have, access to it is still along a fixed-width bus, which can transfer only at a finite rate. Additional CPU cores aren’t a great benefit if your bottleneck is with a single-threaded application. So, scaling up offers diminishing returns, and it also fails to cope when your site goes stratospheric. For that, you need a topology where you can easily add additional mid-range servers to cope with demand.
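One common way to put numbers on the diminishing returns from extra CPU cores is Amdahl's law, which is brought in here purely as an illustration and is not named in the text above. The sketch assumes that a fixed fraction of the work can be spread across cores and that the rest is strictly serial; real workloads are rarely this tidy, so treat the results as a rough upper bound.

```python
def speedup(parallel_fraction, cores):
    """Upper bound on speedup from `cores` CPUs under Amdahl's law.

    parallel_fraction is the share of the work (0.0 to 1.0) that can be
    spread across cores; the remainder is assumed to be strictly serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # A hypothetical workload that is only 50 percent parallelizable.
    for cores in (1, 2, 4, 8, 16, 64):
        print(f"{cores:>3} cores -> at most {speedup(0.5, cores):.2f}x faster")
```

With half of the work stuck in a single thread, the bound never reaches 2x however many cores are added, which is the situation described for a single-threaded bottleneck.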
Scaling out is trickier, because it involves more planning. If you have a pool of web servers, you must think about how sessions are handled, user uploads, and so on. If you split your database over several machines, you must worry about keeping data in sync. Horizontal scaling is the best long-term solution, but it requires more thought as to how to make your setup scalable.
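As one illustration of the session problem, the following Python sketch pins each visitor to a single web server by hashing the session ID. This is only one possible approach, and it is an assumption made for illustration rather than a recommendation; the server names are hypothetical.

```python
import hashlib


def pick_server(session_id, servers):
    """Map a session ID to one server in the pool, the same one every time."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # around the size of the pool.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    pool = ["web1.example.com", "web2.example.com"]  # hypothetical hosts
    for session in ("alice-session", "bob-session", "carol-session"):
        print(session, "->", pick_server(session, pool))
```

The weakness of such a simple scheme is that changing the size of the pool remaps most sessions to a different server, which is one reason session data is often moved to shared storage instead.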
Finally, be wary of taking the idea of horizontal scaling to extremes. Some people take the idea too far, setting up clusters of Pentium I machines because “that’s how Google does it.” Actually, Google doesn’t do this. Although Google scales out to a high degree, it still uses decent hardware on each node.
Scaling out isn’t without its drawbacks either. Each additional node means extra hardware to monitor and replace, and time spent installing and deploying code. The most satisfactory arrangement tends to be through a combination of scaling up and scaling out.
The Dangers of Premature Optimization
There’s a famous quote by Donald Knuth, author of the legendary The Art of Computer Programming (Reading, MA: Addison-Wesley Professional, 2011). “Premature optimization is the root of all evil,” he said, and this is often re-quoted in online discussions as a means of dismissing another user’s attempts at more marginal optimizations. For example, if one developer is contemplating writing his or her PHP script as a PHP extension in C, the Knuth quote will invariably be used to dispute that idea.
So, what exactly is wrong with premature optimization? The first danger is that it adds complexity to your code, and makes it more difficult to maintain and debug. For example, imagine that you decided to rewrite some of your C code in assembly for optimal performance. It’s easy to fall into the trap of not seeing the forest for the trees: you become so focused on the performance of one small aspect of the system that you lose perspective on overall performance. You may be wasting valuable time on relatively unimportant areas when there may be much bigger and easier gains to be made elsewhere.
So, it’s generally best to consider optimization only after you already have a good overview of how the whole infrastructure (hardware, operating system, databases, web servers, and so on) will fit together. At that point, you will be in a better position to judge where the greatest gains can be made.
That’s not to say you should ignore efficiency when writing your code. The Knuth quote is often misused because it can be difficult to say what constitutes premature optimization, and what is simply good programming practice. For example, if your application will be reading a lot of information from the database, you may decide that you will write some basic caching to wrap around these calls, to cut down on load on the database.
Does this count as premature optimization? It’s certainly premature in the sense that you don’t even know if these database calls will be a significant bottleneck, and it is adding an extra degree of complexity to your code. But could it not also be classed as simply planning with scalability in mind? Building in this caching from the outset will be quicker (and probably better integrated) than hacking it in at a later date.
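To show how small such a caching wrapper can be, here is a minimal Python sketch. The QueryCache class, the run_query callable, and the 60-second lifetime are all hypothetical names and values chosen for illustration; a real application would also need to decide when cached rows must be invalidated.

```python
import time


class QueryCache:
    """Wrap a query function so that repeated queries skip the database."""

    def __init__(self, run_query, ttl_seconds=60.0, clock=time.monotonic):
        self._run_query = run_query  # hypothetical: takes SQL, returns rows
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # maps SQL text to (expiry time, rows)

    def fetch(self, sql):
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database call needed
        rows = self._run_query(sql)
        self._entries[sql] = (now + self._ttl, rows)
        return rows
```

Because the rest of the code calls fetch() rather than the database layer directly, the cache can later be swapped for something more capable without touching the callers, which is the "better integrated" benefit of building it in early.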
If you’re tempted to optimize prematurely, stop and consider these two points:
➤ Will there definitely be a benefit, and will it be a significant one?
➤ Will it make the code significantly more difficult to maintain or debug?
If the answers are “yes” and “no,” respectively, you should optimize.
Time Is Money
Optimizing is a satisfying experience, so much so that you may find yourself attempting optimization for the sake of it, rather than because it is needed. That’s not necessarily a bad thing. Research has shown that even tiny increases in page loading times can have an impact on revenue and user experience, so optimization doesn’t have to be a case of passively responding to complaints about speed. But time is also money, and sometimes simply throwing extra hardware at the problem is the best solution. Is spending the best part of a week trying to perform further optimizations the right move, or would spending $100 on a RAM upgrade be just as effective? The latter option seems like a cop-out but is probably the most cost-effective route.
TOOLS OF THE TRADE
The bottlenecks in an application don’t always occur where you might expect them to, and an important precursor to optimization is to spend time watching how the application runs.
Waterfall Views
Waterfall views are extremely useful when looking at the front end of a website. These are graphs showing the order in which the browser is requesting resources, and the time that it takes each resource to download. Most waterfall tools also show things like the time spent for domain name service (DNS) lookups, for establishing a TCP connection to the web server, for parsing and rendering data, and so on.
There are a lot of waterfall tools out there. Some run in your browser; others are websites into which you enter the URL that you want to check. But many have subtle flaws. For example, one popular online tool will request any resources contained in commented-out Hypertext Markup Language (HTML).
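The commented-out HTML flaw is easy to see in a small Python sketch. The markup in SAMPLE_PAGE is a hypothetical illustration, not the example from any particular tool. A real HTML parser reports comments separately from tags, so resources inside a comment are never collected, whereas a tool that scans the raw text for src attributes would wrongly request them.

```python
from html.parser import HTMLParser

# Hypothetical page: the second image is commented out and should not
# be requested.
SAMPLE_PAGE = (
    '<img src="/logo.png">'
    '<!-- <img src="/old-banner.png"> -->'
    '<script src="/app.js"></script>'
)


class ResourceCollector(HTMLParser):
    """Collect resource URLs from tags, skipping anything inside comments."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.resources.append(attributes["href"])

    # handle_comment() is deliberately left at its default, which ignores
    # the comment body, so markup inside comments is never treated as tags.


def find_resources(html):
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources


if __name__ == "__main__":
    print(find_resources(SAMPLE_PAGE))
```

This is a simplification: for brevity it treats every link element as a resource, where a real tool would look at the rel attribute as well.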
WebPageTest.org
Probably the best online waterfall tool is WebPageTest.org (commonly known as WPT), developed by Google, AOL, and others. It offers dozens of locations around the world from which to perform tests and has an impressive list of browsers to test in, from Internet Explorer 6 through to 10, to iPhone, Firefox, and Chrome. Figure I-1 shows WPT in action.
Figure I-1 shows the results page for http://www.google.com. The six images at the top right indicate how the site scored in what WPT determined to be the six key areas. Remember that this is just a summary for quick reference and should not be taken as an absolute. For instance, in the test, google.com scored an “F” for “Cache static content,” yet it is still well optimized. Clicking any of these scores will give a breakdown of how the grade was determined.
The way in which a page loads can vary dramatically, depending on whether the user’s cache is primed (that is, if the user has previously visited the site). Some static resources (such as CSS, JavaScript, images, and so on) may already be in the browser cache, significantly speeding things up.
So, the default is for WPT to perform a First View test (that is, as the browser would see the target site if it had an unprimed cache), and a Repeat View test (that is, emulating the effect of visiting the site with an already primed cache). A preview image is shown for both these tests, and clicking one brings up the full waterfall graphic, as shown in Figure I-2.
FIGURE I-1
FIGURE I-2
The horizontal bar shows time elapsed (with resources listed vertically, in the order in which they were requested). So, the browser first fetched the index page (/), then chrome-48.png, then logo3w.png, and so on. Figure I-3 shows the first half second in more detail.
FIGURE I-3
The section at the beginning of the first request indicates a DNS lookup: the browser must resolve www.google.com to an IP address. This took approximately 50 milliseconds. The next section indicates the time taken to establish a connection to the web server. This includes setting up the TCP connection (if you’re unfamiliar with the three-way handshake, see Appendix A, “TCP Performance”), and possibly waiting for the web server to spawn a new worker process to handle the request. In this example, that took approximately 70 milliseconds.
The next section shows the time to first byte (TTFB). At the beginning of this section, the client has issued the request and is waiting for the server to respond. There’ll always be a slight pause here (approximately 120 milliseconds in this example), even for static files. However, high delays often indicate an overloaded server, perhaps with high levels of disk contention, or back-end scripts that are taking a long time to generate the page.
Finally, the server returns a response to the client, which is shown by the final section of the bar. The size of this section is dependent on the size of the resource being returned and the available bandwidth. The number following the bar is the total time for the resource, from start to finish.
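The four sections of each bar can be reproduced with a few timestamps. The following Python sketch is an illustration only, not how WPT measures anything: phase_durations() splits a request into the DNS, connection, TTFB, and download phases described above, and measure() is a hypothetical helper that collects those timestamps for a plain HTTP request using the standard socket module.

```python
import socket
import time


def phase_durations(start, dns_done, connected, first_byte, done):
    """Split one request into the phases shown in a waterfall bar."""
    return {
        "dns": dns_done - start,
        "connect": connected - dns_done,
        "ttfb": first_byte - connected,
        "download": done - first_byte,
    }


def measure(host, port=80, path="/"):
    """Time a single plain-HTTP GET request (no TLS, no redirects)."""
    start = time.monotonic()
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    dns_done = time.monotonic()

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(10)
        sock.connect(address)
        connected = time.monotonic()

        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # returns as soon as the first byte arrives
        first_byte = time.monotonic()

        while sock.recv(65536):  # read until the server closes the connection
            pass
        done = time.monotonic()

    return phase_durations(start, dns_done, connected, first_byte, done)
```

Feeding in the approximate figures from the example (DNS finished at 50 milliseconds, connection at 120, first byte at 240) gives phases of 50, 70, and 120 milliseconds. Times measured from your own machine will differ from those of a remote test location, and the operating system may answer the DNS lookup from its own cache.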
After the web browser fetches the HTML document, it can begin fetching resources linked to in it. Note that in request 2, there is no DNS lookup because the browser already has the response cached. For request 5, the resource resides on a different hostname, ssl.gstatic.com, so this does incur a DNS lookup.
Also notice two vertical lines at approximately the 40-millisecond and 55-millisecond marks. The first line indicates the point at which the browser began to render the page. The second line indicates the point at which the onLoad event fired (that is, the point at which the page had finished loading).
You’ll learn more about these waterfall views later in this book: you’ll learn how to optimize the downloading order, why some of the requests have a connection overhead and others don’t, and why there are sometimes gaps where nothing seems to be happening.
Firebug
The downside to WPT is that it shows how the page loads on a remote machine, not your own. Usually, this isn’t a problem, but occasionally you want to test a URL inside a members-only area, or see the page as it would look for someone in your country (or on your ISP). WPT does actually support some basic scripting, allowing it to log in to htpasswd-protected areas, but this isn’t any help if you want to log in to something more complicated.
Firebug is a useful Firefox extension that (among other things) can show a waterfall view as a page loads in your browser. This is perhaps a more accurate portrayal of real-world performance if you’re running on a modestly powered PC with home broadband because the WPT tests are presumably conducted from quite powerful and well-connected hardware.
The output of Firebug is similar to that of WPT, complete with the two vertical lines representing the start and end of rendering. Each resource can be clicked to expand a list of the headers sent and received with the request.
System Monitoring
This book is intended to be platform-neutral. Whether you run Berkeley Software Distribution (BSD), Linux, Solaris, Windows, OS X, or some other operating system, the advice given in this book should still be applicable.
Nevertheless, system performance-monitoring tools are inevitably quite platform-specific. Some tools such as netstat are implemented across most operating systems, but the likes of vmstat and iostat exist only in the UNIX world, and Windows users must use other tools. Let’s briefly look at the most common choices to see how they work.
vmstat
vmstat is an essential tool on most flavors of UNIX and its derivatives (Linux, OS X, and so on). It provides information on memory usage, disk activity, and CPU utilization. With no arguments, vmstat simply displays a single-line summary of system activity. However, a numeric value is usually specified on the command line, causing vmstat to output data every x seconds. Here’s vmstat in action with an interval of 5 seconds:
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
The first columns are as follows:
➤ r: This is the number of runnable processes (that is, those running or waiting for CPU time).
➤ b: This is the number of blocking processes.
Blocking processes are those that cannot yet run because they are waiting on the hardware (most often the disks). Naturally, this is the least-desirable state for a process to be in, and a high number of blocking processes generally indicates a bottleneck somewhere (again, usually the disks). If the number of runnable processes exceeds the number of CPU cores on the system, this can also cause some degrading of performance, but blocking is the real killer.
The next four columns are similar to the information given by the free command, as shown here:
➤ swpd: This is how much swap memory is in use (on Linux, reported in kilobytes by default).
➤ free: This is idle memory.
➤ buff: This is memory used for buffers.
➤ cache: This is memory used for caching.
If you’re coming to UNIX from the world of Windows, it’s worth taking some time to ensure that you are absolutely clear on what these figures mean. In UNIX, things aren’t as clear-cut as “free” and “used” memory.
The next two columns show swap usage:
➤ si: This is the amount of memory read in from swap per second.
➤ so: This is the amount of memory written out to swap per second.
Swapping is usually a bad thing, no matter what operating system you use. It indicates insufficient physical memory. If swapping occurs, expect to see high numbers of blocking processes as the CPUs wait on the disks.
Following are the next two columns:
➤ bi: This is the number of blocks read from block devices per second.
➤ bo: This is the number of blocks written to block devices per second.
Invariably, block devices means hard disks, so these two columns show how much data is being read from and written to disk. With disks so often being a bottleneck, it’s worth studying these columns with the goal of trying to reduce disk activity. Often, you’ll be surprised just how much writing is going on.
NOTE For a breakdown of which disks and partitions the activity occurs on, see the iostat command.
Now, consider the next two columns:
➤ in: This is the number of interrupts per second.
➤ cs: This is the number of context switches per second.
At the risk of digressing too much into CPU architecture, a context switch occurs when the CPU either switches from one process to another, or handles an interrupt. Context switching is an essential part of multitasking operating systems but also incurs some slight overhead. If your system performs a huge number of context switches, this can degrade performance.
The final four columns show CPU usage, measured as a percentage of the CPU time:
➤ us: This is the time spent running userland code.
➤ sy: This is the system time (that is, time spent running kernel code).
➤ id: This shows the idle time. (That is, the CPU is doing nothing.)
➤ wa: This shows the time that the CPU is waiting on I/O.
id (idle) is naturally the most preferable state to be in, whereas wa (waiting) is the least. wa indicates that the CPU has things to do but can’t because it’s waiting on other hardware. Usually, this is the disks, so check for high values in the io and swap columns.
Whether the CPU will mostly be running user code or kernel code depends on the nature of the applications running on the machine. Many of the applications discussed in this book spend a lot of time sending and receiving data over the network, and this is usually implemented at the kernel level.
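If you want to watch for these warning signs automatically, the columns are easy to parse. The following Python sketch assumes the 16-column Linux layout described above; the function names and the 20 percent I/O wait threshold are arbitrary choices for illustration, and any figures you pass in are your own, not output reproduced from this book.

```python
# Column names in the order vmstat prints them on Linux.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat_line(line):
    """Turn one data line of vmstat output into a dict, or return None."""
    values = line.split()
    if len(values) < len(COLUMNS) or not values[0].isdigit():
        return None  # a header line, a blank line, or something unexpected
    # Some versions print extra trailing columns (such as st); ignore them.
    return dict(zip(COLUMNS, (int(v) for v in values[: len(COLUMNS)])))


def find_pressure(sample, cpu_cores, wa_threshold=20):
    """List the warning signs discussed in the text for one sample."""
    found = []
    if sample["b"] > 0:
        found.append("blocked processes")
    if sample["r"] > cpu_cores:
        found.append("more runnable processes than CPU cores")
    if sample["si"] > 0 or sample["so"] > 0:
        found.append("swapping")
    if sample["wa"] > wa_threshold:
        found.append("high I/O wait")
    return found
```

In practice, you could pipe the output of vmstat 5 into a script that calls parse_vmstat_line() on each line and logs whatever find_pressure() returns.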
The previous vmstat example was taken from a web server at a fairly quiet time of the day. Let’s look at another example, taken from the same server, while the nightly backup process was running:
# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
Although the machine is far from being overloaded, performance is not ideal. You see regular blocking processes, disk activity is higher, and the CPUs (this machine had six cores) are spending more of their time in the waiting (wa) state.
Depending on your operating system, there may be other data available from vmstat. For example, the Linux version can give a more detailed breakdown of disk activity (with the -d switch) and can show statistics on forking (with the -f switch). Check the man pages to see exactly what your system supports.
WHO THIS BOOK IS FOR
The information in this book is designed to appeal to a wide range of readers, from system administrators charged with managing busy websites, to web developers looking to write efficient, high-performance code.
This book makes no assumptions about your underlying operating system, and the information is (in most cases) equally applicable whether you run OS X, Linux, Windows, FreeBSD, or another flavor of UNIX. Situations are highlighted in which some of the information depends on the operating system used.
WHAT THIS BOOK COVERS
A wide range of technologies are in use on the web, and it would be futile to attempt to cover them all (or at least cover them in sufficient detail). Rather, the discussions in this book concentrate on the most popular open source technologies: PHP, MySQL, Apache, Nginx, memcache, and MongoDB.
In this book, you’ll discover many of the advanced features of these technologies, and the ways in which they can be utilized to provide scalable, high-performance websites. You’ll learn current performance best practices, tips for improving your existing sites, and how to design with scalability in mind.
The browser market is wide and varied. The discussions in this book focus on the five main web browsers (which together make up the vast majority of web users): Internet Explorer, Chrome, Firefox, Opera, and Safari. Behavior can vary in subtle (but important) ways between versions, and, in most cases, when particular aspects of browser behavior are examined, the discussion includes versions from the past 5 years or so. It’s unfortunate (but inevitable) that a sizeable number of users will not be running the most current version.
HOW THIS BOOK IS STRUCTURED
The book is divided into two parts, covering aspects of website performance related to the front end
(Part I) and the back end (Part II).
In the first part, you’ll meet topics such as the HTTP protocol, how web browsers work, browser
caching, content compression, minification, JavaScript, CSS, and web graphics — all essential topics
for web developers. Following are the chapters included in this part of the book:
➤ Chapter 1, “A Refresher on Web Browsers” — This chapter provides a look under the hood
at how the web works. In this chapter, you will meet the HTTP protocol, and features such
as caching, persistent connections, and Keep-Alive.
➤ Chapter 2, “Utilizing Client-Side Caching” — This chapter examines the ways in which
web browsers cache content, and what you can do to control it.
➤ Chapter 3, “Content Compression” — Here you find everything you need to know about
compressing content to speed up page loading times.
➤ Chapter 4, “Keeping the Size Down with Minification” — In this chapter, you discover the
art of minifying HTML, CSS, and JavaScript to further reduce payload sizes.
➤ Chapter 5, “Optimizing Web Graphics and CSS” — Here you learn how to optimize the
most common image formats, and discover ways in which CSS can be used to create lean,
efficient markup.
➤ Chapter 6, “JavaScript, the Document Object Model, and Ajax” — JavaScript is an
increasingly important part of the web. In this chapter, you learn about performance aspects of the
language, with an emphasis on interaction with the document object model (DOM).
The second part of the book focuses on the technologies behind the scenes — databases, web servers, server-side scripting, and so on. Although many of these issues are of more interest to back-end developers and system administrators, front-end developers also need to understand them to appreciate the underlying system. Following are the chapters included in this part of the book:
➤ Chapter 7, “Working with Web Servers” — This chapter provides everything you need to
know about tuning Apache and Nginx. The second half of the chapter looks at load balancing and related issues that arise (for example, session affinity).
➤ Chapter 8, “Tuning MySQL” — In this first of two chapters devoted to MySQL, you meet
the myriad of tuning options and discover the differences between MyISAM and InnoDB.
➤ Chapter 9, “MySQL in the Network” — Here you learn how to scale out MySQL using
such techniques as replication, sharding, and partitioning.
➤ Chapter 10, “Utilizing NoSQL Solutions” — NoSQL is a collective term for lightweight
database alternatives. In this chapter, you learn about two of the most important players:
memcache and mongodb.
➤ Chapter 11, “Working with Secure Sockets Layer (SSL)” — SSL can be a performance
killer, but there are a surprising number of things that you can do to improve the situation.
➤ Chapter 12, “Optimizing PHP” — Perhaps the most popular back-end scripting language,
PHP can have a significant impact on performance. In this chapter, you learn about opcode caching, and discover how to write lean, efficient PHP.
This book also includes three appendixes that provide additional information:
➤ Appendix A, “TCP Performance” — Transmission Control Protocol (TCP) and Internet
Protocol (IP) are the protocols that drive the Internet. In this appendix, you learn about some of the performance aspects of TCP, including the three-way handshake and Nagle’s algorithm.
➤ Appendix B, “Designing for Mobile Platforms” — An increasing number of users now
access the web via mobile devices such as cell phones and tablets. These bring about their own design considerations.
➤ Appendix C, “Compression” — This book makes numerous references to compression.
Here you discover the inner workings of the LZW family, the algorithm behind HTTP compression, and many image formats.
WHAT YOU NEED TO USE THIS BOOK
To get the most out of this book, you should have a basic working knowledge of web development — HTML, JavaScript, CSS, and perhaps PHP. You should also be familiar with basic system
management — editing files, installing applications, and so on.
To help you get the most from the text and keep track of what’s happening, we’ve used a number of
conventions throughout the book.
NOTE Notes indicate notes, tips, hints, tricks, and/or asides to the current discussion.
As for styles in the text:
➤ We highlight new terms and important words when we introduce them.
➤ We show keyboard strokes like this: Ctrl+A.
➤ We show filenames, URLs, and code within the text like so: persistence.properties.
➤ We present code in two different ways:
We use a monofont type with no highlighting for most code examples.
We use bold to emphasize code that is particularly important in the present
context or to show changes from a previous code snippet.
ERRATA
We make every effort to ensure that there are no errors in the text or in the code. However, no one
is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake
or faulty piece of code, we would be grateful for your feedback. By sending in errata, you may save
another reader hours of frustration, and, at the same time, you will be helping us provide even
higher-quality information.
To find the errata page for this book, go to http://www.wrox.com and locate the title using the
Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On
this page, you can view all errata that have been submitted for this book and posted by Wrox editors.
NOTE A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.
If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml
and complete the form there to send us the error you have found. We’ll check the information
and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent
editions of the book.
For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies, and to interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.
At http://p2p.wrox.com, you will find a number of different forums that will help you, not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:
1. Go to p2p.wrox.com and click the Register link.
2. Read the terms of use and click Agree.
3. Complete the required information to join, as well as any optional information you want to provide, and click Submit.
4. You will receive an e-mail with information describing how to verify your account and complete the joining process.
NOTE You can read messages in the forums without joining P2P, but to post your own messages, you must join.
After you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.
For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.