Professional Website Performance: Optimizing the Front-End and Back-End

INTRODUCTION

PART I FRONT END

CHAPTER 1 A Refresher on Web Browsers

CHAPTER 2 Utilizing Client-Side Caching

CHAPTER 3 Content Compression

CHAPTER 4 Keeping the Size Down with Minification

CHAPTER 5 Optimizing Web Graphics and CSS

CHAPTER 6 JavaScript, the Document Object Model, and Ajax

PART II BACK END

CHAPTER 7 Working with Web Servers

CHAPTER 8 Tuning MySQL

CHAPTER 9 MySQL in the Network

CHAPTER 10 Utilizing NoSQL Solutions

CHAPTER 11 Working with Secure Sockets Layer (SSL)

CHAPTER 12 Optimizing PHP

PART III APPENDIXES

APPENDIX A TCP Performance

APPENDIX B Designing for Mobile Platforms

APPENDIX C Compression

INDEX

OPTIMIZING THE FRONT END AND THE BACK END

Peter Smith

John Wiley & Sons, Inc.

Indianapolis, IN 46256

www.wiley.com

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2012949514

Trademarks: Wiley, the Wiley logo, Wrox, the Wrox logo, Programmer to Programmer, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.

PETER G. SMITH has been a full-time Linux consultant, web developer, and system administrator, with a particular interest in performance, for the past 13 years. Over the years, he has helped a wide range of clients in areas such as front-end performance, load balancing and scalability, and database optimization. Past open source projects include modules for Apache and OSCommerce, a cross-platform IRC client, and contributions to The Linux Documentation Project (TLDP).

ABOUT THE TECHNICAL EDITOR

JOHN PELOQUIN is a software engineer with back-end and front-end experience ranging across web applications of all sizes. Peloquin earned his B.A. in Mathematics from the University of California at Berkeley, and is currently a lead engineer for a healthcare technology startup, where he makes heavy use of MySQL, PHP, and JavaScript. He has edited Professional JavaScript for Web Developers, 3rd Edition by Nicholas Zakas (Indianapolis: Wiley, 2012) and JavaScript 24-Hour Trainer by Jeremy McPeak (Indianapolis: Wiley, 2010). When he is not coding or collecting errata, Peloquin is often found engaged in mathematics, philosophy, or juggling.

Mary Beth Wakefield

FREELANCER EDITORIAL MANAGER

A LOT OF PEOPLE HAVE BEEN INVOLVED in making this book happen. I’d like to thank everyone at Wiley for their hard work, especially Carol Long for having faith in my original idea and helping me to develop it, and Kevin Shafer, my Project Editor, who patiently helped turn my manuscript into a well-rounded book. Special thanks are also due to John Peloquin, whose technical review proved invaluable.

I’d also like to take the opportunity to thank my friends and family for being so supportive over the past few months.

THE PAST DECADE has seen an increased interest in website performance, with businesses of all sizes realizing that even modest changes in page loading times can have a significant effect on their profits. The move toward a faster web has been driven largely by Yahoo! and Google, which have both carried out extensive research on the subject of website performance, and have worked hard to make webmasters aware of the benefits.

This book provides valuable information that you must know about website performance optimization, from database replication and web server load balancing, to JavaScript profiling and the latest features of Cascading Style Sheets 3 (CSS3). You can discover (perhaps surprising) ways in which your website is under-performing, and learn how to scale out your system as the popularity of your site increases.

WHY SPEED IS IMPORTANT

At first glance, it may seem as if website loading speeds aren’t terribly important. Of course, it puts off users if they must wait 30 seconds for your page to load. But if loading times are relatively low, isn’t that enough? Does shaving off a couple of seconds from loading times actually make that much of a difference? Numerous pieces of research have been carried out on this subject, and the results are quite surprising.

In 2006, Google experimented with reducing the size of its Maps homepage (from 100 KB to 70–80 KB). Within a week, traffic had increased by 10 percent, according to ZDNet (http://www.zdnet.com/blog/btl/googles-marissa-mayer-speed-wins/3925?p=3925). Google also found that a half-second increase in loading times for search results had led to a 20 percent drop in sales. That same year, Amazon.com came to similar conclusions, after experiments showed that for each 100-millisecond increase in loading time, sales dropped by 1 percent (http://ai.stanford.edu/~ronnyk/IEEEComputer2007OnlineExperiments.pdf).

The fact that there is a correlation between speed and sales perhaps isn’t too surprising, but the extent to which even a tiny difference in loading times can have such a noticeable impact on sales certainly is.

But that’s not the only worry. Not only do slow websites lose traffic and sales, but work at Stanford University suggests that slow websites are also considered less credible (http://captology.stanford.edu/pdf/p61-fogg.pdf). It seems that, as Internet connections have become faster, the willingness of users to wait has started to wane. If you want your site to be busy and well liked, it pays to be fast.

If all this weren’t enough, there’s now yet another reason to ensure that your site runs quickly. In 2010, Google announced that loading times would play a role in how it ranked sites; that is, faster sites will rank higher (http://googlewebmastercentral.blogspot.com/2010/04/using-site-speed-in-web-search-ranking.html). However, loading times carry a relatively low weight at the moment, and other factors (relevance, backlinks, and so on) are still much more important.

Hopefully you are now convinced of the need for speed. So, let’s take a look at some of the reasons why sites are slow.

Why Sites Are Slow

The most common reason why websites run slowly is that they simply weren’t designed with speed in mind. Typically, the first step in the creation of a site is for a graphic designer to create templates based on the ideas of the site owner (who is often not technically minded). The graphic designer’s main goal is an attractive-looking interface regardless of size, and the nontechnical site owner generally wants lots of bells and whistles, again without appreciating the performance impact.

The next step is for a programmer to make things work behind the scenes, which typically involves a server-side scripting language (such as PHP or Perl) and a back-end database. Sadly, performance is often low on the programmer’s agenda, too, especially when his or her boss wants to see visible results fast. It simply isn’t worth the programmer’s time to compress the bloated graphics created by the designer, or to convert them to sprites.

Another often overlooked fact is that much of the development and testing of a new website will probably be carried out on a development server under low load. A database query that takes a couple of seconds to run may not be a problem when the site has only a couple of users. But when the site goes live, that same query could well slow down the site to a crawl. Tools such as Apache Benchmark can simulate heavy traffic.
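The idea behind load simulation can be sketched in a few lines of Python. This is not Apache Benchmark and makes no claim about how that tool works internally; it is a minimal illustration that fires many calls concurrently and summarizes how long they took. The names run_load, timed_call, and request_fn are hypothetical, and request_fn stands in for whatever operation you want to exercise (an HTTP request, a database query, and so on).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def timed_call(request_fn):
    """Run request_fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    request_fn()
    return time.perf_counter() - start


def run_load(request_fn, total_requests=100, concurrency=10):
    """Call request_fn total_requests times using `concurrency` worker threads."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        timings = list(
            pool.map(lambda _: timed_call(request_fn), range(total_requests))
        )
    timings.sort()
    return {
        "requests": len(timings),
        "median_s": median(timings),
        # Simple nearest-rank style pick for the 95th percentile.
        "p95_s": timings[int(0.95 * (len(timings) - 1))],
        "max_s": timings[-1],
    }


if __name__ == "__main__":
    def fake_query():
        # Hypothetical stand-in for a slow database query.
        time.sleep(0.01)

    print(run_load(fake_query, total_requests=50, concurrency=5))
```

Comparing the summary at a concurrency of 1 with the summary at a much higher concurrency gives a rough feel for how an operation that looks harmless on a quiet development server behaves once many users hit it at the same time.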

There is also the issue of caching. Those involved in the creation and development of a site typically already have primed caches. (That is, images and external JavaScript/CSS used by the site will already be cached in their browsers.) This causes the site to load much faster than it would for first-time visitors.

Other factors affecting the speed of a website are connection speed and computer “power.” Developers typically have powerful computers and a good Internet connection, and it’s easy to forget that plenty of people (especially in rural locations) still use dial-up modems and computers that are 10 years old. Care must be taken to ensure that such users are accommodated.

The Compromise between Functionality and Speed

The creation of a website is often a battle between the designers who want looks and functionality, and the programmers who want performance. (Sadly, “battle” tends to be a more apt description than “collaboration.”) Inevitably, some compromises must be made. Both sides tend to be guilty of tunnel vision here, but it’s worth trying to develop a rounded view of the situation. Although speed is important, it’s not the “be all and end all.” In your quest for more and more savings, be wary of stripping down your website too much.

Scaling Up versus Scaling Out

There are two basic approaches to scaling your website:

➤ Scaling up (sometimes referred to as scaling vertically) means keeping the same number of servers but upgrading the server hardware. For example, you may run your whole setup from a single server. As your site gets more traffic, you discover that the server is beginning to struggle, so you throw in another stick of RAM or upgrade the CPU, which is scaling up.

➤ With scaling out (also referred to as scaling horizontally), you increase the number of machines in your setup. For example, in the previous scenario, you could place your database on its own server, or use a load balancer to split web traffic across two web servers.

So, which method is best? You’ll hear a lot of criticism of vertical scaling, but in reality, it is a viable solution for many. The majority of websites do not achieve overnight success. Rather, the user base steadily increases over the years. For these sites, vertical scaling is perfectly fine. Advances in hardware mean that each time you want to upgrade, a machine with more CPU cores, or more memory, or faster disks will be available.

Scaling up isn’t without its problems, though. You pay a premium for top-of-the-range hardware. The latest monster server will usually cost more than two mid-range servers with the same overall power. Also, additional CPU cores and RAM don’t tend to result in a linear increase in performance. For example, no matter how much RAM you have, access to it is still along a fixed-width bus, which can transfer only at a finite rate. Additional CPU cores aren’t a great benefit if your bottleneck is with a single-threaded application. So, scaling up offers diminishing returns, and it also fails to cope when your site goes stratospheric. For that, you need a topology where you can easily add additional mid-range servers to cope with demand.
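The diminishing returns from extra cores can be made concrete with Amdahl's law. The text does not name it, but it models the same effect: if only a fraction p of the work can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). The sketch below is an illustration only; the parallel fractions are assumed values, not measurements from any real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can use extra cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative (assumed) parallel fractions: fully serial, half, mostly parallel.
    for p in (0.0, 0.5, 0.9):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (1, 2, 4, 8, 16)
        )
        print(f"parallel fraction {p:.1f} -> {row}")
```

With a parallel fraction of 0.0 (a purely single-threaded workload), the speedup stays at 1.00x no matter how many cores are added, which is the bottleneck case described above.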

Scaling out is trickier, because it involves more planning. If you have a pool of web servers, you must think about how sessions are handled, user uploads, and so on. If you split your database over several machines, you must worry about keeping data in sync. Horizontal scaling is the best long-term solution, but it requires more thought as to how to make your setup scalable.
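One way to approach the session question for a pool of web servers is to send each session to the same server every time. The following sketch is an illustration under assumptions, not a method this book prescribes: the server names and session ID are hypothetical, and the function pick_server_by_session is invented for the example. It uses a SHA-256 digest so that the mapping is stable across processes, which Python's built-in hash() does not guarantee for strings.

```python
import hashlib


def pick_server_by_session(session_id: str, servers: list) -> str:
    """Map a session ID to one server so repeat requests land in the same place."""
    if not servers:
        raise ValueError("servers must not be empty")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    pool = ["web1", "web2", "web3"]  # hypothetical server names
    print(pick_server_by_session("session-abc123", pool))
    print(pick_server_by_session("session-abc123", pool))  # same server again
```

A drawback of this simple modulo scheme is that adding or removing a server remaps most sessions to a different machine. Keeping session data in a shared back end removes the need for this kind of affinity altogether.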

Finally, be wary of taking the idea of horizontal scaling to extremes. Some people take the idea too far, setting up clusters of Pentium I machines because “that’s how Google does it.” Actually, Google doesn’t do this. Although Google scales out to a high degree, it still uses decent hardware on each node.

Scaling out isn’t without its drawbacks either. Each additional node means extra hardware to monitor and replace, and time spent installing and deploying code. The most satisfactory arrangement tends to be through a combination of scaling up and scaling out.

The Dangers of Premature Optimization

There’s a famous quote by Donald Knuth, author of the legendary The Art of Computer Programming (Reading, MA: Addison-Wesley Professional, 2011). “Premature optimization is the root of all evil,” he said, and this is often re-quoted in online discussions as a means of dismissing another user’s attempts at more marginal optimizations. For example, if one developer is contemplating writing his or her PHP script as a PHP extension in C, the Knuth quote will invariably be used to dispute that idea.

So, what exactly is wrong with premature optimization? The first danger is that it adds complexity to your code, and makes it more difficult to maintain and debug. For example, imagine that you decided to rewrite some of your C code in assembly for optimal performance. It’s easy to fall into the trap of not seeing the forest for the trees: you become so focused on the performance of one small aspect of the system that you lose perspective on overall performance. You may be wasting valuable time on relatively unimportant areas when there may be much bigger and easier gains to be made elsewhere.

So, it’s generally best to consider optimization only after you already have a good overview of how the whole infrastructure (hardware, operating system, databases, web servers, and so on) will fit together. At that point, you will be in a better position to judge where the greatest gains can be made.

That’s not to say you should ignore efficiency when writing your code. The Knuth quote is often misused because it can be difficult to say what constitutes premature optimization, and what is simply good programming practice. For example, if your application will be reading a lot of information from the database, you may decide that you will write some basic caching to wrap around these calls, to cut down on load on the database.

Does this count as premature optimization? It’s certainly premature in the sense that you don’t even know if these database calls will be a significant bottleneck, and it is adding an extra degree of complexity to your code. But could it not also be classed as simply planning with scalability in mind? Building in this caching from the outset will be quicker (and probably better integrated) than hacking it in at a later date.
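To give a sense of how small such a caching wrapper can be, here is a minimal sketch in Python. It is one possible shape, not the approach this book goes on to recommend: the class TTLCache, its method names, and the fetch_user_from_db function are all hypothetical, and a real deployment would also need to decide how cached entries are invalidated when the underlying data changes.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process cache: each entry expires ttl_seconds after it was loaded."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a cached entry, for example after the underlying row is updated."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    def fetch_user_from_db(user_id):
        # Hypothetical stand-in for a real database call.
        print("querying database for user", user_id)
        return {"id": user_id, "name": "example"}

    cache = TTLCache(ttl_seconds=30)
    cache.get_or_load(("user", 42), lambda: fetch_user_from_db(42))
    cache.get_or_load(("user", 42), lambda: fetch_user_from_db(42))  # from cache
```

Because every database read goes through get_or_load, the caching policy lives in one place and can be tuned or removed later without touching the calling code, which is the "better integrated" outcome described above.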

If you’re tempted to optimize prematurely, stop and consider these two points:

➤ Will there definitely be a benefit, and will it be a significant one?

➤ Will it make the code significantly more difficult to maintain or debug?

If the answers are “yes” and “no,” respectively, you should optimize.

Time Is Money

Optimizing is a satisfying experience, so much so that you may find yourself attempting optimization for the sake of it, rather than because it is needed. That’s not necessarily a bad thing. Research has shown that even tiny increases in page loading times can have an impact on revenue and user experience, so optimization doesn’t have to be a case of passively responding to complaints about speed. But time is also money, and sometimes simply throwing extra hardware at the problem is the best solution. Is spending the best part of a week trying to perform further optimizations the right move, or would spending $100 on a RAM upgrade be just as effective? The latter option seems like a cop-out but is probably the most cost-effective route.

TOOLS OF THE TRADE

The bottlenecks in an application don’t always occur where you might expect them to, and an important precursor to optimization is to spend time watching how the application runs.

Waterfall Views

Waterfall views are extremely useful when looking at the front end of a website. These are graphs showing the order in which the browser is requesting resources, and the time that it takes each resource to download. Most waterfall tools also show things like the time spent for domain name service (DNS) lookups, for establishing a TCP connection to the web server, for parsing and rendering data, and so on.

There are a lot of waterfall tools out there: some run in your browser; others are websites into which you enter the URL that you want to check. But many have subtle flaws. For example, one popular online tool will request any resources contained in commented-out Hypertext Markup Language (HTML) such as the following:

WebPageTest.org

Probably the best online waterfall tool is WebPageTest.org (commonly known as WPT), developed by Google, AOL, and others. It offers dozens of locations around the world from which to perform tests and has an impressive list of browsers to test in, from Internet Explorer 6 through to 10, to iPhone, Firefox, and Chrome. Figure I-1 shows WPT in action.

Figure I-1 shows the results page for http://www.google.com. The six images at the top right indicate how the site scored in what WPT determined to be the six key areas. Remember that this is just a summary for quick reference and should not be taken as an absolute. For instance, in the test, google.com scored an “F” for “Cache static content,” yet it is still well optimized. Clicking any of these scores will give a breakdown of how the grade was determined.

The way in which a page loads can vary dramatically, depending on whether the user’s cache is primed (that is, if the user has previously visited the site). Some static resources (such as CSS, JavaScript, images, and so on) may already be in the browser cache, significantly speeding things up.

So, the default is for WPT to perform a First View test (that is, as the browser would see the target site if it had an unprimed cache), and a Repeat View test (that is, emulating the effect of visiting the site with an already primed cache). A preview image is shown for both these tests, and clicking one brings up the full waterfall graphic, as shown in Figure I-2.

FIGURE I-1

FIGURE I-2

The horizontal bar shows time elapsed (with resources listed vertically, in the order in which they were requested). So, the browser first fetched the index page (/), then chrome-48.png, then logo3w.png, and so on. Figure I-3 shows the first half second in more detail.

FIGURE I-3

The section at the beginning of the first request indicates a DNS lookup: the browser must resolve www.google.com to an IP address. This took approximately 50 milliseconds. The next section indicates the time taken to establish a connection to the web server. This includes setting up the TCP connection (if you’re unfamiliar with the three-way handshake, see Appendix A, “TCP Performance”), and possibly waiting for the web server to spawn a new worker process to handle the request. In this example, that took approximately 70 milliseconds.

The next section shows the time to first byte (TTFB). At the beginning of this section, the client has issued the request and is waiting for the server to respond. There’ll always be a slight pause here (approximately 120 milliseconds in this example), even for static files. However, high delays often indicate an overloaded server, perhaps with high levels of disk contention, or back-end scripts that are taking a long time to generate the page.

Finally, the server returns a response to the client, which is shown by the final section of the bar. The size of this section is dependent on the size of the resource being returned and the available bandwidth. The number following the bar is the total time for the resource, from start to finish.
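The way the four sections of a waterfall bar add up to the total can be sketched as a small Python data structure. The DNS, connection, and TTFB figures below are the approximate values quoted above; the download time is not given in the text, so the 30 milliseconds used here is an assumed value purely for illustration, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """The four phases of one request in a waterfall view, in milliseconds."""
    dns_ms: float
    connect_ms: float
    ttfb_ms: float
    download_ms: float

    @property
    def total_ms(self) -> float:
        """Total time for the resource: the number shown after the bar."""
        return self.dns_ms + self.connect_ms + self.ttfb_ms + self.download_ms

    def slowest_phase(self) -> str:
        """Name of the phase that contributed the most time."""
        phases = {
            "dns": self.dns_ms,
            "connect": self.connect_ms,
            "ttfb": self.ttfb_ms,
            "download": self.download_ms,
        }
        return max(phases, key=phases.get)


if __name__ == "__main__":
    # 50, 70, and 120 ms come from the example above; 30 ms is an assumption.
    first_request = RequestTiming(dns_ms=50, connect_ms=70, ttfb_ms=120, download_ms=30)
    print(first_request.total_ms, first_request.slowest_phase())
```

Breaking a request down this way makes it easier to see which kind of fix applies: a dominant TTFB points at the server, whereas a dominant download phase points at resource size or bandwidth.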

After the web browser fetches the HTML document, it can begin fetching resources linked to in it. Note that in request 2, there is no DNS lookup, because the browser already has the response cached. For request 5, the resource resides on a different hostname, ssl.gstatic.com, so this does incur a DNS lookup.

Also notice two vertical lines at approximately the 40-millisecond and 55-millisecond marks. The first line indicates the point at which the browser began to render the page. The second line indicates the point at which the onLoad event fired (that is, the point at which the page had finished loading).

You’ll learn more about these waterfall views later in this book: you’ll learn how to optimize the downloading order, why some of the requests have a connection overhead and others don’t, and why there are sometimes gaps where nothing seems to be happening.

Firebug

The downside to WPT is that it shows how the page loads on a remote machine, not your own. Usually, this isn’t a problem, but occasionally you want to test a URL inside a members-only area, or see the page as it would look for someone in your country (or on your ISP). WPT does actually support some basic scripting, allowing it to log in to htpasswd-protected areas, but this isn’t any help if you want to log in to something more complicated.

Firebug is a useful Firefox extension that (among other things) can show a waterfall view as a page loads in your browser. This is perhaps a more accurate portrayal of real-world performance if you’re running on a modestly powered PC with home broadband because the WPT tests are presumably conducted from quite powerful and well-connected hardware.

The output of Firebug is similar to that of WPT, complete with the two vertical lines representing the start and end of rendering. Each resource can be clicked to expand a list of the headers sent and received with the request.

System Monitoring

This book is intended to be platform-neutral. Whether you run Berkeley Software Distribution (BSD), Linux, Solaris, Windows, OS X, or some other operating system, the advice given in this book should still be applicable.

Nevertheless, system performance-monitoring tools are inevitably quite platform-specific. Some tools such as netstat are implemented across most operating systems, but the likes of vmstat and iostat exist only in the UNIX world, and Windows users must use other tools. Let’s briefly look at the most common choices to see how they work.

vmstat

vmstat is an essential tool on most flavors of UNIX and its derivatives (Linux, OS X, and so on). It provides information on memory usage, disk activity, and CPU utilization. With no arguments, vmstat simply displays a single-line summary of system activity. However, a numeric value is usually specified on the command line, causing vmstat to output data every x seconds. Here’s vmstat in action with an interval of 5 seconds:

# vmstat 5

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa

The first columns are as follows:

➤ r: This is the number of currently running processes.

➤ b: This is the number of blocking processes.

Blocking processes are those that cannot yet run because they are waiting on the hardware (most often the disks). Naturally, this is the least-desirable state for a process to be in, and a high number of blocking processes generally indicates a bottleneck somewhere (again, usually the disks). If the number of running processes exceeds the number of CPU cores on the system, this can also cause some degrading of performance, but blocking is the real killer.

The next four columns are similar to the information given by the free command, as shown here:

➤ swpd: This is how much swap memory is in use (expressed in kilobytes by default on Linux).

➤ free: This is idle memory.

➤ buff: This is memory used for buffers.

➤ cache: This is memory used for caching.

If you’re coming to UNIX from the world of Windows, it’s worth taking some time to ensure that you are absolutely clear on what these figures mean. In UNIX, things aren’t as clear-cut as “free” and “used” memory.

The next two columns show swap usage:

➤ si: This is the amount of memory swapped in from disk per second.

➤ so: This is the amount of memory swapped out to disk per second.

Swapping is usually a bad thing, no matter what operating system you use. It indicates insufficient physical memory. If swapping occurs, expect to see high numbers of blocking processes as the CPUs wait on the disks.

Following are the next two columns:

➤ bi: This is the number of blocks read from block devices per second.

➤ bo: This is the number of blocks written to block devices per second.

Invariably, block devices means hard disks, so these two columns show how much data is being read from and written to disk. With disks so often being a bottleneck, it’s worth studying these columns with the goal of trying to reduce disk activity. Often, you’ll be surprised just how much writing is going on.

NOTE For a breakdown of which disks and partitions the activity occurs on, see the iostat command.

Now, consider the next two columns:

➤ in: This is the number of CPU interrupts.

➤ cs: This is the number of context switches.

At the risk of digressing too much into CPU architecture, a context switch occurs when the CPU either switches from one process to another, or handles an interrupt. Context switching is an essential part of multitasking operating systems but also incurs some slight overhead. If your system performs a huge number of context switches, this can degrade performance.

The final four columns show CPU usage, measured as a percentage of the CPU time:

➤ us: This is the time spent running userland code.

➤ sy: This is the system time (that is, time spent running kernel code).

➤ id: This shows the idle time. (That is, the CPU is doing nothing.)

➤ wa: This shows the time that the CPU is waiting on I/O.

id (idle) is naturally the most preferable state to be in, whereas wa (waiting) is the least. wa indicates that the CPU has things to do but can’t because it’s waiting on other hardware. Usually, this is the disks, so check for high values in the io and swap columns.

Whether the CPU will mostly be running user code or kernel code depends on the nature of the applications running on the machine. Many of the applications discussed in this book spend a lot of time sending and receiving data over the network, and this is usually implemented at the kernel level.
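The column descriptions above can be turned into a small checker. The sketch below, in Python, assumes the 16-column Linux layout shown in the vmstat header earlier; the sample line is made up for illustration and is not output from the server discussed in this book, and the function names and the wa threshold of 20 percent are arbitrary assumptions rather than recommended values.

```python
VMSTAT_COLUMNS = (
    "r", "b", "swpd", "free", "buff", "cache",
    "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id", "wa",
)


def parse_vmstat_line(line: str) -> dict:
    """Turn one vmstat data row into a dict keyed by column name."""
    values = line.split()
    if len(values) < len(VMSTAT_COLUMNS):
        raise ValueError(f"expected at least {len(VMSTAT_COLUMNS)} fields")
    # Any extra trailing columns reported by newer versions are ignored.
    return {name: int(value) for name, value in zip(VMSTAT_COLUMNS, values)}


def warnings_for(sample: dict, cpu_cores: int, wa_threshold: int = 20) -> list:
    """Flag the conditions the text singles out as signs of trouble."""
    notes = []
    if sample["b"] > 0:
        notes.append("blocking processes present (often waiting on disks)")
    if sample["r"] > cpu_cores:
        notes.append("more running processes than CPU cores")
    if sample["si"] > 0 or sample["so"] > 0:
        notes.append("swapping in progress (insufficient physical memory?)")
    if sample["wa"] >= wa_threshold:
        notes.append("CPU spending a lot of time waiting on I/O")
    return notes


if __name__ == "__main__":
    # Hypothetical sample row, invented for this example.
    row = "1 2 0 265584 218176 6165936 0 0 250 1200 450 700 3 2 75 20"
    sample = parse_vmstat_line(row)
    for note in warnings_for(sample, cpu_cores=6):
        print(note)
```

In practice you would feed this the rows produced by running vmstat with an interval, skipping the two header lines, and tune the thresholds to what is normal for your own machines.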

The previous vmstat example was taken from a web server at a fairly quiet time of the day. Let’s look at another example, taken from the same server, while the nightly backup process was running:

# vmstat 5

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa

Although the machine is far from being overloaded, performance is not ideal. You see regular blocking processes, disk activity is higher, and the CPUs (this machine had six cores) are spending more of their time in the waiting (wa) state.

Depending on your operating system, there may be other data available from vmstat. For example, the Linux version can give a more detailed breakdown of disk activity (with the -d switch) and can show statistics on forking (with the -f switch). Check the man pages to see exactly what your system supports.

WHO THIS BOOK IS FOR

The information in this book is designed to appeal to a wide range of readers, from system administrators charged with managing busy websites, to web developers looking to write efficient, high-performance code.

This book makes no assumptions about your underlying operating system, and the information is (in most cases) equally applicable whether you run OS X, Linux, Windows, FreeBSD, or another flavor of UNIX. Situations are highlighted in which some of the information depends on the operating system used.

WHAT THIS BOOK COVERS

A wide range of technologies are in use on the web, and it would be futile to attempt to cover them all (or at least cover them in sufficient detail). Rather, the discussions in this book concentrate on the most popular open source technologies: PHP, MySQL, Apache, Nginx, memcache, and MongoDB.

In this book, you’ll discover many of the advanced features of these technologies, and the ways in which they can be utilized to provide scalable, high-performance websites. You’ll learn current performance best practices, tips for improving your existing sites, and how to design with scalability in mind.

The browser market is wide and varied. The discussions in this book focus on the five main web browsers (which together make up the vast majority of web users): Internet Explorer, Chrome, Firefox, Opera, and Safari. Behavior can vary in subtle (but important) ways between versions, and, in most cases, when particular aspects of browser behavior are examined, the discussion includes versions from the past 5 years or so. It’s unfortunate (but inevitable) that a sizeable number of users will not be running the most current version.

HOW THIS BOOK IS STRUCTURED

The book is divided into two parts, covering aspects of website performance related to the front end (Part I) and the back end (Part II).

In the first part you’ll meet topics such as the HTTP protocol, how web browsers work, browser caching, content compression, minification, JavaScript, CSS, and web graphics — all essential topics for web developers. Following are the chapters included in this part of the book:

➤ Chapter 1, “A Refresher on Web Browsers” — This chapter provides a look under the hood at how the web works. In this chapter, you will meet the HTTP protocol, and features such as caching, persistent connections, and Keep-Alive.

➤ Chapter 2, “Utilizing Client-Side Caching” — This chapter examines the ways in which web browsers cache content, and what you can do to control it.

➤ Chapter 3, “Content Compression” — Here you find everything you need to know about compressing content to speed up page loading times.

➤ Chapter 4, “Keeping the Size Down with Minification” — In this chapter, you discover the art of minifying HTML, CSS, and JavaScript to further reduce payload sizes.

➤ Chapter 5, “Optimizing Web Graphics and CSS” — Here you learn how to optimize the most common image formats, and discover ways in which CSS can be used to create lean, efficient markup.

➤ Chapter 6, “JavaScript, the Document Object Model, and Ajax” — JavaScript is an increasingly important part of the web. In this chapter, you learn about performance aspects of the language, with an emphasis on interaction with the document object model (DOM).

The second part of the book focuses on the technologies behind the scenes — databases, web servers, server-side scripting, and so on. Although many of these issues are of more interest to back-end developers and system administrators, front-end developers also need to understand them to appreciate the underlying system. Following are the chapters included in this part of the book:

➤ Chapter 7, “Working with Web Servers” — This chapter provides everything you need to know about tuning Apache and Nginx. The second half of the chapter looks at load balancing and related issues that arise (for example, session affinity).

➤ Chapter 8, “Tuning MySQL” — In this first of two chapters devoted to MySQL, you meet the myriad of tuning options and discover the differences between MyISAM and InnoDB.

➤ Chapter 9, “MySQL in the Network” — Here you learn how to scale out MySQL using such techniques as replication, sharding, and partitioning.

➤ Chapter 10, “Utilizing NoSQL Solutions” — NoSQL is a collective term for lightweight database alternatives. In this chapter, you learn about two of the most important players: memcache and mongodb.

➤ Chapter 11, “Working with Secure Sockets Layer (SSL)” — SSL can be a performance killer, but there are a surprising number of things that you can do to improve the situation.

➤ Chapter 12, “Optimizing PHP” — Perhaps the most popular back-end scripting language, PHP can have a significant impact on performance. In this chapter, you learn about opcode caching, and discover how to write lean, efficient PHP.

This book also includes three appendixes that provide additional information:

➤ Appendix A, “TCP Performance” — Transmission Control Protocol (TCP) and Internet Protocol (IP) are the protocols that drive the Internet. In this appendix, you learn about some of the performance aspects of TCP, including the three-way handshake and Nagle’s algorithm.

➤ Appendix B, “Designing for Mobile Platforms” — An increasing number of users now access the web via mobile devices such as cell phones and tablets. These bring about their own design considerations.

➤ Appendix C, “Compression” — This book makes numerous references to compression. Here you discover the inner workings of the LZW family, the algorithm behind HTTP compression, and many image formats.

WHAT YOU NEED TO USE THIS BOOK

To get the most out of this book, you should have a basic working knowledge of web development — HTML, JavaScript, CSS, and perhaps PHP. You should also be familiar with basic system management — editing files, installing applications, and so on.

CONVENTIONS

To help you get the most from the text and keep track of what’s happening, we’ve used a number of conventions throughout the book.

NOTE Notes indicate notes, tips, hints, tricks, and/or asides to the current discussion.

As for styles in the text:

➤ We highlight new terms and important words when we introduce them.

➤ We show keyboard strokes like this: Ctrl+A.

➤ We show filenames, URLs, and code within the text like so: persistence.properties.

➤ We present code in two different ways:

We use a monofont type with no highlighting for most code examples.

We use bold to emphasize code that is particularly important in the present context or to show changes from a previous code snippet.

ERRATA

We make every effort to ensure that there are no errors in the text or in the code. However, no one is perfect, and mistakes do occur. If you find an error in one of our books, like a spelling mistake or faulty piece of code, we would be grateful for your feedback. By sending in errata, you may save another reader hours of frustration, and, at the same time, you will be helping us provide even higher-quality information.

To find the errata page for this book, go to http://www.wrox.com and locate the title using the Search box or one of the title lists. Then, on the book details page, click the Book Errata link. On this page, you can view all errata that have been submitted for this book and posted by Wrox editors.

NOTE A complete book list, including links to each book’s errata, is also available at www.wrox.com/misc-pages/booklist.shtml.

If you don’t spot “your” error on the Book Errata page, go to www.wrox.com/contact/techsupport.shtml and complete the form there to send us the error you have found. We’ll check the information and, if appropriate, post a message to the book’s errata page and fix the problem in subsequent editions of the book.

P2P.WROX.COM

For author and peer discussion, join the P2P forums at p2p.wrox.com. The forums are a web-based system for you to post messages relating to Wrox books and related technologies, and to interact with other readers and technology users. The forums offer a subscription feature to e-mail you topics of interest of your choosing when new posts are made to the forums. Wrox authors, editors, other industry experts, and your fellow readers are present on these forums.

At http://p2p.wrox.com, you will find a number of different forums that will help you, not only as you read this book, but also as you develop your own applications. To join the forums, just follow these steps:

1. Go to p2p.wrox.com and click the Register link.

2. Read the terms of use and click Agree.

3. Complete the required information to join, as well as any optional information you want to provide, and click Submit.

4. You will receive an e-mail with information describing how to verify your account and complete the joining process.

NOTE You can read messages in the forums without joining P2P, but to post your own messages, you must join.

After you join, you can post new messages and respond to messages other users post. You can read messages at any time on the web. If you would like to have new messages from a particular forum e-mailed to you, click the Subscribe to this Forum icon by the forum name in the forum listing.

For more information about how to use the Wrox P2P, be sure to read the P2P FAQs for answers to questions about how the forum software works, as well as many common questions specific to P2P and Wrox books. To read the FAQs, click the FAQ link on any P2P page.

Title	Professional Website Performance: Optimizing the Front-End and Back-End potx
Author	Peter Smith
Institution	John Wiley & Sons, Inc.
Subject	Website Performance Optimization
Type	Graduate thesis
Year of publication	2012
City	New York