Performance Optimizations in a Cloud-Centric World

Andy Still

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Brian Anderson

Copyeditor: Holly Bauer

Proofreader: Nicole Shelby

Cover Designer: Randy Comer

July 2015: First Edition

Revision History for the First Edition

2015-07-19: First Release

2015-09-02: Second Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Performance Optimizations in a Cloud-Centric World, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Cover image courtesy of Vera & Jean-Christophe, from flickr. The original image (“Heavy Traffic”) was in color.

978-1-491-93137-0

[LSI]

For Candance, who insists that all poor performance on the Internet is my fault.

Back in the day, it was simple.

Content was served from your server, over your network, and then to client machines that you controlled. Even when that moved out from a LAN to a WAN, the connectivity came from a single provider, so it was all under your control.

Then came the Internet…

Now content was being served across the public Internet to end-user machines, and you lost control of the location, type of machine, and type of connectivity.

Then came the cloud…

The cloud brought with it a new way of thinking about web system hosting. Hosting shifted from being a hand-crafted service to a commodity service providing throwaway systems. You moved from being a hardware owner to being a service consumer.

With this change came the increasing loss of control over your system. Nowadays your application is often the only element that you control directly, and even that can be dependent on consuming third-party services.

This is not a bad thing, but you need to be aware of the issues that can arise as a result of this shift to the cloud.

Whether you’ve already moved systems to the cloud or are thinking of doing so, this book points out some of the risks to your site’s performance created by this loss of control and puts forth some methods to identify and then mitigate those risks.

In no way, though, does this book set out to deter you from moving into the cloud. This author has long been a cloud advocate and works almost exclusively on cloud-based systems.

For simplicity, I’ve used the term “website” throughout to refer to any system that distributes data across the Internet, including browser-based applications, mobile apps, etc.

Chapter 1. Losing Control

So, here we are in the world of the cloud, with ever-expanding elements of our websites being placed in the hands of others.

Advantages to Giving Up Control

There are many positive aspects to making this move (after all, why else would so many people be doing it?), so before going into the negatives, let’s remind ourselves of some of the advantages of cloud-based systems:

Quick and easy access to enterprise-level solutions

For example, building your own geographically available SQL server cluster with real-time failover would take lots of hardware, high-quality connectivity between data centers, a high degree of expertise in databases and networking, and a reasonable amount of time and ongoing maintenance. Services such as Amazon RDS make this achievable within an hour, and at a reasonable hourly rate.

Flexibility and the ability to experiment and evolve systems easily

The ability to create and throw away systems means that you can make mistakes and learn from experience what’s the best setup for your system. Rather than spending time and effort doing capacity estimates to determine the hardware needed, you can just try different sizes, find the best size, and then change the setup if you reach capacity, or even at different times of day.

Access to data you could never create yourself

Third-party data sources do create risks, but they also enhance the attractiveness of your system by providing data that your users rely upon but that you otherwise wouldn’t be able to provide, either because that data is about a third-party system (e.g., Twitter feeds) or because it would not be economical to produce yourself (e.g., mapping data).

They improve performance and resilience

While they are out of your control, most cloud-based systems have higher levels of resilience built in than you would build into an equivalent system. Likewise, though there are potential issues created by how CDNs route traffic, CDNs will usually offer performance improvements over systems that do not use them.

Cloud-based systems are also built for high performance and throughput and designed to scale out of the box. Many services will scale automatically and invisibly to you as the consumer, and others will scale at the click of a button or an API call.

Access to systems run by specialists in the area—not generalists

In house or using a general data center, you may have a small team dedicated to a task, or more likely, a team of generalists who have a degree of expertise across a range of areas. Bringing in a range of specialist cloud providers allows you to work with entire companies that are dedicated to expertise in specific areas, such as security, DNS, or geolocation.

Performance Risks

Despite these advantages, it’s important to be aware of the inherent performance risks, especially in this era where good website performance is key to user satisfaction. The next sections cover important considerations for performance and outline key performance risks, following the journey that a user must travel in order to take advantage of your website.

1. The Last Mile

Before any user can access your website, they need to connect from their device to your servers. The first stage of this connection, between the user’s device and the Internet backbone, is known as the last mile. For a desktop user, this is usually the connection to their ISP, whether that be by DSL, cable, or even dial-up. For a mobile user, it’s the connection via their mobile network.

This section of the connection between user and server is the most inefficient and variable, and it will add latency onto any connection.

To illustrate this, in 2013 the FCC released research that showed that a top-speed fiber connection would add 18ms latency (and that was the best-case scenario), a DSL connection would add 44ms, and dial-up was considerably slower. For mobile users, the story was even worse: a 4G connection had a latency overhead of 600ms on new connections, a 3G connection had a latency of over 2s on new connections, and even existing open connections had a latency as high as 500ms.

THE “LAST MILE” OR THE “FIRST MILE”?

Although the last mile is the traditional name of the first stage of the connection between a user and the server, it may be more appropriate to think of it as the first mile or the on ramp, as the delay is often in establishing the connection in the first instance, particularly in mobile networks. Mobile connections have to communicate with the network to validate that a connection is allowed and to define the speed at which they can connect before anything can be opened. For 4G networks, this exchange happens with the local cell tower, but for 3G networks, the exchange takes place with the core network; therefore, 3G networks have much higher latency on newly opened connections.

This is a high-impact area of the delivery of any website, and it’s the one area where there is genuinely little to be done about the issue. Nevertheless, it’s important to be aware of the variations that are possible and actually being experienced, and to ensure that your website’s functionality is not affected by them.

Unreliable delivery of content

The variability in connection speed of the last mile means that it’s hard to determine how fast content will be delivered to users. This presents many of the same challenges that we’ll explore in the next section; they’re often amplified by the challenges of the last mile.

or from where the user is coming to you to request it.

Users are now accessing data from an expanding range of devices, via many different means of connectivity, and from an ever-widening range of

Latency is based on the distance that the data has to travel to get from end to end and any other associated delays involved in establishing and maintaining a connection.

Which Is the Biggest Challenge to Performance?

Bandwidth is often discussed as a limiting factor, but in many cases, latency is the killer: bandwidth can be scaled up, but latency is not as easy to address.

There is a theoretical minimum latency that will exist based on the physical distance between two places. Data sent over optimally configured fiber connections takes approximately 1.5× as long as it would take traveling at the speed of light in a vacuum. The speed of light is very fast, but there is still a measurable delay when transmitting over long distances. For example, the theoretical fastest round trip for data between New York and London is 56ms; between New York and Sydney, it’s 160ms. This means that to serve data to a user in Sydney from your servers in New York, 160ms will pass to establish a connection, and another 160ms will pass before the first byte of data is returned. That means that 320ms is the fastest possible time, even in optimal conditions, that a single byte of data could be returned. Of course, most requests will involve multiple round trips for data and multiple connections.
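
The arithmetic behind these figures can be reproduced with a short sketch. The 1.5× fiber factor comes from the text; the great-circle distances from New York (roughly 5,570 km to London and 15,990 km to Sydney) and the function names are assumptions made for illustration only.

```python
# Speed of light in a vacuum, in kilometres per second.
SPEED_OF_LIGHT_KM_S = 299_792.458

# Figure from the text: fiber takes roughly 1.5x as long as light in a vacuum.
FIBER_SLOWDOWN = 1.5

# Approximate great-circle distances from New York (assumed values).
DISTANCES_KM = {"London": 5_570, "Sydney": 15_990}


def round_trip_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time over fiber, in milliseconds."""
    one_way_seconds = distance_km / SPEED_OF_LIGHT_KM_S * FIBER_SLOWDOWN
    return 2 * one_way_seconds * 1000


def time_to_first_byte_ms(distance_km: float, round_trips: int = 2) -> float:
    """By default: one round trip to connect, one for the request and response."""
    return round_trips * round_trip_ms(distance_km)


if __name__ == "__main__":
    for city, km in DISTANCES_KM.items():
        print(
            f"New York to {city}: round trip {round_trip_ms(km):.0f} ms, "
            f"first byte {time_to_first_byte_ms(km):.0f} ms"
        )
```

Raising `round_trips` shows how quickly the floor rises once a request needs extra exchanges, such as a TLS handshake, before any data is returned.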

However, data often doesn’t travel by an optimal route.

The BGP (Border Gateway Protocol) that manages most of the routing on the Internet is designed to find optimal routes between any two points. Like all other protocols, though, it can be prone to misconfiguration, which sometimes results in the selection of less-than-optimal routes.

More commonly, such suboptimal routes are chosen due to the peering arrangements of your network provider. Peering determines which other networks a network will agree to forward traffic to. You should not assume that the Internet is a non-partisan place where data moves freely from system to system; the reality is that peering is a commercial arrangement, and companies will choose their peers based on financial, competitive, and other less-idealistic reasons. The upshot for your system is that it is important to be aware that the peering arrangements that your hosting company (and the companies they have arrangements with) has in place can affect the performance of your system.

When choosing a data center, you can get information about these arrangements; however, cloud providers are not so open. Therefore, it’s important to monitor what’s happening to determine the best cloud provider for your end users.

The variability of connectivity across the backbone really boils down to a single performance risk, but it’s a fundamental one that you need to be aware of when building any web-based system.

Unreliable delivery of content

If you cannot control how data is being sent to a user, you cannot control the speed at which it arrives. This makes it very difficult to determine exactly how a website should be developed. For example:

Can data be updated in real time?

Can activity be triggered in response to a user activity, e.g., predictive search?

Which functionality should be executed client side and which server side?

Can functionality be consistent across platforms?

3. Servers and Data Center Infrastructure

Traditionally, when hosting in a data center, you can make an informed choice about all aspects of the hardware and infrastructure you use. You can work with the data center provider to build the hardware and the network infrastructure to your specific requirements, including the connectivity into your systems. You can influence or at least be aware of the types of hardware and networking being used, the peering relationships, the physical location of your hardware, and even its location within the building.

The construction of your platform is a process of building something to last, and once built, it should remain relatively static, with any changes being non-trivial operations.

The migration of many data centers to virtualized platforms started a process of migration from static to throwaway platforms. However, it was with the growth of cloud-based Infrastructure as a Service (IaaS) platforms that systems became completely throwaway. An extension to IaaS is Platform as a Service (PaaS), where, rather than having any access to the infrastructure at all, you simply pass some code into the system, a platform is created, and the code deployed upon it is ready to run.

With these systems, all details of the underlying hardware and infrastructure are hidden from view, and you’re asked to put your trust in the cloud providers to do what is best. This way of working is practical and can be beneficial; cloud providers are managing infrastructure across many users and have a constant process of upgrading and improving the underlying technology. The only way they can coordinate rolling out the new technology is to make it non-optional (and therefore hidden from end users).

Loss of control over the data center creates two key performance risks.

Loss of ability to fine-tune hardware/networking

Cloud providers will provide machines based on a set of generic sizes, and they usually keep the underlying architecture deliberately vague, using measurements such as “compute units” rather than specifying the exact hardware being used.

Likewise, network connectivity is expressed in generic terms such as small, medium, large, etc., rather than specifying the actual values, so that the exact nature of the networking is out of your control.

All of this means that you cannot benchmark your application and then specify the exact hardware you want your application to run on. You cannot make operating system modifications to suit that exact hardware, because at any point, your servers may restart on different hardware configurations.

No guarantee of consistency

Every time you reboot a machine it can potentially (and usually, actually) come back up on completely different hardware, so there is no guarantee that you’ll get consistent performance. This is due in part to varying hardware, and also to the potential for noisy neighbors, that is, other users sharing your infrastructure and consequently affecting the performance of your infrastructure. In practice, these inconsistencies are much rarer than they used to be.

Some cloud vendors will offer higher-priced alternatives that will guarantee that certain pieces of hardware will be dedicated for your use.

4. Third-Party SaaS Tools

While you lose control over the hardware and the infrastructure with IaaS, you still have access to the underlying operating system; however, in the world of the cloud, systems are increasingly dependent on higher-level Software as a Service (SaaS) systems that deliver functionality rather than a platform on which you can execute your own functionality.

All access is provided via an API, and you have absolutely no control over how the service is run or configured.

Examples in this section

For consistency and to illustrate the range of services offered by single providers, all examples of services in this section are provided by Amazon Web Services (AWS); other providers offer similar ranges of services.

These SaaS systems can provide a wide range of functionality, including database (Amazon RDS or DynamoDB), file storage (Amazon S3), message queuing (Amazon SQS), data analysis (Amazon EMR), email sending (Amazon SES), authentication (AWS Directory Service), data warehousing (Amazon Redshift), and many others.

There are even cloud-based services now that will provide shared code-execution platforms (such as AWS Lambda). These services trigger small pieces of code in response to a schedule or an event (e.g., an image upload or button click) and execute them in an environment outside your control.

Performance Risks

As you start to introduce third-party SaaS services, there are two key performance risks that you must be aware of.

Complete failure or performance degradation

Although one of the selling points of third-party SaaS systems is that they are built on much more resilient platforms than you could build and manage on your own, the fact remains that if they do go down or start to run slowly, there is nothing you can do about it; you are entirely in the hands of the provider to resolve the issue.

Loss of data

Though the data storage systems are designed to be resilient (and in general, they are), there have been examples in the past of cloud providers losing data due to hardware failures or software issues.

5. CDNs and Other Cloud-Based Systems

Many systems now sit behind remote cloud-based services, meaning that any requests made to your server are routed via these systems before hitting it.

The most common examples of these systems are CDNs (content delivery networks). These are systems that sit outside your infrastructure, handling traffic before it hits your servers to provide globally distributed caching of content.

CDNs are part of any best-practice setup for a high-usage website, providing higher-speed distribution of data as well as lowering the overhead on your servers.

The way they work is conceptually simple: when a user makes a request for a resource from your system, the DNS lookup resolves to the point of presence within the CDN infrastructure that has the least latency and load. The user then makes the request to that server. If the server has a cached copy of the resource the user is requesting, it returns it; if it doesn’t, or if the version it has has expired, then it requests a copy from your server and caches it for future requests.

If the CDN has a cached copy, then the latency for that request is much lower; if not, then the connection between the CDN and the origin server is optimized so that the longer-distance part of the request is completed faster than if the request were made directly by the end user.
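
The hit, miss, and expiry behavior described above can be modeled in a few lines. This is a toy sketch of a single point of presence, not how any particular CDN is implemented; the `EdgeCache` class, the `fetch_from_origin` callback, and the fixed time-to-live are all hypothetical names and simplifications.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """A toy model of one CDN point of presence (hypothetical, for illustration)."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps a resource path to (body, time at which the copy expires).
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return the body plus "HIT" or "MISS" to show where it came from."""
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[1] > now:
            # A fresh cached copy exists: serve it without touching the origin.
            return cached[0], "HIT"
        # No copy, or the copy has expired: go back to the origin and re-cache.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"
```

Only the "MISS" path pays for the long-distance trip to the origin server, which is why the share of requests served from cache largely determines how much a CDN helps.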

Web application firewall

Provides protection against some standard security exploits, such as cross-site scripting or SQL injection.

Traffic queuing

Protects your site from being overrun with traffic by queuing excess demand until space becomes available.

Translation services

Translate content into the language of the locale of the user.

It is not uncommon to find that requests have been routed via multiple cloud-based services between the user and your server.

Performance Risks

There are a number of performance risks associated with moving your website behind cloud-based services.

Complete failure or performance degradation

As with third-party SaaS tools, if a cloud system you rely on goes down, so will your system. Likewise, if that cloud system starts to run slowly, so will your system.

This could be caused by hardware or infrastructure issues, or issues associated with software releases (SaaS providers will usually release often and unannounced). It could also be caused by third-party malicious activities such as hacking or DoS attacks; SaaS systems can be high profile and therefore potential targets for such attacks.

Increased overhead

All additional processing being done will add time to the overall processing time of a request. When adding an additional system in front of your own system, you’re not only adding the time taken for that service to execute the functionality that it is providing, but you’re also adding to the number of network hops the data has to make to complete its journey.

Increased latency

All services will add additional hops onto the route taken by the request. Some services offer geolocation so that users will be routed to a locally based service, but others do not. It’s not uncommon to hear of systems where requests are routed back and forth across the Atlantic several times between the user and the server as they pass through cloud providers offering different functionality.
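
To see how quickly these hops add up, the following sketch sums the round-trip time of each leg of a route. The 56ms transatlantic figure is the theoretical New York to London minimum quoted earlier; the 10ms figure for hops within a region, the region labels, and the example routes are assumptions chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Assumed round-trip times in milliseconds between regions (illustrative only).
RTT_MS: Dict[Tuple[str, str], float] = {
    ("EU", "EU"): 10.0,
    ("US", "US"): 10.0,
    ("EU", "US"): 56.0,
    ("US", "EU"): 56.0,
}


def route_latency_ms(regions: List[str]) -> float:
    """Sum the round-trip time of every leg along a route.

    `regions` lists where each system sits, starting with the user and
    ending with the origin server.
    """
    return sum(RTT_MS[(a, b)] for a, b in zip(regions, regions[1:]))


if __name__ == "__main__":
    # A European user talking directly to a European origin server.
    print("direct:", route_latency_ms(["EU", "EU"]), "ms")
    # The same request passing through a US-hosted service and then a
    # European-hosted service before reaching the origin.
    print("via two services:", route_latency_ms(["EU", "US", "EU", "EU"]), "ms")
```

Under these assumptions the detour costs more than ten times the direct route, before counting any processing time inside the intermediate services.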

6. Third-Party Components

Websites are increasingly dependent on being consumers of data or functionality provided by third-party systems.

Client Side

Client-side systems will commonly display data from third parties as part of their core content. This can include:

Data from third-party advertising systems (e.g., Google AdWords)

Social media content (e.g., Twitter feeds or Facebook “like” counts)

News feeds provided by RSS feeds

Location mapping and directions (e.g., Google Maps)

Unseen third-party calls, such as analytics, affiliate tracking tags, or monitoring tools

Server Side

Server-side content will often retrieve external data and combine it with your data to create a mashup of multiple data sources. These can include freely available and commercial data sources; for example, combining your branch locations with mapping data to determine the nearest branch to the user’s location.
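
As a rough sketch of the nearest-branch example, the code below combines a hypothetical table of branch coordinates with a user location. In a real mashup the user’s coordinates would come from a third-party mapping or geocoding service; here they are hard-coded, the branch names and coordinates are invented, and straight-line (haversine) distance stands in for the route distance a mapping service would return.

```python
import math
from typing import Dict, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0

# Hypothetical "your data": branch names and approximate locations.
BRANCHES: Dict[str, Coordinate] = {
    "Leeds": (53.80, -1.55),
    "London": (51.51, -0.13),
    "Edinburgh": (55.95, -3.19),
}


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points on the Earth's surface."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_branch(user: Coordinate, branches: Dict[str, Coordinate]) -> str:
    """Return the name of the branch closest to the user's location."""
    return min(branches, key=lambda name: haversine_km(user, branches[name]))


if __name__ == "__main__":
    user_location = (53.48, -2.24)  # Assumed result of a geocoding lookup.
    print(nearest_branch(user_location, BRANCHES))
```

Every such lookup puts the third-party service on the critical path of your own response, so its latency and availability become part of yours.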
