ABOUT THIS BOOK / BLURB
This is a small book with a single purpose: to tell you all about Cloud Native - what it is, what it's for, who's using it and why.

Go to any software conference and you'll hear endless discussion of containers, orchestrators and microservices. Why are they so fashionable? Are there good reasons for using them? What are the trade-offs, and do you have to take a big-bang approach to adoption? We step back from the hype, summarize the key concepts, and interview some of the enterprises who've adopted Cloud Native in production.
Take copies of this book and pass them around, or just zoom in to increase the text size and ask your colleagues to read over your shoulder. Horizontal and vertical scaling are fully supported.

The only hard thing about this book is that you can't assume anyone else has read it, and the narrator is notoriously unreliable.
What did you think of this book? We'd love to hear from you with feedback, or if you need help with a Cloud Native project: email info@container-solutions.com.

This book is available in PDF form from the Container Solutions website at www.container-solutions.com.
First published in Great Britain in 2017 by Container Solutions Publishing, a division of Container Solutions Ltd.

Copyright © Anne Berger (née Currie) and Container Solutions Ltd 2017.

Chapter 7, "Distributed Systems Are Hard", first appeared in
ABOUT THE AUTHORS

Anne Currie
Anne Currie has been in the software industry for over 20 years, working on everything from large-scale servers and distributed systems in the '90s, to early ecommerce platforms in the '00s, to cutting-edge operational tech in the '10s. She has regularly written, spoken and consulted internationally. She firmly believes in the importance of the technology industry to society, and fears that we often forget how powerful we are. She is currently working with Container Solutions.
Container Solutions
As experts in Cloud Native strategy and technology, Container Solutions support their clients with migrations to the cloud. Their unique approach starts with understanding each customer's specific needs. Then, together with your team, they design and implement custom solutions that last. Container Solutions' diverse team of experts is equipped with a broad range of Cloud Native skills, with a focus on distributed system development.

Container Solutions have a global perspective, with offices in the Netherlands, the United Kingdom, Switzerland, Germany and Canada.
CONTENTS
ARE CASE STUDIES EVER USEFUL?
CASE STUDY / THE FINANCIAL TIMES
CASE STUDY / SKYSCANNER
CASE STUDY / ASOS
CASE STUDY / STARLING BANK
CASE STUDY / ITV
CASE STUDY / CONTAINER SOLUTIONS
DO THOSE CASE STUDIES TELL US ANYTHING?
APPENDIX / THE CONTAINER SOLUTIONS METHOD
ARE CASE STUDIES EVER USEFUL?
What I wanted from the interviews was to understand:
• What was their aim?
• What issues and roadblocks did they hit?
• Did they get what they wanted?
Early adopter case studies are usually only moderately useful. Successful businesses are unique, with their own goals and risk profiles, and early adopters of Cloud Native will usually have a different attitude to risk than folk starting out now. However, at least these folk are more realistic role models for the average enterprise than Google or Netflix.
These case studies did give me a general idea of what industry pioneers have done, how difficult it was, and whether the path has become any easier over time.
CASE STUDY / THE FINANCIAL TIMES
“Our goal of becoming a technologically agile company was a major success - the teams moved from deploys taking 120 days to only 15 minutes”
Sarah Wells
Based in London, The Financial Times has an average worldwide daily readership of 2.2 million. Its paid circulation, including both print and digital, is 856K; three quarters of its subscribers are digital. The FT was a pioneer of content paywalls and was the first mainstream UK newspaper to report earning more from digital subscriptions than print sales. They are also unusual in earning more from content than from advertising.
The FT have been gradually adopting microservices, continuous delivery, containers and orchestrators for three years. Like Skyscanner (who I'll talk about next), their original motivation was to be able to move faster and respond more quickly to changes in the marketplace.
As Sarah Wells, the high-profile tech lead of the content platform, points out, "our goal of becoming a technologically agile company was a major success - the teams moved from deploys taking 120 days to only 15 minutes". In the process, according to senior project manager Victoria Morgan-Smith, "the teams were completely liberated".

So how did they achieve all this? Broadly speaking, they made incremental but constant improvements.
The FT have moved an increasing share of their infrastructure into the cloud (IaaS). Six years in, they use off-the-shelf, cloud-based services like databases-as-a-service (including AWS Aurora) and queues-as-a-service wherever possible. Again, this is because operating this functionality in house is "not a differentiator" for the company.
Within the FT as a whole there was a strong inclination to move to a microservices-oriented architecture, but different parts of the company took different approaches. The FT have three big programmes of work where they implemented a new system as a set of microservices. One of those (subscription services) incrementally migrated their monolithic server to a microservice architecture by slowly carving off key components. However, the remaining two projects (the new content platform and the new website) essentially built duplicates of their respective monoliths right from the start using microservices. Interestingly, both of those approaches worked successfully for the FT, suggesting that there is no one correct way to do a monolith-to-microservice migration.
After nearly three years the content platform has moved from a monolith to around 150 microservices, each of which broadly "does one thing". However, they have not followed the popular "Conway's law" approach where one or more microservices represent the responsibilities of each team (many services to one team). Instead, multiple teams support each microservice (many to many). This helps maximize parallelism, but is mostly because
They found that, in Wells' words, "infrastructure-as-code was necessary for microservices", and they evolved a strong culture of automation and CD. According to Wells, "There is a fair amount of diversity within the FT, with some teams running a home-grown continuous delivery system based on Puppet, while others wrap and deploy their services in Docker containers on the container-friendly Linux operating system CoreOS, with yet others deploying to Heroku.
Basically, we have at least:

1. A home-grown, Puppet-based platform, currently hosted on AWS without containers,
2. A Heroku-hosted PaaS,
3. A Docker container-based environment using CoreOS, hosted on AWS."
All of these environments work well; they are each evolving, and were each chosen by the relevant tech team to meet their own needs at the time. Again, the FT's experience suggests there is more than one way to successfully implement an architectural vision that is microservice-oriented and runs in a cloud-based environment with continuous delivery.
Finally, the FT's content platform team found that containers were the gateway to orchestration. The content folk have been orchestrating their Docker-containerized processes in production for several years, with the original motivation being server density - more efficient resource utilization. By using large AWS instances to host multiple containerized processes, controlled with an orchestrator, they reduced their hosting costs by around 75%. As very early users of orchestration they created their own orchestrator from several open source tools, but are now evaluating the latest off-the-shelf products, in particular Kubernetes.
So what unexpected results came out of this Cloud Native evolution for the FT? They anticipated that the shift to faster deployments would increase risk. In fact, they have moved from a 20% deployment rollback rate to ~0.1%, i.e. a two-order-of-magnitude reduction in their error rate. They ascribe this to the ability to release small changes more often with microservices. They have invested heavily in monitoring and A/B testing, again building their own tools for the latter, and they have replaced traditional pre-deployment acceptance tests with automated monitoring of key functionality in production.
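As an illustration of that last idea (not the FT's actual tooling, which is largely home-grown), a production functionality check can be as small as a scheduled script that makes a business-level assertion against a live endpoint. The URL and the assertion below are hypothetical:

```python
import sys
import urllib.request

# Hypothetical synthetic check: assert a key piece of functionality
# (here, that a known article is being served) directly in production,
# instead of relying only on pre-deployment acceptance tests.
CHECK_URL = "https://api.example.com/content/known-article-id"  # hypothetical

def content_is_served() -> bool:
    try:
        with urllib.request.urlopen(CHECK_URL, timeout=5) as resp:
            body = resp.read().decode("utf-8")
            # A business-level assertion, not just "the server returned 200".
            return resp.status == 200 and "headline" in body
    except OSError:
        return False

if __name__ == "__main__":
    # A scheduler (cron, or the monitoring platform itself) runs this
    # every minute and alerts someone if it fails repeatedly.
    sys.exit(0 if content_is_served() else 1)
```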
How have they handled the complexity of distributed systems? They chose to make heavy use of asynchronous queues-as-a-service, which simplified their distributed architecture by limiting the knock-on effects of a single microservice outage (although this does increase system latency, a tradeoff they accepted). They also limit the use of chained synchronous calls, to avoid cascading failures where one failed service holds up a whole chain of services waiting on outstanding synchronous requests. They also struggled with issues around the order of microservice instantiation, and are contemplating rules that microservices should exit if pre-requisite services are not yet available, allowing the orchestrator to automatically re-start them (by which point their pre-requisite service should hopefully have appeared). Basically, it was difficult, but they learned and improved as they went.
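A minimal sketch of that exit-if-prerequisites-missing pattern, with a hypothetical "user-store" dependency standing in for whatever a real service would need; the orchestrator's restart policy supplies the retry loop:

```python
import socket
import sys

# Hypothetical prerequisite: this service cannot do useful work until
# the "user-store" service (an assumed dependency) is reachable.
PREREQ_HOST, PREREQ_PORT = "user-store", 8080

def prerequisite_available() -> bool:
    try:
        with socket.create_connection((PREREQ_HOST, PREREQ_PORT), timeout=2):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if not prerequisite_available():
        # Exit non-zero rather than limping along; the orchestrator
        # re-starts the container, by which time the prerequisite
        # should hopefully have appeared.
        sys.exit(1)
    # ... start serving requests here ...
```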
According to project manager Victoria Morgan-Smith, "our goal throughout was to de-risk experimentation", but that involved "training, tools and trust".

The FT invested heavily in internal, on-the-job training, with an explicit remit for their devops teams to disseminate the new operational knowledge to developers and operations. They learned that their teams could be trusted to make good judgments if they were informed, given responsibility and had the right tools. For example, initially their IaaS bills were very high, but once developers were given training, access to billing tools and guidance on budgets, the bills reduced.
In common with many other early adopters, the FT experimented and built in-house, and were prepared to accept a level of uncertainty and risk. Sometimes their tech teams needed to re-assess as the world changed, as with their move from private to public cloud, but they were persistent and trusted to make the occasional readjustment in a rapidly changing environment. Trust was a key factor in their progress.
CASE STUDY / SKYSCANNER

Launched in 2003 and headquartered in Scotland, Skyscanner is a global travel search site with more than 50 million monthly users. Their self-built technology, which includes websites like www.skyscanner.com and a mobile app, supports over 30 languages and 150 currencies.

Skyscanner successfully use some highly advanced Cloud Native strategies in a mixed environment: they have a monolithic core system and a fast-growing range of supplementary microservices. Part of their estate is hosted on their own servers and part is in the cloud on AWS.
Skyscanner have now been using containers and orchestrators in production for around two years, and their new code is generally "Cloud Native", i.e. microservice-based, containerized and orchestrated. The decision to move towards Cloud Native was made jointly by their operations and development teams, and their motivation was speed. The company wanted to increase their deployment frequency and velocity. "We saw the ability to move and adapt as a strategic asset", said Stuart Davidson, who runs the enterprise's build and deployment teams. According to visionary ex-Amazon CTO Bryan Dove, Skyscanner's goal is to "react at the speed of the internet". His bold ambition is "10,000 releases every day", and they are moving rapidly towards achieving it.
According to Davidson, "Back in 2014 we were making around 100 deploys a month. We adopted continuous delivery and that helped us deploy more quickly, but it didn't solve all our problems - developers were still limited by which libraries and frameworks were supported in production. Moving to containerization plus CD was the game-changer. It increased our deployment rate by a factor of 500."

Skyscanner's goal was to achieve "idea to user inside an afternoon", and they have mostly achieved this. In around two years their delivery time has dropped from 6-8 weeks to a few hours. In common with many early Cloud Native adopters, Skyscanner achieved this as an entirely internal project using off-the-shelf or self-built tooling.
Supporting thousands of microservices within a single environment involves defining some simplifying constraints. Skyscanner have developed a microservice shell that provides useful standard defaults for some low-level operational behaviours - network interface use, for example. They also specify contracts and mandate contract consistency for any new microservices. The key is that any constraints make the engineers' lives easier but don't limit them: "batteries are included but removable".
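Skyscanner's shell is internal and its details aren't public, but as a rough sketch of the idea, such a shell might be a small base class that standardizes operational defaults (bind interface, port, a health endpoint) while letting each team override only what it needs - batteries included, but removable. Everything below is illustrative:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative only: a tiny "service shell" that bakes in consistent
# operational defaults for every new microservice.
class ServiceShell:
    host = "0.0.0.0"   # standard default: which network interface to bind
    port = 8080        # standard default: service port

    def handle(self, path: str) -> tuple:
        # Default contract: every service answers a health check.
        if path == "/health":
            return 200, "ok"
        return 404, "not found"

    def run(self) -> None:
        shell = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                status, body = shell.handle(self.path)
                self.send_response(status)
                self.end_headers()
                self.wfile.write(body.encode())

        HTTPServer((self.host, self.port), Handler).serve_forever()

# A team's service overrides only what it needs ("removable batteries").
class FlightSearchService(ServiceShell):
    def handle(self, path):
        if path == "/search":
            return 200, "results..."
        return super().handle(path)  # keep the standard defaults
```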
For Skyscanner, the initial motivation for adopting microservices and containerization was deployment speed. However, once they had successfully containerized, they rapidly started to use an orchestrator in production to reduce their operating costs. "For us", said Davidson, "containers were the enabler for Cloud Native".
Skyscanner are an excellent example of evolving strategy:

• Several years ago, the team identified their first goal as increased deployment speed, and their first step as continuous delivery. To achieve this they successfully developed a CD pipeline using the CI tool TeamCity.

• They then identified another bottleneck: environmental limitations. Developers wanted to use the latest library versions, but were limited to what was currently supported in the build system and production instances. The ops team set a goal to remove this limitation by allowing developers to bundle their chosen environment into their production deployments using containerization. As a step in this process they moved to a more container-friendly build tool (Drone).

• Once they had successfully containerized, the Skyscanner team moved again. They decided to improve their resilience and reduce costs by using a container orchestrator in part of their production environment. They initially chose the easiest orchestrator for them to try out at the time - the newly launched Amazon Elastic Container Service (ECS). They were happy with it, and it achieved the margin improvement they were looking for. As a result they have continued to extend orchestrator use in their production environment.
Having met all their goals so far, Skyscanner are now considering their next challenges, which include handling many different production environments and making microservices even smaller in order to move even faster.
Skyscanner's voyage has been a continuous, iterative process, and by no means easy. According to Davidson, "The bumps and scars make you more skeptical. Many of the tools we tried did not live up to their hype." They had to experiment constantly, and sometimes abandon one tool entirely and move to a new one as their needs changed or the tool proved inadequate at their growing scale. They correctly didn't view this as failure, but as a valuable learning process that is only possible in the reduced-risk environment of the cloud.
However, according to Davidson, within his own team this learning came at a cost too: "every migration we had to do, every time we had to make our engineers change how they were doing something, made us lose a little credibility that we knew what we were doing. As much as the engineering community in Skyscanner were awesome about this, I was always really aware of the level of change fatigue we introduced." To make this rate of change work, several of the company's techies took the initiative to upgrade their change management skills with a course at Edinburgh Business School.
Conceptually, the move to Cloud Native has been a positive one for Skyscanner. Their developers have embraced their new responsibilities to drive and test their own releases, ops no longer impose unnecessarily restrictive environmental constraints on the development team, and deployment speed has improved 500-fold. However, they don't believe their operational journey is at, or will ever reach, an end. Like their customers, their technology teams are keen to keep moving forward to new destinations.
CASE STUDY / ASOS

"The ability to provide fast response times is key to our business"

David Green
Founded in 2000, ASOS is a highly successful, global eCommerce fashion retailer. Across their various mobile and web platforms they had 800 million visits in the first half of 2017. They have 21 million social media followers, and their retail sales were just under £1B in H1 (the first half of) 2017 [8]. ASOS's mission is to become "the world's number-one online shopping destination for fashion-loving 20-somethings".
Since their inception, ASOS have been tech visionaries who built their own platform in-house to meet their specific needs, resisting the urge felt by many retailers to go for off-the-shelf eCommerce products. To advance their overall objectives, ASOS have identified a number of strategic goals, including: faster feature velocity (getting functionality from idea to user more quickly), improved scalability to handle peaks like Black Friday, and the ever-faster site response times that are famously key to online conversion. ASOS are in a slightly different technical position from our other case studies: unlike Skyscanner and the FT, ASOS's services run on Windows, not Linux.
Several years ago, ASOS determined that a key factor in achieving their goals would be to transition from on-premises, owned and self-managed servers to cloud hosting. They have been gradually moving all of their services from their own data centres to Azure, with the stated aim of 100% cloud within 2 years.

One of their aims for the cloud was to significantly reduce the operational load on their teams. They decided they would rather have their technical folk focused on areas of greater business advantage, like features. ASOS therefore chose to run on Azure's "Cloud Services" PaaS, i.e. they use fully managed VMs provided, monitored, patched and supported by Microsoft. ASOS just deploy applications to those VMs, using them as "units of isolation". They found this did indeed reduce their operational overheads, and so they went even further, transitioning to fully managed databases wherever possible (aka "database-as-a-service"). They now host their own stateful services only when Azure does not offer a fully managed alternative.
As well as moving to the cloud, ASOS embraced a Cloud Native-style approach with heavy use of microservices. A microservice-oriented architecture on flexible cloud infrastructure has been core to their improved feature velocity. However, they have sometimes chosen to prioritize other goals. Like the FT and Skyscanner, ASOS create their microservices to "do one thing". Unlike the other two, however, in some instances ASOS group, link and deploy multiple microservices as a single process, with each microservice a Windows library.

The majority of the ASOS estate consists of the more usual discrete services communicating over (typically) REST, but ASOS group and deploy their services together where performance is particularly critical. This improves the responsiveness of those service groups, which is good, but the increased coupling has a negative impact on ASOS's agility and their ability to change those particular services, which is bad. In most cases, therefore, ASOS choose to prioritize agility and feature velocity over performance, and deploy using the more common single-decoupled-microservice model.
Why is grouping services more performant anyway? As we discussed in an earlier chapter, microservices talking across multiple VM instances can potentially introduce significant intercommunication latency. Grouping microservices can make deployment trickier and increase coupling, which slows feature velocity (time from idea to deployment). However, it can significantly improve execution speed, which ASOS judge to be an important priority for some of their services. According to their Enterprise Architect David Green, "the ability to provide fast response times is key to our business".
{Aside - for some Linux orchestrators you can achieve a similar result using the "Affinity" feature, which tells the orchestrator to make sure that some services are always co-located on the same VM instance or "node".}
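For example, in Kubernetes (one orchestrator with such a feature) this kind of co-location can be declared as pod affinity. A minimal sketch using the Kubernetes Python client, with hypothetical service names:

```python
from kubernetes import client

# Hypothetical: schedule this pod onto the same node as any pod
# labelled app=checkout, so calls between them never cross VMs.
co_locate_with_checkout = client.V1Affinity(
    pod_affinity=client.V1PodAffinity(
        required_during_scheduling_ignored_during_execution=[
            client.V1PodAffinityTerm(
                label_selector=client.V1LabelSelector(
                    match_labels={"app": "checkout"}
                ),
                # "hostname" topology means the same node (VM instance).
                topology_key="kubernetes.io/hostname",
            )
        ]
    )
)

# The affinity object is then set on the pod spec, e.g.:
# pod_spec = client.V1PodSpec(affinity=co_locate_with_checkout,
#                             containers=[...])
```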
This is a good demonstration that microservice experts still make judgements and balance tradeoffs on how they will implement a Cloud Native approach - and that this is true even on a service-by-service basis.
{Aside - interestingly, in the ASOS architecture the majority (though obviously not all!) of their service communication is with the (remote) client, so cross-VM latency isn't actually as big an issue as we might think for most of their services.}
Their interest in execution speed is also reflected in ASOS's data architecture. They keep data as close to users as possible and make extensive use of NoSQL databases and caching. One of the attractions of a microservice architecture for ASOS is the ability to make more granular choices about how and where data is maintained, all of which helps with their critical response times.
Another, completely different, aspect of a microservice architecture that particularly appealed to ASOS was the ability to parallelize teams and reduce handovers and blockages. This also helped them improve their feature velocity.
ASOS are extremely happy with the progress they have made using cloud and microservices. Last year's huge Black Friday beat all previous records for scale and responsiveness across their applications. So, what next technically for ASOS? As Azure continues to add managed stateful services, ASOS will transition to use them. They would also like to improve their server density (effective resource utilization). To achieve this they are likely to investigate containers and orchestration, but that tooling is still less mature on Windows than on Linux.
Overall, a cloud (PaaS) and microservice-focussed strategy has worked very well for ASOS, and they intend to continue on their current path.
CASE STUDY / STARLING BANK

Starling Bank was founded in 2014. Based in London, it has been licensed and operating since July 2016. The bank is a successful part of the British Fintech scene, which is a spin-off from the UK's strong financial services sector.

Starling are a mobile-only, challenger bank who describe themselves as a "tech business with a banking licence". They provide a full current account, solely accessed from Android and iOS mobile devices. They received $70m of investment in early 2016.
Starling's tech comprises a cloud-hosted back-end system talking to apps on users' mobile phones and to third-party services. As well as a full current account, the bank provides Mastercard debit cards (customers spend money on their SB debit card, and the authorizations and debits arrive at Starling's servers through third-party systems). They also support direct debits, standing orders and faster payments, which are again provided by back-end integrations with other third-party systems.
Starting in 2016, Starling created their core infrastructure on Amazon Web Services (AWS) inside just 12 months. Their highly articulate CTO Greg Hawkins likes to say, "we built a bank in a year".
In common with everyone I've interviewed for this series of case studies, Starling use a microservices architecture of independent services interacting via clearly defined APIs. As of March 2018 they have ~20 Java microservices, and that number will increase. Every service can be developed on by multiple teams. They operate this way because they can; as Hawkins puts it, "we're taking advantage of the flexibility we get from our small size - we can reconfigure ourselves very quickly." As they continue to grow, Greg recognizes that they will lose some of that flexibility ("it won't last forever") and will then adopt smaller microservices and a more Conway-like model.
In terms of deployment and operations, whilst services can be deployed individually, for convenience Starling usually use a simultaneous deployment approach where all services in the back-end are deployed at once. This is a tradeoff that has evolved between minimizing the small amount of overhead around releases and keeping release frequency up. They built a rudimentary orchestrator themselves to drive rolling deploys based on version-number changes (scale up AWS, create new services on the new instances, expose those new services instead of the old ones, turn off the old ones and scale down their AWS instances).
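That parenthetical sequence reduces to a simple loop. A sketch of the steps follows; the helper functions are hypothetical stand-ins for AWS and load-balancer calls, not Starling's in-house tooling:

```python
# Hypothetical sketch of a version-number-driven rolling deploy.
# The helpers are stubs standing in for AWS / load-balancer calls.

def provision_instances(count: int) -> list:
    # Stand-in for scaling up AWS.
    return [f"new-instance-{i}" for i in range(count)]

def start_services(instance: str, version: str) -> None:
    print(f"starting services v{version} on {instance}")

def switch_traffic(old: list, new: list) -> None:
    # Stand-in for a load-balancer update.
    print(f"exposing {new} instead of {old}")

def retire(instance: str) -> None:
    # Stop services and scale the instance back down.
    print(f"stopping services and terminating {instance}")

def rolling_deploy(current: list, new_version: str) -> None:
    new = provision_instances(len(current))   # 1. scale up AWS
    for inst in new:
        start_services(inst, new_version)     # 2. create the new services
    switch_traffic(current, new)              # 3. expose new instead of old
    for inst in current:
        retire(inst)                          # 4. turn off old, scale down

rolling_deploy(["instance-a", "instance-b"], new_version="2018.03.1")
```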
Starling generally redeploy their whole estate 4-5 times per day to production. So, new functionality reaches prod rapidly, and it's business-as-usual to apply security patches fast when necessary.
As always, API management is a tough challenge for frequent deployments You could argue (naively) that simultaneous deployment makes this easier because you are always re-deploying both sides of your API at once, but this isn’t really true for several reasons:
• Starling don’t mandate simultaneous