
Lessons learned from writing over 300,000 lines of infrastructure code


Trang 1

Lessons learned from writing

Trang 4

We are trying to build this…

Trang 6

If you just read the headlines, it all sounds

Trang 7

Kubernetes, Docker, serverless, microservices, infrastructure as code, distributed tracing, big data systems, data warehouses, data lakes, chaos engineering, zero-trust architecture, streaming architecture, immutable infrastructure, service discovery, service meshes, NoSQL, NewSQL, ChatOps, HugOps,

Trang 8

But to me, it doesn’t feel

Trang 13

Here’s something we don’t

Trang 14

Building production-grade

Trang 18

| Project | Examples | Time estimate |
|---|---|---|
| Managed service | ECS, ELB, RDS, ElastiCache | 1 – 2 weeks |
| Distributed system (stateless) | nginx, Node.js app, Rails app | 2 – 4 weeks |
| Distributed system (stateful) | Elasticsearch, Kafka, MongoDB | 2 – 4 months |
| Entire cloud architecture | Apps, DBs, CI/CD, monitoring, etc. | 6 – 24 months |

Trang 20

One trend I love: manage

Trang 21

Manual DBA work

Trang 23

we’ve created a reusable library of

Trang 24

Primarily written in Terraform, Go,

Trang 25

Off-the-shelf, battle-tested solutions for AWS, Docker, VPCs, VPN, MySQL, Postgres, Couchbase, ElasticSearch, Kafka, ZooKeeper,

Trang 26

The library is used in production by hundreds of customers

Trang 27

| Project | Examples | Time estimate |
|---|---|---|
| Managed service | ECS, ELB, RDS, ElastiCache | 1 – 2 weeks |
| Distributed system (stateless) | nginx, Node.js app, Rails app | 2 – 4 weeks |
| Distributed system (stateful) | Elasticsearch, Kafka, MongoDB | 2 – 4 months |
| Entire cloud architecture | Apps, DBs, CI/CD, monitoring, etc. | 6 – 24 months |

Trang 28

| Project | Examples | Time estimate | |
|---|---|---|---|
| Managed service | ECS, ELB, RDS, ElastiCache | 1 – 2 weeks | 1 day |
| Distributed system (stateless) | nginx, Node.js app, Rails app | 2 – 4 weeks | 1 day |
| Distributed system (stateful) | Elasticsearch, Kafka, MongoDB | 2 – 4 months | 1 day |
| Entire cloud architecture | Apps, DBs, CI/CD, monitoring, etc. | 6 – 24 months | 1 day |

Trang 30

In this talk, I’ll share what we

Trang 31

I’m

ybrikman.com

Trang 32

Co-founder of

Trang 35


Trang 36

| Project | Examples | Time estimate |
|---|---|---|
| Managed service | ECS, ELB, RDS, ElastiCache | 1 – 2 weeks |
| Distributed system (stateless) | nginx, Node.js app, Rails app | 2 – 4 weeks |
| Distributed system (stateful) | Elasticsearch, Kafka, MongoDB | 2 – 4 months |
| Entire cloud architecture | Apps, DBs, CI/CD, monitoring, etc. | 6 – 24 months |

Trang 38

How can it possibly take that

Trang 41

Yak shaving: a seemingly endless series of small tasks you have to do before you can do what you actually

Trang 44

The production-grade

Trang 45

| Task | Description | Example tools |
|---|---|---|
| Install | Install the software binaries and all dependencies. | Bash, Chef, Ansible, Puppet |
| Configure | Configure the software at runtime: e.g., configure port settings, file paths, users, leaders, followers, replication, etc. | Bash, Chef, Ansible, Puppet |
| Provision | Provision the infrastructure: e.g., EC2 Instances, load balancers, network topology, security groups, IAM permissions, etc. | Terraform, CloudFormation |
| Deploy | Deploy the service on top of the infrastructure. Roll out updates with no downtime: e.g., blue-green, rolling, canary deployments. | Scripts, Orchestration tools (ECS, K8S, Nomad) |

Trang 46

| Task | Description | Example tools |
|---|---|---|
| Security | Encryption in transit (TLS) and on disk, authentication, authorization, secrets management, server hardening. | ACM, EBS Volumes, Cognito, Vault, CIS |
| Monitoring | Availability metrics, business metrics, app metrics, server metrics, events, observability, tracing, alerting. | CloudWatch, DataDog, New Relic, Honeycomb |
| Logs | Rotate logs on disk. Aggregate log data to a central location. | CloudWatch Logs, ELK, Sumo Logic, Papertrail |

Trang 47

| Task | Description | Example tools |
|---|---|---|
| Networking | VPCs, subnets, static and dynamic IPs, service discovery, service mesh, firewalls, DNS, SSH access, VPN access. | EIPs, ENIs, VPCs, NACLs, SGs, Route 53, OpenVPN |
| High availability | Withstand outages of individual processes, EC2 Instances, services, Availability Zones, and regions. | Multi AZ, multi-region, replication, ASGs, ELBs |
| Scalability | Scale up and down in response to load. Scale horizontally (more servers) and/or vertically (bigger servers). | ASGs, replication, sharding, caching, divide and conquer |
| Performance | Optimize CPU, memory, disk, network, and GPU usage. Query tuning. Benchmarking, load testing, profiling. | Dynatrace, valgrind, VisualVM, ab, JMeter |

Trang 48

| Task | Description | Example tools |
|---|---|---|
| Cost optimization | Pick proper instance types, use spot and reserved instances, use auto scaling, nuke unused resources. | ASGs, spot instances, reserved instances |
| Documentation | Document your code, architecture, and practices. Create playbooks to respond to incidents. | READMEs, wikis, Slack |
| Tests | Write automated tests for your infrastructure code. Run tests after every commit and nightly. | Terratest |

Trang 49

Key takeaway: use a checklist to build

Trang 50

Full checklist: gruntwork.io/devops-checklist/

Trang 51


Trang 52

What tools do you use to

Trang 54

Here’s the toolset we’ve found

Trang 55

1. Deploy all the basic infrastructure

(Diagram: five servers; networking, load balancers, databases, users, permissions, etc.)

Trang 56

(Diagram: five servers and five VMs; networking, load balancers, databases, users, permissions, etc.)

Trang 57

3. Some of the VMs form a cluster

(Diagram: five servers and five VMs; networking, load balancers, databases, users, permissions, etc.)

Trang 58

(Diagram: five servers; networking, load balancers, databases, users, permissions, etc.)

Trang 59

(Diagram: five servers; networking, load balancers, databases, users, permissions, etc.)

Trang 64

New way: make changes

Trang 66

More time than making a

Trang 67

If you make changes manually,

Trang 68

And the next person to try to

Trang 69

So then they’ll fall back and

Trang 70

But making manual changes

Trang 72

Key takeaway: tools are not enough

Trang 74

It’s tempting to define all of your

(Diagram: environments dev, qa, test, stage, prod)

Trang 75

Downsides: runs slower; harder to understand; harder to review (plan output unreadable); harder to test; harder to reuse code; need admin

Trang 76

Also, a mistake anywhere could break


Trang 77

qa test stage prod

Trang 78

What you really want is

Trang 79

(Diagram: VPC, MySQL, Frontend)

Trang 81

And break it up into small, reusable,

(Diagram: several modules)

Trang 82

└ dev
└ stage
└ prod

Trang 83

└ dev
  └ vpc
  └ mysql
  └ frontend
└ stage
  └ vpc
  └ mysql
  └ frontend
└ prod
  └ vpc
  └ mysql
  └ frontend
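The folder layout on this slide gives every environment its own copy of every component, so each folder can be planned and applied separately. The Go sketch below only enumerates those folders. The function name `deploymentUnits` and the root folder `live` are hypothetical names chosen for illustration, not something the talk specifies.

```go
package main

import (
	"fmt"
	"path"
)

// deploymentUnits returns one folder per (environment, component) pair.
// Hypothetical helper: it only illustrates the layout, it does not run any tool.
func deploymentUnits(root string, envs, components []string) []string {
	units := make([]string, 0, len(envs)*len(components))
	for _, env := range envs {
		for _, component := range components {
			units = append(units, path.Join(root, env, component))
		}
	}
	return units
}

func main() {
	envs := []string{"dev", "stage", "prod"}
	components := []string{"vpc", "mysql", "frontend"}

	// "live" is an assumed root folder name.
	for _, unit := range deploymentUnits("live", envs, components) {
		fmt.Println(unit)
	}
}
```

Three environments times three components yields nine independent folders, which is the property the layout is after: a change in one folder touches only that environment and component.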

Trang 85

gruntwork-io
└ asg
└ alb
└ ssh

Trang 88

/modules: implementation code, broken

Trang 89

install-xxx: sub-module to install the

Trang 90

run-xxx: sub-module to launch the

Trang 91

xxx-cluster: sub-module to deploy

Trang 92

xxx-yyy: sub-modules with shareable

Trang 93

Each sub-module exposes variables for

Trang 94

Small, configurable sub-modules

Trang 95

As you can combine and compose

Trang 96

/examples: Runnable example code for

Trang 100

Typically, our tests deploy & validate

Trang 101

Key takeaway: build infrastructure

Trang 102


Trang 103

Infrastructure code rots very

Trang 105

Infrastructure code without

Trang 106

For general-purpose languages, we

Trang 107

For infrastructure as code tools,

Trang 109

We write these integration tests in

Trang 110

Terratest philosophy: how

Trang 113

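terraformOptions := &terraform.Options{
    // Path to the Terraform code under test
    TerraformDir: "../examples/vault-with-elb",
}

// Registered first so that terraform destroy runs when the test ends
defer terraform.Destroy(t, terraformOptions)

// Run terraform init and terraform apply to deploy
terraform.InitAndApply(t, terraformOptions)

// Check that the deployed infrastructure works
validateServerIsWorking(t, terraformOptions)

Run terraform init and terraform apply to deploy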
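The snippet on this slide registers the destroy step with `defer` before it deploys anything, so cleanup still runs when a later step fails. Here is a minimal sketch of that ordering using only the Go standard library. Everything in it (`fakeCloud`, `runTest`, `failingValidation`) is a hypothetical stand-in for illustration and is not part of Terratest.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCloud is a hypothetical in-memory stand-in for real infrastructure.
type fakeCloud struct {
	resources []string // what is currently "deployed"
	log       []string // operations in the order they ran
}

func (c *fakeCloud) apply(names ...string) {
	c.resources = append(c.resources, names...)
	c.log = append(c.log, "apply")
}

func (c *fakeCloud) destroy() {
	c.resources = nil
	c.log = append(c.log, "destroy")
}

var errValidation = errors.New("server did not respond")

// failingValidation simulates a validation step that always fails.
func failingValidation(c *fakeCloud) error { return errValidation }

// runTest mirrors the slide's ordering: defer the cleanup, deploy, validate.
func runTest(c *fakeCloud, validate func(*fakeCloud) error) error {
	defer c.destroy() // runs on every return path, including failures
	c.apply("server", "load-balancer")
	return validate(c)
}

func main() {
	c := &fakeCloud{}
	err := runTest(c, failingValidation)
	fmt.Println(err, len(c.resources), c.log)
	// prints: server did not respond 0 [apply destroy]
}
```

Even though validation fails, the log ends with `destroy` and no resources are left behind.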

Trang 114

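terraformOptions := &terraform.Options{
    // Path to the Terraform code under test
    TerraformDir: "../examples/vault-with-elb",
}

// Registered first so that terraform destroy runs when the test ends
defer terraform.Destroy(t, terraformOptions)

// Run terraform init and terraform apply to deploy
terraform.InitAndApply(t, terraformOptions)

// Check that the deployed infrastructure works
validateServerIsWorking(t, terraformOptions)

Validate the infrastructure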

Trang 115

// Get IPs of servers
aws.GetPublicIpsOfEc2Instances(t, ids, region)

// Make HTTP requests in a retry loop
http.GetWithRetry(t, url, 200, expected, retries, sleep)

// Run command over SSH

Terratest has many tools built-in for validation
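Freshly deployed servers usually need some time to boot, which is why the HTTP check on this slide runs in a retry loop. Below is a rough sketch of that idea using only the Go standard library. It is not Terratest's implementation: `getWithRetry`, `get`, and `newFlakyServer` are hypothetical helpers, and the flaky local server merely simulates a service that is still starting up.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// getWithRetry (hypothetical) repeats a GET until the status and body match,
// or until maxRetries retries are used up. It returns the attempts made.
func getWithRetry(url string, wantStatus int, wantBody string, maxRetries int, sleep time.Duration) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		status, body, err := get(url)
		switch {
		case err != nil:
			lastErr = err
		case status != wantStatus || strings.TrimSpace(body) != wantBody:
			lastErr = fmt.Errorf("got status %d and body %q", status, body)
		default:
			return attempt, nil
		}
		if attempt <= maxRetries {
			time.Sleep(sleep) // give the service time to come up
		}
	}
	return maxRetries + 1, fmt.Errorf("gave up after %d attempts: %w", maxRetries+1, lastErr)
}

// get performs one GET and returns the status code and the full body.
func get(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(data), nil
}

// newFlakyServer (hypothetical) answers 503 for the first `failures`
// requests and 200 "Hello, World!" after that.
func newFlakyServer(failures int32) *httptest.Server {
	var seen int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&seen, 1) <= failures {
			http.Error(w, "still booting", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, "Hello, World!")
	}))
}

func main() {
	srv := newFlakyServer(2)
	defer srv.Close()

	attempts, err := getWithRetry(srv.URL, http.StatusOK, "Hello, World!", 5, 10*time.Millisecond)
	fmt.Println(attempts, err) // prints: 3 <nil>
}
```

The first two requests hit the simulated boot phase and fail, and the third succeeds. A real infrastructure test would typically use far longer sleeps and more retries.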

Trang 117

Note: tests create and destroy

Trang 118

Pro tip #1: run tests in completely

Trang 119

Pro tip #2: clean up left-over

Trang 120

(Diagram: test pyramid with unit tests at the base, integration tests in the middle, and e2e tests at the top)

Trang 121

As you go up the pyramid, tests get


Trang 122

How the test pyramid works

Trang 123

Unit tests for infrastructure code: test


Trang 124

Integration tests for infrastructure code:


Trang 125


Trang 126

Note the test times! This is another


Trang 127

Make sure to check out Terratest best

Trang 128

Key takeaway: infrastructure code

Trang 129


Trang 130

Let’s put it all together:

Trang 131

| Task | Description | Example tools |
|---|---|---|
| Security | Encryption in transit (TLS) and on disk, authentication, authorization, secrets management, server hardening. | ACM, EBS Volumes, Cognito, Vault, CIS |
| Monitoring | Availability metrics, business metrics, app metrics, server metrics, events, observability, tracing, alerting. | CloudWatch, DataDog, New Relic, Honeycomb |
| Logs | Rotate logs on disk. Aggregate log data to a central location. | CloudWatch Logs, ELK, Sumo Logic, Papertrail |

Trang 132

2 Write some code

Trang 136

6 Promote that versioned code from

Trang 138

Before…

Trang 139

info@gruntwork.io

Date posted: 30/11/2018, 18:26
