

Page 1

Welcome to Always Bee Tracing!

If you haven’t already, please clone the repository of your choice:


▸ Golang (into your $GOPATH):


git clone git@github.com:honeycombio/tracing-workshop-go.git

▸ Node:


git clone git@github.com:honeycombio/tracing-workshop-node.git

Please also accept your invites to the "Always Bee Tracing" Honeycomb team and our Slack channel.

Page 2

Always Bee Tracing

A Honeycomb Tracing workshop

Page 3

A bit of history

▸ We used to have "one thing" (monolithic application)

▸ Then we started to have "more things" (splitting monoliths into services)

▸ Now we have "yet more things", or even "Death Star" architectures (microservices, containers, serverless)

Page 4

A bit of history

▸ Now we have N² problems (one slow service bogs down everything, etc.)

2010 - Google releases the Dapper paper describing how they improve on existing tracing systems

Key innovations: use of sampling, common client libraries decoupling app code from tracing logic

Page 5

Why should GOOG have all the fun?

2012 - Zipkin was developed at Twitter for use with Thrift RPC

2015 - Uber releases Jaeger (also OpenTracing)

▸ Better sampling story, better client libraries, no Scribe/Kafka

▸ Various proprietary systems abound

2019 - Honeycomb is the best available due to best-in-class queries ;)

Page 6

A word on standards

▸ Standards for tracing exist: OpenTracing, OpenCensus, etc.

Pros: Collaboration, preventing vendor lock-in

Cons: Slower innovation, political battles/drama

▸ Honeycomb has integrations to bridge standard formats with the Honeycomb event model

Page 7

How Honeycomb fits in

Understand how your production systems are behaving, right now

QUERY BUILDER | INTERACTIVE VISUALS | RAW DATA | TRACES | BUBBLEUP + OUTLIERS

BEELINES (AUTOMATIC INSTRUMENTATION + TRACING APIS)

DATA STORE

High Cardinality Data | High Dimensionality Data | Efficient storage

Page 8

Tracing is…

▸ For software engineers who need to understand their code

▸ Better when visualized (preferably first in aggregate)

▸ Best when layered on top of existing data streams (rather than adding another data silo to your toolkit)

Page 10

Instrumentation (and tracing)

Page 11

Our path today

▸ Establish a baseline: send simple events

▸ Customize: enrich with custom fields and extend into traces

▸ Explore: learn to query a collection of traces, to find the most interesting one

Page 12

a third-party dependency

a black-box service

Page 13

EXERCISE: Run the wall service

▸ Go: go run /wall.go

▸ Node: node /wall.js

‣ Open up http://localhost:8080 in your browser and post some messages to your wall

‣ Try writing messages like these:

‣ "hello #test #hashtag"

‣ "seems @twitteradmin isn’t a valid username but @honeycombio is"

Page 14

→ let’s see what we’ve got

Page 15

Go

Page 16

→ let’s see what we’ve got

Page 17

Custom Instrumentation

Identify metadata that will help you isolate unexpected behavior in custom logic:

▸ Bits about your infrastructure (e.g., which host)

▸ Bits about your deploy (e.g., which version/build, which feature flags)

▸ Bits about your business (e.g., which customer, which shopping cart)

▸ Bits about your execution (e.g., payload characteristics, sub-timers)
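
One way to picture the four kinds of metadata above is as extra fields on a single event. The Go sketch below is an illustration only: it uses a plain map rather than the Beeline API, and the `enrich` helper, the event name `write_message`, and the field names `meta.hostname`, `app.build_id`, and `app.message_length` are assumptions, not names taken from the workshop code.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Event is a hypothetical stand-in for one structured event: a flat set of
// key/value fields that would be sent to a tracing backend.
type Event map[string]interface{}

// enrich adds one example field for each kind of metadata listed above.
// All field names here are illustrative assumptions.
func enrich(ev Event, buildID, username, message string, elapsed time.Duration) Event {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	ev["meta.hostname"] = host              // infrastructure: which host
	ev["app.build_id"] = buildID            // deploy: which version/build
	ev["app.username"] = username           // business: which customer
	ev["app.message_length"] = len(message) // execution: payload characteristics
	// execution: a sub-timer, stored in milliseconds
	ev["duration_ms"] = float64(elapsed) / float64(time.Millisecond)
	return ev
}

func main() {
	start := time.Now()
	message := "hello #test #hashtag"
	ev := enrich(Event{"name": "write_message"}, "build-123", "honeycombio", message, time.Since(start))
	fmt.Println(ev["app.username"], ev["app.message_length"], ev["meta.hostname"])
}
```

Keeping every field on the same event means any of them can later be used to filter or break down a query.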

Page 18

EXERCISE: Find Checkpoint 1

Go

Node

Page 19

→ let’s see what we’ve got

Page 20

EVENT ID: B, PARENTID: A

EVENT ID: C, PARENTID: B

TRACE 1

Page 21

EVENT ID: A

EVENT ID: B, PARENTID: A

EVENT ID: C, PARENTID: B

TRACE 1
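
The ID/PARENTID relationship on this slide can be sketched in a few lines of Go. This is a minimal illustration under assumptions: the `Span` type and the `depths` helper are hypothetical, and an empty `ParentID` is used to mark the root event (A).

```go
package main

import "fmt"

// Span is a hypothetical, minimal event shape: an ID plus the ID of its
// parent. An empty ParentID marks the root of the trace.
type Span struct {
	ID       string
	ParentID string
}

// depths returns how far each span sits below the root by following
// ParentID links. Spans whose parent chain is broken or cyclic are omitted.
func depths(spans []Span) map[string]int {
	parent := make(map[string]string, len(spans))
	for _, s := range spans {
		parent[s.ID] = s.ParentID
	}
	out := make(map[string]int, len(spans))
	for _, s := range spans {
		d, id, ok := 0, s.ID, true
		for parent[id] != "" {
			id = parent[id]
			d++
			if _, known := parent[id]; !known || d > len(spans) {
				ok = false
				break
			}
		}
		if ok {
			out[s.ID] = d
		}
	}
	return out
}

func main() {
	trace := []Span{{ID: "A"}, {ID: "B", ParentID: "A"}, {ID: "C", ParentID: "B"}}
	d := depths(trace)
	for _, s := range trace {
		fmt.Printf("%s depth=%d\n", s.ID, d[s.ID])
	}
}
```

The parent links alone are enough to rebuild the tree, which is why each event only needs to carry its own ID and its parent's ID.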

Page 22

EXERCISE: Find Checkpoint 2

‣ Try writing messages like these:

‣ "seems @twitteradmin isn’t a valid username but @honeycombio is"

‣ "have you tried @honeycombio for @mysql #observability?"

Page 23

→ let’s see what we’ve got

Page 24

Our first, simple trace

Page 25

→ let’s see what we’ve got

Page 26

Checkpoint 2 Takeaways

▸ Events can be used to trace across functions within a single service just as easily as across "distributed" services

▸ Store useful metadata on any event in a trace, and query against it!

▸ To aggregate per trace, filter to trace.parent_id does-not-exist (or break down by unique trace.trace_id values)
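
The two per-trace approaches in the last takeaway can be compared with a small Go sketch. It is an illustration under assumptions: the `Event` type, the helper names, and the span names `write_message` and `persist` are hypothetical, and an empty `ParentID` stands in for "trace.parent_id does not exist".

```go
package main

import "fmt"

// Event is a hypothetical, minimal span record. An empty ParentID stands in
// for "trace.parent_id does not exist", i.e. the root span of a trace.
type Event struct {
	TraceID  string
	ParentID string
	Name     string
}

// rootSpans keeps only events without a parent, giving one row per trace.
func rootSpans(events []Event) []Event {
	var roots []Event
	for _, e := range events {
		if e.ParentID == "" {
			roots = append(roots, e)
		}
	}
	return roots
}

// distinctTraces counts unique trace IDs, the "break down by trace ID" route.
func distinctTraces(events []Event) int {
	seen := make(map[string]struct{})
	for _, e := range events {
		seen[e.TraceID] = struct{}{}
	}
	return len(seen)
}

func main() {
	events := []Event{
		{TraceID: "t1", Name: "write_message"},
		{TraceID: "t1", ParentID: "a", Name: "check_twitter"},
		{TraceID: "t2", Name: "write_message"},
		{TraceID: "t2", ParentID: "b", Name: "check_twitter"},
		{TraceID: "t2", ParentID: "b", Name: "persist"},
	}
	fmt.Println(len(rootSpans(events)), distinctTraces(events))
}
```

Both routes count each trace once, as long as every trace has exactly one root span.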

Page 27

EXERCISE: ID sources of latency

Who’s experienced the longest delay when talking to Twitter?

▸ Hint: break down by app.username, calculate MAX(duration_ms), and filter to name = check_twitter

Who’s responsible for the most cumulative time spent talking to Twitter?

▸ Hint: Use SUM(duration_ms) instead
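
For intuition about what these two queries compute, here is a rough Go equivalent of the aggregation. It is a sketch under assumptions, not how Honeycomb executes queries: the `Event` and `Stats` types, the helper name, and the sample users and durations are all hypothetical.

```go
package main

import "fmt"

// Event is a hypothetical flattened span; the fields mirror the columns named
// in the hints (name, app.username, duration_ms).
type Event struct {
	Name       string
	Username   string
	DurationMs float64
}

// Stats holds the two aggregates the exercise asks about.
type Stats struct {
	Max float64
	Sum float64
}

// twitterLatencyByUser filters to name = check_twitter, groups by username,
// and computes MAX and SUM of duration_ms for each group.
func twitterLatencyByUser(events []Event) map[string]Stats {
	out := make(map[string]Stats)
	for _, e := range events {
		if e.Name != "check_twitter" {
			continue
		}
		s, seen := out[e.Username]
		if !seen || e.DurationMs > s.Max {
			s.Max = e.DurationMs
		}
		s.Sum += e.DurationMs
		out[e.Username] = s
	}
	return out
}

func main() {
	events := []Event{
		{"check_twitter", "alice", 120},
		{"check_twitter", "alice", 80},
		{"check_twitter", "bob", 150},
		{"write_message", "bob", 900}, // ignored: not a Twitter call
	}
	for user, s := range twitterLatencyByUser(events) {
		fmt.Printf("%s max=%.0fms sum=%.0fms\n", user, s.Max, s.Sum)
	}
}
```

In this made-up data the two questions have different answers: one user has the single slowest call, while another accumulates more total time.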

Page 28

a third-party dependency

Page 29

EXERCISE: Run the analysis service

▸ Go: go run /analysis.go

▸ Node: node /analysis.js

‣ Open up http://localhost:8080 in your browser and post some messages to your wall

‣ Try these:

‣ "everything is awesome!"

‣ "the sky is dark and gloomy and #winteriscoming"

Page 30

→ let’s see what we’ve got

Page 31

EXERCISE: Find Checkpoint 3

Go

Node

Page 32

→ let’s see what we’ve got

Page 34

Break

Page 35

Mosey back to seats, please :)

Page 36

a third-party dependency

a black-box service

Page 37

EXERCISE: Find Checkpoint 4

Go

Node

Page 38

→ let’s see what we’ve got

Page 39

Checkpoint 4 Takeaways

▸ Working with a black box? Instrument from the perspective of the code you can control

▸ Similar to identifying test cases in TDD: capture fields to let you refine your understanding of the system

Page 40

EXERCISE: Who’s knocking over my black box?

▸ First: what does "knocking over" mean? We know that we talk to our black box via an HTTP call. What are our signals of health?

▸ What’s the "usual worst" latency for this call out to AWS? (Explore different calculations: P95 = 95th percentile, MAX, HEATMAP)

▸ Hint: P95(duration_ms), and request.host contains aws
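
To make the "usual worst" idea concrete, the Go sketch below computes a 95th percentile over calls whose host contains "aws". It is a sketch under assumptions: it uses the simple nearest-rank definition, which may differ from how Honeycomb computes P95, and the `Event` type, helper names, and sample hostnames are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Event is a hypothetical outbound-call record with the two columns used in
// the hint: request.host and duration_ms.
type Event struct {
	Host       string
	DurationMs float64
}

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100).
// The boolean is false when there is nothing to measure.
func percentile(values []float64, p int) (float64, bool) {
	if len(values) == 0 || p < 1 || p > 100 {
		return 0, false
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p/100 * n) in integer math
	return sorted[rank-1], true
}

// p95ForHost filters to hosts containing substr, then takes the P95 duration.
func p95ForHost(events []Event, substr string) (float64, bool) {
	var durations []float64
	for _, e := range events {
		if strings.Contains(e.Host, substr) {
			durations = append(durations, e.DurationMs)
		}
	}
	return percentile(durations, 95)
}

func main() {
	var events []Event
	for i := 1; i <= 20; i++ {
		events = append(events, Event{"service.example.amazonaws.com", float64(i * 10)})
	}
	events = append(events, Event{"api.twitter.com", 5000}) // filtered out
	if p95, ok := p95ForHost(events, "aws"); ok {
		fmt.Printf("P95 = %.0f ms\n", p95)
	}
}
```

Unlike MAX, a P95 ignores the slowest few calls, so it describes the typical bad case rather than a single outlier.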

Page 41

Puzzle Time

Page 42

Scenario #1

Symptoms: we pulled in that last POST in order to persist messages somewhere, but we’re hearing from customer support that behavior has felt buggy lately, like it works sometimes but not always. What’s going on?

Think about:

those failing requests.

▸ response.status_code

▸ request.content_length

▸ HEATMAPs are great :)

Page 43

Scenario #2

Symptoms: everything feels slowed down, but more importantly the persistence behavior seems completely broken. What gives?

Think about:

the bleeding? What might we need to find out to answer that question?

▸ response.status_code

▸ app.username

Page 44

Scenario #3

Symptoms: persistence seems fine, but all requests seem to have slowed down to a snail’s pace. What could be impacting our overall latency so badly?

Prompts:

if you’d like to capture more about the characteristics of your payload

amazonaws.com

▸ response.status_code

▸ request.host contains aws

Page 45

Thank you & Office Hours

Date posted: 12/11/2019, 22:10
