Slide 1: Berkeley RAD Lab:
Research in Internet-Scale Computing Systems
Randy H. Katz, randy@cs.berkeley.edu
28 March 2007
Slide 2: Five-Year Mission
• Observation: Internet systems are complex, fragile, manually managed, and evolving rapidly
– To scale eBay, you must build an eBay-sized company
– To scale YouTube, you get acquired by a Google-sized company
• Mission: enable a single person to create, evolve, and operate the next-generation IT service
– "The Fortune 1 Million," by enabling rapid innovation
• Approach: create core technology spanning systems, networking, and machine learning
• Focus: make the datacenter easier to manage, so that one person can analyze, deploy, and operate a scalable IT service
Slide 3: Jan 07 Announcements by Microsoft and Google
• Microsoft and Google race to build next-gen DCs
– Microsoft announces a $550 million DC in TX
– Google confirms plans for a $600 million site in NC
– Google plans two more DCs in SC that may cost another $950 million, with about 150,000 computers each
• Internet DCs are the next computing platform
• Power availability drives deployment decisions
Slide 4: The Datacenter Is the Computer
• Google program == Web search, Gmail, …
• Google computer == warehouse-sized facilities; such workloads are likely to become more common
– Luiz Barroso's talk at RAD Lab, 12/11/06
• Sun Project Blackbox (10/17/06): compose a datacenter from 20-ft containers!
– Power/cooling for 200 KW
– External taps for electricity, network, cold water
– 250 servers, 7 TB DRAM, or 1.5 PB disk in 2006
– 20% energy savings
– 1/10th? the cost of a building
Slide 5: Datacenter Programming System
• Ruby on Rails: an open-source Web framework optimized for programmer happiness and sustainable productivity
– Convention over configuration
– Scaffolding: automatic, Web-based UI to stored data
– Program the client: write browser-side code in Ruby, compile to JavaScript
– "Duck Typing"/"Mix-Ins"
• Proven expressiveness
– Lines of code, Java vs. RoR: 3:1
– Lines of configuration, Java vs. RoR: 10:1
• More than a fad
– Java on Rails, Python on Rails, …
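The "convention over configuration" bullet above can be sketched in plain Ruby. This is a toy illustration, not Rails itself: the names `ToyRecord`, `Order`, and `attribute` are invented, but the mechanism (deriving the table name from the class name, and generating accessors via metaprogramming) mirrors what ActiveRecord does, which is why no mapping file is needed.

```ruby
# A toy illustration (not real Rails) of convention over configuration.
class ToyRecord
  # Convention: class Order -> table "orders" (naive pluralization).
  def self.table_name
    name.downcase + "s"
  end

  # Metaprogramming: declare attributes once; readers/writers are generated.
  def self.attribute(*names)
    names.each { |n| attr_accessor n }
  end
end

class Order < ToyRecord
  attribute :item, :quantity
end

order = Order.new
order.item = "server rack"
order.quantity = 2
```

The point of the sketch is that `Order` never states its table name or lists getters/setters; both fall out of conventions plus Ruby's open classes.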
Slide 6: Datacenter Synthesis + OS
• Synthesis: change the DC via a written specification
– DC Spec Language compiled to a logical configuration
• OS: allocate, monitor, and adjust during operation
– A Director using machine learning; Drivers send commands
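The slide does not show the DC Spec Language itself, so here is a hypothetical sketch of the compile-to-logical-configuration step as an internal Ruby DSL; `DCSpec`, `tier`, and `compile` are all invented names, not the project's actual interface.

```ruby
# Hypothetical DC spec as an internal Ruby DSL (names invented).
class DCSpec
  attr_reader :tiers

  def initialize(&block)
    @tiers = {}
    instance_eval(&block)  # run the spec block in this object's context
  end

  # Declare a tier: how many replicas, and which image each one runs.
  def tier(name, replicas:, image:)
    @tiers[name] = { replicas: replicas, image: image }
  end

  # "Compile" the spec to a flat logical configuration: one entry per
  # server instance, which a Driver could then act on.
  def compile
    @tiers.flat_map do |name, t|
      (1..t[:replicas]).map { |i| { node: "#{name}-#{i}", image: t[:image] } }
    end
  end
end

spec = DCSpec.new do
  tier :web, replicas: 3, image: "rails-app"
  tier :db,  replicas: 1, image: "mysql"
end

config = spec.compile
```

The design choice the sketch illustrates: the operator writes *what* the datacenter should look like; the compiler expands it into the per-node configuration that the OS layer operates on.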
Slide 7: "System" Statistical Machine Learning
• S2ML strengths
– Handles SW churn: train, rather than hand-write, the logic
– Beyond queuing models: learns how to handle/make policy between steady states
– Beyond control theory: copes with complex cost functions
– Discovery: finds trends, needles in the data haystack
– Exploits cheap processing advances: fast enough to run online
• S2ML as an integral component of the DC OS
Slide 8: Datacenter Monitoring
• S2ML needs data to analyze
• DC components come with sensors already
– CPUs (performance counters)
– Disks (SMART interface)
• Add sensors to software
– Log files
– DTrace on Solaris, Mac OS
• Trace 10K++ nodes within and between DCs
– *Trace: app-oriented path-recording framework
– X-Trace: cross-layer/cross-domain, including the network layer
Slide 9: Middleboxes in Today's DC
• Middleboxes inserted on the physical path
– Policy via plumbing
– Weakest link: a single point of failure and bottleneck
– Expensive to upgrade and to introduce new functionality
• Identity-based Routing Layer: policy, not plumbing, to route classified packets over a high-speed network to the appropriate middlebox services
Slide 10: DC Energy Conservation
• Bringing processor resources on/off-line: dynamic environment, complex cost function, measurement-driven decisions
• Preserve 100% of Service Level Agreements
• Don't hurt hardware reliability
• Then conserve energy
• Conserve energy and improve reliability
– MTTF: the stress of on/off cycles vs. the benefits of off-hours
Slide 11: DC Networking and Power
• Within DC racks, network equipment is often the "hottest" component in the hot spot
• Network opportunities for power reduction
– Transition to higher-speed interconnects (10 Gb/s) at DC scales and densities
– High-function/high-power assists embedded in network elements (e.g., TCAMs)
Slide 12: Thermal Image of Typical Cluster Rack
[Thermal image showing the rack and rack switch]
M. K. Patterson, A. Pratt, P. Kumar, "From UPS to Silicon: An End-to-End Evaluation of Datacenter Efficiency", Intel Corporation
Slide 13: DC Networking and Power
• Selectively power down ports/portions of network elements
• Enhanced power-awareness in the network stack
– Power-aware routing and support for system virtualization
• Support for datacenter "slice" power-down and restart
– Application- and power-aware media access/control
• Dynamic selection of full/half duplex
• Directional asymmetry to save power, e.g., 10 Gb/s send, 100 Mb/s receive
– Power-awareness in applications and protocols
• Hard state (proxying), soft state (caching), protocol/data "streamlining" for power as well as bandwidth reduction
• Power implications for topology design
– Tradeoffs in redundancy/high availability vs. power consumption
– VLAN support for power-aware system virtualization
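As a thought experiment, the "selectively power down ports" idea above might look like the following sketch. The function, the utilization threshold, and the redundancy floor are all invented for illustration; a real controller would also weigh the redundancy/availability tradeoff the last bullet mentions.

```ruby
# Hypothetical sketch: choose which switch ports to power down, given
# per-port utilization, while keeping a redundancy floor of live ports.
def ports_to_power_down(utilization, threshold: 0.05, min_live: 2)
  # Least-used ports are candidates to shut off first.
  candidates = utilization.sort_by { |_, u| u }
                          .select  { |_, u| u < threshold }
                          .map(&:first)
  # Redundancy floor: never leave fewer than min_live ports powered.
  max_off = [utilization.size - min_live, 0].max
  candidates.first(max_off)
end

sample = { "eth0" => 0.80, "eth1" => 0.01, "eth2" => 0.00, "eth3" => 0.02 }
off = ports_to_power_down(sample)
```

With the sample utilizations, the two idlest ports are selected and the redundancy floor keeps the other two alive even though three ports are below the threshold.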
Slide 14: Why University Research?
• Imperative that future technical leaders learn to deal with scale in modern computing systems
• Draw on talented but inexperienced people
– Pick from worldwide talent pool for students & faculty
– Don’t know what they can’t do
• Inexpensive, which allows focus on speculative ideas
– Mostly grad student salaries
– Faculty part time
• Tech Transfer engine
– Success = Train students to go forth and replicate
– Promiscuous publication, including source code
– Ideal launching point for startups
Slide 15: Why a New Funding Model?
• DARPA has exited long-term research in experimental computing systems
• NSF is swamped with proposals, yielding ever more conservative decisions
• Community emphasis on theoretical over experimentally-oriented, systems-building research
• Alternative: turn to industry for funding
– Opportunity to shape the research agenda
Slide 16: New Funding Model
• 30 grad students + 5 undergrads + 6 faculty + 4 staff
• Foundation companies: $500K/yr for 5 years
– Google, Microsoft, Sun Microsystems
– Prefer founding-partner technology in prototypes
– Many people from each company attend retreats, advise on directions, and get a head start on research results
– IP is put in the public domain so partners can use it without being sued
• Large affiliates, $100K/yr: Fujitsu, HP, IBM, Siemens
• Small affiliates, $50K/yr: Nortel, Oracle
• State matching programs add $1M/year: MICRO, Discovery
Slide 17: "DC is the Computer"
– OS: ML+VM; Net: Identity-based Routing; FS: Web Storage
– Prog Sys: RoR; Libraries: Web Services
– Development environment: RAMP (simulator), AWE (tester), Web 2.0 apps (benchmarks)
– Debugging environment: *Trace + X-Trace
• Milestones
– DC Energy Conservation + Reliability Enhancement
– Web 2.0 apps in RoR
Slide 18: Conclusions
• Develop, analyze, deploy, and operate modern systems at Internet scale
– Ruby on Rails for rapid application development
– Declarative datacenter for correct-by-construction system configuration and operation
– Resource management by System Statistical Machine Learning
– Virtual machines and network storage for flexible resource allocation
– Power reduction and reliability enhancement via fast power-down/restart for processing nodes
– Pervasive monitoring, tracing, simulation, and workload generation for runtime analysis/operation
Slide 19: Discussion Points
• Jointly designed datacenter testbed
– Mini-DC consisting of clusters, middleboxes, and network equipment
– Representative network topology
• Power-aware networking
– Evaluation of existing network elements
– Platform for investigating power reduction schemes in network elements
• Mutual information exchange
– Network storage architecture
– System Statistical Machine Learning
Slide 20: Ruby on Rails = DC PL
• Reasons to love Ruby on Rails:
1. Convention over Configuration
• A Rails framework feature enabled by a Ruby language feature (meta-object programming)
2. Scaffolding: an automatic, Web-based, (pedestrian) user interface to stored data
3. Program the client: as of v1.1, write browser-side code in Ruby, then compile to JavaScript
4. "Duck Typing"/"Mix-Ins"
• Looks like a string, responds like a string: it's a string!
• Mix-ins are an improvement over multiple inheritance
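Both ideas in item 4 fit in a few lines of plain Ruby. In this sketch (`RequestID` is an invented class), duck typing lets string interpolation call `to_s` on any object that responds to it, and mixing in the standard `Comparable` module supplies the ordering operators once `<=>` is defined, with no multiple inheritance needed.

```ruby
class RequestID
  include Comparable  # mix-in: supplies <, ==, >, between?, sorting, etc.
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Comparable needs only this one method; the rest comes from the mix-in.
  def <=>(other)
    value <=> other.value
  end

  # Duck typing: anything with to_s can be interpolated like a string.
  def to_s
    format("req-%06d", value)
  end
end

label = "tracing #{RequestID.new(42)}"          # interpolation calls to_s
ids   = [RequestID.new(9), RequestID.new(3)].sort  # sort uses <=> via Comparable
```

No class check ever happens: the interpolation cares only that the object "responds like a string," which is exactly the slide's point.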
Slide 21: DC Monitoring
• Imagine a world where path information is always passed along, so that user requests can always be tracked throughout the system
• Across apps, the OS, network components and layers, different computers on the LAN, …
Slide 22: *Trace: The 1% Solution
• *Trace goal: make path-based analysis low-overhead, so it can be always on inside the datacenter
– "Baseline" path-info collection with ≤ 1% overhead
– Selectively add more local detail for specific requests
• *Trace: an end-to-end path-recording framework
– Capture & timestamp a unique request ID across all system components
– A "top-level" log contains the path traces
– Local logs contain additional detail, correlated to the path ID
– Built on X-Trace
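The request-ID mechanism above can be sketched in a few lines of Ruby. This is not the *Trace implementation: the component names and the single in-memory log are invented, but they show the core idea of stamping a unique ID at the front door and correlating per-component events by that ID.

```ruby
require "securerandom"

PATH_LOG = []  # the "top-level" path log, shared by all components

# Each component appends a timestamped entry keyed by the request ID.
def record(request_id, component, event)
  PATH_LOG << { id: request_id, component: component, event: event, at: Time.now }
end

def handle_request
  request_id = SecureRandom.uuid  # unique ID stamped once, at the front door
  record(request_id, :load_balancer, :received)
  record(request_id, :app_server,    :rendered)
  record(request_id, :database,      :queried)
  request_id
end

id = handle_request
# Path-based analysis: reconstruct one request's path from the shared log.
path = PATH_LOG.select { |e| e[:id] == id }.map { |e| e[:component] }
```

Because every entry carries the ID, the path survives interleaving with other requests' entries, which is what makes an always-on, low-detail baseline log useful.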
Slide 23: X-Trace: Comprehensive Tracing through Layers, Networks, Apps
• Trace connectivity of distributed components
– Capture causal connections between requests and responses
• Cross-layer
– Includes network and middleware services such as IP and LDAP
• Cross-domain
– Multiple datacenters, composed services, overlays, mash-ups
– Control to individual administrative domains
• A "network path" sensor
– Puts individual requests/responses, at different network layers, in the context of an end-to-end request
Slide 24: Actuator: Policy-Based Routing Layer
• Assign an ID to incoming packets (hash + table lookup)
• Route based on IDs, not locations (i.e., not IP addresses)
– Sets up logical paths without changing the network topology
• A set of common middleboxes gets a single ID
– No single weakest link: robust, scalable throughput
[Diagram: an Identity-based Routing Layer directing packets tagged (IDF, IDLB) or (IDID, IDS) to the Firewall (IDF), Load Balancer (IDLB), Intrusion Detection (IDID), and Service (IDS)]
• So simple it could be done in an FPGA?
• More general than MPLS
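The "hash + table lookup" classification and ID-based forwarding can be sketched as follows. The IDs match the slide's diagram, but the policy table is invented, and a simple destination-port test stands in for the real classifier (which would hash packet headers into a table).

```ruby
# Routing table keyed by middlebox ID, not by network location (IP).
ROUTES = { "IDF"  => :firewall,
           "IDLB" => :load_balancer,
           "IDID" => :intrusion_detection,
           "IDS"  => :service }

# Policy table: which sequence of IDs each packet class receives.
POLICY = { web:   ["IDF", "IDLB"],   # web traffic: firewall, then load balancer
           other: ["IDID", "IDS"] }  # everything else: intrusion detection first

# Stand-in classifier; a real one would hash header fields into a table.
def classify(packet)
  packet[:dst_port] == 80 ? POLICY[:web] : POLICY[:other]
end

# Forwarding: resolve each ID in the packet's policy to a middlebox.
def route(packet)
  classify(packet).map { |id| ROUTES[id] }
end

web_hops   = route({ dst_port: 80 })
other_hops = route({ dst_port: 22 })
```

Changing policy here means editing a table entry, not re-plumbing the physical path, which is the slide's "policy, not plumbing" point.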
Slide 25: Other RAD Lab Projects
• Research Accelerator for MP (RAMP)
Slide 26: 1st Milestone: DC Energy Conservation
• A good match for machine learning
– An optimization, so imperfection is not catastrophic
– Lots of data to measure, a dynamically changing workload, a complex cost function
• Not steady state, so not queuing theory
• PG&E is trying to change the behavior of datacenters
• Properly stated, the problem is:
1. Preserve 100% of Service Level Agreements
2. Don't hurt hardware reliability
3. Then conserve energy
• Radical idea: can conserving energy improve hardware reliability?
Slide 27: 1st Milestone: Conserve Energy & Improve Reliability
• Improve component reliability?
• Disks: lifetimes are measured in powered-on hours, but limited to 50,000 start/stop cycles
• Idea: if disks are turned off 50% of the time, the annual failure rate is ≈ 50% of what it would otherwise be, as long as we don't exceed 50,000 start/stop cycles (≈ once per hour)
• Integrated circuits: lifetimes are affected by thermal cycling (fast change is bad), electromigration (turning off helps), and dielectric breakdown (turning off helps)
• Idea: if the number of thermal cycles is limited, could the IC failure rate due to EM and DB be cut by ≈ 30%?
See "A Case for Adaptive Datacenters to Conserve Energy and Improve Reliability"
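The start/stop budget above implies a concrete lifetime, worked out here: cycling once per hour consumes the 50,000-cycle rating in roughly 5.7 years.

```ruby
CYCLE_BUDGET    = 50_000    # rated start/stop cycles per disk (from the slide)
CYCLES_PER_YEAR = 24 * 365  # cycling once per hour, every hour

# Years until the start/stop budget is exhausted at one cycle per hour.
years_of_budget = CYCLE_BUDGET.fdiv(CYCLES_PER_YEAR)
```

Since 8,760 cycles per year is well under the budget over a typical disk's service life, hourly power cycling stays within the rating while halving powered-on hours.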
Slide 28: RAD Lab 2.0, 2nd Milestone: Killer Web 2.0 Apps
• Demonstrate the RAD Lab vision of one person creating the next great service and scaling it up
• Where do we get example great apps, given that grad students are busy creating the technology?
• Use "undergraduate computing clubs" to create exciting apps in RoR using RAD Lab equipment and technology
– Armando Fox is the RoR club leader
– Recruited a real-world RoR programmer to develop code and advise the RoR computing club
– ≈30 students joined the club in Jan 2007
– Hire the best undergrads to build RoR apps in the RAD Lab
Slide 29: Miracle of University Research
• Talented (inexperienced) people
– Pick from worldwide talent pool for students & faculty
– Don’t know what they can’t do
• Inexpensive
– Mostly grad student salaries ($50k-$75k/yr overhead)
– Faculty part time ($75k-$100k/yr including overhead)
• Berkeley & Stanford Swing for Fences (R, not r or D)
• Even if hit a single, train next generation of leaders
• Technology Transfer engine
– Success = Train students to go forth & multiply
– Publish everything, including source code
– Ideal launching point for startups
Slide 30: Chance to Partner with a Great University
• A chance to work on the "Next Great Thing"
• US News & World Report ranking of CS systems programs: 1. Berkeley, 2. CMU, 2. MIT, 4. Stanford
• Berkeley & Stanford are among the top suppliers of systems students to industry (and academia)
• A National Academy study mentions Berkeley in 7 of the 19 $1B+ industries that grew out of IT research, Stanford 4 times
– Timesharing (SDS 940), client-server computing (BSD Unix), VLSI design (SPICE), …
Slide 31: Years to a > $1B IT Industry from Research Start
[Chart]
Slide 32: The Physical RAD Lab
• Communication is inversely proportional to distance
– It almost never happens at > 100 feet or on a different floor
• Everyone (including faculty) sits in open offices
• Great meeting rooms, ubiquitous whiteboards
• Technology to concentrate: cell phone, iPod, laptop
• Google "Physical RAD Lab" to learn more
Slide 33: Example of the Next Great Thing
• Berkeley Reliable Adaptive Distributed Systems Laboratory ("RAD Lab")
– Founded 12/2005, with Google, Microsoft, and Sun as founding partners
– Armando Fox, Randy Katz, Mike Jordan, Anthony Joseph, Dave Patterson, Scott Shenker, Ion Stoica
– Google "RAD Lab" to learn more
Slide 34: RAD Lab Goal: Enable the Next "eBay"
• Create technology that enables the next great Internet service to grow rapidly without growing the organization rapidly
– Machine learning + systems is the secret sauce
• Position: "The datacenter is the computer"
– The leverage point is simplifying datacenter management
• What is the programming language of the datacenter?
• What is CAD for the datacenter?
• What is the OS for the datacenter?