Special Topic: Next-Generation Networks
Shortest-Path &
Adaptive Routing
Prométhée Spathis (promethee.spathis@lip6.fr)
Thème NPA, LIP6, Paris, France
– Reacting quickly to alleviate congestion
– Avoiding over-reacting and causing oscillations
– Limiting bandwidth & CPU overhead on routers
• Load-sensitive routing
– Routers adapt to link load in a distributed fashion
– At the packet level, or on “group of packets”
• Traffic engineering
– Centralized computation of routing parameters
– Network-wide measurements of offered traffic
IP Addressing
• 32-bit number in dotted-quad notation (12.34.158.5)
• Divided into network & host portions (left and right)
• 12.34.158.0/24 is a 24-bit prefix with 2^8 = 256 addresses
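The prefix arithmetic above can be checked with a short sketch. It relies on Python's standard `ipaddress` module; the variable names are illustrative and do not come from the slides.

```python
import ipaddress

net = ipaddress.ip_network("12.34.158.0/24")
addr = ipaddress.ip_address("12.34.158.5")

# A /24 prefix leaves 32 - 24 = 8 host bits, hence 2**8 addresses.
host_bits = 32 - net.prefixlen
size_matches = net.num_addresses == 2 ** host_bits

# Split the 32-bit address into its network (left) and host (right) portions.
as_int = int(addr)
network_part = as_int >> host_bits
host_part = as_int & ((1 << host_bits) - 1)

print(size_matches, network_part, host_part, addr in net)
```

The shift and mask make the "left and right" split explicit: the upper 24 bits identify the network and the lower 8 bits identify the host.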
Some History: Why Dotted-Quad Notation?
• In the older days…
Reserved for future use (sounds a bit scary…)
• And then, address space became scarce…
IP Address != Host Machine
• Dynamic IP address assignment (DHCP)
– Single client may have multiple addresses over time
– Address may correspond to multiple clients over time
• Shared machines
– Multiple users on a shared compute server
– Transfers traveling through proxies and firewalls
– Multiple Web sites hosted on a single machine
• Replicated sites
– Multiple machines hosting a single (popular) Web site
• Addresses do not correspond to geographic location
– Similar prefix does not necessarily imply nearby hosts
– Single prefix may span hosts in large geographic region
• Source IP address may be spoofed (e.g., DoS attack)
Addresses Lifetime and Scope
[Figure: address types (IP, FQDN, DHCP, NAT, port/protocol numbers) placed by scope (global or private) and update frequency]
Challenges for Internet routing
scale: with 200 million destinations:
• can’t store all dest’s in routing tables!
• routing table exchange would swamp links!
administrative autonomy
• internet = network of networks
• each network admin may want to control routing in its own network
Routing study thus far - idealization
– all routers identical
– network “flat”
… not true in practice
Use two 32-bit numbers to represent a network
Network number = IP address + Mask
Usually written as 12.4.0.0/15
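As a minimal sketch of the "address + mask" representation, the helpers below (hypothetical names, not from the slides) build the 32-bit mask for a prefix length and test membership with a bitwise AND.

```python
def to_int(dotted):
    """Convert dotted-quad text such as '12.4.0.0' to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value):
    """Convert a 32-bit integer back to dotted-quad text."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def prefix_mask(prefix_len):
    """Mask with the top `prefix_len` bits set (the second 32-bit number)."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF


def in_network(address, network, prefix_len):
    """True if `address` shares the first `prefix_len` bits with `network`."""
    mask = prefix_mask(prefix_len)
    return (to_int(address) & mask) == (to_int(network) & mask)


print(to_dotted(prefix_mask(15)))
print(in_network("12.5.200.1", "12.4.0.0", 15))
print(in_network("12.6.0.1", "12.4.0.0", 15))
```

The sample addresses are made up for illustration; only 12.4.0.0/15 appears in the slides.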
Classless Inter-Domain Routing (CIDR)
CIDR = Hierarchy in Address Allocation
• Prefixes are key to Internet scalability
– Address allocation by ARIN/RIPE/APNIC and by ISPs
– Routing protocols and packet forwarding based on prefixes
– Today, routing tables contain ~150,000-200,000 prefixes
[Figure: prefix hierarchy under 12.0.0.0/8]
12.0.0.0/8
– 12.0.0.0/16, 12.1.0.0/16, 12.2.0.0/16, 12.3.0.0/16, …, 12.253.0.0/16, 12.254.0.0/16
– 12.3.0.0/16 splits into 12.3.0.0/24, 12.3.1.0/24, …, 12.3.254.0/24
– 12.253.0.0/16 splits into 12.253.0.0/19, 12.253.32.0/19, 12.253.64.0/19, 12.253.96.0/19, 12.253.128.0/19, 12.253.160.0/19, 12.253.192.0/19, …
Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing information:
[Figure: Fly-By-Night-ISP serves Organization 0, Organization 1, … with prefixes 200.23.16.0/23, 200.23.18.0/23, …, 200.23.30.0/23 and advertises “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us advertises “Send me anything with addresses beginning 199.31.0.0/16”]
[Figure: same topology, but ISPs-R-Us now advertises “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”. Organization 2 has 200.23.20.0/23]
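A small sketch of the aggregation example, using Python's standard `ipaddress` module. The `ADVERTISEMENTS` table and the `choose_isp` helper are illustrative assumptions that mirror the two advertisements quoted in the figures; they are not part of the slides.

```python
import ipaddress

# The /20 block splits into eight /23 customer prefixes.
block = ipaddress.ip_network("200.23.16.0/20")
customers = list(block.subnets(new_prefix=23))

# Going the other way, the eight /23s collapse into one advertisement.
aggregated = list(ipaddress.collapse_addresses(customers))

# Advertisements after one /23 is also announced by the other ISP.
ADVERTISEMENTS = {
    "Fly-By-Night-ISP": ["200.23.16.0/20"],
    "ISPs-R-Us": ["199.31.0.0/16", "200.23.18.0/23"],
}


def choose_isp(destination):
    """Pick the ISP whose advertised prefix matches with the longest length."""
    dest = ipaddress.ip_address(destination)
    best_isp, best_len = None, -1
    for isp, prefixes in ADVERTISEMENTS.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if dest in net and net.prefixlen > best_len:
                best_isp, best_len = isp, net.prefixlen
    return best_isp


print(len(customers), aggregated)
print(choose_isp("200.23.18.7"), choose_isp("200.23.20.1"))
```

The more specific /23 wins over the covering /20, which is why the second advertisement pulls that one block's traffic toward ISPs-R-Us without changing the aggregate.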
Longest Prefix Match Forwarding
[Figure: forwarding table holding prefixes 4.0.0.0/8, 4.83.128.0/17, 201.10.0.0/21, 201.10.6.0/23, and 126.255.103.0/24; a packet with destination 201.10.6.17 leaves on outgoing link Serial0/0.1]
• Forwarding tables in IP routers
– Maps each IP prefix to next-hop link(s)
• Destination-based forwarding
– Packet has a destination address
– Router identifies longest-matching prefix
A Simple Algorithm
• Scan the forwarding table one entry at a time
– See if the destination matches the entry
– If so, check the size of the mask for the prefix
– Keep track of the entry with longest-matching prefix
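The linear scan described above could look like the following sketch. The prefixes are the ones shown in the forwarding-table figure; the outgoing link names (`link-a` and so on) are placeholders, since the slides do not say which link each entry uses.

```python
import ipaddress

# (prefix, outgoing link) pairs; link names are hypothetical.
FORWARDING_TABLE = [
    ("4.0.0.0/8", "link-a"),
    ("4.83.128.0/17", "link-b"),
    ("201.10.0.0/21", "link-c"),
    ("201.10.6.0/23", "link-d"),
    ("126.255.103.0/24", "link-e"),
]


def longest_prefix_match(destination, table):
    """Scan every entry and keep the matching one with the longest mask."""
    dest = ipaddress.ip_address(destination)
    best_entry, best_len = None, -1
    for prefix, link in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and net.prefixlen > best_len:
            best_entry, best_len = (prefix, link), net.prefixlen
    return best_entry


print(longest_prefix_match("201.10.6.17", FORWARDING_TABLE))
```

Destination 201.10.6.17 matches both 201.10.0.0/21 and 201.10.6.0/23, and the scan keeps the /23 because its mask is longer. Every lookup touches every entry, so the cost grows with the table size.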
Simplest Algorithm is Too Slow
• Overhead is linear in size of the forwarding table
– Today, that means 150,000-200,000 entries!
– And the router may have just a few nanoseconds before the next packet arrives
• Need greater efficiency to keep up with line rate
• Patricia tree is faster than linear scan
– Proportional to number of bits in the address
• Patricia tree can be made faster
– Can make a k-ary tree
E.g., 4-ary tree with four children (00, 01, 10, and 11)
– Faster lookup, though requires more space
• Can use special hardware
– Content Addressable Memories (CAMs)
– Allows look-ups on a key rather than flat address
• Huge innovations in the mid-to-late 1990s
– After CIDR was introduced (in 1994)
– … and longest-prefix match was a major bottleneck
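To illustrate why a tree lookup is proportional to the number of address bits rather than to the table size, here is a sketch of a plain binary trie for IPv4. It is a simplification: a real Patricia tree also compresses chains of single-child nodes, which this version does not do. Class and method names are illustrative.

```python
import ipaddress


class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0 and bit 1
        self.link = None              # outgoing link if a prefix ends here


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, link):
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.link = link

    def lookup(self, destination):
        """Walk at most 32 bits, remembering the last prefix seen on the way."""
        bits = int(ipaddress.ip_address(destination))
        node = self.root
        best = node.link
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.link is not None:
                best = node.link
        return best


trie = BinaryTrie()
trie.insert("201.10.0.0/21", "link-c")
trie.insert("201.10.6.0/23", "link-d")
print(trie.lookup("201.10.6.17"))
```

A k-ary variant would consume several bits per step (two bits for a 4-ary tree), trading extra memory per node for fewer steps.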
[Figure: routers (including R1 and R2) connecting networks A to E; each router defaults to its upstream router. Two forwarding tables are shown:]
Net      Nxt Hop
A        R2
B        R2
C        Direct
D        R5
E        R5
default  R2

Net      Nxt Hop
A        R1
B        Direct
C        R3
D        R1
E        R3
default  R1
Forwarding: determine next hop
Routing: establish end-to-end paths
Forwarding always works
Routing can be badly broken
• Forwarding: data plane
– Directing a data packet to an outgoing link
– Individual router using a forwarding table
• Routing: control plane
– Computing the paths the packets will follow
– Routers talking amongst themselves
– Individual router creating a forwarding table
[Figure: host and router network-layer functions, including the forwarding table]
How Are Forwarding Tables Populated to Implement Routing?
Static configuration: administrator manually configures forwarding table entries
+ More control
+ Not restricted to destination-based forwarding
- Doesn’t scale
- Slow to adapt to network failures
Dynamic routing protocols:
+ Can rapidly adapt to changes in network topology
+ Can be made to scale well
- Complex distributed algorithms
- Consume CPU, bandwidth, memory
- Debugging can be difficult
- Current protocols are destination-based
In practice: a mix of these. Static routing mostly at the “edge”.
Data, Control, and Management Planes
            Data Plane                Control Plane            Management Plane
Timescale   Packet (nsec)             Event (10 msec to sec)   Human (min to hours)
Tasks       Forwarding, buffering,    Routing, signaling       Analysis, configuration
            filtering, scheduling
– Routers/links managed by a single “institution”
– Service provider, company, university, etc.
[Figure: AS 1 and AS 2 interconnected by BGP, with OSPF and EIGRP running inside the ASes]
EGP = Exterior Gateway Protocol
– Policy based: BGP
– The routing domain of BGP is the entire Internet
IGP = Interior Gateway Protocol
– Metric based: OSPF, IS-IS, RIP, EIGRP (Cisco)
[Figure: LIP6 network, with a path to the egress point]
Inter-AS routing (Border Gateway Protocol) determines AS path and egress point
Interconnected ASes
• Forwarding table is configured by both intra- and inter-AS routing algorithms
– Intra-AS sets entries for internal dests
– Inter-AS & intra-AS set entries for external dests
[Figure: routers 1d, 2a, 2b, 2c in interconnected ASes, and a forwarding table]
The Gang of Four
IGP: RIP, IS-IS, OSPF
EGP: BGP
Link-state protocols:
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
Vectoring protocols:
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP
Shortest-Path Routing
• Path-selection model
– Destination-based
– Minimum hop count or sum of link weights
– Dynamic vs. static link weights (static weights mean load-insensitive routing)
[Figure: example network graph with link weights]
Link-State Routing: Dijkstra’s Algorithm
• Each router keeps track of its incident links
– Link cost, and whether the link is up or down
• Each router broadcasts the link state
– To give every router a complete view of the graph
• Each router runs Dijkstra’s algorithm
– To compute shortest paths and forwarding table
[Figure: example network graph with link weights]
E.g., OSPF and IS-IS
Dijkstra’s Shortest-Path Algorithm
• Iterative algorithm
– After k iterations, know least-cost path to k nodes
• S: nodes whose least-cost path is definitively known
– Initially, S = {u} where u is the source node
– Add one node to S in each iteration
• D(v): current cost of path from source to node v
– Initially, D(v) = c(u,v) for all nodes v adjacent to u
– … and D(v) = ∞ for all other nodes v
– Continually update D(v) as shorter paths are learned
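The iteration above can be sketched as follows. This version uses a binary heap to pick the next node to add to S, and it also records the first hop toward each destination, which is what a forwarding table needs. The example graph and its weights are made up for illustration; they are not the graph from the slides.

```python
import heapq


def dijkstra(graph, source):
    """Return (least costs, first hops) from `source`.

    graph: dict mapping node -> {neighbor: link cost}.
    """
    dist = {source: 0}          # D(v): best cost found so far
    first_hop = {}              # forwarding entry: destination -> next hop
    done = set()                # S: nodes whose least cost is final
    heap = [(0, source, None)]  # (cost, node, first hop used to reach it)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue            # stale entry for a node already in S
        done.add(node)          # one node joins S per iteration
        if hop is not None:
            first_hop[node] = hop
        for neighbor, cost in graph[node].items():
            new_cost = d + cost
            if neighbor not in done and new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Leaving the source, the first hop is the neighbor itself.
                next_hop = neighbor if node == source else hop
                heapq.heappush(heap, (new_cost, neighbor, next_hop))
    return dist, first_hop


# Hypothetical four-node topology.
GRAPH = {
    "u": {"v": 2, "w": 5},
    "v": {"u": 2, "w": 1, "z": 7},
    "w": {"u": 5, "v": 1, "z": 3},
    "z": {"v": 7, "w": 3},
}
distances, forwarding = dijkstra(GRAPH, "u")
print(distances, forwarding)
```

In this sample graph, u reaches w more cheaply through v (cost 3) than over the direct link (cost 5), so every entry in u's forwarding table points to v.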
• Each router keeps track of its incident links
– The cost on the link
• Each router broadcasts the link state
– To give every router a complete view of the graph
• Each router runs Dijkstra’s algorithm
– To compute the shortest paths
– … and construct the forwarding table
• Example protocols
– Open Shortest Path First (OSPF)
– Intermediate System – Intermediate System (IS-IS)
Detecting Topology Changes
• Beaconing
– Periodic “hello” messages in both directions
– Detect a failure after a few missed “hellos”
• Performance trade-offs
– Detection speed
– Overhead on link bandwidth and CPU
– Likelihood of false detection
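A minimal sketch of hello-based failure detection, to make the trade-off concrete. The default interval of 10 seconds and the limit of 3 missed hellos are assumed values chosen for illustration, not figures from the slides.

```python
def neighbor_is_down(last_hello, now, hello_interval=10.0, missed_limit=3):
    """Declare the neighbor down once `missed_limit` hellos in a row are overdue."""
    return now - last_hello > hello_interval * missed_limit


def worst_case_detection_delay(hello_interval, missed_limit):
    """Longest time a failure can go unnoticed with these timer settings."""
    return hello_interval * missed_limit


def hellos_per_minute(hello_interval):
    """Per-link, per-direction message overhead."""
    return 60.0 / hello_interval


for interval in (10.0, 1.0):
    print(interval, worst_case_detection_delay(interval, 3), hellos_per_minute(interval))
```

Shrinking the interval cuts detection delay and raises message overhead in the same proportion, and a lower missed-hello limit makes a false detection more likely when a few hellos are merely lost or delayed.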
– Ensure all nodes receive link-state information
– … and that they use the latest version
– Time-to-live for each packet
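The following sketch shows one way the two requirements above can be met. It assumes each link-state packet carries a sequence number so that a node can recognize "the latest version"; the slides mention only the latest version and a time-to-live, so the sequence number, the function name, and the sample topology are all assumptions.

```python
from collections import deque


def flood(neighbors, origin, seq, ttl, databases):
    """Flood a link-state packet (origin, seq) and return the transmission count.

    neighbors: dict node -> list of adjacent nodes.
    databases: dict node -> {origin: highest sequence number seen}.
    """
    databases[origin][origin] = seq
    queue = deque((n, origin, ttl) for n in neighbors[origin])
    transmissions = 0
    while queue:
        node, sender, hops_left = queue.popleft()
        transmissions += 1
        if databases[node].get(origin, -1) >= seq:
            continue  # old or duplicate copy: drop it, do not re-flood
        databases[node][origin] = seq
        if hops_left > 1:  # time-to-live bounds how far a copy can travel
            for n in neighbors[node]:
                if n != sender:
                    queue.append((n, node, hops_left - 1))
    return transmissions


# Hypothetical topology: a triangle a-b-c with d hanging off c.
NEIGHBORS = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
dbs = {n: {} for n in NEIGHBORS}
sent = flood(NEIGHBORS, "a", seq=1, ttl=4, databases=dbs)
print(sent, dbs)
```

Duplicates arriving around the triangle are dropped because their sequence number is not newer than the stored one, which keeps the flood from circulating forever.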
When to Initiate Flooding
Convergence
• Getting consistent routing information to all nodes
– E.g., all nodes having the same link-state database
• Consistent forwarding after convergence
– All nodes have the same link-state database
– All nodes forward packets on shortest paths
– The next router on the path forwards to the next hop
Transient Disruptions
• Inconsistent link-state database
– Some routers know about failure before others
– The shortest paths are no longer consistent
– Can cause transient forwarding loops
Convergence Delay
• Sources of convergence delay
– Detection latency
– Flooding of link-state information
– Shortest-path computation
– Creating the forwarding table
• Performance during convergence period
– Lost packets due to blackholes and TTL expiry
– Looping packets consuming resources
– Out-of-order packets reaching the destination
• Very bad for VoIP, online gaming, and video
Reducing Convergence Delay
• Faster detection
– Smaller hello timers
– Link-layer technologies that can detect failures
• Faster flooding
– Flooding immediately
– Sending link-state packets with high priority
• Faster computation
– Faster processors on the routers
– Incremental Dijkstra algorithm
• Faster forwarding-table update
– Data structures supporting incremental updates
Traffic Engineering: Tuning Link Weights
– Balanced load, low latency, service agreements
• Question: Given topology and traffic matrix, which link weights to use?
Traffic Engineering: Key Ingredients of Approach
• Instrumentation
– Topology: monitoring of the routing protocols
– Traffic matrix: fine-grained traffic measurement
• Network-wide models
– Representations of topology and traffic
– “What-if” models of shortest-path routing
• Network optimization
– Efficient algorithms to find good configurations
– Operational experience to identify key constraints
Scalability: Overhead of Link-State Protocols
• Protocol overhead depends on the topology
– Bandwidth: flooding of link state advertisements
– Memory: storing the link-state database
– Processing: computing the shortest paths
Scalability: Improving the Scaling Properties
• Dijkstra’s shortest-path algorithm
– Simplest version: O(N^2), where N is # of nodes
– Better algorithms: O(L*log(N)), where L is # links
– Incremental algorithms: great for small changes
• Timers to pace operations
– Minimum time between LSAs for the same link
– Minimum time between path computations
• More resources on the routers
– Routers with more CPU and memory
Scaling Link-State Routing
• Overhead of link-state routing
– Flooding link-state packets throughout the network
– Running Dijkstra’s shortest-path algorithm
• Introducing hierarchy through “areas”
Distance Vector Routing: Bellman-Ford
• Define distances at each node x
– dx(y) = cost of least-cost path from x to y
• Update distances based on neighbors
– dx(y) = min {c(x,v) + dv(y)} over all neighbors v
[Figure: example graph with weighted links and nodes u, v, w, x, y]
Distance Vector Algorithm
• c(x,v) = cost for direct link from x to v
– Node x maintains costs of direct links c(x,v)
• Dx(y) = estimate of least cost from x to y
– Node x maintains distance vector Dx = [Dx(y): y ∈ N]
• Node x maintains its neighbors’ distance vectors
– For each neighbor v, x maintains Dv = [Dv(y): y ∈ N]
• Each node v periodically sends Dv to its neighbors
– And neighbors update their own distance vectors
– Dx(y) ← min over neighbors v of {c(x,v) + Dv(y)} for each node y ∈ N
• Over time, the distance vector Dx converges
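The update rule above can be simulated with a short sketch. For simplicity it runs in synchronous rounds (every node recomputes from the vectors its neighbors advertised in the previous round), whereas real nodes update asynchronously. The three-node topology and its costs are invented for illustration.

```python
import math


def distance_vector(links):
    """Iterate the distance-vector update until no estimate changes.

    links: dict mapping (a, b) -> cost of the undirected link a-b.
    Returns D, where D[x][y] is x's estimate of the least cost to y.
    """
    cost = {}
    for (a, b), c in links.items():
        cost.setdefault(a, {})[b] = c
        cost.setdefault(b, {})[a] = c
    nodes = sorted(cost)
    # Initially each node knows only the cost of its direct links.
    D = {x: {y: 0 if x == y else cost[x].get(y, math.inf) for y in nodes}
         for x in nodes}
    changed = True
    while changed:
        changed = False
        advertised = {x: dict(D[x]) for x in nodes}  # vectors sent last round
        for x in nodes:
            for y in nodes:
                if x == y:
                    continue
                # Dx(y) <- min over neighbors v of c(x,v) + Dv(y)
                best = min(cost[x][v] + advertised[v][y] for v in cost[x])
                if best < D[x][y]:
                    D[x][y] = best
                    changed = True
    return D


LINKS = {("x", "y"): 2, ("y", "z"): 1, ("x", "z"): 7}
table = distance_vector(LINKS)
print(table["x"])
```

Node x starts out believing z costs 7 over the direct link, then learns from y's vector that going through y costs only 3.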
Distance Vector Algorithm
Each local iteration is caused by:
• Local link cost change
• Distance vector update message from neighbor
Distributed:
• Each node notifies neighbors only when its DV changes
• Neighbors then notify their neighbors if necessary
wait for (change in local link cost or message from neighbor)
[Figure: distance vector example showing optimum 1-hop paths]
[Figure: Distance Vector Example: Step 2]
Distance Vector: Link Cost Changes
Link cost changes:
• Node detects local link cost change
• Updates the distance table
• If cost change in least cost path, notify neighbors
[Figure: three-node example (node Y shown) with link costs 1, 4, and 50; one link cost changes to 1]
algorithm terminates
“good news travels fast”
Distance Vector: Link Cost Changes
Link cost changes:
• Good news travels fast
• Bad news travels slow - “count to infinity”
[Figure: three-node example (node Y shown) with link costs 1, 4, and 50; one link cost changes to 60]
algorithm continues on!
Distance Vector: Poison Reverse
If Z routes through Y to get to X :
• Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
• Still, can have problems when more than 2 routers are involved
[Figure: nodes X, Y, Z with link costs 1, 4, and 50; one link cost changes to 60]
algorithm terminates
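The effect of poison reverse can be seen in a small simulation. It assumes the figure's costs are c(X,Y) = 4, c(Y,Z) = 1, and c(X,Z) = 50, and that c(X,Y) then rises to 60; the slides show only the numbers, so this assignment of costs to links is an assumption. Y and Z take turns recomputing their distance to X.

```python
import math


def converge(cost_xy, cost_yz, cost_xz, poison_reverse):
    """Let Y and Z alternately recompute their distance to X until stable.

    Starts from the state before the change: Y reaches X directly at cost 4
    and Z reaches X through Y at cost 5 (assumed initial values).
    Returns (Y's distance, Z's distance, rounds in which something changed).
    """
    dy, dz = 4, 5
    y_via_z, z_via_y = False, True
    rounds = 0
    while True:
        # With poison reverse, Z advertises infinity to Y while routing via Y.
        from_z = math.inf if (poison_reverse and z_via_y) else dz
        new_dy = min(cost_xy, cost_yz + from_z)
        new_y_via_z = cost_yz + from_z < cost_xy
        # Symmetrically, Y poisons its advertisement to Z while routing via Z.
        from_y = math.inf if (poison_reverse and new_y_via_z) else new_dy
        new_dz = min(cost_xz, cost_yz + from_y)
        new_z_via_y = cost_yz + from_y < cost_xz
        new_state = (new_dy, new_dz, new_y_via_z, new_z_via_y)
        if new_state == (dy, dz, y_via_z, z_via_y):
            return dy, dz, rounds
        dy, dz, y_via_z, z_via_y = new_state
        rounds += 1


slow = converge(60, 1, 50, poison_reverse=False)
fast = converge(60, 1, 50, poison_reverse=True)
print("without poison reverse:", slow)
print("with poison reverse:   ", fast)
```

Both runs end with the same distances, but without poison reverse Y and Z keep raising their estimates a little at a time while packets bounce between them, which is the count-to-infinity behaviour.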
Routing Information Protocol (RIP)
• Distance vector protocol
– Nodes send distance vectors every 30 seconds
– … or, when an update causes a change in routing
• Link costs in RIP
– All links have cost 1
– Valid distances of 1 through 15
– … with 16 representing infinity
– Small “infinity” means a smaller “counting to infinity” problem
• RIP is limited to fairly small networks
– E.g., used in the Princeton campus network
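A rough sketch of why a small infinity bounds the damage. It models two routers, A and B, after A loses its route to some destination while B still holds a stale route through A, with no poison reverse. The scenario and function names are illustrative assumptions; only the hop cost of 1 and the value 16 come from the slides.

```python
RIP_INFINITY = 16


def rip_add_hop(metric):
    """Add one hop (all links cost 1) and clamp at RIP's infinity."""
    return min(metric + 1, RIP_INFINITY)


def count_to_infinity(b_metric=2):
    """Count the exchanges until both routers agree the route is gone."""
    a_metric = rip_add_hop(b_metric)  # A wrongly adopts B's stale route
    exchanges = 1
    while a_metric < RIP_INFINITY or b_metric < RIP_INFINITY:
        b_metric = rip_add_hop(a_metric)  # B's route goes through A
        a_metric = rip_add_hop(b_metric)  # A's route goes through B
        exchanges += 2
    return exchanges


print(count_to_infinity())
```

Because the metric cannot exceed 16, the loop stops after a bounded number of exchanges. The price is that no valid path can be longer than 15 hops, which is one reason RIP suits only fairly small networks.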
RIP: Link Failure and Recovery
If no advertisement is heard after 180 sec, the neighbor/link is declared dead
– routes via neighbor invalidated
– new advertisements sent to neighbors
– neighbors in turn send out new advertisements (if tables changed)
– link failure info quickly propagates to entire net
– poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
• Link-state routing with static link weights
– Static weights: avoid stability problems
– Link state: faster reaction to topology changes
• Most common protocols in backbones
– OSPF: Open Shortest Path First
– IS-IS: Intermediate System–Intermediate System
• Some use of distance vector in enterprises
– RIP: Routing Information Protocol
– EIGRP: Enhanced Interior Gateway Routing Protocol
• Growing use of Multi-Protocol Label Switching