The IS-IS configuration looks alright: all interfaces are referenced. At the top there is a pointer to an export policy, which we will examine more closely.
JUNOS configuration
At first sight the static-to-isis policy looks good. However, once you check the indentation of the terms and accept statements, you will find that the policy does not do what the network operator wanted it to do.
hannes@Munich> show configuration policy-options
What the standalone then accept term does is accept every unicast route in the inet.0 routing table and mark it for export into the IS-IS link-state database. Because there is no from statement at the same indentation level as the final then accept statement, we have an unconditional export of the entire Internet routing table into IS-IS. (The final “then” logic is executed when no terms match the route. The logic here is: “Is the route 10/8 or longer?” No, that’s a private address. “Is the route static?” No, it’s an Internet route. “Okay, then unconditionally accept the route into IS-IS.”)
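The term-by-term evaluation described above can be mimicked with a small Python sketch. This is only an illustration of first-match policy logic, not JUNOS itself: the `evaluate` function, the term names, the `reject` action on the 10/8 term and the `reject` default for routes that fall off the end of the policy are all assumptions made for the example.

```python
import ipaddress

TEN_SLASH_8 = ipaddress.ip_network("10.0.0.0/8")

def in_10_8(route):
    # "Is the route 10/8 or longer?"
    return ipaddress.ip_network(route["prefix"]).subnet_of(TEN_SLASH_8)

def is_static(route):
    # "Is the route static?"
    return route["protocol"] == "static"

def evaluate(terms, route, default="reject"):
    """Walk the terms in order; the first matching term with an action decides."""
    for term in terms:
        if all(cond(route) for cond in term["from"]):
            if term["then"] is not None:
                return term["then"]
            # A matching term without an action falls through to the next term.
    return default  # assumed default action when nothing decides

# Broken policy: the accept statement stands alone, without any from condition.
broken = [
    {"name": "private", "from": [in_10_8], "then": "reject"},  # action assumed
    {"name": "static", "from": [is_static], "then": None},
    {"name": "standalone", "from": [], "then": "accept"},
]

# Fixed policy: the accept statement lives inside the static term.
fixed = [
    {"name": "private", "from": [in_10_8], "then": "reject"},  # action assumed
    {"name": "static", "from": [is_static], "then": "accept"},
]

internet_route = {"prefix": "198.51.100.0/24", "protocol": "bgp"}
static_route = {"prefix": "192.0.2.0/24", "protocol": "static"}

broken_internet = evaluate(broken, internet_route)
fixed_internet = evaluate(fixed, internet_route)
fixed_static = evaluate(fixed, static_route)
```

With the broken policy the Internet route reaches the condition-free term and is accepted; with the fixed policy only the static route is accepted.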
The distributed storage space that each node may allocate is (1492 – 27) * 256 = 375,040 bytes, or roughly 375 Kbytes. How many IPv4 prefixes fit in those 375 Kbytes? Figure 12.11 in Chapter 12, “IP Reachability Information”, illustrates the structure and storage requirements of the Extended IP Reachability TLV #135. In the worst case a prefix entry in the TLV consumes 9 bytes, and in the best case 5 bytes, due to variable prefix length packing. For the average Internet route we can assume a prefix length between /16 and /24 and safely assume a total storage requirement of 8 bytes per prefix. On average, 31 prefixes fit in a single TLV, which requires 31 * 8 + 2 (TLV overhead) = 250 bytes to store. An LSP fragment is at most 1492 bytes in size. For TLV information there are 1492 – 27 (header size) = 1465 bytes of space. Five full TLVs occupy 1250 bytes, and the remaining 215 bytes hold one more TLV with 26 prefixes. That means in total we can store 31 * 5 + 26 = 181 routes per fragment. Inside 256 fragments we can store around 46 K routes, which is too little to hold the entire Internet routing table. As soon as the router hits that limit, it pulls the “emergency brake” and sets the overload bit.
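The capacity estimate can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the constant and function names are made up for the example, and the 8 bytes per prefix is the average assumed in the text, not a protocol constant.

```python
MAX_LSP_SIZE = 1492      # maximum size of one LSP fragment in bytes
LSP_HEADER = 27          # LSP header size in bytes
MAX_FRAGMENTS = 256      # fragments one router may originate
TLV_OVERHEAD = 2         # type and length octets
MAX_TLV_VALUE = 255      # the one-octet length field caps the TLV value
BYTES_PER_PREFIX = 8     # average per-prefix cost assumed in the text

def prefixes_per_fragment():
    payload = MAX_LSP_SIZE - LSP_HEADER                        # 1465 bytes
    per_tlv = MAX_TLV_VALUE // BYTES_PER_PREFIX                # prefixes in a full TLV
    full_tlv_size = per_tlv * BYTES_PER_PREFIX + TLV_OVERHEAD  # bytes of a full TLV
    full_tlvs, leftover = divmod(payload, full_tlv_size)
    # The leftover space holds one partially filled TLV.
    partial = max(leftover - TLV_OVERHEAD, 0) // BYTES_PER_PREFIX
    return full_tlvs * per_tlv + partial

storage_bytes = (MAX_LSP_SIZE - LSP_HEADER) * MAX_FRAGMENTS
routes_per_fragment = prefixes_per_fragment()
total_routes = routes_per_fragment * MAX_FRAGMENTS
print(storage_bytes, routes_per_fragment, total_routes)
```

Changing `BYTES_PER_PREFIX` to 5 or 9 gives the best-case and worst-case bounds for the same fragment budget.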
Finally, it cleans up the mess by purging the previously generated LSPs off the distributed link-state database. And that’s what the router was showing us.
In order to fix the problem, the then accept statement is moved into the term static.
JUNOS command output
After the broken routing policy has been changed on the router, the Overload Bit is automatically cleared.
hannes@Munich> show isis database
IS-IS level 2 link-state database:
LSP ID                      Sequence Checksum Lifetime Attributes
Munich.00-00                   0x1c2   0x2d3b     1192 L1 L2
Pennsauken.00-00               0xc77   0xec5e      711 L1 L2
Frankfurt.00-00                0x198   0xdd86      933 L1 L2
15.4 Summary
Most IS-IS problems can be resolved quickly if you stick to a troubleshooting plan and check from Layer 1 of the OSI Reference Model right up to the Application Layer. In IS-IS, the Application Layer represents the link-state database that holds the network’s link-state PDUs. The network engineer needs to develop an understanding of what functions each layer performs and what tools are available to gather information. After information gathering, the collected data needs to be analyzed and interpreted, which requires knowledge of the show commands and debug outputs. To detect misconfiguration on a router, the network engineer needs to understand where the IS-IS relevant data is stored in the configuration.
The majority of IS-IS problems are related to adjacency formation. The network engineer needs to become familiar with all sorts of debug output for IOS and JUNOS. Just looking at the IS-IS specific configuration is often not enough to resolve a problem. We have demonstrated in the Internet route export case study that an understanding of route export and policy processing is paramount for resolving complex problems.
Network Design
For a long time, link-state protocols were believed not to scale. However, today there are operational networks with more than 1200 routers in a single level. Still, networks that run link-state protocols need to be carefully designed, and a lot of factors need to be considered to get to such a scale. By ignoring certain reasonable constraints, you can easily break a network in certain scenarios. In this chapter you will learn about the critical IS-IS network design factors, all forms of router stress, including flooding stress, SPF stress and forwarding state change stress, as well as what to consider to build robust, fast-converging networks.
16.1 Topology and Reachability Information
In service provider networks there are always at least two protocols in use. The first is an IGP (which could be OSPF or IS-IS), and the other is BGP. One of the first questions asked by networking novices is: why do we need both? It turns out that all IGPs (IS-IS, OSPF, EIGRP) lack one fundamental thing, which is flow control. For IGPs, there is no way to tell an adjacent router that its updates have overwhelmed the receiver and that the sender should throttle down. The only way to deal with the situation is to throw away the updates and wait for retransmission. However, that is still a dangerous game, as it may offload stress at the expense of the sending router, which needs to queue retransmissions and therefore consumes CPU and memory. Careful protocol heuristics need to be implemented to make sure that both the sending and the receiving router do not take themselves out of service. Dave Katz, a software engineer with Juniper Networks, who can be blamed for writing the majority of IGP implementations on the Internet (his own self-definition), puts the complexity around finding the right heuristics in a single quote:
Link State Protocols are hard! (Dave Katz)
What network engineers at service providers have been doing is to apply a divide-and-conquer strategy and separate topology from reachability information. Topology information contains the skeleton of the network: it is a graph that describes how the nodes are connected to each other. It does not contain any information about customer networks, server networks, and so on. Ideally, it does not even contain information about the directly connected sub-nets. Figure 16.1 shows that the only information that the routers advertise is their loopback IP address, which is necessary to bring up an iBGP full-mesh distribution network that handles bulk transport of the routing information.
When you run IS-IS over a link you typically advertise your local IP sub-net in your IS-IS LSPs. There is even the notion that local IP sub-nets should not be announced by IS-IS, but rather by BGP. Historically there has not been an option to preclude certain IP sub-nets from being announced. However, recent routing software allows you to change that behaviour. In IOS, there is a single knob that changes the advertising behaviour of directly connected sub-nets. Once you configure the passive-only knob, the routing software walks down the list of configured interfaces and looks for interfaces that are marked as passive. Recall that passive means that you include that interface’s sub-net in your routing update, but you do not try to establish a neighbour relationship or an adjacency over that interface. The loopback interface is passive by default, and so if you configure the passive-only option, only the loopback IP address of the router is advertised.
[Figure 16.1: network diagram; labels: Washington, IS-IS, BGP]
The nice thing about the JUNOS policy is that you may explicitly control the level at which direct routes are suppressed by introducing a to {} statement. The following example shows how to restrict advertisement to the loopback0-interface-related routes, inside Level 2 LSPs only.
Kirk Lougheed (Cisco Systems) and myself’s goal was to build a routing protocol able
to convey 1000 routes and not fall into pieces – If you consider the total routes being today in the Internet we pushed the envelope a bit (Yakov Rekhter)
Based on BGP’s superb scaling capabilities, the idea here is to “borrow” the existing BGP distribution mesh being used for transport of Internet routes for internal routes as well.
The conclusion as to why you always need two protocols is therefore twofold. First, IS-IS scales too poorly to convey a bulk amount of routes; however, it can quickly discover a topology and provide routing connectivity between router loopback IP addresses. BGP depends heavily on these IGP-supplied routes to bring up the iBGP mesh. Second, BGP is really in the dark when it comes to ascertaining the distance between a pair of routers. Internal BGP sessions are not “targeted” and therefore need an IGP to resolve routes and to give BGP speakers directions.
In order to come up with a design recommendation, let’s first evaluate the forms of stress that routers are exposed to and develop a set of critical design factors based on those insights. From there we will set up some rules to follow when designing an IS-IS network.
16.2.1 Flooding
Unlike link-local packets like Hellos (IIH) or Synchronization packets (SNP), transmitting link-state PDUs (LSPs) has a network-wide impact on bandwidth usage. Once a router floods LSPs, it uses bandwidth equal to the number of links in a given topology times the size of the LSP. In the worst case, network-wide transmission of an LSP comes at a cost of the number of all links times the size of an LSP squared. The big gap between the best and the worst case (recall the best case is linear behaviour and the worst case is N^2 behaviour) is solely explained by the way the topology is meshed. Consider Figure 16.2, where in a strict ring topology of six routers there is no duplicate transmission of an LSP. As soon as a link breaks, the LSP travels round until every node gets a copy. Note that for greater clarity the propagation of only one LSP is shown. Of course, in real networks both ends of the link that breaks would originate a new LSP. The more links you add to the topology, the more redundant the transmission of LSPs gets. In the ring topology each router sees the LSP one time.
FIGURE 16.2 In a dense-meshed environment there are lots of duplicate LSPs to process
The worst case is a full mesh of all routers, where a single router failure triggers (N – 1) LSPs being flooded over (N – 2) links (O(N^2)) through the network. The big problem in a dense- or full-mesh environment is that nodes that already got a copy of an LSP receive many redundant duplicates with the same information.
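The difference between the ring and the full mesh can be illustrated with a toy flooding model in Python. It is a deliberately simplified sketch: every router forwards a newly learned LSP to all neighbours except the one it came from, and real-world refinements that suppress some retransmissions are ignored. The function names are made up for the example.

```python
from collections import deque

def flood(adjacency, origin):
    """Flood one LSP from origin; return (transmissions, duplicates)."""
    have_copy = {origin}
    queue = deque((origin, nbr) for nbr in adjacency[origin])
    transmissions = duplicates = 0
    while queue:
        sender, receiver = queue.popleft()
        transmissions += 1
        if receiver in have_copy:
            duplicates += 1          # receiver already had this LSP
            continue
        have_copy.add(receiver)
        # Forward to every neighbour except the one the LSP came from.
        queue.extend((receiver, nbr) for nbr in adjacency[receiver] if nbr != sender)
    return transmissions, duplicates

def broken_ring(n):
    """Ring of n routers in which the link between router n-1 and router 0 has failed."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def full_mesh(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

ring_result = flood(broken_ring(6), origin=0)
mesh_result = flood(full_mesh(6), origin=0)
print("ring:", ring_result, "full mesh:", mesh_result)
```

In this model the six-router ring with one failed link delivers the LSP with five transmissions and no duplicates, while the six-router full mesh needs (N – 1)^2 = 25 transmissions, 20 of which are duplicates.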
An additional source of flooding stress comes from turning on the TE extensions. Once you turn on features like Traffic Engineering, DiffServ Traffic Engineering or Auto Bandwidth, the TEDs throughout the network topology need to be updated through the IS-IS flooding sub-system. That means that every router in the network sees (and needs to see) accurate TE information. However, if the TE implementation permits changes to flooding timers, then let very conservative timers guide your design. TE extensions are a major source of LSP updates and there should be an effort to reduce these to the minimum possible.
It is recommended that you consider the topology to evaluate the stress resulting from receipt of duplicate LSPs. Densely meshed topologies scale poorly in flooding environments. Try to avoid full-mesh or near-full-mesh topologies. Sometimes a lot of extra redundancy does not turn into more resiliency.
16.2.2 SPF Stress
Link-state routing protocols were once believed to be CPU-intensive algorithms that exhausted an embedded system’s sparse resources. Because of that belief, both link-state IGPs (OSPF, IS-IS) have provisions to split the link-state domain into smaller units. Multiple areas in OSPF, and two levels in IS-IS, are an attempt to spare the control plane CPU when doing the SPF run.
A lot has changed in the last decade. CPUs became (in line with Moore’s Law) faster by a factor of 8000; trunk bandwidth grew from T1 speeds to OC-192c/STM-64. The only thing that has not changed at all is the paranoid thinking that SPF may exhaust the CPU resources of a router. The fact is, the demand that SPF puts on router resources has been outpaced by the processing power of modern CPUs. Table 16.1 shows how SPF execution fares on modern route processors like the Cisco Systems GRP or a Juniper Networks RE 3.0. The CPU requirements of an SPF operation are well understood and well documented by computer scientists. The fundamental relationship is O(N * log(N)), which describes a curve where the CPU requirements grow a little more than linearly, with N being the total number of routers in the network. In practice it is a little more than that due to the 2-way check that is needed to verify that a node is connected on both ends and is not a dead end.
The results from the simulation in Table 16.1 are impressive. They mean that processing a grid of 2000 routers, which are connected by 5000 links in total, has a typical execution runtime of only 100–245 milliseconds. If you consider this table then it is obvious that raw SPF execution time is not a problem for large IS-IS networks. So what is it then? Why are we all so scared of routers running an excessive number of SPF runs back to back? What is it besides the SPF calculation itself that scares network operators so much?
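To make the O(N * log(N)) relationship and the 2-way check from this section concrete, here is a minimal Dijkstra sketch in Python. It is a textbook-style illustration, not the code of any router implementation; the `lsdb` layout (each node mapping to the neighbours and metrics it advertises) and the function name `spf` are assumptions made for the example.

```python
import heapq

def spf(lsdb, root):
    """Shortest-path distances from root over links that pass the 2-way check."""
    dist = {root: 0}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)    # each pop costs O(log N)
        if node in done:
            continue                     # stale heap entry
        done.add(node)
        for nbr, metric in lsdb.get(node, {}).items():
            # 2-way check: the neighbour must advertise the link back.
            if node not in lsdb.get(nbr, {}):
                continue
            candidate = d + metric
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                heapq.heappush(heap, (candidate, nbr))
    return dist

# Hypothetical four-node database; D does not report C, so D fails the check.
lsdb = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {},
}
distances = spf(lsdb, "A")
print(distances)
```

Node D is left out of the result because its adjacency is only reported in one direction, which is exactly the dead-end case the 2-way check guards against.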
16.2.3 Forwarding State Change Stress
The purpose of the SPF calculation is to find out the shortest path to every edge of the network. However, just the insight that there are better paths available is not enough.
There are no good things, unless you do them! (Erich Kästner)
The router has to pass on the new proximity results to a subsystem called the resolver, which is used to map third-party next-hops to forwarding next-hops. Consider Figure 16.3: if the link between Washington and New York breaks, the SPF calculation will be finished in a matter of microseconds. Each IS-IS speaker is also a BGP speaker and carries several thousand active BGP routes. If the IS-IS topology changes, then the BGP routes that depend on IS-IS need to be changed as well. The resolver now needs to backtrack through all the BGP routes and verify whether each BGP next-hop is affected by a change in the core topology. As you can imagine, walking down a table of several hundred thousand BGP route entries is a resource-intensive task. In our example, there are tons of forwarding state changes to do: all Washington and New York routes need to be changed in a very short time.
After the BGP dependencies have been worked out, this may generate changes in the BGP topology as well: recall that the IGP distance is part of the BGP route selection process. But that is only half of the story, as those things still occur on the control plane.
TABLE 16.1 Modern route processors can calculate topologies for thousands of nodes and links in under a second.
                      SPF runtime (ms)
Routers    Links      Juniper Networks Routing Engine 3.0    Cisco Systems GRP 12000
The forwarding state change of tens of thousands of routes may stress several sub-systems of an Internet core router. It turns out that changing forwarding state is one of the most expensive operations in a router. Meanwhile, both Juniper and Cisco have found a way to pass on third-party next-hop information to the line cards and retain the dependency of BGP routes on IS-IS speakers and, in turn, on forwarding interfaces. More on passing on third-party next-hop information, and why it is not always a good idea to attempt to fully resolve a route to its forwarding next-hop, can be found in Chapter 10, “SPF and Route Calculation”.
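The cost difference between fully resolving every BGP route and keeping the dependency on the BGP next-hop can be sketched in Python. This is a conceptual model only and does not describe how any vendor implements its forwarding tables; the addresses, interface names and function names are invented for the example.

```python
def changed_next_hops(old_igp, new_igp):
    """BGP next-hops (loopbacks) whose forwarding next-hop differs after SPF."""
    return {nh for nh in old_igp if new_igp.get(nh) != old_igp[nh]}

def flat_rewrites(bgp_routes, old_igp, new_igp):
    """Entries to rewrite if every BGP route is fully resolved to an interface."""
    changed = changed_next_hops(old_igp, new_igp)
    return sum(1 for nh in bgp_routes.values() if nh in changed)

def indirect_rewrites(bgp_routes, old_igp, new_igp):
    """Entries to rewrite if routes only point at a shared next-hop object."""
    changed = changed_next_hops(old_igp, new_igp)
    return len(changed & set(bgp_routes.values()))

# Hypothetical tables: 40 K routes behind one loopback, 25 K behind another.
bgp_routes = {f"route-{i}": "192.0.2.1" for i in range(40_000)}
bgp_routes.update({f"route-{i}": "192.0.2.2" for i in range(40_000, 65_000)})

# IGP view before and after a link failure (interface names are made up).
old_igp = {"192.0.2.1": "so-0/0/0", "192.0.2.2": "so-0/1/0"}
new_igp = {"192.0.2.1": "so-0/2/0", "192.0.2.2": "so-0/1/0"}

flat = flat_rewrites(bgp_routes, old_igp, new_igp)
indirect = indirect_rewrites(bgp_routes, old_igp, new_igp)
print(flat, indirect)
```

In this model a single topology change touches 40,000 entries in the fully resolved table, but only one shared next-hop object when the indirection is retained.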
[Figure 16.3: network diagram of the routers Pennsauken, Frankfurt, London, Washington, New York and Paris; the links carry metrics of 1, 2 and 4, and the BGP speakers carry 10 K, 15 K, 20 K, 25 K, 30 K and 40 K active routes]
FIGURE 16.3 The resolver needs to track and map BGP next-hops to the shortest path resulting from the SPF calculation
16.2.4 CPU and Memory Usage
The two things that utilize the CPU most in an IS-IS router are the SPF calculation and the resolver. The SPF calculation puts a short burden on the system, but even in large topologies that burden does not last more than 200 ms using modern route processors. As discussed in the previous section, the far bigger CPU hog is the resolver, which maps BGP routes to forwarding next-hops. SPF execution runtime is ultimately a non-issue; however, the burden that the resolver can put on the system needs to be carefully examined.
In the 1990s, during the explosive growth of the Internet, routers were constantly short of memory. Since then, network service providers have been cautious about the memory usage of their routing protocols. There is almost no IS-IS-related documentation regarding memory consumption. The majority of IS-IS implementations use memory in three areas:
1 Link-state database
2 SPF result table
3 Storing neighbour information
The link-state database size is the easiest to predict. It contains mostly raw data that was extracted from the TLVs in an IS-IS PDU. There are also overhead and index structures so the IS-IS software can quickly traverse the database when it is looking for a certain LSP. As a rough guideline, one can state that the size of the link-state database is about double the size that the individual LSPs consume on the wire. For example, if the network knows about 100 LSPs with an average length of 400 bytes each, then the size needed to store this information in the router software is 100 * 400 * 2 = 80 KB.
The size of the SPF result table depends largely on how many IP prefixes are known to IS-IS inside the network. A good estimate here is that each prefix consumes about 70 bytes. For example, if you have 1600 IS-IS prefixes in your network, then the memory consumption on the control plane is 1600 * 70 = 112 KB.
The neighbour table is the most complex one to calculate, as all the flooding state and the retransmission list need to be kept on a per-adjacency basis. That structure is also dependent on the size of the link-state database, because all the flooding state is tied to both the LSP and the adjacency. There is a lot of clever pointer work involved here, and the overhead to do efficient flooding is enormous. A good approximate figure is that this table is about 50 times the average LSP size multiplied by the number of active adjacencies. For example, if the average LSP is about 400 bytes and the number of adjacencies is eight, then the memory consumption is 400 * 50 * 8 = 160 KB.
If you sum the three memory areas up, then the result for a large network is unlikely to exceed 4–5 MB in total. In IS-IS, the memory consumption is minimal given that the route processors deployed in the field mainly have 256 MB–2 GB of memory. Interestingly, there are large overhead structures in the LSP databases to increase LSP lookup speed and to keep flooding state even for large numbers of adjacencies. This is just more evidence that memory consumption for IS-IS networks with big core routers is a non-issue.
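The three rules of thumb above are easy to combine into a small estimator. The following Python sketch simply encodes the multipliers quoted in the text (factor 2 for the link-state database, 70 bytes per prefix, 50 times the average LSP size per adjacency); the function and key names are made up, and the figures are rough guidelines rather than measurements.

```python
def isis_memory_estimate(num_lsps, avg_lsp_bytes, num_prefixes, num_adjacencies):
    """Rough IS-IS memory estimate in bytes, using the chapter's rules of thumb."""
    lsdb = num_lsps * avg_lsp_bytes * 2                # about double the on-wire size
    spf_results = num_prefixes * 70                    # about 70 bytes per prefix
    neighbours = avg_lsp_bytes * 50 * num_adjacencies  # per-adjacency flooding state
    return {
        "lsdb": lsdb,
        "spf_results": spf_results,
        "neighbours": neighbours,
        "total": lsdb + spf_results + neighbours,
    }

# The example figures used in the text.
estimate = isis_memory_estimate(
    num_lsps=100, avg_lsp_bytes=400, num_prefixes=1600, num_adjacencies=8
)
print(estimate)
```

For the example figures the three areas add up to roughly 352 KB, far below the memory of the route processors mentioned above.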
The rest of this chapter draws on many of the topics and ideas discussed throughout this book. There is no need to repeat more than the basics of the discussions, however, so we don’t present all of the gory details all over again.
16.3.1 Separate Topology and IP Reachability Data
Perhaps the most important rule is keeping topology and IP reachability data separate. You saw that IGPs are not very good at transporting large numbers of routes, so just avoid it and pass the job to BGP. In large networks (more than 1000 routers per level) you may even decide to advertise directly connected routes in BGP as well. Given that an average IS-IS core router has about five or six directly attached sub-nets, you clearly want to avoid those extra 2500–3000 prefixes at the IS-IS level in order to keep convergence times within an upper bound. An ideal IS-IS LSP contains just a single IP prefix, which is the router’s loopback IP address, plus Extended IS Reach TLVs that point to neighbouring routers.
Tcpdump output
An ideal LSP just conveys a single IP prefix per router and passes all other routing information via BGP.
12:36:45.587565 OSI, IS-IS, length: 405
  hlen: 27, v: 1, pdu-v: 1, sys-id-len: 6 (0), max-area: 3 (0)
  L2 LSP, lsp-id: 2092.1113.4009-00, seq: 0x000002fd, lifetime: 1198s
  chksum: 0xe984 (correct), PDU length: 185, Flags: [ L1L2 IS ]
    Area address(es) TLV #1, length: 4
      Area address (length: 3): 49.0001
    Protocols supported TLV #129, length: 1
      NLPID(s): IPv4
    IPv4 Interface address(es) TLV #132, length: 4
      IPv4 interface address: 192.168.1.1
    Hostname TLV #137, length: 10
      Hostname: Washington
    Extended IS Reachability TLV #22, length: 99
      IS Neighbor: 1921.6800.1077.00, Metric: 4, sub-TLVs present (12)
        IPv4 interface address (subTLV #6), length: 4, 172.17.1.6
      IS Neighbor: 1921.6800.1043.00, Metric: 4, sub-TLVs present (12)
        IPv4 interface address (subTLV #6), length: 4, 172.16.33.38
        IPv4 neighbor address (subTLV #8), length: 4, 172.16.33.37
      IS Neighbor: 1921.6800.1018.00, Metric: 4, sub-TLVs present (12)
        IPv4 interface address (subTLV #6), length: 4, 172.16.33.25
        IPv4 neighbor address (subTLV #8), length: 4, 172.16.33.26
    Extended IPv4 reachability TLV #135, length: 9
      IPv4 prefix: 192.168.1.1/32, Distribution: up, Metric: 0
    Authentication TLV #10, length: 17
      HMAC-MD5 password: 68e18feb2e29257113e4bb6580169310
16.3.2 Keep the Number of Active BGP Routes per Node Low
Vendors have come up with smart representations of BGP routes and how those routes depend on IS-IS routes. However, there is one fault condition where even smart route representations inside a router do not gain us much. If an entire BGP speaker disappears, the BGP control plane needs to re-route all the prefixes that speaker carried, which of course takes time. The more active routes a router is carrying, the proportionally longer the re-routing takes if that router goes down. Figure 16.4 shows that, on the left-hand side, Washington is a “hotspot” BGP speaker that carries the majority of BGP routes. If this speaker goes down, then you need to re-route all 120 K routes, which can cause a network-wide outage of up to 3 minutes. The logical step is to spread those 120 K routes among several routers, as shown on the right-hand side of Figure 16.4.
In well-developed peering meshes, the average number of routes per border router is not more than 10 K. In our example, because of a lack of routers, we still did not put more than 30 K routes per node. In practice, if you receive more than 10 K routes per peer, then you may need to consider a redundant router and spread the incoming prefixes over the two redundant routers. Re-routing 10 K prefixes if the active router breaks down can be done in a matter of 5–10 seconds.
16.3.3 Avoid LSP Fragmentation
IS-IS has plenty of space (precisely 375,040 bytes per LSP) in the distributed database. Despite this vast amount of information that an individual IS-IS speaker can originate, you typically do not want to use that storage size, ever. You should try to accommodate all the information that you need in maxLSPsize (1492) – LSP header (27) = 1465 bytes. There may be a number of additional LSP updates if you cross an LSP boundary and have to break things up into another segment. Consider Figure 16.5 to see what happens if you are at the edge of Fragment 0 and an additional adjacency comes up. Router 1921.6800.1018 decides that it needs to break out another segment. Router 1921.6800.1018 generates the fragment and floods it. The troubles start if any of the router’s other sub-nets or adjacencies become unavailable. Assume that Adjacency #4 goes down: then all the TLVs that follow this particular adjacency get shifted, and may also fall into another fragment. Considering the example in Figure 16.5, there is no need to