The Complete IS-IS Routing Protocol- P10 docx


10.3.1 Full SPF Run

The full SPF run is the heavyweight of SPF flavours. It re-computes both the topological grid in an area and the reachable IP prefixes. Full SPF runs are typically triggered by the following events:

• Local configuration change

• Update to a known LSP, which contains an adjacency change

• Local aged adjacency

• Receipt of a new/unknown LSP

• New Area-ID in the Level-1 network

• Link metric change

• Purging an LSP

• Periodically for additional robustness (every 15 minutes)

The full SPF run is not scheduled immediately after the above trigger events. Instead it is delayed for a configurable minimum amount of time. The most typical event from the above list is a new or updated LSP. In IS-IS networks, as in any other network running link-state routing protocols, there is a general observation that single LSP updates are very rare. They are almost always accompanied by other LSPs, which follow shortly after the first LSP shows up. The reason behind this is very clear: if a link fails there are always two routers that need to re-originate their LSPs. So it is better to wait a couple of milliseconds before starting an SPF calculation, which may tie the router down on the order of 100s of milliseconds.

So routers delay the SPF calculation. The typical pre-SPF delay value is 100 or 200 ms (depending on IOS or JUNOS). After the pre-SPF delay, the router freezes the link-state database and does the SPF calculation. Freezing means that during this time, no LSP additions or changes can be made.
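The effect of the pre-SPF delay can be illustrated with a small Python sketch. It is a simplified model, not how IOS or JUNOS actually schedule SPF: the function name and the arrival times are assumptions, and run duration and hold-down are ignored.

```python
def schedule_spf_runs(lsp_arrivals_ms, pre_spf_delay_ms):
    """Return the start times (ms) of SPF runs for the given LSP arrivals.

    Simplified model: the first LSP that arrives while no run is pending
    arms the pre-SPF delay timer; LSPs arriving before the timer expires
    are covered by the same run.
    """
    run_starts = []
    pending_start = None
    for arrival in sorted(lsp_arrivals_ms):
        if pending_start is not None and arrival <= pending_start:
            continue  # already covered by the pending run
        pending_start = arrival + pre_spf_delay_ms
        run_starts.append(pending_start)
    return run_starts


# A link failure: both attached routers re-originate their LSPs,
# arriving 30 ms apart (hypothetical arrival times).
arrivals = [0, 30]
with_delay = schedule_spf_runs(arrivals, pre_spf_delay_ms=100)
without_delay = schedule_spf_runs(arrivals, pre_spf_delay_ms=0)
print(with_delay)     # a single run covers both LSPs
print(without_delay)  # one run per LSP
```

With a 100 ms delay the two LSPs of a single link failure are handled by one SPF run; without the delay the router would calculate twice.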

10.3.1.1 Link-state Database Locking

It is absolutely mandatory for an IS-IS implementation to freeze the database during an SPF calculation run. An LSP change inserted during a run of the SPF calculation may result in bogus routes. Consider Figure 10.10 to get an idea of what will happen if the link-state database is not locked. We are in the middle of an SPF calculation. The early stages of the SPF calculation considered the path through Washington the best path in the network. Now it is exploring the network downstream from Washington. Suddenly, the link between Washington and New York goes down. Unfortunately, the New York–Washington path is our best-path candidate. The SPF calculation does not backtrack through path candidates to see if the path properties have changed. If the router does not lock the link-state database, then the result will most likely be bogus routes. Of course, IOS and JUNOS both lock the database (as any serious IS-IS implementation has to) and queue any incoming LSPs for insertion once the database is unlocked.
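The lock-and-queue behaviour described above can be sketched in Python as follows. This is an illustration only: the class, its method names and the LSP identifiers are hypothetical and do not reflect the internals of any router operating system.

```python
from collections import deque


class LinkStateDatabase:
    """Toy LSDB that queues LSP updates while an SPF run holds the lock."""

    def __init__(self):
        self.lsps = {}          # LSP-ID -> (sequence number, contents)
        self.locked = False
        self.pending = deque()  # updates received while locked

    def insert(self, lsp_id, seq, contents):
        """Install an LSP, or queue it if the database is frozen."""
        if self.locked:
            self.pending.append((lsp_id, seq, contents))
            return False
        current = self.lsps.get(lsp_id)
        if current is None or seq > current[0]:
            self.lsps[lsp_id] = (seq, contents)
        return True

    def lock(self):
        self.locked = True

    def unlock(self):
        """Unfreeze the database and apply everything that was queued."""
        self.locked = False
        while self.pending:
            self.insert(*self.pending.popleft())


db = LinkStateDatabase()
db.insert("Washington.00-00", 1, "adjacency: New York")
db.lock()                                                # SPF run starts
queued = not db.insert("Washington.00-00", 2, "adjacency: none")
seq_during_run = db.lsps["Washington.00-00"][0]          # SPF still sees the old LSP
db.unlock()                                              # SPF run finished
seq_after_run = db.lsps["Washington.00-00"][0]
```

The SPF calculation works on a consistent snapshot, and the queued change is picked up by the next run.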

After the SPF calculation has completed, the router starts an SPF hold-down timer which blocks further SPF runs for self-protection reasons.


10.3.1.2 Self-protection

The purpose of hold-downs is to allow the IS-IS router to work less. Consider Figure 10.11 to see why SPF hold-downs make sense. If there were no hold-down for SPF calculation, then the average utilization of the control plane CPU would be very high. During an SPF calculation (100–200 ms) the CPU utilization jumps to 100 per cent, but shortly thereafter it drops down to 0 per cent. If a network is shaky, then additional LSPs triggering new SPF calculations will follow, raising the CPU utilization to 100 per cent once again for a short period of time. By applying SPF hold-down timers, IS-IS keeps the intervals between the SPF calculations large and so lowers the average CPU utilization spent on SPF calculations. In other words, SPF hold-down is a self-protection mechanism to avoid meltdown of the router’s control plane. SPF hold-downs trade responsiveness for stability.

What is gained is a router control plane that is stable in every situation and does not go down the “CPU churning spiral” when the network starts to get shaky. On the other hand, a router loses responsiveness. Consider a router that is in the middle of an SPF hold-down period: even if plenty of LSPs do rush in, the router has to wait until the hold-down period is over before scheduling the SPF calculation again. Then there are considerations like “How short should the hold-down time be to still be responsive?” and “How long should the hold-down timer be to be stable enough?” and even “What is the optimal hold-down timer value?”

Unfortunately there is no universal hold-down timer value that applies to all networking scenarios. Hold-down timers are always a compromise between stability and responsiveness. Look at stability to start with: this mostly depends on network size and link stability. Network engineers used to say “In a quiet environment, OSPF and IS-IS are quiet protocols”.

In the infancy of link-state routing protocols there was usually a static SPF hold-down timer of 5 seconds between SPF runs. This was a conservative timer, the better to scale for large networks. Today, adaptive timers, which take into account the churn in the network, are more common. The basic idea behind the new schemes is that the first couple of SPF calculations are scheduled immediately without any notable delay and only subsequent, persistent SPF runs are delayed. The more SPF runs need to be scheduled, the longer the hold-down timer gets. Such schemes are a much better compromise between responsiveness and stability than static timers can ever be.

The typical adaptive timer algorithm implementation reacts very fast, and is very responsive at first. This covers 99 per cent of the typical network-changing events, which are link failures. That means that two LSPs arrive within a very short window. For the remaining 1 per cent of failure scenarios, the algorithm falls back to the older SPF hold-down static intervals for self-protection reasons.

JUNOS and IOS have different ways of implementing hold-down timers. IOS implements a technique called exponential back-off. Here the hold-down interval gets doubled each time an SPF calculation is executed. The initial delay, the max-delay and the minimum hold-down interval can be configured using the spf-interval <max-holddown> [<initial-wait> <minimum-holddown>] configuration command. The following shows a custom configuration of the SPF hold-down behaviour in IOS. This works as follows:

IOS configuration

In IOS there are three timers to control SPF hold-down. The first timer specifies the SPF hold-down in the slower phase, expressed in units of seconds. The second timer specifies how many milliseconds to wait before scheduling the very first SPF calculation. The third timer specifies the minimum SPF hold-down in the fast phase. The last two timers are expressed in units of milliseconds.

London# show running-config

[ … ]

router isis

spf-interval 5 200 1000


Figure 10.12 shows the timing behaviour of the exponential back-off algorithm compared to the JUNOS style, called a “3 fast back-off” method. In IOS, the first SPF run is delayed for 200 ms. Next, the minimum hold-down timer kicks in, so scheduling of the second SPF run will take at least 1000 ms, as specified in the third argument of the spf-interval configuration command. All subsequent SPF runs get delayed for double the previous hold-down time: 2 seconds for the third SPF run, 4 seconds for the fourth SPF run, and so on. As with the LSP origination interval, which was explained in Chapter 6, “Generating, Flooding and Ageing LSPs”, there is a precaution so that the hold-down does not grow without bound. Clipping of the hold-down timer is done with the first argument (5 seconds) of the spf-interval command. With every further run, the SPF interval gets bigger until it hits the ceiling of 5 seconds. After a particular router has not scheduled an SPF run for 20 seconds, the SPF hold-down state is reset. This means that from here on, any further SPF calculations will be scheduled “fast”, like the first couple of SPF runs.
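The exponential back-off sequence described above can be reproduced with a few lines of Python. This is a sketch of the described behaviour only, not Cisco's implementation; the function and parameter names are made up for illustration, and the reset after 20 quiet seconds is not modelled.

```python
def ios_style_spf_waits(max_holddown_s, initial_wait_ms, min_holddown_ms, runs):
    """Wait (ms) applied before each consecutive SPF run under churn.

    Arguments follow the order of 'spf-interval <max> <initial> <min>'.
    """
    waits = []
    holddown_ms = min_holddown_ms
    ceiling_ms = max_holddown_s * 1000
    for run in range(runs):
        if run == 0:
            waits.append(initial_wait_ms)               # very first run
        else:
            waits.append(min(holddown_ms, ceiling_ms))  # clipped hold-down
            holddown_ms *= 2                            # double for the next run
    return waits


# Values from the example configuration: spf-interval 5 200 1000
waits = ios_style_spf_waits(5, 200, 1000, runs=6)
print(waits)  # [200, 1000, 2000, 4000, 5000, 5000]
```

The fifth run would be due after 8 seconds of hold-down, but the first argument clips it to 5 seconds.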

JUNOS takes a different approach. Instead of gradually getting slower, there is a fixed number of fast runs, and after that the router falls back into slow scheduling mode. The engineers at Juniper Networks argue that this linear form of back-off has worked fine for the past 10 years, and more sophisticated methods are not needed. In most implementations, the static SPF hold-down period is set to 5 seconds and by straight switching between the two modes, fast and slow, no harm is done.

JUNOS has an initial pre-SPF timer that defaults to 200 ms. It can be changed using the spf-delay configuration command, which is available under the protocols isis stanza. This command affects both the partial and the full SPF calculation and can be changed in the range from 50 ms to 1000 ms.
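For comparison, the fast/slow switching can be sketched in Python as well. The values are assumptions taken from the surrounding text (three fast runs as suggested by the “3 fast back-off” name, the 200 ms default delay, a 5 second slow hold-down); the function itself is hypothetical and is not JUNOS code.

```python
def junos_style_spf_waits(runs, fast_runs=3, fast_wait_ms=200, slow_wait_ms=5000):
    """Wait (ms) applied before each consecutive SPF run under churn.

    A fixed number of fast runs is followed by slow scheduling;
    there is no gradual growth of the wait in between.
    """
    return [fast_wait_ms if run < fast_runs else slow_wait_ms
            for run in range(runs)]


junos_waits = junos_style_spf_waits(runs=5)
print(junos_waits)  # [200, 200, 200, 5000, 5000]
```

Unlike the exponential scheme, the wait jumps straight from the fast value to the slow value.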

JUNOS configuration

In JUNOS there is only one timer that controls SPF scheduling. The spf-delay configuration command determines, in units of milliseconds, the initial-wait and inter-SPF wait period when scheduling SPF calculations.

hannes@Vienna> show configuration


10.3.1.3 Timer Compatibility Issues

It is recommended to keep at least the initial-wait timer the same across the IOS and JUNOS routers in a network. Once they are the same, the SPF calculations start and finish almost simultaneously. Due to the hop-by-hop routing paradigm, near-simultaneous SPF calculation and re-routing is desired to avoid transient loops. It can never be guaranteed that two routers converge at the same time, but keeping the timers consistent is usually good enough, or at least does not break the desired global convergence intentionally.

The following two IOS and JUNOS configuration files are a good tradeoff between the two schemes and have proven to work well even in large multi-vendor networks.

JUNOS configuration

An SPF delay of 100 ms means that the SPF algorithm converges fast and still provides reasonable protection. The typical SPF run in large networks does not last longer than 100 ms. This 100 ms of quiet takes the average utilization down to 50 per cent.

hannes@Vienna> show configuration

London# show running-config

[ … ]

router isis

spf-interval 5 100 100

[ … ]
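The 50 per cent figure follows from simple duty-cycle arithmetic, sketched here in Python. The model is an assumption for illustration: the CPU is taken to be fully busy during an SPF run and idle during the quiet period, under continuous churn.

```python
def average_cpu_utilization(spf_run_ms, quiet_ms):
    """Fraction of time spent in SPF when runs repeat back to back."""
    return spf_run_ms / (spf_run_ms + quiet_ms)


no_holddown = average_cpu_utilization(100, 0)       # 1.0, CPU always busy
tuned_timers = average_cpu_utilization(100, 100)    # 0.5, the case above
static_5s = average_cpu_utilization(100, 5000)      # roughly 0.02
```

A 100 ms run followed by 100 ms of quiet halves the worst-case load, while a static 5 second hold-down would bring it down to about 2 per cent at the cost of responsiveness.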

10.3.1.4 Performance and CPU Usage

The plain, un-optimized SPF run is probably one of the most well-examined algorithms in computer science, and so is its CPU cost. Before assessing worst-case figures, first consider two factors: how many routers and how many links are in the network. Let the number of routers be N and the number of links be L.


It is actually very hard to predict the SPF runtime, as it is highly dependent on the topology, that is, how the routers are meshed to each other. It has been shown above that the tracking of nodes on the PATH list consumes the most cycles. So what is done is to present a worst-case and an average-case scenario, considering the number of routers (N) or the number of links (L). The real SPF runtime will be somewhere between the two figures; to find it, how densely the network is meshed has to be taken into account.

For a router-based, worst-case estimate, simply take a look at the number of routers and the number of search operations, assuming that every router is in the worst case connected to every other router (a full mesh). Therefore, for a total of N nodes, at maximum N–1 iteration steps are needed for the search operation to find out if the actual path is better than the TENTative path. This is quite intuitive. Mathematically speaking, the runtime requirement of the SPF run equals N * (N–1), or O(N^2). Squared growth is really, really the worst case.

Exploring all the feasible paths scales directly with the absolute number of links. It can be shown that the SPF computation time is proportional to the number of links in the network. Mathematically speaking, O(L * log(L)).

For example, let the number of routers be 100 and the number of links be 400. Then the worst-case estimate would be that O(N^2) CPU-time-units (100 * 100 = 10000) are spent. The abstract unit “CPU-time-units” is used because such observations only make sense in a comparative way. If there is a given number of nodes and a given number of links in a network, and the current SPF runtime is known, a good estimate can be made of the CPU runtime in the future, when the number of routers and the number of links is higher.

The pure link-based observation results in a computational complexity of L * log(L), which is 400 * log(400) ≈ 1040 CPU-time-units (taking the logarithm to base 10).

So there is a factor of 10 deviation between the two estimates. In reality both the number of links and the number of routers need to be considered. Both figures are needed for the meshing factor, that is, how densely a given set of routers is meshed. It will be shown shortly that the link-based model is a much better approximation than the worst-case estimate.

The model where the total SPF runtime equals N * (log(N) * 2 * log(L)) turns out to work best in practice. In this formula, both the number of links and the number of nodes, plus a factor of two, are taken into account. The factor of two is needed because the two-way check is part of the path selection algorithm. Based on that formula, the resulting calculations come very close to reality. See Table 10.1 for the best model of route-processor CPU prediction around today.
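The three estimates can be compared with a short Python calculation. The function names are chosen here for illustration, and the base-10 logarithm is an inference from the 1040 figure in the worked example rather than something the text states explicitly.

```python
import math


def worst_case_estimate(n_routers):
    """Full-mesh worst case, O(N^2)."""
    return n_routers * n_routers


def link_based_estimate(n_links):
    """Link-based model, L * log(L)."""
    return n_links * math.log10(n_links)


def combined_estimate(n_routers, n_links):
    """Combined model, N * (log(N) * 2 * log(L))."""
    return n_routers * (math.log10(n_routers) * 2 * math.log10(n_links))


N, L = 100, 400
worst = worst_case_estimate(N)       # 10000
link_based = link_based_estimate(L)  # about 1040.8
combined = combined_estimate(N, L)   # about 1040.8 for this N and L
print(worst, round(link_based, 1), round(combined, 1))
```

For this particular pair of values the link-based and the combined model happen to give the same result, because N * log(N) * 2 equals L when N = 100 and L = 400. For other topologies the two models diverge.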

The theoretical model was verified using a lab test based on two common route processors: the Juniper Networks RE 3.0 taken from the M- and T-Series routers, and the GRP Routing Engine taken from the Cisco GSR 12000 series. The two route processors were exercised using the Agilent QA Robot Router Control-Plane Stress Testing Software. The Router Tester produces a grid, as shown in Figure 10.13.

Every 25 seconds, one link of the virtual topology was changed and the SPF runtimes were recorded using the show isis spf-log operational-level CLI command on IOS and show isis spf log on JUNOS.


IOS command output

London#show isis spf-log

JUNOS command output

hannes@Frankfurt> show isis spf log

IS-IS level 1 SPF log:

TABLE 10.1 A prediction of real-world SPF runtime on common control plane CPUs.


model are quite surprising. For even moderate to large topologies, the SPF calculation is quickly finished after several tens of milliseconds. There are barely 30 IS-IS networks in the world that have more than 400 routers and an SPF runtime greater than 50 ms for their Level-2 routers. So for the majority of networks, SPF runtime is an absolute non-issue. It is certainly not the SPF runtime for the full SPF run that consumes a lot of CPU resources.

10.3.2 Partial SPF Run

A partial SPF run only recalculates leaf-related information. Partial runs are typically triggered by the following events:

• Change of a prefix metric

• New prefixes

• Deletion of prefixes

FIGURE 10.13 The SUT is exposed to a 7 × 7 virtual grid to test SPF calculation time

The partial SPF run is basically an extraction of all the prefixes in the link-state database plus some information about the proximity of the prefixes (in simple words, a metric). Based on that, the partial run is basically a search operation, which tries to find out the lowest metric for a given prefix. Figure 10.14 illustrates the simplicity of a partial SPF calculation. The IPv4 prefixes are extracted from the leaf information of all the routers on the PATH list, plus the Pennsauken root router, and moved to a table. Next, the list is sorted and duplicate entries with a worse cost are eliminated. Finally, the prefixes are sorted by their cost in ascending order. This simple search operation is computationally much less complex than the topological section of the full SPF run.
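The extract, de-duplicate and sort steps can be sketched in Python as follows. The router distances, prefixes and metrics are made-up example data (the prefixes come from the documentation address ranges), and the function name is hypothetical.

```python
def partial_spf(path_costs, advertised_prefixes):
    """Return (prefix, total cost, originating router), cheapest first.

    path_costs: distance from the root to each router on the PATH list.
    advertised_prefixes: router -> list of (prefix, prefix metric).
    """
    best = {}
    for router, root_cost in path_costs.items():
        for prefix, prefix_metric in advertised_prefixes.get(router, []):
            total = root_cost + prefix_metric
            # Keep only the entry with the lowest cost for each prefix.
            if prefix not in best or total < best[prefix][0]:
                best[prefix] = (total, router)
    table = [(prefix, cost, router) for prefix, (cost, router) in best.items()]
    return sorted(table, key=lambda entry: (entry[1], entry[0]))


# Hypothetical distances as seen from the Pennsauken root router.
path_costs = {"Pennsauken": 0, "Washington": 5, "Frankfurt": 18}
advertised = {
    "Pennsauken": [("192.0.2.0/24", 10)],
    "Washington": [("198.51.100.0/24", 10), ("203.0.113.0/24", 10)],
    "Frankfurt": [("203.0.113.0/24", 10)],  # duplicate with a worse cost
}
routes = partial_spf(path_costs, advertised)
```

No topology is explored here: the distances to the routers are reused from the last full SPF run, which is why the partial run is so much cheaper.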

Both JUNOS and IOS support partial runs for IPv4 and IPv6. In IOS, you can also control the SPF delay for partial route calculations (PRCs). PRC is an IOS term and can be controlled using the prc-interval router isis configuration command. These timers can be more aggressive (shorter) than the spf-interval <a> <b> <c> timers. This is because the burden that a partial SPF run adds to a control plane is not as high as a full run, so the router does not need to self-protect so much. The following configuration example sets the router pre-SPF timer (initial wait) before doing a partial SPF calculation to 100 ms. For the second run, the router holds down for 250 ms. The PRC also employs an exponential back-off timer. That means after the second run, the hold-down value is now 500 ms. The first argument of the command controls the maximum hold-down value of one second.

Partial SPF runs are pretty cheap from the calculation point of view. A router has to scan through all the routers in its link-state database, extract the prefix information, add the prefix cost to the distance to the originating router, and sort the prefixes to find out which is closest. This exhibits absolutely linear behaviour, meaning the CPU processing time is directly proportional to the number of routes in the network. Mathematically speaking, this would be O(R), with R being the number of prefixes of an address family. In practical implementations, the cost of the partial SPF run nears zero. Typically, the partial run takes less than 10 ms of execution time, even if R is unreasonably high (like 10,000 routes). So partial runs are even less of an issue than full SPF runs.


10.3.3 Incremental SPF Run

The incremental SPF (iSPF) run is an optimized version of the full SPF run. What it does is retain additional data structures, the so-called Neighbor and Parent lists, from previous full SPF calculations. The paths that have not been used so far are of special interest.

Consider Figure 10.15, which shows the SPF tree from the SPF calculation example. Note that the link between London and Frankfurt is not on the shortest path tree from Pennsauken’s perspective. If the Pennsauken router receives a new LSP reporting that this particular link is down, then Pennsauken does not need to schedule a full SPF run. The reason is that, because the router doing the SPF calculation has not used the link before (when it was up), it does not have to consider it when it is down.

Keep in mind that such a consideration, whether to do a full SPF or an incremental SPF run, is a purely local decision that applies only to the local router. For other routers in the network, for example Frankfurt, the link between London and Frankfurt may be meaningful, and therefore on Frankfurt’s shortest path tree. The iSPF advantage on the Pennsauken router is meaningless to the Frankfurt router. The incremental SPF run only spares the full SPF run on some of the routers in a given area, but not on all of them. Which routers benefit from incremental SPF is heavily dependent on topology.
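The local nature of this decision can be sketched in Python. The link metrics below are invented for illustration and are not the values from the book's figures. The check is deliberately simplified (it ignores equal-cost paths) and is not how any vendor implements iSPF.

```python
import heapq


def shortest_path_tree(links, root):
    """Dijkstra: return {node: parent} for the tree rooted at root."""
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    dist, parent = {root: 0}, {root: None}
    heap, done = [(0, root)], set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for neighbor, metric in graph[node].items():
            new_cost = cost + metric
            if neighbor not in dist or new_cost < dist[neighbor]:
                dist[neighbor], parent[neighbor] = new_cost, node
                heapq.heappush(heap, (new_cost, neighbor))
    return parent


def needs_full_spf(parent, a, b):
    """A failed link a-b matters only if it is an edge of the local tree."""
    return parent.get(a) == b or parent.get(b) == a


# Hypothetical metrics; only the router names come from the example topology.
LINKS = [
    ("Pennsauken", "Washington", 5), ("Pennsauken", "New York", 5),
    ("Washington", "Paris", 10), ("New York", "London", 10),
    ("Paris", "Frankfurt", 3), ("London", "Frankfurt", 5),
]
pennsauken_tree = shortest_path_tree(LINKS, "Pennsauken")
frankfurt_tree = shortest_path_tree(LINKS, "Frankfurt")
on_pennsauken = needs_full_spf(pennsauken_tree, "London", "Frankfurt")  # False
on_frankfurt = needs_full_spf(frankfurt_tree, "London", "Frankfurt")    # True
```

The same link failure is irrelevant to Pennsauken's tree but forces a recalculation on Frankfurt.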

Another optimization of the incremental SPF run is to track network dependencies. Consider Figure 10.16, which shows a new router (Munich) attached as a leaf to the sample topology. The incremental SPF algorithm figures out that Munich is a leaf node and dependent on the Frankfurt router. That knowledge is used in the SPF calculation. Recall that once the immediate successors on the PATH list are explored, the algorithm knows that Munich is (because of its edge position) an uninteresting node for path searches and hence does not need to get explored.
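Spotting such leaf dependencies is straightforward, as the following Python sketch suggests. The adjacency data and the function name are assumptions for illustration; the text does not describe how a real implementation tracks these dependencies.

```python
def leaf_dependencies(adjacencies):
    """Map each leaf router (exactly one neighbor) to the router it hangs off."""
    return {router: neighbors[0]
            for router, neighbors in adjacencies.items()
            if len(neighbors) == 1}


# Hypothetical adjacency lists for the sample topology plus Munich.
adjacencies = {
    "Pennsauken": ["Washington", "New York"],
    "Washington": ["Pennsauken", "Paris"],
    "New York": ["Pennsauken", "London"],
    "Paris": ["Washington", "Frankfurt"],
    "London": ["New York", "Frankfurt"],
    "Frankfurt": ["Paris", "London", "Munich"],
    "Munich": ["Frankfurt"],
}
leaves = leaf_dependencies(adjacencies)
print(leaves)  # {'Munich': 'Frankfurt'}
```

A leaf can never be a transit node, so its cost is simply the cost to its parent plus the metric of the attaching link.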

Two scenarios where the iSPF algorithm may be applicable have been highlighted. It is the authors’ opinion that in the first scenario (Figure 10.15) the performance improvement is next to nothing. This is due to the fact that, in a distributed environment, convergence is bound to the worst-case performing router. It has been shown that not all routers take equal advantage of the optimization, and some routers in the topology need a full SPF run anyway. The second example (Figure 10.16) is far more interesting as it dramatically reduces the number of nodes that need to get explored. Also, the majority of the routers in the network take advantage of this and so there is a real SPF performance improvement.

Little, but profound, is known about theoretical models of the incremental SPF calculation. This is because there are lots of caveats and “it depends” in the underlying algorithm. Incremental SPF only makes sense if the underlying topology is sparsely meshed and has many edge nodes. Identification and path tracking turned out to have one of the highest overheads in the full SPF run.

Stefano Previdi, a Development Engineer at Cisco Systems who maintains their IS-IS routing protocol, claims that the average saving is 80 per cent from early field trials. The first practical examination was conducted by Cengiz Alaettinoglu and Stephen Casner of Packet Design, who monitored the QWEST backbone in the US and analyzed full and incremental SPF runtimes. The results are illustrated in Figure 10.17.

It will be shown shortly that this is the misguided reason that people are afraid of frequent SPF runs. It is the post-processing of route resolving and prefix insertion, and not the SPF calculation itself, which makes the control planes of the core routers in the Internet busy.

FIGURE 10.17 Incremental SPF performs by a factor of 80 better than the full (Dijkstra) SPF based on the QWEST topology (plot annotation: avg = 1069 usec)
