[Figure 26.14: labels T(v1), v, v2, v3, …, vk, and Rmin.]
FIGURE 26.14 If α1 and α2 satisfy the condition in Definition 1 at v1, α2 is redundant. (From Shi, W. and Li, Z., IEEE Trans. Computer-Aided Design, 24, 879, 2005. With permission.)
node being processed. However, all candidates at this node must be propagated further upstream toward the source. This means that the load seen at this node must be driven by some minimal amount of upstream wire or gate resistance. By anticipating this upstream resistance ahead of time, one can prune potentially inferior candidates earlier rather than later, which reduces the total number of candidates generated. More specifically, assume that each candidate must be driven by an upstream resistance of at least Rmin. Pruning based on this anticipated upstream resistance is called predictive pruning.
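Definition 1 (Predictive Pruning). Let α1 and α2 be two nonredundant candidates of T(v) such that C(α1) < C(α2) and Q(α1) < Q(α2). If Q(α2) − Rmin · C(α2) ≤ Q(α1) − Rmin · C(α1), then α2 is pruned.

To see why this is safe, let α1 and α2 be candidates at node v1 that are driven from the upstream node v through resistance Rmin (Figure 26.14). Then

$$Q(v,\alpha_1) - Q(v,\alpha_2) = Q(v_1,\alpha_1) - Q(v_1,\alpha_2) - R_{\min}\,[C(v_1,\alpha_1) - C(v_1,\alpha_2)] \ge 0$$

Because C(α1) < C(α2) still holds, α1 dominates α2 at v.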
The predictive pruning technique prunes more redundant solutions while guaranteeing optimality. It is one of the four key techniques in the fast algorithms proposed in Ref [39]. In Ref [42], significant speedup is achieved simply by extending the predictive pruning technique to buffer cost. Aggressive predictive pruning is proposed in Ref [43] to achieve further speedup with a slight degradation of solution quality.
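Below is a minimal Python sketch of Definition 1. It assumes each candidate is a (C, Q) pair and that the list is already nonredundant, that is, sorted by increasing C and Q. The function name `predictive_prune` and the tuple layout are illustrative choices, not taken from Ref [39].

```python
from typing import List, Tuple

# A candidate is (C, Q): downstream load capacitance and slack (required time).
Candidate = Tuple[float, float]


def predictive_prune(cands: List[Candidate], r_min: float) -> List[Candidate]:
    """Apply Definition 1 to nonredundant candidates sorted by increasing C and Q.

    A candidate is dropped when its slack after being driven through r_min,
    Q - r_min * C, is no better than that of some candidate with smaller C.
    """
    kept: List[Candidate] = []
    for c, q in cands:
        # Kept candidates have strictly increasing adjusted slack, so kept[-1]
        # holds the best adjusted slack seen so far; one comparison suffices.
        if kept and q - r_min * c <= kept[-1][1] - r_min * kept[-1][0]:
            continue
        kept.append((c, q))
    return kept
```

With `r_min = 0` the test never fires on a nonredundant list, which matches van Ginneken's ordinary pruning. Larger `r_min` removes more candidates.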
26.5.3 CONVEX PRUNING
The basic data structure of van Ginneken’s algorithm is a sorted list of nondominated candidates. Both the pruning in van Ginneken’s algorithm and predictive pruning are performed by comparing two neighboring candidates at a time. However, more potentially inferior candidates can be pruned by comparing three neighboring candidate solutions simultaneously. For three solutions in the sorted list, the middle one may be pruned according to convex pruning.
Definition 2 (Convex Pruning). Let α1, α2, and α3 be three nonredundant candidates of T(v) such that C(α1) < C(α2) < C(α3) and Q(α1) < Q(α2) < Q(α3). If

$$\frac{Q(\alpha_2) - Q(\alpha_1)}{C(\alpha_2) - C(\alpha_1)} < \frac{Q(\alpha_3) - Q(\alpha_2)}{C(\alpha_3) - C(\alpha_2)} \qquad (26.25)$$

then we call α2 nonconvex, and prune it.
Convex pruning can be explained by Figure 26.15. Consider Q as the Y-axis and C as the X-axis. Then candidates are points in the two-dimensional plane. It is easy to see that the set of nonredundant candidates N(v) is a sequence that increases in both C and Q. A candidate that is nonconvex by Definition 2 is shown in Figure 26.15a, and it is pruned in Figure 26.15b. The set of nonredundant candidates after convex pruning, M(v), is obtained by removing the candidates of T(v) that satisfy the condition in Definition 2. What remains is the convex hull of N(v).

[Figure 26.15: candidates plotted as points (c, q). (a) Nonredundant candidates (c1, q1) through (c4, q4), with (c2, q2) marked "Pruned"; (b) the remaining candidates (c1, q1), (c3, q3), and (c4, q4).]
FIGURE 26.15 (a) Nonredundant candidates N(v) and (b) nonredundant candidates M(v) after convex pruning. (From Li, Z. and Shi, W., IEEE Trans. Computer-Aided Design, 25, 484, 2006. With permission.)

In Figure 26.15, let the slope between α1 and α2 (α2 and α3) be ρ1,2 (ρ2,3). If candidate α2 is not on the convex hull of the solution set, then ρ1,2 < ρ2,3. These candidates must be driven by some upstream resistance R, including wire resistance and buffer/driver resistance. This resistance reduces the slack of each candidate α by R · C(α), plus a term common to all candidates.

- If R ≥ ρ1,2, then Q(α2) − R · C(α2) ≤ Q(α1) − R · C(α1), so α2 is no better than α1.
- Otherwise R < ρ1,2 < ρ2,3, and Q(α2) − R · C(α2) < Q(α3) − R · C(α3), so α2 is worse than α3.

In other words, if a candidate is not on the convex hull, it will be pruned either by the solution ahead of it or by the solution behind it. Note that this conclusion applies only to two-pin nets. For multipin nets, where the upstream node could be a merging vertex, nonredundant candidates that are pruned by convex pruning could still be useful.

In Refs [40,41], convex pruning is used to form the convex hull of the nonredundant candidates, which can be done in linear time by Graham’s scan. Furthermore, when a new candidate is inserted into the list, only its neighbors need to be checked to decide whether any candidate should be pruned under convex pruning. A form of convex pruning, called squeeze pruning, is also performed on both two-pin and multipin nets. It prunes more solutions at the cost of a slight degradation of solution quality.
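For a two-pin segment, Definition 2 can be applied to the whole sorted list in one left-to-right pass, in the style of the monotone-chain variant of Graham’s scan. The sketch below assumes (C, Q) tuples sorted by increasing C and Q. It compares slopes by cross-multiplication, which is valid because every C difference is positive. The names `is_nonconvex` and `convex_prune` are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[float, float]  # (C, Q)


def is_nonconvex(a: Candidate, b: Candidate, p: Candidate) -> bool:
    """Definition 2 for C(a) < C(b) < C(p): slope(a, b) < slope(b, p).

    Both denominators are positive, so the fractions can be cross-multiplied.
    """
    return (b[1] - a[1]) * (p[0] - b[0]) < (p[1] - b[1]) * (b[0] - a[0])


def convex_prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep the upper convex hull of candidates sorted by increasing C (and Q).

    Each candidate is pushed and popped at most once, so the pass is linear.
    """
    hull: List[Candidate] = []
    for p in cands:
        # Removing a middle point can expose a new nonconvex triple, so repeat.
        while len(hull) >= 2 and is_nonconvex(hull[-2], hull[-1], p):
            hull.pop()
        hull.append(p)
    return hull
```

Collinear candidates, whose neighboring slopes are equal, are kept because Definition 2 uses a strict inequality.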
26.5.4 EFFICIENT WAY TO FIND BEST CANDIDATES
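Let N(v) be the set of nonredundant candidates of T(v), where N(v) does not include candidates with buffers inserted at v. Now we want to insert a buffer at v. For buffer type Bi and candidate α ∈ N(v), define

$$P_i(v,\alpha) = Q(v,\alpha) - R(B_i)\cdot C(v,\alpha) - K(B_i) \qquad (26.26)$$

Inserting Bi at v then yields a new candidate βi with

$$Q(v,\beta_i) = \max_{\alpha \in N(v)} P_i(v,\alpha), \qquad C(v,\beta_i) = C(B_i)$$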
Define the best candidate for Bi as the candidate α ∈ N(v) that maximizes Pi(v, α).
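Equation 26.26 translates directly into a brute-force evaluation over N(v). This is the baseline that the lemmas below speed up. In this sketch the `Buffer` fields `r`, `k`, and `c_in` stand for R(Bi), K(Bi), and C(Bi). These names, and the (C, Q) tuple layout, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Candidate = Tuple[float, float]  # (C, Q)


@dataclass(frozen=True)
class Buffer:
    r: float     # R(B_i): driving resistance
    k: float     # K(B_i): intrinsic delay
    c_in: float  # C(B_i): input capacitance seen upstream


def p_value(buf: Buffer, cand: Candidate) -> float:
    """P_i(v, alpha) from Equation 26.26."""
    c, q = cand
    return q - buf.r * c - buf.k


def insert_buffer(buf: Buffer, n_v: List[Candidate]) -> Tuple[Candidate, Candidate]:
    """Return (best candidate for buf, new buffered candidate beta_i as (C, Q))."""
    best = max(n_v, key=lambda a: p_value(buf, a))  # linear scan over N(v)
    beta = (buf.c_in, p_value(buf, best))
    return best, beta
```

Running this scan for each of b buffer types costs O(b · |N(v)|) per node. The local search permitted by Lemma 2 avoids most of that work.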
From the discussion of convex pruning, it is easy to see that all best candidates are on the convex hull. The following lemma says the following. Sort the candidates in increasing Q and C order from left to right. Then, as wires are added to the candidates, the best candidates always lie further to the left.∗
Lemma 1. For any T(v), let the nonredundant candidates after convex pruning be α1, α2, …, αk, in increasing Q and C order. Now add wire e to each candidate αj and denote the result αj + e. For any buffer type Bi, if αj gives the maximum Pi(αj) and αm gives the maximum Pi(αm + e), then m ≤ j.
The following lemma says that the best candidate can be found by local search if all candidates are convex.
Lemma 2. For any T(v), let the nonredundant candidates after convex pruning be α1, α2, …, αk, in increasing Q and C order. If Pi(αj−1) ≤ Pi(αj) and Pi(αj) ≥ Pi(αj+1), then αj is the best candidate for buffer type Bi, and

$$P_i(\alpha_1) \le \cdots \le P_i(\alpha_{j-1}) \le P_i(\alpha_j)$$
$$P_i(\alpha_j) \ge P_i(\alpha_{j+1}) \ge \cdots \ge P_i(\alpha_k)$$
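The sketch below shows one way Lemmas 1 and 2 can be used together. It assumes `hull` is a convex-pruned list of (C, Q) tuples.

- `best_index` climbs from a starting position to a local maximum of Pi. By Lemma 2, that local maximum is the best candidate.
- Adding a wire with resistance r and capacitance cw to every candidate turns Pi(α) into Q(α) − (R(Bi) + r) · C(α) minus a constant. This illustration therefore mimics Lemma 1 by increasing the effective resistance.
- `sweep_best` starts each search from the previous answer. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, float]  # (C, Q), sorted by increasing C and Q, convex-pruned


def p_value(cand: Candidate, r_buf: float, k_buf: float) -> float:
    """P_i of Equation 26.26 for a buffer with resistance r_buf, intrinsic delay k_buf."""
    c, q = cand
    return q - r_buf * c - k_buf


def best_index(hull: List[Candidate], r_buf: float, k_buf: float, start: int) -> int:
    """Hill-climb to a local maximum of P_i; by Lemma 2 it is the global maximum."""
    j = start
    while j > 0 and p_value(hull[j - 1], r_buf, k_buf) > p_value(hull[j], r_buf, k_buf):
        j -= 1
    while j + 1 < len(hull) and p_value(hull[j + 1], r_buf, k_buf) > p_value(hull[j], r_buf, k_buf):
        j += 1
    return j


def sweep_best(hull: List[Candidate], resistances: Sequence[float], k_buf: float = 0.0) -> List[int]:
    """Best index for each effective resistance, given in increasing order.

    Following Lemma 1, the best index never moves right as resistance grows,
    so each search starts where the previous one ended.
    """
    result: List[int] = []
    j = len(hull) - 1
    for r in resistances:
        j = best_index(hull, r, k_buf, j)
        result.append(j)
    return result
```

For example, on the hull [(1, 10), (2, 14), (4, 18), (7, 20)], effective resistances 1, 3, and 5 give best indices 2, 1, and 0. The sweep moves monotonically to the left.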
26.5.5 IMPLICIT REPRESENTATION
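Van Ginneken’s algorithm uses an explicit representation to store slack and capacitance values, and it updates candidates explicitly: when a wire is added, the Q and C values of every candidate are recomputed. With implicit representation, only the offsets qa, ca, and ra are updated. These offsets are stored either in the nodes of the tree [39] or as global variables [41].† Intuitively, qa represents extra wire delay, ca represents extra wire capacitance, and ra represents extra wire resistance.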
∗ In Ref [40], Lemma 1 is presented differently: if all buffer types are sorted in decreasing order of driving resistance, then the best candidates for the buffer types, taken in that order, move from left to right.
† In Ref [41], only two fields, q and c, are necessary for each candidate; qa, ca, and ra are global variables for each two-pin segment.
The actual values of Q and C of each candidate α are decided as follows:

Q(α) = q − qa − ra · c
C(α) = c + ca
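Below is a minimal sketch of implicit representation for a single two-pin segment, in the spirit of footnote †. It keeps global offsets qa, ca, and ra, and per-candidate raw fields q and c.

The update rules in `add_wire` are our own derivation from the Elmore delay of a wire with resistance r and capacitance cw, namely Q′ = Q − r · (cw/2 + C) and C′ = C + cw. They are not copied from Refs [39,41]. The class and method names are hypothetical.

```python
from typing import List, Tuple


class ImplicitCandidates:
    """Candidates stored as raw (q, c); actual Q = q - qa - ra * c, C = c + ca."""

    def __init__(self) -> None:
        self.qa = 0.0  # accumulated extra wire delay
        self.ca = 0.0  # accumulated extra wire capacitance
        self.ra = 0.0  # accumulated extra wire resistance
        self.items: List[Tuple[float, float]] = []  # raw (q, c) per candidate

    def insert(self, Q: float, C: float) -> None:
        """Store a candidate given its actual (Q, C) under the current offsets."""
        c = C - self.ca
        q = Q + self.qa + self.ra * c
        self.items.append((q, c))

    def add_wire(self, r: float, cw: float) -> None:
        """Apply Q' = Q - r * (cw / 2 + C), C' = C + cw to every candidate in O(1)."""
        self.qa += r * (cw / 2.0 + self.ca)  # must use ca before it is updated
        self.ra += r
        self.ca += cw

    def actual(self) -> List[Tuple[float, float]]:
        """Materialize the actual (Q, C) values; O(n), only needed when inspecting."""
        return [(q - self.qa - self.ra * c, c + self.ca) for q, c in self.items]
```

Each wire therefore costs constant time instead of time linear in the number of candidates. A candidate created after some wires have been added is stored relative to the current offsets, so it stays consistent with the others.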
Implicit representation is applied to a balanced tree in Ref [39], where the operation of adding a …
REFERENCES
1. J. Cong. An interconnect-centric design flow for nanometer technologies. Proceedings of IEEE, 89(4): 505–528, April 2001.
2. J. A. Davis, R. Venkatesan, A. Kaloyeros, M. Beylansky, S. J. Souri, K. Banerjee, K. C. Saraswat, A. Rahman, R. Reif, and J. D. Meindl. Interconnect limits on gigascale integration (GSI) in the 21st century. Proceedings of IEEE, 89(3): 305–324, March 2001.
3. R. Ho, K. W. Mai, and M. A. Horowitz. The future of wires. Proceedings of IEEE, 89(4): 490–504, April 2001.
4. A. B. Kahng and G. Robins. On Optimal Interconnections for VLSI. Kluwer Academic Publishers, Boston, MA, 1995.
5. J. Cong, L. He, C.-K. Koh, and P. H. Madden. Performance optimization of VLSI interconnect layout. Integration: The VLSI Journal, 21: 1–94, 1996.
6. P. Saxena, N. Menezes, P. Cocchini, and D. A. Kirkpatrick. Repeater scaling and its impact on CAD. IEEE Transactions on Computer-Aided Design, 23(4): 451–463, April 2004.
7. J. Cong. Challenges and opportunities for design innovations in nanometer technologies. SRC Design Sciences Concept Paper, 1997.
8. M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, NY, 1993.
9. C. J. Alpert and A. Devgan. Wire segmenting for improved buffer insertion. In Proceedings of the ACM/IEEE Design Automation Conference, Anaheim, CA, pp. 588–593, 1997.
10. C. C. N. Chu and D. F. Wong. Closed form solution to simultaneous buffer insertion/sizing and wire sizing. ACM Transactions on Design Automation of Electronic Systems, 6(3): 343–371, July 2001.
11. L. P. P. P. van Ginneken. Buffer placement in distributed RC-tree networks for minimal Elmore delay. In Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, LA, pp. 865–868, 1990.
12. J. Lillis, C. K. Cheng, and T. Y. Lin. Optimal wire sizing and buffer insertion for low power and a generalized delay model. IEEE Journal of Solid-State Circuits, 31(3): 437–447, March 1996.
13. N. Menezes and C.-P. Chen. Spec-based repeater insertion and wire sizing for on-chip interconnect. In Proceedings of the International Conference on VLSI Design, Goa, India, pp. 476–483, 1999.
14. L.-D. Huang, M. Lai, D. F. Wong, and Y. Gao. Maze routing with buffer insertion under transition time constraints. IEEE Transactions on Computer-Aided Design, 22(1): 91–95, January 2003.
15. C. J. Alpert, A. B. Kahng, B. Liu, I. I. Mandoiu, and A. Z. Zelikovsky. Minimum buffered routing with bounded capacitive load for slew rate and reliability control. IEEE Transactions on Computer-Aided Design, 22(3): 241–253, March 2003.
16. C. Kashyap, C. J. Alpert, F. Liu, and A. Devgan. Closed form expressions for extending step delay and slew metrics to ramp inputs. In Proceedings of the ACM International Symposium on Physical Design, Monterey, CA, pp. 24–31, 2003.
17. H. B. Bakoglu. Circuits, Interconnections and Packaging for VLSI. Addison-Wesley, Reading, MA, 1990.
18. N. H. E. Weste and K. Eshraghian. Principles of CMOS VLSI Design: A System Perspective. Addison-Wesley Publishing Company, Reading, MA, 1993.
19. S. Hu, C. J. Alpert, J. Hu, S. Karandikar, Z. Li, W. Shi, and C.-N. Sze. Fast algorithms for slew constrained minimum cost buffering. In Proceedings of the ACM/IEEE Design Automation Conference, San Francisco, CA, pp. 308–313, 2006.
20. J. Cong and C. K. Koh. Simultaneous driver and wire sizing for performance and power optimization. IEEE Transactions on VLSI Systems, 2(4): 408–425, December 1994.
21. S. S. Sapatnekar. RC interconnect optimization under the Elmore delay model. In Proceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, pp. 392–396, 1994.
22. J. Cong and K.-S. Leung. Optimal wiresizing under the distributed Elmore delay model. IEEE Transactions on Computer-Aided Design, 14(3): 321–336, March 1995.
23. J. P. Fishburn and C. A. Schevon. Shaping a distributed RC line to minimize Elmore delay. IEEE Transactions on Circuits and Systems, 42(12): 1020–1022, December 1995.
24. C. P. Chen, Y. P. Chen, and D. F. Wong. Optimal wire-sizing formula under the Elmore delay model. In Proceedings of the ACM/IEEE Design Automation Conference, Las Vegas, NV, pp. 487–490, 1996.
25. C. J. Alpert, A. Devgan, J. P. Fishburn, and S. T. Quay. Interconnect synthesis without wire tapering. IEEE Transactions on Computer-Aided Design, 20(1): 90–104, January 2001.
26. A. Devgan. Efficient coupled noise estimation for on-chip interconnects. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 147–151, 1997.
27. C. J. Alpert, A. Devgan, and S. T. Quay. Buffer insertion for noise and delay optimization. IEEE Transactions on Computer-Aided Design, 18(11): 1633–1645, November 1999.
28. C. C. N. Chu and D. F. Wong. A new approach to simultaneous buffer insertion and wire sizing. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 614–621, 1997.
29. W. C. Elmore. The transient response of damped linear networks with particular regard to wideband amplifiers. Journal of Applied Physics, 19: 55–63, January 1948.
30. F. J. Liu, J. Lillis, and C. K. Cheng. Design and implementation of a global router based on a new layout-driven timing model with three poles. In Proceedings of the IEEE International Symposium on Circuits and Systems, Hong Kong, China, pp. 1548–1551, 1997.
31. J. Qian, S. Pullela, and L. T. Pillage. Modeling the effective capacitance for the RC interconnect of CMOS gates. IEEE Transactions on Computer-Aided Design, 13(12): 1526–1535, December 1994.
32. S. R. Nassif and Z. Li. A more effective Ceff. In Proceedings of the IEEE International Symposium on Quality Electronic Design, San Jose, CA, pp. 648–653, 2005.
33. B. Tutuianu, F. Dartu, and L. Pileggi. Explicit RC-circuit delay approximation based on the first three moments of the impulse response. In Proceedings of the ACM/IEEE Design Automation Conference, Las Vegas, NV, pp. 611–616, 1996.
34. C. J. Alpert, F. Liu, C. V. Kashyap, and A. Devgan. Closed-form delay and slew metrics made easy. IEEE Transactions on Computer-Aided Design, 23(12): 1661–1669, December 2004.
35. C. J. Alpert, A. Devgan, and S. T. Quay. Buffer insertion with accurate gate and interconnect delay computation. In Proceedings of the ACM/IEEE Design Automation Conference, New Orleans, LA, pp. 479–484, 1999.
36. C.-K. Cheng, J. Lillis, S. Lin, and N. Chang. Interconnect Analysis and Synthesis. Wiley Interscience, New York, 2000.
37. S. Hassoun, C. J. Alpert, and M. Thiagarajan. Optimal buffered routing path constructions for single and multiple clock domain systems. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 247–253, 2002.
38. P. Cocchini. A methodology for optimal repeater insertion in pipelined interconnects. IEEE Transactions on Computer-Aided Design, 22(12): 1613–1624, December 2003.
39. W. Shi and Z. Li. A fast algorithm for optimal buffer insertion. IEEE Transactions on Computer-Aided Design, 24(6): 879–891, June 2005.
40. Z. Li and W. Shi. An O(bn²) time algorithm for buffer insertion with b buffer types. IEEE Transactions on Computer-Aided Design, 25(3): 484–489, March 2006.
41. Z. Li and W. Shi. An O(mn) time algorithm for optimal buffer insertion of nets with m sinks. In Proceedings of Asia and South Pacific Design Automation Conference, Yokohama, Japan, pp. 320–325, 2006.
42. W. Shi, Z. Li, and C. J. Alpert. Complexity analysis and speedup techniques for optimal buffer insertion with minimum cost. In Proceedings of Asia and South Pacific Design Automation Conference, Yokohama, Japan, pp. 609–614, 2004.
43. Z. Li, C. N. Sze, C. J. Alpert, J. Hu, and W. Shi. Making fast buffer insertion even faster via approximation techniques. In Proceedings of Asia and South Pacific Design Automation Conference, Shanghai, China, pp. 13–18, 2005.
27 Generalized Buffer Insertion
Miloš Hrkić and John Lillis
CONTENTS
27.1 Introduction
27.2 Two-Phase Approach and Buffer-Aware Tree Construction
27.2.1 C-Tree Algorithm
27.2.2 Buffer Tree Topology Generation
27.3 Simultaneous Tree Construction and Buffer Insertion
27.3.1 P-Tree Algorithm
27.3.2 S-Tree Algorithm
27.3.3 SP-Tree Algorithm
27.3.4 Complete Tree Topology Exploration
References
27.1 INTRODUCTION
It has been widely recognized that interconnect is a dominating factor in modern very large scale integration (VLSI) circuit designs. Chapter 26 gave an overview of the challenges that interconnect faces. It also introduced a technique called repeater insertion, which has proven very efficient in addressing emerging interconnect issues.
Early work on repeater insertion focused mainly on improving interconnect timing performance. The most influential work is van Ginneken’s dynamic programming algorithm [1]. The algorithm performs buffer insertion on a fixed and embedded tree (e.g., as given by a global router) and produces an optimal timing solution under the Elmore delay model [2].

Various generalizations of van Ginneken’s algorithm have appeared in the literature. They take into account issues of practical importance such as buffer libraries with inverting and noninverting buffers, simultaneous wire sizing, and slew-based delay models. Other generalizations address natural constrained optimization variants of the problem (e.g., minimizing area or power consumption subject to timing constraints). Progress has also been made in improving computational complexity as well as practical runtime. Many of these results are presented in Chapter 26.
A significant limitation of van Ginneken’s approach is that it requires a fixed and embedded tree that has to be provided in advance. This constraint makes the quality of the final buffered solution depend on the input tree. Even though the algorithm produces an optimal timing solution for a given tree, it will produce a poor solution when given a poor tree. A few scenarios that are very common in practice illustrate this limitation.
As noted earlier, one of the basic interconnect optimization tasks is delay minimization. Because sinks may have very different required signal arrival times, a routing solution that focuses only on, for example, minimizing wirelength may not be good enough. In Figure 27.1, sinks F and G are timing critical while the others are not. The configuration in Figure 27.1a has better wirelength, but its buffering cost is very high. The configuration in Figure 27.1b, on the other hand, achieves better timing results with slightly more wirelength and many fewer buffers.
[Figure 27.1: two routing trees, (a) and (b), over sinks A through G.]
FIGURE 27.1 Buffering example: Sinks F and G are assumed to be critical; tree (a) has slightly smaller wirelength but requires more buffers (and may prevent timing constraints on F and G from being met) than tree (b).
FIGURE 27.2 Buffering example: To meet signal polarity requirements, the number of buffers required varies significantly from one topology to another.
In some cases, certain sinks of a net require input signals of inverted polarity. Choices made during route construction can then have a large impact on the cost of the buffering solution. The two solutions in Figure 27.2, for example, have very different buffer and wiring costs.
Figure 27.3 shows a simple example of the issues that arise when buffering and routing in the presence of blockages.

- In the configuration of Figure 27.3a, the route goes over the blockage and cannot be buffered, so it may violate timing, load, or slew constraints.
- If the route completely avoids the blockage, the resulting solution is expensive in terms of wire and buffer costs (Figure 27.3b).
- Finally, the configuration in Figure 27.3c, which is aware of the different types of blockages, dominates in both delay and resource usage/cost.
Recently, some designs have reserved internal areas of macro objects for buffering of external nets (e.g., the whitespace in macros, as in Figure 27.4). Any buffer insertion algorithm that must work on a route that ignores these layout specifics has limited chances of success. In Figure 27.4, assume that sink A is critical and the others are not. The two solutions can then differ significantly in quality (e.g., in cost or timing characteristics).
In other practical formulations, routing or buffering feasibility is not treated as a zero-or-one property (blocked or free). Instead, a complex cost function based on local and global design density and congestion should drive the routing and buffering algorithms. Such formulations can prevent overconstraining the design space, but they require incremental interaction with placers and routers. Moreover, overall design closure can suffer: irresponsible use of buffering resources on nets (or portions of nets) that are not critical can prevent other critical nets from …
Given the examples above, routing and buffering algorithms should be able to account for the cost/performance trade-off of the solutions that they produce. Generating the fastest buffering solution may be necessary for some nets. If applied to all nets, however, it would quickly make the design too expensive (e.g., in area and power usage), or even impossible to manufacture. Algorithm complexity and runtime are also very important practical factors, since hundreds of thousands of nets may need to be buffered within a given CPU time budget.

∗ Some of the approaches that are specifically designed to target blockages (routing or placement), as well as design density and congestion, are presented in more detail in Chapter 28. However, some of these ideas are reviewed in this chapter because they are among the core components of some tree synthesis and buffering algorithms.

FIGURE 27.3 Buffering example: Depending on the interaction between routes and blockages, the buffered solution can be (a) infeasible, (b) expensive, or (c) not bad at all.
FIGURE 27.4 Buffering example: With the increasing complexity of constraints, the ability of buffering algorithms to handle such constraints is becoming more important. Assuming that sink A is critical, solutions (a) and (b) can differ significantly in quality.
In the following sections, we give an overview of recent research that addresses one or more of the problems mentioned above. This area of research is still very active, and our summary presents only a snapshot of past and current research.
Most techniques that address these problems fall into one of two categories. Several works propose a two-stage sequential method in which a buffer-aware tree is constructed first, followed by van Ginneken style buffer insertion, as in Refs [3–6]. These techniques have small execution times but sacrifice some solution quality and predictability. In Section 27.2, we describe the techniques from Refs [3,6] in more detail.
A more robust and predictable approach constructs the route simultaneously with buffer insertion. An example is the buffered P-Tree class of algorithms [7], which integrates buffer insertion into the P-Tree Steiner tree construction algorithm [8]. Rather than applying ad hoc heuristics, the P-Tree algorithm introduced a paradigm of finding an optimal solution in a constrained but very large space that includes topological, embedding, and buffering degrees of freedom. Section 27.3 presents methods for simultaneous routing tree construction and buffer insertion from Refs [7–12].
27.2 TWO-PHASE APPROACH AND BUFFER-AWARE TREE CONSTRUCTION
27.2.1 C-TREE ALGORITHM
The work in Ref [3] addresses the problem of buffering under timing and polarity constraints. The inputs are a net with placed pins, timing and polarity requirements at the sinks, driver properties, a buffer library, and the technology’s interconnect parasitics. The goal is to find a Steiner tree that, after buffer insertion, meets the timing constraints while minimizing solution cost (i.e., wire and buffer usage).
A two-phase flow is proposed: a buffer-aware Steiner tree construction called C-Tree, followed by van Ginneken style buffer insertion. It is argued that optimal buffer insertion on a fixed, routed tree can produce good or optimal results as long as it is given the right Steiner tree. In practice, however, finding the right tree is very difficult, because the tree construction algorithm does not optimize the true objective. Instead, one can construct a buffer-aware Steiner tree that tries to anticipate potential buffer locations.
The main idea in C-Tree (clustered tree) is to construct a tree in two stages.

1. Sinks are clustered based on a distance metric that combines timing criticality, polarity requirements, and physical distance. Lower-level trees are then constructed on each cluster.
2. After the tapping point of each cluster is determined, the top-level timing-driven tree is constructed, connecting the driver with the cluster tapping points.

Merging the top-level tree with the cluster trees yields the final tree for the entire net.
Sink properties used for clustering are spatial (physical location coordinates), temporal (required arrival times), and polarity. The distance metric incorporates all three elements: they are defined separately and then combined into a single distance metric using scaling factors. The spatial distance … same required arrival time over the longer distance. Thus, an estimate of the achievable delay is used to adjust the required arrival time and obtain an achievable slack. It is further argued that the difference in … parameter. The criticality is a value between 0 and 1, where 1 is the most critical (the average … defined as the difference in sink criticality. Finally, the distance metric is a linear combination of the spatial, temporal, and polarity distances. The spatial distance is normalized by the spatial diameter sDiam(N), defined as the maximum distance between the sinks:

$$\beta\,\frac{\mathrm{sDist}(s_i, s_j)}{\mathrm{sDiam}(N)} + (1 - \beta)\,\mathrm{tDist}(s_i, s_j) + \mathrm{pDist}(s_i, s_j)$$
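The combined metric can be sketched in Python as follows. Several details are assumptions:

- sDist is taken to be the Manhattan distance between pins.
- tDist follows the text and is the difference in sink criticality.
- The definition of pDist is lost from this excerpt, so the fixed `penalty` for mismatched polarities is a hypothetical stand-in.

All class and function names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class Sink:
    x: float
    y: float
    criticality: float  # in [0, 1], 1 = most critical
    inverted: bool      # True if the sink requires the inverted signal polarity


def s_dist(a: Sink, b: Sink) -> float:
    # Spatial distance: Manhattan distance between pin locations (assumed).
    return abs(a.x - b.x) + abs(a.y - b.y)


def s_diam(sinks: Sequence[Sink]) -> float:
    # sDiam(N): maximum spatial distance between any two sinks.
    return max((s_dist(a, b) for a, b in combinations(sinks, 2)), default=0.0)


def t_dist(a: Sink, b: Sink) -> float:
    # Temporal distance: difference in sink criticality.
    return abs(a.criticality - b.criticality)


def p_dist(a: Sink, b: Sink, penalty: float) -> float:
    # Hypothetical polarity distance: a fixed penalty when polarities differ.
    return penalty if a.inverted != b.inverted else 0.0


def ctree_distance(a: Sink, b: Sink, diam: float, beta: float, penalty: float) -> float:
    """beta * sDist / sDiam + (1 - beta) * tDist + pDist."""
    spatial = s_dist(a, b) / diam if diam > 0 else 0.0
    return beta * spatial + (1.0 - beta) * t_dist(a, b) + p_dist(a, b, penalty)
```

Normalizing by sDiam(N) keeps the spatial term in [0, 1], the same range as the criticality difference, so β trades off two comparable quantities.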
The clustering itself is done using a K-center heuristic. This iterative approach identifies the sinks that are furthest away from each other and labels them as cluster seeds. The remaining sinks are then clustered around the closest seed. More details can be found in Ref [3].
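The farthest-seed idea described above is commonly implemented as the greedy K-center (farthest-point) heuristic. The minimal Python sketch below assumes the first sink is the initial seed and accepts any distance function, such as a C-Tree style metric. The exact seeding rule in Ref [3] may differ, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def k_center_clusters(points: Sequence[Point], k: int,
                      dist: Callable[[Point, Point], float] = manhattan) -> List[List[int]]:
    """Greedy K-center: repeatedly pick the sink farthest from all seeds,
    then assign every sink to its closest seed. Returns lists of sink indices."""
    n = len(points)
    if n == 0 or k <= 0:
        return []
    seeds = [0]  # assumption: the first sink is the initial seed
    nearest = [dist(p, points[0]) for p in points]  # distance to closest seed
    while len(seeds) < min(k, n):
        far = max(range(n), key=lambda i: nearest[i])
        if nearest[far] == 0.0:  # every sink already coincides with a seed
            break
        seeds.append(far)
        nearest = [min(d, dist(p, points[far])) for d, p in zip(nearest, points)]
    clusters: List[List[int]] = [[] for _ in seeds]
    for i, p in enumerate(points):
        closest = min(range(len(seeds)), key=lambda s: dist(p, points[seeds[s]]))
        clusters[closest].append(i)
    return clusters
```

For six sinks at x = 0, 1, 10, 11, 20, 21 on a line and k = 3, this yields the clusters {0, 1}, {20, 21}, and {10, 11}. Each run of the outer loop is linear in the number of sinks.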
Once the clusters are determined, timing-driven Steiner trees are constructed on each cluster, and one on the top level, using the Prim–Dijkstra algorithm from Ref [13].
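The core of the Prim–Dijkstra trade-off is a single cost rule. A node v joins the tree through the tree node u that minimizes c · pathlength(u) + dist(u, v). With c = 0 this reduces to Prim’s minimum spanning tree, and with c = 1 to a shortest-path tree. The sketch below builds only the spanning-tree form under Manhattan distance. The Steinerization that a timing-driven Steiner construction would add is not shown, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def prim_dijkstra(points: Sequence[Point], c: float,
                  dist: Callable[[Point, Point], float] = manhattan) -> List[int]:
    """Spanning-tree Prim-Dijkstra trade-off; points[0] is the driver.

    Returns parent indices (-1 for the driver). Runs in O(n^2) time.
    """
    n = len(points)
    join_cost = [float("inf")] * n  # best c * path(u) + dist(u, v) found so far
    parent = [-1] * n
    path = [0.0] * n                # driver-to-node path length inside the tree
    done = [False] * n
    join_cost[0] = 0.0
    for _ in range(n):
        u = min((j for j in range(n) if not done[j]), key=lambda j: join_cost[j])
        done[u] = True
        if parent[u] >= 0:
            path[u] = path[parent[u]] + dist(points[parent[u]], points[u])
        for v in range(n):
            if not done[v]:
                cost = c * path[u] + dist(points[u], points[v])
                if cost < join_cost[v]:
                    join_cost[v] = cost
                    parent[v] = u
    return parent
```

For a driver at (0, 0) and sinks at (10, 0) and (10, 1), the two extremes of c give different trees. With c = 0 the second sink chains off the first, which minimizes wirelength. With c = 1 it connects directly to the driver, which keeps its source-to-sink path as short as possible.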
The experimental results show that this technique often gives a good trade-off between runtime and quality of results: it provides solutions that are good on average in both cost and delay while keeping runtime low. In addition, the method is not very complicated to implement. Note, however, that the algorithm is not designed to handle obstacles or design congestion in general, so its results may be less predictable in those scenarios.
27.2.2 BUFFER TREE TOPOLOGY GENERATION
A more recent work [6] also recognizes the problem of buffering fixed trees. It further addresses the growing problem of design size, where millions of nets have to be optimized in a reasonable amount of time. This work presents a new algorithm for generating tree topologies that are buffer-friendly. The algorithm balances meeting the required arrival time constraints against minimizing wirelength.

Let us first explain the notion of tree topology used in this work, which we will call a partially embedded tree topology. Figure 27.5a shows an example. It is a directed tree in which:

- each node except the root has exactly one input edge;
- each internal node has exactly two output edges;
- the root has only one output edge;
- each node has an assigned placement location (placement overlap is allowed).

However, the embeddings of the edges (i.e., the routes), as well as the number and placement of buffers, are not specified. An example of a completely embedded and buffered tree topology is given in Figure 27.5b.
Once the partially embedded topology tree is constructed, many known techniques can be used to perform two-pin routing and buffer insertion between the tree nodes (e.g., Refs [14–16]).
Unlike the approach in Ref [3], subtree parities (i.e., signal polarities) are resolved locally, because inverters are used for buffering.
The algorithm proceeds as follows. First, sinks are ordered by criticality, most critical first. As in Ref [3], criticality is estimated from estimated slack rather than from the sink required arrival time alone. To estimate the delay from the driver to the sinks, a linear delay model is used (similar to Ref [5]), augmented by estimated buffer intrinsic delays and loads. The assumption is that these paths will eventually be buffered, so the algorithm accounts for the delay that each path will have after buffering. In Ref [6], additional experiments are performed to justify this assumption, and the results show good correlation between the estimates and the final results.
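The ordering step can be sketched as follows. The specific estimate is a hypothetical stand-in for the model of Ref [6]: a per-unit-length delay plus one buffer intrinsic delay for every `buffer_spacing` units of Manhattan distance. The parameter and function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SinkReq:
    name: str
    x: float
    y: float
    rat: float  # required arrival time at the sink


def estimated_delay(length: float, delay_per_unit: float,
                    buffer_spacing: float, buffer_delay: float) -> float:
    """Linear wire delay plus the intrinsic delay of the buffers the path is
    expected to receive (one every buffer_spacing units; a hypothetical rule)."""
    n_buffers = math.floor(length / buffer_spacing)
    return length * delay_per_unit + n_buffers * buffer_delay


def order_by_criticality(driver: Tuple[float, float], sinks: Sequence[SinkReq],
                         delay_per_unit: float, buffer_spacing: float,
                         buffer_delay: float) -> List[str]:
    """Return sink names, most critical (smallest estimated slack) first."""
    def slack(s: SinkReq) -> float:
        length = abs(s.x - driver[0]) + abs(s.y - driver[1])
        return s.rat - estimated_delay(length, delay_per_unit, buffer_spacing, buffer_delay)
    return [s.name for s in sorted(sinks, key=slack)]
```

Consider a far sink with a later required time, such as s1 at distance 200 with RAT 300. Its estimated slack is 20, so it is ranked ahead of a near sink s2 with RAT 250 and slack 240. This illustrates why estimated slack, rather than raw required time, is used for ordering.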
When the ordering is complete, sinks are added to the topology one at a time; the initial topology consists of only the driver and the most critical sink. To insert a single sink, the algorithm examines every edge in the current topology and finds the closest tapping point within the bounding box of the edge terminals. This is well defined because the topology is partially embedded and all nodes have fixed placement locations. The edge that gives the best overall slack is chosen. The sink is inserted by breaking that edge and adding a new internal node to the tree: its parent is the source of the chosen edge, and its children are the new sink and the destination node of the chosen edge. By keeping arrival times at each topology node, a single sink insertion takes linear time, giving an overall quadratic algorithm. Each operation is fairly simple, so execution time is very small.
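A simplified Python sketch of one sink insertion follows. It assumes a pure linear delay model with a hypothetical constant `k` per unit of Manhattan length, which is an illustration rather than the exact model of Ref [6]. Under that assumption, a tapping point inside the bounding box of edge (u, v) satisfies dist(u, t) + dist(t, v) = dist(u, v), so existing arrival times do not change. The best edge is then the one that gives the new sink the most slack; ties go to the shortest new branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def clamp_to_bbox(p: Point, a: Point, b: Point) -> Point:
    """Closest point to p (Manhattan distance) inside the bounding box of a and b."""
    x = min(max(p[0], min(a[0], b[0])), max(a[0], b[0]))
    y = min(max(p[1], min(a[1], b[1])), max(a[1], b[1]))
    return (x, y)


@dataclass
class Topology:
    """Partially embedded topology: node 0 is the driver; edge (parent[v], v)."""
    k: float  # hypothetical linear delay per unit length
    pos: List[Point] = field(default_factory=list)
    parent: List[int] = field(default_factory=list)
    arrival: List[float] = field(default_factory=list)

    @classmethod
    def start(cls, driver: Point, first_sink: Point, k: float) -> "Topology":
        t = cls(k=k)
        t.pos = [driver, first_sink]
        t.parent = [-1, 0]
        t.arrival = [0.0, k * manhattan(driver, first_sink)]
        return t

    def insert_sink(self, s: Point, rat: float) -> int:
        """Examine every edge once (linear time), split the best one, return sink index."""
        best: Optional[Tuple[Tuple[float, float], int, Point, float]] = None
        for v in range(1, len(self.pos)):
            u = self.parent[v]
            tap = clamp_to_bbox(s, self.pos[u], self.pos[v])
            arr = self.arrival[u] + self.k * (manhattan(self.pos[u], tap) + manhattan(tap, s))
            key = (rat - arr, -manhattan(tap, s))  # slack first, then short branch
            if best is None or key > best[0]:
                best = (key, v, tap, arr)
        assert best is not None, "topology must contain at least one edge"
        _, v, tap, arr = best
        u = self.parent[v]
        t = len(self.pos)  # new internal node splitting edge (u, v)
        self.pos.append(tap)
        self.parent.append(u)
        self.arrival.append(self.arrival[u] + self.k * manhattan(self.pos[u], tap))
        self.parent[v] = t  # v's arrival is unchanged because tap lies in the bbox
        self.pos.append(s)
        self.parent.append(t)
        self.arrival.append(arr)
        return len(self.pos) - 1
```

Calling `insert_sink` once per sink in criticality order gives the quadratic overall behavior described above. A tapping point may coincide with an existing node, since placement overlap is allowed.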
In addition, Ref [6] proves theoretical lower bounds on slack and wirelength in two extreme cases: sinks close to the driver, and sinks with large, noncritical required arrival times. Among the …
[Figure 27.5: trees over sinks A, B, C, and D; (a) partially embedded, (b) embedded and buffered.]
FIGURE 27.5 (a) Partially embedded routing tree topology and (b) completely embedded and buffered tree.