Handbook of Algorithms for Physical Design Automation
26 Buffer Insertion Basics
Jiang Hu, Zhuo Li, and Shiyan Hu
CONTENTS
26.1 Motivation
26.2 Optimization of Two-Pin Nets
26.3 van Ginneken’s Algorithm
26.3.1 Concept of Candidate Solution
26.3.2 Generating Candidate Solutions
26.3.2.1 Wire Insertion
26.3.2.2 Buffer Insertion
26.3.2.3 Branch Merging
26.3.3 Inferiority and Pruning Identification
26.3.4 Pseudocode
26.3.5 Example
26.4 van Ginneken Extensions
26.4.1 Handling Library with Multiple Buffers
26.4.2 Library with Inverters
26.4.3 Polarity Constraints
26.4.4 Slew and Capacitance Constraints
26.4.5 Integration with Wire Sizing
26.4.6 Noise Constraints with Devgan Metric
26.4.6.1 Devgan’s Coupling Noise Metric
26.4.6.2 Algorithm of Buffer Insertion with Noise Avoidance
26.4.7 Higher Order Delay Modeling
26.4.7.1 Higher Order Point Admittance Model
26.4.7.2 Higher Order Wire Delay Model
26.4.7.3 Accurate Gate Delay
26.4.8 Flip-Flop Insertion
26.5 Speedup Techniques
26.5.1 Recent Speedup Results
26.5.2 Predictive Pruning
26.5.3 Convex Pruning
26.5.4 Efficient Way to Find Best Candidates
26.5.5 Implicit Representation
References
26.1 MOTIVATION
When VLSI technology scales, gate delay and wire delay change in opposite directions. Smaller devices imply less gate-switching delay. In contrast, thinner wires lead to increased wire resistance and greater signal propagation delay along wires. As a result, wire delay has become a dominating factor for VLSI circuit performance. Further, it is becoming a limiting factor to the progress of VLSI technology. This is the well-known interconnect challenge [1–3]. Among the many techniques addressing this challenge [4,5], buffer (or repeater) insertion is so effective that it is indispensable for timing closure in submicron technology and beyond. Buffers can reduce wire delay by restoring signal strength, in particular for long wires. Moreover, buffers can be applied to shield capacitive load from timing-critical paths so that the interconnect delay along critical paths is reduced.
As the ratio of wire delay to gate delay increases from one technology to the next, more and more buffers are required to achieve performance goals. Buffer scaling has been studied by Intel, and the results are reported in Ref. [6]. One metric that reveals the scaling is the critical buffer length, the minimum distance beyond which inserting an optimally placed and sized buffer makes the interconnect delay less than that of the corresponding unbuffered wire. When wire delay increases because of technology scaling, the critical buffer length becomes shorter, i.e., the distance that a buffer can comfortably drive shrinks. According to Ref. [6], the critical buffer length decreases by 68 percent when VLSI technology migrates from 90 to 45 nm (two generations). Note that the critical buffer-length scaling significantly outpaces the VLSI technology scaling, which is roughly 0.5× for every two generations. The percentage of block-level nets requiring buffers grows from 5.8 percent in 90-nm technology to 19.6 percent in 45-nm technology [6]. Perhaps the most alarming result is the scaling of buffer count [6], which predicts that 35 percent of cells will be buffers in 45-nm technology, as opposed to only 6 percent in 90-nm technology.
The dramatic buffer scaling undoubtedly has a large and profound impact on VLSI circuit design. With millions of buffers required per chip, almost nobody can afford to neglect the importance of buffer insertion, in contrast to a decade ago, when only a few thousand buffers were needed per chip [7]. Because of this importance, buffer insertion algorithms and methodologies need to be studied in depth from several aspects. First, a buffer insertion algorithm should deliver solutions of high quality, because interconnect and circuit performance largely depend on the way that buffers are placed. Second, a buffer insertion algorithm needs to be sufficiently fast so that millions of nets can be optimized in reasonable time. Third, accurate delay models are necessary to ensure that buffer insertion solutions are reliable. Fourth, buffer insertion techniques are expected to handle multiple objectives simultaneously, such as timing, power, and signal integrity, together with their trade-offs. Last but not least, buffer insertion should interact with other layout steps, such as placement and routing, as the sheer number of buffers has already altered the landscape of circuit layout design. Many of these issues are discussed in subsequent sections and other chapters.
26.2 OPTIMIZATION OF TWO-PIN NETS
For buffer insertion, perhaps the simplest case is a two-pin net, which is a wire segment with a driver (source) at one end and a sink at the other. This simplicity allows closed-form solutions to buffer insertion in two-pin nets.
If the delay of a two-pin net is to be minimized by using a single buffer type $b$, one needs to decide the number of buffers $k$ and the spacing between the buffers, the source, and the sink. First, let us look at a very simple case to gain an intuitive understanding of the problem. In this case, the length of the two-pin net is $l$, and the wire resistance and capacitance per unit length are $r$ and $c$, respectively. The number of buffers $k$ is given and fixed. The driver resistance is the same as the buffer output resistance $R_b$. The load capacitance of the sink is identical to the buffer input capacitance $C_b$. The buffer has an intrinsic delay of $t_b$. The $k$ buffers separate the net into $k + 1$ segments with lengths $\mathbf{l} = (l_0, l_1, \ldots, l_k)^T$ (Figure 26.1). Then, the Elmore delay of this net can be expressed as

$$t(\mathbf{l}) = \sum_{i=0}^{k} \left( \alpha l_i^2 + \beta l_i + \gamma \right) \qquad (26.1)$$

where $\alpha = \frac{1}{2} rc$, $\beta = R_b c + r C_b$, and $\gamma = R_b C_b + t_b$. A formal problem formulation is

$$\text{minimize} \quad t(\mathbf{l}) \qquad (26.2)$$

$$\text{subject to} \quad g(\mathbf{l}) = l - \sum_{i=0}^{k} l_i = 0 \qquad (26.3)$$

FIGURE 26.1 Buffer insertion in a two-pin net.
According to the Kuhn–Tucker condition [8], the following equation is a necessary condition for the optimal solution:

$$\nabla t(\mathbf{l}) + \lambda \nabla g(\mathbf{l}) = 0 \qquad (26.4)$$

where $\lambda$ is the Lagrangian multiplier. From this condition, it can be easily derived that

$$l_i = \frac{\lambda - \beta}{2\alpha}, \quad i = 0, 1, \ldots, k \qquad (26.5)$$

Because $\alpha$, $\beta$, and $\lambda$ are all constants, it can be seen that the buffers need to be equally spaced to minimize the delay. This is an important conclusion that can be treated as a rule of thumb. The value of the Lagrangian multiplier $\lambda$ can be found by plugging Equation 26.5 into Equation 26.3.
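The equal-spacing rule can be checked numerically. The following is a minimal sketch, assuming the quadratic per-segment delay of Equation 26.1; the function names and all parameter values are illustrative and do not come from the text.

```python
def two_pin_delay(lengths, r, c, R_b, C_b, t_b):
    """Elmore delay of Equation 26.1: every segment is driven by R_b and loaded by C_b."""
    alpha = 0.5 * r * c
    beta = R_b * c + r * C_b
    gamma = R_b * C_b + t_b
    return sum(alpha * x * x + beta * x + gamma for x in lengths)


def equal_spacing(total_length, k):
    """Segment lengths l_i = l / (k + 1) for k buffers."""
    return [total_length / (k + 1)] * (k + 1)


def lagrange_multiplier(total_length, k, r, c, R_b, C_b):
    """Lambda obtained by substituting Equation 26.5 into Equation 26.3."""
    alpha = 0.5 * r * c
    beta = R_b * c + r * C_b
    return 2.0 * alpha * total_length / (k + 1) + beta


# Illustrative values only (assumed, not taken from the chapter).
r, c, R_b, C_b, t_b = 0.1, 0.2, 100.0, 5.0, 20.0
l, k = 9000.0, 2

uniform = equal_spacing(l, k)
skewed = [2000.0, 3000.0, 4000.0]  # same total length, unequal segments

print("equal spacing:", two_pin_delay(uniform, r, c, R_b, C_b, t_b))
print("unequal spacing:", two_pin_delay(skewed, r, c, R_b, C_b, t_b))
```

Because the linear and constant terms depend only on the total length and on $k$, any difference between the two delays comes from the quadratic term, which is smallest when all segments are equal.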
In more general cases, the driver resistance $R_d$ may be different from the buffer output resistance, and the sink capacitance $C_L$ may be different from the buffer input capacitance. For such cases, the optimum number of buffers minimizing the delay is given by Ref. [9]:

$$k = \left\lfloor -\frac{1}{2} + \sqrt{1 + \frac{2 \left[ rcl + r (C_b - C_L) - c (R_b - R_d) \right]^2}{rc \, (R_b C_b + t_b)}} \right\rfloor \qquad (26.6)$$
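The length of each segment can be obtained through [9]

$$
\begin{aligned}
l_0 &= \frac{1}{k+1} \left( l + \frac{k (R_b - R_d)}{r} + \frac{C_L - C_b}{c} \right) \\
l_1 = \cdots = l_{k-1} &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} + \frac{C_L - C_b}{c} \right) \\
l_k &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} - \frac{k (C_L - C_b)}{c} \right)
\end{aligned}
\qquad (26.7)
$$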
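The segment lengths of Equation 26.7 can be evaluated directly. The sketch below does so and, instead of relying on a closed form for $k$, simply scans a range of buffer counts and keeps the one with the smallest Elmore delay. It assumes that the driver contributes no intrinsic delay (a constant that does not affect the optimization); the function names and numeric values are illustrative.

```python
def segment_lengths(l, k, r, c, R_b, C_b, R_d, C_L):
    """Segment lengths (l_0, ..., l_k) from Equation 26.7; k = 0 means no buffer."""
    if k == 0:
        return [l]
    dr = (R_b - R_d) / r
    dc = (C_L - C_b) / c
    first = (l + k * dr + dc) / (k + 1)
    middle = (l - dr + dc) / (k + 1)
    last = (l - dr - k * dc) / (k + 1)
    return [first] + [middle] * (k - 1) + [last]


def elmore_delay(lengths, r, c, R_b, C_b, t_b, R_d, C_L):
    """Elmore delay of a buffered two-pin net with driver R_d and sink load C_L."""
    k = len(lengths) - 1
    total = 0.0
    for i, x in enumerate(lengths):
        drive = R_d if i == 0 else R_b   # resistance driving this segment
        load = C_L if i == k else C_b    # capacitance at the end of this segment
        if i > 0:
            total += t_b                 # intrinsic delay of the buffer driving it
        total += drive * (c * x + load) + r * x * (0.5 * c * x + load)
    return total


def best_buffer_count(l, k_max, r, c, R_b, C_b, t_b, R_d, C_L):
    """Scan k = 0..k_max and return (k, delay) with the smallest delay."""
    best = None
    for k in range(k_max + 1):
        lengths = segment_lengths(l, k, r, c, R_b, C_b, R_d, C_L)
        if min(lengths) < 0:
            continue  # Equation 26.7 yields no physical placement for this k
        d = elmore_delay(lengths, r, c, R_b, C_b, t_b, R_d, C_L)
        if best is None or d < best[1]:
            best = (k, d)
    return best


# Illustrative values only (assumed, not taken from the chapter).
params = dict(r=0.1, c=0.2, R_b=100.0, C_b=5.0, R_d=120.0, C_L=8.0)
print(segment_lengths(2000.0, 9, **params))
print(best_buffer_count(2000.0, 15, t_b=20.0, **params))
```

When $R_d = R_b$ and $C_L = C_b$, the three expressions coincide and the sketch reproduces the equal spacing $l/(k+1)$ of the simpler case.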
A closed-form solution to simultaneous buffer insertion/sizing and wire sizing is reported in Ref. [10]. Figure 26.2 shows an example of this simultaneous optimization. The wire is segmented into $m$ pieces. The length $l_i$ and width $h_i$ of each wire piece $i$ are the variables to be optimized. There are $k$ buffers inserted between these pieces. The size $b_i$ of each buffer $i$ is also a decision variable. A buffer location is indicated by its surrounding wire pieces. For example, if the set of wire pieces between buffer $i-1$ and buffer $i$ is $P_{i-1}$, the distance between the two buffers is equal to $\sum_{j \in P_{i-1}} l_j$. There are two important conclusions [10] for the optimal solution that minimizes the delay. First, all wire pieces have the same length, i.e., $l_i = l/m$, $i = 1, 2, \ldots, m$. Second, for wire pieces $P_{i-1} = \{p_{i-1,1}, p_{i-1,2}, \ldots, p_{i-1,m_{i-1}}\}$ between buffer $i-1$ and buffer $i$, their widths satisfy $h_{i-1,1} > h_{i-1,2} > \cdots > h_{i-1,m_{i-1}}$ and form a geometric progression.
FIGURE 26.2 Example of simultaneous buffer insertion/sizing and wire sizing.
26.3 VAN GINNEKEN’S ALGORITHM
For the general case of signal nets, which may have multiple sinks, van Ginneken’s algorithm [11] is perhaps the first systematic approach to buffer insertion. For a fixed signal routing tree and given candidate buffer locations, van Ginneken’s algorithm finds the optimal buffering solution that maximizes timing slack according to the Elmore delay model. If there are $n$ candidate buffer locations, its computational complexity is $O(n^2)$. Numerous extensions of van Ginneken’s algorithm have been made, such as handling multiple buffer types, trading off against power and cost, addressing slew rate and crosstalk noise, using accurate delay models, and applying speedup techniques. These extensions are covered in subsequent sections.
At a high level, van Ginneken’s algorithm [11] proceeds bottom-up from the leaf nodes toward the driver along a given routing tree. A set of candidate solutions is kept updated during the process, in which three operations may be performed: adding a wire, inserting a buffer, and merging branches. Meanwhile, inferior solutions are pruned to accelerate the algorithm. After the set of candidate solutions is propagated to the source, the solution with the maximum required arrival time is selected as the final solution. For a routing tree with $n$ buffer positions, the algorithm computes the optimal buffering solution in $O(n^2)$ time.
A net is given as a binary routing tree $T = (V, E)$, where $V = \{s_0\} \cup V_s \cup V_n$ and $E \subseteq V \times V$. Vertex $s_0$ is the source vertex and also the root of $T$, $V_s$ is the set of sink vertices, and $V_n$ is the set of internal vertices. In the existing literature, $s_0$ is also referred to as the driver. Denote by $T(v)$ the subtree of $T$ rooted at $v$. Each sink vertex $s \in V_s$ is associated with a sink capacitance $C(s)$ and a required arrival time (RAT). Each edge $e \in E$ is associated with a lumped resistance $R(e)$ and capacitance $C(e)$. A buffer library $B$ containing all the possible buffer types that can be assigned to a buffer position is also given. In this section, $B$ contains only one buffer type. Delay estimation is obtained using the Elmore delay model, which is described in Chapter 3. A buffer assignment $\gamma$ is a mapping $\gamma : V_n \to B \cup \{\bar{b}\}$, where $\bar{b}$ denotes that no buffer is inserted. The timing-driven buffer insertion problem is defined as follows.

Timing-driven buffer insertion problem: Given a binary routing tree $T = (V, E)$, possible buffer positions, and a buffer library $B$, compute a buffer assignment $\gamma$ such that the RAT at the driver is maximized.
26.3.1 CONCEPT OF CANDIDATE SOLUTION
A buffer assignment $\gamma$ is also called a candidate solution for the timing buffering problem. A partial solution, denoted by $\gamma_v$, refers to an incomplete solution where the buffer assignment in $T(v)$ has been determined.
The Elmore delay from $v$ to any sink $s$ in $T(v)$ under $\gamma_v$ is computed by

$$D(s, \gamma_v) = \sum_{e = (v_i, v_j)} \left[ D(v_i) + D(e) \right]$$

where the sum is taken over all edges along the path from $v$ to $s$, $D(e)$ is the delay of wire $e$, and $D(v_i)$ is the delay of the buffer at $v_i$, if any. The slack of vertex $v$ under $\gamma_v$ is defined as

$$Q(\gamma_v) = \min_{s \in T(v)} \{ \mathrm{RAT}(s) - D(s, \gamma_v) \}$$
At any vertex $v$, the effect of a partial solution $\gamma_v$ on its upstream part is characterized by a $(Q(\gamma_v), C(\gamma_v))$ pair, where $Q$ is the slack at $v$ under $\gamma_v$ and $C$ is the downstream capacitance seen at $v$ under $\gamma_v$.
26.3.2 GENERATING CANDIDATE SOLUTIONS
van Ginneken’s algorithm proceeds bottom-up from the leaf nodes toward the driver along $T$. A set of candidate solutions, denoted by $\Gamma$, is kept updated during this process. Three operations are used during solution propagation, namely, wire insertion, buffer insertion, and branch merging (Figure 26.3). We describe them in turn.
26.3.2.1 Wire Insertion
Suppose that a partial solution $\gamma_v$ at position $v$ propagates to an upstream position $u$ and there is no branching point in between. If no buffer is placed at $u$, then only wire delay needs to be considered. Therefore, the new solution $\gamma_u$ can be computed as

$$Q(\gamma_u) = Q(\gamma_v) - D(e)$$

$$C(\gamma_u) = C(\gamma_v) + C(e)$$

where $e = (u, v)$ and $D(e) = R(e) \left( \frac{C(e)}{2} + C(\gamma_v) \right)$.
26.3.2.2 Buffer Insertion
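Suppose that we add a buffer $b$ at $u$. Denote by $R(b)$ and $K(b)$ the driving resistance and the intrinsic delay of buffer $b$, respectively, and by $C(b)$ its input capacitance. $\gamma_u$ is then updated to $\gamma'_u$, where

$$Q(\gamma'_u) = Q(\gamma_u) - \left( R(b) \cdot C(\gamma_u) + K(b) \right)$$

$$C(\gamma'_u) = C(b)$$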
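A minimal sketch of the buffer insertion update follows. Since every buffered candidate ends up with the same capacitance $C(b)$, only the one with the largest resulting $Q$ can be noninferior, so the sketch keeps just that one. The function names and the buffer parameters are assumed for illustration.

```python
def add_buffer(q, c, r_b, k_b, c_b):
    """(Q, C) after a buffer with resistance r_b, intrinsic delay k_b, and input cap c_b."""
    return (q - (r_b * c + k_b), c_b)


def best_buffered(candidates, r_b, k_b, c_b):
    """All buffered candidates share C = c_b, so keep only the one with the largest Q."""
    return max(add_buffer(q, c, r_b, k_b, c_b) for q, c in candidates)


# Illustrative candidate set and buffer parameters (assumed values).
candidates = [(167, 12), (207, 32), (347, 52)]
print(best_buffered(candidates, r_b=2, k_b=5, c_b=3))
```

The unbuffered candidates remain in the solution set alongside the new one; the buffered candidate trades some slack for a much smaller upstream load.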
26.3.2.3 Branch Merging
When two branches $T_l$ and $T_r$ meet at a branching point $v$, the solution sets $\Gamma_l$ and $\Gamma_r$, which correspond to $T_l$ and $T_r$, respectively, are to be merged. The merging process is performed as follows. For each solution $\gamma_l \in \Gamma_l$ and each solution $\gamma_r \in \Gamma_r$, generate a new solution $\gamma'$ according to

$$C(\gamma') = C(\gamma_l) + C(\gamma_r)$$

$$Q(\gamma') = \min\{Q(\gamma_l), Q(\gamma_r)\}$$

The smaller $Q$ is picked because the worst-case circuit performance needs to be considered.
FIGURE 26.3 Operations in van Ginneken’s algorithm: (a) wire insertion, (b) buffer insertion, and (c) branch merging.
26.3.3 INFERIORITY AND PRUNING IDENTIFICATION
Simply propagating all solutions by the above three operations makes the solution set grow exponentially in the number of buffer positions processed. An effective and efficient pruning technique is necessary to reduce the size of the solution set. This motivates an important concept in van Ginneken’s algorithm: the inferior solution. For any two partial solutions $\gamma_1$ and $\gamma_2$ at the same vertex $v$, $\gamma_2$ is inferior to $\gamma_1$ if $C(\gamma_1) \le C(\gamma_2)$ and $Q(\gamma_1) \ge Q(\gamma_2)$. Whenever a solution becomes inferior, it is pruned from the solution set. Therefore, only solutions that excel in at least one of downstream capacitance and slack survive.
For an efficient pruning implementation, and thus an efficient buffering algorithm, a sorted list is used to maintain the solution set. The solution set is sorted in increasing order of $C$, and thus $Q$ is also in increasing order if $\Gamma$ does not contain any inferior solutions.
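The inferiority test and the sorted-list invariant can be sketched in a few lines. This version sorts first and then makes one linear pass; it is an illustration of the definition rather than the exact bookkeeping of the original algorithm, and the `prune` name is assumed.

```python
def prune(candidates):
    """Drop inferior (Q, C) pairs and return the survivors sorted by increasing C."""
    survivors = []
    # Sort by C ascending; for equal C, visit the larger Q first.
    for q, c in sorted(candidates, key=lambda qc: (qc[1], -qc[0])):
        # A pair survives only if its Q exceeds that of every pair with C no larger.
        if not survivors or q > survivors[-1][0]:
            survivors.append((q, c))
    return survivors


print(prune([(300, 30), (200, 10), (250, 40), (500, 50), (200, 10)]))
```

In the returned list both $C$ and $Q$ are strictly increasing, which is the property the merging step relies on.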
In a straightforward implementation, adding a wire does not change the number of candidate solutions, and inserting a buffer introduces only one new candidate solution. More effort is needed to merge two branches $T_l$ and $T_r$ at $v$. For each partial solution in $\Gamma_l$, find the first solution with a larger $Q$ value in $\Gamma_r$. If such a solution does not exist, the last solution in $\Gamma_r$ is taken. Because $\Gamma_l$ and $\Gamma_r$ are sorted, we only need to traverse them once. Partial solutions in $\Gamma_r$ are treated similarly. It is easy to see that after merging, the number of solutions is at most $|\Gamma_l| + |\Gamma_r|$. As such, given $n$ buffer positions, at most $n$ solutions can be generated at any time. Consequently, the pruning procedure at any vertex in $T$ runs in $O(n)$ time.
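The single-traversal merge described above can be sketched with a forward-only pointer into the other list. Both inputs are assumed to be nonempty lists of $(Q, C)$ pairs sorted by increasing $C$ with no inferior entries; the function names and the sample values are illustrative.

```python
def pair_with_partner(dictating, other):
    """Pair each candidate in `dictating` with its partner in `other`.

    The partner is the first candidate in `other` with a larger Q, or the last
    candidate in `other` if none is larger.
    """
    merged = []
    j = 0  # only moves forward, so `other` is traversed once
    for q, c in dictating:
        while j < len(other) - 1 and other[j][0] <= q:
            j += 1
        q_other, c_other = other[j]
        merged.append((min(q, q_other), c + c_other))
    return merged


def merge_branches(left, right):
    """Merged (Q, C) candidates; at most len(left) + len(right) of them."""
    union = pair_with_partner(left, right) + pair_with_partner(right, left)
    return sorted(set(union), key=lambda qc: qc[1])


left = [(10, 1), (20, 3), (40, 6)]
right = [(15, 2), (30, 4)]
print(merge_branches(left, right))
```

Pairing a candidate with the first partner of larger $Q$ gives the smallest added capacitance that does not lower its slack, which is why the full cross product is not needed.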
26.3.4 PSEUDOCODE
In van Ginneken’s algorithm, a set of candidate solutions is propagated from the sinks to the driver. Along a branch, after a candidate buffer location $v$ is processed, all solutions are propagated to its upstream buffer location $u$ through wire insertion. A buffer is then inserted into each solution to obtain a new solution. Meanwhile, inferior solutions are pruned. At a branching point, the solution sets from all branches are merged by the merging process. In this way, the algorithm proceeds in a bottom-up fashion, and the solution with the maximum required arrival time at the driver is returned. Given $n$ buffer positions in $T$, van Ginneken’s algorithm can compute a buffer assignment with maximum slack at the driver in $O(n^2)$ time, because any operation at any node can be performed in $O(n)$ time. Refer to Figure 26.4 for the pseudocode of van Ginneken’s algorithm.
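The bottom-up flow can be put together in a compact, runnable form. The following is a sketch under stated assumptions, not a reference implementation: the `Node` and `Buffer` types, the `driver_r` parameter (a driver output resistance applied at the root, which the text does not specify), and the numeric values are all hypothetical. For brevity, branches are merged with the full cross product followed by pruning rather than the linear merge, so this sketch does not achieve the $O(n^2)$ bound.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Vertex of a routing tree; the wire fields describe the edge to its parent."""
    name: str
    children: List["Node"] = field(default_factory=list)
    wire_r: float = 0.0               # R(e) of the edge to the parent
    wire_c: float = 0.0               # C(e) of the edge to the parent
    sink_cap: Optional[float] = None  # C(s), set for sinks only
    rat: Optional[float] = None       # RAT(s), set for sinks only
    buffer_ok: bool = False           # True if a buffer may be placed here


@dataclass(frozen=True)
class Buffer:
    r: float  # driving resistance R(b)
    k: float  # intrinsic delay K(b)
    c: float  # input capacitance C(b)


def prune(cands):
    """Keep noninferior (Q, C, buffered vertices) triples, sorted by increasing C."""
    out = []
    for cand in sorted(cands, key=lambda x: (x[1], -x[0])):
        if not out or cand[0] > out[-1][0]:
            out.append(cand)
    return out


def solutions_at(node, buf):
    """Candidates seen at `node`, before crossing the wire to its parent."""
    if node.sink_cap is not None:
        cands = [(node.rat, node.sink_cap, ())]
    else:
        branch_sets = [through_wire(child, buf) for child in node.children]
        cands = branch_sets[0]
        for other in branch_sets[1:]:  # branch merging
            cands = prune([(min(q1, q2), c1 + c2, b1 + b2)
                           for q1, c1, b1 in cands for q2, c2, b2 in other])
    if node.buffer_ok:
        # All buffered candidates share C(b), so only the best one is added.
        q, c, placed = max(cands, key=lambda x: x[0] - buf.r * x[1])
        cands = prune(cands + [(q - buf.r * c - buf.k, buf.c, placed + (node.name,))])
    return cands


def through_wire(node, buf):
    """Wire insertion: move the candidates of `node` across the edge to its parent."""
    return prune([(q - node.wire_r * (node.wire_c / 2 + c), c + node.wire_c, b)
                  for q, c, b in solutions_at(node, buf)])


def van_ginneken(root, buf, driver_r=0.0):
    """Best (slack, buffered vertices) at the driver."""
    q, c, placed = max(solutions_at(root, buf), key=lambda x: x[0] - driver_r * x[1])
    return q - driver_r * c, placed


# Hypothetical two-pin net: driver s0, one buffer position v, one sink s.
sink = Node("s", wire_r=10, wire_c=10, sink_cap=10, rat=1000)
mid = Node("v", children=[sink], wire_r=10, wire_c=10, buffer_ok=True)
root = Node("s0", children=[mid])
print(van_ginneken(root, Buffer(r=1, k=5, c=1), driver_r=5))
```

Each candidate carries the tuple of vertices where buffers were placed, so the winning assignment can be read off directly instead of being reconstructed by backtracking.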
26.3.5 EXAMPLE
Let us look at a simple example to illustrate the workflow of van Ginneken’s algorithm. Refer to Figure 26.5. Assume that there are three nondominated solutions at $v_3$ whose $(Q, C)$ pairs are

(200, 10), (300, 30), and (500, 50)

and there are two nondominated solutions at $v_2$ whose $(Q, C)$ pairs are

(290, 5) and (350, 20)

We first propagate them to $v_1$ through wire insertion. Assume that $R(v_1, v_3) = 3$ and $C(v_1, v_3) = 2$. Solution (200, 10) at $v_3$ becomes $(200 - 3 \cdot (2/2 + 10), 10 + 2) = (167, 12)$ at $v_1$. Similarly, the other two solutions become (207, 32) and (347, 52). Assume that $R(v_1, v_2) = 2$ and $C(v_1, v_2) = 2$; the solutions at $v_2$ then become (278, 7) and (308, 22) at $v_1$.
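Algorithm: van Ginneken’s algorithm
for each sink s, build a solution set {γs} with Q(γs) = RAT(s) and C(γs) = C(s)
for each branching point or driver vt, in bottom-up order, and for each branch T′ of vt with solution set Γ′:
    for each wire e in T′, in bottom-up order:
        for each γ ∈ Γ′:
            C(γ) = C(γ) + C(e)
            Q(γ) = Q(γ) − D(e)
        prune inferior solutions in Γ′
        if the current position allows buffer insertion:
            find γ ∈ Γ′ that maximizes Q(γ) − R(b) · C(γ) − K(b)
            set C(γb) = C(b)
            set Q(γb) = Q(γ) − R(b) · C(γ) − K(b)
            add γb to Γ′ and prune inferior solutions
    // merge Γ1 and Γ2 to Γvt
    for each pair γ1 ∈ Γ1 and γ2 ∈ Γ2 selected as in Section 26.3.3:
        set C(γ′) = C(γ1) + C(γ2)
        set Q(γ′) = min{Q(γ1), Q(γ2)}
        add γ′ to Γvt
return the solution with the maximum Q at the driver
FIGURE 26.4 van Ginneken’s algorithm.

We now merge these solutions at $v_1$. Denote by $\Gamma_l$ the solutions propagated from $v_3$ and by $\Gamma_r$ the solutions propagated from $v_2$. Before merging, the partial solutions in $\Gamma_l$ are

(167, 12), (207, 32), and (347, 52)

and the partial solutions in $\Gamma_r$ are

(278, 7) and (308, 22)

After branch merging, the new candidate partial solutions whose $Q$ values are dictated by solutions in $\Gamma_l$ are

(167, 19), (207, 39), and (308, 74)

and those dictated by solutions in $\Gamma_r$ are

(278, 59) and (308, 74)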
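The numbers in this example can be replayed in a few lines. The sketch below propagates both solution sets to $v_1$ and then merges them by the definition in Section 26.3.2.3 (every pair, followed by pruning) rather than by the linear traversal; the helper names are illustrative. The nondominated result matches the five pairs listed above once the duplicate (308, 74) is counted once.

```python
def add_wire(cands, r_e, c_e):
    """Wire insertion applied to every (Q, C) pair."""
    return [(q - r_e * (c_e / 2 + c), c + c_e) for q, c in cands]


def prune(cands):
    """Keep noninferior pairs, sorted by increasing C."""
    out = []
    for q, c in sorted(cands, key=lambda qc: (qc[1], -qc[0])):
        if not out or q > out[-1][0]:
            out.append((q, c))
    return out


def merge_all_pairs(left, right):
    """Branch merging over the full cross product, followed by pruning."""
    return prune([(min(ql, qr), cl + cr) for ql, cl in left for qr, cr in right])


at_v3 = [(200, 10), (300, 30), (500, 50)]
at_v2 = [(290, 5), (350, 20)]

gamma_l = add_wire(at_v3, r_e=3, c_e=2)  # edge (v1, v3)
gamma_r = add_wire(at_v2, r_e=2, c_e=2)  # edge (v1, v2)
merged = merge_all_pairs(gamma_l, gamma_r)

print(gamma_l)
print(gamma_r)
print(merged)
```

The cross product also generates pairs such as (167, 34) and (207, 54), which are inferior and disappear in pruning; the linear merge avoids creating them in the first place.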
FIGURE 26.5 Example for performing van Ginneken’s algorithm.