Handbook of Algorithms for Physical Design Automation



26 Buffer Insertion Basics

Jiang Hu, Zhuo Li, and Shiyan Hu

CONTENTS

26.1 Motivation
26.2 Optimization of Two-Pin Nets
26.3 van Ginneken’s Algorithm
26.3.1 Concept of Candidate Solution
26.3.2 Generating Candidate Solutions
26.3.2.1 Wire Insertion
26.3.2.2 Buffer Insertion
26.3.2.3 Branch Merging
26.3.3 Inferiority and Pruning Identification
26.3.4 Pseudocode
26.3.5 Example
26.4 van Ginneken Extensions
26.4.1 Handling Library with Multiple Buffers
26.4.2 Library with Inverters
26.4.3 Polarity Constraints
26.4.4 Slew and Capacitance Constraints
26.4.5 Integration with Wire Sizing
26.4.6 Noise Constraints with Devgan Metric
26.4.6.1 Devgan’s Coupling Noise Metric
26.4.6.2 Algorithm of Buffer Insertion with Noise Avoidance
26.4.7 Higher Order Delay Modeling
26.4.7.1 Higher Order Point Admittance Model
26.4.7.2 Higher Order Wire Delay Model
26.4.7.3 Accurate Gate Delay
26.4.8 Flip-Flop Insertion
26.5 Speedup Techniques
26.5.1 Recent Speedup Results
26.5.2 Predictive Pruning
26.5.3 Convex Pruning
26.5.4 Efficient Way to Find Best Candidates
26.5.5 Implicit Representation
References

26.1 MOTIVATION

When VLSI technology scales, gate delay and wire delay change in opposite directions. Smaller devices imply less gate-switching delay. In contrast, thinner wires lead to increased wire resistance and greater signal propagation delay along wires. As a result, wire delay has become a dominating factor for VLSI circuit performance. Further, it is becoming a limiting factor to the progress of VLSI technology. This is the well-known interconnect challenge [1–3]. Among many techniques addressing this challenge [4,5], buffer (or repeater) insertion is so effective that it is indispensable for timing closure in submicron technology and beyond. Buffers can reduce wire delay by restoring signal strength, in particular for long wires. Moreover, buffers can be applied to shield capacitive load from timing-critical paths so that the interconnect delay along critical paths is reduced.

As the ratio of wire delay to gate delay increases from one technology to the next, more and more buffers are required to achieve performance goals. Buffer scaling has been studied by Intel, and the results are reported in Ref. [6]. One metric that reveals the scaling is critical buffer length, the minimum distance beyond which inserting an optimally placed and sized buffer makes the interconnect delay less than that of the corresponding unbuffered wire. When wire delay increases because of technology scaling, the critical buffer length becomes shorter, i.e., the distance that a buffer can comfortably drive shrinks. According to Ref. [6], the critical buffer length decreases by 68 percent when VLSI technology migrates from 90 to 45 nm (two generations). Note that the critical buffer-length scaling significantly outpaces the VLSI technology scaling, which is roughly 0.5× for every two generations. The percentage of block-level nets requiring buffers grows from 5.8 percent in 90-nm technology to 19.6 percent in 45-nm technology [6]. Perhaps the most alarming result is the scaling of buffer count [6], which predicts that 35 percent of cells will be buffers in 45-nm technology as opposed to only 6 percent in 90-nm technology.

The dramatic buffer scaling undoubtedly has a large and profound impact on VLSI circuit design. With millions of buffers required per chip, almost nobody can afford to neglect the importance of buffer insertion, in contrast to a decade ago when only a few thousand buffers were needed for a chip [7]. Because of this importance, buffer insertion algorithms and methodologies need to be studied in depth from several aspects. First, a buffer insertion algorithm should deliver solutions of high quality because interconnect and circuit performance largely depend on the way that buffers are placed. Second, a buffer insertion algorithm needs to be sufficiently fast so that millions of nets can be optimized in reasonable time. Third, accurate delay models are necessary to ensure that buffer insertion solutions are reliable. Fourth, buffer insertion techniques are expected to simultaneously handle multiple objectives, such as timing, power, and signal integrity, and their trade-offs. Last but not least, buffer insertion should interact with other layout steps, such as placement and routing, as the sheer number of buffers has already altered the landscape of circuit layout design. Many of these issues will be discussed in subsequent sections and other chapters.

26.2 OPTIMIZATION OF TWO-PIN NETS

For buffer insertion, perhaps the simplest case is a two-pin net, which is a wire segment with a driver (source) at one end and a sink at the other end. This simplicity allows closed-form solutions to buffer insertion in two-pin nets.

If the delay of a two-pin net is to be minimized by using a single buffer type $b$, one needs to decide the number of buffers $k$ and the spacing between the buffers, the source, and the sink. First, let us look at a very simple case to attain an intuitive understanding of the problem. In this case, the length of the two-pin net is $l$ and the wire resistance and capacitance per unit length are $r$ and $c$, respectively. The number of buffers $k$ is given and fixed. The driver resistance is the same as the buffer output resistance $R_b$. The load capacitance of the sink is identical to the buffer input capacitance $C_b$. The buffer has an intrinsic delay of $t_b$. The $k$ buffers separate the net into $k + 1$ segments, with lengths $\mathbf{l} = (l_0, l_1, \ldots, l_k)^T$ (Figure 26.1). Then, the Elmore delay of this net can be expressed as

$$t(\mathbf{l}) = \sum_{i=0}^{k} \left( \alpha l_i^2 + \beta l_i + \gamma \right) \tag{26.1}$$

where $\alpha = \frac{1}{2} rc$, $\beta = R_b c + r C_b$, and $\gamma = R_b C_b + t_b$. A formal problem formulation is

$$\text{minimize} \quad t(\mathbf{l}) \tag{26.2}$$

$$\text{subject to} \quad g(\mathbf{l}) = l - \sum_{i=0}^{k} l_i = 0 \tag{26.3}$$

FIGURE 26.1 Buffer insertion in a two-pin net.

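According to the Kuhn–Tucker condition [8], the following equation is the necessary condition for the optimal solution:

$$\nabla t(\mathbf{l}) + \lambda \nabla g(\mathbf{l}) = 0 \tag{26.4}$$

where $\lambda$ is the Lagrangian multiplier. According to the above condition, it can be easily derived that

$$l_i = \frac{\lambda - \beta}{2\alpha}, \quad i = 0, 1, \ldots, k \tag{26.5}$$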

Because $\alpha$, $\beta$, and $\lambda$ are all constants, it can be seen that the buffers need to be equally spaced to minimize the delay. This is an important conclusion that can be treated as a rule of thumb. The value of the Lagrangian multiplier $\lambda$ can be found by plugging Equation 26.5 into Equation 26.3.
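The equal-spacing rule can be checked numerically. The following is a minimal sketch, assuming the per-segment delay form $\alpha l_i^2 + \beta l_i + \gamma$ described above; the function names and all parameter values are hypothetical and chosen only for illustration.

```python
def two_pin_delay(lengths, r, c, R_b, C_b, t_b):
    """Elmore delay of a buffered two-pin net: sum of alpha*l_i^2 + beta*l_i + gamma."""
    alpha = 0.5 * r * c
    beta = R_b * c + r * C_b
    gamma = R_b * C_b + t_b
    return sum(alpha * x * x + beta * x + gamma for x in lengths)


def equal_spacing(total_length, k):
    """k buffers split the wire into k + 1 equal segments."""
    return [total_length / (k + 1)] * (k + 1)


# Illustrative (hypothetical) parameter values, not taken from the text.
r, c, R_b, C_b, t_b = 0.1, 0.2, 100.0, 5.0, 20.0
l, k = 1000.0, 3

even = equal_spacing(l, k)             # four segments of 250.0
uneven = [300.0, 250.0, 250.0, 200.0]  # same total length, unequal segments

print(two_pin_delay(even, r, c, R_b, C_b, t_b))
print(two_pin_delay(uneven, r, c, R_b, C_b, t_b))
```

Because the linear and constant terms depend only on the total length and the number of segments, the two delays differ only in the quadratic term, which is smallest when all segments are equal.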

In more general cases, the driver resistance $R_d$ may be different from the buffer output resistance, and the sink capacitance $C_L$ may be different from the buffer input capacitance. For such cases, the optimum number of buffers minimizing the delay is given by Ref. [9]:

$$k = \left\lfloor -\frac{1}{2} + \sqrt{1 + \frac{2\left[ rcl + r(C_b - C_L) - c(R_b - R_d) \right]^2}{rc(R_b C_b + t_b)}} \right\rfloor \tag{26.6}$$

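The length of each segment can be obtained through [9]

$$
\begin{aligned}
l_0 &= \frac{1}{k+1} \left( l + \frac{k(R_b - R_d)}{r} + \frac{C_L - C_b}{c} \right) \\
l_1 = \cdots = l_{k-1} &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} + \frac{C_L - C_b}{c} \right) \\
l_k &= \frac{1}{k+1} \left( l - \frac{R_b - R_d}{r} - \frac{k(C_L - C_b)}{c} \right)
\end{aligned}
\tag{26.7}
$$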
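The segment-length formulas of Equation 26.7 can be evaluated directly. The sketch below is illustrative only: the function name and the sample values are hypothetical, the number of buffers $k$ is taken as an input rather than computed, and no check is made that the resulting lengths are nonnegative.

```python
def segment_lengths(l, k, r, c, R_b, C_b, R_d, C_L):
    """Segment lengths l_0..l_k for k >= 1 buffers on a wire of length l."""
    if k < 1:
        raise ValueError("at least one buffer is assumed")
    dr = (R_b - R_d) / r   # correction for a driver unlike the buffer
    dc = (C_L - C_b) / c   # correction for a sink load unlike the buffer input
    first = (l + k * dr + dc) / (k + 1)
    middle = (l - dr + dc) / (k + 1)
    last = (l - dr - k * dc) / (k + 1)
    return [first] + [middle] * (k - 1) + [last]


# Hypothetical values: a driver stronger than the buffer, a sink heavier than it.
lengths = segment_lengths(l=1000.0, k=3, r=0.1, c=0.2,
                          R_b=100.0, C_b=5.0, R_d=50.0, C_L=10.0)
print(lengths, sum(lengths))
```

The three expressions always add up to $l$, and they reduce to equal spacing $l/(k+1)$ when $R_d = R_b$ and $C_L = C_b$. A stronger driver lengthens the first segment, and a heavier sink shortens the last one.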

A closed-form solution to simultaneous buffer insertion/sizing and wire sizing is reported in Ref. [10]. Figure 26.2 shows an example of this simultaneous optimization. The wire is segmented into $m$ pieces. The length $l_i$ and width $h_i$ of each wire piece $i$ are the variables to be optimized. There are $k$ buffers inserted between these pieces. The size $b_i$ of each buffer $i$ is also a decision variable. A buffer location is indicated by its surrounding wire pieces. For example, if the set of wire pieces between buffer $i-1$ and $i$ is $P_{i-1}$, the distance between the two buffers is equal to $\sum_{j \in P_{i-1}} l_j$. There are two important conclusions [10] for the optimal solution that minimizes the delay. First, all wire pieces have the same length, i.e., $l_i = l/m$, $i = 1, 2, \ldots, m$. Second, for wire pieces $P_{i-1} = \{p_{i-1,1}, p_{i-1,2}, \ldots, p_{i-1,m_{i-1}}\}$ between buffer $i-1$ and $i$, their widths satisfy $h_{i-1,1} > h_{i-1,2} > \cdots > h_{i-1,m_{i-1}}$ and form a geometric progression.

FIGURE 26.2 Example of simultaneous buffer insertion/sizing and wire sizing.

26.3 VAN GINNEKEN’S ALGORITHM

For the general case of signal nets, which may have multiple sinks, van Ginneken’s algorithm [11] is perhaps the first systematic approach to buffer insertion. For a fixed signal routing tree and given candidate buffer locations, van Ginneken’s algorithm can find the optimal buffering solution that maximizes timing slack according to the Elmore delay model. If there are $n$ candidate buffer locations, its computational complexity is $O(n^2)$. Based on van Ginneken’s algorithm, numerous extensions have been made, such as handling of multiple buffer types, trade-off with power and cost, addressing slew rate and crosstalk noise, and using accurate delay models and speedup techniques. These extensions will be covered in subsequent sections.

At a high level, van Ginneken’s algorithm [11] proceeds bottom-up from the leaf nodes toward the driver along a given routing tree. A set of candidate solutions is kept updated during the process, in which three operations, namely adding wire, inserting buffers, and branch merging, may be performed. Meanwhile, inferior solutions are pruned to accelerate the algorithm. After a set of candidate solutions is propagated to the source, the solution with the maximum required arrival time is selected as the final solution. For a routing tree with $n$ buffer positions, the algorithm computes the optimal buffering solution in $O(n^2)$ time.

A net is given as a binary routing tree $T = (V, E)$, where $V = \{s_0\} \cup V_s \cup V_n$ and $E \subseteq V \times V$. Vertex $s_0$ is the source vertex and also the root of $T$, $V_s$ is the set of sink vertices, and $V_n$ is the set of internal vertices. In the existing literature, $s_0$ is also referred to as the driver. Denote by $T(v)$ the subtree of $T$ rooted at $v$. Each sink vertex $s \in V_s$ is associated with a sink capacitance $C(s)$ and a required arrival time (RAT). Each edge $e \in E$ is associated with lumped resistance $R(e)$ and capacitance $C(e)$. A buffer library $B$ containing all the possible buffer types that can be assigned to a buffer position is also given. In this section, $B$ contains only one buffer type. Delay estimation is obtained using the Elmore delay model, which is described in Chapter 3. A buffer assignment $\gamma$ is a mapping $\gamma : V_n \rightarrow B \cup \{\bar{b}\}$, where $\bar{b}$ denotes that no buffer is inserted. The timing buffering problem is defined as follows.

Timing-driven buffer insertion problem: Given a binary routing tree $T = (V, E)$, possible buffer positions, and a buffer library $B$, compute a buffer assignment $\gamma$ such that the RAT at the driver is maximized.

26.3.1 CONCEPT OF CANDIDATE SOLUTION

A buffer assignment $\gamma$ is also called a candidate solution for the timing buffering problem. A partial solution, denoted by $\gamma_v$, refers to an incomplete solution where the buffer assignment in $T(v)$ has been determined.

The Elmore delay from $v$ to any sink $s$ in $T(v)$ under $\gamma_v$ is computed by

$$D(s, \gamma_v) = \sum_{e = (v_i, v_j)} \left[ D(v_i) + D(e) \right]$$

where the sum is taken over all edges along the path from $v$ to $s$. The slack of vertex $v$ under $\gamma_v$ is defined as

$$Q(\gamma_v) = \min_{s \in T(v)} \{ \mathrm{RAT}(s) - D(s, \gamma_v) \}$$

At any vertex $v$, the effect of a partial solution $\gamma_v$ on its upstream part is characterized by a $(Q(\gamma_v), C(\gamma_v))$ pair, where $Q$ is the slack at $v$ under $\gamma_v$ and $C$ is the downstream capacitance viewed at $v$ under $\gamma_v$.

26.3.2 GENERATING CANDIDATE SOLUTIONS

van Ginneken’s algorithm proceeds bottom-up from the leaf nodes toward the driver along $T$. A set of candidate solutions, denoted by $\Gamma$, is kept updated during this process. There are three operations in solution propagation, namely, wire insertion, buffer insertion, and branch merging (Figure 26.3). We describe them in turn.

26.3.2.1 Wire Insertion

Suppose that a partial solution $\gamma_v$ at position $v$ propagates to an upstream position $u$ and there is no branching point in between. If no buffer is placed at $u$, then only wire delay needs to be considered. Therefore, the new solution $\gamma_u$ can be computed as

$$Q(\gamma_u) = Q(\gamma_v) - D(e)$$

$$C(\gamma_u) = C(\gamma_v) + C(e)$$

where $e = (u, v)$ and $D(e) = R(e) \left( \frac{C(e)}{2} + C(\gamma_v) \right)$.
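As a minimal sketch of wire insertion, the function below applies the update to a list of $(Q, C)$ pairs. The name `add_wire` and the tuple representation are assumptions made for illustration; the sample numbers are the ones used for the wire between $v_3$ and $v_1$ in the example of Section 26.3.5.

```python
def add_wire(solutions, R_e, C_e):
    """Propagate (Q, C) pairs upstream across a wire with lumped R_e and C_e.

    Each pair loses the wire's Elmore delay R_e * (C_e / 2 + C) from its slack
    and gains the wire capacitance C_e.
    """
    return [(q - R_e * (C_e / 2 + cap), cap + C_e) for q, cap in solutions]


at_v3 = [(200, 10), (300, 30), (500, 50)]
at_v1 = add_wire(at_v3, R_e=3, C_e=2)
print(at_v1)
```

The number of candidate solutions is unchanged by this operation; only their $(Q, C)$ values move.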

26.3.2.2 Buffer Insertion

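Suppose that we add a buffer $b$ at $u$. Denote by $C(b)$, $R(b)$, and $K(b)$ the input capacitance, the driving resistance, and the intrinsic delay of buffer $b$, respectively. $\gamma_u$ is then updated to $\gamma'_u$, where

$$Q(\gamma'_u) = Q(\gamma_u) - \left( R(b) \cdot C(\gamma_u) + K(b) \right)$$

$$C(\gamma'_u) = C(b)$$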
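The buffer-insertion step can be sketched as follows. Every buffered solution presents the same capacitance (the buffer input capacitance) to the upstream side, so only the one with the largest slack needs to be kept. The function name and the numeric values are hypothetical.

```python
def insert_buffer(solutions, R_b, C_b, K_b):
    """Return the single (Q, C) pair created by placing a buffer at this node.

    Buffering a pair (q, cap) yields slack q - R_b * cap - K_b; all buffered
    pairs share capacitance C_b, so only the best slack is worth keeping.
    """
    best_q = max(q - R_b * cap - K_b for q, cap in solutions)
    return (best_q, C_b)


# Hypothetical buffer: driving resistance 1, input capacitance 3, intrinsic delay 5.
candidates = [(167, 19), (207, 39), (278, 59), (308, 74)]
print(insert_buffer(candidates, R_b=1, C_b=3, K_b=5))
```

The new pair is added to the existing unbuffered candidates rather than replacing them, since the choice of whether to buffer at this position is left to the upstream steps.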

26.3.2.3 Branch Merging

When two branches $T_l$ and $T_r$ meet at a branching point $v$, the solution sets $\Gamma_l$ and $\Gamma_r$, which correspond to $T_l$ and $T_r$, respectively, are to be merged. The merging process is performed as follows. For each solution $\gamma_l \in \Gamma_l$ and each solution $\gamma_r \in \Gamma_r$, generate a new solution $\gamma'$ according to

$$C(\gamma') = C(\gamma_l) + C(\gamma_r)$$

$$Q(\gamma') = \min\{ Q(\gamma_l), Q(\gamma_r) \}$$

The smaller $Q$ is picked since the worst-case circuit performance needs to be considered.
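Taken literally, the merging rule pairs every solution of one branch with every solution of the other. The sketch below does exactly that, with a hypothetical function name, using the two solution sets that arrive at $v_1$ in the example of Section 26.3.5. It is meant to illustrate the rule, not to be efficient.

```python
from itertools import product


def merge_all_pairs(left, right):
    """Combine every (Q, C) pair of one branch with every pair of the other.

    Capacitances add because both branches load the merging point; the slack
    is the smaller of the two because the worse branch limits timing.
    """
    return [(min(ql, qr), cl + cr) for (ql, cl), (qr, cr) in product(left, right)]


left = [(167, 12), (207, 32), (347, 52)]
right = [(278, 7), (308, 22)]
print(merge_all_pairs(left, right))
```

The result has $|\Gamma_l| \cdot |\Gamma_r|$ entries, several of which are worse in both slack and capacitance than another entry.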

FIGURE 26.3 Operations in van Ginneken’s algorithm: (a) wire insertion, (b) buffer insertion, and (c) branch merging.

26.3.3 INFERIORITY AND PRUNING IDENTIFICATION

Simply propagating all solutions by the above three operations makes the solution set grow exponentially in the number of buffer positions processed. An effective and efficient pruning technique is necessary to reduce the size of the solution set. This motivates an important concept in van Ginneken’s algorithm: the inferior solution. For any two partial solutions $\gamma_1$ and $\gamma_2$ at the same vertex $v$, $\gamma_2$ is inferior to $\gamma_1$ if $C(\gamma_1) \le C(\gamma_2)$ and $Q(\gamma_1) \ge Q(\gamma_2)$. Whenever a solution becomes inferior, it is pruned from the solution set. Therefore, only solutions that excel in at least one of downstream capacitance and slack can survive.

For an efficient pruning implementation, and thus an efficient buffering algorithm, a sorted list is used to maintain the solution set $\Gamma$. The solution set is sorted in increasing order of $C$, and thus $Q$ is also in increasing order if $\Gamma$ does not contain any inferior solutions.
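A minimal sketch of pruning under this definition of inferiority is shown below. The function name is hypothetical, and the sketch sorts an arbitrary input list, which costs $O(n \log n)$; when the list is already kept sorted by $C$, as the text describes, the same single pass suffices.

```python
def prune(solutions):
    """Drop inferior (Q, C) pairs and return the rest sorted by increasing C.

    After sorting by C (larger Q first among equal C), a pair survives only if
    its Q is strictly larger than that of the last survivor; otherwise some
    pair with no more capacitance already has at least as much slack.
    """
    kept = []
    for q, cap in sorted(solutions, key=lambda s: (s[1], -s[0])):
        if not kept or q > kept[-1][0]:
            kept.append((q, cap))
    return kept


pairs = [(167, 19), (167, 34), (207, 39), (207, 54), (278, 59), (308, 74)]
print(prune(pairs))
```

In the output both $C$ and $Q$ increase along the list, which is the property the sorted-list representation relies on.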

In a straightforward implementation, adding a wire does not change the number of candidate solutions, and inserting a buffer introduces only one new candidate solution. More effort is needed to merge two branches $T_l$ and $T_r$ at $v$. For each partial solution in $\Gamma_l$, find the first solution with a larger $Q$ value in $\Gamma_r$. If such a solution does not exist, the last solution in $\Gamma_r$ is taken. Because $\Gamma_l$ and $\Gamma_r$ are sorted, we only need to traverse them once. Partial solutions in $\Gamma_r$ are treated similarly. It is easy to see that after merging, the number of solutions is at most $|\Gamma_l| + |\Gamma_r|$. As such, given $n$ buffer positions, at most $n$ solutions can be generated at any time. Consequently, the pruning procedure at any vertex in $T$ runs in $O(n)$ time.
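The single traversal of the two sorted lists can be sketched with two indices. This is a common two-pointer formulation rather than a literal transcription of the procedure above; it assumes both inputs are free of inferior solutions and sorted by increasing $C$ (and hence $Q$). The function name is hypothetical, and the sample lists are those of the example in Section 26.3.5.

```python
def merge_sorted(left, right):
    """Merge two non-inferior (Q, C) lists, each sorted by increasing C and Q.

    At every step the current pair from each list is combined, and the list
    whose slack is smaller advances, since only that side can raise the minimum.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (ql, cl), (qr, cr) = left[i], right[j]
        merged.append((min(ql, qr), cl + cr))
        if ql < qr:
            i += 1
        elif qr < ql:
            j += 1
        else:  # equal slack: neither side alone can improve the minimum
            i += 1
            j += 1
    return merged


left = [(167, 12), (207, 32), (347, 52)]
right = [(278, 7), (308, 22)]
print(merge_sorted(left, right))
```

Each iteration advances at least one index, so the loop runs at most $|\Gamma_l| + |\Gamma_r|$ times and produces at most that many solutions, already sorted and free of inferior entries.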

26.3.4 PSEUDOCODE

In van Ginneken’s algorithm, a set of candidate solutions is propagated from the sinks to the driver. Along a branch, after a candidate buffer location $v$ is processed, all solutions are propagated to its upstream buffer location $u$ through wire insertion. A buffer is then inserted into each solution to obtain a new solution. Meanwhile, inferior solutions are pruned. At a branching point, solution sets from all branches are merged by the merging process. In this way, the algorithm proceeds in a bottom-up fashion, and the solution with the maximum required arrival time at the driver is returned. Given $n$ buffer positions in $T$, van Ginneken’s algorithm can compute a buffer assignment with maximum slack at the driver in $O(n^2)$ time, because any operation at any node can be performed in $O(n)$ time. Refer to Figure 26.4 for the pseudocode of van Ginneken’s algorithm.
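The bottom-up flow can be put together in a short runnable sketch. It is an illustration under several assumptions, not the pseudocode of Figure 26.4: the `Node` class, all function names, the driver resistance `R_d` applied at the root, and the numeric values are hypothetical. The sketch returns only the best slack and does not record which buffers were chosen, and its pruning step re-sorts each list instead of maintaining sorted order incrementally.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Solution = Tuple[float, float]  # (Q, C)


@dataclass
class Node:
    """Tree vertex; wire_r and wire_c describe the edge up to its parent."""
    children: List["Node"] = field(default_factory=list)
    wire_r: float = 0.0
    wire_c: float = 0.0
    rat: Optional[float] = None   # required arrival time, sinks only
    sink_c: float = 0.0           # sink capacitance, sinks only
    can_buffer: bool = False      # candidate buffer position


def prune(sols: List[Solution]) -> List[Solution]:
    kept: List[Solution] = []
    for q, cap in sorted(sols, key=lambda s: (s[1], -s[0])):
        if not kept or q > kept[-1][0]:
            kept.append((q, cap))
    return kept


def merge(left: List[Solution], right: List[Solution]) -> List[Solution]:
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        (ql, cl), (qr, cr) = left[i], right[j]
        merged.append((min(ql, qr), cl + cr))
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


def solve(node: Node, R_b: float, C_b: float, K_b: float) -> List[Solution]:
    """Non-inferior (Q, C) pairs at `node`, before the wire to its parent."""
    if not node.children:
        sols = [(node.rat, node.sink_c)]
    else:
        branches = []
        for child in node.children:
            below = solve(child, R_b, C_b, K_b)
            # Wire insertion across the edge from the child up to this node.
            branches.append(prune(
                [(q - child.wire_r * (child.wire_c / 2 + cap), cap + child.wire_c)
                 for q, cap in below]))
        sols = branches[0]
        for other in branches[1:]:
            sols = merge(sols, other)  # branch merging
    if node.can_buffer:
        # Buffer insertion adds one new candidate with capacitance C_b.
        q_buf = max(q - R_b * cap - K_b for q, cap in sols)
        sols = prune(sols + [(q_buf, C_b)])
    return sols


def best_slack(root: Node, R_d: float, R_b: float, C_b: float, K_b: float) -> float:
    """Largest slack at the driver, assuming it has output resistance R_d."""
    return max(q - R_d * cap for q, cap in solve(root, R_b, C_b, K_b))


# Hypothetical net: driver -> wire -> buffer position v -> wire -> sink.
sink = Node(rat=1000.0, sink_c=10.0, wire_r=10.0, wire_c=4.0)
v = Node(children=[sink], wire_r=10.0, wire_c=4.0, can_buffer=True)
root = Node(children=[v])
print(best_slack(root, R_d=20.0, R_b=2.0, C_b=1.0, K_b=5.0))
```

In this small net the buffered candidate reaches the driver with far less capacitance than the unbuffered one, which is pruned at the root as inferior after the upper wire is added.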

26.3.5 EXAMPLE

Let us look at a simple example to illustrate the workflow of van Ginneken’s algorithm. Refer to Figure 26.5. Assume that there are three nondominated solutions at $v_3$ whose $(Q, C)$ pairs are (200, 10), (300, 30), and (500, 50), and that there are two nondominated solutions at $v_2$ whose $(Q, C)$ pairs are (290, 5) and (350, 20).

We first propagate them to $v_1$ through wire insertion. Assume that $R(v_1, v_3) = 3$ and $C(v_1, v_3) = 2$. Solution (200, 10) at $v_3$ becomes $(200 - 3 \cdot (2/2 + 10),\ 10 + 2) = (167, 12)$ at $v_1$. Similarly, the other two solutions become (207, 32) and (347, 52). Assume that $R(v_1, v_2) = 2$ and $C(v_1, v_2) = 2$; the solutions at $v_2$ then become (278, 7) and (308, 22) at $v_1$.

We now merge these solutions at $v_1$. Denote by $\Gamma_l$ the solutions propagated from $v_3$ and by $\Gamma_r$ the solutions propagated from $v_2$. Before merging, partial solutions in $\Gamma_l$ are (167, 12), (207, 32), and (347, 52),

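Algorithm: van Ginneken’s algorithm
Input: routing tree T with candidate buffer positions; buffer type b
Output: buffer assignment γ that maximizes Q at the driver
for each sink s, create a solution γs with Q(γs) = RAT(s) and C(γs) = C(s)
for each vertex vt, visited bottom-up toward the driver:
    // wire insertion: propagate each γ across wire e to obtain γ'
    C(γ') = C(γ) + C(e)
    Q(γ') = Q(γ) − D(e)
    prune inferior solutions
    // buffer insertion, if vt is a buffer position: add one solution γ
    // built from the γ' that maximizes Q(γ') − R(b) · C(γ') − K(b)
    set C(γ) = C(b)
    set Q(γ) = Q(γ') − R(b) · C(γ') − K(b)
    // merge Γ1 and Γ2 to vt, if vt is a branching point
    for each retained pair γ1 ∈ Γ1, γ2 ∈ Γ2:
        set C(γ) = C(γ1) + C(γ2)
        set Q(γ) = min{Q(γ1), Q(γ2)}
    prune inferior solutions
return the solution with the maximum Q at the driver

FIGURE 26.4 van Ginneken’s algorithm.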

and partial solutions in $\Gamma_r$ are (278, 7) and (308, 22). After branch merging, the new candidate partial solutions whose $Q$ values are dictated by solutions in $\Gamma_l$ are (167, 19), (207, 39), and (308, 74), and those dictated by solutions in $\Gamma_r$ are (278, 59) and (308, 74).

FIGURE 26.5 Example for performing van Ginneken’s algorithm.
