
Handbook of Algorithms for Physical Design Automation, Part 67


Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C032 Finals Page 642 9-10-2008 #17

chip may be partially or completely blocked, and buffers cannot be placed in these areas. Just as the capacity constraints for the edges in the global routing graph ensure that not too many wires cross the boundary between two adjacent global routing tiles, similar constraints ensure that not too many buffers are placed in one global routing tile.

Vygen [26] considers the coupling capacitance and minimizes the total power consumption while ensuring the timing constraints for individual nets and certain paths. A Steiner tree for a net is characterized not only by its edges in the global routing graph; each edge of the tree also carries a continuous parameter specifying the spacing to each side of the final route. It is assumed that the coupling capacitance decreases linearly with the spacing. The timing constraints are ensured by bounding the weighted capacitance for subsets of the nets, a constraint similar to the one that bounds the total weighted wirelength. More space decreases the coupling capacitance, but it also uses more routing resources. The problem can be formulated as a fractional packing problem of infinitely many Steiner trees, infinitely many because of the continuous spacing parameters. Because the capacitance depends linearly on the spacing, every edge of the Steiner tree that minimizes the cost function with respect to the dual variables has either the maximum or the minimum spacing. The task of the subroutine is still to find a Steiner minimal tree in the grid graph with respect to a nonuniform length function.

Müller [27] describes a parallel multithreaded implementation of the approximation scheme. He shows that it is possible to update the dual variables at the end of each phase for all nets instead of updating them immediately after a Steiner tree is found. The set of nets is split into subsets, and each thread computes the minimal Steiner trees for one subset in the global routing graph.

32.9 CONCLUSION

This chapter is about the global routing problem, and specifically about algorithms for solving its linear programming relaxation. The linear program is enormous, and hence it is not possible to solve it optimally. The linear programming relaxation for global routing is a special case of a fractional packing problem and is similar to the multicommodity flow problem. We showed that the approximation algorithm for the multicommodity flow problem can be applied to the fractional global routing problem. A final integer solution is derived by randomized rounding. This approach has been used successfully in practice and has been extended to consider additional constraints and objectives.

The approach of linear relaxation followed by randomized rounding is general, and it may be possible to apply it to other combinatorial optimization problems in physical design. For global routing, the approach works well because the capacities of the edges are relatively large, and hence randomized rounding does not disturb the solution much.
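The rounding step summarized above can be illustrated with a minimal Python sketch. It assumes the fractional solution is given, for each net, as a list of candidate Steiner trees with weights that sum to 1; the data layout and the names `randomized_rounding` and `edge_usage` are hypothetical and are not taken from the chapter.

```python
import random
from collections import Counter

def randomized_rounding(fractional, rng):
    """Pick one Steiner tree per net, with probability equal to its weight.

    fractional: dict net -> list of (tree, weight), where a tree is a tuple
    of edges and the weights of each net are assumed to sum to 1.
    """
    chosen = {}
    for net, candidates in fractional.items():
        trees = [tree for tree, _ in candidates]
        weights = [weight for _, weight in candidates]
        chosen[net] = rng.choices(trees, weights=weights, k=1)[0]
    return chosen

def edge_usage(chosen):
    """Count how many chosen trees use each edge of the routing graph."""
    usage = Counter()
    for tree in chosen.values():
        usage.update(tree)
    return usage

# Illustrative fractional solution (made-up data).
fractional = {
    "n1": [((("a", "b"), ("b", "c")), 0.7), ((("a", "d"), ("d", "c")), 0.3)],
    "n2": [((("a", "b"),), 1.0)],
}
rng = random.Random(0)
chosen = randomized_rounding(fractional, rng)
usage = edge_usage(chosen)
```

Comparing `usage` with the edge capacities shows how far the rounded solution deviates from the fractional one; with large capacities the relative deviation tends to be small, which is the point made in the text.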

REFERENCES

1. P. Raghavan and C. D. Thompson, Randomized rounding: A technique for provably good algorithms and algorithmic proofs, Combinatorica, 7(4): 365–374, 1987.

2. P. Raghavan and C. D. Thompson, Multiterminal global routing: A deterministic approximation, Algorithmica, 6: 73–82, 1991.

3. F. Shahrokhi and D. W. Matula, The maximum concurrent flow problem, Journal of the Association for Computing Machinery, 37: 24–31, 1990.

4. S. Plotkin, D. Shmoys, and E. Tardos, Fast approximation algorithms for fractional packing and covering problems, Mathematics of Operations Research, 20: 257–301, 1995.

5. A. V. Goldberg, J. D. Oldham, S. Plotkin, and C. Stein, An implementation of a combinatorial approximation algorithm for minimum-cost multicommodity flow, in Integer Programming and Combinatorial Optimization (6th International IPCO Conference), Houston, TX, pp. 338–352, 1998.

6. N. Garg and J. Könemann, Faster and simpler algorithms for multicommodity flow and other fractional packing problems, in Proceedings of the 39th Annual Symposium on Foundations of Computer Science, Palo Alto, CA, pp. 300–309, 1998.

7. L. K. Fleischer, Approximating fractional multicommodity flow independent of the number of commodities, SIAM Journal on Discrete Mathematics, 13(4): 505–520, 2000 (FOCS 1999).

8. G. Karakostas, Faster approximation schemes for fractional multicommodity flow problems, in Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, pp. 166–173, 2002.

9. C. Albrecht, Global routing by new approximation algorithms for multicommodity flow, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20: 622–632, May 2001 (ISPD 2000).

10. M. R. Kramer and J. van Leeuwen, The complexity of wire routing and finding minimum area layouts for arbitrary VLSI circuits, in Advances in Computing Research, Vol. 2: VLSI Theory, F. P. Preparata (Ed.), JAI Press, Greenwich, CT, pp. 129–146, 1984.

11. J. Vygen, Disjoint paths, Technical Report 94816, Research Institute for Discrete Mathematics, University of Bonn, Bonn, Germany, 1994.

12. V. Chvátal, Linear Programming. New York: Freeman, 1983.

13. A. Schrijver, Theory of Linear and Integer Programming. Chichester, United Kingdom: Wiley, 1986.

14. J. Werber, Das Multicommodity-Flow-Problem und seine Anwendung im Global Routing. Diplomarbeit, Universität Bonn, Bonn, Germany, 2000.

15. T. C. Hu and M. T. Shing, A decomposition algorithm for circuit routing, in VLSI Circuit Layout: Theory and Design, T. C. Hu and E. S. Kuh (Eds.), IEEE Press, pp. 144–152, 1985.

16. G. B. Dantzig, Maximization of a linear function of variables subject to linear inequalities, in Activity Analysis of Production and Allocation, Tj. C. Koopmans (Ed.), Wiley, NY, pp. 339–347, 1951.

17. H. T. Jongen, K. Meer, and E. Triesch, Optimization Theory. Norwell, MA: Kluwer Academic Publishers, 2004.

18. A. Vannelli, An adaptation of the interior point method for solving the global routing problem, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10: 193–203, February 1991 (CICCC 1989).

19. M. Hanan, On Steiner's problem with rectilinear distance, SIAM Journal on Applied Mathematics, 14(2): 255–265, 1966.

20. N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica, 4: 373–395, 1984.

21. L. G. Khachiyan, A polynomial-time algorithm in linear programming, Soviet Mathematics Doklady, 20: 191–194, 1979.

22. R. C. Carden IV, J. Li, and C.-K. Cheng, A global router with a theoretical bound on the optimum solution, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15: 208–216, February 1996.

23. P. Raghavan, Probabilistic construction of deterministic algorithms: Approximating packing integer programs, Journal of Computer and System Sciences, 37: 130–143, 1988.

24. H. Chernoff, A measure of asymptotic efficiency for tests based on the sum of observations, Annals of Mathematical Statistics, 23: 493–509, 1952.

25. C. Albrecht, A. Kahng, I. Măndoiu, and A. Zelikovsky, Multicommodity flow algorithms for buffered global routing, in Handbook of Approximation Algorithms and Metaheuristics, T. F. Gonzalez (Ed.), Boca Raton, FL: Chapman & Hall/CRC, pp. 80.1–80.18, 2007 (ASPDAC 2002).

26. J. Vygen, Near-optimum global routing with coupling, delay bounds, and power consumption, in Integer Programming and Combinatorial Optimization (10th International IPCO Conference), LNCS 3064, G. Nemhauser and D. Bienstock (Eds.), Berlin, Germany: Springer, pp. 308–324, 2004.

27. D. Müller, Optimizing yield in global routing, in Digest of Technical Papers of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, November 2006.


Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C033 Finals Page 645 29-9-2008 #2

Cheng-Kok Koh, Evangeline F. Y. Young, and Yao-Wen Chang

CONTENTS

33.1 Buffer Planning Basics
  33.1.1 Feasible Regions
  33.1.2 Independent Feasible Regions
  33.1.3 Two-Dimensional Feasible Region
33.2 Buffer Blocks and Sites
  33.2.1 Buffer Block Planning
  33.2.2 Buffer Site Planning
33.3 Interconnect Planning and Buffer Planning
  33.3.1 Routability-Driven Buffer Planning
    33.3.1.1 Routability-Driven Buffer Planning with Dead Space Redistribution
    33.3.1.2 Interconnect Planning with Fixed Interval Buffer Insertion Constraint
    33.3.1.3 Methodology for Interconnect Planning in Buffer Site
    33.3.1.4 Other Routability-Driven Buffer Planning Approaches
  33.3.2 Pin Assignment with Buffer Planning
  33.3.3 Noise-Aware Buffer Planning
    33.3.3.1 Independent Feasible Regions with Transition Time Constraints
    33.3.3.2 Common Independent Feasible Region
    33.3.3.3 Buffer Block Planning Considering Transition Time and Delay
  33.3.4 Buffer Planning with Noise Constraints
33.4 Flip-Flop and Buffer Planning (Wire Retiming)
  33.4.1 Minimizing Latency
    33.4.1.1 Two-Pin Net Optimization Using Analytical Formulas
    33.4.1.2 Multiple-Terminal Net Optimization
  33.4.2 Latency Constrained Optimization
  33.4.3 Wire Retiming
  33.4.4 Area Constrained Wire Retiming
33.5 Concluding Remarks
References

With the growing dominance of global interconnects on circuit performance, it is desirable to optimize interconnects as early as possible. Recall from Chapter 26 that buffer insertion is generally considered the most effective and popular technique to reduce interconnect delay, especially for global signals. A buffer is composed of two inverters, whereas a repeater may be either a buffer or an inverter. To simplify the discussion, we shall use buffer and repeater interchangeably throughout this chapter.

As hundreds of thousands of buffers may be inserted in modern high-performance VLSI designs, it is imperative to plan for the buffer positions as early as possible to ensure timing closure and design convergence. In this chapter, we shall first present the enabling concept of buffer planning, namely, the feasible region in which a buffer can be inserted such that the timing constraint is met. Following a description of two fundamental approaches to buffer planning that take into account only timing constraints, we also address other important design issues in buffer planning, such as noise constraints and routability. When buffer insertion fails to meet the timing constraints, pipelining of global interconnects with flip-flops becomes necessary. We devote a section to flip-flop and buffer planning to deal with the challenges that arise from the additional latency introduced by interconnect pipelining.

33.1 BUFFER PLANNING BASICS

Some VLSI designs may not allow buffers to be inserted inside a circuit block because buffers consume silicon resources and require connections to the power/ground network. Consequently, buffers are placed in channels and dead spaces of a floorplan, and they are often clustered to form buffer blocks between existing circuit blocks of the floorplan, which inevitably increases the chip area [1]. It is thus desirable to carefully plan for the buffer blocks during or after floorplanning to minimize the area overhead and facilitate routing. This is known as buffer block planning.

However, the existence of buffer blocks imposes more design constraints. Because buffers connect global nets, the routing regions where buffer blocks are located might be congested. Furthermore, buffers might be placed in poor locations: because buffers are clustered into blocks, the best location for a buffer may be forbidden. A remedy to this deficiency is to distribute buffers more uniformly in a chip, so as to naturally spread out global nets. This approach looks promising in handling the aforementioned problems with wire congestion and buffer blockages. In contrast to the buffer block planning methodology, Alpert et al. [2] proposed the buffer site methodology. The methodology allocates a buffering resource within a block by inserting a buffer site that can accommodate buffers (or other logic gates if the buffer site is not used for buffering). For buffer site planning, we plan for the buffers during or after floorplanning such that the given buffer sites can accommodate the buffers and the routing, timing, and congestion constraints are satisfied.

To determine the optimal location for buffer insertion, we shall first consider the feasible region (FR) for a buffer, which is the region where the buffer can be placed such that the timing constraint is satisfied. Figure 33.1 shows the respective FRs for inserting (a) one buffer and (b) multiple buffers into a net between a source and a sink, where the FRs are shaded.

The concept of the feasible region comes in two forms. Cong, Kong, and Pan first defined in Ref. [1] the feasible region for buffer insertion to be the region where a buffer could be placed to satisfy a target timing constraint, assuming that all the remaining buffers were placed in their optimal positions. In contrast, Sarkar and Koh [3] introduced the idea of the independent feasible region (IFR) for buffer insertion, which was defined as the region where a buffer could be placed such that the timing constraint of the net was satisfied, assuming that the other buffers were also placed within their respective independent feasible regions.

Before presenting the analytical formulas for computing the feasible regions, we shall first introduce the notation and delay model that will be used throughout this chapter. Each driver/buffer is modeled as a switch-level RC circuit [4] and each wire is modeled as a π-type circuit, as shown in Figure 33.2. We use the Elmore delay model [5], covered in Chapter 3, for delay computation. The notation for the physical parameters of wire and buffer is listed in Table 33.1.

Given a wire segment of length $l$ with driver output resistance $R$ and sink capacitance $C$, the Elmore delay of this segment is given by

$$D(R, C, l) = \frac{rc}{2}\,l^2 + (Rc + rC)\,l + RC.$$
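As a quick check of the segment formula, the following Python sketch evaluates it and compares it with the delay read directly off the π-type wire model (driver resistance charging all capacitance, wire resistance charging the far-end half capacitance and the load). The function names and the numeric values are illustrative assumptions, with units left abstract.

```python
import math

def elmore_delay(R, C, l, r, c):
    """Elmore delay of a wire of length l driven by resistance R into load C.

    r and c are the wire resistance and capacitance per unit length.
    """
    return 0.5 * r * c * l * l + (R * c + r * C) * l + R * C

def pi_model_delay(R, C, l, r, c):
    """Same delay computed term by term from the pi-type wire model."""
    driver_term = R * (c * l + C)             # R sees the whole wire cap plus load
    wire_term = (r * l) * (0.5 * c * l + C)   # wire resistance sees far half cap plus load
    return driver_term + wire_term

d = elmore_delay(R=100.0, C=0.01, l=2.0, r=0.1, c=0.2)
```

Both functions expand to the same polynomial in `l`, so they agree up to floating-point rounding.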


FIGURE 33.1 Feasible regions for buffer insertion. (a) Single-buffer insertion, showing the feasible region $[x_{1,\min}, x_{1,\max}]$ of buffer $x_1$, and (b) multiple-buffer insertion, showing the feasible regions of the buffers $x_1, x_2, \ldots, x_n$. (From Cong, J., Kong, T., and Pan, Z., IEEE Trans. VLSI Sys., 9, 929, 2001 (ICCAD 1999).)

FIGURE 33.2 Buffer and wire model. (a) Switch-level buffer model (labels: $R_b$, $C_b$, $T_b/R_b$) and (b) wire model of a wire of length $l$ (labels: $rl$, $cl/2$, $cl/2$).

Using the preceding expression, the Elmore delay of a single-source, single-sink net (i.e., two-pin net) $N$ of length $L$ with $n$ buffers can be computed as

$$D_N(x_1, x_2, \ldots, x_n, L) = D(R_d, C_b, x_1) + D(R_b, C_s, L - x_n) + \sum_{i=1}^{n-1} D(R_b, C_b, x_{i+1} - x_i) + nT_b,$$

where
$R_d$ is the driver resistance,
$C_s$ is the sink capacitance, and
$x_i$ is the location of the $i$th buffer.
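The net delay above is a sum of segment delays plus the intrinsic buffer delays, which the following Python sketch spells out. The function names and the parameter values are illustrative assumptions; only the formulas come from the text.

```python
import math

def elmore_delay(R, C, l, r, c):
    """Elmore delay of one wire segment (driver R, load C, length l)."""
    return 0.5 * r * c * l * l + (R * c + r * C) * l + R * C

def net_delay(xs, L, Rd, Cs, Rb, Cb, Tb, r, c):
    """Delay of a two-pin net of length L with buffers at sorted positions xs."""
    if not xs:
        return elmore_delay(Rd, Cs, L, r, c)
    total = elmore_delay(Rd, Cb, xs[0], r, c)           # driver to first buffer
    for left, right in zip(xs, xs[1:]):                 # buffer to buffer
        total += elmore_delay(Rb, Cb, right - left, r, c)
    total += elmore_delay(Rb, Cs, L - xs[-1], r, c)     # last buffer to sink
    return total + len(xs) * Tb

# Illustrative values (assumed, abstract units).
tech = dict(Rd=60.0, Cs=0.3, Rb=50.0, Cb=0.1, Tb=5.0, r=10.0, c=0.1)
d3 = net_delay([4.75, 10.5, 16.25], 20.0, **tech)
```

For these values the three-buffer net has a delay of 229.625: the segment delays sum to 214.625, and the three buffers add 15.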

TABLE 33.1 Parameters of Wire and Buffer

Parameter   Description
r           Wire resistance per unit length
c           Wire capacitance per unit length
Tb          Intrinsic buffer delay
Cb          Buffer input capacitance
Rb          Buffer output resistance

For convenience, we reexpress the optimal locations of the $n$ buffers for the delay minimization of a net [6], presented in Chapter 26, as follows:

$$x_i^* = (i - 1)\,y^* + x^*, \quad i \in \{1, 2, \ldots, n\},$$

where

$$x^* = \frac{1}{n+1}\left(L + \frac{n(R_b - R_d)}{r} + \frac{C_s - C_b}{c}\right) \quad\text{and}\quad y^* = \frac{1}{n+1}\left(L - \frac{R_b - R_d}{r} + \frac{C_s - C_b}{c}\right).$$

We denote the optimal delay for the net $N$, of length $L$, with $n$ buffers by

$$D_N^{\mathrm{opt}}(n, L) = D_N(x_1^*, x_2^*, \ldots, x_n^*, L).$$
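The closed-form optimal locations can be checked numerically. The Python sketch below computes $x_i^*$ from $x^*$ and $y^*$ and confirms that moving any buffer away from its optimal location increases the delay. The `Params` container, the function names, and the numeric values are illustrative assumptions; the sketch also assumes the resulting positions fall inside the net, which the formulas alone do not guarantee for extreme parameter values.

```python
import math
from typing import NamedTuple

class Params(NamedTuple):
    Rd: float  # driver resistance
    Cs: float  # sink capacitance
    Rb: float  # buffer output resistance
    Cb: float  # buffer input capacitance
    Tb: float  # intrinsic buffer delay
    r: float   # wire resistance per unit length
    c: float   # wire capacitance per unit length

def elmore_delay(R, C, l, p):
    return 0.5 * p.r * p.c * l * l + (R * p.c + p.r * C) * l + R * C

def net_delay(xs, L, p):
    """Delay of a two-pin net of length L with buffers at positions xs."""
    if not xs:
        return elmore_delay(p.Rd, p.Cs, L, p)
    total = elmore_delay(p.Rd, p.Cb, xs[0], p)
    for left, right in zip(xs, xs[1:]):
        total += elmore_delay(p.Rb, p.Cb, right - left, p)
    total += elmore_delay(p.Rb, p.Cs, L - xs[-1], p)
    return total + len(xs) * p.Tb

def optimal_positions(n, L, p):
    """x_i* = (i - 1) y* + x* for i = 1..n."""
    a = (p.Rb - p.Rd) / p.r
    b = (p.Cs - p.Cb) / p.c
    x_star = (L + n * a + b) / (n + 1)
    y_star = (L - a + b) / (n + 1)
    return [x_star + k * y_star for k in range(n)]

p = Params(Rd=60.0, Cs=0.3, Rb=50.0, Cb=0.1, Tb=5.0, r=10.0, c=0.1)
x_opt = optimal_positions(3, 20.0, p)
d_opt = net_delay(x_opt, 20.0, p)
```

With these values the optimal locations are 4.75, 10.5, and 16.25, and the optimal delay is 229.625.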

In the following subsections, we first discuss the computation of the feasible region and the independent feasible region of a buffer on a one-dimensional line segment, and then extend the idea to a two-dimensional chip plane.

33.1.1 FEASIBLE REGIONS

For $n$ buffers inserted in a two-pin net $N$ as shown in Figure 33.1b, their feasible regions can be computed as follows [1].

Theorem 1 For a two-pin net $N$ of length $L$ with $n$ buffers inserted and a given timing bound $D_N^{\mathrm{tgt}}$, the feasible region for the $i$th buffer ($i \le n$) is $x_i \in [x_{i,\min}, x_{i,\max}]$ with

$$x_{i,\min} = \max\left(0, \frac{K_2 - \sqrt{K_2^2 - 4K_1K_3}}{2K_1}\right)$$

and

$$x_{i,\max} = \min\left(L, \frac{K_2 + \sqrt{K_2^2 - 4K_1K_3}}{2K_1}\right),$$

where

$$K_1 = \frac{(n+1)\,rc}{2i(n-i+1)},$$

$$K_2 = \frac{(R_b - R_d)\,c}{i} + \frac{(C_s - C_b)\,r + rcL}{n-i+1},$$

and

$$K_3 = nT_b - D_N^{\mathrm{tgt}} + \left(R_d + (i-1)R_b + \frac{(n-i)\,rL}{n-i+1}\right)C_b + R_b\bigl((n-i)C_b + C_s + cL\bigr) + \frac{rcL^2 + 2rLC_s}{2(n-i+1)} - \frac{(i-1)\,c\,(R_b - R_d)^2}{2ir} - \frac{(n-i)\,r\,(C_b - C_s)^2}{2(n-i+1)\,c}.$$
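The following Python sketch evaluates Theorem 1 and cross-checks it against the delay model: with the $i$th buffer placed at either end of its feasible region and all other buffers placed optimally in the two subnets on either side, the net delay should equal the timing bound. The grouping of the terms of $K_3$, the helper names, and the numeric values are assumptions made for illustration; the coefficients were checked against the Elmore model for this example only.

```python
import math
from typing import NamedTuple

class Params(NamedTuple):
    Rd: float
    Cs: float
    Rb: float
    Cb: float
    Tb: float
    r: float
    c: float

def elmore_delay(R, C, l, p):
    return 0.5 * p.r * p.c * l * l + (R * p.c + p.r * C) * l + R * C

def net_delay(xs, L, p):
    total = elmore_delay(p.Rd, p.Cb, xs[0], p)
    for left, right in zip(xs, xs[1:]):
        total += elmore_delay(p.Rb, p.Cb, right - left, p)
    total += elmore_delay(p.Rb, p.Cs, L - xs[-1], p)
    return total + len(xs) * p.Tb

def optimal_positions(n, L, R_drv, C_snk, p):
    """Optimal locations of n buffers on a subnet with the given driver and sink."""
    a = (p.Rb - R_drv) / p.r
    b = (C_snk - p.Cb) / p.c
    x_star = (L + n * a + b) / (n + 1)
    y_star = (L - a + b) / (n + 1)
    return [x_star + k * y_star for k in range(n)]

def feasible_region(i, n, L, D_tgt, p):
    """[x_min, x_max] of the i-th buffer per Theorem 1, or None if infeasible."""
    m = n - i + 1
    K1 = (n + 1) * p.r * p.c / (2 * i * m)
    K2 = (p.Rb - p.Rd) * p.c / i + ((p.Cs - p.Cb) * p.r + p.r * p.c * L) / m
    K3 = (n * p.Tb - D_tgt
          + (p.Rd + (i - 1) * p.Rb + (n - i) * p.r * L / m) * p.Cb
          + p.Rb * ((n - i) * p.Cb + p.Cs + p.c * L)
          + (p.r * p.c * L * L + 2 * p.r * L * p.Cs) / (2 * m)
          - (i - 1) * p.c * (p.Rb - p.Rd) ** 2 / (2 * i * p.r)
          - (n - i) * p.r * (p.Cb - p.Cs) ** 2 / (2 * m * p.c))
    disc = K2 * K2 - 4 * K1 * K3
    if disc < 0:
        return None  # the timing bound cannot be met with n buffers
    root = math.sqrt(disc)
    return max(0.0, (K2 - root) / (2 * K1)), min(L, (K2 + root) / (2 * K1))

def delay_with_buffer_at(x, i, n, L, p):
    """Net delay with buffer i at x and the other buffers placed optimally."""
    left = optimal_positions(i - 1, x, p.Rd, p.Cb, p)
    right = [x + q for q in optimal_positions(n - i, L - x, p.Rb, p.Cs, p)]
    return net_delay(left + [x] + right, L, p)

p = Params(Rd=60.0, Cs=0.3, Rb=50.0, Cb=0.1, Tb=5.0, r=10.0, c=0.1)
n, L = 3, 20.0
x_opt = optimal_positions(n, L, p.Rd, p.Cs, p)
D_tgt = 1.05 * net_delay(x_opt, L, p)      # allow 5% slack over the optimum
x_min, x_max = feasible_region(2, n, L, D_tgt, p)
```

In this example the region is not clipped by the net boundaries, so both end points are roots of the quadratic $K_1x^2 - K_2x + K_3 = 0$ and lie symmetrically around the optimal location of the second buffer.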


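We denote the width of the feasible region for a given buffer by $W_{\mathrm{FR}}$. Cong, Kong, and Pan gave an analytical expression for $W_{\mathrm{FR}}$ in Ref. [1]. Sarkar and Koh presented an equivalent analytical expression in Ref. [3], as given below.

Theorem 2 For a net $N$ with target delay $D_N^{\mathrm{tgt}} \ge D_N^{\mathrm{opt}}(n, L)$, the width of the feasible region for the $i$th buffer ($i \le n$) of the net $N$ is

$$W_{\mathrm{FR}} = 2\sqrt{\frac{2\,\bigl(D_N^{\mathrm{tgt}} - D_N^{\mathrm{opt}}(n, L)\bigr)(n-i+1)\,i}{rc\,(n+1)}}.$$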
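The width expression can be connected to Theorem 1 with a short derivation. The sketch below assumes the feasible region is not clipped by the net boundaries 0 and $L$, and uses the observation that the quadratic $K_1 x_i^2 - K_2 x_i + K_3$ equals the net delay minus $D_N^{\mathrm{tgt}}$ when all other buffers are optimally placed.

```latex
% Width of the feasible region as the distance between the two roots:
W_{\mathrm{FR}} = x_{i,\max} - x_{i,\min} = \frac{\sqrt{K_2^2 - 4K_1K_3}}{K_1}.

% The quadratic attains its minimum at x_i = K_2 / (2K_1), where the net
% delay equals the optimal delay:
K_3 - \frac{K_2^2}{4K_1} = D_N^{\mathrm{opt}}(n, L) - D_N^{\mathrm{tgt}}
\quad\Longrightarrow\quad
K_2^2 - 4K_1K_3 = 4K_1\bigl(D_N^{\mathrm{tgt}} - D_N^{\mathrm{opt}}(n, L)\bigr).

% Substituting K_1 = (n+1)rc / (2i(n-i+1)):
W_{\mathrm{FR}} = 2\sqrt{\frac{D_N^{\mathrm{tgt}} - D_N^{\mathrm{opt}}(n, L)}{K_1}}
= 2\sqrt{\frac{2\bigl(D_N^{\mathrm{tgt}} - D_N^{\mathrm{opt}}(n, L)\bigr)(n-i+1)\,i}{rc\,(n+1)}}.
```

The width therefore grows with the square root of the timing slack and is largest for buffers near the middle of the net, where $i(n-i+1)$ is maximal.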

33.1.2 INDEPENDENT FEASIBLE REGIONS

In contrast to the definition of the feasible region, the IFR of a buffer is the region where it can be placed to satisfy the timing constraints of the net, assuming that the other buffers are placed within their respective IFRs [3]. To provide every buffer in the net with an equal degree of freedom to move within its IFR, the IFRs are chosen to have the same width, denoted by $W_{\mathrm{IFR}}$. Hence, the IFR for the $i$th buffer of a net $N$ with a corresponding target delay $D_N^{\mathrm{tgt}}$ is given by

$$\mathrm{IFR}_i = \left(x_i^* - W_{\mathrm{IFR}}/2,\; x_i^* + W_{\mathrm{IFR}}/2\right) \cap (0, L),$$

such that $D_N(x_1, x_2, \ldots, x_n, L) \le D_N^{\mathrm{tgt}}$ for all $(x_1, x_2, \ldots, x_i, \ldots, x_n) \in \mathrm{IFR}_1 \times \mathrm{IFR}_2 \times \cdots \times \mathrm{IFR}_n$. The following theorem gives an analytical expression for $W_{\mathrm{IFR}}$.

Theorem 3 For a net $N$ with target delay $D_N^{\mathrm{tgt}} \ge D_N^{\mathrm{opt}}(n, L)$, the width of the independent feasible region for the $i$th buffer ($i \le n$) of the net $N$ is

$$W_{\mathrm{IFR}} = 2\sqrt{\frac{D_N^{\mathrm{tgt}} - D_N^{\mathrm{opt}}(n, L)}{rc\,(2n - 1)}}.$$
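The IFR width can be checked by brute force. Because the net delay is a convex quadratic in the buffer locations, its maximum over the box $\mathrm{IFR}_1 \times \cdots \times \mathrm{IFR}_n$ is attained at a corner, so the Python sketch below enumerates all corners and compares the worst delay with the target. The names and numeric values are illustrative assumptions, and the example is chosen so that no IFR is clipped by the net boundaries.

```python
import itertools
import math
from typing import NamedTuple

class Params(NamedTuple):
    Rd: float
    Cs: float
    Rb: float
    Cb: float
    Tb: float
    r: float
    c: float

def elmore_delay(R, C, l, p):
    return 0.5 * p.r * p.c * l * l + (R * p.c + p.r * C) * l + R * C

def net_delay(xs, L, p):
    total = elmore_delay(p.Rd, p.Cb, xs[0], p)
    for left, right in zip(xs, xs[1:]):
        total += elmore_delay(p.Rb, p.Cb, right - left, p)
    total += elmore_delay(p.Rb, p.Cs, L - xs[-1], p)
    return total + len(xs) * p.Tb

def optimal_positions(n, L, p):
    a = (p.Rb - p.Rd) / p.r
    b = (p.Cs - p.Cb) / p.c
    x_star = (L + n * a + b) / (n + 1)
    y_star = (L - a + b) / (n + 1)
    return [x_star + k * y_star for k in range(n)]

def ifr_width(n, D_tgt, D_opt, p):
    """Common width of the independent feasible regions."""
    return 2.0 * math.sqrt((D_tgt - D_opt) / (p.r * p.c * (2 * n - 1)))

p = Params(Rd=60.0, Cs=0.3, Rb=50.0, Cb=0.1, Tb=5.0, r=10.0, c=0.1)
n, L = 3, 20.0
x_opt = optimal_positions(n, L, p)
D_opt = net_delay(x_opt, L, p)
D_tgt = 1.05 * D_opt
W = ifr_width(n, D_tgt, D_opt, p)

# Worst-case delay over all 2^n corners of the IFR box.
corners = itertools.product(*[(x - W / 2, x + W / 2) for x in x_opt])
worst = max(net_delay(list(xs), L, p) for xs in corners)
```

The worst corner is the one where neighboring buffers move in opposite directions; there the delay meets the target exactly, which is why the IFR is narrower than the feasible region of Theorem 2.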

33.1.3 TWO-DIMENSIONAL FEASIBLE REGION

Implicit in the preceding discussion are the assumptions that a routing from source to sink exists, which is not true for buffer planning during floorplanning, and that buffer insertion occurs only along a one-dimensional line. For buffer planning, we typically assume that the two terminals of a net are connected by a shortest path within the bounding box of the net. The union of the one-dimensional FRs (or one-dimensional IFRs) of a buffer on all monotonic Manhattan routes between source and sink forms the two-dimensional FR (or two-dimensional IFR) of that buffer (see Figure 33.3). The feasible region of a buffer may be reduced by circuit blocks. Moreover, the 2D IFRs of buffers of the same net are not entirely independent of each other. As the width and location of a 2D IFR are determined under the assumption of a monotonic Manhattan route between the source and the sink, an assignment of buffers to locations within their respective 2D IFRs is legal only if the buffers lie along a monotone source-to-sink path. Figure 33.3 shows a nonmonotonic buffer assignment, which may not meet the timing constraint even though the buffers are all within their respective 2D IFRs. Therefore, when we have committed a buffer to a location in its 2D IFR, it may be necessary to update the 2D IFRs of all other buffers in the net.
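The monotonicity condition can be tested directly: the source, the buffers in order, and the sink lie on a monotone Manhattan path exactly when the rectilinear length of the polyline through them equals the Manhattan distance between source and sink. The Python sketch below assumes integer grid coordinates (a tolerance would be needed for floating-point coordinates); the function name is hypothetical.

```python
def is_monotone_route(source, buffers, sink):
    """Return True if source -> buffers (in order) -> sink never backtracks.

    Points are (x, y) tuples; the check compares the rectilinear length of
    the polyline with the Manhattan distance between its end points.
    """
    points = [source, *buffers, sink]
    path_length = sum(
        abs(x2 - x1) + abs(y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
    shortest = abs(sink[0] - source[0]) + abs(sink[1] - source[1])
    return path_length == shortest

legal = is_monotone_route((0, 0), [(2, 1), (4, 3)], (6, 5))
illegal = is_monotone_route((0, 0), [(4, 1), (2, 3)], (6, 5))  # second buffer backtracks in x
```

An assignment that fails this test lengthens the route beyond the bounding-box distance, so the one-dimensional delay bounds used to derive the 2D IFRs no longer apply.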

33.2 BUFFER BLOCKS AND SITES

There are two approaches to buffer planning: buffer block planning and buffer site planning. For buffer block planning, top-level macroblocks that contain only buffers, called buffer blocks, are inserted into the floorplan [1,3,7–9]. The underlying idea of this methodology is that when one moves a buffer considerably from its optimal location, only a small delay penalty is incurred. As a result, buffers can be relocated within their respective feasible regions or independent feasible regions such that they can be clustered together to form buffer blocks.

The buffer site methodology puts the onus on block designers to allocate a buffering resource within a block by inserting a buffer site. The allocation of buffer sites within blocks may not be uniform; a low-performance block may accommodate more buffer sites than a high-performance one, and some blocks, such as a cache, may not have any buffer sites. A preallocated buffer site may remain unassigned to a net after planning. In that case, unused buffer sites can be used to accommodate other useful circuit elements, such as decoupling capacitors.

FIGURE 33.3 2D feasible regions and their implications on buffer assignment. (The figure shows the source, the sink, circuit blocks A and B, the feasible regions of three buffers, two of which have their areas reduced by circuit blocks, and an assignment of buffers to locations that results in a nonmonotonic path.)

To facilitate buffer planning, a chip is typically first divided into tiles. Figure 33.4 shows a tiled chip layout with channel regions, hard blocks, and soft blocks. The capacity of each tile for buffer insertion depends on whether the tile overlaps with channel regions, dead areas, or hard blocks. Channel regions and dead areas of the floorplan have high capacity for buffer insertion. In contrast, hard blocks have very low capacity for buffer insertion unless some buffer sites have been inserted intentionally [2]. As the exact layout of each soft block is yet to be determined, it is typically assumed that as long as the total area of functional units and buffers in a soft block is not larger than its preallocated space, the layout of this block can be completed in the placement stage. For ease of problem formulation, all the tiles in a soft block may be merged together, as in Figure 33.4. The buffer capacity of this merged block tile is the total area less the area consumed by its functional units. It is the responsibility of the placement tool to ensure that buffers are placed at appropriate locations in the physical realization of a soft block.

Let $V_T$ denote the set of tiles obtained as described in the preceding paragraph. We can construct a tile graph $G_T(V_T, E_T)$, where every two neighboring tiles $u$ and $v$ in $V_T$ are connected by an edge $e_{u,v}$ in $E_T$. For a tile $v$, let $B(v)$ be the number of buffer sites within $v$ and $b(v)$ be the number of buffers assigned to $v$. Let $W(e_{u,v})$ be the wire capacity of the edge $e_{u,v}$, and let $w(e_{u,v})$ denote the actual wire usage of $e_{u,v}$. It is clear that a buffer planning solution is feasible only if $b(v) \le B(v)$ for all $v \in V_T$ and $w(e) \le W(e)$ for all $e \in E_T$.
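The two capacity conditions translate into a simple check over the tile graph. The Python sketch below stores capacities and usages in dictionaries keyed by tile and by edge; this representation and the function name are assumptions for illustration, not a data structure prescribed by the chapter.

```python
def is_feasible(buffer_sites, buffers_used, wire_capacity, wire_usage):
    """Check b(v) <= B(v) for every tile and w(e) <= W(e) for every edge.

    buffer_sites:  dict tile -> B(v), number of buffer sites in the tile
    buffers_used:  dict tile -> b(v), number of buffers assigned to the tile
    wire_capacity: dict edge -> W(e), wire capacity of the tile boundary
    wire_usage:    dict edge -> w(e), number of wires crossing the boundary
    Tiles or edges missing from a capacity dict are treated as capacity 0.
    """
    tiles_ok = all(
        used <= buffer_sites.get(tile, 0) for tile, used in buffers_used.items()
    )
    edges_ok = all(
        used <= wire_capacity.get(edge, 0) for edge, used in wire_usage.items()
    )
    return tiles_ok and edges_ok

# Illustrative two-tile example (made-up numbers).
B = {"t0": 4, "t1": 0}
W = {("t0", "t1"): 10}
ok = is_feasible(B, {"t0": 3}, W, {("t0", "t1"): 10})
too_many_buffers = is_feasible(B, {"t0": 3, "t1": 1}, W, {("t0", "t1"): 2})
```

Both conditions are necessary but not sufficient: a solution that passes this check must still satisfy the timing constraints of each net.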

33.2.1 BUFFER BLOCK PLANNING

The buffer block planning problem can be informally stated as follows: Given a set of circuit blocks and a set of connections with feasible regions for buffer insertion to satisfy the design constraints (e.g., timing, noise), plan the locations of buffer blocks within the available free space (e.g., dead spaces and channels) so as to route a maximum number of connections. Buffer blocks can be planned after floorplanning [1,3,7,9] or during floorplanning [10–13]. Postfloorplanning buffer block planning is more efficient, but it is often limited by the quality of the given floorplan because the location and size of the space for buffer insertion are fixed. Furthermore, as dead spaces are often treated as an undesired cost during floorplanning, they are usually avoided or minimized. As a result, the size and location of dead spaces may not be suitable for postfloorplanning buffer insertion. Therefore, there are also efforts that integrate buffer block planning into floorplanning to fully utilize useful dead spaces for performance optimization. This approach typically enjoys higher design flexibility, but inevitably incurs higher time complexity.

FIGURE 33.4 Tile graph for buffer planning. (Legend: hard block, soft block, dead space/channel region.)

Cong, Kong, and Pan first considered postfloorplanning buffer block planning in Ref. [1]; they derived feasible region formulas to determine where to insert buffers to meet timing constraints and proposed a greedy algorithm to plan buffer blocks in a slicing floorplan. Sarkar and Koh also considered routability and addressed the concept of independent feasible regions in Ref. [3]. Moreover, both approaches in Refs. [1,3] expand channels to provide more buffers if necessary. On the basis of a network-flow formulation, Tang and Wong in Ref. [9] optimally planned as many buffers into buffer blocks as possible for all nets, each with at most one buffer. Given an existing buffer block plan, Dragan et al. in Ref. [7] performed buffering of global nets. Nets are routed using available buffer blocks such that the required upper and lower bounds on buffer intervals and the wirelength upper bounds per connection are satisfied.

We describe the generic approach for postfloorplanning buffer block planning as presented in Ref. [1]. First, we construct a directed horizontal constraint graph and a vertical constraint graph for a given floorplan, denoted by $G_H$ and $G_V$, respectively. Each vertex $v$ in $G_H$ models a vertical routing channel, and an edge $e = (v_1, v_2)$ denotes a circuit block whose respective left and right boundaries are adjacent to the routing channels $v_1$ and $v_2$. The weight of a vertex $v$, $w(v)$, denotes the corresponding channel width while the weight of an edge $e$, $w(e)$, represents the corresponding
