Handbook of algorithms for physical design automation part 59 ppt

needed, additional CPU time may be required for postprocess buffering).

27.3 SIMULTANEOUS TREE CONSTRUCTION AND BUFFER INSERTION

In this section, some of the methods that combine buffer insertion and topology construction are presented. Most of them belong to the P-Tree class of algorithms. The work started with the first version of the P-Tree algorithm [8], which was designed to construct timing-driven routing trees, and it has seen a decade-long evolution of improvements and extensions building on the original ideas. These algorithms are designed to handle a variety of challenges seen in modern designs. Simultaneous tree construction and repeater insertion (with multiple buffers and inverters in the library), together with simultaneous optimization of multiple objectives (delay, cost, congestion, wirelength), are achieved through the core optimization engine. Practical issues such as obstacles (i.e., placement and routing blockages), multilayer routing and vias, and nonorthogonal routing are handled by the capability of the algorithms to work on general graph models as routing targets. Spatial, temporal, and polarity localities (all of them independently) are captured and exploited by implicit specification of the set of tree topologies that will be searched. In the following subsections, we give an overview of these contributions in chronological order.

27.3.1 P-TREE ALGORITHM

The work in Ref. [8] presents an algorithm that constructs rectilinear Steiner trees while explicitly optimizing both delay and wire area. In contrast to the methods presented in the previous section, it introduces the paradigm of finding an optimal solution in a constrained solution space: the solution search space is defined in advance, and the algorithm then finds an optimal solution within that space.

The P-Tree algorithm simultaneously optimizes over topologies and their embeddings. To illustrate the degrees of freedom in embedding a particular topology, consider Figure 27.6. The driver is the root, leaves represent sinks, and internal nodes represent Steiner or branching points. However, the branching points do not have defined placement locations. As an example, the topology in Figure 27.6a can yield very different embedded solutions, as shown in Figure 27.6b and c. Notice that when the topology is embedded (Figure 27.6b and c), sinks A and B are always in the same subtree, as specified by the topology (Figure 27.6a).

Thus, how one embeds a particular topology can be of great importance. But what about the topology itself? The P-Tree algorithm does not limit itself to a single topology. Instead, it optimizes over all topologies induced by a sink permutation (a set exponential in size). These topology trees are simultaneously explored and embedded into the routing domain.

FIGURE 27.6 Routing tree topology (a) can have different embedded solutions (b) and (c).

FIGURE 27.7 Topologies (a) and (b) satisfy permutation BDAC while topology (c) does not.

The routing topologies that are produced are permutation-constrained routing trees (giving rise to the name P-Tree). Given an ordering of sinks (i.e., a permutation), a topology satisfies the permutation constraint if some depth-first traversal of the topology tree produces the same sink ordering. As an example, given the permutation BDAC, the trees in Figure 27.7a and b satisfy the permutation while the one in Figure 27.7c does not.
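The permutation constraint can be checked mechanically. The following is a minimal sketch, assuming that a depth-first traversal may visit the children of a node in any order; under that assumption, a topology satisfies a permutation exactly when the sinks of every subtree occupy a contiguous block of the permutation. The nested-tuple tree encoding and the example trees are hypothetical and are not the trees drawn in Figure 27.7.

```python
def satisfies_permutation(tree, permutation):
    """Return True if some depth-first traversal of `tree` visits the sinks
    in the order given by `permutation`.

    `tree` is a nested tuple: a leaf is a sink name, an internal node is a
    tuple of subtrees (an encoding assumed for this sketch only).
    """
    position = {sink: k for k, sink in enumerate(permutation)}
    ok = True

    def visit(node):
        # Returns (lowest position, highest position, number of sinks)
        # for the subtree rooted at `node`.
        nonlocal ok
        if not isinstance(node, tuple):
            p = position[node]
            return p, p, 1
        spans = [visit(child) for child in node]
        lo = min(s[0] for s in spans)
        hi = max(s[1] for s in spans)
        size = sum(s[2] for s in spans)
        if hi - lo + 1 != size:  # sinks of this subtree are not contiguous
            ok = False
        return lo, hi, size

    _, _, size = visit(tree)
    return ok and size == len(permutation)


# Hypothetical topologies over the sinks A, B, C, D.
good = (("B", "D"), ("A", "C"))
also_good = ("B", ("D", ("A", "C")))
bad = (("B", "A"), ("D", "C"))
print(satisfies_permutation(good, "BDAC"),
      satisfies_permutation(also_good, "BDAC"),
      satisfies_permutation(bad, "BDAC"))
```

The check runs in time linear in the size of the tree, which is why the permutation can serve as a compact, implicit description of an exponentially large topology set.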

The method proposed to find a high-quality permutation consists of three steps. The first step constructs a minimum spanning tree (MST) on all pins (both driver and sinks). The next step converts the MST into a topology by reorienting it such that the driver is at the root and binarizing it such that each internal node has exactly two outgoing edges. The last step applies a dynamic programming algorithm to optimize the induced permutation further. The proposed approach is to optimize the tour length of the permutation (in the sense of a traveling salesman problem (TSP)). The intuition is that TSP provides good clustering information (i.e., sinks that are close in the placement should be close in the topology as well). In addition, because the permutation is consistent with the MST, the minimum area solution induced by this permutation is guaranteed to be at most 50 percent larger than the optimal Steiner tree. A more detailed description of the algorithm can also be found in Ref. [17].
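The first two steps can be illustrated with a short sketch. It is a simplification under stated assumptions: the MST is built with Prim's algorithm under the rectilinear (Manhattan) metric, the permutation is read directly from a preorder traversal of the MST rooted at the driver instead of from an explicitly binarized topology, and the dynamic programming refinement of the third step is omitted. The function names, the tie-breaking rule, and the definition of tour length as an open path starting at the driver are choices made for this example.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_sink_permutation(pins, driver=0):
    """Return sink indices in preorder of a rectilinear MST rooted at the driver.

    `pins` is a list of (x, y) locations; `driver` is the index of the driver pin.
    """
    children = {i: [] for i in range(len(pins))}
    # best[i] = (distance to the growing tree, closest tree vertex)
    best = {i: (manhattan(pins[driver], pins[i]), driver)
            for i in range(len(pins)) if i != driver}
    while best:
        v = min(best, key=lambda i: (best[i][0], i))  # Prim step, ties by index
        _, parent = best.pop(v)
        children[parent].append(v)
        for w in best:
            d = manhattan(pins[v], pins[w])
            if d < best[w][0]:
                best[w] = (d, v)
    order, stack = [], [driver]
    while stack:  # iterative preorder traversal
        u = stack.pop()
        if u != driver:
            order.append(u)
        stack.extend(reversed(children[u]))
    return order


def tour_length(pins, order, driver=0):
    """Length of the open path driver -> order[0] -> order[1] -> ..."""
    path = [driver] + list(order)
    return sum(manhattan(pins[a], pins[b]) for a, b in zip(path, path[1:]))


pins = [(0, 0), (1, 0), (5, 0), (1, 1)]  # pin 0 is the driver
perm = mst_sink_permutation(pins)
print(perm, tour_length(pins, perm))
```

A shorter tour keeps sinks that are close in the placement adjacent in the permutation, which is the clustering property the text relies on.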

Once the permutation is obtained, thus defining all topologies to be explored, the algorithm proceeds to topology embedding. The routing target is specified by a Hanan grid (note that in Ref. [10] this step of the flow is redesigned to handle a general graph model, giving the capability to account for blockages and congestion). Instead of embedding topologies one at a time, the algorithm exploits the structure of permutation-constrained topologies and achieves polynomial computational complexity while exploring an exponential search space. For example, the topologies in Figure 27.7a and b have an identical subtree containing sinks B and D, and there is no need to compute the solution for this subtree more than once.

The dynamic programming approach proceeds in a bottom-up fashion, computing the following solution sets: S(v, i, j) contains signatures∗ of the solutions over all permutation-induced topologies driving sinks from i to j in the permutation that are rooted at the vertex v in the target routing graph. Set S_b(v, i, j) contains signatures of the solutions over all permutation-induced topologies driving sinks from i to j in the permutation that are rooted at the vertex v, with the constraint that vertex v is also a branching point. The top-level view of the P-Tree algorithm is given in Figure 27.8.

Depending on the solution signature, the algorithm can optimize many objectives. In the P-Tree_A mode, the solution signature contains only one parameter: total wire capacitance c. This mode is used to optimize wire area. In the P-Tree_AT mode, both timing and area are optimized simultaneously.

∗ Solution signature is described in the following paragraph.

a4 j = i + l
a5 ComputeSb(v, i, j)
a6 ComputeS(v, i, j)
a7 endfor
a8 endfor
a9 return S(v_d, 1, n − 1)

FIGURE 27.8 P-Tree algorithmic framework.

Instead of a single final solution, the algorithm produces a family of nondominated solutions with an area/delay trade-off. In this mode, the solution signature is specified by an ordered pair (c, q), where c represents total downstream capacitance and q represents signal required arrival time. Under the Elmore delay model [2], managing these primitives is done in a way similar to Ref. [1], as explained in Chapter 26. As a reminder, the primitives include joining, augmenting, merging, and pruning solutions. In the follow-up work, the P-Tree_A and P-Tree_AT modes are referred to as the one-dimensional (1D) and two-dimensional (2D) modes because the algorithm optimizes only one parameter in the 1D mode and simultaneously optimizes two parameters in the 2D mode. Details about computing the S and S_b sets can be found in Ref. [8]. As a side note, computation of the set S is done by performing four sweeps of the routing grid (once in each direction). This step is very efficient under the assumption that the routing target is a Hanan grid, but it cannot handle obstacles.

Once the set S is computed at the top level, S(v_d, 1, n − 1), the actual topology embeddings can be obtained by backtracking through the data structure containing these solution signatures.
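The bottom-up computation of the S and S_b sets can be sketched for the 1D mode, where each signature collapses to a single number. This is an illustrative sketch and not the implementation of Ref. [8]: wirelength stands in for wire capacitance, the four grid sweeps are replaced by a minimization over all vertices using a distance function, sinks are indexed from 0, and backtracking is omitted. All names are hypothetical.

```python
def ptree_min_wirelength(vertices, dist, driver, sinks):
    """1D permutation-constrained tree cost over a set of candidate vertices.

    vertices: candidate embedding locations (must include driver and sinks)
    dist(u, v): routing distance between two vertices
    sinks: sink vertices listed in permutation order
    """
    n = len(sinks)
    S, Sb = {}, {}
    # Base case: a single sink i is reached from v by a direct connection.
    for i in range(n):
        for v in vertices:
            S[v, i, i] = dist(v, sinks[i])
    # Longer sink intervals, shortest first (the loop structure of Figure 27.8).
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for v in vertices:
                # v is a branching point: split the interval at some k.
                Sb[v, i, j] = min(S[v, i, k] + S[v, k + 1, j]
                                  for k in range(i, j))
            for v in vertices:
                # Route from v to the best branching point u.
                S[v, i, j] = min(Sb[u, i, j] + dist(u, v) for u in vertices)
    return S[driver, 0, n - 1]


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hanan grid of a driver at (0, 0) and two sinks at (2, 1) and (2, -1).
hanan = [(x, y) for x in (0, 2) for y in (-1, 0, 1)]
print(ptree_min_wirelength(hanan, manhattan, (0, 0), [(2, 1), (2, -1)]))
```

Each subproblem (v, i, j) is solved once and reused by every topology that contains the corresponding subtree, which is how the exponential topology set is covered in polynomial time.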

In Ref. [7], the original algorithm is extended to perform simultaneous route construction and buffer insertion. The general flow of the algorithm is almost identical to that of Ref. [8], except that each internal vertex in the topology is also considered as a candidate location for buffer insertion. However, some adjustments had to be made to the solution signature and, because of that, to all primitives that operate on it as well. Because inserting a buffer decouples the downstream load, and effectively resets the load visible from the parent vertex to the input capacitance of the inserted buffer, the load value cannot be used for cost estimation (but remains necessary for delay estimation). Thus, a third parameter that represents solution cost (e.g., area, power, congestion, or any composite function of those) is added to the solution signature. The solution signature now becomes a triple (p, c, q). More details about the three-dimensional (3D) version of the primitives (simultaneously optimizing three parameters) can be found in Ref. [12]. Also, methods from Ref. [12] can be applied here to extend the algorithm to support multiple buffers and inverters in the library.
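As an illustration of the pruning primitive on 3D signatures, the sketch below keeps only nondominated triples. It assumes the usual dominance rule, which the text does not spell out: a solution is discarded when another one has no higher cost p, no higher load c, and no lower required arrival time q. The quadratic scan is chosen for clarity and is not the data structure of Ref. [12].

```python
def prune(solutions):
    """Return the nondominated (p, c, q) triples.

    Lower cost p and lower load c are better; higher required time q is better.
    """
    kept = []
    # After sorting by (p, c, -q), any dominator of a triple precedes it.
    for s in sorted(set(solutions), key=lambda t: (t[0], t[1], -t[2])):
        dominated = any(k[0] <= s[0] and k[1] <= s[1] and k[2] >= s[2]
                        for k in kept)
        if not dominated:
            kept.append(s)
    return kept


candidates = [(1, 2, 10), (2, 2, 10), (1, 3, 12), (1, 2, 9)]
print(prune(candidates))
```

In this example (2, 2, 10) and (1, 2, 9) are discarded because (1, 2, 10) is at least as good in all three parameters, while (1, 3, 12) survives because it offers a later required arrival time.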

Although the algorithm's computational complexity is O(n^5) in the 1D mode and pseudopolynomially bounded in the 2D mode, it produces solutions of very high quality in terms of both cost and delay. The algorithm may not be suitable for optimizing every single net in the design, but using this approach to optimize a smaller number of highly critical nets can be very beneficial, especially when one considers that the algorithm produces a family of solutions with a delay/cost trade-off, giving more choices in the overall design optimization process. Experimental results support these claims.

A detailed description of the algorithm and the extensions summarized above can also be found in Ref. [17].

27.3.2 S-TREE ALGORITHM

The work in Ref. [11] adopts the overall philosophy of P-Tree while improving runtime and scalability, introducing a general strategy (stitching) for incorporating multiple measures of locality between sinks and subtrees (e.g., temporal, polarity), and finally adopting a general graph model as the routing target, enabling natural solutions for obstacles, multilayer routing, etc.

Although the P-Tree topology space is very large and has a very good ability to capture spatial sink locality, it may create solutions of higher cost in certain scenarios, for example, when highly critical sinks are physically located among noncritical sinks. To address this issue, the S-Tree defines its topology search set using a topology tree and a sink partition. The sink partition specifies which sinks can break away from the original topology tree and be stitched to any other part of the topology while maintaining the order within their own partition set. The topology set is then composed of all valid stitching trees (thus giving the name S-Tree). In Figure 27.9, given the topology on the left and a sink partition {{A, C, D, F}, {B, E}}, one can find all possible topologies that satisfy the stitching requirement. Assuming that sinks B and E are critical, note how they are allowed to break away from noncritical sinks and climb toward the root. The fact that these topologies are in the solution space guarantees only that they will be explored; whether they will be chosen at the end depends on the cost and quality of the embedded trees.

In the extreme cases, if all sinks belong to a single partition, S-Tree is equivalent to a single topology tree embedding. The other extreme case is when each sink belongs to its own partition, which explores all possible tree topologies in a way similar to Ref. [9].

The initial topology is obtained as in P-Tree, using the first two steps only: construct an MST and convert it to a topology tree by reorientation and binarization. The sink partitions can be determined based on estimated timing criticality, input signal polarity requirements, or any other criteria. Once the topology set is defined, the algorithm proceeds with the dynamic programming algorithm that performs topology set embedding. In the embedding process, no sink (nor any group of sinks) receives any special treatment.

Modifications to the topology embedding algorithm include support of a general graph model for the routing target. Propagation of candidate solutions through the routing graph is performed using a timing-driven maze routing approach with simultaneous buffer insertion, as in Ref. [14]. Note that solution quality depends heavily on the routing target graph. Careful graph construction is a key to

FIGURE 27.9 S-Tree topology space example. The initial topology tree on the left and a sink partition {{A, C, D, F}, {B, E}} define a set of five different topologies that will be explored simultaneously.

Optimization modes include all three modes seen in the P-Tree algorithm (1D, 2D, and 3D). Experimental results demonstrate the effectiveness of the approach in identifying solutions with very good timing and low cost. Also, the algorithm has very good predictability characteristics.

27.3.3 SP-TREE ALGORITHM

The SP-Tree algorithm [10], as the name suggests (stitching permutation-constrained trees), combines the topology tree spaces of P-Tree and S-Tree. After obtaining the sink permutation and constructing a set of topology trees, sink partitions are specified and the entire set of topology trees is expanded by the stitching operation, producing a new, larger set of tree topologies. The SP-Tree topology set contains both the P-Tree and the S-Tree topologies in addition to some new topologies that are not contained in either P-Tree or S-Tree.

The topology set embedding algorithm is identical to that of the S-Tree; the main difference is in the construction of the topology tree set that is provided as an input to the embedder. More details about both algorithms can be found in Ref. [18], while executable solvers and examples can be found at Ref. [19].

Algorithms like buffered P-Tree, S-Tree, and SP-Tree, based on a general graph model as a routing target, provide high-quality solutions and offer great flexibility with different optimization objectives. As expected, experimental results demonstrate that the SP-Tree flow produces solutions that are always better than or equal to those of both S-Tree and P-Tree, usually with a smaller cost, but at a runtime penalty.

27.3.4 COMPLETE TREE TOPOLOGY EXPLORATION

For experimental purposes, one can construct a tree topology set that contains all possible tree topologies. This is similar to the topology decomposition from Ref. [20]. One can achieve this either by using the S-Tree and requesting that each sink belong to its own partition, or by using a specific approach as in Ref. [9]. Embedded routing trees obtained in this fashion are indeed optimal, but the computational complexity prevents any practical application; they are nevertheless valuable for research purposes and for the evaluation of other heuristic approaches.

REFERENCES

1. L. P. P. P. van Ginneken, Buffer placement in distributed RC-tree networks for minimal Elmore delay, in Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, LA, May 1990, pp. 865–868.

2. W. C. Elmore, The transient response of damped linear networks with particular regard to wideband amplifiers, Journal of Applied Physics, 19: 55–63, Jan. 1948.

3. C. J. Alpert, G. Gandham, M. Hrkić, J. Hu, A. B. Kahng, J. Lillis, B. Liu, S. T. Quay, S. S. Sapatnekar, and A. J. Sullivan, Buffered Steiner trees for difficult instances, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(1): 3–14, Jan. 2002.

4. C. J. Alpert, G. Gandham, M. Hrkić, J. Hu, and S. T. Quay, Porosity aware buffered Steiner tree construction, in Proceedings of the ACM International Symposium on Physical Design, Monterey, CA, Apr. 2003, pp. 158–165.

5. C. J. Alpert, M. Hrkić, J. Hu, and S. T. Quay, Fast and flexible buffer trees that navigate the physical layout environment, in Proceedings of the 41st Design Automation Conference, San Diego, CA, Jun. 2004, pp. 24–29.

6. C. Bartoschek, S. Held, D. Rautenbach, and J. Vygen, Efficient generation of short and fast repeater tree topologies, in Proceedings of the ACM International Symposium on Physical Design, San Jose, CA, Apr. 2006, pp. 120–127.

7. J. Lillis, C. K. Cheng, and T. T. Y. Lin, Simultaneous routing and buffer insertion for high performance interconnect, in Proceedings of the 6th IEEE Great Lakes Symposium on VLSI, Ames, IA, Mar. 1996, pp. 148–153.

8. J. Lillis, C. K. Cheng, T. T. Y. Lin, and C. Y. Ho, New performance driven routing techniques with explicit area/delay tradeoff and simultaneous wire sizing, in Proceedings of the 33rd Design Automation Conference, Las Vegas, NV, Jun. 1996, pp. 395–400.

9. J. Cong and X. Yuan, Routing tree construction under fixed buffer locations, in Proceedings of the 37th Design Automation Conference, Los Angeles, CA, Jun. 2000, pp. 379–384.

10. M. Hrkić and J. Lillis, Buffer tree synthesis with consideration of temporal locality, sink polarity requirements, solution cost and blockages, in Proceedings of the ACM International Symposium on Physical Design, Del Mar, CA, Apr. 2002, pp. 98–103.

11. M. Hrkić and J. Lillis, S-Tree: A technique for buffered routing tree synthesis, in Proceedings of the 39th Design Automation Conference, New Orleans, LA, Jun. 2002, pp. 578–583.

12. J. Lillis, C. K. Cheng, and T. T. Y. Lin, Optimal wire sizing and buffer insertion for low power and a generalized delay model, IEEE Journal of Solid-State Circuits, 31(3): 437–447, Mar. 1996.

13. C. J. Alpert, T. C. Hu, J. H. Huang, A. B. Kahng, and D. Karger, Prim-Dijkstra tradeoffs for improved performance-driven routing tree design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(7): 890–898, Jul. 1995.

14. S. W. Hur, A. Jagannathan, and J. Lillis, Timing-driven maze routing, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(2): 234–241, Feb. 2000.

15. A. Jagannathan, S.-W. Hur, and J. Lillis, A fast algorithm for context-aware buffer insertion, in Proceedings of the 37th Design Automation Conference, Los Angeles, CA, Jun. 2000, pp. 368–373.

16. H. Zhou, D. F. Wong, I. M. Liu, and A. Aziz, Simultaneous routing and buffer insertion with restrictions on buffer locations, in Proceedings of the 36th Design Automation Conference, New Orleans, LA, Jun. 1999, pp. 96–99.

17. J. Lillis, Algorithms for Performance Driven Design of Integrated Circuits, 1996. Available at http://www.cs.uic.edu/~jlillis/papers/thesis.ps

18. M. Hrkić and J. Lillis, Buffer tree synthesis with consideration of temporal locality, sink polarity requirements, solution cost, congestion and blockages, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(4): 481–491, Apr. 2003.

19. M. Hrkić and J. Lillis, GSRC Single Interconnect Tree Synthesis Web Page. Available at http://eda.cs.uic.edu/software/interconnect/gsrc.html

20. S. E. Dreyfus and R. A. Wagner, The Steiner problem in graphs, Networks, 1: 195–208, 1972.

28 Buffering in the Layout Environment

Jiang Hu and Cliff C. N. Sze

CONTENTS

28.1 Introduction
28.2 Placement and Routing Blockages
28.3 Buffered Path with Blockage Avoidance
28.3.1 Dynamic Programming-Based Method
28.3.2 Graph-Based Approach
28.4 Buffered Tree with Blockage Avoidance
28.4.1 Tree Adjustment Technique
28.4.2 Simultaneous Tree Construction and Buffer Insertion
28.4.2.1 Dynamic Programming-Based Method
28.4.2.2 Graph-Based Technique
28.5 Layout Environment Aware Buffered Steiner Tree
28.5.1 Measurement of Placement and Routing Congestion
28.5.2 Plate-Based Tree Adjustment
28.5.2.1 Dynamic Programming-Based Adjustment
28.5.2.2 Hybrid Approach for Tree Adjustment
28.5.3 Layout Navigation
28.5.4 Relating Buffering Candidate Locations to Layout Environment
References

28.1 INTRODUCTION

Chapters 26 and 27 presented buffering algorithms in which the buffering problem was isolated from the general problem of timing closure. The main problem with this isolation is that resources are not necessarily available to put the buffers or wires in their desired locations. One way to manage this flow is to put the buffers in their ideal locations and allow a legalization procedure to move them to actual locations. The problem with this approach is that buffers may be moved quite far from their ideal locations, which could completely corrupt the quality of the solution. It is much better to place the buffers in regions where there appears to be sufficient space, so that legalization would move the buffers by at most 10–20 routing tracks, which would preserve the original solution. To do this, one must certainly take into account the blockages and, preferably, local placement and routing congestion. In this chapter, we explore techniques that consider these factors.

28.2 PLACEMENT AND ROUTING BLOCKAGES

In realistic chip designs, some regions may be occupied by IP blocks, memory arrays, and macros. Such regions allow wires to pass through but have no room for buffer insertion. Therefore, buffer insertion has to be performed with consideration of these buffer blockages. If a wire path has large overlap with blockages, as in Figure 28.1a, no feasible buffering solution can be found. However, avoiding buffer blockages completely, as in Figure 28.1b, may cause unnecessary wire detour as well as delay degradation. Hence, it is important to have algorithmic techniques that can find a proper routing topology (as in Figure 28.1c) together with the buffer insertion solution. Another more intuitive example was previously described in Figure 27.4. To find the best buffered interconnect solution, the routing must consider feasible buffer insertion locations.

FIGURE 28.1 (a) The wire path has large overlap with the buffer blockage, which is the shaded region, and there is no feasible buffering solution. (b) Rerouting the wire to completely avoid the blockage may result in large wire detour. (c) Ideally, the wire path should largely avoid the blockage with limited detour.

A similar problem is buffer insertion considering placement and routing congestion. If a buffer is inserted in a crowded place, as in Figure 28.2a, it might be moved far away later during cell placement legalization. Thus, such an insertion is not favorable although it is feasible; buffers are preferably inserted in relatively sparse regions, as in Figure 28.2b. Similarly, buffer insertion in a routing-congested region may intensify the wire routability problem. Hence, buffer insertion algorithms need to be aware of the layout environment and be able to handle the trade-off between timing performance and congestion avoidance.

Section 28.3 introduces algorithms for blockage avoidance for two-pin nets, i.e., buffered paths. Buffered Steiner tree methods with blockage avoidance for multipin nets are described in Section 28.4. In Section 28.5, techniques for handling congestion are discussed.

FIGURE 28.2 (a) If a buffer is placed in a crowded region, it may be moved by the placement legalizer far away from its preferred location. (b) If the buffer is placed in a sparse region, its location will not be significantly changed by placement legalization.

28.3 BUFFERED PATH WITH BLOCKAGE AVOIDANCE

For two-pin nets, the problem is to find a buffered path with the minimum delay under the blockage constraints. In the literature, there exist two major approaches: (1) dynamic programming and (2) graph-based algorithms. Both are based on the Elmore delay model. It has not been shown that these approaches are fast enough to be practical for general optimization cases, but they can be very useful for a handful of the most critical nets.

28.3.1 DYNAMIC PROGRAMMING-BASED METHOD

The dynamic programming-based method [1,2] propagates partial solutions from the sink node t through a routing graph G = (V, E) and picks the optimal solution at the source node s. The routing graph can be either a uniform grid reflecting routing tracks (Figure 28.3a) or an extended Hanan grid, which is obtained by drawing vertical and horizontal lines through the given pins and blockage boundaries (Figure 28.3b). For each edge (u, v) ∈ E, R(u, v) and C(u, v) are the edge resistance and capacitance, respectively. For each node v ∈ V, there is a label p(v) ∈ {0, 1}, which is equal to 0 if the node overlaps with a buffer blockage and equal to 1 otherwise. Besides the routing graph, the driver resistance R_d, sink capacitance C_t, and a buffer library B are assumed to be given. Each buffer type b ∈ B is modeled by its input capacitance C(b), intrinsic delay K(b), and output resistance R(b).

A partial solution at a node v is characterized by a quadruple α = (c, d, m, v), where c is the current input capacitance seen at v, d is the delay from v to the sink t, and m is a labeling for the buffered path from v to t. The label m(v) = b indicates that buffer b ∈ B is inserted at node v, and m(v) = 0 implies that no buffer is inserted there. The solution α_1 = (c_1, d_1, m_1, v) is inferior to the solution α_2 = (c_2, d_2, m_2, v) if c_1 ≥ c_2 and d_1 ≥ d_2.

The partial solutions are maintained in a priority queue Q initialized with the solution (C_t, 0, 0, t) at the sink t. Each time, the top solution in Q, which has the minimum delay, is extracted for expansion. A solution (c, d, m, u) is expanded to its neighbor node v if there is an edge (u, v) ∈ E. The expanded solution is (c + C(u, v), d + R(u, v) · (c + C(u, v)/2), m, v), where the delay increase is based on the Elmore delay model. If a solution (c, d, m, v) is at node v where m(v) = 0 and p(v) = 1, buffers of each type are inserted there to generate new partial solutions. If the buffer type is b ∈ B, its corresponding buffered solution is (C(b), d + R(b) · c + K(b), m, v).

FIGURE 28.3 Routing graph.
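The propagation scheme described above can be sketched as follows. This is a hedged illustration rather than the implementation of the cited works: the final delay at the source is assumed to be d + R_d · c, the search runs until the queue is empty and then reports the best solution seen at s, and nondominated fronts are kept separately for solutions with and without a buffer at the current node. The graph encoding, the function name, and the example values are hypothetical.

```python
import heapq
from itertools import count


def buffered_path(graph, blocked, buffers, Rd, Ct, s, t):
    """Minimum Elmore-delay buffered path from source s to sink t.

    graph:   {u: [(v, R_uv, C_uv), ...]}, listing both directions of an edge
    blocked: set of nodes overlapping a buffer blockage, i.e., p(v) = 0
    buffers: {name: (C_in, K, R_out)}
    Returns (delay, labeling), where labeling maps node -> buffer name.
    """
    fronts = {}  # (node, buffer_at_node) -> nondominated (c, d) pairs

    def accept(key, c, d):
        front = fronts.setdefault(key, [])
        if any(c2 <= c and d2 <= d for c2, d2 in front):
            return False  # inferior to a stored solution
        front[:] = [(c2, d2) for c2, d2 in front if not (c <= c2 and d <= d2)]
        front.append((c, d))
        return True

    tie = count()  # tie-breaker so heap entries never compare labelings
    heap = [(0.0, next(tie), Ct, t, False, ())]
    accept((t, False), Ct, 0.0)
    best = (float("inf"), None)
    while heap:
        d, _, c, u, buffered, m = heapq.heappop(heap)
        if u == s:
            total = d + Rd * c  # assumed driver delay term
            if total < best[0]:
                best = (total, dict(m))
        if not buffered and u not in blocked:
            for name, (cb, kb, rb) in buffers.items():
                nd = d + rb * c + kb  # buffered solution (C(b), d + R(b)c + K(b))
                if accept((u, True), cb, nd):
                    heapq.heappush(
                        heap, (nd, next(tie), cb, u, True, m + ((u, name),)))
        for v, r, cw in graph.get(u, ()):
            nc, nd = c + cw, d + r * (c + cw / 2)  # Elmore wire expansion
            if accept((v, False), nc, nd):
                heapq.heappush(heap, (nd, next(tie), nc, v, False, m))
    return best


# Hypothetical path s - a - b - t with unit resistance and capacitance per edge.
graph = {
    "t": [("b", 1.0, 1.0)],
    "b": [("t", 1.0, 1.0), ("a", 1.0, 1.0)],
    "a": [("b", 1.0, 1.0), ("s", 1.0, 1.0)],
    "s": [("a", 1.0, 1.0)],
}
buffers = {"b1": (1.0, 1.0, 1.0)}
print(buffered_path(graph, {"s"}, buffers, Rd=10.0, Ct=1.0, s="s", t="t"))
```

With the source node blocked in this example, the search places the single buffer at node a, which shields the strong load of the remaining wire from the weak driver.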
