needed, additional CPU time may be required for postprocess buffering.27.3 SIMULTANEOUS TREE CONSTRUCTION AND BUFFER INSERTION In this section, some of the methods that combine buffer in
Trang 1needed, additional CPU time may be required for postprocess buffering).
27.3 SIMULTANEOUS TREE CONSTRUCTION AND BUFFER INSERTION
In this section, some of the methods that combine buffer insertion and topology construction are presented Most of them belong to the P-Tree class of algorithms The work started with the first version of the P-Tree algorithm [8] that was designed to construct timing-driven routing tree and
it has seen a decade long evolution of various improvements and extensions that were building on the original ideas These algorithms are designed to handle a variety of challenges seen in modern designs Simultaneous tree construction and repeater insertion (with multiple buffers and inverters
in the library) while being able to optimize multiple objectives, again, simultaneously (delay, cost, congestion, wirelength) are achieved through the core optimization engine Practical issues such as obstacles (i.e., placement and routing blockages), multilayer routing and vias, and nonorthogonal routing are handled by the capability of the algorithms to work on general graph models as rout-ing targets Spatial, temporal, and polarity localities (all of them independently) are captured and exploited by implicit specification of the set of tree topologies that will be searched In the following subsections, we give an overview of the mentioned contributions in chronological order
27.3.1 P-TREE ALGORITHM
The work in Ref [8] presents an algorithm that constructs rectilinear Steiner trees while explicitly optimizing both delay and wire area Contrary to the methods presented in the previous section, it introduces the paradigm of finding an optimal solution in the constrained solution space; in other words, the solution search space is defined in advance, and then the algorithm finds an optimal solution in this constrained space
The P-Tree algorithm simultaneously optimizes over topologies and their embeddings To illus-trate the degrees of freedom in embedding a particular topology consider Figure 27.6 The driver is the root, leaves represent sinks, and internal nodes represent Steiner or branching points However, the branching points do not have defined placement locations As an example, topology in Figure 27.6a
can yield very different embedded solutions as shown in Figure 27.6b and c Notice that sinks A and
B when the topology is embedded (Figure 27.6b and c) are always in the same subtree, as specified
by the topology (Figure 27.6a)
Thus, how one embeds a particular topology can be of great importance But what about the topology itself? The P-Tree algorithm does not limit itself to a single topology Instead, it optimizes
C
D
A
C
D
A B
C D
FIGURE 27.6 Routing tree topology (a) can have different embedded solutions (b) and (c).
Trang 2(a) (b) (c)
C
C
A
A B
B
A
FIGURE 27.7 Topologies (a) and (b) satisfy permutation BDAC while topology (c) does not.
over all topologies induced by a sink permutation (a set exponential in size) These topology trees are simultaneously explored and embedded to the routing domain
The routing topologies that are produced are permutation-constrained routing trees (giving rise to the name P-Tree) Given an ordering of sinks (i.e., permutation) a topology satisfies the permutation constraint if some depth-first traversal of the topology tree produces the same sink ordering As an
example, given the permutation BDAC, trees in Figure 27.7a and b satisfy the permutation while the
one in Figure 27.7c does not
The method proposed to find a high-quality permutation consists of three steps The first step constructs a minimum spanning tree (MST) on all pins (both driver and sinks) The next step converts the MST into topology by reorienting MST such that the driver is at the root and binarizing it such that each internal node has exactly two outgoing edges The last step is to apply the dynamic programming algorithm to optimize the induced permutation further The proposed approach is to optimize the tour length of the permutation (in the sense of a traveling salesman problem (TSP)) The intuition
is that TSP provides good clustering information (i.e., sinks that are close in the placement should
be close in the topology as well) In addition, because the permutation is consistent with the MST,
it guarantees that the minimum area solution induced by this permutation can be at most 50 percent larger than the optimal Steiner tree A more detailed description of the algorithm can be also found
in Ref [17]
Once the permutation is obtained, thus defining all topologies to be explored, the algorithm proceeds to topology embedding The routing target is specified by a Hanan grid (note that in Ref [10] this step of the flow is redesigned to handle general graph model, giving the capability to account for blockages and congestion) Instead of embedding topologies one at a time, algorithm exploits the structure of permutation constrained topologies and achieves polynomial computational complexity while exploring an exponential search space For example, topologies in Figure 27.7a
and b have identical subtree containing sinks B and D, and there is no need to compute solution for
this subtree more than once
The dynamic programming approach proceeds in a bottom-up fashion computing the following
solution sets: S (v, i, j) contains signatures∗of the solutions over all permutation-induced topologies
driving sinks from i to j in the permutation that are rooted at the vertex v in the target routing graph Set Sb(v, i, j) contains signatures of the solutions over all permutation-induced topologies driving sinks from i to j in the permutation that are rooted at the vertex v with a constraint that vertex v is
also a branching point The top-level view of the P-Tree algorithm is given in Figure 27.8
Depending on the solution signature, the algorithm can optimize many objectives In the P – TreeA
mode, solution signature contains only one parameter: total wire capacitance c This mode is used
to optimize wire area In the P – TreeATmode, both timing and area are optimized simultaneously
∗ Solution signature is described in the following paragraph.
Trang 3a4 j=i+I
a5 ComputeSb(v, i, j)
a6 ComputeS(v, i, j)
a7 endfor
a8 endfor
a9 returnS(vd,1, n−1)
FIGURE 27.8 P-Tree algorithmic framework.
Instead of one single final solution, algorithm produces a family of nondominated solutions with area/delay trade-off In this mode, solution signature is specified by an ordered pair(c, q), where c represents total downstream capacitance, while q represents signal required arrival time Under the
Elmore delay model [2], managing these primitives is done in a way similar to Ref [1], as explained
in Chapter 26 Just as a reminder the primitives include joining, augmenting, merging, and pruning solutions In the followup work, P – TreeAand P – TreeATmodes are referred to as one-dimensional (1D) and two-dimensional (2D) modes because the algorithm optimizes only one parameter in the 1D mode, and simultaneously optimizes two parameters in the 2D mode Details about computing
S and Sbsets can be found in Ref [8] As a side note, computation of the set S is done by performing
four sweeps of the routing grid (once in each direction) This step is very efficient under assumption that the routing target is a Hanan grid, but it cannot handle obstacles
Once the set S is computed at the top level S (vd, 1, n − 1), the actual topology embeddings can
be obtained by backtracking through the data structure containing these solution signatures
In Ref [7], the original algorithm is extended to perform simultaneous route construction and buffer insertion The general flow of the algorithm is almost identical to Ref [8], except that each internal vertex in the topology is also considered as a buffer insertion candidate location However, some adjustments had to be made to the solution signature and, because of that, to all primitives that operate on them as well Because inserting a buffer decouples downstream load, and effectively resets the load visible from the parent vertex to the input capacitance of the inserted buffer, the load value cannot be used for cost estimation (but remains necessary for delay estimation) Thus, a third parameter that represents solution cost (e.g., area, power, congestion, or any composite function of
those) is added to the solutions signature The solution signature now becomes a triple (p, c, q) More
details about the three-dimensional (3D) (simultaneously optimizing three parameters) version of the primitives can be found in Ref [12] Also, methods from Ref [12] can be applied here to extend the algorithm to support multiple buffers and inverters in the library
Although the algorithm’s computational complexity is O (n5) in the 1D mode, and
pseudopoly-nomially bounded in the 2D mode, it produces solutions of very high quality in terms of both cost and delay The algorithm may not be suitable for optimizing every single net in the design, but using this approach to optimize a smaller number of highly critical nets can be very beneficial, especially when one considers that the algorithm produces a family of solutions with delay/cost trade-off giving more choices in the overall design optimization process Experimental results support these claims
A detailed description of the algorithm and its extensions that were summarized above can be also found in Ref [17]
27.3.2 S-TREE ALGORITHM
The work in Ref [11] adopts the overall philosophy of P-Tree while improving runtime and scalabil-ity, introducing a general strategy (stitching) for incorporating multiple measures of locality between
Trang 4sinks and subtrees (e.g., temporal, polarity), and finally adopting a general graph model as the routing target enabling natural solutions for obstacles, multilayer routing, etc
Although the P-Tree topology space is very large and has a very good ability to capture spatial sink locality, it may create solutions of higher cost in certain scenarios that include, for example, highly critical sinks physically located among noncritical sinks To address this issue, the S-Tree defines its topology search set using a topology tree and a sink partition Sink partition specifies which sinks can break the original topology tree and be stitched to any other part of the topology while maintaining the order within its own partition set The topology set is then composed of all valid stitching trees (thus, giving the name S-Tree) In Figure 27.9, given the topology on the left and
a sink partition{{A, C, D, F}, {B, E}}, one can find all possible topologies that satisfy the stitching
requirement Assuming that sinksBandEare critical, note how they are allowed to break away from noncritical sinks and climb toward the root The fact that these topologies are in the solution space guarantees only that they will be explored but whether they will be chosen at the end depends on the cost and quality of the embedded trees
In the extreme cases, if all sinks belong to a single partition, S-Tree is equivalent to a single topology tree embedding The other extreme case is if each sink belongs to its own partition, which explores all possible tree topologies in a way similar to Ref [9]
The initial topology is obtained as in P-Tree, using the first two steps only: construct an MST and convert it to a topology tree by reorientation and binarization The sink partitions can be determined based on estimated timing criticality, input signal polarity requirements, or any other criteria Once the topology set is defined, the algorithm proceeds with the dynamic programming algorithm that performs topology set embedding In the embedding process, no sink (nor a group of sinks) receives any special treatment
Modifications to the topology embedding algorithm include support of a general graph model for a routing target Propagation of candidate solutions through the routing graph is performed using timing-driven maze routing approach with simultaneous buffer insertion as in Ref [14] Note that solution quality depends a lot on the routing target graph Careful graph construction is a key to
A
S
E B A
C D
S
E
F B
S
E
F A
C
S
B
S
B C
E
FIGURE 27.9 S-Tree topology space example The initial topology tree on the left and a sink partition
{{A, C, D, F}, {B, E}} define a set of five different topologies that will be explored simultaneously.
Trang 5Optimization modes include all three modes seen in the P-Tree algorithm (1D, 2D, and 3D) Experimental results demonstrate the effectiveness of the approach in identifying solutions with very good timing and low cost Also, the algorithm has very good predictability characteristics
27.3.3 SP-TREEALGORITHM
The SP-Tree algorithm [10], as the name suggests (the stitching permutation constrained trees), combines the topology tree spaces of P-Tree and S-Tree After obtaining the sink permutation and constructing a set of topology trees, sink partitions are specified and the entire set of topology trees
is expanded by the stitching operation, producing a new larger set of tree topologies The SP-Tree topology set contains both the P-Tree and the S-Tree topologies in addition to some new topologies that are not contained in either P-Tree or S-Tree
The topology set embedding algorithm is identical to that of the S-Tree with the main difference
in the construction of the topology tree set that is provided as an input to the embedder More details about both algorithms can be found in Ref [18], while executable solvers and examples can be found
at Ref [19]
Algorithms like buffered P-Tree, S-Tree, and SP-Tree, based on a general graph model as a routing target, provide high-quality solutions and have large flexibility with different optimization objectives As expected, experimental results demonstrate that SP-Tree flow produces solutions that are always better or equal to those of both S-Tree and P-Tree, usually with a smaller cost, but at a runtime penalty
27.3.4 COMPLETE TREE TOPOLOGY EXPLORATION
For the experimental purposes, one can construct a tree topology set that contains all possible tree topologies This is similar to topology decomposition from Ref [20] One can achieve that either
by using the S-Tree and requesting that each sink belongs to its own partition, or by using a specific approach as in Ref [9] Embedded routing trees obtained in this fashion are indeed optimal, but the computational complexity prevents any practical applications, although they are valuable for research purposes and evaluation of other heuristic approaches
REFERENCES
1 L P P P van Ginneken, Buffer placement in distributed RC-Tree networks for minimal elmore delay, in
Proceedings of the IEEE International Symposium on Circuits and Systems, New Orleans, LA, May 1990,
pp 865–868
2 W C Elmore, The transient response of damped linear networks with particular regard to wideband
amplifiers, Journal of Applied Physics, 19: 55–63, Jan 1948.
3 C J Alpert, G Gandham, M Hrki´c, J Hu, A B Kahng, J Lillis, B Liu, S T Quay, S S Sapatnekar, and
A J Sullivan, Buffered Steiner trees for difficult instances, IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 21(1): 3–14, Jan 2002.
4 C J Alpert, G Gandham, M Hrki´c, J Hu, and S T Quay, Porosity aware buffered Steiner tree
construction, in Proceedings of the ACM International Symposium on Physical Design, Monterey, CA,
Apr 2003, pp 158–165
5 C J Alpert, M Hrki´c, J Hu, and S T Quay, Fast and flexible buffer trees that navigate the physical
layout environment, in Proceedings of the 41st Design Automation Conference, San Diego, CA, Jun 2004,
pp 24–29
6 C Bartoschek, S Held, D Rautenbach, and J Vygen, Efficient generation of short and fast repeater tree
topologies, in Proceedings of the ACM International Symposium on Physical Design, San Jose, CA, Apr.
2006, pp 120–127
Trang 67 J Lillis, C K Cheng, and T T Y Lin, Simultaneous routing and buffer insertion for high performance
interconnect, in Proceedings of the 6th IEEE Great Lakes Symposium on VLSI, Ames, IA, Mar 1996,
pp 148–153
8 J Lillis, C K Cheng, T T Y Lin, and C Y Ho, New performance driven routing techniques with
explicit area/delay tradeoff and simultaneous wire sizing, in Proceedings of the 33rd Design Automation Conference, Las Vegas, NV, Jun 1996, pp 395–400.
9 J Cong and X Yuan, Routing tree construction under fixed buffer locations, in Proceedings of the 37th Design Automation Conference, Los Angeles, CA, Jun 2000, pp 379–384.
10 M Hrki´c and J Lillis, Buffer tree synthesis with consideration of temporal locality, sink polarity
require-ments, solution cost and blockages, in Proceedings of the ACM International Symposium on Physical Design, Del Mar, CA, Apr 2002, pp 98–103.
11 M Hrki´c and J Lillis, S-Tree: A technique for buffered routing tree synthesis, in Proceedings of the 39th Design Automation Conference, New Orleans, LA, Jun 2002, pp 578–583.
12 J Lillis, C K Cheng, and T T Y Lin, Optimal wire sizing and buffer insertion for low power and a
generalized delay model, IEEE Journal of Solid-State Circuits, 31(3): 437–447, Mar 1996.
13 C J Alpert, T C Hu, J H Huang, A B Kahng, and D Karger, Prim-Dijkstra tradeoffs for improved
performance-driven routing tree design, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(7): 890–898, Jul 1995.
14 S W Hur, A Jagannathan, and J Lillis, Timing-driven maze routing, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(2): 234–241, Feb 2000.
15 A Jagannathan, S -W Hur, and J Lillis, A fast algorithm for context-aware buffer insertion, in Proceedings
of the 37th Design Automation Conference, Los Angeles, CA, Jun 2000, pp 368–373.
16 H Zhou, D F Wong, I M Liu, and A Aziz, Simultaneous routing and buffer insertion with restrictions on
buffer locations, in Proceedings of the 36th Design Automation Conference, New Orleans, LA, Jun 1999,
pp 96–99
17 J Lillis, Algorithms for Performance Driven Design of Integrated Circuits, 1996 Available at http://www.cs.uic.edu/˜jlillis/papers/thesis.ps
18 M Hrki´c and J Lillis, Buffer tree synthesis with consideration of temporal locality, sink polarity
require-ments, solution cost, congestion and blockages, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(4): 481–491, Apr 2003.
19 M Hrki´c and J Lillis, GSRC Single Interconnect Tree Synthesis Web Page Available at http://eda.cs uic.edu/software/interconnect/gsrc.html
20 S E Dreyfus and R A Wagner, The Steiner problem in graphs, Networks, 1: 195–208, 1972.
Trang 828 Buffering in the Layout
Environment
Jiang Hu and Cliff C N Sze
CONTENTS
28.1 Introduction 569
28.2 Placement and Routing Blockages 569
28.3 Buffered Path with Blockage Avoidance 571
28.3.1 Dynamic Programming-Based Method 571
28.3.2 Graph-Based Approach 572
28.4 Buffered Tree with Blockage Avoidance 574
28.4.1 Tree Adjustment Technique 574
28.4.2 Simultaneous Tree Construction and Buffer Insertion 574
28.4.2.1 Dynamic Programming-Based Method 575
28.4.2.2 Graph-Based Technique 575
28.5 Layout Environment Aware Buffered Steiner Tree 577
28.5.1 Measurement of Placement and Routing Congestion 577
28.5.2 Plate-Based Tree Adjustment 578
28.5.2.1 Dynamic Programming-Based Adjustment 578
28.5.2.2 Hybrid Approach for Tree Adjustment 579
28.5.3 Layout Navigation 581
28.5.4 Relating Buffering Candidate Locations to Layout Environment 582
References 583
28.1 INTRODUCTION
Chapters 26 and 27 presented buffering algorithms where the buffering problem was isolated from the general problem of timing closure The main problem with this extraction is that there are no necessarily resources available to put the buffers or wires in their desired locations One way to manage this flow is to put the buffers in their ideal locations and allow a legalization procedure to move them to actual locations The problem with this approach is that buffers may be moved quite far from their ideal locations, which could completely corrupt the quality of the solution It is much better to place the buffers in regions where there appear to be sufficient space, so that legalization would move the buffers by at most 10–20 routing tracks, which would preserve the original solution
To do this, one must certainly take into account the blockages and preferably local placement and routing congestion In this chapter, we explore techniques that consider these factors
28.2 PLACEMENT AND ROUTING BLOCKAGES
In realistic chip designs, some regions may be occupied by IP blocks, memory arrays, and macros Such regions allow wires to pass through but have no room for buffer insertion Therefore, buffer insertion has to be performed with consideration of these buffer blockages If a wire path has large
569
Trang 9(a) (b) (c)
FIGURE 28.1 (a) The wire path has large overlap with the buffer blockage, which is the shaded region, and
there is no feasible buffering solution (b) Rerouting the wire to completely avoid the blockage may result in large wire detour (c) Ideally, the wire path should largely avoid the blockage with limited detour
overlap with blockages as in Figure 28.1a, no feasible buffering solution can be found However, avoiding buffer blockages completely as Figure 28.1b may cause unnecessary wire detour as well
as delay degradation Hence, it is important to have algorithmic techniques that can find a proper routing topology (as Figure 28.1c) together with the buffer insertion solution Another more intuitive example was previously described in Figure 27.4 To find the best buffered interconnect solution, the routing must consider feasible buffer insertion locations
A similar problem is buffer insertion considering placement and routing congestions If a buffer is inserted in a crowded place as in Figure 28.2a, it might be moved far away later during cell placement legalization Thus, such insertion is not favorable although it is feasible, i.e., buffers are preferred
to be inserted in relatively sparse regions like Figure 28.2b Similarly, buffer insertion in a routing congested region may intensify the wire routability problem Hence, buffer insertion algorithms need
to be aware of layout environment and be able to handle the trade off between timing performance and congestion avoidance
Section 28.3 introduces algorithms on blockage avoidance for two-pin nets, i.e., buffered paths Blockage buffered Steiner tree methods for multipin nets are described in Section 28.4 In Section 28.5, techniques for handling congestions are discussed
FIGURE 28.2 (a) If a buffer is placed in a crowded region, it may be moved by placement legalizer far away
from its preferred location (b) If the buffer is placed in a sparse region, its location will not be significantly changed by placement legalization
Trang 1028.3 BUFFERED PATH WITH BLOCKAGE AVOIDANCE
For two-pin nets, the problem is to find a buffered path with the minimum delay under the blockage constraints In the literature, there exists two major approaches: (1) dynamic programming and (2) graph-based algorithm Both are based on the Elmore delay model It has not been showed that these approaches are fast enough to be practical for general optimization cases, but they can be very useful to a handful of the most critical nets
28.3.1 DYNAMIC PROGRAMMING-BASED METHOD
The dynamic programming-based method [1,2] propagates partial solutions from the sink node t through a routing graph G = (V, E) and picks the optimal solution at the source node s The routing
graph can be either a uniform grid reflecting routing tracks (Figure 28.3a) or an extended Hanan grid, which is obtained by drawing vertical and horizontal lines through the given pins and blockage boundaries (Figure 28.3b) For each edge(u, v) ∈ E, R(u, v) and C(u, v) are the edge resistance and capacitance, respectively For each node v ∈ V, there is a label p(v) ∈ {0, 1} which is equal to 0
if it overlaps with a buffer blockage and equal to 1, otherwise Besides the routing graph, the driver
resistance Rd, sink capacitance C t , and a buffer library B are assumed to be given Each buffer type
b ∈ B is modeled by its input capacitance C(b), intrinsic delay K(b), and output resistance R(b).
A partial solution at a node v is characterized by a quadruple α = (c, d, m, v), where c is the current input capacitance seen at v, d is the delay from v to the sink t, and m is a labeling for the buffered path from v to t The label of m (v) = b indicates that buffer b ∈ B is inserted at node v and m (v) = 0 implies that no buffer is inserted there The solution α1= (c1, d1, m1, v ) is inferior to
solutionα2 = (c2, d2, m2, v ) if c1≥ c2and d1≥ d2
The partial solutions are maintained in a priority queue Q initialized with the solution (C t , 0, 0, t ) at the sink t Each time, the top solution in Q, which has the minimum delay, is extracted for expansion A solution (c, d, m, u) is expanded to its neighbor node v if there is an
edge(u, v) ∈ E The expanded solution is (c + C(u, v), d + R(u, v) · (c + C(u, v)/2), m, v) where the delay increase is based on the Elmore delay model If a solution (c, d, m, v) is at node v where
m (v) = 0 and p(v) = 1, buffers of each type are inserted there to generate new partial solutions If the buffer type is b ∈ B, its corresponding buffered solution is (C(b), d + R(b) · c + K(b), m, v)
s t
FIGURE 28.3 Routing graph.