interplay between Manhattan and diagonal geometries. To modify partitioning to support X, we used two-dimensional, k-way partitioning in place of one-dimensional bipartitioning.
To exploit the benefits of the partitioning idea within an X system, we use a modified simulated annealing strategy to partition the components into an n × m grid, where n and m are both greater than 1. This approach was used by Suaris and Kedem [SK89], where n = m = 2, and by Bapat and Cohoon [BC93], Alexander et al. [ACGR98], and Ganley [G95], where n = m = 3. We refined and improved this approach for n = m = 4 to create the first placer for the X interconnect architecture.
The key principle behind these algorithms is to consider an approximate routing of each net, where the cell positions are rounded to the centers of the partitions in the n × m grid. The partition of a given net can then be considered to form a bit vector of length nm, in which each bit is on if a component on the net lies in the corresponding partition. The numeric value of this bit vector can then be used to index a table giving the score of that particular configuration of the net. This table is precomputed once and stored, so there is no recurring runtime cost for the computation of its values. A single, canonical value for the score can be stored and simply scaled (if necessary) to the actual size of the current grid.
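The table lookup described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the stored score is a simple bounding-box stand-in, whereas the chapter's table holds optimal octilinear Steiner tree lengths.

```python
def net_bit_vector(slots, m):
    """Encode the set of (row, col) partitions touched by a net as an integer."""
    bits = 0
    for row, col in slots:
        bits |= 1 << (row * m + col)
    return bits


def bounding_box_score(occupied):
    """Stand-in score (assumption): half-perimeter of the occupied partitions.
    The chapter stores optimal octilinear Steiner tree lengths instead."""
    if not occupied:
        return 0
    rows = [r for r, _ in occupied]
    cols = [c for _, c in occupied]
    return (max(rows) - min(rows)) + (max(cols) - min(cols))


def build_table(n, m, score):
    """Precompute one score per configuration: 2**(n*m) entries in total."""
    table = []
    for bits in range(1 << (n * m)):
        occupied = [(i // m, i % m) for i in range(n * m) if bits >> i & 1]
        table.append(score(occupied))
    return table


# Built once; every later evaluation of a net is a single list index.
TABLE = build_table(4, 4, bounding_box_score)
score = TABLE[net_bit_vector([(0, 0), (3, 3)], 4)]
```

Swapping `bounding_box_score` for another scoring function changes the measure being optimized without touching the lookup code.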
The default measure used by all of the placement algorithms cited above, including ours, is the length of an optimal (in our case, octilinear) Steiner tree of the points in the n × m grid. Although Suaris and Kedem [SK89] and Huang and Kahng [HK97] report that KLFM-style partitioning works well for n = m = 2, it turns out to perform quite poorly for n, m ≥ 3. The terrain of the optimization objective becomes too rough, and KLFM-style, local optimization algorithms become trapped in deep local optima that are globally poor. Our algorithm instead uses a sophisticated, multiobjective variant of simulated annealing; although this is computationally expensive, it produces substantially higher-quality solutions than KLFM or any of several other, simpler heuristics that we tried. The other objectives, aside from total Steiner tree length, enforce that both the components in each partition and the (approximate) routing congestion will fit in the partition. The increased number of partitions, and the fact that their sizes cannot be adjusted to match a particular partition, makes the balance problem harder to solve in this context as well. In particular, balancing the 16 slots alone often leads to overfilled rows or columns in the 4 × 4 grid. Additional terms to enforce the balance of each row and column are added, resulting in the overall objective function for a particular partition p:
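$$f(p) = \sum_{\text{nets } n} \mathrm{len}(n) + \alpha \sum_{\text{slots } s} \mathrm{bal}(s) + \beta \sum_{\text{rows } r} \mathrm{bal}(r) + \gamma \sum_{\text{columns } c} \mathrm{bal}(c)$$

$$\mathrm{bal}(s) = \max\{\mathrm{size}(s) - \mathrm{cap}(s),\ 0\}^{2}$$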
The capacity cap(s) of a slot s is calculated simply by evenly distributing the total size of the cells across the available space in the slot. The rows and columns are the four rows and four columns of slots in the 4 × 4 grid. Some care must be taken in optimizing this multiobjective measure, especially because the balance terms are highly correlated with one another.
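As a rough sketch of how this objective might be evaluated, the following computes f(p) from precomputed net lengths and per-slot sizes and capacities. It assumes that the size and capacity of a row or column are the sums over its slots, which the text does not state explicitly; the function names and weights are illustrative only.

```python
def bal(size, cap):
    """Quadratic penalty for overfilling; zero when the contents fit."""
    return max(size - cap, 0.0) ** 2


def objective(net_lengths, slot_size, slot_cap, alpha, beta, gamma):
    """f(p) = sum len(n) + alpha*sum bal(s) + beta*sum bal(r) + gamma*sum bal(c)."""
    n, m = len(slot_size), len(slot_size[0])
    length_term = sum(net_lengths)
    slot_term = sum(bal(slot_size[r][c], slot_cap[r][c])
                    for r in range(n) for c in range(m))
    # Assumption: a row's (column's) size and capacity are sums over its slots.
    row_term = sum(bal(sum(slot_size[r]), sum(slot_cap[r])) for r in range(n))
    col_term = sum(bal(sum(slot_size[r][c] for r in range(n)),
                       sum(slot_cap[r][c] for r in range(n)))
                   for c in range(m))
    return length_term + alpha * slot_term + beta * row_term + gamma * col_term


# One overfilled slot in a 4 x 4 grid also overfills its row and its column.
sizes = [[10.0] * 4 for _ in range(4)]
sizes[0][0] = 13.0
caps = [[10.0] * 4 for _ in range(4)]
value = objective([5.0, 7.0], sizes, caps, alpha=1.0, beta=2.0, gamma=3.0)
```

The example shows why the balance terms are correlated: a single overfull slot is penalized three times, once in each of the slot, row, and column sums.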
The major drawback of simulated annealing is its high running time. We alleviate this problem somewhat by using multilevel techniques [CCY03,YWES00]. The netlist is recursively clustered so that the clusters are (heuristically) loosely connected to one another and approximately the same size. The clustering recursion stops when there are a few hundred clusters. That clustered netlist is then partitioned in the 4 × 4 grid using annealing. One level of clusters is then resolved into its subclusters, leaving each subcluster in the same partition as its parent cluster. Annealing measures the temperature of the revised solution (in the higher-resolution optimization space) and then continues to improve it. This process is repeated until the bottom level of clustering (that is, the original netlist) is reached. Note that in this process, it is critical that the annealer accurately measure the temperature of the starting partitions; too high a temperature will destroy the quality of the partitioning solution found so far, and too low a temperature will restrict the amount of further improvement that the annealer can make.
Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C040 Finals Page 843 29-9-2008 #10
X Architecture Place and Route: Physical Design for the X Interconnect Architecture 843
Once a partitioning solution is found for a particular grid, the same technique is applied recursively to each of the 16 slots. This process repeats until there are few enough cells in a partition that a solution can be found by branch-and-bound, without the use of partitioning. The entire algorithm is as follows:
Place(Netlist N, Grid G): {
    Recursively cluster N (= N0) to form netlists N1, N2, ..., Nc
    Randomly partition Nc into G
    Anneal to improve the partition of Nc in G
    Repeat for i = c down to 1: {
        Break the clusters that form Ni, producing netlist Ni−1
        Measure the temperature of Ni−1's partition in G
        Continue annealing to improve the solution further
    }
}
This technique is completely data-driven; although the default measure stored in the table is the optimal Steiner tree length, by simply swapping in a different table the algorithm can optimize different distance metrics or different measures such as low-diameter trees or cross-cut congestion. The increased resolution of the partition matters fundamentally; just as Ganley [G95] demonstrated the superiority of a 3 × 3 partition over a 2 × 2 partition, our own work has demonstrated that the 4 × 4 partition is superior still to the 3 × 3 partition. For future work, storing the table for a 5 × 5 partition is probably still within reach on current hardware. It could certainly be accomplished by storing only one of each set of eight symmetric configurations, though this presents an algorithmic challenge in being able to look them up sufficiently quickly; after all, this is by far the most-executed operation in the algorithm.
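To make the symmetry idea concrete, here is a small sketch of reducing a configuration to a canonical representative under the eight symmetries of the square (four rotations, each with and without a mirror flip). The brute-force search below is an assumption for illustration, not a method from the chapter; its per-lookup cost of eight permutations is exactly the kind of overhead the text warns about.

```python
def symmetry_maps(n):
    """Return the 8 cell permutations of an n x n grid (rotations and flips)."""
    maps = []
    for flip in (False, True):
        for quarter_turns in range(4):
            perm = []
            for r in range(n):
                for c in range(n):
                    rr, cc = r, (n - 1 - c if flip else c)
                    for _ in range(quarter_turns):
                        rr, cc = cc, n - 1 - rr   # rotate by 90 degrees
                    perm.append(rr * n + cc)
            maps.append(perm)
    return maps


def canonical(bits, maps):
    """Smallest bit vector among the symmetric images of a configuration."""
    best = bits
    for perm in maps:
        image = 0
        for src, dst in enumerate(perm):
            if bits >> src & 1:
                image |= 1 << dst
        best = min(best, image)
    return best


maps4 = symmetry_maps(4)
# All four corner-only configurations of the 4 x 4 grid share one table entry.
corner_keys = {canonical(1 << i, maps4) for i in (0, 3, 12, 15)}
```

A table keyed by `canonical(bits, maps)` needs roughly one eighth of the entries, at the price of the extra work per lookup.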
To achieve the best overall results with X, the layout system must include a placement strategy that optimizes layout in an X-aware way. Accomplishing this is in many ways more difficult than in the Manhattan realm and presents challenges that require approaches different from those used historically for Manhattan placement. A few solutions to those challenges are presented here, but doubtless there is much improvement yet to be made.
40.9 X GLOBAL ROUTING
Most physical design tools start the routing process with a global router (GR), which creates a plan (that is, a set of corridors) for the detailed router to follow for each net. The basic objectives of GR are to minimize wirelength and to minimize the worst congestion, measured as (wires planned/estimated capacity) at boundaries between regions called Gcells.
In the X interconnect architecture, another application for GR is in determining pin placement on macros during floorplanning. Without pin assignment of the quality that an X-aware GR can provide, final X routing at both the top level and within blocks would suffer.
Global routing for the X interconnect architecture should be sensitive to problem details for which the change from a rectilinear to an octilinear distance metric makes a difference. For example, the set of pin locations of a net can be augmented up front with auxiliary Steiner points to steer the routing toward an optimal topology. Such points should be derived using octilinear Steiner tree algorithms as discussed in Section 40.11.
Another example arises in the computation of wirelength lower bounds, as in estimating distance-based future cost used to evaluate intermediate nodes in search algorithms. Given an already derived target set of wiring (e.g., wiring T in Figure 40.1), consider the subproblem of routing to it from a new point, P. A fast approximation of distance to the target set is the distance L1 to a minimal bounding box, B1, of that set. Checking the distance L2 to a second minimal bounding box B2, with sides rotated (for example, by 45° with respect to the first), sometimes gives a much more useful (larger) lower bound. Although this technique springs naturally from consideration of the octilinear distance metric, it is applicable to rectilinear wiring problems as well.
FIGURE 40.1 Use of two bounding boxes in estimating distance.
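The two-box bound can be sketched as follows, under the assumption that the target wiring is given as a list of points and that distance is octilinear. The function names are hypothetical. Because the octilinear metric is unchanged by a 45° rotation, the same box-distance routine works in both coordinate frames.

```python
import math

SQRT2 = math.sqrt(2.0)


def octilinear(dx, dy):
    """Shortest length using horizontal, vertical, and 45-degree segments."""
    dx, dy = abs(dx), abs(dy)
    return max(dx, dy) + (SQRT2 - 1.0) * min(dx, dy)


def gap(lo, hi, x):
    """Distance from coordinate x to the interval [lo, hi]; 0 if inside."""
    return max(lo - x, 0.0, x - hi)


def box_distance(points, p):
    """Octilinear distance from p to the minimal axis-aligned box of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return octilinear(gap(min(xs), max(xs), p[0]), gap(min(ys), max(ys), p[1]))


def rotate45(pt):
    """Express a point in axes rotated by 45 degrees (lengths preserved)."""
    x, y = pt
    return ((x + y) / SQRT2, (y - x) / SQRT2)


def lower_bound(points, p):
    """Larger of the distances L1 (box B1) and L2 (rotated box B2)."""
    l1 = box_distance(points, p)
    l2 = box_distance([rotate45(q) for q in points], rotate45(p))
    return max(l1, l2)


# A diagonal wire from (0, 0) to (10, 10) and a new point at (10, 0):
# P lies inside B1, so L1 is 0, while the rotated box gives a nonzero bound.
target = [(0.0, 0.0), (10.0, 10.0)]
bound = lower_bound(target, (10.0, 0.0))
```

In this example the rotated box happens to give the exact distance to the wire; in general both values are only lower bounds.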
The largest impact of the X interconnect architecture is that it requires fundamental rethinking of the GR problem representation. At the core of any approach is the scheme chosen to model global routes as connections between Gcells, at the boundaries of which congestion will be evaluated.
In a search algorithm, the model prescribes a routing graph: the basic nodes in the search space, and the available moves between nodes (edges). It is preferable to use three dimensions of routing nodes (x, y, layer) to allow accurate assessment of costs associated with vias, especially in areas where particular layer transitions are unavailable. For octilinear routing, octagonal Gcells might seem appropriate, but they are unworkable because octagons do not tile the plane. One could consider rotating the Gcell grids on diagonal layers 45° with respect to those on Manhattan layers, but mismatched shapes complicate the modeling of layer transitions. We began with a uniform grid of square Gcells on all layers.
Planar moves are exclusively in the preferred routing direction for the given layer. But how should moves on diagonal layers be modeled? A naive approach, providing diagonal moves between Gcells that touch at their corners, would introduce two problems:
1. Routes using only diagonal layers would fall unnaturally into two disjoint sets. As with opposite-colored bishops in chess, there is no purely diagonal path between black and white squares: for example, between two Manhattan neighbors.
2. Because diagonally adjacent Gcells touch only at their corners, it is unclear where the congestion impact of a move between them would be assessed. Any detailed connection between such neighbors must also traverse one of their mutual Manhattan neighbors, but which one is ambiguous.
FIGURE 40.2 Gcells with routing nodes.
To address these issues, we have instead used a graph with higher-resolution moves. Each Gcell is divided into four quadrants, which serve as routing graph nodes. The Gcells with nodes are illustrated in Figure 40.2. On Manhattan layers, this doubles the steps needed to cover a given distance: half the moves are internal to a Gcell and do not pay a congestion cost.
Routes on 45° layers visit southeast (SE) and northwest (NW) quadrants alternately; routes on 135° layers visit northeast (NE) and southwest (SW) quadrants alternately. The connections for both types of diagonal layers are illustrated in Figure 40.3. Every diagonal move crosses a known Gcell boundary. Moreover, congestion is sampled at the same places (Gcell boundaries) on every layer, as shown in Figure 40.4. Direct moves between Gcells that touch at their corners, which suffer from ambiguous congestion effects, are eliminated.
FIGURE 40.3 Alternate routing nodes on diagonal layers. The left-side illustration shows moves between quadrants in the 45° direction. The right-side illustration shows moves between quadrants in the 135° direction.
FIGURE 40.4 Congestion is measured on every layer only at Gcell boundaries.
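One way to picture the quadrant graph is with a fine grid in which each Gcell occupies a 2 × 2 block of nodes. The encoding below is an assumption made for illustration (the chapter does not give a coordinate scheme), and the function names are hypothetical. Under this encoding a diagonal step always lands in a different Gcell that shares a full edge with the starting one, so the crossed boundary is unambiguous.

```python
QUADRANT = {(0, 0): "SW", (1, 0): "SE", (0, 1): "NW", (1, 1): "NE"}


def describe(fx, fy):
    """Map a fine-grid node to (Gcell x, Gcell y, quadrant name)."""
    return fx // 2, fy // 2, QUADRANT[(fx % 2, fy % 2)]


def allowed_on_layer(fx, fy, direction):
    """45-degree layers use SE/NW nodes; 135-degree layers use NE/SW nodes."""
    is_se_or_nw = (fx + fy) % 2 == 1
    return is_se_or_nw if direction == 45 else not is_se_or_nw


def diagonal_step(fx, fy, direction, sign):
    """Move one quadrant along the layer's diagonal; sign is +1 or -1."""
    if not allowed_on_layer(fx, fy, direction):
        raise ValueError("node is not part of this layer's routing graph")
    dx = sign if direction == 45 else -sign
    return fx + dx, fy + sign


# From the SE quadrant of Gcell (0, 0), a 45-degree step crosses the east
# boundary into the NW quadrant of Gcell (1, 0).
first = describe(*diagonal_step(1, 0, 45, +1))
# The next step crosses the north boundary into the SE quadrant of Gcell (1, 1).
second = describe(*diagonal_step(2, 1, 45, +1))
```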
Because in this model the different diagonal layers use disjoint sets of (x, y) nodes, special zig direction changes are introduced inside Gcells, providing, for example, for direct moves between a SE quadrant on a 45° layer (e.g., node A in Figure 40.3) and a NE quadrant on an adjacent 135° layer (e.g., node B in Figure 40.3).
The presence of Gcell-internal moves at which planar congestion is not assessed is unusual. Care must be taken to prevent search algorithms from abusing internal moves: for example, if NE moves across northern Gcell boundaries could be alternated with NW sequences inside Gcells, long due-north connections could be constructed in which only 45°-layer boundaries were crossed. For this reason, we allow at most one zig in a Gcell, because two successive zig moves (e.g., SE(45) to NE(135) and NE(135) to NW(45)) could form a NW sequence without incurring 135°-layer congestion.
Zig moves are awkward, and the high-resolution move grid incurs a substantial runtime penalty. We are investigating less cumbersome models. Still, the model described here has proven quite practical.
The model provides the substrate for any routing algorithm chosen. Although the most popular algorithms for global routing use a rip-up-and-reroute approach, we were attracted to the multicommodity flow formulation as described by Albrecht ([A01], also in Chapter 32) for the provably optimal properties that its theoretical framework offers and because its use of multiple rounding phases reduces its dependence on routing order. This approach builds on an algorithm of Garg and Könemann [GK98] and insightful theoretical work by Fleischer [F99].
Ref. [A01] formulates a mixed integer program for GR and a linear programming relaxation that allows fractional global routes, and describes cost functions in terms of edge congestion and net length with respect to which minimal Steiner trees are found for all nets in one phase and for
prescribed subsets of nets in later phases. The average of solutions from the different phases is a solution to the fractional GR; selection of one route for each net from a random phase gives the GR solution. Two key strong points of this approach are that it uses a very effective exponential congestion cost function and that it provides multiple alternative routes per net.
Our work convinced us that multicommodity flow approaches to GR are more attractive than previously reported. We have found modifications of the cost function and new uses for the alternative routes, some described below, that dramatically speed up the algorithm and improve the quality of its GR solutions. A major motivation was our desire to provide tighter guidance to the detailed router by using smaller Gcells: on the order of 10–20 tracks wide versus 50–100 in Ref. [A01]. Taken together with our interest in modeling five or more layers instead of two and a fourfold increase in nodes per layer to support the routing graph discussed above, the need for performance improvement is clear.
Runtime is reduced when wirelength is considered, because length contributions to cost help rein in the very broad expansion characteristic of Dijkstra-style search when minimizing congestion cost alone. Curiously, Ref. [A01] recommends initializing the variables y_e (for congestion on each edge) and y_L (for total wirelength) so that the initial contribution to congestion cost from any move dominates its contribution to length cost by a very large factor, L/c(e). (L is the total wirelength of the design, and c(e) is the capacity of Gcell edge e.) If, instead, y_e and y_L are initialized to an identical value (to put length and congestion costs on an even footing), excellent congestion is still achieved with much more reasonable runtime.
Normalized to its initial value, the recommended congestion cost y_e of using an edge e during any search for a minimal Steiner tree is exp[εU(e)/c(e)], where ε is an experimental constant, and U(e) is the total capacity already used by routes passing through edge e. This cost is backward-looking in that it accounts only for earlier routes. A powerful and novel refinement is to charge a forward-looking cost equal to the increase to y_e that would result if the route being considered took the given edge, namely exp[εU′(e)/c(e)] − exp[εU(e)/c(e)], where U′(e) − U(e) is the incremental usage involved. Without this refinement, the cost of increasing edge usage from 0 to 3, as for a wide wire, for example, would be the same (= exp(0)) whether the edge's capacity were 1 or 16.
Similarly, although a term like y_L can help optimize total wirelength, the associated cost is only linear in the length of a net during any search. Effective control of individual net lengths, as for timing-driven GR, requires a super-linear cost: for example, a term exponential in the ratio of the route length being produced to a desired length L_d. Any node n in a search can be associated with a length estimate L_f(n) = L_g(n) + L_h(n), where L_g(n) is the total search length along the best path found from the source to node n, and L_h(n) is a lower bound on the remaining distance to the target. L_f, rather than L_g, should be consulted when optimizing route length, using a cost to step from node A to node B such as
    exp(εL_f(B)/L_d) − exp(εL_f(A)/L_d),
which penalizes detouring moves but not those that move toward the target.
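The three cost expressions above translate directly into code. The sketch below uses hypothetical function and parameter names and an arbitrary ε = 1; it is meant only to show how the forward-looking term separates cases that the backward-looking term cannot.

```python
import math


def congestion_cost(used, capacity, eps):
    """Backward-looking cost exp(eps * U(e) / c(e)), normalized to 1 at U = 0."""
    return math.exp(eps * used / capacity)


def forward_congestion_cost(used, extra, capacity, eps):
    """Increase in the edge cost if the route under consideration adds `extra`."""
    return (math.exp(eps * (used + extra) / capacity)
            - math.exp(eps * used / capacity))


def length_step_cost(lf_a, lf_b, desired, eps):
    """Cost of stepping from node A to node B, given length estimates L_f."""
    return math.exp(eps * lf_b / desired) - math.exp(eps * lf_a / desired)


# A wire needing 3 units on an empty edge: the backward-looking cost is the
# same for both capacities, the forward-looking cost is not.
same = congestion_cost(0, 1, 1.0) == congestion_cost(0, 16, 1.0)
tight = forward_congestion_cost(0, 3, 1, 1.0)
roomy = forward_congestion_cost(0, 3, 16, 1.0)

# A step that leaves L_f unchanged (progress toward the target) is free;
# a detour that raises L_f from 10 to 12 is charged.
free_step = length_step_cost(10.0, 10.0, 20.0, 1.0)
detour = length_step_cost(10.0, 12.0, 20.0, 1.0)
```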
Assembling a GR by selecting a route for each net at random from the solutions of different phases (randomized rounding) is widely used and has advantages as a theoretical tool, but it is inappropriate to use only this technique to convert fractional to integer solutions in practice. Especially for problems with smaller Gcells, randomized rounding yields unreliable results: the tail of the congestion distribution is so long (i.e., has so much probability mass) that a small number of highly overcongested cells frequently results. A better GR can be obtained by applying heuristics to optimize the mix of solutions from different phases. Even the following simple greedy procedure is effective:
Input: Routes R[n][p], for nets n = 1, ..., N, from phases p = 1, ..., P
Output: Selections s[n], defining the route to use for each net n = 1, ..., N, namely R[n][s[n]]
Procedure:
(1) Set s[n] := 1 for n = 1, ..., N
(2) Embed route R[n][s[n]] for n = 1, ..., N, to compute total usages U(e) for edges e in the routing graph
(3) For i := 1 to k
(4) begin
(5)     For n := 1 to N
(6)     begin
(7)         Unembed the current route R[n][s[n]]
(8)         Set cost[p] := Σ_{e ∈ R[n][p]} exp(εU(e)/c(e)) for p = 1, ..., P
            (where each U(e) includes the embedding of R[n][p])
(9)         Set s[n] := the p for which cost[p] is minimum
(10)        Embed route R[n][s[n]]
(11)    end
(12) end
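The greedy procedure can be sketched in Python as follows. This is an illustration under simplifying assumptions that are not in the source: each route is a list of edge names, each route consumes one unit of capacity per edge it uses, phases are numbered from 0, and nets are visited in index order. Comments refer to the numbered steps of the procedure.

```python
import math
from collections import Counter


def greedy_select(routes, capacity, eps=1.0, iterations=2):
    """routes[n][p] lists the edges of net n's route from phase p."""
    selection = [0] * len(routes)                      # step (1)
    usage = Counter()
    for n, chosen in enumerate(selection):             # step (2)
        usage.update(routes[n][chosen])
    for _ in range(iterations):                        # step (3)
        for n, alternatives in enumerate(routes):      # step (5)
            usage.subtract(alternatives[selection[n]])  # step (7)
            costs = []
            for route in alternatives:                 # step (8)
                trial = Counter(route)
                # U(e) here includes the trial embedding of this route.
                costs.append(sum(
                    math.exp(eps * (usage[e] + trial[e]) / capacity[e])
                    for e in trial))
            selection[n] = min(range(len(costs)), key=costs.__getitem__)  # (9)
            usage.update(alternatives[selection[n]])   # step (10)
    return selection, usage


# Two nets compete for edge "a"; each has an alternative through edge "b".
routes = [[["a"], ["b"]], [["a"], ["b"]]]
capacity = {"a": 1, "b": 1}
selection, usage = greedy_select(routes, capacity)
```

In this toy instance both nets start on edge "a"; the first pass moves one of them to "b", after which no further change lowers the cost.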
The procedure admits numerous variations: for example, initial solutions other than phase 1 (line (1)), different numbers of iterations k (line (3)), different net orderings (line (5)), and different cost functions (line (8)). Of course, more general, nongreedy heuristics are also possible. To our knowledge, the problem of optimizing the mix of results from different phases has not yet been explored, even for applications of multicommodity flow outside of global routing, which is interesting because we have observed that it can yield substantially improved solution quality compared to randomized rounding.
The X interconnect architecture requires rethinking of the basic routing graph and also encourages a fresh look at several aspects of global routing algorithms. Exponential costs and new solution selection heuristics can be applied in the optimization of many criteria in a multicommodity flow approach. Although the theory of these techniques remains to be developed further, they have already proven effective in practice.
40.10 X DETAILED ROUTING
The most surprising thing about X is that the whole physical design system (not just the detailed router) must be rethought to realize its full benefit and that the full benefit goes far beyond what simple rerouting of a Manhattan design with a diagonal-aware detailed router can achieve. Perhaps the second most surprising thing about X is that the detailed router itself requires modifications, both to achieve runtimes competitive with Manhattan routers and to achieve the best possible results. The modifications can range from fairly conservative repairs of existing routing techniques all the way to radical reconceptualizations of the routing problem.
Ultimately, the complete abandonment of preferred directions (that is, liquid routing) promises the highest-quality routing, both for X and for Manhattan, with respect to wirelength, timing, and via counts. However, for reasons of both implementation simplicity and smoother adaptation of X technology (versus possible lithographic concerns with liquid routing), we opted to maintain preferred directions (except for local jogs), which still provides many of the advantages of the X interconnect architecture versus Manhattan.
X detailed routing requires changes throughout a detailed routing system, ranging from low-level functionality such as the geometry manipulation machinery up to high-level strategies for double-cut via insertion. In the following, we concentrate on two central areas of detailed routing in X: routing space modeling and path search with manufacturing constraints.
40.10.1 ROUTING SPACE MODEL AND SEARCH ALGORITHM
Today's (Manhattan) detailed routing systems vary somewhat in their specifics but are generally built on top of gridded, Dijkstra/A*/Lee maze router-based, rip-up-and-reroute strategies. For reasons of memory, runtime, and solution quality, grid-based routers are particularly common in the domain of both block-level and full-chip flat routing. Most connections are made with the minimum possible wire width and spacing: that is, at the minimum pitch at which routing segments can be placed adjacent to each other without violating design rule constraints. To model the routing space most straightforwardly, a three-dimensional rectangular grid where nodes have distance equal to the minimum pitch is a convenient and accurate data structure for representing dense packing of wires. For each layer, a preferred direction (horizontal or vertical) is given. All nodes and edges of the grid that are located on a straight line in the preferred direction are commonly referred to as a track. The router positions the majority of the wires onto tracks, and the remainder are connections between tracks (jogs). All such jogs and also all vias between the planes connect two neighboring grid nodes. Special
methods are used to deal with off-grid pins, wider-than-normal wires, and other geometries and constraints that cannot be fully modeled in the gridded approach. As described in Chapter 23, efficient search algorithms such as line search, maze routing, and, in some circumstances, track assignment methods and channel routing can be used for generating a routing on such a gridded representation.
Despite its near-ubiquity in Manhattan systems, the straightforward gridded representation is ill-suited to the X interconnect architecture. To see this, suppose the horizontal (or vertical) distance between adjacent nodes in the grid is P, as shown in Figure 40.5. Then, just adding diagonal edges between the nodes will restrict diagonal routing to diagonal pitches P′ that are multiples of P/√2. Because P′ > P, the minimum usable diagonal pitch would be √2 · P, but for current manufacturing technologies, P′ ≈ P holds. Thus, naively using the classical three-dimensional routing grid as a model for diagonal routing would effectively waste more than 30 percent of all available routing resources. The seemingly reasonable notion of simply rotating Manhattan routing resources on the upper layers by 45° is not a viable approach.
FIGURE 40.5 Gridded routing space rotation. Only every other diagonal track is usable, so Manhattan pitch P → diagonal pitch √2 · P.
A suitable model for X must provide efficient usage of the available routing space. This means that Manhattan and diagonal tracks must be available at the smallest permissible Manhattan or diagonal pitches, respectively, and grid nodes must exist at all intersections of tracks in two adjacent planes (to allow vias between these places). At the same time, the memory requirements should not be significantly higher, but this is impossible to achieve with a simple three-dimensional gridded model. Making the resolution finer to allow diagonal tracks in almost minimum pitch will necessarily bloat the grid size and also seriously affect the runtime of algorithms.
Extensions of interval-based representations as described in Ref. [H98] are much better suited for efficient detailed routing for X. Relying on the predominance of preferred-direction routing but permitting diagonals, interval-based methods model the routing space with arbitrarily high resolution in one direction per layer without impacting memory requirements.
The whole routing area is implicitly viewed as a gigantic three-dimensional grid, where the distance between two neighboring nodes is the manufacturing grid resolution M. Each plane is seen as a collection of lines with a preferred direction (horizontal, vertical, NE, or NW) within distance M, for Manhattan, or M√2, for diagonal, planes. The lines that represent desired routing tracks (according to the minimum pitch requirements) are stored as a set of intervals. Each interval represents a maximal consecutive set of nodes on the line with the same routability status. All intervals comprising a single line are kept in an appropriate tree structure to support fast query, split, merge, and update operations. Lines not representing routing tracks are not represented at all. More details about the technique can be found in Ref. [H98].
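A single track in such a representation might look like the sketch below. It is a hypothetical simplification, not the data structure of Ref. [H98]: a sorted Python list searched with `bisect` stands in for the tree, and the class name, method names, and status strings are all invented for illustration. The point it demonstrates is that storage grows with the number of status changes on the track, not with the track's length in manufacturing grid units.

```python
import bisect


class Track:
    """One routing track stored as maximal runs of equal routability status."""

    def __init__(self, length, status="free"):
        self.length = length      # number of manufacturing-grid nodes
        self.starts = [0]         # first node of each interval, sorted
        self.status = [status]    # status of each interval

    def _split(self, x):
        """Ensure an interval starts at x; return that interval's index."""
        if x >= self.length:
            return len(self.starts)
        i = bisect.bisect_right(self.starts, x) - 1
        if self.starts[i] == x:
            return i
        self.starts.insert(i + 1, x)
        self.status.insert(i + 1, self.status[i])
        return i + 1

    def set_status(self, lo, hi, status):
        """Mark nodes lo..hi-1, then merge neighbours with equal status."""
        i = self._split(lo)
        j = self._split(hi)
        self.starts[i:j] = [lo]
        self.status[i:j] = [status]
        k = 1
        while k < len(self.starts):
            if self.status[k] == self.status[k - 1]:
                del self.starts[k]
                del self.status[k]
            else:
                k += 1

    def query(self, x):
        """Status of node x."""
        return self.status[bisect.bisect_right(self.starts, x) - 1]

    def intervals(self):
        ends = self.starts[1:] + [self.length]
        return list(zip(self.starts, ends, self.status))


track = Track(1_000_000)
track.set_status(100, 200, "wire")
track.set_status(200, 300, "wire")   # merges with the previous wire interval
```

After both updates the million-node track is described by three intervals.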
Using this approach, tracks are modeled with manufacturing grid resolution, and at all (x, y) locations where tracks of adjacent routing layers intersect, there is a grid node on both tracks (implicitly represented by an interval on the track), as illustrated in Figure 40.6. Because the highest possible manufacturing resolution M is used for the track representation, the model preserves the flexibility of gridless routing within a superficially gridded data structure.
FIGURE 40.6 Efficient track-based routing space model. Dashed lines lie on manufacturing grid units. Solid lines represent Manhattan and diagonal routing tracks. Diagonal pitch matches almost fully the Manhattan pitch.
The memory consumption of such a data structure is determined by the complexity of the obstructions and wires in the routing layers. It does not depend on the actual grid resolution M. As the overwhelming majority of wires obey the preferred directions, this type of interval representation is very memory-efficient [SN02]. Moreover, there is no penalty for X versus Manhattan, either in terms of memory requirements or in terms of packing tracks at the minimum possible pitch.
The interval-based representation enables the efficient implementation of path search algorithms for X using extensions of the interval labeling approach in Ref. [H98]. The theoretical and practical complexity is just moderately higher than the complexity of the variant for Manhattan setups. The best possible X routing requires full octilinear wiring in the absence of preferred direction constraints: that is, the generation of paths that make use of diagonal and Manhattan directions on the same plane. Therefore, in planes where diagonal wiring is allowed, up to three labeling operations might happen between neighboring intervals on adjacent tracks versus one in the Manhattan case. Fortunately, the runtime of a search using interval labeling is mostly determined in practice by the number of labels used to traverse between adjacent planes. Because the number of potential via locations is the same for X and Manhattan wiring, although their positions are different, the overall runtime of an interval-based router is comparable for both interconnect architectures.
40.10.2 MANUFACTURING-CONSTRAINED ROUTING
Having an efficient routing space representation and fast search algorithms as previously described makes it possible to use known sequential routing methods combined with rip-up-and-reroute strategies to do the basic routing. At nanometer technology nodes, though, the fundamental routing representation, path search machinery, and rip-up heuristics are far from sufficient for creating manufacturable X designs.
Manufacturing constraints such as OPC require metal geometries to fulfill certain spacing or length requirements and to avoid certain geometric structures completely. The space of possible geometries that an X system can produce is far richer than those generated by a Manhattan system, enabling X to produce superior solutions. On the other hand, X requires much more elaborate constraint handling to avoid creating geometries that can be difficult to manufacture. Examples include:
• Acute angles: that is, a metal shape having two edges in a 45° outer angle. Such geometries occur when the routing process creates a path that makes a 45° or 315° bend.
• Short edge: that is, boundary edges of metal geometries with a length below a certain threshold. The length threshold may depend on the specific angle of the edge as well as on the angle between this edge and its neighboring edges at the corners.
• Minimum area: that is, a small connected piece of metal on a plane with total area below a certain threshold. Such geometries can occur if the routing process makes a very small jog on the plane between two vias.
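As a small illustration of the first constraint in the list, the sketch below flags acute bends along the centerline of an octilinear path. It interprets an acute bend as one where the two wire segments meeting at a vertex enclose an angle below 90° (45° for octilinear wiring); that interpretation, the point-list path format, and the function names are assumptions. Real design-rule checking operates on the metal shapes rather than on centerlines.

```python
def bend_is_acute(a, b, c):
    """True if the wire a -> b -> c encloses an angle below 90 degrees at b."""
    v1 = (a[0] - b[0], a[1] - b[1])   # from the bend back toward a
    v2 = (c[0] - b[0], c[1] - b[1])   # from the bend forward toward c
    # A positive dot product means the two segments are less than 90 degrees apart.
    return v1[0] * v2[0] + v1[1] * v2[1] > 0


def acute_bends(path):
    """Return the interior vertices of a path at which the bend is acute."""
    return [path[i] for i in range(1, len(path) - 1)
            if bend_is_acute(path[i - 1], path[i], path[i + 1])]


# East, then back toward the northwest: the segments enclose 45 degrees.
bad = acute_bends([(0, 0), (10, 0), (0, 10)])
# East, then northeast, then north: both bends are obtuse.
good = acute_bends([(0, 0), (10, 0), (20, 10), (20, 30)])
```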
Although acute angles cannot be created by a Manhattan system, short edges and minimum area constraints are troublesome in Manhattan routing, too. Nonetheless, there is a fundamental difference in how these constraints are handled within an X system versus its Manhattan counterparts; although it is possible to handle these constraints in mostly separate pre- and postprocessing phases for Manhattan wiring, X requires awareness of such rules in virtually all steps of the design process. The flexibility of the X approach would otherwise generate a large number of violations that could not be repaired with simple, local transformations in a postprocessor.
Short edges in Manhattan designs typically occur when complicated pin structures are accessed by wires or vias. In practice, in Manhattan systems, pin access constraints are handled by a preprocessing step in which legal pin access directions are determined. Remaining short edges aside from pin access