Handbook of Algorithms for Physical Design Automation, Part 48



interconnect regions as a result of clustering. The third approach is to balance the perimeter-degree of the partitions during the partitioning phase of the multilevel placement. One could use a multi-constraint partitioner to solve this problem of balancing area and perimeter-degree simultaneously. Alternatively, one could satisfy both constraints in a sequential manner by first balancing the perimeter-degree and then balancing the areas of the partitions.

The authors present extensive empirical data to validate their claims about the fidelity of perimeter-degree as a simple and effective metric for homogenizing interconnection complexity.
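As an illustration of the metric discussed above (not code from the original work), one can take the perimeter-degree of a partition to be the number of nets crossing its boundary normalized by the partition's geometric perimeter; the helper names below are ours:

```python
# Illustrative sketch: perimeter-degree of a partition, taken here as the
# number of nets crossing the partition boundary divided by the
# partition's geometric perimeter.

def external_degree(nets, partition):
    """Count nets with at least one cell inside and one outside `partition`."""
    part = set(partition)
    count = 0
    for net in nets:
        cells = set(net)
        if cells & part and cells - part:
            count += 1
    return count

def perimeter_degree(nets, partition, width, height):
    """External net count normalized by the partition's perimeter."""
    return external_degree(nets, partition) / (2 * (width + height))

# Toy netlist: each net is a list of cell ids.
nets = [["a", "b"], ["b", "c"], ["c", "d"], ["a", "d"]]
pd = perimeter_degree(nets, ["a", "b"], width=2.0, height=1.0)
```

Balancing this quantity across partitions, as the text describes, keeps the interconnect demand on each partition's boundary roughly uniform.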

22.3 GLOBAL-PLACEMENT CONGESTION IMPROVEMENT

There have been several studies on incorporating the congestion metric during the global-placement stage of the physical implementation flow. Local wiring has a big impact on the final congestion characteristics of a design. Hence, historically, it has been difficult to robustly address congestion during the global-placement stage. There have been major advances in addressing congestion during global placement over the past decade. In this section, we detail some of these approaches.

22.3.1 INCORPORATING CONGESTION ESTIMATION DURING GLOBAL PLACEMENT

The placement algorithms need a good and fast estimation of wiring requirements if they intend to target congestion as a metric during the placement process. Other chapters in this book detail several of these wiring density estimation approaches [10,19,27,39]. The works in Refs. [6,29] have proposed to incorporate congestion estimation techniques within partitioning-driven quadratic placers in interesting ways.

The authors of Ref. [29] base their work on a quadratic placement engine that solves an unconstrained minimization problem, the objective function of which is the squared wirelength of the netlist. Because the quadratic solution in general has many overlapping cells, the overlap is resolved by partitioning the solution and iterating over the partitioned solutions [23]. The quadratic wirelength minimum solution serves to guide a mincut partitioning routine. After each round of partitioning, the cells in a region are constrained to the particular region by introducing a center-of-gravity constraint for each region for subsequent quadratic minimization formulations. Figure 22.5a illustrates the proposed congestion-driven placement methodology. Before each successive placement, internal route estimation and a region-based global route are performed on each region to estimate routing supply-demand ratios. These ratios are used to influence the placer into growing or shrinking the partitioned regions based on resource demand and supply. The region router estimates the routing demand of wires spanning multiple regions. The region router is implemented using an A* algorithm [5] on the region-based graph. Once routing demand is computed, this information is used to grow or shrink the regions of the current placement. Regions with higher routing demand are allocated more white space by growing them. For q regions in a placement stage, a growth matrix G is defined as an (n − q) × (n − q) diagonal matrix with entry g_ii equal to the region weight of the independent cell x_i. The growth matrix is computed as follows. After a round of quadratic minimization and partitioning, a set of new regions is generated. Congestion analysis based on the router is performed on this new set of regions. The routing cost is divided into two parts: (1) external cost computed using the region router for nets spanning multiple regions and (2) internal cost computed using a line-probe router for intraregion routes. The difference between routing supply and demand of a region determines the extra routing tracks required and hence the growth required by the region. The reader is referred to the original publication [29] for details on how the growth matrix is embedded in the quadratic formulation of the center-of-gravity constraints. The effect of the growth matrix terms in the quadratic formulation is to effectively grow and shrink the regions based on the growth matrix. Reduction in congestion occurs due to the ability to transform horizontal routes into vertical ones and vice versa. This is shown in Figure 22.5b, where for a vertically congested region a vertical expansion is produced.

FIGURE 22.5 (a) Congestion-driven placement methodology and (b) example of region growth relieving congestion. (From Parakh, P. N., Brown, R. B., and Sakallah, K. A., DAC, 1998.)

The work in Ref. [6] also targets a flow to avoid routing congestion during global placement. The main contribution is to use a fast but reliable way to detect routing criticalities and then use this information effectively in a partitioning-driven placer. The techniques are tested on real-world large industrial designs with very good results. The framework for their studies is based on a four-way partitioning (quadrisection)-based quadratic placer, BonnPlace [37]. The placement begins by solving

a quadratic wirelength minimum solution as described in the previous paragraph. This overlapping placement is then used by a quadrisection routine to generate four partitions of the netlist. The objective of the quadrisection is to divide the area of the die into approximately equal regions and assign cells to each region such that density requirements for each region are not violated and the displacement of the cells from their initial quadratic wirelength minimum locations is minimized. Center-of-gravity constraints are then imposed on the cells assigned to their particular regions for subsequent quadratic wirelength minimization. This process is iterated till the number of cells in a region is small enough for detailed placement techniques to place. Like the work in Ref. [29], the authors of Ref. [6] also try to relieve congestion during global-placement iterations by allocating more white space to perceived congested areas. However, the mechanism is very different. First, we describe the measurement of congestion of a placement as it appears during the placement algorithm, as proposed in Ref. [6]. Given a chip that is partitioned into k × k regions (forming a (k + 1) × (k + 1) placement grid), a pin density for each region (for the local congestion cost) and a congestion estimate for each edge in the dual graph of the placement grid (for the intraregion congestion cost) are computed. For a fast estimation of a global route, a probabilistic global router is used to route the intraregion nets. During a placement level (iteration), the current status of the placement grid is used as the global routing partition. Figure 22.6b shows the routing grid as a dual of the placement grid. Multiterminal nets are split into sets of two-terminal nets using a Steiner tree decomposition of the net. For each two-point connection in the Steiner tree of a net, probabilistic routing similar to the algorithms proposed in Refs. [27,39] is used. These probabilities are then added over all nets for each routing grid box to give the expected usage p(e). Figure 22.6c shows how probabilities are calculated for two

FIGURE 22.6 Calculation of routing congestion using probabilistic routing analysis. Given a placement of a chip partitioned into a k × k grid shown in (a), the nets are decomposed into two-pin connections using Steiner tree decomposition as shown in (b). Each two-pin connection is then routed probabilistically as shown in (c). (From Brenner, U. and Rohe, A., ISPD, 2002.)

connections and then added for each grid box. The fraction cong(e) = p(e)/cap(e) (where cap(e) is the routing capacity) is defined as the estimated congestion for an edge e ∈ E(G) in the dual of the placement grid, as shown in Figure 22.6b. To account for routability problems inside a routing grid, the pin density pin_dens(R) inside a region R is used as a second metric. To use the computed congestion data, the authors rely on the popular technique of inflating groups of cells that are deemed to be in a congested region. The size of a congested cell is increased from the normal geometric area s(c) = x(c) · y(c) to an inflated value of [1 + b(c)] · s(c), with b(c) ≥ 0. During the partitioning stage, the new sizes of the cells are used to satisfy the density requirements, such that for a set C(R) being partitioned into subsets C(R1), ..., C(R4), the sum of the inflated sizes of the cells in C(Ri) does not exceed s(Ri) for i = 1, ..., 4. The numbers b(c) depend upon an input parameter τ ≥ 0, which is an upper bound on the total increment of b(c) for each placement level. The initial value b(c) for each cell is proportional to the pin density of the cell. During placement, if a cell is deemed to be in a congested region R, then b(c) is increased by min{1, 2[cong(e_i) − 1]}. Once the cells in congested regions are inflated, they are spread around using a repartitioning method in the placer, which respects the new inflated sizes of the cells and moves cells out of regions that are too full. Repartitioning is called after the normal partitioning stage. It considers 2 × 2 windows of adjacent regions R1, ..., R4 and forms a larger region R = R1 ∪ ... ∪ R4. The larger region is then repartitioned to obey the new density limits and improve wirelength. This is done for all 2 × 2 windows repeatedly over all the regions till it yields reasonable improvement in wirelength. Because the windows overlap, the inflated cells can move away from congested regions, in effect reducing the demand for routing in these regions.
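The inflation step above can be sketched as follows. This is not the placer's code; the per-region congestion lookup and the data layout are our simplifying assumptions, and only the worst dual-graph edge of each region is considered:

```python
# Sketch of the cell-inflation step: cells in regions whose estimated
# congestion cong(e) exceeds 1 get their bloat b(c) increased by
# min{1, 2[cong(e) - 1]}, and the partitioner then sees the inflated
# area [1 + b(c)] * s(c).

def inflate_cells(cells, cong):
    """cells: dict cell -> {"area": s(c), "b": bloat, "region": r}
    cong:  dict region -> estimated congestion of its worst edge
    """
    for info in cells.values():
        e = cong.get(info["region"], 0.0)
        if e > 1.0:                      # routing demand exceeds capacity
            info["b"] += min(1.0, 2.0 * (e - 1.0))

def inflated_area(info):
    """Size used by the partitioner: [1 + b(c)] * s(c)."""
    return (1.0 + info["b"]) * info["area"]

cells = {"c1": {"area": 4.0, "b": 0.0, "region": "R1"},
         "c2": {"area": 2.0, "b": 0.0, "region": "R2"}}
inflate_cells(cells, {"R1": 1.2, "R2": 0.8})
```

With cong = 1.2, cell c1 gains b(c1) = min{1, 2 · 0.2} = 0.4, so the partitioner treats it as 40 percent larger, which forces white space into its region on the next repartitioning pass.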

The work in Ref. [30] also proposes a white-space allocation (WSA) technique based on congestion estimates during mincut global placement. The techniques are based on the WSA technique proposed in Ref. [28] and discussed in Section 22.4.2. The difference is that Ref. [28] proposes to apply the technique after placement as a detailed placement optimization, while Ref. [30] uses it during partitioning-based global placement. The framework used for the studies in Ref. [30] is a top-down mincut bipartitioning-based placer. Unlike Ref. [6], which uses a quadrisection-based fixed-grid placer, Ref. [30] uses a bisection-based variable-grid placer in which the cutline is allowed to shift after each round of partitioning. The objective for postpartitioning cutline shifting can be based on equalizing densities or congestion estimates. In Ref. [30], before each round of partitioning, the entire placement region is overlaid on a grid. Congestion maps [39] are built using the last updated locations of all pins. When cells are partitioned and their positions change, the congestion values for their nets are updated. The routing demands and supplies for either side of the cutline are estimated using the congestion map. Using these estimates, the cutline is shifted after partitioning to equalize the ratio of demand to supply on either side of the cutline. This effectively allocates more white space to potentially congested areas without having to artificially change the sizes of individual cells.
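The cutline shift can be sketched with a small calculation. We assume, purely for illustration, that routing supply on each side is proportional to that side's width; equalizing demand/supply then reduces to a weighted split of the span:

```python
# Hedged sketch of demand/supply-driven cutline shifting: place a
# vertical cutline x so that demand/width matches on both sides,
#   demand_left / (x - x_left) == demand_right / (x_right - x),
# which solves to a demand-weighted split of the region's span.

def shift_cutline(x_left, x_right, demand_left, demand_right):
    total = demand_left + demand_right
    if total == 0:
        return 0.5 * (x_left + x_right)   # no demand: split evenly
    frac = demand_left / total
    return x_left + frac * (x_right - x_left)

x = shift_cutline(0.0, 10.0, demand_left=3.0, demand_right=1.0)
```

Here the left side carries three times the routing demand, so it receives three quarters of the region's width, i.e., three quarters of the (width-proportional) supply.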

22.3.2 STEINER WIRELENGTH OPTIMIZATION DURING GLOBAL PLACEMENT

Taking an orthogonal approach to traditional works on congestion-driven placement, the work in Ref. [30] shows that the congestion characteristics of a placement can be improved by modifying the objective of the placer to optimize Steiner tree wirelength (StWL) rather than the traditional half-perimeter wirelength (HPWL). The contention is that StWL correlates much better with routed wirelength than HPWL does. The placement framework for the study is based on the top-down mincut partitioning algorithm implemented in the well-known Capo placer [1] and is called ROOSTER. Mincut placers generally use either bisection or quadrisection to divide the placement area and the netlist. The objective of the partitioning is to minimize the number of connections between the partitions and also to obey area constraints in the partitions. This is done iteratively till the number of cells in a region is small. When bipartitioning is used, the careful choice of vertical or horizontal cut direction influences wirelength, the wirelength ratio between horizontal and vertical components, and also the routing congestion in the resulting placement solution [36]. Proper handling of terminals [13] is essential to the success of top-down placement approaches. When partitioning a placement bin


(region), cells inside the bin may connect to cells and fixed points outside the particular bin. These connections need to be properly accounted for by formulating the problem as partitioning cells in a bin with fixed terminals. The terminals are propagated to the boundaries of the bin being partitioned and are considered fixed in one or more partitions. The traditional mincut objective during placement does not accurately represent the wirelength objective [35]. The authors of Ref. [35] introduce a new terminal propagation technique that allows the partitioner to better map netcut to HPWL. Each original net can be represented by one or two weighted nets for the partitioning problem, depending upon the configuration of the net's terminals relative to the bin's center. The weighted netcut formulation better represents the HPWL objective. This formulation is later simplified in Ref. [9]. The work in Ref. [30] extends this formulation to facilitate minimization of wirelength estimates other than HPWL, in particular StWL minimization. The authors of Ref. [30] show that StWL is a valid cost function for the weighted partitioning problems. However, several assumptions made when minimizing the traditional HPWL objective no longer hold. Moving terminal locations changes the Steiner tree construction, making StWL estimates inaccurate. Additionally, nets that were considered inessential in HPWL minimization cannot be considered so during StWL minimization, because, unlike with the HPWL objective, there are many Steiner trees of different lengths having the same bounding box. To limit the additional runtime during placement, an efficient data structure called pointset with multiplicities is proposed to aid fast building and maintenance of Steiner trees for partitioning. At the beginning of mincut placement, all movable cells are placed at the center of the first placement bin, which represents the full core placeable area. When constructing the partitioning instance for a bin, a Steiner tree evaluator is called for each essential net of the bin. The weighted partitioning problem is formed based on the Steiner tree segments. The weighted mincut partitioner runs and produces a solution. A cutline is selected based on the partitioning result, new bins are created, and cells are placed at the centers of their respective bins. Three different Steiner tree evaluators, batched iterated 1-Steiner (BI1ST) [20], FastSteiner [21], and the fast lookup table based wirelength estimation technique (FLUTE) [11], are tested in the flow. In addition, Ref. [30] also suggests using the StWL objective in detailed placement techniques. Two types of sliding window optimizers targeting StWL are proposed. The first one exhaustively checks all possible linear orderings of a group of cells in a row, and the second one uses a dynamic programming algorithm for an interleaving optimization like the one proposed in Ref. [17].
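The point that many Steiner trees of different lengths share one bounding box can be made concrete with a small example. The rectilinear spanning tree below is only a stand-in evaluator (it upper-bounds the optimal rectilinear Steiner length within a factor of 1.5); ROOSTER uses real evaluators such as FLUTE:

```python
# Two pin sets with identical bounding boxes (hence identical HPWL)
# but different Steiner wirelength. We approximate StWL with a
# rectilinear minimum spanning tree built by Prim's algorithm.

def hpwl(pins):
    xs, ys = zip(*pins)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def rmst_length(pins):
    """Rectilinear minimum spanning tree length (Prim's algorithm)."""
    pins = list(pins)
    in_tree = {pins[0]}
    total = 0.0
    while len(in_tree) < len(pins):
        d, best = min(
            (abs(p[0] - q[0]) + abs(p[1] - q[1]), q)
            for p in in_tree for q in pins if q not in in_tree
        )
        total += d
        in_tree.add(best)
    return total

net_a = [(0, 0), (4, 4)]                  # two opposite corners
net_b = [(0, 0), (4, 4), (4, 0), (0, 4)]  # all four corners
```

Both nets have HPWL 8, but net_b needs 12 units of tree wire: an HPWL-driven partitioner would treat the two nets identically, while an StWL-driven one sees the extra routing demand of net_b.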

22.3.3 FREE SPACE MANAGEMENT DURING GLOBAL PLACEMENT

Traditional works on congestion-driven placement in the context of high-utilization designs have focused on redistributing the available limited white space toward the congested areas. However, for modern system-on-chip (SoC) design styles with several big intellectual property (IP) blocks and large macrocells, the available free space for the dust logic has increased. For such low-utilization designs, in addition to solving the problem of local high-congestion spots, it is also important to ensure that the design is not overspread over the die and that it has a good natural distribution of cells in the layout. Addressing these concerns improves the overall quality of the physical design in terms of performance of the circuit, wirelength, power consumed, congestion owing to global nets, etc. However, even for low-utilization designs, care must be taken to ensure that every local region of the die has enough routing resources to accommodate the routing demand created by the cells in that region. Hence, the free space management techniques during global placement need to be cognizant of maintaining enough white space in the local regions while ensuring that the design is not overspread for other placement objectives.

Hierarchical WSA during top-down partitioning-based placement flows is discussed in Ref. [7] and implemented in the placer Capo. The available white space is uniformly spread throughout the core region. This ensures that every local region of the die has enough available white space, which helps routing. At every level of partitioning during the top-down flow, the tolerances during the partitioning of an individual bin are determined by the white space deterioration factor α. The factor α is determined by the available white space in the bin and the number of partitioning levels away from the leaf level in the partitioning tree (approximated by log N, where N is the number of cells in the


bin). Hierarchical WSA allows higher tolerances during partitioning and shifts cutlines after each partitioning to equalize the relative white space in the child bins and ensure a uniform white space distribution. This simple strategy is very good for high-utilization designs, but produces circuits with high wirelength and worse performance for sparse designs.
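The text does not give Capo's exact formula for α, so the sketch below is only a plausible reading of the description: white space is allowed to decay geometrically over the remaining log2(N) levels toward a minimal leaf value, and the per-level partitioning tolerance is the slack that decay leaves:

```python
# Hedged sketch (not Capo's actual code): derive a per-level partitioning
# tolerance from a white space deterioration factor alpha, assuming the
# bin's relative white space w may shrink by a factor alpha at each of
# the remaining ~log2(N) levels until it reaches a floor w_min.

import math

def partitioning_tolerance(whitespace, n_cells, w_min=0.01):
    """Return (alpha, tolerance) for a bin with the given relative
    white space and cell count."""
    levels = max(1, math.ceil(math.log2(max(2, n_cells))))
    alpha = (w_min / whitespace) ** (1.0 / levels)
    # Tolerance: capacity slack a child bin may absorb while its white
    # space stays above alpha * whitespace.
    tolerance = whitespace * (1.0 - alpha)
    return alpha, tolerance

alpha, tol = partitioning_tolerance(whitespace=0.20, n_cells=1024)
```

The qualitative behavior matches the description: bins with more white space or fewer remaining levels get looser tolerances, and tolerances tighten toward the leaves.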

The work in Ref. [4] argues that top-down placement based on recursive bisection with multilevel partitioning does poorly on low-utilization designs and tends to produce excessive wirelength when large amounts of white space are present. On the other hand, analytical placement algorithms have a global view of the placement problem and can better manage large amounts of white space. To address this, analytical constraint generation (ACG) [4] proposes to combine a mincut-based placer with a quadratic placement engine. In ACG, during the top-down recursive bisection-based mincut placement flow, the partitioning capacities for each placement bin to be partitioned are generated based on the quadratic wirelength minimum placement at that level. At each level during the top-down flow, the quadratic wirelength minimization engine produces a placement subject to the respective placement bin constraints for each cell. For each placement bin to be partitioned, the capacities of the child bins are determined by the center of mass of the quadratic wirelength minimum solution. The tolerances for the partitioning problem are kept fairly rigid. Care is taken to ensure that none of the child partitioning bins overflows its target utilization. ACG takes the hint from a quadratic wirelength minimum solution on the natural placement of the design and uses powerful multilevel mincut partitioning techniques to produce lower wirelength solutions that are not overspread.

The techniques proposed in Refs. [2,3] to handle the issue of sparse designs take a different approach. Their techniques are not dependent on the type of the placer. A black-box placer that uniformly distributes the available white space across the core area is assumed. By preprocessing the netlist, it can be ensured that (1) there is minimum local white space throughout the core area and (2) the remaining white space is better allocated. The constraint of minimum local white space is required to ensure that local regions in the die do not become highly congested. The technique consists of adding small disconnected free cells to the design in an amount not exceeding the white space that remains after the local white space requirement is satisfied. Because the free cells are disconnected and small, the black-box placer is free to place them so as to improve the relevant design objectives. After the placement process is complete, the free cells are removed from the design and the underlying cell sites are left empty. This causes high cell density (which respects the minimum local white space requirement) in certain areas, with empty sites occupying the vacant areas of the chip. In addition, the work in Ref. [3] describes a low-overhead implementation of filler cells in a mincut placer, because explicit modeling of free cells in the placement problem impacts the runtime and memory footprint of the placer. In contrast to the ACG work, where the partitioning capacities for each partitioning problem are determined by the quadratic wirelength minimum solution and the partitioning tolerances are kept fairly strict, in Ref. [3] the partitioning tolerances for low-utilization placement bins are increased sufficiently so that the partitioner can find the optimum mincut solution subject to the capacity and tolerance constraints. The increase in tolerances is done keeping in mind the minimum local white space requirements. Figure 22.7 shows the placement of the same design by the different mincut placement approaches discussed above.
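The filler-cell budget described above reduces to a one-line calculation. The function and parameter names are ours, not from Refs. [2,3]:

```python
# Sketch of the filler-cell preprocessing: add disconnected free cells
# so that a uniform-white-space black-box placer ends up reserving only
# the required minimum local white space, with fillers soaking up the rest.

def num_filler_cells(core_area, cell_area, min_local_ws, filler_area=1.0):
    """Number of filler cells to add.

    Fillers may consume the white space remaining after reserving
    min_local_ws (a fraction of core_area) as minimum local white space.
    """
    whitespace = core_area - cell_area
    reserved = min_local_ws * core_area
    fillable = max(0.0, whitespace - reserved)
    return int(fillable // filler_area)

# Echoing the Figure 22.7 scenario loosely: 74 percent white space,
# keep 15 percent of the core as minimum local white space.
n = num_filler_cells(core_area=1000.0, cell_area=260.0, min_local_ws=0.15)
```

After placement, the fillers are deleted and their sites stay empty, so the real cells end up clustered at higher density while every region still meets the local white space floor.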

22.4 DETAILED PLACEMENT CONGESTION IMPROVEMENT

Congestion-driven detailed placement approaches have been studied extensively in the literature. In general, these approaches tend to be more effective than congestion-driven global placement, because relatively more accurate congestion information is available at the detailed placement level. The global-placement step determines rough locations of the majority of the cells, as well as the local area density and pin density. After global placement completes, a pass of congestion estimation can be applied. The congestion distribution of the entire design, or congestion map, usually guides the postglobal-placement congestion-reduction algorithms.

The goal of congestion-reduction algorithms is to reduce the congestion in the peak congestion spots, at the cost of increasing the routing demand in noncongested areas, thus averaging out the


FIGURE 22.7 The Ckt4 design from IBM has 74 percent white space. (a) Placement produced by Capo with uniform white space distribution. (b) Another placement produced by Capo; free filler cells (not shown in the picture) were added to reduce the placer white space from 74 to 15 percent. This reduces the wirelength from 15.32e6 to 8.77e6. (c) Placement obtained from a mincut placer from IBM with 70 percent target density. (d) Placement obtained by the ACG technique with 70 percent target density. (From Adya, S. N. and Markov, I. L., International Conference on Computer Aided Design (ICCAD), San Jose, CA, 2003; Alpert, C. J., Nam, G.-J., and Villarrubia, P. G., Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, 2002.)

routing demand over the die. For the detailed placement stage, there are typically the following three ways to reduce the routing demand in a congested area.

First, traditional HPWL improvement, either global or local, can reduce the routing demand in the congested area by having smaller wires in the design. In this sense, conventional wirelength minimization is of value for congestion control. This is particularly true for high-utilization designs, where little room can be used for manipulating the available white space for congestion control. Wang et al. [38] illustrate the relationship between total wirelength and congestion, based on the edge overflow congestion model. They observe a strong correlation between wirelength and congestion in experiments. Recent work on mincut placement [22] or analytical placement [28] also suggests that reducing total HPWL helps produce routable placements.
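The edge overflow congestion model referenced above can be stated in a few lines. This is a generic rendition, not the exact formulation of Ref. [38]:

```python
# Minimal edge-overflow congestion model: the overflow of a routing-grid
# edge is its routing demand in excess of capacity, and total overflow
# summarizes the congestion map of the whole die.

def edge_overflow(demand, capacity):
    return max(0, demand - capacity)

def total_overflow(edges):
    """edges: iterable of (demand, capacity) pairs over the routing grid."""
    return sum(edge_overflow(d, c) for d, c in edges)

grid_edges = [(12, 10), (7, 10), (15, 10)]  # two of three edges overflow
ovf = total_overflow(grid_edges)
```

Because every wire contributes demand to the edges it crosses, shrinking total wirelength lowers demand everywhere, which is why HPWL reduction alone correlates with lower total overflow.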

Second, in congested regions, replacing cells using accurate routing topology information can yield significant congestion reduction. This type of approach includes cell swapping integrated with Steiner tree routing update in a simulated annealing algorithm [38], cell replacement with an exact congestion objective obtained by global routing or a routing estimator [28,30], optimal interleaving with trunk tree decomposition [17], and sliding window optimization with an StWL objective [30]. The incremental routing approach in Ref. [38] can be runtime prohibitive, because both incremental routing and annealing are computationally expensive. A good control of the working area is paramount for the approach. Usually, the working area grows from a highly congested spot and is subject to some maximum area constraints. If the congestion is mainly caused by entangled routing topologies, the cell replacement approach in Ref. [28] can be very effective for moderate congestion reduction. The exploration space, however, is rather limited, because fewer cells are touched compared to other methods. Both approaches fail to address the congestion from global interconnects, nets that pass through the congested area. Accounting for global interconnects in routing congestion reduction is fundamentally a very hard problem for detailed placement. The approach in Ref. [30] uses a fast Steiner tree estimator in mincut placement. It is a promising technique, as it improves the StWL at almost no cost in runtime.

Third, low design utilization, or the abundance of whitespace, in many modern ASIC designs allows cell spreading, the most effective way to reduce routing congestion. By spreading the cells in the core placeable area, the routing resources can be significantly increased. In other words, the average routing demand in the congested area can be reduced to meet the routing resource constraints. Theoretically, if the utilization continues to decrease, the routability goal will be achieved for any given design. In reality, however, spreading the placement inevitably deteriorates the circuit performance and increases the power consumption, because of increased average interconnect length.

We will go through the details of several techniques in the following sections.


22.4.1 ROUTER INTEGRATION

An early approach to cell replacement with routing information was proposed in Ref. [38]. In this work, the detailed placement step is based on low-temperature simulated annealing. Traditionally, the cost function in simulated annealing is the total HPWL, or some form of weighted total HPWL. Here, the congestion cost is incorporated into the objective function. To obtain the congestion cost, a fast Steiner tree router is called for every single move in the simulated annealing algorithm. The routing supply/demand information for all the region edges needs to be incrementally updated. This approach is computationally expensive, because incremental routing and congestion update are much more time consuming than simple bounding-box computation. The advantage is on the fidelity side: congestion improvement achieved in placement mostly carries over to routing. Several techniques can be used to speed up this approach and make it practically feasible, such as routing estimation or lazy update of nets with large fanouts. Using appropriate sizes for the congested regions in optimization is another key step for this approach. The idea is to avoid applying the expensive computation to noncongested areas. The modeling of the congestion cost greatly affects the final placement quality. One basic question is whether we should penalize moves only when the routing demand is higher than the supply, or start assigning some cost once the demand is close to the supply, i.e., a look-ahead approach. The authors tried many different congestion modeling functions and suggest that a look-ahead congestion cost combined with total wirelength is most effective.
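The look-ahead idea can be sketched as follows. The exact modeling functions of Ref. [38] are not given in the text, so the quadratic onset penalty and the weights below are illustrative assumptions:

```python
# Sketch of a "look-ahead" congestion penalty: cost starts rising once
# demand approaches supply, instead of only after overflow, and is
# combined with wirelength in the annealing move objective.

def lookahead_cost(demand, supply, onset=0.8, k=4.0):
    """Quadratic penalty once utilization exceeds `onset` of supply."""
    util = demand / supply
    return 0.0 if util <= onset else k * (util - onset) ** 2

def move_cost(wirelength, region_edges, w_cong=10.0):
    """Annealing objective: wirelength plus weighted congestion penalty."""
    return wirelength + w_cong * sum(
        lookahead_cost(d, s) for d, s in region_edges)

cost = move_cost(100.0, [(9, 10), (11, 10)])
```

The edge at 90 percent utilization already contributes a small penalty, steering the annealer away from regions that are merely close to full, which is the look-ahead behavior the authors found most effective when combined with total wirelength.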

A typical problem in congestion minimization is that reducing the peak congestion of one region often comes at the cost of increasing the congestion of its surrounding area. Classic congestion removal approaches work on an expanded area, which is the peak congestion region plus the surrounding less congested area. Furthermore, these approaches work on one expanded area at a time. This results in conflicting congested areas. The authors in Refs. [42,43] suggest conflict-resolving techniques before starting congestion optimization. The basic idea is to exploit the flexibility available when growing a peak congestion spot into an expanded area, and to obtain the best combined expanded areas for multiple peak congestion spots. The problem can be formulated as a linear program or an integer linear program. A simpler heuristic is to start from greedy expansion and gradually adjust the expanded areas to avoid conflicts.

Jariwala and Lillis [17] proposed a detailed placement method that integrates trunk decomposition-based routing into the optimal interleaving algorithm. Optimal interleaving is a detailed placement wirelength minimization technique proposed in Ref. [17]. It is a powerful intrarow optimization technique. The basic idea is to collect a group of cells in a single-row window, partition them into two groups, and optimally interleave the two groups while preserving the cell order within each group. A dynamic programming algorithm ensures that the optimal wirelength can be found in O(n^2) time. In Ref. [17], the routing information is considered in the interleaving process. The routes associated with cells are also interleaved, and the number of nets crossing a channel is taken into account. The algorithm is successful on field-programmable gate array (FPGA) benchmarks. It reduces the number of congested routing channels (defined as the channels with maximum routing density) by 45 percent on average.
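The O(n^2) interleaving recurrence can be sketched as below. For clarity the cost is separable per cell (distance from a precomputed target position); the actual algorithm scores HPWL, and the routing-aware variant of Ref. [17] additionally counts nets crossing each channel:

```python
# Sketch of optimal interleaving: merge two cell groups A and B into a
# row of slots, preserving the order within each group, minimizing a
# per-cell placement cost |slot - target| via an O(n^2) DP.

def interleave(a_targets, b_targets, slots):
    """dp[i][j]: best cost placing the first i cells of A and j of B
    into the first i + j slots. Returns the minimal total cost."""
    n, m = len(a_targets), len(b_targets)
    assert n + m == len(slots)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == INF:
                continue
            k = i + j                     # next slot to fill
            if i < n:                     # place next cell of A
                dp[i + 1][j] = min(dp[i + 1][j],
                                   dp[i][j] + abs(slots[k] - a_targets[i]))
            if j < m:                     # place next cell of B
                dp[i][j + 1] = min(dp[i][j + 1],
                                   dp[i][j] + abs(slots[k] - b_targets[j]))
    return dp[n][m]

best = interleave([1.0, 4.0], [2.0], slots=[1.0, 2.0, 3.0])
```

The DP visits each (i, j) state once with constant work, giving the quoted O(n^2) bound; here the best of the three legal interleavings (AAB, ABA, BAA) is ABA with cost 1.0.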

22.4.2 WHITESPACE MANAGEMENT

Because modern cell-based designs usually have lower utilizations than their predecessors, recent research has focused on congestion improvement in the context of whitespace. Early research on whitespace handling techniques includes Ref. [7], in which whitespace control is used primarily for improving the quality of mincut partitioning, and Ref. [29], which implicitly uses whitespace for area expansion.

Yang et al. [40,41] proposed a WSA approach for the purpose of reducing congestion. The experimental flow in this work is complete, including global/detailed routing with a widely used commercial router, on a set of benchmarks based on a 0.18 µm library. After the global-placement stage, the chip area is divided into rectangular areas (bins) and congestion estimation is conducted to obtain a congestion degree for each bin. At this point, all the bins have the same amount of whitespace.


Next, a two-step whitespace reallocation is performed to adjust the amount of whitespace, or cell utilization, in each bin. This is accomplished by first computing the desired whitespace distribution based on the current congestion map, and then moving cells between adjacent bins to meet the whitespace requirement. Moving cells to meet the target whitespace distribution inevitably degrades the quality of the placement, so a wirelength improvement stage is introduced to recover the loss. During this low-temperature annealing stage, the latest WSA map is maintained, and cell moving/swapping is allowed only if it does not violate the whitespace requirements by more than a certain amount. The key factor that determines the performance of this approach is the conversion from congestion map to target whitespace map. Because of the lack of detailed congestion information and the uncertainty in cell movement, it is extremely hard to accurately estimate how much whitespace needs to be injected into a bin to make a congested bin routable; what this approach can do is relieve the congestion hot spots to some extent. In reality, because congestion often conflicts with other placement objectives (timing, wirelength, etc.), neither underdoing nor overdoing congestion removal is desired, and multiple runs with different scaling factors help achieve the best trade-off. Another uncertainty of this approach lies in convergence. The target whitespace map is derived from the initial congestion map; after cell movement the congestion map changes and the target WSA is out of sync, i.e., a previously noncongested spot may pop up with less whitespace allocated. Performing multiple iterations of this approach may help, but it can cause oscillation or considerable loss of quality of the input placement.
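The congestion-map-to-target-whitespace conversion, including the scaling factor mentioned above, can be sketched as follows. This is our own illustrative blend, not the authors' exact formula: each bin's target interpolates between a uniform share and a congestion-proportional share, with `scale` playing the role of the scaling factor:

```python
def target_whitespace(congestion, total_ws, scale=1.0, floor=0.0):
    """congestion: flat list of per-bin congestion degrees.
    Returns per-bin whitespace targets summing to total_ws."""
    n = len(congestion)
    uniform = total_ws / n
    csum = sum(congestion) or 1.0  # guard against an all-zero map
    targets = []
    for c in congestion:
        prop = total_ws * c / csum  # share proportional to congestion
        targets.append(max(floor, (1 - scale) * uniform + scale * prop))
    # renormalize so the floor clamp does not change the total whitespace
    s = sum(targets)
    return [t * total_ws / s for t in targets]
```

With `scale = 0` the map is ignored (uniform whitespace); with `scale = 1` whitespace strictly follows congestion, which is where overdoing congestion removal becomes a risk.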

An enhancement of the above approach was proposed in Ref. [16]. In this work, the congestion estimation and WSA are similar to those of Refs. [40,41]. A flow-based algorithm called placement migration is devised to guide the cell movement for achieving the whitespace map. This algorithm can minimize the amount of movement (the total moving distance of all moved cells), given an original whitespace map (uniform distribution) and a target whitespace map. Less cell movement means smaller perturbation from the input placement and usually lower wirelength loss.

Another interesting approach, named whitespace allocation, was proposed by Li et al. [28]. The algorithm starts with a global placement and the derived congestion map; the placement is then modified using a recursive bipartitioning, or slicing-tree, method. In a preprocessing step, a congestion degree is computed for every cell in the design based on the current placement congestion map. In the first step, the placement is recursively partitioned until every region contains a small number of cells. For each partitioning step, the cut direction is determined by the aspect ratio of the region to be partitioned; each cutline geometrically bisects a region evenly and tries to maintain a square shape for the regions. A slicing tree is constructed at the end of the recursive bipartitioning. Each node of the slicing tree corresponds to a region that is bipartitioned; the left and right children of a node correspond to the two subregions after partitioning, and a list of the cells placed inside the region is saved in each node. Next, the slicing tree is evaluated in a bottom-up fashion, and a congestion level is assigned to each node of the tree. The congestion level of an internal node is simply the sum of the congestion levels of its two children; for a leaf node, it is the sum of the congestion degrees of all the cells in the corresponding region. A cutline adjustment step is then performed: the cutlines are shifted by traversing the tree in a top-down fashion, and for each node the amounts of whitespace allocated to the two child nodes are made linearly proportional to their congestion levels.

Consider a region r with lower-left corner (x0, y0), upper-right corner (x1, y1), and an original vertical cut at xcut = (x0 + x1)/2. The area of this region is A_r = (x1 − x0)(y1 − y0). Assume that the total cell areas of the left subregion r0 and the right subregion r1 are S0 and S1, and that the corresponding congestion levels are OVL0 and OVL1, respectively. The total amount of whitespace, (A_r − S0 − S1), is allocated to the two subregions such that the amounts of whitespace in the two subregions are linearly proportional to their congestion levels (Figure 22.8). Thus, the amount of whitespace allocated to subregion r0 is

    W(r0) = (A_r − S0 − S1) · OVL0 / (OVL0 + OVL1)


FIGURE 22.8 Subregion r0 is more congested than subregion r1. The cutline is shifted from xcut to x'cut, resulting in more whitespace in subregion r0.

Then the new cutline location x'cut can be derived as follows:

    γ = [S0 + (A_r − S0 − S1) · OVL0 / (OVL0 + OVL1)] / A_r

    x'cut = γ · x1 + (1 − γ) · x0

where γ is the ratio of the left subregion area to A_r after the cutline adjustment.

This step is similar to a top-down partitioning-based global placement except that the cut direction, the cut location, and the subnetlists are all known. The cells stay at the center of the region to which they belong during the top-down flow. The placement is then legalized with minimum perturbation, and a cell-swapping-based detailed placement is performed. Li et al. showed that this step alone can considerably reduce postrouting congestion (measured by the percentage of overflowed bins), by 72 percent on average. Furthermore, a similar approach was adopted in Ref. [30] for improving congestion by improving the routed total wirelength; the difference is that in Ref. [30] the congestion-estimate-based cutline adjustment is performed during the global-placement stage, which is a top-down partitioning framework.
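The cutline-adjustment step above reduces to a few lines of arithmetic; the following sketch implements the γ and x'cut formulas directly (function name and argument order are ours):

```python
def adjust_vertical_cut(x0, x1, y0, y1, S0, S1, OVL0, OVL1):
    """Shift a vertical cutline so whitespace splits in proportion to the
    congestion levels OVL0/OVL1 of the left/right subregions, whose total
    cell areas are S0/S1. Returns the new cut location x'_cut."""
    A_r = (x1 - x0) * (y1 - y0)           # region area
    ws = A_r - S0 - S1                    # total whitespace in the region
    ws0 = ws * OVL0 / (OVL0 + OVL1)       # whitespace given to left subregion
    gamma = (S0 + ws0) / A_r              # left-area ratio after adjustment
    return gamma * x1 + (1 - gamma) * x0  # new cut location
```

For a 10 × 10 region with S0 = S1 = 20 and OVL0 = 3, OVL1 = 1, the 60 units of whitespace split 45/15, so the cut moves from 5.0 to 6.5, enlarging the more congested left subregion.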

The limitation of these whitespace adjustment approaches is that they can do little about the part of congestion that originates from global interconnects. If we consider a congested region on the globally routed layout, the congestion of the area can be classified into three types: local nets, the interconnects within the region; semiglobal nets, the interconnects coming into or out of the region; and global nets, the interconnects passing through the region. WSA can relieve the congestion caused by local or semiglobal nets; however, it cannot reduce the routing demand coming from global nets. Therefore, the effectiveness of the WSA approaches depends greatly on the composition of these congestion sources. Generally, the proportions of the congestion sources are determined by two factors: the netlist structure and the global-placement quality. It is thus desirable to make the netlist structure more congestion friendly (reducing global nets at no cost of increasing local nets) or to improve the global-placement quality (reducing the total wirelength).

In summary, congestion reduction in detailed placement can be very effective because the congestion spots have been detected with high fidelity after global placement. If enough whitespace is given, most congestion can be relieved by intelligently spreading the cells to achieve a routable design. However, other placement objectives, such as timing and power, are likely to be degraded during the spreading, and more whitespace means lower density and more silicon area. In that sense, congestion reduction should not be addressed only in detailed placement: it is an objective throughout the entire placement flow and should be considered as early as possible, even in the floorplanning stage.


22.5 SIMULATED ANNEALING FOR CONGESTION IMPROVEMENT

Simulated annealing as an effective placement algorithm has been studied for over two decades [25,32,33]. As circuit sizes increase, simulated annealing is no longer a good choice as a global-placement algorithm because of its weak scalability. However, during the detailed-placement stage, annealing is still an attractive approach in many cases; for instance, simulated annealing can be applied to a particular region to further optimize the local placement. One strength of the annealing approach is that it can incorporate virtually any cost into its objective function, as long as the cost can be modeled and incrementally updated. It is then natural to extend simulated annealing from optimizing the classic HPWL to optimizing congestion.

A number of publications have presented methods to apply simulated annealing for congestion alleviation; they vary in congestion modeling and in the objective cost function. In this section, we review several cost functions used to represent congestion. Once the right cost function is selected, either low-temperature annealing or a greedy algorithm can be used as the optimization framework.
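The common skeleton shared by these methods can be sketched as below. This is a generic low-temperature annealing loop of our own construction, not any cited work's implementation; the cost callback is where a modeled, incrementally updatable congestion term would be mixed with wirelength:

```python
import math
import random

def anneal(propose_move, cost_fn, temp=0.1, steps=1000, seed=0):
    """propose_move(rng) returns (apply, undo) closures for one random move;
    cost_fn() returns the current total cost, e.g. HPWL + alpha * congestion."""
    rng = random.Random(seed)
    cost = cost_fn()
    for _ in range(steps):
        apply_move, undo_move = propose_move(rng)
        apply_move()
        new_cost = cost_fn()
        delta = new_cost - cost
        # accept improving moves outright; accept worsening moves with
        # probability exp(-delta / temp), which is small at low temperature
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
        else:
            undo_move()
    return cost
```

At low temperature this behaves nearly greedily, which matches its use for local refinement of an already-placed region; in practice `cost_fn` should update incrementally rather than recompute from scratch.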

22.5.1 RISA

Cheng proposed a routability model named RISA [10] and, for the first time, integrated a routability model into the simulated annealing algorithm. The basic idea of RISA can be explained using Figure 22.9.

The entire chip is divided into M × N rectangular regions; R represents one such region. There are two nets whose bounding boxes overlap with region R. Let W_R (H_R) be the width (height) of region R, W1 (H1) be the width (height) of net N1's bounding box, and w1 (h1) be the width (height) of the overlap box between R and net N1's bounding box. For region R, the horizontal and vertical congestion contributed by net N1 are

    Ch = q · w1 · h1 / (H1 · W_R)

    Cv = q · w1 · h1 / (W1 · H_R)

where q is a net weight that is a function of the net pin count. q is obtained by randomly building optimal Steiner trees within a net's bounding box and statistically deriving the probability of net crossings. Table 22.1 shows the precomputed values for q.
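The per-net, per-region contribution above is a direct evaluation of the two formulas; a minimal sketch follows (the q lookup itself is not reproduced here, since it comes from the precomputed Table 22.1):

```python
def risa_cost(q, w1, h1, W1, H1, W_R, H_R):
    """Horizontal and vertical congestion a net adds to region R.
    (w1, h1): overlap box of the net bbox with R; (W1, H1): net bbox
    dimensions; (W_R, H_R): region dimensions; q: pin-count-based weight."""
    Ch = q * w1 * h1 / (H1 * W_R)  # horizontal track demand
    Cv = q * w1 * h1 / (W1 * H_R)  # vertical track demand
    return Ch, Cv
```

Summing these contributions over all nets overlapping each region, and comparing against the region's track supply, yields the congestion cost that the annealer updates incrementally on each move.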

The congestion cost computed by the RISA model is integrated into the simulated annealing algorithm. Experiments show that with the RISA model, routing congestion, as measured by the number of overcongested grids, is considerably reduced. The runtime cost is about 2X compared to the original simulated annealing algorithm.


FIGURE 22.9 RISA routability model.
