Handbook of algorithms for physical design automation part 45 pps

The net-based approach deals with nets only, with the hope that if we handle the nets on the critical paths well, the entire critical path delay may be optimized implic-itly.. Netweighti

Trang 1

Test in Europe Conference, pp 321–324, 1998.

7 H Ren, D Z Pan, C J Alpert, and P Villarrubia Diffusion-based placement migration In Proceedings of

Design Automation Conference, pp 515–520, 2005.

8 C Li, M Xie, C -K Koh, J Cong, and P H Madden Routability-driven placement and white space

allocation In Proceedings of International Conference on Computer Aided Design, pp 394–401, 2004.

9 J A Roy, J F Lu, and I L Markov Seeing the forest and the trees: Steiner wirelength optimization in

placement In Proceedings of International Symposium on Physical Design, pp 78–85, 2006.

10 C Li, C -K Koh, and P H Madden Floorplan management: Incremental placement for gate sizing and

buffer insertion In Proceedings of Asia South Pacific Design Automation Conference, pp 349–354, 2005.

11 T Lou, H Ren, C J Alpert, and D Z Pan Computational geometry based placement migration In

Proceedings of International Conference on Computer Aided Design, pp 41–47, 2005.

12 D Hill US Patent 6,370,673: Method and system for high speed detailed placement of cells within an integrated circuit design, 2002

13 A R Agnihotri, S Ono, C Li, M C Yildiz, A Khatkhate, C -K Koh, and P H Madden Mixed

block placement via fractional cut recursive bisection IEEE Transactions on Computer-Aided Design

of Integrated Circuits and Systems, 24(5):748–761, 2005.

14 U Brenner, A Pauli, and J Vygen Almost optimum placement legalization by minimum cost flow and

dynamic programming In Proceedings of International Symposium on Physical Design, pp 2–9, 2004.

15 A R Agnihotri, M C Yildiz, A Khatkhate, A Mathur, S Ono, and P H Madden Fractional cut: Improved

recursive bisection placement In Proceedings of International Conference on Computer Aided Design,

pp 307–310, 2003

16 T H Cormen, C E Leiserson, and R L Rivest Introduction to Algorithms MIT Press, Cambridge, MA,

1990

17 C K Cheng, S Z Yao, and T C Hu The orientation of modules based on graph decomposition IEEE

Transactions on Computers, 40(6):774–780, 1991.

18 T W Her and D F Wong On over-the-cell channel routing with cell orientations consideration IEEE

Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(6):766–772, 1995.

19 W Swartz and C Sechen A new generalized row-based global router In Proceedings of Design Automation

Conference, pp 491–498, 1993.

20 A E Caldwell, A B Kahng, and I L Markov Optimal partitioners and end-case placers for

standard-cell layout IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(11):

1304–1313, 2000

21 S -W Hur and J Lillis Mongrel: Hybrid techniques for standard cell placement In Proceedings of

International Conference on Computer Aided Design, pp 165–170, 2000.

22 M R Garey and D S Johnson Computers and Intractibility: A Guide to the Theory of NP-Completeness.

W H Freeman and Co., San Francisco, CA, 1979 p 209

23 A B Kahng, P Tucker, and A Zelikovsky Optimization of linear placements for wirelength minimization

with free sites In Proceedings of Asia South Pacific Design Automation Conference, pp 241–244, 1999.

24 U Brenner and J Vygen Faster optimal single-row placement with fixed ordering In Proceedings of

Design, Automation and Test in Europe Conference, pp 117–122, 2000.

25 C C Chang, J Cong, and M Xie Optimality and scalability study of existing placement algorithms In

Proceedings of Asia South Pacific Design Automation Conference, pp 621–627, 2003.

Trang 2

21 Timing-Driven Placement

David Z Pan, Bill Halpin, and Haoxing Ren

CONTENTS

21.1 Introduction 424

21.2 Building Blocks and Classification 424

21.2.1 Net Modeling 424

21.2.2 Timing Analysis and Metrics 425

21.2.3 Overview of Timing-Driven Placement 426

21.3 Netweighting-Based Approach 427

21.3.1 Static Netweighting 427

21.3.1.1 Slack-Based Netweighting 428

21.3.1.2 Sensitivity-Based Netweighting 429

21.3.2 Dynamic Netweighting 432

21.3.2.1 Incremental Timing Analysis 432

21.3.2.2 Incremental Net Weighting 433

21.3.2.3 Placement Implementation 433

21.4 Net-Constraint-Based Approach 434

21.4.1 Net-Constraint Generation 434

21.4.1.1 Generating Effective NLCs 434

21.4.1.2 Single-Shot NLC Generation 434

21.4.1.3 Incremental NLC Generation 435

21.4.2 Net-Constraint Placement 436

21.4.2.1 Partition-Based Net-Constraint Placement 436

21.4.2.2 Force-Directed Net-Constraint Placement 437

21.4.2.3 Net-Constraint-Based Detailed Placement 437

21.5 Path (or Timing Graph)-Based Approach 437

21.5.1 LP-Based Formulation 438

21.5.1.1 Physical Constraints 438

21.5.1.2 Electrical/Timing Constraints 438

21.5.1.3 Objective Functions 439

21.5.2 Partitioning-Based Overlap Removal 440

21.5.3 Lagrangian Relaxation Method 440

21.5.4 Simulated Annealing 441

21.5.5 Graph-Based Differential Timing 441

21.6 Additional Techniques 441

21.6.1 Hybrid Net and Path-Based Approach 442

21.6.2 Hippocrates: A Detailed Placer without Degrading Timing 442

21.6.3 Accurate Net-Modeling Issue 442

21.7 Conclusions 443

References 443

423

Trang 3

The placement algorithms presented in the previous chapters mostly focus on minimizing the total wirelength (TWL) Timing-driven placement (TDP) is designed specifically targeting wires on timing critical paths It shall be noted that a cell is usually connected with two or more cells Making some targeted nets shorter during placement may sacrifice the wirelengths of other nets that are connected through common cells While the delay on critical paths decreases, other paths may become critical Therefore, TDP has to be performed in a very careful and balanced manner

Timing-driven placement has been studied extensively over the last two decades The drive for new methods in TDP to maximize circuit performance is from multiple facets because of the technology scaling and integration: (1) growing interconnect versus gate delay ratios; (2) higher levels

of on-die functional integration, which makes global interconnects even longer; (3) increasing chip operating frequencies, which make timing closure tough; and (4) increasing number of macros and standard cells for modern system-on-chip (SOC) designs These factors create continuing challenges

to better TDP

Timing-driven placement can be performed at both global and detailed placement stages (see previous chapters on placement) Historically, TDP algorithms can be roughly grouped into two classes: net-based and path-based The net-based approach deals with nets only, with the hope that if

we handle the nets on the critical paths well, the entire critical path delay may be optimized implic-itly The two basic techniques for net-based optimization are through netweighting [1–4] and net constraints [5–10] The path-based approach directly works on all or a subset of paths [11–14] The majority path-based approaches formulate the problem into a mathematical programming frame-work (e.g., linear programming [LP]) There are pros and cons for both net-based and path-based approaches in terms of runtime/scalability, ease of implementation, controllability, etc Modern TDP techniques tend to use some hybrid manner of both net-based and path-based approaches [15]

In this chapter, we discuss fundamental algorithms as well as recent trends of TDP Because of the large amount of works in TDP, it is not possible to exhaust all of them in this chapter Instead, we describe the basic ideas and fundamental techniques, and point out recent researches and possible future directions We first cover the basic building blocks for TDP Then the next two sections discuss net-based approaches, i.e., through netweighting and net constraints Then we survey the basic formulations and algorithms behind the path (or timing graph)-based approach Additional techniques and issues in the context of TDP are discussed, followed by conclusions

21.2 BUILDING BLOCKS AND CLASSIFICATION

21.2.1 NETMODELING

Given a placement, net modeling answers a fundamental question how the net is modeled for its routing topology and wirelength computation/estimation Figure 21.1 shows different net modeling strategies for a multiple-pin net

Bounding box model Clique model Star model Steiner model

FIGURE 21.1 Different net models that can be used for placement.

Trang 4

The simplest and most widely used method to compute wirelength is the half-perimeter

wire-length (HPWL) of its bounding box For a net i, let l i , r i , u i , and b irepresent the left, right, top, and

bottom locations of its bounding box Then the HPWL of net i is

HPWL is the lower bound for wirelength estimation, and it is accurate for two- and three-pin nets, which account for the majority nets

In analytical placement engines, wirelength is often modeled as a quadratic term (or pseudolinear term as in recent literature [16,17]) In those engines, clique and star models are often used In the clique model, an edge is introduced between every pin pair of the net In the star model, an extra star point located at the geometric center is created and each pin is connected to the star point In general, small nets (e.g., less than five pins) can use the clique model, but for large nets with a lot of pins, clique model is not friendly to the matrix solvers because it creates dense matrices Star models are preferred for large nets For the clique model, because it is a complete graph with far more edges than necessary to connect the net, each edge is usually assigned a weight of 2/n (where n is the number

of pins of the net) [18]

The bounding box, clique, and star models are three most popular net models There are other net models, such as Steiner trees, which are more accurate for nets with four or more pins However,

in most designs two- and three-pin nets are the majority of the entire netlist For example, in the industry circuit suite from Ref [19], two- and three-pin nets constitute 64 and 20 percent of the total nets, respectively [20] With exception of very few placers [21], Steiner-tree-based models are seldom used because they are computationally expensive There are some recent works trying to link Steiner tree with placement, e.g., in a partition-based placer [22] More research is needed to evaluate or make Steiner-based placement mainstream

21.2.2 TIMINGANALYSIS ANDMETRICS

As its name implies, TDP has to be guided by some timing metrics, which in turn need delay modeling and timing analysis TDP algorithms can use different levels of timing models to trade off accuracy and runtime In general, the switch level resistance-capacitance (RC) model for gates and Elmore delay model for interconnects are fairly sufficient There are more accurate models [23], but they are not extensively used in placement One main reason is the higher runtime The other reason is that during placement, routing is not done yet It is not very meaningful to use more accurate models if errors from those uncertainties are even greater

Based on the gate and interconnect delay models, static timing analysis (STA) or even path-based∗timing analysis can be performed STA [24] computes circuit path delays using the critical path method [25] From the set of arrival times (Arr) asserted on timing starting points and required arrival times (Req) asserted on timing endpoints, STA propagates (latest) arrival time forward and (earliest) required arrival time backward, in the topological order Then the slack at any timing point

t is the difference of its required arrival time minus its arrival time.

Static timing analysis can be performed incrementally if small changes in the netlist are made For more details on delay modeling and timing analysis, the reader is referred to Chapter 3

Timing convergence metrics measure the extent to which a placement satisfies timing constraints They also give an indication of how difficult it would be for a design engineer to

∗ In most cases, STA is sufficient for TDP The path-based timing analysis is more accurate, e.g., to capture false paths But it is very time consuming, and one may do it only if necessary, e.g., on a set of critical paths.

Trang 5

slack (WNS)

where P o is the set of timing endpoints, i.e., primary outputs (POs) and data inputs of memory elements To achieve timing closure, WNS should be nonnegative For nanometer designs with growing variability, one may set the slack target to be a positive value to safe guard variations from process, voltage, or thermal issues The WNS, however, only gives information about the worst path

It is possible that two placement solutions have similar WNS values, but one has only a single critical path while the other has thousands of critical and near critical paths The figure of merit (FOM) is another very important timing closure metric [4] It can be defined as follows:

t ∈Po,Slk(t)<Slkt

where Slkt is the slack target for the entire design If Slkt = 0, the FOM is reduced to the total negative slack (TNS) [8]

21.2.3 OVERVIEW OFTIMING-DRIVENPLACEMENT

The overview of TDP is shown in Figure 21.2 It has three basic components: timing analysis, core placement algorithms, and interfaces between them by translating timing analysis/metrics into certain weights or constraints for core placement engines to drive and guide TDP

The previous section discusses the basics of timing analysis and metrics from a given netlist However, which netlist to start with so that we can have a meaningful timing analysis to guide TDP is a very important yet open question For example, shall we start from an unplaced netlist or some initial placement? For modern timing closure, many buffers also need to be inserted for high-fanout nets and long interconnects to get a reasonable timing picture (otherwise, there may be many loading/slew violations that make timing reports meaningless) On the other hand, those buffers will change the netlist structure for TDP Shall they be kept or stripped out during TDP? There is very little literature covering this netlist preparation step A reasonable strategy can be as follows: First, we start with some initial placement (e.g., wirelength driven), then perform some rough buffering/fanout optimization to get a reasonable timing estimation for the entire chip to guide TDP engine Whether

to keep those buffers during TDP may vary among different physical synthesis systems

Placement has been one of the most heavily studied physical design topics Some of the most popular placement algorithms include analytical/force-directed placement, partition-based place-ment, simulated-annealing (SA) based placeplace-ment, and LP-based placement The reader is referred

to Chapters 15 through 18 for detailed discussions

The most interesting aspect of the TDP is the mechanism to translate timing metrics into actions to drive the core placement engines The focus of the rest of this chapter is on this aspect Based on that, the TDP can be roughly classified into net-based and path-based approaches The net-based approach,

Timing analysis/metrics

• Gate

• Interconnect

• Static timing analysis

• Path-based timing analysis

Interface to placer

• Netweights

• Net constraints

• Path constraints, etc.

Core placement algorithms

• Force-directed

• Partitioning-based

• Simulated annealing

• LP-based, etc.

FIGURE 21.2 Basic building blocks and overview of TDP.

Trang 6

as its name implies, deals with individual nets Because timing analysis inherently deals with paths (with timing propagation), the timing information is then translated into either net constraints or netweights [1–4] to guide TDP engines The main idea of net constraint generation (or delay bud-geting) is to distribute slack for each path to its constituent nets such that a zero-slack solution

is obtained The delay budget for each net is then translated into its wirelength constraint during placement The main idea of netweighting is to assign higher netweights to more timing-critical nets while minimizing the total weighted wirelength objective Netweighting gives direction for timing optimization through shortening critical nets, but it does not have exact control because the objective

is the total weighted wirelength; the net constraint approach specifies that, but it may be too much constrained in terms of global optimization Net weighting and net constraint processes can be iter-atively refined as more accurate timing information is obtained during placement A systematic way

of explicit perturbation control is important for net-based algorithms

The path-based approach directly works on all or a subset of paths The majority of this approach formulate the problem into mathematical programming framework (e.g., LP) It usually maintains

an accurate timing view during the placement optimization [26] However, its drawback is its poor scalability and complexity because of possible exponential number of paths to be simultaneously optimized [26] An effective technique is to embed timing graph/constraint through auxiliary vari-ables [11] The mathematical programming-based approach needs to deal with cell overlapping issues, e.g., through partitioning The path-based timing can also be evaluated in a simulated annealing framework [13]

Both net-based and path-based approaches have pros and cons Path-based in general has more accurate timing view and control, but it suffers from poor scalability The net-based approaches,

in particular netweighting, have low computational complexity and high flexibility Thus, they are suitable for large application specific integrated circuits/system-on-chip (ASIC/SOC) designs with millions of placeable objects Recent research shows that hybrid of these two basic approaches are promising [15]

21.3 NETWEIGHTING-BASED APPROACH

Classic placement algorithms optimize the TWL They can be easily modified to be timing-driven using the netweighting technique, which assigns different weights to different nets such that the placer minimizes the total weighted wirelength (if all the weights are the same, it degenerates into the classic wirelength-driven placement) Intuitively, a proper netweighting should assign higher weights on more timing-critical nets, with the hope that the placement engine will reduce the lengths

of these critical nets and thus their delays to achieve better overall timing

Netweighting-based TDP is very simple to implement and less computational intensive As mod-ern very large scale integration (VLSI) designs have millions placeable objects (gates/cells/macros), netweighting is attractive because of its simplicity Almost all placement algorithms support netweighting Quadratic placement can optimize the weighted quadratic wirelength, partition-based placement can optimize the weighted cutsize (see Chapter 8), and simulated-annealing-partition-based placement can optimize the weighted linear wirelength, etc

Although netweighting appears to be straightforward, it is not easy to generate a good netweight-ing Higher netweights on a set of critical nets in general shall reduce their wirelengths and delays, but other nets may become longer and more critical In this section, we will review two basic sets of netweighting algorithms: static netweighting and dynamic netweighting Static netweighting assigns weights once before TDP and the weights do not change during TDP Dynamic netweighting updates weights during the TDP process

21.3.1 STATICNETWEIGHTING

Static netweighting computes the netweights once before TDP It can be divided into two cate-gories: empirical netweighting and sensitivity-based netweighting Empirical netweighting methods

Trang 7

nets are assigned higher netweights Sensitivity-based netweighting computes weights based on the sensitivity analysis of netweights to factors such as WNS and FOM The key difference of these two netweighting schemes is that sensitivity-based approach has some look-ahead mechanism that can estimate the impact of netweighting on key factors Therefore, it assigns higher netweights on those nets that have bigger impacts on the overall timing closure goal

21.3.1.1 Slack-Based Netweighting

Empirical netweighting assigns netweight based on the critically of the net, which indicates how much the placer should reduce the wirelength on this net The criticality computation can be com-puted based on the static timing analysis (STA) Assuming there is only one clock period, the net criticality can often be measured by slack Nets with negative slacks are critical nets and are assigned higher netweights than those nets with positive slacks

W1, slack< 0

where W1and W2 are positive constants and W1 > W2 Among the critical nets that have negative slacks, higher weights can also be given to those which are more negative One can either use a continuous model [1] or a step-wise model [2] to compute weights based on the slack distribution,

as shown in Figure 21.3

It shall be noted that netweights shall not continue to increase when slack is less than certain threshold This is because slack can be very negative because of invalid timing assertions Usually, placers do not need very high netweights to pull nets together

For some placers, one might add an exponential component into netweighting [27] to further emphasize critical nets

1−slack

T

α

(21.6)

where

T is the longest path delay

α is the criticality exponent

If there are multiple clocks in the design, the clock cycle time can be considered during netweighting Nets on paths of shorter cycle time should have higher weights than those of longer

cycle time with the same slack For example, we can use T in Equation 21.6 to represent different cycle times Nets of long cycle clocks get a larger T than those of short cycle clocks.

FIGURE 21.3 Netweight assignment based on slack using continuous or step model.

Trang 8

1−slack

Tclk

α

(21.7)

where Tclkis the clock cycle time for a particular net

There are other empirical factors that can be considered for netweighting, e.g., path depth and driver strength [28] A deep path, which has many stages of logic, is more likely to have longer wirelength The endpoints of this path could be placed far away from each other, therefore worse timing is expected than a path with less number of stages A net with a weak driver would have longer delay than a strong driver with the same wirelength Therefore, netweight should be proportional to the longest path depth (which can be computed by running breadth first search twice: once from PIs and second time from POs), and inversely to slack and driver strength, i.e.,

where

Dlis the longest path depth

Rdis the driver resistance

A weaker driver has a larger effective driving resistance, thus the net it drives will have a bigger netweight

One could also consider path sharing during netweighting Intuitively, a net on many critical paths should be assigned a higher weight because reducing the length of such net can reduce the delay on many critical paths

where GP is the number of critical paths passing this net Suppose we assign two variables for each

net p on the timing graph: F (p) the number of different critical paths starting from timing beginning

point (PI) to net p; and B (p) the number of different critical paths from net p to timing endpoints

(PO) The total number of critical paths passing through net p is then GP (p) = F(p) × B(p) This

netweighting assignment only considers the sharing effect of critical paths, and each path has the same impact of the netweight Kong proposed an accurate, all path counting algorithm PATH [3], which considers both noncritical and critical paths during path counting It can properly scale the impact of all paths by their relative timing criticalities

To perform netweighting for unplaced designs, STA can use the wire load model, e.g., based

on fanout, to estimate the delay (compared to placed designs, STA can use the actual wire load

to compute delay) Normally, it is not accurate with wire load models Therefore, for an unplaced design, an alternative way of generating weights is to use fanout and delay bound [29] instead of slack Fanout is used to estimate wirelength and wire delay [30], and delay bound is the estimated allowable wire delay, i.e., any wire delay above this bound would result in negative slack The weight can be computed as the ratio of fanout and delay bound

In general, as the impact of netweight assignment is not very predictable, extensive parameter tuning may be needed to make it work on specific design styles

21.3.1.2 Sensitivity-Based Netweighting

Netweighting can help improve timing on critical paths However, it may have negative effects

on TWL Assigning higher netweights on too many nets may result in significant degradation of

Trang 9

only on those nets that will result in large gain in timing, we can reduce wirelength degradation and other side effects of netweighting

Sensitivity-based netweighting tries to predict the netweighting impact on timing and use that sensitivity to guide netweighting [4,31] The question that sensitivity analysis tries to address is as follows: Given an initial placement from an initial netweighting scheme, if we increase the weight

for a net i by certain nominal amount, how much improvement net i will get for its worst slack (WNS)

and the overall FOM (or in a more familiar term, TNS when the slack threshold for FOM is 0) With detailed sensitivity analysis, larger weights could be assigned to a net whose weight change can have

a larger impact to delay In this section, we will explain how to estimate both slack sensitivity and TNS sensitivity to netweights and how to use those sensitivities to compute netweights

First, one needs to estimate the impact of netweight change to wirelength, i.e the wirelength sensitivity to netweight This sensitivity depends on the characteristics of a placer It is not easy

to estimate such sensitivity for mincut or simulated-annealing-based algorithms But for quadratic placement, one can come up with an analytical model to estimate it Based on Tsay’s analytical model [6], the wirelength sensitivity to netweight can be derived [4] as

S L

Wsrc(i) + Wsink(i) − 2W(i)

where

L (i) is the initial wirelength of net i

W (i) is the initial weight of net i

Wsrc(i) is the total initial weight on the driver/source of net i (simply the summation of all nets

that intersect with the driver)

Wsink(i) is the total initial weight on the receiver/sink of net i

Intuitively, Equation 21.11 implies that if the initial wirelength L (i) is longer, for the same

amount of nominal weight change, it expects to see bigger wirelength change Meanwhile, if the

initial weight W (i) is relatively small, its expected wirelength change will be bigger The negative

sign means that increasing netweight will reduce wirelength

The next step is to estimate the wirelength impact on delay Using the switch level RC device model and the Elmore delay model [32], the delay sensitivity to wirelength can be estimated as

S T

where

r and c are the unit length wire resistance and capacitance, respectively

Rdis the output resistance of the net driver

C l is the load capacitance

It implies that for a given technology (fixed r and c), the delay of a long wire with a weak driver

and large load will be more sensitive to the same amount of wirelength change

With wirelength sensitivity and delay sensitivity, one can compute the slack sensitivity to netweight as

SSlkW (i) = Slk(i) W(i) = −T(i) L(i) ·W(i) L(i) = −S T

L (i)S L

Total negative slack is an important timing closure objective The TNS sensitivity to netweight

is defined as follows:

Trang 10

Note that TNS improvement comes from the delay improvement of this net, Equation 21.14 can be decomposed into

STNS

T(i)

Slk

where K (i) = TNS

T(i), which means how much TNS improvement it can achieve by reducing net delay

T (i) It has been shown in Ref [4] that K(i) is equal to the negative of the number of critical timing

endpoints whose slacks are influenced by net i with a nominal T(i) and can be computed efficiently

as shown in the following algorithm:

Algorithm 1Counting the number of influenced timing critical

endpoints for each net

1 decompose nets with multiple sink pins into sets

of driver-to-sink nets

3 sort all nets in topological order from timing end points

to timing start points

7 for allsink pinjof netido

As an example, Figure 21.4 shows two paths from a timing begin point P ito timing endpoints

P o1 and P o2 Net n3.1 and n3.2 are the decomposed driver-to-sink nets from the original net n3 The

pairs in the figure such as (−3, 1) have the following meaning: the first number is the slack, and the

second number is the K value Because the slacks at P o1 and P o2are−3ns and −2ns, respectively

(worse than the slack target of 0), the K values for P o1 and P o2 are both 1 We can see how the K values are propagated from PO to PI Note that for gate C, the upper input pin has slack of−2ns while the lower input pin has slack of−1ns, thus the upper pin is the most timing critical pin to gate

C and it will influence the slack of P o2 The lower pin of C does not influence P o2

The sensitivity-based netweighting scheme starts from a set of initial netweights (e.g., uniform netweighting at the beginning), and computes a new set of netweights that would maximize the slack

P i (- 3, 2)

n5

A

n3.2

( - 2, 1)

B

n3.1

( - 3, 1) ( - 3, 1)

n1 P o1

P o2

n4 ( - 1, 0)

C

n2

D

FIGURE 21.4 Counting the number of influenced timing endpoints.

Định dạng
Số trang	10
Dung lượng	196,2 KB