Handbook of Algorithms for Physical Design Automation


worst case, in which the victim net is fully coupled from both sides by two aggressor nets. Of course, more optimistic modeling, for example, modeling based on some distribution assumption, is also applicable. Nevertheless, the technical conclusions under such an assumption remain similar, so we focus on the worst-case scenario for ease of presentation. With the worst-case scenario, we have

$$c_c = \frac{2 l_v c_f}{d_{\min}}.$$

Furthermore, $2c_c$ is adopted in the model to account for the worst-case coupling effect, which occurs when all the aggressor nets have signal transitions different from that of the victim net; the Miller effect then makes the coupling more significant by doubling it. Again, the technique to be presented readily applies to other models with less pessimistic estimation.

Consider a wire $e = (u, v)$, where $u$ and $v$ are two nodes in a buffered tree. Let the length of the wire segment $e$ be $l_e$, and let $T(v)$ be the subtree rooted at $v$. $I_T(v)$ is the total downstream current seen at $v$, that is, the current induced by aggressor nets on the downstream wires of $v$. The current on a unit-length wire induced by aggressor nets is $i_0 = \lambda p c$ [24], where $c$ is the unit-length wire capacitance, $\lambda$ is the fixed ratio of coupling to total wire capacitance, $p$ is the slope (i.e., power supply voltage over input rise time) of all aggressor nets' signals, and $c_c$ is modeled as some fraction of the unit-length wire capacitance of the victim net. Let $\chi(u, v)$ be the noise on the wire segment between two neighboring buffers $u$ and $v$. The resulting noise $\chi(u, v)$ induced by the coupling current is the voltage pulse coupled from aggressor nets into the victim net for a wire segment $e = (u, v)$. Using an Elmore-delay-like noise metric [24] to model $\chi(u, v)$ (see Chapter 3), we can express the noise constraint as

$$\chi(u, v) = R_b I_T(u) + r l_e \left( \frac{i_0 l_e}{2} + I_T(v) \right) \le M_v, \qquad (33.1)$$

where $R_b$ is the output resistance of a minimum size buffer, $r$ is the unit-length wire resistance, $I_T(u) = i_0 l_e + I_T(v)$ is the total current seen at $u$, and $M_v$ is the noise margin for a buffer or a sink $v$, which is the maximum allowable noise without incurring any logic error.

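The width $W_{IFR^{(N)}_i}$ of the independent feasible region $IFR^{(N)}_i$ for the $i$th buffer that satisfies the noise constraint is given by

$$W_{IFR^{(N)}_i} \le \sqrt{\left(\frac{R_b}{r}\right)^2 + \left(\frac{I_T(v)}{i_0}\right)^2 + \frac{2M_v}{i_0 r}} - \frac{R_b}{r} - \frac{I_T(v)}{i_0}. \qquad (33.2)$$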
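The width bound can be checked numerically. The following is a minimal sketch, assuming that the total current seen at $u$ is the current induced on the wire itself plus $I_T(v)$; the function names and the parameter values in the usage note are illustrative and do not come from the chapter.

```python
import math


def noise(width, r_b, r, i0, i_t_v):
    """Noise chi(u, v) on a wire of length `width` driven by a buffer at u.

    Assumption: the current seen at u is the wire's own induced current
    (i0 * width) plus the downstream current i_t_v.
    """
    i_e = i0 * width
    i_t_u = i_e + i_t_v
    return r_b * i_t_u + r * width * (i_e / 2.0 + i_t_v)


def max_noise_width(r_b, r, i0, i_t_v, margin):
    """Largest wire length whose noise does not exceed `margin`.

    This is the positive root of the quadratic obtained by setting
    noise(width) equal to the noise margin.
    """
    a = r_b / r
    b = i_t_v / i0
    return math.sqrt(a * a + b * b + 2.0 * margin / (i0 * r)) - a - b
```

For example, with the hypothetical values `r_b=100.0`, `r=0.1`, `i0=1e-6`, `i_t_v=1e-4`, and `margin=0.8`, evaluating `noise` at the returned width reproduces the margin, which confirms that the bound is tight.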

For this noise model, the four factors that determine the size of a feasible region are the noise margin $M_v$, the buffer resistance $R_b$, the unit-length wire resistance $r$, and the crosstalk-induced unit current $i_0$. The feasible region under the noise constraint, denoted by $IFR^{(N)}_i$, is the maximum allowable length in each net satisfying the noise margins after buffer insertion. To estimate the feasible region under the noise constraint, $IFR^{(N)}_i$, the noise formulas [11] below can be applied. The induced noise current on wire segment $e = (u, v)$ is computed by $I_e = i_0 l_e$. To satisfy the noise constraint, a buffer can be inserted at $u$ as in Equation 33.1, where $W_{IFR^{(N)}_i}$, the width of the feasible region $IFR^{(N)}_i$ for buffers satisfying the noise constraint, is computed from Equation 33.2. Writing the segment length as $W_{IFR^{(N)}_i}$, the noise is

$$\chi(u, v) = R_b I_T(u) + r W_{IFR^{(N)}_i} \left( \frac{i_0 W_{IFR^{(N)}_i}}{2} + I_T(v) \right). \qquad (33.3)$$

Given two-pin nets as inputs, the method is to scan from the sink $s_i$ with the given $M_{s_i}$ to the source $s_0$. Because the accumulated crosstalk-induced current $I_T(v)$ is zero for pins of two-pin nets, the noise formula is given by

$$\begin{aligned}
\chi(u, v) &= R_b I_T(u) + r W_{IFR^{(N)}_i} \left( \frac{I_e}{2} + I_T(v) \right) \\
&= R_b \left( I_e + I_T(v) \right) + r W_{IFR^{(N)}_i} \left( \frac{I_e}{2} + I_T(v) \right) \\
&= R_b I_e + r W_{IFR^{(N)}_i} \frac{I_e}{2}.
\end{aligned} \qquad (33.4)$$

FIGURE 33.9 Respective feasible regions $IFR^{(D)}_i$, $IFR^{(N)}_i$, and $IFR^{(D)}_i \cap IFR^{(N)}_i$ for inserting a buffer that satisfy the delay, noise, and both delay and noise constraints. (The figure marks the two regions, their intersection, an obstacle, and the sink.)

I e

On the basis of Equations 33.3 and 33.4, $W_{IFR^{(N)}_i}$ can be computed by

$$W_{IFR^{(N)}_i} \le \sqrt{\left(\frac{R_b}{r}\right)^2 + \frac{2M_v}{i_0 r}} - \frac{R_b}{r}.$$

In the preceding equation, $W_{IFR^{(N)}_i}$ is the maximum length from the next buffer $B_{i+1}$ back to $B_i$ without causing any logic error.

To handle the transition time, delay, and noise constraints simultaneously, we first compute the respective feasible regions $IFR^{(R)}_i$, $IFR^{(D)}_i$, and $IFR^{(N)}_i$ for inserting buffer $i$ to satisfy the transition time, delay, and noise constraints, and then find the intersection of $IFR^{(R)}_i$, $IFR^{(D)}_i$, and $IFR^{(N)}_i$ to derive the feasible region for buffer $i$ that meets all these constraints (see Figure 33.9 for an illustration). Furthermore, the buffer block planning algorithm presented in Section 33.3.3.3 still works by additionally considering the noise constraint.

33.4 FLIP-FLOP AND BUFFER PLANNING (WIRE RETIMING)

Although buffer insertion is very effective in improving the delay performance (and noise tolerance) of interconnects, the timing constraints may be so tight that they are beyond the maximum performance deliverable by buffer insertion, making the insertion of flip-flops or latches for pipelined signal transmission necessary. In the case of modern high-performance microprocessors [13], it is not unusual for global signals to take several clock cycles to travel across the chip to reach their destinations. In fact, the wire delay can be as long as about ten clock cycles in the near future [25].

It has been shown in Ref. [26] that under an aggressive scaling scenario where the frequency of microprocessors approximately doubles and die size increases by about 25 percent in every process generation, the number of flip-flops (referred to as clocked repeaters) increases by 7 times every process generation.

As the number of flip-flops and buffers increases in an exponential fashion, the planning and design of pipelined interconnects are very important emerging problems. Several design challenges can be posed:

1. What is the minimum latency required between two communicating functional blocks of a design?

2. Given the latency constraints between two communicating functional blocks of a design, where should flip-flops and buffers be inserted to minimize, for example, the total flip-flop and buffer area?

3. How does interconnect latency affect the system behavior? Arbitrary interconnect latency may destroy the functionality of a sequential circuit. How can functional blocks and interconnects be simultaneously retimed to achieve the desired circuit performance while maintaining its functionality?

4. How can buffer planning take into consideration the retiming of logic blocks and interconnects, as well as the placement of those flip-flops relocated by retiming?

33.4.1 MINIMIZING LATENCY

In the initial stages of the design of high-performance microarchitectures, the minimum latency that can be achieved on long interconnects gives microarchitects and circuit designers an accurate prediction of the timing and routing demands required of the design. There are two approaches to the problem of latency minimization: (1) using analytical formulas [27]; and (2) using a van Ginneken-style dynamic programming approach [26].

33.4.1.1 Two-Pin Net Optimization Using Analytical Formulas

Consider a wire with length $L$, driver resistance $R_d$, and sink capacitance $C_s$. On the basis of the optimal delay formula obtained when we insert $n$ buffers into the wire [6], the optimal delay for an interconnect properly inserted with an ideal optimal number of buffers is

$$\begin{aligned}
D_{opt}(L) = {} & \left( R_b c + r C_b + \sqrt{2rc(R_b C_b + T_b)} \right) \cdot L + (l_r + l_c) \cdot \sqrt{2rc(R_b C_b + T_b)} \\
& + l_r r C_b + l_c R_b c - \frac{rc}{2}\left(l_r^2 + l_c^2\right) - T_b,
\end{aligned}$$

where

$$l_r = \frac{R_d - R_b}{r}, \qquad l_c = \frac{C_s - C_b}{c}.$$

Here, the ideal optimal number of buffers is defined as

$$n_{opt}(L) = \sqrt{\frac{rc}{2(R_b C_b + T_b)}} \cdot (L + l_r + l_c) - 1,$$

which may not be an integer. Therefore, the maximum length of a wire inserted with the ideal optimal number of buffers that can meet a given delay constraint $D_{tgt}$ is

$$L_{\max}(R_d, C_s, D_{tgt}) = \frac{D_{tgt} + T_b + \frac{rc}{2}\left(l_r^2 + l_c^2\right) - l_r r C_b - l_c R_b c - (l_r + l_c)\sqrt{2rc(R_b C_b + T_b)}}{R_b c + r C_b + \sqrt{2rc(R_b C_b + T_b)}}.$$
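The three closed-form expressions above can be evaluated together. The sketch below is illustrative only: the `Tech` container and the function names are hypothetical, $C_b$ and $T_b$ are assumed to be the buffer input capacitance and intrinsic delay, and the maximum length is obtained by solving the optimal delay formula for $L$.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tech:
    """Hypothetical container for the wire and buffer parameters."""
    r: float    # unit-length wire resistance
    c: float    # unit-length wire capacitance
    r_b: float  # buffer output resistance
    c_b: float  # buffer input capacitance (assumed meaning of C_b)
    t_b: float  # buffer intrinsic delay (assumed meaning of T_b)


def _terms(tech, r_d, c_s):
    """Return (l_r, l_c, sqrt(2rc(R_b C_b + T_b)))."""
    l_r = (r_d - tech.r_b) / tech.r
    l_c = (c_s - tech.c_b) / tech.c
    s = math.sqrt(2.0 * tech.r * tech.c * (tech.r_b * tech.c_b + tech.t_b))
    return l_r, l_c, s


def d_opt(tech, length, r_d, c_s):
    """Optimal delay with the ideal (possibly fractional) buffer count."""
    l_r, l_c, s = _terms(tech, r_d, c_s)
    slope = tech.r_b * tech.c + tech.r * tech.c_b + s
    return (slope * length + (l_r + l_c) * s
            + l_r * tech.r * tech.c_b + l_c * tech.r_b * tech.c
            - 0.5 * tech.r * tech.c * (l_r ** 2 + l_c ** 2) - tech.t_b)


def n_opt(tech, length, r_d, c_s):
    """Ideal optimal number of buffers; may not be an integer."""
    l_r, l_c, _ = _terms(tech, r_d, c_s)
    k = tech.r_b * tech.c_b + tech.t_b
    return math.sqrt(tech.r * tech.c / (2.0 * k)) * (length + l_r + l_c) - 1.0


def l_max(tech, r_d, c_s, d_tgt):
    """Longest wire meeting d_tgt, found by solving d_opt(L) = d_tgt for L."""
    l_r, l_c, s = _terms(tech, r_d, c_s)
    slope = tech.r_b * tech.c + tech.r * tech.c_b + s
    numerator = (d_tgt + tech.t_b
                 + 0.5 * tech.r * tech.c * (l_r ** 2 + l_c ** 2)
                 - l_r * tech.r * tech.c_b - l_c * tech.r_b * tech.c
                 - (l_r + l_c) * s)
    return numerator / slope
```

Because `d_opt` is linear in the length, `l_max` is its exact inverse: substituting the returned length back into `d_opt` yields the target delay.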

Although the ideal optimal number of buffers $n_{opt}$ may not be an integer, which is not realizable, the actual optimal number of buffers of the interconnect is either $\lfloor n_{opt} \rfloor$ or $\lceil n_{opt} \rceil$. Let $L_N(n)$ denote the maximal length for an interconnect $N$ with $n$ buffers under a given timing requirement $D_{tgt}$. ($L_N(n)$ can be obtained by solving for $L$ in the optimal delay formula for a given $n$ and $D_{tgt}$.) The maximum wire length of the interconnect inserted with buffers that can meet a given target delay $D_{tgt}$ is

$$L_{\max}(R_d, C_s, D_{tgt}) = \max\{L_N(\lfloor n_{opt} \rfloor), L_N(\lceil n_{opt} \rceil)\}.$$

With flip-flops inserted, we have to define target delays for the first segment, the middle segments, and the last segment of the pipelined interconnects separately. The timing constraint for any middle segment, denoted $D_{tgt,M}$, is the clock period less the setup time and the flip-flop propagation delay. The timing constraint for the first segment, denoted $D_{tgt,F}$, should ensure that the maximum delay from those source flip-flops before the driver to the first flip-flop along the pipelined interconnect is smaller than one clock period less the setup time and the flip-flop propagation delay. Similarly, the timing constraint for the last segment, denoted $D_{tgt,L}$, should ensure that the maximum delay from the last flip-flop along the pipelined interconnect to the flip-flops after the sink is smaller than one clock period less the setup time and the flip-flop propagation delay. Therefore, the minimum latency or the least number of flip-flops required to meet the delay and clock period constraints is

$$N_{FF} = \begin{cases}
0 & \text{if } L \le L_{\max}(R_d, C_s, D_{tgt}), \\
1 & \text{if } L_{\max}(R_d, C_s, D_{tgt}) < L \le L_L + L_F, \\
\left\lceil \dfrac{L - L_F - L_L}{L_M} \right\rceil + 1 & \text{otherwise,}
\end{cases}$$

where

$$L_F = L_{\max}(R_d, C_F, D_{tgt,F}),$$

$$L_L = L_{\max}(R_F, C_s, D_{tgt,L}),$$

$$L_M = L_{\max}(R_F, C_F, D_{tgt,M}),$$

with $R_F$ and $C_F$ being respectively the output resistance and input capacitance of a flip-flop.
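The case analysis for the minimum number of flip-flops translates directly into code. This is a small sketch in which the four maximum lengths are assumed to have been computed beforehand; the function and argument names are illustrative.

```python
import math


def min_flip_flops(length, l_max_unpipelined, l_first, l_last, l_middle):
    """Least number of flip-flops for a wire of the given length.

    l_max_unpipelined: longest wire that meets the target delay without
                       any flip-flop.
    l_first, l_last, l_middle: longest feasible first, last, and middle
                       segments of a pipelined wire (L_F, L_L, L_M).
    """
    if length <= l_max_unpipelined:
        return 0
    if length <= l_first + l_last:
        return 1
    # n flip-flops leave n - 1 middle segments, each at most l_middle long.
    return math.ceil((length - l_first - l_last) / l_middle) + 1
```

The ceiling reflects that $n$ flip-flops split the wire into one first segment, one last segment, and $n - 1$ middle segments, so $L \le L_F + L_L + (n - 1)L_M$ must hold.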

In the context of flip-flop and buffer planning, of greater interest are the feasible regions (or independent feasible regions) of flip-flops and buffers. Let $n$ be the number of flip-flops inserted in an interconnect and $f_i$ be the location of the $i$th ($1 \le i \le n$) flip-flop. With $f_i^*$ denoting the central location of the $i$th flip-flop in its feasible region, and $W_{FR}$ the uniform width of the feasible regions, we define the FR for the $i$th flip-flop as

$$FR_i = \left( f_i^* - \frac{W_{FR}}{2},\; f_i^* + \frac{W_{FR}}{2} \right) \cap (0, L),$$

such that for all $(f_1, f_2, \ldots, f_i, \ldots, f_n) \in FR_1 \times FR_2 \times \cdots \times FR_n$, $f_1 \le L_F$, $f_i - f_{i-1} \le L_M$ for $2 \le i \le n$, and $L - f_n \le L_L$.

The following inequalities must hold for a flip-flop solution to be feasible:

$$f_1^* + W_{FR}/2 \le L_F, \quad f_i^* - f_{i-1}^* + W_{FR} \le L_M \text{ for } 2 \le i \le n, \quad \text{and} \quad L - f_n^* + W_{FR}/2 \le L_L.$$

The largest $W_{FR}$ that satisfies these inequalities is

$$W_{FR} = \frac{L_F + L_L + (n - 1)L_M - L}{n}.$$

Correspondingly, the central locations $f_i^*$ are

$$f_i^* = L_F + (i - 1)L_M - (i - 1/2)W_{FR} \quad \text{for } 1 \le i \le n.$$
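As an illustration of the width and center formulas, the sketch below computes the uniform width and the end points of each region. It assumes the central locations are $L_F + (i - 1)L_M - (i - 1/2)W_{FR}$, which is the choice that makes all three feasibility inequalities tight; the function name is hypothetical.

```python
def flip_flop_feasible_regions(length, n, l_first, l_last, l_middle):
    """Return (width, regions) for n flip-flops on a wire of given length.

    Each region is a (low, high) pair clipped to the wire extent
    [0, length]. Raises ValueError if n flip-flops cannot meet the
    segment limits.
    """
    width = (l_first + l_last + (n - 1) * l_middle - length) / n
    if width < 0:
        raise ValueError("n flip-flops are not enough for this wire length")
    regions = []
    for i in range(1, n + 1):
        center = l_first + (i - 1) * l_middle - (i - 0.5) * width
        low = max(0.0, center - width / 2)
        high = min(length, center + width / 2)
        regions.append((low, high))
    return width, regions
```

For a wire of length 10 with two flip-flops, $L_F = L_L = 4$, and $L_M = 5$, the width is 1.5 and the regions are (2.5, 4.0) and (6.0, 7.5). Any pair of locations drawn from those regions keeps the first segment within $L_F$, the middle segment within $L_M$, and the last segment within $L_L$.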

The independent feasible regions of flip-flops and buffers can also be determined in a fairly straightforward fashion [27]. With the definition of feasible regions of flip-flops in place, the buffer planning algorithms outlined in preceding sections can be easily extended to handle the latency minimization problem.

33.4.1.2 Multiple-Terminal Net Optimization

In the case of two-pin net optimization (Section 33.4.1.1), the planning can be carried out without first performing routing. In the case of multiple-terminal net optimization, the assumption is that the routing solution of global nets is known. In the context of design migration, this is typically true, because the microarchitects and circuit designers would like to make minimal changes to the design. A natural algorithm to adopt would be that of van Ginneken [14].

In Ref. [26], each flip-flop and buffer insertion solution can be represented by a four-tuple $\gamma = (c, r, \lambda, a)$, where $c$ is the capacitance seen by the upstream resistance, $r$ is the required arrival time, $\lambda$ is the maximum number of flip-flops crossed when going from this node (or edge) to its leaf nodes, and $a$ is the flip-flop or buffer assignment at this node. For simplicity, we assume that long edges are segmented properly and that flip-flop and buffer insertion is allowed only at nodes.

At a leaf node $v$, the solution is $(c_v, r_v, 0, \emptyset)$, where $c_v$ is the sink capacitance, and $r_v$ is the required arrival time at node $v$. The propagation of a solution from a node to its parent edge (the edge connecting the node to its parent node) proceeds as in the dynamic programming algorithm of Ref. [14]. Let the node solution at node $v$ be $(c_v, r_v, \lambda_v, a_v)$. The corresponding solution at the upstream node of the branch $(u, v)$ is $(c_v + C_{u,v},\; r_v - R_{u,v}(C_{u,v} + c_v),\; \lambda_v,\; \emptyset)$, where $C_{u,v}$ is the edge capacitance and $R_{u,v}$ is the edge resistance. When two downstream branches meet at a parent node, we merge two solutions $(c_u, r_u, \lambda_u, a_u)$ and $(c_v, r_v, \lambda_v, a_v)$ from the two branches to form $(c_u + c_v,\; \min(r_u, r_v),\; \max(\lambda_u, \lambda_v),\; a_u \cup a_v)$. When we insert a buffer $g$ to drive a subtree with solution $(c_u, r_u, \lambda_u, a_u)$, the new solution is $(c_g,\; r_u - R_g c_u - t_g,\; \lambda_u,\; \{g\})$, where $c_g$ is the gate capacitance of $g$, $R_g$ is the output resistance of $g$, and $t_g$ is the intrinsic delay of $g$. When we add a flip-flop $f$ to drive the subtree instead, the new solution is $(c_f,\; T_{CP} - t_{su,f},\; \lambda_u + 1,\; \{f\})$, where $c_f$ is the gate capacitance of $f$, $T_{CP}$ is the clock period, and $t_{su,f}$ is the setup time of $f$. Note that when we insert a flip-flop, we have to first verify that the pipeline stage immediately after the newly inserted flip-flop has nonnegative slack or required arrival time.
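The four solution-propagation rules can be written as small pure functions. The following is a sketch rather than the implementation of Ref. [26]: the `Solution` class and function names are illustrative, and the slack check for a flip-flop uses a hypothetical flip-flop output resistance `r_f` and clock-to-output delay `t_f`, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    c: float       # capacitance seen by the upstream resistance
    r: float       # required arrival time
    lam: int       # maximum number of flip-flops crossed to any leaf
    a: frozenset   # flip-flop or buffer assignment at this node


def leaf(c_v, r_v):
    return Solution(c_v, r_v, 0, frozenset())


def add_wire(s, r_uv, c_uv):
    """Propagate a node solution across its parent edge (u, v)."""
    return Solution(s.c + c_uv, s.r - r_uv * (c_uv + s.c), s.lam, frozenset())


def merge(s_u, s_v):
    """Combine the solutions of two branches meeting at a parent node."""
    return Solution(s_u.c + s_v.c, min(s_u.r, s_v.r),
                    max(s_u.lam, s_v.lam), s_u.a | s_v.a)


def add_buffer(s, name, c_g, r_g, t_g):
    """Insert buffer `name` to drive the subtree described by s."""
    return Solution(c_g, s.r - r_g * s.c - t_g, s.lam, frozenset({name}))


def add_flip_flop(s, name, c_f, t_cp, t_su, r_f, t_f):
    """Insert flip-flop `name`, or return None if the stage it drives fails.

    Assumption: the downstream stage has nonnegative slack when
    s.r - r_f * s.c - t_f >= 0 (r_f and t_f are hypothetical parameters).
    """
    if s.r - r_f * s.c - t_f < 0:
        return None
    return Solution(c_f, t_cp - t_su, s.lam + 1, frozenset({name}))
```

A bottom-up traversal would apply `add_wire` on each edge, `merge` at each branching node, and then generate the unbuffered, buffered, and flip-flop candidates at each legal insertion node.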

As in van Ginneken's algorithm, it is important to prune all solutions so as to keep only noninferior solutions that can lead to an optimal solution at the root node. Let $\gamma = (c, r, \lambda, a)$ and $\gamma' = (c', r', \lambda', a')$ be two solutions at any node in the tree. We say that $\gamma'$ is inferior and can be pruned if at least one of the following is true:

• $\lambda' = \lambda$, $c' \ge c$, and $r' < r$

• $\lambda' = \lambda$, $c' > c$, and $r' = r$

• $\lambda' = \lambda$, $c' = c$, $r' = r$, and $cost(\gamma') > cost(\gamma)$, where $cost(\cdot)$ is a user-specified cost function associated with the flip-flop and buffer solution; an example of the cost function is the total area of the solution

• $\lambda' > \lambda$, $c' \ge c$, and $r' \le r$

Also note that all solutions kept in the algorithm have nonnegative $r$.
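The pruning rules can be sketched as a pairwise test followed by a filter. The code below assumes that the rules compare a candidate solution (the one that may be pruned) against another solution at the same node; tuples of the form `(c, r, lam, a)` and the function names are illustrative.

```python
def is_inferior(cand, other, cost):
    """Return True if `cand` is inferior to `other` under the four rules.

    Both arguments are tuples (c, r, lam, a); `cost` maps a solution to a
    number, for example its total area.
    """
    c2, r2, lam2 = cand[0], cand[1], cand[2]
    c1, r1, lam1 = other[0], other[1], other[2]
    if lam2 == lam1 and c2 >= c1 and r2 < r1:
        return True
    if lam2 == lam1 and c2 > c1 and r2 == r1:
        return True
    if lam2 == lam1 and c2 == c1 and r2 == r1 and cost(cand) > cost(other):
        return True
    if lam2 > lam1 and c2 >= c1 and r2 <= r1:
        return True
    return False


def prune(solutions, cost):
    """Drop solutions with negative required time, then inferior ones."""
    kept = [s for s in solutions if s[1] >= 0]
    return [s for s in kept
            if not any(is_inferior(s, t, cost) for t in kept if t is not s)]
```

This quadratic filter is chosen for clarity; a production implementation would normally keep the candidate list sorted by capacitance for each latency value so that pruning takes a single pass.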

33.4.2 LATENCY CONSTRAINED OPTIMIZATION

Suppose the required latency at leaf node $v$ is $\lambda_v$ (assuming that the latency at the root node is zero). We can generalize the algorithm given in Section 33.4.1.2 by using $\gamma = (c_v, r_v, -\lambda_v, \emptyset)$ at $v$. The algorithm in Section 33.4.1.2 can then be applied to compute an optimal solution to the latency constrained optimization problem with a minor modification: any solution that has a latency greater than zero can be pruned [26].

As the required latency at the root node is zero, only solutions that have zero latency would be feasible. Consequently, at the root node, if a solution has a negative latency $\lambda$, more flip-flops can always be added to make the solution feasible, that is, to make the latency at the root node equal zero. As we search top-down to retrieve an optimal solution at all nodes, we might have to insert more flip-flops. Consider the solution $(c_u + c_v,\; \min(r_u, r_v),\; \max(\lambda_u, \lambda_v),\; a_u \cup a_v)$ obtained by merging solutions $(c_u, r_u, \lambda_u, a_u)$ and $(c_v, r_v, \lambda_v, a_v)$ of two downstream branches. If $\lambda_u = \max(\lambda_u, \lambda_v)$, an additional $\lambda_u - \lambda_v$ flip-flops should be inserted into the branch that contains the solution $(c_v, r_v, \lambda_v, a_v)$.

33.4.3 WIRE RETIMING

Unfortunately, long wires cannot be pipelined in isolation. It is important to consider the effect of interconnect latency on overall system behavior. Relocation of flip-flops to pipeline logic paths while preserving the functionality of the circuit is known as retiming [28]. However, traditional retiming approaches ignore interconnect delay. In modern-day designs, it is imperative to consider the problem of retiming with both interconnect and gate delays [29–31].

In the context of retiming, a sequential circuit can be represented by a directed graph $G_R(V_R, E_R)$, where each node $v \in V_R$ corresponds to a combinational gate, and each directed edge $e_{uv} \in E_R$ connects the output of gate $u$ to the input of gate $v$, through a nonnegative number of registers. Without loss of generality, $G_R$ can be assumed to be strongly connected; fictitious nodes and edges can be added to make it strongly connected otherwise. Let $d_u$ be the gate delay of node $u$, $w_{uv}$ the number of flip-flops of edge $e_{uv}$, and $d_{uv}$ the interconnect delay of edge $e_{uv}$ if all the flip-flops are removed. Although it is hard to accurately model interconnect delay, it is fairly accurate to assume that the delay of a wire is linearly proportional to its length for the following reasons: when a wire is short, the linear component of the wire delay dominates the quadratic component; for a long wire, buffers inserted at appropriate locations can render the delay linear.

The retiming problem can be viewed as one of determining a labeling of the nodes $r : V_R \to Z$, where $Z$ is the set of integers [28], such that $w_{uv} + r(v) - r(u) \ge 0$ for all edges $e_{uv} \in E_R$. The retiming label $r(v)$ of node $v$ represents the number of flip-flops moved from its outputs to its fan-ins, and $\hat{w}_{uv} = w_{uv} + r(v) - r(u)$ denotes the number of flip-flops on edge $e_{uv}$ after retiming. Retiming can be formulated as a problem of determining a feasible retiming solution for a given clock period, that is, a solution in which the number of flip-flops on every edge is nonnegative and the timing constraints below hold for that clock period. The minimum achievable clock period $T_{CP}^*$ can then be computed by performing a binary search.

A feasible retiming solution for a given clock period $T_{CP}$ must satisfy the following set of constraints [30]:

$$\begin{aligned}
d_v &\le a(v) && \forall\, v \in V_R, \\
a(v) &\le T_{CP} && \forall\, v \in V_R, \\
w_{uv} + r(v) - r(u) &\ge 0 && \forall\, e_{uv} \in E_R, \\
a(v) &\ge a(u) + d_{uv} + d_v - T_{CP}\left[ w_{uv} + r(v) - r(u) \right] && \forall\, e_{uv} \in E_R.
\end{aligned}$$

Here, $a(v)$ represents the maximum arrival time at the output of gate $v$ from a flip-flop that directly drives the logic path containing $v$. The first two constraints are fairly straightforward. The third constraint is required for a feasible retiming solution. The fourth constraint ensures that sufficient flip-flops are inserted along each edge $e_{uv}$ for the circuit to be operable at a clock period of $T_{CP}$. Every flip-flop along the edge $e_{uv}$ after retiming reduces the right-hand side of the inequality by $T_{CP}$.
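A candidate retiming and arrival-time assignment can be checked against the four constraints directly. The sketch below is illustrative; the dictionary-based graph encoding, the function name, and the numerical tolerance are assumptions and not part of Ref. [30].

```python
def is_feasible_retiming(nodes, edges, r, a, t_cp, tol=1e-9):
    """Check the four retiming constraints for clock period t_cp.

    nodes: {v: d_v}              gate delay of each node
    edges: {(u, v): (w_uv, d_uv)} flip-flop count and wire delay per edge
    r:     {v: integer retiming label}
    a:     {v: arrival time at the output of v}
    """
    for v, d_v in nodes.items():
        # Constraints 1 and 2: d_v <= a(v) <= T_CP.
        if a[v] < d_v - tol or a[v] > t_cp + tol:
            return False
    for (u, v), (w_uv, d_uv) in edges.items():
        w_hat = w_uv + r[v] - r[u]
        # Constraint 3: nonnegative flip-flop count after retiming.
        if w_hat < 0:
            return False
        # Constraint 4: each flip-flop on the edge buys one clock period.
        if a[v] < a[u] + d_uv + nodes[v] - t_cp * w_hat - tol:
            return False
    return True
```

For a two-gate ring with unit gate delays, wire delays 2 and 1, and a single flip-flop on the return edge, the total cycle delay is 5, so the check passes for a clock period of 5 and fails for a clock period of 4 with the same arrival times.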

By introducing a variable $R(v)$ defined as $a(v)/T_{CP} + r(v)$ at each node $v$, the preceding set of constraints can be transformed into a set of difference constraints as follows [30]:

$$R(v) - r(v) \ge \frac{d_v}{T_{CP}} \qquad \forall\, v \in V_R, \qquad (33.5)$$

$$R(v) - r(v) \le 1 \qquad \forall\, v \in V_R, \qquad (33.6)$$

$$r(v) - r(u) \ge -w_{uv} \qquad \forall\, e_{uv} \in E_R, \qquad (33.7)$$

$$R(v) - R(u) \ge \frac{d_{uv}}{T_{CP}} + \frac{d_v}{T_{CP}} - w_{uv} \qquad \forall\, e_{uv} \in E_R. \qquad (33.8)$$

These difference constraints involve $|V_R|$ real variables $R(v)$, $|V_R|$ integer variables $r(v)$, and $2|V_R| + 2|E_R|$ constraints, and can be solved in $O(|V_R||E_R| \log |V_R| + |V_R|^2 \log^2 |V_R|)$ time, using a Fibonacci heap as the data structure [32].

Given a feasible retiming solution, the exact positions at which flip-flops should be inserted can be determined as follows: For each edge $e_{uv}$ with nonzero $\hat{w}_{uv}$, the first flip-flop on this edge is inserted at a distance that corresponds to a delay of $T_{CP} - a(u)$ from the output of gate $u$. Other flip-flops are inserted at a distance that corresponds to a delay of $T_{CP}$ from the previous one, until gate $v$ is reached. All remaining flip-flops on this edge are then inserted right before $v$.
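The placement rule for one edge can be sketched as a short loop. Positions are expressed as delays measured from the output of gate $u$; the function name is illustrative, and a flip-flop "right before $v$" is represented by the position $d_{uv}$.

```python
def flip_flop_delay_positions(a_u, d_uv, w_hat, t_cp):
    """Delay offsets (from the output of u) of the w_hat flip-flops on e_uv.

    The first flip-flop sits t_cp - a_u after u, later ones follow at
    intervals of t_cp, and any flip-flop that would fall at or beyond v
    is placed right before v (offset d_uv).
    """
    positions = []
    next_pos = t_cp - a_u
    for _ in range(w_hat):
        if next_pos < d_uv:
            positions.append(next_pos)
            next_pos += t_cp
        else:
            positions.append(d_uv)
    return positions
```

Because wire delay is assumed to be linear in length, each delay offset maps to a physical distance along the wire by a constant factor.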

A fast approximation algorithm can be obtained by first replacing each gate by a wire of the same delay, and then solving optimally and efficiently the retiming problem with only interconnect delays [30]. The key to the fast approximation algorithm is the observation that for a directed graph where $d_v = 0$ for all $v \in V_R$, given $R(v)$ for all $v \in V_R$ that satisfy the constraint in Equation 33.8, the set of difference constraints can be satisfied by setting $r(v) = \lfloor R(v) \rfloor$ for all $v \in V_R$. The problem of finding $R(v)$ for all $v \in V_R$ to satisfy the constraint given in Equation 33.8 can be posed as a single-source longest-paths problem on $G_R$ with the cost or length of each edge $e_{uv} \in E_R$ defined as $d_{uv}/T_{CP} - w_{uv}$. Any node in $G_R$ can be the source node as the graph is strongly connected. If $G_R$ has a positive cycle, the clock period $T_{CP}$ is infeasible. The single-source longest-paths problem can be solved by the Bellman–Ford algorithm in $O(|V_R||E_R|)$ time. With a path compaction preprocessing step to reduce the size of $G_R$, the complexity can be further reduced.
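The longest-paths formulation can be sketched with a textbook Bellman–Ford loop. This is not the implementation of Ref. [30]: it omits path compaction, uses exact rational arithmetic so that rounding is reliable, takes the retiming label as the floor of the longest-path value (the integer choice that keeps $0 \le R(v) - r(v) \le 1$ when gate delays are zero), and assumes every node is reachable from the chosen source. The function name and graph encoding are illustrative.

```python
import math
from fractions import Fraction


def retime_interconnect_only(nodes, edges, t_cp, source):
    """Retiming labels for a graph whose gate delays are all zero.

    nodes: iterable of node names
    edges: {(u, v): (w_uv, d_uv)} flip-flop count and wire delay per edge
    Returns {v: r(v)}, or None if t_cp is infeasible (positive cycle).
    """
    nodes = list(nodes)
    # Edge length for the longest-paths problem: d_uv / T_CP - w_uv.
    length = {e: Fraction(d) / Fraction(t_cp) - w
              for e, (w, d) in edges.items()}
    dist = {v: None for v in nodes}
    dist[source] = Fraction(0)

    def relax_all():
        changed = False
        for (u, v), l in length.items():
            if dist[u] is None:
                continue
            if dist[v] is None or dist[u] + l > dist[v]:
                dist[v] = dist[u] + l
                changed = True
        return changed

    for _ in range(len(nodes) - 1):
        if not relax_all():
            break
    # Any further improvement means a positive cycle exists.
    if relax_all():
        return None
    return {v: math.floor(dist[v]) for v in nodes}
```

On a three-node ring with wire delays 3, 3, and 2 and both flip-flops initially on the last edge, a clock period of 4 is feasible and the labels move one flip-flop onto the middle edge, while a clock period of 3 produces a positive cycle and is rejected.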

Given a retiming solution for a graph with only interconnect delays, if the solution retimes some flip-flops into a wire that represents a gate, a postprocessing step is required to get back a feasible retiming solution that has both gate and interconnect delays. First, we move the flip-flops in a gate to its fan-ins or fan-outs depending on which direction has a shorter distance (delay). A linear program is then used to determine the exact positions of the flip-flops on the interconnect edges. The objective of the linear program is to minimize the clock period $T_{CP}$ subject to constraints on the flip-flop counts and constraints on the delays between flip-flops. Let $x^k_{uv}$ denote the delay from the $k$th flip-flop to the $(k + 1)$st flip-flop of the wire from node $u$ to node $v$ in $G_R$, for $k = 0, 1, \ldots, \hat{w}_{uv}$. The linear program is formulated as follows:

Minimize $T_{CP}$

subject to

$$\begin{aligned}
\sum_{k=0}^{\hat{w}_{uv}} x^k_{uv} &= d_{uv} && \forall\, e_{uv} \in E_R, \\
x^{\hat{w}_{uv}}_{uv} + d_v &\le a(v) && \forall\, e_{uv} \in E_R \text{ s.t. } \hat{w}_{uv} > 0, \\
a(u) + x^0_{uv} &\le T_{CP} && \forall\, e_{uv} \in E_R \text{ s.t. } \hat{w}_{uv} > 0, \\
a(u) + d_{uv} &\le a(v) && \forall\, e_{uv} \in E_R \text{ s.t. } \hat{w}_{uv} > 0.
\end{aligned}$$

33.4.4 AREA CONSTRAINED WIRE RETIMING

To account for the area overhead incurred by wire retiming during the planning stage, a more closely related problem is that of minimum-area retiming. To render conventional minimum-area retiming applicable to interconnects, each long interconnect can be represented as a series of interconnect units, each of which has delay but performs no logic function. A natural segmentation of an interconnect can be obtained by buffer insertion, with each interconnect unit being a buffer driving an interconnect segment.

Although minimum-area retiming is optimal in terms of overall area consumption, it may not be directly applicable to interconnect retiming and planning. To minimize the total area consumption, it may relocate flip-flops from regions with a lot of empty space to overcongested regions. That may result in area constraint violations in a given floorplan, necessitating iterations of floorplanning and interconnect planning. Therefore, for interconnect retiming and planning, it is necessary to consider local area constraints such that both the timing and the impact on the floorplan of the relocated flip-flops can be taken into account. In Ref. [29], a new retiming problem, called the local area constrained (LAC) retiming problem, has been formulated with the following three sets of constraints, of which the first two are typical of the retiming problem [28] and the third captures the local area constraints:

1. Edge weights must be nonnegative:

$$r(v) - r(u) \ge -w(e_{u,v}), \quad \forall\, e_{u,v} \in E_R.$$

2. For any path $u \leadsto v$ whose delay (along successive combinational logic paths) is larger than the clock period $T_{CP}$, there should be at least one flip-flop on it after retiming:

$$r(v) - r(u) \ge -W(u, v) + 1, \quad \forall\, u \leadsto v,\; D(u, v) > T_{CP},$$

where $W(u, v)$ defines the minimum latency for a signal to transfer from $u$ to $v$ before retiming, and $D(u, v)$ is the maximum delay (of successive combinational logic paths) of the logic path from $u$ to $v$ with the minimum latency $W(u, v)$.

3. To define the local area constraints, we let $F$ be the set of all functional units, $V_T$ be the set of all tiles, and for any $t_i \in V_T$, $C(t_i)$ be the remaining capacity (after buffer insertion) that is available for flip-flop insertion. The function $P : F \to V_T$ maps each functional unit $v \in F$ to a tile $t_i \in V_T$ such that $P(v) = t_i$ means that functional unit or interconnect unit $v$ is in tile $t_i$ of the floorplan. The local area constraint of a tile requires that

$$\sum_{P(u) = t_i,\; e_{u,v} \in E_R} \left( w(e_{u,v}) + r(v) - r(u) \right) \le C(t_i), \quad \forall\, t_i \in V_T.$$
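The first and third constraint sets can be checked for a candidate labeling with a single pass over the edges. The sketch below is illustrative: the dictionary encoding and function name are assumptions, capacities are counted in flip-flops, and the path constraint (the second set) is left out because it needs the $W$ and $D$ matrices.

```python
def satisfies_local_area(edges, r, tile_of, capacity):
    """Check nonnegative edge weights and per-tile flip-flop capacity.

    edges:    {(u, v): w}  flip-flop count on each edge before retiming
    r:        {v: integer retiming label}
    tile_of:  {u: tile}    the mapping P from units to tiles
    capacity: {tile: C}    remaining capacity of each tile, in flip-flops
    """
    used = {t: 0 for t in capacity}
    for (u, v), w in edges.items():
        w_hat = w + r[v] - r[u]
        if w_hat < 0:
            return False
        # Flip-flops on e_uv are charged to the tile of the driving unit u.
        used[tile_of[u]] += w_hat
    return all(used[t] <= capacity[t] for t in capacity)
```

In a small example with two flip-flops on an edge whose driver sits in a tile of capacity 1, the unretimed circuit violates the constraint, whereas a label that pushes one flip-flop to the next edge (driven from a different tile) satisfies it.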

As each local area constraint involves more than two retiming variables, the LAC-retiming problem is an integer linear programming problem, which is NP-complete. In Ref. [29], a heuristic based on minimum-area retiming was used to solve the LAC-retiming problem. In minimum-area retiming, all flip-flops are assumed to have the same area cost; thus, the minimization of the total number of flip-flops is equivalent to the minimization of the total area. In LAC-retiming, the insertion of flip-flops into different tiles should take into account the differences in the tile capacities. To achieve that, the LAC-retiming problem was solved in Ref. [29] as a series of weighted minimum-area retiming problems, with the weights of flip-flops adjusted according to the congestion levels in the tiles. As different weights are assigned to flip-flops in different tiles based on the area consumption and tile capacities in the series of minimum-area retiming problems, flip-flops from overutilized tiles can be repositioned to those with low area consumption.

33.5 CONCLUDING REMARKS

While semiconductor process scaling has enabled integrated circuits of increasingly high performance, it has also created several new design concerns. In this chapter, we have summarized several buffer planning methodologies that tackle the design challenges brought forth by the exponential growth of buffers. Most of these methodologies address both timing and layout closure issues simultaneously by allocating sufficient silicon resources and routing resources during floorplanning or right after floorplanning. As multiple-cycle data communications become increasingly necessary, many of these buffer planning methodologies have been extended to also address the exponential growth of flip-flops (clocked repeaters). The challenge here is to account for the changes in latency introduced by additional flip-flops along global interconnects. While we have presented these planning methodologies in the context of synchronous system design, we believe that these methodologies also have an important role to play in the design of SOCs, NOCs, latency-insensitive systems, and globally asynchronous locally synchronous systems.

It is also important to recognize that the planning methodologies presented in this chapter may have fundamental limits. To a certain extent, the planning methodologies shield the downstream stages of physical synthesis from the problem of inserting a huge number of repeaters (and clocked repeaters). However, empirical studies [33] indicate that it is unlikely that incremental improvements to the physical synthesis technologies can adequately handle the exponential growth in repeater and clocked repeater counts if the scaling continues at the existing pace. Instead, a correct-by-construction design methodology that trades off optimality for predictability has been proposed in Ref. [33]. Perhaps even more alarming is a theoretical study, which is based on Rent's rule [34,35], that demonstrates the necessity of excessively long wires as the number of computing elements within a system continues to grow [36]. As large monolithic designs are unattractive, increased quality, instead of improved capacity, of CAD algorithms and tools should perhaps be the proper objective of future research [36].

REFERENCES

1. J. Cong, T. Kong, and Z. Pan. Buffer block planning for interconnect planning and prediction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(6):929–937, 2001 (ICCAD 1999).

2. C. J. Alpert, J. Hu, S. S. Sapatnekar, and P. G. Villarrubia. A practical methodology for early buffer and wire resource allocation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(5):573–583, 2003 (DAC 2001).

3. P. Sarkar and C.-K. Koh. Routability-driven repeater block planning for interconnect-centric floorplanning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(5):660–671, 2001 (ISPD 2000).

4. J. Cong, L. He, K.-Y. Khoo, C.-K. Koh, and Z. Pan. Interconnect design for deep submicron ICs. In Proceedings of IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, pp. 478–485, 1997.

5. W. C. Elmore. The transient response of damped linear networks with particular regard to wide-band amplifiers. Journal of Applied Physics, 19(1):55–63, January 1948.

6. C. J. Alpert and A. Devgan. Wire segmenting for improved buffer insertion. In Proceedings of ACM/IEEE Design Automation Conference, Anaheim, CA, pp. 588–593, June 1997.

7. F. F. Dragan, A. B. Kahng, I. I. Mandoiu, S. Muddu, and A. Zelikovsky. Provably good global buffering by generalized multiterminal multicommodity flow approximation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(3):263–274, 2002 (ASPDAC 2001).

8. F. F. Dragan, A. B. Kahng, S. Muddu, and A. Zelikovsky. Provably good global buffering using an available buffer block plan. In Proceedings of IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, pp. 104–109, 2000.

9. X. Tang and D. F. Wong. Network flow based buffer planning. Integration, 30(2):143–155, 2001 (ISPD 2000).

10. Y.-H. Cheng and Y.-W. Chang. Integrating buffer planning with floorplanning for simultaneous multi-objective optimization. In Proceedings of IEEE/ACM Asia South Pacific Design Automation Conference, pp. 624–627, Piscataway, NJ, 2004. IEEE Press.


11. H.-R. Jiang, Y.-W. Chang, J.-Y. Jou, and K.-Y. Chao. Simultaneous floorplan and buffer block optimization. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(5):694–703, 2004 (ASPDAC 2003).

12. Y. Ma, X. Hong, S. Dong, S. Chen, Y. Cai, C. K. Cheng, and J. Gu. Dynamic global buffer planning optimization based on detail block locating and congestion analysis. In Proceedings of ACM/IEEE Design Automation Conference, pp. 806–811, New York, 2003. ACM Press.

13. R. McInerney, M. Page, K. Leeper, T. Hillie, H. Chan, and B. Basaran. Methodology for repeater insertion management in the RTL, layout, floorplan, and fullchip timing databases of the Itanium microprocessor. In Proceedings of ACM International Symposium on Physical Design, San Diego, CA, pp. 99–104, 2000.

14. L. P. P. P. van Ginneken. Buffer placement in distributed RC-tree networks for minimal Elmore delay. In Proceedings of IEEE International Symposium on Circuits and Systems, New Orleans, LA, pp. 865–868, 1990.

15. S. Chen, X. Hong, S. Dong, Y. Ma, Y. Cai, C.-K. Cheng, and J. Gu. A buffer planning algorithm based on dead space redistribution. In ASP-DAC '03: Proceedings of the 2003 Conference on Asia South Pacific Design Automation, pp. 435–438, Piscataway, NJ, 2003. IEEE Press.

16. S. Chen, X. Hong, S. Dong, Y. Ma, Y. Cai, C.-K. Cheng, and J. Gu. A buffer planning algorithm with congestion optimization. In Proceedings of IEEE/ACM Asia South Pacific Design Automation Conference, pp. 615–620, Piscataway, NJ, 2004. IEEE Press.

17. C. W. Sham and E. F. Young. Routability driven floorplanner with buffer block planning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(4):470–480, 2003 (ISPD 2002).

18. K. K. Wong and E. F. Young. Fast buffer planning and congestion optimization in interconnect-driven floorplanning. In Proceedings of IEEE/ACM Asia South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 411–416, 2003.

19. C. Albrecht, A. B. Kahng, I. Mandoiu, and A. Zelikovsky. Floorplan evaluation with timing-driven global wireplanning, pin assignment and buffer/wire sizing. In Proceedings of IEEE/ACM Asia South Pacific Design Automation Conference, Bangalore, India, pp. 580–591, 2002.

20. Y. Ma, X. Hong, S. Dong, S. Chen, C.-K. Cheng, and J. Gu. Buffer planning as an integral part of floorplanning with consideration of routing congestion. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):609–621, 2005 (ISPD 2003, ASPDAC 2004).

21. H. Xiang, X. Tang, and D. F. Wong. An algorithm for integrated pin assignment and buffer planning. ACM Transactions on Design Automation of Electronics Systems, 10(3):561–572, 2005 (DAC 2002).

22. P. Sarkar and C.-K. Koh. Repeater block planning under simultaneous delay and transition time constraints. In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference, pp. 540–545, Piscataway, NJ, 2001. IEEE Press.

23. S.-M. Li, Y.-H. Cherng, and Y.-W. Chang. Noise-aware buffer planning for interconnect-driven floorplanning. In Proceedings of IEEE/ACM Asia South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 423–426, 2003.

24. A. Devgan. Efficient coupled noise estimation for on-chip interconnects. In Proceedings of IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, pp. 147–153, 1997.

25. D. Matzke. Will physical scalability sabotage performance gains? IEEE Computers, 8:37–39, September 1997.

26. P. Cocchini. A methodology for optimal repeater insertion in pipelined interconnects. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 22(12):1613–1624, 2003 (ICCAD 2002).

27. R. Lu, G. Zhong, C.-K. Koh, and K.-Y. Chao. Flip-flop and repeater insertion for early interconnect planning. In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference, Paris, France, pp. 690–695, March 2002.

28. C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6:5–35, 1991.

29. R. Lu and C.-K. Koh. Interconnect planning with local area constrained retiming. In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference, Messe Munich, Germany, pp. 442–447, March 2003.

30. C. C. Chu, E. F. Young, D. K. Tong, and S. Dechu. Retiming with interconnect delay. In Proceedings of IEEE/ACM International Conference on Computer Aided Design, San Jose, CA, pp. 221–226, 2003.

31. C. Lin and H. Zhou. Retiming for wire pipelining in system-on-chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(9):1338–1345, 2004 (ICCAD 2003).
