from overcrowded regions to empty regions. They take care not to cause thrashing, in which LUTs are moved back and forth between two clusters. Thrashing can be avoided by keeping a history of violations of CLBs. Hence, if thrashing has been occurring for a few moves, the relative cost of both CLBs involved in the thrashing is increased, resulting in the extra LUT or register being moved to a third CLB.
46.4.4 LINEAR DATAPATH PLACEMENT
Callahan et al. [29] presented GAMA, a linear-time simultaneous placement and mapping method for LUT-based FPGAs. They focus only on datapaths that are composed of arrays of bitslices. The basic idea is to preserve the datapath structure so that the problem size can be reduced by looking primarily at one bitslice of the datapath. Once a bitslice is mapped and placed, the other bitslices of the datapath can be mapped and placed similarly on the rows above or below the initial bitslice.
One of the goals in developing GAMA was to perform mapping and placement with little computational effort. To achieve linear time complexity, the authors limit the search space by considering only a subset of solutions, which means they might not produce an optimal solution. Because optimal mapping of directed acyclic graphs (DAGs) is NP-complete, GAMA first splits the circuit graph into a forest of trees before processing it in the mapping and placement steps. The tree covering algorithm does not directly handle cycles or nodes with multiple fanouts, and might duplicate nodes to reduce the number of trees. Each tree is compared to elements from a preexisting pattern library that contains compound modules such as the one shown in Figure 46.12. Dynamic programming is used to find the best cover in linear time. After the tree covering process, a postprocessing step attempts to find opportunities for local optimization at the boundaries of the covered trees. Interested readers are referred to Ref. [29] for more details on the mapping process of GAMA.
Because the modules will form a bitslice datapath layout, the placement problem translates into finding a linear ordering of the modules in the datapath. Wirelength minimization is the primary goal during linear placement. The authors assume that the output of every module is available at its right boundary. A tree is placed by recursively placing its left and right subtrees, and then placing the root node to the right of the subtrees. The two subtrees are placed next to each other. Figure 46.13 shows an example of a tree placement. Because subtree t2 is wider, placing it to the right of subtree t1 will result in longer wirelength. Because the number of fanin nodes to the root of the tree is bounded, an exhaustive search for the right placement order of the subtrees is reasonable and results in a linear-time algorithm.
In addition to the local placement algorithm, Callahan et al. also attempt some global optimizations. The linear placement algorithm arranges modules within a tree, but all trees in the circuit must also be globally placed. A greedy algorithm is used to place trees next to each other so that
FIGURE 46.12 Example of a pattern in the tree covering library (Based on Callahan, T. J. et al., Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 123–132, 1998. With permission.)
FIGURE 46.13 Tree placement example (Based on Callahan, T. J. et al., Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 123–132, 1998. With permission.)
the length of the critical path in the circuit is minimized. Furthermore, after global and local placement is accomplished, individual modules are moved across tree boundaries to further optimize the placement.
Ababei and Bazargan [30] proposed a linear placement methodology for datapaths in a dynamically reconfigurable system in which datapaths corresponding to different basic blocks∗ in a program are loaded, overwritten, and possibly reloaded on linear strips of an FPGA. They assume that the FPGA chip is divided into strips as shown in Figure 46.14. An expression tree corresponding to the computations in a basic block is placed entirely in one strip, getting its input values from the memory blocks on either side of the strip and writing the output of the expression to one of these memory blocks.
Depending on how frequently basic blocks are loaded and reloaded, three placement algorithms are developed:
1. Static placement: This case is similar to the problem considered by Callahan et al. [29], that is, each expression tree is given an empty FPGA strip to be placed on. The solution proposed by Ababei tries to minimize critical path delay, congestion, and wirelength. This algorithm is covered in Section 47.3.2.1.
FIGURE 46.14 FPGA divided into linear strips.
∗ A basic block is a sequence of code, for example, written in the C language, with no jumps or function calls. A basic block, usually the body of a loop with many iterations, could be mapped to a coprocessor such as an attached FPGA to perform computations faster. Data used by the basic block should be made accessible to the coprocessor, and the output of the computations should in turn be made accessible to the processor. This could be achieved either by streaming data from the processor to the FPGA and vice versa, or by providing direct memory access to the FPGA.
2. Dynamic placement with no module reuse: In this scenario, we assume that multiple basic blocks can be mapped to the same strip, either because a number of them run in parallel, or because there is a good chance that a mapped basic block will be invoked again in the future. The goal is to place the modules of a new expression on the empty regions between the modules of previous basic blocks, leaving the previously placed modules and their connections intact. As a result, the placement of the new basic block becomes a linear, noncontiguous placement problem, with the blockages being the modules from previous basic blocks.
3. Dynamic placement with module reuse: This scenario is similar to the previous one, except that we try to reuse a few modules and connections left over by previous basic blocks that are no longer active. Doing so saves reconfiguration time and results in better usage of the FPGA real estate. Finding the largest common subgraph between the old and the new expression trees helps maximize the reuse of the modules that are already placed.

The authors propose a greedy solution for the second problem, that is, dynamic placement without module reuse. The algorithm works directly on expression trees. Modules are rank-ordered based on parameters such as the volume (sum of module widths) of their children subtrees, and the latest arrival time on the critical path. The ordering of the nodes determines the linear order in which they should be placed on the noncontiguous space.
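A minimal sketch of this greedy placement is given below. The ranking key (children volume, then latest arrival time) and the leftmost-fit assignment of free columns are assumptions made for illustration; the exact ranking function and strip model in Ref. [30] may differ.

```python
def place_noncontiguous(modules, free_cols):
    """modules: list of (name, width, subtree_volume, arrival_time).
    free_cols: sorted free column indices in the strip (gaps between
    previously placed, still-active modules). Returns {name: [columns]}."""
    # Rank modules: larger children volume and later arrival on the
    # critical path are placed first (illustrative ranking).
    order = sorted(modules, key=lambda m: (m[2], m[3]), reverse=True)
    placement, remaining = {}, list(free_cols)
    for name, width, _, _ in order:
        if width > len(remaining):
            raise RuntimeError(f"not enough free columns for {name}")
        placement[name] = remaining[:width]   # take the leftmost free columns
        remaining = remaining[width:]
    return placement

# Strip of 10 columns where columns 2-4 and 7 are occupied by older basic blocks.
free = [0, 1, 5, 6, 8, 9]
mods = [("add", 2, 5, 3.0), ("mul", 3, 9, 4.5), ("shift", 1, 1, 1.0)]
print(place_noncontiguous(mods, free))
```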
To solve the third problem, that is, dynamic placement with module reuse, a linear ordering of modules is first obtained using the previous two algorithms to minimize wirelength, congestion, and critical path delay. Then a maximum matching between the existing inactive modules and the linear ordering is sought, such that the maximum number of modules is reused while perturbations to the linear ordering are kept at a minimum. The algorithm is then extended to apply to general graphs, not just trees. To achieve better reuse, a maximum common subgraph problem is solved to find the largest subset of modules and connections shared between the expression graphs that are already placed and that of the new basic block.
46.4.5 VARIATION-AWARE PLACEMENT
Hutton et al. proposed the first statistical timing analysis placement method for FPGAs [31]. They consider both inter- and intradie process variations in their modeling, but do not model spatial correlations among within-die variables. In other words, local variations are modeled as independent random variables.∗ In Ref. [31], they model the delay of a circuit element as a Gaussian variable, which is a function of V_t and L_eff, each of which is broken into its global (systematic) and local (random) components. Block-based statistical timing analysis [33] is used to compute the timing criticality of nodes, which is used instead of TVPR's timing-cost component (see Equations 46.3 and 46.5). SSTA (statistical static timing analysis) is performed only at each temperature, not at every move.
In their experiments, they compare their statistical timing-based placement to TVPR, and consider the effect of guard-banding and speed-binning. Guard-banding is achieved by adding kσ to the delay of every element, where k is a user-defined factor such as 3, 4, or 5, and σ is the standard deviation of the element's delay. Timing yield considering speed-binning is computed during Monte Carlo simulations by assuming that chips are binned into fast, medium, and slow groups according to their critical path delays. Their statistical placement shows yield improvements over TVPR in almost all combinations of guard-banding and speed-binning scenarios. In a follow-up work, Lin and He [34] show
∗ Cheng et al. [32] show that by ignoring spatial correlations, we lose at least 14 percent in the accuracy of the estimated delay. The error in delay estimation accuracy is defined as the integration of the absolute error between the distributions obtained through Monte Carlo simulations and the statistical sum and maximum computations of the circuit delay. See Section 46.3 of Ref. [32] for more details.
that combining statistical physical synthesis, statistical placement, and statistical routing results in significant yield improvements (from 50 failed chips per 10,000 chips to 5 failed chips in their experimental setup).
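The effect of guard-banding and speed-binning can be illustrated with a small Monte Carlo sketch. All numbers below (path length, nominal delay, σ, bin boundaries) are hypothetical, and the model uses only independent local variation, as in the simplification of Ref. [31].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical critical path of 20 elements, each with nominal delay 1.0 ns and
# sigma = 0.05 ns of independent local variation (no spatial correlation).
n_elem, mu, sigma, n_chips = 20, 1.0, 0.05, 100_000
delays = rng.normal(mu, sigma, size=(n_chips, n_elem)).sum(axis=1)

# Guard-banding: the design-time delay estimate adds k*sigma to every element.
for k in (3, 4, 5):
    guard_banded = n_elem * (mu + k * sigma)
    loss = np.mean(delays > guard_banded)   # chips slower than the pessimistic estimate
    print(f"k={k}: guard-banded estimate {guard_banded:.2f} ns, "
          f"fraction of chips exceeding it {loss:.2e}")

# Speed binning: chips are sorted into fast / medium / slow bins by actual delay.
fast, slow = np.percentile(delays, [33, 67])
bins = np.digitize(delays, [fast, slow])    # 0 = fast, 1 = medium, 2 = slow
print("bin counts:", np.bincount(bins))
```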
Cheng et al. [32] propose a placement method that tailors the placement to individual chips, after the variation map for every chip has been obtained. This is a preliminary work that tries to answer the question of how much improvement can be gained by adapting the placement to individual chips, given the exact map of FPGA element delays. They show about 5.3 percent improvement on average in their experimental setup, although they do not address how the device parameter maps can be obtained in practice.
46.4.6 LOW POWER PLACEMENT
Low power FPGA placement and routing methods try to assign noncritical elements to low power resources on the FPGA. There have been many recent works targeting FPGA power minimization. We will focus on only two efforts: one deals with the placement problem [35] and the other addresses dual voltage assignment to routes [36]; the latter is discussed in Section 46.5.4.
The authors in Ref. [35] consider an architecture that is divided into physical regions, each of which can be independently power gated. To enable leakage power savings, designers must look carefully into two issues:
1. Region granularity: They should determine the best granularity of the power gating regions. Too small a region would have high circuit overheads, both in terms of sleep transistors and in terms of the configuration bits that must control them. On the other hand, a finer granularity gives more control over which logic units can be shut down, and could potentially harness more leakage savings.
2. Placement strategies: CAD developers should adopt placement strategies that constrain logic blocks with similar activity to the same regions. If all logic blocks placed in one region are going to be inactive for a long period of time, then the whole region can be power gated. However, architectural properties of the FPGA influence the effectiveness of the placement strategy. For example, if the FPGA architecture has carry chains that run in the vertical direction, then the placement algorithm must place modules in regions that are vertically aligned; not doing so could harm performance significantly.
By constraining the placement of modules with similar power activity, we can achieve two goals: permanently power gating unused logic, and power gating inactive modules for the duration of their inactive periods. In their experiments, the authors consider various sizes of the power gating regions and also look into dynamic versus static powering down of unused/idle regions.
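A minimal sketch of an activity-driven region assignment is shown below. Packing blocks with similar activity into the same power-gating region is the idea from Ref. [35]; the greedy sort-and-pack heuristic, the activity numbers, and the region capacity are illustrative assumptions.

```python
def assign_regions(blocks, region_capacity):
    """blocks: {name: activity}, where activity is the fraction of time the
    block is active (0 = unused). Blocks with similar activity are packed
    into the same power-gating region so whole regions can be shut down."""
    ordered = sorted(blocks.items(), key=lambda kv: kv[1])  # idle blocks first
    regions, current = [], []
    for name, act in ordered:
        current.append((name, act))
        if len(current) == region_capacity:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions

blocks = {"a": 0.0, "b": 0.0, "c": 0.05, "d": 0.9, "e": 0.85, "f": 0.1}
for i, region in enumerate(assign_regions(blocks, region_capacity=3)):
    print(f"region {i}: {region}")
```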
46.5 ROUTING
Versatile placement and routing [6] uses Dijkstra's algorithm (i.e., a maze router) to connect the terminals of a net. Its router is based on the negotiation-based algorithm PathFinder [37]. PathFinder first routes all nets independently, using the shortest route for each path. As a result, some routing regions become overcongested. Then, in an iterative process, nets are ripped up and rerouted to alleviate congestion. Nets that are not timing-critical take detours away from the congested regions, whereas nets that are timing-critical are likely to take the same route as in the first round.
There is a possibility that two routing channels show a thrashing effect, that is, nets are ripped up from one channel and rerouted through the other, and then in the next iteration ripped up from the second and rerouted through the first. To avoid this, VPR uses a history term that not only penalizes routing through a currently congested region, but also uses congestion data from the recent history. Thus, the congestion cost of a channel is defined as its current resource (over-)usage plus a weighted sum of the congestion values from previous routing iterations.
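The history-based congestion cost just described can be sketched as follows; the exponential decay of the history weights is an assumed weighting, not necessarily VPR's exact choice.

```python
def channel_cost(usage, capacity, history, decay=0.5):
    """Congestion cost of a routing channel: current over-usage plus a decaying
    weighted sum of over-usage from previous routing iterations (the history
    term that breaks thrashing)."""
    over = max(0, usage - capacity)
    hist = sum(decay ** (i + 1) * h for i, h in enumerate(reversed(history)))
    return over + hist

# Channel with capacity 4 that was over-used in the last two iterations: even
# if the current usage drops to 4, it still looks expensive, which discourages
# nets from bouncing back onto it.
history = [2, 1]                      # over-usage in iterations n-2 and n-1
print(channel_cost(5, 4, history))    # currently over-used -> 2.0
print(channel_cost(4, 4, history))    # legal now, but history keeps cost at 1.0
```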
FIGURE 46.15 Local expansion of the wavefront (Based on Betz, V. and Rose, J., Field-Programmable Logic and Applications (W. Luk, P. Y. Cheung, and M. Glesner, eds.), pp. 213–222, Springer-Verlag, Berlin, Germany, 1997. With permission.)
To route a multiterminal net, VPR uses the maze routing algorithm described in Chapter 23. After connecting two terminals of a k-terminal net, VPR's maze router starts a wave from all points on the wire connecting the two terminals. The wave is propagated until the next terminal is reached. The process is repeated k − 1 times. When a new terminal is reached, instead of restarting the wave from the new wiring tree from scratch, the maze routing algorithm starts a local wave from the new branch of wire that connected the new terminal to the rest of the tree. When the wavefront of the local wave gets as far out as the previous wavefront, the two waves are merged and expanded until a new terminal is reached. Figure 46.15 illustrates the process.
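The following simplified sketch captures the key idea of reusing the partially built routing tree: each new wave is seeded from every cell already on the tree rather than from a single terminal. For brevity it restarts a breadth-first wave from the whole tree instead of performing VPR's local re-expansion around the new wire, and it assumes unit-cost grid cells.

```python
from collections import deque

def route_net(grid_w, grid_h, terminals, blocked=frozenset()):
    """Multiterminal maze routing sketch: after each terminal is reached, the
    next wave is seeded from every cell already on the routing tree."""
    tree = {terminals[0]}
    for _ in range(len(terminals) - 1):
        frontier = deque((cell, [cell]) for cell in tree)   # wave from the whole tree
        seen = set(tree)
        path_to_sink = None
        while frontier:
            (x, y), path = frontier.popleft()
            if (x, y) in terminals and (x, y) not in tree:
                path_to_sink = path
                break
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= nx < grid_w and 0 <= ny < grid_h
                        and (nx, ny) not in seen and (nx, ny) not in blocked):
                    seen.add((nx, ny))
                    frontier.append(((nx, ny), path + [(nx, ny)]))
        if path_to_sink is None:
            raise RuntimeError("unroutable")
        tree.update(path_to_sink)        # the new wire becomes part of the tree
    return tree

print(sorted(route_net(6, 6, [(0, 0), (5, 0), (2, 4)])))
```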
46.5.1 HIERARCHICAL ROUTING
Chang et al. propose a hierarchical routing method for island-style FPGAs with a segmented routing architecture in Ref. [38] (Section 45.4.1). Because nets are routed simultaneously, the net-ordering problem at the detailed routing level is not an issue; in fact, global routing and detailed routing are performed at the same time in this approach. They model timing in their formulation as well, and estimate the delay of a route as the number of programmable switches that it has to go through. This is a reasonable estimation because the delay of the switch points is much larger than that of the routing wires in a typical FPGA architecture. Each channel is divided into a number of subchannels, each subchannel corresponding to the set of segments of the same length within that channel.
After minimum spanning routing trees are generated, delay bounds are assigned to segments of the route, and then the problems of channel assignment and delay bound recalculation are solved hierarchically. Figure 46.16 shows an example of a hierarchical routing step, in which connection i is generated by a minimum spanning tree algorithm. The problem is divided into two subproblems, one containing pin1 and the other containing pin2. The cutline between the two regions contains a number of horizontal subchannels. The algorithm tries to decide on the subchannel through which this net is going to be routed. Once the subchannel is decided (see the right part of Figure 46.16), the routing problem can be broken into two smaller subproblems. While dividing the problem into smaller subproblems, the algorithm keeps updating the delay bounds on the nets, and keeps an eye on congestion.
To decide which subchannel j to use to route a routing segment i, the following cost function is used:

C_ij = C_ij^(1) + C_ij^(2) + C_ij^(3)    (46.6)
FIGURE 46.16 Delay bound redistribution after a hierarchical routing step (Based on Chang, Y.-W. et al., ACM Transactions on Design Automation of Electronic Systems, 5, 433–450, 2000. With permission.)
where C_ij^(1) is zero if connection i can reach subchannel j, and ∞ otherwise. Reachability can be determined by a breadth-first search on the connectivity graph. The second term intends to utilize the routing segments evenly according to the connection length and its delay bound:
C_ij^(2) = a | l_i / U_i − L_j |

where
l_i is the Manhattan distance of connection i
U_i is the delay bound of the connection
L_j is the length of the routing segments in subchannel j
a > 0 is a constant
This term tries to maximize routing resource efficiency. So, for example, if a net has a delay bound U_i = 4 and Manhattan distance l_i = 8, it can be routed through four switches, which means the ideal routing segment length for this connection is 8/4 = 2. For a subchannel that contains routing segments of length 2, the cost function evaluates to zero (with a = 1), that is, a segment length of 2 is ideal for routing this net. On the other hand, if a subchannel with segment length 6 is considered, then the cost function evaluates to 4, which means using segments of length 6 would be overkill for this net; its slack is high and we do not have to waste length-6 routing resources on it.
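This second cost term is easy to state in code; the tiny sketch below reproduces the numbers from the example above, assuming a = 1.

```python
def c2_cost(l_i, U_i, L_j, a=1.0):
    # Penalize a mismatch between the ideal segment length l_i / U_i for this
    # connection and the segment length L_j offered by subchannel j.
    return a * abs(l_i / U_i - L_j)

# Example from the text: delay bound 4, Manhattan distance 8.
print(c2_cost(8, 4, 2))   # 0.0 -> length-2 segments are ideal
print(c2_cost(8, 4, 6))   # 4.0 -> length-6 segments would be wasteful here
```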
Cost component C_ij^(3) in Equation 46.6 is shown in Figure 46.17. Figure 46.17b shows a typical non-timing-driven routing cost, and Figure 46.17a shows the cost function used in Ref. [38]. The basic idea is to assign a lower cost to routes that are likely to use fewer bends. For example, in Figure 46.17a, if subchannel s3 is chosen, then chances are that when the subproblem of routing from a pin to s3 is being solved, more bends will be introduced between the pin and s3. On the other hand, routing the net through s1 or s5 guarantees that the route from the subchannel to at least one pin will use no bends. Note that the cost of routing outside the bounding box of the net increases linearly to discourage detours, which in turn would hurt the delay of the net.
After a net is divided into two subnets, the delay bound of the net is distributed among the two subnets based on their lengths. So, for example, in Figure 46.16, if the original delay bound of connection i was U_i, then U_i1 = [l_i1 / (l_i1 + l_i2)] × U_i and U_i2 = [l_i2 / (l_i1 + l_i2)] × U_i.
46.5.2 SAT-BASED ROUTING
Recent advances in SAT (satisfiability problem) solvers have encouraged researchers to formulate various problems as SAT problems and utilize the efficiency of these solvers. Nam et al. [39] formulated detailed routing on a fully segmented routing architecture (i.e., one in which all routing segments are of length 1) as a SAT problem.
FIGURE 46.17 Cost function (Based on Chang, Y.-W. et al., ACM Transactions on Design Automation of Electronic Systems, 5, 433–450, 2000. With permission.)
The basic idea is shown in Figure 46.18. Figure 46.18a shows an instance of a global routing problem that includes three nets, A, B, and C, and an FPGA with a channel width of three tracks. Figure 46.18b shows possible solutions for the routing of net A. In a SAT problem, constraints are written in the form of conjunctive normal form (CNF) clauses. The CNF formulation of the constraints on net A is shown in Equation 46.8, where AH, BH, and CH are integer variables giving the horizontal track numbers assigned to nets A, B, and C, respectively, and AV is the vertical track number assigned to net A. The condition on the first line enforces that a unique track number is assigned to A, the second line ensures that the switchbox constraints are met (here it is assumed that a subset switchbox is used), and the third line enforces that a valid track number is assigned to the vertical segment of net A. Together, these conditions state the connectivity constraints for net A.
FIGURE 46.18 SAT formulation of a detailed routing problem: (a) global routing example; (b) possible solutions for net A (From Nam, G.-J., Sakallah, K. A., and Rutenbar, R. A., IEEE Trans. Comput. Aided Des. Integrated Circuits Syst., 21, 674, 2002. With permission.)
Conn(A) = [(AH ≡ 0) ∨ (AH ≡ 1) ∨ (AH ≡ 2)]
        ∧ [(AH = AV)]
        ∧ [(AV ≡ 0) ∨ (AV ≡ 1) ∨ (AV ≡ 2)]    (46.8)
To ensure that different nets do not share the same track number in a channel (the exclusivity constraint), conditions like Equation 46.9 must be added to the problem:

Excl(H1) = (AH ≢ BH) ∧ (AH ≢ CH) ∧ (BH ≢ CH)    (46.9)

where H1 refers to the horizontal channel shown in Figure 46.18a. The routability problem of the example of Figure 46.18a can be formulated as in Equation 46.10:
Routable(X) = Conn(A) ∧ Conn(B) ∧ Conn(C) ∧ Excl(H1)    (46.10)
where X is the vector of track variables AH, BH, CH, AV, BV, and CV. If Routable(X) is satisfiable, then a routing solution exists and can be derived from the values returned by the SAT solver. The authors extend the model so that doglegs can be expressed as well. Interested readers are referred to Ref. [39] for details.
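For this three-net example, the constraints can even be checked by brute force, as in the sketch below; a real flow would instead hand the CNF clauses to a SAT solver. The assumption that nets B and C have the same connectivity structure as net A (one horizontal and one vertical segment joined through a subset switchbox) is made here purely for illustration.

```python
from itertools import product

TRACKS = range(3)   # channel width of three tracks

def conn(h, v):
    # Connectivity constraint in the spirit of Equation 46.8 (subset switchbox:
    # the horizontal and vertical segments must use the same track number).
    return h in TRACKS and v in TRACKS and h == v

def excl(ah, bh, ch):
    # Exclusivity: nets A, B, C may not share a track in channel H1.
    return ah != bh and ah != ch and bh != ch

solutions = [
    x for x in product(TRACKS, repeat=6)       # x = (AH, BH, CH, AV, BV, CV)
    if conn(x[0], x[3]) and conn(x[1], x[4]) and conn(x[2], x[5])
    and excl(x[0], x[1], x[2])
]
print(f"{len(solutions)} satisfying assignments, e.g. AH,BH,CH,AV,BV,CV = {solutions[0]}")
```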
Even though detailed routing can be elegantly formulated as a SAT problem, its application in practice is limited. If a solution does not exist (i.e., when there are not enough tracks), the SAT solver will take a long time exploring all track assignment possibilities before returning a negative answer, that is, that Routable(X) is not satisfiable. Furthermore, even if a solution exists but the routing instance is difficult (e.g., when there are barely enough routing tracks to route the given problem instance), the SAT solver might take a long time. In practice, the SAT solver can be terminated if the time spent on the problem exceeds a prespecified limit. This could mean either that the problem instance is difficult or that no routing solution exists for the given number of tracks.
46.5.3 GRAPH-BASED ROUTING
The FPGA global routing problem can be modeled as a graph matching problem in which branches of a routing tree are assigned (matched) to sets of routing segments in a multisegment architecture to estimate the number of channels required for detailed routing. Lin et al. propose such a graph-based routing method in Ref. [40]. The input to the problem is a set of globally routed nets. The goal is to assign each straight segment of each net to a track in the channel in which it is globally routed, so that a lower bound on the required number of tracks is obtained for each channel. Interactions between channels are ignored in this work; as a result, the bound on the number of tracks needed for each channel is calculated in isolation. The actual number of tracks needed for the whole design might be larger, depending on the switchbox architecture and the way horizontal and vertical channels interact. They model the track assignment problem within one channel as a weighted matching problem. Straight segments of nets are called subnets (e.g., a net routed in the shape of an "L" is divided into two subnets). Within a channel, subnets belonging to a maximum clique C of overlapping subnets∗ are assigned to tracks from a set of tracks H using a bipartite graph matching formulation. Members of set C form the nodes on one side of the bipartite graph used in the matching problem, and the nodes
on the other side of the matching graph are the tracks in set H. The weights on the edges from subnets to routing tracks are determined based on track length utilization. The track utilization Ur(i_x, t) of a subnet i_x on track t is defined as

Ur(i_x, t) = len(i_x) / Σ_{1≤y≤k} len(s_y) + α
∗ Refer to Ref. [41] for more discussion on finding cliques of overlapping net intervals and calculating lower bounds on channel densities.
where len(i_x) and len(s_y) are the respective lengths of the subnet i_x and the segment s_y, s_y is an FPGA routing segment in the track that i_x is globally routed in, and k is the number of segments needed to route the subnet on that track.
Note that the first and the last FPGA routing segments used in routing the subnet might be longer than what the subnet needs, and hence some of the track length would go underutilized. The algorithm tries to maximize routing segment utilization by matching a subnet to a track whose segments have lengths and starting points that closely match the span of the subnet. This is achieved by maximizing the sum of track utilizations Ur(i_x, t) over all subnets. Parameter α in the equation above is used to enable simultaneous routability and timing optimization. They further extend the algorithm to consider timing as well as routability using an iterative process. After an initial routing, they distribute timing slacks to nets and order channels based on how critical they are; a channel is critical if its density is the highest.
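The matching step can be sketched as a small assignment problem. The subnet lengths, the per-track segment lists, and the use of SciPy's linear_sum_assignment with α = 0 are illustrative assumptions, not the exact setup of Ref. [40].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def track_utilization(subnet_len, seg_lens, alpha=0.0):
    # Ur(i_x, t): ratio of the subnet's length to the total length of the FPGA
    # segments it would occupy on this track, plus the timing term alpha.
    return subnet_len / sum(seg_lens) + alpha

# Hypothetical clique of three overlapping subnets and, for each (subnet, track)
# pair, the lengths of the segments the subnet would occupy on that track.
subnet_lens = [4, 7, 2]
segs_on_track = [
    [[4], [2, 2, 2], [6]],
    [[8], [4, 4], [2, 2, 2, 2]],
    [[2], [4], [1, 1]],
]

# Maximize total utilization by minimizing its negation (weighted bipartite matching).
U = np.array([[track_utilization(subnet_lens[i], segs_on_track[i][t])
               for t in range(3)] for i in range(3)])
rows, cols = linear_sum_assignment(-U)
for i, t in zip(rows, cols):
    print(f"subnet {i} -> track {t} (Ur = {U[i, t]:.2f})")
```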
46.5.4 LOW POWER ROUTING
The authors in Ref. [36] assume that all switches and connection boxes in a modified island-style FPGA are Vdd-programmable: an SRAM bit determines whether the driver driving a particular switch or connection box operates at high or low Vdd. To avoid adding level converters, they enforce the constraint that no low-Vdd switch can drive a high-Vdd element. As a result, each routing tree can be mapped either entirely to high-Vdd, or entirely to low-Vdd, or to high-Vdd from the source up to some point in the routing tree and to low-Vdd from that point to the sink. In terms of power consumption, it is desirable to map as many routing resources as possible to low-Vdd, as that consumes less power than high-Vdd. But because low-Vdd resources are slower, care must be taken not to slow down critical paths in the circuit.
They propose a heuristic sensitivity-based algorithm and a linear programming formulation for assigning voltage levels to programmable routing resources (switches and their associated buffers). The sensitivity-based method first calculates a power sensitivity ΔP/ΔVdd for each routing resource, which is the power reduction obtained by changing it from high-Vdd to low-Vdd. The resource with the highest sensitivity is tried at low-Vdd. If the path containing the switch does not violate the timing constraint, then the switch and all its downstream routing resources are locked at low-Vdd; otherwise, the switch is changed back to high-Vdd. The linear programming method tries to distribute path slacks among route segments such that the number of low-Vdd resources is maximized, subject to the constraint that no low-Vdd switch drives a high-Vdd one.
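A minimal sketch of the sensitivity-based heuristic is shown below. It models each resource by a (power saving, delay penalty) pair and each path by its member resources and slack; the numbers are hypothetical, and the step of locking all downstream resources (so that no low-Vdd switch drives a high-Vdd element) is omitted for brevity.

```python
def assign_vdd(resources, paths):
    """resources: {name: (power_saving, delay_penalty)} for switching a routing
    switch/buffer from high- to low-Vdd. paths: list of (member names, slack).
    Greedy: try the largest power saving first; keep it at low-Vdd only if no
    path's slack goes negative."""
    low_vdd = set()
    order = sorted(resources, key=lambda r: resources[r][0], reverse=True)
    for r in order:
        trial = low_vdd | {r}
        meets_timing = all(
            slack - sum(resources[x][1] for x in members if x in trial) >= 0
            for members, slack in paths
        )
        if meets_timing:
            low_vdd = trial
    return low_vdd

resources = {"sw1": (3.0, 0.4), "sw2": (2.0, 0.1), "sw3": (1.0, 0.3)}
paths = [(["sw1", "sw2"], 0.45), (["sw2", "sw3"], 0.5)]
print(assign_vdd(resources, paths))   # {'sw1', 'sw3'}: sw2 would violate timing
```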
46.5.5 OTHER ROUTING METHODS
In this subsection, we review miscellaneous routing methods: pipeline routing, congestion-driven routing, and statistical timing routing.
46.5.5.1 Pipeline Routing
Eguro and Hauck [42] propose a timing-driven, pipeline-aware routing algorithm that reduces critical path delay. A pipeline-aware routing problem requires the connection from a source node to a sink node to pass through a certain number of pipeline registers, and each segment of the route (between source, sink, and registers) must satisfy delay constraints. The work by Eguro and Hauck adapts PathFinder [37]. When pipelining is considered, the problem becomes more difficult than a traditional routing problem, because as registers move along a route, the criticality of the routing segments changes. For example, suppose a net is to connect logic block A to logic block B through one register R. In the first routing iteration, R might be placed close to A, which makes the subroute A–R noncritical, while R–B would probably be critical. In the next iteration, R might move closer to B, and hence the two subroutes might be considered critical and noncritical in successive iterations.
To address this problem, the authors in Ref. [42] perform simultaneous wave propagation maze routing searches, each assuming that the net has a distinct timing-criticality value. When the sink (or a register) is reached in the search process, the routing wave that best balances congestion and timing criticality is chosen. Interested readers are referred to Ref. [42] for more details.
46.5.5.2 Congestion-Driven Routing
Another work that deals with routability and congestion estimation is fGrep [43]. To estimate congestion, waves are started from a source node, and all possible paths are implicitly enumerated at every step of the wave propagation. The probability that the net passes through a particular routing element is the ratio of the number of paths that pass through that routing element to the total number of paths that can route the net. The routing demand, or congestion, on a routing element is defined as the sum of these probabilities over all nets. Of course, performing full wave propagation for every net would be costly. As a trade-off, the authors trim the wave once it has traveled a certain predetermined distance, which speeds up the estimation at the cost of accuracy. Another speedup technique used by the authors is to start waves from all terminals of a net and stop when two waves reach each other.
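The path-counting idea behind this probabilistic estimate can be sketched as follows. For simplicity, the sketch restricts each two-terminal net to its shortest (monotone) paths inside its bounding box, which is an assumption; fGrep's wave propagation enumerates paths more generally.

```python
from math import comb

def demand_map(w, h, nets):
    """Probabilistic congestion estimate on a w x h grid: the demand on a cell
    is the sum over nets of the fraction of the net's shortest paths that
    cross that cell."""
    demand = [[0.0] * w for _ in range(h)]
    for (x1, y1), (x2, y2) in nets:
        (xa, xb), (ya, yb) = sorted((x1, x2)), sorted((y1, y2))
        total = comb((xb - xa) + (yb - ya), xb - xa)   # number of monotone paths
        for x in range(xa, xb + 1):
            for y in range(ya, yb + 1):
                through = (comb((x - xa) + (y - ya), x - xa)
                           * comb((xb - x) + (yb - y), xb - x))
                demand[y][x] += through / total
    return demand

for row in demand_map(5, 4, [((0, 0), (4, 3)), ((0, 3), (3, 0))]):
    print(" ".join(f"{d:.2f}" for d in row))
```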
46.5.5.3 Statistical Timing Routing
Statistical timing analysis has found its way into FPGA CAD tools in recent years. Sivaswamy et al. [44] showed that using SSTA during the routing stage can greatly improve timing yield over traditional static timing analysis methods with guard-banding. More specifically, in their experimental setup they reduce the yield loss from about 8 per 10,000 chips to about 1 per 10,000 chips. They considered inter- and intradie variations and modeled spatial correlations in their statistical modeling of device parameters.
Matsumoto et al. [45] proposed a reconfiguration methodology for yield enhancement in which multiple routing solutions are generated for a design, and the one that yields the best timing for a particular FPGA chip is loaded onto that chip. This can be done by performing at-speed testing of an individual FPGA chip using each of the n configurations that are generated, and picking the one that yields the best clock speed. The advantage of this method over a method that requires obtaining the delay map of all elements on the chip (e.g., the work by Cheng et al. [32]) is that extensive tests are not required to determine which configuration yields the best timing results.
In the current version of their method, Matsumoto et al. [45] fix the placement and only explore different routing solutions. In each configuration, they try to avoid routing each critical path through the same regions used by the other configurations; ideally, each configuration routes a critical path through a unique set of routing resources that are spatially far away from the paths in the other configurations. As a result, if a critical path in one configuration is slow due to process variations, chances are that other configurations route the same path through regions that are faster, resulting in a faster clock frequency. Figure 46.19 shows three configurations with different routes for a critical path, together with the delay variation map of the switch matrix. Using the delay map in Figure 46.19, the delay of the critical path in the first, second, and third configurations can be calculated as 4.9, 4.5, and 5.1, respectively.
They ignore spatial correlations in their method; hence, they can analytically calculate the probability that a design fails its timing constraints given n configurations. The probability that none of the n configurations passes the timing test is [1 − Y(Target)]^n, so the timing yield with n configurations becomes

Y_n(Target) = 1 − [1 − Y(Target)]^n    (46.12)
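A quick worked example with hypothetical numbers: if a single configuration passes timing with probability Y(Target) = 0.9, and the configurations are treated as independent (which follows from ignoring spatial correlations), then three configurations give

```python
Y, n = 0.9, 3
print(1 - (1 - Y) ** n)   # 0.999, i.e., only about 1 chip in 1000 fails all three
```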