FIGURE 7.9 Placements of prim1 using (a) two eigenvectors (2D placement) and (b) three eigenvectors (3D placement).
7.3.1.2 Partitioning Solutions from Multiple Eigenvectors
It is also possible to use multiple eigenvectors to determine arrangements of vertices that minimize the number of cuts. Hall [Hal70] suggests that the location of the vertices in r-dimensional space can be used to identify blocks (see Section 7.3.1 for a description of his method). Two- and three-dimensional placements of prim1 are shown in Figure 7.9. The three branches in the two-dimensional plot indicate that three blocks should be formed. On the other hand, it is not as obvious how to cluster vertices in the three-dimensional plot.
Instead of minimizing the squared distance between two vertices as in Equations 7.3 and 7.4, Frankle and Karp [FK86] transform the distance minimization problem into one of finding the point, obtained by projecting x onto all eigenvectors, that is furthest from the origin. The vector induced by this point gives a good ordering with respect to wirelength.
Chan et al. [CSZ94] use the cosine of the angle between two rows of the |V| × k eigenvector matrix, V, to determine how close the vertices are to each other. If the cosine between two vectors is close to 1, then the corresponding vertices must belong to the same block. Their k-way partitioning heuristic constructs k prototype vectors with distinct directions (to represent blocks) and places into the corresponding block the vertices whose rows lie within π/8 radians of the prototype vector.
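To make the assignment rule concrete, here is a minimal Python/NumPy sketch that assigns each row of a |V| × k eigenvector matrix to the nearest of a given set of prototype vectors, keeping only vertices within π/8 radians. The prototypes are passed in rather than constructed, so this illustrates the assignment test only, not the full heuristic of [CSZ94]; the toy matrix is made up.

```python
import numpy as np

def assign_by_cosine(V, prototypes, threshold=np.pi / 8):
    """Assign each vertex (row of V) to the prototype whose direction
    is within `threshold` radians of the vertex's eigenvector row."""
    blocks = [[] for _ in prototypes]
    unassigned = []
    for i, row in enumerate(V):
        # cosine of the angle between the row and each prototype
        cosines = [row @ p / (np.linalg.norm(row) * np.linalg.norm(p))
                   for p in prototypes]
        best = int(np.argmax(cosines))
        angle = np.arccos(np.clip(cosines[best], -1.0, 1.0))
        if angle <= threshold:
            blocks[best].append(i)
        else:
            unassigned.append(i)   # falls outside every pi/8 cone
    return blocks, unassigned

# Toy example: two blocks recovered from a 6 x 2 eigenvector matrix.
V = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
              [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]])
prototypes = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(assign_by_cosine(V, prototypes))
```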
This approach was the starting point for a method devised by Alpert et al. The idea behind multiple eigenvector linear orderings (MELO) [AY95], [AKY99] is that, after removing the first column (which corresponds to the zero eigenvalue) from V (call this matrix V′), a partition that satisfies the usual mincut objective and balance constraints is obtained by finding a permutation of the rows of V′ that results in the maximum possible two-norm sum of the rows. Alpert and Yao [AKY99] prove that when the number of eigenvectors selected is n, maximizing the vector sum is equivalent to minimizing netcut.
7.3.2 LINEAR PROGRAMMING FORMULATIONS
In PARABOLI, Riess et al. [RDJ94], [AK95] use the eigenvector technique of Section 7.3.1 to fix the vertices corresponding to the ten smallest eigenvector components and the ten largest eigenvector components to locations 1.0 and 0.0, respectively. The center of gravity of the remaining vertices is fixed at location 0.5. They use a mathematical programming technique to reposition the free vertices so the overall wirelength is reduced. The mathematical formulation is given by
$$\min \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \frac{a_{ij}}{|x_i - x_j|}\,(x_i - x_j)^2 \qquad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f$$
In the next pass of the algorithm, the 5 percent of vertices with the largest (smallest) resulting coordinates are moved so that their center of gravity is at x_i = 0.95 (x_i = 0.05). After performing the optimization and repositioning, the process is repeated with centers of gravity at x_i = 0.9 and x_i = 0.1, and so on. The process is repeated ten times, yielding ten different orderings; the best ordering is the one among the ten with the best ratio cut metric.
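Below is a sketch of one linearization pass under simplifying assumptions: the center-of-gravity constraint is replaced by pinned end vertices, and the weighted-Laplacian system is solved directly with NumPy, so this illustrates the reweighting idea rather than the PARABOLI implementation (the path graph and pin locations are made up).

```python
import numpy as np

def reweighted_pass(A, x, fixed):
    """One linearization pass: weight each edge by a_ij / |x_i - x_j|
    (from the current coordinates), so the quadratic objective mimics the
    linear one, then solve the weighted-Laplacian system for the free
    vertices with the fixed vertices pinned."""
    n = len(x)
    W = A / (np.abs(x[:, None] - x[None, :]) + 1e-9)  # avoid divide-by-zero
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                    # weighted Laplacian
    pins = sorted(fixed)
    free = [i for i in range(n) if i not in fixed]
    xf = np.linalg.solve(L[np.ix_(free, free)], -L[np.ix_(free, pins)] @ x[pins])
    out = x.copy()
    out[free] = xf
    return out

# Path graph 0-1-2-3-4 with the end vertices pinned at 0.0 and 1.0.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
x = np.array([0.0, 0.3, 0.6, 0.8, 1.0])
for _ in range(5):                                    # a few reweighting passes
    x = reweighted_pass(A, x, {0, 4})
print(np.round(x, 3))
```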
In Ref. [LLLC96], the authors point out that linear cost functions spread out dense blocks of vertices, whereas quadratic cost functions naturally identify blocks of vertices, making it easier to assign discrete locations to otherwise closely packed vertices. They incorporate the merits of both linear and quadratic methods in a modified α-order cost function:
$$\min \sum_{i>j}^{|V|} \frac{a_{ij}}{|x_i - x_j|^{2-\alpha}}\,(x_i - x_j)^2 \qquad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f$$
where 1 ≤ α ≤ 2. If α = 1, the cost function becomes the linear cost function; for α = 2, it becomes the quadratic cost function. They observe that α = 1.2 best combines the benefits of the linear and quadratic cost functions.
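Note that a_ij (x_i − x_j)^2 / |x_i − x_j|^(2−α) simplifies to a_ij |x_i − x_j|^α, which makes the interpolation between the linear (α = 1) and quadratic (α = 2) costs explicit. A small sketch evaluating this cost family on a made-up three-vertex graph:

```python
import numpy as np

def alpha_cost(A, x, alpha):
    """Modified alpha-order cost: sum over pairs of
    a_ij * (x_i - x_j)^2 / |x_i - x_j|^(2 - alpha) = a_ij * |x_i - x_j|^alpha."""
    i, j = np.triu_indices(len(x), k=1)
    d = np.abs(x[i] - x[j])
    return float(np.sum(A[i, j] * d ** alpha))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([0.0, 0.4, 1.0])
for alpha in (1.0, 1.2, 2.0):   # linear, recommended, quadratic
    print(alpha, alpha_cost(A, x, alpha))
```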
7.3.3 INTEGER PROGRAMMING FORMULATIONS
In Ref. [AK95], the authors formulate bipartitioning as an integer quadratic program. Let x_is indicate that vertex i belongs to block s. Let a_ij represent the cost of the edge connecting vertices i and j. Let B be a matrix with b_ii = 0, ∀i, and b_ij = 1 for i ≠ j. The number of edges that have endpoints in more than one block is given by
$$\min \sum_{i,j=1}^{|V|} \sum_{s,t=1}^{k} a_{ij}\, x_{is}\, b_{st}\, x_{jt} \qquad (7.6)$$

$$\text{s.t.} \quad \sum_{s=1}^{k} x_{is} = 1 \quad \forall i \qquad (7.7)$$

$$\sum_{i=1}^{|V|} x_{is} = S_s \quad \forall s \qquad (7.8)$$

where S_s denotes the prescribed size of block s.
The constraint in Equation 7.7 indicates that each vertex belongs to exactly one block, and the constraint in Equation 7.8 sets the block sizes. The rationale behind the objective function is that when the edge (i, j) is cut, a_ij Σ_{s,t=1}^{k} x_is b_st x_jt = a_ij; in effect, the cost of cutting the edge (i, j) appears only once in the summation. On the other hand, if edge (i, j) is uncut, then s = t and b_st = 0, which implies that a_ij Σ_{s,t=1}^{k} x_is b_st x_jt = 0.
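A quick numerical check of this argument: for an assignment matrix X whose rows are the indicator vectors of the vertices, the (i, j) entry of X B Xᵀ is exactly 1 when i and j sit in different blocks and 0 otherwise, so Equation 7.6 sums a_ij over cut pairs only. A minimal sketch (the tiny assignment is made up):

```python
import numpy as np

def cut_indicator(X, B):
    """For assignment matrix X (rows = vertices, cols = blocks) and the
    matrix B with zero diagonal and ones elsewhere, entry (i, j) of
    X @ B @ X.T is 1 iff vertices i and j lie in different blocks."""
    return X @ B @ X.T

k = 3
B = np.ones((k, k)) - np.eye(k)
X = np.array([[1, 0, 0],    # vertex 0 in block 0
              [1, 0, 0],    # vertex 1 in block 0
              [0, 0, 1]])   # vertex 2 in block 2
print(cut_indicator(X, B))  # 1s appear only where blocks differ
```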
In Refs. [AV93], [Kuc05], the authors formulate the k-way partitioning problem as a 0-1 integer linear program (ILP). Assume there are j = 1, ..., k blocks, i = 1, ..., |V| vertices, s = 1, ..., |E| nets, and i = 1, ..., |e_s| vertices per net s. Let s(i) denote the index of the ith vertex of net s in the set of vertices, V. Define x_ij to be an indicator variable such that

$$x_{ij} = \begin{cases} 1 & \text{vertex } i \text{ is in block } j \\ 0 & \text{otherwise} \end{cases}$$
The crux of the model is in the way we represent uncut nets. If a specific net consists of vertices 1 through 4, then it will be uncut if

$$x_{1j}\, x_{2j}\, x_{3j}\, x_{4j} = 1 \quad \text{for some } j$$

Introduce the indicator variable

$$y_{sj} = \begin{cases} 1 & \text{if net } s \text{ has all of its vertices entirely in block } j \\ 0 & \text{otherwise} \end{cases}$$
These constraints enable us to write the partitioning problem as an integer program. To understand how they work, consider a net consisting of vertices 1 and 5. For this net to be uncut, x_1j x_5j = 1. Because x_1j, x_5j ∈ {0, 1}, it is true that x_1j x_5j ≤ x_1j and x_1j x_5j ≤ x_5j.
The objective function maximizes the sum of uncut nets (hence minimizing the sum of cut nets):
$$\max \sum_{j=1}^{k} \sum_{s=1}^{n} y_{sj}$$

$$\text{s.t.} \quad y_{sj} \le x_{s(i)j} \quad \forall i, j, s \qquad (7.11)$$

$$\sum_{j=1}^{k} x_{ij} = 1 \quad \forall i \qquad (7.12)$$

$$l_j \le \sum_{i=1}^{m} a_i x_{ij} \le u_j \quad \forall j \qquad (7.13)$$

$$x_{pq} = 1 \quad p \in V,\; q \in B \qquad (7.14)$$
The constraint in Equation 7.11 is the net connectivity constraint. The constraint in Equation 7.12 has each vertex assigned to exactly one block. The constraint in Equation 7.13 imposes block size limits, given nonunit cell sizes a_i; the bounds for bipartitioning are typically l_j = [0.45 Σ_{i=1}^{m} a_i] and u_j = [0.55 Σ_{i=1}^{m} a_i]. The constraint in Equation 7.14 indicates that vertex p is fixed in block q.
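This formulation translates almost line for line into an off-the-shelf 0-1 ILP modeler. The sketch below uses the PuLP library (assumed installed, with its bundled CBC solver) on a made-up four-vertex netlist; the fixed-vertex constraints of Equation 7.14 are omitted for brevity.

```python
import pulp

# Tiny made-up instance: 4 unit-area vertices, 3 two-pin nets, 2 blocks.
nets = [[0, 1], [1, 2], [2, 3]]
n_v, n_nets, k = 4, len(nets), 2
area = [1, 1, 1, 1]
lo, hi = 1, 3   # block size bounds (l_j, u_j)

prob = pulp.LpProblem("kway_partition", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (range(n_v), range(k)), cat=pulp.LpBinary)
y = pulp.LpVariable.dicts("y", (range(n_nets), range(k)), cat=pulp.LpBinary)

# Objective: maximize the number of uncut nets.
prob += pulp.lpSum(y[s][j] for s in range(n_nets) for j in range(k))
# Constraint 7.11: y_sj <= x_{s(i)j} for every vertex of every net.
for s, net in enumerate(nets):
    for j in range(k):
        for i in net:
            prob += y[s][j] <= x[i][j]
# Constraint 7.12: each vertex in exactly one block.
for i in range(n_v):
    prob += pulp.lpSum(x[i][j] for j in range(k)) == 1
# Constraint 7.13: block size (area) bounds.
for j in range(k):
    total = pulp.lpSum(area[i] * x[i][j] for i in range(n_v))
    prob += total >= lo
    prob += total <= hi

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([[i for i in range(n_v) if x[i][j].value() > 0.5] for j in range(k)])
```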
7.3.4 NETWORK FLOW
FIGURE 7.10 Flow network. (From Ford, L. R. and Fulkerson, D. R., Flows in Networks, Princeton University Press, Princeton, NJ, 1962.)

Given a directed graph G, each directed edge (or arc) (x, y) has an associated nonnegative number c(x, y) called the capacity of the arc. The capacity can be viewed as the maximal amount of flow that leaves x and ends at y per unit time [FF62]. Let s indicate a starting node and t a terminating node.
A flow from s to t is a function f that satisfies

$$\sum_{y} f(x, y) - \sum_{y} f(y, x) = \begin{cases} k, & x = s \\ 0, & x \neq s, t \\ -k, & x = t \end{cases} \qquad (7.17)$$

$$0 \le f(x, y) \le c(x, y) \qquad (7.18)$$
Equation 7.17 implies that the total flow k out of s is equal to −k out of t and that there is no net flow out of intermediate nodes (as with Kirchhoff's law). Equation 7.18 implies the flow is not allowed to exceed the capacity value. Borrowing the example from Ref. [FF62], in Figure 7.10, the flow out of s is −1 − 1 + 1 + 4 = 3, the flow out of intermediate node x is −4 + 2 + 1 + 1 = 0, and the flow out of t is −2 + 1 − 1 − 1 = −3. The idea behind bipartitioning is to separate G into two blocks (not necessarily the same size) such that s ∈ C1 and t ∈ C2, where the netcut is given by Σ_{x∈C1, y∈C2} c(x, y). The following theorem links the maximum flow to the netcut.
Theorem 3 (Max-Flow Min-Cut): For any network, the maximum flow value from s to t is equal to the minimum cut capacity over all cuts separating s and t.
If we can find the maximum flow value from s to t, we will have found the partition with the smallest cut. In Figure 7.10, the maximum flow is 3. In Ref. [FF62], the authors prove that the maximum flow computation can be solved in polynomial time. The problem is that the resulting partitions can be very unbalanced.
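For concreteness, here is a compact Edmonds-Karp sketch (shortest augmenting paths via breadth-first search) that computes the maximum flow value on a capacity matrix; by the theorem above, the returned value also equals the minimum cut capacity. The four-node network is made up for illustration and is not the network of Figure 7.10.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths
    in the residual network until none remains."""
    n = len(cap)
    flow = 0
    residual = [row[:] for row in cap]
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path: flow is maximum
            return flow
        # find the bottleneck along the path, then update residual capacities
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

cap = [[0, 2, 2, 0],    # s = node 0
       [0, 0, 1, 1],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]    # t = node 3
print(max_flow(cap, 0, 3))  # 3
```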
In Ref. [YW94], the authors propose a maximum flow algorithm that finds balanced partitions in polynomial time. Because nets are bidirectional, to apply network flow techniques, each net is transformed into an equivalent flow network using the flow representation shown in Figure 7.11. The idea is that all vertices in net 1 are connected toward vertex x and away from vertex y. The next step is to solve the maxflow-mincut problem to find the cutset, E_c, for the unbalanced problem. Finally, if the balance criterion is not satisfied, vertices in C1 (or C2) are collapsed into s (or t), a vertex v ∈ C1 (or in C2) incident on a net in E_c is collapsed into s (or t), and the cutset, E_c, is recomputed. The procedure has the same time complexity as the unbalanced mincut algorithm.
FIGURE 7.11 Efficient flow representation.
7.3.5 DYNAMIC PROGRAMMING
In a series of two papers [AK94], [AK96], the authors discuss clustering methods that form blocks by splitting a linear ordering of vertices using dynamic programming. It can be shown that dynamic programming can be used to optimally split the ordering into blocks [AK94].

In Ref. [AK94], the authors embed a linear ordering obtained from multiple eigenvectors in multidimensional space and use a traveling-salesman problem (TSP) heuristic to traverse the points. The idea is that points that are close together in the embedding are in proximity to one another in the linear ordering. A space-filling curve is then used as a good TSP heuristic because it traverses points that are near each other before wandering off to explore other parts of the space. They construct k blocks by splitting the tour into 2, 3, ..., k − 1, up to k segments using dynamic programming.
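The splitting step can be captured by a simple recurrence: the best cost of cutting order[i:] into m segments is the minimum over the first breakpoint j of seg_cost(i, j) plus the best cost of cutting order[j:] into m − 1 segments. A sketch with a user-supplied segment cost follows; the balance-style cost in the demo is made up, and [AK94] optimize their own cluster metric.

```python
import functools

def split_ordering(order, k, seg_cost):
    """Optimally split a linear ordering into k contiguous segments,
    minimizing the total of seg_cost(i, j) over segments order[i:j]."""
    n = len(order)

    @functools.lru_cache(maxsize=None)
    def best(i, m):
        # minimum cost of splitting order[i:] into m segments
        if m == 1:
            return seg_cost(i, n), ((i, n),)
        candidates = []
        for j in range(i + 1, n - m + 2):   # leave room for m - 1 segments
            cost, segs = best(j, m - 1)
            candidates.append((seg_cost(i, j) + cost, ((i, j),) + segs))
        return min(candidates)

    cost, segments = best(0, k)
    return cost, [order[i:j] for i, j in segments]

# Toy cost: squared deviation of segment size from the balanced size.
order = list(range(10))
cost, blocks = split_ordering(order, 3, lambda i, j: ((j - i) - 10 / 3) ** 2)
print(cost, blocks)
```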
7.4 CLUSTERING
Partitioning is implicitly a top-down process in which an entire netlist is scanned to separate vertices into a few blocks. The complementary process to partitioning is clustering, in which a few vertices at a time are grouped into a number of blocks proportional to the number of vertices [Alp96].
A block can be defined in a number of ways. Intuitively, a block is a dense region in a hypergraph [GPS90]. The clique is the densest possible subgraph of a graph. The density of a graph G(V, E) is $|E| / \binom{|V|}{2}$, and by this definition, clustering is the separation of V into k dense subgraphs, {C1, C2, ..., Ck}, in which each C_i has density equal to some δ, 0 < δ ≤ 1. However, this problem is NP-complete [AK95].
A less formal way of defining a block is simply as a region where vertices have multiple connections with one another. This forms the basis of clustering techniques that use vertex matchings. Normally, matchings apply to graphs, but here, we apply them to hypergraphs. A matching of G = (V, E) is a subset of hyperedges with the property that no two hyperedges share the same vertex. A heavy-edge matching means edges with the heaviest weights are selected first. A maximum matching means as many vertices as possible are matched [PS98], [Ten99]. For a hypergraph that consists of two-point hyperedges only, a maximum matching consists of |V|/2 edges (Figure 7.12); in the more general case, a maximum matching contracts fewer than |V|/2 edges.
FIGURE 7.12 Maximum matching of two-point hyperedges.
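A minimal greedy sketch of heavy-edge matching on two-point edges: visit edges in nonincreasing weight order and match the endpoints of each edge whose endpoints are both still free. The small edge list is made up for illustration.

```python
def heavy_edge_matching(edges):
    """Greedy heavy-edge matching: visit two-point edges (u, v, weight) by
    nonincreasing weight and match both endpoints if neither is matched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

edges = [(0, 1, 5.0), (1, 2, 4.0), (2, 3, 3.0), (3, 4, 2.0)]
print(heavy_edge_matching(edges))  # [(0, 1), (2, 3)]
```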
The clustering process tends to decrease the sparsity of the netlist, which is fortunate because FM-based algorithms perform best when the average vertex degree is larger than 5 [AK95]. Walshaw [Wal03] suggests that clustering filters out irrelevant data from the partitioning solution space so that subsequent iterative improvement steps look for a minimum in a more convex space.

We have divided clustering methods into three categories, roughly in chronological order. Clustering techniques block many vertices simultaneously in a hierarchical fashion [KK98,AK98] or one vertex at a time in an agglomerative fashion, based on physical connectivity information [AK96,CL00,HMS03,LMS05,AKN+05]. In cell placers, information such as cell names (i.e., indicating which presynthesized objects cells belonged to) may be incorporated to speed up the clustering heuristic.
7.4.1 HIERARCHICAL CLUSTERING
Hierarchical techniques merge all vertices into clusters at the same time. Candidate vertices for hierarchical clustering are based on the results of vertex matchings [BS93,HL95,AK98,KK98,Kar03]; matched vertices are then merged into clusters of vertices. Matchings are used extensively because they tend to locate independent logical groupings of vertices, thus avoiding the buildup of vertices of excessively large degree. Matchings may be selected randomly or by decreasing net size (called heavy-edge matching). After clustering, the average vertex weight increases, but the average net degree decreases. Karypis and Kumar [Kar03] use the following clustering schemes, assuming unit weights on nets:
1. Select pairs of vertices that are present in the same nets by finding a maximum matching of vertices based on a clique-graph representation (edge clustering).
2. Find a heavy-edge matching of vertices by nonincreasing net size; after all nets have been visited, merge matched vertices (net clustering).
3. After nets have been selected for matching, for each net that has not been contracted, contract its (unmatched) vertices together (modified net clustering).
4. To preserve some of the natural clustering that may be destroyed by the independence criterion of the previous three schemes, after an initial matching phase, for each vertex v ∈ V, consider the vertices that belong to the nets with the largest weight incident on v, whether they are matched or not (first choice clustering).
The clustering schemes are depicted in Figure 7.13.

Karypis [Kar03] points out that there is no consistently better clustering scheme for all netlists; examples can be constructed for which any of the above clustering methods fails to determine the correct partitions [Kar03]. Karypis [Kar03] also suggests that a good stopping point for clustering is when 30k vertices remain, where k indicates the desired number of blocks.
After the clustering phase, an initial bipartition that satisfies the balance constraint is computed. It is not necessary at this point to produce an optimal bipartition because that is ultimately the purpose of the refinement phase. Recently, several new clustering algorithms have been devised.
7.4.2 AGGLOMERATIVE CLUSTERING
Agglomerative methods form clusters one at a time based on the connectivity of nets adjacent to the vertices being considered. Once a cluster is formed, its vertices are removed from the remaining pool of vertices. The key to achieving a good clustering solution is in somehow capturing global connectivity information.
7.4.2.1 Clustering Based on Vertex Ordering
FIGURE 7.13 Clustering schemes: (a) edge clustering, (b) net clustering, (c) modified net clustering. (From Karypis, G., Multilevel Optimization in VLSICAD, Kluwer Academic Publishers, Boston, MA, 2003.)

In Ref. [AK96], the authors introduce the concept of an attraction function and a window to construct a linear ordering of vertices. Given a starting vertex, v∗, and an initially empty set of ordered vertices, S, they compute the attraction function for the vertices in V − S at each step i. Various attraction functions are described.
For example, one using the absorption objective is given by
$$\text{Attract}(i) = \sum_{e \in E(i)} \frac{1}{|e| - 1}$$

where E(i) indicates the set of edges at step i. They then select the vertex v∗_i in V − S with the optimal attraction function and add it to S. Finally, they update the attraction function for every vertex in V − S and repeat until V − S becomes empty. The order in which vertices are inserted into S defines blocks, where vertices that were recently inserted into S exert more attraction on v∗_i than vertices that were inserted many passes earlier (called windowing in Ref. [AK96]). Dynamic programming is ultimately used to split S into blocks. The authors report that windowing produced superior results with respect to the absorption metric over other ordering techniques.
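A small sketch of the ordering loop under the absorption attraction, with the simplifying assumption that only nets already touching the ordered set S contribute; the windowing refinement of [AK96] is omitted, and the net list is made up.

```python
def absorption_ordering(vertices, nets, start):
    """Order vertices by repeatedly picking the unordered vertex with the
    largest attraction: sum over incident nets e of 1 / (|e| - 1),
    counting only nets that already touch the ordered set S."""
    S = [start]
    remaining = set(vertices) - {start}
    while remaining:
        def attract(v):
            return sum(1.0 / (len(e) - 1)
                       for e in nets
                       if v in e and len(e) > 1 and any(u in S for u in e))
        best = max(remaining, key=attract)
        S.append(best)
        remaining.remove(best)
    return S

nets = [{0, 1, 2}, {1, 2}, {2, 3}, {3, 4, 5}, {4, 5}]
print(absorption_ordering(range(6), nets, start=0))
```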
7.4.2.2 Clustering Based on Connectivity
In Ref. [CL00], the authors use the concept of edge separability to guide the clustering process. Given an edge e = (x, y), the edge separability, λ(e), is defined as the minimum cutsize among cuts separating vertices x and y. To determine the set of nets to be clustered, Z(G), they solve a maximum flow problem (because computing edge separability is equivalent to finding the maximum flow between x and y). To assess in what order the nets in Z(G) should be contracted, the authors use a specialized ranking function related to the separability metric. Nets are contracted until the maximum cluster size limit of log2 |V| is reached.
In Refs. [HMS03], [HMS04], the authors use a clique representation of nets, where the weight of a connection c derived from a net e is given by

$$w(c) = \frac{w(e)}{|e| - 1}$$
FIGURE 7.14 Clique net model (with edge weights 1/(|e| − 1)) favors absorption better.
where w(c) is the weight of the connection and w(e) is the weight of a net segment (determined by the net model used). The rationale behind using a clique model for nets is that it favors configurations where the net is absorbed completely into a cluster. In Figure 7.14, net 1 consists of vertices {A, B, C} and net 2 consists of vertices {C, D}. On the left side, using a star net model, the cost of cutting any edge is 1, so clusters can be formed in three ways. On the right side, the cost of cutting the edge connecting C and D is highest, so clusters containing C and D are formed.
The cost of a fine cluster, f, is given by Σ_{c∈f} w(c), and the overall cost of a fine clustering solution is given by Σ_f Σ_{c∈f} w(c); the goal is to maximize the overall cost of the fine clustering solution.
In Ref. [LMS05], the authors propose a clustering technique based on physical connectivity. They define the internal force of a block C as the summation of the weights of all internal block connections:

$$F_{\text{int}}(C) = \sum_{i,j \in C} w(i, j)$$

As well, they define the external force of a block C as the summation of the weights of nets with at least one vertex located outside C and at least one vertex inside C:

$$F_{\text{ext}}(C) = \sum_{i \in C,\, j \notin C} w(i, j)$$

The measure that best reflects physical connectivity is the ratio of external to internal forces,

$$\Phi(C) = \frac{F_{\text{ext}}(C)}{F_{\text{int}}(C)}$$

where the goal is to minimize Φ(C). F_ext can be measured in other ways as well. In Ref. [LMS05], the authors use a local Rent's exponent of a block,
$$p = \log_G \left( \frac{T}{t} \right)$$

where G is the number of nodes in the block, T is the number of nets that have connections inside and outside the block, and t is the average node degree of the circuit.
The seed growth algorithm works by constructing a block with strong physical connectivity, starting from a seed node with large net degree. The connectivity between a neighbor node u and block C is given by conn(u, C) = Σ_{i∈C} w(u, i). In subsequent passes, neighbor nodes with the largest possible connectivity are added to the block while keeping the internal force as large as possible. When the block size exceeds some threshold value, an attempt is made to minimize the local Rent exponent to reduce the external force. Experimental results indicate that the seed growth algorithm produces placements with improved wirelength over placers that use the clustering techniques described in Section 7.4.1.
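A minimal sketch of the growth loop, assuming a symmetric pairwise weight matrix; the seed selection, internal-force bookkeeping, and Rent-exponent step are omitted, and the weight matrix is made up.

```python
def seed_growth(weights, seed, size_limit):
    """Grow a block from a seed: repeatedly add the neighbor u maximizing
    conn(u, C) = sum of w(u, i) over i in C, until the size limit."""
    C = {seed}
    n = len(weights)
    while len(C) < size_limit:
        neighbors = {u for u in range(n)
                     if u not in C and any(weights[u][i] > 0 for i in C)}
        if not neighbors:
            break
        best = max(neighbors, key=lambda u: sum(weights[u][i] for i in C))
        C.add(best)
    return C

# Symmetric weight matrix for 6 cells; cells 0-2 are tightly connected.
W = [[0, 3, 3, 1, 0, 0],
     [3, 0, 3, 0, 0, 0],
     [3, 3, 0, 0, 1, 0],
     [1, 0, 0, 0, 2, 2],
     [0, 0, 1, 2, 0, 2],
     [0, 0, 0, 2, 2, 0]]
print(seed_growth(W, seed=0, size_limit=3))  # {0, 1, 2}
```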
7.4.2.3 Clustering Based on Cell Area
In Ref. [AKN+05], the authors propose a clustering scheme tailored specifically to large-scale cell placement. Their method differs from those described in Section 7.4.1 in that those methods block vertices indiscriminately, whereas best choice clustering considers only the best possible pair of vertices among all vertex pairs. The main idea behind best choice clustering is to identify the best possible pair of clustering candidates using a priority-queue data structure whose keys are the tuples (u, v, d(u, v)), where u and v are the vertex pair and d(u, v) is the clustering score. The keys are sorted, in descending order, by clustering score; the pair at the top of the priority queue is blocked.
The clustering score is given by

$$d(u, v) = \sum_{e} \frac{1}{|e|} \cdot \frac{1}{a(u) + a(v)}$$
The first term is the weight of hyperedge e, which is inversely proportional to the number of vertices incident on hyperedge e. The a(u) + a(v) term is the total area of cells u and v. Thus, this method favors cells with small area connected by nets of small degree. The area term is necessary to prevent the formation of overly large blocks. The authors propose other score functions as well, including one that uses the total number of pins instead of cell area, because the total number of pins is more indicative of block size (via Rent's rule, described in Section 7.4.2).
Once the (u, v) pair with the highest clustering score is merged into vertex u, the clustering scores of all of u's neighbors must be recalculated. This is the most time-consuming stage of the best choice clustering algorithm. For this reason, the authors introduce the lazy-update clustering score technique, in which the recalculation of clustering scores is delayed until a vertex pair reaches the top of the priority queue.
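The lazy-update idea fits naturally on a binary heap: pop the best pair, and only if its stored score is stale re-insert it with the fresh score. A sketch with a precomputed score table follows; a real implementation would also recompute the scores of the merged vertex's neighbors, as described above, and the tiny score table is made up.

```python
import heapq

def best_choice_cluster(scores, num_merges):
    """Lazy-update best choice sketch: pop the top (score, u, v) pair and
    re-check its score before merging; stale entries are re-inserted
    instead of recomputing all neighbor scores after every merge."""
    heap = [(-d, u, v) for (u, v), d in scores.items()]
    heapq.heapify(heap)
    merged = set()
    merges = []
    while heap and len(merges) < num_merges:
        d, u, v = heapq.heappop(heap)
        if u in merged or v in merged:
            continue                      # pair no longer valid
        if -d != scores[(u, v)]:          # stale score: lazily refresh
            heapq.heappush(heap, (-scores[(u, v)], u, v))
            continue
        merges.append((u, v))
        merged.update((u, v))
    return merges

scores = {(0, 1): 2.5, (1, 2): 1.0, (2, 3): 2.0}
print(best_choice_cluster(scores, num_merges=2))  # [(0, 1), (2, 3)]
```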
The best choice clustering algorithm is shown to produce better quality placement solutions than edge coarsening and first-choice clustering. The lazy-update scheme is shown to be particularly effective at reducing runtime, with almost no change in half-perimeter wirelength. As of this writing, studies are under way into incorporating fixed vertices (corresponding to input/output terminals) into the best choice algorithm.
7.5 MULTILEVEL PARTITIONING
The gist of multilevel partitioning is to construct a sequence of successively coarser graphs, to partition the coarsest graph (subject to balance constraints), and to project the partitions onto the next-level finer graph while performing numerical or FM-type iterative improvement to further improve the partition [BJ93,BS93,HL95,Alp96,KAKS97] (Figure 7.15).

FIGURE 7.15 Essence of multilevel partitioning (clustering and refinement).
7.5.1 MULTILEVEL EIGENVECTOR PARTITIONING
The basis of multilevel partitioning with eigenvectors is described in Ref. [BS93] and consists of clustering, interpolation, and refinement steps. Contraction consists of selecting a subgraph, G′ with V′ ⊂ V, of the original graph such that V′ is a maximum matching with respect to G. The Lanczos algorithm [Dem97] is then applied to the reduced bipartitioning problem.

Interpolation consists of the following: given a |V′| × 1 Fiedler vector, x′, of the contracted graph G′, an interpolation step constructs a |V| × 1 vector x out of x′. This is accomplished by remembering that the ith component of x′ was derived by contracting vertex m(i) of x; upon reconstructing a new |V| × 1 vector, x0, component x′_i is inserted into the m(i)th slot of x0, initially filling all empty slots of x0 with zeros. For example, if
slots in x0with zeros For example, if
x0= [x10 0 x40 x60 0 0 x10] then the zero components are then assigned the average values of their left and right nonzero neighbors
x0 =
x1x1+ x4
2
x1+ x4
2 x4
x4+ x6
2 x6
x6+ x10
2
x6+ x10
2
x6+ x10
2 x10
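A short sketch of this interpolation step, matching the example above; it assumes the first and last slots are occupied by coarse components so every empty slot has a nonzero neighbor on both sides.

```python
def interpolate(coarse, slots, n):
    """Insert coarse Fiedler components into their slots of a length-n
    vector, then give every empty slot the average of the nearest
    inserted components to its left and right (assumes the first and
    last slots are among `slots`)."""
    x0 = [0.0] * n
    for value, m in zip(coarse, slots):
        x0[m] = value
    inserted = sorted(slots)
    for i in range(n):
        if i not in inserted:
            left = max(m for m in inserted if m < i)
            right = min(m for m in inserted if m > i)
            x0[i] = (x0[left] + x0[right]) / 2.0
    return x0

# Matches the example in the text: x1, x4, x6, x10 in slots 0, 3, 5, 9.
print(interpolate([0.1, 0.4, 0.6, 1.0], [0, 3, 5, 9], 10))
```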
Refinement consists of using x0 as a good starting solution for the Fiedler optimization problem of Equations 7.3 through 7.5. The authors use a cubically converging numerical technique called Rayleigh quotient iteration to solve for x [Wat91].
7.5.2 MULTILEVEL MOVE-BASED PARTITIONING
One of the original works on multilevel partitioning in the VLSI domain [AHK96] applied techniques that were previously employed on finite element meshes [HL95], [KK95]. The authors converted circuit netlists to graphs, using a clique representation for individual nets, and ran the multilevel graph partitioner, Metis [KK95], to obtain high-quality bipartitions. Using a graph representation, however, has the pitfall that removing one edge from the cutset does not reflect the true objective, which is to remove an entire net from the cutset. Subsequent works [AHK97], [KAKS97] partitioned hypergraphs directly using the two-stage approach of clustering and refinement, obtaining optimal or near-optimal mincut results on the set of test cases listed. Multilevel partitioning remains, to this day, the de facto partitioning technique.
Multilevel move-based partitioning consists of clustering and iterative improvement steps. The power of multilevel partitioning becomes evident during the iterative improvement phase, where moving one vertex across the block boundary corresponds to moving an entire group of clustered vertices.

The refinement process consists of repeatedly applying an iterative improvement phase to successively finer hypergraphs, declustering after each pass of the interchange heuristic. Because of the space complexity of Sanchis' k-way FM algorithm, and because vertices are already clustered into the proper blocks, Karypis et al. [KK99] use a downhill-only search variant of FM that does not require a bucket list. Their refinement method visits vertices in random order and moves them if the move results in a positive gain (and preserves the balance criterion). If a vertex v is internal to the block being considered, it is not moved; if v is a boundary vertex, it can be moved to a block that houses v's neighbors. The move that generates the highest gain is effectuated. In experiments, the refinement method converges to a high-quality solution in only a few passes.
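A toy sketch of one downhill-only pass: vertices are visited in random order, internal vertices are skipped, and a boundary vertex moves to the neighboring block with the largest positive gain that respects a block size limit. Gains are recomputed from scratch here for clarity; real implementations maintain incremental gain values, and the small netlist is made up.

```python
import random

def downhill_pass(nets, assign, max_size):
    """One downhill-only refinement pass over a hypergraph given as a list
    of nets (vertex sets) and a vertex-to-block assignment dict."""
    def cut(a):
        return sum(1 for e in nets if len({a[v] for v in e}) > 1)
    vertices = list(assign)
    random.shuffle(vertices)
    for v in vertices:
        neighbor_blocks = {assign[u] for e in nets if v in e for u in e} - {assign[v]}
        if not neighbor_blocks:
            continue                      # internal vertex: never moved
        base = cut(assign)
        best_block, best_gain = None, 0
        for b in neighbor_blocks:
            if sum(1 for u in assign if assign[u] == b) >= max_size:
                continue                  # balance criterion
            old = assign[v]
            assign[v] = b
            gain = base - cut(assign)     # positive gain = fewer cut nets
            assign[v] = old
            if gain > best_gain:
                best_block, best_gain = b, gain
        if best_block is not None:
            assign[v] = best_block
    return assign

nets = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}]
assign = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
random.seed(1)
print(downhill_pass(nets, assign, max_size=4))
```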