FIGURE 7.9 Placements of prim1 using (a) two eigenvectors (2D placement) and (b) three eigenvectors (3D placement).
7.3.1.2 Partitioning Solutions from Multiple Eigenvectors
It is also possible to use multiple eigenvectors to determine arrangements of vertices that minimize the number of cuts. Hall [Hal70] suggests that the location of the vertices in r-dimensional space can be used to identify blocks (see Section 7.3.1 for a description of his method). Two- and three-dimensional placements of prim1 are shown in Figure 7.9. The three branches in the two-dimensional plot indicate that three blocks should be formed. On the other hand, it is not as obvious how to cluster vertices in the three-dimensional plot.
Instead of minimizing the squared distance between two vertices as in Equations 7.3 and 7.4, Frankle and Karp [FK86] transform the distance minimization problem into one of finding the point, obtained by projecting x onto all eigenvectors, that is furthest from the origin. The vector induced by this point gives a good ordering with respect to wirelength.
Chan et al. [CSZ94] use the cosine of the angle between two rows of the |V| × k eigenvector matrix, V, to determine how close the vertices are to each other. If the cosine between two vectors is close to 1, then the corresponding vertices must belong to the same block. Their k-way partitioning heuristic constructs k prototype vectors with distinct directions (to represent blocks) and places into the corresponding block the vertices whose rows lie within π/8 radians of the prototype vector.
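To make the assignment rule concrete, here is a minimal Python/NumPy sketch that assigns each row of a |V| × k eigenvector matrix to the nearest of a given set of prototype vectors, keeping only vertices within π/8 radians. The prototypes are passed in rather than constructed, so this illustrates the assignment test only, not the full heuristic of [CSZ94]; the toy matrix is made up.

```python
import numpy as np

def assign_by_cosine(V, prototypes, threshold=np.pi / 8):
    """Assign each vertex (row of V) to the prototype whose direction
    is within `threshold` radians of the vertex's eigenvector row."""
    blocks = [[] for _ in prototypes]
    unassigned = []
    for i, row in enumerate(V):
        # cosine of the angle between the row and each prototype
        cosines = [row @ p / (np.linalg.norm(row) * np.linalg.norm(p))
                   for p in prototypes]
        best = int(np.argmax(cosines))
        angle = np.arccos(np.clip(cosines[best], -1.0, 1.0))
        if angle <= threshold:
            blocks[best].append(i)
        else:
            unassigned.append(i)   # falls outside every pi/8 cone
    return blocks, unassigned

# Toy example: two blocks recovered from a 6 x 2 eigenvector matrix.
V = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
              [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0]])
prototypes = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(assign_by_cosine(V, prototypes))
```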
This approach was the starting point for a method devised by Alpert et al. The idea behind multiple eigenvector linear orderings (MELO) [AY95], [AKY99] is that, after removing the first column (which corresponds to the zero eigenvalue) from V (call this matrix V′), a partition that satisfies the usual mincut objective and balance constraints is obtained by finding a permutation of the rows of V′ that results in the maximum possible two-norm sum of the rows. Alpert and Yao [AKY99] prove that when the number of eigenvectors selected is n, maximizing the vector sum is equivalent to minimizing netcut.
7.3.2 LINEAR PROGRAMMING FORMULATIONS
In PARABOLI, Riess et al. [RDJ94], [AK95] use the eigenvector technique of Section 7.3.1 to fix the vertices corresponding to the ten smallest eigenvector components and the ten largest eigenvector components to locations 1.0 and 0.0, respectively. The center of gravity of the remaining vertices is fixed at location 0.5. They use a mathematical programming technique to reposition the free vertices so the overall wirelength is reduced. The mathematical formulation is given by
$$\min \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \frac{a_{ij}}{|x_i - x_j|}\,(x_i - x_j)^2 \qquad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f$$
In the next pass of the algorithm, the 5 percent of vertices with the largest (smallest) resulting coordinates are moved so that their center of gravity is at x_i = 0.95 (x_i = 0.05). After performing the optimization and repositioning, the process is repeated with centers of gravity at x_i = 0.9 and x_i = 0.1, and so on. The process is repeated ten times, yielding ten different orderings; the best ordering is the one among the ten with the best ratio cut metric.
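Below is a sketch of one linearization pass under simplifying assumptions: the center-of-gravity constraint is replaced by pinned end vertices, and the weighted-Laplacian system is solved directly with NumPy, so this illustrates the reweighting idea rather than the PARABOLI implementation (the path graph and pin locations are made up).

```python
import numpy as np

def reweighted_pass(A, x, fixed):
    """One linearization pass: weight each edge by a_ij / |x_i - x_j|
    (from the current coordinates), so the quadratic objective mimics the
    linear one, then solve the weighted-Laplacian system for the free
    vertices with the fixed vertices pinned."""
    n = len(x)
    W = A / (np.abs(x[:, None] - x[None, :]) + 1e-9)  # avoid divide-by-zero
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                    # weighted Laplacian
    pins = sorted(fixed)
    free = [i for i in range(n) if i not in fixed]
    xf = np.linalg.solve(L[np.ix_(free, free)], -L[np.ix_(free, pins)] @ x[pins])
    out = x.copy()
    out[free] = xf
    return out

# Path graph 0-1-2-3-4 with the end vertices pinned at 0.0 and 1.0.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
x = np.array([0.0, 0.3, 0.6, 0.8, 1.0])
for _ in range(5):                                    # a few reweighting passes
    x = reweighted_pass(A, x, {0, 4})
print(np.round(x, 3))
```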
In Ref. [LLLC96], the authors point out that linear cost functions spread out dense blocks of vertices, whereas quadratic cost functions naturally identify blocks of vertices, making it easier to assign discrete locations to otherwise closely packed vertices. They incorporate the merits of both linear and quadratic methods in a modified α-order cost function:
$$\min \sum_{i>j}^{|V|} \frac{a_{ij}}{|x_i - x_j|^{2-\alpha}}\,(x_i - x_j)^2 \qquad \text{s.t.} \quad \sum_{i=1}^{|V|} x_i = f$$
where 1 ≤ α ≤ 2. If α = 1, the cost function becomes the linear cost function; for α = 2, it becomes the quadratic cost function. They observe that α = 1.2 best combines the benefits of the linear and quadratic cost functions.
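Note that a_ij (x_i − x_j)^2 / |x_i − x_j|^(2−α) simplifies to a_ij |x_i − x_j|^α, which makes the interpolation between the linear (α = 1) and quadratic (α = 2) costs explicit. A small sketch evaluating this cost family on a made-up three-vertex graph:

```python
import numpy as np

def alpha_cost(A, x, alpha):
    """Modified alpha-order cost: sum over pairs of
    a_ij * (x_i - x_j)^2 / |x_i - x_j|^(2 - alpha) = a_ij * |x_i - x_j|^alpha."""
    i, j = np.triu_indices(len(x), k=1)
    d = np.abs(x[i] - x[j])
    return float(np.sum(A[i, j] * d ** alpha))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([0.0, 0.4, 1.0])
for alpha in (1.0, 1.2, 2.0):   # linear, recommended, quadratic
    print(alpha, alpha_cost(A, x, alpha))
```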
7.3.3 INTEGER PROGRAMMING FORMULATIONS
In Ref. [AK95], the authors formulate bipartitioning as an integer quadratic program. Let x_is indicate that vertex i belongs to block s. Let a_ij represent the cost of the edge connecting vertices i and j. Let B be a matrix with b_ii = 0, ∀i, and b_ij = 1 for i ≠ j. The number of edges that have endpoints in more than one block is given by
$$\min \sum_{i,j=1}^{|V|} \sum_{s,t=1}^{k} a_{ij}\, x_{is}\, b_{st}\, x_{jt} \qquad (7.6)$$

$$\text{s.t.} \quad \sum_{s=1}^{k} x_{is} = 1 \quad \forall i \qquad (7.7)$$

$$\sum_{i=1}^{|V|} x_{is} = S_s \quad \forall s \qquad (7.8)$$

where S_s denotes the prescribed size of block s.
The constraint in Equation 7.7 indicates that each vertex belongs to exactly one block, and the constraint in Equation 7.8 sets the block sizes. The rationale behind the objective function is that when the edge (i, j) is cut, a_ij Σ_{s,t=1}^{k} x_is b_st x_jt = a_ij; in effect, the cost of cutting the edge (i, j) appears only once in the summation. On the other hand, if edge (i, j) is uncut, then s = t and b_st = 0, which implies that a_ij Σ_{s,t=1}^{k} x_is b_st x_jt = 0.
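A quick numerical check of this argument: for an assignment matrix X whose rows are the indicator vectors of the vertices, the (i, j) entry of X B Xᵀ is exactly 1 when i and j sit in different blocks and 0 otherwise, so Equation 7.6 sums a_ij over cut pairs only. A minimal sketch (the tiny assignment is made up):

```python
import numpy as np

def cut_indicator(X, B):
    """For assignment matrix X (rows = vertices, cols = blocks) and the
    matrix B with zero diagonal and ones elsewhere, entry (i, j) of
    X @ B @ X.T is 1 iff vertices i and j lie in different blocks."""
    return X @ B @ X.T

k = 3
B = np.ones((k, k)) - np.eye(k)
X = np.array([[1, 0, 0],    # vertex 0 in block 0
              [1, 0, 0],    # vertex 1 in block 0
              [0, 0, 1]])   # vertex 2 in block 2
print(cut_indicator(X, B))  # 1s appear only where blocks differ
```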
In Refs. [AV93], [Kuc05], the authors formulate the k-way partitioning problem as a 0-1 integer linear program (ILP). Assume there are j = 1, ..., k blocks, i = 1, ..., |V| vertices, s = 1, ..., |E| nets, and i = 1, ..., |e_s| vertices per net s. Let s(i) denote the index of the ith vertex of net s in the set of vertices, V. Define x_ij to be an indicator variable such that

$$x_{ij} = \begin{cases} 1 & \text{vertex } i \text{ is in block } j \\ 0 & \text{otherwise} \end{cases}$$
The crux of the model is in the way we represent uncut nets. If a specific net consists of vertices 1 through 4, then it will be uncut if

$$x_{1j}\, x_{2j}\, x_{3j}\, x_{4j} = 1 \quad \text{for some } j$$

Introduce the indicator variable

$$y_{sj} = \begin{cases} 1 & \text{if net } s \text{ has all of its vertices entirely in block } j \\ 0 & \text{otherwise} \end{cases}$$
These constraints enable us to write the partitioning problem as an integer program. To understand how they work, consider a net consisting of vertices 1 and 5. For this net to be uncut, x_1j x_5j = 1. Because x_1j, x_5j ∈ {0, 1}, it is true that x_1j x_5j ≤ x_1j and x_1j x_5j ≤ x_5j.
The objective function maximizes the sum of uncut nets (hence minimizing the sum of cut nets):
$$\max \sum_{j=1}^{k} \sum_{s=1}^{n} y_{sj}$$

$$\text{s.t.} \quad y_{sj} \le x_{s(i)j} \quad \forall i, j, s \qquad (7.11)$$

$$\sum_{j=1}^{k} x_{ij} = 1 \quad \forall i \qquad (7.12)$$

$$l_j \le \sum_{i=1}^{m} a_i x_{ij} \le u_j \quad \forall j \qquad (7.13)$$

$$x_{pq} = 1 \quad p \in V,\; q \in B \qquad (7.14)$$
The constraint in Equation 7.11 is the net connectivity constraint. The constraint in Equation 7.12 has each vertex assigned to exactly one block. The constraint in Equation 7.13 imposes block size limits, given nonunit cell sizes a_i; the bounds for bipartitioning are typically l_j = [0.45 Σ_{i=1}^{m} a_i] and u_j = [0.55 Σ_{i=1}^{m} a_i]. The constraint in Equation 7.14 indicates that vertex p is fixed in block q.
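This formulation translates almost line for line into an off-the-shelf 0-1 ILP modeler. The sketch below uses the PuLP library (assumed installed, with its bundled CBC solver) on a made-up four-vertex netlist; the fixed-vertex constraints of Equation 7.14 are omitted for brevity.

```python
import pulp

# Tiny made-up instance: 4 unit-area vertices, 3 two-pin nets, 2 blocks.
nets = [[0, 1], [1, 2], [2, 3]]
n_v, n_nets, k = 4, len(nets), 2
area = [1, 1, 1, 1]
lo, hi = 1, 3   # block size bounds (l_j, u_j)

prob = pulp.LpProblem("kway_partition", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (range(n_v), range(k)), cat=pulp.LpBinary)
y = pulp.LpVariable.dicts("y", (range(n_nets), range(k)), cat=pulp.LpBinary)

# Objective: maximize the number of uncut nets.
prob += pulp.lpSum(y[s][j] for s in range(n_nets) for j in range(k))
# Constraint 7.11: y_sj <= x_{s(i)j} for every vertex of every net.
for s, net in enumerate(nets):
    for j in range(k):
        for i in net:
            prob += y[s][j] <= x[i][j]
# Constraint 7.12: each vertex in exactly one block.
for i in range(n_v):
    prob += pulp.lpSum(x[i][j] for j in range(k)) == 1
# Constraint 7.13: block size (area) bounds.
for j in range(k):
    total = pulp.lpSum(area[i] * x[i][j] for i in range(n_v))
    prob += total >= lo
    prob += total <= hi

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([[i for i in range(n_v) if x[i][j].value() > 0.5] for j in range(k)])
```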
7.3.4 NETWORK FLOW
FIGURE 7.10 Flow network. (From Ford, L. R. and Fulkerson, D. R., Flows in Networks, Princeton University Press, Princeton, NJ, 1962.)

Given a directed graph G, each directed edge (or arc) (x, y) has an associated nonnegative number c(x, y) called the capacity of the arc. The capacity can be viewed as the maximal amount of flow that leaves x and ends at y per unit time [FF62]. Let s indicate a starting node and t a terminating node.
A flow from s to t is a function f that satisfies

$$\sum_{y} f(x, y) - \sum_{y} f(y, x) = \begin{cases} k, & x = s \\ 0, & x \neq s, t \\ -k, & x = t \end{cases} \qquad (7.17)$$

$$0 \le f(x, y) \le c(x, y) \qquad (7.18)$$
Equation 7.17 implies that the total flow k out of s is equal to −k out of t and that there is no net flow out of intermediate nodes (as with Kirchhoff's law). Equation 7.18 implies the flow is not allowed to exceed the capacity value. Borrowing the example from Ref. [FF62], in Figure 7.10, the flow out of s is −1 − 1 + 1 + 4 = 3, the flow out of intermediate node x is −4 + 2 + 1 + 1 = 0, and the flow out of t is −2 + 1 − 1 − 1 = −3. The idea behind bipartitioning is to separate G into two blocks (not necessarily the same size) such that s ∈ C1 and t ∈ C2, where the netcut is given by Σ_{x∈C1, y∈C2} c(x, y). The following theorem links the maximum flow to the netcut.
Theorem 3 (Max-Flow Min-Cut): For any network, the maximum flow value from s to t is equal to the minimum cut capacity over all cuts separating s and t.
If we can find the maximum flow value from s to t, we will have found the partition with the smallest cut. In Figure 7.10, the maximum flow is 3. In Ref. [FF62], the authors prove that the maximum flow computation can be solved in polynomial time. The problem is that the resulting partitions can be very unbalanced.
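For concreteness, here is a compact Edmonds-Karp sketch (shortest augmenting paths via breadth-first search) that computes the maximum flow value on a capacity matrix; by the theorem above, the returned value also equals the minimum cut capacity. The four-node network is made up for illustration and is not the network of Figure 7.10.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths
    in the residual network until none remains."""
    n = len(cap)
    flow = 0
    residual = [row[:] for row in cap]
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:          # no augmenting path: flow is maximum
            return flow
        # find the bottleneck along the path, then update residual capacities
        bottleneck, v = float("inf"), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

cap = [[0, 2, 2, 0],    # s = node 0
       [0, 0, 1, 1],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]    # t = node 3
print(max_flow(cap, 0, 3))  # 3
```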
In Ref. [YW94], the authors propose a maximum flow algorithm that finds balanced partitions in polynomial time. Because nets are bidirectional, to apply network flow techniques, each net is transformed into an equivalent flow network using the flow representation shown in Figure 7.11. The idea is that all vertices in net 1 are connected toward vertex x and away from vertex y. The next step is to solve the maxflow-mincut problem to find the cutset, E_c, for the unbalanced problem. Finally, if the balance criterion is not satisfied, vertices in C1 (or C2) are collapsed into s (or t), a vertex v ∈ C1 (or in C2) incident on a net in E_c is collapsed into s (or t), and the cutset, E_c, is recomputed. The procedure has the same time complexity as the unbalanced mincut algorithm.
FIGURE 7.11 Efficient flow representation.
7.3.5 DYNAMIC PROGRAMMING
In a series of two papers [AK94], [AK96], the authors discuss clustering methods that form blocks by splitting a linear ordering of vertices using dynamic programming. It can be shown that dynamic programming can be used to optimally split the ordering into blocks [AK94].

In Ref. [AK94], the authors embed a linear ordering obtained from multiple eigenvectors in multidimensional space and use a traveling-salesman problem (TSP) heuristic to traverse the points. The idea is that points that are close together in the embedding are in proximity to one another in the linear ordering. A space-filling curve is then used as a good TSP heuristic because it traverses points that are near each other before wandering off to explore other parts of the space. They construct k blocks by splitting the tour into 2, 3, ..., k − 1, up to k segments using dynamic programming.
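The splitting step can be captured by a simple recurrence: the best cost of cutting order[i:] into m segments is the minimum over the first breakpoint j of seg_cost(i, j) plus the best cost of cutting order[j:] into m − 1 segments. A sketch with a user-supplied segment cost follows; the balance-style cost in the demo is made up, and [AK94] optimize their own cluster metric.

```python
import functools

def split_ordering(order, k, seg_cost):
    """Optimally split a linear ordering into k contiguous segments,
    minimizing the total of seg_cost(i, j) over segments order[i:j]."""
    n = len(order)

    @functools.lru_cache(maxsize=None)
    def best(i, m):
        # minimum cost of splitting order[i:] into m segments
        if m == 1:
            return seg_cost(i, n), ((i, n),)
        candidates = []
        for j in range(i + 1, n - m + 2):   # leave room for m - 1 segments
            cost, segs = best(j, m - 1)
            candidates.append((seg_cost(i, j) + cost, ((i, j),) + segs))
        return min(candidates)

    cost, segments = best(0, k)
    return cost, [order[i:j] for i, j in segments]

# Toy cost: squared deviation of segment size from the balanced size.
order = list(range(10))
cost, blocks = split_ordering(order, 3, lambda i, j: ((j - i) - 10 / 3) ** 2)
print(cost, blocks)
```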
7.4 CLUSTERING
Partitioning is implicitly a top-down process in which an entire netlist is scanned to separate vertices into a few blocks. The complementary process to partitioning is clustering, in which a few vertices at a time are grouped into a number of blocks proportional to the number of vertices [Alp96].
A block can be defined in a number of ways. Intuitively, a block is a dense region in a hypergraph [GPS90]. The clique is the densest possible subgraph of a graph. The density of a graph G(V, E) is $|E| / \binom{|V|}{2}$, and by this definition, clustering is the separation of V into k dense subgraphs, {C1, C2, ..., Ck}, in which each C_i has density equal to some δ, 0 < δ ≤ 1. However, this problem is NP-complete [AK95].
A less formal way of defining a block is simply as a region where vertices have multiple connections with one another. This forms the basis of clustering techniques that use vertex matchings. Normally, matchings apply to graphs, but here, we apply them to hypergraphs. A matching of G = (V, E) is a subset of hyperedges with the property that no two hyperedges share the same vertex. A heavy-edge matching means edges with the heaviest weights are selected first. A maximum matching means as many vertices as possible are matched [PS98], [Ten99]. For a hypergraph that consists of two-point hyperedges only, a maximum matching consists of |V|/2 edges (Figure 7.12); in the more general case, a maximum matching contracts fewer than |V|/2 edges.
FIGURE 7.12 Maximum matching of two-point hyperedges.
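A minimal greedy sketch of heavy-edge matching on two-point edges: visit edges in nonincreasing weight order and match the endpoints of each edge whose endpoints are both still free. The small edge list is made up for illustration.

```python
def heavy_edge_matching(edges):
    """Greedy heavy-edge matching: visit two-point edges (u, v, weight) by
    nonincreasing weight and match both endpoints if neither is matched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

edges = [(0, 1, 5.0), (1, 2, 4.0), (2, 3, 3.0), (3, 4, 2.0)]
print(heavy_edge_matching(edges))  # [(0, 1), (2, 3)]
```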
The clustering process tends to decrease the sparsity of the netlist, which is fortunate because FM-based algorithms perform best when the average vertex degree is larger than 5 [AK95]. Walshaw [Wal03] suggests that clustering filters out irrelevant data from the partitioning solution space so that subsequent iterative improvement steps look for a minimum in a more convex space.

We have divided clustering methods into three categories, roughly in chronological order. Clustering techniques block many vertices simultaneously in a hierarchical fashion [KK98,AK98] or one vertex at a time in an agglomerative fashion, based on physical connectivity information [AK96,CL00,HMS03,LMS05,AKN+05]. In cell placers, information such as cell names (i.e., indicating which presynthesized objects cells belonged to) may be incorporated to speed up the clustering heuristic.
7.4.1 HIERARCHICAL CLUSTERING
Hierarchical techniques merge all vertices into clusters at the same time. Candidate vertices for hierarchical clustering are based on the results of vertex matchings [BS93,HL95,AK98,KK98,Kar03]; matched vertices are then merged into clusters of vertices. Matchings are used extensively because they tend to locate independent logical groupings of vertices, thus avoiding the buildup of vertices of excessively large degree. Matchings may be selected randomly or by decreasing net size (called heavy-edge matching). After clustering, the average vertex weight increases, but the average net degree decreases. Karypis and Kumar [Kar03] use the following clustering schemes, assuming unit weights on nets:
1. Select pairs of vertices that are present in the same nets by finding a maximum matching of vertices based on a clique-graph representation (edge clustering).
2. Find a heavy-edge matching of vertices by nonincreasing net size; after all nets have been visited, merge matched vertices (net clustering).
3. After nets have been selected for matching, for each net that has not been contracted, contract its (unmatched) vertices together (modified net clustering).
4. To preserve some of the natural clustering that may be destroyed by the independence criterion of the previous three schemes, after an initial matching phase, for each vertex v ∈ V, consider the vertices that belong to the nets with the largest weight incident on v, whether they are matched or not (first choice clustering).
The clustering schemes are depicted in Figure 7.13.

Karypis [Kar03] points out that there is no consistently better clustering scheme for all netlists; examples can be constructed for which any of the above clustering methods fails to determine the correct partitions [Kar03]. Karypis [Kar03] also suggests that a good stopping point for clustering is when 30k vertices remain, where k indicates the desired number of blocks.
After the clustering phase, an initial bipartition that satisfies the balance constraint is computed. It is not necessary at this point to produce an optimal bipartition because that is ultimately the purpose of the refinement phase. Recently, several new clustering algorithms have been devised.
7.4.2 AGGLOMERATIVE CLUSTERING
Agglomerative methods form clusters one at a time based on the connectivity of nets adjacent to the vertices being considered. Once a cluster is formed, its vertices are removed from the remaining pool of vertices. The key to achieving a good clustering solution is in somehow capturing global connectivity information.
7.4.2.1 Clustering Based on Vertex Ordering
FIGURE 7.13 Clustering schemes: (a) edge clustering, (b) net clustering, (c) modified net clustering. (From Karypis, G., Multilevel Optimization in VLSICAD, Kluwer Academic Publishers, Boston, MA, 2003.)

In Ref. [AK96], the authors introduce the concept of an attraction function and a window to construct a linear ordering of vertices. Given a starting vertex, v∗, and an initially empty set of ordered vertices, S, they compute the attraction function for the vertices in V − S at each step i. Various attraction functions are described.
For example, one using the absorption objective is given by
$$\text{Attract}(i) = \sum_{e \in E(i)} \frac{1}{|e| - 1}$$

where E(i) indicates the set of edges at step i. They then select the vertex v∗_i in V − S with the optimal attraction function and add it to S. Finally, they update the attraction function for every vertex in V − S and repeat until V − S becomes empty. The order in which vertices are inserted into S defines blocks, where vertices that were recently inserted into S exert more attraction on v∗_i than vertices that were inserted many passes earlier (called windowing in Ref. [AK96]). Dynamic programming is ultimately used to split S into blocks. The authors report that windowing produced superior results with respect to the absorption metric over other ordering techniques.
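A small sketch of the ordering loop under the absorption attraction, with the simplifying assumption that only nets already touching the ordered set S contribute; the windowing refinement of [AK96] is omitted, and the net list is made up.

```python
def absorption_ordering(vertices, nets, start):
    """Order vertices by repeatedly picking the unordered vertex with the
    largest attraction: sum over incident nets e of 1 / (|e| - 1),
    counting only nets that already touch the ordered set S."""
    S = [start]
    remaining = set(vertices) - {start}
    while remaining:
        def attract(v):
            return sum(1.0 / (len(e) - 1)
                       for e in nets
                       if v in e and len(e) > 1 and any(u in S for u in e))
        best = max(remaining, key=attract)
        S.append(best)
        remaining.remove(best)
    return S

nets = [{0, 1, 2}, {1, 2}, {2, 3}, {3, 4, 5}, {4, 5}]
print(absorption_ordering(range(6), nets, start=0))
```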
7.4.2.2 Clustering Based on Connectivity
In Ref. [CL00], the authors use the concept of edge separability to guide the clustering process. Given an edge e = (x, y), the edge separability, λ(e), is defined as the minimum cutsize among cuts separating vertices x and y. To determine the set of nets to be clustered, Z(G), they solve a maximum flow problem (because computing edge separability is equivalent to finding the maximum flow between x and y). To assess in what order the nets in Z(G) should be contracted, the authors use a specialized ranking function related to the separability metric. Nets are contracted until the maximum cluster size limit of log2 |V| is reached.
In Refs. [HMS03], [HMS04], the authors use a clique representation of nets, where the weight of a connection c derived from a net e is given by

$$w(c) = \frac{w(e)}{|e| - 1}$$
FIGURE 7.14 Clique net model (with edge weights 1/(|e| − 1)) favors absorption better.
where w(c) is the weight of the connection and w(e) is the weight of a net segment (determined by the net model used). The rationale behind using a clique model for nets is that it favors configurations where the net is absorbed completely into a cluster. In Figure 7.14, net 1 consists of vertices {A, B, C} and net 2 consists of vertices {C, D}. On the left side, using a star net model, the cost of cutting any edge is 1, so clusters can be formed in three ways. On the right side, the cost of cutting the edge connecting C and D is highest, so clusters containing C and D are formed.
The cost of a fine cluster, f, is given by Σ_{c∈f} w(c), and the overall cost of a fine clustering solution is given by Σ_f Σ_{c∈f} w(c); the goal is to maximize the overall cost of the fine clustering solution.
In Ref. [LMS05], the authors propose a clustering technique based on physical connectivity. They define the internal force of a block C as the summation of the weights of all internal block connections:

$$F_{\text{int}}(C) = \sum_{i,j \in C} w(i, j)$$

As well, they define the external force of a block C as the summation of the weights of nets with at least one vertex located outside C and at least one vertex inside C:

$$F_{\text{ext}}(C) = \sum_{i \in C,\, j \notin C} w(i, j)$$

The measure that best reflects physical connectivity is the ratio of external to internal forces,

$$\Phi(C) = \frac{F_{\text{ext}}(C)}{F_{\text{int}}(C)}$$

where the goal is to minimize Φ(C). F_ext can be measured in other ways as well. In Ref. [LMS05], the authors use a local Rent's exponent of a block,
$$p = \log_G \left( \frac{T}{t} \right)$$

where G is the number of nodes in the block, T is the number of nets that have connections inside and outside the block, and t is the average node degree of the circuit.
The seed growth algorithm works by constructing a block with strong physical connectivity, starting from a seed node with large net degree. The connectivity between a neighbor node u and block C is given by conn(u, C) = Σ_{i∈C} w(u, i). In subsequent passes, neighbor nodes with the largest possible connectivity are added to the block while keeping the internal force as large as possible. When the block size exceeds some threshold value, an attempt is made to minimize the local Rent exponent to reduce the external force. Experimental results indicate that the seed growth algorithm produces placements with improved wirelength over placers that use the clustering techniques described in Section 7.4.1.
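A minimal sketch of the growth loop, assuming a symmetric pairwise weight matrix; the seed selection, internal-force bookkeeping, and Rent-exponent step are omitted, and the weight matrix is made up.

```python
def seed_growth(weights, seed, size_limit):
    """Grow a block from a seed: repeatedly add the neighbor u maximizing
    conn(u, C) = sum of w(u, i) over i in C, until the size limit."""
    C = {seed}
    n = len(weights)
    while len(C) < size_limit:
        neighbors = {u for u in range(n)
                     if u not in C and any(weights[u][i] > 0 for i in C)}
        if not neighbors:
            break
        best = max(neighbors, key=lambda u: sum(weights[u][i] for i in C))
        C.add(best)
    return C

# Symmetric weight matrix for 6 cells; cells 0-2 are tightly connected.
W = [[0, 3, 3, 1, 0, 0],
     [3, 0, 3, 0, 0, 0],
     [3, 3, 0, 0, 1, 0],
     [1, 0, 0, 0, 2, 2],
     [0, 0, 1, 2, 0, 2],
     [0, 0, 0, 2, 2, 0]]
print(seed_growth(W, seed=0, size_limit=3))  # {0, 1, 2}
```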
7.4.2.3 Clustering Based on Cell Area
In Ref. [AKN+05], the authors propose a clustering scheme tailored specifically to large-scale cell placement. Their method differs from those described in Section 7.4.1 in that those methods block vertices indiscriminately, whereas best choice clustering considers only the best possible pair of vertices among all vertex pairs. The main idea behind best choice clustering is to identify the best possible pair of clustering candidates using a priority-queue data structure whose keys are the tuples (u, v, d(u, v)), where u and v are the vertex pair and d(u, v) is the clustering score. The keys are sorted, in descending order, by clustering score; the pair at the top of the priority queue is blocked.
The clustering score is given by

$$d(u, v) = \sum_{e} \frac{1}{|e|} \cdot \frac{1}{a(u) + a(v)}$$
The first term is the weight of hyperedge e, which is inversely proportional to the number of vertices incident on hyperedge e. The a(u) + a(v) term is the total area of cells u and v. Thus, this method favors cells with small area connected by nets of small degree. The area term is necessary to prevent the formation of overly large blocks. The authors propose other score functions as well, including one that uses the total number of pins instead of cell area, because the total number of pins is more indicative of block size (via Rent's rule, described in Section 7.4.2).
Once the (u, v) pair with the highest clustering score is merged into vertex u, the clustering scores of all of u's neighbors must be recalculated. This is the most time-consuming stage of the best choice clustering algorithm. For this reason, the authors introduce the lazy-update clustering score technique, in which the recalculation of clustering scores is delayed until a vertex pair reaches the top of the priority queue.
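The lazy-update idea fits naturally on a binary heap: pop the best pair, and only if its stored score is stale re-insert it with the fresh score. A sketch with a precomputed score table follows; a real implementation would also recompute the scores of the merged vertex's neighbors, as described above, and the tiny score table is made up.

```python
import heapq

def best_choice_cluster(scores, num_merges):
    """Lazy-update best choice sketch: pop the top (score, u, v) pair and
    re-check its score before merging; stale entries are re-inserted
    instead of recomputing all neighbor scores after every merge."""
    heap = [(-d, u, v) for (u, v), d in scores.items()]
    heapq.heapify(heap)
    merged = set()
    merges = []
    while heap and len(merges) < num_merges:
        d, u, v = heapq.heappop(heap)
        if u in merged or v in merged:
            continue                      # pair no longer valid
        if -d != scores[(u, v)]:          # stale score: lazily refresh
            heapq.heappush(heap, (-scores[(u, v)], u, v))
            continue
        merges.append((u, v))
        merged.update((u, v))
    return merges

scores = {(0, 1): 2.5, (1, 2): 1.0, (2, 3): 2.0}
print(best_choice_cluster(scores, num_merges=2))  # [(0, 1), (2, 3)]
```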
The best choice clustering algorithm is shown to produce better quality placement solutions than edge coarsening and first-choice clustering. The lazy-update scheme is shown to be particularly effective at reducing runtime, with almost no change in half-perimeter wirelength. As of this writing, studies are under way into incorporating fixed vertices (corresponding to input/output terminals) into the best choice algorithm.
7.5 MULTILEVEL PARTITIONING
The gist of multilevel partitioning is to construct a sequence of successively coarser graphs, to partition the coarsest graph (subject to balance constraints), and to project the partitions onto the next-level finer graph while performing numerical or FM-type iterative improvement to further improve the partition [BJ93,BS93,HL95,Alp96,KAKS97] (Figure 7.15).

FIGURE 7.15 Essence of multilevel partitioning (clustering and refinement).
7.5.1 MULTILEVEL EIGENVECTOR PARTITIONING
The basis of multilevel partitioning with eigenvectors is described in Ref. [BS93] and consists of clustering, interpolation, and refinement steps. Contraction consists of selecting a subgraph, G′ with V′ ⊂ V, of the original graph such that V′ is a maximum matching with respect to G. The Lanczos algorithm [Dem97] is then applied to the reduced bipartitioning problem.

Interpolation consists of the following: given a |V′| × 1 Fiedler vector, x′, of the contracted graph G′, an interpolation step constructs a |V| × 1 vector x out of x′. This is accomplished by remembering that the ith component of x′ was derived by contracting vertex m(i) of x; upon reconstructing a new |V| × 1 vector, x0, component x′_i is inserted into the m(i)th slot of x0, initially filling all empty slots of x0 with zeros. For example, if
slots in x0with zeros For example, if
x0= [x10 0 x40 x60 0 0 x10] then the zero components are then assigned the average values of their left and right nonzero neighbors
x0 =
x1x1+ x4
2
x1+ x4
2 x4
x4+ x6
2 x6
x6+ x10
2
x6+ x10
2
x6+ x10
2 x10
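A short sketch of this interpolation step, matching the example above; it assumes the first and last slots are occupied by coarse components so every empty slot has a nonzero neighbor on both sides.

```python
def interpolate(coarse, slots, n):
    """Insert coarse Fiedler components into their slots of a length-n
    vector, then give every empty slot the average of the nearest
    inserted components to its left and right (assumes the first and
    last slots are among `slots`)."""
    x0 = [0.0] * n
    for value, m in zip(coarse, slots):
        x0[m] = value
    inserted = sorted(slots)
    for i in range(n):
        if i not in inserted:
            left = max(m for m in inserted if m < i)
            right = min(m for m in inserted if m > i)
            x0[i] = (x0[left] + x0[right]) / 2.0
    return x0

# Matches the example in the text: x1, x4, x6, x10 in slots 0, 3, 5, 9.
print(interpolate([0.1, 0.4, 0.6, 1.0], [0, 3, 5, 9], 10))
```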
Refinement consists of using x0 as a good starting solution for the Fiedler optimization problem of Equations 7.3 through 7.5. The authors use a cubically converging numerical technique called Rayleigh quotient iteration to solve for x [Wat91].
7.5.2 MULTILEVEL MOVE-BASED PARTITIONING
One of the original works on multilevel partitioning in the VLSI domain [AHK96] applied techniques that were previously employed on finite element meshes [HL95], [KK95]. The authors converted circuit netlists to graphs, using a clique representation for individual nets, and ran the multilevel graph partitioner, Metis [KK95], to obtain high-quality bipartitions. Using a graph representation, however, has the pitfall that removing one edge from the cutset does not reflect the true objective, which is to remove an entire net from the cutset. Subsequent works [AHK97], [KAKS97] partitioned hypergraphs directly using the two-stage approach of clustering and refinement, obtaining optimal or near-optimal mincut results on the set of test cases listed. Multilevel partitioning remains, to this day, the de facto partitioning technique.
Multilevel move-based partitioning consists of clustering and iterative improvement steps. The power of multilevel partitioning becomes evident during the iterative improvement phase, where moving one vertex across the block boundary corresponds to moving an entire group of clustered vertices.

The refinement process consists of repeatedly applying an iterative improvement phase to successively finer hypergraphs, declustering after each pass of the interchange heuristic. Because of the space complexity of Sanchis' k-way FM algorithm, and because vertices are already clustered into the proper blocks, Karypis et al. [KK99] use a downhill-only search variant of FM that does not require a bucket list. Their refinement method visits vertices in random order and moves them if the move results in a positive gain (and preserves the balance criterion). If a vertex v is internal to the block being considered, it is not moved; if v is a boundary vertex, it can be moved to a block that houses v's neighbors. The move that generates the highest gain is effectuated. In experiments, the refinement method converges to a high-quality solution in only a few passes.
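A toy sketch of one downhill-only pass: vertices are visited in random order, internal vertices are skipped, and a boundary vertex moves to the neighboring block with the largest positive gain that respects a block size limit. Gains are recomputed from scratch here for clarity; real implementations maintain incremental gain values, and the small netlist is made up.

```python
import random

def downhill_pass(nets, assign, max_size):
    """One downhill-only refinement pass over a hypergraph given as a list
    of nets (vertex sets) and a vertex-to-block assignment dict."""
    def cut(a):
        return sum(1 for e in nets if len({a[v] for v in e}) > 1)
    vertices = list(assign)
    random.shuffle(vertices)
    for v in vertices:
        neighbor_blocks = {assign[u] for e in nets if v in e for u in e} - {assign[v]}
        if not neighbor_blocks:
            continue                      # internal vertex: never moved
        base = cut(assign)
        best_block, best_gain = None, 0
        for b in neighbor_blocks:
            if sum(1 for u in assign if assign[u] == b) >= max_size:
                continue                  # balance criterion
            old = assign[v]
            assign[v] = b
            gain = base - cut(assign)     # positive gain = fewer cut nets
            assign[v] = old
            if gain > best_gain:
                best_block, best_gain = b, gain
        if best_block is not None:
            assign[v] = best_block
    return assign

nets = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}]
assign = {0: 0, 1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
random.seed(1)
print(downhill_pass(nets, assign, max_size=4))
```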