Vietnam National University, Ho Chi Minh City
University of Technology (Trường Đại học Bách Khoa)
Lecturer: Hoàng Trang
Email: hoangtrang@hcmut.edu.vn, mr.hoangtrang@gmail.com
Thanks to: lecturer Hồ Trung Mỹ. Slides: from the textbook by Parhi
Data broadcast: transmitting data to all destinations, disseminating data
Parallel processing: processing in parallel
Communication bound: limit imposed by communication
Communication delay time
– Retiming for Clock Period Minimization
– Retiming for Register Minimization
4.1 INTRODUCTION
• Retiming is a transformation technique used to
change the locations of delay elements in a circuit
without affecting the input/output characteristics of
the circuit.
• For example, consider the IIR filters in Fig. 4.1(a) and (b).
Although the filters in Fig. 4.1(a) and Fig. 4.1(b) have delays at
different locations, they have the same input/output
characteristics. The two filters can be derived from one another
using retiming.
• Retiming has many applications in synchronous circuit
design. These applications include
– reducing the clock period of the circuit,
– reducing the number of registers in the circuit,
– reducing the power consumption of the circuit, and
– logic synthesis
Applications of Retiming (cont’d)
• Retiming can be used to increase the clock rate of a circuit by
reducing the computation time of the critical path.
• For example:
– The critical path of the filter in Fig. 4.1(a) has computation time T_M + T_A = 3 u.t., so this
filter cannot be clocked with a clock period of less than 3 u.t.
– The critical path of the retimed filter in Fig. 4.1(b) has computation time T_A + T_A = 2 u.t., so
this filter can be clocked with a clock period of 2 u.t.
– By retiming the filter in Fig. 4.1(a) to obtain the filter in Fig. 4.1(b), the
clock period is reduced from 3 u.t. to 2 u.t., or by 33%.
• Retiming can be used to decrease the number of registers in a
circuit. The filter in Fig. 4.1(a) uses 4 registers while the filter in
Fig. 4.1(b) uses 5 registers.
• Since retiming can affect the clock period and the number of
registers, it is sometimes desirable to take both of these
parameters into account.
• Pipelining is equivalent to introducing many
delays at the input followed by retiming.
4.2 DEFINITIONS AND PROPERTIES
4.2.1 Quantitative Description of Retiming
• Retiming maps a circuit G to a retimed circuit G_r
• A retiming solution is characterized by a value r(V) for
each node V in the graph
– Let w(e) denote the weight of edge e in graph G, and
w_r(e) the weight of edge e in graph G_r
– The weight of an edge e from U to V in the retimed graph is
computed from the weight of the edge in the original graph using
w_r(e) = w(e) + r(V) - r(U)
• A retiming solution is feasible if w_r(e) >= 0 for all edges e
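The retiming equation and the feasibility test can be sketched in a few lines of Python. This is only an illustration: the function names `retime` and `is_feasible`, the edge-list representation, and the four-node graph are assumptions made for the example, not something taken from the slides.

```python
# Edges are (U, V, w) triples; r maps each node to its retiming value.
def retime(edges, r):
    """Return the retimed edge list using w_r(e) = w(e) + r(V) - r(U)."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]

def is_feasible(retimed_edges):
    """A retiming is feasible when no edge ends up with a negative delay count."""
    return all(w >= 0 for (_, _, w) in retimed_edges)

# Hypothetical four-node DFG (node names and weights are illustrative only).
edges = [(1, 3, 1), (1, 4, 2), (2, 1, 1), (3, 2, 0), (4, 2, 0)]
r = {1: 0, 2: 1, 3: 0, 4: 0}

retimed = retime(edges, r)
print(retimed)               # [(1, 3, 1), (1, 4, 2), (2, 1, 0), (3, 2, 1), (4, 2, 1)]
print(is_feasible(retimed))  # True
```

Here r(2) = 1 moves one delay from the outgoing edge of node 2 onto each of its incoming edges.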
Node Retiming
• Transfer delay through a node in DFG:
• r(v) = # of delays transferred from
out-going edges to incoming edges of
node v
• w(e) = # of delays on edge e
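• w_r(e) = # of delays on edge e after retiming
• For a path p from V_0 to V_k through edges e_0, ..., e_{k-1}:
\[ w_r(p) = \sum_{i=0}^{k-1} w_r(e_i) = \sum_{i=0}^{k-1} \bigl( w(e_i) + r(V_{i+1}) - r(V_i) \bigr) = w(p) + r(V_k) - r(V_0) \]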
Invariant Properties
1. Retiming does not change the total number
of delays in any cycle.
2. Retiming does not change the loop bounds or
the iteration bound of the DFG.
3. If a constant integer j is added to the retiming
value of every node v in a DFG G, the
weights (# of delays) of the retimed graph
remain the same.
DFG Illustration of the Example
• The weight of a path from node 0 to node k is the
number of delays on the edges between those nodes
• The computation time of a path from node 0
to node k is the sum of the computation times
(adders, etc.) of the nodes on the path
• Properties:
– Retiming does not change the number of delays in
a cycle
– Retiming does not alter the iteration bound of the DFG
– Adding a constant value j to the retiming value
of each node does not change the mapping from G to G_r
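These properties can be checked numerically. The sketch below assumes a small hypothetical DFG stored as a `{(U, V): w}` dictionary; the helper names `retime` and `path_delays` and all values are illustrative assumptions.

```python
def retime(weights, r):
    """Apply w_r(e) = w(e) + r(V) - r(U) to a {(U, V): w} edge map."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in weights.items()}

def path_delays(weights, path):
    """Total number of delays along a list of consecutive nodes."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))

# Hypothetical DFG and retiming values, chosen only for illustration.
weights = {(1, 3): 1, (1, 4): 2, (2, 1): 1, (3, 2): 0, (4, 2): 0}
r = {1: 0, 2: 1, 3: 0, 4: 0}
retimed = retime(weights, r)

# Path property: w_r(p) = w(p) + r(V_k) - r(V_0) for the path 1 -> 3 -> 2.
path = [1, 3, 2]
path_ok = path_delays(retimed, path) == path_delays(weights, path) + r[2] - r[1]

# Cycle property: a cycle starts and ends at the same node, so the r terms cancel.
cycle = [1, 3, 2, 1]
cycle_ok = path_delays(retimed, cycle) == path_delays(weights, cycle)

# Offset property: adding the same constant j to every r(v) gives the same graph.
j = 5
offset_ok = retime(weights, {v: rv + j for v, rv in r.items()}) == retimed

print(path_ok, cycle_ok, offset_ok)  # True True True
```

Because the iteration bound depends only on loop computation times and loop delay counts, the cycle check is also the reason the iteration bound is unchanged.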
4.3 Solving Systems of Inequalities
• Shortest path algorithms (Appendix A of Parhi's book)
– Bellman-Ford
– Floyd-Warshall
• Given a set of M inequalities in N variables, where each
inequality has the form r_i – r_j <= k for an integer k, one of the
shortest path algorithms can be used to determine whether a solution exists
and, if so, to find one
• Procedure:
– 1) Draw the constraint graph
a) Draw a node i for each of the N variables r_i, i = 1, ..., N
b) Draw the node N+1
c) For each inequality r_i – r_j <= k, draw the edge j → i from node j to node i
with length k
d) For each node i, i = 1, 2, ..., N, draw the edge N+1 → i from node N+1
to node i with length 0
– 2) Solve using a shortest path algorithm
a) The system of inequalities has a solution if and only if the constraint
graph contains no negative cycles
b) If a solution exists, one solution is where r_i is the length of the minimum-length path
from node N+1 to node i
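The procedure above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `solve_difference_constraints`, the `(i, j, k)` tuple encoding of r_i - r_j <= k, and the sample inequalities are all assumptions. Bellman-Ford edge relaxation is used for step 2.

```python
def solve_difference_constraints(n, constraints):
    """constraints: list of (i, j, k) meaning r_i - r_j <= k, variables 1..n.
    Returns {i: r_i}, or None if the constraint graph has a negative cycle."""
    source = n + 1
    # Step 1: edge j -> i with length k per inequality, plus N+1 -> i with length 0.
    graph_edges = [(j, i, k) for (i, j, k) in constraints]
    graph_edges += [(source, i, 0) for i in range(1, n + 1)]
    # Step 2: Bellman-Ford from the extra node N+1.
    dist = {v: float("inf") for v in range(1, n + 2)}
    dist[source] = 0
    for _ in range(n):  # the graph has n + 1 nodes, so n rounds are enough
        for (a, b, length) in graph_edges:
            if dist[a] + length < dist[b]:
                dist[b] = dist[a] + length
    # If any edge can still be relaxed, a negative cycle exists: no solution.
    for (a, b, length) in graph_edges:
        if dist[a] + length < dist[b]:
            return None
    return {i: dist[i] for i in range(1, n + 1)}

# Illustrative system: r1-r2<=0, r3-r1<=5, r4-r1<=4, r4-r3<=-1, r3-r2<=2.
system = [(1, 2, 0), (3, 1, 5), (4, 1, 4), (4, 3, -1), (3, 2, 2)]
solution = solve_difference_constraints(4, system)
print(solution)  # {1: 0, 2: 0, 3: 0, 4: -1}

# r1 - r2 <= -1 together with r2 - r1 <= 0 is contradictory.
print(solve_difference_constraints(2, [(1, 2, -1), (2, 1, 0)]))  # None
```

The returned values are one solution among many; adding the same constant to every r_i gives another.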
Bellman-Ford Algorithm
Finds the shortest path from an arbitrarily
chosen origin node U to each node in a
directed graph, provided no negative cycle
exists.
Given a directed graph with N nodes:
w(m,n): weight of the edge from node m
to node n; w(m,n) = ∞ if there is no edge from
m to n
r(i,j): length of the shortest path from node U to
node i that uses at most j edges
r(U,1) = 0, r(i,1) = w(U,i) for i ≠ U,
r(i,j+1) = min { r(i,j), min over k of [r(k,j) + w(k,i)] },
j = 1, 2, …, N-1
If max(r(:,N-1) - r(:,N)) > 0, then there is a
negative cycle. Otherwise, r(i,N-1) gives the
shortest path length from U to i.
Note (for the example graph): 1 > 0, hence there is at least
one negative cycle.
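The recurrence can be written out directly, keeping one column r(:,j) at a time. The sketch below is an illustration under assumptions: nodes are numbered from 0, the weight matrix uses `math.inf` for missing edges, the distance to the origin itself is initialised to 0, and the function names and the sample graph are hypothetical.

```python
import math

def relax(w, r):
    """One step of the recurrence: r(i,j+1) = min(r(i,j), min_k r(k,j) + w(k,i))."""
    n = len(w)
    return [min(r[i], min(r[k] + w[k][i] for k in range(n))) for i in range(n)]

def bellman_ford(w, origin):
    """w[m][n] is the edge weight from m to n (math.inf if absent).
    Returns (r, has_negative_cycle), where r[i] is the distance from origin to i."""
    n = len(w)
    r = [w[origin][i] for i in range(n)]  # r(:,1)
    r[origin] = 0
    for _ in range(n - 2):                # builds r(:,2) ... r(:,N-1)
        r = relax(w, r)
    extra = relax(w, r)                   # r(:,N)
    # Any value that still decreases reveals a negative cycle reachable from origin.
    has_negative_cycle = any(b < a for a, b in zip(r, extra))
    return r, has_negative_cycle

INF = math.inf
# Illustrative graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (-1).
w = [[INF, 4, 1, INF],
     [INF, INF, INF, -1],
     [INF, 2, INF, INF],
     [INF, INF, INF, INF]]
print(bellman_ford(w, 0))  # ([0, 3, 1, 2], False)

# Adding 3->2 with weight -2 closes the cycle 2->1->3->2 of total length -1.
w_neg = [row[:] for row in w]
w_neg[3][2] = -2
print(bellman_ford(w_neg, 0)[1])  # True
```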
Floyd-Warshall Algorithm
Finds the shortest path between all
possible pairs of nodes in the
graph, provided no negative cycle
exists.
Algorithm:
Initialization: R(1) = W
For k = 1 to N:
R(k+1)(u,v) = min { R(k)(u,v), R(k)(u,k) + R(k)(k,v) }
If R(k)(u,u) < 0 for any k, u, then a
negative cycle exists. Otherwise,
R(N+1)(u,v) is the shortest path length from u to v.
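A compact sketch of the algorithm is shown below. It updates a single matrix in place rather than storing every R(k), which is the usual space-saving form; the function name, the 0-based node numbering, and the sample graph are assumptions for illustration.

```python
import math

def floyd_warshall(w):
    """w[u][v] is the edge weight from u to v (math.inf if absent).
    Returns (r, has_negative_cycle), where r[u][v] is the shortest path length."""
    n = len(w)
    r = [row[:] for row in w]  # R(1) = W
    for k in range(n):         # allow node k as an intermediate node
        for u in range(n):
            for v in range(n):
                if r[u][k] + r[k][v] < r[u][v]:
                    r[u][v] = r[u][k] + r[k][v]
    # A negative entry on the diagonal means some node can reach itself at negative cost.
    has_negative_cycle = any(r[u][u] < 0 for u in range(n))
    return r, has_negative_cycle

INF = math.inf
# Illustrative graph: 0->1 (4), 0->2 (1), 2->1 (2), 1->3 (-1).
w = [[INF, 4, 1, INF],
     [INF, INF, INF, -1],
     [INF, 2, INF, INF],
     [INF, INF, INF, INF]]
dist, negative = floyd_warshall(w)
print(dist[0][1], dist[0][3], dist[2][3], negative)  # 3 2 1 False

# Adding 3->2 with weight -2 creates the negative cycle 2->1->3->2.
w_neg = [row[:] for row in w]
w_neg[3][2] = -2
print(floyd_warshall(w_neg)[1])  # True
```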
Retiming Example – Bellman-Ford Algorithm
• For the retiming example: (figure not recoverable from the source)
Retiming Example – Floyd-Warshall algorithm
– First, two special cases of retiming, namely cutset
retiming and pipelining, are considered.
– Two algorithms are then considered: retiming to
minimize the clock period, and retiming to minimize
the number of registers required to implement
the circuit.
• Cutset retiming is often used in combination with slow-down.
• The procedure is to first replace each delay in the DFG with N delays to
create an N-slow version of the DFG, and then to perform cutset retiming on the
N-slow DFG.
Time Scaling (Slow Down)
• Transforming each delay element
(register) D into ND and reducing
the sample frequency N-fold
slows down the
computation N times.
• During slow-down, the
processor clock cycle time
remains unchanged. Only the
sampling cycle time increases.
• Provides opportunities for
retiming and interleaving.
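Slow-down followed by cutset retiming can be sketched as two small functions. This is an illustration only: the names `slow_down` and `cutset_retime`, the convention that edges entering the subgraph `g2` gain k delays while edges leaving it lose k delays, and the two-node loop are assumptions for the example.

```python
def slow_down(edges, n):
    """N-slow transformation: replace every delay D by N delays."""
    return [(u, v, n * w) for (u, v, w) in edges]

def cutset_retime(edges, g2, k):
    """Add k delays to edges entering the node set g2 and remove k delays
    from edges leaving it (equivalent to r(v) = k for every v in g2)."""
    result = []
    for (u, v, w) in edges:
        if u not in g2 and v in g2:
            w += k
        elif u in g2 and v not in g2:
            w -= k
        result.append((u, v, w))
    if any(w < 0 for (_, _, w) in result):
        raise ValueError("infeasible cutset retiming: negative delay count")
    return result

# Hypothetical two-node loop with a single delay on the feedback edge.
loop = [("A", "B", 0), ("B", "A", 1)]

two_slow = slow_down(loop, 2)
print(two_slow)                             # [('A', 'B', 0), ('B', 'A', 2)]
print(cutset_retime(two_slow, {"B"}, 1))    # [('A', 'B', 1), ('B', 'A', 1)]
```

In the original loop there is only one delay to distribute, so both edges cannot be registered at once; after 2-slow there are two delays in the loop and the cutset retiming places one on each edge.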
4.4.2 Retiming for Clock Period Minimization
• Retiming for clock period minimization is the tool
used to make the clock period of a recursive DFG
equal to its iteration bound.
Retiming for Clock Period Minimization cont’d
• The minimum feasible clock period is the computation time of the
critical path, which is the path with the longest computation
time among all paths with no delays. The minimum clock period is
denoted Φ(G).
• We want to find a retiming solution r0 such that Φ(G_r0) <= Φ(G_r) for any other
retiming solution r. In other words, we want to find the
retiming solution with the minimum clock period.
• Nomenclature:
– W(U,V) = minimum number of registers on any path from node U to node V
– D(U,V) = maximum computation time among all paths from U to V
with weight W(U,V)
• Algorithm for retiming for clock period minimization
• First construct W(U,V) and D(U,V)
– 1) Let M = t_max·n, where t_max is the maximum computation time of the
nodes in G and n is the number of nodes in G
– 2) Form a new graph G' which is the same as G except that the edge
weights are replaced by w'(e) = M·w(e) – t(U) for every edge e from U to V
– 3) Solve the all-pairs shortest path problem on G' (using
Floyd-Warshall, for example). Let S'_UV be the length of the shortest path from U to V
– 4) If U ≠ V, then W(U,V) = ceil(S'_UV / M) and D(U,V) = M·W(U,V) – S'_UV +
t(V). If U = V, then W(U,V) = 0 and D(U,V) = t(U). ceil() is the ceiling
function
• Use W(U,V) and D(U,V) to determine whether there is a retiming
solution that can achieve a desired clock period c
– This desired clock period is usually set equal to the iteration bound of
the circuit.
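Steps 1) to 4) for constructing W(U,V) and D(U,V) can be sketched as follows. The function name `compute_w_d`, the dictionary representation, and the four-node DFG (two nodes with computation time 1 and two with computation time 2) are assumptions chosen for illustration; pairs with no connecting path are simply left out of the result.

```python
import math

def compute_w_d(t, edges):
    """t: {node: computation time}; edges: {(U, V): w(e)}.
    Returns (W, D) as dictionaries keyed by (U, V)."""
    nodes = sorted(t)
    m = max(t.values()) * len(nodes)            # step 1: M = t_max * n
    s = {(u, v): math.inf for u in nodes for v in nodes}
    for (u, v), w in edges.items():             # step 2: w'(e) = M*w(e) - t(U)
        s[(u, v)] = min(s[(u, v)], m * w - t[u])
    for k in nodes:                             # step 3: Floyd-Warshall on G'
        for u in nodes:
            for v in nodes:
                if s[(u, k)] + s[(k, v)] < s[(u, v)]:
                    s[(u, v)] = s[(u, k)] + s[(k, v)]
    W, D = {}, {}
    for u in nodes:                             # step 4
        for v in nodes:
            if u == v:
                W[(u, v)], D[(u, v)] = 0, t[u]
            elif s[(u, v)] < math.inf:
                W[(u, v)] = -(-s[(u, v)] // m)  # integer ceiling of S'_UV / M
                D[(u, v)] = m * W[(u, v)] - s[(u, v)] + t[v]
    return W, D

# Hypothetical DFG: nodes 1 and 2 take 1 u.t., nodes 3 and 4 take 2 u.t.
t = {1: 1, 2: 1, 3: 2, 4: 2}
edges = {(1, 3): 1, (1, 4): 2, (2, 1): 1, (3, 2): 0, (4, 2): 0}
W, D = compute_w_d(t, edges)
print(W[(1, 4)], D[(1, 4)])  # 2 3
print(W[(3, 2)], D[(3, 2)])  # 0 3
print(W[(3, 4)], D[(3, 4)])  # 3 6
```

Scaling the delays by M makes the delay count dominate the path length, so the shortest path in G' has the fewest registers and, among those, the largest computation time.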
Retiming for Clock Period Minimization cont'd
– Given a desired clock period c, there is a feasible retiming solution r
such that Φ(G_r) <= c if the following constraints hold
• CONSTRAINT 1 (feasibility): r(U) – r(V) <= w(e) for every edge e
from U to V in G
– This enforces that the number of delays on each edge in the retimed graph is
nonnegative
• CONSTRAINT 2 (critical path): r(U) – r(V) <= W(U,V) – 1 for all vertices
U, V in G such that D(U,V) > c
– This enforces Φ(G_r) <= c
• Thus, to find a solution:
1) Pick a value of c (usually equal to the iteration bound)
2) Create a set of inequalities based on the feasibility constraint
3) Create a set of inequalities based on the critical path constraint
4) Combine these (using the most restrictive where they overlap) and create a
constraint graph
5) Test feasibility using a shortest-path algorithm (e.g. Floyd-Warshall) and
find the retiming values
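The five steps can be combined into one sketch. Everything here is illustrative: the function name `min_period_retiming`, the dictionary encoding, the use of Bellman-Ford relaxation for step 5, the early exit when a single node is slower than c, and the four-node DFG are assumptions, not the slides' own implementation.

```python
import math

def min_period_retiming(t, edges, c):
    """t: {node: time}; edges: {(U, V): w(e)}; c: desired clock period.
    Returns retiming values {node: r}, or None if c cannot be achieved."""
    if max(t.values()) > c:
        return None                       # a single node already exceeds c
    nodes = sorted(t)
    m = max(t.values()) * len(nodes)
    # All-pairs shortest paths on w'(e) = M*w(e) - t(U), used for W and D.
    s = {(u, v): math.inf for u in nodes for v in nodes}
    for (u, v), w in edges.items():
        s[(u, v)] = min(s[(u, v)], m * w - t[u])
    for k in nodes:
        for u in nodes:
            for v in nodes:
                s[(u, v)] = min(s[(u, v)], s[(u, k)] + s[(k, v)])
    # bound[(U, V)] = b encodes r(U) - r(V) <= b; keep the tightest b per pair.
    bound = dict(edges)                   # constraint 1: feasibility
    for u in nodes:
        for v in nodes:
            if u != v and s[(u, v)] < math.inf:
                w_uv = -(-s[(u, v)] // m)
                d_uv = m * w_uv - s[(u, v)] + t[v]
                if d_uv > c:              # constraint 2: critical path
                    bound[(u, v)] = min(bound.get((u, v), math.inf), w_uv - 1)
    # Bellman-Ford on the constraint graph; starting every r at 0 plays the
    # role of the extra node with zero-length edges to all other nodes.
    r = {v: 0 for v in nodes}
    for _ in range(len(nodes)):
        for (u, v), b in bound.items():
            if r[v] + b < r[u]:
                r[u] = r[v] + b
    if any(r[v] + b < r[u] for (u, v), b in bound.items()):
        return None                       # negative cycle: no solution for c
    return r

# Hypothetical DFG: nodes 1 and 2 take 1 u.t., nodes 3 and 4 take 2 u.t.
t = {1: 1, 2: 1, 3: 2, 4: 2}
edges = {(1, 3): 1, (1, 4): 2, (2, 1): 1, (3, 2): 0, (4, 2): 0}
r = min_period_retiming(t, edges, 2)
retimed = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
print(r)        # {1: -1, 2: 0, 3: -1, 4: -1}
print(retimed)  # {(1, 3): 1, (1, 4): 2, (2, 1): 0, (3, 2): 1, (4, 2): 1}
```

In the retimed graph the only zero-delay edge is 2 → 1, whose path takes 1 + 1 = 2 u.t., so the clock period drops from 3 u.t. (paths 3 → 2 and 4 → 2 in the original) to 2 u.t.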
Retiming to Reduce Registers
• Register Sharing
When a node has multiple
fan-out edges with different numbers of
delays, the registers can be
shared so that only the branch
with the maximum number of delays is
needed.
• Register reduction through node delay transfer from multiple input edges to output edges (e.g. r(v) < 0, given w_r(e) = w(e) + r(V) - r(U))
• Should be done only when the clock cycle constraint (if any) is not violated
4.4.3 Retiming for Register Minimization
[Figure: delay reduction] (a) Usage: 1 + 3 + 7 = 11 registers (b) Usage: 1 + 2 + 4 = 7 registers
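The effect of register sharing on the register count can be sketched as below. The function name `register_count`, the `share` flag, and the single fan-out node are assumptions for illustration; the delay counts 1, 3 and 7 are taken from the usage figures above.

```python
def register_count(edges, share=True):
    """edges: list of (U, V, w). With sharing, each node needs only as many
    registers as its most-delayed fan-out edge; without sharing, every edge
    keeps its own registers."""
    fanout = {}
    for (u, _, w) in edges:
        fanout.setdefault(u, []).append(w)
    if share:
        return sum(max(ws) for ws in fanout.values())
    return sum(sum(ws) for ws in fanout.values())

# Hypothetical node "v" driving three branches with 1, 3 and 7 delays.
fan = [("v", "a", 1), ("v", "b", 3), ("v", "c", 7)]
print(register_count(fan, share=False))  # 11
print(register_count(fan, share=True))   # 7
```

With sharing, the 7 registers form a single chain tapped after 1, 3 and 7 stages, which matches the 1 + 2 + 4 = 7 count.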
Other Applications of Retiming
• Retiming for Folding (Chapter 6)
• Retiming for Power Reduction (Chap 17)
• Retiming for Logic Synthesis (Beyond Scope of
END chapter 4