INTEGRATION, the VLSI journal 41 (2008) 123–134
Low-power state encoding for partitioned FSMs with mixed
synchronous/asynchronous state memory
Cao Cao, Bengt Oelmann
Department of Information Technology and Media, Mid Sweden University, SE-851 70 Sundsvall, Sweden
Received 31 May 2006; received in revised form 6 February 2007; accepted 7 February 2007
Abstract
Partitioned finite state machine (FSM) architectures in general enable low-power implementations, and it has been shown that, for these architectures, state memory based on both synchronous and asynchronous storage elements gives lower power consumption than fully synchronous counterparts. In this paper we present state encoding techniques for a partitioned FSM architecture based on mixed synchronous/asynchronous state memory. The state memory, in this case, is composed of a synchronous local state memory and an asynchronous global state memory. The local state memory uses synchronous storage elements and is shared by all sub-FSMs. The global state memory operates asynchronously and is responsible for handling the interaction between sub-FSMs. Even though the partitioned FSM contains the asynchronous mechanism, its input/output behaviour is still cycle-by-cycle equivalent to that of the original monolithic synchronous FSM. In this paper, we discuss the low-power state encoding method for the implementation of partitioned FSMs with mixed synchronous/asynchronous state memory. For the local state assignment, a procedure that we call state-bundling is presented to enable states residing in different sub-FSMs to share the same state codes. Based on state-bundles, two state encoding techniques are compared: one employs binary encoding and the other is further optimized for low power.
© 2007 Elsevier B.V. All rights reserved.
Keywords: State encoding; Low power; Mixed synchronous/asynchronous; Finite state machine partitioning
1 Introduction
For finite state machine (FSM) low-power design, there are two main active areas of research. One is FSM partitioning and the other is low-power state encoding. These two methods can be used together or separately in order to reduce the power dissipation of FSMs.
FSM partitioning can be considered as the employment of the concept of "Dynamic Power Management" (DPM) [1] at the Register Transfer (RT) level. The objective of a DPM scheme is to partition the original design into two or more units so that the units that are currently idle can be shut down to reduce dynamic power dissipation. Mechanisms are usually added to the design to detect and shut down the idle units. These mechanisms require additional circuits which, in turn, add to the circuit area and power dissipation. Therefore, to achieve a solution with minimal power consumption, it is important to make a careful initial analysis in order to find the most beneficial idle conditions, taking the overhead into account.
In FSM partitioning for low power, the original FSM is partitioned into several sub-FSMs and for most of the time, except when there is a state transition between two sub-FSMs, only one sub-FSM is active and all others are shut down. Clock-gating and input-disabling are usually used as shut-down circuitry. After FSM partitioning, the sub-FSM network basically has one of two types of structure, in which the main difference is the means of implementing the state memory.
In [2,3], each sub-FSM has its own state memory (as shown in Fig. 1a) and extra signals are added to control which sub-FSM should be currently active. Since every sub-FSM can be synthesized separately, a low-power state encoding method for monolithic FSMs [4,5] can be used directly to reduce the power dissipation of each sub-FSM.
In this structure, the state memories are, in some sense, redundant. At any given time, only one state memory in the current active sub-FSM is of importance for storing state information while the others, remaining in the idle sub-FSMs, are not used.
www.elsevier.com/locate/vlsi
0167-9260/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi: 10.1016/j.vlsi.2007.02.002
Corresponding author.
E-mail address: cao.cao@miun.se (C. Cao).
By contrast, in the partitioned FSM structure proposed by Chow et al. [6], all sub-FSMs share the same state memory (also called local state memory, LSM) and thus area can be reduced (see Fig. 1b). However, a global state memory is added to determine which one of the sub-FSMs is currently active. Because states in different sub-FSMs can have the same local state code and be distinguished by their global state codes, the local state assignment in one sub-FSM will influence the state codes in another sub-FSM. The sub-FSMs therefore cannot be synthesized separately and the low-power state encoding problem, in this case, should be considered carefully. For state encoding, [6] presents a method that considers crossing transitions (the state transitions between two different sub-FSMs) by introducing pseudo-outputs. A pseudo-output bit represents a relation imposed by the crossing transitions. Subsequently, all crossing transitions are deleted and Jedi [7] is used to perform low-power state assignment for each individual sub-FSM.
Both approaches to low-power FSM partitioning described above assume fully synchronous implementations, based on either a shared or a separate state memory. However, the fully synchronous implementation does have disadvantages. For the separate state memory structure, in the clock cycle when there is a crossing transition (where the source state and destination state reside in two different sub-FSMs), both sub-FSMs involved must be clocked, which adds to the power consumption. For the shared state memory structure, the global state memory, used for determining the current active sub-FSM, is always clocked and consumes power.
For partitioned FSMs utilizing separate state memory, [8] removes the requirement of clocking two sub-FSMs at a crossing transition with the aid of an asynchronous hand-over mechanism. The power overhead introduced for managing the interaction between the sub-FSMs can thus be reduced. It was verified that, in a clock cycle without a crossing transition, the power consumption of the synchronous control for sub-FSM interaction is 5.8 times that of the asynchronous control [9]. An average power reduction of 45%, for a set of FSM benchmark circuits, is achieved using this asynchronous communication between sub-FSMs [3]. Using the model of shared mixed synchronous/asynchronous state memory, without state encoding optimization, [10] achieves an average power reduction of 56%.
As a development of [10], in this paper, a novel low-power state encoding algorithm is proposed and applied to FSM partitioning based on mixed synchronous/asynchronous state memory. This algorithm is based on the assumption that the total power consumption is proportional to the switching activity of the state bit-lines [11]. The main contributions of this paper are as follows:
A state assignment procedure: Introduction of state-bundling, which reflects the property of state encoding in a partitioned FSM with shared state memory.
Power optimized state encoding: An efficient state encoding algorithm is performed on the state-bundle table to reduce the switching activity of the state bit-lines, which generally leads to a reduction in total power, including that of the next-state logic and the output logic.
Demonstration of efficiency: The proposed algorithm has been incorporated in a tool for low-power synthesis of partitioned FSMs. For a set of MCNC benchmark circuits [12], it was demonstrated that, in combination with partitioning procedures, the tool can achieve an average power reduction of 59%.
The outline for the rest of this paper is as follows: Section 2 introduces the implementation structure of the mixed synchronous/asynchronous state memory and the necessary definitions and procedures for this implementation. In Section 3 the basic binary state encoding technique and the state assignment optimization for low power are presented. In Section 4 experimental results showing the possibility of a further power reduction from state encoding optimization are presented. In Section 5 the paper is concluded by a discussion regarding the limitations of the two-step synthesis for partitioned FSMs, where a partitioning step is followed by a state encoding step.
2 Partitioned FSM with mixed synchronous/asynchronous state memory
2.1 Implementation architecture
In this paper, use is made of the architecture developed in [13], which has a shared LSM and a global asynchronous state memory (GSM), see Fig. 2. The basic idea is for the LSM, which is always clocked, to be synchronous and for the GSM to be asynchronous, as an asynchronous memory does not consume power when idle. The partitioning of the original monolithic FSM is based on state transition probabilities, where the states with high mutual transition probabilities are implemented in the same sub-FSM. Since the state transition probabilities between sub-FSMs are low, the global state memory is generally idle and thus an asynchronous implementation is more power efficient. The shut-down mechanisms used are input-gating, in order to reduce power dissipation in idle combinational logic, and clock-gating, to shut down flip-flops temporarily not required in the LSM. The primary output and next-state information of the partitioned FSM are obtained by merging the separate output and next-state variables from the sub-FSMs (see Merging function in Fig. 2).
Fig. 1. Structural decomposition of FSM: (a) separate state memory and (b) shared state memory.
2.2 STG transformation
In order to implement the structure described in the previous section, the original state transition graph (STG) of the monolithic FSM must be transformed. Fig. 3 serves to illustrate the basic ideas of the design model for STG transformation.
The initial monolithic machine is decomposed into two sub-FSMs F1 and F2, as indicated in Fig. 3a. There are three crossing transitions after partitioning, two from s2 in F1 and one from s5 in F2. For each crossing transition, an additional g-state is introduced, which is inside the sub-FSM where the source state resides and has the same index as the destination state of the crossing transition. In Fig. 3b there are three g-states, g1, g3 and g4, whose indices correspond to s1, s3, and s4, respectively.
After introducing g-states, a crossing transition is transformed into two steps. The first step is inside the LSM: a state transition from the original source state ends at the g-state. In the second step, the g-state is detected and the global state set, denoted by R and responsible for determining the active sub-FSM, changes subsequently. A change in R will deactivate the sub-FSM containing the source state of the crossing transition and activate the sub-FSM containing the destination state of that transition.
Take the crossing transition from s2 to s3 as an example. After STG transformation, the transition from s2 will enter g3. The detection of g3 will cause the global state set R to make a transition from r1 to r2. After the completion of the crossing transition, F2, rather than F1 as before, is the active sub-FSM.
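The two-step handling of a crossing transition can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tool: the function name `transform_stg`, the string naming of states ("s3" becomes "g3"), and the internal transition s1 to s2 are assumptions made for the example; only the three crossing transitions follow the description of Fig. 3.

```python
def transform_stg(transitions, sub_fsm_of):
    """Split every crossing transition into a local step and a global step.

    transitions: list of (source, destination) state names, e.g. ("s2", "s3").
    sub_fsm_of:  dict mapping each state name to the index of its sub-FSM.
    Returns (local_transitions, global_steps); global_steps maps a g-state
    detected in a sub-FSM to the sub-FSM that becomes active afterwards.
    """
    local_transitions = []
    global_steps = {}
    for src, dst in transitions:
        if sub_fsm_of[src] == sub_fsm_of[dst]:
            local_transitions.append((src, dst))   # ordinary local transition
            continue
        g_state = "g" + dst[1:]                    # same index as the destination
        local_transitions.append((src, g_state))   # step 1: inside the LSM
        # step 2: detecting the g-state changes the global state R
        global_steps[(sub_fsm_of[src], g_state)] = sub_fsm_of[dst]
    return local_transitions, global_steps


# Partition of the Fig. 3 example; ("s1", "s2") is a hypothetical internal edge.
sub_fsm_of = {"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 2}
transitions = [("s1", "s2"), ("s2", "s3"), ("s2", "s4"), ("s5", "s1")]
local_transitions, global_steps = transform_stg(transitions, sub_fsm_of)
```

Here the crossing transition from s2 to s3 becomes the local transition from s2 to g3 in F1, and the entry `(1, "g3")` of `global_steps` records that R hands control from F1 to F2.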
2.3 Basic definitions
The monolithic Mealy type FSM is defined as a sextuple $F = (S, X, Y, \delta, \lambda, s_0)$, where $S$ is the set of states, $X$ is the set of binary inputs, $Y$ is the set of binary outputs, $\delta$ is the transition function, $\lambda$ is the output function and $s_0$ is the initial state.
Let there be a partition on the set $S$: $P = \{S_1, S_2, \ldots, S_n\}$, where $P$ is defined as a collection of subsets such that $\bigcup_{m=1}^{n} S_m = S$ and $S_i \cap S_j = \emptyset$ for $i \neq j$, where $1 \le i, j \le n$. The monolithic FSM is decomposed into a set of sub-FSMs where every state subset $S_m \in P$ defines a sub-FSM as $F_m = (S_m, X_m, Y_m, \delta_m, \lambda_m, s_0^m)$. The state subset $S_m$ is called the internal state of the sub-FSM, $X_m$ the set of input variables at the transitions from the states in $S_m$, and $Y_m$ the set of output variables on the sets $S_m$ and $X_m$.
$T(S_m)$ is defined as the set of states that are not included in $S_m$ and to which there are transitions from $S_m$ of $F_m$: $T(S_m) = \{s_j \mid \delta(s_k, X_h) = s_j,\ s_j \notin S_m,\ s_k \in S_m\}$.
$Q(S_m)$ is defined as the set of states inside $S_m$ and to which there are transitions from other sub-FSMs: $Q(S_m) = \{s_j \mid \delta(s_k, X_h) = s_j,\ s_j \in S_m,\ s_k \notin S_m\}$.
The shorter notations $T_m$ and $Q_m$ are used throughout the remainder of this paper.
The set of g-states $G_m$ for the crossing transitions originating from $S_m$, replacing $T_m$ as the set of destination states, is defined as $G_m = \{g_i \mid s_i \in T_m\}$.
After STG transformation, let the set of internal states in the sub-FSM $F_m$ be $U_m$: $U_m = S_m \cup G_m$.
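The set definitions above translate directly into code. The following Python sketch computes T, Q, G and U for every state subset; the function name `crossing_sets`, the edge-list input (inputs are ignored) and the example transitions are assumptions for illustration, with the partition taken from the Fig. 3 example.

```python
def crossing_sets(transitions, subsets):
    """Compute T_m, Q_m, G_m and U_m for every state subset S_m.

    transitions: iterable of (source, destination) pairs of the monolithic STG.
    subsets:     dict mapping sub-FSM index m to the set S_m.
    """
    T = {m: set() for m in subsets}
    Q = {m: set() for m in subsets}
    for src, dst in transitions:
        for m, S_m in subsets.items():
            if src in S_m and dst not in S_m:
                T[m].add(dst)              # destination outside S_m
            if dst in S_m and src not in S_m:
                Q[m].add(dst)              # entry state of S_m
    G = {m: {"g" + s[1:] for s in T[m]} for m in subsets}   # G_m mirrors T_m
    U = {m: subsets[m] | G[m] for m in subsets}             # U_m = S_m united with G_m
    return T, Q, G, U


subsets = {1: {"s1", "s2"}, 2: {"s3", "s4", "s5"}}
transitions = [("s1", "s2"), ("s2", "s3"), ("s2", "s4"), ("s5", "s1")]
T, Q, G, U = crossing_sets(transitions, subsets)
```

For this input, `G[1]` contains g3 and g4 and `G[2]` contains g1, which agrees with the three g-states described for Fig. 3b.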
2.4 State-bundling
In a synchronous FSM the crossing transition, as for all other transitions, must be completed within one clock cycle. As explained above, a crossing transition after the STG transformation requires two steps of state transitions. As the behaviour of the transformed STG is still cycle-by-cycle equivalent to that of the original synchronous one, these two steps must be completed within one clock cycle. An asynchronous state transition is triggered by a signal transition rather than by the active edge of the clock signal. Therefore, following the synchronous state transition in the first step, the second step involving the asynchronous state transition happens immediately inside the global state memory, and the two steps will be completed before the start of the next clock cycle. When the global asynchronous state transition occurs, the local state must remain unchanged, which places a restriction on the state encoding. In other words, the local states must be coded in such a way that a g-state and its associated entry state, together named a coupled-state, have identical codes. For the example in Fig. 3b, there are three coupled-states in total: (s1, g1), (s3, g3), and (s4, g4). Each coupled-state includes two states that reside in different sub-FSMs but have the same state code. The coupled-states show how the local state assignments in different sub-FSMs should relate to each other, which imposes a further restriction on the local state encoding. Other states, not included in the coupled-states, are denoted as free states (s2 and s5 in this example). These free states have more freedom within the local state assignment. If they are located in different sub-FSMs, since their global states are different, they can either share the same local state code or not.
Fig. 2. Mixed synch/asynch FSM.
The states sharing the same local state code are called a state-bundle. A state-bundle table, as we call it, is used to describe the behaviour of the decomposed FSM, including the sub-FSM interaction. We take the above example (Fig. 3) as an illustration; its state-bundle table is shown in Fig. 4. In the table, each row includes the internal states of a sub-FSM and each column, having the same local state code, represents a state-bundle. A local state transition is represented by a horizontal change between columns and a global state transition is represented by a vertical change between rows. Every coupled-state shares the same local state code and should be placed in the same bundle. A bundle that includes a coupled-state is further distinguished as a g-state-bundle in the remainder of the paper.
There are two reasons for state-bundling: (1) it enables states in different sub-FSMs to share local state codes and (2) it enables an efficient asynchronous global state transition.
3 State encoding for local states
After FSM partitioning, the whole local state encoding procedure is performed on the state-bundle table. Each coupled-state is placed in a single column and then the free states are added to the table. In this section we present two state-encoding methods. One is the basic procedure that generally offers good results [10]. The other is the optimization procedure for low power. By merging two or more coupled-states into the same state-bundle and optimizing the state codes of the state-bundles, we can minimize the switching activity in the state bit-lines and thus the power dissipation [11].
3.1 Basic state encoding algorithm
The following example will be used to illustrate the basic procedure of local state assignment. Assume that the original state set is $S$ and let there be a partition $P = \{S_1, S_2, S_3, S_4\}$, which results in the following sets of states: $U_1 = \{s_1, s_2, s_3, g_4\}$, $U_2 = \{s_4, s_5, s_6, g_7\}$, $U_3 = \{s_7, g_1\}$, and $U_4 = \{s_8, s_9, s_{10}, s_{11}, s_{12}, s_{13}, g_1, g_5\}$.
The duty time of each state subset $U_m$, i.e., the probability of the corresponding sub-FSM $F_m$ being active, is given by the sum of the state transition probabilities for which the source states are inside $S_m$, that is, $T_m = \sum \mathrm{prob}(s_i, s_j)$, $s_i \in S_m$, $s_j \in S$. The static probability of $U_m$, i.e., the sum of the static state probabilities of $S_m$, is defined as $D_m = \sum \mathrm{prob}(s_i)$, $s_i \in S_m$. Note that $\sum_{m=1}^{n} D_m = 1$, as the sum of the static state probabilities of all states equals 1. By contrast, $\sum_{m=1}^{n} T_m$ is greater than 1, because a crossing transition is associated with two sub-FSMs and its state transition probability contributes to the duty time of both involved sub-FSMs.
When building the state-bundle table for an n-way partitioned FSM, there are n rows. The set of state-bundles, denoted $B$, can be defined as $B = \{b_0, b_1, \ldots, b_p\}$, where $p$ is determined by the number of columns, since each column represents a state-bundle. Corresponding to the definitions of state probability and state transition probability, two probabilities concerning state-bundles are defined. One is the state-bundle probability, expressed as $\mathrm{prob}(b_m) = \sum \mathrm{prob}(s_i)$, $s_i \in b_m$, $0 \le m \le p$, representing the sum of the state probabilities of the states in the state-bundle $b_m$. The other is the state-bundle transition probability, defined as $\mathrm{prob}(b_m b_k) = \sum \mathrm{prob}(s_i s_j)$, $s_i \in b_m$, $s_j \in b_k$, $0 \le m, k \le p$, describing the sum of the state transition probabilities from the states in the bundle $b_m$ to the states in the bundle $b_k$.
Binary state codes are assigned to the columns of the table from left to right in incremental order. Initially, the codes of the state-bundles correspond to the bundle index, i.e., $b_0$ has the code "000", $b_1$ "001" and so on. Binary encoding ensures that the number of clocked local state bits for each sub-FSM is minimal, since the unused high state bits always hold the value "zero" [13].
Fig. 3. Example: (a) monolithic FSM with state partition indicated and (b) coupled-states introduced.
Fig. 4. Example of state-bundle table.
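The two bundle probabilities and the initial binary column codes can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, a bundle is represented as a set of state names, and g-states are assumed to carry no static probability of their own, so they contribute zero to the bundle probability.

```python
def bundle_prob(bundle, state_prob):
    """prob(b_m): sum of the static probabilities of the states in the bundle."""
    return sum(state_prob.get(s, 0.0) for s in bundle)


def bundle_transition_prob(b_m, b_k, trans_prob):
    """prob(b_m b_k): sum of prob(s_i s_j) with s_i in b_m and s_j in b_k."""
    return sum(p for (s_i, s_j), p in trans_prob.items()
               if s_i in b_m and s_j in b_k)


def binary_codes(num_bundles):
    """Binary codes for columns b_0 .. b_p, left to right, in incremental order."""
    width = max(1, (num_bundles - 1).bit_length())
    return [format(i, "0{}b".format(width)) for i in range(num_bundles)]


# Hypothetical probabilities, chosen only to exercise the helpers.
p_bundle = bundle_prob({"s4", "g4"}, {"s4": 0.3})
p_trans = bundle_transition_prob({"s2"}, {"s3", "s4"},
                                 {("s2", "s3"): 0.25, ("s2", "s4"): 0.125})
codes = binary_codes(5)
```

With five bundles the codes are three bits wide ("000" to "100"), and with four bundles two bits suffice.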
The construction of the state-bundle table starts from the coupled-states. From the previous discussion, it is known that each coupled-state is in a bundle. The number of bundles necessary for coupled-states is $\sum_{m=1}^{n} |Q_m|$, i.e., the sum of the entry-states of all sub-FSMs. Note that $\sum_{m=1}^{n} |Q_m|$ equals the total number of g-states. In Fig. 5, coupled-states are shaded grey in the state-bundle table. It can be seen that, for example, s4 in F2 is in the same column as g4 in F1 because they are coupled-states. After filling the coupled-states into the table, the free states of each sub-FSM are placed into the table in order, starting from the leftmost empty cell. The pseudo-code for the bundling algorithm is shown in Fig. 6.
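The basic bundling procedure can be sketched in Python as follows. This is not the pseudo-code of Fig. 6, which is not reproduced here, but a hedged reconstruction from the description above; the function name, the dictionary-per-column representation and the order in which entry states are processed are assumptions.

```python
def build_bundle_table(U, entry_states):
    """Build a state-bundle table as a list of columns (dict: row -> state).

    U:            dict mapping sub-FSM index m to the ordered states of U_m.
    entry_states: ordered list of entry states; each defines one coupled-state.
    """
    home = {s: m for m, states in U.items() for s in states if s.startswith("s")}
    columns = []
    for s in entry_states:                      # one column per coupled-state
        g = "g" + s[1:]
        column = {home[s]: s}
        for m, states in U.items():
            if g in states:
                column[m] = g                   # g-state shares the column
        columns.append(column)
    for m, states in U.items():                 # then place the free states
        for s in states:
            if not s.startswith("s") or s in entry_states:
                continue
            for column in columns:
                if m not in column:             # leftmost empty cell in row m
                    column[m] = s
                    break
            else:
                columns.append({m: s})          # no empty cell: open a new column
    return columns


U = {1: ["s1", "s2", "s3", "g4"],
     2: ["s4", "s5", "s6", "g7"],
     3: ["s7", "g1"],
     4: ["s8", "s9", "s10", "s11", "s12", "s13", "g1", "g5"]}
columns = build_bundle_table(U, ["s1", "s4", "s5", "s7"])
```

For the example of this section the sketch yields eight columns, so three local state bits are needed in total, while the first row occupies only the four leftmost columns.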
3.2 Power-optimized state encoding algorithm
The power-optimized procedure of state assignment is based on the basic state encoding algorithm described above. Binary codes are still used for the columns of the state-bundle table from left to right, but states will be moved to suitable columns in order to reduce the switching activity in the state bit-lines. In the first step, using the "merging coupled-state" algorithm, two or more coupled-states can be merged into the same state-bundle. It is thus possible to reduce both the number of state bits in the LSM and the number of clocked bits for a single sub-FSM. In this step, state probabilities are also taken into account to reduce switching activity. In the second step, to further reduce the switching activity, the g-state-bundles (those including a coupled-state) are moved to suitable columns depending on their mutual state-bundle transition probabilities. The free states are then placed in the table, taking switching activity into account.
3.2.1 Merging coupled-state algorithm
For a partitioned FSM, to determine whether it is necessary to merge coupled-states, we introduce the measurement criterion
$$c = \frac{\sum_{m=1}^{n} |S_m \setminus Q_m|}{\sum_{m=1}^{n} |Q_m|}$$
where the denominator represents the total number of destination states for the crossing transitions (equal to the total number of g-states) and the numerator represents the total number of free states. Thus, the smaller the number of g-states, the bigger the value of c. For a partitioned FSM with a small number of g-states, the number of coupled-states is limited and merging coupled-states may be unnecessary. In Fig. 5, the number of g-states is small and c equals 9/4. Most FSMs, partitioned according to the state transition probabilities, have small numbers of g-states and will therefore have a large value of c. For this reason the basic state-bundling procedure without merging coupled-states works well in most cases.
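As a quick check of the criterion, the following Python sketch evaluates c for the four-way example of Section 3.1. The function name `merge_criterion` is hypothetical, and the entry-state sets Q are derived here from the g-states listed in U1 to U4; an exact fraction is used so that the result can be compared with the value 9/4 quoted above.

```python
from fractions import Fraction


def merge_criterion(S, Q):
    """c = (number of free states) / (number of entry states)."""
    free = sum(len(S[m] - Q[m]) for m in S)
    entry = sum(len(Q[m]) for m in S)
    return Fraction(free, entry)       # raises ZeroDivisionError if no entry state


S = {1: {"s1", "s2", "s3"},
     2: {"s4", "s5", "s6"},
     3: {"s7"},
     4: {"s8", "s9", "s10", "s11", "s12", "s13"}}
# Entry states: s1 (g1 in U3, U4), s4 (g4 in U1), s5 (g5 in U4), s7 (g7 in U2).
Q = {1: {"s1"}, 2: {"s4", "s5"}, 3: {"s7"}, 4: set()}
c = merge_criterion(S, Q)
```

There are nine free states and four entry states, so the sketch reproduces c = 9/4.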
However, when the number of g-states is large, placing each coupled-state into a different column will result in a large number of g-state-bundles and thus cause inefficient usage of the LSM. The objective of merging coupled-states is to reduce the number of g-state-bundles so that more states are able to share the same local state code. It is then possible to simplify the detection logic for g-states and reduce the number of local state bits required for each sub-FSM.
Fig. 5. State-bundle table with large c.
Fig. 6. Pseudo-code for basic procedures of building state-bundle table.
To illustrate the merging coupled-state algorithm, the example in Fig. 7 is used, where the value of c is 2/5 (Fig. 8). The initial state-bundle table with coupled-states before merging is shown in Fig. 9a, where the five g-states reside in five state-bundles. The merging procedure is performed using the following steps.
(1) Rows in the table are sorted (in the sort( ) function of Fig. 10) according to the static probability $D_m$ of their corresponding sub-FSMs (see Fig. 7). Following this rearrangement, the static probabilities of the sub-FSMs appear in descending order. Since sub-FSMs with high static probability generally contribute more to the final power dissipation, they are given priority in the following optimization procedures. The sorted state-bundle table is shown in Fig. 9b. Following on from this, the g-state-bundle with the highest state-bundle probability is moved to the first column and is assigned the state code "zero" in binary coding. In the proposed implementation architecture, the next-state information of the partitioned FSM is obtained by merging the next-state variables of the different sub-FSMs (Fig. 2) using OR gates. When the present active sub-FSM, assuming that its current state is encoded to "zero", is deactivated in the next clock cycle, there are no state changes in this sub-FSM, as the next-state variable of a deactivated sub-FSM is always encoded to "zero". Encoding the g-state-bundle with the highest probability to "zero" can therefore reduce the switching activity in the next-state bit-lines for the crossing transitions.
(2) The coupled-states are merged so that two or more reside in the same column. To ensure that the state-bundle probability of the first column is a maximum, the algorithm begins from the coupled-state in the first column (b0 by default) and attempts to merge other coupled-states into it. If two or more coupled-states can be chosen for merging, the one in the bundle with the highest state-bundle probability is chosen. When the merging process for the first column b0 is completed, b0 is locked. The same procedure continues for the following coupled-states until the one in the last column has been processed. In the example given in Fig. 9b, both b2 and b3 can be merged into b0. According to the static probability information of the sub-FSMs shown in Fig. 7, the state-bundle probability of b3 is 0.3 (prob(b3) = prob(s4) = D4), larger than that of b2 (prob(b2) = prob(s3) ≤ prob(s3) + prob(s2) = D3 = 0.2). Therefore, b3 is chosen to be merged into b0. The updated state-bundle table after merging coupled-states is shown in Fig. 9c, where the total number of g-state-bundles is reduced from 5 to 4.
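Step (2) can be sketched as a greedy loop in Python. This is not the pseudo-code of Fig. 10; it is a simplified illustration that assumes the columns are already ordered as after step (1) and that two bundles can be merged whenever they occupy no common row. The row contents of the bundles and the probabilities 0.15 and 0.06 are hypothetical, chosen to be consistent with the Fig. 7 and Fig. 9 description.

```python
def merge_coupled_states(bundles):
    """Greedily merge g-state-bundles, starting from the first column.

    bundles: list of (rows, prob), where rows maps a sub-FSM name to the state
             it holds in this column and prob is the state-bundle probability.
    """
    pending = [(dict(rows), prob) for rows, prob in bundles]
    merged = []
    while pending:
        rows, prob = pending.pop(0)            # current column
        while True:
            # candidates: bundles that use none of the rows already occupied
            ok = [b for b in pending if not (b[0].keys() & rows.keys())]
            if not ok:
                break
            best = max(ok, key=lambda b: b[1])  # highest bundle probability
            pending.remove(best)
            rows.update(best[0])
            prob += best[1]
        merged.append((rows, prob))            # column is now locked
    return merged


bundles = [({"F1": "s0", "F5": "g0"}, 0.3),                 # b0
           ({"F2": "s1", "F1": "g1", "F4": "g1"}, 0.1),     # b1
           ({"F3": "s3", "F4": "g3"}, 0.15),                # b2 (assumed value)
           ({"F4": "s4", "F3": "g4"}, 0.3),                 # b3
           ({"F5": "s5", "F4": "g5"}, 0.06)]                # b4 (assumed value)
merged = merge_coupled_states(bundles)
```

In this sketch both b2 and b3 are compatible with b0; b3 is merged because of its higher probability, after which b2 no longer fits, leaving four g-state-bundles.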
3.2.2 g-state-bundle encoding optimization
Since every g-state-bundle is given a unique local state code, the problem of reducing the switching activity between state transitions is transformed into reducing the switching activity of the transitions between the bundles. In this step, the indices of the state-bundles are fixed but their positions are moved between the columns. The algorithm, in addition to reducing the switching activity between state-bundles, attempts to allow the sub-FSMs with higher static probabilities to retain the minimum-length encoding. The state encoding procedure begins from the top row of the state-bundle table, corresponding to the sub-FSM with the highest static probability, and continues to the last row of the table. Bundle b0 is in the first column with the highest state-bundle probability, and its position is locked initially while the other g-state-bundles are unlocked. In the table, if a row includes a state belonging to a g-state-bundle, the g-state-bundle is said to be valid in this row. For the valid g-state-bundles of each row, their positions will be optimized and then locked without further changes. A greedy algorithm, shown in Fig. 11, is used to minimize the Hamming distance between g-state-bundles with high state-bundle transition probability.
The procedure can be illustrated through the example given above. In Fig. 9c, the state-bundle table after merging the coupled-states includes state-bundles b0, b1, b2, b3. The first row only includes states s0 and g1, belonging to b0 and b1, respectively, so b0 and b1 are the valid state-bundles in this row (corresponding to F1). Since b0 is locked initially, the g-state-bundle optimization procedure begins from b1.
Fig. 7. Example of a partitioned FSM with small c. Sub-FSM static probabilities: D1 = 0.3, D2 = 0.1, D3 = 0.2, D4 = 0.3, D5 = 0.1.
Fig. 8. State-bundle table before optimization (bundles b0 to b4 with codes 000, 001, 010, 011, 100).
To ensure that the valid g-state-bundles in F1 use the minimum-length codes, b1 can only be assigned the code "01", and thus only one state bit is necessary to distinguish the valid g-state-bundles in F1. When the optimization in the first row is completed, the positions of g-state-bundles b0 and b1 are locked. For the remaining two bundles, b2 includes the coupled-state (s3, g3) and b3 includes (s5, g5). In F4 there are 4 states and the minimal number of local state bits is 2. Since the codes "00" and "01" have been assigned, the remaining available codes are "10" and "11". The state-bundle transition probabilities between the unassigned state-bundles (b2, b3) and the assigned bundles (b0, b1) are calculated. If it is assumed that the transition probability between b3 and b0 is the highest, then b3 is placed in the column with the code "10", which has a Hamming distance of 1 to b0 ("00"), and is then locked. Bundle b2 is subsequently assigned the code "11" and locked. Since all g-state-bundles are locked, the state encoding optimization for g-state-bundles is complete. It can be seen from Fig. 9d that the positions of b3 and b2 are swapped after optimization.
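The greedy choice in this example can be sketched in Python. This is a simplified illustration and not the pseudo-code of Fig. 11: it omits the row-by-row processing and the minimum-length check, and simply gives the unassigned bundle with the strongest link to an assigned bundle the free code closest to that bundle in Hamming distance. The function names and the transition probabilities 0.2 and 0.05 are assumptions.

```python
def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def encode_bundles(assigned, unassigned, free_codes, trans_prob):
    """Assign the free codes to the unassigned bundles greedily.

    assigned:   dict bundle -> integer code of the bundles already locked.
    trans_prob: dict frozenset({bundle, bundle}) -> transition probability.
    """
    assigned = dict(assigned)
    unassigned = list(unassigned)
    free_codes = list(free_codes)
    while unassigned:
        # pair (unassigned, assigned) with the highest transition probability
        u, a = max(((u, a) for u in unassigned for a in assigned),
                   key=lambda pair: trans_prob.get(frozenset(pair), 0.0))
        code = min(free_codes, key=lambda c: hamming(c, assigned[a]))
        assigned[u] = code                     # place and lock the bundle
        unassigned.remove(u)
        free_codes.remove(code)
    return assigned


trans_prob = {frozenset({"b3", "b0"}): 0.2,    # assumed to be the highest
              frozenset({"b2", "b1"}): 0.05}   # assumed value
result = encode_bundles({"b0": 0b00, "b1": 0b01}, ["b2", "b3"],
                        [0b10, 0b11], trans_prob)
```

Under these assumptions b3 receives the code "10" and b2 the code "11", which matches the swap described for Fig. 9d.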
3.2.3 Free state encoding optimization
Free states are states of sub-FSMs that are not included in the coupled-states. These states are not related to crossing transitions and therefore their state assignment optimization is not influenced by the ordering of the sub-FSMs. For every unassigned free state of a single sub-FSM Fm, its state transition probabilities with all other assigned states (inside or outside Fm) are calculated. The free state associated with the highest state transition probability is chosen and placed into the column minimizing the Hamming distance of this transition. Meanwhile, the condition of minimum-length encoding for Fm should still be satisfied.
Fig. 9. Optimization procedures for state-bundle table: (a) initial table with coupled-states; (b) sorted table; (c) merging coupled-states; (d) optimization of g-state-bundles; (e) final state-bundle table.
In the example above, for sub-FSM F3, s2 is a free state and the minimum number of state bits for F3 is 2 (obtained from the minimumLengthCode( ) function in Fig. 12). Since s2 only has a state transition to s3 (in Fig. 7), it should be placed in the column which has the smallest Hamming distance to the column with code "11", which is where s3 resides. Both columns, with codes "01" and "10", have a Hamming distance of 1 to "11", so s2 can be placed in either. In Fig. 9e, it is placed in the left column "01". For sub-FSM F5, s6 is a free state. It has state transitions to s5 and s0, where the former occurs inside sub-FSM F5 and the latter is between sub-FSMs F5 and F1. Assuming that the state transition probability between s6 and s5 is higher than that between s6 and s0, s6 is placed in the column with code "11", which has a Hamming distance of 1 from the column with code "10", where s5 is to be found. The column "01" has a Hamming distance of 2 from "10" and is therefore not chosen.
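The placement of s2 and s6 can be reproduced with a short Python sketch. It is not the pseudo-code of Fig. 12: the function name, the tie-break towards the smaller code (the left column), and the transition probabilities are assumptions, and the free codes of each row are passed in directly instead of being derived from a minimum-length check.

```python
def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def place_free_state(state, row_free_codes, code_of, trans_prob):
    """Pick a code for a free state next to its most probable partner.

    row_free_codes: codes still unused in the row of the state's sub-FSM.
    code_of:        dict of already assigned states -> integer code.
    trans_prob:     dict (source, destination) -> transition probability.
    """
    partners = {t: p for (s, t), p in trans_prob.items()
                if s == state and t in code_of}
    best = max(partners, key=partners.get)     # strongest transition partner
    # minimal Hamming distance to the partner; ties go to the left column
    return min(row_free_codes, key=lambda c: (hamming(c, code_of[best]), c))


# s2 in F3: its only transition goes to s3, which resides at "11".
code_s2 = place_free_state("s2", [0b01, 0b10], {"s3": 0b11},
                           {("s2", "s3"): 0.05})
# s6 in F5: the transition to s5 ("10") is assumed to dominate that to s0 ("00").
code_s6 = place_free_state("s6", [0b01, 0b11], {"s5": 0b10, "s0": 0b00},
                           {("s6", "s5"): 0.04, ("s6", "s0"): 0.01})
```

With these inputs s2 lands in column "01" and s6 in column "11", as in Fig. 9e.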
The final state-bundle table after the optimization procedures is shown in Fig. 9e. In comparison to the state-bundle table without optimization shown in Fig. 8, it can be seen that the total number of local state bits, as well as the number of state bits needed for F4 and F5, is reduced from 3 to 2. The effect of reducing local state bits depends largely on whether or not it is possible for those sub-FSMs having a high probability of being active to reduce the number of clocked local state bits.
4 Experimental results
In this section, results are presented showing how the state assignment optimization procedures given in Section 3.2 influence the power consumption of partitioned FSMs. The optimization algorithms are incorporated into the automatic synthesis tool based on our previous work [10]. Seven MCNC standard benchmarks [12] were used in the experiments. The number of states in these benchmarks ranges from 19 to 121. As in [14], we use Monte Carlo simulation to obtain the approximate state transition probabilities. The input vectors are randomly generated and the average input probability is set to 0.5 by default. The inputs are assumed to be independent of each other. Also, it is assumed that the state probability of each state becomes a constant as time increases to infinity. After a warm-up period of clock cycles, the state probability of every state is sampled. A simplified convergence criterion is used, i.e., the simulation stops only when the maximum difference between the probabilities of each state sampled in two consecutive time units (or clock cycles) is less than ε, a user-specified constant. The default value of ε is set to 10^-6. For all the standard benchmarks tested, the simulations converged in a reasonable time. This method removes the need to handle the STG explicitly and supports the execution of large benchmarks. The power and area figures presented in the graphs are obtained from gate-level estimations in Power Compiler, and logic synthesis is performed using Design Compiler, both of which are tools from Synopsys [15]. A 0.18 µm CMOS standard cell library [16] is used. The power supply voltage Vdd is assumed to be 1.8 V and the clock frequency is 20 MHz.
Fig. 10. Pseudo-code for coupled-state merging.
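The probability estimation described above can be sketched in Python. This is an assumption-laden illustration rather than the authors' simulator: the function name and parameters are hypothetical, visit frequencies are accumulated from the first cycle, and the convergence test is only applied once the warm-up period has passed. The toy two-state machine and the loose tolerance of 10^-3 are chosen so that the example finishes quickly.

```python
import random


def estimate_state_probs(delta, s0, n_inputs, eps=1e-6, warmup=1000, seed=0):
    """Estimate static state probabilities by random simulation.

    delta: next-state function delta(state, inputs) -> state.
    Each input bit is 1 with probability 0.5, independently of the others.
    """
    rng = random.Random(seed)
    counts, prev, state, n = {}, {}, s0, 0
    while True:
        x = tuple(rng.random() < 0.5 for _ in range(n_inputs))
        state = delta(state, x)
        n += 1
        counts[state] = counts.get(state, 0) + 1
        est = {s: k / n for s, k in counts.items()}
        # stop when no estimate moved by eps or more since the previous cycle
        if n > warmup and max(abs(p - prev.get(s, 0.0))
                              for s, p in est.items()) < eps:
            return est
        prev = est


def toggle(state, x):
    """Toy FSM: flip between states 0 and 1 whenever the single input is 1."""
    return 1 - state if x[0] else state


probs = estimate_state_probs(toggle, 0, n_inputs=1, eps=1e-3)
```

For the toy machine both states are visited about half of the time. With the default ε of 10^-6 the same loop needs on the order of a million cycles, and recomputing every estimate in each cycle would then be worth optimizing.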
For the original FSM, binary state encoding is the default used by Synopsys. The total power of this monolithic FSM is $P_{tot,mono} = P_{clk} + P_{reg} + P_{ns} + P_{out}$, where $P_{clk}$ is the clock net power, $P_{reg}$ is the power in the state registers, $P_{ns}$ is the power in the next-state function, and $P_{out}$ is the power in the output function. The total power of the partitioned FSM is $P_{tot,part} = P_{clk} + P_{reg} + P_{ns} + P_{out} + P_{oh}$, where $P_{oh}$ is the power overhead, which is the sum of the power dissipated in the global state memory, the circuits for idle condition detection, and the shut-down circuitry. The power dissipated in the sub-FSMs can be further indicated as $P_{tot,sub\text{-}FSM} = P_{tot,part} - P_{oh}$.
FSM partitioning, on its own, is an efficient method for achieving power reductions. In Fig. 13, the power consumption before and after FSM partitioning is compared. It can be seen that significant reductions have been obtained using the mixed synchronous/asynchronous architecture without optimized state encoding.
Fig. 11. Pseudo-code for optimized g-state-bundle encoding.
In the partitioned FSM a significant part of the power is dissipated in the global state memory, the circuits for idle condition detection and the shut-down logic (in total $P_{oh}$). This part is rarely affected by the optimization procedures presented in this paper. To examine in detail how the proposed procedures affect the power consumption, we first consider only the power dissipated in the sub-FSMs ($P_{tot,sub\text{-}FSM} = P_{tot,part} - P_{oh}$). In Fig. 14, the impact of state assignment alone on the power optimization of $P_{tot,sub\text{-}FSM}$ after FSM partitioning is illustrated. It shows how, in comparison to the basic binary encoding procedure, the merging of coupled-states (indicated by the legend "merged g-states") and the subsequent encoding optimization procedures (indicated by the legend "encoding") affect power dissipation. As can be seen, the merging of coupled-states is, by itself, rarely worthwhile. For benchmarks s832, s820, scf and s1488, coupled-state merging has no obvious positive effect. This result is reasonable, as the number of local state bits may not be reduced following merging. Even when the number does decrease, since the local state bits are clock-gated, the sub-FSMs having a high probability of being active may still use the same number of state bits as before the merging. The subsequent state encoding optimization process (including optimization for g-state-bundles and free states), by contrast, performs better in terms of power reduction. On average, for the benchmarks tested, regarding the power dissipation of the sub-FSMs ($P_{tot,sub\text{-}FSM} = P_{tot,part} - P_{oh}$), the whole state assignment optimization technique achieves a power reduction of around 13% compared to that of the partitioned FSM using simple binary state encoding.
As shown in Fig. 13 to the right, the sub-FSM power ($P_{tot,part} - P_{oh}$) is only a portion (on average about 40%) of the decomposed FSM power ($P_{tot,part}$). The total power reduction from FSM partitioning and the optimization procedures is 59%, compared to 56% in [10]. Most low-power state encoding methods focus on the monolithic FSM, and for the method that does concern partitioned FSM with
Fig. 12. Pseudo-code for optimized free state encoding.