
INTEGRATION, the VLSI journal 41 (2008) 123–134

Low-power state encoding for partitioned FSMs with mixed synchronous/asynchronous state memory

Cao Cao, Bengt Oelmann

Department of Information Technology and Media, Mid Sweden University, SE-851 70 Sundsvall, Sweden

Received 31 May 2006; received in revised form 6 February 2007; accepted 7 February 2007

Abstract

Partitioned finite state machine (FSM) architectures in general enable low-power implementations, and it has been shown that, for these architectures, state memory based on both synchronous and asynchronous storage elements gives lower power consumption than fully synchronous counterparts. In this paper we present state encoding techniques for a partitioned FSM architecture based on mixed synchronous/asynchronous state memory. The state memory, in this case, is composed of a synchronous local state memory and an asynchronous global state memory. The local state memory uses synchronous storage elements and is shared by all sub-FSMs. The global state memory operates asynchronously and is responsible for handling the interaction between sub-FSMs. Even though the partitioned FSM contains an asynchronous mechanism, its input/output behaviour is still cycle-by-cycle equivalent to that of the original monolithic synchronous FSM. We discuss a low-power state encoding method for implementing partitioned FSMs with mixed synchronous/asynchronous state memory. For the local state assignment, a procedure that we call state-bundling is presented, which enables states residing in different sub-FSMs to share the same state codes. Based on state-bundles, two state encoding techniques are compared: one employs binary encoding and the other applies further optimization for low power.

Keywords: State encoding; Low power; Mixed synchronous/asynchronous; Finite state machine partitioning

1 Introduction

For finite state machine (FSM) low-power design, there are two main active areas of research. One is FSM partitioning and the other is low-power state encoding. These two methods can be used together or separately to reduce the power dissipation of FSMs.

FSM partitioning can be considered an application of the concept of "Dynamic Power Management" (DPM) [1] at the Register Transfer (RT) level. The objective of a DPM scheme is to partition the original design into two or more units so that units that are currently idle can be shut down to reduce dynamic power dissipation. Mechanisms are usually added to the design to detect and shut down the idle units. These mechanisms require additional circuits which, in turn, add to the circuit area and power dissipation. Therefore, to achieve a solution with minimal power consumption, it is important to make a careful initial analysis in order to find the most beneficial idle conditions, taking the overhead into account.

In FSM partitioning for low power, the original FSM is partitioned into several sub-FSMs. For most of the time, except when there is a state transition between two sub-FSMs, only one sub-FSM is active and all others are shut down. Clock-gating and input-disabling are usually used as shut-down circuitry. After FSM partitioning, the sub-FSM network has one of two basic structures, which differ mainly in how the state memory is implemented.

In [2,3], each sub-FSM has its own state memory (as shown in Fig. 1a) and extra signals are added to control which sub-FSM should be currently active. Since every sub-FSM can be synthesized separately, a low-power state encoding method for monolithic FSMs [4,5] can be used directly to reduce the power dissipation of each sub-FSM. In this structure, the state memories are, in some sense, redundant. At any given time, only the state memory in the currently active sub-FSM is needed for storing state information, while the others, residing in the idle sub-FSMs, are not used.

Corresponding author. E-mail address: cao.cao@miun.se (C. Cao).
doi: 10.1016/j.vlsi.2007.02.002
www.elsevier.com/locate/vlsi

By contrast, in the partitioned FSM structure proposed by Chow et al. [6], all sub-FSMs share the same state memory (also called the local state memory, LSM) and thus area can be reduced (see Fig. 1b). However, a global state memory is added to determine which one of the sub-FSMs is currently active. Because states in different sub-FSMs can have the same local state code and be distinguished by their global state codes, the local state assignment in one sub-FSM will influence the state codes in another sub-FSM. The sub-FSMs therefore cannot be synthesized separately, and the low-power state encoding problem must be considered carefully in this case. For state encoding, [6] presents a method that accounts for crossing transitions (state transitions between two different sub-FSMs) by introducing pseudo-outputs. A pseudo-output bit represents a relation imposed by the crossing transitions. Subsequently, all crossing transitions are deleted and Jedi [7] is used to perform low-power state assignment for each individual sub-FSM.

Both approaches to low-power FSM partitioning described above assume fully synchronous implementations, based on either a shared or a separate state memory. However, the fully synchronous implementation does have disadvantages. For the separate state memory structure, in a clock cycle with a crossing transition (where the source state and destination state reside in two different sub-FSMs), both sub-FSMs involved must be clocked, which adds to the power consumption. For the shared state memory structure, the global state memory, used for determining the currently active sub-FSM, is always clocked and consumes power.

For partitioned FSMs utilizing separate state memory, [8] uses an asynchronous hand-over mechanism to remove the requirement of clocking two sub-FSMs at a crossing transition. The power overhead introduced for managing the interaction between the sub-FSMs can thus be reduced. It was verified that, in a clock cycle without a crossing transition, the power consumption of the synchronous control for sub-FSM interaction is 5.8 times that of the asynchronous control [9]. An average power reduction of 45% for a set of FSM benchmark circuits is achieved using this asynchronous communication between sub-FSMs [3]. Using the model of shared mixed synchronous/asynchronous state memory without state encoding optimization, [10] achieves an average power reduction of 56%.

As a development of [10], this paper proposes a novel low-power state encoding algorithm and applies it to FSM partitioning based on mixed synchronous/asynchronous state memory. The algorithm is based on the assumption that the total power consumption is proportional to the switching activity of the state bit-lines [11]. The main contributions of this paper are as follows:

A state assignment procedure: State-bundling is introduced, which reflects the property of state encoding in partitioned FSMs with shared state memory.

Power-optimized state encoding: An efficient state encoding algorithm is performed on the state-bundle table to reduce the switching activity of the state bit-lines, which generally leads to a reduction in total power, including that of the next-state logic and the output logic.

Demonstration of efficiency: The proposed algorithm has been incorporated in a tool for low-power synthesis of partitioned FSMs. On a set of MCNC benchmark circuits [12], it was demonstrated that, in combination with partitioning procedures, the tool can achieve an average power reduction of 59%.

The rest of this paper is organized as follows. Section 2 introduces the implementation structure of the mixed synchronous/asynchronous state memory and the necessary definitions and procedures for this implementation. Section 3 presents the basic binary state encoding technique and the state assignment optimization for low power. Section 4 presents experimental results showing the possibility of a further power reduction from state encoding optimization. Section 5 concludes the paper with a discussion of the limitations of the two-step synthesis for partitioned FSMs, in which a partitioning step is followed by a state encoding step.

2 Partitioned FSM with mixed synchronous/asynchronous state memory

2.1 Implementation architecture

This paper uses the architecture developed in [13], which has a shared LSM and a global asynchronous state memory (GSM); see Fig. 2. The basic idea is for the LSM, which is always clocked, to be synchronous and for the GSM to be asynchronous, since an asynchronous memory does not consume power when idle. The partitioning of the original monolithic FSM is based on state transition probabilities, where states with high mutual transition probabilities are implemented in the same sub-FSM. Since the state transition probabilities between sub-FSMs are low, the global state memory is generally idle and thus an asynchronous implementation is more power efficient. The shut-down mechanisms used are input-gating, to reduce power dissipation in idle combinational logic, and clock-gating, to shut down flip-flops temporarily not required in the LSM. The primary output and next-state information of the partitioned FSM are obtained by merging the separate output and next-state variables from the sub-FSMs (see the merging function in Fig. 2).

Fig. 1. Structural decomposition of FSM: (a) separate state memory and (b) shared state memory.

2.2 STG transformation

In order to implement the structure described in the previous section, the original state transition graph (STG) of the monolithic FSM must be transformed. Fig. 3 illustrates the basic ideas of the design model for STG transformation.

The initial monolithic machine is decomposed into two sub-FSMs, F1 and F2, as indicated in Fig. 3a. There are three crossing transitions after partitioning: two from s2 in F1 and one from s5 in F2. For each crossing transition, an additional g-state is introduced, which is inside the sub-FSM where the source state resides and has the same index as the destination state of the crossing transition. In Fig. 3b there are three g-states, g1, g3 and g4, whose indices correspond to s1, s3 and s4, respectively.

After introducing g-states, a crossing transition is transformed into two steps. The first step takes place inside the LSM: a state transition from the original source state ends at the g-state. In the second step, the g-state is detected and the global state set, denoted by R, which is responsible for determining the active sub-FSM, changes accordingly. A change in R deactivates the sub-FSM containing the source state of the crossing transition and activates the sub-FSM containing the destination state of that transition.

Take the crossing transition from s2 to s3 as an example. After STG transformation, the transition from s2 enters g3. The detection of g3 causes the global state set R to make a transition from r1 to r2. After the completion of the crossing transition, F2, rather than F1, is the active sub-FSM.
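As a rough illustration of the two-step transformation, the following Python sketch rewrites each crossing transition into a local transition to a g-state plus a global-state change. The function name `transform_stg`, the `owner` mapping and the internal transitions in the example are assumptions made for illustration; only the three crossing transitions follow the description of Fig. 3.

```python
def transform_stg(transitions, owner):
    """Split crossing transitions into a local step and a global step.

    transitions: list of (source, destination) state names such as ("s2", "s3").
    owner: dict mapping each state to the index m of its sub-FSM F_m.
    Returns (local, global_moves), where global_moves[(m, g)] is the index of
    the sub-FSM activated when g-state g is detected while r_m is active.
    """
    local, global_moves = [], {}
    for src, dst in transitions:
        if owner[src] == owner[dst]:
            local.append((src, dst))          # ordinary transition, unchanged
        else:
            g = "g" + dst[1:]                 # g-state shares the destination index
            local.append((src, g))            # step 1: synchronous move inside the LSM
            global_moves[(owner[src], g)] = owner[dst]  # step 2: r_src -> r_dst
    return local, global_moves


# Crossing transitions as described for Fig. 3; internal ones are assumed.
owner = {"s1": 1, "s2": 1, "s3": 2, "s4": 2, "s5": 2}
transitions = [
    ("s1", "s2"),                 # assumed internal transition of F1
    ("s2", "s3"), ("s2", "s4"),   # crossing transitions from s2
    ("s3", "s5"), ("s4", "s5"),   # assumed internal transitions of F2
    ("s5", "s1"),                 # crossing transition from s5
]
local, global_moves = transform_stg(transitions, owner)
```

In this sketch the transition from s2 to s3 becomes the local transition (s2, g3), and detecting g3 while r1 is active switches the global state to r2, matching the example in the text.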

2.3 Basic definitions

The monolithic Mealy-type FSM is defined as a sextuple $F = (S, X, Y, \delta, \lambda, s_0)$, where $S$ is the set of states, $X$ is the set of binary inputs, $Y$ is the set of binary outputs, $\delta$ is the transition function, $\lambda$ is the output function and $s_0$ is the initial state.

Let there be a partition on the set $S$: $P = \{S_1, S_2, \ldots, S_n\}$, where $P$ is defined as a collection of subsets such that $\bigcup_{m=1}^{n} S_m = S$ and $S_i \cap S_j = \emptyset$ for $i \neq j$, where $1 \le i, j \le n$. The monolithic FSM is decomposed into a set of sub-FSMs, where every state subset $S_m \in P$ defines a sub-FSM as $F_m = (S_m, X_m, Y_m, \delta_m, \lambda_m, s_0^m)$. The state subset $S_m$ is called the internal state of the sub-FSM, $X_m$ is the set of input variables at the transitions from the states in $S_m$, and $Y_m$ is the set of output variables on the sets $S_m$ and $X_m$.

$T(S_m)$ is defined as the set of states that are not included in $S_m$ and to which there are transitions from $S_m$ of $F_m$: $T(S_m) = \{ s_j \mid \delta(s_k, X_h) = s_j,\ s_j \notin S_m,\ s_k \in S_m \}$.

$Q(S_m)$ is defined as the set of states inside $S_m$ to which there are transitions from other sub-FSMs: $Q(S_m) = \{ s_j \mid \delta(s_k, X_h) = s_j,\ s_j \in S_m,\ s_k \notin S_m \}$.

The shorter notations $T_m$ and $Q_m$ are used throughout the remainder of this paper.

The set of g-states $G_m$ for the crossing transitions originating from $S_m$, which replaces $T_m$ as the set of destination states, is defined as $G_m = \{ g_i \mid s_i \in T_m \}$.

After STG transformation, let the set of internal states in the sub-FSM $F_m$ be $U_m$: $U_m = S_m \cup G_m$.
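The set definitions above translate almost directly into set comprehensions. The following Python sketch computes T_m, Q_m, G_m and U_m for a given partition; the function name `crossing_sets` and the internal transitions in the example are illustrative assumptions, while the crossing transitions are those described for Fig. 3.

```python
def crossing_sets(transitions, partition):
    """Compute T_m, Q_m, G_m and U_m for every sub-FSM.

    transitions: list of (source, destination) state names.
    partition: dict mapping a sub-FSM name to its state set S_m.
    """
    result = {}
    for name, s_m in partition.items():
        # T_m: states outside S_m reached by a transition from inside S_m
        t_m = {dst for src, dst in transitions if src in s_m and dst not in s_m}
        # Q_m: entry states inside S_m reached from another sub-FSM
        q_m = {dst for src, dst in transitions if src not in s_m and dst in s_m}
        # G_m: one g-state per state in T_m, carrying the same index
        g_m = {"g" + dst[1:] for dst in t_m}
        result[name] = {"T": t_m, "Q": q_m, "G": g_m, "U": s_m | g_m}
    return result


transitions = [
    ("s1", "s2"),                 # assumed internal transition of F1
    ("s2", "s3"), ("s2", "s4"),   # crossing transitions from s2
    ("s3", "s5"), ("s4", "s5"),   # assumed internal transitions of F2
    ("s5", "s1"),                 # crossing transition from s5
]
partition = {"F1": {"s1", "s2"}, "F2": {"s3", "s4", "s5"}}
sets = crossing_sets(transitions, partition)
```

For this input, F1 obtains the g-states g3 and g4 and F2 obtains g1, in line with the three g-states mentioned for Fig. 3b.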

2.4 State-bundling

In a synchronous FSM a crossing transition, like all other transitions, must be completed within one clock cycle. As explained above, a crossing transition after the STG transformation requires two steps of state transitions.

As the behaviour of the transformed STG is still cycle-by-cycle equivalent to that of the original synchronous one, these two steps must be completed within one clock cycle. An asynchronous state transition is triggered by a signal transition rather than by the active edge of the clock signal. Therefore, following the synchronous state transition in the first step, the second step, involving the asynchronous state transition, happens immediately inside the global state memory, and the two steps are completed before the start of the next clock cycle. When the global asynchronous state transition occurs, the local state must remain unchanged, which places a restriction on the state encoding. In other words, the local states must be coded in such a way that a g-state and its associated entry state, together called a coupled-state, have identical codes. For the example in Fig. 3b, there are three coupled-states in total: (s1, g1), (s3, g3) and (s4, g4). Each coupled-state includes two states that reside in different sub-FSMs but have the same state code. The coupled-states show how the local state assignments in different sub-FSMs relate to each other, which imposes a further restriction on the local state encoding. Other states, not included in the coupled-states, are called free states (s2 and s5 in this example). These free states have more freedom within the local state assignment. If they are located in different sub-FSMs, their global states are different, so they may or may not be given the same local state code.

Fig. 2. Mixed synch/asynch FSM.

The states sharing the same local state code are called a state-bundle. A state-bundle table is used to describe the behaviour of the decomposed FSM, including the sub-FSM interaction. Taking the above example (Fig. 3) as an illustration, its state-bundle table is shown in Fig. 4. In the table, each row contains the internal states of a sub-FSM and each column, having a single local state code, represents a state-bundle. A local state transition is represented by a horizontal change between columns and a global state transition is represented by a vertical change between rows. The two states of every coupled-state share the same local state code and are placed in the same bundle. A bundle that includes a coupled-state is further distinguished as a g-state-bundle in the remainder of the paper.

There are two reasons for state-bundling: (1) it enables states in different sub-FSMs to share local state codes and (2) it enables an efficient asynchronous global state transition.

3 State encoding for local states

After FSM partitioning, the whole local state encoding procedure is performed on the state-bundle table. Each coupled-state is placed in a single column and then the free states are added to the table. In this section we present two state-encoding methods. One is the basic procedure, which generally offers good results [10]. The other is the optimization procedure for low power. By merging two or more coupled-states into the same state-bundle and optimizing the state codes of the state-bundles, we can reduce the switching activity in the state bit-lines and thus the power dissipation [11].
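To make the optimization target concrete, the sketch below evaluates the switching activity of the state bit-lines for a given encoding as the transition-probability-weighted Hamming distance between state codes. This weighted sum is a common proxy and is an assumption here, as are the function names and the three-state example with its probabilities; the paper itself only states that power is taken to be proportional to switching activity.

```python
def hamming(a, b):
    """Number of bit positions in which the integer codes a and b differ."""
    return bin(a ^ b).count("1")


def switching_activity(codes, trans_prob):
    """Expected number of state-bit toggles per clock cycle.

    codes: dict mapping a state name to its integer state code.
    trans_prob: dict mapping (source, destination) to a transition probability.
    """
    return sum(p * hamming(codes[s], codes[t]) for (s, t), p in trans_prob.items())


# Hypothetical three-state machine and transition probabilities.
trans_prob = {("s1", "s2"): 0.4, ("s2", "s3"): 0.4, ("s3", "s1"): 0.2}
codes_a = {"s1": 0b00, "s2": 0b01, "s3": 0b11}
codes_b = {"s1": 0b00, "s2": 0b11, "s3": 0b01}
cost_a = switching_activity(codes_a, trans_prob)
cost_b = switching_activity(codes_b, trans_prob)
```

Under these assumed probabilities, `codes_a` places the frequently exchanged states at Hamming distance 1 and therefore yields the lower cost of the two encodings.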

3.1 Basic state encoding algorithm

The following example is used to illustrate the basic procedure of local state assignment. Assume that the original state set is $S$ and let there be a partition $P = \{S_1, S_2, S_3, S_4\}$, which results in the following sets of states: $U_1 = \{s_1, s_2, s_3, g_4\}$, $U_2 = \{s_4, s_5, s_6, g_7\}$, $U_3 = \{s_7, g_1\}$ and $U_4 = \{s_8, s_9, s_{10}, s_{11}, s_{12}, s_{13}, g_1, g_5\}$. The duty time of each state subset $U_m$, i.e., the probability of the corresponding sub-FSM $F_m$ being active, is given by the sum of the state transition probabilities for which the source states are inside $S_m$, that is, $T_m = \sum \mathrm{prob}(s_i, s_j)$, $s_i \in S_m$, $s_j \in S$ (here $T_m$ denotes the duty time rather than the set $T(S_m)$ of Section 2.3). The static probability of $U_m$, i.e., the sum of the static state probabilities of $S_m$, is defined as $D_m = \sum \mathrm{prob}(s_i)$, $s_i \in S_m$. Note that $\sum_{m=1}^{n} D_m = 1$, as the sum of the static state probabilities of all states equals 1. By contrast, $\sum_{m=1}^{n} T_m$ is greater than 1, because a crossing transition is associated with two sub-FSMs and its state transition probability contributes to the duty time of both sub-FSMs involved.

When building the state-bundle table for an n-way partitioned FSM, there are n rows. The set of state-bundles, denoted $B$, is defined as $B = \{b_0, b_1, \ldots, b_p\}$, where $p$ is determined by the number of columns, since each column represents a state-bundle. Corresponding to the definitions of state probability and state transition probability, two probabilities concerning state-bundles are defined. One is the state-bundle probability, $\mathrm{prob}(b_m) = \sum \mathrm{prob}(s_i)$, $s_i \in b_m$, $0 \le m \le p$, representing the sum of the state probabilities of the states in the state-bundle $b_m$. The other is the state-bundle transition probability, $\mathrm{prob}(b_m b_k) = \sum \mathrm{prob}(s_i s_j)$, $s_i \in b_m$, $s_j \in b_k$, $0 \le m, k \le p$, describing the sum of the state transition probabilities from states in the bundle $b_m$ to states in the bundle $b_k$.

Binary state codes are assigned to the columns of the table from left to right in incremental order. Initially, the codes of the state-bundles correspond to the bundle index, i.e., $b_0$ has the code "000", $b_1$ has "001" and so on. Binary encoding ensures that the number of clocked local state bits for each sub-FSM is minimal, since the unused high state bits always hold the value zero [13].

Fig. 3. Example: (a) monolithic FSM with state partition indicated and (b) coupled-states introduced.

Fig. 4. Example of state-bundle table.

The construction of the state-bundle table starts from the coupled-states. From the previous discussion, it is known that each coupled-state is in a bundle. The number of bundles necessary for coupled-states is $\sum_{m=1}^{n} |Q_m|$, i.e., the total number of entry states of all sub-FSMs. Note that $\sum_{m=1}^{n} |Q_m|$ equals the total number of g-states. In Fig. 5, coupled-states are shaded grey in the state-bundle table. It can be seen that, for example, s4 in F2 is in the same column as g4 in F1 because they are coupled-states. After filling the coupled-states into the table, the free states in each sub-FSM are placed into the table in order, starting from the leftmost empty cell. The pseudo-code for the bundling algorithm is shown in Fig. 6.
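Since the pseudo-code of Fig. 6 is not reproduced here, the following Python sketch gives one possible reading of the basic bundling procedure: coupled-states first receive one column each, then the free states of every row fill the leftmost empty cells, and column indices serve as binary codes. The function name `build_bundle_table`, the data layout and the choice to order coupled-state columns by state index are assumptions; the sub-FSM contents are those of the example in Section 3.1.

```python
def build_bundle_table(sub_fsms):
    """Build a state-bundle table as {sub-FSM name: {column: state}}.

    sub_fsms maps a sub-FSM name to (S_m, G_m), two lists of state names
    such as "s4" or "g4".
    """
    def index(state):
        return int(state[1:])

    # One column per coupled-state. Ordering columns by state index is an
    # assumption; the text only requires s_i and g_i to share a column.
    coupled = sorted({index(g) for _, g_m in sub_fsms.values() for g in g_m})
    column_of = {idx: col for col, idx in enumerate(coupled)}

    table = {}
    for name, (s_m, g_m) in sub_fsms.items():
        row, free = {}, []
        for state in list(s_m) + list(g_m):
            if index(state) in column_of:
                row[column_of[index(state)]] = state   # entry state or g-state
            else:
                free.append(state)
        col = 0
        for state in free:             # fill free states from the leftmost gap
            while col in row:
                col += 1
            row[col] = state
        table[name] = row
    return table


sub_fsms = {
    "F1": (["s1", "s2", "s3"], ["g4"]),
    "F2": (["s4", "s5", "s6"], ["g7"]),
    "F3": (["s7"], ["g1"]),
    "F4": (["s8", "s9", "s10", "s11", "s12", "s13"], ["g1", "g5"]),
}
table = build_bundle_table(sub_fsms)
# Column index = binary code; the width follows from the highest used column.
width = max(max(row) for row in table.values()).bit_length()
codes = {col: format(col, f"0{width}b") for col in range(2 ** width)}
```

With this column ordering the table needs eight columns, so three local state bits are used, which agrees with the three-bit codes "000", "001", ... mentioned in the text. The exact column layout of Fig. 5 may differ.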

3.2 Power-optimized state encoding algorithm

The power-optimized state assignment procedure is based on the basic state encoding algorithm described above. Binary codes are still used for the columns of the state-bundle table from left to right, but states are moved to suitable columns in order to reduce the switching activity in the state bit-lines. In the first step, using the "merging coupled-state" algorithm, two or more coupled-states can be merged into the same state-bundle. It is thus possible to reduce both the number of state bits in the LSM and the number of clocked bits for a single sub-FSM. In this step, state probabilities are also taken into account to reduce switching activity. In the second step, to further reduce the switching activity, the g-state-bundles (those including a coupled-state) are moved to suitable columns depending on their mutual state-bundle transition probabilities. The free states are then placed in the table, taking switching activity into account.

3.2.1 Merging coupled-state algorithm

For a partitioned FSM, to determine whether it is necessary to merge coupled-states, we introduce the measurement criterion

$$c = \frac{\sum_{m=1}^{n} |S_m \setminus Q_m|}{\sum_{m=1}^{n} |Q_m|},$$

where the denominator represents the total number of destination states of the crossing transitions (equal to the total number of g-states) and the numerator represents the total number of free states. Thus, the smaller the number of g-states, the larger the value of c. For a partitioned FSM with a small number of g-states, the number of coupled-states is limited and merging coupled-states may be unnecessary. In Fig. 5, the number of g-states is small and c equals 9/4. Most FSMs partitioned according to the state transition probabilities have small numbers of g-states and therefore a large value of c. For this reason the basic state-bundling procedure without merging coupled-states works well in most cases.

However, when the number of g-states is large, placing each coupled-state into a different column results in a large number of g-state-bundles and thus in inefficient usage of the LSM. The objective of merging coupled-states is to reduce the number of g-state-bundles so that more states are able to share the same local state code. It is then possible to simplify the detection logic for g-states and to reduce the number of local state bits required for each sub-FSM.

Fig. 5. State-bundle table with large c.

Fig. 6. Pseudo-code for basic procedures of building state-bundle table.
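As a quick check of the criterion c, the sketch below computes the ratio of free states to entry states for the four-way example of Section 3.1. The function name `merge_criterion` is hypothetical, and the entry states are inferred from the g-state indices rather than from an explicit transition list, which is an assumption of this sketch.

```python
from fractions import Fraction


def merge_criterion(sub_fsms):
    """Return c = (number of free states) / (number of entry states).

    sub_fsms maps a sub-FSM name to (S_m, G_m) given as sets of state names.
    A state s_i counts as an entry state if some sub-FSM contains g_i.
    """
    entry = {"s" + g[1:] for _, g_m in sub_fsms.values() for g in g_m}
    free = sum(len(s_m - entry) for s_m, _ in sub_fsms.values())
    entries = sum(len(s_m & entry) for s_m, _ in sub_fsms.values())
    return Fraction(free, entries)


sub_fsms = {
    "F1": ({"s1", "s2", "s3"}, {"g4"}),
    "F2": ({"s4", "s5", "s6"}, {"g7"}),
    "F3": ({"s7"}, {"g1"}),
    "F4": ({"s8", "s9", "s10", "s11", "s12", "s13"}, {"g1", "g5"}),
}
c = merge_criterion(sub_fsms)
```

Here the entry states are s1, s4, s5 and s7, leaving nine free states, so the sketch reproduces the value c = 9/4 quoted for Fig. 5.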

To illustrate the merging coupled-state algorithm, the example in Fig. 7 is used, where the value of c is 2/5 (Fig. 8). The initial state-bundle table with coupled-states before merging is shown in Fig. 9a, where the five g-states reside in five state-bundles. The merging procedure is performed using the following steps.

(1) Rows in the table are sorted (the sort( ) function in Fig. 10) according to the static probability $D_m$ of their corresponding sub-FSMs (see Fig. 7). After this rearrangement, the static probabilities of the sub-FSMs appear in descending order. Since sub-FSMs with high static probability generally contribute more to the final power dissipation, they are given priority in the subsequent optimization procedures. The sorted state-bundle table is shown in Fig. 9b. Next, the g-state-bundle with the highest state-bundle probability is moved to the first column and is assigned the state code "zero" in binary coding. In the proposed implementation architecture, the next-state information of the partitioned FSM is obtained by merging the next-state variables of the different sub-FSMs (Fig. 2) using OR gates. When the currently active sub-FSM, assuming that its current state is encoded as "zero", is deactivated in the next clock cycle, there are no state changes in this sub-FSM, as the next-state variable of a deactivated sub-FSM is always encoded as "zero". Encoding the g-state-bundle with the highest probability as "zero" can therefore reduce the switching activity in the next-state bit-lines for the crossing transitions.

(2) The coupled-states are merged so that two or more reside in the same column. To ensure that the state-bundle probability of the first column is a maximum, the algorithm begins from the coupled-state in the first column ($b_0$ by default) and attempts to merge other coupled-states into it. If two or more coupled-states are candidates for merging, the one in the bundle with the highest state-bundle probability is chosen. When the merging process for the first column $b_0$ is completed, $b_0$ is locked. The same procedure continues for the following coupled-states until the one in the last column has been processed. In the example given in Fig. 9b, both $b_2$ and $b_3$ can be merged into $b_0$. According to the static probability information of the sub-FSMs shown in Fig. 7, the state-bundle probability of $b_3$ is 0.3 ($\mathrm{prob}(b_3) = \mathrm{prob}(s_4) = D_4$), which is larger than that of $b_2$ ($\mathrm{prob}(b_2) = \mathrm{prob}(s_3) \le \mathrm{prob}(s_3) + \mathrm{prob}(s_2) = D_3 = 0.2$). Therefore, $b_3$ is chosen to be merged into $b_0$. The updated state-bundle table after merging coupled-states is shown in Fig. 9c, where the total number of g-state-bundles is reduced from 5 to 4.
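The two merging steps can be sketched as a greedy pass over the g-state-bundles. The sketch below is a simplified reading, not the pseudo-code of Fig. 10: it assumes that two bundles may be merged only if no sub-FSM row has an entry in both, and it visits bundles in descending order of state-bundle probability. The function name, the partially reconstructed table and the probabilities of b1, b2 and b4 are hypothetical.

```python
def merge_coupled_bundles(table, bundle_prob):
    """Greedily merge g-state-bundles; returns {bundle: bundle it ends up in}.

    table: dict mapping a row (sub-FSM) to {bundle: state}.
    bundle_prob: dict mapping a bundle to its state-bundle probability.
    Assumption: bundles can share a column only if no row uses both.
    """
    order = sorted(bundle_prob, key=lambda b: -bundle_prob[b])
    rows_of = {b: {r for r, row in table.items() if b in row} for b in bundle_prob}
    merged_into = {}
    for target in order:
        if target in merged_into:
            continue
        merged_into[target] = target          # target keeps its own column
        for cand in order:                    # most probable candidates first
            if cand in merged_into:
                continue
            if rows_of[target].isdisjoint(rows_of[cand]):
                merged_into[cand] = target
                rows_of[target] |= rows_of[cand]
    return merged_into


# Partial, assumed reconstruction of the coupled-states in the Fig. 9 example.
table = {
    "F1": {"b0": "s0", "b1": "g1"},
    "F2": {"b1": "s1"},
    "F3": {"b2": "s3", "b3": "g4"},
    "F4": {"b1": "g1", "b2": "g3", "b3": "s4", "b4": "g5"},
    "F5": {"b0": "g0", "b4": "s5"},
}
# prob(b0) and prob(b3) follow D1 and D4; the other values are assumptions.
bundle_prob = {"b0": 0.3, "b1": 0.1, "b2": 0.12, "b3": 0.3, "b4": 0.06}
merged = merge_coupled_bundles(table, bundle_prob)
```

Under these assumptions b3 is merged into b0 and no further merge is possible, so five g-state-bundles are reduced to four, as in the example above.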

3.2.2 g-state-bundle encoding optimization

Since every g-state-bundle is given a unique local state code, the problem of reducing the switching activity of state transitions is transformed into that of reducing the switching activity of the transitions between the bundles. In this step, the indices of the state-bundles are fixed but their positions are moved between the columns. The algorithm, in addition to reducing the switching activity between state-bundles, attempts to allow the sub-FSMs with higher static probabilities to retain the minimum-length encoding. The state encoding procedure begins from the top row of the state-bundle table, corresponding to the sub-FSM with the highest static probability, and continues to the last row of the table. Bundle b0 is in the first column with the highest state-bundle probability, and its position is locked initially while the other g-state-bundles are unlocked. If a row in the table includes a state belonging to a g-state-bundle, the g-state-bundle is said to be valid in this row. For the valid g-state-bundles of each row, their positions are optimized and then locked without further changes. A greedy algorithm, shown in Fig. 11, is used to minimize the Hamming distance between g-state-bundles with high state-bundle transition probability.

The procedure can be illustrated through the example given above. In Fig. 9c, the state-bundle table after merging the coupled-states includes state-bundles b0, b1, b2 and b3. The first row only includes states s0 and g1, belonging to b0 and b1, respectively, so b0 and b1 are the valid state-bundles in this row (corresponding to F1). Since b0 is locked initially, the g-state-bundle optimization procedure begins from b1. To ensure that the valid g-state-bundles in F1 use the minimum-length codes, b1 can only be assigned the code "01", and thus only one state bit is necessary to distinguish the valid g-state-bundles in F1. When the optimization in the first row is completed, the positions of g-state-bundles b0 and b1 are locked. For the remaining two bundles, b2 includes the coupled-state (s3, g3) and b3 includes (s5, g5). In F4 there are 4 states and the minimal number of local state bits is 2. Since the codes "00" and "01" have been assigned, the remaining available codes are "10" and "11". The state-bundle transition probabilities between the unassigned state-bundles (b2, b3) and the assigned bundles (b0, b1) are calculated. If it is assumed that the transition probability between b3 and b0 is the highest, then b3 is placed in the column with the code "10", which has a Hamming distance of 1 to b0 ("00"), and is then locked. Bundle b2 is subsequently assigned the code "11" and locked. Since all g-state-bundles are locked, the state encoding optimization for g-state-bundles is complete. It can be seen from Fig. 9d that the positions of b3 and b2 are swapped after optimization.

Fig. 7. Example of a partitioned FSM with small c. Sub-FSM static probabilities: D1 = 0.3, D2 = 0.1, D3 = 0.2, D4 = 0.3, D5 = 0.1.

Fig. 8. State-bundle table before optimization.
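The greedy placement in the worked example can be sketched as follows. This is a simplified illustration rather than the pseudo-code of Fig. 11: it ignores the row-by-row ordering and the minimum-length constraint, and simply gives the unassigned bundle with the strongest link to an already coded bundle the free code closest in Hamming distance. The function name and all transition probabilities are assumptions; only the ordering "b3 to b0 is the highest" comes from the text.

```python
def hamming(a, b):
    """Number of bit positions in which the integer codes a and b differ."""
    return bin(a ^ b).count("1")


def assign_bundle_codes(locked, unassigned, free_codes, prob):
    """Greedy code assignment for g-state-bundles.

    locked: dict mapping already placed bundles to integer codes.
    unassigned: bundles still to be placed.
    free_codes: codes still available (at least one per unassigned bundle).
    prob: dict mapping frozenset({a, b}) to a bundle transition probability.
    """
    locked = dict(locked)
    unassigned = set(unassigned)
    free_codes = set(free_codes)
    while unassigned:
        # Strongest link between an unassigned bundle and a coded one.
        bundle, partner = max(
            ((u, l) for u in unassigned for l in locked),
            key=lambda pair: prob.get(frozenset(pair), 0.0),
        )
        # Closest free code to the partner; ties go to the lower code.
        code = min(free_codes, key=lambda c: (hamming(c, locked[partner]), c))
        locked[bundle] = code
        unassigned.remove(bundle)
        free_codes.remove(code)
    return locked


# Hypothetical probabilities in which the b3-b0 link is the strongest.
prob = {
    frozenset({"b3", "b0"}): 0.20,
    frozenset({"b2", "b1"}): 0.10,
    frozenset({"b2", "b0"}): 0.05,
    frozenset({"b2", "b3"}): 0.03,
    frozenset({"b3", "b1"}): 0.02,
}
result = assign_bundle_codes({"b0": 0b00, "b1": 0b01}, ["b2", "b3"], [0b10, 0b11], prob)
```

With these assumed values b3 receives "10" and b2 receives "11", which is the swap of b2 and b3 described for Fig. 9d.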

3.2.3 Free state encoding optimization

Free states are states of sub-FSMs that are not included in the coupled-states. These states are not related to crossing transitions and therefore their state assignment optimization is not influenced by the ordering of the sub-FSMs. For every unassigned free state of a single sub-FSM $F_m$, its state transition probabilities with all other assigned states (inside or outside $F_m$) are calculated. The free state associated with the highest state transition probability is chosen and placed into the column that minimizes the Hamming distance of this transition. Meanwhile, the condition of minimum-length encoding for $F_m$ should still be satisfied.

Fig. 9. Optimization procedures for state-bundle table: (a) initial table with coupled-states; (b) sorted table; (c) merging coupled-states; (d) optimization of g-state-bundles; (e) final state-bundle table.

In the example above, for sub-FSM F3, s2 is a free state and the minimum number of state bits for F3 is 2 (obtained from the minimumLengthCode( ) function in Fig. 12). Since s2 only has a state transition to s3 (see Fig. 7), it should be placed in the column that has the smallest Hamming distance to the column with code "11", which is where s3 resides. Both columns, with codes "01" and "10", have a Hamming distance of 1 to "11", so s2 can be placed in either. In Fig. 9e, it is placed in the left column, "01". For sub-FSM F5, s6 is a free state. It has state transitions to s5 and s0, where the former occurs inside sub-FSM F5 and the latter is between sub-FSMs F5 and F1. Assuming that the state transition probability between s6 and s5 is higher than that between s6 and s0, s6 is placed in the column with code "11", which has a Hamming distance of 1 from the column with code "10", where s5 is located. The column "01" has a Hamming distance of 2 from "10" and is therefore not chosen.
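The placement of a single free state can be sketched in a few lines. The function below is a hypothetical helper, not the paper's Fig. 12 pseudo-code: it picks the already assigned state with the highest transition probability to the free state and returns the unused column, within the minimum code width of the sub-FSM, that is closest to it in Hamming distance. The occupied columns and the numeric probabilities in the example are assumptions consistent with the description above.

```python
def hamming(a, b):
    """Number of bit positions in which the integer codes a and b differ."""
    return bin(a ^ b).count("1")


def place_free_state(state, assigned, used_in_row, prob, width):
    """Choose a column (integer code) for one free state.

    assigned: dict mapping already placed states (any sub-FSM) to codes.
    used_in_row: codes already occupied in this state's own sub-FSM row.
    prob: dict mapping (state, other) to a transition probability.
    width: minimum code length in bits for this sub-FSM.
    """
    partner = max(assigned, key=lambda other: prob.get((state, other), 0.0))
    candidates = [c for c in range(2 ** width) if c not in used_in_row]
    # Closest free column to the partner; ties go to the leftmost column.
    return min(candidates, key=lambda c: (hamming(c, assigned[partner]), c))


# s2 in F3: its only transition goes to s3, which sits in column "11".
col_s2 = place_free_state("s2", {"s3": 0b11}, {0b00, 0b11}, {("s2", "s3"): 1.0}, 2)

# s6 in F5: assumed probabilities with prob(s6, s5) > prob(s6, s0).
col_s6 = place_free_state(
    "s6", {"s0": 0b00, "s5": 0b10}, {0b00, 0b10},
    {("s6", "s5"): 0.3, ("s6", "s0"): 0.1}, 2,
)
```

Breaking ties towards the leftmost column reproduces the choice of "01" for s2 in Fig. 9e, while s6 is placed in "11" next to s5.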

The final state-bundle table after the optimization procedures is shown in Fig. 9e. In comparison to the state-bundle table without optimization shown in Fig. 8, it can be seen that the total number of local state bits, as well as the number of state bits needed for F4 and F5, is reduced from 3 to 2. The benefit of reducing local state bits depends largely on whether the sub-FSMs with a high probability of being active are able to reduce their number of clocked local state bits.

4 Experimental results

In this section the results are presented showing how the state assignment optimization procedures, given in Section 3.2, inﬂuence the power consumption of partitioned FSMs The optimization algorithms are incorporated into the automatic synthesis tool based on our previous work[10] Seven MCNC standard benchmarks[12]were used in the experiments The number of states in these benchmarks Fig 10 Pseudo-code for coupled-state merging.

Trang 9

ranges from 19 to 121 As in [14], we use Monte Carlo

simulation to obtain the approximate state transition

probabilities The input vectors are randomly generated

and the average input probability is set to 0.5 by default

The inputs are assumed to be independent of each other. Also, it is assumed that the state probability of each state becomes a constant as time increases to infinity. After a warm-up period of clock cycles, the state probability of every state is sampled. A simplified convergence criterion is used: the simulation stops only when the maximum difference between the probabilities of each state sampled in two consecutive time units (or clock cycles) is less than $\varepsilon$, a user-specified constant. The default value of $\varepsilon$ is $10^{-6}$. For all the standard benchmarks tested, the simulations converged in a reasonable time. This method avoids having to handle the state transition graph (STG) explicitly and therefore supports large benchmarks.

The power and area figures presented in the graphs are obtained from gate-level estimations in Power Compiler, and logic synthesis is performed using Design Compiler, both from Synopsys [15]. A 0.18 μm CMOS standard cell library [16] is used. The power supply voltage $V_{dd}$ is assumed to be 1.8 V and the clock frequency is 20 MHz.
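The Monte Carlo estimation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tool: `next_state` is a hypothetical callable standing in for the FSM's next-state function, the warm-up length is an arbitrary choice, and `min_cycles` is an extra guard added here so that the per-cycle criterion cannot trigger after only a handful of samples.

```python
import random


def estimate_state_probabilities(next_state, states, reset_state, n_inputs,
                                 p_one=0.5, warmup=1000, eps=1e-6,
                                 min_cycles=100, max_cycles=10_000_000, seed=0):
    """Estimate steady-state probabilities of an FSM by random simulation."""
    rng = random.Random(seed)

    def step(state):
        # Input bits are independent, each equal to 1 with probability p_one.
        inputs = tuple(int(rng.random() < p_one) for _ in range(n_inputs))
        return next_state(state, inputs)

    state = reset_state
    for _ in range(warmup):  # warm-up cycles are simulated but not counted
        state = step(state)

    counts = {s: 0 for s in states}
    previous = {s: 0.0 for s in states}
    for cycle in range(1, max_cycles + 1):
        state = step(state)
        counts[state] += 1
        current = {s: counts[s] / cycle for s in states}
        # Simplified criterion: stop once no estimate changed by eps or more
        # between two consecutive cycles.
        change = max(abs(current[s] - previous[s]) for s in states)
        if cycle >= min_cycles and change < eps:
            return current
        previous = current
    return previous


# Toy two-state FSM (hypothetical): the state toggles whenever the input is 1.
def toggle(state, inputs):
    return state ^ inputs[0]


probs = estimate_state_probabilities(toggle, states=[0, 1], reset_state=0,
                                     n_inputs=1, eps=1e-4)
print(probs)
```

Because each estimate is a running average, the per-cycle change shrinks roughly as the inverse of the number of sampled cycles, so a smaller `eps` mainly translates into a longer simulation. The toy example uses a looser `eps` than the default of $10^{-6}$ only to keep the run short.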

For the original FSM, binary state encoding, the default used by Synopsys, is applied. The total power of this monolithic FSM is $P_{\text{tot,mono}} = P_{\text{clk}} + P_{\text{reg}} + P_{\text{ns}} + P_{\text{out}}$, where $P_{\text{clk}}$ is the clock net power, $P_{\text{reg}}$ is the power in the state registers, $P_{\text{ns}}$ is the power in the next-state function, and $P_{\text{out}}$ is the power in the output function. The total power of the partitioned FSM is $P_{\text{tot,part}} = P_{\text{clk}} + P_{\text{reg}} + P_{\text{ns}} + P_{\text{out}} + P_{\text{oh}}$, where $P_{\text{oh}}$ is the power overhead, which is the sum of the power dissipated in the global state memory, the circuits for idle condition detection, and the shut-down circuitry. The power dissipated in the sub-FSMs can then be written as $P_{\text{tot,sub-FSM}} = P_{\text{tot,part}} - P_{\text{oh}}$.
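The power bookkeeping above amounts to a few sums, which the following sketch spells out. The class name `PowerBreakdown` and all numeric values are hypothetical and serve only to show how the quantities relate; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class PowerBreakdown:
    """Power components of an FSM implementation, all in the same unit."""
    clk: float       # clock net power, P_clk
    reg: float       # state register power, P_reg
    ns: float        # next-state function power, P_ns
    out: float       # output function power, P_out
    oh: float = 0.0  # overhead P_oh; zero for the monolithic FSM

    @property
    def total(self) -> float:
        # P_tot = P_clk + P_reg + P_ns + P_out (+ P_oh when partitioned)
        return self.clk + self.reg + self.ns + self.out + self.oh

    @property
    def sub_fsm(self) -> float:
        # P_tot,sub-FSM = P_tot,part - P_oh
        return self.total - self.oh


# Made-up values in microwatts, for illustration only.
partitioned = PowerBreakdown(clk=10.0, reg=8.0, ns=15.0, out=7.0, oh=60.0)
print(partitioned.total, partitioned.sub_fsm)  # 100.0 40.0
```

Separating the overhead from the sub-FSM power in this way makes it possible to report the effect of state encoding on the part of the circuit that the encoding actually changes.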

FSM partitioning, on its own, is an efficient method for achieving power reductions. In Fig. 13, the power consumption before and after FSM partitioning is compared. It can be seen that significant reductions are obtained using the mixed synchronous/asynchronous architecture without optimized state encoding.

Fig. 11. Pseudo-code for optimized g-state-bundle encoding.


In the partitioned FSM, a significant part of the power is dissipated in the global state memory, the circuits for idle condition detection, and the shut-down logic (in total $P_{\text{oh}}$). This part is largely unaffected by the optimization procedures presented in this paper. To examine in detail how the proposed procedures affect the power consumption, we first consider only the power dissipated in the sub-FSMs ($P_{\text{tot,sub-FSM}} = P_{\text{tot,part}} - P_{\text{oh}}$). Fig. 14 illustrates the impact of state assignment alone on $P_{\text{tot,sub-FSM}}$ after FSM partitioning. It shows how, in comparison to the basic binary encoding procedure, the merging of coupled states (indicated by the legend "merged g-states") and the subsequent encoding optimization procedures (indicated by the legend "encoding") affect power dissipation. As can be seen, merging coupled states is rarely worthwhile by itself. For benchmarks s832, s820, scf and s1488, coupled-state merging has no obvious positive effect. This result is reasonable, as the number of local state bits may not be reduced by merging.

Even when the number does decrease, since the local state bits are clock-gated, the sub-FSMs having a high probability of being active may still use the same number of state bits as before the merging. The subsequent state encoding optimization process (including the optimization for g-state-bundles and free states), by contrast, performs better in terms of power reduction. On average, for the benchmarks tested, the whole state assignment optimization technique reduces the power dissipation of the sub-FSMs ($P_{\text{tot,sub-FSM}} = P_{\text{tot,part}} - P_{\text{oh}}$) by around 13% compared to the partitioned FSM using plain binary state encoding.

As shown on the right of Fig. 13, the sub-FSM power ($P_{\text{tot,part}} - P_{\text{oh}}$) is only a portion (on average about 40%) of the decomposed FSM power ($P_{\text{tot,part}}$). The total power reduction from FSM partitioning and the optimization procedures is 59%, compared to 56% in [10]. Most low-power state encoding methods focus on the monolithic FSM, and for the method that does concern partitioned FSM with

Fig. 12. Pseudo-code for optimized free state encoding.

References

[1] L. Benini, G.D. Micheli, Dynamic Power Management: Design Techniques and CAD Tools, Kluwer Academic Publishers, Dordrecht, 1998.
[2] L. Benini, F. Vermeulen, G.D. Micheli, Finite-state machine partitioning for low-power, in: Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 2, 1998, pp. 5–8.
[3] B. Oelmann, K. Tammemäe, M. Kruus, M. O’Nils, Automatic FSM synthesis for low-power mixed synchronous/asynchronous implementation, J. VLSI Design 12 (2001) 167–186 (Special issue on low-power design).
[4] L. Benini, G.D. Micheli, State assignment for low power dissipation, IEEE J. Solid-State Circuits 30 (1995) 258–268.
[5] C.Y. Tsui, M. Pedram, A.M. Despain, Low power state assignment targeting two and multilevel logic implementations, IEEE Trans. Comput. Aided Design 17 (12) (1998) 1281–1291.
[6] S.-H. Chow, Y.-C. Ho, T. Hwang, Low-power realization of finite-state machines—a decomposition approach, ACM Trans. Design Automat. Electron. Syst. 1 (1996) 315–340.
[7] B. Lin, A. Richard Newton, Synthesis of multiple level logic from symbolic high-level description languages, in: VLSI 89, Munich, 1989, pp. 187–196.
[8] B. Oelmann, M. O’Nils, A low power hand-over mechanism for gated-clock FSMs, in: Proceedings of the European Conference on Circuit Theory and Design, 1999, pp. 118–121.
[9] B. Oelmann, M. O’Nils, Asynchronous control of low-power gated-clock finite-state machines, in: Proceedings of IEEE International Conference on Electronics, Circuits, and Systems, 1999, pp. 915–918.
[10] C. Cao, M. O’Nils, B. Oelmann, A tool for low-power synthesis of FSMs with mixed synchronous/asynchronous state memory, in: IEEE Proceedings of the Norchip Conference, 2004, pp. 199–202.
[11] K. Roy, S. Prasad, SYCLOP: synthesis of CMOS logic for low power applications, in: Proceedings of the International Conference on Computer Design (ICCD), 1992, pp. 464–467.
[12] S. Yang, Logic synthesis of optimization benchmarks—user guide version 3.0, MCNC Technical Report.
[13] C. Cao, B. Oelmann, Mixed synchronous/asynchronous state memory for low power FSM design, in: Proceedings of the EUROMICRO Symposium on Digital System Design, 2004, pp. 363–370.
[14] J.C. Monteiro, A.L. Oliveira, Implicit FSM decomposition applied to low power design, IEEE Trans. Very Large Scale Integration Syst. 10 (5) (2002) 560–565.
[15] Synopsys Inc., http://www.synopsys.com, company homepage.
[16] United Microelectronics Corp., http://www.umc.com.tw, company homepage.