Memory, Microprocessor, and ASIC, Part 2




The delays of the clock signals Ci and Cf to the flip-flops Ri and Rf are denoted by $t_{cd}^i$ and $t_{cd}^f$, respectively. The input and output data signals to Ri and Rf are denoted by Di, Qi, Df, and Qf, respectively.

An analysis of the timing properties of the local data path shown in Fig. 1.14 is offered in the following sections. First, the timing relationships that prevent the late arrival of data signals at Rf are examined in the next subsection. The timing relationships that prevent the early arrival of signals at the register Rf are then described. These analyses borrow some notation from Refs. 11 and 12. Similar analyses of synchronous circuits from the timing perspective can be found in Refs. 45 through 49.

Preventing the Late Arrival of the Data Signal in a Local Data Path with Flip-Flops

The operation of the local data path Ri → Rf shown in Fig. 1.14 requires that any data signal being stored in Rf arrive at the data input Df of Rf no later than the setup time $\delta_S^{Ff}$ before the latching edge of the clock signal Cf. It is possible for the opposite event to occur, that is, for the data signal Df not to arrive at the register Rf early enough to be stored successfully within Rf. If this situation occurs, the local data path shown in Fig. 1.14 fails to perform as expected, and a timing failure or violation is said to have been created. This form of timing violation is typically called a setup (or long path) violation. A setup violation is depicted in Fig. 1.15, which is used in the following discussion.

The identical clock periods of the clock signals Ci and Cf are shaded for identification in Fig. 1.15. Also shaded in Fig. 1.15 are those portions of the data signals Di, Qi, and Df that are relevant to the operation of the local data path shown in Fig. 1.14. Specifically, the shaded portion of Di corresponds to the data to be stored in Ri at the beginning of the k-th clock period. This data signal propagates to the output of the register Ri and is illustrated by the shaded portion of Qi shown in Fig. 1.15. The combinational logic operates on Qi during the k-th clock period. The result of this operation is the shaded portion of the signal Df, which must be stored in Rf during the next, (k+1)-th, clock period.

Observe that, as illustrated in Fig. 1.15, the leading edge of Ci that initiates the k-th clock period occurs at time $t_{cd}^i + kT_{CP}$. Similarly, the leading edge of Cf that initiates the (k+1)-th clock period occurs at time $t_{cd}^f + (k+1)T_{CP}$. Therefore, the latest arrival time $t_{AM}^{Ff}$ of Df at Rf must satisfy

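$$t_{AM}^{Ff} \le \left[t_{cd}^f + (k+1)T_{CP} - \Delta_L^F\right] - \delta_S^{Ff} \tag{1.15}$$

The term $[t_{cd}^f + (k+1)T_{CP} - \Delta_L^F]$ on the right-hand side of Eq. 1.15 corresponds to the critical situation of the leading edge of Cf arriving earlier by the maximum possible deviation $\Delta_L^F$. The $-\delta_S^{Ff}$ term on the right-hand side of Eq. 1.15 accounts for the setup time of Rf (recall the definition of $\delta_S^{Ff}$). Note that the value of $t_{AM}^{Ff}$ in Eq. 1.15 consists of two components:

1. The latest arrival time $t_{QM}^{Fi}$ at which a valid data signal Qi appears at the output of Ri: that is, the sum $t_{QM}^{Fi} = t_{cd}^i + kT_{CP} + \Delta_L^F + D_{CQM}^{Fi}$ of the latest possible arrival time of the leading edge of Ci and the maximum clock-to-Q delay $D_{CQM}^{Fi}$ of Ri.

2. The maximum propagation delay $D_{PM}^{i,f}$ of the data signals through the combinational logic block Lif and the interconnect along the path Ri → Rf.

Therefore, $t_{AM}^{Ff}$ can be described as

$$t_{AM}^{Ff} = t_{QM}^{Fi} + D_{PM}^{i,f} = \left(t_{cd}^i + kT_{CP} + \Delta_L^F + D_{CQM}^{Fi}\right) + D_{PM}^{i,f} \tag{1.16}$$

FIGURE 1.14 A single-phase local data path.

By substituting Eq. 1.16 into Eq. 1.15, the timing condition guaranteeing that Df does not arrive too late at Rf is

$$\left(t_{cd}^i + kT_{CP} + \Delta_L^F\right) + D_{CQM}^{Fi} + D_{PM}^{i,f} \le \left[t_{cd}^f + (k+1)T_{CP} - \Delta_L^F\right] - \delta_S^{Ff} \tag{1.17}$$

The $kT_{CP}$ terms cancel from both sides of Eq. 1.17. Furthermore, certain terms in Eq. 1.17 can be grouped together and, by noting that $t_{cd}^i - t_{cd}^f = T_{Skew}(i,f)$ is the clock skew between the registers Ri and Rf,

$$T_{Skew}(i,f) + 2\Delta_L^F \le T_{CP} - \left(D_{CQM}^{Fi} + D_{PM}^{i,f} + \delta_S^{Ff}\right) \tag{1.18}$$

Note that a violation of Eq. 1.18 is illustrated in Fig. 1.15.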

FIGURE 1.15 Timing diagram of a local data path with flip-flops with violation of the setup constraint.

The timing relationship Eq. 1.18 represents three important results describing the late arrival of the signal Df at the data input of the final register Rf in a local data path Ri → Rf:

1. Given any values of $T_{Skew}(i,f)$, $\Delta_L^F$, $D_{PM}^{i,f}$, $\delta_S^{Ff}$, and $D_{CQM}^{Fi}$, the late arrival of the data signal at Rf can be prevented by controlling the value of the clock period $T_{CP}$. A sufficiently large value of $T_{CP}$ can always be chosen to relax Eq. 1.18 by increasing the upper bound described by the right-hand side of Eq. 1.18.

2. For correct operation, the clock period $T_{CP}$ does not necessarily have to be larger than the term $(D_{CQM}^{Fi} + D_{PM}^{i,f} + \delta_S^{Ff})$. If the clock skew $T_{Skew}(i,f)$ is properly controlled, choosing a particular negative value for the clock skew will relax the left side of Eq. 1.18, thereby permitting Eq. 1.18 to be satisfied despite $T_{CP} - (D_{CQM}^{Fi} + D_{PM}^{i,f} + \delta_S^{Ff}) < 0$.

3. Both the term $2\Delta_L^F$ and the term $(D_{CQM}^{Fi} + D_{PM}^{i,f} + \delta_S^{Ff})$ are harmful in the sense that these terms impose a lower bound on the clock period $T_{CP}$ (as expected). Although negative skew can be used to relax the inequality of Eq. 1.18, these two terms work against relaxing the values of $T_{CP}$ and $T_{Skew}(i,f)$.

Finally, the relationship in Eq. 1.18 can be rewritten in a form that clarifies the upper bound on the clock skew $T_{Skew}(i,f)$ imposed by Eq. 1.18:

$$T_{Skew}(i,f) \le T_{CP} - \left(D_{CQM}^{Fi} + D_{PM}^{i,f} + \delta_S^{Ff}\right) - 2\Delta_L^F \tag{1.19}$$
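As a rough numerical illustration of the setup bound, the following Python sketch computes the largest clock skew allowed for a given clock period and the smallest clock period allowed for a given skew. It assumes the bound has the form: skew ≤ clock period − (maximum clock-to-Q delay + maximum propagation delay + setup time) − 2 × (leading-edge tolerance). The class name `FlipFlopPath`, the function names, and all delay values (taken to be in nanoseconds) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlopPath:
    """Worst-case (late) parameters of one local data path Ri -> Rf, in ns.

    All field names and values are illustrative assumptions.
    """
    d_cq_max: float  # maximum clock-to-Q delay of Ri
    d_p_max: float   # maximum logic and interconnect delay from Ri to Rf
    t_setup: float   # setup time of Rf
    delta_l: float   # tolerance of the leading clock edge


def max_setup_skew(path: FlipFlopPath, t_cp: float) -> float:
    """Upper bound on the skew t_cd_i - t_cd_f for a given clock period."""
    return t_cp - (path.d_cq_max + path.d_p_max + path.t_setup) - 2 * path.delta_l


def min_clock_period(path: FlipFlopPath, t_skew: float) -> float:
    """Smallest clock period that meets the setup bound for a given skew."""
    return t_skew + 2 * path.delta_l + (path.d_cq_max + path.d_p_max + path.t_setup)


path = FlipFlopPath(d_cq_max=0.5, d_p_max=3.0, t_setup=0.25, delta_l=0.125)
print(max_setup_skew(path, t_cp=5.0))       # largest skew at a 5 ns period
print(min_clock_period(path, t_skew=-1.0))  # period reachable with -1 ns skew
```

With these assumed values, a skew of -1.0 ns permits a clock period of 3.0 ns, which is shorter than the 3.75 ns sum of the clock-to-Q delay, propagation delay, and setup time. This is the situation described in item 2 of the list above.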

Preventing the Early Arrival of the Data Signal in a Local Data Path with Flip-Flops

Late arrival of the signal Df at the data input of Rf (see Fig. 1.14) was analyzed in the previous subsection. In this section, the timing relationships of the local data path Ri → Rf that prevent the early arrival of Df are analyzed. To this end, recall from the previous discussion that any data signal Df being stored in Rf must lag the arrival of the leading edge of Cf by at least the hold time $\delta_H^{Ff}$. It is possible for the opposite event to occur, that is, for new data to overwrite the value of Df and be stored within the register Rf. If this situation occurs, the local data path shown in Fig. 1.14 will not perform as desired because of a catastrophic timing violation known as a hold (or short path) violation.

In this section, hold timing violations are analyzed. It is shown that a hold violation is more dangerous than a setup violation, since a hold violation cannot be removed by simply adjusting the clock period $T_{CP}$ (unlike the case of a data signal arriving late, where $T_{CP}$ can be increased to satisfy Eq. 1.18). A hold violation is depicted in Fig. 1.16, which is used in the following discussion.

The situation depicted in Fig. 1.16 differs from the situation depicted in Fig. 1.15 in the following sense. In Fig. 1.15, a data signal stored in Ri during the k-th clock period arrives too late to be stored in Rf during the (k+1)-th clock period. In Fig. 1.16, however, the data stored in Ri during the k-th clock period arrives at Rf too early and destroys the data that had to be stored in Rf during the same k-th clock period. To clarify this concept, certain portions of the data signals are shaded for easy identification in Fig. 1.16. The data Di being stored in Ri at the beginning of the k-th clock period is shaded. This data signal propagates to the output of the register Ri and is illustrated by the shaded portion of Qi shown in Fig. 1.16. The output of the logic (left unshaded in Fig. 1.16) is being stored within the register Rf at the beginning of the (k+1)-th clock period. Finally, the shaded portion of Df corresponds to the data that must be stored in Rf at the beginning of the k-th clock period.

Note that, as illustrated in Fig. 1.16, the leading (or latching) edge of Ci that initiates the k-th clock period occurs at time $t_{cd}^i + kT_{CP}$. Similarly, the leading (or latching) edge of Cf that initiates the k-th clock period occurs at time $t_{cd}^f + kT_{CP}$. Therefore, the earliest arrival time $t_{Am}^{Ff}$ of the data signal Df at the register Rf must satisfy the following condition:

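$$t_{Am}^{Ff} \ge \left(t_{cd}^f + kT_{CP} + \Delta_L^F\right) + \delta_H^{Ff} \tag{1.20}$$

The term $(t_{cd}^f + kT_{CP} + \Delta_L^F)$ on the right-hand side of Eq. 1.20 corresponds to the critical situation of the leading edge of the k-th clock period of Cf arriving late by the maximum possible deviation $\Delta_L^F$. Note that the value of $t_{Am}^{Ff}$ in Eq. 1.20 has two components:

1. The earliest arrival time $t_{Qm}^{Fi}$ at which a valid data signal Qi appears at the output of Ri: that is, the sum $t_{Qm}^{Fi} = t_{cd}^i + kT_{CP} - \Delta_L^F + D_{CQm}^{Fi}$ of the earliest arrival time of the leading edge of Ci and the minimum clock-to-Q delay $D_{CQm}^{Fi}$ of Ri.

2. The minimum propagation delay $D_{Pm}^{i,f}$ of the signals through the combinational logic block Lif and the interconnect wires along the path Ri → Rf.

Therefore, $t_{Am}^{Ff}$ can be described as

$$t_{Am}^{Ff} = t_{Qm}^{Fi} + D_{Pm}^{i,f} = \left(t_{cd}^i + kT_{CP} - \Delta_L^F + D_{CQm}^{Fi}\right) + D_{Pm}^{i,f} \tag{1.21}$$

By substituting Eq. 1.21 into Eq. 1.20, the timing condition guaranteeing that Df does not arrive too early at Rf is

$$\left(t_{cd}^i + kT_{CP} - \Delta_L^F\right) + D_{CQm}^{Fi} + D_{Pm}^{i,f} \ge \left(t_{cd}^f + kT_{CP} + \Delta_L^F\right) + \delta_H^{Ff} \tag{1.22}$$

The $kT_{CP}$ terms cancel from both sides of Eq. 1.22. Reorganizing the remaining terms and noting that $t_{cd}^i - t_{cd}^f = T_{Skew}(i,f)$ gives

$$T_{Skew}(i,f) - 2\Delta_L^F \ge -\left(D_{CQm}^{Fi} + D_{Pm}^{i,f}\right) + \delta_H^{Ff} \tag{1.23}$$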

FIGURE 1.16 Timing diagram of a local data path with flip-flops with a violation of the hold constraint.

The timing relationship described by Eq. 1.23 provides certain important facts describing the early arrival of the signal Df at the data input of the final register Rf of a local data path:

1. Unlike Eq. 1.18, the inequality Eq. 1.23 does not depend on the clock period $T_{CP}$. Therefore, a violation of Eq. 1.23 cannot be corrected by simply manipulating the value of $T_{CP}$. A synchronous digital system with hold violations is nonfunctional, while a system with setup violations will still operate correctly at a reduced speed. For this reason, hold violations result in catastrophic timing failure and are considered significantly more dangerous than the setup violations previously described.

2. The relationship in Eq. 1.23 can be satisfied with a sufficiently large value of the clock skew $T_{Skew}(i,f)$. However, both the term $2\Delta_L^F$ and the term $\delta_H^{Ff}$ are harmful in the sense that these terms impose a lower bound on the clock skew $T_{Skew}(i,f)$ between the registers Ri and Rf. Although positive skew may be used to relax Eq. 1.23, these two terms work against relaxing the values of $T_{Skew}(i,f)$ and $(D_{CQm}^{Fi} + D_{Pm}^{i,f})$.

Finally, the relationship in Eq. 1.23 can be rewritten to stress the lower bound imposed on the clock skew $T_{Skew}(i,f)$ by Eq. 1.23:

$$T_{Skew}(i,f) \ge -\left(D_{Pm}^{i,f} + D_{CQm}^{Fi}\right) + \delta_H^{Ff} + 2\Delta_L^F \tag{1.24}$$
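The hold bound can be illustrated with a short Python sketch. It assumes the lower bound has the form: skew ≥ −(minimum propagation delay + minimum clock-to-Q delay) + hold time + 2 × (leading-edge tolerance). The function names and the delay values (taken to be in nanoseconds) are hypothetical. Note that neither function takes the clock period as an argument, which mirrors the observation that a hold violation cannot be repaired by slowing the clock.

```python
def min_hold_skew(d_cq_min: float, d_p_min: float,
                  t_hold: float, delta_l: float) -> float:
    """Lower bound on the skew t_cd_i - t_cd_f imposed by the hold condition."""
    return -(d_p_min + d_cq_min) + t_hold + 2 * delta_l


def hold_satisfied(t_skew: float, d_cq_min: float, d_p_min: float,
                   t_hold: float, delta_l: float) -> bool:
    """True when the skew is at or above the hold lower bound."""
    return t_skew >= min_hold_skew(d_cq_min, d_p_min, t_hold, delta_l)


# Illustrative fast-path values, all assumed.
fast_path = dict(d_cq_min=0.25, d_p_min=0.5, t_hold=0.125, delta_l=0.125)

print(min_hold_skew(**fast_path))           # lower bound on the skew
print(hold_satisfied(0.0, **fast_path))     # zero skew
print(hold_satisfied(-0.5, **fast_path))    # skew that is too negative
```

With these assumed values the bound is -0.375 ns, so zero skew is acceptable while a skew of -0.5 ns produces a hold violation regardless of the clock period.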

1.4.7 Analysis of a Single-Phase Local Data Path with Latches

A local data path consisting of two level-sensitive registers (or latches) and the combinational logic between these registers (or latches) is shown in Fig. 1.17. Note the initial latch Ri, which is the origin of the data signal, and the final latch Rf, which is the destination of the data signal. The combinational logic block Lif between Ri and Rf accepts the input data signals sourced by Ri and other registers and logic gates and transmits the data signals that have been operated on to Rf. The period of the clock signal is denoted by $T_{CP}$, and the delays of the clock signals Ci and Cf to the latches Ri and Rf are denoted by $t_{cd}^i$ and $t_{cd}^f$, respectively. The input and output data signals to Ri and Rf are denoted by Di, Qi, Df, and Qf, respectively.

An analysis of the timing properties of the local data path shown in Fig. 1.17 is offered in the following sections. The timing relationships to prevent the late arrival of the data signal at the latch Rf are examined, as well as the timing relationships to prevent the early arrival of the data signal at the latch Rf.

The analyses presented in this section build on assumptions regarding the timing relationships among the signals of a latch similar to those assumptions used in the previous section. Specifically, it is guaranteed that every data signal arrives at the data input of a latch no later than the setup time $\delta_S^L$ before the trailing clock edge. Also, this data signal must remain stable for at least the hold time $\delta_H^L$ after the trailing edge; that is, no new data signal should arrive at a latch within $\delta_H^L$ after the latch has become opaque.

Observe the differences between a latch and a flip-flop (Refs. 45 and 50). In flip-flops, the setup and hold requirements described in the previous paragraph are relative to the leading, not the trailing, edge of the clock signal. Similar to flip-flops, the late and early arrival of the data signal at a latch give rise to timing violations known as setup and hold violations, respectively.

FIGURE 1.17 A single-phase local data path with latches.

Preventing the Late Arrival of the Data Signal in a Local Data Path with Latches

A signal setup similar to the example illustrated in Fig. 1.15 is assumed in the following discussion. A data signal Di is stored in the latch Ri during the k-th clock period. The data Qi stored in Ri propagates through the combinational logic Lif and the interconnect along the path Ri → Rf. In the (k+1)-th clock period, the result Df of the computation in Lif is stored within the latch Rf. The signal Df must arrive at least the setup time $\delta_S^{Lf}$ before the trailing edge of Cf in the (k+1)-th clock period.

Similar to the discussion presented in the previous section, the latest arrival time $t_{AM}^{Lf}$ of Df at the D input of Rf must satisfy

$$t_{AM}^{Lf} \le \left[t_{cd}^f + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L\right] - \delta_S^{Lf} \tag{1.25}$$

Note the difference between Eqs. 1.25 and 1.15. In Eq. 1.15, the first term on the right-hand side is $[t_{cd}^f + (k+1)T_{CP} - \Delta_L^F]$, while in Eq. 1.25, the first term on the right-hand side has an additional term $C_{Wm}^L$. The addition of $C_{Wm}^L$ corresponds to the concept that, unlike flip-flops, a data signal is stored in a latch, shown in Fig. 1.17, at the trailing edge of the clock signal (the $C_{Wm}^L$ term). Similar to the case of flip-flops, the term $[t_{cd}^f + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L]$ on the right-hand side of Eq. 1.25 corresponds to the critical situation of the trailing edge of the clock signal Cf arriving earlier by the maximum possible deviation $\Delta_T^L$.

Observe that the value of $t_{AM}^{Lf}$ in Eq. 1.25 consists of two components:

1. The latest arrival time $t_{QM}^{Li}$ at which a valid data signal Qi appears at the output of the latch Ri.

2. The maximum signal propagation delay $D_{PM}^{i,f}$ through the combinational logic block Lif and the interconnect along the path Ri → Rf.

Therefore, $t_{AM}^{Lf}$ can be described as

$$t_{AM}^{Lf} = t_{QM}^{Li} + D_{PM}^{i,f} \tag{1.26}$$

However, unlike the situation of flip-flops discussed previously, the term $t_{QM}^{Li}$ on the right-hand side of Eq. 1.26 is not simply the sum of the delays through the register Ri. The reason is that the value of $t_{QM}^{Li}$ depends on whether the signal Di arrived before or during the transparent state of Ri in the k-th clock period. Therefore, the value of $t_{QM}^{Li}$ in Eq. 1.26 is the greater of the following two quantities:

$$t_{QM}^{Li} = \max\left[\left(t_{AM}^{Li} + D_{DQM}^{Li}\right),\ \left(t_{cd}^i + kT_{CP} + \Delta_L^L + D_{CQM}^{Li}\right)\right] \tag{1.27}$$

There are two terms on the right-hand side of Eq. 1.27:

1. The term $(t_{AM}^{Li} + D_{DQM}^{Li})$ corresponds to the situation in which Di arrives at Ri after the leading edge of the k-th clock period.

2. The term $(t_{cd}^i + kT_{CP} + \Delta_L^L + D_{CQM}^{Li})$ corresponds to the situation in which Di arrives at Ri before the leading edge of the k-th clock pulse arrives.

By substituting Eq. 1.27 into Eq. 1.26, the latest time of arrival $t_{AM}^{Lf}$ is

$$t_{AM}^{Lf} = \max\left[\left(t_{AM}^{Li} + D_{DQM}^{Li}\right),\ \left(t_{cd}^i + kT_{CP} + \Delta_L^L + D_{CQM}^{Li}\right)\right] + D_{PM}^{i,f} \tag{1.28}$$

which is in turn substituted into Eq. 1.25 to obtain

$$\max\left[\left(t_{AM}^{Li} + D_{DQM}^{Li}\right),\ \left(t_{cd}^i + kT_{CP} + \Delta_L^L + D_{CQM}^{Li}\right)\right] + D_{PM}^{i,f} \le \left[t_{cd}^f + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L\right] - \delta_S^{Lf} \tag{1.29}$$

Eq. 1.29 is an expression for the inequality that must be satisfied in order to prevent the late arrival of a data signal at the data input D of the register Rf. By satisfying Eq. 1.29, setup violations in the local data path with latches shown in Fig. 1.17 are avoided. For a circuit to operate correctly, Eq. 1.29 must be enforced for every local data path Ri → Rf consisting of the latches Ri and Rf.


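The max operation in Eq. 1.29 creates a mathematically difficult situation, since it is not known in advance which of the quantities under the max operation is greater. To overcome this obstacle, the max operation can be split into two conditions:

$$t_{AM}^{Li} + D_{DQM}^{Li} + D_{PM}^{i,f} \le \left[t_{cd}^f + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L\right] - \delta_S^{Lf} \tag{1.30}$$

$$t_{cd}^i + kT_{CP} + \Delta_L^L + D_{CQM}^{Li} + D_{PM}^{i,f} \le \left[t_{cd}^f + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L\right] - \delta_S^{Lf} \tag{1.31}$$

Taking into account that the clock skew $T_{Skew}(i,f) = t_{cd}^i - t_{cd}^f$, Eqs. 1.30 and 1.31 can be rewritten as

$$D_{PM}^{i,f} + D_{DQM}^{Li} + \delta_S^{Lf} \le \left(-t_{AM}^{Li} + t_{cd}^f\right) + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L \tag{1.32}$$

$$T_{Skew}(i,f) + \Delta_L^L + \Delta_T^L \le T_{CP} + C_{Wm}^L - \left(D_{CQM}^{Li} + D_{PM}^{i,f} + \delta_S^{Lf}\right) \tag{1.33}$$

Equation 1.33 can be rewritten in a form that clarifies the upper bound on the clock skew $T_{Skew}(i,f)$ imposed by Eq. 1.33:

$$T_{Skew}(i,f) \le T_{CP} + C_{Wm}^L - \left(\Delta_L^L + \Delta_T^L\right) - \left(D_{CQM}^{Li} + D_{PM}^{i,f} + \delta_S^{Lf}\right) \tag{1.34}$$

$$D_{PM}^{i,f} + D_{DQM}^{Li} + \delta_S^{Lf} \le \left(-t_{AM}^{Li} + t_{cd}^f\right) + (k+1)T_{CP} + C_{Wm}^L - \Delta_T^L \tag{1.35}$$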
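The latch setup analysis can be sketched numerically in Python. The first function evaluates the max operation that selects the later of the two ways the output of Ri can become valid; the second evaluates the resulting upper bound on the skew, assumed here to have the form: skew ≤ clock period + minimum clock width − (leading-edge tolerance + trailing-edge tolerance) − (maximum clock-to-Q delay + maximum propagation delay + setup time). All function names, argument names, and values (taken to be in nanoseconds) are hypothetical.

```python
def latest_output_time(t_am_i: float, d_dq_max: float, t_cd_i: float,
                       k: int, t_cp: float, delta_l: float,
                       d_cq_max: float) -> float:
    """Latest time at which Qi is valid at the output of latch Ri.

    The first candidate applies when Di arrives while Ri is transparent,
    the second when Di is already waiting as the leading edge arrives.
    """
    data_limited = t_am_i + d_dq_max
    clock_limited = t_cd_i + k * t_cp + delta_l + d_cq_max
    return max(data_limited, clock_limited)


def max_latch_setup_skew(t_cp: float, c_w_min: float, delta_l: float,
                         delta_t: float, d_cq_max: float, d_p_max: float,
                         t_setup: float) -> float:
    """Upper bound on the skew t_cd_i - t_cd_f for a latch-based path."""
    return (t_cp + c_w_min - (delta_l + delta_t)
            - (d_cq_max + d_p_max + t_setup))


# Di arrives 0.75 ns into period k = 0, after the latch has opened.
t_q = latest_output_time(t_am_i=0.75, d_dq_max=0.5, t_cd_i=0.0,
                         k=0, t_cp=5.0, delta_l=0.125, d_cq_max=0.5)
bound = max_latch_setup_skew(t_cp=5.0, c_w_min=2.0, delta_l=0.125,
                             delta_t=0.125, d_cq_max=0.5, d_p_max=3.0,
                             t_setup=0.25)
print(t_q, bound)
```

In this assumed example the data-limited term (1.25 ns) dominates the clock-limited term (0.625 ns). The skew bound of 3.0 ns is larger than the 1.0 ns that a flip-flop path with the same delays and a 5 ns period would allow, because the clock width enters the latch bound.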

Preventing the Early Arrival of the Data Signal in a Local Data Path with Latches

A signal setup similar to the example illustrated in Fig. 1.16 is assumed in the discussion presented in this section. Recall the difference between the late arrival of a data signal at Rf and the early arrival of a data signal at Rf. In the former case, the data signal stored in the latch Ri during the k-th clock period arrives too late to be stored in the latch Rf during the (k+1)-th clock period. In the latter case, the data signal stored in the latch Ri during the k-th clock period propagates to the latch Rf too early and overwrites the data signal that was already stored in the latch Rf during the same k-th clock period.

In order for the proper data signal to be successfully latched within Rf during the k-th clock period, there should not be any changes in the signal Df until at least the hold time after the arrival of the storing (trailing) edge of the clock signal Cf. Therefore, the earliest arrival time of the data signal Df at the register Rf must satisfy the following condition:

$$t_{Am}^{Lf} \ge \left(t_{cd}^f + kT_{CP} + \Delta_T^L\right) + \delta_H^{Lf} \tag{1.36}$$

The term $(t_{cd}^f + kT_{CP} + \Delta_T^L)$ on the right-hand side of Eq. 1.36 corresponds to the critical situation of the trailing edge of the k-th clock period of the clock signal Cf arriving late by the maximum possible deviation $\Delta_T^L$. Note that the value of $t_{Am}^{Lf}$ in Eq. 1.36 consists of two components:

1. The earliest arrival time $t_{Qm}^{Li}$ at which a valid data signal Qi appears at the output of the latch Ri: that is, the sum $t_{Qm}^{Li} = t_{cd}^i + kT_{CP} - \Delta_L^L + D_{CQm}^{Li}$ of the earliest arrival time of the leading edge of the clock signal Ci and the minimum clock-to-Q delay $D_{CQm}^{Li}$ of Ri.

2. The minimum propagation delay $D_{Pm}^{i,f}$ of the signal through the combinational logic Lif and the interconnect along the path Ri → Rf.

Therefore, $t_{Am}^{Lf}$ can be described as

$$t_{Am}^{Lf} = t_{Qm}^{Li} + D_{Pm}^{i,f} = t_{cd}^i + kT_{CP} - \Delta_L^L + D_{CQm}^{Li} + D_{Pm}^{i,f} \tag{1.37}$$

By substituting Eq. 1.37 into Eq. 1.36, the timing condition guaranteeing that Df does not arrive too early at the latch Rf is

$$t_{cd}^i + kT_{CP} - \Delta_L^L + D_{CQm}^{Li} + D_{Pm}^{i,f} \ge \left(t_{cd}^f + kT_{CP} + \Delta_T^L\right) + \delta_H^{Lf} \tag{1.38}$$

The inequality Eq. 1.38 can be further simplified by reorganizing the terms and noting that $t_{cd}^i - t_{cd}^f = T_{Skew}(i,f)$ is the clock skew between the registers Ri and Rf:

$$T_{Skew}(i,f) - \left(\Delta_T^L + \Delta_L^L\right) \ge -\left(D_{CQm}^{Li} + D_{Pm}^{i,f}\right) + \delta_H^{Lf} \tag{1.39}$$

The timing relationship described by Eq. 1.39 represents two important results describing the early arrival of the signal Df at the data input of the final latch Rf of a local data path:

1. The relationship in Eq. 1.39 does not depend on the value of the clock period $T_{CP}$. Therefore, if a hold timing violation in a synchronous system has occurred,* this timing violation is catastrophic.

2. The relationship in Eq. 1.39 can be satisfied with a sufficiently large value of the clock skew $T_{Skew}(i,f)$. Furthermore, both the term $(\Delta_T^L + \Delta_L^L)$ and the term $\delta_H^{Lf}$ are harmful in the sense that these terms impose a lower bound on the clock skew $T_{Skew}(i,f)$ between the latches Ri and Rf. Although positive skew $T_{Skew}(i,f) > 0$ can be used to relax Eq. 1.39, these two terms make it difficult to satisfy the inequality in Eq. 1.39 for specific values of $T_{Skew}(i,f)$ and $(D_{CQm}^{Li} + D_{Pm}^{i,f})$.

Furthermore, Eq. 1.39 can be rewritten to emphasize the lower bound on the clock skew $T_{Skew}(i,f)$:

$$T_{Skew}(i,f) \ge \left(\Delta_T^L + \Delta_L^L\right) - \left(D_{CQm}^{Li} + D_{Pm}^{i,f}\right) + \delta_H^{Lf} \tag{1.40}$$
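A brief Python sketch of the latch hold bound follows. It assumes the lower bound on the skew has the form: skew ≥ (trailing-edge tolerance + leading-edge tolerance) − (minimum clock-to-Q delay + minimum propagation delay) + hold time. The function name and the values (taken to be in nanoseconds) are hypothetical.

```python
def min_latch_hold_skew(delta_l: float, delta_t: float, d_cq_min: float,
                        d_p_min: float, t_hold: float) -> float:
    """Lower bound on the skew t_cd_i - t_cd_f for a latch-based path.

    The clock period does not appear, so slowing the clock cannot
    remove a hold violation.
    """
    return (delta_t + delta_l) - (d_cq_min + d_p_min) + t_hold


# Assumed values: equal edge tolerances and a short logic path.
lower = min_latch_hold_skew(delta_l=0.125, delta_t=0.125,
                            d_cq_min=0.25, d_p_min=0.5, t_hold=0.125)
print(lower)

# Larger edge tolerances or a longer hold time raise the bound,
# which is why both terms are described as harmful.
worse = min_latch_hold_skew(delta_l=0.25, delta_t=0.25,
                            d_cq_min=0.25, d_p_min=0.5, t_hold=0.25)
print(worse)
```

With the first set of assumed values the skew may be as low as -0.375 ns; doubling the tolerances and the hold time raises the bound to 0.0 ns, leaving no room for negative skew on this path.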

In a fully synchronous digital VLSI system, however, it is possible to encounter types of local data paths different from those circuits analyzed in this chapter. For example, a local data path may begin with a positive-polarity, edge-sensitive register Ri and end with a negative-polarity, edge-sensitive register Rf. It is also possible that different types of registers are used, for example, a register with more than one data input. In each individual case, the analyses described in this chapter illustrate the general methodology used to derive the proper timing relationships specific to that system. Furthermore, note that for a given system, the timing relationships that must be satisfied for the system to operate correctly (such as Eqs. 1.19, 1.24, 1.34, 1.35, and 1.40) are collectively referred to as the overall timing constraints of the synchronous digital system (Refs. 13 and 51 through 55).

1.6 Glossary of Terms

The following notations are used in this chapter.

1. Clock Signal Parameters

$T_{CP}$: The clock period of a circuit

$\Delta_L$: The tolerance of the leading edge of any clock signal

$\Delta_T$: The tolerance of the trailing edge of any clock signal

* As described by the inequality Eq 1.39 not being satisfied.

$\Delta_L^L$: The tolerance of the leading edge of a clock signal driving a latch

$\Delta_T^L$: The tolerance of the trailing edge of a clock signal driving a latch

$\Delta_L^F$: The tolerance of the leading edge of a clock signal driving a flip-flop

$\Delta_T^F$: The tolerance of the trailing edge of a clock signal driving a flip-flop

$C_{Wm}^L$: The minimum width of the clock signal in a circuit with latches

$C_{Wm}^F$: The minimum width of the clock signal in a circuit with flip-flops

2. Latch Parameters

$D_{CQ}^L$: The clock-to-output delay of a latch

$D_{CQ}^{Li}$: The clock-to-output delay of the latch Ri

$D_{CQm}^L$: The minimum clock-to-output delay of a latch

$D_{CQm}^{Li}$: The minimum clock-to-output delay of the latch Ri

$D_{CQM}^L$: The maximum clock-to-output delay of a latch

$D_{CQM}^{Li}$: The maximum clock-to-output delay of the latch Ri

$D_{DQ}^L$: The data-to-output delay of a latch

$D_{DQ}^{Li}$: The data-to-output delay of the latch Ri

$D_{DQm}^L$: The minimum data-to-output delay of a latch

$D_{DQm}^{Li}$: The minimum data-to-output delay of the latch Ri

$D_{DQM}^L$: The maximum data-to-output delay of a latch

$D_{DQM}^{Li}$: The maximum data-to-output delay of the latch Ri

$\delta_S^L$: The setup time of a latch

$\delta_S^{Li}$: The setup time of the latch Ri

$\delta_H^L$: The hold time of a latch

$\delta_H^{Li}$: The hold time of the latch Ri

$t_{AM}^L$: The latest arrival time of the data signal at the data input of a latch

$t_{AM}^{Li}$: The latest arrival time of the data signal at the data input of the latch Ri

$t_{Am}^L$: The earliest arrival time of the data signal at the data input of a latch

$t_{Am}^{Li}$: The earliest arrival time of the data signal at the data input of the latch Ri

$t_{QM}^L$: The latest arrival time of the data signal at the data output of a latch

$t_{QM}^{Li}$: The latest arrival time of the data signal at the data output of the latch Ri

$t_{Qm}^L$: The earliest arrival time of the data signal at the data output of a latch

$t_{Qm}^{Li}$: The earliest arrival time of the data signal at the data output of the latch Ri

3. Flip-flop Parameters

$D_{CQ}^F$: The clock-to-output delay of a flip-flop

$D_{CQ}^{Fi}$: The clock-to-output delay of the flip-flop Ri

$D_{CQm}^F$: The minimum clock-to-output delay of a flip-flop

$D_{CQm}^{Fi}$: The minimum clock-to-output delay of the flip-flop Ri

$D_{CQM}^F$: The maximum clock-to-output delay of a flip-flop

$D_{CQM}^{Fi}$: The maximum clock-to-output delay of the flip-flop Ri

$\delta_S^F$: The setup time of a flip-flop

$\delta_S^{Fi}$: The setup time of the flip-flop Ri

$\delta_H^F$: The hold time of a flip-flop

$\delta_H^{Fi}$: The hold time of the flip-flop Ri

$t_{AM}^F$: The latest arrival time of the data signal at the data input of a flip-flop

$t_{AM}^{Fi}$: The latest arrival time of the data signal at the data input of the flip-flop Ri

$t_{Am}^F$: The earliest arrival time of the data signal at the data input of a flip-flop

$t_{Am}^{Fi}$: The earliest arrival time of the data signal at the data input of the flip-flop Ri

$t_{QM}^F$: The latest arrival time of the data signal at the data output of a flip-flop

$t_{QM}^{Fi}$: The latest arrival time of the data signal at the data output of the flip-flop Ri

$t_{Qm}^F$: The earliest arrival time of the data signal at the data output of a flip-flop

$t_{Qm}^{Fi}$: The earliest arrival time of the data signal at the data output of the flip-flop Ri

4. Local Data Path Parameters

Ri → Rf: A local data path from register Ri to register Rf exists

Ri ↛ Rf: A local data path from register Ri to register Rf does not exist

5. Vasseghi, N., Yeager, K., Sarto, E., and Seddighnezhad, M., “200-MHz Superscalar RISC Microprocessor,” IEEE Journal of Solid-State Circuits, vol. SC-31, pp. 1675–1686, Nov. 1996.

6. Bakoglu, H.B., Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley Publishing Company, Reading, MA, 1990.

7. Bothra, S., Rogers, B., Kellam, M., and Osburn, C.M., “Analysis of the Effects of Scaling on Interconnect Delay in ULSI Circuits,” IEEE Transactions on Electron Devices, vol. ED-40, pp. 591–597, Mar. 1993.

8. Weste, N.W. and Eshraghian, K., Principles of CMOS VLSI Design: A Systems Perspective. Addison-Wesley Publishing Company, Reading, MA, 2nd ed., 1992.

9. Mead, C. and Conway, L., Introduction to VLSI Systems. Addison-Wesley Publishing Company,

12. Unger, S.H. and Tan, C.-J., “Clocking Schemes for High-Speed Digital Systems,” IEEE Transactions on Computers, vol. C-35, pp. 880–895, Oct. 1986.

13. Friedman, E.G., Clock Distribution Networks in VLSI Circuits and Systems. IEEE Press, 1995.


14. Bowhill, W.J. et al., “Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU,” Digital Technical Journal, vol. 7, no. 1, pp. 100–118, 1995.

15. Neves, J.L. and Friedman, E.G., “Topological Design of Clock Distribution Networks Based on Non-Zero Clock Skew Specification,” Proceedings of the 36th IEEE Midwest Symposium on Circuits and Systems, pp. 468–11, Aug. 1993.

16. Xi, J.G. and Dai, W.W.-M., “Useful-Skew Clock Routing With Gate Sizing for Low Power Design,” Proceedings of the 33rd ACM/IEEE Design Automation Conference, pp. 383–388, June 1996.

17. Neves, J.L. and Friedman, E.G., “Design Methodology for Synthesizing Clock Distribution Networks Exploiting Non-Zero Localized Clock Skew,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. VLSI-4, pp. 286–291, June 1996.

18. Jackson, M.A.B., Srinivasan, A., and Kuh, E.S., “Clock Routing for High-Performance ICs,” Proceedings of the 27th ACM/IEEE Design Automation Conference, pp. 573–579, June 1990.

19. Tsay, R.-S., “An Exact Zero-Skew Clock Routing Algorithm,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. CAD-12, pp. 242–249, Feb. 1993.

20. Chou, N.-C. and Cheng, C.-K., “On General Zero-Skew Clock Net Construction,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. VLSI-3, pp. 141–146, Mar. 1995.

21. Ito, N., Sugiyama, H., and Konno, T., “ChipPRISM: Clock Routing and Timing Analysis for High-Performance CMOS VLSI Chips,” Fujitsu Scientific and Technical Journal, vol. 31, pp. 180–187, Dec. 1995.

22. Leiserson, C.E. and Saxe, J.B., “A Mixed-Integer Linear Programming Problem Which Is Efficiently Solvable,” Journal of Algorithms, vol. 9, pp. 114–128, Mar. 1988.

23. Cormen, T.H., Leiserson, C.E., and Rivest, R.L., Introduction to Algorithms. MIT Press, 1989.

24. West, D.B., Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ, 1996.

25. Fishburn, J.P., “Clock Skew Optimization,” IEEE Transactions on Computers, vol. C-39, pp. 945–951, July 1990.

26. Lee, T.-C. and Kong, J., “The New Line in IC Design,” IEEE Spectrum, pp. 52–58, Mar. 1997.

27. Friedman, E.G., “The Application of Localized Clock Distribution Design to Improving the Performance of Retimed Sequential Circuits,” Proceedings of the IEEE Asia-Pacific Conference on Circuits and Systems, pp. 12–17, Dec. 1992.

28. Kourtev, I.S. and Friedman, E.G., “Simultaneous Clock Scheduling and Buffered Clock Tree Synthesis,” Proceedings of the IEEE International Symposium on Circuits and Systems, pp. 1812–1815, June 1997.

29. Neves, J.L. and Friedman, E.G., “Optimal Clock Skew Scheduling Tolerant to Process Variations,” Proceedings of the 33rd ACM/IEEE Design Automation Conference, pp. 623–628, June 1996.

30. Glasser, L.A. and Dobberpuhl, D.W., The Design and Analysis of VLSI Circuits. Addison-Wesley Publishing Company, Reading, MA, 1985.

31. Uyemura, J.P., Circuit Design for CMOS VLSI. Kluwer Academic Publishers, 1992.

32. Kang, S.M. and Leblebici, Y., CMOS Digital Integrated Circuits: Analysis and Design. The McGraw-Hill Companies, Inc., New York, 1996.

33. Sedra, A.S. and Smith, K.C., Microelectronic Circuits. Oxford University Press, 4th ed., 1997.

34. Kohavi, Z., Switching and Finite Automata Theory. McGraw-Hill Book Company, New York, 2nd ed., 1978.

35. Mano, M.M. and Kime, C.R., Logic and Computer Design Fundamentals. Prentice-Hall, Inc., 1997.

36. Wolf, W., Modern VLSI Design: A Systems Approach. Prentice Hall, Upper Saddle River, NJ, 1994.

37. Kacprzak, T. and Albicki, A., “Analysis of Metastable Operation in RS CMOS Flip-Flops,” IEEE Journal of Solid-State Circuits, vol. SC-22, pp. 57–64, Feb. 1987.

38. Jackson, T.A. and Albicki, A., “Analysis of Metastable Operation in D Latches,” IEEE Transactions on Circuits and Systems—I: Fundamental Theory and Applications, vol. CAS I-36, pp. 1392–1404, Nov. 1989.

39. Friedman, E.G., “Latching Characteristics of a CMOS Bistable Register,” IEEE Transactions on Circuits and Systems—I: Fundamental Theory and Applications, vol. CAS I-40, pp. 902–908, Dec. 1993.


42. Afghani, M. and Yuan, J., “Double-Edge-Triggered D-Flip-Flops for High-Speed CMOS Circuits,” IEEE Journal of Solid-State Circuits, vol. SC-26, pp. 1168–1170, Aug. 1991.

43. Hossain, R., Wronski, L., and Albicki, A., “Double Edge Triggered Devices: Speed and Power Constraints,” Proceedings of the 1996 IEEE International Symposium on Circuits and Systems, vol. 3, pp. 1491–1494, 1993.

44. Blair, G.M., “Low-Power Double-Edge Triggered Flip-Flop,” Electronics Letters, vol. 33, pp. 845–81, May 1997.

45. Lin, L., Ludwig, J.A., and Eng, K., “Analyzing Cycle Stealing on Synchronous Circuits with Level-Sensitive Latches,” Proceedings of the 29th ACM/IEEE Design Automation Conference, pp. 393–398, June 1992.

46. Lee, J.-fuw, Tang, D.T., and Wong, C.K., “A Timing Analysis Algorithm for Circuits with Level-Sensitive Latches,” IEEE Transactions on Computer-Aided Design, vol. CAD-15, pp. 535–543, May 1996.

47. Szymanski, T.G., “Computing Optimal Clock Schedules,” Proceedings of the 29th ACM/IEEE Design Automation Conference, pp. 399–404, June 1992.

48. Dagenais, M.R. and Rumin, N.C., “On the Calculation of Optimal Clocking Parameters in Synchronous Circuits with Level-Sensitive Latches,” IEEE Transactions on Computer-Aided Design, vol. CAD-8, pp. 268–278, Mar. 1989.

49. Sakallah, K.A., Mudge, T.N., and Olukotun, O.A., “checkTc and minTc: Timing Verification and Optimal Clocking of Synchronous Digital Circuits,” Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pp. 552–555, Nov. 1990.

50. Sakallah, K.A., Mudge, T.N., and Olukotun, O.A., “Analysis and Design of Latch-Controlled Synchronous Digital Circuits,” IEEE Transactions on Computer-Aided Design, vol. CAD-11, pp. 322–333, Mar. 1992.

51. Kourtev, I.S. and Friedman, E.G., “Topological Synthesis of Clock Trees with Non-Zero Clock Skew,” Proceedings of the 1997 ACM/IEEE International Workshop on Timing Issues in the Specification and Design of Digital Systems, pp. 158–163, Dec. 1997.

52. Kourtev, I.S. and Friedman, E.G., “Topological Synthesis of Clock Trees for VLSI-Based DSP Systems,” Proceedings of the IEEE Workshop on Signal Processing Systems, pp. 151–162, Nov. 1997.

53. Kourtev, I.S. and Friedman, E.G., “Integrated Circuit Signal Delay,” Encyclopedia of Electrical and Electronics Engineering. Wiley Publishing Company, vol. 10, pp. 378–392, 1999.

54. Neves, J.L. and Friedman, E.G., “Synthesizing Distributed Clock Trees for High Performance ASICs,” Proceedings of the IEEE ASIC Conference, pp. 126–129, Sept. 1994.

55. Neves, J.L. and Friedman, E.G., “Buffered Clock Tree Synthesis with Optimal Clock Skew Scheduling for Reduced Sensitivity to Process Parameter Variations,” Proceedings of the ACM/SIGDA International Workshop on Timing Issues in the Specification and Synthesis of Digital Systems, pp. 131–141, Nov. 1995.

56. Deokar, R.R. and Sapatnekar, S.S., “A Fresh Look at Retiming via Clock Skew Optimization,” Proceedings of the 32nd ACM/IEEE Design Automation Conference, pp. 310–315, June 1995.


2.1 Introduction

Read-only memory (ROM) is the densest form of semiconductor memory and is used in applications such as video game software, laser printer fonts, dictionary data in word processors, and sound-source data in electronic musical instruments.

The ROM market segment grew well through the first half of the 1990s, closely coinciding with a jump in sales of personal computers (PCs) and other consumer-oriented electronic systems, as shown in Fig. 2.1 (Ref. 1). Because a very large ROM application base (video games) moved toward compact disk ROM-based systems (CD-ROM), the ROM market segment declined. However, memory products with greater functionality have become relatively cost-competitive with ROM. It is believed that the ROM market will continue to grow moderately through the year 2003.

2.2 ROM

Read-only memories (ROMs) consist of an array of core cells whose contents or state is preprogrammed during the fabrication process, using the presence or absence of a single transistor as the storage mechanism. The contents of the memory are therefore maintained indefinitely, regardless of the previous history of the device and/or the previous state of the power supply.

2.2.1 Core Cells

A binary core cell stores binary information through the presence or absence of a single transistor at the intersection of the wordline and bitline. ROM core cells can be connected in two possible ways: a parallel NOR array of cells or a series NAND array of cells, each requiring one transistor per storage cell. In this case, either connecting or disconnecting the drain connection from the bitline programs the ROM cell. The NOR array is larger, as there is potentially one drain contact per transistor (or per cell) made to each bitline. Potentially, the NOR array is faster, as there are no serially connected transistors as in the NAND array approach. The NAND array, however, is much more compact, as no contacts are required within the array itself, but the serially connected pull-down transistors that comprise the bitline are potentially very slow (Ref. 2).

Jen-Sheng Hwang

National Science Council

0–8493–1737–1/03/$0.00+$1.50

Encoding multiple-valued data in the memory array involves a one-to-one mapping of logic value to transistor characteristics at each memory location and can be implemented in two ways:

(i) Adjust the width-to-length (W/L) ratios of the transistors in the core cells of the memory array, or

(ii) Adjust the threshold voltage of the transistors in the core cells of the memory array.³

The first technique works on the principle that the W/L ratio of a transistor determines the amount of current that can flow through the device (i.e., the transconductance). This current can be measured to determine the size of the device at the selected location and hence the logic value stored at this location. In order to store 2 bits per cell, one would use one of four discrete transistor sizes. Intel Corp. used this technique in the early 1980s to implement high-density look-up tables in its i8087 math co-processor. Motorola Inc. also introduced a four-state ROM cell with an unusual transistor geometry that had variable W/L devices. The conceptual electrical schematic of the memory cell, along with the surrounding peripheral circuitry, is shown in Fig. 2.2.²
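The one-to-one mapping between a 2-bit value and one of four transistor sizes can be sketched as follows. The W/L values and function names are hypothetical placeholders chosen for illustration; real designs pick sizes from device characteristics that the text does not give.

```python
from typing import List, Sequence

# Hypothetical W/L ratios for the 2-bit codes 0, 1, 2 and 3.
WL_RATIOS = (1.0, 2.0, 3.0, 4.0)


def encode_byte(value: int) -> List[float]:
    """Map one byte onto four cells, most significant bit pair first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    return [WL_RATIOS[(value >> shift) & 0b11] for shift in (6, 4, 2, 0)]


def decode_cells(ratios: Sequence[float]) -> int:
    """Recover the stored value from the transistor sizes of the cells."""
    value = 0
    for ratio in ratios:
        value = (value << 2) | WL_RATIOS.index(ratio)
    return value


cells = encode_byte(0b10011100)
restored = decode_cells(cells)
```

A byte therefore occupies four cells instead of eight, which is the source of the density advantage of the multiple-valued approach.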

2.2.2 Peripheral Circuitry

The four states in a 2-bit per cell ROM are four distinct current levels. There are two primary techniques to determine which of the four possible current levels an addressed cell generates. One technique compares the current generated by a selected memory cell against three reference cells using three separate sense amplifiers. The reference cells are transistors with W/L ratios that fall in between the four possible standard transistor sizes found in the memory array, as illustrated in Fig. 2.3.²

The approach is essentially a 2-bit flash analog-to-digital (A/D) converter. An alternate method for reading a 2-bit per cell device is to measure the time it takes for a linearly rising voltage to match the output voltage of the cell. This time interval can then be mapped to the equivalent 2-bit binary code corresponding to the memory contents.
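Both read-out techniques can be sketched in a few lines. All numeric values below (cell currents, reference currents, cell voltages, ramp step, tick boundaries) are hypothetical and chosen only so that the four levels are evenly spaced; the function names are likewise assumptions for illustration.

```python
# Hypothetical reference currents in microamps, placed midway between
# assumed cell currents of 10, 20, 30 and 40 microamps.
REFERENCE_CURRENTS_UA = (15.0, 25.0, 35.0)


def flash_sense(cell_current_ua: float) -> int:
    """Compare against three references at once, as a flash A/D does.

    Each comparison stands for one sense amplifier; the number of
    references exceeded (a thermometer code) is the stored 2-bit value.
    """
    return sum(cell_current_ua > ref for ref in REFERENCE_CURRENTS_UA)


def ramp_time_ticks(cell_mv: int, ramp_mv_per_tick: int = 50) -> int:
    """Count clock ticks until a linear ramp reaches the cell voltage."""
    ticks = 0
    ramp_mv = 0
    while ramp_mv < cell_mv:
        ramp_mv += ramp_mv_per_tick
        ticks += 1
    return ticks


def ramp_sense(cell_mv: int) -> int:
    """Map the measured time interval to a 2-bit code.

    Assumes cell output voltages near 500, 1000, 1500 and 2000 mV,
    i.e. about 10, 20, 30 and 40 ticks of the default ramp.
    """
    ticks = ramp_time_ticks(cell_mv)
    return sum(ticks > boundary for boundary in (15, 25, 35))


flash_codes = [flash_sense(i) for i in (10.0, 20.0, 30.0, 40.0)]
ramp_codes = [ramp_sense(v) for v in (500, 1000, 1500, 2000)]
```

The flash method resolves the value in one comparison step at the cost of three sense amplifiers, while the ramp method needs only one comparator but takes longer for higher levels.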

FIGURE 2.1 The ROM market growth and forecast.

ROM/PROM/EPROM

FIGURE 2.2 Geometry-variable multiple-valued NOR ROM.

FIGURE 2.3 ROM sense amplifier.

2.2.3 Architecture

Constructing large ROMs with fast access times requires the memory array to be divided into smaller memory banks. This gives rise to the concept of divided word lines and divided bit lines, which reduce the capacitance of these structures and allow for faster signal dynamics. Typically, memory blocks would be no larger than 256 rows by 256 columns. In order to quantitatively compare the area advantage of the multiple-valued approach, one can calculate the area per bit of a 2-bit per cell ROM divided by the area per bit of a 1-bit per cell ROM. Ideally, one would expect this ratio to be 0.5. In the case of a practical 2-bit per cell ROM,⁴ the ratio is 0.6, since the cell is larger than a regular ROM cell in order to accommodate any one of the four possible transistor sizes. ROM density in the Mb capacity range is in general very comparable to that of DRAM density despite the differences in fabrication technology.²
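The area comparison above can be written out as a short worked calculation. The symbols $A_1$ and $A_2$ (the cell areas of the 1-bit and 2-bit per cell ROM) and $r$ (the area-per-bit ratio) are introduced here for illustration and do not appear in the original text.

```latex
\[
r = \frac{A_2 / 2}{A_1 / 1} = \frac{A_2}{2A_1},
\qquad
r_{\text{ideal}} = 0.5 \;\Rightarrow\; A_2 = A_1,
\qquad
r_{\text{practical}} = 0.6 \;\Rightarrow\; A_2 = 1.2\,A_1 .
\]
```

Under this reading, the practical 2-bit cell is about 20% larger than a regular cell, and the density gain is 1/0.6 ≈ 1.67 rather than the ideal factor of 2.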

In user-programmable or field-programmable ROMs, the customer can program the contents of the memory array by blowing selected fuses (i.e., physically altering them) on the silicon substrate. This allows for a “one-time” customization after the ICs have been fabricated. The quest for a memory that is nonvolatile and electrically alterable has led to the development of EPROMs, EEPROMs, and flash memories.²

2.3 PROM

Since process technology has shifted to QLM or PLM to achieve better device performance, it is important to develop a ROM technology that offers short turnaround time (TAT), high density, high speed, and low power. There are many types of ROM, each with merits and demerits:⁵

• The diffusion programming ROM has excellent density but a very long process cycle time.

• The conventional VIA-2 contact programming ROM has a better cycle time but poor density.

• A VIA-2 contact programming ROM architecture for QLM and PLM processes offers simple processing with high density and obtains excellent results at 2.5 V and 2.0 V supply voltages.

2.3.1 Read-Only Memory Module Architecture

FIGURE 2.4 ROM module array configuration.

The details of the ROM module configuration are shown in Fig. 2.4. This ROM has a single access mode (16-bit data read from half of the ROM array) and a dual access mode (32-bit data read from both ROM arrays) with external address and control signals. One block in the array contains 16 bit lines and is connected to a sense amplifier circuit, as shown in Fig. 2.5. In the decoder, only one of the 16 bit lines is selected and precharged by P1 and T1.⁵
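The organization described here can be checked with some simple arithmetic. The sketch below is an assumption-laden illustration: the count of 16 blocks per half array is inferred from the 16-bit single access width (one sense amplifier per block), the 256 rows are taken from the 8KW example in the text, and the address field layout in `split_address` is purely hypothetical.

```python
from typing import Tuple

ROWS = 256                 # bit cells per bit line (from the 8KW example)
BIT_LINES_PER_BLOCK = 16   # one of these is selected per block
BLOCKS_PER_HALF = 16       # assumption: one block per data bit of a 16-bit word


def words_per_half() -> int:
    """Each (row, selected bit line) pair yields one 16-bit word."""
    return ROWS * BIT_LINES_PER_BLOCK


def bits_per_half() -> int:
    """Total number of core cells in one half array."""
    return ROWS * BIT_LINES_PER_BLOCK * BLOCKS_PER_HALF


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a word address into (half, row, column) fields.

    Hypothetical layout: 1 half-select bit, 8 row bits, 4 column bits.
    """
    if not 0 <= addr < 2 * words_per_half():
        raise ValueError("address out of range")
    column = addr & 0xF
    row = (addr >> 4) & 0xFF
    half = addr >> 12
    return half, row, column


total_words = 2 * words_per_half()
fields = split_address(0x1ABC)
```

With these assumptions the two half arrays together hold 8192 16-bit words, which is consistent with the 8KW ROM mentioned in the text; in dual access mode the same cells would be read as 32-bit words from both halves at once.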

The 16 bits in the half array in single access mode, or the 32 bits in dual access mode, are dynamically precharged to the VDD level. D1 is a pull-down transistor that keeps unselected bit lines at ground level. The speed of the ROM is limited by the bit line discharge time in the worst-case ROM coding. When a connection exists at every cell along a bit line, the total parasitic capacitance on the bit line (Cbs, due to the N-diffusions, and Cbg) will be a maximum. This situation is shown in Fig. 2.6a. In the 8KW ROM, 256 bit cells are in the vertical direction, resulting in 256 times the cell bit line capacitance. In this case, the discharge time from VDD to GND level is about 6 to 8 ns at VDD = 1.66 V and depends on the ROM programming type, such as diffusion or VIA-2. Short-circuit currents in the sense amplifier circuits are avoided by using a delayed enable signal (Sense Enable). There are dummy bit lines on both sides of the array, as indicated in Fig. 2.4. Each dummy bit line contains “0”s on all 256 cells and has the longest discharge time. It is used to generate timing for the delayed enable signal that activates the sense amplifier circuits. These circuits were used for all types of ROM to provide a fair comparison of the performance of each type of ROM.⁵
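The role of the dummy bit line can be illustrated with a first-order model in which the discharge time grows with the number of connected cells. This is a rough sketch under stated assumptions: a constant discharge current, a linear capacitance model, and placeholder values for the capacitances and current that are not taken from the source. The function names are hypothetical.

```python
from typing import List, Sequence

# Placeholder values, NOT from the source.
CELL_CAP_FF = 1.5             # capacitance added per connected cell (fF)
WIRE_CAP_FF = 50.0            # fixed wiring capacitance of a bit line (fF)
DISCHARGE_CURRENT_UA = 100.0  # assumed constant pull-down current (uA)


def discharge_time_ns(connected_cells: int, vdd: float = 1.66) -> float:
    """First-order estimate t = C * V / I.

    With C in fF, V in volts and I in microamps, the result is in ns.
    """
    cap_ff = WIRE_CAP_FF + connected_cells * CELL_CAP_FF
    return cap_ff * vdd / DISCHARGE_CURRENT_UA


def sense_enable_delay_ns(rows: int = 256) -> float:
    """The dummy bit line has a connection on every row (all "0"s)."""
    return discharge_time_ns(rows)


def column_discharge_times_ns(columns: Sequence[Sequence[bool]]) -> List[float]:
    """Upper bound on the discharge time of each real bit line."""
    return [discharge_time_ns(sum(column)) for column in columns]


# Three hypothetical codings: all connected, none connected, alternating.
columns = [[True] * 256, [False] * 256, [True, False] * 128]
limit = sense_enable_delay_ns()
times = column_discharge_times_ns(columns)
```

Because no real bit line can carry more connections than the dummy line, a Sense Enable signal derived from the dummy line fires only after every selected bit line has had time to discharge, whatever the ROM coding.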

FIGURE 2.5 Detail of low power selective bit line precharge and sense amplifier circuits.
