Hardware and Computer Organization


phenom-so, and that 10 GHz is not far behind.

A 5 GHz clock rate corresponds to a clock period of 200 picoseconds (ps). Since the speed of light is roughly 12 inches per nanosecond in free space, and a signal travels about 6 inches per nanosecond through a wire, this means that in 200 ps a signal in a wire can travel about 1.2 inches. A modern microprocessor is about ¾ of an inch on a side, so roughly 62% of the clock period will be wasted just getting the clock signal from one edge of the chip to the other. Since our microprocessor is a fully synchronous machine, this is a very serious problem. We call this problem clock skew. Clock skew is simply the difference in time between corresponding portions of the clock (phase difference) that arises because of the problems associated with simultaneously distributing the clock to all portions of the chip. In Teramac, clock skew was a major design issue that had to be factored into all elements of the machine design. Also, the original Cray supercomputer controlled clock skew by adjusting the lengths of the coaxial cables carrying the clock to various circuit boards in the machine.
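The arithmetic behind these figures can be checked with a short sketch. It assumes only the propagation speeds quoted in the text; the function and constant names are illustrative and not part of the original material.

```python
# Propagation speed quoted in the text (inches per nanosecond in a wire).
SPEED_IN_WIRE_IN_PER_NS = 6.0


def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def wire_travel_inches(time_ps: float) -> float:
    """Distance a signal covers in a wire during time_ps picoseconds."""
    return SPEED_IN_WIRE_IN_PER_NS * (time_ps / 1000.0)


def fraction_of_period_crossing_die(freq_ghz: float, die_side_in: float) -> float:
    """Fraction of one clock period spent crossing a die of the given width."""
    crossing_time_ps = die_side_in / SPEED_IN_WIRE_IN_PER_NS * 1000.0
    return crossing_time_ps / clock_period_ps(freq_ghz)


period = clock_period_ps(5.0)                          # 5 GHz clock
reach = wire_travel_inches(period)                     # distance covered in one period
fraction = fraction_of_period_crossing_die(5.0, 0.75)  # 3/4 inch die
print(f"{period:.0f} ps, {reach:.1f} in, {fraction:.1%}")  # 200 ps, 1.2 in, 62.5%
```

Crossing a ¾ inch die takes 125 ps, which is 62.5% of the 200 ps period; the text rounds this to 62%.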

Another potential problem is that not all transistors switch in exactly the same way. There can be slight differences in the switching characteristics of the clock circuitry at various portions of the chip. Measurements have shown these differences in switching characteristics to be as large as about 180 ps [17]. Thus, as the chips get bigger and faster, keeping the clock uniformly distributed across the chip becomes more problematic.

Today, most clock distribution networks are hierarchical. Figure 16.11 shows a typical clock distribution network. The circuit block labeled phase-locked loop represents the method used in modern computers to multiply the internal clock frequency to a higher value than the external clock input. For example, if your external clock frequency is 200 MHz, a multiplier value that you might set in the BIOS, or that is locked into the chip, could be a factor of 11. Thus, the internal clock frequency is 2200 MHz, or 2.2 GHz. As you can see, simple variations in IC process parameters could lead to clock skew problems as the clock is distributed to all of the synchronous circuitry on the chip.

Figure 16.11: Synchronous clock distribution network. (Hierarchy shown in the diagram: external clock input, phase-locked loop (PLL), global clocks, major clocks, local clocks.)

Future Trends and Reconfigurable Hardware

Recall that the modern processor is a pipeline-driven device with different combinatorial logic circuits functioning within the various stages of the pipe. All of the stages are driven from the same synchronous clock, as shown in Figure 16.12. Here we can see the reason why limiting clock skew is so critical. Each stage of the pipeline must complete its work before the clock arrives to latch the result into the next stage of the pipeline. The combinatorial logic within each pipeline stage depends upon the time budget it has to complete its work before the next clock edge comes along. Skewing of the clock edges means that some pipeline stages will be clocked sooner than others, destroying the synchronicity of the pipeline.
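The time budget of a pipeline stage can be sketched as a simple inequality: the logic delay, a safety margin, and the clock skew together must fit inside one clock period. This is a simplified model assumed here for illustration, and the stage numbers below are hypothetical rather than taken from the text.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a clock frequency in GHz."""
    return 1000.0 / freq_ghz


def skew_budget_ps(freq_ghz: float, logic_delay_ps: float, margin_ps: float) -> float:
    """Slack left for clock skew once logic delay and margin are subtracted."""
    return clock_period_ps(freq_ghz) - logic_delay_ps - margin_ps


def stage_meets_timing(freq_ghz: float, logic_delay_ps: float,
                       margin_ps: float, skew_ps: float) -> bool:
    """True if the skew fits inside the slack left by the stage logic."""
    return skew_ps <= skew_budget_ps(freq_ghz, logic_delay_ps, margin_ps)


# Hypothetical stage: 2 GHz clock, 400 ps of logic, 50 ps safety margin.
print(skew_budget_ps(2.0, 400.0, 50.0))            # 50.0
print(stage_meets_timing(2.0, 400.0, 50.0, 30.0))  # True
print(stage_meets_timing(2.0, 400.0, 50.0, 80.0))  # False
```

As the clock frequency rises, the period shrinks while the skew does not, so the slack disappears quickly.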

Figure 16.12: Pipeline with a synchronous clock. (Diagram: three combinatorial logic blocks, each followed by a D register, all driven by a single clock.)

Now, let’s modify the pipeline as shown in Figure 16.13. The system clock is used to drive local clock controllers for each stage of the pipeline. However, each pipeline stage is autonomous, and its local clock is not synchronized with the clock of either the previous stage or the next stage of the pipeline.

When the combinatorial logic of a particular stage has completed its work, the stage logic outputs a request to the local clock controller to latch the result into the D register that feeds the next stage. When the data is latched into the input register for the next stage, the local clock controller issues an acknowledge signal to the next stage, indicating that valid data is now available to work with. The net effect is that we’ve created a pipeline with handshake control between the stages. Each stage must request a data transfer, and the latch mechanism responds with an acknowledgement of the transfer to the next stage.

Figure 16.13: Pipeline with an asynchronous clocking architecture. (Diagram: combinatorial logic stages linked by request and acknowledge signals.)

The drawback of this scheme is that because the local clocks are not synchronized, a handshake delay in one stage of the pipe could easily propagate back and stall the pipe. However, the advantages of such a scheme could far outweigh the disadvantages when we are asking our processors to run at clock speeds in excess of 10 GHz. Given that we may still be able to build digital logic circuits capable of running at such high clock rates, local clocking of the system is probably the only solution.
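The request and acknowledge handshake, and the way a slow stage stalls the stages behind it, can be illustrated with a small timing model. This is a sketch under stated assumptions and not the book's design: each stage holds one item at a time, the source always has data ready, and a transfer is acknowledged only when the next stage has passed its previous item along. The function name and the delays are hypothetical.

```python
def simulate_handshake_pipeline(stage_delays_ps, num_items):
    """Return the time at which each item leaves the last stage.

    Assumed model: a stage raises its request when its logic finishes, and
    the transfer is acknowledged once the next stage is free to accept data.
    """
    num_stages = len(stage_delays_ps)
    # handoff[i] = time at which stage i passed its most recent item onward.
    handoff = [0.0] * num_stages
    exit_times = []
    for _ in range(num_items):
        arrival = 0.0  # the source always has the next item ready
        new_handoff = []
        for i, delay in enumerate(stage_delays_ps):
            # Wait for the data and for this stage to be rid of its previous item.
            start = max(arrival, handoff[i])
            request = start + delay
            # The last stage hands its result to a sink that is always ready.
            next_free = handoff[i + 1] if i + 1 < num_stages else 0.0
            ack = max(request, next_free)
            new_handoff.append(ack)
            arrival = ack
        handoff = new_handoff
        exit_times.append(handoff[-1])
    return exit_times


print(simulate_handshake_pipeline([100, 100, 100], 3))  # [300.0, 400.0, 500.0]
print(simulate_handshake_pipeline([100, 300, 100], 3))  # [500.0, 800.0, 1100.0]
```

In the second run the 300 ps middle stage withholds its acknowledge, so the first stage sits idle and results emerge only every 300 ps.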

This raises an interesting question: “Why use clocks at all?” Can we build a completely asynchronous (clockless) computer? According to Marculescu et al. [17], fully asynchronous designs are probably still a ways away. The computer-aided design (CAD) tools used for design and verification of modern processors still have not reached a level of sophistication that would allow them to deal with a fully asynchronous design. Also, there’s the problem of inertia. We just don’t design computers this way. However, the local clock remains a viable compromise to the problem of clock skew.

Several start-up companies have already formed to exploit the idea of a fully asynchronous microprocessor design. Fulcrum Microsystems [18] grew out of work done at Caltech. Figure 16.14 illustrates one of the potential advantages to asynchronous processors.

With an asynchronous system, the data in the pipeline flows through at its own rate. Additional circuitry is needed to prevent the runaway condition that clocks and registers are used to prevent in traditional clocked microprocessor systems. This concept is similar to the use of local clocks, but in this case, additional logic is necessary to detect when a stage has completed its work so that the next stage in the pipeline may be enabled. This is shown in Figure 16.15.
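One common way to detect that a stage has finished is dual-rail encoding, which Figure 16.15 names in its labels. The sketch below assumes the usual convention, which the text does not spell out: each bit travels on two wires, (1, 0) means logic 1, (0, 1) means logic 0, and (0, 0) means "not yet valid". All function names are illustrative.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail) for one bit


def encode(bits: List[int]) -> List[Rail]:
    """Encode ordinary bits as dual-rail pairs."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word: List[Rail]) -> bool:
    """Completion detection: every bit has exactly one rail asserted."""
    return all((t ^ f) == 1 for t, f in word)


def decode(word: List[Rail]) -> Optional[List[int]]:
    """Return the bits once the word is complete, otherwise None."""
    if not is_complete(word):
        return None
    return [t for t, _ in word]


settling = [(1, 0), (0, 0), (0, 1)]  # middle bit has not resolved yet
finished = encode([1, 1, 0])
print(is_complete(settling), decode(settling))  # False None
print(is_complete(finished), decode(finished))  # True [1, 1, 0]
```

The completion signal plays the role that the clock edge plays in a synchronous pipeline: it tells the next stage that its inputs are valid.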

Summary of Chapter 16

In Chapter 16, we covered:

• The architecture of programmable logic devices

• The architecture of field programmable gate arrays

• The development of reconfigurable computing machines based upon arrays of field programmable gate arrays

• Future trends in molecular computing, local clocks and clockless computers

Figure 16.14: Advantage of clockless logic over traditionally clocked logic. Courtesy of Fulcrum Microsystems. (Diagram labels: cycle time of clocked logic; logic time; worst case − average case (logic execution time); clock jitter, skew margin; manufacturing margin; cycle time of clockless logic.)

Figure 16.15: Clockless pipeline. Courtesy of Fulcrum Microsystems. (Diagram labels: Stage A, Stage B, Stage C; dual-rail domino logic; input completion detection; output completion detection.)


[5] “Inside Intel: It’s Moving at Double-Time to Head Off Competitors,” Business Week, June 1, 1992.

[6] Greg Snider, Philip Kuekes, W. Bruce Culbertson, Richard J. Carter, Arnold S. Berger, Rick Amerson, The Teramac Configurable Computer Engine, Proceedings of the 5th International Workshop on Field-Programmable Logic and Applications, edited by Will Moore and Wayne Luk, Oxford, UK, September 1995, p. 44.

[7] B.S. Landman and R.L. Russo, IEEE Trans. Comp., C20, 1469, 1971.

[8] Rick Anderson, Richard J. Carter, W. Bruce Culbertson, Philip Kuekes, Greg Snider, Lyle Albertson, Plasma: An FPGA for Million Gate Systems, FPGA ’96: Proceedings of the 1996 Fourth International Symposium on Field Programmable Gate Arrays, February 11-13, 1996, Monterey, CA, USA, ACM, 1996, pp. 10–16.

[9] B. Culbertson, R. Amerson, R. Carter, P. Kuekes, G. Snider, The Teramac Custom Computer: Extending the limits with defect tolerance, IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, November 1996.

[10] Barry Shakleford, HP Labs, Private Communication.

[16] Mark A. Reed and James M. Tour, Computing with Molecules, Scientific American, June 2000, p. 89.

[17] Diana Marculescu, Dave Albonesi, Alper Buyuktosunoglu, Tutorial: Partially Asynchronous Microprocessors, Micro-35, Istanbul, Turkey, Nov. 18, 2002.

[18] http://www.fulcrummicro.com.


1 Consider the circuit for a portion of a PLD as shown below. Indicate a fuse that is “blown” by a solid black interconnect and a connection as an open white circle. Make a copy of the diagram and “program” the device by filling in the interconnect circles of the fuses that you want to blow. Program the logical equation:

(Circuit diagram with inputs A and B.)

4 Suppose that you want to design a synchronous CPU with a 10 GHz clock rate. The worst case propagation delay through the logic gates is 28 picoseconds. No stage of the pipeline has more than three levels of logic circuitry. You also need to maintain a safety margin of 10 picoseconds to allow for manufacturing uncertainties, device set-up times, and differences between the switching characteristics of the devices in the circuitry. Approximately what is the largest difference in the length of the clock paths that this design can tolerate?


Thus, there are two effects going on. Computers can achieve higher performance in areas such as bus bandwidth and complexity because we can take advantage of the number of circuits we can place on a single die. Also, these complex designs can run faster. Finally, complex circuit designs allow even more complex software applications to run because we have memories with higher speed and capacity to implement the algorithms.

3 An advantage of an abstraction layer concept is that you can hide the details and differences of the lower levels so that programs at the upper level need only be written once and will be able to run on a wide range of different machines. A disadvantage is that you may lose efficiency as calls to the lower level functions must progress through the different layers and be translated at each step.

5 On average, semiconductor memory is 34,286 times faster than the hard drive.

7 Convert the following hexadecimal numbers to decimal:


1 The AND circuit becomes an OR circuit and the OR circuit becomes an AND circuit

3

5 The truth table is shown on the right

7 The circuit is shown below:

b

Chapter 2: Solutions for Odd-Numbered Problems


1 The truth table and K-maps are shown below:

A  B  Cin  SUM  Cout
0  0  0    0    0
0  0  1    1    0
0  1  0    1    0
0  1  1    0    1
1  0  0    1    0
1  0  1    0    1
1  1  0    0    1
1  1  1    1    1

SUM = Cin ⊕ [A ⊕ B]

We can use the Karnaugh map to simplify the logic for Cout. There are three loops:

Cout = B * Cin + A * Cin + A * B

Following is the logic circuitry for SUM and Cout. (Circuit: two XOR gates generate SUM from A, B, and Cin.)
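The two equations can be checked exhaustively against ordinary binary addition. The sketch below is a verification aid rather than part of the original solution, and the function name is illustrative.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int):
    """SUM and Cout computed from the two equations in the solution."""
    total = cin ^ (a ^ b)                   # SUM = Cin xor (A xor B)
    cout = (b & cin) | (a & cin) | (a & b)  # Cout = B*Cin + A*Cin + A*B
    return total, cout


# Check all eight input combinations against arithmetic addition.
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert 2 * c + s == a + b + cin
    print(a, b, cin, s, c)
```

Each printed row is one line of the full adder truth table, in the column order A, B, Cin, SUM, Cout.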


Appendix A


3 Assume that at T = 0 the logic level changes from 0 to 1, as shown above. We can see that as the change propagates through each gate an additional 10 ns delay is introduced. When the signal gets to point A, 50 ns later, it puts the opposite polarity signal on the first gate and the sequence starts over again in the opposite direction. At T = 100 ns the situation is the same as at T = 0, but 100 ns have elapsed. Thus, the circuit oscillates with a period of 100 ns. Therefore, the frequency at point A is 10 MHz.

The waveform seen at point A:

(Waveform at point A: the level toggles every 50 ns, giving a period of 100 nsec.)
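The period and frequency follow directly from the gate delays. In this sketch the gate count of five is inferred from the 50 ns loop delay at 10 ns per gate, so treat it as an assumption; the function name is illustrative.

```python
def ring_oscillator(num_gates: int, gate_delay_ns: float):
    """Return (period in ns, frequency in MHz) for a loop of inverting gates."""
    half_period_ns = num_gates * gate_delay_ns  # one trip around the loop
    period_ns = 2 * half_period_ns              # two trips restore the start level
    freq_mhz = 1000.0 / period_ns               # a 1000 ns period is 1 MHz
    return period_ns, freq_mhz


period_ns, freq_mhz = ring_oscillator(num_gates=5, gate_delay_ns=10.0)
print(period_ns, freq_mhz)  # 100.0 10.0
```

The signal must travel around the loop twice to return to its starting level, which is why the period is twice the total loop delay.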

5 The truth tables and K-maps are shown below:

The simplified equations are:

(K-maps, including the K-map for X, with columns indexed by AB and rows by CD.)


7 Let’s walk through the logic of the solution. The pump motor logic is designed so that a low temperature does not automatically start the pump motor and the heater. Another possible interpretation is that a low temperature would automatically start the pump motor and the heater. The circuitry for the pump shows both options for the solution.

a Pump motor: The pump motor is on (f = 1) when the timer (B) is on OR the manual switch (F) is on, AND the key switch (E) is on. Note that in the alternative solution the temperature being low can also turn on the pump, so we’ve added a term to account for that case.

b Heater: The heater should go on (h = 1) when the temperature sensor (A) indicates that the temperature of the water is below the set temperature on the control panel. We also have the practical consideration that the heater shouldn’t be turned on unless the pump is also operating. It could be dangerous if the water isn’t flowing while it is being heated. The solution is shown in the circuit diagram for the heater, h.

Thus, in the above circuit there are three AND conditions for the heater to be turned on:

1 The key switch (E) must be enabled,

2 The pump must be on (B + F),

3 The temperature is low (A).

The alternative solution leads to a simpler arrangement. Only the key switch AND low temperature are required to turn on the heater. We don’t have to worry about the pump because A also turns it on.

c Blower: The air blower (g) is pretty simple. The key switch must be on (E = 1) AND the blower switch must be on (D = 1) to turn on the soothing bubbles after a hard day of solving homework problem sets. The solution is shown, right:

(Circuit diagrams: pump motor f, solution and alternative solution; heater, solution and alternative solution; blower g, an AND gate with inputs E and D.)
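The three outputs can be written as Boolean functions and checked over all input combinations. This sketch follows the wording of the solution; it assumes A = 1 means the temperature is low, and the form of the alternative pump equation (adding A as an extra OR term) is an assumption, since the text only says that a term was added. Function names are illustrative.

```python
from itertools import product


def pump_on(b_timer, f_manual, e_key):
    """f = (B OR F) AND E"""
    return bool((b_timer or f_manual) and e_key)


def heater_on(a_temp_low, b_timer, f_manual, e_key):
    """h = E AND (B OR F) AND A"""
    return bool(e_key and (b_timer or f_manual) and a_temp_low)


def blower_on(d_blower, e_key):
    """g = E AND D"""
    return bool(e_key and d_blower)


def pump_on_alt(a_temp_low, b_timer, f_manual, e_key):
    """Alternative pump, assumed form: f = (B OR F OR A) AND E"""
    return bool((b_timer or f_manual or a_temp_low) and e_key)


def heater_on_alt(a_temp_low, e_key):
    """Alternative heater: h = E AND A"""
    return bool(e_key and a_temp_low)


# Safety check: in both solutions the heater never runs without the pump.
for a, b, f, e in product((False, True), repeat=4):
    if heater_on(a, b, f, e):
        assert pump_on(b, f, e)
    if heater_on_alt(a, e):
        assert pump_on_alt(a, b, f, e)
print("heater implies pump in both solutions")
```

The exhaustive loop confirms the practical consideration from part b: whenever the heater output is 1, the pump output is also 1.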


9 The circuit is shown below:


1 Following is the state machine diagram:

(State diagram with states 100, 011, 111, 010, and 101.)

3 The table is shown below:

5 The table is shown below. The pattern repeats itself after six clock pulses.

(Table columns: BEFORE PULSE, AFTER PULSE.)


1 The truth table is shown below. The state diagram is shown to the right:

(State diagram with four states, whose outputs are: Aout = 1, Bout = 0; Aout = 0, Bout = 0; Aout = 0, Bout = 1; Aout = 1, Bout = 1.)


5 We have four states, S0 through S3, so we need two variables, X and Y, to provide the outputs to the register and to provide two inputs to the truth table. Thus, we can make the following assertions:

Now, assume that we’re in state S0 (S0 → X = 0, Y = 0). The possibilities are:

1 No coin is deposited, stay in S0

2 A dime is deposited (a = 0, b = 1), transition to state S1

3 A quarter is deposited (a = 1, b = 0), transition to state S3

We can express this condition as follows:

Next, assume that we’re in state S1. The possibilities are:

1 No coin is deposited, it stays in S1

2 A dime is deposited, it transitions to S2

3 A quarter is deposited, it returns to S0 and dispenses the merchandise

We can show this as the following conditions:

Next, assume that we’re in state S2. The possibilities are:

1 No coin is deposited, it stays in S2

2 A dime is deposited, it transitions to S0 and dispenses merchandise

3 A quarter is deposited, it returns to S0 and dispenses the merchandise

We can show this as the following conditions:
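The state transitions listed for S0, S1, and S2 can be collected into a table-driven sketch. The rows for S3 are not given in this excerpt; they are assumed here on the reasoning that S3 already holds a quarter, so any further coin should dispense. The names below are illustrative.

```python
# (state, coin) -> (next_state, dispense)
TRANSITIONS = {
    ("S0", "none"): ("S0", False),
    ("S0", "dime"): ("S1", False),
    ("S0", "quarter"): ("S3", False),
    ("S1", "none"): ("S1", False),
    ("S1", "dime"): ("S2", False),
    ("S1", "quarter"): ("S0", True),
    ("S2", "none"): ("S2", False),
    ("S2", "dime"): ("S0", True),
    ("S2", "quarter"): ("S0", True),
    # Assumed rows: S3 is not described in the excerpt.
    ("S3", "none"): ("S3", False),
    ("S3", "dime"): ("S0", True),
    ("S3", "quarter"): ("S0", True),
}


def run(coins, state="S0"):
    """Feed a sequence of coins to the machine; return (final state, items dispensed)."""
    dispensed = 0
    for coin in coins:
        state, dispense = TRANSITIONS[(state, coin)]
        dispensed += int(dispense)
    return state, dispensed


print(run(["dime", "dime", "dime"]))  # ('S0', 1)
print(run(["dime", "quarter"]))       # ('S0', 1)
print(run(["quarter"]))               # ('S3', 0)
```

A lookup table like this maps directly onto the truth table of the hardware solution: the state and coin inputs select a row, and the row supplies the next state and the dispense output.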

Title	Local Clocks
School	Unknown University
Major	Hardware and Computer Organization
Type	Lecture Notes
Year of publication	2004
City	Unknown City
