[…] so, and that 10 GHz is not far behind.
A 5 GHz clock rate corresponds to a clock period of 200 picoseconds (ps). Since the speed of light is roughly 12 inches per nanosecond in free space, and a signal travels at about 6 inches per nanosecond through a wire, a signal can travel only about 1.2 inches along a wire in 200 ps. A modern microprocessor is about ¾ of an inch on a side, so about 62% of the clock period will be wasted just getting the clock signal from one edge of the chip to the other. Since our microprocessor is a fully synchronous machine, this is a very serious problem. We call this problem clock skew. Clock skew is simply the difference in time between corresponding portions of the clock (a phase difference) that arises because the clock cannot be distributed to all portions of the chip simultaneously. In Teramac, clock skew was a major design issue that had to be factored into all elements of the machine design. Also, the original Cray supercomputer controlled clock skew by adjusting the lengths of the coaxial cables carrying the clock to the various circuit boards in the machine.
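The arithmetic in this paragraph can be checked with a short sketch. The function names are illustrative only; the wire speed of 6 inches per nanosecond and the ¾ inch die size are the figures quoted in the text.

```python
# Signal speed through a wire, as quoted in the text (about half the
# free-space speed of light of roughly 12 inches per nanosecond).
WIRE_SPEED_IN_PER_NS = 6.0


def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def crossing_fraction(freq_ghz: float, chip_side_in: float) -> float:
    """Fraction of one clock period a signal needs to cross the chip."""
    period_ns = clock_period_ps(freq_ghz) / 1000.0
    reach_in = WIRE_SPEED_IN_PER_NS * period_ns  # distance covered in one period
    return chip_side_in / reach_in


period = clock_period_ps(5.0)
fraction = crossing_fraction(5.0, 0.75)
print(f"{period:.0f} ps period, {fraction:.1%} of it spent crossing the chip")
```

The computed fraction is 0.625, which the text rounds to 62%.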
Another potential problem is that all transistors don’t switch in exactly the same way. There can be slight differences in the switching characteristics of the clock circuitry at various portions of the chip. Measurements have shown these differences in switching characteristics to be as large as about 180 ps [17]. Thus, as chips get bigger and faster, keeping the clock uniformly distributed across the chip becomes more problematic.
Today, most clock distribution networks are hierarchical. Figure 16.11 shows a typical clock distribution network. The circuit block labeled phase-locked loop (PLL) represents the method used in modern computers to multiply the internal clock frequency to a higher value than the external clock input. For example, if your external clock frequency is 200 MHz, the multiplier value, which you might set in the BIOS or which might be locked into the chip, could be a factor of 11. Thus, the internal clock frequency is 2200 MHz, or 2.2 GHz. As you can see, simple variations in IC process parameters could lead to clock skew problems as the clock is distributed to all of the synchronous circuitry on the chip.
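As a quick check of the multiplier example, the following sketch computes the internal clock frequency from the external clock and the PLL multiplier. The function name is hypothetical, and the PLL is treated as an ideal frequency multiplier.

```python
def internal_clock_mhz(external_mhz: float, multiplier: float) -> float:
    """Internal clock frequency produced by an ideal PLL multiplier."""
    return external_mhz * multiplier


core_mhz = internal_clock_mhz(200, 11)
print(f"{core_mhz:.0f} MHz = {core_mhz / 1000:.1f} GHz")
```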
Future Trends and Reconfigurable Hardware

Figure 16.11: Synchronous clock distribution network. [Labels shown: external clock input, phase-locked loop (PLL), global clocks, major clocks, local clocks]

Recall that the modern processor is a pipeline-driven device with different combinatorial logic circuits functioning within the various stages of the pipe. All of the stages are driven from the same synchronous clock, as shown in Figure 16.12. Here we can see the reason why limiting clock skew is so critical. Each stage of the pipeline must complete its work before the clock arrives to latch the result into the next stage of the pipeline. The combinatorial logic within each pipeline stage depends upon the time budget it has to complete its work before the next clock edge comes along. Skewing of the clock edges means that some pipeline stages will be clocked sooner than others, destroying the synchronicity of the pipeline.
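The time budget idea can be made concrete with a minimal sketch. It assumes a simplified model in which the clock period must cover the logic delay of a stage, a fixed safety margin, and whatever clock skew is present. The function names and all of the numbers are hypothetical; only the 6 inches per nanosecond wire speed comes from the text.

```python
WIRE_SPEED_IN_PER_NS = 6.0  # signal speed through a wire, from the text


def skew_budget_ps(clock_ghz: float, gate_delay_ps: float,
                   logic_levels: int, margin_ps: float) -> float:
    """Time left over for clock skew in one pipeline stage (simplified model)."""
    period_ps = 1000.0 / clock_ghz
    logic_ps = gate_delay_ps * logic_levels  # worst-case path through the stage
    return period_ps - logic_ps - margin_ps


def path_difference_in(skew_ps: float) -> float:
    """Difference in clock path length that produces the given skew."""
    return (skew_ps / 1000.0) * WIRE_SPEED_IN_PER_NS


# Hypothetical design: 2 GHz clock, 60 ps gates, 6 logic levels, 50 ps margin.
budget = skew_budget_ps(2.0, 60.0, 6, 50.0)
print(f"skew budget: {budget:.0f} ps, "
      f"path difference: {path_difference_in(budget):.2f} in")
```

A negative budget would mean that the stage cannot meet timing at that clock rate even with a perfectly distributed clock.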
Figure 16.12: Pipeline with a synchronous clock. [Labels shown: three blocks of combinatorial logic, three D registers, clock]

Now, let’s modify the pipeline, as shown in Figure 16.13. The system clock is used to drive local clock controllers for each stage of the pipeline. However, each pipeline stage is autonomous, and its local clock is not synchronized with the clock of either the previous stage or the next stage of the pipeline.

When the combinatorial logic of a particular stage has completed its work, the stage logic outputs a request to the local clock controller to latch the result into the D register that feeds the next stage. When the data is latched into the input register for the next stage, the local clock controller issues an acknowledge signal to the next stage, indicating that valid data is now available to work with. The net effect is that we’ve created a pipeline with handshake control between the stages. Each stage must request a data transfer, and the latch mechanism responds with an acknowledgement of the transfer to the next stage.

Figure 16.13: Pipeline with an asynchronous clocking architecture. [Labels shown: three blocks of combinatorial logic, each with request and acknowledge signals]

The drawback of this scheme is that, because the local clocks are not synchronized, the handshake […] the pipe could easily propagate back and stall the pipe. However, the advantages of such a scheme could far outweigh the disadvantages when we are asking our processors to run at clock speeds in excess of 10 GHz. Given that we may still be able to build digital logic circuits capable of running at such high clock rates, local clocking of the system is probably the only solution.
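The handshake behavior, including the way a slow stage can stall the stages behind it, can be illustrated with a small timing model. This is only a sketch under stated assumptions, not the circuit of Figure 16.13: each stage has a single input register, a transfer happens as soon as the sending stage has finished (request) and the receiving register is free (acknowledge), and the function name and latency values are hypothetical.

```python
def handshake_schedule(latency):
    """Return enter[i][s], the time item i is latched into stage s.

    latency[i][s] is the time stage s needs to process item i.
    Index s == number of stages means the item has left the pipeline.
    """
    n_items, n_stages = len(latency), len(latency[0])
    enter = [[0] * (n_stages + 1) for _ in range(n_items)]
    for i in range(n_items):
        for s in range(n_stages + 1):
            # Request: the previous stage has finished its logic for this item.
            ready = 0 if s == 0 else enter[i][s - 1] + latency[i][s - 1]
            # Acknowledge: the register is free once the earlier item moved on.
            if i == 0 or s == n_stages:
                free = 0
            else:
                free = enter[i - 1][s + 1]
            enter[i][s] = max(ready, free)
    return enter


# Three items, three stages; the first item is slow in the middle stage.
times = handshake_schedule([[10, 50, 10], [10, 10, 10], [10, 10, 10]])
for i, row in enumerate(times):
    print(f"item {i}: enters stages at {row[:-1]}, leaves at {row[-1]}")
```

In this model the third item cannot enter the first stage until time 60, even though that stage needs only 10 time units per item, because the slow middle stage holds up everything behind it.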
This raises an interesting question: “Why use clocks at all?” Can we build a completely asynchronous (clockless) computer? According to Marculescu et al. [17], fully asynchronous designs are probably still a ways away. The computer-aided design (CAD) tools used for design and verification of modern processors still have not reached a level of sophistication that would allow them to deal with a fully asynchronous design. Also, there’s the problem of inertia: we just don’t design computers this way. However, the local clock remains a viable compromise to the problem of clock skew.
Several start-up companies have already formed to exploit the idea of a fully asynchronous microprocessor design. Fulcrum Microsystems [18] grew out of work done at Caltech. Figure 16.14 illustrates one of the potential advantages of asynchronous processors.
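The comparison in Figure 16.14 can be expressed as simple arithmetic. In this sketch the clocked cycle time is taken to be the sum of the components named in the figure, and the clockless cycle time is taken to be the logic time alone. The function names and every number below are hypothetical and chosen only for illustration.

```python
def clocked_cycle_ps(logic_ps: float, worst_minus_avg_ps: float,
                     jitter_skew_ps: float, manufacturing_ps: float) -> float:
    """Cycle time of clocked logic: logic time plus all of the margins."""
    return logic_ps + worst_minus_avg_ps + jitter_skew_ps + manufacturing_ps


def clockless_cycle_ps(logic_ps: float) -> float:
    """Cycle time of clockless logic: the logic time itself."""
    return logic_ps


# Hypothetical values in picoseconds.
clocked = clocked_cycle_ps(100.0, 40.0, 30.0, 30.0)
clockless = clockless_cycle_ps(100.0)
print(f"clocked: {clocked:.0f} ps, clockless: {clockless:.0f} ps")
```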
With an asynchronous system, the data in the pipeline flows through at its own rate. Additional circuitry is needed to prevent the runaway condition that clocks and registers are used to prevent in traditional clocked microprocessor systems. This concept is similar to the use of local clocks, but in this case, additional logic is necessary to detect when a stage has completed its work so that the next stage in the pipeline may be enabled. This is shown in Figure 16.15.
Summary of Chapter 16
In Chapter 16, we covered:
• The architecture of programmable logic devices
• The architecture of field programmable gate arrays
• The development of reconfigurable computing machines based upon arrays of field programmable gate arrays
• Future trends in molecular computing, local clocks and clockless computers
Figure 16.14: Advantage of clockless logic over traditionally clocked logic. Courtesy of Fulcrum Microsystems. [Labels shown: cycle time of clocked logic (logic time; worst case − average case logic execution time; clock jitter and skew margin; manufacturing margin) and cycle time of clockless logic (logic time)]

Figure 16.15: Clockless pipeline. Courtesy of Fulcrum Microsystems. [Labels shown: stages A, B, and C, each built from dual-rail domino logic; input completion detection; output completion detection]
5. “Inside Intel: It’s Moving at Double-Time to Head Off Competitors,” Business Week, June 1, 1992.
6. Greg Snider, Philip Kuekes, W. Bruce Culbertson, Richard J. Carter, Arnold S. Berger, Rick Amerson, The Teramac Configurable Computer Engine, Proceedings of the 5th International Workshop on Field-Programmable Logic and Applications, edited by Will Moore and Wayne Luk, Oxford, UK, September 1995, p. 44.
7. B.S. Landman and R.L. Russo, IEEE Trans. Comp., C20, 1469, 1971.
8. Rick Amerson, Richard J. Carter, W. Bruce Culbertson, Philip Kuekes, Greg Snider, Lyle Albertson, Plasma: An FPGA for Million Gate Systems, FPGA ’96: Proceedings of the 1996 Fourth International Symposium on Field Programmable Gate Arrays, February 11-13, 1996, Monterey, CA, USA, ACM, 1996, pp. 10–16.
9. B. Culbertson, R. Amerson, R. Carter, P. Kuekes, G. Snider, The Teramac Custom Computer: Extending the limits with defect tolerance, IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, November 1996.
10. Barry Shakleford, HP Labs, Private Communication.
16. Mark A. Reed and James M. Tour, Computing with Molecules, Scientific American, June 2000, p. 89.
17. Diana Marculescu, Dave Albonesi, Alper Buyuktosunoglu, Tutorial: Partially Asynchronous Microprocessors, Micro-35, Istanbul, Turkey, Nov. 18, 2002.
18. http://www.fulcrummicro.com.
1. Consider the circuit for a portion of a PLD as shown below. A fuse that is “blown” is indicated by a solid black interconnect circle, and a connection is indicated by an open white circle. Make a copy of the diagram and “program” the device by filling in the interconnect circles of the fuses that you want to blow. Program the logical equation:
[Figure: PLD circuit with inputs A and B]
4. Suppose that you want to design a synchronous CPU with a 10 GHz clock rate. The worst case propagation delay through the logic gates is 28 picoseconds. No stage of the pipeline has more than three levels of logic circuitry. You also need to maintain a safety margin of 10 picoseconds to allow for manufacturing uncertainties, device set-up times, and differences between the switching characteristics of the devices in the circuitry. Approximately what is the largest difference in the length of the clock paths that this design can tolerate?
[…] Thus, there are two effects going on. Computers can achieve higher performance in areas such as bus bandwidth and complexity because we can take advantage of the number of circuits we can place on a single die. Also, these complex designs can run faster. Finally, complex circuit designs allow even more complex software applications to run because we have memories with higher speed and capacity to implement the algorithms.
3. An advantage of the abstraction layer concept is that you can hide the details and differences of the lower levels, so that programs at the upper level need only be written once and will be able to run on a wide range of different machines. A disadvantage is that you may lose efficiency, as calls to the lower level functions must progress through the different layers and be translated at each step.
5. On average, semiconductor memory is 34,286 times faster than the hard drive.
7. Convert the following hexadecimal numbers to decimal:
Chapter 2: Solutions for Odd-Numbered Problems
1. The AND circuit becomes an OR circuit and the OR circuit becomes an AND circuit.
3.
5. The truth table is shown on the right.
7. The circuit is shown below:
Chapter 3: Solutions for Odd-Numbered Problems
1. The truth table and K-maps are shown below:
[Truth table with columns A, B, Cin, SUM, Cout]
SUM = Cin ⊕ [A ⊕ B]
We can use the Karnaugh map to simplify the logic for Cout. There are three loops:
Cout = B * Cin + A * Cin + A * B
Following is the logic circuitry for SUM and Cout:
[Figure: logic circuit with inputs A, B, and Cin, using two XOR gates to form SUM]
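The two equations for SUM and Cout can be checked against binary addition with a short sketch. The function name is illustrative; the expressions are the ones given in the solution.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (SUM, Cout) using the equations from the solution."""
    total = cin ^ (a ^ b)                    # SUM = Cin XOR [A XOR B]
    cout = (b & cin) | (a & cin) | (a & b)   # Cout = B*Cin + A*Cin + A*B
    return total, cout


print("A B Cin SUM Cout")
for a, b, cin in product((0, 1), repeat=3):
    total, cout = full_adder(a, b, cin)
    # The two output bits must encode the arithmetic sum of the inputs.
    assert a + b + cin == 2 * cout + total
    print(a, b, cin, " ", total, "  ", cout)
```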
Appendix A
3. Assume that at T = 0 the logic level changes from 0 to 1, as shown above. We can see that as the change propagates through each gate, an additional 10 ns delay is introduced. When the signal gets to point A, 50 ns later, it puts the opposite polarity signal on the first gate and the sequence starts over again in the opposite direction. At T = 100 ns the situation is the same as at T = 0, but 100 ns have elapsed. Thus, the circuit oscillates with a period of 100 ns. Therefore, the frequency at point A is 10 MHz.
The waveform seen at point A is:
[Figure: waveform at point A, with a period of 100 ns]
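The period and frequency follow from the gate count and the gate delay. In the sketch below, the count of five gates is inferred from the 50 ns total delay at 10 ns per gate and is an assumption, as is the function name.

```python
def ring_oscillator(n_gates: int, gate_delay_ns: float) -> tuple[float, float]:
    """Return (period in ns, frequency in MHz) of a ring of inverting gates."""
    half_period_ns = n_gates * gate_delay_ns  # one trip around the loop
    period_ns = 2 * half_period_ns            # two trips restore the original level
    return period_ns, 1000.0 / period_ns


period_ns, freq_mhz = ring_oscillator(5, 10.0)
print(f"period {period_ns:.0f} ns, frequency {freq_mhz:.0f} MHz")
```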
5. The truth tables and K-maps are shown below:
[Figures: K-maps over the variables AB and CD, including the K-map for X]
The simplified equations are:
7. Let’s walk through the logic of the solution. The pump motor logic is designed so that a low temperature does not automatically start the pump motor and the heater. Another possible interpretation is that a low temperature would automatically start the pump motor and the heater. The circuitry for the pump shows both options for the solution.
a. Pump motor: The pump motor is on (f = 1) when the timer (B) is on OR the manual switch (F) is on, AND the key switch (E) is on. Note that in the alternative solution the temperature being low can also turn on the pump, so we’ve added a term to account for that case.
b. Heater: The heater should go on (h = 1) when the temperature sensor (A) indicates that the temperature of the water is below the set temperature on the control panel. We also have the practical consideration that the heater shouldn’t be turned on unless the pump is also operating. Heating the water while it isn’t flowing could be dangerous. The solution is shown in the circuit diagram for the heater, h.
Thus, in the above circuit there are three AND conditions for the heater to be turned on:
1. The key switch (E) must be enabled,
2. The pump must be on (B + F),
3. The temperature is low (A).
The alternative solution leads to a simpler arrangement. Only the key switch AND low temperature are required to turn on the heater. We don’t have to worry about the pump because A also turns it on.
c. Blower: The air blower (g) is pretty simple. The key switch must be on (E = 1) AND the blower switch must be on (D = 1) to turn on the soothing bubbles after a hard day of solving homework problem sets. The solution is shown at right:
[Figures: circuit diagrams for the pump motor (f) and the heater, each showing the solution and the alternative solution, and for the blower (g), an AND of E and D]
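The three control equations described in this solution can be written as Boolean functions. This is a sketch under stated assumptions: A is taken to be true when the water temperature is below the set point, the alternative pump equation is assumed to add A to the OR term while keeping the key switch as an enable, and the function names are hypothetical. The original circuit diagrams are not reproduced here, so the signal polarities may differ from them.

```python
from itertools import product


def pump(B: bool, F: bool, E: bool, A: bool = False,
         alternative: bool = False) -> bool:
    """Pump motor f: timer OR manual switch, gated by the key switch."""
    demand = B or F
    if alternative:
        demand = demand or A  # assumed: low temperature also starts the pump
    return demand and E


def heater(A: bool, B: bool, F: bool, E: bool,
           alternative: bool = False) -> bool:
    """Heater h: key switch AND low temperature (AND pump on, unless alternative)."""
    if alternative:
        return E and A
    return E and (B or F) and A


def blower(D: bool, E: bool) -> bool:
    """Blower g: key switch AND blower switch."""
    return E and D


# In both versions the heater is never on while the pump is off.
for A, B, F, E in product((False, True), repeat=4):
    for alt in (False, True):
        if heater(A, B, F, E, alternative=alt):
            assert pump(B, F, E, A, alternative=alt)
print("heater on implies pump on in both solutions")
```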
9. The circuit is shown below:
[Figure: circuit with inputs A, B, C, and D]
Chapter 4: Solutions for Odd-Numbered Problems
1. Following is the state machine diagram:
[Figure: state machine diagram with states 100, 011, 111, 010, and 101]
3. The table is shown below:
5. The table is shown below. The pattern repeats itself after six clock pulses.
[Table with columns BEFORE PULSE and AFTER PULSE]
[Figure: flip-flop circuit with J, K, and Clock inputs]
Chapter 5: Solutions for Odd-Numbered Problems
1. The truth table is shown below. The state diagram is shown to the right:
[Figure: state diagram with four states, labeled Aout = 1, Bout = 0; Aout = 0, Bout = 0; Aout = 0, Bout = 1; and Aout = 1, Bout = 1]
5. We have four states, S0 through S3, so we need two variables, X and Y, to provide the outputs to the register and to provide two inputs to the truth table. Thus, we can make the following assertions:
Now, assume that we’re in state S0 (S0 → X = 0, Y = 0). The possibilities are:
1. No coin is deposited; stay in S0.
2. A dime is deposited (a = 0, b = 1); transition to state S1.
3. A quarter is deposited (a = 1, b = 0); transition to state S3.
We can express this condition as follows:
Now, assume that we’re in state S1 (S1 → X = 1, Y = 0). The possibilities are:
1. No coin is deposited; it stays in S1.
2. A dime is deposited; it transitions to S2.
3. A quarter is deposited; it returns to S0 and dispenses the merchandise.
We can show this as the following conditions:
Now, assume that we’re in state S2 (S2 → X = 0, Y = 1). The possibilities are:
1. No coin is deposited; it stays in S2.
2. A dime is deposited; it transitions to S0 and dispenses merchandise.
3. A quarter is deposited; it returns to S0 and dispenses the merchandise.
We can show this as the following conditions:
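The transitions listed so far for states S0, S1, and S2 can be collected into a table and stepped through in software. This is a sketch rather than the register-level solution: the names are hypothetical, and the transitions out of S3 are not listed above, so they are deliberately left out.

```python
# (state, coin) -> (next state, dispense?), taken from the lists above.
# Transitions out of S3 are not covered here.
TRANSITIONS = {
    ("S0", "none"): ("S0", False),
    ("S0", "dime"): ("S1", False),
    ("S0", "quarter"): ("S3", False),
    ("S1", "none"): ("S1", False),
    ("S1", "dime"): ("S2", False),
    ("S1", "quarter"): ("S0", True),
    ("S2", "none"): ("S2", False),
    ("S2", "dime"): ("S0", True),
    ("S2", "quarter"): ("S0", True),
}


def run(coins, state="S0"):
    """Apply a sequence of coins; return (final state, items dispensed)."""
    dispensed = 0
    for coin in coins:
        state, vend = TRANSITIONS[(state, coin)]
        dispensed += int(vend)
    return state, dispensed


print(run(["dime", "dime", "dime"]))
print(run(["dime", "none", "quarter"]))
```

A sequence that reaches S3 and then deposits another coin raises a KeyError in this sketch, because those transitions are not defined in the table.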