Hardware and Computer Organization - P17



Chapter 9: Solutions for Odd-Numbered Problems

1 This is an example of the addressing mode known as “address register indirect with index and displacement.” The effective address is the sum of the address value in A0, the index value in D0, and the 2’s complement displacement. Since $84 is a negative 8-bit number, its value is –$7C. Thus, the effective address is EA = $2000 + $0400 + (–$7C).

EA = $2384

The program is not relocatable for two reasons:

1 There is a jump to an absolute address, start.

2 An absolute address is loaded into A0. The program could still be made relocatable by managing what gets loaded into A0 and D0, but the jump instruction forces it to be absolute.
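The effective-address arithmetic in problem 1 can be checked with a short Python sketch. It assumes the 8-bit displacement is sign-extended, as the 68K does for this mode, and treats the index in D0 as a full 32-bit value; the function names are illustrative only.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(an: int, xn: int, disp8: int) -> int:
    """Address register indirect with index and displacement:
    EA = An + Xn + sign-extended 8-bit displacement (32-bit wraparound).
    Assumption: the index register is used as a full 32-bit value."""
    return (an + xn + sign_extend(disp8, 8)) & 0xFFFFFFFF


print(hex(sign_extend(0x84, 8)))                      # -0x7c
print(hex(effective_address(0x2000, 0x0400, 0x84)))   # 0x2384
```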

3 The value in D0 after the highlighted instruction is $0000002A.

5
00000400  067955550000AAAA      ADDI.W #$5555,$0000AAAA
00000408  06B9AAAA55550000FFFE  ADDI.L #$AAAA5555,$0000FFFE
00000412  0640AAAA              ADDI.W #$AAAA,D0
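The machine code in this listing follows the ADDI opcode layout 0000 0110 SS MMM RRR (SS = size, MMM/RRR = destination mode and register), followed by the immediate data and then any destination extension words. The Python sketch below regenerates the listing. It handles only the two destination modes used here (data register direct and absolute long), and `encode_addi` is a hypothetical helper, not part of any assembler.

```python
SIZE_BITS = {"B": 0b00, "W": 0b01, "L": 0b10}


def encode_addi(size: str, imm: int, dest: tuple) -> str:
    """Return the machine code for ADDI.<size> #imm,<dest> as a hex string.

    dest is ("D", n) for data register Dn, or ("abs.L", address) for
    absolute long addressing; other modes are outside this sketch.
    """
    if dest[0] == "D":
        mode, reg, ea_ext = 0b000, dest[1], ""
    elif dest[0] == "abs.L":
        mode, reg, ea_ext = 0b111, 0b001, f"{dest[1] & 0xFFFFFFFF:08X}"
    else:
        raise ValueError(f"unsupported destination {dest!r}")
    opcode = 0x0600 | (SIZE_BITS[size] << 6) | (mode << 3) | reg
    # Immediate data: one extension word for .B/.W, two words for .L
    if size == "L":
        imm_ext = f"{imm & 0xFFFFFFFF:08X}"
    else:
        mask = 0xFF if size == "B" else 0xFFFF
        imm_ext = f"{imm & mask:04X}"
    return f"{opcode:04X}{imm_ext}{ea_ext}"


program = [
    ("W", 0x5555, ("abs.L", 0x0000AAAA)),
    ("L", 0xAAAA5555, ("abs.L", 0x0000FFFE)),
    ("W", 0xAAAA, ("D", 0)),
]
address = 0x400
for size, imm, dest in program:
    code = encode_addi(size, imm, dest)
    print(f"{address:08X} {code}")
    address += len(code) // 2  # two hex digits per byte
```

The instruction lengths (8, 10 and 4 bytes) fall out of the extension words, which is why the addresses advance from 400 to 408 to 412.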


Appendix A


* Main Program

one

MOVE.W str_len(PC),D1


************************************************************************
* Subroutine: do_test
*
* Performs the actual memory test. Fills
* the memory with the test pattern of interest.
* Registers used: D1,A0,A1,A2
* Return values: None
* Registers saved: None
* Input parameters:
* D0.W = test pattern
* A4.L = Points to memory location to save the count of bad addresses
* A5.L = Points to memory location to save the last bad address found
* A6.L = Points to memory location to save the data read back and data
* written
*
* Assumptions: Saves all registers used internally
************************************************************************


MOVEM.L (SP)+,D1/A0-A2 * Restore registers

* Data Space


str_len DC.W str_len-string

last_addr DS.W 1

END start


1 <0C0020h> = 15C7h. The word is aligned, since 0C0020h is an even address.

MOV AX,8200H ;Get segment value
MOV DS,AX ;Load data segment register
MOV SI,0000 ;Load source index register
MOV DI,0200H ;Load destination index register
MOV CX,1000 ;Load counter
loader:
MOV AL,[SI] ;Get byte
MOV [DI],AL ;Store byte
INC SI ;advance pointers
INC DI
DEC CX ;count down
JNZ loader ;repeat until CX = 0
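The loop can be modeled in Python to show how real-mode addressing forms each physical address as segment × 16 + offset. This is a sketch under simplifying assumptions: memory is a flat 1 MB bytearray, both pointers use the data segment, and the helper names are hypothetical.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def copy_loop(memory: bytearray, ds: int, si: int, di: int, cx: int) -> None:
    """Mirror the `loader` loop: copy bytes from DS:SI to DS:DI until CX hits 0."""
    while True:
        al = memory[physical_address(ds, si)]   # MOV AL,[SI]
        memory[physical_address(ds, di)] = al   # MOV [DI],AL
        si = (si + 1) & 0xFFFF                  # INC SI
        di = (di + 1) & 0xFFFF                  # INC DI
        cx = (cx - 1) & 0xFFFF                  # DEC CX
        if cx == 0:                             # JNZ loader falls through
            break


memory = bytearray(1 << 20)
src = physical_address(0x8200, 0x0000)          # 82000H
memory[src:src + 16] = bytes(range(16))
copy_loop(memory, ds=0x8200, si=0x0000, di=0x0200, cx=16)
print(hex(physical_address(0x8200, 0x0200)), list(memory[0x82200:0x82210]))
```

In this model, as on the processor, a forward byte copy whose count exceeds the 200H gap between SI and DI re-reads bytes it has already overwritten; the small count in the usage example stays clear of that.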


1 The 68K has two operational modes, user and supervisor. The ARM architecture allows for 7 operational modes. User mode is the lowest privilege level. The other modes are: System, Supervisor, Abort, Fast Interrupt Request, Interrupt Request and Undefined.

3 The biggest difference is that, with the exception of registers r13-r15 (the stack pointer, link register and program counter), all registers are completely general-purpose. Any register may be used as part of an arithmetic operation or as an address pointer. This is in sharp contrast to the distinction that the 68K architecture makes between the address registers, A0-A6, and the data registers, D0-D7.

5 MOV r4,#&100

ORR r4,r4,#3

7 <r11> = &0013E94C

9 If the Z flag = 0, then the value in register r1, &DEF02340, is incremented by 4 to &DEF02344, and that value is used as an address pointer to retrieve the 16-bit data object stored in that memory location. The 16-bit value is then loaded into general-purpose register r4. If the Z flag = 1, then the instruction is not executed.


* Input register list:
* A6- Pointer to the data string to be sent.
* Return register list:
* A6- Pointer to the character after the string terminating
* character.
* Register usage: All registers used by xmitStr will be saved and
* restored upon exit.
* Subroutine starts here


3 A successive approximation converter always takes the same number of clock cycles to digitize the unknown signal. Since it has 16 bits of resolution, it takes 16 clock cycles. Since the clock rate is 1 MHz, it takes 16 microseconds to do the digitization.

The single ramp A/D must count up to the unknown voltage. Therefore, we need to determine how many counts it takes to get to 1.5001 volts. The range of a 16-bit converter is 0 through 65,535 ($0000 to $FFFF in hexadecimal). Thus, the minimum voltage increment of the A/D converter is 0.0001 volts, and it takes 1.5001 / 0.0001 = 15,001 clock cycles, or 15,001 microseconds, to digitize the unknown voltage.
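The contrast between the two converters can be captured in a few lines of Python: the successive approximation time depends only on the number of bits, while the single-ramp time depends on the input voltage. The 0.0001 V step and 1 MHz clock are taken from the answer above; the function names are illustrative.

```python
CLOCK_HZ = 1_000_000   # 1 MHz conversion clock
LSB_VOLTS = 0.0001     # minimum voltage increment from the answer above


def sar_time(bits: int, clock_hz: float = CLOCK_HZ) -> float:
    """Successive approximation: one clock per bit, independent of the input."""
    return bits / clock_hz


def single_ramp_counts(v_in: float, lsb: float = LSB_VOLTS) -> int:
    """Single ramp: one count per clock until the ramp reaches v_in.
    round() absorbs floating-point error for inputs on an exact LSB step."""
    return round(v_in / lsb)


counts = single_ramp_counts(1.5001)
print(f"SAR: {sar_time(16) * 1e6:.0f} us")
print(f"Single ramp: {counts} counts = {counts / CLOCK_HZ * 1e6:.0f} us")
```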

5a An 11-bit, 2’s complement number can represent a number in the range of –1024 to +1023, so each change of 1 digital value corresponds to 0.010 volts. Anything smaller might not be detectable.

5b Since each digital code increment represents 0.01 volts, +5.11 volts would be represented as 511 (5.11 volts / 0.01 volts/count = 511 counts). In binary, +511 would be 00111111111, so the 2’s complement negative value (–5.11) would be 11000000001.

5c 8.96 volts would correspond to a digital value of 896, or 01110000000. In order to properly represent this as a 16-bit number, we need to add the appropriate number of leading zeros. Thus, the result is 0000 0011 1000 0000, or 0x0380.

5d In order to digitize an 11-bit value using successive approximation, which is the hardware analog of a binary search algorithm, we would need log2(2^11) = 11 samples.

5e Since we take a sample on every rising edge of the clock and we need 11 samples, we need 11 rising edges. The clock frequency is 1 MHz, so the clock period is 1 microsecond. Thus, it takes 11 microseconds to digitize the analog signal.
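Parts 5b through 5e can be reproduced with a short Python sketch: one helper produces the 11-bit 2's complement patterns, and a second models successive approximation as a binary search over the codes. The SAR model works in integer millivolts to avoid floating-point rounding and, for simplicity, treats the code as unsigned; both helper names are hypothetical.

```python
from typing import Tuple


def twos_complement(value: int, bits: int) -> str:
    """Return the `bits`-wide 2's complement bit pattern of value."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits ({lo}..{hi})")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def sar_convert(v_in_mv: int, lsb_mv: int, bits: int) -> Tuple[int, int]:
    """Successive approximation as a binary search, MSB first.
    Returns (code, number of comparisons)."""
    code, comparisons = 0, 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)          # tentatively set this bit
        comparisons += 1
        if trial * lsb_mv <= v_in_mv:      # comparator: DAC output <= input?
            code = trial                   # keep the bit
    return code, comparisons


print(twos_complement(511, 11), twos_complement(-511, 11))   # 5b
print(twos_complement(896, 11), f"0x{896:04X}")              # 5c
code, steps = sar_convert(8960, 10, 11)                       # 8.96 V, 0.01 V LSB
print(f"code {code} after {steps} comparisons = {steps} us at 1 MHz")  # 5d, 5e
```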

7a A 25 microsecond conversion time corresponds to a 40 kHz sampling frequency. In order to collect 4 samples per cycle, the frequency of the unknown waveform must be no greater than 10 kHz.

7b A 14-bit conversion resolves 1 part in 16,384, so 10 V / 16,384 = 0.0006 V.

7c If the S/H droops 1 volt in one millisecond, then in 25 microseconds it droops (25/1000) × 1 = 0.025 volts. Since this is significantly greater than the 0.0006 V resolution of the converter, the S/H would introduce an unacceptably large error. Thus, it can’t be used.
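The arithmetic in parts 7a to 7c can be checked with a few lines of Python. The 25 µs conversion time, 14-bit resolution, 10 V range and 1 V/ms droop come from the answers above; everything else is derived from them.

```python
conversion_time = 25e-6                  # seconds per conversion (7a)
sample_rate = 1 / conversion_time        # 40 kHz
max_signal_freq = sample_rate / 4        # 4 samples per cycle -> 10 kHz

lsb = 10.0 / 2**14                       # 14-bit converter over 10 V (7b)

droop_rate = 1.0 / 1e-3                  # S/H droops 1 V per millisecond (7c)
droop = droop_rate * conversion_time     # droop during one conversion

print(f"max signal frequency = {max_signal_freq:.0f} Hz")
print(f"LSB = {lsb:.6f} V, droop = {droop:.3f} V")
print("S/H acceptable:", droop < lsb)
```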


1 In this particular example, Segment A would result in better pipeline efficiency. The reason is that each of the instructions is independent of the others; there are no dependencies between them. In Segment B, each instruction must complete before the next instruction has enough information to complete. Thus, the MOVE.W D1,D0 must put the result in D0 before the ADD.W instruction can begin to operate. Likewise, MULU can’t begin until the result of the ADD operation in the previous instruction has completed. Thus, in a pipelined operation, each instruction must complete before the next one can finish.
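The dependency argument can be made mechanical: an adjacent pair of instructions has a read-after-write (RAW) hazard when the later one reads a register the earlier one writes. In the Python sketch below, the operands of ADD.W and MULU and all of Segment A are hypothetical, since only MOVE.W D1,D0 is spelled out above.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def raw_hazards(block: List[Instr]) -> List[Tuple[str, str]]:
    """Adjacent (earlier, later) pairs where later reads what earlier writes."""
    return [(a.text, b.text) for a, b in zip(block, block[1:])
            if a.writes & b.reads]


def ins(text: str, reads: str, writes: str) -> Instr:
    return Instr(text, frozenset(reads.split()), frozenset(writes.split()))


# Hypothetical operands for illustration only
segment_a = [ins("MOVE.W D1,D0", "D1", "D0"),
             ins("ADD.W D2,D3", "D2 D3", "D3"),
             ins("MULU D4,D5", "D4 D5", "D5")]
segment_b = [ins("MOVE.W D1,D0", "D1", "D0"),
             ins("ADD.W D0,D2", "D0 D2", "D2"),
             ins("MULU D2,D3", "D2 D3", "D3")]

print("Segment A:", raw_hazards(segment_a))
print("Segment B:", raw_hazards(segment_b))
```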

3 a No, because it involves a memory to memory transfer.

b Yes, the addition occurs between two registers.

c Yes, the move is a store operation that transfers a register to memory.

d No, the AND operation takes place between an immediate value and memory.

e Yes, the operation is an immediate load operation that transfers data from memory to a register.

5 There are several RISC characteristics illustrated here. The most important RISC characteristic is that the ADD operation could only take place between data stored in the general-purpose registers. Also, there was no effective addressing mode that allowed the memory address to be directly specified; the memory addresses had to be loaded as literals into the registers, and then the registers were used as memory pointers. Thus, we see only two addressing modes used.

7a Since the pipeline has seven stages, and each stage requires 2 clock cycles, it takes 14 clock cycles for the first instruction to move down the pipeline. Since each clock cycle takes 10 nanoseconds, the total time for the first instruction is 140 nanoseconds.

7b If we assume no stalls, after the first instruction is retired, the next 9 instructions would follow at intervals of 2 clock cycles, so we would have 9 × 20 nanoseconds, or 180 nanoseconds, for the basic block to completely execute. However, the pipeline will stall twice for 4 clock cycles each, which adds another 80 nanoseconds (2 × 4 × 10), so the total elapsed time is:

ET = 140 ns + 180 ns + 80 ns = 400 ns
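The timing model used in 7a and 7b can be written as a small Python function: pipeline fill time for the first instruction, one stage time for each following instruction, plus stall cycles. The ten-instruction basic block and the stall counts come from the answer above; the function name is illustrative.

```python
def basic_block_time_ns(n_instr: int, stages: int, cycles_per_stage: int,
                        clock_ns: float, stalls: int = 0,
                        cycles_per_stall: int = 0) -> float:
    fill = stages * cycles_per_stage * clock_ns            # first instruction
    stream = (n_instr - 1) * cycles_per_stage * clock_ns   # one per stage time
    stall = stalls * cycles_per_stall * clock_ns           # pipeline bubbles
    return fill + stream + stall


print(basic_block_time_ns(10, stages=7, cycles_per_stage=2, clock_ns=10,
                          stalls=2, cycles_per_stall=4))   # 400
```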


1 a The memory hierarchy is often represented as a pyramid with the CPU at the top. It illustrates the point that the fastest memory, but also the smallest amount of memory, is closest to the CPU, and that as we get further from the CPU the amount of memory goes up, but the speed goes down. Thus, there is an inverse relationship between the access speed of memory and the size of the memory. Also, the cost per bit goes down as you get further from the top.

b Spatial locality refers to the fact that instructions and data tend to be grouped together. Instructions are located in sequence, and data tends to be stored in clusters. For caches, this means that a cache can be much smaller than main memory but still be efficient, because if an instruction or data item is already in the cache, it is likely that the successive instructions or data will be there as well.

Temporal locality refers to the fact that if an instruction or data item was recently accessed, it is likely to be accessed again soon. Thus, if something is in the cache and has been recently accessed, it is likely that it will be accessed again, which improves the efficiency of the cache.

c With caches, we want to maximize the hit rate and minimize the miss penalty. One way to minimize the miss penalty is to refill a portion of the cache in a burst, rather than one word at a time, whenever there is a cache miss. Modern SDRAM memory is designed to refill the on-chip cache in a burst of data reads, thus minimizing the penalty of reloading.

d A write-through cache always writes the data into the cache and to main memory at the same time, thus avoiding the problem of data differences between the cache and main memory, but sacrificing some performance. A write-back cache holds the written data only in the cache and writes it to main memory later, for example when the bus is available or when the modified line is replaced. Performance is improved, but it runs the risk that main memory holds stale data that no longer matches the cache.
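The performance side of the write-through versus write-back trade-off can be illustrated by counting how many writes reach main memory. The Python sketch below is a deliberately simplified model (a hypothetical store trace, no evictions, dirty lines written back once at a final flush), not a cache simulator.

```python
from typing import List


def main_memory_writes(store_lines: List[int], policy: str) -> int:
    """Count main-memory writes for a trace of stores, each identified by the
    cache line it hits. Assumes every line stays resident until a final flush."""
    if policy == "write-through":
        return len(store_lines)        # every store is also written to memory
    if policy == "write-back":
        return len(set(store_lines))   # each dirty line is written back once
    raise ValueError(f"unknown policy: {policy}")


trace = [5, 5, 5, 5, 9, 9]             # hypothetical: repeated stores to two lines
print(main_memory_writes(trace, "write-through"))  # 6
print(main_memory_writes(trace, "write-back"))     # 2
```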

3 Spatial locality can be demonstrated in three ways:

a The compiled instructions occupy a very small region of memory, only 32 bytes in length. Thus, we may assume that they are located close to each other.

b Since the elements of the array DataStream are accessed by de-referencing the pointer variable DataStream plus an offset value, count, the individual elements of the array must be located adjacent to each other in memory.

c The variables count and maxcount are local variables of the function main(). As such, the compiler has created a stack frame on the system stack just large enough to hold two integer values, so they must also be located near each other.


Temporal locality can be demonstrated as follows:

a Since the main part of the program is a for loop, the instructions in the loop are executed

5 a Main memory has an address range of 00000 to FFFFF, or 2^20 discrete addresses. This is approximately 1 Mbyte of addresses. If each refill line has 64 bytes, or 2^6, then the number of refill lines = 2^20 / 2^6 = 2^14.

16,384 refill lines in main memory

b The cache memory is 4096 bytes in size. Using the same method as in a, above, the number of refill lines in the cache = 2^12 / 2^6 = 2^6.

64 refill lines in the cache memory

c Since this is a direct mapped cache, there must be the same number of rows of refill lines in the cache memory as there are in the main memory. Therefore, the number of rows of refill lines multiplied by the number of columns of refill lines = 16,384.

Number of columns of refill lines = 16,384 / 64 = 2^14 / 2^6 = 2^8 = 256

256 columns of refill lines in the main memory

d Since there are 256 columns, the TAG memory must contain 8 bits in order to be able to address any one of the 256 columns. Thus, tag memory requires 8 bits.

e See the diagram below:

[Diagram not recoverable from the source.]
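The sizes in parts a to d follow from splitting the 20-bit address into tag, row (index) and byte-offset fields. A minimal Python sketch, assuming power-of-two sizes as in the problem (the function name is illustrative):

```python
def direct_mapped_fields(address_bits: int, cache_bytes: int,
                         line_bytes: int) -> tuple:
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.
    Sizes must be powers of two."""
    offset_bits = line_bytes.bit_length() - 1                   # byte in line
    index_bits = (cache_bytes // line_bytes).bit_length() - 1   # cache row
    tag_bits = address_bits - index_bits - offset_bits          # column
    return tag_bits, index_bits, offset_bits


main_lines = 2**20 // 64           # 16,384 refill lines in main memory
cache_lines = 4096 // 64           # 64 refill lines in the cache
columns = main_lines // cache_lines
print(main_lines, cache_lines, columns, direct_mapped_fields(20, 4096, 64))
```

The 8 tag bits selecting one of 256 columns are exactly the TAG memory width from part d.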

7 Effective execution time = hit rate × hit time + miss rate × miss penalty

Effective execution time = 0.98 × 10 + 0.02 × 100 × 10 = 9.8 + 20 = 29.8 nsec.
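The same formula in Python, treating the miss penalty as 100 × 10 ns exactly as in the calculation above (the function name is illustrative):

```python
def effective_time_ns(hit_rate: float, hit_time_ns: float,
                      miss_penalty_ns: float) -> float:
    """hit rate * hit time + miss rate * miss penalty."""
    return hit_rate * hit_time_ns + (1 - hit_rate) * miss_penalty_ns


t = effective_time_ns(0.98, 10, 100 * 10)
print(f"{t:.1f} ns")   # 29.8 ns
```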

9 When the processor is initialized at start-up, all TLB entries are invalid. The validity bit is needed to know whether a valid entry has been placed in the TLB or whether the entry is just garbage.


1 Video gamers are notorious for overclocking their CPUs to gain the last ounce of performance from the machine. However, overclocking generates more heat, which slows down the internal processes, and also causes the processor to run closer to its design limits. Liquid cooling is more efficient at removing heat, so the CPU can run cooler even with a higher heat load.

3 For computer #1:

1 Each instruction executes in 1 clock cycle, or 1/100 MHz = 10 nanoseconds.

2 It must execute a total of 1000 + 200 × 100 = 21000 instructions.

3 Total execution time = 21000 × 10 nanoseconds = 2.1 × 10^4 × 10 × 10^−9 = 21 × 10^−5 = 210 × 10^−6, or 210 microseconds.

For computer #2:

1 It must execute the same 21000 instructions, but some take twice as long as others. Therefore, 40% of the 21000 instructions take 1 clock cycle and 60% of the 21000 instructions take 2 clock cycles.

2 At 250 MHz, 1 clock cycle takes 4 nanoseconds and 2 clock cycles take 8 nanoseconds.

3 Therefore, the total execution time is 0.4 × 21000 × 4 × 10^−9 + 0.6 × 21000 × 8 × 10^−9 = (8.4 × 10^3) × (4 × 10^−9) + (12.6 × 10^3) × (8 × 10^−9)

= (33.6 × 10^−6) + (100.8 × 10^−6) = 134.4 × 10^−6 = 134.4 microseconds.
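Both totals can be reproduced by summing cycles over the instruction mix and dividing by the clock rate. A minimal Python sketch using the instruction count, clock rates and 40%/60% mix from the answer above (the function name is illustrative):

```python
from typing import List, Tuple


def execution_time_s(n_instr: int, clock_hz: float,
                     mix: List[Tuple[float, int]]) -> float:
    """mix is a list of (fraction of instructions, clock cycles each)."""
    cycles = sum(fraction * n_instr * cpi for fraction, cpi in mix)
    return cycles / clock_hz


n = 1000 + 200 * 100                                      # 21,000 instructions
t1 = execution_time_s(n, 100e6, [(1.0, 1)])               # computer #1
t2 = execution_time_s(n, 250e6, [(0.4, 1), (0.6, 2)])     # computer #2
print(f"#1: {t1 * 1e6:.1f} us, #2: {t2 * 1e6:.1f} us")    # 210.0 us, 134.4 us
```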

5 Cycles per instruction × seconds per clock cycle = seconds per instruction

This is the measure we want:

Computer #1 requires 2 cycles per instruction and each clock cycle takes 1 ns (1/1 GHz). Therefore, computer #1 requires 2 nanoseconds to execute 1 instruction.

Computer #2 requires 1.2 cycles per instruction and each clock cycle takes 2 ns (1/500 MHz). Therefore, computer #2 requires 2.4 nanoseconds to execute 1 instruction.

Thus, performance = 2.4/2.0 = 1.2, or computer #1 has 20% better performance.
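The comparison reduces to CPI × clock period. A minimal Python sketch with the CPI values and clock rates from the answer above (the function name is illustrative):

```python
def ns_per_instruction(cpi: float, clock_hz: float) -> float:
    """Cycles per instruction x seconds per clock cycle, in nanoseconds."""
    return cpi / clock_hz * 1e9


t1 = ns_per_instruction(2, 1e9)       # computer #1: 2.0 ns
t2 = ns_per_instruction(1.2, 500e6)   # computer #2: 2.4 ns
print(f"{t1:.1f} ns, {t2:.1f} ns, ratio {t2 / t1:.2f}")
```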

7 Analyzing this problem requires that we consider the number of memory accesses required for both the instructions and the actual add operation. Let’s use 68000 assembly language for this example. Here’s a representative code snippet:

MOVE.L var1,D0 *6 bytes long

ADD.L var2,D0 *6 bytes long

MOVE.L D0,var3 *6 bytes long


1 The fuse map is shown below:

pseudo-of a good hashing function. Thus, any imperfection quickly generates a result that is very different from the standard.


About the Author

Arnold S. Berger is a Senior Lecturer in the Computing and Software Systems Department at the University of Washington-Bothell. He received his BS and PhD degrees from Cornell University. Prior to joining UWB, Dr. Berger was an R&D Director at Applied Microsystems Corporation, a manufacturer of specialized hardware and software tools for embedded systems developers. Prior to coming to the Pacific Northwest 5 years ago, he was the Embedded Tools Marketing Manager at Advanced Micro Devices and an R&D Project Manager at Hewlett-Packard’s Logic Systems Division in Colorado Springs, Colorado.

Dr. Berger has published over 40 papers on embedded systems development methods and the tools needed to design them. He holds three patents in the area of embedded systems design tools and embedded systems simulation. He is the author of Embedded Systems Design: An Introduction to Processes, Tools and Techniques. During the two-year period prior to the Y2K date changeover, Dr. Berger consulted for the electric power industry on testing and remediation of their embedded systems.

When not teaching or consulting, Arnie is an avid cyclist, electronic hobbyist, and woodworker. His stable of three bicycles collectively logs more mileage in a year than his car does.
