Chapter 9: Solutions for Odd-Numbered Problems
1 This is an example of the addressing mode known as “address register indirect with index and displacement.” The effective address is the sum of the address value in A0, the index value in D0, and the 2’s complement displacement. Since $84 is a negative number, –$7C, the effective address is EA = $2000 + $0400 + (–$7C).
EA = $2384. The program is not relocatable for two reasons:
1 There is a jump to an absolute address, start.
2 An absolute address is loaded into A0. The program could still be relocatable by managing what gets loaded into A0 and D0, but the jump instruction forces it to be absolute.
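The effective address arithmetic in problem 1 can be checked with a short Python sketch. It assumes the displacement is an 8-bit field that is sign-extended before the addition; the helper name `sign_extend` is hypothetical and used only for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set, so the value is negative
        return value - (1 << bits)
    return value

a0 = 0x2000                            # address register value from the solution
d0 = 0x0400                            # index register value from the solution
disp = sign_extend(0x84, 8)            # assumed 8-bit displacement field

ea = (a0 + d0 + disp) & 0xFFFFFFFF     # wrap to a 32-bit address
print(f"displacement = {disp}, EA = ${ea:04X}")
```

The same helper works for any field width, for example 16 bits for a word-sized displacement.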
3 The value in D0 after the highlighted instruction is $0000002A.
5 00000400 067955550000AAAA ADDI.W #$5555,$0000AAAA
00000408 06B9AAAA55550000FFFE ADDI.L #$AAAA5555,$0000FFFE
00000412 0640AAAA ADDI.W #$AAAA,D0
Appendix A
* Main Program
MOVE.W str_len(PC),D1
************************************************************************
* Subroutine: do_test
*
* Performs the actual memory test. Fills
* the memory with the test pattern of interest.
* Registers used: D1,A0,A1,A2
* Return values: None
* Registers saved: None
* Input parameters:
* D0.W = test pattern
* A4.L = Points to memory location to save the count of bad addresses
* A5.L = Points to memory location to save the last bad address found
* A6.L = Points to memory location to save the data_read back and data
* written
*
* Assumptions: Saves all registers used internally
************************************************************************
MOVEM.L (SP)+,D1/A0-A2 * Restore registers
* Data Space
str_len DC.W str_len-string
last_addr DS.W 1
END start
Chapter 10: Solutions for Odd-Numbered Problems
1 <0C0020h> = 15C7h. The word is aligned.
MOV AX,8200H ;Get segment value
MOV DS,AX ;Load segment register
MOV SI,0000 ;Load source index register
MOV DI,0200H ;Load destination index register
MOV CX,1000 ;Load counter
loader:
MOV AL,[SI] ;Get byte
MOV [DI],AL ;Store byte
INC SI ;advance pointers
INC DI
DEC CX
JNZ loader
Chapter 11: Solutions for Odd-Numbered Problems
1 The 68K has two operational modes, user and supervisor. The ARM architecture allows for 7 operational modes. User mode is the lowest privilege level. The other modes are: System, Supervisor, Abort, Fast Interrupt Request, Interrupt Request and Undefined.
3 The biggest difference is that, with the exception of registers r13-r15, all registers are completely general-purpose. Any register may be used as part of an arithmetic operation or as an address pointer. This is in sharp contrast to the distinction that the 68K architecture makes between the address registers, A0-A6, and the data registers, D0-D7.
5 MOV r4,#&100
ORR r4,r4,#3
7 <r11> = &0013E94C
9 If the Z flag = 0, then the value in register r1, &DEF02340, is incremented by 4 to &DEF02344 and that value is used as an address pointer to retrieve the 16-bit data object stored in that memory location. The 16-bit value is then loaded into general-purpose register r4. If the Z flag = 1, then the instruction is not executed.
Chapter 12: Solutions for Odd-Numbered Problems
* Input register list:
* A6- Pointer to the data string to be sent.
* Return register list:
* A6- Pointer to the character after the string terminating
* character.
* Register usage: All registers used by xmitStr will be saved and
* restored upon exit
* Subroutine starts here
3 The successive approximation converter always takes the same number of clock cycles to digitize the unknown signal. Since it has 16 bits of resolution, it takes 16 clock cycles. At a 1 MHz clock rate, it takes 16 microseconds to do the digitization.
The single ramp A/D must count up to the unknown voltage. Therefore, we need to determine how many counts it takes to get to 1.5001 volts. The range of a 16-bit converter is 0 through 65,535 ($0000 to $FFFF in hexadecimal), and the minimum voltage increment of the A/D converter is 0.0001 volts. Thus, it takes 15,001 clock cycles, or 15,001 microseconds, to digitize the unknown voltage.
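As a rough check of the two conversion times in problem 3, the Python sketch below uses only the figures quoted in the solution (1 MHz clock, 16 bits, 0.0001 volt increment, 1.5001 volt input). The variable names are illustrative and not from the text.

```python
CLOCK_HZ = 1_000_000        # 1 MHz clock rate from the solution
BITS = 16                   # converter resolution
LSB_VOLTS = 0.0001          # minimum voltage increment stated in the solution
V_UNKNOWN = 1.5001          # voltage being digitized

clock_period_us = 1e6 / CLOCK_HZ

# Successive approximation: one clock cycle per bit, regardless of the input.
sar_time_us = BITS * clock_period_us

# Single ramp: the counter must step up to the unknown voltage one LSB at a time.
ramp_counts = round(V_UNKNOWN / LSB_VOLTS)
ramp_time_us = ramp_counts * clock_period_us

print(f"successive approximation: {sar_time_us:.0f} us")
print(f"single ramp: {ramp_counts} counts, {ramp_time_us:.0f} us")
```

The comparison shows why the single ramp time depends on the input voltage while the successive approximation time does not.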
5a An 11-bit, 2’s complement number can represent a number in the range of –1024 to +1023, so each change of 1 digital value corresponds to 0.010 volts. Anything smaller might not be detectable.
5b Since we know that each digital code increment represents 0.01 volts, we know that +5.11 volts would be represented as 511 (5.11 volts / 0.01 volts/count = 511 counts). In binary, +511 would be 00111111111, so the 2’s complement negative value (–5.11) would be 11000000001.
5c 8.96 volts would correspond to a digital value of 896, or 01110000000. In order to properly represent this as a 16-bit number we need to add the appropriate number of leading zeros. Thus, the result is 0000 0011 1000 0000, or 0x0380.
5d In order to digitize an 11-bit value using successive approximation, which is the hardware analogy of a binary search algorithm, we would need log2(2^11), or 11, samples.
5e Since we take a sample on every rising edge of the clock and we need 11 samples, we need 11 rising edges. The clock frequency is 1 MHz, so the clock period is 1 microsecond. Thus, it takes 11 microseconds to digitize the analog signal.
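The encodings in problem 5 can be reproduced with a small Python sketch. It assumes the 0.01 volts per count scale from part 5a; the function name `to_twos_complement` is a hypothetical helper, not something defined in the text.

```python
import math

def to_twos_complement(value, bits):
    """Return value as a `bits`-wide 2's complement bit string."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

VOLTS_PER_COUNT = 0.01                         # scale from part 5a
BITS = 11

counts_5b = round(5.11 / VOLTS_PER_COUNT)      # magnitude for part 5b
counts_5c = round(8.96 / VOLTS_PER_COUNT)      # value for part 5c

print("range:", -(1 << (BITS - 1)), "to", (1 << (BITS - 1)) - 1)
print("+5.11 V:", to_twos_complement(counts_5b, BITS))
print("-5.11 V:", to_twos_complement(-counts_5b, BITS))
print("8.96 V as 16 bits:", format(counts_5c, "04X"))

# Successive approximation is a binary search over 2**BITS codes (part 5d).
sar_steps = int(math.log2(2 ** BITS))
print("successive approximation steps:", sar_steps)
```

Passing a value outside the representable range raises `ValueError`, which is a quick way to confirm the limits quoted in part 5a.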
7a A period of 25 microseconds corresponds to a frequency of 40 kHz. In order to collect 4 samples per cycle, the maximum frequency of the unknown waveform must be no greater than 10 kHz.
7b 14-bit conversion = 1 part in 16,384. 10 V / 16,384 = 0.0006 V.
7c In one millisecond it droops 1 volt, so in 25 microseconds it droops (25/1000) * 1 = 0.025 volts. Since this is significantly greater than the 0.0006 volt resolution of the converter, the S/H would introduce an unacceptably large error. Thus, it can’t be used.
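The numbers in problem 7 follow from a few lines of arithmetic, sketched here in Python. The values (25 microsecond period, 4 samples per cycle, 14 bits over 10 volts, 1 volt per millisecond droop) are taken from the solution; the variable names are illustrative.

```python
SAMPLE_PERIOD_S = 25e-6        # time per sample
SAMPLES_PER_CYCLE = 4
FULL_SCALE_V = 10.0
BITS = 14
DROOP_V_PER_S = 1.0 / 1e-3     # 1 volt per millisecond

sample_rate_hz = 1 / SAMPLE_PERIOD_S                  # part 7a
max_signal_hz = sample_rate_hz / SAMPLES_PER_CYCLE

resolution_v = FULL_SCALE_V / 2 ** BITS               # part 7b

droop_v = DROOP_V_PER_S * SAMPLE_PERIOD_S             # part 7c
usable = droop_v < resolution_v

print(f"sample rate {sample_rate_hz / 1e3:.0f} kHz, max signal {max_signal_hz / 1e3:.0f} kHz")
print(f"resolution {resolution_v:.4f} V, droop {droop_v:.3f} V, usable: {usable}")
```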
Chapter 13: Solutions for Odd-Numbered Problems
1 In this particular example, Segment A would result in better pipeline efficiency. The reason is that each of the instructions is independent of the others; there are no dependencies between them. In Segment B, each instruction must complete before the next instruction has enough information to complete. Thus, the MOVE.W D1,D0 must put the result in D0 before the ADD.W instruction can begin to operate. Likewise, MULU can’t begin until the ADD operation in the previous instruction has completed. Thus, in a pipelined operation, the instructions must each complete before the next one can finish.
3 a No, because it involves a memory-to-memory transfer.
b Yes, the addition occurs between two registers.
c Yes, the move is a store operation that transfers a register to memory.
d No, the AND operation takes place between an immediate value and memory.
e Yes, the operation is an immediate load operation that transfers data from memory to a register.
5 There are several RISC characteristics illustrated here. The most important RISC characteristic is that the ADD operation could only take place between data stored in the general-purpose registers. Also, there was no effective addressing mode that allowed the memory address to be directly specified; the memory addresses had to be loaded as literals into the registers and then the registers were used as memory pointers. Thus, we see only two addressing modes used.
7a Since the pipeline has seven stages, and each stage requires 2 clock cycles, it takes 14 clock cycles for the first instruction to move down the pipeline. Since each clock cycle takes 10 nanoseconds, the total time for the first instruction is 140 nanoseconds.
7b If we assume no stalls, after the first instruction is retired, the next 9 instructions would follow at intervals of 2 clock cycles, so we would have 9 times 20 nanoseconds, or 180 nanoseconds, for the rest of the basic block to execute. However, the pipeline will stall twice for 4 clock cycles each; this adds another 80 nanoseconds (2 × 4 × 10), so the total elapsed time is:
ET = 140 ns + 180 ns + 80 ns = 400 ns
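The elapsed-time calculation in problem 7 can be written out as a short Python sketch. The figures (7 stages, 2 clock cycles per stage, 10 nanosecond clock, 10 instructions, 2 stalls of 4 cycles) come from the solution; the variable names are illustrative.

```python
STAGES = 7
CYCLES_PER_STAGE = 2
CLOCK_NS = 10
INSTRUCTIONS = 10
STALLS = 2
STALL_CYCLES = 4

# Time for the first instruction to travel the full length of the pipeline.
fill_ns = STAGES * CYCLES_PER_STAGE * CLOCK_NS

# Each following instruction retires one stage time after the previous one.
remaining_ns = (INSTRUCTIONS - 1) * CYCLES_PER_STAGE * CLOCK_NS

# Extra time lost to the stalls.
stall_ns = STALLS * STALL_CYCLES * CLOCK_NS

elapsed_ns = fill_ns + remaining_ns + stall_ns
print(f"ET = {fill_ns} ns + {remaining_ns} ns + {stall_ns} ns = {elapsed_ns} ns")
```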
Chapter 14: Solutions for Odd-Numbered Problems
1 a The memory hierarchy is often represented as a pyramid with the CPU at the top. It illustrates the point that the fastest memory, but least amount of memory, is closest to the CPU and that as we get further from the CPU the amount of memory goes up, but the speed goes down. Thus, there is a reciprocal relation between the access speed of memory and the size of the memory. Also, the cost per bit goes down as you get further from the top.
b Spatial locality refers to the fact that instructions and data tend to be grouped together. Instructions are located in sequence and data tend to be stored in clusters. For caches, this means that a cache can be much smaller than main memory but still be efficient in terms of the probability that if instructions or data are already in the cache, then it is likely that successive instructions or data will be there as well.
Temporal locality refers to the fact that if an instruction or data was recently accessed, it is likely to be accessed again soon. Thus, if something is in the cache and has been recently accessed, it is likely that it will be accessed again, thus improving the efficiency of the cache.
c With caches, we want to maximize the hit rate and minimize the miss penalty. One way to minimize the miss penalty is to refill a portion of the cache in a burst, rather than one word at a time, whenever there is a cache miss. Modern SDRAM memory is designed to refill the on-chip cache in a burst of data reads, thus minimizing the penalty of reloading.
d A write through cache will always write the data into the cache and to main memory at the same time, thus avoiding the problem of data differences between the cache and main memory, but sacrificing some performance. The write back cache will hold the data written only to the cache and then write it to main memory when the bus is available. Performance is improved, but this runs the risk of memory being corrupted.
3 Spatial locality can be demonstrated in three ways.
a The compiled instructions occupy a very small region of memory, only 32 bytes in length. Thus, we may assume that they are located close to each other.
b Since the variables in the array DataStream are being accessed by de-referencing the pointer variable, DataStream + an offset value, count, the individual elements of the array must be located adjacent to each other in memory.
c The variables count and maxcount are local variables to the function main(). As such, the compiler has created a stack frame on the system stack just large enough to hold two integer values, so they must also be located near each other.
Temporal locality can be demonstrated as follows:
a Since the main part of the program is a for loop, the instructions in the loop are executed
5 a Main memory has an address range of 00000 to FFFFF, or 2^20 discrete addresses. This is approximately 1 Mbyte of addresses. If each refill line has 64 bytes, or 2^6, then the number of refill lines = 2^20 / 2^6 = 2^14, or 16,384 refill lines in main memory.
b The cache memory is 4096 bytes in size. Using the same method as in a, above, the number of refill lines in the cache = 2^12 / 2^6 = 2^6, or 64 refill lines in the cache memory.
c Since this is a direct mapped cache, there must be the same number of rows of refill lines in the cache memory as there are in the main memory. Therefore, the number of rows of refill lines multiplied by the number of columns of refill lines = 16,384.
Number of columns of refill lines = 16,384/64 = 2^14 / 2^6 = 2^8 = 256, so there are 256 columns of refill lines in the main memory.
d Since there are 256 columns, the TAG memory must contain 8 bits in order to be able to address any one of the 256 columns. Thus, tag memory requires 8 bits.
e [Diagram not reproduced.]
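The cache geometry worked out in problem 5 can be recomputed with the Python sketch below. The memory size, line size, and cache size come from the solution. The address split at the end assumes the usual direct mapped layout of tag, row index, and byte offset, and the sample address 0x12345 is an arbitrary value chosen for illustration.

```python
MAIN_MEMORY_BYTES = 0xFFFFF + 1      # addresses 00000 to FFFFF
LINE_BYTES = 64                      # bytes per refill line
CACHE_BYTES = 4096

main_lines = MAIN_MEMORY_BYTES // LINE_BYTES     # part a
cache_rows = CACHE_BYTES // LINE_BYTES           # part b
columns = main_lines // cache_rows               # part c
tag_bits = columns.bit_length() - 1              # part d, log2 of a power of two

offset_bits = LINE_BYTES.bit_length() - 1
row_bits = cache_rows.bit_length() - 1

def split_address(address):
    """Split a 20-bit address into (tag, row, offset) fields (assumed layout)."""
    offset = address & (LINE_BYTES - 1)
    row = (address >> offset_bits) & (cache_rows - 1)
    tag = address >> (offset_bits + row_bits)
    return tag, row, offset

print(main_lines, cache_rows, columns, tag_bits)
print(split_address(0x12345))        # hypothetical example address
```

The three field widths (8 tag bits, 6 row bits, 6 offset bits) add up to the 20 address bits of main memory.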
7 Effective execution time = hit rate * hit time + miss rate * miss penalty
Effective execution time = 0.98 * 10 + 0.02 * 100 * 10 = 9.8 + 20 = 29.8 nsec.
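The effective execution time formula in problem 7 is easy to evaluate in Python. The sketch assumes the numbers in the solution mean a 98% hit rate, a 10 nsec hit time, and a miss penalty of 100 times the hit time; the function name is hypothetical.

```python
def effective_time_ns(hit_rate, hit_time_ns, miss_penalty_ns):
    """Weighted average of the hit time and the miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * hit_time_ns + miss_rate * miss_penalty_ns

HIT_RATE = 0.98
HIT_TIME_NS = 10
MISS_PENALTY_NS = 100 * HIT_TIME_NS    # assumed reading of "100 * 10"

result_ns = effective_time_ns(HIT_RATE, HIT_TIME_NS, MISS_PENALTY_NS)
print(f"effective execution time = {result_ns:.1f} nsec")
```

Even a 2% miss rate roughly triples the average access time here, because the miss penalty is so much larger than the hit time.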
9 When the processor is initialized at start-up, all TLB entries are invalid. The validity bit is needed to know when a valid entry has been placed in the TLB or if it is just garbage.
Chapter 15: Solutions for Odd-Numbered Problems
1 Video gamers are notorious for overclocking their CPUs to gain the last ounce of performance from the machine. However, overclocking generates more heat, which slows down the internal processes, and also causes the processor to run closer to its design limits. Liquid cooling is more efficient at removing heat, so the CPU can run cooler with a higher heat load on the CPU.
3 For computer #1:
1 Each instruction executes in 1 clock cycle, or 1/100 MHz = 10 nanoseconds.
2 It must execute a total of 1000 + 200 * 100 = 21000 instructions.
3 Total execution time = 21000 × 10 nanoseconds = (2.1 × 10^4) × (10 × 10^-9) = 21 × 10^-5 = 210 × 10^-6 seconds, or 210 microseconds.
For computer #2:
1 It must execute the same 21000 instructions, but some take twice as long as others. Therefore, 40% of the 21000 instructions take 1 clock cycle and 60% of the 21000 instructions take 2 clock cycles.
2 At 250 MHz, 1 clock cycle takes 4 nanoseconds and 2 clock cycles take 8 nanoseconds.
3 Therefore, the total execution time is 0.4 × 21000 × 4 × 10^-9 + 0.6 × 21000 × 8 × 10^-9 = (8.4 × 10^3) × (4 × 10^-9) + (12.6 × 10^3) × (8 × 10^-9)
= (33.6 × 10^-6) + (100.8 × 10^-6) = 134.4 × 10^-6 seconds = 134.4 microseconds.
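The two execution times in problem 3 can be cross-checked with integer arithmetic in Python, which avoids any floating point rounding. The instruction count, clock rates, and the 40%/60% split are taken from the solution; the variable names are illustrative.

```python
INSTRUCTIONS = 1000 + 200 * 100          # total instructions executed

# Computer #1: every instruction takes one 10 ns clock cycle (100 MHz).
CYCLE_1_NS = 10
time_1_ns = INSTRUCTIONS * CYCLE_1_NS

# Computer #2: 4 ns clock cycle (250 MHz); 40% of the instructions take
# 1 cycle and the remaining 60% take 2 cycles.
CYCLE_2_NS = 4
one_cycle_count = INSTRUCTIONS * 40 // 100
two_cycle_count = INSTRUCTIONS - one_cycle_count
time_2_ns = (one_cycle_count * 1 + two_cycle_count * 2) * CYCLE_2_NS

print(f"computer #1: {time_1_ns / 1000} microseconds")
print(f"computer #2: {time_2_ns / 1000} microseconds")
```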
5 Cycles per instruction x seconds per clock cycle = seconds per instruction
This is the measure we want:
Computer #1 requires 2 cycles per instruction and each clock cycle takes 1 ns (1/1GHz).
Therefore, computer #1 requires 2 nanoseconds to execute 1 instruction.
Computer #2 requires 1.2 cycles per instruction and each clock cycle takes 2 ns (1/500MHz).
Therefore, computer #2 requires 2.4 nanoseconds to execute 1 instruction.
Thus, the performance ratio = 2.4/2.0 = 1.2, so computer #1 has 20% better performance.
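The comparison in problem 5 reduces to multiplying cycles per instruction by the clock period. A minimal Python sketch, using the figures from the solution and a hypothetical helper name:

```python
def ns_per_instruction(cycles_per_instruction, clock_hz):
    """Average time to execute one instruction, in nanoseconds."""
    clock_period_ns = 1e9 / clock_hz
    return cycles_per_instruction * clock_period_ns

computer_1_ns = ns_per_instruction(2.0, 1e9)      # 2 CPI at 1 GHz
computer_2_ns = ns_per_instruction(1.2, 500e6)    # 1.2 CPI at 500 MHz

ratio = computer_2_ns / computer_1_ns
print(f"#1: {computer_1_ns:.1f} ns, #2: {computer_2_ns:.1f} ns, ratio: {ratio:.2f}")
```

A ratio above 1 means computer #1 finishes each instruction sooner, even though it needs more cycles per instruction.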
7 Analyzing this problem requires that we consider the number of accesses required for both the instructions and the actual add operation. Let’s use 68000 assembly language for this example. Here’s a representative code snippet:
MOVE.L var1,D0 *6 bytes long
ADD.L var2,D0 *6 bytes long
MOVE.L D0,var3 *6 bytes long
Chapter 16: Solutions for Odd-Numbered Problems
1 The fuse map is shown below:
of a good hashing function. Thus, any imperfection quickly generates a result that is very different from the standard.
About the Author
Arnold S. Berger is a Senior Lecturer in the Computing and Software Systems Department at the University of Washington-Bothell. He received his BS and PhD degrees from Cornell University. Prior to joining UWB, Dr. Berger was an R&D Director at Applied Microsystems Corporation, a manufacturer of specialized hardware and software tools for embedded systems developers. Prior to coming to the Pacific Northwest 5 years ago, he was the Embedded Tools Marketing Manager at Advanced Micro Devices and an R&D Project Manager at Hewlett-Packard’s Logic Systems Division in Colorado Springs, Colorado.
Dr. Berger has published over 40 papers on embedded systems development methods and the tools needed to design them. He holds three patents in the area of embedded systems design tools and embedded systems simulation. He is the author of Embedded Systems Design: An Introduction to Processes, Tools and Techniques. During the two-year period prior to the Y2K date changeover, Dr. Berger consulted for the electric power industry on testing and remediation of their embedded systems.
When not teaching or consulting, Arnie is an avid cyclist, electronic hobbyist, and woodworker. His stable of three bicycles collectively logs more mileage in a year than does his car.