ARM Architecture Reference Manual - P17


Because of the wide variety of systems based on ARM processors, all functionality described in Part B might be inappropriate to any given system. Furthermore, some ARM processors have implemented functions in a different manner to the one described here. Because of this, the datasheet or Technical Reference Manual for a particular ARM processor is the definitive source for its memory and system control facilities.

Part B therefore does not attempt to specify absolute requirements on the functionality of the System Control coprocessor or other memory system components. Instead, it contains guidelines which, if followed:

• mean that the system is more likely to be compatible with existing and future ARM software

• probably make it easier to port incompatible software to the system

In order to provide an adequate description of the range of memory and system facilities on existing ARM implementations, Part B describes a number of options that will not be used on new ARM implementations. For information on the rules that must be followed by new implementations of the memory and system architectures, contact ARM Ltd.

The fact that Part B describes a broad range of facilities, many of which are used only on some existing ARM implementations, also means that architecture version numbers for the memory and system architectures would not be helpful or descriptive. They are therefore not used.


1.2 System-level issues

This section lists a number of general and operating-system issues that the system designer needs to address when using an ARM processor.

1.2.1 Memory systems, write buffers and caches

ARM processors and software are designed to be connected to a byte-addressed memory. Word and halfword accesses to the memory ignore the alignment of the address and access the naturally-aligned value that is addressed (so a memory access ignores address bits 0 and 1 for word accesses, and ignores bit 0 for halfword accesses). The endianness of the ARM processor should normally match that of the memory system, or be configured to match it before any non-word accesses occur (when the endianness is configurable and CP15 is implemented, bit[7] of CP15 register 1 controls the endianness).
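The alignment rule above can be sketched in C as a pair of address masks. This is only an illustration of which address bits are ignored; the function names are hypothetical and do not come from the manual.

```c
#include <stdint.h>

/* Address used for a word access: address bits 1 and 0 are ignored. */
static inline uint32_t word_access_address(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* Address used for a halfword access: address bit 0 is ignored. */
static inline uint32_t halfword_access_address(uint32_t addr)
{
    return addr & ~UINT32_C(1);
}
```

For example, an access to address 0x1003 would use 0x1000 as a word access and 0x1002 as a halfword access.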

Memory that is used to hold programs and data should be marked as follows:

• Main (RAM) memory is normally set as cachable and bufferable

• ROM memory is normally set as cachable, and should be marked as read only, so the bufferable attribute is not used and should be 1

Write buffers

Some ARM implementations incorporate a merging write buffer that subsumes multiple writes to the same location into a single write to main memory. Furthermore, some write buffers re-order writes, so that writes are issued to memory in a different order to the order in which they are issued by the processor. Therefore, I/O locations should not normally be marked as bufferable, to ensure all writes are issued to the I/O device in the correct order.

For writes to bufferable areas of memory, memory aborts can only be signaled to the processor as a result of conditions that are detectable at the time the data is placed in the write buffer. Conditions that can only be detected when the data is later written to main memory (such as a parity error from main memory) must be handled by other methods (typically by raising an interrupt).

Caches

Frame buffers can be cachable, but frame buffers on writeback cache implementations must be copied back to memory after the frame buffer has been updated. Frame buffers can be bufferable, but again the write buffer must be written back to memory after the frame buffer has been updated.

ARM processors do not normally support cache coherence between the ARM and other system bus masters. Bus snooping is not supported. If memory data is to be shared between multiple bus masters without taking special software measures to ensure coherency, then the data must be mapped as:

• uncachable to ensure that all reads access main memory

• unbufferable to ensure that all writes access main memory


Alternatively, using software, you can manage the coherence of data buffers that are read or written by another bus master by:

• cleaning data from writeback caches and write buffers to memory when the processor has written to the data buffer and before the other bus master reads the buffer

• flushing relevant data from caches when the buffer is being read after the other bus master has written the buffer

You can use an uncached, unbuffered semaphore to maintain synchronization between multiple bus masters (see Semaphores on page B1-6).

For implementations with writeback caches, all dirty cache data must be written back before any alterations are made to the MMU page tables, to ensure that cache line write back can use the page tables to form the correct physical address for the transfer.

You can index caches using either virtual or physical addresses. Physical pages must only be mapped into a single virtual page, otherwise the result is UNPREDICTABLE. ARM processors do not normally provide coherence between multiple virtual copies of a single physical page.

Some ARM implementations support separate instruction and data caches. Coherence between the data and instruction caches is not necessarily maintained in hardware, so if the instruction stream is written, the instruction cache and data cache must be made coherent. This can entail:

• cleaning the data cache (storing dirty data to memory)

• draining the write buffer (completing all buffered writes)

• flushing the instruction cache

Instruction and data memory incoherence occurs after a program has been loaded (and therefore treated as data) and is about to be executed. It also occurs if self-modifying code is used or generated.
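The ordering of the three coherence steps listed above can be sketched in C. The actual cache and write buffer operations are implementation-specific CP15 operations, so this sketch assumes a hypothetical table of hooks (struct cache_ops) and uses logging stubs in place of real hardware operations. None of these names come from the manual.

```c
#include <stddef.h>

/* Hypothetical hooks: on a real system each would wrap the
   implementation-specific CP15 operation for that step. */
struct cache_ops {
    void (*clean_dcache)(void);        /* store dirty data to memory   */
    void (*drain_write_buffer)(void);  /* complete all buffered writes */
    void (*flush_icache)(void);        /* discard stale instructions   */
};

/* Make newly written instructions visible to instruction fetch.
   A NULL hook means the step is not needed on that implementation. */
static void sync_instruction_stream(const struct cache_ops *ops)
{
    if (ops->clean_dcache)       ops->clean_dcache();
    if (ops->drain_write_buffer) ops->drain_write_buffer();
    if (ops->flush_icache)       ops->flush_icache();
}

/* Logging stubs for illustration only: they record the call order. */
static char trace[8];
static size_t trace_len;

static void log_step(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

static void demo_clean(void) { log_step('C'); }
static void demo_drain(void) { log_step('D'); }
static void demo_flush(void) { log_step('F'); }

static const struct cache_ops demo_ops = { demo_clean, demo_drain, demo_flush };
```

A program loader or code generator would call sync_instruction_stream() after writing instructions and before branching to them.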

1.2.2 Interrupts

ARM processors implement fast and normal levels of interrupt. Both interrupts are signaled externally, and many implementations synchronize interrupts before an exception is raised.

Fast interrupt request (FIQ)

Disables subsequent normal and fast interrupts by setting the I and F bits in the CPSR.

Normal interrupt request (IRQ)

Disables subsequent normal interrupts by setting the I bit in the CPSR.

For more information, see Exceptions on page A2-13.

Canceling interrupts

It is the responsibility of software (the interrupt handler) to ensure that the cause of an interrupt is canceled (no longer signaled to the processor) before interrupts are re-enabled (by clearing the I and/or F bit in the CPSR). Interrupts can be canceled with any instruction that might make an external data bus access, meaning any load or store, a swap, or any coprocessor instruction.


Canceling an interrupt via an instruction fetch is UNPREDICTABLE. Canceling an interrupt with a load multiple that restores the CPSR and re-enables interrupts is UNPREDICTABLE.

Devices that do not instantaneously cancel an interrupt (that is, they do not cancel the interrupt before letting the access complete) must be probed by software to ensure that interrupts have been canceled before interrupts are re-enabled. This allows a device connected to a remote I/O bus to operate correctly.

1.2.3 Semaphores

The Swap and Swap Byte instructions have predictable behavior when used in two ways:

• Systems with multiple bus masters that use the Swap instructions to implement semaphores to control interaction between different bus masters.

In this case, the semaphores must be placed in an uncached and unbufferable region of memory. The Swap instruction then causes a (locked) read-write bus transaction.

This type of semaphore can be externally aborted.

• Systems with multiple threads running on a uniprocessor that use the Swap instructions to implement semaphores to control interaction of the threads.

In this case, the semaphores can be placed in a cached and bufferable region of memory, and a (locked) read-write bus transaction might or might not occur. The Swap and Swap Byte instructions are likely to have better performance on such a system than they do on a system with multiple bus masters (as described above).

This type of semaphore has UNPREDICTABLE behavior if it is externally aborted.

Semaphores placed in uncachable/bufferable memory regions have UNPREDICTABLE results. Semaphores placed in cachable/unbufferable memory regions have UNPREDICTABLE results.
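A swap-based semaphore of the kind described above can be sketched in portable C. This sketch uses the C11 atomic_exchange operation as a stand-in for the Swap instruction; whether a compiler emits a Swap instruction for it is an assumption that depends on the toolchain and target. The type and function names are hypothetical, and placing the semaphore in a suitably cached or uncached region remains the job of the system software.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Semaphore word: 0 means free, 1 means held. */
typedef atomic_uint swap_semaphore;

/* Try to take the semaphore. Writes 1 and reads back the previous
   value in one indivisible step, as a Swap instruction would.
   Returns true if the semaphore was free and is now held by the caller. */
static bool semaphore_try_acquire(swap_semaphore *sem)
{
    return atomic_exchange(sem, 1u) == 0u;
}

/* Release the semaphore by storing 0. */
static void semaphore_release(swap_semaphore *sem)
{
    atomic_store(sem, 0u);
}
```

A caller that fails to acquire the semaphore would typically retry or yield, depending on the operating system.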


The System Control Coprocessor

This chapter describes coprocessor 15, the System Control coprocessor. It contains the following sections:

• About the System Control coprocessor on page B2-2

• Registers on page B2-3

• Register 0: ID codes on page B2-6

• Register 1: Control register on page B2-13

• Registers 2-15 on page B2-17.


2.1 About the System Control coprocessor

All of the standard memory and system facilities are controlled by coprocessor 15 (CP15), which is therefore called the System Control coprocessor. Some also use other methods of control, which are described in the chapters describing the facilities concerned. For example, the Memory Management Unit described in Chapter B3 Memory Management Unit is also controlled by page tables in memory.

If none of the standard memory and system facilities are implemented in a system, the System Control coprocessor might not be present. In this case, no coprocessor accepts CP15 instructions, and so all such instructions are UNDEFINED.

However, new implementations of the memory and system architectures must implement the System Control coprocessor, and must follow some additional rules about which facilities are implemented. For details of these rules, contact ARM Ltd.

This chapter describes the overall design of the System Control coprocessor and how its registers are accessed. Detailed information is given on some of its registers. Other registers are allocated to facilities described in detail in other chapters and are only summarized in this chapter.


2.2 Registers

The System Control coprocessor can contain up to 16 primary registers, each of which is 32 bits long. For some of these, additional bits in the register access instructions are used to identify a specific version of the register and/or specific types of access to the register, so the number of physical 32-bit registers in CP15 can be more than 16. However, the 4-bit primary register number is used to identify registers in descriptions of the System Control coprocessor, because it is the primary factor determining the function of the register.

CP15 registers can be read-only, write-only or read/write. The detailed descriptions of the registers specify:

• what types of access are allowed

• what functionality is invoked by each type of access

• whether a primary register identifies more than one physical register, and if so, how they are distinguished

• any other details that are relevant to the use of the register

2.2.1 Register access instructions

The only defined System Control coprocessor instructions are:

• MCR instructions to write an ARM register to a CP15 register

• MRC instructions to read the value of a CP15 register into an ARM register

All CP15 CDP, LDC and STC instructions are UNDEFINED.

The MCR and MRC instructions to access the CP15 registers use the generic syntax for those instructions:

MCR{<cond>} p15, 0, <Rd>, <CRn>, <CRm>{, <opcode2>}

MRC{<cond>} p15, 0, <Rd>, <CRn>, <CRm>{, <opcode2>}

where:

<cond> This is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-5. If <cond> is omitted, the AL (always) condition is used.

Bits[23:21] These bits of the instruction, which are the <opcode1> field in generic MRC and MCR instructions, are always 0b000 in valid CP15 instructions. If they are not 0b000, the instruction is UNPREDICTABLE.

<Rd> This is the ARM register involved in the transfer (the source register for MCR and the destination register for MRC). This register must not be R15, even though MCR instructions normally allow it to be R15. If R15 is specified for <Rd> in a CP15 MRC or MCR instruction, the instruction is UNPREDICTABLE.

<CRn> This is the primary CP15 register involved in the transfer (the destination register for MCR and the source register for MRC). The standard generic coprocessor register names are c0, c1, ..., c15.

<CRm> This is an additional coprocessor register name which is used for accesses to some primary registers to specify additional information about the version of the register and/or the type of access.

When the description of a primary register does not specify <CRm>, c0 must be specified. If another register is specified, the instruction is UNPREDICTABLE.

<opcode2> This is an optional 3-bit number which is used for accesses to some primary registers to specify additional information about the version of the register and/or the type of access. If it is omitted, 0 is used.

When the description of a primary register does not specify <opcode2>, it must be omitted or 0 must be specified. If another value is specified, the instruction is UNPREDICTABLE.

These MCR and MRC instructions can only be used while the processor is in a privileged mode. If they are executed while the processor is in User mode, an Undefined Instruction exception occurs.
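To make the operand fields concrete, the following C sketch assembles the 32-bit instruction word for a CP15 MCR or MRC. The bit positions other than bits[23:21] are not given in this section; they are taken from the generic MCR/MRC encoding in Part A of the manual and should be read as an assumption here. The function name cp15_encode is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build the instruction word for
     MRC{<cond>} p15, 0, <Rd>, <CRn>, <CRm>, <opcode2>   (is_mrc true)
     MCR{<cond>} p15, 0, <Rd>, <CRn>, <CRm>, <opcode2>   (is_mrc false)
   cond is the 4-bit condition code (0xE is AL, always). */
static uint32_t cp15_encode(bool is_mrc, unsigned cond, unsigned rd,
                            unsigned crn, unsigned crm, unsigned opcode2)
{
    assert(cond < 16 && crn < 16 && crm < 16 && opcode2 < 8);
    assert(rd < 15);   /* R15 as <Rd> is UNPREDICTABLE for CP15 */

    return ((uint32_t)cond << 28)
         | (UINT32_C(0xE) << 24)       /* coprocessor register transfer */
         | (UINT32_C(0) << 21)         /* <opcode1>, bits[23:21] = 0b000 */
         | ((uint32_t)is_mrc << 20)    /* 1 = MRC (read), 0 = MCR (write) */
         | ((uint32_t)crn << 16)
         | ((uint32_t)rd << 12)
         | (UINT32_C(15) << 8)         /* coprocessor number: p15 */
         | ((uint32_t)opcode2 << 5)
         | (UINT32_C(1) << 4)
         | (uint32_t)crm;
}
```

Under these assumptions, cp15_encode(true, 0xE, 0, 0, 0, 0), which corresponds to MRC p15, 0, R0, c0, c0, 0, yields 0xEE100F10.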

Note

If access to some System Control coprocessor functionality by User mode programs is required, the usual solution is that the operating system defines one or more SWIs to supply it. As the precise set of memory and system facilities available on different processors can vary considerably, it is recommended that all such SWIs are implemented in an easily replaceable module and that the SWI interface of this module is defined to be as independent of processor details as possible.

The IMB and IMB_Range SWIs described in Instruction Memory Barriers (IMBs) on page A2-28 are examples of such SWIs.


2.2.2 Primary register allocation

Table 2-1 shows the allocation of the primary registers of the System Control coprocessor.

Table 2-1 Primary register allocation

0   ID codes (read-only)
    ID and Cache type: Register 0: ID codes on page B2-6

1   Control bits (read/write)
    Miscellaneous control bits: Register 1: Control register on page B2-13

2   Memory protection and control
    MMU: Translation table base: Register 2: Translation table base on page B3-23
    PU: Cachability bits: Register 2: Cachability bits on page B4-6

3   MMU: Domain access control: Register 3: Domain access control on page B3-24
    PU: Bufferability bits: Register 3: Bufferability bits on page B4-6

4   MMU: Reserved: Register 4: Reserved on page B3-24
    PU: Reserved: Registers 4, 8, 10: Reserved on page B4-7

5   MMU: Fault status: Register 5: Fault status on page B3-24
    PU: Access permission bits: Register 5: Access permission bits

7   Cache and write buffer
    Cache/write buffer control: Register 7: Cache functions on page B5-15

8   MMU: TLB control: Register 8: TLB functions on page B3-25
    PU: Reserved: Registers 4, 8, 10: Reserved on page B4-7

9   Cache and write buffer
    Cache lockdown: Register 9: Cache lockdown on page B5-18

10  MMU: TLB lockdown: Register 10: TLB lockdown on page B3-27
    PU: Reserved: Registers 4, 8, 10: Reserved on page B4-7


2.3 Register 0: ID codes

CP15 register 0 contains one or more identification codes for the ARM and system implementation. When this register is read, the opcode2 field of the MRC instruction selects which identification code is wanted, as shown in Table 2-2, and the CRm field must be specified as c0 (if it is not, the instruction is UNPREDICTABLE). Writing to CP15 register 0 is UNPREDICTABLE.

It is recommended that all the ID registers in Table 2-2 are implemented, but only the main ID register (<opcode2> == 0) is mandatory. Whether or not other ID registers are implemented is IMPLEMENTATION DEFINED.

If an <opcode2> value corresponding to an unimplemented or reserved ID register is encountered, the System Control coprocessor returns the value of the main ID register.

ID registers other than the main ID register are defined so that when implemented, their value cannot be equal to that of the main ID register. Software can therefore determine whether they exist by reading both the main ID register and the desired register and comparing their values. If the two values are not equal, the desired register exists.
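The existence test described above can be sketched in C. Reading CP15 needs a privileged MRC, so the sketch takes a hypothetical accessor function (read_id_fn) instead of touching hardware. The simulated part fake_read_id and the values it returns are made up purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: returns CP15 register 0 read with the given
   <opcode2>. On real hardware this would wrap a privileged MRC. */
typedef uint32_t (*read_id_fn)(unsigned opcode2);

/* An optional ID register exists if its value differs from the main
   ID register, because unimplemented registers read as the main ID. */
static bool id_register_exists(read_id_fn read_id, unsigned opcode2)
{
    if (opcode2 == 0)
        return true;   /* the main ID register is mandatory */
    return read_id(opcode2) != read_id(0);
}

/* Simulated part for illustration: implements only <opcode2> 0 and 1.
   Both values are arbitrary examples. */
static uint32_t fake_read_id(unsigned opcode2)
{
    const uint32_t main_id  = 0x41129200u;
    const uint32_t other_id = 0x0D172172u;
    return opcode2 == 1 ? other_id : main_id;
}
```

With the simulated part, id_register_exists(fake_read_id, 1) is true and id_register_exists(fake_read_id, 2) is false.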

2.3.1 Main ID register

When CP15 register 0 is read with <opcode2> == 0, an identification code is returned from which, among other things, the ARM architecture version number can be determined, as well as whether or not the Thumb instruction set has been implemented.

Note

Only some of the fields in CP15 register 0 are architecturally defined. The rest are IMPLEMENTATION DEFINED and provide more detailed information about the exact processor variant. Consult individual datasheets for the precise identification codes used for each processor.

For historical reasons, there are three distinct ways in which the CP15 register 0 ID code might need to be interpreted. To determine which to use, look at bits[15:12] of the ID code:

• if they are 0x0, this indicates a pre-ARM7 processor

• if they are 0x7, this indicates that the processor is in the ARM7 family

• otherwise, a more recent processor family than ARM7 is involved
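The three-way choice above amounts to a switch on bits[15:12] of the ID code. A minimal C sketch follows; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum id_format {
    ID_PRE_ARM7,   /* bits[15:12] == 0x0 */
    ID_ARM7,       /* bits[15:12] == 0x7 */
    ID_POST_ARM7   /* any other value    */
};

/* Select the interpretation of a main ID code from bits[15:12]. */
static enum id_format id_code_format(uint32_t id)
{
    switch ((id >> 12) & 0xFu) {
    case 0x0: return ID_PRE_ARM7;
    case 0x7: return ID_ARM7;
    default:  return ID_POST_ARM7;
    }
}
```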

Table 2-2 System Control coprocessor ID registers

Post-ARM7 processors

If bits[15:12] of the ID code are neither 0x0 nor 0x7, the ID code is interpreted as follows:

Bits[3:0] Contain the IMPLEMENTATION DEFINED revision number for the processor.

Bits[15:4] Contain an IMPLEMENTATION DEFINED representation of the primary part number for the processor. The top four bits of this number are not allowed to be 0x0 or 0x7.

Bits[19:16] Contain an architecture code. The following architecture codes are defined (all other values of the architecture code are reserved by ARM Ltd):

Bits[23:20] Contain an IMPLEMENTATION DEFINED variant number. This is typically used to distinguish two variants of the same primary part, for example, two different cache size variants.

Bits[31:24] Contain an implementor code. The following codes are defined (all other values of the implementor code are reserved by ARM Ltd):


ARM7 family processors

If bits[15:12] of the ID code are 0x7, the ID code is interpreted as follows:

Bits[3:0] Contain the IMPLEMENTATION DEFINED revision number for the processor.

Bits[15:4] Contain an IMPLEMENTATION DEFINED representation of the primary part number for the processor. The top four bits of this number are 0x7.

Bits[22:16] Contain an IMPLEMENTATION DEFINED variant number.

Bit[23] Indicates which of the two possible architectures for an ARM7-based processor is involved:

1 Architecture 4T

Bits[31:24] Contain an implementor code. See Post-ARM7 processors for these codes.
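The two field layouts described above (post-ARM7 and ARM7 family) can be unpacked with shifts and masks. The C sketch below uses a hypothetical struct and function names, and it returns the raw field values without interpreting the architecture or implementor codes.

```c
#include <stdint.h>

struct main_id_fields {
    unsigned revision;      /* bits[3:0]                                 */
    unsigned part;          /* bits[15:4]                                */
    unsigned architecture;  /* bits[19:16], or bit[23] for ARM7 family   */
    unsigned variant;       /* bits[23:20], or bits[22:16] for ARM7      */
    unsigned implementor;   /* bits[31:24]                               */
};

/* Layout used when bits[15:12] are neither 0x0 nor 0x7. */
static struct main_id_fields decode_post_arm7_id(uint32_t id)
{
    struct main_id_fields f;
    f.revision     = id & 0xFu;
    f.part         = (id >> 4) & 0xFFFu;
    f.architecture = (id >> 16) & 0xFu;
    f.variant      = (id >> 20) & 0xFu;
    f.implementor  = (id >> 24) & 0xFFu;
    return f;
}

/* Layout used when bits[15:12] are 0x7. */
static struct main_id_fields decode_arm7_id(uint32_t id)
{
    struct main_id_fields f;
    f.revision     = id & 0xFu;
    f.part         = (id >> 4) & 0xFFFu;
    f.variant      = (id >> 16) & 0x7Fu;
    f.architecture = (id >> 23) & 0x1u;
    f.implementor  = (id >> 24) & 0xFFu;
    return f;
}
```

For the illustrative value 0x41129200, decode_post_arm7_id gives revision 0, part 0x920, architecture code 2, variant 1 and implementor code 0x41.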

Pre-ARM7 processors

Four processors prior to ARM7 use ID codes in which bits[15:12] are 0x0, and no further processors will be allocated such ID codes. They are interpreted as a 28-bit processor ID and a 4-bit revision number:

The processor ID values are as follows:


2.3.2 Cache Type register

If present, the Cache Type register supplies the following details about the cache:

• whether it is a unified cache or separate instruction and data caches

• its size, line length and associativity

• whether it is a write-through cache or a write-back cache

• how it can be cleaned efficiently (in the case of a write-back cache)

• whether cache lock-down is supported

See Types of cache on page B5-5 for a discussion of these details.

The format of the Cache Type register is:

ctype Specifies details of the cache not specified by the S bit and the Dsize and Isize fields. See Table 2-3 on page B2-9 for details of the encoding. All values not specified in the table are reserved for future expansion.

S bit Specifies whether the cache is a unified cache (S == 0), or separate instruction and data caches (S == 1). If S == 0, the Isize and Dsize fields both describe the unified cache, and must be identical.

Dsize Specifies the size, line length and associativity of the data cache, or of the unified cache if S == 0. See Cache size fields on page B2-10 for details of the encoding.

Isize Specifies the size, line length and associativity of the instruction cache, or of the unified cache if S == 0. See Cache size fields on page B2-10 for details of the encoding.
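As an illustration of how software might split the Cache Type register into these four fields, here is a C sketch. The register format diagram is not reproduced in this text, so the bit positions used (ctype in bits[28:25], S in bit[24], Dsize in bits[23:12], Isize in bits[11:0]) are an assumption and should be checked against the full manual or the processor datasheet. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct cache_type {
    unsigned ctype;     /* cache type code                       */
    bool     separate;  /* S bit: true = separate I and D caches */
    unsigned dsize;     /* 12-bit data (or unified) size field   */
    unsigned isize;     /* 12-bit instruction size field         */
};

/* Split a Cache Type register value into its fields.
   Bit positions are assumed, see the text above. */
static struct cache_type decode_cache_type(uint32_t reg)
{
    struct cache_type t;
    t.ctype    = (reg >> 25) & 0xFu;        /* assumed bits[28:25] */
    t.separate = ((reg >> 24) & 1u) != 0;   /* assumed bit[24]     */
    t.dsize    = (reg >> 12) & 0xFFFu;      /* assumed bits[23:12] */
    t.isize    = reg & 0xFFFu;              /* assumed bits[11:0]  */
    return t;
}
```

When S == 0, a caller could additionally check that dsize and isize are identical, as the description requires.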

Table 2-3 Cache type values


The Read data block method of cleaning write-back caches encoded by ctype == 0b0001 consists of loading a sequential block of data with size equal to that of the cache, and which is known not to be in the cache already. It is only suitable for use when the cache organization guarantees that this causes the entire cache to be reloaded. (For example, direct-mapped caches normally have this property, as do caches using some types of round-robin replacement.)

Note

This method of cache cleaning must only be used if the Cache Type register has ctype == 0b0001, or if implementation documentation states that it is a valid method for the implementation.

Register 7: Cache functions on page B5-15 gives details of the register 7 operations used for cleaning other write-back caches.

For an explanation of cache lockdown and of the formats referred to in Table 2-3, see Register 9: Cache lockdown on page B5-18.

2.3.3 Cache size fields

The Dsize and Isize fields in the Cache Type register have the same format, as follows:

Bits[11:9] are reserved for future expansion.

The size of the cache is determined by the size field and M bit, as shown in Table 2-4.

Table 2-4 Cache sizes

size field    Size if M == 0    Size if M == 1


The line length of the cache is determined by the len field, as shown in Table 2-5.

The associativity of the cache is determined by the assoc field and M bit, as shown in Table 2-6.

The cache absent encoding overrides all other data in the cache size field.

Alternatively, the following formulae can be used to determine the values LINELEN, ASSOCIATIVITY and NSETS, defined in Cache size on page B5-4, once the cache absent case (assoc == 0b000, M == 1) has been checked for and eliminated:

LINELEN = 1 << (len+3) /* In bytes */

MULTIPLIER = 2 + M

ASSOCIATIVITY = MULTIPLIER << (assoc-1)

NSETS = 1 << (size + 6 - assoc - len)
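The formulae above can be written as a small C function. This sketch takes the already-extracted size, assoc, M and len fields as arguments, so it makes no assumption about where they sit in the register. The struct and function names are hypothetical. Because shifting by a negative amount is undefined in C, the assoc == 0 case is handled by shifting left by assoc and then right by one, which gives the same result as MULTIPLIER << (assoc-1) for the valid encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct cache_geometry {
    bool     present;        /* false for the cache absent encoding */
    uint32_t linelen;        /* line length in bytes                */
    uint32_t associativity;  /* number of ways                      */
    uint32_t nsets;          /* number of sets                      */
};

static struct cache_geometry cache_geometry_from_fields(unsigned size,
                                                        unsigned assoc,
                                                        unsigned m,
                                                        unsigned len)
{
    struct cache_geometry g = { false, 0, 0, 0 };

    assert(size < 8 && assoc < 8 && m < 2 && len < 4);

    /* The cache absent case overrides everything else. */
    if (assoc == 0 && m == 1)
        return g;

    /* Encodings that would give a negative set exponent are not
       expected; this sketch treats them as a caller error. */
    assert(size + 6 >= assoc + len);

    g.present       = true;
    g.linelen       = UINT32_C(1) << (len + 3);
    g.associativity = ((UINT32_C(2) + m) << assoc) >> 1;
    g.nsets         = UINT32_C(1) << (size + 6 - assoc - len);
    return g;
}
```

For example, size = 5, assoc = 6, M = 0 and len = 2 give 32-byte lines, 64 ways and 8 sets.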

Table 2-5 Cache line lengths

Title	Introduction to Memory and System Architectures
Institution	University of Cambridge
Subject	Computer Architecture
Type	Reference Manual
Year of publication	2000
City	Cambridge
