Because of the wide variety of systems based on ARM processors, some of the functionality described in Part B might be inappropriate to any given system. Furthermore, some ARM processors have implemented functions in a different manner to the one described here. Because of this, the datasheet or Technical Reference Manual for a particular ARM processor is the definitive source for its memory and system control facilities.
Part B therefore does not attempt to specify absolute requirements on the functionality of the System Control coprocessor or other memory system components. Instead, it contains guidelines which, if followed:
• mean that the system is more likely to be compatible with existing and future ARM software
• probably make it easier to port incompatible software to the system
In order to provide an adequate description of the range of memory and system facilities on existing ARM implementations, Part B describes a number of options that will not be used on new ARM implementations. For information on the rules that must be followed by new implementations of the memory and system architectures, contact ARM Ltd.
The fact that Part B describes a broad range of facilities, many of which are used only on some existing ARM implementations, also means that architecture version numbers for the memory and system architectures would not be helpful or descriptive. They are therefore not used.
1.2 System-level issues
This section lists a number of general and operating-system issues that the system designer needs to address when using an ARM processor.
1.2.1 Memory systems, write buffers and caches
ARM processors and software are designed to be connected to a byte-addressed memory. Word and halfword accesses to the memory ignore the alignment of the address and access the naturally-aligned value that is addressed (so a memory access ignores address bits 0 and 1 for word accesses, and ignores bit 0 for halfword accesses). The endianness of the ARM processor should normally match that of the memory system, or be configured to match it before any non-word accesses occur (when the endianness is configurable and CP15 is implemented, bit[7] of CP15 register 1 controls the endianness).
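The address-bit behavior described above can be modeled in a few lines of C. This is a minimal sketch for illustration only: the function names are hypothetical and are not part of the architecture.

```c
#include <stdint.h>

/* Address presented to memory for a word access:
 * address bits 1 and 0 are ignored. */
static uint32_t word_access_address(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* Address presented to memory for a halfword access:
 * address bit 0 is ignored. */
static uint32_t halfword_access_address(uint32_t addr)
{
    return addr & ~UINT32_C(1);
}
```

For example, under this model a word access to 0x1003 addresses the word at 0x1000, and a halfword access to 0x1003 addresses the halfword at 0x1002.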
Memory that is used to hold programs and data should be marked as follows:
• Main (RAM) memory is normally set as cachable and bufferable
• ROM memory is normally set as cachable, and should be marked as read only, so the bufferable attribute is not used and should be 1
Write buffers
Some ARM implementations incorporate a merging write buffer that subsumes multiple writes to the same location into a single write to main memory. Furthermore, some write buffers re-order writes, so that writes are issued to memory in a different order to the order in which they are issued by the processor. Therefore, I/O locations should not normally be marked as bufferable, to ensure all writes are issued to the I/O device in the correct order.
For writes to bufferable areas of memory, memory aborts can only be signaled to the processor as a result of conditions that are detectable at the time the data is placed in the write buffer. Conditions that can only be detected when the data is later written to main memory (such as a parity error from main memory) must be handled by other methods (typically by raising an interrupt).
Caches
Frame buffers can be cachable, but frame buffers on writeback cache implementations must be copied back to memory after the frame buffer has been updated. Frame buffers can be bufferable, but again the write buffer must be written back to memory after the frame buffer has been updated.
ARM processors do not normally support cache coherence between the ARM and other system bus masters. Bus snooping is not supported. If memory data is to be shared between multiple bus masters without taking special software measures to ensure coherency, then the data must be mapped as:
• uncachable to ensure that all reads access main memory
• unbufferable to ensure that all writes access main memory
Alternatively, using software, you can manage the coherence of data buffers that are read or written by another bus master by:
• cleaning data from writeback caches and write buffers to memory when the processor has written to the data buffer and before the other bus master reads the buffer
• flushing relevant data from caches when the buffer is being read after the other bus master has written the buffer
You can use an uncached, unbuffered semaphore to maintain synchronization between multiple bus masters (see Semaphores on page B1-6).
For implementations with writeback caches, all dirty cache data must be written back before any alterations are made to the MMU page tables, to ensure that cache line write back can use the page tables to form the correct physical address for the transfer.
You can index caches using either virtual or physical addresses. Physical pages must only be mapped into a single virtual page, otherwise the result is UNPREDICTABLE. ARM processors do not normally provide coherence between multiple virtual copies of a single physical page.
Some ARM implementations support separate instruction and data caches. Coherence between the data and instruction caches is not necessarily maintained in hardware, so if the instruction stream is written, the instruction cache and data cache must be made coherent. This can entail:
• cleaning the data cache (storing dirty data to memory)
• draining the write buffer (completing all buffered writes)
• flushing the instruction cache
Instruction and data memory incoherence occurs after a program has been loaded (and therefore treated as data) and is about to be executed. It also occurs if self-modifying code is used or generated.
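The ordering of the three steps listed above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: struct cache_ops and its members are hypothetical hooks that a port would map onto the cache and write buffer operations of the particular processor, and the trace_* functions are host-side stand-ins that only record the call order.

```c
#include <stddef.h>

/* Hypothetical platform hooks (not defined by the architecture). */
struct cache_ops {
    void (*clean_dcache)(void);
    void (*drain_write_buffer)(void);
    void (*flush_icache)(void);
};

/* Make newly written instructions visible to instruction fetch. */
static void sync_instruction_stream(const struct cache_ops *ops)
{
    ops->clean_dcache();        /* store dirty data to memory     */
    ops->drain_write_buffer();  /* complete all buffered writes   */
    ops->flush_icache();        /* discard stale instruction data */
}

/* Host-side stand-ins that only record the order of the calls. */
static char trace[4];
static size_t trace_len;

static void record(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

static void trace_clean(void) { record('C'); }
static void trace_drain(void) { record('D'); }
static void trace_flush(void) { record('F'); }

static const struct cache_ops trace_ops = {
    trace_clean, trace_drain, trace_flush
};
```

A program loader would call sync_instruction_stream() after copying code into memory and before branching to it. Whether all three steps are needed depends on the implementation.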
1.2.2 Interrupts
ARM processors implement fast and normal levels of interrupt. Both interrupts are signaled externally, and many implementations synchronize interrupts before an exception is raised.
Fast interrupt request (FIQ)
Disables subsequent normal and fast interrupts by setting the I and F bits in the CPSR.
Normal interrupt request (IRQ)
Disables subsequent normal interrupts by setting the I bit in the CPSR.
For more information, see Exceptions on page A2-13.
Canceling interrupts
It is the responsibility of software (the interrupt handler) to ensure that the cause of an interrupt is canceled (no longer signaled to the processor) before interrupts are re-enabled (by clearing the I and/or F bit in the CPSR). Interrupts can be canceled with any instruction that might make an external data bus access, meaning any load or store, a swap, or any coprocessor instruction.
Canceling an interrupt via an instruction fetch is UNPREDICTABLE. Canceling an interrupt with a load multiple that restores the CPSR and re-enables interrupts is UNPREDICTABLE.
Devices that do not instantaneously cancel an interrupt (that is, they do not cancel the interrupt before letting the access complete) must be probed by software to ensure that interrupts have been canceled before interrupts are re-enabled. This allows a device connected to a remote I/O bus to operate correctly.
1.2.3 Semaphores
The Swap and Swap Byte instructions have predictable behavior when used in two ways:
• Systems with multiple bus masters that use the Swap instructions to implement semaphores to control interaction between different bus masters.
In this case, the semaphores must be placed in an uncached and unbufferable region of memory. The Swap instruction then causes a (locked) read-write bus transaction.
This type of semaphore can be externally aborted.
• Systems with multiple threads running on a uniprocessor that use the Swap instructions to implement semaphores to control interaction of the threads.
In this case, the semaphores can be placed in a cached and bufferable region of memory, and a (locked) read-write bus transaction might or might not occur. The Swap and Swap Byte instructions are likely to have better performance on such a system than they do on a system with multiple bus masters (as described above).
This type of semaphore has UNPREDICTABLE behavior if it is externally aborted.
Semaphores placed in uncachable/bufferable memory regions have UNPREDICTABLE results. Semaphores placed in cachable/unbufferable memory regions have UNPREDICTABLE results.
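A binary semaphore built on a swap operation might look like the sketch below. It is a model of the data flow only, under stated assumptions: swap_word() is a hypothetical stand-in for the Swap instruction, written in portable C so that it can be run on a host, and it is NOT atomic. On a real target it would have to be a single Swap instruction so that the read and the write are indivisible. The names SEM_FREE, SEM_TAKEN, sem_try_acquire and sem_release are also illustrative.

```c
#include <stdint.h>

enum { SEM_FREE = 0, SEM_TAKEN = 1 };

/* Stand-in for the Swap instruction: stores new_value at addr and
 * returns the previous contents. This C version is not atomic. */
static uint32_t swap_word(volatile uint32_t *addr, uint32_t new_value)
{
    uint32_t old = *addr;
    *addr = new_value;
    return old;
}

/* Returns 1 if the semaphore was acquired, 0 if it was already held. */
static int sem_try_acquire(volatile uint32_t *sem)
{
    return swap_word(sem, SEM_TAKEN) == SEM_FREE;
}

/* Releases a semaphore that the caller holds. */
static void sem_release(volatile uint32_t *sem)
{
    *sem = SEM_FREE;
}
```

Where the semaphore word is placed (uncached and unbufferable, or cached and bufferable) follows the two cases described above. The code itself is the same in both.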
The System Control Coprocessor
This chapter describes coprocessor 15, the System Control coprocessor. It contains the following sections:
• About the System Control coprocessor on page B2-2
• Registers on page B2-3
• Register 0: ID codes on page B2-6
• Register 1: Control register on page B2-13
• Registers 2-15 on page B2-17.
2.1 About the System Control coprocessor
All of the standard memory and system facilities are controlled by coprocessor 15 (CP15), which is therefore called the System Control coprocessor. Some also use other methods of control, which are described in the chapters describing the facilities concerned. For example, the Memory Management Unit described in Chapter B3 Memory Management Unit is also controlled by page tables in memory.
If none of the standard memory and system facilities are implemented in a system, the System Control coprocessor might not be present. In this case, no coprocessor accepts CP15 instructions, and so all such instructions are UNDEFINED.
However, new implementations of the memory and system architectures must implement the System Control coprocessor, and must follow some additional rules about which facilities are implemented. For details of these rules, contact ARM Ltd.
This chapter describes the overall design of the System Control coprocessor and how its registers are accessed. Detailed information is given on some of its registers. Other registers are allocated to facilities described in detail in other chapters and are only summarized in this chapter.
2.2 Registers
The System Control coprocessor can contain up to 16 primary registers, each of which is 32 bits long. For some of these, additional bits in the register access instructions are used to identify a specific version of the register and/or specific types of access to the register, so the number of physical 32-bit registers in CP15 can be more than 16. However, the 4-bit primary register number is used to identify registers in descriptions of the System Control coprocessor, because it is the primary factor determining the function of the register.
CP15 registers can be read-only, write-only or read/write. The detailed descriptions of the registers specify:
• what types of access are allowed
• what functionality is invoked by each type of access
• whether a primary register identifies more than one physical register, and if so, how they are distinguished
• any other details that are relevant to the use of the register
2.2.1 Register access instructions
The only defined System Control coprocessor instructions are:
• MCR instructions to write an ARM register to a CP15 register
• MRC instructions to read the value of a CP15 register into an ARM register
All CP15 CDP, LDC and STC instructions are UNDEFINED.
The MCR and MRC instructions to access the CP15 registers use the generic syntax for those instructions:
MCR{<cond>} p15, 0, <Rd>, <CRn>, <CRm>{, <opcode2>}
MRC{<cond>} p15, 0, <Rd>, <CRn>, <CRm>{, <opcode2>}
where:
<cond> This is the condition under which the instruction is executed. The conditions are defined in The condition field on page A3-5. If <cond> is omitted, the AL (always) condition is used.
Bits[23:21] These bits of the instruction, which are the <opcode1> field in generic MRC and MCR instructions, are always 0b000 in valid CP15 instructions. If they are not 0b000, the instruction is UNPREDICTABLE.
<Rd> This is the ARM register involved in the transfer (the source register for MCR and the destination register for MRC). This register must not be R15, even though MCR instructions normally allow it to be R15. If R15 is specified for <Rd> in a CP15 MRC or MCR instruction, the instruction is UNPREDICTABLE.
<CRn> This is the primary CP15 register involved in the transfer (the destination register for MCR and the source register for MRC). The standard generic coprocessor register names are c0, c1, ..., c15.
<CRm> This is an additional coprocessor register name which is used for accesses to some primary registers to specify additional information about the version of the register and/or the type of access.
When the description of a primary register does not specify <CRm>, c0 must be specified. If another register is specified, the instruction is UNPREDICTABLE.
<opcode2> This is an optional 3-bit number which is used for accesses to some primary registers to specify additional information about the version of the register and/or the type of access. If it is omitted, 0 is used.
When the description of a primary register does not specify <opcode2>, it must be omitted or 0 must be specified. If another value is specified, the instruction is UNPREDICTABLE.
These MCR and MRC instructions can only be used while the processor is in a privileged mode. If they are executed while the processor is in User mode, an Undefined Instruction exception occurs.
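To make the operand rules above concrete, the following C sketch assembles the 32-bit instruction word for a CP15 MCR or MRC with the AL condition. It is illustrative only. The function name cp15_encode is hypothetical, and apart from bits[23:21] the field positions are not given in this section: they are assumed from the generic MCR and MRC encodings described in Part A. The sketch returns 0 for operands that the rules above make UNPREDICTABLE or that are out of range.

```c
#include <stdint.h>

/* Encode "MCR/MRC p15, 0, R<rd>, c<crn>, c<crm>, <opcode2>" with the
 * AL condition. Returns 0 if an operand is out of range or if rd is
 * R15 (UNPREDICTABLE for CP15 accesses). */
static uint32_t cp15_encode(int is_mrc, unsigned rd, unsigned crn,
                            unsigned crm, unsigned opcode2)
{
    if (rd >= 15 || crn > 15 || crm > 15 || opcode2 > 7)
        return 0;

    return (UINT32_C(0xE) << 28)                  /* cond = AL          */
         | (UINT32_C(0xE) << 24)                  /* bits[27:24] = 1110 */
         | (UINT32_C(0) << 21)                    /* opcode1 = 0b000    */
         | ((uint32_t)(is_mrc ? 1 : 0) << 20)     /* 1 = MRC, 0 = MCR   */
         | ((uint32_t)crn << 16)                  /* <CRn>              */
         | ((uint32_t)rd << 12)                   /* <Rd>               */
         | (UINT32_C(15) << 8)                    /* coprocessor 15     */
         | ((uint32_t)opcode2 << 5)               /* <opcode2>          */
         | (UINT32_C(1) << 4)                     /* bit[4] = 1         */
         | (uint32_t)crm;                         /* <CRm>              */
}
```

Under these assumptions, cp15_encode(1, 0, 0, 0, 0) corresponds to MRC p15, 0, R0, c0, c0, 0 and yields 0xEE100F10.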
Note
If access to some System Control coprocessor functionality by User mode programs is required, the usual solution is that the operating system defines one or more SWIs to supply it. As the precise set of memory and system facilities available on different processors can vary considerably, it is recommended that all such SWIs are implemented in an easily replaceable module and that the SWI interface of this module is defined to be as independent of processor details as possible.
The IMB and IMB_Range SWIs described in Instruction Memory Barriers (IMBs) on page A2-28 are examples of such SWIs.
2.2.2 Primary register allocation
Table 2-1 shows the allocation of the primary registers of the System Control coprocessor.
Table 2-1 Primary register allocation
0    ID codes (read-only): ID and Cache type. See Register 0: ID codes on page B2-6
1    Control bits (read/write): Miscellaneous control bits. See Register 1: Control register on page B2-13
2    Memory protection and control
     MMU: Translation table base. See Register 2: Translation table base on page B3-23
     PU: Cachability bits. See Register 2: Cachability bits on page B4-6
3    Memory protection and control
     MMU: Domain access control. See Register 3: Domain access control on page B3-24
     PU: Bufferability bits. See Register 3: Bufferability bits on page B4-6
4    Memory protection and control
     MMU: Reserved. See Register 4: Reserved on page B3-24
     PU: Reserved. See Registers 4, 8, 10: Reserved on page B4-7
5    Memory protection and control
     MMU: Fault status. See Register 5: Fault status on page B3-24
     PU: Access permission bits. See Register 5: Access permission bits
7    Cache and write buffer: Cache/write buffer control. See Register 7: Cache functions on page B5-15
8    Memory protection and control
     MMU: TLB control. See Register 8: TLB functions on page B3-25
     PU: Reserved. See Registers 4, 8, 10: Reserved on page B4-7
9    Cache and write buffer: Cache lockdown. See Register 9: Cache lockdown on page B5-18
10   Memory protection and control
     MMU: TLB lockdown. See Register 10: TLB lockdown on page B3-27
     PU: Reserved. See Registers 4, 8, 10: Reserved on page B4-7
2.3 Register 0: ID codes
CP15 register 0 contains one or more identification codes for the ARM and system implementation. When this register is read, the opcode2 field of the MRC instruction selects which identification code is wanted, as shown in Table 2-2, and the CRm field must be specified as c0 (if it is not, the instruction is UNPREDICTABLE). Writing to CP15 register 0 is UNPREDICTABLE.
It is recommended that all the ID registers in Table 2-2 are implemented, but only the main ID register (<opcode2> == 0) is mandatory. Whether or not other ID registers are implemented is IMPLEMENTATION DEFINED.
If an <opcode2> value corresponding to an unimplemented or reserved ID register is encountered, the System Control coprocessor returns the value of the main ID register.
ID registers other than the main ID register are defined so that when implemented, their value cannot be equal to that of the main ID register. Software can therefore determine whether they exist by reading both the main ID register and the desired register and comparing their values. If the two values are not equal, the desired register exists.
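The existence check described above can be sketched in C as follows. This is a minimal sketch under stated assumptions: id_reader_fn is a hypothetical accessor that on a real target would wrap an MRC read of CP15 register 0 with the given <opcode2>, and fake_read_id() is a host-side stand-in that returns arbitrary made-up values and pretends that only <opcode2> == 1 is implemented.

```c
#include <stdint.h>

/* Hypothetical accessor for CP15 register 0 with a given <opcode2>. */
typedef uint32_t (*id_reader_fn)(unsigned opcode2);

/* Returns 1 if the ID register selected by opcode2 exists, else 0. */
static int id_register_exists(id_reader_fn read_id, unsigned opcode2)
{
    if (opcode2 == 0)
        return 1;  /* the main ID register is mandatory */

    /* An unimplemented register reads as the main ID register. */
    return read_id(opcode2) != read_id(0);
}

/* Host-side stand-in with arbitrary values, for illustration only. */
static uint32_t fake_read_id(unsigned opcode2)
{
    const uint32_t main_id  = UINT32_C(0x12345678);  /* made up */
    const uint32_t other_id = UINT32_C(0x0BADC0DE);  /* made up */

    return (opcode2 == 1) ? other_id : main_id;
}
```

With the stand-in above, id_register_exists(fake_read_id, 1) reports that the register exists, while id_register_exists(fake_read_id, 2) reports that it does not.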
2.3.1 Main ID register
When CP15 register 0 is read with <opcode2> == 0, an identification code is returned from which, among other things, the ARM architecture version number can be determined, as well as whether or not the Thumb instruction set has been implemented.
Note
Only some of the fields in CP15 register 0 are architecturally defined. The rest are IMPLEMENTATION DEFINED and provide more detailed information about the exact processor variant. Consult individual datasheets for the precise identification codes used for each processor.
For historical reasons, there are three distinct ways in which the CP15 register 0 ID code might need to be interpreted. To determine which to use, look at bits[15:12] of the ID code:
• if they are 0x0, this indicates a pre-ARM7 processor
• if they are 0x7, this indicates that the processor is in the ARM7 family
• otherwise, a more recent processor family than ARM7 is involved
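The three-way test on bits[15:12] can be written as a small C function. This is an illustrative sketch. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum id_format {
    ID_FORMAT_PRE_ARM7,   /* bits[15:12] == 0x0 */
    ID_FORMAT_ARM7,       /* bits[15:12] == 0x7 */
    ID_FORMAT_POST_ARM7   /* any other value    */
};

/* Select the interpretation of a main ID register value. */
static enum id_format classify_main_id(uint32_t id)
{
    uint32_t nibble = (id >> 12) & 0xFu;

    if (nibble == 0x0u)
        return ID_FORMAT_PRE_ARM7;
    if (nibble == 0x7u)
        return ID_FORMAT_ARM7;
    return ID_FORMAT_POST_ARM7;
}
```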
Table 2-2 System Control coprocessor ID registers
Post-ARM7 processors
If bits[15:12] of the ID code are neither 0x0 nor 0x7, the ID code is interpreted as follows:
Bits[3:0] Contain the IMPLEMENTATION DEFINED revision number for the processor.
Bits[15:4] Contain an IMPLEMENTATION DEFINED representation of the primary part number for the processor. The top four bits of this number are not allowed to be 0x0 or 0x7.
Bits[19:16] Contain an architecture code. The following architecture codes are defined (all other values of the architecture code are reserved by ARM Ltd):
Bits[23:20] Contain an IMPLEMENTATION DEFINED variant number. This is typically used to distinguish two variants of the same primary part, for example, two different cache size variants.
Bits[31:24] Contain an implementor code. The following codes are defined (all other values of the implementor code are reserved by ARM Ltd):
ARM7 family processors
If bits[15:12] of the ID code are 0x7, the ID code is interpreted as follows:
Bits[3:0] Contain the IMPLEMENTATION DEFINED revision number for the processor.
Bits[15:4] Contain an IMPLEMENTATION DEFINED representation of the primary part number for the processor. The top four bits of this number are 0x7.
Bits[22:16] Contain an IMPLEMENTATION DEFINED variant number.
Bit[23] Indicates which of the two possible architectures for an ARM7-based processor is involved:
1 Architecture 4T
Bits[31:24] Contain an implementor code See Post-ARM7 processors for these codes.
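The two field layouts described above (post-ARM7 and ARM7 family) can be unpacked with shifts and masks. The following C sketch is for illustration only. The struct and function names are hypothetical, and the meaning of the individual architecture and implementor code values is not decoded here.

```c
#include <stdint.h>

struct post_arm7_id {
    unsigned implementor;   /* bits[31:24] */
    unsigned variant;       /* bits[23:20] */
    unsigned architecture;  /* bits[19:16] */
    unsigned part;          /* bits[15:4]  */
    unsigned revision;      /* bits[3:0]   */
};

struct arm7_id {
    unsigned implementor;   /* bits[31:24] */
    unsigned is_arch_4t;    /* bit[23]: 1 means Architecture 4T */
    unsigned variant;       /* bits[22:16] */
    unsigned part;          /* bits[15:4]  */
    unsigned revision;      /* bits[3:0]   */
};

/* Unpack an ID code whose bits[15:12] are neither 0x0 nor 0x7. */
static struct post_arm7_id decode_post_arm7_id(uint32_t id)
{
    struct post_arm7_id f;

    f.implementor  = (unsigned)((id >> 24) & 0xFFu);
    f.variant      = (unsigned)((id >> 20) & 0xFu);
    f.architecture = (unsigned)((id >> 16) & 0xFu);
    f.part         = (unsigned)((id >> 4) & 0xFFFu);
    f.revision     = (unsigned)(id & 0xFu);
    return f;
}

/* Unpack an ID code whose bits[15:12] are 0x7. */
static struct arm7_id decode_arm7_id(uint32_t id)
{
    struct arm7_id f;

    f.implementor = (unsigned)((id >> 24) & 0xFFu);
    f.is_arch_4t  = (unsigned)((id >> 23) & 0x1u);
    f.variant     = (unsigned)((id >> 16) & 0x7Fu);
    f.part        = (unsigned)((id >> 4) & 0xFFFu);
    f.revision    = (unsigned)(id & 0xFu);
    return f;
}
```

A caller would first inspect bits[15:12] of the ID code to choose between the two decoders.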
Pre-ARM7 processors
Four processors prior to ARM7 use ID codes in which bits[15:12] are 0x0, and no further processors will be allocated such ID codes. They are interpreted as a 28-bit processor ID and a 4-bit revision number:
The processor ID values are as follows:
2.3.2 Cache Type register
If present, the Cache Type register supplies the following details about the cache:
• whether it is a unified cache or separate instruction and data caches
• its size, line length and associativity
• whether it is a write-through cache or a write-back cache
• how it can be cleaned efficiently (in the case of a write-back cache)
• whether cache lock-down is supported
See Types of cache on page B5-5 for a discussion of these details.
The format of the Cache Type register is:
ctype Specifies details of the cache not specified by the S bit and the Dsize and Isize fields. See Table 2-3 on page B2-9 for details of the encoding. All values not specified in the table are reserved for future expansion.
S bit Specifies whether the cache is a unified cache (S == 0), or separate instruction and data caches (S == 1). If S == 0, the Isize and Dsize fields both describe the unified cache, and must be identical.
Dsize Specifies the size, line length and associativity of the data cache, or of the unified cache if S == 0. See Cache size fields on page B2-10 for details of the encoding.
Isize Specifies the size, line length and associativity of the instruction cache, or of the unified cache if S == 0. See Cache size fields on page B2-10 for details of the encoding.
Table 2-3 Cache type values
The Read data block method of cleaning write-back caches encoded by ctype == 0b0001 consists of loading a sequential block of data with size equal to that of the cache, and which is known not to be in the cache already. It is only suitable for use when the cache organization guarantees that this causes the entire cache to be reloaded. (For example, direct-mapped caches normally have this property, as do caches using some types of round-robin replacement.)
Note
This method of cache cleaning must only be used if the Cache Type register has ctype == 0b0001, or if implementation documentation states that it is a valid method for the implementation.
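As a rough illustration of the Read data block method, the C sketch below walks a block of the given size and performs one load per cache line. It is a sketch under stated assumptions, not a definitive implementation: the function name is hypothetical, the caller must supply a block that is known not to be in the cache already, and loading a single word per line (rather than every word of the block) is an assumption that relies on each load allocating a whole line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one word from each line of a sequential block whose size equals
 * that of the cache. Returns the number of loads performed. */
static size_t clean_by_read_block(const volatile uint32_t *block,
                                  size_t cache_bytes, size_t line_bytes)
{
    size_t loads = 0;

    /* A line must hold at least one word, or the loop cannot advance. */
    assert(line_bytes >= sizeof(uint32_t));

    for (size_t off = 0; off < cache_bytes; off += line_bytes) {
        (void)block[off / sizeof(uint32_t)];  /* volatile read forces a load */
        loads++;
    }
    return loads;
}
```

For a 1024-byte cache with 32-byte lines, the function performs 32 loads.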
Register 7: Cache functions on page B5-15 gives details of the register 7 operations used for cleaning other write-back caches.
For an explanation of cache lockdown and of the formats referred to in Table 2-3, see Register 9: Cache lockdown on page B5-18.
2.3.3 Cache size fields
The Dsize and Isize fields in the Cache Type register have the same format, as follows:
Bits[11:9] are reserved for future expansion.
The size of the cache is determined by the size field and M bit, as shown in Table 2-4.
Table 2-4 Cache sizes (columns: size field, Size if M == 0, Size if M == 1)
The line length of the cache is determined by the len field, as shown in Table 2-5.
The associativity of the cache is determined by the assoc field and M bit, as shown in Table 2-6.
The cache absent encoding overrides all other data in the cache size field.
Alternatively, the following formulae can be used to determine the values LINELEN, ASSOCIATIVITY
and NSETS, defined in Cache size on page B5-4, once the cache absent case (assoc == 0b000, M == 1) has
been checked for and eliminated:
LINELEN = 1 << (len+3) /* In bytes */
MULTIPLIER = 2 + M
ASSOCIATIVITY = MULTIPLIER << (assoc-1)
NSETS = 1 << (size + 6 - assoc - len)
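The formulae above translate directly into C. The sketch below is illustrative and makes some assumptions that are labeled here and in the comments: the struct and function names are hypothetical, the size, assoc, M and len values are passed in already extracted (the bit positions of these fields within Dsize and Isize are not restated here), and the total size is taken to be the product LINELEN × ASSOCIATIVITY × NSETS. The associativity is computed as (MULTIPLIER << assoc) >> 1, which equals MULTIPLIER << (assoc-1) for assoc >= 1 and avoids a negative shift count when assoc == 0.

```c
#include <stdint.h>

struct cache_geometry {
    uint32_t line_len;       /* LINELEN, in bytes */
    uint32_t associativity;  /* ASSOCIATIVITY     */
    uint32_t nsets;          /* NSETS             */
};

/* Returns 1 and fills *out if the fields describe a cache.
 * Returns 0 for the cache absent encoding, or if the fields would
 * give a negative shift count (treated here as not meaningful). */
static int cache_geometry_from_fields(unsigned size, unsigned assoc,
                                      unsigned m, unsigned len,
                                      struct cache_geometry *out)
{
    int set_shift = (int)size + 6 - (int)assoc - (int)len;
    uint32_t multiplier = 2u + m;

    if (assoc == 0 && m == 1)
        return 0;  /* cache absent */
    if (set_shift < 0)
        return 0;

    out->line_len = UINT32_C(1) << (len + 3);
    out->associativity = (multiplier << assoc) >> 1;
    out->nsets = UINT32_C(1) << set_shift;
    return 1;
}

/* Assumed relationship: total bytes = LINELEN * ASSOCIATIVITY * NSETS. */
static uint32_t cache_total_bytes(const struct cache_geometry *g)
{
    return g->line_len * g->associativity * g->nsets;
}
```

For example, size == 5, assoc == 6, M == 0 and len == 2 give a 32-byte line, 64-way associativity and 8 sets, for a total of 16384 bytes under the assumed product.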
Table 2-5 Cache line lengths