ARM7
ARM7TDMI
TDMI stands for:
– T: Thumb instruction set
– D: Debug interface (JTAG)
– M: Multiplier (hardware, enhanced for 64-bit results)
– I: EmbeddedICE macrocell (ICEBreaker), the on-chip breakpoint and watchpoint logic
ARM7/ARM9 Architecture Feature Highlights
• 32/16-bit RISC architecture (ARMv4T)
• 32-bit ARM instruction set for maximum performance and flexibility
• 16-bit Thumb instruction set for increased code density
• Unified bus interface: a 32-bit data bus carries both instructions and data
• 8-, 16-, and 32-bit data types
• Three-stage pipeline
• 4 GB linear address space
• 32-bit ALU and high-performance multiplier
• 37 32-bit registers (31 general-purpose registers and 6 status registers)
• Very small die size and low power consumption
• Fully static operation
• Coprocessor interface
• Extensive debug facilities:
– EmbeddedICE-RT real-time debug unit
– On-chip JTAG interface unit
• Interface for direct connection to the Embedded Trace Macrocell (ETM)
• Pipelined (ARM7: 3 stages)
• Cached (depending on the implementation)
• Von Neumann-type bus structure (ARM7), Harvard (ARM9)
• 7 modes of operation (usr, fiq, irq, svc, abt, sys, und)
• Simple structure -> reasonably good speed / power consumption ratio
• Very low power consumption: industry leader in MIPS/Watt
Differences between RISC and CISC
CISC                                            RISC
Variable-size instructions with many formats    Fixed-size instructions (32 bit) with few formats
Multi-clock complex instructions                Single-clock reduced instructions
Memory-to-memory load and store instructions    Register-to-register load and store instructions
Small code size, high cycles per instruction    Large code size, low cycles per instruction
Emphasis on hardware                            Emphasis on software
Increased hardware cost                         Reduced hardware cost
Pipeline Organization
• Increases speed: most instructions execute in a single cycle
• Versions:
– 3-stage (ARM7TDMI and earlier)
– 5-stage (ARMS, ARM9TDMI)
– 6-stage (ARM10TDMI)
ARM7 Pipeline Model
Instruction   Cycle: t    t+1       t+2       t+3       t+4
i                    Fetch     Decode    Execute
i+1                            Fetch     Decode    Execute
i+2                                      Fetch     Decode    Execute
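The overlap in the pipeline model can be reproduced with a few lines of Python. This is a minimal sketch of an ideal three-stage pipeline with no stalls or branches; the function names `pipeline_schedule` and `render` are illustrative and not part of any ARM tool.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each instruction index to {cycle: stage} for an ideal, stall-free pipeline."""
    schedule = {}
    for i in range(n_instructions):
        # Instruction i enters the first stage in cycle i and then
        # advances by exactly one stage per cycle.
        schedule[i] = {i + s: name for s, name in enumerate(stages)}
    return schedule


def render(schedule, stages=STAGES):
    """Format the schedule as a text table: one row per instruction, one column per cycle."""
    total_cycles = len(schedule) + len(stages) - 1
    header = "instr  " + "".join(f"t+{c:<7}" for c in range(total_cycles))
    rows = [header.rstrip()]
    for i, cycles in schedule.items():
        cells = "".join(f"{cycles.get(c, ''):<9}" for c in range(total_cycles))
        rows.append((f"i+{i:<5}" + cells).rstrip())
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(pipeline_schedule(3)))
```

With three instructions the table spans five cycles (t to t+4): after the first two cycles of pipeline fill, one instruction finishes Execute in every cycle.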
Pipeline Organization
• 5-stage pipeline:
– Reduces work per cycle => allows higher clock frequency
– Separates data and instruction memory => reduction of CPI (average number of clock Cycles Per Instruction)
• Stages: Fetch, Decode, Execute, Buffer/data, Write-back
Pipeline Organization
• The pipeline is flushed and refilled on a branch, which slows execution
• Special features in the instruction set eliminate small jumps in code to obtain the best flow through the pipeline
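The cost of flushing can be estimated with a simple cycle count. The sketch below is a simplified model and not a cycle-accurate ARM7 description: it assumes every instruction completes in one cycle once the pipeline is full, and that each taken branch discards the `stages - 1` instructions fetched behind it. The function names are hypothetical.

```python
def total_cycles(n_instructions, taken_branches, stages=3):
    """Estimate cycles for a run of instructions under the simplified flush model."""
    if not 0 <= taken_branches <= n_instructions:
        raise ValueError("taken_branches must be between 0 and n_instructions")
    fill = stages - 1           # cycles before the first instruction reaches the last stage
    flush_penalty = stages - 1  # instructions discarded and refetched per taken branch
    return fill + n_instructions + taken_branches * flush_penalty


def cpi(n_instructions, taken_branches, stages=3):
    """Average clock cycles per instruction for the same run."""
    return total_cycles(n_instructions, taken_branches, stages) / n_instructions


if __name__ == "__main__":
    print(total_cycles(10, 0), total_cycles(10, 2))
    print(cpi(100, 0), cpi(100, 20))
```

In this model every taken branch that is avoided saves `stages - 1` cycles, which is why instruction-set features that remove short jumps (ARM's conditional execution is one such feature) improve the flow through the pipeline.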
ARM-7 Architecture
ARM Architecture Version Summary
Core                                  Version  Feature
ARM1                                  v1       26-bit address
ARM2, ARM2as, ARM3                    v2       32-bit multiply
                                               Coprocessor
ARM6, ARM60, ARM610,                  v3       32-bit addresses
ARM7, ARM710,                                  Separate PC and PSRs
ARM7D, ARM7DI                                  Undefined instruction and Abort modes
                                               Fully static
                                               Big or little endian
StrongARM, SA-110, SA-1100,           v4       Halfword and signed halfword/byte support
ARM8, ARM810                                   Enhanced multiplier
                                               System mode
ARM7TDMI, ARM710T, ARM720T, ARM740T,  v4T      Thumb instruction set
ARM9TDMI, ARM920T, ARM940T
[Figure: ARM7 core datapath with the DECODE and EXECUTE stages marked. Labeled blocks: instruction decode and control, register bank (with PC), multiply register, A bus, B bus, ALU bus, barrel shifter, ALU.]
Processor Modes
[Figure: register organization by processor mode, showing the cpsr and the per-mode spsr registers.]
Program status register layout, from bit 31 down to bit 0 (fields f, s, x, c):
N Z C V Q | J | undefined | I F T | mode
• Condition code flags
– N = Negative result from ALU
– Z = Zero result from ALU
– C = ALU operation Carried out
– V = ALU operation oVerflowed
• Sticky Overflow flag - Q flag
– Architecture 5TE/J only
– Indicates if saturation has occurred
• J bit
– Architecture 5TEJ only
– J = 1: Processor in Jazelle state
• Interrupt Disable bits
– I = 1: Disables the IRQ
– F = 1: Disables the FIQ
• T Bit
– Architecture xT only
– T = 0: Processor in ARM state
– T = 1: Processor in Thumb state
• Mode bits
– Specify the processor mode
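As an illustration of the status register fields, the sketch below decodes a 32-bit CPSR value in Python. The bit positions (N, Z, C, V in bits 31 to 28, I, F, T in bits 7 to 5, mode in bits 4 to 0) and the mode encodings are taken from the ARM architecture documentation rather than from these slides; the function name `decode_cpsr` is hypothetical.

```python
# Mode field encodings (bits 4..0) for the seven ARM7 processor modes.
MODES = {
    0b10000: "usr",
    0b10001: "fiq",
    0b10010: "irq",
    0b10011: "svc",
    0b10111: "abt",
    0b11011: "und",
    0b11111: "sys",
}


def bit(value, position):
    """Return True if the given bit of value is set."""
    return bool((value >> position) & 1)


def decode_cpsr(value):
    """Split a 32-bit CPSR value into its condition flags, control bits and mode."""
    return {
        "N": bit(value, 31),  # negative result
        "Z": bit(value, 30),  # zero result
        "C": bit(value, 29),  # carry out
        "V": bit(value, 28),  # overflow
        "I": bit(value, 7),   # IRQ disabled when set
        "F": bit(value, 6),   # FIQ disabled when set
        "T": bit(value, 5),   # Thumb state when set
        "mode": MODES.get(value & 0x1F, "reserved"),
    }


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
```

The value `0x600000D3` decodes to Z and C set, both interrupts disabled, ARM state, Supervisor mode.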
What Are Exceptions?
Exception Entry
When an exception arises, the ARM core completes the current instruction as best it can (except for a reset exception) and then handles the exception, starting from a specific location (the exception vector).
The processor performs the following sequence:
• Change to the operating mode corresponding to the particular exception
• Store the return address in LR_<mode>
• Copy the old CPSR into SPSR_<mode>
• Set the appropriate CPSR bits:
– If the core is currently in Thumb state, ARM state is entered
– Disable IRQs by setting bit 7
– If the exception is a fast interrupt, disable further fast interrupts by setting bit 6 of the CPSR
• Force the PC to the relevant vector address
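The entry sequence can be traced with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the vector addresses and mode encodings come from the ARM architecture documentation rather than from these slides, reset is left out because it does not save a usable return state, and the caller supplies the return address because its exact offset depends on the exception type. `Core` and `enter_exception` are hypothetical names.

```python
from dataclasses import dataclass, field

# (vector address, mode entered) per exception type.
VECTORS = {
    "undefined": (0x04, "und"),
    "swi": (0x08, "svc"),
    "prefetch_abort": (0x0C, "abt"),
    "data_abort": (0x10, "abt"),
    "irq": (0x18, "irq"),
    "fiq": (0x1C, "fiq"),
}
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011, "abt": 0b10111, "und": 0b11011}
I_BIT, F_BIT, T_BIT = 1 << 7, 1 << 6, 1 << 5


@dataclass
class Core:
    cpsr: int = 0b10000                        # user mode, ARM state, interrupts enabled
    pc: int = 0
    lr: dict = field(default_factory=dict)     # banked LR_<mode>
    spsr: dict = field(default_factory=dict)   # banked SPSR_<mode>


def enter_exception(core, kind, return_address):
    vector, mode = VECTORS[kind]
    old_cpsr = core.cpsr
    new_cpsr = (old_cpsr & ~0x1F) | MODE_BITS[mode]  # change to the exception's mode
    core.lr[mode] = return_address                   # store return address in LR_<mode>
    core.spsr[mode] = old_cpsr                       # copy old CPSR into SPSR_<mode>
    new_cpsr &= ~T_BIT                               # leave Thumb state, enter ARM state
    new_cpsr |= I_BIT                                # disable IRQs (bit 7)
    if kind == "fiq":
        new_cpsr |= F_BIT                            # disable further FIQs (bit 6)
    core.cpsr = new_cpsr & 0xFFFFFFFF
    core.pc = vector                                 # force PC to the vector address
    return core


if __name__ == "__main__":
    core = enter_exception(Core(cpsr=0x30, pc=0x8000), "irq", 0x8004)
    print(hex(core.cpsr), hex(core.pc), hex(core.spsr["irq"]))
```

Starting from user mode in Thumb state (`0x30`), an IRQ leaves the old CPSR in `SPSR_irq`, clears T, sets I, and selects IRQ mode before the PC is forced to the vector.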
Exception Return
Once the exception has been handled, the user task is normally resumed.
The sequence is:
• Any modified user registers must be restored from the handler's stack
• The CPSR must be restored from the appropriate SPSR
• The PC must be changed back to the relevant instruction address
The last two steps happen atomically as part of a single instruction.
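A minimal sketch of the return sequence, assuming the handler pushed the registers it modified onto a Python list that stands in for its stack. The `offset` parameter is an assumption that models the adjustment applied to the link register, which depends on the exception type; `exception_return` is a hypothetical name.

```python
def exception_return(stack, n_saved, spsr, lr, offset=0):
    """Model the three return steps and give back (restored_registers, cpsr, pc)."""
    if n_saved > len(stack):
        raise ValueError("handler stack holds fewer values than requested")
    # Step 1: restore the modified registers from the handler's stack.
    # Values were pushed in order, so popping yields them in reverse.
    restored = [stack.pop() for _ in range(n_saved)][::-1]
    # Steps 2 and 3 are computed together and returned as one result,
    # mirroring the fact that the hardware performs them in a single instruction.
    new_cpsr = spsr
    new_pc = (lr - offset) & 0xFFFFFFFF
    return restored, new_cpsr, new_pc


if __name__ == "__main__":
    handler_stack = [0x11, 0x22, 0x33]
    print(exception_return(handler_stack, 2, spsr=0x10, lr=0x8008, offset=4))
```

On real hardware the final two steps are typically a single data-processing instruction with the S bit set and the PC as destination, such as `MOVS pc, lr` or `SUBS pc, lr, #4`.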
Exceptions of ARM-7
• Mode changes can be made under:
– Software control
– External interrupts
– Exception processing
• The modes other than user mode are privileged modes:
– Have full access to system resources
– Can change mode freely
• Exception modes:
– FIQ
– IRQ
– Supervisor mode
– Abort: data abort and instruction prefetch abort
– Undefined
Exception
Class      Cause
Interrupt  External stimulus
Fault      Internal cause
Trap       Trap instruction
Exception (cont’d)
Highest priority:
1. Reset
2. Data abort
3. FIQ
4. IRQ
5. Prefetch abort
Lowest priority:
6. Undefined Instruction, Software interrupt.
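When several exceptions are pending at once, the priority list decides which one is taken first. The sketch below encodes the six levels listed above in a lookup table; the exception names used as keys and the function `highest_priority` are illustrative choices.

```python
# Lower number = higher priority, following the list above.
# Undefined instruction and software interrupt share the lowest level.
PRIORITY = {
    "reset": 1,
    "data_abort": 2,
    "fiq": 3,
    "irq": 4,
    "prefetch_abort": 5,
    "undefined": 6,
    "swi": 6,
}


def highest_priority(pending):
    """Return the pending exception that is serviced first, or None if none is pending."""
    pending = list(pending)
    if not pending:
        return None
    # min() keeps the first entry on ties, which only affects the shared lowest level.
    return min(pending, key=lambda name: PRIORITY[name])


if __name__ == "__main__":
    print(highest_priority(["irq", "fiq", "data_abort"]))
    print(highest_priority(["prefetch_abort", "irq"]))
```

An unknown exception name raises `KeyError`, which keeps typos from being silently ignored.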
Memory Organization
1 Little-endian
2 Big-endian
Little-endian
The least significant byte (LSB) value, 0Dh, is at the lowest address. The other bytes follow in increasing order of significance.
Big-endian
The most significant byte (MSB) is at the lowest address. The other bytes follow in decreasing order of significance.
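The two byte orders can be compared directly in Python. The word `0x0A0B0C0D` is an assumed example value, chosen so that its least significant byte is 0Dh; `byte_layout` is a hypothetical helper name.

```python
WORD = 0x0A0B0C0D  # assumed example word; its least significant byte is 0x0D


def byte_layout(word, byteorder):
    """Return the four bytes of a 32-bit word as hex strings, lowest address first."""
    return [f"{b:02X}" for b in word.to_bytes(4, byteorder)]


if __name__ == "__main__":
    for order in ("little", "big"):
        cells = byte_layout(WORD, order)
        # Index 0 is the lowest memory address of the word.
        print(f"{order:>6}: " + " ".join(f"[+{i}]={c}" for i, c in enumerate(cells)))
```

In the little-endian layout 0D sits at offset +0; in the big-endian layout 0A does.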
Advanced Microcontroller Bus Architecture (AMBA)
• Execute
– An operand is shifted and the ALU result generated. If the instruction is a load or store, the memory address is computed in the ALU.
5-Stage Pipeline Organization (2/2)
• Buffer/Data
– Data memory is accessed if required.
• Write-back
– The results generated by the instruction are written back to the register file, including any data loaded from memory.
[Figure: 5-stage pipeline datapath. Labels: fetch (I-cache, next pc, +4, pc + 4), execute (shift reg, shift, pre-index, post-index, mul, ALU, mux, forwarding paths), buffer/data (D-cache, load/store address, byte repl., rot/sgn ex), write-back (register write); PC update paths for B, BL, MOV pc, SUBS pc, LDR pc and LDM/STM.]