ARM Processor

The document provides an overview of the ARM architecture, including a timeline of ARM processors from 1985 to 2002, details on the ARM instruction set and programming model, and descriptions of ARM registers, memory organization, data processing and transfer instructions, conditional execution, and control flow. It explains that ARM is a RISC processor designed for low power consumption with a 32-bit architecture and load-store instruction set.

Uploaded by

Ayan Acharya

INTRODUCTION

 ARM is a RISC processor.
 It is used in applications that need small size and high performance.
 Its simple architecture keeps power consumption low.

ARM System - On - Chip


Architecture 2
TIMELINE (1/2)
 1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
 1990: A partnership between Acorn and Apple leads to the founding of Advanced RISC Machines (A.R.M.).
 1991: ARM6, the first embeddable RISC microprocessor.
 1992 – 1994: Various companies use ARM (Sharp, Samsung). In 1993 ARM7, the first multimedia microprocessor, is introduced.

TIMELINE (2/2)
 1995: Introduction of Thumb and ARM8.
 1996 – 2000: Alcatel, Hyundai, Philips and Sony use ARM. In 1999 ARM cooperates with Ericsson on the development of Bluetooth.
 2000 – 2002: ARM's share of the 32 – bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.

THE ARM
ARCHITECTURE
GENERAL INFO (1/2)
AIM: Simple design

 Load – store architecture


 32 – bit data bus
 3 addressing modes

GENERAL INFO (2/2)
Simple architecture + simple instruction set + code density
=> small size and low power consumption

Registers
 31 general-purpose 32 – bit registers (16 visible at any one time), plus 6 status registers
 7 modes of operation
 Each mode sees a different set of registers and has a different level of control over the CPSR.

ARM Programming Model
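[Figure: ARM register organization by mode. Shaded registers in the original figure are usable in system modes only; the rest are usable in user mode.]

Mode        Registers
user        r0 – r12, r13, r14, r15 (PC), CPSR
fiq         r8_fiq – r12_fiq, r13_fiq, r14_fiq, SPSR_fiq
svc         r13_svc, r14_svc, SPSR_svc
abort       r13_abt, r14_abt, SPSR_abt
irq         r13_irq, r14_irq, SPSR_irq
undefined   r13_und, r14_und, SPSR_und

Each exception mode replaces the user registers of the same number with its own banked copies and adds an SPSR.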
CPSR

ARM CPSR format

Bits      Field
31 – 28   N Z C V (condition flags)
27 – 8    unused
7         I (IRQ disable)
6         F (FIQ disable)
5         T (Thumb state)
4 – 0     mode

N: Negative
Z: Zero
C: Carry
V: Overflow
Q: Saturation (for enhanced DSP instructions)
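The field layout above can be exercised with a short Python sketch that splits a CPSR value into its flags and mode. The function name, the dictionary form of the result and the example value 0x600000D3 are illustrative assumptions; the mode encodings are the standard ARM values and are not shown on the slide.

```python
# Mode encodings for CPSR bits 4..0 (standard ARM values, not listed on the slide).
MODES = {
    0b10000: "user", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10111: "abort", 0b11011: "undefined", 0b11111: "system",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its named fields."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ disable
        "F": (cpsr >> 6) & 1,    # FIQ disable
        "T": (cpsr >> 5) & 1,    # Thumb state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }

# Hypothetical value: Z and C set, both interrupts disabled, supervisor mode.
print(decode_cpsr(0x600000D3))
```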
Memory Organization
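 Address bus: 32 – bits
 1 word = 32 – bits
[Figure: byte-addressed memory, bit 31 on the left and bit 0 on the right, four byte addresses per row. Addresses 3 – 0 hold byte3, byte2, byte1, byte0; 7 – 4 hold byte6 and half-word4; 11 – 8 hold word8; 15 – 12 hold half-word14 and half-word12; 19 – 16 hold word16; 23 – 20 are shown empty.]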

Instruction Set
 Three instruction types
 Data processing
 Data transfer
 Control flow

Supervisor mode
 Code running in user mode cannot perform privileged operations itself; the operating system handles them.
 Through “supervisor calls”, user code enters system level, where system functions can be performed.

I/O System
 ARM handles peripherals as “memory mapped
devices with interrupt support”.
 Interrupts:
 IRQ: normal interrupt
 FIQ: fast interrupt

Exceptions
 Exceptions:
 Interrupts
 Supervisor Call
 Traps
 When an exception takes place:
 The value of the PC is copied to r14_exc and the CPSR to SPSR_exc, where exc is the exception mode.
 The operating mode changes to the respective exception mode.
 The PC is set to the exception vector address.
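The entry sequence can be sketched as a small Python model. The vector addresses and the mode entered by each exception are the standard ARM values rather than something stated on the slide. The dictionary-based state, the function name and the choice to save only the mode name in place of the full CPSR are simplifying assumptions. A real core also adjusts the saved return address by an exception-dependent offset, which this sketch ignores.

```python
# exception -> (mode entered, vector address); standard ARM vector table.
VECTORS = {
    "reset": ("svc", 0x00),
    "undefined": ("und", 0x04),
    "swi": ("svc", 0x08),
    "prefetch_abort": ("abt", 0x0C),
    "data_abort": ("abt", 0x10),
    "irq": ("irq", 0x18),
    "fiq": ("fiq", 0x1C),
}

def take_exception(state, kind):
    """Return a new state after taking exception `kind` (simplified model)."""
    mode, vector = VECTORS[kind]
    new = dict(state)
    new["r14_" + mode] = state["pc"]      # return address goes to the banked r14
    new["spsr_" + mode] = state["mode"]   # simplification: only the mode is saved
    new["mode"] = mode                    # switch to the exception mode
    new["pc"] = vector                    # continue at the exception vector
    return new

after = take_exception({"pc": 0x8000, "mode": "usr"}, "irq")
print(after)
```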
THE ARM
INSTRUCTION SET
Data Processing Instructions (1/2)
 Arithmetic Operations
ADD r0, r1, r2 ; r0 := r1 + r2, flags unchanged
ADDS r0, r1, r2 ; r0 := r1 + r2, flags updated
 Logical Operations
AND r0, r1, r2 ; r0 := r1 AND r2
 Register Movement
MOV r0, r2 ; r0 := r2
 Comparison
CMP r1, r2 ; set flags on r1 - r2, result discarded
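What "update flags" means for ADDS can be made concrete with a Python sketch of a 32-bit add that also returns N, Z, C and V. The function name and the returned dictionary are illustrative assumptions, not part of any ARM tool.

```python
MASK = 0xFFFFFFFF

def adds(a, b):
    """Return (result, flags) for a 32-bit ADDS of register values a and b."""
    a &= MASK
    b &= MASK
    full = a + b                    # unbounded sum, used to detect carry out
    result = full & MASK
    flags = {
        "N": result >> 31,          # bit 31 of the result
        "Z": int(result == 0),
        "C": int(full > MASK),      # carry out of bit 31
        # overflow: both operands differ in sign from the result
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

print(adds(0x7FFFFFFF, 1))   # largest positive value plus one
print(adds(0xFFFFFFFF, 1))   # wraps around to zero
```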
Data Processing Instructions (2/2)
 Operands:
 Immediate operands
ADD r3, r3, #1 ; r3 := r3 + 1
 Shifted register operands:
ADD r3, r2, r1, LSL #3 ; r3 := r2 + 8 × r1
 Miscellaneous data processing instructions:
 Multiplication:
MUL r4, r3, r2 ; r4 := (r3 × r2)[31:0]

Data transfer instructions
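 Load and store instructions:
LDR r0, [r1] ; r0 := mem32[r1]
STR r0, [r1] ; mem32[r1] := r0
 Offset: LDR r0, [r1,#4] ; r0 := mem32[r1 + 4], r1 unchanged
 Post – indexed: LDR r0, [r1], #16 ; r0 := mem32[r1], then r1 := r1 + 16
 Auto – indexed: LDR r0, [r1,#16]! ; r1 := r1 + 16, then r0 := mem32[r1]
 Multiple data transfers:
LDMIA r1, {r0,r2,r5} ; r0 := mem32[r1], r2 := mem32[r1 + 4], r5 := mem32[r1 + 8]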

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1, #4]!
 POST:
 r0 = 0x02020202
 r1 = 0x00009004

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1, #4]
 POST:
 r0 = 0x02020202
 r1 = 0x00009000

Examples
 PRE:
 r0 = 0x00000000
 r1 = 0x00009000
 mem32[0x00009000] = 0x01010101
 mem32[0x00009004] = 0x02020202
 LDR r0, [r1], #4
 POST:
 r0 = 0x01010101
 r1 = 0x00009004
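The three LDR examples differ only in which address is used and whether the base register is written back. A minimal Python model, assuming registers and memory are held in plain dictionaries (the function name and the mode strings are made up for illustration), reproduces the PRE/POST values of all three:

```python
def ldr(regs, mem, rd, rn, offset=0, mode="offset"):
    """Model LDR in one of three addressing modes.

    mode "offset": LDR rd, [rn, #offset]    base register unchanged
    mode "pre":    LDR rd, [rn, #offset]!   base updated, then used
    mode "post":   LDR rd, [rn], #offset    base used, then updated
    """
    base = regs[rn]
    address = base if mode == "post" else base + offset
    regs[rd] = mem[address]
    if mode in ("pre", "post"):
        regs[rn] = base + offset      # write-back to the base register
    return regs

def pre_state():
    """Initial registers and memory shared by the three examples."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    return regs, mem

for mode in ("pre", "offset", "post"):
    regs, mem = pre_state()
    ldr(regs, mem, "r0", "r1", 4, mode)
    print(mode, hex(regs["r0"]), hex(regs["r1"]))
```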

Examples
 PRE:
 mem32[0x80018] = 0x03
 mem32[0x80014] = 0x02
 mem32[0x80010] = 0x01
 r0 = 0x00080010
 LDMIA r0!, {r1-r3}
 POST:
 r0 = 0x0008001c
 r1 = 0x00000001
 r2 = 0x00000002
 r3 = 0x00000003

Examples
 PRE:
 mem32[0x8001c] = 0x04
 mem32[0x80018] = 0x03
 mem32[0x80014] = 0x02
 mem32[0x80010] = 0x01
 r0 = 0x00080010
 LDMIB r0!, {r1-r3}
 POST:
 r0 = 0x0008001c
 r1 = 0x00000002
 r2 = 0x00000003
 r3 = 0x00000004
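The difference between LDMIA (increment after) and LDMIB (increment before) can be checked with a small Python model. The dictionary-based registers and memory and the function signature are illustrative assumptions.

```python
def ldm(regs, mem, rn, reglist, before=False, writeback=True):
    """Model LDMIA (before=False) or LDMIB (before=True).

    Registers are loaded in ascending register-number order from
    ascending addresses, 4 bytes apart.
    """
    address = regs[rn]
    for name in sorted(reglist, key=lambda r: int(r[1:])):
        if before:
            address += 4              # IB: step first, then load
        regs[name] = mem[address]
        if not before:
            address += 4              # IA: load first, then step
    if writeback:
        regs[rn] = address            # the "!" suffix updates the base
    return regs

mem = {0x80010: 0x01, 0x80014: 0x02, 0x80018: 0x03, 0x8001C: 0x04}

ia = ldm({"r0": 0x80010}, mem, "r0", ["r1", "r2", "r3"])
ib = ldm({"r0": 0x80010}, mem, "r0", ["r1", "r2", "r3"], before=True)
print(ia)
print(ib)
```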

Conditional execution
Instructions can be executed
conditionally without branches
CMP r2, r3 ; subtract and set flags
ADDGE r4, r5, r6 ; if r2 >= r3
SUBLT r4, r5, r6 ; else
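Each condition suffix is a test on the N, Z, C and V flags left by an earlier instruction such as CMP. The Python sketch below lists the standard ARM condition tests and shows that GE and LT are complementary. The helper names and the dictionary representation of the flags are assumptions made for illustration.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by CMP a, b, which computes a - b on 32-bit values."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                          # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,       # signed overflow
    }

# Standard ARM condition suffixes expressed as flag tests.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
}

flags = cmp_flags(5, 3)
print(CONDITIONS["GE"](flags), CONDITIONS["LT"](flags))
```

Because exactly one of GE and LT holds for any flag state, the ADDGE/SUBLT pair behaves like an if/else without a branch.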

Conditional execution mnemonics

Control flow instructions
 Branch instruction: B label
 Conditional branch: BNE label
 Branch and Link: BL label
BL loop ; call: return address saved in r14
… …
loop … …
… …
MOV PC, r14 ; return

Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start
        MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file

Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd
        ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

ARM ORGANIZATION AND
IMPLEMENTATION
3 – stage pipeline (ARM7 – 80 MHz)
 Fetch
 Decode
 Execute
 Throughput: 1 instruction / cycle
[Figure: ARM7 datapath. Address register and PC incrementer drive A[31:0]; register bank (including the PC), multiplier, barrel shifter and ALU are connected by the A bus, B bus and ALU bus; instruction decode and control; data-out and data-in registers connect to D[31:0].]
5 – stage pipeline (1/2)
 Program execution time:
$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$
 Ways to reduce $T_{prog}$:
 Increase $f_{clk}$: logic simplification
 Reduce CPI: reduce the number of multicycle instructions.
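As a worked instance of the execution-time formula, the Python sketch below evaluates it for two configurations. The instruction count and CPI figures are made-up illustrative values, not measurements of any ARM core; only the two clock rates are taken from the slides.

```python
def t_prog(n_inst, cpi, f_clk):
    """Program execution time in seconds: N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk

# Hypothetical workload of one million instructions.
slow = t_prog(1_000_000, cpi=1.9, f_clk=80e6)    # assumed CPI at 80 MHz
fast = t_prog(1_000_000, cpi=1.5, f_clk=150e6)   # assumed CPI at 150 MHz
print(f"{slow * 1e3:.2f} ms vs {fast * 1e3:.2f} ms")
```

Raising the clock rate and lowering the CPI both shrink the result, which is the motivation for the deeper pipeline.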

5 – stage pipeline (ARM9 – 150 MHz) (2/2)
 Fetch
 Decode
 Execute
 Buffer / Data
 Write - Back
ARM coprocessor interface
 ARM supports up to 16 coprocessors, which can be emulated in software.
 Each coprocessor has up to 16 general-purpose registers.
 Coprocessors follow a load – store architecture.
 Coprocessors usually handle on – chip functions, such as cache and memory management.

ARCHITECTURAL SUPPORT FOR
HIGH – LEVEL LANGUAGES
Floating - point accelerator (1/2)

 For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating – point accelerator.
 FPA10 includes:
 Coprocessor interface
 Load / store unit
 Register bank (eight 80 – bit registers)
 Arithmetic unit (adder, multiplier, divider)

Floating - point accelerator (2/2)
[Figure: FPA10 organization. A load/store unit sits on the data bus; pipeline control and an instruction issuer drive the register bank and the arithmetic unit (add, mult, div); a coprocessor interface provides the coprocessor handshake.]

APCS (1/2)
 APCS (ARM Procedure Call Standard) is a set of rules governing how C procedures receive arguments and return results.
 Specific use of general purpose registers (r0 – r3: arguments, r4 – r8: variables, r10: stack limit, etc.).
 Procedure I/O:

BL loop ; call, return address in lr

loop …
MOV pc, lr ; return

APCS (2/2)
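C code:
void f1(int a) {
    f2(a);
}

Assembly code (simplified; the argument is passed on a full descending stack):
f1      LDR     r0, [r13]               ; load a from the stack
        STR     r14, [r13, #-4]!        ; push the return address
        STR     r0, [r13, #-4]!         ; push a as the argument of f2
        BL      f2
        ADD     r13, r13, #4            ; discard the pushed argument
        LDR     r15, [r13], #4          ; pop the return address into the PC

[Figure: stack layout with offsets 0, 4, 8 and 16 from the stack pointer.]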
THUMB PROGRAMMER’S
MODEL
General information
 Thumb objective: code density.
 Thumb has a 16 – bit instruction set.
 A subset of the ARM instruction set is encoded in a 16 – bit space.
 Used appropriately, Thumb can bring large gains in:
 Power efficiency
 Performance

Going in and out of Thumb mode
 Entering Thumb state: use the BX instruction in ARM state, e.g. BX r0
 If r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
 Instructions are assembled as 16 – bit Thumb instructions after the appropriate directive (CODE16).
 Returning to ARM state: use the BX instruction in Thumb state with bit 0 of the register clear.
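The state switch performed by BX depends only on bit 0 of the target register. A minimal Python sketch, with a hypothetical function name, that returns the new PC and T bit:

```python
def bx(target):
    """Return (pc, t_bit) after BX to the 32-bit register value `target`.

    Bit 0 selects the instruction set (1 = Thumb, 0 = ARM); the remaining
    bits give the branch address.
    """
    t_bit = target & 1
    pc = target & 0xFFFFFFFE      # drop bit 0 to obtain the address
    return pc, t_bit

print(bx(0x8001))   # Thumb routine at 0x8000
print(bx(0x8000))   # ARM routine at 0x8000
```

This is why the ARM header in a mixed program loads the address of the Thumb code plus 1 before executing BX.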

The Thumb programmer’s model
 Thumb registers
 Lo registers: r0 – r7
 Hi registers: r8 – r12 (restricted access)
 SP (r13), LR (r14), PC (r15)
 CPSR
[Figure: Thumb register set; shaded registers have restricted access.]

ARM vs. Thumb (1/3)
 Thumb  ARM
 Upto 70% code  40% faster code
size reduction when coupled with
 40% more a 32-bit memory
instructions.
 45% faster code
with 16-bit
memory
 Requires about
30% less external
memory
ARM vs. Thumb (2/3)
 If performance is critical: ARM
 If cost and power consumption are critical: Thumb

ARM and Thumb interaction
 A 32 – bit ARM system can switch to Thumb state for specific routines in order to meet power and memory constraints.
 A 16 – bit system can use an on – chip 32 – bit memory for ARM state routines, and a 16 – bit off – chip memory with Thumb code for the rest of the application.

Example 3
        AREA    ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
        CODE32                          ; Subsequent instructions are ARM
header  ADR     r0, start + 1           ; Processor starts in ARM state,
        BX      r0                      ; so small ARM code header used
                                        ; to call Thumb main program
        CODE16                          ; Subsequent instructions are Thumb
start
        MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop
        MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0xAB                    ; Thumb semihosting SWI
doadd
        ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file

Example 4
 Implement the following pseudocode in ARM and Thumb assembly. Which version is more efficient in terms of execution time, and which in terms of code size?
if r1 > r2 then
    r3 = r4 + r5
    r6 = r4 - r5
else
    r3 = r4 - r5
    r6 = r4 + r5
Example 5
 Write an ARM assembly program that
loads data from memory location 0x40,
sets bits 3 to 5, clears bits 0 to 2 and
leaves the remaining bits unchanged.
 Test it using 0xAD as input data
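A reference model is handy for checking an assembly solution to this exercise. The Python sketch below computes the expected result using the same two steps an ARM program would likely use (ORR to set bits, BIC to clear them). The mask names and the function name are illustrative assumptions.

```python
SET_MASK = 0b00111000     # bits 3 to 5
CLEAR_MASK = 0b00000111   # bits 0 to 2

def transform(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits unchanged."""
    value |= SET_MASK                  # what ORR with the set mask would do
    value &= ~CLEAR_MASK & 0xFFFFFFFF  # what BIC with the clear mask would do
    return value

print(hex(transform(0xAD)))   # 0xb8
```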

ARCHITECTURAL SUPPORT
FOR SYSTEM
DEVELOPMENT
The ARM memory interface

[Figure: a basic ARM memory system.]
AMBA (1/4)
 Advanced Microcontroller Bus Architecture (AMBA)
 Advanced High – Performance Bus (AHB)
 Advanced System Bus (ASB)
 Advanced Peripheral Bus (APB)
 AMBA objectives:
 Technology independence
 Encouraging modular system design

AMBA (2/4)
 A typical AMBA – based system

AMBA (3/4)
 AHB
 Burst transaction
 Split transaction
 Data bus 64 – 128 bit
[Figure: AHB interconnect with a bus arbiter, masters 1 to 3, slaves 1 to 3, address, write-data and read-data paths, and a decoder.]

AMBA (4/4)
 AMBA Design Kit (ADK)
 An environment that assists designers in developing AMBA-based components and SoC designs.

Signal Processing Support (1/2)

 Piccolo DSP coprocessor.


 Various data memories for maximizing
throughput.

Signal Processing Support (2/2)
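 Piccolo
[Figure: Piccolo organization. The ARM7TDMI core with its I-cache and the Piccolo coprocessor each have an AMBA interface onto the AMBA bus; Piccolo contains an input buffer, register bank, output buffer, ALU, multiplier, and decode and control logic.]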
MEMORY HIERARCHY
Memory hierarchy
Moving down the table: larger size, lower speed.

Memory type       Size               Speed
Registers         32 – bit           A few nsec
On – chip cache   8 – 32 kbytes      10 nsec
Off – chip cache  100 – 200 kbytes   10 – 30 nsec
RAM               Mbytes             100 nsec
On – chip memory
 Necessary for performance
 Some systems prefer on – chip RAM to an on – chip cache: it is simpler, cheaper and less power-hungry.

Cache types
 Cache types:
 Unified cache.
 Separate instruction and data caches.
 Performance depends on the hit rate h (miss rate 1 - h):
$$t_{av} = h\,t_{cache} + (1 - h)\,t_{main}$$
 Compulsory miss: the first time an address is accessed.
 Capacity miss: the cache is full.
 Conflict miss: two addresses compete for the same place in the cache.
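The average access time formula is easy to evaluate directly. In the Python sketch below the 95% hit rate is an assumed illustrative value, and the 10 ns cache and 100 ns main memory times are round figures of the kind used in the memory hierarchy table.

```python
def average_access_time(hit_rate, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main, in the same unit as the inputs."""
    return hit_rate * t_cache + (1 - hit_rate) * t_main

# Assumed values: 95% hit rate, 10 ns cache, 100 ns main memory.
print(average_access_time(0.95, 10, 100))
# The same memories with a lower hit rate get noticeably slower on average.
print(average_access_time(0.80, 10, 100))
```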
Replacement policy and implementation
 Replacement policy:
 Least Recently Used (LRU)
 Least Frequently Used (LFU)
 Data prediction
 Implementation:
 Fully-associative
 Direct-mapped
 Set-associative
Direct – mapped cache (1/2)

 A line of data is stored together with its address tag in a memory indexed by part of the address.

Direct – mapped cache (2/2)

 Each memory location has a specific place in the cache.
 Tag and data can be accessed at the same time.
 The tag RAM is smaller than the data RAM and has a shorter access time, so the tag comparison can complete within the data RAM access time.
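How an address maps to a single cache location can be sketched in Python. The class below tracks tags only (no data), and the geometry of 256 lines of 16 bytes is an assumed example rather than the organization of any particular ARM cache.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (assumed geometry)."""

    def __init__(self, num_lines=256, line_bytes=16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines     # one tag per cache line

    def split(self, address):
        """Split an address into (tag, index, offset)."""
        offset = address % self.line_bytes
        index = (address // self.line_bytes) % self.num_lines
        tag = address // (self.line_bytes * self.num_lines)
        return tag, index, offset

    def access(self, address):
        """Return True on a hit; on a miss, load the line and return False."""
        tag, index, _ = self.split(address)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag             # replace whatever was there
        return False

cache = DirectMappedCache()
# 0x0000 and 0x1000 share index 0, so they evict each other (conflict misses).
for address in (0x0000, 0x0004, 0x1000, 0x0000):
    print(hex(address), cache.access(address))
```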
Set associative cache (1/3)
[Figure: 2 – way set – associative cache.]
Set associative cache (2/3)
 A set – associative cache offers n possible locations (ways) for each line, giving an n – way set – associative cache.
 Two addresses that would compete for the same spot in a direct – mapped cache can be stored in different ways and accessed independently.

Set associative (3/3)
 Set selection:
 Random allocation
 Least recently used (LRU)
 Round – robin (cyclic)
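The LRU option can be illustrated with a tag-only Python model of a 2-way set-associative cache. The geometry (128 sets of 16-byte lines) and the class name are assumptions chosen for the example.

```python
class TwoWayCache:
    """Tag-only model of a 2-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets=128, line_bytes=16):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        # Each set is a list of up to two tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit; on a miss, allocate a way and return False."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)       # mark as most recently used
            return True
        if len(ways) == 2:
            ways.pop(0)            # evict the least recently used way
        ways.append(tag)
        return False

cache = TwoWayCache()
# 0x0000, 0x0800 and 0x1000 all map to set 0 in this geometry.
for address in (0x0000, 0x0800, 0x0000, 0x0800, 0x1000, 0x0800, 0x0000):
    print(hex(address), cache.access(address))
```

The first two addresses coexist in the two ways of set 0; only the third forces an eviction, and LRU picks the line touched longest ago.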

Fully associative (1/2)
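[Figure: fully associative cache. The address is compared against every entry of the tag CAM, which selects a line in the data RAM; a mux picks the requested word; outputs are hit and data.]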
Write strategies
 Write – through
All write operations are passed to main memory
 Write – through with buffered write
Write operations are passed to main memory
through the write buffer
 Copy – back (write – back)
Write operations update only the cache.

Cache feature summary
Organizational feature   Options
Cache-MMU relationship   Physical cache | Virtual cache
Cache contents           Unified instruction and data cache | Separate instruction and data caches
Associativity            Direct-mapped (RAM-RAM) | Set-associative (RAM-RAM) | Fully associative (CAM-RAM)
Replacement strategy     Cyclic | Random | LRU
Write strategy           Write-through | Write-through with write buffer | Copy-back

‘Perfect’ cache performance

Cache form                   Performance
No cache                     1
Instruction-only cache       1.95
Instruction and data cache   2.5
Data-only cache              1.13

MMU (1/3)
Two memory management approaches:
 Segmentation
 Paging

MMU (2/3)
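 Segmented memory management:
 The segment selector picks a segment descriptor (base, limit) from the segment descriptor table.
 Physical address = base + logical address.
 If the logical address exceeds the limit, an access fault is raised.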
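The base-and-limit check can be written as a few lines of Python. The descriptor table contents, the function name and the use of MemoryError to stand in for an access fault are illustrative assumptions.

```python
def translate_segment(descriptor_table, selector, logical_address):
    """Translate a logical address using a (base, limit) segment descriptor."""
    base, limit = descriptor_table[selector]
    if logical_address > limit:
        raise MemoryError("access fault")   # stands in for the hardware fault
    return base + logical_address

# Hypothetical table with one segment: base 0x10000, limit 0xFFF.
table = {0: (0x10000, 0xFFF)}
print(hex(translate_segment(table, 0, 0x20)))
```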

MMU (3/3)
 Paging memory management:
 Logical address bits 31 – 22 index the page directory.
 Bits 21 – 12 index the page table.
 Bits 11 – 0 select the data within the page frame.
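The two-level lookup follows directly from the 10/10/12 bit split of the logical address. The Python sketch below models the page directory and page tables as nested dictionaries; the table contents and function names are made up for illustration.

```python
def split_logical_address(address):
    """Return (directory index, table index, offset) for a 32-bit address."""
    directory_index = (address >> 22) & 0x3FF   # bits 31..22
    table_index = (address >> 12) & 0x3FF       # bits 21..12
    offset = address & 0xFFF                    # bits 11..0
    return directory_index, table_index, offset

def translate(page_directory, address):
    """Walk the page directory and page table to find the physical address."""
    directory_index, table_index, offset = split_logical_address(address)
    page_table = page_directory[directory_index]
    frame_base = page_table[table_index]
    return frame_base + offset

# Hypothetical mapping: directory entry 1, table entry 3 -> frame at 0x80000.
page_directory = {1: {3: 0x80000}}
print(split_logical_address(0x00403004))
print(hex(translate(page_directory, 0x00403004)))
```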

ARCHITECTURAL SUPPORT
FOR OPERATING SYSTEMS
[Figure: example SoC built around an ARM1136JF core and a bus matrix with 8 AHB ports (64 – bit).
Bus matrix ports: 1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB.
Blocks: ETM with trace port analyser; VIC (PL192) with 14 external interrupts; DMAC (PL080) with 8 external DMA requests; CLCD (PL110) driving a display; timers, watchdog and RTC (PL031); system control with external clock, reset and battery fail; MPMC (PL176) for SDRAM and DDR; SMC (PL093) for static memory; AHB/APB bridges; 2x UART (PL011); GPIO (PL061) with 32 GPIO lines; SSP (PL022); SCI (PL131) smart card interface (UICC compliant).]
CP15
 On – chip coprocessor that controls the MMU, the cache and the protection unit.
 Control takes place through CP15 registers, accessed with coprocessor instructions (MRC / MCR) executed in supervisor mode.

Protection Unit
 Simpler alternative to the MMU.
Requires simpler software and
hardware.
 Does not use translation tables, but 8
protection regions instead.

ARM DEVELOPER SUITE
ARMULATOR (1/2)
 ARMulator: emulator of various ARM processors.
 Allows project development in C, C++ or Assembly.
 Together with a debugger, compilers and an assembler, it forms the ARM Developer Suite (ADS).

ARMULATOR (2/2)
 Possible project options:
 ARM and Thumb Interworking
 Mixing C, C++ and Assembly
 Code for ROM
 Exception handlers

ARMULATOR TUTORIAL
 CODEWARRIOR ENVIRONMENT
