Unit 1

This document provides an overview of ARM architecture, including its historical development, key components like the CPU, memory types, and instruction sets. It contrasts RISC and CISC architectures, highlighting the advantages and disadvantages of each, and details the structure and functionality of ARM processors. Additionally, it describes the ARM instruction execution process, memory management, and the role of registers in data processing.

Uploaded by

vitalguideco
Copyright © All Rights Reserved

Unit – I

9 Hours
ARM Architecture: The Acorn RISC Machine, Architectural inheritance,
Architecture of ARM7TDMI, ARM programmer's model, ARM development tools,
3-stage pipeline ARM organization, ARM instruction execution, the Advanced
Microcontroller Bus Architecture (AMBA). Introduction, structure of assembly
language modules, predefined register names, frequently used directives,
macros, miscellaneous assembler features.
Text 1 (2.1, 2.2, 2.3, 2.4, 2.5, 4.1, 4.3), Text 2 (4.1 to 4.6)
The unit of data size
 Bit: a binary digit that can have the value 0 or 1
 Byte: 8 bits
 Nibble: half of a byte, or 4 bits
 Word: two bytes, or 16 bits (the IBM PC/x86 usage; on ARM a word is 32 bits)

The terms used to describe amounts of memory in IBM PCs and compatibles:
 Kilobyte (K): 2^10 bytes (1,024 bytes)
 Megabyte (M): 2^20 bytes, over 1 million
 Gigabyte (G): 2^30 bytes, over 1 billion
 Terabyte (T): 2^40 bytes, over 1 trillion
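To check these binary prefixes numerically, the short Python sketch below prints each unit in bytes. The `UNITS` table and `to_bytes` helper are illustrative names and do not belong to any tool.

```python
# Binary memory-size prefixes: each unit is 2**10 times the previous one.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def to_bytes(count, unit):
    """Convert a count of K, M, G or T into bytes."""
    return count * UNITS[unit]


for name, size in UNITS.items():
    print(f"1{name} = {size:,} bytes")
# 1K = 1,024 bytes
# 1M = 1,048,576 bytes
# 1G = 1,073,741,824 bytes
# 1T = 1,099,511,627,776 bytes
```

For example, `to_bytes(64, "K")` gives 65,536 bytes.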
CPU (Central Processing Unit)
 Executes the instructions stored in memory

I/O (Input/output) devices
 Provide a means of communicating with the CPU

Memory
- RAM (Random Access Memory): temporary storage of
programs that the computer is running
 The data is lost when the computer is turned off
- ROM (Read Only Memory): contains programs and
information essential to the operation of the computer
 The information cannot be changed by the user, and is not
lost when power is off.
 It is therefore called nonvolatile memory
Registers
 The CPU uses registers to store information
temporarily
- Data to be processed
- Address of data/code to be fetched from memory
 In general, the more and bigger the registers, the
better the CPU
 Registers can be 8-, 16-, 32-, or 64-bit
 The disadvantage of more and bigger registers is
the increased cost of such a CPU
ALU (arithmetic/logic unit)
 Performs arithmetic functions such as add,
subtract, multiply, and divide, and logic functions
such as AND, OR, and NOT
Program counter
 Points to the address of the next instruction to be
executed.
 As each instruction is executed, the program
counter is incremented to point to the address of
the next instruction to be executed.
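A toy fetch-decode-execute loop shows what the program counter does. This Python sketch uses a hypothetical three-instruction machine (`LOAD`, `ADD`, `HALT`), not a real instruction set. The PC selects the instruction to fetch and is then incremented so that it points at the next one.

```python
def run(program):
    """Run a toy program given as a list of (opcode, operand) tuples.

    Returns the final accumulator value and the addresses (indices)
    fetched, in order.
    """
    pc = 0          # program counter: address of the next instruction
    acc = 0         # a single accumulator register
    fetched = []
    while True:
        opcode, operand = program[pc]   # fetch the instruction the PC points to
        fetched.append(pc)
        pc += 1                         # increment the PC to the next instruction
        if opcode == "LOAD":            # decode and execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc, fetched


print(run([("LOAD", 10), ("ADD", 3), ("HALT", 0)]))  # (13, [0, 1, 2])
```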
Instruction decoder
- Interprets the instruction fetched into the CPU
- A CPU capable of understanding more
instructions requires more transistors to design
General-purpose microprocessors:
 Must add RAM, ROM, I/O ports, and timers externally to
make them functional.
 Make the system bulkier and much more expensive.
 Have the advantage of versatility in the amount of RAM,
ROM, and I/O ports

Microcontroller:
 The fixed amount of on-chip ROM, RAM, and number of
I/O ports makes them ideal for many applications in which
cost and space are critical
 In many applications, the space it takes, the power it
consumes, and the price per unit are much more critical
considerations than the computing power
Von Neumann architecture:
 A computer whose memory holds both data and
instructions.
 The CPU has several internal registers
 Program counter, general-purpose registers, …
 The CPU fetches instructions from memory using the address in the program counter
 The separation of the instruction memory from the CPU distinguishes a
stored-program computer from a general finite-state machine

Harvard Architecture:
 Separate memories for data and program
 The program counter points to the program memory
 Hard to write self-modifying programs
 Used for one very simple reason: it provides higher
performance for digital signal processing
 Most DSPs are Harvard architectures
 Most phone calls pass through at least two DSPs, one at each end
of the call
von Neumann vs. Harvard:
 Harvard cannot use self-modifying code.
 Harvard allows two simultaneous memory fetches.
 Most DSPs use Harvard architecture for streaming data:
 greater memory bandwidth
 more predictable bandwidth
 Streaming data
 Data sets that arrive continuously and periodically
 What is RISC?
 RISC, or Reduced Instruction Set Computer is a type of
microprocessor architecture that utilizes a small,
highly-optimized set of instructions, rather than a
more specialized set of instructions often found in
other types of architectures.
 One-cycle execution time: RISC processors aim for a CPI
(clock cycles per instruction) of about one. This is due to the
optimization of each instruction on the CPU and a
technique called PIPELINING
 pipelining: Technique that allows for simultaneous
execution of parts, or stages, of instructions to more
efficiently process instructions;
 Large number of registers: the RISC design philosophy
generally incorporates a larger number of registers to
reduce the number of interactions with memory
The main characteristics of CISC microprocessors
are:
 Extensive instructions.
 Complex and efficient machine instructions.
 Extensive addressing capabilities for memory operations.
 Relatively few registers.
In comparison, RISC processors are more or less the
opposite of the above:
 Reduced instruction set.
 Less complex, simple instructions.
 Hardwired control unit and machine instructions.
 Few addressing schemes for memory operands with only
two basic instructions, LOAD and STORE.
 Many symmetric registers which are organised into a
register file.
CISC | RISC
Emphasis on hardware | Emphasis on software
Includes multi-clock complex instructions | Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions
Small code sizes, high cycles per instruction | Low cycles per instruction, large code sizes
Transistors used for storing complex instructions | Spends more transistors on memory registers
RISC advantages
• A smaller die size
• A shorter development time
• A higher performance

RISC drawbacks
• RISCs generally have poor code density compared with CISCs.
• RISCs don’t execute x86 code.
 The first ARM was developed at Acorn Computers
Limited, of Cambridge, England, between October
1983 and April 1985.
 Before 1990, ARM stood for Acorn RISC Machine
 Later, ARM came to stand for Advanced RISC Machine
 The RISC concept was introduced in 1980 at Stanford
and Berkeley.
 Advanced RISC Machines Ltd (later ARM Limited) was founded in 1990
 ARM cores
-Licensed to partners to develop and fabricate new
microcontrollers
-Soft core
 ARM was established in November 1990 as Advanced RISC
Machines Ltd.,
 UK-based joint venture between Apple Computer, Acorn
Computer Group and VLSI Technology.
 Apple and VLSI both provided funding, while Acorn
supplied the technology.
 Acorn, developer of the world’s first commercial single-
chip RISC processor, and Apple, intent on advancing the use
of RISC technology in its own systems, chartered ARM with
creating a new microprocessor standard.
 ARM immediately differentiated itself in the market by
creating the first low-cost RISC architecture.
 Conversely, competing architectures, which were more
commonly focused on maximizing performance, were
first used in high-end workstations.
 1985 Acorn Computer Group develops the world's first commercial RISC
processor
 1987 Acorn's ARM processor debuts as the first RISC processor for low-cost
PCs
 1990 Advanced RISC Machines (ARM) spins out of Acorn and Apple
Computer's collaboration efforts with a charter to create a new
microprocessor standard. VLSI Technology becomes an investor and the first
licensee
 1991 ARM introduces its first embeddable RISC core, the ARM6™ solution
 1995 Atmel/ES2, Digital, LG Semicon, NEC and Symbios Logic license ARM
technology.
- ARM's Thumb® architecture extension gives 32-bit RISC performance at
16-bit system cost and offers industry-leading code density
- ARM opens office in Munich, Germany
- ARM launches Software Development Toolkit
- TI samples first ARM Thumb core
- First StrongARM™ core from Digital Semiconductor and ARM
- ARM extends family with ARM8™ high-performance solution
- ARM launches the ARM7100™ "PDA-on-a-chip"
 The 16-bit CISC microprocessors available in 1983 had certain
disadvantages:
-They were slower than standard memory
parts
-Their instructions took many clock cycles to
complete
-They had long interrupt latency
 Control over ALU and shifter for every data
processing operations to maximize their usage
 Auto-Increment and auto-Decrement addressing
modes to optimize program loops
 Load and Store Multiple instructions to maximize
data throughput
 Conditional Execution of instruction to maximize
execution throughput
[Figure: ARM core organization: address register and incrementer, register bank (including the PC), instruction decode and control, multiply register, barrel shifter, ALU, data-out and data-in registers, connected by the A, B and ALU buses; external address bus A[31:0] and data bus D[31:0]]
 Data items are placed in register file
-No data processing instructions directly
manipulate data in memory
 Instructions typically use two source registers and
a single result (destination) register
 A Barrel shifter on the data path can preprocess
data before it enters ALU
 Increment/Decrement logic can update register
content for sequential access independent of ALU
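The barrel shifter can shift or rotate an operand before it reaches the ALU. The Python sketch below models those operations on 32-bit values. It only covers shift amounts 0 to 31 and ignores both the shifter carry-out and the special encodings ARM uses for shifts by 0 and 32; these simplifications are assumptions of the example.

```python
MASK32 = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter at bit 0."""
    return (x << n) & MASK32


def lsr(x, n):
    """Logical shift right: zeros enter at bit 31."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: copies of bit 31 (the sign bit) enter at the top."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative two's-complement value
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


# Operand path of e.g. ADD r0, r1, r2, LSL #2 with r1 = 5 and r2 = 3:
r1, r2 = 5, 3
print((r1 + lsl(r2, 2)) & MASK32)   # 17
print(hex(asr(0xFFFFFFF0, 2)))      # 0xfffffffc (-16 >> 2 == -4)
print(hex(ror(0x000000FF, 8)))      # 0xff000000
```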
 General Purpose Registers hold either data or address
 All registers are of 32 bits
 Total 37 registers
 In user mode, 16 data registers and 1 status register (CPSR) are
visible

 Data registers: r0 to r15
-Three registers r13, r14, r15 perform special functions
-r13: stack pointer
-r14: link register (where return address is put whenever
a subroutine is called)
-r15: program counter
 Depending upon context, registers r13 and r14 can
also be used as GPR
 Any instruction that uses r0 can equally be used
with any other GPR (r1-r13)
 In addition, there are two status registers
-CPSR: Current Program Status Register
-SPSR: Saved Program Status Register
 The ARM chip was designed based on Berkeley
RISC I and II and the Stanford MIPS
(Microprocessor without Interlocked Pipeline
Stages)
 Features Used from Berkeley RISC design
-a load-store architecture
-fixed length 32-bit instructions
-3-address instruction formats
 Features Rejected
-Register windows
-Delayed Branches
- Single Cycle execution of all instructions
 Based upon RISC architecture with enhancements to meet
requirements of embedded applications
◦ A large uniform register file
◦ Load-store architecture
◦ Uniform and fixed-length instructions
◦ 32-bit processor
◦ Instructions are 32 bits long
◦ Good speed/power consumption ratio
◦ High code density
 Developed by Advanced RISC Machines
 32-bit RISC embedded processor
 Low-end ARM core for applications like digital mobile
phones
 Von Neumann, load/store architecture
 A single 32-bit data bus for both instructions and data
 Only the load/store instructions (and SWP) access memory
 3-stage pipeline
- Fetch
- Decode
- Execute
 Low power, fully static design
 3-stage pipeline
 Unified bus interface
 T: supports the 16-bit Thumb instruction set in addition to the
32-bit ARM instruction set.
 D: on-chip Debug support, enabling the processor to halt
in response to a debug request.
 M: an enhanced Multiplier that can yield a full 64-bit result
 I: EmbeddedICE hardware to give on-chip breakpoint and
watchpoint support.
 The EmbeddedICE module introduces breakpoint and
watchpoint registers that are accessed using the JTAG interface
 ARM has 37 registers in total, all of which are 32-
bits long.
◦ 30 general purpose registers
◦ 5 dedicated saved program status registers
◦ 1 dedicated program counter
◦ 1 dedicated current program status register
 However these are arranged into several banks,
with the accessible bank being governed by the
processor mode. Each mode can access
◦ a particular set of r0-r12 registers
◦ a particular r13 (the stack pointer) and r14 (link register)
◦ r15 (the program counter)
◦ cpsr (the current program status register)
and privileged modes can also access
◦ a particular spsr (saved program status register)
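The banking rules amount to a lookup from (mode, register number) to the physical register actually used. The Python sketch below assumes the ARM7TDMI-style banking described in this unit: r8-r14 are banked in FIQ mode, and only r13-r14 are banked in the other exception modes. It also assumes the mode abbreviations `usr`, `sys`, `fiq`, `svc`, `abt`, `irq` and `und`. As a check, it re-derives the total of 37 registers.

```python
# Registers banked in each exception mode (user and system modes use the base set).
BANKED = {
    "fiq": range(8, 15),    # r8-r14
    "svc": range(13, 15),   # r13-r14
    "abt": range(13, 15),
    "irq": range(13, 15),
    "und": range(13, 15),
}


def physical(mode, n):
    """Name of the physical register that appears as rn in the given mode."""
    if n == 15:
        return "r15"                      # the PC is shared by all modes
    if n in BANKED.get(mode, ()):
        return f"r{n}_{mode}"             # banked copy for this mode
    return f"r{n}"                        # shared (user-mode) register


def count_registers():
    """30 general-purpose registers + PC + CPSR + one SPSR per exception mode."""
    modes = ["usr", "sys", *BANKED]
    gprs_and_pc = {physical(m, n) for m in modes for n in range(16)}
    return len(gprs_and_pc) + 1 + len(BANKED)


print(physical("fiq", 9), physical("irq", 9), physical("svc", 13))  # r9_fiq r9 r13_svc
print(count_registers())  # 37
```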

• Fifteen general-purpose registers are visible at any one
time, depending on the current processor mode, as r0,
r1, ... ,r13, r14.
• By convention, r13 is used as a stack pointer (sp) in
ARM assembly language. The C and C++ compilers
always use r13 as the stack pointer.
• In User mode, r14 is used as a link register (lr) to store
the return address when a subroutine call is made.
• It can also be used as a general-purpose register if the
return address is stored on the stack.
• In the exception handling modes, r14 holds the return
address for the exception, or a subroutine return
address if subroutine calls are executed within an
exception.
• r14 can be used as a general-purpose register if the
return address is stored on the stack.
• The program counter is accessed as r15 (or pc). It is
incremented by one word (four bytes) for each
instruction in ARM state, or by two bytes in Thumb
state.
• Branch instructions load the destination address into
the program counter. You can also load the program
counter directly using data operation instructions.
For example, to return from a subroutine, you can
copy the link register into the program counter
using:
 MOV pc,lr or
 MOV r15,r14
• During execution, r15 does not contain the address
of the currently executing instruction. The address of
the currently executing instruction is typically pc - 8
for ARM, or pc - 4 for Thumb.
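The pc - 8 rule is a consequence of the 3-stage pipeline. By the time an instruction executes, the two instructions after it have already been fetched. A minimal Python sketch of the value an instruction sees when it reads r15 (the helper name is hypothetical):

```python
def pc_read_value(instr_addr, thumb=False):
    """Value of r15 as read by the instruction at instr_addr.

    The PC is two instructions ahead: 2 x 4 bytes in ARM state,
    2 x 2 bytes in Thumb state.
    """
    return instr_addr + (4 if thumb else 8)


addr = 0x8000
print(hex(pc_read_value(addr)))              # 0x8008
print(hex(pc_read_value(addr, thumb=True)))  # 0x8004
```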

 The instruction set processes only values held in
registers
 The only operations that apply to memory state are
those that copy memory values into registers (load
instructions) or copy register values into memory (store
instructions)
 ARM does not support 'memory-to-memory'
operations
 Therefore all ARM instructions fall into three categories:
1. Data processing instructions.
- These use and change only register values.
2. Data transfer instructions.
- Loads or stores.
3. Control flow instructions.
- E.g., branch, call/return, trapping into system code
(supervisor calls).
 The ARM handles I/O peripherals as memory-mapped devices with
interrupt support. The internal registers in these devices appear as
addressable locations within the ARM's memory map and may be
read and written using the same load-store instructions as any other
memory locations.

 Peripherals may attract the processor's attention by making an
interrupt request using either the normal interrupt (IRQ) or the fast
interrupt (FIQ) input.

 Interrupts are a form of exception.

 Both inputs are level-sensitive and maskable.

 Normally most interrupt sources share the IRQ input. Some systems may
include DMA hardware external to the processor to handle
high-bandwidth I/O traffic
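ARM register organization:
Mode | General registers and PC | Status registers
user | r0-r14, r15 (PC) | CPSR
fiq | r0-r7, r8_fiq-r14_fiq, r15 (PC) | CPSR, SPSR_fiq
svc | r0-r12, r13_svc, r14_svc, r15 (PC) | CPSR, SPSR_svc
abort | r0-r12, r13_abt, r14_abt, r15 (PC) | CPSR, SPSR_abt
irq | r0-r12, r13_irq, r14_irq, r15 (PC) | CPSR, SPSR_irq
undefined | r0-r12, r13_und, r14_und, r15 (PC) | CPSR, SPSR_und
Registers without a mode suffix are usable in user mode; the banked registers (_fiq, _svc, _abt, _irq, _und) and the SPSRs are accessible in privileged (system) modes only.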
 The CPSR holds:
 copies of the Arithmetic Logic Unit (ALU) status flags
 the current processor mode
 interrupt disable flags.
 The ALU status flags in the CPSR are used to
determine whether conditional instructions
are executed or not.
 On Thumb-capable processors, the CPSR also
holds the current processor state (ARM or
Thumb).
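The ALU flags decide whether a conditional instruction executes. That decision can be written as a table of tests, one per ARM condition-code suffix. In the Python sketch below, the `condition_passed` helper is a hypothetical name; the flag tests themselves are the standard ARM ones.

```python
# Condition tests on the N, Z, C, V flags, keyed by ARM condition suffix.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                   # equal
    "NE": lambda n, z, c, v: not z,               # not equal
    "CS": lambda n, z, c, v: c,                   # carry set / unsigned higher or same
    "CC": lambda n, z, c, v: not c,               # carry clear / unsigned lower
    "MI": lambda n, z, c, v: n,                   # negative
    "PL": lambda n, z, c, v: not n,               # positive or zero
    "VS": lambda n, z, c, v: v,                   # overflow
    "VC": lambda n, z, c, v: not v,               # no overflow
    "HI": lambda n, z, c, v: c and not z,         # unsigned higher
    "LS": lambda n, z, c, v: (not c) or z,        # unsigned lower or same
    "GE": lambda n, z, c, v: n == v,              # signed greater than or equal
    "LT": lambda n, z, c, v: n != v,              # signed less than
    "GT": lambda n, z, c, v: (not z) and n == v,  # signed greater than
    "LE": lambda n, z, c, v: z or n != v,         # signed less than or equal
    "AL": lambda n, z, c, v: True,                # always
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this condition suffix would execute."""
    return bool(CONDITIONS[cond](bool(n), bool(z), bool(c), bool(v)))


# After CMP r0, r1 with r0 == r1 the flags are N=0, Z=1, C=1, V=0:
flags = dict(n=0, z=1, c=1, v=0)
print([cond for cond in CONDITIONS if condition_passed(cond, **flags)])
# ['EQ', 'CS', 'PL', 'VC', 'LS', 'GE', 'LE', 'AL']
```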

The CPSR is used in user-level programs to store the condition code bits.
These bits are used, for example, to record the result of a comparison
operation and to control whether a conditional branch is taken or not.

• N: Negative; the last ALU operation produced a negative result
• Z: Zero; the last ALU operation produced a zero result
• C: Carry; the last ALU operation generated a carry-out, either from the
arithmetic operation or from the shifter.
• V: oVerflow; the last arithmetic ALU operation generated an overflow into the
sign bit.
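The Python sketch below computes the four flags for a 32-bit addition or subtraction; the function names are hypothetical. It models an S-suffixed ADD and a SUB/CMP. On ARM, subtraction is performed as a + NOT(b) + 1, so C = 1 after a subtraction means that no borrow occurred.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a, b):
    """Return (result, (N, Z, C, V)) for a 32-bit ADDS."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                         # bit 31 of the result
    z = int(result == 0)
    c = int(full > MASK32)                   # carry out of bit 31
    # Overflow: both operands have the same sign and the result's sign differs.
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, (n, z, c, v)


def sub_flags(a, b):
    """Return (result, (N, Z, C, V)) for a 32-bit SUBS/CMP a, b (a + NOT b + 1)."""
    a &= MASK32
    b &= MASK32
    full = a + (~b & MASK32) + 1
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)                   # C = 1 means no borrow
    # Overflow: operands have different signs and the result's sign differs from a.
    v = ((a ^ b) & (a ^ result)) >> 31
    return result, (n, z, c, v)


r, f = add_flags(0x7FFFFFFF, 1)
print(hex(r), f)   # 0x80000000 (1, 0, 0, 1): signed overflow
r, f = sub_flags(3, 5)
print(hex(r), f)   # 0xfffffffe (1, 0, 0, 0): negative, borrow (C = 0)
```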

Flag | Logical instruction | Arithmetic instruction
Negative (N=1) | No meaning | Bit 31 of the result has been set; indicates a negative number in signed operations
Zero (Z=1) | Result is all zeroes | Result of operation was zero
Carry (C=1) | After a shift operation, '1' was left in the carry flag | Result was greater than 32 bits
oVerflow (V=1) | No meaning | Result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers
CPSR bit layout:
Bits 31-28: condition code flags N, Z, C, V (copies of the ALU status flags,
latched if the instruction has the "S" bit set)
- N = Negative result from ALU
- Z = Zero result from ALU
- C = ALU operation Carried out
- V = ALU operation oVerflowed
Bits 7-6: interrupt disable bits
- I = 1 disables the IRQ
- F = 1 disables the FIQ
Bit 5: T bit (architecture v4T only)
- T = 0, processor in ARM state
- T = 1, processor in Thumb state
Bits 4-0: mode bits M[4:0] define the processor mode.
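Extracting these fields from a 32-bit CPSR value takes only shifts and masks. The Python sketch below assumes the standard ARM layout: N, Z, C and V in bits 31-28; I, F and T in bits 7, 6 and 5; the mode in bits 4-0. The mode encodings in `MODES` are the standard ARM values. They are added here for illustration, since the text does not list them.

```python
# Standard ARM mode encodings for M[4:0].
MODES = {0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
         0b10111: "abt", 0b11011: "und", 0b11111: "sys"}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,   # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,   # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


print(decode_cpsr(0x600000D3))
# {'N': 0, 'Z': 1, 'C': 1, 'V': 0, 'I': 1, 'F': 1, 'T': 0, 'mode': 'svc'}
```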

 In addition to the processor register state, an ARM system has
memory state.
 Memory may be viewed as a linear array of bytes numbered from 0
up to 2^32 - 1. Data items may be 8-bit bytes, 16-bit half-words or
32-bit words.
 Words are always aligned on 4-byte boundaries (i.e., the two least significant
address bits are zero) and half-words are aligned on even byte boundaries.
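A word-aligned address has its two least significant bits clear, so alignment can be tested with a simple mask. A small Python sketch (the helper names are illustrative):

```python
def is_aligned(addr, size):
    """True if addr is a multiple of size (1 = byte, 2 = half-word, 4 = word)."""
    return (addr & (size - 1)) == 0


def containing_word(addr):
    """Address of the aligned word that contains byte address addr."""
    return addr & ~0b11


for a in (0x1000, 0x1002, 0x1003):
    print(hex(a), is_aligned(a, 4), is_aligned(a, 2), hex(containing_word(a)))
# 0x1000 True True 0x1000
# 0x1002 False True 0x1000
# 0x1003 False False 0x1000
```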
 All ARM instructions are 32 bits wide (except the
compressed 16-bit Thumb instructions which are
described later).
The most notable features of the ARM instruction set are:
 The Load-Store Architecture
 3-address data processing instructions
 Conditional execution of every instruction
 The inclusion of very powerful load and store multiple
register instructions
 The ability to perform a general shift operation and a
general ALU operation in a single instruction that
executes in a single clock cycle
 Open instruction set extension through the coprocessor
instructions
 A very dense 16-bit compressed instruction set in Thumb
mode
 C or Assembler source files are compiled or assembled
into ARM object format (.aof) files
 Then linked into ARM image format (.aif) files
 The image format files can be built to include the debug
tables required by the ARM symbolic debugger (ARMsd)
which can load, run and debug programs either on
hardware such as the ARM Development Board or using a
software emulation of the ARM (the ARMulator)
 The ARMulator has been designed to allow easy
extension of the software model to include system
features such as caches, memory timing characteristics,
and so on
 ARM C compiler - ANSI standard, fast, integrated
 ARM Assembler - translate assembly instructions to
machine code instructions (object files)
 Linker- Takes one or more object files (from C compiler
or ARM assembler) and combines them into one
executable program
 Resolve symbolic references (i.e. names of variables or
routines are turned into actual memory addresses)
 ARM symbolic debugger - full control on execution and
viewing of registers
 ARMulator - emulates the ARM processor, together with its system, in software
 Instruction-accurate modelling
 Cycle-accurate modelling
 Timing-accurate modelling
 ARM C compiler is compliant with the ANSI standard for C
 Uses the ARM Procedure Call Standard for all externally
available functions
 Can produce assembly source output instead of ARM
object format, so code can be inspected, or even hand
optimized, and then assembled subsequently
 Compiler can also produce Thumb code
 The ARM assembler is a full macro assembler that produces
ARM object format output, which can be linked with output
from the C compiler
 Nearer to Machine-level, with most assembly instructions
translating into single ARM (or Thumb) instructions.
 Takes one or more object files and combines them into
an executable program
 Resolves symbolic references between the object files
and extracts object modules from libraries as needed by
the program
 Can assemble the various components of the program in
a number of different ways, depending on whether the
code is to run in RAM or ROM, whether overlays are
required, and so on
 Linker includes debug tables in the output file
 Can also produce object library modules that are not
executable but are ready for efficient linking with object
files in future.
 Front-end interface to assist in debugging programs
running either under emulation or remotely on a target
system such as the ARM development board
 Allows an executable program to be loaded into the
ARMulator or a development board and run
 Allows the setting of breakpoints, which are addresses in
the code that, if executed, cause execution to halt so that
the processor state can be examined
 In the ARMulator, or when running on hardware with
appropriate support, it also allows the setting of
watchpoints
 Supports full source level debugging, allowing the C
programmer to debug a program using source file to
specify breakpoints and using variable names from
original program
 The ARMulator (ARM emulator) is a suite of programs that models the
behavior of various ARM processor cores in software on a
host system
 Can operate at various levels of accuracy:
 Instruction-accurate modeling gives the exact behavior
of the system state without regard to the precise timing
characteristics of the processor
 Cycle-accurate modeling gives the exact behavior of the
processor on a cycle-by-cycle basis, allowing the exact
number of clock cycles that a program requires to be
established
 Timing-accurate modeling presents signals at the correct
time within a cycle, allowing logic delays to be accounted
for
 ARM Development Board is a circuit board incorporating a
range of components and interfaces to support the
development of ARM-based systems

Software Toolkit:
 ARM Project Manager is a graphical front-end for the tools
 It supports the building of a single library or executable image
from a list of files that make up a particular project
 Source files (C, assembler, and so on)
- Object files
- Library files
 The source files may be edited within the Project Manager
 There are many options which may be chosen for the build
 Whether the output should be optimized for code size or
execution time
 Whether the output should be in debug or release form
 Which ARM processor is the target and particularly
whether it supports the Thumb instruction set
 JumpStart: JumpStart tools from VLSI Technology, Inc.,
include the same basic set of development tools but
present a full X-windows interface on a suitable
workstation rather than the command-line interface of
the standard ARM toolkit
 There are many other suppliers of tools that support ARM
development
 The principal components of an ARM organization with a 3-stage pipeline are:
◦ Register bank:
 2 read ports & 1 write port
 additional read and write ports for accessing r15
◦ Barrel shifter:
 Shift or rotate one operand by any number of bits
◦ ALU
◦ Address register and incrementer
◦ Data registers:
 Hold data passing to and from memory
◦ Instruction decoder and associated control logic
[Figure: ARM 3-stage pipeline organization: address register and incrementer, register bank (including the PC), instruction decode and control, multiply register, barrel shifter, ALU, data-out and data-in registers, connected by the A, B and ALU buses; external address bus A[31:0] and data bus D[31:0]]
 ARM processors up to the ARM7 employ a simple 3-stage pipeline with the
following pipeline stages:

◦ Fetch
 The instruction is fetched from memory and placed in the
instruction pipeline
◦ Decode
 The instruction is decoded and the datapath control signals
prepared for the next cycle. In this stage the instruction 'owns'
the decode logic but not the datapath.
◦ Execute
 The instruction 'owns' the datapath; the register bank is read,
an operand shifted, the ALU result generated and written back
into a destination register.
 When the processor is executing simple data processing instructions,
the pipeline enables one instruction to be completed every clock cycle.
An individual instruction takes three clock cycles to complete, so it has
a three-cycle latency, but the throughput is one instruction per cycle.

instruction   cycle 1   cycle 2   cycle 3   cycle 4   cycle 5
1             fetch     decode    execute
2                       fetch     decode    execute
3                                 fetch     decode    execute

Figure 4.2. ARM single-cycle instruction 3-stage pipeline operation.
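A few lines of Python reproduce the timing of Figure 4.2. The sketch assumes only single-cycle instructions and no stalls, which is exactly the case in which the 3-stage pipeline reaches one instruction per cycle. `schedule` and `total_cycles` are illustrative names.

```python
STAGES = ("fetch", "decode", "execute")


def schedule(n_instructions):
    """Cycle number (starting at 1) in which each instruction occupies each stage."""
    return [{stage: i + s + 1 for s, stage in enumerate(STAGES)}
            for i in range(n_instructions)]


def total_cycles(n_instructions):
    """Cycles to finish n single-cycle instructions: fill the pipe, then one per cycle."""
    return n_instructions + len(STAGES) - 1


for i, row in enumerate(schedule(3), start=1):
    print(i, row)
# 1 {'fetch': 1, 'decode': 2, 'execute': 3}
# 2 {'fetch': 2, 'decode': 3, 'execute': 4}
# 3 {'fetch': 3, 'decode': 4, 'execute': 5}
print(total_cycles(100))  # 102: latency 3 cycles, throughput close to 1 per cycle
```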


 When a multi-cycle instruction is executed, the flow is less regular,
as illustrated in Fig. 4.3. This shows a sequence of single-cycle
ADD instructions with a data store instruction, STR, occurring after
the first ADD. The decode logic is always generating the control
signals for the datapath to use in the next cycle, so in addition to
the explicit decode cycle it also generates the control for the data
transfer during the address calculation cycle of the STR.

Instruction 1 (ADD): fetch, decode, execute
Instruction 2 (STR): fetch, decode, calc. addr., data xfer
Instruction 3 (ADD): fetch, decode, execute
Instruction 4 (ADD): fetch, decode, execute
Instruction 5 (ADD): fetch, decode, execute

Figure 4.3. ARM multi-cycle instruction 3-stage pipeline operation.


 The 3-stage pipeline used in the ARM cores up to the ARM7 is
very cost-effective, but higher performance requires the
processor organization to be redesigned. The time, Tprog,
required to execute a given program is given by:

$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$

 Since Ninst is constant for a given program, there are only two
ways to increase performance:
◦ Increase the clock rate, fclk.
 This requires the logic in each pipeline stage to be simplified and,
therefore, the number of pipeline stages to be increased.
◦ Reduce the average number of clock cycles per instruction, CPI.
 This requires either that instructions which occupy more than one
pipeline slot in a 3-stage pipeline ARM are re-implemented to
occupy fewer slots, or that pipeline stalls caused by dependencies
between instructions are reduced, or a combination of both.
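Plugging numbers into the formula shows the effect of the two levers. The figures below (one million instructions, a CPI of 1.5, a 50 MHz clock) are made-up values chosen only for illustration. They are not measurements of any ARM core.

```python
def t_prog(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: Tprog = (Ninst x CPI) / fclk."""
    return n_inst * cpi / f_clk_hz


base = t_prog(1_000_000, 1.5, 50e6)           # hypothetical baseline
faster_clock = t_prog(1_000_000, 1.5, 100e6)  # double fclk
lower_cpi = t_prog(1_000_000, 1.2, 50e6)      # reduce CPI
print(f"{base*1e3:.1f} ms, {faster_clock*1e3:.1f} ms, {lower_cpi*1e3:.1f} ms")
# 30.0 ms, 15.0 ms, 24.0 ms
```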
 Memory Bottleneck
A 3-stage ARM core accesses memory on (almost) every clock
cycle, either to fetch an instruction or to transfer data. To get
a significantly better CPI, the memory system must deliver more than
one value in each clock cycle, either by delivering more than 32
bits per cycle from a single memory or by having separate
memories for instruction and data accesses.

 As a result of the above issues, higher-performance ARM cores
employ a 5-stage pipeline and have separate instruction and
data memories.
Datapath activity during data processing instructions
SUB r2, r1, r0 ; r2 := r1 - r0
SUB r0, r1, #128 ; r0 := r1 - 128
[Figure: datapath activity during data processing instructions: (a) register - register operations (operands Rn and Rm, result Rd); (b) register - immediate operations (operand Rn and an immediate taken from instruction bits [7:0], result Rd)]

 Data Transfer Instructions
Datapath activity during data transfer instruction
LDR R1, =0x40000000 ; R1 := 0x40000000
STR R2, [R1], #4 ; store R2 at address 0x40000000, then R1 := R1 + 4 = 0x40000004
[Figure: datapath activity during a data transfer instruction (STR): (a) 1st cycle - compute address (Rn plus or minus the offset from instruction bits [11:0]); (b) 2nd cycle - store data & auto-index]
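The Python sketch below models the post-indexed store in this example and contrasts it with a pre-indexed store with write-back (`STR R2, [R1, #4]!`). Registers and memory are plain dictionaries here, with memory addressed by word. Both are simplifications assumed for the sketch, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def str_post_indexed(regs, mem, rd, rn, offset):
    """STR rd, [rn], #offset: store at rn, then rn := rn + offset."""
    addr = regs[rn]
    mem[addr] = regs[rd] & MASK32          # store at the original base address
    regs[rn] = (addr + offset) & MASK32    # auto-index (write-back)


def str_pre_indexed_wb(regs, mem, rd, rn, offset):
    """STR rd, [rn, #offset]!: store at rn + offset, then rn := rn + offset."""
    addr = (regs[rn] + offset) & MASK32
    mem[addr] = regs[rd] & MASK32
    regs[rn] = addr


regs, mem = {"R1": 0x40000000, "R2": 0x1234}, {}
str_post_indexed(regs, mem, "R2", "R1", 4)
print({hex(a): hex(v) for a, v in mem.items()}, hex(regs["R1"]))
# {'0x40000000': '0x1234'} 0x40000004
```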
 Branch Instructions
Datapath activity during branch instruction
BL LABEL ; BL located at 0x00000008, LABEL = 0x0000F000,
; LR (R14) := 0x0000000C (address of the instruction after BL)
[Figure: datapath activity during a branch instruction (BL): (a) 1st cycle - compute branch target (PC plus the 24-bit offset from instruction bits [23:0], shifted left by 2); (b) 2nd cycle - save return address in r14]
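The two BL cycles can be checked numerically. In ARM state the 24-bit field holds a signed word offset relative to the PC value seen by the BL, which is its own address + 8. This offset is shifted left by 2 before being added. The Python sketch below encodes and executes a BL under that model. The function names are hypothetical, and the numbers reuse the example above (BL at 0x00000008, LABEL at 0x0000F000).

```python
def bl_encode_offset(bl_addr, target):
    """24-bit signed word offset stored in an ARM-state BL at bl_addr."""
    delta = target - (bl_addr + 8)       # PC reads as BL address + 8
    if delta % 4:
        raise ValueError("branch target must be word aligned")
    return (delta >> 2) & 0xFFFFFF


def bl_execute(bl_addr, offset24):
    """Return (new PC, LR) produced by a BL at bl_addr."""
    if offset24 & 0x800000:              # sign-extend the 24-bit field
        offset24 -= 1 << 24
    target = (bl_addr + 8 + (offset24 << 2)) & 0xFFFFFFFF   # cycle 1: branch target
    lr = bl_addr + 4                     # cycle 2: return address (next instruction)
    return target, lr


off = bl_encode_offset(0x8, 0xF000)
print(hex(off), [hex(x) for x in bl_execute(0x8, off)])  # 0x3bfc ['0xf000', '0xc']
```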
STRUCTURE OF ASSEMBLY LANGUAGE MODULES:
We begin by examining a very simple module as a starting point.
Consider the following code:

        AREA    ARMex, CODE, READONLY ; Name this block of code ARMex
        ENTRY                         ; Mark first instruction to execute
start   MOV     r0, #10               ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1            ; r0 = r0 + r1
stop    B       stop                  ; Infinite loop
        END                           ; Mark end of file

 The instructions, directives, and pseudo-instructions must be preceded
by whitespace, either a tab or any number of spaces, even if you don't
have a label at the beginning.
 If you are using the Keil tools, the first semicolon on a line indicates the
beginning of a comment, unless the semicolon is inside a
string constant, for example,
abc     SETS    "This is a semicolon;"
 At some point, you will begin using constants in your assembly, and they are
allowed in a handful of formats:
• Decimal, for example, 123
• Hexadecimal, for example, 0x3F
• n_xxx (Keil only), where n is a base between 2 and 9, and xxx is a number in
that base.
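The small Python function below interprets the three numeric formats as the list describes them. It is an illustrative sketch of the notation, not the Keil assembler's own parser, and `parse_constant` is a hypothetical name.

```python
def parse_constant(text):
    """Interpret 123 (decimal), 0x3F (hexadecimal) or n_xxx (base n, 2 <= n <= 9)."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if "_" in text:
        base_str, digits = text.split("_", 1)
        base = int(base_str)
        if not 2 <= base <= 9:
            raise ValueError("n_xxx base must be between 2 and 9")
        return int(digits, base)
    return int(text, 10)


print([parse_constant(s) for s in ("123", "0x3F", "2_1010", "8_777")])
# [123, 63, 10, 511]
```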

 Character constants consist of a single character enclosed in opening and closing single quotes.
 String constants are contained within double quotes.
 For example, in the Keil tools, you could say something like

        MOV     r3, #'A'            ; single character constant

        GBLS    str1                ; declare a global string variable
str1    SETS    "Hello world!\n"    ; set its value
In the Code Composer Studio tools, you might say
        .string "Hello world!"
which places the 8-bit characters of the string into a section of code, but the .string
directive neither adds a NUL character at the end of the characters nor
interprets escape characters. Instead, you could say
        .cstring "Hello world!\n"
PREDEFINED REGISTER NAMES:
 Most assemblers have a set of register names that can be used
interchangeably in your code, mostly to make it easier to read. The
ARM assembler is no different, and includes a set of predefined,
case-sensitive names that are synonymous with registers.
 While the tools recognize predeclared names for basic registers,
status registers, floating-point registers, and coprocessors, only the
following are of immediate use to us:
 r0-r15 or R0-R15
 s0-s31 or S0-S31 (single-precision floating-point registers)
 a1-a4 (argument, result, or scratch registers, synonyms for r0 to r3)
 sp or SP (stack pointer, r13)
 lr or LR (link register, r14)
 pc or PC (program counter, r15)
 cpsr or CPSR (current program status register)
 spsr or SPSR (saved program status register)
 apsr or APSR (application program status register)

You can also assign your own names to registers with the RN directive:
name RN expr
where name is the name to be assigned to the register. Obviously, name cannot
be the same as any of the predefined names. The expr parameter takes
values from 0 to 15. Take care not to assign two or more names to the
same register.
EXAMPLE
code:
coeff1 RN 8 ; coefficient 1
coeff2 RN 9 ; coefficient 2
dest RN 0 ; register 0 holds the pointer to
; destination matrix

The EQU directive gives a symbolic name to a numeric constant, a
register-relative value, or a program-relative value. The syntax for the
EQU directive is:
name EQU expr{,type}
where name is the symbolic name to assign to the value, and expr is a
register-relative address, a program-relative address, an absolute address, or a 32-bit
integer constant. The parameter type is optional and can be any one of
ARM, THUMB, CODE16, CODE32, or DATA.
EXAMPLE
SRAM_BASE EQU 0x04000000 ; assigns SRAM a base address
abc EQU 2 ; assigns the value 2 to the symbol abc
xyz EQU label+8 ; assigns the address (label+8)
; to the symbol xyz
fiq EQU 0x1C, CODE32 ; assigns the absolute address 0x1C to the symbol fiq,
; and marks it as code
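EQU creates a name for a value without reserving any memory, much like an enum constant in C. The sketch below mirrors the example; label is a hypothetical 16-byte object standing in for the assembler label, and the CODE32 type tag has no C counterpart.

```c
#include <stdint.h>

/* Like EQU, enum constants name values without allocating storage. */
enum {
    SRAM_BASE = 0x04000000,  /* SRAM_BASE EQU 0x04000000 */
    abc       = 2,           /* abc EQU 2 */
    fiq       = 0x1C         /* fiq EQU 0x1C, CODE32 (no type tag in C) */
};

/* Hypothetical stand-in for the location named by 'label'. */
const uint8_t label[16];

/* xyz EQU label+8: the address eight bytes past label. */
#define xyz (label + 8)
```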

Declaring an Entry Point: In the Keil tools, the ENTRY directive declares an entry
point to a program. The syntax is:
ENTRY
Your program must have at least one ENTRY point; otherwise, a warning is
generated at link time. In a project with multiple source files, not every source
file needs an ENTRY directive, but no single source file may contain more than
one: the assembler generates an error if more than one ENTRY directive exists
in a single source file.
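The ENTRY rules above can be summarized as a small check over a project. This C sketch is hypothetical: it takes the number of ENTRY directives found in each source file and reports which diagnostic the text says to expect.

```c
#include <stddef.h>

enum entry_check { ENTRY_OK, ENTRY_ASSEMBLER_ERROR, ENTRY_LINK_WARNING };

/* entries_per_file[i] = number of ENTRY directives in source file i. */
enum entry_check check_entries(const int *entries_per_file, size_t nfiles) {
    int total = 0;
    for (size_t i = 0; i < nfiles; ++i) {
        if (entries_per_file[i] > 1)
            return ENTRY_ASSEMBLER_ERROR;   /* two ENTRYs in one file */
        total += entries_per_file[i];
    }
    /* no ENTRY anywhere in the project: warning at link time */
    return total == 0 ? ENTRY_LINK_WARNING : ENTRY_OK;
}
```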

EXAMPLE
AREA ARMex, CODE, READONLY
ENTRY ; Entry point for the application
• When writing programs that contain tables or data that must be configured
before the program begins, you must specify exactly what memory contains.
• Strings, floating-point constants, and even addresses can be stored in
memory as data using various directives.

DCB: The DCB directive allocates one or more bytes of memory and defines their
initial runtime contents. The syntax is:
{label} DCB expr{,expr}…
where each expr is either a numeric expression that evaluates to an integer in the
range −128 to 255, or a quoted string whose characters are stored
consecutively in memory.

• Because the DCB directive affects memory at the byte level, you should use an
ALIGN directive afterward if any instructions follow, to ensure that the next
instruction is aligned correctly in memory.

EXAMPLE:
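Unlike strings in C, ARM assembler strings are not null-terminated. You can
construct a null-terminated string using DCB as follows:
C_string DCB "C_string",0
If this string started at address 0x4000 in memory, it would occupy the nine
bytes 0x4000 to 0x4008: the ASCII codes 0x43 ('C'), 0x5F ('_'), 0x73 ('s'),
0x74 ('t'), 0x72 ('r'), 0x69 ('i'), 0x6E ('n'), 0x67 ('g'), followed by the
terminating 0x00.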
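The DCB behaviour described above (numeric values limited to −128..255, string characters stored consecutively, no NUL added automatically) can be modelled in C with a byte buffer and a location counter. The struct section type and the dcb_value and dcb_string functions are hypothetical; the test replays the C_string example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical section model: a byte buffer and a location counter. */
struct section {
    uint8_t bytes[64];
    size_t loc;
};

/* DCB with a numeric expr: must lie in -128..255; stored as one byte. */
int dcb_value(struct section *s, int v) {
    if (v < -128 || v > 255 || s->loc >= sizeof s->bytes)
        return -1;                        /* out of range or buffer full */
    s->bytes[s->loc++] = (uint8_t)v;      /* e.g. -1 is stored as 0xFF */
    return 0;
}

/* DCB with a quoted string: characters stored consecutively, no NUL added. */
int dcb_string(struct section *s, const char *str) {
    size_t n = strlen(str);
    if (n > sizeof s->bytes - s->loc)
        return -1;
    memcpy(s->bytes + s->loc, str, n);
    s->loc += n;
    return 0;
}
```

Calling dcb_string(&s, "C_string") followed by dcb_value(&s, 0) leaves nine bytes, the last being the terminating 0 that the source line supplies explicitly.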
ALIGN directive: ALIGN aligns the current location to a specified boundary by
padding with zeros. The syntax is:
ALIGN {expr{,offset}}
where expr is a numeric expression evaluating to any power of two from 2^0 to
2^31, and offset can be any numeric expression. The current location is aligned
to the next address of the form
offset + n * expr
where n is an integer. If expr is not specified, ALIGN sets the current location to
the next word (four-byte) boundary.
EXAMPLE:
AREA OffsetExample, CODE
DCB 1 ; This example places the two
ALIGN 4,3 ; bytes in the first and fourth
DCB 1 ; bytes of the same word

AREA Example, CODE, READONLY
start LDR r6, =label1 ; code
MOV pc, lr
label1 DCB 1 ; current location now misaligned
ALIGN ; pad to a word boundary so that subroutine1
subroutine1 MOV r5, #0x5 ; labels a correctly aligned instruction
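The alignment rule offset + n * expr can be computed directly. This C sketch (align_addr and align_default are hypothetical names, and non-negative offsets are assumed) returns the next address at or above the current location that satisfies the rule; the zero padding ALIGN inserts is the difference between the two.

```c
#include <stdint.h>

/* Next address >= cur of the form offset + n * expr, where expr is a power
   of two from 1 to 2^31 and offset is assumed non-negative. */
uint64_t align_addr(uint64_t cur, uint64_t expr, uint64_t offset) {
    uint64_t off = offset % expr;                /* only offset mod expr matters */
    uint64_t rem = (cur + expr - off) % expr;    /* (cur - off) mod expr */
    return rem == 0 ? cur : cur + (expr - rem);  /* skip to the next match */
}

/* ALIGN with no operands: next word (four-byte) boundary. */
uint64_t align_default(uint64_t cur) {
    return align_addr(cur, 4, 0);
}
```

For OffsetExample, the location after the first DCB 1 is 1, and align_addr(1, 4, 3) gives 3, so the second byte lands in the fourth byte of the word. In the second example, assuming the AREA starts at offset 0, each ARM instruction occupies 4 bytes, and the literal pool for LDR r6, =label1 is placed after the code, label1 sits at 8, the location after DCB 1 is 9, and align_default(9) gives 12 for the subroutine label.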

SPACE:
The SPACE directive reserves a zeroed block of memory. The syntax is:
{label} SPACE expr
where expr evaluates to the number of zeroed bytes to reserve. You may also
want to use an ALIGN directive after a SPACE directive to align any code
that follows.
EXAMPLE:
AREA MyData, DATA, READWRITE
data1 SPACE 255 ; defines 255 bytes of zeroed storage
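A SPACE directive simply advances the location counter over a zero-filled block. The hypothetical C model below reserves the 255 bytes of data1 and then applies a default ALIGN, showing why anything placed after an odd-sized SPACE needs realigning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical section model: a byte buffer and a location counter. */
struct section {
    unsigned char bytes[512];
    size_t loc;
};

/* {label} SPACE n: reserve n zeroed bytes; returns the label's offset,
   or (size_t)-1 if the section buffer is too small. */
size_t space(struct section *s, size_t n) {
    if (n > sizeof s->bytes - s->loc)
        return (size_t)-1;
    size_t start = s->loc;
    memset(s->bytes + start, 0, n);
    s->loc += n;
    return start;
}

/* ALIGN with no operands: zero-pad up to the next four-byte boundary. */
void align_word(struct section *s) {
    while (s->loc % 4 != 0)
        s->bytes[s->loc++] = 0;
}
```

After space(&s, 255) the location is 255, and align_word moves it to 256, the next word boundary.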
Ending a Source File:
This is the easiest of the directives: END simply tells the assembler that it has
reached the end of a source file. The syntax for the Keil tools is:
END
When you terminate your source file, place the directive on a line by itself.
MACROS: Macro definitions allow a programmer to define a function or
operation once and then invoke it by name throughout the code, saving
writing time.
• Macros can also be part of a process known as conditional assembly,
in which parts of the source file may or may not be assembled depending on certain
variables, such as the architecture version (or a variable that you specify
yourself).
• Two directives are used to define a macro: MACRO and MEND. The syntax is:
MACRO
{$label} macroname{$cond} {$parameter{,$parameter}…}
; code
MEND
• $label is a parameter that is substituted with a symbol given when the
macro is invoked; the symbol is usually a label. The macro name must not
begin with an instruction or directive name.
• $cond is a special parameter designed to contain a condition
code; however, values other than valid condition codes are permitted.
• Each $parameter is substituted when the macro is invoked.
Within the macro body, parameters such as $label, $parameter, or $cond can be
used in the same way as other variables. They are given new values each time
the macro is invoked. Parameters must begin with $ to distinguish them from
ordinary symbols. Any number of parameters can be used. The $label field is
optional, and the macro itself defines the locations of any labels.
EXAMPLE: Suppose you have a sequence of instructions that appears multiple
times in your code: in this case, two ADD instructions followed by a
multiplication. You could define a small macro as follows:
MACRO ; macro definition:
; vara = 8 * (varb + varc + 6)
$Label_1 AddMul $vara, $varb, $varc
$Label_1
ADD $vara, $varb, $varc ; add two terms
ADD $vara, $vara, #6 ; add 6 to the sum
LSL $vara, $vara, #3 ; multiply by 8
MEND
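As a check that the three instructions really compute vara = 8 * (varb + varc + 6), the following C function performs the same steps on 32-bit unsigned values, so it wraps modulo 2^32 as a register would. The name addmul is only an illustrative choice.

```c
#include <stdint.h>

/* Mirrors the AddMul macro body step by step. */
uint32_t addmul(uint32_t varb, uint32_t varc) {
    uint32_t vara = varb + varc;   /* ADD $vara, $varb, $varc */
    vara = vara + 6;               /* ADD $vara, $vara, #6    */
    vara = vara << 3;              /* LSL $vara, $vara, #3 (multiply by 8) */
    return vara;
}
```

For example, addmul(1, 2) returns 8 * (1 + 2 + 6) = 72.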
In your source code file, you can then instantiate the macro as many times as
you like. You might invoke the sequence as follows:
CSet1 AddMul r0, r1, r2 ; invoke the macro
; the rest of your code
The assembler makes the necessary substitutions, so that the assembly
listing actually reads:
CSet1 ; invoke the macro
ADD r0, r1, r2
ADD r0, r0, #6
LSL r0, r0, #3
; the rest of your code
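The substitution step can be pictured as plain text replacement of each $name in the macro body by the argument bound to it at the call site. The C sketch below is a hypothetical illustration of that idea only: it is not how armasm is implemented, and it handles nothing beyond simple $name replacement (unknown $names are copied unchanged).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One parameter binding; param is the name without the leading '$'. */
struct binding {
    const char *param;
    const char *arg;
};

/* Expands one template line into out; returns 0, or -1 if out is too small. */
int expand_line(const char *tmpl, const struct binding *b, size_t nb,
                char *out, size_t outsz) {
    size_t o = 0;
    if (outsz == 0)
        return -1;
    while (*tmpl != '\0') {
        const char *src = tmpl;   /* bytes to copy into out */
        size_t srclen = 1;        /* default: copy one character */
        size_t advance = 1;       /* how far to move through tmpl */
        if (*tmpl == '$') {
            const char *p = tmpl + 1;
            while (isalnum((unsigned char)*p) || *p == '_')
                ++p;
            size_t idlen = (size_t)(p - tmpl - 1);
            advance = srclen = (size_t)(p - tmpl);  /* unknown: keep "$name" */
            for (size_t i = 0; i < nb; ++i) {
                if (strlen(b[i].param) == idlen &&
                    strncmp(b[i].param, tmpl + 1, idlen) == 0) {
                    src = b[i].arg;                 /* substitute argument */
                    srclen = strlen(b[i].arg);
                    break;
                }
            }
        }
        if (o + srclen >= outsz)
            return -1;                              /* keep room for NUL */
        memcpy(out + o, src, srclen);
        o += srclen;
        tmpl += advance;
    }
    out[o] = '\0';
    return 0;
}
```

Binding Label_1 to CSet1, vara to r0, varb to r1, and varc to r2 turns "ADD $vara, $varb, $varc" into "ADD r0, r1, r2", matching the listing above.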
The assembler can also evaluate operators such as :SHL: (shift left) and :OR:
(bitwise OR) on constant expressions at assembly time. For example:
ORR r1, r1, #1:SHL:3 ; set CCREG[3]
• Here, a 1 is shifted left three bits.
• Assuming you call register r1 CCREG (for instance, with CCREG RN 1),
you have now set bit 3.
• The advantage of writing it this way is that you are more likely to understand
that you wanted a one in a particular bit location, rather than simply using a
logical operation with a value such as 0x8.
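You can even use these operators in the creation of constants, for example,
DCD (0x8321:SHL:4):OR:2 ; defines the word 0x83212
The following three instructions are equivalent; each loads 0x5000 into r0:
MOV r0, #((1:SHL:14):OR:(1:SHL:12))
MOV r0, #((1 << 14) | (1 << 12))
MOV r0, #0x5000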
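In C, :SHL: corresponds to << and :OR: to |, so the constants above can be checked directly. The names set_ccreg_bit3, dcd_word, and mov_imm are hypothetical.

```c
#include <stdint.h>

/* ORR r1, r1, #1:SHL:3 with r1 named CCREG: set bit 3. */
uint32_t set_ccreg_bit3(uint32_t ccreg) {
    return ccreg | (UINT32_C(1) << 3);
}

/* DCD (0x8321:SHL:4):OR:2 */
const uint32_t dcd_word = (UINT32_C(0x8321) << 4) | 2u;

/* MOV r0, #((1:SHL:14):OR:(1:SHL:12)) */
const uint32_t mov_imm = (UINT32_C(1) << 14) | (UINT32_C(1) << 12);
```

Here dcd_word is 0x83212 and mov_imm is 0x5000, the same value loaded by the three MOV instructions.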
