MC & ES-Module 1 Notes
MODULE – 01
ARM EMBEDDED SYSTEMS & ARM PROCESSOR FUNDAMENTALS
ARM EMBEDDED SYSTEMS
Microprocessors vs. Microcontrollers
1. Microprocessors: Generally do not have RAM, ROM, and I/O pins.
   Microcontrollers: An ‘all in one’ processor, with RAM, I/O ports, all on the chip.
2. Microprocessors: Usually use their pins as a bus to interface to RAM, ROM, and peripheral
   devices. Hence, the controlling bus is expandable at the board level.
   Microcontrollers: The controlling bus is internal and not available to the board designer.
3. Microprocessors: Generally capable of being built into bigger general purpose applications.
   Microcontrollers: Usually used for more dedicated applications.
4. Microprocessors: Generally do not have a power saving system.
   Microcontrollers: Have a power saving system, like idle mode or power saving mode, so
   overall they use less power.
5. Microprocessors: The overall cost of systems made with microprocessors is high, because of
   the high number of external components required.
   Microcontrollers: Made using complementary metal oxide semiconductor technology, so they
   are far cheaper than microprocessors.
6. Microprocessors: Processing speed of general microprocessors is above 1 GHz, so they work
   much faster than microcontrollers.
   Microcontrollers: Processing speed is about 8 MHz to 50 MHz.
7. Microprocessors: Based on the von Neumann model, where program and data are stored in the
   same memory module.
   Microcontrollers: Based on the Harvard architecture, where program memory and data memory
   are separate.
Dept. of CSE (AI & ML), SVIT Asst. Prof. Soumya L N page: 1
The ARM processor core is a key component of many successful 32-bit embedded systems. ARM
cores are widely used in mobile phones, handheld organizers, and a multitude of other everyday
portable consumer devices.
The first ARM1 prototype was designed in 1985. Over one billion ARM processors had
been shipped worldwide by the end of 2001. ARM bases its success on a
simple and powerful original design, which continues to improve today through constant
technical innovation.
For example, one of ARM’s most successful cores is the ARM7TDMI. It provides up to
120 Dhrystone MIPS and is known for its high code density and low power consumption,
making it ideal for mobile embedded devices.
CISC vs. RISC
1. CISC: Complex instructions, taking multiple clock cycles.
   RISC: Simple instructions, taking a single clock cycle.
2. CISC: Emphasis on hardware; complexity is in the micro-program/processor.
   RISC: Emphasis on software; complexity is in the compiler.
3. CISC: Complex instructions, executed by the micro-program/processor.
   RISC: Reduced instructions, executed by hardware.
4. CISC: Variable format instructions, single register set and many instructions.
   RISC: Fixed format instructions, multiple register sets and few instructions.
5. CISC: Many instructions and many addressing modes.
   RISC: Fixed instructions and few addressing modes.
6. CISC: Conditional jump is usually based on a status register bit.
   RISC: Conditional jump can be based on a bit anywhere in memory.
7. CISC: Memory reference is embedded in many instructions.
   RISC: Memory reference is embedded in LOAD/STORE instructions.
These design rules allow a RISC processor to be simpler, and thus the core can operate at higher
clock frequencies.
o In contrast, traditional CISC processors are more complex and operate at lower clock
frequencies.
Variable cycle execution for certain instructions—Not every ARM instruction executes
in a single cycle. For example, load-store-multiple instructions vary in the number of
execution cycles depending upon the number of registers being transferred. The transfer
can occur on sequential memory addresses. Code density is also improved since multiple
register transfers are common operations at the start and end of functions.
Inline barrel shifter leading to more complex instructions—The inline barrel shifter is a
hardware component that preprocesses one of the input registers before it is used by an
instruction. This expands the capability of many instructions to improve core
performance and code density.
Thumb 16-bit instruction set—ARM enhanced the processor core by adding a second 16-
bit instruction set called Thumb that permits the ARM core to execute either 16- or 32-bit
instructions. The 16-bit instructions improve code density by about 30% over 32-bit
fixed-length instructions.
Conditional execution—An instruction is only executed when a specific condition has
been satisfied. This feature improves performance and code density by reducing branch
instructions.
Enhanced instructions—The enhanced digital signal processor (DSP) instructions were
added to the standard ARM instruction set to support fast 16×16-bit multiplier operations.
These instructions allow a faster-performing ARM processor.
These additional features have made the ARM processor one of the most commonly used 32-bit
embedded processor cores.
The cache is placed between main memory and the core. It is used to speed up data
transfer between the processor and main memory. A cache provides an overall increase
in performance but with a loss of predictable execution time. Although the cache
increases the general performance of the system, it does not help real-time system
response.
The main memory is large—around 256 KB to 256 MB (or even greater), depending on
the application—and is generally stored in separate chips. Load and store instructions
access the main memory unless the values have been stored in the cache.
Secondary storage is the largest and slowest form of memory. Hard disk drives and
CD-ROM drives are examples of secondary storage.
Width: The memory width is the number of bits the memory returns on each access, typically
8, 16, 32, or 64 bits.
The memory width has a direct effect on the overall performance and cost ratio.
Lower bit memories are less expensive, but reduce the system performance.
The following Table summarizes theoretical cycle times on an ARM processor using different
memory width devices.
Table: Fetching Instruction from Memory
Instruction   Size     8-bit Memory   16-bit Memory   32-bit Memory
ARM           32-bit   4 cycles       2 cycles        1 cycle
Thumb         16-bit   2 cycles       1 cycle         1 cycle
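The cycle counts in the table follow from dividing the instruction size by the memory width and rounding up. A minimal Python sketch of that arithmetic, assuming one memory access per cycle (the function name fetch_cycles is illustrative and not part of the notes):

```python
def fetch_cycles(instruction_bits: int, memory_width_bits: int) -> int:
    """Theoretical accesses needed to fetch one instruction (one access per cycle assumed)."""
    # Ceiling division: a memory narrower than the instruction needs several accesses.
    return -(-instruction_bits // memory_width_bits)


for name, size in (("ARM", 32), ("Thumb", 16)):
    print(name, [fetch_cycles(size, width) for width in (8, 16, 32)])
```

The loop reproduces the two rows of the table: 4, 2, 1 cycles for ARM and 2, 1, 1 cycles for Thumb.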
Microcontroller and Embedded Systems (BCO601)
Read-only memory (ROM) is the least flexible of all memory types because it contains an
image that is permanently set at production time and cannot be reprogrammed.
o ROMs are used in high-volume devices that require no updates or corrections. Many
devices also use a ROM to hold boot code.
Flash ROM can be written to as well as read, but it is slow to write so you shouldn’t use
it for holding dynamic data.
o Its main use is for holding the device firmware or storing long-term data that
needs to be preserved after power is off. The erasing and writing of flash ROM are
completely software controlled with no additional hardware circuitry required,
which reduces the manufacturing costs.
Dynamic random access memory (DRAM) is the most commonly used RAM for devices.
It has the lowest cost per megabyte compared with other types of RAM. DRAM is
dynamic—it needs to have its storage cells refreshed and given a new electronic charge
every few milliseconds, so you need to set up a DRAM controller before using the
memory.
Static random access memory (SRAM) is faster than the more traditional DRAM, but
requires more silicon area. SRAM is static—the RAM does not require refreshing. The
access time for SRAM is considerably shorter than the equivalent DRAM because
SRAM does not require a pause between data accesses. However, SRAM costs more than DRAM.
Synchronous dynamic random access memory (SDRAM) is one of many subcategories of
DRAM. It can run at much higher clock speeds than conventional memory. SDRAM
synchronizes itself with the processor bus, because it is clocked. Internally the data is
fetched from memory cells, pipelined, and finally brought out on the bus in a burst.
Peripherals:
Embedded systems that interact with the outside world need some form of peripheral device. A
peripheral device performs input and output functions for the chip by connecting to other
devices or sensors that are off-chip.
o Each peripheral device usually performs a single function and may reside on-chip.
o Peripherals range from a simple serial communication device to a more complex
802.11 wireless device.
All ARM peripherals are memory mapped: the programming interface is a set of
memory-addressed registers. The address of these registers is an offset from a specific
peripheral base address.
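As a rough illustration of memory-mapped registers, the Python sketch below computes each register address as base plus offset and treats register access as an ordinary read or write at that address. The base address 0x10000000, the register names, and the offsets are all hypothetical; a real peripheral's values come from its data sheet.

```python
# Hypothetical peripheral base address and register offsets, for illustration only.
UART_BASE = 0x10000000
UART_OFFSETS = {"DATA": 0x00, "STATUS": 0x04, "CONTROL": 0x08}


def register_address(base: int, offset: int) -> int:
    """Address of a memory-mapped register: peripheral base plus register offset."""
    return base + offset


class MemoryMappedPeripheral:
    """Toy model: register reads and writes are plain accesses to addresses."""

    def __init__(self, base: int, offsets: dict):
        self.base = base
        self.offsets = offsets
        self.memory = {}  # address -> 32-bit value

    def write(self, name: str, value: int) -> None:
        address = register_address(self.base, self.offsets[name])
        self.memory[address] = value & 0xFFFFFFFF

    def read(self, name: str) -> int:
        address = register_address(self.base, self.offsets[name])
        return self.memory.get(address, 0)


uart = MemoryMappedPeripheral(UART_BASE, UART_OFFSETS)
uart.write("CONTROL", 1)
print(hex(register_address(UART_BASE, UART_OFFSETS["CONTROL"])), uart.read("CONTROL"))
```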
Controllers are specialized peripherals that implement higher levels of functionality
within an embedded system.
o Two important types of controllers are memory controllers and interrupt controllers.
Memory Controllers: Memory controllers connect different types of memory to the processor bus.
o On power-up a memory controller is configured in hardware to allow certain memory
devices to be active. These memory devices allow the initialization code to be executed.
Some memory devices must be set up by software; for example, when using DRAM, you first
have to set up the memory timings and refresh rate before it can be accessed.
2. The vector interrupt controller (VIC) is more powerful than the standard interrupt
controller, because it prioritizes interrupts and simplifies the determination of which
device caused the interrupt.
o Depending on its type, the VIC will either call the standard interrupt exception
handler, which can load the address of the handler for the device from the VIC, or
cause the core to jump to the handler for the device directly.
The above Figure shows memory before and after reorganization. It is common for ARM-based
embedded systems to provide for memory remapping because it allows the system to start the
initialization code from ROM at power-up. The initialization code then redefines or remaps the
memory map to place RAM at address 0x00000000, an important step because then the
exception vector table can be in RAM and thus can be reprogrammed.
2. Diagnostics are often embedded in the initialization code. Diagnostic code tests the system
by exercising the hardware target to check if the target is in working order. It also tracks
down standard system-related issues. The primary purpose of diagnostic code is fault
identification and isolation.
3. Booting involves loading an image and handing control over to that image. The boot
process itself can be complicated if the system must boot different operating systems or
Operating System:
The initialization process prepares the hardware for an operating system to take control.
An operating system organizes the system resources: the peripherals, memory, and
processing time.
ARM processors support over 50 operating systems. We can divide operating systems
into two main categories: real-time operating systems (RTOSs) and platform operating
systems.
1. RTOSs provide guaranteed response times to events. Different operating systems have
different amounts of control over the system response time.
o A hard real-time application requires a guaranteed response to work at all.
o In contrast, a soft real-time application requires a good response time,
but the performance degrades more gracefully if the response time
overruns.
2. Platform operating systems require a memory management unit to manage large,
non-real-time applications and tend to have secondary storage.
o The Linux operating system is a typical example of a platform operating system.
Applications:
The operating system schedules applications—code dedicated to handle a particular
task. An application implements a processing task; the operating system controls the
environment.
o An embedded system can have one active application or several applications
running simultaneously.
ARM processors are found in numerous market segments, including networking,
automotive, mobile and consumer devices, mass storage, and imaging.
ARM processors are found in networking applications like home gateways, DSL modems
for high-speed Internet communication, and 802.11 wireless communications.
The mobile device segment is the largest application area for ARM processors, because
of mobile phones.
ARM processors are also found in mass storage devices such as hard drives and imaging
products such as inkjet printers, applications that are cost sensitive and high volume.
In contrast, ARM processors are not found in applications that require leading-edge high performance.
Because these applications tend to be low volume and high cost, ARM has decided not to focus
designs on these types of applications.
A programmer can think of an ARM core as functional units connected by data buses, as shown
in the following Figure.
The instruction decoder translates instructions before they are executed. Each instruction
executed belongs to a particular instruction set.
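The ARM processor, like all RISC processors, uses a load-store architecture. This means it
has two instruction types for transferring data in and out of the processor: load
instructions copy data from memory to registers in the core, and store instructions
copy data from registers to memory.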
o Since the ARM core is a 32-bit processor, most instructions treat the registers as
holding signed or unsigned 32-bit values. The sign extend hardware converts
signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory
and placed in a register.
ARM instructions typically have two source registers, Rn and Rm, and a single result or
destination register, Rd. Source operands are read from the register file using the
internal buses A and B, respectively.
The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register
values Rn and Rm from the A and B buses and computes a result. Data processing
instructions write the result in Rd directly to the register file.
Load and store instructions use the ALU to generate an address to be held in the address
register and broadcast on the Address bus.
o One important feature of the ARM is that register Rm alternatively can be
preprocessed in the barrel shifter before it enters the ALU. Together the barrel
shifter and ALU can calculate a wide range of expressions and addresses.
After passing through the functional units, the result in Rd is written back to the register
file using the Result bus.
For load and store instructions the Incrementer updates the address register before the
core reads or writes the next register value from or to the next sequential memory
location.
The processor continues executing instructions until an exception or
interrupt changes the normal execution flow.
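Two steps of the data flow described above can be sketched in Python: sign extension of 8-bit and 16-bit values to 32 bits, and preprocessing Rm in the barrel shifter before the ALU adds it to Rn. This is a simplified model under stated assumptions: only a logical shift left is shown, flags are ignored, and the function names are illustrative rather than taken from the notes.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's complement value to a 32-bit register pattern."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # XOR then subtract converts the narrow pattern to a signed integer.
    return ((value ^ sign_bit) - sign_bit) & MASK32


def barrel_shift_lsl(rm: int, shift: int) -> int:
    """Logical shift left of Rm, truncated to 32 bits (one barrel shifter operation)."""
    return (rm << shift) & MASK32


def alu_add(rn: int, rm: int, shift: int = 0) -> int:
    """Model of Rd = Rn + (Rm shifted left by `shift`), with 32-bit wraparound."""
    return (rn + barrel_shift_lsl(rm, shift)) & MASK32


print(hex(sign_extend(0xFF, 8)))     # signed byte -1 as a 32-bit pattern
print(alu_add(5, 3, shift=2))        # 5 + (3 << 2)
```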
REGISTERS:
General-purpose registers hold either data or an address. They are identified
with the letter r prefixed to the register number. For example, register 4 is given
the label r4. The Figure shows the active registers available in user mode. User mode
is a protected mode normally used when executing applications.
The processor can operate in seven different modes.
All the registers shown are 32 bits in size.
There are up to 18 active registers:
o 16 data registers and 2 processor status registers.
o The data registers visible to the programmer are r0 to r15.
The ARM processor has three registers assigned to a particular task or special function:
r13, r14, and r15. They are given different labels to differentiate them from the
other registers.
o Register r13 is traditionally used as the stack pointer (sp) and stores the head of
the stack in the current processor mode.
o Register r14 is called the link register (lr) and is where the core puts the return
address whenever it calls a subroutine.
o Register r15 is the program counter (pc) and contains the address of the next
instruction to be fetched by the processor.
In ARM state the registers r0 to r13 are orthogonal: any instruction that you can apply
to r0 you can equally well apply to any of the other registers.
In addition to the 16 data registers, there are two program status registers: cpsr (current
program status register) and spsr (saved program status register).
The ARM core uses the cpsr to monitor and control internal operations. The cpsr is a dedicated
32-bit register and resides in the register file. The following Figure shows the basic layout of a
generic program status register. Note that the shaded parts are reserved for future expansion.
The cpsr is divided into four fields, each 8 bits wide: flags, status, extension, and control. In
current designs the extension and status fields are reserved for future use.
The control field contains the processor mode, state, and interrupt mask bits.
The flags field contains the condition flags.
Some ARM processor cores have extra bits allocated. For example, the J bit, which can be found
in the flags field, is only available on Jazelle-enabled processors, which execute 8-bit
instructions.
It is highly probable that future designs will assign extra bits for the monitoring and control of
new features.
Processor Modes:
The processor mode determines which registers are active and the access rights to the cpsr
register itself. Each processor mode is either privileged or non-privileged:
o A privileged mode allows full read-write access to the cpsr.
o A non-privileged mode only allows read access to the control field in the cpsr,
but still allows read-write access to the condition flags.
Supervisor mode is the mode that the processor is in after reset and is
generally the mode that an operating system kernel operates in.
System mode is a special version of user mode that allows full read-write
access to the cpsr.
Undefined mode is used when the processor encounters an instruction
that is undefined or not supported by the implementation.
o one non-privileged mode (user).
User mode is used for programs and applications.
Banked Registers:
The following Figure shows all 37 registers in the register file.
Of these, 20 registers are hidden from a program at different times.
These registers are called banked registers and are identified by the shading in the diagram.
They are available only when the processor is in a particular mode; for example, abort
mode has banked registers r13_abt, r14_abt and spsr_abt.
Banked registers of a particular mode are denoted by an underline character
post-fixed to the mode mnemonic or _mode.
Every processor mode except user mode can change mode by writing directly to the
mode bits of the cpsr.
All processor modes except system mode have a set of associated banked registers
that are a subset of the main 16 registers.
A banked register maps one-to-one onto a user mode register.
If you change processor mode, a banked register from the new mode will replace an
existing register.
o For example, when the processor is in the interrupt request mode, the instructions
you execute still access registers named r13 and r14. However, these registers are
the banked registers r13_irq and r14_irq. The user mode registers r13_usr and
r14_usr are not affected by the instruction referencing these registers. A program
still has normal access to the other registers r0 to r12.
The processor mode can be changed by a program that writes directly to the cpsr (the
processor core has to be in privileged mode) or by hardware when the core responds to an
exception or interrupt.
The following exceptions and interrupts cause a mode change: reset, interrupt request,
fast interrupt request, software interrupt, data abort, prefetch abort, and undefined
instruction.
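The banking behaviour described above can be modelled with a small Python sketch. It assumes the usual ARM bank layout (fast interrupt request mode banks r8 to r14; interrupt request, supervisor, abort, and undefined modes bank r13 and r14; each of those five modes has its own spsr). The class and function names are illustrative only.

```python
# General-purpose registers banked by each mode (user and system share one set).
BANKED_GENERAL = {
    "usr": (), "sys": (),
    "fiq": (8, 9, 10, 11, 12, 13, 14),
    "irq": (13, 14), "svc": (13, 14), "abt": (13, 14), "und": (13, 14),
}
MODES_WITH_SPSR = ("fiq", "irq", "svc", "abt", "und")


def hidden_register_count() -> int:
    """Banked general-purpose registers plus one spsr per exception mode."""
    return sum(len(regs) for regs in BANKED_GENERAL.values()) + len(MODES_WITH_SPSR)


class RegisterFile:
    """Toy register file: the same name (e.g. r13) maps to a per-mode register."""

    def __init__(self):
        self.user = [0] * 16
        self.banked = {mode: {n: 0 for n in regs} for mode, regs in BANKED_GENERAL.items()}
        self.mode = "usr"

    def read(self, n: int) -> int:
        bank = self.banked[self.mode]
        return bank[n] if n in bank else self.user[n]

    def write(self, n: int, value: int) -> None:
        bank = self.banked[self.mode]
        if n in bank:
            bank[n] = value & 0xFFFFFFFF
        else:
            self.user[n] = value & 0xFFFFFFFF


print(16 + 1 + hidden_register_count())  # r0-r15, cpsr, and the banked registers
```

With this layout the count comes to 20 hidden registers and 37 registers in total, matching the figures quoted for the register file.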
Exceptions and interrupts suspend the normal execution of sequential instructions and jump to
a specific location.
The following Figure illustrates what happens when an interrupt forces a mode change.
The Figure shows the core changing from user mode to interrupt request mode, which
happens when an interrupt request occurs due to an external device raising an interrupt to
the processor core.
This change causes user registers r13 and r14 to be banked. The user registers are
replaced with registers r13_irq and r14_irq, respectively.
o Note r14_irq contains the return address and r13_irq contains the stack pointer for
interrupt request mode.
saved program status register (spsr), which stores the previous mode’s cpsr. The Figure
shows the cpsr being copied into spsr_irq.
To return back to user mode, a special return instruction is used that instructs the core to
restore the original cpsr from the spsr_irq and bank in the user registers r13 and r14.
Note that, the spsr can only be modified and read in a privileged mode. There is no spsr
available in user mode.
Another important feature to note is that the cpsr is not copied into the spsr when a mode
change is forced due to a program writing directly to the cpsr. The saving of the cpsr only
occurs when an exception or interrupt is raised.
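The difference between an exception-driven mode change and a direct write to the cpsr can be sketched as follows. This Python model is deliberately simplified: it tracks only the mode bits and the I bit, ignores pc, lr, and the state bits, and uses illustrative names (Core, take_irq, write_mode) that are not from the notes.

```python
MODE_MASK = 0x1F
IRQ_MODE = 0b10010
I_BIT = 1 << 7


class Core:
    """Toy core holding a cpsr and one spsr slot per exception mode."""

    def __init__(self, cpsr: int):
        self.cpsr = cpsr
        self.spsr = {}  # mode bits -> saved cpsr; user and system modes have no spsr

    def take_irq(self) -> None:
        """Exception entry: the old cpsr is saved in spsr_irq, then the mode changes."""
        old = self.cpsr
        self.spsr[IRQ_MODE] = old
        self.cpsr = (old & ~MODE_MASK) | IRQ_MODE | I_BIT  # IRQs masked on entry

    def return_from_exception(self) -> None:
        """Special return: restore the cpsr from the current mode's spsr."""
        self.cpsr = self.spsr[self.cpsr & MODE_MASK]

    def write_mode(self, mode: int) -> None:
        """Program writes the mode bits directly: no spsr is updated."""
        self.cpsr = (self.cpsr & ~MODE_MASK) | mode


core = Core(0x60000010)           # user mode with Z and C set
core.take_irq()
print(hex(core.cpsr), hex(core.spsr[IRQ_MODE]))
core.return_from_exception()
print(hex(core.cpsr))
```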
When power is applied to the core, it starts in supervisor mode, which is privileged.
Starting in a privileged mode is useful since initialization code can use full access to the
cpsr to set up the stacks for each of the other modes.
The following Table lists the various modes and the associated binary patterns. The last
column of the table gives the bit patterns that represent each of the processor modes in the
cpsr.
Table: Processor Mode
Mode Abbreviation Privileged Mode[4:0]
Abort abt yes 10111
Fast Interrupt Request fiq yes 10001
Interrupt Request irq yes 10010
Supervisor svc yes 10011
System sys yes 11111
Undefined und yes 11011
User usr no 10000
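The table can be turned directly into a lookup on the low five bits of the cpsr. A small Python sketch, with the dictionary and function names chosen here for illustration:

```python
# mode[4:0] bit pattern -> (abbreviation, privileged), copied from the table above.
PROCESSOR_MODES = {
    0b10111: ("abt", True),
    0b10001: ("fiq", True),
    0b10010: ("irq", True),
    0b10011: ("svc", True),
    0b11111: ("sys", True),
    0b11011: ("und", True),
    0b10000: ("usr", False),
}


def decode_mode(cpsr: int):
    """Return (abbreviation, privileged) for the mode field of a cpsr value."""
    return PROCESSOR_MODES[cpsr & 0x1F]


print(decode_mode(0x000000D3))  # mode bits 10011
print(decode_mode(0x00000010))  # mode bits 10000
```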
The ARM instruction set is only active when the processor is in ARM state.
The Thumb instruction set is only active when the processor is in Thumb state. Once in
Thumb state the processor is executing purely Thumb 16-bit instructions.
You cannot intermingle sequential ARM, Thumb, and Jazelle instructions.
The Jazelle J and Thumb T bits in the cpsr reflect the state of the processor.
o When both J and T bits are 0, the processor is in ARM state and
executes ARM instructions. This is the case when power is
applied to the processor.
o When the T bit is 1, then the processor is in Thumb state.
The ARM designers introduced a third instruction set called Jazelle. Jazelle executes 8-
bit instructions and is a hybrid mix of software and hardware designed to speed up the
execution of Java byte-codes.
To execute Java byte-codes, you require the Jazelle technology plus a specially modified
version of the Java virtual machine.
The following Table gives the Jazelle instruction set features.
Interrupt Masks:
Interrupt masks are used to stop specific interrupt requests from interrupting the processor.
There are two interrupt request levels available on the ARM processor core—
o interrupt request (IRQ)
o fast interrupt request (FIQ).
The cpsr has two interrupt mask bits, 7 and 6 (or I and F), which control the masking of
IRQ and FIQ, respectively.
The I bit masks IRQ when set to binary 1; and similarly, the F bit masks FIQ when set
to binary 1.
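A short Python sketch of the mask bits, treating the cpsr as a plain 32-bit integer. The helper names are illustrative; on real hardware the cpsr is changed with dedicated instructions in a privileged mode, not with ordinary arithmetic.

```python
I_BIT = 1 << 7  # masks IRQ when set
F_BIT = 1 << 6  # masks FIQ when set


def irq_masked(cpsr: int) -> bool:
    return bool(cpsr & I_BIT)


def fiq_masked(cpsr: int) -> bool:
    return bool(cpsr & F_BIT)


def enable_irq(cpsr: int) -> int:
    """Clear the I bit so that interrupt requests can reach the core."""
    return cpsr & ~I_BIT & 0xFFFFFFFF


def disable_irq(cpsr: int) -> int:
    """Set the I bit to mask interrupt requests."""
    return cpsr | I_BIT


cpsr = 0xD3                      # I and F set, supervisor mode
cpsr = enable_irq(cpsr)
print(hex(cpsr), irq_masked(cpsr), fiq_masked(cpsr))
```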
Condition Flags:
Condition flags are updated by comparisons and the result of ALU operations that specify
the S instruction suffix.
o For example, if a SUBS subtract instruction results in a register value of zero, then the Z
Flag in the cpsr is set. This particular subtract instruction specifically updates the cpsr.
With processor cores that include the DSP extensions, the Q bit indicates if an overflow
or saturation has occurred in an enhanced DSP instruction. The flag is “sticky” in the
sense that the hardware only sets this flag. To clear the flag you need to write to the cpsr
directly.
In Jazelle-enabled processors, the J bit reflects the state of the core; if it is set, the core is
in Jazelle state. The J bit is not generally usable and is only available on some processor
cores. To take advantage of Jazelle, extra software has to be licensed from both ARM
Limited and Sun Microsystems.
Most ARM instructions can be executed conditionally on the value of the condition flags.
The following Table lists the condition flags and a short description on what causes
them to be set.
These flags are located in the most significant bits in the cpsr. These bits are used for
conditional execution. The following Figure shows a typical value for the cpsr with both DSP
For the condition flags a capital letter shows that the flag has been set. For interrupts
a capital letter shows that an interrupt is disabled.
In the cpsr example shown in the above Figure, the C flag is the only condition flag set.
The remaining nzvq flags are all clear.
The processor is in ARM state because neither the Jazelle j nor Thumb t bits are set.
The IRQ interrupts are enabled, and FIQ interrupts are disabled.
Finally, you can see from the Figure, the processor is in supervisor (SVC) mode,
since the mode[4:0] is equal to binary 10011.
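The letter notation can be generated from a raw cpsr value. The Python sketch below assumes the usual bit positions (N, Z, C, V, Q in bits 31 to 27, J in bit 24, I, F, T in bits 7 to 5); the value 0x20000053 is chosen here to match the state described above and is not quoted from the Figure.

```python
# (letter, bit position) in display order.
FLAG_BITS = (("n", 31), ("z", 30), ("c", 29), ("v", 28), ("q", 27),
             ("j", 24), ("i", 7), ("f", 6), ("t", 5))
MODE_NAMES = {0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
              0b10111: "abt", 0b11011: "und", 0b11111: "sys"}


def format_cpsr(cpsr: int) -> str:
    """Upper-case letter = bit set (flag set, or interrupt disabled)."""
    letters = "".join(
        name.upper() if (cpsr >> bit) & 1 else name for name, bit in FLAG_BITS
    )
    return letters + "_" + MODE_NAMES[cpsr & 0x1F].upper()


print(format_cpsr(0x20000053))  # C set, FIQ disabled, supervisor mode
```

For 0x20000053 this prints nzCvqjiFt_SVC.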
Conditional Execution:
Conditional execution controls whether or not the core will execute an instruction.
Prior to execution, the processor compares the condition attribute with the condition flags
in the cpsr. If they match, then the instruction is executed; otherwise the instruction is ignored.
The condition attribute is post-fixed to the instruction mnemonic, which is encoded
into the instruction.
The following Table lists the conditional execution code mnemonics. When a
condition mnemonic is not present, the default behavior is to set it to always
(AL) execute.
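The comparison of a condition attribute against the flags can be sketched in Python. The flag tests below are the standard ARM condition code definitions; the function name condition_passes and the convention of passing an empty mnemonic for "no condition" are assumptions made for this example.

```python
# Each condition is a test on the N, Z, C, V flags.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,
}
CONDITIONS["HS"] = CONDITIONS["CS"]  # alternative names for the same tests
CONDITIONS["LO"] = CONDITIONS["CC"]


def condition_passes(mnemonic: str, cpsr: int) -> bool:
    """True if an instruction with this condition suffix would execute."""
    n, z, c, v = (bool((cpsr >> bit) & 1) for bit in (31, 30, 29, 28))
    return bool(CONDITIONS[mnemonic or "AL"](n, z, c, v))


z_set = 1 << 30
print(condition_passes("EQ", z_set), condition_passes("NE", z_set))
```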
PIPELINE:
A pipeline is the mechanism a RISC processor uses to execute instructions.
Pipeline speeds up execution by fetching the next instruction while other instructions
are being decoded and executed.
The Figure shows a sequence of three instructions being fetched, decoded, and executed
by the processor.
o The three instructions are placed into the pipeline sequentially.
o In the first cycle, the core fetches the ADD instruction from memory.
o In the second cycle, the core fetches the SUB instruction and decodes the ADD
instruction.
o In the third cycle, both the SUB and ADD instructions are moved along the
pipeline. The ADD instruction is executed, the SUB instruction is decoded, and
the CMP instruction is fetched.
This procedure is called filling the pipeline.
The pipeline allows the core to execute an instruction every cycle.
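The filling of a three-stage pipeline can be traced with a few lines of Python. This is an idealised sketch that assumes every stage takes exactly one cycle and nothing stalls; the function names are illustrative.

```python
def instruction_at(instructions, index):
    """Instruction at `index`, or None when the stage is still empty."""
    return instructions[index] if 0 <= index < len(instructions) else None


def pipeline_trace(instructions, cycles):
    """Contents of the fetch, decode, and execute stages for each cycle."""
    trace = []
    for cycle in range(cycles):
        trace.append({
            "fetch": instruction_at(instructions, cycle),
            "decode": instruction_at(instructions, cycle - 1),
            "execute": instruction_at(instructions, cycle - 2),
        })
    return trace


for number, stages in enumerate(pipeline_trace(["ADD", "SUB", "CMP"], 3), start=1):
    print("cycle", number, stages)
```

In the third cycle the trace shows CMP being fetched, SUB being decoded, and ADD being executed, which is the filled pipeline described above.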
As the pipeline length increases, the amount of work done at each stage is reduced, which
allows the processor to attain a higher operating frequency. This in turn increases the
performance.
o The increased pipeline length also means increased system latency and there can be data
dependency between certain stages.
o The pipeline design for each ARM family differs. For example, the ARM9 core
increases the pipeline length to five stages, as shown in Figure.
o The ARM9 adds a memory and writeback stage, which allows the ARM9 to process on
average 1.1 Dhrystone MIPS per MHz, an increase in instruction throughput of around
13% compared with an ARM7.
o The ARM10 increases the pipeline length still further by adding a sixth stage, as
shown in the following Figure.
o The ARM10 can process on average 1.3 Dhrystone MIPS per MHz, about 34% more
throughput than an ARM7 processor core, but again at a higher latency cost.
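The quoted figures can be cross-checked with simple arithmetic. The Python sketch below backs out the ARM7 rate implied by each percentage and scales a rate by a clock frequency; the 200 MHz clock is a hypothetical value used only for illustration.

```python
def implied_baseline(mips_per_mhz: float, gain_percent: float) -> float:
    """Baseline rate implied by a rate that is `gain_percent` higher than it."""
    return mips_per_mhz / (1 + gain_percent / 100)


def dhrystone_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Estimated Dhrystone MIPS at a given clock frequency."""
    return mips_per_mhz * clock_mhz


print(round(implied_baseline(1.1, 13), 2))   # ARM7 rate implied by the ARM9 figures
print(round(implied_baseline(1.3, 34), 2))   # ARM7 rate implied by the ARM10 figures
print(round(dhrystone_mips(1.1, 200), 1))    # ARM9 at a hypothetical 200 MHz
```

Both percentages imply roughly the same ARM7 baseline of about 0.97 Dhrystone MIPS per MHz, so the two comparisons are consistent with each other.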
NOTE: Even though the ARM9 and ARM10 pipelines are different, they still use the same
pipeline executing characteristics as an ARM7. Hence, code written for the ARM7 will execute
on an ARM9 or ARM10.
Pipeline Executing Characteristics:
An instruction in the ARM pipeline has not been processed until it passes completely through the
execute stage.
The MSR instruction is used to enable IRQ interrupts, which only occurs once the MSR
instruction completes the execute stage of the pipeline. It clears the I bit in the cpsr to
enable the IRQ interrupts.
Once the ADD instruction enters the execute stage of the pipeline, IRQ interrupts are
enabled. The following Figure illustrates the use of the pipeline and the program counter pc.
o Second, ARM10 uses branch prediction, which reduces the effect of a pipeline
flush by predicting possible branches and loading the new branch address prior to
the execution of the instruction.
o Third, an instruction in the execute stage will complete even though an interrupt
has been raised. Other instructions in the pipeline will be abandoned, and the
processor will start filling the pipeline.
Each vector table entry contains a form of branch instruction pointing to the start of a
specific routine:
o Reset vector is the location of the first instruction executed by the processor when
o Prefetch abort vector occurs when the processor attempts to fetch an instruction
from an address without the correct access permissions. The actual abort occurs in
the decode stage.
o Data abort vector is similar to a prefetch abort, but is raised when an instruction
attempts to access data memory without the correct access permissions.
o Interrupt request vector is used by external hardware to interrupt the normal
execution flow of the processor. It can only be raised if IRQs are not masked in
the cpsr.
o Fast interrupt request vector is similar to the interrupt request, but is reserved for
hardware requiring faster response times. It can only be raised if FIQs are not
masked in the cpsr.
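The vector table layout can be sketched as a lookup from exception to address. The offsets below are the standard ARM vector offsets (one 32-bit entry per exception, with one reserved slot); the optional high base address 0xFFFF0000 applies only to cores that support relocating the table and is shown here as an assumption.

```python
# Offset of each entry from the start of the vector table.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined instruction": 0x04,
    "software interrupt": 0x08,
    "prefetch abort": 0x0C,
    "data abort": 0x10,
    "reserved": 0x14,
    "interrupt request": 0x18,
    "fast interrupt request": 0x1C,
}


def vector_address(exception: str, base: int = 0x00000000) -> int:
    """Address the core jumps to when `exception` is taken."""
    return base + VECTOR_OFFSETS[exception]


for name in VECTOR_OFFSETS:
    print(f"{name:24s} {vector_address(name):#010x}")
```

Because the fast interrupt request entry is the last one in the table, its handler can start directly at that address without a further branch.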
CORE EXTENSIONS:
Core extensions are the standard hardware components placed next to the ARM core.
They improve performance, manage resources, and provide extra functionality and are
designed to provide flexibility in handling particular applications.
Each ARM family has different extensions available. There are three hardware extensions:
cache and tightly coupled memory, memory management, and the coprocessor interface.
ARM has two forms of cache. The first is found attached to the Von Neumann–style
cores. It combines both data and instruction into a single unified cache, as shown in the
following Figure.
This is achieved using a form of memory called tightly coupled memory (TCM). TCM is
fast SRAM located close to the core and guarantees the clock cycles required to fetch
instructions or data.
TCMs appear as memory in the address map and can be accessed as fast memory.
By combining both technologies, ARM processors can have both improved performance
and predictable real-time response. The following Figure shows an example core with a
combination of caches and TCMs.
Memory Management:
Embedded systems often use multiple memory devices. It is usually necessary to have a
method to organize these devices and protect the system from applications trying to make
inappropriate accesses to hardware. This is achieved with the assistance of memory
management hardware.
ARM cores have three different types of memory management hardware—
o no extensions providing no protection
o a memory protection unit (MPU) providing limited protection
o a memory management unit (MMU) providing full protection
Nonprotected memory is fixed and provides very little flexibility. It is normally used for
small, simple embedded systems that require no protection from rogue applications.
MPUs employ a simple system that uses a limited number of memory regions. These
regions are controlled with a set of special coprocessor registers, and each region is
defined with specific access permissions. This type of memory management is used for
systems that require memory protection but don’t have a complex memory map.
MMUs are the most comprehensive memory management hardware available on the
ARM. The MMU uses a set of translation tables to provide fine-grained control over
memory. These tables are stored in main memory and provide a virtual-to-physical
address map as well as access permissions. MMUs are designed for more sophisticated
platform operating systems that support multitasking.
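The idea of a translation table that supplies both an address mapping and access permissions can be illustrated with a toy Python model. It is not the ARM table format: it assumes a single-level table held in a dictionary, a 4 KB page size, and one "writable" permission bit, all chosen for this example.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class TranslationFault(Exception):
    """No mapping exists for the virtual address."""


class PermissionFault(Exception):
    """The mapping exists but the access is not allowed."""


def translate(table: dict, virtual_address: int, write: bool = False) -> int:
    """Map a virtual address to a physical address using {page: (frame, writable)}."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in table:
        raise TranslationFault(hex(virtual_address))
    frame, writable = table[page]
    if write and not writable:
        raise PermissionFault(hex(virtual_address))
    return frame * PAGE_SIZE + offset


# Hypothetical mappings: page 0 is read-only, page 1 is read-write.
table = {0x0: (0x80, False), 0x1: (0x81, True)}
print(hex(translate(table, 0x1234)))
```

Two tasks can use the same virtual addresses with different tables, which is how an MMU supports the multitasking mentioned above.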
Coprocessors:
Coprocessors can be attached to the ARM processor. A coprocessor extends the
processing features of a core by extending the instruction set or by providing
configuration registers. More than one coprocessor can be added to the ARM core via the
coprocessor interface.
The coprocessor can be accessed through a group of dedicated ARM instructions that
provide a load-store type interface.
o For example, coprocessor 15: The ARM processor uses coprocessor 15 registers
to control the cache, TCMs, and memory management.
The coprocessor can also extend the instruction set by providing a specialized group of
new instructions.
o For example, there is a set of specialized instructions that can be added to the
standard ARM instruction set to process vector floating-point (VFP) operations.
These new instructions are processed in the decode stage of the ARM pipeline.
o If the decode stage sees a coprocessor instruction, then it offers it to the relevant
coprocessor.
o If the coprocessor is not present or doesn’t recognize the instruction, then the
ARM takes an undefined instruction exception, which allows you to emulate the
behavior of the coprocessor in software.
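The offer-then-fallback behaviour described above can be sketched in Python. The coprocessor number 10 and the opcode names "fadd" and "fmul" are hypothetical placeholders, and the exception class stands in for the undefined instruction exception taken by the core.

```python
class UndefinedInstruction(Exception):
    """Stands in for the undefined instruction exception."""


def offer_to_coprocessor(coprocessors: dict, cp_number: int, opcode: str):
    """Offer `opcode` to a coprocessor; raise if it is absent or does not recognize it."""
    supported = coprocessors.get(cp_number)
    if supported is None or opcode not in supported:
        raise UndefinedInstruction(f"cp{cp_number}: {opcode}")
    return supported[opcode]()


def execute(coprocessors: dict, cp_number: int, opcode: str, software_emulation: dict):
    """Run in hardware when possible, otherwise emulate in the exception handler."""
    try:
        return offer_to_coprocessor(coprocessors, cp_number, opcode)
    except UndefinedInstruction:
        return software_emulation[opcode]()


hardware = {10: {"fadd": lambda: "hardware"}}
emulation = {"fadd": lambda: "software", "fmul": lambda: "software"}
print(execute(hardware, 10, "fadd", emulation))
print(execute(hardware, 10, "fmul", emulation))
```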
************