Chapter 1: Introduction

1.2.1 Sophistication
At first glance the peripheral set looks like that of a typical small microcontroller, featuring peripherals such
as dual ADC, general purpose timers, I2C, SPI, CAN, USB and a real-time clock. However, each of these peripherals
is very feature-rich. For example, the 12-bit ADC has an integral temperature sensor and multiple conversion modes,
and devices with dual ADC can slave both ADCs together in a further nine conversion modes. Similarly, each of the
four timers has four capture/compare units, and each timer block may be combined with the others to build
sophisticated timer arrays. An advanced timer has additional support for motor control, with six complementary
PWM outputs with programmable dead time and a break input line that will force the PWM signal to a
pre-programmed safe state. The SPI peripheral has a hardware CRC generator for 8- and 16-bit words to support
interfacing to SD and MMC cards.

Surprisingly for a small microcontroller, the STM32 also includes a DMA unit with up to 12 channels. Each
channel can be used to transfer data to and from any peripheral register or memory location as 8-, 16- or 32-bit
words. Each of the peripherals can be a DMA flow controller, sending or demanding data as required. An internal
bus arbiter and bus matrix minimise the arbitration between the CPU data accesses and the DMA channels. This
means that the DMA unit is flexible, easy to use and really automates data flow within the microcontroller.

In an effort to square the circle, the STM32 is a low-power as well as a high-performance microcontroller. It can
run from a 2V supply, and at 72MHz with everything switched on it consumes just 36mA. In combination with the
Cortex low power modes the STM32 has a standby power consumption of just 2µA. An internal 8MHz RC
oscillator allows the chip to quickly come out of low power modes while the external oscillator is still starting up.
This fast entry to and exit from low power modes further reduces overall power consumption.

1.2.2 Safety
As well as demanding more processing power and more sophisticated peripherals, many modern applications
have to operate in safety-critical environments. With this in mind, the STM32 has a number of hardware features
that help support high integrity applications. These include a low voltage detector, a clock security system
and two separate watchdogs. The first watchdog is a windowed watchdog. This watchdog must be refreshed in a
defined time frame. If you hit it too soon, or too late, the watchdog will trigger. The second watchdog is an
independent watchdog which has its own internal oscillator, separate from the main system clock. The clock
security system can detect failure of the main external oscillator and fail safely back onto an internal 8MHz RC
oscillator.
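
The "too soon or too late" rule of the windowed watchdog can be illustrated with a small host-side model. This is
only a sketch under stated assumptions: the wwdg_model type, the function names and the counter values are
hypothetical and do not correspond to the STM32 watchdog registers or to any ST library call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of a windowed watchdog (not real STM32 registers). */
typedef struct {
    uint32_t reload;   /* value the down-counter restarts from after a refresh */
    uint32_t window;   /* a refresh is only legal once the counter is <= window */
    uint32_t counter;  /* current down-counter value */
    bool     tripped;  /* set once the watchdog has triggered */
} wwdg_model;

void wwdg_init(wwdg_model *w, uint32_t reload, uint32_t window)
{
    w->reload  = reload;
    w->window  = window;
    w->counter = reload;
    w->tripped = false;
}

/* Advance time by one watchdog tick; reaching zero means the refresh came too late. */
void wwdg_tick(wwdg_model *w)
{
    if (w->tripped) {
        return;
    }
    if (w->counter > 0u) {
        w->counter--;
    }
    if (w->counter == 0u) {
        w->tripped = true;
    }
}

/* Refresh the watchdog; refreshing before the window opens also trips it. */
void wwdg_refresh(wwdg_model *w)
{
    if (w->tripped) {
        return;
    }
    if (w->counter > w->window) {
        w->tripped = true;        /* too early: the window is not yet open */
    } else {
        w->counter = w->reload;   /* inside the window: restart the countdown */
    }
}
```

In this model a refresh is accepted only while the counter is at or below the window value and above zero, so
both a runaway loop that refreshes constantly and a hung program that never refreshes end in a watchdog trigger.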

1.2.3 Security
One of the other unfortunate requirements of modern design is the need for code security to prevent software
piracy. Here the STM32 FLASH can be locked against FLASH READ accesses via the debug port. When READ
protection is enabled, the FLASH memory is also WRITE protected to prevent untrusted code from being inserted
into the interrupt vector table. Further WRITE protection can be enabled over the remainder of the FLASH memory.
The STM32 also has a real-time clock and a small area of battery backed SRAM. This region has an anti-tamper
input that can trigger an interrupt on a state change. In addition, an anti-tamper event will automatically clear the
contents of the battery backed SRAM.

1.2.4 Software Development


If you are already using an ARM-based microcontroller, the good news is that your development tools probably
already support the Thumb-2 instruction set and the Cortex family. The worst case is a software upgrade to get
the necessary support. ST also provide a peripheral driver library and a USB developer library, supplied as ANSI C
libraries and source code that are compatible with the earlier libraries published for their STR7 and STR9
microcontrollers. Ports of these libraries are already available for popular compiler tools. Similarly, many
open source and commercial RTOS and middleware packages (TCP/IP, file system etc.) are available for the Cortex
family. The Cortex-M3 also comes with a whole new debug system called CoreSight. Access to the CoreSight system is
through the Debug Access Port, which supports either a standard JTAG connection or a serial wire (2-pin)
interface. As well as providing debug run control, the CoreSight system on the STM32 provides a data watchpoint
and an instrumentation trace. The instrumentation trace can send selected application information up to the
debug tool. This can provide extended debug information and can also be used during software testing.

© Hitex (UK) Ltd. Page 7

1.2.5 The STM32 Family


The STM32 family has four distinct branches. Three of these are grouped as the “Performance Line”, “Access Line”
and “USB Access Line” devices. ST have also announced a fourth group of variants called the “Connectivity Line”.
In the user manual the Performance, USB Access and Access Lines are referred to as the High, Medium and Low
density devices. In the Performance, Access and USB Access Lines the peripherals embedded in the devices are
richer in the bigger memory devices than in the smaller memory devices. There are therefore three memory
ranges and corresponding peripheral sets. The Low density devices have Flash memory sizes from 16KB to 32KB
and have the smallest peripheral set. The Medium density devices have Flash memory sizes from 64KB to 128KB,
and the High density devices have Flash memory sizes from 256KB to 512KB and have the richest peripheral set.

The Access Line is the entry line for the STM32 family, with 36MHz operation and a simple peripheral set. The
Performance Line runs to 72MHz and features more peripherals. The USB Access Line adds a USB device
peripheral for cost-sensitive USB applications. ST have also announced a new branch of the STM32 family called
the “Connectivity Line”. This line brings advanced communications peripherals to the STM32, including a dual role
USB controller and an Ethernet MAC. The dual role USB controller can operate as both a device and a Host/OTG
controller. The Ethernet MAC also includes IEEE 1588 support for real-time Ethernet protocols.

Importantly, the package types and pin layouts are the same between all the different variants. This allows
different versions of the STM32 to be interchanged without having to re-spin the PCB, and with minimal software
effort.

Chapter 2: Cortex Overview


2. Cortex Overview
As we saw in the introduction, the Cortex processor is the next generation embedded core from ARM. It is
something of a departure from the earlier ARM CPUs in that it is a complete processor core, consisting of the
Cortex CPU and a surrounding set of system peripherals, providing the heart of an embedded system. As a
result of the wide variety of embedded systems, the Cortex processor is available in a number of application
profiles. These are denoted by the letter following the Cortex name. The three profiles are as follows:

Cortex-A Series, applications processors for complex OS and user applications.
Supports the ARM, Thumb and Thumb-2 instruction sets.

Cortex-R Series, real-time systems profile.
Supports the ARM, Thumb and Thumb-2 instruction sets.

Cortex-M Series, microcontroller profile optimized for cost-sensitive applications.
Supports the Thumb-2 instruction set only.

The number at the end of the Cortex name refers to the relative performance level, with 1 the lowest and 8 the
highest. Currently performance level 3 is the highest performance level available in the microcontroller profile. The
STM32 is based on the Cortex-M3 processor.

2.1 ARM Architectural Revision


ARM also, somewhat confusingly, denote each of their processors with an architectural revision (written
ARMv6, ARMv7 etc.). The Cortex-M3 has the architectural revision ARMv7-M.

The Cortex-M3 processor is based on the ARMv7 architecture and is capable of executing the
Thumb-2 instruction set.

Thus the documentation for the Cortex-M3 consists of the Cortex-M3 Technical Reference Manual and the
ARMv7-M Architecture Reference Manual. Both of these documents can be downloaded from the ARM website
at www.arm.com.

2.2 Cortex Processor And Cortex CPU


Throughout the remainder of this book, the terms Cortex processor and Cortex CPU will be used to distinguish
between the complete Cortex embedded core and the internal RISC CPU. In the next section we will look at the
key features of the Cortex CPU followed by the system peripherals in the Cortex processor.

2.3 Cortex CPU


At the heart of the Cortex processor is a 32-bit RISC CPU. This CPU has a simplified version of the ARM7/9
programmer’s model, but a richer instruction set with good integer maths support, better bit manipulation and
‘harder’ real-time performance.

2.3.1 Pipeline
The Cortex CPU can execute most instructions in a single cycle. Like the ARM7 and ARM9 CPUs this is achieved
with a three stage pipeline.

Like the ARM7 and ARM9 CPUs the Cortex-M3 has a three stage pipeline. However, the Cortex-M3
also has branch prediction to minimise the number of pipeline flushes.

Whilst one instruction is being executed, the next is being decoded and a third is being fetched from memory.
This works very well for linear code, but when a branch is encountered the pipeline must be flushed and refilled
before code can continue to execute. In the ARM7 and ARM9 CPUs branches are very expensive in terms of
code performance. In the Cortex CPU the three stage pipeline is enhanced with branch prediction. This means
that when a conditional branch instruction is reached, a speculative fetch is performed, so that both destinations
of the conditional instruction are available for execution without incurring a performance hit. The worst case is an
indirect branch where a speculative fetch cannot be made and the only course of action is to flush the pipeline.
While the pipeline is key to the overall performance of the Cortex CPU, no special considerations need to be
made in the application code.
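
The cost of a pipeline flush can be made concrete with a toy cycle count. The sketch below is an idealised model,
not measured Cortex-M3 timing: it assumes every instruction issues in one cycle and that each flush costs two
extra cycles to refill a three stage pipeline. The function name and constants are illustrative only.

```c
#include <stdint.h>

/* Toy model of an ideal three stage pipeline (fetch, decode, execute).
 * Assumptions, not Cortex-M3 data: one instruction issues per cycle, and
 * every flush costs FLUSH_PENALTY extra cycles while the pipeline refills. */
enum { PIPELINE_STAGES = 3, FLUSH_PENALTY = PIPELINE_STAGES - 1 };

uint32_t pipeline_cycles(uint32_t instructions, uint32_t flushes)
{
    if (instructions == 0u) {
        return 0u;
    }
    /* The pipeline must fill before the first instruction completes;
     * after that one instruction completes per cycle. */
    uint32_t fill = PIPELINE_STAGES - 1;
    return fill + instructions + flushes * FLUSH_PENALTY;
}
```

Under these assumptions 100 instructions of linear code take 102 cycles, while the same 100 instructions with ten
flushed branches take 122 cycles, which is why reducing flushes matters for branch-heavy code.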

2.3.2 Programmer’s Model


The Cortex CPU is a RISC processor which has a load and store architecture. In order to perform data processing
instructions, the operands must be loaded into a central register file, the data operation must be performed on
these registers and the results then saved back to the memory store.

The Cortex-M3 is a load and store architecture. All data has to be moved into a central register file before a
data processing instruction can act on it.

Consequently, all the program activity focuses around the CPU register file. This register file consists of sixteen
32-bit wide registers. Registers R0-R12 are simple registers that can be used to hold program variables.
Registers R13-R15 have special functions within the Cortex CPU. Register R13 is used as the stack pointer. This
register is banked, which allows the Cortex CPU to have two operating modes each with their own separate stack
space. This is typically used by an RTOS which can run its ‘system’ code in a protected mode. In the Cortex CPU
the two stacks are called the main stack and the process stack. The next register R14 is called the link register.
This register is used to store the return address when a call is made to a procedure. This allows the Cortex CPU
to make a fast entry and exit to a procedure. If your code calls several levels of subroutines, the compiler will
automatically store R14 on the stack. The final register R15 is the program counter; since this is part of the central
register file it can be read and manipulated like any other register.

The Cortex-M3 has a CPU register file of 16 32-bit wide registers. Like the
earlier ARM7/9 CPUs R13 is the stack pointer. R14 is the link register and
R15 is the PC. R13 is a banked register to allow the Cortex-M3 to operate
with two stacks: a process stack and a main stack.

2.3.2.1 XPSR
In addition to the register file there is a separate register called the Program Status Register. This is not part of
the main register file and is only accessible through two dedicated instructions. The xPSR contains a number of
fields that influence the execution of the Cortex CPU.

The Program Status Register contains status fields for instruction execution. This register is
aliased into the Application, Execution and Interrupt Status Registers.

The xPSR register can also be accessed through three special alias names that allow access to sub-ranges of
bits within the xPSR. The top five bits are the condition code flags and are aliased as the Application Program
Status Register. The first four condition code flags N, Z, C, V (Negative, Zero, Carry and Overflow) will be set and
cleared depending on the result of a data processing instruction. The Q bit is used by the DSP saturated maths
instructions to indicate that a variable has reached its maximum or minimum value. Like the ARM 32-bit
instruction set, certain Thumb-2 instructions are only executed if the instruction condition code matches the state
of the Application Program Status Register flags. If the instruction condition codes do not match, the instruction
passes through the pipeline as a NOP. This ensures that instructions flow smoothly through the pipeline and
minimises pipeline flushes. In the Cortex CPU, this technique is extended with the Execution Program Status
Register. This is an alias of bits 26 – 8 of the xPSR. It contains three fields: the “if then” field, the “interrupt
continuable instruction” field and the Thumb instruction field. The Thumb-2 instruction set has an efficient method
of executing small ‘if then’ blocks of instructions. When a conditional test is true, it can set a value in the IT field
that tells the CPU to execute up to four following instructions. If the conditional test fails, these instructions will
pass through the pipeline as NOPs. Thus a typical line of C would be coded as follows:

If (r0 == 0)
    CMP   r0, #0      ; compare r0 to 0
    ITT   EQ          ; if equal, execute the next two instructions
Then r0 = *r1 + 2;
    LDREQ r0, [r1]    ; load contents of memory location into r0
    ADDEQ r0, #2      ; add 2

While most Thumb-2 instructions execute in a single cycle, some (such as the load and store multiple instructions)
take multiple cycles. So that the Cortex CPU can have a deterministic interrupt response time, these instructions
must be interruptible. When an instruction is terminated early, the interrupt continuable instruction field stores the
number of the next register to be operated on in the load or store multiple instruction. Thus once the interrupt has
been serviced, the load/store multiple instruction can resume execution. The final Thumb field is inherited from
the earlier ARM CPUs. This field indicates if the ARM or Thumb instruction set is currently being executed by the
CPU. In the Cortex-M3 this bit is always set to one. Finally, the interrupt status field contains the number of the
exception that is currently being serviced.
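
As an illustration of how the xPSR fields sit within one 32-bit word, the following sketch decodes a raw xPSR
value on a host machine. The bit positions (N, Z, C, V, Q in bits 31 to 27, the Thumb bit in bit 24 and the
exception number in bits 8 to 0) follow the ARMv7-M register layout; the macro, type and function names are
made up for this example and are not part of any ARM or ST header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the ARMv7-M xPSR; names are illustrative only. */
#define XPSR_N_BIT     31u
#define XPSR_Z_BIT     30u
#define XPSR_C_BIT     29u
#define XPSR_V_BIT     28u
#define XPSR_Q_BIT     27u
#define XPSR_T_BIT     24u
#define XPSR_ISR_MASK  0x1FFu   /* interrupt status field, bits 8..0 */

typedef struct {
    bool     n, z, c, v, q;      /* Application PSR condition flags */
    bool     thumb;              /* Thumb field of the Execution PSR */
    uint32_t exception_number;   /* Interrupt PSR field */
} xpsr_fields;

/* Split a raw xPSR value into its individual fields. */
xpsr_fields xpsr_decode(uint32_t xpsr)
{
    xpsr_fields f;
    f.n = ((xpsr >> XPSR_N_BIT) & 1u) != 0u;
    f.z = ((xpsr >> XPSR_Z_BIT) & 1u) != 0u;
    f.c = ((xpsr >> XPSR_C_BIT) & 1u) != 0u;
    f.v = ((xpsr >> XPSR_V_BIT) & 1u) != 0u;
    f.q = ((xpsr >> XPSR_Q_BIT) & 1u) != 0u;
    f.thumb = ((xpsr >> XPSR_T_BIT) & 1u) != 0u;
    f.exception_number = xpsr & XPSR_ISR_MASK;
    return f;
}
```

For example, decoding 0x61000000 gives Z and C set, the Thumb bit set and an exception number of zero. On a real
target the raw value would be read with the MRS instruction rather than passed in as a constant.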

2.3.3 CPU Operating Modes


While the Cortex processor is designed to be a low gate count, fast and easy to use microcontroller core, it has
also been designed to support the use of a real-time operating system. The Cortex processor has two operating
modes: Thread mode and Handler mode. The CPU will run in Thread mode while it is executing in non-interrupt
background mode and will switch to Handler mode when it is executing exceptions. In addition, the Cortex
CPU can execute code in a privileged or unprivileged mode. In privileged mode, the CPU has access to the full
instruction set. In unprivileged mode certain instructions are disabled (such as the MRS and MSR instructions
which allow access to the xPSR and its aliases). Additionally, access to most registers in the Cortex processor
system control space is also disabled. Stack usage can also be configured. The main stack (R13) can be used by
both Thread and Handler mode. Alternatively, Thread mode can be configured to use the process stack (R13
banked register).

The Cortex-M3 can be used in a ‘flat’ simple mode. It is also designed to support real-time operating
systems. It has Handler and Thread modes that can be configured to use the main and process stacks
and have privileged access to the Cortex system control registers.

Out of reset the Cortex processor will run in a ‘flat’ configuration. Both Thread and Handler modes execute in
privileged mode, so there are no restrictions on access to any processor resources. Both the Thread and Handler
modes use the main stack. In order to start execution, the Cortex processor simply needs the reset vector and the
start address of the stack to be configured before you can start to execute your application C code. However, if
you are using an RTOS or are developing a safety-critical application, the chip can be used in a more advanced
configuration where Handler mode (exceptions and the RTOS) runs in privileged mode and uses the main stack
while application code runs in Thread mode with unprivileged access and uses the process stack. This way the
system code and the application code are partitioned and errors in the application code will not cause the RTOS
to crash.
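
The two configurations described above can be summarised as a small decision function. The sketch below is a
host-side model and assumes the ARMv7-M CONTROL register layout, in which bit 0 selects unprivileged Thread mode
and bit 1 selects the process stack for Thread mode. The enum, macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_THREAD, MODE_HANDLER } cpu_mode;
typedef enum { STACK_MAIN, STACK_PROCESS } stack_sel;

/* CONTROL register bits as defined by ARMv7-M; macro names are illustrative. */
#define CONTROL_NPRIV  (1u << 0)   /* 1 = Thread mode runs unprivileged */
#define CONTROL_SPSEL  (1u << 1)   /* 1 = Thread mode uses the process stack */

/* Handler mode is always privileged; Thread mode depends on CONTROL bit 0. */
bool is_privileged(cpu_mode mode, uint32_t control)
{
    if (mode == MODE_HANDLER) {
        return true;
    }
    return (control & CONTROL_NPRIV) == 0u;
}

/* Handler mode always uses the main stack; Thread mode depends on CONTROL bit 1. */
stack_sel active_stack(cpu_mode mode, uint32_t control)
{
    if (mode == MODE_HANDLER) {
        return STACK_MAIN;
    }
    return (control & CONTROL_SPSEL) ? STACK_PROCESS : STACK_MAIN;
}
```

With a CONTROL value of 0 (the state out of reset) both modes are privileged and share the main stack, which is
the ‘flat’ configuration. With both bits set, Thread mode becomes unprivileged and moves to the process stack
while Handler mode keeps privileged access and the main stack, which is the partitioned RTOS arrangement.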

2.3.4 Thumb-2 Instruction Set


The ARM7 and ARM9 CPUs can execute two instruction sets: the ARM 32-bit instruction set and the Thumb 16-bit
instruction set. This allows developers to optimise their program by selecting the instruction set used for
different procedures: 32-bit instructions for speed and 16-bit instructions for code compression. The Cortex CPU
is designed to execute the Thumb-2 instruction set, which is a blend of 16-bit and 32-bit instructions. The Thumb-2
instruction set gives a 26% code density improvement over the ARM 32-bit instruction set and a 25%
improvement in performance over the Thumb 16-bit instruction set. The Thumb-2 instruction set has some
improved multiply instructions which can execute in a single cycle and a hardware divide that takes between 2 – 7
cycles.

The Cortex processor benchmarks give a performance level of 1.2 DMIPS/MHz.

The Thumb-2 instruction set also has improved branching instructions (including test and compare), if/then
conditional execution blocks and, for data manipulation, byte ordering and byte and half word extraction
instructions. While still a RISC processor, the Cortex CPU also has a rich instruction set that is specifically
designed as a good target for a C compiler. A typical Cortex-M3 program will be written entirely in ANSI C, with
minimal non-ANSI keywords and only the exception vector table written in Assembler.
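
To show the kind of data manipulation meant here, the sketch below writes byte reordering and byte and half word
extraction in portable C. The function names are illustrative. A compiler targeting Thumb-2 may translate such
expressions into single instructions such as REV, UXTB or UXTH, but whether it does so depends on the compiler and
its optimisation settings, so this should be read as an example of the operations rather than of generated code.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word (big-endian <-> little-endian). */
uint32_t byte_reverse32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Extract byte number 'index' (0 = least significant, valid range 0..3). */
uint32_t extract_byte(uint32_t x, unsigned index)
{
    return (x >> (8u * index)) & 0xFFu;
}

/* Extract half word number 'index' (0 = least significant, valid range 0..1). */
uint32_t extract_halfword(uint32_t x, unsigned index)
{
    return (x >> (16u * index)) & 0xFFFFu;
}
```

For instance, byte_reverse32(0x12345678) yields 0x78563412, and extract_halfword(0x12345678, 1) yields 0x1234.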

2.3.5 Memory Map


The Cortex-M3 processor is a standardised microcontroller core and as such has a well-defined memory map.
Despite the multiple internal busses this memory map is a linear 4 Gbyte address space.

The Cortex-M3 defines a fixed 4 Gbyte memory map that specifies regions for code, SRAM,
peripherals, external memory and devices and the Cortex system registers. This memory
map is common to all Cortex-based devices.

The first 1 Gbyte of memory is split evenly between a code region and an SRAM region. The code space is
optimised to be executed from the I-Code bus, with data in the code region reached through the D-Code bus. The
SRAM region is reached over the system bus. Although code can be loaded and executed from the SRAM, the
instructions would be fetched using the system bus, which incurs an extra wait state. It is therefore likely that
code would run slower from SRAM than from on-chip FLASH memory located in the code region. The next 0.5 Gbyte of
memory is the on-chip peripheral region. All user peripherals provided by the microcontroller vendor will be
located in this region. The first 1 Mbyte of both the SRAM and Peripheral regions is bit-addressable using a
technique called bit banding. Since all the SRAM and all the user peripherals on the STM32 are located in these
regions, all the memory locations of the STM32 can be manipulated in a word-wide or bitwise fashion. The next
2 Gbyte address space is allocated to external memory-mapped SRAM and peripherals. The final 0.5 Gbyte is
allocated to the internal Cortex processor peripherals and a region for future vendor specific enhancements to
the Cortex processor. All of the Cortex processor registers are at fixed locations for all Cortex-based
microcontrollers. This allows code to be more easily ported between different STM32 variants and indeed other
vendors’ Cortex-based microcontrollers. One processor to learn, one set of tools to invest in and large amounts
of reusable code across a wide range of microcontrollers.
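
Bit banding works by giving every bit in the first 1 Mbyte of the SRAM and peripheral regions its own word
address in a separate alias region. The sketch below computes that alias address on a host machine. The region
and alias base addresses are those defined by the Cortex-M3 memory map; the macro and function names are
assumptions made for this example, and returning 0 for an invalid request is simply a convention chosen here.

```c
#include <stdint.h>

/* Bit-band regions and their alias regions in the Cortex-M3 memory map. */
#define SRAM_BITBAND_BASE    0x20000000u
#define SRAM_ALIAS_BASE      0x22000000u
#define PERIPH_BITBAND_BASE  0x40000000u
#define PERIPH_ALIAS_BASE    0x42000000u
#define BITBAND_REGION_SIZE  0x00100000u   /* first 1 Mbyte of each region */

/* Return the alias word address for one bit of a byte in a bit-band region.
 * Returns 0 if the address is outside both regions or the bit number is > 7. */
uint32_t bitband_alias(uint32_t byte_addr, uint32_t bit)
{
    uint32_t base;
    uint32_t alias;

    if (bit > 7u) {
        return 0u;
    }
    if (byte_addr >= SRAM_BITBAND_BASE &&
        byte_addr - SRAM_BITBAND_BASE < BITBAND_REGION_SIZE) {
        base  = SRAM_BITBAND_BASE;
        alias = SRAM_ALIAS_BASE;
    } else if (byte_addr >= PERIPH_BITBAND_BASE &&
               byte_addr - PERIPH_BITBAND_BASE < BITBAND_REGION_SIZE) {
        base  = PERIPH_BITBAND_BASE;
        alias = PERIPH_ALIAS_BASE;
    } else {
        return 0u;
    }
    /* Each byte occupies 32 alias bytes (8 bits x 4-byte word per bit). */
    return alias + (byte_addr - base) * 32u + bit * 4u;
}
```

For example, bit 7 of the byte at 0x20000300 maps to the alias word at 0x2200601C. On a target device, writing 1
or 0 to that word through a volatile 32-bit pointer would set or clear the single bit without a read-modify-write
sequence in software.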
