
BCS402 Microcontrollers

MODULE 1
Difference between Microprocessor and Microcontroller

THE RISC DESIGN PHILOSOPHY


Q. Explain briefly the RISC design philosophy.
Answer:
 RISC design philosophy is
o Aimed at simple but powerful instructions that execute within a single cycle at a
high clock speed.


o Concentrates on reducing the complexity of instructions performed by the hardware.
o Provides greater flexibility and intelligence in software rather than hardware.
 The RISC philosophy is implemented with four major design rules:
o Instructions: RISC has a reduced number of instruction classes. These classes
provide simple operations so that each is executed in a single cycle. Each
instruction is a fixed length to allow the pipeline to fetch future instructions
before decoding the current instruction.
o Pipeline: The processing of instructions is broken down into smaller units that
can be executed in parallel by pipelines.
o Register: RISC machines have a large general-purpose register set. Any register
can contain either data or an address.
o Load-store architecture: The processor operates on the data held in registers.
Separate load and store instructions transfer data between the register bank and
external memory.
 These design rules allow a RISC processor to be simpler, and thus the core can operate at
higher clock speed.
 Figure below shows the major difference between CISC and RISC processors: CISC emphasizes hardware complexity, whereas RISC emphasizes compiler complexity.
[Figure: CISC vs. RISC. In a CISC design the greater complexity sits in the processor; in a RISC design the greater complexity sits in the compiler and code generation, with a simpler processor underneath.]

Difference between RISC and CISC

RISC | CISC
Emphasizes compiler complexity | Emphasizes processor complexity
Simple but powerful instructions | More complicated instructions
Executes an instruction in a single cycle | Takes many cycles to execute an instruction
Instructions are of fixed length | Instructions are of variable length
Large set of general-purpose registers | Limited set of general-purpose registers
Any register can contain either data or an address | Dedicated registers for specific purposes
Separate load and store instructions transfer data between the registers and external memory | MOV instructions can be used to transfer data between a register and memory

THE ARM DESIGN PHILOSOPHY


Q Explain briefly the ARM design philosophy.
Write the physical features of ARM processor.
Answer:
The main points of the ARM design philosophy are:
 The ARM processor has been designed to be small in order to reduce power consumption and extend battery life, which is essential for applications such as mobile phones and personal digital assistants.
 High code density is a major requirement, since embedded systems have limited memory due to cost and physical size restrictions. High code density is useful for applications that have limited on-board memory, such as mobile phones.
 Embedded systems are price sensitive and use low cost memory devices.
 Another requirement is to reduce the area of the die taken up by the embedded processor.
For a single-chip solution, the smaller the area used by the embedded processor, the more
available space for specialized peripherals.
 ARM has incorporated hardware debug technology within the processor so that software engineers can view what is happening while the processor is executing code.

Instruction Set For Embedded Systems

Q. What are the salient features of the ARM instruction set that make it suitable for embedded applications?

Answer:
The following features make the ARM instruction set suitable for embedded applications:
 Variable cycle execution for certain instructions—Not every ARM instruction executes in a
single cycle. For example, load-store-multiple instructions vary in the number of execution cycles
depending upon the number of registers being transferred.
 Inline barrel shifter leading to more complex instructions—The inline barrel shifter is a
hardware component that preprocesses one of the input registers before it is used by an
instruction. This expands the capability of many instructions to improve core performance and
code density.


 Thumb 16-bit instruction set—ARM enhanced the processor core by adding a second 16-bit
instruction set called Thumb that permits the ARM core to execute either 16- or 32-bit
instructions.
 Conditional execution— An instruction is only executed when a specific condition has been
satisfied. This feature improves performance and code density by reducing branch instructions.
 Enhanced instructions—The enhanced digital signal processor (DSP) instructions were added to
the standard ARM instruction set to support fast 16×16-bit multiplier operations.

EMBEDDED SYSTEM HARDWARE


Q. With a neat diagram explain the ARM based embedded device microcontroller.

Or
With a neat diagram explain the different hardware components of an embedded device based on ARM
core.
Answer: Figure shown below shows a typical embedded device based on ARM core. Each box represents
a feature or function.

[Figure: An ARM-based embedded device. The ARM processor, memory controller, interrupt controller and AHB arbiter sit on the AHB bus together with ROM, flash ROM, SRAM and DRAM; an AHB-external bridge connects to the external bus, and an AHB-APB bridge connects slower peripherals such as Ethernet, the real-time clock, counter/timers, the console and serial UARTs.]

 ARM processor based embedded system hardware can be separated into the following four main
hardware components:
o The ARM processor: The ARM processor controls the embedded device. Different versions of the ARM processor are available to suit the desired operating characteristics.
o Controllers: Controllers coordinate important blocks of the system. Two commonly
found controllers are memory controller and interrupt controller.
o Peripherals: The peripherals provide all the input-output capability external to the chip and are responsible for the uniqueness of the embedded device.
o Bus: A bus is used to communicate between different parts of the device.
 ARM Bus Technology
o Embedded devices use an on-chip bus that is internal to the chip and that allows different
peripheral devices to be interconnected with an ARM core.
o There are two different classes of devices attached to the bus.
 The ARM processor core is a bus master—a logical device capable of initiating
a data transfer with another device across the same bus.



 Peripherals tend to be bus slaves—logical devices capable only of responding to a transfer request from a bus master device.
 AMBA Bus Protocol
o The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and
has been widely adopted as the on-chip bus architecture used for ARM processors.
o The first AMBA buses introduced were the ARM System Bus (ASB) and the ARM
Peripheral Bus (APB).
o Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB).
o AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
 MEMORY
o An embedded system has to have some form of memory to store and execute code.
o Figure below shows the memory trade-offs: the fastest memory cache is physically
located nearer the ARM processor core and the slowest secondary memory is set further
away.
o Generally the closer memory is to the processor core, the more it costs and the smaller its
capacity.

 PERIPHERALS
o Embedded systems that interact with the outside world need some form of peripheral
device.
o Controllers are specialized peripherals that implement higher levels of functionality
within the embedded system.
o Memory controller: Memory controllers connect different types of memory to the
processor bus.
o Interrupt controller: An interrupt controller provides a programmable governing policy
that allows software to determine which peripheral or device can interrupt the processor
at any specific time.

Q. Explain the AMBA bus protocol.


 The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and
has been widely adopted as the on-chip bus architecture used for ARM processors.
 The first AMBA buses introduced were the ARM System Bus (ASB) and the ARM
Peripheral Bus (APB).
 Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB).
 Using AMBA, peripheral designers can reuse the same design on multiple projects, because a large number of peripherals have been developed with an AMBA interface.


 A peripheral can simply be bolted onto the on-chip bus without having to redesign an interface for each different processor architecture.
 This plug-and-play interface for hardware developers improves availability and time to
market.
 AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
 This change allows the AHB bus to run at higher clock speeds and to be the first ARM
bus to support widths of 64 and 128 bits.
 ARM has introduced two variations on the AHB bus: Multi-layer AHB and AHB-Lite.
 In contrast to the original AHB, which allows a single bus master to be active on the bus
at any time, the Multi-layer AHB bus allows multiple active bus masters.
 AHB-Lite is a subset of the AHB bus and it is limited to a single bus master. This bus
was developed for designs that do not require the full features of the standard AHB bus.

Embedded System Software


Q. Explain briefly the ARM processor based embedded system software.
OR Explain the structure of ARM cross development tool kit.

Answer:

 An embedded system requires software to drive it. Figure below shows typical software
components required to control an embedded device.
 Each software components in the stack uses a higher level of abstraction to separate the code
from the hardware device.

[Figure: typical software stack. Applications sit on top of the operating system; below that are the initialization (boot) code and device drivers; at the bottom is the hardware device.]

Initialization (BOOT) code:


⚫ Initialization code (or boot code) takes the processor from the reset state to a state where the
operating system can run.
⚫ First code executed on the board and is specific to a particular target or group of targets.
⚫ Handles a number of administrative tasks prior to handing control over to an operating system.
⚫ We can group these different tasks into three phases: initial hardware configuration, diagnostics
and booting.
⚫ Initial hardware configuration involves setting up the target platform so it can boot an
image.
⚫ Diagnostics: The primary purpose of diagnostic code is fault identification and isolation.
⚫ Booting: involves loading an image and handing control over to it. Loading an image involves copying an entire program, including code and data, into RAM.
The operating system
 An operating system organizes the system resources: the peripherals, memory and processing
time.


 ARM processors support over 50 operating systems.


 We can divide operating systems into two main categories: real time operating systems (RTOSs)
and platform operating systems.
 RTOSs provide guaranteed response times to events. Systems running an RTOS generally do not
have secondary storage.
 Platform operating systems require a memory management unit to manage large, non-real-time applications and tend to have secondary storage.

The device drivers:


 Device drivers are the third component that provides a consistent software interface to the
peripherals on the hardware device.
Applications:
 Finally, an application performs one of the tasks required for a device. For example, a mobile phone might have a diary application.
 There may be multiple applications running on the same device, controlled by the operating
systems.
 An embedded system can have one active application or several applications running
simultaneously.
 The software components can run from ROM or RAM. ROM code that is fixed on the device is
called firmware, for example the initialization code.

ARM core data flow model

Q. Explain ARM core data flow model with neat diagram.

Figure1: ARM core dataflow model

 An ARM core can be viewed as functional units connected by data buses, as shown in Figure 1, where the arrows represent the flow of data, the lines represent the buses, and the boxes represent either an operation unit or a storage area.
 The instruction decoder translates instructions before they are executed.


 The ARM processor, like all RISC processors, uses a load - store architecture.
 Load instructions copy data from memory to registers, and conversely the store instructions
copy data from registers to memory.
 There are no data processing instructions that directly manipulate data in memory.
 ARM instructions typically have two source registers, Rn and Rm, and a single destination
register, Rd. Source operands are read from the register file using the internal buses A and B,
respectively.
 The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn
and Rm from the A and B buses and computes a result.
 Data processing instructions write the result in Rd directly to the register file.
 Load and store instructions use the ALU to generate an address to be held in the address register
and broadcast on the Address bus.
 One important feature of the ARM is that register Rm alternatively can be preprocessed in the
barrel shifter before it enters the ALU.
 After passing through the functional units, the result in Rd is written back to the register file using
the Result bus.
 For load and store instructions the incrementer updates the address register before the core reads
or writes the next register value from or to the next sequential memory location.

REGISTERS
Q5. Explain briefly the active registers available in user mode.
OR
With a neat diagram explain the different general purpose registers of ARM processors.
Answer: Figure shown below shows the active registers available in user mode. All the registers shown
are 32 bits in size.
[Figure: registers available in user mode — r0 to r12 (general purpose), r13 (sp), r14 (lr), r15 (pc) and the cpsr; there is no spsr in user mode.]
 There are up to 18 active registers: 16 data registers and 2 processor status registers. The data
registers are visible to the programmer as r0 to r15.
 The ARM processor has three registers assigned to a particular task: r13, r14 and r15.
 Register r13: Register r13 is traditionally used as the stack pointer (sp) and stores the head of the
stack in the current processor mode.


 Register r14: Register r14 is called the link register (lr) and is where the core puts the return
address whenever it calls a subroutine.
 Register r15: Register r15 is the program counter (pc) and contains the address of the next
instruction to be fetched by the processor.
 In addition to the 16 data registers, there are two program status registers: current program status
register (cpsr) and saved program status register (spsr).

CPSR (Current Program Status Register)


Q6. Explain the various fields in current program status register (CPSR) with neat diagram.

Answer: Figure below shows the basic layout of a generic program status register.

[Figure: generic program status register layout. The four fields are flags, status, extension and control. Bits 31-28 hold the condition flags N, Z, C and V; bit 7 is the I interrupt mask, bit 6 is the F interrupt mask, bit 5 is the T (Thumb state) bit, and bits 4-0 hold the processor mode.]
 The cpsr is divided into four fields, each 8 bits wide: flags, status, extension and control.
 In current designs the extension and status fields are reserved for future use.
 The control field contains the processor mode, state and interrupts mask bits.
 The flag field contains the condition flags.
 The following table gives the bit patterns that represent each of the processor modes in the cpsr.

Mode Mode[4:0]
Abort 10111
Fast interrupt request 10001
Interrupt request 10010
Supervisor 10011
System 11111
Undefined 11011
User 10000

 When cpsr bit 5, T=1, then the processor is in Thumb state. When T=0, the processor is in ARM
state.
 The cpsr has two interrupt mask bits, 7 and 6 (I and F), which control the masking of the interrupt request (IRQ) and fast interrupt request (FIQ).
 Condition flags are updated by comparisons and the result of ALU operations that specify the
S instruction suffix.


 For example, if SUBS subtract instruction results in a register value of zero, then the Z flag in
the cpsr is set.

 The following table shows the conditional flags:

Flag Flag Name Set when


N Negative Bit 31 of the result is a binary 1
Z Zero The result is zero, frequently used to indicate equality
C Carry The result causes an unsigned carry
V Overflow The result causes a signed overflow

Processor Mode

Q7. Explain the various modes of operation of ARM processor.

Answer:

 Each processor mode is either privileged or nonprivileged.


 A privileged mode allows read-write access to the cpsr.
 A nonprivileged mode only allows read access to the control field in the cpsr but allows read-
write access to the conditional flags.
 There are seven processor modes : six privileged modes and one nonprivileged mode.
 The privilege modes are abort, fast interrupt request , interrupt request, supervisor, system and
undefined. The nonprivileged mode is user.
1. The processor enters abort mode when there is a failed attempt to access memory.
2. Fast interrupt request and interrupt request modes correspond to the two interrupt levels
available on the ARM processor.
3. Supervisor mode is the mode that the processor is in after reset and is generally the mode
that an operating system kernel operates in.
4. System mode is a special version of user mode that allows full read-write access to the cpsr.
5. Undefined mode is used when the processor encounters an instruction that is undefined or not supported by the implementation. User mode is used for programs and applications.
Banked Registers
Q8. Explain the programmer’s model of the ARM processor with the complete register set available.
OR
What are banked registers? Show how the banked registers are utilized when the user mode changes to
IRQ mode.
Answer:
 Figure below shows all 37 registers in the register file.
 Of these, 20 registers are hidden from a program at different times. These registers are called
banked registers.
 They are available only when the processor is in a particular mode, for example, abort mode has
banked registers r13_abt, r14_abt and spsr_abt.
 Banked registers of a particular mode are denoted by an underscore followed by the mode mnemonic, for example r13_irq.


Figure 1: Complete ARM register set


 Every processor mode except user mode can change mode by writing directly to the mode bits of
the cpsr.
 All privileged modes except system mode have a set of associated banked registers that are subset
of the main 16 registers.
 If the processor mode is changed, a banked register from the new mode will replace an existing
register.
 The processor mode can be changed by a program that writes directly to the cpsr when the
processor core is in privilege mode.
 The following exceptions and interrupts cause a mode change: reset, interrupt request, fast interrupt request, software interrupt, data abort, prefetch abort and undefined instruction.
 Exceptions and interrupts suspend the normal execution of sequential instructions and jump to a
specific location.
 Figure 2 below illustrates what happens when an interrupt forces a mode change.
 The figure 2 shows the core changing from user mode to interrupt request mode, which happens
when an interrupt request occurs due to an external device raising an interrupt to the processor
core. This change causes user registers r13 and r14 to be banked.


Figure 2: changing mode on an exception


 The user registers are replaced with registers r13_irq and r14_irq respectively.
 r14_irq contains the return address and r13_irq contains the stack pointer for interrupt request
mode.
 The saved program status register (spsr) stores the cpsr of the previous mode.

PIPELINE
Q9. With neat diagram explain the various blocks in a 3 stage pipeline of ARM processor
organization.
OR
Explain ARM pipeline with 3,5,6 stages.
Answer:
 Pipelining is the mechanism used to speed up execution by fetching the next instruction while other instructions are being decoded and executed.
 Figure 1 shows the ARM7 three-stage pipeline.

Fetch Decode Execute


Figure 1: ARM7 Three-stage pipeline
 Fetch loads an instruction from memory.
 Decode identifies the instruction to be executed.
 Execute processes the instruction and writes the result back to a register.
 Figure 2 illustrates the pipeline using a simple example. It shows a sequence of three instructions
being fetched, decoded and executed by the processor.


 Each instruction takes a single cycle to complete after the pipeline is filled.
o In the first cycle, the core fetches the ADD instruction from the memory.
o In the second cycle, the core fetches the SUB instruction and decode the ADD
instruction.
o In the third cycle, the core fetches CMP instruction from the memory, decode the SUB
instruction and execute the ADD instruction.
o The ADD instruction is executed, the SUB instruction is decoded, and the CMP
instruction is fetched. This procedure is called filling the pipeline.

          Fetch    Decode    Execute
Cycle 1   ADD
Cycle 2   SUB      ADD
Cycle 3   CMP      SUB       ADD
(time increases down the table)

 The pipeline design for each ARM family differs. For example, the ARM9 core increases the
pipeline length to five stages as shown in the figure below.

Fetch Decode Execute Memory Write

 The ARM10 increases the pipeline length still further by adding a sixth stage as shown in the
figure below.

Fetch Issue Decode Execute Memory Write

 As the pipeline length increases the amount of work done at each stage is reduced, which allows
the processor to attain a higher operating frequency. This in turn increases the performance.
 Pipeline Executing Characteristics
a. The ARM pipeline has not processed an instruction until it passes completely through the
execute stage. For example, an ARM7 pipeline (with three stages) has executed an instruction
only when the fourth instruction is fetched. Figure below shows an instruction sequence on an
ARM7 pipeline.


Figure 1: ARM instruction sequence


b. In the execute stage, the pc always points to the address of the instruction being executed plus 8 bytes; in other words, the pc points two instructions ahead of the instruction being executed, as shown in figure 2 below.

Figure 2: Example: pc = address + 8
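Since the figure itself is not reproduced here, a minimal sketch of the same idea (the addresses are assumed for illustration):

0x8000  MOV r0, pc   ; r0 = 0x8008, the address of this instruction plus 8
0x8004  NOP
0x8008  ...          ; the pc reads as this address, two instructions ahead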


c. The execution of a branch instruction or branching by the direct modification of the pc
causes the ARM core to flush its pipeline.
d. ARM10 uses branch prediction, which reduces the effect of a pipeline flush by predicting
possible branches and loading the new branch address prior to the execution of the
instruction.
e. An instruction in the execute stage will complete even though an interrupt has been raised.

Exceptions, Interrupts, and the Vector Table

Q10. Explain briefly the interrupt and the vector table.


Answer:
 When an exception or interrupt occurs, the processor sets the program counter (pc) to a specific
memory address.
 The address is within a specified address range called the vector table.
 The entries in the vector table are the instructions that branch to specific routines designed to
handle particular exception or interrupt.
 The memory map address 0x00000000 is reserved for the vector table, a set of 32-bit words.
 On some processors, the vector table can optionally be located at a higher address in memory, starting at 0xffff0000.
 When an exception or interrupt occurs, the processor suspends normal execution and starts
loading instructions from the exception vector table.


 Each vector table entry contains a form of branch instruction pointing to the start of a specific routine.
 Following is the vector table:
Exception/Interrupt Shorthand Address High address
Reset RESET 0x00000000 0xffff0000
Undefined instruction UNDEF 0x00000004 0xffff0004
Software interrupt SWI 0x00000008 0xffff0008
Prefetch abort PABT 0x0000000c 0xffff000c
Data abort DABT 0x00000010 0xffff0010
Reserved --- 0x00000014 0xffff0014
Interrupt request IRQ 0x00000018 0xffff0018
Fast interrupt request FIQ 0x0000001c 0xffff001c

 Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
 Undefined instruction vector is used when the processor cannot decode the instruction.
 Software interrupt vector is called when SWI instruction is executed. The SWI is frequently
used as the mechanism to invoke an operating system routine.
 Prefetch abort vector occurs when the processor attempts to fetch an instruction from an address
without the correct access permissions.
 Data abort vectors is similar to a prefetch abort but is raised when an instruction attempts to
access data memory without the correct access permissions.
 Interrupt request vector is used by external hardware to interrupt the normal execution flow of
the processor.
 Fast interrupt request vector is similar to the interrupt request but is reserved for hardware
requiring faster response times.
Core Extensions
Q11. Discuss the following with neat diagrams
a. Von Neumann architecture with cache
b. Harvard architecture with TCM
OR
Discuss all 3 core extensions.
Answer:
Three core extensions wrap around the ARM processor: cache and tightly coupled memory, memory management, and the coprocessor interface.
1. Cache and tightly coupled memory: The cache is a block of fast memory placed between
main memory and the core. With a cache the processor core can run for the majority of the time
without having to wait for data from slow external memory.
o ARM has two forms of cache. The first is found attached to Von Neumann-style cores. It combines both data and instructions in a single unified cache, as shown in figure 1 below.


Figure 1: A simplified Von Neumann architecture with cache.

o The second form, attached to Harvard-style cores, has separate caches for data and instructions, as shown in figure 2.

Figure 2: A simplified Harvard architecture with TCMs.

o A cache provides an overall increase in performance but will not give predictable execution.
o But for real-time systems it is paramount that code execution is deterministic.
o This is achieved using a form of memory called tightly coupled memory (TCM).
o TCM is fast SRAM located close to the core and guarantees the clock cycles required to
fetch instructions or data.
o By combining both technologies, ARM processors can achieve both improved performance and predictable real-time response. Figure 3 shows an example of a core with a combination of caches and TCMs.


Figure 3: combining both technologies

2. Memory management:
 Embedded systems often use multiple memory devices. It is usually necessary to have a method to help organize these devices and to protect the system from applications trying to make inappropriate accesses to the hardware.
 This is achieved with the assistance of memory management hardware.
 ARM cores have three different types of memory management hardware- no extensions provide
no protection, a memory protection unit (MPU) providing limited protection and a memory
management unit (MMU) providing full protection.
o Nonprotected memory is fixed and provides very little flexibility. It is normally used for small, simple embedded systems that require no protection from rogue applications.
o Memory protection unit (MPU) employs a simple system that uses a limited number of
memory regions. These regions are controlled with a set of special coprocessor registers,
and each region is defined with specific access permission but don’t have a complex
memory map.
o The memory management unit (MMU) is the most comprehensive memory management hardware available on the ARM. The MMU uses a set of translation tables to provide fine-grained control over memory.
 These tables are stored in main memory and provide virtual to physical address
map as well as access permission. MMU designed for more sophisticated system
that supports multitasking.

Q. Briefly explain how coprocessors can be attached to the ARM processor.

3. Coprocessors:
 A coprocessor extends the processing features of a core by extending the instruction set or by
providing configuration registers.
 More than one coprocessor can be added to the ARM core via the coprocessor interface.


 The coprocessor can be accessed through a group of dedicated ARM instructions that provide a
load-store type interface.
 The coprocessor can also extend the instruction set by providing specialized instructions that can be added to the standard ARM instruction set to process vector floating-point (VFP) operations.
 These new instructions are processed in the decode stage of the ARM pipeline. If the decode
stage sees a coprocessor instruction, then it offers it to the relevant coprocessor.
 But, if the coprocessor is not present or doesn’t recognize the instruction, then the ARM takes an
undefined instruction exception.


MODULE 2
Data Processing Instructions
 The data processing instructions manipulate data within registers. They are move
instructions, arithmetic instructions, logical instructions, compare instructions and
multiply instructions.
 Most data processing instructions can process one of their operands using the barrel
shifter.
 If S is suffixed on a data processing instruction, then it updates the flags in the cpsr.
MOVE INSTRUCTIONS:
 It copies N into a destination register Rd, where N is a register or immediate value. This
instruction is useful for setting initial values and transferring data between registers.

Syntax: <instruction> {<cond>} {S} Rd, N

 In the example shown below, the MOV instruction takes the contents of register r5
and copies them into register r7.
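The example itself is not reproduced in these notes; a minimal sketch in the PRE/POST style used below (the initial register values are assumed):

PRE  r5 = 5
     r7 = 8
MOV  r7, r5      ; copy the contents of r5 into r7
POST r5 = 5
     r7 = 5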

USING BARREL SHIFTER WITH DATA TRANSFER INSTRUCTION:


 Data processing instructions are processed within the arithmetic and logic unit (ALU).
 A unique and powerful feature of the ARM processor is the ability to shift the 32-bit
binary pattern in one of the source registers left or right by a specific number of positions
before it enters the ALU.
 This shift increases the power and flexibility of many data processing operations.
 For example, we apply a logical shift left (LSL) to register Rm before moving it to the
destination register.
PRE r5=5
r7=8
MOV r7, r5, LSL #2
POST r5=5
r7=20
 The above example shifts r5 = 5 (00000101 in binary) left by two bits, giving r7 = 20 (00010100 in binary).


Figure: Barrel shifter and ALU


 Following table shows barrel shifter operation
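The table is not reproduced here; as a reminder, a hedged sketch of the shift operations the barrel shifter supports, written as MOV instructions:

MOV r0, r1, LSL #2    ; logical shift left: r0 = r1 shifted left by 2, zero-filled
MOV r0, r1, LSR #2    ; logical shift right: shifted right by 2, zero-filled
MOV r0, r1, ASR #2    ; arithmetic shift right: the sign bit is replicated
MOV r0, r1, ROR #2    ; rotate right by 2 bits
MOV r0, r1, RRX       ; rotate right extended: 33-bit rotate through the carry flag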


ARITHMETIC INSTRUCTIONS:
 The arithmetic instructions implement addition and subtraction of 32-bit signed
and unsigned values.
Syntax: <instruction>{<cond>} {S} Rd, Rn, N

 In the following example, the subtract instruction subtracts the value stored in register r2 from the value stored in register r1. The result is stored in register r0.
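The example is missing from these notes; a minimal sketch with assumed register values:

PRE  r0 = 0x00000000
     r1 = 0x00000002
     r2 = 0x00000001
SUB  r0, r1, r2       ; r0 = r1 - r2
POST r0 = 0x00000001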

 In the following example, the reverse subtract instruction (RSB) subtracts r1 from the constant value #0, writing the result to r0.
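Again the example is not reproduced; a sketch with an assumed value in r1:

PRE  r1 = 0x00000077
RSB  r0, r1, #0       ; r0 = 0x0 - r1
POST r0 = -r1 = 0xffffff89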

USING THE BARREL SHIFTER WITH ARITHMETIC INSTRUCTIONS:


 Example below illustrates the use of the inline barrel shifter with an arithmetic
instruction. The instruction multiplies the value stored in register r1 by three.
 Register r1 is first shifted one location to the left to give the value of twice r1. The ADD
instruction then adds the result of the barrel shift operation to register r1. The final result
transferred into register r0 is equal to three times the value stored in register r1.
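A sketch of the instruction being described (register values assumed):

PRE  r1 = 0x00000005
ADD  r0, r1, r1, LSL #1   ; r0 = r1 + (r1 << 1) = 3 * r1
POST r0 = 0x0000000f
     r1 = 0x00000005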


LOGICAL INSTRUCTIONS:
 Logical instructions perform bitwise operations on the two source registers.
Syntax: <instruction> {<cond>} {S} Rd, Rn, N

 In the example shown below, a logical OR operation is performed between registers r1 and r2, and the result is placed in r0.
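The missing example, sketched with assumed register values:

PRE  r0 = 0x00000000
     r1 = 0x02040608
     r2 = 0x10305070
ORR  r0, r1, r2       ; r0 = r1 OR r2
POST r0 = 0x12345678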

COMPARISON INSTRUCTIONS:
 The comparison instructions are used to compare or test a register with a 32-bit value.
They update the cpsr flag bits according to the result, but do not affect other registers.
 After the bits have been set, the information can be used to change program flow by
using conditional execution.
Syntax: <instruction> {<cond>} Rn, N

 In the CMP example shown below, both r0 and r1 are equal before the execution of the instruction. The value of the z flag prior to execution is 0, and after execution the z flag changes to 1 (shown as an upper-case Z).
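A sketch of the example being described (flag notation follows the lower-case/upper-case convention used in the text):

PRE  cpsr = nzcv
     r0 = 4
     r1 = 4
CMP  r0, r1           ; computes r0 - r1 and discards the result
POST cpsr = nZcv      ; the result is zero, so the Z flag is set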


The CMP is effectively a subtract instruction with the result discarded; similarly, the TST instruction is a logical AND operation and TEQ is a logical XOR operation. For each, the results are discarded but the condition bits are updated in the cpsr.
MULTIPLY INSTRUCTIONS:
 The multiply instructions multiply the contents of a pair of registers and depending upon
the instruction, accumulate the results in another register.
 The long multiplies accumulate onto a pair of registers representing a 64-bit value.
Syntax: MLA {<cond>} {S} Rd, Rm, Rs, Rn
MUL {<cond>} {S} Rd, Rm, Rs

Syntax: <instruction> {<cond>} {S} RdLo, RdHi, Rm, Rs

 The example below shows a multiply instruction that multiplies registers r1 and r2 and places the result into register r0.
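The missing example, sketched with assumed values:

PRE  r0 = 0x00000000
     r1 = 0x00000002
     r2 = 0x00000002
MUL  r0, r1, r2       ; r0 = r1 * r2
POST r0 = 0x00000004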


 The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-
bit result.

BRANCH INSTRUCTIONS
Q2. Explain briefly branch instructions of ARM processor.
Answer:
 A branch instruction changes the flow of execution or is used to call a routine.
 This type of instruction allows programs to have subroutines, if-then-else structures, and
loops.
 The change of execution flow forces the program counter (pc) to point to a new address.
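The syntax summary is not reproduced here; a hedged sketch of the usual forms, following the Syntax: convention used elsewhere in these notes:

Syntax: B{<cond>} label
BL{<cond>} label
BX{<cond>} Rm
BLX{<cond>} label | Rm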

 T refers to the Thumb bit in the cpsr.


 When an instruction sets T, the ARM switches to Thumb state.
 The example shown below is a forward branch. The forward branch skips three
instructions.
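The code fragment is not reproduced here; a minimal sketch of a forward branch that skips three instructions (the instructions themselves are arbitrary):

        B   forward          ; jump over the next three instructions
        ADD r1, r2, #4
        ADD r0, r6, #2
        ADD r3, r7, #4
forward
        SUB r1, r2, #4       ; execution resumes here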


 The branch with link (BL) instruction changes the execution flow and, in addition, overwrites the link register lr with the return address. The example below shows a fragment of code that branches to a subroutine using the BL instruction.
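A sketch of such a fragment (the subroutine body is omitted):

        BL  subroutine       ; branch to subroutine; lr holds the return address
        CMP r1, #5           ; execution continues here after the return
        ...
subroutine
        <subroutine code>
        MOV pc, lr           ; return by moving lr into pc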

 The branch exchange (BX) instruction uses an absolute address stored in register Rm. It is primarily used to branch to and from Thumb code. The T bit in the cpsr is updated by the least significant bit of the branch register.
 Similarly, branch exchange with link (BLX) instruction updates the T bit of the cpsr
with the least significant bit and additionally sets the link register with the return
address.
LOAD-STORE INSTRUCTIONS ( Memory Access Instructions)
 Load-store instructions transfer data between memory and processor registers. There are
three types of load-store instructions: single-register transfer, multiple-register transfer,
and swap.
a) Single-Register Transfer
 These instructions are used for moving a single data item in and out of a register.
 Here are the various load-store single-register transfer instructions.
Syntax: <LDR|STR>{<cond>}{B} Rd, addressing1
LDR{<cond>}SB|H|SH Rd, addressing2
STR{<cond>}H Rd, addressing2


 Example:
1. LDR r0, [r1]
o This instruction loads a word from the address stored in register r1 and places
it into register r0.

2. STR r0, [r1]


 This instruction goes the other way by storing the contents of register r0 to
the address contained in register r1.

b) Multiple-Register Transfer
 Load-store multiple instructions can transfer multiple registers between memory and the
processor in a single instruction. The transfer occurs from a base address register Rn
pointing into memory.
 Multiple-register transfer instructions are more efficient than single-register transfers for moving blocks of data around memory and for saving and restoring context and stacks.

Syntax: <LDM|STM>{<cond>}<addressing mode> Rn{!},<registers>{ˆ}

 Here N is the number of registers in the list of registers.
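A minimal sketch of a multiple-register load, with assumed memory contents:

PRE  r0 = 0x00080010
     mem32[0x80010] = 0x01
     mem32[0x80014] = 0x02
     mem32[0x80018] = 0x03
LDMIA r0!, {r1-r3}    ; load three words, incrementing r0 after each
POST r0 = 0x0008001c
     r1 = 0x01
     r2 = 0x02
     r3 = 0x03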


c) SWAP Instruction
 The swap instruction is a special case of a load-store instruction. It swaps
the contents of memory with the contents of a register.

Syntax: SWP {B} {<cond>} Rd, Rm, [Rn]
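A sketch of a swap, with assumed register and memory values:

PRE  mem32[0x9000] = 0x12345678
     r0 = 0x00000000
     r1 = 0x11112222
     r2 = 0x00009000
SWP  r0, r1, [r2]     ; r0 = old memory value, memory = r1
POST mem32[0x9000] = 0x11112222
     r0 = 0x12345678
     r1 = 0x11112222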


Addressing modes:
Single-Register Load-Store Addressing Modes
 The ARM instruction set provides different modes for addressing memory.
 These modes incorporate one of the indexing methods: preindex with writeback,
preindex, and postindex

Example:
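(The original example is not reproduced; the following hedged sketch shows the three indexing methods on an LDR instruction.)

LDR r0, [r1, #4]!     ; preindex with writeback: r0 = mem32[r1 + 4], then r1 = r1 + 4
LDR r0, [r1, #4]      ; preindex: r0 = mem32[r1 + 4], r1 unchanged
LDR r0, [r1], #4      ; postindex: r0 = mem32[r1], then r1 = r1 + 4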


Addressing mode for load-store multiple instructions


 Table below shows the different addressing modes for the load-store multiple
instructions.

Example:
mem32[0x8001c] =0x04

 If LDMIA is replaced with LDMIB, the post-execution contents of the registers are as shown below.

STACK OPERATIONS
 The ARM architecture uses the load-store multiple instructions to carry out stack
operations.
 The pop operation (removing data from a stack) uses a load multiple instruction;
similarly, the push operation (placing data onto the stack) uses a store multiple
instruction.


 When you use a full stack (F), the stack pointer sp points to an address that is the last
used or full location.
 In contrast, if you use an empty stack (E) the sp points to an address that is the first
unused or empty location.
 A stack is either ascending (A) or descending (D). Ascending stacks grow towards
higher memory addresses; in contrast, descending stacks grow towards lower memory
addresses.
 Addressing modes for stack operation

 The LDMFD and STMFD instructions provide the pop and push functions, respectively.
 Example1: With full descending

Figure: STMFD instruction full stack push operation.
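The figure is not reproduced; a sketch of the push with assumed values:

PRE  r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080014
STMFD sp!, {r1, r4}   ; push r1 and r4 onto a full descending stack
POST r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x0008000c
     mem32[0x0008000c] = 0x00000002
     mem32[0x00080010] = 0x00000003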


Example 2: With empty descending

Figure: STMED instruction empty stack push operation.


SOFTWARE INTERRUPT INSTRUCTION


Q3. Explain briefly the software interrupt instruction.
Answer:
 A software interrupt instruction (SWI) causes a software interrupt exception, which
provides a mechanism for applications to call operating system routines.

Syntax: SWI {<cond>} SWI_number

 When the processor executes an SWI instruction, it sets the program counter pc to the offset 0x8 in the vector table.
 The instruction also forces the processor mode to SVC, which allows an operating system
routine to be called in a privileged mode.
 Each SWI instruction has an associated SWI number, which is used to represent a
particular function call or feature.
 The example below shows an SWI call with SWI number 0x123456, used by
ARM toolkits as a debugging SWI.
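The example is not reproduced here; a hedged sketch (addresses and register values assumed):

PRE  pc = 0x00008000      ; processor in user mode
     r0 = 0x12             ; parameter passed to the SWI handler
0x00008000  SWI 0x123456
POST pc = 0x00000008       ; the SWI entry in the vector table
     lr_svc = 0x00008004   ; return address
     spsr_svc = old cpsr   ; processor is now in SVC mode
     r0 = 0x12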

 Since SWI instructions are used to call operating system routines, some form of parameter passing is required.
 This is achieved by using registers. In the above example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers.


Program Status Register Instructions


Q4. Explain briefly program status register instructions.
Answer:
 The ARM instruction set provides two instructions to directly control a program status
register (psr).
 The MRS instruction transfers the contents of either the cpsr or spsr to general purpose
register.
 The MSR instruction transfers the contents of a general purpose register to cpsr or spsr.
 Together these instructions are used to read and write the cpsr and spsr.
Syntax: MRS {<cond>} Rd, <cpsr|spsr>
MSR {<cond>} <cpsr|spsr>_<fields>, Rm
MSR {<cond>} <cpsr|spsr>_<fields>, #immediate
 The table shows the program status register instructions
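The table is not reproduced; a common hedged sketch of the read-modify-write sequence these instructions allow, here enabling IRQ interrupts:

MRS r1, cpsr          ; copy the cpsr into r1
BIC r1, r1, #0x80     ; clear bit 7, the I interrupt mask
MSR cpsr_c, r1        ; write r1 back to the control field of the cpsr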

Coprocessor Instructions
Q5. Explain briefly coprocessor instructions.
Answer:
 Coprocessor instructions are used to extend the instruction set.
 A coprocessor can either provide additional computation capability or be used to control
the memory subsystem including caches and memory management.
 These instructions are used only by core with a coprocessor.
Syntax: CDP {<cond>} cp,opcode1, Cd, Cn {,opcode2}
<MRC|MCR>{<cond>}cp,opcode1,Rd,Cn,Cm{,opcode2}
<LDC|STC>{<cond>}cp,Cd,addressing

 In the syntax of the coprocessor instructions, the cp field represents the number between
p0 and p15. The opcode fields describe the operation to take place on the coprocessor.
The Cn, Cm and Cd fields describe registers within the coprocessor.
 For example: The instruction below copies coprocessor CP15 register c0 into a general
purpose register r10.


MRC p15, 0, r10, c0, c0, 0 ; CP15 register-0 is copied into general
purpose register r10.
 For example: The instruction below moves the contents of CP15 control register c1
into register r1 of the processor core.
MRC p15, 0, r1, c1, c0, 0
Loading Constants
Q6. Explain briefly the loading constants.
Answer:
 There are two pseudo instructions to move a 32-bit constant value to a register.
Syntax: LDR Rd, =constant
ADR Rd, label

 The example below shows an LDR instruction loading a 32-bit constant 0xff00ffff
into register r0.
LDR r0, =0xff00ffff
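For the second pseudoinstruction, a one-line sketch (the label name is assumed):

ADR r0, LOOP          ; r0 = address of the label LOOP, formed pc-relative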
Programs:
1. Write ALP program for ARM7 demonstrating the data transfer.
Answer:
AREA DATATRANSFER, CODE, READONLY
ENTRY
LDR R9,=SRC ;LOAD STARTING ADDRESS OF SOURCE
LDR R10,=DST ;LOAD STARTING ADDRESS OF DESTINATION

LDMIA R9!,{R0-R7}
STMIA R10!,{R0-R7}

SRC DCD 1,2,3,4,5,6,7,8


AREA BLOCKDATA, DATA, READWRITE
DST DCD 0,0,0,0,0,0,0,0
END


2. Write ALP program for ARM7 demonstrating logical operation.


Answer:
AREA LOGIC, CODE, READONLY
ENTRY
LDR R0, =5
LDR R1, =3
AND R4, R0, R1
ORR R5, R0, R1
EOR R6, R0, R1
BIC R7, R0, R1
END

3. Write ALP program for ARM7 demonstrating arithmetic operation.
Answer:
AREA ARITH, CODE, READONLY
ENTRY
LDR R1, =20
LDR R2, =25
ADD R3, R1, R2
MUL R4, R1, R2
SUB R5, R1, R2
END

4. Write ALP using ARM instructions that calls subroutine fact to find factorial of a given
number.
Answer:

AREA FACTORIAL, CODE, READONLY


ENTRY
START MOV R0, #5 ; number whose factorial is required
BL FACT // BRANCH WITH LINK
LDR R4, =DST // LOCATION TO STORE RESULT
STR R5, [R4]
STOP B STOP

FACT
MOVS R1, R0 ; if R0 = 0, the Z flag is set
MOVEQ R5, #1 ; 0! = 1
MOVEQ PC, R14 ; return immediately when the input is 0
LOOP
SUBNES R1, R1, #1 ; R1 = R1-1 if R1 not 0
MULNE R0, R1, R0 ; R0 = R1 * R0
BNE LOOP ; IF (R1 != 0) LOOP.


MOV R5, R0
MOV PC, R14 ; RETURN WITH RESULT IN R5.
AREA FACTDATA, DATA, READWRITE
DST DCD 0

END

5. Write ALP program to add array of 16 bit numbers and store the result in memory.
Answer:
AREA AryAdd, CODE, READONLY
ENTRY
LDR R0, =SRC ; pointer to source array
LDR R1, = DST ; pointer to destination
MOV R2, #5 ; count of numbers
MOV R5, #0 ; initial sum
UP LDRH R3, [R0] ; load a 16-bit number into R3
ADD R5, R5, R3 ; add it to the running sum
ADD R0, R0, #2 ; increment pointer to the next number
SUBS R2, R2, #1 ; decrement count by 1
CMP R2, #0
BNE UP
STRH R5, [R1]
STOP B STOP
SRC DCW 10, 20, 30, 40, 50
AREA BLOCKDATA, DATA, READWRITE
DST DCW 0
END

6. Write ALP program to generate Fibonacci series.

AREA FIB, CODE, READONLY
ENTRY
MOV R0, #0 ; first Fibonacci number
SUB R0, R0, #1 ; R0 = -1 so that the first stored value is 0
MOV R1, #1
MOV R4, #5 ; number of Fibonacci numbers to generate
LDR R2, =FIBO ; address at which to store the Fibonacci numbers
BACK ADD R0, R0, R1 ; add the previous two numbers
STR R0, [R2] ; store the number in memory
ADD R2, R2, #4 ; increment the address
MOV R3, R0
MOV R0, R1
MOV R1, R3
SUB R4, R4, #1 ; decrement the counter
CMP R4, #0 ; compare the counter to zero
BNE BACK ; loop back
STOP B STOP
AREA FIBONACCI, DATA, READWRITE
FIBO DCD 0,0,0,0,0
END

7. Write an ALP to copy a block (Block1) to another block (Block2) using ARM instructions. (Lab 6b program)
area word, code, readonly ;name the block of code
num equ 20 ;set number of words to be copied
entry ;mark the first instruction called
Start
ldr r0, =src ;r0 = pointer to source block
ldr r1, =dst ;r1 = pointer to destination block
mov r2, #num ;r2 = number of words to copy
Wordcopy
ldr r3, [r0], #4 ;load a word from the source(src) and
str r3, [r1], #4 ;store it to the destination(dst)
subs r2, r2, #1 ;decrement the counter(num)
bne wordcopy ;... copy more
Stop b Stop ;loop here when the copy is complete
src dcd 1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4
area blockdata, data, readwrite
dst dcd 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

End

8. Write an ALP to display the message “HELLO WORLD” using ARM7 instructions.
Refer to the lab manual.
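Since the lab manual listing is not included here, the following is only a hedged sketch that assumes the ARMulator/semihosting environment mentioned later in these notes (SWI 0x123456 with r0 = 0x04 is the semihosting "write string" call):

AREA HELLO, CODE, READONLY
ENTRY
MOV r0, #0x04          ; semihosting SYS_WRITE0: print a null-terminated string
LDR r1, =MSG           ; r1 = address of the string
SWI 0x123456           ; debug/semihosting SWI
STOP B STOP
MSG DCB "HELLO WORLD", 0
END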

WRITING AND OPTIMIZING ARM ASSEMBLY CODE

Writing assembly by hand gives you direct control of three optimization tools that you cannot
explicitly use by writing C source:

■ Instruction scheduling: Reordering the instructions in a code sequence to avoid processor


stalls. Since ARM implementations are pipelined, the timing of an instruction can be affected by
neighboring instructions.
■ Register allocation: Deciding how variables should be allocated to ARM registers or stack
locations for maximum performance. Our goal is to minimize the number of memory accesses.
■ Conditional execution: Accessing the full range of ARM condition codes and conditional
instructions.


Writing Assembly Code


Example 6.1
This example shows how to convert a C function to an assembly function—usually the first stage of assembly optimization. Consider a simple C program main.c that prints the squares of the integers from 0 to 9, using the following square function:

int square(int i)
{
return i*i;
}
 Let’s see how to replace square by an assembly function that performs the same action.
Remove the C definition of square, but not the declaration (the second line) to produce a
new C file main1.c. Next add an armasm assembler file square.s with the following
contents:
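The listing itself is not reproduced in these notes; a minimal sketch consistent with the bullets that follow (the area name is an assumption):

        AREA |.text|, CODE, READONLY
        EXPORT square
; int square(int i)
square
        MUL r1, r0, r0     ; r1 = r0 * r0 (Rd must differ from Rm, so use r1)
        MOV r0, r1         ; return value goes back in r0
        MOV pc, lr         ; return to the caller
        END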

 The AREA directive names the area or code section that the code lives in. If you use
nonalphanumeric characters in a symbol or area name, then enclose the name in vertical
bars.
 The EXPORT directive makes the symbol square available for external linking.
 The input argument is passed in register r0, and the return value is returned in register r0.


 The multiply instruction has a restriction that the destination register must not be the
same as the first argument register. Therefore we place the multiply result into r1 and
move this to r0.
 The END directive marks the end of the assembly file. Comments follow a semicolon.
 Example 6.1 only works if you are compiling your C as ARM code. If you compile your
C as Thumb code, then the assembly routine must return using a BX instruction as shown
below
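A sketch of the interworking version (same assumptions as above):

        AREA |.text|, CODE, READONLY
        EXPORT square
; int square(int i), callable from ARM or Thumb code
square
        MUL r1, r0, r0     ; r1 = r0 * r0
        MOV r0, r1         ; return value in r0
        BX  lr             ; return, switching back to Thumb state if the caller was Thumb
        END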

Example 6.2
This example shows how to call a subroutine from an assembly routine. We will take Example
6.1 and convert the whole program (including main) into assembly. We will call the C library
routine printf as a subroutine. Create a new assembly file main3.s with the following contents:
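The listing is not reproduced; the sketch below is a plausible reconstruction consistent with the bullets that follow (the exact format string and loop bounds are assumptions):

        AREA MainProgram, CODE, READONLY
        EXPORT main
        IMPORT |Lib$$Request$$armlib|, WEAK
        IMPORT __main                    ; C library initialization
        IMPORT printf                    ; prints to the console
i       RN 4                             ; use the name i for register r4

main
        STMFD sp!, {i, lr}               ; preserve r4 and lr as ATPCS requires
        MOV   i, #0
loop
        MOV   r1, i                      ; second printf argument: i
        MUL   r2, r1, r1                 ; third printf argument: i*i
        ADR   r0, print_string           ; first printf argument: the format string
        BL    printf
        ADD   i, i, #1
        CMP   i, #10
        BLT   loop
        LDMFD sp!, {i, pc}               ; restore r4 and return

print_string
        DCB   "Square of %d is %d\n", 0
        END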

 We have used a new directive, IMPORT, to declare symbols that are defined in other
files.


 The imported symbol Lib$$Request$$armlib makes a request that the linker links with
the standard ARM C library. The WEAK specifier prevents the linker from giving an
error if the symbol is not found at link time. If the symbol is not found, it will take the
value zero.
 The second imported symbol __main is the start of the C library initialization code.
 You only need to import these symbols if you are defining your own main; a main
defined in C code will import these automatically for you. Importing printf allows us to
call that C library function.
 The RN directive allows us to use names for registers. In this case we define i as an
alternate name for register r4. Using register names makes the code more readable.
 Recall that ATPCS states that a function must preserve registers r4 to r11 and sp. We
corrupt i(r4), and calling printf will corrupt lr.
 Therefore we stack these two registers at the start of the function using an STMFD
instruction. The LDMFD instruction pulls these registers from the stack and returns by
writing the return address to pc.
 The DCB directive defines byte data described as a string or a comma-separated list of
bytes.
 Note that Example 6.3 also assumes that the code is called from ARM code. If the code
can be called from Thumb code as in Example 6.2 then we must be capable of returning
to Thumb code.

 Finally, let’s look at an example where we pass more than four parameters. Recall that
ATPCS places the first four arguments in registers r0 to r3. Subsequent arguments are
placed on the stack.
Example 6.3
 This example defines a function sumof that can sum any number of integers. The
arguments are the number of integers to sum followed by a list of the integers. The sumof
function is written in assembly and can accept any number of arguments.


 Next define the sumof function in an assembly file sumof.s:
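The listing is not reproduced; a hedged sketch that matches the description in the next bullet (N arrives in r0, the first three values in r1-r3, the rest on the stack):

        AREA |.text|, CODE, READONLY
        EXPORT sumof
; int sumof(int N, ...)
sumof
        MOV   r12, sp          ; r12 points at the fourth and later values
        CMP   r0, #1           ; any values to sum?
        MOVLT r0, #0           ; no: return 0
        MOVLT pc, lr
        CMP   r0, #2
        ADDGE r1, r1, r2       ; add the second value if present
        CMP   r0, #3
        ADDGE r1, r1, r3       ; add the third value if present
        SUBS  r0, r0, #3       ; how many values remain on the stack?
        BLE   done
stack_loop
        LDR   r2, [r12], #4    ; load the next stacked value
        ADD   r1, r1, r2       ; add it to the running sum in r1
        SUBS  r0, r0, #1
        BGT   stack_loop
done
        MOV   r0, r1           ; return the sum in r0
        MOV   pc, lr
        END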

 The code keeps count of the number of remaining values to sum, N. The first three values
are in registers r1, r2, r3. The remaining values are on the stack.

Profiling and Cycle Counting


 The first stage of any optimization process is to identify the critical routines and measure
their current performance. A profiler is a tool that measures the proportion of time or
processing cycles spent in each subroutine. You use a profiler to identify the most critical
routines.
 A cycle counter measures the number of cycles taken by a specific routine. You can
measure your success by using a cycle counter to benchmark a given subroutine before
and after an optimization.


 The ARM simulator used by the ADS1.1 debugger is called the ARMulator and provides
profiling and cycle counting features.
 The ARMulator profiler works by sampling the program counter pc at regular intervals.
The profiler identifies the function the pc points to and updates a hit counter for each
function it encounters.
 Another approach is to use the trace output of a simulator as a source for analysis.
 A pc-sampled profiler can produce meaningless results if it records too few samples. You
can even implement your own pc-sampled profiler in a hardware system using timer
interrupts to collect the pc data points.
 ARM implementations do not normally contain cycle-counting hardware, so to easily
measure cycle counts you should use an ARM debugger with ARM simulator. You can
configure the ARMulator to simulate a range of different ARM cores and obtain cycle
count benchmarks for a number of platforms.

Instruction Scheduling
 Instructions that are conditional on the value of the ARM condition codes in the cpsr
take one cycle if the condition is not met.
 If the condition is met, then the following rules apply:
1. ALU operations such as addition, subtraction, and logical operations take one
cycle. This includes a shift by an immediate value. If you use a register-
specified shift, then add one cycle. If the instruction writes to the pc, then add
two cycles.
2. Load instructions that load N 32-bit words of memory, such as LDR and LDM, take N cycles to issue, but the result of the last word loaded is not available on the following cycle. The updated load address is available on the next cycle. This assumes zero-wait-state memory for an uncached system, or a cache hit for a cached system. An LDM of a single value is exceptional, taking two cycles. If the instruction loads pc, then add two cycles.
3. Load instructions that load 16-bit or 8-bit data such as LDRB, LDRSB,
LDRH, and LDRSH take one cycle to issue. The load result is not available on
the following two cycles. The updated load address is available on the next
cycle. This assumes zero-wait-state memory for an uncached system, or a
cache hit for a cached system.
4. Branch instructions take three cycles.
5. Store instructions that store N values take N cycles. This assumes zero-wait-
state memory for an uncached system, or a cache hit or a write buffer with N
free entries for a cached system. An STM of a single value is exceptional,
taking two cycles.
6. Multiply instructions take a varying number of cycles depending on the value
of the second operand in the product.
 To understand how to schedule code efficiently on the ARM, we need to understand the
ARM pipeline and dependencies. The ARM9TDMI processor performs five operations in
parallel:


o Fetch: Fetch from memory the instruction at address pc. The instruction is loaded
into the core and then processes down the core pipeline.
o Decode: Decode the instruction that was fetched in the previous cycle. The
processor also reads the input operands from the register bank if they are not
available via one of the forwarding paths.
o ALU: Executes the instruction that was decoded in the previous cycle. Note this
instruction was originally fetched from address pc − 8 (ARM state) or pc − 4
(Thumb state). Normally this involves calculating the answer for a data
processing operation, or the address for a load, store, or branch operation. Some
instructions may spend several cycles in this stage. For example, multiply and
register-controlled shift operations take several ALU cycles.
o LS1: Load or store the data specified by a load or store instruction. If the
instruction is not a load or store, then this stage has no effect.
o LS2: Extract and zero- or sign-extend the data loaded by a byte or halfword load
instruction. If the instruction is not a load of an 8-bit byte or 16-bit halfword item,
then this stage has no effect.
 Figure 6.1 shows a simplified functional view of the five-stage ARM9TDMI pipeline.
Note that multiply and register shift operations are not shown in the figure.

 After an instruction has completed the five stages of the pipeline, the core writes the
result to the register file. Note that pc points to the address of the instruction being
fetched.
 The ALU is executing the instruction that was originally fetched from address pc − 8 in
parallel with fetching the instruction at address pc.
 How does the pipeline affect the timing of instructions? Consider the following
examples. These examples show how the cycle timings change because an earlier
instruction must complete a stage before the current instruction can progress down the
pipeline.
 If an instruction requires the result of a previous instruction that is not available, then the
processor stalls. This is called a pipeline hazard or pipeline interlock.

Example 6.4: This example shows the case where there is no interlock.
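The instruction pair is not shown in these notes; reconstructed from the description that follows:

ADD r0, r0, r1
ADD r0, r0, r2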

 This instruction pair takes two cycles. The ALU calculates r0 + r1 in one cycle. Therefore
this result is available for the ALU to calculate r0 + r2 in the second cycle.


Example 6.5: This example shows a one-cycle interlock caused by load use.
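The pair being discussed, reconstructed from the description below:

LDR r1, [r2, #4]
ADD r0, r0, r1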

 This instruction pair takes three cycles. The ALU calculates the address r2 + 4 in the first
cycle while decoding the ADD instruction in parallel. However, the ADD cannot proceed
on the second cycle because the load instruction has not yet loaded the value of r1.
Therefore the pipeline stalls for one cycle while the load instruction completes the LS1
stage.
 Now that r1 is ready, the processor executes the ADD in the ALU on the third cycle.
 Figure 6.2: illustrates how this interlock affects the pipeline. The processor stalls the
ADD instruction for one cycle in the ALU stage of the pipeline while the load instruction
completes the LS1 stage. We’ve denoted this stall by an italic ADD. Since the LDR
instruction proceeds down the pipeline, but the ADD instruction is stalled, a gap opens up
between them.
 This gap is sometimes called a pipeline bubble. We’ve marked the bubble with a dash.

Example 6.6: This example shows a one-cycle interlock caused by delayed load use.
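(The original listing is not reproduced in these notes. A minimal reconstruction of the instruction triplet, with register numbers chosen to match the explanation below, is:)

        LDRB r1, [r2, #1] ; load a byte into r1
        ADD  r0, r0, r2   ; does not use r1, so it proceeds without a stall
        EOR  r3, r0, r1   ; uses r1, which is only ready after the LS2 stage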

 This instruction triplet takes four cycles. Although the ADD proceeds on the cycle
following the load byte, the EOR instruction cannot start on the third cycle. The r1 value
is not ready until the load instruction completes the LS2 stage of the pipeline. The
processor stalls the EOR instruction for one cycle.
 Note that the ADD instruction does not affect the timing at all. The sequence takes four
cycles whether it is there or not.
 Figure 6.3 shows how this sequence progresses through the processor pipeline. The
ADD doesn’t cause any stalls since the ADD does not use r1, the result of the load.


Example 6.8: This example shows why a branch instruction takes three cycles. The processor
must flush the pipeline when jumping to a new address.
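(The original listing is not reproduced in these notes. A sketch of the sequence described below, with illustrative instructions and a hypothetical label, is:)

        MOV  r1, #1       ; first executed instruction
        B    case1        ; second executed instruction: the branch
        AND  r0, r0, r1   ; fetched, but discarded when the pipeline is flushed
        EOR  r2, r2, r3   ; fetched, but discarded when the pipeline is flushed
case1
        SUB  r0, r0, r1   ; third executed instruction, at the branch target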

 The three executed instructions take a total of five cycles. The MOV instruction executes
on the first cycle. On the second cycle, the branch instruction calculates the destination
address. This causes the core to flush the pipeline and refill it using this new pc value.
The refill takes two cycles. Finally, the SUB instruction executes normally.
 Figure 6.4 illustrates the pipeline state on each cycle. The pipeline drops the two
instructions following the branch when the branch takes place.


Scheduling of load instructions

 Load instructions occur frequently in compiled code, accounting for approximately one
third of all instructions. Careful scheduling of load instructions so that pipeline stalls
don’t occur can improve performance.
 The compiler cannot move a load instruction before a store instruction unless it is certain
that the two pointers used do not point to the same address.
 Let’s consider an example of a memory-intensive task. The following function,
str_tolower, copies a zero-terminated string of characters from in to out. It converts the
string to lowercase in the process.

 The ADS1.1 compiler generates the following compiled output. Notice that the compiler
optimizes the condition (c>=‘A’ && c<=‘Z’) to the check that 0<=c-‘A’<=‘Z’-‘A’. The
compiler can perform this check using a single unsigned comparison.
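(The compiled listing is not reproduced in these notes. The hand-written equivalent below shows the kind of code the compiler produces; the exact register allocation, with out in r0 and in in r1, is an assumption:)

str_tolower_loop
        LDRB    r2, [r1], #1       ; c = *(in++)
        SUB     r3, r2, #'A'       ; r3 = c - 'A'  (uses c immediately after the byte load)
        CMP     r3, #'Z'-'A'       ; single unsigned comparison for 0 <= c-'A' <= 'Z'-'A'
        ADDLS   r2, r2, #'a'-'A'   ; if c is uppercase, convert it to lowercase
        STRB    r2, [r0], #1       ; *(out++) = c
        CMP     r2, #0             ; reached the terminating zero?
        BNE     str_tolower_loop   ; if not, process the next character
        MOV     pc, lr             ; return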


 Unfortunately, the SUB instruction uses the value of c directly after the LDRB instruction
that loads c. Consequently, the ARM9TDMI pipeline will stall for two cycles. The
compiler can’t do any better since everything following the load of c depends on its
value.
 However, there are two ways you can alter the structure of the algorithm in assembly to
avoid these stall cycles. We call these methods load scheduling by preloading and load
scheduling by unrolling.

1. Load Scheduling by Preloading:


 In this method of load scheduling, we load the data required for the loop at the
end of the previous loop, rather than at the beginning of the current loop. To get
performance improvement with little increase in code size, we don’t unroll the
loop.
 Example 6.9: This assembly applies the preload method to the str_tolower
function.
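(The listing is not reproduced in these notes. The sketch below follows the description: the next character is loaded at the end of the loop body, so its value is ready by the time the next iteration uses it. The register choices, with out in r0 and in in r1, are assumptions.)

str_tolower_preload
        LDRB    r2, [r1], #1       ; preload the first character c
loop
        SUB     r3, r2, #'A'       ; c was loaded in the previous iteration, so no stall
        CMP     r3, #'Z'-'A'       ; is c an uppercase letter?
        ADDLS   r2, r2, #'a'-'A'   ; if so, convert it to lowercase
        STRB    r2, [r0], #1       ; *(out++) = c
        TEQ     r2, #0             ; end of string?
        LDRNEB  r2, [r1], #1       ; if not, preload the next character...
        BNE     loop               ; ...and loop; the branch hides the load delay
        MOV     pc, lr             ; return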


 The scheduled version is one instruction longer than the C version, but we save two
cycles for each inner loop iteration. This reduces the loop from 11 cycles per character to
9 cycles per character on an ARM9TDMI, giving a 1.22 times speed improvement.

2. Load Scheduling by Unrolling


 This method of load scheduling works by unrolling and then interleaving the body of the
loop.
 For example, we can perform loop iterations i, i + 1, i + 2 interleaved. When the result of
an operation from loop i is not ready, we can perform an operation from loop i + 1 that
avoids waiting for the loop i result.
Example 6.10: The assembly applies load scheduling by unrolling to the str_tolower
function.


 This loop is the most efficient implementation we’ve looked at so far. The
implementation requires seven cycles per character on ARM9TDMI. This gives a 1.57
times speed increase over the original str_tolower.

Register Allocation
 You can use 14 of the 16 visible ARM registers to hold general-purpose data. The other
two registers are the stack pointer r13 and the program counter r15.
 For a function to be ATPCS compliant it must preserve the callee values of registers r4 to
r11. ATPCS also specifies that the stack should be eight-byte aligned; therefore you must
preserve this alignment if calling subroutines.
 Use the following template for optimized assembly routines requiring many registers:
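(A minimal template of the form described; the routine name is a placeholder:)

my_routine
        STMFD   sp!, {r4-r12, lr}  ; save r4-r11 plus r12 and lr: 10 registers = 40 bytes, eight-byte aligned
        ; routine body - r0 to r12 and r14 are now free to use
        LDMFD   sp!, {r4-r12, pc}  ; restore the saved registers and return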

 Our only purpose in stacking r12 is to keep the stack eight-byte aligned.
 In this section we look at how best to allocate variables to register numbers for register
intensive tasks, how to use more than 14 local variables, and how to make the best use of
the 14 available registers.
1. Allocating Variables to Register Numbers
 When you write an assembly routine, it is best to start by using names for the
variables, rather than explicit register numbers. This allows you to change the
allocation of variables to register numbers easily.
 You can even use different register names for the same physical register number
when their use doesn’t overlap. Register names increase the clarity and readability
of optimized code.
 However, there are several cases where the physical number of the register is
important:
i. Argument registers. The ATPCS convention defines that the first four
arguments to a function are placed in registers r0 to r3. Further arguments
are placed on the stack. The return value must be placed in r0.
ii. Registers used in a load or store multiple. Load and store multiple
instructions LDM and STM operate on a list of registers in order of
ascending register number. If r0 and r1 appear in the register list, then the
processor will always load or store r0 using a lower address than r1 and so
on.
iii. Load and store double word. The LDRD and STRD instructions
introduced in ARMv5E operate on a pair of registers with sequential
register numbers, Rd and Rd + 1. Furthermore, Rd must be an even
register number.
 There are several possible ways we can proceed when we run out of registers:


 Reduce the number of registers we require by performing fewer operations in each loop.
 Use the stack to store the least-used values to free up more registers.
 Alter the code implementation to free up more registers.

2. Using More than 14 Local Variables


 If you need more than 14 local 32-bit variables in a routine, then you must store
some variables on the stack. The standard procedure is to work outwards from the
innermost loop of the algorithm, since the innermost loop has the greatest
performance impact.
3. Making the Most of Available Registers
 On a load-store architecture such as the ARM, it is more efficient to access values
held in registers than values held in memory. There are several tricks you can use
to fit several sub-32-bit length variables into a single 32-bit register and thus can
reduce code size and increase performance.

Conditional Execution

 The processor core can conditionally execute most ARM instructions. This conditional
execution is based on one of 15 condition codes.
 If you don’t specify a condition, the assembler defaults to the execute always condition
(AL).
 The other 14 conditions split into seven pairs of complements. The conditions depend on
the four condition code flags N, Z, C, V stored in the cpsr register.
 By default, ARM instructions do not update the N, Z, C, V flags in the ARM cpsr. For
most instructions, to update these flags you append an S suffix to the instruction
mnemonic.
 Exceptions to this are comparison instructions that do not write to a destination register.
Their sole purpose is to update the flags and so they don’t require the S suffix.
 By combining conditional execution and conditional setting of the flags, you can
implement simple if statements without any need for branches. This improves efficiency
since branches can take many cycles and also reduces code size.
 Example 6.18: Consider the following example of conditional execution; the following
C code identifies if c is a vowel:

 In assembly you can write this using conditional comparisons:
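(The listing is not reproduced in these notes. The sketch below assumes c and vowel are register aliases declared with RN and that a count of vowels is being kept:)

        TEQ     c, #'a'            ; sets Z if c == 'a'
        TEQNE   c, #'e'            ; only tested while there has been no match (Z = 0)
        TEQNE   c, #'i'
        TEQNE   c, #'o'
        TEQNE   c, #'u'
        ADDEQ   vowel, vowel, #1   ; executed only if one of the comparisons matched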


 As soon as one of the TEQ comparisons detects a match, the Z flag is set in the cpsr.
The following TEQNE instructions have no effect as they are conditional on Z = 0.

Looping Constructs
 Most routines critical to performance will contain a loop.
 This section describes how to implement these loops efficiently in assembly. We also
look at examples of how to unroll loops for maximum performance.
1. Decremented Counted Loops
 For a decrementing loop of N iterations, the loop counter i counts down from N to 1
inclusive. The loop terminates with i = 0. An efficient implementation is
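(A sketch of this implementation, with i standing for the register holding the loop counter:)

        MOV     i, N               ; i = N
loop
        ; loop body goes here, executed with i = N, N-1, ..., 1
        SUBS    i, i, #1           ; decrement the counter and set the condition flags
        BGT     loop               ; repeat while i > 0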

 The loop overhead consists of a subtraction setting the condition codes followed by a
conditional branch. On ARM7 and ARM9 this overhead costs four cycles per loop. If
i is an array index, then you may want to count down from N−1 to 0 inclusive instead
so that you can access array element zero.
 You can implement this in the same way by using a different conditional branch:
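(A sketch of the variant, with i initialized to N-1 before the loop so that it counts from N-1 down to 0:)

loop
        ; loop body goes here, executed with i = N-1, N-2, ..., 0
        SUBS    i, i, #1           ; the SUBS that produces zero leaves Z set for the final pass
        BGE     loop               ; repeat while the result is >= 0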

 In this arrangement the Z flag is set on the last iteration of the loop and cleared for
other iterations. If there is anything different about the last loop, then we can achieve
this using the EQ and NE conditions. For example, if you preload data for the next
loop then you want to avoid the preload on the last loop.

2. Unrolled Counted Loops


 Loop unrolling reduces the loop overhead by executing the loop body multiple
times.
 However, there are problems to overcome. What if the loop count is not a
multiple of the unroll amount? What if the loop count is smaller than the unroll
amount?
 In this section we look at how you can handle these issues in assembly.


 We’ll take the C library function memset as a case study. This function sets N
bytes of memory at address s to the byte value c. The function needs to be
efficient, so we will look at how to unroll the loop without placing extra
restrictions on the input operands.
 Our version of memset will have the following C prototype:

 To be efficient for large N, we need to write multiple bytes at a time using STR or
STM instructions. Therefore our first task is to align the array pointer s.
 However, it is only worth us doing this if N is sufficiently large. We aren’t sure
yet what “sufficiently large” means, but let’s assume we can choose a threshold
value T1 and only bother to align the array when N ≥ T1.
 Clearly T1 ≥ 3 as there is no point in aligning if we don’t have four bytes to
write!
 Now suppose we have aligned the array s. We can use store multiples to set
memory efficiently.
 For example, we can use a loop of four store multiples of eight words each to set
128 bytes on each loop. However, it will only be worth doing this if N ≥ T2 ≥
128, where T2 is another threshold to be determined later on (a rough sketch of this inner loop follows this list).
 Finally, we are left with N < T2 bytes to set. We can write bytes in blocks of four
using STR until N < 4. Then we can finish by writing bytes singly with STRB to
the end of the array.
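A rough sketch of the 128-byte store-multiple section described above is shown below. The register assignments are illustrative only: r0 is assumed to hold the aligned pointer s, r2 the remaining count N (at least 128 on entry), and r3-r10 are each assumed to hold the byte c replicated into all four byte lanes.

        SUBS    r2, r2, #128       ; account for the first 128-byte block
loop128
        STMIA   r0!, {r3-r10}      ; store 8 words = 32 bytes
        STMIA   r0!, {r3-r10}
        STMIA   r0!, {r3-r10}
        STMIA   r0!, {r3-r10}      ; 128 bytes stored per iteration
        SUBS    r2, r2, #128       ; enough left for another full block?
        BGE     loop128
        ADD     r2, r2, #128       ; r2 = number of bytes (0 to 127) still to set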

3. Multiple Nested Loops


 How many loop counters does it take to maintain multiple nested loops? Actually,
one will suffice—or more accurately, one provided the sum of the bits needed for
each loop count does not exceed 32.
 We can combine the loop counts within a single register, placing the innermost
loop count at the highest bit positions.
 This section gives an example showing how to do this. This example shows how
to merge three loop counts into a single loop count. Suppose we wish to multiply
matrix B by matrix C to produce matrix A, where A, B, C have the following
constant dimensions. We assume that R, S, T are relatively large but less than
256.

 A simple C implementation of the matrix multiply uses three nested loops i, j, and k:


Module 3

Syllabus - Embedded System Components: Embedded Vs General computing system, History of embedded systems, Classification of embedded systems, Major application areas of embedded systems, Purpose of embedded systems.

Core of an Embedded System including all types of processor/controller, Memory, Sensors, Actuators, LED, 7 segment LED display, stepper motor, Keyboard, Push button switch, Communication Interface (on board and external types), embedded firmware, other system components.

Text book 2: Chapter 1 (Sections 1.2 to 1.6), Chapter 2 (Sections 2.1 to 2.6). RBT: L1, L2

What is an embedded system?


An embedded system is an electronic/electro-mechanical system designed to perform a
specific function and is a combination of both hardware and firmware (software).

Every embedded system is unique, and the hardware as well as the firmware is highly
specialised to the application domain.

Embedded systems are becoming an inevitable part of any product or equipment in all fields
including household appliances, telecommunications, medical equipment, industrial control,
consumer products, etc.

General purpose computing system Vs Embedded System

General purpose computing system | Embedded system

1. A system which is a combination of generic hardware and a General Purpose Operating System for executing a variety of applications. | A system which is a combination of special purpose hardware and an embedded OS/firmware for executing a specific set of applications.
2. Contains a General Purpose Operating System (GPOS). | May or may not contain an operating system for functioning.
3. Applications are alterable (programmable): it is possible for the end user to re-install the operating system and to add or remove user applications. | The firmware of the embedded system is pre-programmed and is non-alterable by the end user.
4. Performance is the key deciding factor in the selection of the system; always "faster is better". | Application-specific requirements (performance, power requirements, memory usage, etc.) are the key deciding factors.
5. Response requirements are not time-critical. | For certain categories of embedded systems the response time requirement is highly critical.
6. Need not be deterministic in execution behaviour. | Execution behaviour is deterministic for certain types of embedded systems, such as hard real-time systems.
7. Less/not at all tailored towards reduced operating power requirements. | Highly tailored to take advantage of power saving modes.

History of embedded systems


Embedded systems were in existence even before the IT revolution. In the olden days
embedded systems were built around the old vacuum tube and transistor technologies and the
embedded algorithm was developed in low level languages.

Advances in semiconductor and Nano-technology and IT revolution gave way to the


development of very small embedded systems.

The first recognised modern embedded system is the Apollo Guidance Computer (AGC)
developed by the MIT Instrumentation Laboratory for the lunar expedition. They ran the
inertial guidance systems of both the Command Module (CM) and the Lunar Excursion
Module (LEM).

The Command Module was designed to encircle the moon while the Lunar Module and its
crew were designed to go down to the moon surface and land there safely.

The Lunar Module featured 18 engines in total: 16 reaction control thrusters, a descent
engine and an ascent engine. The descent engine was designed to provide the thrust to take
the lunar module out of the lunar orbit and land it safely on the moon.

MIT's original design was based on 4K words of fixed memory (Read Only Memory) and 256
words of erasable memory (Random Access Memory). By June 1963, the figures reached 10K
of fixed and 1K of erasable memory. The final configuration was 36K words of fixed memory
and 2K words of erasable memory.

The clock frequency of the first microchip prototype model used in the AGC was 1.024 MHz and it
was derived from a 2.048 MHz crystal clock.

The first mass-produced embedded system was the guidance computer for the Minuteman-
I missile in 1961. It was the ‘Autonetics D-17’ guidance computer, built using discrete
transistor logic and a hard-disk for main memory.

The first integrated circuit was produced in September 1958, but computers using them didn't
begin to appear until 1963. Some of their early uses were in embedded systems, notably used
by NASA for the Apollo Guidance Computer and by the US military in the Minuteman-II
intercontinental ballistic missile.


Classification of Embedded systems


Some of the criteria used in the classification of embedded systems are:
1. Based on generation
2. Complexity and performance requirements
3. Based on deterministic behaviour
4. Based on triggering.

This classification is based on the order in which the embedded systems evolved from the
first version to where they are today.

1. First Generation - The early embedded systems were built around 8bit
microprocessors like 8085 and Z80, and 4bit microcontrollers. Simple in hardware
circuits with firmware developed in assembly code.
Example -Digital telephone keypads, stepper motor control units etc.

2. Second Generation - These are embedded systems built around 16bit microprocessors
and 8 or 16 bit microcontrollers.
- The instruction set for the second generation processors/controllers were much more
complex and powerful than the first generation processors/controllers.
- Some of the second generation embedded systems contained embedded operating
systems for their operation.
Example -Data Acquisition Systems, SCADA systems, etc.

3. Third Generation - With advances in processor technology, embedded system


developers started making use of powerful 32bit processors and 16bit
microcontrollers for their design.
- A new concept of application and domain specific processors/controllers like
Digital Signal Processors (DSP) and Application Specific Integrated Circuits
(ASICs) came into the picture.
- The instruction set of processors became more complex and powerful and the
concept of instruction pipelining also evolved.
- Processors like Intel Pentium, Motorola 68K, etc. gained attention in high
performance embedded requirements. Dedicated embedded real time and general
purpose operating systems entered into the embedded market. Embedded systems
spread its ground to areas like robotics, media, industrial process control, networking,
etc.

4. Fourth Generation -The advent of System on Chips (SoC), reconfigurable


processors and multicore processors are bringing high performance and tight
integration into the embedded device market.
- The SoC technique implements a total system on a chip by integrating different
functionalities with a processor core on an integrated circuit.


- The fourth generation embedded systems are making use of high performance real
time embedded operating systems for their functioning.
Example -Smart phone devices, mobile internet devices (MIDs), etc.
Classification Based on Complexity and Performance
According to this classification, embedded systems can be grouped into:

1. Small-Scale Embedded Systems - Embedded systems which are simple in


application needs and where the performance requirements are not time critical fall
under this category.
- An electronic toy is a typical example of a small-scale embedded system. Small-scale
embedded systems are usually built around low performance and low cost 8 or 16 bit
microprocessors/microcontrollers.
- A small-scale embedded system may or may not contain an operating system for its
functioning.

2. Medium-Scale Embedded Systems - Embedded systems which are slightly complex in
hardware and firmware (software) requirements fall under this category.
- Medium-scale embedded systems are usually built around medium performance, low cost
16 or 32 bit microprocessors/microcontrollers or digital signal processors.
- They usually contain an embedded operating system (either general purpose or real time
operating system) for functioning.

3. Large-Scale Embedded Systems/Complex Systems - Embedded systems which involve
highly complex hardware and firmware requirements fall under this category.
- They are employed in mission critical applications demanding high performance. Such
systems are commonly built around high performance 32 or 64 bit RISC processors/controllers,
Reconfigurable System on Chip (RSoC), multi-core processors and programmable logic devices.
- They may contain multiple processors/controllers and co-units/hardware accelerators for
offloading processing requirements from the main processor of the system.
- Decoding/encoding of media, cryptographic function implementation, etc. are examples of
processing requirements which can be implemented using a co-processor/hardware accelerator.
- Complex embedded systems usually contain a high performance Real Time Operating
System (RTOS) for task scheduling, prioritization and management.

Major application areas of embedded systems


The application areas and the products in the embedded domain are countless. A few of the
important domains and products are listed below:

1. Consumer electronics: Camcorders, cameras, etc.


2. Household appliances: Television, DVD players, washing machine, fridge, microwave
oven, etc.


3. Home automation and security systems: Air conditioners, sprinklers, intruder detection
alarms, closed circuit television cameras, fire alarms, etc.
4. Automotive industry: Anti-lock breaking systems (ABS), engine control, ignition
systems, automatic navigation systems, etc.
5. Telecom: Cellular telephones, telephone switches, handset multimedia applications, etc.
6. Computer peripherals: Printers, scanners, fax machines, etc.
7. Computer networking systems: Network routers, switches, hubs, firewalls, etc.
8. Healthcare: Different kinds of scanners, EEG, ECG machines etc.
9. Measurement & Instrumentation: Digital multimeters, digital CROs, logic analysers,
PLC systems, etc.
10. Banking & Retail: Automatic teller machines (ATM) and currency counters etc.
11. Card Readers: Barcode, smart card readers, hand held devices, etc.

Purpose of embedded systems


Embedded systems are used in various domains like consumer electronics, home automation,
telecommunications, automotive industry, healthcare etc. Within the domain itself, according
to the application usage, they may have different functionalities.

Each embedded system is designed to serve the purpose of any one or a combination of the
following tasks:
1. Data collection/Storage/Representation
2. Data communication
3. Data (signal) processing
4. Monitoring
5. Control
6. Application specific user interface

1. Data Collection/Storage/Representation
- Embedded systems designed for the purpose of data collection performs acquisition of data
from the external world. Data collection is usually done for storage, analysis, manipulation
and transmission.
- The term "data" refers to all kinds of information, i.e. text, voice, image, video, electrical
signals and any other measurable quantities. Data can be either analog (continuous) or digital
(discrete).
- Embedded systems with analog data capturing techniques collect data directly in the form of
analog signals, whereas embedded systems with digital data collection mechanism converts
the analog signal to corresponding digital signal using analog to digital (A/D) converters and
then collects the binary equivalent of the analog data.
- The collected data may be stored directly in the system or may be transmitted to some other
systems or it may be processed by the system. These actions are purely dependent on the
purpose for which the embedded system is designed.
- Embedded systems designed for pure measurement applications without storage collect
data and give a meaningful representation of the collected data by means of a graphical
representation or a quantity value, and delete the collected data when new data arrives at the
data collection terminal.
- Example: Analog and digital CROs without storage memory.
- Some embedded systems store the collected data for processing and analysis. Such systems
incorporate a built-in/plug-in storage memory for storing the captured data.
- Example: Instruments with storage memory used in medical applications.
- Certain embedded systems store the data and will not give a representation of the same to
the user; the data is only used for internal processing.

 A digital camera is a typical example of an embedded system with data
collection/storage/representation. The captured image may be stored within the
memory of the camera and can also be presented to the user through a graphic LCD unit.

2. Data Communication
- Embedded data communication systems are deployed in applications ranging from complex
satellite communication systems to simple home networking systems.
- The data collected by an embedded terminal may require transferring of the same to some
other system located remotely.
- The transmission is achieved either by a wire-line medium or by a wireless medium.
Wire-line medium was the most common choice in older embedded systems; as technology
changes, the wireless medium is becoming the de-facto standard for data communication
in embedded systems.
- A wireless medium offers cheaper connectivity solutions and makes the communication link
free from the hassle of wire bundles.
- The data collecting embedded terminal itself can incorporate data communication units like
wireless modules (Bluetooth, ZigBee, Wi-Fi, EDGE, GPRS, etc.) or wire-line modules
(RS-232C, USB, TCP/IP, PS2, etc.).
- Certain embedded systems act as a dedicated transmission unit between the sending and
receiving terminals, offering sophisticated functionalities like data packetizing, encryption
and decryption. Example: Network hubs, routers, switches, etc.

3. Data (Signal) Processing


-The data (voice, image, video, electrical signals and other measurable quantities) collected
by embedded systems may be used for various kinds of data processing.

-Embedded systems with signal processing functionalities are employed in applications


demanding signal processing, like speech coding and synthesis, audio/video codecs, transmission
applications, etc.

-A digital hearing aid is a typical example of an embedded system employing data


processing.


4. Monitoring
Embedded systems falling under this category are specifically designed for monitoring
purposes. Many embedded products in the medical domain have monitoring functions
only. They are used for determining the state of some variables using input sensors.

For example, the electrocardiogram (ECG) machine is used for monitoring the heartbeat of
a patient. The machine is intended only to monitor the heartbeat; it cannot impose
control over the heartbeat.

Some other examples of embedded systems with monitoring functions are measuring
instruments like digital CROs, digital multimeters, logic analyzers, etc. They are used for
knowing (monitoring) the status of some variables like current, voltage, etc., but they cannot
control these variables in turn.

5. Control
- Embedded systems with control functionality impose control over some variables
according to the changes in input variables.
- A system with control functionality contains both sensors and actuators.
Sensors are connected to the input port for capturing the changes in the environmental or
measured variable. The actuators connected to the output port are controlled according to the
changes in the input variable so as to act on the controlling variable and bring the controlled
variable into the specified range.

- The air conditioner used in our homes to control the room temperature to a specified
limit is a typical example of an embedded system for control purposes.
- An air conditioner contains a room temperature sensing element (sensor), which may be a
thermistor, and a handheld unit for setting up (feeding) the desired temperature. The handheld
unit may be connected to the central embedded unit residing inside the air conditioner through
a wireless link or through a wired link. The air compressor unit acts as the actuator. The
compressor is controlled according to the current room temperature and the desired
temperature set by the end user. Here the input variable is the current room temperature and
the controlled variable is also the room temperature. The controlling variable is the cool air flow
from the compressor unit. If the controlled variable and the input variable are not at the same
value, the controlling variable tries to equalise them by acting on the cool air flow.

6. Application Specific User Interface


These are embedded systems with application-specific user interfaces like buttons, switches,
keypads, lights, bells, display units, etc. The mobile phone is an example of this: in a mobile phone
the user interface is provided through the keypad, graphic LCD module, system speaker,
vibration alert, etc.


The Typical Embedded System


A typical embedded system shown below contains a single chip controller, which acts as
the master brain of the system. The controller can be a Microprocessor (e.g. Intel 8085) or a
Microcontroller (e.g. Atmel AT89C51) or a Field Programmable Gate Array (FPGA) device
(e.g. Xilinx Spartan) or a Digital Signal Processor (DSP) (e.g. Blackfin processors from
Analog Devices) or an Application Specific Integrated Circuit (ASIC).

Embedded systems are basically designed to regulate a physical variable or to manipulate


the state of some devices by sending some control signals to the Actuators or devices
connected to the O/p ports of the system, in response to the input signals provided by the end
users or Sensors which are connected to the input ports.

Figure 3.1: Elements of embedded system

Hence an embedded system is a reactive system. The control is achieved by processing the
information coming from the sensors and user interfaces, and controlling some actuators that
regulate the physical variable.

 Keyboards, push button switches, etc. are examples of common user interface input
devices whereas LEDs, liquid crystal displays, piezoelectric buzzers, etc. are examples for
common output devices for a typical embedded system.
For example, if the embedded system is designed for any handheld application, such as a
mobile handset application, then the system should contain user interfaces like a keyboard for
performing input operations and display unit for providing users the status of various
activities in progress.


 Some embedded systems do not require any manual intervention for their operation. They
automatically sense the variations in the real world with which they interact through the
sensors connected to the input port of the system. The sensor information is passed to the
processor. Upon receiving the sensor data, the processor performs some pre-defined
operations with the help of the firmware embedded in the system and sends actuating
signals to the actuator connected to the output port of the embedded system, which in turn
acts on the controlling variable to bring the controlled variable to the desired level and make
the embedded system work in the desired manner.
The memory of the system is responsible for holding the control algorithm and other
important configuration details.
For most embedded systems, the memory for storing the algorithm or configuration data is
of a fixed type, which is a kind of Read Only Memory (ROM), and it is not available to the
end user for modification; the memory is protected from unwanted user interaction.
The most common types of memory used in embedded systems for control algorithm storage
are OTP, PROM, UVEPROM, EEPROM and FLASH. Depending on the control application,
the memory size may vary from a few bytes to megabytes.
Sometimes the system requires temporary memory for performing arithmetic operations or
control algorithm execution; this type of memory is known as "working memory".
Random Access Memory (RAM) is used in most systems as the working memory.
The size of the RAM also varies from a few bytes to kilobytes or megabytes depending on the
application.

Core of the embedded system


Embedded systems are domain and application specific and are built around a central core.
The core of the embedded system falls into any one of the following categories:
1. General Purpose and Domain Specific Processors
i. Microprocessors
ii. Microcontrollers
iii. Digital Signal Processors
2. Application Specific Integrated Circuits (ASICs)
3. Programmable Logic Devices (PLDs)
4. Commercial off-the-shelf Components (COTS)

1. General Purpose and Domain Specific Processors


Almost 80% of the embedded systems are processor/controller based. The processor may be
a microprocessor or a microcontroller or a digital signal processor, depending on the domain
and application.
i. Microprocessors
A Microprocessor is a silicon chip representing a central processing unit(CPU), which is
capable of performing arithmetic as well as logical operations according to a pre-defined
set of instructions.


The first microprocessor developed by Intel was Intel 4004, a 4bit processor which was
released in November 1971.
It featured 1K data memory, a 12bit program counter and 4K program memory, sixteen 4bit
general purpose registers and 46 instructions. It ran at a clock speed of 740 kHz.
In 1972, 14 more instructions were added to the 4004 instruction set and the program space was
upgraded to 8K.
It was quickly replaced in April 1972 by the Intel 8008, which was similar to the Intel 4040; the
main difference was that its program counter was 14 bits wide, and the 8008 served as a
terminal controller.
In April 1974 Intel launched the 8bit Intel 8080, with a 16bit address bus and program counter
and seven 8bit registers. The Intel 8080 was one of the most commonly used processors for
industrial control and other embedded applications in the 1970s.
Immediately after the release of Intel 8080, Motorola also entered the market with their
processor, Motorola 6800 with a different architecture and instruction set compared to 8080.
In 1976 Intel came up with the upgraded version of 8080 — Intel 8085, with two newly
added instructions, three interrupt pins and serial I/O.
In July 1976 Zilog entered the microprocessor market with its Z80 processor as competitor
to Intel.
Technical advances in the semiconductor industry brought a new dimension to the
microprocessor market, and the late twentieth century witnessed fast growth in processor
technology; 16, 32 and 64 bit processors came into the market.
Intel, AMD, Freescale, IBM, TI, Cyrix, Hitachi, NEC, LSI Logic, etc. are the key players in
the processor market. Intel still leads the market with cutting edge technologies in the
processor industry.
Different instruction sets and system architectures are available for the design of a
microprocessor. Harvard and Von-Neumann are the two common system architectures for
processor design. Processors based on the Harvard architecture contain separate buses for
program memory and data memory, whereas processors based on the Von-Neumann architecture
share a single system bus for program and data memory.

ii. Microcontrollers
A Microcontroller is an integrated chip that contains a CPU, RAM, special and general
purpose register arrays, on chip ROM/FLASH memory for program storage, timer and
interrupt control units and dedicated I/O ports.
Texas Instruments' TMS 1000 is considered the world's first microcontroller. TI
followed Intel's 4004/4040 4bit processor design and added some amount of RAM, program
storage memory (ROM) and I/O support on a single chip, thereby eliminating the requirement
of multiple hardware chips for self-functioning. Provision to add custom instructions to the
CPU was another innovative feature of the TMS 1000. The TMS 1000 was released in 1974.
In 1977 Intel entered the microcontroller market with a family of controllers named the MCS-48
family. The Intel 8048 is recognised as Intel's first microcontroller. The design of the 8048
adopted a Harvard architecture in which program and data memory are separate, although
they shared the same address bus.


Intel came out with its most fruitful design in the 8bit microcontroller domain - the
8051 family. It is the most popular and powerful 8bit microcontroller ever built. It was
developed in the 1980s and was put under the family MCS-51.
Almost 75% of the microcontrollers used in the embedded domain during the 1980-90s were
8051 family based controllers, due to their low cost, wide availability, memory efficient
instruction set, mature development tools and Boolean processing (bit manipulation
operation) capability, etc.
Another important family of microcontrollers used in industrial control and embedded
applications is the PIC family microcontrollers from Microchip Technologies.

The instruction set architecture of a microcontroller can be either RISC or CISC.


Microcontrollers are designed for either general purpose application requirements or domain
specific application requirements. The Intel 8051 microcontroller is a typical example of a
general purpose microcontroller, whereas the automotive AVR microcontroller family from
Atmel Corporation is a typical example of an ASIP specifically designed for the automotive
domain.
Microprocessor vs Microcontroller (Refer Module1 notes)
iii. Digital Signal Processors
Digital Signal Processors (DSPs) are powerful special purpose 8/16/32 bit microprocessors
designed specifically to meet the computational demands and power constraints of today’s
embedded audio, video, and communications applications.
Digital signal processors are 2 to 3 times faster than general purpose microprocessors in
signal processing applications. This is because of the architectural difference between the
two: DSPs implement algorithms in hardware, which speeds up execution, whereas general
purpose processors implement the algorithm in firmware and the speed of execution depends
primarily on the processor clock.
A typical digital signal processor incorporates the following key units:
i. Program Memory - Memory for storing the program required by the DSP to process the data.
ii. Data Memory - Working memory for storing temporary variables and the data/signal to be
processed.
iii. Computational Engine - Performs the signal processing in accordance with the program stored
in the program memory. The computational engine incorporates many specialised arithmetic
units, each of which operates simultaneously to increase the execution speed. It also incorporates
multiple hardware shifters for shifting operands and thereby saves execution time.
iv. I/O Unit - Acts as an interface between the outside world and the DSP. It is responsible for
capturing the signals to be processed and delivering the processed signals.

RISC vs. CISC (refer module 1 notes)

 Harvard vs. Von-Neumann Processor/Controller Architecture


Microprocessors/controllers based on the Von-Neumann architecture share a single
common bus for fetching both instructions and data. Program instructions and data are stored
in a common main memory. Von-Neumann architecture based processors/controllers first
fetch an instruction and then fetch the data to support the instruction from memory. The
two separate fetches slow down the controller's operation.


Figure 3.2

Microprocessors/controllers based on the Harvard architecture will have separate data bus
and instruction bus. This allows the data transfer and program fetching to occur
simultaneously in both buses. With Harvard architecture, the data memory can be read and
written while the program memory is being accessed. These separated data memory and code
memory buses allow one instruction to execute while the next instruction is fetched (“pre-
fetching”). The pre-fetch allows much faster execution than Von-Neumann architecture.

The following table highlights the differences between the Harvard and Von-Neumann architectures:

Harvard architecture | Von-Neumann architecture
Separate buses for instruction fetching and data fetching | Single bus for both instruction and data fetching
Easier to pipeline, so high performance can be achieved | Lower performance
Costly | Cheaper
Data memory and program memory are stored physically in different locations, so there is no chance of accidental corruption of program memory | Data memory and program memory are stored physically in the same chip, so there is a chance of accidental corruption of program memory

 Big-Endian vs. Little-Endian Processors/Controllers


Endianness specifies the order in which multi-byte data is stored in memory by the processor.
If the data is longer than one byte (for example a 2-byte or 4-byte word), it can be stored in
memory in two different ways:
1. Little-endian - the lower-order byte of the data is stored in memory at the lowest
address, and the higher-order byte at the highest address.
For example, a 4 byte long integer Byte3 Byte2 Byte1 Byte0 will be stored in the memory as
shown below:

Byte 0 -> 0x20000 (Base Address)
Byte 1 -> 0x20001 (Base Address + 1)
Byte 2 -> 0x20002 (Base Address + 2)
Byte 3 -> 0x20003 (Base Address + 3)

2. Big-endian - the higher-order byte of the data is stored in memory at the lowest
address, and the lower-order byte at the highest address.

For example, a 4 byte long integer Byte3 Byte2 Byte1 Byte0 will be stored in the memory as
follows:

Byte 3 -> 0x20000 (Base Address)
Byte 2 -> 0x20001 (Base Address + 1)
Byte 1 -> 0x20002 (Base Address + 2)
Byte 0 -> 0x20003 (Base Address + 3)
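The effect of endianness can be seen with a short ARM sequence (a sketch only, assuming the processor is configured little-endian and using the base address from the example above):

        LDR     r0, =0x20000       ; base address used in the example
        LDR     r1, =0x11223344    ; Byte3 = 0x11, Byte2 = 0x22, Byte1 = 0x33, Byte0 = 0x44
        STR     r1, [r0]           ; store the whole 4-byte word
        LDRB    r2, [r0]           ; little-endian: reads 0x44 (lowest-order byte at the lowest address)
        LDRB    r3, [r0, #3]       ; little-endian: reads 0x11 (highest-order byte at the highest address)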

 Load Store Operation and Instruction Pipelining


The memory access related operations are performed by special instructions called load
and store instructions.
If an operand is in a memory location, its content is loaded to a register using the load
instruction. The store instruction stores data from a specified register to a specified memory
location.
The concept of the load-store architecture is illustrated with the following example:
Suppose x, y and z are memory locations and we want to add the contents of x and y and
store the result in location z.
The first instruction, load R1, x, loads register R1 with the content of memory location x;
the second instruction, load R2, y, loads register R2 with the content of memory location y.
The instruction add R3, R1, R2 adds the contents of registers R1 and R2 and stores the result
in register R3. The next instruction, store R3, z, stores the content of register R3 in memory
location z.
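In ARM assembly the same sequence might look like the sketch below (x, y and z are assumed to be word-aligned data labels):

        LDR     r0, =x             ; address of memory location x
        LDR     r1, [r0]           ; R1 <- contents of x
        LDR     r0, =y             ; address of memory location y
        LDR     r2, [r0]           ; R2 <- contents of y
        ADD     r3, r1, r2         ; the ALU operates only on registers
        LDR     r0, =z             ; address of memory location z
        STR     r3, [r0]           ; store the contents of R3 to z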


Figure 3.3: Load Store architecture

Instruction Pipelining
The conventional instruction execution by the processor follows the fetch-decode-execute
sequence, where the 'fetch' part fetches the instruction from program memory and the
'decode' part decodes the instruction. The execute stage reads the operands, performs the ALU
operation and stores the result.
In conventional program execution, the fetch, decode and execute operations are performed in
sequence. By pipelining, the processing speed can be increased.
Instruction pipelining refers to the overlapped execution of instructions. Under normal
program execution the program counter will hold the address of the next instruction while the
decoding and execution of the current instruction are in progress. If the instruction in
progress is a branch instruction, like a jump or call, there is no point in fetching the instruction
that follows it; in such cases the fetched instruction is flushed and a new fetch is performed
to fetch the instruction at the branch target.

Whenever the current instruction is executing the program counter will be loaded with the
address of the next instruction. In case of jump or branch instruction, the new location is
known only after completion of the jump or branch instruction. Depending on the stages
involved in an instruction, there can be multiple levels of instruction pipelining.

Figure below illustrates the concept of Instruction pipelining for single stage pipelining.

Figure 3.4: The concept of Instruction pipelining for single stage pipelining


Application Specific Integrated Circuits (ASICs)


An Application Specific Integrated Circuit (ASIC) is a microchip designed to perform a specific
or unique application. It integrates several functions into a single chip and thereby reduces
the system development cost.
Most ASICs are proprietary products. As a single chip, an ASIC consumes a very small
area in the total system and thereby helps in the design of smaller systems with high
capabilities/functionalities.
An ASIC can be custom fabricated using components from a re-usable 'building block' library
of components for a particular customer application.

 Programmable Logic Devices


Logic devices provide specific functions, including device-to-device interfacing, data
communication, signal processing, data display, timing and control operations, and almost
every other function a system must perform.
Logic devices can be classified into two broad categories—fixed and programmable.
The circuits in a fixed logic device are permanent: they perform one function or set of
functions, and once manufactured they cannot be changed.
On the other hand, Programmable Logic Devices (PLDs) offer customers a wide range of
logic capacity, features, speed and voltage characteristics, and these devices can be
re-configured to perform any number of functions at any time.
Examples: a network router, a DSL modem, a DVD player, or an automotive navigation system.
The key benefit of using PLDs is that during the design phase customers can change the
circuitry as often as they want until the design operates to their satisfaction. This is because
PLDs are based on re-writable memory technology; to change the design, the device is simply reprogrammed.
The two major types of programmable logic devices are
i. Field Programmable Gate Arrays (FPGAs)
ii. Complex Programmable Logic Devices (CPLDs)

- Advantages of PLDs
1) PLDs offer customer much more flexibility during the design cycle.
2) PLDs do not require long lead times for prototypes or production parts because PLDs are
already on a distributor’s shelf and ready for shipment.
3) PLDs can be reprogrammed even after a piece of equipment is shipped to a customer

 Commercial off-the-shelf components (COTs)


1) A Commercial off-the-Shelf (COTS) product is one which is used 'as-is'.
2) The COTS component itself may be developed around a general purpose or domain
specific processor, an ASIC or a PLD.
3) The major advantage of using COTS components is that they are readily available in the market,
are cheap, and a developer can cut down his/her development time to a great extent.


4) The major drawback of using COTS components in embedded design is that the
manufacturer of the COTS component may withdraw the product or discontinue the
production of the COTS at any time if rapid change in technology occurs.

Advantages of COTS:
1) Ready to use
2) Easy to integrate
3) Reduces development time

Disadvantages of COTS:
1) No operational or manufacturing standard (all proprietary)
2) Vendor or manufacturer may discontinue production of a particular COTS product

 Memory
Memory is an important part of a processor/controller based embedded system. Some
processors/controllers contain built-in memory, referred to as on-chip memory.
Others do not contain any memory inside the chip and require external memory to be
connected to the controller/processor to store the control algorithm; this is called off-chip
memory.

Also, some working memory is required for holding data temporarily during certain
operations.

The different types of memory used in embedded system applications are described below.

Program Storage Memory (ROM)


The program memory or code storage memory of an embedded system stores the
program instructions. It retains its contents even after the power is turned off and is
generally known as non-volatile storage memory.

It can be classified into different types as shown in the block diagram.

Figure 3.5: Classification of memories




i. Masked ROM (MROM) - Masked ROM is a one-time programmable device. It makes use
of hardwired technology for storing data. The device is factory programmed by a masking
and metallisation process at the time of production itself, according to the data provided by
the end user.

The primary advantage of this is low cost for high volume production. They are the least
expensive type of solid state memory

ii. Programmable Read Only Memory (PROM) / (OTP)- Unlike Masked ROM Memory,
One Time Programmable Memory (OTP) or PROM is not pre-programmed by the
manufacturer. The end user is responsible for programming these devices. This memory has
nichrome or polysilicon wires arranged in a matrix. These wires can be functionally viewed
as fuses.

It is programmed by a PROM programmer which selectively burns the fuses according to the
bit pattern to be stored. Fuses which are not blown/burned represent a logic "1" whereas
fuses which are blown/burned represent a logic "0". The default state is logic "1".

iii. Erasable Programmable Read Only Memory (EPROM) - OTPs are not useful or
worthwhile for development purposes.

During the development phase the code is subject to continuous changes and using an OTP
each time to load the code is not economical. Erasable Programmable Read Only Memory
(EPROM) gives the flexibility to re-program the same chip.

EPROM stores the bit information by charging the floating gate of an FET. Bit information is
stored by using an EPROM programmer, which applies high voltage to charge the floating
gate. EPROM contains a quartz crystal window for erasing the stored information. If the
window is exposed to ultraviolet rays for a fixed duration, the entire memory will be erased.

iv. Electrically Erasable Programmable Read Only Memory (EEPROM)- As the name
indicates, the information contained in the EEPROM memory can be altered by using
electrical signals at the register/Byte level. They can be erased and reprogrammed in-circuit.

These chips include a chip erase mode and in this mode they can be erased in a few
milliseconds. It provides greater flexibility for system design. The only limitation is their
capacity is limited when compared with the standard ROM (A few kilobytes).

v. FLASH - FLASH is the latest and most popular ROM technology used in today's embedded
designs. FLASH memory is a variation of EEPROM technology. It combines the
re-programmability of EEPROM and the high capacity of standard ROMs.

FLASH memory is organised as sectors (blocks) or pages. It stores information in an array of


floating gate MOSFET transistors. The erasing of memory can be done at sector level or page
level without affecting the other sectors or pages. Each sector/page should be erased before
re-programming. The typical erasable capacity of FLASH is 1000 cycles.

vi. NVRAM- Non-volatile RAM is a random access memory with battery backup. It contains
static RAM based memory and a minute battery for providing supply to the memory in the


absence of external power supply. The memory and battery are packed together in a single
package.

 Read-Write Memory/Random Access Memory (RAM)


RAM is the data memory or working memory of the controller/processor.
Controller/processor can read from it and write to it.
RAM is volatile - means when the power is turned off, all the contents are destroyed.
RAM is a direct access memory - means we can access the desired memory location directly
without the need for traversing through the entire memory locations to reach the desired
memory position (i.e. random access of memory location).

RAM generally falls into three categories as shown in below figure

1. Static RAM (SRAM)


2. Dynamic RAM (DRAM)
3. Non-volatile RAM (NVRAM)

Figure 3.6

1. Static RAM (SRAM)- Static RAM stores data in the form of voltage. They are made up of
flip-flops. Static RAM is the fastest form of RAM available.

In typical implementation, an SRAM cell (bit) is realised using six transistors (or 6
MOSFETs). Four of the transistors are used for building the latch (flip-flop) part of the
memory cell and two for controlling the access.

Figure 3.7: SRAM cell implementation


2. Dynamic RAM (DRAM) – Dynamic RAM stores data in the form of charge. They
are made up of MOS transistor gates.
The advantages of DRAM are its high density and low cost compared to SRAM.
The disadvantage is that since the information is stored as charge it gets leaked off with time
and to prevent this they need to be refreshed periodically.

Special circuits called DRAM controllers are used for the refreshing operation. The refresh
operation is done periodically, at millisecond intervals.

Below figure illustrates the typical implementationof a DRAM cell.

Figure 3.8: DRAM Cell implantation

3. NVRAM- Non-volatile RAM is a random access memory with battery backup. It contains
static RAM based memory and a minute battery for providing supply to the memory in the
absence of external power supply.
The memory and battery are packed together in a single package. NVRAM is used for the
non-volatile storage of results of operations or for setting up of flags, etc.

 Memory Shadowing
Generally, the execution of a program or a configuration from a Read Only Memory (ROM)
is very slow (120 to 200 ns access time) compared to execution from a Random Access Memory
(40 to 70 ns access time); RAM access is about three times as fast as ROM access.

Shadowing of memory is a technique adopted to solve the execution speed problem in


processor-based systems.

Computer systems contain a configuration-holding ROM called the Basic Input Output
System (BIOS) ROM. The system BIOS stores hardware configuration information, like the
addresses assigned to the various serial ports, etc.

During system boot up the BIOS is read and the system is configured according to it, which is
time consuming.

Manufacturers therefore include a RAM behind the logical layer of the BIOS, at the same address,
as a shadow to the BIOS. The first step during boot up is copying the BIOS into
the shadowed RAM, write-protecting that RAM and then disabling reads from the BIOS ROM.


RAM is volatile and it cannot hold the configuration data which is copied from the BIOS
when the power supply is switched off. Only a ROM can hold it permanently. But for high
system performance it should be accessed from a RAM instead of accessing from a ROM.

 Sensors and Actuators


 An embedded system is in constant interaction with the real world, and the
controlling/monitoring functions executed by the embedded system are carried out in
accordance with the changes happening in the real world.

The changes in system environment or variables are detected by the sensors connected to
the input port of the embedded system. If the embedded system is designed for any
controlling purpose, the system will produce some changes in the controlling variable to
bring the controlled variable to the desired value. It is achieved through an actuator
connected to the output port of the embedded system.

If the embedded system is designed for monitoring purposes only, then there is no need for
including an actuator in the system. For example, take the case of an ECG machine: it is
designed to monitor the heart beat status of a patient and it cannot impose any control over the
patient's heart beat.

Sensors - A sensor is a transducer device that converts energy from one form to another for
any measurement or control purpose.

Actuators -Actuator is a form of transducer device (mechanical or electrical) which converts


signals to corresponding physical action (motion). It acts as an output device.

 The I/O Subsystem


The I/O subsystem of the embedded system facilitates the interaction of the embedded system
with the external world.

Interaction happens through the sensors and actuators connected to the input and output ports
respectively of the embedded system. The sensors may not be directly interfaced to the input
ports, instead they may be interfaced through signal conditioning and translating systems like
ADC, optocouplers, etc.

Light Emitting Diode (LED) - The Light Emitting Diode (LED) is an important output device for
visual indication in any embedded system.

LED can be used as an indicator for the status of various signals or situations. Typical
examples are indicating power conditions like ‘Device ON’, ‘Battery low’ or
‘Charging of battery’ for battery operated handheld embedded devices.

Light Emitting Diode is a p-n junction diode and it contains an anode and a cathode. For
proper functioning of the LED, the anode of it should be connected to +ve terminal of the
supply voltage and cathode to the —ve terminal of supply voltage.

The current flowing through the LED must be limited to a value below the maximum current
that it can conduct. A resistor is used in series between the power supply and the LED to limit
the current through the LED.


The ideal LED interfacing circuit is shown in below Figure.

Figure 3.9

LEDs can be interfaced to the port pin of a processor/controller in two ways.


i). The anode is directly connected to the port pin and the port pin drives the LED. The
port pin sources current to the LED when the port pin is at logic High (Logic ‘1’).
ii). The cathode of the LED is connected to the port pin of the processor/controller and
the anode to the supply voltage through a current limiting resistor. The LED is turned on
when the port pin is at logic Low (Logic ‘0’).
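The two interfacing methods listed above can be sketched in firmware. The snippet below is only
a minimal illustration in Keil C51 style for an 8051, assuming the LED is wired to port pin P1.0;
the function names are illustrative, not part of any standard API.

#include <reg51.h>               /* Keil C51 SFR definitions for the 8051 */

sbit LED = P1^0;                 /* assumption: LED interfaced to port pin P1.0 */

void led_on_sourcing(void)  { LED = 1; }   /* case (i):  port pin sources current, logic 1 turns the LED ON */
void led_off_sourcing(void) { LED = 0; }

void led_on_sinking(void)   { LED = 0; }   /* case (ii): port pin sinks current, logic 0 turns the LED ON  */
void led_off_sinking(void)  { LED = 1; }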

7-Segment LED Display - The 7-segment LED display is an output device for displaying
alphanumeric characters. It contains 8 light-emitting diode (LED) segments arranged in a
special form. Out of the 8 LED segments, 7 are used for displaying alphanumeric characters
and 1 is used for representing the decimal point.

Below figure shows the arrangement of LED segments in a 7-segment LED display.

Figure 3.10: 7-Segment LED Display

The LED segments are named A to G and the decimal point LED segment is named as DP.
The LED segments A to G and DP should be lit (ON) accordingly to display numbers and
characters.


For example, for displaying the number 4, the segments F, G, B and C are lit (ON). For
displaying 3, the segments A, B, C, D and G are lit. For displaying the character ‘d’, the
segments B, C, D, E and G are lit.

All these 8 LED segments need to be connected to one port of the processor/controller for
displaying alpha numeric digits.

The 7-segmentLED displays are available in two different configurations, namely


i. Common Anode
ii. Common Cathode.

In the common anode configuration, the anodes of the 8 LED segments share a common anode line.

Figure 3.11

Whereas in the common cathode configuration, the 8 LED segments share a common
cathode line.

Figure 3.12

Based on the configuration of the 7-segment LED unit, the LED segment's anode or cathode
is connected to the port of the processor/controller in the order: ‘A’ segment to the least
significant port pin and DP segment to the most significant port pin.

The current flow through each of the LED segments should be limited to the maximum value
supported by the LED display unit. It can be limited by connecting a current limiting resistor
to the anode or cathode of each segment.


For common cathode configurations, the anode of each LED segment is connected to the port
pins of the port to which the display is interfaced.

The anode of the common anode LED display is connected to the 5V supply voltage through
a current limiting resistor and the cathode of each LED segment is connected to the
respective port pin lines.

For an LED segment to light up in the common anode configuration, the port pin to which
the cathode of the LED segment is connected should be set at logic 0.
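As a rough illustration of how the segment patterns can be generated in firmware, the sketch below
uses a digit-to-segment lookup table with bit 0 = segment A, bit 1 = B, ... bit 6 = G and bit 7 = DP.
The Keil C51 header and the use of port P1 for the display data lines are assumptions about the
interfacing and should be adapted to the actual hardware.

#include <reg51.h>                 /* assumption: 8051 target, display data lines on port P1 */

/* Digit-to-segment lookup table (bit0 = A, bit1 = B, ... bit6 = G, bit7 = DP). */
static const unsigned char seg_code[10] = {
    0x3F, /* 0: A B C D E F   */
    0x06, /* 1: B C           */
    0x5B, /* 2: A B D E G     */
    0x4F, /* 3: A B C D G     */
    0x66, /* 4: F G B C       */
    0x6D, /* 5: A C D F G     */
    0x7D, /* 6: A C D E F G   */
    0x07, /* 7: A B C         */
    0x7F, /* 8: all segments  */
    0x6F  /* 9: A B C D F G   */
};

void display_digit(unsigned char digit)
{
    if (digit > 9)
        return;
    P1 = seg_code[digit];          /* common cathode: logic 1 lights a segment */
    /* P1 = ~seg_code[digit]; */   /* common anode:   logic 0 lights a segment */
}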

Stepper Motor- A stepper motor is an electro-mechanical device which generates discrete


displacement (motion) in response, to the electrical signals.

It differs from the normal DC motor in its operation. The DC motor produces continuous
rotation on applying DC voltage whereas a stepper motor produces discrete rotation in
response to the DC voltage applied to it.

Stepper motors are widely used in industrial embedded applications, consumer electronic
products and robotics control systems.

Based on the coil winding arrangements, a two-phase stepper motor is classified into two.
1. Unipolar
2. Bipolar

1. Unipolar-A unipolar stepper motor contains two windings per phase. The direction of
rotation (clockwise or anticlockwise) of a stepper motor is controlled by changing the
direction of current flow.
Current in one direction flows through one coil and in the opposite direction flows through
the other coil. It is easy to shift the direction of rotation by just switching the terminals to
which the coils are connected.
Below figure illustrates the working of a two-phase unipolar stepper motor.

Figure 3.13

The coils are represented as A, B, C and D. Coils A and C carry current in opposite directions
for phase 1. Similarly, B and D carry current in opposite directions for phase 2.


2. Bipolar- A bipolar stepper motor contains single winding per phase. For reversing the
motor rotation the current flow through the windings is reversed dynamically.

The stator winding details for a two phase unipolar stepper motor is shown in below figure.

Figure 3.14

The stepping of stepper motor can be implemented in different ways by changing the
sequence of activation of the stator windings.

The different stepping modes supported by stepper motor are explained below.
i. Full Step - In the full step mode both the phases are energised simultaneously. The
coils A, B, C and D are energised in the following order:

Step   Coil A   Coil B   Coil C   Coil D
 1       H        H        L        L
 2       L        H        H        L
 3       L        L        H        H
 4       H        L        L        H

ii. Wave Step - In the wave step mode only one phase is energised at a time and each coil
of the phase is energised alternately.

The coils A, B, C and D are energised in the following order:

Step   Coil A   Coil B   Coil C   Coil D
 1       H        L        L        L
 2       L        H        L        L
 3       L        L        H        L
 4       L        L        L        H

iii.Half Step - It uses the combination of wave and full step. It has the highest torque
and stability.

The coil energising sequence for half step is given below.


Step   Coil A   Coil B   Coil C   Coil D
 1       H        L        L        L
 2       H        H        L        L
 3       L        H        L        L
 4       L        H        H        L
 5       L        L        H        L
 6       L        L        H        H
 7       L        L        L        H
 8       H        L        L        H
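The energising sequences above map naturally onto lookup tables in firmware. The sketch below is
only illustrative: it assumes the coils A-D are driven (through a driver circuit) from the lower
nibble of port P1 of an 8051, and the software delay count must be tuned for the actual clock.
Traversing a table in the reverse order reverses the direction of rotation.

#include <reg51.h>                /* assumption: 8051 target, coils A-D on P1.0-P1.3 via a driver IC */

/* Bit 0 = Coil A, bit 1 = Coil B, bit 2 = Coil C, bit 3 = Coil D (H = 1, L = 0). */
static const unsigned char full_step[4] = { 0x03, 0x06, 0x0C, 0x09 };
static const unsigned char wave_step[4] = { 0x01, 0x02, 0x04, 0x08 };
static const unsigned char half_step[8] = { 0x01, 0x03, 0x02, 0x06, 0x04, 0x0C, 0x08, 0x09 };

static void step_delay(unsigned int count)
{
    while (count--)
        ;                          /* crude software delay between steps */
}

void rotate_full_step(unsigned char steps)
{
    unsigned char i;
    for (i = 0; i < steps; i++)
    {
        P1 = full_step[i % 4];     /* apply the next energising pattern to the coils */
        step_delay(500);
    }
}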

The following circuit diagram illustrates the interfacing of a stepper motor through a driver
circuit connected to the port pins of a microcontroller/processor.

Figure 3.15: Interfacing of stepper motor through a driver circuit


Keyboard - Keyboard is an input device for user interfacing.


Consider, for example, a PDA device with a large number of alphanumeric keys. In such
situations it may not be possible to interface each key to a port pin, due to the limited number
of general purpose port pins available on the processor/controller.

Matrix keyboard is an optimum solution for handling large key requirements. It greatly
reduces the number of interface connections. For example, for interfacing 16 keys, in the
direct interfacing technique 16 port pins are required, whereas in the matrix keyboard only 8
lines are required. The 16 keys are arranged in a 4 column x 4 Row matrix.

Below figure illustrates the connection of keys in a matrix keyboard.

Figure 3.16: Matrix keyboard interfacing


In a matrix keyboard, the keys are arranged in matrix fashion. For detecting a key press the
keyboard uses the scanning technique, where each row of the matrix is pulled low and the
columns are read.

After reading the status of each column corresponding to a row, the row is pulled high and
the next row is pulled low and the status of the columns is read. This process is repeated
until the scanning of all rows is completed.

When a row is pulled low and a key connected to that row is pressed, reading the column to
which the key is connected will give logic 0.

Since keys are mechanical devices, there is a possibility of de-bounce issues, which may
give a multiple key press effect for a single key press. To prevent this, a proper key de-
bouncing technique should be applied. Hardware key de-bouncer circuits and software key
de-bounce techniques are the de-bouncing techniques available.
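A minimal sketch of the scanning technique described above is given below. It assumes a 4 x 4
keypad with the rows driven from P1.0-P1.3 and the columns read on P1.4-P1.7 of an 8051; the
delay helper used for software de-bouncing is a hypothetical name.

#include <reg51.h>                 /* assumption: rows on P1.0-P1.3 (outputs), columns on P1.4-P1.7 (inputs) */

unsigned char scan_keypad(void)
{
    unsigned char row, col, cols;

    for (row = 0; row < 4; row++)
    {
        P1 = (unsigned char)(~(1 << row)) | 0xF0;   /* pull one row low, keep the column pins high (input) */
        cols = (P1 >> 4) & 0x0F;                    /* read the four column lines                          */
        for (col = 0; col < 4; col++)
        {
            if (!(cols & (1 << col)))               /* a pressed key pulls its column line to logic 0      */
            {
                /* delay_ms(20);  -- hypothetical helper for software key de-bouncing */
                return (unsigned char)(row * 4 + col);   /* key number 0..15 */
            }
        }
    }
    return 0xFF;                                    /* no key pressed */
}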


Module 4

Syllabus: Embedded System Design Concepts: Characteristics and Quality Attributes of
Embedded Systems, Operational quality attributes, Non-operational quality attributes,
Embedded Systems - Application and Domain specific, Hardware Software Co-Design and
Program Modelling, Embedded firmware design and development.

Text book 2: Chapter-3, Chapter-4, Chapter-7 (Sections 7.1, 7.2 only), Chapter-9
(Sections 9.1, 9.2, 9.3.1, 9.3.2 only)

Characteristics of an embedded system


Some of the important characteristics of an embedded system are:

1. Application and domain specific


2. Reactive and Real Time
3. Operates in harsh environments
4. Distributed
5. Small size and weight
6. Power concerns

1. Application and Domain specific


 An embedded system is designed for a specific purpose only. It will not do any other
task.
 Ex. A washing machine can only wash, it cannot cook
 Certain embedded systems are specific to a domain: ex. A hearing aid is an
application that belongs to the domain of signal processing.

2. Reactive and real time


 Certain Embedded systems are designed to react to the events that occur in the nearby
environment. These events also occur real-time.
 Ex. An air conditioner adjusts its mechanical parts as soon as it gets a signal from its
sensors to increase or decrease the temperature when the user operates it using a
remote control.
 An embedded system uses Sensors to take inputs and has actuators to bring out the
required functionality.

3. Operation in harsh environment


 Certain embedded systems are designed to operate in harsh environments like very
high temperature of the deserts or very low temperature of the mountains or extreme
rains.
 These embedded systems have to be capable of sustaining the environmental
conditions it is designed to operate in.
4. Distributed systems


 Certain embedded systems are part of a larger system and thus form components of a
distributed system.
 These components are independent of each other but have to work together for the
larger system to function properly.

 Ex. A car has many embedded systems connected to its dashboard. Each one is an
independent embedded system, yet the entire car can be said to function properly only
if all the systems work together.

5. Small size and weight


 An embedded system that is compact in size and has light weight will be desirable or
more popular than one that is bulky and heavy.

 Ex. Currently available cell phones. The cell phones that have the maximum features
are popular but also their size and weight is an important characteristic

6. Power concerns
 It is desirable that the power utilization and heat dissipation of any embedded system
be low.

 If more heat is dissipated then additional units like heat sinks or cooling fans need to
be added to the circuit.

 If more power is required then a battery of higher power or more batteries need to be
accommodated in the embedded system

Quality attributes of embedded system

Quality attributes are the non-functional requirements that need to be documented properly
in any system design.

The Quality attributes of any embedded system are classified into two, namely

1. Operational Quality Attributes


2. Non-Operational Quality Attributes

1. Operational Quality Attributes

The operational quality attributes represent the relevant quality attributes related to the
embedded system when it is in the operational mode or online mode.

The important quality attributes are

a. Response
b. Throughput
c. Reliability
d. Maintainability
e. Security
f. Safety


a) Response
 Response is a measure of quickness of the system.
 It gives you an idea about how fast your system is tracking the input variables.
 Most of the embedded system demand fast response which should be real-time.
 For example, an embedded system deployed in flight control application should
respond in a Real Time manner. Any response delay in the system will create
potential damages to the safety of the flight as well as the passengers.

b) Throughput
 Throughput deals with the efficiency of a system.
 It can be defined as the rate of production or the rate of a defined process over a
stated period of time.
 The rates can be expressed in terms of units of products, batches produced, or any
other meaningful measurements.
 In the case of a Card Reader, throughput means how many transactions the Reader
can perform in a minute or in an hour or in a day.
 Throughput is generally measured in terms of ‘Benchmark’. A ‘Benchmark’ is
a reference point by which something can be measured.

c) Reliability
 Reliability is a measure of how much one can rely upon the proper
functioning of the system.
 Mean Time between failures (MTBF) and Mean Time To Repair (MTTR) are
terms used in defining system reliability.
 Mean Time between failures can be defined as the average time the system
is functioning before a failure occurs.
 Mean time to repair can be defined as the average time the system has spent in repairs.

d) Maintainability
 Maintainability deals with support and maintenance to the end user or a client in
case of technical issues and product failures or on the basis of a routine system
check-up.

i. Scheduled or Periodic Maintenance


 This is the maintenance that is required regularly after a periodic time interval.
 Example: Periodic Cleaning of Air Conditioners Refilling of printer cartridges.

ii. Maintenance to unexpected failure


 This involves the maintenance due to a sudden breakdown in the functioning of
the system.
 Example: Air conditioner not powering on, Printer not taking paper in spite of a
full paper stack

e) Security
 Confidentiality, Integrity and Availability are the three corner stones of information
security. Confidentiality deals with the protection of data from unauthorized disclosure.


 Integrity gives protection from unauthorized modification.


 Availability deals with ensuring that the system and its information are available to authorized users when required.
 Certain Embedded systems have to make sure they conform to the security measures.
 Example: An Electronic Safety Deposit Locker can be used only with a pin
number like a password.

f) Safety
 Safety deals with the possible damage that can happen to the operating person and
environment due to the breakdown of an embedded system or due to the emission of
hazardous materials from the embedded products.

 A safety analysis is a must in product engineering to evaluate the anticipated damage


and determine the best course of action to bring down the consequence of damages to
an acceptable level.

2. Non Operational Attributes

The quality attributes that need to be addressed on aspects other than the operational
behaviour of the system are grouped under this category.

The important Non Operational Attributes quality attributes are

a. Testability & Debug-ability


b. Evolvability
c. Portability
d. Time to prototype and market
e. Per unit and total cost.

a) Testability and Debug-ability


 Testability deals with how easily one can test the design and the application, and by
which means it can be tested.
 Hardware testing checks whether the peripherals and the total hardware function in the
designed manner.
 Firmware testing checks whether the firmware functions in the expected way.
 Debug-ability is the means of debugging the product for figuring out the probable
sources that create unexpected behaviour in the total system.

b) Evolvability
 For an embedded system, the quality attribute “Evolvability” refers to the ease with which
the embedded product can be modified to take advantage of new firmware or
hardware technology.

c) Portability
 Portability is a measure of system independence.
 An embedded product can be called portable if it is capable of performing its
operations as intended in various environments, irrespective of the
processor/controller and the embedded operating system used.


d) Time to prototype and market


 Time to Market is the time elapsed between the conceptualization of a product and
time at which the product is ready for selling or use
 Product prototyping helps in reducing the time to market.
 Prototyping is an informal kind of rapid product development in which the
important features of the product under consideration are developed.
 In order to shorten the time to prototype, make use of all possible options, like
reuse of existing designs and use of off-the-shelf components.

e) Per unit and total cost


 Cost is an important factor which needs to be carefully monitored. Proper market
study and cost benefit analysis should be carried out before taking decision on per
unit cost of the embedded product.
 When the product is introduced in the market, the sales and revenue will be low for
the initial period and there won't be much competition.
 The product sales and revenue then increase during the growth phase.
 During the maturity phase, the growth becomes steady and the revenue reaches its
highest point; at the retirement stage there is a drop in sales volume.

Figure: Product life cycle (PLC) curve

Embedded Systems-Application and Domain specific


Embedded systems are application and domain specific, meaning; they are specifically built
for certain applications in certain domains like consumer electronics, telecom, automotive,
industrial control, etc.

Embedded systems are highly specialised in functioning and are dedicated for a specific
application.


WASHING MACHINE - Application-specific embedded system


An embedded system contains sensors, actuators, a control unit and application-specific
user interfaces like keyboards, display units, etc. You can see all these components in a
washing machine if you have a closer look at it. Some of them are visible and some of them
may be invisible to you.

Figure: Washing machine Functional block diagram

The actuator part of the washing machine consists of a motorised agitator, tumble tub, water
drawing pump and inlet valve to control the flow of water into the unit.

The sensor part consists of the water temperature sensor, level sensor, etc.

The control part contains a microprocessor/ controller based board with interfaces to the
sensors and actuators.

The sensor data is fed back to the control unit and the-control unit generates the necessary
actuator outputs.

The control unit also provides connectivity to user interfaces like keypad for setting the
washing time, selecting the type of material to be washed like light, medium, heavy duty, etc.
User feedback is reflected through the display unit and LEDs connected to the control board.

i. Water inlet control valve - Near the water inlet point of the washing machine there is a water
inlet control valve. When you load the clothes in the washing machine, this valve opens
automatically and closes automatically once the required quantity of water has been drawn.

ii. Water pump: The water pump circulates water through the washing machine. It works in
two directions, re-circulating the water during wash cycle and draining the water during the
spin cycle.

iii. Tub: There are two types of tubs in the washing machine: inner and outer. The clothes
are loaded in the inner tub, where the clothes are washed, rinsed and dried. The inner tub has
small holes for draining the water. The external tub covers the inner tub and supports it
during various cycles of clothes washing.

iv. Agitator or rotating disc: The agitator is located inside the tub of the washing machine.
It is the important part of the washing machine that actually performs the cleaning operation
of the clothes.

During the wash cycle the agitator rotates continuously and produces strong rotating currents
within the water due to which the clothes also rotate inside the tub. The rotation of the clothes
within water containing the detergent enables the removal of the dirt particles from the fabric
of the clothes.

In some washing machines, instead of the long agitator, there is a disc that contains blades on
its upper side. The rotation of the disc and the blades produce strong currents within the water
and the rubbing of clothes that helps in removing the dirt from clothes.

v. Motor of the washing machine: The motor is coupled to the agitator or the disc and
produces its rotary motion. These are multispeed motors, whose speed can be changed as per
the requirement. In a fully automatic washing machine the speed of the motor, i.e. the
agitator, changes automatically as per the load on the washing machine.

vi. Timer: The timer helps setting the wash time for the clothes manually. In the automatic
mode the time is set automatically depending upon the number of clothes inside the washing
machine.

vii. Printed circuit board (PCB): The PCB comprises the various electronic components
and circuits, which are programmed to perform in unique ways depending on the load
conditions (the condition and the amount of clothes loaded in the washing machine). They are
sort of artificial intelligence devices that sense the various external conditions and take the
decisions accordingly. These are also called as fuzzy logic systems. Thus the PCB will
calculate the total weight of the clothes, and find out the quantity of water and detergent
required, and the total time required for washing the clothes. Then they will decide the time
required for washing and rinsing. The entire processing is done on a kind of processor which
may be a microprocessor or microcontroller.

viii. Drain pipe: The drain pipe enables removal of the dirty water from the washing machine
after it has been used for the washing purpose.


AUTOMOTIVE - Domain-specific examples of embedded system


The major domains of embedded systems are consumer, industrial, automotive, telecom, etc.,
of which telecom and automotive industry holds a big market share.

 Inner Workings of Automotive Embedded Systems - where electronics take


control over the mechanical systems.

The presence of automotive embedded system in a vehicle varies from simple mirror and
wiper controls to complex air bag controller and antilock brake systems (ABS).

Automotive embedded systems are normally built around microcontrollers or DSPs and are
generally known as Electronic Control Units (ECUs).

The number of embedded controllers in an ordinary vehicle varies from 20 to 40, whereas a
luxury vehicle like the Mercedes S-class or BMW 7-series may contain 75 to 100 embedded
controllers.

The first embedded system used in automotive application was the microprocessor based
fuel injection system introduced by Volkswagen 1600 in 1968.

The various types of electronic control units (ECUs) used in the automotive embedded
industry are:
1. High-speed embedded control units
2. Low-speed embedded control units.

1. High-speed Electronic Control Units (HECUs) - High-speed electronic control units


(HECUs) are deployed in critical control units requiring fast response. They include fuel
injection systems, antilock brake systems, engine control, electronic throttle, steering
controls, transmission control unit and central control unit.

2. Low-speed Electronic Control Units (LECUs) - Low-speed Electronic Control Units
(LECUs) are deployed in applications where the response time is not so critical. They are
generally built around low cost microprocessors/microcontrollers and digital signal processors.
Audio controllers, passenger and driver door locks, door glass controls (power windows), wiper
control, mirror control, seat control systems, etc. are examples of LECUs.


Figure: Embedded system in automotive - domain


 Automotive Communication Buses
Automotive applications make use of serial buses for communication, which greatly reduces
the amount of wiring required inside a vehicle.

Different types of serial interface buses deployed in automotive embedded applications are

1. Controller Area Network (CAN) - The CAN bus was originally proposed by Robert
Bosch. It supports a medium-speed mode with data rates up to 125 Kbps and a high-speed
mode with data rates up to 1 Mbps.

CAN is an event-driven protocol interface with support for error handling in data
transmission. It is generally employed in safety system like airbag control; power train
systems like engine control and Antilock Brake System (ABS); and navigation systems like
GPS etc.

2. Local Interconnect Network (LIN) - LIN bus is a single master multiple slave (up to 16
independent slave nodes) communication interface. LIN is a low speed, single wire
communication interface with support for data rates up to 20 Kbps and is used for
sensor/actuator interfacing.

LIN bus is employed in applications like mirror controls, fan controls, seat positioning
controls, window controls and position controls, where response time is not a critical issue.

3. Media-Oriented System Transport (MOST) Bus - The Media-oriented system


transport (MOST) is targeted for automotive audio/video equipment interfacing, used
primarily in European cars.


A MOST bus is a multimedia fibre-optic point-to-point network implemented in a star, ring
or daisy-chained topology over optical fibre cables.

The MOST bus-specifications define the physical layer as well as the application layer,
network layer, and media access control.

Physically, the MOST bus is an optical fibre cable connected between an Electrical Optical
Converter (EOC) and an Optical Electrical Converter (OEC), which translate the electrical
signals to and from the optical MOST bus.

Fundamental issues in Hardware Software Co-design


The following issues are some of the fundamental issues in hardware software co-design.
1. Selecting the model - System models are used for capturing and describing the system
characteristics. A model is a formal system consisting of objects and composition rules.

It is hard to make a decision on which model should be followed in a particular system
design. Most often designers switch between a variety of models, from the requirements
specification to the implementation of the system design.

2. Selecting the Architecture - A model only captures the system characteristics and does
not provide information on how the system can be manufactured.

The architecture specifies how the system is going to be implemented, in terms of the number
and different types of components and the interconnections among them.

Some of the commonly used architectures in system design are

i. The controller architecture - implements the finite state machine model using a state
register and two combinational circuits. The state register holds the present state and the
combinational circuits implement the logic for the next state and the output.

ii. The datapath architecture - is best suited for implementing the data flow graph model
where the output is generated as a result of a set of predefined computations on the input data.

A datapath represents a channel between the input and output and in datapath architecture the
datapath may contain registers, counters, register files, memories and ports along with high
speed arithmetic units.

iii. The Finite State Machine Datapath (FSMD) – this architecture combines the controller
architecture with datapath architecture. It implements a controller with datapath.

The controller generates the control input whereas the datapath processes the data. The
datapath contains two types of I/O ports, out of which one acts as the control port for
receiving/sending the control signals from/to the controller unit and the second I/O port
interfaces the datapath with external world for data input and data output.

iv. The Complex Instruction Set Computing (CISC) - architecture uses an instruction set
representing complex operations. It is possible for a CISC instruction set to perform a large
complex operation with a single instruction.


The use of a single complex instruction in place of multiple simple instructions greatly
reduces the program memory access and program memory size requirement.

v. The Very Long Instruction Word (VLIW) - architecture implements multiple functional
units (ALUs, multipliers, etc.) in the datapath. The VLIW instruction packages one standard
instruction per functional unit of the datapath.

vi. Parallel processing architecture - implements multiple concurrent Processing Elements


(PEs) and each processing element may associate a datapath containing register and local
memory.

Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD)
architectures are examples for parallel processing architecture.

In SIMD architecture, a single instruction is executed in parallel with the help of the
Processing Elements. On the other hand, the processing elements of the MIMD architecture
execute different instructions at a given point of time.

3. Selecting the language - A programming language captures a Computational Model and


maps it into an architecture. Any programming language can be used.

A model can be captured using multiple programming languages like C, C++, C#, Java, etc.
for software implementations and for hardware implementations languages like VHDL,
System C, Verilog, etc.

On the other hand, a single language can be used for capturing a variety of models. Certain
languages are good in capturing certain computational model. For example, C++ is a good
candidate for capturing an object oriented model.

4. Partitioning System Requirements into hardware and software - From an


implementation perspective, it may be possible to implement the system requirements in
either hardware or software (firmware). It is a tough decision-making task to figure out which
one to opt for. Various hardware-software trade-offs are considered for making a decision on
the hardware-software partitioning.

Computational Models in Embedded design


The commonly used computational models in embedded system design are
1. Data Flow Graph (DFG) model
2. Control Data Flow Graph (CDFG) model
3. State Machine model
4. Concurrent Process model
5. Sequential Program model
6. Object oriented model.

1. Data Flow Graph/Diagram (DFG) Model


The Data Flow Graph (DFG) model translates the data processing requirements into a
data flow graph. It is a data driven model in which the program execution is determined by
data.

This model emphasises on the data and operations.


In Data Flow Graph (DFG) model the operation on the data (process) is represented using a
block (circle) and data flow is represented using arrows. An inward arrow to the process
(circle) represents input data and outward arrow from the process (circle) represents
output data.
Now let’s have a look at the implementation of a DFG. Suppose one of the functions in our
application contains the computational requirement x = a + b; and y = x - c.

Below figure illustrates the implementation of a DFG model for implementing these
requirements.

Figure: Data Flow Graph (DFG) model

In a DFG model, a data path is the data flow path from input to output. A DFG model is said
to be acyclic DFG (ADFG) if it doesn’t contain multiple values for the input variable and
multiple output values for a given set of input(s).

Feedback inputs (Output is fed back to Input), events, etc. are examples for non-acyclic inputs.
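As a trivial illustration, the two data flow nodes above map directly onto two sequential
operations in code (the function name is only for illustration):

int compute(int a, int b, int c)
{
    int x = a + b;      /* first data flow node:  x = a + b */
    int y = x - c;      /* second data flow node: y = x - c */
    return y;
}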

2. Control Data Flow Graph/Diagram (CDFG)


The Control DFG (CDFG) model involves conditional program execution. CDFG models
contains both data operations and control operations.

The CDFG uses Data Flow Graph (DFG) as element and conditional (constructs) as decision
makers. CDFG contains both data flow nodes and decision nodes, whereas DFG contains
only data flow nodes.

Let us have a look at the implementation of the CDFG for the following requirement.
If flag = 1, then x = a + b; else y = a - b. This requirement contains a decision making process.


Figure: Control Data Flow Graph model

The control node is represented by a ‘Diamond’ block which is the decision making
element. The decision on which process is to be executed is determined by the control node.
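The same requirement can be written as a small code fragment, where the if/else corresponds to
the decision (control) node and the two assignments correspond to the data flow nodes; the
function and parameter names are only for illustration.

void compute(int flag, int a, int b, int *x, int *y)
{
    if (flag == 1)          /* decision (control) node            */
        *x = a + b;         /* data flow node on the true branch  */
    else
        *y = a - b;         /* data flow node on the false branch */
}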

3. State Machine Model


The State Machine model is used for modelling reactive or event-driven embedded systems
whose processing behaviour is dependent on state transitions. Embedded systems used in
control and industrial applications are typical examples of event driven systems.

The State Machine model describes the system behavior with ‘States’, ‘Events’, ‘Actions’
and ‘Transitions’.

State is a representation of a current situation. An event is an input to the state. The event
acts as stimuli for state transition. Transition is the movement from one state to another.
Action is an activity to be performed by the state machine.
A Finite State Machine (FSM) model is one in which the number of states are finite. As an
example let us consider the design of an embedded system for driver/passenger ‘Seat Belt
Warning’ in an automotive using the FSM model.

The system requirements are

1. When the vehicle ignition is turned on and the seat belt is not fastened within 10 seconds
of ignition ON, the system generates an alarm signal for 5 seconds.

2. The Alarm is turned off when the alarm time (5 seconds) expires or if the
driver/passenger fastens the belt or if the ignition switch is turned off, whichever happens
first.

Here the states are ‘Alarm Off’, ‘Waiting’ and ‘Alarm On’, and the events are ‘Ignition Key
ON’, ‘Ignition Key OFF’, ‘Timer Expire’, ‘Alarm Time Expire’ and ‘Seat Belt ON’.


Figure: FSM Model for automatic seat belt warning

Using the FSM, the system requirements can be modelled as shown in the above figure.

The ‘Ignition Key ON’ event triggers the 10 second timer and transitions the state to ‘Waiting’.

If a ‘Seat Belt ON’ or ‘Ignition Key OFF’ event occurs during the wait state, the state
transitions into ‘Alarm Off’.

When the wait timer expires in the waiting state, the event ‘Timer Expire’ is generated and it
transitions the state to ‘Alarm On’ from the ‘Waiting’ state.

The ‘Alarm On’ state continues until a ‘Seat Belt ON’ or ‘Ignition Key OFF’ event or ‘Alarm
Time Expire’ event, whichever occurs first. The occurrence of any of these events transitions
the state to ‘Alarm Off’.

The wait state is implemented using a timer. The timer also has a certain set of states and
events for state transitions. Using the FSM model, the timer can be modelled as shown in the
figure below.

Figure: FSM Model for timer



The timer state can be either ‘IDLE’ or ‘READY’ or ‘RUNNING’. During the normal
condition when the timer is not running, it is said to be in the ‘IDLE’ state.

The timer is said to be in the ‘READY’ state when the timer is loaded with the count
corresponding to the required time delay.

The timer remains in the ‘READY’ state until a ‘Start Timer’ event occurs. The timer
changes its state to ‘RUNNING’ from the ‘READY’ state on receiving a ‘Start Timer’ event
and remains in the ‘RUNNING’ state until the timer count expires or a ‘Stop Timer’ event
occurs.

The timer state changes to ‘IDLE’ from ‘RUNNING’ on receiving a ‘Stop Timer’ or ‘Timer
Expire’ event.
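The seat belt warning FSM described above can be captured in C as a switch-based state machine,
as in the sketch below. This is only an illustration; the helper functions (start_wait_timer(),
start_alarm(), etc.) are hypothetical names assumed to be implemented elsewhere in the firmware.

typedef enum { ALARM_OFF, WAITING, ALARM_ON } state_t;
typedef enum { IGNITION_ON, IGNITION_OFF, TIMER_EXPIRE, ALARM_TIME_EXPIRE, SEAT_BELT_ON } event_t;

/* Hypothetical helpers assumed to be implemented elsewhere in the firmware. */
void start_wait_timer(unsigned int seconds);
void start_alarm_timer(unsigned int seconds);
void start_alarm(void);
void stop_alarm(void);

static state_t state = ALARM_OFF;

void seat_belt_fsm(event_t e)
{
    switch (state)
    {
    case ALARM_OFF:
        if (e == IGNITION_ON) { start_wait_timer(10); state = WAITING; }
        break;

    case WAITING:
        if (e == SEAT_BELT_ON || e == IGNITION_OFF)
            state = ALARM_OFF;
        else if (e == TIMER_EXPIRE)
        {
            start_alarm();
            start_alarm_timer(5);
            state = ALARM_ON;
        }
        break;

    case ALARM_ON:
        if (e == SEAT_BELT_ON || e == IGNITION_OFF || e == ALARM_TIME_EXPIRE)
        {
            stop_alarm();
            state = ALARM_OFF;
        }
        break;
    }
}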

 Example 1
Design an automatic tea/coffee vending machine based on FSM model for the following
requirement.

The tea/coffee vending is initiated by user inserting a 5 rupee coin. After inserting the
coin, the user can either select ‘Coffee’ or ‘Tea’ or press ‘Cancel’ to cancel the order
and take back the coin.

The FSM representation for the above requirement is shown in the below figure.

It contains four states namely


i. Wait for coin
ii. Wait for User Input
iii. Dispense Tea
iv. Dispense Coffee.

The event ‘Insert Coin’ (5 rupee coin insertion), transitions the state to ‘Wait for User Input’.
The system stays in this state until a user input is received from the buttons ‘Cancel’, ‘Tea’ or
‘Coffee’.
If the event triggered in the ‘Wait State’ is a ‘Cancel’ button press, the coin is pushed out and the
state transitions to ‘Wait for Coin’.

If the event received in the ‘Wait State’ is either “Tea’ button press, or ‘Coffee’ button press,
the state changes to ‘Dispense Tea’ and ‘Dispense Coffee’ respectively.

Once the coffee/tea vending is over, the respective states transitions back to the ‘Wait for
Coin’ state.

A few modifications like adding a timeout for the ‘Wait State’ (Currently the ‘Wait State’ is
infinite; it can be re-designed to a timeout based ‘Wait State’. If no user input is received
within the timeout period, the coin is returned back and the state automatically transitions to
‘Wait for Coin’ on the timeout event) and capturing another events like, ‘Water not
available’, ‘Tea/Coffee Mix not available’ and changing the state to an ‘Error State’ can be
added to enhance this design.
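As a contrast to the switch-based sketch shown earlier, such an FSM can also be implemented with
a state transition table, as in the rough sketch below for the four states and events above;
return_coin() is a hypothetical helper.

enum { WAIT_FOR_COIN, WAIT_FOR_INPUT, DISPENSE_TEA, DISPENSE_COFFEE, NUM_STATES };
enum { INSERT_COIN, CANCEL, TEA_BTN, COFFEE_BTN, DISPENSE_DONE, NUM_EVENTS };

void return_coin(void);   /* hypothetical helper: pushes the coin back out */

/* next_state[current state][event]; events that are not valid in a state leave it unchanged. */
static const unsigned char next_state[NUM_STATES][NUM_EVENTS] = {
/* WAIT_FOR_COIN   */ { WAIT_FOR_INPUT,  WAIT_FOR_COIN,   WAIT_FOR_COIN,   WAIT_FOR_COIN,   WAIT_FOR_COIN  },
/* WAIT_FOR_INPUT  */ { WAIT_FOR_INPUT,  WAIT_FOR_COIN,   DISPENSE_TEA,    DISPENSE_COFFEE, WAIT_FOR_INPUT },
/* DISPENSE_TEA    */ { DISPENSE_TEA,    DISPENSE_TEA,    DISPENSE_TEA,    DISPENSE_TEA,    WAIT_FOR_COIN  },
/* DISPENSE_COFFEE */ { DISPENSE_COFFEE, DISPENSE_COFFEE, DISPENSE_COFFEE, DISPENSE_COFFEE, WAIT_FOR_COIN  }
};

static unsigned char vm_state = WAIT_FOR_COIN;

void vending_machine_event(unsigned char event)
{
    if (vm_state == WAIT_FOR_INPUT && event == CANCEL)
        return_coin();
    vm_state = next_state[vm_state][event];
}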


Figure: FSM Model automatic tea/coffee vending machine

 Example 2
Design a coin operated public telephone unit based on FSM model for the following
requirements.
1. The calling process is initiated by lifting the receiver (off-hook) of the telephone unit.
2. After lifting the phone the user needs to insert a 1 rupee coin to make the call.
3. If the line is busy, the coin is returned on placing the receiver back on the hook.
4. If the line is through, the user is allowed to talk till 60 seconds and at the end of
45 second, prompt for inserting another 1 rupee coin for continuing the call is
initiated
5. If the user doesn’t insert another 1 rupee coin, the call is terminated on completing
the 60 seconds time slot.
6. The system is ready to accept new call request when the receiver is placed back on
the hook.
7. The system goes to the ‘Out of Order’ state when there is a line fault.

The FSM representation for the above requirement is shown in the below figure.


Figure: FSM Model for Coin Operated Telephone System

4. Sequential Program Model


In the sequential programming Model, the functions or processing requirements are executed
in sequence.

Here the program instructions are iterated and executed conditionally and the data gets
transformed through a series of operations. The important tool used for modelling sequential
program is Flow Charts.

Example sequential program model for the ‘Seat Belt Warning’ system is illustrated below.

#define ON 1
#define OFF 0
#define YES 1
#define NO 0

void seat_belt_warn()
{
    wait_10sec();
    if (check_ignition_key() == ON)
    {
        if (check_seat_belt() == OFF)
        {
            set_timer(5);
            start_alarm();
            /* Keep the alarm on while the seat belt is OFF, the ignition key is
               still ON and the 5 second alarm timer has not expired. */
            while ((check_seat_belt() == OFF) && (check_ignition_key() == ON) &&
                   (timer_expire() == NO));
            stop_alarm();
        }
    }
}

Figure: Sequential program Model for seat belt warning system


5. Concurrent/Communicating Process Model


• The concurrent or communicating process model executes tasks/processes concurrently.
Concurrent models use the CPU effectively, and it is easier to implement certain
requirements in the concurrent processing model than with conventional sequential
execution.
• Sequential execution leads to a single sequential flow of tasks and thereby to
poor processor utilisation.
• If a task is split into multiple subtasks, it is possible to utilise the CPU
effectively.
• However, the concurrent processing model requires additional overheads for task
scheduling, task synchronisation and communication.
Example – let us examine how to implement the ‘Seat Belt Warning’ system in the concurrent
processing model. The tasks involved are:
1. Timer task for waiting 10 seconds (wait timer task)
2. Task for checking the ignition key status (ignition key status monitoring task)
3. Task for checking the seat belt status (seat belt status monitoring task)
4. Task for starting and stopping the alarm (alarm control task)
5. Alarm timer task for waiting 5 seconds (alarm timer task)

Figure: Tasks for ‘Seat Belt Warning System


Figure: Concurrent processing Program model for Seat Belt Warning System
• We have five tasks here and we cannot execute them sequentially or randomly. We need
to synchronize their execution through some mechanism.
• We need to start the alarm only after the expiration of the 10 second wait timer, and that too
only if the seat belt is OFF and the ignition key is ON.
• Hence the alarm control task is executed only when the wait timer has expired and if the
ignition key is in the ON state and the seat belt is in the OFF state.
• Here we will use events to indicate these scenarios.
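To make the idea of cooperating tasks and synchronising events concrete, the sketch below models
just two of the five tasks (the wait timer task and the alarm control task) using POSIX threads and
a mutex-protected set of flags. This is a desktop-style illustration of the concurrent model only,
not the RTOS mechanism an actual embedded implementation would use; all names are illustrative.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Shared flags acting as crude "events", protected by a mutex. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int wait_timer_expired = 0, ignition_on = 1, seat_belt_on = 0;

static void *wait_timer_task(void *arg)        /* 10 second wait timer task */
{
    (void)arg;
    sleep(10);
    pthread_mutex_lock(&lock);
    wait_timer_expired = 1;                    /* signal the 'Timer Expire' event */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void *alarm_control_task(void *arg)     /* alarm control task */
{
    int fire = 0;
    (void)arg;
    while (!fire) {
        pthread_mutex_lock(&lock);
        fire = wait_timer_expired && ignition_on && !seat_belt_on;
        pthread_mutex_unlock(&lock);
        usleep(100000);                        /* poll every 100 ms (an RTOS would block on an event) */
    }
    printf("ALARM ON\n");
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, wait_timer_task, NULL);
    pthread_create(&t2, NULL, alarm_control_task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}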


6. Object-Oriented Model
The object-oriented model is an object based model for modelling system requirements.

Object-oriented model provides re-usability, maintainability and productivity in system design.


Object is an entity used for representing or modelling a particular piece of the system. Each
object is characterised by a set of unique behaviour and state.
Class is an abstract description of a set of objects and it can be considered as a blueprint of an
object.

A class represents the state of an object through member variables and object behaviour
through the member functions. The member variables and member function of a class can be
private, public or protected. Private member variables and functions are accessible only
within the class whereas public variable and functions are accessible within and outside of a
class. The protected variable and functions are protected from external access.

Embedded firmware design and development


• Firmware is considered as the master brain of the embedded system. Imparting
intelligence to an embedded system is a one time process and it can happen at any
stage. Once intelligence is imparted to the embedded product, the product starts
functioning properly and continues serving the assigned task till a hardware
breakdown or a corruption of the embedded firmware occurs.
• Example: a new born baby can be compared to the hardware; the firmware is the
intelligence that has to be designed and embedded into the initially 'empty' brain.
• For most embedded products the embedded firmware is stored at a permanent
memory (ROM) and they are non-alterable by end user.
• Designing embedded firmware requires understanding of the particular
embedded product hardware.

Embedded firmware design approaches

Two basic approaches are used for embedded firmware design. They are
1. Conventional Procedural Based Firmware Design
2. Embedded Operating System (OS) Based Design

1. Conventional Procedural Based Firmware Design - The conventional procedural


based design is also known as ‘Super Loop Model’.

The Super Loop based firmware development approach is adopted for applications that are
not time critical and where the response time is not so important (embedded systems
where missing deadlines are acceptable).

It is very similar to a conventional procedural programming where the code is executed task
by task. The task listed at the top of the program code is executed first and the tasks just
below the top are executed after completing the first task.

The firmware execution flow for this will be


i. Configure the common parameters and perform initialisation of the various
hardware components, memory, registers, etc.
ii. Start the first task and execute it
iii. Execute the second task.
iv. Execute the next task
v. …….
vi. Execute the last defined task
vii. Jump back to the first task and follow the same flow

Visualise the operational sequence listed above in terms of a ‘C’ program code as

void main()
{
    Configuration();
    Initialization();

    while (1)
    {
        Task_1();
        Task_2();
        /* ... */
        Task_n();
    }
}

Almost all tasks in the above example are non-ending and are repeated infinitely throughout the
operation. From the above ‘C’ code you can see that the tasks 1 to n are performed one after
another, and when the last task (task n) is executed, the firmware execution is again redirected
to Task 1 and repeated forever in the loop. This repetition is achieved by using an infinite loop,
here the while(1) { } loop. This approach is therefore also referred to as the ‘Super loop based
approach’.

Since the tasks are running inside an infinite loop, the only way to come out of the loop is
either a hardware reset or an interrupt assertion. A hardware reset brings the program
execution back to the main loop. Whereas an interrupt request suspends the task execution
temporarily and performs the corresponding interrupt routine and on completion of the
interrupt routine it restarts the task execution from the point where it got interrupted.

The ‘Super loop based design’ doesn’t require an operating system, since there is no need for
scheduling which task is to be executed and assigning priority to each task. Here the priorities
are fixed and the order in which the tasks to be executed are also fixed.

This type of design is deployed in low-cost embedded products and products where response
time is not time critical. Example, reading/writing data to and from a card using a card reader
requires a sequence of operations like checking the presence of card, authenticating the
operation, reading/writing, etc.


The major drawback of this approach is that any failure in any part of a single task will
affect the total system. If the program hangs up at some point while executing a task, it will
remain there forever and ultimately the product stops functioning.

Watch Dog Timers (WDTs) help in coming out of the loop when an unexpected failure
occurs or when the processor hangs up.

2. The Embedded Operating System (OS) Based Approach


The Operating System (OS) based approach contains operating systems, which can be either
a General Purpose Operating System (GPOS) or a Real Time Operating System (RTOS)
to host the user written application firmware.

The General Purpose OS (GPOS) based design is very similar to a conventional PC based
application development where the device contains an operating system (Windows/Unix/
Linux, etc. for Desktop PCs) and you will be creating and running user applications on top of
it.

The Real Time Operating System (RTOS) based design approach is employed in embedded
products demanding real-time response. An RTOS responds in a timely and predictable manner
to events. A Real Time operating system contains a Real Time kernel responsible for
performing pre-emptive multitasking, a scheduler for scheduling tasks, multiple threads, etc.

EMBEDDED FIRMWARE DEVELOPMENT LANGUAGES


1. Assembly language based development
2. High level language based development
Assembly language based development
• ‘Assembly language’ is the human readable notation of ‘machine language’,
where as ‘machine language’ is processor understandable language.
• Assembly language programming is the task of writing processor specific
machine code in mnemonic form, converting the mnemonics into actual
processor instructions and associated data using assembler.
• The general format of an assembly language instruction is an opcode followed by
operands.
• The opcode tells the processor what to do and the operands provide the data and
information required to perform the action specified by the opcode.
• Each line of the assembly language program is split into four fields: Label,
Opcode (mnemonic), Operand(s) and Comments.

• Similar to ‘C’ and other high level language programming, you can have
multiple source files called modules.
• Each module is represented by an ‘.asm’ or ‘.src’ file similar to the ‘.c’ files in C
programming. This approach is called modular programming


• Modular programs are usually easy to code, debug and alter.


• Source File to Object File Translation: Translation of assembly code to machine
code is performed by assembler.
• Some assemblers are freely available in the internet. Some assemblers are commercial
and requires licence from the vendor. Example A51 Macro Assembler from Keil
software is a popular assembler for the 8051 family microcontroller.
• The various steps involved in the conversion of a program written in assembly language
to the corresponding binary file/machine language are illustrated in the figure below.

Figure: Assembly language to machine language conversion process


i. Library File Creation and Usage - Libraries are specially formatted, ordered program
collections of object modules that may be used by the linker at absolute object file creation.

When the linker processes a library only those object modules in the library that are necessary
to create the program are used. Library files are generated with extension ‘.lib’.
ii. Linker and Locater - Linker and Locater is another software utility responsible for
linking the various object modules in a multi-module project and assigning absolute address
to each module. Linker generates an absolute object module by extracting the object modules
from the library.

iii. Object to Hex File Converter - This is the final stage in the conversion of Assembly
language to machine understandable language (machine code).

Hex File is the representation of the machine code and the hex file is dumped into the code
memory of the processor/controller.

Hex file is created from the final ‘Absolute Object File’ using the Object to Hex File
Converter utility.

 Advantages of Assembly Language Based Development


i. Efficient Code Memory and Data Memory Usage (Memory Optimisation)
ii. High Performance
iii. Low Level Hardware Access
iv. Code Reverse Engineering

 Drawbacks of Assembly Language Based Development


i. High Development Time
ii. Developer Dependency
iii. Non-Portable

High level language based development

• The most commonly used high level language for embedded firmware application development
is ‘C’.
• C is a well defined, easy to use high level language with extensive cross-platform development
tool support. Nowadays cross compilers for C++ are also emerging.
• The high level language based development approach is the same as that of assembly language
based development, except that the conversion of the source file written in the high level language
to an object file is done by a cross compiler.
• The various steps involved in the conversion of a program are illustrated in the figure below.


Advantages and limitations of high level language based development


Advantages:
• Reduced development time
• Developer independency
• Portability
Limitations:
• Cross compilers available for high level language may not be so efficient in generating
optimized instructions.
• More execution time than assembly language program.
• Not possible to access the hardware at low level.
• High level language based development tools are costly.

Mixing Assembly and High level language


• Certain embedded firmware development situations may demand the mixing of
high level language with assembly and vice versa.
• There are three ways of mixing:
• Assembly language with High level language.
• High level language with Assembly language.
• Inline assembly programming.

Mixing Assembly language with High level language ( Assembly language with C)
• Assembly routines are mixed with ‘C’ in situations where the entire program is written
in ‘C’.
• When accessing certain low level hardware, the timing specifications may be very
critical and a cross compiler generated code may not be able to offer the required
performance.
• Writing assembly routine and invoking it from C is the most advised method to handle
the situation.


• Mixing C and assembly is a little complicated, in the sense that the following questions need
to be addressed:

• How are parameters passed from C to the assembly routine?
• How are values returned from assembly to C?
• How is the assembly routine invoked from C?

• Passing parameters from C to an assembly routine, returning values from assembly to
C, and the method of invoking an assembly routine from C are cross compiler dependent.
There is no written rule for this.
• This information can be obtained from the documentation of the cross compiler you are
using.

Mixing High level language with Assembly language (C with assembly language)
• Mixing the code written in a high level language like C and assembly language is
useful in the following scenarios:
1. The source code is already available in assembly language and a routine
written in high level language like ‘C’ needs to be included to the existing
code.
2. The entire source code is planned in assembly code for various reasons like
optimized code, optimal performance etc.. But some portions of the code may
be very difficult and tedious to code in assembly.
3. To include built in library functions written in ‘C’ language provided by the
cross compiler.
• When mixing assembly and C, the major questions that need to be addressed are:
• How are parameters passed from C to the assembly routine?
• How are values returned from assembly to C?
• How is the C routine invoked from assembly code?
• Passing parameters to C, returning values from a C function and the method of invoking
the C function are cross compiler dependent. There is no written rule for this.
• This information can be obtained from the documentation of the cross compiler you are
using.

Inline assembly
• Inline assembly is another technique for inserting target processor specific assembly
instructions at any location of the source code written in high level language ‘C’.
• This avoids the overhead of calling an assembly routine from 'C' code.
• Special keywords are used to indicate the start and end of the assembly instructions.
The keywords are cross compiler specific.
• C51 uses the keywords #pragma asm and #pragma endasm to indicate a block of code
written in assembly, as in the sketch below.
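A minimal sketch for Keil C51, assuming the file containing the inline assembly is compiled with the SRC directive so that the generated source is passed on to the assembler (the function name is an illustrative assumption):

/* short_delay.c : hedged example of inline assembly in C51 */
void short_delay(void)
{
    #pragma asm
        NOP                  ; one machine cycle of delay
        NOP                  ; one more machine cycle
    #pragma endasm
}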


 PROGRAMMING IN EMBEDDED C
C vs Embedded C
1. C is a well-structured, well-defined and standardised language, whereas Embedded C can be
considered as a subset of the C language: it supports all C instructions and incorporates a few
target processor specific functions/instructions.
2. C is a general purpose programming language, whereas Embedded C is a specific purpose
programming language.
3. C is generally used for desktop computer applications, whereas Embedded C is generally used
for microcontroller based applications.
4. A compiler is used for converting programs written in C to binary code, whereas a cross
compiler is used for converting programs written in Embedded C to binary code.
5. C is not a hardware dependent language, whereas Embedded C is fully hardware dependent.

Compiler vs Cross-Compiler
1. A compiler is a software tool that converts source code written in a high level language (e.g. C)
into machine level language, whereas a cross compiler converts source code written in Embedded C
into machine level language for a different target processor.
2. Compilers are used in platform specific development, whereas cross compilers are used in
cross-platform development.
3. Compilers are generally termed 'Native Compilers'; a native compiler generates machine code
for the same machine (processor) on which it is running, whereas a cross compiler generates
machine code for a machine (processor) different from the one on which it is running.
4. Examples of native compilers: GCC (GNU Compiler Collection), Borland Turbo C, Intel C++
Compiler. Example of a cross compiler: Keil C51.
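As an illustration of the hardware dependence of Embedded C, a hedged Keil C51 sketch is given below; the sbit declaration (with SFR definitions pulled in through reg51.h) and the port bit P1^0 are target specific extensions that are not part of standard C:

/* blink.c : hedged Embedded C example for an 8051 target (Keil C51 assumed) */
#include <reg51.h>           /* 8051 Special Function Register definitions */

sbit LED = P1^0;             /* bit-addressable port pin - a C51 extension */

void main(void)
{
    while (1) {
        LED = !LED;          /* toggle the pin forever */
    }
}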


Module 5
RTOS and IDE for Embedded System Design
Operating System Basics
 The operating system acts as a bridge between the user applications/tasks and the
underlying system resources through a set of system functionalities and services.
 The OS manages the system resources and makes them available to the user
applications/tasks on a need basis.
 A normal computing system is a collection of different I/O subsystems, working, and
storage memory.
 The primary functions of an operating system are
 Make the system convenient to use
 Organise and manage the system resources efficiently and correctly
 Figure 10.1 gives an insight into the basic components of an operating system and
their interfaces with rest of the world.

The Kernel

 The kernel is the core of the operating system and is responsible for managing the
system resources and the communication among the hardware and other system
services.
 The kernel acts as the abstraction layer between system resources and user applications.
 Kernel contains a set of system libraries and services.
 For a general purpose OS, the kernel contains different services for handling the
following.


Process Management

 Process management deals with managing the processes/tasks.


 Process management includes setting up the memory space for the process, loading the
process's code into the memory space, allocating system resources, scheduling and
managing the execution of the process, setting up and managing the Process Control
Block (PCB), Inter Process Communication and synchronisation, process
termination/deletion, etc.

Primary Memory Management

 The term primary memory refers to the volatile memory (RAM) where processes are
loaded and variables and shared data associated with each process are stored.
 The Memory Management Unit (MMU) of the kernel is responsible for
 Keeping track of which part of the memory area is currently used by which process
 Allocating and De-allocating memory space on a need basis (Dynamic memory
allocation).

File System Management

 File is a collection of related information.


 A file could be a program (source code or executable), text files, image files, word
documents, audio/video files, etc.
 Each of these files differ in the kind of information they hold and the way in which
the information is stored.
 The file operation is a useful service provided by the OS.
 The file system management service of Kernel is responsible for
 The creation, deletion and alteration of files
 Creation, deletion and alteration of directories
 Saving of files in the secondary storage memory (e.g. Hard disk storage)
 Providing automatic allocation of file space based on the amount of free space
available
 Providing a flexible naming convention for the files
 The various file system management operations are OS dependent.
 For example, the kernel of the Microsoft® DOS OS supports a specific set of file system
management operations and they are not the same as the file system operations
supported by the UNIX kernel.
I/O System (Device) Management

 The kernel is responsible for routing the I/O requests coming from different user
applications to the appropriate I/O devices of the system.
 In a well-structured OS, direct accessing of I/O devices is not allowed and access
to them is provided through a set of Application Programming Interfaces
(APIs) exposed by the kernel.
 The kernel maintains a list of all the I/O devices of the system.
 This list may be available in advance, at the time of building the kernel.
 Some kernels dynamically update the list of available devices as and when a new
device is installed (e.g. the Windows XP kernel keeps the list updated when a new plug
'n' play USB device is attached to the system).


 The service ‘Device Manager’ (Name may vary across different OS kernels) of the
kernel is responsible for handling all I/O device related operations.
 The kernel talks to the I/O device through a set of low-level systems calls, which are
implemented in a service, called device drivers.
 The device drivers are specific to a device or a class of devices.
 The Device Manager is responsible for
 Loading and unloading of device drivers
 Exchanging information and the system specific control signals to and
from the device

Secondary Storage Management

 The secondary storage management deals with managing the secondary storage
memory devices, if any, connected to the system.
 Secondary memory is used as backup medium for programs and data since the main
memory is volatile.
 In most of the systems, the secondary storage is kept in disks (Hard Disk).
 The secondary storage management service of kernel deals with
 Disk storage allocation
 Disk scheduling (Time interval at which the disk is activated to backup data)
 Free Disk space management

Protection Systems

 Most modern operating systems are designed to support multiple users with different
levels of access permissions (e.g. Windows XP with user permissions like
'Administrator', 'Standard', 'Restricted', etc.).
 Protection deals with implementing the security policies to restrict the access to both
user and system resources by different applications, processes or users.
 In multiuser supported operating systems, one user may not be allowed to view or
modify the whole/portions of another user’s data or profile details.
 In addition, some application may not be granted with permission to make use of
some of the system resources.
 This kind of protection is provided by the protection services running within the
kernel.
Interrupt Handler

 Kernel provides handler mechanism for all external/internal interrupts generated by


the system.
 These are some of the important services offered by the kernel of an operating system.
 It does not mean that a kernel contains no more than the components/services explained
above.
 Depending on the type of the operating system, a kernel may contain a lesser or greater
number of components/services.
 In addition to the components/services listed above, many operating systems offer a
number of add-on system components/services to the kernel.


 Network communication, network management, user-interface graphics, timer
services (delays, timeouts, etc.), error handler, database management, etc. are
examples for such components/services.
 The kernel exposes the interface to the various kernel applications/services, hosted by
the kernel, to the user applications through a set of standard Application Programming
Interfaces (APIs).
 User applications can avail these API calls to access the various kernel
applications/services.

Kernel Space and User Space

 The applications/services are classified into two categories, namely user applications
and kernel applications.
 The program code corresponding to the kernel applications/services is kept in a
contiguous area (OS dependent) of primary (working) memory and is protected from
unauthorised access by user programs/applications.
 The memory space at which the kernel code is located is known as 'Kernel Space'.
 Similarly, all user applications are loaded to a specific area of primary memory and
this memory area is referred to as 'User Space'.
 User space is the memory area where user applications are loaded and executed.
 The partitioning of memory into kernel and user space is purely Operating System
dependent.
 Some OSs implement this kind of partitioning and protection, whereas some OSs do
not segregate the kernel and user application code storage into two separate areas.
 In an operating system with virtual memory support, the user applications are loaded
into their corresponding virtual memory space with the demand paging technique; meaning,
the entire code for the user application need not be loaded into the main (primary)
memory at once; instead, the user application code is split into different pages and
these pages are loaded into and out of the main memory area on a need basis.
 The act of loading the code into and out of the main memory is termed as ‘Swapping’.
 Swapping happens between the main (primary) memory and secondary storage
memory.
 Each process runs in its own virtual memory space and is not allowed to access the
memory space corresponding to another process, unless explicitly requested by the
process.
 Each process will have certain privilege levels for accessing the memory of other
processes and, based on the privilege settings, a process can request the kernel to map
another process's memory to its own or share it through some other mechanism.
 Most operating systems keep the kernel application code in main memory and it
is not swapped out into the secondary memory.

Monolithic Kernel and Microkernel


 The kernel forms the heart of an operatingsystem.
 Different approaches are adopted for building an Operating System kernel.
 Based on thekernel design, kernels can be classified into ‘Monolithic’ and ‘Micro’
.
Monolithic Kernel

 In monolithic kernel architecture, all kernel services run in the kernel space.


 Here all kernel modules run within the same memory space under a single kernel
thread. The tight internal integration of kernel modules in the monolithic kernel
architecture allows the effective utilisation of the low-level features of the underlying
system.
 The major drawback of a monolithic kernel is that any error or failure in any one of the
kernel modules leads to the crashing of the entire kernel application.
 LINUX, SOLARIS and MS-DOS kernels are examples of monolithic kernels.
 The architecture representation of a monolithic kernel is given in Fig. 10.2.

Microkernel

 The microkernel design incorporates only the essential set of Operating System
services into the kernel.
 The rest of the Operating System services are implemented in programs known as
‘Servers’ which runs in user space.
 This provides a highly modular design and an OS-neutral abstraction to the kernel.
 Memory management, process management, timer systems and interrupt handlers are
the essential services, which forms the part of the microkernel.
 Mach, QNX, Minix 3 kernels are examples for microkernel.
 The architecture representation of a microkernel is shown in Fig. 10.3.
 Microkernel based design approach offers the following benefits
Robustness:

 If a problem is encountered in any of the services which run as 'Server' applications,
the same can be reconfigured and restarted without the need for restarting the entire
OS.
 Thus, this approach is highly useful for systems which demand high 'availability'.


Configurability:

 Any services which run as 'Server' applications can be changed without the need to
restart the whole system. This makes the system dynamically configurable.

TYPES OF OPERATING SYSTEM

Operating Systems are classified into different types

General Purpose Operating System (GPOS)

 The operating systems, which are deployed in general computing systems, are
referred as General Purpose Operating Systems (GPOS).
 The kernel of such an OS is more generalised and it contains all kinds of services
required for executing generic applications.
 General-purpose operating systems are often quite non-deterministic in behaviour.
Their services can inject random delays into application software and may cause slow
responsiveness of an application at unexpected times.
 GPOS are usually deployed in computing systems where deterministic behaviour is
not an important criterion.
 Personal Computer Desktop system is a typical example for a system where GPOSs
are deployed.
 Windows XP/MS-DOS TC etc. are examples for General Purpose Operating Systems.

Real-Time Operating System (RTOS)

 In a broad sense, ‘Real-Time’ implies deterministic timingbehaviour.


 Deterministic timing behaviour in the RTOS context means the OS services consume
only known and expected amounts of time, regardless of the number of services.
 A Real-Time Operating System (RTOS) implements policies and rules
concerning time-critical allocation of a system's resources.
 The RTOS decides which applications should run in which order and how much time
needs to be allocated for each application.
 Windows CE, QNX, VxWorks MicroC/OS-II, etc. are examples of Real-Time
Operating Systems (RTOS).

The Real-Time Kernel

 The kernel of a Real-Time Operating System is referred to as the Real-Time kernel.
 Compared to the conventional OS kernel, the Real-Time kernel is highly
specialised and it contains only the minimal set of services required for running the
user applications/tasks.

The basic functions of a Real-Time kernel are listed below:

 Task/Process management
 Task/Process scheduling
 Task/Process synchronisation
 Error/Exception handling


 Memory management
 Interrupt handling
 Time management

Task/Process management

 Deals with setting up the memory space for the tasks, loading the task’s code into the
memory space, allocating system resources, setting up a Task Control Block (TCB)
for the task and task/process termination/deletion.
 A Task Control Block (TCB) is used for holding the information corresponding to a
task.
 A TCB usually contains the following set of information.
 Task ID: Task identification number
 Task State: The current state of the task (e.g. State = 'Ready' for a task
which is ready to execute)
 Task Type: Indicates the type of the task; the task can be a hard real-time,
soft real-time or background task.
 Task Priority: The priority of the task (e.g. Task priority = 1 for a task with
priority 1)
 Task Context Pointer: Pointer used for context saving
 Task Memory Pointers: Pointers to the code memory, data memory
and stack memory for the task
 Task System Resource Pointers: Pointers to system resources
(semaphores, mutexes, etc.) used by the task
 Task Pointers: Pointers to other TCBs (TCBs for preceding, next and
waiting tasks)
 Other Parameters: Other relevant task parameters

 The parameters and implementation of the TCB are kernel dependent.
 The TCB parameters vary across different kernels, based on the task management
implementation.
 The task management service utilises the TCB of a task in the following ways
 Creates a TCB for a task on creating the task
 Deletes/removes the TCB of a task when the task is terminated or deleted
 Reads the TCB to get the state of a task
 Updates the TCB with new parameters on a need basis (e.g. on a context switch)
 Modifies the TCB to change the priority of the task dynamically
A possible layout of such a TCB is sketched below.
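A hedged C sketch of how a kernel might lay out such a TCB; the field names and types are illustrative assumptions and not the TCB of any particular kernel:

/* Illustrative Task Control Block layout */
typedef struct task_control_block {
    unsigned int  task_id;              /* Task ID                                    */
    unsigned char task_state;           /* current state, e.g. READY, RUNNING         */
    unsigned char task_type;            /* hard real-time, soft real-time, background */
    unsigned char task_priority;        /* scheduling priority                        */
    void         *context_ptr;          /* pointer used for context saving            */
    void         *code_mem_ptr;         /* pointer to the task's code memory          */
    void         *data_mem_ptr;         /* pointer to the task's data memory          */
    void         *stack_mem_ptr;        /* pointer to the task's stack memory         */
    void         *resource_ptrs;        /* semaphores, mutexes, etc. used by the task */
    struct task_control_block *next;    /* link to the next TCB (e.g. ready queue)    */
} TCB;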

Task/Process Scheduling

 Deals with sharing the CPU among various tasks/processes.


 A kernel application called ‘Scheduler’ handles the task scheduling.
 Scheduler is nothing but an algorithm implementation, which performs the efficient
and optimal scheduling of tasks to provide a deterministic behaviour.


Task/Process Synchronisation

 Deals with synchronising the concurrent access of a resource, which is shared across
multiple tasks and the communication between various tasks.

Error/Exception Handling

 Deals with registering and handling the errors occurred/exceptions raised during the
execution of tasks.
 Insufficient memory, timeouts, deadlocks, deadline missing, bus error, divide by zero,
unknown instruction execution, etc. are examples of errors/exceptions.
Errors/Exceptions can happen at the kernel level services or at task level.
 Deadlock is an example for kernel level exception, whereas timeout is an example for
a task level exception.
 The OS kernel gives the information about the error in the form of a system call
(API).
 GetLastError() API provided by Windows CE RTOS is an example for such a system
call.
 A watchdog timer is a mechanism for handling timeouts for tasks.
 Certain tasks may involve waiting for external events from devices.
 These tasks will wait infinitely when the external device is not responding, and the
task will exhibit a hang-up behaviour.
 In order to avoid these types of scenarios, a proper timeout mechanism should be
implemented.
 A watchdog is normally used in such situations.
 The watchdog is loaded with the maximum expected wait time for the event and,
if the event is not triggered within this wait time, the same is informed to the task and
the task is timed out.
 If the event happens before the timeout, the watchdog is reset (a small sketch of this
idea follows).
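A minimal sketch of the watchdog-based timeout idea described above; all the names are illustrative assumptions and the decrement is assumed to happen from the kernel's timer tick:

static volatile int watchdog_count;       /* remaining wait time in timer ticks        */
extern void notify_timeout(void);         /* hypothetical: informs the waiting task    */

void watchdog_load(int max_wait_ticks)  { watchdog_count = max_wait_ticks; }
void watchdog_reset(int max_wait_ticks) { watchdog_count = max_wait_ticks; } /* event arrived in time */

void watchdog_on_timer_tick(void)         /* called on every timer tick (assumption)   */
{
    if (watchdog_count > 0 && --watchdog_count == 0)
        notify_timeout();                 /* wait time expired: time out the task      */
}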

Memory Management

 Compared to the General Purpose Operating Systems, the memory management


function of an RTOS kernel is slightly different.
 In general, the memory allocation time increases depending on the size of the block of
memory that needs to be allocated and the state of the allocated memory block.
 Since predictable timing and deterministic behaviour are the primary focus of an
RTOS, the RTOS achieves this by compromising the effectiveness of memory allocation.
 An RTOS makes use of a 'block'-based memory allocation technique, instead of the usual
dynamic memory allocation techniques used by a GPOS. The RTOS kernel uses blocks
of a fixed size of dynamic memory and a block is allocated to a task on a need basis.
 The blocks are stored in a ‘Free Buffer Queue’.
 To achieve predictable timing and avoid the timing overheads, most of the RTOS
kernels allow tasks to access any of the memory blocks without any memory
protection.
 RTOS kernels assume that the whole design is proven correct and protection is
unnecessary. Some commercial RTOS kernels allow memory protection as optional
and the kernel enters a fail-safe mode when an illegal memory access occurs.


Interrupt Handling

Deals with the handling of various types of interrupts. Interrupts provide Real- Time
behaviour to systems.

Interrupts inform the processor that an external device or an associated task requires
immediate attention of the CPU.

Interrupts can be either synchronous or asynchronous. Interrupts which occur in sync with
the currently executing task are known as synchronous interrupts.

Usually the software interrupts fall under the synchronous interrupt category.

Divide by zero, memory segmentation error, etc. are examples of synchronous interrupts.
For synchronous interrupts, the interrupt handler runs in the same context as the interrupting
task.
Asynchronous interrupts are interrupts which occur at any point of execution of any task,
and are not in sync with the currently executing task.

The interrupts generated by external devices connected to the processor/controller (by
asserting the interrupt line of the processor/controller to which the interrupt line of the
device is connected), timer overflow interrupts, serial data reception/transmission interrupts,
etc. are examples of asynchronous interrupts.

For asynchronous interrupts, the interrupt handler is usually written as a separate task
(depending on the OS kernel implementation) and it runs in a different context.

Hence, a context switch happens while handling the asynchronous interrupts.

Priority levels can be assigned to the interrupts and each interrupts can be enabled or disabled
individually.

Most of the RTOS kernel implements ‘Nested Interrupts’ architecture. Interrupt nesting
allows pre-emption (interruption) of an Interrupt Service Routine (ISR), servicing an
interrupt, by a high priority interrupt.

Time Management

Accurate time management is essential for providing precise time reference for all
applications.

The time reference to kernel is provided by a high-resolution Real-Time Clock (RTC)


hardware chip (hardware timer).

The hardware timer is programmed to interrupt the processor/controller at a fixed rate.

This timer interrupt is referred to as the 'Timer tick'.


The ‘Timer tick’ is taken as the timing reference by the kernel.

The ‘Timer tick’ interval may vary depending on the hardware timer.

Usually the ‘Timer tick’ varies in the microseconds range.

The time parameters for tasks are expressed as the multiples of the ‘Timer tick’.

The System time is updated based on the ‘Timer tick’.

If the System time register is 32 bits wide and the 'Timer tick' interval is 1 microsecond, the
System time register will reset in 2^32 * 10^-6 / (24 * 60 * 60) days = ~0.0497 days = ~1.19 hours.
If the 'Timer tick' interval is 1 millisecond, the System time register will reset in
2^32 * 10^-3 / (24 * 60 * 60) days = ~49.7 days, i.e. roughly 50 days.
The 'Timer tick' interrupt is handled by the 'Timer Interrupt' handler of the kernel.
The 'Timer tick' interrupt can be utilised for implementing the following actions (a sketch of
such a handler is given after the list).

 Save the current context (Context of the currently executing task).


 Increment the System time register by one.
 Generate a timing error and reset the System time register if the timer tick count is
greater than the maximum range available for the System time register.
 Update the timers implemented in the kernel (increment or decrement the timer registers
for each timer depending on the count direction setting for each register:
increment registers with count direction setting = 'count up' and decrement registers
with count direction setting = 'count down').
 Activate the periodic tasks which are in the idle state.
 Invoke the scheduler and schedule the tasks again based on the scheduling algorithm.
 Delete all the terminated tasks and their associated data structures (TCBs).
 Load the context for the first task in the ready queue.
 Due to rescheduling, the ready task might be changed to a new one from the task
which was preempted by the 'Timer tick' interrupt.
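A hedged sketch of a timer tick handler performing the actions listed above; every helper function name is an illustrative assumption, since the real implementation is kernel specific:

static volatile unsigned long system_time;     /* the System time register (assumption) */

extern void save_current_context(void);
extern void update_kernel_timers(void);        /* count-up / count-down timer registers */
extern void activate_periodic_tasks(void);
extern void invoke_scheduler(void);
extern void delete_terminated_tasks(void);     /* remove TCBs of terminated tasks       */
extern void load_context_of_first_ready_task(void);

void timer_tick_handler(void)
{
    save_current_context();                    /* save context of the interrupted task  */
    system_time++;                             /* increment the System time register    */
    update_kernel_timers();
    activate_periodic_tasks();
    invoke_scheduler();                        /* reschedule based on the policy        */
    delete_terminated_tasks();
    load_context_of_first_ready_task();        /* resume the selected task              */
}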

Hard Real-Time

 Real-Time Operating Systems that strictly adhere to the timing constraints for a task
is referred as ‘Hard Real-Time’ systems.
 A HardReal-Time system must meet the deadlines for a task without any slippage.
 Missing any deadline may produce catastrophic results for Hard Real-Time Systems,
including permanent data loss and irrecoverable damage to the system/users.
 A system can have several such tasks and the key to their correct operation lies in
scheduling them so that they meet their time constraints.
 Air bag control systems and Anti-lock Brake Systems (ABS) of vehicles are typical
examples for Hard Real-Time Systems.
 The air bag control system should swing into action and deploy the air bags when the
vehicle meets with a severe accident. Ideally speaking, the time for triggering the air bag
deployment task, when an accident is sensed by the Air bag control system, should be
zero and the air bags should be deployed exactly within the time frame, which is
predefined for the air bag deployment task.


Soft Real-Time

 Real-Time Operating System that does not guarantee meeting deadlines, but offer the
best effort to meet the deadline are referred as ‘Soft Real-Time’ systems.
 Missing deadlines for tasks are acceptable for a Soft Real-time system if the
frequency of deadline missing is within the compliance limit of the Quality of Service
(QoS).
 Automatic Teller Machine (ATM) is a typical example for Soft- Real-Time System.
 If the ATM takes a few seconds more than the ideal operation time, nothing fatal
happens.
 An audio-video playback system is another example for Soft Real-Time system.
 No potential damage arises if a sample comes late by fraction of a second, for
playback.

TASKS, PROCESS AND THREADS

The term ‘task’ refers to something that needs to be done. In our day-to-day life, we are
bound to the execution of a number of tasks.

The task can be the one assigned by our managers or the one assigned by our
professors/teachers or the one related to our personal or family needs. In addition, we will
have an order of priority and schedule/timeline for executing these tasks.

In the operating system context, a task is defined as the program in execution and the related
information maintained by the operating system for the program.

Task is also known as ‘Job’ in the operating system context.

A program or part of it in execution is also called a ‘Process’.

The terms ‘Task’, ‘Job’ and ‘Process’ refer to the same entity in the operating system context
and most often they are used interchangeably.

Process

 A ‘Process’ is a program, or part of it, in execution.


 A 'Process' is also known as an instance of a program in execution.
 Multiple instances of the same program can execute simultaneously.
 A process requires various system resources like CPU for executing the process;
memory for storing the code corresponding to the process and associated variables,
I/O devices for information exchange, etc. A process is sequential in execution.

The Structure of a Process

 The concept of ‘Process’ leads to concurrent execution (pseudo parallelism) of tasks


and thereby the efficient utilisation of the CPU and other system resources.
 Concurrent execution is achieved through the sharing of CPU among the processes.


 A process mimics a processor in properties and holds a set of registers, process status,
a Program Counter (PC) to point to the next executable instruction of the process, a
stack for holding the local variables associated with the process and the code
corresponding to the process.
 This can be visualised as shown in Fig. 10.4.

A process which inherits all the properties of the CPU can be considered as a virtual
processor, awaiting its turn to have its properties switched into the physical processor.

When the process gets its turn, its registers and the program counter register become
mapped to the physical registers of the CPU.

From a memory perspective, the memory occupied by the process is segregated into three
regions, namely, Stack memory, Data memory and Code memory (Fig. 10.5).

The ‘Stack’ memory holds all temporary data such as variables local to the process.

Data memory holds all global data for the process.

The code memory contains the program code (instructions) corresponding to the process.

On loading a process into the main memory, a specific area of memory is allocated for the
process.

The stack memory usually starts (OS kernel implementation dependent) at the highest
memory address of the memory area allocated for the process.

Say, for example, the memory map of the memory area allocated for the process is 2048 to
2100; the stack memory starts at address 2100 and grows downwards to accommodate the
variables local to the process.


Process States and State Transition

 The creation of a process to its termination is not a single step operation.


 The process traverses through a series of states during its transition from the newly
created state to the terminated state.
 The cycle through which a process changes its state from 'newly created' to
'execution completed' is known as the 'Process Life Cycle'.
 The various states through which a process traverses during a Process Life
Cycle indicate the current status of the process with respect to time and also provide
information on what it is allowed to do next.
 Figure 10.6 represents the various states associated with a process.

The state at which a process is being created is referred as ‘Created State’.

The Operating System recognises a process in the ‘Created State’ but no resources are
allocated to the process.

The state, where a process is incepted into the memory and awaiting the processor time for
execution, is known as ‘Ready State’.

At this stage, the process is placed in the 'Ready list' queue maintained by the OS.

The state wherein the source code instructions corresponding to the process are being
executed is called the 'Running State'.

Running state is the state at which the process execution happens.


‘Blocked State/Wait State’ refers to a state where a running process is temporarily


suspended from execution and does not have immediate access to resources.

The blocked state might be invoked by various conditions like: the process enters a wait state
for an event to occur (e.g. waiting for user inputs such as keyboard input) or waits for
access to a shared resource.
A state where the process completes its execution is known as ‘Completed State’.

The transition of a process from one state to another is known as ‘State transition’.

When a process changes its state from Ready to running or from running to blocked or
terminated or from blocked to running, the CPU allocation for the process may also change.

It should be noted that the state representation for a process/task mentioned here is a
generic representation.

The states associated with a task may be known by different names, or there may be more
or fewer states than the ones explained here, under different OS kernels.


For example, under VxWorks’ kernel, the tasks may be in either one or a specific
combination of the states READY, PEND, DELAY and SUSPEND.

The PEND state represents a state where the task/process is blocked on waiting for I/O or
system resource.

The DELAY state represents a state in which the task/process is sleeping and the SUSPEND
state represents a state where a task/process is temporarily suspended from execution and not
available for execution.

Under MicroC/OS-II kernel, the tasks may be in one of the states, DORMANT, READY,
RUNNING, WAITING or INTERRUPTED.

The DORMANT state represents the ‘Created’ state and WAITING state represents the state
in which a process waits for shared resource or I/O access.

Process Management

Process management deals with the creation of a process, setting up the memory space for the
process, loading the process’s code into the memory space, allocating system resources,
setting up a Process Control Block (PCB) for the process and process termination/deletion.

THREADS

 A thread is the primitive that can execute code.


 A thread is a single sequential flow of control within a process.
 A 'Thread' is also known as a light-weight process. A process can have many threads of
execution.
 Different threads, which are part of a process, share the same address space, meaning
they share the data memory, code memory and heap memory area.
 Threads maintain their own thread status (CPU register values), Program Counter
(PC) and stack.

The Concept of Multithreading


 A process/task in an embedded application may be a complex or lengthy one and it may
contain various suboperations like getting input from I/O devices connected to the
processor, performing some internal calculations/operations, updating some I/O
devices, etc.
 If all the subfunctions of a task are executed in sequence, the CPU utilisation may not
be efficient.
 For example, if the process is waiting for a user input, the CPU enters the wait state
for the event, and the process execution also enters a wait state.
 Instead of this single sequential execution of the whole process, if the task/process is
split into different threads carrying out the different subfunctionalities of the process,
the CPU can be effectively utilised and, when the thread corresponding to the I/O
operation enters the wait state, other threads which do not require the I/O event for
their operation can be switched into execution.
 This leads to speedier execution of the process and efficient utilisation of the
processor time and resources.
 The multithreaded architecture of a process can be better visualised with the thread-
process diagram shown in Fig. 10.8.


 If the process is split into multiple threads, which executes a portion of the process,
there will be a main thread and rest of the threads will be created within the main
thread.
 Use of multiple threads to execute a process brings the following advantage.
 Better memory utilisation: multiple threads of the same process share the
address space for data memory. This also reduces the complexity of inter-
thread communication, since variables can be shared across the threads.
 Since the process is split into different threads, when one thread enters a wait
state, the CPU can be utilised by other threads of the process that do not
require the event for which the other thread is waiting. This
speeds up the execution of the process.
 Efficient CPU utilisation: the CPU is engaged all the time.

Thread Standards

 Thread standards deal with the different standards available for thread creation and
management.
 These standards are utilised by the operating systems for thread creation and thread
management.
 It is a set of thread class libraries. The commonly available thread class libraries are
explained below

POSIX Threads: POSIX stands for Portable Operating System Interface.

The POSIX.4 standard deals with the Real-Time extensions and the POSIX.4a standard deals
with thread extensions.

The POSIX standard library for thread creation and management is ‘Pthreads’.

‘Pthreads’ library defines the set of POSIX thread creation and management functions in ‘C’
language.

The primitive

int pthread_create(pthread_t *new_thread_ID, const pthread_attr_t *attribute,
                   void *(*start_function)(void *), void *arguments);

creates a new thread for running the function start_function. Here pthread_t is the handle to
the newly created thread and pthread_attr_t is the data type for holding the thread attributes.
'start_function' is the function the thread is going to execute and arguments is the argument
for 'start_function' (it is a void * in the above prototype). On successful creation of a Pthread,
pthread_create() associates the Thread Control Block (TCB) corresponding to the newly
created thread with the variable of type pthread_t (new_thread_ID in our example).

The primitive

pthread_join(new_thread, NULL);


blocks the current thread and waits until the completion of the thread pointed to by it (in this
example, new_thread). All the POSIX thread calls return an integer. A return value of zero
indicates the success of the call. It is always good to check the return value of each call.
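A hedged reconstruction of the kind of application the following lines refer to: the main thread creates one Pthread, both threads print a few lines, and pthread_join() makes the main thread wait for the new thread (the loop counts and messages are illustrative assumptions):

/* sample.c : compile on a POSIX system with  gcc sample.c -o sample -lpthread */
#include <stdio.h>
#include <pthread.h>

void *start_function(void *arguments)        /* code executed by the new thread */
{
    int i;
    for (i = 0; i < 5; i++)
        printf("Child thread: iteration %d\n", i);
    return NULL;                             /* natural termination */
}

int main(void)
{
    pthread_t new_thread;                    /* handle to the new thread */
    int i;

    if (pthread_create(&new_thread, NULL, start_function, NULL) != 0) {
        printf("Thread creation failed\n");
        return -1;
    }
    for (i = 0; i < 5; i++)
        printf("Main thread: iteration %d\n", i);

    pthread_join(new_thread, NULL);          /* wait for the child thread to finish */
    return 0;
}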


You can compile this application using the gcc compiler.

Examine the output to figure out the thread execution switching.

The lines printed will give an idea of the order in which the thread execution is switched
between.

The pthread_join call forces the main thread to wait until the completion of the thread
new_thread, if the main thread finishes its execution first. The termination of a thread can happen in
different ways.

The thread can terminate either by completing its execution (natural termination) or by a
forced termination. In a natural termination, the thread completes its execution and returns
back to the main thread through a simple return or by executing the pthread_exit() call.

Forced termination can be achieved by the call pthread_cancel() or through the termination
of the main thread with the exit or exec functions.

The pthread_cancel() call is used by a thread to terminate another thread.


The pthread_exit() call is used by a thread to explicitly exit after it completes its work and is no
longer required to exist.

If the main thread finishes before the threads it has created, and exits with pthread_exit(), the
other threads continue to execute.

If the main thread uses exit call to exit the thread, all threads created by the main thread is
terminated forcefully.

Exiting a thread with the call pthread_exit() will not perform a cleanup.

It will not close any files opened by the thread and files will remain in the open status even
after the thread terminates.

Calling pthread_join at the end of the main thread is the best way to achieve synchronisation
and proper cleanup.

The main thread, after finishing its task waits for the completion of other threads, which were
joined to it using the pthread _join call.

With a pthread_join call, the main thread waits for the other threads, which were joined to it, and
finally merges to the single main thread.

If a new thread spawned by the main thread is still not joined to the main thread, it will be
counted against the system’s maximum thread limit.

Improper cleanup will lead to the failure of new thread creation.


THREAD PRE-EMPTION

 Thread Pre-emption is the act of pre-empting the currently running thread.


 Thread Pre-emption ability is solely dependent on the Operating System.
 Thread Pre-emption is performed for sharing a CPU time among the threads.
 The execution switching among the threads is known as 'Thread Context
Switching'.

User Level Thread

 User level threads do not have kernel/Operating System support and they exist solely
in the running process.
 Even if a process contains multiple user level threads, the OS treats it as single thread
and will not switch the execution among the different threads of it.


 It is the responsibility of the process to schedule each thread as and when required.
 In summary, user level threads of a process are non-preemptive at thread level from
OS perspective.

Kernel/System Level Thread

 Kernel level threads are individual units of execution, which the OS treats as separate
threads.
 The OS interrupts the execution of the currently running kernel thread and switches
the execution to another kernel thread based on the scheduling policies implemented
by the OS. Kernel level threads are pre-emptive.
 For user level threads, the execution switching (thread context switching) happens
only when the currently executing user level thread is voluntarily blocked.
 Hence, no OS intervention and system calls are involved in the context switching of
user level threads.
 This makes context switching of user level threads very fast.
 On the other hand, kernel level threads involve lots of kernel overhead and involve
system calls for context switching. However, kernel threads maintain a clear layer of
abstraction and allow threads to use system calls independently.
 There are many ways for binding user level threads with system/kernel level threads.
 The following section gives an overview of various thread binding models.

Many-to-One Model

 Many user level threads are mapped to a single kernel thread.


 In this model, the kernel treats all user level threads as single thread and the execution
switching among the user level threads happens when a currently executing user level
thread voluntarily blocks itself or relinquishes the CPU.
 Solaris Green threads and GNU Portable Threads are examples for this.
 The 'PThread' example given under the POSIX thread library section is an
illustrative example of an application with the Many-to-One thread model.

One-to-One Model

 In the One-to-One model, each user level thread is bound to a kernel/system level
thread. Windows XP/NT/2000 and Linux threads are examples of the One-to-One thread
model.
 The modified 'PThread' example given under the 'Thread Pre-emption' section is an
illustrative example of an application with the One-to-One thread model.

Many-to-Many Model

 In this model many user level threads are allowed to be mapped-to many kernel
threads.
 Windows NT/2000 with ThreadFibre package is an example for this.

Thread Vs Process


MULTIPROCESSING AND MULTITASKING

In the operating system context, multiprocessing describes the ability to execute multiple
processes simultaneously.

Systems which are capable of performing multiprocessing are known as
multiprocessor systems.

Multiprocessor systems possess multiple CPUs and can execute multiple processes
simultaneously.

The ability of the operating system to have multiple programs in memory, which are ready
for execution, is referred to as multiprogramming.

In a uniprocessor system, it is not possible to execute multiple processes simultaneously.

However, it is possible for a uniprocessor system to achieve some degree of pseudo
parallelism in the execution of multiple processes by switching the execution among different
processes.

The ability of an operating system to hold multiple processes in memory and switch the
processor (CPU) from executing one process to another process is known as multitasking.

Multitasking creates the illusion of multiple tasks executing in parallel.

Multitasking involves the switching of the CPU from executing one task to another.

A 'process' is considered as a 'virtual processor', awaiting its turn to have its properties
switched into the physical processor. In a multitasking environment, when task/process


switching happens, the virtual processor (task/process) gets its properties converted into those
of the physical processor.

The switching of the virtual processor to physical processor is controlled by the scheduler of
the OS kernel.

Whenever a CPU switching happens, the current context of execution should be saved to
retrieve it at a later point of time when the CPU executes the process, which is interrupted
currently due to execution switching.

The context saving and retrieval is essential for resuming a process exactly from the point
where it was interrupted due to CPU switching.

The act of switching CPU among the processes or changing the current execution context is
known as ‘Context switching’.

The act of saving the current context which contains the context details (Register details,
memory details, system resource usage details, execution details, etc.) for the currently
running process at the time of CPU switching is known as ‘Context saving’.

The process of retrieving the saved context details for a process, which is going to be
executed due to CPU switching, is known as ‘Context retrieval’.

Multitasking involves 'Context switching' (Fig. 10.11), 'Context saving' and 'Context
retrieval'.

Toss juggling, the skilful object manipulation game, is a classic real world example of the
multitasking illusion.

The juggler uses a number of objects (balls, rings, etc.) and throws them up and catches
them. At any point of time, he throws only one ball and catches only one per hand.

However, the speed at which he switches the balls for throwing and catching creates the
illusion to the spectators that he is throwing and catching multiple balls or using more than
two hands simultaneously.


TYPES OF MULTITASKING

Depending on how the switching act is implemented, multitasking can be classified into
different types.

The following section describes the various types of multitasking existing in the Operating
System’s context.

Co-operative Multitasking

Co-operative multitasking is the most primitive form of multitasking, in which a task/process
gets a chance to execute only when the currently executing task/process voluntarily
relinquishes the CPU.

In this method, any task/process can hold the CPU for as much time as it wants. Since this type
of implementation relies on the mercy of the tasks towards each other for getting CPU time for
execution, it is known as co-operative multitasking.

If the currently executing task is non-cooperative, the other tasks may have to wait for a long
time to get the CPU.

Preemptive Multitasking

Preemptive multitasking ensures that every task/process gets a chance to execute.

When and how much time a process gets is dependent on the implementation of the
preemptive scheduling.


As the name indicates, in preemptive multitasking, the currently running task/process is
preempted to give a chance for other tasks/processes to execute.

The preemption of a task may be based on time slots or task/process priority.

Non-preemptive Multitasking

In non-preemptive multitasking, the process/task which is currently given the CPU time is
allowed to execute until it terminates (enters the 'Completed' state) or enters the
'Blocked/Wait' state, waiting for an I/O or system resource.

Co-operative and non-preemptive multitasking differ in their behaviour when they are in
the 'Blocked/Wait' state.

In co-operative multitasking, the currently executing process/task need not relinquish the
CPU when it enters the 'Blocked/Wait' state, whereas in non-preemptive multitasking the
currently executing task relinquishes the CPU when it waits for an I/O or system resource.

TASK COMMUNICATION

In a multitasking system, multiple tasks/processes run concurrently (in pseudo parallelism)


and each process may or may not interact with the others.

Based on the degree of interaction, the processes running on an OS are classified as

Co-operating Processes: In the co-operating interaction model one process requires the
inputs from other processes to complete its execution.

Competing Processes: The competing processes do not share anything among themselves
but they share the system resources. The competing processes compete for system
resources such as files, display devices, etc. Co-operating processes exchange information and
communicate through the following methods.

Co-operation through Sharing: The co-operating process exchange data through some
shared resources.

Co-operation through Communication: No data is shared between the processes. But they
communicate for synchronisation. The mechanism through which processes/tasks
communicate each other is known as Inter Process/Task Communication (IPC). Inter
Process Communication is essential for process co-ordination. The various types of Inter
Process Communication (IPC) mechanisms adopted by process are kernel (Operating
System) dependent. Some of the important IPC mechanisms adopted by various kernels are
explained below.

Shared Memory

 Processes share some area of the memory to communicate among them (Fig. 10.16).
Information to be communicated by the process is written to the shared memory area.


 Other processes which require this information can read the same from the shared
memory area.
 It is similar to the real world example where a 'Notice Board' is used by a corporate office to
publish public information among the employees.
 The implementation of the shared memory concept is kernel dependent.
 Different mechanisms are adopted by different kernels for implementing this. A few
among them are given below.

Pipes

 A 'Pipe' is a section of shared memory used by processes for communicating.


 Pipes follow the client-server architecture.
 A process which creates a pipe is known as a pipe server and a process which
connects to a pipe is known as pipe client.
 A pipe can be considered as a conduit for information flow and has two conceptual
ends.
 It can be unidirectional, allowing information flow in one direction or bidirectional
allowing bi-directional information flow.
 A unidirectional pipe allows the process connecting at one end of the pipe to write to
the pipe and the process connected at the other end of the pipe to read the data,
whereas a bi-directional pipe allows both reading and writing at one end.
 The unidirectional pipe can be visualised as shown in the figure.

The implementation of ‘Pipes’ is also OS dependent. Microsoft® Windows Desktop


Operating Systems support two types of ‘Pipes’ for Inter Process Communication. They are:

Anonymous Pipes: Anonymous pipes are unnamed, unidirectional pipes used for data
transfer between two processes.

Named Pipes: Named pipe is a named, unidirectional or bi-directional pipe for data
exchange between processes.

Like anonymous pipes, the process which creates the named pipe is known as pipe server. A
process which connects to the named pipe is known as pipe client.

Dept. Of CSE, HKBKCE 134 2019-20


18CS44 Microcontrollers and Embedded Systems

With named pipes, any process can act as both client and server allowing point-to-point
communication.

Named pipes can be used for communicating between processes running on the same
machine or between processes running on different machines connected to a network.
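A hedged Win32 sketch of an anonymous pipe: for brevity, both ends are used within the same process, just to show the shape of the CreatePipe/WriteFile/ReadFile calls (the message text is an illustrative assumption):

/* pipe_demo.c : hedged anonymous pipe example for Windows */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hRead, hWrite;
    char   buffer[32];
    DWORD  bytes;

    if (!CreatePipe(&hRead, &hWrite, NULL, 0))              /* create the two pipe ends */
        return -1;

    WriteFile(hWrite, "Hello", 6, &bytes, NULL);            /* write at one end         */
    ReadFile(hRead, buffer, sizeof(buffer), &bytes, NULL);  /* read at the other end    */
    printf("Read from pipe: %s\n", buffer);

    CloseHandle(hRead);
    CloseHandle(hWrite);
    return 0;
}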

Memory Mapped Objects

A memory mapped object is a shared memory technique adopted by certain Real-Time
Operating Systems for allocating a shared block of memory which can be accessed by
multiple processes simultaneously.

In this approach a mapping object is created and physical storage for it is reserved and
committed.

A process can map the entire committed physical area or a block of it to its virtual address
space.

All read and write operation to this virtual address space by a process is directed to its
committed physical area.

Any process which wants to share data with other processes can map the physical memory
area of the mapped object to its virtual memory space and use it for sharing the data.

Windows CE 5.0 RTOS uses the memory mapped object based shared memory technique for
Inter Process Communication (Fig. 10.18).

The CreateFileMapping (HANDLE hFile, LPSECURITY_ATTRIBUTES
lpFileMappingAttributes, DWORD flProtect, DWORD dwMaximumSizeHigh, DWORD
dwMaximumSizeLow, LPCTSTR lpName) system call is used for sharing the memory.

This API call is used for creating a mapping from a file.

In order to create the mapping from the system paging memory, the handle parameter should
be passed as INVALID_HANDLE_VALUE (-1).

The lpFileMappingAttributes parameter represents the security attributes and it must be
NULL.

The flProtect parameter represents the read/write access for the shared memory area.

A value of PAGE_READONLY makes the shared memory read only, whereas the value
PAGE_READWRITE gives read-write access to the shared memory.

The parameter dwMaximumSizeHigh specifies the higher order 32 bits of the maximum size
of the memory mapped object and dwMaximumSizeLow specifies the lower order 32 bits of
the maximum size of the memory mapped object.


The parameter lpName points to a null terminated string specifying the name of the memory
mapped object.

The memory mapped object is created as an unnamed object if the parameter lpName is NULL.

If lpName specifies the name of an existing memory mapped object, the function returns the
handle of the existing memory mapped object to the caller process.

The memory mapped object can be shared between processes either by passing the handle
of the object or by passing its name.

If the handle of the memory mapped object created by a process is passed to another process
for shared access, there is a possibility of the handle being closed by the process which created
it while it is still in use by another process.

This will throw OS level exceptions.

If the name of the memory object is passed for shared access among processes, processes can
use this name for creating a shared memory object, which will open the shared memory object
already existing with the given name.

The OS maintains a usage count for the named object and it is incremented each time
a process creates/opens a memory mapped object with an existing name.

This will prevent the destruction of a shared memory object by one process while it is being
accessed by another process.

Hence, passing the name of the memory mapped object is strongly recommended for memory
mapped object based inter process communication.


The MapViewOfFile (HANDLE hFileMappingObject, DWORD dwDesiredAccess, DWORD
dwFileOffsetHigh, DWORD dwFileOffsetLow, DWORD dwNumberOfBytesToMap)
system call maps a view of the memory mapped object to the address space of the calling
process.

The parameter hFileMappingObject specifies the handle to an existing memory mapped
object.

The dwDesiredAccess parameter represents the read/write access for the mapped view area.

A value of FILE_MAP_WRITE makes the view access read-write, provided the memory
mapped object hFileMappingObject is created with read-write access, whereas the value
FILE_MAP_READ gives read only access to the shared memory, provided the memory
mapped object hFileMappingObject is created with read-write/read only access.

The parameter dwFileOffsetHigh specifies the higher order 32 bits and dwFileOffsetLow
specifies the lower order 32 bits of the memory offset where the mapping is to begin from the
memory mapped object.

A value of '0' for both of these maps the view from the beginning of the memory area of the
memory object. dwNumberOfBytesToMap specifies the number of bytes of the memory
object to map.

If dwNumberOfBytesToMap is zero, the entire memory area owned by the memory mapped
object is mapped.

On successful execution, the MapViewOfFile call returns the starting address of the mapped
view.

If the function fails it returns NULL.

A mapped view of the memory mapped object is unmapped by the API call
UnmapViewOfFile (LPCVOID IpBaseAddress).

The IpBaseAddress parameter specifies a pointer to the base address of the mapped view of a
memory object that is to be unmapped.

This value must be identical to the value returned by a previous call to the MapViewOfFile
function.

Calling UnmapViewOfFile cleans up the committed physical storage in a process’s virtual


address space.

In other words, it frees the virtual address space of the mapping object. Under Windows
NT/XP OS, a process can open an existing memory mapped object by calling the API
OpenFileMapping(DWORD dwDesiredAccess,

BOOL bInheritHandle, LPCTSTR lpName).

The parameter dwDesiredAccess specifies the read-write access permissions for the memory
mapped object.

A value of FILE_MAP_ALL_ACCESS provides read-write access, whereas the value
FILE_MAP_READ allocates only read access and FILE_MAP_WRITE allocates write only
access.

The parameter bInheritHandle specifies the handle inheritance.

If this parameter is TRUE, the calling process inherits the handle of the existing object,
otherwise not.

The parameter lpName specifies the name of the existing memory mapped object which
needs to be opened.

Windows CE 5.0 does not support handle inheritance and hence the API call
OpenFileMapping is not supported.
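
A minimal sketch of how these calls fit together is given below. It creates a paging-file backed object with the Win32 CreateFileMapping call and maps a view of it with MapViewOfFile; the object name "Local\\DemoSharedMem", the 4 KB size and the message text are illustrative values chosen for this sketch, not details from the text.

#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Create (or open, if it already exists) a named memory mapped object
       backed by the system paging file. The name and size are arbitrary. */
    HANDLE hMap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                    0, 4096, TEXT("Local\\DemoSharedMem"));
    if (hMap == NULL)
        return 1;

    /* Map a read-write view of the entire object into this process's address
       space (offset = 0, number of bytes to map = 0 means the whole object). */
    LPVOID pView = MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, 0);
    if (pView == NULL) {
        CloseHandle(hMap);
        return 1;
    }

    /* Any other process that maps the object named "Local\\DemoSharedMem"
       (via CreateFileMapping/OpenFileMapping + MapViewOfFile) sees this data. */
    strcpy((char *)pView, "Hello from process A");
    printf("Shared memory holds: %s\n", (char *)pView);

    UnmapViewOfFile(pView);   /* unmap the view ...                 */
    CloseHandle(hMap);        /* ... and release the mapping object */
    return 0;
}

Passing the object name rather than the handle, as recommended above, lets the second process obtain its own handle safely even if the first process exits.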

Message Passing

 Message passing is an (a)synchronous information exchange mechanism used for


Inter Process/Thread Communication.
 The major difference between shared memory and message passing is that only a limited
amount of info/data is passed through message passing.
 Also message passing is relatively fast and free from the synchronisation overheads
compared to shared memory.
 Based on the message passing operation between the processes, message passing is
classified into Message queue, Mailbox and Signalling.

Message Queue

Usually the process which wants to talk to another process posts the message to a First-In-
First-Out (FIFO) queue called ‘Message queue’, which stores the messages temporarily in a
system defined memory object, to pass it to the desired process (Fig. 10.20).

Messages are sent and received through send (Name of the process to which the message is to
be sent, message) and receive (Name of the process from which the message is to be received,
message) methods.

The messages are exchanged through a message queue.

The implementation of the message queue, send and receive methods are OS kernel
dependent.

The Windows XP OS kernel maintains a single system message queue and one
process/thread (Process and threads are used interchangeably here, since thread is the basic
unit of process in windows) specific message queue.

A thread which wants to communicate with another thread posts the message to the system
message queue.

The kernel picks up the message from the system message queue one at a time and examines
the message for finding the destination thread and then posts the message to the message
queue of the corresponding thread.
For posting a message to a thread’s message queue, the kernel fills a message structure MSG
and copies it to the message queue of the thread.

The message structure MSG contains the handle of the process/thread for which the message
is intended, the message parameters, the time at which the message is posted, etc.

A thread can simply post a message to another thread and can continue its operation or it may
wait for a response from the thread to which the message is posted.

The messaging mechanism is classified into synchronous and asynchronous based on the
behaviour of the message posting thread.

In asynchronous messaging, the message posting thread just posts the message to the queue
and it will not wait for an acceptance (return) from the thread to which the message is posted,
whereas in synchronous messaging, the thread which posts a message enters waiting state and
waits for the message result from the thread to which the message is posted.

The thread which invoked the send message becomes blocked and the scheduler will not pick
it up for scheduling.

The PostMessage(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam) or
PostThreadMessage(DWORD idThread, UINT Msg, WPARAM wParam, LPARAM lParam)
API is used by a thread in Windows for posting a message to its own message queue or to the
message queue of another thread.

The PostMessage API does not always guarantee the posting of messages to a message queue;
it will not post a message when the message queue is full.

Hence it is recommended to check the return value of PostMessage API to confirm the
posting of message.

The SendMessage(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)
API call sends a message to the thread specified by the handle hWnd and waits for the callee
thread to process the message.

The thread which calls the SendMessage API enters waiting state and waits for the message
result from the thread to which the message is posted.

The thread which invoked the SendMessage API call becomes blocked and the scheduler will
not pick it up for scheduling.
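
The sketch below illustrates the asynchronous posting path described above using PostThreadMessage and a GetMessage loop in the receiving thread. The user-defined message ID WM_SENSOR_DATA, the payload value 42 and the Sleep based start-up delay are illustrative choices, not part of the text; a production design would use a proper handshake instead of Sleep.

#include <windows.h>
#include <stdio.h>

#define WM_SENSOR_DATA (WM_APP + 1)        /* hypothetical user-defined message */

static DWORD WINAPI WorkerThread(LPVOID arg)
{
    MSG msg;
    (void)arg;
    /* Touching the queue once forces Windows to create it for this thread. */
    PeekMessage(&msg, NULL, WM_USER, WM_USER, PM_NOREMOVE);

    /* GetMessage blocks until a message arrives; it returns 0 on WM_QUIT. */
    while (GetMessage(&msg, NULL, 0, 0)) {
        if (msg.message == WM_SENSOR_DATA)
            printf("Worker received value %u\n", (unsigned)msg.wParam);
    }
    return 0;
}

int main(void)
{
    DWORD  tid;
    HANDLE hThread = CreateThread(NULL, 0, WorkerThread, NULL, 0, &tid);

    Sleep(100);   /* crude wait for the worker's message queue to exist */

    /* Asynchronous post: returns immediately; always check the return value,
       since the post fails if the queue does not exist yet or is full. */
    if (!PostThreadMessage(tid, WM_SENSOR_DATA, 42, 0))
        printf("PostThreadMessage failed\n");

    Sleep(100);
    PostThreadMessage(tid, WM_QUIT, 0, 0);   /* ask the worker loop to exit */
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    return 0;
}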

The Windows CE operating system supports a special Point-to-Point Message queue
implementation.

The OS maintains a First In First Out (FIFO) buffer for storing the messages and each
process can access this buffer for reading and writing messages.

The OS also maintains a special queue, with single message storing capacity, for storing high
priority messages (Alert messages).

The creation and usage of message queues under Windows CE OS is explained below.

The CreateMsgQueue(LPCWSTR lpszName, LPMSGQUEUEOPTIONS lpOptions) API
call creates a message queue or opens a named message queue and returns a read only or
write only handle to the message queue.

A process can use this handle for reading or writing a message from/to the message queue
pointed to by the handle. The parameter lpszName specifies the name of the message queue.

If this parameter is NULL, an unnamed message queue is created. Processes can use the
handle returned by the API call if the message queue is created without any name.

If the message queue is created as named message queue, other processes can use the name of
the message queue for opening the named message queue created by a process.

Calling the CreateMsgQueue API with an existing named message queue as parameter
returns a handle to the existing message queue.

Under the Desktop Windows Operating Systems (Windows 9x/XP/NT/2K), each object type
(viz. mutex, semaphores, events, memory maps, watchdog timers and message queues) shares
the same namespace, and the same name cannot be used for creating more than one of these objects.

Windows CE kernel maintains separate namespace for each and supports the same name
across different objects.

The lpOptions parameter points to a MSGQUEUEOPTIONS structure that sets the
properties of the message queue.

Mailbox

 Mailbox is an alternate form of ‘Message queues’ and it is used in certain Real- Time
Operating Systems for IPC.
 Mailbox technique for IPC in RTOS is usually used for one way messaging.
 The task/thread which wants to send a message to other tasks/threads creates a
mailbox for posting the messages.
 The threads which are interested in receiving the messages posted to the mailbox by
the mailbox creator thread can subscribe to the mailbox.
 The thread which creates the mailbox is known as ‘mailbox server’ and the threads
which subscribe to the mailbox are known as ‘mailbox clients’.
 The mailbox server posts messages to the mailbox and notifies it to the clients which
are subscribed to the mailbox.
 The clients read the message from the mailbox on receiving the notification.
 The mailbox creation, subscription, message reading and writing are achieved through
OS kernel provided API calls. Mailbox and message queues are the same in functionality.
 The only difference is in the number of messages supported by them.
 Both of them are used for passing data in the form of message(s) from a task to
another task(s). Mailbox is used for exchanging a single message between two tasks
or between an Interrupt Service Routine (ISR) and a task.
 Mailbox associates a pointer pointing to the mailbox and a wait list to hold the tasks
waiting for a message to appear in the mailbox.
 The implementation of mailbox is OS kernel dependent.
 The MicroC/OS-II implements mailbox as a mechanism for inter-task
communication (a short sketch follows this list).
 Figure 10.21 given below illustrates the mailbox based IPC technique
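
A rough sketch of the MicroC/OS-II mailbox calls mentioned above is given below. The task bodies, the temperature variable and the one second delay are invented for illustration, and the usual uC/OS-II project setup (the includes.h master header, task creation and OSStart()) is assumed rather than shown.

#include "includes.h"                 /* standard uC/OS-II master header (assumed) */

static OS_EVENT *TempMbox;            /* mailbox shared by the two tasks           */
static INT32U    TempReading;         /* message payload (passed by pointer)       */

/* Mailbox "server" task: posts one message at a time to the mailbox. */
void SensorTask(void *pdata)
{
    (void)pdata;
    for (;;) {
        TempReading = 42;                           /* pretend ADC sample           */
        OSMboxPost(TempMbox, (void *)&TempReading); /* notify the subscriber        */
        OSTimeDlyHMSM(0, 0, 1, 0);                  /* post roughly once per second */
    }
}

/* Mailbox "client" task: pends (blocks) until a message appears. */
void DisplayTask(void *pdata)
{
    INT8U   err;
    INT32U *pTemp;

    (void)pdata;
    for (;;) {
        pTemp = (INT32U *)OSMboxPend(TempMbox, 0, &err);   /* 0 = wait forever */
        if (err == OS_NO_ERR) {
            /* consume *pTemp here, e.g. update an LCD */
        }
    }
}

/* During system initialisation (before OSStart()):
       TempMbox = OSMboxCreate((void *)0);          create an empty mailbox        */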

Signalling

 Signalling is a primitive way of communication between processes/threads.


 Signals are used for asynchronous notifications where one process/thread fires a
signal, indicating the occurrence of a scenario for which the other process(es)/thread(s) is/are
waiting.
 Signals are not queued and they do not carry any data.
 The communication mechanisms used in RTX51 Tiny OS is an example for
Signalling.
 The os_send_signal kernel call under RTX51 sends a signal from one task to a
specified task. Similarly the os_wait kernel call waits for a specified signal (see the sketch after this list).
 The VxWorks RTOS kernel also implements ‘signals’ for inter process
communication. Whenever a specified signal occurs it is handled in a signal handler
associated with the signal.
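
A hedged sketch of the RTX51 Tiny style signalling described above is given below; the task numbers, the tick delay and the overall structure are illustrative, and the header name and task keywords follow the usual Keil C51/RTX51 Tiny conventions, so they should be verified against the toolchain in use.

#include <rtx51tny.h>                  /* Keil RTX51 Tiny header (assumed name) */

/* Task 0 runs first under RTX51 Tiny and starts the other tasks. */
void init_task (void) _task_ 0 {
    os_create_task (1);                /* start the producer task */
    os_create_task (2);                /* start the consumer task */
    os_delete_task (0);                /* the init task is no longer needed */
}

/* Fires a signal whenever its (hypothetical) event occurs. */
void producer (void) _task_ 1 {
    while (1) {
        os_wait (K_TMO, 50, 0);        /* placeholder for real event detection  */
        os_send_signal (2);            /* asynchronous notification, carries no data */
    }
}

/* Blocks until the signal arrives, then reacts to it. */
void consumer (void) _task_ 2 {
    while (1) {
        os_wait (K_SIG, 0, 0);         /* wait for a signal                      */
        /* ... handle the notified event here ... */
    }
}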

Remote Procedure Call (RPC) and Sockets

 Remote Procedure Call or RPC (Fig. 10.22) is the Inter Process Communication (IPC)
mechanism used by a process to call a procedure of another process running on the
same CPU or on a different CPU which is interconnected in a network.
 In the object oriented language terminology RPC is also known as Remote Invocation
or Remote Method Invocation (RMI).
 RPC is mainly used for distributed applications like client-server applications.
 With RPC it is possible to communicate over a heterogeneous network (i.e. a network
where client and server applications are running on different Operating Systems). The
CPU/process containing the procedure which needs to be invoked remotely is known
as the server.
 The CPU/process which initiates an RPC request is known as client.

 It is possible to implement RPC communication with different invocation interfaces.


 In order to make the RPC communication compatible across all platforms it should
stick to certain standard formats.
 Interface Definition Language (IDL) defines the interfaces for RPC. Microsoft
Interface Definition Language (MIDL) is the IDL implementation from Microsoft
for all Microsoft platforms.
 The RPC communication can be either Synchronous (Blocking) or Asynchronous
(Non-blocking).
 In the Synchronous communication, the process which calls the remote procedure is
blocked until it receives a response back from the other process.
 In asynchronous RPC calls, the calling process continues its execution while the
remote process performs the execution of the procedure.

 The result from the remote procedure is returned back to the caller through
mechanisms like callback functions.

 On the security front, RPC employs authentication mechanisms to protect the systems
against vulnerabilities.
 The client applications (processes) should authenticate themselves with the server for
getting access.
 Authentication mechanisms like IDs and cryptographic techniques (like DES, 3DES), etc.
are used by the client for authentication.
 Without authentication, any client can access the remote procedure.
 This may lead to potential security risks. Sockets are used for RPC communication.
 Socket is a logical endpoint in a two-way communication link between two
applications running on a network.
 A port number is associated with a socket so that the network layer of the
communication channel can deliver the data to the designated application.
 Sockets are of different types, namely, Internet sockets (INET), UNIX sockets, etc.
 The INET socket works on internet communication protocol.
 TCP/IP, UDP, etc. are the communication protocols used by INET sockets. INET
sockets are classified into:
1. Stream sockets
2. Datagram sockets

 Stream sockets are connection oriented and they use TCP to establish a reliable
connection. On the other hand, Datagram sockets rely on UDP for establishing a
connection.
 The UDP connection is unreliable when compared to TCP.
 The client-server communication model uses a socket at the client side and a socket at
the server side.
 A port number is assigned to both of these sockets.
 The client and server should be aware of the port number associated with the socket.
 In order to start the communication, the client needs to send a connection request to
the server at the specified port number.
 The client should be aware of the name of the server along with its port number.
 The server always listens to the specified port number on the network. Upon receiving
a connection request from the client, based on the success of authentication, the server
grants the connection request and a communication channel is established between
the client and server. The client uses the host name and port number of the server for
sending requests and the server uses the client’s name and port number for sending
responses (a minimal client-side sketch follows this list).
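
A minimal client-side sketch of the stream (TCP) socket flow described above is given below, written against the Berkeley/POSIX sockets API (the Winsock setup on Windows differs slightly). The server address 127.0.0.1, the port number 5000 and the request text are illustrative values.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main(void)
{
    struct sockaddr_in server;
    char reply[128];
    ssize_t n;

    /* Create a stream (TCP) socket; SOCK_DGRAM would give a datagram socket. */
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return 1;

    /* The client must know the server's address and port number. */
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_port   = htons(5000);                   /* illustrative port   */
    inet_pton(AF_INET, "127.0.0.1", &server.sin_addr); /* illustrative server */

    /* Send a connection request to the listening server. */
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        close(sock);
        return 1;
    }

    send(sock, "request", 7, 0);                       /* send a request      */
    n = recv(sock, reply, sizeof(reply) - 1, 0);       /* wait for a response */
    if (n > 0) {
        reply[n] = '\0';
        printf("Server replied: %s\n", reply);
    }

    close(sock);
    return 0;
}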

TASK SYNCHRONISATION

 In a multitasking environment, multiple processes run concurrently (in pseudo
parallelism) and share the system resources.
 Apart from this, each process has its own boundary wall and they communicate with
each other with different IPC mechanisms including shared memory and variables.
 Imagine a situation where two processes try to access display hardware connected to
the system or two processes try to access a shared memory area where one process
tries to write to a memory location when the other process is trying to read from this.

 The act of making processes aware of the access of the shared resources by each
process to avoid conflicts is known as Task/Process Synchronisation.

Task Communication/Synchronisation Issues

Racing

 From a programmer perspective the value of counter will be 10 at the end of
execution of processes A & B.
 But it need not always be so in a real-world execution of this piece of code under a
multitasking kernel.
 The results depend on the process scheduling policies adopted by the OS kernel. Now
let’s dig into the piece of code illustrated above.
 The program statement counter++; looks like a single statement from a high level
programming language (‘C’ language) perspective.
 The low level implementation of this statement is dependent on the underlying
processor instruction set and the (cross) compiler in use.
 The low level implementation of the high level program statement counter++; under
the Windows XP operating system running on an Intel Centrino Duo processor is described
below. The code snippet is compiled with the Microsoft Visual Studio 6.0 compiler.

 At the processor instruction level, the value of the variable counter is loaded to the
Accumulator register (EAX register).
 The memory variable counter is represented using a pointer.
 The base pointer register (EBP register) is used for pointing to the memory variable
counter. After loading the contents of the variable ‘counter’ to the Accumulator, the
Accumulator content is incremented by one using the add instruction.
 Finally the content of the Accumulator is loaded to the memory location which
represents the variable counter.
 Both the processes, Process A and Process B, contain the program statement
counter++; which translates into the same three machine-level instructions.

 Imagine a situation where a process switching (context switching) happens from
Process A to Process B when Process A is executing the counter++; statement.
 Process A accomplishes the counter++; statement through three different low level
instructions.
 Now imagine that the process switching happened at the point where Process A
executed the low level instruction ‘mov eax, dword ptr [ebp-4]’ and is about to execute
the next instruction ‘add eax, 1’.
 The scenario is illustrated in Fig. 10.23. Process B increments the shared variable
‘counter’ in the middle of the operation where Process A tries to increment it.
 When Process A gets the CPU time for execution, it starts from the point where it got
interrupted (If Process B is also using the same registers eax and ebp for executing
counter++;

instruction, the original content of these registers will be saved as part of the context saving
and it will be retrieved back as part of context retrieval, when process A gets the CPU for
execution.
 Hence the content of eax and ebp remains intact irrespective of context switching).
 Though the variable counter is incremented by Process B, Process A is unaware of it
and it increments the variable with the old value.
 This leads to the loss of one increment for the variable counter.
 This problem occurs due to the non-atomic operation on variables.
 This issue would not have occurred if the underlying actions corresponding to the
program statement counter++; were finished in a single CPU execution cycle.
 The best way to avoid this situation is to make the access and modification of shared
variables mutually exclusive; meaning when one process accesses a shared variable,
prevent the other processes from accessing it.

To summarise, Racing or Race condition is the situation in which multiple processes
compete (race) with each other to access and manipulate shared data concurrently. In a Race
condition the final value of the shared data depends on the process which acted on the data
last.
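
The lost-update scenario can be reproduced with the small Win32 sketch below: two threads increment a shared counter without mutual exclusion, so the non-atomic load/add/store sequence of counter++ can be interleaved by context switches. The iteration count is arbitrary; guarding the increment with a critical section or using InterlockedIncrement(&counter) removes the race.

#include <windows.h>
#include <stdio.h>

#define ITERATIONS 1000000L

static volatile LONG counter = 0;      /* shared data */

static DWORD WINAPI IncrementTask(LPVOID arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERATIONS; i++)
        counter++;                     /* non-atomic: load, add, store */
    return 0;
}

int main(void)
{
    HANDLE h[2];
    h[0] = CreateThread(NULL, 0, IncrementTask, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, IncrementTask, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);

    /* Expected 2 * ITERATIONS, but increments are frequently lost. */
    printf("counter = %ld (expected %ld)\n", (long)counter, 2 * ITERATIONS);

    CloseHandle(h[0]);
    CloseHandle(h[1]);
    return 0;
}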

Deadlock

A race condition produces incorrect results whereas a deadlock condition creates a situation
where none of the processes are able to make any progress in their execution resulting in a set
of deadlock processes.

A situation similar to traffic jam issues is illustrated below

In its simplest form ‘deadlock’ is the condition in which a process is waiting for a resource
held by another process which is waiting for a resource held by the first process (Fig. 10.25).

To elaborate: Process A holds a resource x and it wants a resource y held by Process B.

Process B is currently holding resource y and it wants the resource x which is currently held
by Process A.

Both hold the respective resources and they compete each other to get the resource held by
the respective processes.

The result of the competition is ‘deadlock’.

None of the competing processes will be able to access the resources held by the other processes
since they are locked by the respective processes (if a mutual exclusion policy is
implemented for shared resource access, the resource is locked by the process which is
currently accessing it).

Mutual Exclusion: The criterion that only one process can hold a resource at a time, meaning
processes should access shared resources with mutual exclusion. A typical example is the
accessing of display hardware in an embedded device.

Hold and Wait: The condition in which a process holds a shared resource by acquiring the
lock controlling the shared access and waits for additional resources held by other
processes.

No Resource Preemption: The criterion that the operating system cannot take back a resource
from a process which is currently holding it; the resource can only be released voluntarily
by the process holding it.

Circular Wait:

A process is waiting for a resource which is currently held by another process which in turn is
waiting for a resource held by the first process.

In general, there exists a set of waiting processes P0, P1, ..., Pn where P0 is waiting for a resource
held by P1, P1 is waiting for a resource held by P2, ..., and Pn is waiting for a resource held
by P0.

This forms a circular wait queue. ‘Deadlock’ is a result of the combined occurrence of
these four conditions listed above.

These conditions were first described by E. G. Coffman in 1971 and they are popularly known as
the Coffman conditions.
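
A hedged Win32 sketch of the hold-and-wait/circular-wait pattern is given below: two threads each lock one resource and then try to acquire the other in the opposite order, which can deadlock exactly as described for Process A and Process B. The Sleep call merely widens the window so the deadlock is easy to observe; acquiring the locks in the same global order in both threads would prevent it.

#include <windows.h>
#include <stdio.h>

static CRITICAL_SECTION resX, resY;   /* two shared resources */

static DWORD WINAPI ProcessA(LPVOID arg)
{
    (void)arg;
    EnterCriticalSection(&resX);      /* hold resource x ...          */
    Sleep(100);                       /* widen the deadlock window    */
    EnterCriticalSection(&resY);      /* ... and wait for resource y  */
    printf("A got both resources\n");
    LeaveCriticalSection(&resY);
    LeaveCriticalSection(&resX);
    return 0;
}

static DWORD WINAPI ProcessB(LPVOID arg)
{
    (void)arg;
    EnterCriticalSection(&resY);      /* hold resource y ...          */
    Sleep(100);
    EnterCriticalSection(&resX);      /* ... and wait for resource x  */
    printf("B got both resources\n");
    LeaveCriticalSection(&resX);
    LeaveCriticalSection(&resY);
    return 0;
}

int main(void)
{
    HANDLE h[2];
    InitializeCriticalSection(&resX);
    InitializeCriticalSection(&resY);
    h[0] = CreateThread(NULL, 0, ProcessA, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, ProcessB, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);   /* hangs if a deadlock occurs */
    return 0;
}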

Deadlock Handling

A smart OS may foresee the deadlock condition and will act proactively to avoid such a
situation.

Now if a deadlock occurs, how does the OS respond to it? The reaction to a deadlock condition
is not uniform across operating systems.

The OS may adopt any of the following techniques to detect and prevent deadlock conditions.

Ignore Deadlocks:

Always assume that the system design is deadlock free.

This is acceptable for the reason that the cost of removing a deadlock is large compared to the
chance of a deadlock happening.

UNIX is an example of an OS following this principle.

A life critical system cannot pretend that it is deadlock free for any reason.

Detect and Recover:

This approach suggests the detection of a deadlock situation and recovery from it.

This is similar to the deadlock condition that may arise at a traffic junction: when the vehicles
from different directions compete to cross the junction, a deadlock (traffic jam) condition
results. Once a deadlock (traffic jam) has happened at the junction, the only solution is to back
up the vehicles from one direction and allow the vehicles from the opposite direction to cross
the junction. If the traffic is too high, lots of vehicles may have to be backed up to resolve the
traffic jam.

This technique is also known as ‘back up cars’ technique (Fig. 10.26).

Operating systems keep a resource graph in their memory.

The resource graph is updated on each resource request and release.

A deadlock condition can be detected by analysing the resource graph by graph analyser
algorithms.

Once a deadlock condition is detected, the system can terminate a process or preempt the
resource to break the deadlocking cycle.

Avoid Deadlocks: Deadlock is avoided by the careful resource allocation techniques by the
Operating System. It is similar to the traffic light mechanism at junctions to avoid the traffic
jams.

Prevent Deadlocks: Prevent the deadlock condition by negating one of the four conditions
favouring the deadlock situation.

Ensure that a process does not hold any other resources when it requests a resource. This can
be achieved by implementing the following set of rules/guidelines in allocating resources to
processes.
1. A process must request all its required resources and the resources should be
allocated before the process begins its execution.
2. Grant resource allocation requests from processes only if the process does not hold
a resource currently.

Ensure that resource preemption (resource releasing) is possible at the operating system level.
This can be achieved by implementing the following set of rules/guidelines in resource
allocation and releasing.
1. Release all the resources currently held by a process if a request made by the
process for a new resource cannot be fulfilled immediately.
2. Add the resources which are preempted (released) to a resource list describing the
resources which the process requires to complete its execution.
3. Reschedule the process for execution only when the process gets its old resources
and the new resource which is requested by the process.

Imposing these criteria may introduce negative impacts like low resource utilisation and
starvation of processes.

Livelock

 The Livelock condition is similar to the deadlock condition except that a process in
livelock condition changes its state with time.
 While in deadlock a process enters the wait state for a resource and continues in that
state forever without making any progress in the execution, in a livelock condition a
process always does something but is unable to make any progress towards
execution completion.
 The livelock condition is better explained with the real world example of two people
attempting to cross each other in a narrow corridor.
 Both persons move towards one side of the corridor to allow the opposite person
to cross. Since the corridor is narrow, neither of them is able to cross. Here
both of the persons perform some action but still they are unable to achieve their
target of crossing each other.

Starvation

 In the multitasking context, starvation is the condition in which a process does not get
the resources required to continue its execution for a long time.
 As time progresses the process starves on resource.

 Starvation may arise due to various conditions like the byproduct of deadlock
preventive measures, scheduling policies favouring high priority tasks and tasks with the
shortest execution time, etc.

HOW TO CHOOSE AN RTOS

Functional Requirements

Processor Support It is not necessary that all RTOSs support all kinds of processor
architectures. It is essential to ensure the processor support by the RTOS.

Memory Requirements The OS requires ROM memory for holding the OS files and it is
normally stored in a non-volatile memory like FLASH.

OS also requires working memory RAM for loading the OS services.

Since embedded systems are memory constrained, it is essential to evaluate the minimal
ROM and RAM requirements for the OS under consideration.

Real-time Capabilities It is not mandatory that the operating system for all embedded
systems needs to be Real-time, and not all embedded Operating Systems are ‘Real-time’
in behaviour.

The task/process scheduling policies play an important role in the ‘Real-time’ behaviour of
an OS.

Analyse the real-time capabilities of the OS under consideration and the standards met by the
operating system for real-time capabilities.

Kernel and Interrupt Latency The kernel of the OS may disable interrupts while executing
certain services and it may lead to interrupt latency.

For an embedded system whose response requirements are high, this latency should be
minimal.

Inter Process Communication and Task Synchronisation The implementation of Inter
Process Communication and Synchronisation is OS kernel dependent. Certain kernels may
provide a bunch of options whereas others provide very limited options. Certain kernels
implement policies for avoiding priority inversion issues in resource sharing.

Modularisation Support Most of the operating systems provide a bunch of features.

At times it may not be necessary for an embedded product for its functioning.

It is very useful if the OS supports modularisation, wherein the developer can choose
the essential modules and re-compile the OS image for functioning.

Windows CE is an example for a highly modular operating system.

Support for Networking and Communication The OS kernel may provide stack
implementation and driver support for a bunch of communication interfaces and networking.

Ensure that the OS under consideration provides support for all the interfaces required by the
embedded product.

Development Language Support Certain operating systems include the run time libraries
required for running applications written in languages like Java and C#.

A Java Virtual Machine (JVM) customised for the Operating System is essential for running
java applications.

Similarly the .NET Compact Framework (.NET CF) is required for running Microsoft .NET
applications on top of the Operating System.

The OS may include these components as built-in components; if not, check the availability of
the same from a third-party vendor for the OS under consideration.

Non-functional Requirements

Custom Developed or Off the Shelf Depending on the OS requirement, it is possible to go
for the complete development of an operating system suiting the embedded system needs or
use an off the shelf, readily available operating system, which is either a commercial product
or an Open Source product, which is in close match with the system requirements.

Sometimes it may be possible to build the required features by customising an Open Source
OS.

The decision on which to select is purely dependent on the development cost, licensing fees
for the OS, development time and availability of skilled resources.

Cost The total cost for developing or buying the OS and maintaining it in terms of
commercial product and custom build needs to be evaluated before taking a decision on the
selection of OS.

Development and Debugging Tools Availability The availability of development and
debugging tools is a critical decision making factor in the selection of an OS for embedded
design.

Certain Operating Systems may be superior in performance, but the availability of tools for
supporting the development may be limited.

Ease of Use How easy it is to use a commercial RTOS is another important feature that needs
to be considered in the RTOS selection.

INTEGRATION OF HARDWARE AND FIRMWARE

 Integration of hardware and firmware deals with the embedding of firmware into the
target hardware board.

 It is the process of ‘Embedding Intelligence’ to the product.


 The embedded processors/controllers used in the target board may or may not have
built in code memory.
 For non-operating system based embedded products, if the processor/controller
contains internal memory and the total size of the firmware fits into the code
memory area, the firmware is downloaded into the internal code memory of the target
controller/processor.
 If the processor/controller does not support built in code memory or the size of the
firmware exceeds the memory size supported by the target processor/controller,
an external dedicated EPROM/FLASH memory chip is used for holding the
firmware.
 This chip is interfaced to the processor/controller. (The type of firmware storage,
either processor storage or external storage is decided at the time of hardware design
by taking the firmware complexity into consideration).
 A variety of techniques are used for embedding the firmware into the target board.
 The commonly used firmware embedding techniques for a non-OS based embedded
system are explained below.
 The non-OS based embedded systems store the firmware either in the onchip
processor/controller memory or offchip memory (FLASH/NVRAM, etc.).

Out-of-Circuit Programming

 Out-of-circuit programming is performed outside the target board.


 The processor or memory chip into which the firmware needs to be embedded is
taken out of the target board and it is programmed with the help of a programming
device (Fig. 12.1).
 The programming device is a dedicated unit which contains the necessary hardware
circuit to generate the programming signals.
 Most of the programmer devices available in the market are capable of programming
different family of devices with different pin outs (Pin counts).
 The programmer contains a ZIF socket with a locking pin to hold the device to be
programmed. The programming device (e.g. the LabTool-48UXP programmer) will be
under the control of a utility program running on the PC. Usually the programmer is
interfaced to the PC through an RS-232C/USB/Parallel Port Interface.
 The commands to control the programmer are sent from the utility program to the
programmer through the interface.

The sequence of operations for embedding the firmware with a programmer is listed below.
1. Connect the programming device to the specified port of PC (USB/COM port/parallel port)
2. Power up the device (Most of the programmers incorporate LED to indicate Device power
up. Ensure that the power indication LED is ON)
3. Execute the programming utility on the PC and ensure proper connectivity is established
between PC and programmer. In case of error, turn off device power and try connecting it
again
4. Unlock the ZIF socket by turning the lock pin
5. Insert the device to be programmed into the open socket as per the insert diagram shown
on the programmer
6. Lock the ZIF socket
7. Select the device name from the list of supported devices
8. Load the hex file which is to be embedded into the device
9. Program the device by ‘Program’ option of utility program
10. Wait till the completion of the programming operation (till the busy LED of the programmer is off)
11. Ensure that programming is successful by checking the status LED on the programmer
(usually ‘Green’ for success and ‘Red’ for error condition) or by noticing the feedback from
the utility program
12. Unlock the ZIF socket and take the device out of programmer

 Now the firmware is successfully embedded into the device.


 Insert the device into the board, power up the board and test it for the required
functionalities. It is to be noted that most programmers support only Dual
Inline Package (DIP) chips, since the ZIF socket is designed to accommodate only DIP
chips.
 Hence programming of chips with other packages is not possible with the current
setup. Adaptor sockets which convert a non-DIP package to a DIP socket can be used
for programming such chips.
 One side of the Adaptor socket contains a DIP interface and the other side acts as a
holder for holding the chip with a non-DIP package (say VQFP).
 Option for setting firmware protection will be available on the programming utility.
 If we really want the firmware to be protected against unwanted external access, and if
the device supports memory protection, enable the memory protection on the
utility before programming the device.
 The programmer usually erases the existing content of the chip before programming
the chip. Only EEPROM and FLASH memory chips are erasable by the programmer.
 Some old embedded systems may be built around UVEPROM chips and such chips
should be erased using a separate ‘UV Chip Eraser’ before programming.
 The major drawback of out-of-circuit programming is the high development time.
 Whenever the firmware is changed, the chip should be taken out of the development
board for re-programming.
 This is tedious and prone to chip damage due to frequent insertion and removal.
 It is better to use a socket on the board side to hold the chip till the firmware modifications
are over. The programmer facilitates programming of only one chip at a time and it is
not suitable for batch production.
 Using a ‘Gang Programmer’ resolves this issue to a certain extent.
 A gang programmer is similar to an ordinary programmer except that it contains
multiple ZIF sockets (4 to 8) and is capable of programming multiple devices at a time.
 But it is a bit expensive compared to an ordinary programmer.
 Another big drawback of this programming technique is that once the product is
deployed in the market in a production environment, it is very difficult to upgrade the
firmware.

In System Programming (ISP)

In System Programming with SPI Protocol

Devices with SPI In System Programming support contains a built-in SPI interface and the
on-chip EEPROM or FLASH memory is programmed through this interface.

The primary I/O lines involved in SPI - In System Programming are listed below.

MOSI - Master Out Slave In


MISO - Master In Slave Out
SCK - System Clock
RST- Reset of Target Device
GND - Ground of Target Device

 PC acts as the master and target device acts as the slave in ISP.
 The program data is sent to the MOSI pin of target device and the device
acknowledgement is originated from the MISO pin of the device.
 SCK pin acts as the clock for data transfer.
 A utility program can be developed on the PC side to generate the above signal lines.
 Since the target device works under a supply voltage less than 5V (TTL/CMOS), it is
better to connect these lines of the target device with the parallel port of the PC.
 Since parallel port operations are also at 5V logic, there is no need for any other
intermediate hardware for signal conversion.
 The pins of parallel port to which the ISP pins of device needs to be connected are
dependent on the program, which is used for generating these signals, or you can fix
these lines first and then write the program according to the pin interconnection
assignments.
 Standard SPI-ISP utilities are freely available on the internet and there is no need to
write your own program. What you need to do is just connect the pins as
mentioned by the program requirement.

As mentioned earlier, for ISP operations, the target device needs to be powered up in a pre-
defined sequence. The power up sequence for In System Programming for Atmel’s AT89S
series microcontroller family is listed below.
1. Apply supply voltage between VCC and GND pins of target chip.
2. Set RST pin to “HIGH” state.
3. If a crystal is not connected across pins XTAL1 and XTAL2, apply a 3 MHz to 24
MHz clock to the XTAL1 pin and wait for at least 10 milliseconds.
4. Enable serial programming by sending the Programming Enable serial instruction
to pin MOSI/ P1.5. The frequency of the shift clock supplied at pin SCK/P1.7 needs
to be less than the CPU clock at XTAL1 divided by 40.
5. The Code or Data array is programmed one byte at a time by supplying the address
and data together with the appropriate Write instruction. The selected memory
location is first erased before the new data is written. The write cycle is self-timed and
typically takes less than 2.5 ms at 5V.
6. Any memory location can be verified by using the Read instruction, which returns
the content at the selected address at serial output MISO/P1.6.

7. After successfully programming the device, set RST pin low or turn off the chip
power supply and turn it ON to commence the normal operation.

 The key player behind ISP is a factory programmed memory (ROM) called ‘Boot
ROM’.
 The Boot ROM normally resides at the top end of the code memory space and its size
is in the order of a few Kilo Bytes (for a controller with 64K code memory space and 1K
Boot ROM, the Boot ROM resides at memory locations FC00H to FFFFH).
 It contains a set of Low-level Instruction APIs and these APIs allow the
processor/controller to perform the FLASH memory programming, erasing and
reading operations.
 The contents of the Boot ROM are provided by the chip manufacturer and the same
is masked into every device.
 The Boot ROM for different family or series devices is different.
 By default the Reset vector starts the code memory execution at location 0000H.
 If the ISP mode is enabled through the special ISP Power up sequence, the execution
will start at the Boot ROM vector location.
 In System Programming technique is the best advised programming technique for
development work since the effort required to re-program the device in case of
firmware modification is very little.
 Firmware upgrades for products supporting ISP is quite simple.

In Application Programming (IAP)

 In Application Programming (IAP) is a technique used by the firmware running on
the target device for modifying a selected portion of the code memory.
 It is not a technique for first time embedding of user written firmware. It modifies the
program code memory under the control of the embedded application.
 Updating calibration data, look-up tables, etc., which are stored in code memory, are
typical examples of IAP.
 The Boot ROM resident API instructions which perform various functions such as
programming, erasing, and reading the Flash memory during ISP mode are made
available to the end-user written firmware for IAP.
 Thus it is possible for an end-user application to perform operations on the Flash
memory.
 A common entry point to these API routines is provided for interfacing them to the
end-user’s application.
 Functions are performed by setting up specific registers as required by a specific
operation and performing a call to the common entry point.
 Like any other subroutine call, after completion of the function, control will return to
the end-user’s code.
 The Boot ROM is shadowed with the user code memory in its address range.
 This shadowing is controlled by a status bit.
 When this status bit is set, accesses to the internal code memory in this address range
will be from the Boot ROM.
 When cleared, accesses will be from the user’s code memory.
 Hence the user should set the status bit prior to calling the common entry point for
IAP operations

Use of Factory Programmed Chip

 It is possible to embed the firmware into the target processor/controller memory at the
time of chip fabrication itself.
 Such chips are known as ‘Factory programmed chips’.
 Once the firmware design is over and the firmware achieved operational stability, the
firmware files can be sent to the chip fabricator to embed it into the code memory.
 Factory programmed chips are convenient for mass production applications and they
greatly reduce the product development time.
 It is not recommended to use factory programmed chips for development purposes
where the firmware undergoes frequent changes.
 Factory programmed ICs are a bit expensive.

Firmware Loading for Operating System Based Devices

 The OS based embedded systems are programmed using the In System Programming
(ISP) technique.
 OS based embedded systems contain a special piece of code called ‘Boot loader’
program which takes control of the OS and application firmware embedding and
copying of the OS image to the RAM of the system for execution.
 The ‘Boot loader’ for such embedded systems comes pre-loaded or it can be loaded
to the memory using one of the various supported interfaces like JTAG.
 The bootloader contains necessary driver initialisation implementation for initialising
the supported interfaces like UART, TCP/IP etc.
 The bootloader implements menu options for selecting the source for the OS image to load.
 In case of the network based loading, the bootloader broadcasts the target’s presence
over the network and the host machine on which the OS image resides can identify
the target device by capturing this message.
 Once a communication link is established between the host and target machine, the
OS image can be directly downloaded to the FLASH memory of the target device.

BOARD POWER UP

 Now the firmware is embedded into the target board using one of the programming
techniques described above.
 Sometimes the first power up may end up in a messy explosion leaving the smell of
burned components behind.
 It may happen due to various reasons: proper care was not taken in applying the
power, power applied in reverse polarity (+ve of supply connected to -ve of the
target board and vice versa), components not placed in the correct polarity order, etc.

THE EMBEDDED SYSTEM DEVELOPMENT ENVIRONMENT

The development environment consists of a Development Computer (PC) or Host, which acts
as the heart of the development environment, an Integrated Development Environment (IDE)
tool for embedded firmware development and debugging, an Electronic Design Automation
(EDA) tool for embedded hardware design, an emulator hardware for debugging the target
board, signal sources (like a function generator) for simulating the inputs to the target board,
target hardware debugging tools (Digital CRO, Multimeter, Logic Analyser, etc.) and the
target hardware.

The Integrated Development Environment (IDE) and Electronic Design Automation (EDA)
tools are selected based on the target hardware development requirement and they are
supplied as Installable files in CDs by vendors.

These tools need to be installed on the host PC used for development activities. These tools
can be either freeware or licensed copies or evaluation versions.

Licensed versions of the tools are fully featured and fully functional whereas trial versions
fall into two categories, tools with limited features, and full featured copies with limited
period of usage.

DISASSEMBLER/DECOMPILER

 Disassembler is a utility program which converts machine codes into target processor
specific Assembly codes/instructions.
 The process of converting machine codes into Assembly code is known as
‘Disassembling’.
 In operation, disassembling is complementary to assembling/cross-assembling.
 Decompiler is the utility program for translating machine codes into corresponding
high level language instructions.
 Decompiler performs the reverse operation of compiler/cross-compiler.
 The disassemblers/decompilers for different family of processors/controllers are
different. Disassemblers/Decompilers are deployed in reverse engineering.

 Reverse engineering is the process of revealing the technology behind the working of
a product. Disassemblers/decompilers help the reverse engineering process by
translating the embedded firmware into Assembly/high level language instructions.
 Disassemblers/Decompilers are powerful tools for analysing the presence of
malicious codes (virus information) in an executable image.
 Disassemblers/Decompilers are available as either freeware tools readily available for
free download from internet or as commercial tools.
 It is not possible for a disassembler/decompiler to generate an exact replica of the
original assembly code/high level source code in terms of the symbolic constants and
comments used. However, disassemblers/decompilers generate a source code which
somewhat matches the original source code from which the binary code was
generated.

SIMULATORS, EMULATORS AND DEBUGGING

 Simulator is a software tool used for simulating the various conditions for checking
the functionality of the application firmware.
 The Integrated Development Environment (IDE) itself will be providing simulator
support and they help in debugging the firmware for checking its required
functionality.
 In certain scenarios, simulator refers to a soft model (GUI model) of the embedded
product. For example, if the product under development is a handheld device, to test
the functionalities of the various menu and user interfaces, a soft form model of the
product with all UI as given in the end product can be developed in software.
 Soft phone is an example for such a simulator.
 Emulator is hardware device which emulates the functionalities of the target device
and allows real time debugging of the embedded firmware in a hardware
environment.

Simulators

Simulators simulate the target hardware and the firmware execution can be inspected using
simulators.

The features of simulator based debugging are listed below.

1. Purely software based


2. Doesn’t require a real target system
3. Very primitive (Lack of featured I/O support. Everything is a simulated one)
4. Lack of Real-time behaviour

Advantages of Simulator Based Debugging

Simulator based debugging techniques are simple and straightforward.


The major advantages of simulator based firmware debugging techniques are explained
below.

No Need for Original Target Board

 Simulator based debugging technique is purely software oriented.


 IDE’s software support simulates the CPU of the target board.
 User only needs to know about the memory map of various devices within the target
board and the firmware should be written on the basis of it.
 Since the real hardware is not required, firmware development can start well in
advance immediately after the device interface and memory maps are finalised.
 This saves development time.

Simulate I/O Peripherals

 Simulator provides the option to simulate various I/O peripherals.


 Using the simulator’s I/O support you can edit the values of the I/O registers, and these
values can be used as the input/output values in the firmware execution.
 Hence it eliminates the need for connecting I/O devices for debugging the firmware.

Simulates Abnormal Conditions

 With simulator’s simulation support you can input any desired value for any
parameter during debugging the firmware and can observe the control flow of
firmware.
 It really helps the developer in simulating abnormal operational environment for
firmware and helps the firmware developer to study the behaviour of the firmware
under abnormal input conditions.

Limitations of Simulator based Debugging

 Though simulation based firmware debugging technique is very helpful in embedded


applications, they possess certain limitations and we cannot fully rely upon the
simulator-based firmware debugging.
 Some of the limitations of simulator- based debugging are explained below.

Deviation from Real Behaviour

 Simulation-based firmware debugging is always carried out in a development
environment where the developer may not be able to debug the firmware under all
possible combinations of input.
 Under certain operating conditions we may get some particular result and it need not
be the same when the firmware runs in a production environment.

Lack of real timeliness

 The major limitation of simulator-based debugging is that it is not real-time in
behaviour.
 The debugging is developer driven and it is in no way capable of creating real-time
behaviour. Moreover, in a real application the I/O conditions may be varying or
unpredictable.
 Simulation goes for simulating those conditions for known values.

Emulators and Debuggers

Debugging in an embedded application is the process of diagnosing the firmware execution,
monitoring the target processor’s registers and memory while the firmware is running and
checking the signals from the various buses of the embedded hardware. The debugging process
in an embedded application is broadly classified into two, namely, hardware debugging and
firmware debugging.

Hardware debugging deals with the monitoring of various bus signals and checking the status
lines of the target hardware.

Firmware debugging deals with examining the firmware execution, execution flow, changes
to various CPU registers and status registers on execution of the firmware to ensure that the
firmware is running as per the design.

Incremental EEPROM Burning Technique

 This is the most primitive type of firmware debugging technique where the code is
separated into different functional code units.
 Instead of burning the entire code into the EEPROM chip at once, the code is burned
in incremental order, where the code corresponding to all functionalities are
separately coded, cross-compiled and burned into the chip one by one.
 The code will incorporate some indication support like lighting up an LED (every
embedded product contains at least one LED; if not, you should include provision for at
least one LED in the target board at hardware design time so that it can be used for
debugging purposes) or activating a BUZZER (in a system with BUZZER support) if the
code is functioning in the expected way.
 If the first functionality is found working perfectly on the target board with the
corresponding code burned into the EEPROM, go for burning the code corresponding
to the next functionality and check whether it is working.
 Repeat this process till all functionalities are covered.
 Please ensure that before entering one level up, the previous level has delivered a
correct result.
 If the code corresponding to any functionality is found not giving the expected result,
fix it by modifying the code and only then go for adding the next functionality for
burning into the EEPROM.
 After you have found all functionalities working properly, combine the entire source for all
functionalities together, re-compile and burn the code for the total system functioning.
Obviously it is a time-consuming process.
 It is a onetime process and once you test the firmware in an incremental model you
can go for mass production.
 In the incremental firmware burning technique we are not doing any debugging but
observing the status of firmware execution as a debug method.

 The very common mistake committed by firmware developers in developing
non-operating system based embedded applications is burning the entire code altogether
and then getting fed up with debugging the code.
 Incremental firmware burning technique is widely adopted in small, simple system
developments and in product development where time is not a big constraint (e.g.
R&D projects).
 It is also very useful in product development environments where no other debug
tools are available.

Inline BreakpointBased Firmware Debugging

 Inline breakpoint based debugging is another primitive method of firmware


debugging.
 Within the firmware, wherever you want to ensure that firmware execution is reaching up
to a specified point, insert an inline debug code immediately after the point.
 The debug code is a printf() function which prints a string given as per the firmware.
 You can insert debug code (printf()) commands at each point where you want to
ensure that the firmware execution is covering that point (see the sketch after this list).
 Cross-compile the source code with the debug codes embedded within it.
 Burn the corresponding hex file into the EEPROM.
 You can view the printf() generated data on the ‘HyperTerminal’ (a communication
facility available with the Windows OS under the Communications section of the
Start Menu) of the Development PC.
 Configure the serial communication settings of the ‘HyperTerminal’ connection to the
same as that of the serial communication settings configured in the firmware (say
Baudrate = 9600; Parity = None; Stop Bit = 1; Flow Control = None).
 Connect the target board’s serial port (COM) to the development PC’s COM Port
using an RS232 Cable.
 Power up the target board.
 Depending on the execution flow of firmware and the inline debug codes inserted in
the firmware, you can view the debug information on the ‘HyperTerminal’.
 Typical usage of inline debug codes and the debug info retrieved on the
HyperTerminal is illustrated below.
 If the firmware is error free and the execution occurs properly, you will get all the
debug messages on the HyperTerminal.
 Based on this debug info you can check the firmware for errors (Fig. 13.38).
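
A minimal 8051 (Keil C51) flavoured sketch of the technique is given below: the on-chip UART is configured to match the HyperTerminal settings and printf() debug markers are placed at the points whose execution needs to be confirmed. The reload value 0xFD assumes 9600 baud with an 11.0592 MHz crystal; adjust it for other clock frequencies.

#include <reg51.h>        /* 8051 SFR declarations (Keil C51)                       */
#include <stdio.h>        /* printf() uses the C51 library putchar() on the UART    */

/* Configure the on-chip UART: serial mode 1, 9600 baud, Timer 1 auto-reload. */
static void serial_init(void)
{
    TMOD |= 0x20;         /* Timer 1, mode 2 (8-bit auto reload)                    */
    TH1   = 0xFD;         /* 9600 baud @ 11.0592 MHz crystal                        */
    SCON  = 0x50;         /* serial mode 1, receive enabled                         */
    TR1   = 1;            /* start Timer 1                                          */
    TI    = 1;            /* let the library putchar() send its first byte          */
}

void main(void)
{
    serial_init();
    printf("Debug: entered main\n");            /* inline debug code 1 */

    /* ... initialise peripherals ... */
    printf("Debug: peripherals initialised\n"); /* inline debug code 2 */

    while (1) {
        /* ... application logic ... */
        printf("Debug: main loop alive\n");     /* inline debug code 3 */
    }
}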

Monitor Program Based Firmware Debugging

 Monitor program based firmware debugging is the first adopted invasive
method for firmware debugging (Fig. 13.39).
 In this approach a monitor program which acts as a supervisor is developed.
 The monitor program controls the downloading of user code into the code
memory, inspects and modifies register/memory locations; allows single
stepping of source code, etc.
 The monitor program implements the debug functions as per a pre-defined
command set from the debug application interface.

 The monitor program always listens to the serial port of the target device and
according to the command received from the serial interface it performs
command specific actions like firmware downloading, memory
inspection/modification and firmware single stepping, and sends the debug
information (various register and memory contents) back to the main debug
program running on the development PC.
 The first step in any monitor program development is determining a set of
commands for performing various operations like firmware downloading,
memory/ register inspection/modification, single stepping, etc.
 Once the commands for each operation are fixed, write the code for performing
the actions corresponding to these commands.
 As mentioned earlier, the commands may be received through any of the
external interface of the target processor (e.g. RS-232C serial interface/parallel
interface/USB, etc.).
 The monitor program should query this interface to get commands or should
handle the command reception if the data reception is implemented through
interrupts.
 On receiving a command, examine it and perform the action corresponding to
it.
 The entire code stuff handling the command reception and corresponding
action implementation is known as the “monitor program”.
 The most common type of interface used between target board and debug
application is RS-232C Serial interface.
 After the successful completion of the ‘monitor program’ development, it is
compiled and burned into the FLASH memory or ROM of the target board.
 The code memory containing the monitor program is known as the 'Monitor ROM'.
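A skeleton of such a monitor program command loop is sketched below. The single-byte command codes and the serial helper routines are hypothetical assumptions; a real monitor program defines its own command set and implements firmware download, memory inspection/modification, single stepping, etc. behind them.

#define CMD_DOWNLOAD    0x01  /* download user firmware into the code (RAM) memory */
#define CMD_READ_MEM    0x02  /* inspect a memory/register location                */
#define CMD_WRITE_MEM   0x03  /* modify a memory/register location                 */
#define CMD_SINGLE_STEP 0x04  /* single step the user firmware                     */

extern unsigned char serial_get_byte(void);        /* blocking read from the RS-232C link */
extern void serial_send_byte(unsigned char byte);  /* send debug info back to the host PC */

void monitor_main_loop(void)
{
    for (;;) {
        unsigned char cmd = serial_get_byte();     /* always listen to the serial port */

        switch (cmd) {
        case CMD_DOWNLOAD:
            /* receive the user firmware (e.g. hex records) and place it in RAM */
            break;
        case CMD_READ_MEM:
            /* read the requested address and send the value back to the host   */
            break;
        case CMD_WRITE_MEM:
            /* write the supplied value to the requested address                */
            break;
        case CMD_SINGLE_STEP:
            /* execute one user instruction and report the register contents    */
            break;
        default:
            serial_send_byte(0xFF);                /* unknown command: report an error */
            break;
        }
    }
}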

The monitor program contains the following set of minimal features.

1. Command set interface to establish communication with the debugging application
2. Firmware download option to code memory
3. Examine and modify processor registers and working memory (RAM)
4. Single step program execution
5. Set breakpoints in firmware execution
6. Send debug information to debug application running on host machine

 The monitor program usually resides at the reset vector (code memory 0000H) of the
target processor.
 The monitor program is commonly employed in development boards and the
development board supplier provides the monitor program, in the form of a ROM
chip.
 The actual code memory is downloaded into a RAM chip which is interfaced to the processor in the Von-Neumann architecture model.
 The Von-Neumann architecture model is achieved by ANDing the PSEN\ and RD\ signals of the target processor (in the case of 8051) and connecting the output of the AND gate to the Output Enable (RD\) pin of the RAM chip.
 The WR\ signal of the target processor is interfaced to the WR\ signal of the Von-Neumann RAM. Monitor ROM size varies in the range of a few kilobytes (usually 4K). An address decoder circuit maps the address range allocated to the monitor ROM and activates the Chip Select (CS\) of the ROM if the address is within the range specified for the Monitor ROM.
 A user program is normally loaded at location 0x4000 or 0x8000.
 The address decoder circuit ensures the enabling of the RAM chip (CS\) when the address is outside the range allocated to the monitor ROM.
 Though there are two memory chips (Monitor ROM chip and Von-Neumann RAM), the total memory map available for both of them will be 64K for a processor/controller with a 16-bit address space, and the memory decoder units take care of avoiding conflicts in accessing both. While developing the user program for monitor ROM based systems, special care should be taken to offset the user code and to handle the interrupt vectors (the sketch after this list models this address decoding).
 The target development IDE will help in resolving this. During firmware execution and single stepping, the user code may have to be altered and hence the firmware is always downloaded into a Von-Neumann RAM in monitor ROM-based debugging systems.
 Monitor ROM-based debugging is suitable only for development work and it is not a
good choice for mass produced systems.
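The following C model (illustration only, not target firmware) mimics the address decoding described above, assuming a 4K monitor ROM mapped at 0x0000 and user code loaded at 0x4000 in the Von-Neumann RAM; the actual sizes and addresses are board dependent.

#include <stdio.h>

#define MONITOR_ROM_BASE 0x0000u
#define MONITOR_ROM_SIZE 0x1000u   /* 4K monitor ROM (typical)            */
#define USER_CODE_START  0x4000u   /* user program offset in the RAM area */

/* Returns 1 when the address selects the monitor ROM chip, 0 when it
 * selects the Von-Neumann RAM (the role of the address decoder circuit). */
static int monitor_rom_selected(unsigned int addr)
{
    return addr < (MONITOR_ROM_BASE + MONITOR_ROM_SIZE);
}

int main(void)
{
    printf("0x0100 -> %s\n", monitor_rom_selected(0x0100u) ? "Monitor ROM" : "RAM");
    printf("0x%04X -> %s\n", USER_CODE_START,
           monitor_rom_selected(USER_CODE_START) ? "Monitor ROM" : "RAM");
    return 0;
}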

The major drawbacks of a monitor-based debugging system are

1. The entire memory map is converted into a Von-Neumann model and it is shared
between the monitor ROM, monitor program data memory, monitor-program trace
buffer, user written firmware and external user memory.

For 8051, the original Harvard architecture supports 64K code memory and 64K
external data memory (Total 128K memory map).

Going for monitor-based debugging shrinks the total available memory to 64K of Von-Neumann memory, and it needs to accommodate all kinds of memory requirements (monitor code, monitor data, trace buffer memory, user code and external user data memory).

2. The communication link between the debug application running on the development PC and the monitor program residing in the target system is achieved through a serial link, and usually the controller's on-chip UART is used for establishing this link.

Hence one serial port of the target processor becomes dedicated for the monitor application and it cannot be used for any other device interfacing.

This wastage of a serial port is a serious issue in controllers or processors with a single UART.

In Circuit Emulator (ICE) Based Firmware Debugging

‘Simulator’ is a software application that precisely duplicates (mimics) the target CPU and
simulates the various features and instructions supported by the target CPU, whereas an
‘Emulator’ is a self-contained hardware device which emulates the target CPU.

The emulator hardware contains necessary emulation logic and it is hooked to the debugging
application running on the development PC on one end and connects to the target board
through some interface on the other end.

In summary, the simulator ‘simulates’ the target board CPU and the emulator ‘emulates’ the
target board CPU.

The scope of the definition of an emulator has changed over time.

In olden days emulators were defined as special hardware devices used for emulating the functionality of a processor/controller and performing various debug operations like halting firmware execution, setting breakpoints, getting or setting internal RAM/CPU registers, etc.

Nowadays pure software applications which perform the functions of a hardware emulator are also called 'Emulators' (though they are 'Simulators' in operation).

The emulator application for emulating the operation of a PDA phone for application
development is an example of a ‘Software Emulator’.

A hardware emulator is controlled by a debugger application running on the development PC.

The debugger application may be part of the Integrated Development Environment (IDE) or a
third party supplied tool.

Most of the IDEs incorporate debugger support for some of the emulators commonly
available in the market.

The emulators for different families of processors/controllers are different.


Figure 13.40 illustrates the different subsystems and interfaces of an ‘Emulator’ device.

The Emulator POD forms the heart of any emulator system and it contains the following
functional units.

Emulation Device

 Emulation device is a replica of the target CPU which receives various signals from
the target board through a device adaptor connected to the target board and performs
the execution of firmware under the control of debug commands from the debug
application.
 The emulation device can be either a standard chip same as the target processor (e.g.
AT89C51) or a Programmable Logic Device (PLD) configured to function as the
target CPU.
 If a standard chip is used as the emulation device, the emulation will provide real-time execution behaviour.
 At the same time the emulator becomes dedicated to that particular device and cannot
be re-used for the derivatives of the same chip.
 PLD-based emulators can easily be re-configured for use with derivatives of the target CPU under consideration.
 By simply loading the configuration file of the derivative processor/controller, the
PLD gets re-configured and it functions as the derivative device.
 A major drawback of PLD-based emulators is the accuracy of replication of the target CPU functionalities. PLD-based emulator logic is easy to implement for simple target CPUs, but for complex target CPUs it is quite difficult.

Emulation Memory

 It is the Random Access Memory (RAM) incorporated in the Emulator device.
 It acts as a replacement to the target board’s EEPROM where the code is supposed to
be downloaded after each firmware modification.
 Hence the original EEPROM memory is emulated by the RAM of emulator.
 This is known as ‘ROM Emulation’.
 ROM emulation eliminates the hassles of ROM burning and it offers the benefit of an unlimited number of reprogrammings (most of the EEPROM chips available in the market support only 100 to 1000 re-program cycles).
 Emulation memory also acts as a trace buffer in debugging.

 Trace buffer is a memory pool holding the instructions executed, registers modified and related data captured from the processor while debugging.
 The trace buffer size is emulator dependent, and when the buffer overflows it retains only the most recent trace information.

The common features of trace buffer memory and trace buffer data viewing are listed below:

 Trace buffer records each bus cycle in frames
 Trace data can be viewed in the debugger application as Assembly/Source code
 Trace buffering can be done on the basis of a Trace trigger (Event)
 Trace buffer can also record signals from target board other than CPU signals
(Emulator dependent)
 Trace data is very useful information in firmware debugging (a ring-buffer model of the trace memory is sketched below)
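The sketch below models the trace buffer as a circular (ring) buffer: when the buffer overflows, the oldest frames are overwritten so that only the most recent trace information is retained. The frame fields and buffer depth are illustrative assumptions; real emulators record bus cycles in dedicated hardware.

#define TRACE_DEPTH 256u                 /* number of frames; emulator dependent */

struct trace_frame {
    unsigned int  address;               /* address bus value of the bus cycle   */
    unsigned char data;                  /* data bus value                       */
    unsigned char cycle_type;            /* e.g. fetch / read / write            */
};

static struct trace_frame trace_buf[TRACE_DEPTH];
static unsigned int       trace_next;    /* index of the next frame to fill      */

/* Record one bus cycle; on overflow the oldest frame is overwritten */
void trace_record(unsigned int addr, unsigned char data, unsigned char type)
{
    trace_buf[trace_next].address    = addr;
    trace_buf[trace_next].data       = data;
    trace_buf[trace_next].cycle_type = type;
    trace_next = (trace_next + 1u) % TRACE_DEPTH;
}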

Emulator Control Logic

 Emulator control logic is the logic circuits used for implementing complex hardware
breakpoints, trace buffer trigger detection, trace buffer control, etc.
 Emulator control logic circuits are also used for implementing logic analyser
functions in advanced emulator devices.
 The ‘Emulator POD’ is connected to the target board through a ‘Device adaptor’ and
signal cable.

Device Adaptors

 Device adaptors act as an interface between the target board and emulator POD.
 Device adaptors are normally pin-to-pin compatible sockets which can be
inserted/plugged into the target board for routing the various signals from the pins
assigned for the target processor. The device adaptor is usually connected to the
emulator POD using ribbon cables.
 The adaptor type varies depending on the target processor’s chip package. DIP,
PLCC, etc. are some commonly used adaptors.

On Chip Firmware Debugging (OCD)

 Though OCD adds silicon complexity and cost, from a developer's perspective it is a very good feature supporting fast and efficient firmware debugging.
 The On Chip Debug facilities integrated to the processor/controller are chip vendor
dependent and most of them are proprietary technologies like Background Debug
Mode (BDM), OnCE, etc.
 Some vendors add ‘on chip software debug support’ through JTAG (Joint Test Action
Group) port.
 Processors/controllers with OCD support incorporate a dedicated debug module to the
existing architecture.
 Usually the on-chip debugger provides the means to set simple breakpoints, query the internal state of the chip and single-step through code.
 OCD module implements dedicated registers for controlling debugging.

 An On Chip Debugger can be enabled by setting the OCD enable bit (the bit name and the register holding the bit vary across vendors).
 Debug related registers are used for debugger control (enable/disable single stepping, freeze execution, etc.) and for breakpoint address setting (a hypothetical register-level sketch is given below).
 BDM and JTAG are the two commonly used interfaces to communicate between the
Debug application running on Development PC and OCD module of target CPU.
 Some interface logic in the form of hardware will be implemented between the CPU OCD interface and the host PC to capture the debug information from the target CPU and send it to the debugger application running on the host PC.
 The interface between the hardware and PC may be Serial/Parallel/USB.
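The sketch below illustrates how such debug-control registers are typically used. The register names, addresses and bit positions are invented purely for illustration; the real layout is vendor specific (BDM, OnCE, JTAG-based OCD, etc.).

/* Hypothetical debug-control register and bit definitions (illustration only) */
#define DBG_CTRL        (*(volatile unsigned char *)0xFF00u)  /* invented address */
#define DBG_ENABLE      (1u << 0)   /* OCD enable bit          */
#define DBG_SINGLE_STEP (1u << 1)   /* enable single stepping  */
#define DBG_FREEZE      (1u << 2)   /* freeze execution        */

#define DBG_BP_ADDR     (*(volatile unsigned int *)0xFF02u)   /* breakpoint address register (invented) */

void ocd_enable_with_breakpoint(unsigned int breakpoint)
{
    DBG_BP_ADDR = breakpoint;       /* program the breakpoint address */
    DBG_CTRL   |= DBG_ENABLE;       /* set the OCD enable bit         */
}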

 The following section gives a brief introduction to Background Debug Mode (BDM) and the JTAG interface used in On Chip Debugging.
 Background Debug Mode (BDM) interface is a proprietary On Chip Debug solution
from Motorola. BDM defines the communication interface between the chip resident
debug core and host PC where the BDM compatible remote debugger is running.
 BDM makes use of a 10-pin or 26-pin connector to connect to the target board.
 Serial data in (DSI), Serial data out (DSO) and Serial clock (DSCLK) are the three major signal lines used in BDM.
 DSI sends debug commands serially to the target processor from the remote debugger
application and DSO sends the debug response to the debugger from the processor.
Synchronisation of serial transmission is done by the serial clock DSCLK generated
by the debugger application.
 Debugging is controlled by BDM specific debug commands.
 The debug commands are usually 17 bits wide: 16 bits are used for representing the command and 1 bit for status/control.
 Chips with JTAG debug interface contain a built-in JTAG port for communicating
with the remote debugger application.

 JTAG is the acronym for Joint Test Action Group. JTAG is the alternate name for the IEEE 1149.1 standard.
 Like BDM, JTAG is also a serial interface.

The signal lines of JTAG protocol are explained below.

 Test Data In (TDI): It is used for sending debug commands serially from remote
debugger to the target processor.
 Test Data Out (TDO): Transmit debug response to the remote debugger from target
CPU.
 Test Clock (TCK): Synchronises the serial data transfer.
 Test Mode Select (TMS): Sets the mode of testing.
 Test Reset (TRST): It is an optional signal line used for resetting the target CPU.

The serial data transfer rate for JTAG debugging is chip dependent; it is usually within the range of 10 to 1000 MHz. A bit-banged data shift over the JTAG lines is sketched below.
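The following sketch shows how debug data can be shifted serially over the JTAG lines by driving general purpose I/O pins in software. The pin-access helpers are hypothetical assumptions; in practice a dedicated JTAG adapter or the on-chip TAP controller performs this transfer.

/* Hypothetical GPIO helpers driving/reading the JTAG lines */
extern void set_tck(int level);
extern void set_tms(int level);
extern void set_tdi(int level);
extern int  get_tdo(void);

/* Shift 'nbits' of 'out' into TDI (LSB first) while sampling TDO. TMS is
 * held low so the TAP stays in the Shift state and is raised on the last
 * bit to leave it. Returns the bits captured from the target.
 */
unsigned long jtag_shift(unsigned long out, int nbits)
{
    unsigned long in = 0;
    int i;

    for (i = 0; i < nbits; i++) {
        set_tck(0);                       /* data is set up while TCK is low         */
        set_tms(i == (nbits - 1));        /* exit the Shift state on the final bit   */
        set_tdi((int)((out >> i) & 1u));  /* present the next bit on TDI             */
        if (get_tdo())                    /* sample the bit shifted out by the chip  */
            in |= (1ul << i);
        set_tck(1);                       /* rising edge: the target samples TDI/TMS */
    }
    return in;
}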

TARGET HARDWARE DEBUGGING

 Hardware debugging involves the monitoring of various signals of the target board
(address/data lines, port pins, etc.), checking the interconnection among various
components, circuit continuity checking, etc.
 The various hardware debugging tools used in Embedded Product Development are
explained below.

Magnifying Glass (Lens)

 A magnifying glass is a powerful visual inspection tool.
 With a magnifying glass (lens), the surface of the target board can be examined
thoroughly for dry soldering of components, missing components, improper
placement of components, improper soldering, track (PCB connection) damage, short
of tracks, etc.
 Nowadays high quality magnifying stations are available for visual inspection.
 The magnifying station incorporates magnifying glasses attached to a stand with CFL
tubes for providing proper illumination for inspection.
 The station usually incorporates multiple magnifying lenses. The main lens acts as a visual inspection tool for the entire hardware board, whereas the other small lens within the station is used for magnifying a relatively small area of the board which requires thorough inspection.

Multimeter

 A multimeter is used for measuring various electrical quantities like voltage (both AC and DC), current (DC as well as AC), resistance, capacitance, continuity checking, transistor checking, cathode and anode identification of a diode, etc.
 Any multimeter will work over a specific range for each measurement.
 A multimeter is the most valuable tool in the toolkit of an embedded hardware developer.
 It is the primary debugging tool for physical contact based hardware debugging and
almost all developers start debugging the hardware with it.
 In embedded hardware debugging it is mainly used for checking the circuit continuity
between different points on the board, measuring the supply voltage, checking the
signal value, polarity, etc.
 Both analog and digital versions of a multimeter are available.
 The digital version is preferred over the analog one for various reasons like readability, accuracy, etc.
 Fluke, Rishab, Philips, etc. are the manufacturers of commonly available high quality
digital multimeters.

Digital CRO

 Cathode Ray Oscilloscope (CRO) is a little more sophisticated tool compared to a multimeter. CRO is used for waveform capturing and analysis, measurement of signal strength, etc.
 By connecting the point under observation on the target board to the channels of the oscilloscope, the waveforms can be captured and analysed for expected behaviour.
 CRO is a very good tool in analysing interference noise in the power supply line and
other signal lines.

 Monitoring the crystal oscillator signal from the target board is a typical example of
the usage of CRO for waveform capturing and analysis in target board debugging.
CROs are available in both analog and digital versions.
 Though digital CROs are costly, feature-wise they are best suited for target board debugging applications.
 Digital CROs are available for high frequency support and they also incorporate
modern techniques for recording waveform over a period of time, capturing waves on
the basis of a configurable event (trigger) from the target board (e.g. High to low
transition of a port pin of the target processor).
 Most of the modern digital CROs contain more than one channel and it is easy to
capture and analyse various signals from the target board using multiple channels
simultaneously.
 Various measurements like phase, amplitude, etc., are also possible with CROs.
 Tektronix, Agilent, Philips, etc. are the manufacturers of high precision good quality
digital CROs.

Logic Analyser

 A logic analyser is the big brother of a digital CRO.
 A logic analyser is used for capturing digital data (logic 1 and 0) from digital circuitry, whereas a CRO is employed in capturing all kinds of waves including logic signals.
 Another major limitation of CRO is that the total number of logic signals/waveforms
that can be captured with a CRO is limited to the number of channels.
 A logic analyser contains special connectors and clips which can be attached to the
target board for capturing digital data.
 In target board debugging applications, a logic analyser captures the states of various
port pins, address bus and data bus of the target processor/controller, etc.
 Logic analysers give an exact reflection of what happens when a particular line of
firmware is running.
 This is achieved by capturing the address line logic and data line logic of target
hardware.
 Most modern logic analysers contain provisions for storing captured data, selecting a
desired region of the captured waveform, zooming selected region of the captured
waveform, etc. Tektronix, Agilent, etc. are the giants in the logic analyser market.

Function Generator

 A function generator is not a debugging tool.
 It is an input signal simulator tool.
 A function generator is capable of producing various periodic waveforms like sine
wave, square wave, saw-tooth wave, etc. with different frequencies and amplitude.
 Sometimes the target board may require some kind of periodic waveform with a
particular frequency as input to some part of the board.
 Thus, in a debugging environment, the function generator serves the purpose of
generating and supplying required signals.

BOUNDARY SCAN

As the complexity of the hardware increases, the number of chips present in the board and the interconnections among them may also increase.

The device packages used in the PCB become miniature to reduce the total board space occupied by them, and multiple layers may be required to route the interconnections among the chips.

With miniature device packages and multiple layers for the PCB, it will be very difficult to debug the hardware using a magnifying glass, multimeter, etc. to check the interconnections among the various chips.

Boundary scan is a technique used for testing the interconnections among the various chips present in the board which support the JTAG interface. Chips which support boundary scan associate a boundary scan cell with each pin of the device.

A JTAG port, which contains the five signal lines namely TDI, TDO, TCK, TRST and TMS, forms the Test Access Port (TAP) for a JTAG supported chip.

Each device will have its own TAP.

The PCB also contains a TAP for connecting the JTAG signal lines to the external world.

A boundary scan path is formed inside the board by interconnecting the devices through
JTAG signal lines.

The TDI pin of the TAP of the PCB is connected to the TDI pin of the first device.

The TDO pin of the first device is connected to the TDI pin of the second device.

In this way all devices are interconnected and the TDO pin of the last JTAG device is
connected to the TDO pin of the TAP of the PCB.

The clock line TCK and the Test Mode Select (TMS) line of the devices are connected to the clock line and Test Mode Select line of the Test Access Port of the PCB respectively.

This forms a boundary scan path.

As mentioned earlier, each pin of the device associates a boundary scan cell with it.

The boundary scan cell is a multipurpose memory cell.

The boundary scan cells associated with the input pins of an IC are known as 'input cells' and the boundary scan cells associated with the output pins of an IC are known as 'output cells'.

The boundary scan cells can be used for capturing the input pin signal state and passing it to
the internal circuitry, capturing the signals from the internal circuitry and passing it to the
output pin, and shifting the data received from the Test Data In pin of the TAP.

The boundary scan cells associated with the pins are interconnected and they form a chain
from the TDI pin of the device to its TDO pin.

The boundary scan cells can be operated in Normal, Capture, Update and Shift modes.

In the Normal mode, the input of the boundary scan cell appears directly at its output.

In the Capture mode, the boundary scan cell associated with each input pin of the chip
captures the signal from the respective pins to the cell and the boundary scan cell associated
with each output pin of the chip captures the signal from the internal circuitry.

In the Update mode, the boundary scan cell associated with each input pin of the chip passes the already captured data to the internal circuitry, and the boundary scan cell associated with each output pin of the chip passes the already captured data to the respective output pin.

In the Shift mode, data is shifted from the TDI pin to the TDO pin of the device through the boundary scan cells.
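A simple software model of the Shift mode is sketched below (illustration only): on every clock, the bit presented at TDI enters the first boundary scan cell and the bit held in the last cell appears at TDO. The chain length chosen here is arbitrary.

#define CHAIN_LENGTH 8               /* number of boundary scan cells in the chain */

static int chain[CHAIN_LENGTH];      /* one storage element per boundary scan cell */

/* Clock one bit in at TDI and return the bit that falls out at TDO */
int boundary_scan_shift(int tdi_bit)
{
    int tdo_bit = chain[CHAIN_LENGTH - 1];   /* bit leaving the last cell goes to TDO  */
    int i;

    for (i = CHAIN_LENGTH - 1; i > 0; i--)   /* move every cell's bit one step forward */
        chain[i] = chain[i - 1];

    chain[0] = tdi_bit;                      /* bit from TDI enters the first cell     */
    return tdo_bit;
}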

ICs supporting boundary scan contain additional boundary scan related registers for
facilitating the boundary scan operation.

Instruction Register, Bypass Register, Identification Register, etc. are examples of boundary
scan related registers.

The Instruction Register is used for holding and processing the instruction received over the
TAP.

The bypass register is used for bypassing the boundary scan path of the device and directly interconnecting the TDI pin of the device to its TDO pin. It disconnects a device from the boundary scan path.

Different instructions are used for testing the interconnections and the functioning of the chip.

Extest, Bypass, Sample and Preload, Intest, etc. are examples of instructions for different types of boundary scan tests, whereas the instruction Runbist is used for performing a self-test of the internal functioning of the chip.

The Runbist instruction produces a pass/fail result.

Boundary Scan Description Language (BSDL) is used for implementing boundary scan tests
using JTAG.

BSDL is a subset of VHDL and it describes the JTAG implementation in a device.

BSDL provides information on how boundary scan is implemented in an integrated chip.

The BSDL file (a file which describes the boundary scan implementation for a device) for a JTAG compliant device is supplied by the device manufacturer, or it can be downloaded from an internet repository.

The BSDL file is used as the input to a Boundary Scan Tool for generating boundary scan
test cases for a PCB.

Automated tools are available for boundary scan test implementation from multiple vendors.

The ScanExpress™ Boundary Scan (JTAG) product from Corelis Inc. (www.corelis.com) is a popular tool for boundary scan test implementation.
