Bcs302 Complete Notes
BCS402 Microcontrollers
MODULE 1
Difference between Microprocessor and Microcontroller
[Comparison table lost in extraction: microprocessor vs. microcontroller]
Difference between RISC and CISC
[Comparison table lost in extraction: RISC vs. CISC]
Q. What are the salient features of the ARM instruction set that make it suitable for embedded applications?
Answer:
The following features make the ARM instruction set suitable for embedded applications:
Variable cycle execution for certain instructions—Not every ARM instruction executes in a
single cycle. For example, load-store-multiple instructions vary in the number of execution cycles
depending upon the number of registers being transferred.
Inline barrel shifter leading to more complex instructions—The inline barrel shifter is a
hardware component that preprocesses one of the input registers before it is used by an
instruction. This expands the capability of many instructions to improve core performance and
code density.
Thumb 16-bit instruction set—ARM enhanced the processor core by adding a second 16-bit
instruction set called Thumb that permits the ARM core to execute either 16- or 32-bit
instructions.
Conditional execution— An instruction is only executed when a specific condition has been
satisfied. This feature improves performance and code density by reducing branch instructions.
Enhanced instructions—The enhanced digital signal processor (DSP) instructions were added to
the standard ARM instruction set to support fast 16×16-bit multiplier operations.
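The inline barrel shifter mentioned above can be sketched in Python to show what "preprocessing an operand" means. This is an illustrative model, not ARM code: the carry-out that the real shifter also produces is omitted, and shift amounts are limited to 0-31.

```python
def barrel_shift(value, shift_type, amount):
    """Model the ARM inline barrel shifter preprocessing an operand.

    Illustrative sketch only; the real hardware also produces a carry-out,
    which is omitted here. Operates on 32-bit unsigned values.
    """
    value &= 0xFFFFFFFF
    amount &= 31                      # simplification: amounts 0..31 only
    if shift_type == "LSL":           # logical shift left
        return (value << amount) & 0xFFFFFFFF
    if shift_type == "LSR":           # logical shift right
        return value >> amount
    if shift_type == "ASR":           # arithmetic shift right (sign-extend)
        if value & 0x80000000:
            return (((value >> amount) | (0xFFFFFFFF << (32 - amount)))
                    & 0xFFFFFFFF) if amount else value
        return value >> amount
    if shift_type == "ROR":           # rotate right
        return (((value >> amount) | (value << (32 - amount)))
                & 0xFFFFFFFF) if amount else value
    raise ValueError(shift_type)

# ADD r0, r1, r2, LSL #2 computes r0 = r1 + (r2 << 2) in one instruction.
r1, r2 = 5, 3
r0 = (r1 + barrel_shift(r2, "LSL", 2)) & 0xFFFFFFFF
print(r0)  # 17
```

Folding the shift into the same instruction is what improves code density: a separate shift instruction is not needed.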
Or
With a neat diagram explain the different hardware components of an embedded device based on ARM
core.
Answer: Figure shown below shows a typical embedded device based on ARM core. Each box represents
a feature or function.
[Figure: a typical ARM-based embedded device: ARM processor, memory controller with ROM, FLASH ROM, SRAM and DRAM, interrupt controller, AHB arbiter, AHB-APB bridge, Ethernet, real-time clock, counter/timers, and console/serial UARTs]
ARM processor based embedded system hardware can be separated into the following four main
hardware components:
o The ARM processor: The ARM processor controls the embedded device. Different
versions of the ARM processor are available to suit the desired operating characteristics.
o Controllers: Controllers coordinate important blocks of the system. Two commonly
found controllers are memory controller and interrupt controller.
o Peripherals: The peripherals provide all the input-output capability external to the chip
and are responsible for the uniqueness of the embedded device.
o Bus: A bus is used to communicate between different parts of the device.
ARM Bus Technology
o Embedded devices use an on-chip bus that is internal to the chip and that allows different
peripheral devices to be interconnected with an ARM core.
o There are two different classes of devices attached to the bus.
The ARM processor core is a bus master—a logical device capable of initiating
a data transfer with another device across the same bus.
Peripherals tend to be bus slaves—logical devices capable only of responding to
a transfer request from a bus master device.
AMBA Bus Protocol
o The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and
has been widely adopted as the on-chip bus architecture used for ARM processors.
o The first AMBA buses introduced were the ARM System Bus (ASB) and the ARM
Peripheral Bus (APB).
o Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB).
o AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
MEMORY
o An embedded system has to have some form of memory to store and execute code.
o Figure below shows the memory trade-offs: the fastest memory cache is physically
located nearer the ARM processor core and the slowest secondary memory is set further
away.
o Generally the closer memory is to the processor core, the more it costs and the smaller its
capacity.
PERIPHERALS
o Embedded systems that interact with the outside world need some form of peripheral
device.
o Controllers are specialized peripherals that implement higher levels of functionality
within the embedded system.
o Memory controller: Memory controllers connect different types of memory to the
processor bus.
o Interrupt controller: An interrupt controller provides a programmable governing policy
that allows software to determine which peripheral or device can interrupt the processor
at any specific time.
A peripheral can simply be bolted onto the on-chip bus without having to redesign an
interface for each different processor architecture.
This plug-and-play interface for hardware developers improves availability and time to
market.
AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design.
This change allows the AHB bus to run at higher clock speeds and to be the first ARM
bus to support widths of 64 and 128 bits.
ARM has introduced two variations on the AHB bus: Multi-layer AHB and AHB-Lite.
In contrast to the original AHB, which allows a single bus master to be active on the bus
at any time, the Multi-layer AHB bus allows multiple active bus masters.
AHB-Lite is a subset of the AHB bus and it is limited to a single bus master. This bus
was developed for designs that do not require the full features of the standard AHB bus.
Q. What are the software components required to control an embedded device?
Answer:
An embedded system requires software to drive it. Figure below shows typical software
components required to control an embedded device.
Each software component in the stack uses a higher level of abstraction to separate the code
from the hardware device.
[Figure: software abstraction stack, with applications at the top, the operating system below, and the lower layers (device drivers and initialization code) closest to the hardware]
An ARM core is a set of functional units connected by data buses, as shown in Figure 1, where the
arrows represent the flow of data, the lines represent the buses, and the boxes represent either an
operation unit or a storage area.
The instruction decoder translates instructions before they are executed.
The ARM processor, like all RISC processors, uses a load - store architecture.
Load instructions copy data from memory to registers, and conversely the store instructions
copy data from registers to memory.
There are no data processing instructions that directly manipulate data in memory.
ARM instructions typically have two source registers, Rn and Rm, and a single destination
register, Rd. Source operands are read from the register file using the internal buses A and B,
respectively.
The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register values Rn
and Rm from the A and B buses and computes a result.
Data processing instructions write the result in Rd directly to the register file.
Load and store instructions use the ALU to generate an address to be held in the address register
and broadcast on the Address bus.
One important feature of the ARM is that register Rm alternatively can be preprocessed in the
barrel shifter before it enters the ALU.
After passing through the functional units, the result in Rd is written back to the register file using
the Result bus.
For load and store instructions the incrementer updates the address register before the core reads
or writes the next register value from or to the next sequential memory location.
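The load-store architecture described above can be sketched as a toy Python model: data processing instructions touch only registers, and LDR/STR are the only operations that touch memory. The register and memory contents used below are invented for illustration.

```python
# Toy model of the ARM load-store architecture: data processing happens
# only in registers; LDR/STR are the sole instructions touching memory.
regs = {f"r{i}": 0 for i in range(16)}
mem32 = {}                      # word-addressed memory: address -> value

def ldr(rd, rn, offset=0):
    addr = regs[rn] + offset    # the ALU generates the address
    regs[rd] = mem32.get(addr, 0)

def str_(rd, rn, offset=0):     # trailing underscore avoids shadowing str()
    addr = regs[rn] + offset
    mem32[addr] = regs[rd]

def add(rd, rn, rm):            # data processing: registers only
    regs[rd] = (regs[rn] + regs[rm]) & 0xFFFFFFFF

# To increment a value in memory, the core must load, modify, then store:
mem32[0x8000] = 41
regs["r1"] = 0x8000
regs["r2"] = 1
ldr("r0", "r1")                 # r0 = mem32[0x8000]
add("r0", "r0", "r2")           # r0 = r0 + r2 (in registers)
str_("r0", "r1")                # mem32[0x8000] = r0
print(hex(mem32[0x8000]))       # 0x2a
```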
REGISTERS
Q5. Explain briefly the active registers available in user mode.
OR
With a neat diagram explain the different general purpose registers of ARM processors.
Answer: Figure shown below shows the active registers available in user mode. All the registers shown
are 32 bits in size.
r0-r12 : thirteen general-purpose registers
r13    : sp (stack pointer)
r14    : lr (link register)
r15    : pc (program counter)
cpsr   : current program status register
(no spsr is accessible in user mode)
There are up to 18 active registers: 16 data registers and 2 processor status registers. The data
registers are visible to the programmer as r0 to r15.
The ARM processor has three registers assigned to a particular task: r13, r14 and r15.
Register r13: Register r13 is traditionally used as the stack pointer (sp) and stores the head of the
stack in the current processor mode.
Register r14: Register r14 is called the link register (lr) and is where the core puts the return
address whenever it calls a subroutine.
Register r15: Register r15 is the program counter (pc) and contains the address of the next
instruction to be fetched by the processor.
In addition to the 16 data registers, there are two program status registers: current program status
register (cpsr) and saved program status register (spsr).
Q. Explain the layout of a generic program status register with a neat diagram.
Answer: Figure below shows the basic layout of a generic program status register.

Fields:  Flags        Status   Extension   Control
Bits:    31 30 29 28                       7 6 5 4:0
         N  Z  C  V                        I F T Mode

Mode                     Mode[4:0]
Abort                    10111
Fast interrupt request   10001
Interrupt request        10010
Supervisor               10011
System                   11111
Undefined                11011
User                     10000
When cpsr bit 5, T=1, then the processor is in Thumb state. When T=0, the processor is in ARM
state.
The cpsr has two interrupt mask bits, 7 and 6 (I and F), which control masking of the interrupt
request (IRQ) and fast interrupt request (FIQ), respectively.
Condition flags are updated by comparisons and the result of ALU operations that specify the
S instruction suffix.
For example, if SUBS subtract instruction results in a register value of zero, then the Z flag in
the cpsr is set.
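A small Python sketch makes the cpsr layout concrete, extracting the condition flags, interrupt masks, T bit, and mode field from a 32-bit value. The bit positions and mode encodings follow the table above; the Status and Extension fields are ignored.

```python
# Mode[4:0] encodings from the table above.
MODES = {0b10111: "Abort", 0b10001: "FIQ", 0b10010: "IRQ",
         0b10011: "Supervisor", 0b11111: "System",
         0b11011: "Undefined", 0b10000: "User"}

def decode_cpsr(cpsr):
    """Pick apart the documented fields of a cpsr value (sketch)."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ masked when 1
        "F": (cpsr >> 6) & 1,    # FIQ masked when 1
        "T": (cpsr >> 5) & 1,    # 1 = Thumb state, 0 = ARM state
        "mode": MODES[cpsr & 0x1F],
    }

# Z flag set, interrupts unmasked, ARM state, supervisor mode:
fields = decode_cpsr(0x40000013)
print(fields["Z"], fields["T"], fields["mode"])  # 1 0 Supervisor
```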
Processor Mode
The processor mode determines which registers are active and the access rights to the cpsr itself.
There are seven processor modes: abort, fast interrupt request, interrupt request, supervisor,
system, undefined, and user. All modes except user mode are privileged.
PIPELINE
Q9. With neat diagram explain the various blocks in a 3 stage pipeline of ARM processor
organization.
OR
Explain ARM pipeline with 3,5,6 stages.
Answer:
A pipeline is the mechanism used to speed up execution by fetching the next instruction while other
instructions are being decoded and executed.
Figure 1 shows the ARM7 three-stage pipeline.
Each instruction takes a single cycle to complete after the pipeline is filled.
o In the first cycle, the core fetches the ADD instruction from memory.
o In the second cycle, the core fetches the SUB instruction and decodes the ADD
instruction.
o In the third cycle, the core fetches the CMP instruction from memory, decodes the SUB
instruction, and executes the ADD instruction. This procedure is called filling the
pipeline.
         Fetch   Decode   Execute
Cycle 1  ADD
Cycle 2  SUB     ADD
Cycle 3  CMP     SUB      ADD
(time increases down the table)
The pipeline design for each ARM family differs. For example, the ARM9 core increases the
pipeline length to five stages as shown in the figure below.
The ARM10 increases the pipeline length still further by adding a sixth stage as shown in the
figure below.
As the pipeline length increases the amount of work done at each stage is reduced, which allows
the processor to attain a higher operating frequency. This in turn increases the performance.
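The filling behaviour can be reproduced with a short Python model of an ideal (stall-free) pipeline; the table it builds matches the ADD/SUB/CMP sequence above.

```python
def pipeline_table(instrs, stages=("Fetch", "Decode", "Execute")):
    """Per-cycle occupancy of an ideal pipeline with no stalls (sketch)."""
    depth = len(stages)
    n_cycles = depth + len(instrs) - 1   # fill time + one cycle per extra instr
    table = []
    for cycle in range(n_cycles):
        row = {}
        for s, stage in enumerate(stages):
            i = cycle - s                # instruction occupying this stage
            if 0 <= i < len(instrs):
                row[stage] = instrs[i]
        table.append(row)
    return table

rows = pipeline_table(["ADD", "SUB", "CMP"])
for c, row in enumerate(rows, 1):
    print(f"Cycle {c}: {row}")
# On cycle 3 all stages are busy: CMP fetched, SUB decoded, ADD executed.
```

Passing a five- or six-element `stages` tuple shows why deeper pipelines take longer to fill but then still retire one instruction per cycle.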
Pipeline Executing Characteristics
a. The ARM pipeline has not processed an instruction until it passes completely through the
execute stage. For example, an ARM7 pipeline (with three stages) has executed an instruction
only when the fourth instruction is fetched. Figure below shows an instruction sequence on an
ARM7 pipeline.
VECTOR TABLE
Each vector table entry contains a form of branch instruction pointing to the start of a specific
routine.
Following is the vector table:
Exception/Interrupt Shorthand Address High address
Reset RESET 0x00000000 0xffff0000
Undefined instruction UNDEF 0x00000004 0xffff0004
Software interrupt SWI 0x00000008 0xffff0008
Prefetch abort PABT 0x0000000c 0xffff000c
Data abort DABT 0x00000010 0xffff0010
Reserved --- 0x00000014 0xffff0014
Interrupt request IRQ 0x00000018 0xffff0018
Fast interrupt request FIQ 0x0000001c 0xffff001c
Reset vector is the location of the first instruction executed by the processor when power is
applied. This instruction branches to the initialization code.
Undefined instruction vector is used when the processor cannot decode the instruction.
Software interrupt vector is called when SWI instruction is executed. The SWI is frequently
used as the mechanism to invoke an operating system routine.
Prefetch abort vector occurs when the processor attempts to fetch an instruction from an address
without the correct access permissions.
Data abort vector is similar to the prefetch abort vector but is raised when an instruction attempts
to access data memory without the correct access permissions.
Interrupt request vector is used by external hardware to interrupt the normal execution flow of
the processor.
Fast interrupt request vector is similar to the interrupt request but is reserved for hardware
requiring faster response times.
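The vector table can be modelled as a simple lookup, including the alternative high-vector base at 0xffff0000 shown in the table above. A sketch for illustration:

```python
# Vector table offsets from the table above (0x14 is reserved).
VECTORS = {
    "RESET": 0x00, "UNDEF": 0x04, "SWI": 0x08, "PABT": 0x0C,
    "DABT": 0x10, "IRQ": 0x18, "FIQ": 0x1C,
}
HIGH_VECTOR_BASE = 0xFFFF0000   # base when high vectors are enabled

def vector_address(exception, high_vectors=False):
    """Address the pc is forced to when the given exception is taken."""
    base = HIGH_VECTOR_BASE if high_vectors else 0x00000000
    return base + VECTORS[exception]

print(hex(vector_address("SWI")))                      # 0x8
print(hex(vector_address("FIQ", high_vectors=True)))   # 0xffff001c
```

Note that FIQ sits last in the table: its handler can start right at 0x1c without a branch, one reason it offers faster response.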
Core Extensions
Q11. Discuss the following with neat diagrams
a. Von Neumann architecture with cache
b. Harvard architecture with TCM
OR
Discuss all 3 core extensions.
Answer:
There are three core extensions that wrap around the ARM processor: cache and tightly coupled
memory, memory management, and the coprocessor interface.
1. Cache and tightly coupled memory: The cache is a block of fast memory placed between
main memory and the core. With a cache the processor core can run for the majority of the time
without having to wait for data from slow external memory.
o ARM has two forms of cache. The first, found attached to Von Neumann-style cores,
combines both data and instructions into a single unified cache, as shown in Figure 1
below.
o The second form, attached to Harvard-style cores, has separate caches for data and
instructions, as shown in Figure 2.
2. Memory management:
Embedded systems often use multiple memory devices. It is usually necessary to have a method
to help organize these devices and protect the system from applications trying to make
inappropriate accesses to hardware.
This is achieved with the assistance of memory management hardware.
ARM cores have three different types of memory management hardware: no extensions,
providing no protection; a memory protection unit (MPU), providing limited protection; and a
memory management unit (MMU), providing full protection.
o Nonprotected memory is fixed and provides very little flexibility. It is normally used for
small, simple embedded systems that require no protection from rogue applications.
o Memory protection unit (MPU): An MPU employs a simple system that uses a limited
number of memory regions. These regions are controlled with a set of special coprocessor
registers, and each region is defined with specific access permissions. This type of
memory management does not require a complex memory map.
o Memory management unit (MMU): The MMU is the most comprehensive memory
management hardware available on the ARM. The MMU uses a set of translation tables
to provide fine-grained control over memory.
These tables are stored in main memory and provide a virtual-to-physical address
map as well as access permissions. The MMU is designed for more sophisticated
systems that support multitasking.
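The MMU's translation-table lookup can be sketched in Python. This is a deliberately simplified single-level model assuming 4 KB pages and an invented page-table layout; real ARM MMUs use multi-level tables with several region sizes.

```python
PAGE_SHIFT = 12   # assume 4 KB pages for this sketch

# Toy translation table: virtual page -> (physical page, access permissions).
table = {
    0x10: (0x80, "rw"),
    0x11: (0x81, "r"),
}

def translate(vaddr, write=False):
    """Map a virtual address to a physical one, checking permissions."""
    vpage = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    if vpage not in table:
        raise MemoryError("translation fault (data abort)")
    ppage, perm = table[vpage]
    if write and "w" not in perm:
        raise PermissionError("permission fault (data abort)")
    return (ppage << PAGE_SHIFT) | offset

print(hex(translate(0x10ABC)))   # 0x80abc
```

A failed lookup or permission check is where the data abort exception from the vector table would be raised.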
3. Coprocessors:
A coprocessor extends the processing features of a core by extending the instruction set or by
providing configuration registers.
More than one coprocessor can be added to the ARM core via the coprocessor interface.
The coprocessor can be accessed through a group of dedicated ARM instructions that provide a
load-store type interface.
The coprocessor can also extend the instruction set by providing specialized instructions that
are added to the standard ARM instruction set, for example to process vector floating-point (VFP)
operations.
These new instructions are processed in the decode stage of the ARM pipeline. If the decode
stage sees a coprocessor instruction, then it offers it to the relevant coprocessor.
But, if the coprocessor is not present or doesn’t recognize the instruction, then the ARM takes an
undefined instruction exception.
MODULE 2
Data Processing Instructions
The data processing instructions manipulate data within registers. They are move
instructions, arithmetic instructions, logical instructions, compare instructions and
multiply instructions.
Most data processing instructions can process one of their operands using the barrel
shifter.
If S is suffixed on a data processing instruction, then it updates the flags in the cpsr.
MOVE INSTRUCTIONS:
The MOV instruction copies N into a destination register Rd, where N is a register or an
immediate value. This instruction is useful for setting initial values and transferring data
between registers.
Syntax: <instruction>{<cond>}{S} Rd, N
In the example shown below, the MOV instruction takes the contents of register r5
and copies them into register r7:
MOV r7, r5 ; copy r5 into r7
ARITHMETIC INSTRUCTIONS:
The arithmetic instructions implement addition and subtraction of 32-bit signed
and unsigned values.
Syntax: <instruction>{<cond>} {S} Rd, Rn, N
In the following example, the reverse subtract instruction (RSB) subtracts r1 from
the constant value #0, writing the result to r0:
RSB r0, r1, #0 ; r0 = 0 - r1
LOGICAL INSTRUCTIONS:
Logical instructions perform bitwise operations on the two source registers.
Syntax: <instruction> {<cond>} {S} Rd, Rn, N
In the example shown below, a logical OR operation is performed between registers r1 and r2,
and the result is placed in r0:
ORR r0, r1, r2 ; r0 = r1 | r2
COMPARISON INSTRUCTIONS:
The comparison instructions are used to compare or test a register with a 32-bit value.
They update the cpsr flag bits according to the result, but do not affect other registers.
After the bits have been set, the information can be used to change program flow by
using conditional execution.
Syntax: <instruction> {<cond>} Rn, N
In the CMP example shown below, r0 and r1 are equal before the execution
of the instruction. The Z flag is 0 prior to execution and changes to 1 after
execution (displayed as an uppercase Z):
CMP r0, r1
MULTIPLY INSTRUCTIONS:
The following example shows a multiply instruction that multiplies registers
r1 and r2 and places the result into register r0:
MUL r0, r1, r2 ; r0 = r1 * r2
The long multiply instructions (SMLAL, SMULL, UMLAL, and UMULL) produce a 64-
bit result.
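The effect of an unsigned long multiply such as UMULL can be sketched in Python: a 32x32-bit product is split into the RdHi and RdLo halves.

```python
MASK32 = 0xFFFFFFFF

def umull(rm, rs):
    """UMULL-style unsigned 32x32 -> 64-bit multiply (sketch).

    Returns (RdHi, RdLo): the high and low 32-bit halves of the product.
    """
    product = (rm & MASK32) * (rs & MASK32)
    return (product >> 32) & MASK32, product & MASK32

# Largest possible product needs the full 64 bits:
hi, lo = umull(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))   # 0xfffffffe 0x1
```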
BRANCH INSTRUCTIONS
Q2. Explain briefly branch instructions of ARM processor.
Answer:
A branch instruction changes the flow of execution or is used to call a routine.
This type of instruction allows programs to have subroutines, if-then-else structures, and
loops.
The change of execution flow forces the program counter (pc) to point to a new address.
The branch with link (BL) instruction changes the execution flow and, in addition,
overwrites the link register lr with a return address. The example below shows a
fragment of code that branches to a subroutine using the BL instruction.
The branch exchange (BX) instruction uses an absolute address stored in register
Rm. It is primarily used to branch to and from Thumb code. The T bit in the cpsr is
updated by the least significant bit of the branch register.
Similarly, the branch exchange with link (BLX) instruction updates the T bit of the cpsr
with the least significant bit of the branch register, and additionally sets the link register
with the return address.
LOAD-STORE INSTRUCTIONS ( Memory Access Instructions)
Load-store instructions transfer data between memory and processor registers. There are
three types of load-store instructions: single-register transfer, multiple-register transfer,
and swap.
a) Single-Register Transfer
These instructions are used for moving a single data item in and out of a register.
Here are the various load-store single-register transfer instructions.
Syntax: <LDR|STR>{<cond>}{B} Rd, addressing1
LDR{<cond>}SB|H|SH Rd, addressing2
STR{<cond>}H Rd, addressing2
Example:
1. LDR r0, [r1]
o This instruction loads a word from the address stored in register r1 and places
it into register r0.
b) Multiple-Register Transfer
Load-store multiple instructions can transfer multiple registers between memory and the
processor in a single instruction. The transfer occurs from a base address register Rn
pointing into memory.
Multiple-register transfer instructions are more efficient than single-register transfers for
moving blocks of data around memory and for saving and restoring context and stacks.
Addressing modes:
Single-Register Load-Store Addressing Modes
The ARM instruction set provides different modes for addressing memory.
These modes incorporate one of three indexing methods: preindex with writeback,
preindex, and postindex.
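The three indexing methods can be sketched in Python; the register and memory values used below are invented for illustration.

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Single-register load with the three ARM indexing methods (sketch)."""
    if mode == "preindex":              # LDR rd, [rn, #offset]
        addr = regs[rn] + offset        # base register unchanged
    elif mode == "preindex_wb":         # LDR rd, [rn, #offset]!
        addr = regs[rn] + offset
        regs[rn] = addr                 # base updated before access (writeback)
    elif mode == "postindex":           # LDR rd, [rn], #offset
        addr = regs[rn]
        regs[rn] = addr + offset        # base updated after the access
    regs[rd] = mem[addr]

mem = {0x9000: 0x01, 0x9004: 0x02}
regs = {"r0": 0, "r1": 0x9000}
ldr(regs, mem, "r0", "r1", 4, "preindex")    # r0 = mem[0x9004], r1 unchanged
print(hex(regs["r0"]), hex(regs["r1"]))      # 0x2 0x9000
ldr(regs, mem, "r0", "r1", 4, "postindex")   # r0 = mem[0x9000], then r1 += 4
print(hex(regs["r0"]), hex(regs["r1"]))      # 0x1 0x9004
```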
Example:
mem32[0x8001c] = 0x04
If LDMIA is replaced with LDMIB, the post-execution contents of the registers are as shown
below.
STACK OPERATIONS
The ARM architecture uses the load-store multiple instructions to carry out stack
operations.
The pop operation (removing data from a stack) uses a load multiple instruction;
similarly, the push operation (placing data onto the stack) uses a store multiple
instruction.
When you use a full stack (F), the stack pointer sp points to an address that is the last
used or full location.
In contrast, if you use an empty stack (E) the sp points to an address that is the first
unused or empty location.
A stack is either ascending (A) or descending (D). Ascending stacks grow towards
higher memory addresses; in contrast, descending stacks grow towards lower memory
addresses.
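STMFD/LDMFD on a full descending stack can be sketched in Python. The addresses below are invented for illustration, and the lexicographic register sort assumes single-digit register names.

```python
def stmfd(regs, mem, reglist):
    """STMFD sp!, {...}: push onto a full descending stack (sketch)."""
    # Push highest-numbered register first, so the lowest register ends up
    # at the lowest address, matching ARM's memory layout.
    for r in sorted(reglist, reverse=True):   # single-digit names assumed
        regs["sp"] -= 4                       # descending: sp moves down
        mem[regs["sp"]] = regs[r]             # full: sp points at last value

def ldmfd(regs, mem, reglist):
    """LDMFD sp!, {...}: pop from a full descending stack."""
    for r in sorted(reglist):
        regs[r] = mem[regs["sp"]]
        regs["sp"] += 4

mem = {}
regs = {"sp": 0x80018, "r1": 0x2, "r4": 0x3}
stmfd(regs, mem, ["r1", "r4"])    # push r1 and r4
print(hex(regs["sp"]))            # 0x80010
regs["r1"] = regs["r4"] = 0
ldmfd(regs, mem, ["r1", "r4"])    # pop restores both, sp back where it was
print(hex(regs["r1"]), hex(regs["r4"]), hex(regs["sp"]))  # 0x2 0x3 0x80018
```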
Addressing modes for stack operation
The LDMFD and STMFD instructions provide the pop and push functions, respectively.
Example1: With full descending
SOFTWARE INTERRUPT (SWI) INSTRUCTION
When the processor executes an SWI instruction, it sets the program counter pc to the
offset 0x8 in the vector table.
The instruction also forces the processor mode to SVC, which allows an operating system
routine to be called in a privileged mode.
Each SWI instruction has an associated SWI number, which is used to represent a
particular function call or feature.
The example below shows an SWI call with SWI number 0x123456, used by
ARM toolkits as a debugging SWI.
Since SWI instructions are used to call operating system routines, some form of parameter
passing is required.
This is achieved using registers. In the above example, register r0 is used to pass the
parameter 0x12. The return values are also passed back via registers.
Coprocessor Instructions
Q5. Explain briefly coprocessor instructions.
Answer:
Coprocessor instructions are used to extend the instruction set.
A coprocessor can either provide additional computation capability or be used to control
the memory subsystem including caches and memory management.
These instructions are used only by cores with a coprocessor.
Syntax: CDP {<cond>} cp,opcode1, Cd, Cn {,opcode2}
<MRC|MCR>{<cond>}cp,opcode1,Rd,Cn,Cm{,opcode2}
<LDC|STC>{<cond>}cp,Cd,addressing
In the syntax of the coprocessor instructions, the cp field represents the coprocessor number,
between p0 and p15. The opcode fields describe the operation to take place on the coprocessor,
and the Cn, Cm, and Cd fields describe registers within the coprocessor.
For example: The instruction below copies coprocessor CP15 register c0 into a general
purpose register r10.
MRC p15, 0, r10, c0, c0, 0 ; CP15 register-0 is copied into general
purpose register r10.
For example: The instruction below moves the contents of CP15 control register c1
into register r1 of the processor core.
MRC p15, 0, r1, c1, c0, 0
Loading Constants
Q6. Explain briefly the loading constants.
Answer:
There are two pseudo instructions to move a 32-bit constant value to a register.
Syntax: LDR Rd, =constant
ADR Rd, label
The example below shows an LDR instruction loading a 32-bit constant 0xff00ffff
into register r0.
LDR r0, =0xff00ffff
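Whether LDR Rd, =constant can be replaced by a single MOV depends on whether the constant fits ARM's immediate encoding: an 8-bit value rotated right by an even amount. A Python sketch of that encodability test:

```python
def ror32(v, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((v >> n) | (v << (32 - n))) & 0xFFFFFFFF

def is_arm_immediate(c):
    """True if c can be encoded as an 8-bit value rotated right by an
    even amount (0, 2, ..., 30), i.e. fits a MOV immediate."""
    c &= 0xFFFFFFFF
    # c == ROR(imm8, rot)  <=>  imm8 == ROR(c, 32 - rot)
    return any(ror32(c, (32 - rot) % 32) <= 0xFF for rot in range(0, 32, 2))

# 0xFF000000 is 0xFF rotated right by 8, so MOV encodes it directly...
print(is_arm_immediate(0xFF000000))   # True
# ...but 0xFF00FFFF is not encodable, so the assembler converts the LDR
# pseudoinstruction into a load from a literal pool instead.
print(is_arm_immediate(0xFF00FFFF))   # False
```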
Programs:
1. Write ALP program for ARM7 demonstrating the data transfer.
Answer:
AREA DATATRANSFER, CODE, READONLY
ENTRY
LDR R9, =SRC ;LOAD STARTING ADDRESS OF SOURCE
LDR R10, =DST ;LOAD STARTING ADDRESS OF DESTINATION
LDMIA R9!, {R0-R7} ;LOAD EIGHT WORDS FROM SOURCE
STMIA R10!, {R0-R7} ;STORE EIGHT WORDS AT DESTINATION
STOP B STOP
SRC DCD 1, 2, 3, 4, 5, 6, 7, 8 ;SAMPLE SOURCE DATA
AREA BLOCKDATA, DATA, READWRITE
DST DCD 0, 0, 0, 0, 0, 0, 0, 0
END
4. Write ALP using ARM instructions that calls subroutine fact to find factorial of a given
number.
Answer:
FACT
MOVS R1, R0 ; copy n into R1; if n = 0, Z flag is set
MOVEQ R5, #1 ; 0! = 1
MOVEQ PC, R14 ; return immediately if n = 0
LOOP
SUBNES R1, R1, #1 ; R1 = R1 - 1 if R1 != 0
MULNE R0, R1, R0 ; R0 = R1 * R0
BNE LOOP ; if (R1 != 0) loop
MOV R5, R0 ; result in R5
MOV PC, R14 ; return with result in R5
AREA FACTDATA, DATA, READWRITE
DST DCD 0
END
5. Write ALP program to add array of 16 bit numbers and store the result in
memory. Answer:
AREA AryAdd, CODE, READONLY
ENTRY
LDR R0, =SRC ; pointer to source array
LDR R1, = DST ; pointer to destination
MOV R2, #5 ; count of numbers
MOV R5, #0 ; initial sum
UP LDRH R3, [R0] ; load next 16-bit number into R3
ADD R5, R5, R3 ; add numbers
ADD R0, R0, #2 ; increment pointer to next number
SUBS R2, R2, #1 ; decrement count by 1 and set flags
BNE UP ; repeat until count = 0
STRH R5, [R1] ; store the 16-bit result
STOP B STOP
SRC DCW 10, 20, 30, 40, 50
AREA BLOCKDATA, DATA, READWRITE
DST DCW 0
END
MOV R0, R1
MOV R1, R3
SUBS R4, R4, #1 ;DECREMENT THE COUNTER AND SET FLAGS
BNE BACK ;LOOP WHILE COUNTER != 0
STOP B STOP
AREA FIBONACCI, DATA, READWRITE
FIBO DCD 0,0,0,0,0
END
Writing assembly by hand gives you direct control of three optimization tools that you cannot
explicitly use by writing C source: instruction scheduling, register allocation, and conditional
execution. Consider the following C function that squares an integer:
int square(int i)
{
return i*i;
}
Let's see how to replace square by an assembly function that performs the same action.
Remove the C definition of square, but not the declaration (the second line), to produce a
new C file main1.c. Next add an armasm assembler file square.s with the following
contents:
AREA |.text|, CODE, READONLY
EXPORT square
; int square(int i)
square
MUL r1, r0, r0 ; r1 = i * i
MOV r0, r1 ; r0 = r1 (return value)
MOV pc, lr ; return
END
The AREA directive names the area or code section that the code lives in. If you use
nonalphanumeric characters in a symbol or area name, then enclose the name in vertical
bars.
The EXPORT directive makes the symbol square available for external linking.
The input argument is passed in register r0, and the return value is returned in register r0.
The multiply instruction has a restriction that the destination register must not be the
same as the first argument register. Therefore we place the multiply result into r1 and
move this to r0.
The END directive marks the end of the assembly file. Comments follow a semicolon.
Example 6.1 only works if you are compiling your C as ARM code. If you compile your
C as Thumb code, then the assembly routine must return using a BX instruction as shown
below
Example 6.2
This example shows how to call a subroutine from an assembly routine. We will take Example
6.1 and convert the whole program (including main) into assembly. We will call the C library
routine printf as a subroutine. Create a new assembly file main3.s with the following contents:
We have used a new directive, IMPORT, to declare symbols that are defined in other
files.
The imported symbol Lib$$Request$$armlib makes a request that the linker links with
the standard ARM C library. The WEAK specifier prevents the linker from giving an
error if the symbol is not found at link time. If the symbol is not found, it will take the
value zero.
The second imported symbol __main is the start of the C library initialization code.
You only need to import these symbols if you are defining your own main; a main
defined in C code will import these automatically for you. Importing printf allows us to
call that C library function.
The RN directive allows us to use names for registers. In this case we define i as an
alternate name for register r4. Using register names makes the code more readable.
Recall that the ATPCS states that a function must preserve registers r4 to r11 and sp. We
corrupt i (r4), and calling printf will corrupt lr.
Therefore we stack these two registers at the start of the function using an STMFD
instruction. The LDMFD instruction pulls these registers from the stack and returns by
writing the return address to pc.
The DCB directive defines byte data described as a string or a comma-separated list of
bytes.
Note that Example 6.3 also assumes that the code is called from ARM code. If the code
can be called from Thumb code as in Example 6.2 then we must be capable of returning
to Thumb code.
Finally, let’s look at an example where we pass more than four parameters. Recall that
ATPCS places the first four arguments in registers r0 to r3. Subsequent arguments are
placed on the stack.
Example 6.3
This example defines a function sumof that can sum any number of integers. The
arguments are the number of integers to sum followed by a list of the integers. The sumof
function is written in assembly and can accept any number of arguments.
The code keeps count of the number of remaining values to sum, N. The first three values
are in registers r1, r2, r3. The remaining values are on the stack.
Profiling and Cycle Counting
The ARM simulator used by the ADS1.1 debugger is called the ARMulator and provides
profiling and cycle counting features.
The ARMulator profiler works by sampling the program counter pc at regular intervals.
The profiler identifies the function the pc points to and updates a hit counter for each
function it encounters.
Another approach is to use the trace output of a simulator as a source for analysis.
A pc-sampled profiler can produce meaningless results if it records too few samples. You
can even implement your own pc-sampled profiler in a hardware system using timer
interrupts to collect the pc data points.
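The pc-sampling idea can be sketched in Python: given an invented memory layout of functions and a list of sampled pc values, count hits per function. The addresses and function names below are hypothetical.

```python
import bisect

# Hypothetical layout: (start address, function name), sorted by start.
layout = [(0x8000, "main"), (0x8100, "square"), (0x8200, "printf")]
starts = [s for s, _ in layout]

def function_at(pc):
    """Map a sampled pc back to the function containing it."""
    i = bisect.bisect_right(starts, pc) - 1
    return layout[i][1]

def profile(pc_samples):
    """Toy pc-sampled profiler: count hits per function."""
    hits = {}
    for pc in pc_samples:
        name = function_at(pc)
        hits[name] = hits.get(name, 0) + 1
    return hits

# Pretend these pc values were captured at regular timer intervals.
samples = [0x8004, 0x8104, 0x8108, 0x8104, 0x8204, 0x8010]
print(profile(samples))   # {'main': 2, 'square': 3, 'printf': 1}
```

With only six samples the counts are statistically meaningless, which is exactly the caveat above: a pc-sampled profiler needs many samples before the hit counts approximate the real time distribution.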
ARM implementations do not normally contain cycle-counting hardware, so to easily
measure cycle counts you should use an ARM debugger with ARM simulator. You can
configure the ARMulator to simulate a range of different ARM cores and obtain cycle
count benchmarks for a number of platforms.
Instruction Scheduling
Instructions that are conditional on the value of the ARM condition codes in the cpsr
take one cycle if the condition is not met.
If the condition is met, then the following rules apply:
1. ALU operations such as addition, subtraction, and logical operations take one
cycle. This includes a shift by an immediate value. If you use a register-
specified shift, then add one cycle. If the instruction writes to the pc, then add
two cycles.
2. Load instructions that load N 32-bit words of memory, such as LDR and LDM,
take N cycles to issue, but the result of the last word loaded is not available on
the following cycle. The updated load address is available on the next cycle.
This assumes zero-wait-state memory for an uncached system, or a cache hit
for a cached system. An LDM of a single value is exceptional, taking two
cycles. If the instruction loads pc, then add two cycles.
3. Load instructions that load 16-bit or 8-bit data such as LDRB, LDRSB,
LDRH, and LDRSH take one cycle to issue. The load result is not available on
the following two cycles. The updated load address is available on the next
cycle. This assumes zero-wait-state memory for an uncached system, or a
cache hit for a cached system.
4. Branch instructions take three cycles.
5. Store instructions that store N values take N cycles. This assumes zero-wait-
state memory for an uncached system, or a cache hit or a write buffer with N
free entries for a cached system. An STM of a single value is exceptional,
taking two cycles.
6. Multiply instructions take a varying number of cycles depending on the value
of the second operand in the product.
To understand how to schedule code efficiently on the ARM, we need to understand the
ARM pipeline and dependencies. The ARM9TDMI processor performs five operations in
parallel:
o Fetch: Fetch from memory the instruction at address pc. The instruction is loaded
into the core and then processes down the core pipeline.
o Decode: Decode the instruction that was fetched in the previous cycle. The
processor also reads the input operands from the register bank if they are not
available via one of the forwarding paths.
o ALU: Executes the instruction that was decoded in the previous cycle. Note this
instruction was originally fetched from address pc − 8 (ARM state) or pc − 4
(Thumb state). Normally this involves calculating the answer for a data
processing operation, or the address for a load, store, or branch operation. Some
instructions may spend several cycles in this stage. For example, multiply and
register-controlled shift operations take several ALU cycles.
o LS1: Load or store the data specified by a load or store instruction. If the
instruction is not a load or store, then this stage has no effect.
o LS2: Extract and zero- or sign-extend the data loaded by a byte or halfword load
instruction. If the instruction is not a load of an 8-bit byte or 16-bit halfword item,
then this stage has no effect.
Figure 6.1 shows a simplified functional view of the five-stage ARM9TDMI pipeline.
Note that multiply and register shift operations are not shown in the figure.
After an instruction has completed the five stages of the pipeline, the core writes the
result to the register file. Note that pc points to the address of the instruction being
fetched.
The ALU is executing the instruction that was originally fetched from address pc − 8 in
parallel with fetching the instruction at address pc.
How does the pipeline affect the timing of instructions? Consider the following
examples. These examples show how the cycle timings change because an earlier
instruction must complete a stage before the current instruction can progress down the
pipeline.
If an instruction requires the result of a previous instruction that is not available, then the
processor stalls. This is called a pipeline hazard or pipeline interlock.
Example 6.4: This example shows the case where there is no interlock.
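A sketch of such a pair, using the registers named in the cycle description below (reconstructed, so treat as illustrative):

```asm
    ADD r0, r0, r1    ; computes r0 + r1 in the ALU stage
    ADD r0, r0, r2    ; result of the first ADD is forwarded, so no stall
```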
This instruction pair takes two cycles. The ALU calculates r0 + r1 in one cycle. Therefore
this result is available for the ALU to calculate r0 + r2 in the second cycle.
Example 6.5: This example shows a one-cycle interlock caused by load use.
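A sketch of the pair, with the load address r2 + 4 and result register r1 taken from the explanation (illustrative):

```asm
    LDR r1, [r2, #4]  ; address r2 + 4 is computed in the ALU stage
    ADD r0, r0, r1    ; needs r1, which is not loaded yet: one-cycle stall
```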
This instruction pair takes three cycles. The ALU calculates the address r2 + 4 in the first
cycle while decoding the ADD instruction in parallel. However, the ADD cannot proceed
on the second cycle because the load instruction has not yet loaded the value of r1.
Therefore the pipeline stalls for one cycle while the load instruction completes the LS1
stage.
Now that r1 is ready, the processor executes the ADD in the ALU on the third cycle.
Figure 6.2 illustrates how this interlock affects the pipeline. The processor stalls the
ADD instruction for one cycle in the ALU stage of the pipeline while the load instruction
completes the LS1 stage. We’ve denoted this stall by an italic ADD. Since the LDR
instruction proceeds down the pipeline, but the ADD instruction is stalled, a gap opens up
between them.
This gap is sometimes called a pipeline bubble. We’ve marked the bubble with a dash.
Example 6.6: This example shows a one-cycle interlock caused by delayed load use.
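A sketch of the triplet, with registers taken from the explanation (illustrative):

```asm
    LDRB r1, [r2, #1] ; byte load: r1 is only ready after the LS2 stage
    ADD  r0, r0, r2   ; does not use r1, so it proceeds without stalling
    EOR  r0, r0, r1   ; needs r1: the processor stalls this for one cycle
```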
This instruction triplet takes four cycles. Although the ADD proceeds on the cycle
following the load byte, the EOR instruction cannot start on the third cycle. The r1 value
is not ready until the load instruction completes the LS2 stage of the pipeline. The
processor stalls the EOR instruction for one cycle.
Note that the ADD instruction does not affect the timing at all. The sequence takes four
cycles whether it is there or not.
Figure 6.3 shows how this sequence progresses through the processor pipeline. The
ADD doesn’t cause any stalls since the ADD does not use r1, the result of the load.
Example 6.8: This example shows why a branch instruction takes three cycles. The processor
must flush the pipeline when jumping to a new address.
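A sketch consistent with the description; the two instructions after the branch are placeholders for whatever happens to be fetched and then discarded:

```asm
    MOV r1, #1        ; executes on the first cycle
    B   case1         ; destination computed; pipeline flushed, two-cycle refill
    AND r0, r0, r1    ; fetched but discarded by the flush
    EOR r2, r2, r3    ; fetched but discarded by the flush
case1
    SUB r0, r0, r1    ; executes normally after the refill
```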
The three executed instructions take a total of five cycles. The MOV instruction executes
on the first cycle. On the second cycle, the branch instruction calculates the destination
address. This causes the core to flush the pipeline and refill it using this new pc value.
The refill takes two cycles. Finally, the SUB instruction executes normally.
Figure 6.4 illustrates the pipeline state on each cycle. The pipeline drops the two
instructions following the branch when the branch takes place.
Load instructions occur frequently in compiled code, accounting for approximately one
third of all instructions. Careful scheduling of load instructions so that pipeline stalls
don’t occur can improve performance.
The compiler cannot move a load instruction before a store instruction unless it is certain
that the two pointers used do not point to the same address.
Let’s consider an example of a memory-intensive task. The following function,
str_tolower, copies a zero-terminated string of characters from in to out. It converts the
string to lowercase in the process.
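A C sketch of str_tolower matching this description; the signature is assumed, and the range check is written the way the text says the compiler folds it (a single unsigned comparison):

```c
/* Copy the zero-terminated string at in to out,
   converting uppercase letters to lowercase. */
void str_tolower(char *out, const char *in)
{
    unsigned int c;
    do {
        c = (unsigned char)*in++;
        /* (c >= 'A' && c <= 'Z') folded into one unsigned compare:
           c - 'A' wraps to a huge value whenever c < 'A' */
        if (c - 'A' <= (unsigned int)('Z' - 'A'))
            c += 'a' - 'A';
        *out++ = (char)c;
    } while (c != 0);
}
```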
The ADS1.1 compiler generates the following compiled output. Notice that the compiler
optimizes the condition (c >= 'A' && c <= 'Z') to the check 0 <= c - 'A' <= 'Z' - 'A',
which it can perform using a single unsigned comparison.
Unfortunately, the SUB instruction uses the value of c directly after the LDRB instruction
that loads c. Consequently, the ARM9TDMI pipeline will stall for two cycles. The
compiler can’t do any better since everything following the load of c depends on its
value.
However, there are two ways you can alter the structure of the algorithm to avoid the
cycles by using assembly. We call these methods load scheduling by preloading and
unrolling.
The scheduled version is one instruction longer than the C version, but we save two
cycles for each inner loop iteration. This reduces the loop from 11 cycles per character to
9 cycles per character on an ARM9TDMI, giving a 1.22 times speed improvement.
This loop is the most efficient implementation we’ve looked at so far. The
implementation requires seven cycles per character on ARM9TDMI. This gives a 1.57
times speed increase over the original str_tolower.
Register Allocation
You can use 14 of the 16 visible ARM registers to hold general-purpose data. The other
two registers are the stack pointer r13 and the program counter r15.
For a function to be ATPCS compliant it must preserve the values of the callee-saved
registers r4 to r11. ATPCS also specifies that the stack should be eight-byte aligned;
therefore you must preserve this alignment if calling subroutines.
Use the following template for optimized assembly routines requiring many registers:
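A sketch of such a template (the routine name is hypothetical); saving r4-r12 and lr stacks ten registers, which keeps sp eight-byte aligned:

```asm
my_routine
    STMFD sp!, {r4-r12, lr}   ; save 10 registers = 40 bytes, 8-byte aligned
    ; routine body: r0-r12 and r14 are now free to use
    LDMFD sp!, {r4-r12, pc}   ; restore registers and return
```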
Our only purpose in stacking r12 is to keep the stack eight-byte aligned.
In this section we look at how best to allocate variables to register numbers for register
intensive tasks, how to use more than 14 local variables, and how to make the best use of
the 14 available registers.
1. Allocating Variables to Register Numbers
When you write an assembly routine, it is best to start by using names for the
variables, rather than explicit register numbers. This allows you to change the
allocation of variables to register numbers easily.
You can even use different register names for the same physical register number
when their use doesn’t overlap. Register names increase the clarity and readability
of optimized code.
However, there are several cases where the physical number of the register is
important:
i. Argument registers. The ATPCS convention defines that the first four
arguments to a function are placed in registers r0 to r3. Further arguments
are placed on the stack. The return value must be placed in r0.
ii. Registers used in a load or store multiple. Load and store multiple
instructions LDM and STM operate on a list of registers in order of
ascending register number. If r0 and r1 appear in the register list, then the
processor will always load or store r0 using a lower address than r1 and so
on.
iii. Load and store double word. The LDRD and STRD instructions
introduced in ARMv5E operate on a pair of registers with sequential
register numbers, Rd and Rd + 1. Furthermore, Rd must be an even
register number.
There are several possible ways we can proceed when we run out of registers:
Conditional Execution
The processor core can conditionally execute most ARM instructions. This conditional
execution is based on one of 15 condition codes.
If you don’t specify a condition, the assembler defaults to the execute always condition
(AL).
The other 14 conditions split into seven pairs of complements. The conditions depend on
the four condition code flags N, Z, C, V stored in the cpsr register.
By default, ARM instructions do not update the N, Z, C, V flags in the ARM cpsr. For
most instructions, to update these flags you append an S suffix to the instruction
mnemonic.
Exceptions to this are comparison instructions that do not write to a destination register.
Their sole purpose is to update the flags and so they don’t require the S suffix.
By combining conditional execution and conditional setting of the flags, you can
implement simple if statements without any need for branches. This improves efficiency
since branches can take many cycles and also reduces code size.
Example 6.18: As an example of conditional execution, the following C code identifies
whether c is a vowel:
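A sketch of the C test and the branchless ARM sequence it maps to (the register name c and the routine name vowel are assumptions; the comparisons use conditional execution rather than branches):

```asm
; C: if (c=='a' || c=='e' || c=='i' || c=='o' || c=='u') vowel();
    TEQ   c, #'a'     ; sets Z if c == 'a'
    TEQNE c, #'e'     ; executes only if no match yet (Z clear)
    TEQNE c, #'i'
    TEQNE c, #'o'
    TEQNE c, #'u'
    BLEQ  vowel       ; call vowel() if any comparison matched
```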
As soon as one of the TEQ comparisons detects a match, the Z flag is set in the cpsr.
The following TEQNE instructions have no effect as they are conditional on Z = 0.
Looping Constructs
Most routines critical to performance will contain a loop.
This section describes how to implement these loops efficiently in assembly. We also
look at examples of how to unroll loops for maximum performance.
1. Decremented Counted Loops
For a decrementing loop of N iterations, the loop counter i counts down from N to 1
inclusive. The loop terminates with i = 0. An efficient implementation is
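A sketch of this implementation (i stands for a register holding the counter; N is the iteration count):

```asm
    MOV  i, #N        ; initialise the counter
loop
    ; loop body
    SUBS i, i, #1     ; decrement and set the condition flags
    BGT  loop         ; repeat while i > 0
```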
The loop overhead consists of a subtraction setting the condition codes followed by a
conditional branch. On ARM7 and ARM9 this overhead costs four cycles per loop. If
i is an array index, then you may want to count down from N−1 to 0 inclusive instead
so that you can access array element zero.
You can implement this in the same way by using a different conditional branch:
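A sketch of this variant (i again stands for a register holding the counter):

```asm
    MOV  i, #(N-1)    ; start at N-1 so i can index array element zero
loop
    ; loop body, using i as an array index
    SUBS i, i, #1
    BGE  loop         ; repeat while i >= 0
```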
In this arrangement the Z flag is set on the last iteration of the loop and cleared for
other iterations. If there is anything different about the last loop, then we can achieve
this using the EQ and NE conditions. For example, if you preload data for the next
loop then you want to avoid the preload on the last loop.
We’ll take the C library function memset as a case study. This function sets N
bytes of memory at address s to the byte value c. The function needs to be
efficient, so we will look at how to unroll the loop without placing extra
restrictions on the input operands.
Our version of memset will have the following C prototype:
To be efficient for large N, we need to write multiple bytes at a time using STR or
STM instructions. Therefore our first task is to align the array pointer s.
However, it is only worth us doing this if N is sufficiently large. We aren’t sure
yet what “sufficiently large” means, but let’s assume we can choose a threshold
value T1 and only bother to align the array when N ≥ T1.
Clearly T1 ≥ 3 as there is no point in aligning if we don’t have four bytes to
write!
Now suppose we have aligned the array s. We can use store multiples to set
memory efficiently.
For example, we can use a loop of four store multiples of eight words each to set
128 bytes on each loop. However, it will only be worth doing this if N ≥ T2 ≥
128, where T2 is another threshold to be determined later on.
Finally, we are left with N < T2 bytes to set. We can write bytes in blocks of four
using STR until N < 4. Then we can finish by writing bytes singly with STRB to
the end of the array.
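The strategy above can be sketched in C; my_memset is a hypothetical name, and T1 and T2 are illustrative threshold values (assumptions, not the tuned values determined later in the text). The memcpy calls stand in for the STR/STM stores:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define T1 8     /* assumed: only worth aligning when N >= T1 (T1 >= 3) */
#define T2 128   /* assumed: only worth the 128-byte block loop when N >= T2 */

/* C sketch of the memset strategy: align s, set 128-byte blocks
   (standing in for four STMs of eight words), then word-sized STRs,
   then trailing STRBs. */
void my_memset(unsigned char *s, unsigned char c, size_t N)
{
    if (N >= T1) {                        /* align s to a 4-byte boundary */
        while (((uintptr_t)s & 3u) != 0) {
            *s++ = c;
            N--;
        }
    }
    uint32_t word = c * 0x01010101u;      /* replicate the byte into a word */
    if (N >= T2) {
        while (N >= 128) {                /* four STMs of eight words each */
            for (int i = 0; i < 32; i++)
                memcpy(s + 4 * i, &word, 4);
            s += 128;
            N -= 128;
        }
    }
    while (N >= 4) {                      /* STR-sized stores */
        memcpy(s, &word, 4);
        s += 4;
        N -= 4;
    }
    while (N--)                           /* STRB for the last few bytes */
        *s++ = c;
}
```

The alignment step only runs when N >= T1, mirroring the text: aligning is pointless unless there are at least four bytes to write.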
Module 3
Text book 2: Chapter 1 (Sections 1.2 to 1.6), Chapter 2 (Sections 2.1 to 2.6). RBT: L1, L2
Every embedded system is unique, and the hardware as well as the firmware is highly
specialised to the application domain.
Embedded systems are becoming an inevitable part of any product or equipment in all fields
including household appliances, telecommunications, medical equipment, industrial control,
consumer products, etc.
6. General computing systems need not be deterministic in execution behaviour, whereas
execution behaviour is deterministic for certain types of embedded systems, like Hard
Real Time systems.
7. General computing systems are less (or not at all) tailored towards reduced operating
power requirements, whereas embedded systems are highly tailored to take advantage of
the power saving modes.
The first recognised modern embedded system is the Apollo Guidance Computer (AGC),
developed by the MIT Instrumentation Laboratory for the lunar expedition. AGCs ran the
inertial guidance systems of both the Command Module (CM) and the Lunar Excursion
Module (LEM).
The Command Module was designed to encircle the moon while the Lunar Module and its
crew were designed to go down to the moon surface and land there safely.
The Lunar Module featured 18 engines in total: 16 reaction control thrusters, a
descent engine and an ascent engine. The descent engine was designed to provide thrust to
take the lunar module out of lunar orbit and land it safely on the moon.
MIT’s original design was based on 4K words of fixedmemory (Read Only Memory) and 256
words of erasable memory (Random Access Memory). By June1963, the figures reached 10K
of fixed and 1K of erasable memory. The final configuration was 36Kwords of fixed memory
and 2K words of erasable memory.
The clock frequency of the first microchipproto model used in AGC was 1.024 MHz and it
was derived from a 2.048 MHz crystal clock.
The first mass-produced embedded system was the guidance computer for the Minuteman-
I missile in 1961. It was the ‘Autonetics D-17’ guidance computer, built using discrete
transistor logic and a hard-disk for main memory.
The first integrated circuit was produced in September 1958, but computers using them didn't
begin to appear until 1963. Some of their early uses were in embedded systems, notably
by NASA for the Apollo Guidance Computer and by the US military in the Minuteman-II
intercontinental ballistic missile.
This classification is based on the order in which the embedded systems evolved from the
first version to where they are today.
1. First Generation - The early embedded systems were built around 8bit
microprocessors like the 8085 and Z80, and 4bit microcontrollers. They were simple
hardware circuits with firmware developed in assembly code.
Example - Digital telephone keypads, stepper motor control units, etc.
2. Second Generation - These are embedded systems built around 16bit microprocessors
and 8 or 16 bit microcontrollers.
- The instruction sets of the second generation processors/controllers were much more
complex and powerful than those of the first generation processors/controllers.
- Some of the second generation embedded systems contained embedded operating
systems for their operation.
Example -Data Acquisition Systems, SCADA systems, etc.
- The fourth generation embedded systems are making use of high performance real
time embedded operating systems for their functioning.
Example -Smart phone devices, mobile internet devices (MIDs), etc.
Classification Based on Complexity and Performance
According to this classification, embedded system can be grouped into:
3. Home automation and security systems: Air conditioners, sprinklers, intruder detection
alarms, closed circuit television cameras, fire alarms, etc.
4. Automotive industry: Anti-lock braking systems (ABS), engine control, ignition
systems, automatic navigation systems, etc.
5. Telecom: Cellular telephones, telephone switches, handset multimedia applications, etc.
6. Computer peripherals: Printers, scanners, fax machines, etc.
7. Computer networking systems: Network routers, switches, hubs, firewalls, etc.
8. Healthcare: Different kinds of scanners, EEG, ECG machines etc.
9. Measurement & Instrumentation: Digital multi meters, digital CROs, logic analysers,
PLC systems, etc.
10. Banking & Retail: Automatic teller machines (ATM) and currency counters etc.
11. Card Readers: Barcode, smart card readers, hand held devices, etc.
Each embedded system is designed to serve the purpose of any one or a combination of the
following tasks:
1. Data collection/Storage/Representation
2. Data communication
3. Data (signal) processing
4. Monitoring
5. Control
6. Application specific user interface
1. Data Collection/Storage/Representation
- Embedded systems designed for the purpose of data collection perform acquisition of data
from the external world. Data collection is usually done for storage, analysis, manipulation
and transmission.
- The term “data” refers to all kinds of information, i.e. text, voice, image, video, electrical
signals and any other measurable quantities. Data can be either analog (continuous) or digital
(discrete).
- Embedded systems with analog data capturing techniques collect data directly in the form of
analog signals, whereas embedded systems with digital data collection mechanism converts
the analog signal to corresponding digital signal using analog to digital (A/D) converters and
then collects the binary equivalent of the analog data.
- The collected data may be stored directly in the system or may be transmitted to some other
systems or it may be processed by the system. These actions are purely dependent on the
purpose for which the embedded system is designed.
- Embedded systems designed for pure measurement applications without storage collect
data and give a meaningful representation of the collected data by means of a graphical
representation or quantity value, and delete the collected data when new data arrives at the
data collection terminal.
- Example: Analog and digital CROs without storage memory.
- Some embedded systems store the collected data for processing and analysis. Such systems
incorporate a built-in/plug-in storage memory for storing the captured data.
- Example: Instruments with storage memory used in medical applications.
- Certain embedded systems store the data and will not give a representation of the same to
the user; instead, the data is used for internal processing.
2. Data Communication
- Embedded data communication systems are deployed in applications ranging from complex
satellite communication systems to simple home networking systems.
- The data collected by an embedded terminal may require transferring of the same to some
other system located remotely.
- The transmission is achieved either by a wire-line medium or by a wireless medium.
A wire-line medium was the most common choice in older embedded systems. As
technology changes, wireless media are becoming the de-facto standard for data
communication in embedded systems.
- A wireless medium offers cheaper connectivity solutions and makes the communication link
free from the hassle of wire bundles.
- The data collecting embedded terminal itself can incorporate data communication units like
wireless modules (Bluetooth, ZigBee, Wi-Fi, EDGE, GPRS, etc.) or wire-line modules
(RS-232C, USB, TCP/IP, PS2, etc.).
- Certain embedded systems act as dedicated transmission units between the sending and
receiving terminals, offering sophisticated functionalities like data packetizing, encrypting
and decrypting. Example: Network hubs, routers, switches, etc.
4. Monitoring
Embedded systems falling under this category are specifically designed for monitoring
purposes. Most embedded products in the medical domain are designed with monitoring
functions only. They are used for determining the state of some variables using input sensors.
For example, the electrocardiogram (ECG) machine is used for monitoring the heartbeat of
a patient. The machine is intended to do the monitoring of the heartbeat. It cannot impose
control over the heartbeat.
Some other examples of embedded systems with monitoring function are measuring
instruments like digital CRO, digital multimeters, logic analyzers, etc. They are used for
knowing (monitoring) the status of some variables like current, voltage, etc. They cannot
control the variables in turn.
5. Control
- Embedded systems with control functionalities impose control over some variables
according to the changes in input variables.
- A system with control functionality contains both sensors and actuators.
Sensors are connected to the input port for capturing changes in the environmental or
measured variable. The actuators connected to the output port are controlled according to the
changes in the input variable, acting on the controlling variable so as to bring the controlled
variable into the specified range.
- An air conditioner used in our home to control the room temperature to a specified
limit is a typical example of an embedded system for control purposes.
- An air conditioner contains a room temperature sensing element (sensor), which may be a
thermistor, and a handheld unit for setting up (feeding) the desired temperature. The handheld
unit may be connected to the central embedded unit residing inside the air conditioner through
a wireless link or through a wired link. The air compressor unit acts as the actuator. The
compressor is controlled according to the current room temperature and the desired
temperature set by the end user. Here the input variable is the current room temperature and
the controlled variable is also the room temperature. The controlling variable is the cool air
flow from the compressor unit. If the controlled variable and the input variable are not at the
same value, the controlling variable tries to equalise them by taking actions on the cool air
flow.
Hence an embedded system is a reactive system. The control is achieved by processing the
information coming from the sensors and user interfaces, and controlling some actuators that
regulate the physical variable.
Keyboards, push button switches, etc. are examples of common user interface input
devices, whereas LEDs, liquid crystal displays, piezoelectric buzzers, etc. are examples of
common output devices for a typical embedded system.
For example, if the embedded system is designed for any handheld application, such as a
mobile handset application, then the system should contain user interfaces like a keyboard for
performing input operations and display unit for providing users the status of various
activities in progress.
Some embedded systems do not require any manual intervention for their operation. They
automatically sense the variations in the real world, to which they are interacting through the
sensors which are connected to the input port of the system. The sensor information is passed
to the processor. Upon receiving the sensor data the processor performs some pre-defined
operations with the help of the firmware embedded in the system and sends some actuating
signals to the actuator connected to the output port of the embedded system, which in turn
acts on the controlling variable to bring the controlled variable to the desired level to make
the embedded system work in the desired manner.
The memory of the system is responsible for holding the control algorithm and other
important configuration details.
For most embedded systems, the memory for storing the algorithm or configuration data is
of a fixed type, a kind of Read Only Memory (ROM), and it is not available to the
end user for modification, which means the memory is protected from unwanted user
interaction.
The most common types of memories used in embedded systems for control algorithm storage
are OTP, PROM, UVEPROM, EEPROM and FLASH. Depending on the control application,
the memory size may vary from a few bytes to megabytes.
Sometimes the system requires temporary memory for performing arithmetic operations or
control algorithm execution, and this type of memory is known as “working memory”.
Random Access Memory (RAM) is used in most systems as the working memory.
The size of the RAM also varies from a few bytes to kilobytes or megabytes depending on the
application.
The first microprocessor developed by Intel was Intel 4004, a 4bit processor which was
released in November 1971.
It featured 1K data memory, a 12bit program counter and 4K program memory, sixteen 4bit
general purpose registers and 46 instructions. It ran at a clock speed of 740 kHz.
In 1972, 14 more instructions were added to the 4004 instruction set and the program space
was upgraded to 8K.
It was quickly replaced in April 1972 by the Intel 8008, which was similar to the Intel 4040;
the only difference was that its program counter was 14 bits wide, and the 8008 served as a
terminal controller.
In April 1974 Intel launched the first 8bit processor, the Intel 8080, with a 16bit address bus
and program counter and seven 8bit registers. The Intel 8080 was the most commonly used
processor for industrial control and other embedded applications in the 1970s.
Immediately after the release of Intel 8080, Motorola also entered the market with their
processor, Motorola 6800 with a different architecture and instruction set compared to 8080.
In 1976 Intel came up with the upgraded version of 8080 — Intel 8085, with two newly
added instructions, three interrupt pins and serial I/O.
In July 1976 Zilog entered the microprocessor market with its Z80 processor as competitor
to Intel.
Technical advances in the field of the semiconductor industry brought a new dimension to the
microprocessor market, and the late twentieth century witnessed fast growth in processor
technology. 16, 32 and 64 bit processors came into the market.
Intel, AMD, Freescale, IBM, TI, Cyrix, Hitachi, NEC, LSI Logic, etc. are the key players in
the processor market. Intel still leads the market with cutting edge technologies in the
processor industry.
Different instruction sets and system architectures are available for the design of a
microprocessor. Harvard and Von-Neumann are the two common system architectures for
processor design. Processors based on the Harvard architecture contain separate buses for
program memory and data memory, whereas processors based on the Von-Neumann
architecture share a single system bus for program and data memory.
ii. Microcontrollers
A Microcontroller is an integrated chip that contains a CPU, RAM, special and general
purpose register arrays, on chip ROM/FLASH memory for program storage, timer and
interrupt control units and dedicated I/O ports.
Texas Instrument’s TMS 1000 is considered as the world’s first microcontroller. TI
followed Intel’s 4004/4040, 4 bit processor design and added some amount of RAM, program
storage memory (ROM) and I/O support on a single chip, there by eliminated the requirement
of multiple hardware chips for self-functioning. Provision to add custom instructions to the
CPU was another innovative feature ofTMS 1000. TMS 1000 was released in 1974.
In 1977 Intel entered the microcontroller market with a family of controllers named MCS-48
family. Intel 8048 is recognised as Intel's first microcontroller. The design of the 8048
adopted a Harvard architecture, in which program memory and data memory are kept
separate even though they share the same address bus.
Intel came out with its most fruitful design in the 8bit microcontroller domain: the
8051 family. It is the most popular and powerful 8bit microcontroller ever built. It was
developed in the 1980s and was put under the family MCS-51.
Almost 75% of the microcontrollers used in the embedded domain were 8051 family based
controllers during the 1980-90s, due to their low cost, wide availability, memory efficient
instruction set, mature development tools and Boolean processing (bit manipulation
operation) capability.
Another important family of microcontrollers used in industrial control and embedded
applications is the PIC family microcontrollers from Microchip Technologies.
Figure 3.2
Microprocessors/controllers based on the Harvard architecture will have separate data bus
and instruction bus. This allows the data transfer and program fetching to occur
simultaneously in both buses. With Harvard architecture, the data memory can be read and
written while the program memory is being accessed. These separated data memory and code
memory buses allow one instruction to execute while the next instruction is fetched (“pre-
fetching”). The pre-fetch allows much faster execution than Von-Neumann architecture.
The following table highlights the differences between Harvard and Von-Neumann
architecture
Harvard architecture vs Von-Neumann architecture:
1. Harvard: separate buses for instruction and data fetching. Von-Neumann: a single bus
for instruction and data fetching.
2. Harvard: easier to pipeline, so high performance can be achieved. Von-Neumann: lower
performance.
3. Harvard: costly. Von-Neumann: cheaper.
4. Harvard: since data memory and program memory are stored physically in different
locations, there is no chance of accidental corruption of program memory. Von-Neumann:
since data memory and program memory are stored physically in the same chip, there are
chances of accidental corruption of program memory.
2. Big-endian - the higher-order byte of the data is stored in memory at the lowest
address, and the lower-order byte at the highest address.
For example, a 4 byte long integer with bytes Byte3 Byte2 Byte1 Byte0 (Byte3 being the most
significant) will be stored in the memory as follows:
Byte 3 at 0x20000 (base address), Byte 2 at 0x20001, Byte 1 at 0x20002, Byte 0 at 0x20003.
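The byte-order rule can be checked in C. This sketch (a hypothetical helper, not from the source) stores a known 32-bit value and inspects which byte ends up at the lowest address:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian machine (high-order byte at the lowest
   address), 0 on a little-endian one. */
int is_big_endian(void)
{
    uint32_t v = 0x11223344u;      /* Byte3 = 0x11 ... Byte0 = 0x44 */
    unsigned char b[4];
    memcpy(b, &v, sizeof b);
    return b[0] == 0x11;           /* does the base address hold the high byte? */
}
```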
Instruction Pipelining
The conventional instruction execution by the processor follows the fetch-decode-execute
sequence, where the 'fetch' part fetches the instruction from program memory, the
'decode' part decodes the instruction, and the 'execute' stage reads the operands, performs
the ALU operation and stores the result.
In conventional program execution, the fetch, decode and execute operations are performed
in sequence. By pipelining, the processing speed can be increased.
Instruction pipelining refers to the overlapped execution of instructions. Under normal
program execution the pc will have the address of next instruction to execute, while the
decoding and execution of the current instruction is in progress. If the current instruction in
progress is a branch instruction like jump or call instruction, there is no meaning in fetching
the instruction following the current instruction. In such cases the instruction fetched is
flushed and a new instruction fetch is performed to fetch the instruction at the branch target
address.
Whenever the current instruction is executing the program counter will be loaded with the
address of the next instruction. In case of jump or branch instruction, the new location is
known only after completion of the jump or branch instruction. Depending on the stages
involved in an instruction, there can be multiple levels of instruction pipelining.
Figure below illustrates the concept of Instruction pipelining for single stage pipelining.
Figure 3.4: The concept of Instruction pipelining for single stage pipelining
- Advantages of PLDs
1) PLDs offer customer much more flexibility during the design cycle.
2) PLDs do not require long lead times for prototypes or production parts because PLDs are
already on a distributor’s shelf and ready for shipment.
3) PLDs can be reprogrammed even after a piece of equipment is shipped to a customer
A major drawback of using COTS components in embedded design is that the
manufacturer of the COTS component may withdraw the product or discontinue its
production at any time if a rapid change in technology occurs.
Advantages of COTS:
1) Ready to use
2) Easy to integrate
3) Reduces development time
Disadvantages of COTS:
1) No operational or manufacturing standard (all proprietary)
2) Vendor or manufacturer may discontinue production of a particular COTS product
Memory
Memory is an important part of a processor/controller based embedded system. Some
processors/controllers contain built-in memory, and this memory is referred to as on-chip
memory.
Others do not contain any memory inside the chip and require external memory to be
connected to the controller/processor to store the control algorithm. This is called off-chip
memory.
Also some working memory is required for holding data temporarily during certain
operations.
i. Masked ROM (MROM) - Masked ROM is a one-time programmable device. It makes use
of hardwired technology for storing data. The device is factory programmed by a masking
and metallisation process at the time of production itself, according to the data provided by
the end user.
The primary advantage of this is low cost for high volume production. They are the least
expensive type of solid state memory
ii. Programmable Read Only Memory (PROM) / (OTP)- Unlike Masked ROM Memory,
One Time Programmable Memory (OTP) or PROM is not pre-programmed by the
manufacturer. The end user is responsible for programming these devices. This memory has
nichrome or polysilicon wires arranged in a matrix. These wires can be functionally viewed
as fuses.
It is programmed by a PROM programmer which selectively burns the fuses according to the
bit pattern to be stored. Fuses which are not blown/burned represent a logic "1", whereas
fuses which are blown/burned represent a logic "0". The default state is logic "1".
iii. Erasable Programmable Read Only Memory (EPROM) - OTPs are not practical or
economical for development purposes.
During the development phase the code is subject to continuous changes and using an OTP
each time to load the code is not economical. Erasable Programmable Read Only Memory
(EPROM) gives the flexibility to re-program the same chip.
EPROM stores the bit information by charging the floating gate of an FET. Bit information is
stored by using an EPROM programmer, which applies high voltage to charge the floating
gate. EPROM contains a quartz crystal window for erasing the stored information. If the
window is exposed to ultraviolet rays for a fixed duration, the entire memory will be erased.
iv. Electrically Erasable Programmable Read Only Memory (EEPROM)- As the name
indicates, the information contained in the EEPROM memory can be altered by using
electrical signals at the register/Byte level. They can be erased and reprogrammed in-circuit.
These chips include a chip erase mode and in this mode they can be erased in a few
milliseconds. It provides greater flexibility for system design. The only limitation is that their
capacity is limited when compared with standard ROM (a few kilobytes).
v. FLASH - FLASH is the latest ROM technology and is the most popular ROM technology
used in today’s embedded designs. FLASH memory is a variation of EEPROM technology. It
combines the re-programmability of EEPROM and the high capacity of standard ROMs.
Figure 3.6
1. Static RAM (SRAM)- Static RAM stores data in the form of voltage. They are made up of
flip-flops. Static RAM is the fastest form of RAM available.
In a typical implementation, an SRAM cell (bit) is realised using six transistors (or 6
MOSFETs). Four of the transistors are used for building the latch (flip-flop) part of the
memory cell and two for controlling the access.
2. Dynamic RAM (DRAM) – Dynamic RAM stores data in the form of charge. They
are made up of MOS transistor gates.
The advantages of DRAM are its high density and low cost compared to SRAM.
The disadvantage is that since the information is stored as charge it gets leaked off with time
and to prevent this they need to be refreshed periodically.
Special circuits called DRAM controllers are used for the refreshing operation. The refresh
operation is done periodically in milli-seconds interval.
3. NVRAM- Non-volatile RAM is a random access memory with battery backup. It contains
static RAM based memory and a minute battery for providing supply to the memory in the
absence of external power supply.
The memory and battery are packed together in a single package. NVRAM is used for the
non-volatile storage of results of operations or for setting up of flags, etc.
Memory Shadowing
Generally the execution of a program or a configuration from a Read Only Memory (ROM)
is very slow (120 to 200 ns) compared to execution from a random access memory (40
to 70 ns). RAM access is roughly three times as fast as ROM access.
Computer systems contain a configuration-holding ROM called the Basic Input Output
System (BIOS) ROM. The system BIOS stores hardware configuration information like the
addresses assigned to the various serial ports, etc.
During system boot up the BIOS is read and the system is configured according to it, which
is time consuming.
To speed this up, manufacturers place a RAM behind the logical layer of the BIOS, at the
same address, as a shadow to the BIOS. The first step during boot up is copying the BIOS to
the shadow RAM, write protecting the RAM, and then disabling BIOS reads.
RAM is volatile and it cannot hold the configuration data which is copied from the BIOS
when the power supply is switched off. Only a ROM can hold it permanently. But for high
system performance it should be accessed from a RAM instead of accessing from a ROM.
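The shadowing sequence can be sketched as follows; the image size, contents and function names are placeholders for illustration only:

```c
#include <string.h>
#include <stdint.h>

#define BIOS_SIZE 64  /* illustrative size; real BIOS images are far larger */

/* Stand-ins for the physical ROM and its shadow RAM */
static const uint8_t bios_rom[BIOS_SIZE] = { 0xEA, 0x5B, 0xE0 };
static uint8_t shadow_ram[BIOS_SIZE];
static int ram_write_protected = 0;

/* One-time step during boot: copy the slow ROM image into fast RAM,
 * then write-protect the shadow so only reads reach it afterwards. */
void shadow_bios(void)
{
    memcpy(shadow_ram, bios_rom, BIOS_SIZE);
    ram_write_protected = 1;
}

/* All later configuration reads are served from the faster shadow RAM */
uint8_t bios_read(unsigned offset)
{
    return shadow_ram[offset];
}

/* Writes are silently refused once the shadow is protected */
void bios_write(unsigned offset, uint8_t value)
{
    if (!ram_write_protected)
        shadow_ram[offset] = value;
}
```

The key idea is that the non-volatile ROM remains the permanent store, while every run-time access after boot hits the faster RAM copy.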
The changes in system environment or variables are detected by the sensors connected to
the input port of the embedded system. If the embedded system is designed for any
controlling purpose, the system will produce some changes in the controlling variable to
bring the controlled variable to the desired value. It is achieved through an actuator
connected to the output port of the embedded system.
If the embedded system is designed for monitoring purpose only, then there is no need for
including an actuator in the system. For example, take the case of an ECG machine. It is
designed to monitor the heartbeat status of a patient and it cannot impose a control over the
patient’s heartbeat.
Sensors - A sensor is a transducer device that converts energy from one form to another for
any measurement or control purpose.
Interaction happens through the sensors and actuators connected to the input and output ports
respectively of the embedded system. The sensors may not be directly interfaced to the input
ports, instead they may be interfaced through signal conditioning and translating systems like
ADC, optocouplers, etc.
Light Emitting Diode (LED)- Light Emitting Diode (LED) is an important output device for
visual indication in any embedded system.
LED can be used as an indicator for the status of various signals or situations. Typical
examples are indicating the presence of power conditions like ‘Device ON’, ‘Battery low’ or
‘Charging of battery’ for battery operated handheld embedded devices.
Light Emitting Diode is a p-n junction diode and it contains an anode and a cathode. For
proper functioning of the LED, its anode should be connected to the +ve terminal of the
supply voltage and its cathode to the -ve terminal of the supply voltage.
The current flowing through the LED must be limited to a value below the maximum current
that it can conduct. A resistor is used in series between the power supply and the LED to
limit the current through the LED.
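The series resistor value follows from Ohm's law across the resistor, R = (Vsupply - Vforward) / Iled. A minimal sketch, with illustrative supply, forward-drop and current figures:

```c
/* Series resistor sizing for an LED.
 * v_supply  - supply voltage in volts
 * v_forward - LED forward voltage drop in volts
 * i_led     - desired LED current in amperes */
double led_series_resistor(double v_supply, double v_forward, double i_led)
{
    /* Ohm's law across the resistor: R = (Vs - Vf) / I */
    return (v_supply - v_forward) / i_led;
}
```

For example, a 5 V supply, a 2 V forward drop and a 10 mA target current give a resistor of about 300 ohms; the nearest standard value (e.g. 330 ohms) would be chosen in practice.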
Figure 3.9
7-Segment LED Display- The 7-segment LED display is an output device for displaying
alphanumeric characters. It contains 8 light-emitting diode (LED) segments arranged in a
special form. Out of the 8 LED segments, 7 are used for displaying alphanumeric characters
and 1 is used for representing the decimal point.
Below figure shows the arrangement of LED segments in a 7-segment LED display.
The LED segments are named A to G and the decimal point LED segment is named as DP.
The LED segments A to G and DP should be lit (ON) accordingly to display numbers and
characters.
For example, for displaying the number 4, the segments F, G, B and C are lit (ON). For
displaying 3, the segments A, B, C, D and G are lit. For displaying the character ‘d’, the
segments B, C, D, E and G are lit.
All these 8 LED segments need to be connected to one port of the processor/controller for
displaying alpha numeric digits.
In the common anode configuration, the anodes of the 8 segments are connected commonly
Figure 3.11
Whereas in the common cathode configuration, the 8 LED segments share a common
cathode line.
Figure 3.12
Based on the configuration of the 7-segment LED unit, the LED segment’s anode or cathode
is connected to the port of the processor/controller in the order ‘A’ segment to the least
significant port pin and DP segment to the most significant port pin.
The current flow through each of the LED segments should be limited to the maximum value
supported by the LED display unit. It can be limited by connecting a current limiting resistor
to the anode or cathode of each segment.
For common cathode configurations, the anode of each LED segment is connected to the port
pins of the port to which the display is interfaced.
The anode of the common anode LED display is connected to the 5V supply voltage through
a current limiting resistor and the cathode of each LED segment is connected to the
respective port pin lines.
For an LED segment to be lit in the common anode LED configuration, the port pin to which
the cathode of the LED segment is connected should be set at logic 0.
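The segment encodings described above can be captured in a lookup table. This sketch assumes the wiring described in the text (segment A on the least significant port pin, DP on the most significant); the identifiers are illustrative:

```c
#include <stdint.h>

/* Segment patterns for digits 0-9, common-cathode convention (1 = lit),
 * bit 0 = segment A ... bit 6 = segment G, bit 7 = DP. */
static const uint8_t seg_pattern[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66,   /* 0 1 2 3 4 */
    0x6D, 0x7D, 0x07, 0x7F, 0x6F    /* 5 6 7 8 9 */
};

/* For a common-anode display a segment lights when its port pin is at
 * logic 0, so the common-cathode pattern is simply complemented. */
uint8_t seg_common_anode(unsigned digit)
{
    return (uint8_t)~seg_pattern[digit];
}
```

Writing `seg_pattern[n]` (or its complement for common anode) to the display port shows the digit n; for instance the pattern for 4 lights exactly the segments F, G, B and C mentioned in the text.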
Stepper Motor - A stepper motor differs from the normal DC motor in its operation. The DC
motor produces continuous rotation on applying DC voltage, whereas a stepper motor
produces discrete rotation in response to the DC voltage applied to it.
Stepper motors are widely used in industrial embedded applications, consumer electronic
products and robotics control systems.
Based on the coil winding arrangements, a two-phase stepper motor is classified into two.
1. Unipolar
2. Bipolar
1. Unipolar-A unipolar stepper motor contains two windings per phase. The direction of
rotation (clockwise or anticlockwise) of a stepper motor is controlled by changing the
direction of current flow.
Current in one direction flows through one coil and in the opposite direction flows through
the other coil. It is easy to shift the direction of rotation by just switching the terminals to
which the coils are connected.
Below figure illustrates the working of a two-phase unipolar stepper motor.
Figure 3.13
The coils are represented as A, B, C and D. Coils A and C carry current in opposite directions
for phase 1. Similarly, B and D carry current in opposite directions for phase 2.
2. Bipolar- A bipolar stepper motor contains single winding per phase. For reversing the
motor rotation the current flow through the windings is reversed dynamically.
The stator winding details for a two-phase unipolar stepper motor are shown in the figure below.
Figure 3.14
The stepping of stepper motor can be implemented in different ways by changing the
sequence of activation of the stator windings.
The different stepping modes supported by stepper motor are explained below.
i. Full Step- In the full step mode both phases are energised simultaneously. The
coils A, B, C and D are energised in the following order (H = energised, L = de-energised):
Step A B C D
1 H H L L
2 L H H L
3 L L H H
4 H L L H
ii. Wave Step - In the wave step mode only one phase is energised at a time and each coil
of the phase is energised alternately:
Step A B C D
1 H L L L
2 L H L L
3 L L H L
4 L L L H
iii.Half Step - It uses the combination of wave and full step. It has the highest torque
and stability.
The following circuit diagram illustrates the interfacing of a stepper motor through a driver
circuit connected to the port pins of a microcontroller/processor.
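The wave-step sequence can be sketched as a lookup table of port values written to the driver circuit; the bit-to-coil assignment here is illustrative:

```c
#include <stdint.h>

/* Wave-step drive sketch: exactly one of the four coils A, B, C, D is
 * energised per step. Coils are assumed on port bits 0-3 (hypothetical
 * wiring; adapt to the actual driver circuit). */
static const uint8_t wave_sequence[4] = {
    0x01,  /* step 1: coil A  (H L L L) */
    0x02,  /* step 2: coil B  (L H L L) */
    0x04,  /* step 3: coil C  (L L H L) */
    0x08   /* step 4: coil D  (L L L H) */
};

/* Returns the port value for the n-th step. Stepping through
 * n = 0, 1, 2, ... rotates the motor one step at a time; iterating the
 * sequence in reverse reverses the direction of rotation. */
uint8_t wave_step(unsigned n)
{
    return wave_sequence[n % 4];
}
```

In a real driver the returned value would be written to the output port, with a short delay between steps to let the rotor settle.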
Matrix keyboard is an optimum solution for handling large key requirements. It greatly
reduces the number of interface connections. For example, for interfacing 16 keys, in the
direct interfacing technique 16 port pins are required, whereas in the matrix keyboard only 8
lines are required. The 16 keys are arranged in a 4 column x 4 row matrix.
One row is pulled low and the status of each column is read. After reading the column status
corresponding to that row, the row is pulled high, the next row is pulled low and the column
status is read again. This process is repeated until the scanning of all rows is completed.
When a row is pulled low and a key connected to that row is pressed, reading the column to
which the key is connected will give logic 0.
Since keys are mechanical devices, there is a possibility of de-bounce issues, which may
give a multiple key press effect for a single key press. To prevent this, a proper key de-
bouncing technique should be applied. Hardware key de-bouncer circuits and software key
de-bouncing techniques are the available options.
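The row-by-row scanning described above can be sketched as follows. Real port reads and writes are replaced by a key-state table so the scan logic can be followed without hardware; all names are illustrative:

```c
/* Matrix keyboard scan sketch for the 4 x 4 arrangement described
 * in the text. 1 in key_state means the key at that crossing is held. */
static int key_state[4][4];

/* Simulates reading a column line while 'row' is pulled low: a pressed
 * key at that crossing pulls its column line down to logic 0. */
static int read_column(int row, int col)
{
    return key_state[row][col] ? 0 : 1;
}

/* Pulls each row low in turn and reads back the columns; returns the
 * key number 0-15, or -1 if no key is pressed. A real driver would
 * re-read after a short delay to de-bounce the mechanical contacts. */
int scan_keypad(void)
{
    int row, col;
    for (row = 0; row < 4; row++)        /* this row is pulled low */
        for (col = 0; col < 4; col++)
            if (read_column(row, col) == 0)
                return row * 4 + col;
    return -1;
}
```

Only 8 lines (4 row outputs plus 4 column inputs) are exercised by this scan, which is the saving over direct interfacing noted above.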
Module 4
Text book 2: Chapter-3, Chapter-4, Chapter-7 (Sections 7.1, 7.2 only), Chapter-9
(Sections 9.1, 9.2, 9.3.1, 9.3.2 only)
Certain embedded systems are part of a larger system and thus form components of a
distributed system.
These components are independent of each other but have to work together for the
larger system to function properly.
Ex. A car has many embedded systems connected to its dashboard. Each one is an
independent embedded system yet the entire car can be said to function properly only
if all the systems work together.
Ex. Currently available cell phones. The cell phones that have the maximum features
are popular, but their size and weight are also important characteristics.
6. Power concerns
It is desirable that the power utilization and heat dissipation of any embedded system
be low.
If more heat is dissipated then additional units like heat sinks or cooling fans need to
be added to the circuit.
If more power is required then a battery of higher power or more batteries need to be
accommodated in the embedded system
Quality attributes are the non-functional requirements that need to be documented properly
in any system design.
The quality attributes of any embedded system are classified into two, namely operational
quality attributes and non-operational quality attributes.
The operational quality attributes represent the relevant quality attributes related to the
embedded system when it is in the operational mode or online mode.
a. Response
b. Throughput
c. Reliability
d. Maintainability
e. Security
f. Safety
a) Response
Response is a measure of quickness of the system.
It gives you an idea about how fast your system is tracking the input variables.
Most embedded systems demand a fast response which should be real-time.
For example, an embedded system deployed in flight control application should
respond in a Real Time manner. Any response delay in the system will create
potential damages to the safety of the flight as well as the passengers.
b) Throughput
Throughput deals with the efficiency of a system.
It can be defined as the rate of production or operation of a defined process over a
stated period of time.
The rates can be expressed in terms of units of products, batches produced, or any
other meaningful measurements.
In the case of a Card Reader, throughput means how many transactions the Reader
can perform in a minute or in an hour or in a day.
Throughput is generally measured in terms of ‘Benchmark’. A ‘Benchmark’ is
a reference point by which something can be measured.
c) Reliability
Reliability is a measure of the degree to which you can rely upon the proper
functioning of the system.
Mean Time between failures (MTBF) and Mean Time To Repair (MTTR) are
terms used in defining system reliability.
Mean Time between failures can be defined as the average time the system
is functioning before a failure occurs.
Mean time to repair can be defined as the average time the system has spent in repairs.
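MTBF and MTTR are commonly combined into an availability figure, the percentage of time the system is operational. This is a standard derived metric, not one the notes define, so treat it as a supplementary sketch:

```c
/* Availability: the fraction of total time the system is functioning.
 * Availability % = 100 * MTBF / (MTBF + MTTR) */
double availability_percent(double mtbf_hours, double mttr_hours)
{
    return 100.0 * mtbf_hours / (mtbf_hours + mttr_hours);
}
```

For example, a system that runs 99 hours between failures and takes 1 hour to repair is available 99% of the time.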
d) Maintainability
Maintainability deals with support and maintenance to the end user or a client in
case of technical issues and product failures or on the basis of a routine system
check-up.
e) Security
Confidentiality, Integrity and Availability are three corner stones of information
security. Confidentiality deals with the protection of data from unauthorized
disclosure.
f) Safety
Safety deals with the possible damage that can happen to the operating person and
environment due to the breakdown of an embedded system or due to the emission of
hazardous materials from the embedded products.
The quality attributes that need to be addressed not on the basis of operational aspects are
grouped under this category.
b) Evolvability
For an embedded system, the quality attribute “Evolvability” refers to the ease with
which the embedded product can be modified to take advantage of new firmware or
hardware technology.
c) Portability
Portability is a measure of system independence.
An embedded product can be called portable if it is capable of performing its
operations as intended in various environments, irrespective of different
processors and/or controllers and embedded operating systems.
Embedded systems are highly specialised in functioning and are dedicated for a specific
application.
The actuator part of the washing machine consists of a motorised agitator, tumble tub, water
drawing pump and inlet valve to control the flow of water into the unit.
The sensor part consists of the water temperature sensor, level sensor, etc.
The control part contains a microprocessor/ controller based board with interfaces to the
sensors and actuators.
The sensor data is fed back to the control unit and the control unit generates the necessary
actuator outputs.
The control unit also provides connectivity to user interfaces like keypad for setting the
washing time, selecting the type of material to be washed like light, medium, heavy duty, etc.
User feedback is reflected through the display unit and LEDs connected to the control board.
i. Water inlet control valve - Near the water inlet point of the washing machine there is a
water inlet control valve. When you load the clothes in the washing machine, this valve
opens automatically, and it closes automatically depending on the total quantity of water
required.
Dept. of CSE, HKBKCE 85 2019-20
18CS44 Microcontrollers and Embedded Systems
ii. Water pump: The water pump circulates water through the washing machine. It works in
two directions, re-circulating the water during wash cycle and draining the water during the
spin cycle.
iii. Tub: There are two types of tubs in the washing machine: inner and outer. The clothes
are loaded in the inner tub, where the clothes are washed, rinsed and dried. The inner tub has
small holes for draining the water. The external tub covers the inner tub and supports it
during various cycles of clothes washing.
iv. Agitator or rotating disc: The agitator is located inside the tub of the washing machine.
It is the important part of the washing machine that actually performs the cleaning operation
of the clothes.
During the wash cycle the agitator rotates continuously and produces strong rotating currents
within the water due to which the clothes also rotate inside the tub. The rotation of the clothes
within water containing the detergent enables the removal of the dirt particles from the fabric
of the clothes.
In some washing machines, instead of the long agitator, there is a disc that contains blades on
its upper side. The rotation of the disc and the blades produce strong currents within the water
and the rubbing of clothes that helps in removing the dirt from clothes.
v. Motor of the washing machine: The motor is coupled to the agitator or the disc and
produces its rotary motion. These are multispeed motors whose speed can be changed as per
the requirement. In the fully automatic washing machine the speed of the motor, i.e. of the
agitator, changes automatically as per the load on the washing machine.
vi. Timer: The timer helps setting the wash time for the clothes manually. In the automatic
mode the time is set automatically depending upon the number of clothes inside the washing
machine.
vii. Printed circuit board (PCB): The PCB comprises the various electronic components
and circuits, which are programmed to perform in unique ways depending on the load
conditions (the condition and the amount of clothes loaded in the washing machine). They are
sort of artificial intelligence devices that sense the various external conditions and take the
decisions accordingly. These are also called as fuzzy logic systems. Thus the PCB will
calculate the total weight of the clothes, and find out the quantity of water and detergent
required, and the total time required for washing the clothes. Then they will decide the time
required for washing and rinsing. The entire processing is done on a kind of processor which
may be a microprocessor or microcontroller.
viii. Drain pipe: The drain pipe enables removing the dirty water from the washing machine
after it has been used for the washing purpose.
The presence of automotive embedded system in a vehicle varies from simple mirror and
wiper controls to complex air bag controller and antilock brake systems (ABS).
Automotive embedded systems are normally built around microcontrollers or DSPs and are
generally known as Electronic Control Units (ECUs).
The first embedded system used in an automotive application was the microprocessor based
fuel injection system introduced in the Volkswagen 1600 in 1968.
The various types of electronic control units (ECUs) used in the automotive embedded
industry are:
1. High-speed embedded control units
2. Low-speed embedded control units.
2. Low-speed Electronic Control Units (LECUs) - Low-speed Electronic Control Units
(LECUs) are deployed in applications where response time is not so critical. They are
generally built around low-cost microprocessors/microcontrollers and digital signal processors. Audio
controllers, passenger and driver door locks, door glass controls (power windows), wiper
control, mirror control, seat control systems etc. are examples of LECUs.
Different types of serial interface buses deployed in automotive embedded applications are
1. Controller Area Network (CAN) - The CAN bus was originally proposed by Robert
Bosch. It supports medium speed with data rates up to 125 Kbps and high speed with data
rates up to 1Mbps data transfer.
CAN is an event-driven protocol interface with support for error handling in data
transmission. It is generally employed in safety system like airbag control; power train
systems like engine control and Antilock Brake System (ABS); and navigation systems like
GPS etc.
2. Local Interconnect Network (LIN) - LIN bus is a single master multiple slave (up to 16
independent slave nodes) communication interface. LIN is a low speed, single wire
communication interface with support for data rates up to 20 Kbps and is used for
sensor/actuator interfacing.
LIN bus is employed in applications like mirror controls, fan controls, seat positioning
controls, window controls and position controls, where response time is not a critical issue.
3. Media Oriented System Transport (MOST) - The MOST bus specifications define the
physical layer as well as the application layer, network layer, and media access control.
Physically, the MOST bus is an optical fibre cable connected between an Electrical Optical
Converter (EOC) and an Optical Electrical Converter (OEC), which translate the electrical
signals into optical signals and back.
2. Selecting the Architecture - A model only captures the system characteristics and does
not provide information on how the system can be manufactured.
The architecture specifies how a system is going to be implemented in terms of the number
and different types of components and the interconnections among them.
i. The controller architecture - implements the finite state machine model using a state
register and two combinational circuits. The state register holds the present state and the
combinational circuits implement the logic for the next state and output.
ii. The datapath architecture - is best suited for implementing the data flow graph model
where the output is generated as a result of a set of predefined computations on the input data.
A datapath represents a channel between the input and output and in datapath architecture the
datapath may contain registers, counters, register files, memories and ports along with high
speed arithmetic units.
iii. The Finite State Machine Datapath (FSMD) – this architecture combines the controller
architecture with datapath architecture. It implements a controller with datapath.
The controller generates the control input whereas the datapath processes the data. The
datapath contains two types of I/O ports, out of which one acts as the control port for
receiving/sending the control signals from/to the controller unit and the second I/O port
interfaces the datapath with external world for data input and data output.
iv. The Complex Instruction Set Computing (CISC) - architecture uses an instruction set
representing complex operations. It is possible for a CISC instruction set to perform a large
complex operation with a single instruction.
The use of a single complex instruction in place of multiple simple instructions greatly
reduces the program memory access and program memory size requirement.
v. The Very Long Instruction Word (VLIW) - architecture implements multiple functional
units (ALUs, multipliers, etc.) in the datapath. The VLIW instruction packages one standard
instruction per functional unit of the datapath.
Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD)
architectures are examples for parallel processing architecture.
In SIMD architecture, a single instruction is executed in parallel with the help of the
Processing Elements. On the other hand, the processing elements of the MIMD architecture
execute different instructions at a given point of time.
A model can be captured using multiple programming languages like C, C++, C#, Java, etc.
for software implementations, and languages like VHDL, System C, Verilog, etc. for
hardware implementations.
On the other hand, a single language can be used for capturing a variety of models. Certain
languages are good in capturing certain computational model. For example, C++ is a good
candidate for capturing an object oriented model.
In Data Flow Graph (DFG) model the operation on the data (process) is represented using a
block (circle) and data flow is represented using arrows. An inward arrow to the process
(circle) represents input data and outward arrow from the process (circle) represents
output data.
Now let’s have a look at the implementation of a DFG. Suppose one of the functions in our
application contains the computational requirement x = a + b; and y = x - c.
Below figure illustrates the implementation of a DFG model for implementing these
requirements.
In a DFG model, a data path is the data flow path from input to output. A DFG model is said
to be acyclic DFG (ADFG) if it doesn’t contain multiple values for the input variable and
multiple output values for a given set of input(s).
Feedback inputs (Output is fed back to Input), events, etc. are examples for non-acyclic inputs.
The CDFG uses Data Flow Graph (DFG) as element and conditional (constructs) as decision
makers. CDFG contains both data flow nodes and decision nodes, whereas DFG contains
only data flow nodes.
Let us have a look at the implementation of the CDFG for the following requirement.
If flag = 1, then x = a + b; else y = a - b. This requirement contains a decision making process.
The control node is represented by a ‘Diamond’ block, which is the decision making
element. The decision on which process is to be executed is determined by the control node.
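Written out in C, the same requirement is a single conditional; the 'flag' test plays the role of the diamond-shaped control node and the two assignments are the data flow nodes it selects between (the function name is illustrative):

```c
/* The CDFG requirement as C: if flag = 1 then x = a + b, else y = a - b.
 * Results are returned through pointers so both outputs can be observed. */
void cdfg_example(int flag, int a, int b, int *x, int *y)
{
    if (flag == 1)        /* control node (the 'Diamond' block) */
        *x = a + b;       /* data flow node taken when flag == 1 */
    else
        *y = a - b;       /* data flow node taken otherwise */
}
```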
The State Machine model describes the system behavior with ‘States’, ‘Events’, ‘Actions’
and ‘Transitions’.
State is a representation of a current situation. An event is an input to the state. The event
acts as stimuli for state transition. Transition is the movement from one state to another.
Action is an activity to be performed by the state machine.
A Finite State Machine (FSM) model is one in which the number of states is finite. As an
example let us consider the design of an embedded system for driver/passenger ‘Seat Belt
Warning’ in an automotive using the FSM model.
1. When the vehicle ignition is turned on and the seat belt is not fastened within 10 seconds
of ignition ON, the system generates an alarm signal for 5 seconds.
2. The Alarm is turned off when the alarm time (5 seconds) expires or if the
driver/passenger fastens the belt or if the ignition switch is turned off, whichever happens
first.
Here the states are ‘Alarm Off’, ‘Waiting’ and ‘Alarm On’ and the events are ‘Ignition Key
ON’, ‘Ignition Key OFF’, ‘Timer Expire’, ‘Alarm Time Expire’ and ‘Seat Belt ON’.
Using the FSM, the system requirements can be modelled as shown in the above figure.
The ‘Ignition Key ON’ event triggers the 10 second timer and transitions the state to ‘Waiting’.
If a ‘Seat Belt ON’ or ‘Ignition Key OFF’ event occurs during the wait state, the state
transitions into ‘Alarm Off’.
When the wait timer expires in the waiting state, the event ‘Timer Expire’ is generated and it
transitions the state to ‘Alarm On’ from the ‘Waiting’ state.
The ‘Alarm On’ state continues until a ‘Seat Belt ON’ or ‘Ignition Key OFF’ event or ‘Alarm
Time Expire’ event, whichever occurs first. The occurrence of any of these events transitions
the state to ‘Alarm Off’.
The wait state is implemented using a timer. The timer also has a certain set of states and
events for state transitions. Using the FSM model, the timer can be modelled as shown in the
figure below.
The timer state can be either ‘IDLE’ or ‘READY’ or ‘RUNNING’. During the normal
condition when the timer is not running, it is said to be in the ‘IDLE’ state.
The timer is said to be in the ‘READY’ state when the timer is loaded with the count
corresponding to the required time delay.
The timer remains in the ‘READY’ state until a ‘Start Timer’ event occurs. The timer
changes its state to ‘RUNNING’ from the ‘READY’ state on receiving a ‘Start Timer’ event
and remains in the ‘RUNNING’ state until the timer count expires or a ‘Stop Timer’ event
occurs.
The timer state changes to ‘IDLE’ from ‘RUNNING’ on receiving a ‘Stop Timer’ or ‘Timer
Expire’ event.
Example 1
Design an automatic tea/coffee vending machine based on FSM model for the following
requirement.
The tea/coffee vending is initiated by the user inserting a 5 rupee coin. After inserting the
coin, the user can either select ‘Coffee’ or ‘Tea’ or press ‘Cancel’ to cancel the order
and take back the coin.
The FSM representation for the above requirement is shown in the below figure.
The event ‘Insert Coin’ (5 rupee coin insertion), transitions the state to ‘Wait for User Input’.
The system stays in this state until a user input is received from the buttons ‘Cancel’, ‘Tea’ or
‘Coffee’.
If the event triggered in the ‘Wait State’ is the ‘Cancel’ button press, the coin is pushed out
and the state transitions to ‘Wait for Coin’.
If the event received in the ‘Wait State’ is either the ‘Tea’ button press or the ‘Coffee’ button
press, the state changes to ‘Dispense Tea’ or ‘Dispense Coffee’ respectively.
Once the coffee/tea vending is over, the respective states transitions back to the ‘Wait for
Coin’ state.
A few modifications can be added to enhance this design: adding a timeout for the ‘Wait
State’ (currently the ‘Wait State’ is infinite; it can be re-designed as a timeout based ‘Wait
State’ where, if no user input is received within the timeout period, the coin is returned and
the state automatically transitions to ‘Wait for Coin’ on the timeout event), and capturing
other events like ‘Water not available’ and ‘Tea/Coffee Mix not available’ with a transition
to an ‘Error State’.
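The vending machine FSM can be sketched as a state transition function in the same style; the identifiers are illustrative, and the timeout and error-state enhancements are left out for brevity:

```c
/* Tea/coffee vending FSM from Example 1: three logical stages, with
 * 'Dispense' split per drink. EV_DONE stands for vending completion. */
typedef enum { WAIT_FOR_COIN, WAIT_FOR_USER_INPUT,
               DISPENSE_TEA, DISPENSE_COFFEE } vend_state_t;
typedef enum { EV_INSERT_COIN, EV_CANCEL, EV_TEA,
               EV_COFFEE, EV_DONE } vend_event_t;

vend_state_t vend_next_state(vend_state_t s, vend_event_t e)
{
    switch (s) {
    case WAIT_FOR_COIN:
        return (e == EV_INSERT_COIN) ? WAIT_FOR_USER_INPUT : s;
    case WAIT_FOR_USER_INPUT:
        if (e == EV_CANCEL) return WAIT_FOR_COIN;   /* coin pushed out */
        if (e == EV_TEA)    return DISPENSE_TEA;
        if (e == EV_COFFEE) return DISPENSE_COFFEE;
        return s;
    case DISPENSE_TEA:
    case DISPENSE_COFFEE:
        /* once vending is over, back to waiting for the next coin */
        return (e == EV_DONE) ? WAIT_FOR_COIN : s;
    }
    return s;
}
```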
Example 2
Design a coin operated public telephone unit based on FSM model for the following
requirements.
1. The calling process is initiated by lifting the receiver (off-hook) of the telephone unit.
2. After lifting the phone the user needs to insert a 1 rupee coin to make the call.
3. If the line is busy, the coin is returned on placing the receiver back on the hook.
4. If the line is through, the user is allowed to talk for up to 60 seconds; at the end of
45 seconds, a prompt for inserting another 1 rupee coin to continue the call is
initiated.
5. If the user doesn’t insert another 1 rupee coin, the call is terminated on completing
the 60 seconds time slot.
6. The system is ready to accept new call request when the receiver is placed back on
the hook.
7. The system goes to the ‘Out of Order’ state when there is a line fault.
The FSM representation for the above requirement is shown in the below figure.
Sequential Program Model - In the sequential program model, the program instructions are
iterated and executed conditionally and the data gets transformed through a series of
operations. The important tool used for modelling a sequential program is the Flow Chart.
Example sequential program model for the ‘Seat Belt Warning’ system is illustrated below.
#define ON 1
#define OFF 0
#define YES 1
#define NO 0
void seat_belt_warn()
{
    wait_10sec();                        /* wait 10 seconds after ignition ON */
    if (check_ignition_key() == ON)
    {
        if (check_seat_belt() == OFF)
        {
            set_timer(5);                /* 5 second alarm timer */
            start_alarm();
            /* alarm stays on while the belt is still OFF, the ignition
               is still ON and the 5 second timer has not expired */
            while ((check_seat_belt() == OFF) && (check_ignition_key() == ON)
                   && (timer_expire() == NO));
            stop_alarm();
        }
    }
}
Figure: Concurrent processing Program model for Seat Belt Warning System
• We have five tasks here and we cannot execute them sequentially or randomly. We need
to synchronize their execution through some mechanism.
• We need to start the alarm only after the expiration of the 10 second wait timer, and that
too only if the seat belt is OFF and the ignition key is ON.
• Hence the alarm control task is executed only when the wait timer is expired and if the
ignition key is in the ON state and seat belt is in the OFF state.
• Here we will use events to indicate these scenarios.
6. Object-Oriented Model
The object-oriented model is an object-based model for modelling system requirements.
A class represents the state of an object through its member variables and the object's
behaviour through its member functions. The member variables and member functions of a
class can be private, public or protected. Private member variables and functions are
accessible only within the class, whereas public variables and functions are accessible both
within and outside the class. Protected variables and functions are accessible within the class
and its derived classes, but are protected from other external access.
Two basic approaches are used for embedded firmware design. They are
1. Conventional Procedural Based Firmware Design
2. Embedded Operating System (OS) Based Design
The Super Loop based firmware development approach is adopted for applications that are
not time critical and where the response time is not so important (embedded systems
where missing deadlines are acceptable).
It is very similar to a conventional procedural programming where the code is executed task
by task. The task listed at the top of the program code is executed first and the tasks just
below the top are executed after completing the first task.
Visualise the operational sequence listed above in terms of a ‘C’ program code as
void main()
{
Configuration();
Initialization();
while (1)
{
Task 1;
Task 2;
.
.
.
Task n;
}
}
Almost all tasks in the above example are non-ending and are repeated infinitely throughout
the operation. From the above ‘C’ code you can see that tasks 1 to n are performed one after
another, and when the last task (task n) completes, execution is redirected back to Task 1 and
repeated forever. This repetition is achieved by the infinite while (1) { } loop, which is why
this approach is also referred to as the ‘Super Loop based Approach’.
Since the tasks are running inside an infinite loop, the only way to come out of the loop is
either a hardware reset or an interrupt assertion. A hardware reset brings the program
execution back to the main loop, whereas an interrupt request suspends the task execution
temporarily, executes the corresponding interrupt routine and, on its completion, resumes
the task execution from the point where it was interrupted.
The ‘Super loop based design’ doesn’t require an operating system, since there is no need for
scheduling which task is to be executed or for assigning a priority to each task. Here the
priorities are fixed, and so is the order in which the tasks are executed.
This type of design is deployed in low-cost embedded products and in products where
response time is not critical. For example, reading/writing data to and from a card using a
card reader requires a sequence of operations such as checking for the presence of the card,
authenticating the operation, reading/writing, etc.
The major drawback of this approach is that any failure in any part of a single task will
affect the total system. If the program hangs up at some point while executing a task, it will
remain there forever and ultimately the product stops functioning.
A Watchdog Timer (WDT) helps in coming out of the loop when an unexpected failure
occurs or when the processor hangs up.
The General Purpose OS (GPOS) based design is very similar to a conventional PC based
application development where the device contains an operating system (Windows/Unix/
Linux, etc. for Desktop PCs) and you will be creating and running user applications on top of
it.
Real Time Operating System (RTOS) based design approach is employed in embedded
products demanding real-time response. An RTOS responds in a timely and predictable
manner to events. A Real-Time Operating System contains a real-time kernel responsible for
performing pre-emptive multitasking, a scheduler for scheduling tasks, multiple threads, etc.
• Similar to ‘C’ and other high level language programming, you can have
multiple source files called modules.
• Each module is represented by an ‘.asm’ or ‘.src’ file similar to the ‘.c’ files in C
programming. This approach is called modular programming
i. Library File Creation and Usage - Libraries are specially formatted, ordered program
collections of object modules that may be used by the linker at absolute object file creation.
When the linker processes a library only those object modules in the library that are necessary
to create the program are used. Library files are generated with extension ‘.lib’.
ii. Linker and Locater - Linker and Locater is another software utility responsible for
linking the various object modules in a multi-module project and assigning absolute address
to each module. Linker generates an absolute object module by extracting the object modules
from the library.
iii. Object to Hex File Converter - This is the final stage in the conversion of Assembly
language to machine understandable language (machine code).
Hex File is the representation of the machine code and the hex file is dumped into the code
memory of the processor/controller.
Hex file is created from the final ‘Absolute Object File’ using the Object to Hex File
Converter utility.
• The most commonly used high level language for embedded firmware application development
is ‘C’.
• C is a well-defined, easy to use high level language with extensive cross-platform
development tool support. Nowadays cross compilers for C++ are also emerging.
• The high level language based development approach is the same as the assembly language
based approach, except that the conversion of the source file written in the high level language
to an object file is done by a cross compiler.
• The various steps involved in the conversion of a program is illustrated in figure.
Mixing Assembly language with High level language ( Assembly language with C)
• Assembly routines are mixed with ‘C’ in situations where a program written in ‘C’
cannot meet a requirement on its own. For example, when accessing certain low-level
hardware, the timing specifications may be very critical, and cross-compiler generated
code may not be able to offer the required performance.
• Writing an assembly routine and invoking it from C is the most advised method to
handle such situations.
• Passing parameters from C to an assembly routine, returning values from assembly to
C, and the method of invoking an assembly routine from C are cross compiler
dependent. There is no written rule for this.
• This information can be obtained from the documentation of the cross compiler you are
using.
Mixing of High level language with assembly language.( C with assembly language)
• Mixing the code written in a high level language like C and assembly language is
useful in the following scenarios:
1. The source code is already available in assembly language and a routine
written in high level language like ‘C’ needs to be included to the existing
code.
2. The entire source code is planned in assembly code for various reasons like
optimized code, optimal performance etc.. But some portions of the code may
be very difficult and tedious to code in assembly.
3. To include built in library functions written in ‘C’ language provided by the
cross compiler.
• When mixing assembly and C, the major questions that need to be addressed are:
• How are parameters passed from C to the assembly routine?
• How are values returned from assembly to C?
• How is the C routine invoked from assembly code?
• Passing parameters to C, returning values from a C function, and the method of
invoking a C function are cross compiler dependent. There is no written rule for this.
• This information can be obtained from the documentation of the cross compiler you are
using.
Inline assembly
• Inline assembly is another technique for inserting target processor specific assembly
instructions at any location of the source code written in a high level language like ‘C’.
• This avoids the overhead of calling an assembly routine from the ‘C’ code.
• Special keywords are used to indicate the start and end of the assembly instructions;
the keywords are cross compiler specific.
• C51 uses keywords #pragma asm and #pragma endasm to indicate a block of code
written in assembly.
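As an illustration, a hypothetical C51 fragment (Keil-specific, not portable C; the port and value are made up for the example, and the file must be built with the Keil toolchain's assembler pass enabled):

```c
void port_init(void)
{
    #pragma asm
        MOV  P1, #0FFH    ; illustrative: drive all Port-1 lines high
    #pragma endasm
}
```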
PROGRAMMING IN EMBEDDED C
C vs Embedded C

C:
1. C is a well-structured, well-defined and standardised language.
2. It is a general purpose programming language.
3. C is generally used for desktop computer applications.
4. A compiler is used for the conversion of programs written in ‘C’ to binary code.
5. The C language is not hardware dependent.

Embedded C:
1. Embedded ‘C’ can be considered a subset of the ‘C’ language; it supports all ‘C’
instructions and incorporates a few target processor specific functions/instructions.
2. It is a specific purpose programming language.
3. Embedded C is generally used for microcontroller based applications.
4. A cross compiler is used for the conversion of programs written in ‘Embedded C’ to
binary code.
5. Embedded C is hardware (target processor) dependent.
Compiler vs Cross-Compiler

Compiler:
1. A compiler is a software tool that converts source code written in a high level language
(e.g. C) to machine level language.
2. Compilers are used in platform specific development applications.
3. Compilers are generally termed ‘Native Compilers’. A native compiler generates machine
code for the same machine (processor) on which it is running.
4. Examples – GCC (GNU Compiler Collection), Borland Turbo C, Intel C++ Compiler.

Cross-Compiler:
1. A cross compiler is a software tool that converts source code written in Embedded C to
machine level language.
2. Cross compilers are used in cross-platform development applications.
3. A cross compiler generates machine code for a machine (processor) different from the one
on which it is running.
4. Example – Keil Compiler (Keil C51).
Module 5
RTOS and IDE for Embedded System Design
Operating System Basics
The operating system acts as a bridge between the user applications/tasks and the
underlying system resources through a set of system functionalities and services.
The OS manages the system resources and makes them available to the user
applications/tasks on a need basis.
A normal computing system is a collection of different I/O subsystems and of working and
storage memory.
The primary functions of an operating system are to
Make the system convenient to use
Organise and manage the system resources efficiently and correctly
Figure 10.1 gives an insight into the basic components of an operating system and
their interfaces with rest of the world.
The Kernel
The kernel is the core of the operating system and is responsible for managing the
system resources and the communication among the hardware and other system
services.
The kernel acts as the abstraction layer between system resources and user applications.
Kernel contains a set of system libraries and services.
For a general purpose OS, the kernel contains different services for handling the
following.
Primary Memory Management
The term primary memory refers to the volatile memory (RAM) where processes are
loaded and variables and shared data associated with each process are stored.
The Memory Management Unit (MMU) of the kernel is responsible for
Keeping track of which part of the memory area is currently used by which process
Allocating and De-allocating memory space on a need basis (Dynamic memory
allocation).
The service ‘Device Manager’ (Name may vary across different OS kernels) of the
kernel is responsible for handling all I/O device related operations.
The kernel talks to the I/O devices through a set of low-level system calls, which are
implemented in services called device drivers.
The device drivers are specific to a device or a class of devices.
The Device Manager is responsible for
Loading and unloading of device drivers
Exchanging information and the system specific control signals to and
from the device
The secondary storage management deals with managing the secondary storage
memory devices, if any, connected to the system.
Secondary memory is used as backup medium for programs and data since the main
memory is volatile.
In most of the systems, the secondary storage is kept in disks (Hard Disk).
The secondary storage management service of kernel deals with
Disk storage allocation
Disk scheduling (Time interval at which the disk is activated to backup data)
Free Disk space management
Protection Systems
Most of the modern operating systems are designed in such a way to support multiple
users with different levels of access permissions (e.g. Windows XP with user
permissions like
‘Administrator’, ‘Standard’, ‘Restricted’, etc.). Protection deals with implementing
the security policies to restrict the access to both user and system resources by
different applications or processes or users.
In multiuser supported operating systems, one user may not be allowed to view or
modify the whole/portions of another user’s data or profile details.
In addition, some application may not be granted with permission to make use of
some of the system resources.
This kind of protection is provided by the protection services running within the
kernel.
Kernel Space and User Space
The applications/services are classified into two categories, namely user applications
and kernel applications.
The program code corresponding to the kernel applications/services is kept in a
contiguous area (OS dependent) of primary (working) memory and is protected from
unauthorised access by user programs/applications.
The memory space at which the kernel code is located is known as ‘Kernel Space’.
Similarly, all user applications are loaded to a specific area of primary memory and
this memory area is referred as ‘User Space’.
User space is the memory area where user applications are loaded and executed.
The partitioning of memory into kernel and user space is purely operating system
dependent.
Some OSes implement this kind of partitioning and protection, whereas others do not
segregate the kernel and user application code storage into two separate areas.
In an operating system with virtual memory support, the user applications are loaded
into their corresponding virtual memory space with the demand paging technique; meaning,
the entire code for a user application need not be loaded into the main (primary)
memory at once; instead, the user application code is split into different pages and
these pages are loaded into and out of the main memory area on a need basis.
The act of loading the code into and out of the main memory is termed as ‘Swapping’.
Swapping happens between the main (primary) memory and secondary storage
memory.
Each process runs in its own virtual memory space and is not allowed to access the
memory space of another process, unless it explicitly requests it.
Each process has certain privilege levels for accessing the memory of other processes,
and based on the privilege settings, a process can request the kernel to map another
process’s memory to its own or share it through some other mechanism.
Most operating systems keep the kernel application code in main memory; it is not
swapped out into the secondary memory.
In the monolithic kernel architecture, all kernel services run in the kernel space.
Here all kernel modules run within the same memory space under a single kernel
thread. The tight internal integration of kernel modules in the monolithic kernel
architecture allows effective utilisation of the low-level features of the underlying
system.
The major drawback of a monolithic kernel is that any error or failure in any one of the
kernel modules leads to the crashing of the entire kernel application.
Linux, Solaris and MS-DOS kernels are examples of monolithic kernels.
The architecture representation of a monolithic kernel is given in Fig. 10.2.
Microkernel
The microkernel design incorporates only the essential set of Operating System
services into the kernel.
The rest of the operating system services are implemented in programs known as
‘Servers’, which run in user space.
This provides a highly modular design and an OS-neutral abstraction to the kernel.
Memory management, process management, timer systems and interrupt handlers are
the essential services which form part of the microkernel.
Mach, QNX and Minix 3 kernels are examples of microkernels.
The architecture representation of a microkernel is shown in Fig. 10.3.
Microkernel based design approach offers the following benefits
Robustness:
If a service crashes, only the corresponding ‘Server’ application needs to be restarted;
the kernel and the other services are unaffected.
Configurability:
Any service which runs as a ‘Server’ application can be changed without the need to
restart the whole system. This makes the system dynamically configurable.
The operating systems deployed in general computing systems are referred to as
General Purpose Operating Systems (GPOS).
The kernel of such an OS is more generalised and it contains all kinds of services
required for executing generic applications.
General-purpose operating systems are often quite non-deterministic in behaviour.
Their services can inject random delays into application software and may cause slow
responsiveness of an application at unexpected times.
GPOS are usually deployed in computing systems where deterministic behaviour is
not an important criterion.
A Personal Computer/Desktop system is a typical example of a system where a GPOS
is deployed.
Windows XP, MS-DOS, etc. are examples of General Purpose Operating Systems.
Task/Process management
Task/Process scheduling
Task/Process synchronisation
Error/Exception handling
Memory management
Interrupt handling
Time management
Task/Process management
Deals with setting up the memory space for the tasks, loading the task’s code into the
memory space, allocating system resources, setting up a Task Control Block (TCB)
for the task and task/process termination/deletion.
A Task Control Block (TCB) is used for holding the information corresponding to a
task.
TCB usually contains the following set of information.
Task ID: Task Identification Number
Task State: The current state of the task (e.g. State = ‘Ready’ for a task
which is ready to execute)
Task Type: Indicates the type of the task. The task can be a hard real-time,
soft real-time or background task.
Task Priority: Task priority (e.g. Task priority = 1 for a task with priority 1)
Task Context Pointer: Context pointer-Pointer for context saving
Task Memory Pointers: Pointers to the code memory, data memory
and stack memory for the task
Task System Resource Pointers: Pointers to system resources
(semaphores, mutex, etc.) used by the task
Task Pointers: Pointers to other TCBs (TCBs for preceding, next and
waiting tasks)
Other Parameters: Other relevant task parameters
Task/Process Scheduling
Task/Process Synchronisation
Deals with synchronising the concurrent access of a resource, which is shared across
multiple tasks and the communication between various tasks.
Error/Exception Handling
Deals with registering and handling the errors that occur and the exceptions that are
raised during the execution of tasks.
Insufficient memory, timeouts, deadlocks, deadline missing, bus error, divide by zero,
unknown instruction execution, etc. are examples of errors/exceptions.
Errors/Exceptions can happen at the kernel level services or at task level.
Deadlock is an example for kernel level exception, whereas timeout is an example for
a task level exception.
The OS kernel gives the information about the error in the form of a system call
(API).
GetLastError() API provided by Windows CE RTOS is an example for such a system
call.
Watchdog timer is a mechanism for handling the timeouts for tasks.
Certain tasks may involve the waiting of external events from devices.
These tasks will wait infinitely when the external device is not responding and the
task will generate a hang-up behaviour.
In order to avoid these types of scenarios, a proper timeout mechanism should be
implemented.
A watchdog is normally used in such situations.
The watchdog will be loaded with the maximum expected wait time for the event; if
the event is not triggered within this wait time, the task is informed and the task is
timed out.
If the event happens before the timeout, the watchdog is reset.
Memory Management
Interrupt Handling
Deals with the handling of various types of interrupts. Interrupts provide real-time
behaviour to systems.
Interrupts inform the processor that an external device or an associated task requires
immediate attention of the CPU.
Interrupts can be either synchronous or asynchronous. Interrupts which occur in sync with
the currently executing task are known as synchronous interrupts.
Usually software interrupts fall under the synchronous interrupt category.
Divide by zero, memory segmentation error, etc. are examples of synchronous interrupts.
For synchronous interrupts, the interrupt handler runs in the same context as the interrupting
task.
Asynchronous interrupts are interrupts which occur at any point of execution of any task,
and are not in sync with the currently executing task.
The interrupts generated by external devices (by asserting the interrupt line of the
processor/controller to which the interrupt line of the device is connected) connected to the
processor/controller, timer overflow interrupts, serial data reception/ transmission interrupts,
etc. are examples for asynchronous interrupts.
For asynchronous interrupts, the interrupt handler is usually written as a separate task
(depending on the OS kernel implementation) and it runs in a different context.
Priority levels can be assigned to the interrupts, and each interrupt can be enabled or disabled
individually.
Most RTOS kernels implement a ‘Nested Interrupts’ architecture. Interrupt nesting allows
pre-emption (interruption) of an Interrupt Service Routine (ISR) servicing an interrupt by a
higher priority interrupt.
Time Management
Accurate time management is essential for providing precise time reference for all
applications.
The ‘Timer tick’ interval may vary depending on the hardware timer.
The time parameters for tasks are expressed as the multiples of the ‘Timer tick’.
If the system time register is 32 bits wide and the ‘Timer tick’ interval is 1 microsecond, the
system time register will reset in 2^32 * 10^-6 / (24 * 60 * 60) days = ~0.0497 days = ~1.19
hours. If the ‘Timer tick’ interval is 1 millisecond, the system time register will reset in
2^32 * 10^-3 / (24 * 60 * 60) days = ~49.7 days = ~50 days. The ‘Timer tick’ interrupt is
handled by the ‘Timer Interrupt’ handler of the kernel.
The ‘Timer tick’ interrupt can be utilised for implementing the following actions.
Hard Real-Time
Real-Time Operating Systems that strictly adhere to the timing constraints for a task
are referred to as ‘Hard Real-Time’ systems.
A Hard Real-Time system must meet the deadlines for a task without any slippage.
Missing any deadline may produce catastrophic results for Hard Real-Time systems,
including permanent data loss and irrecoverable damage to the system/users.
A system can have several such tasks and the key to their correct operation lies in
scheduling them so that they meet their time constraints.
Air bag control systems and Anti-lock Brake Systems (ABS) of vehicles are typical
examples for Hard Real-Time Systems.
The air bag control system should come into action and deploy the air bags when the
vehicle meets with a severe accident. Ideally, the time for triggering the air bag
deployment task, when an accident is sensed by the air bag control system, should be
zero, and the air bags should be deployed exactly within the time frame predefined
for the air bag deployment task.
Soft Real-Time
Real-Time Operating Systems that do not guarantee meeting deadlines, but offer the
best effort to meet them, are referred to as ‘Soft Real-Time’ systems.
Missing deadlines for tasks is acceptable for a Soft Real-Time system if the
frequency of deadline misses is within the compliance limit of the Quality of Service
(QoS).
An Automatic Teller Machine (ATM) is a typical example of a Soft Real-Time system.
If the ATM takes a few seconds more than the ideal operation time, nothing fatal
happens.
An audio-video playback system is another example of a Soft Real-Time system.
No potential damage arises if a sample comes late by a fraction of a second for
playback.
The term ‘task’ refers to something that needs to be done. In our day-to-day life, we are
bound to the execution of a number of tasks.
The task can be the one assigned by our managers or the one assigned by our
professors/teachers or the one related to our personal or family needs. In addition, we will
have an order of priority and schedule/timeline for executing these tasks.
In the operating system context, a task is defined as the program in execution and the related
information maintained by the operating system for the program.
The terms ‘Task’, ‘Job’ and ‘Process’ refer to the same entity in the operating system context
and most often they are used interchangeably.
Process
A process mimics a processor in properties and holds a set of registers, process status,
a Program Counter (PC) to point to the next executable instruction of the process, a
stack for holding the local variables associated with the process and the code
corresponding to the process.
This can be visualised as shown in Fig. 10.4.
A process, which inherits all the properties of the CPU, can be considered a virtual
processor awaiting its turn to have its properties switched into the physical processor.
When the process gets its turn, its registers and the program counter register become
mapped to the physical registers of the CPU.
From a memory perspective, the memory occupied by the process is segregated into three
regions, namely, Stack memory, Data memory and Code memory (Fig. 10.5).
The ‘Stack’ memory holds all temporary data such as variables local to the process.
The code memory contains the program code (instructions) corresponding to the process.
On loading a process into the main memory, a specific area of memory is allocated for the
process.
The stack memory usually starts (OS kernel implementation dependent) at the highest
memory address of the memory area allocated for the process.
For example, if the memory map of the area allocated for the process is 2048 to 2100,
the stack memory starts at address 2100 and grows downwards to accommodate the
variables local to the process.
The operating system recognises a process in the ‘Created State’, but no resources are
allocated to the process.
The state where a process is incepted into the memory and awaits processor time for
execution is known as the ‘Ready State’.
At this stage, the process is placed in the ‘Ready list’ queue maintained by the OS.
The state where the source code instructions corresponding to the process are being
executed is called the ‘Running State’.
The ‘Blocked State’ may be invoked by various conditions: the process enters a wait state
for an event to occur (e.g. waiting for user input such as keyboard input) or waits to get
access to a shared resource.
A state where the process completes its execution is known as ‘Completed State’.
The transition of a process from one state to another is known as ‘State transition’.
When a process changes its state from ready to running, or from running to blocked or
terminated, or from blocked to running, the CPU allocation for the process may also change.
It should be noted that the state representation for a process/task mentioned here is a
generic representation.
The states associated with a task may be known by different names, or there may be more
or fewer states than the ones explained here, under different OS kernels.
For example, under the VxWorks kernel, a task may be in one, or a specific
combination, of the states READY, PEND, DELAY and SUSPEND.
The PEND state represents a state where the task/process is blocked on waiting for I/O or
system resource.
The DELAY state represents a state in which the task/process is sleeping and the SUSPEND
state represents a state where a task/process is temporarily suspended from execution and not
available for execution.
Under MicroC/OS-II kernel, the tasks may be in one of the states, DORMANT, READY,
RUNNING, WAITING or INTERRUPTED.
The DORMANT state represents the ‘Created’ state and WAITING state represents the state
in which a process waits for shared resource or I/O access.
Process Management
Process management deals with the creation of a process, setting up the memory space for the
process, loading the process’s code into the memory space, allocating system resources,
setting up a Process Control Block (PCB) for the process and process termination/deletion.
THREADS
If a process is split into multiple threads, each of which executes a portion of the
process, there will be a main thread, and the rest of the threads will be created within
the main thread.
Use of multiple threads to execute a process brings the following advantages:
Better memory utilisation: multiple threads of the same process share the
address space for data memory. This also reduces the complexity of inter-
thread communication, since variables can be shared across the threads.
Speedier execution: since the process is split into different threads, when one
thread enters a wait state, the CPU can be utilised by the other threads of the
process that do not require the event the waiting thread is blocked on. This
speeds up the execution of the process.
Efficient CPU utilisation: the CPU is engaged all the time.
Thread Standards
Thread standards deal with the different standards available for thread creation and
management.
These standards are utilised by the operating systems for thread creation and thread
management.
It is a set of thread class libraries. The commonly available thread class libraries are
explained below
The POSIX.4 standard deals with the real-time extensions and the POSIX.4a standard deals
with thread extensions.
The POSIX standard library for thread creation and management is ‘Pthreads’.
‘Pthreads’ library defines the set of POSIX thread creation and management functions in ‘C’
language.
The primitive

int pthread_create(pthread_t *new_thread_ID, const pthread_attr_t *attribute,
                   void *(*start_function)(void *), void *arguments);

creates a new thread for running the function start_function. Here pthread_t is the handle to
the newly created thread and pthread_attr_t is the data type for holding the thread attributes.
‘start_function’ is the function the thread is going to execute, and arguments are the
arguments for ‘start_function’ (a void * in the above prototype). On successful creation of a
Pthread, pthread_create() associates the Thread Control Block (TCB) corresponding to the
newly created thread with the variable of type pthread_t (new_thread_ID in our example).
The primitive

int pthread_join(pthread_t new_thread, void **thread_status);

blocks the current thread and waits until the completion of the thread pointed to by it (in this
example, new_thread). All the POSIX ‘thread calls’ return an integer; a return value of zero
indicates the success of the call. It is always good to check the return value of each call.
The lines printed will give an idea of the order in which thread execution is switched.
The pthread_join call forces the main thread to wait until the completion of the thread
joined to it, if the main thread finishes its execution first. The termination of a thread can
happen in different ways.
A thread can terminate either by completing its execution (natural termination) or by
forced termination. In natural termination, the thread completes its execution and returns
to the main thread through a simple return or by executing the pthread_exit() call.
Forced termination can be achieved by the call pthread_cancel() or through the termination
of the main thread with the exit or exec functions.
The pthread_exit() call is used by a thread to explicitly exit after it completes its work and is
no longer required to exist.
If the main thread finishes before the threads it has created, and exits with pthread_exit(), the
other threads continue to execute.
If the main thread uses the exit call to exit, all threads created by the main thread are
terminated forcefully.
Exiting a thread with the call pthread_exit() will not perform a cleanup.
It will not close any files opened by the thread, and the files will remain open even
after the thread terminates.
Calling pthread_join at the end of the main thread is the best way to achieve synchronisation
and proper cleanup.
The main thread, after finishing its task, waits for the completion of the other threads which
were joined to it using the pthread_join call.
With a pthread_join call, the main thread waits for the other threads joined to it, and they
finally merge to the single main thread.
If a new thread spawned by the main thread is still not joined to the main thread, it will be
counted against the system's maximum thread limit.
User Level Thread
User level threads do not have kernel/Operating System support and they exist solely in the
running process. Even if a process contains multiple user level threads, the OS treats it as a
single thread and will not switch the execution among the different threads of it. It is the
responsibility of the process to schedule each thread as and when required.
THREAD PRE-EMPTION
In summary, user level threads of a process are non-preemptive at thread level from
OS perspective.
Kernel level threads are individual units of execution, which the OS treats as separate
threads.
The OS interrupts the execution of the currently running kernel thread and switches
the execution to another kernel thread based on the scheduling policies implemented
by the OS. Kernel level threads are pre-emptive.
For user level threads, the execution switching (thread context switching) happens
only when the currently executing user level thread is voluntarily blocked.
Hence, no OS intervention and system calls are involved in the context switching of
user level threads.
This makes context switching of user level threads very fast.
On the other hand, kernel level threads involve lots of kernel overhead and involve
system calls for context switching. However, kernel threads maintain a clear layer of
abstraction and allow threads to use system calls independently.
There are many ways for binding user level threads with system/kernel level threads.
The following section gives an overview of various thread binding models.
Many-to-One Model
One-to-One Model
Many-to-Many Model
In the Many-to-Many model, many user level threads are allowed to be mapped to many
kernel threads. Windows NT/2000 with the ThreadFibre package is an example for this.
Thread Vs Process
In the operating system context, multiprocessing describes the ability to execute multiple
processes simultaneously.
Multiprocessor systems possess multiple CPUs and can execute multiple processes
simultaneously.
The ability of the operating system to have multiple programs in memory, which are ready
for execution, is referred to as multiprogramming.
The ability of an operating system to hold multiple processes in memory and switch the
processor (CPU) from executing one process to another process is known as multitasking.
Multitasking involves the switching of CPU from executing one task to another.
A 'process' is considered a 'virtual processor', awaiting its turn to have its properties
switched into the physical processor. In a multitasking environment, when task/process
switching happens, the virtual processor (task/process) gets its properties converted into
those of the physical processor.
The switching of the virtual processor to physical processor is controlled by the scheduler of
the OS kernel.
Whenever a CPU switching happens, the current context of execution should be saved to
retrieve it at a later point of time when the CPU executes the process, which is interrupted
currently due to execution switching.
The context saving and retrieval is essential for resuming a process exactly from the point
where it was interrupted due to CPU switching.
The act of switching CPU among the processes or changing the current execution context is
known as ‘Context switching’.
The act of saving the current context which contains the context details (Register details,
memory details, system resource usage details, execution details, etc.) for the currently
running process at the time of CPU switching is known as ‘Context saving’.
The process of retrieving the saved context details for a process, which is going to be
executed due to CPU switching, is known as ‘Context retrieval’.
Multitasking involves 'Context switching' (Fig. 10.11), 'Context saving' and 'Context
retrieval'.
Toss juggling, the skilful object manipulation game, is a classic real world example of the
multitasking illusion.
The juggler uses a number of objects (balls, rings, etc.) and throws them up and catches
them. At any point of time, he throws only one ball and catches only one per hand.
However, the speed at which he switches the balls for throwing and catching creates for the
spectators the illusion that he is throwing and catching multiple balls or using more than two
hands simultaneously.
TYPES OF MULTITASKING
Depending on how the switching act is implemented, multitasking can be classified into
different types.
The following section describes the various types of multitasking existing in the Operating
System’s context.
Co-operative Multitasking
In this method, any task/process can hold the CPU for as much time as it wants. Since this
type of implementation depends on the mercy of the tasks towards each other for getting
CPU time for execution, it is known as co-operative multitasking.
If the currently executing task is non-cooperative, the other tasks may have to wait for a long
time to get the CPU.
Preemptive Multitasking
As the name indicates, in preemptive multitasking, the currently running task/process is
preempted to give a chance to other tasks/processes to execute.
When and how much time a process gets is dependent on the implementation of the
preemptive scheduling.
Non-preemptive Multitasking
The co-operative and non-preemptive multitasking differ in their behaviour when they are in
the 'Blocked/Wait' state.
In co-operative multitasking, the currently executing process/task need not relinquish the
CPU when it enters the 'Blocked/Wait' state.
TASK COMMUNICATION
Co-operating Processes: In the co-operating interaction model one process requires the
inputs from other processes to complete its execution.
Competing Processes: The competing processes do not share anything among themselves
but they share the system resources. The competing processes compete for system
resources such as files, display devices, etc. Co-operating processes exchange information
and communicate through the following methods.
Co-operation through Sharing: The co-operating process exchange data through some
shared resources.
Co-operation through Communication: No data is shared between the processes, but they
communicate for synchronisation. The mechanism through which processes/tasks
communicate with each other is known as Inter Process/Task Communication (IPC). Inter
Process Communication is essential for process co-ordination. The various types of Inter
Process Communication (IPC) mechanisms adopted by processes are kernel (Operating
System) dependent. Some of the important IPC mechanisms adopted by various kernels are
explained below.
Shared Memory
Processes share some area of the memory to communicate among them (Fig. 10.16).
Information to be communicated by the process is written to the shared memory area.
Other processes which require this information can read the same from the shared
memory area.
It is similar to the real world example where a 'Notice Board' is used by a corporate to
publish public information among the employees.
The implementation of shared memory concept is kernel dependent.
Different mechanisms are adopted by different kernels for implementing this. A few
among them are:
Pipes
Anonymous Pipes: The anonymous pipes are unnamed, unidirectional pipes used for data
transfer between two processes.
Named Pipes: A named pipe is a named, unidirectional or bi-directional pipe for data
exchange between processes.
Like anonymous pipes, the process which creates the named pipe is known as pipe server. A
process which connects to the named pipe is known as pipe client.
With named pipes, any process can act as both client and server allowing point-to-point
communication.
Named pipes can be used for communicating between processes running on the same
machine or between processes running on different machines connected to a network.
Memory Mapped Objects
In this approach a mapping object is created and physical storage for it is reserved and
committed.
A process can map the entire committed physical area or a block of it to its virtual address
space.
All read and write operation to this virtual address space by a process is directed to its
committed physical area.
Any process which wants to share data with other processes can map the physical memory
area of the mapped object to its virtual memory space and use it for sharing the data.
Windows CE 5.0 RTOS uses the memory mapped object based shared memory technique for
Inter Process Communication (Fig. 10.18).
In order to create the mapping from the system paging memory, the handle parameter should
be passed as INVALID_HANDLE_VALUE (-1).
The lpFileMappingAttributes parameter represents the security attributes and it must be
NULL.
The flProtect parameter represents the read/write access for the shared memory area.
A value of PAGE_READONLY makes the shared memory read only whereas the value
PAGE_READWRITE gives read-write access to the shared memory.
The parameter dwMaximumSizeHigh specifies the higher order 32 bits of the maximum size
of the memory mapped object and dwMaximumSizeLow specifies the lower order 32 bits of
the maximum size of the memory mapped object.
The parameter lpName points to a null terminated string specifying the name of the memory
mapped object.
The memory mapped object is created as an unnamed object if the parameter lpName is NULL.
If lpName specifies the name of an existing memory mapped object, the function returns the
handle of the existing memory mapped object to the caller process.
The memory mapped object can be shared between the processes by either passing the handle
of the object or by passing its name.
If the handle of the memory mapped object created by a process is passed to another process
for shared access, there is a possibility of closing the handle by the process which created the
handle while it is in use by another process.
If the name of the memory object is passed for shared access among processes, processes can
use this name for creating a shared memory object which will open the shared memory object
already existing with the given name.
The OS will maintain a usage count for the named object and it is incremented each time
a process creates/opens a memory mapped object with an existing name.
This will prevent the destruction of a shared memory object by one process while it is being
accessed by another process.
Hence passing the name of the memory mapped object is strongly recommended for memory
mapped object based inter process communication.
A value of FILE_MAP_WRITE makes the view access read-write, provided the memory
mapped object hFileMappingObject is created with read-write access, whereas the value
FILE_MAP_READ gives read only access to the shared memory, provided the memory
mapped object hFileMappingObject is created with read-write/read only access.
The parameter dwFileOffsetHigh specifies the higher order 32 bits and dwFileOffsetLow
specifies the lower order 32 bits of the memory offset where mapping is to begin from the
memory mapped object.
A value of '0' for both of these maps the view from the beginning of the memory area of the
memory object. dwNumberOfBytesToMap specifies the number of bytes of the memory
object to map.
If dwNumberOfBytesToMap is zero, the entire memory area owned by the memory mapped
object is mapped.
On successful execution, the MapViewOfFile call returns the starting address of the mapped
view.
A mapped view of the memory mapped object is unmapped by the API call
UnmapViewOfFile(LPCVOID lpBaseAddress).
The lpBaseAddress parameter specifies a pointer to the base address of the mapped view of a
memory object that is to be unmapped.
This value must be identical to the value returned by a previous call to the MapViewOfFile
function.
In other words, it frees the virtual address space of the mapping object. Under Windows
NT/XP OS, a process can open an existing memory mapped object by calling the API
OpenFileMapping(DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName).
The parameter dwDesiredAccess specifies the read/write access permissions for the memory
mapped object.
A value of FILE_MAP_ALL_ACCESS provides read-write access, whereas the value
FILE_MAP_READ allocates only read access and FILE_MAP_WRITE allocates write only
access.
If the parameter bInheritHandle is TRUE, the calling process inherits the handle of the
existing object, otherwise not.
The parameter lpName specifies the name of the existing memory mapped object which
needs to be opened.
needs to be opened.
Windows CE 5.0 does not support handle inheritance and hence the API call
OpenFileMapping is not supported.
Message Passing
Message Queue
Usually the process which wants to talk to another process posts the message to a First-In-
First-Out (FIFO) queue called ‘Message queue’, which stores the messages temporarily in a
system defined memory object, to pass it to the desired process (Fig. 10.20).
Messages are sent and received through send (Name of the process to which the message is to
be sent, message) and receive (Name of the process from which the message is to be received,
message) methods.
The implementation of the message queue, send and receive methods are OS kernel
dependent.
The Windows XP OS kernel maintains a single system message queue and one
process/thread specific message queue (process and thread are used interchangeably here,
since the thread is the basic unit of a process in Windows).
A thread which wants to communicate with another thread posts the message to the system
message queue.
The kernel picks up the message from the system message queue one at a time and examines
the message for finding the destination thread and then posts the message to the message
queue of the corresponding thread.
For posting a message to a thread’s message queue, the kernel fills a message structure MSG
and copies it to the message queue of the thread.
The message structure MSG contains the handle of the process/thread for which the message
is intended, the message parameters, the time at which the message is posted, etc.
A thread can simply post a message to another thread and can continue its operation or it may
wait for a response from the thread to which the message is posted.
The messaging mechanism is classified into synchronous and asynchronous based on the
behaviour of the message posting thread.
In asynchronous messaging, the message posting thread just posts the message to the queue
and it will not wait for an acceptance (return) from the thread to which the message is posted,
whereas in synchronous messaging, the thread which posts a message enters waiting state and
waits for the message result from the thread to which the message is posted.
The thread which invoked the send message becomes blocked and the scheduler will not pick
it up for scheduling.
The PostMessage API does not always guarantee the posting of messages to message queue.
The PostMessage API will not post a message to the message queue when the message queue
is full.
Hence it is recommended to check the return value of PostMessage API to confirm the
posting of message.
The SendMessage(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)
API call sends a message to the thread specified by the handle hWnd and waits for the callee
thread to process the message.
The thread which calls the SendMessage API enters the waiting state and waits for the
message result from the thread to which the message is posted.
The thread which invoked the SendMessage API call becomes blocked and the scheduler will
not pick it up for scheduling.
The OS maintains a First In First Out (FIFO) buffer for storing the messages and each
process can access this buffer for reading and writing messages.
The OS also maintains a special queue, with single message storing capacity, for storing high
priority messages (Alert messages).
The creation and usage of message queues under Windows CE OS is explained below.
A process can use this handle for reading or writing a message from/to the message queue
pointed to by the handle. The parameter lpszName specifies the name of the message queue.
If this parameter is NULL, an unnamed message queue is created. Processes can use the
handle returned by the API call if the message queue is created without a name.
If the message queue is created as named message queue, other processes can use the name of
the message queue for opening the named message queue created by a process.
Calling the CreateMsgQueue API with an existing named message queue as parameter
returns a handle to the existing message queue.
Under the Desktop Windows Operating Systems (Windows 9x/XP/NT/2K), each object type
(viz. mutex, semaphores, events, memory maps, watchdog timers and message queues) shares
the same namespace, and the same name is not allowed for creating more than one of these objects.
Windows CE kernel maintains separate namespace for each and supports the same name
across different objects.
Mailbox
Mailbox is an alternate form of ‘Message queues’ and it is used in certain Real- Time
Operating Systems for IPC.
Mailbox technique for IPC in RTOS is usually used for one way messaging.
The task/thread which wants to send a message to other tasks/threads creates a
mailbox for posting the messages.
The threads which are interested in receiving the messages posted to the mailbox by
the mailbox creator thread can subscribe to the mailbox.
The thread which creates the mailbox is known as ‘mailbox server’ and the threads
which subscribe to the mailbox are known as ‘mailbox clients’.
The mailbox server posts messages to the mailbox and notifies it to the clients which
are subscribed to the mailbox.
The clients read the message from the mailbox on receiving the notification.
The mailbox creation, subscription, message reading and writing are achieved through
OS kernel provided API calls. Mailboxes and message queues are similar in functionality;
the only difference is in the number of messages supported by them.
Both of them are used for passing data in the form of message(s) from a task to
another task(s). Mailbox is used for exchanging a single message between two tasks
or between an Interrupt Service Routine (ISR) and a task.
Mailbox associates a pointer pointing to the mailbox and a wait list to hold the tasks
waiting for a message to appear in the mailbox.
The implementation of mailbox is OS kernel dependent.
The MicroC/OS-II implements mailbox as a mechanism for inter-task
communication.
Figure 10.21 given below illustrates the mailbox based IPC technique
Remote Procedure Call (RPC)
Remote Procedure Call or RPC (Fig. 10.22) is the Inter Process Communication (IPC)
mechanism used by a process to call a procedure of another process running on the
same CPU or on a different CPU which is interconnected in a network.
In the object oriented language terminology RPC is also known as Remote Invocation
or Remote Method Invocation (RMI).
RPC is mainly used for distributed applications like client-server applications.
With RPC it is possible to communicate over a heterogeneous network (i.e. a network
where client and server applications are running on different Operating Systems). The
CPU/process containing the procedure which needs to be invoked remotely is known
as the server.
The CPU/process which initiates an RPC request is known as client.
The result from the remote procedure is returned back to the caller through
mechanisms like callback functions.
Stream sockets are connection oriented and they use TCP to establish a reliable
connection. On the other hand, Datagram sockets rely on UDP for establishing a
connection.
The UDP connection is unreliable when compared to TCP.
The client-server communication model uses a socket at the client side and a socket at
the server side.
A port number is assigned to both of these sockets.
The client and server should be aware of the port number associated with the socket.
In order to start the communication, the client needs to send a connection request to
the server at the specified port number.
The client should be aware of the name of the server along with its port number.
The server always listens on the specified port number on the network. Upon receiving
a connection request from the client, based on the success of authentication, the server
grants the connection request and a communication channel is established between
the client and server. The client uses the host name and port number of the server for
sending requests and the server uses the client's name and port number for sending
responses.
TASK SYNCHRONISATION
The act of making processes aware of the access of the shared resources by each
process to avoid conflicts is known as Task/Process Synchronisation.
Racing
At the processor instruction level, the value of the variable counter is loaded into the
Accumulator register (EAX register).
The memory variable counter is represented using a pointer.
The base pointer register (EBP register) is used for pointing to the memory variable
counter. After loading the contents of the variable counter into the Accumulator, the
Accumulator content is incremented by one using the add instruction.
Finally the content of the Accumulator is stored to the memory location which represents
the variable counter.
Both the processes, Process A and Process B, contain the program statement counter++;
translating this into machine instructions gives the load, add and store sequence described
above. (When Process A is preempted in the middle of this instruction sequence, the original
content of these registers will be saved as part of the context saving and will be retrieved
back as part of context retrieval when Process A gets the CPU for execution again.
Hence the content of eax and ebp remains intact irrespective of context switching.)
Though the variable counter is incremented by Process B, Process A is unaware of it
and it increments the variable with the old value.
This leads to the loss of one increment for the variable counter.
This problem occurs due to the non-atomic operation on variables.
This issue would not have occurred if the underlying actions corresponding to the
program statement counter++; were finished in a single CPU execution cycle.
The best way to avoid this situation is to make the access and modification of shared
variables mutually exclusive; meaning when one process accesses a shared variable,
prevent the other processes from accessing it.
Deadlock
A race condition produces incorrect results whereas a deadlock condition creates a situation
where none of the processes are able to make any progress in their execution resulting in a set
of deadlock processes.
In its simplest form, 'deadlock' is the condition in which a process is waiting for a resource
held by another process which is waiting for a resource held by the first process (Fig.
10.25).
Process B is currently holding resource y and it wants the resource x which is currently held
by Process A.
Both hold the respective resources and they compete with each other to get the resource held
by the other process.
None of the competing process will be able to access the resources held by other processes
since they are locked by the respective processes (If a mutual exclusion policy is
implemented for shared resource access, the resource is locked by the process which is
currently accessing it).
Mutual Exclusion: The criterion that only one process can hold a resource at a time, meaning
processes should access shared resources with mutual exclusion. A typical example is the
accessing of display hardware in an embedded device.
Hold and Wait: The condition in which a process holds a shared resource by acquiring the
lock controlling the shared access and waits for additional resources held by other
processes.
No Resource Preemption: The criterion that the operating system cannot take back a resource
from a process which is currently holding it; the resource can only be released voluntarily
by the process holding it.
Circular Wait:
A process is waiting for a resource which is currently held by another process which in turn is
waiting for a resource held by the first process.
In general, there exists a set of waiting processes P0, P1, ..., Pn with P0 waiting for a
resource held by P1, P1 waiting for a resource held by P2, ..., and Pn waiting for a resource
held by P0.
This forms a circular wait queue. 'Deadlock' is a result of the combined occurrence of
the four conditions listed above.
These conditions were first described by E. G. Coffman in 1971 and are popularly known as
the Coffman conditions.
Deadlock Handling
A smart OS may foresee the deadlock condition and will act proactively to avoid such a
situation.
Now if a deadlock occurs, how does the OS respond to it? The reaction to a deadlock
condition by the OS is non-uniform.
The OS may adopt any of the following techniques to detect and prevent deadlock conditions.
Ignore Deadlocks:
This is acceptable for the reason the cost of removing a deadlock is large compared to the
chance of happening a deadlock.
A life critical system cannot pretend that it is deadlock free for any reason.
Detect and Recover: This approach suggests the detection of a deadlock situation and
recovery from it. This is similar to the deadlock condition that may arise at a traffic junction.
When the vehicles from different directions compete to cross the junction, a deadlock (traffic
jam) condition results. Once a deadlock (traffic jam) has happened at the junction, the only
solution is to back up the vehicles from one direction and allow the vehicles from the
opposite direction to cross the junction. If the traffic is too high, lots of vehicles may have to
be backed up to resolve the traffic jam.
A deadlock condition can be detected by analysing the resource graph by graph analyser
algorithms.
Once a deadlock condition is detected, the system can terminate a process or preempt the
resource to break the deadlocking cycle.
Avoid Deadlocks: Deadlock is avoided by the careful resource allocation techniques by the
Operating System. It is similar to the traffic light mechanism at junctions to avoid the traffic
jams.
Prevent Deadlocks: Prevent the deadlock condition by negating one of the four conditions
favouring the deadlock situation.
Ensure that a process does not hold any other resources when it requests a resource. This can
be achieved by implementing the following set of rules/guidelines in allocating resources to
processes.
1. A process must request all its required resources and the resources should be
allocated before the process begins its execution.
2. Grant resource allocation requests from processes only if the process does not hold
a resource currently.
Ensure that resource preemption (resource releasing) is possible at operating system level.
This can be achieved by implementing the following set of rules/guidelines in resources
allocation and releasing.
1. Release all the resources currently held by a process if a request made by the
process for a new resource cannot be fulfilled immediately.
2. Add the resources which are preempted (released) to a resource list describing the
resources which the process requires to complete its execution.
3. Reschedule the process for execution only when the process gets both its old resources
and the new resource which it requested. Imposing these criteria may
introduce negative impacts like low resource utilisation and starvation of processes.
Livelock
The Livelock condition is similar to the deadlock condition except that a process in
livelock condition changes its state with time.
While in deadlock a process enters the wait state for a resource and continues in that
state forever without making any progress in the execution, in a livelock condition a
process always does something but is unable to make any progress towards execution
completion.
The livelock condition is better explained with the real world example, two people
attempting to cross each other in a narrow corridor.
Both the persons move towards one side of the corridor to allow the opposite person
to cross. Since the corridor is narrow, none of them is able to cross. Here
both of the persons perform some action but still they are unable to achieve their
target of crossing each other.
Starvation
In the multitasking context, starvation is the condition in which a process does not get
the resources required to continue its execution for a long time.
As time progresses the process starves on resource.
Starvation may arise due to various conditions like byproduct of preventive measures
of deadlock, scheduling policies favouring high priority tasks and tasks with shortest
execution time, etc.
Functional Requirements
Processor Support: It is not necessary that all RTOSs support all kinds of processor
architectures. It is essential to ensure processor support by the RTOS.
Memory Requirements: The OS requires ROM memory for holding the OS files and it is
normally stored in a non-volatile memory like FLASH.
Since embedded systems are memory constrained, it is essential to evaluate the minimal
ROM and RAM requirements for the OS under consideration.
Real-time Capabilities: It is not mandatory that the operating system for all embedded
systems be real-time, and not all embedded operating systems are real-time in
behaviour.
Analyse the real-time capabilities of the OS under consideration and the standards met by the
operating system for real-time capabilities.
Kernel and Interrupt Latency: The kernel of the OS may disable interrupts while executing
certain services and this may lead to interrupt latency.
For an embedded system whose response requirements are high, this latency should be
minimal.
Certain kernels may provide a bunch of options whereas others provide very limited
options. Certain kernels implement policies for avoiding priority inversion issues in
resource sharing.
Modularisation Support: Most of the operating systems provide a bunch of features.
At times it may not be necessary for an embedded product for its functioning.
It is very useful if the OS supports modularisation, wherein the developer can choose
the essential modules and re-compile the OS image for functioning.
Ensure that the OS under consideration provides support for all the interfaces required by the
embedded product.
Development Language Support Certain operating systems include the run time libraries
required for running applications written in languages like Java and C#.
A Java Virtual Machine (JVM) customised for the Operating System is essential for running
java applications.
Similarly the .NET Compact Framework (.NETCF) is required for running Microsoft .NET
applications on top of the Operating System.
The OS may include these components as built-in components; if not, check the availability
of the same from a third-party vendor for the OS under consideration.
Non-functional Requirements
Sometimes it may be possible to build the required features by customising an open source
OS.
The decision on which to select is purely dependent on the development cost, licensing fees
for the OS, development time and availability of skilled resources.
Cost: The total cost for developing or buying the OS and maintaining it, for both the
commercial product and custom build options, needs to be evaluated before taking a
decision on the selection of the OS.
Certain Operating Systems may be superior in performance, but the availability of tools for
supporting the development may be limited.
Ease of Use: How easy it is to use a commercial RTOS is another important feature that
needs to be considered in the RTOS selection.
Integration of hardware and firmware deals with the embedding of firmware into the
target hardware board.
Out-of-Circuit Programming
The sequence of operations for embedding the firmware with a programmer is listed below.
1. Connect the programming device to the specified port of PC (USB/COM port/parallel port)
2. Power up the device (Most of the programmers incorporate LED to indicate Device power
up. Ensure that the power indication LED is ON)
3. Execute the programming utility on the PC and ensure proper connectivity is established
between PC and programmer. In case of error, turn off device power and try connecting it
again
4. Unlock the ZIF socket by turning the lock pin
5. Insert the device to be programmed into the open socket as per the insert diagram shown
on the programmer
6. Lock the ZIF socket
7. Select the device name from the list of supported devices
8. Load the hex file which is to be embedded into the device
9. Program the device using the ‘Program’ option of the utility program
10. Wait till the completion of the programming operation (till the busy LED of the programmer is off)
11. Ensure that programming is successful by checking the status LED on the programmer (usually ‘Green’ for success and ‘Red’ for error) or by noticing the feedback from the utility program
12. Unlock the ZIF socket and take the device out of the programmer
Devices with SPI In System Programming support contain a built-in SPI interface, and the on-chip EEPROM or FLASH memory is programmed through this interface.
The primary I/O lines involved in SPI In System Programming are listed below.
PC acts as the master and target device acts as the slave in ISP.
The program data is sent to the MOSI pin of target device and the device
acknowledgement is originated from the MISO pin of the device.
SCK pin acts as the clock for data transfer.
A utility program can be developed on the PC side to generate the above signal lines.
Since the target device works under a supply voltage of 5V (TTL/CMOS), these lines of the target device can be connected directly to the parallel port of the PC.
Since parallel port operations are also at 5V logic, there is no need for any intermediate hardware for signal conversion.
The pins of parallel port to which the ISP pins of device needs to be connected are
dependent on the program, which is used for generating these signals, or you can fix
these lines first and then write the program according to the pin interconnection
assignments.
Standard SPI-ISP utilities are freely available on the internet and there is no need to write your own program. All you need to do is connect the pins as specified by the program.
As mentioned earlier, for ISP operations the target device needs to be powered up in a pre-defined sequence. The power up sequence for In System Programming for Atmel’s AT89S series microcontroller family is listed below.
1. Apply supply voltage between VCC and GND pins of target chip.
2. Set RST pin to “HIGH” state.
3. If a crystal is not connected across pins XTAL1 and XTAL2, apply a 3 MHz to 24
MHz clock to the XTAL1 pin and wait for at least 10 milliseconds.
4. Enable serial programming by sending the Programming Enable serial instruction
to pin MOSI/ P1.5. The frequency of the shift clock supplied at pin SCK/P1.7 needs
to be less than the CPU clock at XTAL1 divided by 40.
5. The Code or Data array is programmed one byte at a time by supplying the address
and data together with the appropriate Write instruction. The selected memory
location is first erased before the new data is written. The write cycle is self-timed and
typically takes less than 2.5 ms at 5V.
6. Any memory location can be verified by using the Read instruction, which returns
the content at the selected address at serial output MISO/P1.6.
7. After successfully programming the device, set RST pin low or turn off the chip
power supply and turn it ON to commence the normal operation.
The key player behind ISP is a factory programmed memory (ROM) called ‘Boot
ROM’.
The Boot ROM normally resides at the top end of the code memory space and its size is in the order of a few kilobytes (for a controller with 64K code memory space and 1K Boot ROM, the Boot ROM resides at memory locations FC00H to FFFFH).
It contains a set of low-level instruction APIs and these APIs allow the processor/controller to perform the FLASH memory programming, erasing and reading operations.
The contents of the Boot ROM are provided by the chip manufacturer and the same is masked into every device.
The Boot ROM for different family or series devices is different.
By default the Reset vector starts the code memory execution at location 0000H.
If the ISP mode is enabled through the special ISP Power up sequence, the execution
will start at the Boot ROM vector location.
In System Programming technique is the best advised programming technique for
development work since the effort required to re-program the device in case of
firmware modification is very little.
Firmware upgrades for products supporting ISP are quite simple.
It is possible to embed the firmware into the target processor/controller memory at the
time of chip fabrication itself.
Such chips are known as ‘Factory programmed chips’.
Once the firmware design is over and the firmware has achieved operational stability, the firmware files can be sent to the chip fabricator to embed it into the code memory.
Factory programmed chips are convenient for mass production applications and they greatly reduce the product development time.
It is not recommended to use factory programmed chips for development purpose
where the firmware undergoes frequent changes.
Factory programmed ICs are a bit expensive.
The OS based embedded systems are programmed using the In System Programming
(ISP) technique.
OS based embedded systems contain a special piece of code called the ‘Boot loader’ program, which takes charge of embedding the OS and application firmware and copying the OS image to the RAM of the system for execution.
The ‘Boot loader’ for such embedded systems comes pre-loaded, or it can be loaded to the memory using one of the various supported interfaces like JTAG.
The bootloader contains necessary driver initialisation implementation for initialising
the supported interfaces like UART, TCP/IP etc.
The bootloader implements menu options for selecting the source of the OS image to load.
In case of the network based loading, the bootloader broadcasts the target’s presence
over the network and the host machine on which the OS image resides can identify
the target device by capturing this message.
Once a communication link is established between the host and target machine, the
OS image can be directly downloaded to the FLASH memory of the target device.
Now the firmware is embedded into the target board using one of the programming
techniques
Sometimes the first power up may end up in a messy explosion leaving the smell of burned components behind.
It may happen due to various reasons, such as: proper care was not taken in applying power and the power was applied in reverse polarity (+ve of supply connected to -ve of the target board and vice versa), or components were not placed with the correct polarity.
The development environment consists of a Development Computer (PC) or Host, which acts
as the heart of the development environment, Integrated Development Environment (IDE)
Tool for embedded firmware development and debugging, Electronic Design Automation
(EDA) Tool for Embedded Hardware design, An emulator hardware for debugging the target
board, Signal sources (like Function generator) for simulating the inputs to the target board,
Target hardware debugging tools (Digital CRO, Multimeter, Logic Analyser, etc.) and the target hardware.
The Integrated Development Environment (IDE) and Electronic Design Automation (EDA)
tools are selected based on the target hardware development requirement and they are
supplied as installable files on CDs by vendors. The tools may be available in licensed or trial versions.
Licensed versions of the tools are fully featured and fully functional, whereas trial versions fall into two categories: tools with limited features, and full featured copies with a limited period of usage.
DISASSEMBLER/DECOMPILER
Disassembler is a utility program which converts machine codes into target processor
specific Assembly codes/instructions.
The process of converting machine codes into Assembly code is known as
‘Disassembling’.
In operation, disassembling is complementary to assembling/cross-assembling.
Decompiler is the utility program for translating machine codes into corresponding
high level language instructions.
Decompiler performs the reverse operation of compiler/cross-compiler.
The disassemblers/decompilers for different family of processors/controllers are
different. Disassemblers/Decompilers are deployed in reverse engineering.
Reverse engineering is the process of revealing the technology behind the working of
a product. Disassemblers/decompilers help the reverse engineering process by
translating the embedded firmware into Assembly/high level language instructions.
Disassemblers/Decompilers are powerful tools for analysing the presence of
malicious codes (virus information) in an executable image.
Disassemblers/Decompilers are available as either freeware tools readily available for
free download from internet or as commercial tools.
It is not possible for a disassembler/decompiler to generate an exact replica of the original assembly code/high level source code in terms of the symbolic constants and comments used. However, disassemblers/decompilers generate source code which closely matches the original source code from which the binary code was generated.
Simulator is a software tool used for simulating the various conditions for checking the functionality of the application firmware.
The Integrated Development Environment (IDE) itself usually provides simulator support, which helps in debugging the firmware and checking its required functionality.
In certain scenarios, simulator refers to a soft model (GUI model) of the embedded
product. For example, if the product under development is a handheld device, to test
the functionalities of the various menu and user interfaces, a soft form model of the
product with all UI as given in the end product can be developed in software.
Soft phone is an example for such a simulator.
Emulator is hardware device which emulates the functionalities of the target device
and allows real time debugging of the embedded firmware in a hardware
environment.
Simulators
Simulators simulate the target hardware and the firmware execution can be inspected using
simulators.
With the simulator’s simulation support you can input any desired value for any parameter while debugging the firmware and observe the control flow of the firmware.
It really helps the developer in simulating an abnormal operational environment for the firmware and in studying the behaviour of the firmware under abnormal input conditions.
Hardware debugging deals with the monitoring of various bus signals and checking the status
lines of the target hardware.
Firmware debugging deals with examining the firmware execution, execution flow, changes
to various CPU registers and status registers on execution of the firmware to ensure that the
firmware is running as per the design.
Incremental EEPROM Burning
This is the most primitive type of firmware debugging technique, where the code is separated into different functional code units.
Instead of burning the entire code into the EEPROM chip at once, the code is burned
in incremental order, where the code corresponding to all functionalities are
separately coded, cross-compiled and burned into the chip one by one.
The code will incorporate some indication support, like lighting up an LED (every embedded product contains at least one LED; if not, you should include provision for at least one LED on the target board at hardware design time so that it can be used for debugging purposes) or activating a BUZZER (in a system with BUZZER support) if the code is functioning in the expected way.
If the first functionality is found working perfectly on the target board with the
corresponding code burned into the EEPROM, go for burning the code corresponding
to the next functionality and check whether it is working.
Repeat this process till all functionalities are covered.
Please ensure that before entering into one level up, the previous level has delivered a
correct result.
If the code corresponding to any functionality is found not giving the expected result, fix it by modifying the code, and only then go for adding the next functionality for burning into the EEPROM.
After all functionalities are found working properly, combine the entire source for all functionalities together, re-compile and burn the code for the total system functioning.
Obviously it is a time-consuming process.
It is a one-time process, and once you test the firmware in the incremental model you can go for mass production.
In the incremental firmware burning technique we are not doing any debugging as such, but observing the status of firmware execution as a debug method.
Monitor Program Based Firmware Debugging
The monitor program always listens to the serial port of the target device and, according to the command received from the serial interface, performs command specific actions like firmware downloading, memory inspection/modification and firmware single stepping, and sends the debug information (various register and memory contents) back to the main debug program running on the development PC.
The first step in any monitor program development is determining a set of
commands for performing various operations like firmware downloading,
memory/ register inspection/modification, single stepping, etc.
Once the commands for each operation are fixed, write the code for performing the actions corresponding to these commands.
As mentioned earlier, the commands may be received through any of the
external interface of the target processor (e.g. RS-232C serial interface/parallel
interface/USB, etc.).
The monitor program should query this interface to get commands, or should handle the command reception if the data reception is implemented through interrupts.
On receiving a command, examine it and perform the action corresponding to
it.
The entire code handling the command reception and corresponding action implementation is known as the “monitor program”.
The most common type of interface used between target board and debug
application is RS-232C Serial interface.
After the successful completion of the ‘monitor program’ development, it is
compiled and burned into the FLASH memory or ROM of the target board.
The code memory containing the monitor program is known as the ‘Monitor ROM’.
The monitor program usually resides at the reset vector (code memory 0000H) of the
target processor.
The monitor program is commonly employed in development boards and the
development board supplier provides the monitor program, in the form of a ROM
chip.
The actual code memory is downloaded into a RAM chip which is interfaced to the processor in the Von-Neumann architecture model.
The Von-Neumann architecture model is achieved by ANDing the PSEN\ and RD\ signals of the target processor (in the case of 8051) and connecting the output of the AND gate to the Output Enable (RD\) pin of the RAM chip.
The WR\ signal of the target processor is interfaced to the WR\ signal of the Von-Neumann RAM. Monitor ROM size varies in the range of a few kilobytes (usually 4K). An address decoder circuit maps the address range allocated to the monitor ROM and activates the Chip Select (CS\) of the ROM if the address is within the range specified for the Monitor ROM.
A user program is normally loaded at location 0x4000 or 0x8000.
The address decoder circuit ensures the enabling of the RAM chip (CS\) when the address range is outside that allocated to the ROM monitor.
Though there are two memory chips (Monitor ROM chip and Von-Neumann RAM), the total memory map available for both of them will be 64K for a processor/controller with a 16-bit address space, and the memory decoder units take care of avoiding conflicts in accessing both. While developing user programs for monitor ROM based systems, special care should be taken to offset the user code and handle the interrupt vectors.
The target development IDE will help in resolving this. During firmware execution and single stepping, the user code may have to be altered, and hence the firmware is always downloaded into a Von-Neumann RAM in monitor ROM-based debugging systems.
Monitor ROM-based debugging is suitable only for development work and it is not a
good choice for mass produced systems.
1. The entire memory map is converted into a Von-Neumann model and it is shared
between the monitor ROM, monitor program data memory, monitor-program trace
buffer, user written firmware and external user memory.
For 8051, the original Harvard architecture supports 64K code memory and 64K
external data memory (Total 128K memory map).
Going for a monitor based debugging shrinks the total available memory to 64K Von-
Neumann memory and it needs to accommodate all kinds of memory requirement
(Monitor Code, monitor data, trace buffer memory, User code and External User data
memory).
2. One serial port of the target processor becomes dedicated to the monitor application and it cannot be used for any other device interfacing.
‘Simulator’ is a software application that precisely duplicates (mimics) the target CPU and
simulates the various features and instructions supported by the target CPU, whereas an
‘Emulator’ is a self-contained hardware device which emulates the target CPU.
The emulator hardware contains necessary emulation logic and it is hooked to the debugging
application running on the development PC on one end and connects to the target board
through some interface on the other end.
In summary, the simulator ‘simulates’ the target board CPU and the emulator ‘emulates’ the
target board CPU.
In olden days emulators were defined as special hardware devices used for emulating the functionality of a processor/controller and performing various debug operations like halting firmware execution, setting breakpoints, getting or setting internal RAM/CPU registers, etc.
Nowadays pure software applications which perform the functions of a hardware emulator are also called ‘Emulators’ (though they are ‘Simulators’ in operation).
The emulator application for emulating the operation of a PDA phone for application
development is an example of a ‘Software Emulator’.
The debugger application may be part of the Integrated Development Environment (IDE) or a
third party supplied tool.
Most of the IDEs incorporate debugger support for some of the emulators commonly
available in the market.
The Emulator POD forms the heart of any emulator system and it contains the following
functional units.
Emulation Device
Emulation device is a replica of the target CPU which receives various signals from
the target board through a device adaptor connected to the target board and performs
the execution of firmware under the control of debug commands from the debug
application.
The emulation device can be either a standard chip same as the target processor (e.g.
AT89C51) or a Programmable Logic Device (PLD) configured to function as the
target CPU.
If a standard chip is used as the emulation device, the emulation will provide real-time execution behaviour.
At the same time the emulator becomes dedicated to that particular device and cannot
be re-used for the derivatives of the same chip.
PLD-based emulators can easily be re-configured for use with derivatives of the target CPU under consideration.
By simply loading the configuration file of the derivative processor/controller, the
PLD gets re-configured and it functions as the derivative device.
A major drawback of PLD-based emulator is the accuracy of replication of target
CPU functionalities. PLD-based emulator logic is easy to implement for simple target
CPUs but for complex target CPUs it is quite difficult.
Emulation Memory
The common features of trace buffer memory and trace buffer data viewing are listed below:
Emulator Control Logic
Emulator control logic is the logic circuits used for implementing complex hardware breakpoints, trace buffer trigger detection, trace buffer control, etc.
Emulator control logic circuits are also used for implementing logic analyser
functions in advanced emulator devices.
The ‘Emulator POD’ is connected to the target board through a ‘Device adaptor’ and
signal cable.
Device Adaptors
Device adaptors act as an interface between the target board and emulator POD.
Device adaptors are normally pin-to-pin compatible sockets which can be
inserted/plugged into the target board for routing the various signals from the pins
assigned for the target processor. The device adaptor is usually connected to the
emulator POD using ribbon cables.
The adaptor type varies depending on the target processor’s chip package. DIP,
PLCC, etc. are some commonly used adaptors.
On Chip Debugging (OCD)
Though OCD adds silicon complexity and a cost factor, from a developer perspective it is a very good feature supporting fast and efficient firmware debugging.
The On Chip Debug facilities integrated to the processor/controller are chip vendor
dependent and most of them are proprietary technologies like Background Debug
Mode (BDM), OnCE, etc.
Some vendors add ‘on chip software debug support’ through JTAG (Joint Test Action
Group) port.
Processors/controllers with OCD support incorporate a dedicated debug module to the
existing architecture.
Usually the on-chip debugger provides the means to set simple breakpoints, query the internal state of the chip and single step through code.
OCD module implements dedicated registers for controlling debugging.
An On Chip Debugger can be enabled by setting the OCD enable bit (The bit name
and register holding the bit varies across vendors).
Debug related registers are used for debugger control (Enable/disable single stepping,
Freeze execution, etc.) and breakpoint address setting.
BDM and JTAG are the two commonly used interfaces to communicate between the
Debug application running on Development PC and OCD module of target CPU.
Some interface logic in the form of hardware will be implemented between the CPU OCD interface and the host PC to capture the debug information from the target CPU and send it to the debugger application running on the host PC.
The interface between the hardware and PC may be Serial/Parallel/USB.
The following section will give you a brief introduction about Background Debug
Mode (BDM) and JTAG interface used in On Chip Debugging.
Background Debug Mode (BDM) interface is a proprietary On Chip Debug solution
from Motorola. BDM defines the communication interface between the chip resident
debug core and host PC where the BDM compatible remote debugger is running.
BDM makes use of a 10 or 26 pin connector to connect to the target board.
Serial data in (DSI), Serial data out (DSO) and Serial clock (DSCLK) are the three major signal lines used in BDM.
DSI sends debug commands serially to the target processor from the remote debugger
application and DSO sends the debug response to the debugger from the processor.
Synchronisation of serial transmission is done by the serial clock DSCLK generated
by the debugger application.
Debugging is controlled by BDM specific debug commands.
The debug commands are usually 17-bit wide; 16 bits are used for representing the command and 1 bit for status/control.
Chips with JTAG debug interface contain a built-in JTAG port for communicating
with the remote debugger application.
JTAG is the acronym for Joint Test Action Group. JTAG is the alternate name for the IEEE 1149.1 standard.
Like BDM, JTAG is also a serial interface:
Test Data In (TDI): It is used for sending debug commands serially from remote
debugger to the target processor.
Test Data Out (TDO): Transmit debug response to the remote debugger from target
CPU.
Test Clock (TCK): Synchronises the serial data transfer.
Test Mode Select (TMS): Sets the mode of testing.
Test Reset (TRST): It is an optional signal line used for resetting the target CPU.
The serial data transfer rate for JTAG debugging is chip dependent. It is usually within the range of 10 to 1000 MHz.
Hardware debugging involves the monitoring of various signals of the target board
(address/data lines, port pins, etc.), checking the interconnection among various
components, circuit continuity checking, etc.
The various hardware debugging tools used in Embedded Product Development are
explained below.
Multimeter
A multimeter is used for measuring various electrical quantities like voltage (both AC and DC), current (DC as well as AC), resistance, capacitance, continuity checking, transistor checking, cathode and anode identification of a diode, etc.
Any multimeter will work over a specific range for each measurement.
A multimeter is the most valuable tool in the toolkit of an embedded hardware developer.
It is the primary debugging tool for physical contact based hardware debugging and
almost all developers start debugging the hardware with it.
In embedded hardware debugging it is mainly used for checking the circuit continuity
between different points on the board, measuring the supply voltage, checking the
signal value, polarity, etc.
Both analog and digital versions of the multimeter are available.
The digital version is preferred over the analog one for various reasons like readability, accuracy, etc.
Fluke, Rishab, Philips, etc. are the manufacturers of commonly available high quality
digital multimeters.
Digital CRO
Monitoring the crystal oscillator signal from the target board is a typical example of
the usage of CRO for waveform capturing and analysis in target board debugging.
CROs are available in both analog and digital versions.
Though Digital CROs are costly, featurewise they are best suited for target board
debugging applications.
Digital CROs are available for high frequency support and they also incorporate
modern techniques for recording waveform over a period of time, capturing waves on
the basis of a configurable event (trigger) from the target board (e.g. High to low
transition of a port pin of the target processor).
Most of the modern digital CROs contain more than one channel and it is easy to
capture and analyse various signals from the target board using multiple channels
simultaneously.
Various measurements like phase, amplitude, etc. are also possible with CROs.
Tektronix, Agilent, Philips, etc. are the manufacturers of high precision good quality
digital CROs.
Logic Analyser
Function Generator
BOUNDARY SCAN
As the complexity of the hardware increases, the number of chips present on the board and the interconnections among them also increase.
The device packages used in the PCB become miniature to reduce the total board space occupied by them, and multiple layers may be required to route the interconnections among the chips.
With miniature device packages and multiple layers for the PCB it will be very difficult to
debug the hardware using magnifying glass, multimeter, etc. to check the interconnection
among the various chips.
Boundary scan is a technique used for testing the interconnection among the various chips,
which support JTAG interface, present in the board. Chips which support boundary scan
associate a boundary scan cell with each pin of the device.
A JTAG port which contains the five signal lines namely TDI, TDO, TCK, TRST and TMS forms the Test Access Port (TAP) for a JTAG supported chip.
The PCB also contains a TAP for connecting the JTAG signal lines to the external world.
A boundary scan path is formed inside the board by interconnecting the devices through
JTAG signal lines.
The TDI pin of the TAP of the PCB is connected to the TDI pin of the first device.
The TDO pin of the first device is connected to the TDI pin of the second device.
In this way all devices are interconnected and the TDO pin of the last JTAG device is
connected to the TDO pin of the TAP of the PCB.
The clock line TCK and the Test Mode Select (TMS) line of the devices are connected to the clock line and Test Mode Select line of the Test Access Port of the PCB respectively.
As mentioned earlier, each pin of the device associates a boundary scan cell with it.
The boundary scan cell associated with the input pins of an IC is known as ‘input cells’ and
the boundary scan cells associated with the output pins of an IC is known as ‘output cells’.
The boundary scan cells can be used for capturing the input pin signal state and passing it to
the internal circuitry, capturing the signals from the internal circuitry and passing it to the
output pin, and shifting the data received from the Test Data In pin of the TAP.
The boundary scan cells associated with the pins are interconnected and they form a chain
from the TDI pin of the device to its TDO pin.
The boundary scan cells can be operated in Normal, Capture, Update and Shift modes.
In the Normal mode, the input of the boundary scan cell appears directly at its output.
In the Capture mode, the boundary scan cell associated with each input pin of the chip
captures the signal from the respective pins to the cell and the boundary scan cell associated
with each output pin of the chip captures the signal from the internal circuitry.
In the Update mode, the boundary scan cell associated with each input pin of the chip passes
the already captured data to the internal circuitry and the boundary scan cell associated with
each output pin of the chip passes the already captured data to the respective output pin.
In the shift mode, data is shifted from TDI pin to TDO pin of the device through the
boundary scan cells.
ICs supporting boundary scan contain additional boundary scan related registers for
facilitating the boundary scan operation.
Instruction Register, Bypass Register, Identification Register, etc. are examples of boundary
scan related registers.
The Instruction Register is used for holding and processing the instruction received over the
TAP.
The bypass register is used for bypassing the boundary scan path of the device and directly interconnecting the TDI pin of the device to its TDO. It disconnects a device from the boundary scan path.
Different instructions are used for testing the interconnections and the functioning of the chip.
Extest, Bypass, Sample and Preload, Intest, etc. are examples of instructions for different types of boundary scan tests, whereas the instruction Runbist is used for performing a self-test of the internal functioning of the chip.
Boundary Scan Description Language (BSDL) is used for implementing boundary scan tests using JTAG.
The BSDL file (a file which describes the boundary scan implementation for a device) for a JTAG compliant device is supplied by the device manufacturer or can be downloaded from an internet repository.
The BSDL file is used as the input to a Boundary Scan Tool for generating boundary scan
test cases for a PCB.
Automated tools are available for boundary scan test implementation from multiple vendors.
The ScanExpress™ Boundary Scan (JTAG) product from Corelis Inc. (www.corelis.com) is a popular tool for boundary scan test implementation.