Machine code instructions
We need to start with a few facts:
• The only language that the CPU recognises is machine code.
• Machine code consists of a sequence of instructions.
• An instruction contains an opcode.
• An instruction may have no operand, but up to three operands are possible.
• Different processors have different instruction sets associated with them.
• Different processors will have comparable instructions for the same operations, but the coding of the
instructions will be different.
For a particular processor, the following must be defined for each individual machine code instruction:
• the total number of bits or bytes for the whole instruction
• the number of bits that define the opcode
• the number of operands that are defined in the remaining bits
• whether the opcode occupies the most significant or the least significant bits.
We will consider a simple system where there is either one or zero operands. This simple system will be assumed
to have a 16-bit address bus width.
The number of bits needed for the opcode depends on the number of different opcodes in the instruction set for the
processor. The opcode can be structured with the first few bits defining the operation and the remaining bits
associated with addressing. A sensible instruction format for our simple processor is shown in Figure 6.01.
Figure 6.01 A simple instruction format
This has an eight-bit opcode consisting of four bits for the operation, two bits for the address mode and the
remaining two bits for addressing registers. This allows 16 different operations each with one of four addressing
modes. This opcode will occupy the most significant bits in the instruction. Because in some circumstances the
operand will be a memory address it is sensible to allocate 16 bits for it. This is in keeping with the 16-bit address
bus.
When an instruction arrives in the CPU the control unit checks the opcode to see what action it defines. This first
step in the decode stage of the fetch–execute cycle can be described using the register transfer notation. However,
a slight amendment is needed to the format. The following shows the transfer of bits 16 to 23, which represent the
opcode, from the current instruction register to the control unit:
CU ← [CIR(23:16)]
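The field extraction described above can be sketched in Python. This is only an illustration: it assumes a 24-bit instruction with the opcode in bits 23 to 16 and, within the opcode, the operation bits first, then the address mode bits, then the register bits. The function name decode and the sample instruction are hypothetical.

```python
def decode(instruction: int) -> dict:
    """Split a 24-bit instruction into the fields of the simple format."""
    opcode = (instruction >> 16) & 0xFF   # CIR(23:16), the eight-bit opcode
    operand = instruction & 0xFFFF        # CIR(15:0), the 16-bit operand
    return {
        "operation": (opcode >> 4) & 0xF,  # four bits: one of 16 operations
        "mode": (opcode >> 2) & 0x3,       # two bits: one of four address modes
        "register": opcode & 0x3,          # two bits: register addressing
        "operand": operand,
    }


# Hypothetical instruction: operation 0101, mode 01, register 10, operand 200
word = (0b01010110 << 16) | 200
print(decode(word))
```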
Assembly language
A programmer might wish to write a program where the actions taken by the processor are directly controlled. It is
argued that this is the most efficient type of program. However, writing a substantial program as a sequence of
machine code instructions would take a very long time and there would inevitably be many errors along the way.
The solution for this type of programming is to use assembly language. As well as having a uniquely defined
machine code language, each processor has its own assembly language.
The essence of assembly language is that for each machine code instruction there is an equivalent assembly
language instruction which comprises:
• a mnemonic (a symbolic abbreviation) for the opcode
• a character representation for the operand.
If a program has been written in assembly language it has to be translated into machine code before it can be
executed by the processor. The translation program is called an assembler.
Using an assembly language, the programmer has the advantage of the coding being easier to write than it would
have been in machine code. In addition, the use of the assembler allows a programmer to include some special
features in an assembly language program. Examples of some of these are:
• comments
• symbolic names for constants
• labels for addresses
• macros
• directives.
A macro is a sequence of instructions that is to be used more than once in a program. A directive is an instruction
to the assembler as to how it should construct the final executable machine code. This might be to direct how
memory should be used or to define files or procedures that will be used.
Symbolic, relative and absolute addressing
When considering how an assembler would convert an assembly language program into machine code it is
necessary to understand the difference between symbolic, relative and absolute addressing. To explain these, we
can consider a simple assembly language program which totals single numbers input at the keyboard.
The use of symbolic addressing allows a programmer to write some assembly language code without having to
bother about where the code will be stored in memory when the program is run. However, it is possible to write
assembly language code where the symbolic addressing is replaced by either relative addressing or absolute
addressing. Table 6.02 shows the simple code from Table 6.01 converted to use these alternative approaches.
Table 6.02 A simple assembly language program using relative and absolute addressing
For the relative addressing example, the assumption is that a special-function base register BR contains the base
address. The contents of this register can then be used as indicated by [BR]. Note that there are no labels for the
code. The left-hand column is just for illustration: it identifies the offset from the base address, which is the address of
the first instruction in the program.
For the absolute address example there are again no labels for the code. The left-hand column is again just for
illustration but this time identifying actual memory addresses. This has been coded with the understanding that the
first instruction in the program is to be stored at memory address 200.
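The link between the two approaches can be shown with a short Python sketch. It assumes that an absolute address is simply the base register content plus the offset; the function name relative_to_absolute is hypothetical, and the base address 200 is taken from the text.

```python
def relative_to_absolute(offsets, base_register):
    """Return the absolute address for each offset from the base address."""
    return [base_register + offset for offset in offsets]


# Assumption: the first instruction is stored at address 200, as in the text.
BR = 200
print(relative_to_absolute([0, 1, 2, 3], BR))
```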
The assembly process for a two-pass assembler
For any assembler there are a number of things that have to be done with the assembly language code before any
translation can be done. Some examples are:
• removal of comments
• replacement of a macro name used in an instruction by the list of instructions that constitute the macro
definition
• removal and storage of directives to be acted upon later.
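The first two of these steps can be sketched in Python. The syntax is an assumption made only for illustration: comments are taken to start with a semicolon, and a macro is taken to be invoked by writing its name alone on a line. The function name preprocess and the macro INC2 are hypothetical, and the handling of directives is left out.

```python
def preprocess(lines, macros):
    """Remove comments and replace macro names by their instruction lists."""
    cleaned = []
    for line in lines:
        code = line.split(";", 1)[0].strip()   # drop everything after ';'
        if not code:
            continue                           # the line held only a comment
        if code in macros:
            cleaned.extend(macros[code])       # expand the macro in place
        else:
            cleaned.append(code)
    return cleaned


# Hypothetical macro that increments the accumulator twice
macros = {"INC2": ["INC ACC", "INC ACC"]}
source = ["IN      ; read a character", "INC2", "; a whole-line comment", "END"]
print(preprocess(source, macros))
```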
A two-pass assembler is designed to handle programs written in the style of the one shown in Table 6.01. This program contains
forward references. Some of the instructions have a symbolic address for the operand where the location of the
address is not known at that stage of the program. A two-pass assembler is needed so that in the first pass the
location of the addresses for forward references can be identified.
To achieve this during the first pass the assembler uses a symbol table. The code is read line by line. When a
symbolic address is met for the first time its name is entered into the symbol table. Alongside the name a
corresponding address has to be added as soon as that can be identified. Table 6.03 shows a possible format for
the symbol table that would be created for the program.
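A first pass of this kind might be sketched in Python as follows. The source format is an assumption: a label is written with a trailing colon, each line occupies one address starting from zero, and comments and macros have already been dealt with. The names first_pass and REGISTERS and the sample program are hypothetical.

```python
REGISTERS = {"ACC", "IX"}   # assumption: register names are not symbolic addresses


def first_pass(lines):
    """Build a symbol table mapping each symbolic address to its location."""
    symbols = {}
    for address, line in enumerate(lines):
        parts = line.split()
        if parts[0].endswith(":"):
            symbols[parts[0][:-1]] = address      # label found: address now known
            parts = parts[1:]
        if len(parts) > 1:
            operand = parts[1]
            if operand[0].isalpha() and operand not in REGISTERS:
                # Forward reference: enter the name, address still unknown
                symbols.setdefault(operand, None)
    return symbols


program = ["LOOP: IN", "ADD TOTAL", "STO TOTAL", "JMP LOOP", "TOTAL: 0"]
print(first_pass(program))
```

In this sketch TOTAL is entered into the table when ADD TOTAL is read, with no address, and the address is filled in only when the label is reached on the last line.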
Table 6.04 An opcode lookup table
Provided that no errors have been identified, the output from the second pass will be a machine code program. For
our example, this code is shown in Table 6.05 along with the original assembly code for comparison.
Table 6.05 Machine code created from assembly code
Some points to note are as follows.
• Most of the instructions have an operand which is a 16-bit binary number.
• Usually this represents an address but for the SUB and LDM instructions the operand is used as a value.
• There is no operand for the IN and END instructions.
• The INC instruction is a special case. There is an operand in the assembly language code but this just
identifies a register. In the machine code the register is identified within the opcode so no operand is
needed.
• The machine code has been coded with the first instruction occupying address zero. This code is not
executable in this form but it is valid output from the assembler.
• Changes will be needed for the addresses when the program is loaded into memory ready for it to be
executed.
• Three memory locations following the program code have been allocated a value zero to ensure that they
are available for use by the program when it is executed.
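A second pass might be sketched in Python as follows. The opcode values in OPCODES are invented for illustration and are not those of Table 6.04. The sketch assumes an 8-bit opcode followed by a 16-bit operand, with the first instruction at address zero, and it does not handle the INC special case.

```python
# Hypothetical opcode lookup table (values are illustrative only)
OPCODES = {"IN": 0x10, "ADD": 0x20, "STO": 0x30, "JMP": 0x40, "END": 0xF0}


def second_pass(lines, symbols):
    """Translate each line into a 24-bit word using the symbol table."""
    words = []
    for line in lines:
        parts = line.split()
        if parts[0].endswith(":"):
            parts = parts[1:]                     # labels are no longer needed
        mnemonic = parts[0]
        if mnemonic not in OPCODES:               # a data value such as "0"
            words.append(int(mnemonic) & 0xFFFFFF)
            continue
        operand = 0                               # used when there is no operand
        if len(parts) > 1:
            token = parts[1]
            if token in symbols:
                operand = symbols[token]          # symbolic address resolved
            else:
                operand = int(token.lstrip("#"))  # denary value
        words.append((OPCODES[mnemonic] << 16) | (operand & 0xFFFF))
    return words


program = ["LOOP: IN", "ADD TOTAL", "STO TOTAL", "JMP LOOP", "TOTAL: 0"]
symbols = {"LOOP": 0, "TOTAL": 4}                 # as produced by a first pass
print([format(word, "024b") for word in second_pass(program, symbols)])
```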
Addressing modes
When an instruction requires a value to be loaded into a register there are different ways of identifying the value.
Each one is known as an addressing mode. It was stated that, for our simple processor, two bits of the opcode in
a machine code instruction would be used to define the addressing mode. This allows four different modes.
Table 6.06 Addressing modes
For immediate addressing there are three options for defining the value:
• #48 specifies the denary value 48
• #B00110000 specifies the binary equivalent
• #&30 specifies the hexadecimal equivalent
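The three notations can be checked with a short Python sketch. The function name parse_immediate is hypothetical; the prefixes are the ones listed above.

```python
def parse_immediate(text: str) -> int:
    """Convert an immediate operand such as #48, #B00110000 or #&30 to an integer."""
    if not text.startswith("#"):
        raise ValueError("an immediate operand must start with #")
    body = text[1:]
    if body.startswith("B"):
        return int(body[1:], 2)    # binary
    if body.startswith("&"):
        return int(body[1:], 16)   # hexadecimal
    return int(body, 10)           # denary


for operand in ("#48", "#B00110000", "#&30"):
    print(operand, parse_immediate(operand))
```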
Assembly language instructions
We continue to consider a simple processor with a limited instruction set. The examples described here do not
correspond directly to those found in the assembly language for any specific processor. Individual instructions will
have a match in more than one real-life set. The important point is that these examples are representative. In
particular, there are examples of the most common categories of instruction.
Data movement - These types of instruction can involve loading data into a register or storing data in memory.
Table 6.07 contains a few examples of the format of the instructions with explanations.
Table 6.07 Some instruction formats for data movement
The important point to note is that the mnemonic defines the instruction type including which register is involved
and, where appropriate, the addressing mode. It is important to read the mnemonic carefully! The instruction will
have an actual address where <address> is shown, a register abbreviation where <register> is shown and a denary
value for n where #n is shown. The explanations use ACC to indicate the accumulator. For explanations of LDD,
LDI and LDX, refer back to Table 6.06.
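The effect of the different load instructions can be sketched in Python. The sketch assumes the usual meanings: LDM loads the operand itself (immediate), LDD loads the contents of the address (direct), LDI uses the contents of the address as a second address (indirect) and LDX adds the index register to the address (indexed). The function name load and the memory contents are hypothetical.

```python
def load(mnemonic, operand, memory, ix=0):
    """Return the value placed in ACC by a load instruction."""
    if mnemonic == "LDM":
        return operand                    # immediate: the operand is the value
    if mnemonic == "LDD":
        return memory[operand]            # direct: contents of the address
    if mnemonic == "LDI":
        return memory[memory[operand]]    # indirect: address holds an address
    if mnemonic == "LDX":
        return memory[operand + ix]       # indexed: address plus index register
    raise ValueError("unknown load instruction: " + mnemonic)


# Hypothetical memory contents, keyed by address
memory = {100: 102, 101: 7, 102: 9, 103: 11}
for mnemonic in ("LDM", "LDD", "LDI", "LDX"):
    print(mnemonic, load(mnemonic, 100, memory, ix=3))
```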
Input and output - There are two instructions provided for input or output. In each case the instruction has only an
opcode; there is no operand.
• The instruction with opcode IN is used to store in the ACC the ASCII value of a character typed at the
keyboard.
• The instruction with opcode OUT is used to display on the screen the character for which the ASCII code is
stored in the ACC.
Comparisons and jumps - A program might need an unconditional jump or might need a jump if a condition is
met.
Table 6.08 Jump and compare instruction formats
Note that the comparison is restricted to asking if two values are equal. The result of the comparison is recorded by
a flag in the status register. The execution of the conditional jump instruction begins by checking whether or not the
flag bit has been set. This jump instruction does not cause an immediate jump. This is because a new value has to
be supplied to the program counter so that the next instruction is fetched from this newly specified address. The
incrementing of the program counter that took place automatically when the instruction was fetched is overwritten.
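The interaction between the flag and the program counter can be sketched in Python. The function name and parameters are hypothetical, and the sketch models only a jump that is taken when the compared values are equal.

```python
def conditional_jump(acc, compare_value, pc_after_fetch, target):
    """Return the program counter value after a compare and a jump-if-equal."""
    flag = (acc == compare_value)   # the comparison sets or clears the flag
    pc = pc_after_fetch             # PC was already incremented during the fetch
    if flag:
        pc = target                 # the incremented value is overwritten
    return pc


print(conditional_jump(75, 75, 111, 103))   # flag set: next fetch is from 103
print(conditional_jump(70, 75, 111, 103))   # flag not set: execution continues at 111
```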
Arithmetic operations - There are no instructions for general-purpose multiplication or division. General-purpose
addition and subtraction are catered for. Table 6.09 contains the instruction formats used for arithmetic operations.
Table 6.09 Instruction formats for arithmetic operations
Figure 6.03 shows a program to find out how many times 5 divides into 75. The following should be noted
concerning the program.
• The first three instructions initialise the count and the sum.
• The instruction in address 103 is the one that is returned to in each iteration of the loop; in the first iteration
it is loading the value 0 into the accumulator when this value is already stored but this cannot be avoided.
• The next three instructions are increasing the count by 1 and storing the new value. Instructions 106 to 108
add 5 to the sum.
• Instructions 109 and 110 check to see if the sum has reached 75 and if it has not the program begins the
next iteration of the loop.
• Instructions 111 to 113 are only used when the sum has reached 75 which causes the value 15 stored for
the count to be output.
Figure 6.03 A program to calculate the result of dividing 75 by 5
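The logic of this program, division by repeated addition, can be mirrored in Python. This is a sketch of the algorithm only, not a translation of the assembly code; the function name count_divisions is hypothetical. A guard is added because a loop that tests only for equality would never end if the total were not an exact multiple of the step.

```python
def count_divisions(total=75, step=5):
    """Count how many times step must be added to reach total exactly."""
    count = 0                    # initialise the count
    running_sum = 0              # initialise the sum
    while True:
        count += 1               # increase the count by 1
        running_sum += step      # add the step to the sum
        if running_sum == total: # compare: has the sum reached the total?
            return count
        if running_sum > total:  # guard not present in the simple program
            raise ValueError("total is not an exact multiple of step")


print(count_divisions())   # 15
```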
Shift operations - There are two shift instructions available:
• LSL #n, where the bits in the accumulator are shifted logically n places to the left
• LSR #n, where the bits are shifted to the right.
In a logical shift no consideration is given as to what the binary code in the accumulator represents. Because a
shift operation moves a bit from the accumulator into the carry bit in the status register this can be used to examine
individual bits. For a left logical shift, the most significant bit is moved to the carry bit, the remaining bits are shifted
left and a zero is entered for the least significant bit. For a right logical shift, it is the least significant bit that is
moved to the carry bit and a zero is entered for the most significant bit.
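The two logical shifts can be sketched in Python. The sketch assumes an 8-bit accumulator, which is an assumption made for illustration; the function names lsl and lsr are hypothetical. Each function returns the new accumulator content together with the carry bit left by the last place shifted.

```python
def lsl(acc, n=1):
    """Logical shift left of an 8-bit value; returns (accumulator, carry)."""
    carry = 0
    for _ in range(n):
        carry = (acc >> 7) & 1     # most significant bit moves to the carry bit
        acc = (acc << 1) & 0xFF    # a zero enters at the least significant bit
    return acc, carry


def lsr(acc, n=1):
    """Logical shift right of an 8-bit value; returns (accumulator, carry)."""
    carry = 0
    for _ in range(n):
        carry = acc & 1            # least significant bit moves to the carry bit
        acc >>= 1                  # a zero enters at the most significant bit
    return acc, carry


print(lsl(0b00110001))   # (98, 0): 49 doubled
print(lsr(0b00110001))   # (24, 1): 49 halved, remainder 1 in the carry bit
```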
If the accumulator content represents an unsigned integer, the left shift operation is a fast way to multiply by two.
However, this only gives a correct result if the most significant bit is a zero. For an unsigned integer the right shift
represents integer division by two. For example, consider:
00110001 (denary 49), if right shifted, gives 00011000 (denary 24)
The remainder from the division can be found in the carry bit. Again, the division will not always give a correct
result; continuing right shifts will eventually produce a zero for every bit. It should be apparent that a logical shift
cannot be used for multiplication or division by two when a signed integer is stored. This is because the operation
may produce a result where the sign of the number has changed.
As indicated earlier, only the two logical shifts are available for the simple processor considered here. However, in
more complex processors there is likely to be a cyclic shift capability. Here a bit moves off one end into the carry
bit then one step later moves in at the other end. All bit values in the original code are retained. Left and right
arithmetic shifts are also likely to be available. These work in a similar way to logical shifts, but are provided for
the multiplication or division of a signed integer by two. The sign bit is always retained following the shift.
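A cyclic shift and an arithmetic shift can be sketched in Python for comparison. The sketch assumes an 8-bit accumulator holding a two's complement value; the function names are hypothetical, and only a left cyclic shift and a right arithmetic shift are shown.

```python
def cyclic_shift_left(acc, carry):
    """Rotate an 8-bit value left through the carry bit; returns (acc, carry)."""
    new_carry = (acc >> 7) & 1              # bit leaving the left end
    acc = ((acc << 1) & 0xFF) | carry       # previous carry enters at the right end
    return acc, new_carry


def arithmetic_shift_right(acc):
    """Shift an 8-bit two's complement value right, keeping the sign bit."""
    carry = acc & 1
    acc = (acc >> 1) | (acc & 0x80)         # the sign bit is copied back in
    return acc, carry


print(cyclic_shift_left(0b10000001, 0))      # (2, 1): the 1 waits in the carry bit
print(cyclic_shift_left(0b00000010, 1))      # (5, 0): one step later it re-enters
print(arithmetic_shift_right(0b11110000))    # (248, 0): -16 becomes -8
```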
Bitwise logic operations - The options for these are described in Table 6.10.
Table 6.10 Bitwise logical operation instructions
The operand for a bitwise logic operation instruction is referred to as a mask because it can effectively cover some
of the bits and only affect specific bits.
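The use of a mask can be sketched in Python. The sketch assumes that the operations available are AND, OR and XOR and that the accumulator holds eight bits; the function name apply_mask and the bit patterns are hypothetical.

```python
def apply_mask(acc, operation, mask):
    """Apply a bitwise operation between an 8-bit accumulator and a mask."""
    if operation == "AND":
        return acc & mask      # keeps only the bits where the mask has a 1
    if operation == "OR":
        return acc | mask      # sets the bits where the mask has a 1
    if operation == "XOR":
        return acc ^ mask      # inverts the bits where the mask has a 1
    raise ValueError("unknown operation: " + operation)


acc = 0b10110110
print(format(apply_mask(acc, "AND", 0b00001111), "08b"))   # 00000110
print(format(apply_mask(acc, "OR", 0b00000001), "08b"))    # 10110111
print(format(apply_mask(acc, "XOR", 0b11110000), "08b"))   # 01000110
```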
Further consideration of assembly language instructions
Register transfer notation - The register transfer notation used earlier can be extended to describe the execution
of an instruction. For example, the LDD instruction is described by:
ACC ← [[CIR(15:0)]]
The instruction is in the CIR and only the 16-bit address needs to be examined to identify the location of the data in
memory. The contents of that location are transferred into the accumulator.
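The double square brackets can be read as two steps, which a short Python sketch makes explicit. Memory is modelled as a dictionary keyed by address; the function name execute_ldd, the opcode value and the memory contents are hypothetical.

```python
def execute_ldd(cir, memory):
    """Model ACC <- [[CIR(15:0)]] for a 24-bit instruction held in the CIR."""
    address = cir & 0xFFFF     # [CIR(15:0)]: the 16-bit address in the instruction
    return memory[address]     # [[CIR(15:0)]]: the contents of that location


# Hypothetical instruction with an invented opcode and the address 200
cir = (0x12 << 16) | 200
acc = execute_ldd(cir, {200: 99})
print(acc)   # 99
```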
Computer arithmetic - The worked examples illustrate how the values stored in the Status Register can identify a
specific overflow condition.
The use of the following three flags is required:
• the carry flag, identified as C, which is set to 1 if there is a carry
• the negative flag, identified as N, which is set to 1 if a result is negative
• the overflow flag, identified as V, which is set to 1 if overflow is detected.
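The setting of these flags can be sketched in Python for the addition of two 8-bit two's complement values. The word length and the function name add_with_flags are assumptions made for illustration. Overflow is detected here when both inputs have the same sign and the result has the opposite sign.

```python
def add_with_flags(a, b, bits=8):
    """Add two bit patterns and return (result, flags) with flags C, N and V."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    raw = (a & mask) + (b & mask)
    result = raw & mask
    carry = int(raw > mask)                  # a carry out of the top bit
    negative = int(bool(result & sign))      # the sign bit of the result is 1
    # Overflow: the inputs share a sign bit that differs from the result's
    overflow = int(bool(~(a ^ b) & (a ^ result) & sign))
    return result, {"C": carry, "N": negative, "V": overflow}


print(add_with_flags(0b01100000, 0b01000000))   # two positives give a negative
print(add_with_flags(0b10000000, 0b10000000))   # two negatives give zero
print(add_with_flags(0b11111111, 0b00000001))   # -1 + 1: carry but no overflow
```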