8/29/2022
Ho Chi Minh City University of Technology
Department of Electrical and Electronics
1. History of CPUs
2. Intel x86 Processors
3. ARM processors
4. Memory
5. Computer Software
1. History of CPUs
1950s:
Ferranti Mark 1, 1951: from University of Manchester
Single 80-bit accumulator and a 40-bit "multiplicand/quotient register"
UNIVAC I (UNIVersal Automatic Computer I) designed
principally by J. Presper Eckert and John Mauchly, the
inventors of the ENIAC
1,905 operations per second running on a 2.25 MHz clock.
IBM 704 (introduced in 1954)
[Figures: Ferranti Mark 1, c. 1951; an IBM 704 computer at NACA in 1957]
1. History of CPUs
1960s-1970s:
IBM System/360 (S/360), 1964: the Model 30 ran up to 34,500 instructions per second, with memory from 8 to 64 KB
PDP-11 (1970): developed by Digital Equipment Corporation
16-bit processor; later models allow up to 4 MB of physical memory
Motorola 68000 (1979):
Initial speed grades were 4, 6, and 8 MHz.
68k instruction set
[Figures: IBM System/360, PDP-11/40, Motorola MC68000]
1. History of CPUs
1970s:
Intel 4004 (1971): 4-bit CPU
A single instruction cycle was 10.8 microseconds
Maximum clock rate is 740 kHz
Intel 8008 (1972): 8-bit CPU with an external 14-bit address bus
Clock frequency: 0.2 - 0.8 MHz
Intel 8080 (1974): 8-bit CPU with a 16-bit address bus
Clock frequency: 2 MHz
Intel 8086 (1978): 16-bit CPU with a 20-bit address bus
Clock frequency: 5 - 10 MHz
32-bit VAX (1977): based on DEC's earlier PDP-11, supports virtual memory
[Figures: Intel 4004, Intel 8088, Intel 8086]
A Brief History of Computer
Link YouTube: https://www.youtube.com/watch?v=iK0PT5q7GlE
2. Intel x86 Processors
Dominate laptop/desktop/server market
Evolutionary design
Backward compatible all the way back to the 8086, introduced in 1978
Added more features as time goes on
Complex instruction set computer (CISC)
Many different instructions with many different formats
But only a small subset is encountered in Linux programs
Hard to match performance of Reduced Instruction Set Computers (RISC)
But Intel has done just that!
In terms of speed; less so for low power.
Intel x86 Evolution: Milestones
Name               Date   Transistors   MHz
8086               1978   29K           5-10
    First 16-bit Intel processor. Basis for IBM PC & DOS
    1 MB address space
386                1985   275K          16-33
    First 32-bit Intel processor, referred to as IA32
    32-bit data path (16-bit external data bus on the 386SX)
    Added "flat addressing", capable of running Unix
486
    32-bit registers, 32-bit data path
    486DX includes an FPU (Floating Point Unit)
Pentium 4E         2004   125M          2800-3800
    First 64-bit Intel x86 processor, referred to as x86-64
Core 2             2006   291M          1060-3500
    First multi-core Intel processor
Core i3, i5, i7    2008   731M          1700-3900
    Two cores / four cores
Intel x86 Processors, cont.
Machine Evolution
386 1985 0.3M
Pentium 1993 3.1M
Pentium/MMX 1997 4.5M
Pentium Pro 1995 6.5M
Pentium III 1999 8.2M
Pentium 4 2001 42M
Core 2 Duo 2006 291M
Core i7 2008 731M
Added Features
Instructions to support multimedia operations
Instructions to enable more efficient conditional
operations
Transition from 32 bits to 64 bits
More cores
2015 State of the Art
Core i7 Broadwell 2015
Desktop Model
4 cores
Integrated graphics
3.3-3.8 GHz
65W
Server Model
8 cores
Integrated I/O
2-2.6 GHz
45W
2. Intel x86 Processors
8086 processor
40-pin dual in-line package
16-bit wide data bus
16-bit registers
20-bit external address bus provides a 1 MB physical address space
The maximum linear address space is limited to 64 KB, because the internal address registers are only 16 bits wide
Max CPU clock: 5-10 MHz
2. CPU - x86 Processor
CPU, memory, input/output devices
Instruction set, interfacing C to assembly, macros, stack
frame and calling convention
Interrupt, exception
The architecture of 8086 microprocessor
2 major units:
BIU - Bus Interface Unit: bus interface, segment registers, fetch
queue
EU - Execution Unit: control unit, ALU, registers
2. x86 Processors - 8086
Instructions:
One-address or two-address operations
Supports assembly and high-level programming languages (C, Pascal)
Main registers: called data registers or general registers
16-bit data
Each can also be accessed as two 8-bit registers (high byte and low byte)
AH AL AX (primary accumulator)
BH BL BX (base, accumulator)
CH CL CX (counter, accumulator)
DH DL DX (accumulator, other functions)
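The way a 16-bit register splits into two 8-bit halves can be sketched in C. This is only an illustration of the bit layout, not 8086 code; the function names `make_ax`, `high_byte`, and `low_byte` are hypothetical and chosen for this example.

```c
#include <stdint.h>

/* Combine two 8-bit halves into one 16-bit value, the way AH and AL form AX. */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)(((uint16_t)ah << 8) | al);
}

/* Extract the high byte of a 16-bit value (the AH part of AX). */
uint8_t high_byte(uint16_t ax)
{
    return (uint8_t)(ax >> 8);
}

/* Extract the low byte of a 16-bit value (the AL part of AX). */
uint8_t low_byte(uint16_t ax)
{
    return (uint8_t)(ax & 0xFF);
}
```

For example, `make_ax(0x12, 0x34)` yields 0x1234, and the two extractors recover 0x12 and 0x34 from it.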
2. x86 Processors - 8086
Index registers: for addressing
SI Source Index
DI Destination Index
BP Base Pointer
SP Stack Pointer
Program counter:
IP Instruction Pointer
Segment registers:
CS Code Segment
DS Data Segment
ES Extra Segment
SS Stack Segment
2. x86 Processors - 8086
Segment registers:
a way to allow programs to address more than 64 KB
the registers CS, DS, SS, and ES point to the currently used program code
segment (CS), the current data segment (DS), the current stack segment
(SS), and one extra segment determined by the programmer (ES).
CS Code Segment
DS Data Segment
ES Extra Segment
SS Stack Segment
  0110 1000 1000 0111 0000   Segment, 16 bits, shifted 4 bits left
+      0011 0100 1010 1001   Offset, 16 bits
  0110 1011 1101 0001 1001   Address, 20 bits
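The segment:offset calculation above can be sketched as a small C function. The name `physical_address` is hypothetical; the final mask models the 8086 having only 20 address lines, so a sum past 1 MB wraps around.

```c
#include <stdint.h>

/* 8086 address formation: shift the 16-bit segment left by 4 bits,
   add the 16-bit offset, and keep the low 20 bits. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return addr & 0xFFFFF; /* only 20 address lines exist on the 8086 */
}
```

With the values from the slide (segment 0x6887, offset 0x34A9), the function returns 0x6BD19, which matches the 20-bit binary result shown above.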
2. x86 Processors - 8086
[Figure: examples of x86 memory segmentation]
2. x86 Processors - 8086
x86-32: 80386, 80486
Registers extended to 32 bits
EAX, EBX, ECX, EDX
ESI, EDI, EBP, ESP, EIP, EFLAGS
Two new segment registers (FS and GS) were added
FS and GS are extra data segment registers
x86-64: AMD64, Core i5, Core i7
An R prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP)
Adds eight additional 64-bit general registers (R8-R15)
Some History: IA32 Registers
32-bit   16-bit   8-bit        Origin (mostly obsolete)
%eax     %ax      %ah  %al     accumulate
%ecx     %cx      %ch  %cl     counter
%edx     %dx      %dh  %dl     data
%ebx     %bx      %bh  %bl     base
%esi     %si                   source index
%edi     %di                   destination index
%esp     %sp                   stack pointer
%ebp     %bp                   base pointer
%eax through %edi are general purpose
%ax through %bp are 16-bit virtual registers (backwards compatibility)
x86-64 Integer Registers
%rax %eax %r8 %r8d
%rbx %ebx %r9 %r9d
%rcx %ecx %r10 %r10d
%rdx %edx %r11 %r11d
%rsi %esi %r12 %r12d
%rdi %edi %r13 %r13d
%rsp %esp %r14 %r14d
%rbp %ebp %r15 %r15d
Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
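What "low-order 4, 2, and 1 bytes" means can be sketched with C integer truncation. This models only reading part of a 64-bit value; the function names `low32`, `low16`, and `low8` are hypothetical.

```c
#include <stdint.h>

/* Low-order 4 bytes of a 64-bit value, like %eax within %rax. */
uint32_t low32(uint64_t r)
{
    return (uint32_t)r;
}

/* Low-order 2 bytes, like %ax within %rax. */
uint16_t low16(uint64_t r)
{
    return (uint16_t)r;
}

/* Low-order 1 byte, like %al within %rax. */
uint8_t low8(uint64_t r)
{
    return (uint8_t)r;
}
```

For a register holding 0x1122334455667788, the three views are 0x55667788, 0x7788, and 0x88.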
3. ARM Processors
• ARM (Acorn RISC Machine) started as a new, powerful CPU design to replace the 8-bit 6502 in Acorn computers (Cambridge, UK, 1985)
• First models had only a 26-bit program counter, limiting the memory space to 64 MB (not much by today's standards, but a lot at that time).
• 1990 spin-off: ARM renamed Advanced RISC Machines
3. ARM Processors
• ARM now focuses on Embedded CPU cores
• IP licensing: Almost every silicon manufacturer sells
some microcontroller with an ARM core. Some even
compete with their own designs.
• Processing power with low current consumption
• Good MIPS/Watt figure
• Ideal for portable devices
• Compact memories: 16-bit opcodes (Thumb)
• New cores with added features
• Harvard architecture (ARM9, ARM11, Cortex)
• Floating point arithmetic
• Vector computing
• Java language
3. ARM Processors
• 32-bit CPU, Harvard architecture
• 3-operand instructions (typical): ADD Rd,Rn,Operand2
• RISC design:
• Few, simple, instructions
• Load/store architecture (instructions operate on registers, not
memory)
• Large register set
• Pipelined execution
Von Neumann vs. Harvard
[Figure: Von Neumann organization (ARM7s and older) beside Harvard organization (ARM9s and newer). Labels: Inst. and Data paths, I Cache and D Cache, Bus Interface, AHB bus, MEMORY & I/O]
Memory-mapped I/O:
• No specific instructions for I/O (use Load/Store instructions instead)
• Peripheral registers are located at certain memory addresses
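Memory-mapped I/O means a peripheral register is accessed with ordinary loads and stores through a pointer. The sketch below is a minimal illustration in C; the helper names are hypothetical, and on real hardware the pointer would be a fixed address taken from the chip's datasheet rather than an ordinary variable.

```c
#include <stdint.h>

/* Write a peripheral register with an ordinary store.
   'volatile' keeps the compiler from removing or reordering the access. */
void reg_write(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Read a peripheral register with an ordinary load. */
uint32_t reg_read(const volatile uint32_t *reg)
{
    return *reg;
}

/* Set selected bits with a read-modify-write sequence. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg = *reg | mask;
}
```

On a desktop machine these helpers can be tried against a plain `volatile uint32_t` variable standing in for the device register.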
ARM7TDMI Pipeline (3 stages, 1 clock cycle per stage)
FETCH -> DECODE -> EXECUTE (Reg. Read, Shift, ALU, Reg. Write)
ARM9TDMI Pipeline (5 stages, 1 clock cycle per stage)
FETCH -> DECODE (Reg. Read) -> EXECUTE (Shift, ALU) -> MEMORY (access) -> WRITE (Reg. Write)
• Fetch: Read Op-code from memory to internal Instruction Register
• Decode: Activate the appropriate control lines depending on Opcode
• Execute: Do the actual processing
Cycle:           1       2        3         4         5
Instruction 1    FETCH   DECODE   EXECUTE
Instruction 2            FETCH    DECODE    EXECUTE
Instruction 3                     FETCH     DECODE    EXECUTE
(rows: instructions; columns: time)
• Simple instructions (like ADD) complete at a rate of one per cycle
• More complex instructions:
Cycle:    1       2        3         4           5            6         7         8
1 ADD     FETCH   DECODE   EXECUTE
2 STR             FETCH    DECODE    Cal. ADDR   Data Xfer.
3 ADD                      FETCH     stall       DECODE       EXECUTE
4 ADD                                FETCH       stall        DECODE    EXECUTE
5 ADD                                                         FETCH     DECODE    EXECUTE
(rows: instructions; columns: time)
STR : 2 effective clock cycles (+1 cycle)
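The timing in these diagrams can be approximated with a simple count. This is an idealized sketch of a 3-stage pipeline, assuming no branches or memory wait states; `pipeline_cycles` is a hypothetical helper, and `extra_cycles` is the sum of the additional cycles taken by multi-cycle instructions such as STR (+1 each).

```c
/* Cycles needed to finish n instructions on an ideal 3-stage pipeline:
   2 cycles to fill the pipeline, then 1 cycle per instruction,
   plus any extra cycles from multi-cycle instructions. */
unsigned pipeline_cycles(unsigned n_instructions, unsigned extra_cycles)
{
    if (n_instructions == 0) {
        return 0; /* nothing to execute */
    }
    return 2 + n_instructions + extra_cycles;
}
```

Three ADDs finish in 5 cycles, as in the first diagram; five instructions including one STR take one cycle more than five simple instructions would.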
Data Sizes and Instruction Sets
The ARM is a 32-bit architecture.
When used in relation to the ARM:
Byte means 8 bits
Halfword means 16 bits (two bytes)
Word means 32 bits (four bytes)
Most ARMs implement two instruction sets
32-bit ARM Instruction Set
16-bit Thumb Instruction Set
Processor Modes
The ARM has seven operating modes:
User : unprivileged mode under which most tasks run
FIQ : entered when a high priority (fast) interrupt is raised
IRQ : entered when a low priority (normal) interrupt is raised
SVC : (Supervisor) entered on reset and when a Software Interrupt
instruction is executed
Abort : used to handle memory access violations
Undef : used to handle undefined instructions
System : privileged mode using the same registers as user mode
The Registers
ARM has 37 registers, all of which are 32 bits long.
1 dedicated program counter
1 dedicated current program status register
5 dedicated saved program status registers
30 general purpose registers
The current processor mode governs which of several banks is
accessible. Each mode can access
a particular set of r0-r12 registers
a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
the program counter, r15 (pc)
the current program status register, cpsr
Privileged modes (except System) can also access
a particular spsr (saved program status register)
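The totals of 30 and 37 can be checked by counting the register banks. The sketch below assumes the standard banking scheme of the classic ARM architecture: User and System share one set, FIQ has private copies of r8-r12, and each of the six banks has its own r13 and r14. The constant and function names are hypothetical.

```c
enum {
    SHARED_R0_R12 = 13,     /* r0-r12 as seen from User/System mode */
    FIQ_BANKED_R8_R12 = 5,  /* FIQ mode has private copies of r8-r12 */
    SP_LR_BANKS = 6,        /* User/System, FIQ, IRQ, SVC, Abort, Undef */
    SPSR_COUNT = 5          /* one per exception mode, none for User/System */
};

/* General purpose registers: shared set, FIQ copies, and banked sp/lr pairs. */
unsigned arm_general_purpose_registers(void)
{
    return SHARED_R0_R12 + FIQ_BANKED_R8_R12 + 2 * SP_LR_BANKS;
}

/* All registers: general purpose + pc + cpsr + saved status registers. */
unsigned arm_total_registers(void)
{
    return arm_general_purpose_registers() + 1 /* pc */ + 1 /* cpsr */ + SPSR_COUNT;
}
```

The first function gives 13 + 5 + 12 = 30, and the second gives 30 + 1 + 1 + 5 = 37.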
The ARM Register Set
Registers visible in each mode ("banked" means the mode has its own private copy)
Register    User, SYS   FIQ        IRQ        SVC        Undef      Abort
r0 - r7     r0 - r7     (same as User in every mode)
r8 - r12    r8 - r12    banked     (same as User in IRQ, SVC, Undef, Abort)
r13 (sp)    r13 (sp)    banked     banked     banked     banked     banked
r14 (lr)    r14 (lr)    banked     banked     banked     banked     banked
r15 (pc)    r15 (pc)    (same in every mode)
cpsr        cpsr        (same in every mode)
spsr        none        spsr       spsr       spsr       spsr       spsr
Special Registers
Special function registers:
PC (R15): Program Counter. Any instruction with PC as its destination
register is a program branch
LR (R14): Link Register. Saves a copy of PC when executing the BL
instruction (subroutine call) or when jumping to an exception or interrupt
routine
- It is copied back to PC on the return from those routines
SP (R13): Stack Pointer. The ARM architecture has no dedicated hardware stack. Even
so, R13 is usually reserved as a pointer for the program-managed stack
CPSR : Current Program Status Register. Holds the visible status register
SPSR : Saved Program Status Register. Holds a copy of the previous status
register while executing exception or interrupt routines
- It is copied back to CPSR on the return from the exception or interrupt
- No SPSR available in User or System modes
4. Memory
Memory - The purpose of memory is data storage. Two major types of memory:
Primary memory - holds data and instructions during processing
e.g. RAM. Relatively limited capacity and volatile
Secondary memory - provides permanent long-term storage
e.g. hard disk. High capacity and non-volatile
[Figures: RAM banks, hard disk, NAND flash chip]
4. Memory
Primary memory consists of a set of locations defined
by sequentially numbered addresses. Each location
contains a binary number that can be interpreted as data
or an instruction.
8086 uses a 20-bit physical address
Manages 1 MB of memory
80386 uses a 32-bit physical address
Manages 4 GB of memory
x86-64 uses 64-bit addresses (current implementations support fewer physical address bits)
Manages ??? of memory
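The link between address width and memory size is a power of two: n address bits select 2^n byte locations. A minimal sketch follows; `address_space_bytes` is a hypothetical helper that handles widths below 64 bits, since 2^64 itself does not fit in a 64-bit integer.

```c
#include <stdint.h>

/* Number of bytes addressable with the given number of address bits.
   Returns 0 for widths of 64 or more, which cannot be represented here. */
uint64_t address_space_bytes(unsigned address_bits)
{
    if (address_bits >= 64) {
        return 0;
    }
    return (uint64_t)1 << address_bits;
}
```

For 20 bits the result is 1,048,576 bytes (1 MB), and for 32 bits it is 4,294,967,296 bytes (4 GB), matching the 8086 and 80386 figures above.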
Memory locations are called words. Words are 8 bits (one byte) in size, or a multiple of 8. Common word sizes are 16, 32 and 64 bits.
Address   Contents
0         1 0 0 1 0 0 0 1
1         1 1 0 1 0 0 1 1
2         0 1 0 0 0 0 0 0
3         1 0 1 0 0 1 1 1
4         1 1 1 0 1 0 1 0
5         1 1 0 0 1 0 1 0
Memory locations, using an 8-bit word
4. Memory
Memory is commonly measured in multiples of bits and bytes.
1 bit = 1 binary digit (0 or 1).
1. 1 byte = 8 bits
2. 1 KB = 1024 bytes = 2^10 bytes
3. 1 MB = 1024 KB = 2^20 bytes
4. 1 GB = 1024 MB = 2^30 bytes
5. 1 TB = 1024 GB = 2^40 bytes
Big Endian vs. Little Endian
• x86 processors are little-endian
• IBM z/Architecture mainframes are big-endian processors
Register value in both cases: FE ED FA CE
Address   Big Endian (Others)   Little Endian (Intel)
0x5       00                    00     (high memory addresses)
0x4       00                    00
0x3       CE                    FE
0x2       FA                    ED
0x1       ED                    FA
0x0       FE                    CE     (low memory addresses)
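The two byte orders can be reproduced in C by writing the bytes of a 32-bit value one at a time. This is a sketch with hypothetical function names; `out[0]` stands for the lowest memory address.

```c
#include <stdint.h>

/* Big endian: most significant byte goes to the lowest address. */
void store_big_endian(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Little endian: least significant byte goes to the lowest address. */
void store_little_endian(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)value;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Returns 1 if the machine running this code stores integers little-endian. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

Storing 0xFEEDFACE gives FE ED FA CE at addresses 0 to 3 in big-endian order and CE FA ED FE in little-endian order, as in the table above.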
5. Computer Software
Assembly/Machine Code View
[Figure: CPU (PC, Registers, Condition Codes) connected to Memory (Code, Data, Stack) by Addresses, Data, and Instructions paths]
Programmer-Visible State
PC: Program counter
    Address of next instruction
    Called "RIP" (x86-64)
Register file
    Heavily used program data
Condition codes
    Store status information about most recent arithmetic or logical operation
    Used for conditional branching
Memory
    Byte addressable array
    Code and user data
    Stack to support procedures
5. Computer Software
Compiling Into Assembly
C Code (sum.c)
    long plus(long x, long y);

    void sumstore(long x, long y,
                  long *dest)
    {
        long t = plus(x, y);
        *dest = t;
    }
Generated x86-64 Assembly
    sumstore:
        pushq   %rbx
        movq    %rdx, %rbx
        call    plus
        movq    %rax, (%rbx)
        popq    %rbx
        ret
Obtain (on shark machine) with command
    gcc -Og -S sum.c
Produces file sum.s
Warning: Will get very different results on non-Shark machines (Andrew Linux, Mac OS-X, …) due to different versions of gcc and different compiler settings.
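To experiment with the example, a complete version of the source is needed. The slide only declares `plus`, so the body below (simple addition) is an assumption made for illustration; `sumstore` is unchanged from the slide.

```c
/* Assumed definition: the slide declares plus() but does not show its body. */
long plus(long x, long y)
{
    return x + y;
}

/* From the slide: compute plus(x, y) and store the result through dest. */
void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}
```

Calling `sumstore(2, 3, &result)` leaves 5 in `result`. With `plus` defined in the same file, the compiler may inline the call, so the generated assembly may not contain the `call plus` instruction shown on the slide.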
Quiz
1) Pick the correct choice for the 8086 CPU.
A 16 bit word size, 8 bit data path
B 8 bit word size, 8 bit data path
C 16 bit word size, 16 bit data path
D 4 bit word size, 8 bit data path
E 8 bit word size, 16 bit data path
2) Pick the correct choice for the 80386SX CPU.
A 16 bit word size, 16 bit data path
B 32 bit word size, 16 bit data path
C 8 bit word size, 32 bit data path
D 32 bit word size, 8 bit data path
E 32 bit word size, 32 bit data path
3) Pick the correct choice for the 80486DX CPU.
A 32 bit word size, 16 bit data path
B 64 bit word size, 32 bit data path
C 32 bit word size, 32 bit data path
D 32 bit word size, 16 bit data path
E 32 bit word size, 64 bit data path
Quiz
4) What is the first CPU to include an internal math
coprocessor?
A 386DX
B 486SX
C 486DX
D Pentium
5) What are the two main components of the CPU?
A The Control Unit and ALU
B The Registers and Output/Input management
C The ALU and FPU
6) What are the two main desktop CPU manufacturers?
A Intel and AMD
B Via and Power PC
C Marek and Sun UltraSparc
7) What is the 32-bit value read as a double word from address 0x4000 in Big Endian mode?
Address   Content
0x4000    2F
0x4001    65
0x4002    7E
0x4003    AC
A 0xAC7E652F
B 0x2F657EAC
C 0xCAE756F2
Quiz
8) Pick the correct choice for the ARM processor.
A 16 bit word size, 16 bit data path
B 32 bit word size, 16 bit data path
C 8 bit word size, 32 bit data path
D 32 bit word size, 8 bit data path
E 32 bit word size, 32 bit data path
9) Pick the wrong choice for ARM architecture.
A Von Neumann architecture
B Harvard architecture
C 3 stage pipeline architecture
D 32-bit ARM Instruction Set
10) Pick the wrong choice for ARM registers.
A ARM has 37 32-bit registers
B There are 13 general purpose registers
C R13 is Stack Pointer
D R14 is the program counter
Exercises
1. Suppose that you discover that RAM addresses 000C0000 to 000C7FFF are
reserved for a PC’s video adapter. How many bytes of memory is this?
2. Suppose that you have an Intel 8086. Find the five-hex-digit address that
corresponds to each of these segment:offset pairs:
(a) 2B8C:8D21 (b) 059A:7A04 (c) 1234:5678
3. In an 8086 program, suppose that the data segment register DS contains the
segment number 23D1 and that an instruction fetches a word at offset 7B86
in the data segment. What is the five-hex-digit address of the word that is
fetched?
4. In an 8086 program, suppose that the code segment register CS contains the
segment number 014C and that the instruction pointer IP contains 15FE.
What is the five-hex-digit address of the next instruction to be fetched?
5. What are the advantages and disadvantages of secondary memory?