Computer Architecture & Organization
A Conceptual Approach

A. P. Godse, D. A. Godse

About the Authors
A. P. Godse

• Completed M.S. in Software Systems with distinction from Birla Institute of Technology.
• Completed B.E. in Industrial Electronics with distinction from University of Pune in 1990.
• Worked as a Professor at Vishwakarma Institute of Technology, Pune.
• Worked as a Technical Director at Noble Institute of Technology, Pune.
• Worked as a selection committee member for M.S. admissions for West Virginia University, Washington D.C.
• Developed microprocessor based instruments in co-ordination with Anna Hazare for the Environmental Studies Laboratory at Ralegan Siddhi.
• Developed a microprocessor lab in-house for Vishwakarma Institute of Technology.
• Worked as Subject Expert for a State Level Technical Paper Presentation Competition, Pune.
• Awarded on 26th Jan 2001 by Pune Municipal Corporation for contributing in the education field and technical writing.
• Awarded the "Parvati Bhushan Puraskar" for contributing in the education field.
• Since 1996, writing books on various engineering subjects. Over the years, many of these books have been recommended as reference books and textbooks in various national and international engineering universities.
D. A. Godse

• Completed M.E. and pursuing Ph.D. in Computer Engineering from Bharati Vidyapeeth's University, Pune.
• Completed B.E. in Industrial Electronics from University of Pune in 1992.
• Working as a Professor and Head of Information Technology Department in B.V.C.O.E.W., Pune.
• Subject Expert for syllabus setting of Computer Engineering and Information Technology branches at the Faculty of Engineering of Pune University.
• Subject Expert and Group Leader for syllabus setting of Electronics, Electronics and Telecommunication and Industrial Electronics branches at the Maharashtra State Board of Technical Education.
• Subject In-charge for Laboratory Manual Development, Technical Teacher's Training Institute, Pune.
• Subject In-charge for Question Bank Development Project, Technical Teacher's Training Institute, Pune.
• Subject In-charge for the preparation of Teacher's Guide, Board of Technical Examination, Maharashtra State.
• Subject Expert for a State Level Technical Paper Presentation Competition organized by Bharati Vidyapeeth's Jawaharlal Nehru Institute of Technology, Pune.
• Local Inquiry Committee (LIC) member of Engineering Faculty of Pune University.
• Awarded on 15th August 2006 by Pune Municipal Corporation for contributing in the education field and technical writing.
• Awarded on the occasion of International Women's Day at Yashwantrao Chavan Pratishthan Sabhagruha, Mumbai by Bharatiya Shikshan Sanstha.
COMPUTER ARCHITECTURE & ORGANIZATION

Atul P. Godse
M.S. Software Systems (BITS Pilani)
B.E. Industrial Electronics
Formerly Lecturer in Department of Electronics Engineering
Vishwakarma Institute of Technology, Pune

Mrs. Deepali A. Godse
B.E. Industrial Electronics, M.E. (Computer)
Head of Information Technology Department
Bharati Vidyapeeth's College of Engineering for Women, Pune

TECHNICAL PUBLICATIONS - An Up-Thrust for Knowledge
Pune | Nashik | Bangalore | Chennai | Hyderabad
Ahmedabad | Bhopal | Lucknow | Jaipur | Delhi
COMPUTER ARCHITECTURE & ORGANIZATION

First Edition : January 2014
Reprint : January 2015
Reprint : March 2019

© Copyright with Authors

All publishing rights (printed and book version) reserved with Technical Publications. No part of this book should be reproduced in any form, electronic, mechanical, photocopy or any information storage and retrieval system, without prior permission in writing from Technical Publications, Pune.

Published by : Technical Publications, Pune

Price : Rs. 548

ISBN 978-93-5099-234-0 (Printed Book)
Preface

The importance of Computer Architecture and Organization is well known in various engineering fields. Overwhelming response to our books on various subjects inspired us to write this book. The book is structured to cover the key aspects of the subject Computer Architecture and Organization.

The book uses plain, lucid language to explain the fundamentals of this subject. The book provides a logical method of explaining various complicated concepts and stepwise methods to explain the important topics. Each chapter is well supported with necessary illustrations, practical examples and solved problems. All the chapters in the book are arranged in a proper sequence that permits each topic to build upon earlier studies. All care has been taken to make students comfortable in understanding the basic concepts of the subject. Representative questions have been added at the end of each section to help the students in picking important points from that section.

The book not only covers the entire scope of the subject but explains the philosophy of the subject. This makes the understanding of this subject more clear and makes it more interesting. The book will be very useful not only to the students but also to the subject teachers. The students have to omit nothing and possibly have to cover nothing more.

We wish to express our profound thanks to all those who helped in making this book a reality. Much needed moral support and encouragement was provided on numerous occasions by our whole family. We wish to thank the Publisher and the entire team of Technical Publications who have taken immense pains to get this book in time with quality printing.

Any suggestion for the improvement of the book will be acknowledged and well appreciated.

Authors
A. P. Godse
D. A. Godse
Table of Contents

Chapter - 1  Introduction

1.1 Computing and Computers
    1.1.1 Computer Types
    1.1.2 Functional Units (Elements) of Computer
        1.1.2.1 Input Unit
        1.1.2.2 Memory Unit
        1.1.2.3 Arithmetic and Logic Unit
        1.1.2.4 Output Unit
        1.1.2.5 Control Unit
        1.1.2.6 Features of Von Neumann Model
    1.1.3 Limitations of Computers
1.2 Evolution of Computers
    1.2.1 First Generation (Von Neumann Architecture)
    1.2.2 Detail Structure of IAS/Von Neumann Machine
    1.2.3 Second Generation
    1.2.4 Third Generation
    1.2.5 Later Generations
1.3 VLSI Era
    1.3.1 Integrated Circuits
    1.3.2 IC Families
    1.3.3 Processor Architecture
    1.3.4 Performance
        1.3.4.1 Processor Clock
        1.3.4.2 CPU Time
        1.3.4.3 Performance Metrics
        1.3.4.4 Performance Measurement
1.4 Design
    1.4.1 System Design
        1.4.1.1 System Representation
        1.4.1.2 Design Process
1.5 Register Level
    1.5.1 Register Level Components
    1.5.2 Programmable Logic Devices
    1.5.3 Field Programmable Gate Arrays
    1.5.4 Register Level Design
        1.5.4.1 Data and Control
        1.5.4.2 A Description Language
        1.5.4.3 Design Techniques
1.6 Processor Level
    1.6.1 Processor Level Components
        1.6.1.1 Central Processing Unit
        1.6.1.2 Memories
        1.6.1.3 I/O Devices
        1.6.1.4 Interconnection Networks
    1.6.2 Processor-Level Design
        1.6.2.1 Prototype Structures
        1.6.2.2 Queueing Models
1.7 CPU Organization
    1.7.1 CPU Register Organization
1.8 Data Representation
    1.8.1 Decimal Number System
    1.8.2 Binary Number System
    1.8.3 Octal Number System
    1.8.4 Hexadecimal Number System
    1.8.5 Format of a Binary Number
1.9 Fixed Point Numbers
    1.9.1 Signed Binary Numbers
    1.9.2 1's Complement Representation
    1.9.3 2's Complement Representation
    1.9.4 Sign Extension
1.10 Floating Point Numbers
    1.10.1 IEEE Standard for Floating-Point Numbers
    1.10.2 Special Values
    1.10.3 Exceptions
1.11 Instruction Formats
1.12 Instruction Types
    1.12.1 Data Transfer Instructions
    1.12.2 Arithmetic Instructions
    1.12.3 Logical Instructions
    1.12.4 Shift and Rotate Instructions
    1.12.5 Program Sequencing and Control Instructions
    1.12.6 RISC Instructions
1.13 Addressing Modes
    1.13.1 Implementation of Variables and Constants
    1.13.2 Indirection and Pointers
    1.13.3 Indexing and Arrays
    1.13.4 Relative Addressing
    1.13.5 Additional Modes
Two Marks Questions with Answers

Chapter - 2  Data Path Design

2.1 Introduction
2.2 Fixed Point Arithmetic
    2.2.1 Addition
    2.2.2 Subtraction
2.3 Adder and Subtractor Circuits
    2.3.1 Half Adders
    2.3.2 Full Adders
    2.3.3 Parallel Adder
    2.3.4 Parallel Subtractor
    2.3.5 Addition / Subtraction Logic Unit
    2.3.6 Overflow in Integer Arithmetic
    2.3.7 Addition and Subtraction of Signed-Magnitude Data
2.4 Look Ahead Carry Adders
2.5 Multiplication
2.6 Robertson Algorithm
    2.6.1 Robertson Algorithm for 2's Complement Multiplier
    2.6.2 Robertson Algorithm for 2's Complement Fraction
2.7 Booth's Algorithm
2.8 Fast Multiplication
    2.8.1 Bit Pair Recoding of Multipliers
    2.8.2 Array Multiplier
2.9 Division
    2.9.1 Restoring Division
    2.9.2 Non-restoring Division
    2.9.3 Comparison between Restoring and Non-restoring Division Algorithm
2.10 Floating Point Arithmetic
    2.10.1 Addition and Subtraction
    2.10.2 Problems in Floating Point Arithmetic
    2.10.3 Flowchart and Algorithm for Floating Point Addition and Subtraction
    2.10.4 Implementing Floating Point Operations
    2.10.5 Multiplication and Division
    2.10.6 Guard Bits and Truncation
2.11 Arithmetic Logic Unit
    2.11.1 Design of Arithmetic Unit
    2.11.2 Design of Logic Circuit
    2.11.3 Combining Arithmetic and Logic Units
    2.11.4 Status Register
    2.11.5 Sequential ALUs
    2.11.6 ALU Expansion
2.12 Coprocessors
2.13 Pipeline Processing
2.14 Pipeline Design
Two Marks Questions with Answers

Chapter - 3  Control Design

3.1 Introduction
3.2 Some Fundamental Concepts
    3.2.1 Register Transfers
    3.2.2 Performing an Arithmetic or Logic Operation
    3.2.3 Fetching a Word from Memory
    3.2.4 Storing a Word in Memory
    3.2.5 Execution of a Complete Instruction
    3.2.6 Branch Instruction
    3.2.7 Multiple Bus Organization
3.3 Hardwired Control
    3.3.1 Design Methods of Hardwired Control Unit
        3.3.1.1 State-table Method
        3.3.1.2 Delay Element Method
        3.3.1.3 Sequence Counter Method
        3.3.1.4 PLA Method
    3.3.2 A Complete Processor
    3.3.3 CPU Control Unit
    3.3.4 Design of Control Unit of GCD Processor using State Table
    3.3.5 Design of Control Unit of GCD Processor using One Hot Method
3.4 Microprogrammed Control
    3.4.1 Microinstruction
        3.4.1.1 Grouping of Control Signals
        3.4.1.2 Techniques of Grouping of Control Signals
    3.4.2 Microinstruction Sequencing
    3.4.3 Techniques for Modification or Generation of Branch Addresses
    3.4.4 Microinstructions with Next Address Field
    3.4.5 Microinstruction Execution
3.5 Comparison between Hardwired and Microprogrammed Control
3.6 Bit Slicing and Bit Sliced Microprogram Sequencer
    3.6.1 Features of Bit Slicing
    3.6.2 Processor Slice 2901
    3.6.3 Microprograms for 2901
    3.6.4 16-bit Bit Sliced Processor
    3.6.5 16-bit Multiplication
3.7 Applications of Microprogramming
3.8 Pipeline Control
    3.8.1 Principles of Pipelining
    3.8.2 Instruction Pipelines
    3.8.3 Implementation of Two-Stage Instruction Pipelining
    3.8.4 Implementation of Four-Stage Instruction Pipelining
    3.8.5 Pipeline Performance
    3.8.6 Hazards in Instruction Pipelining
        3.8.6.1 Structural Hazards
        3.8.6.2 Data Hazards
        3.8.6.3 Instruction or Control Hazards
    3.8.7 Influence on Instruction Sets
        3.8.7.1 Addressing Modes
        3.8.7.2 Condition Codes
    3.8.8 Branch Prediction
3.9 Superscalar Processing
    3.9.1 Instruction Level Parallelism and Machine Parallelism
    3.9.2 Instruction-Issue Policy
    3.9.3 Register Renaming
    3.9.4 Branch Prediction
3.10 Nano Programming
Two Marks Questions with Answers

Chapter - 4  Memory Organization

4.1 Basic Concepts
4.2 Memory Hierarchy/Multilevel Memory Systems
4.3 Random Access Memories
    4.3.1 ROM (Read Only Memory)
        4.3.1.1 ROM
        4.3.1.2 PROM (Programmable Read Only Memory)
        4.3.1.3 EPROM (Erasable Programmable Read Only Memory)
        4.3.1.4 EEPROM (Electrically Erasable Programmable Read Only Memory)
    4.3.2 RAM (Random Access Memory)
        4.3.2.1 Static RAM
        4.3.2.2 Internal Organization
        4.3.2.3 Dynamic RAM (DRAM)
        4.3.2.4 Internal Structure of DRAM
    4.3.3 Comparison between SRAM and DRAM
4.4 RAM Interfaces
    4.4.1 Expanding Word Size and Memory Capacity
4.5 Advanced DRAMs
    4.5.1 Enhanced DRAM (EDRAM)
    4.5.2 Cache DRAM (CDRAM)
    4.5.3 Synchronous DRAM (SDRAM)
        4.5.3.1 Timing Diagram
        4.5.3.2 Performance Measures
        4.5.3.3 Double-Data-Rate SDRAM
    4.5.4 Rambus DRAM
    4.5.5 Ramlink DRAM
    4.5.6 EDO RAM
4.6 Serial Access Memories
    4.6.1 Magnetic Disk Memory
        4.6.1.1 Magnetic Surface Recording
        4.6.1.2 Data Organization and Formatting
        4.6.1.3 Characteristics
        4.6.1.4 Typical Disks
        4.6.1.5 Access Time
        4.6.1.6 Data Buffer/Cache
        4.6.1.7 Disk Controller
        4.6.1.8 Loading of Operating System from Disk
    4.6.2 Floppy Disk Memory
        4.6.2.1 Specifications of Floppy Disks
        4.6.2.2 Disk Format
        4.6.2.3 Storage Density
    4.6.3 Magnetic Tape
    4.6.4 RAID
    4.6.5 Optical Memories
        4.6.5.1 CD-ROM
        4.6.5.2 WORM
        4.6.5.3 Erasable Optical Disk
        4.6.5.4 DVD Technology
4.7 Cache Memory
    4.7.1 Cache Memory System
        4.7.1.1 Program Locality
        4.7.1.2 Locality of Reference
        4.7.1.3 Block Fetch
    4.7.2 Elements of Cache Design
    4.7.3 Mapping
        4.7.3.1 Direct Mapping
        4.7.3.2 Associative Mapping (Fully Associative Mapping)
        4.7.3.3 Set Associative Mapping
    4.7.4 Comparison between Mapping Techniques
    4.7.5 Cache Write / Updating
        4.7.5.1 Write Through System
        4.7.5.2 Buffered Write Through System
        4.7.5.3 Write Back System
    4.7.6 Cache Coherency
    4.7.7 Replacement Algorithms
4.8 Performance Issues
4.9 Virtual Memory
    4.9.1 Address Mapping using Pages
    4.9.2 Page Replacement
    4.9.3 Page Replacement Algorithms
    4.9.4 Memory Management Hardware
        4.9.4.1 Segmented Page Mapping
        4.9.4.2 Memory Protection
4.10 Memory Allocation
4.11 Associative Memory
    4.11.1 Hardware Organization
    4.11.2 Read Operation
    4.11.3 Write Operation
4.12 Memory Interleaving
Two Marks Questions with Answers

Chapter - 5  System Organization

5.1 Introduction
5.2 Communication Methods
    5.2.1 Buses
        5.2.1.1 Single Bus Structure
        5.2.1.2 Multiple Bus Structures
        5.2.1.3 Bus Design Parameters
    5.2.2 Bus Control
        5.2.2.1 Synchronous Input/Output Transfer
        5.2.2.2 Asynchronous Input/Output Transfer
        5.2.2.3 Strobe Control
        5.2.2.4 Handshaking
    5.2.3 Bus Interfacing
    5.2.4 Bus Arbitration
        5.2.4.1 Centralized Arbitration
        5.2.4.2 Distributed Arbitration
    5.2.5 PCI Bus
        5.2.5.1 Features
        5.2.5.2 PCI Configurations
        5.2.5.3 PCI Bus Signals
        5.2.5.4 PCI Bus Commands
        5.2.5.5 Data Transfer
        5.2.5.6 PCI Arbitration
5.3 I/O and System Control
    5.3.1 I/O Modules
        5.3.1.1 Major Requirements of an I/O Module
    5.3.2 Programmed I/O
        5.3.2.1 I/O Addressing
        5.3.2.2 I/O Instructions
        5.3.2.3 Interfacing
    5.3.3 Interrupt Driven I/O
        5.3.3.1 Interrupt Hardware
        5.3.3.2 Enabling and Disabling Interrupts
        5.3.3.3 Handling Multiple Devices
        5.3.3.4 Vectored Interrupts
        5.3.3.5 Nested Interrupts
        5.3.3.6 Interrupt Priority
        5.3.3.7 Pipeline Interrupts
        5.3.3.8 Exceptions
        5.3.3.9 PCI Interrupts
5.4 Comparison between Programmed I/O and Interrupt Driven I/O
5.5 Direct Memory Access (DMA)
    5.5.1 Hardware Controlled Data Transfer
    5.5.2 DMA Idle Cycle
    5.5.3 DMA Active Cycle
    5.5.4 DMA Channels
    5.5.5 Data Transfer Modes
5.6 Interface Circuits
    5.6.1 Parallel Port
        5.6.1.1 Input Port
        5.6.1.2 Output Port
        5.6.1.3 Combined Input/Output Port
        5.6.1.4 Programmable Parallel Port
    5.6.2 Serial Port
    5.6.3 Comparison between Serial and Parallel Interface
5.7 I/O Channel
    5.7.1 Characteristics of I/O Channel
    5.7.2 Types of I/O Channels
        5.7.2.1 Selector Channel
        5.7.2.2 Multiplexer Channel
5.8 I/O Processor
    5.8.1 Features and Functions of IOP
    5.8.2 Block Diagram of IOP
    5.8.3 CPU and IOP Communication
5.9 Operating Systems
    5.9.1 What is an Operating System ?
    5.9.2 Necessity of Operating System
    5.9.3 Functions of Operating System
    5.9.4 Types of Operating Systems
        5.9.4.1 Batch Operating System
        5.9.4.2 Multiprogramming Operating System
        5.9.4.3 Time Sharing Operating System
        5.9.4.4 Real-Time Operating Systems
    5.9.5 Protection
        5.9.5.1 I/O Protection
        5.9.5.2 Memory Protection
        5.9.5.3 CPU Protection
    5.9.6 Distributed Operating System
    5.9.7 Operating System Services
    5.9.8 System Calls
        5.9.8.1 Process and Job Control
        5.9.8.2 File Manipulation
        5.9.8.3 Device Manipulation
        5.9.8.4 Information Maintenance
    5.9.9 System Call Implementation
    5.9.10 How System Calls are Used
    5.9.11 System Programs
    5.9.12 Protocol Layers
    5.9.13 Features of Unix
    5.9.14 Structure of Unix
        5.9.14.1 The Kernel
        5.9.14.2 The ps Command
        5.9.14.3 The Kernel and System Calls
        5.9.14.4 Running a Command : The Shell
        5.9.14.5 A Program Hierarchy
    5.9.15 Files and Directories
        5.9.15.1 File Creation
        5.9.15.2 File Systems
    5.9.16 Peripheral Devices : Special Files
5.10 Multiprocessors
    5.10.1 Loosely Coupled Multiprocessors (LCS)
    5.10.2 Tightly Coupled Multiprocessors
        5.10.2.1 TCS without Private Cache
        5.10.2.2 TCS with Private Cache
    5.10.3 Processor Characteristics for Multiprocessing
    5.10.4 Interconnection Networks
        5.10.4.1 Time Shared Bus or Common Bus
        5.10.4.2 Cross Bar Switch
        5.10.4.3 Multiport Memory
        5.10.4.4 Multistage Interconnection Network (MIN)
    5.10.5 Contention Problems in Multiprocessor Systems
        5.10.5.1 Memory Contention
        5.10.5.2 Communication Contention
        5.10.5.3 Hot Spot Contention
        5.10.5.4 Techniques for Reducing Contention
    5.10.6 Mutual Exclusion
    5.10.7 Deadlock
5.11 Fault Tolerance
    5.11.1 Redundancy
    5.11.2 Fault Tolerant Systems
        5.11.2.1 Static Redundancy
        5.11.2.2 Dynamic Redundancy
    5.11.3 Redundant Disk Arrays
    5.11.4 Fault Tolerance Measures
        5.11.4.1 Availability
        5.11.4.2 Reliability
        5.11.4.3 System Reliability
        5.11.4.4 Mean Time To Failure (MTTF)
        5.11.4.5 Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR)
    5.11.5 Fault Tolerant Computers
        5.11.5.1 The Tandem NonStop
        5.11.5.2 The Tandem VLX
        5.11.5.3 The Tandem Himalaya
        5.11.5.4 Architectural Details of Tandem's NonStop Computers
        5.11.5.5 Architectural Details of the Tandem Himalaya Computers
Two Marks Questions with Answers

Chapter - 6  RISC, CISC, Superscalar and Vector Processors

6.1 RISC Processor
6.2 RISC Versus CISC
6.3 RISC Properties
6.4 RISC Addressing Modes
6.5 RISC Evaluations
    6.5.1 RISC and VLSI Realization
    6.5.2 Computing Speed
    6.5.3 Design Cost and Reliability Considerations
    6.5.4 HLL Support
    6.5.5 Shortcomings of RISC Architecture
6.6 On-chip Register File Versus Cache Evaluation
6.7 Overview of RISC Development and Current Systems
    6.7.1 SPARC
    6.7.2 The Intel i860 Processor Architecture
    6.7.3 RISC Processor Motorola 88000
6.8 Superscalar Processor
    6.8.1 Instruction Level Parallelism and Machine Parallelism
    6.8.2 Instruction-Issue Policy
    6.8.3 Register Renaming
    6.8.4 Branch Prediction
    6.8.5 Example : PowerPC Processor
    6.8.6 PowerPC Interrupts
    6.8.7 Machine Status Register (MSR)
    6.8.8 Data Types
    6.8.9 PowerPC Addressing Modes
    6.8.10 PowerPC Instruction Formats
    6.8.11 PowerPC Instruction Set
    6.8.12 Features of PowerPC Architecture
    6.8.13 Features Not Defined by the PowerPC Architecture
6.9 Vector Processor
    6.9.1 Characteristics of Vector Processing
    6.9.2 Vector Processing Approach
Two Marks Questions with Answers

Chapter - 1
Introduction
Syllabus

Computing and computers, Evolution of computers, VLSI era, System design - register level, Processor level, CPU organization, Data representation, Fixed-point numbers, Floating point numbers, Instruction formats, Instruction types, Addressing modes.

Contents

1.1 Computing and Computers ........ May-04, 07, 09, 10, Dec.-04, 07, 08, 09, 10
1.2 Evolution of Computers ......... May-07, Dec.-11
1.3 VLSI Era ....................... Dec.-03, 08
1.4 Design
1.5 Register Level ................. Dec.-03, 11, May-11
1.6 Processor Level
1.7 CPU Organization ............... May-09
1.8 Data Representation ............ Dec.-11, May-12
1.9 Fixed Point Numbers ............ Dec.-07, May-09
1.10 Floating Point Numbers ........ Dec.-06, May-07, 08
1.11 Instruction Formats ........... Dec.-03, 06, 07, 10, May-06, 08, 09
1.12 Instruction Types ............. May-06, 12, Dec.-06, 12
1.13 Addressing Modes .............. Dec.-06, 07, 08, 10, 11, 12, May-07, 08, 09, 10, 11
Two Marks Questions with Answers
1.1 Computing and Computers

This chapter provides a broad overview of digital computers. This section of the chapter first examines the nature and limitations of the computing process and then explains the evolution of computers. The later part of this section discusses VLSI based computer systems.

In the early days, man used the abacus and the slide rule for computing purposes. But as the size and complexity of the calculations being carried out increased, these manual calculations showed serious limitations :

• The speed at which manual calculations can be carried out is limited.
• Humans are notoriously prone to error, so long calculations done manually are unreliable unless elaborate precautions are taken to eliminate mistakes.

On the other hand, the computer is a machine which does not suffer from distraction and fatigue, and can perform billions of operations in considerably quick time. Furthermore, it can provide results free from error.
1.1.1 Computer Types
A digital computer or simply computer in its simplest form is a fast electronic
calculating machine that accepts digitized information from the user, processes it
according to a sequence of instructions stored in the internal storage and provides the
processed information to the user. The sequence of instructions stored in the internal
storage is called computer program and internal storage is called computer memory.
According to size, cost, computational power and application, computers are classified as :
• Microcomputers
• Minicomputers
• Desktop computers
• Personal computers
• Portable notebook computers
• Workstations
• Mainframes or enterprise systems
• Servers
• Supercomputers
Microcomputers : As the name implies, microcomputers are smaller computers. They
contain only one Central Processing Unit. One distinguishing feature of a microcomputer
is that the CPU is usually a single integrated circuit called a microprocessor.
A microcomputer is the integration of a microprocessor and supporting peripherals (memory and I/O devices). The word length depends on the microprocessor used and is in the range of 8-bits to 32-bits. This type of computer is used for small industrial control, process control and other applications where storage and speed requirements are moderate.
Minicomputers : Minicomputers are the scaled up version of the microcomputers, with moderate speed and storage capacity. These are designed to process smaller data words, typically 32-bit words. This type of computer is used for scientific calculations, research, data processing applications and many others.
Desktop Computers : The desktop computers are the computers which are usually found on a home or office desk. They consist of a processing unit, a storage unit, visual display and audio output units, and keyboard and mouse input units. Usually the storage unit of such a computer consists of hard disks, CD-ROMs and diskettes.
Personal Computers : The personal computers are the most common form of desktop
computers. They find wide use in homes, schools and business offices.
Portable Notebook Computers : Portable notebook computers are the compact version of personal computers. Laptop computers are a good example of portable notebook computers.
Workstations : Workstations have higher computation power than personal computers. They have high resolution graphics terminals and improved input/output capabilities. Workstations are used in engineering applications and in interactive graphics applications.
Mainframes or Enterprise Systems : Mainframe computers are implemented using two or more central processing units (CPUs). These are designed to work at very high speeds with large data word lengths, typically 64-bits or greater. The data storage capacity of these computers is very high. This type of computer is used for complex calculations, large data processing applications, military defense control and complex graphics applications (e.g., for creating walkthroughs with the help of animation software).
Servers : These computers have a large storage unit and faster communication links. The large storage unit allows them to store sizable databases, and the fast communication links allow faster communication of data blocks with computers connected in the network. These computers serve a major role in internet communication.
Supercomputers : These computers are basically multiprocessor computers used for
the large-scale numerical calculations required in applications such as weather
forecasting, robotic engineering, aircraft design and simulation.
1.1.2 Functional Units (Elements) of Computer

The idea of having a computer wired for general computations, with the program stored in memory, was introduced by John Von Neumann when he was working as a consultant at the Moore School. He and the originators of ENIAC designed the first stored program computer, named EDVAC (Electronic Discrete Variable Automatic Computer). Two key principles are used to build computers :
1. Instructions are represented as numbers.
2. Programs can be stored in memory to be read or written just like numbers.
These two principles lead to the stored-program concept. According to the stored-program concept, memory can contain the source program, the corresponding compiled machine code, and the data operated on by the machine code. Fig. 1.1.1 shows the basic structure of a Von Neumann machine. It consists of five functional units : input unit, output unit, control unit, arithmetic and logic unit and memory unit.

[Fig. 1.1.1 : A Von Neumann machine]
The input unit accepts digital information from the user with the help of input devices such as keyboard, mouse, microphone etc. The information received from the input unit is either stored in the memory for later use or immediately used by the arithmetic and logic unit to perform the desired operations. The program stored in the memory decides the processing steps, and the processed output is sent to the user with the help of output devices or stored in the memory for later reference. All the above mentioned activities are co-ordinated and controlled by the control unit. The arithmetic and logic unit in conjunction with the control unit is commonly called the Central Processing Unit (CPU).
1.1.2.1 Input Unit

A computer accepts digitally coded information through the input unit using input devices. The most commonly used input devices are the keyboard and mouse. The keyboard is used for entering text and numeric information. The mouse, on the other hand, is used to position the screen cursor and thereby enter information by selecting options. Apart from the keyboard and mouse, many other input devices are available, including joysticks, trackballs, spaceballs, digitizers and scanners.
1.1.2.2 Memory Unit

The memory unit is used to store programs and data. Usually, two types of memory devices are used to form a memory unit : primary storage memory devices and secondary storage memory devices. The primary memory, commonly called main memory, is a fast memory used for the storage of programs and active data (the data currently in process). The main memory is a semiconductor memory. It consists of a large number of semiconductor storage cells, each capable of storing one bit of information. These cells are read or written by the central processing unit in a group of fixed size called a word. The main memory is organized such that the contents of one word, containing n bits, can be stored or retrieved in one write or read operation, respectively.
To access a particular word from main memory, each word in the main memory has a distinct address. This allows any word to be accessed from the main memory by specifying the corresponding address. The number of bits in each word is referred to as the word length of the computer. Typically, the word length varies from 8 to 64 bits. The number of such words in the main memory decides the size of memory or capacity of the memory. This is one of the specifications of the computer. The size of computer main memory varies from a few million words to tens of millions of words.
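As a quick illustration (not from the text; the word count and word length below are assumed values), the total capacity and the number of address bits follow directly from these two parameters:

# Assumed example : a memory of 16M words, each 32 bits wide.
words = 16 * 1024 * 1024        # number of addressable words
word_length = 32                # bits per word

capacity_bits = words * word_length
print(capacity_bits // 8, "bytes")               # 67108864 bytes = 64 MB
print((words - 1).bit_length(), "address bits")  # 24 bits to address every word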
An important characteristic of a memory is its access time (the time required to access one word). The access time for main memory should be as small as possible. Typically, it is of the order of 10 to 100 nanoseconds. The access time also depends on the type of memory. In randomly accessed memories (RAMs), a fixed time is required to access any word in the memory. However, in sequential access memories this time is not fixed.
The main memory consists of only randomly accessed memories. These memories are fast, but they are small in capacity and expensive. Therefore, the computer uses secondary storage memories such as magnetic tapes and magnetic disks for the storage of large amounts of data.
1.1.2.3 Arithmetic and Logic Unit

The arithmetic and logic unit (ALU) is responsible for performing arithmetic operations such as add, subtract, division and multiplication, and logical operations such as ANDing, ORing, inverting etc. To perform these operations, operands from the main memory are brought into high speed storage elements called registers of the processor. Each register can store one word of data, and registers are used to store frequently used operands. The access times to registers are typically 5 to 10 times faster than access times to memory. After performing an operation, the result is stored either in a register or in a memory location.
1.1.2.4 Output Unit

The output unit sends the processed results to the user using output devices such as video monitor, printer, plotter, etc. The video monitors display the output on the CRT screen, whereas printers and plotters give hard-copy output. Printers are classified according to their printing methodology : impact printers and non-impact printers. Impact printers press formed character faces against an inked ribbon onto the paper. Non-impact printers and plotters use laser techniques, ink-jet sprays, xerographic processes, electrostatic methods, and electrothermal methods to get images onto the paper. Laser and ink-jet printers are examples of non-impact printers.
1.1.2.5 Control Unit

As mentioned earlier, the control unit co-ordinates and controls the activities amongst the functional units. The basic function of the control unit is to fetch the instructions stored in the main memory, identify the operations and the devices involved in them, and accordingly generate control signals to execute the desired operations.

The control unit uses control signals or timing signals to determine when a given action is to take place. It controls input and output operations and data transfers between the processor, memory and input/output devices using timing signals.

The control and the arithmetic and logic units of a computer are usually many times faster than other devices connected to a computer system. This enables them to control a number of external input/output devices.
1.1.2.6 Features of Von Neumann Model

The main features of a Von Neumann model are :
1. It uses the stored program concept. The program (instructions) and data are stored in a single read-write memory.
2. The contents of the read-write memory are addressable by location, without regard to the type of data contained there.
3. Execution of instructions occurs in a sequential manner (unless explicitly modified) from one instruction to the next.

Because of the stored program architecture of the Von Neumann machine, the processor performance is tightly bound to the memory performance. That is, since we need to access memory at least once per cycle to read an instruction, the processor can only operate as fast as the memory. This is sometimes known as the Von Neumann bottleneck or memory wall.
1.1.3 Limitations of Computers

1. Unsolvable problems : Problems which have no known solution procedure, or have no solution at all, cannot be solved by a computer.

There are some problems which can be solved for specific input conditions; however, there is no general procedure to solve the problem which can accept all the possible inputs. Such problems are known as undecidable problems. Undecidable problems cannot be solved by computers.
2. Intractable problems : A problem which can be solved by a computer of reasonable size and cost, with an acceptable degree of accuracy and in a reasonable amount of time, is called a tractable problem. The problems which are not tractable are called intractable problems. We normally say a problem is intractable if all its known solution methods grow exponentially with the size of the problem.

An intractable problem can be solved in a reasonable amount of time only when its size n is below some maximum value N_max. The value of N_max depends on the speed of the computer; the sketch after this list makes that dependence concrete.
3. Speed limitations : There are limitations on the speed of a computer. Although computers continue to increase in speed because of advances in hardware technology, the rate of increase has not kept pace with demand. Thus it is necessary to find new ways to improve the performance of computers at reasonable cost.
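To make the intractability point concrete, here is a minimal sketch (not from the text; the step count, machine speed and time budget are assumed values) showing how weakly N_max grows with machine speed when a method takes 2**n steps:

import math

def n_max(steps_per_second, seconds):
    """Largest size n such that a 2**n-step method finishes within the budget."""
    return math.floor(math.log2(steps_per_second * seconds))

day = 86400
print(n_max(1e9, day))    # ~46 for a 10^9 steps/s machine running one day
print(n_max(1e12, day))   # ~56 : a 1000x faster machine gains only ~10 in n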
Review Questions

1. List various elements of a computer.
2. Explain the function of each functional unit in the computer system with a suitable diagram.
3. Name the functional units of a computer and how they are interrelated.
4. With a neat diagram explain Von Neumann computer architecture.
5. What is meant by the stored program concept? Discuss.
6. What is meant by Central Processing Unit (CPU)?
7. Draw and explain the block diagram of a simple computer with five functional units.
8. What is meant by the stored program concept? Discuss.
9. Explain in detail about functional units of computers.
10. Name the functional units of a computer and how they are interrelated.
11. Describe the functional units of the computer system.
12. What is meant by Central Processing Unit (CPU)?
13. Explain the architecture of a basic computer with a neat diagram.
1.2 Evolution of Computers

The French philosopher Blaise Pascal (1623-62) invented an early and influential mechanical calculator that could add and subtract decimal numbers. In the 19th century Charles Babbage designed the first mechanical computer that could perform multistep operations automatically, that is, without a human intervening in every step. However, these mechanical machines suffered from two serious drawbacks :
• Their computing speed was limited by the inertia of their moving parts, and
• The transmission of digital information by mechanical means was quite unreliable.

Further research in this field produced the electronic computer, in which the "moving parts" are electrons, which can be transmitted and processed reliably at speeds approaching that of light (300,000 km/s).
1.2.1 First Generation (Von Neumann Architecture)

The first electronic computer, ENIAC (Electronic Numerical Integrator And Computer), was designed and constructed under the direction of Eckert and Mauchly at the Moore School of Engineering (University of Pennsylvania). It was made up of more than 18000 vacuum tubes and 1500 relays. ENIAC's primary function was to compute ballistic trajectories. It was able to perform nearly 5000 additions or subtractions per second.

The ENIAC was a decimal rather than a binary machine. All numbers in the ENIAC were represented in decimal form and arithmetic was performed in the decimal system. Its data memory consisted of 20 "accumulators", each capable of storing a ten digit decimal number. Each digit was represented by a ring of 10 vacuum tubes, and only one vacuum tube was in the ON state to represent one of the ten digits. The major drawback of the ENIAC was that it was wired for specific computations. For modifications and replacement of programs, manual setting of switches and plugging and unplugging of cables was necessary. It was a very time consuming process. Despite these shortcomings, ENIAC was used for about ten years.
The idea of having a computer wired for general computations, with the program stored in memory, was introduced by John Von Neumann when he was working as a consultant at the Moore School. He and the originators of ENIAC designed the first stored program computer, named EDVAC (Electronic Discrete Variable Automatic Computer). The stored program concept in EDVAC facilitated the users to enter and alter various programs and do a variety of computations.

The EDVAC project was further developed by Von Neumann with his collaborators at the Institute for Advanced Studies (IAS) in Princeton. They came up with a new machine referred to as the IAS or Von Neumann machine. It has now become the usual frame of reference for many modern computers.

[Fig. 1.2.1 : A Von Neumann machine]

Fig. 1.2.1 shows the basic structure of the Von Neumann machine. It consists of five basic units whose functions can be summarized as follows :
• The input unit transmits data and instructions from the outside world to the machine. It is operated by the control unit.
• The memory unit stores both data and instructions.
• The arithmetic-logic unit performs arithmetic and logical operations.
• The control unit fetches and interprets the instructions in memory and causes them to be executed.
• The output unit transmits final results and messages to the outside world.
In the original IAS machine (Von Neumann machine), the memory unit consists of 4096 storage locations (2^12 = 4096) of 40 bits each, referred to as words. These memory locations are used to store data as well as instructions. Both data and instructions are represented in binary form with a specific format, as shown in the Fig. 1.2.2.
Data Format

[Fig. 1.2.2 : Data format - bit 0 is the sign bit, bits 1 to 39 hold the number]

Fig. 1.2.2 shows the data format. The leftmost bit indicates the sign of the number (0 for positive and 1 for negative), while the remaining 39 bits indicate the number's magnitude in two's complement form.
The numbers are assumed to have an implicit binary point corresponding to the decimal point in ordinary decimal notation. It may be placed in any fixed position within the number word format; hence these numbers are called fixed-point. If the implicit binary point is assumed to lie between bits 0 and 1, then all numbers are treated as fractions. Some examples of the IAS representation of fractions are as follows :

+0.5 = 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000
+0.1 = 0000 1100 1100 1100 1100 1100 1100 1100 1100 1100
Zero = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
-0.1 = 1111 0011 0011 0011 0011 0011 0011 0011 0011 0100

With the binary point fixed between bits 0 and 1, fractions are restricted to lie between -1 and +1. Hence, all numbers used in calculations that lie outside this range must be adjusted by some suitable scaling factor.
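The encoding above can be checked mechanically. Here is a minimal Python sketch (not from the text) that converts a fraction to and from a 40-bit IAS-style word; because it rounds rather than truncates, the last bit or two may differ from the hand-truncated values listed above.

def ias_encode(x, bits=40):
    """Encode -1 <= x < 1 as a two's complement word with the
    binary point between the sign bit and bit 1."""
    assert -1.0 <= x < 1.0
    return round(x * (1 << (bits - 1))) & ((1 << bits) - 1)  # wrap negatives

def ias_decode(word, bits=40):
    """Decode a two's complement fixed-point word back to a fraction."""
    if word >> (bits - 1):          # sign bit set : value is negative
        word -= 1 << bits
    return word / (1 << (bits - 1))

for x in (0.5, 0.1, 0.0, -0.1):
    w = ias_encode(x)
    print(f"{x:+.1f} -> {w:040b} -> {ias_decode(w):+.10f}")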
Instruction Format

[Fig. 1.2.3 : Instruction format - each 40-bit word holds a left instruction and a right instruction, each consisting of an 8-bit opcode and a 12-bit address]

Fig. 1.2.3 shows the instruction format. Two 20-bit instructions can be stored in each 40-bit memory location. As shown in Fig. 1.2.3, an instruction consists of two parts : an 8-bit opcode (operation code) and a 12-bit address. The opcode defines the operation to be performed (add, subtract, etc.) and the address part identifies any of the 2^12 memory locations that may be used to store an operand of the instruction.
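The following short sketch (not from the text; the opcode and address values are hypothetical) shows how two such instructions pack into one 40-bit word and how they are unpacked again:

def pack_word(l_op, l_addr, r_op, r_addr):
    """Pack two instructions (8-bit opcode + 12-bit address each) into a
    40-bit word : left instruction in the high 20 bits, right in the low 20."""
    left  = ((l_op & 0xFF) << 12) | (l_addr & 0xFFF)
    right = ((r_op & 0xFF) << 12) | (r_addr & 0xFFF)
    return (left << 20) | right

def unpack_word(word):
    """Return ((l_op, l_addr), (r_op, r_addr)) from a 40-bit word."""
    left, right = word >> 20, word & 0xFFFFF
    return (left >> 12, left & 0xFFF), (right >> 12, right & 0xFFF)

w = pack_word(0x01, 500, 0x21, 501)   # hypothetical opcodes and addresses
print(f"{w:040b}")
print(unpack_word(w))                 # ((1, 500), (33, 501))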
1.2.2 Detail Structure of IAS/Von Neumann Machine

[Fig. 1.2.4 : Structure of IAS computer - main memory and input/output connected to a program control unit and a data processing unit; the registers shown are the accumulator (AC), multiplier-quotient (MQ), data register (DR), instruction buffer register (IBR), program counter (PC), instruction register (IR) and address register (AR)]
Fig. 1.2.4 shows the detailed structure of the IAS computer. It consists of various processing and control units, along with a set of high speed registers (AC, MQ, DR, IBR, PC, IR and AR). These registers are used to store instructions, memory addresses and data.

The complete instruction cycle involves three operations : instruction fetching, opcode decoding and instruction execution. The control circuits in the program control unit are responsible for fetching instructions, decoding opcodes, routing information correctly through the system and providing proper control signals for central processing unit (CPU) actions. After decoding, the arithmetic logic circuits of the data processing unit perform the actions specified by the instruction. An electronic clock circuit (not shown in the Fig. 1.2.4) is used to generate the basic timing signals to synchronize the operation of the different parts of the system. The functioning of the different registers is as given below :
PC (Program Counter)
It is an address register. It is used to store the address of the next instruction to be executed, and hence is also referred to as the instruction address register.

AR (Address Register)
It is a 12-bit address register. It is used to specify the address in memory of the word to be written into or read from the DR.

DR (Data Register)
It is a 40-bit register. It is used to store any 40-bit word. A word transfer can take place between the 40-bit data register DR of the CPU and any memory location. The DR may be used to store an operand during the execution of an instruction.

AC (Accumulator) and MQ (Multiplier-Quotient)
These are two 40-bit registers used for the temporary storage of operands and results.

IR (Instruction Register) and IBR (Instruction Buffer Register)
The program control unit fetches two instructions simultaneously from memory. The opcode of the first instruction is placed in the instruction register (IR) and the instruction that is not to be executed immediately (the second instruction) is placed in the instruction buffer register (IBR).
Before going into the detailed operations of instruction processing, we will see the instructions of the IAS computer.

Instructions
The instructions of the IAS computer are divided into five groups :
• Data transfer
• Unconditional branch
• Conditional branch
• Arithmetic
• Address modify
Table 1.2.1 shows the instruction set of the IAS computer.

[Table 1.2.1 : Instruction set of IAS computer]
Instruction Cycles
Let us see how an instruction is processed. The complete instruction cycle involves three operations : instruction fetching, opcode decoding and instruction execution.

Fig. 1.2.5 shows the basic instruction cycle. After each instruction cycle, the central processing unit checks for any valid interrupt request. If there is one, the central processing unit fetches the instructions of the interrupt service routine and, after completion of the interrupt service routine, starts the new instruction cycle from where it was interrupted. Fig. 1.2.6 shows the instruction cycle with the interrupt cycle.
[Fig. 1.2.5 : Basic instruction cycle - a fetch cycle (fetch the next instruction) followed by an execute cycle (execute the instruction)]

[Fig. 1.2.6 : Basic instruction cycle with interrupt - fetch cycle and execute cycle, followed by a check for interrupts and, if one is pending, processing of the interrupt]

Fig. 1.2.7 shows the principal actions carried out in each cycle.
Fetch Cycle :
The fetch cycle is common to all instructions. We know that the program control unit fetches two instructions simultaneously from memory, so it is necessary to check whether the next instruction is available in the IBR or not. If not, the previously incremented contents of the program counter are transferred to the address register and a Read request is sent to memory (M). The required data at memory location X (M(X)) is then transferred to the data register DR. The opcode of the required instruction (which is in either the left or the right half of the fetched word) is sent to the instruction register and its address part is sent to the address register, while the second instruction may be transferred to the instruction buffer register IBR.

If the next instruction is available in the IBR, its opcode part is sent to the instruction register and its address part is sent to the address register.

It is important to note that the program counter is incremented only when an instruction is read from memory, i.e. when the next instruction is not available in the IBR.
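The fetch logic above can be summarized in a few lines of Python. This is a simplified sketch under assumed register names (not the book's notation); the 40-bit word is split exactly as in the instruction format described earlier.

def fetch(state, memory):
    """One simplified IAS fetch cycle. `state` holds PC, IBR, IR, AR, DR
    and a flag saying whether the IBR holds a valid buffered instruction."""
    if state["ibr_valid"]:
        # Next instruction already buffered : split it into opcode / address.
        state["IR"], state["AR"] = state["IBR"] >> 12, state["IBR"] & 0xFFF
        state["ibr_valid"] = False
    else:
        # Read the 40-bit word addressed by PC, use the left instruction now
        # and buffer the right instruction in the IBR.
        state["AR"] = state["PC"]
        state["DR"] = memory[state["AR"]]
        left, right = state["DR"] >> 20, state["DR"] & 0xFFFFF
        state["IR"], state["AR"] = left >> 12, left & 0xFFF
        state["IBR"], state["ibr_valid"] = right, True
        state["PC"] += 1          # incremented only on a memory read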
[Fig. 1.2.7 : Three phases of the instruction cycle. Fetch cycle : if the next instruction is not in the IBR, then AR <- PC, DR <- M(AR), IR <- DR(0:7), AR <- DR(8:19), IBR <- DR(20:39) and PC <- PC + 1; if it is, then IR <- DR(20:27) and AR <- DR(28:39). Decode cycle : decode the instruction in the IR. Execution cycle : carry out the micro-operations of the decoded instruction, e.g. M(X) <- AC; go to M(X, 20:39); if AC >= 0 then go to M(X, 20:39); AC <- AC - M(X)]
Decode Cycle
In the decode cycle, the instruction in the instruction register is decoded by the control circuits in the program control unit.

Execution Cycle
In the execution cycle, micro-operations depending on the instruction are carried out. Fig. 1.2.7 shows the operations for four instructions : M(X) <- AC, go to M(X, 20:39), if AC >= 0 then go to M(X, 20:39), and AC <- AC - M(X). Note that each instruction is executed by a sequence of micro-operations. For example, instruction M(X) <- AC requires two micro-operations; first the contents of the accumulator are transferred to the data register, and then the contents of the data register DR are transferred to the memory location specified by the address register AR.
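Continuing the earlier fetch sketch, the execute phase for those four example instructions might look as follows; the opcode constants are hypothetical, chosen only so the dispatch reads clearly.

STOR, JMP, JMP_POS, SUB = 0x21, 0x0D, 0x0F, 0x06   # hypothetical opcodes

def execute(state, memory):
    """Carry out the decoded instruction as a sequence of micro-operations
    (only the four example instructions of Fig. 1.2.7 are sketched)."""
    op, x = state["IR"], state["AR"]
    if op == STOR:                        # M(X) <- AC : two micro-operations
        state["DR"] = state["AC"]         #   DR <- AC
        memory[x] = state["DR"]           #   M(AR) <- DR
    elif op == JMP:                       # go to M(X, 20:39)
        state["PC"], state["ibr_valid"] = x, False
    elif op == JMP_POS:                   # if AC >= 0 then go to M(X, 20:39)
        if state["AC"] >= 0:
            state["PC"], state["ibr_valid"] = x, False
    elif op == SUB:                       # AC <- AC - M(X)
        state["DR"] = memory[x]           #   DR <- M(AR)
        state["AC"] -= state["DR"]        #   AC <- AC - DR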
Programming Examples

Program 1 : Subtract two numbers

AC <- M(50)        ; Get the contents of memory location 50 into the accumulator
AC <- AC - M(51)   ; Subtract the contents of memory location 51 from the
                   ; contents of the accumulator and place the result in the
                   ; accumulator
M(52) <- AC        ; Store the contents of the accumulator (the result) in
                   ; memory location 52

Program 2 : Solve the equation 2a + 2b, where M(100) = a and M(101) = b

AC <- M(100)       ; Get the contents of memory location 100 into the accumulator
AC <- AC x 2       ; Multiply accumulator by 2 (a x 2)
M(102) <- AC       ; Save the result of the multiplication
AC <- M(101)       ; Get the contents of memory location 101 into the accumulator
AC <- AC x 2       ; Multiply accumulator by 2 (b x 2)
AC <- AC + M(102)  ; Add the contents of memory location 102 to the contents of
                   ; the accumulator, leaving the result (2a + 2b) in the
                   ; accumulator
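These accumulator-style programs are easy to trace with a tiny interpreter. The sketch below (not from the text; the mnemonic names are invented) runs Program 2 on sample values a = 3, b = 4.

def run(program, memory):
    """Interpret a list of (mnemonic, address) pairs against a dict memory."""
    ac = 0
    for op, addr in program:
        if   op == "LOAD":  ac = memory[addr]        # AC <- M(addr)
        elif op == "SUB":   ac -= memory[addr]       # AC <- AC - M(addr)
        elif op == "ADD":   ac += memory[addr]       # AC <- AC + M(addr)
        elif op == "MUL2":  ac *= 2                  # AC <- AC x 2
        elif op == "STORE": memory[addr] = ac        # M(addr) <- AC
    return ac

mem = {100: 3, 101: 4, 102: 0}                       # a = 3, b = 4
prog = [("LOAD", 100), ("MUL2", None), ("STORE", 102),
        ("LOAD", 101), ("MUL2", None), ("ADD", 102)]
print(run(prog, mem))                                # 2*3 + 2*4 = 14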
1.2.3 Second Generation

In the second generation, represented by the IBM 7094 system, the first major change in the electronic computer came with the replacement of the vacuum tube by the transistor. From the architecture point of view, the second generation CPU differs from that of the IAS computer mainly in the addition of a set of index registers and arithmetic circuits that can handle both floating-point and fixed-point operations. Second generation machines have separate I/O processors having direct access to main memory to control I/O operations.

In the second generation, magnetic core memories and magnetic drum storage devices were more widely used. In this generation, higher level languages such as Fortran were developed, making the preparation of application programs much easier. System programs called compilers were developed to translate these high-level language programs into a corresponding assembly language program, which was then translated into executable machine language form.
Fig. 1.2.8 shows the IBM 7094 configuration. The major difference is the use of data channels. A data channel is an independent I/O module with its own
[Fig. 1.2.8 : IBM 7094 configuration - the CPU and several data channels share main memory through a multiplexer; the data channels serve the card punch, line printer, card reader, magnetic tape units, disk, drum and teleprocessing equipment]
processor (I/O processor) and its own instruction set. A computer system with an I/O processor does not execute detailed I/O instructions. Such instructions are stored in main memory to be executed by a special purpose processor in the data channel itself. In such a configuration, the CPU initiates an I/O transfer by sending a control signal to the data channel, instructing it to execute a sequence of instructions in memory. The data channel can perform this task independently, relieving the CPU of the processing burden.

Another new feature of the 7094 is the use of a multiplexer. It schedules access to the memory from the CPU and data channels, allowing these devices to act independently.
1.2.4 Third Generation

In the mid 1960s, Integrated Circuits (ICs) began to replace the discrete transistor circuits, introducing the third generation of computers. In the third generation, various techniques were introduced to improve the performance of the computer. These are as follows :
• Microprogramming
• Parallel processing : a) Multiprocessing  b) Pipelining
• Sharing resources

Microprogramming was introduced to simplify CPU design and increase its flexibility, whereas parallel processing was introduced to increase the effective speed at which programs could be executed. To cope with the demand for large memory space and to offset the speed difference between the electronic circuits of the CPU and the memory subsystem, semiconductor memories were introduced, replacing ferrite cores.
Table 1.2.2 gives a listing of some third generation systems.

Company                              Products
Burroughs Corporation                B2500, B5500, B6500, B7500, B8500
Sperry Rand / Control Data           UNIVAC 1108; CDC 6600, 7600, CDC STAR-100
                                     (STring ARray computer)
University of Illinois               ILLIAC IV (Illinois Automatic Computer)
Digital Equipment Corporation        PDP-8, PDP-9, PDP-11

Table 1.2.2
In 1964, IBM announced a new (third generation) series of computers, named System/360. They came up with a systematic computer architecture and its implementation. The architecture of a computer is its structure and behaviour as seen by an assembly language programmer. It includes data and instruction formats, addressing modes, the instruction set and the general organization of the CPU registers, main memory and I/O system.

The implementation, on the other hand, refers to the logical and physical design techniques used to realize the architecture in any specific instance. The logical aspect of the implementation can be referred to as computer organization.

IBM has continued to add new computers, the IBM 370 series, to this family (System/360). All computers representing the System/360 and System/370 families have the same architecture, but they have different implementations.
Structure of IBM 360/370

Fig. 1.2.9 shows the general structure of a typical System/360-370 or IBM 360-370 computer.

As shown in Fig. 1.2.9, two separate types of I/O channel are used : multiplexer channels and selector channels. The multiplexer channels allow multiplexed (interleaved) data
transmission between main memory and several I/O devices, whereas selector channels allow data transmission with only one I/O device at a time.
[Fig. 1.2.9 : Structure of an S/360-370 series computer - the CPU and main memory connect through a memory control unit to multiplexer and selector channels, which in turn interface tape, disk and drum units, printers, card readers and other I/O devices over an I/O interface bus carrying data and control signals]
The selector channels are intended for use with very high speed I/O devices, such as magnetic disks, while multiplexer channels are intended for controlling a number of low speed devices such as printers, card readers and card punches.

The I/O devices are interfaced to the memory control unit through selector channels or multiplexer channels. The I/O devices are connected to the selector or multiplexer channels with the help of an I/O interface bus. It consists of data and control signals.
Features of IBM 360-370
• It uses a 32-bit or 4-byte word format.
• It has 16 general registers and 4 double length floating point registers.
• It provides an extensive instruction set, almost 150 opcodes, supported with word, byte, integer, decimal and floating point operations.
• It supports data transfer instructions for register to register, register to memory, memory to register or memory to memory data transfers.
• The main memory of the 360 is faster and can be expanded up to 1 million words in some models.
Basic CPU Implementation

Fig. 1.2.10 shows the structure of the central processing unit for the IBM 360-370. As shown in the Fig. 1.2.10, the arithmetic-logic unit is divided logically into three subunits :
• Fixed point arithmetic unit
• Decimal arithmetic unit
• Floating point arithmetic unit

[Fig. 1.2.10 : Basic CPU implementation of the IBM 360-370 - sixteen 32-bit general registers and four 64-bit floating-point registers feeding the fixed-point, decimal and floating-point arithmetic units]

It performs the following operations :
• Fixed point operations, including binary integer arithmetic and effective address computation.
• Floating point arithmetic.
• Variable-length operations, including decimal arithmetic and character string operations.
As mentioned earlier, the 360 system consists of sixteen 32-bit general registers. These are used to store operands and results. For floating point arithmetic, four 64-bit floating-point registers are used. The address register AR stores the address for memory access. The data transfer with memory is done through the data register DR. The instruction register stores the opcode portion of the instruction. The Program Status Word (PSW) consists of two 32-bit words. It stores the program status, the interrupts that the CPU may respond to, and the address of the next instruction to be executed.
1.2.5 Later Generations

Beyond the third generation, it is not easy to define generations of computers. Later generations are based on advances in integrated-circuit technology. The impact of large scale integration (LSI) and very large scale integration (VLSI) technology on computer design has been profound. It has made it possible to fabricate an entire CPU, main memory or similar device with a single IC that can be mass produced at very low cost. This has resulted in new classes of machines such as personal computers and high-performance parallel processors that contain thousands of CPUs.
Review Questions
1. List the data and instruction formats used by the Von Neumann machine.
2. List the instructions supported by the IAS computer.
3. What do you mean by instruction cycle ?
4. Briefly explain the organization of the IAS computer with its instruction set.
5. With a neat diagram, explain the Von Neumann computer architecture.
VLSI Era
As mentioned earlier, very large scale integration (VLSI) allows manufacturers to fabricate a CPU, main memory or even all the electronic circuits of a computer on a single IC that can be mass-produced at very low cost. This has resulted in new classes of machines ranging from portable personal computers to supercomputers that contain thousands of CPUs.
Integrated Circuits
Since the 1960s, a powerful technology for manufacturing different circuits has been the integrated circuit or IC. An integrated circuit is a group of transistors, diodes, resistors and sometimes capacitors wired together on a very small substrate (or wafer).
Using IC technology, tens of thousands of components can be contained in a single integrated circuit. Integrated-circuit technology offers many advantages over discrete components interconnected by conventional techniques. The important advantages are :
1. Low cost  2. Small size  3. High reliability  4. Improved performance
Large Scale Integration (LSI) is an extension of integrated-circuit (IC) techniques. LSI represents the process of fabricating chips with a large number of components which are interconnected to form complete subsystems or systems. In 1972, typically more than 100 gates or 1000 individual circuit components were contained in commercially available LSI circuits.
Medium Scale Integration (MSI) devices have a component density less than LSI but more than about 100 components per chip.
The latest level is Very Large Scale Integration, i.e. VLSI. The impact of large scale integration (LSI) and very large scale integration (VLSI) technology on computer design has been profound. The third and later generations of computers are based on advances in integrated circuit technology. VLSI technology has made it possible to fabricate the entire CPU, main memory and I/O circuits of a computer on a single IC that can be mass-produced at very low cost. This has resulted in new classes of machines such as personal computers and high-performance parallel processors that contain thousands of CPUs. ICs can be manufactured in high volume at low cost per circuit.
For any computer system, system performance depends on the processor as well as on the performance of the main memory system. Dynamic Random-Access Memory (DRAM) is the basic building block of main memory. Around 1970, a pocket calculator was manufactured using a single IC chip. After this achievement, single-chip DRAMs and microprocessors were developed. IC technology has developed continuously : the DRAM chip, which held 1 K = 2^10 bits in 1970, has grown steadily in capacity, and nowadays 1 Gbit DRAMs are available. As IC technology improved and chip density (i.e. the number of transistors contained in a chip) increased, the complexity and performance of one-chip microprocessors increased steadily. This is reflected in the increase in CPU word size from 4-bit to 8-bit, 16-bit, 32-bit and up to 64-bit. By 1990 it became possible to fabricate an entire CPU, along with part of its main memory, on a single IC.
IC Families
According to the transistor and circuit types employed, different sub-technologies exist within IC technology. The most important among these are bipolar and MOS (Metal-Oxide-Semiconductor), which is also referred to as unipolar. The basic elements in both types of circuits are transistors. The difference between them is in the polarities of the electric charges associated with the primary carriers of electrical signals within their transistors. Bipolar circuits use both types of carriers, whereas MOS circuits use only one : negative carriers (electrons, in the case of NMOS) or positive carriers (holes, in the case of PMOS). CMOS is the MOS family which combines PMOS and NMOS transistors in the same IC. This technology was widely used in the 1980s. It is preferred by many manufacturers for microprocessors and other VLSI ICs because of advantages such as high density, high speed and, at the same time, very low power consumption.
Processor Architecture
By 1980, computers were classified by their sizes and capabilities as mainframe computers, minicomputers and microcomputers.
Mainframe computers are implemented using two or more central processing units (CPUs). They are designed to work at very high speeds with large data word lengths, typically 64 bits or greater, and their data storage capacity is very high. They are used for complex scientific and large data processing applications, military defense control and complex graphics applications.
Minicomputers are scaled-down versions of mainframe computers with moderate speed and storage capacity. They are designed to process smaller data words, typically 32-bit words. This type of computer is used for scientific calculations, research, data processing applications, etc.
Microcomputers are smaller computers. They contain only one central processing unit (a microprocessor), which is usually a single integrated circuit. A microcomputer is the integration of a microprocessor and supporting peripherals (memory and I/O devices). The word length depends on the microprocessor used and is in the range of 8 bits to 32 bits. This type of computer is used for small industrial control, process control and other applications where storage and speed requirements are moderate.
In the mid 1970s, microcomputer technology gave rise to a new class of machines called personal computers (PCs). A typical PC has the Von Neumann organization. That is, it includes a microprocessor, a multi-megabyte main memory, I/O devices (e.g. keyboard, video monitor or screen), a magnetic or optical disk drive unit for high capacity secondary memory, and interface circuits for connecting the PC to I/O devices and to other computers. PCs are very useful in offices and homes for education, entertainment and, increasingly, communication with other computers via the World Wide Web (WWW).
In 1981 the IBM PC personal computer family was introduced, which became the most successful PC family. The IBM PC series began with the 8086 microprocessor. Because of advances in VLSI technology, processor hardware became much less expensive and computer designers increased the use of complex, multistep instructions. The 8086 microprocessor was followed by the 80186, 80286, 80386, 80486 and Pentium. In 1984, Apple Computer introduced the Macintosh, another popular personal computer series, based on the Motorola 680x0 microprocessor family. The advances in VLSI made it possible to add new features such as new instructions, data types and addressing modes to old microprocessors, while the ability to execute programs written for the older machines was preserved. A single complex instruction can be used to replace a number of instructions for a given task. For example, a multiplication operation can be performed by repeated execution of add instructions; in the 8086 microprocessor, a multiplication instruction is available which replaces such repeated additions and thereby reduces the overall program execution time. The Intel 80x86/Pentium series illustrates the trend toward more complex instruction sets. The Intel 8086 microprocessor chip contains 20,000 transistors and was designed to process 16-bit data words. Fifteen years later, Intel introduced the Pentium, which contained over 3 million transistors. It can process 32-bit and 64-bit words directly, and it can also perform floating-point operations. It is a superscalar processor, i.e. it is capable of executing multiple instructions in parallel.
The 80x86 and 680x0 are called complex instruction set computers (CISC). Complex instructions reduce program size, but this does not necessarily translate into faster program execution : complex instructions require relatively complex processing circuits, which puts CISCs in the largest and most expensive IC category and can reduce the computer's overall performance. To overcome this drawback, the Reduced Instruction Set Computer (RISC) approach was introduced by IBM. In the 1980s, a number of commercially successful RISC microprocessors were introduced, such as the IBM RISC System/6000 and SPARC.
Advances in VLSI technology affecting all types of computers tend to increase the CPU's clock frequency and hence to reduce program execution time.
The PowerPC architecture, developed by Motorola, IBM and Apple Computer, is based on the POWER architecture implemented by the RS/6000 family of computers. The PowerPC architecture takes advantage of technological advances in such areas as process technology, compiler design and RISC microprocessor design to provide software compatibility across a diverse family of implementations, primarily single-chip microprocessors, intended for a wide range of systems, including battery-powered personal computers, embedded controllers, high-end scientific and graphics workstations, and multiprocessing, microprocessor-based mainframes.
Because of further advances in VLSI, Intel came out with the first 8-bit microcontroller, the 8048. Unlike microprocessors, microcontrollers are generally optimized for specific applications; the Intel 8048 was designed for general control tasks. After that, high performance microcontroller families such as the MCS-51 and MCS-96 were developed. The 8051 in the MCS-51 family was optimized for 8-bit math and single-bit Boolean operations, and the 8096 in the MCS-96 family was designed for high speed / high performance control applications. Overall, these families provide larger program and data memory spaces, more flexible I/O and peripheral capabilities, greater speed and lower system cost than any previous generation of single-chip microcontrollers.
IC technology has also been the driving force behind the proliferation of large-scale computer networks such as the Internet. All this discussion shows the impact of VLSI on computer design and application.
Performance
When we say one computer is faster than another, we compare their speeds and observe that the faster computer runs a program in less time than the other computers. The manager of a computer centre running a large server system may say a computer is faster when it completes more jobs in an hour. The computer user is always interested in reducing the time between the start and the completion of a program or event, i.e. reducing the execution time. The execution time is also referred to as response time. Reduction in response time increases the throughput (the total amount of work done in a given time). The performance of a computer is directly related to throughput and hence is the reciprocal of execution time :

Performance_A = 1 / Execution time_A
This means that for two computers A and B, if the performance of A is greater than the performance of B, we have

Performance_A > Performance_B
1 / Execution time_A > 1 / Execution time_B

That is, the execution time on B is longer than that on A if A is faster than B.
In discussing a computer design, we often want to relate the performance of two different computers quantitatively. We will use the phrase "A is n times faster than B", or equivalently "A is n times as fast as B", to mean

Performance_A / Performance_B = n

If A is n times faster than B, then the execution time on B is n times longer than it is on A :

Performance_A / Performance_B = Execution time_B / Execution time_A = n
Example : If computer A runs a program in 10 seconds and computer B runs the same program in 25 seconds, how much faster is A than B ?
Solution : We know that A is n times faster than B if

Performance_A / Performance_B = Execution time_B / Execution time_A = n

Thus the performance ratio is

25 / 10 = 2.5

and A is therefore 2.5 times faster than B.
In the above example, we could also say that computer B is 2.5 times slower than computer A, since

Performance_A / Performance_B = 2.5

means that

Performance_A = 2.5 × Performance_B
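The reciprocal relation between performance and execution time is easy to check numerically. The following Python sketch (illustrative only; the function names are ours, not from the text) reproduces the figures used above :

    def performance(execution_time_s):
        # Performance is the reciprocal of execution time.
        return 1.0 / execution_time_s

    def times_faster(time_a_s, time_b_s):
        # "A is n times faster than B" means n = time_B / time_A.
        return time_b_s / time_a_s

    # Computer A runs the program in 10 s, computer B in 25 s.
    print(times_faster(10.0, 25.0))   # 2.5, i.e. A is 2.5 times faster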
For simplicity, we will normally use the term "faster than" when we compare computers quantitatively. Because performance and execution time are reciprocals, increasing performance requires decreasing execution time. To avoid potential confusion between the terms increasing and decreasing, we usually say "improve performance" or "improve execution time" when we mean "increase performance" and "decrease execution time".
The ideal performance of a computer system is achieved when there is a perfect match between the machine capability and the program behaviour. Machine capability can be enhanced with better hardware technology, innovative architectural features and efficient resource management. However, program behaviour is difficult to predict, since it depends heavily on the application and on run-time conditions. Program behaviour also depends on the algorithm design, the data structures used, language efficiency, programmer skill and compiler technology. Let us see the factors used for projecting the performance of a computer.
Processor Clock
In today's digital computers, the CPU, or simply the processor, is driven by a clock with a constant cycle time, called the processor clock. The time period of the processor clock is denoted by P. The period P of one clock cycle is an important parameter that affects processor performance. The clock rate is given by R = 1/P, which is measured in cycles per second. The electrical unit for cycles per second is hertz (Hz). Today's personal computers and workstations have clock rates in the range of megahertz (MHz) and gigahertz (GHz). A computer with a clock rate of 800 MHz executes 800 million clock cycles per second.
CPU Time
CPU execution time, or simply CPU time, is the time the CPU spends computing for a particular task; it does not include time spent waiting for I/O or running other programs. CPU time can be divided into the CPU time spent in the program, called user CPU time, and the CPU time spent in the operating system performing tasks on behalf of the program, called system CPU time. Differentiating between system and user CPU time accurately is difficult, because it is often hard to assign responsibility for operating system activities to one user program rather than another, and because of the functionality differences among operating systems. We use CPU performance to refer to user CPU time.
Performance Metrics
Users and designers often examine performance using different metrics. If we could relate these different metrics, we could determine the effect of a design change on the performance as seen by the user. Since we are confining ourselves to CPU performance at this point, the bottom-line performance measure is CPU execution time. A simple formula relates the most basic metrics (clock cycles and clock cycle time) to CPU time :

CPU execution time for a program = CPU clock cycles for a program × Clock cycle time

Alternatively, because clock rate and clock cycle time are inverses,

CPU execution time for a program = CPU clock cycles for a program / Clock rate

This formula makes it clear that the hardware designer can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program.
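As a quick illustration of the formula, the following Python sketch (the function name is ours) computes CPU time from the two basic metrics :

    def cpu_time(clock_cycles, clock_rate_hz):
        # CPU execution time = clock cycles for the program / clock rate.
        return clock_cycles / clock_rate_hz

    # Example : 10**9 clock cycles on a 500 MHz processor take 2 seconds.
    print(cpu_time(10**9, 500e6))   # 2.0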
Hardware Software Interface
The previous equations do not include any reference to the number of instructions needed for the program. However, since the compiler clearly generates instructions to execute, and the computer has to execute those instructions to run the program, the execution time must depend on the number of instructions in the program.
To execute a program, the processor has to execute a number of machine language instructions. This number is denoted by N. The number N is the actual number of instructions executed by the processor and is not necessarily equal to the number of machine instructions in the machine language program, because some instructions may be executed more than once inside a loop and others may not be executed at all. Each machine instruction takes one or more clock cycles to execute; this time is required to perform the various steps needed to execute the instruction. The average number of basic steps required to execute one machine instruction is denoted by S, where each basic step is completed in one clock cycle. Thus, the program execution time is given by

T = (N × S) / R    ... (1.3.1)

where N is the actual number of instructions executed by the processor for the execution of a program, R is the clock rate measured in cycles per second and S is the average number of steps needed to execute one machine instruction. The above equation is known as the basic performance equation.
When machine instruction execution time is measured in terms of cycles per instruction (CPI), the program execution time is given as

T = (N × CPI) / R    ... (1.3.2)
We know that each instruction execution involves a cycle of events : instruction fetch, decode, operand(s) fetch, execution and storing of results. We need to access memory to perform the instruction fetch, to fetch operand(s) or to store results. The memory cycle is the time needed to complete one memory reference. Usually, a memory cycle is k times the processor cycle P. The value of k depends on the speed of the memory technology and on the interconnection scheme used to interface the memory and the processor.
The CPI of an instruction can be divided into two component terms corresponding to the total processor cycles and memory cycles needed to complete the execution of the instruction. Therefore, we can rewrite equation (1.3.2) as

T = N × (p + m × k) / R    ... (1.3.3)

where p is the number of processor cycles required for instruction decode and execution, m is the number of memory references needed, k is the ratio between the memory cycle and the processor cycle, N is the machine instruction count and R is the clock rate. These performance parameters, i.e. N, p, m, k and R, are affected by four system attributes : instruction set architecture, compiler technology, CPU implementation and control, and cache and memory hierarchy, as shown in Table 1.3.1.
System attributes                     | N | p | m | k | R
Instruction set architecture          | ✓ | ✓ |   |   |
Compiler technology                   | ✓ | ✓ | ✓ |   |
Processor implementation and control  |   | ✓ |   |   | ✓
Cache and memory hierarchy            |   |   |   | ✓ | ✓

(N = machine instruction count, p = processor cycles per instruction, m = memory references per instruction, k = memory access latency ratio, R = clock rate)

Table 1.3.1 Performance parameters versus system attributes
The instruction set architecture affects the machine instruction count (N), i.e. the program length, and the average number of processor cycles required per instruction (p). The compiler technology affects the values of N, p and the memory reference count (m). The processor implementation and control determine the total processor time (p/R) required. Finally, the memory technology and hierarchy design affect the memory access latency (k/R).
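The basic performance equation (1.3.3) can be evaluated directly. The sketch below (the workload numbers are hypothetical, chosen only to exercise the formula) uses the parameter names of the text :

    def execution_time(n, p, m, k, r_hz):
        # T = N * (p + m * k) / R, where
        #   n    : machine instruction count (N)
        #   p    : processor cycles per instruction
        #   m    : memory references per instruction
        #   k    : memory cycle / processor cycle ratio
        #   r_hz : clock rate R in cycles per second
        return n * (p + m * k) / r_hz

    # 10 million instructions with p = 2, m = 0.4, k = 5 at 1 GHz :
    # T = 10e6 * (2 + 0.4 * 5) / 1e9 = 0.04 s
    print(execution_time(10_000_000, 2.0, 0.4, 5.0, 1e9))   # 0.04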
Example : Two computers use the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much ?
Solution : We know that each computer executes the same number of instructions for the program; let us call this number N. First, find the number of processor clock cycles for each computer :

CPU clock cycles_A = N × 2.0
CPU clock cycles_B = N × 1.2
The CPU time for each machine will be

CPU time_A = CPU clock cycles_A × Clock cycle time_A = N × 2.0 × 250 ps = 500 N ps
CPU time_B = CPU clock cycles_B × Clock cycle time_B = N × 1.2 × 500 ps = 600 N ps
Thus we can say that computer A is faster. The amount faster is given by the ratio of the execution times :

CPU Performance_A / CPU Performance_B = Execution time_B / Execution time_A = 600 N ps / 500 N ps = 1.2

We can conclude that computer A is 1.2 times faster than computer B for this program.
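The same result can be checked in a few lines of Python (a sketch; note that the instruction count N cancels out of the ratio) :

    cpi_a, cycle_time_a_ps = 2.0, 250.0
    cpi_b, cycle_time_b_ps = 1.2, 500.0

    time_per_instr_a = cpi_a * cycle_time_a_ps   # 500 ps per instruction
    time_per_instr_b = cpi_b * cycle_time_b_ps   # 600 ps per instruction

    print(time_per_instr_b / time_per_instr_a)   # 1.2 -> A is 1.2 times faster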
Other Performance Measures
MIPS is another way to measure processor speed. The processor speed can be measured in terms of millions of instructions per second (MIPS). It is given as

MIPS rate = 1 / (Average time required for the execution of an instruction × 10^6)
          = R / (CPI × 10^6)    ... (1.3.4)

Substituting the value of T from equation (1.3.2), we get

MIPS rate = N / (T × 10^6)    ... (1.3.5)

Referring to equation (1.3.2), we can also write

MIPS rate = (N × R) / (C × 10^6)

where C is the total number of clock cycles required to execute a given program (C = N × CPI).
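The MIPS formulas translate directly into code. A minimal Python sketch (the function names are ours) :

    def mips_from_cpi(clock_rate_hz, cpi):
        # MIPS rate = R / (CPI * 10**6)   ... equation (1.3.4)
        return clock_rate_hz / (cpi * 1e6)

    def mips_from_time(instruction_count, exec_time_s):
        # MIPS rate = N / (T * 10**6)     ... equation (1.3.5)
        return instruction_count / (exec_time_s * 1e6)

    # A 400 MHz processor with CPI = 2 delivers 200 MIPS.
    print(mips_from_cpi(400e6, 2.0))   # 200.0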
Throughput Rate
Another important measure is the throughput rate. It indicates the number of programs a system can execute per unit time and is often specified in programs/second. Throughput can be measured separately for the system (W_s) and for the processor (W_p). The processor throughput is given as

W_p = Number of machine instructions executed per second / Number of machine instructions per program    ... (1.3.6)
    = (MIPS rate × 10^6) / N

W_p is often greater than the system throughput W_s, because system throughput must account for the system overheads caused by I/O, the compiler and the OS (operating system) when multiple programs are interleaved for processor execution by multiprogramming or time sharing. If the processor is kept busy in a perfect program-interleaving fashion, then W_s = W_p. This will probably never happen, since system overhead often causes extra delay and the processor may be left idle for some cycles.
MFLOPS
The 1970s and 1980s marked the growth of the supercomputer industry, which was defined by high performance on floating-point-intensive programs. Average instruction time and MIPS were clearly inappropriate metrics for this industry, hence another popular alternative to execution time was invented : millions of floating-point operations per second, abbreviated MFLOPS and pronounced "megaflops". MFLOPS can be defined as

MFLOPS = Number of floating-point operations in a program / (Execution time × 10^6)

where a floating-point operation is an addition, subtraction, multiplication or division applied to numbers in single or double precision floating-point representation. Such data items are heavily used in scientific calculations and are specified in programming languages using keywords like float, real, double or double precision.
The MFLOPS rating is dependent on the program : different programs require the execution of different numbers of floating-point operations. Since MFLOPS was intended to measure floating-point performance, it is not applicable outside that domain.
MFLOPS is based on operations in the program rather than on instructions, hence it has a stronger claim than MIPS to being a fair basis for comparing different machines. The key point is that the same program running on different computers may execute a different number of instructions but will always execute the same number of floating-point operations. Unfortunately, MFLOPS is not dependable, because the set of floating-point operations is not consistent across machines and the number of actual floating-point operations performed may vary. For example, a processor which does not provide a division instruction requires several floating-point operations to perform a floating-point division, whereas a processor which provides a division instruction requires only one.
Another major problem is that the MFLOPS rating changes according not only to the mixture of integer and floating-point operations but also to the mixture of fast and slow floating-point operations. For example, a program with only floating-point add operations will have a higher rating than a program with floating-point division operations. This problem can be solved by giving more weight to the complex floating-point operations while measuring the performance; such ratings might be called normalized MFLOPS. Of course, because of the counting and weighting, these normalized MFLOPS may be very different from the actual rate at which a machine executes floating-point operations.
Performance Measurement
When we compare the performance of different computers, say A, B and C, we may observe that some programs run faster on computer A, some on computer B and some on computer C. In this situation they present a confusing picture and we cannot get a clear idea of which computer is faster. This happens because each computer executes particular instructions, or particular steps of instruction execution, faster than the others.
We know that the processing of an instruction involves several steps :
• Fetch the instruction from main memory M.
• Decode the instruction opcode.
• Load the operands from main memory if they are not in the CPU registers.
• Execute the instruction using the appropriate functional unit, such as a floating-point adder or fixed-point adder.
• Store the results in main memory unless they are to be retained in CPU registers.
Not all instructions require all of the steps listed above. An instruction that has all its operands in CPU registers will run faster, whereas an instruction that requires multiple memory accesses takes more time to execute. Let us consider two programs P_1 and P_2, whose instructions have all operands in the CPU and all operands in memory, respectively. Also consider two computers C_1 and C_2 : the clock speed of C_1 is greater than the clock speed of C_2, but the memory access time in C_2 is less than the memory access time in C_1. Under these conditions, C_1 will execute program P_1 faster than C_2, and C_2 will execute program P_2 faster than C_1. In such situations it is difficult to decide which computer is faster. Therefore, measures of instruction execution performance are based on average figures, which are usually determined experimentally by measuring the run times of representative programs called benchmark programs. In recent years, it has become popular to put together collections of benchmarks to try to measure the performance of processors with a variety of applications. Different benchmark programs are used for checking the performance of a processor for different applications. According to the application, benchmark programs are classified as :
• Desktop benchmarks
• Server benchmarks
• Embedded benchmarks
Desktop Benchmarks
Desktop benchmarks divide into two broad classes : CPU-intensive benchmarks and graphics-intensive benchmarks. These two classes measure the CPU performance and the graphics performance of the processor, respectively.
Server Benchmarks
We know that servers have to perform many functions, so there are multiple types of benchmark programs for servers.
• CPU throughput oriented benchmark : This benchmark can be used to measure the processing rate of a multiprocessor by running multiple copies of a CPU benchmark, one for each CPU, and converting the CPU time into a rate. This particular measurement is known as the SPEC rate.
• Web server benchmark : This benchmark simulates multiple clients requesting both static and dynamic pages from a server, as well as clients posting data to the server.
• File system benchmark : It is used to measure network file system (NFS) performance using a script of file server requests. It tests the performance of the I/O system (both disk and network I/O) as well as the CPU.
• Transaction processing benchmark : It is used to measure the ability of a system to handle transactions, which consist of database accesses and updates. In the mid 1980s, a group of concerned engineers formed the vendor-independent Transaction Processing Council (TPC) to try to create a set of realistic and fair benchmark programs for transaction processing. Following the first TPC benchmark, many benchmarks were published, namely TPC-A, TPC-C, TPC-H, TPC-R and TPC-W. All these benchmarks measure performance in transactions per second. In addition, they include a response time requirement, so that throughput performance is measured only when the response time limit is met.
Embedded Benchmarks
Embedded applications have enormous variety, and their performance requirements are also different. Thus, it is unrealistic to have a single set of benchmark programs for embedded systems. In practice, many designers of embedded systems use benchmark programs that reflect their application, either as kernels or as stand-alone versions of the entire application.
A set of standardized benchmark programs from the EDN Embedded Microprocessor Benchmark Consortium (EEMBC) is available for embedded applications that are characterized well by kernel performance. These benchmark programs are divided into five classes :
• Automotive/industrial
• Consumer
• Networking
• Office automation
• Telecommunications
The automotive/industrial benchmark programs include microbenchmarks for arithmetic operations, pointer chasing, memory performance, matrix arithmetic, table lookup and bit manipulation; they also include automobile control and FFT benchmarks. The consumer benchmark programs mainly include multimedia benchmarks such as JPEG compression/decompression, filtering and RGB conversions. The networking benchmark is a collection of programs for shortest path calculations, IP routing and packet flow operations. The office automation benchmark programs include graphics and text benchmarks such as Bezier curve calculation, dithering, image rotation and text processing. Finally, the telecommunication benchmark programs include filtering and DSP benchmarks.
The selected benchmark programs are compiled for the computer under test and their running time on the real computer is measured. The same benchmark programs are also compiled and run on a reference computer. A nonprofit organization called the Standard Performance Evaluation Corporation (SPEC) specified the benchmark programs and reference computers in 1995 and again in 2000. For SPEC95, the reference computer is the SUN SPARCstation 10/40, and for SPEC2000 the reference computer is an UltraSPARC10 workstation with a 300 MHz UltraSPARC-IIi processor.
The running time of a benchmark program is compared for the computer under test and the reference computer to decide the SPEC rating of the computer under test. The SPEC rating is given by

SPEC rating = Running time on the reference computer / Running time on the computer under test

The SPEC rating for each selected program is calculated individually, and then the geometric mean of the results is computed to determine the overall SPEC rating of the computer under test. It is given by

SPEC rating = ( Π_{i=1}^{n} SPEC_i )^{1/n}

where n is the number of benchmark programs used for determining the SPEC rating. Computers providing higher performance have a higher SPEC rating.
Review Questions
1. Write a short note on the VLSI era.
2. Explain the advantages offered by integrated circuit technology.
3. Write a short note on IC families.
4. Write a short note on the evolution of computers and their performance considerations.
5. What are the important factors that determine a computer's performance ? Give their significance.
6. Write the basic performance equation and, using this equation, explain how the performance of a system can be improved.
Design
This section explains the design process for a digital system at two basic levels of abstraction : the register level and the processor level.
[EEA System Design
‘A computer is a large and complex system in which system objects are the
components of the computer. These components are connected to perform a specific
function, The function of such system is determined by the functions of its components
and how the components are connected
System Representation
We can represent a system using a graph or a block diagram; a computer system is usually represented by a block diagram. A system has two properties : its structure and its behaviour. We can define the structure of a system as the abstract graph consisting of its block diagram with no functional information, as shown in Fig. 1.4.1.

Fig. 1.4.1 (a) Block diagram representing an EX-NOR logic circuit (b) Structure of the system as an abstract graph
As shown in Fig. 1.4.1, the structure gives the components and their interconnection. A behavioural description, on the other hand, describes the function of each component and thus the function of the system. The behaviour of a logic circuit may be represented by a Boolean function or by a truth table. The behaviour of logic circuits can also be described in a hardware description language such as VHDL, which can provide precise, technology-independent descriptions of digital circuits at various levels of abstraction, primarily the gate and register levels.
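As a simple illustration of a behavioural description, the truth-table behaviour of the EX-NOR circuit of Fig. 1.4.1 can be modelled in a few lines of Python (a sketch; the structure of Fig. 1.4.1 realizes this same behaviour with NOT, AND and OR gates) :

    def ex_nor(a, b):
        # EX-NOR : output is 1 exactly when the two inputs are equal.
        return 1 if a == b else 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, ex_nor(a, b))   # prints the truth table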
Design Process
For a given system structure, the task of determining its function or behaviour is termed analysis. Conversely, the problem of determining a system structure that exhibits a given behaviour is termed design or synthesis.
The design process starts with the construction of an initial design : given a desired range of behaviour and a set of available components, we have to determine a structure (a design). The next step is to evaluate its cost and performance, which should be in the acceptable range. Then we have to confirm whether the resulting structure achieves the desired behaviour; if not, we have to modify the design to meet the design goals. Fig. 1.4.2 illustrates the design process.
Fig. 1.4.2 Design process
Computer-aided Design
Computer-aided design (CAD) tools provide designers with a range of programs to support their design goals. They are used to automate, fully or partly, the more tedious design and evaluation steps, and they contribute to the overall design process in three important ways :
• CAD editors or translators convert design data into forms such as HDL descriptions or schematic diagrams, which can be processed efficiently by humans, by computers or by both.
• Simulators create a computer model of the design and can mimic the design's behaviour, helping the designer determine how well the design meets various performance and cost goals.
• Synthesizers derive structures that implement all or part of some design step.
Design Levels
The design of a computer system can be carried out at several levels of abstraction. The three commonly used levels are :
• The processor level, also called the architecture, behaviour or system level.
• The register level, also called the register-transfer level (RTL).
• The gate level, also called the logic level.
Table 1.4.1 shows the comparison between these levels.

Design Level | Components                                                              | IC Density | Information Units | Time Units
Processor    | CPUs, memories, I/O devices                                             | VLSI       | Blocks of words   | 10^-6 to 10^-3 s
Register     | Registers, counters, combinational circuits, small sequential circuits  | MSI        | Words             | 10^-9 to 10^-6 s
Gate         | Logic gates, flip-flops                                                 | SSI        | Bits              | 10^-12 to 10^-9 s

Table 1.4.1 Comparison between design levels
A complex system is typically designed in the following three steps :
• Specify the processor-level structure of the system.
• Specify the register-level structure of each component type identified in step 1.
• Specify the gate-level structure of each component type identified in step 2.
This design approach is known as the top-down design approach, and it is extensively used in both hardware and software design. It is up to the designer to decide whether to design a system using medium-scale ICs, small-scale ICs or a single IC composed of standard cells. If the system is to be designed using medium-scale ICs or standard cells, then the third step, gate-level design, is no longer needed. In the following sections we discuss the register-level and processor-level design approaches.
Review Questions
1. Explain the design process for a digital system.
2. What do you understand by design levels in the design of a computer system ?
3. Explain the top-down design approach.
Register Level
At the register or register-transfer level, related information bits are grouped to form words or vectors. These words are processed by small combinational or sequential circuits.
Register-Level Components
Table 1.5.1 shows the commonly used register-level components and their functions. These components are linked together to form larger register-level systems.
Type          | Component                       | Functions
Combinational | Word gates                      | Boolean operations.
              | Multiplexers and demultiplexers | Data routing; general combinational functions.
              | Decoders and encoders           | Code checking and conversion.
              | Adders and subtracters          | Addition and subtraction.
              | Arithmetic-logic units          | Numerical and logical operations.
              | Programmable logic devices      | General combinational functions.
Sequential    | Registers                       | Information storage.
              | Shift registers                 | Information storage; serial-parallel conversion.
              | Counters                        | Control/timing signal generation.
              | Programmable logic devices      | General sequential functions.

Table 1.5.1 Commonly used register-level components
Fig. 1.5.1 shows the generic block representation of a register-level component. The "/m" on the input lines indicates an m-bit input bus; in general, a slash '/' with a number or letter next to it indicates a multi-bit bus. A bubble at the start or end of a line indicates an active-low signal; otherwise the signal is active high. The input and output data lines are shown separately, and similarly the input and output control lines are shown separately. The input control lines associated with a multifunction block fall into two broad categories : select lines and enable lines. The select lines specify one of several possible operations that the unit is to perform, and the enable lines specify the time or condition for a selected operation to be performed. The output control signals, if any, indicate when or how the unit completes its processing.

Fig. 1.5.1 Generic block representation of a register-level component

Let us see the major combinational and sequential components used in design at the register level.

Word Gates
Logical functions can be performed on m-bit binary words using word gate operators. Let A = (a_1, a_2, ..., a_m) and B = (b_1, b_2, ..., b_m) be two m-bit words; we can perform the bitwise AND operation on them to produce another m-bit result, as shown in Fig. 1.5.2.

Fig. 1.5.2 Two-input, m-bit AND word gate : (a) Logic diagram (b) Symbol

Multiplexers
A multiplexer is a digital switch. It allows digital information from several sources to be routed onto a single output line, as shown in Fig. 1.5.3. The basic multiplexer has several data-input lines and a single output line. The selection of a particular input line is controlled by a set of selection lines. Normally, there are 2^n input lines and n selection lines whose bit combinations determine which input is selected. Therefore, a multiplexer is a 'many into one' circuit and provides the digital equivalent of an analog selector switch.

Fig. 1.5.3 Analog selector switch
Fig. 1.5.4 shows a 4-to-1 line multiplexer. Each of the four data lines, D_0 to D_3, is applied to one input of an AND gate, and the selection lines are decoded to select a particular AND gate.
Fig. 1.5.4 4-to-1 line multiplexer : (a) Logic diagram (b) Function table (c) Logical symbol
For example, when S_1 S_0 = 01, the AND gate associated with data input D_1 has two of its inputs equal to 1 and the third input connected to D_1. The other three AND gates have at least one input equal to 0, which makes their outputs equal to 0. The OR gate output is then equal to the value of D_1; thus data bit D_1 is routed to the output when S_1 S_0 = 01.
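The routing behaviour of the 4-to-1 multiplexer can be modelled behaviourally in Python (a sketch; the names are ours, not a gate-level description) :

    def mux4(d, s1, s0):
        # Route data input D[index] to the output, index = (S1 S0) in binary.
        return d[(s1 << 1) | s0]

    # With S1 S0 = 01, data bit D1 appears at the output, as described above.
    print(mux4([0, 1, 0, 0], s1=0, s0=1))   # prints the value of D1, i.e. 1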
In general, if a multiplexer has 2^n inputs, we can represent it as shown in Fig. 1.5.5.

Fig. 1.5.5 General representation of a 2^n-input multiplexer
In some cases, two or more multiplexers are enclosed within one IC package, as shown in Fig. 1.5.6. Fig. 1.5.6 shows a quadruple 2-to-1 line multiplexer, i.e. four multiplexers, each capable of selecting one of two input lines. Output Y_1 can be selected to be equal to either A_1 or B_1; similarly, output Y_2 may take the value of A_2 or B_2, and so on. The selection line S selects one of the two lines in all four multiplexers. The control input E enables the multiplexers in the 0 state and disables them in the 1 state. When E = 1, all outputs are 0, regardless of the value of S.
Fig. 1.5.6 Quadruple 2-to-1 line multiplexer
In general, if m multiplexers are enclosed together, each capable of selecting one of 2^n input lines, we can represent the unit as shown in Fig. 1.5.7. Such a multiplexer is called a 2^n-input, m-bit multiplexer.
Fig. 1.5.7 A 2^n-input, m-bit multiplexer

Expanding Multiplexers
Several digital multiplexer ICs are available, such as the 74150 (16-to-1), 74151 (8-to-1), 74157 (quad 2-input) and 74153 (dual 4-to-1) multiplexers. It is possible to expand the range of inputs for a multiplexer beyond the range available in the integrated circuits. This can be accomplished by interconnecting several multiplexers. For example, two 74XX151 8-to-1 multiplexers can be used together to form a 16-to-1 multiplexer, two 74XX150 16-to-1 multiplexers can be used together to form a 32-to-1 multiplexer, and so on. Fig. 1.5.8 shows an eight-input multiplexer constructed from two-input multiplexers.
Fig. 1.5.8 An eight-input multiplexer constructed from two-input multiplexers
Multiplexer as Function Generator
A multiplexer consists of a set of AND gates whose outputs are connected to a single OR gate. Because of this construction, any Boolean function in sum-of-products (SOP) form can be realized easily using a multiplexer : each AND gate in the multiplexer represents a minterm. In an 8-to-1 multiplexer there are 3 select inputs and 2^3 minterms. By connecting the function variables directly to the select inputs, a multiplexer can be made to select the AND gate that corresponds to a minterm of the function. If a minterm exists in the function, we connect the corresponding AND gate data input to logic 1; otherwise we connect it to logic 0. This is illustrated in the following example.
Example : Implement the following Boolean function using an 8 : 1 multiplexer.
F(A, B, C) = Σ m(1, 3, 5, 6)
Solution : The function can be implemented with an 8-to-1 multiplexer, as shown in Fig. 1.5.9. The three variables A, B and C are applied to the select lines. The minterms to be included (1, 3, 5 and 6) are chosen by making their corresponding input lines equal to 1. Minterms 0, 2, 4 and 7 are excluded by making their input lines equal to 0.

Fig. 1.5.9 Boolean function implementation using MUX
In the above example we have seen the method for implementing a Boolean function of 3 variables with a 2^3 (8)-to-1 multiplexer. Similarly, we can implement any Boolean function of n variables with a 2^n-to-1 multiplexer. However, it is possible to do better than this. If we have a Boolean function of n + 1 variables, we take n of these variables and connect them to the selection lines of a multiplexer; the remaining single variable of the function is used for the data inputs of the multiplexer. In this way we can implement any Boolean function of n + 1 variables with a 2^n-to-1 multiplexer. Let us see an example.
Example : Which are the components used at the register level ? Using a multiplexer, implement a full adder.
Solution : Carry = AB + AC_in + BC_in = Σ m(3, 5, 6, 7)
Sum = A'B'C_in + A'BC_in' + AB'C_in' + ABC_in = Σ m(1, 2, 4, 7)
Fig. 1.5.10 Full-adder implementation using multiplexers
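Both examples can be checked with a behavioural model of an 8-to-1 multiplexer used as a function generator (a Python sketch; the data-input lists hold the minterm values and the select lines carry the function variables) :

    def mux8(d, a, b, c):
        # Select the data input whose index is the minterm number (A B C in binary).
        return d[(a << 2) | (b << 1) | c]

    f_inputs     = [0, 1, 0, 1, 0, 1, 1, 0]   # F = sum m(1, 3, 5, 6)
    carry_inputs = [0, 0, 0, 1, 0, 1, 1, 1]   # Carry = sum m(3, 5, 6, 7)
    sum_inputs   = [0, 1, 1, 0, 1, 0, 0, 1]   # Sum   = sum m(1, 2, 4, 7)

    a, b, c_in = 1, 0, 1                      # example operands : 1 + 0 + 1
    print(mux8(carry_inputs, a, b, c_in),     # carry = 1
          mux8(sum_inputs, a, b, c_in))       # sum   = 0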
Decoder
A decoder is a multiple-input, multiple-output logic circuit which converts coded inputs into coded outputs, where the input and output codes are different. The input code generally has fewer bits than the output code. Each input code word produces a different output code word, i.e. there is a one-to-one mapping from input code words to output code words. This one-to-one mapping can be expressed in a truth table.
Fig. 1.5.11 shows the general structure of the decoder circuit. As shown in Fig. 1.5.11, the encoded information is presented as n inputs producing 2^n possible outputs. The 2^n output values range from 0 through 2^n − 1. Sometimes an n-bit binary code is truncated to represent fewer than 2^n output values. For example, in the BCD code, the 4-bit combinations 0000 through 1001 represent the decimal digits 0 - 9, and the combinations 1010 through 1111 are not used. Usually, a decoder is provided with enable inputs to activate the decoded output based on the data inputs. When any one enable input is unasserted, all outputs of the decoder are disabled.

Fig. 1.5.11 General structure of a decoder
Encoder
An encoder is a digital circuit that performs the inverse operation of a decoder. An encoder has 2^n (or fewer) input lines and n output lines. In an encoder, the output lines generate the binary code corresponding to the input value. Fig. 1.5.12 shows the general structure of the encoder circuit. As shown in Fig. 1.5.12, the decoded information is presented as 2^n inputs producing n outputs.

Fig. 1.5.12 General structure of an encoder

A priority encoder is an encoder circuit that includes the priority function. In a priority encoder, if two or more inputs are equal to 1 at the same time, the input having the highest priority takes precedence.
Table 1.5.2 shows the truth table of a 4-bit priority encoder.

D3 D2 D1 D0 | Y1 Y0 V
0  0  0  0  | x  x  0
0  0  0  1  | 0  0  1
0  0  1  x  | 0  1  1
0  1  x  x  | 1  0  1
1  x  x  x  | 1  1  1

Table 1.5.2 Truth table of a 4-bit priority encoder
Table 1.5.2 shows the D3 input with the highest priority and the D0 input with the lowest priority. When the D3 input is high, the output is 11 regardless of the other inputs. D2 has the next priority; thus, when D3 = 0 and D2 = 1, the output is 10 regardless of the two lower priority inputs. The output for D1 is generated only if the higher priority inputs are 0, and so on. The output V (a valid output indicator) indicates that one or more of the inputs are equal to 1. If all inputs are 0, V is equal to 0 and the other two outputs (Y1 and Y0) of the circuit are not used.
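The priority behaviour of Table 1.5.2 is captured by the following Python sketch (the names are ours) :

    def priority_encode(d3, d2, d1, d0):
        # Return (Y1, Y0, V) per Table 1.5.2; D3 has the highest priority.
        if d3: return (1, 1, 1)
        if d2: return (1, 0, 1)
        if d1: return (0, 1, 1)
        if d0: return (0, 0, 1)
        return (None, None, 0)   # no input asserted : V = 0, Y1 Y0 unused

    print(priority_encode(0, 1, 1, 0))   # (1, 0, 1) : D2 wins over D1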
Cascading priority encoders
By cascading several priority encoders, we can obtain a larger priority encoder. In a priority encoder IC there are two enable signals, EI and EO. The EI (Enable Input) signal enables the priority encoder, while EO (Enable Output) is asserted only when EI is asserted and none of the inputs are asserted. Thus the EO signal can be used to enable other, lower priority encoders. Fig. 1.5.13 shows the cascade connection of two 8-bit priority encoder ICs (74148) to form a 16-bit priority encoder. As shown in Fig. 1.5.13, the EI input of IC_1 is grounded. If any input of IC_1 goes low, its EO output goes high and disables IC_2 (the lower 8-bit priority encoder). The GS output of encoder IC_1 goes low when any of its inputs becomes low. The outputs from both ICs are again encoded using AND gates, and the GS output of IC_1 is used as the most significant bit of the encoded code. Table 1.5.3 shows the truth table for the 16-bit priority encoder.
Fig. 1.5.13 16-bit priority encoder

Table 1.5.3 Truth table for a 16-bit priority encoder
Demultiplexer
A demultiplexer is a circuit that receives information on a single line and transmits this information on one of 2^n possible output lines. The selection of a specific output line is controlled by the values of n selection lines. Fig. 1.5.14 shows a 1 : 4 demultiplexer. The single input variable D_in has a path to all four outputs, but the input information is directed to only one of the output lines.

Fig. 1.5.14 (a) Logic diagram
Enable | S1 | S0 | Y0   | Y1   | Y2   | Y3
0      | x  | x  | 0    | 0    | 0    | 0
1      | 0  | 0  | D_in | 0    | 0    | 0
1      | 0  | 1  | 0    | D_in | 0    | 0
1      | 1  | 0  | 0    | 0    | D_in | 0
1      | 1  | 1  | 0    | 0    | 0    | D_in

Fig. 1.5.14 (b) Block diagram

Table 1.5.4 Function table for a 1 : 4 demultiplexer
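The function table translates into a short behavioural sketch in Python (the names are ours) :

    def demux4(d_in, s1, s0, enable=1):
        # Route D_in to output Y[index], index = (S1 S0); all outputs 0 when disabled.
        y = [0, 0, 0, 0]
        if enable:
            y[(s1 << 1) | s0] = d_in
        return y

    print(demux4(1, s1=1, s0=0))   # [0, 0, 1, 0] : D_in appears on Y2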
Arithmetic Elements
Arithmetic functions such as the addition and subtraction of fixed-point numbers can be implemented by combinational register-level components. Most forms of fixed-point multiplication and division, and essentially all floating-point operations, are too complex to be realized by a single component at this design level. However, adders and subtracters for fixed-point binary numbers are basic register-level components from which we can derive a variety of other arithmetic circuits. Fig. 1.5.15 (a) shows a component that adds two 8-bit data words and an input carry bit; it is called an 8-bit adder. Such components can be cascaded to form adders for numbers of arbitrary size; however, the addition time increases with the number size.
Fig. 1.5.15 (a) Symbol for an 8-bit binary adder (b) Symbol for an 8-bit magnitude comparator
The magnitude comparator is another useful arithmetic component, whose function is to compare the magnitudes of two binary numbers. Fig. 1.5.15 (b) shows the symbol for an 8-bit magnitude comparator.
Let us see the design of an n-bit magnitude comparator at the register level. To check whether number A is greater than number B (A > B) we have to perform the following steps :
• Compute B' from B using an n-bit word inverter.
• Add A and B' using an n-bit adder and use the output-carry signal C_out as the primary output. If C_out = 1, then A > B; if C_out = 0, then A ≤ B.
For example, if A = 10001100 and B = 01001000, then B' = 10110111 and A + B' = 1 0100 0011, i.e. C_out = 1 and A > B.
Using a similar technique with the positions of A and B interchanged, we can derive the A < B output. To get the 'equals' output we can use an Ex-NOR word gate : it compares inputs A and B bitwise and gives an output word of all 1s if the two words are equal. To generate the single-bit output A = B we then use an AND word gate. Fig. 1.5.16 shows the implementation of the circuit discussed above.
Fig. 1.5.16 Register-level design of an 8-bit magnitude comparator
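The adder-based comparison above is easy to verify in Python (a sketch for n-bit unsigned words; the names are ours) :

    def greater_than(a, b, n=8):
        # A > B iff adding A to the one's complement of B produces a carry-out.
        b_complement = (~b) & ((1 << n) - 1)   # n-bit word inverter
        return (a + b_complement) >> n == 1    # carry-out of the n-bit adder

    # A = 10001100 (140), B = 01001000 (72) : carry-out is 1, so A > B.
    print(greater_than(0b10001100, 0b01001000))   # True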
Register
A register is a group of flip-flops. A flip-flop can store 1 bit of information, so an n-bit register has a group of n flip-flops and is capable of storing any binary information/number containing n bits.
Buffer Register
Fig. 1.5.17 shows the simplest register, constructed with four D flip-flops; it is also called a buffer register. Each D flip-flop is triggered by a common negative-edge clock pulse. The input X bits set up the flip-flops for loading; therefore, when the first negative clock edge arrives, the stored binary information becomes

Q_A Q_B Q_C Q_D = ABCD

In this register four D flip-flops are used, so it can store 4 bits of binary information. Thus the number of flip-flop stages in a register determines its total storage capacity.

Fig. 1.5.17 Buffer register
Shift Registers
The binary information (data) in a register can be moved from stage to stage within the register, or into or out of the register, upon application of clock pulses. This type of bit movement, or shifting, is essential for certain arithmetic and logic operations used in microprocessors. This gives rise to a group of registers called 'shift registers'. They are very important in applications involving the storage and transfer of data in a digital system. Fig. 1.5.18 gives the symbolic representation of the different types of data movement in shift register operations.

Fig. 1.5.18 Basic data movement in registers : (a) Serial shift right, then out (b) Serial shift left, then out (c) Parallel shift in (d) Parallel shift out (e) Rotate right (f) Rotate left
Fig. 1.5.19 4-bit right shift register : (a) Logic diagram (b) Symbol
Fig. 1.5.19 shows the register-level implementation of a right shift register using D flip-flops. A right shift is accomplished by activating the SHIFT enable line connected to the clock input CLK of each flip-flop. In addition to the serial data lines, m input and output lines are often provided to permit parallel data transfers to or from the shift register; additional control lines are then required to select the serial or parallel input modes. The shift register can be refined further to permit both left and right shift operations. Fig. 1.5.20 shows a shift register with parallel and serial modes along with right and left shift operations.
As shown in Fig. 1.5.20, the D input of each flip-flop has three sources : the output of the left adjacent flip-flop, the output of the right adjacent flip-flop and the parallel input. One of these three sources is selected at a time, with the help of a decoder. The decoder select lines (SL_1 and SL_0) select one source out of the three, as shown in Table 1.5.5.
SL_1 | SL_0 | Selected source
0    | 0    | Parallel input
0    | 1    | Output of right adjacent FF
1    | 0    | Output of left adjacent FF
1    | 1    | Not used

Table 1.5.5
Fig. 1.5.20 4-bit bidirectional shift register with parallel load
When the select lines are 00 (i.e. SL_1 = 0 and SL_0 = 0), data from the parallel inputs is loaded into the 4-bit register. When the select lines are 01 (i.e. SL_1 = 0 and SL_0 = 1), the data within the register is shifted 1 bit left. When the select lines are 10 (i.e. SL_1 = 1 and SL_0 = 0), the data within the register is shifted 1 bit right.
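One clock step of the bidirectional shift register can be modelled behaviourally as follows (a Python sketch; the bit ordering and serial-input conventions are ours) :

    def shift_register_step(q, sl, parallel, serial_in=0):
        # One clock edge : SL = (0,0) parallel load, (0,1) shift left,
        # (1,0) shift right, (1,1) unused (hold).
        if sl == (0, 0):
            return parallel[:]            # load the parallel inputs
        if sl == (0, 1):
            return q[1:] + [serial_in]    # shift 1 bit left
        if sl == (1, 0):
            return [serial_in] + q[:-1]   # shift 1 bit right
        return q

    q = shift_register_step([0, 0, 0, 0], (0, 0), [1, 0, 1, 1])  # load 1011
    print(shift_register_step(q, (1, 0), [], serial_in=0))       # [0, 1, 0, 1]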
Tri-State Register
In the buffer register there is no control over the input or output bits. We can control the input and output of the register by connecting tri-state devices at the input and output of the register, as shown in Fig. 1.5.21.

Fig. 1.5.21 Tri-state register
As shown in the Fig. 1.5.21, tri-state switches are used to control the read/write
operation. The tri-state switch is a binary switch. It is closed when enabled and open
when disabled. Here, RD and WR act as enable signals. To get the data on the output
lines the RD signal is enabled, and to load data into the register the WR signal is
enabled. When the RD signal is disabled, the output lines are in the high-impedance state.
Buses
A bus is a group of wires that transmits a binary word. In Fig. 1.5.22 the group of wires
B3, B2, B1 and B0 is a bus. The number of wires decides the width of the binary word. Thus the bus
shown in the Fig. 1.5.22 is a four-bit bus. The bus is a common transmission path between
the tri-state registers. The inputs to and outputs from all the registers are connected to
the common bus.
[Figure : two tri-state registers, each with active-low Load and Enable control inputs and a common CLK, connected to common bus lines B3 B2 B1 B0]
Fig. 1.5.22 Connecting registers to common data bus
In the Fig. 1.5.22 all the control signals are in complemented form; this means that the
registers have active low inputs.
In this bus organization nothing happens until we apply low input signals. In other
words, as long as all LOAD and ENABLE inputs are high, the registers are isolated from
the bus. To transfer a word from one register to another, it is necessary to make the
appropriate control signals low. For example, to transfer the word from register A to
register B, it is necessary to make the E_A and L_B inputs low.
The connection between the common data bus and the registers can be shown in simplified
form as in the Fig. 1.5.23. Here, the data path is shown by a single line. The
number of actual data lines in the data path is indicated by a number with a slash on
the line. The input and output data lines are made common.
[Figure : common data bus; two registers, each connected by an n-line data path, shown as a single line with a slash, used for both input and output]
Fig. 1.5.23 Simplified way of showing registers and its connection with common data bus
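The bus discipline just described (exactly one enabled driver and one loaded receiver at a time) can be sketched in a few lines of Python. The class and function names below are invented for illustration, and the control inputs are modelled as active-high for simplicity, unlike the active-low signals of the figures.

class TriStateRegister:
    def __init__(self):
        self.value = 0
    def drive(self, bus_enabled):
        return self.value if bus_enabled else None   # None models high impedance

def bus_transfer(src, dst):
    word = src.drive(bus_enabled=True)    # Enable of the source register asserted
    dst.value = word                      # Load of the destination register asserted

a, b = TriStateRegister(), TriStateRegister()
a.value = 0b1010
bus_transfer(a, b)                        # the word moves from A to B over the bus
print(bin(b.value))                       # 0b1010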
Counters
A register is used solely for storing and shifting data which is in the form of 1s
and/or 0s, entered from an external source. It has no specific sequence of states except
in certain very specialized applications. A counter is a register capable of counting the
number of clock pulses arriving at its clock input. The count represents the number of clock
pulses arrived. A specified sequence of states appears as the counter output. This is the
main difference between a register and a counter. The specified sequence of states is
different for different types of counters.
There are two types of counters, synchronous and asynchronous. In a synchronous
counter, the common clock input is connected to all of the flip-flops and thus they are
clocked simultaneously. In an asynchronous counter, commonly called a ripple counter, the
first flip-flop is clocked by the external clock pulse and then each successive flip-flop is
clocked by the Q or Q' output of the previous flip-flop. Therefore, in an asynchronous
counter the flip-flops are not clocked simultaneously.
The Fig. 1.5.24 shows the symbol for a modulo-2^n up-down counter. On receiving a
positive going edge on the Enable (clock signal) input, the counter increments its count by
1. As it is an n-bit counter, its counting is modulo-2^n; that is, the counter's modulus is
k = 2^n and it has 2^n states S0, S1, ..., S(2^n - 1). The output of the counter is an n-bit binary
number. The CLEAR input of the counter, when activated, resets the counter, and the
UP/DOWN input selects the operation of the counter.
Fig. 1.5.24 A modulo-2^n up-down counter
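As a rough software model of this counter (names and structure are ours, not the book's), the modulo-2^n behaviour reduces to arithmetic modulo 2**n :

class UpDownCounter:
    def __init__(self, n):
        self.n = n
        self.count = 0
    def clock(self, up=True):             # one positive edge on Enable
        step = 1 if up else -1
        self.count = (self.count + step) % (2 ** self.n)
    def clear(self):                      # CLEAR input resets the counter
        self.count = 0

c = UpDownCounter(n=4)                    # modulus k = 2**4 = 16, states S0..S15
for _ in range(17):
    c.clock(up=True)
print(c.count)                            # 17 mod 16 = 1
c.clock(up=False)
print(c.count)                            # back to 0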
Programmable Logic Devices
There are many applications for digital logic where the market is not great enough to
develop a special-purpose MSI or LSI chip. This situation has led to the development of
Programmable Logic Devices (PLDs) which can be easily configured by the individual
user for specialized applications.
Basically, there are three types of PLDs :
• Read Only Memory (ROM)
• Programmable Logic Array (PLA)
• Programmable Array Logic (PAL)
Here, we examine programmable logic devices as a new class of components.
A read only memory (ROM) is a device that includes both the decoder and the OR gates within
a single IC package. The Fig. 1.5.25 shows the block diagram of ROM. It consists of n input lines and m
output lines. Each bit combination of the input variables is called an address. Each bit combination
that comes out of the output lines is called a word. The number of bits per word is equal to the number
of output lines, m. The address specified in binary denotes one of the minterms of n variables.
The number of distinct addresses possible with n input variables is 2^n. An output word
can be selected by a unique address, and since there are 2^n distinct addresses in a ROM,
there are 2^n distinct words in the ROM. The word available on the output lines at any
given time depends on the address value applied to the input lines.
Fig. 1.5.25 Block diagram of ROM (n inputs, m outputs)
Let us consider a 64 × 4 ROM. The ROM consists of 64 words of 4 bits each. This
means that there are four output lines, and the particular word from the 64 words presently
available on the output lines is determined by the six input lines. There are only six
inputs in a 64 × 4 ROM because 2^6 = 64, and with six variables we can specify 64
addresses or minterms. For each address input, there is a unique selected word. Thus, if
the input address is 000000, word number 0 is selected and applied to the output lines.
If the input address is 111111, word number 63 is selected and applied to the output
lines.
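Functionally, a ROM is a fixed lookup table indexed by the address. A minimal Python sketch follows, using a small 8 × 4 ROM with arbitrary contents for brevity; a 64 × 4 ROM works identically with a 64-entry table.

ROM = [0b0000, 0b0011, 0b0101, 0b0110,    # words 0..3
       0b1001, 0b1010, 0b1100, 0b1111]    # words 4..7

def rom_read(address):
    assert 0 <= address < len(ROM)        # 3 input lines -> 8 addresses
    return ROM[address]                   # 4 output lines -> one 4-bit word

print(bin(rom_read(0b101)))               # word number 5 -> 0b1010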
The Fig. 1.5.26 shows the internal logic construction of a 64 × 4 ROM. The six input
variables are decoded into 64 lines by means of 64 AND gates and 6 inverters. Each
output of the decoder represents one of the minterms of a function of six variables. The
64 outputs of the decoder are connected through fuses to each OR gate. Only four of
these fuses are shown in the diagram, but actually each OR gate has 64 inputs and each
input goes through a fuse that can be blown as desired.
[Figure : inputs A0 to A5 feed a 6 × 64 decoder; the decoder outputs 0 to 63 connect through 64 × 4 = 256 fuses to four OR gates producing outputs F0 to F3]
Fig. 1.5.26 Logic construction of 64 × 4 ROM
The ROM is a two-level implementation in sum of minterms form. Let us see the
AND-OR and AND-OR-INVERTER implementations of ROM. Fig. 1.5.27 shows the 4 × 2
ROM with AND-OR and AND-OR-INVERTER implementations.
[Figure : address inputs A1, A0 decoded into minterms, connected through fuses to the output gates]
Fig. 1.5.27 (a) 4 × 2 ROM with AND-OR gates
Fig. 1.5.27 (b) 4 × 2 ROM with AND-OR-INVERTER gates
There are four types of ROM : Masked ROM, PROM, EPROM and EEPROM.
PROM (Programmable Read Only Memory)
Programmable Read Only Memory (PROM) allows the user to store data/programs.
PROMs use fuses made with materials like nichrome and polycrystalline silicon. The user can blow
these fuses by passing around 20 to 50 mA of current for a period of 5 to 20 µs. The
blowing of fuses according to the truth table is called programming of the ROM. The user
can program PROMs with a special PROM programmer. The PROM programmer
selectively burns the fuses according to the bit pattern to be stored. This process is also
known as burning of PROM. The PROMs are one-time programmable. Once
programmed, the information stored is permanent.
EPROM (Erasable Programmable Read Only Memory)
Erasable programmable ROMs use MOS circuitry. They store 1s and 0s as a packet of
charge in a buried layer of the IC chip. EPROMs can be programmed by the user with a
special EPROM programmer. The important point for now is that we can erase the
stored data in the EPROMs by exposing the chip to ultraviolet light through its quartz
window for 15 to 20 minutes.
In EPROM, it is not possible to erase selective information; when erased, the entire
information is lost. The chip can then be reprogrammed. This memory is ideally suited for
product development, experimental projects and college laboratories, since this chip can
be reused many times.
EEPROM (Electrically Erasable Programmable Read Only Memory)
Electrically erasable programmable ROMs also use MOS circuitry very similar to that
of EPROM. Data is stored as charge or no charge on an insulated layer or an insulated
floating gate in the device. The insulating layer is made very thin (< 200 Å). Therefore, a
voltage as low as 20 to 25 V can be used to move charges across the thin barrier in
either direction for programming or erasing. EEPROM allows selective erasing at the
register level rather than erasing all the information, since the information can be
changed by using electrical signals. The EEPROM memory also has a special chip-erase
mode by which the entire chip can be erased in 10 ms. This time is quite small as compared
to the time required to erase an EPROM, and the device can be erased and reprogrammed
right in the circuit. However, EEPROMs are the most expensive and the least dense ROMs.
Combinational circuits do not use all the minterms every time. Occasionally, they
have don't care conditions. A don't care condition, when implemented with a ROM,
becomes an address input that will never occur. The result is that not all the bit patterns
available in the ROM are used, which may be considered a waste of available
equipment.
For cases where the number of don't care conditions is excessive, it is more
economical to use a second type of LSI component called a Programmable Logic Array
(PLA). A PLA is similar to a ROM in concept; however, it does not provide full decoding
of the variables and does not generate all the minterms as in the ROM. The PLA
replaces the decoder by a group of AND gates, each of which can be programmed to generate
a product term of the input variables. In a PLA, both the AND and OR gates have fuses at
the inputs; therefore in a PLA both the AND and OR gates are programmable. Fig. 1.5.28
shows the block diagram of PLA. It consists of n inputs, m outputs, k product terms
and m sum terms. The product terms constitute a group of k AND gates and the sum
terms constitute a group of m OR gates. Fuses are inserted between all n inputs and
their complement values to each of the AND gates. Fuses are also provided between the
outputs of the AND gates and the inputs of the OR gates. A third set of fuses in the
output inverters allows the output function to be generated either in the AND-OR form
or in the AND-OR-INVERT form. When the inverter is bypassed by a link we get the AND-OR
implementation. To get the AND-OR-INVERTER implementation the inverter link has to be
disconnected.
[Figure : n inputs connect through fuses to k product terms (AND gates), whose outputs connect through fuses to m sum terms (OR gates) driving the m outputs]
Fig. 1.5.28 Block diagram of PLA
Fig. 1.5.29 shows the internal construction of a PLA having 3 inputs, 3 product terms
and two outputs. The size of the PLA is specified by the number of inputs, the number
of product terms and the number of outputs (the number of sum terms is equal to the
number of outputs).
Fig. 1.5.29 Internal construction of PLA
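The programmable AND plane and OR plane can be modelled as fuse maps. The sketch below (an illustration, not the book's notation) evaluates a PLA of the kind in Fig. 1.5.29 for one input combination; the example fuse maps realize F0 = a'b + bc and F1 = ab.

def pla(inputs, and_plane, or_plane):
    """inputs: dict of variable values; and_plane: one dict per product term
    mapping variable -> 1 (true literal) or 0 (complemented literal);
    or_plane: per output, the set of product-term indices left connected."""
    products = [all(inputs[v] == lit for v, lit in term.items())
                for term in and_plane]
    return [any(products[i] for i in terms) for terms in or_plane]

# 3 inputs, 3 product terms (a'b, bc, ab), 2 outputs
and_plane = [{"a": 0, "b": 1}, {"b": 1, "c": 1}, {"a": 1, "b": 1}]
or_plane  = [{0, 1}, {2}]
print(pla({"a": 1, "b": 1, "c": 0}, and_plane, or_plane))  # [False, True]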
Like the ROM, a PLA can be mask-programmable or field-programmable. With a
mask-programmable PLA, the user must submit a PLA program table to the
manufacturer. This table is used by the vendor to produce a custom-made PLA that has the
required internal paths between inputs and outputs. A second type of PLA available is
called a field-programmable logic array or FPLA. The FPLA can be programmed by the
user by means of certain recommended procedures. FPLAs can be programmed with
commercially available programmer units.
Programmable logic devices have many gates interconnected through many
electronic fuses. It is sometimes convenient to draw the internal logic of such devices in a
compact form referred to as array logic. Fig. 1.5.30 shows the conventional and array
logic symbols for a multiple-input AND gate.
[Figure : (a) Conventional symbol (b) Array logic symbol]
Fig. 1.5.30
The array logic symbol shown in the Fig. 1.5.30 (b) uses a single horizontal line
connected to the gate input and multiple vertical lines to indicate the individual inputs.
Each intersection between a horizontal line and a vertical line indicates a fuse connection.
We have seen that the PLA is a device with a programmable AND array and a
programmable OR array. However, PAL (programmable array logic) is a programmable
logic device with a fixed OR array and a programmable AND array. Because only the AND
gates are programmable, the PAL is easier to program, but it is not as flexible as the PLA.
Fig. 1.5.31 shows the array logic of a typical PAL. It has four inputs and four outputs.
Each input has a buffer and an inverter gate. It is important to note that the two gates are
shown with one composite graphic symbol with normal and complement outputs. There
are four sections. Each section has three programmable AND gates and one fixed OR
gate. As shown in the Fig. 1.5.31, each AND gate has 10 fused programmable inputs.
The output of section 1 is connected to a buffer-inverter gate and then fed back into the
inputs of the AND gates through fuses.
The commercial PAL devices have more gates than the one shown in Fig. 1.5.31. A
typical PAL integrated circuit may have eight inputs, eight outputs and eight sections,
each consisting of an eight-wide AND-OR array.
[Fig. 1.5.31 Array logic of a typical PAL : four buffered/inverted inputs, product-term AND array and fixed OR gates (legend : X fuse intact, dot fuse blown)]
Field Programmable Gate Arrays
In the mid-1980s, an important class of PLDs was introduced, called the field-programmable
gate array. The Fig. 1.5.32 shows the general structure of an FPGA chip. It consists of a
large number of programmable logic blocks surrounded by programmable I/O blocks.
The programmable logic blocks of an FPGA are smaller and less capable than a PLD, but
an FPGA chip contains many more logic blocks, which makes it more capable. As shown in
the Fig. 1.5.32, the logic blocks are distributed across the entire chip. These logic blocks
can be interconnected with programmable interconnections.
[Figure : array of programmable logic blocks with programmable interconnections, surrounded by programmable I/O blocks]
Fig. 1.5.32 General FPGA chip architecture
As compared to standard gate arrays, the field programmable gate arrays are larger
devices. The basic cell structure of an FPGA is somewhat more complicated than the basic cell
structure of a standard gate array. The FPGA uses read/write memory cells to control the
state of each connection.
The word field in the name refers to the ability of the gate arrays to be programmed
for a specific function by the user instead of by the manufacturer of the device. The
word array is used to indicate a series of columns and rows of gates that can be
programmed by the end user.
Two types of logic cells found in FPGAs are those based on multiplexers and those
based on PROM table-lookup memories. Fig. 1.5.33 shows a cell type employed by Actel
Corp.'s ACT series of multiplexer-based FPGAs. This cell is a four-input, 1-bit
multiplexer with an AND and an OR gate added. An ACT FPGA contains a large array of
such cells organized in rows separated by horizontal wiring channels, as shown in
Fig. 1.5.33 (b). Vertical wire segments are attached to each cell's I/O terminals. These
wires enable connections to be established between the cells and the wiring channels by
means of one-time programmable antifuses positioned where the horizontal and vertical
wires cross.
[Figure : (a) Basic cell, a four-input 1-bit multiplexer with AND and OR gates on its select inputs; (b) Chip architecture, rows of cells separated by horizontal wiring channels, with vertical wires and input-output and test circuits at the boundary]
Fig. 1.5.33 Multiplexer based FPGA
We know that a multiplexer can be used as a function generator; it can be used to implement
any Boolean function. Therefore, the cell in the multiplexer-based FPGA is also capable of
implementing various useful Boolean functions. Fig. 1.5.34 shows the implementation of a
complete set of logic gates using this cell.
Z = x0 s1' s0' + x1 s1' s0 + x2 s1 s0' + x3 s1 s0
  = a s1 s0          with x0 = x1 = x2 = 0 and x3 = a
  = abcd             with s1 = b and s0 = cd
Fig. 1.5.34 (a) AND gate
Z = x0 s1' s0' + x1 s1' s0 + x2 s1 s0' + x3 s1 s0
  = a s1' s0' + s1' s0 + s1 s0' + s1 s0     with x0 = a and x1 = x2 = x3 = 1
  = a s1' s0' + s1 + s0
  = a + s1 + s0                              using A + A'B = A + B
  = a + (b + c) + d                          with s1 = b + c and s0 = d
  = a + b + c + d
Fig. 1.5.34 (b) OR gate
Z = x0 s1' s0' + x1 s1' s0 + x2 s1 s0' + x3 s1 s0
  = s1' s0' = a' a' = a'          with x0 = 1, x1 = x2 = x3 = 0, s1 = a and s0 = a
Fig. 1.5.34 (c) NOT gate
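These three configurations are easy to verify exhaustively. The short Python check below encodes the multiplexer equation and confirms the AND, OR and NOT settings derived above for all input combinations (the helper name mux4 is ours).

from itertools import product

def mux4(x, s1, s0):
    # Z = x0 s1' s0' + x1 s1' s0 + x2 s1 s0' + x3 s1 s0
    return x[2 * s1 + s0]

for a, b, c, d in product((0, 1), repeat=4):
    assert mux4((0, 0, 0, a), b, c & d) == (a & b & c & d)   # AND of Fig. 1.5.34 (a)
    assert mux4((a, 1, 1, 1), b | c, d) == (a | b | c | d)   # OR of Fig. 1.5.34 (b)
for a in (0, 1):
    assert mux4((1, 0, 0, 0), a, a) == 1 - a                 # NOT of Fig. 1.5.34 (c)
print("mux cell realizes AND, OR and NOT")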
Register Level Design
At the register level of design, a set of registers is linked by combinational data-transfer
and data-processing circuits. A block diagram defines its structure, and the set of
operations it performs on data words defines its behaviour. Each operation can be
defined in the form
cond : Z = f (A1, A2, A3, ..., An)
where f is a function to be performed or an instruction to be executed in one clock
cycle, and A1, A2, A3, ..., An and Z denote data words or the registers that store data.
The prefix 'cond' denotes a control condition that must be satisfied (cond = 1) for the
indicated operation to take place. Therefore, when cond = 1 the function f is computed
on A1, A2, A3, ..., An and the result is stored in the Z data word.
Data and Control
[Figure : registers A and B feed an adder whose output is stored in register Z]
Fig. 1.5.35 (a) Simple register level system
The Fig. 1.5.35 (a) shows the simplest register level system. It performs the operation
Z = A + B. The Fig. 1.5.35 (b) shows a more complicated system that can perform several
different operations. Such a multifunction system can perform only one operation
at a time. The operation to be performed is decided by the control signals. Therefore, the
multifunction system is partitioned into a data-processing part called
a datapath and a controlling part called a control unit. The control unit is responsible
for selecting and controlling the actions of the datapath.
[Figure : multifunction datapath of registers and an ALU, with a control unit supplying select and control signals]
Fig. 1.5.35 (b) Multifunction register level system
As shown in Fig. 1.5.35 (b), the control unit (CU) selects the operation for the ALU to
perform in each clock cycle. It also determines the input operands to apply to the ALU
and the destination of its results.
A large extension of this multifunction unit is the computer's CPU. The computer's
CPU is responsible for the interpretation of instructions and the generation of the required
control signals for the execution of the instructions. This unit of the computer is called the I-unit
and the datapath unit of the computer is called the E-unit.
A Description Language
The HDL can be used to provide both behavioural and structural descriptions at the
register level. For example,
if cond = 1 then Z = f (A1, A2, A3, ..., An)
where f can be any function, for example Z = A + B or Z = A - B. Here,
+ represents the adder and - represents the subtracter. The input connections in both
cases from registers A and B are inferred from the fact that A and B are the arguments
of + and -, while the output connection from the adder/subtracter is inferred from Z.
Let us see the formal language description of an 8-bit binary multiplier. We know
that the multiplication can be performed in two ways : 1. Repetitive addition 2. Shift and
add. Here, our intention is to study the language descriptions, hence we prefer the simple
method of multiplication, i.e. multiplication by repetitive addition.
multiplication (in : INBUS; out : OUTBUS);
register A [0:7], B [0:7], Z [0:7], C [0:7];
bus INBUS [0:7], OUTBUS [0:7];
BEGIN : C = 0, Z = 0, A = INBUS;
        B = INBUS;
REPEAT : Z = Z + A, B = B - 1;
        if CY ≠ 1 then go to NEXT;
        C = C + 1;
NEXT :  if B ≠ 0 then go to REPEAT;
        OUTBUS = Z;
        OUTBUS = C;
end multiplication;
In the above program, two 8-bit buses INBUS and OUTBUS form the multiplier's input
and output ports, respectively. The program declares these buses with the statement : bus
INBUS [0:7], OUTBUS [0:7], and declares the 8-bit registers A, B, Z and C with the
statement : register A [0:7], B [0:7], Z [0:7], C [0:7]. The registers Z and C are
used to store the lower byte and higher byte of the multiplication, respectively.
Initially, the result (Z and C) is made 0 and registers A and B are loaded with the
multiplicand and multiplier from the INBUS, respectively. The multiplicand is added
repeatedly, multiplier times, and the result is stored in the Z and C registers. The carry
after the addition of the lower byte is used to increment the value in the higher byte register,
i.e. the C register. The final result is then transferred 8 bits at a time to OUTBUS.
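For comparison, here is the same repetitive-addition algorithm as an executable Python sketch (a rendering of the formal description, not code from the text); Z holds the lower byte, C the higher byte, and the carry out of the 8-bit addition increments C.

def multiply(multiplicand, multiplier):
    A, B = multiplicand, multiplier       # A = INBUS; B = INBUS
    C = Z = 0                             # result registers cleared
    while B != 0:                         # REPEAT
        total = Z + A
        Z = total & 0xFF                  # 8-bit add, lower byte
        if total > 0xFF:                  # carry CY set?
            C = (C + 1) & 0xFF            # carry into the higher byte
        B -= 1
    return C, Z                           # transferred 8 bits at a time to OUTBUS

C, Z = multiply(200, 100)
print(C, Z, C * 256 + Z)                  # 78 32 20000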
Design Techniques
The general approach to the design problem for a register level system is as follows :
1. Define the desired behaviour of the system by a set of sequences of
register-transfer operations, such that each operation can be implemented directly
using the available design components. This gives the desired algorithm.
2. Analyse the algorithm to determine the types of components and the number of
each type required for the datapath.
3. Construct a block diagram for the datapath using the components identified in step 2.
Make the connections between the components so that all data paths implied by the
algorithm are present and the given performance-cost constraints are met.
4. Analyse the algorithm and datapath to identify the control signals needed. Introduce
the logic or control points necessary to apply these signals to the datapath.
5. Design a control unit for the datapath that meets all the requirements of the algorithm.
6. Check whether the final design operates correctly and meets all performance-cost
goals.
The design of the algorithm in step 1 is a creative process. It is similar to writing a
computer program and depends heavily on the skill and experience of the designer. The
second step is to identify the data processing components. It is a straightforward
process; however, it becomes complicated when the possibility of sharing components
exists. For example, to perform the operation
cond : A = A + B, C = C + D;
requires two adders, because the operations have to be performed in parallel. However, if we
use a single adder and perform the operations serially, we can lower the cost by sharing a
single adder. Thus
cond (t1) : A = A + B;
cond (t1 + 1) : C = C + D;
Step 3 requires defining an interconnection structure that links the components
needed by the various parts of the algorithm. Identifying the control signals and designing the
control unit, in step 4 and step 5 respectively, are relatively independent processes. In
step 6, design verification plays an important role in the development process.
Simulation via CAD tools can be used to identify and correct functional errors before the new
design is committed to hardware.
Review Questions
1. List various register level components.
2. Draw and explain the generic block representation of a register level component.
3. Write short notes on :
   a) Multiplexers  b) Decoders  c) Encoders
   d) Demultiplexers  e) Arithmetic elements  f) Registers
   g) Tri-state register  h) Buses  i) Counters
   j) Programmable logic devices  k) FPLA
4. Explain the design process at the register level.
5. Write short notes on HDL.
6. Explain the register level design of a magnitude comparator.
7. Describe the organization of a processor with the general register organization.
8. What is a priority encoder ? Design a 16-bit priority encoder using two copies of an 8-bit priority encoder.
9. Design a 4-bit bidirectional shift register with parallel load and explain.
10. Draw the block diagram of a 4-bit magnitude comparator.
Processor Level Design
The processor level, which is also called the system level, is the highest in the hierarchy
of computer design. The storage and processing of information are the major objectives
of this level. Processing involves the execution of programs and the processing of data files. The
components required for performing these functions are complex. Usually sequential
circuits are used which are based on VLSI technology. Only a small amount of design theory is
available at this level of abstraction.
Processor Level Components
The different types of components which are generally used at this level can be
divided mainly into four groups :
• Processors
• Memories
• I/O devices
• Interconnection networks
In this section we will see a brief summary of the characteristics of all these components.
Central Processing Unit
The primary function of a central processing unit is to execute sequences of
instructions stored in a memory, which is external to the central processing unit. When
the functions of the processor are restricted, the processor becomes a more specialized
processor, such as an I/O processor. Most of the time CPUs are microprocessors whose
physical implementation is a single VLSI chip.
Fig. 1.6.1 shows a typical CPU structure and its connection to memory. The CPU
contains different units such as the control unit, arithmetic logic unit, register unit and decoding
unit, which are necessary for the execution of instructions. The sequence of operations
involved in processing an instruction constitutes an instruction cycle. This can be
subdivided into three major phases : fetch cycle, decode cycle and execute cycle. The
address of the next instruction which is to be fetched from memory is in the Program
Counter (PC). During the fetch phase the CPU loads this address into the Address Register (AR). This
is the register which gives the address to the memory. Once the address is available on the
address bus, the read command from the control unit copies the contents of the addressed
memory location to the Instruction Register (IR). During the decode phase, the instruction in
the IR is decoded by the instruction decoder. In the next, i.e. execute phase, the CPU has to
perform a particular set of micro-operations depending on the instruction.
All these operations are synchronized with the help of the clock signal. The frequency of
this signal is nothing but the operating frequency of the CPU. Thus the CPU is a
synchronous sequential circuit and its clock period is the computer's basic unit of time.
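The instruction cycle just described can be summarized in a toy simulation. In the Python sketch below, the instruction encodings and the tiny three-instruction program are invented for illustration; only the fetch-decode-execute structure follows the text.

memory = {0: ("LOAD", 5), 1: ("ADD", 6), 2: ("HALT", 0),
          5: 10, 6: 32}                   # program words plus data words
PC, ACC = 0, 0

while True:
    AR = PC                               # fetch: PC -> address register
    IR = memory[AR]                       # read the addressed word into IR
    PC += 1                               # point to the next instruction
    opcode, operand = IR                  # decode
    if opcode == "LOAD":                  # execute
        ACC = memory[operand]
    elif opcode == "ADD":
        ACC += memory[operand]
    elif opcode == "HALT":
        break

print(ACC)                                # 42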
[Figure : CPU (control unit, ALU, registers) connected to main memory over address and data buses; AR = Address Register, IR = Instruction Register, PC = Program Counter; internal control signals not shown for simplicity]
Fig. 1.6.1 Typical CPU structure
Memories
For the storage of programs and data required by the processors, external memories
are necessary. Ideally, computer memory should be fast, large and inexpensive.
Unfortunately, it is impossible to meet all three of these requirements
simultaneously. Increased speed and size are achieved at increased cost. A very fast
memory system can be achieved if SRAM chips are used. These chips are expensive, and
for cost reasons it is impracticable to build a large main memory using SRAM chips.
The only alternative is to use DRAM chips for large main memories.
The processor fetches the code and data from the main memory to execute the program.
The DRAMs which form the main memory are slower devices, so it is necessary to
insert wait states in memory read/write cycles. This reduces the speed of execution. The
solution to this problem comes from the fact that most computer programs
work with only small sections of code and data at a particular time. In the memory
system, a small section of SRAM is added along with the main memory, referred to as cache
memory. The program which is to be executed is loaded in the main memory, but the
part of the program (code) and data that is in use at a particular time is usually accessed from
the cache memory. This is accomplished by loading the active part of code and data
from main memory to cache memory. The cache controller looks after this swapping
between main memory and cache memory with the help of the DMA controller. The cache
memory just discussed is called secondary cache. Recent processors have a built-in
cache memory called primary cache.
DRAMs along with cache allow main memories in the range of tens of megabytes to
be implemented at a reasonable cost, with better speed performance. But
the size of memory is still small compared to the
demands of large programs with voluminous data. A
solution is provided by using secondary storage, mainly
magnetic disks and magnetic tapes, to implement large memory spaces. Very large disks are
available at a reasonable price, sacrificing speed.
[Figure : memory hierarchy from CPU registers through primary cache, secondary cache and main memory to magnetic disk; speed and cost per bit increase toward the CPU, size increases toward secondary storage]
Fig. 1.6.2 A typical memory hierarchy
From the above discussion, we can realize that to make an efficient computer system it
is not possible to rely on a single memory component; instead we employ a memory hierarchy.
In a memory hierarchy, all the different types of memory units are employed to give an
efficient computer system. A typical memory hierarchy is illustrated in Fig. 1.6.2.
In summary, we can say that a huge amount of cost-effective storage can be provided
by magnetic disks. A large, yet affordable, main memory can be built with DRAM
technology along with the cache memory to achieve better speed performance.
I/O Devices
A computer communicates with the outside world by means of an input-output (I/O) system.
The main function of the I/O system is to transfer information between the CPU or memory
and the outside world.
The important point to be noted here is that I/O devices (peripherals) cannot be
connected directly to the system bus. The reasons are discussed here.
• A variety of peripherals with different methods of operation are available, so it
would be impractical to incorporate the necessary logic within the CPU to control
a range of devices.
• The data transfer rate of peripherals is often much slower than that of the memory
or CPU, so it is impractical to use the high speed system bus to communicate
directly with the peripherals.
• Generally, the peripherals used in a computer system have different data formats
and word lengths than the CPU used in it.
So to overcome all these difficulties, it is necessary to use a module between the
system bus and the peripherals, called an I/O module or I/O system.
This I/O system has two major functions :
• Interface to the CPU and memory via the system bus.
• Interface to one or more I/O devices by tailored data links.
The table gives a list of representative I/O devices.
IO device                   Type   Medium to/from which IO device transforms digital electrical signals
Analog-digital converter    I      Analog (continuous) electrical signals
CD-ROM drive                I      Characters (and coded images) on optical disk
Document scanner/reader     I      Images on paper
Dot-matrix display panel    O      Images on screen
Keyboard/keypad             I      Characters on keyboard
Laser printer               O      Images on paper
Loudspeaker                 O      Spoken words and sounds
Magnetic-disk drive         I/O    Characters (and coded images) on magnetic disk
Magnetic-tape drive         I/O    Characters (and coded images) on magnetic tape
Microphone                  I      Spoken words and sounds
Mouse/touchpad              I      Spatial position on pad
Table 1.6.1
Interconnection Networks
The processor level components, CPU, memories and I/O devices, communicate via the system
bus (address bus, data bus and control bus). In a computer system, when many
components are used, communication between these components may be controlled by a
subsystem called an interconnection network. A switching network, communications
controller and bus controller are examples of such subsystems. Under the control of the
interconnection network, dynamic communication paths among the components via the
buses can be established. The communication paths are shared by the components to
reduce cost. At any time, communication, and hence use of the shared bus, is possible
between any two components. When more than two components request use of the bus,
it results in bus contention. The function of the interconnection network is to resolve
such contention. For performing this function, the interconnection network selects one of the
requesting devices on some priority basis and connects it to the bus. The remaining
requesting devices are kept in a queue.
Some evolutionary steps in the I/O function are summarized here.
1. In simple microprocessor-controlled devices, a peripheral device is directly
controlled by the CPU.
2. A controller is then added to the CPU to control peripheral devices with a programming
facility.
3. Next, interrupts are employed in the configuration mentioned in step 2. This saves
the CPU time which was required for polling the I/O devices.
4. A DMA controller is introduced to give the I/O module direct access to memory.
5. The I/O module is then enhanced to become a processor with a specialized
instruction set tailored for I/O. The I/O processor is capable of executing an I/O
program in memory with directions given by the CPU. It can execute the I/O program
without the intervention of the CPU.
6. The I/O module is further enhanced to have its own local memory. This makes it
possible to control a large set of I/O devices with minimal CPU involvement.
In step 5 and step 6 we have seen that the I/O module is capable of executing
programs. Such an I/O module is commonly known as an I/O channel.
Generally the communication between processor-level components is asynchronous,
since they cannot access some unit or bus simultaneously and hence the components cannot
be synchronized directly by a common clock signal. The following causes can
be stated regarding this synchronization problem :
• The speeds of operation of different components vary over a wide range, e.g. CPUs
are faster than main memories and main memories are faster than I/O devices.
• The different components work fairly independently, e.g. the execution of different
programs by CPUs and IOPs.
• It is practically difficult to allow synchronous transmission of information between
components due to the large physical distance between them.
Processor-Level Design
While designing any system, it is very difficult to give a precise description of
the desired system behaviour. Because of this, the processor level design job is more
difficult than register level design. Generally, to design at this level, a prototype
design of known performance is taken. Then, according to necessity, new technologies
are added and new performance requirements are achieved.
Performance characteristics : Year by year, the cost of computer systems continues to
drop dramatically, while the performance and capacity of those systems continue to rise
equally dramatically. In this section we introduce some basic aspects of computer system
performance characteristics. The total time needed to execute application programs is the
most important measure of computer system performance. In other words, we can say
that the speed of the computer system is an important characteristic defining the
performance of the computer system. The speed of the computer system depends on
various factors. Let us discuss those factors.
Hardware : The speed of the processor used in the computer system basically decides
the speed of the computer system. For example, a system having a Pentium IV processor
runs faster than a system having a Pentium I. However, the system speed not only depends
on the processor speed; it is also affected by the supporting hardware.
Each processor has its own address bus width, data bus width, internal registers,
on-chip memory and instruction set. A higher data bus width allows the transfer of
data with more bits at a time. For example, a data bus width of 64 bits allows
a 64-bit transfer of data at a time and a data bus width of 32 bits allows a 32-bit transfer of
data at a time. A higher address bus width gives a higher addressing capacity. A larger
number of internal registers allows partial results to be stored, avoiding unnecessary memory
accesses and resulting in faster operation. Similarly, on-chip memory allows the currently
executing program module, required data and partial results to be stored in the CPU itself, where they
can be accessed quickly, resulting in faster operation. The system speed also depends on the
speed of the secondary memory, the speed of the I/O ports and the speed of data
transfer between them.
Programming language : Nowadays, programs are usually written in high-level
languages. These languages require a separate compiler to translate programs into machine
level language. Therefore, the performance of the computer system is affected by the performance of the
compiler and hence by the language used for the program.
Pipelining : The processor executes an instruction in steps or phases such as fetch,
decode and execute. By overlapping these phases of successive instructions we can
achieve a substantial improvement in the performance of the computer system. This
technique is known as pipelining.
Parallelism : It is possible to perform transfers to and from secondary memory, like
storage disks or tapes, in parallel with program execution in the processor or with activity
in other I/O devices. This technique is known as parallelism. Most computer
systems use parallelism to improve system performance.
Types of memory and IO devices : The performance of the computer depends on
the type of memory and IO devices supported by it.
Compatibility and cost : The performance of a computer system is also decided by
its compatibility with other types of computers and by the total cost of the system.
All these performance specifications are considered while designing a new computer
system. Even though the new computer design is closely based on a known design,
accurate performance prediction of the new system may not be possible. For accurate
prediction, an understanding of the relation between the structure of a computer and
its performance is very important. Using some mathematical analysis, a small amount of
useful performance evaluation can be done. For performance evaluation, experiments
during the design process are to be performed. For this purpose computer simulation
can be used, or the performance of a copy of the machine under working conditions can
be measured.
Prototype Structures
The processor-level design using prototype structures involves the following steps in the
design process.
1. First, select a prototype design as per the system requirements and adapt it to
satisfy the given performance constraints.
2. Determine the performance of the proposed system.
3. If the performance is unsatisfactory, the design is to be modified. Repeat step 1.
4. The above steps are to be continued until an acceptable design is obtained and the
desired performance constraints are achieved.
These steps are widely followed for designing a computer system. While designing
new systems, precautions are always taken to remain compatible with existing
hardware and software standards. The reason is that when these standards are
changed, computer owners have to spend money to retrain users and programmers.
Also, well tested software has to be replaced by modified software. So in
the new design of a computer system, drastic changes from the previous design are
generally avoided. Because of all these reasons, there is a slow evolution of
computer architecture.
Fig. 1.6.3 shows the structure of first generation computers. This is the basic
computer structure.
[Figure : CPU and main memory connected through an interconnecting network to I/O devices 1 to m]
Fig. 1.6.3 Basic computer structure
The second and subsequent generations of computers involve
special-purpose I/O processors and cache memory in addition
to the basic components used within the basic system. This
advanced structure is shown in Fig. 1.6.4.
The more advanced structure involves more than one CPU,
i.e. a multiprocessor system. Fig. 1.6.5 gives the computer
structure with two CPUs and main memory banks.
[Figure : CPU with cache memory, main memory and I/O processors connected through an interconnecting network to I/O devices]
Fig. 1.6.4 Computer structure with IO processors and cache memory
[Figure : CPU 1 and CPU 2, each with cache memory, share main memory banks 1 to m and I/O processors through an interconnecting network; I/O devices 1 to N hang off the I/O processors]
Fig. 1.6.5 Computer structure with two CPUs, main memory banks
If we link several copies of the foregoing prototype structures, more complex
structures of computers can be obtained. A computer network is an example of such a
structure.
Queueing Models
In this section we will discuss an analytic performance model of a computer system.
The model discussed here is based on queueing theory; it is the M/M/1 model. The first M
indicates the interarrival time distribution between two successive items requiring service from the
server. The items are served in their order of arrival, i.e. First Come First Served (FCFS)
scheduling. The second M indicates the service time distribution. The 1 indicates the number of
service facility centers. Fig. 1.6.6 shows a simple queueing model of a computer.
[Figure : items requiring service enter a queue and are serviced one at a time by a shared server (CPU); serviced items leave the queueing system]
Fig. 1.6.6 Simple queueing model of a computer system
The CPU is used as a server. The items or tasks requiring service by the CPU are queued
in a memory. One task is processed at a time by the CPU. The tasks from the queue are
processed (serviced) by the CPU on a FCFS basis. The tasks requiring service and joining
the queue follow a probability distribution with mean arrival rate denoted by
λ (lambda). Also the service time distribution, i.e. the service rate, follows a probability
distribution with mean denoted by μ (mu).
The traffic intensity, denoted by ρ (rho), represents the mean utilization of the server.
It is given by,
ρ = λ / μ
For example, if on average two tasks are arriving per second, then λ = 2. If the
average rate of servicing is eight tasks per second, then μ = 8.
The traffic intensity, i.e. the mean utilization of the server, is ρ = λ/μ = 2/8 = 0.25.
The interarrival time between two successive customers is a random variable with
parameter λ. This random process is characterized by the interarrival time distribution
denoted by P_a(t). It is defined as the probability that at least one task arrives during a
period of t. The M/M/1 case assumes that the number of items or tasks arriving or
joining the queue follows a Poisson distribution with parameter λ. The probability
distribution is,
P_a(t) = 1 - e^(-λt)
When t = 0, this exponential distribution has P_a(0) = 0. As t increases, P_a(t) increases
steadily toward 1 at a rate determined by λ.
Let P_s(t) be the probability that the service required by a task is completed by the
CPU in time t or less after removing it from the queue. Then its probability distribution
is given by,
P_s(t) = 1 - e^(-μt)
There are different performance parameters which can characterize the steady-state
performance of the single-server queueing system.
1. Traffic intensity : It is denoted by ρ and given by ρ = λ/μ. It is the average
fraction of time the server is busy. Thus ρ is nothing but the utilization of the server.
2. Average number of tasks queued in the system : It includes the number of tasks
waiting for service and the number of tasks actually being served. It is also known
as the mean queue length. Let E(N) be the average number of tasks in the system.
Then,
E(N) = Σ (n = 0 to ∞) n P_n          ... (1.6.1)
where P_n is the probability that there are n tasks in the system. It is given by
P_n = (1 - ρ) ρ^n
Substituting in equation (1.6.1) gives,
E(N) = Σ (n = 0 to ∞) n (1 - ρ) ρ^n
     = (1 - ρ) ρ (1 + 2ρ + 3ρ² + 4ρ³ + ...) = (1 - ρ) ρ / (1 - ρ)²
E(N) = ρ / (1 - ρ)          ... (1.6.2)
3. Average time that tasks spend in the system : It involves the waiting time in the queue
and the actual service time. This is also called the average response time or mean waiting
time. Let E(V) be the average time that tasks spend in the system. The quantities
E(V) and E(N) are related directly. When the average number of tasks is E(N) and
tasks enter the system at rate λ, then we can write,
E(V) = E(N) / λ          ... (1.6.3)
Combining equations (1.6.2) and (1.6.3), we get
E(V) = ρ / (λ (1 - ρ)) = 1 / (μ - λ)
4. Average time spent waiting in the queue excluding service time : Let it be E(W).
E(W) = E(V) - 1/μ
where 1/μ is the average time required to service a task.
E(W) = 1 / (μ - λ) - 1/μ = ρ / (μ - λ)          ... (1.6.4)
5. Average number of tasks waiting in the queue excluding those being served : This is
denoted by E(Q). The average number of tasks being serviced is ρ. Hence
subtracting this from E(N) yields E(Q).
E(Q) = E(N) - ρ = ρ / (1 - ρ) - ρ = ρ² / (1 - ρ)          ... (1.6.5)
From equations (1.6.4) and (1.6.5), we get,
E(W) = E(Q) / λ
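All of these steady-state measures follow from λ and μ alone, so they are convenient to compute together. The Python sketch below collects equations (1.6.2) to (1.6.5) and evaluates them for the example rates λ = 2 and μ = 8.

def mm1(lam, mu):
    rho = lam / mu                        # traffic intensity (server utilization)
    EN = rho / (1 - rho)                  # mean number of tasks in the system
    EV = EN / lam                         # mean time in system, equals 1/(mu - lam)
    EW = EV - 1 / mu                      # mean waiting time, equals rho/(mu - lam)
    EQ = EN - rho                         # mean queue length, equals rho**2/(1 - rho)
    return rho, EN, EV, EW, EQ

print(mm1(2, 8))   # (0.25, 0.333..., 0.1666..., 0.04166..., 0.0833...)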
Review Questions
1. List the processor level components.
2. Write a short note on :
   a) CPU
   b) Memories
   c) IO devices
   d) Interconnection network
3. Explain various design aspects in the processor level design.
4. Explain the processor level design process using prototype structures.
5. Draw and explain the simple queueing model of a computer system.
CPU Organization
In addition to executing programs, the CPU controls the functioning of the other system
components with the help of control signals. It directly or indirectly controls I/O
operations such as data transfers between I/O devices and main memory. It also
supports the interrupt facility by which external devices can request CPU service. The
major functions of the CPU are summarized in the flowchart shown in Fig. 1.7.1.
To perform the above mentioned functions, the CPU organization is divided into many
functional units, and each functional unit is responsible for the execution of particular tasks.
Let us study the general CPU organization.
[Flowchart : if instructions are waiting execution, fetch the next instruction (fetch cycle), decode the instruction (decode cycle) and execute the instruction (execute cycle); on an interrupt, transfer program control to the interrupt service routine (program transfer); otherwise STOP]
Fig. 1.7.1 Flowchart showing major functions of processor
The Fig. 1.7.2 shows the general CPU organization. It includes three major logic
devices :
• ALU
• Several registers
• Control unit
[Figure : ALU, registers and control unit connected by the internal data bus, with external input and output control lines]
Fig. 1.7.2 General processor organization
The internal data bus is used to transmit data between these logic devices.
ALU:
One of the CPU's major logic devices is the arithmetic logic unit (ALU). It contains
the CPU's data processing logic. It has two inputs and an output. The internal data bus
of the CPU is connected to the two inputs of the ALU through the temporary register and the
accumulator.
The ALU's single output is connected to the internal data bus. This allows the
output of the ALU to be sent over the bus to any device connected to the bus. In most CPUs,
register A gives data to the ALU and, after the operation is performed, the resulting data
word is sent to register A and stored there. This special register, where the result is
accumulated, is commonly known as the accumulator.
The ALU works on either one or two data words, depending on the kind of
operation. The ALU uses its input ports as necessary. For example, the addition operation uses
both ALU inputs while the complement operation uses only one input. To
complement a data word, all the bits of the word that are logic 1 are set to logic 0
and all the bits of the word that are logic 0 are set to logic 1.
The ALU of most CPUs can perform the following functions :
• Add
• Subtract
• AND
• OR
• Exclusive OR
• Complement
• Shift right
• Shift left
• Increment
• Decrement
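A simple way to picture this function set is as a table of operations applied to 8-bit words. The Python sketch below is illustrative (the operation names and encoding are ours); results are masked to 8 bits, as a real ALU's word width would enforce.

MASK = 0xFF                               # keep results to 8 bits

ALU_OPS = {
    "ADD":        lambda a, b: a + b,
    "SUB":        lambda a, b: a - b,
    "AND":        lambda a, b: a & b,
    "OR":         lambda a, b: a | b,
    "XOR":        lambda a, b: a ^ b,
    "COMPLEMENT": lambda a, b: ~a,        # one-operand: uses a single input
    "SHR":        lambda a, b: a >> 1,
    "SHL":        lambda a, b: a << 1,
    "INC":        lambda a, b: a + 1,
    "DEC":        lambda a, b: a - 1,
}

def alu(op, a, b=0):
    return ALU_OPS[op](a, b) & MASK

print(hex(alu("ADD", 0x3C, 0x06)))        # 0x42
print(bin(alu("COMPLEMENT", 0b10100101))) # 0b1011010, i.e. 0x5A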
Registers :
Registers are a prominent part of the block diagram and the programming model of
any CPU. The basic registers found in most CPUs are the accumulator, the
program counter, the stack pointer, the status register, the general purpose registers, the
memory address register, the instruction register and the temporary data registers.
Control Logic :
The control logic is an important block in the CPU. The control logic is responsible for
making all the other parts of the CPU work together. It maintains the synchronization of the
operation of the different parts of the CPU. The synchronization is achieved with the help of
one of the control logic's major external inputs, the CPU's clock. The clock is the signal which
is the basis of all the timings inside the CPU.
Usually the CPU's control logic is microprogrammed. This means that the organization of
the control logic itself is much like the organization of a very special purpose CPU.
The control logic receives signals from the instruction decoder, which decodes the
instruction stored in the instruction register. The control logic then generates the control
signals necessary to carry out this instruction. The control logic also performs a few other special
functions. It looks after the CPU power-up sequence. It also processes interrupts. An
interrupt is like a request to the CPU from other external devices such as the memory
and I/O. The interrupt asks the CPU to execute a special program.
Internal Data Bus :
The internal data bus connects the different parts of the CPU together and enables the
communication between these parts. The data transfer through this internal data bus is
controlled by the control logic.
The CPU's internal data bus is usually connected to an external data bus. Due to this, the CPU
can communicate with external memory or I/O devices. Usually the internal data bus is
connected to the external data bus by logic called a bi-directional bus (transceiver).
Another way to represent the CPU organization is the single bus CPU organization, in
which the arithmetic and logic unit and all CPU registers are connected through a single
common bus. It also shows the external memory bus connected to the address register (AR) and
data register (DR).
The registers Y, Z and Temp in Fig. 1.7.3 are used only by the CPU for
temporary storage during the execution of some instructions. These registers are never
used for storing data generated by one instruction for later use by another instruction.
The programmer cannot access these registers. The IR and the instruction decoder are
integral parts of the control circuitry in the CPU. All other registers and the ALU
are used for storing and manipulating data. The data registers, the ALU and the
interconnecting bus are collectively referred to as the data path.
[Figure : PC, registers R0 to R(n-1), IR, instruction decoder, Y, Z and Temp registers and the ALU (with carry-in and ALU control lines) connected to a single internal bus; AR and DR connect to the external memory bus]
Fig. 1.7.3 Single bus organization of processor
CPU Register Organization
We have seen that the CPU consists of various registers. These registers are used for
different purposes. Let us study the functioning of these registers one by one.
The Accumulator :
The accumulator is the major working register of the CPU. Most of the time it is used to
hold the data for manipulation. Whenever an operation processes two words, whether
arithmetically or logically, the accumulator contains one of the words. The other word
may be present in another register or in a memory location. Most of the time the result
of an arithmetic or logical operation is placed in the accumulator. In such cases, after
execution of the instruction the original contents of the accumulator are lost because they are
overwritten.
The accumulator is also used for data transfer between an I/O port and a memory
location or between one memory location and another.
The Program Counter :
The program counter is one of the most important registers in the CPU. As
mentioned earlier, a program is a series of instructions stored in the memory. These
instructions tell the CPU exactly how to solve a problem. It is important that these
instructions are executed in a proper order to get the correct result. This sequence
of instruction execution is monitored by the program counter. It keeps track of which
instruction is being used and what the next instruction will be.
The program counter gives the address of the memory location from where the next
instruction is to be fetched. Due to this, the length of the program counter decides the
maximum program length in bytes. For example, a CPU that has a 16-bit program counter
can address 2^16 bytes (64 K) of memory.
Before the CPU can start executing a program, the program counter has to be loaded
with a valid memory address. This memory location must contain the opcode of the first
instruction in the program. In most CPUs this location is fixed, for example,
memory address 0000H for a 16-bit program counter. The fixed address is loaded into
the program counter by resetting the CPU.
As said earlier, the instructions must be executed in a proper order to get the correct
result. This does not mean that every instruction must follow the last instruction in the
memory, but it must follow the logical sequence of the instructions. In some situations,
it is better to execute a part of a program that is not in sequence (don't confuse this with
the logical sequence) with the main program. For example, there may be a part of a
program that must be repeated many times during the execution of the entire program.
Rather than writing the repeated part of the program again and again, the programmer can
write that part only once. This part is written separately. The part of the program which
is written separately is called a subroutine. The Fig. 1.7.4 shows how the main and
subroutine programs are executed.
TECHNICAL PUBLICATIONS”. An ph er hnowesgeComputer Arcitecture and Orparization a Introduction
The program counter does the Mainprogram Subroutine posram
major role in subroutine execution as = —
it can be loaded with required = wOde
= | pattie
memory address. With the help of We = Sprogemtobe
instruction it is possible to load any = ( ‘epeates
memory address in the program =
counter, When subroutine is to be yl etn he
executed, the program counter is Subrouiine CALL prone
loaded with the memory address of —
the first instruction in the subroutine.
After execution of the subroutine, the
program counter is loaded with the memory address of the next instruction from where
the program control was transferred to the subroutine program,
Fig. 1.7.4 Execution of subroutine programs
The Status Register : The status register is used to store the results of certain
conditions when certain operations are performed during execution of the program. The
status register is also referred to as the flag register. ALU operations and certain register
operations may set or reset one or more bits in the status register. Status bits lead to a
new set of CPU instructions. These instructions permit the execution of a program to
change flow on the basis of the condition of bits in the status register. So the condition
bits in the status register can be used to take logical decisions within the program. Some
of the common status register bits are :
1) Carry/Borrow : The carry bit is set when the summation of two 8:bit numbers is
sgreater than 1111 1111 (FFH). A borrow is generated when a large number is subtracted
from a smaller number.
2) Zero : The zero bit is set when the contents of register are zero after any
operation. This happens not only when you decrement the register, but also when any
arithmetic or logical operation causes the contents of register to be zero.
3) Negative or sign : In 2's complement arithmetic, the most significant bit is a sign bit. If this bit is logic 1, the number is negative; otherwise it is positive. The negative bit or sign bit is set when any arithmetic or logical operation gives a negative result.
4) Auxiliary carry : The auxiliary carry bit of the status register is set when an addition in the first 4 bits causes a carry into the fifth bit. This is often referred to as a half carry or intermediate carry, and it is used in BCD arithmetic.
5) Overflow flag : In 2's complement arithmetic, the most significant bit is used to represent the sign of a number and the remaining bits are used to represent the magnitude of the number (see Fig. 1.7.5). This flag is set if the result of a signed operation is too large to fit in the number of bits available (7 bits for an 8-bit number) to represent it.

Fig. 1.7.5 2's complement representation
For example, suppose you add the 8-bit signed number 01110110 (+118 decimal) and the 8-bit signed number 00110110 (+54 decimal). The result will be 10101100 (+172 decimal), which is the correct binary result, but in this case it is too large to fit in the 7 bits allowed for the magnitude in an 8-bit signed number. The overflow flag will be set after this operation to indicate that the result of the addition has overflowed into the sign bit.
6) Parity : When the result of an operation leaves the indicated register with an even number of 1s, the parity bit is set.
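A compact way to see all six bits at work is to compute them for a single 8-bit addition. The C sketch below is an illustration of the rules just given, not the flag logic of any particular CPU; the overflow test uses the standard sign-comparison rule (like-signed operands, differently-signed result) :

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t a = 0x76, b = 0x36;         /* +118 and +54, the example above  */
    uint16_t wide = (uint16_t)a + b;    /* keep the ninth bit for the carry */
    uint8_t  sum  = (uint8_t)wide;      /* 8-bit result : 0xAC              */

    int carry    = wide > 0xFF;                         /* sum exceeds FFH        */
    int zero     = (sum == 0);                          /* all result bits clear  */
    int sign     = (sum & 0x80) != 0;                   /* MSB is the sign bit    */
    int auxcarry = ((a & 0x0F) + (b & 0x0F)) > 0x0F;    /* carry out of bit 3     */
    int overflow = (~(a ^ b) & (a ^ sum) & 0x80) != 0;  /* like signs, unlike sum */

    int ones = 0;                       /* count 1 bits for the parity flag */
    for (uint8_t t = sum; t; t >>= 1)
        ones += t & 1;
    int parity = (ones % 2) == 0;       /* set on an even number of 1s */

    printf("sum=%02X C=%d Z=%d S=%d AC=%d V=%d P=%d\n",
           sum, carry, zero, sign, auxcarry, overflow, parity);
    return 0;   /* prints sum=AC C=0 Z=0 S=1 AC=0 V=1 P=1 */
}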
The Stack Pointer :
This is an important register which the programmer uses frequently. In the earlier sections we have seen how subroutines are executed by changing the program counter contents. But one question you may have in your mind is how the program counter is loaded with the address of the next instruction (the return address) from where the program control was transferred to the subroutine. This return address is kept in a special memory area called the stack. Before transferring the program control to the subroutine, the return address is pushed onto the stack. After the execution of the subroutine, the return address is popped off the stack and loaded into the program counter.
The memory address of the stack area is given by a special register called the stack pointer. Like the program counter, the stack pointer automatically points to the next available location in memory. In most CPUs, the stack pointer decrements (points to the next lower memory address) when data is pushed on the stack. This allows the programmer to build the stack downwards in memory, as shown in Fig. 1.7.6. Usually stack operations are 2-byte operations. This means that the stack pointer decrements by two memory address locations each time 2 bytes of data are pushed on the stack. When the data is popped off the stack, the stack pointer is incremented by two memory address locations.

Fig. 1.7.6 Stack operation

It is important to note that as you go on storing (pushing) data on the stack, the stack pointer always points to the last data placed on the stack, and when you try to remove (pop) data you always get the last data placed on the stack. This kind of stack operation is called LIFO (last in, first out) operation.
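The behaviour described above can be modelled in a few lines of C : a small memory array, a stack pointer that moves down by two on each push and up by two on each pop, and LIFO ordering. The starting address and byte order below are illustrative choices, not those of a specific CPU :

#include <stdio.h>
#include <stdint.h>

static uint8_t  mem[0x10000];   /* 64 K memory, as for a 16-bit address bus */
static uint16_t sp = 0xFFFE;    /* stack pointer; the stack builds downwards */

static void push(uint16_t word)
{
    sp -= 2;                          /* decrement first : point below old top */
    mem[sp]     = word & 0xFF;        /* low byte  */
    mem[sp + 1] = word >> 8;          /* high byte */
}

static uint16_t pop(void)
{
    uint16_t word = mem[sp] | (mem[sp + 1] << 8);
    sp += 2;                          /* increment back past the popped word */
    return word;
}

int main(void)
{
    push(0x1111);
    push(0x2222);                     /* last in ...                */
    printf("%04X\n", pop());          /* ... first out : prints 2222 */
    printf("%04X\n", pop());          /* then 1111                  */
    return 0;
}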
General Purpose Registers :
In addition to the six basic registers, most CPUs have other registers called general purpose registers. The general purpose registers are used as simple storage areas, mainly to store intermediate results of operations. Getting an operand from a general purpose register is much faster than getting it from memory, so it is better to have a sufficient number of general purpose registers in the CPU. The CPU used in this chapter has six general purpose registers (refer Fig. 1.7.2) called the B, C, D, E, H, and L registers. These registers can individually operate as 8-bit registers. Together, the BC, DE, and HL registers can operate as 16-bit register pairs.
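Pairing two 8-bit registers into one 16-bit value is just a shift and a mask. In the C sketch below, h and l are ordinary variables standing in for the H and L registers :

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t h = 0x20, l = 0x50;            /* two 8-bit registers          */
    uint16_t hl = ((uint16_t)h << 8) | l;  /* paired as one 16-bit HL      */
    printf("HL = %04X\n", hl);             /* prints 2050                  */

    hl += 1;                               /* 16-bit operation on the pair */
    h = hl >> 8;                           /* split back into halves       */
    l = hl & 0xFF;
    printf("H = %02X, L = %02X\n", h, l);  /* prints 20 and 51             */
    return 0;
}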
Memory Address Register :
The memory address register gives the address of the memory location that the processor wants to use. That is, the memory address register holds a 16-bit binary number. The output of the memory address register drives the 16-bit address bus. This output is used to select a memory location.
The Instruction Register :
The instruction register holds the operation code (opcode) of the instruction the CPU is currently executing. The instruction register is loaded during the opcode fetch cycle. The contents of the instruction register are used to drive the part of the control logic known as the instruction decoder.
Temporary Data Register :
The need for the temporary data register arises because the ALU has no storage of its own. The ALU has two inputs : one input is supplied by the accumulator and the other by the temporary data register. The programmer cannot access this temporary data register, and therefore it is not a part of the programming model.
Review Questions
1. Draw and explain the CPU organization.
2. Explain the single bus organization of the processor.
3. Explain the CPU register organization.
4. Explain the use of the following registers of the processor :
   Program counter, Accumulator, Instruction register, Stack pointer.
5. Name and explain the various special registers in a typical computer.
1.8 Data Representation
The basic forms of information handled by a computer are instructions and data. The data can be in the form of numbers or non-numerical data. The data in number form can be further classified as fixed-point and floating-point. This is illustrated in Fig. 1.8.1.
Fig. 1.8.1 The basic information types
Fig. 1.8.1 shows that, ultimately, there are two ways to represent information : either in binary form or in decimal form. The digital computer represents information in binary words, where a word is a unit of information of some fixed length n. An n-bit word allows up to 2^n different items to be represented. The precision of a number is determined by its word length, and no single word length is suitable for representing every kind of information encountered in a typical computer. Considering this fact, fixed-point numbers come in lengths of 1, 2, 4 or more bytes, whereas floating-point numbers come in single-precision (4-byte) or double-precision (8-byte) formats.
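On most C implementations these two floating-point formats map onto the types float and double, which a quick check confirms (assuming a typical IEEE-754 platform) :

#include <stdio.h>

int main(void)
{
    /* single precision (4 bytes) and double precision (8 bytes)
       on typical IEEE-754 platforms */
    printf("float  : %zu bytes\n", sizeof(float));
    printf("double : %zu bytes\n", sizeof(double));
    return 0;
}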
Big-Endian and Little-Endian Assignments
There are two ways that byte addresses can be assigned across words : big-endian and little-endian. When the lower byte addresses are used for the more significant bytes (the leftmost bytes) of the word, the addressing is called big-endian. When the lower byte addresses are used for the less significant bytes (the rightmost bytes) of the word, the addressing is called little-endian. This is illustrated in Fig. 1.8.2.
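A short C sketch makes the difference concrete : store a 32-bit word and inspect its bytes in order of increasing address. On a little-endian machine the least significant byte appears first; on a big-endian machine the most significant byte does :

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t word = 0x12345678;
    const uint8_t *p = (const uint8_t *)&word;  /* view the word byte by byte */

    for (int i = 0; i < 4; i++)                 /* bytes at increasing addresses */
        printf("byte %d : %02X\n", i, p[i]);

    /* little-endian prints 78 56 34 12; big-endian prints 12 34 56 78 */
    return 0;
}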