Processor Architectures Overview
RISC, CISC, Harvard, and Von Neumann
Contents
Articles
Von Neumann architecture
Harvard architecture
Complex instruction set computing
Reduced instruction set computing
Von Neumann architecture
History
The earliest computing machines had fixed programs. Some very simple computers still use this design, either for
simplicity or training purposes. For example, a desk calculator (in principle) is a fixed program computer. It can do
basic mathematics, but it cannot be used as a word processor or a gaming console. Changing the program of a
fixed-program machine requires re-wiring, re-structuring, or re-designing the machine. The earliest computers were
not so much "programmed" as they were "designed". "Reprogramming", when it was possible at all, was a laborious
process, starting with flowcharts and paper notes, followed by detailed engineering designs, and then the
often-arduous process of physically re-wiring and re-building the machine. It could take three weeks to set up a
program on ENIAC and get it working.[4]
With the proposal of the stored-program computer this changed. A stored-program computer includes by design an
instruction set and can store in memory a set of instructions (a program) that details the computation.
A stored-program design also allows for self-modifying code. One early motivation for such a facility was the need
for a program to increment or otherwise modify the address portion of instructions, which had to be done manually
in early designs. This became less important when index registers and indirect addressing became usual features of
machine architecture. Another use was to embed frequently used data in the instruction stream using immediate
addressing. Self-modifying code has largely fallen out of favor, since it is usually hard to understand and debug, as
well as being inefficient under modern processor pipelining and caching schemes.
On a large scale, the ability to treat instructions as data is what makes assemblers, compilers and other automated
programming tools possible. One can "write programs which write programs".[5] On a smaller scale, repetitive
I/O-intensive operations such as the BITBLT image manipulation primitive or pixel and vertex shaders in modern 3D
graphics were considered inefficient to run without custom hardware. These operations could be accelerated on
general-purpose processors with "on the fly compilation" ("just-in-time compilation") technology, e.g.,
code-generating programs, one form of self-modifying code that has remained popular.
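As a rough illustration of treating instructions as data, the following minimal C sketch (assuming a Linux x86-64 host; the byte values encode "mov eax, 42; ret") writes a handful of machine-code bytes into an executable buffer and then calls the buffer as a function. Some hardened systems forbid mappings that are writable and executable at once, in which case the mmap call will simply fail.

    /* Minimal illustration of "instructions as data": build a tiny function
     * at run time and call it. Assumes Linux on x86-64. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    int main(void)
    {
        /* x86-64 machine code for:  mov eax, 42 ; ret */
        unsigned char code[] = { 0xB8, 42, 0, 0, 0, 0xC3 };

        /* Request a page that is both writable (so we can store the bytes)
         * and executable (so we can jump to them). */
        void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;

        memcpy(buf, code, sizeof code);   /* the bytes are ordinary data here */

        fn_t f = (fn_t)buf;               /* ...and a function here */
        printf("generated function returned %d\n", f());

        munmap(buf, 4096);
        return 0;
    }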
There are drawbacks to the von Neumann design. Aside from the von Neumann bottleneck described below,
program modifications can be quite harmful, either by accident or design. In some simple stored-program computer
designs, a malfunctioning program can damage itself, other programs, or the operating system, possibly leading to a
computer crash. Memory protection and other forms of access control can usually protect against both accidental and
malicious program modification.
Both von Neumann's and Turing's papers described stored-program computers, but von Neumann's earlier paper
achieved greater circulation and the computer architecture it outlined became known as the "von Neumann
architecture". In the 1953 publication Faster than Thought: A Symposium on Digital Computing Machines (edited by
B.V. Bowden), a section in the chapter on Computers in America reads as follows:[14]
THE MACHINE OF THE INSTITUTE FOR ADVANCED STUDIES, PRINCETON
In 1945, Professor J. von Neumann, who was then working at the Moore School of Engineering in
Philadelphia, where the E.N.I.A.C. had been built, issued on behalf of a group of his co-workers a
report on the logical design of digital computers. The report contained a fairly detailed proposal for the
design of the machine which has since become known as the E.D.V.A.C. (electronic discrete variable
automatic computer). This machine has only recently been completed in America, but the von Neumann
report inspired the construction of the E.D.S.A.C. (electronic delay-storage automatic calculator) in
Cambridge (see page 130).
In 1947, Burks, Goldstine and von Neumann published another report which outlined the design of
another type of machine (a parallel machine this time) which should be exceedingly fast, capable
perhaps of 20,000 operations per second. They pointed out that the outstanding problem in constructing
such a machine was in the development of a suitable memory, all the contents of which were
instantaneously accessible, and at first they suggested the use of a special tube, called the Selectron,
which had been invented by the Princeton Laboratories of the R.C.A. These tubes were expensive and
difficult to make, so von Neumann subsequently decided to build a machine based on the Williams
memory. This machine, which was completed in June, 1952 in Princeton has become popularly known
as the Maniac. The design of this machine has inspired that of half a dozen or more machines which are
now being built in America, all of which are known affectionately as "Johniacs."
In the same book, the first two paragraphs of a chapter on ACE read as follows:[15]
AUTOMATIC COMPUTATION AT THE NATIONAL PHYSICAL LABORATORY
One of the most modern digital computers which embodies developments and improvements in the
technique of automatic electronic computing was recently demonstrated at the National Physical
Laboratory, Teddington, where it has been designed and built by a small team of mathematicians and
electronics research engineers on the staff of the Laboratory, assisted by a number of production
engineers from the English Electric Company, Limited. The equipment so far erected at the Laboratory
is only the pilot model of a much larger installation which will be known as the Automatic Computing
Engine, but although comparatively small in bulk and containing only about 800 thermionic valves, as
can be judged from Plates XII, XIII and XIV, it is an extremely rapid and versatile calculating machine.
The basic concepts and abstract principles of computation by a machine were formulated by Dr. A. M.
Turing, F.R.S., in a paper read before the London Mathematical Society in 1936, but work on such
machines in Britain was delayed by the war. In 1945, however, an examination of the problems was
made at the National Physical Laboratory by Mr. J. R. Womersley, then superintendent of the
Mathematics Division of the Laboratory. He was joined by Dr. Turing and a small staff of specialists,
and, by 1947, the preliminary planning was sufficiently advanced to warrant the establishment of the
special group already mentioned. In April, 1948, the latter became the Electronics Section of the
Laboratory, under the charge of Mr. F. M. Colebrook.
Evolution
Through the decades of the 1960s and 1970s computers generally
became both smaller and faster, which led to some evolutions in their
architecture. For example, memory-mapped I/O allows input and
output devices to be treated the same as memory.[20] A single system
bus could be used to provide a modular system with lower cost. This is
sometimes called a "streamlining" of the architecture.[21] In subsequent
decades, simple microcontrollers would sometimes omit features of the
model to lower cost and size. Larger computers added features for
higher performance.
Figure: single system bus evolution of the architecture.
The term "von Neumann bottleneck" was coined by John Backus in his 1977 ACM Turing Award lecture. According
to Backus:
Surely there must be a less primitive way of making big changes in the store than by pushing vast
numbers of words back and forth through the von Neumann bottleneck. Not only is this tube a literal
bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has
kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger
conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous
traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant
data itself, but where to find it.[22][23]
The performance problem can be alleviated (to some extent) by several mechanisms. Providing a cache between the
CPU and the main memory, providing separate caches or separate access paths for data and instructions (the
so-called Modified Harvard architecture), using branch predictor algorithms and logic, and providing a limited CPU
stack to reduce memory access are four of the ways performance is increased. The problem can also be sidestepped
somewhat by using parallel computing, using for example the Non-Uniform Memory Access (NUMA)
architecture; this approach is commonly employed by supercomputers. It is less clear whether the intellectual
bottleneck that Backus criticized has changed much since 1977. Backus's proposed solution has not had a major
influence. Modern functional programming and object-oriented programming are much less geared towards "pushing
vast numbers of words back and forth" than earlier languages like Fortran were, but internally, that is still what
computers spend much of their time doing, even highly parallel supercomputers.
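The effect of the cache mitigation, and of the underlying word-at-a-time traffic, can be seen with a simple experiment; the sketch below (assuming a POSIX system with clock_gettime, and an arbitrarily chosen 4096 x 4096 matrix) performs the same additions twice, once in an order that reuses cached lines and once in an order that forces most words to be fetched individually from main memory.

    /* Rough sketch: the same additions performed with two access patterns.
     * Row-major traversal reuses each cached line; column-major traversal
     * touches a new line on almost every access, so its run time is
     * dominated by traffic between the CPU and main memory. */
    #include <stdio.h>
    #include <time.h>

    #define N 4096   /* 4096 x 4096 doubles = 128 MiB, far larger than any cache */

    static double m[N][N];

    static double seconds(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        struct timespec t0, t1, t2;
        double sum = 0.0;

        for (int i = 0; i < N; i++)           /* touch every element once so */
            for (int j = 0; j < N; j++)       /* real memory is allocated    */
                m[i][j] = 1.0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)           /* row-major: cache friendly */
            for (int j = 0; j < N; j++)
                sum += m[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t1);
        for (int j = 0; j < N; j++)           /* column-major: mostly cache misses */
            for (int i = 0; i < N; i++)
                sum += m[i][j];
        clock_gettime(CLOCK_MONOTONIC, &t2);

        printf("row-major:    %.3f s\n", seconds(t0, t1));
        printf("column-major: %.3f s\n", seconds(t1, t2));
        printf("(sum = %.0f)\n", sum);        /* keeps the loops from being optimized away */
        return 0;
    }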
In some cases, emerging memristor technology may be able to circumvent the von Neumann bottleneck.[24]
References
Inline
[1] von Neumann 1945
[2] Ganesan 2009
[3] Markgraf, Joey D. (2007), The Von Neumann bottleneck (http://aws.linnbenton.edu/cs271c/markgrj/), retrieved August 24, 2011
[4] Copeland 2006, p. 104
[5] MFTL (My Favorite Toy Language) entry, Jargon File 4.4.7 (http://catb.org/~esr/jargon/html/M/MFTL.html), retrieved 2008-07-11
[6] Turing, A.M. (1936), "On Computable Numbers, with an Application to the Entscheidungsproblem", Proceedings of the London Mathematical Society, 2 42: 230–65, 1937, doi:10.1112/plms/s2-42.1.230 (and Turing, A.M. (1938), "On Computable Numbers, with an Application to the Entscheidungsproblem: A correction", Proceedings of the London Mathematical Society, 2 43 (6): 544–6, 1937, doi:10.1112/plms/s2-43.6.544)
[7] The Life and Work of Konrad Zuse, Part 10: Konrad Zuse and the Stored Program Computer (http://web.archive.org/web/20080601160645/http://www.epemag.com/zuse/part10.htm), archived from the original (http://www.epemag.com/zuse/part10.htm) on June 1, 2008, retrieved 2008-07-11
[8] Lukoff, Herman (1979), From Dits to Bits...: A Personal History of the Electronic Computer, Robotics Press, ISBN 978-0-89661-002-6
[9] ENIAC project administrator Grist Brainerd's December 1943 progress report for the first period of the ENIAC's development implicitly proposed the stored program concept (while simultaneously rejecting its implementation in the ENIAC) by stating that "in order to have the simplest project and not to complicate matters" the ENIAC would be constructed without any "automatic regulation".
[10] Copeland 2006, p. 113
[11] Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC (http://www.alanturing.net/turing_archive/pages/Reference Articles/BriefHistofComp.html#ACE), retrieved 27 January 2010
[12] Copeland, Jack (2000), A Brief History of Computing: ENIAC and EDVAC (http://www.alanturing.net/turing_archive/pages/Reference Articles/BriefHistofComp.html#ACE), retrieved 27 January 2010, which cites Randell, B. (1972), Meltzer, B.; Michie, D., eds., "On Alan Turing and the Origins of Digital Computers", Machine Intelligence 7 (Edinburgh: Edinburgh University Press): 10, ISBN 0902383264
[13] Copeland 2006, pp. 108–111
[14] Bowden 1953, pp. 176–177
[15] Bowden 1953, p. 135
[16] "Electronic Computer Project" (http://www.ias.edu/people/vonneumann/ecp/). Institute for Advanced Study. Retrieved May 26, 2011.
[17] Illiac Design Techniques, report number UIUCDCS-R-1955-146, Digital Computer Laboratory, University of Illinois at Urbana-Champaign, 1955
[18] F.E. Hamilton, R.R. Seeber, R.A. Rowley, and E.S. Hughes (January 19, 1949). "Selective Sequence Electronic Calculator" (http://patft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/PTO/srchnum.htm&r=1&f=G&l=50&s1=2636672.PN.&OS=PN/2636672&RS=PN/2636672). US Patent 2,636,672. Retrieved April 28, 2011. Issued April 28, 1953.
[19] Herbert R.J. Grosch (1991), Computer: Bit Slices From a Life (http://www.columbia.edu/acis/history/computer.html), Third Millennium Books, ISBN 0-88733-085-1
[20] C. Gordon Bell; R. Cady; H. McFarland; J. O'Laughlin; R. Noonan; W. Wulf (1970), "A New Architecture for Mini-Computers: The DEC PDP-11" (http://research.microsoft.com/en-us/um/people/gbell/CGB Files/New Architecture PDP11 SJCC 1970 c.pdf), Spring Joint Computer Conference: pp. 657–675
[21] Linda Null; Julia Lobur (2010), The essentials of computer organization and architecture (http://books.google.com/books?id=f83XxoBC_8MC&pg=PA36) (3rd ed.), Jones & Bartlett Learning, pp. 36, 199–203, ISBN 9781449600068
[22] Backus, John W., "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs" (http://www.cs.cmu.edu/~crary/819-f09/Backus78.pdf), retrieved 2012-01-20
[23] Dijkstra, Edsger W., "E. W. Dijkstra Archive: A review of the 1977 Turing Award Lecture" (http://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD692.html), retrieved 2008-07-11
[24] Mouttet, Blaise L (2009), "Memristor Pattern Recognition Circuit Architecture for Robotics" (http://www.iiis.org/CDs2008/CD2009SCI/CITSA2009/PapersPdf/I086AI.pdf), Proceedings of the 2nd International Multi-Conference on Engineering and Technological Innovation II: 65–70
[25] "COP8 Basic Family Users Manual" (http://www.national.com/appinfo/mcu/files/Basic_user1.pdf). National Semiconductor. Retrieved 2012-01-20.
[26] "COP888 Feature Family Users Manual" (http://www.national.com/appinfo/mcu/files/Feature_user.pdf). National Semiconductor. Retrieved 2012-01-20.
General
Bowden, B.V., ed. (1953), Faster Than Thought: A Symposium on Digital Computing Machines, London: Sir Isaac Pitman and Sons Ltd.
Rojas, Raúl; Hashagen, Ulf, eds. (2000), The First Computers: History and Architectures, MIT Press, ISBN 0-262-18197-5
Davis, Martin (2000), The universal computer: the road from Leibniz to Turing, New York: W. W. Norton & Company Inc., ISBN 0-393-04785-7
Can Programming be Liberated from the von Neumann Style?, John Backus, 1977 ACM Turing Award Lecture. Communications of the ACM, August 1978, Volume 21, Number 8. Online PDF (http://www.stanford.edu/class/cs242/readings/backus.pdf)
C. Gordon Bell and Allen Newell (1971), Computer Structures: Readings and Examples, McGraw-Hill Book Company, New York. Massive (668 pages)
Copeland, Jack (2006), "Colossus and the Rise of the Modern Computer", in Copeland, B. Jack, Colossus: The Secrets of Bletchley Park's Codebreaking Computers, Oxford: Oxford University Press, ISBN 978-0-19-284055-4
von Neumann, John (1945), First Draft of a Report on the EDVAC (http://qss.stanford.edu/~godfrey/vonNeumann/vnedvac.pdf), retrieved August 24, 2011
Ganesan, Deepak (2009), The Von Neumann Model (http://none.cs.umass.edu/~dganesan/courses/fall09/handouts/Chapter4.pdf), retrieved October 22, 2011
External links
Harvard vs von Neumann (http://www.pic24micro.com/harvard_vs_von_neumann.html)
A tool that emulates the behavior of a von Neumann machine (http://home.gna.org/vov/)
Harvard architecture
The Harvard architecture is a computer
architecture with physically separate storage
and signal pathways for instructions and
data. The term originated from the Harvard
Mark I relay-based computer, which stored
instructions on punched tape (24 bits wide)
and data in electro-mechanical counters.
These early machines had data storage
entirely contained within the central
processing unit, and provided no access to
the instruction storage as data. Programs
needed to be loaded by an operator; the
processor could not boot itself.
Memory details
In a Harvard architecture, there is no need to make the two memories share characteristics. In particular, the word
width, timing, implementation technology, and memory address structure can differ. In some systems, instructions
can be stored in read-only memory while data memory generally requires read-write memory. In some systems, there
is much more instruction memory than data memory so instruction addresses are wider than data addresses.
Another modification provides a pathway between the instruction memory (such as ROM or flash) and the CPU to
allow words from the instruction memory to be treated as read-only data. This technique is used in some
microcontrollers, including the Atmel AVR. This allows constant data, such as text strings or function tables, to be
accessed without first having to be copied into data memory, preserving scarce (and power-hungry) data memory for
read/write variables. Special machine language instructions are provided to read data from the instruction memory.
(This is distinct from instructions which themselves embed constant data, although for individual constants the two
mechanisms can substitute for each other.)
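On the Atmel AVR mentioned above, for example, the avr-libc toolchain exposes this mechanism through <avr/pgmspace.h>. The sketch below is illustrative only: the string contents and the put_char routine are invented, but PROGMEM and pgm_read_byte are the standard avr-libc facilities that keep a constant in flash and read it with the special program-memory load instruction instead of copying it into SRAM.

    /* Sketch for an AVR target built with avr-gcc/avr-libc: keep a constant
     * string in program (flash) memory and read it byte by byte with the
     * LPM-based pgm_read_byte() accessor instead of copying it into SRAM. */
    #include <avr/pgmspace.h>

    /* Lives in instruction (flash) memory, not in data (SRAM) memory. */
    static const char greeting[] PROGMEM = "Hello from flash";

    /* Hypothetical output routine; a real program would write to a UART. */
    extern void put_char(char c);

    void print_greeting(void)
    {
        const char *p = greeting;
        char c;
        /* pgm_read_byte() expands to the special instruction that reads
         * program memory as data (LPM on AVR). */
        while ((c = pgm_read_byte(p++)) != '\0')
            put_char(c);
    }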
Speed
In recent years, the speed of the CPU has grown many times in comparison to the access speed of the main memory.
Care needs to be taken to reduce the number of times main memory is accessed in order to maintain performance. If,
for instance, every instruction run in the CPU requires an access to memory, the computer gains nothing for
increased CPU speed, a problem referred to as being "memory bound".
It is possible to make extremely fast memory but this is only practical for small amounts of memory for cost, power
and signal routing reasons. The solution is to provide a small amount of very fast memory known as a CPU cache
which holds recently accessed data. As long as the data that the CPU needs is in the cache, the performance hit is
much smaller than it is when the cache has to turn around and get the data from the main memory.
Microcontrollers are characterized by having small amounts of program (flash memory) and data (SRAM)
memory, with no cache, and take advantage of the Harvard architecture to speed processing by concurrent
instruction and data access. The separate storage means the program and data memories can have different bit
widths, for example using 16-bit wide instructions and 8-bit wide data. They also mean that instruction prefetch
can be performed in parallel with other activities. Examples include the AVR by Atmel Corp, the PIC by
Microchip Technology, Inc. and the ARM Cortex-M3 processor (not all ARM chips have Harvard architecture).
Even in these cases, it is common to have special instructions to access program memory as data for read-only tables,
or for reprogramming.
External links
Harvard vs Von Neumann [1]
References
[1] http://www.pic24micro.com/harvard_vs_von_neumann.html
Complex instruction set computing
New instructions
In the 1970s, analysis of high-level languages indicated that compilers produced some correspondingly complex
machine language, and it was determined that new instructions could improve performance. Some instructions were
added that were never intended to be used in assembly language but fit well with compiled high-level languages. Compilers were updated
to take advantage of these instructions. The benefits of semantically rich instructions with compact encodings can be
seen in modern processors as well, particularly in the high performance segment where caches are a central
component (as opposed to most embedded systems). This is because these fast, but complex and expensive,
memories are inherently limited in size, making compact code beneficial. Of course, the fundamental reason they are
needed is that main memories (i.e. dynamic RAM today) remain slow compared to a (high performance) CPU-core.
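The density difference can be made concrete with a single C statement and the kind of code a compiler might emit for it; the assembly shown in the comments below is illustrative of typical x86 memory-operand output and of a generic load/store sequence, not the output of any particular compiler.

    /* One C statement and two typical translations (shown as comments).
     * A memory-operand CISC instruction folds the load, the add and the
     * store into one compact encoding; a load/store machine spells them out. */
    void bump(int *counts, int i, int x)
    {
        counts[i] += x;

        /* Illustrative x86 output, a single read-modify-write instruction:
         *     add  dword ptr [rdi + rsi*4], edx
         *
         * Illustrative load/store (RISC-style) output, with the address
         * computed into a0 beforehand:
         *     lw   t0, 0(a0)       # load counts[i]
         *     add  t0, t0, a2      # add x
         *     sw   t0, 0(a0)       # store the result back
         */
    }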
Design issues
While many designs achieved the aim of higher throughput at lower cost and also allowed high-level language
constructs to be expressed by fewer instructions, it was observed that this was not always the case. For instance,
low-end versions of complex architectures (i.e. using less hardware) could lead to situations where it was possible to
improve performance by not using a complex instruction (such as a procedure call or enter instruction), but instead
using a sequence of simpler instructions.
One reason for this was that architects (microcode writers) sometimes "over-designed" assembler language
instructions, i.e. including features which were not possible to implement efficiently on the basic hardware available.
This could, for instance, be "side effects" (above conventional flags), such as the setting of a register or memory
location that was perhaps seldom used; if this was done via ordinary (non duplicated) internal buses, or even the
external bus, it would demand extra cycles every time, and thus be quite inefficient.
Even in balanced high performance designs, highly encoded and (relatively) high-level instructions could be
complicated to decode and execute efficiently within a limited transistor budget. Such architectures therefore
required a great deal of work on the part of the processor designer in cases where a simpler, but (typically) slower,
solution based on decode tables and/or microcode sequencing is not appropriate. At a time when transistors and other
components were a limited resource, this also left fewer components and less opportunity for other types of
performance optimizations.
Superscalar
In a more modern context, the complex variable length encoding used by some of the typical CISC architectures
makes it complicated, but still feasible, to build a superscalar implementation of a CISC programming model
directly; the in-order superscalar original Pentium and the out-of-order superscalar Cyrix 6x86 are well known
examples of this. The frequent memory accesses for operands of a typical CISC machine may limit the instruction
level parallelism that can be extracted from the code, although this is strongly mediated by the fast cache structures
used in modern designs, as well as by other measures. Due to inherently compact and semantically rich instructions,
the average amount of work performed per machine code unit (i.e. per byte or bit) is higher for a CISC than a RISC
processor, which may give it a significant advantage in a modern cache based implementation. (Whether the
downsides versus the upsides justifies a complex design or not is food for a never-ending debate in certain circles.)
Transistors for logic, PLAs, and microcode are no longer scarce resources; only large high-speed cache memories are
limited by the maximum number of transistors today. Although complex, the transistor count of CISC decoders does
not grow exponentially like the total number of transistors per processor (the majority typically used for caches).
Together with better tools and enhanced technologies, this has led to new implementations of highly encoded and
variable length designs without load-store limitations (i.e. non-RISC). This governs re-implementations of older
architectures such as the ubiquitous x86 (see below) as well as new designs for microcontrollers for embedded
systems, and similar uses. The superscalar complexity in the case of modern x86 was solved with dynamically issued
and buffered micro-operations, i.e. indirect and dynamic superscalar execution; the Pentium Pro and AMD K5 are
early examples of this. It allows a fairly simple superscalar design to be located after the (fairly complex) decoders
(and buffers), giving, so to speak, the best of both worlds in many respects.
External links
RISC vs. CISC comparison (http://www.pic24micro.com/cisc_vs_risc.html)
Reduced instruction set computing
Because only the load and store instructions access memory, RISC designs confine the logic for dealing with the delay in completing a memory access (cache miss, etc.) to only two instructions. This led
to RISC designs being referred to as load/store architectures.[5]
One more issue is that complex instructions are difficult to restart, e.g. following a page fault. In some cases,
restarting from the beginning will work (although wasteful), but in many this would give incorrect results. Therefore
the machine needs to have some hidden state to remember which parts went through and what needs to be done.
With a load/store machine, the program counter supplies all the information needed to restart.
Alternatives
RISC was developed as an alternative to what is now known as CISC. Over the years, other strategies have been
implemented as alternatives to RISC and CISC. Some examples are VLIW, MISC, OISC, massive parallel
processing, systolic array, reconfigurable computing, and dataflow architecture.
Other features typically found in RISC architectures include:
Identical general-purpose registers, allowing any register to be used in any context and simplifying compiler design (although there are normally separate floating-point registers);
Simple addressing modes, with complex addressing performed via sequences of arithmetic and/or load-store operations;
Few data types in hardware: some CISCs have byte-string instructions or support complex numbers, which are so far unlikely to be found on a RISC.
Exceptions abound, of course, within both CISC and RISC.
RISC designs are also more likely to feature a Harvard memory model, where the instruction stream and the data
stream are conceptually separated; this means that modifying the memory where code is held might not have any
effect on the instructions executed by the processor (because the CPU has a separate instruction and data cache), at
least until a special synchronization instruction is issued. On the upside, this allows both caches to be accessed
simultaneously, which can often improve performance.
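This interacts with the run-time code generation described in the von Neumann article above: on a split-cache design, freshly written code must be synchronized before it is executed. The sketch below assumes GCC or Clang (which provide the __builtin___clear_cache primitive) and a POSIX mmap; install_code and its parameters are invented names, and the architecture-specific code bytes are left to the caller.

    /* Sketch of the synchronization needed on split-cache (modified Harvard)
     * CPUs such as ARM: after writing generated machine code as data, make it
     * visible to the instruction-fetch path before jumping to it.
     * Assumes GCC or Clang and POSIX mmap; the architecture-specific code
     * bytes are supplied by the caller. */
    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef void (*fn_t)(void);

    fn_t install_code(const unsigned char *code, size_t len)
    {
        unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return NULL;

        memcpy(buf, code, len);           /* goes through the *data* cache */

        /* Flush/invalidate so instruction fetch sees the new bytes; on a
         * machine with a unified cache this may compile to nothing. */
        __builtin___clear_cache((char *)buf, (char *)buf + len);

        return (fn_t)buf;
    }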
Many early RISC designs also shared the characteristic of having a branch delay slot. A branch delay slot is an
instruction space immediately following a jump or branch. The instruction in this space is executed, whether or not
the branch is taken (in other words the effect of the branch is delayed). This instruction keeps the ALU of the CPU
busy for the extra time normally needed to perform a branch. Nowadays the branch delay slot is considered an
unfortunate side effect of a particular strategy for implementing some RISC designs, and modern RISC designs
generally do away with it (such as PowerPC and more recent versions of SPARC and MIPS).
Early RISC
The first system that would today be known as RISC was the CDC 6600 supercomputer, designed in 1964, a decade
before the term was invented. The CDC 6600 had a load-store architecture with only two addressing modes
(register+register, and register+immediate constant) and 74 opcodes (whereas an Intel 8086 has 400). The 6600 had
eleven pipelined functional units for arithmetic and logic, plus five load units and two store units; the memory had
multiple banks so all load-store units could operate at the same time. The basic clock cycle/instruction issue rate was
10 times faster than the memory access time. Jim Thornton and Seymour Cray designed it as a number-crunching
CPU supported by 10 simple computers called "peripheral processors" to handle I/O and other operating system
functions.[9] Thus the joking comment later that the acronym RISC actually stood for "Really Invented by Seymour
Cray".
Another early load-store machine was the Data General Nova minicomputer, designed in 1968 by Edson de Castro.
It had an almost pure RISC instruction set, remarkably similar to that of today's ARM processors; however it has not
been cited as having influenced the ARM designers, although Novas were in use at the University of Cambridge
Computer Laboratory in the early 1980s.
The earliest attempt to make a chip-based RISC CPU was a project at IBM which started in 1975. Named after the
building where the project ran, the work led to the IBM 801 CPU family which was used widely inside IBM
hardware. The 801 was eventually produced in a single-chip form as the ROMP in 1981, which stood for 'Research
OPD [Office Products Division] Micro Processor'. As the name implies, this CPU was designed for "mini" tasks, and
when IBM released the IBM RT-PC based on the design in 1986, the performance was not acceptable. Nevertheless
the 801 inspired several research projects, including new ones at IBM that would eventually lead to their POWER
system.
The most public RISC designs, however, were the results of university research programs run with funding from the
DARPA VLSI Program. The VLSI Program, practically unknown today, led to a huge number of advances in chip
design, fabrication, and even computer graphics.
The Berkeley RISC project started in 1980 under the direction of David Patterson and Carlo H. Sequin, based on
gaining performance through the use of pipelining and an aggressive use of a technique known as register
windowing. In a normal CPU, one has a small number of registers, and a program can use any register at any time. In
a CPU with register windows, there are a huge number of registers, e.g. 128, but programs can only use a small
number of them, e.g. eight, at any one time. A program that limits itself to eight registers per procedure can make
very fast procedure calls: The call simply moves the window "down" by eight, to the set of eight registers used by
that procedure, and the return moves the window back. (On a normal CPU, most calls must save at least a few
registers' values to the stack in order to use those registers as working space, and restore their values on return.)
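A toy simulation makes the mechanism easier to picture; in the sketch below the register-file and window sizes, and the helper names, are invented for illustration, and the overlap that real SPARC-style windows use for passing arguments (as well as the spilling that occurs when the file wraps around) is ignored.

    /* Toy model of register windowing: a large physical register file of
     * which a procedure sees only a small window. A call slides the window
     * forward and a return slides it back, so nothing is saved to a stack.
     * Sizes and helper names are invented; real designs overlap adjacent
     * windows to pass arguments and spill to memory when the file wraps. */
    #include <stdio.h>

    #define REGFILE_SIZE 128
    #define WINDOW_SIZE    8

    static int regfile[REGFILE_SIZE];
    static int window_base = 0;              /* current window pointer */

    static int *reg(int r)  { return &regfile[window_base + r]; }
    static void call(void)  { window_base += WINDOW_SIZE; }
    static void ret(void)   { window_base -= WINDOW_SIZE; }

    static void callee(void)
    {
        call();              /* fresh r0..r7, nothing of the caller's is saved */
        *reg(0) = 99;        /* writes the callee's r0, not the caller's */
        ret();
    }

    int main(void)
    {
        *reg(0) = 7;         /* caller's r0 */
        callee();
        printf("caller's r0 is still %d\n", *reg(0));   /* prints 7 */
        return 0;
    }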
The RISC project delivered the RISC-I processor in 1982. Consisting of only 44,420 transistors (compared with
averages of about 100,000 in newer CISC designs of the era) RISC-I had only 32 instructions, and yet completely
outperformed any other single-chip design. They followed this up with the 40,760 transistor, 39 instruction RISC-II
in 1983, which ran over three times as fast as RISC-I.
At about the same time, John L. Hennessy started a similar project called MIPS at Stanford University in 1981.
MIPS focused almost entirely on the pipeline, making sure it could be run as "full" as possible. Although pipelining
was already in use in other designs, several features of the MIPS chip made its pipeline far faster. The most
important, and perhaps annoying, of these features was the demand that all instructions be able to complete in one
cycle. This demand allowed the pipeline to be run at much higher data rates (there was no need for induced delays)
and is responsible for much of the processor's performance. However, it also had the negative side effect of
eliminating many potentially useful instructions, like a multiply or a divide.
In the early years, the RISC efforts were well known, but largely confined to the university labs that had created
them. The Berkeley effort became so well known that it eventually became the name for the entire concept. Many in
the computer industry argued that the performance benefits were unlikely to translate into real-world settings due
to the decreased memory efficiency of multiple instructions, and that this was the reason no one was using them. But
starting in 1986, all of the RISC research projects started delivering products.
Later RISC
Berkeley's research was not directly commercialized, but the RISC-II design was used by Sun Microsystems to
develop the SPARC, by Pyramid Technology to develop their line of mid-range multi-processor machines, and by
almost every other company a few years later. It was Sun's use of a RISC chip in their new machines that
demonstrated that RISC's benefits were real, and their machines quickly outpaced the competition and essentially
took over the entire workstation market.
John Hennessy left Stanford (temporarily) to commercialize the MIPS design, starting the company known as MIPS
Computer Systems. Their first design was a second-generation MIPS chip known as the R2000. MIPS designs went
on to become one of the most used RISC chips when they were included in the PlayStation and Nintendo 64 game
consoles. Today they are one of the most common embedded processors in use for high-end applications.
IBM learned from the RT-PC failure and went on to design the RS/6000 based on their new POWER architecture.
They then moved their existing AS/400 systems to POWER chips, and found much to their surprise that even the
very complex instruction set ran considerably faster. POWER would also find itself moving "down" in scale to
produce the PowerPC design, which eliminated many of the "IBM only" instructions and created a single-chip
implementation. Today the PowerPC is one of the most commonly used CPUs for automotive applications (some
cars have more than 10 of them inside). It was also the CPU used in most Apple Macintosh machines from 1994 to
2006. (Starting in February 2006, Apple switched their main production line to Intel x86 processors.)
Almost all other vendors quickly joined. From the UK, similar research efforts resulted in the INMOS transputer, the
Acorn Archimedes and the Advanced RISC Machine line, which is a huge success today. Most mobile phones and
MP3 players use ARM processors. Companies with existing CISC designs also quickly joined the revolution. Intel
released the i860 and i960 by the late 1980s, although they were not very successful. Motorola built a new design
called the 88000 in homage to their famed CISC 68000, but it saw almost no use. The company eventually
abandoned it and joined IBM to produce the PowerPC. AMD released their 29000, which would go on to become the most popular RISC design of the early 1990s.
Further reading
Television
Computer Chronicles (1986). "RISC" (http://www.archive.org/details/RISC1986).
External links
RISC vs. CISC (http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/risccisc/)
What is RISC (http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/risc/whatis/)
RISC vs. CISC from historical perspective (http://www.cpushack.net/CPU/cpuAppendA.html)