Lec01 Intro

CENG 3420 is a course on Computer Organization & Design, taught by Bei Yu at CUHK, focusing on the major components of computer systems, CPU design, performance improvement techniques, and multiprocessor architecture. The grading structure includes attendance, homework, midterms, labs, and a final exam, with specific requirements for passing. The course aims to provide students with knowledge about computer architecture's impact on software design and performance optimization.


CENG 3420

Computer Organization & Design


Lecture 01: Introduction
Bei Yu
CSE Department, CUHK
byu@cse.cuhk.edu.hk

(Textbook: Chapters 1.3 & 1.4)

2025 Spring
Course Information

Course Administration

Instructor:
• Bei Yu (byu@cse.cuhk.edu.hk)
• Office: SHB 907
• Office Hrs: H14:30–16:30

Tutors:
• Mingjun Li (mjli23@cse.cuhk.edu.hk)
• Yuhao Ji (1155224012@link.cuhk.edu.hk)
• Fangzhou Liu (fzliu23@cse.cuhk.edu.hk)
• Yifan Shi (i.yifan@link.cuhk.edu.hk)
• Mengjia Dai (mjdai@link.cuhk.edu.hk)

Grading Information

Grade Determinants
5% Attendance
15% Homework
20% Midterm (Mar. 14)
20% Three Labs (individual projects)
40% Final Exam
• Late submissions incur a 10% penalty per day.
• A student must obtain at least 50% of the full marks to pass the course.
• A student must attend at least 80% of lectures to receive full attendance credit.

General References

Textbook:
• Computer Organization and Design, RISC-V Edition
• Soft copy, amazon.cn, or amazon.com

Manuals:
• RV32 Reference 1 and RV32 Reference 2 (on course webpage)
• Lab tutorials (slides)
Slides:
• On the course web page before lecture
• Summaries may be uploaded afterwards (on Piazza)

Course Content

• Introduction to the major components of a computer system and how they function together in executing a program
• Introduction to CPU datapath and control unit design
• Introduction to techniques to improve the performance and energy efficiency of computer systems
• Introduction to multiprocessor architecture

Philosophy
To learn what determines the capabilities and performance of computer systems and to
understand the interactions between the computer’s architecture and its software so that
future software designers (compiler writers, operating system designers, database
programmers, application programmers, ...) can achieve the best cost-performance
trade-offs and so that future architects understand the effects of their design choices on
software.

Why Learn This Stuff?

• You want to call yourself a “computer scientist/engineer”
• You want to build HW/SW people use (so performance and power matter)
• You need to make a purchasing decision or offer “expert” advice
Both hardware and software affect performance/power:
• The algorithm determines the number of source-level statements
• The language, compiler, and architecture determine the number of machine-level instructions
• The processor and memory determine how fast and how power-hungry the execution of machine-level instructions is

What You Should Already Know

• Basic logic design & machine organization
  • logical minimization, FSMs, component design
  • processor, memory, I/O
• Create, run, debug programs in an assembly language
  • Will be introduced in tutorial
• Create, compile, and run C/C++ programs
• Create, organize, and edit files and run programs on Unix/Linux

One example here!

Question
Which program will run faster?
• A: cache-test1.c
• B: cache-test2.c
• C: similar runtime

Computer Organization and Design

• This course is all about how computers work
• But what do we mean by a computer?
  • Different types: embedded, laptop, desktop, server
  • Different uses: automobiles, graphics, finance, genomics ...
  • Different manufacturers: Intel, Apple, IBM, Sony, Oracle ...
  • Different underlying technologies and different costs
• Analogy: Consider a course on “automotive vehicles”
  • Many similarities from vehicle to vehicle (e.g., wheels)
  • Huge differences from vehicle to vehicle (e.g., gas vs. electric)
• Best way to learn:
  • Focus on a specific instance and learn how it works
  • While learning general principles and historical perspectives

How Do the Pieces Fit Together?

Layered view, from top to bottom (related courses in parentheses):
• Applications
• Operating System, Compiler, Firmware (CSCI3150, CSCI3120)
• Instruction Set Architecture
• Memory system, Processor, I/O system, network (CENG4430)
• Datapath & Control (CENG2400 & CENG3420)
• Digital Design (ENGG2020)
• Circuit Design (CENG3470)

• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, measurement, and evaluation

A Bit of History

The Evolution of Computer Hardware

When was the first transistor invented?

(a) 1947, bipolar transistor, by John Bardeen et al. at Bell Laboratories; (b) UNIVAC I (Universal Automatic Computer): the first commercial computer in the USA.

The Evolution of Computer Hardware

When was the first IC (integrated circuit) invented?

(a) 1958, by Jack Kilby at Texas Instruments, built by hand: several transistors, resistors, and capacitors on a single substrate. (b) IBM System/360, 2 MHz, 128 KB–256 KB.

The Evolution of Computer Hardware

When was the first microprocessor introduced?

1971, Intel 4004.

The IC Manufacturing Process

Yield
Proportion of working dies per wafer

Check this: https://youtu.be/d9SWNLZvA8g?list=FLELqiXCJQW-jcijW8ZAbA8w


AMD Opteron X2 Wafer

300mm wafer, 117 chips, 90nm technology.

Integrated Circuit Cost

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area} / 2\right)^2}$$

Nonlinear relation to area and defect rate:
• Wafer cost and area are fixed
• Defect rate is determined by the manufacturing process
• Die area is determined by the architecture and circuit design

Impacts of Advancing Technology

Processor
• Logic capacity: increases about 30% per year
• Performance: 2× every 1.5 years

Memory
• DRAM capacity: 4× every 3 years, about 60% per year
• Memory speed: 1.5× every 10 years
• Cost per bit: decreases about 25% per year

Disk
• Capacity: increases about 60% per year

Moore’s Law for CPUs and DRAMs

From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.

Main driver: device scaling ...

From: “Facing the Hot Chips Challenge Again”, Bill Holt, Intel, presented at Hot Chips 17, 2005.

Technology Scaling Road Map (ITRS)

Year                                             2004  2006  2008  2010  2012
Feature size (nm)                                  90    65    45    32    22
Integration capacity (billions of transistors)      2     4     6    16    32

Fun facts about 45nm transistors
• 30 million can fit on the head of a pin
• You could fit more than 2,000 across the width of a human hair
• If car prices had fallen at the same rate as the price of a single transistor since 1968, a new car today would cost about 1 cent

Highest Clock Rate of Intel Processors


What if the exponential increase had kept up? Why not?

The increase was driven by:
• Process improvements
• Deeper pipelines
• Circuit design techniques

Power Issue

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

(Here we consider only dynamic power, not static power.)

Example
For a simple processor, if the capacitive load is reduced by 15% and the voltage is reduced by 15% while the frequency stays the same, by how much is power consumption reduced?
• A: 27.8%
• B: 38.6%
• C: 85.0%

A Sea Change Is at Hand

• The power challenge has forced a change in the design of microprocessors
  • Since 2002 the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year
  • As of 2006 all desktop and server companies are shipping microprocessors with multiple processors – cores – per chip
  • Plan of record is to add two cores per chip per generation (about every two years)

Product          AMD Barcelona  Intel Nehalem  IBM Power 6  Sun Niagara 2
Cores per chip   4              4              2            8
Clock rate       ~2.5 GHz       ~2.5 GHz       4.7 GHz      1.4 GHz
Power            120 W          ~100 W         ~100 W       94 W

Intel Core i7 Processor

45 nm technology, 18.9 mm × 13.6 mm, 0.73 billion transistors, 2008

A Computer

Desktop computers
Designed to deliver good performance to a single user at low cost, usually executing third-party software and usually incorporating a graphics display, a keyboard, and a mouse

Other Classes of Computers

Servers
Used to run larger programs for multiple simultaneous users; typically accessed only via a network, with a greater emphasis on dependability and (often) security

Supercomputers
A high-performance, high-cost class of servers with hundreds to thousands of processors, terabytes of memory, and petabytes of storage, used for high-end scientific and engineering applications

Embedded computers (processors)
A computer inside another device, used for running one predetermined application

Supercomputers

Tianhe-2 (MilkyWay-2)
• Over 3 million cores
• Power: 17.6 MW (24 MW with cooling)
• Speed: 33.86 PFLOPS (peta = $10^{15}$)

Embedded Computers in Your Car

PostPC Era

Personal Mobile Device (PMD)
Battery-operated device with wireless connectivity

Warehouse Scale Computer (WSC)
Datacenter containing hundreds of thousands of servers providing software as a service (SaaS)

Growth in Cell Phone Sales (Embedded)
• embedded growth >> desktop growth
• Where else are embedded processors found?

When Machine Learning Meets Hardware

The convolution layer is one of the most expensive layers
• Computation pattern
• Emerging challenges
More and more end-point devices with limited memory
• Cameras
• Smartphones
• Autonomous driving

Convolutional Neural Network (CNN)

Bottleneck of CNN

Apple M1 Processor

• 8-core CPU
• 8-core GPU
• 16-core Neural Engine
