CPU vs GPU: Performance Analysis and Terminology
Group Members
Maasma Zari 2022-CS-504
Zameer ul Hassan 2022-CS-540
Haris Khan 2022-CS-556
What is a CPU?
•CPU (Central Processing Unit) is known as the brain of the computer.
•Optimized for sequential (serial) processing.
•Excellent at handling complex instructions and a few tasks at high speed.
Example Tasks:
•Running operating systems
•Managing input/output operations
•Handling background processes
Architecture of CPU
•Few cores (typically 4 to 32); the sketch below shows how to query the count.
•Each core is very powerful and complex.
•Large cache memory.
•Focus on minimizing latency (responding quickly).
Diagram: (Simple core architecture illustration)
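As a rough illustration of the "few, powerful cores" point, a minimal sketch (not from the original slides) in standard C++ reports how many hardware threads the CPU exposes:

#include <iostream>
#include <thread>

int main() {
    // hardware_concurrency() returns the number of concurrent hardware
    // threads supported, or 0 if the value cannot be determined.
    unsigned int threads = std::thread::hardware_concurrency();
    std::cout << "Hardware threads available: " << threads << std::endl;
    return 0;
}

On a typical desktop CPU this prints a small number (for example 8 or 16), in contrast to the thousands of threads a GPU can keep in flight.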
What is a GPU?
•GPU (Graphics Processing Unit) is designed for massively parallel tasks.
•Originally built for rendering graphics.
•Now used in AI, scientific simulations, video editing, etc.
•Can handle thousands of simple tasks at the same time.
Architecture of GPU
. •Hundreds to thousands of simple cores.
•Smaller cache memory per core.
•High throughput (focus on executing
many operations at once).
•Ideal for Data Parallelism.
Diagram: (Grid of many small cores)
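As a companion sketch (assuming a CUDA-capable GPU and the CUDA runtime; not part of the original slides), the CUDA API can report the properties behind this throughput-oriented design:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0 (the first GPU in the system).
    cudaGetDeviceProperties(&prop, 0);
    printf("Device name: %s\n", prop.name);
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
    printf("Memory bus width: %d bits\n", prop.memoryBusWidth);
    return 0;
}

Each streaming multiprocessor contains many simple cores, which is where the "hundreds to thousands of cores" figure comes from.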
Code:
#include <iostream>
#include <chrono>
#include <cuda_runtime.h>

// GPU kernel: each thread adds one pair of elements.
__global__ void addKernelGPU(int *c, const int *a, const int *b, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// CPU version: a single thread loops over all elements.
void addCPU(int *c, const int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 1 << 20;              // ~1 million elements
    size_t size = N * sizeof(int);

    // Host (CPU) allocations and input data
    int *a = new int[N];
    int *b = new int[N];
    int *c_cpu = new int[N];
    int *c_gpu = new int[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * 2;
    }

    // Device (GPU) allocations and host-to-device transfers
    int *dev_a = nullptr;
    int *dev_b = nullptr;
    int *dev_c = nullptr;
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);
    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // CPU execution
    auto start_cpu = std::chrono::high_resolution_clock::now();
    addCPU(c_cpu, a, b, N);
    auto end_cpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> cpu_duration = end_cpu - start_cpu;
    std::cout << "CPU Time: " << cpu_duration.count() << " seconds" << std::endl;

    // GPU execution (the device-to-host copy also waits for the kernel to finish)
    auto start_gpu = std::chrono::high_resolution_clock::now();
    addKernelGPU<<<(N + 255) / 256, 256>>>(dev_c, dev_a, dev_b, N);
    cudaMemcpy(c_gpu, dev_c, size, cudaMemcpyDeviceToHost);
    auto end_gpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> gpu_duration = end_gpu - start_gpu;
    std::cout << "GPU Time: " << gpu_duration.count() << " seconds" << std::endl;

    // Cleanup
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    delete[] a;
    delete[] b;
    delete[] c_cpu;
    delete[] c_gpu;
    return 0;
}
Output:
CPU Time: 0.06 seconds
GPU Time: 0.002 seconds
Metric | Description
Peak Computational Performance (GFLOPS) | Measures the number of floating-point operations per second (in billions) that a system can achieve.
Memory Bandwidth (GB/sec) | Measures how fast data can be moved between memory and the processor.
Efficiency Ratio | How effectively the processor reads data from and writes data to memory.
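To connect these metrics to the vector-add example above, a minimal sketch (illustrative only, using the measured time from the sample output) converts a measured time into GFLOPS and effective memory bandwidth:

#include <iostream>

int main() {
    const double n = 1 << 20;                 // elements processed (from the example)
    const double t = 0.002;                   // measured GPU time in seconds (from the sample output)
    double flops = n;                         // one addition per element
    double bytes = 3.0 * n * sizeof(int);     // read a, read b, write c
    std::cout << "GFLOPS: " << flops / t / 1e9 << std::endl;            // roughly 0.52
    std::cout << "Bandwidth (GB/s): " << bytes / t / 1e9 << std::endl;  // roughly 6.3
    return 0;
}

Vector addition does very little arithmetic per byte moved, so its performance is limited by memory bandwidth rather than by peak GFLOPS.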
Term | Meaning
Host | The CPU and its memory
Device | The GPU and its memory
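A minimal sketch (assuming the CUDA runtime; the names h_data and d_data are illustrative) of the host/device split in practice: h_data lives in host (CPU) memory, d_data lives in device (GPU) memory, and cudaMemcpy moves data between them.

#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int n = 256;
    size_t bytes = n * sizeof(float);

    float *h_data = (float*)malloc(bytes);      // host allocation (CPU memory)
    for (int i = 0; i < n; ++i) h_data[i] = i;  // initialize on the host

    float *d_data = nullptr;
    cudaMalloc((void**)&d_data, bytes);         // device allocation (GPU memory)

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice); // host -> device
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost); // device -> host

    cudaFree(d_data);
    free(h_data);
    return 0;
}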