CPU vs GPU: Performance Analysis and Terminology
Group Members
Maasma Zari 2022-CS-504
Zameer ul Hassan 2022-CS-540
Haris Khan 2022-CS-556
What is a CPU?
•CPU (Central Processing Unit) is known as the brain of the computer.
•Optimized for sequential (serial) processing.
•Excellent at handling complex instructions and a few tasks at high speed.
Example Tasks:
•Running operating systems
•Managing input/output operations
•Handling background processes
Architecture of CPU
•Few cores (typically 4 to 32); the sketch below shows how to query the count.
•Each core is very powerful and complex.
•Large cache memory.
•Focus on minimizing latency (responding quickly).
Diagram: (Simple core architecture illustration)
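As a rough illustration of the "few, powerful cores" point, a minimal sketch (not from the original slides) in standard C++ reports how many hardware threads the CPU exposes:

#include <iostream>
#include <thread>

int main() {
    // hardware_concurrency() returns the number of concurrent hardware
    // threads supported, or 0 if the value cannot be determined.
    unsigned int threads = std::thread::hardware_concurrency();
    std::cout << "Hardware threads available: " << threads << std::endl;
    return 0;
}

On a typical desktop CPU this prints a small number (for example 8 or 16), in contrast to the thousands of threads a GPU can keep in flight.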
What is a GPU?
•GPU (Graphics Processing Unit) is designed for massively parallel tasks.
•Originally built for rendering graphics.
•Now used in AI, scientific simulations, video editing, etc.
•Can handle thousands of simple tasks at the same time.
Architecture of GPU
. •Hundreds to thousands of simple cores.
•Smaller cache memory per core.
•High throughput (focus on executing
many operations at once).
•Ideal for Data Parallelism.
Diagram: (Grid of many small cores)
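As a companion sketch (assuming a CUDA-capable GPU and the CUDA runtime; not part of the original slides), the CUDA API can report the properties behind this throughput-oriented design:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Query the properties of device 0 (the first GPU in the system).
    cudaGetDeviceProperties(&prop, 0);
    printf("Device name: %s\n", prop.name);
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
    printf("Memory bus width: %d bits\n", prop.memoryBusWidth);
    return 0;
}

Each streaming multiprocessor contains many simple cores, which is where the "hundreds to thousands of cores" figure comes from.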
Code:
#include <iostream>
#include <chrono>
#include <cuda_runtime.h>

// GPU kernel: each thread adds one pair of elements.
__global__ void addKernelGPU(int *c, const int *a, const int *b, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// CPU version: a single thread loops over all elements.
void addCPU(int *c, const int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int N = 1 << 20;              // ~1 million elements
    size_t size = N * sizeof(int);

    // Host (CPU) allocations and input data
    int *a = new int[N];
    int *b = new int[N];
    int *c_cpu = new int[N];
    int *c_gpu = new int[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = i * 2;
    }

    // Device (GPU) allocations and host-to-device transfers
    int *dev_a = nullptr;
    int *dev_b = nullptr;
    int *dev_c = nullptr;
    cudaMalloc((void**)&dev_a, size);
    cudaMalloc((void**)&dev_b, size);
    cudaMalloc((void**)&dev_c, size);
    cudaMemcpy(dev_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size, cudaMemcpyHostToDevice);

    // CPU execution
    auto start_cpu = std::chrono::high_resolution_clock::now();
    addCPU(c_cpu, a, b, N);
    auto end_cpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> cpu_duration = end_cpu - start_cpu;
    std::cout << "CPU Time: " << cpu_duration.count() << " seconds" << std::endl;

    // GPU execution (the device-to-host copy also waits for the kernel to finish)
    auto start_gpu = std::chrono::high_resolution_clock::now();
    addKernelGPU<<<(N + 255) / 256, 256>>>(dev_c, dev_a, dev_b, N);
    cudaMemcpy(c_gpu, dev_c, size, cudaMemcpyDeviceToHost);
    auto end_gpu = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> gpu_duration = end_gpu - start_gpu;
    std::cout << "GPU Time: " << gpu_duration.count() << " seconds" << std::endl;

    // Cleanup
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    delete[] a;
    delete[] b;
    delete[] c_cpu;
    delete[] c_gpu;
    return 0;
}
Output:
CPU Time: 0.06 seconds
GPU Time: 0.002 seconds
Metric | Description
Peak Computational Performance (GFLOPS) | Measures the number of floating-point operations per second (in billions) that a system can achieve.
Memory Bandwidth (GB/sec) | Measures how fast data can be moved between memory and the processor.
Efficiency Ratio | How effectively the processor reads data from and writes data to memory.
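To connect these metrics to the vector-add example above, a minimal sketch (illustrative only, using the measured time from the sample output) converts a measured time into GFLOPS and effective memory bandwidth:

#include <iostream>

int main() {
    const double n = 1 << 20;                 // elements processed (from the example)
    const double t = 0.002;                   // measured GPU time in seconds (from the sample output)
    double flops = n;                         // one addition per element
    double bytes = 3.0 * n * sizeof(int);     // read a, read b, write c
    std::cout << "GFLOPS: " << flops / t / 1e9 << std::endl;            // roughly 0.52
    std::cout << "Bandwidth (GB/s): " << bytes / t / 1e9 << std::endl;  // roughly 6.3
    return 0;
}

Vector addition does very little arithmetic per byte moved, so its performance is limited by memory bandwidth rather than by peak GFLOPS.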
Term | Meaning
Host | The CPU and its memory
Device | The GPU and its memory
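A minimal sketch (assuming the CUDA runtime; the names h_data and d_data are illustrative) of the host/device split in practice: h_data lives in host (CPU) memory, d_data lives in device (GPU) memory, and cudaMemcpy moves data between them.

#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int n = 256;
    size_t bytes = n * sizeof(float);

    float *h_data = (float*)malloc(bytes);      // host allocation (CPU memory)
    for (int i = 0; i < n; ++i) h_data[i] = i;  // initialize on the host

    float *d_data = nullptr;
    cudaMalloc((void**)&d_data, bytes);         // device allocation (GPU memory)

    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice); // host -> device
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost); // device -> host

    cudaFree(d_data);
    free(h_data);
    return 0;
}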