GPU Programming
Course Code: CSGG3018
Instructor: AMIT GURUNG
Email: amit.gurung@ddn.upes.ac.in
Jan – May, 2025
Course overview
Course Code: CSGG3018 | Course Name: GPU Programming | L-T-P-C: 3-0-0-3
Total Units to be Covered: 6 | Total Contact Hours: 34
Prerequisite(s): Basics of Programming; basic knowledge of Computer Architecture
Course Objectives
• The objective of this course is to provide deep knowledge of parallel programming with GPU architectures and APIs, along with their practical applications.
Course Outcomes
On completion of this course, the students will be able to
CO1: Describe GPU computer architecture, GPU programming environments, and data parallelism.
CO2: Explore the data-parallel execution model and CUDA memories.
CO3: Elaborate on data-parallelism concepts in OpenCL & OpenACC and compare OpenACC & CUDA.
CO4: Illustrate programs that solve problems and execute them on the GPU.
Recommendations
Textbooks
1. IBM ICE Publications
Modes of Evaluation
Quiz/Assignment/ presentation/ extempore/ Written
Examination
• Examination Scheme
Components: IA (Internal Assessment) | Mid Sem | End Sem | Total
Weightage (%): 30 | 20 | 50 | 100
What is Concurrent
Programming?
Concurrent programming is the practice of executing multiple tasks or processes at overlapping times, so that they appear to run simultaneously.
• Key Concepts:
• Tasks can interact or run independently.
• Focuses on tasks appearing to run at the same time (not on how the hardware is used).
• Applications:
• Real-time systems (e.g., operating systems).
• Gaming and multimedia.
• Data processing.
Concurrency vs. Parallelism
Concurrency: Tasks make progress simultaneously; may involve task switching on a single core. Example: multi-threading.
Parallelism: Tasks execute simultaneously; requires multiple processors/cores. Example: GPU computations.
Why is Concurrency Important?
1. Performance:
• Allows better utilization of resources.
2. Responsiveness:
• Keeps systems responsive during long-running tasks (thanks to task switching).
3. Scalability:
• Handles large datasets and complex computations efficiently.
4. Modern Hardware:
• Exploits multi-core CPUs and GPUs for better performance.
Real-World Examples of Concurrency
• Web Servers:
• Handle multiple client requests simultaneously.
• Video Games:
• Render graphics, play audio, and handle user input
concurrently.
• Data Analytics:
• Process data streams in real time.
• Autonomous Systems:
• Sensor data processing and decision-making in
parallel.
Key Concepts in Concurrency
• Threads:
• Lightweight processes that run independently and can
share resources.
• Synchronization:
• Mechanisms to ensure threads access shared data safely (see the sketch below).
• Example: Locks prevent simultaneous access (only a single thread at a time).
• Example: Semaphores control thread access to resources (admitting a bounded number of threads).
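As a concrete illustration of locking, here is a minimal host-side C++ sketch (an illustrative example, not taken from the course material): four threads increment a shared counter, and a mutex makes the read-modify-write safe.

#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int counter = 0;        // shared data
std::mutex m;           // the lock protecting it

void work() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> guard(m);  // only one thread may hold the lock
        ++counter;                             // protected critical section
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(work);
    for (auto &t : threads) t.join();
    printf("counter = %d\n", counter);         // always 400000 with the lock
    return 0;
}

Removing the lock_guard reintroduces the race condition described on the next slide; a counting semaphore (e.g., C++20 std::counting_semaphore) would instead admit a bounded number of threads at once.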
Key Concepts in Concurrency
• Deadlocks:
• Happens when two or more tasks wait for each other indefinitely.
• Example: Task A locks Resource 1 and waits for Resource 2, while
Task B locks Resource 2 and waits for Resource 1.
• Prevention: Use a consistent order for locking resources or implement
timeout mechanisms.
• Race Conditions:
• Occurs when multiple threads modify shared data without proper
synchronization, leading to unpredictable outcomes.
• Example: Two threads incrementing a shared counter simultaneously.
• Solution: Use atomic operations or locks to synchronize access, as in the CUDA sketch below.
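The same race, on the GPU: in this minimal CUDA sketch (illustrative, not from the slides), 65,536 threads increment one counter. The plain read-modify-write loses updates; atomicAdd serializes them correctly.

#include <cstdio>

__global__ void racyInc(int *counter) { *counter = *counter + 1; }  // data race
__global__ void safeInc(int *counter) { atomicAdd(counter, 1); }    // atomic fix

int main() {
    int *d, h;
    cudaMalloc(&d, sizeof(int));

    cudaMemset(d, 0, sizeof(int));
    racyInc<<<256, 256>>>(d);                       // 256 blocks x 256 threads
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    printf("racy:   %d (expected 65536, usually far less)\n", h);

    cudaMemset(d, 0, sizeof(int));
    safeInc<<<256, 256>>>(d);
    cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    printf("atomic: %d\n", h);                      // always 65536

    cudaFree(d);
    return 0;
}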
Concurrency in GPU Programming
• GPUs are designed for:
• Massive parallelism.
• Executing thousands of threads
simultaneously.
• Why GPUs need concurrency:
• Efficiently handle compute-intensive tasks.
• Parallelize tasks like matrix operations, image processing, etc. (see the kernel sketch below).
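A minimal sketch of what "thousands of threads" looks like in CUDA (kernel and launch names are illustrative): one thread is created per array element, and a single launch spawns as many threads as there are elements.

__global__ void scale(float *a, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n) a[i] *= s;                           // each thread handles one element
}

// Launch example: ceil(n/256) blocks of 256 threads run concurrently.
// scale<<<(n + 255) / 256, 256>>>(d_a, 2.0f, n);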
Tools and Frameworks
1. CPU-based Concurrency:
• POSIX Threads (Pthreads, from the Portable Operating System Interface standard):
• Provides a standard API for creating and managing threads.
• Offers flexibility but requires careful handling of synchronization.
• OpenMP:
• Simplifies parallel programming with compiler directives.
• Ideal for loop-level parallelism in shared-memory systems (see the sketch below).
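For comparison, loop-level parallelism in OpenMP really is a single directive. A minimal C++ sketch (illustrative; compile with, e.g., g++ -fopenmp):

#include <cstdio>
#include <omp.h>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<double> a(n);

    #pragma omp parallel for        // iterations are divided among the cores
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * i;             // independent iterations: safe to parallelize

    printf("ran with up to %d threads\n", omp_get_max_threads());
    return 0;
}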
Tools and Frameworks
2. GPU-based Concurrency:
• CUDA (Compute Unified Device Architecture):
• NVIDIA's framework for parallel programming on GPUs (see the end-to-end example below).
• OpenCL (Open Computing Language):
• A portable framework for programming across GPUs and
other accelerators.
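To make the CUDA workflow concrete, here is a minimal end-to-end vector addition (an illustrative sketch): allocate device memory, copy inputs over, launch the kernel, and copy the result back.

#include <cstdio>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];               // one element per thread
}

int main() {
    const int n = 1024;
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // 1. Allocate device memory.
    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));

    // 2. Copy inputs host -> device.
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    // 3. Launch the kernel.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    // 4. Copy the result device -> host.
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[10] = %.1f\n", hc[10]);            // expect 30.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}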
Summary on Concurrent Programming
• Concurrency enables multitasking and efficient use of
resources.
• Key concepts include threads, synchronization, deadlocks,
and race conditions.
• GPUs exploit concurrency to achieve massive parallelism.
• Understanding concurrency is essential for GPU
programming.
Parallel Programming
Parallel Programming
• Parallel programming allows multiple computations to run
simultaneously, improving speed and efficiency.
• Applications include scientific simulations, data analytics, and
machine learning.
• Key Concepts
• Task Parallelism: Dividing the problem into independent tasks processed concurrently (see the streams sketch below).
• Data Parallelism: Processing large datasets by distributing data across cores.
• Synchronization: Managing dependencies between tasks.
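On a GPU, task parallelism can be expressed with CUDA streams: independent kernels placed on different streams may execute concurrently. A minimal sketch (kernel names and sizes are illustrative):

// Two independent tasks (kernels) placed on separate CUDA streams so the
// GPU is free to overlap them.
__global__ void taskA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;            // task A: scale its own array
}
__global__ void taskB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;            // task B: offset a different array
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Independent tasks on different streams may run concurrently.
    taskA<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
    taskB<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

    cudaDeviceSynchronize();            // synchronization: wait for both tasks
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}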
Parallel programming workflow: e.g., the master-worker model (Figures: Model 1 and Model 2).
Parallel Programming
Parallel Architectures
• Classifications of parallel programming models can be divided broadly into two areas:
• Process interaction: shared memory (e.g., multicore CPUs) vs. distributed memory (e.g., clusters).
• Problem decomposition: data parallelism vs. task parallelism.
Parallel Programming
Process interaction
• Process interaction relates to the mechanisms by
which parallel processes are able to
communicate with each other.
• The most common forms of interaction are
shared memory and message passing, but
interaction can also be implicit (invisible to the
programmer).
Parallel Programming
Shared Memory
• Shared memory is an efficient means of passing data
between processes.
• Parallel processes share a global address space that
they read and write to asynchronously.
• Asynchronous concurrent access can lead to race conditions.
• Solution: mechanisms such as locks, semaphores, and monitors can be used to avoid these. CUDA's block-level shared memory follows the same model, as sketched below.
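CUDA's block-level __shared__ memory is a small-scale instance of this model: threads in a block read and write a common buffer, and the __syncthreads() barrier provides the synchronization that prevents races. A minimal reduction sketch (assumes 256-thread blocks, a power of two; names are illustrative):

__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[256];               // one buffer shared by the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                         // all writes visible before any read

    // Tree reduction inside the block; each step needs a barrier,
    // otherwise threads would race on buf.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = buf[0];            // one partial sum per block
}
// Launch with 256-thread blocks: blockSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);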
Parallel Programming
Message Passing
• In a message-passing model, parallel processes exchange data by passing messages to one another (see the MPI sketch below).
• It can be asynchronous, where a message
can be sent before the receiver is ready, or
synchronous, where the receiver must be
ready.
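MPI (introduced on the next slide) is the standard realization of this model. A minimal sketch in C/C++ (illustrative; run with, e.g., mpirun -np 2): rank 0 sends one integer, and rank 1's receive blocks until the message arrives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 // sender
        int data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {          // receiver: blocks until the message arrives
        int data;
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}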
Parallel Programming
• Programming Models
• Thread-based (e.g., Pthreads, OpenMP).
• Message Passing (e.g., MPI, the Message Passing Interface for distributed memory): a standard interface for communication between processes in a parallel computing environment.
• Data-Parallel (e.g., CUDA, OpenCL).
• Libraries: BLAS, TensorFlow (support for parallelism).
History of Graphics Processors
Assignment uploaded in LMS
Graphics Processing Units (GPUs)
• Architecture
• Specialized for data-parallel tasks like matrix
multiplication, making them ideal for graphics and
deep learning.
• Thousands of smaller cores compared to CPUs.
• Applications Beyond Graphics
• AI/ML training, high-performance computing (HPC),
cryptocurrency mining.
General-Purpose GPUs (GPGPUs)
• Definition
• GPUs used for tasks beyond graphics, such as
scientific computing and simulations.
• Programming GPGPUs
• CUDA (NVIDIA), OpenCL (not specific to any vendor).
• Libraries: cuDNN for deep learning, Thrust for
parallel programming.
Comparison: CPU vs GPU
(Figure: simplified block diagrams of a CPU and a GPU.)
Comparison: CPU vs GPU
• CPU Characteristics
• Few powerful cores.
• Optimized for sequential processing.
• Suitable for task-switching and latency-sensitive tasks.
• GPU Characteristics
• Thousands of smaller cores.
• Designed for parallelism and throughput.
• Best for tasks like image rendering, simulations, and neural
network training.
• Example Use Cases
• CPUs: Operating systems, databases.
• GPUs: Graphics, AI model training.
Heterogeneous Computing
• Definition
• Combining CPUs, GPUs, and other processors for
optimal workload distribution.
• Examples
• Hybrid systems like NVIDIA DGX or Intel Xeon with
integrated GPUs.
• Benefits include higher efficiency, cost-effectiveness, and
power savings.
• NVIDIA DGX (Deep GPU Xceleration) is a series of servers and workstations designed by NVIDIA, geared primarily towards deep learning applications that use general-purpose computing on GPUs (GPGPU).
Programming GPUs using
CUDA/OpenCL/OpenACC
• CUDA
• Proprietary to NVIDIA GPUs.
• Features: Kernels, shared memory, warp-based parallelism.
• OpenCL
• Open standard supporting CPUs, GPUs, FPGAs.
• Portable, but typically less optimized than CUDA on NVIDIA hardware.
• OpenACC
• High-level directives for parallelism (see the sketch below).
• Ideal for researchers who need quick results without diving into low-level programming.
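An OpenACC sketch of the directive style (illustrative; compile with, e.g., nvc++ -acc): a single pragma asks the compiler to offload the loop and manage the data movement, with no explicit kernels or memory copies.

#include <cstdio>

int main() {
    const int n = 1 << 20;
    static float a[1 << 20], b[1 << 20];
    for (int i = 0; i < n; ++i) a[i] = (float)i;

    // One directive: the compiler generates the GPU kernel and the
    // host-device data movement.
    #pragma acc parallel loop copyin(a) copyout(b)
    for (int i = 0; i < n; ++i)
        b[i] = 2.0f * a[i];

    printf("b[10] = %f\n", b[10]);   // expect 20.0
    return 0;
}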