GPUProgramming Talk

Modern GPUs are present in many computing systems and can be used to accelerate a wide range of applications. GPUs consist of hundreds or thousands of smaller cores designed for parallel processing, compared to CPUs which have fewer but larger cores optimized for serial work. Programming frameworks like CUDA, OpenACC, and libraries like cuBLAS and cuFFT allow developers to leverage the parallel capabilities of GPUs to achieve significant speedups for applications that are highly parallel and computationally intensive.

Uploaded by

Ramu

GPU Programming

using BU Shared Computing Cluster

Scientific Computing and Visualization

Boston University

• GPU – graphics processing unit

• Originally designed as a graphics processor

• Nvidia's GeForce 256 (1999) – first GPU

o single-chip processor for mathematically-intensive tasks

o transforms of vertices and polygons
o lighting
o polygon clipping
o texture mapping
o polygon rendering

Modern GPUs are present in:

• Embedded systems
• Personal computers
• Game consoles
• Mobile phones
• Workstations
Traditional GPU workflow

Vertex processing → Triangles, Lines, Points → Shading, Texturing → Blending, Z-buffering
GPGPU

1999-2000: computer scientists from various fields started using GPUs to accelerate a range of scientific applications.

GPU programming required the use of graphics APIs such as OpenGL and Cg.

2002: James Fung (University of Toronto) developed OpenVIDIA.

NVIDIA invested heavily in the GPGPU movement and offered a number of options and libraries for a seamless experience for C, C++ and Fortran programmers.
GPGPU timeline

In November 2006 Nvidia launched CUDA, an API for coding algorithms in the C programming language for execution on GeForce GPUs.

In 2008 the Khronos Group defined OpenCL, which is supported on AMD, Nvidia and ARM platforms.

In 2012 Nvidia presented and demonstrated OpenACC, a set of directives that greatly simplify parallel programming of heterogeneous systems.
CPU vs. GPU

CPUs consist of a few cores optimized for serial processing.

GPUs consist of hundreds or thousands of smaller, efficient cores designed for parallel performance.
SCC CPU vs. SCC GPU: peak performance

                          Intel Xeon X5650 (SCC CPU)    NVIDIA Tesla M2070 (SCC GPU)
Clock speed               2.66 GHz                      1.15 GHz (core clock)
Instructions per cycle    4                             1 (single instruction)
Cores                     6 CPU cores                   448 CUDA cores
Peak calculation          2.66 x 4 x 6                  1.15 x 1 x 448
Peak double precision     63.84 Gigaflops               515 Gigaflops
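The peak figures above come from multiplying clock speed, instructions per cycle, and core count. A minimal sketch of that arithmetic follows. It is plain host code, and the function name `peak_gflops` and its parameters are illustrative assumptions, not part of the talk.

```cuda
#include <cstdio>

// Theoretical peak (Gigaflops) = clock (GHz) x instructions per cycle x cores.
// Illustrative helper; the name and signature are assumptions, not from the slides.
double peak_gflops(double clock_ghz, int instr_per_cycle, int cores) {
    return clock_ghz * instr_per_cycle * cores;
}

int main() {
    double cpu = peak_gflops(2.66, 4, 6);    // Intel Xeon X5650 values from the slide
    double gpu = peak_gflops(1.15, 1, 448);  // NVIDIA Tesla M2070 values from the slide
    std::printf("CPU peak: %.2f Gigaflops\n", cpu);
    std::printf("GPU peak: %.2f Gigaflops\n", gpu);
    std::printf("GPU/CPU ratio: %.1f\n", gpu / cpu);
    return 0;
}
```

These are theoretical upper bounds; real applications typically reach only a fraction of them on either device.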
SCC CPU vs. SCC GPU: memory

               Intel Xeon X5650 (SCC CPU)    NVIDIA Tesla M2070 (SCC GPU)
Memory size    288 GB                        3 GB total
Bandwidth      32 GB/sec                     150 GB/sec
GPU Computing Growth

                      2008      2013      Growth
CUDA-capable GPUs     100M      430M      x 4.3
CUDA downloads        150K      1.6M      x 10.67
Supercomputers        1         50        x 50
Academic papers       4,000     37,000    x 9.25
GPU Acceleration of Applications

GPU-accelerated libraries: seamless linking to GPU-enabled libraries. Examples: cuFFT, cuBLAS, Thrust, NPP, IMSL, CULA, cuRAND, etc.

OpenACC directives: simple directives for easy GPU-acceleration of new and existing applications. Example: PGI Accelerator.

Programming languages: the most powerful and flexible way to design GPU-accelerated applications. Examples: C/C++, Fortran, Python, Java, etc.
GPU Accelerated Libraries

Thrust

• A powerful library of parallel algorithms and data structures
• Provides a flexible, high-level interface for GPU programming
• For example, the thrust::sort algorithm delivers 5x to 100x faster sorting performance than STL and TBB
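As an illustration of the high-level interface mentioned above, here is a minimal sketch of sorting on the GPU with thrust::sort. The helper name `sort_on_gpu` and the problem size are assumptions made for this example. It does not measure the speedup quoted on the slide.

```cuda
#include <cstdio>
#include <cstdlib>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>

// Hypothetical helper: sorts n pseudo-random integers on the GPU and returns
// true if the data copied back to the host is in ascending order.
bool sort_on_gpu(size_t n) {
    thrust::host_vector<int> h(n);
    for (size_t i = 0; i < n; ++i) h[i] = std::rand();

    thrust::device_vector<int> d = h;              // copy host -> device
    thrust::sort(d.begin(), d.end());              // sort on the device
    thrust::copy(d.begin(), d.end(), h.begin());   // copy device -> host

    for (size_t i = 1; i < n; ++i)
        if (h[i - 1] > h[i]) return false;
    return true;
}

int main() {
    // 1 << 20 elements is an arbitrary size chosen for the sketch.
    std::printf("%s\n", sort_on_gpu(1 << 20) ? "sorted" : "not sorted");
    return 0;
}
```

No explicit kernel, cudaMalloc, or cudaMemcpy call appears in the code; the containers and algorithms handle allocation and transfers.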
GPU Accelerated Libraries

cuBLAS

• A GPU-accelerated version of the complete standard BLAS library
• 6x to 17x faster performance than the latest MKL BLAS
• Complete support for all 152 standard BLAS routines
• Single, double, complex, and double complex data types
• Fortran binding

GPU Accelerated Libraries

cuSPARSE NPP cuFFT cuRAND

OpenACC Directives

• Simple compiler directives
• Works on multicore CPUs & many-core GPUs
• Future integration into OpenMP

Program myscience

   ... serial code ...              ! runs on the CPU

!$acc kernels                       ! compiler directive (kernels used here as an example)
   do k = 1,n1
      do i = 1,n2
         ... parallel code ...      ! runs on the GPU
      enddo
   enddo
!$acc end kernels                   ! end of the compiler directive region

End Program myscience

CUDA

Programming language extension to C/C++ and Fortran, designed for efficient general-purpose computation on GPUs.

__global__ void kernel(float* x, float* y, float* z, int n){
    // each thread computes one element of z
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if(idx < n) z[idx] = x[idx] * y[idx];
}

int main(){
    ...
    cudaMalloc(...);                              // allocate GPU memory
    cudaMemcpy(...);                              // copy input data to the GPU
    kernel <<<num_blocks, block_size>>> (...);    // launch the kernel on the GPU
    cudaMemcpy(...);                              // copy results back to the CPU
    cudaFree(...);                                // release GPU memory
    ...
}
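The outline above leaves the arguments elided. One possible way to fill it in is sketched below, under stated assumptions: the helper name `multiply_on_gpu`, the input values, the problem size, and the block size of 256 are choices made for this example, not taken from the talk.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Same kernel as on the slide: element-wise product, one thread per element.
__global__ void kernel(float* x, float* y, float* z, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) z[idx] = x[idx] * y[idx];
}

// Hypothetical helper (not from the slides): runs the kernel on n elements
// and returns true if every result equals the expected product.
bool multiply_on_gpu(int n) {
    const size_t bytes = n * sizeof(float);
    std::vector<float> hx(n, 2.0f), hy(n, 3.0f), hz(n, 0.0f);

    // Allocate device memory.
    float *dx = nullptr, *dy = nullptr, *dz = nullptr;
    cudaMalloc((void**)&dx, bytes);
    cudaMalloc((void**)&dy, bytes);
    cudaMalloc((void**)&dz, bytes);

    // Copy inputs host -> device.
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements (block size is an assumption).
    const int block_size = 256;
    const int num_blocks = (n + block_size - 1) / block_size;
    kernel<<<num_blocks, block_size>>>(dx, dy, dz, n);
    bool ok = (cudaGetLastError() == cudaSuccess);

    // Copy the result device -> host; this call waits for the kernel to finish.
    ok = ok && (cudaMemcpy(hz.data(), dz, bytes, cudaMemcpyDeviceToHost) == cudaSuccess);

    // Release device memory.
    cudaFree(dx);
    cudaFree(dy);
    cudaFree(dz);

    // Check the result on the host: 2.0f * 3.0f is exactly 6.0f.
    for (int i = 0; ok && i < n; ++i) ok = (hz[i] == 6.0f);
    return ok;
}

int main() {
    std::printf("%s\n", multiply_on_gpu(1 << 20) ? "PASSED" : "FAILED");
    return 0;
}
```

The file can be built with the nvcc compiler from the CUDA toolkit. The rounded-up block count is why the kernel needs the `idx < n` guard: the last block may contain threads past the end of the arrays.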
MATLAB with GPU-acceleration

Use GPUs with MATLAB through Parallel Computing Toolbox:

• GPU-enabled MATLAB functions such as fft, filter, and several linear algebra operations

• GPU-enabled functions in toolboxes: Communications System Toolbox, Neural Network Toolbox, Phased Array Systems Toolbox and Signal Processing Toolbox

• CUDA kernel integration in MATLAB applications, using only a single line of MATLAB code

CPU:                   GPU:
A=rand(2^16,1);        A=gpuArray(rand(2^16,1));
B=fft(A);              B=fft(A);
Will Execution on a GPU Accelerate My Application?

Computationally intensive: the time spent on computation significantly exceeds the time spent on transferring data to and from GPU memory.

Massively parallel: the computations can be broken down into hundreds or thousands of independent units of work.
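One way to check the first criterion for a given workload is to time the transfers and the kernel separately with CUDA events. The sketch below is only an illustration: the `scale` kernel, the `Timings` struct, the `time_scale` helper, and the problem size are hypothetical stand-ins for a real application.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical workload: a trivial element-wise scaling, used only for timing.
__global__ void scale(float* x, int n, float a) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) x[idx] = a * x[idx];
}

// Elapsed times in milliseconds (illustrative struct, not from the slides).
struct Timings { float h2d, kernel, d2h; };

Timings time_scale(int n) {
    const size_t bytes = n * sizeof(float);
    std::vector<float> h(n, 1.0f);
    float* d = nullptr;
    cudaMalloc((void**)&d, bytes);

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);
    cudaEventCreate(&t3);

    cudaEventRecord(t0, 0);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);   // transfer in
    cudaEventRecord(t1, 0);
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);              // computation
    cudaEventRecord(t2, 0);
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);   // transfer out
    cudaEventRecord(t3, 0);
    cudaEventSynchronize(t3);   // wait until all recorded work has finished

    Timings t;
    cudaEventElapsedTime(&t.h2d, t0, t1);
    cudaEventElapsedTime(&t.kernel, t1, t2);
    cudaEventElapsedTime(&t.d2h, t2, t3);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaEventDestroy(t3);
    cudaFree(d);
    return t;
}

int main() {
    Timings t = time_scale(1 << 24);   // problem size chosen arbitrarily
    std::printf("host->device: %.3f ms\n", t.h2d);
    std::printf("kernel:       %.3f ms\n", t.kernel);
    std::printf("device->host: %.3f ms\n", t.d2h);
    return 0;
}
```

For a kernel this simple, the two copies are likely to take longer than the computation, which is the situation the first criterion warns about. A workload that does much more arithmetic per byte transferred is a better candidate for the GPU.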
