GPU Computing with MATLAB
Andy Th
Product Marketing Manager
Image Processing Applications
© 2014 The MathWorks, Inc.
GPU Computing with MATLAB
For MATLAB Programmers
Accelerate MATLAB code with GPUs
Minimal code changes
For CUDA Programmers
Create test harnesses for your kernels
Quickly explore algorithm parameters
Analyze and visualize kernel results
Agenda
What is MATLAB?
Demo: Designing a Camera Pipeline in MATLAB
Demo: Brain Scan Demo CPU vs GPU
Demo: White Balance Example using CUDA Code
Summary
What is MATLAB?
High-level language and development environment for:
  Algorithm and application development
  Data analysis
  Mathematical modeling
  Multicore and GPU computing*
Extensive math, engineering, and plotting functionality
Add-on products for image and video processing, communications, signal processing, financial modeling, and more
Over 1.3 million commercial and educational users
* Requires Parallel Computing Toolbox
Algorithm Development Process
Requirements
Research & Design (Design, Test, Elaborate)
  Explore and discover
  Gain insight into problem
  Evaluate options, trade-offs
Implementation
  Migrate design to production
  Optimize performance
  Deploy / Integrate / Test
  Targets: .exe, .NET, .dll, CUDA, C/C++, Java, HDL
Test & Verification
Running MATLAB code on the GPU
200+ MATLAB functions that are GPU enabled
  Random number generation
  FFT
  Matrix multiplications
  Solvers
  Convolutions
  Min/max
  SVD
  Cholesky and LU factorization
Additional support in toolboxes
  Image Processing: morphological filtering, 2-D filtering
  Communications: Turbo, LDPC, and Viterbi decoders
  Signal Processing: cross correlation, FIR filtering
  Neural Networks: network training and simulation
Image Processing functions: bwmorph, bwlookup, corr2, edge, histeq, imadjust, imbothat, imclose, imdilate, imerode, imfilter, imgradient, imhist, imnoise, imopen, imresize, imrotate, imshow, imtophat, imwarp, mean2, medfilt2, padarray, rgb2gray
Ability to launch CUDA kernels
Ability to deploy MATLAB GPU applications
Demo: Digital Camera Pipeline
From Sensor Data to Image
[Figure: scene and the resulting image]
Stages of the Camera Pipeline
1. Noise Reduction: Reduce noise in the raw data
2. Demosaic: Interpolate the raw image into an RGB image
3. Tone Mapping: Convert sensor RGB values to RGB values
4. White Balance: Adjust color of image to compensate for different lighting conditions
5. Gamma Correction: Adjust color of image for display
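As a rough illustration of the last stage, gamma correction is often modelled as a simple power law. The sketch below assumes an 8-bit image and a display gamma of 2.2. Both values, the random test data, and the variable names are illustrative assumptions and are not taken from the demo.

```matlab
% Hypothetical 8-bit linear RGB image (random data stands in for sensor output).
linearImg = randi([0 255], 4, 6, 3, 'uint8');

displayGamma = 2.2;   % assumed display gamma, not specified in the slides

% Normalize to [0,1], apply the power law, and rescale to 8 bits.
normalized = double(linearImg) / 255;
corrected  = uint8(255 * normalized .^ (1 / displayGamma));
```

Because the exponent is below 1, mid-tones are brightened while black and white stay fixed.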
Approaches to GPU Computing in MATLAB
Approaches, ordered from greatest ease of use to greatest control:
GPU-enabled functions
Simple programming constructs: gpuArray, gather
Advanced programming constructs: arrayfun
Interface for experts: CUDAKernel, MEX support
  http://www.mathworks.com/help/distcomp/run-cuda-or-ptx-code-on-gpu.html
  http://www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html
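A minimal sketch of the first three approaches, assuming Parallel Computing Toolbox and a supported GPU are available. The input data and the clipping function are made up for illustration.

```matlab
A = rand(1000, 'single');          % hypothetical input data on the CPU

% Simple constructs: move data to the GPU with gpuArray.
G = gpuArray(A);

% GPU-enabled functions: fft runs on the GPU because its input is a gpuArray.
F = abs(fft(G));

% Advanced constructs: arrayfun applies an element-wise function on the GPU.
clipped = arrayfun(@(x) min(1.5 * x, 1), G);

% Bring results back to host memory with gather.
Fhost       = gather(F);
clippedHost = gather(clipped);
```

Keeping data on the GPU between steps, and calling gather only at the end, avoids repeated host-device transfers.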
Demo: Brain Scan
CPU vs GPU
Demonstration: White balance algorithm
Goal: Deliver algorithm as CUDA C/C++
[Figure: image before and after white balancing]
How can MATLAB support the CUDA development process?
Programming Parallel Applications (GPU)
Approaches, ordered from greatest ease of use to greatest control:
GPU-enabled functions
Simple programming constructs: gpuArray, gather
Advanced programming constructs: arrayfun
Interface for experts: CUDAKernel, MEX support
  http://www.mathworks.com/help/distcomp/run-cuda-or-ptx-code-on-gpu.html
  http://www.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html
Summary Part 1: Apply Scaling Factors on GPU

Code run entirely on CPU:

    function imageData = whitebalance(imageData)
    % WHITEBALANCE forces average image color to be gray.

    % Find the average values for each channel.
    avg_rgb = mean(mean(imageData));

    % Find average gray value and compute scaling factor.
    factors = max(mean(avg_rgb), 128)./avg_rgb;

    % Adjust the image to the new gray value.
    imageData(:,:,1) = uint8(imageData(:,:,1)*factors(1));
    imageData(:,:,2) = uint8(imageData(:,:,2)*factors(2));
    imageData(:,:,3) = uint8(imageData(:,:,3)*factors(3));

Applying scaling factors on GPU:

    function adjustedImage = whitebalance_gpu(imageData)
    ...
    % Load the kernel
    kernel = parallel.gpu.CUDAKernel( 'applyScaleFactors.ptxw64', ...
                                      'applyScaleFactors.cu' );
    ...
    % Copy our image to the GPU
    imageDataGPU = gpuArray( imageData );
    ...
    % Apply kernel to scale the color values
    adjustedImageGPU = feval( kernel, adjustedImageGPU, ...
                              imageDataGPU, factors, nRows, nCols );

Execution time on CPU: 120 ms
Execution time on GPU: 2.2 ms
About 50x faster
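Timings like those above can be gathered with timeit on the CPU and gputimeit on the GPU. The sketch below is only an approximation of the demo. It assumes Parallel Computing Toolbox, a supported GPU, and R2016b or later. It uses a random test image, and it runs the white balance arithmetic through GPU-enabled built-in functions rather than the applyScaleFactors kernel from the slide. Its numbers are therefore not comparable to the 120 ms and 2.2 ms figures.

```matlab
% Hypothetical 8-bit RGB test image; a real photograph would be used in practice.
img = randi([0 255], 1080, 1920, 3, 'uint8');

% Same arithmetic as the CPU whitebalance function, written so that it
% accepts both ordinary arrays and gpuArrays.
channelMeans = @(in) mean(mean(double(in), 1), 2);            % 1x1x3 averages
scaleFactors = @(in) max(mean(channelMeans(in), 3), 128) ./ channelMeans(in);
whiteBalance = @(in) uint8(double(in) .* scaleFactors(in));

% Time the CPU version.
tCpu = timeit(@() whiteBalance(img));

% Copy the image once, outside the timed region, then time the GPU version.
imgGpu = gpuArray(img);
tGpu = gputimeit(@() whiteBalance(imgGpu));

fprintf('CPU: %.1f ms, GPU: %.1f ms, speedup: %.1fx\n', ...
        1e3 * tCpu, 1e3 * tGpu, tCpu / tGpu);
```

gputimeit waits for the GPU to finish before stopping the clock, which plain tic/toc does not guarantee.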
Summary Part 2: Compute Scaling Factors on GPU

Code run entirely on CPU: the whitebalance function listed in Part 1.

Computing scaling factors on GPU:

    function imageData = whitebalance(imageData)
    % WHITEBALANCE forces average image color to be gray.
    ...
    % Compute the factors from the mean.
    computeFactorsKernel = parallel.gpu.CUDAKernel( ...
        'computeScaleFactors.ptxw64', 'computeScaleFactors.cu' );
    % 3 doubles of shared memory
    computeFactorsKernel.SharedMemorySize = 3*8;
    computeFactorsKernel.ThreadBlockSize = [3 1 1];
    factors = feval( computeFactorsKernel, avg_rgb
    ...
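When testing a kernel such as computeScaleFactors, it helps to have a CPU reference value to compare against. The worked example below applies the scaling-factor formula from the CPU code to made-up channel averages. The numbers are illustrative assumptions and do not come from the demo image.

```matlab
% Hypothetical per-channel averages (R, G, B); not taken from the demo image.
avg_rgb = [100 128 160];

% Average gray level, floored at 128 as in the CPU whitebalance code.
grayTarget = max(mean(avg_rgb), 128);    % mean is 129.33, so the floor is not hit

% Per-channel scaling factors.
factors = grayTarget ./ avg_rgb;         % approximately [1.2933 1.0104 0.8083]

% After scaling, every channel average equals the gray target.
scaledAvg = avg_rgb .* factors;
```

A kernel result gathered from the GPU can then be compared with factors to within a small tolerance.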
Summary Part 3: Compute Mean on GPU with NPP

Code run entirely on CPU: the whitebalance function listed in Part 1.

Implementing NPP through MEX:

    function adjustedImage = whitebalance_gpu(imageData)
    ...
    % Copy our image to the GPU
    imageDataGPU = gpuArray( imageData );
    ...
    %**************************************
    % Find the average values for each channel.
    avg_rgb = computeMeanMEX(imageDataGPU);
    %**************************************
    ...
    % Apply kernel to scale the color values
    adjustedImageGPU = feval( kernel, adjustedImageGPU, ...
                              imageDataGPU, factors, nRows, nCols );
Summary Part 4: Integrate C with CUDA

Code run entirely on CPU: the whitebalance function listed in Part 1.

Implementing C++ with CUDA kernels:

    // dllmain.cpp : Defines the entry point for the DLL application.
    #include <mex.h>
    #include <gpu/mxGPUArray.h>
    #include "whitebalance.h"

    // A function to compute the mean of all the elements
    // of a uint8 matrix on the GPU.
    // Returns a single scalar value on the GPU.
    void mexFunction ( const int nlhs, mxArray * plhs[],
                       const int nrhs,
                       const mxArray * const prhs[] )
    {
        // Ensure the GPU system is initialized.
        int mwGpuStat = 0;
        mwGpuStat = mxInitGPU();
        if ( mwGpuStat != 0 )
            mexErrMsgTxt( "Error initializing MW GPU system" );
        ...
MATLAB Value-add to CUDA Programmers
Develop prototype code to explore algorithms
Manage GPU data and launch kernels using a simple interface
Incrementally develop and test kernels
Analyze and visualize kernel results
[Diagram: the Algorithm Development Process shown earlier]
Scaling to Run on Multiple GPUs

Single GPU:

    N = 1000;          % Number of iterations
    A = gpuArray(A);   % Transfer data to GPU
    for ix = 1:N
        % Do the GPU-based calculation
        X = myGPUFunction(ix,A);
        % Gather data
        Xtotal(ix,:) = gather(X);
    end

Multiple GPUs:

    N = 1000;          % Number of iterations
    A = gpuArray(A);   % Transfer data
    parfor ix = 1:N
        % Do the GPU-based calculation
        X = myGPUFunction(ix,A);
        % Gather data
        Xtotal(ix,:) = gather(X);
    end
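A self-contained sketch of the multiple-GPU pattern, assuming Parallel Computing Toolbox and at least one supported GPU. It opens one worker per GPU and relies on MATLAB assigning a different GPU to each worker where several are available. The column sum is a made-up stand-in for myGPUFunction, and the sizes are arbitrary.

```matlab
nGpus = gpuDeviceCount;            % number of GPUs visible to MATLAB

% Start a pool with one worker per GPU unless a pool is already open.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(nGpus);
end

N = 8;                             % number of iterations (small for illustration)
A = rand(256, 'single');           % hypothetical input data on the host
Xtotal = zeros(N, 256, 'single');  % preallocate the sliced output

parfor ix = 1:N
    Agpu = gpuArray(A);            % each worker copies A to its own GPU
    X = sum(Agpu * ix, 1);         % stand-in for myGPUFunction(ix, A)
    Xtotal(ix, :) = gather(X);     % bring the row back to the host
end
```

Preallocating Xtotal and indexing it by the loop variable lets parfor treat it as a sliced output variable.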
GPU Computing with MATLAB
For MATLAB Programmers
Accelerate MATLAB code with GPUs
Minimal code changes
For CUDA Programmers
Create test harnesses for your kernels
Quickly explore algorithm parameters
Analyze and visualize kernel results
Questions & Answers