HASKELL ARRAYS, ACCELERATED
USING GPUS
Manuel M. T. Chakravarty
University of New South Wales
JOINT WORK WITH
Gabriele Keller
Sean Lee
Monday, 7 September 2009
General Purpose GPU
Programming (GPGPU)
Monday, 7 September 2009
MODERN GPUS ARE FREELY PROGRAMMABLE
But no function pointers & limited recursion
Monday, 7 September 2009
MODERN GPUS ARE FREELY PROGRAMMABLE
But no function pointers & limited recursion
Monday, 7 September 2009
Very Different
Programming Model
(Compared to multicore CPUs)
Monday, 7 September 2009
Quadcore Tesla T10
Xeon CPU GPU
or e
d s/c
re a
o f th
e ds
n dr
hu
MASSIVELY PARALLEL PROGRAMS NEEDED
Tens of thousands of dataparallel threads
Monday, 7 September 2009
Programming GPUs is
hard! Why bother?
Monday, 7 September 2009
Reduce power consumption!
✴ GPU achieves 20x better performance/Watt (judging by peak performance)
✴ Speedups between 20x to 150x have been observed in real applications
Monday, 7 September 2009
Sparse Matrix Vector Multiplication
1000
100
Time (milliseconds)
10
0.1
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Number of non-zero elements (million)
Plain Haskell, CPU only (AMD Sempron) Plain Haskell, CPU only (Intel Xeon)
Haskell EDSL on a GPU (GeForce 8800GTS) Haskell EDSL on a GPU (Tesla S1070 x1)
Prototype of code generator targeting GPUs
Computation only, without CPU ⇄ GPU transfer
Monday, 7 September 2009
Challenges
Code must be massively dataparallel
Control structures are limited
‣ No function pointers
‣ Very limited recursion
Software-managed cache, memory-access
patterns, etc.
Portability...
Monday, 7 September 2009
Tesla T10
GPU
OTHER COMPUTE ACCELERATOR ARCHITECTURES
Goal: portable data parallelism
Monday, 7 September 2009
Tesla T10
GPU
OTHER COMPUTE ACCELERATOR ARCHITECTURES
Goal: portable data parallelism
Monday, 7 September 2009
Tesla T10
GPU
OTHER COMPUTE ACCELERATOR ARCHITECTURES
Goal: portable data parallelism
Monday, 7 September 2009
Tesla T10
GPU
OTHER COMPUTE ACCELERATOR ARCHITECTURES
Goal: portable data parallelism
Monday, 7 September 2009
Data.Array.Accelerate
Collective operations on multi-dimensional regular
arrays
Embedded DSL
‣ Restricted control flow
‣ First-order GPU code
Generative approach based on combinator
templates
Multiple backends
Monday, 7 September 2009
Data.Array.Accelerate
e l i sm
tarall
a p
ve d a
Collective operations on multi-dimensional regular
a s si
m
arrays ✓
Embedded DSL
‣ Restricted control flow
‣ First-order GPU code
Generative approach based on combinator
templates
Multiple backends
Monday, 7 September 2009
Data.Array.Accelerate
e l i sm
t a p arall
ve d a
Collective operations on multi-dimensional regular
a s si
m
arrays ✓
Embedded DSL
u re s
‣ Restricted controloflow l st ruct
ntro
d c
imit e
l
✓ GPU code
‣ First-order
Generative approach based on combinator
templates
Multiple backends
Monday, 7 September 2009
Data.Array.Accelerate
e l i sm
t a p arall
ve d a
Collective operations on multi-dimensional regular
a s si
m
arrays ✓
Embedded DSL
u re s
‣ Restricted controloflow l st ruct
ntro
d c
imit e
l
✓ GPU code
‣ First-order e r ns
e s s patt
d ac c
Generative approach - tu based
n e on combinator
n d
templates ✓ ha
Multiple backends
Monday, 7 September 2009
Data.Array.Accelerate
e l i sm
t a p arall
ve d a
Collective operations on multi-dimensional regular
a s si
m
arrays ✓
Embedded DSL
u re s
‣ Restricted controloflow l st ruct
ntro d c
imit e
l
✓ GPU code
‣ First-order e r ns
e s s patt
d ac c
Generative approach - tu based
n e on combinator
n d
templates ✓ ha
b i li t y
Multiple backends
p o r ta
✓
Monday, 7 September 2009
import Data.Array.Accelerate
Dot product
dotp :: Vector Float -> Vector Float
-> Acc (Scalar Float)
dotp xs ys
= let
xs' = use xs
ys' = use ys
in
fold (+) 0 (zipWith (*) xs' ys')
Monday, 7 September 2009
import Data.Array.Accelerate
HaskellDot product
array
dotp :: Vector Float -> Vector Float
-> Acc (Scalar Float)
dotp xs ys
= let
xs' = use xs
ys' = use ys
in
fold (+) 0 (zipWith (*) xs' ys')
Monday, 7 September 2009
import Data.Array.Accelerate
HaskellDot product
array
dotp :: Vector Float -> Vector Float
-> Acc (Scalar Float)
dotp xs ys
EDSL array =
= let
desc. of array comps
xs' = use xs
ys' = use ys
in
fold (+) 0 (zipWith (*) xs' ys')
Monday, 7 September 2009
import Data.Array.Accelerate
HaskellDot product
array
dotp :: Vector Float -> Vector Float
-> Acc (Scalar Float)
dotp xs ys
EDSL array =
= let
desc. of array comps
xs' = use xs
ys' = use ys
in Lift Haskell arrays into
fold (+) 0 (zipWith (*) xs' ys') EDSL — may trigger
host➙device transfer
Monday, 7 September 2009
import Data.Array.Accelerate
HaskellDot product
array
dotp :: Vector Float -> Vector Float
-> Acc (Scalar Float)
dotp xs ys
EDSL array =
= let
desc. of array comps
xs' = use xs
ys' = use ys
in Lift Haskell arrays into
fold (+) 0 (zipWith (*) xs' ys') EDSL — may trigger
host➙device transfer
EDSL array
computations
Monday, 7 September 2009
import Data.Array.Accelerate
Sparse-matrix vector multiplication
type SparseVector a = Vector (Int, a)
type SparseMatrix a = (Segments, SparseVector a)
smvm :: Acc (SparseMatrix Float)
-> Acc (Vector Float)
-> Acc (Vector Float)
smvm (segd, smat) vec
= let
(inds, vals) = unzip smat
vecVals = backpermute (shape inds)
(\i -> inds!i) vec
products = zipWith (*) vecVals vals
in
foldSeg (+) 0 products segd
Monday, 7 September 2009
import Data.Array.Accelerate
[0, 0, 6.0, 0, 7.0] ≈ [(2, 6.0), (4, 7.0)]
Sparse-matrix vector multiplication
type SparseVector a = Vector (Int, a)
type SparseMatrix a = (Segments, SparseVector a)
smvm :: Acc (SparseMatrix Float)
-> Acc (Vector Float)
-> Acc (Vector Float)
smvm (segd, smat) vec
= let
(inds, vals) = unzip smat
vecVals = backpermute (shape inds)
(\i -> inds!i) vec
products = zipWith (*) vecVals vals
in
foldSeg (+) 0 products segd
Monday, 7 September 2009
import Data.Array.Accelerate
[0, 0, 6.0, 0, 7.0] ≈ [(2, 6.0), (4, 7.0)]
Sparse-matrix vector multiplication
type SparseVector a = Vector (Int, a)
type SparseMatrix a = (Segments, SparseVector a)
smvm :: Acc (SparseMatrix Float)
-> Acc (Vector Float)
-> Acc (Vector Float)[[10, 20], [], [30]] ≈ ([2, 0, 1], [10, 20, 30])
smvm (segd, smat) vec
= let
(inds, vals) = unzip smat
vecVals = backpermute (shape inds)
(\i -> inds!i) vec
products = zipWith (*) vecVals vals
in
foldSeg (+) 0 products segd
Monday, 7 September 2009
Architecture of
Data.Array.Accelerate
Monday, 7 September 2009
map (\x -> x + 1) arr
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Recover
sharing
(CSE or O
bserve)
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Recover t i o n
sharing m is a
(CSE or O Op t i
bserve) s io n )
(Fu
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Recover t i o n
sharing m is a
(CSE or O Op t i
bserve) s io n )
(Fu
Code generation
__global__ void kernel (float *arr, int n)
{...
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Recover t i o n
sharing m is a
(CSE or O Op t i
bserve) s io n )
(Fu
Code generation
__global__ void kernel (float *arr, int n)
{... nvcc
Monday, 7 September 2009
map (\x -> x + 1) arr
d e Br u ijn
f y & H OAS ->
R e i
Map (Lam (Add `PrimApp`
(ZeroIdx, Const 1))) arr
Recover t i o n
sharing m is a
(CSE or O Op t i
bserve) s io n ) e
(Fu c ka g
a
p gins
plu
Code generation
__global__ void kernel (float *arr, int n)
{... nvcc
Monday, 7 September 2009
The API of
Data.Array.Accelerate
(The main bits)
Monday, 7 September 2009
Array types
data Array dim e — regular, multi-dimensional arrays
type DIM0 = ()
type DIM1 = Int
type DIM2 = (Int, Int)
⟨and so on⟩
type Scalar e = Array DIM0 e
type Vector e = Array DIM1 e
Monday, 7 September 2009
Array types
data Array dim e — regular, multi-dimensional arrays
type DIM0 = ()
type DIM1 = Int
type DIM2 = (Int, Int)
⟨and so on⟩
type Scalar e = Array DIM0 e
type Vector e = Array DIM1 e
EDSL forms
data Exp e — scalar expression form
data Acc a — array expression form
Monday, 7 September 2009
Array types
data Array dim e — regular, multi-dimensional arrays
type DIM0 = ()
type DIM1 = Int
type DIM2 = (Int, Int)
⟨and so on⟩
type Scalar e = Array DIM0 e
type Vector e = Array DIM1 e
EDSL forms
data Exp e — scalar expression form
data Acc a — array expression form
Classes
class Elem e — scalar and tuples types
class Elem ix => Ix ix — unit and integer tuples
Monday, 7 September 2009
Scalar operations
instance Num (Exp e) — overloaded arithmetic
instance Integral (Exp e)
⟨and so on⟩
(==*), (/=*), (<*), (<=*), — comparisons
(>*), (>=*), min, max
(&&*), (||*), not — logical operators
(?) :: Elem t — conditional expression
=> Exp Bool -> (Exp t, Exp t) -> Exp t
(!) :: (Ix dim, Elem e) — scalar indexing
=> Acc (Array dim e) -> Exp dim -> Exp e
shape :: Ix dim — yield array shape
=> Acc (Array dim e) -> Exp dim
Monday, 7 September 2009
Collective array operations — creation
— use an array from Haskell land
use :: (Ix dim, Elem e)
=> Array dim e -> Acc (Array dim e)
— create a singleton
unit :: Elem e => Exp e -> Acc (Scalar e)
— multi-dimensional replication
replicate :: (SliceIx slix, Elem e)
=> Exp slix
-> Acc (Array (Slice slix) e)
-> Acc (Array (SliceDim slix) e)
— Example: replicate (All, 10, All) twoDimArr
Monday, 7 September 2009
Collective array operations — slicing
— slice extraction
slice :: (SliceIx slix, Elem e)
=> Acc (Array (SliceDim slix) e)
-> Exp slix
-> Acc (Array (Slice slix) e)
— Example: slice (5, All, 7) threeDimArr
Monday, 7 September 2009
Collective array operations — mapping
map :: (Ix dim, Elem a, Elem b)
=> (Exp a -> Exp b)
-> Acc (Array dim a)
-> Acc (Array dim b)
zipWith :: (Ix dim, Elem a, Elem b, Elem c)
=> (Exp a -> Exp b -> Exp c)
-> Acc (Array dim a)
-> Acc (Array dim b)
-> Acc (Array dim c)
Monday, 7 September 2009
Collective array operations — reductions
fold :: (Ix dim, Elem a)
=> (Exp a -> Exp a -> Exp a) — associative
-> Exp a
-> Acc (Array dim a)
-> Acc (Scalar a)
scan :: Elem a
=> (Exp a -> Exp a -> Exp a) — associative
-> Exp a
-> Acc (Vector a)
-> (Acc (Vector a), Acc (Scalar a))
Monday, 7 September 2009
Collective array operations — permutations
permute :: (Ix dim, Ix dim', Elem a)
=> (Exp a -> Exp a -> Exp a)
-> Acc (Array dim' a)
-> (Exp dim -> Exp dim')
-> Acc (Array dim a)
-> Acc (Array dim' a)
backpermute :: (Ix dim, Ix dim', Elem a)
=> Exp dim'
-> (Exp dim' -> Exp dim)
-> Acc (Array dim a)
-> Acc (Array dim' a)
Monday, 7 September 2009
Dense Matrix-Matrix Multiplication
1000000
100000
Time (milliseconds)
10000
1000
100
10
1
128 256 512 1024
size of square NxN matrix
Unboxed Haskell arrays Delayed arrays Plain C Mac OS vecLib
Regular arrays in package dph-seq
@ 3.06 GHz Core 2 Duo (with GHC 6.11)
Monday, 7 September 2009
Conclusion
EDSL for processing multi-dimensional arrays
Collective array operations (highly data parallel)
Support for multiple backends
Status:
‣ Very early version on Hackage (only interpreter)
http://hackage.haskell.org/package/accelerate
‣ Currently porting GPU backend over
‣ Looking for backend contributors
Monday, 7 September 2009