GPU Computing with Ruby



      SpeedGo Computing

         Chung Shin Yee
 shinyee@speedgocomputing.com
CPU vs GPU Architecture
        6 cores vs 1024 cores
6 GB/s vs 300 GB/s Memory Bandwidth




       Figure source: CUDA C Programming Guide
CUDA Programming Model
[Figure: CUDA programming model. Source: CUDA C Programming Guide]
Existing Programming Tools
●   Cg
●   BrookGPU
●   GLSL (OpenGL Shading Language)
●   Nvidia CUDA C/C++
●   OpenCL
●   PyCUDA

Where is the Red Ruby?
Bridging Ruby & CUDA C/C++
●   Ruby C extension
       –   Hard to manipulate Ruby objects in C.
       –   Compilation problems.
●   Ruby FFI
       –   Bridging purely in Ruby.
       –   Support multiple Ruby implementations.
Ruby Bridge Sample
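A minimal sketch of the bridging idea, not the sample from the original slide. It uses Fiddle, the libffi wrapper bundled with Ruby, as a stand-in for the ffi gem so that it runs without extra gems, and it binds libc's strlen purely as an illustration. The module name LibCBridge is hypothetical.

```ruby
require 'fiddle'

# Hypothetical module name; binds one C function without writing a C extension.
module LibCBridge
  # Look up symbols already loaded into the current process (libc included).
  HANDLE = Fiddle::Handle::DEFAULT

  # C prototype: size_t strlen(const char *s);
  STRLEN = Fiddle::Function.new(
    HANDLE['strlen'],
    [Fiddle::TYPE_VOIDP],
    Fiddle::TYPE_SIZE_T
  )

  def self.strlen(str)
    STRLEN.call(str)
  end
end

puts LibCBridge.strlen("vadd")
```

The same pattern (open a library, describe a function's argument and return types, call it from Ruby) is what lets a bridge reach the CUDA libraries without a compiled extension.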
Developing SGC Ruby CUDA
●   Object-oriented API.
●   Start with crucial operations.
       –   Memory allocation.
       –   Memory transfer.
       –   Kernel launch.
       –   Wrapper for structures.
●   Documented with YARD.
Driver vs Runtime API
●   CUDA Driver API
      –   For system developers.
      –   Supported by PyCUDA.
●   CUDA Runtime API
      –   For computation centric developers.


          We are going to support both APIs!
Using SGC Ruby CUDA
●   Kernel program in CUDA C.
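The kernel source itself is not reproduced on this slide. As a hedged sketch, a vector-add kernel matching the later launch parameters [da, db, dc, 10] could look like the CUDA C below, written out to vadd.cu from Ruby so it can be passed to nvcc. The parameter names and the bounds check are assumptions; extern "C" keeps the symbol unmangled so it can be looked up by the name "vadd".

```ruby
# Assumed kernel source; the signature mirrors params = [da, db, dc, 10].
VADD_SOURCE = <<~CUDA
  extern "C" __global__ void vadd(const int *a, const int *b, int *c, int n)
  {
      // One thread per element; guard against threads past the end.
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
          c[i] = a[i] + b[i];
  }
CUDA

# Write the source next to the Ruby script, ready for compilation to PTX.
File.write("vadd.cu", VADD_SOURCE)
puts File.read("vadd.cu")
```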
Using SGC Ruby CUDA
●   Compiling kernel into PTX.
       –   nvcc --ptx vadd.cu
Using SGC Ruby CUDA
●   Setup
        require 'rubycu'
        include SGC::CU
        CUInit.init
        d = CUDevice.get(0)
        c = CUContext.create(d)
        m = CUModule.new.load("vadd.ptx")
        f = m.function("vadd")
Using SGC Ruby CUDA
●   Memory allocations
        da = CUDevice.malloc(10*4)
        db = CUDevice.malloc(10*4)
        dc = CUDevice.malloc(10*4)
        ha = Buffer.new(:int, 10)
        hb = Buffer.new(:int, 10)
        hc = Buffer.new(:int, 10)
        hd = Buffer.new(:int, 10)
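The 10*4 in the allocations is the element count times the size of a C int in bytes. A small sketch of that arithmetic using only Ruby's Array#pack; the helper name int_bytes is hypothetical and not part of SGC Ruby CUDA.

```ruby
# 'l' packs a 32-bit signed integer, the usual size of a C int on CUDA platforms.
INT_SIZE = [0].pack('l').bytesize

# Hypothetical helper: bytes needed for `count` C ints.
def int_bytes(count)
  count * INT_SIZE
end

puts INT_SIZE        # 4
puts int_bytes(10)   # 40
```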
Using SGC Ruby CUDA
●   Initialization
         (0...10).each { |i|
                ha[i] = i
                hb[i] = 1
                hc[i] = ha[i] + hb[i]
                hd[i] = 0
         }
Using SGC Ruby CUDA
●   Transfer inputs to the GPU
        CUMemory.memcpy_htod(da, ha, 4*10)
        CUMemory.memcpy_htod(db, hb, 4*10)
        CUMemory.memcpy_htod(dc, hd, 4*10)
Using SGC Ruby CUDA
●    Launch kernel on GPU
            # Launch a 1x1x1 grid of blocks,
            # each block with 10x1x1 threads.
            params = [da, db, dc, 10]
            f.launch_kernel(1, 1, 1, 10, 1, 1, 0, 0, params)




    Figure source: CUDA C Programming Guide
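With a 1x1x1 grid of 10x1x1 blocks, each of the 10 threads handles one element. The pure-Ruby sketch below mimics how a kernel typically derives its element index from block and thread indices, and how many blocks a larger input would need. The function names are hypothetical and nothing here touches the GPU.

```ruby
# Global index of a thread in a 1-D launch: blockIdx.x * blockDim.x + threadIdx.x.
def global_index(block_idx, block_dim, thread_idx)
  block_idx * block_dim + thread_idx
end

# Number of blocks needed to cover n elements (ceiling division).
def blocks_for(n, block_dim)
  (n + block_dim - 1) / block_dim
end

grid_dim, block_dim = 1, 10
indices = (0...grid_dim).flat_map do |b|
  (0...block_dim).map { |t| global_index(b, block_dim, t) }
end

p indices                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
p blocks_for(1000, 256)   # 4
```

When the element count is not a multiple of the block size, the last block has spare threads, which is why kernels usually guard the access with an index check.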
Using SGC Ruby CUDA
●   Transfer results back to system memory
         CUMemory.memcpy_dtoh(hd, dc, 4*10)
●   Verify results
         (0...10).each { |i|
               assert_equal(hc[i], hd[i])
         }
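For comparison, the same data flow can be checked on the CPU. This sketch uses plain Ruby arrays in place of Buffer and device memory, and the hypothetical vadd_reference stands in for the kernel, so it shows the expected values without a GPU.

```ruby
n = 10
ha = Array.new(n) { |i| i }            # inputs: 0, 1, ..., 9
hb = Array.new(n, 1)                   # inputs: all ones
hc = ha.zip(hb).map { |a, b| a + b }   # expected results

# Hypothetical CPU stand-in for the vadd kernel.
def vadd_reference(a, b)
  a.each_with_index.map { |x, i| x + b[i] }
end

hd = vadd_reference(ha, hb)
raise "mismatch" unless hd == hc
p hd   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```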
Problematic CUDA Runtime API
●   For use in a CUDA C/C++ program.
●   Workaround
       –   CUDA C/C++ effectively uses C/C++
            bindings.
       –   Create dynamic library for the kernel
            programs.
       –   Load the library at runtime.
Current Limitations
●   Limited data type support.
       –   Fixnum   → int
       –   ??       → long
       –   Float    → float
       –   ??       → double
●   No support for CUDA C++ templates.
●   No Ruby in a kernel program.
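One consequence of mapping Ruby's Float (a C double on common platforms) to a C float is lost precision. A sketch using Array#pack to round-trip a value through both widths; it illustrates the type sizes only and is not the library's conversion code.

```ruby
x = 0.1

as_float  = [x].pack('f').unpack1('f')   # through a 32-bit float
as_double = [x].pack('d').unpack1('d')   # through a 64-bit double

puts [x].pack('f').bytesize   # 4
puts [x].pack('d').bytesize   # 8
puts as_double == x           # true
puts as_float == x            # false
```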
To Support
●   Texture memory.
●   New features in CUDA 4.0
       –   Multi-GPU.
       –   Unified Virtual Memory.
●   More C data types.
●   Mac platform.
Try It Now! Thank You ~
git clone git://github.com/xman/sgc-ruby-cuda.git
cd sgc-ruby-cuda
gem install ffi yard
rake test
rake yard
