NERSC: Powering Scientific Discovery Since 1974

Introduction to OpenCL


OpenCL is an open standard for programming heterogeneous computers composed of CPUs, GPUs, and other processors. OpenCL consists of a framework that defines the platform as a host (typically a CPU) and one or more compute devices (e.g., a GPU), plus a C-based programming language for writing programs for the compute devices. Using OpenCL, a programmer can write parallel programs that use all the resources of the heterogeneous computer. We give examples using the C++ API, but the concepts apply to OpenCL as a whole. OpenCL is currently available on NERSC's DIRAC Testbed.

From: SC13 Tutorial -- OpenCL: A Hands-on Introduction

Tim Mattson, Intel Corp.; Alice Koniges, Berkeley Lab; Simon McIntosh-Smith, University of Bristol.

SC13 attendees are shown working on Dirac.

Tutorial credits: This content is based on slides produced by Tom Deakin and Simon, which were in turn based on slides by Tim and Simon, with help from Ben Gaster (Qualcomm).

Dirac Instructions (comments are denoted with a #):

# Log into a login node for accessing Dirac (carver)
ssh -Y
qsub -I -V -q dirac_int -l nodes=1:ppn=8
# Wait to be taken to a node
# You are there when your prompt reads, for example, [username@dirac37]

Note: The module setup below is subject to change when system defaults change.

module unload cuda
module load cuda/5.5
module unload pgi
module load gcc-sl6

Go to your previous working directory if you are in a newly started PBS shell:


Make a directory for your exercises and download them, if you have not already done so:

mkdir OpenCL_exercises
cd OpenCL_exercises
svn export

Compilation and first execution:

make; ./vadd

Example: vector addition

The "hello world" program of data-parallel programming is adding two vectors:

C[i] = A[i] + B[i] for i=0 to N-1

The OpenCL solution has two parts:

      Kernel code

      Host code

Vector Addition – Kernel 

__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global       float *c)
{
    // Each work-item handles one element; its index is its global ID.
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
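It can help to see the kernel body as the body of the serial loop it replaces, with the loop index supplied by get_global_id(0). The sketch below emulates this in plain C++, not OpenCL. The names vadd_work_item and launch_1d are hypothetical. The serial loop only stands in for the device, which runs the work-items concurrently and in no guaranteed order. Because of that, each work-item writes only its own element of c.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the vadd kernel body: the gid parameter plays
// the role of get_global_id(0) in the OpenCL C kernel.
void vadd_work_item(std::size_t gid, const std::vector<float>& a,
                    const std::vector<float>& b, std::vector<float>& c) {
    c[gid] = a[gid] + b[gid];  // each work-item writes exactly one element
}

// Hypothetical serial emulation of a 1-D index space of global_size work-items.
// A device would run these work-items concurrently, in no guaranteed order.
void launch_1d(std::size_t global_size, const std::vector<float>& a,
               const std::vector<float>& b, std::vector<float>& c) {
    // The kernel has no bounds check, so the launch size must not exceed N.
    assert(a.size() >= global_size && b.size() >= global_size &&
           c.size() >= global_size);
    for (std::size_t gid = 0; gid < global_size; ++gid) {
        vadd_work_item(gid, a, b, c);
    }
}
```

For example, launch_1d(4, a, b, c) with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} leaves c = {11, 22, 33, 44}. The kernel as written has no `if (gid < N)` guard, so the host must launch exactly N work-items.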

      Take the vadd program we provide. It runs a simple kernel that adds two vectors together.

      Look at the host code and identify its API calls. Compare them against the API descriptions on the OpenCL C++ reference card.

       Expected output:

      A message verifying that the program completed successfully
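A success message like this is usually produced on the host. The host compares the device's result against a serial computation, typically with a small tolerance, as is common when comparing floating-point values. The sketch below is one way to write that check in plain C++. The function name count_correct and the default tolerance are assumptions for illustration and are not taken from the provided vadd source.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side check: counts how many elements of c match a[i] + b[i]
// to within an absolute tolerance tol (an assumed value; choose one suited to your data).
std::size_t count_correct(const std::vector<float>& a, const std::vector<float>& b,
                          const std::vector<float>& c, float tol = 1e-6f) {
    assert(a.size() == b.size() && b.size() == c.size());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        const float expected = a[i] + b[i];  // serial reference value
        if (std::fabs(c[i] - expected) <= tol) {
            ++correct;
        }
    }
    return correct;
}
```

A program built this way would report success when count_correct(a, b, c) equals c.size().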

Vector Addition – Host

       The host program is the code that runs on the host to:

      Set up the environment for the OpenCL program

      Create and manage kernels

       5 simple steps in a basic host program:

      Define the platform … platform = devices+context+queues

      Create and build the program (dynamic library for kernels)

      Set up memory objects

      Define the kernel (attach arguments to kernel function)

      Submit commands … transfer memory objects and execute kernels
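The last step, submitting commands, relies on a command-queue. The host enqueues commands: copy the inputs to the device, run the kernel, and copy the result back. By default, an OpenCL command-queue executes these commands in order. The toy model below mimics that ordering in plain C++ using a std::queue of std::function commands. The ToyQueue type, its methods, and run_vadd are hypothetical and are not the OpenCL API. The "device" buffers are simply ordinary std::vector objects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy of an in-order command-queue: commands run in the order
// they were enqueued, and nothing runs until finish() is called.
class ToyQueue {
public:
    void enqueue(std::function<void()> cmd) { commands_.push(std::move(cmd)); }

    // Drain the queue in FIFO order (loosely analogous to waiting for
    // all enqueued commands to complete).
    void finish() {
        while (!commands_.empty()) {
            commands_.front()();
            commands_.pop();
        }
    }

private:
    std::queue<std::function<void()>> commands_;
};

// Memory objects and command submission in miniature.
std::vector<float> run_vadd(const std::vector<float>& h_a,
                            const std::vector<float>& h_b) {
    assert(h_a.size() == h_b.size());
    const std::size_t n = h_a.size();
    std::vector<float> d_a(n), d_b(n), d_c(n);  // stand-ins for device buffers
    std::vector<float> h_c(n);                  // host result

    ToyQueue queue;
    queue.enqueue([&] { d_a = h_a; d_b = h_b; });  // "write" inputs to the device
    queue.enqueue([&] {                              // "execute" the kernel
        for (std::size_t gid = 0; gid < n; ++gid) d_c[gid] = d_a[gid] + d_b[gid];
    });
    queue.enqueue([&] { h_c = d_c; });              // "read" the result back
    queue.finish();  // all lambdas run before the captured locals go out of scope
    return h_c;
}
```

The in-order guarantee is what makes the write, kernel, and read sequence safe without extra synchronization in this simple case.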

The C++ Interface

       Khronos has defined a common C++ header file, cl.hpp, containing a high-level interface to OpenCL

       This interface is dramatically easier to work with¹

       Key features:

      Uses common defaults for the platform and command-queue, saving the programmer from extra coding for the most common use cases

      Simplifies the basic API by bundling key parameters with the objects rather than requiring verbose and repetitive argument lists

      Ability to “call” a kernel from the host, like a regular function

      Error checking can be performed with C++ exceptions

¹ Especially for C++ programmers…
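Calling a kernel "like a regular function" means wrapping the kernel and its launch size in a function object. The sketch below shows the idea in plain C++ without OpenCL. ToyKernelFunctor is hypothetical. In the OpenCL 1.2 version of cl.hpp, the corresponding facility is, to our understanding, cl::make_kernel used together with cl::EnqueueArgs. Check the header version you have, since later C++ bindings changed these names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical analogue of a "callable kernel": it wraps a per-work-item
// function and, when called, runs it once for every global ID in [0, global_size).
template <typename... Args>
class ToyKernelFunctor {
public:
    explicit ToyKernelFunctor(std::function<void(std::size_t, Args...)> body)
        : body_(std::move(body)) {
        assert(body_);  // the wrapped kernel body must be callable
    }

    // "Call" the kernel like a function: launch size first, then kernel arguments.
    void operator()(std::size_t global_size, Args... args) const {
        for (std::size_t gid = 0; gid < global_size; ++gid) {
            body_(gid, args...);
        }
    }

private:
    std::function<void(std::size_t, Args...)> body_;
};
```

As a usage example, wrap a lambda that performs c[gid] = a[gid] + b[gid] in a ToyKernelFunctor<const float*, const float*, float*> named vadd. The launch then becomes the single line vadd(n, a.data(), b.data(), c.data()). Bundling the launch size with the arguments follows the same design idea as the C++ interface. It replaces a series of separate set-argument and enqueue calls.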