Wednesday, 10 October 2012

The OpenCL Standard

The OpenCL Standard / OpenCL Specification
Designers of open programming standards face a challenging objective: arriving at a common set of programming standards that satisfies a range of competing needs and requirements. The Khronos consortium, which manages the OpenCL standard, has done a good job addressing these requirements. The consortium has developed an application programming interface (API) that is general enough to run on significantly different architectures while being adaptable enough that each hardware platform can still obtain high performance. Using the core language and correctly following the specification, any program designed for one vendor can execute on another's hardware. The model set forth by OpenCL creates portable, vendor- and device-independent programs that are capable of being accelerated on many different hardware platforms.

The OpenCL API is a C API, with a C++ wrapper API that is defined in terms of the C API. There are third-party bindings for many languages, including Java, Python, and .NET. The code that executes on an OpenCL device, which in general is not the same device as the host CPU, is written in the OpenCL C language. OpenCL C is a restricted version of the C99 language with extensions appropriate for executing data-parallel code on a variety of heterogeneous devices.

The OpenCL Specification

The OpenCL specification is defined in four parts, called models, that can be summarized as follows:
1. Platform model: Specifies that there is one processor coordinating execution (the host) and one or more processors capable of executing OpenCL C code (the devices). It defines an abstract hardware model that is used by programmers when writing OpenCL C functions (called kernels) that execute on the devices.

2. Execution model: Defines how the OpenCL environment is configured on the host and how kernels are executed on the device. This includes setting up an OpenCL context on the host, providing mechanisms for host–device interaction, and defining a concurrency model used for kernel execution on devices.

3. Memory model: Defines the abstract memory hierarchy that kernels use, regardless of the actual underlying memory architecture. The memory model closely resembles current GPU memory hierarchies, although this has not limited its adoption by other accelerators.

4. Programming model: Defines how the concurrency model is mapped to physical hardware.
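
The concurrency model mentioned in item 2 can be made concrete with a little index arithmetic. In the OpenCL specification, a kernel is launched over an index space of work-items that are grouped into work-groups; those terms come from the specification rather than from the summary above. The sketch below is plain C that only emulates the one-dimensional case with a zero global offset. The `launch_1d` struct and both function names are hypothetical and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a 1-D launch; not an OpenCL type. */
typedef struct {
    size_t global_size; /* total number of work-items */
    size_t local_size;  /* work-items per work-group */
} launch_1d;

/* Number of work-groups in the launch. OpenCL 1.x expects the global
   size to be a multiple of the work-group size, so we check that. */
size_t num_groups(launch_1d l) {
    assert(l.local_size > 0 && l.global_size % l.local_size == 0);
    return l.global_size / l.local_size;
}

/* Global ID of a work-item, given its work-group and its position
   inside that group (assuming a zero global offset). */
size_t global_id(launch_1d l, size_t group_id, size_t local_id) {
    assert(local_id < l.local_size);
    return group_id * l.local_size + local_id;
}
```

For a launch of 16 work-items in groups of 4, `num_groups` gives 4, and the work-item at position 1 of group 2 has global ID 9.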

 
 
In a typical scenario, we might observe an OpenCL implementation executing on a host x86 CPU, which is using a GPU device as an accelerator. The platform model defines this relationship between the host and device. The host sets up a kernel for the GPU to run and instantiates it with some specified degree of parallelism. This is the execution model. The data within the kernel is allocated by the programmer to specific parts of an abstract memory hierarchy. The runtime and driver will map these abstract memory spaces to the physical hierarchy. Finally, hardware thread contexts that execute the kernel must be created and mapped to actual GPU hardware units. This is done using the programming model.
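
To illustrate how data is placed in the abstract memory hierarchy, here is a rough sketch in plain C. The OpenCL specification names four address spaces (global, constant, local, and private); the code below does not use the OpenCL runtime and only imitates three of them with ordinary C storage. `LOCAL_SIZE` and `group_sum` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4 /* work-items per work-group (assumed for this sketch) */

/* Emulates one work-group summing its slice of the input.
   global_in stands in for global memory, scratch for local memory,
   and the loop variables for each work-item's private memory. */
float group_sum(const float *global_in, size_t group_id) {
    float scratch[LOCAL_SIZE]; /* "local" memory shared by the group */
    assert(global_in != NULL);

    /* Phase 1: each work-item copies one element from global to local. */
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
        size_t gid = group_id * LOCAL_SIZE + lid; /* private variable */
        scratch[lid] = global_in[gid];
    }

    /* On a device, a barrier would separate the two phases. Running the
       loops one after the other plays that role in this emulation. */

    /* Phase 2: a single work-item adds up the group's scratch values. */
    float sum = 0.0f;
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        sum += scratch[lid];
    return sum;
}
```

On real hardware the runtime and driver decide where `scratch` physically lives; the kernel author only states which abstract space it belongs to.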
Basic OpenCL Program Structure
 
OpenCL Language & API Highlights

Platform Layer API (called from host):
    Abstraction layer for diverse computational resources
    Query, select and initialize compute devices
    Create compute contexts and work-queues

Runtime API (called from host):
    Launch compute kernels
    Set kernel execution configuration
    Manage scheduling, compute, and memory resources

OpenCL Language:
    Write compute kernels that run on a compute device
    C-based cross-platform programming interface
    Subset of ISO C99 with language extensions
    Includes rich set of built-in functions, in addition to standard C operators

© NVIDIA Corporation 2009
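
To give a feel for what "subset of ISO C99 with language extensions" means, the sketch below embeds a small OpenCL C kernel as a string, the form in which kernel source is commonly handed to the runtime. `__kernel` and `__global` are language extensions and `get_global_id` is a built-in function. The rest is plain C that does not call the OpenCL API: it only emulates what a one-dimensional launch would compute. The names `vec_add`, `vec_add_src`, `vec_add_work_item`, and `emulate_vec_add` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* OpenCL C source for a vector-addition kernel. On a real system this
   string would be passed to the OpenCL runtime for compilation. */
const char *vec_add_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c)\n"
    "{\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

/* The same kernel body as an ordinary C function of the work-item index. */
static void vec_add_work_item(const float *a, const float *b,
                              float *c, size_t gid) {
    c[gid] = a[gid] + b[gid];
}

/* Host-side emulation: run the body once per work-item in a 1-D range.
   A device would run these iterations in parallel. */
void emulate_vec_add(const float *a, const float *b, float *c,
                     size_t global_size) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; ++gid)
        vec_add_work_item(a, b, c, gid);
}
```

Note that the kernel contains no loop: each work-item handles one element, and the degree of parallelism is chosen by the host at launch time.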

 
