The OpenCL Standard / OpenCL Specification
Designers of open programming standards are tasked with a very challenging objective: arriving at a common set of programming standards that is acceptable to a range of competing needs and requirements. The Khronos consortium that manages the OpenCL standard has done a good job of addressing these requirements. The consortium has developed an application programming interface (API) that is general enough to run on significantly different architectures while being adaptable enough that each hardware platform can still obtain high performance. Using the core language and correctly following the specification, any program designed for one vendor can execute on another’s hardware. The model set forth by OpenCL creates portable, vendor- and device-independent programs that are capable of being accelerated on many different hardware platforms.

The OpenCL API is a C API, with a C++ wrapper API defined in terms of the C API. There are third-party bindings for many languages, including Java, Python, and .NET. The code that executes on an OpenCL device, which in general is not the same device as the host CPU, is written in the OpenCL C language. OpenCL C is a restricted version of the C99 language with extensions appropriate for executing data-parallel code on a variety of heterogeneous devices.

The OpenCL Specification
The OpenCL specification is defined in four parts, called models, that can be summarized as follows (a host-side sketch of how they surface in the API appears after the list):
1. Platform model: Specifies that there is one processor coordinating execution (the host) and one or more processors capable of executing OpenCL C code (the devices). It defines an abstract hardware model that is used by programmers when writing OpenCL C functions (called kernels) that execute on the devices.
2. Execution model: Defines how the OpenCL environment is configured on the host and how kernels are executed on the device. This includes setting up an OpenCL context on the host, providing mechanisms for host–device interaction, and defining a concurrency model used for kernel execution on devices.
3. Memory model: Defines the abstract memory hierarchy that kernels use, regardless of the actual underlying memory architecture. The memory model closely resembles current GPU memory hierarchies, although this has not limited adoptability by other accelerators.
4. Programming model: Defines how the concurrency model is mapped to physical hardware.
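To make these abstractions concrete, here is a minimal host-side sketch in C of how the models surface in the API. It is illustrative only: it assumes a single platform with one GPU device, omits error checking, and the buffer size is arbitrary.

#include <CL/cl.h>

/* Minimal sketch: one platform, one GPU device, one command queue.
   Every call returns a cl_int (or fills an errcode_ret) that a real
   program should check. */
int setup_opencl(void)
{
    cl_platform_id platform;
    cl_device_id   device;
    cl_int         err;

    /* Platform model: discover the platforms and devices visible to the host. */
    err = clGetPlatformIDs(1, &platform, NULL);
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* Execution model: a context ties the host and devices together, and a
       command queue is the mechanism for host-device interaction. */
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* Memory model: buffers live in the abstract global memory space;
       the runtime and driver map them onto the device's physical memory. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                1024 * sizeof(float), NULL, &err);

    /* A real program would now build a program object, create a kernel,
       and enqueue it on the queue (a continuation is sketched later). */
    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return err == CL_SUCCESS;
}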
In a typical scenario, we might observe an OpenCL implementation executing on a host x86 CPU that uses a GPU device as an accelerator. The platform model defines this relationship between the host and the device. The host sets up a kernel for the GPU to run and instantiates it with
device. The host sets up a kernel for the GPU to run and instantiates it with
some specified degree of parallelism. This is the execution model. The data
within the kernel is allocated by the programmer to specific parts of an
abstract memory hierarchy. The runtime and driver will map these abstract
memory spaces to the physical hierarchy. Finally, hardware thread contexts that
execute the kernel must be created and mapped to actual GPU hardware units. This
is done using the programming model.
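Sketched in code, that scenario might look like the following continuation of the earlier sketch. The kernel, the helper name run_vec_add, and its parameters are assumptions for illustration: a_buf, b_buf, and c_buf are cl_mem buffers created with clCreateBuffer, host_c is a host array that receives the result, and N is the element count.

#include <CL/cl.h>

/* Device-side OpenCL C: each work-item adds one element. The __global
   qualifier places the pointers in the abstract global memory space. */
static const char *vec_add_src =
    "__kernel void vec_add(__global const float *a,  \n"
    "                      __global const float *b,  \n"
    "                      __global float *c)        \n"
    "{                                               \n"
    "    int i = get_global_id(0);                   \n"
    "    c[i] = a[i] + b[i];                         \n"
    "}                                               \n";

static cl_int run_vec_add(cl_context ctx, cl_device_id device,
                          cl_command_queue queue,
                          cl_mem a_buf, cl_mem b_buf, cl_mem c_buf,
                          float *host_c, size_t N)
{
    cl_int err;

    /* Build the kernel from source for the chosen device. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &vec_add_src, NULL, &err);
    err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(prog, "vec_add", &err);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &a_buf);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &b_buf);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &c_buf);

    /* Instantiate the kernel with a specified degree of parallelism:
       N work-items, one per array element. */
    size_t global = N;
    err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL,
                                 0, NULL, NULL);

    /* Blocking read: copy the result from device memory back to the host. */
    return clEnqueueReadBuffer(queue, c_buf, CL_TRUE, 0, N * sizeof(float),
                               host_c, 0, NULL, NULL);
}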
Basic OpenCL Program Structure
OpenCL Language & API Highlights
Platform layer API (called from the host):
- Abstraction layer for diverse computational resources
- Query, select, and initialize compute devices (see the sketch after this list)
- Create compute contexts and work-queues
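For instance, device selection is typically driven by clGetDeviceInfo queries. A small sketch follows; the two properties shown are just a sample of the many the specification defines.

#include <stdio.h>
#include <CL/cl.h>

/* Platform layer in action: inspect a device before selecting it. */
static void print_device_summary(cl_device_id device)
{
    char    name[256];
    cl_uint compute_units;

    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(compute_units), &compute_units, NULL);
    printf("%s: %u compute units\n", name, compute_units);
}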
Runtime API (called from the host):
- Launch compute kernels
- Set kernel execution configuration (sketched after this list)
- Manage scheduling, compute, and memory resources
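As an illustration, the execution configuration is expressed as global and local NDRange sizes when a kernel is enqueued. The helper below is a sketch; the sizes are arbitrary assumptions, and the queue and kernel are presumed to exist already.

#include <CL/cl.h>

/* Runtime API sketch: launch an already-created kernel with an explicit
   execution configuration. In OpenCL 1.x the local (work-group) size
   must evenly divide the global size. */
static cl_int launch_with_config(cl_command_queue queue, cl_kernel kernel)
{
    size_t global_size = 1048576;   /* total number of work-items   */
    size_t local_size  = 256;       /* work-items per work-group    */

    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                        &global_size, &local_size,
                                        0, NULL, NULL);
    clFinish(queue);   /* wait for the device to finish the enqueued work */
    return err;
}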
OpenCL language:
- Write compute kernels that run on a compute device
- C-based, cross-platform programming interface
- Subset of ISO C99 with language extensions
- Includes a rich set of built-in functions, in addition to standard C operators (a kernel using several of them is sketched below)
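As a brief, hypothetical example, the kernel below uses several of those extensions and built-ins: address-space qualifiers (__global, __local), work-item functions, a work-group barrier, and the built-in math function native_exp. It computes a per-work-group partial sum of exp(x).

/* OpenCL C: each work-group computes a partial sum of exp(x) over its chunk. */
__kernel void group_exp_sum(__global const float *in,
                            __global float       *partial,
                            __local  float       *scratch)
{
    int gid = get_global_id(0);       /* global work-item index       */
    int lid = get_local_id(0);        /* index within the work-group  */
    int lsz = get_local_size(0);      /* work-group size              */

    scratch[lid] = native_exp(in[gid]);   /* built-in math function   */
    barrier(CLK_LOCAL_MEM_FENCE);         /* built-in synchronization */

    if (lid == 0) {                       /* one work-item reduces the group */
        float sum = 0.0f;
        for (int i = 0; i < lsz; ++i)
            sum += scratch[i];
        partial[get_group_id(0)] = sum;
    }
}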