PGI CUDA C x86 Compiler
The NVIDIA CUDA architecture was developed to enable offloading computationally intensive kernels to massively parallel GPUs. Through API function calls and language extensions, CUDA gives developers explicit control over the mapping of general-purpose computational kernels to GPUs, as well as the placement and movement of data between an x86 processor and the GPU. First introduced in 2007, CUDA is the most popular GPGPU parallel programming model.
The PGI CUDA C compiler for x86 platforms will allow developers to compile and optimize their CUDA applications to run on x86-based workstations, servers and clusters, with or without an NVIDIA GPU accelerator. When run on x86-based systems without a GPU, PGI CUDA C applications will use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.
PGI CUDA C for Multi-core x86
The PGI CUDA C compiler will implement the current NVIDIA CUDA C language for GPUs, and it will closely track the evolution of CUDA C moving forward. The PGI CUDA C for x86 implementation will proceed in phases:
Longer term, the PGI CUDA C for x86 compiler will support execution of device kernels on NVIDIA CUDA-enabled GPUs. In addition, PGI Unified Binary technology will enable developers to build one binary that uses NVIDIA GPUs when present, or defaults to multi-core x86 execution when no GPU is present.
The PGI CUDA C for x86 compiler will process CUDA C as a native parallel programming language for multi-core x86, including:
At run time, CUDA C programs compiled for x86 will execute each CUDA thread block on a single host core, eliminating synchronization where possible. CUDA host code will benefit from all PGI optimizations for Intel and AMD processors. PGI believes that well-structured CUDA C programs running on multi-core x86 can approach the efficiency and performance of the same algorithms written in OpenMP.