PGI CUDA C x86 Compiler

 The PGI CUDA C x86 compiler lets you generate programs from CUDA code that run on an x86 processor as an alternative when no CUDA-capable NVIDIA GPU is available.

The compiler was first presented in 2010 at Supercomputing '10 (SC10) in New Orleans.


Beta testers with demanding CUDA source code are currently being sought.

All workstation, server and CDK compilers, as well as PGI Visual Fortran, are available in an Accelerator version.

Further information on the PGI Accelerator compilers can be found here.


The PGI CUDA C compiler for x86 platforms will allow developers using CUDA to compile and optimize their CUDA applications to run on x86-based workstations, servers and clusters with or without an NVIDIA GPU accelerator. When run on x86-based systems without a GPU, PGI CUDA C applications will use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.

PGI CUDA C for Multi-core x86


The PGI CUDA C compiler will implement the current NVIDIA CUDA C language for GPUs and will closely track the evolution of CUDA C going forward. Implementation of PGI CUDA C for x86 will proceed in three phases:

  1. Prototype demonstration at SC10 in New Orleans (November 2010)
  2. First production release in Q2 2011 with most CUDA C functionality; this will not be a performance release
  3. Performance release in Q4 2011 leveraging multi-core and SSE/AVX to implement low-overhead native parallel/SIMD execution

Longer term, the PGI CUDA C for x86 compiler will support execution of device kernels on NVIDIA CUDA-enabled GPUs. In addition, PGI Unified Binary technology will enable developers to build a single binary that uses NVIDIA GPUs when they are present and falls back to multi-core x86 execution when no GPU is present.

Implementation Overview

The PGI CUDA C for x86 compiler treats CUDA C as a native parallel programming language for multi-core x86. This includes:

  • Inlining device kernel functions
  • Translating chevron syntax to parallel/vector loops
  • Using multiple cores and SSE/AVX instructions

At run time, CUDA C programs compiled for x86 will execute each CUDA thread block on a single host core, eliminating synchronization where possible. Host code in CUDA programs will be compiled with the full set of PGI optimizations for Intel and AMD processors. PGI believes that well-structured CUDA C programs compiled for multi-core x86 can approach the efficiency and performance of the same algorithm written in OpenMP.