CUDA

CUDA, aka Compute Unified Device Architecture, is a parallel computing platform and programming model created by Nvidia in 2007.

#49 on PLDB

17 Years Old

18k Repos

CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach termed GPGPU (General-Purpose computing on Graphics Processing Units). The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.
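As a rough illustration of what "execution of compute kernels" looks like in practice, here is a minimal sketch of a SAXPY kernel (y = a*x + y). It assumes a CUDA-capable GPU and compilation with nvcc; the helper name run_saxpy, the input values, and the use of managed memory via cudaMallocManaged are choices made for this example and are not taken from the description above.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Kernel: each thread handles elements i, i + stride, ... (a grid-stride loop).
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper (not part of CUDA): runs saxpy on the GPU and returns
// the largest absolute deviation from the expected result, or -1.0f on error.
float run_saxpy(int n)
{
    float *x = NULL, *y = NULL;
    if (cudaMallocManaged(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMallocManaged(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }

    // Managed memory is accessible from the host, so fill it directly.
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y);

    // The launch is asynchronous: check the launch, then wait for completion.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float max_err = -1.0f;
    if (err == cudaSuccess) {
        max_err = 0.0f;
        for (int i = 0; i < n; ++i)
            max_err = fmaxf(max_err, fabsf(y[i] - 5.0f)); // 3*1 + 2 = 5
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(x);
    cudaFree(y);
    return max_err;
}

int main()
{
    float max_err = run_saxpy(1 << 20);
    printf("max error: %f\n", max_err);
    return max_err == 0.0f ? 0 : 1;
}
```

The grid-stride loop keeps the kernel correct for any launch configuration, which is one common design choice; a one-element-per-thread kernel with a bounds check would work equally well here.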

CUDA website
CUDA downloads page
CUDA Wikipedia page
CUDA docs
There are at least 18,135 CUDA repos on GitHub
CUDA was first developed at Nvidia
PLDB estimates there are currently 769 job openings for CUDA programmers.
File extensions for CUDA include cu and cuh
The Google BigQuery Public Dataset GitHub snapshot shows 4k users using CUDA in 4k repos on GitHub
Check out the 32 CUDA meetup groups on Meetup.com.
Pygments supports syntax highlighting for CUDA
GitHub supports syntax highlighting for CUDA
Release Notes for CUDA
Official Blog page for CUDA
Frequently Asked Questions for CUDA
Indeed.com has 483 matches for "cuda engineer".
See also (18 related languages): Linux, C, Fortran, OpenGL, OpenCL, LLVM IR, Python, Perl, Java, Ruby, Lua, Haskell, R, MATLAB, IDL, Mathematica, Common Lisp, F#

Example from hello-world:

#include <stdio.h>

__global__ void hello_world(){
    printf("Hello World\n");
}

int main() {
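    hello_world<<<1,1>>>();
    // Kernel launches are asynchronous: wait for the kernel to finish so its
    // printf output is flushed before the program exits.
    cudaDeviceSynchronize();
    return 0;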
}

Example from the Hello World Collection:

// Hello world in CUDA

#include <stdio.h>
#include <stdlib.h>  // EXIT_SUCCESS
 
const int N = 16; 
const int blocksize = 16; 
 
__global__ 
void hello(char *a, int *b) 
{
	a[threadIdx.x] += b[threadIdx.x];
}
 
int main()
{
	char a[N] = "Hello \0\0\0\0\0\0";
	int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
 
	char *ad;
	int *bd;
	const int csize = N*sizeof(char);
	const int isize = N*sizeof(int);
 
	printf("%s", a);
 
	cudaMalloc( (void**)&ad, csize ); 
	cudaMalloc( (void**)&bd, isize ); 
	cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); 
	cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); 
	
	dim3 dimBlock( blocksize, 1 );
	dim3 dimGrid( 1, 1 );
	hello<<<dimGrid, dimBlock>>>(ad, bd);
	cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); 
	cudaFree( ad );
	cudaFree( bd );
	
	printf("%s\n", a);
	return EXIT_SUCCESS;
}

Example from Linguist:

#include <stdio.h>
#include <stdlib.h>  // exit, EXIT_FAILURE
#include <cuda_runtime.h>

/**
 * CUDA Kernel Device code
 *
 * Computes the vector addition of A and B into C. The 3 vectors have the same
 * number of elements numElements.
 */
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */
int
main(void)
{
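    // Error code to check return values for CUDA calls
    cudaError_t err = cudaSuccess;

    // Vector length and size in bytes (assumed values; this excerpt omits the setup)
    int numElements = 50000;
    size_t size = numElements * sizeof(float);

    // Allocate the device vectors and zero the inputs (host-side initialization omitted here)
    float *d_A = NULL, *d_B = NULL, *d_C = NULL;
    err = cudaMalloc((void **)&d_A, size);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_B, size);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_C, size);
    if (err == cudaSuccess) err = cudaMemset(d_A, 0, size);
    if (err == cudaSuccess) err = cudaMemset(d_B, 0, size);

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to set up device vectors (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
    err = cudaGetLastError();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    // Free device memory
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    // Reset the device and exit
    err = cudaDeviceReset();

    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to reset the device (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }

    return 0;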
}

Example from Wikipedia:

import numpy
from pycublas import CUBLASMatrix
A = CUBLASMatrix( numpy.mat([[1,2,3],[4,5,6]],numpy.float32) )
B = CUBLASMatrix( numpy.mat([[2,3],[4,5],[6,7]],numpy.float32) )
C = A*B
print(C.np_mat())

Feature	Supported	Token	Example
Comments	✓		/* A comment */
MultiLine Comments	✓	/* */	/* A comment */
Print() Debugging	✓	printf
Case Insensitive Identifiers	X
Semantic Indentation	X
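The features in the table can be seen together in a small sketch. It assumes a CUDA-capable GPU and compilation with nvcc; the kernel name debug_kernel and the helper run_debug_kernel are hypothetical names chosen for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* A block comment, using the token listed in the table. */
// C++-style line comments are also accepted by nvcc.

__global__ void debug_kernel(int value)
{
    // Identifiers are case sensitive: Value and value are distinct variables.
    int Value = value + 1;

    // printf can be called from device code. Its output is buffered and is
    // flushed at synchronization points such as cudaDeviceSynchronize().
    printf("thread %d: value=%d Value=%d\n", threadIdx.x, value, Value);
}

// Hypothetical helper: launches four threads and returns 0 on success.
int run_debug_kernel(int value)
{
    debug_kernel<<<1, 4>>>(value);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main(void)
{
    return run_debug_kernel(41);
}
```

Indentation here is purely cosmetic: blocks are delimited by braces, so reformatting the whitespace does not change what the program does.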
