StarPU Handbook - StarPU Applications
CUDA Extensions

Macros

#define STARPU_USE_CUDA
 
#define STARPU_HAVE_LIBNVIDIA_ML
 
#define STARPU_MAXCUDADEVS
 
#define STARPU_CUBLAS_REPORT_ERROR(status)
 
#define STARPU_CUDA_REPORT_ERROR(status)
 

Functions

void starpu_cublas_report_error (const char *func, const char *file, int line, int status)
 
void starpu_cuda_report_error (const char *func, const char *file, int line, cudaError_t status)
 
cudaStream_t starpu_cuda_get_local_stream (void)
 
const struct cudaDeviceProp * starpu_cuda_get_device_properties (unsigned workerid)
 
int starpu_cuda_copy_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t ssize, cudaStream_t stream, enum cudaMemcpyKind kind)
 
int starpu_cuda_copy2d_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t blocksize, size_t numblocks, size_t ld_src, size_t ld_dst, cudaStream_t stream, enum cudaMemcpyKind kind)
 
int starpu_cuda_copy3d_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t blocksize, size_t numblocks_1, size_t ld1_src, size_t ld1_dst, size_t numblocks_2, size_t ld2_src, size_t ld2_dst, cudaStream_t stream, enum cudaMemcpyKind kind)
 
void starpu_cuda_set_device (unsigned devid)
 
nvmlDevice_t starpu_cuda_get_nvmldev (unsigned devid)
 
void starpu_cusparse_init (void)
 
void starpu_cusparse_shutdown (void)
 
cusparseHandle_t starpu_cusparse_get_local_handle (void)
 
void starpu_cublas_init (void)
 
void starpu_cublas_set_stream (void)
 
void starpu_cublas_shutdown (void)
 
cublasHandle_t starpu_cublas_get_local_handle (void)
 
void starpu_cusolver_init (void)
 
void starpu_cusolver_shutdown (void)
 
cusolverDnHandle_t starpu_cusolverDn_get_local_handle (void)
 
cusolverSpHandle_t starpu_cusolverSp_get_local_handle (void)
 
cusolverRfHandle_t starpu_cusolverRf_get_local_handle (void)
 

Detailed Description

Macro Definition Documentation

◆ STARPU_USE_CUDA

#define STARPU_USE_CUDA

Defined when StarPU has been installed with CUDA support. It should be used in your code to detect the availability of CUDA.
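A minimal sketch of that detection follows. It assumes the macro becomes visible once starpu.h is included. The helper names app_has_cuda_support and app_backend_name are hypothetical, and the __has_include probe is only there so the snippet also builds on a machine without StarPU; a real application would include starpu.h unconditionally.

```c
#include <stdio.h>

/* Probe for the header so this sketch also builds without StarPU installed.
 * A real application would simply write: #include <starpu.h> */
#if defined(__has_include)
#  if __has_include(<starpu.h>)
#    include <starpu.h>
#  endif
#endif

/* Hypothetical helper: 1 when the CUDA entry points may be used, else 0. */
int app_has_cuda_support(void)
{
#ifdef STARPU_USE_CUDA
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: name the code path this build will take. */
const char *app_backend_name(void)
{
    return app_has_cuda_support() ? "cuda" : "cpu-only";
}
```

Guarding every CUDA-specific call site the same way keeps the application buildable against a StarPU installation that lacks CUDA support.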

◆ STARPU_HAVE_LIBNVIDIA_ML

#define STARPU_HAVE_LIBNVIDIA_ML

Defined when StarPU has been installed with NVIDIA-ML (NVML) support. It should be used in your code to detect the availability of NVML-related functions.

◆ STARPU_MAXCUDADEVS

#define STARPU_MAXCUDADEVS

Define the maximum number of CUDA devices that are supported by StarPU.

◆ STARPU_CUBLAS_REPORT_ERROR

#define STARPU_CUBLAS_REPORT_ERROR(status)

Call starpu_cublas_report_error(), passing the current function, file and line position.

◆ STARPU_CUDA_REPORT_ERROR

#define STARPU_CUDA_REPORT_ERROR(status)

Call starpu_cuda_report_error(), passing the current function, file and line position.
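The general pattern behind both macros, a wrapper that captures the call site so the caller passes only the status, can be sketched in plain C. This is not StarPU's implementation: app_report_error, APP_REPORT_ERROR, app_check and app_last_error are hypothetical names, and this stand-in records a message in a buffer, whereas a real reporting function may print and terminate the program.

```c
#include <stdio.h>

/* Hypothetical stand-in for a reporting function with the same parameter
 * shape (func, file, line, status). It stores the message instead of aborting. */
char app_last_error[256];

void app_report_error(const char *func, const char *file, int line, int status)
{
    snprintf(app_last_error, sizeof app_last_error,
             "%s (%s:%d): error %d", func, file, line, status);
}

/* The macro expands at the call site, so __func__, __FILE__ and __LINE__
 * describe the caller rather than the reporting function. */
#define APP_REPORT_ERROR(status) \
    app_report_error(__func__, __FILE__, __LINE__, (status))

/* Hypothetical caller: report any nonzero status and pass it through. */
int app_check(int status)
{
    if (status != 0)
        APP_REPORT_ERROR(status);
    return status;
}
```

Because the position is filled in by the preprocessor, the reported location is the line where the macro is written, which is why the macros are preferable to calling the reporting functions directly.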

Function Documentation

◆ starpu_cusparse_init()

void starpu_cusparse_init ( void  )

Initialize CUSPARSE on every CUDA device controlled by StarPU. This call blocks until CUSPARSE has been properly initialized on every device. See CUDA-specificOptimizations for more details.

◆ starpu_cublas_init()

void starpu_cublas_init ( void  )

Initialize CUBLAS on every CUDA device controlled by StarPU. The CUBLAS library must be initialized prior to any CUBLAS call. This call blocks until CUBLAS has been properly initialized on every device. See CUDA-specificOptimizations for more details.

◆ starpu_cublas_get_local_handle()

cublasHandle_t starpu_cublas_get_local_handle ( void  )

Return the CUBLAS handle to be used to queue CUBLAS kernels. It is properly initialized and configured for multistream by starpu_cublas_init(). See CUDA-specificOptimizations for more details.

◆ starpu_cublas_report_error()

void starpu_cublas_report_error (const char *func, const char *file, int line, int status)

Report a CUBLAS error. See CUDASupport for more details.

◆ starpu_cuda_report_error()

void starpu_cuda_report_error (const char *func, const char *file, int line, cudaError_t status)

Report a CUDA error. See CUDASupport for more details.

◆ starpu_cuda_get_local_stream()

cudaStream_t starpu_cuda_get_local_stream ( void  )

Return the current worker’s CUDA stream. StarPU provides a stream for every CUDA device controlled by StarPU. This function is only provided for convenience so that programmers can easily use asynchronous operations within codelets without having to create a stream by hand. Note that the application is not forced to use the stream provided by starpu_cuda_get_local_stream() and may also create its own streams. Synchronizing with cudaDeviceSynchronize() is allowed, but will reduce the likelihood of having all transfers overlapped. See CUDA-specificOptimizations for more details.

◆ starpu_cuda_get_device_properties()

const struct cudaDeviceProp * starpu_cuda_get_device_properties ( unsigned  workerid)

Return a pointer to device properties for worker workerid (assumed to be a CUDA worker). See EnablingImplementationAccordingToCapabilities for more details.

◆ starpu_cuda_copy_async_sync()

int starpu_cuda_copy_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t ssize, cudaStream_t stream, enum cudaMemcpyKind kind)

Copy ssize bytes from the pointer src_ptr on src_node to the pointer dst_ptr on dst_node. The function first tries to copy the data asynchronously (unless stream is NULL). If the asynchronous copy fails or if stream is NULL, it copies the data synchronously. The function returns -EAGAIN if the asynchronous launch was successful. It returns 0 if the synchronous copy was successful, or fails otherwise.

See CUDASupport for more details.
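The return convention described above can be handled with a small classifier. The sketch below is illustrative only: the enum and app_classify_copy are hypothetical names, and any value other than 0 or -EAGAIN is treated as unexpected because the description does not mention one.

```c
#include <errno.h>

/* Outcomes of a copy that reports -EAGAIN for "launched but not finished". */
enum app_copy_state { APP_COPY_DONE, APP_COPY_PENDING, APP_COPY_UNEXPECTED };

/* Hypothetical helper: interpret the return value of an async-or-sync copy. */
enum app_copy_state app_classify_copy(int ret)
{
    if (ret == 0)
        return APP_COPY_DONE;      /* synchronous copy already completed */
    if (ret == -EAGAIN)
        return APP_COPY_PENDING;   /* asynchronous launch: completion comes later */
    return APP_COPY_UNEXPECTED;    /* not a value the description mentions */
}
```

A caller that gets APP_COPY_PENDING must not read the destination until the stream it passed has been synchronized.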

◆ starpu_cuda_copy2d_async_sync()

int starpu_cuda_copy2d_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t blocksize, size_t numblocks, size_t ld_src, size_t ld_dst, cudaStream_t stream, enum cudaMemcpyKind kind)

Copy numblocks blocks of blocksize bytes from the pointer src_ptr on src_node to the pointer dst_ptr on dst_node.

The blocks start at addresses which are ld_src (resp. ld_dst) bytes apart in the source (resp. destination) interface.

The function first tries to copy the data asynchronously (unless stream is NULL). If the asynchronous copy fails or if stream is NULL, it copies the data synchronously. The function returns -EAGAIN if the asynchronous launch was successful. It returns 0 if the synchronous copy was successful, or fails otherwise.

See CUDASupport for more details.
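To make the roles of blocksize, numblocks, ld_src and ld_dst concrete, here is a host-only model of the 2D layout using memcpy. It is a sketch of the addressing only, not StarPU code: app_copy2d_host is a hypothetical name, it ignores memory nodes, streams and the copy kind, and the non-overlap assertion is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-only model of the 2D layout: numblocks blocks of blocksize bytes,
 * whose start addresses are ld_src (resp. ld_dst) bytes apart. */
void app_copy2d_host(const void *src_ptr, void *dst_ptr,
                     size_t blocksize, size_t numblocks,
                     size_t ld_src, size_t ld_dst)
{
    const char *src = src_ptr;
    char *dst = dst_ptr;

    /* Assumption of this sketch: blocks do not overlap, so each stride
     * covers at least one block. */
    assert(ld_src >= blocksize && ld_dst >= blocksize);

    for (size_t i = 0; i < numblocks; i++)
        memcpy(dst + i * ld_dst, src + i * ld_src, blocksize);
}
```

For instance, copying 3 blocks of 2 bytes with ld_src = 4 and ld_dst = 2 takes bytes 0-1, 4-5 and 8-9 of the source and packs them contiguously in the destination.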

◆ starpu_cuda_copy3d_async_sync()

int starpu_cuda_copy3d_async_sync (void *src_ptr, unsigned src_node, void *dst_ptr, unsigned dst_node, size_t blocksize, size_t numblocks_1, size_t ld1_src, size_t ld1_dst, size_t numblocks_2, size_t ld2_src, size_t ld2_dst, cudaStream_t stream, enum cudaMemcpyKind kind)

Copy numblocks_1 * numblocks_2 blocks of blocksize bytes from the pointer src_ptr on src_node to the pointer dst_ptr on dst_node.

The blocks are grouped by numblocks_1 blocks whose start addresses are ld1_src (resp. ld1_dst) bytes apart in the source (resp. destination) interface.

The function first tries to copy the data asynchronously (unless stream is NULL). If the asynchronous copy fails or if stream is NULL, it copies the data synchronously. The function returns -EAGAIN if the asynchronous launch was successful. It returns 0 if the synchronous copy was successful, or fails otherwise.

See CUDASupport for more details.
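The 3D addressing can be modelled on the host in the same spirit. This sketch assumes that the numblocks_2 groups start ld2_src (resp. ld2_dst) bytes apart, which the parameter names suggest but the description above does not state. app_copy3d_host is a hypothetical name, and the function ignores memory nodes, streams and the copy kind.

```c
#include <stddef.h>
#include <string.h>

/* Host-only model of the 3D layout: numblocks_2 groups, each made of
 * numblocks_1 blocks of blocksize bytes. Blocks within a group are ld1
 * bytes apart; groups are assumed to be ld2 bytes apart. */
void app_copy3d_host(const void *src_ptr, void *dst_ptr, size_t blocksize,
                     size_t numblocks_1, size_t ld1_src, size_t ld1_dst,
                     size_t numblocks_2, size_t ld2_src, size_t ld2_dst)
{
    const char *src = src_ptr;
    char *dst = dst_ptr;

    for (size_t j = 0; j < numblocks_2; j++)        /* groups */
        for (size_t i = 0; i < numblocks_1; i++)    /* blocks in a group */
            memcpy(dst + j * ld2_dst + i * ld1_dst,
                   src + j * ld2_src + i * ld1_src,
                   blocksize);
}
```

In total the loops move numblocks_1 * numblocks_2 blocks, matching the count given in the description.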

◆ starpu_cuda_set_device()

void starpu_cuda_set_device ( unsigned  devid)

Call cudaSetDevice(devid) or cudaGLSetGLDevice(devid), depending on whether devid is listed in the field starpu_conf::cuda_opengl_interoperability.

See CUDASupport for more details.

◆ starpu_cuda_get_nvmldev()

nvmlDevice_t starpu_cuda_get_nvmldev ( unsigned  devid)

Return the NVML device for the CUDA device devid. See CUDASupport for more details.

◆ starpu_cusolver_init()

void starpu_cusolver_init ( void  )

Initialize CUSOLVER on every CUDA device controlled by StarPU. This call blocks until CUSOLVER has been properly initialized on every device.

See CUDA-specificOptimizations for more details.

◆ starpu_cublas_set_stream()

void starpu_cublas_set_stream ( void  )

Set the proper CUBLAS stream for CUBLAS v1. This must be called from the CUDA codelet before calling CUBLAS v1 kernels, so that they are queued on the proper CUDA stream. When using one thread per CUDA worker, this function does not do anything since the CUBLAS stream does not change, and is set once by starpu_cublas_init(). See CUDA-specificOptimizations for more details.

◆ starpu_cublas_shutdown()

void starpu_cublas_shutdown ( void  )

Synchronously deinitialize the CUBLAS library on every CUDA device. See CUDA-specificOptimizations for more details.

◆ starpu_cusolver_shutdown()

void starpu_cusolver_shutdown ( void  )

Synchronously deinitialize the CUSOLVER library on every CUDA device.

See CUDA-specificOptimizations for more details.

◆ starpu_cusolverDn_get_local_handle()

cusolverDnHandle_t starpu_cusolverDn_get_local_handle ( void  )

Return the CUSOLVER Dense handle to be used to queue CUSOLVER kernels. It is properly initialized and configured for multistream by starpu_cusolver_init().

See CUDA-specificOptimizations for more details.

◆ starpu_cusolverSp_get_local_handle()

cusolverSpHandle_t starpu_cusolverSp_get_local_handle ( void  )

Return the CUSOLVER Sparse handle to be used to queue CUSOLVER kernels. It is properly initialized and configured for multistream by starpu_cusolver_init().

See CUDA-specificOptimizations for more details.

◆ starpu_cusolverRf_get_local_handle()

cusolverRfHandle_t starpu_cusolverRf_get_local_handle ( void  )

Return the CUSOLVER Refactorization handle to be used to queue CUSOLVER kernels. It is properly initialized and configured for multistream by starpu_cusolver_init().

See CUDA-specificOptimizations for more details.

◆ starpu_cusparse_shutdown()

void starpu_cusparse_shutdown ( void  )

Synchronously deinitialize the CUSPARSE library on every CUDA device. See CUDA-specificOptimizations for more details.

◆ starpu_cusparse_get_local_handle()

cusparseHandle_t starpu_cusparse_get_local_handle ( void  )

Return the CUSPARSE handle to be used to queue CUSPARSE kernels. It is properly initialized and configured for multistream by starpu_cusparse_init(). See CUDA-specificOptimizations for more details.