CUDA.jl 2.4 and 2.5

Jan 8, 2021
Tim Besard

CUDA.jl v2.4 and v2.5 are two almost-identical feature releases, respectively for Julia 1.5 and 1.6. These releases feature a greatly improved findmin and findmax kernels, an improved interface for kernel introspection, support for CUDA 11.2, and of course many bug fixes.

Improved `findmin` and `findmax` kernels

Thanks to @tkf and @Ellipse0934, CUDA.jl now uses a single-pass kernel for finding the minimum or maximum item in a CuArray. This fixes compatibility with NaN-valued elements, while on average improving performance. Depending on the rank, shape and size of the array these improvements vary from a minor regression to order-of-magnitude improvements.

New kernel introspection interface

It is now possible to obtain a compiled-but-not-launched kernel by passing the launch=false keyword to @cuda. This is useful when you want to reflect, e.g., query the amount of registers, or other kernel properties:

julia> kernel = @cuda launch=false identity(nothing)
CUDA.HostKernel{identity,Tuple{Nothing}}(...)

julia> CUDA.registers(kernel)
4

The old API is still available, and will even be extended in future versions of CUDA.jl for the purpose of compiling device functions (not kernels):

julia> kernel = cufunction(identity, Tuple{Nothing})
CUDA.HostKernel{identity,Tuple{Nothing}}(...)

Support for CUDA 11.2

CUDA.jl now supports the latest version of CUDA, version 11.2. Because CUDNN and CUTENSOR are not compatible with this release yet, CUDA.jl won't automatically switch to it unless you explicitly request so:

julia> ENV["JULIA_CUDA_VERSION"] = "11.2"
"11.2"

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.2.0, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.27.4

Alternatively, if you disable use of artifacts through JULIA_CUDA_USE_BINARYBUILDER=false, CUDA 11.2 can be picked up from your local system.

Future developments

Due to upstream compiler changes, CUDA.jl 2.4 is expected to be the last release compatible with Julia 1.5. Patch releases are still possible, but are not automatic: If you need a specific bugfix from a future CUDA.jl release, create an issue or PR to backport the change.

CUDA.jl 2.4 and 2.5

Improved findmin and findmax kernels

New kernel introspection interface

Support for CUDA 11.2

Future developments

Improved `findmin` and `findmax` kernels