CUDA.jl 4.0


Tim Besard

CUDA.jl 4.0 is a breaking release that introduces the use of JLLs to provide the CUDA toolkit. This makes it possible to compile other binary libaries against the CUDA runtime, and use them together with CUDA.jl. The release also brings CUSPARSE improvements, the ability to limit memory use, and many bug fixes and performance improvements.

JLLs for CUDA artifacts

While CUDA.jl has been using binary artifacts for a while, it was manually managing installation and selection of them, i.e., not by using standardised JLL packages. This complicated use of the artifacts by other packages, and made it difficult to build other binary packages against the CUDA runtime.

With CUDA.jl 4.0, we now use JLLs to load the CUDA driver and runtime. Specifically, there are two JLLs in play: CUDA_Driver_jll and CUDA_Runtime_jll. The former is responsible for loading the CUDA driver library (possibly upgrading it using a forward-compatible version), and determining the CUDA version that your set-up supports:

❯ JULIA_DEBUG=CUDA_Driver_jll julia
julia> using CUDA_Driver_jll
┌ System CUDA driver found at libcuda.so.1, detected as version 12.0.0
└ @ CUDA_Driver_jll
┌ System CUDA driver is recent enough; not using forward-compatible driver
└ @ CUDA_Driver_jll

With the driver identified and loaded, CUDA_Runtime_jll can select a compatible toolkit. By default, it uses the latest supported toolkit that is compatible with the driver:

julia> using CUDA_Runtime_jll

julia> CUDA_Runtime_jll.cuda_toolkits
10-element Vector{VersionNumber}:
 v"10.2.0"
 v"11.0.0"
 v"11.1.0"
 v"11.2.0"
 v"11.3.0"
 v"11.4.0"
 v"11.5.0"
 v"11.6.0"
 v"11.7.0"
 v"11.8.0"

julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.8}

As you can see, the selected CUDA runtime is encoded in the host platform. This makes it possible for Julia to automatically select compatible versions of other binary packages. For example, if we install and load SuiteSparse_GPU_jll, which right now provides builds for CUDA 10.2, 11.0 and 12.0, the artifact resolution code knows to load the build for CUDA 11.0 which is compatible with the selected CUDA 11.8 runtime:

julia> using SuiteSparse_GPU_jll

julia> SuiteSparse_GPU_jll.best_wrapper
"~/.julia/packages/SuiteSparse_GPU_jll/.../x86_64-linux-gnu-cuda+11.0.jl"

The change to JLLs requires a breaking change: the JULIA_CUDA_VERSION and JULIA_CUDA_USE_BINARYBUILDER environment variables have been removed, and are replaced by preferences that are set in the current environment. For convenience, you can set these preferences by calling CUDA.set_runtime_version!:

❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.8.0"

julia> CUDA.set_runtime_version!(v"11.7")
┌ Set CUDA Runtime version preference to 11.7,
└ please re-start Julia for this to take effect.

❯ julia --project
julia> using CUDA
julia> CUDA.runtime_version()
v"11.7.0"

julia> using CUDA_Runtime_jll
julia> CUDA_Runtime_jll.host_platform
Linux x86_64 {cuda=11.7}

The changed preference is reflected in the host platform, which means that you can use this mechanism to load a different builds of other binary packages. For example, if you rely on a package or JLL that does not yet have a build for CUDA 12, you could set the preference to v"11.x" to load an available build.

For discovering a local runtime, you can set the version to "local", which will replace the use of CUDA_Runtime_jll by CUDA_Runtime_discovery.jl, an API-compatible package that replaces the JLL with a local runtime discovery mechanism:

❯ julia --project
julia> CUDA.set_runtime_version!("local")
┌ Set CUDA Runtime version preference to local,
└ please re-start Julia for this to take effect.

❯ JULIA_DEBUG=CUDA_Runtime_Discovery julia --project
julia> using CUDA
┌ Looking for CUDA toolkit via environment variables CUDA_PATH
└ @ CUDA_Runtime_Discovery
┌ Looking for binary ptxas in /opt/cuda
│   all_locations =
│    2-element Vector{String}:
│     "/opt/cuda"
│     "/opt/cuda/bin"
└ @ CUDA_Runtime_Discovery
┌ Debug: Found ptxas at /opt/cuda/bin/ptxas
└ @ CUDA_Runtime_Discovery
...

Memory limits

By popular demand, support for memory limits has been reinstated. This functionality had been removed after the switch to CUDA memory pools, as the memory pool allocator does not yet support memory limits. Awaiting improvements by NVIDIA, we have added functionality to impose memory limits from the Julia side, in the form of two environment variables:

The value of these variables can be formatted as a numer of bytes, optionally followed by a unit, or as a percentage of the total device memory. Examples: 100M, 50%, 1.5GiB, 10000.

CUSPARSE improvements

Thanks to the work of @amontoison, the CUSPARSE interface has undergone many improvements:

Other changes

Backport releases

Because CUDA.jl 4.0 is a breaking release, two additional releases have been made that backport bugfixes and select features: