# CUDA.jl 5.8: CuSparseVector broadcasting, CUDA 12.9, and more
Tim Besard
CUDA.jl v5.8 brings several enhancements, most notably broadcasting support for `CuSparseVector`. The release also adds support for CUDA 12.9 and updates key CUDA libraries such as cuTENSOR, cuQuantum, and cuDNN.
## Broadcasting for CuSparseVector
A significant enhancement in CUDA.jl v5.8 is support for broadcasting over `CuSparseVector`. Thanks to @kshyatt, sparse GPU vectors can now be used in broadcast expressions, just as was already possible with sparse matrices:
```julia
julia> using CUDA, CUDA.CUSPARSE, SparseArrays

julia> x = cu(sprand(Float32, 10, 0.3))
10-element CuSparseVector{Float32, Int32} with 4 stored entries:
  [2]  =  0.459139
  [3]  =  0.964073
  [8]  =  0.904363
  [9]  =  0.721723

julia> # a zero-preserving elementwise operation
       x .* 2
10-element CuSparseVector{Float32, Int32} with 4 stored entries:
  [2]  =  0.918278
  [3]  =  1.928146
  [8]  =  1.808726
  [9]  =  1.443446

julia> # a non-zero-preserving elementwise operation
       x .+ 1
10-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 1.0
 1.4591388
 1.9640732
 1.0
 1.0
 1.0
 1.0
 1.9043632
 1.7217231
 1.0

julia> # combining multiple sparse inputs
       x .+ cu(sprand(Float32, 10, 0.3))
10-element CuSparseVector{Float32, Int32} with 6 stored entries:
  [1]  =  0.906
  [2]  =  0.583197
  [3]  =  0.964073
  [4]  =  0.259103
  [8]  =  0.904363
  [9]  =  0.935917
```
## Minor Changes
CUDA.jl 5.8 also includes several other useful updates:
- Added support for CUDA 12.9;
- Subpackages have been updated to cuDNN 9.10, cuTENSOR 2.2, and cuQuantum 25.03;
- `CUSPARSE.gemm!` now supports additional algorithm choices to limit memory usage;
- Symbols can now be passed to CUDA kernels and stored in `CuArray`s (see the first sketch below);
- `CuTensor` multiplication now preserves the memory type of the input tensors;
- Sparse CSR matrices now interface with the SparseMatricesCSR.jl package (see the second sketch below).
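To illustrate the new Symbol support, here is a minimal sketch of tagging elements and branching on the tag inside a fused GPU kernel. The tag values are hypothetical, and on-device identity comparison of Symbols is assumed to work after this change:

```julia
using CUDA

# Storing Symbols in a GPU array; this previously errored because
# Symbol is not an inline-allocated (isbits) type:
tags = CuArray([:double, :negate, :double])
xs   = CuArray(Float32[1, 2, 3])

# Branch on the per-element tag inside a fused map kernel
# (assumes `===` on Symbols works on-device):
ys = map(tags, xs) do t, x
    t === :double ? 2x : -x
end
```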
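Similarly, the SparseMatricesCSR.jl integration should allow moving CSR data to the GPU without first converting to CSC, since `CuSparseMatrixCSR` uses the same row-oriented storage layout. A minimal sketch, where the direct constructor method is an assumption:

```julia
using CUDA, CUDA.CUSPARSE, SparseMatricesCSR

# Build a CSR matrix on the CPU with SparseMatricesCSR.jl:
A = sparsecsr([1, 2, 2], [1, 1, 2], Float32[1.0, 2.0, 3.0], 2, 2)

# Upload to the GPU; both sides use the CSR layout, so no
# CSC round-trip is needed (conversion method assumed):
dA = CuSparseMatrixCSR(A)
```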
As always, we encourage users to update to the latest version to benefit from these improvements and bug fixes. Check out the changelog for a full list of changes.