Learn
Currently, the Julia CUDA stack is the most mature, easiest to install, and most full-featured. The CUDA.jl documentation is the central place for information on all relevant packages. Start with the instructions on how to install the stack, then follow the introductory tutorial. There is also a series of notebooks on more advanced uses of CUDA.jl, including application and kernel optimization, as well as advanced memory management and concurrent programming concepts (which apply to other back-ends as well).
If you prefer video material, there are plenty of talks and workshops on GPU programming in Julia on YouTube. For example:
GPU programming in Julia
3-hour workshop covering various aspects of the toolchain:
Array programming
Kernel programming
Parallel programming concepts
CUDA.jl application and kernel profiling
Image processing using AMDGPU.jl
Vendor-neutral GPU programming with KernelAbstractions.jl
Concurrent GPU computing in CUDA.jl 3.0
Introduction to concurrent GPU computing:
Overlapping GPU computations
Using multiple devices
Using threads