Learn
Currently, the Julia CUDA stack is the most mature, the easiest to install, and the most full-featured of the Julia GPU stacks. The CUDA.jl documentation is the central place for information on all relevant packages. Start with the instructions on how to install the stack, then follow the introductory tutorial.
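As a rough sketch of what installing and checking the stack can look like, assuming a machine with an NVIDIA GPU and a working driver (the array size and values are arbitrary and only for illustration; the CUDA.jl installation instructions remain the authoritative reference):

```julia
using Pkg
Pkg.add("CUDA")          # install CUDA.jl and its dependencies

using CUDA
CUDA.versioninfo()       # print toolkit, driver, and device details

# Only touch the GPU if CUDA.jl reports a usable setup.
if CUDA.functional()
    a = CUDA.fill(1.0f0, 1024)   # array allocated on the GPU
    b = a .+ 2.0f0               # broadcast executes on the GPU
    @assert all(Array(b) .== 3.0f0)   # copy back to the CPU and check
end
```

If `CUDA.functional()` returns `false`, the output of `CUDA.versioninfo()` is usually the first place to look for what is missing.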
If you prefer videos, the presentations below highlight different aspects of the toolchain.
Concurrent GPU computing in CUDA.jl 3.0
Introduction to concurrent GPU computing:
Overlapping GPU computations
Using multiple devices
Using threads
Effective CUDA GPU computing in Julia
Design and benefits of the Julia GPU stack
Composability with existing (non-GPU) software
Performance killers and tools for optimization
Demonstration