Hardware support for CUDA is ubiquitous. Most gamer rigs have an NVIDIA GPU:
And nearly every current NVIDIA GPU supports CUDA.
But somehow CUDA projects remain obscure.
I was searching for CUDA projects a while back and kept finding scattered bits & pieces of information without any cohesive organization. It reminded me a bit of the era of x2ftp.oulu.fi, when finding comprehensive information about game development and demoscene techniques was an ordeal.
So I’m collecting some CUDA projects here to provide a breadcrumb trail for the next person to go down this path.
At some point I might write a bit of a UI to make it easier to search & filter these. But for now I’ve only got time for traditional HTML: just one long static vertical page of text and images.
(Note that this list leans toward real-time graphics, not simulations, AI/ML, crypto, or offline rendering.)
Igor Ševo’s A million particles in CUDA and OpenGL, running on a GeForce GTX 570:
Path tracing / ARTiOW
Simon Brown’s classic path-tracer with explicit direct lighting.
Roger Allen’s Accelerated Ray Tracing in One Weekend in CUDA.
Henrik Dahlberg’s CUDA / OpenGL path tracer, with real-time updates (here’s a blog post about implementing this with GLFW):
Andy Eder’s path tracer, running on a GTX 1080 Ti:
And here it is running much faster on a Turing-class RTX 2080 Ti (no RTX features were used):
Cyrille Favreau’s Sol-R supports CUDA or OpenCL.
Sven Forstmann’s RLE-based-Voxel-Raycasting / Voxlap method
@voxel-tracer has shared many CUDA rendering projects:
v-elev, which renders voxel elevation models (heightfields?).
A sphere tracer, with the interesting constraint that it only draws scenes that fit in the GPU’s 64KB of constant memory.
A CUDA port of ‘Ray Tracing in One Weekend’. It achieves a 0.55-second (non-real-time) render on a GTX 1050 with 10 bounces. With a single bounce, it runs in real time at 17 FPS.
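To get a feel for the 64KB constant-memory constraint on the sphere tracer above, here is a back-of-the-envelope sketch of how many spheres such a budget could hold. The sphere layouts are hypothetical (I have not checked what the project actually stores per sphere); only the 64KB figure comes from the description above.

```python
import struct

CONSTANT_MEMORY_BYTES = 64 * 1024  # 64KB, the budget mentioned above

# Hypothetical layouts (assumptions, not taken from the project's source):
#   "<4f" = center x, y, z plus radius, as four 32-bit floats
#   "<7f" = the same plus an RGB color, as seven 32-bit floats
GEOMETRY_ONLY = "<4f"
GEOMETRY_AND_COLOR = "<7f"

def max_spheres(budget_bytes=CONSTANT_MEMORY_BYTES, sphere_format=GEOMETRY_ONLY):
    """Return how many spheres of the given packed layout fit in the budget."""
    return budget_bytes // struct.calcsize(sphere_format)

print(max_spheres())                                  # 4096
print(max_spheres(sphere_format=GEOMETRY_AND_COLOR))  # 2340
```

So under these assumptions the scene is capped at a few thousand spheres, and every extra per-sphere attribute shrinks that cap.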
Ashley Hauck’s Clam
floating point rounding shenanigans pic.twitter.com/kzquGwJU0A— Ashley (@khyperia) July 27, 2019
Clam4 lives! I rewrote, for 8th or 9th time, my fractal raytracer in C# (with CUDA rendering engine) to make it more portable. pic.twitter.com/UTDiWr3IVO— Ashley (@khyperia) March 25, 2017
Clouds, volumes, fluids
Tokaspt family tree
Tokaspt has had several offshoots:
Sam Lapere (RayTracey) based his 2011 tokap (“The Once Known as Pong”) on tokaspt. Tokap is a real-time path-traced Pong:
If I understand correctly, NVIDIA OptiX is built on top of CUDA. So technically, projects using OptiX are running CUDA, albeit through OptiX’s abstractions rather than by calling CUDA directly.
Jacco Bikker’s Lighthouse 2 real-time raytracing framework. LH2 relies on OptiX 6.0 for its raytracing (and eventually other raytracing hardware / APIs).
Achieving 33 FPS on a 1060m:
Relying on OptiX makes LH2’s raytracing infrastructure ‘optimal’ to a large degree, in the sense that it’s difficult to achieve faster performance with your own code:
"The ray tracing infrastructure (with related scene management acceleration structure maintenance) should be close to optimal." That's a bold statement or I'm reading it wrong.— Dominik Susmel (@Keyframe) July 11, 2019
@ProgrammerLin’s voxel w/ GI demo, via OptiX 6.0 running on an NVIDIA RTX 2070:
Ingo Wald’s series on an OptiX version of Ray Tracing in One Weekend:
OpenCL is the portable analogue to CUDA. It’s reportedly slower than CUDA (although some projects report OpenCL to be faster).
Sven Forstmann’s Voxel Splatting using OpenCL. Achieves 2 billion splats per second (!), at about 20-30 FPS.
Sven Forstmann’s “Sparse Voxel Octree Raycasting with Image Warping exploiting Frame-to-Frame Coherence”. Achieves 30-50 FPS on a GTX 580M:
David Bucciarelli wrote several OpenCL demos, which he compared to their CPU counterparts:
- 0.45M samples/second for Smallpt (CPU)
- 0.42M samples/second for SmallptGPU (running on CPU only, single-threaded)
- 4.5M samples/second for SmallptGPU (running on GPU)
The GPU version is about 10x faster than the single-threaded CPU version. However, multi-threading would close the gap quite a bit. An 8-core machine would ostensibly achieve nearly the same performance.
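The arithmetic behind that comparison can be sketched as follows. The throughput numbers are the ones listed above; the perfect linear scaling across cores is my own simplifying assumption, and real multi-threaded renders usually fall somewhat short of it.

```python
# Throughput figures quoted above, in millions of samples per second.
SMALLPT_CPU = 0.45        # Smallpt on the CPU, single-threaded
SMALLPTGPU_ON_GPU = 4.5   # SmallptGPU on the GPU

def speedup(fast, slow):
    """Ratio of two throughputs."""
    return fast / slow

def ideal_multicore_throughput(single_thread, cores):
    """Throughput under perfect linear scaling (an optimistic assumption)."""
    return single_thread * cores

gpu_speedup = speedup(SMALLPTGPU_ON_GPU, SMALLPT_CPU)
eight_core = ideal_multicore_throughput(SMALLPT_CPU, 8)
share_of_gpu = eight_core / SMALLPTGPU_ON_GPU

print(f"GPU vs. one CPU thread: {gpu_speedup:.1f}x")    # 10.0x
print(f"Ideal 8-core CPU: {eight_core:.1f}M samples/s "
      f"({share_of_gpu:.0%} of the GPU)")               # 3.6M samples/s (80% of the GPU)
```

Even in this best case the 8-core CPU lands at about 80% of the GPU’s throughput, so “nearly the same” is a fair but generous reading.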
Official NVIDIA CUDA projects
There’s not a great guide to the CUDA toolkit samples, so I made my own by crawling the sources. It’s implemented as a (barebones) Datasette browser.
(That web app is running on a free Heroku Dyno instance which goes to sleep, so give it 5-20 seconds to boot if it’s slow to launch.)
Voxels, no source code
There are several recent projects without source code available. They’re still worth checking out because they’re newer and show the current state of the art.
Jacco Bikker recently (first half of 2019) wrote a voxel raytracer. Achieves 200+ FPS on a 1060m.
This is a CUDA port of his earlier CPU voxel raytracer.
Voxel sphere ray tracing in CUDA, with a larger dataset. Running on a mobile 1060 here. Framerate should allow for a shadow ray per pixel, maybe a diffuse bounce? With some filtering this could work. :) pic.twitter.com/dhJSpFF5XY— Jacco Bikker (@j_bikker) March 29, 2019
@ProgrammerLin dropped RTX / OptiX support and implemented this directly in CUDA, achieving 100 FPS for single-bounce lighting:
Decided a while ago to drop Optix, so here's my first-ever Cuda program - a voxel path tracer with single-bounce lighting. Runs at ~100fps on my RTX 2070 and isn't using any RTX features. Not optimized yet either. Not bad for the first day! #gamedev #voxels #indiegamedev pic.twitter.com/sBx0e4aaEN— Lin (@ProgrammerLin) January 6, 2019
Later, @ProgrammerLin ported this to RTX / OptiX 6.0:
Got back into Optix with the recent release of 6.0. Here's an experiment with multiple materials. The jagged edges are because the diamond is voxelized without sharp features. pic.twitter.com/hydFnygAsu— Lin (@ProgrammerLin) April 26, 2019
A more recent version, running at 30 FPS:
Spent a little bit of time today working on this project again. Now voxels emit light properly. There are still a couple minor issues to fix but it's actually usable! This was running constantly at 30 fps at 720p, 10 spp. #rtx #voxels #pathtracing #gamedev #indiegamedev pic.twitter.com/qcbYrwoiKN— Lin (@ProgrammerLin) July 23, 2019
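For a sense of scale, the figures in that last tweet (30 fps, 720p, 10 spp) can be turned into a rough samples-per-second number. This is only a sketch: it assumes “720p” means 1280x720 and that “spp” means samples per pixel per frame, neither of which the tweet spells out.

```python
def samples_per_second(width, height, spp, fps):
    """Samples traced per second for a given resolution, sample count, and frame rate."""
    return width * height * spp * fps

# Assumption: "720p" is 1280x720; spp and fps are the figures quoted in the tweet.
total = samples_per_second(1280, 720, spp=10, fps=30)
print(f"{total:,} samples/s")  # 276,480,000 samples/s
```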