Devon Strawn
Design + Computers

Survey of CUDA projects

Hardware support for CUDA is ubiquitous. Most gamer rigs have an NVIDIA GPU:

And nearly every current NVIDIA GPU supports CUDA.

But somehow CUDA projects remain obscure.

I was searching for CUDA projects a while back and kept finding scattered bits & pieces of information without any cohesive organization. It reminded me a bit of an earlier era, when finding comprehensive information about game development and demoscene techniques was an ordeal.

So I’m collecting some CUDA projects here to provide a breadcrumb trail for the next person to go down this path.

At some point I might write a bit of a UI to make it easier to search & filter these. But for now I’ve only got time for traditional HTML: just one long static vertical page of text and images.

(Note that this list leans toward real-time graphics, not simulations, AI/ML, crypto, or offline rendering.)



Igor Ševo’s A million particles in CUDA and OpenGL, running on a GeForce GTX 570:

Path tracing / ARTiOW

Simon Brown’s classic path-tracer with explicit direct lighting.

Adventures in CUDA Path Tracing: Part 1

Adventures in CUDA Path Tracing: Part 2

Roger Allen’s Accelerated Ray Tracing in One Weekend in CUDA.

CUDA port of Aras’s ‘toy path tracer’. Described in this blog series.

Henrik Dahlberg’s CUDA / OpenGL path tracer. With real-time update (here’s a blog post about implementing this with GLFW):

Andy Eder’s path tracer, running on a GTX 1080 Ti:

And here it is running much faster on a Turing-class RTX 2080 Ti (no RTX features were used):

Cyrille Favreau’s Sol-R supports CUDA or OpenCL.


Dave Kotfis and Jiawei Wang’s voxel rendering class project compares CUDA rendering perf vs. the VoxelPipe library.

Sven Forstmann’s RLE-based-Voxel-Raycasting / Voxlap method

@voxel-tracer has shared many CUDA rendering projects:

v-elev renders voxel elevation models (heightfields?).

a sphere tracer. Has the interesting constraint that it only draws scenes that fit in the 64KB constant memory on the GPU.

CUDA port of ‘Ray Tracing in One Weekend’. Achieves 0.55sec (non-realtime) render on a GTX 1050, with 10 bounces. With a single bounce, this can run real-time at 17 FPS.

Another CUDA port of ‘Ray Tracing in One Weekend’.
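For a rough sense of how tight the sphere tracer's 64KB constant-memory budget is, here is a back-of-the-envelope sketch in Python. The two sphere layouts are hypothetical (I haven't checked them against the project's source), and it assumes the entire 64KB goes to scene data, which in practice it would not.

```python
import struct

CONSTANT_MEMORY_BYTES = 64 * 1024  # size of CUDA constant memory

# Hypothetical sphere layouts, NOT taken from the project's source.
# "<" means packed little-endian with standard sizes; "f" is a 4-byte float.
LAYOUTS = {
    "center + radius": "<4f",                 # x, y, z, r
    "center + radius + rgb + padding": "<8f",  # x, y, z, r, red, green, blue, pad
}

def max_spheres(fmt, budget=CONSTANT_MEMORY_BYTES):
    """Number of whole spheres of the given layout that fit in the budget."""
    return budget // struct.calcsize(fmt)

for name, fmt in LAYOUTS.items():
    size = struct.calcsize(fmt)
    print(f"{name}: {size} bytes per sphere, up to {max_spheres(fmt)} spheres")
```

With these assumed layouts the ceiling is a few thousand spheres, before accounting for anything else (camera parameters, materials) that also has to live in constant memory.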


Ashley Hauck’s Clam

Sergii Kharagorgiev’s fractal_demo, described here, is a 3D fractal renderer. There’s not much about it online (Sergii’s site is down) and no screenshots.

Clouds, volumes, fluids

Peter Whidden’s Fat-Clouds (animations here) is a fluid simulator in about 600 lines of code.


Miles Lacey’s basic ray marcher (depends on his cuda_math.h). Even though the filename of the image this snippet generates is cuda_sphere.ppm, it appears to render a torus:

Jonathan Granskog’s simpleCudaRayMarcher, described here, is an unoptimized practice raymarcher that renders 3D fractals.

Jarl Larsson’s KernelSanders is a DX11 raytracer and raymarcher.

Tokaspt family tree

Thierry Berger-Perrin is the originator of the ompf2 real-time raytracing forum. Thierry wrote tokaspt (“The Once Known as SmallPT”), a CUDA port of SmallPT.

Tokaspt has had several offshoots:

Sam Lepere (RayTracey) based his 2011 tokap (“The Once Known as Pong”) on tokaspt. Tokap is a real-time path-traced Pong:

Optix-based projects

If I understand correctly, NVIDIA Optix is built on top of CUDA. So technically projects using Optix are running CUDA – albeit through Optix’s abstractions and not calling CUDA directly.

Jacco Bikker’s Lighthouse 2 real-time raytracing framework. LH2 relies on Optix 6.0 for its raytracing (and eventually other raytracing hardware / APIs).

Achieving 33 FPS on a 1060m:

Relying on Optix makes LH2’s raytracing infrastructure ‘optimal’ to a large degree - in the sense that it’s difficult to achieve faster performance with your own code:

@ProgrammerLin’s voxel w/ GI demo, via OptiX 6.0 running on an NVIDIA RTX 2070 (w/ RTX):

Ingo Wald’s series on an Optix version of Raytracing in One Weekend:

OpenCL projects

OpenCL is the portable analogue to CUDA. It’s reportedly slower than CUDA (although some projects report OpenCL to be faster).

Sven Forstmann’s Voxel Splatting using OpenCL. Achieves 2 Billion splats / second (!), about 20-30 FPS.

Sven Forstmann’s “Sparse Voxel Octree Raycasting with Image Warping exploiting Frame-to-Frame Coherence”. Achieves 30-50 FPS on a GTX 580M:

A slower, OpenGL-based version of Sven’s voxel splatting engine.
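The voxel splatting figures above (2 billion splats / second at 20-30 FPS) imply a per-frame splat budget. Here is a quick sketch of that arithmetic in Python, assuming the quoted throughput is sustained at both ends of the frame-rate range (the source doesn't say either way):

```python
SPLATS_PER_SECOND = 2_000_000_000  # throughput figure quoted above

def splats_per_frame(fps, splats_per_second=SPLATS_PER_SECOND):
    """Splats available per frame at a given frame rate."""
    return splats_per_second / fps

for fps in (20, 30):
    millions = splats_per_frame(fps) / 1e6
    print(f"{fps} FPS: ~{millions:.0f} million splats per frame")
```

Under that assumption, each frame gets somewhere between roughly 67 and 100 million splats.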

David Bucciarelli wrote several OpenCL demos, which he compared to their CPU counterparts:

SmallptGPU, described here. Performance comparison of SmallptGPU vs. the original Smallpt shows:

The GPU runtime is 10x faster than single-threaded CPU. However, multi-threading would close the gap quite a bit. An 8-core machine would ostensibly achieve nearly the same performance.

SmallptGPU (OpenCL) from David Bucciarelli on Vimeo.

SmallptGPU2, described here.

MandelGPU, described here.

MandelGPU (OpenCL) from David Bucciarelli on Vimeo.

JuliaGPU described here, based on QJulia.

JuliaGPU (OpenCL) from David Bucciarelli on Vimeo.


SmallLuxGPU, described here.

SmallLuxGPU 2.0 Preview from David Bucciarelli on Vimeo.
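Going back to the SmallptGPU comparison above: the claim that an 8-core machine would nearly close a 10x gap can be checked with a little arithmetic. This is only a sketch. The `parallel_efficiency` parameter is my own assumption (1.0 means perfect scaling across cores, which path tracing can approach but rarely hits exactly), not a number from David's measurements.

```python
def gpu_advantage(gpu_speedup_vs_one_core, cores, parallel_efficiency=1.0):
    """How many times faster the GPU remains once the CPU uses all its cores.

    parallel_efficiency is an assumed scaling factor (1.0 = perfect scaling).
    """
    cpu_speedup = cores * parallel_efficiency
    return gpu_speedup_vs_one_core / cpu_speedup

# 10x is the single-threaded figure quoted above; 8 cores as in the text.
print(f"perfect scaling: GPU is {gpu_advantage(10, 8):.2f}x faster")
print(f"80% efficiency:  GPU is {gpu_advantage(10, 8, 0.8):.2f}x faster")
```

With perfect scaling the GPU's lead shrinks from 10x to 1.25x, which is what "nearly the same performance" amounts to. At a more pessimistic (assumed) 80% efficiency it is still only about 1.6x.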

Official NVIDIA CUDA projects

There’s not a great guide to the CUDA toolkit samples, so I made my own by crawling the sources. It’s implemented as a (barebones) Datasette browser.

(That web app is running on a free Heroku Dyno instance which goes to sleep, so give it 5-20 seconds to boot if it’s slow to launch)

Voxels, no source code

There are several recent projects without source code available. They’re still worth checking out because they’re newer and show the current state of the art.

Jacco Bikker recently (first half of 2019) wrote a voxel raytracer. Achieves 200+ FPS on a 1060m.

This is a CUDA port of his earlier CPU voxel raytracer.

@ProgrammerLin dropped RTX / Optix support and implemented this directly in CUDA, achieving 100 FPS for single-bounce lighting:

Later, @ProgrammerLin ported this to RTX / Optix 6.0:

A more recent version, running at 30 FPS: