pathtracer-ocl

module
v0.0.0-...-4f29a8f Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 19, 2022 License: MIT

README

pathtracer-ocl

Path tracer written in Go and OpenCL example0

exampleexample2example3 (2048 samples)

Description

Simple unidirectional pathtracer written just for fun using Go as frontend and OpenCL as computation backend.

Supports:

  • Spheres, Planes, Boxes, Cylinders, Plain triangles
  • Diffuse, refractive and reflective materials
  • Movable camera
  • Anti-aliasing
  • Depth of Field with simple focal length and camera aperture.
  • .OBJ model loading and rendering with BVH support, incl computing vertex normals.
  • Texture-mapped planes, spheres and cubes.
  • Textured environment spheres and cubes

Based on or inspired by:

Usage

A few command-line args have been added to simplify testing things.

      --width int            Image width (default 640)
      --height int           Image height (default 480)
      --samples int          Number of samples per pixel (default 1)
      --aperture float       Aperture. If 0, no DoF will be used. Default: 0
      --focal-length float   Focal length. Default: 0
      --device-index int     Use OpenCL device with index (use --list-devices to list available devices)
      --list-devices         List available OpenCL devices
      --list-scenes          List available scenes

Suggested values for focal length and aperture (if you want Depth of Field) for the standard Cornell box: 1.6 and 0.1.

Example:

go run cmd/pt/main.go --samples 2048 --aperture 0.15 --focal-length 1.6 --width 1280 --height 960

Note! The project probably only works on AMD64 CPUs since there's some leftover PLAN9 assembly generated from C AVX2 instrinsics, which is unlikely to work well on M1 Macs with ARM CPUs.

Listing and selecting a device

Not all OpenCL devices are created equal. On the author's semi-ancient MacBook Pro 2014, running go run cmd/pt/main.go --list-devices yields:

Index: 0 Type: CPU Name: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
Index: 1 Type: GPU Name: Iris Pro
Index: 2 Type: GPU Name: GeForce GT 750M

However, the Iris Pro iGPU does not support double-precision floating point numbers. Also, there are subtle differences between CPU and GPU device, which in certain situations may result in panics or segmentation faults. In other words: Your milage may vary. CPU-based devices seems to be the most stable and on MacBooks, CPU has significantly better performance than the discrete GPUs.

Note: At some point, the path tracer stopped working on the GeForce GT 750M. Works fine on more modern GPUs...

Running on Windows

Given a nVidia GPU on Windows 10, a few env vars needs to be set. nVidia includes the OpenCL header and lib in their CUDA installation which by default seems to be installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5 which needs to be shortened. See example below:

set CGO_ENABLED=1
set CGO_CFLAGS=-I C:\PROGRA~1\NVIDIA~2\CUDA\v11.5\include
set CGO_LDFLAGS=-L C:\PROGRA~1\NVIDIA~2\CUDA\v11.5\lib\x64
go build cmd/pt/main.go

Performance

For this reference image at 1280x960: example

MacBook Pro mid-2014
  • Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz: 10m45.813990173s
  • GeForce GT 750M GPU: 14m12.049519483s
MacBook Pro 2019
  • Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz: 4m20.568928052s
  • AMD Radeon Pro 560X Compute Engine: 5m14.439406471s
Desktop PC with Windows 10 - Ryzen 2600X
  • NVIDIA GeForce RTX 2080: 45.4309853s

(AMD has dropped Ryzen CPU OpenCL support on Windows)

In this scenario, the 8-core Core i9 CPU is more than twice as fast as the 4-cire Core i7 CPU on the older MacBook. Both mGPUs are slower than their respective CPUs.

The king is unsurprisingly enough the GeForce RTX 2080 on my Desktop PC, which is almost 6x faster than the 8-core Intel CPU.

With 3D models

1280x960, 2048 samples.

Teapot
  • Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz: 57m
  • NVIDIA GeForce RTX 2080, 256 wgsize: 47m
  • NVIDIA GeForce RTX 2080, 256 wgsize all lines in a single chunk: 29m20s. (Also, a few other optimizations had been done, especially regarding bounding box intersection testing)
Gopher
  • Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz: ???
  • NVIDIA GeForce RTX 2080: 47m

Clearly, the GPU underperforms enormously with 3D model rendering. Gopher is about 16000 triangles, Teapot 6500 triangles. Both use BVH trees with semi-optimized group sizes.

Note that GPU usage (RTX2080) is 100% according to MSI Afterburner, so the GPU is doing all it can. It seems something is really inefficient though.

Issues

--The current DoF has some issues producing slight artifacts, probably due to how random numbers are seeded for the aperture-based ray origin.-- Issue fixed after adding sunflower-based camera aperture sampling.

Teapot

The classic. Here a 1280x960 render using 2048 samples: Teapot (Due to a single naive bounding box around all ~6500 triangles, rendering this image was extremely slow, about 7.5 hours on a MacBook Pro 2014). After adding a BVH with 500 triangles per node, the rendering went down to 1h42m. A render with about 75 triangles per node seems to be optimal in this scene, TBD full render!

Depth of Field

Depth-of-field effect is accomplished through casting a standard camera->pixel ray into the scene, and then creating a new "focal point" by using focal distance (distance to a point along camera ray). A new random origin point is then randomly picked around the camera origin with r==aperture and a new ray is cast from the new camera through the focal point, resulting in objects not near the focal point to appear increasingly out-of focus.

1280x960, 2048 samples, focal length 1.6, aperture 0.15. DoF

Lonesome window with Gopher and DoF

Lonesome window 2048 samples. Combines a "window" with lengthened cornell box, reflective sphere and the Gopher model.

The window is made from an almost flat "light" cube with 4 cubes as window lining.

Lonesome window with DoF

Lonesome window 4096 samples. Bad use-case for brute-force path tracing. Similar to scene above, with Gopher and roof light removed.

Anti-aliasing

Anti-aliasing is accomplished through the age-old trick of casting each ray through a random location in the pixel. Given enough samples, an anti-aliased effect will occurr.

Examples rendered in 640x480 with 512 samples:

Without anti-aliasing:

noaa

With anti-aliasing:

noaa

Some tidbits on model rendering

The program now (march 2022) supports rendering .OBJ models as triangle groups structured into BVH trees. Given that recursion is forbidden in OpenCL, as well as variable-length arrays cannot be passed to OpenCL without careful management of struct sizes, "numberOfNN" fields etc, incorporating 3D model rendering with acceptable performance was non-trivial.

The overall solution:

  • We have three types of "objects" that can be passed over to OpenCL
    • Objects. These are planes, spheres, cubes and cylinders, as well as "groups"
    • A group has a translation etc, but consists of 0-64 child subgroups, where each child subgroup typically is a partial mesh from an .OBJ file.
    • Each subgroup has a bounding box, may reference an arbitrary number of triangles and have 2 child subgroups. The subgroups form a binary tree.
    • Triangles. All triangles for all models goes into a single array of triangles with pre-computed vertex normals, material color/emission and consume 512 bytes each.

A challenge here was that traversing the subgroup tree must be done using for-loops and a local "stack" rather than recursion.

Directories

Path Synopsis
cmd
pt
internal
ocl

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL