**Deferred Hybrid Path Tracing Series: Shadows, Noise and Sampling in Real-Time Rendering**

![Figure [fig:dragons]: Here be dragons](here_be_dragons.png border=1 width=75%)

A couple of months ago I decided to start a new rendering project with the goal of creating a high-performance engine tailored for real-time ray tracing from scratch. While I didn't have a specific vision in mind at the time, I set out a few core requirements: fully bindless resources, a mesh shader geometry pipeline to allow for meshlet rendering, and a code base suited to implementing advanced ray tracing techniques. In this ongoing article series I cover the rendering challenges and solutions I encounter as I build and optimize this engine.

In this first article I go over how I render shadows and how noise impacts convergence in real-time ray tracing. I'll also give an overview of the engine architecture so far, show how different types of noise affect the results, and walk through an example for directional lights.

Engine Overview
===============================================================================

This engine is a deferred hybrid path tracer written in C++ with D3D12 as the rendering API. It is fully bindless, meaning I can use ResourceDescriptorHeap in shaders to dynamically index resources such as textures and samplers without the need for fixed binding slots, which makes shaders a lot easier to write while improving performance.

Rasterization is done through a deferred pipeline using mesh shaders. I use [meshoptimizer by Arseny Kapoulkine](https://github.com/zeux/meshoptimizer) to generate meshlets and a basic mesh shader (with no amplification shader for now) to render the geometry. The choice to use mesh shaders was inspired by the techniques used in Alan Wake 2, which uses them to improve performance through finer-grained culling at the per-meshlet level. This can reduce unnecessary geometry processing, especially in complex scenes.

The deferred pass writes to three render targets: one for albedo (color), one for world-space normals, and one for world-space positions. Additionally, I store depth using a reverse Z buffer to maintain high precision across the entire depth range. There is no special encoding to save memory at this stage, though I plan to use techniques like octahedral encoding for normals later:

![Figure [fig:meshlet_view]: Meshlet IDs](meshlet_view.png border=1 width=256px) ![Figure [fig:normals_view]: World-space normals](normals_view.png border=1 width=256px) ![Figure [fig:positions_view]: World-space positions](positions_view.png border=1 width=256px) ![Figure [fig:depth_view]: Reverse Z buffer](depth_view.png border=1 width=256px)

Note that storing world-space positions isn't strictly needed, as the position can be reconstructed from depth; for now I store them because it's easy (a sketch of the reconstruction appears at the end of this section).

Once the geometry is rendered and the necessary information is stored in the geometry buffers, the next step is to compute shadows and lighting. This is done after the geometry pass by dispatching a compute pass that casts shadow rays against the scene BVH (Bounding Volume Hierarchy), a spatial data structure that organizes scene geometry for fast ray intersection tests. For the shadow pass, threads are launched in groups of 64, each group covering an 8x8-pixel tile on the screen.
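To make the dispatch shape concrete, here is a minimal sketch of what such a pass can look like, using DXR 1.1 inline ray tracing (RayQuery) together with SM 6.6 ResourceDescriptorHeap indexing. The constant layout, resource indices, and the single hard shadow ray are illustrative assumptions on my part; the actual pass samples a disk for soft sun shadows, as described in the next section.

```C
// Sketch only: struct layout and descriptor indices are illustrative,
// not the engine's actual interface.
struct ShadowPassConstants
{
    uint sceneBVHIndex;       // descriptor heap index of the scene BVH
    uint worldPositionsIndex; // descriptor heap index of the G-buffer positions
    uint normalsIndex;        // descriptor heap index of the G-buffer normals
    uint shadowMaskIndex;     // descriptor heap index of the output UAV
    float3 sunDirection;      // normalized, pointing from surface toward the sun
};
ConstantBuffer<ShadowPassConstants> g_Constants : register(b0);

[numthreads(8, 8, 1)] // one group of 64 threads covers an 8x8 pixel tile
void CSMain(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    const uint2 pixel = dispatchThreadID.xy;

    // Fully bindless: fetch resources straight from the descriptor heap.
    RaytracingAccelerationStructure sceneBVH = ResourceDescriptorHeap[g_Constants.sceneBVHIndex];
    Texture2D<float4> worldPositions = ResourceDescriptorHeap[g_Constants.worldPositionsIndex];
    Texture2D<float4> normals = ResourceDescriptorHeap[g_Constants.normalsIndex];
    RWTexture2D<float> shadowMask = ResourceDescriptorHeap[g_Constants.shadowMaskIndex];

    RayDesc ray;
    ray.Origin = worldPositions[pixel].xyz + normals[pixel].xyz * 1e-3; // nudge off the surface
    ray.Direction = g_Constants.sunDirection;
    ray.TMin = 0.0;
    ray.TMax = 1e6; // effectively infinite for a directional light

    // A shadow ray only needs to know whether *anything* is hit, so accept the
    // first hit and end the search; forcing opaque lets one Proceed() finish.
    RayQuery<RAY_FLAG_FORCE_OPAQUE | RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> query;
    query.TraceRayInline(sceneBVH, RAY_FLAG_NONE, 0xFF, ray);
    query.Proceed();

    shadowMask[pixel] = (query.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 0.0 : 1.0;
}
```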
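And since I mentioned that storing world-space positions is redundant, here is a minimal sketch of reconstructing them from the reverse-Z depth buffer instead. It assumes the inverse of the same reverse-Z view-projection matrix is available; the function name and the column-vector mul convention are my own choices.

```C
// Reconstructs a world-space position from the reverse-Z depth buffer.
// Assumes invViewProj is the inverse of the reverse-Z view-projection used
// during rasterization, and a column-vector mul convention (swap the mul
// operands for row-vector matrices).
float3 ReconstructWorldPosition(uint2 pixel, float2 screenSize,
                                float deviceDepth, float4x4 invViewProj)
{
    // Pixel center -> [0,1] UV -> [-1,1] clip space (Y is flipped in D3D).
    const float2 uv = (float2(pixel) + 0.5) / screenSize;
    const float2 clipXY = float2(uv.x * 2.0 - 1.0, 1.0 - uv.y * 2.0);

    // With reverse Z the depth value (near = 1, far = 0) is used as
    // clip-space Z directly; the projection matrix encodes the reversal.
    const float4 world = mul(invViewProj, float4(clipXY, deviceDepth, 1.0));
    return world.xyz / world.w; // perspective divide
}
```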
Sampling, Noise and Shadows
===============================================================================

In path tracing, we use sampling to approximate the rendering equation and calculate lighting interactions. The way samples are distributed across the scene significantly affects how well the image converges: poorly distributed samples lead to noticeable noise, while well-distributed samples give higher quality results.

This leads me to low-discrepancy sequences. A low-discrepancy sequence, also known as a quasi-random sequence, is a set of points distributed in a way that minimizes gaps and clustering. This property is especially desirable when evaluating the rendering equation, where clustering can leave large areas of the sample space underrepresented. An uneven distribution can result in visual artifacts like noise or banding.

## 2D Animated Blue Noise

To show this, I've implemented sunlight by sampling a disk, similar to Alan Wolfe's article in Ray Tracing Gems II. The blue noise evenly distributes sampling points on the disk and is animated using a Halton sequence constructed from two different 1D low-discrepancy van der Corput sequences, generated for N frames. For the bases of the van der Corput sequences I use alternating coprime pairs, and once N frames have elapsed, the sequence is updated (a sketch of this construction closes out this subsection). This method speeds up convergence during real-time previews and ensures that the noise pattern doesn't repeat, further minimizing visual artifacts.

```C
// Animates a 2D blue noise value over time by offsetting it with the R2
// low-discrepancy sequence, wrapping back into [0, 1) with frac().
float2 AnimateBlueNoise2D(float2 sequenceOffset01, float2 blueNoise,
                          uint sequenceFrameIndex, uint sequenceFrameCount)
{
    // g is the plastic number, the basis of the R2 sequence.
    const float g = 1.32471795;
    const float a1 = 1.0 / g;
    const float a2 = a1 * a1;
    const float m = float(sequenceFrameIndex % sequenceFrameCount);
    return float2(frac(sequenceOffset01.x + blueNoise.x + 0.5 + a1 * m),
                  frac(sequenceOffset01.y + blueNoise.y + 0.5 + a2 * m));
}
```

Additionally, I use the low-discrepancy R2 sequence to further animate the noise between updates of the 2D Halton sequence. As the following images show, this technique reduces visual noise compared to other types of noise, such as Interleaved Gradient Noise (IGN) or simple white noise (approximated using hash noise functions):

![Figure [fig:bn_no_accum]: Blue noise first frame](bn_no_accum.png border=1 width=200px) ![Figure [fig:ign_no_accum]: IGN first frame](ign_no_accum.png border=1 width=200px) ![Figure [fig:wn_no_accum]: White noise first frame](wn_no_accum.png border=1 width=200px)

![Figure [fig:bn_accum]: Blue noise converged](bn_accum.png border=1 width=200px) ![Figure [fig:ign_accum]: IGN converged](ign_accum.png border=1 width=200px) ![Figure [fig:wn_accum]: White noise converged](wn_accum.png border=1 width=200px)

Eagle-eyed readers might spot visible streaks in the converged white noise render. I suspect this is due to the hashing function used, but I haven't properly investigated it.
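For completeness, here is the promised sketch of how a per-frame 2D Halton offset can be constructed from two 1D van der Corput radical inverses with coprime bases, switching to a different coprime pair every N frames. The function names and the particular base pairs are illustrative choices of mine rather than the engine's actual code; the resulting offset is what would feed the sequenceOffset01 parameter of AnimateBlueNoise2D above.

```C
// Van der Corput radical inverse: reflects the base-`radix` digits of `index`
// around the radix point, producing a well-distributed value in [0, 1).
float RadicalInverse(uint index, uint radix)
{
    const float invRadix = 1.0 / float(radix);
    float scale = invRadix;
    float result = 0.0;
    while (index > 0)
    {
        result += float(index % radix) * scale;
        index /= radix;
        scale *= invRadix;
    }
    return result;
}

// A 2D Halton point is simply two van der Corput sequences with coprime
// bases. Every sequenceFrameCount frames, a different coprime pair is used
// so the pattern doesn't repeat. (These particular base pairs are
// illustrative.)
float2 HaltonOffset2D(uint frameIndex, uint sequenceFrameCount)
{
    const uint2 basePairs[2] = { uint2(2, 3), uint2(5, 7) };
    const uint2 bases = basePairs[(frameIndex / sequenceFrameCount) & 1];
    const uint index = frameIndex % sequenceFrameCount;
    return float2(RadicalInverse(index, bases.x), RadicalInverse(index, bases.y));
}
```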
## Comparison With Ray Tracing Gems II

...

Conclusion
===============================================================================

In this article, we've seen how low-discrepancy sequences, like the van der Corput sequence and animated blue noise, can significantly improve convergence in real-time ray tracing. This approach proves far more effective than Interleaved Gradient Noise or simple white noise, which tend to produce noticeable patterns due to poorly distributed samples.

While this technique is promising, there's still much work to be done. I need to further refine the selection of sample points and improve the engine's random number generation to achieve better visual quality in more complex lighting scenarios. At some point I also want to integrate a denoiser such as A-SVGF, AMD's FFX Denoiser, or NVIDIA's NRD. I plan to add more light types, including spotlights, point lights, and mesh area lights sampled using cumulative distribution functions (CDFs) on the GPU. I also want to improve the renderer's efficiency by optimizing hardware usage and moving to a wavefront-based path tracing approach. Beyond that, I plan to work on ambient occlusion, culling, and level of detail to improve performance in large scenes, develop a proper scene representation, and continue exploring physically based rendering and advanced material modeling to make the engine more realistic.