Hello Sphere Ray Tracing¶

This example shows advanced ray tracing with custom intersection shaders that define ray-primitive intersection tests in shader code. Unlike Hello Triangle Ray Tracing, which uses built-in triangle intersection, this example traces rays against axis-aligned bounding boxes (AABBs) and performs sphere-ray intersection analytically in an intersection shader. This technique enables ray tracing of procedural geometry such as spheres, tori, fractals, and implicit surfaces without storing triangle meshes.

The example does not use the KDGpuExample helper API, demonstrating low-level ray tracing setup.

Overview¶

What this example demonstrates:

Creating acceleration structures with AABB primitives
Custom intersection shaders for procedural geometry
Analytic sphere-ray intersection (quadratic equation solving)
Ray tracing pipeline with procedural hit groups
Shader binding table (SBT) for custom intersections

Use cases:

Procedural geometry (spheres, ellipsoids, tori)
Implicit surfaces (metaballs, signed distance functions)
Volume rendering (clouds, smoke, fog)
Fractals and mathematical shapes
Mixed procedural/triangle rendering

Vulkan Requirements¶

Vulkan Version: 1.2+
Extensions:
- VK_KHR_ray_tracing_pipeline
- VK_KHR_acceleration_structure
- VK_KHR_buffer_device_address
Features:
- rayTracingPipeline
- accelerationStructure
- bufferDeviceAddress
Shader: SPIR-V 1.4+ with ray tracing and intersection shader support

Key Concepts¶

Custom Intersection Shaders:

Built-in triangle intersection is fast but limited. Intersection shaders let you define custom ray-primitive tests in shader code. When a ray hits an AABB, the intersection shader is invoked to perform precise intersection testing.

Spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_ray_tracing_pipeline.html

AABB Geometry:

Procedural primitives use AABBs (axis-aligned bounding boxes) instead of triangles. The AABB is a conservative bound - ray traversal tests ray-AABB intersection first (fast), then invokes the intersection shader only if the ray hits the AABB.
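The ray-AABB test itself is done by the traversal hardware or driver, so it never appears in the example's code. As an illustration of what such a conservative test involves, here is a minimal CPU-side sketch of the classic slab test. The Vec3 and Aabb types and the rayHitsAabb function are hypothetical names introduced for this sketch; they are not part of KDGpu or Vulkan.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper types for this sketch (not KDGpu or Vulkan types).
struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min, max; };

// Slab test: returns true if origin + t * dir lies inside the box
// for some t in [tMin, tMax].
inline bool rayHitsAabb(const Vec3 &origin, const Vec3 &dir, const Aabb &box,
                        float tMin, float tMax)
{
    assert(tMin <= tMax);
    const float o[3] = { origin.x, origin.y, origin.z };
    const float d[3] = { dir.x, dir.y, dir.z };
    const float lo[3] = { box.min.x, box.min.y, box.min.z };
    const float hi[3] = { box.max.x, box.max.y, box.max.z };

    for (int axis = 0; axis < 3; ++axis) {
        if (d[axis] == 0.0f) {
            // Ray is parallel to this pair of planes: it must start between them.
            if (o[axis] < lo[axis] || o[axis] > hi[axis])
                return false;
            continue;
        }
        const float inv = 1.0f / d[axis];
        float t0 = (lo[axis] - o[axis]) * inv;
        float t1 = (hi[axis] - o[axis]) * inv;
        if (t0 > t1)
            std::swap(t0, t1);
        // Shrink the valid interval to the overlap with this slab.
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax)
            return false;
    }
    return true;
}
```

A true result only means the ray may hit the primitive inside the box; the intersection shader still has to decide whether the sphere itself is hit.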

For spheres:

AABB sphereAABB = {
    .min = center - vec3(radius),
    .max = center + vec3(radius)
};

Procedural Hit Groups:

Ray tracing shader groups for procedural geometry use RayTracingShaderGroupType::ProceduralHit which contains an intersection shader plus closest-hit shader (and optional any-hit).

Implementation¶

Ray Tracing Pipeline¶

First, we need to create a ray tracing pipeline. Unlike graphics pipelines, ray tracing pipelines can contain multiple shader stages for different purposes: ray generation, miss, closest hit, and intersection.

    // Create raytracing shaders
    auto rayTracingGenShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_rt/raygen.spv");
    auto rayTracingMissShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_rt/miss.spv");
    auto rayTracingClosestShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_rt/closest.spv");
    auto rayTracingIntersectionShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_rt/intersection.spv");

    auto rayTracingGenShader = m_device.createShaderModule(readShaderFile(rayTracingGenShaderPath));
    auto rayTracingMissShader = m_device.createShaderModule(readShaderFile(rayTracingMissShaderPath));
    auto rayTracingClosestShader = m_device.createShaderModule(readShaderFile(rayTracingClosestShaderPath));
    auto rayTracingIntersectionShader = m_device.createShaderModule(readShaderFile(rayTracingIntersectionShaderPath));

Filename: hello_sphere_rt/hello_sphere_rt.cpp

Our ray generation shader expects a BindGroup that provides the TLAS as well as a writable image into which we can store the result of our ray tracing work.

    // Create bind group layout consisting of an acceleration structure and an image to write out to
    const BindGroupLayoutOptions rtBindGroupLayoutOptions = {
        .bindings = {
                {
                        // Acceleration Structure
                        .binding = 0,
                        .count = 1,
                        .resourceType = ResourceBindingType::AccelerationStructure,
                        .shaderStages = ShaderStageFlags(ShaderStageFlagBits::RaygenBit),
                },
                {
                        // Output Image
                        .binding = 1,
                        .count = 1,
                        .resourceType = ResourceBindingType::StorageImage,
                        .shaderStages = ShaderStageFlagBits::RaygenBit | ShaderStageFlagBits::MissBit | ShaderStageFlagBits::ClosestHitBit,
                },
        },
    };

Filename: hello_sphere_rt/hello_sphere_rt.cpp

These stages are then organized into shader groups. In this example, we have a general group for ray generation, another for the miss shader, and a procedural hit group that combines the closest hit and intersection shaders.

    // Create a raytracing pipeline
    const RayTracingPipelineOptions pipelineOptions{
        .shaderStages = {
                ShaderStage{
                        .shaderModule = rayTracingGenShader.handle(),
                        .stage = ShaderStageFlagBits::RaygenBit,
                },
                ShaderStage{
                        .shaderModule = rayTracingMissShader.handle(),
                        .stage = ShaderStageFlagBits::MissBit,
                },
                ShaderStage{
                        .shaderModule = rayTracingClosestShader.handle(),
                        .stage = ShaderStageFlagBits::ClosestHitBit,
                },
                ShaderStage{
                        .shaderModule = rayTracingIntersectionShader.handle(),
                        .stage = ShaderStageFlagBits::IntersectionBit,
                },
        },
        .shaderGroups = {
                // Gen
                RayTracingShaderGroupOptions{
                        .type = RayTracingShaderGroupType::General,
                        .generalShaderIndex = 0,
                },
                // Miss
                RayTracingShaderGroupOptions{
                        .type = RayTracingShaderGroupType::General,
                        .generalShaderIndex = 1,
                },
                // Procedural hit group (closest hit + intersection)
                RayTracingShaderGroupOptions{
                        .type = RayTracingShaderGroupType::ProceduralHit,
                        .closestHitShaderIndex = 2,
                        .intersectionShaderIndex = 3,
                },
        },
        .layout = m_pipelineLayout,
    };
    m_pipeline = m_device.createRayTracingPipeline(pipelineOptions);

Filename: hello_sphere_rt/hello_sphere_rt.cpp

The ProceduralHit group combines intersection and closest-hit shaders for custom geometry.
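Because shader groups refer to stages by index, a wrong index is an easy mistake to make. The sketch below models the index relationship with plain C++ types and checks it. Stage, GroupType, Group, refersTo, isValidGroup and checkGroups are hypothetical stand-ins for illustration, not the KDGpu types, and callable shaders are ignored for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the KDGpu types.
enum class Stage { Raygen, Miss, ClosestHit, AnyHit, Intersection };
enum class GroupType { General, TrianglesHit, ProceduralHit };

struct Group {
    GroupType type;
    std::optional<uint32_t> general;
    std::optional<uint32_t> closestHit;
    std::optional<uint32_t> intersection;
};

// True if index is set, in range, and points at a stage of the expected kind.
inline bool refersTo(const std::vector<Stage> &stages,
                     const std::optional<uint32_t> &index, Stage expected)
{
    return index.has_value() && *index < stages.size() && stages[*index] == expected;
}

inline bool isValidGroup(const std::vector<Stage> &stages, const Group &g)
{
    const bool closestHitOk = !g.closestHit.has_value() ||
            refersTo(stages, g.closestHit, Stage::ClosestHit);
    switch (g.type) {
    case GroupType::General:
        return refersTo(stages, g.general, Stage::Raygen) ||
                refersTo(stages, g.general, Stage::Miss);
    case GroupType::TrianglesHit:
        // Built-in triangle intersection: no intersection shader allowed.
        return !g.intersection.has_value() && closestHitOk;
    case GroupType::ProceduralHit:
        // Procedural geometry: an intersection shader is mandatory.
        return refersTo(stages, g.intersection, Stage::Intersection) && closestHitOk;
    }
    return false;
}

// Debug-time check over a whole pipeline description.
inline void checkGroups(const std::vector<Stage> &stages, const std::vector<Group> &groups)
{
    for (const Group &g : groups)
        assert(isValidGroup(stages, g));
}
```

With the stage order used in this example (raygen, miss, closest hit, intersection), the procedural hit group with closest-hit index 2 and intersection index 3 passes the check, while swapping the two indices does not.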

Ray Tracing Shaders¶

The pipeline uses four GLSL shaders, all located under assets/shaders/examples/hello_sphere_rt/.

Ray Generation — raygen.rgen¶

The ray generation shader runs once per pixel. It reconstructs a world-space ray from the pixel's NDC coordinates using the inverse view-projection matrix, then calls traceRayEXT against the TLAS. The result written by whichever hit/miss shader fires is stored back into the output image with imageStore.

#version 460 core
#extension GL_EXT_ray_tracing : enable

layout(location = 0) rayPayloadEXT vec4 payload;

layout(set = 0, binding = 0) uniform accelerationStructureEXT topLevelAS;
layout(set = 0, binding = 1) writeonly uniform image2D img;
layout(set = 1, binding = 0) uniform Camera
{
    mat4 viewMatrix;
    mat4 projectionMatrix;
}
camera;

vec3 unproject(vec3 ndc)
{
    mat4 inverseViewProjection = inverse(camera.projectionMatrix * camera.viewMatrix);
    vec4 tmp = inverseViewProjection * vec4(ndc, 1.0);
    tmp = tmp / tmp.w;
    return tmp.xyz;
}

void main()
{
    const vec2 pixelCenter = vec2(gl_LaunchIDEXT.xy) + vec2(0.5);
    const vec2 inUV = pixelCenter / vec2(gl_LaunchSizeEXT.xy);
    vec2 d = inUV * 2.0 - 1.0;

    // Ray is expected to be provided in world space
    vec3 near = unproject(vec3(d.xy, 0.0));
    vec3 far = unproject(vec3(d.xy, 1.0));

    vec4 origin = vec4(near, 1.0);
    vec4 direction = vec4(normalize(far - near), 0.0);

    uint rayFlags = gl_RayFlagsNoneEXT;
    float tMin = 0.1;
    float tMax = 1000.0;

    traceRayEXT(topLevelAS, // acceleration structure
                rayFlags, // rayFlags
                0xFF, // cullMask
                0, // sbtRecordOffset
                0, // sbtRecordStride
                0, // missIndex
                origin.xyz, // ray origin
                tMin, // ray min range
                direction.xyz, // ray direction
                tMax, // ray max range
                0 // payload (location = 0)
    );

    imageStore(img, ivec2(gl_LaunchIDEXT.xy), payload);
}

Filename: hello_sphere_rt/raygen.rgen
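The pixel-to-NDC mapping at the top of main() can be checked on the CPU. The following sketch mirrors those three lines in C++; Vec2 and pixelToNdc are hypothetical names used only here, and the unprojection through the inverse view-projection matrix is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch.
struct Vec2 { float x, y; };

// Maps a pixel index to normalized device coordinates in [-1, 1],
// sampling at the pixel centre as raygen.rgen does.
inline Vec2 pixelToNdc(uint32_t px, uint32_t py, uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0 && px < width && py < height);
    const float u = (static_cast<float>(px) + 0.5f) / static_cast<float>(width);
    const float v = (static_cast<float>(py) + 0.5f) / static_cast<float>(height);
    return { u * 2.0f - 1.0f, v * 2.0f - 1.0f };
}
```

For a 2x2 launch, pixel (0, 0) maps to (-0.5, -0.5) and pixel (1, 1) maps to (0.5, 0.5), so no ray is cast exactly through the edge of the viewport.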

Intersection — intersection.rint¶

This shader is the heart of the procedural geometry technique. It is invoked for every AABB that the ray touches. The shader reads the corresponding SphereData entry (center + radius) indexed by gl_PrimitiveID, solves the quadratic ray-sphere equation, and, if the discriminant is non-negative, reports the nearer root with reportIntersectionEXT. The hit is only accepted if that distance lies within the ray's [tMin, tMax] interval, so a negative root (sphere behind the origin, or origin inside the sphere) is rejected. If the discriminant is negative, the ray misses the sphere and the shader returns silently, letting the traversal continue.

#version 460 core
#extension GL_EXT_ray_tracing : enable

struct SphereData {
    vec3 center;
    float radius;
    vec4 color;
};

layout(std430, set = 2, binding = 0) readonly buffer Spheres
{
    SphereData data[];
}
spheres;

void main()
{
    vec3 orig = gl_WorldRayOriginEXT;
    vec3 dir = normalize(gl_WorldRayDirectionEXT);

    SphereData sphereData = spheres.data[gl_PrimitiveID];
    vec3 sphereCenter = sphereData.center;
    float sphereRadius = sphereData.radius;

    vec3 oc = orig - sphereCenter;
    float b = dot(oc, dir);
    float c = dot(oc, oc) - sphereRadius * sphereRadius;
    float discriminant = b * b - c;

    if (discriminant < 0.0)
        return;

    float hit = -b - sqrt(discriminant);

    // vec3 intersection = orig + hit * dir;
    reportIntersectionEXT(hit, 0);
}

Filename: hello_sphere_rt/intersection.rint
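The same quadratic can be reproduced on the CPU, which is handy for unit-testing sphere data before it reaches the GPU. The sketch below follows the shader's math step by step; Vec3, sub, dot and nearestSphereHit are hypothetical helpers for this sketch, and the direction is assumed to be unit length, as it is after the normalize() call in the shader.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical helper type and functions for this sketch.
struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3 &a, const Vec3 &b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Nearest root of |orig + t * dir - center|^2 = radius^2 for a unit-length dir.
// Expanding gives t^2 + 2*b*t + c = 0 with b = dot(oc, dir) and
// c = dot(oc, oc) - radius^2, hence t = -b +/- sqrt(b*b - c).
// Returns nullopt when the discriminant is negative (the line misses the sphere).
inline std::optional<float> nearestSphereHit(const Vec3 &orig, const Vec3 &dir,
                                             const Vec3 &center, float radius)
{
    assert(radius >= 0.0f);
    const Vec3 oc = sub(orig, center);
    const float b = dot(oc, dir);
    const float c = dot(oc, oc) - radius * radius;
    const float discriminant = b * b - c;
    if (discriminant < 0.0f)
        return std::nullopt;
    return -b - std::sqrt(discriminant);
}
```

A ray starting at (0, 0, -5) and pointing along +Z hits a unit sphere at the origin at t = 4. A ray starting inside the sphere gets a negative nearest root, which on the GPU falls outside the ray interval and is therefore not accepted.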

Closest Hit — closest.rchit¶

Once the traversal has committed the closest intersection for a given ray, the closest-hit shader runs. It looks up the SphereData entry again, computes the world-space hit point from gl_WorldRayOriginEXT + gl_HitTEXT * gl_WorldRayDirectionEXT, derives the surface normal, and applies a simple Lambertian (diffuse) shading model before writing the result into the ray payload.

#version 460 core
#extension GL_EXT_ray_tracing : enable

layout(location = 0) rayPayloadInEXT vec4 payload;

struct SphereData {
    vec3 center;
    float radius;
    vec4 color;
};

layout(std430, set = 2, binding = 0) readonly buffer Spheres
{
    SphereData data[];
}
spheres;

void main()
{
    // Compute some lighting because we can
    vec3 lightDir = normalize(vec3(1.0));

    SphereData sphereData = spheres.data[gl_PrimitiveID];
    // Intersection point on sphere surface
    vec3 worldHit = gl_WorldRayOriginEXT + gl_HitTEXT * gl_WorldRayDirectionEXT;
    // Normal from Sphere
    vec3 normalAtHit = normalize(worldHit - sphereData.center);

    // Diffuse Factor
    float diffuse = max(dot(lightDir, normalAtHit), 0.0);

    payload = sphereData.color * diffuse;
}

Filename: hello_sphere_rt/closest.rchit
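The shading step is small enough to mirror on the CPU. The sketch below computes the same diffuse factor from a hit point, a sphere center and a light direction; Vec3, sub, dot, normalize and lambertFactor are hypothetical names for this sketch only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type and functions for this sketch.
struct Vec3 { float x, y, z; };

inline Vec3 sub(const Vec3 &a, const Vec3 &b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 normalize(const Vec3 &v)
{
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// Lambertian term: cosine of the angle between the surface normal and the
// light direction, clamped so that surfaces facing away receive no light.
inline float lambertFactor(const Vec3 &hitPoint, const Vec3 &sphereCenter, const Vec3 &lightDir)
{
    const Vec3 normal = normalize(sub(hitPoint, sphereCenter));
    return std::max(dot(normalize(lightDir), normal), 0.0f);
}
```

A point facing the light directly gets a factor of 1, and any point on the far side of the sphere gets 0, which is why the unlit half of each sphere renders black in this example.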

Miss — miss.rmiss¶

If no geometry is hit the miss shader fires and fills the payload with a flat dark-gray background colour.

#version 460 core
#extension GL_EXT_ray_tracing : enable

layout(location = 0) rayPayloadInEXT vec4 payload;

void main()
{
    payload = vec4(vec3(0.3), 1.0);
}

Filename: hello_sphere_rt/miss.rmiss

Acceleration Structures¶

Ray tracing is performed against acceleration structures rather than raw vertex data. We create a Bottom Level Acceleration Structure (BLAS) for the geometry (the spheres' AABBs) and a Top Level Acceleration Structure (TLAS) that instances the BLAS.

    const size_t SphereCount = 1024;
    // ... (code omitted for brevity)
    struct SphereData {
        glm::vec4 positionAndRadius;
        glm::vec4 color;
    };
    static_assert(sizeof(SphereData) == 8 * sizeof(float));

    std::vector<SphereData> spheres(SphereCount);
    std::vector<VkAabbPositionsKHR> aabbs(SphereCount);
    // ...

Filename: hello_sphere_rt/hello_sphere_rt.cpp
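The C++ struct packs center and radius into a single vec4 so that it matches the std430 layout of the GLSL SphereData (vec3 center at offset 0, float radius at offset 12, vec4 color at offset 16). The sketch below makes that layout explicit with plain floats instead of glm types; the array-based struct and the makeSphere helper are assumptions for illustration, not the example's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Plain-float stand-in for the glm-based struct used by the example.
struct SphereData {
    float positionAndRadius[4]; // xyz = center, w = radius
    float color[4];             // rgba
};

// These offsets must agree with the std430 layout seen by the shaders.
static_assert(sizeof(SphereData) == 32, "one sphere occupies 32 bytes");
static_assert(offsetof(SphereData, positionAndRadius) == 0, "center starts at byte 0");
static_assert(offsetof(SphereData, color) == 16, "color starts at byte 16");

// Hypothetical convenience constructor for this sketch.
inline SphereData makeSphere(float cx, float cy, float cz, float radius,
                             float r, float g, float b, float a)
{
    assert(radius > 0.0f);
    return SphereData{ { cx, cy, cz, radius }, { r, g, b, a } };
}
```

Keeping such static_asserts next to the struct catches layout drift at compile time rather than as silently misplaced spheres.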

    // Create Acceleration Structures (the BoundingVolumes we will ray trace against)

    // We will have SphereCount aabbGeometry
    m_bottomLevelAs = m_device.createAccelerationStructure(AccelerationStructureOptions{
            .label = "BottomLevelAS",
            .type = AccelerationStructureType::BottomLevel,
            .flags = AccelerationStructureFlagBits::PreferFastTrace,
            .geometryTypesAndCount = {
                    {
                            .geometry = aabbGeometry,
                            .maxPrimitiveCount = SphereCount,
                    },
            },
    });

    const AccelerationStructureGeometryInstancesData aabbGeometryInstance{
        .data = {
                AccelerationStructureGeometryInstance{
                        .flags = GeometryInstanceFlagBits::TriangleFacingCullDisable,
                        .accelerationStructure = m_bottomLevelAs,
                },
        },
    };

    // Add the instance information for our AABB
    m_topLevelAs = m_device.createAccelerationStructure(AccelerationStructureOptions{
            .label = "TopLevelAS",
            .type = AccelerationStructureType::TopLevel,
            .flags = AccelerationStructureFlagBits::PreferFastTrace,
            .geometryTypesAndCount = {
                    {
                            .geometry = aabbGeometryInstance,
                            .maxPrimitiveCount = 1,
                    },
            },
    });

Filename: hello_sphere_rt/hello_sphere_rt.cpp

Key points:

AccelerationStructureGeometryAabbsData: AABB buffer with tightly packed {minX, minY, minZ, maxX, maxY, maxZ} floats
Each AABB = 24 bytes (6 floats)
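As a sketch of how the AABB buffer contents could be produced, the following code builds one box per sphere and computes the resulting buffer size. AabbPositions here is a local struct with the same six-float layout as VkAabbPositionsKHR, used so that the snippet compiles without the Vulkan headers; sphereAabb and aabbBufferSize are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Same member order and size as VkAabbPositionsKHR, redeclared locally
// so the sketch does not depend on the Vulkan headers.
struct AabbPositions {
    float minX, minY, minZ;
    float maxX, maxY, maxZ;
};
static_assert(sizeof(AabbPositions) == 24, "6 tightly packed floats");
static_assert(sizeof(AabbPositions) % 8 == 0, "usable as a stride that is a multiple of 8");

// Tightest axis-aligned box around a sphere: center -/+ radius on each axis.
inline AabbPositions sphereAabb(float cx, float cy, float cz, float radius)
{
    assert(radius >= 0.0f);
    return { cx - radius, cy - radius, cz - radius,
             cx + radius, cy + radius, cz + radius };
}

// Size in bytes of a buffer holding `count` tightly packed AABBs.
inline std::size_t aabbBufferSize(std::size_t count)
{
    return count * sizeof(AabbPositions);
}
```

For the 1024 spheres in this example, the AABB buffer is 24576 bytes.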

Once the structures are created, they must be built on the GPU.

    // Build acceleration structures
    {
        auto commandRecorder = m_device.createCommandRecorder();

        // Bottom Level AS
        commandRecorder.beginDebugLabel(DebugLabelOptions{
                .label = "BottomLevel - AccelerationStructures",
                .color = { 0.0f, 1.0f, 0.0f, 1.0f },
        });

        commandRecorder.buildAccelerationStructures(BuildAccelerationStructureOptions{
                .buildGeometryInfos = {
                        {
                                .geometries = { aabbGeometry },
                                .destinationStructure = m_bottomLevelAs,
                                .buildRangeInfos = {
                                        {
                                                .primitiveCount = static_cast<uint32_t>(aabbs.size()),
                                                .primitiveOffset = 0,
                                                .firstVertex = 0,
                                                .transformOffset = 0,
                                        },
                                },
                        },
                },
        });

        // Pro Tip: If you don't want to spend days wondering why you have no hits...
        // => Make sure you wait for the bottomLevelAS to have been built prior to building the topLevelAS
        commandRecorder.memoryBarrier(MemoryBarrierOptions{
                .srcStages = PipelineStageFlags(PipelineStageFlagBit::AccelerationStructureBuildBit),
                .dstStages = PipelineStageFlags(PipelineStageFlagBit::AccelerationStructureBuildBit),
                .memoryBarriers = {
                        {
                                .srcMask = AccessFlags(AccessFlagBit::AccelerationStructureWriteBit),
                                .dstMask = AccessFlags(AccessFlagBit::AccelerationStructureReadBit),
                        },
                },
        });
        commandRecorder.endDebugLabel();

        // Top Level AS
        commandRecorder.beginDebugLabel(DebugLabelOptions{
                .label = "TopLevel - AccelerationStructures",
                .color = { 0.0f, 1.0f, 0.2f, 1.0f },
        });

        commandRecorder.buildAccelerationStructures(BuildAccelerationStructureOptions{
                .buildGeometryInfos = {
                        {
                                .geometries = { aabbGeometryInstance },
                                .destinationStructure = m_topLevelAs,
                                .buildRangeInfos = {
                                        {
                                                .primitiveCount = 1, // 1 BLAS
                                                .primitiveOffset = 0,
                                                .firstVertex = 0,
                                                .transformOffset = 0,
                                        },
                                },
                        },
                },
        });

        commandRecorder.endDebugLabel();

        CommandBuffer cmdBuffer = commandRecorder.finish();
        m_queue.submit(SubmitOptions{
                .commandBuffers = { cmdBuffer },
        });
        m_queue.waitUntilIdle();
    }

Filename: hello_sphere_rt/hello_sphere_rt.cpp

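The acceleration structure builds are recorded into a command buffer and execute on the GPU. The waitUntilIdle() call ensures both structures are complete before the first trace.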

Shader Binding Table (SBT)¶

The Shader Binding Table connects the trace calls in the shaders to the actual shader groups in the pipeline.

    // Create Shader Binding Table
    // This basically allows us to create a selection of ShaderGroups we want to use for a specific trace call
    // e.g. which rayGen, which Miss, which Hit group we want to use
    // https://docs.vulkan.org/spec/latest/chapters/raytracing.html#shader-binding-table
    // https://www.willusher.io/graphics/2019/11/20/the-sbt-three-ways
    m_sbt = RayTracingShaderBindingTable(&m_device, RayTracingShaderBindingTableOptions{
                                                            .nbrMissShaders = 1,
                                                            .nbrHitShaders = 1,
                                                    });

    m_sbt.addRayGenShaderGroup(m_pipeline, 0);
    m_sbt.addMissShaderGroup(m_pipeline, 1);
    m_sbt.addHitShaderGroup(m_pipeline, 2);

Filename: hello_sphere_rt/hello_sphere_rt.cpp

SBT layout must match pipeline shader group layout with proper alignment.
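To make the alignment requirement concrete, here is a sketch of how the three SBT regions could be laid out in one buffer from the device limits. It is not a description of how RayTracingShaderBindingTable works internally; SbtLimits, SbtRegion, SbtLayout, alignUp and computeSbtLayout are hypothetical names, and the limits would normally come from VkPhysicalDeviceRayTracingPipelinePropertiesKHR.

```cpp
#include <cassert>
#include <cstdint>

// Device limits, normally read from VkPhysicalDeviceRayTracingPipelinePropertiesKHR.
struct SbtLimits {
    uint32_t handleSize;      // shaderGroupHandleSize
    uint32_t handleAlignment; // shaderGroupHandleAlignment
    uint32_t baseAlignment;   // shaderGroupBaseAlignment
};

struct SbtRegion {
    uint32_t offset; // byte offset of the region inside the SBT buffer
    uint32_t stride; // bytes between consecutive records
    uint32_t size;   // total bytes of the region
};

struct SbtLayout {
    SbtRegion raygen, miss, hit;
};

// Rounds value up to the next multiple of a power-of-two alignment.
inline uint32_t alignUp(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

inline SbtLayout computeSbtLayout(const SbtLimits &limits, uint32_t missCount, uint32_t hitCount)
{
    // Each record stride is a multiple of the handle alignment.
    const uint32_t stride = alignUp(limits.handleSize, limits.handleAlignment);
    // The raygen region holds one record and its size must equal its stride.
    const uint32_t raygenSize = alignUp(stride, limits.baseAlignment);

    SbtLayout layout{};
    layout.raygen = { 0, raygenSize, raygenSize };
    // Every region starts on a multiple of the base alignment.
    layout.miss = { layout.raygen.size, stride,
                    alignUp(missCount * stride, limits.baseAlignment) };
    layout.hit = { layout.miss.offset + layout.miss.size, stride,
                   alignUp(hitCount * stride, limits.baseAlignment) };
    return layout;
}
```

With illustrative limits of handleSize = 32, handleAlignment = 32 and baseAlignment = 64, one miss group and one hit group give regions at byte offsets 0, 64 and 128. Actual values vary by device and must be queried at runtime.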

Rendering¶

The rendering process involves transitioning the swapchain image to a general layout so it can be used as a storage image by the ray tracing shaders, and then issuing a traceRays call.

    auto commandRecorder = m_device.createCommandRecorder();

    if (!m_swapchainImageLayouts.empty()) {
        const Handle<Texture_t> outputImage = m_swapchain.textures()[m_currentSwapchainImageIndex];

        // Transition Swapchain Image to General Layout
        commandRecorder.textureMemoryBarrier(TextureMemoryBarrierOptions{
                .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::TopOfPipeBit),
                .srcMask = KDGpu::AccessFlagBit::None,
                .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::RayTracingShaderBit),
                .dstMask = KDGpu::AccessFlagBit::ShaderReadBit | KDGpu::AccessFlagBit::ShaderWriteBit,
                .oldLayout = m_swapchainImageLayouts[m_currentSwapchainImageIndex],
                .newLayout = KDGpu::TextureLayout::General,
                .texture = outputImage,
                .range = {
                        .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                        .levelCount = 1,
                },
        });

        // Update Image entry on BindGroup
        m_rtBindGroup.update(BindGroupEntry{
                .binding = 1,
                .resource = ImageBinding{
                        .textureView = m_swapchainViews[m_currentSwapchainImageIndex],
                },
        });

        commandRecorder.beginDebugLabel(DebugLabelOptions{
                .label = "RayTracing Pass",
                .color = { 1.0f, 0.0f, 0.0f, 1.0f },
        });

        auto rtPass = commandRecorder.beginRayTracingPass();
        rtPass.setPipeline(m_pipeline);
        rtPass.setBindGroup(0, m_rtBindGroup);
        rtPass.setBindGroup(1, m_cameraBindGroup);
        rtPass.setBindGroup(2, m_sphereDataBindGroup);

        // Issue RT Trace call using the SBT table we previously filled
        rtPass.traceRays(RayTracingCommand{
                .raygenShaderBindingTable = m_sbt.rayGenShaderRegion(),
                .missShaderBindingTable = m_sbt.missShaderRegion(),
                .hitShaderBindingTable = m_sbt.hitShaderRegion(),
                .extent = {
                        .width = m_swapchainExtent.width,
                        .height = m_swapchainExtent.height,
                        .depth = 1,
                },
        });

        rtPass.end();
        commandRecorder.endDebugLabel();

        // Transition Swapchain Image to ColorAttachment Layout
        commandRecorder.textureMemoryBarrier(TextureMemoryBarrierOptions{
                .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::RayTracingShaderBit),
                .srcMask = KDGpu::AccessFlagBit::ShaderReadBit | KDGpu::AccessFlagBit::ShaderWriteBit,
                .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::ColorAttachmentOutputBit),
                .dstMask = KDGpu::AccessFlagBit::ColorAttachmentReadBit,
                .oldLayout = KDGpu::TextureLayout::General,
                .newLayout = KDGpu::TextureLayout::ColorAttachmentOptimal,
                .texture = outputImage,
                .range = {
                        .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                        .levelCount = 1,
                },
        });

        commandRecorder.beginDebugLabel(DebugLabelOptions{
                .label = "Raster Pass",
                .color = { 0.0f, 0.0f, 1.0f, 1.0f },
        });

        // Create a GraphicsRenderPass to draw the imgui overlay
        // Implicitly Transition Swapchain Image to Presentation Layout
        auto opaquePass = commandRecorder.beginRenderPass(RenderPassCommandRecorderOptions{
                .colorAttachments = {
                        {
                                .view = m_swapchainViews[m_currentSwapchainImageIndex],
                                .loadOperation = AttachmentLoadOperation::Load,
                                .clearValue = { 0.0f, 0.0f, 0.0f, 0.0f },
                                .initialLayout = TextureLayout::ColorAttachmentOptimal,
                                .finalLayout = TextureLayout::PresentSrc,
                        },
                },
                .depthStencilAttachment = {
                        .view = m_depthTextureView,
                },
        });
        renderImGuiOverlay(&opaquePass);
        opaquePass.end();
        commandRecorder.endDebugLabel();

        // Update layout so that we know what layout we are in on the next frames
        m_swapchainImageLayouts[m_currentSwapchainImageIndex] = KDGpu::TextureLayout::PresentSrc;
    }

    m_commandBuffer = commandRecorder.finish();

Filename: hello_sphere_rt/hello_sphere_rt.cpp

The traceRays call uses the regions from our SBT to know which shaders to execute for each ray.

Performance Notes¶

Custom Intersection Cost:

More expensive than triangle intersection (GPU triangle units idle)
Sphere intersection: ~10-50 cycles (vs ~5 for triangles), as a rough, hardware-dependent estimate
Complex SDFs can take 100+ cycles

Optimization Tips:

Keep AABBs tight (reduce false positives)
Early-exit intersection tests when possible
Simplify math (avoid sqrt, transcendentals where possible)
Mix procedural and triangles (use triangles when efficient)

When to Use Procedural:

Geometry is naturally implicit (spheres, SDFs)
Tessellation would be too expensive
Dynamic/animated procedural shapes
Memory-constrained (no vertex buffers)
