Descriptor Indexing with Bindless Rendering

This example shows how to use descriptor indexing (also called "bindless rendering") to access arrays of resources using non-uniform indices in shaders. Traditional Vulkan requires binding specific descriptors before each draw call; descriptor indexing allows binding large arrays once and indexing into them dynamically in shaders. This dramatically reduces CPU overhead and enables efficient material systems, texture atlases, and data-driven rendering.

The example uses the KDGpuExample helper API for simplified setup.

Overview

What this example demonstrates:

  • Enabling VK_EXT_descriptor_indexing extension and required features
  • Creating descriptor sets with large arrays of uniform buffers
  • Using nonuniformEXT qualifier in shaders for dynamic indexing
  • Drawing multiple objects with different materials/transforms
  • Variable-length descriptor arrays sized at runtime

Use cases:

  • Material systems (hundreds/thousands of materials in one array)
  • Texture streaming and mega-textures
  • GPU-driven rendering (indirect draws selecting resources)
  • Bindless vertex/index buffers
  • Efficient multi-material rendering

Vulkan Requirements

  • Vulkan Version: 1.2+ (descriptor indexing promoted to core), or an earlier version with the extension below
  • Extensions: VK_EXT_descriptor_indexing (only needed before 1.2, where it is not yet core)
  • Features:
    • shaderUniformBufferArrayNonUniformIndexing
    • runtimeDescriptorArray
    • descriptorBindingVariableDescriptorCount
    • descriptorBindingPartiallyBound (optional but recommended)
  • Shader: SPIR-V 1.3+ or GLSL 450+ with GL_EXT_nonuniform_qualifier
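
The feature list above can be checked before creating the device. The following is a minimal sketch, not the example's actual code: IndexingFeatures is a hypothetical stand-in that mirrors the relevant field names of VkPhysicalDeviceDescriptorIndexingFeatures so the snippet compiles without the Vulkan headers. In a real application the values would come from vkGetPhysicalDeviceFeatures2 with that structure chained in pNext.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the fields of VkPhysicalDeviceDescriptorIndexingFeatures
// that this example relies on (assumption: filled in from a real device query).
struct IndexingFeatures {
    bool shaderUniformBufferArrayNonUniformIndexing = false;
    bool runtimeDescriptorArray = false;
    bool descriptorBindingVariableDescriptorCount = false;
    bool descriptorBindingPartiallyBound = false; // optional but recommended
};

// Returns the names of the required features that the device does not report.
std::vector<std::string> missingRequiredFeatures(const IndexingFeatures &f)
{
    std::vector<std::string> missing;
    if (!f.shaderUniformBufferArrayNonUniformIndexing)
        missing.push_back("shaderUniformBufferArrayNonUniformIndexing");
    if (!f.runtimeDescriptorArray)
        missing.push_back("runtimeDescriptorArray");
    if (!f.descriptorBindingVariableDescriptorCount)
        missing.push_back("descriptorBindingVariableDescriptorCount");
    // descriptorBindingPartiallyBound is optional, so it is not treated as missing.
    return missing;
}
```

An empty result means the device reports everything the example needs; otherwise the returned names can be logged before falling back to traditional binding.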

Key Concepts

Traditional Descriptor Binding:

// Draw 3 objects with different materials:
for (int i = 0; i < 3; i++) {
    bindDescriptorSet(materialDescriptorSets[i]);  // CPU overhead!
    draw(object[i]);
}

Every material requires a descriptor set bind, which has CPU cost.

Descriptor Indexing / Bindless:

// Bind array of ALL materials once:
bindDescriptorSet(allMaterialsArray);  // One bind for all!

// Draw all objects with different material indices:
for (int i = 0; i < 3; i++) {
    // Index is computed in shader or passed via push constant
    draw(object[i]);
}

Shader:

layout(set = 0, binding = 0) uniform Material {
    mat4 transform;
} materials[16];  // Array of materials/transforms

void main() {
    // Index computed from frame time/angle:
    uint index = computeIndex();
    mat4 transform = materials[nonuniformEXT(index)].transform;
}

Benefits:

  • Massively reduced bind calls
  • GPU-driven resource selection
  • Simplified render loop
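
To make the reduction in bind calls concrete, here is a small sketch that records both render loops against a mock recorder and counts the calls. MockRecorder, recordTraditional, and recordBindless are hypothetical names invented for this illustration; they are not part of KDGpu or Vulkan.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mock of a command recorder; it only counts calls.
struct MockRecorder {
    std::size_t bindCalls = 0;
    std::size_t drawCalls = 0;
    void bindDescriptorSet(std::uint32_t /*setId*/) { ++bindCalls; }
    void draw(std::uint32_t /*materialIndex*/) { ++drawCalls; }
};

// Traditional path: one descriptor set per material, bound before each draw.
MockRecorder recordTraditional(const std::vector<std::uint32_t> &materialPerObject)
{
    MockRecorder rec;
    for (std::uint32_t material : materialPerObject) {
        rec.bindDescriptorSet(material);
        rec.draw(material);
    }
    return rec;
}

// Bindless path: bind the whole array once; each draw only carries an index.
MockRecorder recordBindless(const std::vector<std::uint32_t> &materialPerObject)
{
    MockRecorder rec;
    rec.bindDescriptorSet(0); // the single set that holds the whole material array
    for (std::uint32_t material : materialPerObject)
        rec.draw(material); // the index would travel via push constant or be computed in the shader
    return rec;
}
```

For N objects the traditional loop records N binds, while the bindless loop records one, independent of N.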

Spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_descriptor_indexing.html

NonUniform Indexing:

The nonuniformEXT qualifier tells the compiler that the index may differ between shader invocations, that is, it is not dynamically uniform. Without this:

  • Index must be a compile-time constant, or
  • Index must be dynamically uniform, i.e. identical for all invocations in the same invocation group (roughly, one draw or dispatch)

With nonuniformEXT:

  • Each triangle/pixel can use a different index
  • Enables material-per-object, texture-per-pixel selection
  • May have small performance cost on some hardware

Variable-Length Arrays:

Traditional Vulkan requires compile-time array sizes. Descriptor indexing allows:

layout(set = 0, binding = 0) uniform Transforms {
    mat4 matrix;
} transforms[];  // Variable length!

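In the descriptor set layout, descriptorCount in VkDescriptorSetLayoutBinding gives the upper bound for the array. The actual length is chosen at runtime, when the descriptor set is allocated, through VkDescriptorSetVariableDescriptorCountAllocateInfo.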

Implementation

Allocating Descriptor Array Buffers:

    // Create a set of TransformsCount UBOs, each holding a distinct rotation matrix
    {
        const BufferOptions bufferOptions = {
            .size = sizeof(glm::mat4),
            .usage = BufferUsageFlagBits::UniformBufferBit,
            .memoryUsage = MemoryUsage::CpuToGpu // So we can map it to CPU address space
        };

        m_transformBuffers.reserve(TransformsCount);

        const float angleStep = 360.0f / float(TransformsCount);

        for (size_t i = 0; i < TransformsCount; ++i) {
            const glm::mat4 mat = glm::rotate(glm::mat4(1.0f), glm::radians(i * angleStep), glm::vec3(0.0f, 0.0f, 1.0f));

            Buffer buf = m_device.createBuffer(bufferOptions);
            auto bufferData = buf.map();
            std::memcpy(bufferData, &mat, sizeof(glm::mat4));
            buf.unmap();

            m_transformBuffers.emplace_back(std::move(buf));
        }
    }

Filename: bindgroup_indexing/bindgroup_indexing.cpp

This example creates an array of uniform buffers, each holding a rotation matrix. The shader will index into this array.
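
For readers who want to see what ends up in each buffer, here is a standalone sketch of the same angle arithmetic without glm or KDGpu. Mat4, rotationZ, and makeTransforms are hypothetical names used only here; the matrix layout is column-major to match glm::mat4.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>; // column-major, like glm::mat4

// Rotation about the Z axis by the given angle in degrees.
Mat4 rotationZ(float degrees)
{
    const float rad = degrees * 3.14159265358979323846f / 180.0f;
    const float c = std::cos(rad);
    const float s = std::sin(rad);
    return Mat4{
         c,    s,    0.0f, 0.0f, // column 0
        -s,    c,    0.0f, 0.0f, // column 1
        0.0f,  0.0f, 1.0f, 0.0f, // column 2
        0.0f,  0.0f, 0.0f, 1.0f  // column 3
    };
}

// One matrix per array element, evenly spaced over a full turn.
std::vector<Mat4> makeTransforms(std::size_t transformsCount)
{
    std::vector<Mat4> transforms;
    if (transformsCount == 0)
        return transforms;
    transforms.reserve(transformsCount);
    const float angleStep = 360.0f / float(transformsCount);
    for (std::size_t i = 0; i < transformsCount; ++i)
        transforms.push_back(rotationZ(float(i) * angleStep));
    return transforms;
}
```

With 8 transforms the step is 45 degrees, so element 2 holds a 90 degree rotation.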

Storage Buffer for Frame Counter:

    // Create a SSBO that will hold a frameCounter
    {
        m_frameCounterSSBO = m_device.createBuffer(BufferOptions{
                .size = sizeof(uint32_t),
                .usage = BufferUsageFlagBits::StorageBufferBit,
                .memoryUsage = MemoryUsage::CpuToGpu // So we can map it to CPU address space
        });

        auto bufferData = m_frameCounterSSBO.map();
        std::memset(bufferData, 0, sizeof(uint32_t));
        m_frameCounterSSBO.unmap();
    }

Filename: bindgroup_indexing/bindgroup_indexing.cpp

A storage buffer (SSBO) tracks the frame count, which the vertex shader uses to compute the rotation angle.

Descriptor Set Layout with Variable Array:

    // Create bind group layout consisting of a:
    // - a binding holding an array of at most TransformsCount UBOs
    // - a binding holding an SSBO
    const BindGroupLayoutOptions transformsBindGroupLayoutOptions = {
        .bindings = {
                {
                        .binding = 0,
                        .count = TransformsCount,
                        .resourceType = ResourceBindingType::UniformBuffer,
                        .shaderStages = ShaderStageFlagBits::VertexBit,
                        // As far as the shader is concerned, it has no idea how many UBOs are in the array
                        .flags = { ResourceBindingFlagBits::VariableBindGroupEntriesCountBit },
                },
        }
    };
    const BindGroupLayoutOptions ssboBindGroupLayoutOptions = {
        .bindings = {
                {
                        .binding = 0,
                        .resourceType = ResourceBindingType::StorageBuffer,
                        .shaderStages = ShaderStageFlagBits::VertexBit,
                },
        }
    };
    const BindGroupLayout transformsBindGroupLayout = m_device.createBindGroupLayout(transformsBindGroupLayoutOptions);
    const BindGroupLayout ssboBindGroupLayout = m_device.createBindGroupLayout(ssboBindGroupLayoutOptions);

    m_transformCountPushConstant = PushConstantRange{
        .offset = 0,
        .size = sizeof(uint32_t),
        .shaderStages = ShaderStageFlags(ShaderStageFlagBits::VertexBit)
    };

Filename: bindgroup_indexing/bindgroup_indexing.cpp

Key configuration:

  • count = TransformsCount: Upper bound on the number of buffers in the array
  • VariableBindGroupEntriesCountBit: Variable-length array in shader
  • Shader can access the array as uniform Transform { ... } transforms[]

Creating Descriptor Array:

        // Create a bindGroup to hold the variable length array UBOs
        {
            BindGroupOptions bindGroupOptions = {
                .layout = transformsBindGroupLayout,
                .maxVariableArrayLength = TransformsCount,
            };

            bindGroupOptions.resources.reserve(TransformsCount);

            // Array of UBOs
            for (size_t i = 0; i < TransformsCount; ++i) {
                bindGroupOptions.resources.emplace_back(BindGroupEntry{
                        .binding = 0,
                        .resource = UniformBufferBinding{ .buffer = m_transformBuffers[i] },
                        .arrayElement = static_cast<uint32_t>(i),
                });
            }

            m_transformsBindGroup = m_device.createBindGroup(bindGroupOptions);
        }

Filename: bindgroup_indexing/bindgroup_indexing.cpp

A single createBindGroup call fills every element of the array: one BindGroupEntry per arrayElement, all targeting binding 0.

Storage Buffer Bind Group:

        // Create a bindGroup to hold the frameCounter SSBO
        {
            BindGroupOptions bindGroupOptions = {
                .layout = ssboBindGroupLayout,
                .resources = {
                        {
                                .binding = 0,
                                .resource = StorageBufferBinding{ .buffer = m_frameCounterSSBO },
                        },
                }
            };

            m_ssboBindGroup = m_device.createBindGroup(bindGroupOptions);
        }

Filename: bindgroup_indexing/bindgroup_indexing.cpp

Separate bind group for the frame counter buffer.

Rendering with Descriptor Arrays:

    opaquePass.setPipeline(m_pipeline);
    opaquePass.setVertexBuffer(0, m_buffer);
    opaquePass.setIndexBuffer(m_indexBuffer);
    // Push Constant
    opaquePass.pushConstant(m_transformCountPushConstant, &TransformsCount);
    // Bind bindGroups
    opaquePass.setBindGroup(0, m_transformsBindGroup);
    opaquePass.setBindGroup(1, m_ssboBindGroup);
    const DrawIndexedCommand drawCmd = { .indexCount = 3 };
    opaquePass.drawIndexed(drawCmd);

    opaquePass.end();

Filename: bindgroup_indexing/bindgroup_indexing.cpp

Bind the pipeline, the vertex and index buffers, and both bind groups, and pass TransformsCount as a push constant. Then draw; the vertex shader dynamically selects which transform to use based on the computed angle.

When it comes to our shader, we do a few things. We start by defining the extension:

#extension GL_EXT_nonuniform_qualifier : enable

Then we declare our bind groups and push constant blocks.

layout(set = 0, binding = 0) uniform Transform
{
    mat4 modelMatrix;
} transforms[];

layout(set = 1, binding = 0) coherent buffer FrameCounter
{
    uint primitiveProcessingCount;
} frameCounter;

layout(push_constant) uniform PushConstants {
    uint transformsCount;
} pushConstants;

And we can finally index into the transforms array with a non-uniform index.

uint frameIdx = (frameCounter.primitiveProcessingCount / 3);

float angle =  mod(float(frameIdx), 360.0); // value between 0 and 359
const float angleStep = 360.0 / float(pushConstants.transformsCount);

// Select the right index based on current angle and steps between transforms
// angle [0, 359] and angleStep (e.g 45)
const uint transformIdx = uint(angle / angleStep);

gl_Position = transforms[nonuniformEXT(transformIdx)].modelMatrix * vec4(vertexPosition, 1.0);
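
The index arithmetic in the shader can be checked on the CPU. The sketch below mirrors the GLSL above in C++, assuming std::fmod behaves like GLSL mod for the non-negative values used here; selectTransformIndex is a hypothetical helper name, not part of the example's source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// CPU-side mirror of the shader's index selection.
std::uint32_t selectTransformIndex(std::uint32_t primitiveProcessingCount,
                                   std::uint32_t transformsCount)
{
    assert(transformsCount > 0); // avoids a division by zero below

    const std::uint32_t frameIdx = primitiveProcessingCount / 3;

    // Whole-degree angle in the range [0, 359].
    const float angle = std::fmod(float(frameIdx), 360.0f);
    const float angleStep = 360.0f / float(transformsCount);

    // Because angle is below 360, the result stays below transformsCount.
    return std::uint32_t(angle / angleStep);
}
```

With 8 transforms the step is 45 degrees, so a frame index of 45 selects element 1 and a frame index of 360 wraps back to element 0. Since the angle never reaches 360, the computed index stays inside the bound array.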

Performance Notes

Benefits:

  • CPU: 10-100× reduction in descriptor set binds (major bottleneck in complex scenes)
  • CPU: Simplified render loop (bind once, draw many)
  • Memory: No need to duplicate descriptors per-object

Costs:

  • GPU: NonUniform indexing may reduce SIMD efficiency (divergent execution)
  • Memory: Every descriptor in the array must be valid unless descriptorBindingPartiallyBound is enabled
  • Cache: Scattered resource access may reduce cache hit rate

Best Practices:

  • Group draws by similar resource usage to improve cache coherency
  • Use partially bound arrays to avoid allocating unused descriptors
  • Profile: gains depend on CPU/GPU balance (CPU-bound benefits most)
  • Consider indirect draws for fully GPU-driven rendering

Hardware Considerations:

  • NVIDIA: Excellent support, minimal overhead
  • AMD: Good support, watch for subgroup divergence
  • Mobile: Varies; check vendor documentation
  • Intel: Good in recent GPUs


Updated on 2026-03-31 at 00:02:07 +0000