Dynamic Uniform Buffer

This example shows how to use dynamic uniform buffer (UBO) bindings to render multiple objects without creating separate descriptor sets for each. By storing multiple transforms in one contiguous buffer and using dynamic offsets when binding, we can efficiently render many objects with minimal descriptor management overhead. This is essential for rendering scenes with hundreds or thousands of objects.

The example uses the KDGpuExample helper API for simplified setup.

Overview

What this example demonstrates:

  • Dynamic uniform buffer offsets for per-object data
  • Alignment requirements (minUniformBufferOffsetAlignment)
  • Single buffer with multiple transforms
  • Efficient descriptor reuse across draw calls
  • Instanced rendering alternative

Performance benefit:

  • One descriptor set shared by N objects (instead of N descriptor sets)
  • Reduced descriptor set switching overhead
  • Better cache locality with contiguous buffer
  • Minimal CPU-side per-object work

Vulkan Requirements

  • Vulkan Version: 1.0+
  • Extensions: None (dynamic UBOs are core functionality)
  • Device Limits: minUniformBufferOffsetAlignment (typically 16-256 bytes)

Key Concepts

Dynamic Uniform Buffers:

Vulkan provides two types of uniform buffer bindings:

  • Static UBO: Offset fixed when the descriptor set is written (vkUpdateDescriptorSets)
  • Dynamic UBO: Offset specified at bind time (vkCmdBindDescriptorSets)

Dynamic UBOs allow binding different regions of the same buffer without creating multiple descriptor sets. This is perfect for per-object data like transform matrices.

Alignment Requirements:

Dynamic UBO offsets must be a multiple of minUniformBufferOffsetAlignment, a device-specific power of two (commonly between 16 and 256 bytes). Even if your data is smaller than that (e.g., a 64-byte mat4), consecutive entries must be spaced by the aligned stride:

// Wrong: may crash or produce incorrect results
offset = objectIndex * sizeof(mat4);  // 64 bytes

// Correct: aligned to device requirement
offset = objectIndex * alignedSize;    // e.g., 256 bytes

Spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkPhysicalDeviceLimits.html
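The alignment rule above can be sketched as two small helpers. This is a minimal illustration, not code from the example: alignUp, isAligned, kMat4Size and kAssumedAlignment are hypothetical names, and the 256-byte alignment is an assumed value standing in for whatever the device reports at runtime.

```cpp
#include <cstddef>

// Hypothetical helper: rounds `size` up to the next multiple of `alignment`.
// Assumes `alignment` is a non-zero power of two, which holds for
// minUniformBufferOffsetAlignment.
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: true if `offset` is a multiple of `alignment`
// (again assuming a power-of-two alignment).
constexpr bool isAligned(std::size_t offset, std::size_t alignment)
{
    return (offset & (alignment - 1)) == 0;
}

constexpr std::size_t kMat4Size = 64;          // 16 floats
constexpr std::size_t kAssumedAlignment = 256; // assumed; query the real limit at runtime

// The naive offset of object 1 (64 bytes) is not a multiple of 256.
static_assert(!isAligned(1 * kMat4Size, kAssumedAlignment));

// The strided offset of object 1 (256 bytes) is.
static_assert(isAligned(1 * alignUp(kMat4Size, kAssumedAlignment), kAssumedAlignment));
```

Unlike std::max, rounding up this way also gives a valid stride when the data size is larger than the alignment but not a multiple of it.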

Use Cases:

  • Per-object transforms in a scene
  • Per-material parameters
  • Per-light data arrays
  • Skeletal animation bone matrices
  • Any homogeneous per-item data

Implementation

Computing Aligned Stride:

        // Retrieve minimum buffer offset alignment
        const size_t minDynamicUBOOffsetAlignment = m_device.adapter()->properties().limits.minUniformBufferOffsetAlignment;
        m_dynamicUBOByteStride = std::max(minDynamicUBOOffsetAlignment, sizeof(glm::mat4));

        const BufferOptions bufferOptions = {
            .size = entityCount * m_dynamicUBOByteStride,
            .usage = BufferUsageFlagBits::UniformBufferBit,
            .memoryUsage = MemoryUsage::CpuToGpu // So we can map it to CPU address space
        };
        m_transformDynamicUBOBuffer = m_device.createBuffer(bufferOptions);

Filename: dynamic_ubo/dynamic_ubo_triangles.cpp

Key points:

  • Query minUniformBufferOffsetAlignment from device limits
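  • Derive the stride from the data size and the alignment (std::max is sufficient here because both values are powers of two; in general, round the data size up to the next multiple of the alignment)
  • Allocate buffer with: objectCount * alignedStride
  • Padding bytes between entries are unavoidable and can outweigh the data itself when entries are small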

Example: With 256-byte alignment and 64-byte mat4:

  • Stride = 256 bytes
  • Waste = 192 bytes per object (75% wasted)
  • For 1000 objects: 256,000 bytes (250 KiB) total, 192,000 bytes (187.5 KiB) wasted
  • Still more efficient than 1000 descriptor sets!
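The arithmetic in this example can be reproduced with a few lines of code. The sketch below is illustrative only; PaddingStats and paddingStats are hypothetical names and not part of KDGpu.

```cpp
#include <cstddef>

// Hypothetical summary of how a strided buffer's bytes are spent.
struct PaddingStats {
    std::size_t totalBytes;  // size of the whole buffer
    std::size_t usedBytes;   // bytes that hold actual per-object data
    std::size_t wastedBytes; // padding between entries
};

// Assumes stride >= dataSize, as is the case for an aligned stride.
constexpr PaddingStats paddingStats(std::size_t objectCount,
                                    std::size_t dataSize,
                                    std::size_t stride)
{
    return { objectCount * stride,
             objectCount * dataSize,
             objectCount * (stride - dataSize) };
}
```

For 1000 objects, 64 bytes of data and a 256-byte stride, this yields 256,000 total bytes, of which 64,000 are used and 192,000 are padding.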

Bind Group Layout Configuration:

    // Create bind group layout consisting of a single binding holding a UBO
    const BindGroupLayoutOptions bindGroupLayoutOptions = {
        .bindings = {
                {
                        .binding = 0,
                        .resourceType = ResourceBindingType::DynamicUniformBuffer,
                        .shaderStages = ShaderStageFlags(ShaderStageFlagBits::VertexBit),
                },
        },
    };
    const BindGroupLayout bindGroupLayout = m_device.createBindGroupLayout(bindGroupLayoutOptions);

Filename: dynamic_ubo/dynamic_ubo_triangles.cpp

Use ResourceBindingType::DynamicUniformBuffer instead of UniformBuffer. This tells Vulkan that the buffer offset will be supplied each time the bind group is bound, rather than being fixed in the descriptor.

Bind Group with Dynamic Binding:

    const BindGroupOptions bindGroupOptions = {
        .layout = bindGroupLayout,
        .resources = {
                {
                        .binding = 0,
                        // We are dealing with a Dynamic UBO expected to hold a set of transform matrices.
                        // The size we specify for the binding is the size of a single entry in the buffer
                        .resource = DynamicUniformBufferBinding{
                                .buffer = m_transformDynamicUBOBuffer,
                                .size = uint32_t(m_dynamicUBOByteStride),
                        },
                },
        },
    };

Filename: dynamic_ubo/dynamic_ubo_triangles.cpp

The size field specifies the size of ONE entry (the aligned stride), not the entire buffer. Vulkan combines this size with the dynamic offset to determine the buffer region the shader sees.
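To make the relationship between the binding size and the dynamic offset concrete, here is a minimal sketch of the range computation. BoundRange, boundRange and fitsInBuffer are hypothetical names used only for illustration. The sketch assumes the visible region starts at the offset stored in the descriptor plus the dynamic offset, and spans the binding size.

```cpp
#include <cstdint>

// Hypothetical half-open byte range [begin, end) within the buffer.
struct BoundRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Region visible to the shader: the descriptor's base offset plus the
// dynamic offset supplied at bind time, extended by the binding size.
constexpr BoundRange boundRange(std::uint64_t descriptorOffset,
                                std::uint32_t dynamicOffset,
                                std::uint64_t bindingSize)
{
    const std::uint64_t begin = descriptorOffset + dynamicOffset;
    return { begin, begin + bindingSize };
}

// The bound region must lie entirely within the buffer.
constexpr bool fitsInBuffer(const BoundRange &range, std::uint64_t bufferSize)
{
    return range.end <= bufferSize;
}
```

With four entries at a 256-byte stride (a 1024-byte buffer), the last valid dynamic offset is 768; an offset of 1024 would place the range past the end of the buffer.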

Per-Frame Update:

    // Each frame we want to rotate the triangle a little
    static float angle = 0.0f;
    angle += 0.1f;
    if (angle > 360.0f)
        angle -= 360.0f;

    std::vector<uint8_t> rawTransformData(entityCount * m_dynamicUBOByteStride, 0U);

    // Update EntityCount matrices into the single buffer we have
    for (size_t i = 0; i < entityCount; ++i) {
        auto transform = glm::mat4(1.0f);
        transform = glm::translate(transform, glm::vec3(-0.7f + (i * 0.5f), 0.0f, 0.0f));
        transform = glm::scale(transform, glm::vec3(0.2f));
        transform = glm::rotate(transform, glm::radians(angle + (45.0f * i)), glm::vec3(0.0f, 0.0f, 1.0f));

        std::memcpy(rawTransformData.data() + (i * m_dynamicUBOByteStride), &transform, sizeof(glm::mat4));
    }

    auto *bufferData = m_transformDynamicUBOBuffer.map();
    std::memcpy(bufferData, rawTransformData.data(), rawTransformData.size());
    m_transformDynamicUBOBuffer.unmap();

Filename: dynamic_ubo/dynamic_ubo_triangles.cpp

Build all transforms in a CPU-side staging vector, writing each one at offset i * alignedStride, then map the buffer once, copy the whole vector across, and unmap.
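The strided layout of the staging vector can be tried out without a GPU. In the sketch below, Mat4 is a stand-in for glm::mat4 (same 64-byte size), and packStrided and readEntry are hypothetical helpers rather than anything provided by the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Mat4 = std::array<float, 16>; // stand-in for glm::mat4 (64 bytes)

// Copies each matrix to byte offset i * stride; padding bytes stay zero.
// Assumes stride >= sizeof(Mat4).
std::vector<std::uint8_t> packStrided(const std::vector<Mat4> &transforms,
                                      std::size_t stride)
{
    std::vector<std::uint8_t> bytes(transforms.size() * stride, 0U);
    for (std::size_t i = 0; i < transforms.size(); ++i)
        std::memcpy(bytes.data() + i * stride, transforms[i].data(), sizeof(Mat4));
    return bytes;
}

// Reads back the matrix stored for entry `index`.
Mat4 readEntry(const std::vector<std::uint8_t> &bytes,
               std::size_t index, std::size_t stride)
{
    Mat4 m{};
    std::memcpy(m.data(), bytes.data() + index * stride, sizeof(Mat4));
    return m;
}
```

Going through memcpy rather than casting the byte pointer to a matrix pointer keeps the code free of alignment and aliasing concerns on the CPU side.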

Rendering with Dynamic Offsets:

    for (size_t i = 0; i < entityCount; ++i) {
        // Bind Group and provide offset into the Dynamic UBO that holds all the transform matrices
        const uint32_t dynamicUBOOffset = i * m_dynamicUBOByteStride;
        opaquePass.setBindGroup(0, m_transformBindGroup, m_pipelineLayout, std::array{ dynamicUBOOffset });
        const DrawIndexedCommand drawCmd = { .indexCount = 3 };
        opaquePass.drawIndexed(drawCmd);
    }

Filename: dynamic_ubo/dynamic_ubo_triangles.cpp

The critical call is setBindGroup with the dynamic offset:

opaquePass.setBindGroup(0, m_transformBindGroup, m_pipelineLayout, std::array{ dynamicUBOOffset });

This selects which object's transform the shader sees, without changing descriptor sets.
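Vulkan takes dynamic offsets as 32-bit unsigned integers, while the stride in the example is a size_t. One way to keep the conversion explicit is to precompute the offsets with a range check. This is a sketch under that assumption; makeDynamicOffsets is a hypothetical helper, not part of KDGpu.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Builds one dynamic offset per entity (i * stride), rejecting any value
// that would not fit into the 32-bit offsets Vulkan expects.
std::vector<std::uint32_t> makeDynamicOffsets(std::size_t entityCount,
                                              std::size_t stride)
{
    std::vector<std::uint32_t> offsets;
    offsets.reserve(entityCount);
    for (std::size_t i = 0; i < entityCount; ++i) {
        const std::size_t offset = i * stride;
        if (offset > std::numeric_limits<std::uint32_t>::max())
            throw std::overflow_error("dynamic offset does not fit in 32 bits");
        offsets.push_back(static_cast<std::uint32_t>(offset));
    }
    return offsets;
}
```

Each element of the returned vector could then be passed as the single dynamic offset for the corresponding draw.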

Performance Notes

When to Use Dynamic UBOs:

  • Good: 10-10,000 objects with small per-object data (transforms, colors)
  • Good: Data updated frequently (every frame)
  • Bad: Huge per-object data (>256 bytes) - use storage buffers instead
  • Bad: Very sparse updates - static descriptor sets may be better

Memory Overhead:

  • Alignment waste can be significant for small data
  • For 64-byte data with 256-byte alignment: 75% wasted
  • For 256-byte data with 256-byte alignment: 0% wasted
  • Pack multiple data items to reduce waste
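As a sketch of the packing idea, several per-object values can share one entry so that more of each stride is used. The struct and its field names below are hypothetical, and the matching shader uniform block would have to declare the same members in the same order. Every member here is a multiple of 16 bytes, which is assumed to keep the C++ layout in step with a std140 block.

```cpp
#include <cstddef>

// Hypothetical per-object entry combining several values in one slot.
struct PerObjectData {
    float model[16];         // 64 bytes: model matrix
    float normalMatrix[16];  // 64 bytes: stored as a full 4x4 matrix
    float baseColor[4];      // 16 bytes
    float materialParams[4]; // 16 bytes
};
static_assert(sizeof(PerObjectData) == 160, "unexpected padding in PerObjectData");

// Fraction of each stride that is padding. Assumes stride >= dataSize.
constexpr double wastedFraction(std::size_t dataSize, std::size_t stride)
{
    return static_cast<double>(stride - dataSize) / static_cast<double>(stride);
}
```

With a 256-byte stride, a lone 64-byte matrix wastes 75% of each entry, while the 160-byte struct above wastes 37.5%.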

CPU Performance:

  • Typically much cheaper than allocating, updating and binding N descriptor sets
  • One descriptor set rebound with a different offset per draw, instead of N separate sets
  • Reduces driver validation overhead

GPU Performance:

  • Contiguous buffer improves cache locality
  • Uniform buffer access is very fast
  • Typically no measurable difference from static UBOs

Alternatives:

  • Push Constants: For very small data (the spec only guarantees 128 bytes), fastest but limited size
  • Storage Buffers: For large/variable-size data, indexed in the shader so no per-entry offset padding, but access can be slower on some hardware
  • Instanced Rendering: For identical geometry with per-instance data
  • Indirect Drawing: For GPU-driven rendering with buffers

Updated on 2026-03-31 at 00:02:07 +0000