Order Independent Transparency (OIT) with Compute¶

This example solves the transparency rendering problem where traditional alpha blending requires sorting objects back-to-front. Order-independent transparency (OIT) allows rendering transparent objects in any order by storing all fragments per-pixel in a linked list, then sorting and blending them in a final pass. This technique combines fragment shaders to build linked lists, storage buffers for fragment data, and compute shaders for particle updates.

The example uses the KDGpuExample helper API for simplified setup.

Key technique: Per-pixel linked list with atomic operations for lock-free fragment insertion, followed by in-shader sorting and blending.

Use cases: Complex transparent scenes (particles, glass), effects, volumetric rendering, avoiding CPU sort overhead.

The Approach¶

This example uses a per-pixel linked list:

Particle Update: A compute shader updates particle positions each frame.
Clear: The linked list buffer and the "head pointer" image are cleared to prepare for the new frame.
Fragment Storage: Transparent objects are rendered. Their fragment shader uses atomic operations to insert fragment data (color, depth, next pointer) into a global linked list storage buffer and updates a 2D image storing the head-node index for each pixel.
Compositing: A full-screen pass reads the linked list for each pixel, sorts the fragments by depth, and blends them front-to-back into the final image.
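
The compositing step has to sort because alpha blending is order dependent. The sketch below is a CPU-side illustration only, not code from the example: Color, mix and blendInOrder are hypothetical names, and mix reproduces the formula of GLSL's mix(x, y, a). It blends the same two half-transparent fragments in both orders and produces two different colors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// RGBA color; a stand-in for GLSL's vec4 (hypothetical helper type).
using Color = std::array<float, 4>;

// Component-wise linear interpolation, same formula as GLSL mix(x, y, a).
Color mix(const Color &x, const Color &y, float a)
{
    assert(a >= 0.0f && a <= 1.0f); // alpha is expected in [0, 1]
    Color out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = x[i] * (1.0f - a) + y[i] * a;
    return out;
}

// Blend fragments in the order given, starting from transparent black.
Color blendInOrder(const std::vector<Color> &fragments)
{
    Color result{ 0.0f, 0.0f, 0.0f, 0.0f };
    for (const Color &f : fragments)
        result = mix(result, f, f[3]);
    return result;
}
```

With red = (1, 0, 0, 0.5) and blue = (0, 0, 1, 0.5), blendInOrder({red, blue}) gives (0.25, 0, 0.5, 0.375) while blendInOrder({blue, red}) gives (0.5, 0, 0.25, 0.375): the fragment blended last is weighted most heavily, so the blend order has to follow the depth order.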

Data Structures¶

Each particle carries position, velocity, and color. The RGBA color's alpha channel controls transparency for all spheres.

struct ParticleData {
    glm::vec4 positionAndRadius;
    glm::vec4 velocity;
    glm::vec4 color;
};
static_assert(sizeof(ParticleData) == 12 * sizeof(float));

Filename: compute_oit_transparency/compute_oit_transparency.cpp

Each fragment stored in the linked list contains its color, depth, and the index of the next node. The buffer is sized dynamically as described in [Alpha Pass (Linked List Generation)].

    struct FragmentInfo {
        glm::vec4 color;
        float depth;
        int32_t next;
        float _pad[2];
    };
    static_assert(sizeof(FragmentInfo) == 8 * sizeof(float));

Filename: compute_oit_transparency/compute_oit_transparency.cpp
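
The explicit padding matters because the shader reads the same bytes through a std430 block, where a struct containing a vec4 is aligned and padded to a multiple of 16 bytes. The following sketch spells out the byte offsets on the host side. It assumes the GLSL AlphaFragment struct (not shown on this page) declares color, depth and next in the same order, and it uses a hypothetical Vec4 stand-in so that it compiles without glm.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for glm::vec4 so the sketch needs no external headers (assumption).
struct Vec4 {
    float x, y, z, w;
};

// Mirrors the FragmentInfo struct shown above.
struct FragmentInfo {
    Vec4 color;        // bytes 0..15
    float depth;       // bytes 16..19
    std::int32_t next; // bytes 20..23
    float _pad[2];     // bytes 24..31, rounds the stride up to 32 bytes
};

static_assert(offsetof(FragmentInfo, color) == 0, "color starts the struct");
static_assert(offsetof(FragmentInfo, depth) == 16, "depth follows the vec4");
static_assert(offsetof(FragmentInfo, next) == 20, "next follows depth");
static_assert(sizeof(FragmentInfo) == 32, "stride is a multiple of 16 bytes");
```

Without _pad the C++ struct would be 24 bytes, and element N on the host would no longer line up with element N as seen by the shader.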

Initialization¶

The example initializes several nested structures to manage the different passes: m_particles, m_alpha, m_compositing, m_cubeMesh, m_sphereMesh, and m_global.

Particle Simulation¶

A compute shader updates particle positions every frame. The storage buffer binding layout uses a single SSBO bound at stage ComputeBit. A specialization constant is used to bake the local workgroup X size (256) into the shader at pipeline-creation time, matching the dispatch calculation ParticlesCount / 256:

    auto initializeComputePipeline = [this]() -> void {
        // Create a compute shader (spir-v only for now)
        auto computeShaderPath = KDGpuExample::assetDir().file("shaders/examples/compute_oit_transparency/particles.comp.spv");
        auto computeShader = m_device.createShaderModule(KDGpuExample::readShaderFile(computeShaderPath));

        // Create bind group layout consisting of a single binding holding a SSBO
        m_particles.bindGroupLayout = m_device.createBindGroupLayout(BindGroupLayoutOptions{
                .bindings = {
                        {
                                .binding = 0,
                                .resourceType = ResourceBindingType::StorageBuffer,
                                .shaderStages = ShaderStageFlags(ShaderStageFlagBits::ComputeBit),
                        },
                },
        });

        // Create a pipeline layout (array of bind group layouts)
        m_particles.computePipelineLayout = m_device.createPipelineLayout(PipelineLayoutOptions{
                .bindGroupLayouts = { m_particles.bindGroupLayout } });

        // Create a bind group that exposes the particle data SSBO to the compute shader
        m_particles.particleBindGroup = m_device.createBindGroup(BindGroupOptions{
                .layout = m_particles.bindGroupLayout,
                .resources = {
                        {
                                .binding = 0,
                                .resource = StorageBufferBinding{ .buffer = m_particles.particleDataBuffer },
                        },
                },
        });

        m_particles.computePipeline = m_device.createComputePipeline(ComputePipelineOptions{
                .layout = m_particles.computePipelineLayout,
                .shaderStage = {
                        .shaderModule = computeShader,
                        // Use a specialization constant to set the local X workgroup size
                        .specializationConstants = {
                                {
                                        .constantId = 0,
                                        .value = 256,
                                },
                        },
                },
        });
    };
    initializeComputePipeline();

Filename: compute_oit_transparency/compute_oit_transparency.cpp

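Each frame, the compute pass dispatches ParticlesCount / 256 workgroups. Because this is an integer division, it covers every particle only when ParticlesCount is a multiple of the local workgroup size: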

        // Particles
        {
            auto computePass = commandRecorder.beginComputePass();
            computePass.setPipeline(m_particles.computePipeline);
            computePass.setBindGroup(0, m_particles.particleBindGroup);
            constexpr size_t LocalWorkGroupXSize = 256;
            computePass.dispatchCompute(ComputeCommand{ .workGroupX = ParticlesCount / LocalWorkGroupXSize });
            computePass.end();
        }

Filename: compute_oit_transparency/compute_oit_transparency.cpp
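
The dispatch above relies on integer division. If the particle count were not a multiple of 256, the remainder would be skipped. A common alternative is to round the workgroup count up, as in this small sketch; workGroupCount is a hypothetical helper, not part of KDGpu or of the example.

```cpp
#include <cassert>
#include <cstdint>

// Number of workgroups needed to cover itemCount invocations, rounding up.
// Written as quotient plus remainder check so it cannot overflow.
constexpr std::uint32_t workGroupCount(std::uint32_t itemCount, std::uint32_t localSize)
{
    assert(localSize > 0);
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}
```

For example, workGroupCount(1000, 256) is 4 where plain division gives 3. When rounding up, the last workgroup contains invocations past the end of the data, so the compute shader would need to return early for those indices; whether particles.comp does so is not shown here.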

Alpha Pass (Linked List Generation)¶

The alpha pass bind group layout exposes two fragment-stage bindings: a StorageBuffer for the per-pixel linked list nodes, and a StorageImage (R32_UINT) for the head-pointer texture:

    m_alpha.alphaBindGroupLayout = m_device.createBindGroupLayout(BindGroupLayoutOptions{
            .bindings = {
                    {
                            .binding = 0,
                            .resourceType = ResourceBindingType::StorageBuffer,
                            .shaderStages = ShaderStageFlags(KDGpu::ShaderStageFlagBits::FragmentBit),
                    },
                    {
                            .binding = 1,
                            .resourceType = ResourceBindingType::StorageImage,
                            .shaderStages = ShaderStageFlags(KDGpu::ShaderStageFlagBits::FragmentBit),
                    },
            },
    });

Filename: compute_oit_transparency/compute_oit_transparency.cpp

On the shader side, those same two bindings are declared as a coherent SSBO (with an atomic counter in the first field) and a coherent uimage2D:

layout(std430, set = 0, binding = 0) coherent buffer AlphaFragments
{
    uint nextIdx;
    vec3 _pad;
    AlphaFragment fragments[];
}
alphaFragments;

layout(set = 0, binding = 1, r32ui) uniform coherent uimage2D alphaHeadPointer;

Filename: compute_oit_transparency/alpha.frag

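Both resources are sized to the window dimensions and (re)created on every resize. The node pool is budgeted at 8 transparent fragments per pixel on average; because all pixels share one pool, a single pixel can hold more than 8 fragments as long as the total fits. The buffer is padded by one vec4 to store the global atomic counter at offset 0: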

    // Recreate fragmentHeadsPointer texture
    m_alpha.fragmentHeadsPointer = m_device.createTexture(KDGpu::TextureOptions{
            .label = "fragmentHeadPointers",
            .type = KDGpu::TextureType::TextureType2D,
            .format = KDGpu::Format::R32_UINT,
            .extent = { std::max(m_window->width(), uint32_t(1)), std::max(m_window->height(), uint32_t(1)), 1 },
            .mipLevels = 1,
            .usage =
                    KDGpu::TextureUsageFlagBits::TransferDstBit | KDGpu::TextureUsageFlagBits::StorageBit,
            .memoryUsage = KDGpu::MemoryUsage::GpuOnly,
    });
    m_alpha.fragmentHeadsPointerView = m_alpha.fragmentHeadsPointer.createView(KDGpu::TextureViewOptions{
            .label = "fragmentHeadPointersView",
            .range = {
                    .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                    .levelCount = 1,
            },
    });
    m_alpha.fragmentHeadsPointerLayout = TextureLayout::Undefined;

Filename: compute_oit_transparency/compute_oit_transparency.cpp

    // Recreate fragmentsLinkedList SSBO
    const size_t MaxFragmentCount = std::max(m_window->width(), uint32_t(1)) * std::max(m_window->height(), uint32_t(1)) * 8;

    // vec4 to hold nextId + array of structs
    m_alpha.fragmentLinkedListBufferByteSize = sizeof(float) * 4 + MaxFragmentCount * sizeof(FragmentInfo);
    m_alpha.fragmentLinkedListBuffer = m_device.createBuffer(KDGpu::BufferOptions{
            .label = "FragmentSSBO",
            .size = m_alpha.fragmentLinkedListBufferByteSize,
            .usage = KDGpu::BufferUsageFlagBits::StorageBufferBit |
                    KDGpu::BufferUsageFlagBits::TransferDstBit,
            .memoryUsage = KDGpu::MemoryUsage::GpuOnly,
    });

Filename: compute_oit_transparency/compute_oit_transparency.cpp
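
To get a feel for the memory cost of this budget, the sizing formula can be evaluated on its own. The sketch below restates it with hypothetical names (fragmentListByteSize, fragmentCapacity and the three constants); the numbers 4, 8 and 8 come from the listing above and from sizeof(FragmentInfo). Unlike the listing, it widens to size_t before multiplying.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t HeaderBytes = 4 * sizeof(float);   // vec4 holding the counter
constexpr std::size_t FragmentBytes = 8 * sizeof(float); // sizeof(FragmentInfo)
constexpr std::size_t FragmentsPerPixel = 8;             // average budget per pixel

// Total SSBO size for a window of the given size (clamped to at least 1x1).
std::size_t fragmentListByteSize(std::uint32_t width, std::uint32_t height)
{
    const std::size_t w = std::max(width, std::uint32_t(1));
    const std::size_t h = std::max(height, std::uint32_t(1));
    return HeaderBytes + w * h * FragmentsPerPixel * FragmentBytes;
}

// Number of fragment records that fit behind the header.
std::size_t fragmentCapacity(std::size_t byteSize)
{
    assert(byteSize >= HeaderBytes);
    return (byteSize - HeaderBytes) / FragmentBytes;
}
```

At 1920x1080 this comes to 530,841,616 bytes, roughly 506 MiB, which is worth keeping in mind when raising the per-pixel budget or targeting high-resolution displays.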

In the alpha pass itself, the storage buffer and head pointer image are cleared, layout transitions and memory barriers are inserted to guarantee write visibility, and then the sphere and cube meshes are drawn using instanced rendering:

        // Alpha
        {
            // Wait for SSBO writes completion by ComputeShader
            commandRecorder.bufferMemoryBarrier(KDGpu::BufferMemoryBarrierOptions{
                    .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::ComputeShaderBit),
                    .srcMask = KDGpu::AccessFlagBit::ShaderWriteBit,
                    .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::VertexInputBit),
                    .dstMask = KDGpu::AccessFlagBit::VertexAttributeReadBit,
                    .buffer = m_particles.particleDataBuffer,
            });

            // Clear Fragment List SSBO
            commandRecorder.clearBuffer(KDGpu::BufferClear{
                    .dstBuffer = m_alpha.fragmentLinkedListBuffer,
                    .byteSize = m_alpha.fragmentLinkedListBufferByteSize,
            });

            // Transition fragmentHeadsPointer to general layout if needed
            if (m_alpha.fragmentHeadsPointerLayout == KDGpu::TextureLayout::Undefined) {
                commandRecorder.textureMemoryBarrier(KDGpu::TextureMemoryBarrierOptions{
                        .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::TopOfPipeBit),
                        .srcMask = KDGpu::AccessFlagBit::None,
                        .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::TransferBit),
                        .dstMask =
                                KDGpu::AccessFlagBit::TransferWriteBit | KDGpu::AccessFlagBit::TransferReadBit,
                        .oldLayout = KDGpu::TextureLayout::Undefined,
                        .newLayout = KDGpu::TextureLayout::General,
                        .texture = m_alpha.fragmentHeadsPointer,
                        .range = {
                                .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                                .levelCount = 1,
                        },
                });
                m_alpha.fragmentHeadsPointerLayout = KDGpu::TextureLayout::General;
            }

            // Clear Fragment Head Texture Image
            commandRecorder.clearColorTexture(KDGpu::ClearColorTexture{
                    .texture = m_alpha.fragmentHeadsPointer,
                    .layout = KDGpu::TextureLayout::General,
                    .clearValue = {
                            .uint32 = { 0, 0, 0, 0 },
                    },
                    .ranges = {
                            {
                                    .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                                    .levelCount = 1,
                            },
                    },
            });

            // Wait until fragments SSBO has been cleared
            commandRecorder.bufferMemoryBarrier(KDGpu::BufferMemoryBarrierOptions{
                    .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::TransferBit),
                    .srcMask = KDGpu::AccessFlagBit::TransferWriteBit,
                    .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .dstMask = KDGpu::AccessFlagBit::ShaderWriteBit | KDGpu::AccessFlagBit::ShaderReadBit,
                    .buffer = m_alpha.fragmentLinkedListBuffer,
            });

            // Wait until fragments SSBO Heads pointer image has been cleared
            commandRecorder.textureMemoryBarrier(KDGpu::TextureMemoryBarrierOptions{
                    .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::TransferBit),
                    .srcMask = KDGpu::AccessFlagBit::TransferWriteBit,
                    .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .dstMask = KDGpu::AccessFlagBit::ShaderWriteBit | KDGpu::AccessFlagBit::ShaderReadBit,
                    .oldLayout = KDGpu::TextureLayout::General,
                    .newLayout = KDGpu::TextureLayout::General,
                    .texture = m_alpha.fragmentHeadsPointer,
                    .range = {
                            .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                            .levelCount = 1,
                    },
            });

            // Render Alpha meshes to fragment list
            auto alphaPass = commandRecorder.beginRenderPass(*m_alpha.renderPassOptions);

            // Draw Spheres
            alphaPass.setPipeline(m_sphereMesh.graphicsPipeline);
            alphaPass.setBindGroup(0, m_alpha.alphaLinkedListBindGroup);
            alphaPass.setBindGroup(1, m_global.cameraBindGroup);
            alphaPass.setVertexBuffer(0, m_sphereMesh.vertexBuffer);
            alphaPass.setVertexBuffer(1, m_particles.particleDataBuffer); // Per instance Data
            alphaPass.draw(DrawCommand{ .vertexCount = uint32_t(m_sphereMesh.vertexCount), .instanceCount = ParticlesCount });

            // Draw Cube
            alphaPass.setPipeline(m_cubeMesh.graphicsPipeline);
            alphaPass.setBindGroup(0, m_alpha.alphaLinkedListBindGroup);
            alphaPass.setBindGroup(1, m_global.cameraBindGroup);
            alphaPass.setVertexBuffer(0, m_cubeMesh.vertexBuffer);
            alphaPass.draw(DrawCommand{ .vertexCount = 36, .instanceCount = 1 });

            alphaPass.end();
        }

Filename: compute_oit_transparency/compute_oit_transparency.cpp

The corresponding fragment shader performs the lock-free insertion.

It atomically increments the global node counter to claim a slot.
It writes the fragment color and depth into that slot.
It atomically exchanges the head pointer for the current pixel with the new node index, which returns the previous head.
It stores that previous head in the new node's next field, so the new node points to the old head and the linked list grows one fragment at a time:

void main()
{
    // Get next free entry in fragments buffers
    // We treat 0 as the end of the linked list so we offset every value by 1
    uint nodeIdx = atomicAdd(alphaFragments.nextIdx, 1) + 1;

    // If we still have room in the fragments buffers
    if (nodeIdx < alphaFragments.fragments.length()) {

        // Insert new fragment entry
        alphaFragments.fragments[nodeIdx].color = adsModel(color);
        alphaFragments.fragments[nodeIdx].depth = gl_FragCoord.z;

        // Update alphaHeadPointer to nodeIdx for current fragment
        uint previousHeadIdx =
                imageAtomicExchange(alphaHeadPointer, ivec2(gl_FragCoord.xy), nodeIdx);
        // Set next to previousHeadIdx (0 is considered as the ending index)
        alphaFragments.fragments[nodeIdx].next = previousHeadIdx;
    }
}

Filename: compute_oit_transparency/alpha.frag
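
The same insertion logic can be modelled on the CPU with std::atomic, which may help when reasoning about it outside a shader. This is a simplified, single-threaded sketch under stated assumptions: FragmentLists and Node are hypothetical names, fetch_add stands in for atomicAdd, exchange stands in for imageAtomicExchange, and only depth is stored per node.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {
    float depth = 0.0f;
    std::uint32_t next = 0; // 0 terminates the list
};

class FragmentLists
{
public:
    FragmentLists(std::size_t pixelCount, std::size_t capacity)
        : m_heads(pixelCount), m_nodes(capacity)
    {
        for (auto &head : m_heads)
            head.store(0); // same role as clearing the head-pointer image
    }

    // Returns the claimed node index, or 0 if the pool is exhausted.
    std::uint32_t insert(std::size_t pixel, float depth)
    {
        assert(pixel < m_heads.size());
        // Claim a slot; indices start at 1 because 0 means "end of list".
        const std::uint32_t idx = m_counter.fetch_add(1) + 1;
        if (idx >= m_nodes.size())
            return 0;
        m_nodes[idx].depth = depth;
        // Publish the node as the new head and learn the old head.
        const std::uint32_t previousHead = m_heads[pixel].exchange(idx);
        m_nodes[idx].next = previousHead;
        return idx;
    }

    // Walk one pixel's list from head to tail, collecting depths.
    std::vector<float> depths(std::size_t pixel) const
    {
        std::vector<float> out;
        for (std::uint32_t i = m_heads[pixel].load(); i != 0; i = m_nodes[i].next)
            out.push_back(m_nodes[i].depth);
        return out;
    }

private:
    std::atomic<std::uint32_t> m_counter{ 0 };
    std::vector<std::atomic<std::uint32_t>> m_heads;
    std::vector<Node> m_nodes;
};
```

Inserting depths 0.9 and then 0.1 at the same pixel yields the list 0.1, 0.9: the order reflects arrival, not depth, which is why a later sort is needed. Note also that next is written after the exchange, as in the shader; that is safe only because nothing traverses the list until a later pass.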

Compositing Pass¶

After the alpha pass, barriers ensure all fragment writes to the SSBO and head-pointer image are visible to the compositing fragment shader. A full-screen quad (6 vertices, no vertex buffer) reads the linked list and blends the sorted fragments into the swapchain image:

        // Compositing
        {
            // Wait until fragment Heads pointer image writes have been completed
            commandRecorder.textureMemoryBarrier(KDGpu::TextureMemoryBarrierOptions{
                    .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .srcMask = KDGpu::AccessFlagBit::ShaderWriteBit,
                    .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .dstMask = KDGpu::AccessFlagBit::ShaderReadBit,
                    .oldLayout = KDGpu::TextureLayout::General,
                    .newLayout = KDGpu::TextureLayout::General,
                    .texture = m_alpha.fragmentHeadsPointer,
                    .range = {
                            .aspectMask = KDGpu::TextureAspectFlagBits::ColorBit,
                            .levelCount = 1,
                    },
            });
            // Wait until fragment SSBO list writes have been completed
            commandRecorder.bufferMemoryBarrier(KDGpu::BufferMemoryBarrierOptions{
                    .srcStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .srcMask = KDGpu::AccessFlagBit::ShaderWriteBit,
                    .dstStages = KDGpu::PipelineStageFlags(KDGpu::PipelineStageFlagBit::FragmentShaderBit),
                    .dstMask = KDGpu::AccessFlagBit::ShaderReadBit,
                    .buffer = m_alpha.fragmentLinkedListBuffer,
            });

            // Render Compositing full screen quad to screen
            auto compositingPass = commandRecorder.beginRenderPass(RenderPassCommandRecorderOptions{
                    .colorAttachments = {
                            {
                                    .view = m_swapchainViews.at(m_currentSwapchainImageIndex),
                                    .clearValue = { 0.3f, 0.3f, 0.3f, 1.0f },
                                    .initialLayout = TextureLayout::Undefined,
                                    .finalLayout = TextureLayout::PresentSrc,
                            },
                    },
                    .depthStencilAttachment = {
                            .view = m_depthTextureView,
                    },
            });
            compositingPass.setPipeline(m_compositing.graphicsPipeline);
            compositingPass.setBindGroup(0, m_alpha.alphaLinkedListBindGroup);
            compositingPass.draw(DrawCommand{ .vertexCount = 6 });
            compositingPass.end();
        }

Filename: compute_oit_transparency/compute_oit_transparency.cpp

For each fragment of the full-screen quad, the compositing fragment shader traverses the per-pixel linked list.

It first reads the head pointer for the current fragment coordinates from the head-pointer image.
Since each alpha fragment stores a next pointer, it then follows the list and collects up to 32 fragment indices into a local array.
Next it runs a bubble sort on those indices by depth (front-to-back), then blends the sorted colors in order using mix() with each fragment's own alpha:

void main()
{
    // Retrieve all Alpha Fragments
    uint alphaHeadPtr = imageLoad(alphaHeadPointerTexture, ivec2(gl_FragCoord.xy)).r;
    const uint MAX_FRAGMENT_COUNT = 32;
    uint alphaFragmentIndices[MAX_FRAGMENT_COUNT];
    uint alphaFragmentIndexCount = 0;
    while (alphaHeadPtr > 0 && alphaFragmentIndexCount < MAX_FRAGMENT_COUNT) {
        alphaFragmentIndices[alphaFragmentIndexCount] = alphaHeadPtr;
        alphaHeadPtr = alphaFragments.fragments[alphaHeadPtr].next;
        ++alphaFragmentIndexCount;
    }

    // Sort Alpha Fragments by Depth (biggest depth first)
    if (alphaFragmentIndexCount > 1) {
        for (uint i = 0; i < alphaFragmentIndexCount - 1; i++) {
            for (uint j = 0; j < alphaFragmentIndexCount - i - 1; j++) {
                if (alphaDepth(alphaFragmentIndices[j]) > alphaDepth(alphaFragmentIndices[j + 1])) {
                    swap(alphaFragmentIndices[j], alphaFragmentIndices[j + 1]);
                }
            }
        }
    }

    vec4 blendedColor = vec4(0.0);
    for (uint i = 0; i < alphaFragmentIndexCount; i++) {
        vec4 alphaFragColor = alphaColor(alphaFragmentIndices[i]);
        blendedColor = mix(blendedColor, alphaFragColor, alphaFragColor.a);
    }

    fragColor = blendedColor;
}

Filename: compute_oit_transparency/compositing.frag
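
The gather and sort steps of this shader can be checked on the CPU with a small model. The sketch below is illustrative only: Node, MaxFragments and gatherSorted are hypothetical names, and the node array is a plain std::vector in which index 0 is reserved as the list terminator, matching the convention used by the shaders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Node {
    float depth;
    std::uint32_t next; // 0 terminates the list
};

constexpr std::size_t MaxFragments = 32;

// Follow next links from head, keeping at most MaxFragments indices,
// then bubble-sort those indices by ascending depth.
std::vector<std::uint32_t> gatherSorted(const std::vector<Node> &nodes, std::uint32_t head)
{
    std::vector<std::uint32_t> indices;
    while (head != 0 && indices.size() < MaxFragments) {
        assert(head < nodes.size());
        indices.push_back(head);
        head = nodes[head].next;
    }
    // Each outer pass moves the largest remaining depth to the end.
    for (std::size_t i = 0; i + 1 < indices.size(); ++i) {
        for (std::size_t j = 0; j + 1 < indices.size() - i; ++j) {
            if (nodes[indices[j]].depth > nodes[indices[j + 1]].depth)
                std::swap(indices[j], indices[j + 1]);
        }
    }
    return indices;
}
```

Sorting indices rather than whole records keeps the local array small. Fragments beyond the 32nd in a list are dropped without notice, and a full list costs at most 32 * 31 / 2 = 496 comparisons per pixel.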

Vulkan Requirements¶

Vulkan Version: 1.0+ (compute shaders are core)
Extensions: None required (atomic operations are core)
Features: Compute shader support, plus stores and atomic operations on storage buffers and storage images in the fragment stage (the fragmentStoresAndAtomics device feature)

See VkShaderStageFlagBits - Compute Shader and Atomic Operations
