Render to Texture (Dynamic Rendering with Local Read)

Overview

Traditional Vulkan requires creating VkRenderPass and VkFramebuffer objects to define multi-pass rendering. These API objects are verbose to create and inflexible: changing attachment formats or load/store operations requires rebuilding the entire render pass. Vulkan 1.3 promoted dynamic rendering (originally VK_KHR_dynamic_rendering) to core, which eliminates these objects by specifying attachments directly in the command buffer.

However, dynamic rendering initially lacked a critical feature: the ability to read from attachments within the same render pass (like subpass input attachments). The VK_KHR_dynamic_rendering_local_read extension (and VK_EXT_shader_tile_image for mobile) fills this gap. You can now render to an attachment in one draw call and read it back at the same pixel location in a subsequent draw call, all within a single dynamic render pass.

This example demonstrates a two-pass render-to-texture pipeline:

- Scene pass: Render scene geometry to an offscreen color attachment
- Post-processing pass: Read that color attachment (the output of the scene pass) as an input attachment, apply a post-processing effect and render to the final color attachment

Both passes occur within one dynamic rendering scope, avoiding the overhead of separate render passes. On tile-based GPUs (mobile, Apple Silicon), attachments can remain in on-chip tile memory between passes, preserving the performance benefits of traditional subpasses.

Vulkan Requirements

Vulkan Version: 1.3 (for dynamic rendering core support)
Required Extensions:
- VK_KHR_dynamic_rendering_local_read (enables reading attachments during dynamic rendering)
- Alternatively: VK_EXT_shader_tile_image (mobile-focused tile image extension)
Required Features:
- dynamicRendering: Enables render pass-free rendering
- dynamicRenderingLocalRead: Allows reading from local attachments within dynamic render pass
Dynamic Rendering: Begin the render pass with RenderPassCommandRecorderWithDynamicRenderingOptions
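
As a rough sketch of how an application might choose between the two extensions listed above, the helper below scans a list of extension names such as the one reported by vkEnumerateDeviceExtensionProperties. The enum LocalReadPath and the function selectLocalReadPath are hypothetical names invented for this illustration; they are not part of KDGpu or Vulkan, and the matching feature (dynamicRenderingLocalRead) still has to be enabled at device creation.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical result type: which local-read mechanism is usable.
enum class LocalReadPath { None, DynamicRenderingLocalRead, ShaderTileImage };

// Picks the preferred extension from the names the device advertises.
// In a real application the names would come from
// vkEnumerateDeviceExtensionProperties(); here they are passed in directly.
template<std::size_t N>
constexpr LocalReadPath selectLocalReadPath(const std::array<std::string_view, N> &available)
{
    bool hasTileImage = false;
    for (const std::string_view name : available) {
        if (name == "VK_KHR_dynamic_rendering_local_read")
            return LocalReadPath::DynamicRenderingLocalRead; // preferred path
        if (name == "VK_EXT_shader_tile_image")
            hasTileImage = true; // remember the fallback
    }
    return hasTileImage ? LocalReadPath::ShaderTileImage : LocalReadPath::None;
}
```

The KHR extension is preferred here only because it is the one this example uses; an application targeting mobile hardware exclusively might reverse the priority.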

Key Concepts

Dynamic Rendering:

Instead of pre-creating VkRenderPass and VkFramebuffer, you begin rendering by directly specifying attachments:

    auto opaquePass = commandRecorder.beginRenderPass(RenderPassCommandRecorderWithDynamicRenderingOptions{
            .colorAttachments = {
                    {
                            // Offscreen Texture (Pass 1)
                            .view = m_colorOutputView, // We always render to the color texture
                            .clearValue = ColorClearValue{ 0.0f, 0.0f, 0.0f, 1.0f },
                            .layout = TextureLayout::DynamicLocalRead,
                    },
                    {
                            // Swapchain Output (Pass 2)
                            .view = m_swapchainViews.at(m_currentSwapchainImageIndex),
                            .clearValue = ColorClearValue{ 0.3f, 0.3f, 0.3f, 1.0f },
                            .layout = TextureLayout::ColorAttachmentOptimal,
                    },
            },
    });

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

This removes the VkRenderPass and VkFramebuffer setup code and makes render pass configuration data-driven.

Local Read:

Traditional subpasses allow a later subpass to read attachments written by an earlier one through input attachments (declared with input_attachment_index in GLSL).

Local read extends this to dynamic rendering. Mark an attachment with TextureLayout::DynamicLocalRead layout and bind it as an input attachment for subsequent draws within the same render pass:

    const BindGroupOptions bindGroupOptions = {
        .layout = m_colorBindGroupLayout,
        .resources = {
                {
                        .binding = 0,
                        .resource = InputAttachmentBinding{
                                .textureView = m_colorOutputView,
                                .layout = TextureLayout::DynamicLocalRead,
                        },
                },
        },
    };
    m_colorBindGroup = m_device.createBindGroup(bindGroupOptions);

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

On-Tile Memory (TBDR GPUs):

Tile-based deferred rendering (TBDR) GPUs (ARM Mali, Qualcomm Adreno, Apple GPUs) render to on-chip "tile memory" before writing to main memory. Local reads keep data on-chip between passes, avoiding expensive round trips to main memory:

- Traditional subpasses: Keep data on-chip automatically
- Dynamic rendering + local read: Achieves the same optimization with a simpler API

For mobile VR and high-resolution rendering, this bandwidth saving is critical.

Shader Integration:

The scene pass fragment shader writes its output to layout(location = 0), which the pipeline's dynamicOutputLocations maps to color attachment 0:

layout(location = 0) in vec3 color;

layout(location = 0) out vec4 fragColor;

void main()
{
    fragColor = vec4(color, 1.0);
}

Filename: assets/shaders/examples/render_to_texture_subpass/rotating_triangle.frag

The post-process fragment shader declares its input using the standard subpassInput type, bound to input_attachment_index = 0.

This index corresponds to the remapping set up by dynamicInputLocations in the post-process pipeline — color attachment 0 is exposed as input attachment 0:

layout(location = 0) in vec2 texCoord;
layout(input_attachment_index = 0, binding = 0) uniform subpassInput inputColor;

Filename: assets/shaders/examples/render_to_texture_subpass/desaturate.frag

The input is read with subpassLoad, and the result is written to layout(location = 0) which the pipeline maps to color attachment 1 (the swapchain image):

void main()
{
    vec3 color = subpassLoad(inputColor).rgb;

    const float lineWidth = 0.001;
    if (texCoord.s > pushConstants.filterPosition + lineWidth) {
        float gray = luminance(color);
        fragColor = vec4(gray, gray, gray, 1.0);
    } else if (texCoord.s < pushConstants.filterPosition - lineWidth) {
        fragColor = vec4(color, 1.0);
    } else {
        fragColor = vec4(0.0, 0.0, 1.0, 1.0);
    }
}

Filename: assets/shaders/examples/render_to_texture_subpass/desaturate.frag
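
The shader above calls a luminance() helper that is not shown in the excerpt. As a CPU-side reference for the same per-pixel logic, the sketch below assumes Rec. 709 luma weights; the real shader may use different coefficients. The Rgb struct and the desaturate function are hypothetical names used only for this illustration.

```cpp
struct Rgb {
    float r, g, b;
};

// Assumed Rec. 709 weights; the shader's own luminance() is not shown in the source.
constexpr float luminance(Rgb c)
{
    return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
}

// Mirrors the branch structure of desaturate.frag for one pixel:
// right of the divider -> grayscale, left -> unchanged, on the divider -> blue line.
constexpr Rgb desaturate(Rgb color, float s, float filterPosition)
{
    constexpr float lineWidth = 0.001f;
    if (s > filterPosition + lineWidth) {
        const float gray = luminance(color);
        return Rgb{ gray, gray, gray };
    }
    if (s < filterPosition - lineWidth)
        return color;
    return Rgb{ 0.0f, 0.0f, 1.0f };
}
```

Each output pixel depends only on the input pixel at the same location, which is exactly the access pattern that local read supports.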

Implementation Details

Dynamic Rendering Configuration:

- Specify all attachments when beginning the render pass.
- Specify all attachments again when creating each pipeline, stating for each one whether the pipeline uses it and, if so, which shader input or output index it is bound to.
- BindGroups can use InputAttachmentBinding entries with layout DynamicLocalRead to read from input attachments. The DynamicLocalRead layout tells the driver this attachment will be read later in the same pass.

Graphics Pipeline Setup:

Pipelines must declare their attachment formats in dynamicRendering options (since no VkRenderPass provides this info). This replaces the renderPass field used in traditional pipelines.

Attachment Location and Input Remapping

Dynamic rendering with local read introduces a two-level remapping system that controls how color attachments are wired to shader inputs and outputs. Both levels must agree for the draw to be valid.

Level 1 — Pipeline declaration:

Each pipeline statically declares which attachments it reads from and writes to, baked into the pipeline object at creation time. This maps to VkRenderingAttachmentLocationInfoKHR and VkRenderingInputAttachmentIndexInfoKHR in the pNext chain.

- dynamicInputLocations.inputColorAttachments[i]: for each color attachment index i, declares whether it is exposed as an input attachment and, if so, which input attachment index (remappedIndex) the shader sees it under.
- dynamicOutputLocations.outputAttachments[i]: for each color attachment index i, declares whether the pipeline writes to it and, if so, which fragment shader output location (remappedIndex) drives it.

Setting .enabled = false marks an attachment as unused for that pipeline — the pipeline will neither read from nor write to it, even though the attachment exists in the dynamic render pass.
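
To make the direction of the mapping concrete, the sketch below models a pipeline's output declaration as an array indexed by color attachment, where each entry optionally names the fragment shader output location that drives it. AttachmentSlot and attachmentForOutput are hypothetical helpers written for this explanation, not KDGpu types; only the enabled/remappedIndex semantics are taken from the text above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical mirror of one entry in outputAttachments.
struct AttachmentSlot {
    bool enabled = false;
    std::uint32_t remappedIndex = 0;
};

// Returns the color attachment index driven by the given fragment output
// location, or std::nullopt if no enabled slot maps to it.
template<std::size_t N>
constexpr std::optional<std::size_t> attachmentForOutput(const std::array<AttachmentSlot, N> &slots,
                                                         std::uint32_t location)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (slots[i].enabled && slots[i].remappedIndex == location)
            return i;
    }
    return std::nullopt;
}
```

With slots { enabled, 0 }, { disabled } the lookup for location 0 yields attachment 0; with { disabled }, { enabled, 0 } the same shader output lands in attachment 1. The array index is always the attachment, and the stored value is always the shader-side location.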

Main scene pipeline declaration — no input attachments, frag output 0 writes only to color attachment 0, color attachment 1 unused:

    const GraphicsPipelineOptions pipelineOptions = {
        .label = "Main scene pipeline",
        .shaderStages = {
                { .shaderModule = vertexShader, .stage = ShaderStageFlagBits::VertexBit },
                { .shaderModule = fragmentShader, .stage = ShaderStageFlagBits::FragmentBit } },
        .layout = m_pipelineLayout,
        .vertex = {
                .buffers = {
                        { .binding = 0, .stride = sizeof(Vertex) },
                },
                .attributes = {
                        { .location = 0, .binding = 0, .format = Format::R32G32B32_SFLOAT }, // Position
                        { .location = 1, .binding = 0, .format = Format::R32G32B32_SFLOAT, .offset = sizeof(glm::vec3) }, // Color
                },
        },
        .renderTargets = {
                // We need to specify all our RenderTarget even if we will only target 1
                { .format = m_colorFormat },
                { .format = m_swapchainFormat },
        },
        .dynamicRendering = {
                .enabled = true, // Mark that we want to use it with dynamic rendering
                .dynamicInputLocations = DynamicInputAttachmentLocations{
                        // Specify that we have no input attachments
                        .inputColorAttachments = {
                                {
                                        .enabled = false,
                                },
                                {
                                        .enabled = false,
                                },
                        },
                },
                .dynamicOutputLocations = DynamicOutputAttachmentLocations{
                        // Specify that we want frag output[0] to write only to color attachment[0]
                        .outputAttachments = {
                                {
                                        .enabled = true,
                                        .remappedIndex = 0,
                                },
                                {
                                        .enabled = false,
                                },
                        },
                },
        },
    };
    m_pipeline = m_device.createGraphicsPipeline(pipelineOptions);

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

Post-process pipeline declaration — color attachment 0 is exposed as input attachment 0, frag output 0 writes only to color attachment 1, color attachment 0 output is unused:

    const GraphicsPipelineOptions pipelineOptions = {
        .label = "Post-process pipeline",
        .shaderStages = {
                { .shaderModule = vertexShader, .stage = ShaderStageFlagBits::VertexBit },
                { .shaderModule = fragmentShader, .stage = ShaderStageFlagBits::FragmentBit } },
        .layout = m_postProcessPipelineLayout,
        .vertex = {
                .buffers = { { .binding = 0, .stride = (3 + 2) * sizeof(float) } },
                .attributes = {
                        { .location = 0, .binding = 0, .format = Format::R32G32B32_SFLOAT }, // Position
                        { .location = 1, .binding = 0, .format = Format::R32G32_SFLOAT, .offset = 3 * sizeof(float) } // Texture coords
                },
        },
        .renderTargets = {
                // We need to specify all our RenderTarget even if we will only target 1
                { .format = m_colorFormat },
                { .format = m_swapchainFormat },
        },
        .primitive = { .topology = PrimitiveTopology::TriangleStrip },
        .dynamicRendering = {
                .enabled = true, // Mark that we want to use it with dynamic rendering
                .dynamicInputLocations = DynamicInputAttachmentLocations{
                        // Specify that we want color attachment[0] to be fed as input attachment[0]
                        .inputColorAttachments = {
                                {
                                        .enabled = true,
                                        .remappedIndex = 0,
                                },
                                {
                                        .enabled = false,
                                },
                        },
                },
                .dynamicOutputLocations = DynamicOutputAttachmentLocations{
                        // Specify that we want frag output[0] to write only to color attachment[1]
                        .outputAttachments = {
                                {
                                        .enabled = false,
                                },
                                {
                                        .enabled = true,
                                        .remappedIndex = 0,
                                },
                        },
                },
        },
    };
    m_postProcessPipeline = m_device.createGraphicsPipeline(pipelineOptions);

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

Level 2 — Per-draw command buffer state (vkCmdSet*):

When switching pipelines within the same render pass, the render pass instance must also be updated to match the new pipeline's declared mapping. This is done with setInputAttachmentMapping() and setOutputAttachmentMapping(), which call vkCmdSetRenderingInputAttachmentIndicesKHR and vkCmdSetRenderingAttachmentLocationsKHR respectively.

The array passed to each call mirrors the pipeline's declaration:

- setOutputAttachmentMapping(locations): locations[i] is the fragment shader output location that writes to color attachment i. Pass std::nullopt to mark attachment i as unused for output.
- setInputAttachmentMapping(inputIndices, ...): inputIndices[i] is the input attachment index that the shader reads color attachment i from. Pass std::nullopt to mark attachment i as unused for input.
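
Because both levels must agree, it can help to check the per-draw arrays against the pipeline declaration before recording. The sketch below compares a declaration with the std::optional array passed to the set*AttachmentMapping() calls. SlotDeclaration, toOptional and matchesDeclaration are hypothetical names for this illustration and do not exist in KDGpu.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for one pipeline-level entry (enabled + remappedIndex).
struct SlotDeclaration {
    bool enabled = false;
    std::uint32_t remappedIndex = 0;
};

// Converts one declaration entry into the form used at record time:
// the remapped index when enabled, std::nullopt when unused.
constexpr std::optional<std::uint32_t> toOptional(SlotDeclaration slot)
{
    if (slot.enabled)
        return slot.remappedIndex;
    return std::nullopt;
}

// True when every per-draw entry matches what the pipeline declared.
template<std::size_t N>
constexpr bool matchesDeclaration(const std::array<SlotDeclaration, N> &declared,
                                  const std::array<std::optional<std::uint32_t>, N> &recorded)
{
    for (std::size_t i = 0; i < N; ++i) {
        if (toOptional(declared[i]) != recorded[i])
            return false;
    }
    return true;
}
```

A check like this could run in debug builds just before setPipeline(), catching a stale mapping left over from the previous pass.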

Multi-Pass Execution:

Pass 1 — set state for main scene pipeline, render geometry into color attachment 0, leave attachment 1 untouched:

    std::array<std::optional<uint32_t>, 2> pass1InputIndices = { std::nullopt, std::nullopt }; // no input attachments
    std::array<std::optional<uint32_t>, 2> pass1OutputLocations = { 0, std::nullopt }; // frag output 0 -> attachment 0; attachment 1 unused
    opaquePass.setInputAttachmentMapping(pass1InputIndices, {}, {});
    opaquePass.setOutputAttachmentMapping(pass1OutputLocations);

    opaquePass.setPipeline(m_pipeline);
    opaquePass.setVertexBuffer(0, m_buffer);
    opaquePass.setIndexBuffer(m_indexBuffer);
    opaquePass.setBindGroup(0, m_transformBindGroup);
    opaquePass.drawIndexed(DrawIndexedCommand{ .indexCount = 3 });

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

Pass 2 — update state for post-process pipeline: color attachment 0 is now read as input attachment 0, frag output 0 now routes to color attachment 1:

    std::array<std::optional<uint32_t>, 2> pass2InputIndices = { 0, std::nullopt }; // input attachment 0 reads from attachment 0
    const std::array<std::optional<uint32_t>, 2> pass2OutputLocations = { std::nullopt, 0 }; // attachment 0 unused for output; frag output 0 -> attachment 1
    opaquePass.setInputAttachmentMapping(pass2InputIndices, {}, {});
    opaquePass.setOutputAttachmentMapping(pass2OutputLocations);

    opaquePass.setPipeline(m_postProcessPipeline);
    opaquePass.setVertexBuffer(0, m_fullScreenQuad);
    opaquePass.setBindGroup(0, m_colorBindGroup);
    opaquePass.pushConstant(m_filterPosPushConstantRange, m_filterPosData.data());
    opaquePass.draw(DrawCommand{ .vertexCount = 4 });

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

No nextSubpass() call needed—just update the mapping state and issue draws in sequence.

Memory Barrier (if needed):

Whenever a draw reads an attachment that an earlier draw in the same render pass wrote, the write and the read must be synchronized explicitly. Insert a pipeline barrier between the two draws:

    commandRecorder.memoryBarrier(MemoryBarrierOptions{
            .srcStages = PipelineStageFlagBit::ColorAttachmentOutputBit,
            .dstStages = PipelineStageFlagBit::FragmentShaderBit,
            .memoryBarriers = {
                    {
                            .srcMask = AccessFlagBit::ColorAttachmentWriteBit,
                            .dstMask = AccessFlagBit::InputAttachmentReadBit,
                    },
            },
    });

Filename: render_to_texture_subpass_dynamic_rendering/render_to_texture_subpass_dynamic_rendering.cpp

On tile-based architectures this dependency can typically be resolved per tile, without flushing the attachment to main memory.

Performance Notes

Tile Memory Efficiency: On mobile GPUs, local reads keep intermediate data on-chip:

- Without local read: G-Buffer written to RAM (bandwidth cost), then read back (bandwidth cost)
- With local read: G-Buffer stays in tile memory (~95% bandwidth saving)

For a 2048x2048 G-Buffer made of 3 textures at 8 bytes per pixel (for example RGBA16F), the G-Buffer occupies 96 MiB, so skipping the write and the read back saves roughly 192 MiB of bandwidth per frame.
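
The arithmetic behind these figures can be checked with a few constants. The sketch assumes 8 bytes per texel (for example RGBA16F), since that is the texel size for which three 2048x2048 textures total 96 MiB; the function names are illustrative only.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024 * 1024;

// Total size in bytes of a G-Buffer with the given dimensions.
constexpr std::uint64_t gBufferBytes(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t textureCount, std::uint64_t bytesPerTexel)
{
    return width * height * textureCount * bytesPerTexel;
}

// Main-memory traffic avoided per frame when the G-Buffer stays on-chip:
// one full write plus one full read back.
constexpr std::uint64_t savedBytesPerFrame(std::uint64_t bufferBytes)
{
    return 2 * bufferBytes;
}

// Assumed configuration: 2048 x 2048, 3 textures, 8 bytes per texel.
constexpr std::uint64_t kExampleBytes = gBufferBytes(2048, 2048, 3, 8);
static_assert(kExampleBytes / kMiB == 96, "G-Buffer size in MiB");
static_assert(savedBytesPerFrame(kExampleBytes) / kMiB == 192, "saved traffic in MiB");
```

At 60 frames per second the same assumptions give about 11 GiB/s of avoided traffic, which is why the saving matters most on bandwidth-constrained mobile parts.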

Desktop GPUs: Less dramatic but still beneficial:

- Improved cache locality
- Reduced command buffer overhead
- Driver-side optimization opportunities

API Simplicity: Dynamic rendering removes render pass and framebuffer object management, which can reduce CPU-side setup work in the application and the driver.

Best Practices:

- Use local read for multi-pass effects that only read the current pixel, such as deferred shading and per-pixel post-processing; effects that sample neighbouring pixels (SSAO, blur) still need a separate pass
- Profile bandwidth with GPU tools (Snapdragon Profiler, ARM Streamline, Xcode Instruments)
- Consider VK_EXT_shader_tile_image for explicit tile memory control on supported hardware
- Minimize attachment format changes mid-pass
