Render to Texture with Subpasses

This example shows how to use subpasses within a single render pass to perform multi-pass rendering (such as post-processing) while keeping intermediate data in on-chip tile memory.
Unlike Render to Texture, which uses separate render passes, subpasses let tile-based deferred renderers (TBDR) avoid expensive round trips to main memory.
This matters most on mobile GPUs, although desktop GPUs can benefit as well.
The example uses the KDGpuExample helper API for simplified setup.
Key Benefit: Subpasses keep intermediate data in fast on-chip memory instead of moving it through main memory. This can save 50-75% of memory bandwidth on mobile GPUs, depending on the workload.
Use cases: Mobile rendering, deferred shading, post-processing on mobile, multi-stage effects.
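To put the bandwidth figure in perspective, the following sketch estimates attachment traffic for one frame under a deliberately simple model. It assumes a two-stage effect, a 1920x1080 target with 4 bytes per pixel, and an intermediate attachment that never leaves tile memory when subpasses are used. All function names are made up for illustration and are not part of KDGpu. Real savings depend on the store operations, the driver, and everything else the frame does.

```cpp
#include <cstdint>

// Bytes moved to or from main memory for one full-screen attachment access.
constexpr std::uint64_t attachmentBytes(std::uint32_t width, std::uint32_t height,
                                        std::uint32_t bytesPerPixel)
{
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Separate render passes (assumed model): the intermediate image is stored,
// then read back by the post-process pass, and the final image is stored.
constexpr std::uint64_t separatePassTraffic(std::uint32_t width, std::uint32_t height,
                                            std::uint32_t bytesPerPixel)
{
    return 3 * attachmentBytes(width, height, bytesPerPixel);
}

// Subpasses on a tiler (idealized assumption): the intermediate image stays
// in tile memory, so only the final image is stored.
constexpr std::uint64_t subpassTraffic(std::uint32_t width, std::uint32_t height,
                                       std::uint32_t bytesPerPixel)
{
    return attachmentBytes(width, height, bytesPerPixel);
}

// Saving in whole percent, rounded down.
constexpr std::uint64_t savedPercent(std::uint64_t before, std::uint64_t after)
{
    return (before - after) * 100 / before;
}

// Worked example for a 1920x1080 RGBA8 target.
static_assert(attachmentBytes(1920, 1080, 4) == 8294400, "one full-screen access");
static_assert(savedPercent(separatePassTraffic(1920, 1080, 4),
                           subpassTraffic(1920, 1080, 4)) == 66,
              "two of three attachment transfers are avoided");
```

Under these assumptions the subpass version moves one third of the attachment data, a saving of roughly two thirds, which falls inside the 50-75% range given above.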
Render Pass Configuration
We define a render pass with two subpasses:
- Subpass 0: Renders the main scene (a rotating triangle) into an intermediate color attachment.
- Subpass 1: Performs a post-processing effect (grayscale/blur) by reading from the first attachment as an input attachment and writing to the final swapchain image.
| void RenderToTextureSubpass::createRenderPass()
{
    // Attachment 0: intermediate color attachment (color output of subpass 0, input attachment of subpass 1)
    // Attachment 1: swapchain color attachment used for presenting (color output of subpass 1)
    const std::vector<AttachmentDescription> attachmentDescriptions{
        AttachmentDescription{
                .format = m_colorFormat,
                .stencilLoadOperation = AttachmentLoadOperation::DontCare,
                .stencilStoreOperation = AttachmentStoreOperation::DontCare,
        },
        AttachmentDescription{
                .format = m_swapchainFormat,
                .stencilLoadOperation = AttachmentLoadOperation::DontCare,
                .stencilStoreOperation = AttachmentStoreOperation::DontCare,
                .finalLayout = TextureLayout::PresentSrc }
    };
    const std::vector<SubpassDescription> subpassDescriptions{
        SubpassDescription{
                .colorAttachmentReference = { { 0 } },
        },
        SubpassDescription{
                .inputAttachmentReference = { { 0 } },
                .colorAttachmentReference = { { 1 } },
        }
    };
    // The first dependency ensures that the previous render pass has finished before subpass 0 writes to attachment 0.
    // The second dependency ensures that subpass 1 waits for subpass 0 to finish writing attachment 0 before reading it.
    const std::vector<SubpassDependenciesDescriptions> dependencyDescriptions{
        SubpassDependenciesDescriptions{
                .srcSubpass = ExternalSubpass,
                .dstSubpass = 0,
                .dstStageMask = PipelineStageFlagBit::ColorAttachmentOutputBit,
                .dstAccessMask = AccessFlagBit::ColorAttachmentReadBit | AccessFlagBit::ColorAttachmentWriteBit },
        SubpassDependenciesDescriptions{
                .srcSubpass = 0,
                .dstSubpass = 1,
                .srcStageMask = PipelineStageFlagBit::ColorAttachmentOutputBit,
                .dstStageMask = PipelineStageFlagBit::ColorAttachmentOutputBit | PipelineStageFlagBit::FragmentShaderBit,
                .srcAccessMask = AccessFlagBit::ColorAttachmentWriteBit,
                .dstAccessMask = AccessFlagBit::InputAttachmentReadBit | AccessFlagBit::ColorAttachmentWriteBit | AccessFlagBit::ColorAttachmentReadBit,
        }
    };
    const RenderPassOptions renderPassInfo = {
        .attachments = attachmentDescriptions,
        .subpassDescriptions = subpassDescriptions,
        .subpassDependencies = dependencyDescriptions
    };
    m_renderPass = m_device.createRenderPass(renderPassInfo);
}
|
Filename: render_to_texture_subpass/render_to_texture_subpass.cpp
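The producer/consumer relationship between the two subpasses can be checked mechanically. The sketch below is a simplified stand-alone model, not KDGpu code: SubpassModel, kUnused, and inputsHaveProducers are hypothetical names, and each subpass is reduced to one optional input attachment and one color attachment. It verifies that every input attachment was written as a color attachment by an earlier subpass, which is the property this render pass relies on.

```cpp
#include <cstddef>

// Marker for "this subpass has no input attachment" (hypothetical, for this sketch only).
constexpr int kUnused = -1;

// Reduced model of a subpass: at most one input and exactly one color attachment index.
struct SubpassModel {
    int inputAttachment; // attachment index read as an input attachment, or kUnused
    int colorAttachment; // attachment index written as the color output
};

// Returns true when every input attachment is the color output of an earlier subpass.
template <std::size_t N>
constexpr bool inputsHaveProducers(const SubpassModel (&subpasses)[N])
{
    for (std::size_t consumer = 0; consumer < N; ++consumer) {
        const int input = subpasses[consumer].inputAttachment;
        if (input == kUnused)
            continue;
        bool produced = false;
        for (std::size_t producer = 0; producer < consumer; ++producer) {
            if (subpasses[producer].colorAttachment == input)
                produced = true;
        }
        if (!produced)
            return false;
    }
    return true;
}

// Mirrors the layout described above: subpass 0 writes attachment 0,
// subpass 1 reads attachment 0 and writes attachment 1.
constexpr SubpassModel kTwoSubpasses[] = {
    { kUnused, 0 },
    { 0, 1 },
};
static_assert(inputsHaveProducers(kTwoSubpasses), "subpass 1 reads what subpass 0 wrote");
```

Swapping the two entries would make the check fail, because the first subpass would then read an attachment that nothing has written yet.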
Subpass Dependencies
Dependencies are crucial here. We must specify that Subpass 1 depends on Subpass 0's writes to the color attachment. The second dependency uses PipelineStageFlagBit::ColorAttachmentOutputBit as the source stage and includes PipelineStageFlagBit::FragmentShaderBit in the destination stages, so the color data written by Subpass 0 is available before the post-process fragment shader reads it.
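One way to reason about such a dependency is as a set of bitmasks that must cover both sides of the hazard: the color write in the producing subpass and the input attachment read in the consuming one. The following sketch models that check in plain C++. The bit values, DependencyModel, and coversColorWriteToInputRead are assumptions made for this illustration; they are not the enum values or types of KDGpu or Vulkan.

```cpp
#include <cstdint>

// Hypothetical bit values for this sketch only.
namespace stage {
constexpr std::uint32_t FragmentShader = 1u << 0;
constexpr std::uint32_t ColorAttachmentOutput = 1u << 1;
} // namespace stage

namespace access {
constexpr std::uint32_t InputAttachmentRead = 1u << 0;
constexpr std::uint32_t ColorAttachmentRead = 1u << 1;
constexpr std::uint32_t ColorAttachmentWrite = 1u << 2;
} // namespace access

// Reduced model of one subpass dependency.
struct DependencyModel {
    std::uint32_t srcSubpass;
    std::uint32_t dstSubpass;
    std::uint32_t srcStageMask;
    std::uint32_t dstStageMask;
    std::uint32_t srcAccessMask;
    std::uint32_t dstAccessMask;
};

// True when every bit in `bits` is set in `mask`.
constexpr bool contains(std::uint32_t mask, std::uint32_t bits)
{
    return (mask & bits) == bits;
}

// True when the dependency orders color writes in `producer` before
// input attachment reads in the fragment shader of `consumer`.
constexpr bool coversColorWriteToInputRead(const DependencyModel &d,
                                           std::uint32_t producer,
                                           std::uint32_t consumer)
{
    return d.srcSubpass == producer && d.dstSubpass == consumer
            && contains(d.srcStageMask, stage::ColorAttachmentOutput)
            && contains(d.srcAccessMask, access::ColorAttachmentWrite)
            && contains(d.dstStageMask, stage::FragmentShader)
            && contains(d.dstAccessMask, access::InputAttachmentRead);
}

// Same masks as the second dependency in createRenderPass().
constexpr DependencyModel kSubpass0To1{
    0, 1,
    stage::ColorAttachmentOutput,
    stage::ColorAttachmentOutput | stage::FragmentShader,
    access::ColorAttachmentWrite,
    access::InputAttachmentRead | access::ColorAttachmentWrite | access::ColorAttachmentRead
};
static_assert(coversColorWriteToInputRead(kSubpass0To1, 0, 1),
              "subpass 1 may read attachment 0 after subpass 0 has written it");
```

Dropping InputAttachmentRead from the destination access mask, or FragmentShader from the destination stage mask, makes the check fail. In the real API, that kind of omission leaves the read unsynchronized.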
In the GLSL shader for Subpass 1, we access the data from Subpass 0 using a special subpassInput type instead of a standard sampler2D:
| layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput inputColor;
void main() {
    vec3 color = subpassLoad(inputColor).rgb;
    // apply post-processing...
}
|
For more information, see the Vulkan documentation on Subpasses.
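Because subpassLoad takes no coordinates, a fragment can read only the input attachment value at its own pixel location. The sketch below emulates that access pattern on the CPU with a per-pixel grayscale conversion. It is an illustration only: Rgb, luminance, grayscale, and postProcess are hypothetical names, and the Rec. 709 luma weights are an assumption, since the weights used by the example's shader are not shown here.

```cpp
#include <cstddef>
#include <vector>

struct Rgb {
    float r;
    float g;
    float b;
};

// Rec. 709 luma weights (assumed for this sketch).
constexpr float luminance(Rgb c)
{
    return 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
}

// Per-pixel effect: the result depends only on the texel at the same location.
constexpr Rgb grayscale(Rgb texel)
{
    return Rgb{ luminance(texel), luminance(texel), luminance(texel) };
}

// CPU stand-in for the second subpass: output pixel i reads only input pixel i,
// which mirrors what subpassLoad() permits.
inline std::vector<Rgb> postProcess(const std::vector<Rgb> &inputAttachment)
{
    std::vector<Rgb> output;
    output.reserve(inputAttachment.size());
    for (std::size_t i = 0; i < inputAttachment.size(); ++i)
        output.push_back(grayscale(inputAttachment[i]));
    return output;
}
```

An effect that needs neighboring pixels cannot be expressed through this access pattern alone, which is worth keeping in mind when deciding what to run in a subpass.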
Draw Call Submission
| void RenderToTextureSubpass::render()
{
    auto commandRecorder = m_device.createCommandRecorder();
    // Subpass 0: scene color pass
    auto opaquePass = commandRecorder.beginRenderPass(RenderPassCommandRecorderWithRenderPassOptions{
            .renderPass = m_renderPass.handle(),
            .attachments = {
                    {
                            .view = m_colorOutputView, // We always render to the color texture
                            .color = Attachment::ColorOperations{
                                    .clearValue = ColorClearValue{ 0.0f, 0.0f, 0.0f, 1.0f },
                            },
                    },
                    {
                            .view = m_swapchainViews.at(m_currentSwapchainImageIndex),
                            .color = Attachment::ColorOperations{
                                    .clearValue = ColorClearValue{ 0.3f, 0.3f, 0.3f, 1.0f },
                                    .layout = TextureLayout::ColorAttachmentOptimal,
                            },
                    },
            },
    });
    opaquePass.setPipeline(m_pipeline);
    opaquePass.setVertexBuffer(0, m_buffer);
    opaquePass.setIndexBuffer(m_indexBuffer);
    opaquePass.setBindGroup(0, m_transformBindGroup);
    opaquePass.drawIndexed(DrawIndexedCommand{ .indexCount = 3 });
    opaquePass.nextSubpass();
    // Subpass 1: post-process
    opaquePass.setPipeline(m_postProcessPipeline);
    opaquePass.setVertexBuffer(0, m_fullScreenQuad);
    opaquePass.setBindGroup(0, m_colorBindGroup);
    opaquePass.pushConstant(m_filterPosPushConstantRange, m_filterPosData.data());
    opaquePass.draw(DrawCommand{ .vertexCount = 4 });
    renderImGuiOverlay(&opaquePass, m_inFlightIndex, &m_renderPass, 1);
    opaquePass.end();
    // Finalize the command recording
    m_commandBuffer = commandRecorder.finish();
    const SubmitOptions submitOptions = {
        .commandBuffers = { m_commandBuffer },
        .waitSemaphores = { m_presentCompleteSemaphores[m_inFlightIndex] },
        .signalSemaphores = { m_renderCompleteSemaphores[m_currentSwapchainImageIndex] }
    };
    m_queue.submit(submitOptions);
}
|
Filename: render_to_texture_subpass/render_to_texture_subpass.cpp
Compare this to Render to Texture, where the two passes are recorded as separate beginRenderPass() / end() pairs:
- In render_to_texture, the scene pass and the post-process pass are each begun independently with commandRecorder.beginRenderPass(...), and an explicit commandRecorder.textureMemoryBarrier(...) is inserted between them to transition the intermediate texture from ColorAttachmentOptimal to ShaderReadOnlyOptimal.
- Here, all attachments, including both the intermediate color texture and the final swapchain image, are declared up front in a single beginRenderPass call. Between the two subpasses, opaquePass.nextSubpass() is called instead of ending and restarting the render pass.
- No explicit barrier is needed between the subpasses. The SubpassDependenciesDescriptions declared in createRenderPass() tell the driver exactly when and how to synchronize access to attachment 0, so it can keep the data in tile memory rather than flushing it to main memory.
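The recording order used here follows a simple rule from Vulkan: a render pass with N subpasses must advance exactly N - 1 times before it ends. The sketch below captures that rule as a tiny state model. RecordingModel and its helper functions are hypothetical names for illustration and do not correspond to KDGpu types.

```cpp
#include <cstdint>

// Reduced model of a render pass being recorded.
struct RecordingModel {
    std::uint32_t subpassCount;   // subpasses declared in the render pass
    std::uint32_t currentSubpass; // index of the subpass being recorded
};

// State right after beginRenderPass(): recording starts in subpass 0.
constexpr RecordingModel begin(std::uint32_t subpassCount)
{
    return RecordingModel{ subpassCount, 0 };
}

// nextSubpass() is only valid while a later subpass exists.
constexpr bool canAdvance(RecordingModel r)
{
    return r.currentSubpass + 1 < r.subpassCount;
}

// State after a valid nextSubpass() call.
constexpr RecordingModel nextSubpass(RecordingModel r)
{
    return RecordingModel{ r.subpassCount, r.currentSubpass + 1 };
}

// end() is only valid in the last subpass.
constexpr bool canEnd(RecordingModel r)
{
    return r.currentSubpass + 1 == r.subpassCount;
}

// With two subpasses: one nextSubpass() call, then end().
static_assert(canAdvance(begin(2)), "subpass 0 is followed by subpass 1");
static_assert(!canEnd(begin(2)), "the pass cannot end in subpass 0");
static_assert(canEnd(nextSubpass(begin(2))), "the pass ends in subpass 1");
```

This matches render(), which calls nextSubpass() once between the scene draw and the post-process draw and then calls end().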
Updated on 2026-03-31 at 00:02:07 +0000