Offscreen Rendering to Image File
This example shows offscreen (headless) rendering where Vulkan renders to textures that are saved to disk rather than displayed on screen. Unlike windowed examples, this has no event loop, no swapchain, and no KDGpuExample helper - just pure GPU rendering to memory. This technique is essential for batch rendering, server-side rendering, thumbnails, baking, and automated testing. The example renders a large dataset of points (scatter plot visualization) with MSAA and saves the result.
Overview
What this demonstrates: Headless Vulkan initialization, rendering without swapchains, creating render targets manually, MSAA offscreen, reading back GPU results to CPU, saving images to disk, batch processing.
Use cases: Server-side rendering, thumbnail generation, texture baking, automated testing, batch visualization, screenshot tools.
Requirements
- Vulkan Version: 1.0+
- Extensions: None (pure compute/graphics)
- No window required
When the program starts, the first thing we do is fill a large vector with vertices representing the points we want to plot. These are generated by a function that we won't cover here. Vertices look like this:
```cpp
struct Vertex {
    glm::vec2 pos;
    glm::vec4 color;
};
```
Filename: offscreen_rendering/offscreen.h
We then construct the "Offscreen" object and call its initializeScene method.
createRenderTargets
The first part of initialization is the constructor, which creates a Vulkan API instance, selects a physical adapter, and creates a device and a queue from it:
```cpp
Offscreen::Offscreen()
    : m_api(std::make_unique<VulkanGraphicsApi>())
{
    m_instance = m_api->createInstance(InstanceOptions{
            .applicationName = "offscreen_rendering",
            .applicationVersion = KDGPU_MAKE_API_VERSION(0, 1, 0, 0) });

    auto adapter = m_instance.selectAdapter(AdapterDeviceType::Default);
    const auto adapterProperties = adapter->properties();
    SPDLOG_INFO("Using adapter: {}", adapterProperties.deviceName);

    // Create a device and grab the first queue
    m_device = adapter->createDevice(DeviceOptions{ .requestedFeatures = adapter->features() });
    m_queue = m_device.queues()[0];

    createRenderTargets();
}
```
Filename: offscreen_rendering/offscreen.cpp
The constructor also calls createRenderTargets, which is a long function that configures and creates all the textures we need, as well as an array of KDGpu::TextureMemoryBarrierOptions, the purpose of which will be shown later. First, let's look at the color texture initialization.
```cpp
// Create a color texture to use as our color render target
const TextureOptions msaaColorTextureOptions = {
    .type = TextureType::TextureType2D,
    .format = m_colorFormat,
    .extent = { m_width, m_height, 1 },
    .mipLevels = 1,
    .samples = m_samples,
    .usage = TextureUsageFlagBits::ColorAttachmentBit,
    .memoryUsage = MemoryUsage::GpuOnly
};
m_msaaColorTexture = m_device.createTexture(msaaColorTextureOptions);
m_msaaColorTextureView = m_msaaColorTexture.createView();

// Create a color texture to use as our resolve render target
const TextureOptions colorTextureOptions = {
    .type = TextureType::TextureType2D,
    .format = m_colorFormat,
    .extent = { m_width, m_height, 1 },
    .mipLevels = 1,
    .samples = SampleCountFlagBits::Samples1Bit,
    .usage = TextureUsageFlagBits::ColorAttachmentBit | TextureUsageFlagBits::TransferSrcBit,
    .memoryUsage = MemoryUsage::GpuOnly
};
m_colorTexture = m_device.createTexture(colorTextureOptions);
m_colorTextureView = m_colorTexture.createView();
```
Filename: offscreen_rendering/offscreen.cpp
We need two textures because we are using multisampling: the scene is rendered into the first texture at m_samples samples per pixel, and at the end of the render pass it is resolved (downsampled) into the second, single-sample texture. To use this, we simply configure the main pass with both views:
```cpp
auto renderPass = commandRecorder.beginRenderPass(KDGpu::RenderPassCommandRecorderOptions{
    .colorAttachments = {
        {
            .view = m_msaaColorTextureView,
            .resolveView = m_colorTextureView,
            .clearValue = { 0.3f, 0.3f, 0.3f, 1.0f },
        },
    },
    .depthStencilAttachment = {
        .view = m_depthTextureView,
    },
    .samples = m_samples,
});
```
Filename: offscreen_rendering/offscreen.cpp
For more information on multi-sampling with KDGpu, check out Hello Triangle MSAA.
Next, we initialize a texture that is accessible from CPU address space. Vulkan copies the rendered result into it, and we then read it back and write it to disk in an image format.
```cpp
// Create a color texture that is host visible and in linear layout. We will copy into this.
const TextureOptions cpuColorTextureOptions = {
    .type = TextureType::TextureType2D,
    .format = m_colorFormat,
    .extent = { m_width, m_height, 1 },
    .mipLevels = 1,
    .samples = SampleCountFlagBits::Samples1Bit,
    .tiling = TextureTiling::Linear, // Linear so we can manipulate it on the host
    .usage = TextureUsageFlagBits::TransferDstBit,
    .memoryUsage = MemoryUsage::CpuOnly
};
m_cpuColorTexture = m_device.createTexture(cpuColorTextureOptions);
```
Filename: offscreen_rendering/offscreen.cpp
The tiling field determines the layout of the texels in memory. Optimal is the default value; the texture is then laid out in whatever way is most efficient for the hardware the program is running on. Linear means row-major order, which is what we need when the texture must be CPU-addressable.
After creating the textures, we need to create an array of memory barrier options. First, let's look at how these options are used later, during rendering:
```cpp
commandRecorder.textureMemoryBarrier(m_barriers[uint8_t(TextureBarriers::CopySrcPre)]);
commandRecorder.textureMemoryBarrier(m_barriers[uint8_t(TextureBarriers::CopyDstPre)]);

commandRecorder.copyTextureToTexture(m_copyOptions);

commandRecorder.textureMemoryBarrier(m_barriers[uint8_t(TextureBarriers::CopyDstPost)]);
commandRecorder.textureMemoryBarrier(m_barriers[uint8_t(TextureBarriers::CopySrcPost)]);
```
Filename: offscreen_rendering/offscreen.cpp
Memory barriers are commands that ensure writes made by earlier pipeline stages are complete and visible before later stages continue. They can also transition a texture from one layout to another, which is part of the options we will configure. Before we can perform the copyTextureToTexture operation, we need to ensure that the texture memory is up to date, visible, and in the correct layout. Afterwards, we need barriers to return the GPU texture to its original layout and to move the CPU texture into a layout in which it can be mapped into CPU address space. So, let's set up the different memory barrier options:
```cpp
// Insert a texture memory barrier to ensure the rendering to the color render target
// is completed and to transition it into a layout suitable for copying from
m_barriers[uint8_t(TextureBarriers::CopySrcPre)] = {
    .srcStages = PipelineStageFlagBit::ColorAttachmentOutputBit,
    .srcMask = AccessFlagBit::ColorAttachmentWriteBit,
    .dstStages = PipelineStageFlagBit::TransferBit,
    .dstMask = AccessFlagBit::TransferReadBit,
    .oldLayout = TextureLayout::ColorAttachmentOptimal,
    .newLayout = TextureLayout::TransferSrcOptimal,
    .texture = m_colorTexture,
    .range = { .aspectMask = TextureAspectFlagBits::ColorBit }
};

// Insert another texture memory barrier to transition the destination cpu visible
// texture into a suitable layout for copying into
m_barriers[uint8_t(TextureBarriers::CopyDstPre)] = {
    .srcStages = PipelineStageFlagBit::TransferBit,
    .srcMask = AccessFlagBit::None,
    .dstStages = PipelineStageFlagBit::TransferBit,
    .dstMask = AccessFlagBit::TransferWriteBit,
    .oldLayout = TextureLayout::Undefined,
    .newLayout = TextureLayout::TransferDstOptimal,
    .texture = m_cpuColorTexture,
    .range = { .aspectMask = TextureAspectFlagBits::ColorBit }
};

// Transition the destination texture to general layout so that we can map it to the cpu
// address space later.
m_barriers[uint8_t(TextureBarriers::CopyDstPost)] = {
    .srcStages = PipelineStageFlagBit::TransferBit,
    .srcMask = AccessFlagBit::TransferWriteBit,
    .dstStages = PipelineStageFlagBit::TransferBit,
    .dstMask = AccessFlagBit::MemoryReadBit,
    .oldLayout = TextureLayout::TransferDstOptimal,
    .newLayout = TextureLayout::General,
    .texture = m_cpuColorTexture,
    .range = { .aspectMask = TextureAspectFlagBits::ColorBit }
};

// Transition the color target back to the color attachment optimal layout, ready
// to render again later.
m_barriers[uint8_t(TextureBarriers::CopySrcPost)] = {
    .srcStages = PipelineStageFlagBit::TransferBit,
    .srcMask = AccessFlagBit::TransferReadBit,
    .dstStages = PipelineStageFlagBit::TransferBit,
    .dstMask = AccessFlagBit::MemoryReadBit,
    .oldLayout = TextureLayout::TransferSrcOptimal,
    .newLayout = TextureLayout::ColorAttachmentOptimal,
    .texture = m_colorTexture,
    .range = { .aspectMask = TextureAspectFlagBits::ColorBit }
};
```
Filename: offscreen_rendering/offscreen.cpp
The last step is to create the copy options for the copyTextureToTexture call shown earlier:
```cpp
m_copyOptions = {
    .srcTexture = m_colorTexture,
    .srcLayout = TextureLayout::TransferSrcOptimal,
    .dstTexture = m_cpuColorTexture,
    .dstLayout = TextureLayout::TransferDstOptimal,
    .regions = {
        {
            .extent = { .width = m_width, .height = m_height, .depth = 1 },
        },
    }
};
```
Filename: offscreen_rendering/offscreen.cpp
initializeScene
The first thing scene initialization does is load the image we use to represent points on the graph. Most of this is identical to the texture loading seen in the Textured Quad example, including the same loadImage helper function. One difference is that we set the scaling filters for upscaling and downscaling:
```cpp
m_pointSampler = m_device.createSampler(SamplerOptions{ .magFilter = FilterMode::Linear, .minFilter = FilterMode::Linear });
```
Filename: offscreen_rendering/offscreen.cpp
Also, we keep track of the buffer upload information in a member variable, to free it later. This is a housekeeping task which would normally be handled by KDGpuExample::ExampleEngineLayer::uploadBufferData.
```cpp
m_stagingBuffers.emplace_back(m_queue.uploadTextureData(uploadOptions));
```
Filename: offscreen_rendering/offscreen.cpp
Next we load the shaders. The vertex shader uses gl_PointSize to control the rendered size of each point. See the Vulkan Point Sprite documentation for details.
```glsl
layout(location = 0) in vec4 vertexPos;
layout(location = 1) in vec4 vertexCol;

layout(location = 0) out vec4 color;

out gl_PerVertex
{
    vec4 gl_Position;
    float gl_PointSize;
};

layout(set = 1, binding = 0) uniform Transform
{
    mat4 proj;
}
transform;

void main()
{
    color = vertexCol;
    gl_PointSize = 16.0;
    gl_Position = transform.proj * vertexPos;
}
```
Filename: offscreen_rendering/plot.vert
Next, we create a buffer to hold the transformation matrix and copy an orthographic projection, m_proj, into it:
```cpp
void Offscreen::setProjection(float left, float right, float bottom, float top)
{
    // NB: We flip bottom and top since Vulkan (and KDGpu) invert the y vs OpenGL
    m_proj = glm::ortho(left, right, top, bottom);
    auto bufferData = m_projBuffer.map();
    std::memcpy(bufferData, &m_proj, sizeof(glm::mat4));
    m_projBuffer.unmap();
}
```
Filename: offscreen_rendering/offscreen.cpp
We create the necessary bind group and bind group layouts, and then finally create the pipeline options:
```cpp
const GraphicsPipelineOptions pipelineOptions = {
    .shaderStages = {
        { .shaderModule = vertexShader, .stage = ShaderStageFlagBits::VertexBit },
        { .shaderModule = fragmentShader, .stage = ShaderStageFlagBits::FragmentBit },
    },
    .layout = m_pipelineLayout,
    .vertex = {
        .buffers = { { .binding = 0, .stride = sizeof(Offscreen::Vertex) } },
        .attributes = {
            { .location = 0, .binding = 0, .format = Format::R32G32_SFLOAT }, // Position
            { .location = 1, .binding = 0, .format = Format::R32G32B32A32_SFLOAT, .offset = sizeof(glm::vec2) } // Color
        },
    },
    .renderTargets = {
        {
            .format = m_colorFormat,
            .blending = {
                .blendingEnabled = true,
                .color = {
                    .srcFactor = BlendFactor::SrcAlpha,
                    .dstFactor = BlendFactor::OneMinusSrcAlpha,
                },
                .alpha = {
                    .srcFactor = BlendFactor::SrcAlpha,
                    .dstFactor = BlendFactor::OneMinusSrcAlpha,
                },
            },
        },
    },
    .depthStencil = {
        .format = m_depthFormat,
        .depthTestEnabled = false,
        .depthWritesEnabled = false,
        .depthCompareOperation = CompareOperation::Always,
    },
    .primitive = {
        .topology = PrimitiveTopology::PointList,
    },
    .multisample = {
        .samples = m_samples,
    },
};
```
Filename: offscreen_rendering/offscreen.cpp
Notice:
- the use of multisampling
- that the depth test is effectively disabled (depthTestEnabled = false and CompareOperation::Always)
- the PointList topology, so that the vertices get interpreted as points and not triangles
- that blending is enabled, with settings for both color and alpha
- the vertex buffer binding whose stride is the size of the Offscreen::Vertex struct shown earlier
Data Upload
Having completed initialization, we need to pass the large vector of vertex data we generated earlier into our Offscreen object. We pass it in with setData:
```cpp
void Offscreen::setData(const std::vector<Offscreen::Vertex> &data)
{
    m_pointCount = data.size();

    const DeviceSize dataByteSize = data.size() * sizeof(Offscreen::Vertex);
    BufferOptions bufferOptions = {
        .size = dataByteSize,
        .usage = BufferUsageFlagBits::VertexBufferBit | BufferUsageFlagBits::TransferDstBit,
        .memoryUsage = MemoryUsage::GpuOnly
    };
    m_dataBuffer = m_device.createBuffer(bufferOptions);

    const BufferUploadOptions uploadOptions = {
        .destinationBuffer = m_dataBuffer,
        .dstStages = PipelineStageFlagBit::VertexAttributeInputBit,
        .dstMask = AccessFlagBit::VertexAttributeReadBit,
        .data = data.data(),
        .byteSize = dataByteSize
    };

    // Initiate the data upload. We note the upload details so that we can
    // test to see when it is safe to destroy the staging buffer. We will check
    // at the end of each render function.
    m_stagingBuffers.emplace_back(m_queue.uploadBufferData(uploadOptions));
}
```
Filename: offscreen_rendering/offscreen.cpp
At the end of this function we also keep track of the buffer for release later.
Vulkan Concepts
Headless Rendering: Vulkan doesn't require a window or display. Rendering can target any GPU-accessible memory, making it ideal for:
- Batch processing
- Server-side rendering
- Distributed rendering
- Automated testing
Memory Barriers and Layout Transitions: GPU and CPU maintain separate memory caches. Barriers ensure memory visibility and correct texture layouts for reading/writing. See Synchronization and Cache Control.
Texture Copy Operations: Both GPU-to-GPU and GPU-to-CPU transfers require proper layout transitions and synchronization. See Image Copies.
Updated on 2026-03-31 at 00:02:07 +0000