KDGpu 0.10.0

Host Image Copy for Texture Upload¶

This example shows how to use the host image copy extension to upload texture data directly from CPU memory to GPU textures without creating intermediate staging buffers or command buffers. Traditional texture uploads require staging buffers and GPU copy commands; host image copy eliminates this overhead for simpler and more efficient texture loading. This is especially useful for texture streaming, dynamic textures, and initial asset loading.

The example uses the KDGpuExample helper API for simplified setup.

Overview¶

What this example demonstrates:

Enabling VK_EXT_host_image_copy extension
Loading images with STB Image library
Creating textures with host copy usage
Direct CPU-to-GPU texture transfer
Layout transitions for host copies
Immediate visibility of uploaded data

Use cases:

Simplified texture loading code
Dynamic texture updates from CPU
Texture streaming systems
Initial asset loading
Eliminating staging buffer management

Vulkan Requirements¶

Vulkan Version: 1.0+ with extension
Extensions: VK_EXT_host_image_copy
Features: hostImageCopy
Limits: Check copySrcLayoutCount, copyDstLayoutCount
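
The two limits are the lengths of the layout lists that the driver reports for host copies (pCopySrcLayouts and pCopyDstLayouts in VkPhysicalDeviceHostImageCopyPropertiesEXT). A layout used as the destination of a host copy should appear in the destination list. The sketch below models that check with standard C++ only; ImageLayout, HostCopyLimits, and supportsHostCopyDst are hypothetical stand-ins, not KDGpu or Vulkan types.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the layouts a driver can report.
enum class ImageLayout { Undefined, General, ShaderReadOnlyOptimal, TransferDstOptimal };

// Hypothetical stand-in for the host image copy limits.
struct HostCopyLimits {
    std::vector<ImageLayout> copySrcLayouts; // size() plays the role of copySrcLayoutCount
    std::vector<ImageLayout> copyDstLayouts; // size() plays the role of copyDstLayoutCount
};

// True if `layout` is listed as a valid destination layout for host copies.
bool supportsHostCopyDst(const HostCopyLimits &limits, ImageLayout layout)
{
    return std::find(limits.copyDstLayouts.begin(), limits.copyDstLayouts.end(), layout)
            != limits.copyDstLayouts.end();
}
```

In this example the layout to look for would be General, since that is the layout the texture is in when the copy happens.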

Key Concepts¶

Traditional Texture Upload:

// Complex multi-step process:
1. Create staging buffer (CPU-visible)
2. Copy image data to staging buffer
3. Create GPU texture (GPU-only)
4. Record command buffer:
   - Transition texture layout
   - vkCmdCopyBufferToImage
   - Transition to shader-readable layout
5. Submit and wait for GPU copy
6. Destroy staging buffer

Problems:

Many steps and objects
Staging buffer memory overhead
Synchronization complexity
Command buffer overhead

Host Image Copy:

// Simplified direct copy:
Create texture with host copy usage
Transition layout (host-side)
vkCopyMemoryToImageEXT (CPU call, no GPU work)
Done! Immediately visible to GPU

Benefits:

No staging buffers
No command buffers for upload
Simpler code
Immediate visibility

Spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_host_image_copy.html

First, we include the STB image header, which will provide us with the necessary inputs to the KDGpu API:

#define STB_IMAGE_IMPLEMENTATION
#include <stb_image.h>

Filename: host_image_copy_texture/host_image_copy.cpp

We organize all of the STB-image-supplied values into one struct, using an RGBA KDGpu::Format.

struct ImageData {
    uint32_t width{ 0 };
    uint32_t height{ 0 };
    uint8_t *pixelData{ nullptr };
    DeviceSize byteSize{ 0 };
    Format format{ Format::R8G8B8A8_UNORM };
};

Filename: host_image_copy_texture/host_image_copy.cpp

And we have a function to populate the struct, with some per-platform implementation details. This function includes all of the STB image calls in this example.

ImageData loadImage(KDUtils::File &file)
{
    int texChannels;
    int _width = 0, _height = 0;

    if (!file.open(std::ios::in | std::ios::binary)) {
        SPDLOG_LOGGER_CRITICAL(KDGpu::Logger::logger(), "Failed to open file {}", file.path());
        throw std::runtime_error("Failed to open file");
    }

    const KDUtils::ByteArray fileContent = file.readAll();

    auto _data = stbi_load_from_memory(
            fileContent.data(), fileContent.size(), &_width, &_height, &texChannels, STBI_rgb_alpha);

    if (_data == nullptr) {
        SPDLOG_WARN("Failed to load texture {} {}", file.path(), stbi_failure_reason());
        return {};
    }
    SPDLOG_DEBUG("Texture dimensions: {} x {}", _width, _height);

    return ImageData{
        .width = static_cast<uint32_t>(_width),
        .height = static_cast<uint32_t>(_height),
        .pixelData = static_cast<uint8_t *>(_data),
        .byteSize = 4 * static_cast<DeviceSize>(_width) * static_cast<DeviceSize>(_height)
    };
}

Filename: host_image_copy_texture/host_image_copy.cpp
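
loadImage returns a raw pointer allocated by STB, so releasing it is left to the caller (stb_image provides stbi_image_free for this). One way to tie that release to scope is std::unique_ptr with a custom deleter, sketched below. To stay self-contained the sketch uses std::calloc and std::free as stand-ins for the STB allocation and stbi_image_free; PixelDeleter, OwnedPixels, rgba8ByteSize, and allocatePixels are hypothetical names that are not part of the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Deleter standing in for stbi_image_free (assumption: swap in the real call
// when the pointer comes from stbi_load_from_memory).
struct PixelDeleter {
    void operator()(std::uint8_t *p) const noexcept { std::free(p); }
};
using OwnedPixels = std::unique_ptr<std::uint8_t[], PixelDeleter>;

// Byte size of a tightly packed RGBA8 image: 4 bytes per pixel.
// The 64-bit multiplication avoids overflow for large 32-bit dimensions.
constexpr std::uint64_t rgba8ByteSize(std::uint32_t width, std::uint32_t height)
{
    return 4ull * width * height;
}

// Stand-in for the decoder: allocates a zero-filled RGBA8 buffer.
OwnedPixels allocatePixels(std::uint32_t width, std::uint32_t height)
{
    const auto size = static_cast<std::size_t>(rgba8ByteSize(width, height));
    return OwnedPixels(static_cast<std::uint8_t *>(std::calloc(size, 1)));
}
```

With this shape the pixel memory is released when the owner goes out of scope, which can happen right after the upload because the host copy reads the data during the call.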

Texture Initialization and Upload¶

At scene initialization, we load and upload the texture, then create and upload the vertex buffer for the quad onto which we render the texture. Some things to note:

We have no need to access the buffer after loading, so the memory usage is GPU-only.
We perform the texture upload with only one copy region which covers the whole texture. This is just boilerplate for us but a larger texture could make use of multiple copy regions.
The oldLayout and newLayout options offer a way to optimize the texture for different use cases. Check out KDGpu::TextureLayout to see the available layouts.
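
As a sketch of what multiple copy regions could look like, the code below splits a tightly packed RGBA8 image into horizontal bands and computes, for each band, the byte offset into host memory together with the destination row and row count. Band and splitIntoBands are hypothetical names; the assumption is that each band would map to one HostMemoryToTextureCopyRegion whose source pointer is pixelData plus the byte offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one copy region covering full-width rows.
struct Band {
    std::uint64_t srcByteOffset; // offset from the start of the pixel data
    std::uint32_t dstY;          // first destination row
    std::uint32_t rowCount;      // number of rows in this band
};

// Split `height` rows into bands of at most `rowsPerBand` rows each.
// Assumes tightly packed rows of `width` pixels, 4 bytes per pixel (RGBA8).
std::vector<Band> splitIntoBands(std::uint32_t width, std::uint32_t height, std::uint32_t rowsPerBand)
{
    std::vector<Band> bands;
    if (rowsPerBand == 0)
        return bands; // no valid band size, so produce no regions

    const std::uint64_t rowBytes = 4ull * width;
    for (std::uint32_t y = 0; y < height;) {
        // The last band may be shorter than the others.
        const std::uint32_t rows = std::min(rowsPerBand, height - y);
        bands.push_back(Band{ rowBytes * y, y, rows });
        y += rows;
    }
    return bands;
}
```

The bands are contiguous and non-overlapping, so together they cover the same pixels as the single full-texture region used in this example.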

We begin by loading the raw image data by calling our loadImage function.

        // Load the image data and size
        auto imageFile = KDGpuExample::assetDir().file("textures/samuel-ferrara-1527pjeb6jg-unsplash.jpg");
        ImageData image = loadImage(imageFile);

Filename: host_image_copy_texture/host_image_copy.cpp

Next we need to create our texture and specify that it will be used with host copies.

        // Create Texture compatible with Host Transfers
        const TextureOptions textureOptions = {
            .type = TextureType::TextureType2D,
            .format = image.format,
            .extent = { .width = image.width, .height = image.height, .depth = 1 },
            .mipLevels = 1,
            .usage = TextureUsageFlagBits::SampledBit | TextureUsageFlagBits::TransferDstBit | KDGpu::TextureUsageFlagBits::HostTransferBit,
            .memoryUsage = MemoryUsage::GpuOnly,
            .initialLayout = TextureLayout::Undefined
        };
        m_texture = m_device.createTexture(textureOptions);

Filename: host_image_copy_texture/host_image_copy.cpp

To initiate a copy from the host to the texture, the texture first needs to be transitioned to a suitable layout.

        // Transition the texture to the General Layout on the host
        m_texture.hostLayoutTransition(HostLayoutTransition{
                .oldLayout = TextureLayout::Undefined,
                .newLayout = TextureLayout::General,
                .range = {
                        .aspectMask = TextureAspectFlagBits::ColorBit,
                },
        });

Filename: host_image_copy_texture/host_image_copy.cpp

Then we can proceed with the actual transfer from host memory to the texture. The big advantage of host copies is that we don't need to create an intermediate staging buffer and record copy commands in a CommandBuffer to upload data to the texture, even when the texture uses GPU-only memory. Host copy writes also become immediately visible to the GPU.

        // Upload the texture data through the host
        m_texture.copyHostMemoryToTexture(HostMemoryToTextureCopy{
                .dstTextureLayout = TextureLayout::General,
                .regions = {
                        HostMemoryToTextureCopyRegion{
                                .srcHostMemoryPointer = image.pixelData,
                                .dstSubresource = TextureSubresourceLayers{
                                        .aspectMask = TextureAspectFlagBits::ColorBit,
                                        .mipLevel = 0,
                                        .baseArrayLayer = 0,
                                        .layerCount = 1,
                                },
                                .dstOffset = { .x = 0, .y = 0, .z = 0 },
                                .dstExtent = { image.width, image.height, 1 },
                        },
                },
        });

Filename: host_image_copy_texture/host_image_copy.cpp

Next we can just transition the texture to a layout suitable for shaders to sample from it.

        // Transition the texture to the ShaderReadOnlyOptimal Layout on the host
        m_texture.hostLayoutTransition(HostLayoutTransition{
                .oldLayout = TextureLayout::General,
                .newLayout = TextureLayout::ShaderReadOnlyOptimal,
                .range = {
                        .aspectMask = TextureAspectFlagBits::ColorBit,
                },
        });

Filename: host_image_copy_texture/host_image_copy.cpp
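
The two transitions bracket the copy: Undefined to General before it, and General to ShaderReadOnlyOptimal after it. Each transition names the layout the texture is currently in, so a mismatch is a usage error. The following sketch models that bookkeeping on the CPU with a hypothetical LayoutTracker and a hypothetical Layout enum; it illustrates the ordering only and is not how KDGpu or the driver tracks layouts.

```cpp
#include <stdexcept>

// Hypothetical mirror of the layouts used in this example.
enum class Layout { Undefined, General, ShaderReadOnlyOptimal };

// Tracks the layout we believe the texture is in and rejects transitions
// whose oldLayout does not match it.
class LayoutTracker
{
public:
    void transition(Layout oldLayout, Layout newLayout)
    {
        if (newLayout == Layout::Undefined)
            throw std::invalid_argument("cannot transition to Undefined");
        // Undefined is accepted as oldLayout: it means the previous
        // contents do not need to be preserved.
        if (oldLayout != Layout::Undefined && oldLayout != m_current)
            throw std::logic_error("oldLayout does not match the tracked layout");
        m_current = newLayout;
    }

    Layout current() const { return m_current; }

private:
    Layout m_current{ Layout::Undefined };
};
```

Replaying the example's sequence through the tracker succeeds, while repeating the first General transition with General as oldLayout after the texture is already in ShaderReadOnlyOptimal throws.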

Finally, we create a TextureView and a Sampler, which our shader will use to sample data from the Texture.

        // Create a view and sampler
        m_textureView = m_texture.createView();
        m_sampler = m_device.createSampler();

Filename: host_image_copy_texture/host_image_copy.cpp

Geometry and Graphics Pipeline Initialization¶

The quad vertex buffer creation is unsurprising. The geometry is uploaded in normalized device coordinates (NDC) so that we don't need to perform any transformation in the vertex shader.

    struct Vertex {
        glm::vec3 position;
        glm::vec2 texCoord;
    };

    // Create a buffer to hold the quad vertex data
    {
        const float scale = 0.8f;
        const std::array<Vertex, 4> vertexData = {
            Vertex{ // Bottom-left
                    .position = { -1.0f * scale, 1.0f * scale, 0.0f },
                    .texCoord = { 0.0f, 1.0f } },
            Vertex{ // Bottom-right
                    .position = { 1.0f * scale, 1.0f * scale, 0.0f },
                    .texCoord = { 1.0f, 1.0f } },
            Vertex{ // Top-left
                    .position = { -1.0f * scale, -1.0f * scale, 0.0f },
                    .texCoord = { 0.0f, 0.0f } },
            Vertex{ // Top-right
                    .position = { 1.0f * scale, -1.0f * scale, 0.0f },
                    .texCoord = { 1.0f, 0.0f } }
        };

        const DeviceSize dataByteSize = vertexData.size() * sizeof(Vertex);
        const BufferOptions bufferOptions = {
            .size = dataByteSize,
            .usage = BufferUsageFlagBits::VertexBufferBit | BufferUsageFlagBits::TransferDstBit,
            .memoryUsage = MemoryUsage::GpuOnly
        };
        m_buffer = m_device.createBuffer(bufferOptions);
        const BufferUploadOptions uploadOptions = {
            .destinationBuffer = m_buffer,
            .dstStages = PipelineStageFlagBit::VertexAttributeInputBit,
            .dstMask = AccessFlagBit::VertexAttributeReadBit,
            .data = vertexData.data(),
            .byteSize = dataByteSize
        };
        uploadBufferData(uploadOptions);
    }

Filename: host_image_copy_texture/host_image_copy.cpp

We then proceed with creating a simple rendering pipeline that expects 2 attributes (position and texture coordinates) and a bindgroup.

    // Create a vertex shader and fragment shader (spir-v only for now)
    auto vertexShaderPath = KDGpuExample::assetDir().file("shaders/examples/textured_quad/textured_quad.vert.spv");
    auto vertexShader = m_device.createShaderModule(KDGpuExample::readShaderFile(vertexShaderPath));

    auto fragmentShaderPath = KDGpuExample::assetDir().file("shaders/examples/textured_quad/textured_quad.frag.spv");
    auto fragmentShader = m_device.createShaderModule(KDGpuExample::readShaderFile(fragmentShaderPath));

    // Create bind group layout consisting of a single binding holding a combined image sampler
    // clang-format off
    const BindGroupLayoutOptions bindGroupLayoutOptions = {
        .bindings = {{
            .binding = 0,
            .resourceType = ResourceBindingType::CombinedImageSampler,
            .shaderStages = ShaderStageFlags(ShaderStageFlagBits::FragmentBit)
        }}
    };
    // clang-format on
    const BindGroupLayout bindGroupLayout = m_device.createBindGroupLayout(bindGroupLayoutOptions);

    // Create a pipeline layout (array of bind group layouts)
    const PipelineLayoutOptions pipelineLayoutOptions = {
        .bindGroupLayouts = { bindGroupLayout }
    };
    m_pipelineLayout = m_device.createPipelineLayout(pipelineLayoutOptions);

    // Create a pipeline
    // clang-format off
    const GraphicsPipelineOptions pipelineOptions = {
        .shaderStages = {
            { .shaderModule = vertexShader, .stage = ShaderStageFlagBits::VertexBit },
            { .shaderModule = fragmentShader, .stage = ShaderStageFlagBits::FragmentBit }
        },
        .layout = m_pipelineLayout,
        .vertex = {
            .buffers = {
                { .binding = 0, .stride = sizeof(Vertex) }
            },
            .attributes = {
                { .location = 0, .binding = 0, .format = Format::R32G32B32_SFLOAT }, // Position
                { .location = 1, .binding = 0, .format = Format::R32G32_SFLOAT, .offset = sizeof(glm::vec3) } // TexCoord
            }
        },
        .renderTargets = {
            { .format = m_swapchainFormat }
        },
        .depthStencil = {
            .format = m_depthFormat,
            .depthWritesEnabled = true,
            .depthCompareOperation = CompareOperation::Less
        },
        .primitive = {
            .topology = PrimitiveTopology::TriangleStrip
        }
    };
    // clang-format on
    m_pipeline = m_device.createGraphicsPipeline(pipelineOptions);

Filename: host_image_copy_texture/host_image_copy.cpp
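
The attribute offsets in the pipeline assume that Vertex is tightly packed, with texCoord starting right after the three position floats. A sketch of checking that assumption at compile time follows; it uses plain float arrays in a hypothetical PackedVertex instead of the glm types so that it stands alone.

```cpp
#include <cstddef>

// Stand-in for the example's Vertex, using raw floats instead of glm types.
struct PackedVertex {
    float position[3];
    float texCoord[2];
};

// Values that would feed the vertex buffer layout and attribute descriptions.
constexpr std::size_t vertexStride = sizeof(PackedVertex);
constexpr std::size_t positionOffset = offsetof(PackedVertex, position);
constexpr std::size_t texCoordOffset = offsetof(PackedVertex, texCoord);

// Fail the build if the compiler inserts padding we did not expect.
static_assert(positionOffset == 0, "position is expected first");
static_assert(texCoordOffset == 3 * sizeof(float), "texCoord follows position without padding");
static_assert(vertexStride == 5 * sizeof(float), "no trailing padding expected");
```

Writing the texture coordinate offset as offsetof(Vertex, texCoord) rather than sizeof(glm::vec3) would express the same value directly from the struct definition.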

Next, the BindGroup is allocated. It holds the TextureView and Sampler used to sample from our texture in the fragment shader.

    // clang-format off
    const BindGroupOptions bindGroupOptions = {
        .layout = bindGroupLayout,
        .resources = {{
            .binding = 0,
            .resource = TextureViewSamplerBinding{ .textureView = m_textureView, .sampler = m_sampler }
        }}
    };
    // clang-format on
    m_textureBindGroup = m_device.createBindGroup(bindGroupOptions);

Filename: host_image_copy_texture/host_image_copy.cpp

Rendering¶

Now that we have initialized everything properly, the render function is one of the simplest so far. We set the pipeline and buffer as usual, and set the bindgroup we just created.

void HostImageCopy::render()
{
    auto commandRecorder = m_device.createCommandRecorder();

    auto opaquePass = commandRecorder.beginRenderPass(KDGpu::RenderPassCommandRecorderOptions{
            .colorAttachments = {
                    {
                            .view = m_swapchainViews.at(m_currentSwapchainImageIndex),
                            .clearValue = { 0.3f, 0.3f, 0.3f, 1.0f },
                            .finalLayout = TextureLayout::PresentSrc,
                    },
            },
            .depthStencilAttachment = {
                    .view = m_depthTextureView,
            },
    });
    opaquePass.setPipeline(m_pipeline);
    opaquePass.setVertexBuffer(0, m_buffer);
    opaquePass.setBindGroup(0, m_textureBindGroup);
    opaquePass.draw(DrawCommand{ .vertexCount = 4 });
    renderImGuiOverlay(&opaquePass);
    opaquePass.end();
    m_commandBuffer = commandRecorder.finish();

    const SubmitOptions submitOptions = {
        .commandBuffers = { m_commandBuffer },
        .waitSemaphores = { m_presentCompleteSemaphores[m_inFlightIndex] },
        .signalSemaphores = { m_renderCompleteSemaphores[m_currentSwapchainImageIndex] }
    };
    m_queue.submit(submitOptions);
}

Filename: host_image_copy_texture/host_image_copy.cpp
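
The draw call uses four vertices with the TriangleStrip topology, which yields two triangles sharing an edge. The sketch below expands a strip into per-triangle vertex indices; stripToTriangles is a hypothetical helper, and it swaps the last two indices of every odd triangle so that all triangles keep the winding of the first one.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using Triangle = std::array<std::uint32_t, 3>;

// Expand a triangle strip of `vertexCount` vertices into index triples.
// A strip with fewer than three vertices produces no triangles.
std::vector<Triangle> stripToTriangles(std::uint32_t vertexCount)
{
    std::vector<Triangle> triangles;
    for (std::uint32_t i = 0; i + 2 < vertexCount; ++i) {
        if (i % 2 == 0)
            triangles.push_back(Triangle{ i, i + 1, i + 2 });
        else
            triangles.push_back(Triangle{ i, i + 2, i + 1 }); // keep winding consistent
    }
    return triangles;
}
```

For the quad in this example, the four vertices expand to the triangles (0, 1, 2) and (1, 3, 2).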

Also, be sure to actually sample from the texture in the shader:

layout(location = 0) in vec2 texCoord;

layout(location = 0) out vec4 fragColor;

layout(set = 0, binding = 0) uniform sampler2D colorTexture;

void main()
{
    vec3 color = texture(colorTexture, texCoord).rgb;
    fragColor = vec4(color, 1.0);
}

Filename: host_image_copy_texture/doc/shadersnippet.frag

Updated on 2026-07-16 at 00:01:15 +0000