Hello Sphere Mesh Shader

This example introduces mesh shaders, a modern alternative to the vertex, tessellation, and geometry shader stages. A mesh shader generates geometry directly on the GPU using a compute-like programming model, with each workgroup emitting a small batch of vertices and primitives (a "meshlet"). This enables techniques such as GPU-driven culling, LOD selection, and procedural geometry, and it is significantly more efficient than geometry shaders. The example procedurally generates a sphere entirely on the GPU.

The example uses the KDGpuExample helper API for simplified setup.

Overview

What this example demonstrates:

Enabling VK_EXT_mesh_shader extension
Creating mesh shader pipelines (no vertex input state)
Dispatching mesh shader workgroups with drawMeshTasks
Generating vertices and primitives in mesh shader
Procedural geometry generation on GPU

Use cases:

GPU-driven rendering (culling, LOD)
Procedural geometry (spheres, terrain)
Geometry amplification
Modern rendering pipelines
Replacing geometry shaders (much faster)

Vulkan Requirements

Vulkan Version: 1.1+ with extension (VK_EXT_mesh_shader depends on VK_KHR_spirv_1_4, which requires Vulkan 1.1)
Extensions: VK_EXT_mesh_shader
Features: meshShader, taskShader (optional)
Limits: Check maxMeshOutputVertices, maxMeshOutputPrimitives
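
As a rough illustration of the limits check, the sketch below compares a mesh shader's declared output sizes against device limits. The types `MeshShaderLimits` and `MeshShaderOutputDecl` and the function `fitsDeviceLimits` are hypothetical names introduced for this sketch; in a real application the limit values would be read from VkPhysicalDeviceMeshShaderPropertiesEXT, which is not shown here.

```cpp
#include <cstdint>

// Hypothetical mirror of two fields of VkPhysicalDeviceMeshShaderPropertiesEXT.
// A real application would fill these in from the physical device query.
struct MeshShaderLimits {
    uint32_t maxMeshOutputVertices;
    uint32_t maxMeshOutputPrimitives;
};

// Hypothetical description of what a mesh shader declares via its
// max_vertices / max_primitives layout qualifiers.
struct MeshShaderOutputDecl {
    uint32_t maxVertices;
    uint32_t maxPrimitives;
};

// Returns true when the declared output sizes stay within the device limits.
constexpr bool fitsDeviceLimits(const MeshShaderOutputDecl &decl,
                                const MeshShaderLimits &limits)
{
    return decl.maxVertices <= limits.maxMeshOutputVertices
        && decl.maxPrimitives <= limits.maxMeshOutputPrimitives;
}
```

Running a check like this before creating the pipeline makes an oversized meshlet configuration fail early with a clear message instead of at pipeline creation time.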

Key Concepts

Traditional Pipeline:

Vertex Shader (per-vertex)
  ↓
[Tessellation] (optional)
  ↓
[Geometry Shader] (per-primitive, slow!)
  ↓
Rasterization

Problems:

Geometry shaders are slow (serialized processing)
Limited control over geometry processing
Poor GPU utilization

Mesh Shader Pipeline:

[Task Shader] (optional, culling/LOD)
  ↓
Mesh Shader (generates meshlets in parallel)
  ↓
Rasterization

Benefits:

Compute-like parallelism (workgroups)
Direct control over output
Much faster than geometry shaders
Enables GPU-driven rendering

Spec: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_mesh_shader.html

Meshlets:

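Meshlets are small groups of triangles that share vertices (typically 64-256 vertices and 128-384 triangles, subject to the device limits maxMeshOutputVertices and maxMeshOutputPrimitives):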

Large meshes split into meshlets
Each meshlet processed by one mesh shader workgroup
Enables efficient GPU-driven culling per-meshlet
Better cache locality
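
To make the splitting step concrete, here is a minimal sketch of a greedy meshlet builder that walks an indexed triangle list in order. It is an illustration under simple assumptions, not the method used by this example (which generates its geometry procedurally) and not an optimized builder. The names `Meshlet` and `buildMeshlets` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical meshlet layout for this sketch.
struct Meshlet {
    std::vector<uint32_t> vertices;                  // indices into the original vertex buffer
    std::vector<std::array<uint32_t, 3>> triangles;  // indices into `vertices` (meshlet-local)
};

// Greedily packs triangles into meshlets, starting a new meshlet whenever
// adding the next triangle would exceed either limit.
std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t> &indices,
                                   std::size_t maxVertices, std::size_t maxTriangles)
{
    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<uint32_t, uint32_t> remap; // global vertex index -> local slot

    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        // Count how many vertices of this triangle are new to the current meshlet
        // (conservative: a repeated index inside one triangle is counted twice).
        std::size_t newVertices = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            if (remap.find(indices[i + k]) == remap.end())
                ++newVertices;
        }

        const bool full = current.vertices.size() + newVertices > maxVertices
                || current.triangles.size() + 1 > maxTriangles;
        if (full && !current.triangles.empty()) {
            meshlets.push_back(std::move(current));
            current = Meshlet{};
            remap.clear();
        }

        // Append the triangle, assigning local slots to unseen vertices.
        std::array<uint32_t, 3> local{};
        for (std::size_t k = 0; k < 3; ++k) {
            const uint32_t slot = static_cast<uint32_t>(current.vertices.size());
            const auto result = remap.insert({ indices[i + k], slot });
            if (result.second)
                current.vertices.push_back(indices[i + k]);
            local[k] = result.first->second;
        }
        current.triangles.push_back(local);
    }

    if (!current.triangles.empty())
        meshlets.push_back(std::move(current));
    return meshlets;
}
```

Production tools usually also optimize for vertex reuse and spatial compactness so that per-meshlet culling bounds stay tight; this sketch only enforces the size limits.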

Mesh Shader Programming Model:

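#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 32) in;  // Workgroup size
layout(triangles, max_vertices = 64, max_primitives = 42) out;

// vertexCount, primitiveCount, computePosition() and i0/i1/i2 are
// placeholders in this sketch; a real shader derives them from its inputs.

void main() {
    uint tid = gl_LocalInvocationIndex;

    // Set output counts first: SetMeshOutputsEXT must be called before
    // any write to the output arrays.
    SetMeshOutputsEXT(vertexCount, primitiveCount);

    // Generate vertices (parallel). The strided loop lets 32 invocations
    // cover up to 64 vertices.
    for (uint v = tid; v < vertexCount; v += gl_WorkGroupSize.x) {
        gl_MeshVerticesEXT[v].gl_Position = computePosition(v);
    }

    // Generate indices (parallel):
    for (uint p = tid; p < primitiveCount; p += gl_WorkGroupSize.x) {
        gl_PrimitiveTriangleIndicesEXT[p] = uvec3(i0, i1, i2);
    }
}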
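
With a workgroup of 32 invocations and up to 64 output vertices, a single invocation may need to write more than one output element. A common pattern for this is a strided loop. The following CPU-side sketch emulates that distribution so it can be checked outside a shader. The function name `emulateStridedWrites` is hypothetical, and the code only models which invocation writes which slot, not actual GPU execution.

```cpp
#include <cstdint>
#include <vector>

// CPU emulation of a workgroup in which every invocation runs
//     for (i = tid; i < count; i += groupSize) write(i);
// Returns how many times each output slot was written.
std::vector<uint32_t> emulateStridedWrites(uint32_t groupSize, uint32_t count)
{
    std::vector<uint32_t> writes(count, 0);
    // Outer loop: one iteration per invocation (tid plays the role of
    // gl_LocalInvocationIndex).
    for (uint32_t tid = 0; tid < groupSize; ++tid) {
        // Inner loop: the strided loop that each invocation would execute.
        for (uint32_t i = tid; i < count; i += groupSize) {
            ++writes[i];
        }
    }
    return writes;
}
```

Every slot ends up written exactly once regardless of whether `count` is smaller or larger than the workgroup size, which is the property a plain `if (tid < count)` guard loses as soon as `count` exceeds the number of invocations.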

Implementation

Loading Mesh Shader:

    // Create a mesh shader and fragment shader
    auto meshShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_mesh/hello_sphere_mesh.mesh.spv");
    auto meshShader = m_device.createShaderModule(KDGpuExample::readShaderFile(meshShaderPath));

    auto fragmentShaderPath = KDGpuExample::assetDir().file("shaders/examples/hello_sphere_mesh/hello_sphere_mesh.frag.spv");
    auto fragmentShader = m_device.createShaderModule(KDGpuExample::readShaderFile(fragmentShaderPath));

Filename: hello_sphere_mesh/hello_sphere_mesh.cpp

Note that the mesh shader module is later attached to the pipeline with ShaderStageFlagBits::MeshBit, which identifies the mesh shader stage.

Mesh Shader Pipeline:

    // Create a pipeline layout
    const PipelineLayoutOptions pipelineLayoutOptions = {};
    m_pipelineLayout = m_device.createPipelineLayout(pipelineLayoutOptions);

    // Create a pipeline
    const GraphicsPipelineOptions pipelineOptions = {
        .label = "Triangle",
        .shaderStages = {
                { .shaderModule = meshShader, .stage = ShaderStageFlagBits::MeshBit },
                { .shaderModule = fragmentShader, .stage = ShaderStageFlagBits::FragmentBit },
        },
        .layout = m_pipelineLayout,
        .renderTargets = {
                { .format = m_swapchainFormat },
        },
        .depthStencil = {
                .format = m_depthFormat,
                .depthWritesEnabled = true,
                .depthCompareOperation = CompareOperation::Less,
        },
        .primitive = {
                .cullMode = CullModeFlagBits::None,
        }
    };
    m_pipeline = m_device.createGraphicsPipeline(pipelineOptions);

Filename: hello_sphere_mesh/hello_sphere_mesh.cpp

Key differences:

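No vertex input state (no vertex buffers or attributes)
The mesh shader takes the place of the vertex shader stage
The remaining state (render targets, depth/stencil, primitive options) is configured as in a conventional graphics pipeline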

Dispatching Mesh Shader:

    opaquePass.setPipeline(m_pipeline);
    opaquePass.drawMeshTasks(KDGpu::DrawMeshCommand{
            .workGroupX = 1,
            .workGroupY = 1,
            .workGroupZ = 1,
    });

Filename: hello_sphere_mesh/hello_sphere_mesh.cpp

drawMeshTasks() dispatches workgroups to the mesh shader; each workgroup generates one meshlet.

In this example, a single workgroup generates the entire sphere procedurally.
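
The shader source is not reproduced on this page, so the following is only a CPU reference sketch of one way to generate a sphere procedurally: a UV (latitude/longitude) sphere. The tessellation scheme and the names `Vec3`, `SphereMesh`, and `makeUvSphere` are assumptions for illustration and may differ from what the example's mesh shader actually does.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct SphereMesh {
    std::vector<Vec3> vertices;
    std::vector<std::array<uint32_t, 3>> triangles;
};

// Builds a UV sphere with `stacks` latitude bands and `slices` longitude segments.
// Produces (stacks + 1) * (slices + 1) vertices and 2 * stacks * slices triangles.
// Pole and seam vertices are duplicated to keep the index math uniform, so the
// triangles touching the poles include degenerate (zero-area) ones.
SphereMesh makeUvSphere(uint32_t stacks, uint32_t slices, float radius)
{
    const float pi = 3.14159265f;
    const uint32_t ring = slices + 1; // vertices per latitude ring
    SphereMesh mesh;

    for (uint32_t i = 0; i <= stacks; ++i) {
        // theta runs from 0 (north pole) to pi (south pole)
        const float theta = pi * static_cast<float>(i) / static_cast<float>(stacks);
        for (uint32_t j = 0; j <= slices; ++j) {
            const float phi = 2.0f * pi * static_cast<float>(j) / static_cast<float>(slices);
            mesh.vertices.push_back({ radius * std::sin(theta) * std::cos(phi),
                                      radius * std::cos(theta),
                                      radius * std::sin(theta) * std::sin(phi) });
        }
    }

    // Two triangles per quad between adjacent rings.
    for (uint32_t i = 0; i < stacks; ++i) {
        for (uint32_t j = 0; j < slices; ++j) {
            const uint32_t a = i * ring + j; // current ring
            const uint32_t b = a + ring;     // next ring
            mesh.triangles.push_back({ a, b, a + 1 });
            mesh.triangles.push_back({ a + 1, b, b + 1 });
        }
    }
    return mesh;
}
```

Because each vertex and each triangle depends only on its own index, the same formulas map naturally onto a mesh shader, where an invocation computes the outputs for the indices assigned to it. The chosen `stacks` and `slices` must keep both counts within the shader's declared max_vertices and max_primitives.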

Performance Notes

Performance Gains:

vs Geometry Shaders: often cited as 5-10× faster thanks to parallel processing; actual gains depend on workload and hardware
vs Vertex Shaders: Comparable, but enables GPU-driven techniques
GPU Culling: Can cull meshlets before rasterization

Best Practices:

Meshlets: 64-256 vertices, 128-384 triangles (vendor-specific sweet spot)
Use task shaders for coarse culling before mesh shader
Keep mesh shader simple (expensive stage)
Pre-compute meshlet data offline when possible
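
The coarse culling step can be sketched on the CPU as a bounding-sphere test against the view frustum, which is one common per-meshlet test. This is an illustrative reference under stated assumptions, not code from the example: the names `Plane`, `BoundingSphere`, `isSphereVisible`, and `visibleMeshlets` are hypothetical, and plane normals are assumed to be unit length and to point into the frustum.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Plane in the form n . p + d = 0, with a unit normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// Precomputed per-meshlet bounds (for example, produced offline).
struct BoundingSphere { float cx, cy, cz, radius; };

// Returns false only when the sphere lies entirely outside at least one plane.
bool isSphereVisible(const BoundingSphere &s, const std::array<Plane, 6> &frustum)
{
    for (const Plane &p : frustum) {
        const float signedDistance = p.nx * s.cx + p.ny * s.cy + p.nz * s.cz + p.d;
        if (signedDistance < -s.radius)
            return false;
    }
    return true;
}

// Returns the indices of meshlets that survive culling. A task shader would
// perform the same test per meshlet and launch mesh workgroups only for survivors.
std::vector<std::size_t> visibleMeshlets(const std::vector<BoundingSphere> &bounds,
                                         const std::array<Plane, 6> &frustum)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < bounds.size(); ++i) {
        if (isSphereVisible(bounds[i], frustum))
            visible.push_back(i);
    }
    return visible;
}
```

The test is conservative: a sphere that straddles a plane is kept, so no visible geometry is dropped, at the cost of occasionally processing a meshlet that turns out to be off screen.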

Hardware Support:

NVIDIA: Turing+ (GTX 16 series, RTX 20 series and newer), excellent support
AMD: RDNA 2+ (RX 6000+), good support
Intel: Arc series, good support
Mobile: Limited; check vendor documentation
