Hello Triangle with Frame Overlap¶
This example is visually identical to Hello Triangle but achieves significantly higher performance through proper frame synchronization. By allowing the CPU to prepare the next frame while the GPU renders the previous one, we eliminate idle time on both processors. This is a fundamental technique for real-time rendering applications.
The example uses the KDGpuExample helper API with the AdvancedExampleEngineLayer base class that manages multiple in-flight frames.
Overview¶
What this example demonstrates:
- Multi-buffered rendering with two (optionally three) independent frames "in-flight"
- Fence-based CPU/GPU synchronization for frame pacing
- Per-frame command buffers and synchronization primitives
- Eliminating GPU bubbles and CPU stalls for maximum throughput
Performance benefit:
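- Simple blocking: CPU waits for GPU → each processor idles roughly 50% of the time when CPU and GPU work per frame are similar
- Frame overlap: CPU and GPU work in parallel → both stay busy
- Typical improvement: 50-100% higher frame rate, depending on how evenly the work is split between CPU and GPU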
Vulkan Requirements¶
- Vulkan Version: 1.0+
- Extensions: None (core synchronization)
- Synchronization Primitives: VkFence and VkSemaphore
Key Concepts¶
The Problem with Blocking:
In Hello Triangle (SimpleExampleEngineLayer), the CPU calls device.waitUntilIdle() after submitting each frame. This creates a timeline like:
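[Timeline diagram: the CPU prepares frame 1, then idles while the GPU renders it; only then does the CPU prepare frame 2.]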
The GPU is idle while the CPU prepares, and the CPU is idle while the GPU renders. When the two stages take similar time, each processor runs at ~50% utilization.
Frame Overlap with Multiple Frames In Flight:
By maintaining multiple frames "in-flight" simultaneously, we overlap CPU and GPU work:
[Timeline diagram: while the GPU renders frame N, the CPU is already preparing frame N+1.]
The CPU prepares frame N+1 while the GPU renders frame N. Both processors stay busy.
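The difference between the two timelines can be put into numbers with a very rough model. The sketch below assumes fixed per-frame costs (the names `cpuMs` and `gpuMs` are illustrative and not part of KDGpu) and ignores presentation and VSync, so treat it as back-of-the-envelope arithmetic rather than a performance prediction.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: cpuMs and gpuMs are the per-frame CPU preparation and
// GPU rendering times in milliseconds. These names are illustrative only.

// Blocking (wait for the GPU after every submit): the stages run back to back.
inline double blockingFrameMs(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return cpuMs + gpuMs;
}

// Overlapped: in steady state the slower stage sets the pace.
inline double overlappedFrameMs(double cpuMs, double gpuMs)
{
    assert(cpuMs >= 0.0 && gpuMs >= 0.0);
    return std::max(cpuMs, gpuMs);
}

// Speed-up of overlap relative to blocking (1.0 = no gain, 2.0 = double FPS).
inline double overlapSpeedup(double cpuMs, double gpuMs)
{
    assert(std::max(cpuMs, gpuMs) > 0.0);
    return blockingFrameMs(cpuMs, gpuMs) / overlappedFrameMs(cpuMs, gpuMs);
}
```

With 8 ms of CPU work and 8 ms of GPU work, blocking takes 16 ms per frame and overlap 8 ms, a 2x speed-up. With 1 ms of CPU work and 9 ms of GPU work, the same model gives only about 1.11x.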
Why "Double-Buffered"?
We maintain 2 independent sets of resources (buffers, command buffers, fences):
- Frame N-1: GPU is rendering
- Frame N: CPU is preparing
This ensures we never access resources currently in use by the GPU.
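The slot bookkeeping behind this is plain modular arithmetic. The following sketch is an assumption about how such an index can be managed, not the code in AdvancedExampleEngineLayer; `kFramesInFlight` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed number of frame slots; 2 matches the double-buffered description.
constexpr std::uint32_t kFramesInFlight = 2;

// Slot used by a given frame number (frames are counted from 0).
inline std::uint32_t slotForFrame(std::uint64_t frame, std::uint32_t slots = kFramesInFlight)
{
    assert(slots > 0);
    return static_cast<std::uint32_t>(frame % slots);
}

// Advance the slot index at the end of a frame, wrapping around.
inline std::uint32_t nextSlot(std::uint32_t slot, std::uint32_t slots = kFramesInFlight)
{
    assert(slots > 0);
    return (slot + 1) % slots;
}

// The earlier frame whose resources `frame` reuses, or -1 if the slot is
// being used for the first time.
inline std::int64_t previousUserOfSlot(std::uint64_t frame, std::uint32_t slots = kFramesInFlight)
{
    assert(slots > 0);
    return frame >= slots ? static_cast<std::int64_t>(frame - slots) : -1;
}
```

With two slots, frame 5 lands in slot 1 and reuses the resources of frame 3. With three slots, it would reuse those of frame 2.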
Fences vs Semaphores:
Vulkan provides two synchronization primitives:
- Fence (VkFence): CPU-GPU synchronization. CPU can wait on a fence to know when GPU work completes.
- Semaphore (VkSemaphore): GPU-GPU synchronization. GPU waits on semaphores between queue submissions.
This example uses:
- Per-frame fences: Before reusing a frame slot, the CPU waits on that slot's fence to ensure the frame that last used it (frame N-2 when two frames are in flight) has finished
- Present semaphores: GPU waits for swapchain image acquisition before rendering
- Render semaphores: Presentation waits for rendering to complete
For more on synchronization: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkFence.html
Implementation¶
AdvancedExampleEngineLayer vs SimpleExampleEngineLayer:
The key difference is removing device.waitUntilIdle() and managing per-frame resources:
- SimpleExampleEngineLayer: Blocks after every frame, single command buffer
- AdvancedExampleEngineLayer: Manages multiple in-flight frames, per-frame resources
To see what AdvancedExampleEngineLayer does behind the scenes, study Hello Triangle Native API, which implements all synchronization manually.
Frame Overlap Synchronization:
The render function uses per-frame indices to manage independent frame resources:
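[Code listing: the render function, which records and submits the current frame using the per-frame resources described below.]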
Filename: hello_triangle_overlap/hello_triangle.cpp
Key points:
- m_inFlightIndex: Current frame slot (0 or 1, plus 2 for triple-buffering)
- m_commandBuffers[m_inFlightIndex]: This frame's command buffer (each frame has its own)
- m_frameFences[m_inFlightIndex]: Signal this fence when GPU finishes this frame
- m_presentCompleteSemaphores[m_inFlightIndex]: Wait for swapchain image acquisition
- m_renderCompleteSemaphores[m_currentSwapchainImageIndex]: Signal when rendering completes
Frame Lifecycle:
Each frame goes through this cycle:
- Acquire: Get next swapchain image (signals present semaphore)
- Wait: CPU waits on the fence of frame N-2, the frame that last used this slot, to ensure it has finished
- Record: CPU records command buffer for current frame N
- Submit: GPU begins executing commands (waits on present semaphore, signals render semaphore and fence)
- Present: Display frame on screen (waits on render semaphore)
Because of the fence wait, frame N-2 is guaranteed to have completed by the time we record frame N, so we can safely reuse its resources.
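To see how the fence wait shapes the schedule, the Wait, Record and Submit steps can be replayed in a small discrete-event simulation. This is a simplified model under stated assumptions: fixed hypothetical costs `cpuMs` and `gpuMs`, a GPU that executes frames strictly in order, and no modeling of image acquisition or presentation. It does not call KDGpu or Vulkan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Replays the lifecycle with `slots` frames in flight and returns the time
// (ms) at which the GPU finishes the last of `frames` frames.
// cpuMs/gpuMs are hypothetical fixed per-frame costs, not measurements.
inline double simulateFinishTimeMs(std::size_t frames, std::size_t slots,
                                   double cpuMs, double gpuMs)
{
    assert(frames > 0 && slots > 0);
    std::vector<double> gpuDone(frames, 0.0); // when each frame's fence signals
    double cpuFree = 0.0;                     // when the CPU finished the previous frame
    for (std::size_t n = 0; n < frames; ++n) {
        // Wait: fence of the frame that last used this slot (frame n - slots).
        const double fenceSignaled = n >= slots ? gpuDone[n - slots] : 0.0;
        const double recordStart = std::max(cpuFree, fenceSignaled);
        // Record: the CPU builds the command buffer for frame n.
        cpuFree = recordStart + cpuMs;
        // Submit: the GPU starts once the commands exist and frame n-1 is done.
        const double gpuStart = std::max(cpuFree, n > 0 ? gpuDone[n - 1] : 0.0);
        gpuDone[n] = gpuStart + gpuMs;
    }
    return gpuDone.back();
}
```

For 4 frames with 5 ms of CPU and 5 ms of GPU work each, one slot (equivalent to blocking after every frame) finishes at 40 ms, while two slots finish at 25 ms.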
Resource Management:
Each frame needs independent resources:
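[Code listing: declarations of the per-frame resources (command buffers, fences and semaphores, one set per frame in flight).]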
Performance Notes¶
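- Latency vs Throughput: Double/triple-buffering increases throughput (FPS) but can add 1-2 frames of input latency. For latency-sensitive applications such as competitive games, prefer double-buffering over triple-buffering.
- CPU-bound vs GPU-bound: Frame overlap helps most when CPU and GPU have comparable work per frame. If the application is heavily GPU-bound (or heavily CPU-bound), overlap hides only the smaller share of the frame time, so the FPS gain is small.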
- Frame pacing: Fences prevent unlimited buffering. Without them, CPU could queue dozens of frames, causing massive latency.
- VSync interaction: With VSync enabled, you may not see FPS improvement but will have smoother frame times.
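The latency and frame-pacing notes above can be illustrated with a rough model that measures how long a frame takes from the moment the CPU starts recording it until the GPU finishes it. As before, this is a sketch under assumptions: `cpuMs` and `gpuMs` are hypothetical fixed costs, the GPU runs frames in order, and VSync and presentation are ignored. Passing a `slots` value at least as large as `frames` models the case where no fence ever limits the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Worst-case time (ms) between the CPU starting to record a frame and the GPU
// finishing it, with `slots` frames allowed in flight.
// cpuMs/gpuMs are hypothetical fixed per-frame costs, not measurements.
inline double worstLatencyMs(std::size_t frames, std::size_t slots,
                             double cpuMs, double gpuMs)
{
    assert(frames > 0 && slots > 0);
    std::vector<double> gpuDone(frames, 0.0); // GPU completion time per frame
    double cpuFree = 0.0;                     // when the CPU is free to record again
    double worst = 0.0;
    for (std::size_t n = 0; n < frames; ++n) {
        // The fence of frame n - slots gates reuse of this slot.
        const double fence = n >= slots ? gpuDone[n - slots] : 0.0;
        const double recordStart = std::max(cpuFree, fence);
        cpuFree = recordStart + cpuMs;
        // The GPU starts frame n after the commands exist and frame n-1 is done.
        const double gpuStart = std::max(cpuFree, n > 0 ? gpuDone[n - 1] : 0.0);
        gpuDone[n] = gpuStart + gpuMs;
        worst = std::max(worst, gpuDone[n] - recordStart);
    }
    return worst;
}
```

For a GPU-bound case of 10 frames with 1 ms of CPU and 10 ms of GPU work, the model gives a worst-case latency of 11 ms with one slot, 20 ms with two and 30 ms with three. With no effective limit (10 slots) it reaches 92 ms and keeps growing with the number of frames, which is the unbounded queuing that fences prevent.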
See Also¶
- Hello Triangle - Simple blocking version for comparison
- Hello Triangle Native API - Manual synchronization implementation without helper API
- Vulkan Synchronization Guide - Comprehensive synchronization overview
- VkFence - CPU-GPU synchronization
- VkSemaphore - GPU-GPU synchronization
Further Reading¶
- Frame Pipelining - Blog on Vulkan Synchronization and frame pipelining
- Vulkan Synchronization Primer - Khronos blog on synchronization
- CPU-GPU Parallelism - NVIDIA guide to overlapping compute and transfer
Updated on 2026-03-31 at 00:02:07 +0000