
Last updated: Feb 28, 2024
Goober Graphics
Problem
Existing graphics engines are either too heavy for small projects or too limiting for advanced rendering techniques. Developers need a lightweight, modern C++ graphics engine that keeps CPU and GPU overhead low while leaving the rendering pipeline open to customization.
Approach
Goober Graphics is being built from scratch in modern C++20, with a focus on performance, modularity, and cross-platform compatibility. The engine pairs current rendering techniques with a clean, extensible architecture:
Advanced Rendering Pipeline
- Physically-Based Rendering: Full PBR implementation with IBL and multiple light types
- Clustered Deferred Rendering: Efficient handling of thousands of dynamic lights
- Temporal Upsampling: High-quality anti-aliasing with minimal performance cost
- Dynamic Resolution Scaling: Adaptive quality based on performance targets
- Volumetric Lighting: Real-time fog, god rays, and atmospheric scattering
- Screen-Space Reflections: High-quality reflections with temporal filtering
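As an illustration of the dynamic resolution scaling item above, here is a minimal sketch of a feedback controller that adjusts the render scale from the measured GPU frame time. The `DynamicResolution` type, its fields, and the 0.25 smoothing factor are assumptions made for this example and are not taken from the engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical controller; names and constants are illustrative only.
struct DynamicResolution {
    float targetMs;          // frame-time budget, e.g. 16.6 ms for 60 Hz
    float minScale = 0.5f;   // lower bound on the per-axis render scale
    float maxScale = 1.0f;   // upper bound (native resolution)
    float scale    = 1.0f;   // current per-axis render scale

    // Feed the last measured GPU frame time; returns the scale for the next frame.
    float update(float gpuMs) {
        assert(gpuMs > 0.0f && targetMs > 0.0f);
        // GPU cost is assumed roughly proportional to pixel count (scale^2),
        // so the per-axis correction is the square root of the time ratio.
        const float ideal = scale * std::sqrt(targetMs / gpuMs);
        // Move only part of the way each frame to limit visible oscillation.
        scale = std::clamp(scale + 0.25f * (ideal - scale), minScale, maxScale);
        return scale;
    }
};
```

A frame that runs over budget lowers the scale gradually, and a frame under budget raises it until it hits `maxScale`.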
Engine Architecture
- Entity-Component-System: Data-oriented design for maximum performance
- Multi-threaded Rendering: Parallel command buffer generation and submission
- GPU-Driven Rendering: Minimize CPU overhead with compute-based culling
- Memory Pool Allocators: Custom allocation strategies for different resource types
- Hot-Reload System: Real-time asset reloading for rapid iteration
- Cross-Platform Support: Windows, Linux, macOS with unified API
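The memory pool allocators listed above could look roughly like the following fixed-size block pool. This is a sketch under stated assumptions: `BlockPool` and its members are hypothetical names, and block alignment is only as good as the chosen block size and the backing `std::vector` allow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size block pool; not the engine's actual allocator.
class BlockPool {
public:
    BlockPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(blockSize), storage_(blockSize * blockCount) {
        assert(blockSize > 0 && blockCount > 0);
        free_.reserve(blockCount);
        // Push indices in reverse so the first allocation returns block 0.
        for (std::size_t i = blockCount; i-- > 0;)
            free_.push_back(static_cast<std::uint32_t>(i));
    }

    // O(1): pop a free block, or return nullptr when the pool is exhausted.
    void* allocate() {
        if (free_.empty()) return nullptr;
        const std::uint32_t i = free_.back();
        free_.pop_back();
        return storage_.data() + i * blockSize_;
    }

    // O(1): return a block previously obtained from allocate().
    void deallocate(void* p) {
        const auto offset = static_cast<std::size_t>(
            static_cast<std::byte*>(p) - storage_.data());
        assert(offset % blockSize_ == 0 && offset < storage_.size());
        free_.push_back(static_cast<std::uint32_t>(offset / blockSize_));
    }

    std::size_t free_blocks() const { return free_.size(); }

private:
    std::size_t blockSize_;
    std::vector<std::byte> storage_;     // one contiguous slab for all blocks
    std::vector<std::uint32_t> free_;    // indices of currently free blocks
};
```

Because every block has the same size, allocation and release never search or fragment, which is why pools suit resources such as per-object constant data.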
Core Implementation
// ---------------------------------------------------------------------------
// strong_handle.hpp
// Modern C++20 strong handle with generation tracking and hashing
// ---------------------------------------------------------------------------
#pragma once
#include <cstddef>
#include <cstdint>
#include <functional>
#include <tuple>

template<typename Tag>
class StrongHandle {
    uint32_t _index{0};       // index 0 is reserved for the invalid handle
    uint32_t _generation{0};  // distinguishes a stale handle from a reused slot

public:
    constexpr StrongHandle() = default;
    // explicit: a raw integer must not silently convert into a typed handle
    constexpr explicit StrongHandle(uint32_t idx, uint32_t gen = 0) noexcept
        : _index(idx), _generation(gen) {}

    constexpr bool     is_valid()   const noexcept { return _index != 0; }
    constexpr uint32_t index()      const noexcept { return _index; }
    constexpr uint32_t generation() const noexcept { return _generation; }

    friend constexpr bool operator==(StrongHandle a, StrongHandle b) noexcept {
        return a._index == b._index && a._generation == b._generation;
    }
    friend constexpr bool operator!=(StrongHandle a, StrongHandle b) noexcept {
        return !(a == b);
    }
    friend constexpr bool operator<(StrongHandle a, StrongHandle b) noexcept {
        return std::tie(a._generation, a._index) < std::tie(b._generation, b._index);
    }
};

template<typename Tag>
struct std::hash<StrongHandle<Tag>> {
    size_t operator()(StrongHandle<Tag> h) const noexcept {
        // Pack into 64 bits first: shifting a 32-bit size_t by 32 would be undefined
        const uint64_t packed = (static_cast<uint64_t>(h.index()) << 32) | h.generation();
        return std::hash<uint64_t>{}(packed);
    }
};

// Forward tags
struct BufferTag {};
using BufferHandle = StrongHandle<BufferTag>;
struct ShaderTag {};
using ShaderHandle = StrongHandle<ShaderTag>;
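To show what the generation field is for, here is a minimal sketch of a slot table that hands out index/generation pairs and rejects stale ones. The `Handle` struct is a simplified stand-in for `StrongHandle<Tag>` so the block compiles on its own, and `HandleTable` is a hypothetical name; the engine's real resource tables are not shown in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in for StrongHandle<Tag>.
struct Handle {
    std::uint32_t index = 0;       // 0 is reserved for "invalid"
    std::uint32_t generation = 0;
};

// Hypothetical slot table: one generation counter per slot.
template <typename T>
class HandleTable {
public:
    HandleTable() : slots_(1) {}   // slot 0 is never handed out

    Handle create(T value) {
        std::uint32_t i;
        if (!free_.empty()) {
            i = free_.back();      // reuse a released slot
            free_.pop_back();
        } else {
            i = static_cast<std::uint32_t>(slots_.size());
            slots_.emplace_back();
        }
        slots_[i].value = std::move(value);
        slots_[i].alive = true;
        return {i, slots_[i].generation};
    }

    void destroy(Handle h) {
        assert(get(h) != nullptr);
        slots_[h.index].alive = false;
        ++slots_[h.index].generation;   // invalidates every outstanding copy of h
        free_.push_back(h.index);
    }

    // Returns nullptr for invalid, out-of-range, or stale handles.
    T* get(Handle h) {
        if (h.index == 0 || h.index >= slots_.size()) return nullptr;
        Slot& s = slots_[h.index];
        return (s.alive && s.generation == h.generation) ? &s.value : nullptr;
    }

private:
    struct Slot { T value{}; std::uint32_t generation = 0; bool alive = false; };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
};
```

After `destroy`, a later `create` may reuse the same index, but the old handle still carries the previous generation and therefore resolves to `nullptr` instead of aliasing the new resource.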
// ---------------------------------------------------------------------------
// bitflags.hpp
// Enum-class bitflags utilities
// ---------------------------------------------------------------------------
#pragma once
#include <type_traits>

template<typename E>
concept EnumClass = std::is_enum_v<E> && !std::is_convertible_v<E, int>;

template<EnumClass E>
constexpr E operator|(E a, E b) {
    using U = std::underlying_type_t<E>;
    return static_cast<E>(static_cast<U>(a) | static_cast<U>(b));
}

template<EnumClass E>
constexpr E operator&(E a, E b) {
    using U = std::underlying_type_t<E>;
    return static_cast<E>(static_cast<U>(a) & static_cast<U>(b));
}

template<EnumClass E>
constexpr E operator~(E e) {
    using U = std::underlying_type_t<E>;
    return static_cast<E>(~static_cast<U>(e));
}

template<EnumClass E>
constexpr bool has(E value, E flag) {
    return static_cast<bool>(value & flag);
}
// ---------------------------------------------------------------------------
// culling_system.hpp
// GPU-driven frustum + Hi-Z occlusion culling
// ---------------------------------------------------------------------------
#pragma once
#include "strong_handle.hpp"
#include <cstdint>

// Matrix4f, Vector4f, ComputeShader, CommandList and ShaderBinder are engine
// types defined elsewhere. Matrix4f and Vector4f must be complete types here
// (a forward declaration is not enough) because CullData stores them by value.

class CullingSystem {
public:
    struct alignas(16) CullData {
        Matrix4f viewProj;
        Vector4f frustumPlanes[6];
        uint32_t objectCount;
        float    lodBias;
        uint32_t depthPyramidMip; // extra param for Hi-Z
    };

    void dispatch(const CullData& data) {
        ShaderBinder _(cullingCS_); // RAII binder
        cullingCS_.set_uniform("uCull", data);
        // One thread per object, rounded up to whole groups
        // (assumes the compute shader uses a workgroup size of 64)
        const uint32_t groups = (data.objectCount + 63) / 64;
        cmd_.dispatch(groups, 1, 1);
    }

    BufferHandle visible_objects() const { return visBuffer_; }

    // Debug helper: draws a micro-UI bar with dispatch cost
    void draw_gui() const;

private:
    ComputeShader cullingCS_;
    BufferHandle  visBuffer_;
    CommandList   cmd_;
};
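The compute shader itself is not shown here, so the following is only a CPU-side reference for the kind of per-object test a frustum-culling pass typically runs, together with the group-count rounding used by `dispatch`. Bounding spheres, inward-facing plane normals, and the names `Plane`, `Sphere`, `sphere_in_frustum`, and `group_count` are assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Plane  { float nx, ny, nz, d; };      // inside points satisfy n.p + d >= 0
struct Sphere { float x, y, z, radius; };

// A sphere is culled only when it lies entirely behind at least one plane.
inline bool sphere_in_frustum(const std::array<Plane, 6>& planes, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : planes) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;  // fully outside this plane
    }
    return true;                             // inside or intersecting
}

// Same rounding as CullingSystem::dispatch: one thread per object,
// `groupSize` threads per workgroup, rounded up to whole groups.
constexpr std::uint32_t group_count(std::uint32_t objects, std::uint32_t groupSize = 64) {
    return (objects + groupSize - 1) / groupSize;
}
```

The test is conservative: a sphere near a frustum corner can pass all six plane checks while still being outside, which costs a little extra drawing but never drops a visible object.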
// ---------------------------------------------------------------------------
// material_system.hpp
// Automatic shader-variant baker with hot reload
// ---------------------------------------------------------------------------
#pragma once
#include "bitflags.hpp"
#include "strong_handle.hpp"
#include <cstdint>
#include <unordered_map>

// ShaderPermutation and ShaderCompiler are engine types defined elsewhere.

class MaterialSystem {
public:
    enum class Flags : uint32_t {
        NONE           = 0,
        HAS_ALBEDO_MAP = 1 << 0,
        HAS_NORMAL_MAP = 1 << 1,
        HAS_METALLIC   = 1 << 2,
        HAS_ROUGHNESS  = 1 << 3,
        ALPHA_TESTED   = 1 << 4,
        DOUBLE_SIDED   = 1 << 5,
    };

    ShaderHandle get_shader(Flags flags) {
        // Fast path: this flag combination has already been compiled
        if (auto it = cache_.find(flags); it != cache_.end())
            return it->second;

        // Slow path: translate each flag into a shader define and compile once
        ShaderPermutation perm;
        perm.define("ALBEDO_MAP",    has(flags, Flags::HAS_ALBEDO_MAP));
        perm.define("NORMAL_MAP",    has(flags, Flags::HAS_NORMAL_MAP));
        perm.define("METALLIC_MAP",  has(flags, Flags::HAS_METALLIC));
        perm.define("ROUGHNESS_MAP", has(flags, Flags::HAS_ROUGHNESS));
        perm.define("ALPHA_TEST",    has(flags, Flags::ALPHA_TESTED));
        perm.define("DOUBLE_SIDED",  has(flags, Flags::DOUBLE_SIDED));
        auto shader = shaders_.compile("pbr_material", perm);
        return cache_.emplace(flags, shader).first->second;
    }

    // Call once per frame (cheap); rebuilds any shaders whose file changed
    void hot_reload() { shaders_.reload_dirty(); }

private:
    std::unordered_map<Flags, ShaderHandle> cache_;
    ShaderCompiler shaders_;
};
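With six independent flags there are at most 2^6 = 64 shader variants, which is why caching by flag value works well. The sketch below shows one plausible way a permutation could be turned into a preprocessor preamble. How `ShaderPermutation::define` actually encodes its values is not shown in this document, so `make_preamble`, `MaterialBits`, and the 0/1 encoding are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Mirrors the bit layout of MaterialSystem::Flags for this standalone example.
enum MaterialBits : std::uint32_t {
    ALBEDO_MAP    = 1u << 0,
    NORMAL_MAP    = 1u << 1,
    METALLIC_MAP  = 1u << 2,
    ROUGHNESS_MAP = 1u << 3,
    ALPHA_TEST    = 1u << 4,
    DOUBLE_SIDED  = 1u << 5,
};

constexpr std::uint32_t kFlagCount    = 6;
constexpr std::uint32_t kVariantCount = 1u << kFlagCount;  // 64 possible variants

// Hypothetical helper: expands a flag mask into "#define NAME 0|1" lines
// that would be prepended to the shader source before compilation.
inline std::string make_preamble(std::uint32_t mask) {
    assert(mask < kVariantCount);
    static constexpr const char* kNames[kFlagCount] = {
        "ALBEDO_MAP", "NORMAL_MAP", "METALLIC_MAP",
        "ROUGHNESS_MAP", "ALPHA_TEST", "DOUBLE_SIDED",
    };
    std::string out;
    for (std::uint32_t i = 0; i < kFlagCount; ++i) {
        out += "#define ";
        out += kNames[i];
        out += ((mask >> i) & 1u) ? " 1\n" : " 0\n";
    }
    return out;
}
```

Emitting every name with an explicit 0 or 1 lets the shader use `#if NAME` uniformly instead of mixing `#ifdef` and `#if`.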
Technical Achievements
Performance Optimization
- GPU-Driven Architecture: All culling and LOD selection on GPU using compute shaders
- Bindless Resources: Direct GPU access to textures without CPU binding overhead
- Parallel Command Generation: Multi-threaded command buffer recording
- Memory-Mapped Buffers: Persistent mapping for streaming data updates
- Custom Memory Allocators: Stack, ring, and pool allocators for different use cases
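Of the allocator types listed above, the stack allocator is the simplest to sketch. The version below works in offsets rather than pointers, so it could sub-allocate either CPU memory or a mapped GPU buffer. `StackAllocator`, `kFailed`, and the default alignment of 16 are assumptions for this example, not the engine's actual interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stack (linear) allocator that hands out offsets into a
// buffer of `capacity` bytes owned by the caller.
class StackAllocator {
public:
    static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

    explicit StackAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns an offset aligned to `alignment` (a power of two), or kFailed
    // if the request does not fit.
    std::size_t allocate(std::size_t size, std::size_t alignment = 16) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t start = (top_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || size > capacity_ - start) return kFailed;
        top_ = start + size;
        return start;
    }

    // Markers allow freeing everything allocated after a given point.
    std::size_t mark() const { return top_; }
    void rewind(std::size_t marker) {
        assert(marker <= top_);
        top_ = marker;
    }

    // Typically called once per frame to recycle the whole buffer.
    void reset() { top_ = 0; }

private:
    std::size_t capacity_;
    std::size_t top_ = 0;   // offset of the first free byte
};
```

Allocation is a single add and mask, and freeing is a single store, at the cost of only being able to release memory in reverse order.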
Innovation Highlights
- Automatic Shader Variants: Shader permutations compiled at runtime and cached per material flag combination
- Temporal Resource Management: Smart resource lifetime tracking and cleanup
- Hot-Reload Everything: Shaders, textures, meshes, and even C++ code
- Debugging Tools: Built-in profiler with GPU timing and memory tracking
- Cross-API Abstraction: Unified interface for Vulkan, DirectX 12, and Metal
Roadmap
Phase 1 - Foundation (Complete)
- Core engine architecture with ECS
- Vulkan renderer with basic PBR
- Asset loading and management system
- Basic scene graph and transforms
- Memory management and profiling
Phase 2 - Advanced Rendering (Current)
- Clustered deferred rendering
- Screen-space reflections
- Volumetric lighting and fog
- Temporal anti-aliasing (TAA)
- GPU-driven particle systems
Phase 3 - Next Generation
- Ray tracing integration (RTX/RDNA2)
- Machine learning-based upscaling
- Advanced post-processing pipeline
- VR/AR rendering optimizations
- Procedural geometry generation
Phase 4 - Production Ready
- Visual scripting system
- Physics integration (custom or Bullet)
- Audio system with spatial audio
- Networking for multiplayer games
- Complete editor with visual tools