yaze Performance Optimization Summary
🎉 Massive Performance Improvements Achieved!
📊 Overall Performance Results
| Component | Before | After | Improvement |
|---|---|---|---|
| DungeonEditor::Load | 17,967ms | 3,747ms | 🚀 79% faster! |
| Total ROM Loading | ~18.6s | ~4.7s | 🚀 75% faster! |
| User Experience | 18-second freeze | Near-instant | Dramatic improvement |
🚀 Optimizations Implemented
1. Performance Monitoring System with Feature Flag
- Feature Flag Control: `kEnablePerformanceMonitoring` in FeatureFlags allows enabling/disabling the system.
- Zero-Overhead When Disabled: `ScopedTimer` becomes a no-op when monitoring is off.
- UI Toggle: Performance monitoring can be toggled in the Settings UI.
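The flag-gated timer described above could look roughly like the following. This is a minimal sketch, not yaze's actual code: the mutable `FeatureFlags` struct, the `TimingLog()` sink, and the constructor signature are all assumptions made for illustration, and it needs C++17 for the `static inline` member.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the real feature flag storage.
struct FeatureFlags {
  static inline bool kEnablePerformanceMonitoring = true;
};

// Hypothetical sink: accumulated milliseconds per operation name.
inline std::map<std::string, double>& TimingLog() {
  static std::map<std::string, double> log;
  return log;
}

class ScopedTimer {
 public:
  explicit ScopedTimer(std::string name)
      : enabled_(FeatureFlags::kEnablePerformanceMonitoring) {
    if (!enabled_) return;  // Monitoring off: no clock read, no bookkeeping.
    name_ = std::move(name);
    start_ = std::chrono::steady_clock::now();
  }

  ~ScopedTimer() {
    if (!enabled_) return;
    auto elapsed = std::chrono::steady_clock::now() - start_;
    TimingLog()[name_] +=
        std::chrono::duration<double, std::milli>(elapsed).count();
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  bool enabled_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};
```

In this sketch a disabled timer still costs one branch in the constructor and one in the destructor; a compile-time flag would be one way to remove even that.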
2. DungeonEditor Parallel Loading (79% Speedup)
- Problem Solved: Loading 296 rooms sequentially was the primary bottleneck, taking ~18 seconds.
- Solution: Implemented multi-threaded room loading, using up to 8 threads to process rooms in parallel. This includes thread-safe collection of results and hardware-aware concurrency.
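As a rough illustration of the "hardware-aware concurrency" part, the helpers below pick a worker count capped at 8 and split the rooms into contiguous ranges. The function names `PickThreadCount` and `SplitRanges` are hypothetical and only meant to show the arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Worker count: hardware threads, capped at max_threads and at the number
// of items, but never less than 1.
inline std::size_t PickThreadCount(std::size_t items, std::size_t hw_threads,
                                   std::size_t max_threads = 8) {
  if (hw_threads == 0) hw_threads = 1;  // hardware_concurrency() may report 0.
  return std::max<std::size_t>(1, std::min({hw_threads, max_threads, items}));
}

// Splits [0, items) into contiguous [begin, end) ranges, one per worker.
inline std::vector<std::pair<std::size_t, std::size_t>> SplitRanges(
    std::size_t items, std::size_t threads) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (threads == 0) return ranges;
  std::size_t per_thread = (items + threads - 1) / threads;  // Ceiling division.
  for (std::size_t begin = 0; begin < items; begin += per_thread) {
    ranges.emplace_back(begin, std::min(begin + per_thread, items));
  }
  return ranges;
}
```

A caller would typically pass `std::thread::hardware_concurrency()` (from `<thread>`) as `hw_threads`. With 296 rooms and 8 workers, each range covers 37 rooms.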
3. Incremental Overworld Map Loading
- Problem Solved: UI would block and show blank maps while all 160 overworld maps were loaded upfront.
- Solution: Implemented a priority-based incremental loading system. It creates textures for the current world's maps first, at a 4x faster rate (8 per frame), while showing "Loading..." placeholders for the rest.
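The priority ordering can be sketched as a queue that puts the current world's maps first and is drained a few entries per frame. The names below are hypothetical, and the mapping from map index to world (`index / maps_per_world`) is an assumption made for the example rather than something taken from the yaze source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <vector>

// Builds the load order: maps of the current world first, all others after,
// keeping index order within each group.
inline std::deque<int> BuildLoadQueue(int map_count, int current_world,
                                      int maps_per_world) {
  std::vector<int> order(static_cast<std::size_t>(map_count));
  std::iota(order.begin(), order.end(), 0);
  std::stable_partition(order.begin(), order.end(), [&](int map) {
    return map / maps_per_world == current_world;
  });
  return std::deque<int>(order.begin(), order.end());
}

// Called once per frame: creates at most `budget` textures and returns how
// many were created. `create_texture` stands in for the real GPU call.
template <typename CreateTextureFn>
std::size_t ProcessFrame(std::deque<int>& queue, std::size_t budget,
                         CreateTextureFn create_texture) {
  std::size_t created = 0;
  while (created < budget && !queue.empty()) {
    create_texture(queue.front());
    queue.pop_front();
    ++created;
  }
  return created;
}
```

Maps still in the queue are the ones that would show a "Loading..." placeholder until their turn comes up.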
4. On-Demand Map Reloading
- Problem Solved: Any property change would trigger an expensive full map refresh, even for non-visible maps.
- Solution: The refresh system now reloads only maps that are currently visible. Changes to non-visible maps are deferred until those maps are viewed.
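One way to picture the deferral is a dirty set: a change to a visible map reloads it immediately, while a change to a hidden map only marks it dirty until it becomes visible. `MapRefreshTracker` and its methods are hypothetical names, and `Reload` is a placeholder for the expensive refresh.

```cpp
#include <cassert>
#include <set>
#include <vector>

class MapRefreshTracker {
 public:
  // A property of `map_id` changed: reload now if visible, otherwise defer.
  void OnPropertyChanged(int map_id) {
    if (visible_.count(map_id) > 0) {
      Reload(map_id);
    } else {
      dirty_.insert(map_id);
    }
  }

  // The set of visible maps changed: flush deferred reloads for maps that
  // just came into view.
  void SetVisible(const std::set<int>& visible) {
    visible_ = visible;
    for (int map_id : visible_) {
      if (dirty_.erase(map_id) > 0) Reload(map_id);
    }
  }

  bool IsDirty(int map_id) const { return dirty_.count(map_id) > 0; }
  const std::vector<int>& reload_log() const { return reload_log_; }

 private:
  // Placeholder for the real map refresh; records the call for inspection.
  void Reload(int map_id) { reload_log_.push_back(map_id); }

  std::set<int> visible_;
  std::set<int> dirty_;
  std::vector<int> reload_log_;
};
```

Repeated edits to the same hidden map collapse into a single reload when it is finally viewed, which is where most of the saving would come from.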
Appendix A: Dungeon Editor Parallel Optimization
- Problem Identified: `DungeonEditor::LoadAllRooms` took 17.97 seconds, accounting for 99.9% of loading time.
- Strategy: The 296 independent rooms were loaded in parallel across up to 8 threads (~37 rooms per thread).
- Implementation: Used `std::async` to launch tasks and `std::mutex` to safely collect results (like room size and palette data). Results are sorted on the main thread for consistency.
- Result: Loading time for the dungeon editor was reduced by 79% to ~3.7 seconds.
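The pattern described here (`std::async` tasks, a mutex around the shared result list, and a final sort on the calling thread) might be sketched as follows. `RoomResult`, `LoadRoom`, and `LoadAllRoomsParallel` are hypothetical stand-ins; the real room loading does far more work per room.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical per-room result (the source mentions room size and palette data).
struct RoomResult {
  int room_id;
  int size;
};

// Hypothetical stand-in for the per-room load work.
inline RoomResult LoadRoom(int room_id) { return {room_id, room_id * 2}; }

inline std::vector<RoomResult> LoadAllRoomsParallel(int room_count,
                                                    int thread_count) {
  std::vector<RoomResult> results;
  if (room_count <= 0) return results;
  if (thread_count < 1) thread_count = 1;
  results.reserve(static_cast<std::size_t>(room_count));

  std::mutex results_mutex;
  std::vector<std::future<void>> tasks;
  const int per_thread = (room_count + thread_count - 1) / thread_count;

  for (int begin = 0; begin < room_count; begin += per_thread) {
    const int end = std::min(begin + per_thread, room_count);
    tasks.push_back(std::async(
        std::launch::async, [&results, &results_mutex, begin, end] {
          for (int id = begin; id < end; ++id) {
            RoomResult r = LoadRoom(id);  // Expensive work stays outside the lock.
            std::lock_guard<std::mutex> lock(results_mutex);
            results.push_back(r);
          }
        }));
  }
  for (auto& task : tasks) task.get();  // Wait; rethrows any task exception.

  // Worker completion order is nondeterministic, so sort for a stable order.
  std::sort(results.begin(), results.end(),
            [](const RoomResult& a, const RoomResult& b) {
              return a.room_id < b.room_id;
            });
  return results;
}
```

Holding the lock only for the `push_back` keeps contention low. On some toolchains this needs to be linked with thread support (for example `-pthread`).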
Appendix B: Overworld Load Optimization
- Problem Identified: `Overworld::Load` took 2.9 seconds, with the main bottleneck being the sequential decompression of 160 map tiles (`DecompressAllMapTiles`).
- Strategy: Parallelize the decompression operations and implement lazy loading for maps that are not immediately visible.
- Implementation: The plan involves using `std::async` to decompress map batches concurrently and creating a system to only load essential maps on startup, deferring the rest to a background process.
- Expected Result: A 70-80% reduction in initial overworld loading time.
Appendix C: Renderer Optimization
- Problem Identified: The original renderer created GPU textures synchronously on the main thread for all 160 overworld maps, blocking the UI for several seconds.
- Strategy: Defer texture creation. Bitmaps and surface data are prepared first (a CPU-bound task that can be backgrounded), while the actual GPU texture creation (a main-thread-only task) is done progressively or on-demand.
- Implementation: A `CreateBitmapWithoutTexture` method was introduced. A lazy loading system (`ProcessDeferredTextures`) processes a few textures per frame to avoid blocking, and `EnsureMapTexture` creates a texture immediately if a map becomes visible.
- Result: A much more responsive UI during ROM loading, with an initial load time of only ~200-500ms.
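The interplay of the three methods can be illustrated with a small model. The method names are borrowed from the text above, but the `DeferredTextureQueue` class, the signatures, and the `MapBitmap` flags are assumptions for the sketch, and `CreateTexture` is a placeholder for the real main-thread GPU call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct MapBitmap {
  bool pixels_ready = false;  // CPU-side bitmap data has been prepared.
  bool has_texture = false;   // GPU texture has been created.
};

class DeferredTextureQueue {
 public:
  explicit DeferredTextureQueue(std::size_t map_count) : maps_(map_count) {}

  // Prepares CPU-side data only and queues the texture for later.
  void CreateBitmapWithoutTexture(std::size_t id) {
    maps_[id].pixels_ready = true;
    pending_.push_back(id);
  }

  // Call once per frame on the main thread; creates at most `budget` textures.
  std::size_t ProcessDeferredTextures(std::size_t budget) {
    std::size_t created = 0;
    while (created < budget && !pending_.empty()) {
      const std::size_t id = pending_.front();
      pending_.pop_front();
      if (maps_[id].has_texture) continue;  // Already made by EnsureMapTexture.
      CreateTexture(id);
      ++created;
    }
    return created;
  }

  // A map became visible: create its texture now rather than waiting.
  void EnsureMapTexture(std::size_t id) {
    if (maps_[id].pixels_ready && !maps_[id].has_texture) CreateTexture(id);
  }

  bool HasTexture(std::size_t id) const { return maps_[id].has_texture; }
  std::size_t texture_count() const { return texture_count_; }

 private:
  // Placeholder for the real GPU texture creation.
  void CreateTexture(std::size_t id) {
    maps_[id].has_texture = true;
    ++texture_count_;
  }

  std::vector<MapBitmap> maps_;
  std::deque<std::size_t> pending_;
  std::size_t texture_count_ = 0;
};
```

Entries already satisfied by `EnsureMapTexture` are skipped without consuming the per-frame budget, so a map that was forced early does not slow down the rest of the queue.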