- Introduced a detailed comparison document highlighting the functional equivalence between ZScream (C#) and YAZE (C++) overworld loading logic.
- Verified key areas such as tile loading, expansion detection, map decompression, and coordinate calculations, confirming consistent behavior across both implementations.
- Documented differences and improvements in YAZE, including enhanced error handling and memory management.
- Provided validation results from integration tests ensuring data integrity and compatibility with existing ROMs.
Overworld Optimization Status Update
Current Performance Analysis
Based on the latest performance report:
Operation                              Count  Total (ms)  Average (ms)
CreateOverworldMaps                    1      148.42      148.42
CreateInitialTextures                  1      4.49        4.49
CreateTilemap                          1      4.70        4.70
CreateBitmapWithoutTexture_Graphics    1      0.24        0.24
LoadOverworldData                      1      2849.67     2849.67
AssembleTiles                          1      10.35       10.35
CreateOverworldMapObjects              1      0.74        0.74
DecompressAllMapTiles                  1      1.40        1.40
CreateBitmapWithoutTexture_Tileset     1      3.69        3.69
Overworld::Load                        2      5724.38     2862.19
Key Findings
✅ Successful Optimizations
- Decompression Fixed: DecompressAllMapTiles is now only 1.40ms (was the bottleneck before)
- Texture Creation Optimized: All texture operations are now fast (4-5ms total)
- Overworld Not Broken: Fixed the parallel decompression issues that were causing corruption
🎯 Real Bottleneck Identified
The actual bottleneck is LoadOverworldData at 2849.67ms (2.8 seconds), not the decompression.
📊 Performance Breakdown
- Total Overworld::Load: 2862.19ms per call (2.9 seconds, averaged over the 2 recorded calls)
- LoadOverworldData: 2849.67ms (99.5% of total time!)
- All other operations: ~12.5ms (0.5% of total time)
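The percentages above follow from dividing each timing by the Overworld::Load per-call average. A minimal sketch of that arithmetic is below; `PercentOf` and `RemainingMs` are hypothetical helper names introduced only for illustration, and the two constants are copied from the report.

```cpp
#include <cassert>

// Hypothetical helper: share of `part_ms` in `total_ms`, as a percentage.
double PercentOf(double part_ms, double total_ms) {
  assert(total_ms > 0.0);
  return 100.0 * part_ms / total_ms;
}

// Values taken from the performance report (per-call averages).
constexpr double kOverworldLoadMs = 2862.19;
constexpr double kLoadOverworldDataMs = 2849.67;

// Time spent in Overworld::Load outside LoadOverworldData.
double RemainingMs() { return kOverworldLoadMs - kLoadOverworldDataMs; }
```

With these inputs, `PercentOf(kLoadOverworldDataMs, kOverworldLoadMs)` comes out at roughly 99.56, and `RemainingMs()` at roughly 12.52.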
Root Cause Analysis
The LoadOverworldData phase includes:
- LoadTileTypes(): Fast
- LoadEntrances(): Fast
- LoadHoles(): Fast
- LoadExits(): Fast
- LoadItems(): Fast
- LoadOverworldMaps(): This is the bottleneck (already parallelized)
- LoadSprites(): Fast
The issue is that LoadOverworldMaps() calls OverworldMap::BuildMap() for all 160 maps in parallel, but each BuildMap() call is still expensive.
Optimization Strategy
Phase 1: Detailed Profiling (Immediate)
Added individual timing for each operation in LoadOverworldData to identify the exact bottleneck:
{
  core::ScopedTimer tile_types_timer("LoadTileTypes");
  LoadTileTypes();
}
{
  core::ScopedTimer entrances_timer("LoadEntrances");
  RETURN_IF_ERROR(LoadEntrances());
}
// ... etc for each operation
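For readers unfamiliar with the pattern, a scoped timer records elapsed time when it goes out of scope, which is why each call above sits in its own braces. The sketch below shows the general idea using std::chrono. It is not the core::ScopedTimer implementation; `SketchScopedTimer` and `TimingTable` are hypothetical names, and the count/total fields are an assumption modeled on the columns of the performance report.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry of accumulated timings, keyed by operation name.
struct TimingTable {
  struct Entry {
    int count = 0;
    double total_ms = 0.0;
  };
  std::map<std::string, Entry> entries;
};

// Illustrative RAII timer: records elapsed wall-clock time on destruction.
class SketchScopedTimer {
 public:
  SketchScopedTimer(TimingTable& table, std::string name)
      : table_(table),
        name_(std::move(name)),
        start_(std::chrono::steady_clock::now()) {}

  ~SketchScopedTimer() {
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start_;
    auto& entry = table_.entries[name_];
    entry.count += 1;
    entry.total_ms += elapsed.count();
  }

  // Copying would double-count the same interval, so it is disabled.
  SketchScopedTimer(const SketchScopedTimer&) = delete;
  SketchScopedTimer& operator=(const SketchScopedTimer&) = delete;

 private:
  TimingTable& table_;
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};
```

steady_clock is used here because it is monotonic, so a system clock adjustment during loading cannot produce a negative or inflated interval.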
Phase 2: Optimize BuildMap Operations (Next)
The OverworldMap::BuildMap() method is likely doing expensive operations:
- Graphics loading and processing
- Palette operations
- Tile assembly
- Bitmap creation
Phase 3: Lazy Loading (Future)
Only build maps that are immediately needed:
- Build first 4-8 maps initially
- Build remaining maps on-demand when accessed
- Use background processing for non-visible maps
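The first two points of this plan can be sketched as a container that builds a few maps eagerly and the rest on first access. This is only an illustration under stated assumptions: `LazyMap` and `LazyMapSet` are hypothetical names, the build step is a placeholder, and background processing for non-visible maps is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for OverworldMap.
struct LazyMap {
  bool built = false;
  void BuildMap() { built = true; }  // Placeholder for the expensive work.
};

class LazyMapSet {
 public:
  // Eagerly builds the first `eager_count` maps; the rest stay unbuilt.
  LazyMapSet(std::size_t total, std::size_t eager_count) : maps_(total) {
    assert(eager_count <= total);
    for (std::size_t i = 0; i < eager_count; ++i) maps_[i].BuildMap();
  }

  // Returns the map at `index`, building it on first access.
  LazyMap& Get(std::size_t index) {
    LazyMap& map = maps_.at(index);
    if (!map.built) {
      map.BuildMap();
      ++on_demand_builds_;
    }
    return map;
  }

  bool IsBuilt(std::size_t index) const { return maps_.at(index).built; }
  std::size_t on_demand_builds() const { return on_demand_builds_; }

 private:
  std::vector<LazyMap> maps_;
  std::size_t on_demand_builds_ = 0;
};
```

Constructing `LazyMapSet maps(160, 8)` would pay the build cost for 8 maps up front and defer the other 152 until `Get()` is called on them. This version is not thread-safe, so combining it with background building would need synchronization.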
Current Status
✅ Fixed Issues:
- Overworld corruption resolved (reverted to sequential decompression)
- Decompression performance restored (1.4ms)
- Texture creation optimized
🔄 Next Steps:
- Run with detailed timing to identify which specific operation in LoadOverworldData is slow
- Optimize the OverworldMap::BuildMap() method
- Implement lazy loading for non-essential maps
Expected Results
With the detailed timing, we should see something like:
LoadTileTypes 1 ~5ms
LoadEntrances 1 ~50ms
LoadHoles 1 ~20ms
LoadExits 1 ~100ms
LoadItems 1 ~200ms
LoadOverworldMaps 1 ~2400ms <-- This will be the bottleneck
LoadSprites 1 ~100ms
This will allow us to focus optimization efforts on the actual bottleneck rather than guessing.
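Once the per-operation timings exist, picking the bottleneck is a maximum over total time. A small sketch follows, assuming a hypothetical `Timing` record; the numbers in `ExpectedTimings` are the estimates listed above, not measured values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one line of the timing report.
struct Timing {
  std::string name;
  double total_ms;
};

// Returns the entry with the largest total time. Requires a non-empty list.
const Timing& Slowest(const std::vector<Timing>& timings) {
  assert(!timings.empty());
  return *std::max_element(
      timings.begin(), timings.end(),
      [](const Timing& a, const Timing& b) { return a.total_ms < b.total_ms; });
}

// The expected values from the list above (estimates, not measurements).
std::vector<Timing> ExpectedTimings() {
  return {{"LoadTileTypes", 5.0},   {"LoadEntrances", 50.0},
          {"LoadHoles", 20.0},      {"LoadExits", 100.0},
          {"LoadItems", 200.0},     {"LoadOverworldMaps", 2400.0},
          {"LoadSprites", 100.0}};
}
```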