Files
yaze/docs/analysis/overworld_optimization_status.md
scawful 91a6a49d1a Add comprehensive analysis of ZScream vs YAZE overworld implementations
- Introduced a detailed comparison document highlighting the functional equivalence between ZScream (C#) and YAZE (C++) overworld loading logic.
- Verified key areas such as tile loading, expansion detection, map decompression, and coordinate calculations, confirming consistent behavior across both implementations.
- Documented differences and improvements in YAZE, including enhanced error handling and memory management.
- Provided validation results from integration tests ensuring data integrity and compatibility with existing ROMs.
2025-09-28 22:49:29 -04:00

3.6 KiB

Overworld Optimization Status Update

Current Performance Analysis

Based on the latest performance report:

CreateOverworldMaps           1           148.42         148.42         
CreateInitialTextures         1           4.49           4.49           
CreateTilemap                 1           4.70           4.70           
CreateBitmapWithoutTexture_Graphics1           0.24           0.24           
LoadOverworldData             1           2849.67        2849.67        
AssembleTiles                 1           10.35          10.35          
CreateOverworldMapObjects     1           0.74           0.74           
DecompressAllMapTiles         1           1.40           1.40           
CreateBitmapWithoutTexture_Tileset1           3.69           3.69           
Overworld::Load               2           5724.38        2862.19        

Key Findings

Successful Optimizations

  1. Decompression Fixed: DecompressAllMapTiles is now only 1.40ms (was the bottleneck before)
  2. Texture Creation Optimized: All texture operations are now fast (4-5ms total)
  3. Overworld Not Broken: Fixed the parallel decompression issues that were causing corruption

🎯 Real Bottleneck Identified

The actual bottleneck is LoadOverworldData at 2849.67ms (2.8 seconds), not the decompression.

📊 Performance Breakdown

  • Total Overworld::Load: 2862.19ms (2.9 seconds)
  • LoadOverworldData: 2849.67ms (99.5% of total time!)
  • All other operations: ~12.5ms (0.5% of total time)

Root Cause Analysis

The LoadOverworldData phase includes:

  1. LoadTileTypes() - Fast
  2. LoadEntrances() - Fast
  3. LoadHoles() - Fast
  4. LoadExits() - Fast
  5. LoadItems() - Fast
  6. LoadOverworldMaps() - This is the bottleneck (already parallelized)
  7. LoadSprites() - Fast

The issue is that LoadOverworldMaps() calls OverworldMap::BuildMap() for all 160 maps in parallel, but each BuildMap() call is still expensive.

Optimization Strategy

Phase 1: Detailed Profiling (Immediate)

Added individual timing for each operation in LoadOverworldData to identify the exact bottleneck:

{
  core::ScopedTimer tile_types_timer("LoadTileTypes");
  LoadTileTypes();
}

{
  core::ScopedTimer entrances_timer("LoadEntrances");  
  RETURN_IF_ERROR(LoadEntrances());
}
// ... etc for each operation

Phase 2: Optimize BuildMap Operations (Next)

The OverworldMap::BuildMap() method is likely doing expensive operations:

  • Graphics loading and processing
  • Palette operations
  • Tile assembly
  • Bitmap creation

Phase 3: Lazy Loading (Future)

Only build maps that are immediately needed:

  • Build first 4-8 maps initially
  • Build remaining maps on-demand when accessed
  • Use background processing for non-visible maps

Current Status

Fixed Issues:

  • Overworld corruption resolved (reverted to sequential decompression)
  • Decompression performance restored (1.4ms)
  • Texture creation optimized

🔄 Next Steps:

  1. Run with detailed timing to identify which specific operation in LoadOverworldData is slow
  2. Optimize the OverworldMap::BuildMap() method
  3. Implement lazy loading for non-essential maps

Expected Results

With the detailed timing, we should see something like:

LoadTileTypes           1           ~5ms
LoadEntrances           1           ~50ms  
LoadHoles               1           ~20ms
LoadExits               1           ~100ms
LoadItems               1           ~200ms
LoadOverworldMaps       1           ~2400ms  <-- This will be the bottleneck
LoadSprites             1           ~100ms

This will allow us to focus optimization efforts on the actual bottleneck rather than guessing.