yaze/docs/analysis/overworld_optimization_status.md
scawful 91a6a49d1a Add comprehensive analysis of ZScream vs YAZE overworld implementations
- Introduced a detailed comparison document highlighting the functional equivalence between ZScream (C#) and YAZE (C++) overworld loading logic.
- Verified key areas such as tile loading, expansion detection, map decompression, and coordinate calculations, confirming consistent behavior across both implementations.
- Documented differences and improvements in YAZE, including enhanced error handling and memory management.
- Provided validation results from integration tests ensuring data integrity and compatibility with existing ROMs.
2025-09-28 22:49:29 -04:00


# Overworld Optimization Status Update
## Current Performance Analysis
Based on the latest performance report:
```
Operation                             Count  Total (ms)  Avg (ms)
CreateOverworldMaps                       1      148.42    148.42
CreateInitialTextures                     1        4.49      4.49
CreateTilemap                             1        4.70      4.70
CreateBitmapWithoutTexture_Graphics       1        0.24      0.24
LoadOverworldData                         1     2849.67   2849.67
AssembleTiles                             1       10.35     10.35
CreateOverworldMapObjects                 1        0.74      0.74
DecompressAllMapTiles                     1        1.40      1.40
CreateBitmapWithoutTexture_Tileset        1        3.69      3.69
Overworld::Load                           2     5724.38   2862.19
```
## Key Findings
### ✅ **Successful Optimizations**
1. **Decompression Fixed**: `DecompressAllMapTiles` is now only 1.40ms (was the bottleneck before)
2. **Texture Creation Optimized**: Texture operations are now fast (`CreateInitialTextures` and `CreateTilemap` take about 4-5ms each)
3. **Overworld Not Broken**: Fixed the parallel decompression issues that were causing corruption
### 🎯 **Real Bottleneck Identified**
The actual bottleneck is **`LoadOverworldData`** at **2849.67ms (about 2.85 seconds)**, not the decompression.
### 📊 **Performance Breakdown**
- **Total Overworld::Load**: 2862.19ms average per call (about 2.9 seconds; 5724.38ms across 2 calls)
- **LoadOverworldData**: 2849.67ms (99.6% of the per-call time!)
- **All other operations**: ~12.5ms (0.4% of the per-call time)
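
The shares above come from dividing per-call averages, which matters here because `Overworld::Load` was recorded twice. A minimal sketch of that arithmetic follows; `TimingEntry`, `AverageMs`, and `SharePercent` are hypothetical names used for illustration only and are not part of yaze:

```cpp
#include <cassert>
#include <string>

// One row of the performance report. Hypothetical type, not a yaze class.
struct TimingEntry {
  std::string name;
  int calls;        // How many times the operation was recorded.
  double total_ms;  // Total elapsed time across all calls.
};

// Average time per call in milliseconds.
double AverageMs(const TimingEntry& entry) {
  assert(entry.calls > 0);
  return entry.total_ms / entry.calls;
}

// Percentage of the parent's per-call time spent in the child operation.
double SharePercent(const TimingEntry& child, const TimingEntry& parent) {
  return 100.0 * AverageMs(child) / AverageMs(parent);
}
```

With the reported numbers, `LoadOverworldData` (2849.67ms) against the `Overworld::Load` per-call average (5724.38ms / 2 = 2862.19ms) gives about 99.56%, which leaves roughly 12.5ms for everything else.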
## Root Cause Analysis
The `LoadOverworldData` phase includes:
1. `LoadTileTypes()` - Fast
2. `LoadEntrances()` - Fast
3. `LoadHoles()` - Fast
4. `LoadExits()` - Fast
5. `LoadItems()` - Fast
6. **`LoadOverworldMaps()`** - This is the bottleneck (already parallelized)
7. `LoadSprites()` - Fast
The issue is that `LoadOverworldMaps()` calls `OverworldMap::BuildMap()` for all 160 maps in parallel, but each `BuildMap()` call is still expensive.
## Optimization Strategy
### Phase 1: Detailed Profiling (Immediate)
Added individual timing for each operation in `LoadOverworldData` to identify the exact bottleneck:
```cpp
{
  core::ScopedTimer tile_types_timer("LoadTileTypes");
  LoadTileTypes();
}
{
  core::ScopedTimer entrances_timer("LoadEntrances");
  RETURN_IF_ERROR(LoadEntrances());
}
// ... etc for each operation
```
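
For readers unfamiliar with the pattern, a scoped timer records the elapsed time when its enclosing block exits. The sketch below shows one way such a class could work using `std::chrono`. It is an assumption-laden stand-in: `ScopedTimerSketch`, `TimingStats`, and `TimingTable` are hypothetical names, and the real `core::ScopedTimer` may be implemented differently.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated results for one label (hypothetical, not yaze's type).
struct TimingStats {
  int calls = 0;
  double total_ms = 0.0;
};

// Process-wide results table keyed by operation name.
// Not thread-safe; timing parallel code would need a mutex around updates.
inline std::map<std::string, TimingStats>& TimingTable() {
  static std::map<std::string, TimingStats> table;
  return table;
}

// Hypothetical stand-in for core::ScopedTimer.
class ScopedTimerSketch {
 public:
  explicit ScopedTimerSketch(std::string name)
      : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

  // Records the elapsed time when the enclosing scope exits.
  ~ScopedTimerSketch() {
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = end - start_;
    TimingStats& stats = TimingTable()[name_];
    stats.calls += 1;
    stats.total_ms += elapsed.count();
  }

  ScopedTimerSketch(const ScopedTimerSketch&) = delete;
  ScopedTimerSketch& operator=(const ScopedTimerSketch&) = delete;

 private:
  std::string name_;
  std::chrono::steady_clock::time_point start_;
};
```

Using `steady_clock` rather than `system_clock` keeps measurements unaffected by wall-clock adjustments. Because `LoadOverworldMaps()` runs `BuildMap()` in parallel, any timer placed inside those calls would need synchronized access to shared results.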
### Phase 2: Optimize BuildMap Operations (Next)
The `OverworldMap::BuildMap()` method likely performs several expensive operations that still need to be measured individually:
- Graphics loading and processing
- Palette operations
- Tile assembly
- Bitmap creation
### Phase 3: Lazy Loading (Future)
Only build maps that are immediately needed:
- Build first 4-8 maps initially
- Build remaining maps on-demand when accessed
- Use background processing for non-visible maps
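
The first two points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `MapSketch` and `LazyMapSet` are hypothetical types, `MapSketch::BuildMap()` is a placeholder for the real `OverworldMap::BuildMap()` work, and background processing is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for OverworldMap; only tracks whether it was built.
struct MapSketch {
  bool built = false;
  void BuildMap() { built = true; }  // Placeholder for the expensive work.
};

// Builds a few maps eagerly and the rest on first access.
class LazyMapSet {
 public:
  LazyMapSet(std::size_t map_count, std::size_t eager_count)
      : maps_(map_count) {
    for (std::size_t i = 0; i < eager_count && i < map_count; ++i) {
      EnsureBuilt(i);
    }
  }

  // Returns the map at `index`, building it if it has not been built yet.
  MapSketch& Get(std::size_t index) {
    EnsureBuilt(index);
    return maps_[index];
  }

  std::size_t built_count() const { return built_count_; }

 private:
  void EnsureBuilt(std::size_t index) {
    assert(index < maps_.size());
    if (!maps_[index].built) {
      maps_[index].BuildMap();
      ++built_count_;
    }
  }

  std::vector<MapSketch> maps_;
  std::size_t built_count_ = 0;
};
```

For example, `LazyMapSet maps(160, 8);` would build 8 maps up front, and a later `maps.Get(42)` would build one more on demand. The trade-off is that the first access to an unbuilt map pays the full `BuildMap()` cost, which is why background building of non-visible maps is listed as a complement.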
## Current Status
✅ **Fixed Issues:**
- Overworld corruption resolved (reverted to sequential decompression)
- Decompression performance restored (1.4ms)
- Texture creation optimized
🔄 **Next Steps:**
1. Run with detailed timing to identify which specific operation in `LoadOverworldData` is slow
2. Optimize the `OverworldMap::BuildMap()` method
3. Implement lazy loading for non-essential maps
## Expected Results
With the detailed timing in place, we expect output roughly like the following (these figures are rough estimates, not measurements):
```
LoadTileTypes 1 ~5ms
LoadEntrances 1 ~50ms
LoadHoles 1 ~20ms
LoadExits 1 ~100ms
LoadItems 1 ~200ms
LoadOverworldMaps 1 ~2400ms <-- This will be the bottleneck
LoadSprites 1 ~100ms
```
This will allow us to focus optimization efforts on the actual bottleneck rather than guessing.