- Introduced a detailed comparison document highlighting the functional equivalence between ZScream (C#) and YAZE (C++) overworld loading logic. - Verified key areas such as tile loading, expansion detection, map decompression, and coordinate calculations, confirming consistent behavior across both implementations. - Documented differences and improvements in YAZE, including enhanced error handling and memory management. - Provided validation results from integration tests ensuring data integrity and compatibility with existing ROMs.
105 lines
3.6 KiB
Markdown
105 lines
3.6 KiB
Markdown
# Overworld Optimization Status Update
|
|
|
|
## Current Performance Analysis
|
|
|
|
Based on the latest performance report:
|
|
|
|
```
|
|
CreateOverworldMaps 1 148.42 148.42
|
|
CreateInitialTextures 1 4.49 4.49
|
|
CreateTilemap 1 4.70 4.70
|
|
CreateBitmapWithoutTexture_Graphics1 0.24 0.24
|
|
LoadOverworldData 1 2849.67 2849.67
|
|
AssembleTiles 1 10.35 10.35
|
|
CreateOverworldMapObjects 1 0.74 0.74
|
|
DecompressAllMapTiles 1 1.40 1.40
|
|
CreateBitmapWithoutTexture_Tileset1 3.69 3.69
|
|
Overworld::Load 2 5724.38 2862.19
|
|
```
|
|
|
|
## Key Findings
|
|
|
|
### ✅ **Successful Optimizations**
|
|
1. **Decompression Fixed**: `DecompressAllMapTiles` is now only 1.40ms (was the bottleneck before)
|
|
2. **Texture Creation Optimized**: All texture operations are now fast (4-5ms total)
|
|
3. **Overworld Not Broken**: Fixed the parallel decompression issues that were causing corruption
|
|
|
|
### 🎯 **Real Bottleneck Identified**
|
|
The actual bottleneck is **`LoadOverworldData`** at **2849.67ms (2.8 seconds)**, not the decompression.
|
|
|
|
### 📊 **Performance Breakdown**
|
|
- **Total Overworld::Load**: 2862.19ms (2.9 seconds)
|
|
- **LoadOverworldData**: 2849.67ms (99.5% of total time!)
|
|
- **All other operations**: ~12.5ms (0.5% of total time)
|
|
|
|
## Root Cause Analysis
|
|
|
|
The `LoadOverworldData` phase includes:
|
|
1. `LoadTileTypes()` - Fast
|
|
2. `LoadEntrances()` - Fast
|
|
3. `LoadHoles()` - Fast
|
|
4. `LoadExits()` - Fast
|
|
5. `LoadItems()` - Fast
|
|
6. **`LoadOverworldMaps()`** - This is the bottleneck (already parallelized)
|
|
7. `LoadSprites()` - Fast
|
|
|
|
The issue is that `LoadOverworldMaps()` calls `OverworldMap::BuildMap()` for all 160 maps in parallel, but each `BuildMap()` call is still expensive.
|
|
|
|
## Optimization Strategy
|
|
|
|
### Phase 1: Detailed Profiling (Immediate)
|
|
Added individual timing for each operation in `LoadOverworldData` to identify the exact bottleneck:
|
|
|
|
```cpp
|
|
{
|
|
core::ScopedTimer tile_types_timer("LoadTileTypes");
|
|
LoadTileTypes();
|
|
}
|
|
|
|
{
|
|
core::ScopedTimer entrances_timer("LoadEntrances");
|
|
RETURN_IF_ERROR(LoadEntrances());
|
|
}
|
|
// ... etc for each operation
|
|
```
|
|
|
|
### Phase 2: Optimize BuildMap Operations (Next)
|
|
The `OverworldMap::BuildMap()` method is likely doing expensive operations:
|
|
- Graphics loading and processing
|
|
- Palette operations
|
|
- Tile assembly
|
|
- Bitmap creation
|
|
|
|
### Phase 3: Lazy Loading (Future)
|
|
Only build maps that are immediately needed:
|
|
- Build first 4-8 maps initially
|
|
- Build remaining maps on-demand when accessed
|
|
- Use background processing for non-visible maps
|
|
|
|
## Current Status
|
|
|
|
✅ **Fixed Issues:**
|
|
- Overworld corruption resolved (reverted to sequential decompression)
|
|
- Decompression performance restored (1.4ms)
|
|
- Texture creation optimized
|
|
|
|
🔄 **Next Steps:**
|
|
1. Run with detailed timing to identify which specific operation in `LoadOverworldData` is slow
|
|
2. Optimize the `OverworldMap::BuildMap()` method
|
|
3. Implement lazy loading for non-essential maps
|
|
|
|
## Expected Results
|
|
|
|
With the detailed timing, we should see something like:
|
|
```
|
|
LoadTileTypes 1 ~5ms
|
|
LoadEntrances 1 ~50ms
|
|
LoadHoles 1 ~20ms
|
|
LoadExits 1 ~100ms
|
|
LoadItems 1 ~200ms
|
|
LoadOverworldMaps 1 ~2400ms <-- This will be the bottleneck
|
|
LoadSprites 1 ~100ms
|
|
```
|
|
|
|
This will allow us to focus optimization efforts on the actual bottleneck rather than guessing.
|