Files
yaze/docs/B5-stability-improvements.md
scawful 2b11338e20 Update version to 0.3.2 and enhance stability across platforms
- Bump project version to 0.3.2 in CMakeLists.txt, vcpkg.json, and relevant headers.
- Increase Windows stack size to 8MB to prevent stack overflow during asset loading.
- Isolate development utilities from CI builds to ensure cleaner release artifacts.
- Implement comprehensive bounds checking and fix segmentation faults in the graphics system.
- Update documentation to reflect new build instructions and stability improvements.
2025-09-29 15:47:14 -04:00

404 lines
10 KiB
Markdown

# Stability, Testability & Release Workflow Improvements
This document outlines recent improvements to YAZE's stability, testability, and release workflows, along with recommendations for future optimizations.
## Recent Improvements (v0.3.2)
### Windows Platform Stability
#### Stack Size Optimization
**Problem:** Windows default stack size (1MB) was insufficient for `EditorManager::LoadAssets()` which loads 223 graphics sheets and initializes multiple editors.
**Solution:** Increased stack size to 8MB to match Unix-like systems.
**Implementation:**
```cmake
# src/app/app.cmake
if(MSVC)
target_link_options(yaze PRIVATE /STACK:8388608) # 8MB stack
elseif(MINGW)
target_link_options(yaze PRIVATE -Wl,--stack,8388608)
endif()
```
**Impact:**
- ✅ Eliminated stack overflow crashes during ROM loading
- ✅ Consistent behavior across all platforms
- ✅ Handles deep call chains from editor initialization
#### Development Utility Isolation
**Problem:** Development-only utilities (`extract_vanilla_values`, `rom_patch_utility`) were being built in CI/release workflows, causing linker errors.
**Solution:** Isolated development utilities from CI/release builds using environment detection.
**Implementation:**
```cmake
# test/CMakeLists.txt
if(NOT YAZE_MINIMAL_BUILD AND YAZE_ENABLE_ROM_TESTS AND NOT DEFINED ENV{GITHUB_ACTIONS})
add_executable(extract_vanilla_values ...)
target_link_libraries(extract_vanilla_values yaze_core ...)
endif()
```
**Impact:**
- ✅ Clean release builds without development artifacts
- ✅ Proper library linkage for development tools
- ✅ Faster CI/CD pipelines
### Graphics System Stability
#### Segmentation Fault Resolution
**Problem:** Tile cache system using `std::move()` operations invalidated Bitmap surface pointers, causing crashes.
**Solution:** Disabled move semantics in tile cache and implemented direct SDL texture updates.
**Technical Details:**
- Root cause: `std::move()` operations on Bitmap objects invalidated internal SDL_Surface pointers
- Fix: Disabled tile cache and use direct texture updates
- Optimization: Maintained surface/texture pooling while ensuring pointer stability
**Impact:**
- ✅ Eliminated all segmentation faults in tile16 editor
- ✅ Stable graphics rendering pipeline
- ✅ Reliable texture management
#### Comprehensive Bounds Checking
**Problem:** Out-of-bounds access to tile and palette data caused crashes and corruption.
**Solution:** Added extensive bounds checking throughout graphics pipeline.
**Areas Covered:**
- Tile16 canvas access
- Palette group selection
- Graphics sheet indexing
- Texture coordinate calculations
**Impact:**
- ✅ Prevents crashes from invalid data
- ✅ Better error reporting
- ✅ Safer memory access patterns
### Build System Improvements
#### Modern Windows Workflow
**Previous Approach:** Generate Visual Studio solution files using Python script.
**New Approach:** Use Visual Studio's native CMake support.
**Benefits:**
- No project generation step required
- CMakeLists.txt is the single source of truth
- Changes reflect immediately without regeneration
- Better IntelliSense and debugging experience
- Cross-platform consistency
**Migration:**
```powershell
# Old workflow
python scripts/generate-vs-projects.py
# Open YAZE.sln
# New workflow
# File → Open → Folder → yaze
# Visual Studio detects CMakeLists.txt automatically
```
#### Enhanced CI/CD Reliability
**Improvements:**
- Automatic vcpkg fallback mechanisms
- Proper development utility isolation
- Consistent test execution across platforms
- Better error reporting and debugging
## Recommended Optimizations
### High Priority
#### 1. Lazy Graphics Loading
**Current:** All 223 graphics sheets loaded on ROM open.
**Proposed:** Load graphics sheets on-demand when editors access them.
**Benefits:**
- Faster ROM loading (3-5x improvement expected)
- Reduced memory footprint
- Better startup performance
- Eliminates stack pressure
**Implementation Strategy:**
```cpp
class LazyGraphicsLoader {
std::array<std::optional<gfx::Bitmap>, kNumGfxSheets> sheets_;
gfx::Bitmap& GetSheet(int index) {
if (!sheets_[index]) {
sheets_[index] = LoadGraphicsSheet(rom_, index);
}
return *sheets_[index];
}
};
```
**Impact:**
- 🔵 Complexity: Medium
- 🟢 Performance Gain: High
- 🟢 Risk: Low (backward compatible)
#### 2. Heap-Based Large Allocations
**Current:** Large arrays and vectors allocated on stack during asset loading.
**Proposed:** Move large data structures to heap allocation.
**Benefits:**
- Reduces stack pressure
- More flexible memory management
- Better for Windows default stack constraints
- Safer for deep call chains
**Areas to Convert:**
- Graphics sheet arrays in LoadAllGraphicsData()
- Editor initialization data structures
- Temporary buffers in compression/decompression
**Impact:**
- 🟢 Complexity: Low
- 🟢 Performance Gain: Medium
- 🟢 Risk: Very Low
#### 3. Streaming ROM Assets
**Current:** Load entire ROM and all assets into memory.
**Proposed:** Stream assets from ROM file as needed.
**Benefits:**
- Minimal memory footprint
- Instant ROM opening
- Better for large ROM hacks
- More scalable architecture
**Challenges:**
- Requires architecture refactoring
- Need efficient caching strategy
- Must maintain edit performance
**Impact:**
- 🔴 Complexity: High
- 🟢 Performance Gain: Very High
- 🟡 Risk: Medium (requires testing)
### Medium Priority
#### 4. Enhanced Test Isolation
**Current:** Some tests share global state through Arena singleton.
**Proposed:** Better test isolation with mock singletons.
**Benefits:**
- More reliable test execution
- Parallel test execution possible
- Better test independence
- Easier debugging
**Implementation:**
```cpp
class TestArena : public Arena {
// Test-specific implementation
};
TEST_F(GraphicsTest, TestCase) {
TestArena arena;
Arena::SetInstance(&arena); // Override singleton
// Run test
}
```
**Impact:**
- 🟡 Complexity: Medium
- 🟢 Performance Gain: Medium (parallel tests)
- 🟢 Risk: Low
#### 5. Dependency Caching Optimization
**Current:** CI builds re-download and build some dependencies.
**Proposed:** Enhanced caching strategies for vcpkg and build artifacts.
**Benefits:**
- Faster CI builds (2-3x improvement)
- Reduced CI costs
- More reliable builds (less network dependency)
- Better developer experience
**Implementation:**
```yaml
# GitHub Actions
- uses: actions/cache@v4
with:
path: |
~/.ccache
~/vcpkg_cache
build/_deps
key: ${{ runner.os }}-${{ hashFiles('**/CMakeLists.txt') }}
```
**Impact:**
- 🟢 Complexity: Low
- 🟢 Performance Gain: High (CI only)
- 🟢 Risk: Very Low
#### 6. Memory Pool for Graphics
**Current:** Individual allocation for each Bitmap and texture.
**Proposed:** Memory pool for graphics objects.
**Benefits:**
- Reduced allocation overhead
- Better cache locality
- Predictable memory usage
- Faster allocation/deallocation
**Areas to Apply:**
- Bitmap objects
- SDL surfaces and textures
- Tile data structures
**Impact:**
- 🟡 Complexity: Medium
- 🟡 Performance Gain: Medium
- 🟡 Risk: Medium (requires careful design)
### Low Priority
#### 7. Build Time Optimization
**Current:** Full rebuild takes 10-15 minutes.
**Proposed:** Optimize compilation units and dependencies.
**Strategies:**
- Use forward declarations more extensively
- Split large compilation units
- Optimize template instantiations
- Better use of precompiled headers
**Impact:**
- 🟡 Complexity: Medium
- 🟢 Performance Gain: Medium (developer experience)
- 🟢 Risk: Low
#### 8. Release Workflow Simplification
**Current:** Three separate release workflows (simplified, standard, complex).
**Proposed:** Single unified workflow with conditional features.
**Benefits:**
- Easier maintenance
- Consistent behavior
- Better documentation
- Clearer mental model
**Implementation:**
```yaml
jobs:
release:
strategy:
matrix:
include:
- profile: minimal # Quick releases
- profile: standard # Normal releases
- profile: maximum # Production releases
```
**Impact:**
- 🟢 Complexity: Low
- 🟢 Performance Gain: None (maintenance benefit)
- 🟢 Risk: Very Low
## Testing Improvements
### Current State
- ✅ Comprehensive unit test coverage (46+ tests)
- ✅ Integration tests for major components
- ✅ ROM-dependent tests properly isolated
- ✅ CI-safe test configuration
- ✅ Platform-specific test handling
### Recommendations
#### 1. Visual Regression Testing
**Goal:** Catch graphics rendering regressions automatically.
**Approach:**
- Capture screenshots of editor states
- Compare against baseline images
- Flag visual differences for review
**Tools:** ImGui Test Engine (already integrated)
#### 2. Performance Benchmarks
**Goal:** Track performance regressions in CI.
**Metrics:**
- ROM load time
- Graphics sheet decompression
- Editor initialization
- Memory usage
**Implementation:** Google Benchmark (already a dependency)
#### 3. Fuzz Testing
**Goal:** Find edge cases and crashes through random input.
**Areas:**
- ROM parsing
- Compression/decompression
- Palette handling
- Tile data processing
**Tools:** LibFuzzer or AFL
## Metrics & Monitoring
### Current Measurements
- Build time: ~10-15 minutes (full rebuild)
- ROM load time: ~2-3 seconds
- Memory usage: ~500MB-1GB typical
- Test execution: ~30 seconds (CI), ~2 minutes (full)
### Target Improvements
- Build time: <5 minutes (incremental), <10 minutes (full)
- ROM load time: <1 second (with lazy loading)
- Memory usage: <300MB (with streaming)
- Test execution: <15 seconds (CI), <1 minute (full)
## Action Items
### Immediate (v0.3.2)
- [x] Fix Windows stack overflow
- [x] Isolate development utilities
- [x] Fix graphics segfaults
- [x] Update build documentation
- [ ] Complete tile16 palette display fixes
### Short Term (v0.3.3)
- [ ] Implement lazy graphics loading
- [ ] Move large allocations to heap
- [ ] Enhanced CI caching
- [ ] Performance benchmarks
### Medium Term (v0.4.0)
- [ ] Streaming ROM assets
- [ ] Memory pool for graphics
- [ ] Visual regression tests
- [ ] Enhanced test isolation
### Long Term (v0.5.0+)
- [ ] Fuzz testing integration
- [ ] Build time optimization
- [ ] Release workflow unification
- [ ] Advanced memory profiling
## Conclusion
The v0.3.2 release focuses on stability and reliability improvements, particularly for the Windows platform. The recommended optimizations provide a clear roadmap for future performance and maintainability improvements while maintaining backward compatibility and code quality.
For questions or suggestions, please open an issue or discussion on the GitHub repository.