backend-infra-engineer: Release v0.3.9-hotfix7 snapshot

2025-11-23 13:37:10 -05:00
parent c8289bffda
commit 2934c82b75
202 changed files with 34914 additions and 845 deletions
--- a/docs/internal/CI-TEST-STRATEGY.md
+++ b/docs/internal/CI-TEST-STRATEGY.md
@@ -0,0 +1,175 @@
+# CI Test Strategy
+
+## Overview
+
+The yaze project uses a **tiered testing strategy** to balance CI speed with comprehensive coverage. This document explains the strategy, configuration, and how to add tests.
+
+**Key Distinction:**
+- **Default Tests** (PR/Push CI): Stable, fast, no external dependencies - ALWAYS run, MUST pass
+- **Optional Tests** (Nightly CI): ROM-dependent, experimental, benchmarks - Run nightly, non-blocking
+
+Tier breakdown:
+- **Tier 1 (PR/Push CI)**: Fast feedback loop with stable tests only (~5-10 minutes total)
+- **Tier 2 (Nightly CI)**: Full test suite including heavy/flaky/ROM tests (~30-60 minutes total)
+- **Tier 3 (Configuration Matrix)**: Scheduled cross-platform configuration validation (see Tier 3 below for the schedule)
+
+## Test Tiers
+
+### Tier 1: PR/Push Tests (ci.yml)
+**When:** Every PR and push to master/develop
+
+**Duration:** 5-10 minutes per platform
+
+**Coverage:**
+- Stable tests (unit + integration that don't require ROM)
+- Smoke tests for GUI framework validation (Linux only)
+- Basic build validation across all platforms
+
+**Test Labels:**
+- `stable`: Core functionality tests with stable contracts
+- Includes both unit and integration tests that are fast and reliable
+
+### Tier 2: Nightly Tests (nightly.yml)
+**When:** Nightly at 3 AM UTC (or manual trigger)
+
+**Duration:** 30-60 minutes total
+
+**Coverage:**
+- ROM-dependent tests (with test ROM if available)
+- Experimental AI tests (with Ollama integration)
+- GUI E2E tests (full workflows with ImGuiTestEngine)
+- Performance benchmarks
+- Extended integration tests with all features enabled
+
+**Test Labels:**
+- `rom_dependent`: Tests requiring actual Zelda3 ROM
+- `experimental`: AI and unstable feature tests
+- `gui`: Full GUI automation tests
+- `benchmark`: Performance regression tests
+
+### Tier 3: Configuration Matrix (matrix-test.yml)
+**When:** Nightly at 2 AM UTC (or manual trigger)
+
+**Duration:** 20-30 minutes
+
+**Coverage:**
+- Different feature combinations (minimal, gRPC-only, full AI, etc.)
+- Platform-specific configurations
+- Build configuration validation
+
+## CTest Label System
+
+Tests are organized with labels for selective execution:
+
+```cmake
+# In test/CMakeLists.txt
+yaze_add_test_suite(yaze_test_stable "stable" OFF ${STABLE_TEST_SOURCES})
+yaze_add_test_suite(yaze_test_rom_dependent "rom_dependent" OFF ${ROM_DEPENDENT_TEST_SOURCES})
+yaze_add_test_suite(yaze_test_gui "gui;experimental" ON ${GUI_TEST_SOURCES})
+yaze_add_test_suite(yaze_test_experimental "experimental" OFF ${EXPERIMENTAL_TEST_SOURCES})
+yaze_add_test_suite(yaze_test_benchmark "benchmark" OFF ${BENCHMARK_TEST_SOURCES})
+```
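+
+Because `ctest -L` takes a regular expression, one invocation can select several labels at once. The sketch below maps a tier to such a regex and lists the matching tests without running them. The helper name `tier_label_regex`, the tier names `pr` and `nightly`, and the `./build` directory are assumptions for illustration, not part of the project.
+
+```bash
+# Hypothetical helper (not part of yaze): map a CI tier to a CTest label regex.
+tier_label_regex() {
+  case "$1" in
+    pr)      echo 'stable' ;;
+    nightly) echo 'rom_dependent|experimental|gui|benchmark' ;;
+    *)       echo "unknown tier: $1" >&2; return 1 ;;
+  esac
+}
+
+# Assumes the build tree lives in ./build.
+# -N lists the matching tests without running them.
+if [ -f build/CTestTestfile.cmake ]; then
+  ctest --test-dir build -N -L "$(tier_label_regex nightly)"
+fi
+```
+
+Dropping `-N` runs the selected tests. `ctest -LE <regex>` excludes labels instead of selecting them.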
+
+## Running Tests Locally
+
+### Run specific test categories:
+```bash
+# Stable tests only (what PR CI runs)
+ctest -L stable --output-on-failure
+
+# ROM-dependent tests
+ctest -L rom_dependent --output-on-failure
+
+# Experimental tests
+ctest -L experimental --output-on-failure
+
+# GUI tests headlessly
+./build/bin/yaze_test_gui -nogui
+
+# Benchmarks
+./build/bin/yaze_test_benchmark
+```
+
+### Using test executables directly:
+```bash
+# Run stable test suite
+./build/bin/yaze_test_stable
+
+# Run with specific filter
+./build/bin/yaze_test_stable --gtest_filter="*Overworld*"
+
+# Run GUI smoke tests only
+./build/bin/yaze_test_gui -nogui --gtest_filter="*Smoke*"
+```
+
+## Test Presets
+
+CMakePresets.json defines test presets for different scenarios:
+
+- `stable`: Run stable tests only (no ROM dependency)
+- `unit`: Run unit tests only
+- `integration`: Run integration tests only
+- `stable-ai`: Stable tests with AI stack enabled
+- `unit-ai`: Unit tests with AI stack enabled
+
+Example usage:
+```bash
+# Configure with preset
+cmake --preset ci-linux
+
+# Run tests with preset
+ctest --preset stable
+```
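+
+A script that runs a preset may want to confirm the preset exists first. This is a minimal sketch, assuming `ctest --list-presets` prints each preset name in double quotes; the helper `has_test_preset` is hypothetical.
+
+```bash
+# Hypothetical helper: succeed if a preset name appears in the listing text.
+# Assumes `ctest --list-presets` prints names in double quotes, e.g. "stable".
+has_test_preset() {
+  printf '%s\n' "$2" | grep -Fq -- "\"$1\""
+}
+
+# Run from the source directory that contains CMakePresets.json.
+preset="stable"
+if listing="$(ctest --list-presets 2>/dev/null)" && has_test_preset "$preset" "$listing"; then
+  ctest --preset "$preset" --output-on-failure
+else
+  echo "test preset '$preset' not available here; skipping"
+fi
+```
+
+Matching on the quoted name keeps `stable` from also matching `stable-ai`.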
+
+## Adding New Tests
+
+### For PR/Push CI (Tier 1 - Default):
+Add to `STABLE_TEST_SOURCES` in `test/CMakeLists.txt`:
+- **Requirements**: Must not require ROM files, must complete in under 30 seconds, and must behave deterministically (no flakiness)
+- **Examples**: Unit tests, basic integration tests, framework smoke tests
+- **Location**: `test/unit/`, `test/integration/` (excluding the nightly-only subdirectories listed under Tier 2 below)
+- **Labels assigned**: `stable`
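+
+One way to check the time and flakiness requirements before adding a test to the stable suite is to run it repeatedly under a time limit. This is a sketch only: `check_stable` is a hypothetical helper, and it relies on `timeout` from GNU coreutils, which may be missing on macOS.
+
+```bash
+# Hypothetical helper (not part of yaze): run a command several times,
+# failing on the first run that errors or exceeds the time limit.
+# Relies on `timeout` from GNU coreutils.
+check_stable() {
+  local limit="$1" runs="$2" i=1
+  shift 2
+  while [ "$i" -le "$runs" ]; do
+    if ! timeout "$limit" "$@" >/dev/null 2>&1; then
+      echo "run $i of $runs failed or exceeded ${limit}s" >&2
+      return 1
+    fi
+    i=$((i + 1))
+  done
+  echo "passed $runs runs within ${limit}s each"
+}
+```
+
+For example, `check_stable 30 10 ./build/bin/yaze_test_stable --gtest_filter="*Overworld*"` runs the filtered tests ten times with a 30-second limit per run. GoogleTest's own `--gtest_repeat=N` flag is an alternative when no time limit is needed.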
+
+### For Nightly CI (Tier 2 - Optional):
+Add to the appropriate source list in `test/CMakeLists.txt`:
+
+- `ROM_DEPENDENT_TEST_SOURCES` - Tests requiring ROM
+  - Location: `test/e2e/rom_dependent/` or `test/integration/` (ROM-gated with `#ifdef`)
+  - Labels: `rom_dependent`
+
+- `GUI_TEST_SOURCES` / `EXPERIMENTAL_TEST_SOURCES` - GUI automation and experimental features
+  - Location: `test/integration/ai/` for AI tests
+  - Labels: `experimental` (the GUI suite is labeled `gui;experimental`)
+
+- `BENCHMARK_TEST_SOURCES` - Performance tests
+  - Location: `test/benchmarks/`
+  - Labels: `benchmark`
+
+## CI Optimization Tips
+
+### For Faster PR CI:
+1. Keep the set of tests in `STABLE_TEST_SOURCES` minimal
+2. Use `continue-on-error: true` for non-critical jobs or steps
+3. Leverage caching (CPM, sccache, build artifacts)
+4. Run platform tests in parallel
+
+### For Comprehensive Coverage:
+1. Use `nightly.yml` for heavy tests
+2. Schedule at low-traffic times
+3. Upload artifacts for debugging failures
+4. Use longer timeouts for integration tests
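+
+Longer timeouts for the heavier suites can be passed to CTest through `--timeout`, which sets the default per-test limit in seconds. The sketch below picks a limit per label. The helper `timeout_for_label`, the values, and the `./build` directory are placeholders, not project settings.
+
+```bash
+# Hypothetical per-label timeouts in seconds; the values are placeholders.
+timeout_for_label() {
+  case "$1" in
+    stable)                 echo 60 ;;
+    rom_dependent|gui)      echo 300 ;;
+    experimental|benchmark) echo 600 ;;
+    *)                      echo 120 ;;
+  esac
+}
+
+# Assumes the build tree lives in ./build; the first argument selects the label.
+label="${1:-stable}"
+if [ -f build/CTestTestfile.cmake ]; then
+  ctest --test-dir build -L "$label" \
+    --timeout "$(timeout_for_label "$label")" --output-on-failure
+fi
+```
+
+`--timeout` only sets a default, so a test that declares its own `TIMEOUT` property in CMake keeps that value.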
+
+## Monitoring and Alerts
+
+### PR/Push Failures:
+- Block merging if stable tests fail
+- Immediate feedback in PR comments
+- Required status checks on protected branches
+
+### Nightly Failures:
+- Summary report in GitHub Actions
+- Optional Slack/email notifications for failures
+- Artifacts retained for 30 days for debugging
+- Non-blocking for development
+
+## Future Improvements
+
+1. **Test Result Trends**: Track test success rates over time
+2. **Flaky Test Detection**: Automatically identify and quarantine flaky tests
+3. **Performance Tracking**: Graph benchmark results over commits
+4. **ROM Test Infrastructure**: Secure storage/retrieval of test ROM
+5. **Parallel Test Execution**: Split test suites across multiple runners