backend-infra-engineer: Post v0.3.9-hotfix7 snapshot (build cleanup)

2025-12-22 00:20:49 +00:00
parent 2934c82b75
commit 5c4cd57ff8
1259 changed files with 239160 additions and 43801 deletions
--- a/docs/internal/agents/archive/testing-docs-2025/archive-index.md
+++ b/docs/internal/agents/archive/testing-docs-2025/archive-index.md
@@ -0,0 +1,142 @@
+# Testing Documentation Archive (November 2025)
+
+This directory contains testing-related documentation that was archived during a comprehensive cleanup of `/docs/internal/testing/` to reduce duplication and improve maintainability.
+
+## Archive Rationale
+
+The testing directory contained 25 markdown files with significant duplication of content from:
+- `test/README.md` - The canonical test suite documentation
+- `docs/public/build/quick-reference.md` - The canonical build reference
+- `docs/internal/ci-and-testing.md` - CI/CD pipeline documentation
+
+## Archived Files (6 total)
+
+### Bloated/Redundant Documentation
+
+1. **testing-strategy.md** (843 lines)
+   - Duplicates the tiered testing strategy from `test/README.md`
+   - Reason: Content moved to canonical test/README.md
+   - Reference: See test/README.md for current strategy
+
+2. **TEST_INFRASTRUCTURE_IMPROVEMENT_PLAN.md** (2257 lines)
+   - Massive improvement proposal document
+   - Duplicates much of test/README.md and docs/internal/ci-and-testing.md
+   - Reason: Content integrated into existing canonical docs
+   - Reference: Implementation recommendations are in docs/internal/ci-and-testing.md
+
+3. **ci-improvements-proposal.md** (690 lines)
+   - Detailed CI/CD improvement proposals
+   - Overlaps significantly with docs/internal/ci-and-testing.md
+   - Reason: Improvements documented in canonical CI/testing doc
+   - Reference: See docs/internal/ci-and-testing.md
+
+4. **cmake-validation.md** (672 lines)
+   - CMake validation guide
+   - Duplicates content from docs/public/build/quick-reference.md
+   - Reason: Build validation covered in quick-reference.md
+   - Reference: See docs/public/build/quick-reference.md
+
+5. **integration-plan.md** (505 lines)
+   - Testing infrastructure integration planning document
+   - Much of content duplicated in test/README.md
+   - Reason: Integration approach implemented and documented elsewhere
+   - Reference: See test/README.md for current integration approach
+
+6. **matrix-testing-strategy.md** (499 lines)
+   - Platform/configuration matrix testing strategy
+   - Some unique content but much is duplicated in other docs
+   - Reason: Matrix testing implementation is in scripts/
+   - Reference: Check scripts/test-config-matrix.sh and related scripts
+
+## Deleted Files (14 total - Already in git staging)
+
+These files were completely duplicative and offered no unique value:
+
+1. **QUICKSTART.md** - Exact duplicate of QUICK_START_GUIDE.md
+2. **QUICK_START_GUIDE.md** - Duplicates test/README.md Quick Start section
+3. **QUICK_REFERENCE.md** - Redundant quick reference for symbol detection
+4. **README_TESTING.md** - Duplicate hub documentation
+5. **TESTING_INDEX.md** - Navigation index (redundant)
+6. **ARCHITECTURE_HANDOFF.md** - AI-generated project status document
+7. **INITIATIVE.md** - AI-generated project initiative document
+8. **EXECUTIVE_SUMMARY.md** - AI-generated executive summary
+9. **IMPLEMENTATION_GUIDE.md** - Symbol detection implementation guide (superseded)
+10. **MATRIX_TESTING_README.md** - Matrix testing system documentation
+11. **MATRIX_TESTING_IMPLEMENTATION.md** - Matrix testing implementation guide
+12. **MATRIX_TESTING_CHECKLIST.md** - Matrix testing checklist
+13. **SYMBOL_DETECTION_README.md** - Duplicate of symbol-conflict-detection.md
+14. **TEST_INFRASTRUCTURE_IMPROVEMENT_PLAN.md** - (see archived files above)
+
+## Files Retained (5 total in docs/internal/testing/)
+
+1. **dungeon-gui-test-design.md** (1007 lines)
+   - Unique architectural test design for dungeon editor
+   - Specific to DungeonEditorV2 testing with ImGuiTestEngine
+   - Rationale: Contains unique architectural and testing patterns not found elsewhere
+
+2. **pre-push-checklist.md** (335 lines)
+   - Practical developer checklist for pre-push validation
+   - Links to scripts and CI verification
+   - Rationale: Useful operational checklist referenced by developers
+
+3. **README.md** (414 lines)
+   - Hub documentation for testing infrastructure
+   - Links to canonical testing documents and resources
+   - Rationale: Serves as navigation hub to various testing documents
+
+4. **symbol-conflict-detection.md** (440 lines)
+   - Complete documentation for symbol conflict detection system
+   - Details on symbol extraction, detection, and pre-commit hooks
+   - Rationale: Complete reference for symbol conflict system
+
+5. **sample-symbol-database.json** (1133 bytes)
+   - Example JSON database for symbol conflict detection
+   - Supporting documentation for symbol system
+   - Rationale: Example data for understanding symbol database format
+
+## Canonical Documentation References
+
+When working with testing, refer to these canonical sources:
+
+- **Test Suite Overview**: `test/README.md` (407 lines)
+  - Tiered testing strategy, test structure, running tests
+  - How to write new tests, CI configuration
+
+- **Build & Test Quick Reference**: `docs/public/build/quick-reference.md`
+  - CMake presets, common build commands
+  - Test execution quick reference
+
+- **CI/CD Pipeline**: `docs/internal/ci-and-testing.md`
+  - CI workflow configuration, test infrastructure
+  - GitHub Actions integration
+
+- **CLAUDE.md**: Project root CLAUDE.md
+  - References canonical test documentation
+  - Links to quick-reference.md and test/README.md
+
+## How to Restore
+
+If you need to reference archived content:
+
+```bash
+# View specific archived document
+cat docs/internal/agents/archive/testing-docs-2025/testing-strategy.md
+
+# Restore if needed
+mv docs/internal/agents/archive/testing-docs-2025/<filename>.md docs/internal/testing/
+```
+
+## Cleanup Results
+
+- **Before**: 25 markdown files (12,170 total lines)
+- **After**: 5 markdown files (2,943 total lines)
+- **Reduction**: 80% fewer files, 75.8% fewer lines
+- **Result**: Cleaner documentation structure, easier to maintain, reduced duplication
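+
+The reduction figures can be re-derived from the before/after counts listed above. A minimal Python sketch (the helper name `percent_reduction` is illustrative and not part of the repository):
+
+```python
+def percent_reduction(before: int, after: int) -> float:
+    """Return the percentage by which `after` is smaller than `before`."""
+    return (before - after) / before * 100
+
+
+# Counts taken from the Before/After bullets above.
+files = percent_reduction(25, 5)
+lines = percent_reduction(12_170, 2_943)
+print(f"{files:.1f}% fewer files, {lines:.1f}% fewer lines")
+```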
+
+## Related Cleanup
+
+This cleanup was performed as part of documentation janitor work to:
+- Remove AI-generated spam and duplicate documentation
+- Enforce single source of truth for each documentation topic
+- Keep root documentation directory clean
+- Maintain clear, authoritative documentation structure
--- a/docs/internal/agents/archive/testing-docs-2025/ci-improvements-proposal.md
+++ b/docs/internal/agents/archive/testing-docs-2025/ci-improvements-proposal.md
@@ -0,0 +1,690 @@
+# CI/CD Improvements Proposal
+
+## Executive Summary
+
+This document proposes specific improvements to the YAZE CI/CD pipeline to catch build failures earlier, reduce wasted CI time, and provide faster feedback to developers.
+
+**Goals**:
+- Reduce time-to-first-failure from ~15 minutes to <5 minutes
+- Catch 90% of failures in fast jobs (<5 min)
+- Reduce PR iteration time from hours to minutes
+- Prevent platform-specific issues from reaching CI
+
+**ROI**:
+- **Time Saved**: ~10 minutes per failed build × ~30 failures/month = **5 hours/month**
+- **Developer Experience**: Faster feedback → less context switching
+- **CI Cost**: Minimal (fast jobs use fewer resources)
+
+---
+
+## Current CI Pipeline Analysis
+
+### Current Jobs
+
+| Job | Platform | Duration | Cost | Catches |
+|-----|----------|----------|------|---------|
+| build | Ubuntu/macOS/Windows | 15-20 min | High | Compilation errors |
+| test | Ubuntu/macOS/Windows | 5 min | Medium | Test failures |
+| windows-agent | Windows | 30 min | High | AI stack issues |
+| code-quality | Ubuntu | 2 min | Low | Format/lint issues |
+| memory-sanitizer | Ubuntu | 20 min | High | Memory bugs |
+| z3ed-agent-test | macOS | 15 min | High | Agent integration |
+
+**Total PR Time**: ~40 minutes (parallel), ~90 minutes (worst case)
+
+### Issues with Current Pipeline
+
+1. **Long feedback loop**: 15-20 minutes to find out if headers are missing
+2. **Wasted resources**: Full 20-minute builds that fail in first 2 minutes
+3. **No early validation**: CMake configuration succeeds, but compilation fails later
+4. **Symbol conflicts detected late**: Link errors only appear after full compile
+5. **Platform-specific issues**: Discovered after 15+ minutes per platform
+
+---
+
+## Proposed Improvements
+
+### Improvement 1: Configuration Validation Job
+
+**Goal**: Catch CMake errors in <2 minutes
+
+**Implementation**:
+```yaml
+config-validation:
+  name: "Config Validation - ${{ matrix.preset }}"
+  runs-on: ${{ matrix.os }}
+  strategy:
+    fail-fast: true  # Stop immediately if any fails
+    matrix:
+      include:
+        - os: ubuntu-22.04
+          preset: ci-linux
+          platform: linux
+        - os: macos-14
+          preset: ci-macos
+          platform: macos
+        - os: windows-2022
+          preset: ci-windows
+          platform: windows
+
+  steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Setup build environment
+      uses: ./.github/actions/setup-build
+      with:
+        platform: ${{ matrix.platform }}
+        preset: ${{ matrix.preset }}
+
+    - name: Validate CMake configuration
+      shell: bash  # backslash continuation is not valid in pwsh, the Windows default
+      run: |
+        cmake --preset ${{ matrix.preset }} \
+          -DCMAKE_VERBOSE_MAKEFILE=OFF
+
+    - name: Check include paths
+      shell: bash
+      run: |
+        grep "INCLUDE_DIRECTORIES" build/CMakeCache.txt || \
+          (echo "Include paths not configured" && exit 1)
+
+    - name: Validate presets
+      run: cmake --list-presets
+```
+
+**Benefits**:
+- ✅ Fails in <2 minutes for CMake errors
+- ✅ Catches missing dependencies immediately
+- ✅ Validates include path propagation
+- ✅ Low resource usage (no compilation)
+
+**What it catches**:
+- CMake syntax errors
+- Missing dependencies (immediate)
+- Invalid preset definitions
+- Include path misconfiguration
+
+---
+
+### Improvement 2: Compile-Only Job
+
+**Goal**: Catch compilation errors in <5 minutes
+
+**Implementation**:
+```yaml
+compile-check:
+  name: "Compile Check - ${{ matrix.preset }}"
+  runs-on: ${{ matrix.os }}
+  needs: [config-validation]  # Run after config validation passes
+  strategy:
+    fail-fast: true  # Stop remaining platforms as soon as one fails
+    matrix:
+      include:
+        - os: ubuntu-22.04
+          preset: ci-linux
+          platform: linux
+        - os: macos-14
+          preset: ci-macos
+          platform: macos
+        - os: windows-2022
+          preset: ci-windows
+          platform: windows
+
+  steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Setup build environment
+      uses: ./.github/actions/setup-build
+      with:
+        platform: ${{ matrix.platform }}
+        preset: ${{ matrix.preset }}
+
+    - name: Configure project
+      run: cmake --preset ${{ matrix.preset }}
+
+    - name: Compile representative files
+      run: |
+        # Compile 10-20 key files to catch most header issues
+        cmake --build build --target rom.cc.o bitmap.cc.o \
+          overworld.cc.o resource_catalog.cc.o \
+          dungeon.cc.o sprite.cc.o palette.cc.o \
+          asar_wrapper.cc.o controller.cc.o canvas.cc.o \
+          --parallel 4
+
+    - name: Check for common issues
+      shell: bash  # the test below is bash syntax; Windows runners default to pwsh
+      run: |
+        # Platform-specific checks
+        if [ "${{ matrix.platform }}" = "windows" ]; then
+          echo "Checking for /std:c++latest flag..."
+          grep -q "std:c++latest" build/compile_commands.json || \
+            echo "Warning: /std:c++latest flag may be missing"
+        fi
+```
+
+**Benefits**:
+- ✅ Catches header issues in ~5 minutes
+- ✅ Tests actual compilation without full build
+- ✅ Platform-specific early detection
+- ✅ ~70% faster than full build
+
+**What it catches**:
+- Missing headers
+- Include path problems
+- Preprocessor errors
+- Template instantiation issues
+- Platform-specific compilation errors
+
+---
+
+### Improvement 3: Symbol Conflict Job
+
+**Goal**: Detect ODR violations and duplicate symbols as soon as the build finishes, before merge
+
+**Implementation**:
+```yaml
+symbol-check:
+  name: "Symbol Check - ${{ matrix.platform }}"
+  runs-on: ${{ matrix.os }}
+  needs: [build]  # Run after full build completes
+  strategy:
+    matrix:
+      include:
+        - os: ubuntu-22.04
+          platform: linux
+        - os: macos-14
+          platform: macos
+        - os: windows-2022
+          platform: windows
+
+  steps:
+    - uses: actions/checkout@v4
+
+    - name: Download build artifacts
+      uses: actions/download-artifact@v4
+      with:
+        name: build-${{ matrix.platform }}
+        path: build
+
+    - name: Check for symbol conflicts (Unix)
+      if: matrix.platform != 'windows'
+      run: ./scripts/verify-symbols.sh --build-dir build
+
+    - name: Check for symbol conflicts (Windows)
+      if: matrix.platform == 'windows'
+      shell: pwsh
+      run: .\scripts\verify-symbols.ps1 -BuildDir build
+
+    - name: Upload conflict report
+      if: failure()
+      uses: actions/upload-artifact@v4
+      with:
+        name: symbol-conflicts-${{ matrix.platform }}
+        path: build/symbol-report.txt
+```
+
+**Benefits**:
+- ✅ Catches ODR violations before merge
+- ✅ Detects FLAGS conflicts (Linux-specific)
+- ✅ Platform-specific symbol issues
+- ✅ Runs in parallel with tests (~3 minutes)
+
+**What it catches**:
+- Duplicate symbol definitions
+- FLAGS_* conflicts (gflags)
+- ODR violations
+- Link-time errors (predicted)
+
+---
+
+### Improvement 4: Fail-Fast Strategy
+
+**Goal**: Stop wasting resources on doomed builds
+
+**Current Behavior**: All jobs run even if one fails
+**Proposed Behavior**: Stop non-essential jobs if critical jobs fail
+
+**Implementation**:
+```yaml
+jobs:
+  # Critical path: These must pass
+  config-validation:
+    # ... (as above)
+
+  compile-check:
+    needs: [config-validation]
+    strategy:
+      fail-fast: true  # Stop all platforms if one fails
+
+  build:
+    needs: [compile-check]
+    strategy:
+      fail-fast: false  # Allow other platforms to continue
+
+  # Non-critical: These can be skipped if builds fail
+  integration-tests:
+    needs: [build]
+    if: success()  # Only run if build succeeded
+
+  windows-agent:
+    needs: [build, test]
+    if: success() && github.event_name != 'pull_request'
+```
+
+**Benefits**:
+- ✅ Saves ~60 minutes of CI time per failed build
+- ✅ Faster feedback (no waiting for doomed jobs)
+- ✅ Reduced resource usage
+
+---
+
+### Improvement 5: Preset Matrix Testing
+
+**Goal**: Validate all presets can configure
+
+**Implementation**:
+```yaml
+preset-validation:
+  name: "Preset Validation"
+  runs-on: ${{ matrix.os }}
+  strategy:
+    matrix:
+      os: [ubuntu-22.04, macos-14, windows-2022]
+
+  steps:
+    - uses: actions/checkout@v4
+
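+    - name: Test all presets for platform
+      shell: bash  # the loop below is bash syntax; Windows runners default to pwsh
+      run: |
+        # NOTE: the grep filter only matches presets whose listing contains the runner label
+        for preset in $(cmake --list-presets | grep "${{ matrix.os }}" | awk '{print $1}' | tr -d '"'); do
+          echo "Testing preset: $preset"
+          cmake --preset "$preset" || exit 1
+        done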
+```
+
+**Benefits**:
+- ✅ Catches invalid preset definitions
+- ✅ Validates CMake configuration across all presets
+- ✅ Fast (<2 minutes)
+
+---
+
+## Proposed CI Pipeline (New)
+
+### Job Dependencies
+
+```
+┌─────────────────────┐
+│ config-validation   │ (2 min, fail-fast)
+└──────────┬──────────┘
+           │
+           ▼
+┌─────────────────────┐
+│  compile-check      │ (5 min, fail-fast)
+└──────────┬──────────┘
+           │
+           ▼
+┌─────────────────────┐
+│       build         │ (15 min, parallel)
+└──────────┬──────────┘
+           │
+           ├──────────┬──────────┬──────────┐
+           ▼          ▼          ▼          ▼
+      ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
+      │  test  │ │ symbol │ │quality │ │sanitize│
+      │ (5 min)│ │(3 min) │ │(2 min) │ │(20 min)│
+      └────────┘ └────────┘ └────────┘ └────────┘
+```
+
+### Time Comparison
+
+**Current Pipeline**:
+- First failure: ~15 minutes (compilation error)
+- Total time: ~40 minutes (if all succeed)
+
+**Proposed Pipeline**:
+- First failure: ~2 minutes (CMake error) or ~5 minutes (compilation error)
+- Total time: ~40 minutes (if all succeed)
+
+**Time Saved**:
+- CMake errors: **13 minutes saved** (15 min → 2 min)
+- Compilation errors: **10 minutes saved** (15 min → 5 min)
+- Symbol conflicts: **Caught earlier** (no failed PRs)
+
+---
+
+## Implementation Plan
+
+### Phase 1: Quick Wins (Week 1)
+
+1. **Add config-validation job**
+   - Copy composite actions
+   - Add new job to `ci.yml`
+   - Test on feature branch
+
+2. **Add symbol-check script**
+   - Already created: `scripts/verify-symbols.sh`
+   - Add Windows version: `scripts/verify-symbols.ps1`
+   - Test locally
+
+3. **Update job dependencies**
+   - Make `build` depend on `config-validation`
+   - Add fail-fast to compile-check
+
+**Deliverables**:
+- ✅ Config validation catches CMake errors in <2 min
+- ✅ Symbol checker available for CI
+- ✅ Fail-fast prevents wasted CI time
+
+### Phase 2: Compilation Checks (Week 2)
+
+1. **Add compile-check job**
+   - Identify representative files
+   - Create compilation target list
+   - Add to CI workflow
+
+2. **Platform-specific smoke tests**
+   - Windows: Check `/std:c++latest`
+   - Linux: Check `-std=c++20`
+   - macOS: Check framework links
+
+**Deliverables**:
+- ✅ Compilation errors caught in <5 min
+- ✅ Platform-specific issues detected early
+
+### Phase 3: Symbol Validation (Week 3)
+
+1. **Add symbol-check job**
+   - Integrate `verify-symbols.sh`
+   - Upload conflict reports
+   - Add to required checks
+
+2. **Create symbol conflict guide**
+   - Document common issues
+   - Provide fix examples
+   - Link from CI failures
+
+**Deliverables**:
+- ✅ ODR violations caught before merge
+- ✅ FLAGS conflicts detected automatically
+
+### Phase 4: Optimization (Week 4)
+
+1. **Fine-tune fail-fast**
+   - Identify critical vs optional jobs
+   - Set up conditional execution
+   - Test resource savings
+
+2. **Add caching improvements**
+   - Cache compiled objects
+   - Share artifacts between jobs
+   - Optimize dependency downloads
+
+**Deliverables**:
+- ✅ ~60 minutes CI time saved per failed build
+- ✅ Faster PR iteration
+
+---
+
+## Success Metrics
+
+### Before Improvements
+
+| Metric | Value |
+|--------|-------|
+| Time to first failure | 15-20 min |
+| CI failures per month | ~30 |
+| Wasted CI time/month | ~8 hours |
+| PR iteration time | 2-4 hours |
+| Symbol conflicts caught | 0% (manual) |
+
+### After Improvements (Target)
+
+| Metric | Value |
+|--------|-------|
+| Time to first failure | **2-5 min** |
+| CI failures per month | **<10** |
+| Wasted CI time/month | **<2 hours** |
+| PR iteration time | **30-60 min** |
+| Symbol conflicts caught | **100%** |
+
+### ROI Calculation
+
+**Time Savings**:
+- 20 failures/month × 10 min saved = **200 minutes/month**
+- 10 failed PRs avoided = **~4 hours/month**
+- **Total: ~7 hours/month saved** (200 minutes ≈ 3.3 hours, plus ~4 hours)
+
+**Developer Experience**:
+- Faster feedback → less context switching
+- Earlier error detection → easier debugging
+- Fewer CI failures → less frustration
+
+---
+
+## Risks & Mitigations
+
+### Risk 1: False Positives
+**Risk**: New checks catch issues that aren't real problems
+**Mitigation**:
+- Test thoroughly before enabling as required
+- Allow overrides for known false positives
+- Iterate on filtering logic
+
+### Risk 2: Increased Complexity
+**Risk**: More jobs = harder to understand CI failures
+**Mitigation**:
+- Clear job names and descriptions
+- Good error messages with links to docs
+- Dependency graph visualization
+
+### Risk 3: Slower PR Merges
+**Risk**: More required checks = slower to merge
+**Mitigation**:
+- Make only critical checks required
+- Run expensive checks post-merge
+- Provide override mechanism for emergencies
+
+---
+
+## Alternative Approaches Considered
+
+### Approach 1: Pre-commit Hooks
+**Pros**: Catch issues before pushing
+**Cons**: Developers can skip, not enforced
+**Decision**: Provide optional hooks, but rely on CI
+
+### Approach 2: GitHub Actions Matrix Expansion
+**Pros**: Test more combinations
+**Cons**: Significantly more CI time
+**Decision**: Focus on critical paths, expand later if needed
+
+### Approach 3: Self-Hosted Runners
+**Pros**: Faster builds, more control
+**Cons**: Maintenance overhead, security concerns
+**Decision**: Stick with GitHub runners for now
+
+---
+
+## Related Work
+
+### Similar Implementations
+- **LLVM Project**: Uses compile-only jobs for fast feedback
+- **Chromium**: Extensive smoke testing before full builds
+- **Abseil**: Symbol conflict detection in CI
+
+### Best Practices
+1. **Fail Fast**: Stop early if critical checks fail
+2. **Layered Testing**: Quick checks first, expensive checks later
+3. **Clear Feedback**: Good error messages with actionable advice
+4. **Caching**: Reuse work across jobs when possible
+
+---
+
+## Appendix A: New CI Jobs (YAML)
+
+### Config Validation Job
+```yaml
+config-validation:
+  name: "Config Validation - ${{ matrix.name }}"
+  runs-on: ${{ matrix.os }}
+  strategy:
+    fail-fast: true
+    matrix:
+      include:
+        - name: "Ubuntu 22.04"
+          os: ubuntu-22.04
+          preset: ci-linux
+          platform: linux
+        - name: "macOS 14"
+          os: macos-14
+          preset: ci-macos
+          platform: macos
+        - name: "Windows 2022"
+          os: windows-2022
+          preset: ci-windows
+          platform: windows
+
+  steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Setup build environment
+      uses: ./.github/actions/setup-build
+      with:
+        platform: ${{ matrix.platform }}
+        preset: ${{ matrix.preset }}
+
+    - name: Validate CMake configuration
+      run: cmake --preset ${{ matrix.preset }}
+
+    - name: Check configuration
+      shell: bash
+      run: |
+        # Check include paths
+        grep "INCLUDE_DIRECTORIES" build/CMakeCache.txt
+
+        # Check that the presets file parses and lists cleanly
+        cmake --list-presets
+```
+
+### Compile Check Job
+```yaml
+compile-check:
+  name: "Compile Check - ${{ matrix.name }}"
+  runs-on: ${{ matrix.os }}
+  needs: [config-validation]
+  strategy:
+    fail-fast: true
+    matrix:
+      include:
+        - name: "Ubuntu 22.04"
+          os: ubuntu-22.04
+          preset: ci-linux
+          platform: linux
+        - name: "macOS 14"
+          os: macos-14
+          preset: ci-macos
+          platform: macos
+        - name: "Windows 2022"
+          os: windows-2022
+          preset: ci-windows
+          platform: windows
+
+  steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+      with:
+        submodules: recursive
+
+    - name: Setup build environment
+      uses: ./.github/actions/setup-build
+      with:
+        platform: ${{ matrix.platform }}
+        preset: ${{ matrix.preset }}
+
+    - name: Configure project
+      run: cmake --preset ${{ matrix.preset }}
+
+    - name: Smoke compilation test
+      shell: bash
+      run: ./scripts/pre-push-test.sh --smoke-only --preset ${{ matrix.preset }}
+```
+
+### Symbol Check Job
+```yaml
+symbol-check:
+  name: "Symbol Check - ${{ matrix.name }}"
+  runs-on: ${{ matrix.os }}
+  needs: [build]
+  strategy:
+    matrix:
+      include:
+        - name: "Ubuntu 22.04"
+          os: ubuntu-22.04
+          platform: linux
+        - name: "macOS 14"
+          os: macos-14
+          platform: macos
+
+  steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+
+    - name: Download build artifacts
+      uses: actions/download-artifact@v4
+      with:
+        name: build-${{ matrix.platform }}
+        path: build
+
+    - name: Check for symbol conflicts
+      shell: bash
+      run: ./scripts/verify-symbols.sh --build-dir build
+
+    - name: Upload conflict report
+      if: failure()
+      uses: actions/upload-artifact@v4
+      with:
+        name: symbol-conflicts-${{ matrix.platform }}
+        path: build/symbol-report.txt
+```
+
+---
+
+## Appendix B: Cost Analysis
+
+### Current Monthly CI Usage (Estimated)
+
+| Job | Duration | Runs/Month | Total Time |
+|-----|----------|------------|------------|
+| build (3 platforms) | 15 min × 3 | 100 PRs | **75 hours** |
+| test (3 platforms) | 5 min × 3 | 100 PRs | **25 hours** |
+| windows-agent | 30 min | 30 | **15 hours** |
+| code-quality | 2 min | 100 PRs | **3.3 hours** |
+| memory-sanitizer | 20 min | 50 PRs | **16.7 hours** |
+| z3ed-agent-test | 15 min | 30 | **7.5 hours** |
+| **Total** | | | **142.5 hours** |
+
+### Proposed Monthly CI Usage
+
+| Job | Duration | Runs/Month | Total Time |
+|-----|----------|------------|------------|
+| config-validation (3) | 2 min × 3 | 100 PRs | **10 hours** |
+| compile-check (3) | 5 min × 3 | 100 PRs | **25 hours** |
+| build (3 platforms) | 15 min × 3 | 80 PRs | **60 hours** (↓20%) |
+| test (3 platforms) | 5 min × 3 | 80 PRs | **20 hours** (↓20%) |
+| symbol-check (2) | 3 min × 2 | 80 PRs | **8 hours** |
+| windows-agent | 30 min | 25 | **12.5 hours** (↓17%) |
+| code-quality | 2 min | 100 PRs | **3.3 hours** |
+| memory-sanitizer | 20 min | 40 PRs | **13.3 hours** (↓20%) |
+| z3ed-agent-test | 15 min | 25 | **6.25 hours** (↓17%) |
+| **Total** | | | **158.4 hours** (+11%) |
+
+**Net Change**: +16 hours/month (11% increase)
+
+**BUT**:
+- Fewer failed builds (20% reduction)
+- Faster feedback (10-15 min saved per failure)
+- Better developer experience (invaluable)
+
+**Conclusion**: Slight increase in total CI time, but significant improvement in efficiency and developer experience
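+
+The totals in both tables can be re-checked mechanically. The sketch below is illustrative only: the job names, durations, and run counts are copied from the tables above, and the helper `total_hours` is a hypothetical name.
+
+```python
+# (job, minutes per run summed over platforms, runs per month), from the tables above
+current = [
+    ("build", 15 * 3, 100),
+    ("test", 5 * 3, 100),
+    ("windows-agent", 30, 30),
+    ("code-quality", 2, 100),
+    ("memory-sanitizer", 20, 50),
+    ("z3ed-agent-test", 15, 30),
+]
+proposed = [
+    ("config-validation", 2 * 3, 100),
+    ("compile-check", 5 * 3, 100),
+    ("build", 15 * 3, 80),
+    ("test", 5 * 3, 80),
+    ("symbol-check", 3 * 2, 80),
+    ("windows-agent", 30, 25),
+    ("code-quality", 2, 100),
+    ("memory-sanitizer", 20, 40),
+    ("z3ed-agent-test", 15, 25),
+]
+
+
+def total_hours(jobs):
+    """Sum minutes-per-run times runs-per-month and convert to hours."""
+    return sum(minutes * runs for _, minutes, runs in jobs) / 60
+
+
+before, after = total_hours(current), total_hours(proposed)
+change = (after - before) / before
+print(f"current: {before:.1f} h, proposed: {after:.1f} h, change: {change:+.1%}")
+```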
--- a/docs/internal/agents/archive/testing-docs-2025/cmake-validation.md
+++ b/docs/internal/agents/archive/testing-docs-2025/cmake-validation.md
@@ -0,0 +1,672 @@
+# CMake Configuration Validation
+
+Comprehensive guide to validating CMake configuration and catching dependency issues early.
+
+## Overview
+
+The CMake validation toolkit provides four tools that catch configuration issues before they cause build failures:
+
+1. **validate-cmake-config.cmake** - Validates CMake cache and configuration
+2. **check-include-paths.sh** - Verifies include paths in compile commands
+3. **visualize-deps.py** - Generates dependency graphs
+4. **test-cmake-presets.sh** - Tests all CMake presets
+
+## Quick Start
+
+```bash
+# 1. Validate configuration after running cmake
+cmake --preset mac-dbg
+cmake -P scripts/validate-cmake-config.cmake build
+
+# 2. Check include paths
+./scripts/check-include-paths.sh build
+
+# 3. Visualize dependencies
+python3 scripts/visualize-deps.py build --format graphviz --stats
+
+# 4. Test all presets for your platform
+./scripts/test-cmake-presets.sh --platform mac
+```
+
+## Tool 1: validate-cmake-config.cmake
+
+### Purpose
+Validates CMake configuration by checking:
+- Required targets exist
+- Feature flags are consistent
+- Compiler settings are correct
+- Platform-specific configuration (especially Windows/Abseil)
+- Output directories are created
+- Common configuration issues
+
+### Usage
+
+```bash
+# Validate default build directory
+cmake -P scripts/validate-cmake-config.cmake
+
+# Validate specific build directory
+cmake -P scripts/validate-cmake-config.cmake build_ai
+
+# Validate after configuration
+cmake --preset win-ai
+cmake -P scripts/validate-cmake-config.cmake build
+```
+
+### Exit Codes
+- **0** - All checks passed
+- **1** - Validation failed (errors detected)
+
+### What It Checks
+
+#### 1. Required Targets
+Ensures core targets exist:
+- `yaze_common` - Common interface library
+
+#### 2. Feature Flag Consistency
+- When `YAZE_ENABLE_AI` is ON, `YAZE_ENABLE_GRPC` must also be ON
+- When `YAZE_ENABLE_GRPC` is ON, validates gRPC version is set
+
+#### 3. Compiler Configuration
+- C++ standard is set to 23
+- MSVC runtime library is configured correctly on Windows
+- Compiler flags are propagated correctly
+
+#### 4. Abseil Configuration (Windows)
+**CRITICAL for Windows builds with gRPC:**
+- Checks `CMAKE_MSVC_RUNTIME_LIBRARY` is set to `MultiThreaded`
+- Validates `ABSL_PROPAGATE_CXX_STD` is enabled
+- Verifies Abseil include directories exist
+
+This prevents the "Abseil missing include paths" issue.
+
+#### 5. Output Directories
+- `build/bin` exists
+- `build/lib` exists
+
+#### 6. Common Issues
+- LTO enabled in Debug builds (warning)
+- Missing compile_commands.json
+- Generator expressions not expanded
+
+### Example Output
+
+```
+=== CMake Configuration Validator ===
+✓ Build directory: build
+✓ Loaded 342 cache variables
+
+=== Validating required targets ===
+✓ Required target exists: yaze_common
+
+=== Validating feature flags ===
+✓ gRPC enabled: ON
+✓ gRPC version: 1.67.1
+✓ Tests enabled
+✓ AI features enabled
+
+=== Validating compiler flags ===
+✓ C++ standard: 23
+✓ CXX flags set: /EHsc /W4 /bigobj
+
+=== Validating Windows/Abseil configuration ===
+✓ MSVC runtime: MultiThreaded$<$<CONFIG:Debug>:Debug>
+✓ Abseil CXX standard propagation enabled
+
+=== Validation Summary ===
+✓ All validation checks passed!
+Configuration is ready for build
+```
+
+## Tool 2: check-include-paths.sh
+
+### Purpose
+Validates include paths in compile_commands.json to catch missing includes before compilation.
+
+**Key Problem Solved:** On Windows, Abseil includes from gRPC were sometimes not propagated, causing build failures. This tool catches that early.
+
+### Usage
+
+```bash
+# Check default build directory
+./scripts/check-include-paths.sh
+
+# Check specific build directory
+./scripts/check-include-paths.sh build_ai
+
+# Verbose mode (shows all include directories)
+VERBOSE=1 ./scripts/check-include-paths.sh build
+```
+
+### Prerequisites
+
+- **jq** (optional but recommended): `brew install jq` / `apt install jq`
+- Without jq, uses basic grep parsing
+
+### What It Checks
+
+#### 1. Common Dependencies
+- SDL2 includes
+- ImGui includes
+- yaml-cpp includes
+
+#### 2. Platform-Specific Includes
+Validates platform-specific headers based on detected OS
+
+#### 3. Abseil Includes (Windows Critical)
+When gRPC is enabled:
+- Checks `build/_deps/grpc-build/third_party/abseil-cpp` exists
+- Validates Abseil paths are in compile commands
+- Warns about unexpanded generator expressions
+
+#### 4. Suspicious Configurations
+- No `-I` flags at all (error)
+- Relative paths with `../` (warning)
+- Duplicate include paths (warning)
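+
+The three checks above can be sketched against `compile_commands.json` entries as follows. This is an illustrative Python version, not the logic of `check-include-paths.sh`; it assumes GCC/Clang-style `-I` and `-isystem` flags (MSVC `/I` is not handled) and POSIX-style command quoting, and the function names are hypothetical.
+
+```python
+import json
+import shlex
+
+
+def include_dirs(entry):
+    """Collect -I/-isystem directories from one compile_commands.json entry."""
+    args = entry.get("arguments") or shlex.split(entry.get("command", ""))
+    dirs = []
+    for i, arg in enumerate(args):
+        if arg in ("-I", "-isystem") and i + 1 < len(args):
+            dirs.append(args[i + 1])  # flag and directory are separate arguments
+        elif arg.startswith("-I") and len(arg) > 2:
+            dirs.append(arg[2:])  # joined form, e.g. -Isrc
+    return dirs
+
+
+def check_entries(entries):
+    """Return findings for missing, relative, and duplicate include paths."""
+    findings = []
+    if not any(include_dirs(e) for e in entries):
+        findings.append("error: no -I flags in any compile command")
+    for e in entries:
+        dirs = include_dirs(e)
+        if any(d.startswith("../") or "/../" in d for d in dirs):
+            findings.append(f"warning: relative include path in {e['file']}")
+        if len(dirs) != len(set(dirs)):
+            findings.append(f"warning: duplicate include path in {e['file']}")
+    return findings
+
+
+sample = json.loads(
+    '[{"directory": "build", "file": "rom.cc",'
+    ' "command": "c++ -Isrc -I../ext -Isrc -c rom.cc"}]'
+)
+for finding in check_entries(sample):
+    print(finding)
+```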
+
+### Exit Codes
+- **0** - All checks passed or warnings only
+- **1** - Critical errors detected
+
+### Example Output
+
+```
+=== Include Path Validation ===
+Build directory: build
+✓ Using jq for JSON parsing
+
+=== Common Dependencies ===
+✓ SDL2 includes found
+✓ ImGui includes found
+⚠ yaml-cpp includes not found (may be optional)
+
+=== Platform-Specific Includes ===
+Platform: macOS
+✓ SDL2 framework/library
+
+=== Checking Abseil Includes (Windows Issue) ===
+gRPC build detected - checking Abseil paths...
+✓ Abseil from gRPC build: build/_deps/grpc-build/third_party/abseil-cpp
+
+=== Suspicious Configurations ===
+✓ Include flags present (234/245 commands)
+✓ No duplicate include paths
+
+=== Summary ===
+Checks performed: 5
+Warnings: 1
+✓ All include path checks passed!
+```
+
+## Tool 3: visualize-deps.py
+
+### Purpose
+Generates visual dependency graphs and detects circular dependencies.
+
+### Usage
+
+```bash
+# Generate GraphViz diagram (default)
+python3 scripts/visualize-deps.py build
+
+# Generate Mermaid diagram
+python3 scripts/visualize-deps.py build --format mermaid -o deps.mmd
+
+# Generate text tree
+python3 scripts/visualize-deps.py build --format text
+
+# Show statistics
+python3 scripts/visualize-deps.py build --stats
+```
+
+### Output Formats
+
+#### 1. GraphViz (DOT)
+```bash
+python3 scripts/visualize-deps.py build --format graphviz -o dependencies.dot
+
+# Render to PNG
+dot -Tpng dependencies.dot -o dependencies.png
+
+# Render to SVG (better for large graphs)
+dot -Tsvg dependencies.dot -o dependencies.svg
+```
+
+**Color Coding:**
+- Blue boxes: Executables
+- Green boxes: Libraries
+- Gray boxes: Unknown type
+- Red arrows: Circular dependencies
+
+#### 2. Mermaid
+```bash
+python3 scripts/visualize-deps.py build --format mermaid -o dependencies.mmd
+```
+
+View at https://mermaid.live/edit or include in Markdown:
+
+````markdown
+```mermaid
+graph LR
+  yaze_app-->yaze_lib
+  yaze_lib-->SDL2
+```
+````
+
+#### 3. Text Tree
+```bash
+python3 scripts/visualize-deps.py build --format text
+```
+
+Simple text representation for quick overview.
+
+### Circular Dependency Detection
+
+The tool automatically detects and highlights circular dependencies:
+
+```
+✗ Found 1 circular dependencies
+  libA -> libB -> libC -> libA
+```
+
+Circular dependencies in graphs are shown with red arrows.
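+
+One common way to find cycles like the one above is a depth-first search that tracks the nodes on the current path. The sketch below is an assumption about the approach, not the actual code in `visualize-deps.py`; `find_cycles` and the sample graph are hypothetical.
+
```python
def find_cycles(graph):
    """Return one dependency path per back edge found by depth-first search.

    graph maps a target name to the list of targets it depends on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []
    cycles = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: we closed a cycle.
                start = path.index(dep)
                cycles.append(path[start:] + [dep])
            elif state == WHITE:
                visit(dep)
        path.pop()
        color[node] = BLACK

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            visit(node)
    return cycles


deps = {
    "libA": ["libB"],
    "libB": ["libC"],
    "libC": ["libA"],
    "yaze_app": ["libA"],
}
for cycle in find_cycles(deps):
    print(" -> ".join(cycle))
```
+
+For the sample graph this prints `libA -> libB -> libC -> libA`. The sketch reports one path per back edge rather than enumerating every simple cycle, which is usually enough to locate the offending link.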
+
+### Statistics Output
+
+With `--stats` flag:
+```
+=== Dependency Statistics ===
+Total targets: 47
+Total dependencies: 156
+Average dependencies per target: 3.32
+
+Most connected targets:
+  yaze_lib: 23 dependencies
+  yaze_app: 18 dependencies
+  yaze_cli: 15 dependencies
+  ...
+```
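+
+The average in this report is total dependencies divided by total targets (156 / 47 ≈ 3.32 in the sample above). A minimal sketch of how such a summary could be computed from a target-to-dependencies mapping follows; `dependency_stats` is a hypothetical helper, not part of the actual script.
+
```python
def dependency_stats(graph, top=3):
    """Summarize a {target: [dependencies]} mapping."""
    total_targets = len(graph)
    total_deps = sum(len(deps) for deps in graph.values())
    average = total_deps / total_targets if total_targets else 0.0
    # Sort by dependency count (descending), then by name for stable output.
    ranked = sorted(graph, key=lambda name: (-len(graph[name]), name))
    return {
        "targets": total_targets,
        "dependencies": total_deps,
        "average": round(average, 2),
        "most_connected": [(name, len(graph[name])) for name in ranked[:top]],
    }


sample = {
    "yaze_app": ["yaze_lib", "SDL2"],
    "yaze_lib": ["SDL2"],
    "SDL2": [],
}
print(dependency_stats(sample))
```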
+
+## Tool 4: test-cmake-presets.sh
+
+### Purpose
+Tests that all CMake presets can configure successfully, ensuring no configuration regressions.
+
+### Usage
+
+```bash
+# Test all presets for current platform
+./scripts/test-cmake-presets.sh
+
+# Test specific preset
+./scripts/test-cmake-presets.sh --preset mac-ai
+
+# Test only Mac presets
+./scripts/test-cmake-presets.sh --platform mac
+
+# Test in parallel (4 jobs)
+./scripts/test-cmake-presets.sh --parallel 4
+
+# Quick mode (don't clean between tests)
+./scripts/test-cmake-presets.sh --quick
+
+# Verbose output
+./scripts/test-cmake-presets.sh --verbose
+```
+
+### Options
+
+| Option | Description |
+|--------|-------------|
+| `--parallel N` | Test N presets in parallel (default: 4) |
+| `--preset PRESET` | Test only specific preset |
+| `--platform PLATFORM` | Test only presets for platform (mac/win/lin) |
+| `--quick` | Skip cleaning between tests (faster) |
+| `--verbose` | Show full CMake output |
+
+### Platform Detection
+
+Automatically skips presets for other platforms:
+- On macOS: Only tests `mac-*` and generic presets
+- On Linux: Only tests `lin-*` and generic presets
+- On Windows: Only tests `win-*` and generic presets
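+
+The skip rule above amounts to a prefix filter. The following sketch assumes that platform presets are identified purely by their `mac-`, `lin-`, or `win-` prefix and that everything else is generic; `presets_for` is a hypothetical name and the real script may decide differently.
+
```python
PLATFORM_PREFIXES = ("mac-", "lin-", "win-")


def presets_for(platform, presets):
    """Keep presets for the given platform plus presets with no platform prefix."""
    own_prefix = platform + "-"
    return [
        name
        for name in presets
        if name.startswith(own_prefix) or not name.startswith(PLATFORM_PREFIXES)
    ]


all_presets = ["mac-dbg", "mac-rel", "mac-ai", "win-ai", "dev", "ci", "minimal"]
print(presets_for("mac", all_presets))
```
+
+Note that under this rule a name such as `ci-linux` would count as generic, because it does not start with a platform prefix.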
+
+### Example Output
+
+```
+=== CMake Preset Configuration Tester ===
+Platform: mac
+Parallel jobs: 4
+
+Presets to test:
+  - mac-dbg
+  - mac-rel
+  - mac-ai
+  - dev
+  - ci
+
+Running tests in parallel (jobs: 4)...
+
+✓ mac-dbg configured successfully (12s)
+✓ dev configured successfully (15s)
+✓ mac-rel configured successfully (11s)
+✓ mac-ai configured successfully (45s)
+✓ ci configured successfully (18s)
+
+=== Test Summary ===
+Total presets tested: 5
+Passed: 5
+Failed: 0
+✓ All presets configured successfully!
+```
+
+### Failure Handling
+
+When a preset fails:
+```
+✗ win-ai failed (34s)
+  Log saved to: preset_test_win-ai.log
+
+=== Test Summary ===
+Total presets tested: 3
+Passed: 2
+Failed: 1
+Failed presets:
+  - win-ai
+
+Check log files for details: preset_test_*.log
+```
+
+## Integration with CI
+
+### Add to GitHub Actions Workflow
+
+```yaml
+name: CMake Validation
+
+on: [push, pull_request]
+
+jobs:
+  validate-cmake:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Configure CMake
+        run: cmake --preset ci-linux
+
+      - name: Validate Configuration
+        run: cmake -P scripts/validate-cmake-config.cmake build
+
+      - name: Check Include Paths
+        run: ./scripts/check-include-paths.sh build
+
+      - name: Detect Circular Dependencies
+        run: python3 scripts/visualize-deps.py build --stats
+```
+
+### Pre-Configuration Check
+
+Run validation as the first CI step to fail fast:
+
+```yaml
+- name: Fast Configuration Check
+  run: |
+    cmake --preset minimal
+    cmake -P scripts/validate-cmake-config.cmake build
+```
+
+## Common Issues and Solutions
+
+### Issue 1: Missing Abseil Includes on Windows
+
+**Symptom:**
+```
+✗ Missing required include: Abseil from gRPC build
+```
+
+**Solution:**
+1. Ensure `ABSL_PROPAGATE_CXX_STD` is ON in cmake/dependencies/grpc.cmake
+2. Reconfigure with `--fresh`: `cmake --preset win-ai --fresh`
+3. Check that gRPC was built successfully
+
+**Prevention:**
+Run `cmake -P scripts/validate-cmake-config.cmake` after every configuration.
+
+### Issue 2: Circular Dependencies
+
+**Symptom:**
+```
+✗ Found 1 circular dependencies
+  libA -> libB -> libA
+```
+
+**Solution:**
+1. Visualize full graph: `python3 scripts/visualize-deps.py build --format graphviz -o deps.dot`
+2. Render: `dot -Tpng deps.dot -o deps.png`
+3. Identify and break cycles by:
+   - Moving shared code to a new library
+   - Using forward declarations instead of includes
+   - Restructuring dependencies
+
+### Issue 3: Preset Configuration Fails
+
+**Symptom:**
+```
+✗ mac-ai failed (34s)
+  Log saved to: preset_test_mac-ai.log
+```
+
+**Solution:**
+1. Check log file: `cat preset_test_mac-ai.log`
+2. Common causes:
+   - Missing dependencies (gRPC build failure)
+   - Incompatible compiler flags
+   - Platform condition mismatch
+3. Test preset manually: `cmake --preset mac-ai -B test_build --log-level=VERBOSE`
+
+### Issue 4: Generator Expressions Not Expanded
+
+**Symptom:**
+```
+⚠ Generator expressions found in compile commands (may not be expanded)
+```
+
+**Solution:**
+This is usually harmless. Generator expressions such as `$<BUILD_INTERFACE:...>` are evaluated by CMake at generate time and should not appear in the final compile commands. If the build fails, the cause is likely elsewhere.
+
+## Best Practices
+
+### 1. Run Validation After Every Configuration
+
+```bash
+# Configure
+cmake --preset mac-ai
+
+# Validate immediately
+cmake -P scripts/validate-cmake-config.cmake build
+./scripts/check-include-paths.sh build
+```
+
+### 2. Test All Presets Before Committing
+
+```bash
+# Quick test of all platform presets
+./scripts/test-cmake-presets.sh --platform mac --parallel 4
+```
+
+### 3. Check Dependencies When Adding New Targets
+
+```bash
+# After adding new target to CMakeLists.txt
+cmake --preset dev
+python3 scripts/visualize-deps.py build --stats
+```
+
+Look for:
+- Unexpected high dependency counts
+- New circular dependencies
+
+### 4. Use in Git Hooks
+
+Create `.git/hooks/pre-commit` and make it executable (`chmod +x .git/hooks/pre-commit`):
+```bash
+#!/bin/bash
+# Validate CMake configuration before commit
+
+if [ -f "build/CMakeCache.txt" ]; then
+    echo "Validating CMake configuration..."
+    cmake -P scripts/validate-cmake-config.cmake build || exit 1
+fi
+```
+
+### 5. Periodic Full Validation
+
+Weekly or before releases:
+```bash
+# Full validation suite
+./scripts/test-cmake-presets.sh --parallel 4
+cmake --preset dev
+cmake -P scripts/validate-cmake-config.cmake build
+./scripts/check-include-paths.sh build
+python3 scripts/visualize-deps.py build --format graphviz --stats -o deps.dot
+```
+
+## Troubleshooting
+
+### Tool doesn't run on Windows
+
+**Bash scripts:**
+Use Git Bash, WSL, or MSYS2 to run `.sh` scripts.
+
+**CMake scripts:**
+Should work natively on Windows:
+```powershell
+cmake -P scripts\validate-cmake-config.cmake build
+```
+
+### jq not found
+
+Install jq for better JSON parsing:
+```bash
+# macOS
+brew install jq
+
+# Ubuntu/Debian
+sudo apt install jq
+
+# Windows (via Chocolatey)
+choco install jq
+```
+
+Scripts will work without jq but with reduced functionality.
+
+### Python script fails
+
+Ensure Python 3.7+ is installed:
+```bash
+python3 --version
+```
+
+No external dependencies required - uses only standard library.
+
+### GraphViz rendering fails
+
+Install GraphViz:
+```bash
+# macOS
+brew install graphviz
+
+# Ubuntu/Debian
+sudo apt install graphviz
+
+# Windows (via Chocolatey)
+choco install graphviz
+```
+
+## Advanced Usage
+
+### Custom Validation Rules
+
+Edit `scripts/validate-cmake-config.cmake` to add project-specific checks:
+
+```cmake
+# Add after existing checks
+log_header "Custom Project Checks"
+
+if(DEFINED CACHE_MY_CUSTOM_FLAG)
+  if(CACHE_MY_CUSTOM_FLAG)
+    log_success "Custom flag enabled"
+  else()
+    log_error "Custom flag must be enabled for this build"
+  endif()
+endif()
+```
+
+### Automated Dependency Reports
+
+Generate weekly dependency reports:
+
+```bash
+#!/bin/bash
+# weekly-deps-report.sh
+
+DATE=$(date +%Y-%m-%d)
+REPORT_DIR="reports/$DATE"
+mkdir -p "$REPORT_DIR"
+
+# Configure
+cmake --preset ci
+
+# Generate all formats
+python3 scripts/visualize-deps.py build \
+  --format graphviz --stats -o "$REPORT_DIR/deps.dot"
+
+python3 scripts/visualize-deps.py build \
+  --format mermaid -o "$REPORT_DIR/deps.mmd"
+
+python3 scripts/visualize-deps.py build \
+  --format text -o "$REPORT_DIR/deps.txt"
+
+# Render GraphViz
+dot -Tsvg "$REPORT_DIR/deps.dot" -o "$REPORT_DIR/deps.svg"
+
+echo "Report generated in $REPORT_DIR"
+```
+
+### CI Matrix Testing
+
+Test all presets across platforms:
+
+```yaml
+jobs:
+  test-presets:
+    strategy:
+      matrix:
+        os: [ubuntu-latest, macos-latest, windows-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v3
+      - name: Test Presets
+        shell: bash  # windows-latest defaults to PowerShell, which cannot run .sh scripts directly
+        run: ./scripts/test-cmake-presets.sh --parallel 2
+```
+
+## Quick Reference
+
+| Task | Command |
+|------|---------|
+| Validate config | `cmake -P scripts/validate-cmake-config.cmake build` |
+| Check includes | `./scripts/check-include-paths.sh build` |
+| Visualize deps | `python3 scripts/visualize-deps.py build` |
+| Test all presets | `./scripts/test-cmake-presets.sh` |
+| Test one preset | `./scripts/test-cmake-presets.sh --preset mac-ai` |
+| Generate PNG graph | `python3 scripts/visualize-deps.py build -o d.dot && dot -Tpng d.dot -o d.png` |
+| Check for cycles | `python3 scripts/visualize-deps.py build --stats` |
+| Verbose include check | `VERBOSE=1 ./scripts/check-include-paths.sh build` |
+
+## See Also
+
+- [Build Quick Reference](../../public/build/quick-reference.md) - Build commands
+- [Build Troubleshooting](../../BUILD-TROUBLESHOOTING.md) - Common build issues
+- [CMakePresets.json](../../../CMakePresets.json) - All available presets
+- [GitHub Actions Workflows](../../../.github/workflows/) - CI configuration
--- a/docs/internal/agents/archive/testing-docs-2025/gap-analysis.md
+++ b/docs/internal/agents/archive/testing-docs-2025/gap-analysis.md
@@ -0,0 +1,390 @@
+# Testing Infrastructure Gap Analysis
+
+## Executive Summary
+
+Recent CI failures revealed critical gaps in our testing infrastructure that allowed platform-specific build failures to reach CI. This document analyzes what we currently test, what we missed, and what infrastructure is needed to catch issues earlier.
+
+**Date**: 2025-11-20
+**Triggered By**: Multiple CI failures in commits 43a0e5e314, c2bb90a3f1, and related fixes
+
+---
+
+## 1. Issues We Didn't Catch Locally
+
+### 1.1 Windows Abseil Include Path Issues (c2bb90a3f1)
+**Problem**: Abseil headers not found during Windows/clang-cl compilation
+**Why it wasn't caught**:
+- No local pre-push compilation check
+- CMake configuration validates successfully, but compilation fails later
+- Include path propagation from gRPC/Abseil not validated until full compile
+
+**What would have caught it**:
+- ✅ Smoke compilation test (compile subset of files to catch header issues)
+- ✅ CMake configuration validator (check include path propagation)
+- ✅ Header dependency checker
+
+### 1.2 Linux FLAGS Symbol Conflicts (43a0e5e314, eb77bbeaff)
+**Problem**: ODR (One Definition Rule) violation - multiple `FLAGS` symbols across libraries
+**Why it wasn't caught**:
+- Symbol conflicts only appear at link time
+- No cross-library symbol conflict detection
+- Static analysis doesn't catch ODR violations
+- Unit tests don't link full dependency graph
+
+**What would have caught it**:
+- ✅ Symbol conflict scanner (nm/objdump analysis)
+- ✅ ODR violation detector
+- ✅ Full integration build test (link all libraries together)
+
+### 1.3 Platform-Specific Configuration Issues
+**Problem**: Preprocessor flags, compiler detection, and platform-specific code paths
+**Why it wasn't caught**:
+- No local cross-platform validation
+- CMake configuration differences between platforms not tested
+- Compiler detection logic (clang-cl vs MSVC) not validated
+
+**What would have caught it**:
+- ✅ CMake configuration dry-run on multiple platforms
+- ✅ Preprocessor flag validation
+- ✅ Compiler detection smoke test
+
+---
+
+## 2. Current Testing Coverage
+
+### 2.1 What We Test Well
+
+#### Unit Tests (test/unit/)
+- **Coverage**: Core algorithms, data structures, parsers
+- **Speed**: Fast (<1s for most tests)
+- **Isolation**: Mocked dependencies, no ROM required
+- **CI**: ✅ Runs on every PR
+- **Example**: `hex_test.cc`, `asar_wrapper_test.cc`, `snes_palette_test.cc`
+
+**Strengths**:
+- Catches logic errors quickly
+- Good for TDD
+- Platform-independent
+
+**Gaps**:
+- Doesn't catch build system issues
+- Doesn't catch linking problems
+- Doesn't validate dependencies
+
+#### Integration Tests (test/integration/)
+- **Coverage**: Multi-component interactions, ROM operations
+- **Speed**: Slower (1-10s per test)
+- **Dependencies**: May require ROM files
+- **CI**: ✅ Runs on develop/master
+- **Example**: `asar_integration_test.cc`, `dungeon_editor_v2_test.cc`
+
+**Strengths**:
+- Tests component interactions
+- Validates ROM operations
+
+**Gaps**:
+- Still doesn't catch platform-specific issues
+- Doesn't validate symbol conflicts
+- Doesn't test cross-library linking
+
+#### E2E Tests (test/e2e/)
+- **Coverage**: Full UI workflows, user interactions
+- **Speed**: Very slow (10-60s per test)
+- **Dependencies**: GUI, ImGuiTestEngine
+- **CI**: ⚠️ Limited (only on macOS z3ed-agent-test)
+- **Example**: `dungeon_editor_smoke_test.cc`, `canvas_selection_test.cc`
+
+**Strengths**:
+- Validates real user workflows
+- Tests UI responsiveness
+
+**Gaps**:
+- Not run consistently across platforms
+- Slow feedback loop
+- Requires display/window system
+
+### 2.2 What We DON'T Test
+
+#### Build System Validation
+- ❌ CMake configuration correctness per preset
+- ❌ Include path propagation from dependencies
+- ❌ Compiler flag compatibility
+- ❌ Linker flag validation
+- ❌ Cross-preset compatibility
+
+#### Symbol-Level Issues
+- ❌ ODR (One Definition Rule) violations
+- ❌ Duplicate symbol detection across libraries
+- ❌ Symbol visibility (public/private)
+- ❌ ABI compatibility between libraries
+
+#### Platform-Specific Compilation
+- ❌ Header-only compilation checks
+- ❌ Preprocessor branch coverage
+- ❌ Platform macro validation
+- ❌ Compiler-specific feature detection
+
+#### Dependency Health
+- ❌ Include path conflicts
+- ❌ Library version mismatches
+- ❌ Transitive dependency validation
+- ❌ Static vs shared library conflicts
+
+---
+
+## 3. CI/CD Coverage Analysis
+
+### 3.1 Current CI Matrix (.github/workflows/ci.yml)
+
+| Platform | Build | Test (stable) | Test (unit) | Test (integration) | Test (AI) |
+|----------|-------|---------------|-------------|-------------------|-----------|
+| Ubuntu 22.04 (GCC-12) | ✅ | ✅ | ✅ | ❌ | ❌ |
+| macOS 14 (Clang) | ✅ | ✅ | ✅ | ❌ | ✅ |
+| Windows 2022 (Core) | ✅ | ✅ | ✅ | ❌ | ❌ |
+| Windows 2022 (AI) | ✅ | ✅ | ✅ | ❌ | ❌ |
+
+**CI Job Flow**:
+1. **build**: Configure + compile full project
+2. **test**: Run stable + unit tests
+3. **windows-agent**: Full AI stack (gRPC + AI runtime)
+4. **code-quality**: clang-format, cppcheck, clang-tidy
+5. **memory-sanitizer**: AddressSanitizer (Linux only)
+6. **z3ed-agent-test**: Full agent test suite (macOS only)
+
+### 3.2 CI Gaps
+
+#### Missing Early Feedback
+- ❌ No compilation-only job (fails after 15-20 min build)
+- ❌ No CMake configuration validation job (would catch in <1 min)
+- ❌ No symbol conflict checking job
+
+#### Limited Platform Coverage
+- ⚠️ Only Linux gets AddressSanitizer
+- ⚠️ Only macOS gets full z3ed agent tests
+- ⚠️ Windows AI stack not tested on PRs (only post-merge)
+
+#### Incomplete Testing
+- ❌ Integration tests not run in CI
+- ❌ E2E tests not run on Linux/Windows
+- ❌ No ROM-dependent testing
+- ❌ No performance regression detection
+
+---
+
+## 4. Developer Workflow Gaps
+
+### 4.1 Pre-Commit Hooks
+**Current State**: None
+**Gap**: No automatic checks before local commits
+
+**Should Include**:
+- clang-format check
+- Build system sanity check
+- Copyright header validation
+
+### 4.2 Pre-Push Validation
+**Current State**: Manual testing only
+**Gap**: Easy to push broken code to CI
+
+**Should Include**:
+- Smoke build test (quick compilation check)
+- Unit test run
+- Symbol conflict detection
+
+### 4.3 Local Cross-Platform Testing
+**Current State**: Developer-dependent
+**Gap**: No easy way to test across platforms locally
+
+**Should Include**:
+- Docker-based Linux testing
+- VM-based Windows testing (for macOS/Linux devs)
+- Preset validation tool
+
+---
+
+## 5. Root Cause Analysis by Issue Type
+
+### 5.1 Windows Abseil Include Paths
+
+**Timeline**:
+- ✅ Local macOS build succeeds
+- ✅ CMake configuration succeeds on all platforms
+- ❌ Windows compilation fails 15 minutes into CI
+- ❌ Fix attempt 1 fails (14d1f5de4c)
+- ❌ Fix attempt 2 fails (c2bb90a3f1)
+- ✅ Final fix succeeds
+
+**Why Multiple Attempts**:
+1. No local Windows testing environment
+2. CMake configuration doesn't validate actual compilation
+3. No header-only compilation check
+4. 15-20 minute feedback cycle from CI
+
+**Prevention**:
+- Header compilation smoke test
+- CMake include path validator
+- Local Windows testing (Docker/VM)
+
+### 5.2 Linux FLAGS Symbol Conflicts
+
+**Timeline**:
+- ✅ Local macOS build succeeds
+- ✅ Unit tests pass
+- ❌ Linux full build fails at link time
+- ❌ ODR violation: multiple `FLAGS` definitions
+- ✅ Fix: move FLAGS definition, rename conflicts
+
+**Why It Happened**:
+1. gflags creates `FLAGS_*` symbols in headers
+2. Multiple translation units define same symbols
+3. macOS linker more permissive than Linux ld
+4. No symbol conflict detection
+
+**Prevention**:
+- Symbol conflict scanner
+- ODR violation checker
+- Cross-platform link test
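+
+A symbol conflict scanner of the kind listed above could start from plain `nm` output. The sketch below is illustrative only: it assumes the default three-column `nm` format (address, type, name), treats only the uppercase types `T`, `D`, `B`, and `R` as strong definitions, and uses hypothetical library names. Real C++ symbols would appear mangled, and weak symbols (`W`, `V`) are deliberately ignored because duplicates of those are legal.
+
```python
from collections import defaultdict

# nm type letters treated as strong (non-weak) global definitions.
STRONG_TYPES = {"T", "D", "B", "R"}


def defined_symbols(nm_output):
    """Collect strongly defined global symbol names from nm text output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols ("U name") and archive member headers have fewer fields.
        if len(parts) >= 3 and parts[-2] in STRONG_TYPES:
            symbols.add(parts[-1])
    return symbols


def find_conflicts(nm_by_library):
    """Map each symbol defined in more than one library to those libraries."""
    owners = defaultdict(list)
    for library, text in nm_by_library.items():
        for symbol in defined_symbols(text):
            owners[symbol].append(library)
    return {sym: sorted(libs) for sym, libs in owners.items() if len(libs) > 1}


# Hypothetical nm output for two static libraries.
sample = {
    "libcli.a": (
        "0000000000000000 B FLAGS_rom\n"
        "0000000000000010 T main_cli\n"
        "                 U printf\n"
    ),
    "libtest.a": (
        "0000000000000000 B FLAGS_rom\n"
        "0000000000000020 W inline_helper\n"
    ),
}
print(find_conflicts(sample))
```
+
+In practice the text for each library would come from running `nm` on the built archive; the sample strings stand in for that output so the sketch runs without a build tree.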
+
+---
+
+## 6. Recommended Testing Levels
+
+We propose a **5-level testing pyramid**:
+
+### Level 0: Static Analysis (< 1s)
+- clang-format
+- clang-tidy on changed files
+- Copyright headers
+- CMakeLists.txt syntax
+
+### Level 1: Configuration Validation (< 10s)
+- CMake configure dry-run
+- Include path validation
+- Compiler detection check
+- Preprocessor flag validation
+
+### Level 2: Smoke Compilation (< 2 min)
+- Compile subset of files (1 file per library)
+- Header-only compilation
+- Template instantiation check
+- Platform-specific branch validation
+
+### Level 3: Symbol Validation (< 5 min)
+- Full project compilation
+- Symbol conflict detection (nm/dumpbin)
+- ODR violation check
+- Library dependency graph
+
+### Level 4: Test Execution (5-30 min)
+- Unit tests (fast)
+- Integration tests (medium)
+- E2E tests (slow)
+- ROM-dependent tests (optional)
+
+---
+
+## 7. Actionable Recommendations
+
+### 7.1 Immediate Actions (This Initiative)
+
+1. **Create pre-push scripts** (`scripts/pre-push-test.sh`, `scripts/pre-push-test.ps1`)
+   - Run Level 0-2 checks locally
+   - Estimated time: <2 minutes
+   - Goal: block roughly 90% of CI build failures before they are pushed
+
+2. **Create symbol conflict detector** (`scripts/verify-symbols.sh`)
+   - Scan built libraries for duplicate symbols
+   - Run as part of pre-push
+   - Catches ODR violations
+
+3. **Document testing strategy** (`docs/internal/testing/testing-strategy.md`)
+   - Clear explanation of each test level
+   - When to run which tests
+   - CI vs local testing
+
+4. **Create pre-push checklist** (`docs/internal/testing/pre-push-checklist.md`)
+   - Interactive checklist for developers
+   - Links to tools and scripts
+
+### 7.2 Short-Term Improvements (Next Sprint)
+
+1. **Add CI compile-only job**
+   - Runs in <5 minutes
+   - Catches compilation issues before full build
+   - Fails fast
+
+2. **Add CI symbol checking job**
+   - Runs after compile-only
+   - Detects ODR violations
+   - Platform-specific
+
+3. **Add CMake configuration validation job**
+   - Tests all presets
+   - Validates include paths
+   - <2 minutes
+
+4. **Enable integration tests in CI**
+   - Run on develop/master only (not PRs)
+   - Requires ROM file handling
+
+### 7.3 Long-Term Improvements (Future)
+
+1. **Docker-based local testing**
+   - Linux environment for macOS/Windows devs
+   - Matches CI exactly
+   - Fast feedback
+
+2. **Cross-platform test matrix locally**
+   - Run tests across multiple platforms
+   - Automated VM/container management
+
+3. **Performance regression detection**
+   - Benchmark suite
+   - Historical tracking
+   - Automatic alerts
+
+4. **Coverage tracking**
+   - Line coverage per PR
+   - Coverage trends over time
+   - Uncovered code reports
+
+---
+
+## 8. Success Metrics
+
+### 8.1 Developer Experience
+- **Target**: <2 minutes pre-push validation time
+- **Target**: 90% reduction in CI build failures
+- **Target**: <3 attempts to fix CI issues (down from 5-10)
+
+### 8.2 CI Efficiency
+- **Target**: <5 minutes to first failure signal
+- **Target**: 50% reduction in wasted CI time
+- **Target**: 95% PR pass rate (up from ~70%)
+
+### 8.3 Code Quality
+- **Target**: Zero ODR violations
+- **Target**: Zero platform-specific include issues
+- **Target**: 100% symbol conflict detection
+
+---
+
+## 9. Reference
+
+### Similar Issues in Recent History
+- Windows std::filesystem support (19196ca87c, b556b155a5)
+- Linux circular dependency (0812a84a22, e36d81f357)
+- macOS z3ed linker error (9c562df277)
+- Windows clang-cl detection (84cdb09a5b, cbdc6670a1)
+
+### Related Documentation
+- `docs/public/build/quick-reference.md` - Build commands
+- `docs/public/build/troubleshooting.md` - Platform-specific fixes
+- `CLAUDE.md` - Build system guidelines
+- `.github/workflows/ci.yml` - CI configuration
+
+### Tools Used
+- `nm` (Unix) / `dumpbin` (Windows) - Symbol inspection
+- `clang-tidy` - Static analysis
+- `cppcheck` - Code quality
+- `cmake --list-presets` - List available presets
--- a/docs/internal/agents/archive/testing-docs-2025/integration-plan.md
+++ b/docs/internal/agents/archive/testing-docs-2025/integration-plan.md
@@ -0,0 +1,505 @@
+# Testing Infrastructure Integration Plan
+
+**Owner**: CLAUDE_TEST_COORD
+**Status**: Draft
+**Created**: 2025-11-20
+**Target Completion**: 2025-12-15
+
+## Executive Summary
+
+This document outlines the rollout plan for comprehensive testing infrastructure improvements across the yaze project. The goal is to reduce CI failures, catch issues earlier, and provide developers with fast, reliable testing tools.
+
+## Current State Assessment
+
+### What's Working Well
+
+✅ **Test Organization**:
+- Clear directory structure (unit/integration/e2e/benchmarks)
+- Good test coverage for core systems
+- ImGui Test Engine integration for GUI testing
+
+✅ **CI/CD**:
+- Multi-platform matrix (Linux, macOS, Windows)
+- Automated test execution on every commit
+- Test result artifacts on failure
+
+✅ **Helper Scripts**:
+- `run-tests.sh` for preset-based testing
+- `smoke-build.sh` for quick build verification
+- `run-gh-workflow.sh` for remote CI triggers
+
+### Current Gaps
+
+❌ **Developer Experience**:
+- No pre-push validation hooks
+- Long CI feedback loop (10-15 minutes)
+- Unclear what tests to run locally
+- Format checking often forgotten
+
+❌ **Test Infrastructure**:
+- No symbol conflict detection tools
+- No CMake configuration validators
+- Platform-specific test failures hard to reproduce locally
+- Flaky test tracking is manual
+
+❌ **Documentation**:
+- Testing docs scattered across multiple files
+- No clear "before you push" checklist
+- Platform-specific troubleshooting incomplete
+- Release testing process not documented
+
+## Goals and Success Criteria
+
+### Primary Goals
+
+1. **Fast Local Feedback** (<5 minutes for pre-push checks)
+2. **Early Issue Detection** (catch 90% of CI failures locally)
+3. **Clear Documentation** (developers know exactly what to run)
+4. **Automated Validation** (pre-push hooks, format checking)
+5. **Platform Parity** (reproducible CI failures locally)
+
+### Success Metrics
+
+- **CI Failure Rate**: Reduce from ~20% to <5%
+- **Time to Fix**: Average time from failure to fix <30 minutes
+- **Developer Satisfaction**: Positive feedback on testing workflow
+- **Test Runtime**: Unit tests complete in <10s, full suite in <5min
+- **Coverage**: Maintain >80% test coverage for critical paths
+
+## Rollout Phases
+
+### Phase 1: Documentation and Tools (Week 1-2) ✅ COMPLETE
+
+**Status**: COMPLETE
+**Completion Date**: 2025-11-20
+
+#### Deliverables
+
+- ✅ Master testing documentation (`docs/internal/testing/README.md`)
+- ✅ Developer quick-start guide (`docs/public/developer/testing-quick-start.md`)
+- ✅ Integration plan (this document)
+- ✅ Updated release checklist with testing requirements
+
+#### Validation
+
+- ✅ All documents reviewed and approved
+- ✅ Links between documents verified
+- ✅ Content accuracy checked against actual implementation
+
+### Phase 2: Pre-Push Validation (Week 3)
+
+**Status**: PLANNED
+**Target Date**: 2025-11-27
+
+#### Deliverables
+
+1. **Pre-Push Script** (`scripts/pre-push.sh`)
+   - Run unit tests automatically
+   - Check code formatting
+   - Verify build compiles
+   - Exit with error if any check fails
+   - Run in <2 minutes
+
+2. **Git Hook Integration** (`.git/hooks/pre-push`)
+   - Optional installation script
+   - Easy enable/disable mechanism
+   - Clear output showing progress
+   - Skip with `--no-verify` flag
+
+3. **Developer Documentation**
+   - How to install pre-push hook
+   - How to customize checks
+   - How to skip when needed
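+
+The fail-fast behaviour described for the pre-push script can be sketched as a small runner that executes checks in order and stops at the first non-zero exit code. This is a hypothetical Python illustration, not the planned `scripts/pre-push.sh`; the demo commands are placeholders that stand in for the real format, build, and test steps.
+
```python
import subprocess
import sys
import time


def run_checks(checks, budget_seconds=120.0):
    """Run (name, argv) checks in order and stop at the first failure.

    Returns 0 if every check passes, 1 otherwise.
    """
    started = time.monotonic()
    for name, argv in checks:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"FAIL {name} (exit {result.returncode})")
            return 1
        print(f"ok   {name}")
    elapsed = time.monotonic() - started
    if elapsed > budget_seconds:
        print(f"warning: checks took {elapsed:.0f}s, over the {budget_seconds:.0f}s budget")
    return 0


# Placeholder commands: the second one fails on purpose, so the third never runs.
demo = [
    ("format", [sys.executable, "-c", "pass"]),
    ("build", [sys.executable, "-c", "raise SystemExit(3)"]),
    ("unit-tests", [sys.executable, "-c", "pass"]),
]
print("exit code:", run_checks(demo))
```
+
+A Git hook would pass the returned value to `sys.exit` so that a failing check blocks the push; as noted above, `git push --no-verify` skips the hook entirely.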
+
+#### Implementation Steps
+
+```bash
+# 1. Create pre-push script
+scripts/pre-push.sh
+
+# 2. Create hook installer
+scripts/install-git-hooks.sh
+
+# 3. Update documentation
+docs/public/developer/git-workflow.md
+docs/public/developer/testing-quick-start.md
+
+# 4. Test on all platforms
+# - macOS: Verify script runs correctly
+# - Linux: Verify script runs correctly
+# - Windows: Create PowerShell equivalent
+```
+
+#### Validation
+
+- [ ] Script runs in <2 minutes on all platforms
+- [ ] All checks are meaningful (catch real issues)
+- [ ] False positive rate <5%
+- [ ] Developers report positive feedback
+
+### Phase 3: Symbol Conflict Detection (Week 4)
+
+**Status**: PLANNED
+**Target Date**: 2025-12-04
+
+#### Background
+
+Recent Linux build failures were caused by symbol conflicts (redefinition of `FLAGS_rom` and `FLAGS_norom`). We need automated detection to prevent this.
+
+#### Deliverables
+
+1. **Symbol Conflict Checker** (`scripts/check-symbols.sh`)
+   - Parse CMake target link graphs
+   - Detect duplicate symbol definitions
+   - Report conflicts with file locations
+   - Run in <30 seconds
+
+2. **CI Integration**
+   - Add symbol check job to `.github/workflows/ci.yml`
+   - Run on every PR
+   - Fail build if conflicts detected
+
+3. **Documentation**
+   - Troubleshooting guide for symbol conflicts
+   - Best practices for avoiding conflicts
+
+#### Implementation Steps
+
+```bash
+# 1. Create symbol checker
+scripts/check-symbols.sh
+# - Use nm/objdump to list symbols
+# - Compare across linked targets
+# - Detect duplicates
+
+# 2. Add to CI
+.github/workflows/ci.yml
+# - New job: symbol-check
+# - Runs after build
+
+# 3. Document usage
+docs/internal/testing/symbol-conflict-detection.md
+```
+
+#### Validation
+
+- [ ] Detects known symbol conflicts (FLAGS_rom case)
+- [ ] Zero false positives on current codebase
+- [ ] Runs in <30 seconds
+- [ ] Clear, actionable error messages
+
+### Phase 4: CMake Configuration Validation (Week 5)
+
+**Status**: PLANNED
+**Target Date**: 2025-12-11
+
+#### Deliverables
+
+1. **CMake Preset Validator** (`scripts/validate-cmake-presets.sh`)
+   - Verify all presets configure successfully
+   - Check for missing variables
+   - Validate preset inheritance
+   - Test preset combinations
+
+2. **Build Matrix Tester** (`scripts/test-build-matrix.sh`)
+   - Test common preset/platform combinations
+   - Verify all targets build
+   - Check for missing dependencies
+
+3. **Documentation**
+   - CMake troubleshooting guide
+   - Preset creation guidelines
+
+#### Implementation Steps
+
+```bash
+# 1. Create validators
+scripts/validate-cmake-presets.sh
+scripts/test-build-matrix.sh
+
+# 2. Add to CI (optional job)
+.github/workflows/cmake-validation.yml
+
+# 3. Document
+docs/internal/testing/cmake-validation.md
+```
+
+#### Validation
+
+- [ ] All current presets validate successfully
+- [ ] Catches common configuration errors
+- [ ] Runs in <5 minutes for full matrix
+- [ ] Provides clear error messages
+
+### Phase 5: Platform Matrix Testing (Week 6)
+
+**Status**: PLANNED
+**Target Date**: 2025-12-18
+
+#### Deliverables
+
+1. **Local Platform Testing** (`scripts/test-all-platforms.sh`)
+   - Run tests on all configured platforms
+   - Parallel execution for speed
+   - Aggregate results
+   - Report differences across platforms
+
+2. **CI Enhancement**
+   - Add platform-specific test suites
+   - Better artifact collection
+   - Test result comparison across platforms
+
+3. **Documentation**
+   - Platform-specific testing guide
+   - Troubleshooting platform differences
+
+#### Implementation Steps
+
+```bash
+# 1. Create platform tester
+scripts/test-all-platforms.sh
+
+# 2. Enhance CI
+.github/workflows/ci.yml
+# - Better artifact collection
+# - Result comparison
+
+# 3. Document
+docs/internal/testing/platform-testing.md
+```
+
+#### Validation
+
+- [ ] Detects platform-specific failures
+- [ ] Clear reporting of differences
+- [ ] Runs in <10 minutes (parallel)
+- [ ] Useful for debugging platform issues
+
+## Training and Communication
+
+### Developer Training
+
+**Target Audience**: All contributors
+
+**Format**: Written documentation + optional video walkthrough
+
+**Topics**:
+1. How to run tests locally (5 minutes)
+2. Understanding test categories (5 minutes)
+3. Using pre-push hooks (5 minutes)
+4. Debugging test failures (10 minutes)
+5. CI workflow overview (5 minutes)
+
+**Materials**:
+- ✅ Quick start guide (already created)
+- ✅ Testing guide (already exists)
+- [ ] Video walkthrough (optional, Phase 6)
+
+### Communication Plan
+
+**Announcements**:
+1. **Phase 1 Complete**: Email/Slack announcement with links to new docs
+2. **Phase 2 Ready**: Announce pre-push hooks, encourage adoption
+3. **Phase 3-5**: Update as each phase completes
+4. **Final Rollout**: Comprehensive announcement when all phases done
+
+**Channels**:
+- GitHub Discussions
+- Project README updates
+- CONTRIBUTING.md updates
+- Coordination board updates
+
+## Risk Mitigation
+
+### Risk 1: Developer Resistance to Pre-Push Hooks
+
+**Mitigation**:
+- Make hooks optional (install script)
+- Keep checks fast (<2 minutes)
+- Allow easy skip with `--no-verify`
+- Provide clear value proposition
+
+### Risk 2: False Positives Causing Frustration
+
+**Mitigation**:
+- Test extensively before rollout
+- Monitor false positive rate
+- Provide clear bypass mechanisms
+- Iterate based on feedback
+
+### Risk 3: Tools Break on Platform Updates
+
+**Mitigation**:
+- Test on all platforms before rollout
+- Document platform-specific requirements
+- Version-pin critical dependencies
+- Maintain fallback paths
+
+### Risk 4: CI Becomes Too Slow
+
+**Mitigation**:
+- Use parallel execution
+- Cache aggressively
+- Make expensive checks optional
+- Profile and optimize bottlenecks
+
+## Rollback Plan
+
+If any phase causes significant issues:
+
+1. **Immediate**: Disable problematic feature (remove hook, comment out CI job)
+2. **Investigate**: Gather feedback and logs
+3. **Fix**: Address root cause
+4. **Re-enable**: Gradual rollout with fixes
+5. **Document**: Update docs with lessons learned
+
+## Success Indicators
+
+### Week-by-Week Targets
+
+- **Week 2**: Documentation complete and published ✅
+- **Week 3**: Pre-push hooks adopted by 50% of active developers
+- **Week 4**: Symbol conflicts detected before reaching CI
+- **Week 5**: CMake preset validation catches configuration errors
+- **Week 6**: Platform-specific failures reproducible locally
+
+### Final Success Criteria (End of Phase 5)
+
+- ✅ All documentation complete and reviewed
+- [ ] CI failure rate <5% (down from ~20%)
+- [ ] Average time to fix CI failure <30 minutes
+- [ ] 80%+ developers using pre-push hooks
+- [ ] Zero symbol conflict issues reaching production
+- [ ] Platform parity: local tests match CI results
+
+## Maintenance and Long-Term Support
+
+### Ongoing Responsibilities
+
+**Testing Infrastructure Lead** (CLAUDE_TEST_COORD):
+- Monitor CI failure rates
+- Respond to testing infrastructure issues
+- Update documentation as needed
+- Coordinate with platform specialists
+
+**Platform Specialists**:
+- Maintain platform-specific test helpers
+- Troubleshoot platform-specific failures
+- Keep documentation current
+
+**All Developers**:
+- Report testing infrastructure issues
+- Suggest improvements
+- Keep tests passing locally before pushing
+
+### Quarterly Reviews
+
+**Schedule**: Every 3 months
+
+**Review**:
+1. CI failure rate trends
+2. Test runtime trends
+3. Developer feedback
+4. New platform/tool needs
+5. Documentation updates
+
+**Adjustments**:
+- Update scripts for new platforms
+- Optimize slow tests
+- Add new helpers as needed
+- Archive obsolete tools/docs
+
+## Budget and Resources
+
+### Time Investment
+
+**Initial Rollout** (Phases 1-5): ~6 weeks
+- Documentation: 1 week ✅
+- Pre-push validation: 1 week
+- Symbol detection: 1 week
+- CMake validation: 1 week
+- Platform testing: 1 week
+- Buffer/testing: 1 week
+
+**Ongoing Maintenance**: ~4 hours/month
+- Monitoring CI
+- Updating docs
+- Fixing issues
+- Quarterly reviews
+
+### Infrastructure Costs
+
+**Current**: $0 (using GitHub Actions free tier)
+
+**Projected**: $0 (within free tier limits)
+
+**Potential Future Costs**:
+- GitHub Actions minutes (if usage exceeds the free tier)
+- External CI service (if needed)
+- Test infrastructure hosting (if needed)
+
+## Appendix: Related Work
+
+### Completed by Other Agents
+
+**GEMINI_AUTOM**:
+- ✅ Remote workflow trigger support
+- ✅ HTTP API testing infrastructure
+- ✅ Helper scripts for agents
+
+**CLAUDE_AIINF**:
+- ✅ Platform-specific build fixes
+- ✅ CMake preset expansion
+- ✅ gRPC integration improvements
+
+**CODEX**:
+- ✅ Documentation audit and consolidation
+- ✅ Build verification scripts
+- ✅ Coordination board setup
+
+### Planned by Other Agents
+
+**CLAUDE_TEST_ARCH**:
+- Pre-push testing automation
+- Gap analysis of test coverage
+
+**CLAUDE_CMAKE_VALIDATOR**:
+- CMake configuration validation tools
+- Preset verification
+
+**CLAUDE_SYMBOL_CHECK**:
+- Symbol conflict detection
+- Link graph analysis
+
+**CLAUDE_MATRIX_TEST**:
+- Platform matrix testing
+- Cross-platform validation
+
+## Questions and Clarifications
+
+**Q: Are pre-push hooks mandatory?**
+A: No, they're optional but strongly recommended. Developers can install with `scripts/install-git-hooks.sh` and remove anytime.
+
+**Q: How long will pre-push checks take?**
+A: The target is under 2 minutes: unit tests (<10s) + format check (<5s) + build verification (~1 min).
+
+**Q: What if I need to push despite failing checks?**
+A: Use `git push --no-verify` to bypass hooks. This should be rare and only for emergencies.
+
+**Q: Will this slow down CI?**
+A: No. Most tools run locally to catch issues before CI. Some new CI jobs are optional/parallel.
+
+**Q: What if tools break on my platform?**
+A: Report it in GitHub issues with platform details. We'll fix it or provide a platform-specific workaround.
+
+## References
+
+- [Testing Documentation](README.md)
+- [Quick Start Guide](../../public/developer/testing-quick-start.md)
+- [Coordination Board](../agents/coordination-board.md)
+- [Release Checklist](../release-checklist.md)
+- [CI Workflow](../../../.github/workflows/ci.yml)
+
+---
+
+**Next Actions**: Proceed to Phase 2 (Pre-Push Validation) once Phase 1 is approved and published.
--- a/docs/internal/agents/archive/testing-docs-2025/matrix-testing-strategy.md
+++ b/docs/internal/agents/archive/testing-docs-2025/matrix-testing-strategy.md
@@ -0,0 +1,499 @@
+# Matrix Testing Strategy
+
+**Owner**: CLAUDE_MATRIX_TEST (Platform Matrix Testing Specialist)
+**Last Updated**: 2025-11-20
+**Status**: ACTIVE
+
+## Executive Summary
+
+This document defines the strategy for comprehensive platform/configuration matrix testing to catch issues across CMake flag combinations, platforms, and build configurations.
+
+**Key Goals**:
+- Catch cross-configuration issues before they reach production
+- Prevent "works on my machine" problems
+- Document problematic flag combinations
+- Make matrix testing accessible to developers locally
+- Minimize CI time while maximizing coverage
+
+**Quick Links**:
+- Configuration reference: `/docs/internal/configuration-matrix.md`
+- GitHub Actions workflow: `/.github/workflows/matrix-test.yml`
+- Local test script: `/scripts/test-config-matrix.sh`
+
+## 1. Problem Statement
+
+### Current Gaps
+
+Before this initiative, yaze's configuration coverage was limited:
+1. **Default configurations only**: `ci-linux`, `ci-macos`, `ci-windows` presets
+2. **Single feature toggles**: One dimension at a time
+3. **No interaction testing**: Edge cases like "GRPC=ON but REMOTE_AUTOMATION=OFF" were never exercised
+
+### Real Bugs Caught by Matrix Testing
+
+Examples of issues a configuration matrix would catch:
+
+**Example 1: GRPC Without Automation**
+```cmake
+# Broken: User enables gRPC but disables remote automation
+cmake -B build -DYAZE_ENABLE_GRPC=ON -DYAZE_ENABLE_REMOTE_AUTOMATION=OFF
+# Result: gRPC headers included but server code never compiled → link errors
+```
+
+**Example 2: HTTP API Without CLI Stack**
+```cmake
+# Broken: User wants HTTP API but disables agent CLI
+cmake -B build -DYAZE_ENABLE_HTTP_API=ON -DYAZE_ENABLE_AGENT_CLI=OFF
+# Result: REST endpoints defined but no command dispatcher → runtime errors
+```
+
+**Example 3: AI Runtime Without JSON**
+```cmake
+# Broken: User enables AI with Gemini but disables JSON
+cmake -B build -DYAZE_ENABLE_AI_RUNTIME=ON -DYAZE_ENABLE_JSON=OFF
+# Result: Gemini parser requires JSON but it's not available → compile errors
+```
+
+**Example 4: Windows GRPC Version Mismatch**
+```cmake
+# Broken on Windows: gRPC older than 1.67.1 is incompatible with the MSVC ABI
+cmake -B build
+# Result: Symbol errors, linker failures on Visual Studio
+```
+
+## 2. Matrix Testing Approach
+
+### Strategy: Smart, Not Exhaustive
+
+Instead of testing all 2^18 = 262,144 combinations:
+
+1. **Baseline**: Default configuration (most common user scenario)
+2. **Extremes**: All ON, All OFF (catch hidden assumptions)
+3. **Interactions**: Known problematic combinations
+4. **Tiers**: Progressive validation by feature complexity
+5. **Platforms**: Run critical tests on each OS
+
+### Testing Tiers
+
+#### Tier 1: Core Platforms (Every Commit)
+
+**When**: On push to `master` or `develop`, every PR
+**What**: The three critical presets that users will actually use
+**Time**: ~15 minutes total
+
+```
+ci-linux (gRPC + Agent, Linux)
+ci-macos (gRPC + Agent UI + Agent, macOS)
+ci-windows (gRPC, Windows)
+```
+
+**Why**: These reflect real user workflows. If they break, users are impacted immediately.
+
+#### Tier 2: Feature Combinations (Nightly / On-Demand)
+
+**When**: Nightly at 2 AM UTC, manual dispatch, or `[matrix]` in commit message
+**What**: 4-6 specific flag combinations per platform
+**Time**: ~45 minutes total (14 configurations run in parallel across 3 platforms)
+
+```
+Linux:        minimal, grpc-only, full-ai, cli-no-grpc, http-api, no-json
+macOS:        minimal, full-ai, agent-ui, universal
+Windows:      minimal, full-ai, grpc-remote, z3ed-cli
+```
+
+**Why**: Tests dangerous interactions without exponential explosion. Each config tests a realistic user workflow.
+
+#### Tier 3: Platform-Specific (As Needed)
+
+**When**: When platform-specific issues arise
+**What**: Architecture-specific builds (ARM64, universal binary, etc.)
+**Time**: ~20 minutes
+
+```
+Windows ARM64:     Debug + Release
+macOS Universal:   arm64 + x86_64
+Linux ARM:         Cross-compile tests
+```
+
+**Why**: Catches architecture-specific issues that only appear on target platforms.
+
+### Configuration Selection Rationale
+
+#### Why "Minimal"?
+
+Tests the smallest viable configuration:
+- Validates core ROM reading/writing works without extras
+- Ensures build system doesn't have "feature X requires feature Y" errors
+- Catches over-linked libraries
+
+#### Why "gRPC Only"?
+
+Tests server-side automation without AI:
+- Validates gRPC infrastructure
+- Tests GUI automation system
+- Ensures protocol buffer compilation
+- Minimal dependencies for headless servers
+
+#### Why "Full AI Stack"?
+
+Tests maximum feature complexity:
+- All AI features enabled
+- Both Gemini and Ollama paths
+- Remote automation + Agent UI
+- Catches subtle linking issues with yaml-cpp, OpenSSL, etc.
+
+#### Why "No JSON"?
+
+Tests optional JSON dependency:
+- Ensures Ollama works without JSON
+- Validates graceful degradation
+- Catches hardcoded JSON assumptions
+
+#### Why Platform-Specific?
+
+Each platform has unique constraints:
+- **Windows**: MSVC ABI compatibility, gRPC version pinning
+- **macOS**: Universal binary (arm64 + x86_64), Homebrew dependencies
+- **Linux**: GCC version, glibc compatibility, system library versions
+
+## 3. Problematic Flag Combinations
+
+### Pattern 1: Hidden Dependencies (Fixed)
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=OFF  # ← Inconsistent!
+```
+
+**Problem**: gRPC headers included, but no automation server compiled → link errors
+
+**Fix**: CMake now forces gRPC on whenever remote automation is enabled:
+```cmake
+if(YAZE_ENABLE_REMOTE_AUTOMATION AND NOT YAZE_ENABLE_GRPC)
+  set(YAZE_ENABLE_GRPC ON ... FORCE)
+endif()
+```
+
+**Matrix Test**: `grpc-only` configuration validates this constraint.
+
+### Pattern 2: Orphaned Features (Fixed)
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_HTTP_API=ON
+YAZE_ENABLE_AGENT_CLI=OFF  # ← HTTP API needs a CLI context!
+```
+
+**Problem**: REST endpoints defined but no command dispatcher
+
+**Fix**: CMake forces the agent CLI on whenever the HTTP API is enabled:
+```cmake
+if(YAZE_ENABLE_HTTP_API AND NOT YAZE_ENABLE_AGENT_CLI)
+  set(YAZE_ENABLE_AGENT_CLI ON ... FORCE)
+endif()
+```
+
+**Matrix Test**: `http-api` configuration validates this.
+
+### Pattern 3: Optional Dependency Breakage
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_JSON=OFF  # ← Gemini requires JSON!
+```
+
+**Problem**: Gemini service can't parse responses
+
+**Status**: Currently relies on developer discipline
+**Matrix Test**: `no-json` + `full-ai` would catch this
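+
+Until a real constraint exists, a configure-time warning could at least surface this combination early. The following is a minimal sketch, not the project's current behavior. It assumes both options are defined under the names used in this document, and it warns instead of forcing `YAZE_ENABLE_JSON=ON` because the `no-json` configuration is a supported mode.
+
+```cmake
+# Hypothetical guard (not in the project today): report the risky
+# combination at configure time instead of at compile time.
+if(YAZE_ENABLE_AI_RUNTIME AND NOT YAZE_ENABLE_JSON)
+  message(WARNING
+    "YAZE_ENABLE_AI_RUNTIME=ON with YAZE_ENABLE_JSON=OFF: "
+    "Gemini response parsing requires JSON and will be unavailable.")
+endif()
+```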
+
+### Pattern 4: Platform-Specific ABI Mismatch
+
+**Configuration**: Windows with gRPC <1.67.1
+
+**Problem**: MSVC ABI differences, symbol mismatch
+
+**Status**: Documented in `ci-windows` preset
+**Matrix Test**: `grpc-remote` on Windows validates gRPC version
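+
+A version check at configure time is one way to turn this documented requirement into a hard failure. This is only a sketch: it assumes gRPC is located with `find_package` in config mode, which sets `gRPC_VERSION`. If the project fetches gRPC another way (for example via FetchContent), the pinned tag would have to be compared instead.
+
+```cmake
+# Assumption: gRPC is found via its installed CMake package config,
+# which defines gRPC_VERSION.
+find_package(gRPC CONFIG REQUIRED)
+if(WIN32 AND gRPC_VERSION VERSION_LESS "1.67.1")
+  message(FATAL_ERROR
+    "gRPC ${gRPC_VERSION} found; Windows builds need gRPC >= 1.67.1 (MSVC ABI).")
+endif()
+```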
+
+### Pattern 5: Architecture-Specific Issues
+
+**Configuration**: macOS universal binary with platform-specific dependencies
+
+**Problem**: Homebrew packages may not have arm64 support
+
+**Status**: Requires dependency audit
+**Matrix Test**: `universal` on macOS tests both arm64 and x86_64
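+
+For context, a universal build is requested through the standard `CMAKE_OSX_ARCHITECTURES` variable. The sketch below shows the setting itself; whether the `universal` configuration sets it in a preset or on the command line is an assumption here.
+
+```cmake
+# Must be set before the first project() call (or passed via -D / a preset).
+# Every linked dependency then has to provide both architecture slices.
+set(CMAKE_OSX_ARCHITECTURES "arm64;x86_64" CACHE STRING "Build a universal binary")
+```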
+
+## 4. Matrix Testing Tools
+
+### Local Testing: `scripts/test-config-matrix.sh`
+
+Developers run this before pushing to validate all critical configurations locally.
+
+#### Quick Start
+```bash
+# Test all configurations on current platform
+./scripts/test-config-matrix.sh
+
+# Test specific configuration
+./scripts/test-config-matrix.sh --config minimal
+
+# Smoke test (configure only, no build)
+./scripts/test-config-matrix.sh --smoke
+
+# Verbose with timing
+./scripts/test-config-matrix.sh --verbose
+```
+
+#### Features
+- **Fast feedback**: ~2-3 minutes for all configurations
+- **Smoke mode**: Configure without building (30 seconds)
+- **Platform detection**: Automatically runs platform-appropriate presets
+- **Result tracking**: Clear pass/fail summary
+- **Debug logging**: Full CMake/build output in `build_matrix/<config>/`
+
+#### Output Example
+```
+Config: minimal
+  Status: PASSED
+  Description: No AI, no gRPC
+  Build time: 2.3s
+
+Config: full-ai
+  Status: PASSED
+  Description: All features enabled
+  Build time: 45.2s
+
+============
+2/2 configs passed
+============
+```
+
+### CI Testing: `.github/workflows/matrix-test.yml`
+
+Automated nightly testing across all three platforms.
+
+#### Execution
+- **Trigger**: Nightly (2 AM UTC) + manual dispatch + `[matrix]` in commit message
+- **Platforms**: Linux (ubuntu-22.04), macOS (14), Windows (2022)
+- **Configurations per platform**: 4-6 distinct flag combinations (14 in total)
+- **Total runtime**: ~45 minutes (all jobs in parallel)
+- **Report**: Pass/fail summary + artifact upload on failure
+
+#### What It Tests
+
+**Linux (6 configs)**:
+1. `minimal` - No AI, no gRPC
+2. `grpc-only` - gRPC without automation
+3. `full-ai` - All features
+4. `cli-no-grpc` - CLI only
+5. `http-api` - REST endpoints
+6. `no-json` - Ollama mode
+
+**macOS (4 configs)**:
+1. `minimal` - GUI, no AI
+2. `full-ai` - All features
+3. `agent-ui` - Agent UI panels only
+4. `universal` - arm64 + x86_64 binary
+
+**Windows (4 configs)**:
+1. `minimal` - No AI
+2. `full-ai` - All features
+3. `grpc-remote` - gRPC + automation
+4. `z3ed-cli` - CLI executable
+
+## 5. Integration with Development Workflow
+
+### For Developers
+
+Before pushing code to `develop` or `master`:
+
+```bash
+# 1. Make changes
+git add src/...
+
+# 2. Test locally
+./scripts/test-config-matrix.sh
+
+# 3. If all pass, commit
+git commit -m "feature: add new thing"
+
+# 4. Push
+git push
+```
+
+### For CI/CD
+
+**On every push to develop/master**:
+1. Standard CI runs (Tier 1 tests)
+2. Code quality checks
+3. If green, wait for nightly matrix test
+
+**Nightly**:
+1. All Tier 2 combinations run in parallel
+2. Failures trigger alerts
+3. Success confirms no new cross-configuration issues
+
+### For Pull Requests
+
+Option A: **Include `[matrix]` in commit message**
+```bash
+git commit -m "fix: handle edge case [matrix]"
+git push  # Triggers matrix test immediately
+```
+
+Option B: **Manual dispatch**
+- Open the repository's Actions tab and select the workflow defined in `.github/workflows/matrix-test.yml`
+- Click "Run workflow"
+- Select desired tier
+
+## 6. Monitoring & Maintenance
+
+### What to Watch
+
+**Daily**: Check nightly matrix test results
+- Link: GitHub Actions > `Configuration Matrix Testing`
+- Alert if any configuration fails
+
+**Weekly**: Review failure patterns
+- Are certain flag combinations always failing?
+- Is a platform having consistent issues?
+- Do dependencies need version updates?
+
+**Monthly**: Audit the matrix configuration
+- Do new flags need testing?
+- Are deprecated flags still tested?
+- Can any Tier 2 configs be combined?
+
+### Adding New Configurations
+
+When adding a new feature flag:
+
+1. **Update `cmake/options.cmake`**
+   - Define the option
+   - Document dependencies
+   - Add constraint enforcement
+
+2. **Update `/docs/internal/configuration-matrix.md`**
+   - Add to Section 1 (flags)
+   - Update Section 2 (constraints)
+   - Add to relevant Tier in Section 3
+
+3. **Update `/scripts/test-config-matrix.sh`**
+   - Add to `CONFIGS` array
+   - Test locally: `./scripts/test-config-matrix.sh --config new-config`
+
+4. **Update `/.github/workflows/matrix-test.yml`**
+   - Add matrix job entries for each platform
+   - Estimate runtime impact
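+
+Step 1 of the checklist above might look roughly like the following. This is a sketch under stated assumptions: `YAZE_ENABLE_EXAMPLE` is a hypothetical flag, and the docstring for `YAZE_ENABLE_GRPC` is invented for illustration. The enforcement block follows the same force-on style as the fixes shown in Section 3.
+
+```cmake
+# cmake/options.cmake (sketch; YAZE_ENABLE_EXAMPLE is a hypothetical flag)
+
+# Define the option with a description that documents its dependency.
+option(YAZE_ENABLE_EXAMPLE "Enable the example feature (requires gRPC)" OFF)
+
+# Enforce the constraint so inconsistent flag sets cannot be configured.
+if(YAZE_ENABLE_EXAMPLE AND NOT YAZE_ENABLE_GRPC)
+  message(STATUS "YAZE_ENABLE_EXAMPLE requires gRPC; forcing YAZE_ENABLE_GRPC=ON")
+  set(YAZE_ENABLE_GRPC ON CACHE BOOL "Enable gRPC support" FORCE)
+endif()
+```
+
+Logging the override with `message(STATUS ...)` keeps the forced value visible in the configure output, which helps when a matrix job behaves differently from what its flags suggest.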
+
+## 7. Troubleshooting Common Issues
+
+### Issue: "Configuration failed" locally
+
+```bash
+# Check the cmake log
+tail -50 build_matrix/<config>/config.log
+
+# Check if presets exist
+cmake --list-presets
+```
+
+### Issue: "Build failed" locally
+
+```bash
+# Get full build output
+./scripts/test-config-matrix.sh --config <name> --verbose
+
+# Check for missing dependencies
+# On macOS: brew list | grep <dep>
+# On Linux: apt list --installed | grep <dep>
+```
+
+### Issue: Test passes locally but fails in CI
+
+**Likely causes**:
+1. Different CMake version (CI uses latest)
+2. Different compiler (GCC vs Clang vs MSVC)
+3. Missing system library
+
+**Solutions**:
+- Check `.github/actions/setup-build` for CI environment
+- Match local compiler: `cmake --preset ci-linux -DCMAKE_C_COMPILER=gcc-13 -DCMAKE_CXX_COMPILER=g++-13`
+- Add dependency: Update `cmake/dependencies.cmake`
+
+## 8. Future Improvements
+
+### Short Term (Next Sprint)
+
+- [ ] Add binary size tracking per configuration
+- [ ] Add compile time benchmarks
+- [ ] Auto-generate configuration compatibility matrix chart
+- [ ] Add `--ci-mode` flag to local script (simulates GH Actions)
+
+### Medium Term (Next Quarter)
+
+- [ ] Integrate with release pipeline (validate all Tier 2 before release)
+- [ ] Add performance regression tests per configuration
+- [ ] Create configuration validator tool (warns on suspicious combinations)
+- [ ] Document platform-specific dependency versions
+
+### Long Term (Next Year)
+
+- [ ] Separate `YAZE_ENABLE_AI` and `YAZE_ENABLE_AI_RUNTIME` (currently coupled)
+- [ ] Add Tier 0 (smoke tests) that run on every commit
+- [ ] Create web dashboard of matrix test results
+- [ ] Add "configuration suggestion" tool (infer optimal flags for user's hardware)
+
+## 9. Reference: Configuration Categories
+
+### GUI User (Desktop)
+```cmake
+YAZE_BUILD_GUI=ON
+YAZE_BUILD_AGENT_UI=ON
+YAZE_ENABLE_GRPC=OFF           # No network overhead
+YAZE_ENABLE_AI=OFF             # Unnecessary for GUI-only
+```
+
+### Server/Headless (Automation)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=ON
+YAZE_ENABLE_AI=OFF             # Optional
+```
+
+### Full-Featured Developer
+```cmake
+YAZE_BUILD_GUI=ON
+YAZE_BUILD_AGENT_UI=ON
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=ON
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_HTTP_API=ON
+```
+
+### CLI-Only (z3ed Agent)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_BUILD_Z3ED=ON
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_HTTP_API=ON
+```
+
+### Minimum (Embedded/Library)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_BUILD_CLI=OFF
+YAZE_BUILD_TESTS=OFF
+YAZE_ENABLE_GRPC=OFF
+YAZE_ENABLE_AI=OFF
+```
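+
+Any of the categories above can be captured as a CMake initial-cache script and loaded with `cmake -C`. The sketch below uses the Server/Headless values; the file name is hypothetical.
+
+```cmake
+# headless-server.cmake (hypothetical file name)
+# Usage: cmake -C headless-server.cmake -S . -B build
+set(YAZE_BUILD_GUI OFF CACHE BOOL "")
+set(YAZE_ENABLE_GRPC ON CACHE BOOL "")
+set(YAZE_ENABLE_REMOTE_AUTOMATION ON CACHE BOOL "")
+set(YAZE_ENABLE_AI OFF CACHE BOOL "")
+```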
+
+---
+
+**Questions?** Check `/docs/internal/configuration-matrix.md` or ask in coordination-board.md.
--- a/docs/internal/agents/archive/testing-docs-2025/testing-strategy.md
+++ b/docs/internal/agents/archive/testing-docs-2025/testing-strategy.md
@@ -0,0 +1,843 @@
+# YAZE Testing Strategy
+
+## Purpose
+
+This document defines the comprehensive testing strategy for YAZE, explaining what each test level catches, when to run tests, and how to debug failures. It serves as the authoritative guide for developers and AI agents.
+
+**Last Updated**: 2025-11-20
+
+---
+
+## Table of Contents
+
+1. [Testing Philosophy](#1-testing-philosophy)
+2. [Test Pyramid](#2-test-pyramid)
+3. [Test Categories](#3-test-categories)
+4. [When to Run Tests](#4-when-to-run-tests)
+5. [Test Organization](#5-test-organization)
+6. [Platform-Specific Testing](#6-platform-specific-testing)
+7. [CI/CD Testing](#7-cicd-testing)
+8. [Debugging Test Failures](#8-debugging-test-failures)
+
+---
+
+## 1. Testing Philosophy
+
+### Core Principles
+
+1. **Fast Feedback**: Developers should get test results in <2 minutes locally
+2. **Fail Early**: Catch issues at the lowest/fastest test level possible
+3. **Confidence**: Tests should give confidence that code works across platforms
+4. **Automation**: All tests should be automatable in CI
+5. **Clarity**: Test failures should clearly indicate what broke and where
+
+### Testing Goals
+
+- **Prevent Regressions**: Ensure new changes don't break existing functionality
+- **Catch Build Issues**: Detect compilation/linking problems before CI
+- **Validate Logic**: Verify algorithms and data structures work correctly
+- **Test Integration**: Ensure components work together
+- **Validate UX**: Confirm UI workflows function as expected
+
+---
+
+## 2. Test Pyramid
+
+YAZE uses a **7-level testing pyramid** (Levels 0-6), from fastest (bottom) to slowest (top):
+
+```
+                    ┌─────────────────────┐
+                    │   E2E Tests (E2E)   │ Minutes    │ Few tests
+                    │  Full UI workflows  │            │ High value
+                    ├─────────────────────┤            │
+                 ┌─ │ Integration (INT)   │ Seconds    │
+                 │  │ Multi-component     │            │
+                 │  ├─────────────────────┤            │
+      Tests      │  │   Unit Tests (UT)   │ <1 second  │
+                 │  │  Isolated logic     │            │
+                 └─ ├─────────────────────┤            │
+                    │ Symbol Validation   │ Minutes    │
+                    │ ODR, conflicts      │            ▼
+                    ├─────────────────────┤
+                    │ Smoke Compilation   │ ~2 min
+                    │ Header checks       │
+      Build         ├─────────────────────┤
+      Checks        │ Config Validation   │ ~10 sec
+                    │ CMake, includes     │
+                    ├─────────────────────┤
+                    │ Static Analysis     │ <1 sec     │ Many checks
+                    │ Format, lint        │            │ Fast feedback
+                    └─────────────────────┘            ▼
+```
+
+---
+
+## 3. Test Categories
+
+### Level 0: Static Analysis (< 1 second)
+
+**Purpose**: Catch trivial issues before compilation
+
+**Tools**:
+- `clang-format` - Code formatting
+- `clang-tidy` - Static analysis (subset of files)
+- `cppcheck` - Additional static checks
+
+**What It Catches**:
+- ✅ Formatting violations
+- ✅ Common code smells
+- ✅ Potential null pointer dereferences
+- ✅ Unused variables
+
+**What It Misses**:
+- ❌ Build system issues
+- ❌ Linking problems
+- ❌ Runtime logic errors
+
+**Run Locally**:
+```bash
+# Format check (don't modify)
+cmake --build build --target yaze-format-check
+
+# Static analysis on changed files
+git diff --name-only HEAD | grep -E '\.(cc|h)$' | \
+  xargs clang-tidy-14 --header-filter='src/.*'
+```
+
+**Run in CI**: ✅ Every PR (code-quality job)
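+
+For readers unfamiliar with check-only format targets, one possible definition is sketched below. The target name `example-format-check` and the source globs are hypothetical; the project's real `yaze-format-check` target may be defined differently.
+
+```cmake
+# Hypothetical target; shown only to illustrate a non-modifying format check.
+file(GLOB_RECURSE EXAMPLE_FORMAT_SOURCES
+  "${CMAKE_SOURCE_DIR}/src/*.cc" "${CMAKE_SOURCE_DIR}/src/*.h")
+add_custom_target(example-format-check
+  # --dry-run reports violations without rewriting files;
+  # --Werror turns them into a non-zero exit code.
+  COMMAND clang-format --dry-run --Werror ${EXAMPLE_FORMAT_SOURCES}
+  COMMENT "Checking formatting without modifying files"
+  VERBATIM)
+```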
+
+---
+
+### Level 1: Configuration Validation (< 10 seconds)
+
+**Purpose**: Validate CMake configuration without full compilation
+
+**What It Catches**:
+- ✅ CMake syntax errors
+- ✅ Missing dependencies (immediate)
+- ✅ Invalid preset combinations
+- ✅ Include path misconfigurations
+
+**What It Misses**:
+- ❌ Actual compilation errors
+- ❌ Header availability issues
+- ❌ Linking problems
+
+**Run Locally**:
+```bash
+# Validate a preset
+./scripts/pre-push-test.sh --config-only
+
+# Configure multiple presets (configure step only, no build)
+for preset in mac-dbg mac-rel mac-ai; do
+  cmake --preset "$preset" > /dev/null || echo "Preset $preset failed to configure"
+done
+```
+
+**Run in CI**: 🔄 Proposed (new job)
+
+---
+
+### Level 2: Smoke Compilation (< 2 minutes)
+
+**Purpose**: Quick compilation check to catch header/include issues
+
+**What It Catches**:
+- ✅ Missing headers
+- ✅ Include path problems
+- ✅ Preprocessor errors
+- ✅ Template instantiation issues
+- ✅ Platform-specific compilation
+
+**What It Misses**:
+- ❌ Linking errors
+- ❌ Symbol conflicts
+- ❌ Runtime behavior
+
+**Strategy**:
+- Compile 1-2 representative files per library
+- Focus on files with many includes
+- Test platform-specific code paths
+
+**Run Locally**:
+```bash
+./scripts/pre-push-test.sh --smoke-only
+```
+
+**Run in CI**: 🔄 Proposed (compile-only job, <5 min)
+
+---
+
+### Level 3: Symbol Validation (< 5 minutes)
+
+**Purpose**: Detect symbol conflicts and ODR violations
+
+**What It Catches**:
+- ✅ Duplicate symbol definitions
+- ✅ ODR (One Definition Rule) violations
+- ✅ Missing symbols (link errors)
+- ✅ Symbol visibility issues
+
+**What It Misses**:
+- ❌ Runtime logic errors
+- ❌ Performance issues
+- ❌ Memory leaks
+
+**Tools**:
+- `nm` (Unix/macOS) - Symbol inspection
+- `dumpbin /symbols` (Windows) - Symbol inspection
+- `c++filt` - Symbol demangling
+
+**Run Locally**:
+```bash
+./scripts/verify-symbols.sh
+```
+
+**Run in CI**: 🔄 Proposed (symbol-check job)
+
+---
+
+### Level 4: Unit Tests (< 1 second each)
+
+**Purpose**: Fast, isolated testing of individual components
+
+**Location**: `test/unit/`
+
+**Characteristics**:
+- No external dependencies (ROM, network, filesystem)
+- Mocked dependencies via test doubles
+- Single-component focus
+- Deterministic (no flaky tests)
+
+**What It Catches**:
+- ✅ Algorithm correctness
+- ✅ Data structure behavior
+- ✅ Edge cases and error handling
+- ✅ Isolated component logic
+
+**What It Misses**:
+- ❌ Component interactions
+- ❌ ROM data handling
+- ❌ UI workflows
+- ❌ Platform-specific issues
+
+**Examples**:
+- `test/unit/core/hex_test.cc` - Hex conversion logic
+- `test/unit/gfx/snes_palette_test.cc` - Palette operations
+- `test/unit/zelda3/object_parser_test.cc` - Object parsing
+
+**Run Locally**:
+```bash
+./build/bin/yaze_test --unit
+```
+
+**Run in CI**: ✅ Every PR (test job)
+
+**Writing Guidelines**:
+```cpp
+// GOOD: Fast, isolated, no dependencies
+TEST(UnitTest, SnesPaletteConversion) {
+  gfx::SnesColor color(0x001F);  // Pure red in SNES BGR555 format (red is bits 0-4)
+  EXPECT_EQ(color.red(), 31);
+  EXPECT_EQ(color.rgb(), 0xFF0000);
+}
+
+// BAD: Depends on ROM file
+TEST(UnitTest, LoadOverworldMapColors) {
+  Rom rom;
+  rom.LoadFromFile("zelda3.sfc");  // ❌ External dependency
+  auto colors = rom.ReadPalette(0x1BD308);
+  EXPECT_EQ(colors.size(), 128);
+}
+```
+
+---
+
+### Level 5: Integration Tests (1-10 seconds each)
+
+**Purpose**: Test interactions between components
+
+**Location**: `test/integration/`
+
+**Characteristics**:
+- Multi-component interactions
+- May require ROM files (optional)
+- Real implementations (minimal mocking)
+- Slower but more realistic
+
+**What It Catches**:
+- ✅ Component interaction bugs
+- ✅ Data flow between systems
+- ✅ ROM operations
+- ✅ Resource management
+
+**What It Misses**:
+- ❌ Full UI workflows
+- ❌ User interactions
+- ❌ Visual rendering
+
+**Examples**:
+- `test/integration/asar_integration_test.cc` - Asar patching + ROM
+- `test/integration/dungeon_editor_v2_test.cc` - Dungeon editor logic
+- `test/integration/zelda3/overworld_integration_test.cc` - Overworld loading
+
+**Run Locally**:
+```bash
+./build/bin/yaze_test --integration
+```
+
+**Run in CI**: ⚠️ Limited (develop/master only, not PRs)
+
+**Writing Guidelines**:
+```cpp
+// GOOD: Tests component interaction
+TEST(IntegrationTest, AsarPatchRom) {
+  Rom rom;
+  ASSERT_TRUE(rom.LoadFromFile("zelda3.sfc"));
+
+  AsarWrapper asar;
+  auto result = asar.ApplyPatch("test.asm", rom);
+  ASSERT_TRUE(result.ok());
+
+  // Verify ROM was patched correctly
+  EXPECT_EQ(rom.ReadByte(0x12345), 0xAB);
+}
+```
+
+---
+
+### Level 6: End-to-End (E2E) Tests (10-60 seconds each)
+
+**Purpose**: Validate full user workflows through the UI
+
+**Location**: `test/e2e/`
+
+**Characteristics**:
+- Full application stack
+- Real UI (ImGui + SDL)
+- User interaction simulation
+- Requires display/window system
+
+**What It Catches**:
+- ✅ Complete user workflows
+- ✅ UI responsiveness
+- ✅ Visual rendering (screenshots)
+- ✅ Cross-editor interactions
+
+**What It Misses**:
+- ❌ Performance issues
+- ❌ Memory leaks (unless with sanitizers)
+- ❌ Platform-specific edge cases
+
+**Tools**:
+- `ImGuiTestEngine` - UI automation
+- `ImGui_TestEngineHook_*` - Test engine integration
+
+**Examples**:
+- `test/e2e/dungeon_editor_smoke_test.cc` - Open dungeon editor, load ROM
+- `test/e2e/canvas_selection_test.cc` - Select tiles on canvas
+- `test/e2e/overworld/overworld_e2e_test.cc` - Overworld editing workflow
+
+**Run Locally**:
+```bash
+# Headless (fast)
+./build/bin/yaze_test --e2e
+
+# With GUI visible (slow, for debugging)
+./build/bin/yaze_test --e2e --show-gui --normal
+```
+
+**Run in CI**: ⚠️ macOS only (z3ed-agent-test job)
+
+**Writing Guidelines**:
+```cpp
+void E2ETest_DungeonEditorSmokeTest(ImGuiTestContext* ctx) {
+  ctx->SetRef("DockSpaceViewport");
+
+  // Open File menu
+  ctx->MenuCheck("File/Load ROM", true);
+
+  // Enter ROM path
+  ctx->ItemInput("##rom_path");
+  ctx->KeyCharsAppend("zelda3.sfc");
+
+  // Click Load button
+  ctx->ItemClick("Load");
+
+  // Verify editor opened
+  ctx->WindowFocus("Dungeon Editor");
+  IM_CHECK(ctx->WindowIsOpen("Dungeon Editor"));
+}
+```
+
+---
+
+## 4. When to Run Tests
+
+### 4.1 During Development (Continuous)
+
+**Frequency**: After every significant change
+
+**Run**:
+- Level 0: Static analysis (IDE integration)
+- Level 4: Unit tests for changed components
+
+**Tools**:
+- VSCode C++ extension (clang-tidy)
+- File watchers (`entr`, `watchexec`)
+
+```bash
+# Watch mode for unit tests
+find src test -name "*.cc" | entr -c ./build/bin/yaze_test --unit
+```
+
+---
+
+### 4.2 Before Committing (Pre-Commit)
+
+**Frequency**: Before `git commit`
+
+**Run**:
+- Level 0: Format check
+- Level 4: Unit tests for changed files
+
+**Setup** (optional):
+```bash
+# Install pre-commit hook
+cat > .git/hooks/pre-commit << 'EOF'
+#!/bin/bash
+# Format check
+if ! cmake --build build --target yaze-format-check; then
+  echo "❌ Format check failed. Run: cmake --build build --target yaze-format"
+  exit 1
+fi
+EOF
+chmod +x .git/hooks/pre-commit
+```
+
+---
+
+### 4.3 Before Pushing (Pre-Push)
+
+**Frequency**: Before `git push` to remote
+
+**Run**:
+- Level 0: Static analysis
+- Level 1: Configuration validation
+- Level 2: Smoke compilation
+- Level 3: Symbol validation
+- Level 4: All unit tests
+
+**Time Budget**: < 2 minutes
+
+**Command**:
+```bash
+# Unix/macOS
+./scripts/pre-push-test.sh
+
+# Windows
+.\scripts\pre-push-test.ps1
+```
+
+**What It Prevents**:
+- 90% of CI build failures
+- ODR violations
+- Include path issues
+- Symbol conflicts
+
+---
+
+### 4.4 After Pull Request Creation
+
+**Frequency**: Automatically on every PR
+
+**Run** (CI):
+- Level 0: Static analysis (code-quality job)
+- Level 2: Full compilation (build job)
+- Level 4: Unit tests (test job)
+- Level 4: Stable tests (test job)
+
+**Time**: 15-20 minutes
+
+**Outcome**: ✅ Required for merge
+
+---
+
+### 4.5 After Merge to Develop/Master
+
+**Frequency**: Post-merge (develop/master only)
+
+**Run** (CI):
+- All PR checks
+- Level 5: Integration tests
+- Level 6: E2E tests (macOS)
+- Memory sanitizers (Linux)
+- Full AI stack tests (Windows/macOS)
+
+**Time**: 30-45 minutes
+
+**Outcome**: ⚠️ Optional (but monitored)
+
+---
+
+### 4.6 Before Release
+
+**Frequency**: Release candidates
+
+**Run**:
+- All CI tests
+- Manual exploratory testing
+- Performance benchmarks
+- Cross-platform smoke testing
+
+**Checklist**: See `docs/internal/release-checklist.md`
+
+---
+
+## 5. Test Organization
+
+### Directory Structure
+
+```
+test/
+├── unit/                   # Level 4: Fast, isolated tests
+│   ├── core/              # Core utilities
+│   ├── gfx/               # Graphics system
+│   ├── zelda3/            # Game logic
+│   ├── cli/               # CLI components
+│   ├── gui/               # GUI widgets
+│   └── emu/               # Emulator
+│
+├── integration/           # Level 5: Multi-component tests
+│   ├── ai/                # AI integration
+│   ├── editor/            # Editor systems
+│   └── zelda3/            # Game system integration
+│
+├── e2e/                   # Level 6: Full workflow tests
+│   ├── overworld/         # Overworld editor E2E
+│   ├── zscustomoverworld/ # ZSCustomOverworld E2E
+│   └── rom_dependent/     # ROM-required E2E
+│
+├── benchmarks/            # Performance tests
+├── mocks/                 # Test doubles
+└── test_utils.cc          # Test utilities
+```
+
+### Naming Conventions
+
+**Files**:
+- Unit: `<component>_test.cc`
+- Integration: `<feature>_integration_test.cc`
+- E2E: `<workflow>_e2e_test.cc`
+
+**Test Names**:
+```cpp
+// Unit
+TEST(UnitTest, ComponentName_Behavior_ExpectedOutcome) { }
+
+// Integration
+TEST(IntegrationTest, SystemName_Interaction_ExpectedOutcome) { }
+
+// E2E
+void E2ETest_WorkflowName_StepDescription(ImGuiTestContext* ctx) { }
+```
+
+### Test Labels (CTest)
+
+Tests are labeled for selective execution:
+
+- `stable` - No ROM required, fast
+- `unit` - Unit tests only
+- `integration` - Integration tests
+- `e2e` - End-to-end tests
+- `rom_dependent` - Requires ROM file
+
+```bash
+# Run only stable tests
+ctest --preset stable
+
+# Run unit tests
+./build/bin/yaze_test --unit
+
+# Run ROM-dependent tests
+./build/bin/yaze_test --rom-dependent --rom-path zelda3.sfc
+```
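+
+Labels can also be combined through CTest's `-L` (include) and `-LE` (exclude) options, which take a regular expression. The sketch below assumes the labels listed above are registered with CTest and that the build tree is `./build`; `label_regex` is a hypothetical helper, not part of the project.
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: join label names into an anchored regex for `ctest -L`.
+label_regex() {
+  local IFS='|'            # "$*" joins arguments with the first IFS character
+  printf '^(%s)$\n' "$*"
+}
+
+# Example invocations (assumed build directory: ./build):
+#   ctest --test-dir build -L  "$(label_regex unit integration)"
+#   ctest --test-dir build -LE "$(label_regex rom_dependent)"
+```
+
+Anchoring the expression with `^` and `$` keeps a label such as `unit` from also matching any longer label that merely contains it.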
+
+---
+
+## 6. Platform-Specific Testing
+
+### 6.1 Cross-Platform Considerations
+
+**Different Linker Behavior**:
+- macOS: More permissive (weak symbols)
+- Linux: Strict ODR enforcement
+- Windows: MSVC vs clang-cl differences
+
+**Strategy**: Test on Linux for strictest validation
+
+**Different Compilers**:
+- GCC (Linux): `-Werror=odr`
+- Clang (macOS/Linux): More warnings
+- clang-cl (Windows): MSVC compatibility mode
+
+**Strategy**: Use verbose presets (`*-dbg-v`) to see all warnings
+
+### 6.2 Local Cross-Platform Testing
+
+**For macOS Developers**:
+```bash
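+# Test Linux build locally (future: Docker; assumes a yaze-linux-builder image)
+# Both steps run inside `bash -c` so the build also executes in the container
+docker run --rm -v "$(pwd)":/workspace -w /workspace yaze-linux-builder \
+  bash -c "cmake --preset lin-dbg && cmake --build build --target yaze"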
+```
+
+**For Linux Developers**:
+```bash
+# Test macOS build locally (requires macOS VM)
+# Future: GitHub Actions remote testing
+```
+
+**For Windows Developers**:
+```powershell
+# Test via WSL (Linux build)
+wsl bash -c "cmake --preset lin-dbg && cmake --build build"
+```
+
+---
+
+## 7. CI/CD Testing
+
+### 7.1 Current CI Matrix
+
+| Job | Platform | Preset | Duration | Runs On |
+|-----|----------|--------|----------|---------|
+| build | Ubuntu 22.04 | ci-linux | ~15 min | All PRs |
+| build | macOS 14 | ci-macos | ~20 min | All PRs |
+| build | Windows 2022 | ci-windows | ~25 min | All PRs |
+| test | Ubuntu 22.04 | ci-linux | ~5 min | All PRs |
+| test | macOS 14 | ci-macos | ~5 min | All PRs |
+| test | Windows 2022 | ci-windows | ~5 min | All PRs |
+| windows-agent | Windows 2022 | ci-windows-ai | ~30 min | Post-merge |
+| code-quality | Ubuntu 22.04 | - | ~2 min | All PRs |
+| memory-sanitizer | Ubuntu 22.04 | sanitizer | ~20 min | PRs |
+| z3ed-agent-test | macOS 14 | mac-ai | ~15 min | Develop/master |
+
+### 7.2 Proposed CI Improvements
+
+**New Jobs**:
+
+1. **compile-only** (< 5 min)
+   - Run BEFORE full build
+   - Compile 10-20 representative files
+   - Fast feedback on include issues
+
+2. **symbol-check** (< 3 min)
+   - Run AFTER build
+   - Detect ODR violations
+   - Platform-specific (Linux most strict)
+
+3. **config-validation** (< 2 min)
+   - Test all presets can configure
+   - Validate include paths
+   - Catch CMake errors early
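+
+The config-validation job could be approximated as follows. This is a sketch under assumptions, not the proposed job itself: `preset_names` and `validate_presets` are hypothetical names, and the parsing relies on `cmake --list-presets` printing each preset name in double quotes at the start of an indented line.
+
+```bash
+#!/usr/bin/env bash
+# Extract preset names from `cmake --list-presets` output read on stdin.
+preset_names() {
+  sed -n 's/^[[:space:]]*"\([^"]*\)".*/\1/p'
+}
+
+# Configure every available preset and report which ones fail.
+# Returns non-zero if at least one preset cannot configure.
+validate_presets() {
+  local preset failed=0
+  while IFS= read -r preset; do
+    if cmake --preset "$preset" >/dev/null 2>&1; then
+      echo "ok:   $preset"
+    else
+      echo "FAIL: $preset"
+      failed=1
+    fi
+  done < <(cmake --list-presets | preset_names)
+  return "$failed"
+}
+```
+
+Run `validate_presets` from the repository root. Only presets that CMake lists for the current platform are exercised, so the job would still need to run once per OS.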
+
+**Benefits**:
+- Most include, symbol, and configuration issues surface in under 5 minutes (target: ~90% of issues)
+- Reduced wasted CI time
+- Faster developer feedback
+
+---
+
+## 8. Debugging Test Failures
+
+### 8.1 Local Test Failures
+
+**Unit Test Failure**:
+```bash
+# Run specific test
+./build/bin/yaze_test "TestSuiteName.TestName"
+
+# Run with verbose output
+./build/bin/yaze_test --verbose "TestSuiteName.*"
+
+# Run with debugger
+lldb -- ./build/bin/yaze_test "TestSuiteName.TestName"
+```
+
+**Integration Test Failure**:
+```bash
+# Ensure ROM is available
+export YAZE_TEST_ROM_PATH=/path/to/zelda3.sfc
+./build/bin/yaze_test --integration --verbose
+```
+
+**E2E Test Failure**:
+```bash
+# Run with GUI visible (slow motion)
+./build/bin/yaze_test --e2e --show-gui --cinematic
+
+# Take screenshots on failure
+YAZE_E2E_SCREENSHOT_DIR=/tmp/screenshots \
+  ./build/bin/yaze_test --e2e
+```
+
+### 8.2 CI Test Failures
+
+**Step 1: Identify Job**
+- Which platform failed? (Linux/macOS/Windows)
+- Which job failed? (build/test/code-quality)
+- Which test failed? (check CI logs)
+
+**Step 2: Reproduce Locally**
+```bash
+# Use matching CI preset
+cmake --preset ci-linux  # or ci-macos, ci-windows
+cmake --build build
+
+# Run same test
+./build/bin/yaze_test --unit
+```
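+
+Choosing the matching preset can be scripted from the host OS. A minimal sketch, assuming the three CI preset names shown above and the usual `uname -s` values (Git Bash, MSYS2, and Cygwin on Windows); `ci_preset_for` is a hypothetical helper.
+
+```bash
+#!/usr/bin/env bash
+# Map a `uname -s` value to the CI preset name used in this guide.
+ci_preset_for() {
+  case "$1" in
+    Linux)                echo "ci-linux" ;;
+    Darwin)               echo "ci-macos" ;;
+    MINGW*|MSYS*|CYGWIN*) echo "ci-windows" ;;
+    *)                    echo "unsupported platform: $1" >&2; return 1 ;;
+  esac
+}
+
+# Example:
+#   preset="$(ci_preset_for "$(uname -s)")" && cmake --preset "$preset" && cmake --build build
+```
+
+This reproduces only the job for the local platform. A failure on another OS still needs that OS, or one of the approaches from section 6.2.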
+
+**Step 3: Platform-Specific Issues**
+
+**If Windows-only failure**:
+- Check for MSVC/clang-cl differences
+- Validate include paths (Abseil, gRPC)
+- Check preprocessor macros (`_WIN32`, etc.)
+
+**If Linux-only failure**:
+- Check for ODR violations (duplicate symbols)
+- Validate linker flags
+- Check for gflags `FLAGS` conflicts
+
+**If macOS-only failure**:
+- Check for framework dependencies
+- Validate Objective-C++ code
+- Check for Apple SDK issues
+
+### 8.3 Build Failures
+
+**CMake Configuration Failure**:
+```bash
+# Verbose CMake output
+cmake --preset ci-linux -DCMAKE_VERBOSE_MAKEFILE=ON
+
+# Check the CMake cache for unresolved entries
+grep NOTFOUND build/CMakeCache.txt
+
+# Check cached include directories
+grep -i "INCLUDE_DIR" build/CMakeCache.txt
+```
+
+**Compilation Failure**:
+```bash
+# Verbose compilation
+cmake --build build --preset ci-linux -v
+
+# Single file compilation
+cd build
+ninja -v path/to/file.cc.o
+```
+
+**Linking Failure**:
+```bash
+# Check symbols in library
+nm -gU build/lib/libyaze_core.a | grep FLAGS
+
+# Check duplicate symbols
+./scripts/verify-symbols.sh --verbose
+
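+# Check ODR violations: keep only symbol names (addresses differ per object
+# file), then list every name that is defined more than once
+nm build/lib/*.a | awk '$2 ~ /^[TDR]$/ {print $3}' | sort | uniq -d | c++filt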
+```
+
+### 8.4 Common Failure Patterns
+
+**Pattern 1: "FLAGS redefined"**
+- **Cause**: The same gflags `FLAGS_*` symbol is defined in multiple translation units (TUs)
+- **Solution**: Define each flag in exactly one `.cc` file and only declare it everywhere else
+- **Prevention**: Run `./scripts/verify-symbols.sh`
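+
+For illustration, a duplicate-definition check can be built on `nm` alone. The sketch below is not the contents of `scripts/verify-symbols.sh`; `duplicate_definitions` is a hypothetical helper, and it assumes GNU-style `nm` output (address, type letter, name).
+
+```bash
+#!/usr/bin/env bash
+# Read `nm` output on stdin and print every symbol that has more than one
+# strong definition (type T, D, R, or B). Undefined references (U), archive
+# member headers, and blank lines do not have three fields and are skipped.
+duplicate_definitions() {
+  awk 'NF == 3 && $2 ~ /^[TDRB]$/ { print $3 }' | sort | uniq -d
+}
+
+# Example (assumes static libraries live in build/lib):
+#   nm build/lib/*.a | duplicate_definitions | c++filt
+```
+
+Printing only the symbol name before `uniq -d` matters: the address column usually differs between object files, so whole-line comparison would miss real duplicates.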
+
+**Pattern 2: "Abseil headers not found"**
+- **Cause**: Include paths not propagated from gRPC
+- **Solution**: Add explicit Abseil include directory
+- **Prevention**: Run smoke compilation test
+
+**Pattern 3: "std::filesystem not available"**
+- **Cause**: `std::filesystem` requires C++17 or later, but the C++17/20 standard flag is missing
+- **Solution**: Add `/std:c++latest` (Windows) or `-std=c++20`
+- **Prevention**: Validate compiler flags in CMake
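+
+One way to confirm which standard flag actually reaches the compiler is to inspect the compilation database. This sketch assumes the build was configured with `CMAKE_EXPORT_COMPILE_COMMANDS=ON`, so that `build/compile_commands.json` exists; `std_flags` is a hypothetical helper.
+
+```bash
+#!/usr/bin/env bash
+# Print the distinct language-standard flags found in text read on stdin.
+# Matches GCC/Clang style (-std=...) and MSVC/clang-cl style (/std:...).
+std_flags() {
+  grep -oE -- '(-std=|/std:)[A-Za-z0-9+]+' | LC_ALL=C sort -u
+}
+
+# Example:
+#   std_flags < build/compile_commands.json
+```
+
+Empty output means no standard flag is passed at all, and more than one distinct flag suggests targets are built with inconsistent standards.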
+
+**Pattern 4: "Multiple definition of X"**
+- **Cause**: A non-inline function or variable is defined in a header that is included in multiple TUs
+- **Solution**: Mark the definition `inline` or move it to a single TU
+- **Prevention**: Symbol conflict checker
+
+---
+
+## 9. Best Practices
+
+### 9.1 Writing Tests
+
+1. **Fast**: Unit tests should complete in <100ms
+2. **Isolated**: No external dependencies (files, network, ROM)
+3. **Deterministic**: Same input → same output, always
+4. **Clear**: Test name describes what is tested
+5. **Focused**: One assertion per test (ideally)
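+
+The 100 ms budget from item 1 can be checked against a test run. A minimal sketch, assuming the runner prints standard GoogleTest result lines of the form `[       OK ] Suite.Name (12 ms)`; `slow_tests` is a hypothetical helper.
+
+```bash
+#!/usr/bin/env bash
+# Read GoogleTest console output on stdin and print every test whose
+# reported duration exceeds the budget in milliseconds (default: 100).
+slow_tests() {
+  local budget="${1:-100}"
+  awk -v budget="$budget" '
+    # Per-test result lines end in "(N ms)"; summary lines end in "total)".
+    /^\[ +(OK|FAILED) +\]/ && $NF == "ms)" {
+      ms = $(NF - 1)
+      sub(/^\(/, "", ms)          # strip the opening parenthesis
+      if (ms + 0 > budget + 0) print $4, ms " ms"
+    }'
+}
+
+# Example:
+#   ./build/bin/yaze_test --unit | slow_tests 100
+```
+
+Durations reported by a single run are noisy, so an occasional hit is a hint rather than a failure. A test that appears on every run is a candidate for moving to the integration tier.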
+
+### 9.2 Test Data
+
+**Good**:
+```cpp
+// Inline test data
+const uint8_t palette_data[] = {0x00, 0x7C, 0xFF, 0x03};
+auto palette = gfx::SnesPalette(palette_data, 4);
+```
+
+**Bad**:
+```cpp
+// External file dependency
+auto palette = gfx::SnesPalette::LoadFromFile("test_palette.bin");  // ❌
+```
+
+### 9.3 Assertions
+
+**Prefer `EXPECT_*` over `ASSERT_*`**:
+- `EXPECT_*` continues on failure (more info)
+- `ASSERT_*` aborts the current test function immediately (use it when continuing would crash or be meaningless, e.g. after a failed null check)
+
+```cpp
+// Good: Continue testing after failure
+EXPECT_EQ(color.red(), 31);
+EXPECT_EQ(color.green(), 0);
+EXPECT_EQ(color.blue(), 0);
+
+// Bad: Only see first failure
+ASSERT_EQ(color.red(), 31);
+ASSERT_EQ(color.green(), 0);  // Never executed if red fails
+```
+
+---
+
+## 10. Resources
+
+### Documentation
+- **Gap Analysis**: `docs/internal/testing/gap-analysis.md`
+- **Pre-Push Checklist**: `docs/internal/testing/pre-push-checklist.md`
+- **Quick Reference**: `docs/public/build/quick-reference.md`
+
+### Scripts
+- **Pre-Push Test**: `scripts/pre-push-test.sh` (Unix/macOS)
+- **Pre-Push Test**: `scripts/pre-push-test.ps1` (Windows)
+- **Symbol Checker**: `scripts/verify-symbols.sh`
+
+### CI Configuration
+- **Workflow**: `.github/workflows/ci.yml`
+- **Composite Actions**: `.github/actions/`
+
+### Tools
+- **Test Runner**: `test/yaze_test.cc`
+- **Test Utilities**: `test/test_utils.h`
+- **Google Test**: https://google.github.io/googletest/
+- **ImGui Test Engine**: https://github.com/ocornut/imgui_test_engine