Files

391 lines
11 KiB
Markdown

# Testing Infrastructure Gap Analysis
## Executive Summary
Recent CI failures revealed critical gaps in our testing infrastructure that allowed platform-specific build failures to reach CI. This document analyzes what we currently test, what we missed, and what infrastructure is needed to catch issues earlier.
**Date**: 2025-11-20
**Triggered By**: Multiple CI failures in commits 43a0e5e314, c2bb90a3f1, and related fixes
---
## 1. Issues We Didn't Catch Locally
### 1.1 Windows Abseil Include Path Issues (c2bb90a3f1)
**Problem**: Abseil headers not found during Windows/clang-cl compilation
**Why it wasn't caught**:
- No local pre-push compilation check
- CMake configuration validates successfully, but compilation fails later
- Include path propagation from gRPC/Abseil not validated until full compile
**What would have caught it**:
- ✅ Smoke compilation test (compile subset of files to catch header issues)
- ✅ CMake configuration validator (check include path propagation)
- ✅ Header dependency checker
### 1.2 Linux FLAGS Symbol Conflicts (43a0e5e314, eb77bbeaff)
**Problem**: ODR (One Definition Rule) violation - multiple `FLAGS` symbols across libraries
**Why it wasn't caught**:
- Symbol conflicts only appear at link time
- No cross-library symbol conflict detection
- Static analysis doesn't catch ODR violations
- Unit tests don't link full dependency graph
**What would have caught it**:
- ✅ Symbol conflict scanner (nm/objdump analysis)
- ✅ ODR violation detector
- ✅ Full integration build test (link all libraries together)
### 1.3 Platform-Specific Configuration Issues
**Problem**: Preprocessor flags, compiler detection, and platform-specific code paths
**Why it wasn't caught**:
- No local cross-platform validation
- CMake configuration differences between platforms not tested
- Compiler detection logic (clang-cl vs MSVC) not validated
**What would have caught it**:
- ✅ CMake configuration dry-run on multiple platforms
- ✅ Preprocessor flag validation
- ✅ Compiler detection smoke test
---
## 2. Current Testing Coverage
### 2.1 What We Test Well
#### Unit Tests (test/unit/)
- **Coverage**: Core algorithms, data structures, parsers
- **Speed**: Fast (<1s for most tests)
- **Isolation**: Mocked dependencies, no ROM required
- **CI**: ✅ Runs on every PR
- **Example**: `hex_test.cc`, `asar_wrapper_test.cc`, `snes_palette_test.cc`
**Strengths**:
- Catches logic errors quickly
- Good for TDD
- Platform-independent
**Gaps**:
- Doesn't catch build system issues
- Doesn't catch linking problems
- Doesn't validate dependencies
#### Integration Tests (test/integration/)
- **Coverage**: Multi-component interactions, ROM operations
- **Speed**: Slower (1-10s per test)
- **Dependencies**: May require ROM files
- **CI**: ✅ Runs on develop/master
- **Example**: `asar_integration_test.cc`, `dungeon_editor_v2_test.cc`
**Strengths**:
- Tests component interactions
- Validates ROM operations
**Gaps**:
- Still doesn't catch platform-specific issues
- Doesn't validate symbol conflicts
- Doesn't test cross-library linking
#### E2E Tests (test/e2e/)
- **Coverage**: Full UI workflows, user interactions
- **Speed**: Very slow (10-60s per test)
- **Dependencies**: GUI, ImGuiTestEngine
- **CI**: ⚠️ Limited (only on macOS z3ed-agent-test)
- **Example**: `dungeon_editor_smoke_test.cc`, `canvas_selection_test.cc`
**Strengths**:
- Validates real user workflows
- Tests UI responsiveness
**Gaps**:
- Not run consistently across platforms
- Slow feedback loop
- Requires display/window system
### 2.2 What We DON'T Test
#### Build System Validation
- ❌ CMake configuration correctness per preset
- ❌ Include path propagation from dependencies
- ❌ Compiler flag compatibility
- ❌ Linker flag validation
- ❌ Cross-preset compatibility
#### Symbol-Level Issues
- ❌ ODR (One Definition Rule) violations
- ❌ Duplicate symbol detection across libraries
- ❌ Symbol visibility (public/private)
- ❌ ABI compatibility between libraries
#### Platform-Specific Compilation
- ❌ Header-only compilation checks
- ❌ Preprocessor branch coverage
- ❌ Platform macro validation
- ❌ Compiler-specific feature detection
#### Dependency Health
- ❌ Include path conflicts
- ❌ Library version mismatches
- ❌ Transitive dependency validation
- ❌ Static vs shared library conflicts
---
## 3. CI/CD Coverage Analysis
### 3.1 Current CI Matrix (.github/workflows/ci.yml)
| Platform | Build | Test (stable) | Test (unit) | Test (integration) | Test (AI) |
|----------|-------|---------------|-------------|-------------------|-----------|
| Ubuntu 22.04 (GCC-12) | ✅ | ✅ | ✅ | ❌ | ❌ |
| macOS 14 (Clang) | ✅ | ✅ | ✅ | ❌ | ✅ |
| Windows 2022 (Core) | ✅ | ✅ | ✅ | ❌ | ❌ |
| Windows 2022 (AI) | ✅ | ✅ | ✅ | ❌ | ❌ |
**CI Job Flow**:
1. **build**: Configure + compile full project
2. **test**: Run stable + unit tests
3. **windows-agent**: Full AI stack (gRPC + AI runtime)
4. **code-quality**: clang-format, cppcheck, clang-tidy
5. **memory-sanitizer**: AddressSanitizer (Linux only)
6. **z3ed-agent-test**: Full agent test suite (macOS only)
### 3.2 CI Gaps
#### Missing Early Feedback
- ❌ No compilation-only job (fails after 15-20 min build)
- ❌ No CMake configuration validation job (would catch in <1 min)
- ❌ No symbol conflict checking job
#### Limited Platform Coverage
- ⚠️ Only Linux gets AddressSanitizer
- ⚠️ Only macOS gets full z3ed agent tests
- ⚠️ Windows AI stack not tested on PRs (only post-merge)
#### Incomplete Testing
- ❌ Integration tests not run in CI
- ❌ E2E tests not run on Linux/Windows
- ❌ No ROM-dependent testing
- ❌ No performance regression detection
---
## 4. Developer Workflow Gaps
### 4.1 Pre-Commit Hooks
**Current State**: None
**Gap**: No automatic checks before local commits
**Should Include**:
- clang-format check
- Build system sanity check
- Copyright header validation
### 4.2 Pre-Push Validation
**Current State**: Manual testing only
**Gap**: Easy to push broken code to CI
**Should Include**:
- Smoke build test (quick compilation check)
- Unit test run
- Symbol conflict detection
### 4.3 Local Cross-Platform Testing
**Current State**: Developer-dependent
**Gap**: No easy way to test across platforms locally
**Should Include**:
- Docker-based Linux testing
- VM-based Windows testing (for macOS/Linux devs)
- Preset validation tool
---
## 5. Root Cause Analysis by Issue Type
### 5.1 Windows Abseil Include Paths
**Timeline**:
- ✅ Local macOS build succeeds
- ✅ CMake configuration succeeds on all platforms
- ❌ Windows compilation fails 15 minutes into CI
- ❌ Fix attempt 1 fails (14d1f5de4c)
- ❌ Fix attempt 2 fails (c2bb90a3f1)
- ✅ Final fix succeeds
**Why Multiple Attempts**:
1. No local Windows testing environment
2. CMake configuration doesn't validate actual compilation
3. No header-only compilation check
4. 15-20 minute feedback cycle from CI
**Prevention**:
- Header compilation smoke test
- CMake include path validator
- Local Windows testing (Docker/VM)
### 5.2 Linux FLAGS Symbol Conflicts
**Timeline**:
- ✅ Local macOS build succeeds
- ✅ Unit tests pass
- ❌ Linux full build fails at link time
- ❌ ODR violation: multiple `FLAGS` definitions
- ✅ Fix: move FLAGS definition, rename conflicts
**Why It Happened**:
1. gflags creates `FLAGS_*` symbols in headers
2. Multiple translation units define same symbols
3. macOS linker more permissive than Linux ld
4. No symbol conflict detection
**Prevention**:
- Symbol conflict scanner
- ODR violation checker
- Cross-platform link test
---
## 6. Recommended Testing Levels
We propose a **5-level testing pyramid**:
### Level 0: Static Analysis (< 1s)
- clang-format
- clang-tidy on changed files
- Copyright headers
- CMakeLists.txt syntax
### Level 1: Configuration Validation (< 10s)
- CMake configure dry-run
- Include path validation
- Compiler detection check
- Preprocessor flag validation
### Level 2: Smoke Compilation (< 2 min)
- Compile subset of files (1 file per library)
- Header-only compilation
- Template instantiation check
- Platform-specific branch validation
### Level 3: Symbol Validation (< 5 min)
- Full project compilation
- Symbol conflict detection (nm/dumpbin)
- ODR violation check
- Library dependency graph
### Level 4: Test Execution (5-30 min)
- Unit tests (fast)
- Integration tests (medium)
- E2E tests (slow)
- ROM-dependent tests (optional)
---
## 7. Actionable Recommendations
### 7.1 Immediate Actions (This Initiative)
1. **Create pre-push scripts** (`scripts/pre-push-test.sh`, `scripts/pre-push-test.ps1`)
- Run Level 0-2 checks locally
- Estimated time: <2 minutes
- Blocks 90% of CI failures
2. **Create symbol conflict detector** (`scripts/verify-symbols.sh`)
- Scan built libraries for duplicate symbols
- Run as part of pre-push
- Catches ODR violations
3. **Document testing strategy** (`docs/internal/testing/testing-strategy.md`)
- Clear explanation of each test level
- When to run which tests
- CI vs local testing
4. **Create pre-push checklist** (`docs/internal/testing/pre-push-checklist.md`)
- Interactive checklist for developers
- Links to tools and scripts
### 7.2 Short-Term Improvements (Next Sprint)
1. **Add CI compile-only job**
- Runs in <5 minutes
- Catches compilation issues before full build
- Fails fast
2. **Add CI symbol checking job**
- Runs after compile-only
- Detects ODR violations
- Platform-specific
3. **Add CMake configuration validation job**
- Tests all presets
- Validates include paths
- <2 minutes
4. **Enable integration tests in CI**
- Run on develop/master only (not PRs)
- Requires ROM file handling
### 7.3 Long-Term Improvements (Future)
1. **Docker-based local testing**
- Linux environment for macOS/Windows devs
- Matches CI exactly
- Fast feedback
2. **Cross-platform test matrix locally**
- Run tests across multiple platforms
- Automated VM/container management
3. **Performance regression detection**
- Benchmark suite
- Historical tracking
- Automatic alerts
4. **Coverage tracking**
- Line coverage per PR
- Coverage trends over time
- Uncovered code reports
---
## 8. Success Metrics
### 8.1 Developer Experience
- **Target**: <2 minutes pre-push validation time
- **Target**: 90% reduction in CI build failures
- **Target**: <3 attempts to fix CI issues (down from 5-10)
### 8.2 CI Efficiency
- **Target**: <5 minutes to first failure signal
- **Target**: 50% reduction in wasted CI time
- **Target**: 95% PR pass rate (up from ~70%)
### 8.3 Code Quality
- **Target**: Zero ODR violations
- **Target**: Zero platform-specific include issues
- **Target**: 100% symbol conflict detection
---
## 9. Reference
### Similar Issues in Recent History
- Windows std::filesystem support (19196ca87c, b556b155a5)
- Linux circular dependency (0812a84a22, e36d81f357)
- macOS z3ed linker error (9c562df277)
- Windows clang-cl detection (84cdb09a5b, cbdc6670a1)
### Related Documentation
- `docs/public/build/quick-reference.md` - Build commands
- `docs/public/build/troubleshooting.md` - Platform-specific fixes
- `CLAUDE.md` - Build system guidelines
- `.github/workflows/ci.yml` - CI configuration
### Tools Used
- `nm` (Unix) / `dumpbin` (Windows) - Symbol inspection
- `clang-tidy` - Static analysis
- `cppcheck` - Code quality
- `cmake --preset <name> --list-presets` - Preset validation