backend-infra-engineer: Release v0.3.3 snapshot
390
docs/internal/testing/gap-analysis.md
Normal file
@@ -0,0 +1,390 @@
# Testing Infrastructure Gap Analysis

## Executive Summary

Recent CI failures revealed critical gaps in our testing infrastructure that allowed platform-specific build failures to reach CI. This document analyzes what we currently test, what we missed, and what infrastructure is needed to catch issues earlier.

**Date**: 2025-11-20
**Triggered By**: Multiple CI failures in commits 43a0e5e314, c2bb90a3f1, and related fixes

---

## 1. Issues We Didn't Catch Locally

### 1.1 Windows Abseil Include Path Issues (c2bb90a3f1)
**Problem**: Abseil headers not found during Windows/clang-cl compilation
**Why it wasn't caught**:
- No local pre-push compilation check
- CMake configuration validates successfully, but compilation fails later
- Include path propagation from gRPC/Abseil not validated until full compile

**What would have caught it**:
- ✅ Smoke compilation test (compile subset of files to catch header issues)
- ✅ CMake configuration validator (check include path propagation)
- ✅ Header dependency checker
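
A smoke compilation test of this kind could be sketched as follows. This is a minimal sketch, not the project's tooling: it assumes CMake was configured with `CMAKE_EXPORT_COMPILE_COMMANDS=ON` so that a `compile_commands.json` exists, and it treats "one source directory" as a stand-in for "one library" (an assumption; the real library layout may differ). The function names are hypothetical.

```python
import json
import os
import subprocess


def load_entries(path):
    """Load the compilation database written by CMake."""
    with open(path) as f:
        return json.load(f)


def pick_smoke_entries(entries):
    """Keep the first compile command seen for each source directory."""
    picked = {}
    for entry in entries:
        source_dir = os.path.dirname(entry["file"])
        picked.setdefault(source_dir, entry)
    return list(picked.values())


def run_smoke_compile(entries):
    """Re-run each selected compile command; return the files that failed."""
    failed = []
    for entry in entries:
        # An entry carries either an "arguments" list or a "command" string.
        if "arguments" in entry:
            result = subprocess.run(entry["arguments"], cwd=entry["directory"])
        else:
            result = subprocess.run(entry["command"], cwd=entry["directory"], shell=True)
        if result.returncode != 0:
            failed.append(entry["file"])
    return failed
```

Calling `run_smoke_compile(pick_smoke_entries(load_entries("build/compile_commands.json")))` (the path is a placeholder) compiles one file per directory with the exact flags CMake generated, so a missing include path fails in seconds rather than deep into a full build.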

### 1.2 Linux FLAGS Symbol Conflicts (43a0e5e314, eb77bbeaff)
**Problem**: ODR (One Definition Rule) violation - multiple `FLAGS` symbols across libraries
**Why it wasn't caught**:
- Symbol conflicts only appear at link time
- No cross-library symbol conflict detection
- Static analysis doesn't catch ODR violations
- Unit tests don't link full dependency graph

**What would have caught it**:
- ✅ Symbol conflict scanner (nm/objdump analysis)
- ✅ ODR violation detector
- ✅ Full integration build test (link all libraries together)
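
The symbol conflict scanner could look roughly like the sketch below, which parses the text output of `nm -g` and reports symbols with a strong definition in more than one library. It is an illustration under assumptions: the function names are hypothetical, only the common type letters `T`, `D`, `B`, and `R` are treated as strong definitions, and weak symbols (`W`, `V`) are skipped because inline functions and templates legitimately appear in several objects. A reported duplicate is a candidate for review, not proof of a link failure.

```python
import subprocess
from collections import defaultdict

# nm type letters treated as strong external definitions:
# text (T), initialized data (D), BSS (B), read-only data (R).
STRONG_TYPES = {"T", "D", "B", "R"}


def defined_symbols(nm_output):
    """Extract strongly defined global symbol names from `nm` text output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Definition lines look like "<address> <type> <name>".
        if len(parts) == 3 and parts[1] in STRONG_TYPES:
            symbols.add(parts[2])
    return symbols


def find_conflicts(symbols_by_library):
    """Map each symbol defined in more than one library to those libraries."""
    owners = defaultdict(list)
    for library, symbols in symbols_by_library.items():
        for symbol in symbols:
            owners[symbol].append(library)
    return {s: sorted(libs) for s, libs in owners.items() if len(libs) > 1}


def scan_libraries(paths):
    """Run `nm -g` on each library and report cross-library duplicates."""
    symbols_by_library = {}
    for path in paths:
        out = subprocess.run(
            ["nm", "-g", path], capture_output=True, text=True, check=True
        ).stdout
        symbols_by_library[path] = defined_symbols(out)
    return find_conflicts(symbols_by_library)
```

On Windows the same idea would need `dumpbin` output parsing instead of `nm`, which this sketch does not cover.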

### 1.3 Platform-Specific Configuration Issues
**Problem**: Preprocessor flags, compiler detection, and platform-specific code paths
**Why it wasn't caught**:
- No local cross-platform validation
- CMake configuration differences between platforms not tested
- Compiler detection logic (clang-cl vs MSVC) not validated

**What would have caught it**:
- ✅ CMake configuration dry-run on multiple platforms
- ✅ Preprocessor flag validation
- ✅ Compiler detection smoke test
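
One way to approach the compiler detection smoke test is to read the `CMAKE_CXX_COMPILER` entry from the generated `CMakeCache.txt` and confirm it matches the toolchain the preset was meant to select. The sketch below assumes that approach; the classification is a filename heuristic only, and all function names are hypothetical.

```python
def parse_cmake_cache(text):
    """Parse CMakeCache.txt content into a {KEY: value} dictionary."""
    cache = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the two comment styles used in the cache file.
        if not line or line.startswith(("#", "//")):
            continue
        key_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        # Entries have the form KEY:TYPE=VALUE; drop the type.
        key = key_and_type.split(":", 1)[0]
        cache[key] = value
    return cache


def classify_compiler(compiler_path):
    """Classify a compiler by executable name (a naming heuristic only)."""
    name = compiler_path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name.endswith(".exe"):
        name = name[:-4]
    if name == "clang-cl":
        return "clang-cl"
    if name == "cl":
        return "msvc"
    if name.startswith("clang"):
        return "clang"
    if name.startswith(("g++", "gcc")):
        return "gcc"
    return "unknown"


def check_compiler(cache_text, expected):
    """Return True when the cached C++ compiler matches the expected family."""
    cache = parse_cmake_cache(cache_text)
    return classify_compiler(cache.get("CMAKE_CXX_COMPILER", "")) == expected
```

A check such as `check_compiler(open("build/CMakeCache.txt").read(), "clang-cl")` (path is a placeholder) would flag a Windows configure that silently fell back to a different compiler before any source file is compiled.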

---

## 2. Current Testing Coverage

### 2.1 What We Test Well

#### Unit Tests (test/unit/)
- **Coverage**: Core algorithms, data structures, parsers
- **Speed**: Fast (<1s for most tests)
- **Isolation**: Mocked dependencies, no ROM required
- **CI**: ✅ Runs on every PR
- **Example**: `hex_test.cc`, `asar_wrapper_test.cc`, `snes_palette_test.cc`

**Strengths**:
- Catches logic errors quickly
- Good for TDD
- Platform-independent

**Gaps**:
- Doesn't catch build system issues
- Doesn't catch linking problems
- Doesn't validate dependencies

#### Integration Tests (test/integration/)
- **Coverage**: Multi-component interactions, ROM operations
- **Speed**: Slower (1-10s per test)
- **Dependencies**: May require ROM files
- **CI**: ❌ Not currently run in CI (see Sections 3.1 and 3.2)
- **Example**: `asar_integration_test.cc`, `dungeon_editor_v2_test.cc`

**Strengths**:
- Tests component interactions
- Validates ROM operations

**Gaps**:
- Still doesn't catch platform-specific issues
- Doesn't validate symbol conflicts
- Doesn't test cross-library linking

#### E2E Tests (test/e2e/)
- **Coverage**: Full UI workflows, user interactions
- **Speed**: Very slow (10-60s per test)
- **Dependencies**: GUI, ImGuiTestEngine
- **CI**: ⚠️ Limited (only on macOS z3ed-agent-test)
- **Example**: `dungeon_editor_smoke_test.cc`, `canvas_selection_test.cc`

**Strengths**:
- Validates real user workflows
- Tests UI responsiveness

**Gaps**:
- Not run consistently across platforms
- Slow feedback loop
- Requires display/window system

### 2.2 What We DON'T Test

#### Build System Validation
- ❌ CMake configuration correctness per preset
- ❌ Include path propagation from dependencies
- ❌ Compiler flag compatibility
- ❌ Linker flag validation
- ❌ Cross-preset compatibility

#### Symbol-Level Issues
- ❌ ODR (One Definition Rule) violations
- ❌ Duplicate symbol detection across libraries
- ❌ Symbol visibility (public/private)
- ❌ ABI compatibility between libraries

#### Platform-Specific Compilation
- ❌ Header-only compilation checks
- ❌ Preprocessor branch coverage
- ❌ Platform macro validation
- ❌ Compiler-specific feature detection

#### Dependency Health
- ❌ Include path conflicts
- ❌ Library version mismatches
- ❌ Transitive dependency validation
- ❌ Static vs shared library conflicts

---

## 3. CI/CD Coverage Analysis

### 3.1 Current CI Matrix (.github/workflows/ci.yml)

| Platform | Build | Test (stable) | Test (unit) | Test (integration) | Test (AI) |
|----------|-------|---------------|-------------|--------------------|-----------|
| Ubuntu 22.04 (GCC-12) | ✅ | ✅ | ✅ | ❌ | ❌ |
| macOS 14 (Clang) | ✅ | ✅ | ✅ | ❌ | ✅ |
| Windows 2022 (Core) | ✅ | ✅ | ✅ | ❌ | ❌ |
| Windows 2022 (AI) | ✅ | ✅ | ✅ | ❌ | ❌ |

**CI Job Flow**:
1. **build**: Configure + compile full project
2. **test**: Run stable + unit tests
3. **windows-agent**: Full AI stack (gRPC + AI runtime)
4. **code-quality**: clang-format, cppcheck, clang-tidy
5. **memory-sanitizer**: AddressSanitizer (Linux only)
6. **z3ed-agent-test**: Full agent test suite (macOS only)

### 3.2 CI Gaps

#### Missing Early Feedback
- ❌ No compilation-only job (compile errors surface only after the 15-20 min full build)
- ❌ No CMake configuration validation job (would catch configuration errors in <1 min)
- ❌ No symbol conflict checking job

#### Limited Platform Coverage
- ⚠️ Only Linux gets AddressSanitizer
- ⚠️ Only macOS gets full z3ed agent tests
- ⚠️ Windows AI stack not tested on PRs (only post-merge)

#### Incomplete Testing
- ❌ Integration tests not run in CI
- ❌ E2E tests not run on Linux/Windows
- ❌ No ROM-dependent testing
- ❌ No performance regression detection

---

## 4. Developer Workflow Gaps

### 4.1 Pre-Commit Hooks
**Current State**: None
**Gap**: No automatic checks before local commits

**Should Include**:
- clang-format check
- Build system sanity check
- Copyright header validation
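
As an illustration of the clang-format check, a pre-commit hook could be sketched as below. It assumes `git` and a `clang-format` new enough to support `--dry-run` and `--Werror` are on the PATH; the extension list and function names are assumptions, not project conventions.

```python
import subprocess

# Extensions treated as C++ sources; adjust to the repository's conventions.
CPP_EXTENSIONS = (".cc", ".cpp", ".h", ".hpp")


def filter_cpp_files(paths):
    """Keep only paths that look like C++ sources or headers."""
    return [p for p in paths if p.endswith(CPP_EXTENSIONS)]


def staged_files():
    """List files added, copied, or modified in the git index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def format_check_command(paths):
    """Build a clang-format invocation that reports issues without editing files."""
    return ["clang-format", "--dry-run", "--Werror", *paths]


def main():
    """Return 0 when staged C++ files are formatted, non-zero otherwise."""
    files = filter_cpp_files(staged_files())
    if not files:
        return 0
    return subprocess.run(format_check_command(files)).returncode
```

Saved as an executable `.git/hooks/pre-commit` that ends with `sys.exit(main())`, this would reject a commit whose staged C++ files need reformatting. Note that it checks the working-tree copy of each staged file, which is a simplification.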

### 4.2 Pre-Push Validation
**Current State**: Manual testing only
**Gap**: Easy to push broken code to CI

**Should Include**:
- Smoke build test (quick compilation check)
- Unit test run
- Symbol conflict detection

### 4.3 Local Cross-Platform Testing
**Current State**: Developer-dependent
**Gap**: No easy way to test across platforms locally

**Should Include**:
- Docker-based Linux testing
- VM-based Windows testing (for macOS/Linux devs)
- Preset validation tool

---

## 5. Root Cause Analysis by Issue Type

### 5.1 Windows Abseil Include Paths

**Timeline**:
- ✅ Local macOS build succeeds
- ✅ CMake configuration succeeds on all platforms
- ❌ Windows compilation fails 15 minutes into CI
- ❌ Fix attempt 1 fails (14d1f5de4c)
- ❌ Fix attempt 2 fails (c2bb90a3f1)
- ✅ Final fix succeeds

**Why Multiple Attempts**:
1. No local Windows testing environment
2. CMake configuration doesn't validate actual compilation
3. No header-only compilation check
4. 15-20 minute feedback cycle from CI

**Prevention**:
- Header compilation smoke test
- CMake include path validator
- Local Windows testing (Docker/VM)

### 5.2 Linux FLAGS Symbol Conflicts

**Timeline**:
- ✅ Local macOS build succeeds
- ✅ Unit tests pass
- ❌ Linux full build fails at link time
- ❌ ODR violation: multiple `FLAGS` definitions
- ✅ Fix: move FLAGS definition, rename conflicts

**Why It Happened**:
1. gflags creates `FLAGS_*` symbols in headers
2. Multiple translation units define same symbols
3. macOS linker more permissive than Linux ld
4. No symbol conflict detection

**Prevention**:
- Symbol conflict scanner
- ODR violation checker
- Cross-platform link test

---

## 6. Recommended Testing Levels

We propose a **5-level testing pyramid**:

### Level 0: Static Analysis (< 1s)
- clang-format
- clang-tidy on changed files
- Copyright headers
- CMakeLists.txt syntax

### Level 1: Configuration Validation (< 10s)
- CMake configure dry-run
- Include path validation
- Compiler detection check
- Preprocessor flag validation

### Level 2: Smoke Compilation (< 2 min)
- Compile subset of files (1 file per library)
- Header-only compilation
- Template instantiation check
- Platform-specific branch validation

### Level 3: Symbol Validation (< 5 min)
- Full project compilation
- Symbol conflict detection (nm/dumpbin)
- ODR violation check
- Library dependency graph

### Level 4: Test Execution (5-30 min)
- Unit tests (fast)
- Integration tests (medium)
- E2E tests (slow)
- ROM-dependent tests (optional)
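
The point of ordering the levels by cost is that cheaper levels gate the more expensive ones. A fail-fast runner for that ordering could be sketched as below; the runner itself is generic, and the commands in `EXAMPLE_LEVELS` (including the `dev` preset name and the source path) are hypothetical placeholders rather than the project's actual checks.

```python
import subprocess


def run_command(command):
    """Run one command and return True when it exits with status 0."""
    return subprocess.run(command).returncode == 0


def run_levels(levels, run=run_command):
    """Run levels in order and stop at the first level with a failing command.

    `levels` is a list of (name, [command, ...]) pairs, cheapest first.
    Returns (names of levels that passed, name of the failed level or None).
    """
    passed = []
    for name, commands in levels:
        for command in commands:
            if not run(command):
                return passed, name
        passed.append(name)
    return passed, None


# Hypothetical example configuration; replace with the real checks per level.
EXAMPLE_LEVELS = [
    ("Level 0: Static Analysis",
     [["clang-format", "--dry-run", "--Werror", "src/main.cc"]]),
    ("Level 1: Configuration Validation",
     [["cmake", "--preset", "dev"]]),
]
```

Passing the `run` callable as a parameter keeps the ordering logic testable without invoking any real tools, and a pre-push script would typically stop at Level 2 while CI continues through Level 4.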

---

## 7. Actionable Recommendations

### 7.1 Immediate Actions (This Initiative)

1. **Create pre-push scripts** (`scripts/pre-push-test.sh`, `scripts/pre-push-test.ps1`)
   - Run Level 0-2 checks locally
   - Estimated time: <2 minutes
   - Expected to block about 90% of CI build failures (the target in Section 8.1)

2. **Create symbol conflict detector** (`scripts/verify-symbols.sh`)
   - Scan built libraries for duplicate symbols
   - Run as part of pre-push
   - Catches ODR violations

3. **Document testing strategy** (`docs/internal/testing/testing-strategy.md`)
   - Clear explanation of each test level
   - When to run which tests
   - CI vs local testing

4. **Create pre-push checklist** (`docs/internal/testing/pre-push-checklist.md`)
   - Interactive checklist for developers
   - Links to tools and scripts

### 7.2 Short-Term Improvements (Next Sprint)

1. **Add CI compile-only job**
   - Runs in <5 minutes
   - Catches compilation issues before full build
   - Fails fast

2. **Add CI symbol checking job**
   - Runs after compile-only
   - Detects ODR violations
   - Platform-specific

3. **Add CMake configuration validation job**
   - Tests all presets
   - Validates include paths
   - <2 minutes

4. **Enable integration tests in CI**
   - Run on develop/master only (not PRs)
   - Requires ROM file handling
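
For the configuration validation job (item 3), one possible starting point is to enumerate the non-hidden configure presets in `CMakePresets.json` and run `cmake --preset <name>` for each. The sketch below assumes that approach; function names are hypothetical, and it ignores preset `condition` fields, so presets restricted to another platform would need to be filtered out separately.

```python
import json
import subprocess


def visible_configure_presets(presets):
    """Return names of configure presets that are not marked hidden."""
    return [
        preset["name"]
        for preset in presets.get("configurePresets", [])
        if not preset.get("hidden", False)
    ]


def configure_command(preset_name):
    """Build the CMake invocation that configures one preset."""
    return ["cmake", "--preset", preset_name]


def validate_presets(presets_path="CMakePresets.json"):
    """Configure every visible preset; return the names that failed."""
    with open(presets_path) as f:
        names = visible_configure_presets(json.load(f))
    return [
        name for name in names
        if subprocess.run(configure_command(name)).returncode != 0
    ]
```

An empty list from `validate_presets()` means every visible preset configured successfully on the current platform. This covers configuration only; include path problems that appear at compile time still need the smoke compilation level.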

### 7.3 Long-Term Improvements (Future)

1. **Docker-based local testing**
   - Linux environment for macOS/Windows devs
   - Matches CI exactly
   - Fast feedback

2. **Cross-platform test matrix locally**
   - Run tests across multiple platforms
   - Automated VM/container management

3. **Performance regression detection**
   - Benchmark suite
   - Historical tracking
   - Automatic alerts

4. **Coverage tracking**
   - Line coverage per PR
   - Coverage trends over time
   - Uncovered code reports

---

## 8. Success Metrics

### 8.1 Developer Experience
- **Target**: <2 minutes pre-push validation time
- **Target**: 90% reduction in CI build failures
- **Target**: <3 attempts to fix CI issues (down from 5-10)

### 8.2 CI Efficiency
- **Target**: <5 minutes to first failure signal
- **Target**: 50% reduction in wasted CI time
- **Target**: 95% PR pass rate (up from ~70%)

### 8.3 Code Quality
- **Target**: Zero ODR violations
- **Target**: Zero platform-specific include issues
- **Target**: 100% symbol conflict detection

---

## 9. Reference

### Similar Issues in Recent History
- Windows std::filesystem support (19196ca87c, b556b155a5)
- Linux circular dependency (0812a84a22, e36d81f357)
- macOS z3ed linker error (9c562df277)
- Windows clang-cl detection (84cdb09a5b, cbdc6670a1)

### Related Documentation
- `docs/public/build/quick-reference.md` - Build commands
- `docs/public/build/troubleshooting.md` - Platform-specific fixes
- `CLAUDE.md` - Build system guidelines
- `.github/workflows/ci.yml` - CI configuration

### Tools Used
- `nm` (Unix) / `dumpbin` (Windows) - Symbol inspection
- `clang-tidy` - Static analysis
- `cppcheck` - Code quality
- `cmake --list-presets` / `cmake --preset <name>` - Preset validation