Testing Infrastructure Gap Analysis
Executive Summary
Recent CI failures revealed critical gaps in our testing infrastructure: platform-specific build failures went undetected locally and surfaced only in CI. This document analyzes what we currently test, what we missed, and what infrastructure is needed to catch issues earlier.
Date: 2025-11-20
Triggered By: Multiple CI failures in commits 43a0e5e314, c2bb90a3f1, and related fixes
1. Issues We Didn't Catch Locally
1.1 Windows Abseil Include Path Issues (c2bb90a3f1)
Problem: Abseil headers not found during Windows/clang-cl compilation
Why it wasn't caught:
- No local pre-push compilation check
- CMake configuration validates successfully, but compilation fails later
- Include path propagation from gRPC/Abseil not validated until full compile
What would have caught it:
- ✅ Smoke compilation test (compile subset of files to catch header issues)
- ✅ CMake configuration validator (check include path propagation)
- ✅ Header dependency checker
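A smoke compilation check for a single header could be sketched as follows. This is a minimal sketch, assuming a GCC/Clang-style compiler driver; clang-cl takes MSVC-style options, so the flags would need adapting there. The function names, the default compiler, and the include directories in the usage note are all hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def smoke_source(header):
    """Return a one-line translation unit that includes only `header`."""
    return f'#include "{header}"\n'


def smoke_command(source_path, include_dirs, compiler="clang++", std="c++17"):
    """Build a syntax-only compile command (GCC/Clang-style flags assumed)."""
    cmd = [compiler, f"-std={std}", "-fsyntax-only"]
    for inc in include_dirs:
        cmd.append(f"-I{inc}")
    cmd.append(str(source_path))
    return cmd


def check_header(header, include_dirs, compiler="clang++"):
    """Compile the generated translation unit; return (ok, compiler stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "smoke.cc"
        src.write_text(smoke_source(header))
        result = subprocess.run(
            smoke_command(src, include_dirs, compiler),
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr
```

Calling `check_header("absl/status/status.h", ["ext/abseil-cpp"])` (a hypothetical include directory) would fail within seconds if the include path were not propagated, instead of 15 minutes into a full build.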
1.2 Linux FLAGS Symbol Conflicts (43a0e5e314, eb77bbeaff)
Problem: ODR (One Definition Rule) violation - multiple FLAGS symbols across libraries
Why it wasn't caught:
- Symbol conflicts only appear at link time
- No cross-library symbol conflict detection
- Static analysis doesn't catch ODR violations
- Unit tests don't link full dependency graph
What would have caught it:
- ✅ Symbol conflict scanner (nm/objdump analysis)
- ✅ ODR violation detector
- ✅ Full integration build test (link all libraries together)
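The symbol conflict scanner idea can be illustrated with a short sketch. It assumes the text output of GNU `nm -g --defined-only` (mangled names, three whitespace-separated columns) has already been captured per library; the library names and the `FLAGS_rom_file` symbol in the usage note are hypothetical. Weak symbols (`W`, `V`) are skipped because inline functions and templates are legitimately emitted in several libraries.

```python
from collections import defaultdict

# Strong external definitions in nm's type column: text, data, BSS, read-only.
STRONG_TYPES = set("TDBR")


def defined_symbols(nm_output):
    """Extract strongly defined symbol names from `nm` text output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected shape: <address> <type> <name>; archive member headers
        # such as "flags.o:" and blank lines do not have three fields.
        if len(parts) == 3 and parts[1] in STRONG_TYPES:
            symbols.add(parts[2])
    return symbols


def find_conflicts(nm_by_library):
    """Map each symbol defined in more than one library to those libraries."""
    owners = defaultdict(set)
    for library, output in nm_by_library.items():
        for symbol in defined_symbols(output):
            owners[symbol].add(library)
    return {
        symbol: sorted(libraries)
        for symbol, libraries in owners.items()
        if len(libraries) > 1
    }
```

Given output from two libraries that both define `FLAGS_rom_file` in BSS, `find_conflicts` reports that symbol with both library names, which is the kind of duplicate that otherwise appears only at link time.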
1.3 Platform-Specific Configuration Issues
Problem: Preprocessor flags, compiler detection, and platform-specific code paths
Why it wasn't caught:
- No local cross-platform validation
- CMake configuration differences between platforms not tested
- Compiler detection logic (clang-cl vs MSVC) not validated
What would have caught it:
- ✅ CMake configuration dry-run on multiple platforms
- ✅ Preprocessor flag validation
- ✅ Compiler detection smoke test
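One way to approach a compiler detection smoke test is to read the compiler that CMake actually selected from `CMakeCache.txt` after configuration and classify it. The sketch below relies on the `KEY:TYPE=VALUE` line format of the cache file; the classification labels and function names are hypothetical, and it does not exercise the project's own detection logic.

```python
def parse_cache(text):
    """Parse CMakeCache.txt content into a {key: value} dictionary."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the two comment styles used in the cache file.
        if not line or line.startswith(("//", "#")):
            continue
        key_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        entries[key_and_type.split(":", 1)[0]] = value
    return entries


def classify_compiler(path):
    """Classify a compiler by executable name (labels are illustrative)."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name.endswith(".exe"):
        name = name[:-4]
    if name == "clang-cl":
        return "clang-cl"
    if name == "cl":
        return "msvc"
    if name.startswith("clang"):
        return "clang"
    if name.startswith(("g++", "gcc")):
        return "gcc"
    return "unknown"
```

A smoke test could then assert that `classify_compiler(parse_cache(text)["CMAKE_CXX_COMPILER"])` matches what the selected preset is supposed to use, before any compilation starts.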
2. Current Testing Coverage
2.1 What We Test Well
Unit Tests (test/unit/)
- Coverage: Core algorithms, data structures, parsers
- Speed: Fast (<1s for most tests)
- Isolation: Mocked dependencies, no ROM required
- CI: ✅ Runs on every PR
- Examples: hex_test.cc, asar_wrapper_test.cc, snes_palette_test.cc
Strengths:
- Catches logic errors quickly
- Good for TDD
- Platform-independent
Gaps:
- Doesn't catch build system issues
- Doesn't catch linking problems
- Doesn't validate dependencies
Integration Tests (test/integration/)
- Coverage: Multi-component interactions, ROM operations
- Speed: Slower (1-10s per test)
- Dependencies: May require ROM files
- CI: ❌ Not currently run (see Sections 3.1 and 3.2)
- Examples: asar_integration_test.cc, dungeon_editor_v2_test.cc
Strengths:
- Tests component interactions
- Validates ROM operations
Gaps:
- Still doesn't catch platform-specific issues
- Doesn't validate symbol conflicts
- Doesn't test cross-library linking
E2E Tests (test/e2e/)
- Coverage: Full UI workflows, user interactions
- Speed: Very slow (10-60s per test)
- Dependencies: GUI, ImGuiTestEngine
- CI: ⚠️ Limited (only on macOS z3ed-agent-test)
- Examples: dungeon_editor_smoke_test.cc, canvas_selection_test.cc
Strengths:
- Validates real user workflows
- Tests UI responsiveness
Gaps:
- Not run consistently across platforms
- Slow feedback loop
- Requires display/window system
2.2 What We DON'T Test
Build System Validation
- ❌ CMake configuration correctness per preset
- ❌ Include path propagation from dependencies
- ❌ Compiler flag compatibility
- ❌ Linker flag validation
- ❌ Cross-preset compatibility
Symbol-Level Issues
- ❌ ODR (One Definition Rule) violations
- ❌ Duplicate symbol detection across libraries
- ❌ Symbol visibility (public/private)
- ❌ ABI compatibility between libraries
Platform-Specific Compilation
- ❌ Header-only compilation checks
- ❌ Preprocessor branch coverage
- ❌ Platform macro validation
- ❌ Compiler-specific feature detection
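As a first step toward preprocessor branch coverage, a script could list which platforms each source file branches on, so that files with Windows-only paths are known before they fail in CI. The sketch below is a rough illustration; the macro-to-platform table and the function name are assumptions, and it inspects only conditional directives without evaluating them.

```python
import re

# Commonly used predefined macros; extend as needed for the project.
PLATFORM_MACROS = {"_WIN32": "windows", "__APPLE__": "macos", "__linux__": "linux"}

# Matches #if, #ifdef, #ifndef and #elif lines and captures the condition.
_CONDITIONAL = re.compile(r"^\s*#\s*(?:ifdef|ifndef|if|elif)\b(.*)$")


def platforms_in_source(text):
    """Return the set of platforms referenced in preprocessor conditionals."""
    found = set()
    for line in text.splitlines():
        match = _CONDITIONAL.match(line)
        if not match:
            continue
        condition = match.group(1)
        for macro, platform in PLATFORM_MACROS.items():
            # Word boundaries keep _WIN32 from matching _WIN32_WINNT.
            if re.search(rf"\b{re.escape(macro)}\b", condition):
                found.add(platform)
    return found
```

Files for which this returns a platform that the developer cannot build locally are good candidates for the smoke compilation step on that platform.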
Dependency Health
- ❌ Include path conflicts
- ❌ Library version mismatches
- ❌ Transitive dependency validation
- ❌ Static vs shared library conflicts
3. CI/CD Coverage Analysis
3.1 Current CI Matrix (.github/workflows/ci.yml)
| Platform | Build | Test (stable) | Test (unit) | Test (integration) | Test (AI) |
|---|---|---|---|---|---|
| Ubuntu 22.04 (GCC-12) | ✅ | ✅ | ✅ | ❌ | ❌ |
| macOS 14 (Clang) | ✅ | ✅ | ✅ | ❌ | ✅ |
| Windows 2022 (Core) | ✅ | ✅ | ✅ | ❌ | ❌ |
| Windows 2022 (AI) | ✅ | ✅ | ✅ | ❌ | ❌ |
CI Job Flow:
- build: Configure + compile full project
- test: Run stable + unit tests
- windows-agent: Full AI stack (gRPC + AI runtime)
- code-quality: clang-format, cppcheck, clang-tidy
- memory-sanitizer: AddressSanitizer (Linux only)
- z3ed-agent-test: Full agent test suite (macOS only)
3.2 CI Gaps
Missing Early Feedback
- ❌ No compilation-only job (compile errors surface only after a 15-20 min build)
- ❌ No CMake configuration validation job (would catch configuration errors in <1 min)
- ❌ No symbol conflict checking job
Limited Platform Coverage
- ⚠️ Only Linux gets AddressSanitizer
- ⚠️ Only macOS gets full z3ed agent tests
- ⚠️ Windows AI stack not tested on PRs (only post-merge)
Incomplete Testing
- ❌ Integration tests not run in CI
- ❌ E2E tests not run on Linux/Windows
- ❌ No ROM-dependent testing
- ❌ No performance regression detection
4. Developer Workflow Gaps
4.1 Pre-Commit Hooks
Current State: None
Gap: No automatic checks before local commits
Should Include:
- clang-format check
- Build system sanity check
- Copyright header validation
4.2 Pre-Push Validation
Current State: Manual testing only
Gap: Easy to push broken code to CI
Should Include:
- Smoke build test (quick compilation check)
- Unit test run
- Symbol conflict detection
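The pre-push checks listed above share one requirement: run cheap steps first and stop at the first failure so feedback stays fast. A minimal fail-fast runner might look like the sketch below. The step names and commands in `EXAMPLE_STEPS` are hypothetical placeholders, not the project's actual targets or test labels.

```python
import subprocess
import sys
import time


def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure.

    Returns a list of (name, exit_code, seconds) for the steps that ran.
    """
    results = []
    for name, argv in steps:
        start = time.monotonic()
        code = subprocess.run(argv).returncode
        elapsed = time.monotonic() - start
        results.append((name, code, elapsed))
        status = "ok" if code == 0 else f"FAILED ({code})"
        print(f"[pre-push] {name}: {status} in {elapsed:.1f}s", file=sys.stderr)
        if code != 0:
            break
    return results


# Hypothetical step list; the build directory, target, and label are assumptions.
EXAMPLE_STEPS = [
    ("smoke build", ["cmake", "--build", "build", "--target", "smoke"]),
    ("unit tests", ["ctest", "--test-dir", "build", "-L", "unit"]),
]
```

A Git pre-push hook could call `run_steps(EXAMPLE_STEPS)` and exit non-zero when the last recorded exit code is non-zero, which aborts the push.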
4.3 Local Cross-Platform Testing
Current State: Developer-dependent
Gap: No easy way to test across platforms locally
Should Include:
- Docker-based Linux testing
- VM-based Windows testing (for macOS/Linux devs)
- Preset validation tool
5. Root Cause Analysis by Issue Type
5.1 Windows Abseil Include Paths
Timeline:
- ✅ Local macOS build succeeds
- ✅ CMake configuration succeeds on all platforms
- ❌ Windows compilation fails 15 minutes into CI
- ❌ Fix attempt 1 fails (14d1f5de4c)
- ❌ Fix attempt 2 fails (c2bb90a3f1)
- ✅ Final fix succeeds
Why Multiple Attempts:
- No local Windows testing environment
- CMake configuration doesn't validate actual compilation
- No header-only compilation check
- 15-20 minute feedback cycle from CI
Prevention:
- Header compilation smoke test
- CMake include path validator
- Local Windows testing (Docker/VM)
5.2 Linux FLAGS Symbol Conflicts
Timeline:
- ✅ Local macOS build succeeds
- ✅ Unit tests pass
- ❌ Linux full build fails at link time
- ❌ ODR violation: multiple FLAGS definitions
- ✅ Fix: move FLAGS definition, rename conflicts
Why It Happened:
- gflags creates FLAGS_* symbols in headers
- Multiple translation units define same symbols
- macOS linker more permissive than Linux ld
- No symbol conflict detection
Prevention:
- Symbol conflict scanner
- ODR violation checker
- Cross-platform link test
6. Recommended Testing Levels
We propose a 5-level testing pyramid:
Level 0: Static Analysis (< 1s)
- clang-format
- clang-tidy on changed files
- Copyright headers
- CMakeLists.txt syntax
Level 1: Configuration Validation (< 10s)
- CMake configure dry-run
- Include path validation
- Compiler detection check
- Preprocessor flag validation
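Part of Level 1 can run without invoking CMake at all. The sketch below checks that every `inherits` entry in the `configurePresets` array of a `CMakePresets.json` file names a preset defined in the same file. It is a simplified illustration: presets pulled in through the `include` field are not resolved, and the function name is hypothetical.

```python
import json


def missing_parents(presets_json):
    """Return {preset name: [unknown parent names]} for configure presets."""
    data = json.loads(presets_json)
    presets = data.get("configurePresets", [])
    names = {preset["name"] for preset in presets}
    problems = {}
    for preset in presets:
        parents = preset.get("inherits", [])
        # `inherits` may be a single string or a list of strings.
        if isinstance(parents, str):
            parents = [parents]
        missing = [parent for parent in parents if parent not in names]
        if missing:
            problems[preset["name"]] = missing
    return problems
```

An empty result means the inheritance graph is at least internally consistent; a full configure run per preset is still needed to validate include paths and compiler detection.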
Level 2: Smoke Compilation (< 2 min)
- Compile subset of files (1 file per library)
- Header-only compilation
- Template instantiation check
- Platform-specific branch validation
Level 3: Symbol Validation (< 5 min)
- Full project compilation
- Symbol conflict detection (nm/dumpbin)
- ODR violation check
- Library dependency graph
Level 4: Test Execution (5-30 min)
- Unit tests (fast)
- Integration tests (medium)
- E2E tests (slow)
- ROM-dependent tests (optional)
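The levels and their time bounds can be captured in a small table so that tooling picks which levels to run in a given context. The sketch below encodes the upper bounds stated above and selects every level whose own bound fits a threshold; the type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    number: int
    name: str
    bound_seconds: int  # upper time bound taken from the level heading


LEVELS = [
    Level(0, "Static Analysis", 1),
    Level(1, "Configuration Validation", 10),
    Level(2, "Smoke Compilation", 120),
    Level(3, "Symbol Validation", 300),
    Level(4, "Test Execution", 1800),
]


def levels_up_to(max_bound_seconds):
    """Return the numbers of all levels whose own bound fits the threshold."""
    return [lvl.number for lvl in LEVELS if lvl.bound_seconds <= max_bound_seconds]
```

With a two-minute threshold this selects Levels 0 through 2, while a five-minute threshold adds Level 3.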
7. Actionable Recommendations
7.1 Immediate Actions (This Initiative)
- Create pre-push scripts (scripts/pre-push-test.sh, scripts/pre-push-test.ps1)
  - Run Level 0-2 checks locally
  - Estimated time: <2 minutes
  - Target: block about 90% of CI build failures before they are pushed
- Create symbol conflict detector (scripts/verify-symbols.sh)
  - Scan built libraries for duplicate symbols
  - Run as part of pre-push
  - Catches ODR violations
- Document testing strategy (docs/internal/testing/testing-strategy.md)
  - Clear explanation of each test level
  - When to run which tests
  - CI vs local testing
- Create pre-push checklist (docs/internal/testing/pre-push-checklist.md)
  - Interactive checklist for developers
  - Links to tools and scripts
7.2 Short-Term Improvements (Next Sprint)
- Add CI compile-only job
  - Runs in <5 minutes
  - Catches compilation issues before full build
  - Fails fast
- Add CI symbol checking job
  - Runs after compile-only
  - Detects ODR violations
  - Platform-specific
- Add CMake configuration validation job
  - Tests all presets
  - Validates include paths
  - <2 minutes
- Enable integration tests in CI
  - Run on develop/master only (not PRs)
  - Requires ROM file handling
7.3 Long-Term Improvements (Future)
- Docker-based local testing
  - Linux environment for macOS/Windows devs
  - Matches CI exactly
  - Fast feedback
- Cross-platform test matrix locally
  - Run tests across multiple platforms
  - Automated VM/container management
- Performance regression detection
  - Benchmark suite
  - Historical tracking
  - Automatic alerts
- Coverage tracking
  - Line coverage per PR
  - Coverage trends over time
  - Uncovered code reports
8. Success Metrics
8.1 Developer Experience
- Target: <2 minutes pre-push validation time
- Target: 90% reduction in CI build failures
- Target: <3 attempts to fix CI issues (down from 5-10)
8.2 CI Efficiency
- Target: <5 minutes to first failure signal
- Target: 50% reduction in wasted CI time
- Target: 95% PR pass rate (up from ~70%)
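To track these targets, the two underlying calculations are a pass rate over recorded CI runs and a relative reduction against a baseline. The sketch below shows both; how the run outcomes and CI minutes are collected is left open, and the function names are hypothetical.

```python
def pass_rate(outcomes):
    """Fraction of CI runs that passed; `outcomes` is a list of booleans."""
    if not outcomes:
        raise ValueError("no CI runs recorded")
    return sum(outcomes) / len(outcomes)


def reduction(before, after):
    """Relative reduction from `before` to `after` (0.5 means 50% lower)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before
```

For example, 7 passing runs out of 10 gives a pass rate of 0.7, and cutting wasted CI time from 40 to 20 minutes per week is a reduction of 0.5.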
8.3 Code Quality
- Target: Zero ODR violations
- Target: Zero platform-specific include issues
- Target: 100% symbol conflict detection
9. Reference
Similar Issues in Recent History
- Windows std::filesystem support (19196ca87c, b556b155a5)
- Linux circular dependency (0812a84a22, e36d81f357)
- macOS z3ed linker error (9c562df277)
- Windows clang-cl detection (84cdb09a5b, cbdc6670a1)
Related Documentation
- docs/public/build/quick-reference.md - Build commands
- docs/public/build/troubleshooting.md - Platform-specific fixes
- CLAUDE.md - Build system guidelines
- .github/workflows/ci.yml - CI configuration
Tools Used
- nm (Unix) / dumpbin (Windows) - Symbol inspection
- clang-tidy - Static analysis
- cppcheck - Code quality
- cmake --preset <name> --list-presets - Preset validation