Files
yaze/docs/internal/testing/gap-analysis.md
2025-11-21 21:35:50 -05:00

11 KiB

Testing Infrastructure Gap Analysis

Executive Summary

Recent CI failures revealed critical gaps in our testing infrastructure that allowed platform-specific build failures to reach CI. This document analyzes what we currently test, what we missed, and what infrastructure is needed to catch issues earlier.

Date: 2025-11-20 Triggered By: Multiple CI failures in commits 43a0e5e314, c2bb90a3f1, and related fixes


1. Issues We Didn't Catch Locally

1.1 Windows Abseil Include Path Issues (c2bb90a3f1)

Problem: Abseil headers not found during Windows/clang-cl compilation Why it wasn't caught:

  • No local pre-push compilation check
  • CMake configuration validates successfully, but compilation fails later
  • Include path propagation from gRPC/Abseil not validated until full compile

What would have caught it:

  • Smoke compilation test (compile subset of files to catch header issues)
  • CMake configuration validator (check include path propagation)
  • Header dependency checker

1.2 Linux FLAGS Symbol Conflicts (43a0e5e314, eb77bbeaff)

Problem: ODR (One Definition Rule) violation - multiple FLAGS symbols across libraries Why it wasn't caught:

  • Symbol conflicts only appear at link time
  • No cross-library symbol conflict detection
  • Static analysis doesn't catch ODR violations
  • Unit tests don't link full dependency graph

What would have caught it:

  • Symbol conflict scanner (nm/objdump analysis)
  • ODR violation detector
  • Full integration build test (link all libraries together)

1.3 Platform-Specific Configuration Issues

Problem: Preprocessor flags, compiler detection, and platform-specific code paths Why it wasn't caught:

  • No local cross-platform validation
  • CMake configuration differences between platforms not tested
  • Compiler detection logic (clang-cl vs MSVC) not validated

What would have caught it:

  • CMake configuration dry-run on multiple platforms
  • Preprocessor flag validation
  • Compiler detection smoke test

2. Current Testing Coverage

2.1 What We Test Well

Unit Tests (test/unit/)

  • Coverage: Core algorithms, data structures, parsers
  • Speed: Fast (<1s for most tests)
  • Isolation: Mocked dependencies, no ROM required
  • CI: Runs on every PR
  • Example: hex_test.cc, asar_wrapper_test.cc, snes_palette_test.cc

Strengths:

  • Catches logic errors quickly
  • Good for TDD
  • Platform-independent

Gaps:

  • Doesn't catch build system issues
  • Doesn't catch linking problems
  • Doesn't validate dependencies

Integration Tests (test/integration/)

  • Coverage: Multi-component interactions, ROM operations
  • Speed: Slower (1-10s per test)
  • Dependencies: May require ROM files
  • CI: Runs on develop/master
  • Example: asar_integration_test.cc, dungeon_editor_v2_test.cc

Strengths:

  • Tests component interactions
  • Validates ROM operations

Gaps:

  • Still doesn't catch platform-specific issues
  • Doesn't validate symbol conflicts
  • Doesn't test cross-library linking

E2E Tests (test/e2e/)

  • Coverage: Full UI workflows, user interactions
  • Speed: Very slow (10-60s per test)
  • Dependencies: GUI, ImGuiTestEngine
  • CI: ⚠️ Limited (only on macOS z3ed-agent-test)
  • Example: dungeon_editor_smoke_test.cc, canvas_selection_test.cc

Strengths:

  • Validates real user workflows
  • Tests UI responsiveness

Gaps:

  • Not run consistently across platforms
  • Slow feedback loop
  • Requires display/window system

2.2 What We DON'T Test

Build System Validation

  • CMake configuration correctness per preset
  • Include path propagation from dependencies
  • Compiler flag compatibility
  • Linker flag validation
  • Cross-preset compatibility

Symbol-Level Issues

  • ODR (One Definition Rule) violations
  • Duplicate symbol detection across libraries
  • Symbol visibility (public/private)
  • ABI compatibility between libraries

Platform-Specific Compilation

  • Header-only compilation checks
  • Preprocessor branch coverage
  • Platform macro validation
  • Compiler-specific feature detection

Dependency Health

  • Include path conflicts
  • Library version mismatches
  • Transitive dependency validation
  • Static vs shared library conflicts

3. CI/CD Coverage Analysis

3.1 Current CI Matrix (.github/workflows/ci.yml)

Platform Build Test (stable) Test (unit) Test (integration) Test (AI)
Ubuntu 22.04 (GCC-12)
macOS 14 (Clang)
Windows 2022 (Core)
Windows 2022 (AI)

CI Job Flow:

  1. build: Configure + compile full project
  2. test: Run stable + unit tests
  3. windows-agent: Full AI stack (gRPC + AI runtime)
  4. code-quality: clang-format, cppcheck, clang-tidy
  5. memory-sanitizer: AddressSanitizer (Linux only)
  6. z3ed-agent-test: Full agent test suite (macOS only)

3.2 CI Gaps

Missing Early Feedback

  • No compilation-only job (fails after 15-20 min build)
  • No CMake configuration validation job (would catch in <1 min)
  • No symbol conflict checking job

Limited Platform Coverage

  • ⚠️ Only Linux gets AddressSanitizer
  • ⚠️ Only macOS gets full z3ed agent tests
  • ⚠️ Windows AI stack not tested on PRs (only post-merge)

Incomplete Testing

  • Integration tests not run in CI
  • E2E tests not run on Linux/Windows
  • No ROM-dependent testing
  • No performance regression detection

4. Developer Workflow Gaps

4.1 Pre-Commit Hooks

Current State: None Gap: No automatic checks before local commits

Should Include:

  • clang-format check
  • Build system sanity check
  • Copyright header validation

4.2 Pre-Push Validation

Current State: Manual testing only Gap: Easy to push broken code to CI

Should Include:

  • Smoke build test (quick compilation check)
  • Unit test run
  • Symbol conflict detection

4.3 Local Cross-Platform Testing

Current State: Developer-dependent Gap: No easy way to test across platforms locally

Should Include:

  • Docker-based Linux testing
  • VM-based Windows testing (for macOS/Linux devs)
  • Preset validation tool

5. Root Cause Analysis by Issue Type

5.1 Windows Abseil Include Paths

Timeline:

  • Local macOS build succeeds
  • CMake configuration succeeds on all platforms
  • Windows compilation fails 15 minutes into CI
  • Fix attempt 1 fails (14d1f5de4c)
  • Fix attempt 2 fails (c2bb90a3f1)
  • Final fix succeeds

Why Multiple Attempts:

  1. No local Windows testing environment
  2. CMake configuration doesn't validate actual compilation
  3. No header-only compilation check
  4. 15-20 minute feedback cycle from CI

Prevention:

  • Header compilation smoke test
  • CMake include path validator
  • Local Windows testing (Docker/VM)

5.2 Linux FLAGS Symbol Conflicts

Timeline:

  • Local macOS build succeeds
  • Unit tests pass
  • Linux full build fails at link time
  • ODR violation: multiple FLAGS definitions
  • Fix: move FLAGS definition, rename conflicts

Why It Happened:

  1. gflags creates FLAGS_* symbols in headers
  2. Multiple translation units define same symbols
  3. macOS linker more permissive than Linux ld
  4. No symbol conflict detection

Prevention:

  • Symbol conflict scanner
  • ODR violation checker
  • Cross-platform link test

We propose a 5-level testing pyramid:

Level 0: Static Analysis (< 1s)

  • clang-format
  • clang-tidy on changed files
  • Copyright headers
  • CMakeLists.txt syntax

Level 1: Configuration Validation (< 10s)

  • CMake configure dry-run
  • Include path validation
  • Compiler detection check
  • Preprocessor flag validation

Level 2: Smoke Compilation (< 2 min)

  • Compile subset of files (1 file per library)
  • Header-only compilation
  • Template instantiation check
  • Platform-specific branch validation

Level 3: Symbol Validation (< 5 min)

  • Full project compilation
  • Symbol conflict detection (nm/dumpbin)
  • ODR violation check
  • Library dependency graph

Level 4: Test Execution (5-30 min)

  • Unit tests (fast)
  • Integration tests (medium)
  • E2E tests (slow)
  • ROM-dependent tests (optional)

7. Actionable Recommendations

7.1 Immediate Actions (This Initiative)

  1. Create pre-push scripts (scripts/pre-push-test.sh, scripts/pre-push-test.ps1)

    • Run Level 0-2 checks locally
    • Estimated time: <2 minutes
    • Blocks 90% of CI failures
  2. Create symbol conflict detector (scripts/verify-symbols.sh)

    • Scan built libraries for duplicate symbols
    • Run as part of pre-push
    • Catches ODR violations
  3. Document testing strategy (docs/internal/testing/testing-strategy.md)

    • Clear explanation of each test level
    • When to run which tests
    • CI vs local testing
  4. Create pre-push checklist (docs/internal/testing/pre-push-checklist.md)

    • Interactive checklist for developers
    • Links to tools and scripts

7.2 Short-Term Improvements (Next Sprint)

  1. Add CI compile-only job

    • Runs in <5 minutes
    • Catches compilation issues before full build
    • Fails fast
  2. Add CI symbol checking job

    • Runs after compile-only
    • Detects ODR violations
    • Platform-specific
  3. Add CMake configuration validation job

    • Tests all presets
    • Validates include paths
    • <2 minutes
  4. Enable integration tests in CI

    • Run on develop/master only (not PRs)
    • Requires ROM file handling

7.3 Long-Term Improvements (Future)

  1. Docker-based local testing

    • Linux environment for macOS/Windows devs
    • Matches CI exactly
    • Fast feedback
  2. Cross-platform test matrix locally

    • Run tests across multiple platforms
    • Automated VM/container management
  3. Performance regression detection

    • Benchmark suite
    • Historical tracking
    • Automatic alerts
  4. Coverage tracking

    • Line coverage per PR
    • Coverage trends over time
    • Uncovered code reports

8. Success Metrics

8.1 Developer Experience

  • Target: <2 minutes pre-push validation time
  • Target: 90% reduction in CI build failures
  • Target: <3 attempts to fix CI issues (down from 5-10)

8.2 CI Efficiency

  • Target: <5 minutes to first failure signal
  • Target: 50% reduction in wasted CI time
  • Target: 95% PR pass rate (up from ~70%)

8.3 Code Quality

  • Target: Zero ODR violations
  • Target: Zero platform-specific include issues
  • Target: 100% symbol conflict detection

9. Reference

Similar Issues in Recent History

  • docs/public/build/quick-reference.md - Build commands
  • docs/public/build/troubleshooting.md - Platform-specific fixes
  • CLAUDE.md - Build system guidelines
  • .github/workflows/ci.yml - CI configuration

Tools Used

  • nm (Unix) / dumpbin (Windows) - Symbol inspection
  • clang-tidy - Static analysis
  • cppcheck - Code quality
  • cmake --preset <name> --list-presets - Preset validation