backend-infra-engineer: Post v0.3.9-hotfix7 snapshot (build cleanup)

2025-12-22 00:20:49 +00:00
parent 2934c82b75
commit 5c4cd57ff8
1259 changed files with 239160 additions and 43801 deletions
--- a/docs/internal/agents/archive/testing-docs-2025/matrix-testing-strategy.md
+++ b/docs/internal/agents/archive/testing-docs-2025/matrix-testing-strategy.md
@@ -0,0 +1,499 @@
+# Matrix Testing Strategy
+
+**Owner**: CLAUDE_MATRIX_TEST (Platform Matrix Testing Specialist)
+**Last Updated**: 2025-11-20
+**Status**: ACTIVE
+
+## Executive Summary
+
+This document defines the strategy for comprehensive platform/configuration matrix testing to catch issues across CMake flag combinations, platforms, and build configurations.
+
+**Key Goals**:
+- Catch cross-configuration issues before they reach production
+- Prevent "works on my machine" problems
+- Document problematic flag combinations
+- Make matrix testing accessible to developers locally
+- Minimize CI time while maximizing coverage
+
+**Quick Links**:
+- Configuration reference: `/docs/internal/configuration-matrix.md`
+- GitHub Actions workflow: `/.github/workflows/matrix-test.yml`
+- Local test script: `/scripts/test-config-matrix.sh`
+
+## 1. Problem Statement
+
+### Current Gaps
+
+Before this initiative, yaze only tested:
+1. **Default configurations**: `ci-linux`, `ci-macos`, `ci-windows` presets
+2. **Single feature toggles**: One dimension at a time
+3. **No interaction testing**: Missing edge cases like "GRPC=ON but REMOTE_AUTOMATION=OFF"
+
+### Real Bugs Caught by Matrix Testing
+
+Examples of issues a configuration matrix would catch:
+
+**Example 1: GRPC Without Automation**
+```cmake
+# Broken: User enables gRPC but disables remote automation
+cmake -B build -DYAZE_ENABLE_GRPC=ON -DYAZE_ENABLE_REMOTE_AUTOMATION=OFF
+# Result: gRPC headers included but server code never compiled → link errors
+```
+
+**Example 2: HTTP API Without CLI Stack**
+```cmake
+# Broken: User wants HTTP API but disables agent CLI
+cmake -B build -DYAZE_ENABLE_HTTP_API=ON -DYAZE_ENABLE_AGENT_CLI=OFF
+# Result: REST endpoints defined but no command dispatcher → runtime errors
+```
+
+**Example 3: AI Runtime Without JSON**
+```cmake
+# Broken: User enables AI with Gemini but disables JSON
+cmake -B build -DYAZE_ENABLE_AI_RUNTIME=ON -DYAZE_ENABLE_JSON=OFF
+# Result: Gemini parser requires JSON but it's not available → compile errors
+```
+
+**Example 4: Windows GRPC Version Mismatch**
+```cmake
+# Broken on Windows: gRPC version incompatible with MSVC ABI
+cmake -B build (with gRPC <1.67.1)
+# Result: Symbol errors, linker failures on Visual Studio
+```
+
+## 2. Matrix Testing Approach
+
+### Strategy: Smart, Not Exhaustive
+
+Instead of testing all 2^18 = 262,144 combinations:
+
+1. **Baseline**: Default configuration (most common user scenario)
+2. **Extremes**: All ON, All OFF (catch hidden assumptions)
+3. **Interactions**: Known problematic combinations
+4. **Tiers**: Progressive validation by feature complexity
+5. **Platforms**: Run critical tests on each OS
+
+### Testing Tiers
+
+#### Tier 1: Core Platforms (Every Commit)
+
+**When**: On push to `master` or `develop`, every PR
+**What**: The three critical presets that users will actually use
+**Time**: ~15 minutes total
+
+```
+ci-linux (gRPC + Agent, Linux)
+ci-macos (gRPC + Agent UI + Agent, macOS)
+ci-windows (gRPC, Windows)
+```
+
+**Why**: These reflect real user workflows. If they break, users are impacted immediately.
+
+#### Tier 2: Feature Combinations (Nightly / On-Demand)
+
+**When**: Nightly at 2 AM UTC, manual dispatch, or `[matrix]` in commit message
+**What**: 6-8 specific flag combinations per platform
+**Time**: ~45 minutes total (parallel across 3 platforms × 7 configs)
+
+```
+Linux:        minimal, grpc-only, full-ai, cli-no-grpc, http-api, no-json
+macOS:        minimal, full-ai, agent-ui, universal
+Windows:      minimal, full-ai, grpc-remote, z3ed-cli
+```
+
+**Why**: Tests dangerous interactions without exponential explosion. Each config tests a realistic user workflow.
+
+#### Tier 3: Platform-Specific (As Needed)
+
+**When**: When platform-specific issues arise
+**What**: Architecture-specific builds (ARM64, universal binary, etc.)
+**Time**: ~20 minutes
+
+```
+Windows ARM64:     Debug + Release
+macOS Universal:   arm64 + x86_64
+Linux ARM:         Cross-compile tests
+```
+
+**Why**: Catches architecture-specific issues that only appear on target platforms.
+
+### Configuration Selection Rationale
+
+#### Why "Minimal"?
+
+Tests the smallest viable configuration:
+- Validates core ROM reading/writing works without extras
+- Ensures build system doesn't have "feature X requires feature Y" errors
+- Catches over-linked libraries
+
+#### Why "gRPC Only"?
+
+Tests server-side automation without AI:
+- Validates gRPC infrastructure
+- Tests GUI automation system
+- Ensures protocol buffer compilation
+- Minimal dependencies for headless servers
+
+#### Why "Full AI Stack"?
+
+Tests maximum feature complexity:
+- All AI features enabled
+- Both Gemini and Ollama paths
+- Remote automation + Agent UI
+- Catches subtle linking issues with yaml-cpp, OpenSSL, etc.
+
+#### Why "No JSON"?
+
+Tests optional JSON dependency:
+- Ensures Ollama works without JSON
+- Validates graceful degradation
+- Catches hardcoded JSON assumptions
+
+#### Why Platform-Specific?
+
+Each platform has unique constraints:
+- **Windows**: MSVC ABI compatibility, gRPC version pinning
+- **macOS**: Universal binary (arm64 + x86_64), Homebrew dependencies
+- **Linux**: GCC version, glibc compatibility, system library versions
+
+## 3. Problematic Flag Combinations
+
+### Pattern 1: Hidden Dependencies (Fixed)
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=OFF  # ← Inconsistent!
+```
+
+**Problem**: gRPC headers included, but no automation server compiled → link errors
+
+**Fix**: CMake now forces:
+```cmake
+if(YAZE_ENABLE_REMOTE_AUTOMATION AND NOT YAZE_ENABLE_GRPC)
+  set(YAZE_ENABLE_GRPC ON ... FORCE)
+endif()
+```
+
+**Matrix Test**: `grpc-only` configuration validates this constraint.
+
+### Pattern 2: Orphaned Features (Fixed)
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_HTTP_API=ON
+YAZE_ENABLE_AGENT_CLI=OFF  # ← HTTP API needs a CLI context!
+```
+
+**Problem**: REST endpoints defined but no command dispatcher
+
+**Fix**: CMake forces:
+```cmake
+if(YAZE_ENABLE_HTTP_API AND NOT YAZE_ENABLE_AGENT_CLI)
+  set(YAZE_ENABLE_AGENT_CLI ON ... FORCE)
+endif()
+```
+
+**Matrix Test**: `http-api` configuration validates this.
+
+### Pattern 3: Optional Dependency Breakage
+
+**Configuration**:
+```cmake
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_JSON=OFF  # ← Gemini requires JSON!
+```
+
+**Problem**: Gemini service can't parse responses
+
+**Status**: Currently relies on developer discipline
+**Matrix Test**: `no-json` + `full-ai` would catch this
+
+### Pattern 4: Platform-Specific ABI Mismatch
+
+**Configuration**: Windows with gRPC <1.67.1
+
+**Problem**: MSVC ABI differences, symbol mismatch
+
+**Status**: Documented in `ci-windows` preset
+**Matrix Test**: `grpc-remote` on Windows validates gRPC version
+
+### Pattern 5: Architecture-Specific Issues
+
+**Configuration**: macOS universal binary with platform-specific dependencies
+
+**Problem**: Homebrew packages may not have arm64 support
+
+**Status**: Requires dependency audit
+**Matrix Test**: `universal` on macOS tests both arm64 and x86_64
+
+## 4. Matrix Testing Tools
+
+### Local Testing: `scripts/test-config-matrix.sh`
+
+Developers run this before pushing to validate all critical configurations locally.
+
+#### Quick Start
+```bash
+# Test all configurations on current platform
+./scripts/test-config-matrix.sh
+
+# Test specific configuration
+./scripts/test-config-matrix.sh --config minimal
+
+# Smoke test (configure only, no build)
+./scripts/test-config-matrix.sh --smoke
+
+# Verbose with timing
+./scripts/test-config-matrix.sh --verbose
+```
+
+#### Features
+- **Fast feedback**: ~2-3 minutes for all configurations
+- **Smoke mode**: Configure without building (30 seconds)
+- **Platform detection**: Automatically runs platform-appropriate presets
+- **Result tracking**: Clear pass/fail summary
+- **Debug logging**: Full CMake/build output in `build_matrix/<config>/`
+
+#### Output Example
+```
+Config: minimal
+  Status: PASSED
+  Description: No AI, no gRPC
+  Build time: 2.3s
+
+Config: full-ai
+  Status: PASSED
+  Description: All features enabled
+  Build time: 45.2s
+
+============
+2/2 configs passed
+============
+```
+
+### CI Testing: `.github/workflows/matrix-test.yml`
+
+Automated nightly testing across all three platforms.
+
+#### Execution
+- **Trigger**: Nightly (2 AM UTC) + manual dispatch + `[matrix]` in commit message
+- **Platforms**: Linux (ubuntu-22.04), macOS (14), Windows (2022)
+- **Configurations per platform**: 6-7 distinct flag combinations
+- **Total runtime**: ~45 minutes (all jobs in parallel)
+- **Report**: Pass/fail summary + artifact upload on failure
+
+#### What It Tests
+
+**Linux (6 configs)**:
+1. `minimal` - No AI, no gRPC
+2. `grpc-only` - gRPC without automation
+3. `full-ai` - All features
+4. `cli-no-grpc` - CLI only
+5. `http-api` - REST endpoints
+6. `no-json` - Ollama mode
+
+**macOS (4 configs)**:
+1. `minimal` - GUI, no AI
+2. `full-ai` - All features
+3. `agent-ui` - Agent UI panels only
+4. `universal` - arm64 + x86_64 binary
+
+**Windows (4 configs)**:
+1. `minimal` - No AI
+2. `full-ai` - All features
+3. `grpc-remote` - gRPC + automation
+4. `z3ed-cli` - CLI executable
+
+## 5. Integration with Development Workflow
+
+### For Developers
+
+Before pushing code to `develop` or `master`:
+
+```bash
+# 1. Make changes
+git add src/...
+
+# 2. Test locally
+./scripts/test-config-matrix.sh
+
+# 3. If all pass, commit
+git commit -m "feature: add new thing"
+
+# 4. Push
+git push
+```
+
+### For CI/CD
+
+**On every push to develop/master**:
+1. Standard CI runs (Tier 1 tests)
+2. Code quality checks
+3. If green, wait for nightly matrix test
+
+**Nightly**:
+1. All Tier 2 combinations run in parallel
+2. Failures trigger alerts
+3. Success confirms no new cross-configuration issues
+
+### For Pull Requests
+
+Option A: **Include `[matrix]` in commit message**
+```bash
+git commit -m "fix: handle edge case [matrix]"
+git push  # Triggers matrix test immediately
+```
+
+Option B: **Manual dispatch**
+- Go to `.github/workflows/matrix-test.yml`
+- Click "Run workflow"
+- Select desired tier
+
+## 6. Monitoring & Maintenance
+
+### What to Watch
+
+**Daily**: Check nightly matrix test results
+- Link: GitHub Actions > `Configuration Matrix Testing`
+- Alert if any configuration fails
+
+**Weekly**: Review failure patterns
+- Are certain flag combinations always failing?
+- Is a platform having consistent issues?
+- Do dependencies need version updates?
+
+**Monthly**: Audit the matrix configuration
+- Do new flags need testing?
+- Are deprecated flags still tested?
+- Can any Tier 2 configs be combined?
+
+### Adding New Configurations
+
+When adding a new feature flag:
+
+1. **Update `cmake/options.cmake`**
+   - Define the option
+   - Document dependencies
+   - Add constraint enforcement
+
+2. **Update `/docs/internal/configuration-matrix.md`**
+   - Add to Section 1 (flags)
+   - Update Section 2 (constraints)
+   - Add to relevant Tier in Section 3
+
+3. **Update `/scripts/test-config-matrix.sh`**
+   - Add to `CONFIGS` array
+   - Test locally: `./scripts/test-config-matrix.sh --config new-config`
+
+4. **Update `/.github/workflows/matrix-test.yml`**
+   - Add matrix job entries for each platform
+   - Estimate runtime impact
+
+## 7. Troubleshooting Common Issues
+
+### Issue: "Configuration failed" locally
+
+```bash
+# Check the cmake log
+tail -50 build_matrix/<config>/config.log
+
+# Check if presets exist
+cmake --list-presets
+```
+
+### Issue: "Build failed" locally
+
+```bash
+# Get full build output
+./scripts/test-config-matrix.sh --config <name> --verbose
+
+# Check for missing dependencies
+# On macOS: brew list | grep <dep>
+# On Linux: apt list --installed | grep <dep>
+```
+
+### Issue: Test passes locally but fails in CI
+
+**Likely causes**:
+1. Different CMake version (CI uses latest)
+2. Different compiler (GCC vs Clang vs MSVC)
+3. Missing system library
+
+**Solutions**:
+- Check `.github/actions/setup-build` for CI environment
+- Match local compiler: `cmake --preset ci-linux -DCMAKE_CXX_COMPILER=gcc-13`
+- Add dependency: Update `cmake/dependencies.cmake`
+
+## 8. Future Improvements
+
+### Short Term (Next Sprint)
+
+- [ ] Add binary size tracking per configuration
+- [ ] Add compile time benchmarks
+- [ ] Auto-generate configuration compatibility matrix chart
+- [ ] Add `--ci-mode` flag to local script (simulates GH Actions)
+
+### Medium Term (Next Quarter)
+
+- [ ] Integrate with release pipeline (validate all Tier 2 before release)
+- [ ] Add performance regression tests per configuration
+- [ ] Create configuration validator tool (warns on suspicious combinations)
+- [ ] Document platform-specific dependency versions
+
+### Long Term (Next Year)
+
+- [ ] Separate `YAZE_ENABLE_AI` and `YAZE_ENABLE_AI_RUNTIME` (currently coupled)
+- [ ] Add Tier 0 (smoke tests) that run on every commit
+- [ ] Create web dashboard of matrix test results
+- [ ] Add "configuration suggestion" tool (infer optimal flags for user's hardware)
+
+## 9. Reference: Configuration Categories
+
+### GUI User (Desktop)
+```cmake
+YAZE_BUILD_GUI=ON
+YAZE_BUILD_AGENT_UI=ON
+YAZE_ENABLE_GRPC=OFF           # No network overhead
+YAZE_ENABLE_AI=OFF             # Unnecessary for GUI-only
+```
+
+### Server/Headless (Automation)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=ON
+YAZE_ENABLE_AI=OFF             # Optional
+```
+
+### Full-Featured Developer
+```cmake
+YAZE_BUILD_GUI=ON
+YAZE_BUILD_AGENT_UI=ON
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_REMOTE_AUTOMATION=ON
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_HTTP_API=ON
+```
+
+### CLI-Only (z3ed Agent)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_BUILD_Z3ED=ON
+YAZE_ENABLE_GRPC=ON
+YAZE_ENABLE_AI_RUNTIME=ON
+YAZE_ENABLE_HTTP_API=ON
+```
+
+### Minimum (Embedded/Library)
+```cmake
+YAZE_BUILD_GUI=OFF
+YAZE_BUILD_CLI=OFF
+YAZE_BUILD_TESTS=OFF
+YAZE_ENABLE_GRPC=OFF
+YAZE_ENABLE_AI=OFF
+```
+
+---
+
+**Questions?** Check `/docs/internal/configuration-matrix.md` or ask in coordination-board.md.