backend-infra-engineer: Post v0.3.9-hotfix7 snapshot (build cleanup)

This commit is contained in:
scawful
2025-12-22 00:20:49 +00:00
parent 2934c82b75
commit 5c4cd57ff8
1259 changed files with 239160 additions and 43801 deletions

View File

@@ -0,0 +1,499 @@
# Matrix Testing Strategy
**Owner**: CLAUDE_MATRIX_TEST (Platform Matrix Testing Specialist)
**Last Updated**: 2025-11-20
**Status**: ACTIVE
## Executive Summary
This document defines the strategy for comprehensive platform/configuration matrix testing to catch issues across CMake flag combinations, platforms, and build configurations.
**Key Goals**:
- Catch cross-configuration issues before they reach production
- Prevent "works on my machine" problems
- Document problematic flag combinations
- Make matrix testing accessible to developers locally
- Minimize CI time while maximizing coverage
**Quick Links**:
- Configuration reference: `/docs/internal/configuration-matrix.md`
- GitHub Actions workflow: `/.github/workflows/matrix-test.yml`
- Local test script: `/scripts/test-config-matrix.sh`
## 1. Problem Statement
### Current Gaps
Before this initiative, yaze only tested:
1. **Default configurations**: `ci-linux`, `ci-macos`, `ci-windows` presets
2. **Single feature toggles**: One dimension at a time
3. **No interaction testing**: Missing edge cases like "GRPC=ON but REMOTE_AUTOMATION=OFF"
### Real Bugs Caught by Matrix Testing
Examples of issues a configuration matrix would catch:
**Example 1: GRPC Without Automation**
```cmake
# Broken: User enables gRPC but disables remote automation
cmake -B build -DYAZE_ENABLE_GRPC=ON -DYAZE_ENABLE_REMOTE_AUTOMATION=OFF
# Result: gRPC headers included but server code never compiled → link errors
```
**Example 2: HTTP API Without CLI Stack**
```cmake
# Broken: User wants HTTP API but disables agent CLI
cmake -B build -DYAZE_ENABLE_HTTP_API=ON -DYAZE_ENABLE_AGENT_CLI=OFF
# Result: REST endpoints defined but no command dispatcher → runtime errors
```
**Example 3: AI Runtime Without JSON**
```cmake
# Broken: User enables AI with Gemini but disables JSON
cmake -B build -DYAZE_ENABLE_AI_RUNTIME=ON -DYAZE_ENABLE_JSON=OFF
# Result: Gemini parser requires JSON but it's not available → compile errors
```
**Example 4: Windows GRPC Version Mismatch**
```cmake
# Broken on Windows: gRPC version incompatible with MSVC ABI
cmake -B build (with gRPC <1.67.1)
# Result: Symbol errors, linker failures on Visual Studio
```
## 2. Matrix Testing Approach
### Strategy: Smart, Not Exhaustive
Instead of testing all 2^18 = 262,144 combinations:
1. **Baseline**: Default configuration (most common user scenario)
2. **Extremes**: All ON, All OFF (catch hidden assumptions)
3. **Interactions**: Known problematic combinations
4. **Tiers**: Progressive validation by feature complexity
5. **Platforms**: Run critical tests on each OS
### Testing Tiers
#### Tier 1: Core Platforms (Every Commit)
**When**: On push to `master` or `develop`, every PR
**What**: The three critical presets that users will actually use
**Time**: ~15 minutes total
```
ci-linux (gRPC + Agent, Linux)
ci-macos (gRPC + Agent UI + Agent, macOS)
ci-windows (gRPC, Windows)
```
**Why**: These reflect real user workflows. If they break, users are impacted immediately.
#### Tier 2: Feature Combinations (Nightly / On-Demand)
**When**: Nightly at 2 AM UTC, manual dispatch, or `[matrix]` in commit message
**What**: 6-8 specific flag combinations per platform
**Time**: ~45 minutes total (parallel across 3 platforms × 7 configs)
```
Linux: minimal, grpc-only, full-ai, cli-no-grpc, http-api, no-json
macOS: minimal, full-ai, agent-ui, universal
Windows: minimal, full-ai, grpc-remote, z3ed-cli
```
**Why**: Tests dangerous interactions without exponential explosion. Each config tests a realistic user workflow.
#### Tier 3: Platform-Specific (As Needed)
**When**: When platform-specific issues arise
**What**: Architecture-specific builds (ARM64, universal binary, etc.)
**Time**: ~20 minutes
```
Windows ARM64: Debug + Release
macOS Universal: arm64 + x86_64
Linux ARM: Cross-compile tests
```
**Why**: Catches architecture-specific issues that only appear on target platforms.
### Configuration Selection Rationale
#### Why "Minimal"?
Tests the smallest viable configuration:
- Validates core ROM reading/writing works without extras
- Ensures build system doesn't have "feature X requires feature Y" errors
- Catches over-linked libraries
#### Why "gRPC Only"?
Tests server-side automation without AI:
- Validates gRPC infrastructure
- Tests GUI automation system
- Ensures protocol buffer compilation
- Minimal dependencies for headless servers
#### Why "Full AI Stack"?
Tests maximum feature complexity:
- All AI features enabled
- Both Gemini and Ollama paths
- Remote automation + Agent UI
- Catches subtle linking issues with yaml-cpp, OpenSSL, etc.
#### Why "No JSON"?
Tests optional JSON dependency:
- Ensures Ollama works without JSON
- Validates graceful degradation
- Catches hardcoded JSON assumptions
#### Why Platform-Specific?
Each platform has unique constraints:
- **Windows**: MSVC ABI compatibility, gRPC version pinning
- **macOS**: Universal binary (arm64 + x86_64), Homebrew dependencies
- **Linux**: GCC version, glibc compatibility, system library versions
## 3. Problematic Flag Combinations
### Pattern 1: Hidden Dependencies (Fixed)
**Configuration**:
```cmake
YAZE_ENABLE_GRPC=ON
YAZE_ENABLE_REMOTE_AUTOMATION=OFF # ← Inconsistent!
```
**Problem**: gRPC headers included, but no automation server compiled → link errors
**Fix**: CMake now forces:
```cmake
if(YAZE_ENABLE_REMOTE_AUTOMATION AND NOT YAZE_ENABLE_GRPC)
set(YAZE_ENABLE_GRPC ON ... FORCE)
endif()
```
**Matrix Test**: `grpc-only` configuration validates this constraint.
### Pattern 2: Orphaned Features (Fixed)
**Configuration**:
```cmake
YAZE_ENABLE_HTTP_API=ON
YAZE_ENABLE_AGENT_CLI=OFF # ← HTTP API needs a CLI context!
```
**Problem**: REST endpoints defined but no command dispatcher
**Fix**: CMake forces:
```cmake
if(YAZE_ENABLE_HTTP_API AND NOT YAZE_ENABLE_AGENT_CLI)
set(YAZE_ENABLE_AGENT_CLI ON ... FORCE)
endif()
```
**Matrix Test**: `http-api` configuration validates this.
### Pattern 3: Optional Dependency Breakage
**Configuration**:
```cmake
YAZE_ENABLE_AI_RUNTIME=ON
YAZE_ENABLE_JSON=OFF # ← Gemini requires JSON!
```
**Problem**: Gemini service can't parse responses
**Status**: Currently relies on developer discipline
**Matrix Test**: `no-json` + `full-ai` would catch this
### Pattern 4: Platform-Specific ABI Mismatch
**Configuration**: Windows with gRPC <1.67.1
**Problem**: MSVC ABI differences, symbol mismatch
**Status**: Documented in `ci-windows` preset
**Matrix Test**: `grpc-remote` on Windows validates gRPC version
### Pattern 5: Architecture-Specific Issues
**Configuration**: macOS universal binary with platform-specific dependencies
**Problem**: Homebrew packages may not have arm64 support
**Status**: Requires dependency audit
**Matrix Test**: `universal` on macOS tests both arm64 and x86_64
## 4. Matrix Testing Tools
### Local Testing: `scripts/test-config-matrix.sh`
Developers run this before pushing to validate all critical configurations locally.
#### Quick Start
```bash
# Test all configurations on current platform
./scripts/test-config-matrix.sh
# Test specific configuration
./scripts/test-config-matrix.sh --config minimal
# Smoke test (configure only, no build)
./scripts/test-config-matrix.sh --smoke
# Verbose with timing
./scripts/test-config-matrix.sh --verbose
```
#### Features
- **Fast feedback**: ~2-3 minutes for all configurations
- **Smoke mode**: Configure without building (30 seconds)
- **Platform detection**: Automatically runs platform-appropriate presets
- **Result tracking**: Clear pass/fail summary
- **Debug logging**: Full CMake/build output in `build_matrix/<config>/`
#### Output Example
```
Config: minimal
Status: PASSED
Description: No AI, no gRPC
Build time: 2.3s
Config: full-ai
Status: PASSED
Description: All features enabled
Build time: 45.2s
============
2/2 configs passed
============
```
### CI Testing: `.github/workflows/matrix-test.yml`
Automated nightly testing across all three platforms.
#### Execution
- **Trigger**: Nightly (2 AM UTC) + manual dispatch + `[matrix]` in commit message
- **Platforms**: Linux (ubuntu-22.04), macOS (14), Windows (2022)
- **Configurations per platform**: 6-7 distinct flag combinations
- **Total runtime**: ~45 minutes (all jobs in parallel)
- **Report**: Pass/fail summary + artifact upload on failure
#### What It Tests
**Linux (6 configs)**:
1. `minimal` - No AI, no gRPC
2. `grpc-only` - gRPC without automation
3. `full-ai` - All features
4. `cli-no-grpc` - CLI only
5. `http-api` - REST endpoints
6. `no-json` - Ollama mode
**macOS (4 configs)**:
1. `minimal` - GUI, no AI
2. `full-ai` - All features
3. `agent-ui` - Agent UI panels only
4. `universal` - arm64 + x86_64 binary
**Windows (4 configs)**:
1. `minimal` - No AI
2. `full-ai` - All features
3. `grpc-remote` - gRPC + automation
4. `z3ed-cli` - CLI executable
## 5. Integration with Development Workflow
### For Developers
Before pushing code to `develop` or `master`:
```bash
# 1. Make changes
git add src/...
# 2. Test locally
./scripts/test-config-matrix.sh
# 3. If all pass, commit
git commit -m "feature: add new thing"
# 4. Push
git push
```
### For CI/CD
**On every push to develop/master**:
1. Standard CI runs (Tier 1 tests)
2. Code quality checks
3. If green, wait for nightly matrix test
**Nightly**:
1. All Tier 2 combinations run in parallel
2. Failures trigger alerts
3. Success confirms no new cross-configuration issues
### For Pull Requests
Option A: **Include `[matrix]` in commit message**
```bash
git commit -m "fix: handle edge case [matrix]"
git push # Triggers matrix test immediately
```
Option B: **Manual dispatch**
- Go to `.github/workflows/matrix-test.yml`
- Click "Run workflow"
- Select desired tier
## 6. Monitoring & Maintenance
### What to Watch
**Daily**: Check nightly matrix test results
- Link: GitHub Actions > `Configuration Matrix Testing`
- Alert if any configuration fails
**Weekly**: Review failure patterns
- Are certain flag combinations always failing?
- Is a platform having consistent issues?
- Do dependencies need version updates?
**Monthly**: Audit the matrix configuration
- Do new flags need testing?
- Are deprecated flags still tested?
- Can any Tier 2 configs be combined?
### Adding New Configurations
When adding a new feature flag:
1. **Update `cmake/options.cmake`**
- Define the option
- Document dependencies
- Add constraint enforcement
2. **Update `/docs/internal/configuration-matrix.md`**
- Add to Section 1 (flags)
- Update Section 2 (constraints)
- Add to relevant Tier in Section 3
3. **Update `/scripts/test-config-matrix.sh`**
- Add to `CONFIGS` array
- Test locally: `./scripts/test-config-matrix.sh --config new-config`
4. **Update `/.github/workflows/matrix-test.yml`**
- Add matrix job entries for each platform
- Estimate runtime impact
## 7. Troubleshooting Common Issues
### Issue: "Configuration failed" locally
```bash
# Check the cmake log
tail -50 build_matrix/<config>/config.log
# Check if presets exist
cmake --list-presets
```
### Issue: "Build failed" locally
```bash
# Get full build output
./scripts/test-config-matrix.sh --config <name> --verbose
# Check for missing dependencies
# On macOS: brew list | grep <dep>
# On Linux: apt list --installed | grep <dep>
```
### Issue: Test passes locally but fails in CI
**Likely causes**:
1. Different CMake version (CI uses latest)
2. Different compiler (GCC vs Clang vs MSVC)
3. Missing system library
**Solutions**:
- Check `.github/actions/setup-build` for CI environment
- Match local compiler: `cmake --preset ci-linux -DCMAKE_CXX_COMPILER=gcc-13`
- Add dependency: Update `cmake/dependencies.cmake`
## 8. Future Improvements
### Short Term (Next Sprint)
- [ ] Add binary size tracking per configuration
- [ ] Add compile time benchmarks
- [ ] Auto-generate configuration compatibility matrix chart
- [ ] Add `--ci-mode` flag to local script (simulates GH Actions)
### Medium Term (Next Quarter)
- [ ] Integrate with release pipeline (validate all Tier 2 before release)
- [ ] Add performance regression tests per configuration
- [ ] Create configuration validator tool (warns on suspicious combinations)
- [ ] Document platform-specific dependency versions
### Long Term (Next Year)
- [ ] Separate `YAZE_ENABLE_AI` and `YAZE_ENABLE_AI_RUNTIME` (currently coupled)
- [ ] Add Tier 0 (smoke tests) that run on every commit
- [ ] Create web dashboard of matrix test results
- [ ] Add "configuration suggestion" tool (infer optimal flags for user's hardware)
## 9. Reference: Configuration Categories
### GUI User (Desktop)
```cmake
YAZE_BUILD_GUI=ON
YAZE_BUILD_AGENT_UI=ON
YAZE_ENABLE_GRPC=OFF # No network overhead
YAZE_ENABLE_AI=OFF # Unnecessary for GUI-only
```
### Server/Headless (Automation)
```cmake
YAZE_BUILD_GUI=OFF
YAZE_ENABLE_GRPC=ON
YAZE_ENABLE_REMOTE_AUTOMATION=ON
YAZE_ENABLE_AI=OFF # Optional
```
### Full-Featured Developer
```cmake
YAZE_BUILD_GUI=ON
YAZE_BUILD_AGENT_UI=ON
YAZE_ENABLE_GRPC=ON
YAZE_ENABLE_REMOTE_AUTOMATION=ON
YAZE_ENABLE_AI_RUNTIME=ON
YAZE_ENABLE_HTTP_API=ON
```
### CLI-Only (z3ed Agent)
```cmake
YAZE_BUILD_GUI=OFF
YAZE_BUILD_Z3ED=ON
YAZE_ENABLE_GRPC=ON
YAZE_ENABLE_AI_RUNTIME=ON
YAZE_ENABLE_HTTP_API=ON
```
### Minimum (Embedded/Library)
```cmake
YAZE_BUILD_GUI=OFF
YAZE_BUILD_CLI=OFF
YAZE_BUILD_TESTS=OFF
YAZE_ENABLE_GRPC=OFF
YAZE_ENABLE_AI=OFF
```
---
**Questions?** Check `/docs/internal/configuration-matrix.md` or ask in coordination-board.md.