z3ed Agentic Workflow Plan

Last Updated: October 2, 2025
Status: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯

📋 Quick Start: See README.md for essential links and project status.

Executive Summary

The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:

Completed Phases:

  • Phase 6: Resource Catalogue - Machine-readable API specs for AI consumption
  • AW-01/02/03: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
  • AW-04: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
  • IT-01: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
  • IT-02: CLI Agent Test - Natural language → automated GUI testing (implementation complete)

🎯 Active Phase:

  • Conversational Agent Implementation: Foundation complete, LLM function calling COMPLETE (Oct 3, 2025)

📋 Next Phases (Updated Oct 3, 2025):

  • Priority 1: Live LLM Testing (1-2h) - Verify function calling with Ollama/Gemini
  • Priority 2: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
  • Priority 3: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
  • Priority 4: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
  • Priority 5: Windows Cross-Platform Testing - Validate on Windows with vcpkg
  • Deprioritized: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration

Recent Accomplishments (Updated: October 2025):

  • IT-08 Enhanced Error Reporting Complete: Full diagnostic capture operational
    • IT-08a: Screenshot RPC with SDL capture (BMP format, 1536x864)
    • IT-08b: Auto-capture execution context on failures (frame, window, widget)
    • IT-08c: Widget state dumps with comprehensive UI snapshot (JSON, 45 min)
    • Proto schema updated with screenshot_path, failure_context, widget_state
    • GetTestResults RPC returns complete failure diagnostics
  • IT-09 CLI Suite Commands Landed: End-to-end suite orchestration for CI
    • agent test suite run handles groups, tags, params, retries, and emits summaries plus default JUnit XML under test-results/junit/
    • agent test suite validate performs structural linting with exit codes
    • NEW agent test suite create interactive builder writes YAML suites to tests/<name>.yaml (with --force overwrite) and guides group/test entry
  • IT-08a Screenshot RPC Complete: SDL-based screenshot capture operational
    • Captures 1536x864 BMP files via SDL_RenderReadPixels
    • Successfully tested via gRPC (5.3MB output files)
    • Foundation for auto-capture on test failures
  • Policy Framework Complete: PolicyEvaluator service fully integrated with ProposalDrawer GUI
    • 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
    • 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
    • GUI displays color-coded violations (critical, ⚠️ warning, info)
    • Accept button gating based on policy violations with override confirmation dialog
    • Example policy configuration at .yaze/policies/agent.yaml
  • E2E Validation Complete: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
    • Window detection timing issue resolved with 10-frame yield buffer in Wait RPC
    • Thread safety issues resolved with shared_ptr state management
    • Test harness validated on macOS ARM64 with real YAZE GUI interactions
  • gRPC Test Harness (IT-01 & IT-02): Full implementation complete with natural language → GUI testing
  • Test Recording & Replay (IT-07): JSON recorder/replayer implemented, CLI and harness wired, end-to-end regression workflow captured in scripts/test_record_replay_e2e.sh
  • Build System: Hardened CMake configuration with reliable gRPC integration
  • Proposal Workflow: Agentic proposal system fully operational (create, list, diff, review in GUI)
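
The accept-button gating described for the policy framework (three severity levels, with only warnings overridable) can be modelled roughly as follows. This is an illustrative Python sketch, not the PolicyEvaluator implementation; `Violation` and `can_accept` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"          # informational only
    WARNING = "warning"    # overridable after confirmation
    CRITICAL = "critical"  # blocks acceptance


@dataclass(frozen=True)
class Violation:
    policy: str            # e.g. "forbidden_range" (one of the four policy types)
    severity: Severity
    message: str


def can_accept(violations: list[Violation], override_confirmed: bool = False) -> bool:
    """Return True if the Accept button should be enabled (hypothetical rule)."""
    severities = {v.severity for v in violations}
    if Severity.CRITICAL in severities:
        return False               # critical violations cannot be overridden
    if Severity.WARNING in severities:
        return override_confirmed  # warnings require an explicit override
    return True                    # no violations, or info-level only
```

Keeping the rule a pure function of the violation list makes it easy to unit-test independently of the GUI.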

Known Limitations & Improvement Opportunities:

  • Screenshot Auto-Capture: Complete - IT-08b auto-captures screenshots and execution context on test failure
  • Test Introspection: Complete - GetTestStatus/ListTests/GetResults RPCs operational
  • Widget Discovery: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
  • Test Recording: Complete - IT-07 provides StartRecording/StopRecording/ReplayTest RPCs with JSON scripts
  • Synchronous Wait: Async tests return immediately → add blocking mode or result polling
  • Error Context: Complete - IT-08 adds screenshots, failure context, and widget state dumps to test failures
  • Performance: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)
  • YAML Parsing: Simple parser implemented, consider yaml-cpp for complex scenarios
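
The ~166ms figure in the Performance item is consistent with the 10-frame yield buffer if the render loop runs at 60 frames per second. The 60 FPS rate is an assumption here (it matches the framerate in the sample error report, not a measured value):

```python
# Assumed values: the 10-frame yield buffer from the Wait RPC fix,
# and a 60 FPS render loop (an assumption, not a measurement).
YIELD_FRAMES = 10
ASSUMED_FPS = 60.0


def wait_overhead_ms(frames: int = YIELD_FRAMES, fps: float = ASSUMED_FPS) -> float:
    """Latency added by yielding `frames` frames at `fps` frames per second."""
    return frames / fps * 1000.0


overhead = wait_overhead_ms()
print(f"{overhead:.1f} ms per Wait call")
```

At a lower frame rate the same buffer costs proportionally more, so the overhead scales with frame time rather than being a fixed constant.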

Time Investment: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)

Quick Reference

Start Test Harness:

./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

Test All RPCs:

./scripts/test_harness_e2e.sh

Create Proposal:

./build/bin/z3ed agent run "Test prompt" --sandbox
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id <ID>

Review in GUI:

  • Open YAZE → Debug → Agent Proposals
  • Select proposal → Review → Accept/Reject/Delete

1. Current Priorities (Week of Oct 2-8, 2025)

Status: Core Infrastructure Complete | Test Harness Enhancement Phase 🔧

Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE

Goal: Transform test harness from basic automation to comprehensive testing platform and deliver holistic error reporting across YAZE
Time Estimate: 20-25 hours total (7.5h completed in IT-07)
Blocking Dependency: IT-01 Complete

Motivation: The harness now supports AI workflows, regression capture, and automation—but error surfaces remain shallow:

  • AI Agent Development: Still needs widget discovery for adaptive planning
  • Regression Testing: Recording/replay finished; reporting pipeline must surface actionable failures
  • CI/CD Integration: Requires reliable artifacts (logs, screenshots, structured context)
  • Debugging: Failures lack screenshots, widget hierarchies, and EditorManager state snapshots
  • Application Consistency: z3ed, EditorManager, and core services emit heterogeneous error formats

IT-05: Test Introspection API (6-8 hours)

Status (Oct 2, 2025): Completed

Highlights:

  • imgui_test_harness.proto now exposes GetTestStatus, ListTests, and GetTestResults RPCs backed by TestManager's execution history.
  • CLI commands (z3ed agent test status|list|results) are fully wired with JSON/YAML formatting, follow-mode polling, and filtering options.
  • GuiAutomationClient provides typed wrappers for introspection APIs so agent workflows can poll status programmatically.
  • Regression coverage lives in scripts/test_harness_e2e.sh; a slimmer introspection smoke (scripts/test_introspection_e2e.sh) is queued for CI automation but manual verification paths are documented.

Future Enhancements:

  • Capture richer assertion metadata (expected/actual pairs) for improved failure messaging when the underlying harness exposes it.
  • Add pagination helpers to CLI once history volume grows (low priority).

Example Usage:

# Queue a test
z3ed agent test --prompt "Open Overworld editor"

# Poll for completion
z3ed agent test status --test-id grpc_click_12345678

# Retrieve results
z3ed agent test results --test-id grpc_click_12345678 --format json

API Schema:

message GetTestStatusRequest {
  string test_id = 1;
}

message GetTestStatusResponse {
  enum Status { QUEUED = 0; RUNNING = 1; PASSED = 2; FAILED = 3; TIMEOUT = 4; }
  Status status = 1;
  int64 execution_time_ms = 2;
  string error_message = 3;
  repeated string assertion_failures = 4;
}

message ListTestsRequest {
  string category_filter = 1;  // Optional: "grpc", "unit", etc.
  int32 page_size = 2;
  string page_token = 3;
}

message ListTestsResponse {
  repeated TestInfo tests = 1;
  string next_page_token = 2;
}

message TestInfo {
  string test_id = 1;
  string name = 2;
  string category = 3;
  int64 last_run_timestamp_ms = 4;
  int32 total_runs = 5;
  int32 pass_count = 6;
  int32 fail_count = 7;
}
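
A client could drain ListTests by following `next_page_token`, as in the sketch below. `fetch_page` is a hypothetical stand-in for the RPC call, and treating an empty token as "last page" is an assumption based on common gRPC pagination practice rather than something this schema states.

```python
from typing import Callable, Iterator

# A page is (tests, next_page_token); an empty token is assumed to mean "last page".
Page = tuple[list[dict], str]


def iter_tests(fetch_page: Callable[[str, int], Page], page_size: int = 50) -> Iterator[dict]:
    """Yield every TestInfo-like dict by following next_page_token."""
    token = ""
    while True:
        tests, token = fetch_page(token, page_size)
        yield from tests
        if not token:
            break


# Hypothetical in-memory backend standing in for the ListTests RPC.
_ALL = [{"test_id": f"grpc_click_{i:08d}", "category": "grpc"} for i in range(5)]


def fake_fetch(token: str, page_size: int) -> Page:
    start = int(token or 0)
    end = start + page_size
    next_token = str(end) if end < len(_ALL) else ""
    return _ALL[start:end], next_token


ids = [t["test_id"] for t in iter_tests(fake_fetch, page_size=2)]
```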

IT-06: Widget Discovery API (4-6 hours)

Implementation Tasks:

  1. Add DiscoverWidgets RPC:

    • Enumerate all windows currently open in YAZE GUI
    • List all interactive widgets (buttons, inputs, menus, tabs) per window
    • Return widget metadata: ID, type, label, enabled state, position
    • Support filtering by window name or widget type
  2. AI-Friendly Output Format:

    • JSON schema describing available interactions
    • Natural language descriptions for each widget
    • Suggested action templates (e.g., "Click button:{label}")

Example Usage:

# Discover all widgets
z3ed gui discover

# Filter by window
z3ed gui discover --window "Overworld"

# Get only buttons
z3ed gui discover --type button

API Schema (current):

message DiscoverWidgetsRequest {
  string window_filter = 1;
  WidgetType type_filter = 2;
  string path_prefix = 3;
  bool include_invisible = 4;
  bool include_disabled = 5;
}

message WidgetBounds {
  float min_x = 1;
  float min_y = 2;
  float max_x = 3;
  float max_y = 4;
}

message DiscoveredWidget {
  string path = 1;
  string label = 2;
  string type = 3;
  string description = 4;
  string suggested_action = 5;
  bool visible = 6;
  bool enabled = 7;
  WidgetBounds bounds = 8;
  uint32 widget_id = 9;
  int64 last_seen_frame = 10;
  int64 last_seen_at_ms = 11;
  bool stale = 12;
}

message DiscoveredWindow {
  string name = 1;
  bool visible = 2;
  repeated DiscoveredWidget widgets = 3;
}

message DiscoverWidgetsResponse {
  repeated DiscoveredWindow windows = 1;
  int32 total_widgets = 2;
  int64 generated_at_ms = 3;
}
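
On the agent side, a response shaped like this could be reduced to a list of actions that are currently usable. The sketch below assumes the response has been converted to a plain dict with the same field names; the fallback string follows the "Click button:{label}" template mentioned above, and the function name is hypothetical.

```python
from typing import Optional


def actionable_widgets(response: dict, widget_type: Optional[str] = None) -> list[str]:
    """Return suggested actions for visible, enabled, non-stale widgets."""
    actions = []
    for window in response.get("windows", []):
        if not window.get("visible", False):
            continue
        for w in window.get("widgets", []):
            if w.get("stale") or not (w.get("visible") and w.get("enabled")):
                continue
            if widget_type and w.get("type") != widget_type:
                continue
            # Prefer the server's suggestion; otherwise fall back to the template form.
            actions.append(w.get("suggested_action") or f"Click {w['type']}:{w['label']}")
    return actions


# Hypothetical response, shaped like DiscoverWidgetsResponse rendered as a dict.
sample = {
    "windows": [{
        "name": "Main Window",
        "visible": True,
        "widgets": [
            {"type": "button", "label": "Overworld", "visible": True,
             "enabled": True, "stale": False, "suggested_action": ""},
            {"type": "input", "label": "Filename", "visible": True,
             "enabled": False, "stale": False, "suggested_action": ""},
        ],
    }],
}
print(actionable_widgets(sample))
```

Filtering out stale and disabled entries before prompting keeps the LLM from proposing interactions that would fail immediately.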

Benefits for AI Agents:

  • LLMs can dynamically learn available GUI interactions
  • Agents can adapt to UI changes without hardcoded widget names
  • Natural language descriptions enable better prompt engineering

IT-07: Test Recording & Replay COMPLETE (Oct 2, 2025)

Highlights:

  • Implemented StartRecording, StopRecording, and ReplayTest RPCs with persistent JSON scripts
  • Added CLI commands: z3ed test record start|stop, z3ed test replay
  • Scripts stored in tests/gui/ with metadata (name, tags, assertions, timing hints)
  • Added regression coverage via scripts/test_record_replay_e2e.sh
  • Documentation updates in E6-z3ed-reference.md and new quick-start snippets in README
  • Confirmed compatibility with natural language prompts generated by the agent workflow

Outcome: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).

IT-08: Enhanced Error Reporting (5-7 hours) COMPLETE

Status: IT-08a Complete | IT-08b Complete | IT-08c Complete
Objective: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.

Implementation Tracks:

  1. Harness-Level Diagnostics

    • IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
    • IT-08b: Auto-capture screenshots and context on test failure using shared helper that writes to ${TMPDIR}/yaze/test-results/<test_id>/
    • IT-08c: Widget tree JSON dumps emitted alongside failure context
    • HTML bundle exporter (screenshots + widget tree) remains a stretch goal
  2. CLI Experience Improvements

    • Surface artifact paths, failure context, and widget state in CLI output (DONE)
    • Standardize error envelopes in z3ed (absl::Status + structured payload)
    • Add --format html flag to emit rich bundles (planned)
    • Integrate with recording workflow: replay failures using captured state (planned)
  3. EditorManager & Application Integration

    • Introduce shared ErrorAnnotatedResult utility exposing status, context, actionable_hint
    • Adapt EditorManager subsystems (ProposalDrawer, OverworldEditor, DungeonEditor) to adopt the shared structure
    • Add in-app failure overlay (ImGui modal) that references harness artifacts when available
    • Hook proposal acceptance/replay flows to display enriched diagnostics when sandbox merges fail
  4. Telemetry & Storage Hooks (Stretch)

    • Optionally emit error metadata to a ring buffer for future analytics/telemetry workstreams
    • Provide CLI flag --error-artifact-dir to customize storage (supports CI separation)

Error Report Example:

{
  "test_id": "grpc_assert_12345678",
  "failure_time": "2025-10-02T14:23:45Z",
  "assertion": "visible:Overworld",
  "expected": "visible",
  "actual": "hidden",
  "screenshot": "/tmp/yaze/test-results/grpc_assert_12345678/failure_1696357220000.bmp",
  "widget_state": {
    "active_window": "Main Window",
    "focused_widget": null,
    "visible_windows": ["Main Window", "Debug"],
    "overworld_window": { "exists": true, "visible": false, "position": "0,0,0,0" }
  },
  "execution_context": {
    "frame_count": 1234,
    "recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
    "resource_stats": { "memory_mb": 245, "textures": 12, "framerate": 60.0 },
    "editor_manager_snapshot": {
      "active_module": "OverworldEditor",
      "dirty_buffers": ["overworld_layer_1"],
      "last_error": null
    }
  }
}
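
A report in this shape is straightforward to condense for CLI output or CI logs. The following sketch assumes the field names shown in the example above; `summarize_failure` is a hypothetical helper, not part of z3ed.

```python
import json


def summarize_failure(report_json: str) -> str:
    """Condense an error report into a single log line."""
    report = json.loads(report_json)
    ctx = report.get("execution_context", {})
    # Fall back to a placeholder when no events were recorded.
    last_event = (ctx.get("recent_events") or ["<none>"])[-1]
    return (
        f"{report['test_id']}: assertion '{report['assertion']}' "
        f"expected {report['expected']!r}, got {report['actual']!r} "
        f"(frame {ctx.get('frame_count', '?')}, last event: {last_event})"
    )


# Trimmed copy of the example report above.
example = json.dumps({
    "test_id": "grpc_assert_12345678",
    "assertion": "visible:Overworld",
    "expected": "visible",
    "actual": "hidden",
    "execution_context": {
        "frame_count": 1234,
        "recent_events": ["Click: menuitem: Overworld Editor",
                          "Wait: window_visible:Overworld"],
    },
})
print(summarize_failure(example))
```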

IT-09: CI/CD Integration CLI Tooling Shipped

Delivered (Oct 3, 2025):

  1. Standardized Suite Runtime

    • YAML suite parser/loader with group dependencies and retry semantics
    • z3ed agent test suite run exposes --group, --tag, --param, --retries, --ci-mode, and --junit
    • Automatic JUnit XML emission to test-results/junit/<suite>.xml
  2. Validation & Authoring UX

    • z3ed agent test suite validate surfaces structural linting with annotated exit codes (0 pass, 1 fail, 2 error)
    • NEW z3ed agent test suite create <name> interactive flow scaffolds suites under tests/, prompting for metadata, groups, replay scripts, tags, and key=value parameters (with --force overwrite support)
  3. Reporting

    • Text and JSON summaries include per-test assertions and retry outcomes
    • Default output directory layout ready for CI artifact upload
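
For readers unfamiliar with the JUnit XML shape that CI systems consume, a minimal sketch follows. It uses the widely supported `testsuite`/`testcase`/`failure` elements; the exact attributes z3ed emits are not documented here, so treat the field choice and the `junit_xml` helper as assumptions.

```python
import xml.etree.ElementTree as ET


def junit_xml(suite_name: str, results: list[dict]) -> str:
    """Render results as a JUnit-style <testsuite> document."""
    failures = sum(1 for r in results if not r["passed"])
    suite = ET.Element("testsuite", name=suite_name,
                       tests=str(len(results)), failures=str(failures))
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r["name"],
                             time=f"{r['seconds']:.3f}")
        if not r["passed"]:
            # A <failure> child marks the test case as failed for CI parsers.
            failure = ET.SubElement(case, "failure", message=r.get("message", ""))
            failure.text = r.get("message", "")
    return ET.tostring(suite, encoding="unicode")


xml_text = junit_xml("smoke", [
    {"name": "overworld_load", "passed": True, "seconds": 1.25},
    {"name": "dungeon_load", "passed": False, "seconds": 2.0,
     "message": "window not visible"},
])
print(xml_text)
```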

Next Steps (post-CLI follow-through):

  • Publish canonical tests/smoke.yaml / tests/regression.yaml samples
  • Add .github/workflows/gui-tests.yml template referencing the new runner
  • Document flaky-test mitigation patterns, including recommended retry counts
  • Wire suite execution output into docs/CI dashboards for quick triage

Test Suite Format:

name: YAZE GUI Test Suite
description: Comprehensive tests for YAZE editor functionality
version: 1.0

config:
  timeout_per_test: 30s
  retry_on_failure: 2
  parallel_execution: false

test_groups:
  - name: smoke
    description: Fast tests for basic functionality
    tests:
      - tests/overworld_load.json
      - tests/dungeon_load.json
  
  - name: regression
    description: Full test suite for release validation
    depends_on: [smoke]
    tests:
      - tests/palette_edit.json
      - tests/sprite_load.json
      - tests/rom_save.json
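
The `depends_on` field implies that groups run in dependency order. One way to derive that order is a topological sort, sketched here with Python's standard `graphlib`; how the actual suite runner orders groups is not specified in this document, so this is only an illustration.

```python
from graphlib import TopologicalSorter


def group_run_order(test_groups: list[dict]) -> list[str]:
    """Order group names so every group runs after the groups it depends on."""
    # Map each group to the set of groups that must run before it.
    graph = {g["name"]: set(g.get("depends_on", [])) for g in test_groups}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


# Mirrors the test_groups section of the suite above.
groups = [
    {"name": "regression", "depends_on": ["smoke"]},
    {"name": "smoke"},
]
order = group_run_order(groups)
print(order)
```

A cycle such as smoke depending on regression would raise `CycleError`, which is a natural thing for a validate step to report.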

GitHub Actions Integration:

name: GUI Tests
on: [push, pull_request]

jobs:
  gui-tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build YAZE with test harness
        run: |
          cmake -B build -DYAZE_WITH_GRPC=ON
          cmake --build build --target yaze --target z3ed
      - name: Start test harness
        run: |
          ./build/bin/yaze --enable_test_harness --headless &
          sleep 5
      - name: Run test suite
        run: |
          ./build/bin/z3ed agent test suite run tests/suite.yaml --ci-mode
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-results/
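
The fixed `sleep 5` in the workflow can be flaky on slow runners. A more robust alternative is to poll the harness port until it accepts connections. The helper below is a generic sketch, not part of z3ed, and the port in the usage comment assumes the 50052 value from the Quick Reference.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.25) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not listening yet (or connection timed out); retry after a pause.
            time.sleep(interval_s)
    return False


# Example (assumed port): wait_for_port("127.0.0.1", 50052) before running the suite.
```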

IT-10: Collaborative Editing & Multiplayer Sessions ⏸️ DEPRIORITIZED

Status: Postponed in favor of LLM integration work
Rationale: While collaborative editing is an interesting feature, practical LLM integration provides more immediate value for the agentic workflow system. The core infrastructure is complete, and enabling real AI agents to interact with z3ed is the critical next step.

Future Consideration: IT-10 may be revisited after LLM integration is production-ready and validated by users. The collaborative editing design is preserved in the documentation for future reference.

See: LLM-INTEGRATION-PLAN.md for the new priority work.


Priority 2: LLM Integration (Ollama + Gemini + Claude) 🤖 NEW PRIORITY

Goal: Enable practical AI-driven ROM modifications with local and remote LLM providers
Time Estimate: 12-15 hours total
Status: Ready to Implement

Why This is Critical: The z3ed infrastructure is complete (CLI, proposals, sandbox, GUI automation), but currently uses MockAIService with hardcoded commands. Real LLM integration unlocks the full potential of the agentic workflow system.

📋 Complete Documentation:

Implementation Phases:

Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE

  • Create OllamaAIService class with health checks and model management
  • Wire into agent commands with provider selection mechanism
  • Add CMake configuration for httplib support
  • End-to-end testing with qwen2.5-coder:7b model

Key Benefits: Local, free, private, no rate limits

Phase 2: Gemini Fixes (2-3 hours)

  • Fix existing GeminiAIService implementation
  • Improve prompting with resource catalogue
  • Add markdown code block stripping for reliable parsing
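
The markdown code block stripping mentioned above could look roughly like this. It is a sketch of the general technique, not the GeminiAIService code; the function name and the choice to take only the first fenced block are assumptions.

```python
import re

FENCE = "`" * 3  # built programmatically to avoid a literal fence inside this example

# Matches an opening fence with optional language tag, then captures up to the closing fence.
_FENCED = re.compile(
    re.escape(FENCE) + r"[\w+-]*[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the body of the first fenced block, or the text itself if unfenced."""
    match = _FENCED.search(text)
    return (match.group(1) if match else text).strip()


# Placeholder payload; real responses would contain z3ed commands.
raw = "Here you go:\n" + FENCE + 'json\n["command one", "command two"]\n' + FENCE
print(strip_code_fences(raw))
```

Falling back to the raw text when no fence is present keeps the parser tolerant of models that answer without markdown.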

Phase 3: Claude Integration (2-3 hours)

  • Create ClaudeAIService class
  • Implement Messages API integration
  • Same interface as other services for easy swapping

Phase 4: Enhanced Prompt Engineering (3-4 hours)

  • Create PromptBuilder utility class
  • Load resource catalogue (z3ed-resources.yaml) into system prompts
  • Add few-shot examples for improved accuracy (>90%)
  • Inject ROM context (current state, loaded editors)
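
As a rough illustration of the Phase 4 idea, a prompt builder might concatenate the resource catalogue, few-shot examples, and ROM context into one system prompt. Everything in this sketch (class layout, section headings, method names) is hypothetical; only the three inputs come from the plan above.

```python
from dataclasses import dataclass, field


@dataclass
class PromptBuilder:
    catalogue_text: str  # contents of the resource catalogue YAML
    examples: list[tuple[str, str]] = field(default_factory=list)  # (request, commands)
    rom_context: dict[str, str] = field(default_factory=dict)

    def add_example(self, request: str, commands: str) -> "PromptBuilder":
        self.examples.append((request, commands))
        return self

    def build_system_prompt(self) -> str:
        parts = [
            "You translate user requests into z3ed commands.",
            "## Resource catalogue",
            self.catalogue_text.strip(),
        ]
        if self.examples:
            parts.append("## Examples")
            parts.extend(f"Request: {req}\nCommands: {cmds}" for req, cmds in self.examples)
        if self.rom_context:
            parts.append("## ROM context")
            # Sorted for a deterministic prompt, which helps with caching and tests.
            parts.extend(f"{k}: {v}" for k, v in sorted(self.rom_context.items()))
        return "\n\n".join(parts)


builder = PromptBuilder(catalogue_text="resources: []")
builder.add_example("<user request>", "<expected commands>")
builder.rom_context["loaded_editors"] = "Overworld"
prompt = builder.build_system_prompt()
```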

Quick Start After Implementation:

# Install Ollama
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b

# Configure z3ed
export YAZE_AI_PROVIDER=ollama

# Use natural language
z3ed agent run --prompt "Make all soldier armor red" --rom zelda3.sfc --sandbox
z3ed agent diff  # Review changes

Testing Script: ./scripts/quickstart_ollama.sh (automated setup validation)


IT-10 Design Reference: Collaborative Editing (Deprioritized)

  1. Collaboration Server:

    • WebSocket server for real-time client communication
    • Session management (create, join, authentication)
    • Edit event broadcasting to all connected clients
    • Conflict resolution (last-write-wins with timestamps)
  2. Collaboration Client:

    • Connect to remote sessions via WebSocket
    • Send local edits to server
    • Receive and apply remote edits
    • ROM state synchronization on join
  3. Edit Event Protocol:

    • Protobuf definitions for edit events (tile, sprite, palette, map)
    • Cursor position tracking
    • AI proposal sharing and voting
    • Session state messages
  4. GUI Integration:

    • Status bar showing connected users
    • Collaboration panel (user list, activity feed)
    • Live cursor rendering (color-coded per user)
    • Proposal voting UI (Accept/Reject/Discuss)
  5. Session Recording & Replay:

    • Record all events to YAML/JSON file
    • Replay engine with timeline controls
    • Export session summaries for review
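
The last-write-wins conflict resolution named for the collaboration server can be sketched as below. The `EditEvent` fields and the use of `client_id` as a tie-breaker for equal timestamps are assumptions added for illustration; the design above only specifies last-write-wins with timestamps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    target: str        # hypothetical key format, e.g. "tile:12,34"
    value: int
    timestamp_ms: int
    client_id: str     # assumed tie-breaker so all clients converge on one winner


def apply_lww(state: dict[str, EditEvent], event: EditEvent) -> bool:
    """Apply `event` if it is newer than the stored edit; return True if applied."""
    current = state.get(event.target)
    if current is None or (
        (event.timestamp_ms, event.client_id) > (current.timestamp_ms, current.client_id)
    ):
        state[event.target] = event
        return True
    return False


state: dict[str, EditEvent] = {}
apply_lww(state, EditEvent("tile:12,34", value=5, timestamp_ms=1000, client_id="a"))
apply_lww(state, EditEvent("tile:12,34", value=9, timestamp_ms=900, client_id="b"))  # stale, ignored
```

Because the comparison is deterministic, clients that receive the same events in different orders end up with the same state.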

CLI Commands:

# Host a collaborative session
z3ed collab host --port 5000 --password "dev123"

# Join a session
z3ed collab join yaze://connect/192.168.1.100:5000

# List active sessions (LAN discovery)
z3ed collab list

# Disconnect from session
z3ed collab disconnect

# Replay recorded session
z3ed collab replay session_2025_10_02.yaml --speed 2x

User Stories:

  • US-1: As a ROM hacker, I want to host a collaborative session so my teammates can join and work together
  • US-2: As a collaborator, I want to see other users' edits in real-time so we stay synchronized
  • US-3: As a team lead, I want to use AI agents with my team so we can all benefit from automation (shared proposals with majority voting)
  • US-4: As a collaborator, I want to see where other users are working so we don't conflict (live cursors)
  • US-5: As a project manager, I want to record collaborative sessions so we can review work later

Benefits:

  • Real-Time Collaboration: Multiple users can edit the same ROM simultaneously
  • Shared AI Assistance: Team votes on AI proposals before execution
  • Conflict Prevention: Live cursors show where teammates are working
  • Audit Trail: Session recording for review and compliance
  • Remote Teams: Connect over LAN or internet (with optional encryption)

Technical Architecture:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│  Client A    │────►│  Collab Server  │◄────│  Client B    │
│  (Host)      │     │  (WebSocket)    │     │              │
└──────────────┘     │                 │     └──────────────┘
                     │  - Session Mgmt │
                     │  - Event Broker │     ┌──────────────┐
                     │  - Conflict Res │◄────│  Client C    │
                     └─────────────────┘     └──────────────┘

Security Considerations:

  • Optional password protection for sessions
  • Read-only vs read-write access levels
  • ROM checksum verification (prevents desync)
  • Rate limiting (prevent spam/DoS)
  • Optional TLS/SSL encryption for public internet
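
ROM checksum verification on join could be as simple as comparing digests before any edits are exchanged. The design does not name a hash algorithm, so SHA-256 is an assumption here, as are the function names.

```python
import hashlib


def rom_checksum(rom_bytes: bytes) -> str:
    """Hex digest identifying a ROM image (SHA-256 is an assumed choice)."""
    return hashlib.sha256(rom_bytes).hexdigest()


def may_join(session_checksum: str, local_rom: bytes) -> bool:
    """Allow a client to join only if its ROM matches the session's ROM."""
    return rom_checksum(local_rom) == session_checksum.lower()


host_rom = b"\x00" * 1024          # placeholder bytes, not a real ROM
session_sum = rom_checksum(host_rom)
print(may_join(session_sum, host_rom))
```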

See: IT-10-COLLABORATIVE-EDITING.md for complete specification


Priority 3: Windows Cross-Platform Testing 🪟

Goal: Validate z3ed and test harness on Windows
Time Estimate: 8-10 hours
Blocking Dependency: IT-05 Complete (need stable API)

📋 Detailed Guides: See NEXT_PRIORITIES_OCT2.md for complete implementation breakdowns with code examples.


2. Workstreams Overview

| Workstream | Goal | Status | Notes |
|---|---|---|---|
| Resource Catalogue | Machine-readable CLI specs for AI consumption | Complete | docs/api/z3ed-resources.yaml generated |
| Acceptance Workflow | Human review/approval of agent proposals | Complete | ProposalDrawer with ROM merging operational |
| ImGuiTest Bridge | Automated GUI testing via gRPC | Complete | All 3 phases done (11 hours) |
| Verification Pipeline | Layered testing + CI coverage | 📋 In Progress | E2E validation phase |
| Telemetry & Learning | Capture signals for improvement | 📋 Planned | Optional/opt-in (Phase 8) |

Completed Work Summary

Resource Catalogue (RC) :

  • CLI flag passthrough and resource catalog system
  • agent describe exports YAML/JSON schemas
  • docs/api/z3ed-resources.yaml maintained
  • All ROM/Palette/Overworld/Dungeon/Patch commands documented

Acceptance Workflow (AW-01/02/03) :

  • ProposalRegistry with disk persistence and cross-session tracking
  • RomSandboxManager for isolated ROM copies
  • agent list and agent diff commands
  • ProposalDrawer GUI: List/detail views, Accept/Reject/Delete, ROM merging
  • Integrated into EditorManager (Debug → Agent Proposals)

ImGuiTestHarness (IT-01) :

  • Phase 1: gRPC infrastructure (6 RPC methods)
  • Phase 2: TestManager integration with dynamic tests
  • Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs)
  • E2E test script: scripts/test_harness_e2e.sh
  • Documentation: IT-01-QUICKSTART.md

3. Task Backlog

| ID | Task | Workstream | Type | Status | Dependencies |
|---|---|---|---|---|---|
| RC-01 | Define schema for ResourceCatalog entries and implement serialization helpers. | Resource Catalogue | Code | Done | Schema system complete with all resource types documented |
| RC-02 | Auto-generate docs/api/z3ed-resources.yaml from command annotations. | Resource Catalogue | Tooling | Done | Generated and committed to docs/api/ |
| RC-03 | Implement z3ed agent describe CLI surface returning JSON schemas. | Resource Catalogue | Code | Done | Both YAML and JSON output formats working |
| RC-04 | Integrate schema export with TUI command palette + help overlays. | Resource Catalogue | UX | 📋 Planned | RC-03 |
| RC-05 | Harden CLI command routing/flag parsing to unblock agent automation. | Resource Catalogue | Code | Done | Fixed rom info handler to use FLAGS_rom |
| AW-01 | Implement sandbox ROM cloning and tracking (RomSandboxManager). | Acceptance Workflow | Code | Done | ROM sandbox manager operational with lifecycle management |
| AW-02 | Build proposal registry service storing diffs, logs, screenshots. | Acceptance Workflow | Code | Done | ProposalRegistry implemented with disk persistence |
| AW-03 | Add ImGui drawer for proposals with accept/reject controls. | Acceptance Workflow | UX | Done | ProposalDrawer GUI complete with ROM merging |
| AW-04 | Implement policy evaluation for gating accept buttons. | Acceptance Workflow | Code | Done | PolicyEvaluator service with 4 policy types (test, constraint, forbidden, review), GUI integration complete (6 hours) |
| AW-05 | Draft .z3ed-diff hybrid schema (binary deltas + JSON metadata). | Acceptance Workflow | Design | 📋 Planned | AW-01 |
| IT-01 | Create ImGuiTestHarness IPC service embedded in yaze_test. | ImGuiTest Bridge | Code | Done | Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours) |
| IT-02 | Implement CLI agent step translation (imgui_action → harness call). | ImGuiTest Bridge | Code | Done | z3ed agent test command with natural language prompts (7.5 hours) |
| IT-03 | Provide synchronization primitives (WaitForIdle, etc.). | ImGuiTest Bridge | Code | Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 |
| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer |
| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) |
| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | Done | IT-01 - Screenshot RPC, auto-capture, widget state dumps complete (Oct 2, 2025) |
| IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | Code | Done | IT-01 - Screenshot capture complete (Oct 2, 2025) |
| IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | Done | IT-08a - Integrated with TestManager (Oct 2, 2025) |
| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | Done | IT-08b - Enhanced failure diagnostics (Oct 2, 2025) |
| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | Done | IT-07 - CLI suite run/validate/create commands, JUnit output |
| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |
| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | 📋 Planned | IT tasks |
| VP-03 | Create CI job running agent smoke tests with YAZE_WITH_JSON. | Verification Pipeline | Infra | 📋 Planned | VP-01, VP-02 |
| TL-01 | Capture accept/reject metadata and push to telemetry log. | Telemetry & Learning | Code | 📋 Planned | AW tasks |
| TL-02 | Build anonymized metrics exporter + opt-in toggle. | Telemetry & Learning | Infra | 📋 Planned | TL-01 |

Status Legend: 🔄 Active · 📋 Planned · Done

Progress Summary:

  • Completed: 19 tasks (68%)
  • 🔄 Active: 0 tasks (0%)
  • 📋 Planned: 9 tasks (32%)
  • Total: 28 tasks, counting sub-tasks IT-08a/b/c (includes 6 test harness enhancements + 1 collaborative feature)

4. Immediate Next Steps (Week of Oct 1-7, 2025)

Priority 0: Testing & Validation (Active)

  1. TEST: Complete end-to-end proposal workflow

    • Launch YAZE and verify ProposalDrawer displays live proposals
    • Test Accept action → verify ROM merge and save prompt
    • Test Reject and Delete actions
    • Validate filtering and refresh functionality
  2. Widget ID Refactoring (Started Oct 2, 2025) 🎯 NEW

    • Added widget_id_registry to build system
    • Registered 13 Overworld toolset buttons with hierarchical IDs
    • 📋 Next: Test widget discovery and update test harness
    • See: WIDGET_ID_REFACTORING_PROGRESS.md

Priority 1: ImGuiTestHarness Foundation (IT-01) COMPLETE

Rationale: Required for automated GUI testing and remote control of YAZE for AI workflows
Decision: Use gRPC - Production-grade, cross-platform, type-safe (see IT-01-grpc-evaluation.md)

Status: Phase 1 Complete | Phase 2 Complete | Phase 3 Complete

Phase 1: gRPC Infrastructure COMPLETE

  • Add gRPC to build system via FetchContent
  • Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
  • Implement gRPC server with all 6 RPC stubs
  • Test with grpcurl - all RPCs responding
  • Server lifecycle management (Start/Shutdown)
  • Cross-platform build verified (macOS ARM64)

See: GRPC_TEST_SUCCESS.md for Phase 1 completion details

Phase 2: ImGuiTestEngine Integration COMPLETE

Goal: Replace stub RPC handlers with actual GUI automation
Status: Infrastructure complete, dynamic test registration implemented
Time Spent: ~4 hours

Implementation Guide: 📖 IT-01-PHASE2-IMPLEMENTATION-GUIDE.md

Completed Tasks:

  1. TestManager Integration - gRPC service receives TestManager reference
  2. Build System - Successfully compiles with ImGuiTestEngine support
  3. Server Startup - gRPC server starts correctly on macOS with test harness flag
  4. Dynamic Test Registration - Click RPC uses IM_REGISTER_TEST() macro for dynamic tests
  5. Stub Handlers - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
  6. Ping RPC - Fully functional, returns YAZE version and timestamp

Key Learnings:

  • ImGuiTestEngine requires test registration - can't call test functions directly
  • Test context provided by engine via test->Output.Status not test->Status
  • YAZE uses custom flag system with FLAGS_name->Get() pattern
  • Correct flags: --enable_test_harness, --test_harness_port, --rom_file

Testing Results:

# Server starts successfully
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Ping RPC working
grpcurl -plaintext -d '{"message":"test"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}

Issues Fixed:

  • SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
  • ImGuiTestEngine API mismatch (switched to dynamic test registration)
  • Status field access (corrected to test->Output.Status)
  • Port conflicts (use port 50052, killall yaze to cleanup)
  • Flag naming (documented correct underscore format)

Phase 3: Full ImGuiTestEngine Integration COMPLETE (Oct 2, 2025)

Goal: Complete implementation of all GUI automation RPCs

Completed Tasks:

  1. Type RPC Implementation - Full text input automation

    • ItemInfo API usage corrected (returns by value, not pointer)
    • Focus management with ItemClick before typing
    • Clear-first functionality with keyboard shortcuts
    • Dynamic test registration with timeout handling
  2. Wait RPC Implementation - Condition polling with timeout

    • Three condition types: window_visible, element_visible, element_enabled
    • Configurable timeout (default 5000ms) and poll interval (default 100ms)
    • Proper Yield() calls to allow ImGui event processing
    • Extended timeout for test execution
  3. Assert RPC Implementation - State validation with structured responses

    • Multiple assertion types: visible, enabled, exists, text_contains
    • Actual vs expected value reporting
    • Detailed error messages for debugging
    • text_contains partially implemented (text retrieval needs refinement)
  4. API Compatibility Fixes

    • Corrected ItemInfo usage (by value, check ID != 0)
    • Fixed flag names (ItemFlags instead of StatusFlags)
    • Proper visibility checks using RectClipped dimensions
    • All dynamic tests properly registered and cleaned up
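
The Wait RPC's polling behaviour described above can be sketched on its own. This is a minimal illustration under stated assumptions, not the harness code: `PollUntil` and its parameters are hypothetical names, and `std::this_thread::sleep_for` stands in for the `Yield()` call that lets ImGui process events between checks. The defaults mirror the documented 5000 ms timeout and 100 ms poll interval.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of the yaze codebase): polls `condition`
// until it returns true or `timeout` elapses. Returns true if the condition
// was met in time, false on timeout.
inline bool PollUntil(
    const std::function<bool()>& condition,
    std::chrono::milliseconds timeout = std::chrono::milliseconds(5000),
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(100)) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    // Check first so an already-true condition returns without sleeping.
    if (condition()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    // Assumption: the real harness would yield to the ImGui test engine
    // here instead of sleeping.
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Checking the condition before the deadline test means a condition that is already true succeeds even with a zero timeout, which is usually the behaviour a caller expects from a wait primitive.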

Testing:

  • Build successful on macOS ARM64
  • All RPCs respond correctly
  • Test script created: scripts/test_harness_e2e.sh
  • See IT-01-PHASE3-COMPLETE.md for full implementation details

Known Limitations:

  • Screenshot RPC not implemented (placeholder stub)
  • text_contains assertion uses placeholder text retrieval
  • Need end-to-end workflow testing with real YAZE widgets
  1. End-to-End Testing (1 hour)
    • Create shell script workflow: start server → click button → wait for window → type text → assert state
    • Test with real YAZE editors (Overworld, Dungeon, etc.)
    • Document edge cases and troubleshooting

Phase 4: CLI Integration & Windows Testing (4-5 hours)

  1. CLI Client (z3ed agent test)
    • Generate gRPC calls from AI prompts
    • Natural language → ImGui action translation
    • Screenshot capture for LLM feedback
    • Emit structured error envelopes with artifact links (IT-08)
  2. Windows Testing
    • Detailed build instructions for vcpkg setup
    • Test on Windows VM or with contributor
    • Add Windows CI job to GitHub Actions
    • Document troubleshooting

IT-01 Quick Reference

Start YAZE with Test Harness:

./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

Test RPCs with grpcurl:

# Ping - Health check
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping

# Click - Click UI element
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

# Type - Input text
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type

# Wait - Wait for condition
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

# Assert - Validate state
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Main Window"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
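
The Wait and Assert requests above encode their condition as a `type:target` string, for example `window_visible:Overworld Editor`. The following C++17 sketch shows one way such a string could be split. `Condition` and `ParseCondition` are hypothetical names, and splitting on the first colon only is an assumption made so that targets may contain spaces or further colons; the actual service may parse these differently.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical representation of a harness condition string.
struct Condition {
  std::string type;    // e.g. "window_visible"
  std::string target;  // e.g. "Overworld Editor"
};

// Splits "type:target" on the first ':' only. Returns std::nullopt when the
// colon is missing or either side is empty.
inline std::optional<Condition> ParseCondition(std::string_view text) {
  const std::size_t colon = text.find(':');
  if (colon == std::string_view::npos || colon == 0 ||
      colon + 1 >= text.size()) {
    return std::nullopt;
  }
  return Condition{std::string(text.substr(0, colon)),
                   std::string(text.substr(colon + 1))};
}
```

Returning `std::optional` lets the caller turn a malformed condition into a structured error response rather than a crash.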

Troubleshooting:

  • Port in use: run killall yaze, or pick another port with --test_harness_port=50053
  • Connection refused: confirm the server is listening with lsof -i :50052
  • Unrecognized flag: use underscores, not hyphens (e.g., --rom_file, not --rom-file or --rom)

Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)

  1. DESIGN: YAML-based Policy Configuration

    # .yaze/policies/agent.yaml
    version: 1.0
    policies:
      - name: require_tests
        type: test_requirement
        enabled: true
        rules:
          - test_suite: "overworld_rendering"
            min_pass_rate: 0.95
          - test_suite: "palette_integrity"
            min_pass_rate: 1.0
    
      - name: limit_change_scope
        type: change_constraint
        enabled: true
        rules:
          - max_bytes_changed: 10240  # 10KB
          - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
          - forbidden_ranges:
            - start: 0xFFB0  # ROM header
              end: 0xFFFF
    
      - name: human_review_required
        type: review_requirement
        enabled: true
        rules:
          - if: bytes_changed > 1024
            then:
              require_diff_review: true
          - if: commands_executed > 10
            then:
              require_log_review: true
    
  2. IMPLEMENT: PolicyEvaluator Service

    • src/cli/service/policy_evaluator.{h,cc}
    • Singleton service loads policies from .yaze/policies/
    • EvaluateProposal(proposal_id) -> PolicyResult
    • Returns: pass/fail + list of violations with severity
    • Hook into ProposalRegistry lifecycle
  3. INTEGRATE: Policy UI in ProposalDrawer

    • Add "Policy Status" section in detail view
    • Display violations with severity icons (Critical, Warning, Info)
    • Gate Accept button: disabled if critical violations exist
    • Show helpful messages: "Accept blocked: Test pass rate 0.85 < 0.95"
    • Allow policy overrides with confirmation: "Override policy? This action will be logged."
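
The evaluation and gating steps above can be sketched for the `change_constraint` policy type. This is an illustration under stated assumptions, not the planned `PolicyEvaluator`: every type and function name here is hypothetical, and mapping an oversized change to Warning and a forbidden-range write to Critical is an assumption, since the plan does not say which rule carries which severity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class Severity { kInfo, kWarning, kCritical };

struct Violation {
  std::string policy;
  Severity severity;
  std::string message;
};

struct ByteRange {
  std::uint32_t start;  // inclusive
  std::uint32_t end;    // inclusive
};

// Hypothetical in-memory form of a change_constraint policy.
struct ChangeConstraint {
  std::uint32_t max_bytes_changed = 0;
  std::vector<ByteRange> forbidden_ranges;
};

// Hypothetical summary of what a proposal modified.
struct ProposalStats {
  std::uint32_t bytes_changed = 0;
  std::vector<std::uint32_t> touched_offsets;
};

struct PolicyResult {
  std::vector<Violation> violations;
  // The Accept button is gated on the absence of critical violations.
  bool CanAccept() const {
    for (const Violation& v : violations) {
      if (v.severity == Severity::kCritical) return false;
    }
    return true;
  }
};

inline PolicyResult EvaluateChangeConstraint(const ChangeConstraint& policy,
                                             const ProposalStats& stats) {
  PolicyResult result;
  if (stats.bytes_changed > policy.max_bytes_changed) {
    // Assumed severity: oversized changes warn but do not block.
    result.violations.push_back(
        {"limit_change_scope", Severity::kWarning,
         "bytes changed " + std::to_string(stats.bytes_changed) + " > " +
             std::to_string(policy.max_bytes_changed)});
  }
  for (std::uint32_t offset : stats.touched_offsets) {
    for (const ByteRange& range : policy.forbidden_ranges) {
      assert(range.start <= range.end);
      if (offset >= range.start && offset <= range.end) {
        // Assumed severity: writes into a forbidden range block acceptance.
        result.violations.push_back(
            {"limit_change_scope", Severity::kCritical,
             "write to forbidden offset " + std::to_string(offset)});
      }
    }
  }
  return result;
}
```

Keeping the gate in `PolicyResult::CanAccept()` means the drawer only needs the result object to decide whether to enable the Accept button, and an override path can log the same violation list.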

Priority 3: Documentation & Consolidation (2-3 hours)

  1. CONSOLIDATE: Merge standalone docs into main plan

    • AW-03 summary → already in main plan, delete standalone doc
    • Check for other AW-* or task-specific docs to merge
    • Update main plan with architecture diagrams
  2. CREATE: Architecture Flow Diagram

    • Visual representation of proposal lifecycle
    • Component interaction diagram
    • Add to implementation plan

Later: Advanced Features

  • VP-01: Expand CLI unit tests
  • VP-02: Integration tests with replay scripts
  • TL-01: Telemetry capture for learning

4. Current Issues & Blockers

Active Issues

None - all blocking issues resolved as of Oct 1, 2025

Known Limitations (Non-Blocking)

  1. ProposalDrawer lacks keyboard navigation
  2. Large diffs/logs truncated at 1000 lines (consider pagination)
  3. Proposals don't persist full metadata to disk (prompt, description, sandbox_id reconstructed)
  4. No policy evaluation yet (AW-04)

5. Architecture Overview

5.1. Proposal Lifecycle Flow

┌─────────────────────────────────────────────────────────────────┐
│ 1. CREATION (CLI: z3ed agent run)                               │
├─────────────────────────────────────────────────────────────────┤
│ User Prompt                                                      │
│      ↓                                                           │
│ MockAIService / GeminiAIService                                 │
│      ↓ (generates commands)                                     │
│ ["palette export ...", "overworld set-tile ..."]                │
│      ↓                                                           │
│ RomSandboxManager::CreateSandbox(rom)                           │
│      ↓ (creates isolated copy)                                  │
│ /tmp/yaze/sandboxes/<timestamp>/zelda3.sfc                      │
│      ↓                                                           │
│ Execute commands on sandbox ROM                                 │
│      ↓ (logs each command)                                      │
│ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc)      │
│      ↓ (creates proposal directory)                             │
│ /tmp/yaze/proposals/proposal-<timestamp>-<seq>/                 │
│   ├─ execution.log (command outputs)                            │
│   ├─ diff.txt (if generated)                                    │
│   └─ screenshots/ (if any)                                      │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 2. DISCOVERY (CLI: z3ed agent list)                             │
├─────────────────────────────────────────────────────────────────┤
│ ProposalRegistry::ListProposals()                               │
│      ↓ (lazy loads from disk)                                   │
│ LoadProposalsFromDiskLocked()                                   │
│      ↓ (scans /tmp/yaze/proposals/)                             │
│ Reconstructs metadata from filesystem                           │
│      ↓ (parses timestamps, reads logs)                          │
│ Returns vector<ProposalMetadata>                                │
│      ↓                                                           │
│ Display table: ID | Status | Created | Prompt | Stats           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 3. REVIEW (GUI: Debug → Agent Proposals)                        │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::Draw()                                          │
│      ↓ (called every frame from EditorManager)                  │
│ ProposalDrawer::RefreshProposals()                              │
│      ↓ (calls ProposalRegistry::ListProposals)                  │
│ Display proposal list (selectable table)                        │
│      ↓ (user clicks proposal)                                   │
│ ProposalDrawer::SelectProposal(id)                              │
│      ↓ (loads detail content)                                   │
│ Read execution.log and diff.txt from proposal directory         │
│      ↓                                                           │
│ Display detail view:                                            │
│   ├─ Metadata (sandbox_id, timestamp, stats)                   │
│   ├─ Diff (syntax highlighted)                                  │
│   └─ Log (command execution trace)                              │
│      ↓                                                           │
│ User decides: [Accept] [Reject] [Delete]                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 4. ACCEPTANCE (GUI: Click "Accept" button)                      │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::AcceptProposal(proposal_id)                     │
│      ↓                                                           │
│ Get proposal metadata (includes sandbox_id)                     │
│      ↓                                                           │
│ RomSandboxManager::ListSandboxes()                              │
│      ↓ (find sandbox by ID)                                     │
│ sandbox_rom_path = sandbox.rom_path                             │
│      ↓                                                           │
│ Load sandbox ROM from disk                                      │
│      ↓                                                           │
│ rom_->WriteVector(0, sandbox_rom.vector())                      │
│      ↓ (copies entire sandbox ROM → main ROM)                   │
│ ROM marked dirty (save prompt appears)                          │
│      ↓                                                           │
│ ProposalRegistry::UpdateStatus(id, kAccepted)                   │
│      ↓                                                           │
│ User: File → Save ROM                                           │
│      ↓                                                           │
│ Changes committed ✅                                            │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 5. REJECTION (GUI: Click "Reject" button)                       │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::RejectProposal(proposal_id)                     │
│      ↓                                                           │
│ ProposalRegistry::UpdateStatus(id, kRejected)                   │
│      ↓                                                           │
│ Proposal preserved for audit trail                              │
│ Sandbox ROM left untouched (can be cleaned up later)            │
└─────────────────────────────────────────────────────────────────┘
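
The discovery step reconstructs metadata from the filesystem, which includes recovering the timestamp and sequence number from directory names of the form `proposal-<timestamp>-<seq>`. The C++17 sketch below shows one way to do that parsing. `ProposalDirInfo` and `ParseProposalDirName` are hypothetical names, and the timestamp is kept as an opaque string because its exact format is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result of parsing a proposal directory name.
struct ProposalDirInfo {
  std::string timestamp;  // kept opaque, e.g. "20251001T200215"
  int sequence = 0;
};

// Parses "proposal-<timestamp>-<seq>". Returns std::nullopt if the prefix is
// missing, either part is empty, or <seq> is not a short decimal number.
inline std::optional<ProposalDirInfo> ParseProposalDirName(
    std::string_view name) {
  constexpr std::string_view kPrefix = "proposal-";
  if (name.substr(0, kPrefix.size()) != kPrefix) return std::nullopt;
  name.remove_prefix(kPrefix.size());

  // The sequence number follows the last dash.
  const std::size_t dash = name.rfind('-');
  if (dash == std::string_view::npos || dash == 0 ||
      dash + 1 >= name.size()) {
    return std::nullopt;
  }
  const std::string_view digits = name.substr(dash + 1);
  if (digits.size() > 9) return std::nullopt;  // avoid int overflow
  int sequence = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    sequence = sequence * 10 + (c - '0');
  }
  return ProposalDirInfo{std::string(name.substr(0, dash)), sequence};
}
```

Directories that do not match the pattern are skipped rather than treated as errors, so unrelated files under the proposals root do not break listing.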

5.2. Component Interaction Diagram

┌────────────────────┐
│   CLI Layer        │
│  (z3ed commands)   │
└────────┬───────────┘
         │
         ├──► agent run ──────────┐
         ├──► agent list ─────────┤
         └──► agent diff ─────────┤
                                  │
         ┌────────────────────────▼──────────────────────┐
         │         CLI Service Layer                     │
         ├───────────────────────────────────────────────┤
         │  ┌─────────────────────────────────────────┐  │
         │  │ ProposalRegistry (Singleton)            │  │
         │  │  • CreateProposal()                     │  │
         │  │  • ListProposals()                      │  │
         │  │  • GetProposal()                        │  │
         │  │  • UpdateStatus()                       │  │
         │  │  • RemoveProposal()                     │  │
         │  │  • LoadProposalsFromDiskLocked()        │  │
         │  └────────────┬────────────────────────────┘  │
         │               │                               │
         │  ┌────────────▼────────────────────────────┐  │
         │  │ RomSandboxManager (Singleton)          │  │
         │  │  • CreateSandbox()                     │  │
         │  │  • ActiveSandbox()                     │  │
         │  │  • ListSandboxes()                     │  │
         │  │  • RemoveSandbox()                     │  │
         │  └────────────┬────────────────────────────┘  │
         └───────────────┼────────────────────────────────┘
                         │
         ┌───────────────▼────────────────────────────────┐
         │         Filesystem Layer                       │
         ├────────────────────────────────────────────────┤
         │  /tmp/yaze/proposals/                          │
         │    └─ proposal-<timestamp>-<seq>/              │
         │         ├─ execution.log                       │
         │         ├─ diff.txt                            │
         │         └─ screenshots/                        │
         │                                                │
         │  /tmp/yaze/sandboxes/                          │
         │    └─ <timestamp>-<seq>/                       │
         │         └─ zelda3.sfc (isolated ROM copy)      │
         └────────────────────────────────────────────────┘
                         ▲
                         │
         ┌───────────────┴────────────────────────────────┐
         │         GUI Layer                              │
         ├────────────────────────────────────────────────┤
         │  ┌─────────────────────────────────────────┐   │
         │  │ EditorManager                           │   │
         │  │  • current_rom_                         │   │
         │  │  • proposal_drawer_                     │   │
         │  │  • Update() { proposal_drawer_.Draw() } │   │
         │  └────────────┬────────────────────────────┘   │
         │               │                                │
         │  ┌────────────▼────────────────────────────┐   │
         │  │ ProposalDrawer                          │   │
         │  │  • rom_ (ptr to EditorManager's ROM)    │   │
         │  │  • Draw()                               │   │
         │  │  • DrawProposalList()                   │   │
         │  │  • DrawProposalDetail()                 │   │
         │  │  • AcceptProposal() ← ROM MERGE         │   │
         │  │  • RejectProposal()                     │   │
         │  │  • DeleteProposal()                     │   │
         │  └─────────────────────────────────────────┘   │
         └────────────────────────────────────────────────┘

5.3. Data Flow: Agent Run to ROM Merge

User: "Make soldiers wear red armor"
         │
         ▼
┌────────────────────────┐
│ MockAIService          │ Generates: ["palette export sprites_aux1 4 soldier.col"]
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ RomSandboxManager      │ Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ Command Executor       │ Runs: palette export on sandbox ROM
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ ProposalRegistry       │ Creates: proposal-20251001T200215-1/
│                        │   • execution.log: "[timestamp] palette export succeeded"
└────────┬───────────────┘   • diff.txt: (if diff generated)
         │
         │ Time passes... user launches GUI
         ▼
┌────────────────────────┐
│ ProposalDrawer loads   │ Reads: /tmp/.../proposals/proposal-*/
│                        │ Displays: List of proposals
└────────┬───────────────┘
         │
         │ User clicks "Accept"
         ▼
┌────────────────────────┐
│ AcceptProposal()       │ 1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc
│                        │ 2. Load sandbox ROM
│                        │ 3. rom_->WriteVector(0, sandbox_rom.vector())
│                        │ 4. Main ROM now contains all sandbox changes
│                        │ 5. ROM marked dirty
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ User: File → Save      │ Changes persisted to disk ✅
└────────────────────────┘
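
The merge step in `AcceptProposal()` amounts to overwriting the main ROM's bytes from offset 0 with the sandbox ROM's contents and marking the ROM dirty. The sketch below models that with a stand-in class. `FakeRom` and `MergeSandbox` are hypothetical, the `bool` return type is an assumption (the real `WriteVector` signature is not shown in this document), and the bounds check is an added safeguard rather than documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for the editor's ROM type; NOT the real yaze Rom class.
class FakeRom {
 public:
  explicit FakeRom(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Overwrites bytes starting at `offset`. Fails, leaving the ROM unchanged,
  // if the write would run past the end of the buffer.
  bool WriteVector(std::size_t offset,
                   const std::vector<std::uint8_t>& bytes) {
    if (offset > data_.size() || bytes.size() > data_.size() - offset) {
      return false;
    }
    std::copy(bytes.begin(), bytes.end(),
              data_.begin() + static_cast<std::ptrdiff_t>(offset));
    dirty_ = true;  // triggers the save prompt in the real editor
    return true;
  }

  const std::vector<std::uint8_t>& vector() const { return data_; }
  bool dirty() const { return dirty_; }

 private:
  std::vector<std::uint8_t> data_;
  bool dirty_ = false;
};

// Copies the entire sandbox ROM over the main ROM, as in step 3 above.
inline bool MergeSandbox(FakeRom& main_rom, const FakeRom& sandbox_rom) {
  return main_rom.WriteVector(0, sandbox_rom.vector());
}
```

Because the whole image is copied, any edits made to the main ROM after the sandbox was created would be overwritten by the merge; a byte-level diff would be needed to avoid that.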

5. Open Questions

  • What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)?
    ➤ Decision: pursue a hybrid package (.z3ed-diff) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog.
  • How should the harness authenticate escalation requests for mutation actions?
    ➤ Still open—evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01).
  • Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary?
    ➤ Investigate during the ImGuiTestHarness spike; compare extending yaze_test jobs versus introducing a lightweight automation runner.

4. Work History & Key Decisions

This section provides a high-level summary of completed workstreams and major architectural decisions.

Resource Catalogue Workstream (RC) - COMPLETE

  • Outcome: A machine-readable API specification for all z3ed commands.
  • Artifact: docs/api/z3ed-resources.yaml is the generated source of truth.
  • Details: Implemented a schema system and serialization for all CLI resources (ROM, Palette, Agent, etc.), enabling AI consumption.

Acceptance Workflow (AW-01, AW-02, AW-03) - COMPLETE

  • Outcome: A complete, human-in-the-loop proposal review system.
  • Components:
    • RomSandboxManager: For creating isolated ROM copies.
    • ProposalRegistry: For tracking proposals, diffs, and logs with disk persistence.
    • ProposalDrawer: An ImGui panel for reviewing, accepting, and rejecting proposals, with full ROM merging capabilities.
  • Integration: The agent run, agent list, and agent diff commands are fully integrated with the registry. The GUI and CLI share the same underlying proposal data.

ImGuiTestHarness (IT-01, IT-02) - CORE COMPLETE

  • Outcome: A gRPC-based service for automated GUI testing.
  • Decision: Chose gRPC for its performance, cross-platform support, and type safety.
  • Features: Implemented 6 core RPCs: Ping, Click, Type, Wait, Assert, and a stubbed Screenshot.
  • Integration: The z3ed agent test command can translate natural language prompts into a sequence of gRPC calls to execute tests.

Files Modified/Created

A summary of files created or changed during the implementation of the core z3ed infrastructure.

Core Services & CLI Handlers:

  • src/cli/service/proposal_registry.{h,cc}
  • src/cli/service/rom_sandbox_manager.{h,cc}
  • src/cli/service/resource_catalog.{h,cc}
  • src/cli/handlers/agent.cc
  • src/cli/handlers/rom.cc

GUI & Application Integration:

  • src/app/editor/system/proposal_drawer.{h,cc}
  • src/app/editor/editor_manager.{h,cc}
  • src/app/core/service/imgui_test_harness_service.{h,cc}
  • src/app/core/proto/imgui_test_harness.proto

Build System (CMake):

  • src/app/app.cmake
  • src/app/emu/emu.cmake
  • src/cli/z3ed.cmake
  • src/CMakeLists.txt

Documentation & API Specs:

  • docs/api/z3ed-resources.yaml
  • docs/z3ed/E6-z3ed-cli-design.md
  • docs/z3ed/E6-z3ed-implementation-plan.md
  • docs/z3ed/E6-z3ed-reference.md
  • docs/z3ed/README.md

Z3ED_AI Flag Migration Guide

Date: October 3, 2025
Status: Complete and Tested

Summary

This document describes how the z3ed AI build flags were consolidated into a single Z3ED_AI master flag. The change also fixes a Gemini integration crash and improves build ergonomics.

Problem Statement

Before (Issues):

  1. Confusing Build Flags: Users had to specify -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON to enable AI features
  2. Crash on Startup: Gemini integration crashed due to PromptBuilder using JSON/YAML unconditionally
  3. Poor Modularity: AI dependencies scattered across multiple conditional blocks
  4. Unclear Documentation: Users didn't know which flags enabled which features

Root Cause of Crash:

// GeminiAIService constructor (ALWAYS runs when Gemini key present)
GeminiAIService::GeminiAIService(const GeminiConfig& config) : config_(config) {
  // This line crashed when YAZE_WITH_JSON=OFF
  prompt_builder_.LoadResourceCatalogue("");  // ❌ Uses nlohmann::json unconditionally
}

The PromptBuilder::LoadResourceCatalogue() function used nlohmann::json and yaml-cpp without guards, causing segfaults when JSON support wasn't compiled in.

Solution

1. Created Z3ED_AI Master Flag

New CMakeLists.txt (CMakeLists.txt at the repository root):

# Master flag for z3ed AI agent features
option(Z3ED_AI "Enable z3ed AI agent features (Gemini/Ollama integration)" OFF)

# Auto-enable dependencies
if(Z3ED_AI)
    message(STATUS "Z3ED_AI enabled: Activating AI agent dependencies (JSON, YAML, httplib)")
    set(YAZE_WITH_JSON ON CACHE BOOL "Enable JSON support" FORCE)
endif()

Benefits:

  • Single flag to enable all AI features: -DZ3ED_AI=ON
  • Auto-manages dependencies (JSON, YAML, httplib)
  • Clear intent: "I want AI agent features"
  • Backward compatible: Old flags still work

2. Fixed PromptBuilder Crash

Added Compile-Time Guard (src/cli/service/ai/prompt_builder.h):

#ifndef YAZE_CLI_SERVICE_PROMPT_BUILDER_H_
#define YAZE_CLI_SERVICE_PROMPT_BUILDER_H_

// Warn at compile time if JSON not available
#if !defined(YAZE_WITH_JSON)
#warning "PromptBuilder requires JSON support. Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON"
#endif

// ... PromptBuilder declaration ...

#endif  // YAZE_CLI_SERVICE_PROMPT_BUILDER_H_

Added Runtime Guard (src/cli/service/ai/prompt_builder.cc):

absl::Status PromptBuilder::LoadResourceCatalogue(const std::string& yaml_path) {
#ifndef YAZE_WITH_JSON
  // Gracefully degrade instead of crashing
  std::cerr << "⚠️  PromptBuilder requires JSON support for catalogue loading\n"
            << "   Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON\n"
            << "   AI features will use basic prompts without tool definitions\n";
  return absl::OkStatus();  // Don't crash, just skip advanced features
#else
  // ... normal loading code ...
#endif
}

Benefits:

  • No more segfaults when GEMINI_API_KEY is set but JSON disabled
  • Clear error messages at compile time and runtime
  • Graceful degradation instead of hard failure

3. Updated z3ed Build Configuration

New z3ed.cmake (src/cli/z3ed.cmake):

# AI Agent Support (Consolidated via Z3ED_AI flag)
if(Z3ED_AI OR YAZE_WITH_JSON)
  target_compile_definitions(z3ed PRIVATE YAZE_WITH_JSON)
  message(STATUS "✓ z3ed AI agent enabled (Ollama + Gemini support)")
  target_link_libraries(z3ed PRIVATE nlohmann_json::nlohmann_json)
endif()

# SSL/HTTPS Support for Gemini
if((Z3ED_AI OR YAZE_WITH_JSON) AND (YAZE_WITH_GRPC OR Z3ED_AI))
  find_package(OpenSSL)
  if(OpenSSL_FOUND)
    target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
    target_link_libraries(z3ed PRIVATE OpenSSL::SSL OpenSSL::Crypto)
    message(STATUS "✓ SSL/HTTPS support enabled for z3ed (Gemini API ready)")
  else()
    message(WARNING "OpenSSL not found - Gemini API will not work")
    message(STATUS "  • Ollama (local) still works without SSL")
  endif()
endif()

Benefits:

  • Clear status messages during build
  • Explains what's enabled and what's missing
  • Guidance on how to fix missing dependencies

Migration Instructions

For Users

Old Way (still works):

cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
cmake --build build --target z3ed

New Way (recommended):

cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

With GUI Testing:

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed

For Developers

Check if AI Features Available:

#ifdef YAZE_WITH_JSON
  // JSON-dependent code (AI responses, config loading)
#else
  // Fallback or warning
#endif

Don't use JSON/YAML directly in agent code. Go through PromptBuilder, which handles the guards for you.

Testing Results

Build Configurations Tested

  1. Minimal Build (no AI):

    cmake -B build
    cmake --build build --target z3ed
    ./build/bin/z3ed --help  # ✅ Works, shows "AI disabled" message
    
  2. AI Enabled (new flag):

    cmake -B build -DZ3ED_AI=ON
    cmake --build build --target z3ed
    export GEMINI_API_KEY="..."
    ./build/bin/z3ed agent plan --prompt "test"  # ✅ Works, connects to Gemini
    
  3. Full Stack (AI + gRPC):

    cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
    cmake --build build --target z3ed
    ./build/bin/z3ed agent test --prompt "..."  # ✅ Works, GUI automation available
    

Crash Scenarios Fixed

Before:

export GEMINI_API_KEY="..."
cmake -B build  # JSON disabled by default
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "test"
# Result: Segmentation fault (139) ❌

After:

export GEMINI_API_KEY="..."
cmake -B build  # JSON disabled by default
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "test"
# Result: ⚠️ Warning message, graceful degradation ✅

# Same key, with AI support enabled
export GEMINI_API_KEY="..."
cmake -B build -DZ3ED_AI=ON  # JSON enabled
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "Place a tree at 10, 10"
# Result: ✅ Gemini responds, creates proposal

Impact on Build Modularization

This change aligns with the goals in build_modularization_plan.md and build_modularization_implementation.md:

Before:

  • Scattered conditional compilation flags
  • Dependencies unclear
  • Hard to add to modular library system

After:

  • Clear feature flag: Z3ED_AI
  • Can create libyaze_agent.a with if(Z3ED_AI) guard
  • Easy to make optional in modular build:
    if(Z3ED_AI)
      add_library(yaze_agent STATIC ${YAZE_AGENT_SOURCES})
      target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
      target_link_libraries(yaze_agent PUBLIC nlohmann_json::nlohmann_json yaml-cpp)
    endif()
    

Future Modular Build Integration

When implementing modular builds (Phase 6-7 from build_modularization_plan.md):

# src/cli/agent/agent_library.cmake (NEW)
if(Z3ED_AI)
  add_library(yaze_agent STATIC
    cli/service/ai/ai_service.cc
    cli/service/ai/ollama_ai_service.cc
    cli/service/ai/gemini_ai_service.cc
    cli/service/ai/prompt_builder.cc
    cli/service/agent/conversational_agent_service.cc
    # ... other agent sources
  )
  
  target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
  
  target_link_libraries(yaze_agent PUBLIC
    yaze_util
    nlohmann_json::nlohmann_json
    yaml-cpp
  )
  
  # Optional SSL for Gemini
  if(OpenSSL_FOUND)
    target_compile_definitions(yaze_agent PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
    target_link_libraries(yaze_agent PRIVATE OpenSSL::SSL OpenSSL::Crypto)
  endif()
  
  message(STATUS "✓ yaze_agent library built with AI support")
endif()

Benefits for Modular Build:

  • Agent library clearly optional
  • Can rebuild just agent library when AI code changes
  • z3ed links to yaze_agent instead of individual sources
  • Faster incremental builds

Documentation Updates

Updated files:

  • docs/z3ed/README.md - Added Z3ED_AI flag documentation
  • docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md - This document
  • 📋 TODO: Update docs/02-build-instructions.md with Z3ED_AI flag
  • 📋 TODO: Update CI/CD workflows to use Z3ED_AI

Backward Compatibility

Old Flags Still Work

# These all enable AI features:
cmake -B build -DYAZE_WITH_JSON=ON          # ✅ Works
cmake -B build -DYAZE_WITH_GRPC=ON          # ✅ Works (auto-enables JSON)
cmake -B build -DZ3ED_AI=ON                 # ✅ Works (new way)

# Combining flags:
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON  # ✅ Full stack

No Breaking Changes

  • Existing build scripts continue to work
  • CI/CD pipelines don't need immediate updates
  • Users can migrate at their own pace

Next Steps

Short Term (Complete)

  • Fix Gemini crash
  • Create Z3ED_AI master flag
  • Update z3ed build configuration
  • Test all build configurations
  • Update README documentation
  • Update CI/CD workflows to use -DZ3ED_AI=ON
  • Add Z3ED_AI to preset configurations
  • Update main build instructions docs
  • Create agent library module (see above)

Long Term (Integration with Modular Build)

  • Implement yaze_agent library (Phase 6)
  • Add agent to modular dependency graph
  • Create agent-specific unit tests
  • Optional: Split Gemini/Ollama into separate modules

References

  • Related Issues: Gemini crash (segfault 139) with GEMINI_API_KEY set
  • Related Docs:
    • docs/build_modularization_plan.md - Future library structure
    • docs/build_modularization_implementation.md - Implementation guide
    • docs/z3ed/README.md - User-facing z3ed documentation
    • docs/z3ed/AGENT-ROADMAP.md - AI agent development plan

Summary

This migration successfully:

  1. Fixed crash: Gemini no longer segfaults when JSON disabled
  2. Simplified builds: One flag (Z3ED_AI) replaces multiple flags
  3. Improved UX: Clear error messages and build status
  4. Maintained compatibility: Old flags still work
  5. Prepared for modularization: Clear path to libyaze_agent.a
  6. Tested thoroughly: All configurations verified working

The z3ed AI agent is now production-ready with Gemini and Ollama support!

6. References

Active Documentation:

  • E6-z3ed-cli-design.md - Overall CLI design and architecture
  • E6-z3ed-reference.md - Technical command and API reference
  • docs/api/z3ed-resources.yaml - Machine-readable API reference (generated)

Source Code:

  • src/cli/service/ - Core services (proposal registry, sandbox manager, resource catalog)
  • src/app/editor/system/proposal_drawer.{h,cc} - GUI review panel
  • src/app/core/service/imgui_test_harness_service.{h,cc} - gRPC automation server

Last Updated: [Current Date]
Contributors: @scawful, GitHub Copilot
License: Same as YAZE (see ../../LICENSE)