z3ed Agentic Workflow Plan

Last Updated: October 2, 2025
Status: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯

📋 Quick Start: See README.md for essential links and project status.

Executive Summary

The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:

✅ Completed Phases:

Phase 6: Resource Catalogue - Machine-readable API specs for AI consumption
AW-01/02/03: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
AW-04: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
IT-01: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
IT-02: CLI Agent Test - Natural language → automated GUI testing (implementation complete)

🎯 Active Phase:

Conversational Agent Implementation: ✅ Foundation complete, LLM function calling ✅ COMPLETE (Oct 3, 2025)

📋 Next Phases (Updated Oct 3, 2025):

Priority 1: Live LLM Testing (1-2h) - Verify function calling with Ollama/Gemini
Priority 2: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
Priority 3: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
Priority 4: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
Priority 5: Windows Cross-Platform Testing - Validate on Windows with vcpkg
Deprioritized: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration

Recent Accomplishments (Updated: October 2025):

✅ IT-08 Enhanced Error Reporting Complete: Full diagnostic capture operational
- IT-08a: Screenshot RPC with SDL capture (BMP format, 1536x864)
- IT-08b: Auto-capture execution context on failures (frame, window, widget)
- IT-08c: Widget state dumps with comprehensive UI snapshot (JSON, 45 min)
- Proto schema updated with screenshot_path, failure_context, widget_state
- GetTestResults RPC returns complete failure diagnostics
✅ IT-09 CLI Suite Commands Landed: End-to-end suite orchestration for CI
- agent test suite run handles groups, tags, params, retries, and emits summaries plus default JUnit XML under test-results/junit/
- agent test suite validate performs structural linting with exit codes
- NEW agent test suite create interactive builder writes YAML suites to tests/<name>.yaml (with --force overwrite) and guides group/test entry
✅ IT-08a Screenshot RPC Complete: SDL-based screenshot capture operational
- Captures 1536x864 BMP files via SDL_RenderReadPixels
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
✅ Policy Framework Complete: PolicyEvaluator service fully integrated with ProposalDrawer GUI
- 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
- 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
- GUI displays color-coded violations (⛔ critical, ⚠️ warning, ℹ️ info)
- Accept button gating based on policy violations with override confirmation dialog
- Example policy configuration at .yaze/policies/agent.yaml
✅ E2E Validation Complete: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
- Window detection timing issue resolved with 10-frame yield buffer in Wait RPC
- Thread safety issues resolved with shared_ptr state management
- Test harness validated on macOS ARM64 with real YAZE GUI interactions
gRPC Test Harness (IT-01 & IT-02): Full implementation complete with natural language → GUI testing
✅ Test Recording & Replay (IT-07): JSON recorder/replayer implemented, CLI and harness wired, end-to-end regression workflow captured in scripts/test_record_replay_e2e.sh
Build System: Hardened CMake configuration with reliable gRPC integration
Proposal Workflow: Agentic proposal system fully operational (create, list, diff, review in GUI)
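
The policy severity levels listed above (Info, Warning, Critical) imply a simple gating rule for the Accept button. The following is a minimal illustrative sketch in Python, not the PolicyEvaluator implementation; the names `Severity`, `Violation`, and `can_accept` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = "info"          # informational only
    WARNING = "warning"    # overridable by the reviewer
    CRITICAL = "critical"  # blocks acceptance


@dataclass(frozen=True)
class Violation:
    policy: str        # e.g. "forbidden_range", one of the four policy types
    severity: Severity
    message: str


def can_accept(violations: list[Violation], override_confirmed: bool = False) -> bool:
    """Return True if the Accept button should be enabled (hypothetical helper)."""
    severities = {v.severity for v in violations}
    if Severity.CRITICAL in severities:
        return False               # critical violations always block
    if Severity.WARNING in severities:
        return override_confirmed  # warnings need an explicit override
    return True                    # only info-level findings (or none) remain
```

Under this reading, the override confirmation dialog only matters when the worst violation is a Warning.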

Known Limitations & Improvement Opportunities:

Screenshot Auto-Capture: ✅ Complete - Failures now trigger automatic screenshot and context capture (IT-08b)
Test Introspection: ✅ Complete - GetTestStatus/ListTests/GetTestResults RPCs operational
Widget Discovery: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
Test Recording: ✅ Complete - StartRecording/StopRecording/ReplayTest RPCs with JSON scripts (IT-07)
Synchronous Wait: Async tests return immediately → add blocking mode or result polling
Error Context: ✅ Complete - Failures include screenshots and widget state dumps (IT-08)
Performance: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)
YAML Parsing: Simple parser implemented, consider yaml-cpp for complex scenarios
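
The ~166ms figure in the Performance item is consistent with the 10-frame yield buffer mentioned under E2E validation, if the GUI renders at 60 FPS. The frame rate is an assumption here (the error report example in this document shows a framerate of 60.0). A short worked calculation:

```python
def wait_overhead_ms(yield_frames: int, fps: float) -> float:
    """Time spent yielding `yield_frames` frames at a given frame rate."""
    return yield_frames / fps * 1000.0


# 10-frame yield buffer at an assumed 60 FPS
overhead = wait_overhead_ms(10, 60.0)
print(f"{overhead:.1f} ms")  # prints "166.7 ms"
```

At a lower frame rate the same buffer costs proportionally more, for example roughly 333 ms at 30 FPS.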

Time Investment: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)

Quick Reference

Start Test Harness:

./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

Test All RPCs:

./scripts/test_harness_e2e.sh

Create Proposal:

./build/bin/z3ed agent run "Test prompt" --sandbox
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id <ID>

Review in GUI:

Open YAZE → Debug → Agent Proposals
Select proposal → Review → Accept/Reject/Delete

1. Current Priorities (Week of Oct 2-8, 2025)

Status: Core Infrastructure Complete ✅ | Test Harness Enhancement Phase 🔧

Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE

Goal: Transform test harness from basic automation to comprehensive testing platform and deliver holistic error reporting across YAZE
Time Estimate: 20-25 hours total (7.5h completed in IT-07)
Blocking Dependency: IT-01 Complete ✅

Motivation: The harness now supports AI workflows, regression capture, and automation—but error surfaces remain shallow:

AI Agent Development: Still needs widget discovery for adaptive planning
Regression Testing: Recording/replay finished; reporting pipeline must surface actionable failures
CI/CD Integration: Requires reliable artifacts (logs, screenshots, structured context)
Debugging: Failures lack screenshots, widget hierarchies, and EditorManager state snapshots
Application Consistency: z3ed, EditorManager, and core services emit heterogeneous error formats

IT-05: Test Introspection API (6-8 hours)

Status (Oct 2, 2025): ✅ Completed

Highlights:

imgui_test_harness.proto now exposes GetTestStatus, ListTests, and GetTestResults RPCs backed by TestManager's execution history.
CLI commands (z3ed agent test status|list|results) are fully wired with JSON/YAML formatting, follow-mode polling, and filtering options.
GuiAutomationClient provides typed wrappers for introspection APIs so agent workflows can poll status programmatically.
Regression coverage lives in scripts/test_harness_e2e.sh. A slimmer introspection smoke test (scripts/test_introspection_e2e.sh) is queued for CI automation; until then, manual verification paths are documented.

Future Enhancements:

Capture richer assertion metadata (expected/actual pairs) for improved failure messaging when the underlying harness exposes it.
Add pagination helpers to CLI once history volume grows (low priority).

Example Usage:

# Queue a test
z3ed agent test --prompt "Open Overworld editor"

# Poll for completion
z3ed agent test status --test-id grpc_click_12345678

# Retrieve results
z3ed agent test results --test-id grpc_click_12345678 --format json

API Schema:

message GetTestStatusRequest {
  string test_id = 1;
}

message GetTestStatusResponse {
  enum Status { QUEUED = 0; RUNNING = 1; PASSED = 2; FAILED = 3; TIMEOUT = 4; }
  Status status = 1;
  int64 execution_time_ms = 2;
  string error_message = 3;
  repeated string assertion_failures = 4;
}

message ListTestsRequest {
  string category_filter = 1;  // Optional: "grpc", "unit", etc.
  int32 page_size = 2;
  string page_token = 3;
}

message ListTestsResponse {
  repeated TestInfo tests = 1;
  string next_page_token = 2;
}

message TestInfo {
  string test_id = 1;
  string name = 2;
  string category = 3;
  int64 last_run_timestamp_ms = 4;
  int32 total_runs = 5;
  int32 pass_count = 6;
  int32 fail_count = 7;
}
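
A client that consumes GetTestStatus typically polls until the status leaves QUEUED/RUNNING. The sketch below mirrors the `Status` enum from the schema above. The `get_status` callable stands in for a real gRPC call and is hypothetical, as are the default timeout and poll interval.

```python
import time
from enum import Enum
from typing import Callable


class Status(Enum):
    QUEUED = 0
    RUNNING = 1
    PASSED = 2
    FAILED = 3
    TIMEOUT = 4


TERMINAL = {Status.PASSED, Status.FAILED, Status.TIMEOUT}


def wait_for_result(
    get_status: Callable[[str], Status],
    test_id: str,
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll until the test reaches a terminal state or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(test_id)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            return Status.TIMEOUT  # client-side timeout, not reported by the server
        sleep(poll_interval_s)
```

Injecting `sleep` keeps the loop easy to exercise in unit tests without real delays.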

IT-06: Widget Discovery API

Implementation Tasks:

Add DiscoverWidgets RPC:
- Enumerate all windows currently open in YAZE GUI
- List all interactive widgets (buttons, inputs, menus, tabs) per window
- Return widget metadata: ID, type, label, enabled state, position
- Support filtering by window name or widget type
AI-Friendly Output Format:
- JSON schema describing available interactions
- Natural language descriptions for each widget
- Suggested action templates (e.g., "Click button:{label}")

Example Usage:

# Discover all widgets
z3ed gui discover

# Filter by window
z3ed gui discover --window "Overworld"

# Get only buttons
z3ed gui discover --type button

API Schema (current):

message DiscoverWidgetsRequest {
  string window_filter = 1;
  WidgetType type_filter = 2;
  string path_prefix = 3;
  bool include_invisible = 4;
  bool include_disabled = 5;
}

message WidgetBounds {
  float min_x = 1;
  float min_y = 2;
  float max_x = 3;
  float max_y = 4;
}

message DiscoveredWidget {
  string path = 1;
  string label = 2;
  string type = 3;
  string description = 4;
  string suggested_action = 5;
  bool visible = 6;
  bool enabled = 7;
  WidgetBounds bounds = 8;
  uint32 widget_id = 9;
  int64 last_seen_frame = 10;
  int64 last_seen_at_ms = 11;
  bool stale = 12;
}

message DiscoveredWindow {
  string name = 1;
  bool visible = 2;
  repeated DiscoveredWidget widgets = 3;
}

message DiscoverWidgetsResponse {
  repeated DiscoveredWindow windows = 1;
  int32 total_widgets = 2;
  int64 generated_at_ms = 3;
}

Benefits for AI Agents:

LLMs can dynamically learn available GUI interactions
Agents can adapt to UI changes without hardcoded widget names
Natural language descriptions enable better prompt engineering
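
To illustrate how an agent might turn a discovery response into the suggested action templates, here is a small sketch. `Widget` copies only a subset of the `DiscoveredWidget` fields, and `actionable_targets` is a hypothetical helper, not part of the harness.

```python
from dataclasses import dataclass


@dataclass
class Widget:
    # Subset of the DiscoveredWidget fields shown in the schema above
    path: str
    label: str
    type: str
    visible: bool = True
    enabled: bool = True
    stale: bool = False


def actionable_targets(widgets: list[Widget]) -> list[str]:
    """Return "Click <type>:<label>" templates for usable widgets only."""
    return [
        f"Click {w.type}:{w.label}"
        for w in widgets
        if w.visible and w.enabled and not w.stale
    ]
```

Filtering on `visible`, `enabled`, and `stale` keeps the prompt given to the LLM limited to interactions that can plausibly succeed right now.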

IT-07: Test Recording & Replay ✅ COMPLETE (Oct 2, 2025)

Highlights:

Implemented StartRecording, StopRecording, and ReplayTest RPCs with persistent JSON scripts
Added CLI commands: z3ed test record start|stop, z3ed test replay
Scripts stored in tests/gui/ with metadata (name, tags, assertions, timing hints)
Added regression coverage via scripts/test_record_replay_e2e.sh
Documentation updates in E6-z3ed-reference.md and new quick-start snippets in README
Confirmed compatibility with natural language prompts generated by the agent workflow

Outcome: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).

IT-08: Enhanced Error Reporting (5-7 hours) ✅ COMPLETE

Status: IT-08a Complete ✅ | IT-08b Complete ✅ | IT-08c Complete ✅
Objective: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.

Implementation Tracks:

Harness-Level Diagnostics

✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
✅ IT-08b: Auto-capture screenshots and context on test failure using shared helper that writes to ${TMPDIR}/yaze/test-results/<test_id>/
✅ IT-08c: Widget tree JSON dumps emitted alongside failure context
⏳ HTML bundle exporter (screenshots + widget tree) remains a stretch goal

CLI Experience Improvements

Surface artifact paths, failure context, and widget state in CLI output (DONE)
Standardize error envelopes in z3ed (absl::Status + structured payload)
Add --format html flag to emit rich bundles (planned)
Integrate with recording workflow: replay failures using captured state (planned)

EditorManager & Application Integration

Introduce shared ErrorAnnotatedResult utility exposing status, context, actionable_hint
Adapt EditorManager subsystems (ProposalDrawer, OverworldEditor, DungeonEditor) to adopt the shared structure
Add in-app failure overlay (ImGui modal) that references harness artifacts when available
Hook proposal acceptance/replay flows to display enriched diagnostics when sandbox merges fail

Telemetry & Storage Hooks (Stretch)

Optionally emit error metadata to a ring buffer for future analytics/telemetry workstreams
Provide CLI flag --error-artifact-dir to customize storage (supports CI separation)

Error Report Example:

{
  "test_id": "grpc_assert_12345678",
  "failure_time": "2025-10-02T14:23:45Z",
  "assertion": "visible:Overworld",
  "expected": "visible",
  "actual": "hidden",
  "screenshot": "/tmp/yaze/test-results/grpc_assert_12345678/failure_1696357220000.bmp",
  "widget_state": {
    "active_window": "Main Window",
    "focused_widget": null,
    "visible_windows": ["Main Window", "Debug"],
    "overworld_window": { "exists": true, "visible": false, "position": "0,0,0,0" }
  },
  "execution_context": {
    "frame_count": 1234,
    "recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
    "resource_stats": { "memory_mb": 245, "textures": 12, "framerate": 60.0 },
    "editor_manager_snapshot": {
      "active_module": "OverworldEditor",
      "dirty_buffers": ["overworld_layer_1"],
      "last_error": null
    }
  }
}
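
A report in this shape is straightforward to condense for CLI or CI output. The sketch below assumes the field names from the example above; `summarize_failure` is a hypothetical helper.

```python
import json


def summarize_failure(report_json: str) -> str:
    """Build a one-line summary from a failure report like the one above."""
    report = json.loads(report_json)
    events = report.get("execution_context", {}).get("recent_events", [])
    last_event = events[-1] if events else "n/a"
    return (
        f"{report['test_id']}: assertion '{report['assertion']}' "
        f"expected {report['expected']!r}, got {report['actual']!r} "
        f"(last event: {last_event}; screenshot: {report.get('screenshot', 'none')})"
    )
```

The optional fields are read with `.get()` so a report without a screenshot or execution context still produces a summary.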

IT-09: CI/CD Integration ✅ CLI Tooling Shipped

Delivered (Oct 3, 2025):

Standardized Suite Runtime

YAML suite parser/loader with group dependencies and retry semantics
z3ed agent test suite run exposes --group, --tag, --param, --retries, --ci-mode, and --junit
Automatic JUnit XML emission to test-results/junit/<suite>.xml

Validation & Authoring UX

z3ed agent test suite validate surfaces structural linting with annotated exit codes (0 pass, 1 fail, 2 error)
NEW z3ed agent test suite create <name> interactive flow scaffolds suites under tests/, prompting for metadata, groups, replay scripts, tags, and key=value parameters (with --force overwrite support)

Reporting

Text and JSON summaries include per-test assertions and retry outcomes
Default output directory layout ready for CI artifact upload

Next Steps (post-CLI follow-through):

Publish canonical tests/smoke.yaml / tests/regression.yaml samples
Add .github/workflows/gui-tests.yml template referencing the new runner
Document flaky-test mitigation patterns, including recommended retry counts
Wire suite execution output into docs/CI dashboards for quick triage

Test Suite Format:

name: YAZE GUI Test Suite
description: Comprehensive tests for YAZE editor functionality
version: 1.0

config:
  timeout_per_test: 30s
  retry_on_failure: 2
  parallel_execution: false

test_groups:
  - name: smoke
    description: Fast tests for basic functionality
    tests:
      - tests/overworld_load.json
      - tests/dungeon_load.json
  
  - name: regression
    description: Full test suite for release validation
    depends_on: [smoke]
    tests:
      - tests/palette_edit.json
      - tests/sprite_load.json
      - tests/rom_save.json
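
The `depends_on` field implies that groups must run in dependency order. One way to derive that order is a topological sort. This sketch uses Python's standard `graphlib` on already-parsed group dictionaries; it is not the z3ed suite loader, and `group_run_order` is a hypothetical name.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def group_run_order(test_groups: list[dict]) -> list[str]:
    """Order groups so that every group runs after the groups it depends on."""
    graph = {g["name"]: set(g.get("depends_on", [])) for g in test_groups}
    # static_order() raises graphlib.CycleError on circular dependencies
    return list(TopologicalSorter(graph).static_order())


groups = [
    {"name": "regression", "depends_on": ["smoke"], "tests": ["tests/palette_edit.json"]},
    {"name": "smoke", "tests": ["tests/overworld_load.json"]},
]
print(group_run_order(groups))  # prints ['smoke', 'regression']
```

A validator could reuse the same call to reject suites with circular `depends_on` chains before any test runs.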

GitHub Actions Integration:

name: GUI Tests
on: [push, pull_request]

jobs:
  gui-tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build YAZE with test harness
        run: |
          cmake -B build -DYAZE_WITH_GRPC=ON
          cmake --build build --target yaze --target z3ed
      - name: Start test harness
        run: |
          ./build/bin/yaze --enable_test_harness --headless &
          sleep 5
      - name: Run test suite
        run: |
          ./build/bin/z3ed test run-suite tests/suite.yaml --ci-mode
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-results/

IT-10: Collaborative Editing & Multiplayer Sessions ⏸️ DEPRIORITIZED

Status: Postponed in favor of LLM integration work
Rationale: While collaborative editing is an interesting feature, practical LLM integration provides more immediate value for the agentic workflow system. The core infrastructure is complete, and enabling real AI agents to interact with z3ed is the critical next step.

Future Consideration: IT-10 may be revisited after LLM integration is production-ready and validated by users. The collaborative editing design is preserved in the documentation for future reference.

See: LLM-INTEGRATION-PLAN.md for the new priority work.

Priority 2: LLM Integration (Ollama + Gemini + Claude) 🤖 NEW PRIORITY

Goal: Enable practical AI-driven ROM modifications with local and remote LLM providers
Time Estimate: 12-15 hours total
Status: Ready to Implement

Why This is Critical: The z3ed infrastructure is complete (CLI, proposals, sandbox, GUI automation), but currently uses MockAIService with hardcoded commands. Real LLM integration unlocks the full potential of the agentic workflow system.

📋 Complete Documentation:

LLM-INTEGRATION-PLAN.md - Detailed technical implementation guide (60+ pages)
LLM-IMPLEMENTATION-CHECKLIST.md - Step-by-step task list with checkboxes
LLM-INTEGRATION-SUMMARY.md - Executive summary and getting started

Implementation Phases:

Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE

Create OllamaAIService class with health checks and model management
Wire into agent commands with provider selection mechanism
Add CMake configuration for httplib support
End-to-end testing with qwen2.5-coder:7b model

Key Benefits: Local, free, private, no rate limits

Phase 2: Gemini Fixes (2-3 hours)

Fix existing GeminiAIService implementation
Improve prompting with resource catalogue
Add markdown code block stripping for reliable parsing
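
The markdown-stripping step in Phase 2 addresses a common failure mode: models often wrap structured output in a fenced code block, which breaks direct parsing. A minimal sketch of the idea, in Python for illustration only (the real service is C++; `strip_code_fence` is a hypothetical name):

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker
# Opening fence with optional language tag, lazily captured body, closing fence
_FENCED_BLOCK = re.compile(rf"{FENCE}[\w+-]*[ \t]*\n(.*?)\n?{FENCE}", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the body of the first fenced block, or the trimmed text if unfenced."""
    match = _FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()
```

This only extracts the first fenced block; replies containing several blocks would need a policy for choosing among them.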

Phase 3: Claude Integration (2-3 hours)

Create ClaudeAIService class
Implement Messages API integration
Same interface as other services for easy swapping

Phase 4: Enhanced Prompt Engineering (3-4 hours)

Create PromptBuilder utility class
Load resource catalogue (z3ed-resources.yaml) into system prompts
Add few-shot examples to improve command-generation accuracy (target: >90%)
Inject ROM context (current state, loaded editors)

Quick Start After Implementation:

# Install Ollama
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b

# Configure z3ed
export YAZE_AI_PROVIDER=ollama

# Use natural language
z3ed agent run --prompt "Make all soldier armor red" --rom zelda3.sfc --sandbox
z3ed agent diff  # Review changes

Testing Script: ./scripts/quickstart_ollama.sh (automated setup validation)

IT-10 Design Reference: Collaborative Editing ⏸️ (preserved for future reference)

Collaboration Server:
- WebSocket server for real-time client communication
- Session management (create, join, authentication)
- Edit event broadcasting to all connected clients
- Conflict resolution (last-write-wins with timestamps)
Collaboration Client:
- Connect to remote sessions via WebSocket
- Send local edits to server
- Receive and apply remote edits
- ROM state synchronization on join
Edit Event Protocol:
- Protobuf definitions for edit events (tile, sprite, palette, map)
- Cursor position tracking
- AI proposal sharing and voting
- Session state messages
GUI Integration:
- Status bar showing connected users
- Collaboration panel (user list, activity feed)
- Live cursor rendering (color-coded per user)
- Proposal voting UI (Accept/Reject/Discuss)
Session Recording & Replay:
- Record all events to YAML/JSON file
- Replay engine with timeline controls
- Export session summaries for review
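
The "last-write-wins with timestamps" rule listed for the collaboration server can be sketched as follows. The `EditEvent` shape and the `client_id` tie-breaker are assumptions made for illustration; the design above does not specify how equal timestamps are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    target: str        # hypothetical key for the edited item, e.g. "tile:12:34"
    value: int
    timestamp_ms: int
    client_id: str     # assumed tie-breaker for identical timestamps


def apply_lww(state: dict[str, EditEvent], event: EditEvent) -> bool:
    """Apply `event` if it is newer than the stored edit; return True if applied."""
    current = state.get(event.target)
    if current is None or (event.timestamp_ms, event.client_id) > (
        current.timestamp_ms,
        current.client_id,
    ):
        state[event.target] = event
        return True
    return False
```

Because the comparison is deterministic, every client that sees the same set of events converges on the same state regardless of arrival order.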

CLI Commands:

# Host a collaborative session
z3ed collab host --port 5000 --password "dev123"

# Join a session
z3ed collab join yaze://connect/192.168.1.100:5000

# List active sessions (LAN discovery)
z3ed collab list

# Disconnect from session
z3ed collab disconnect

# Replay recorded session
z3ed collab replay session_2025_10_02.yaml --speed 2x

User Stories:

US-1: As a ROM hacker, I want to host a collaborative session so my teammates can join and work together
US-2: As a collaborator, I want to see other users' edits in real-time so we stay synchronized
US-3: As a team lead, I want to use AI agents with my team so we can all benefit from automation (shared proposals with majority voting)
US-4: As a collaborator, I want to see where other users are working so we don't conflict (live cursors)
US-5: As a project manager, I want to record collaborative sessions so we can review work later

Benefits:

Real-Time Collaboration: Multiple users can edit the same ROM simultaneously
Shared AI Assistance: Team votes on AI proposals before execution
Conflict Prevention: Live cursors show where teammates are working
Audit Trail: Session recording for review and compliance
Remote Teams: Connect over LAN or internet (with optional encryption)

Technical Architecture:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│  Client A    │────►│  Collab Server  │◄────│  Client B    │
│  (Host)      │     │  (WebSocket)    │     │              │
└──────────────┘     │                 │     └──────────────┘
                     │  - Session Mgmt │
                     │  - Event Broker │     ┌──────────────┐
                     │  - Conflict Res │◄────│  Client C    │
                     └─────────────────┘     └──────────────┘

Security Considerations:

Optional password protection for sessions
Read-only vs read-write access levels
ROM checksum verification (prevents desync)
Rate limiting (prevent spam/DoS)
Optional TLS/SSL encryption for public internet

See: IT-10-COLLABORATIVE-EDITING.md for complete specification

Priority 3: Windows Cross-Platform Testing 🪟

Goal: Validate z3ed and test harness on Windows
Time Estimate: 8-10 hours
Blocking Dependency: IT-05 Complete (need stable API)

📋 Detailed Guides: See NEXT_PRIORITIES_OCT2.md for complete implementation breakdowns with code examples.

2. Workstreams Overview

Workstream	Goal	Status	Notes
Resource Catalogue	Machine-readable CLI specs for AI consumption	✅ Complete	`docs/api/z3ed-resources.yaml` generated
Acceptance Workflow	Human review/approval of agent proposals	✅ Complete	ProposalDrawer with ROM merging operational
ImGuiTest Bridge	Automated GUI testing via gRPC	✅ Complete	All 3 phases done (11 hours)
Verification Pipeline	Layered testing + CI coverage	📋 In Progress	E2E validation phase
Telemetry & Learning	Capture signals for improvement	📋 Planned	Optional/opt-in (Phase 8)

Completed Work Summary

Resource Catalogue (RC) ✅:

CLI flag passthrough and resource catalog system
agent describe exports YAML/JSON schemas
docs/api/z3ed-resources.yaml maintained
All ROM/Palette/Overworld/Dungeon/Patch commands documented

Acceptance Workflow (AW-01/02/03) ✅:

ProposalRegistry with disk persistence and cross-session tracking
RomSandboxManager for isolated ROM copies
agent list and agent diff commands
ProposalDrawer GUI: List/detail views, Accept/Reject/Delete, ROM merging
Integrated into EditorManager (Debug → Agent Proposals)

ImGuiTestHarness (IT-01) ✅:

Phase 1: gRPC infrastructure (6 RPC methods)
Phase 2: TestManager integration with dynamic tests
Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs)
E2E test script: scripts/test_harness_e2e.sh
Documentation: IT-01-QUICKSTART.md

3. Task Backlog

ID	Task	Workstream	Type	Status	Dependencies
RC-01	Define schema for `ResourceCatalog` entries and implement serialization helpers.	Resource Catalogue	Code	✅ Done	Schema system complete with all resource types documented
RC-02	Auto-generate `docs/api/z3ed-resources.yaml` from command annotations.	Resource Catalogue	Tooling	✅ Done	Generated and committed to docs/api/
RC-03	Implement `z3ed agent describe` CLI surface returning JSON schemas.	Resource Catalogue	Code	✅ Done	Both YAML and JSON output formats working
RC-04	Integrate schema export with TUI command palette + help overlays.	Resource Catalogue	UX	📋 Planned	RC-03
RC-05	Harden CLI command routing/flag parsing to unblock agent automation.	Resource Catalogue	Code	✅ Done	Fixed rom info handler to use FLAGS_rom
AW-01	Implement sandbox ROM cloning and tracking (`RomSandboxManager`).	Acceptance Workflow	Code	✅ Done	ROM sandbox manager operational with lifecycle management
AW-02	Build proposal registry service storing diffs, logs, screenshots.	Acceptance Workflow	Code	✅ Done	ProposalRegistry implemented with disk persistence
AW-03	Add ImGui drawer for proposals with accept/reject controls.	Acceptance Workflow	UX	✅ Done	ProposalDrawer GUI complete with ROM merging
AW-04	Implement policy evaluation for gating accept buttons.	Acceptance Workflow	Code	✅ Done	PolicyEvaluator service with 4 policy types (test, constraint, forbidden, review), GUI integration complete (6 hours)
AW-05	Draft `.z3ed-diff` hybrid schema (binary deltas + JSON metadata).	Acceptance Workflow	Design	📋 Planned	AW-01
IT-01	Create `ImGuiTestHarness` IPC service embedded in `yaze_test`.	ImGuiTest Bridge	Code	✅ Done	Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours)
IT-02	Implement CLI agent step translation (`imgui_action` → harness call).	ImGuiTest Bridge	Code	✅ Done	`z3ed agent test` command with natural language prompts (7.5 hours)
IT-03	Provide synchronization primitives (`WaitForIdle`, etc.).	ImGuiTest Bridge	Code	✅ Done	Wait RPC with condition polling already implemented in IT-01 Phase 3
IT-04	Complete E2E validation with real YAZE widgets	ImGuiTest Bridge	Test	✅ Done	IT-02 - All 5 functional tests passing, window detection fixed with yield buffer
IT-05	Add test introspection RPCs (GetTestStatus, ListTests, GetResults)	ImGuiTest Bridge	Code	✅ Done	IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025)
IT-06	Implement widget discovery API for AI agents	ImGuiTest Bridge	Code	📋 Planned	IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs
IT-07	Add test recording/replay for regression testing	ImGuiTest Bridge	Code	✅ Done	IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts
IT-08	Enhance error reporting with screenshots and state dumps	ImGuiTest Bridge	Code	✅ Done	IT-01 - Screenshot RPC, auto-capture, widget state dumps complete (Oct 2, 2025)
IT-08a	Screenshot RPC implementation (SDL capture)	ImGuiTest Bridge	Code	✅ Done	IT-01 - Screenshot capture complete (Oct 2, 2025)
IT-08b	Auto-capture screenshots on test failure	ImGuiTest Bridge	Code	✅ Done	IT-08a - Integrated with TestManager (Oct 2, 2025)
IT-08c	Widget state dumps and execution context	ImGuiTest Bridge	Code	✅ Done	IT-08b - Enhanced failure diagnostics (Oct 2, 2025)
IT-09	Create standardized test suite format for CI integration	ImGuiTest Bridge	Infra	✅ Done	IT-07 - CLI suite run/validate/create commands, JUnit output
IT-10	Collaborative editing & multiplayer sessions with shared AI	Collaboration	Feature	📋 Planned	IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours)
VP-01	Expand CLI unit tests for new commands and sandbox flow.	Verification Pipeline	Test	📋 Planned	RC/AW tasks
VP-02	Add harness integration tests with replay scripts.	Verification Pipeline	Test	📋 Planned	IT tasks
VP-03	Create CI job running agent smoke tests with `YAZE_WITH_JSON`.	Verification Pipeline	Infra	📋 Planned	VP-01, VP-02
TL-01	Capture accept/reject metadata and push to telemetry log.	Telemetry & Learning	Code	📋 Planned	AW tasks
TL-02	Build anonymized metrics exporter + opt-in toggle.	Telemetry & Learning	Infra	📋 Planned	TL-01

Status Legend: 🔄 Active · 📋 Planned · ✅ Done

Progress Summary:

✅ Completed: 19 tasks (68%)
🔄 Active: 0 tasks (0%)
📋 Planned: 9 tasks (32%)
Total: 28 tasks, counting the IT-08a/b/c sub-tasks as separate rows (includes the test harness enhancements and 1 collaborative feature)

4. Immediate Next Steps (Week of Oct 1-7, 2025)

Priority 0: Testing & Validation (Active)

TEST: Complete end-to-end proposal workflow
- Launch YAZE and verify ProposalDrawer displays live proposals
- Test Accept action → verify ROM merge and save prompt
- Test Reject and Delete actions
- Validate filtering and refresh functionality
Widget ID Refactoring (Started Oct 2, 2025) 🎯 NEW
- ✅ Added widget_id_registry to build system
- ✅ Registered 13 Overworld toolset buttons with hierarchical IDs
- 📋 Next: Test widget discovery and update test harness
- See: WIDGET_ID_REFACTORING_PROGRESS.md

Priority 1: ImGuiTestHarness Foundation (IT-01) ✅ COMPLETE

Rationale: Required for automated GUI testing and remote control of YAZE for AI workflows
Decision: ✅ Use gRPC - Production-grade, cross-platform, type-safe (see IT-01-grpc-evaluation.md)

Status: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Complete ✅

Phase 1: gRPC Infrastructure ✅ COMPLETE

✅ Add gRPC to build system via FetchContent
✅ Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
✅ Implement gRPC server with all 6 RPC stubs
✅ Test with grpcurl - all RPCs responding
✅ Server lifecycle management (Start/Shutdown)
✅ Cross-platform build verified (macOS ARM64)

See: GRPC_TEST_SUCCESS.md for Phase 1 completion details

Phase 2: ImGuiTestEngine Integration ✅ COMPLETE

Goal: Replace stub RPC handlers with actual GUI automation
Status: Infrastructure complete, dynamic test registration implemented
Time Spent: ~4 hours

Implementation Guide: 📖 IT-01-PHASE2-IMPLEMENTATION-GUIDE.md

Completed Tasks:

✅ TestManager Integration - gRPC service receives TestManager reference
✅ Build System - Successfully compiles with ImGuiTestEngine support
✅ Server Startup - gRPC server starts correctly on macOS with test harness flag
✅ Dynamic Test Registration - Click RPC uses IM_REGISTER_TEST() macro for dynamic tests
✅ Stub Handlers - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
✅ Ping RPC - Fully functional, returns YAZE version and timestamp

Key Learnings:

ImGuiTestEngine requires test registration - can't call test functions directly
Test status is exposed by the engine as test->Output.Status, not test->Status
YAZE uses custom flag system with FLAGS_name->Get() pattern
Correct flags: --enable_test_harness, --test_harness_port, --rom_file

Testing Results:

# Server starts successfully
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Ping RPC working
grpcurl -plaintext -d '{"message":"test"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}

Issues Fixed:

❌→✅ SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
❌→✅ ImGuiTestEngine API mismatch (switched to dynamic test registration)
❌→✅ Status field access (corrected to test->Output.Status)
❌→✅ Port conflicts (use port 50052, killall yaze to cleanup)
❌→✅ Flag naming (documented correct underscore format)

Phase 3: Full ImGuiTestEngine Integration ✅ COMPLETE (Oct 2, 2025)

Goal: Complete implementation of all GUI automation RPCs

Completed Tasks:

✅ Type RPC Implementation - Full text input automation
- ItemInfo API usage corrected (returns by value, not pointer)
- Focus management with ItemClick before typing
- Clear-first functionality with keyboard shortcuts
- Dynamic test registration with timeout handling
✅ Wait RPC Implementation - Condition polling with timeout
- Three condition types: window_visible, element_visible, element_enabled
- Configurable timeout (default 5000ms) and poll interval (default 100ms)
- Proper Yield() calls to allow ImGui event processing
- Extended timeout for test execution
✅ Assert RPC Implementation - State validation with structured responses
- Multiple assertion types: visible, enabled, exists, text_contains
- Actual vs expected value reporting
- Detailed error messages for debugging
- text_contains partially implemented (text retrieval needs refinement)
✅ API Compatibility Fixes
- Corrected ItemInfo usage (by value, check ID != 0)
- Fixed flag names (ItemFlags instead of StatusFlags)
- Proper visibility checks using RectClipped dimensions
- All dynamic tests properly registered and cleaned up
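
The Wait and Assert RPCs take conditions written as `kind:target` strings (for example `window_visible:Overworld Editor`). A client could validate these before sending them. The sketch below uses the condition kinds listed above; `parse_condition` is a hypothetical client-side helper and does not reflect how the harness parses them internally.

```python
WAIT_CONDITIONS = {"window_visible", "element_visible", "element_enabled"}
ASSERT_CONDITIONS = {"visible", "enabled", "exists", "text_contains"}


def parse_condition(condition: str, allowed: set[str]) -> tuple[str, str]:
    """Split "kind:target" and validate the kind against the allowed set."""
    kind, sep, target = condition.partition(":")
    if not sep or not target.strip():
        raise ValueError(f"expected 'kind:target', got {condition!r}")
    if kind not in allowed:
        raise ValueError(f"unknown condition {kind!r}; expected one of {sorted(allowed)}")
    return kind, target.strip()
```

Splitting on the first colon only means targets that themselves contain colons are passed through unchanged.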

Testing:

Build successful on macOS ARM64
All RPCs respond correctly
Test script created: scripts/test_harness_e2e.sh
See IT-01-PHASE3-COMPLETE.md for full implementation details

Known Limitations:

Screenshot RPC not implemented (placeholder stub)
text_contains assertion uses placeholder text retrieval
Need end-to-end workflow testing with real YAZE widgets

End-to-End Testing (1 hour)
- Create shell script workflow: start server → click button → wait for window → type text → assert state
- Test with real YAZE editors (Overworld, Dungeon, etc.)
- Document edge cases and troubleshooting

Phase 4: CLI Integration & Windows Testing (4-5 hours)

CLI Client (z3ed agent test)

Generate gRPC calls from AI prompts
Natural language → ImGui action translation
Screenshot capture for LLM feedback
Emit structured error envelopes with artifact links (IT-08)

Windows Testing
- Detailed build instructions for vcpkg setup
- Test on Windows VM or with contributor
- Add Windows CI job to GitHub Actions
- Document troubleshooting

IT-01 Quick Reference

Start YAZE with Test Harness:

./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

Test RPCs with grpcurl:

# Ping - Health check
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping

# Click - Click UI element
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

# Type - Input text
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type

# Wait - Wait for condition
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

# Assert - Validate state
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Main Window"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
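The requests above pass targets and conditions as `kind:name` strings (for example `window_visible:Overworld Editor` or `button:Overworld`). How the service parses them is not shown in this plan; one plausible approach, assuming a split at the first colon, is sketched here. `ParsedSelector` and `ParseSelector` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical parsed form of a "kind:name" string.
struct ParsedSelector {
  std::string kind;  // e.g. "window_visible", "button", "input"
  std::string name;  // e.g. "Overworld Editor"
};

// Splits at the first ':' only, so names may themselves contain colons.
// Returns std::nullopt when no ':' is present or either side is empty.
inline std::optional<ParsedSelector> ParseSelector(std::string_view input) {
  const std::size_t colon = input.find(':');
  if (colon == std::string_view::npos) return std::nullopt;
  const std::string_view kind = input.substr(0, colon);
  const std::string_view name = input.substr(colon + 1);
  if (kind.empty() || name.empty()) return std::nullopt;
  return ParsedSelector{std::string(kind), std::string(name)};
}
```

Rejecting malformed strings up front lets an RPC handler return a structured error instead of searching for a widget with an empty name.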

Troubleshooting:

Port in use: killall yaze or use --test_harness_port=50053
Connection refused: Check server started with lsof -i :50052
Unrecognized flag: Use the full flag name with underscores, not hyphens (e.g., --rom_file, not --rom-file or --rom)

Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)

DESIGN: YAML-based Policy Configuration

# .yaze/policies/agent.yaml
version: 1.0
policies:
  - name: require_tests
    type: test_requirement
    enabled: true
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0

  - name: limit_change_scope
    type: change_constraint
    enabled: true
    rules:
      - max_bytes_changed: 10240  # 10KB
      - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
      - forbidden_ranges:
        - start: 0xFFB0  # ROM header
          end: 0xFFFF

  - name: human_review_required
    type: review_requirement
    enabled: true
    rules:
      - if: bytes_changed > 1024
        then:
          require_diff_review: true
      - if: commands_executed > 10
        then:
          require_log_review: true
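To illustrate how a `change_constraint` policy like `limit_change_scope` might be checked, here is a minimal sketch. It is an assumption-laden example rather than the planned PolicyEvaluator: `ByteRange`, `ChangeConstraint`, and `CheckChangeConstraint` are hypothetical names, `end` is treated as inclusive, and `allowed_banks` is left out because the offset-to-bank mapping is not specified in this plan.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one forbidden range from the YAML.
struct ByteRange {
  std::uint32_t start;
  std::uint32_t end;  // assumed inclusive
};

// Hypothetical in-memory form of the limit_change_scope rules.
struct ChangeConstraint {
  std::size_t max_bytes_changed = 10240;
  std::vector<ByteRange> forbidden_ranges = {{0xFFB0, 0xFFFF}};
};

// Returns human-readable violations; an empty vector means the rule passed.
inline std::vector<std::string> CheckChangeConstraint(
    const ChangeConstraint& rule,
    const std::vector<std::uint32_t>& changed_offsets) {
  std::vector<std::string> violations;
  if (changed_offsets.size() > rule.max_bytes_changed) {
    violations.push_back("changed " + std::to_string(changed_offsets.size()) +
                         " bytes, limit is " +
                         std::to_string(rule.max_bytes_changed));
  }
  for (const ByteRange& range : rule.forbidden_ranges) {
    for (const std::uint32_t offset : changed_offsets) {
      if (offset >= range.start && offset <= range.end) {
        violations.push_back("offset " + std::to_string(offset) +
                             " falls in a forbidden range");
        break;  // report each forbidden range at most once
      }
    }
  }
  return violations;
}
```

Returning a list of messages rather than a single boolean keeps enough detail for the review UI to explain why a proposal was blocked.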

IMPLEMENT: PolicyEvaluator Service
- src/cli/service/policy_evaluator.{h,cc}
- Singleton service loads policies from .yaze/policies/
- EvaluateProposal(proposal_id) -> PolicyResult
- Returns: pass/fail + list of violations with severity
- Hook into ProposalRegistry lifecycle
INTEGRATE: Policy UI in ProposalDrawer
- Add "Policy Status" section in detail view
- Display violations with icons: ⛔ Critical, ⚠️ Warning, ℹ️ Info
- Gate Accept button: disabled if critical violations exist
- Show helpful messages: "Accept blocked: Test pass rate 0.85 < 0.95"
- Allow policy overrides with confirmation: "Override policy? This action will be logged."
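The gating rule described above (Accept is disabled while critical violations exist, unless the user confirms an override) could be modelled roughly as follows. All names here (`Severity`, `Violation`, `PolicyResult`, `CanAccept`) are hypothetical and only mirror the wording of this plan; the eventual PolicyEvaluator interface may differ.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Severity levels matching the three icons listed for the Policy UI.
enum class Severity { kInfo, kWarning, kCritical };

struct Violation {
  Severity severity;
  std::string message;
};

// Hypothetical evaluation result: pass/fail is derived from the violations.
struct PolicyResult {
  std::vector<Violation> violations;

  bool HasCritical() const {
    return std::any_of(violations.begin(), violations.end(),
                       [](const Violation& v) {
                         return v.severity == Severity::kCritical;
                       });
  }
};

// Accept stays enabled unless a critical violation exists and the user has
// not explicitly confirmed an override.
inline bool CanAccept(const PolicyResult& result, bool override_confirmed) {
  return !result.HasCritical() || override_confirmed;
}
```

Warnings and info entries are displayed but never block acceptance in this sketch; only the critical level participates in the gate.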

Priority 3: Documentation & Consolidation (2-3 hours)

CONSOLIDATE: Merge standalone docs into main plan
- ✅ AW-03 summary → already in main plan, delete standalone doc
- Check for other AW-* or task-specific docs to merge
- Update main plan with architecture diagrams
CREATE: Architecture Flow Diagram
- Visual representation of proposal lifecycle
- Component interaction diagram
- Add to implementation plan

Later: Advanced Features

VP-01: Expand CLI unit tests
VP-02: Integration tests with replay scripts
TL-01: Telemetry capture for learning

4. Current Issues & Blockers

Active Issues

None - all blocking issues resolved as of Oct 1, 2025

Known Limitations (Non-Blocking)

ProposalDrawer lacks keyboard navigation
Large diffs/logs truncated at 1000 lines (consider pagination)
Proposals don't persist full metadata to disk (prompt, description, sandbox_id reconstructed)
No policy evaluation yet (AW-04)

5. Architecture Overview

5.1. Proposal Lifecycle Flow

┌─────────────────────────────────────────────────────────────────┐
│ 1. CREATION (CLI: z3ed agent run)                               │
├─────────────────────────────────────────────────────────────────┤
│ User Prompt                                                      │
│      ↓                                                           │
│ MockAIService / GeminiAIService                                 │
│      ↓ (generates commands)                                     │
│ ["palette export ...", "overworld set-tile ..."]                │
│      ↓                                                           │
│ RomSandboxManager::CreateSandbox(rom)                           │
│      ↓ (creates isolated copy)                                  │
│ /tmp/yaze/sandboxes/<timestamp>-<seq>/zelda3.sfc                │
│      ↓                                                           │
│ Execute commands on sandbox ROM                                 │
│      ↓ (logs each command)                                      │
│ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc)      │
│      ↓ (creates proposal directory)                             │
│ /tmp/yaze/proposals/proposal-<timestamp>-<seq>/                 │
│   ├─ execution.log (command outputs)                            │
│   ├─ diff.txt (if generated)                                    │
│   └─ screenshots/ (if any)                                      │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 2. DISCOVERY (CLI: z3ed agent list)                             │
├─────────────────────────────────────────────────────────────────┤
│ ProposalRegistry::ListProposals()                               │
│      ↓ (lazy loads from disk)                                   │
│ LoadProposalsFromDiskLocked()                                   │
│      ↓ (scans /tmp/yaze/proposals/)                             │
│ Reconstructs metadata from filesystem                           │
│      ↓ (parses timestamps, reads logs)                          │
│ Returns vector<ProposalMetadata>                                │
│      ↓                                                           │
│ Display table: ID | Status | Created | Prompt | Stats           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 3. REVIEW (GUI: Debug → Agent Proposals)                        │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::Draw()                                          │
│      ↓ (called every frame from EditorManager)                  │
│ ProposalDrawer::RefreshProposals()                              │
│      ↓ (calls ProposalRegistry::ListProposals)                  │
│ Display proposal list (selectable table)                        │
│      ↓ (user clicks proposal)                                   │
│ ProposalDrawer::SelectProposal(id)                              │
│      ↓ (loads detail content)                                   │
│ Read execution.log and diff.txt from proposal directory         │
│      ↓                                                           │
│ Display detail view:                                            │
│   ├─ Metadata (sandbox_id, timestamp, stats)                   │
│   ├─ Diff (syntax highlighted)                                  │
│   └─ Log (command execution trace)                              │
│      ↓                                                           │
│ User decides: [Accept] [Reject] [Delete]                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 4. ACCEPTANCE (GUI: Click "Accept" button)                      │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::AcceptProposal(proposal_id)                     │
│      ↓                                                           │
│ Get proposal metadata (includes sandbox_id)                     │
│      ↓                                                           │
│ RomSandboxManager::ListSandboxes()                              │
│      ↓ (find sandbox by ID)                                     │
│ sandbox_rom_path = sandbox.rom_path                             │
│      ↓                                                           │
│ Load sandbox ROM from disk                                      │
│      ↓                                                           │
│ rom_->WriteVector(0, sandbox_rom.vector())                      │
│      ↓ (copies entire sandbox ROM → main ROM)                   │
│ ROM marked dirty (save prompt appears)                          │
│      ↓                                                           │
│ ProposalRegistry::UpdateStatus(id, kAccepted)                   │
│      ↓                                                           │
│ User: File → Save ROM                                           │
│      ↓                                                           │
│ Changes committed ✅                                            │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ 5. REJECTION (GUI: Click "Reject" button)                       │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::RejectProposal(proposal_id)                     │
│      ↓                                                           │
│ ProposalRegistry::UpdateStatus(id, kRejected)                   │
│      ↓                                                           │
│ Proposal preserved for audit trail                              │
│ Sandbox ROM left untouched (can be cleaned up later)            │
└─────────────────────────────────────────────────────────────────┘
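Step 2 of the lifecycle reconstructs metadata from directory names of the form `proposal-<timestamp>-<seq>`. A minimal sketch of that parsing step is given here, assuming the sequence number is the part after the last dash and consists only of decimal digits. `ProposalDirInfo` and `ParseProposalDirName` are hypothetical names, not functions from ProposalRegistry.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical pieces recovered from a proposal directory name.
struct ProposalDirInfo {
  std::string timestamp;  // e.g. "20251001T200215"
  int sequence = 0;       // e.g. 1
};

// Parses "proposal-<timestamp>-<seq>"; returns std::nullopt on any mismatch.
inline std::optional<ProposalDirInfo> ParseProposalDirName(
    std::string_view name) {
  constexpr std::string_view kPrefix = "proposal-";
  if (name.substr(0, kPrefix.size()) != kPrefix) return std::nullopt;
  name.remove_prefix(kPrefix.size());

  // The sequence number follows the last dash in the remainder.
  const std::size_t dash = name.rfind('-');
  if (dash == std::string_view::npos || dash == 0 ||
      dash + 1 == name.size()) {
    return std::nullopt;
  }
  const std::string_view seq = name.substr(dash + 1);
  if (seq.size() > 9) return std::nullopt;  // keeps the value inside int range

  int value = 0;
  for (const char c : seq) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  return ProposalDirInfo{std::string(name.substr(0, dash)), value};
}
```

Directories that do not match the pattern are simply skipped by the caller, which keeps unrelated files under the proposals directory from appearing in the list.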

5.2. Component Interaction Diagram

┌────────────────────┐
│   CLI Layer        │
│  (z3ed commands)   │
└────────┬───────────┘
         │
         ├──► agent run ──────────┐
         ├──► agent list ─────────┤
         └──► agent diff ─────────┤
                                  │
         ┌────────────────────────▼──────────────────────┐
         │         CLI Service Layer                     │
         ├───────────────────────────────────────────────┤
         │  ┌─────────────────────────────────────────┐  │
         │  │ ProposalRegistry (Singleton)            │  │
         │  │  • CreateProposal()                     │  │
         │  │  • ListProposals()                      │  │
         │  │  • GetProposal()                        │  │
         │  │  • UpdateStatus()                       │  │
         │  │  • RemoveProposal()                     │  │
         │  │  • LoadProposalsFromDiskLocked()        │  │
         │  └────────────┬────────────────────────────┘  │
         │               │                               │
         │  ┌────────────▼────────────────────────────┐  │
         │  │ RomSandboxManager (Singleton)           │  │
         │  │  • CreateSandbox()                      │  │
         │  │  • ActiveSandbox()                      │  │
         │  │  • ListSandboxes()                      │  │
         │  │  • RemoveSandbox()                      │  │
         │  └────────────┬────────────────────────────┘  │
         └───────────────┼───────────────────────────────┘
                         │
         ┌───────────────▼────────────────────────────────┐
         │         Filesystem Layer                       │
         ├────────────────────────────────────────────────┤
         │  /tmp/yaze/proposals/                          │
         │    └─ proposal-<timestamp>-<seq>/              │
         │         ├─ execution.log                       │
         │         ├─ diff.txt                            │
         │         └─ screenshots/                        │
         │                                                │
         │  /tmp/yaze/sandboxes/                          │
         │    └─ <timestamp>-<seq>/                       │
         │         └─ zelda3.sfc (isolated ROM copy)      │
         └────────────────────────────────────────────────┘
                         ▲
                         │
         ┌───────────────┴────────────────────────────────┐
         │         GUI Layer                              │
         ├────────────────────────────────────────────────┤
         │  ┌─────────────────────────────────────────┐   │
         │  │ EditorManager                           │   │
         │  │  • current_rom_                         │   │
         │  │  • proposal_drawer_                     │   │
         │  │  • Update() { proposal_drawer_.Draw() } │   │
         │  └────────────┬────────────────────────────┘   │
         │               │                                │
         │  ┌────────────▼────────────────────────────┐   │
         │  │ ProposalDrawer                          │   │
         │  │  • rom_ (ptr to EditorManager's ROM)    │   │
         │  │  • Draw()                               │   │
         │  │  • DrawProposalList()                   │   │
         │  │  • DrawProposalDetail()                 │   │
         │  │  • AcceptProposal() ← ROM MERGE         │   │
         │  │  • RejectProposal()                     │   │
         │  │  • DeleteProposal()                     │   │
         │  └─────────────────────────────────────────┘   │
         └────────────────────────────────────────────────┘

5.3. Data Flow: Agent Run to ROM Merge

User: "Make soldiers wear red armor"
         │
         ▼
┌────────────────────────┐
│ MockAIService          │ Generates: ["palette export sprites_aux1 4 soldier.col"]
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ RomSandboxManager      │ Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ Command Executor       │ Runs: palette export on sandbox ROM
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ ProposalRegistry       │ Creates: proposal-20251001T200215-1/
│                        │   • execution.log: "[timestamp] palette export succeeded"
└────────┬───────────────┘   • diff.txt: (if diff generated)
         │
         │ Time passes... user launches GUI
         ▼
┌────────────────────────┐
│ ProposalDrawer loads   │ Reads: /tmp/.../proposals/proposal-*/
│                        │ Displays: List of proposals
└────────┬───────────────┘
         │
         │ User clicks "Accept"
         ▼
┌────────────────────────┐
│ AcceptProposal()       │ 1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc
│                        │ 2. Load sandbox ROM
│                        │ 3. rom_->WriteVector(0, sandbox_rom.vector())
│                        │ 4. Main ROM now contains all sandbox changes
│                        │ 5. ROM marked dirty
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ User: File → Save      │ Changes persisted to disk ✅
└────────────────────────┘
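The merge step in `AcceptProposal()` amounts to overwriting the main ROM's bytes with the sandbox copy from offset 0 and flagging the ROM as dirty. The sketch below models only that idea. `FakeRom` is a hypothetical stand-in for yaze's ROM class (its accessor is called `bytes()` here, whereas the diagram uses `vector()`), and the bounds check is an assumption about sensible behavior, not a description of the real `WriteVector`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the editor's ROM object.
class FakeRom {
 public:
  explicit FakeRom(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Overwrites bytes starting at `offset`; refuses to write past the end.
  void WriteVector(std::size_t offset,
                   const std::vector<std::uint8_t>& bytes) {
    if (offset > data_.size() || bytes.size() > data_.size() - offset) {
      throw std::out_of_range("write exceeds ROM size");
    }
    std::copy(bytes.begin(), bytes.end(),
              data_.begin() + static_cast<std::ptrdiff_t>(offset));
    dirty_ = true;  // triggers the save prompt in the real editor
  }

  const std::vector<std::uint8_t>& bytes() const { return data_; }
  bool dirty() const { return dirty_; }

 private:
  std::vector<std::uint8_t> data_;
  bool dirty_ = false;
};

// Copies the entire sandbox image over the main ROM, as in steps 3 to 5.
inline void MergeSandboxIntoMain(const FakeRom& sandbox, FakeRom& main_rom) {
  main_rom.WriteVector(0, sandbox.bytes());
}
```

Because the whole image is copied, any edits made to the main ROM after the sandbox was created would be overwritten; a byte-level diff merge would be needed to avoid that.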

5. Open Questions

What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)?
➤ Decision: pursue a hybrid package (.z3ed-diff) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog.
How should the harness authenticate escalation requests for mutation actions?
➤ Still open—evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01).
Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary?
➤ Investigate during the ImGuiTestHarness spike; compare extending yaze_test jobs versus introducing a lightweight automation runner.

4. Work History & Key Decisions

This section provides a high-level summary of completed workstreams and major architectural decisions.

Resource Catalogue Workstream (RC) - ✅ COMPLETE

Outcome: A machine-readable API specification for all z3ed commands.
Artifact: docs/api/z3ed-resources.yaml is the generated source of truth.
Details: Implemented a schema system and serialization for all CLI resources (ROM, Palette, Agent, etc.), enabling AI consumption.

Acceptance Workflow (AW-01, AW-02, AW-03) - ✅ COMPLETE

Outcome: A complete, human-in-the-loop proposal review system.
Components:
- RomSandboxManager: For creating isolated ROM copies.
- ProposalRegistry: For tracking proposals, diffs, and logs with disk persistence.
- ProposalDrawer: An ImGui panel for reviewing, accepting, and rejecting proposals, with full ROM merging capabilities.
Integration: The agent run, agent list, and agent diff commands are fully integrated with the registry. The GUI and CLI share the same underlying proposal data.

ImGuiTestHarness (IT-01, IT-02) - ✅ CORE COMPLETE

Outcome: A gRPC-based service for automated GUI testing.
Decision: Chose gRPC for its performance, cross-platform support, and type safety.
Features: Implemented 6 core RPCs: Ping, Click, Type, Wait, Assert, and a stubbed Screenshot.
Integration: The z3ed agent test command can translate natural language prompts into a sequence of gRPC calls to execute tests.

Files Modified/Created

A summary of files created or changed during the implementation of the core z3ed infrastructure.

Core Services & CLI Handlers:

src/cli/service/proposal_registry.{h,cc}
src/cli/service/rom_sandbox_manager.{h,cc}
src/cli/service/resource_catalog.{h,cc}
src/cli/handlers/agent.cc
src/cli/handlers/rom.cc

GUI & Application Integration:

src/app/editor/system/proposal_drawer.{h,cc}
src/app/editor/editor_manager.{h,cc}
src/app/core/service/imgui_test_harness_service.{h,cc}
src/app/core/proto/imgui_test_harness.proto

Build System (CMake):

src/app/app.cmake
src/app/emu/emu.cmake
src/cli/z3ed.cmake
src/CMakeLists.txt

Documentation & API Specs:

docs/api/z3ed-resources.yaml
docs/z3ed/E6-z3ed-cli-design.md
docs/z3ed/E6-z3ed-implementation-plan.md
docs/z3ed/E6-z3ed-reference.md
docs/z3ed/README.md

Z3ED_AI Flag Migration Guide

Date: October 3, 2025
Status: ✅ Complete and Tested

Summary

This document describes the consolidation of the z3ed AI build flags into a single Z3ED_AI master flag, the fix for a Gemini integration crash, and the resulting improvements to build ergonomics.

Problem Statement

Before (Issues):

Confusing Build Flags: Users had to specify -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON to enable AI features
Crash on Startup: Gemini integration crashed due to PromptBuilder using JSON/YAML unconditionally
Poor Modularity: AI dependencies scattered across multiple conditional blocks
Unclear Documentation: Users didn't know which flags enabled which features

Root Cause of Crash:

// GeminiAIService constructor (ALWAYS runs when Gemini key present)
GeminiAIService::GeminiAIService(const GeminiConfig& config) : config_(config) {
  // This line crashed when YAZE_WITH_JSON=OFF
  prompt_builder_.LoadResourceCatalogue("");  // ❌ Uses nlohmann::json unconditionally
}

The PromptBuilder::LoadResourceCatalogue() function used nlohmann::json and yaml-cpp without guards, causing segfaults when JSON support wasn't compiled in.

Solution

1. Created Z3ED_AI Master Flag

New CMakeLists.txt (repository root):

# Master flag for z3ed AI agent features
option(Z3ED_AI "Enable z3ed AI agent features (Gemini/Ollama integration)" OFF)

# Auto-enable dependencies
if(Z3ED_AI)
    message(STATUS "Z3ED_AI enabled: Activating AI agent dependencies (JSON, YAML, httplib)")
    set(YAZE_WITH_JSON ON CACHE BOOL "Enable JSON support" FORCE)
endif()

Benefits:

✅ Single flag to enable all AI features: -DZ3ED_AI=ON
✅ Auto-manages dependencies (JSON, YAML, httplib)
✅ Clear intent: "I want AI agent features"
✅ Backward compatible: Old flags still work

2. Fixed PromptBuilder Crash

Added Compile-Time Guard (src/cli/service/ai/prompt_builder.h):

#ifndef YAZE_CLI_SERVICE_PROMPT_BUILDER_H_
#define YAZE_CLI_SERVICE_PROMPT_BUILDER_H_

// Warn at compile time if JSON not available
#if !defined(YAZE_WITH_JSON)
#warning "PromptBuilder requires JSON support. Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON"
#endif

Added Runtime Guard (src/cli/service/ai/prompt_builder.cc):

absl::Status PromptBuilder::LoadResourceCatalogue(const std::string& yaml_path) {
#ifndef YAZE_WITH_JSON
  // Gracefully degrade instead of crashing
  std::cerr << "⚠️  PromptBuilder requires JSON support for catalogue loading\n"
            << "   Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON\n"
            << "   AI features will use basic prompts without tool definitions\n";
  return absl::OkStatus();  // Don't crash, just skip advanced features
#else
  // ... normal loading code ...
#endif
}
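The same guard pattern can be shown in a self-contained form. In this variation the function returns an explicit state, so a caller can distinguish the degraded path from a successful load. This is an illustrative alternative, not what PromptBuilder does: `CatalogueState` and `LoadCatalogueOrDegrade` are hypothetical names, and only the `YAZE_WITH_JSON` definition comes from the build described here.

```cpp
#include <iostream>
#include <string>

// Hypothetical outcome type for this sketch.
enum class CatalogueState { kLoaded, kSkippedNoJson };

// Compiles to the degraded path unless YAZE_WITH_JSON is defined.
inline CatalogueState LoadCatalogueOrDegrade(const std::string& yaml_path) {
  (void)yaml_path;  // unused in this sketch; a real loader would read it
#ifndef YAZE_WITH_JSON
  std::cerr << "JSON support not compiled in; using basic prompts\n"
            << "Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON\n";
  return CatalogueState::kSkippedNoJson;
#else
  // A JSON-enabled build would parse the catalogue at this point.
  return CatalogueState::kLoaded;
#endif
}
```

Compiled without `-DYAZE_WITH_JSON`, the function prints the warning and reports the skipped state instead of touching any JSON code.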

Benefits:

✅ No more segfaults when GEMINI_API_KEY is set but JSON disabled
✅ Clear error messages at compile time and runtime
✅ Graceful degradation instead of hard failure

3. Updated z3ed Build Configuration

New z3ed.cmake (src/cli/z3ed.cmake):

# AI Agent Support (Consolidated via Z3ED_AI flag)
if(Z3ED_AI OR YAZE_WITH_JSON)
  target_compile_definitions(z3ed PRIVATE YAZE_WITH_JSON)
  message(STATUS "✓ z3ed AI agent enabled (Ollama + Gemini support)")
  target_link_libraries(z3ed PRIVATE nlohmann_json::nlohmann_json)
endif()

# SSL/HTTPS Support for Gemini
if((Z3ED_AI OR YAZE_WITH_JSON) AND (YAZE_WITH_GRPC OR Z3ED_AI))
  find_package(OpenSSL)
  if(OpenSSL_FOUND)
    target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
    target_link_libraries(z3ed PRIVATE OpenSSL::SSL OpenSSL::Crypto)
    message(STATUS "✓ SSL/HTTPS support enabled for z3ed (Gemini API ready)")
  else()
    message(WARNING "OpenSSL not found - Gemini API will not work")
    message(STATUS "  • Ollama (local) still works without SSL")
  endif()
endif()

Benefits:

✅ Clear status messages during build
✅ Explains what's enabled and what's missing
✅ Guidance on how to fix missing dependencies

Migration Instructions

For Users

Old Way (still works):

cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
cmake --build build --target z3ed

New Way (recommended):

cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

With GUI Testing:

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed

For Developers

Check if AI Features Available:

#ifdef YAZE_WITH_JSON
  // JSON-dependent code (AI responses, config loading)
#else
  // Fallback or warning
#endif

Don't use JSON/YAML directly; use PromptBuilder, which handles the guards automatically.

Testing Results

Build Configurations Tested ✅

Minimal Build (no AI):

cmake -B build
cmake --build build --target z3ed
./build/bin/z3ed --help  # ✅ Works, shows "AI disabled" message

AI Enabled (new flag):

cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed
export GEMINI_API_KEY="..."
./build/bin/z3ed agent plan --prompt "test"  # ✅ Works, connects to Gemini

Full Stack (AI + gRPC):

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed
./build/bin/z3ed agent test --prompt "..."  # ✅ Works, GUI automation available

Crash Scenarios Fixed ✅

Before:

export GEMINI_API_KEY="..."
cmake -B build  # JSON disabled by default
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "test"
# Result: Segmentation fault (139) ❌

After:

export GEMINI_API_KEY="..."
cmake -B build  # JSON disabled by default
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "test"
# Result: ⚠️ Warning message, graceful degradation ✅

export GEMINI_API_KEY="..."
cmake -B build -DZ3ED_AI=ON  # JSON enabled
cmake --build build --target z3ed
./build/bin/z3ed agent plan --prompt "Place a tree at 10, 10"
# Result: ✅ Gemini responds, creates proposal

Impact on Build Modularization

This change aligns with the goals in build_modularization_plan.md and build_modularization_implementation.md:

Before:

Scattered conditional compilation flags
Dependencies unclear
Hard to add to modular library system

After:

✅ Clear feature flag: Z3ED_AI
✅ Can create libyaze_agent.a with if(Z3ED_AI) guard

✅ Easy to make optional in modular build:

if(Z3ED_AI)
  add_library(yaze_agent STATIC ${YAZE_AGENT_SOURCES})
  target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
  target_link_libraries(yaze_agent PUBLIC nlohmann_json::nlohmann_json yaml-cpp)
endif()

Future Modular Build Integration

When implementing modular builds (Phase 6-7 from build_modularization_plan.md):

# src/cli/agent/agent_library.cmake (NEW)
if(Z3ED_AI)
  add_library(yaze_agent STATIC
    cli/service/ai/ai_service.cc
    cli/service/ai/ollama_ai_service.cc
    cli/service/ai/gemini_ai_service.cc
    cli/service/ai/prompt_builder.cc
    cli/service/agent/conversational_agent_service.cc
    # ... other agent sources
  )
  
  target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
  
  target_link_libraries(yaze_agent PUBLIC
    yaze_util
    nlohmann_json::nlohmann_json
    yaml-cpp
  )
  
  # Optional SSL for Gemini
  if(OpenSSL_FOUND)
    target_compile_definitions(yaze_agent PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
    target_link_libraries(yaze_agent PRIVATE OpenSSL::SSL OpenSSL::Crypto)
  endif()
  
  message(STATUS "✓ yaze_agent library built with AI support")
endif()

Benefits for Modular Build:

Agent library clearly optional
Can rebuild just agent library when AI code changes
z3ed links to yaze_agent instead of individual sources
Faster incremental builds

Documentation Updates

Updated files:

✅ docs/z3ed/README.md - Added Z3ED_AI flag documentation
✅ docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md - This document
📋 TODO: Update docs/02-build-instructions.md with Z3ED_AI flag
📋 TODO: Update CI/CD workflows to use Z3ED_AI

Backward Compatibility

Old Flags Still Work ✅

# These all enable AI features:
cmake -B build -DYAZE_WITH_JSON=ON          # ✅ Works
cmake -B build -DYAZE_WITH_GRPC=ON          # ✅ Works (auto-enables JSON)
cmake -B build -DZ3ED_AI=ON                 # ✅ Works (new way)

# Combining flags:
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON  # ✅ Full stack

No Breaking Changes

Existing build scripts continue to work
CI/CD pipelines don't need immediate updates
Users can migrate at their own pace

Next Steps

Short Term (Complete)

✅ Fix Gemini crash
✅ Create Z3ED_AI master flag
✅ Update z3ed build configuration
✅ Test all build configurations
✅ Update README documentation

Medium Term (Recommended)

Update CI/CD workflows to use -DZ3ED_AI=ON
Add Z3ED_AI to preset configurations
Update main build instructions docs
Create agent library module (see above)

Long Term (Integration with Modular Build)

Implement yaze_agent library (Phase 6)
Add agent to modular dependency graph
Create agent-specific unit tests
Optional: Split Gemini/Ollama into separate modules

References

Related Issues: Gemini crash (segfault 139) with GEMINI_API_KEY set
Related Docs:
- docs/build_modularization_plan.md - Future library structure
- docs/build_modularization_implementation.md - Implementation guide
- docs/z3ed/README.md - User-facing z3ed documentation
- docs/z3ed/AGENT-ROADMAP.md - AI agent development plan

Summary

This migration successfully:

✅ Fixed crash: Gemini no longer segfaults when JSON disabled
✅ Simplified builds: One flag (Z3ED_AI) replaces multiple flags
✅ Improved UX: Clear error messages and build status
✅ Maintained compatibility: Old flags still work
✅ Prepared for modularization: Clear path to libyaze_agent.a
✅ Tested thoroughly: All configurations verified working

With these changes, the z3ed AI agent builds and runs with Gemini and Ollama support in every configuration tested above.

6. References

Active Documentation:

E6-z3ed-cli-design.md - Overall CLI design and architecture
E6-z3ed-reference.md - Technical command and API reference
docs/api/z3ed-resources.yaml - Machine-readable API reference (generated)

Source Code:

src/cli/service/ - Core services (proposal registry, sandbox manager, resource catalog)
src/app/editor/system/proposal_drawer.{h,cc} - GUI review panel
src/app/core/service/imgui_test_harness_service.{h,cc} - gRPC automation server

Last Updated: [Current Date]
Contributors: @scawful, GitHub Copilot
License: Same as YAZE (see ../../LICENSE)
