z3ed Agentic Workflow Plan
Last Updated: October 2, 2025
Status: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯
📋 Quick Start: See README.md for essential links and project status.
Executive Summary
The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:
✅ Completed Phases:
- Phase 6: Resource Catalogue - Machine-readable API specs for AI consumption
- AW-01/02/03: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
- AW-04: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
- IT-01: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
- IT-02: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
🎯 Active Phase:
- Conversational Agent Implementation: ✅ Foundation complete, LLM function calling ✅ COMPLETE (Oct 3, 2025)
📋 Next Phases (Updated Oct 3, 2025):
- Priority 1: Live LLM Testing (1-2h) - Verify function calling with Ollama/Gemini
- Priority 2: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
- Priority 3: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
- Priority 4: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- Priority 5: Windows Cross-Platform Testing - Validate on Windows with vcpkg
- Deprioritized: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration
Recent Accomplishments (Updated: October 2025):
- ✅ IT-08 Enhanced Error Reporting Complete: Full diagnostic capture operational
- IT-08a: Screenshot RPC with SDL capture (BMP format, 1536x864)
- IT-08b: Auto-capture execution context on failures (frame, window, widget)
- IT-08c: Widget state dumps with comprehensive UI snapshot (JSON, 45 min)
- Proto schema updated with screenshot_path, failure_context, widget_state
- GetTestResults RPC returns complete failure diagnostics
- ✅ IT-09 CLI Suite Commands Landed: End-to-end suite orchestration for CI
- agent test suite run handles groups, tags, params, retries, and emits summaries plus default JUnit XML under test-results/junit/
- agent test suite validate performs structural linting with exit codes
- NEW agent test suite create interactive builder writes YAML suites to tests/<name>.yaml (with --force overwrite) and guides group/test entry
- ✅ IT-08a Screenshot RPC Complete: SDL-based screenshot capture operational
- Captures 1536x864 BMP files via SDL_RenderReadPixels
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
- ✅ Policy Framework Complete: PolicyEvaluator service fully integrated with ProposalDrawer GUI
- 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
- 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
- GUI displays color-coded violations (⛔ critical, ⚠️ warning, ℹ️ info)
- Accept button gating based on policy violations with override confirmation dialog
- Example policy configuration at .yaze/policies/agent.yaml
- ✅ E2E Validation Complete: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
- Window detection timing issue resolved with 10-frame yield buffer in Wait RPC
- Thread safety issues resolved with shared_ptr state management
- Test harness validated on macOS ARM64 with real YAZE GUI interactions
- gRPC Test Harness (IT-01 & IT-02): Full implementation complete with natural language → GUI testing
- ✅ Test Recording & Replay (IT-07): JSON recorder/replayer implemented, CLI and harness wired, end-to-end regression workflow captured in scripts/test_record_replay_e2e.sh
- Build System: Hardened CMake configuration with reliable gRPC integration
- Proposal Workflow: Agentic proposal system fully operational (create, list, diff, review in GUI)
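The three policy severity levels listed above (Info, Warning, Critical) imply a simple gating rule for the Accept button. The following is a minimal Python sketch of that rule, not the PolicyEvaluator implementation; the names Severity and can_accept are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Severity(Enum):
    # Mirrors the three levels named in the plan; the enum itself is hypothetical.
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


def can_accept(violations, override_confirmed=False):
    """Return True if the Accept button should be enabled.

    violations: iterable of Severity values reported for a proposal.
    override_confirmed: True once the user confirmed the override dialog.
    """
    violations = list(violations)
    if any(v is Severity.CRITICAL for v in violations):
        return False  # critical violations always block acceptance
    if any(v is Severity.WARNING for v in violations):
        return override_confirmed  # warnings need an explicit override
    return True  # info-only or no violations
```

Under this sketch a proposal with only Info violations is accepted directly, while a Warning requires the override confirmation first.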
Known Limitations & Improvement Opportunities:
- Screenshot Auto-Capture: ✅ Complete - failures trigger automatic capture through TestManager (IT-08b)
- Test Introspection: ✅ Complete - GetTestStatus/ListTests/GetResults RPCs operational
- Widget Discovery: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
- Test Recording: ✅ Complete - JSON record/replay shipped in IT-07
- Synchronous Wait: Async tests return immediately → add blocking mode or result polling
- Error Context: ✅ Complete - failures now include screenshots and widget state dumps (IT-08)
- Performance: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)
- YAML Parsing: Simple parser implemented, consider yaml-cpp for complex scenarios
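The ~166ms figure in the Performance item is consistent with the 10-frame yield buffer mentioned earlier if the GUI renders at 60 FPS. That frame rate is an assumption here (it matches the framerate value in the sample error report, but the plan does not state it as the basis). A quick check:

```python
def yield_overhead_ms(frames, fps):
    """Time spent yielding `frames` frames at `fps` frames per second."""
    return frames * 1000.0 / fps


# 10 yielded frames at an assumed 60 FPS
print(round(yield_overhead_ms(10, 60.0), 1))  # 166.7
```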
Time Investment: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)
Quick Reference
Start Test Harness:
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
Test All RPCs:
./scripts/test_harness_e2e.sh
Create Proposal:
./build/bin/z3ed agent run "Test prompt" --sandbox
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id <ID>
Review in GUI:
- Open YAZE → Debug → Agent Proposals
- Select proposal → Review → Accept/Reject/Delete
1. Current Priorities (Week of Oct 2-8, 2025)
Status: Core Infrastructure Complete ✅ | Test Harness Enhancement Phase 🔧
Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE
Goal: Transform test harness from basic automation to comprehensive testing platform and deliver holistic error reporting across YAZE
Time Estimate: 20-25 hours total (7.5h completed in IT-07)
Blocking Dependency: IT-01 Complete ✅
Motivation: The harness now supports AI workflows, regression capture, and automation—but error surfaces remain shallow:
- AI Agent Development: Still needs widget discovery for adaptive planning
- Regression Testing: Recording/replay finished; reporting pipeline must surface actionable failures
- CI/CD Integration: Requires reliable artifacts (logs, screenshots, structured context)
- Debugging: Failures lack screenshots, widget hierarchies, and EditorManager state snapshots
- Application Consistency: z3ed, EditorManager, and core services emit heterogeneous error formats
IT-05: Test Introspection API (6-8 hours)
Status (Oct 2, 2025): ✅ Completed
Highlights:
- imgui_test_harness.proto now exposes GetTestStatus, ListTests, and GetTestResults RPCs backed by TestManager's execution history.
- CLI commands (z3ed agent test status|list|results) are fully wired with JSON/YAML formatting, follow-mode polling, and filtering options.
- GuiAutomationClient provides typed wrappers for introspection APIs so agent workflows can poll status programmatically.
- Regression coverage lives in scripts/test_harness_e2e.sh; a slimmer introspection smoke (scripts/test_introspection_e2e.sh) is queued for CI automation but manual verification paths are documented.
Future Enhancements:
- Capture richer assertion metadata (expected/actual pairs) for improved failure messaging when the underlying harness exposes it.
- Add pagination helpers to CLI once history volume grows (low priority).
Example Usage:
# Queue a test
z3ed agent test --prompt "Open Overworld editor"
# Poll for completion
z3ed agent test status --test-id grpc_click_12345678
# Retrieve results
z3ed agent test results --test-id grpc_click_12345678 --format json
API Schema:
message GetTestStatusRequest {
string test_id = 1;
}
message GetTestStatusResponse {
enum Status { QUEUED = 0; RUNNING = 1; PASSED = 2; FAILED = 3; TIMEOUT = 4; }
Status status = 1;
int64 execution_time_ms = 2;
string error_message = 3;
repeated string assertion_failures = 4;
}
message ListTestsRequest {
string category_filter = 1; // Optional: "grpc", "unit", etc.
int32 page_size = 2;
string page_token = 3;
}
message ListTestsResponse {
repeated TestInfo tests = 1;
string next_page_token = 2;
}
message TestInfo {
string test_id = 1;
string name = 2;
string category = 3;
int64 last_run_timestamp_ms = 4;
int32 total_runs = 5;
int32 pass_count = 6;
int32 fail_count = 7;
}
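The page_size / page_token / next_page_token fields in ListTests suggest token-based pagination. Below is a rough Python sketch of a client loop that drains all pages. It assumes an empty next_page_token marks the last page (a common convention, not confirmed by the schema above), and list_tests stands in for whatever wrapper actually issues the RPC; fake_list_tests exists only to make the sketch runnable.

```python
def list_all_tests(list_tests, page_size=50):
    """Collect every TestInfo entry by following next_page_token."""
    tests = []
    token = ""
    while True:
        resp = list_tests(page_size=page_size, page_token=token)
        tests.extend(resp["tests"])
        token = resp.get("next_page_token", "")
        if not token:  # assumption: empty token means no more pages
            return tests


def fake_list_tests(page_size, page_token):
    """In-memory stand-in for the RPC; the token is just an offset."""
    data = [{"test_id": f"grpc_click_{i}"} for i in range(5)]
    start = int(page_token or 0)
    end = start + page_size
    next_token = str(end) if end < len(data) else ""
    return {"tests": data[start:end], "next_page_token": next_token}


all_tests = list_all_tests(fake_list_tests, page_size=2)
print(len(all_tests))  # 5
```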
IT-06: Widget Discovery API (4-6 hours)
Implementation Tasks:
- Add DiscoverWidgets RPC:
  - Enumerate all windows currently open in YAZE GUI
  - List all interactive widgets (buttons, inputs, menus, tabs) per window
  - Return widget metadata: ID, type, label, enabled state, position
  - Support filtering by window name or widget type
- AI-Friendly Output Format:
  - JSON schema describing available interactions
  - Natural language descriptions for each widget
  - Suggested action templates (e.g., "Click button:{label}")
Example Usage:
# Discover all widgets
z3ed gui discover
# Filter by window
z3ed gui discover --window "Overworld"
# Get only buttons
z3ed gui discover --type button
API Schema (current):
message DiscoverWidgetsRequest {
string window_filter = 1;
WidgetType type_filter = 2;
string path_prefix = 3;
bool include_invisible = 4;
bool include_disabled = 5;
}
message WidgetBounds {
float min_x = 1;
float min_y = 2;
float max_x = 3;
float max_y = 4;
}
message DiscoveredWidget {
string path = 1;
string label = 2;
string type = 3;
string description = 4;
string suggested_action = 5;
bool visible = 6;
bool enabled = 7;
WidgetBounds bounds = 8;
uint32 widget_id = 9;
int64 last_seen_frame = 10;
int64 last_seen_at_ms = 11;
bool stale = 12;
}
message DiscoveredWindow {
string name = 1;
bool visible = 2;
repeated DiscoveredWidget widgets = 3;
}
message DiscoverWidgetsResponse {
repeated DiscoveredWindow windows = 1;
int32 total_widgets = 2;
int64 generated_at_ms = 3;
}
Benefits for AI Agents:
- LLMs can dynamically learn available GUI interactions
- Agents can adapt to UI changes without hardcoded widget names
- Natural language descriptions enable better prompt engineering
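To illustrate how an agent might consume a discovery response, here is a small Python sketch that turns a DiscoverWidgetsResponse-shaped dictionary into a list of action strings. It assumes the JSON rendering reuses the proto field names shown above, and it falls back to the "Click button:{label}" style template when suggested_action is empty; both are assumptions rather than documented behavior.

```python
def suggested_actions(response, widget_type=None):
    """List actions for widgets that are visible, enabled, and not stale."""
    actions = []
    for window in response.get("windows", []):
        if not window.get("visible", False):
            continue
        for w in window.get("widgets", []):
            if not (w.get("visible") and w.get("enabled")) or w.get("stale"):
                continue
            if widget_type and w.get("type") != widget_type:
                continue
            # Fallback template is a guess modelled on "Click button:{label}".
            actions.append(w.get("suggested_action") or f"Click {w['type']}:{w['label']}")
    return actions


sample = {
    "windows": [{
        "name": "Overworld",
        "visible": True,
        "widgets": [
            {"label": "Save", "type": "button", "visible": True, "enabled": True,
             "stale": False, "suggested_action": ""},
            {"label": "Zoom", "type": "button", "visible": True, "enabled": False,
             "stale": False, "suggested_action": ""},
            {"label": "Name", "type": "input", "visible": True, "enabled": True,
             "stale": False, "suggested_action": "Type input:Name"},
        ],
    }]
}
print(suggested_actions(sample))  # ['Click button:Save', 'Type input:Name']
```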
IT-07: Test Recording & Replay ✅ COMPLETE (Oct 2, 2025)
Highlights:
- Implemented StartRecording, StopRecording, and ReplayTest RPCs with persistent JSON scripts
- Added CLI commands: z3ed test record start|stop, z3ed test replay
- Scripts stored in tests/gui/ with metadata (name, tags, assertions, timing hints)
- Added regression coverage via scripts/test_record_replay_e2e.sh
- Documentation updates in E6-z3ed-reference.md and new quick-start snippets in README
- Confirmed compatibility with natural language prompts generated by the agent workflow
Outcome: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).
IT-08: Enhanced Error Reporting (5-7 hours) ✅ COMPLETE
Status: IT-08a Complete ✅ | IT-08b Complete ✅ | IT-08c Complete ✅
Objective: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.
Implementation Tracks:
- Harness-Level Diagnostics
  - ✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
  - ✅ IT-08b: Auto-capture screenshots and context on test failure using shared helper that writes to ${TMPDIR}/yaze/test-results/<test_id>/
  - ✅ IT-08c: Widget tree JSON dumps emitted alongside failure context
  - ⏳ HTML bundle exporter (screenshots + widget tree) remains a stretch goal
- CLI Experience Improvements
  - Surface artifact paths, failure context, and widget state in CLI output (DONE)
  - Standardize error envelopes in z3ed (absl::Status + structured payload)
  - Add --format html flag to emit rich bundles (planned)
  - Integrate with recording workflow: replay failures using captured state (planned)
- EditorManager & Application Integration
  - Introduce shared ErrorAnnotatedResult utility exposing status, context, actionable_hint
  - Adapt EditorManager subsystems (ProposalDrawer, OverworldEditor, DungeonEditor) to adopt the shared structure
  - Add in-app failure overlay (ImGui modal) that references harness artifacts when available
  - Hook proposal acceptance/replay flows to display enriched diagnostics when sandbox merges fail
- Telemetry & Storage Hooks (Stretch)
  - Optionally emit error metadata to a ring buffer for future analytics/telemetry workstreams
  - Provide CLI flag --error-artifact-dir to customize storage (supports CI separation)
Error Report Example:
{
"test_id": "grpc_assert_12345678",
"failure_time": "2025-10-02T14:23:45Z",
"assertion": "visible:Overworld",
"expected": "visible",
"actual": "hidden",
"screenshot": "/tmp/yaze/test-results/grpc_assert_12345678/failure_1696357220000.bmp",
"widget_state": {
"active_window": "Main Window",
"focused_widget": null,
"visible_windows": ["Main Window", "Debug"],
"overworld_window": { "exists": true, "visible": false, "position": "0,0,0,0" }
},
"execution_context": {
"frame_count": 1234,
"recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
"resource_stats": { "memory_mb": 245, "textures": 12, "framerate": 60.0 },
"editor_manager_snapshot": {
"active_module": "OverworldEditor",
"dirty_buffers": ["overworld_layer_1"],
"last_error": null
}
}
}
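A report in this shape is easy to condense for CLI or CI output. The Python sketch below builds a short text summary from the fields shown in the example; summarize_failure is a hypothetical helper, not part of z3ed, and it only relies on keys that appear in the sample report.

```python
def summarize_failure(report):
    """Return a short, human-readable summary of one failure report."""
    lines = [
        f"{report['test_id']}: assertion '{report['assertion']}' failed "
        f"(expected {report['expected']}, actual {report['actual']})",
        f"screenshot: {report.get('screenshot', 'n/a')}",
    ]
    events = report.get("execution_context", {}).get("recent_events", [])
    if events:
        lines.append("last event: " + events[-1])
    return "\n".join(lines)


report = {
    "test_id": "grpc_assert_12345678",
    "assertion": "visible:Overworld",
    "expected": "visible",
    "actual": "hidden",
    "execution_context": {
        "recent_events": ["Click: menuitem: Overworld Editor",
                          "Wait: window_visible:Overworld"],
    },
}
print(summarize_failure(report))
```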
IT-09: CI/CD Integration ✅ CLI Tooling Shipped
Delivered (Oct 3, 2025):
- Standardized Suite Runtime
  - YAML suite parser/loader with group dependencies and retry semantics
  - z3ed agent test suite run exposes --group, --tag, --param, --retries, --ci-mode, and --junit
  - Automatic JUnit XML emission to test-results/junit/<suite>.xml
- Validation & Authoring UX
  - z3ed agent test suite validate surfaces structural linting with annotated exit codes (0 pass, 1 fail, 2 error)
  - NEW z3ed agent test suite create <name> interactive flow scaffolds suites under tests/, prompting for metadata, groups, replay scripts, tags, and key=value parameters (with --force overwrite support)
- Reporting
  - Text and JSON summaries include per-test assertions and retry outcomes
  - Default output directory layout ready for CI artifact upload
Next Steps (post-CLI follow-through):
- Publish canonical tests/smoke.yaml / tests/regression.yaml samples
- Add .github/workflows/gui-tests.yml template referencing the new runner
- Document flaky-test mitigation patterns, including recommended retry counts
- Wire suite execution output into docs/CI dashboards for quick triage
Test Suite Format:
name: YAZE GUI Test Suite
description: Comprehensive tests for YAZE editor functionality
version: 1.0
config:
  timeout_per_test: 30s
  retry_on_failure: 2
  parallel_execution: false
test_groups:
  - name: smoke
    description: Fast tests for basic functionality
    tests:
      - tests/overworld_load.json
      - tests/dungeon_load.json
  - name: regression
    description: Full test suite for release validation
    depends_on: [smoke]
    tests:
      - tests/palette_edit.json
      - tests/sprite_load.json
      - tests/rom_save.json
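The depends_on field means groups have to run in dependency order. One way to derive that order is a topological sort; the Python sketch below uses the standard library's graphlib (Python 3.9+). It illustrates the idea only and is not how the z3ed suite loader is implemented; group_run_order is a hypothetical name.

```python
from graphlib import TopologicalSorter


def group_run_order(test_groups):
    """Order group names so each group runs after the groups it depends on."""
    graph = {g["name"]: set(g.get("depends_on", [])) for g in test_groups}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


groups = [
    {"name": "regression", "depends_on": ["smoke"]},
    {"name": "smoke"},
]
print(group_run_order(groups))  # ['smoke', 'regression']
```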
GitHub Actions Integration:
name: GUI Tests
on: [push, pull_request]
jobs:
  gui-tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build YAZE with test harness
        run: |
          cmake -B build -DYAZE_WITH_GRPC=ON
          cmake --build build --target yaze z3ed
      - name: Start test harness
        run: |
          ./build/bin/yaze --enable_test_harness --headless &
          sleep 5
      - name: Run test suite
        run: |
          ./build/bin/z3ed agent test suite run tests/suite.yaml --ci-mode
      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-results/
IT-10: Collaborative Editing & Multiplayer Sessions ⏸️ DEPRIORITIZED
Status: Postponed in favor of LLM integration work
Rationale: While collaborative editing is an interesting feature, practical LLM integration provides more immediate value for the agentic workflow system. The core infrastructure is complete, and enabling real AI agents to interact with z3ed is the critical next step.
Future Consideration: IT-10 may be revisited after LLM integration is production-ready and validated by users. The collaborative editing design is preserved in the documentation for future reference.
See: LLM-INTEGRATION-PLAN.md for the new priority work.
Priority 2: LLM Integration (Ollama + Gemini + Claude) 🤖 NEW PRIORITY
Goal: Enable practical AI-driven ROM modifications with local and remote LLM providers
Time Estimate: 12-15 hours total
Status: Ready to Implement
Why This is Critical: The z3ed infrastructure is complete (CLI, proposals, sandbox, GUI automation), but currently uses MockAIService with hardcoded commands. Real LLM integration unlocks the full potential of the agentic workflow system.
📋 Complete Documentation:
- LLM-INTEGRATION-PLAN.md - Detailed technical implementation guide (60+ pages)
- LLM-IMPLEMENTATION-CHECKLIST.md - Step-by-step task list with checkboxes
- LLM-INTEGRATION-SUMMARY.md - Executive summary and getting started
Implementation Phases:
Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE
- Create OllamaAIService class with health checks and model management
- Wire into agent commands with provider selection mechanism
- Add CMake configuration for httplib support
- End-to-end testing with qwen2.5-coder:7b model
Key Benefits: Local, free, private, no rate limits
Phase 2: Gemini Fixes (2-3 hours)
- Fix existing GeminiAIService implementation
- Improve prompting with resource catalogue
- Add markdown code block stripping for reliable parsing
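The markdown code block stripping mentioned above addresses a common failure mode: a model wraps its JSON or command list in a fenced block, and the parser then chokes on the fence lines. A minimal Python sketch of the idea follows; strip_code_fence is a hypothetical helper and handles only a single fence that wraps the whole reply.

```python
FENCE = "`" * 3  # the three-backtick markdown fence marker


def strip_code_fence(text):
    """Remove one surrounding markdown code fence, if present."""
    stripped = text.strip()
    lines = stripped.splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        # Drop the opening line (which may carry a language tag) and the closing line.
        return "\n".join(lines[1:-1]).strip()
    return stripped


reply = FENCE + 'json\n["rom info"]\n' + FENCE
print(strip_code_fence(reply))  # ["rom info"]
```

Replies without a fence pass through unchanged apart from whitespace trimming, so the helper is safe to apply unconditionally.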
Phase 3: Claude Integration (2-3 hours)
- Create ClaudeAIService class
- Implement Messages API integration
- Same interface as other services for easy swapping
Phase 4: Enhanced Prompt Engineering (3-4 hours)
- Create PromptBuilder utility class
- Load resource catalogue (z3ed-resources.yaml) into system prompts
- Add few-shot examples for improved accuracy (>90%)
- Inject ROM context (current state, loaded editors)
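As a rough picture of what Phase 4 describes, the Python sketch below assembles a system prompt from the three ingredients listed: catalogue text, few-shot examples, and ROM context. The class shape, method names, and section headings are all assumptions made for illustration; the planned PromptBuilder may look quite different. The catalogue is passed in as text so the sketch needs no YAML dependency.

```python
class PromptBuilder:
    """Hypothetical builder that composes a system prompt from three parts."""

    def __init__(self, catalogue_text):
        self.catalogue_text = catalogue_text
        self.examples = []      # (user prompt, expected commands) pairs
        self.rom_context = {}   # e.g. current state, loaded editors

    def add_example(self, prompt, commands):
        self.examples.append((prompt, list(commands)))
        return self

    def set_rom_context(self, **context):
        self.rom_context.update(context)
        return self

    def build(self):
        parts = ["# Resource catalogue", self.catalogue_text.strip()]
        if self.examples:
            parts.append("# Examples")
            for prompt, commands in self.examples:
                parts.append(f"User: {prompt}")
                parts.extend(f"Command: {c}" for c in commands)
        if self.rom_context:
            parts.append("# ROM context")
            parts.extend(f"{k}: {v}" for k, v in sorted(self.rom_context.items()))
        return "\n".join(parts)


prompt = (
    PromptBuilder("resources: []")  # placeholder catalogue text
    .add_example("Make all soldier armor red", ["<placeholder command>"])
    .set_rom_context(loaded_editors="Overworld")
    .build()
)
print(prompt)
```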
Quick Start After Implementation:
# Install Ollama
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b
# Configure z3ed
export YAZE_AI_PROVIDER=ollama
# Use natural language
z3ed agent run --prompt "Make all soldier armor red" --rom zelda3.sfc --sandbox
z3ed agent diff # Review changes
Testing Script: ./scripts/quickstart_ollama.sh (automated setup validation)
IT-10 Design Reference: Collaborative Editing ⏸️ (preserved for future reference)
- Collaboration Server:
  - WebSocket server for real-time client communication
  - Session management (create, join, authentication)
  - Edit event broadcasting to all connected clients
  - Conflict resolution (last-write-wins with timestamps)
- Collaboration Client:
  - Connect to remote sessions via WebSocket
  - Send local edits to server
  - Receive and apply remote edits
  - ROM state synchronization on join
- Edit Event Protocol:
  - Protobuf definitions for edit events (tile, sprite, palette, map)
  - Cursor position tracking
  - AI proposal sharing and voting
  - Session state messages
- GUI Integration:
  - Status bar showing connected users
  - Collaboration panel (user list, activity feed)
  - Live cursor rendering (color-coded per user)
  - Proposal voting UI (Accept/Reject/Discuss)
- Session Recording & Replay:
  - Record all events to YAML/JSON file
  - Replay engine with timeline controls
  - Export session summaries for review
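The last-write-wins conflict resolution named in the server tasks can be sketched in a few lines. The Python below keeps, per edit target, only the edit with the newest timestamp. The field names (target, timestamp_ms, user, value) and the tie-break on user name are assumptions for illustration; the design above only specifies "last-write-wins with timestamps".

```python
def apply_edit(state, edit):
    """Apply `edit` to `state` if it is newer than the stored edit for its target.

    Returns True when the edit wins, False when it is discarded.
    """
    current = state.get(edit["target"])
    incoming_key = (edit["timestamp_ms"], edit["user"])
    # Ties on timestamp are broken by user name so all clients agree (assumption).
    if current is None or incoming_key > (current["timestamp_ms"], current["user"]):
        state[edit["target"]] = edit
        return True
    return False


state = {}
apply_edit(state, {"target": "tile:12", "timestamp_ms": 100, "user": "a", "value": 7})
apply_edit(state, {"target": "tile:12", "timestamp_ms": 90, "user": "b", "value": 9})
print(state["tile:12"]["value"])  # 7
```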
CLI Commands:
# Host a collaborative session
z3ed collab host --port 5000 --password "dev123"
# Join a session
z3ed collab join yaze://connect/192.168.1.100:5000
# List active sessions (LAN discovery)
z3ed collab list
# Disconnect from session
z3ed collab disconnect
# Replay recorded session
z3ed collab replay session_2025_10_02.yaml --speed 2x
User Stories:
- US-1: As a ROM hacker, I want to host a collaborative session so my teammates can join and work together
- US-2: As a collaborator, I want to see other users' edits in real-time so we stay synchronized
- US-3: As a team lead, I want to use AI agents with my team so we can all benefit from automation (shared proposals with majority voting)
- US-4: As a collaborator, I want to see where other users are working so we don't conflict (live cursors)
- US-5: As a project manager, I want to record collaborative sessions so we can review work later
Benefits:
- Real-Time Collaboration: Multiple users can edit the same ROM simultaneously
- Shared AI Assistance: Team votes on AI proposals before execution
- Conflict Prevention: Live cursors show where teammates are working
- Audit Trail: Session recording for review and compliance
- Remote Teams: Connect over LAN or internet (with optional encryption)
Technical Architecture:
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│   Client A   │────►│  Collab Server  │◄────│   Client B   │
│    (Host)    │     │   (WebSocket)   │     │              │
└──────────────┘     │                 │     └──────────────┘
                     │ - Session Mgmt  │
                     │ - Event Broker  │     ┌──────────────┐
                     │ - Conflict Res  │◄────│   Client C   │
                     └─────────────────┘     └──────────────┘
Security Considerations:
- Optional password protection for sessions
- Read-only vs read-write access levels
- ROM checksum verification (prevents desync)
- Rate limiting (prevent spam/DOS)
- Optional TLS/SSL encryption for public internet
See: IT-10-COLLABORATIVE-EDITING.md for complete specification
Priority 3: Windows Cross-Platform Testing 🪟
Goal: Validate z3ed and test harness on Windows
Time Estimate: 8-10 hours
Blocking Dependency: IT-05 Complete (need stable API)
📋 Detailed Guides: See NEXT_PRIORITIES_OCT2.md for complete implementation breakdowns with code examples.
2. Workstreams Overview
| Workstream | Goal | Status | Notes |
|---|---|---|---|
| Resource Catalogue | Machine-readable CLI specs for AI consumption | ✅ Complete | docs/api/z3ed-resources.yaml generated |
| Acceptance Workflow | Human review/approval of agent proposals | ✅ Complete | ProposalDrawer with ROM merging operational |
| ImGuiTest Bridge | Automated GUI testing via gRPC | ✅ Complete | All 3 phases done (11 hours) |
| Verification Pipeline | Layered testing + CI coverage | 📋 In Progress | E2E validation phase |
| Telemetry & Learning | Capture signals for improvement | 📋 Planned | Optional/opt-in (Phase 8) |
Completed Work Summary
Resource Catalogue (RC) ✅:
- CLI flag passthrough and resource catalog system
- agent describe exports YAML/JSON schemas
- docs/api/z3ed-resources.yaml maintained
- All ROM/Palette/Overworld/Dungeon/Patch commands documented
Acceptance Workflow (AW-01/02/03) ✅:
- ProposalRegistry with disk persistence and cross-session tracking
- RomSandboxManager for isolated ROM copies
- agent list and agent diff commands
- ProposalDrawer GUI: List/detail views, Accept/Reject/Delete, ROM merging
- Integrated into EditorManager (Debug → Agent Proposals)
ImGuiTestHarness (IT-01) ✅:
- Phase 1: gRPC infrastructure (6 RPC methods)
- Phase 2: TestManager integration with dynamic tests
- Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs)
- E2E test script: scripts/test_harness_e2e.sh
- Documentation: IT-01-QUICKSTART.md
3. Task Backlog
| ID | Task | Workstream | Type | Status | Dependencies |
|---|---|---|---|---|---|
| RC-01 | Define schema for ResourceCatalog entries and implement serialization helpers. | Resource Catalogue | Code | ✅ Done | Schema system complete with all resource types documented |
| RC-02 | Auto-generate docs/api/z3ed-resources.yaml from command annotations. | Resource Catalogue | Tooling | ✅ Done | Generated and committed to docs/api/ |
| RC-03 | Implement z3ed agent describe CLI surface returning JSON schemas. | Resource Catalogue | Code | ✅ Done | Both YAML and JSON output formats working |
| RC-04 | Integrate schema export with TUI command palette + help overlays. | Resource Catalogue | UX | 📋 Planned | RC-03 |
| RC-05 | Harden CLI command routing/flag parsing to unblock agent automation. | Resource Catalogue | Code | ✅ Done | Fixed rom info handler to use FLAGS_rom |
| AW-01 | Implement sandbox ROM cloning and tracking (RomSandboxManager). | Acceptance Workflow | Code | ✅ Done | ROM sandbox manager operational with lifecycle management |
| AW-02 | Build proposal registry service storing diffs, logs, screenshots. | Acceptance Workflow | Code | ✅ Done | ProposalRegistry implemented with disk persistence |
| AW-03 | Add ImGui drawer for proposals with accept/reject controls. | Acceptance Workflow | UX | ✅ Done | ProposalDrawer GUI complete with ROM merging |
| AW-04 | Implement policy evaluation for gating accept buttons. | Acceptance Workflow | Code | ✅ Done | PolicyEvaluator service with 4 policy types (test, constraint, forbidden, review), GUI integration complete (6 hours) |
| AW-05 | Draft .z3ed-diff hybrid schema (binary deltas + JSON metadata). | Acceptance Workflow | Design | 📋 Planned | AW-01 |
| IT-01 | Create ImGuiTestHarness IPC service embedded in yaze_test. | ImGuiTest Bridge | Code | ✅ Done | Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours) |
| IT-02 | Implement CLI agent step translation (imgui_action → harness call). | ImGuiTest Bridge | Code | ✅ Done | z3ed agent test command with natural language prompts (7.5 hours) |
| IT-03 | Provide synchronization primitives (WaitForIdle, etc.). | ImGuiTest Bridge | Code | ✅ Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 |
| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | ✅ Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer |
| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) |
| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | ✅ Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Screenshot RPC, auto-capture, widget state dumps complete (Oct 2, 2025) |
| IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Screenshot capture complete (Oct 2, 2025) |
| IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | ✅ Done | IT-08a - Integrated with TestManager (Oct 2, 2025) |
| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | ✅ Done | IT-08b - Enhanced failure diagnostics (Oct 2, 2025) |
| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | ✅ Done | IT-07 - CLI suite run/validate/create commands, JUnit output |
| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |
| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | 📋 Planned | IT tasks |
| VP-03 | Create CI job running agent smoke tests with YAZE_WITH_JSON. | Verification Pipeline | Infra | 📋 Planned | VP-01, VP-02 |
| TL-01 | Capture accept/reject metadata and push to telemetry log. | Telemetry & Learning | Code | 📋 Planned | AW tasks |
| TL-02 | Build anonymized metrics exporter + opt-in toggle. | Telemetry & Learning | Infra | 📋 Planned | TL-01 |
Status Legend: 🔄 Active · 📋 Planned · ✅ Done
Progress Summary:
- ✅ Completed: 19 tasks (68%)
- 🔄 Active: 0 tasks (0%)
- 📋 Planned: 9 tasks (32%)
- Total: 28 tasks in the table above (IT-08a/b/c counted as separate rows)
3. Immediate Next Steps (Week of Oct 1-7, 2025)
Priority 0: Testing & Validation (Active)
- TEST: Complete end-to-end proposal workflow
  - Launch YAZE and verify ProposalDrawer displays live proposals
  - Test Accept action → verify ROM merge and save prompt
  - Test Reject and Delete actions
  - Validate filtering and refresh functionality
- Widget ID Refactoring (Started Oct 2, 2025) 🎯 NEW
  - ✅ Added widget_id_registry to build system
  - ✅ Registered 13 Overworld toolset buttons with hierarchical IDs
  - 📋 Next: Test widget discovery and update test harness
  - See: WIDGET_ID_REFACTORING_PROGRESS.md
Priority 1: ImGuiTestHarness Foundation (IT-01) ✅ COMPLETE
Rationale: Required for automated GUI testing and remote control of YAZE for AI workflows
Decision: ✅ Use gRPC - Production-grade, cross-platform, type-safe (see IT-01-grpc-evaluation.md)
Status: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Complete ✅
Phase 1: gRPC Infrastructure ✅ COMPLETE
- ✅ Add gRPC to build system via FetchContent
- ✅ Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
- ✅ Implement gRPC server with all 6 RPC stubs
- ✅ Test with grpcurl - all RPCs responding
- ✅ Server lifecycle management (Start/Shutdown)
- ✅ Cross-platform build verified (macOS ARM64)
See: GRPC_TEST_SUCCESS.md for Phase 1 completion details
Phase 2: ImGuiTestEngine Integration ✅ COMPLETE
Goal: Replace stub RPC handlers with actual GUI automation
Status: Infrastructure complete, dynamic test registration implemented
Time Spent: ~4 hours
Implementation Guide: 📖 IT-01-PHASE2-IMPLEMENTATION-GUIDE.md
Completed Tasks:
- ✅ TestManager Integration - gRPC service receives TestManager reference
- ✅ Build System - Successfully compiles with ImGuiTestEngine support
- ✅ Server Startup - gRPC server starts correctly on macOS with test harness flag
- ✅ Dynamic Test Registration - Click RPC uses IM_REGISTER_TEST() macro for dynamic tests
- ✅ Stub Handlers - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
- ✅ Ping RPC - Fully functional, returns YAZE version and timestamp
Key Learnings:
- ImGuiTestEngine requires test registration - can't call test functions directly
- Test context provided by engine via test->Output.Status not test->Status
- YAZE uses custom flag system with FLAGS_name->Get() pattern
- Correct flags: --enable_test_harness, --test_harness_port, --rom_file
Testing Results:
# Server starts successfully
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# Ping RPC working
grpcurl -plaintext -d '{"message":"test"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}
Issues Fixed:
- ❌→✅ SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
- ❌→✅ ImGuiTestEngine API mismatch (switched to dynamic test registration)
- ❌→✅ Status field access (corrected to test->Output.Status)
- ❌→✅ Port conflicts (use port 50052, killall yaze to cleanup)
- ❌→✅ Flag naming (documented correct underscore format)
Phase 3: Full ImGuiTestEngine Integration ✅ COMPLETE (Oct 2, 2025)
Goal: Complete implementation of all GUI automation RPCs
Completed Tasks:
- ✅ Type RPC Implementation - Full text input automation
  - ItemInfo API usage corrected (returns by value, not pointer)
  - Focus management with ItemClick before typing
  - Clear-first functionality with keyboard shortcuts
  - Dynamic test registration with timeout handling
- ✅ Wait RPC Implementation - Condition polling with timeout
  - Three condition types: window_visible, element_visible, element_enabled
  - Configurable timeout (default 5000ms) and poll interval (default 100ms)
  - Proper Yield() calls to allow ImGui event processing
  - Extended timeout for test execution
- ✅ Assert RPC Implementation - State validation with structured responses
  - Multiple assertion types: visible, enabled, exists, text_contains
  - Actual vs expected value reporting
  - Detailed error messages for debugging
  - text_contains partially implemented (text retrieval needs refinement)
- ✅ API Compatibility Fixes
  - Corrected ItemInfo usage (by value, check ID != 0)
  - Fixed flag names (ItemFlags instead of StatusFlags)
  - Proper visibility checks using RectClipped dimensions
  - All dynamic tests properly registered and cleaned up
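The Wait RPC's condition polling follows a familiar pattern: check, then yield, until a deadline passes. The Python sketch below shows that loop with the defaults quoted above (5000ms timeout, 100ms poll interval). It is a language-neutral illustration rather than the harness code; in the real handler the pause between checks is a Yield() into the ImGui frame loop, which time.sleep only approximates here.

```python
import time


def wait_for(condition, timeout_ms=5000, poll_interval_ms=100,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = clock() + timeout_ms / 1000.0
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False  # timed out without the condition holding
        sleep(poll_interval_ms / 1000.0)


# Example: a condition that becomes true on the third check.
checks = {"count": 0}

def window_visible():
    checks["count"] += 1
    return checks["count"] >= 3

print(wait_for(window_visible, timeout_ms=1000, poll_interval_ms=1))  # True
```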
Testing:
- Build successful on macOS ARM64
- All RPCs respond correctly
- Test script created: scripts/test_harness_e2e.sh
- See IT-01-PHASE3-COMPLETE.md for full implementation details
Known Limitations:
- Screenshot RPC not implemented (placeholder stub)
- text_contains assertion uses placeholder text retrieval
- Need end-to-end workflow testing with real YAZE widgets
- End-to-End Testing (1 hour)
- Create shell script workflow: start server → click button → wait for window → type text → assert state
- Test with real YAZE editors (Overworld, Dungeon, etc.)
- Document edge cases and troubleshooting
Phase 4: CLI Integration & Windows Testing (4-5 hours)
- CLI Client (z3ed agent test)
- Generate gRPC calls from AI prompts
- Natural language → ImGui action translation
- Screenshot capture for LLM feedback
- Emit structured error envelopes with artifact links (IT-08)
- Windows Testing
- Detailed build instructions for vcpkg setup
- Test on Windows VM or with contributor
- Add Windows CI job to GitHub Actions
- Document troubleshooting
IT-01 Quick Reference
Start YAZE with Test Harness:
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
Test RPCs with grpcurl:
# Ping - Health check
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Click - Click UI element
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"target":"button:Overworld","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Type - Input text
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type
# Wait - Wait for condition
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
# Assert - Validate state
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"condition":"visible:Main Window"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
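The request payloads above share a "kind:label" convention (for example button:Overworld or window_visible:Overworld Editor). The sketch below shows one way such a string could be split. WidgetTarget and ParseTarget are hypothetical names for illustration; the actual service may parse these fields differently.

```cpp
#include <string>

// Hypothetical representation of a "kind:label" spec such as
// "button:Overworld" or "window_visible:Overworld Editor".
struct WidgetTarget {
  std::string kind;   // text before the first ':'
  std::string label;  // everything after the first ':'
};

// Splits on the first ':' only, so labels may contain spaces or further
// colons. Returns false when either side of the separator is missing.
bool ParseTarget(const std::string& spec, WidgetTarget* out) {
  const std::string::size_type colon = spec.find(':');
  if (colon == std::string::npos || colon == 0 || colon + 1 == spec.size()) {
    return false;
  }
  out->kind = spec.substr(0, colon);
  out->label = spec.substr(colon + 1);
  return true;
}
```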
Troubleshooting:
- Port in use: killall yaze or use --test_harness_port=50053
- Connection refused: Check server started with lsof -i :50052
- Unrecognized flag: Use underscores not hyphens (e.g., --rom_file not --rom)
Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)
-
DESIGN: YAML-based Policy Configuration
# .yaze/policies/agent.yaml
version: 1.0
policies:
  - name: require_tests
    type: test_requirement
    enabled: true
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    rules:
      - max_bytes_changed: 10240  # 10KB
      - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
      - forbidden_ranges:
          - start: 0xFFB0  # ROM header
            end: 0xFFFF
  - name: human_review_required
    type: review_requirement
    enabled: true
    rules:
      - if: bytes_changed > 1024
        then:
          require_diff_review: true
      - if: commands_executed > 10
        then:
          require_log_review: true
-
IMPLEMENT: PolicyEvaluator Service
- src/cli/service/policy_evaluator.{h,cc}
- Singleton service loads policies from .yaze/policies/
- EvaluateProposal(proposal_id) -> PolicyResult
- Returns: pass/fail + list of violations with severity
- Hook into ProposalRegistry lifecycle
-
INTEGRATE: Policy UI in ProposalDrawer
- Add "Policy Status" section in detail view
- Display violations with icons: ⛔ Critical, ⚠️ Warning, ℹ️ Info
- Gate Accept button: disabled if critical violations exist
- Show helpful messages: "Accept blocked: Test pass rate 0.85 < 0.95"
- Allow policy overrides with confirmation: "Override policy? This action will be logged."
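The evaluation and gating steps above can be sketched for the change_constraint rule as follows. This is an illustration under stated assumptions, not the planned PolicyEvaluator: every type and function name here is hypothetical, and treating both rule breaches as critical is an assumption, since the plan does not say which severity each rule maps to.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class Severity { kInfo, kWarning, kCritical };

struct Violation {
  Severity severity;
  std::string message;
};

struct AddressRange {
  std::uint32_t start;
  std::uint32_t end;  // inclusive
};

// Hypothetical in-memory form of the limit_change_scope rules.
struct ChangeConstraint {
  std::size_t max_bytes_changed;
  std::vector<AddressRange> forbidden_ranges;
};

// Hypothetical summary of the ROM offsets a proposal modified.
struct ProposalChanges {
  std::vector<std::uint32_t> changed_offsets;
};

// Assumption: both kinds of breach are reported as critical.
std::vector<Violation> EvaluateChangeConstraint(const ChangeConstraint& rule,
                                                const ProposalChanges& changes) {
  std::vector<Violation> violations;
  if (changes.changed_offsets.size() > rule.max_bytes_changed) {
    violations.push_back(
        {Severity::kCritical,
         "Changed " + std::to_string(changes.changed_offsets.size()) +
             " bytes; limit is " + std::to_string(rule.max_bytes_changed)});
  }
  for (std::uint32_t offset : changes.changed_offsets) {
    for (const AddressRange& range : rule.forbidden_ranges) {
      if (offset >= range.start && offset <= range.end) {
        violations.push_back(
            {Severity::kCritical,
             "Write to forbidden offset " + std::to_string(offset)});
      }
    }
  }
  return violations;
}

// Mirrors the gating rule: Accept is disabled if any critical violation exists.
bool CanAccept(const std::vector<Violation>& violations) {
  for (const Violation& v : violations) {
    if (v.severity == Severity::kCritical) return false;
  }
  return true;
}
```

Returning the full violation list rather than a single boolean lets the UI show every reason an Accept is blocked, which matches the "list of violations with severity" result described above.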
Priority 3: Documentation & Consolidation (2-3 hours)
-
CONSOLIDATE: Merge standalone docs into main plan
- ✅ AW-03 summary → already in main plan, delete standalone doc
- Check for other AW-* or task-specific docs to merge
- Update main plan with architecture diagrams
-
CREATE: Architecture Flow Diagram
- Visual representation of proposal lifecycle
- Component interaction diagram
- Add to implementation plan
Later: Advanced Features
- VP-01: Expand CLI unit tests
- VP-02: Integration tests with replay scripts
- TL-01: Telemetry capture for learning
4. Current Issues & Blockers
Active Issues
None - all blocking issues resolved as of Oct 1, 2025
Known Limitations (Non-Blocking)
- ProposalDrawer lacks keyboard navigation
- Large diffs/logs truncated at 1000 lines (consider pagination)
- Proposals don't persist full metadata to disk (prompt, description, sandbox_id reconstructed)
- No policy evaluation yet (AW-04)
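The pagination suggested for large diffs and logs could look roughly like the sketch below. It is only an illustration: PageCount and GetPage are hypothetical helpers, and the default page size of 1000 simply reuses the current truncation limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Number of pages needed to show `line_count` lines.
std::size_t PageCount(std::size_t line_count, std::size_t page_size = 1000) {
  if (page_size == 0) return 0;
  return (line_count + page_size - 1) / page_size;
}

// Returns the lines of page `page_index` (0-based), or an empty vector when
// the index is past the end.
std::vector<std::string> GetPage(const std::vector<std::string>& lines,
                                 std::size_t page_index,
                                 std::size_t page_size = 1000) {
  std::vector<std::string> page;
  if (page_size == 0) return page;
  // Reject out-of-range indexes before multiplying, to avoid overflow.
  if (page_index > lines.size() / page_size) return page;
  const std::size_t begin = page_index * page_size;
  if (begin >= lines.size()) return page;
  const std::size_t count = std::min(page_size, lines.size() - begin);
  const auto first = lines.begin() + static_cast<std::ptrdiff_t>(begin);
  page.assign(first, first + static_cast<std::ptrdiff_t>(count));
  return page;
}
```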
5. Architecture Overview
5.1. Proposal Lifecycle Flow
┌─────────────────────────────────────────────────────────────────┐
│ 1. CREATION (CLI: z3ed agent run) │
├─────────────────────────────────────────────────────────────────┤
│ User Prompt │
│ ↓ │
│ MockAIService / GeminiAIService │
│ ↓ (generates commands) │
│ ["palette export ...", "overworld set-tile ..."] │
│ ↓ │
│ RomSandboxManager::CreateSandbox(rom) │
│ ↓ (creates isolated copy) │
│ /tmp/yaze/sandboxes/<timestamp>/zelda3.sfc │
│ ↓ │
│ Execute commands on sandbox ROM │
│ ↓ (logs each command) │
│ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc) │
│ ↓ (creates proposal directory) │
│ /tmp/yaze/proposals/proposal-<timestamp>-<seq>/ │
│ ├─ execution.log (command outputs) │
│ ├─ diff.txt (if generated) │
│ └─ screenshots/ (if any) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 2. DISCOVERY (CLI: z3ed agent list) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalRegistry::ListProposals() │
│ ↓ (lazy loads from disk) │
│ LoadProposalsFromDiskLocked() │
│ ↓ (scans /tmp/yaze/proposals/) │
│ Reconstructs metadata from filesystem │
│ ↓ (parses timestamps, reads logs) │
│ Returns vector<ProposalMetadata> │
│ ↓ │
│ Display table: ID | Status | Created | Prompt | Stats │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 3. REVIEW (GUI: Debug → Agent Proposals) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::Draw() │
│ ↓ (called every frame from EditorManager) │
│ ProposalDrawer::RefreshProposals() │
│ ↓ (calls ProposalRegistry::ListProposals) │
│ Display proposal list (selectable table) │
│ ↓ (user clicks proposal) │
│ ProposalDrawer::SelectProposal(id) │
│ ↓ (loads detail content) │
│ Read execution.log and diff.txt from proposal directory │
│ ↓ │
│ Display detail view: │
│ ├─ Metadata (sandbox_id, timestamp, stats) │
│ ├─ Diff (syntax highlighted) │
│ └─ Log (command execution trace) │
│ ↓ │
│ User decides: [Accept] [Reject] [Delete] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 4. ACCEPTANCE (GUI: Click "Accept" button) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::AcceptProposal(proposal_id) │
│ ↓ │
│ Get proposal metadata (includes sandbox_id) │
│ ↓ │
│ RomSandboxManager::ListSandboxes() │
│ ↓ (find sandbox by ID) │
│ sandbox_rom_path = sandbox.rom_path │
│ ↓ │
│ Load sandbox ROM from disk │
│ ↓ │
│ rom_->WriteVector(0, sandbox_rom.vector()) │
│ ↓ (copies entire sandbox ROM → main ROM) │
│ ROM marked dirty (save prompt appears) │
│ ↓ │
│ ProposalRegistry::UpdateStatus(id, kAccepted) │
│ ↓ │
│ User: File → Save ROM │
│ ↓ │
│ Changes committed ✅ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 5. REJECTION (GUI: Click "Reject" button) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::RejectProposal(proposal_id) │
│ ↓ │
│ ProposalRegistry::UpdateStatus(id, kRejected) │
│ ↓ │
│ Proposal preserved for audit trail │
│ Sandbox ROM left untouched (can be cleaned up later) │
└─────────────────────────────────────────────────────────────────┘
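The lifecycle above names proposal directories proposal-<timestamp>-<seq> and rebuilds metadata by parsing those names during discovery. The sketch below shows one way to format and parse such an ID. It assumes the compact timestamp layout seen in the example paths in this document (20251001T200215-1). MakeProposalId and ParseProposalId are hypothetical names, not the registry's actual functions.

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Formats "proposal-<YYYYMMDDTHHMMSS>-<seq>". Returns an empty string if the
// timestamp cannot be formatted.
std::string MakeProposalId(const std::tm& created_at, int sequence) {
  char stamp[32];
  if (std::strftime(stamp, sizeof(stamp), "%Y%m%dT%H%M%S", &created_at) == 0) {
    return "";
  }
  return "proposal-" + std::string(stamp) + "-" + std::to_string(sequence);
}

// Recovers the timestamp fields and sequence number from a directory name.
// Returns false for names that do not match the expected layout.
bool ParseProposalId(const std::string& id, std::tm* created_at, int* sequence) {
  int y = 0, mo = 0, d = 0, h = 0, mi = 0, s = 0, seq = 0;
  char trailing = 0;
  // Exactly 7 conversions means every field matched and nothing trailed.
  const int matched =
      std::sscanf(id.c_str(), "proposal-%4d%2d%2dT%2d%2d%2d-%d%c",
                  &y, &mo, &d, &h, &mi, &s, &seq, &trailing);
  if (matched != 7) return false;
  if (mo < 1 || mo > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
      mi < 0 || mi > 59 || s < 0 || s > 60 || seq < 0) {
    return false;
  }
  std::tm parsed = {};
  parsed.tm_year = y - 1900;
  parsed.tm_mon = mo - 1;
  parsed.tm_mday = d;
  parsed.tm_hour = h;
  parsed.tm_min = mi;
  parsed.tm_sec = s;
  *created_at = parsed;
  *sequence = seq;
  return true;
}
```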
5.2. Component Interaction Diagram
┌────────────────────┐
│ CLI Layer │
│ (z3ed commands) │
└────────┬───────────┘
│
├──► agent run ──────────┐
├──► agent list ─────────┤
└──► agent diff ─────────┤
│
┌────────────────────────▼──────────────────────┐
│ CLI Service Layer │
├───────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ ProposalRegistry (Singleton) │ │
│ │ • CreateProposal() │ │
│ │ • ListProposals() │ │
│ │ • GetProposal() │ │
│ │ • UpdateStatus() │ │
│ │ • RemoveProposal() │ │
│ │ • LoadProposalsFromDiskLocked() │ │
│ └────────────┬────────────────────────────┘ │
│ │ │
│ ┌────────────▼────────────────────────────┐ │
│ │ RomSandboxManager (Singleton) │ │
│ │ • CreateSandbox() │ │
│ │ • ActiveSandbox() │ │
│ │ • ListSandboxes() │ │
│ │ • RemoveSandbox() │ │
│ └────────────┬────────────────────────────┘ │
└───────────────┼────────────────────────────────┘
│
┌───────────────▼────────────────────────────────┐
│ Filesystem Layer │
├────────────────────────────────────────────────┤
│ /tmp/yaze/proposals/ │
│ └─ proposal-<timestamp>-<seq>/ │
│ ├─ execution.log │
│ ├─ diff.txt │
│ └─ screenshots/ │
│ │
│ /tmp/yaze/sandboxes/ │
│ └─ <timestamp>-<seq>/ │
│ └─ zelda3.sfc (isolated ROM copy) │
└────────────────────────────────────────────────┘
▲
│
┌───────────────┴────────────────────────────────┐
│ GUI Layer │
├────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ EditorManager │ │
│ │ • current_rom_ │ │
│ │ • proposal_drawer_ │ │
│ │ • Update() { proposal_drawer_.Draw() } │ │
│ └────────────┬────────────────────────────┘ │
│ │ │
│ ┌────────────▼────────────────────────────┐ │
│ │ ProposalDrawer │ │
│ │ • rom_ (ptr to EditorManager's ROM) │ │
│ │ • Draw() │ │
│ │ • DrawProposalList() │ │
│ │ • DrawProposalDetail() │ │
│ │ • AcceptProposal() ← ROM MERGE │ │
│ │ • RejectProposal() │ │
│ │ • DeleteProposal() │ │
│ └─────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
5.3. Data Flow: Agent Run to ROM Merge
User: "Make soldiers wear red armor"
│
▼
┌────────────────────────┐
│ MockAIService │ Generates: ["palette export sprites_aux1 4 soldier.col"]
└────────┬───────────────┘
│
▼
┌────────────────────────┐
│ RomSandboxManager │ Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc
└────────┬───────────────┘
│
▼
┌────────────────────────┐
│ Command Executor │ Runs: palette export on sandbox ROM
└────────┬───────────────┘
│
▼
┌────────────────────────┐
│ ProposalRegistry │ Creates: proposal-20251001T200215-1/
│ │ • execution.log: "[timestamp] palette export succeeded"
└────────┬───────────────┘ • diff.txt: (if diff generated)
│
│ Time passes... user launches GUI
▼
┌────────────────────────┐
│ ProposalDrawer loads │ Reads: /tmp/.../proposals/proposal-*/
│ │ Displays: List of proposals
└────────┬───────────────┘
│
│ User clicks "Accept"
▼
┌────────────────────────┐
│ AcceptProposal() │ 1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc
│ │ 2. Load sandbox ROM
│ │ 3. rom_->WriteVector(0, sandbox_rom.vector())
│ │ 4. Main ROM now contains all sandbox changes
│ │ 5. ROM marked dirty
└────────┬───────────────┘
│
▼
┌────────────────────────┐
│ User: File → Save │ Changes persisted to disk ✅
└────────────────────────┘
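The merge step in AcceptProposal() (copy the whole sandbox ROM over the main ROM at offset 0, then mark it dirty) can be sketched with a stand-in class. FakeRom below is hypothetical and models only what the diagram describes. The real Rom class and its WriteVector signature may differ, for example in how errors are reported.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for the editor's ROM class (assumption: not the real YAZE API).
class FakeRom {
 public:
  explicit FakeRom(std::vector<std::uint8_t> data) : data_(std::move(data)) {}

  // Copies `bytes` into the ROM starting at `offset`. Refuses writes that
  // would run past the end of the buffer, and marks the ROM dirty on success.
  bool WriteVector(std::size_t offset, const std::vector<std::uint8_t>& bytes) {
    if (offset > data_.size() || bytes.size() > data_.size() - offset) {
      return false;
    }
    std::copy(bytes.begin(), bytes.end(),
              data_.begin() + static_cast<std::ptrdiff_t>(offset));
    dirty_ = true;
    return true;
  }

  const std::vector<std::uint8_t>& vector() const { return data_; }
  bool dirty() const { return dirty_; }

 private:
  std::vector<std::uint8_t> data_;
  bool dirty_ = false;
};

// Mirrors step 3 of AcceptProposal(): the entire sandbox image replaces the
// main ROM contents; saving to disk remains a separate user action.
bool MergeSandboxIntoMain(const FakeRom& sandbox, FakeRom* main_rom) {
  return main_rom->WriteVector(0, sandbox.vector());
}
```

Because the whole image is copied, any edits made to the main ROM after the sandbox was created would be overwritten by the merge.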
5. Open Questions
- What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)?
➤ Decision: pursue a hybrid package (.z3ed-diff) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog.
- How should the harness authenticate escalation requests for mutation actions?
➤ Still open: evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01).
- Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary?
➤ Investigate during the ImGuiTestHarness spike; compare extending yaze_test jobs versus introducing a lightweight automation runner.
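To make the "binary deltas" half of the hybrid decision concrete, the sketch below collects contiguous changed byte runs between two equally sized ROM buffers. This is purely illustrative: the .z3ed-diff format has not been drafted yet, and ByteDelta and ComputeDeltas are hypothetical names rather than part of any agreed design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous run of changed bytes (hypothetical record layout).
struct ByteDelta {
  std::size_t offset;               // position of the first changed byte
  std::vector<std::uint8_t> bytes;  // replacement bytes from the modified ROM
};

// Fills `out` with the changed runs between two buffers of equal size.
// Returns false (leaving `out` empty) when the sizes differ, since this
// sketch does not model ROM expansion.
bool ComputeDeltas(const std::vector<std::uint8_t>& original,
                   const std::vector<std::uint8_t>& modified,
                   std::vector<ByteDelta>* out) {
  out->clear();
  if (original.size() != modified.size()) return false;
  std::size_t i = 0;
  while (i < original.size()) {
    if (original[i] == modified[i]) {
      ++i;
      continue;
    }
    ByteDelta delta;
    delta.offset = i;
    // Extend the run while bytes keep differing.
    while (i < original.size() && original[i] != modified[i]) {
      delta.bytes.push_back(modified[i]);
      ++i;
    }
    out->push_back(delta);
  }
  return true;
}
```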
4. Work History & Key Decisions
This section provides a high-level summary of completed workstreams and major architectural decisions.
Resource Catalogue Workstream (RC) - ✅ COMPLETE
- Outcome: A machine-readable API specification for all z3ed commands.
- Artifact: docs/api/z3ed-resources.yaml is the generated source of truth.
- Details: Implemented a schema system and serialization for all CLI resources (ROM, Palette, Agent, etc.), enabling AI consumption.
Acceptance Workflow (AW-01, AW-02, AW-03) - ✅ COMPLETE
- Outcome: A complete, human-in-the-loop proposal review system.
- Components:
  - RomSandboxManager: For creating isolated ROM copies.
  - ProposalRegistry: For tracking proposals, diffs, and logs with disk persistence.
  - ProposalDrawer: An ImGui panel for reviewing, accepting, and rejecting proposals, with full ROM merging capabilities.
- Integration: The agent run, agent list, and agent diff commands are fully integrated with the registry. The GUI and CLI share the same underlying proposal data.
ImGuiTestHarness (IT-01, IT-02) - ✅ CORE COMPLETE
- Outcome: A gRPC-based service for automated GUI testing.
- Decision: Chose gRPC for its performance, cross-platform support, and type safety.
- Features: Implemented 6 core RPCs: Ping, Click, Type, Wait, Assert, and a stubbed Screenshot.
- Integration: The z3ed agent test command can translate natural language prompts into a sequence of gRPC calls to execute tests.
Files Modified/Created
A summary of files created or changed during the implementation of the core z3ed infrastructure.
Core Services & CLI Handlers:
- src/cli/service/proposal_registry.{h,cc}
- src/cli/service/rom_sandbox_manager.{h,cc}
- src/cli/service/resource_catalog.{h,cc}
- src/cli/handlers/agent.cc
- src/cli/handlers/rom.cc
GUI & Application Integration:
- src/app/editor/system/proposal_drawer.{h,cc}
- src/app/editor/editor_manager.{h,cc}
- src/app/core/service/imgui_test_harness_service.{h,cc}
- src/app/core/proto/imgui_test_harness.proto
Build System (CMake):
- src/app/app.cmake
- src/app/emu/emu.cmake
- src/cli/z3ed.cmake
- src/CMakeLists.txt
Documentation & API Specs:
- docs/api/z3ed-resources.yaml
- docs/z3ed/E6-z3ed-cli-design.md
- docs/z3ed/E6-z3ed-implementation-plan.md
- docs/z3ed/E6-z3ed-reference.md
- docs/z3ed/README.md
Z3ED_AI Flag Migration Guide
Date: October 3, 2025
Status: ✅ Complete and Tested
Summary
This document describes the consolidation of z3ed AI build flags into a single Z3ED_AI master flag, fixing a Gemini integration crash, and improving build ergonomics.
Problem Statement
Before (Issues):
- Confusing Build Flags: Users had to specify -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON to enable AI features
- Crash on Startup: Gemini integration crashed due to PromptBuilder using JSON/YAML unconditionally
- Poor Modularity: AI dependencies scattered across multiple conditional blocks
- Unclear Documentation: Users didn't know which flags enabled which features
Root Cause of Crash:
// GeminiAIService constructor (ALWAYS runs when Gemini key present)
GeminiAIService::GeminiAIService(const GeminiConfig& config) : config_(config) {
// This line crashed when YAZE_WITH_JSON=OFF
prompt_builder_.LoadResourceCatalogue(""); // ❌ Uses nlohmann::json unconditionally
}
The PromptBuilder::LoadResourceCatalogue() function used nlohmann::json and yaml-cpp without guards, causing segfaults when JSON support wasn't compiled in.
Solution
1. Created Z3ED_AI Master Flag
New CMakeLists.txt (/Users/scawful/Code/yaze/CMakeLists.txt):
# Master flag for z3ed AI agent features
option(Z3ED_AI "Enable z3ed AI agent features (Gemini/Ollama integration)" OFF)
# Auto-enable dependencies
if(Z3ED_AI)
message(STATUS "Z3ED_AI enabled: Activating AI agent dependencies (JSON, YAML, httplib)")
set(YAZE_WITH_JSON ON CACHE BOOL "Enable JSON support" FORCE)
endif()
Benefits:
- ✅ Single flag to enable all AI features: -DZ3ED_AI=ON
- ✅ Auto-manages dependencies (JSON, YAML, httplib)
- ✅ Clear intent: "I want AI agent features"
- ✅ Backward compatible: Old flags still work
2. Fixed PromptBuilder Crash
Added Compile-Time Guard (src/cli/service/ai/prompt_builder.h):
#ifndef YAZE_CLI_SERVICE_PROMPT_BUILDER_H_
#define YAZE_CLI_SERVICE_PROMPT_BUILDER_H_
// Warn at compile time if JSON not available
#if !defined(YAZE_WITH_JSON)
#warning "PromptBuilder requires JSON support. Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON"
#endif
Added Runtime Guard (src/cli/service/ai/prompt_builder.cc):
absl::Status PromptBuilder::LoadResourceCatalogue(const std::string& yaml_path) {
#ifndef YAZE_WITH_JSON
// Gracefully degrade instead of crashing
std::cerr << "⚠️ PromptBuilder requires JSON support for catalogue loading\n"
<< " Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON\n"
<< " AI features will use basic prompts without tool definitions\n";
return absl::OkStatus(); // Don't crash, just skip advanced features
#else
// ... normal loading code ...
#endif
}
Benefits:
- ✅ No more segfaults when GEMINI_API_KEY is set but JSON disabled
- ✅ Clear error messages at compile time and runtime
- ✅ Graceful degradation instead of hard failure
3. Updated z3ed Build Configuration
New z3ed.cmake (src/cli/z3ed.cmake):
# AI Agent Support (Consolidated via Z3ED_AI flag)
if(Z3ED_AI OR YAZE_WITH_JSON)
target_compile_definitions(z3ed PRIVATE YAZE_WITH_JSON)
message(STATUS "✓ z3ed AI agent enabled (Ollama + Gemini support)")
target_link_libraries(z3ed PRIVATE nlohmann_json::nlohmann_json)
endif()
# SSL/HTTPS Support for Gemini
if((Z3ED_AI OR YAZE_WITH_JSON) AND (YAZE_WITH_GRPC OR Z3ED_AI))
find_package(OpenSSL)
if(OpenSSL_FOUND)
target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
target_link_libraries(z3ed PRIVATE OpenSSL::SSL OpenSSL::Crypto)
message(STATUS "✓ SSL/HTTPS support enabled for z3ed (Gemini API ready)")
else()
message(WARNING "OpenSSL not found - Gemini API will not work")
message(STATUS " • Ollama (local) still works without SSL")
endif()
endif()
Benefits:
- ✅ Clear status messages during build
- ✅ Explains what's enabled and what's missing
- ✅ Guidance on how to fix missing dependencies
Migration Instructions
For Users
Old Way (still works):
cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
cmake --build build --target z3ed
New Way (recommended):
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed
With GUI Testing:
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed
For Developers
Check if AI Features Available:
#ifdef YAZE_WITH_JSON
// JSON-dependent code (AI responses, config loading)
#else
// Fallback or warning
#endif
Don't use JSON/YAML directly - use PromptBuilder which handles guards automatically.
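One way to keep such guards in a single place is a small helper that the rest of the code queries instead of repeating #ifdef blocks. The sketch below is an assumption about how that could look. JsonSupportCompiledIn and DescribePromptMode are hypothetical names and do not exist in PromptBuilder.

```cpp
#include <string>

// Hypothetical helper: reports at compile time whether the build defined
// YAZE_WITH_JSON (set by -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON).
constexpr bool JsonSupportCompiledIn() {
#ifdef YAZE_WITH_JSON
  return true;
#else
  return false;
#endif
}

// Callers branch on the helper rather than on the macro, so the fallback
// path is always compiled and cannot silently rot.
std::string DescribePromptMode() {
  return JsonSupportCompiledIn()
             ? "full prompts with tool definitions"
             : "basic prompts without tool definitions";
}
```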
Testing Results
Build Configurations Tested ✅
- Minimal Build (no AI):
cmake -B build
./build/bin/z3ed --help  # ✅ Works, shows "AI disabled" message
- AI Enabled (new flag):
cmake -B build -DZ3ED_AI=ON
export GEMINI_API_KEY="..."
./build/bin/z3ed agent plan --prompt "test"  # ✅ Works, connects to Gemini
- Full Stack (AI + gRPC):
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
./build/bin/z3ed agent test --prompt "..."  # ✅ Works, GUI automation available
Crash Scenarios Fixed ✅
Before:
export GEMINI_API_KEY="..."
cmake -B build # JSON disabled by default
./build/bin/z3ed agent plan --prompt "test"
# Result: Segmentation fault (139) ❌
After:
export GEMINI_API_KEY="..."
cmake -B build # JSON disabled by default
./build/bin/z3ed agent plan --prompt "test"
# Result: ⚠️ Warning message, graceful degradation ✅
export GEMINI_API_KEY="..."
cmake -B build -DZ3ED_AI=ON # JSON enabled
./build/bin/z3ed agent plan --prompt "Place a tree at 10, 10"
# Result: ✅ Gemini responds, creates proposal
Impact on Build Modularization
This change aligns with the goals in build_modularization_plan.md and build_modularization_implementation.md:
Before:
- Scattered conditional compilation flags
- Dependencies unclear
- Hard to add to modular library system
After:
- ✅ Clear feature flag: Z3ED_AI
- ✅ Can create libyaze_agent.a with if(Z3ED_AI) guard
- ✅ Easy to make optional in modular build:
if(Z3ED_AI)
  add_library(yaze_agent STATIC ${YAZE_AGENT_SOURCES})
  target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
  target_link_libraries(yaze_agent PUBLIC nlohmann_json::nlohmann_json yaml-cpp)
endif()
Future Modular Build Integration
When implementing modular builds (Phase 6-7 from build_modularization_plan.md):
# src/cli/agent/agent_library.cmake (NEW)
if(Z3ED_AI)
add_library(yaze_agent STATIC
cli/service/ai/ai_service.cc
cli/service/ai/ollama_ai_service.cc
cli/service/ai/gemini_ai_service.cc
cli/service/ai/prompt_builder.cc
cli/service/agent/conversational_agent_service.cc
# ... other agent sources
)
target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON)
target_link_libraries(yaze_agent PUBLIC
yaze_util
nlohmann_json::nlohmann_json
yaml-cpp
)
# Optional SSL for Gemini
if(OpenSSL_FOUND)
target_compile_definitions(yaze_agent PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)
target_link_libraries(yaze_agent PRIVATE OpenSSL::SSL OpenSSL::Crypto)
endif()
message(STATUS "✓ yaze_agent library built with AI support")
endif()
Benefits for Modular Build:
- Agent library clearly optional
- Can rebuild just agent library when AI code changes
- z3ed links to yaze_agent instead of individual sources
- Faster incremental builds
Documentation Updates
Updated files:
- ✅ docs/z3ed/README.md - Added Z3ED_AI flag documentation
- ✅ docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md - This document
- 📋 TODO: Update docs/02-build-instructions.md with Z3ED_AI flag
- 📋 TODO: Update CI/CD workflows to use Z3ED_AI
Backward Compatibility
Old Flags Still Work ✅
# These all enable AI features:
cmake -B build -DYAZE_WITH_JSON=ON # ✅ Works
cmake -B build -DYAZE_WITH_GRPC=ON # ✅ Works (auto-enables JSON)
cmake -B build -DZ3ED_AI=ON # ✅ Works (new way)
# Combining flags:
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON # ✅ Full stack
No Breaking Changes
- Existing build scripts continue to work
- CI/CD pipelines don't need immediate updates
- Users can migrate at their own pace
Next Steps
Short Term (Complete)
- ✅ Fix Gemini crash
- ✅ Create Z3ED_AI master flag
- ✅ Update z3ed build configuration
- ✅ Test all build configurations
- ✅ Update README documentation
Medium Term (Recommended)
- Update CI/CD workflows to use -DZ3ED_AI=ON
- Add Z3ED_AI to preset configurations
- Update main build instructions docs
- Create agent library module (see above)
Long Term (Integration with Modular Build)
- Implement yaze_agent library (Phase 6)
- Add agent to modular dependency graph
- Create agent-specific unit tests
- Optional: Split Gemini/Ollama into separate modules
References
- Related Issues: Gemini crash (segfault 139) with GEMINI_API_KEY set
- Related Docs:
  - docs/build_modularization_plan.md - Future library structure
  - docs/build_modularization_implementation.md - Implementation guide
  - docs/z3ed/README.md - User-facing z3ed documentation
  - docs/z3ed/AGENT-ROADMAP.md - AI agent development plan
Summary
This migration successfully:
- ✅ Fixed crash: Gemini no longer segfaults when JSON disabled
- ✅ Simplified builds: One flag (Z3ED_AI) replaces multiple flags
- ✅ Improved UX: Clear error messages and build status
- ✅ Maintained compatibility: Old flags still work
- ✅ Prepared for modularization: Clear path to libyaze_agent.a
- ✅ Tested thoroughly: All configurations verified working
The z3ed AI agent is now production-ready with Gemini and Ollama support!
6. References
Active Documentation:
- E6-z3ed-cli-design.md - Overall CLI design and architecture
- E6-z3ed-reference.md - Technical command and API reference
- docs/api/z3ed-resources.yaml - Machine-readable API reference (generated)
Source Code:
- src/cli/service/ - Core services (proposal registry, sandbox manager, resource catalog)
- src/app/editor/system/proposal_drawer.{h,cc} - GUI review panel
- src/app/core/service/imgui_test_harness_service.{h,cc} - gRPC automation server
Last Updated: [Current Date]
Contributors: @scawful, GitHub Copilot
License: Same as YAZE (see ../../LICENSE)