# z3ed Agentic Workflow Plan

**Last Updated**: October 2, 2025

**Status**: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯

> 📋 **Quick Start**: See [README.md](README.md) for essential links and project status.

## Executive Summary

The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:

**✅ Completed Phases**:
- **Phase 6**: Resource Catalogue - Machine-readable API specs for AI consumption
- **AW-01/02/03**: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
- **AW-04**: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
- **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
- **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)

**🔄 Active Phase**:
- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform with a renewed emphasis on system-wide error reporting

**📋 Next Phases**:
- **Priority 1**: Test Introspection API (IT-05) - Enable test status querying and result polling
- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- **Priority 3**: Enhanced Error Reporting (IT-08+) - Holistic improvements spanning z3ed, ImGuiTestHarness, EditorManager, and core application services

**Recent Accomplishments** (Updated: October 2025):
- **✅ IT-08b Auto-Capture Complete**: Failure diagnostics now captured automatically
  - Execution context (frame count, active window, focused widget) captured on failure
  - Screenshot path placeholder set for future RPC integration
  - Proto schema updated with failure diagnostic fields
  - GetTestResults RPC returns comprehensive failure information
- **✅ IT-08a Screenshot RPC Complete**: SDL-based screenshot capture operational
  - Captures 1536x864 BMP files via SDL_RenderReadPixels
  - Successfully tested via gRPC (5.3MB
    output files)
  - Foundation for auto-capture on test failures
- **✅ Policy Framework Complete**: PolicyEvaluator service fully integrated with ProposalDrawer GUI
  - 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
  - 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
  - GUI displays color-coded violations (⛔ critical, ⚠️ warning, ℹ️ info)
  - Accept button gating based on policy violations, with an override confirmation dialog
  - Example policy configuration at `.yaze/policies/agent.yaml`
- **✅ E2E Validation Complete**: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
  - Window detection timing issue **resolved** with a 10-frame yield buffer in the Wait RPC
  - Thread safety issues **resolved** with shared_ptr state management
  - Test harness validated on macOS ARM64 with real YAZE GUI interactions
- **gRPC Test Harness (IT-01 & IT-02)**: Full implementation complete with natural language → GUI testing
- **✅ Test Recording & Replay (IT-07)**: JSON recorder/replayer implemented, CLI and harness wired, end-to-end regression workflow captured in `scripts/test_record_replay_e2e.sh`
- **Build System**: Hardened CMake configuration with reliable gRPC integration
- **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)

**Known Limitations & Improvement Opportunities**:
- **Screenshot Auto-Capture**: Failure context is captured automatically (IT-08b), but the screenshot path is still a placeholder → wire the Screenshot RPC into TestManager failure detection
- **Test Introspection**: ✅ Complete - GetTestStatus/ListTests/GetTestResults RPCs operational
- **Widget Discovery**: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
- **Test Recording**: ✅ Complete (IT-07) - JSON record/replay with regression coverage in `scripts/test_record_replay_e2e.sh`
- **Synchronous Wait**: Async tests return immediately → add blocking mode or result polling
- **Error Context**: Test failures lack screenshots/state dumps → enhance
  error reporting
- **Performance**: Tests add ~166ms per Wait call due to frame yielding (the 10-frame yield buffer at 60 FPS; acceptable trade-off)
- **YAML Parsing**: Simple parser implemented; consider yaml-cpp for complex scenarios

**Time Investment**: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)

## Quick Reference

**Start Test Harness**:
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &
```

**Test All RPCs**:
```bash
./scripts/test_harness_e2e.sh
```

**Create Proposal**:
```bash
./build/bin/z3ed agent run "Test prompt" --sandbox
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id
```

**Review in GUI**:
- Open YAZE → `Debug → Agent Proposals`
- Select proposal → Review → Accept/Reject/Delete

---

## 1. Current Priorities (Week of Oct 2-8, 2025)

**Status**: Core Infrastructure Complete ✅ | Test Harness Enhancement Phase 🔧

### Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE

**Goal**: Transform the test harness from basic automation into a comprehensive testing platform **and deliver holistic error reporting across YAZE**

**Time Estimate**: 20-25 hours total (7.5h completed in IT-07)

**Blocking Dependency**: IT-01 Complete ✅

**Motivation**: The harness now supports AI workflows, regression capture, and automation, but error surfaces remain shallow:
- **AI Agent Development**: Still needs widget discovery for adaptive planning
- **Regression Testing**: Recording/replay finished; reporting pipeline must surface actionable failures
- **CI/CD Integration**: Requires reliable artifacts (logs, screenshots, structured context)
- **Debugging**: Failures lack screenshots, widget hierarchies, and EditorManager state snapshots
- **Application Consistency**: z3ed, EditorManager, and core services emit heterogeneous error formats

#### IT-05: Test Introspection API (6-8 hours)

**Status (Oct 2, 2025)**: 🟡 *Server-side RPCs implemented; CLI + E2E
pending*

**Progress**:
- ✅ `imgui_test_harness.proto` expanded with GetTestStatus/ListTests/GetTestResults messages.
- ✅ `TestManager` maintains execution history (queued → running → completed) with logs, metrics, and aggregates.
- ✅ `ImGuiTestHarnessServiceImpl` exposes the three introspection RPCs with pagination, status conversion, and log/metric marshalling.
- ⚠️ `agent` CLI commands (`test status`, `test list`, `test results`) are still stubbed.
- ⚠️ End-to-end introspection script (`scripts/test_introspection_e2e.sh`) not implemented; the regression script `test_harness_e2e.sh` currently fails because it references the unfinished CLI.

**Immediate Next Steps**:
1. **Wire CLI Client Methods**
   - Implement gRPC client wrappers for the new RPCs in the automation client.
   - Add user-facing commands under `z3ed agent test ...` with JSON/YAML output options.
2. **Author E2E Validation Script**
   - Spin up the harness, run a Click/Assert workflow, poll via `agent test status`, and fetch results.
   - Update CI notes with the new script and expected output.
3. **Documentation & Examples**
   - Extend `E6-z3ed-reference.md` with full usage examples and sample outputs.
   - Add a troubleshooting section covering common errors (unknown test_id, timeout, etc.).
4. **Stretch (Optional Before IT-06)**
   - Capture assertion metadata (expected/actual) for richer `AssertionResult` payloads.
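
The polling loop in step 2 can be sketched client-side as follows. This is a minimal Python illustration under stated assumptions, not the planned CLI code: `fetch_status` is a hypothetical callable standing in for a GetTestStatus client wrapper, and it is assumed to return the enum name (`QUEUED`, `RUNNING`, `PASSED`, `FAILED`, `TIMEOUT`) from the IT-05 schema as a string. The `CLIENT_TIMEOUT` sentinel is likewise an invented name.

```python
import time
from typing import Callable

# Terminal values of GetTestStatusResponse.Status in the IT-05 schema.
TERMINAL_STATES = frozenset({"PASSED", "FAILED", "TIMEOUT"})


def poll_test_status(
    fetch_status: Callable[[str], str],
    test_id: str,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the test reaches a terminal state or the client deadline passes.

    fetch_status is a hypothetical wrapper around the GetTestStatus RPC that
    returns the status enum name. Returns that terminal name, or the sentinel
    "CLIENT_TIMEOUT" if the client gives up first.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(test_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            return "CLIENT_TIMEOUT"
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. Keeping the client-side deadline separate from the server's `TIMEOUT` state lets a caller tell a stalled harness apart from a test that timed out inside the harness.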

**Example Usage**:
```bash
# Queue a test
z3ed agent test --prompt "Open Overworld editor"

# Poll for completion
z3ed agent test status --test-id grpc_click_12345678

# Retrieve results
z3ed agent test results --test-id grpc_click_12345678 --format json
```

**API Schema**:
```proto
message GetTestStatusRequest {
  string test_id = 1;
}

message GetTestStatusResponse {
  enum Status {
    QUEUED = 0;
    RUNNING = 1;
    PASSED = 2;
    FAILED = 3;
    TIMEOUT = 4;
  }
  Status status = 1;
  int64 execution_time_ms = 2;
  string error_message = 3;
  repeated string assertion_failures = 4;
}

message ListTestsRequest {
  string category_filter = 1;  // Optional: "grpc", "unit", etc.
  int32 page_size = 2;
  string page_token = 3;
}

message ListTestsResponse {
  repeated TestInfo tests = 1;
  string next_page_token = 2;
}

message TestInfo {
  string test_id = 1;
  string name = 2;
  string category = 3;
  int64 last_run_timestamp_ms = 4;
  int32 total_runs = 5;
  int32 pass_count = 6;
  int32 fail_count = 7;
}
```

#### IT-06: Widget Discovery API (4-6 hours)

**Implementation Tasks**:
1. **Add DiscoverWidgets RPC**:
   - Enumerate all windows currently open in the YAZE GUI
   - List all interactive widgets (buttons, inputs, menus, tabs) per window
   - Return widget metadata: ID, type, label, enabled state, position
   - Support filtering by window name or widget type
2.
   **AI-Friendly Output Format**:
   - JSON schema describing available interactions
   - Natural language descriptions for each widget
   - Suggested action templates (e.g., "Click button:{label}")

**Example Usage**:
```bash
# Discover all widgets
z3ed gui discover

# Filter by window
z3ed gui discover --window "Overworld"

# Get only buttons
z3ed gui discover --type button
```

**API Schema**:
```proto
message DiscoverWidgetsRequest {
  string window_filter = 1;  // Optional: filter by window name
  enum WidgetType {
    ALL = 0;
    BUTTON = 1;
    INPUT = 2;
    MENU = 3;
    TAB = 4;
    CHECKBOX = 5;
  }
  WidgetType type_filter = 2;
}

message DiscoverWidgetsResponse {
  repeated WindowInfo windows = 1;
}

message WindowInfo {
  string name = 1;
  bool is_visible = 2;
  repeated WidgetInfo widgets = 3;
}

message WidgetInfo {
  string id = 1;
  string label = 2;
  string type = 3;              // "button", "input", "menu", etc.
  bool is_enabled = 4;
  string position = 5;          // "x,y,width,height"
  string suggested_action = 6;  // "Click button:Open ROM"
}
```

**Benefits for AI Agents**:
- LLMs can dynamically learn available GUI interactions
- Agents can adapt to UI changes without hardcoded widget names
- Natural language descriptions enable better prompt engineering

#### IT-07: Test Recording & Replay ✅ COMPLETE (Oct 2, 2025)

**Highlights**:
- Implemented `StartRecording`, `StopRecording`, and `ReplayTest` RPCs with persistent JSON scripts
- Added CLI commands: `z3ed test record start|stop`, `z3ed test replay`
- Scripts stored in `tests/gui/` with metadata (name, tags, assertions, timing hints)
- Added regression coverage via `scripts/test_record_replay_e2e.sh`
- Documentation updates in `E6-z3ed-reference.md` and new quick-start snippets in README
- Confirmed compatibility with natural language prompts generated by the agent workflow

**Outcome**: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).
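
Conceptually, replaying a recorded script means dispatching each step to the matching harness RPC and stopping at the first failure. The Python sketch below illustrates that loop under loudly stated assumptions: the script layout (`name`, `tags`, and a `steps` list whose entries carry an `action` key) is hypothetical, because the actual JSON schema of the `tests/gui/` scripts is not spelled out here, and the handler callables are stand-ins for the Click/Wait/Assert RPCs rather than the real replayer.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# A step handler takes one recorded step and returns (ok, message).
# In the harness these would wrap the Click/Type/Wait/Assert RPCs; here they
# are plain callables so the sketch runs without a server.
StepHandler = Callable[[dict], Tuple[bool, str]]


def replay_script(script_json: str, handlers: Dict[str, StepHandler]) -> dict:
    """Replay recorded steps in order and stop at the first failing step.

    Assumes a hypothetical script shape: {"name": ..., "steps": [{"action": ...}, ...]}.
    """
    script = json.loads(script_json)
    log: List[str] = []

    def result(passed: bool, failed_step: Optional[int], error: Optional[str]) -> dict:
        return {"name": script.get("name"), "passed": passed,
                "failed_step": failed_step, "error": error, "log": log}

    for index, step in enumerate(script.get("steps", [])):
        action = step.get("action")
        handler = handlers.get(action)
        if handler is None:
            # Unknown actions fail loudly instead of being silently skipped.
            return result(False, index, f"unknown action: {action}")
        ok, message = handler(step)
        log.append(f"{action}: {message}")
        if not ok:
            return result(False, index, message)
    return result(True, None, None)
```

Reporting the index of the first failed step, together with the log up to that point, gives the error-reporting work in IT-08 a natural place to attach screenshots and execution context.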
#### IT-08: Enhanced Error Reporting (5-7 hours) 🔄 ACTIVE

**Status**: IT-08a Complete ✅ | IT-08b Complete ✅ | IT-08c In Progress 🔄

**Objective**: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.

**Implementation Tracks**:
1. **Harness-Level Diagnostics**
   - ✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
   - ✅ IT-08b: Auto-capture screenshots and context on test failure
   - 🔄 IT-08c: Widget tree dumps and recent ImGui events on failure (NEXT)
   - Serialize results to both structured JSON (for automation) and human-friendly HTML bundles
   - Persist artifacts under `test-results//` with timestamped directories
2. **CLI Experience Improvements**
   - Standardize error envelopes in z3ed (`absl::Status` + structured payload)
   - Surface artifact paths, summarized failure reason, and next-step hints in CLI output
   - Add `--format html` / `--format json` flags to `z3ed agent test results` to emit richer context
   - Integrate with recording workflow: replay failures using captured state for fast reproduction
3. **EditorManager & Application Integration**
   - Introduce shared `ErrorAnnotatedResult` utility exposing `status`, `context`, `actionable_hint`
   - Adapt EditorManager subsystems (ProposalDrawer, OverworldEditor, DungeonEditor) to adopt the shared structure
   - Add in-app failure overlay (ImGui modal) that references harness artifacts when available
   - Hook proposal acceptance/replay flows to display enriched diagnostics when sandbox merges fail
4.
   **Telemetry & Storage Hooks** (Stretch)
   - Optionally emit error metadata to a ring buffer for future analytics/telemetry workstreams
   - Provide CLI flag `--error-artifact-dir` to customize storage (supports CI separation)

**Error Report Example**:
```json
{
  "test_id": "grpc_assert_12345678",
  "failure_time": "2025-10-02T14:23:45Z",
  "assertion": "visible:Overworld",
  "expected": "visible",
  "actual": "hidden",
  "screenshot": "/tmp/yaze_test_12345678.png",
  "widget_state": {
    "active_window": "Main Window",
    "focused_widget": null,
    "visible_windows": ["Main Window", "Debug"],
    "overworld_window": {
      "exists": true,
      "visible": false,
      "position": "0,0,0,0"
    }
  },
  "execution_context": {
    "frame_count": 1234,
    "recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
    "resource_stats": {
      "memory_mb": 245,
      "textures": 12,
      "framerate": 60.0
    },
    "editor_manager_snapshot": {
      "active_module": "OverworldEditor",
      "dirty_buffers": ["overworld_layer_1"],
      "last_error": null
    }
  }
}
```

#### IT-09: CI/CD Integration (2-3 hours)

**Implementation Tasks**:
1. **Standardized Test Suite Format**:
   - YAML/JSON format for test suite definitions
   - Support test groups (smoke, regression, nightly)
   - Enable parallel execution with dependencies
2. **CI-Friendly CLI**:
   - `z3ed test run-suite tests/suite.yaml --ci-mode`
   - Exit codes: 0 = all passed, 1 = failures, 2 = errors
   - JUnit XML output for CI parsers
   - GitHub Actions integration examples
3.
   **Documentation**:
   - Add `.github/workflows/gui-tests.yml` example
   - Create sample test suites for common scenarios
   - Document best practices for flaky test handling

**Test Suite Format**:
```yaml
name: YAZE GUI Test Suite
description: Comprehensive tests for YAZE editor functionality
version: 1.0

config:
  timeout_per_test: 30s
  retry_on_failure: 2
  parallel_execution: false

test_groups:
  - name: smoke
    description: Fast tests for basic functionality
    tests:
      - tests/overworld_load.json
      - tests/dungeon_load.json

  - name: regression
    description: Full test suite for release validation
    depends_on: [smoke]
    tests:
      - tests/palette_edit.json
      - tests/sprite_load.json
      - tests/rom_save.json
```

**GitHub Actions Integration**:
```yaml
name: GUI Tests
on: [push, pull_request]

jobs:
  gui-tests:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build YAZE with test harness
        run: |
          cmake -B build -DYAZE_WITH_GRPC=ON
          cmake --build build --target yaze z3ed

      - name: Start test harness
        run: |
          ./build/bin/yaze --enable_test_harness --headless &
          sleep 5

      - name: Run test suite
        run: |
          ./build/bin/z3ed test run-suite tests/suite.yaml --ci-mode

      - name: Upload test results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-results
          path: test-results/
```

---

#### IT-10: Collaborative Editing & Multiplayer Sessions (12-15 hours)

**Implementation Tasks**:
1. **Collaboration Server**:
   - WebSocket server for real-time client communication
   - Session management (create, join, authentication)
   - Edit event broadcasting to all connected clients
   - Conflict resolution (last-write-wins with timestamps)
2. **Collaboration Client**:
   - Connect to remote sessions via WebSocket
   - Send local edits to server
   - Receive and apply remote edits
   - ROM state synchronization on join
3. **Edit Event Protocol**:
   - Protobuf definitions for edit events (tile, sprite, palette, map)
   - Cursor position tracking
   - AI proposal sharing and voting
   - Session state messages
4.
   **GUI Integration**:
   - Status bar showing connected users
   - Collaboration panel (user list, activity feed)
   - Live cursor rendering (color-coded per user)
   - Proposal voting UI (Accept/Reject/Discuss)
5. **Session Recording & Replay**:
   - Record all events to YAML/JSON file
   - Replay engine with timeline controls
   - Export session summaries for review

**CLI Commands**:
```bash
# Host a collaborative session
z3ed collab host --port 5000 --password "dev123"

# Join a session
z3ed collab join yaze://connect/192.168.1.100:5000

# List active sessions (LAN discovery)
z3ed collab list

# Disconnect from session
z3ed collab disconnect

# Replay recorded session
z3ed collab replay session_2025_10_02.yaml --speed 2x
```

**User Stories**:
- **US-1**: As a ROM hacker, I want to host a collaborative session so my teammates can join and work together
- **US-2**: As a collaborator, I want to see other users' edits in real-time so we stay synchronized
- **US-3**: As a team lead, I want to use AI agents with my team so we can all benefit from automation (shared proposals with majority voting)
- **US-4**: As a collaborator, I want to see where other users are working so we don't conflict (live cursors)
- **US-5**: As a project manager, I want to record collaborative sessions so we can review work later

**Benefits**:
- **Real-Time Collaboration**: Multiple users can edit the same ROM simultaneously
- **Shared AI Assistance**: Team votes on AI proposals before execution
- **Conflict Prevention**: Live cursors show where teammates are working
- **Audit Trail**: Session recording for review and compliance
- **Remote Teams**: Connect over LAN or internet (with optional encryption)

**Technical Architecture**:
```
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│   Client A   │────►│  Collab Server  │◄────│   Client B   │
│    (Host)    │     │   (WebSocket)   │     │              │
└──────────────┘     │                 │     └──────────────┘
                     │ - Session Mgmt  │
                     │ - Event Broker  │     ┌──────────────┐
                     │ - Conflict Res  │◄────│   Client C   │
                     └─────────────────┘     └──────────────┘
```

**Security Considerations**:
- Optional password protection for sessions
- Read-only vs read-write access levels
- ROM checksum verification (prevents desync)
- Rate limiting (prevent spam/DoS)
- Optional TLS/SSL encryption for public internet

**See**: [IT-10-COLLABORATIVE-EDITING.md](IT-10-COLLABORATIVE-EDITING.md) for complete specification

---

### Priority 2: Windows Cross-Platform Testing 🪟

**Goal**: Validate z3ed and test harness on Windows

**Time Estimate**: 8-10 hours

**Blocking Dependency**: IT-05 Complete (need stable API)

> 📋 **Detailed Guides**: See [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for complete implementation breakdowns with code examples.

---

## 2. Workstreams Overview

| Workstream | Goal | Status | Notes |
|------------|------|--------|-------|
| Resource Catalogue | Machine-readable CLI specs for AI consumption | ✅ Complete | `docs/api/z3ed-resources.yaml` generated |
| Acceptance Workflow | Human review/approval of agent proposals | ✅ Complete | ProposalDrawer with ROM merging operational |
| ImGuiTest Bridge | Automated GUI testing via gRPC | ✅ Complete | All 3 phases done (11 hours) |
| Verification Pipeline | Layered testing + CI coverage | 🔄 In Progress | E2E validation phase |
| Telemetry & Learning | Capture signals for improvement | 📋 Planned | Optional/opt-in (Phase 8) |

### Completed Work Summary

**Resource Catalogue (RC)** ✅:
- CLI flag passthrough and resource catalog system
- `agent describe` exports YAML/JSON schemas
- `docs/api/z3ed-resources.yaml` maintained
- All ROM/Palette/Overworld/Dungeon/Patch commands documented

**Acceptance Workflow (AW-01/02/03)** ✅:
- `ProposalRegistry` with disk persistence and cross-session tracking
- `RomSandboxManager` for isolated ROM copies
- `agent list` and `agent diff` commands
- **ProposalDrawer GUI**: List/detail views, Accept/Reject/Delete, ROM merging
- Integrated into EditorManager (`Debug → Agent Proposals`)

**ImGuiTestHarness (IT-01)** ✅:
- Phase 1: gRPC infrastructure (6 RPC methods)
- Phase 2: TestManager integration with dynamic tests
- Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs)
- E2E test script: `scripts/test_harness_e2e.sh`
- Documentation: IT-01-QUICKSTART.md

---

## 3. Task Backlog

| ID | Task | Workstream | Type | Status | Dependencies / Notes |
|----|------|------------|------|--------|----------------------|
| RC-01 | Define schema for `ResourceCatalog` entries and implement serialization helpers. | Resource Catalogue | Code | ✅ Done | Schema system complete with all resource types documented |
| RC-02 | Auto-generate `docs/api/z3ed-resources.yaml` from command annotations. | Resource Catalogue | Tooling | ✅ Done | Generated and committed to docs/api/ |
| RC-03 | Implement `z3ed agent describe` CLI surface returning JSON schemas. | Resource Catalogue | Code | ✅ Done | Both YAML and JSON output formats working |
| RC-04 | Integrate schema export with TUI command palette + help overlays. | Resource Catalogue | UX | 📋 Planned | RC-03 |
| RC-05 | Harden CLI command routing/flag parsing to unblock agent automation. | Resource Catalogue | Code | ✅ Done | Fixed rom info handler to use FLAGS_rom |
| AW-01 | Implement sandbox ROM cloning and tracking (`RomSandboxManager`). | Acceptance Workflow | Code | ✅ Done | ROM sandbox manager operational with lifecycle management |
| AW-02 | Build proposal registry service storing diffs, logs, screenshots. | Acceptance Workflow | Code | ✅ Done | ProposalRegistry implemented with disk persistence |
| AW-03 | Add ImGui drawer for proposals with accept/reject controls. | Acceptance Workflow | UX | ✅ Done | ProposalDrawer GUI complete with ROM merging |
| AW-04 | Implement policy evaluation for gating accept buttons. | Acceptance Workflow | Code | ✅ Done | PolicyEvaluator service with 4 policy types (test, constraint, forbidden, review), GUI integration complete (6 hours) |
| AW-05 | Draft `.z3ed-diff` hybrid schema (binary deltas + JSON metadata). | Acceptance Workflow | Design | 📋 Planned | AW-01 |
| IT-01 | Create `ImGuiTestHarness` IPC service embedded in `yaze_test`. | ImGuiTest Bridge | Code | ✅ Done | Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours) |
| IT-02 | Implement CLI agent step translation (`imgui_action` → harness call). | ImGuiTest Bridge | Code | ✅ Done | `z3ed agent test` command with natural language prompts (7.5 hours) |
| IT-03 | Provide synchronization primitives (`WaitForIdle`, etc.). | ImGuiTest Bridge | Code | ✅ Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 |
| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | ✅ Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer |
| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetTestResults) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) |
| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | ✅ Done | IT-05 - StartRecording/StopRecording/ReplayTest RPCs with JSON test scripts |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | 🔄 Active | IT-01 - Capture widget state on failure for debugging (67% complete: IT-08a ✅, IT-08b ✅, IT-08c 🔄) |
| IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Screenshot capture complete (Oct 2, 2025) |
| IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | ✅ Done | IT-08a - Integrated with TestManager (Oct 2, 2025) |
| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | 🔄 Active | IT-08b - Enhanced failure diagnostics (NEXT PRIORITY) |
| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | 📋 Planned | IT-07 - JSON/YAML test suite format compatible with CI/CD pipelines |
| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |
| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | 📋 Planned | IT tasks |
| VP-03 | Create CI job running agent smoke tests with `YAZE_WITH_JSON`. | Verification Pipeline | Infra | 📋 Planned | VP-01, VP-02 |
| TL-01 | Capture accept/reject metadata and push to telemetry log. | Telemetry & Learning | Code | 📋 Planned | AW tasks |
| TL-02 | Build anonymized metrics exporter + opt-in toggle. | Telemetry & Learning | Infra | 📋 Planned | TL-01 |

_Status Legend: 🔄 Active · 📋 Planned · ✅ Done_

**Progress Summary**:
- ✅ Completed: 16 tasks (57%)
- 🔄 Active: 2 tasks (7%)
- 📋 Planned: 10 tasks (36%)
- **Total**: 28 tasks (counting IT-08a/b/c as separate rows alongside IT-08)

## 3. Immediate Next Steps (Week of Oct 1-7, 2025)

### Priority 0: Testing & Validation (Active)

1. **TEST**: Complete end-to-end proposal workflow
   - Launch YAZE and verify ProposalDrawer displays live proposals
   - Test Accept action → verify ROM merge and save prompt
   - Test Reject and Delete actions
   - Validate filtering and refresh functionality
2.
   **Widget ID Refactoring** (Started Oct 2, 2025) 🎯 NEW
   - ✅ Added widget_id_registry to build system
   - ✅ Registered 13 Overworld toolset buttons with hierarchical IDs
   - 📋 Next: Test widget discovery and update test harness
   - See: [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md)

### Priority 1: ImGuiTestHarness Foundation (IT-01) ✅ COMPLETE

**Rationale**: Required for automated GUI testing and remote control of YAZE for AI workflows

**Decision**: ✅ **Use gRPC** - Production-grade, cross-platform, type-safe (see `IT-01-grpc-evaluation.md`)

**Status**: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Complete ✅

#### Phase 1: gRPC Infrastructure ✅ COMPLETE

- ✅ Add gRPC to build system via FetchContent
- ✅ Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
- ✅ Implement gRPC server with all 6 RPC stubs
- ✅ Test with grpcurl - all RPCs responding
- ✅ Server lifecycle management (Start/Shutdown)
- ✅ Cross-platform build verified (macOS ARM64)

**See**: `GRPC_TEST_SUCCESS.md` for Phase 1 completion details

#### Phase 2: ImGuiTestEngine Integration ✅ COMPLETE

**Goal**: Replace stub RPC handlers with actual GUI automation

**Status**: Infrastructure complete, dynamic test registration implemented

**Time Spent**: ~4 hours

**Implementation Guide**: 📖 **[IT-01-PHASE2-IMPLEMENTATION-GUIDE.md](IT-01-PHASE2-IMPLEMENTATION-GUIDE.md)**

**Completed Tasks**:
1. ✅ **TestManager Integration** - gRPC service receives TestManager reference
2. ✅ **Build System** - Successfully compiles with ImGuiTestEngine support
3. ✅ **Server Startup** - gRPC server starts correctly on macOS with test harness flag
4. ✅ **Dynamic Test Registration** - Click RPC uses `IM_REGISTER_TEST()` macro for dynamic tests
5. ✅ **Stub Handlers** - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
6.
   ✅ **Ping RPC** - Fully functional, returns YAZE version and timestamp

**Key Learnings**:
- ImGuiTestEngine requires test registration; test functions can't be called directly
- Test context is provided by the engine via `test->Output.Status`, not `test->Status`
- YAZE uses a custom flag system with the `FLAGS_name->Get()` pattern
- Correct flags: `--enable_test_harness`, `--test_harness_port`, `--rom_file`

**Testing Results**:
```bash
# Server starts successfully
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Ping RPC working
grpcurl -plaintext -d '{"message":"test"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping

# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}
```

**Issues Fixed**:
- ❌→✅ SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
- ❌→✅ ImGuiTestEngine API mismatch (switched to dynamic test registration)
- ❌→✅ Status field access (corrected to `test->Output.Status`)
- ❌→✅ Port conflicts (use port 50052, `killall yaze` to clean up)
- ❌→✅ Flag naming (documented correct underscore format)

#### Phase 3: Full ImGuiTestEngine Integration ✅ COMPLETE (Oct 2, 2025)

**Goal**: Complete implementation of all GUI automation RPCs

**Completed Tasks**:
1. ✅ **Type RPC Implementation** - Full text input automation
   - ItemInfo API usage corrected (returns by value, not pointer)
   - Focus management with ItemClick before typing
   - Clear-first functionality with keyboard shortcuts
   - Dynamic test registration with timeout handling
2. ✅ **Wait RPC Implementation** - Condition polling with timeout
   - Three condition types: window_visible, element_visible, element_enabled
   - Configurable timeout (default 5000ms) and poll interval (default 100ms)
   - Proper Yield() calls to allow ImGui event processing
   - Extended timeout for test execution
3.
   ✅ **Assert RPC Implementation** - State validation with structured responses
   - Multiple assertion types: visible, enabled, exists, text_contains
   - Actual vs expected value reporting
   - Detailed error messages for debugging
   - text_contains partially implemented (text retrieval needs refinement)
4. ✅ **API Compatibility Fixes**
   - Corrected ItemInfo usage (by value, check ID != 0)
   - Fixed flag names (ItemFlags instead of StatusFlags)
   - Proper visibility checks using RectClipped dimensions
   - All dynamic tests properly registered and cleaned up

**Testing**:
- Build successful on macOS ARM64
- All RPCs respond correctly
- Test script created: `scripts/test_harness_e2e.sh`
- See `IT-01-PHASE3-COMPLETE.md` for full implementation details

**Known Limitations**:
- Screenshot RPC not implemented (placeholder stub at Phase 3 completion; since delivered in IT-08a)
- text_contains assertion uses placeholder text retrieval
- Need end-to-end workflow testing with real YAZE widgets

6. **End-to-End Testing** (1 hour)
   - Create shell script workflow: start server → click button → wait for window → type text → assert state
   - Test with real YAZE editors (Overworld, Dungeon, etc.)
   - Document edge cases and troubleshooting

#### Phase 4: CLI Integration & Windows Testing (4-5 hours)

7. **CLI Client** (`z3ed agent test`)
   - Generate gRPC calls from AI prompts
   - Natural language → ImGui action translation
   - Screenshot capture for LLM feedback
   - Emit structured error envelopes with artifact links (IT-08)
8.
   **Windows Testing**
   - Detailed build instructions for vcpkg setup
   - Test on a Windows VM or with a contributor
   - Add Windows CI job to GitHub Actions
   - Document troubleshooting

### IT-01 Quick Reference

**Start YAZE with Test Harness**:
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &
```

**Test RPCs with grpcurl**:
```bash
# Ping - Health check
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping

# Click - Click UI element
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

# Type - Input text
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type

# Wait - Wait for condition
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

# Assert - Validate state
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Main Window"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
```

**Troubleshooting**:
- **Port in use**: `killall yaze` or use `--test_harness_port=50053`
- **Connection refused**: Check that the server started with `lsof -i :50052`
- **Unrecognized flag**: Use underscores, not hyphens, and the full flag name (e.g., `--rom_file`, not `--rom-file` or `--rom`)

### Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)

5.
   **DESIGN**: YAML-based Policy Configuration

   ```yaml
   # .yaze/policies/agent.yaml
   version: 1.0
   policies:
     - name: require_tests
       type: test_requirement
       enabled: true
       rules:
         - test_suite: "overworld_rendering"
           min_pass_rate: 0.95
         - test_suite: "palette_integrity"
           min_pass_rate: 1.0

     - name: limit_change_scope
       type: change_constraint
       enabled: true
       rules:
         - max_bytes_changed: 10240  # 10KB
         - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
         - forbidden_ranges:
             - start: 0xFFB0  # ROM header
               end: 0xFFFF

     - name: human_review_required
       type: review_requirement
       enabled: true
       rules:
         - if: bytes_changed > 1024
           then:
             require_diff_review: true
         - if: commands_executed > 10
           then:
             require_log_review: true
   ```

6. **IMPLEMENT**: PolicyEvaluator Service
   - `src/cli/service/policy_evaluator.{h,cc}`
   - Singleton service that loads policies from `.yaze/policies/`
   - `EvaluateProposal(proposal_id) -> PolicyResult`
   - Returns pass/fail plus a list of violations with severity
   - Hooks into the ProposalRegistry lifecycle

7. **INTEGRATE**: Policy UI in ProposalDrawer
   - Add a "Policy Status" section to the detail view
   - Display violations with icons: ⛔ Critical, ⚠️ Warning, ℹ️ Info
   - Gate the Accept button: disabled if critical violations exist
   - Show helpful messages, e.g. "Accept blocked: Test pass rate 0.85 < 0.95"
   - Allow policy overrides with confirmation: "Override policy? This action will be logged."

### Priority 3: Documentation & Consolidation (2-3 hours)

8. **CONSOLIDATE**: Merge standalone docs into the main plan
   - ✅ AW-03 summary → already in the main plan; delete the standalone doc
   - Check for other AW-* or task-specific docs to merge
   - Update the main plan with architecture diagrams

9. **CREATE**: Architecture Flow Diagram
   - Visual representation of the proposal lifecycle
   - Component interaction diagram
   - Add to the implementation plan

### Later: Advanced Features

- VP-01: Expand CLI unit tests
- VP-02: Integration tests with replay scripts
- TL-01: Telemetry capture for learning
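To make the policy framework planned in Priority 2 (items 5-7) more concrete, here is a minimal, self-contained C++ sketch of how the `change_constraint` rules (`max_bytes_changed`, `forbidden_ranges`) and the three severity levels could drive Accept-button gating. It is illustrative only and is not the `PolicyEvaluator` in `src/cli/service/policy_evaluator.{h,cc}`. The names `ChangeConstraint`, `Violation`, `EvaluateChangeConstraint`, and `CanAccept` are hypothetical. Ranges are compared against raw byte offsets in the ROM image, which is an assumption (the real evaluator may translate SNES addresses first). `allowed_banks` is not modeled. Mapping a forbidden-range hit to Critical and an oversized change to Warning is also an assumed policy choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the actual PolicyEvaluator API.
enum class Severity { kInfo, kWarning, kCritical };

struct Violation {
  std::string policy_name;
  Severity severity;
  std::string message;
};

// Inclusive range, compared against raw byte offsets in the ROM image.
struct ByteRange {
  std::size_t start;
  std::size_t end;
};

// In-memory form of a `change_constraint` policy (allowed_banks omitted).
struct ChangeConstraint {
  std::string name;
  std::size_t max_bytes_changed;
  std::vector<ByteRange> forbidden_ranges;
};

// Diffs the current ROM against the sandbox ROM and returns any violations.
// Bytes past the end of the shorter image are counted as changed.
std::vector<Violation> EvaluateChangeConstraint(
    const ChangeConstraint& policy, const std::vector<std::uint8_t>& original,
    const std::vector<std::uint8_t>& sandbox) {
  std::size_t changed = 0;
  bool hit_forbidden = false;
  const std::size_t total = std::max(original.size(), sandbox.size());
  for (std::size_t i = 0; i < total; ++i) {
    const bool differs = i >= original.size() || i >= sandbox.size() ||
                         original[i] != sandbox[i];
    if (!differs) continue;
    ++changed;
    for (const ByteRange& range : policy.forbidden_ranges) {
      assert(range.start <= range.end);
      if (i >= range.start && i <= range.end) hit_forbidden = true;
    }
  }

  std::vector<Violation> violations;
  if (hit_forbidden) {
    // Assumed mapping: touching a protected range is never overridable.
    violations.push_back({policy.name, Severity::kCritical,
                          "Change touches a forbidden range"});
  }
  if (changed > policy.max_bytes_changed) {
    // Assumed mapping: an oversized change can be accepted with an override.
    violations.push_back({policy.name, Severity::kWarning,
                          "Changed " + std::to_string(changed) +
                              " bytes (limit " +
                              std::to_string(policy.max_bytes_changed) + ")"});
  }
  return violations;
}

// Accept-button gating: Critical blocks, Warning needs a confirmed override,
// Info never blocks.
bool CanAccept(const std::vector<Violation>& violations,
               bool override_confirmed) {
  bool has_warning = false;
  for (const Violation& v : violations) {
    if (v.severity == Severity::kCritical) return false;
    if (v.severity == Severity::kWarning) has_warning = true;
  }
  return !has_warning || override_confirmed;
}
```

For example, with `max_bytes_changed` set to 4 and a forbidden range of `0xFFB0` to `0xFFFF`, an 8-byte edit at offset 0 yields a single Warning that is accepted only when `override_confirmed` is true. A 1-byte edit at `0xFFC0` yields a Critical violation that blocks acceptance even with an override. Keeping evaluation (`EvaluateChangeConstraint`) separate from gating (`CanAccept`) mirrors the plan's split between the evaluator service and the ProposalDrawer UI.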
## 4. Current Issues & Blockers

### Active Issues

None. All blocking issues were resolved as of Oct 1, 2025.

### Known Limitations (Non-Blocking)

1. ProposalDrawer lacks keyboard navigation
2. Large diffs/logs are truncated at 1000 lines (consider pagination)
3. Proposals don't persist full metadata to disk (prompt, description, and sandbox_id are reconstructed)
4. No policy evaluation yet (AW-04)

## 5. Architecture Overview

### 5.1. Proposal Lifecycle Flow

```
┌─────────────────────────────────────────────────────────────────┐
│ 1. CREATION (CLI: z3ed agent run)                               │
├─────────────────────────────────────────────────────────────────┤
│ User Prompt                                                     │
│   ↓                                                             │
│ MockAIService / GeminiAIService                                 │
│   ↓ (generates commands)                                        │
│ ["palette export ...", "overworld set-tile ..."]                │
│   ↓                                                             │
│ RomSandboxManager::CreateSandbox(rom)                           │
│   ↓ (creates isolated copy)                                     │
│ /tmp/yaze/sandboxes//zelda3.sfc                                 │
│   ↓                                                             │
│ Execute commands on sandbox ROM                                 │
│   ↓ (logs each command)                                         │
│ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc)      │
│   ↓ (creates proposal directory)                                │
│ /tmp/yaze/proposals/proposal--/                                 │
│   ├─ execution.log (command outputs)                            │
│   ├─ diff.txt (if generated)                                    │
│   └─ screenshots/ (if any)                                      │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 2. DISCOVERY (CLI: z3ed agent list)                             │
├─────────────────────────────────────────────────────────────────┤
│ ProposalRegistry::ListProposals()                               │
│   ↓ (lazy loads from disk)                                      │
│ LoadProposalsFromDiskLocked()                                   │
│   ↓ (scans /tmp/yaze/proposals/)                                │
│ Reconstructs metadata from filesystem                           │
│   ↓ (parses timestamps, reads logs)                             │
│ Returns vector                                                  │
│   ↓                                                             │
│ Display table: ID | Status | Created | Prompt | Stats           │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 3. REVIEW (GUI: Debug → Agent Proposals)                        │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::Draw()                                          │
│   ↓ (called every frame from EditorManager)                     │
│ ProposalDrawer::RefreshProposals()                              │
│   ↓ (calls ProposalRegistry::ListProposals)                     │
│ Display proposal list (selectable table)                        │
│   ↓ (user clicks proposal)                                      │
│ ProposalDrawer::SelectProposal(id)                              │
│   ↓ (loads detail content)                                      │
│ Read execution.log and diff.txt from proposal directory         │
│   ↓                                                             │
│ Display detail view:                                            │
│   ├─ Metadata (sandbox_id, timestamp, stats)                    │
│   ├─ Diff (syntax highlighted)                                  │
│   └─ Log (command execution trace)                              │
│   ↓                                                             │
│ User decides: [Accept] [Reject] [Delete]                        │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 4. ACCEPTANCE (GUI: Click "Accept" button)                      │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::AcceptProposal(proposal_id)                     │
│   ↓                                                             │
│ Get proposal metadata (includes sandbox_id)                     │
│   ↓                                                             │
│ RomSandboxManager::ListSandboxes()                              │
│   ↓ (find sandbox by ID)                                        │
│ sandbox_rom_path = sandbox.rom_path                             │
│   ↓                                                             │
│ Load sandbox ROM from disk                                      │
│   ↓                                                             │
│ rom_->WriteVector(0, sandbox_rom.vector())                      │
│   ↓ (copies entire sandbox ROM → main ROM)                      │
│ ROM marked dirty (save prompt appears)                          │
│   ↓                                                             │
│ ProposalRegistry::UpdateStatus(id, kAccepted)                   │
│   ↓                                                             │
│ User: File → Save ROM                                           │
│   ↓                                                             │
│ Changes committed ✅                                            │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 5. REJECTION (GUI: Click "Reject" button)                       │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::RejectProposal(proposal_id)                     │
│   ↓                                                             │
│ ProposalRegistry::UpdateStatus(id, kRejected)                   │
│   ↓                                                             │
│ Proposal preserved for audit trail                              │
│ Sandbox ROM left untouched (can be cleaned up later)            │
└─────────────────────────────────────────────────────────────────┘
```

### 5.2. Component Interaction Diagram

```
┌────────────────────┐
│ CLI Layer          │
│ (z3ed commands)    │
└────────┬───────────┘
         │
         ├──► agent run ──────────┐
         ├──► agent list ─────────┤
         └──► agent diff ─────────┤
                                  │
┌─────────────────────────────────▼───────────────┐
│ CLI Service Layer                               │
├─────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────┐    │
│  │ ProposalRegistry (Singleton)            │    │
│  │  • CreateProposal()                     │    │
│  │  • ListProposals()                      │    │
│  │  • GetProposal()                        │    │
│  │  • UpdateStatus()                       │    │
│  │  • RemoveProposal()                     │    │
│  │  • LoadProposalsFromDiskLocked()        │    │
│  └────────────┬────────────────────────────┘    │
│               │                                 │
│  ┌────────────▼────────────────────────────┐    │
│  │ RomSandboxManager (Singleton)           │    │
│  │  • CreateSandbox()                      │    │
│  │  • ActiveSandbox()                      │    │
│  │  • ListSandboxes()                      │    │
│  │  • RemoveSandbox()                      │    │
│  └────────────┬────────────────────────────┘    │
└───────────────┼─────────────────────────────────┘
                │
┌───────────────▼─────────────────────────────────┐
│ Filesystem Layer                                │
├─────────────────────────────────────────────────┤
│ /tmp/yaze/proposals/                            │
│   └─ proposal--/                                │
│       ├─ execution.log                          │
│       ├─ diff.txt                               │
│       └─ screenshots/                           │
│                                                 │
│ /tmp/yaze/sandboxes/                            │
│   └─ -/                                         │
│       └─ zelda3.sfc (isolated ROM copy)         │
└─────────────────────────────────────────────────┘
                ▲
                │
┌───────────────┴─────────────────────────────────┐
│ GUI Layer                                       │
├─────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────┐    │
│  │ EditorManager                           │    │
│  │  • current_rom_                         │    │
│  │  • proposal_drawer_                     │    │
│  │  • Update() { proposal_drawer_.Draw() } │    │
│  └────────────┬────────────────────────────┘    │
│               │                                 │
│  ┌────────────▼────────────────────────────┐    │
│  │ ProposalDrawer                          │    │
│  │  • rom_ (ptr to EditorManager's ROM)    │    │
│  │  • Draw()                               │    │
│  │  • DrawProposalList()                   │    │
│  │  • DrawProposalDetail()                 │    │
│  │  • AcceptProposal() ← ROM MERGE         │    │
│  │  • RejectProposal()                     │    │
│  │  • DeleteProposal()                     │    │
│  └─────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘
```

### 5.3. Data Flow: Agent Run to ROM Merge

```
User: "Make soldiers wear red armor"
         │
         ▼
┌────────────────────────┐
│ MockAIService          │  Generates: ["palette export sprites_aux1 4 soldier.col"]
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ RomSandboxManager      │  Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ Command Executor       │  Runs: palette export on sandbox ROM
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ ProposalRegistry       │  Creates: proposal-20251001T200215-1/
│                        │    • execution.log: "[timestamp] palette export succeeded"
└────────┬───────────────┘    • diff.txt: (if diff generated)
         │
         │ Time passes... user launches GUI
         ▼
┌────────────────────────┐
│ ProposalDrawer loads   │  Reads: /tmp/.../proposals/proposal-*/
│                        │  Displays: List of proposals
└────────┬───────────────┘
         │
         │ User clicks "Accept"
         ▼
┌────────────────────────┐
│ AcceptProposal()       │  1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc
│                        │  2. Load sandbox ROM
│                        │  3. rom_->WriteVector(0, sandbox_rom.vector())
│                        │  4. Main ROM now contains all sandbox changes
│                        │  5. ROM marked dirty
└────────┬───────────────┘
         │
         ▼
┌────────────────────────┐
│ User: File → Save      │  Changes persisted to disk ✅
└────────────────────────┘
```

## 6. Open Questions

- What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)? \
  ➤ Decision: pursue a hybrid package (`.z3ed-diff`) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture the format draft under the RC/AW backlog.
- How should the harness authenticate escalation requests for mutation actions? \
  ➤ Still open: evaluate a shared secret vs. an interactive user prompt in the harness spike (IT-01).
- Can we reuse the existing regression test infrastructure for nightly ImGui runs, or should we spin up a dedicated binary? \
  ➤ Investigate during the ImGuiTestHarness spike; compare extending `yaze_test` jobs versus introducing a lightweight automation runner.

## 7. Work History & Key Decisions

This section provides a high-level summary of completed workstreams and major architectural decisions.
### Resource Catalogue Workstream (RC) - ✅ COMPLETE

- **Outcome**: A machine-readable API specification for all `z3ed` commands.
- **Artifact**: `docs/api/z3ed-resources.yaml` is the generated source of truth.
- **Details**: Implemented a schema system and serialization for all CLI resources (ROM, Palette, Agent, etc.), enabling AI consumption.

### Acceptance Workflow (AW-01, AW-02, AW-03) - ✅ COMPLETE

- **Outcome**: A complete, human-in-the-loop proposal review system.
- **Components**:
  - `RomSandboxManager`: For creating isolated ROM copies.
  - `ProposalRegistry`: For tracking proposals, diffs, and logs with disk persistence.
  - `ProposalDrawer`: An ImGui panel for reviewing, accepting, and rejecting proposals, with full ROM merging capabilities.
- **Integration**: The `agent run`, `agent list`, and `agent diff` commands are fully integrated with the registry. The GUI and CLI share the same underlying proposal data.

### ImGuiTestHarness (IT-01, IT-02) - ✅ CORE COMPLETE

- **Outcome**: A gRPC-based service for automated GUI testing.
- **Decision**: Chose **gRPC** for its performance, cross-platform support, and type safety.
- **Features**: Implemented 6 core RPCs: `Ping`, `Click`, `Type`, `Wait`, `Assert`, and a stubbed `Screenshot`.
- **Integration**: The `z3ed agent test` command can translate natural language prompts into a sequence of gRPC calls to execute tests.

### Files Modified/Created

A summary of files created or changed during the implementation of the core `z3ed` infrastructure.
**Core Services & CLI Handlers**:
- `src/cli/service/proposal_registry.{h,cc}`
- `src/cli/service/rom_sandbox_manager.{h,cc}`
- `src/cli/service/resource_catalog.{h,cc}`
- `src/cli/handlers/agent.cc`
- `src/cli/handlers/rom.cc`

**GUI & Application Integration**:
- `src/app/editor/system/proposal_drawer.{h,cc}`
- `src/app/editor/editor_manager.{h,cc}`
- `src/app/core/service/imgui_test_harness_service.{h,cc}`
- `src/app/core/proto/imgui_test_harness.proto`

**Build System (CMake)**:
- `src/app/app.cmake`
- `src/app/emu/emu.cmake`
- `src/cli/z3ed.cmake`
- `src/CMakeLists.txt`

**Documentation & API Specs**:
- `docs/api/z3ed-resources.yaml`
- `docs/z3ed/E6-z3ed-cli-design.md`
- `docs/z3ed/E6-z3ed-implementation-plan.md`
- `docs/z3ed/E6-z3ed-reference.md`
- `docs/z3ed/README.md`

## 8. References

**Active Documentation**:
- `E6-z3ed-cli-design.md` - Overall CLI design and architecture
- `E6-z3ed-reference.md` - Technical command and API reference
- `docs/api/z3ed-resources.yaml` - Machine-readable API reference (generated)

**Source Code**:
- `src/cli/service/` - Core services (proposal registry, sandbox manager, resource catalog)
- `src/app/editor/system/proposal_drawer.{h,cc}` - GUI review panel
- `src/app/core/service/imgui_test_harness_service.{h,cc}` - gRPC automation server

---

**Contributors**: @scawful, GitHub Copilot

**License**: Same as YAZE (see ../../LICENSE)