Files
yaze/docs/z3ed/E6-z3ed-implementation-plan.md
scawful 286efdec6a Enhance ImGuiTestHarness with dynamic test integration and end-to-end validation
- Updated README.md to reflect the completion of IT-01 and the transition to end-to-end validation phase.
- Introduced a new end-to-end test script (scripts/test_harness_e2e.sh) for validating all RPC methods of the ImGuiTestHarness gRPC service.
- Implemented dynamic test functionality in ImGuiTestHarnessService for Type, Wait, and Assert methods, utilizing ImGuiTestEngine.
- Enhanced error handling and response messages for better clarity during test execution.
- Updated existing methods to support dynamic test registration and execution, ensuring robust interaction with the GUI elements.
2025-10-02 00:49:28 -04:00

862 lines
46 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# z3ed Agentic Workflow Implementation Plan
**Last Updated**: October 2, 2025
**Status**: IT-01 Complete ✅ | AW-03 Complete ✅ | E2E Validation Phase
> 📋 **See Also**: [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for detailed implementation guides for current priorities.
## Executive Summary
The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:
**✅ Completed Phases**:
- **Phase 6**: Resource Catalogue - Machine-readable API specs for AI consumption
- **AW-01/02/03**: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
- **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
**🔄 Active Phase**:
- **Priority 1**: End-to-End Workflow Validation - Test complete proposal lifecycle with real GUI
**📋 Next Phases**:
- **Priority 2**: CLI Agent Test Command (IT-02) - Natural language → automated GUI testing
- **Priority 3**: Policy Evaluation Framework (AW-04) - YAML-based constraints for proposal acceptance
## Quick Reference
**Start Test Harness**:
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
**Test All RPCs**:
```bash
./scripts/test_harness_e2e.sh
```
**Create Proposal**:
```bash
./build/bin/z3ed agent run "Test prompt" --sandbox
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id <ID>
```
**Review in GUI**:
- Open YAZE → `Debug → Agent Proposals`
- Select proposal → Review → Accept/Reject/Delete
---
## 1. Current Priorities (Week of Oct 2-8, 2025)
**Status**: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Complete ✅
### Priority 1: End-to-End Workflow Validation (ACTIVE) 🔄
**Goal**: Validate complete AI agent workflow from proposal creation to ROM commit
**Time Estimate**: 2-3 hours
**Status**: Ready to execute
**Task Checklist**:
1.**E2E Test Script**: Already created (`scripts/test_harness_e2e.sh`)
2. 📋 **Manual Testing Workflow**:
- Start YAZE with test harness enabled
- Create proposal via CLI: `z3ed agent run "Test prompt" --sandbox`
- Verify proposal appears in ProposalDrawer GUI
- Test Accept → validate ROM merge and save prompt
- Test Reject → validate status update
- Test Delete → validate cleanup
3. 📋 **Real Widget Testing**:
- Click actual YAZE buttons (Overworld, Dungeon, etc.)
- Type into real input fields
- Wait for actual windows to appear
- Assert on real widget states
4. 📋 **Document Edge Cases**:
- Widget not found scenarios
- Timeout handling
- Error recovery patterns
### Priority 2: CLI Agent Test Command (IT-02) 📋 NEXT
**Goal**: Natural language → automated GUI testing via gRPC
**Time Estimate**: 4-6 hours
**Blocking Dependency**: Priority 1 completion
**Implementation Tasks**:
1. **Create `z3ed agent test` command**:
- Parse natural language prompt
- Generate RPC call sequence (Click → Wait → Assert)
- Execute via gRPC client
- Capture results and screenshots
2. **Example Usage**:
```bash
z3ed agent test --prompt "Open Overworld editor and verify it loads" \
--rom zelda3.sfc
# Generated workflow:
# 1. Click "button:Overworld"
# 2. Wait "window_visible:Overworld Editor" (5s)
# 3. Assert "visible:Overworld Editor"
# 4. Screenshot "full"
```
3. **Implementation Files**:
- `src/cli/handlers/agent.cc` - Add `HandleTestCommand()`
- `src/cli/service/gui_automation_client.{h,cc}` - gRPC client wrapper
- `src/cli/service/test_workflow_generator.{h,cc}` - Prompt → RPC translator
### Priority 3: Policy Evaluation Framework (AW-04) 📋
**Goal**: YAML-based constraint system for gating proposal acceptance
**Time Estimate**: 6-8 hours
**Blocking Dependency**: None (can work in parallel)
> <20> **Detailed Guides**: See [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for complete implementation breakdowns with code examples.
---
## 2. Workstreams Overview
This plan decomposes the design additions into actionable engineering tasks. Each workstream contains milestones, blocking dependencies, and expected deliverables.
1. `src/cli/handlers/rom.cc` - Added `RomInfo::Run` implementation
2. `src/cli/z3ed.h` - Added `RomInfo` class declaration
3. `src/cli/modern_cli.cc` - Updated `HandleRomInfoCommand` routing
4. `src/cli/service/resource_catalog.cc` - Added `rom info` schema entry
---
## 2. Workstreams Overview
| Workstream | Goal | Status | Notes |
|------------|------|--------|-------|
| Resource Catalogue | Machine-readable CLI specs for AI consumption | ✅ Complete | `docs/api/z3ed-resources.yaml` generated |
| Acceptance Workflow | Human review/approval of agent proposals | ✅ Complete | ProposalDrawer with ROM merging operational |
| ImGuiTest Bridge | Automated GUI testing via gRPC | ✅ Complete | All 3 phases done (11 hours) |
| Verification Pipeline | Layered testing + CI coverage | 📋 In Progress | E2E validation phase |
| Telemetry & Learning | Capture signals for improvement | 📋 Planned | Optional/opt-in (Phase 8) |
### Completed Work Summary
**Resource Catalogue (RC)** ✅:
- CLI flag passthrough and resource catalog system
- `agent describe` exports YAML/JSON schemas
- `docs/api/z3ed-resources.yaml` maintained
- All ROM/Palette/Overworld/Dungeon/Patch commands documented
**Acceptance Workflow (AW-01/02/03)** ✅:
- `ProposalRegistry` with disk persistence and cross-session tracking
- `RomSandboxManager` for isolated ROM copies
- `agent list` and `agent diff` commands
- **ProposalDrawer GUI**: List/detail views, Accept/Reject/Delete, ROM merging
- Integrated into EditorManager (`Debug → Agent Proposals`)
**ImGuiTestHarness (IT-01)** ✅:
- Phase 1: gRPC infrastructure (6 RPC methods)
- Phase 2: TestManager integration with dynamic tests
- Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs)
- E2E test script: `scripts/test_harness_e2e.sh`
- Documentation: IT-01-QUICKSTART.md
---
## 3. Task Backlog
| ID | Task | Workstream | Type | Status | Dependencies |
|----|------|------------|------|--------|--------------|
| RC-01 | Define schema for `ResourceCatalog` entries and implement serialization helpers. | Resource Catalogue | Code | Done | Schema system complete with all resource types documented |
| RC-02 | Auto-generate `docs/api/z3ed-resources.yaml` from command annotations. | Resource Catalogue | Tooling | Done | Generated and committed to docs/api/ |
| RC-03 | Implement `z3ed agent describe` CLI surface returning JSON schemas. | Resource Catalogue | Code | Done | Both YAML and JSON output formats working |
| RC-04 | Integrate schema export with TUI command palette + help overlays. | Resource Catalogue | UX | Planned | RC-03 |
| RC-05 | Harden CLI command routing/flag parsing to unblock agent automation. | Resource Catalogue | Code | Done | Fixed rom info handler to use FLAGS_rom |
| AW-01 | Implement sandbox ROM cloning and tracking (`RomSandboxManager`). | Acceptance Workflow | Code | Done | ROM sandbox manager operational with lifecycle management |
| AW-02 | Build proposal registry service storing diffs, logs, screenshots. | Acceptance Workflow | Code | Done | ProposalRegistry implemented with disk persistence |
| AW-03 | Add ImGui drawer for proposals with accept/reject controls. | Acceptance Workflow | UX | Done | ProposalDrawer GUI complete with ROM merging |
| AW-04 | Implement policy evaluation for gating accept buttons. | Acceptance Workflow | Code | In Progress | AW-03, Priority 2 - YAML policies + PolicyEvaluator |
| AW-05 | Draft `.z3ed-diff` hybrid schema (binary deltas + JSON metadata). | Acceptance Workflow | Design | Planned | AW-01 |
| IT-01 | Create `ImGuiTestHarness` IPC service embedded in `yaze_test`. | ImGuiTest Bridge | Code | Done | ✅ Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine |
| IT-02 | Implement CLI agent step translation (`imgui_action` → harness call). | ImGuiTest Bridge | Code | In Progress | IT-01, `z3ed agent test` command with natural language prompts |
| IT-03 | Provide synchronization primitives (`WaitForIdle`, etc.). | ImGuiTest Bridge | Code | Done | ✅ Wait RPC with condition polling already implemented in IT-01 Phase 3 |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | Planned | RC/AW tasks |
| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | Planned | IT tasks |
| VP-03 | Create CI job running agent smoke tests with `YAZE_WITH_JSON`. | Verification Pipeline | Infra | Planned | VP-01, VP-02 |
| TL-01 | Capture accept/reject metadata and push to telemetry log. | Telemetry & Learning | Code | Planned | AW tasks |
| TL-02 | Build anonymized metrics exporter + opt-in toggle. | Telemetry & Learning | Infra | Planned | TL-01 |
_Status Legend: Prototype · In Progress · Planned · Blocked · Done_
## 3. Immediate Next Steps (Week of Oct 1-7, 2025)
### Priority 0: Testing & Validation (Active)
1. **TEST**: Complete end-to-end proposal workflow
- Launch YAZE and verify ProposalDrawer displays live proposals
- Test Accept action → verify ROM merge and save prompt
- Test Reject and Delete actions
- Validate filtering and refresh functionality
### Priority 1: ImGuiTestHarness Foundation (IT-01) ✅ PHASE 2 COMPLETE
**Rationale**: Required for automated GUI testing and remote control of YAZE for AI workflows
**Decision**: ✅ **Use gRPC** - Production-grade, cross-platform, type-safe (see `IT-01-grpc-evaluation.md`)
**Status**: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Planned <20>
#### Phase 1: gRPC Infrastructure ✅ COMPLETE
- ✅ Add gRPC to build system via FetchContent
- ✅ Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
- ✅ Implement gRPC server with all 6 RPC stubs
- ✅ Test with grpcurl - all RPCs responding
- ✅ Server lifecycle management (Start/Shutdown)
- ✅ Cross-platform build verified (macOS ARM64)
**See**: `GRPC_TEST_SUCCESS.md` for Phase 1 completion details
#### Phase 2: ImGuiTestEngine Integration ✅ COMPLETE
**Goal**: Replace stub RPC handlers with actual GUI automation
**Status**: Infrastructure complete, dynamic test registration implemented
**Time Spent**: ~4 hours
**Implementation Guide**: 📖 **[IT-01-PHASE2-IMPLEMENTATION-GUIDE.md](IT-01-PHASE2-IMPLEMENTATION-GUIDE.md)**
**Completed Tasks**:
1. ✅ **TestManager Integration** - gRPC service receives TestManager reference
2. ✅ **Build System** - Successfully compiles with ImGuiTestEngine support
3. ✅ **Server Startup** - gRPC server starts correctly on macOS with test harness flag
4. ✅ **Dynamic Test Registration** - Click RPC uses `IM_REGISTER_TEST()` macro for dynamic tests
5. ✅ **Stub Handlers** - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
6. ✅ **Ping RPC** - Fully functional, returns YAZE version and timestamp
**Key Learnings**:
- ImGuiTestEngine requires test registration - can't call test functions directly
- Test context provided by engine via `test->Output.Status` not `test->Status`
- YAZE uses custom flag system with `FLAGS_name->Get()` pattern
- Correct flags: `--enable_test_harness`, `--test_harness_port`, `--rom_file`
**Testing Results**:
```bash
# Server starts successfully
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# Ping RPC working
grpcurl -plaintext -d '{"message":"test"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}
```
**Issues Fixed**:
- ❌→✅ SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
- ❌→✅ ImGuiTestEngine API mismatch (switched to dynamic test registration)
- ❌→✅ Status field access (corrected to `test->Output.Status`)
- ❌→✅ Port conflicts (use port 50052, `killall yaze` to cleanup)
- ❌→✅ Flag naming (documented correct underscore format)
#### Phase 3: Full ImGuiTestEngine Integration ✅ COMPLETE (Oct 2, 2025)
**Goal**: Complete implementation of all GUI automation RPCs
**Completed Tasks**:
1. ✅ **Type RPC Implementation** - Full text input automation
- ItemInfo API usage corrected (returns by value, not pointer)
- Focus management with ItemClick before typing
- Clear-first functionality with keyboard shortcuts
- Dynamic test registration with timeout handling
2. ✅ **Wait RPC Implementation** - Condition polling with timeout
- Three condition types: window_visible, element_visible, element_enabled
- Configurable timeout (default 5000ms) and poll interval (default 100ms)
- Proper Yield() calls to allow ImGui event processing
- Extended timeout for test execution
3. ✅ **Assert RPC Implementation** - State validation with structured responses
- Multiple assertion types: visible, enabled, exists, text_contains
- Actual vs expected value reporting
- Detailed error messages for debugging
- text_contains partially implemented (text retrieval needs refinement)
4. ✅ **API Compatibility Fixes**
- Corrected ItemInfo usage (by value, check ID != 0)
- Fixed flag names (ItemFlags instead of StatusFlags)
- Proper visibility checks using RectClipped dimensions
- All dynamic tests properly registered and cleaned up
**Testing**:
- Build successful on macOS ARM64
- All RPCs respond correctly
- Test script created: `scripts/test_harness_e2e.sh`
- See `IT-01-PHASE3-COMPLETE.md` for full implementation details
**Known Limitations**:
- Screenshot RPC not implemented (placeholder stub)
- text_contains assertion uses placeholder text retrieval
- Need end-to-end workflow testing with real YAZE widgets
6. **End-to-End Testing** (1 hour)
- Create shell script workflow: start server → click button → wait for window → type text → assert state
- Test with real YAZE editors (Overworld, Dungeon, etc.)
- Document edge cases and troubleshooting
#### Phase 4: CLI Integration & Windows Testing (4-5 hours)
7. **CLI Client** (`z3ed agent test`)
- Generate gRPC calls from AI prompts
- Natural language → ImGui action translation
- Screenshot capture for LLM feedback
8. **Windows Testing**
- Detailed build instructions for vcpkg setup
- Test on Windows VM or with contributor
- Add Windows CI job to GitHub Actions
- Document troubleshooting
### IT-01 Quick Reference
**Start YAZE with Test Harness**:
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
**Test RPCs with grpcurl**:
```bash
# Ping - Health check
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
# Click - Click UI element
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"target":"button:Overworld","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Type - Input text
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type
# Wait - Wait for condition
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
# Assert - Validate state
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-d '{"condition":"visible:Main Window"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
```
**Troubleshooting**:
- **Port in use**: `killall yaze` or use `--test_harness_port=50053`
- **Connection refused**: Check server started with `lsof -i :50052`
- **Unrecognized flag**: Use underscores not hyphens (e.g., `--rom_file` not `--rom`)
### Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)
5. **DESIGN**: YAML-based Policy Configuration
```yaml
# .yaze/policies/agent.yaml
version: 1.0
policies:
- name: require_tests
type: test_requirement
enabled: true
rules:
- test_suite: "overworld_rendering"
min_pass_rate: 0.95
- test_suite: "palette_integrity"
min_pass_rate: 1.0
- name: limit_change_scope
type: change_constraint
enabled: true
rules:
- max_bytes_changed: 10240 # 10KB
- allowed_banks: [0x00, 0x01, 0x0E] # Graphics banks only
- forbidden_ranges:
- start: 0xFFB0 # ROM header
end: 0xFFFF
- name: human_review_required
type: review_requirement
enabled: true
rules:
- if: bytes_changed > 1024
then: require_diff_review: true
- if: commands_executed > 10
then: require_log_review: true
```
6. **IMPLEMENT**: PolicyEvaluator Service
- `src/cli/service/policy_evaluator.{h,cc}`
- Singleton service loads policies from `.yaze/policies/`
- `EvaluateProposal(proposal_id) -> PolicyResult`
- Returns: pass/fail + list of violations with severity
- Hook into ProposalRegistry lifecycle
7. **INTEGRATE**: Policy UI in ProposalDrawer
- Add "Policy Status" section in detail view
- Display violations with icons: ⛔ Critical, ⚠️ Warning, Info
- Gate Accept button: disabled if critical violations exist
- Show helpful messages: "Accept blocked: Test pass rate 0.85 < 0.95"
- Allow policy overrides with confirmation: "Override policy? This action will be logged."
### Priority 3: Documentation & Consolidation (2-3 hours)
8. **CONSOLIDATE**: Merge standalone docs into main plan
- ✅ AW-03 summary → already in main plan, delete standalone doc
- Check for other AW-* or task-specific docs to merge
- Update main plan with architecture diagrams
9. **CREATE**: Architecture Flow Diagram
- Visual representation of proposal lifecycle
- Component interaction diagram
- Add to implementation plan
### Later: Advanced Features
- VP-01: Expand CLI unit tests
- VP-02: Integration tests with replay scripts
- TL-01: Telemetry capture for learning
## 4. Current Issues & Blockers
### Active Issues
None - all blocking issues resolved as of Oct 1, 2025
### Known Limitations (Non-Blocking)
1. ProposalDrawer lacks keyboard navigation
2. Large diffs/logs truncated at 1000 lines (consider pagination)
3. Proposals don't persist full metadata to disk (prompt, description, sandbox_id reconstructed)
4. No policy evaluation yet (AW-04)
## 5. Architecture Overview
### 5.1. Proposal Lifecycle Flow
```
┌─────────────────────────────────────────────────────────────────┐
│ 1. CREATION (CLI: z3ed agent run) │
├─────────────────────────────────────────────────────────────────┤
│ User Prompt │
│ ↓ │
│ MockAIService / GeminiAIService │
│ ↓ (generates commands) │
│ ["palette export ...", "overworld set-tile ..."] │
│ ↓ │
│ RomSandboxManager::CreateSandbox(rom) │
│ ↓ (creates isolated copy) │
│ /tmp/yaze/sandboxes/<timestamp>/zelda3.sfc │
│ ↓ │
│ Execute commands on sandbox ROM │
│ ↓ (logs each command) │
│ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc) │
│ ↓ (creates proposal directory) │
│ /tmp/yaze/proposals/proposal-<timestamp>-<seq>/ │
│ ├─ execution.log (command outputs) │
│ ├─ diff.txt (if generated) │
│ └─ screenshots/ (if any) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 2. DISCOVERY (CLI: z3ed agent list) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalRegistry::ListProposals() │
│ ↓ (lazy loads from disk) │
│ LoadProposalsFromDiskLocked() │
│ ↓ (scans /tmp/yaze/proposals/) │
│ Reconstructs metadata from filesystem │
│ ↓ (parses timestamps, reads logs) │
│ Returns vector<ProposalMetadata> │
│ ↓ │
│ Display table: ID | Status | Created | Prompt | Stats │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 3. REVIEW (GUI: Debug → Agent Proposals) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::Draw() │
│ ↓ (called every frame from EditorManager) │
│ ProposalDrawer::RefreshProposals() │
│ ↓ (calls ProposalRegistry::ListProposals) │
│ Display proposal list (selectable table) │
│ ↓ (user clicks proposal) │
│ ProposalDrawer::SelectProposal(id) │
│ ↓ (loads detail content) │
│ Read execution.log and diff.txt from proposal directory │
│ ↓ │
│ Display detail view: │
│ ├─ Metadata (sandbox_id, timestamp, stats) │
│ ├─ Diff (syntax highlighted) │
│ └─ Log (command execution trace) │
│ ↓ │
│ User decides: [Accept] [Reject] [Delete] │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 4. ACCEPTANCE (GUI: Click "Accept" button) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::AcceptProposal(proposal_id) │
│ ↓ │
│ Get proposal metadata (includes sandbox_id) │
│ ↓ │
│ RomSandboxManager::ListSandboxes() │
│ ↓ (find sandbox by ID) │
│ sandbox_rom_path = sandbox.rom_path │
│ ↓ │
│ Load sandbox ROM from disk │
│ ↓ │
│ rom_->WriteVector(0, sandbox_rom.vector()) │
│ ↓ (copies entire sandbox ROM → main ROM) │
│ ROM marked dirty (save prompt appears) │
│ ↓ │
│ ProposalRegistry::UpdateStatus(id, kAccepted) │
│ ↓ │
│ User: File → Save ROM │
│ ↓ │
│ Changes committed ✅ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 5. REJECTION (GUI: Click "Reject" button) │
├─────────────────────────────────────────────────────────────────┤
│ ProposalDrawer::RejectProposal(proposal_id) │
│ ↓ │
│ ProposalRegistry::UpdateStatus(id, kRejected) │
│ ↓ │
│ Proposal preserved for audit trail │
│ Sandbox ROM left untouched (can be cleaned up later) │
└─────────────────────────────────────────────────────────────────┘
```
### 5.2. Component Interaction Diagram
```
┌────────────────────┐
│ CLI Layer │
│ (z3ed commands) │
└────────┬───────────┘
├──► agent run ──────────┐
├──► agent list ─────────┤
└──► agent diff ─────────┤
┌────────────────────────▼──────────────────────┐
│ CLI Service Layer │
├───────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ ProposalRegistry (Singleton) │ │
│ │ • CreateProposal() │ │
│ │ • ListProposals() │ │
│ │ • GetProposal() │ │
│ │ • UpdateStatus() │ │
│ │ • RemoveProposal() │ │
│ │ • LoadProposalsFromDiskLocked() │ │
│ └────────────┬────────────────────────────┘ │
│ │ │
│ ┌────────────▼────────────────────────────┐ │
│ │ RomSandboxManager (Singleton) │ │
│ │ • CreateSandbox() │ │
│ │ • ActiveSandbox() │ │
│ │ • ListSandboxes() │ │
│ │ • RemoveSandbox() │ │
│ └────────────┬────────────────────────────┘ │
└───────────────┼────────────────────────────────┘
┌───────────────▼────────────────────────────────┐
│ Filesystem Layer │
├────────────────────────────────────────────────┤
│ /tmp/yaze/proposals/ │
│ └─ proposal-<timestamp>-<seq>/ │
│ ├─ execution.log │
│ ├─ diff.txt │
│ └─ screenshots/ │
│ │
│ /tmp/yaze/sandboxes/ │
│ └─ <timestamp>-<seq>/ │
│ └─ zelda3.sfc (isolated ROM copy) │
└────────────────────────────────────────────────┘
┌───────────────┴────────────────────────────────┐
│ GUI Layer │
├────────────────────────────────────────────────┤
│ ┌─────────────────────────────────────────┐ │
│ │ EditorManager │ │
│ │ • current_rom_ │ │
│ │ • proposal_drawer_ │ │
│ │ • Update() { proposal_drawer_.Draw() } │ │
│ └────────────┬────────────────────────────┘ │
│ │ │
│ ┌────────────▼────────────────────────────┐ │
│ │ ProposalDrawer │ │
│ │ • rom_ (ptr to EditorManager's ROM) │ │
│ │ • Draw() │ │
│ │ • DrawProposalList() │ │
│ │ • DrawProposalDetail() │ │
│ │ • AcceptProposal() ← ROM MERGE │ │
│ │ • RejectProposal() │ │
│ │ • DeleteProposal() │ │
│ └─────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
```
### 5.3. Data Flow: Agent Run to ROM Merge
```
User: "Make soldiers wear red armor"
┌────────────────────────┐
│ MockAIService │ Generates: ["palette export sprites_aux1 4 soldier.col"]
└────────┬───────────────┘
┌────────────────────────┐
│ RomSandboxManager │ Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc
└────────┬───────────────┘
┌────────────────────────┐
│ Command Executor │ Runs: palette export on sandbox ROM
└────────┬───────────────┘
┌────────────────────────┐
│ ProposalRegistry │ Creates: proposal-20251001T200215-1/
│ │ • execution.log: "[timestamp] palette export succeeded"
└────────┬───────────────┘ • diff.txt: (if diff generated)
│ Time passes... user launches GUI
┌────────────────────────┐
│ ProposalDrawer loads │ Reads: /tmp/.../proposals/proposal-*/
│ │ Displays: List of proposals
└────────┬───────────────┘
│ User clicks "Accept"
┌────────────────────────┐
│ AcceptProposal() │ 1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc
│ │ 2. Load sandbox ROM
│ │ 3. rom_->WriteVector(0, sandbox_rom.vector())
│ │ 4. Main ROM now contains all sandbox changes
│ │ 5. ROM marked dirty
└────────┬───────────────┘
┌────────────────────────┐
│ User: File → Save │ Changes persisted to disk ✅
└────────────────────────┘
```
## 5. Open Questions
- What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)? \
➤ Decision: pursue a hybrid package (`.z3ed-diff`) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog.
- How should the harness authenticate escalation requests for mutation actions? \
➤ Still open—evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01).
- Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary? \
➤ Investigate during the ImGuiTestHarness spike; compare extending `yaze_test` jobs versus introducing a lightweight automation runner.
## 5. Completed Work Summary
### Resource Catalogue Workstream (RC) - ✅ COMPLETE
The Resource Catalogue workstream has been successfully completed, providing a foundation for AI-driven automation:
**Implementation Details**:
- Created comprehensive schema system in `src/cli/service/resource_catalog.{h,cc}`
- Implemented resource catalog for: ROM, Patch, Palette, Overworld, Dungeon, and Agent commands
- Each resource includes: name, description, actions, arguments, effects, and return values
- Built dual-format serialization: JSON (compact) and YAML (human-readable)
**Key Fixes**:
- Fixed `rom info` segfault by creating dedicated `RomInfo` handler using `FLAGS_rom`
- Added `rom info` action to resource schema with proper metadata
- Ensured all ROM commands consistently use flag-based dispatch
**Generated Artifacts**:
- `docs/api/z3ed-resources.yaml` - Authoritative machine-readable API reference
- Both JSON and YAML output formats validated and working
- Resource filtering capability (`--resource <name>`) operational
**Command Examples**:
```bash
# View all resources in YAML
z3ed agent describe --format yaml
# Get specific resource as JSON
z3ed agent describe --format json --resource rom
# Generate documentation file
z3ed agent describe --format yaml --output docs/api/z3ed-resources.yaml
```
**Testing Results**:
All commands tested and verified working:
- ✅ `z3ed rom info --rom=zelda3.sfc` - displays title, size, filename
- ✅ `z3ed rom validate --rom=zelda3.sfc` - verifies checksum and header
- ✅ `z3ed agent describe --format yaml` - outputs complete catalog
- ✅ `z3ed agent describe --format json --resource rom` - filters by resource
### Acceptance Workflow (AW-01, AW-02) - ✅ CORE COMPLETE
The foundational infrastructure for proposal tracking and review is now operational:
**RomSandboxManager Implementation** (AW-01):
- Singleton service managing isolated ROM copies for agent proposals
- Sandboxes created in `YAZE_SANDBOX_ROOT` (env var) or system temp directory
- Automatic directory creation and ROM file cloning
- Active sandbox tracking for current agent session
- Cleanup utilities for removing old sandboxes
**ProposalRegistry Implementation** (AW-02):
- Comprehensive tracking of agent-generated ROM modifications
- Stores proposal metadata: ID, sandbox ID, prompt, description, timestamps
- Records execution diffs in `diff.txt` within proposal directory
- Appends command execution logs to `execution.log` with timestamps
- Support for screenshot attachments (path tracking)
- Proposal lifecycle: Pending → Accepted/Rejected
- Query capabilities: get by ID, list all, filter by status, find latest pending
**Agent Run Integration**:
- `agent run` now creates sandbox + proposal automatically
- All command executions logged with timestamps and status
- Success/failure outcomes captured in proposal logs
- User feedback includes proposal ID and sandbox path for review
- Foundation ready for `agent diff`, `agent commit`, `agent revert` enhancements
**Agent Diff Enhancement** (Completed Oct 1, 2025):
- Reads proposal diffs from ProposalRegistry automatically
- Displays detailed metadata: proposal ID, status, timestamps, command count
- Shows diff content from proposal directory
- Displays execution log (first 50 lines, with truncation for long logs)
- Provides next-step guidance (commit/revert/GUI review)
- Supports `--proposal-id` flag to view specific proposals
- Fallback to legacy diff behavior if no proposals found
**Agent List Command** (New - Oct 1, 2025):
- Enumerates all proposals in the registry
- Shows proposal ID, status, creation time, prompt, and stats
- Indicates pending/accepted/rejected status for each proposal
- Provides guidance on using `agent diff` to view details
- Empty state message guides users to create proposals with `agent run`
**Resource Catalog Updates**:
- Added `agent list` action with returns schema
- Added `agent diff` action with arguments (`--proposal-id`) and returns schema
- Updated agent resource description to include listing and diffing capabilities
- Regenerated `docs/api/z3ed-resources.yaml` with new agent actions
**ProposalDrawer GUI Component** (Completed Oct 1, 2025):
- ImGui right-side drawer for proposal review (AW-03)
- Split view: proposal list (top) + detail view (bottom)
- List view: table with ID, status, prompt columns; colored status indicators
- Detail view: collapsible sections for metadata/diff/log; syntax-aware display
- Action buttons: Accept, Reject, Delete with confirmation dialogs
- Status filtering (All/Pending/Accepted/Rejected)
- Integrated into EditorManager with Debug → Agent Proposals menu
- Accept/Reject updates ProposalRegistry status
- Delete removes proposal from registry and filesystem
- TODO: Implement actual ROM merging in AcceptProposal method
**Proposal Persistence Fix** (Completed Oct 1, 2025):
- Fixed ProposalRegistry to load proposals from disk on first access
- Added `LoadProposalsFromDiskLocked()` method scanning proposal root directory
- Lazy loading implementation in `ListProposals()` for automatic registry population
- Reconstructs metadata from filesystem: ID, timestamps, log/diff paths, screenshots
- Parses creation time from proposal ID format (`proposal-20251001T200215-1`)
- Enables cross-session proposal tracking - `agent list` now finds all proposals
- ProposalDrawer can now display proposals created via CLI `agent run`
**CMake Build Integration**:
- Added `cli/service/proposal_registry.cc` and `cli/service/rom_sandbox_manager.cc` to all app targets
- Fixed linker errors by including CLI service sources in:
- `yaze` (main GUI app)
- `yaze_emu` (emulator standalone)
- `yaze_core` (testing library)
- `yaze_c` (C API library)
- All targets now build successfully with ProposalDrawer dependencies
**Architecture Benefits**:
- Clean separation: RomSandboxManager (file ops) ↔ ProposalRegistry (metadata)
- Thread-safe with mutex protection for concurrent access
- Extensible design ready for ImGui review UI (AW-03)
- Proposal persistence enables post-session review and auditing
- Proposal-centric workflow enables human-in-the-loop review
- GUI and CLI both have full access to proposal system
**Next Steps for AW Workstream**:
- Test ProposalDrawer in running application
- Complete ROM merging in AcceptProposal method
- AW-04: Policy evaluation for gating mutations
- AW-05: `.z3ed-diff` hybrid format design
### Files Modified/Created
**Phase 6 (Resource Catalogue)**:
1. `src/cli/handlers/rom.cc` - Added `RomInfo::Run` implementation
2. `src/cli/z3ed.h` - Added `RomInfo` class declaration
3. `src/cli/modern_cli.cc` - Updated `HandleRomInfoCommand` routing
4. `src/cli/service/resource_catalog.cc` - Added `rom info` schema entry
5. `docs/api/z3ed-resources.yaml` - Generated comprehensive API catalog
**AW-01 & AW-02 (Proposal Tracking)**:
6. `src/cli/service/proposal_registry.h` - New proposal tracking service interface
7. `src/cli/service/proposal_registry.cc` - Implementation with full lifecycle management
8. `src/cli/handlers/agent.cc` - Integrated ProposalRegistry into agent run workflow
**Agent Diff & List Enhancement**:
9. `src/cli/handlers/agent.cc` - Enhanced HandleDiffCommand with proposal reading, added HandleListCommand
10. `src/cli/service/resource_catalog.cc` - Added agent list/diff actions with schemas
11. `docs/api/z3ed-resources.yaml` - Regenerated with new agent commands
12. `docs/E6-z3ed-cli-design.md` - Updated Section 8.1 with list/diff documentation
**AW-03 (ProposalDrawer GUI)**:
13. `src/app/editor/system/proposal_drawer.h` - Complete drawer interface with Draw/Accept/Reject/Delete
14. `src/app/editor/system/proposal_drawer.cc` - Full implementation (~350 lines) with list/detail views
15. `src/app/editor/editor_manager.h` - Added ProposalDrawer member and include
16. `src/app/editor/editor_manager.cc` - Added menu item and Draw() call in Update loop
17. `src/CMakeLists.txt` - Added proposal_drawer files to System Editor source group
18. `src/app/app.cmake` - Added CLI service sources to yaze target (both Apple and non-Apple builds)
19. `src/app/emu/emu.cmake` - Added CLI service sources to yaze_emu target
20. `src/CMakeLists.txt` - Added CLI service sources to yaze_core library sources
9. `src/cli/z3ed.cmake` - Added proposal_registry.cc to build
10. `docs/E6-z3ed-implementation-plan.md` - Updated progress and task statuses
**Agent Diff & List (Oct 1, 2025)**:
21. `src/cli/handlers/agent.cc` - Enhanced `HandleDiffCommand` with proposal reading, added `HandleListCommand`
22. `src/cli/service/resource_catalog.cc` - Added agent list and diff actions to schema
23. `docs/api/z3ed-resources.yaml` - Regenerated with new agent commands
**Proposal Persistence Fix (Oct 1, 2025)**:
24. `src/cli/service/proposal_registry.h` - Added `LoadProposalsFromDiskLocked()` declaration
25. `src/cli/service/proposal_registry.cc` - Implemented disk loading with timestamp parsing and metadata reconstruction
26. `src/cli/service/proposal_registry.cc` - Modified `ListProposals()` to lazy-load proposals from disk
**ROM Merging Implementation (Oct 1, 2025)**:
27. `src/app/editor/system/proposal_drawer.h` - Added `SetRom()` method and `Rom*` member for merge operations
28. `src/app/editor/system/proposal_drawer.cc` - Implemented full ROM merging in `AcceptProposal()` with sandbox loading
29. `src/app/editor/editor_manager.cc` - Added `SetRom(current_rom_)` call before drawing ProposalDrawer
30. `src/app/editor/system/proposal_drawer.cc` - Added RomSandboxManager include for sandbox path resolution
## 6. References
**Active Documentation**:
- `E6-z3ed-cli-design.md` - Overall CLI design and architecture
- `NEXT_PRIORITIES_OCT2.md` - Current work priorities with detailed implementation guides
- `IT-01-QUICKSTART.md` - Test harness quick reference
- `docs/api/z3ed-resources.yaml` - Machine-readable API reference (generated)
**Source Code**:
- `src/cli/service/` - Core services (proposal registry, sandbox manager, resource catalog)
- `src/app/editor/system/proposal_drawer.{h,cc}` - GUI review panel
- `src/app/core/imgui_test_harness_service.{h,cc}` - gRPC automation server
**Historical Documentation** (archived):
- `archive/STATE_SUMMARY_*.md` - Historical state snapshots
- `archive/IT-01-PHASE*-COMPLETE.md` - Phase completion reports
- `archive/*-grpc-*.md` - gRPC design decisions and technical notes
- `archive/PROGRESS_SUMMARY_*.md` - Daily progress logs
---
**Last Updated**: October 2, 2025
**Contributors**: @scawful, GitHub Copilot
**License**: Same as YAZE (see ../../LICENSE)