Add developer guide for z3ed CLI outlining architecture, commands, and roadmap

- Introduced a comprehensive developer guide for z3ed CLI. - Documented core capabilities, architecture, command reference, and agentic workflow. - Included implementation details for build system and AI service configuration. - Provided roadmap with completed and active tasks for future development.
2025-10-04 03:16:45 -04:00
parent 209e150b18
commit 551f926aba
5 changed files with 149 additions and 5006 deletions
--- a/docs/z3ed/AGENT-ROADMAP.md
+++ b/docs/z3ed/AGENT-ROADMAP.md
@@ -1,580 +0,0 @@
-# z3ed Agent Roadmap
-
-**Last Updated**: October 3, 2025
-
-## Current Status
-
-### ✅ Production Ready
- **Build System**: Z3ED_AI flag consolidation complete
- **AI Backends**: Ollama (local) and Gemini (cloud) operational
- **Conversational Agent**: Multi-step tool execution with chat history
- **Tool Dispatcher**: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
- **TUI Chat**: FTXUI-based interactive terminal interface
- **Simple Chat**: Text-mode REPL for AI testing (no FTXUI dependencies)
- **GUI Chat Widget**: ImGui-based widget (needs integration into main app)
-
-### 🚧 Active Work
-1. **Live LLM Testing** (1-2h): Verify function calling with real models
-2. **GUI Integration** (4-6h): Wire AgentChatWidget into YAZE editor
-3. **Proposal Workflow** (6-8h): End-to-end integration from chat to ROM changes
-
-## Core Vision
-
-Transform z3ed from a command-line tool into a **conversational ROM hacking assistant** where users can:
- Ask questions about ROM contents ("What dungeons exist?")
- Inspect game data interactively ("How many soldiers in room X?")
- Build changes incrementally through dialogue
- Generate proposals from conversation context
-
-## Technical Architecture
-
-### 1. Conversational Agent Service ✅
-**Status**: Complete
- `ConversationalAgentService`: Manages chat sessions and tool execution
- Integrates with Ollama/Gemini AI services
- Handles tool calls with automatic JSON formatting
- Maintains conversation history and context
-
-### 2. Read-Only Tools ✅
-**Status**: 5 tools implemented
- `resource-list`: Enumerate labeled resources
- `dungeon-list-sprites`: Inspect sprites in rooms
- `overworld-find-tile`: Search for tile16 IDs
- `overworld-describe-map`: Get map metadata
- `overworld-list-warps`: List entrances/exits/holes
-
-**Next**: Add dialogue, sprite info, and region inspection tools
-
-### 3. Chat Interfaces
-**Status**: Multiple modes available
- **TUI (FTXUI)**: Full-screen interactive terminal (✅ complete)
- **Simple Mode**: Text REPL for automation/testing (✅ complete)
- **GUI (ImGui)**: Dockable widget in YAZE (⚠️ needs integration)
-
-### 4. Proposal Workflow Integration
-**Status**: Planned
-**Goal**: When user requests ROM changes, agent generates proposal
-1. User chats to explore ROM
-2. User requests change ("add two more soldiers")
-3. Agent generates commands → creates proposal
-4. User reviews with `agent diff` or GUI
-5. User accepts/rejects proposal
-
-## Immediate Priorities
-
-### Priority 1: Live LLM Testing (1-2 hours)
-Verify function calling works end-to-end:
- Test Gemini 2.0 with natural language prompts
- Test Ollama (qwen2.5-coder) with tool discovery
- Validate multi-step conversations
- Exercise all 5 tools
-
-### Priority 2: GUI Chat Integration (4-6 hours)
-Wire AgentChatWidget into main YAZE editor:
- Add menu item: Debug → Agent Chat
- Connect to shared ConversationalAgentService
- Test with loaded ROM context
- Add history persistence
-
-### Priority 3: Proposal Generation (6-8 hours)
-
-## Technical Implementation Plan
-
-### 1. Conversational Agent Service
- **Description**: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
- **Components**:
-    - `ConversationalAgentService`: The main class for managing the chat session.
-    - Integration with existing `AIService` implementations (Ollama, Gemini).
- **Status**: In progress — baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.
-
-### 2. Read-Only "Tools" for the Agent
- **Description**: To enable the agent to answer questions, we need to expand `z3ed` with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
- **Example Tools to Implement**:
-    - `resource list --type <dungeon|sprite|...>`: List all user-defined labels of a certain type.
-    - `dungeon list-sprites --room <id|label>`: List all sprites in a given room.
-    - `dungeon get-info --room <id|label>`: Get metadata for a specific room.
-    - `overworld find-tile --tile <id>`: Find all occurrences of a specific tile on the overworld map.
- **Advanced Editing Tools (for future implementation)**:
-    - `overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <id>`
-    - `overworld replace-tile --map <id> --from <old_id> --to <new_id>`
-    - `overworld blend-tiles --map <id> --pattern <name> --density <percent>`
- **Status**: Foundational commands (`resource-list`, `dungeon-list-sprites`) are live with JSON output. Focus is shifting to high-value Overworld and dialogue inspection tools.
-
-### 3. TUI and GUI Chat Interfaces
- **Description**: User-facing components for interacting with the `ConversationalAgentService`.
- **Components**:
-    - **TUI**: A new full-screen component in `z3ed` using FTXUI, providing a rich chat experience in the terminal.
-    - **GUI**: A new ImGui widget that can be docked into the main `yaze` application window.
- **Status**: In progress — CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Need to improve input ergonomics and synchronized history navigation.
-
-### 4. Integration with the Proposal Workflow
- **Description**: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the `ConversationalAgentService` will trigger the existing `Tile16ProposalGenerator` (and future proposal generators for other resource types) to create a proposal.
- **Workflow**:
-    1. User chats with the agent to explore the ROM.
-    2. User asks the agent to make a change.
-    3. `ConversationalAgentService` generates the commands and passes them to the appropriate `ProposalGenerator`.
-    4. A new proposal is created and saved.
-    5. The TUI/GUI notifies the user that a proposal is ready for review.
-    6. User uses the `agent diff` and `agent accept` commands (or UI equivalents) to review and apply the changes.
- **Status**: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
-
-## Next Steps
-
-### Immediate Priorities
-1.  **✅ Build System Consolidation** (COMPLETE - Oct 3, 2025):
-    - ✅ Created Z3ED_AI master flag for simplified builds
-    - ✅ Fixed Gemini crash with graceful degradation
-    - ✅ Updated documentation with new build instructions
-    - ✅ Tested both Ollama and Gemini backends
-    - **Next**: Update CI/CD workflows to use `-DZ3ED_AI=ON`
-2.  **Live LLM Testing** (NEXT UP - 1-2 hours):
-    - Verify function calling works with real Ollama/Gemini
-    - Test multi-step tool execution
-    - Validate all 5 tools with natural language prompts
-3.  **Expand Overworld Tool Coverage**:
-    - ✅ Ship read-only tile searches (`overworld find-tile`) with shared formatting for CLI and agent calls.
-    - Next: add area summaries, teleport destination lookups, and keep JSON/Text parity for all new tools.
-4.  **Polish the TUI Chat Experience**:
-    - Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
-    - Align log file output with on-screen formatting for easier debugging.
-5.  **Document & Test the New Tooling**:
-    - Update the main `README.md` and relevant docs to cover the new chat formatting.
-    - Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
-5.  **Build GUI Chat Widget**:
-    - Create the ImGui component.
-    - Ensure it shares the same backend service as the TUI.
-6.  **Full Integration with Proposal System**:
-    - Implement the logic for the agent to transition from conversation to proposal generation.
-7.  **Expand Tool Arsenal**:
-    - Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
-8.  **Multi-Modal Agent**:
-    - Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
-9.  **Advanced Configuration**:
-    - Implement environment variables for selecting AI providers and models (e.g., `YAZE_AI_PROVIDER`, `OLLAMA_MODEL`).
-    - Add CLI flags for overriding the provider and model on a per-command basis.
-10.  **Performance and Cost-Saving**:
-    - Implement a response cache to reduce latency and API costs.
-    - Add token usage tracking and reporting.
-
-## Current Status & Next Steps (Updated: October 3, 2025)
-
-We have made significant progress in laying the foundation for the conversational agent.
-
-### ✅ Completed
- **Build System Consolidation**: ✅ **NEW** Z3ED_AI master flag (Oct 3, 2025)
-  - Single flag enables all AI features: `-DZ3ED_AI=ON`
-  - Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
-  - Fixed Gemini crash when API key set but JSON disabled
-  - Graceful degradation with clear error messages
-  - Backward compatible with old flags
-  - Ready for build modularization (enables optional `libyaze_agent.a`)
-  - **Docs**: `docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md`
- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
-  - Handles tool calls with automatic JSON output format
-  - Prevents recursion through proper tool result replay
-  - Supports conversation history and context management
- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
-  - Renders tables from JSON tool results
-  - Pretty-prints JSON payloads with syntax formatting
-  - Scrollable history with user/agent distinction
- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
-  - `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
-  - `dungeon-list-sprites`: Inspect sprites in dungeon rooms
-  - `overworld-find-tile`: Search for tile16 IDs across maps
-  - `overworld-describe-map`: Get comprehensive map metadata
-  - `overworld-list-warps`: List entrances/exits/holes with filtering
- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON
-  - Automatic table generation from JSON arrays/objects
-  - Column-aligned formatting with headers
-  - Graceful fallback to text for malformed data
- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
-  - Shared ROM context passed through ConversationalAgentService
-  - Automatic ROM loading with error handling
- **AI Service Foundation**: ✅ Ollama and Gemini services operational
-  - Enhanced prompting system with resource catalogue loading
-  - System instruction generation with examples
-  - Health checks and model availability validation
-  - Both backends tested and working in production
-
-### 🚧 In Progress
- **Live LLM Testing**: Ready to execute with real Ollama/Gemini
-  - All infrastructure complete (function calling, tool schemas, response parsing)
-  - Need to verify multi-step tool execution with live models
-  - Test scenarios prepared for all 5 tools
-  - **Estimated Time**: 1-2 hours
- **GUI Chat Widget**: Not yet started
-  - TUI implementation complete and can serve as reference
-  - Should reuse table/JSON rendering logic from TUI
-  - Target: `src/app/gui/debug/agent_chat_widget.{h,cc}`
-  - **Estimated Time**: 6-8 hours
-
-### 🚀 Next Steps (Priority Order)
-
-#### Priority 1: Live LLM Testing with Function Calling (1-2 hours)
-**Goal**: Verify Ollama/Gemini can autonomously invoke tools in production
-
-**Infrastructure Complete** ✅:
- ✅ Tool schema generation (`BuildFunctionCallSchemas()`)
- ✅ System prompts include function definitions
- ✅ AI services parse `tool_calls` from responses
- ✅ ConversationalAgentService dispatches to ToolDispatcher
- ✅ All 5 tools tested independently
-
-**Testing Tasks**:
-1. **Gemini Testing** (30 min)
-   - Verify Gemini 2.0 generates correct `tool_calls` JSON
-   - Test prompt: "What dungeons are in this ROM?"
-   - Verify tool result fed back into conversation
-   - Test multi-step: "Now list sprites in the first dungeon"
-
-2. **Ollama Testing** (30 min)
-   - Verify qwen2.5-coder discovers and calls tools
-   - Same test prompts as Gemini
-   - Compare response quality between models
-
-3. **Tool Coverage Testing** (30 min)
-   - Exercise all 5 tools with natural language prompts
-   - Verify JSON output formats correctly
-   - Test error handling (invalid room IDs, etc.)
-
-**Success Criteria**:
- LLM autonomously calls tools without explicit command syntax
- Tool results incorporated into follow-up responses
- Multi-turn conversations work with context
-
-#### Priority 2: Implement GUI Chat Widget (6-8 hours)
-**Goal**: Unified chat experience in YAZE application
-
-1. **Create ImGui Chat Widget** (4 hours)
-   - File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
-   - Reuse table/JSON rendering logic from TUI implementation
-   - Add to Debug menu: `Debug → Agent Chat`
-   - Share `ConversationalAgentService` instance with TUI
-
-2. **Add Chat History Persistence** (2 hours)
-   - Save chat history to `.yaze/agent_chat_history.json`
-   - Load on startup, display in GUI/TUI
-   - Add "Clear History" button
-
-3. **Polish Input Experience** (2 hours)
-   - Multi-line input support (Shift+Enter for newline, Enter to send)
-   - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
-   - Auto-scroll to bottom on new messages
-
-#### Priority 3: Proposal Generation (6-8 hours)
-Connect chat to ROM modification workflow:
- Detect action intents in conversation
- Generate proposal from accumulated context
- Link proposal to chat history
- GUI notification when proposal ready
-
-## Command Reference
-
-### Chat Modes
-```bash
-# Interactive TUI chat (FTXUI)
-z3ed agent chat --rom zelda3.sfc
-
-# Simple text mode (for automation/AI testing)
-z3ed agent simple-chat --rom zelda3.sfc
-
-# Batch mode from file
-z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
-```
-
-### Tool Commands (for direct testing)
-```bash
-# List dungeons
-z3ed agent resource-list --type dungeon --format json
-
-# Find tiles
-z3ed agent overworld-find-tile --tile 0x02E --map 0x05
-
-# List sprites in room
-z3ed agent dungeon-list-sprites --room 0x012
-```
-
-## Build Quick Reference
-
-```bash
-# Full AI features
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed
-
-# With GUI automation/testing
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build
-
-# Minimal (no AI)
-cmake -B build
-cmake --build build --target z3ed
-```
-
-## Future Enhancements
-
-### Short Term (1-2 months)
- Dialogue/text search tools
- Sprite info inspection
- Region/teleport tools
- Response caching
- Token usage tracking
-
-### Medium Term (3-6 months)
- Multi-modal agent (image generation)
- Advanced configuration (env vars, model selection)
- Proposal templates for common edits
- Undo/redo in conversations
-
-### Long Term (6+ months)
- Visual diff viewer for proposals
- Collaborative editing sessions
- Learning from user feedback
- Custom tool plugins
-**Goal**: Enable deeper ROM introspection for level design questions
-
-1. **Dialogue/Text Tools** (3 hours)
-   - `dialogue-search --text "search term"`: Find text in ROM dialogue
-   - `dialogue-get --id 0x...`: Get dialogue by message ID
-
-2. **Sprite Tools** (3 hours)
-   - `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
-   - `overworld-list-sprites --map 0x...`: Sprites on overworld map
-
-3. **Advanced Overworld Tools** (4 hours)
-   - `overworld-get-region --map 0x...`: Region boundaries and properties
-   - `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
-   - `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
-
-#### Priority 4: Performance and Caching (4-6 hours)
-
-1. **Response Caching** (3 hours)
-   - Implement LRU cache for identical prompts
-   - Cache tool results by (tool_name, args) key
-   - Configurable TTL (default: 5 minutes for ROM introspection)
-
-2. **Token Usage Tracking** (2 hours)
-   - Log tokens per request (Ollama and Gemini APIs provide this)
-   - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
-   - Add `--show-token-usage` flag to CLI commands
-
-3. **Streaming Responses** (optional, 3-4 hours)
-   - Use Ollama/Gemini streaming APIs
-   - Update GUI/TUI to show partial responses as they arrive
-   - Improves perceived latency for long responses
-
-## z3ed Build Quick Reference
-
-```bash
-# Full AI features (Ollama + Gemini)
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed
-
-# AI + GUI automation/testing
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build --target z3ed
-
-# Minimal build (no AI)
-cmake -B build
-cmake --build build --target z3ed
-```
-
-## Build Flags Explained
-
-| Flag | Purpose | Dependencies | When to Use |
-|------|---------|--------------|-------------|
-| `Z3ED_AI=ON` | **Master flag** for AI features | JSON, YAML, httplib, (OpenSSL*) | Want Ollama or Gemini support |
-| `YAZE_WITH_GRPC=ON` | GUI automation & testing | gRPC, Protobuf, (auto-enables JSON) | Want GUI test harness |
-| `YAZE_WITH_JSON=ON` | Low-level JSON support | nlohmann_json | Auto-enabled by above flags |
-
-*OpenSSL optional - required for Gemini (HTTPS), Ollama works without it
-
-## Feature Matrix
-
-| Feature | No Flags | Z3ED_AI | Z3ED_AI + GRPC |
-|---------|----------|---------|----------------|
-| Basic CLI | ✅ | ✅ | ✅ |
-| Ollama (local) | ❌ | ✅ | ✅ |
-| Gemini (cloud) | ❌ | ✅* | ✅* |
-| TUI Chat | ❌ | ✅ | ✅ |
-| GUI Test Automation | ❌ | ❌ | ✅ |
-| Tool Dispatcher | ❌ | ✅ | ✅ |
-| Function Calling | ❌ | ✅ | ✅ |
-
-*Requires OpenSSL for HTTPS
-
-## Common Build Scenarios
-
-### Developer (AI features, no GUI testing)
-```bash
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed -j8
-```
-
-### Full Stack (AI + GUI automation)
-```bash
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build --target z3ed -j8
-```
-
-### CI/CD (minimal, fast)
-```bash
-cmake -B build -DYAZE_MINIMAL_BUILD=ON
-cmake --build build -j$(nproc)
-```
-
-### Release Build (optimized)
-```bash
-cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
-cmake --build build --target z3ed -j8
-```
-
-## Migration from Old Flags
-
-### Before (Confusing)
-```bash
-cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
-```
-
-### After (Clear Intent)
-```bash
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-```
-
-**Note**: Old flags still work for backward compatibility!
-
-## Troubleshooting
-
-### "Build with -DZ3ED_AI=ON" warning
-**Symptom**: AI commands fail with "JSON support required"  
-**Fix**: Rebuild with AI flag
-```bash
-rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build
-```
-
-### "OpenSSL not found" warning
-**Symptom**: Gemini API doesn't work  
-**Impact**: Only affects Gemini (cloud). Ollama (local) works fine  
-**Fix (optional)**:
-```bash
-# macOS
-brew install openssl
-
-# Linux
-sudo apt install libssl-dev
-
-# Then rebuild
-cmake -B build -DZ3ED_AI=ON && cmake --build build
-```
-
-### Ollama vs Gemini not auto-detecting
-**Symptom**: Wrong backend selected  
-**Fix**: Set explicit provider
-```bash
-# Force Ollama
-export YAZE_AI_PROVIDER=ollama
-./build/bin/z3ed agent plan --prompt "test"
-
-# Force Gemini
-export YAZE_AI_PROVIDER=gemini
-export GEMINI_API_KEY="your-key"
-./build/bin/z3ed agent plan --prompt "test"
-```
-
-## Environment Variables
-
-| Variable | Default | Purpose |
-|----------|---------|---------|
-| `YAZE_AI_PROVIDER` | auto | Force `ollama` or `gemini` |
-| `GEMINI_API_KEY` | - | Gemini API key (enables Gemini) |
-| `OLLAMA_MODEL` | `qwen2.5-coder:7b` | Override Ollama model |
-| `GEMINI_MODEL` | `gemini-2.5-flash` | Override Gemini model |
-
-## Platform-Specific Notes
-
-### macOS
- OpenSSL auto-detected via Homebrew
- Keychain integration for SSL certs
- Recommended: `brew install openssl ollama`
-
-### Linux
- OpenSSL typically pre-installed
- Install via: `sudo apt install libssl-dev`
- Ollama: Download from https://ollama.com
-
-### Windows
- Use Ollama (no SSL required)
- Gemini requires OpenSSL (harder to setup on Windows)
- Recommend: Focus on Ollama for Windows builds
-
-## Performance Tips
-
-### Faster Incremental Builds
-```bash
-# Use Ninja instead of Make
-cmake -B build -GNinja -DZ3ED_AI=ON
-ninja -C build z3ed
-
-# Enable ccache
-export CMAKE_CXX_COMPILER_LAUNCHER=ccache
-cmake -B build -DZ3ED_AI=ON
-```
-
-### Reduce Build Scope
-```bash
-# Only build z3ed (not full yaze app)
-cmake --build build --target z3ed
-
-# Parallel build
-cmake --build build --target z3ed -j$(nproc)
-```
-
-## Related Documentation
-
- **Migration Guide**: [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
- **Technical Roadmap**: [AGENT-ROADMAP.md](AGENT-ROADMAP.md)
- **Main README**: [README.md](README.md)
- **Build Modularization**: `../../build_modularization_plan.md`
-
-## Quick Test
-
-Verify your build works:
-
-```bash
-# Check z3ed runs
-./build/bin/z3ed --version
-
-# Test AI detection
-./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5
-
-# Expected output (with Z3ED_AI=ON):
-# 🤖 Using Gemini AI with model: gemini-2.5-flash
-# or
-# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
-# or
-# 🤖 Using MockAIService (no LLM configured)
-```
-
-## Support
-
-If you encounter issues:
-1. Check this guide's troubleshooting section
-2. Review [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
-3. Verify CMake output for warnings
-4. Open an issue with build logs
-
-## Summary
-
-**Recommended for most users**:
-```bash
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed -j8
-./build/bin/z3ed agent chat
-```
-
-This gives you:
- ✅ Ollama support (local, free)
- ✅ Gemini support (cloud, API key required)
- ✅ TUI chat interface
- ✅ Tool dispatcher with 5 commands
- ✅ Function calling support
- ✅ All AI agent features
--- a/docs/z3ed/E6-z3ed-cli-design.md
+++ b/docs/z3ed/E6-z3ed-cli-design.md
@@ -1,826 +0,0 @@
-# z3ed CLI Architecture & Design
-
-## 1. Overview
-
-This document is the **source of truth** for the z3ed CLI architecture and design. It outlines the evolution of `z3ed`, the command-line interface for the YAZE project, from a collection of utility commands into a powerful, scriptable, and extensible tool for both manual and automated ROM hacking, with full support for AI-driven generative development.
-
-**Related Documents**:
- **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - Implementation tracker, task backlog, and roadmap
- **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Technical reference: commands, APIs, troubleshooting
- **[README.md](README.md)** - Quick overview and documentation index
-
-**Last Updated**: [Current Date]
-
-`z3ed` has successfully implemented its core infrastructure and is **production-ready on macOS**:
-
-**✅ Completed Features**:
- **Resource-Oriented CLI**: Clean `z3ed <resource> <action>` command structure
- **Resource Catalogue**: Machine-readable API specs in YAML/JSON for AI consumption
- **Acceptance Workflow**: Full proposal lifecycle (create → review → accept/reject → commit)
- **ImGuiTestHarness (IT-01)**: gRPC-based GUI automation with 6 RPC methods
- **CLI Agent Test (IT-02)**: Natural language prompts → automated GUI testing
- **ProposalDrawer GUI**: Integrated review interface in YAZE editor
- **ROM Sandbox Manager**: Isolated testing environment for safe experimentation
- **Proposal Registry**: Cross-session proposal tracking with disk persistence
-
-**🔄 In Progress**:
- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform
-  - Test introspection APIs for status/results polling
-  - Widget discovery for AI-driven interactions
-  - **✅ Test recording/replay for regression testing**
-  - Enhanced error reporting with screenshots and application-wide diagnostics
-  - CI/CD integration with standardized test formats
-
-**📋 Planned Next**:
- **Policy Evaluation Framework (AW-04)**: YAML-based constraints for proposal acceptance
- **Windows Cross-Platform Testing**: Validate on Windows with vcpkg
- **Production Readiness**: Telemetry, screenshot implementation, expanded test coverage
-
-## 2. Design Goals
-
-The z3ed CLI is built on three core pillars:
-
-1.  **Power & Usability for ROM Hackers**: Empower users with fine-grained control over all aspects of the ROM directly from the command line, supporting both interactive exploration and scripted automation.
-
-2.  **Testability & Automation**: Provide robust commands for validating ROM integrity, automating complex testing scenarios, and enabling reproducible workflows through scripting.
-
-3.  **AI & Generative Hacking**: Establish a powerful, scriptable API that an AI agent (LLM/MCP) can use to perform complex, generative tasks on the ROM, with human oversight and approval workflows.
-
-### 2.1. Key Architectural Decisions
-
-**Resource-Oriented Command Structure**: Adopted `z3ed <resource> <action>` pattern (similar to kubectl, gcloud) for clarity and extensibility.
-
-**Machine-Readable API**: All commands documented in `docs/api/z3ed-resources.yaml` with structured schemas for AI consumption.
-
-**Proposal-Based Workflow**: AI-generated changes are sandboxed and tracked as "proposals" requiring human review and acceptance.
-
-**gRPC Test Harness**: Embedded gRPC server in YAZE enables remote GUI automation for testing and AI-driven workflows.
-
-**Comprehensive Testing Platform**: Test harness evolved beyond basic automation to support:
- **Widget Discovery**: AI agents can enumerate available GUI interactions dynamically
- **Test Introspection**: Query test status, results, and execution queue in real-time
- **Recording & Replay**: Capture test sessions as JSON scripts for regression testing
- **CI/CD Integration**: Standardized test suite format with JUnit XML output
- **Enhanced Debugging**: Screenshot capture, widget state dumps, and execution context on failures
-
-**Cross-Platform Foundation**: Core built for macOS/Linux with Windows support planned via vcpkg.
-
-## 3. Proposed CLI Architecture: Resource-Oriented Commands
-
-The CLI has adopted a `z3ed <resource> <action> [options]` structure, similar to modern CLIs like `gcloud` or `kubectl`, improving clarity and extensibility.
-
-### 3.1. Top-Level Resources
-
- `rom`: Commands for interacting with the ROM file itself.
- `patch`: Commands for applying and creating patches.
- `gfx`: Commands for graphics manipulation.
- `palette`: Commands for palette manipulation.
- `overworld`: Commands for overworld editing.
- `dungeon`: Commands for dungeon editing.
- `sprite`: Commands for sprite management and creation.
- `test`: Commands for running tests.
- `tui`: The entrypoint for the enhanced Text User Interface.
- `agent`: Commands for interacting with the AI agent.
-
-### 3.2. Example Command Mapping
-
-The command mapping has been successfully implemented, transitioning from the old flat structure to the new resource-oriented approach.
-
-## 4. New Features & Commands
-
-### 4.1. For the ROM Hacker (Power & Scriptability)
-
-These commands focus on exporting data to and from the original SCAD (Nintendo Super Famicom/SNES CAD) binary formats found in the gigaleak, as well as other relevant binary formats. This enables direct interaction with development assets, version control, and sharing. Many of these commands have been implemented or are in progress.
-
- **Dungeon Editing**: Commands for exporting, importing, listing, and adding objects.
- **Overworld Editing**: Commands for getting, setting tiles, listing, and moving sprites.
- **Graphics & Palettes**: Commands for exporting/importing sheets and palettes.
-
-### 4.2. For Testing & Automation
-
- **ROM Validation & Comparison**: `z3ed rom validate`, `z3ed rom diff`, and `z3ed rom generate-golden` have been implemented.
- **Test Execution**: `z3ed test run` and `z3ed test list-suites` are in progress.
-
-## 5. TUI Enhancements
-
-The `--tui` flag now launches a significantly enhanced, interactive terminal application built with FTXUI. The TUI has been decomposed into a set of modular components, with each command handler responsible for its own TUI representation, making it more extensible and easier to maintain.
-
- **Dashboard View**: The main screen is evolving into a dashboard.
- **Interactive Palette Editor**: In progress.
- **Interactive Hex Viewer**: Implemented.
- **Command Palette**: In progress.
- **Tabbed Layout**: Implemented.
-
-## 6. Generative & Agentic Workflows (MCP Integration)
-
-The redesigned CLI serves as the foundational API for an AI-driven Model-Code-Program (MCP) loop. The AI agent's "program" is a script of `z3ed` commands.
-
-### 6.1. The Generative Workflow
-
-The generative workflow has been refined to incorporate more detailed planning and verification steps, leveraging the `z3ed agent` commands.
-
-### 6.2. Key Enablers
-
- **Granular Commands**: The CLI provides commands to manipulate data within the binary formats (e.g., `palette set-color`, `gfx set-pixel`), abstracting complexity from the AI agent.
- **Idempotency**: Commands are designed to be idempotent where possible.
- **SpriteBuilder CLI**: Deprioritized for now, pending further research and development of the underlying assembly generation capabilities.
-
-## 7. Implementation Roadmap
-
-### Phase 1: Core CLI & TUI Foundation (Done)
- **CLI Structure**: Implemented.
- **Command Migration**: Implemented.
- **TUI Decomposition**: Implemented.
-
-### Phase 2: Interactive TUI & Command Palette (Done)
- **Interactive Palette Editor**: Implemented.
- **Interactive Hex Viewer**: Implemented.
- **Command Palette**: Implemented.
-
-### Phase 3: Testing & Project Management (Done)
- **`rom validate`**: Implemented.
- **`rom diff`**: Implemented.
- **`rom generate-golden`**: Implemented.
- **Project Scaffolding**: Implemented.
-
-### Phase 4: Agentic Framework & Generative AI (✅ Foundation Complete, 🚧 LLM Integration In Progress)
- **`z3ed agent` command**: ✅ Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
- **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
- **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
- **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
- **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
- **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
- **Conversational Agent Service**: ✅ Complete - multi-step tool execution loop with history management.
- **Tool Dispatcher**: ✅ Complete - 5 read-only tools for ROM introspection (`resource-list`, `dungeon-list-sprites`, `overworld-find-tile`, `overworld-describe-map`, `overworld-list-warps`).
- **TUI Chat Interface**: ✅ Complete - production-ready with table/JSON rendering (`z3ed agent chat`).
- **AI Service Backends**: ✅ Operational - Ollama (local) and Gemini (cloud) with enhanced prompting.
- **LLM Function Calling**: 🚧 In Progress - ToolDispatcher exists, needs tool schema injection into prompts and response parsing.
- **GUI Chat Widget**: 📋 Planned - TUI implementation complete, ImGui widget pending.
- **Execution Loop (MCP)**: ✅ Complete - command parsing and execution logic operational.
- **Leveraging `ImGuiTestEngine`**: ✅ Complete - `agent test` subcommand for GUI verification (see IT-01/02).
- **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management.
- **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management.
- **Granular Data Commands**: ✅ Complete - rom, palette, overworld, dungeon commands operational.
- **SpriteBuilder CLI**: Deprioritized.
-
-### Phase 5: Code Structure & UX Improvements (Completed)
- **Modular Architecture**: Refactored CLI handlers into clean, focused modules with proper separation of concerns.
- **TUI Component System**: Implemented `TuiComponent` interface for consistent UI components across the application.
- **Unified Command Interface**: Standardized `CommandHandler` base class with both CLI and TUI execution paths.
- **Error Handling**: Improved error handling with consistent `absl::Status` usage throughout the codebase.
- **Build System**: Streamlined CMake configuration with proper dependency management and conditional compilation.
- **Code Quality**: Resolved linting errors and improved code maintainability through better header organization and forward declarations.
-
-### Phase 6: Resource Catalogue & API Documentation (✅ Completed - Oct 1, 2025)
- **Resource Schema System**: ✅ Comprehensive schema definitions for all CLI resources (ROM, Patch, Palette, Overworld, Dungeon, Agent).
- **Metadata Annotations**: ✅ All commands annotated with arguments, effects, returns, and stability levels.
- **Serialization Framework**: ✅ Dual-format export (JSON compact, YAML human-readable) with resource filtering.
- **Agent Describe Command**: ✅ Full implementation with `--format`, `--resource`, `--output`, `--version` flags.
- **API Documentation Generation**: ✅ Automated generation of `docs/api/z3ed-resources.yaml` for AI/tooling consumption.
- **Flag-Based Dispatch**: ✅ Hardened command routing - all ROM commands use `FLAGS_rom` consistently.
- **ROM Info Fix**: ✅ Created dedicated `RomInfo` handler, resolving segfault issue.
-
-**Key Achievements**:
- Machine-readable API catalog enables LLM integration for automated ROM hacking workflows
- Comprehensive command documentation with argument types, effects, and return schemas
- Stable foundation for AI agents to discover and invoke CLI commands programmatically
- Validation layer for ensuring command compatibility and argument correctness
-
-**Testing Coverage**:
- ✅ All ROM commands tested: `info`, `validate`, `diff`, `generate-golden`
- ✅ Agent describe tested: YAML output, JSON output, resource filtering, file generation
- ✅ Help system integration verified with updated command listings
- ✅ Build system validated on macOS (arm64) with no critical warnings
-
-## 8. Agentic Framework Architecture - Advanced Dive
-
-The agentic framework is designed to allow an AI agent to make edits to the ROM based on high-level natural language prompts. The framework is built around the `z3ed` CLI and the `ImGuiTestEngine`. This section provides a more advanced look into its architecture and future development.
-
-### 8.1. The `z3ed agent` Command
-
-The `z3ed agent` command is the main entry point for the agent. It has the following subcommands:
-
- `run --prompt "..."`: Executes a prompt by generating and running a sequence of `z3ed` commands.
- `plan --prompt "..."`: Shows the sequence of `z3ed` commands the AI plans to execute.
- `diff [--proposal-id <id>]`: Shows a diff of the changes made to the ROM after running a prompt. Displays the latest pending proposal by default, or a specific proposal if ID is provided.
- `list`: Lists all proposals with their status, creation time, prompt, and execution statistics.
- `test --prompt "..."`: Generates changes and then runs an `ImGuiTestEngine` test to verify them.
- `commit`: Saves the modified ROM and any new assets to the project.
- `revert`: Reverts the changes made by the agent.
- `describe [--resource <name>]`: Returns machine-readable schemas for CLI commands, enabling AI/LLM integration.
- `learn --description "..."`: Records a sequence of user actions (CLI commands and GUI interactions) and associates them with a natural language description, allowing the agent to learn new workflows.
-
-### 8.2. The Agentic Loop (MCP) - Detailed Workflow
-
-1.  **Model (Planner)**: The agent receives a high-level natural language prompt. It leverages an LLM to break down this goal into a detailed, executable plan. This plan is a sequence of `z3ed` CLI commands, potentially interleaved with `ImGuiTestEngine` test steps for intermediate verification. The LLM's prompt includes the user's request, a comprehensive list of available `z3ed` commands (with their parameters and expected effects), and relevant contextual information about the current ROM state (e.g., loaded ROM, project files, current editor view).
-2.  **Code (Command & Test Generation)**: The LLM returns the generated plan as a structured JSON object. This JSON object contains an array of actions, where each action specifies a `z3ed` command (with its arguments) or an `ImGuiTestEngine` test to execute. This structured output is crucial for reliable parsing and execution by the `z3ed` agent.
-3.  **Program (Execution Engine)**: The `z3ed agent` parses the JSON plan and executes each command sequentially. For `z3ed` commands, it directly invokes the corresponding internal `CommandHandler` methods. For `ImGuiTestEngine` steps, it launches the `yaze_test` executable with the appropriate test arguments. The output (stdout, stderr, exit codes) of each executed command is captured. This output, along with any visual feedback from `ImGuiTestEngine` (e.g., screenshots), can be fed back to the LLM for iterative refinement of the plan.
-4.  **Verification (Tester)**: The `ImGuiTestEngine` plays a critical role here. After the agent executes a sequence of commands, it can generate and run a specific `ImGuiTestEngine` script. This script can interact with the YAZE GUI (e.g., open a specific editor, navigate to a location, assert visual properties) to verify that the changes were applied correctly and as intended. The results of these tests (pass/fail, detailed logs, comparison screenshots) are reported back to the user and can be used by the LLM to self-correct or refine its strategy.
-
-### 8.3. AI Model & Protocol Strategy
-
- **Models**: The framework will support both local and remote AI models, offering flexibility and catering to different user needs.
-
---
-
-## 9. Test Harness Evolution: From Automation to Platform
-
-The ImGuiTestHarness has evolved from a basic GUI automation tool into a comprehensive testing platform that serves dual purposes: **AI-driven generative workflows** and **traditional GUI testing**.
-
-### 9.1. Current Capabilities (IT-01 to IT-04) ✅
-
-**Core Automation** (6 RPCs):
- `Ping` - Health check and version verification
- `Click` - Button, menu, and tab interactions
- `Type` - Text input with focus management
- `Wait` - Condition polling (window visibility, element state)
- `Assert` - State validation (visible, enabled, exists)
- `Screenshot` - Capture (stub, needs implementation)
-
-**Integration Points**:
- ImGuiTestEngine dynamic test registration
- Async test queue with frame-accurate timing
- gRPC server embedded in YAZE process
- Cross-platform build (macOS validated, Windows planned)
-
-**Proven Use Cases**:
- Menu-driven editor opening (Overworld, Dungeon, etc.)
- Window visibility validation
- Multi-step workflows with timing dependencies
- Natural language test prompts via `z3ed agent test`
-
-### 9.2. Limitations Identified
-
-**For AI Agents**:
- ❌ Can't discover available widgets → must hardcode target names
- ❌ No way to query test results → async tests return immediately with no status
- ❌ No structured error context → failures lack screenshots and state dumps
- ❌ Limited to predefined actions → can't learn new interaction patterns
-
-**For Traditional Testing**:
- ❌ No test recording → can't capture manual workflows for regression
- ❌ No test suite format → can't organize tests into smoke/regression/nightly groups
- ❌ No CI integration → can't run tests in automated pipelines
- ❌ No result persistence → test history lost between sessions
- ❌ Poor debugging → failures don't capture visual or state context
-
-### 9.3. Enhancement Roadmap (IT-05 to IT-09)
-
-#### IT-05: Test Introspection API (6-8 hours)
-**Problem**: Tests execute asynchronously with no way to query status or results. Clients poll blindly or give up early.
-
-**Solution**: Add 3 new RPCs:
- `GetTestStatus(test_id)` → Returns queued/running/passed/failed/timeout with execution time
- `ListTests(category_filter)` → Enumerates all registered tests with metadata
- `GetTestResults(test_id)` → Retrieves detailed results: logs, assertions, metrics
-
-**Benefits**:
- AI agents can poll for test completion reliably
- CLI can show real-time progress bars
- Test history enables trend analysis (flaky tests, performance regressions)
-
-**Example Flow**:
-```bash
-# Queue test (returns immediately with test_id)
-TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id')
-
-# Poll until complete
-while true; do
-  STATUS=$(z3ed agent test status --test-id $TEST_ID --format json | jq -r '.status')
-  [[ "$STATUS" =~ ^(PASSED|FAILED|TIMEOUT)$ ]] && break
-  sleep 0.5
-done
-
-# Get results
-z3ed agent test results --test-id $TEST_ID --include-logs
-```
-
-#### IT-06: Widget Discovery API (4-6 hours)
-**Problem**: AI agents must know widget names in advance. Can't adapt to UI changes or learn new editors.
-
-**Solution**: Add `DiscoverWidgets` RPC:
- Enumerates all windows currently open
- Lists interactive widgets per window: buttons, inputs, menus, tabs
- Returns metadata: ID, label, type, enabled state, position
- Provides suggested action templates (e.g., "Click button:Save")
-
-**Benefits**:
- AI agents discover GUI capabilities dynamically
- Test scripts validate expected widgets exist
- LLM prompts improved with natural language descriptions
- Reduces brittleness from hardcoded widget names
-
-**Example Flow**:
-```python
-# AI agent workflow
-widgets = z3ed_client.DiscoverWidgets(window_filter="Overworld")
-
-# LLM prompt: "Which buttons are available in the Overworld editor?"
-available_actions = [w.suggested_action for w in widgets.buttons if w.is_enabled]
-
-# LLM generates: "Click button:Save Changes"
-z3ed_client.Click(target="button:Save Changes")
-```
-
-#### IT-07: Test Recording & Replay ✅ COMPLETE
-**Outcome**: Recording workflow, replay runner, and JSON script format shipped alongside CLI commands (`z3ed test record start|stop`, `z3ed test replay`). Regression coverage captured in `scripts/test_record_replay_e2e.sh`; documentation updated with quick-start examples. Focus now shifts to error diagnostics and artifact surfacing (IT-08).
-
-#### IT-08: Holistic Error Reporting (5-7 hours)
-**Problem**: Errors surface differently across the CLI, ImGuiTestHarness, and EditorManager. Failures lack actionable context, slowing down triage and AI agent autonomy.
-
-**Solution Themes**:
- **Harness Diagnostics**: Implement the Screenshot RPC, capture widget tree/state, and bundle execution context for every failed run.
- **Structured Error Envelope**: Introduce a shared `ErrorAnnotatedResult` format (status + metadata + hints) adopted by z3ed, harness services, and EditorManager subsystems.
- **Artifact Surfacing**: Persist artifacts under `test-results/<test_id>/`; expose paths in CLI output and in-app overlays.
- **Developer Experience**: Provide HTML + JSON result formats, actionable hints (“Re-run with --follow”, “Open screenshot: …”), and cross-links to recorded sessions for replay.
-
-**Benefits**:
- Faster debugging with consistent, high-signal failure context
- AI agents can reason about structured errors and attempt self-healing
- EditorManager gains on-screen diagnostics tied to harness artifacts
- Lays groundwork for future telemetry and CI reporting
-
-#### IT-09: CI/CD Integration ✅ CLI Foundations Complete
-**Problem**: Tests run manually. No automated regression on PR/merge.
-
-**Shipped**:
- YAML test suite runtime with dependency-aware execution and retry handling
- `z3ed agent test suite run` supports `--group`, `--tag`, `--param`,
-    `--retries`, `--ci-mode`, and automatic JUnit XML emission under
-    `test-results/junit/`
- `z3ed agent test suite validate` performs structural linting and surfaces
-    exit codes (0 pass, 1 fail, 2 error)
- NEW `z3ed agent test suite create` interactive builder generates suites
-    (defaulting to `tests/<name>.yaml`), with prompts for groups, replay scripts,
-    tags, and key=value parameters. `--force` enables overwrite flows.
-
-**Next Integration Steps**:
- Publish canonical `tests/smoke.yaml` / `tests/regression.yaml` templates in
-    the repo
- Add GitHub Actions example wiring harness referencing the new runner
- Document best practices for mapping suite tags to CI stages (smoke,
-    regression, nightly)
- Wire run summaries into docs (`docs/testing/`) with badge-ready status tables
-
-**GitHub Actions Example**:
-```yaml
-name: GUI Tests
-on: [push, pull_request]
-jobs:
-  gui-tests:
-    runs-on: macos-latest
-    steps:
-      - name: Build YAZE
-        run: cmake --build build --target yaze --target z3ed
-      - name: Start test harness
-        run: ./build/bin/yaze --enable_test_harness --headless &
-      - name: Run smoke tests
-        run: ./build/bin/z3ed test suite run tests/smoke.yaml --ci-mode
-      - name: Upload results
-        uses: actions/upload-artifact@v2
-        with:
-          name: test-results
-          path: test-results/
-```
-
-**Benefits**:
- Catch regressions before merge
- Test history tracked in CI dashboard
- Parallel execution for faster feedback
- Flaky test detection (retry logic, failure rates)
-
-### 9.4. Unified Testing Vision
-
-The enhanced test harness serves three audiences:
-
-**For AI Agents** (Generative Workflows):
- Widget discovery enables dynamic learning
- Test introspection provides reliable feedback loops
- Recording captures expert workflows for training data
-
-**For Developers** (Unit/Integration Testing):
- Test suites organize tests by scope (smoke, regression, nightly)
- CI integration catches regressions early
- Rich error reporting speeds up debugging
-
-**For QA Engineers** (Manual Testing Automation):
- Record manual workflows once, replay forever
- Parameterized tests reduce maintenance burden
- Visual test reports simplify communication
-
-**Shared Infrastructure**:
- Single gRPC server handles all test types
- Consistent test script format (JSON/YAML)
- Common result storage and reporting
- Cross-platform support (macOS, Windows, Linux)
-
-### 9.5. Implementation Priority
-
-**Phase 1: Foundation** (Already Complete ✅)
- Core automation RPCs (Ping, Click, Type, Wait, Assert)
- ImGuiTestEngine integration
- gRPC server lifecycle
- Basic E2E validation
-
-**Phase 2: Introspection & Discovery** (IT-05, IT-06 - 10-14 hours)
- Test status/results querying
- Widget enumeration API
- Async test management
- *Critical for AI agents*
-
-**Phase 3: Recording & Replay** (IT-07 - 8-10 hours)
- Test script format
- Recording workflow
- Replay engine
- *Unlocks regression testing*
-
-**Phase 4: Production Readiness** (IT-08, IT-09 - 5-7 hours)
- Screenshot implementation
- Error context capture
- CI/CD integration
- *Enables automated pipelines*
-
-**Total Estimated Effort**: 23-31 hours beyond current implementation
-
---
-  - **Local Models (macOS Setup)**: For privacy, offline use, and reduced operational costs, integration with local LLMs via [Ollama](https://ollama.ai/) is a priority. Users can easily install Ollama on macOS and pull models optimized for code generation, such as `codellama:7b`. The `z3ed` agent will communicate with Ollama's local API endpoint.
-  - **Remote Models (Gemini API)**: For more complex tasks requiring advanced reasoning capabilities, integration with powerful remote models like the Gemini API will be available. Users will need to provide a `GEMINI_API_KEY` environment variable. A new `GeminiAIService` class will be implemented to handle the secure API requests and responses.
- **Protocol**: A robust, yet simple, JSON-based protocol will be used for communication between `z3ed` and the AI model. This ensures structured data exchange, critical for reliable parsing and execution. The `z3ed` tool will serialize the user's prompt, current ROM context, available `z3ed` commands, and any relevant `ImGuiTestEngine` capabilities into a JSON object. The AI model will be expected to return a JSON object containing the sequence of commands to be executed, along with potential explanations or confidence scores.
-
-### 8.4. GUI Integration & User Experience
-
- **Agent Control Panel**: A dedicated TUI/GUI panel will be created for managing the agent. This panel will serve as the primary interface for users to interact with the AI. It will feature:
-    - A multi-line text input for entering natural language prompts.
-    - Buttons for `Run`, `Plan`, `Diff`, `Test`, `Commit`, `Revert`, and `Learn` actions.
-    - A real-time log view displaying the agent's thought process, executed commands, and their outputs.
-    - A status bar indicating the agent's current state (e.g., "Idle", "Planning", "Executing Commands", "Verifying Changes").
- **Diff Editing UI**: A TUI-based visual diff viewer will be implemented. This UI will present a side-by-side comparison of the original ROM state (or a previous checkpoint) and the changes proposed or made by the agent. Users will be able to:
-    - Navigate through individual differences (e.g., changed bytes, modified tiles, added objects).
-    - Highlight specific changes.
-    - Accept or reject individual changes or groups of changes, providing fine-grained control over the agent's output.
- **Interactive Planning**: The agent will present its generated plan in a human-readable format within the GUI. Users will have the opportunity to:
-    - Review each step of the plan.
-    - Approve the entire plan for execution.
-    - Reject specific steps or the entire plan.
-    - Edit the plan directly (e.g., modify command arguments, reorder steps, insert new commands) before allowing the agent to proceed.
-
-### 8.5. Testing & Verification
-
- **`ImGuiTestEngine` Integration**: The agent will be able to dynamically generate and execute `ImGuiTestEngine` tests. This allows for automated visual verification of the agent's work, ensuring that changes are not only functionally correct but also visually appealing and consistent with design principles. The agent can be trained to generate test scripts that assert specific pixel colors, UI element positions, or overall visual layouts.
- **Mock Testing Framework**: A robust "mock" mode will be implemented for the `z3ed agent`. In this mode, the agent will simulate the execution of commands without modifying the actual ROM. This is crucial for safe and fast testing of the agent's planning and command generation capabilities. The existing `MockRom` class will be extended to fully support all `z3ed` commands, providing a consistent interface for both real and mock execution.
- **User-Facing Tests**: A "tutorial" or "challenge" mode will be created where users can test the agent with a series of predefined tasks. This will serve as an educational tool for users to understand the agent's capabilities and provide a way to benchmark its performance against specific ROM hacking challenges.
-
-### 8.6. Safety & Sandboxing
-
- **Dry Run Mode**: The agent will always offer a "dry run" mode, where it only shows the commands it would execute without making any actual changes to the ROM. This provides a critical safety net for users.
- **Command Whitelisting**: The agent's execution environment will enforce a strict command whitelisting policy. Only a predefined set of "safe" `z3ed` commands will be executable by the AI. Any attempt to execute an unauthorized command will be blocked.
- **Resource Limits**: The agent will operate within defined resource limits (e.g., maximum number of commands per plan, maximum data modification size) to prevent unintended extensive changes or infinite loops.
- **Human Oversight**: Given the inherent unpredictability of AI models, human oversight will be a fundamental principle. The interactive planning and diff editing UIs are designed to keep the user in control at all times.
-
-### 8.7. Optional JSON Dependency
-
-To avoid breaking platform builds where a JSON library is not available or desired, the JSON-related code will be conditionally compiled using a preprocessor macro (e.g., `YAZE_WITH_JSON`). When this macro is not defined, the agentic features that rely on JSON will be disabled. The `nlohmann/json` library will be added as a submodule to the project and included in the build only when `YAZE_WITH_JSON` is defined.
-
-### 8.8. Contextual Awareness & Feedback Loop
-
- **Contextual Information**: The agent's prompts to the LLM will be enriched with comprehensive contextual information, including:
-    - The current state of the loaded ROM (e.g., ROM header, loaded assets, current editor view).
-    - Relevant project files (e.g., `.yaze` project configuration, symbol files).
-    - User preferences and previous interactions.
-    - A dynamic list of available `z3ed` commands and their detailed usage.
- **Feedback Loop for Learning**: The results of `ImGuiTestEngine` verifications and user accept/reject actions will form a crucial feedback loop. This data can be used to fine-tune the LLM or train smaller, specialized models to improve the agent's planning and command generation capabilities over time.
-
-### 8.9. Error Handling and Recovery
-
- **Robust Error Reporting**: The agent will provide clear and actionable error messages when commands fail or unexpected situations arise.
- **Rollback Mechanisms**: The `revert` command provides a basic rollback. More advanced mechanisms, such as transactional changes or snapshotting, could be explored for complex multi-step operations.
- **Interactive Debugging**: In case of errors, the agent could pause execution and allow the user to inspect the current state, modify the plan, or provide corrective instructions.
-
-### 8.10. Extensibility
-
- **Modular Command Handlers**: The `z3ed` CLI's modular design allows for easy addition of new commands, which automatically become available to the AI agent.
- **Pluggable AI Models**: The `AIService` interface enables seamless integration of different AI models (local or remote) without modifying the core agent logic.
- **Custom Test Generation**: Users or developers can extend the `ImGuiTestEngine` capabilities to create custom verification tests for specific hacking scenarios.
-
-## 9. UX Improvements and Architectural Decisions
-
-### 9.1. TUI Component Architecture
-
-The TUI system has been redesigned around a consistent component architecture:
-
- **`TuiComponent` Interface**: All UI components implement a standard interface with a `Render()` method, ensuring consistency across the application.
- **Component Composition**: Complex UIs are built by composing simpler components, making the code more maintainable and testable.
- **Event Handling**: Standardized event handling patterns across all components for consistent user experience.
-
-### 9.2. Command Handler Unification
-
-The CLI and TUI systems now share a unified command handler architecture:
-
- **Dual Execution Paths**: Each command handler supports both CLI (`Run()`) and TUI (`RunTUI()`) execution modes.
- **Shared State Management**: Common functionality like ROM loading and validation is centralized in the base `CommandHandler` class.
- **Consistent Error Handling**: All commands use `absl::Status` for uniform error reporting across CLI and TUI modes.
-
-### 9.3. Interface Consolidation
-
-Several interfaces have been combined and simplified:
-
- **Unified Menu System**: The main menu now serves as a central hub for both direct command execution and TUI mode switching.
- **Integrated Help System**: Help information is accessible from both CLI and TUI modes with consistent formatting.
- **Streamlined Navigation**: Reduced cognitive load by consolidating related functionality into single interfaces.
-
-### 9.4. Code Organization Improvements
-
-The codebase has been restructured for better maintainability:
-
- **Header Organization**: Proper forward declarations and include management to reduce compilation dependencies.
- **Namespace Management**: Clean namespace usage to avoid conflicts and improve code clarity.
- **Build System Optimization**: Streamlined CMake configuration with conditional compilation for optional features.
-
-### 9.5. Future UX Enhancements
-
-Based on the current architecture, several UX improvements are planned:
-
- **Progressive Disclosure**: Complex commands will offer both simple and advanced modes.
- **Context-Aware Help**: Help text will adapt based on current ROM state and available commands.
- **Undo/Redo System**: Command history tracking for safer experimentation.
- **Batch Operations**: Support for executing multiple related commands as a single operation.
-
-## 10. Implementation Status and Code Quality
-
-### 10.1. Recent Refactoring Improvements (January 2025)
-
-The z3ed CLI underwent significant refactoring to improve code quality, fix linting errors, and enhance maintainability.
-
-**Issues Resolved**:
- ✅ **Missing Headers**: Added proper forward declarations for `ftxui::ScreenInteractive` and `TuiComponent`
- ✅ **Include Path Issues**: Standardized all includes to use `cli/` prefix instead of `src/cli/`
- ✅ **Namespace Conflicts**: Resolved namespace pollution issues by properly organizing includes
- ✅ **Duplicate Definitions**: Removed duplicate `CommandInfo` and `ModernCLI` definitions
- ✅ **FLAGS_rom Multiple Definitions**: Changed duplicate `ABSL_FLAG` declarations to `ABSL_DECLARE_FLAG`
-
-**Build System Improvements**:
- **CMake Configuration**: Cleaned up `z3ed.cmake` to properly configure all source files
- **Dependency Management**: Added proper includes for `absl/flags/declare.h` where needed
- **Conditional Compilation**: Properly wrapped JSON/HTTP library usage with `#ifdef YAZE_WITH_JSON`
-
-**Architecture Improvements**:
- Removed `std::unique_ptr<TuiComponent>` members from command handlers to avoid incomplete type issues
- Simplified constructors and `RunTUI` methods
- Maintained clean separation between CLI and TUI execution paths
-
-### 10.2. File Organization
-
-```
-src/cli/
-  ├── cli_main.cc          (Entry point - defines FLAGS)
-  ├── modern_cli.{h,cc}    (Command registry and dispatch)
-  ├── tui.{h,cc}           (TUI components and layout management)
-  ├── z3ed.{h,cc}          (Command handler base classes)
-  ├── service/
-  │   ├── ai_service.{h,cc}           (AI service interface)
-  │   └── gemini_ai_service.{h,cc}    (Gemini API implementation)
-  ├── handlers/            (Command implementations)
-  │   ├── agent.cc
-  │   ├── command_palette.cc
-  │   ├── compress.cc
-  │   ├── dungeon.cc
-  │   ├── gfx.cc
-  │   ├── overworld.cc
-  │   ├── palette.cc
-  │   ├── patch.cc
-  │   ├── project.cc
-  │   ├── rom.cc
-  │   ├── sprite.cc
-  │   └── tile16_transfer.cc
-  └── tui/                 (TUI component implementations)
-      ├── tui_component.h
-      ├── asar_patch.{h,cc}
-      ├── palette_editor.{h,cc}
-      └── command_palette.{h,cc}
-```
-
-### 10.3. Code Quality Improvements
-
-**Removed Problematic Patterns**:
- Eliminated returning raw pointers to temporary objects in `GetCommandHandler`
- Used `static` storage for handlers to ensure valid lifetimes
- Proper const-reference usage to avoid unnecessary copies
-
-**Standardized Error Handling**:
- Consistent use of `absl::Status` return types
- Proper status checking with `RETURN_IF_ERROR` macro
- Clear error messages for user-facing commands
-
-**API Corrections**:
- Fixed `Bitmap::bpp()` → `Bitmap::depth()`
- Fixed `PaletteGroup::set_palette()` → direct pointer manipulation
- Fixed `Bitmap::mutable_vector()` → `Bitmap::set_data()`
-
-### 10.4. TUI Component System
-
-**Implemented Components**:
- `TuiComponent` interface for consistent UI components
- `ApplyAsarPatchComponent` - Modular patch application UI
- `PaletteEditorComponent` - Interactive palette editing
- `CommandPaletteComponent` - Command search and execution
-
-**Standardized Patterns**:
- Consistent navigation across all TUI screens
- Centralized error handling with dedicated error screen
- Direct component function calls instead of handler indirection
-
-### 10.5. Known Limitations
-
-**Remaining Warnings (Non-Critical)**:
- Unused parameter warnings (mostly for stub implementations)
- Nodiscard warnings for status returns that are logged elsewhere
- Copy-construction warnings (minor performance considerations)
- Virtual destructor warnings in third-party zelda3 classes
-
-### 10.6. Future Code Quality Goals
-
-1. **Complete TUI Components**: Finish implementing all planned TUI components with full functionality
-2. **Error Handling**: Add proper status checking for all `LoadFromFile` calls
-3. **API Methods**: Implement missing ROM validation methods
-4. **JSON Integration**: Complete HTTP/JSON library integration for Gemini AI service
-5. **Performance**: Address copy-construction warnings by using const references
-6. **Testing**: Expand unit test coverage for command handlers
-
-## 11. Agent-Ready API Surface Area
-
-To unlock deeper agentic workflows, the CLI and application layers must expose a well-documented, machine-consumable API surface that mirrors the capabilities available in the GUI editors. The following initiatives expand the command coverage and standardize access for both humans and AI agents:
-
- **Resource Inventory**: Catalogue every actionable subsystem (ROM metadata, banks, tile16 atlas, actors, palettes, scripts) and map it to a resource/action pair (e.g., `rom header set`, `dungeon room copy`, `sprite spawn`). The catalogue will live in `docs/api/z3ed-resources.yaml` and be generated from source annotations; current machine-readable coverage includes palette, overworld, rom, patch, and dungeon actions.
- **Rich Metadata**: Schemas annotate each action with structured `effects` and `returns` arrays so agents can reason about side-effects and expected outputs when constructing plans.
- **Command Introspection Endpoint**: Introduce `z3ed agent describe --resource <name>` to return a structured schema describing arguments, enum values, preconditions, side-effects, and example invocations. Schemas will follow JSON Schema, enabling UI tooltips and LLM prompt construction.  _Prototype status (Oct 2025)_: the command now streams catalog JSON from `ResourceCatalog`, including `effects` and `returns` arrays for each action across palette, overworld, rom, patch, and dungeon resources.  
-    ```json
-    {
-        "resources": [
-            {
-                "resource": "rom",
-                "actions": [
-                    {
-                        "name": "validate",
-                        "effects": [
-                            "Reads ROM from disk, verifies checksum, and reports header status."
-                        ],
-                        "returns": [
-                            { "field": "report", "type": "object", "description": "Checksum + header validation summary." }
-                        ]
-                    }
-                ]
-            },
-            {
-                "resource": "overworld",
-                "actions": [
-                    {
-                        "name": "get-tile",
-                        "returns": [
-                            { "field": "tile", "type": "integer", "description": "Tile id located at the supplied coordinates." }
-                        ]
-                    }
-                ]
-            }
-        ]
-    }
-    ```
- **State Snapshot APIs**: Extend `rom` and `project` resources with `export-state` actions that emit compact JSON snapshots (bank checksums, tile hashes, palette CRCs). Snapshots will seed the LLM context and accelerate change verification.
- **Write Guard Hooks**: All mutation-oriented commands will publish `PreChange` and `PostChange` events onto an internal bus (backed by `absl::Notification` + ring buffer). The agent loop subscribes to the bus to build a change proposal timeline used in review UIs and acceptance workflows.
- **Replayable Scripts**: Standardize a TOML-based script format (`.z3edscript`) that records CLI invocations with metadata (ROM hash, duration, success). Agents can emit scripts, humans can replay them via `z3ed script run <file>`.
-
-## 12. Acceptance & Review Workflow
-
-An explicit accept/reject system keeps humans in control while encouraging rapid agent iteration.
-
-### 12.1. Change Proposal Lifecycle
-
-1. **Draft**: Agent executes commands in a sandbox ROM (auto-cloned using `Rom::SaveToFile` with `save_new=true`). All diffs, test logs, and screenshots are attached to a proposal ID.
-2. **Review**: The dashboard surfaces proposals with summary cards (changed resources, affected banks, test status). Users can open a detail view built atop the existing diff viewer, augmented with per-resource controls (accept tile, reject palette entry, etc.).
-3. **Decision**: Accepting merges the delta into the primary ROM and commits associated assets. Rejecting discards the sandbox ROM and emits feedback signals (tagged reasons) that can be fed back to future LLM prompts.
-4. **Archive**: Accepted proposals are archived with metadata for provenance; rejected ones are stored briefly for analytics before being pruned.
-
-### 12.2. UI Extensions
-
- **Proposal Drawer**: Adds a right-hand drawer in the ImGui dashboard listing open proposals with filters (resource type, test pass/fail, age).
- **Inline Diff Controls**: Integrate checkboxes/buttons into the existing palette/tile hex viewers so users can cherry-pick changes without leaving the visual context.
- **Feedback Composer**: Provide quick tags (“Incorrect palette”, “Misplaced sprite”, “Regression detected”) and optional freeform text. Feedback is serialized into the agent telemetry channel.
- **Undo/Redo Enhancements**: Accepted proposals push onto the global undo stack with descriptive labels, enabling rapid rollback during exploratory sessions.
-
-### 12.3. Policy Configuration
-
- **Gatekeeping Rules**: Define YAML-driven policies (e.g., “require passing `agent smoke` and `palette regression` suites before accept button activates”). Rules live in `.yaze/policies/agent.yaml` and are evaluated by the dashboard.
- **Access Control**: Integrate project roles so only maintainers can finalize proposals while contributors can submit drafts.
- **Telemetry Opt-In**: Provide toggles for sharing anonymized proposal statistics to improve default prompts and heuristics.
-
-## 13. ImGuiTestEngine Control Bridge
-
-Allowing an LLM to drive the ImGui UI safely requires a structured bridge between generated plans and the `ImGuiTestEngine` runtime.
-
-### 13.1. Bridge Architecture
-
- **Test Harness API**: Expose a lightweight gRPC/IPC service (`ImGuiTestHarness`) that accepts serialized input events (click, drag, key, text), query requests (widget tree, screenshot), and expectations (assert widget text equals …). The service runs inside `yaze_test` when started with `--automation=sock`. Agents connect via domain sockets (macOS/Linux) or named pipes (Windows).
- **Command Translation Layer**: Extend `z3ed agent run` to recognize plan steps with type `imgui_action`. These steps translate to harness calls (e.g., `{ "type": "imgui_action", "action": "click", "target": "Palette/Cell[12]" }`).
- **Synchronization Primitives**: Provide `WaitForIdle`, `WaitForCondition`, and `Delay` primitives so LLMs can coordinate with frame updates. Each primitive enforces timeouts and returns explicit success/failure statuses.
- **State Queries**: Implement reflection endpoints retrieving ImGui widget hierarchy, enabling the agent to confirm UI states before issuing the next action—mirroring how `ImGuiTestEngine` DSL scripts work today.
-
-#### 13.1.1. Transport & Envelope
-
- **Session bootstrap**: `yaze_test --automation=<socket path>` spins up the harness and prints a connection URI. The CLI or external agent opens a persistent stream (Unix domain socket on macOS/Linux, named pipe + overlapped IO on Windows). TLS is out-of-scope; trust is derived from local IPC.
- **Message format**: Each frame is a length-prefixed JSON envelope with optional binary attachments. Core fields:
-    ```json
-    {
-        "id": "req-42",
-        "type": "event" | "query" | "expect" | "control",
-        "payload": { /* type-specific body */ },
-        "attachments": [
-            { "slot": 0, "mime": "image/png" }
-        ]
-    }
-    ```
-    Binary blobs (e.g., screenshots) follow immediately after the JSON payload in the same frame to avoid out-of-band coordination.
- **Streaming semantics**: Responses reuse the `id` field and include `status`, `error`, and optional attachments. Long-running operations (`WaitForCondition`) stream periodic `progress` updates before returning `status: "ok"` or `status: "timeout"`.
-
-#### 13.1.2. Harness Runtime Lifecycle
-
-1. **Attach**: Agent sends a `control` message (`{"command":"attach"}`) to lock in a session. Harness responds with negotiated capabilities (available input devices, screenshot formats, rate limits).
-2. **Activate context**: Agent issues an `event` to focus a specific ImGui context (e.g., "main", "palette_editor"). Harness binds to the corresponding `ImGuiTestEngine` backend fixture.
-3. **Execute actions**: Agent streams `event` objects (`click`, `drag`, `keystroke`, `text_input`). Harness feeds them into the ImGui event queue at the start of the next frame, waits for the frame to settle, then replies.
-4. **Query & assert**: Agent interleaves `query` messages (`get_widget_tree`, `capture_screenshot`, `read_value`) and `expect` messages (`assert_property`, `assert_pixel`). Harness routes these to existing ImGuiTestEngine inspectors, lifting the results into structured JSON.
-5. **Detach**: Agent issues `{"command":"detach"}` (or connection closes). Harness flushes pending frames, releases sandbox locks, and tears down the socket.
-
-#### 13.1.3. Integration with `z3ed agent`
-
- **Plan annotation**: The CLI plan schema gains a new step kind `imgui_action` with fields `harness_uri`, `actions[]`, and optional `expect[]`. During execution `z3ed agent run` opens the harness stream, feeds each action, and short-circuits on first failure.
- **Sandbox awareness**: Harness sessions inherit the active sandbox ROM path from `RomSandboxManager`, ensuring UI assertions operate on the same data snapshot as CLI mutations.
- **Telemetry hooks**: Every harness response is appended to the proposal timeline (see §12) with thumbnails for screenshots. Failures bubble up as structured errors with hints (`"missing_widget": "Palette/Cell[12]"`).
-
-### 13.2. Safety & Sandboxing
-
- **Read-Only Default**: Harness sessions start in read-only mode; mutation commands must explicitly request escalation after presenting a plan (triggering a UI prompt for the user to authorize). Without authorization, only `capture` and `assert` operations succeed.
- **Rate Limiting**: Cap concurrent interactions and enforce per-step quotas to prevent runaway agents.
- **Logging**: Every harness call is logged and linked to the proposal ID, with playback available inside the acceptance UI.
-
-### 13.3. Script Generation Strategy
-
- **Template Library**: Publish a library of canonical ImGui action sequences (open file, expand tree, focus palette editor). Plans reference templates via IDs to reduce LLM token usage and improve reliability.
- **Auto-Healing**: When a widget lookup fails, the harness can suggest closest matches (Levenshtein distance) so the agent can retry with corrected IDs.
- **Hybrid Execution**: Encourage plans that mix CLI operations for bulk edits and ImGui actions for visual verification, minimizing UI-driven mutations.
-
-## 14. Test & Verification Strategy
-
-### 14.1. Layered Test Suites
-
- **CLI Unit Tests**: Extend `test/cli/` with high-coverage tests for new resource handlers using sandbox ROM fixtures.
- **Harness Integration Tests**: Add `test/ui/automation/` cases that spin up the harness, replay canned plans, and validate deterministic behavior.
- **End-to-End Agent Scenarios**: Create golden scenarios (e.g., “Recolor Link tunic”, “Shift Dungeon Chest”) that exercise command + UI flows, verifying ROM diffs, UI captures, and pass/fail criteria.
-
-### 14.2. Continuous Verification
-
- **CI Pipelines**: Introduce dedicated CI jobs for agent features, enabling `YAZE_WITH_JSON` builds, running harness smoke suites, and publishing artifacts (diffs, screenshots) on failure.
- **Nightly Regression**: Schedule nightly runs of expensive ImGui scenarios and long-running CLI scripts with hardware acceleration (Apple Metal) to detect flaky interactions.
- **Fuzzing Hooks**: Instrument command parsers with libFuzzer harnesses to catch malformed LLM output early.
-
-### 14.3. Telemetry-Informed Testing
-
- **Flake Tracker**: Aggregate harness failures by widget/action to prioritize stabilization.
- **Adaptive Test Selection**: Use proposal metadata to select relevant regression suites dynamically (e.g., palette-focused proposals trigger palette regression tests).
- **Feedback Loop**: Feed test outcomes back into prompt engineering, e.g., annotate prompts with known flaky commands so the LLM favors safer alternatives.
-
-## 15. Expanded Roadmap (Phase 6+)
-
-### Phase 6: Agent Workflow Foundations (Planned)
- Implement resource catalogue tooling and `agent describe` schemas.
- Ship sandbox ROM workflow with proposal tracking and acceptance UI.
- Finalize ImGuiTestHarness MVP with read-only verification.
- Expand CLI surface with sprite/object manipulation commands flagged as agent-safe.
-
-### Phase 7: Controlled Mutation & Review (Planned)
- Enable harness mutation mode with user authorization prompts.
- Deliver inline diff controls and feedback composer UI.
- Wire policy engine for gating accept buttons.
- Launch initial telemetry dashboards (opt-in) for agent performance metrics.
-
-### Phase 8: Learning & Self-Improvement (Exploratory)
- Capture accept/reject rationales to train prompt selectors.
- Experiment with reinforcement signals for local models (reward accepted plans, penalize rejected ones).
- Explore collaborative agent sessions where multiple proposals merge or compete under defined heuristics.
- Investigate deterministic replay of LLM outputs for reliable regression testing.
-
-### 7.4. Widget ID Management for Test Automation
-
-A key challenge in GUI test automation is the fragility of identifying widgets. Relying on human-readable labels (e.g., `"button:Overworld"`) makes tests brittle; a simple text change in the UI can break the entire test suite.
-
-To address this, the `z3ed` ecosystem includes a robust **Widget ID Management** system.
-
-**Goals**:
-   **Decouple Tests from Labels**: Tests should refer to a stable, logical ID, not a display label.
-   **Hierarchical and Scoped IDs**: Allow for organized and unique identification of widgets within complex, nested UIs.
-   **Discoverability**: Enable the test harness to easily find and interact with widgets using these stable IDs.
-
-**Implementation**:
-   **`WidgetIdRegistry`**: A central service that manages the mapping between stable, hierarchical IDs and the dynamic `ImGuiID`s used at runtime.
-   **Hierarchical Naming**: Widget IDs are structured like paths (e.g., `/editors/overworld/toolbar/save_button`). This avoids collisions and provides context.
-   **Registration**: Editor and tool developers are responsible for registering their interactive widgets with the `WidgetIdRegistry` upon creation.
-   **Test Harness Integration**: The `ImGuiTestHarness` uses the registry to look up the current `ImGuiID` for a given stable ID, ensuring it always interacts with the correct widget, regardless of label changes or UI refactoring.
-
-This system is critical for the long-term maintainability of the automated E2E validation pipeline.
--- a/docs/z3ed/E6-z3ed-implementation-plan.md
+++ b/docs/z3ed/E6-z3ed-implementation-plan.md
--- a/docs/z3ed/E6-z3ed-reference.md
+++ b/docs/z3ed/E6-z3ed-reference.md
--- a/docs/z3ed/developer_guide.md
+++ b/docs/z3ed/developer_guide.md
@@ -0,0 +1,149 @@
+# z3ed Developer Guide
+
+**Version**: 0.1.0-alpha  
+**Last Updated**: October 3, 2025
+
+## 1. Overview
+
+This document is the **source of truth** for the z3ed CLI architecture, design, and roadmap. It outlines the evolution of `z3ed` into a powerful, scriptable, and extensible tool for both manual and AI-driven ROM hacking.
+
+`z3ed` has successfully implemented its core infrastructure and is **production-ready on macOS**.
+
+### Core Capabilities
+
+1.  **Conversational Agent**: Chat with an AI (Ollama or Gemini) to explore ROM contents and plan changes using natural language.
+2.  **GUI Test Automation**: A gRPC-based test harness allows for widget discovery, test recording/replay, and introspection for debugging and AI-driven validation.
+3.  **Proposal System**: A safe, sandboxed editing workflow where all changes are tracked as "proposals" that require human review and acceptance.
+4.  **Resource-Oriented CLI**: A clean `z3ed <resource> <action>` command structure that is both human-readable and machine-parsable.
+
+## 2. Architecture
+
+The z3ed system is composed of several layers, from the high-level AI agent down to the YAZE GUI and test harness.
+
+### System Components Diagram
+
+```
+┌─────────────────────────────────────────────────────────┐
+│ AI Agent Layer (LLM: Ollama, Gemini)                    │
+└────────────────────┬────────────────────────────────────┘
+                     │
+┌────────────────────▼────────────────────────────────────┐
+│ z3ed CLI (Command-Line Interface)                       │
+│  ├─ agent run/plan/diff/test/list/describe              │
+│  └─ rom/palette/overworld/dungeon commands              │
+└────────────────────┬────────────────────────────────────┘
+                     │
+┌────────────────────▼────────────────────────────────────┐
+│ Service Layer (Singleton Services)                      │
+│  ├─ ProposalRegistry (Proposal Tracking)                │
+│  ├─ RomSandboxManager (Isolated ROM Copies)             │
+│  ├─ ResourceCatalog (Machine-Readable API Specs)        │
+│  └─ ConversationalAgentService (Chat & Tool Dispatch)   │
+└────────────────────┬────────────────────────────────────┘
+                     │
+┌────────────────────▼────────────────────────────────────┐
+│ ImGuiTestHarness (gRPC Server in YAZE)                  │
+│  ├─ Ping, Click, Type, Wait, Assert, Screenshot         │
+│  └─ Introspection & Discovery RPCs                      │
+└────────────────────┬────────────────────────────────────┘
+                     │
+┌────────────────────▼────────────────────────────────────┐
+│ YAZE GUI (ImGui Application)                            │
+│  └─ ProposalDrawer & Editor Windows                     │
+└─────────────────────────────────────────────────────────┘
+```
+
+### Key Architectural Decisions
+
+-   **Resource-Oriented Command Structure**: `z3ed <resource> <action>` for clarity and extensibility.
+-   **Machine-Readable API**: All commands are documented in `docs/api/z3ed-resources.yaml` with structured schemas for AI consumption.
+-   **Proposal-Based Workflow**: AI-generated changes are sandboxed as "proposals" requiring human review.
+-   **gRPC Test Harness**: An embedded gRPC server in YAZE enables remote GUI automation.
+
+## 3. Command Reference
+
+This section provides a reference for the core `z3ed` commands.
+
+### Agent Commands
+
+-   `agent run --prompt "..."`: Executes an AI-driven ROM modification in a sandbox.
+-   `agent plan --prompt "..."`: Shows the sequence of commands the AI plans to execute.
+-   `agent list`: Shows all proposals and their status.
+-   `agent diff [--proposal-id <id>]`: Shows the changes, logs, and metadata for a proposal.
+-   `agent describe [--resource <name>]`: Exports machine-readable API specifications for AI consumption.
+-   `agent chat`: Opens an interactive terminal chat (TUI) with the AI agent.
+-   `agent simple-chat`: A lightweight, non-TUI chat mode for scripting and automation.
+-   `agent test ...`: Commands for running and managing automated GUI tests.
+
+### Resource Commands
+
+-   `rom info|validate|diff`: Commands for ROM file inspection and comparison.
+-   `palette export|import|list`: Commands for palette manipulation.
+-   `overworld get-tile|find-tile|set-tile`: Commands for overworld editing.
+-   `dungeon list-sprites|list-rooms`: Commands for dungeon inspection.
+
+## 4. Agentic & Generative Workflow (MCP)
+
+The `z3ed` CLI is the foundation for an AI-driven Model-Code-Program (MCP) loop, where the AI agent's "program" is a script of `z3ed` commands.
+
+1.  **Model (Planner)**: The agent receives a natural language prompt and leverages an LLM to create a plan, which is a sequence of `z3ed` commands.
+2.  **Code (Generation)**: The LLM returns the plan as a structured JSON object containing actions.
+3.  **Program (Execution)**: The `z3ed agent` parses the plan and executes each command sequentially in a sandboxed ROM environment.
+4.  **Verification (Tester)**: The `ImGuiTestHarness` is used to run automated GUI tests to verify that the changes were applied correctly.
+
+## 5. Roadmap & Implementation Status
+
+**Last Updated**: October 3, 2025
+
+### ✅ Completed
+
+-   **Core Infrastructure**: Resource-oriented CLI, proposal workflow, sandbox manager, and resource catalog are all production-ready.
+-   **AI Backends**: Both Ollama (local) and Gemini (cloud) are operational.
+-   **Conversational Agent**: The agent service, tool dispatcher (with 5 read-only tools), and TUI/simple chat interfaces are complete.
+-   **GUI Test Harness (IT-01 to IT-09)**: A comprehensive GUI testing platform with introspection, widget discovery, recording/replay, enhanced error reporting, and CI integration support.
+
+### 🚧 Active & Next Steps
+
+1.  **Live LLM Testing (1-2h)**: Verify function calling with real models (Ollama/Gemini).
+2.  **GUI Chat Integration (6-8h)**: Wire the `AgentChatWidget` into the main YAZE editor.
+3.  **Expand Tool Coverage (8-10h)**: Add new read-only tools for inspecting dialogue, sprites, and regions.
+4.  **Windows Cross-Platform Testing (8-10h)**: Validate `z3ed` and the test harness on Windows.
+
+## 6. Technical Implementation Details
+
+### Build System
+
+A single `Z3ED_AI=ON` CMake flag enables all AI features, including JSON, YAML, and httplib dependencies. This simplifies the build process and is designed for the upcoming build modularization.
+
+**Build Command (with AI features):**
+```bash
+cmake -B build -DZ3ED_AI=ON
+cmake --build build --target z3ed
+```
+
+### AI Service Configuration
+
+AI providers can be configured via command-line flags, which override environment variables.
+
+-   `--ai_provider=<mock|ollama|gemini>`
+-   `--ai_model=<model_name>`
+-   `--gemini_api_key=<key>`
+-   `--ollama_host=<url>`
+
+### Test Harness (gRPC)
+
+The test harness is a gRPC server embedded in the YAZE application, enabling remote control for automated testing. It exposes RPCs for actions like `Click`, `Type`, and `Wait`, as well as advanced introspection and test management.
+
+**Start Test Harness:**
+```bash
+./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
+  --enable_test_harness \
+  --test_harness_port=50052 \
+  --rom_file=assets/zelda3.sfc &
+```
+
+**Key RPCs:**
+-   **Automation**: `Ping`, `Click`, `Type`, `Wait`, `Assert`, `Screenshot`
+-   **Introspection**: `GetTestStatus`, `ListTests`, `GetTestResults`
+-   **Discovery**: `DiscoverWidgets`
+-   **Recording**: `StartRecording`, `StopRecording`, `ReplayTest`