From 551f926aba53c512c8f811e91bc501184349fb29 Mon Sep 17 00:00:00 2001
From: scawful
Date: Sat, 4 Oct 2025 03:16:45 -0400
Subject: [PATCH] Add developer guide for z3ed CLI outlining architecture, commands, and roadmap

- Introduced a comprehensive developer guide for z3ed CLI.
- Documented core capabilities, architecture, command reference, and agentic workflow.
- Included implementation details for build system and AI service configuration.
- Provided roadmap with completed and active tasks for future development.
---
 docs/z3ed/AGENT-ROADMAP.md               |  580 ------
 docs/z3ed/E6-z3ed-cli-design.md          |  826 --------
 docs/z3ed/E6-z3ed-implementation-plan.md | 2268 ----------------------
 docs/z3ed/E6-z3ed-reference.md           | 1332 -------------
 docs/z3ed/developer_guide.md             |  149 ++
 5 files changed, 149 insertions(+), 5006 deletions(-)
 delete mode 100644 docs/z3ed/AGENT-ROADMAP.md
 delete mode 100644 docs/z3ed/E6-z3ed-cli-design.md
 delete mode 100644 docs/z3ed/E6-z3ed-implementation-plan.md
 delete mode 100644 docs/z3ed/E6-z3ed-reference.md
 create mode 100644 docs/z3ed/developer_guide.md

diff --git a/docs/z3ed/AGENT-ROADMAP.md b/docs/z3ed/AGENT-ROADMAP.md
deleted file mode 100644
index 07e0e546..00000000
--- a/docs/z3ed/AGENT-ROADMAP.md
+++ /dev/null
@@ -1,580 +0,0 @@
-# z3ed Agent Roadmap
-
-**Last Updated**: October 3, 2025
-
-## Current Status
-
-### ✅ Production Ready
-- **Build System**: Z3ED_AI flag consolidation complete
-- **AI Backends**: Ollama (local) and Gemini (cloud) operational
-- **Conversational Agent**: Multi-step tool execution with chat history
-- **Tool Dispatcher**: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
-- **TUI Chat**: FTXUI-based interactive terminal interface
-- **Simple Chat**: Text-mode REPL for AI testing (no FTXUI dependencies)
-- **GUI Chat Widget**: ImGui-based widget (needs integration into main app)
-
-### 🚧 Active Work
-1. **Live LLM Testing** (1-2h): Verify function calling with real models
-2. **GUI Integration** (4-6h): Wire AgentChatWidget into YAZE editor
-3. **Proposal Workflow** (6-8h): End-to-end integration from chat to ROM changes
-
-## Core Vision
-
-Transform z3ed from a command-line tool into a **conversational ROM hacking assistant** where users can:
-- Ask questions about ROM contents ("What dungeons exist?")
-- Inspect game data interactively ("How many soldiers in room X?")
-- Build changes incrementally through dialogue
-- Generate proposals from conversation context
-
-## Technical Architecture
-
-### 1. Conversational Agent Service ✅
-**Status**: Complete
-- `ConversationalAgentService`: Manages chat sessions and tool execution
-- Integrates with Ollama/Gemini AI services
-- Handles tool calls with automatic JSON formatting
-- Maintains conversation history and context
-
-### 2. Read-Only Tools ✅
-**Status**: 5 tools implemented
-- `resource-list`: Enumerate labeled resources
-- `dungeon-list-sprites`: Inspect sprites in rooms
-- `overworld-find-tile`: Search for tile16 IDs
-- `overworld-describe-map`: Get map metadata
-- `overworld-list-warps`: List entrances/exits/holes
-
-**Next**: Add dialogue, sprite info, and region inspection tools
-
-### 3. Chat Interfaces
-**Status**: Multiple modes available
-- **TUI (FTXUI)**: Full-screen interactive terminal (✅ complete)
-- **Simple Mode**: Text REPL for automation/testing (✅ complete)
-- **GUI (ImGui)**: Dockable widget in YAZE (⚠️ needs integration)
-
-### 4. Proposal Workflow Integration
-**Status**: Planned
-**Goal**: When user requests ROM changes, agent generates proposal
-1. User chats to explore ROM
-2. User requests change ("add two more soldiers")
-3. Agent generates commands → creates proposal
-4. User reviews with `agent diff` or GUI
-5. User accepts/rejects proposal
-
-## Immediate Priorities
-
-### Priority 1: Live LLM Testing (1-2 hours)
-Verify function calling works end-to-end:
-- Test Gemini 2.0 with natural language prompts
-- Test Ollama (qwen2.5-coder) with tool discovery
-- Validate multi-step conversations
-- Exercise all 5 tools
-
-### Priority 2: GUI Chat Integration (4-6 hours)
-Wire AgentChatWidget into main YAZE editor:
-- Add menu item: Debug → Agent Chat
-- Connect to shared ConversationalAgentService
-- Test with loaded ROM context
-- Add history persistence
-
-### Priority 3: Proposal Generation (6-8 hours)
-
-## Technical Implementation Plan
-
-### 1. Conversational Agent Service
-- **Description**: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
-- **Components**:
-  - `ConversationalAgentService`: The main class for managing the chat session.
-  - Integration with existing `AIService` implementations (Ollama, Gemini).
-- **Status**: In progress - baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.
-
-### 2. Read-Only "Tools" for the Agent
-- **Description**: To enable the agent to answer questions, we need to expand `z3ed` with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
-- **Example Tools to Implement**:
-  - `resource list --type `: List all user-defined labels of a certain type.
-  - `dungeon list-sprites --room `: List all sprites in a given room.
-  - `dungeon get-info --room `: Get metadata for a specific room.
-  - `overworld find-tile --tile `: Find all occurrences of a specific tile on the overworld map.
-- **Advanced Editing Tools (for future implementation)**:
-  - `overworld set-area --map --x --y --width --height --tile `
-  - `overworld replace-tile --map --from --to `
-  - `overworld blend-tiles --map --pattern --density `
-- **Status**: Foundational commands (`resource-list`, `dungeon-list-sprites`) are live with JSON output. Focus is shifting to high-value Overworld and dialogue inspection tools.
-
-### 3. TUI and GUI Chat Interfaces
-- **Description**: User-facing components for interacting with the `ConversationalAgentService`.
-- **Components**:
-  - **TUI**: A new full-screen component in `z3ed` using FTXUI, providing a rich chat experience in the terminal.
-  - **GUI**: A new ImGui widget that can be docked into the main `yaze` application window.
-- **Status**: In progress - CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Need to improve input ergonomics and synchronized history navigation.
-
-### 4. Integration with the Proposal Workflow
-- **Description**: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the `ConversationalAgentService` will trigger the existing `Tile16ProposalGenerator` (and future proposal generators for other resource types) to create a proposal.
-- **Workflow**:
-  1. User chats with the agent to explore the ROM.
-  2. User asks the agent to make a change.
-  3. `ConversationalAgentService` generates the commands and passes them to the appropriate `ProposalGenerator`.
-  4. A new proposal is created and saved.
-  5. The TUI/GUI notifies the user that a proposal is ready for review.
-  6. User uses the `agent diff` and `agent accept` commands (or UI equivalents) to review and apply the changes.
-- **Status**: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
-
-## Next Steps
-
-### Immediate Priorities
-1. **✅ Build System Consolidation** (COMPLETE - Oct 3, 2025):
-   - ✅ Created Z3ED_AI master flag for simplified builds
-   - ✅ Fixed Gemini crash with graceful degradation
-   - ✅ Updated documentation with new build instructions
-   - ✅ Tested both Ollama and Gemini backends
-   - **Next**: Update CI/CD workflows to use `-DZ3ED_AI=ON`
-2. **Live LLM Testing** (NEXT UP - 1-2 hours):
-   - Verify function calling works with real Ollama/Gemini
-   - Test multi-step tool execution
-   - Validate all 5 tools with natural language prompts
-3. **Expand Overworld Tool Coverage**:
-   - ✅ Ship read-only tile searches (`overworld find-tile`) with shared formatting for CLI and agent calls.
-   - Next: add area summaries, teleport destination lookups, and keep JSON/Text parity for all new tools.
-4. **Polish the TUI Chat Experience**:
-   - Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
-   - Align log file output with on-screen formatting for easier debugging.
-5. **Document & Test the New Tooling**:
-   - Update the main `README.md` and relevant docs to cover the new chat formatting.
-   - Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
-5. **Build GUI Chat Widget**:
-   - Create the ImGui component.
-   - Ensure it shares the same backend service as the TUI.
-6. **Full Integration with Proposal System**:
-   - Implement the logic for the agent to transition from conversation to proposal generation.
-7. **Expand Tool Arsenal**:
-   - Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
-8. **Multi-Modal Agent**:
-   - Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
-9. **Advanced Configuration**:
-   - Implement environment variables for selecting AI providers and models (e.g., `YAZE_AI_PROVIDER`, `OLLAMA_MODEL`).
-   - Add CLI flags for overriding the provider and model on a per-command basis.
-10. **Performance and Cost-Saving**:
-   - Implement a response cache to reduce latency and API costs.
-   - Add token usage tracking and reporting.
-
-## Current Status & Next Steps (Updated: October 3, 2025)
-
-We have made significant progress in laying the foundation for the conversational agent.
-
-### ✅ Completed
-- **Build System Consolidation**: ✅ **NEW** Z3ED_AI master flag (Oct 3, 2025)
-  - Single flag enables all AI features: `-DZ3ED_AI=ON`
-  - Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
-  - Fixed Gemini crash when API key set but JSON disabled
-  - Graceful degradation with clear error messages
-  - Backward compatible with old flags
-  - Ready for build modularization (enables optional `libyaze_agent.a`)
-  - **Docs**: `docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md`
-- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
-  - Handles tool calls with automatic JSON output format
-  - Prevents recursion through proper tool result replay
-  - Supports conversation history and context management
-- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
-  - Renders tables from JSON tool results
-  - Pretty-prints JSON payloads with syntax formatting
-  - Scrollable history with user/agent distinction
-- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
-  - `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
-  - `dungeon-list-sprites`: Inspect sprites in dungeon rooms
-  - `overworld-find-tile`: Search for tile16 IDs across maps
-  - `overworld-describe-map`: Get comprehensive map metadata
-  - `overworld-list-warps`: List entrances/exits/holes with filtering
-- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON
-  - Automatic table generation from JSON arrays/objects
-  - Column-aligned formatting with headers
-  - Graceful fallback to text for malformed data
-- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
-  - Shared ROM context passed through ConversationalAgentService
-  - Automatic ROM loading with error handling
-- **AI Service Foundation**: ✅ Ollama and Gemini services operational
-  - Enhanced prompting system with resource catalogue loading
-  - System instruction generation with examples
-  - Health checks and model availability validation
-  - Both backends tested and working in production
-
-### 🚧 In Progress
-- **Live LLM Testing**: Ready to execute with real Ollama/Gemini
-  - All infrastructure complete (function calling, tool schemas, response parsing)
-  - Need to verify multi-step tool execution with live models
-  - Test scenarios prepared for all 5 tools
-  - **Estimated Time**: 1-2 hours
-- **GUI Chat Widget**: Not yet started
-  - TUI implementation complete and can serve as reference
-  - Should reuse table/JSON rendering logic from TUI
-  - Target: `src/app/gui/debug/agent_chat_widget.{h,cc}`
-  - **Estimated Time**: 6-8 hours
-
-### 🚀 Next Steps (Priority Order)
-
-#### Priority 1: Live LLM Testing with Function Calling (1-2 hours)
-**Goal**: Verify Ollama/Gemini can autonomously invoke tools in production
-
-**Infrastructure Complete** ✅:
-- ✅ Tool schema generation (`BuildFunctionCallSchemas()`)
-- ✅ System prompts include function definitions
-- ✅ AI services parse `tool_calls` from responses
-- ✅ ConversationalAgentService dispatches to ToolDispatcher
-- ✅ All 5 tools tested independently
-
-**Testing Tasks**:
-1. **Gemini Testing** (30 min)
-   - Verify Gemini 2.0 generates correct `tool_calls` JSON
-   - Test prompt: "What dungeons are in this ROM?"
-   - Verify tool result fed back into conversation
-   - Test multi-step: "Now list sprites in the first dungeon"
-
-2. **Ollama Testing** (30 min)
-   - Verify qwen2.5-coder discovers and calls tools
-   - Same test prompts as Gemini
-   - Compare response quality between models
-
-3. **Tool Coverage Testing** (30 min)
-   - Exercise all 5 tools with natural language prompts
-   - Verify JSON output formats correctly
-   - Test error handling (invalid room IDs, etc.)
-
-**Success Criteria**:
-- LLM autonomously calls tools without explicit command syntax
-- Tool results incorporated into follow-up responses
-- Multi-turn conversations work with context
-
-#### Priority 2: Implement GUI Chat Widget (6-8 hours)
-**Goal**: Unified chat experience in YAZE application
-
-1. **Create ImGui Chat Widget** (4 hours)
-   - File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
-   - Reuse table/JSON rendering logic from TUI implementation
-   - Add to Debug menu: `Debug → Agent Chat`
-   - Share `ConversationalAgentService` instance with TUI
-
-2. **Add Chat History Persistence** (2 hours)
-   - Save chat history to `.yaze/agent_chat_history.json`
-   - Load on startup, display in GUI/TUI
-   - Add "Clear History" button
-
-3. **Polish Input Experience** (2 hours)
-   - Multi-line input support (Shift+Enter for newline, Enter to send)
-   - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
-   - Auto-scroll to bottom on new messages
-
-#### Priority 3: Proposal Generation (6-8 hours)
-Connect chat to ROM modification workflow:
-- Detect action intents in conversation
-- Generate proposal from accumulated context
-- Link proposal to chat history
-- GUI notification when proposal ready
-
-## Command Reference
-
-### Chat Modes
-```bash
-# Interactive TUI chat (FTXUI)
-z3ed agent chat --rom zelda3.sfc
-
-# Simple text mode (for automation/AI testing)
-z3ed agent simple-chat --rom zelda3.sfc
-
-# Batch mode from file
-z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
-```
-
-### Tool Commands (for direct testing)
-```bash
-# List dungeons
-z3ed agent resource-list --type dungeon --format json
-
-# Find tiles
-z3ed agent overworld-find-tile --tile 0x02E --map 0x05
-
-# List sprites in room
-z3ed agent dungeon-list-sprites --room 0x012
-```
-
-## Build Quick Reference
-
-```bash
-# Full AI features
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed
-
-# With GUI automation/testing
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build
-
-# Minimal (no AI)
-cmake -B build
-cmake --build build --target z3ed
-```
-
-## Future Enhancements
-
-### Short Term (1-2 months)
-- Dialogue/text search tools
-- Sprite info inspection
-- Region/teleport tools
-- Response caching
-- Token usage tracking
-
-### Medium Term (3-6 months)
-- Multi-modal agent (image generation)
-- Advanced configuration (env vars, model selection)
-- Proposal templates for common edits
-- Undo/redo in conversations
-
-### Long Term (6+ months)
-- Visual diff viewer for proposals
-- Collaborative editing sessions
-- Learning from user feedback
-- Custom tool plugins
-**Goal**: Enable deeper ROM introspection for level design questions
-
-1. **Dialogue/Text Tools** (3 hours)
-   - `dialogue-search --text "search term"`: Find text in ROM dialogue
-   - `dialogue-get --id 0x...`: Get dialogue by message ID
-
-2. **Sprite Tools** (3 hours)
-   - `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
-   - `overworld-list-sprites --map 0x...`: Sprites on overworld map
-
-3. **Advanced Overworld Tools** (4 hours)
-   - `overworld-get-region --map 0x...`: Region boundaries and properties
-   - `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
-   - `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
-
-#### Priority 4: Performance and Caching (4-6 hours)
-
-1. **Response Caching** (3 hours)
-   - Implement LRU cache for identical prompts
-   - Cache tool results by (tool_name, args) key
-   - Configurable TTL (default: 5 minutes for ROM introspection)
-
-2. **Token Usage Tracking** (2 hours)
-   - Log tokens per request (Ollama and Gemini APIs provide this)
-   - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
-   - Add `--show-token-usage` flag to CLI commands
-
-3. **Streaming Responses** (optional, 3-4 hours)
-   - Use Ollama/Gemini streaming APIs
-   - Update GUI/TUI to show partial responses as they arrive
-   - Improves perceived latency for long responses
-
-## z3ed Build Quick Reference
-
-```bash
-# Full AI features (Ollama + Gemini)
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed
-
-# AI + GUI automation/testing
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build --target z3ed
-
-# Minimal build (no AI)
-cmake -B build
-cmake --build build --target z3ed
-```
-
-## Build Flags Explained
-
-| Flag | Purpose | Dependencies | When to Use |
-|------|---------|--------------|-------------|
-| `Z3ED_AI=ON` | **Master flag** for AI features | JSON, YAML, httplib, (OpenSSL*) | Want Ollama or Gemini support |
-| `YAZE_WITH_GRPC=ON` | GUI automation & testing | gRPC, Protobuf, (auto-enables JSON) | Want GUI test harness |
-| `YAZE_WITH_JSON=ON` | Low-level JSON support | nlohmann_json | Auto-enabled by above flags |
-
-*OpenSSL optional - required for Gemini (HTTPS), Ollama works without it
-
-## Feature Matrix
-
-| Feature | No Flags | Z3ED_AI | Z3ED_AI + GRPC |
-|---------|----------|---------|----------------|
-| Basic CLI | ✅ | ✅ | ✅ |
-| Ollama (local) | ❌ | ✅ | ✅ |
-| Gemini (cloud) | ❌ | ✅* | ✅* |
-| TUI Chat | ❌ | ✅ | ✅ |
-| GUI Test Automation | ❌ | ❌ | ✅ |
-| Tool Dispatcher | ❌ | ✅ | ✅ |
-| Function Calling | ❌ | ✅ | ✅ |
-
-*Requires OpenSSL for HTTPS
-
-## Common Build Scenarios
-
-### Developer (AI features, no GUI testing)
-```bash
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed -j8
-```
-
-### Full Stack (AI + GUI automation)
-```bash
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-cmake --build build --target z3ed -j8
-```
-
-### CI/CD (minimal, fast)
-```bash
-cmake -B build -DYAZE_MINIMAL_BUILD=ON
-cmake --build build -j$(nproc)
-```
-
-### Release Build (optimized)
-```bash
-cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
-cmake --build build --target z3ed -j8
-```
-
-## Migration from Old Flags
-
-### Before (Confusing)
-```bash
-cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
-```
-
-### After (Clear Intent)
-```bash
-cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
-```
-
-**Note**: Old flags still work for backward compatibility!
-
-## Troubleshooting
-
-### "Build with -DZ3ED_AI=ON" warning
-**Symptom**: AI commands fail with "JSON support required"
-**Fix**: Rebuild with AI flag
-```bash
-rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build
-```
-
-### "OpenSSL not found" warning
-**Symptom**: Gemini API doesn't work
-**Impact**: Only affects Gemini (cloud). Ollama (local) works fine
-**Fix (optional)**:
-```bash
-# macOS
-brew install openssl
-
-# Linux
-sudo apt install libssl-dev
-
-# Then rebuild
-cmake -B build -DZ3ED_AI=ON && cmake --build build
-```
-
-### Ollama vs Gemini not auto-detecting
-**Symptom**: Wrong backend selected
-**Fix**: Set explicit provider
-```bash
-# Force Ollama
-export YAZE_AI_PROVIDER=ollama
-./build/bin/z3ed agent plan --prompt "test"
-
-# Force Gemini
-export YAZE_AI_PROVIDER=gemini
-export GEMINI_API_KEY="your-key"
-./build/bin/z3ed agent plan --prompt "test"
-```
-
-## Environment Variables
-
-| Variable | Default | Purpose |
-|----------|---------|---------|
-| `YAZE_AI_PROVIDER` | auto | Force `ollama` or `gemini` |
-| `GEMINI_API_KEY` | - | Gemini API key (enables Gemini) |
-| `OLLAMA_MODEL` | `qwen2.5-coder:7b` | Override Ollama model |
-| `GEMINI_MODEL` | `gemini-2.5-flash` | Override Gemini model |
-
-## Platform-Specific Notes
-
-### macOS
-- OpenSSL auto-detected via Homebrew
-- Keychain integration for SSL certs
-- Recommended: `brew install openssl ollama`
-
-### Linux
-- OpenSSL typically pre-installed
-- Install via: `sudo apt install libssl-dev`
-- Ollama: Download from https://ollama.com
-
-### Windows
-- Use Ollama (no SSL required)
-- Gemini requires OpenSSL (harder to setup on Windows)
-- Recommend: Focus on Ollama for Windows builds
-
-## Performance Tips
-
-### Faster Incremental Builds
-```bash
-# Use Ninja instead of Make
-cmake -B build -GNinja -DZ3ED_AI=ON
-ninja -C build z3ed
-
-# Enable ccache
-export CMAKE_CXX_COMPILER_LAUNCHER=ccache
-cmake -B build -DZ3ED_AI=ON
-```
-
-### Reduce Build Scope
-```bash
-# Only build z3ed (not full yaze app)
-cmake --build build --target z3ed
-
-# Parallel build
-cmake --build build --target z3ed -j$(nproc)
-```
-
-## Related Documentation
-
-- **Migration Guide**: [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
-- **Technical Roadmap**: [AGENT-ROADMAP.md](AGENT-ROADMAP.md)
-- **Main README**: [README.md](README.md)
-- **Build Modularization**: `../../build_modularization_plan.md`
-
-## Quick Test
-
-Verify your build works:
-
-```bash
-# Check z3ed runs
-./build/bin/z3ed --version
-
-# Test AI detection
-./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5
-
-# Expected output (with Z3ED_AI=ON):
-# 🤖 Using Gemini AI with model: gemini-2.5-flash
-# or
-# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
-# or
-# 🤖 Using MockAIService (no LLM configured)
-```
-
-## Support
-
-If you encounter issues:
-1. Check this guide's troubleshooting section
-2. Review [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
-3. Verify CMake output for warnings
-4. Open an issue with build logs
-
-## Summary
-
-**Recommended for most users**:
-```bash
-cmake -B build -DZ3ED_AI=ON
-cmake --build build --target z3ed -j8
-./build/bin/z3ed agent chat
-```
-
-This gives you:
-- ✅ Ollama support (local, free)
-- ✅ Gemini support (cloud, API key required)
-- ✅ TUI chat interface
-- ✅ Tool dispatcher with 5 commands
-- ✅ Function calling support
-- ✅ All AI agent features
diff --git a/docs/z3ed/E6-z3ed-cli-design.md b/docs/z3ed/E6-z3ed-cli-design.md
deleted file mode 100644
index c3242c0f..00000000
--- a/docs/z3ed/E6-z3ed-cli-design.md
+++ /dev/null
@@ -1,826 +0,0 @@
-# z3ed CLI Architecture & Design
-
-## 1. Overview
-
-This document is the **source of truth** for the z3ed CLI architecture and design. It outlines the evolution of `z3ed`, the command-line interface for the YAZE project, from a collection of utility commands into a powerful, scriptable, and extensible tool for both manual and automated ROM hacking, with full support for AI-driven generative development.
-
-**Related Documents**:
-- **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - Implementation tracker, task backlog, and roadmap
-- **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Technical reference: commands, APIs, troubleshooting
-- **[README.md](README.md)** - Quick overview and documentation index
-
-**Last Updated**: [Current Date]
-
-`z3ed` has successfully implemented its core infrastructure and is **production-ready on macOS**:
-
-**✅ Completed Features**:
-- **Resource-Oriented CLI**: Clean `z3ed ` command structure
-- **Resource Catalogue**: Machine-readable API specs in YAML/JSON for AI consumption
-- **Acceptance Workflow**: Full proposal lifecycle (create → review → accept/reject → commit)
-- **ImGuiTestHarness (IT-01)**: gRPC-based GUI automation with 6 RPC methods
-- **CLI Agent Test (IT-02)**: Natural language prompts → automated GUI testing
-- **ProposalDrawer GUI**: Integrated review interface in YAZE editor
-- **ROM Sandbox Manager**: Isolated testing environment for safe experimentation
-- **Proposal Registry**: Cross-session proposal tracking with disk persistence
-
-**🔄 In Progress**:
-- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform
-  - Test introspection APIs for status/results polling
-  - Widget discovery for AI-driven interactions
-  - **✅ Test recording/replay for regression testing**
-  - Enhanced error reporting with screenshots and application-wide diagnostics
-  - CI/CD integration with standardized test formats
-
-**📋 Planned Next**:
-- **Policy Evaluation Framework (AW-04)**: YAML-based constraints for proposal acceptance
-- **Windows Cross-Platform Testing**: Validate on Windows with vcpkg
-- **Production Readiness**: Telemetry, screenshot implementation, expanded test coverage
-
-## 2. Design Goals
-
-The z3ed CLI is built on three core pillars:
-
-1. **Power & Usability for ROM Hackers**: Empower users with fine-grained control over all aspects of the ROM directly from the command line, supporting both interactive exploration and scripted automation.
-
-2. **Testability & Automation**: Provide robust commands for validating ROM integrity, automating complex testing scenarios, and enabling reproducible workflows through scripting.
-
-3. **AI & Generative Hacking**: Establish a powerful, scriptable API that an AI agent (LLM/MCP) can use to perform complex, generative tasks on the ROM, with human oversight and approval workflows.
-
-### 2.1. Key Architectural Decisions
-
-**Resource-Oriented Command Structure**: Adopted `z3ed ` pattern (similar to kubectl, gcloud) for clarity and extensibility.
-
-**Machine-Readable API**: All commands documented in `docs/api/z3ed-resources.yaml` with structured schemas for AI consumption.
-
-**Proposal-Based Workflow**: AI-generated changes are sandboxed and tracked as "proposals" requiring human review and acceptance.
-
-**gRPC Test Harness**: Embedded gRPC server in YAZE enables remote GUI automation for testing and AI-driven workflows.
-
-**Comprehensive Testing Platform**: Test harness evolved beyond basic automation to support:
-- **Widget Discovery**: AI agents can enumerate available GUI interactions dynamically
-- **Test Introspection**: Query test status, results, and execution queue in real-time
-- **Recording & Replay**: Capture test sessions as JSON scripts for regression testing
-- **CI/CD Integration**: Standardized test suite format with JUnit XML output
-- **Enhanced Debugging**: Screenshot capture, widget state dumps, and execution context on failures
-
-**Cross-Platform Foundation**: Core built for macOS/Linux with Windows support planned via vcpkg.
-
-## 3. Proposed CLI Architecture: Resource-Oriented Commands
-
-The CLI has adopted a `z3ed [options]` structure, similar to modern CLIs like `gcloud` or `kubectl`, improving clarity and extensibility.
-
-### 3.1. Top-Level Resources
-
-- `rom`: Commands for interacting with the ROM file itself.
-- `patch`: Commands for applying and creating patches.
-- `gfx`: Commands for graphics manipulation.
-- `palette`: Commands for palette manipulation.
-- `overworld`: Commands for overworld editing.
-- `dungeon`: Commands for dungeon editing.
-- `sprite`: Commands for sprite management and creation.
-- `test`: Commands for running tests.
-- `tui`: The entrypoint for the enhanced Text User Interface.
-- `agent`: Commands for interacting with the AI agent.
-
-### 3.2. Example Command Mapping
-
-The command mapping has been successfully implemented, transitioning from the old flat structure to the new resource-oriented approach.
-
-## 4. New Features & Commands
-
-### 4.1. For the ROM Hacker (Power & Scriptability)
-
-These commands focus on exporting data to and from the original SCAD (Nintendo Super Famicom/SNES CAD) binary formats found in the gigaleak, as well as other relevant binary formats. This enables direct interaction with development assets, version control, and sharing. Many of these commands have been implemented or are in progress.
-
-- **Dungeon Editing**: Commands for exporting, importing, listing, and adding objects.
-- **Overworld Editing**: Commands for getting, setting tiles, listing, and moving sprites.
-- **Graphics & Palettes**: Commands for exporting/importing sheets and palettes.
-
-### 4.2. For Testing & Automation
-
-- **ROM Validation & Comparison**: `z3ed rom validate`, `z3ed rom diff`, and `z3ed rom generate-golden` have been implemented.
-- **Test Execution**: `z3ed test run` and `z3ed test list-suites` are in progress.
-
-## 5. TUI Enhancements
-
-The `--tui` flag now launches a significantly enhanced, interactive terminal application built with FTXUI. The TUI has been decomposed into a set of modular components, with each command handler responsible for its own TUI representation, making it more extensible and easier to maintain.
-
-- **Dashboard View**: The main screen is evolving into a dashboard.
-- **Interactive Palette Editor**: In progress.
-- **Interactive Hex Viewer**: Implemented.
-- **Command Palette**: In progress.
-- **Tabbed Layout**: Implemented.
-
-## 6. Generative & Agentic Workflows (MCP Integration)
-
-The redesigned CLI serves as the foundational API for an AI-driven Model-Code-Program (MCP) loop. The AI agent's "program" is a script of `z3ed` commands.
-
-### 6.1. The Generative Workflow
-
-The generative workflow has been refined to incorporate more detailed planning and verification steps, leveraging the `z3ed agent` commands.
-
-### 6.2. Key Enablers
-
-- **Granular Commands**: The CLI provides commands to manipulate data within the binary formats (e.g., `palette set-color`, `gfx set-pixel`), abstracting complexity from the AI agent.
-- **Idempotency**: Commands are designed to be idempotent where possible.
-- **SpriteBuilder CLI**: Deprioritized for now, pending further research and development of the underlying assembly generation capabilities.
-
-## 7. Implementation Roadmap
-
-### Phase 1: Core CLI & TUI Foundation (Done)
-- **CLI Structure**: Implemented.
-- **Command Migration**: Implemented.
-- **TUI Decomposition**: Implemented.
-
-### Phase 2: Interactive TUI & Command Palette (Done)
-- **Interactive Palette Editor**: Implemented.
-- **Interactive Hex Viewer**: Implemented.
-- **Command Palette**: Implemented.
-
-### Phase 3: Testing & Project Management (Done)
-- **`rom validate`**: Implemented.
-- **`rom diff`**: Implemented.
-- **`rom generate-golden`**: Implemented.
-- **Project Scaffolding**: Implemented.
-
-### Phase 4: Agentic Framework & Generative AI (✅ Foundation Complete, 🚧 LLM Integration In Progress)
-- **`z3ed agent` command**: ✅ Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
-- **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
-- **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
-- **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
-- **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
-- **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
-- **Conversational Agent Service**: ✅ Complete - multi-step tool execution loop with history management.
-- **Tool Dispatcher**: ✅ Complete - 5 read-only tools for ROM introspection (`resource-list`, `dungeon-list-sprites`, `overworld-find-tile`, `overworld-describe-map`, `overworld-list-warps`).
-- **TUI Chat Interface**: ✅ Complete - production-ready with table/JSON rendering (`z3ed agent chat`).
-- **AI Service Backends**: ✅ Operational - Ollama (local) and Gemini (cloud) with enhanced prompting.
-- **LLM Function Calling**: 🚧 In Progress - ToolDispatcher exists, needs tool schema injection into prompts and response parsing. -- **GUI Chat Widget**: 📋 Planned - TUI implementation complete, ImGui widget pending. -- **Execution Loop (MCP)**: ✅ Complete - command parsing and execution logic operational. -- **Leveraging `ImGuiTestEngine`**: ✅ Complete - `agent test` subcommand for GUI verification (see IT-01/02). -- **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management. -- **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management. -- **Granular Data Commands**: ✅ Complete - rom, palette, overworld, dungeon commands operational. -- **SpriteBuilder CLI**: Deprioritized. - -### Phase 5: Code Structure & UX Improvements (Completed) -- **Modular Architecture**: Refactored CLI handlers into clean, focused modules with proper separation of concerns. -- **TUI Component System**: Implemented `TuiComponent` interface for consistent UI components across the application. -- **Unified Command Interface**: Standardized `CommandHandler` base class with both CLI and TUI execution paths. -- **Error Handling**: Improved error handling with consistent `absl::Status` usage throughout the codebase. -- **Build System**: Streamlined CMake configuration with proper dependency management and conditional compilation. -- **Code Quality**: Resolved linting errors and improved code maintainability through better header organization and forward declarations. - -### Phase 6: Resource Catalogue & API Documentation (✅ Completed - Oct 1, 2025) -- **Resource Schema System**: ✅ Comprehensive schema definitions for all CLI resources (ROM, Patch, Palette, Overworld, Dungeon, Agent). -- **Metadata Annotations**: ✅ All commands annotated with arguments, effects, returns, and stability levels.
-- **Serialization Framework**: ✅ Dual-format export (JSON compact, YAML human-readable) with resource filtering. -- **Agent Describe Command**: ✅ Full implementation with `--format`, `--resource`, `--output`, `--version` flags. -- **API Documentation Generation**: ✅ Automated generation of `docs/api/z3ed-resources.yaml` for AI/tooling consumption. -- **Flag-Based Dispatch**: ✅ Hardened command routing - all ROM commands use `FLAGS_rom` consistently. -- **ROM Info Fix**: ✅ Created dedicated `RomInfo` handler, resolving segfault issue. - -**Key Achievements**: -- Machine-readable API catalog enables LLM integration for automated ROM hacking workflows -- Comprehensive command documentation with argument types, effects, and return schemas -- Stable foundation for AI agents to discover and invoke CLI commands programmatically -- Validation layer for ensuring command compatibility and argument correctness - -**Testing Coverage**: -- ✅ All ROM commands tested: `info`, `validate`, `diff`, `generate-golden` -- ✅ Agent describe tested: YAML output, JSON output, resource filtering, file generation -- ✅ Help system integration verified with updated command listings -- ✅ Build system validated on macOS (arm64) with no critical warnings - -## 8. Agentic Framework Architecture - Advanced Dive - -The agentic framework is designed to allow an AI agent to make edits to the ROM based on high-level natural language prompts. The framework is built around the `z3ed` CLI and the `ImGuiTestEngine`. This section provides a more advanced look into its architecture and future development. - -### 8.1. The `z3ed agent` Command - -The `z3ed agent` command is the main entry point for the agent. It has the following subcommands: - -- `run --prompt "..."`: Executes a prompt by generating and running a sequence of `z3ed` commands. -- `plan --prompt "..."`: Shows the sequence of `z3ed` commands the AI plans to execute.
-- `diff [--proposal-id ]`: Shows a diff of the changes made to the ROM after running a prompt. Displays the latest pending proposal by default, or a specific proposal if ID is provided. -- `list`: Lists all proposals with their status, creation time, prompt, and execution statistics. -- `test --prompt "..."`: Generates changes and then runs an `ImGuiTestEngine` test to verify them. -- `commit`: Saves the modified ROM and any new assets to the project. -- `revert`: Reverts the changes made by the agent. -- `describe [--resource ]`: Returns machine-readable schemas for CLI commands, enabling AI/LLM integration. -- `learn --description "..."`: Records a sequence of user actions (CLI commands and GUI interactions) and associates them with a natural language description, allowing the agent to learn new workflows. - -### 8.2. The Agentic Loop (MCP) - Detailed Workflow - -1. **Model (Planner)**: The agent receives a high-level natural language prompt. It leverages an LLM to break down this goal into a detailed, executable plan. This plan is a sequence of `z3ed` CLI commands, potentially interleaved with `ImGuiTestEngine` test steps for intermediate verification. The LLM's prompt includes the user's request, a comprehensive list of available `z3ed` commands (with their parameters and expected effects), and relevant contextual information about the current ROM state (e.g., loaded ROM, project files, current editor view). -2. **Code (Command & Test Generation)**: The LLM returns the generated plan as a structured JSON object. This JSON object contains an array of actions, where each action specifies a `z3ed` command (with its arguments) or an `ImGuiTestEngine` test to execute. This structured output is crucial for reliable parsing and execution by the `z3ed` agent. -3. **Program (Execution Engine)**: The `z3ed agent` parses the JSON plan and executes each command sequentially. For `z3ed` commands, it directly invokes the corresponding internal `CommandHandler` methods. 
For `ImGuiTestEngine` steps, it launches the `yaze_test` executable with the appropriate test arguments. The output (stdout, stderr, exit codes) of each executed command is captured. This output, along with any visual feedback from `ImGuiTestEngine` (e.g., screenshots), can be fed back to the LLM for iterative refinement of the plan. -4. **Verification (Tester)**: The `ImGuiTestEngine` plays a critical role here. After the agent executes a sequence of commands, it can generate and run a specific `ImGuiTestEngine` script. This script can interact with the YAZE GUI (e.g., open a specific editor, navigate to a location, assert visual properties) to verify that the changes were applied correctly and as intended. The results of these tests (pass/fail, detailed logs, comparison screenshots) are reported back to the user and can be used by the LLM to self-correct or refine its strategy. - -### 8.3. AI Model & Protocol Strategy - -- **Models**: The framework will support both local and remote AI models, offering flexibility and catering to different user needs. - ---- - -## 9. Test Harness Evolution: From Automation to Platform - -The ImGuiTestHarness has evolved from a basic GUI automation tool into a comprehensive testing platform that serves dual purposes: **AI-driven generative workflows** and **traditional GUI testing**. - -### 9.1. 
Current Capabilities (IT-01 to IT-04) ✅ - -**Core Automation** (6 RPCs): -- `Ping` - Health check and version verification -- `Click` - Button, menu, and tab interactions -- `Type` - Text input with focus management -- `Wait` - Condition polling (window visibility, element state) -- `Assert` - State validation (visible, enabled, exists) -- `Screenshot` - Capture (stub, needs implementation) - -**Integration Points**: -- ImGuiTestEngine dynamic test registration -- Async test queue with frame-accurate timing -- gRPC server embedded in YAZE process -- Cross-platform build (macOS validated, Windows planned) - -**Proven Use Cases**: -- Menu-driven editor opening (Overworld, Dungeon, etc.) -- Window visibility validation -- Multi-step workflows with timing dependencies -- Natural language test prompts via `z3ed agent test` - -### 9.2. Limitations Identified - -**For AI Agents**: -- ❌ Can't discover available widgets → must hardcode target names -- ❌ No way to query test results → async tests return immediately with no status -- ❌ No structured error context → failures lack screenshots and state dumps -- ❌ Limited to predefined actions → can't learn new interaction patterns - -**For Traditional Testing**: -- ❌ No test recording → can't capture manual workflows for regression -- ❌ No test suite format → can't organize tests into smoke/regression/nightly groups -- ❌ No CI integration → can't run tests in automated pipelines -- ❌ No result persistence → test history lost between sessions -- ❌ Poor debugging → failures don't capture visual or state context - -### 9.3. Enhancement Roadmap (IT-05 to IT-09) - -#### IT-05: Test Introspection API (6-8 hours) -**Problem**: Tests execute asynchronously with no way to query status or results. Clients poll blindly or give up early.
- -**Solution**: Add 3 new RPCs: -- `GetTestStatus(test_id)` → Returns queued/running/passed/failed/timeout with execution time -- `ListTests(category_filter)` → Enumerates all registered tests with metadata -- `GetTestResults(test_id)` → Retrieves detailed results: logs, assertions, metrics - -**Benefits**: -- AI agents can poll for test completion reliably -- CLI can show real-time progress bars -- Test history enables trend analysis (flaky tests, performance regressions) - -**Example Flow**: -```bash -# Queue test (returns immediately with test_id) -TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id') - -# Poll until complete -while true; do - STATUS=$(z3ed agent test status --test-id $TEST_ID --format json | jq -r '.status') - [[ "$STATUS" =~ ^(PASSED|FAILED|TIMEOUT)$ ]] && break - sleep 0.5 -done - -# Get results -z3ed agent test results --test-id $TEST_ID --include-logs -``` - -#### IT-06: Widget Discovery API (4-6 hours) -**Problem**: AI agents must know widget names in advance. Can't adapt to UI changes or learn new editors. - -**Solution**: Add `DiscoverWidgets` RPC: -- Enumerates all windows currently open -- Lists interactive widgets per window: buttons, inputs, menus, tabs -- Returns metadata: ID, label, type, enabled state, position -- Provides suggested action templates (e.g., "Click button:Save") - -**Benefits**: -- AI agents discover GUI capabilities dynamically -- Test scripts validate expected widgets exist -- LLM prompts improved with natural language descriptions -- Reduces brittleness from hardcoded widget names - -**Example Flow**: -```python -# AI agent workflow -widgets = z3ed_client.DiscoverWidgets(window_filter="Overworld") - -# LLM prompt: "Which buttons are available in the Overworld editor?"
-available_actions = [w.suggested_action for w in widgets.buttons if w.is_enabled] - -# LLM generates: "Click button:Save Changes" -z3ed_client.Click(target="button:Save Changes") -``` - -#### IT-07: Test Recording & Replay ✅ COMPLETE -**Outcome**: Recording workflow, replay runner, and JSON script format shipped alongside CLI commands (`z3ed test record start|stop`, `z3ed test replay`). Regression coverage captured in `scripts/test_record_replay_e2e.sh`; documentation updated with quick-start examples. Focus now shifts to error diagnostics and artifact surfacing (IT-08). - -#### IT-08: Holistic Error Reporting (5-7 hours) -**Problem**: Errors surface differently across the CLI, ImGuiTestHarness, and EditorManager. Failures lack actionable context, slowing down triage and AI agent autonomy. - -**Solution Themes**: -- **Harness Diagnostics**: Implement the Screenshot RPC, capture widget tree/state, and bundle execution context for every failed run. -- **Structured Error Envelope**: Introduce a shared `ErrorAnnotatedResult` format (status + metadata + hints) adopted by z3ed, harness services, and EditorManager subsystems. -- **Artifact Surfacing**: Persist artifacts under `test-results//`; expose paths in CLI output and in-app overlays. -- **Developer Experience**: Provide HTML + JSON result formats, actionable hints (“Re-run with --follow”, “Open screenshot: …”), and cross-links to recorded sessions for replay. - -**Benefits**: -- Faster debugging with consistent, high-signal failure context -- AI agents can reason about structured errors and attempt self-healing -- EditorManager gains on-screen diagnostics tied to harness artifacts -- Lays groundwork for future telemetry and CI reporting - -#### IT-09: CI/CD Integration ✅ CLI Foundations Complete -**Problem**: Tests run manually. No automated regression on PR/merge.
- -**Shipped**: -- YAML test suite runtime with dependency-aware execution and retry handling -- `z3ed agent test suite run` supports `--group`, `--tag`, `--param`, - `--retries`, `--ci-mode`, and automatic JUnit XML emission under - `test-results/junit/` -- `z3ed agent test suite validate` performs structural linting and surfaces - exit codes (0 pass, 1 fail, 2 error) -- NEW `z3ed agent test suite create` interactive builder generates suites - (defaulting to `tests/.yaml`), with prompts for groups, replay scripts, - tags, and key=value parameters. `--force` enables overwrite flows. - -**Next Integration Steps**: -- Publish canonical `tests/smoke.yaml` / `tests/regression.yaml` templates in - the repo -- Add GitHub Actions example wiring harness referencing the new runner -- Document best practices for mapping suite tags to CI stages (smoke, - regression, nightly) -- Wire run summaries into docs (`docs/testing/`) with badge-ready status tables - -**GitHub Actions Example**: -```yaml -name: GUI Tests -on: [push, pull_request] -jobs: - gui-tests: - runs-on: macos-latest - steps: - - name: Build YAZE - run: cmake --build build --target yaze --target z3ed - - name: Start test harness - run: ./build/bin/yaze --enable_test_harness --headless & - - name: Run smoke tests - run: ./build/bin/z3ed agent test suite run tests/smoke.yaml --ci-mode - - name: Upload results - uses: actions/upload-artifact@v4 - with: - name: test-results - path: test-results/ -``` - -**Benefits**: -- Catch regressions before merge -- Test history tracked in CI dashboard -- Parallel execution for faster feedback -- Flaky test detection (retry logic, failure rates) - -### 9.4.
Unified Testing Vision - -The enhanced test harness serves three audiences: - -**For AI Agents** (Generative Workflows): -- Widget discovery enables dynamic learning -- Test introspection provides reliable feedback loops -- Recording captures expert workflows for training data - -**For Developers** (Unit/Integration Testing): -- Test suites organize tests by scope (smoke, regression, nightly) -- CI integration catches regressions early -- Rich error reporting speeds up debugging - -**For QA Engineers** (Manual Testing Automation): -- Record manual workflows once, replay forever -- Parameterized tests reduce maintenance burden -- Visual test reports simplify communication - -**Shared Infrastructure**: -- Single gRPC server handles all test types -- Consistent test script format (JSON/YAML) -- Common result storage and reporting -- Cross-platform support (macOS, Windows, Linux) - -### 9.5. Implementation Priority - -**Phase 1: Foundation** (Already Complete ✅) -- Core automation RPCs (Ping, Click, Type, Wait, Assert) -- ImGuiTestEngine integration -- gRPC server lifecycle -- Basic E2E validation - -**Phase 2: Introspection & Discovery** (IT-05, IT-06 - 10-14 hours) -- Test status/results querying -- Widget enumeration API -- Async test management -- *Critical for AI agents* - -**Phase 3: Recording & Replay** (IT-07 - 8-10 hours) -- Test script format -- Recording workflow -- Replay engine -- *Unlocks regression testing* - -**Phase 4: Production Readiness** (IT-08, IT-09 - 5-7 hours) -- Screenshot implementation -- Error context capture -- CI/CD integration -- *Enables automated pipelines* - -**Total Estimated Effort**: 23-31 hours beyond current implementation - ---- - - **Local Models (macOS Setup)**: For privacy, offline use, and reduced operational costs, integration with local LLMs via [Ollama](https://ollama.ai/) is a priority. Users can easily install Ollama on macOS and pull models optimized for code generation, such as `codellama:7b`.
The `z3ed` agent will communicate with Ollama's local API endpoint. - - **Remote Models (Gemini API)**: For more complex tasks requiring advanced reasoning capabilities, integration with powerful remote models like the Gemini API will be available. Users will need to provide a `GEMINI_API_KEY` environment variable. A new `GeminiAIService` class will be implemented to handle the secure API requests and responses. -- **Protocol**: A robust, yet simple, JSON-based protocol will be used for communication between `z3ed` and the AI model. This ensures structured data exchange, critical for reliable parsing and execution. The `z3ed` tool will serialize the user's prompt, current ROM context, available `z3ed` commands, and any relevant `ImGuiTestEngine` capabilities into a JSON object. The AI model will be expected to return a JSON object containing the sequence of commands to be executed, along with potential explanations or confidence scores. - -### 8.4. GUI Integration & User Experience - -- **Agent Control Panel**: A dedicated TUI/GUI panel will be created for managing the agent. This panel will serve as the primary interface for users to interact with the AI. It will feature: - - A multi-line text input for entering natural language prompts. - - Buttons for `Run`, `Plan`, `Diff`, `Test`, `Commit`, `Revert`, and `Learn` actions. - - A real-time log view displaying the agent's thought process, executed commands, and their outputs. - - A status bar indicating the agent's current state (e.g., "Idle", "Planning", "Executing Commands", "Verifying Changes"). -- **Diff Editing UI**: A TUI-based visual diff viewer will be implemented. This UI will present a side-by-side comparison of the original ROM state (or a previous checkpoint) and the changes proposed or made by the agent. Users will be able to: - - Navigate through individual differences (e.g., changed bytes, modified tiles, added objects). - - Highlight specific changes. 
- - Accept or reject individual changes or groups of changes, providing fine-grained control over the agent's output. -- **Interactive Planning**: The agent will present its generated plan in a human-readable format within the GUI. Users will have the opportunity to: - - Review each step of the plan. - - Approve the entire plan for execution. - - Reject specific steps or the entire plan. - - Edit the plan directly (e.g., modify command arguments, reorder steps, insert new commands) before allowing the agent to proceed. - -### 8.5. Testing & Verification - -- **`ImGuiTestEngine` Integration**: The agent will be able to dynamically generate and execute `ImGuiTestEngine` tests. This allows for automated visual verification of the agent's work, ensuring that changes are not only functionally correct but also visually appealing and consistent with design principles. The agent can be trained to generate test scripts that assert specific pixel colors, UI element positions, or overall visual layouts. -- **Mock Testing Framework**: A robust "mock" mode will be implemented for the `z3ed agent`. In this mode, the agent will simulate the execution of commands without modifying the actual ROM. This is crucial for safe and fast testing of the agent's planning and command generation capabilities. The existing `MockRom` class will be extended to fully support all `z3ed` commands, providing a consistent interface for both real and mock execution. -- **User-Facing Tests**: A "tutorial" or "challenge" mode will be created where users can test the agent with a series of predefined tasks. This will serve as an educational tool for users to understand the agent's capabilities and provide a way to benchmark its performance against specific ROM hacking challenges. - -### 8.6. Safety & Sandboxing - -- **Dry Run Mode**: The agent will always offer a "dry run" mode, where it only shows the commands it would execute without making any actual changes to the ROM. 
This provides a critical safety net for users. -- **Command Whitelisting**: The agent's execution environment will enforce a strict command whitelisting policy. Only a predefined set of "safe" `z3ed` commands will be executable by the AI. Any attempt to execute an unauthorized command will be blocked. -- **Resource Limits**: The agent will operate within defined resource limits (e.g., maximum number of commands per plan, maximum data modification size) to prevent unintended extensive changes or infinite loops. -- **Human Oversight**: Given the inherent unpredictability of AI models, human oversight will be a fundamental principle. The interactive planning and diff editing UIs are designed to keep the user in control at all times. - -### 8.7. Optional JSON Dependency - -To avoid breaking platform builds where a JSON library is not available or desired, the JSON-related code will be conditionally compiled using a preprocessor macro (e.g., `YAZE_WITH_JSON`). When this macro is not defined, the agentic features that rely on JSON will be disabled. The `nlohmann/json` library will be added as a submodule to the project and included in the build only when `YAZE_WITH_JSON` is defined. - -### 8.8. Contextual Awareness & Feedback Loop - -- **Contextual Information**: The agent's prompts to the LLM will be enriched with comprehensive contextual information, including: - - The current state of the loaded ROM (e.g., ROM header, loaded assets, current editor view). - - Relevant project files (e.g., `.yaze` project configuration, symbol files). - - User preferences and previous interactions. - - A dynamic list of available `z3ed` commands and their detailed usage. -- **Feedback Loop for Learning**: The results of `ImGuiTestEngine` verifications and user accept/reject actions will form a crucial feedback loop. This data can be used to fine-tune the LLM or train smaller, specialized models to improve the agent's planning and command generation capabilities over time. - -### 8.9. 
Error Handling and Recovery - -- **Robust Error Reporting**: The agent will provide clear and actionable error messages when commands fail or unexpected situations arise. -- **Rollback Mechanisms**: The `revert` command provides a basic rollback. More advanced mechanisms, such as transactional changes or snapshotting, could be explored for complex multi-step operations. -- **Interactive Debugging**: In case of errors, the agent could pause execution and allow the user to inspect the current state, modify the plan, or provide corrective instructions. - -### 8.10. Extensibility - -- **Modular Command Handlers**: The `z3ed` CLI's modular design allows for easy addition of new commands, which automatically become available to the AI agent. -- **Pluggable AI Models**: The `AIService` interface enables seamless integration of different AI models (local or remote) without modifying the core agent logic. -- **Custom Test Generation**: Users or developers can extend the `ImGuiTestEngine` capabilities to create custom verification tests for specific hacking scenarios. - -## 9. UX Improvements and Architectural Decisions - -### 9.1. TUI Component Architecture - -The TUI system has been redesigned around a consistent component architecture: - -- **`TuiComponent` Interface**: All UI components implement a standard interface with a `Render()` method, ensuring consistency across the application. -- **Component Composition**: Complex UIs are built by composing simpler components, making the code more maintainable and testable. -- **Event Handling**: Standardized event handling patterns across all components for consistent user experience. - -### 9.2. Command Handler Unification - -The CLI and TUI systems now share a unified command handler architecture: - -- **Dual Execution Paths**: Each command handler supports both CLI (`Run()`) and TUI (`RunTUI()`) execution modes. 
-- **Shared State Management**: Common functionality like ROM loading and validation is centralized in the base `CommandHandler` class. -- **Consistent Error Handling**: All commands use `absl::Status` for uniform error reporting across CLI and TUI modes. - -### 9.3. Interface Consolidation - -Several interfaces have been combined and simplified: - -- **Unified Menu System**: The main menu now serves as a central hub for both direct command execution and TUI mode switching. -- **Integrated Help System**: Help information is accessible from both CLI and TUI modes with consistent formatting. -- **Streamlined Navigation**: Reduced cognitive load by consolidating related functionality into single interfaces. - -### 9.4. Code Organization Improvements - -The codebase has been restructured for better maintainability: - -- **Header Organization**: Proper forward declarations and include management to reduce compilation dependencies. -- **Namespace Management**: Clean namespace usage to avoid conflicts and improve code clarity. -- **Build System Optimization**: Streamlined CMake configuration with conditional compilation for optional features. - -### 9.5. Future UX Enhancements - -Based on the current architecture, several UX improvements are planned: - -- **Progressive Disclosure**: Complex commands will offer both simple and advanced modes. -- **Context-Aware Help**: Help text will adapt based on current ROM state and available commands. -- **Undo/Redo System**: Command history tracking for safer experimentation. -- **Batch Operations**: Support for executing multiple related commands as a single operation. - -## 10. Implementation Status and Code Quality - -### 10.1. Recent Refactoring Improvements (January 2025) - -The z3ed CLI underwent significant refactoring to improve code quality, fix linting errors, and enhance maintainability. 
- -**Issues Resolved**: -- ✅ **Missing Headers**: Added proper forward declarations for `ftxui::ScreenInteractive` and `TuiComponent` -- ✅ **Include Path Issues**: Standardized all includes to use `cli/` prefix instead of `src/cli/` -- ✅ **Namespace Conflicts**: Resolved namespace pollution issues by properly organizing includes -- ✅ **Duplicate Definitions**: Removed duplicate `CommandInfo` and `ModernCLI` definitions -- ✅ **FLAGS_rom Multiple Definitions**: Changed duplicate `ABSL_FLAG` declarations to `ABSL_DECLARE_FLAG` - -**Build System Improvements**: -- **CMake Configuration**: Cleaned up `z3ed.cmake` to properly configure all source files -- **Dependency Management**: Added proper includes for `absl/flags/declare.h` where needed -- **Conditional Compilation**: Properly wrapped JSON/HTTP library usage with `#ifdef YAZE_WITH_JSON` - -**Architecture Improvements**: -- Removed `std::unique_ptr` members from command handlers to avoid incomplete type issues -- Simplified constructors and `RunTUI` methods -- Maintained clean separation between CLI and TUI execution paths - -### 10.2.
File Organization - -``` -src/cli/ - ├── cli_main.cc (Entry point - defines FLAGS) - ├── modern_cli.{h,cc} (Command registry and dispatch) - ├── tui.{h,cc} (TUI components and layout management) - ├── z3ed.{h,cc} (Command handler base classes) - ├── service/ - │ ├── ai_service.{h,cc} (AI service interface) - │ └── gemini_ai_service.{h,cc} (Gemini API implementation) - ├── handlers/ (Command implementations) - │ ├── agent.cc - │ ├── command_palette.cc - │ ├── compress.cc - │ ├── dungeon.cc - │ ├── gfx.cc - │ ├── overworld.cc - │ ├── palette.cc - │ ├── patch.cc - │ ├── project.cc - │ ├── rom.cc - │ ├── sprite.cc - │ └── tile16_transfer.cc - └── tui/ (TUI component implementations) - ├── tui_component.h - ├── asar_patch.{h,cc} - ├── palette_editor.{h,cc} - └── command_palette.{h,cc} -``` - -### 10.3. Code Quality Improvements - -**Removed Problematic Patterns**: -- Eliminated returning raw pointers to temporary objects in `GetCommandHandler` -- Used `static` storage for handlers to ensure valid lifetimes -- Proper const-reference usage to avoid unnecessary copies - -**Standardized Error Handling**: -- Consistent use of `absl::Status` return types -- Proper status checking with `RETURN_IF_ERROR` macro -- Clear error messages for user-facing commands - -**API Corrections**: -- Fixed `Bitmap::bpp()` → `Bitmap::depth()` -- Fixed `PaletteGroup::set_palette()` → direct pointer manipulation -- Fixed `Bitmap::mutable_vector()` → `Bitmap::set_data()` - -### 10.4.
TUI Component System - -**Implemented Components**: -- `TuiComponent` interface for consistent UI components -- `ApplyAsarPatchComponent` - Modular patch application UI -- `PaletteEditorComponent` - Interactive palette editing -- `CommandPaletteComponent` - Command search and execution - -**Standardized Patterns**: -- Consistent navigation across all TUI screens -- Centralized error handling with dedicated error screen -- Direct component function calls instead of handler indirection - -### 10.5. Known Limitations - -**Remaining Warnings (Non-Critical)**: -- Unused parameter warnings (mostly for stub implementations) -- Nodiscard warnings for status returns that are logged elsewhere -- Copy-construction warnings (minor performance considerations) -- Virtual destructor warnings in third-party zelda3 classes - -### 10.6. Future Code Quality Goals - -1. **Complete TUI Components**: Finish implementing all planned TUI components with full functionality -2. **Error Handling**: Add proper status checking for all `LoadFromFile` calls -3. **API Methods**: Implement missing ROM validation methods -4. **JSON Integration**: Complete HTTP/JSON library integration for Gemini AI service -5. **Performance**: Address copy-construction warnings by using const references -6. **Testing**: Expand unit test coverage for command handlers - -## 11. Agent-Ready API Surface Area - -To unlock deeper agentic workflows, the CLI and application layers must expose a well-documented, machine-consumable API surface that mirrors the capabilities available in the GUI editors. The following initiatives expand the command coverage and standardize access for both humans and AI agents: - -- **Resource Inventory**: Catalogue every actionable subsystem (ROM metadata, banks, tile16 atlas, actors, palettes, scripts) and map it to a resource/action pair (e.g., `rom header set`, `dungeon room copy`, `sprite spawn`). 
The catalogue will live in `docs/api/z3ed-resources.yaml` and be generated from source annotations; current machine-readable coverage includes palette, overworld, rom, patch, and dungeon actions. -- **Rich Metadata**: Schemas annotate each action with structured `effects` and `returns` arrays so agents can reason about side-effects and expected outputs when constructing plans. -- **Command Introspection Endpoint**: Introduce `z3ed agent describe --resource ` to return a structured schema describing arguments, enum values, preconditions, side-effects, and example invocations. Schemas will follow JSON Schema, enabling UI tooltips and LLM prompt construction. _Prototype status (Oct 2025)_: the command now streams catalog JSON from `ResourceCatalog`, including `effects` and `returns` arrays for each action across palette, overworld, rom, patch, and dungeon resources. - ```json - { - "resources": [ - { - "resource": "rom", - "actions": [ - { - "name": "validate", - "effects": [ - "Reads ROM from disk, verifies checksum, and reports header status." - ], - "returns": [ - { "field": "report", "type": "object", "description": "Checksum + header validation summary." } - ] - } - ] - }, - { - "resource": "overworld", - "actions": [ - { - "name": "get-tile", - "returns": [ - { "field": "tile", "type": "integer", "description": "Tile id located at the supplied coordinates." } - ] - } - ] - } - ] - } - ``` -- **State Snapshot APIs**: Extend `rom` and `project` resources with `export-state` actions that emit compact JSON snapshots (bank checksums, tile hashes, palette CRCs). Snapshots will seed the LLM context and accelerate change verification. -- **Write Guard Hooks**: All mutation-oriented commands will publish `PreChange` and `PostChange` events onto an internal bus (backed by `absl::Notification` + ring buffer). The agent loop subscribes to the bus to build a change proposal timeline used in review UIs and acceptance workflows.
-- **Replayable Scripts**: Standardize a TOML-based script format (`.z3edscript`) that records CLI invocations with metadata (ROM hash, duration, success). Agents can emit scripts, and humans can replay them via `z3ed script run `.
-
-## 12. Acceptance & Review Workflow
-
-An explicit accept/reject system keeps humans in control while encouraging rapid agent iteration.
-
-### 12.1. Change Proposal Lifecycle
-
-1. **Draft**: Agent executes commands in a sandbox ROM (auto-cloned using `Rom::SaveToFile` with `save_new=true`). All diffs, test logs, and screenshots are attached to a proposal ID.
-2. **Review**: The dashboard surfaces proposals with summary cards (changed resources, affected banks, test status). Users can open a detail view built atop the existing diff viewer, augmented with per-resource controls (accept tile, reject palette entry, etc.).
-3. **Decision**: Accepting merges the delta into the primary ROM and commits associated assets. Rejecting discards the sandbox ROM and emits feedback signals (tagged reasons) that can be fed back to future LLM prompts.
-4. **Archive**: Accepted proposals are archived with metadata for provenance; rejected ones are stored briefly for analytics before being pruned.
-
-### 12.2. UI Extensions
-
-- **Proposal Drawer**: Adds a right-hand drawer in the ImGui dashboard listing open proposals with filters (resource type, test pass/fail, age).
-- **Inline Diff Controls**: Integrate checkboxes/buttons into the existing palette/tile hex viewers so users can cherry-pick changes without leaving the visual context.
-- **Feedback Composer**: Provide quick tags (“Incorrect palette”, “Misplaced sprite”, “Regression detected”) and optional freeform text. Feedback is serialized into the agent telemetry channel.
-- **Undo/Redo Enhancements**: Accepted proposals push onto the global undo stack with descriptive labels, enabling rapid rollback during exploratory sessions.
-
-### 12.3.
Policy Configuration
-
-- **Gatekeeping Rules**: Define YAML-driven policies (e.g., “require passing `agent smoke` and `palette regression` suites before accept button activates”). Rules live in `.yaze/policies/agent.yaml` and are evaluated by the dashboard.
-- **Access Control**: Integrate project roles so only maintainers can finalize proposals while contributors can submit drafts.
-- **Telemetry Opt-In**: Provide toggles for sharing anonymized proposal statistics to improve default prompts and heuristics.
-
-## 13. ImGuiTestEngine Control Bridge
-
-Allowing an LLM to drive the ImGui UI safely requires a structured bridge between generated plans and the `ImGuiTestEngine` runtime.
-
-### 13.1. Bridge Architecture
-
-- **Test Harness API**: Expose a lightweight gRPC/IPC service (`ImGuiTestHarness`) that accepts serialized input events (click, drag, key, text), query requests (widget tree, screenshot), and expectations (assert widget text equals …). The service runs inside `yaze_test` when started with `--automation=sock`. Agents connect via domain sockets (macOS/Linux) or named pipes (Windows).
-- **Command Translation Layer**: Extend `z3ed agent run` to recognize plan steps with type `imgui_action`. These steps translate to harness calls (e.g., `{ "type": "imgui_action", "action": "click", "target": "Palette/Cell[12]" }`).
-- **Synchronization Primitives**: Provide `WaitForIdle`, `WaitForCondition`, and `Delay` primitives so LLMs can coordinate with frame updates. Each primitive enforces timeouts and returns explicit success/failure statuses.
-- **State Queries**: Implement reflection endpoints retrieving ImGui widget hierarchy, enabling the agent to confirm UI states before issuing the next action, mirroring how `ImGuiTestEngine` DSL scripts work today.
-
-#### 13.1.1. Transport & Envelope
-
-- **Session bootstrap**: `yaze_test --automation=` spins up the harness and prints a connection URI.
The CLI or external agent opens a persistent stream (Unix domain socket on macOS/Linux, named pipe + overlapped IO on Windows). TLS is out-of-scope; trust is derived from local IPC.
-- **Message format**: Each frame is a length-prefixed JSON envelope with optional binary attachments. Core fields:
-  ```json
-  {
-    "id": "req-42",
-    "type": "event" | "query" | "expect" | "control",
-    "payload": { /* type-specific body */ },
-    "attachments": [
-      { "slot": 0, "mime": "image/png" }
-    ]
-  }
-  ```
-  Binary blobs (e.g., screenshots) follow immediately after the JSON payload in the same frame to avoid out-of-band coordination.
-- **Streaming semantics**: Responses reuse the `id` field and include `status`, `error`, and optional attachments. Long-running operations (`WaitForCondition`) stream periodic `progress` updates before returning `status: "ok"` or `status: "timeout"`.
-
-#### 13.1.2. Harness Runtime Lifecycle
-
-1. **Attach**: Agent sends a `control` message (`{"command":"attach"}`) to lock in a session. Harness responds with negotiated capabilities (available input devices, screenshot formats, rate limits).
-2. **Activate context**: Agent issues an `event` to focus a specific ImGui context (e.g., "main", "palette_editor"). Harness binds to the corresponding `ImGuiTestEngine` backend fixture.
-3. **Execute actions**: Agent streams `event` objects (`click`, `drag`, `keystroke`, `text_input`). Harness feeds them into the ImGui event queue at the start of the next frame, waits for the frame to settle, then replies.
-4. **Query & assert**: Agent interleaves `query` messages (`get_widget_tree`, `capture_screenshot`, `read_value`) and `expect` messages (`assert_property`, `assert_pixel`). Harness routes these to existing ImGuiTestEngine inspectors, lifting the results into structured JSON.
-5. **Detach**: Agent issues `{"command":"detach"}` (or connection closes). Harness flushes pending frames, releases sandbox locks, and tears down the socket.
-
-#### 13.1.3.
Integration with `z3ed agent`
-
-- **Plan annotation**: The CLI plan schema gains a new step kind `imgui_action` with fields `harness_uri`, `actions[]`, and optional `expect[]`. During execution `z3ed agent run` opens the harness stream, feeds each action, and short-circuits on first failure.
-- **Sandbox awareness**: Harness sessions inherit the active sandbox ROM path from `RomSandboxManager`, ensuring UI assertions operate on the same data snapshot as CLI mutations.
-- **Telemetry hooks**: Every harness response is appended to the proposal timeline (see §12) with thumbnails for screenshots. Failures bubble up as structured errors with hints (`"missing_widget": "Palette/Cell[12]"`).
-
-### 13.2. Safety & Sandboxing
-
-- **Read-Only Default**: Harness sessions start in read-only mode; mutation commands must explicitly request escalation after presenting a plan (triggering a UI prompt for the user to authorize). Without authorization, only `capture` and `assert` operations succeed.
-- **Rate Limiting**: Cap concurrent interactions and enforce per-step quotas to prevent runaway agents.
-- **Logging**: Every harness call is logged and linked to the proposal ID, with playback available inside the acceptance UI.
-
-### 13.3. Script Generation Strategy
-
-- **Template Library**: Publish a library of canonical ImGui action sequences (open file, expand tree, focus palette editor). Plans reference templates via IDs to reduce LLM token usage and improve reliability.
-- **Auto-Healing**: When a widget lookup fails, the harness can suggest closest matches (Levenshtein distance) so the agent can retry with corrected IDs.
-- **Hybrid Execution**: Encourage plans that mix CLI operations for bulk edits and ImGui actions for visual verification, minimizing UI-driven mutations.
-
-## 14. Test & Verification Strategy
-
-### 14.1. Layered Test Suites
-
-- **CLI Unit Tests**: Extend `test/cli/` with high-coverage tests for new resource handlers using sandbox ROM fixtures.
-- **Harness Integration Tests**: Add `test/ui/automation/` cases that spin up the harness, replay canned plans, and validate deterministic behavior.
-- **End-to-End Agent Scenarios**: Create golden scenarios (e.g., “Recolor Link tunic”, “Shift Dungeon Chest”) that exercise command + UI flows, verifying ROM diffs, UI captures, and pass/fail criteria.
-
-### 14.2. Continuous Verification
-
-- **CI Pipelines**: Introduce dedicated CI jobs for agent features, enabling `YAZE_WITH_JSON` builds, running harness smoke suites, and publishing artifacts (diffs, screenshots) on failure.
-- **Nightly Regression**: Schedule nightly runs of expensive ImGui scenarios and long-running CLI scripts with hardware acceleration (Apple Metal) to detect flaky interactions.
-- **Fuzzing Hooks**: Instrument command parsers with libFuzzer harnesses to catch malformed LLM output early.
-
-### 14.3. Telemetry-Informed Testing
-
-- **Flake Tracker**: Aggregate harness failures by widget/action to prioritize stabilization.
-- **Adaptive Test Selection**: Use proposal metadata to select relevant regression suites dynamically (e.g., palette-focused proposals trigger palette regression tests).
-- **Feedback Loop**: Feed test outcomes back into prompt engineering, e.g., annotate prompts with known flaky commands so the LLM favors safer alternatives.
-
-## 15. Expanded Roadmap (Phase 6+)
-
-### Phase 6: Agent Workflow Foundations (Planned)
-- Implement resource catalogue tooling and `agent describe` schemas.
-- Ship sandbox ROM workflow with proposal tracking and acceptance UI.
-- Finalize ImGuiTestHarness MVP with read-only verification.
-- Expand CLI surface with sprite/object manipulation commands flagged as agent-safe.
-
-### Phase 7: Controlled Mutation & Review (Planned)
-- Enable harness mutation mode with user authorization prompts.
-- Deliver inline diff controls and feedback composer UI.
-- Wire policy engine for gating accept buttons.
-- Launch initial telemetry dashboards (opt-in) for agent performance metrics.
-
-### Phase 8: Learning & Self-Improvement (Exploratory)
-- Capture accept/reject rationales to train prompt selectors.
-- Experiment with reinforcement signals for local models (reward accepted plans, penalize rejected ones).
-- Explore collaborative agent sessions where multiple proposals merge or compete under defined heuristics.
-- Investigate deterministic replay of LLM outputs for reliable regression testing.
-
-### 7.4. Widget ID Management for Test Automation
-
-A key challenge in GUI test automation is the fragility of identifying widgets. Relying on human-readable labels (e.g., `"button:Overworld"`) makes tests brittle; a simple text change in the UI can break the entire test suite.
-
-To address this, the `z3ed` ecosystem includes a robust **Widget ID Management** system.
-
-**Goals**:
-- **Decouple Tests from Labels**: Tests should refer to a stable, logical ID, not a display label.
-- **Hierarchical and Scoped IDs**: Allow for organized and unique identification of widgets within complex, nested UIs.
-- **Discoverability**: Enable the test harness to easily find and interact with widgets using these stable IDs.
-
-**Implementation**:
-- **`WidgetIdRegistry`**: A central service that manages the mapping between stable, hierarchical IDs and the dynamic `ImGuiID`s used at runtime.
-- **Hierarchical Naming**: Widget IDs are structured like paths (e.g., `/editors/overworld/toolbar/save_button`). This avoids collisions and provides context.
-- **Registration**: Editor and tool developers are responsible for registering their interactive widgets with the `WidgetIdRegistry` upon creation.
-- **Test Harness Integration**: The `ImGuiTestHarness` uses the registry to look up the current `ImGuiID` for a given stable ID, ensuring it always interacts with the correct widget, regardless of label changes or UI refactoring.
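The registry idea described in the Implementation list can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's code: the real `WidgetIdRegistry` is a C++ service whose API is not shown in this document, so the class below and its `register`, `lookup`, and `find_by_prefix` methods are hypothetical names chosen for illustration, and the numeric IDs merely stand in for runtime `ImGuiID` values.

```python
class WidgetIdRegistry:
    """Hypothetical sketch: maps stable, path-like widget IDs to runtime IDs."""

    def __init__(self):
        # stable hierarchical ID -> current runtime ID (stand-in for ImGuiID)
        self._ids = {}

    def register(self, stable_id, runtime_id):
        # Stable IDs are path-like, e.g. /editors/overworld/toolbar/save_button.
        if not stable_id.startswith("/"):
            raise ValueError("stable widget IDs must be absolute paths")
        # Re-registering overwrites: the runtime ID may change between sessions
        # or after UI refactoring, while the stable ID stays the same.
        self._ids[stable_id] = runtime_id

    def lookup(self, stable_id):
        # Returns the current runtime ID, or None if the widget is unknown.
        return self._ids.get(stable_id)

    def find_by_prefix(self, prefix):
        # Discoverability: list every registered widget under a subtree.
        return sorted(k for k in self._ids if k.startswith(prefix))


registry = WidgetIdRegistry()
registry.register("/editors/overworld/toolbar/save_button", 0x1A2B)
registry.register("/editors/overworld/toolbar/undo_button", 0x1A2C)
# The label or runtime ID can change; tests keep using the stable path.
registry.register("/editors/overworld/toolbar/save_button", 0x9F00)
```

Because a test only ever refers to the stable path, the harness can resolve the current runtime ID at call time, which is the decoupling the Goals list asks for.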
-
-This system is critical for the long-term maintainability of the automated E2E validation pipeline.
diff --git a/docs/z3ed/E6-z3ed-implementation-plan.md b/docs/z3ed/E6-z3ed-implementation-plan.md
deleted file mode 100644
index 41693e52..00000000
--- a/docs/z3ed/E6-z3ed-implementation-plan.md
+++ /dev/null
@@ -1,2268 +0,0 @@
-# z3ed Agentic Workflow Plan
-
-**Last Updated**: October 2, 2025
-**Status**: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯
-
-> 📋 **Quick Start**: See [README.md](README.md) for essential links and project status.
-
-## Executive Summary
-
-The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:
-
-**✅ Completed Phases**:
-- **Phase 6**: Resource Catalogue - Machine-readable API specs for AI consumption
-- **AW-01/02/03**: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
-- **AW-04**: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
-- **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
-- **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
-
-**🎯 Active Phase**:
-- **Conversational Agent Implementation**: ✅ Foundation complete, LLM function calling ✅ COMPLETE (Oct 3, 2025)
-
-**📋 Next Phases (Updated Oct 3, 2025)**:
-- **Priority 1**: Live LLM Testing (1-2h) - Verify function calling with Ollama/Gemini
-- **Priority 2**: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
-- **Priority 3**: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
-- **Priority 4**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
-- **Priority 5**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
-- **Deprioritized**: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration
-
-**Recent Accomplishments** (Updated: October
2025):
-- **✅ IT-08 Enhanced Error Reporting Complete**: Full diagnostic capture operational
-  - IT-08a: Screenshot RPC with SDL capture (BMP format, 1536x864)
-  - IT-08b: Auto-capture execution context on failures (frame, window, widget)
-  - IT-08c: Widget state dumps with comprehensive UI snapshot (JSON, 45 min)
-  - Proto schema updated with screenshot_path, failure_context, widget_state
-  - GetTestResults RPC returns complete failure diagnostics
-- **✅ IT-09 CLI Suite Commands Landed**: End-to-end suite orchestration for CI
-  - `agent test suite run` handles groups, tags, params, retries, and emits
-    summaries plus default JUnit XML under `test-results/junit/`
-  - `agent test suite validate` performs structural linting with exit codes
-  - NEW `agent test suite create` interactive builder writes YAML suites to
-    `tests/.yaml` (with `--force` overwrite) and guides group/test entry
-- **✅ IT-08a Screenshot RPC Complete**: SDL-based screenshot capture operational
-  - Captures 1536x864 BMP files via SDL_RenderReadPixels
-  - Successfully tested via gRPC (5.3MB output files)
-  - Foundation for auto-capture on test failures
-- **✅ Policy Framework Complete**: PolicyEvaluator service fully integrated with ProposalDrawer GUI
-  - 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
-  - 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
-  - GUI displays color-coded violations (⛔ critical, ⚠️ warning, ℹ️ info)
-  - Accept button gating based on policy violations with override confirmation dialog
-  - Example policy configuration at `.yaze/policies/agent.yaml`
-- **✅ E2E Validation Complete**: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
-  - Window detection timing issue **resolved** with 10-frame yield buffer in Wait RPC
-  - Thread safety issues **resolved** with shared_ptr state management
-  - Test harness validated on macOS ARM64 with real YAZE GUI
interactions
-- **gRPC Test Harness (IT-01 & IT-02)**: Full implementation complete with natural language → GUI testing
-- **✅ Test Recording & Replay (IT-07)**: JSON recorder/replayer implemented, CLI and harness wired, end-to-end regression workflow captured in `scripts/test_record_replay_e2e.sh`
-- **Build System**: Hardened CMake configuration with reliable gRPC integration
-- **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)
-
-**Known Limitations & Improvement Opportunities**:
-- **Screenshot Auto-Capture**: Manual RPC only → needs integration with TestManager failure detection
-- **Test Introspection**: ✅ Complete - GetTestStatus/ListTests/GetResults RPCs operational
-- **Widget Discovery**: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
-- **Test Recording**: No record/replay for regression testing → add RecordSession/ReplaySession RPCs
-- **Synchronous Wait**: Async tests return immediately → add blocking mode or result polling
-- **Error Context**: Test failures lack screenshots/state dumps → enhance error reporting
-- **Performance**: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)
-- **YAML Parsing**: Simple parser implemented, consider yaml-cpp for complex scenarios
-
-**Time Investment**: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)
-
-## Quick Reference
-
-**Start Test Harness**:
-```bash
-./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
-  --enable_test_harness \
-  --test_harness_port=50052 \
-  --rom_file=assets/zelda3.sfc &
-```
-
-**Test All RPCs**:
-```bash
-./scripts/test_harness_e2e.sh
-```
-
-**Create Proposal**:
-```bash
-./build/bin/z3ed agent run "Test prompt" --sandbox
-./build/bin/z3ed agent list
-./build/bin/z3ed agent diff --proposal-id 
-```
-
-**Review in GUI**:
-- Open YAZE → `Debug → Agent Proposals`
-- Select proposal → Review → Accept/Reject/Delete
-
----
-
-## 1.
Current Priorities (Week of Oct 2-8, 2025)
-
-**Status**: Core Infrastructure Complete ✅ | Test Harness Enhancement Phase 🔧
-
-### Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE
-**Goal**: Transform test harness from basic automation to comprehensive testing platform **and deliver holistic error reporting across YAZE**
-**Time Estimate**: 20-25 hours total (7.5h completed in IT-07)
-**Blocking Dependency**: IT-01 Complete ✅
-
-**Motivation**: The harness now supports AI workflows, regression capture, and automation, but error surfaces remain shallow:
-- **AI Agent Development**: Still needs widget discovery for adaptive planning
-- **Regression Testing**: Recording/replay finished; reporting pipeline must surface actionable failures
-- **CI/CD Integration**: Requires reliable artifacts (logs, screenshots, structured context)
-- **Debugging**: Failures lack screenshots, widget hierarchies, and EditorManager state snapshots
-- **Application Consistency**: z3ed, EditorManager, and core services emit heterogeneous error formats
-
-#### IT-05: Test Introspection API (6-8 hours)
-**Status (Oct 2, 2025)**: ✅ Completed
-
-**Highlights**:
-- `imgui_test_harness.proto` now exposes `GetTestStatus`, `ListTests`, and
-  `GetTestResults` RPCs backed by `TestManager`'s execution history.
-- CLI commands (`z3ed agent test status|list|results`) are fully wired with
-  JSON/YAML formatting, follow-mode polling, and filtering options.
-- `GuiAutomationClient` provides typed wrappers for introspection APIs so agent
-  workflows can poll status programmatically.
-- Regression coverage lives in `scripts/test_harness_e2e.sh`; a slimmer
-  introspection smoke (`scripts/test_introspection_e2e.sh`) is queued for CI
-  automation but manual verification paths are documented.
-
-**Future Enhancements**:
-- Capture richer assertion metadata (expected/actual pairs) for improved
-  failure messaging when the underlying harness exposes it.
-- Add pagination helpers to CLI once history volume grows (low priority).
-
-**Example Usage**:
-```bash
-# Queue a test
-z3ed agent test --prompt "Open Overworld editor"
-
-# Poll for completion
-z3ed test status --test-id grpc_click_12345678
-
-# Retrieve results
-z3ed test results --test-id grpc_click_12345678 --format json
-```
-
-**API Schema**:
-```proto
-message GetTestStatusRequest {
-  string test_id = 1;
-}
-
-message GetTestStatusResponse {
-  enum Status { QUEUED = 0; RUNNING = 1; PASSED = 2; FAILED = 3; TIMEOUT = 4; }
-  Status status = 1;
-  int64 execution_time_ms = 2;
-  string error_message = 3;
-  repeated string assertion_failures = 4;
-}
-
-message ListTestsRequest {
-  string category_filter = 1; // Optional: "grpc", "unit", etc.
-  int32 page_size = 2;
-  string page_token = 3;
-}
-
-message ListTestsResponse {
-  repeated TestInfo tests = 1;
-  string next_page_token = 2;
-}
-
-message TestInfo {
-  string test_id = 1;
-  string name = 2;
-  string category = 3;
-  int64 last_run_timestamp_ms = 4;
-  int32 total_runs = 5;
-  int32 pass_count = 6;
-  int32 fail_count = 7;
-}
-```
-
-#### IT-06: Widget Discovery API (4-6 hours)
-**Implementation Tasks**:
-1. **Add DiscoverWidgets RPC**:
-   - Enumerate all windows currently open in YAZE GUI
-   - List all interactive widgets (buttons, inputs, menus, tabs) per window
-   - Return widget metadata: ID, type, label, enabled state, position
-   - Support filtering by window name or widget type
-
-2.
**AI-Friendly Output Format**:
-   - JSON schema describing available interactions
-   - Natural language descriptions for each widget
-   - Suggested action templates (e.g., "Click button:{label}")
-
-**Example Usage**:
-```bash
-# Discover all widgets
-z3ed gui discover
-
-# Filter by window
-z3ed gui discover --window "Overworld"
-
-# Get only buttons
-z3ed gui discover --type button
-```
-
-**API Schema (current)**:
-```proto
-message DiscoverWidgetsRequest {
-  string window_filter = 1;
-  WidgetType type_filter = 2;
-  string path_prefix = 3;
-  bool include_invisible = 4;
-  bool include_disabled = 5;
-}
-
-message WidgetBounds {
-  float min_x = 1;
-  float min_y = 2;
-  float max_x = 3;
-  float max_y = 4;
-}
-
-message DiscoveredWidget {
-  string path = 1;
-  string label = 2;
-  string type = 3;
-  string description = 4;
-  string suggested_action = 5;
-  bool visible = 6;
-  bool enabled = 7;
-  WidgetBounds bounds = 8;
-  uint32 widget_id = 9;
-  int64 last_seen_frame = 10;
-  int64 last_seen_at_ms = 11;
-  bool stale = 12;
-}
-
-message DiscoveredWindow {
-  string name = 1;
-  bool visible = 2;
-  repeated DiscoveredWidget widgets = 3;
-}
-
-message DiscoverWidgetsResponse {
-  repeated DiscoveredWindow windows = 1;
-  int32 total_widgets = 2;
-  int64 generated_at_ms = 3;
-}
-```
-
-**Benefits for AI Agents**:
-- LLMs can dynamically learn available GUI interactions
-- Agents can adapt to UI changes without hardcoded widget names
-- Natural language descriptions enable better prompt engineering
-
-#### IT-07: Test Recording & Replay ✅ COMPLETE (Oct 2, 2025)
-**Highlights**:
-- Implemented `StartRecording`, `StopRecording`, and `ReplayTest` RPCs with persistent JSON scripts
-- Added CLI commands: `z3ed test record start|stop`, `z3ed test replay`
-- Scripts stored in `tests/gui/` with metadata (name, tags, assertions, timing hints)
-- Added regression coverage via `scripts/test_record_replay_e2e.sh`
-- Documentation updates in `E6-z3ed-reference.md` and new quick-start snippets
in README
-- Confirmed compatibility with natural language prompts generated by the agent workflow
-
-**Outcome**: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).
-
-#### IT-08: Enhanced Error Reporting (5-7 hours) ✅ COMPLETE
-**Status**: IT-08a Complete ✅ | IT-08b Complete ✅ | IT-08c Complete ✅
-**Objective**: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.
-
-**Implementation Tracks**:
-1. **Harness-Level Diagnostics**
-   - ✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
-   - ✅ IT-08b: Auto-capture screenshots and context on test failure using shared
-     helper that writes to `${TMPDIR}/yaze/test-results//`
-   - ✅ IT-08c: Widget tree JSON dumps emitted alongside failure context
-   - ⏳ HTML bundle exporter (screenshots + widget tree) remains a stretch goal
-
-2. **CLI Experience Improvements**
-   - Surface artifact paths, failure context, and widget state in CLI output (DONE)
-   - Standardize error envelopes in z3ed (`absl::Status` + structured payload)
-   - Add `--format html` flag to emit rich bundles (planned)
-   - Integrate with recording workflow: replay failures using captured state (planned)
-
-3. **EditorManager & Application Integration**
-   - Introduce shared `ErrorAnnotatedResult` utility exposing `status`, `context`, `actionable_hint`
-   - Adapt EditorManager subsystems (ProposalDrawer, OverworldEditor, DungeonEditor) to adopt the shared structure
-   - Add in-app failure overlay (ImGui modal) that references harness artifacts when available
-   - Hook proposal acceptance/replay flows to display enriched diagnostics when sandbox merges fail
-
-4.
**Telemetry & Storage Hooks** (Stretch)
-   - Optionally emit error metadata to a ring buffer for future analytics/telemetry workstreams
-   - Provide CLI flag `--error-artifact-dir` to customize storage (supports CI separation)
-
-**Error Report Example**:
-```json
-{
-  "test_id": "grpc_assert_12345678",
-  "failure_time": "2025-10-02T14:23:45Z",
-  "assertion": "visible:Overworld",
-  "expected": "visible",
-  "actual": "hidden",
-  "screenshot": "/tmp/yaze/test-results/grpc_assert_12345678/failure_1696357220000.bmp",
-  "widget_state": {
-    "active_window": "Main Window",
-    "focused_widget": null,
-    "visible_windows": ["Main Window", "Debug"],
-    "overworld_window": { "exists": true, "visible": false, "position": "0,0,0,0" }
-  },
-  "execution_context": {
-    "frame_count": 1234,
-    "recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
-    "resource_stats": { "memory_mb": 245, "textures": 12, "framerate": 60.0 },
-    "editor_manager_snapshot": {
-      "active_module": "OverworldEditor",
-      "dirty_buffers": ["overworld_layer_1"],
-      "last_error": null
-    }
-  }
-}
-```
-
-#### IT-09: CI/CD Integration ✅ CLI Tooling Shipped
-**Delivered (Oct 3, 2025)**:
-1. **Standardized Suite Runtime**
-   - YAML suite parser/loader with group dependencies and retry semantics
-   - `z3ed agent test suite run` exposes `--group`, `--tag`, `--param`,
-     `--retries`, `--ci-mode`, and `--junit`
-   - Automatic JUnit XML emission to `test-results/junit/.xml`
-
-2. **Validation & Authoring UX**
-   - `z3ed agent test suite validate` surfaces structural linting with
-     annotated exit codes (0 pass, 1 fail, 2 error)
-   - NEW `z3ed agent test suite create ` interactive flow scaffolds
-     suites under `tests/`, prompting for metadata, groups, replay scripts,
-     tags, and key=value parameters (with `--force` overwrite support)
-
-3.
**Reporting**
-   - Text and JSON summaries include per-test assertions and retry outcomes
-   - Default output directory layout ready for CI artifact upload
-
-**Next Steps** (post-CLI follow-through):
-- Publish canonical `tests/smoke.yaml` / `tests/regression.yaml` samples
-- Add `.github/workflows/gui-tests.yml` template referencing the new runner
-- Document flaky-test mitigation patterns, including recommended retry counts
-- Wire suite execution output into docs/CI dashboards for quick triage
-
-**Test Suite Format**:
-```yaml
-name: YAZE GUI Test Suite
-description: Comprehensive tests for YAZE editor functionality
-version: 1.0
-
-config:
-  timeout_per_test: 30s
-  retry_on_failure: 2
-  parallel_execution: false
-
-test_groups:
-  - name: smoke
-    description: Fast tests for basic functionality
-    tests:
-      - tests/overworld_load.json
-      - tests/dungeon_load.json
-
-  - name: regression
-    description: Full test suite for release validation
-    depends_on: [smoke]
-    tests:
-      - tests/palette_edit.json
-      - tests/sprite_load.json
-      - tests/rom_save.json
-```
-
-**GitHub Actions Integration**:
-```yaml
-name: GUI Tests
-on: [push, pull_request]
-
-jobs:
-  gui-tests:
-    runs-on: macos-latest
-    steps:
-      # v2 of checkout and upload-artifact is deprecated; v4 is current
-      - uses: actions/checkout@v4
-      - name: Build YAZE with test harness
-        run: |
-          cmake -B build -DYAZE_WITH_GRPC=ON
-          cmake --build build --target yaze --target z3ed
-      - name: Start test harness
-        run: |
-          ./build/bin/yaze --enable_test_harness --headless &
-          sleep 5
-      - name: Run test suite
-        run: |
-          ./build/bin/z3ed test run-suite tests/suite.yaml --ci-mode
-      - name: Upload test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: test-results
-          path: test-results/
-```
-
----
-
-#### IT-10: Collaborative Editing & Multiplayer Sessions ⏸️ DEPRIORITIZED
-
-**Status**: Postponed in favor of LLM integration work
-**Rationale**: While collaborative editing is an interesting feature, practical LLM integration provides more immediate value for the agentic
workflow system. The core infrastructure is complete, and enabling real AI agents to interact with z3ed is the critical next step.
-
-**Future Consideration**: IT-10 may be revisited after LLM integration is production-ready and validated by users. The collaborative editing design is preserved in the documentation for future reference.
-
-**See**: [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for the new priority work.
-
----
-
-### Priority 2: LLM Integration (Ollama + Gemini + Claude) 🤖 NEW PRIORITY
-
-**Goal**: Enable practical AI-driven ROM modifications with local and remote LLM providers
-**Time Estimate**: 12-15 hours total
-**Status**: Ready to Implement
-
-**Why This is Critical**: The z3ed infrastructure is complete (CLI, proposals, sandbox, GUI automation), but currently uses `MockAIService` with hardcoded commands. Real LLM integration unlocks the full potential of the agentic workflow system.
-
-**📋 Complete Documentation**:
-- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - Detailed technical implementation guide (60+ pages)
-- **[LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)** - Step-by-step task list with checkboxes
-- **[LLM-INTEGRATION-SUMMARY.md](LLM-INTEGRATION-SUMMARY.md)** - Executive summary and getting started
-
-**Implementation Phases**:
-
-#### Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE
-- Create `OllamaAIService` class with health checks and model management
-- Wire into agent commands with provider selection mechanism
-- Add CMake configuration for httplib support
-- End-to-end testing with `qwen2.5-coder:7b` model
-
-**Key Benefits**: Local, free, private, no rate limits
-
-#### Phase 2: Gemini Fixes (2-3 hours)
-- Fix existing `GeminiAIService` implementation
-- Improve prompting with resource catalogue
-- Add markdown code block stripping for reliable parsing
-
-#### Phase 3: Claude Integration (2-3 hours)
-- Create `ClaudeAIService` class
-- Implement Messages API integration
-- Same interface as other services for easy swapping
-
-#### Phase 4: Enhanced Prompt Engineering (3-4 hours)
-- Create `PromptBuilder` utility class
-- Load resource catalogue (`z3ed-resources.yaml`) into system prompts
-- Add few-shot examples for improved accuracy (>90%)
-- Inject ROM context (current state, loaded editors)
-
-**Quick Start After Implementation**:
-```bash
-# Install Ollama
-brew install ollama
-ollama serve &
-ollama pull qwen2.5-coder:7b
-
-# Configure z3ed
-export YAZE_AI_PROVIDER=ollama
-
-# Use natural language
-z3ed agent run --prompt "Make all soldier armor red" --rom zelda3.sfc --sandbox
-z3ed agent diff # Review changes
-```
-
-**Testing Script**: `./scripts/quickstart_ollama.sh` (automated setup validation)
-
----
-
-### Priority 3: Windows Cross-Platform Testing 🪟
-1. **Collaboration Server**:
-   - WebSocket server for real-time client communication
-   - Session management (create, join, authentication)
-   - Edit event broadcasting to all connected clients
-   - Conflict resolution (last-write-wins with timestamps)
-
-2. **Collaboration Client**:
-   - Connect to remote sessions via WebSocket
-   - Send local edits to server
-   - Receive and apply remote edits
-   - ROM state synchronization on join
-
-3. **Edit Event Protocol**:
-   - Protobuf definitions for edit events (tile, sprite, palette, map)
-   - Cursor position tracking
-   - AI proposal sharing and voting
-   - Session state messages
-
-4. **GUI Integration**:
-   - Status bar showing connected users
-   - Collaboration panel (user list, activity feed)
-   - Live cursor rendering (color-coded per user)
-   - Proposal voting UI (Accept/Reject/Discuss)
-
-5.
**Session Recording & Replay**: - - Record all events to YAML/JSON file - - Replay engine with timeline controls - - Export session summaries for review - -**CLI Commands**: -```bash -# Host a collaborative session -z3ed collab host --port 5000 --password "dev123" - -# Join a session -z3ed collab join yaze://connect/192.168.1.100:5000 - -# List active sessions (LAN discovery) -z3ed collab list - -# Disconnect from session -z3ed collab disconnect - -# Replay recorded session -z3ed collab replay session_2025_10_02.yaml --speed 2x -``` - -**User Stories**: -- **US-1**: As a ROM hacker, I want to host a collaborative session so my teammates can join and work together -- **US-2**: As a collaborator, I want to see other users' edits in real-time so we stay synchronized -- **US-3**: As a team lead, I want to use AI agents with my team so we can all benefit from automation (shared proposals with majority voting) -- **US-4**: As a collaborator, I want to see where other users are working so we don't conflict (live cursors) -- **US-5**: As a project manager, I want to record collaborative sessions so we can review work later - -**Benefits**: -- **Real-Time Collaboration**: Multiple users can edit the same ROM simultaneously -- **Shared AI Assistance**: Team votes on AI proposals before execution -- **Conflict Prevention**: Live cursors show where teammates are working -- **Audit Trail**: Session recording for review and compliance -- **Remote Teams**: Connect over LAN or internet (with optional encryption) - -**Technical Architecture**: -``` -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ Client A │────►│ Collab Server │◄────│ Client B β”‚ -β”‚ (Host) β”‚ β”‚ (WebSocket) β”‚ β”‚ β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - Session Mgmt β”‚ - β”‚ - Event Broker β”‚ 
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” - β”‚ - Conflict Res │◄────│ Client C β”‚ - β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -**Security Considerations**: -- Optional password protection for sessions -- Read-only vs read-write access levels -- ROM checksum verification (prevents desync) -- Rate limiting (prevent spam/DOS) -- Optional TLS/SSL encryption for public internet - -**See**: [IT-10-COLLABORATIVE-EDITING.md](IT-10-COLLABORATIVE-EDITING.md) for complete specification - ---- - -### Priority 2: Windows Cross-Platform Testing πŸͺŸ -**Goal**: Validate z3ed and test harness on Windows -**Time Estimate**: 8-10 hours -**Blocking Dependency**: IT-05 Complete (need stable API) - -> πŸ“‹ **Detailed Guides**: See [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for complete implementation breakdowns with code examples. - ---- - -## 2. Workstreams Overview - -| Workstream | Goal | Status | Notes | -|------------|------|--------|-------| -| Resource Catalogue | Machine-readable CLI specs for AI consumption | βœ… Complete | `docs/api/z3ed-resources.yaml` generated | -| Acceptance Workflow | Human review/approval of agent proposals | βœ… Complete | ProposalDrawer with ROM merging operational | -| ImGuiTest Bridge | Automated GUI testing via gRPC | βœ… Complete | All 3 phases done (11 hours) | -| Verification Pipeline | Layered testing + CI coverage | πŸ“‹ In Progress | E2E validation phase | -| Telemetry & Learning | Capture signals for improvement | πŸ“‹ Planned | Optional/opt-in (Phase 8) | - -### Completed Work Summary - -**Resource Catalogue (RC)** βœ…: -- CLI flag passthrough and resource catalog system -- `agent describe` exports YAML/JSON schemas -- `docs/api/z3ed-resources.yaml` maintained -- All ROM/Palette/Overworld/Dungeon/Patch commands documented - -**Acceptance Workflow (AW-01/02/03)** βœ…: -- `ProposalRegistry` with disk persistence and cross-session tracking -- 
`RomSandboxManager` for isolated ROM copies -- `agent list` and `agent diff` commands -- **ProposalDrawer GUI**: List/detail views, Accept/Reject/Delete, ROM merging -- Integrated into EditorManager (`Debug β†’ Agent Proposals`) - -**ImGuiTestHarness (IT-01)** βœ…: -- Phase 1: gRPC infrastructure (6 RPC methods) -- Phase 2: TestManager integration with dynamic tests -- Phase 3: Full ImGuiTestEngine (Type/Wait/Assert RPCs) -- E2E test script: `scripts/test_harness_e2e.sh` -- Documentation: IT-01-QUICKSTART.md - ---- - -## 3. Task Backlog - -| ID | Task | Workstream | Type | Status | Dependencies | -|----|------|------------|------|--------|--------------| -| RC-01 | Define schema for `ResourceCatalog` entries and implement serialization helpers. | Resource Catalogue | Code | βœ… Done | Schema system complete with all resource types documented | -| RC-02 | Auto-generate `docs/api/z3ed-resources.yaml` from command annotations. | Resource Catalogue | Tooling | βœ… Done | Generated and committed to docs/api/ | -| RC-03 | Implement `z3ed agent describe` CLI surface returning JSON schemas. | Resource Catalogue | Code | βœ… Done | Both YAML and JSON output formats working | -| RC-04 | Integrate schema export with TUI command palette + help overlays. | Resource Catalogue | UX | πŸ“‹ Planned | RC-03 | -| RC-05 | Harden CLI command routing/flag parsing to unblock agent automation. | Resource Catalogue | Code | βœ… Done | Fixed rom info handler to use FLAGS_rom | -| AW-01 | Implement sandbox ROM cloning and tracking (`RomSandboxManager`). | Acceptance Workflow | Code | βœ… Done | ROM sandbox manager operational with lifecycle management | -| AW-02 | Build proposal registry service storing diffs, logs, screenshots. | Acceptance Workflow | Code | βœ… Done | ProposalRegistry implemented with disk persistence | -| AW-03 | Add ImGui drawer for proposals with accept/reject controls. 
| Acceptance Workflow | UX | βœ… Done | ProposalDrawer GUI complete with ROM merging | -| AW-04 | Implement policy evaluation for gating accept buttons. | Acceptance Workflow | Code | βœ… Done | PolicyEvaluator service with 4 policy types (test, constraint, forbidden, review), GUI integration complete (6 hours) | -| AW-05 | Draft `.z3ed-diff` hybrid schema (binary deltas + JSON metadata). | Acceptance Workflow | Design | πŸ“‹ Planned | AW-01 | -| IT-01 | Create `ImGuiTestHarness` IPC service embedded in `yaze_test`. | ImGuiTest Bridge | Code | βœ… Done | Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours) | -| IT-02 | Implement CLI agent step translation (`imgui_action` β†’ harness call). | ImGuiTest Bridge | Code | βœ… Done | `z3ed agent test` command with natural language prompts (7.5 hours) | -| IT-03 | Provide synchronization primitives (`WaitForIdle`, etc.). | ImGuiTest Bridge | Code | βœ… Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 | -| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | βœ… Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer | -| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | βœ… Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) | -| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | πŸ“‹ Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs | -| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | βœ… Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts | -| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | βœ… Done | IT-01 - Screenshot RPC, auto-capture, widget state dumps complete (Oct 2, 2025) | -| IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | 
Code | βœ… Done | IT-01 - Screenshot capture complete (Oct 2, 2025) | -| IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | βœ… Done | IT-08a - Integrated with TestManager (Oct 2, 2025) | -| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | βœ… Done | IT-08b - Enhanced failure diagnostics (Oct 2, 2025) | -| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | βœ… Done | IT-07 - CLI suite run/validate/create commands, JUnit output | -| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | πŸ“‹ Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) | -| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | πŸ“‹ Planned | RC/AW tasks | -| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | πŸ“‹ Planned | IT tasks | -| VP-03 | Create CI job running agent smoke tests with `YAZE_WITH_JSON`. | Verification Pipeline | Infra | πŸ“‹ Planned | VP-01, VP-02 | -| TL-01 | Capture accept/reject metadata and push to telemetry log. | Telemetry & Learning | Code | πŸ“‹ Planned | AW tasks | -| TL-02 | Build anonymized metrics exporter + opt-in toggle. | Telemetry & Learning | Infra | πŸ“‹ Planned | TL-01 | - -_Status Legend: πŸ”„ Active Β· πŸ“‹ Planned Β· βœ… Done_ - -**Progress Summary**: -- βœ… Completed: 13 tasks (54%) -- πŸ”„ Active: 0 tasks (0%) -- πŸ“‹ Planned: 11 tasks (46%) -- **Total**: 24 tasks (6 test harness enhancements + 1 collaborative feature) - -## 3. Immediate Next Steps (Week of Oct 1-7, 2025) - -### Priority 0: Testing & Validation (Active) -1. 
**TEST**: Complete end-to-end proposal workflow
-   - Launch YAZE and verify ProposalDrawer displays live proposals
-   - Test Accept action → verify ROM merge and save prompt
-   - Test Reject and Delete actions
-   - Validate filtering and refresh functionality
-
-2. **Widget ID Refactoring** (Started Oct 2, 2025) 🎯 NEW
-   - ✅ Added widget_id_registry to build system
-   - ✅ Registered 13 Overworld toolset buttons with hierarchical IDs
-   - 📋 Next: Test widget discovery and update test harness
-   - See: [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md)
-
-### Priority 1: ImGuiTestHarness Foundation (IT-01) ✅ COMPLETE
-**Rationale**: Required for automated GUI testing and remote control of YAZE for AI workflows
-**Decision**: ✅ **Use gRPC** - Production-grade, cross-platform, type-safe (see `IT-01-grpc-evaluation.md`)
-
-**Status**: Phase 1 Complete ✅ | Phase 2 Complete ✅ | Phase 3 Planned
-
-#### Phase 1: gRPC Infrastructure ✅ COMPLETE
-- ✅ Add gRPC to build system via FetchContent
-- ✅ Create .proto schema (Ping, Click, Type, Wait, Assert, Screenshot)
-- ✅ Implement gRPC server with all 6 RPC stubs
-- ✅ Test with grpcurl - all RPCs responding
-- ✅ Server lifecycle management (Start/Shutdown)
-- ✅ Cross-platform build verified (macOS ARM64)
-
-**See**: `GRPC_TEST_SUCCESS.md` for Phase 1 completion details
-
-#### Phase 2: ImGuiTestEngine Integration ✅ COMPLETE
-**Goal**: Replace stub RPC handlers with actual GUI automation
-**Status**: Infrastructure complete, dynamic test registration implemented
-**Time Spent**: ~4 hours
-
-**Implementation Guide**: 📖 **[IT-01-PHASE2-IMPLEMENTATION-GUIDE.md](IT-01-PHASE2-IMPLEMENTATION-GUIDE.md)**
-
-**Completed Tasks**:
-1. ✅ **TestManager Integration** - gRPC service receives TestManager reference
-2. ✅ **Build System** - Successfully compiles with ImGuiTestEngine support
-3. ✅ **Server Startup** - gRPC server starts correctly on macOS with test harness flag
-4. ✅ **Dynamic Test Registration** - Click RPC uses `IM_REGISTER_TEST()` macro for dynamic tests
-5. ✅ **Stub Handlers** - Type/Wait/Assert RPCs return success (implementation pending Phase 3)
-6. ✅ **Ping RPC** - Fully functional, returns YAZE version and timestamp
-
-**Key Learnings**:
-- ImGuiTestEngine requires test registration - can't call test functions directly
-- Test context provided by engine via `test->Output.Status` not `test->Status`
-- YAZE uses custom flag system with `FLAGS_name->Get()` pattern
-- Correct flags: `--enable_test_harness`, `--test_harness_port`, `--rom_file`
-
-**Testing Results**:
-```bash
-# Server starts successfully
-./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
-  --enable_test_harness \
-  --test_harness_port=50052 \
-  --rom_file=assets/zelda3.sfc &
-
-# Ping RPC working
-grpcurl -plaintext -d '{"message":"test"}' \
-  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
-# Response: {"message":"Pong: test","timestampMs":"...","yazeVersion":"0.3.2"}
-```
-
-**Issues Fixed**:
-- ❌→✅ SIGSEGV on TestManager initialization (deferred ImGuiTestEngine init to Phase 3)
-- ❌→✅ ImGuiTestEngine API mismatch (switched to dynamic test registration)
-- ❌→✅ Status field access (corrected to `test->Output.Status`)
-- ❌→✅ Port conflicts (use port 50052, `killall yaze` to cleanup)
-- ❌→✅ Flag naming (documented correct underscore format)
-
-#### Phase 3: Full ImGuiTestEngine Integration ✅ COMPLETE (Oct 2, 2025)
-**Goal**: Complete implementation of all GUI automation RPCs
-
-**Completed Tasks**:
-1. ✅ **Type RPC Implementation** - Full text input automation
-   - ItemInfo API usage corrected (returns by value, not pointer)
-   - Focus management with ItemClick before typing
-   - Clear-first functionality with keyboard shortcuts
-   - Dynamic test registration with timeout handling
-
-2. ✅ **Wait RPC Implementation** - Condition polling with timeout
-   - Three condition types: window_visible, element_visible, element_enabled
-   - Configurable timeout (default 5000ms) and poll interval (default 100ms)
-   - Proper Yield() calls to allow ImGui event processing
-   - Extended timeout for test execution
-
-3. ✅ **Assert RPC Implementation** - State validation with structured responses
-   - Multiple assertion types: visible, enabled, exists, text_contains
-   - Actual vs expected value reporting
-   - Detailed error messages for debugging
-   - text_contains partially implemented (text retrieval needs refinement)
-
-4. ✅ **API Compatibility Fixes**
-   - Corrected ItemInfo usage (by value, check ID != 0)
-   - Fixed flag names (ItemFlags instead of StatusFlags)
-   - Proper visibility checks using RectClipped dimensions
-   - All dynamic tests properly registered and cleaned up
-
-**Testing**:
-- Build successful on macOS ARM64
-- All RPCs respond correctly
-- Test script created: `scripts/test_harness_e2e.sh`
-- See `IT-01-PHASE3-COMPLETE.md` for full implementation details
-
-**Known Limitations**:
-- Screenshot RPC not implemented (placeholder stub)
-- text_contains assertion uses placeholder text retrieval
-- Need end-to-end workflow testing with real YAZE widgets
-
-6. **End-to-End Testing** (1 hour)
-   - Create shell script workflow: start server → click button → wait for window → type text → assert state
-   - Test with real YAZE editors (Overworld, Dungeon, etc.)
-   - Document edge cases and troubleshooting
-
-#### Phase 4: CLI Integration & Windows Testing (4-5 hours)
-7. **CLI Client** (`z3ed agent test`)
-   - Generate gRPC calls from AI prompts
-   - Natural language → ImGui action translation
-   - Screenshot capture for LLM feedback
-   - Emit structured error envelopes with artifact links (IT-08)
-
-8. **Windows Testing**
-   - Detailed build instructions for vcpkg setup
-   - Test on Windows VM or with contributor
-   - Add Windows CI job to GitHub Actions
-   - Document troubleshooting
-
-### IT-01 Quick Reference
-
-**Start YAZE with Test Harness**:
-```bash
-./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
-  --enable_test_harness \
-  --test_harness_port=50052 \
-  --rom_file=assets/zelda3.sfc &
-```
-
-**Test RPCs with grpcurl**:
-```bash
-# Ping - Health check
-grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-  -d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
-
-# Click - Click UI element
-grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-  -d '{"target":"button:Overworld","type":"LEFT"}' \
-  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
-
-# Type - Input text
-grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-  -d '{"target":"input:Filename","text":"zelda3.sfc","clear_first":true}' \
-  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type
-
-# Wait - Wait for condition
-grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
-  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
-
-# Assert - Validate state
-grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
-  -d '{"condition":"visible:Main Window"}' \
-  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
-```
-
-**Troubleshooting**:
-- **Port in use**: `killall yaze` or use `--test_harness_port=50053`
-- **Connection refused**: Check server started with `lsof -i :50052`
-- **Unrecognized flag**: Use underscores not hyphens (e.g., `--rom_file` not `--rom`)
-
-### Priority 2: Policy Evaluation Framework (AW-04, 4-6 hours)
-5. **DESIGN**: YAML-based Policy Configuration
-   ```yaml
-   # .yaze/policies/agent.yaml
-   version: 1.0
-   policies:
-     - name: require_tests
-       type: test_requirement
-       enabled: true
-       rules:
-         - test_suite: "overworld_rendering"
-           min_pass_rate: 0.95
-         - test_suite: "palette_integrity"
-           min_pass_rate: 1.0
-
-     - name: limit_change_scope
-       type: change_constraint
-       enabled: true
-       rules:
-         - max_bytes_changed: 10240  # 10KB
-         - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
-         - forbidden_ranges:
-             - start: 0xFFB0  # ROM header
-               end: 0xFFFF
-
-     - name: human_review_required
-       type: review_requirement
-       enabled: true
-       rules:
-         - if: bytes_changed > 1024
-           then: require_diff_review: true
-         - if: commands_executed > 10
-           then: require_log_review: true
-   ```
-
-6. **IMPLEMENT**: PolicyEvaluator Service
-   - `src/cli/service/policy_evaluator.{h,cc}`
-   - Singleton service loads policies from `.yaze/policies/`
-   - `EvaluateProposal(proposal_id) -> PolicyResult`
-   - Returns: pass/fail + list of violations with severity
-   - Hook into ProposalRegistry lifecycle
-
-7. **INTEGRATE**: Policy UI in ProposalDrawer
-   - Add "Policy Status" section in detail view
-   - Display violations with icons: ⛔ Critical, ⚠️ Warning, ℹ️ Info
-   - Gate Accept button: disabled if critical violations exist
-   - Show helpful messages: "Accept blocked: Test pass rate 0.85 < 0.95"
-   - Allow policy overrides with confirmation: "Override policy? This action will be logged."
-
-### Priority 3: Documentation & Consolidation (2-3 hours)
-8. **CONSOLIDATE**: Merge standalone docs into main plan
-   - ✅ AW-03 summary → already in main plan, delete standalone doc
-   - Check for other AW-* or task-specific docs to merge
-   - Update main plan with architecture diagrams
-
-9. 
**CREATE**: Architecture Flow Diagram - - Visual representation of proposal lifecycle - - Component interaction diagram - - Add to implementation plan - -### Later: Advanced Features -- VP-01: Expand CLI unit tests -- VP-02: Integration tests with replay scripts -- TL-01: Telemetry capture for learning - -## 4. Current Issues & Blockers - -### Active Issues -None - all blocking issues resolved as of Oct 1, 2025 - -### Known Limitations (Non-Blocking) -1. ProposalDrawer lacks keyboard navigation -2. Large diffs/logs truncated at 1000 lines (consider pagination) -3. Proposals don't persist full metadata to disk (prompt, description, sandbox_id reconstructed) -4. No policy evaluation yet (AW-04) - -## 5. Architecture Overview - -### 5.1. Proposal Lifecycle Flow - -``` -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ 1. CREATION (CLI: z3ed agent run) β”‚ -β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ -β”‚ User Prompt β”‚ -β”‚ ↓ β”‚ -β”‚ MockAIService / GeminiAIService β”‚ -β”‚ ↓ (generates commands) β”‚ -β”‚ ["palette export ...", "overworld set-tile ..."] β”‚ -β”‚ ↓ β”‚ -β”‚ RomSandboxManager::CreateSandbox(rom) β”‚ -β”‚ ↓ (creates isolated copy) β”‚ -β”‚ /tmp/yaze/sandboxes//zelda3.sfc β”‚ -β”‚ ↓ β”‚ -β”‚ Execute commands on sandbox ROM β”‚ -β”‚ ↓ (logs each command) β”‚ -β”‚ ProposalRegistry::CreateProposal(sandbox_id, prompt, desc) β”‚ -β”‚ ↓ (creates proposal directory) β”‚ -β”‚ /tmp/yaze/proposals/proposal--/ β”‚ -β”‚ β”œβ”€ execution.log (command outputs) β”‚ -β”‚ β”œβ”€ diff.txt (if generated) β”‚ -β”‚ └─ screenshots/ (if any) β”‚ 
-β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ 2. DISCOVERY (CLI: z3ed agent list) β”‚ -β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ -β”‚ ProposalRegistry::ListProposals() β”‚ -β”‚ ↓ (lazy loads from disk) β”‚ -β”‚ LoadProposalsFromDiskLocked() β”‚ -β”‚ ↓ (scans /tmp/yaze/proposals/) β”‚ -β”‚ Reconstructs metadata from filesystem β”‚ -β”‚ ↓ (parses timestamps, reads logs) β”‚ -β”‚ Returns vector β”‚ -β”‚ ↓ β”‚ -β”‚ Display table: ID | Status | Created | Prompt | Stats β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ 3. 
REVIEW (GUI: Debug β†’ Agent Proposals) β”‚ -β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ -β”‚ ProposalDrawer::Draw() β”‚ -β”‚ ↓ (called every frame from EditorManager) β”‚ -β”‚ ProposalDrawer::RefreshProposals() β”‚ -β”‚ ↓ (calls ProposalRegistry::ListProposals) β”‚ -β”‚ Display proposal list (selectable table) β”‚ -β”‚ ↓ (user clicks proposal) β”‚ -β”‚ ProposalDrawer::SelectProposal(id) β”‚ -β”‚ ↓ (loads detail content) β”‚ -β”‚ Read execution.log and diff.txt from proposal directory β”‚ -β”‚ ↓ β”‚ -β”‚ Display detail view: β”‚ -β”‚ β”œβ”€ Metadata (sandbox_id, timestamp, stats) β”‚ -β”‚ β”œβ”€ Diff (syntax highlighted) β”‚ -β”‚ └─ Log (command execution trace) β”‚ -β”‚ ↓ β”‚ -β”‚ User decides: [Accept] [Reject] [Delete] β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ 4. 
ACCEPTANCE (GUI: Click "Accept" button) β”‚ -β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ -β”‚ ProposalDrawer::AcceptProposal(proposal_id) β”‚ -β”‚ ↓ β”‚ -β”‚ Get proposal metadata (includes sandbox_id) β”‚ -β”‚ ↓ β”‚ -β”‚ RomSandboxManager::ListSandboxes() β”‚ -β”‚ ↓ (find sandbox by ID) β”‚ -β”‚ sandbox_rom_path = sandbox.rom_path β”‚ -β”‚ ↓ β”‚ -β”‚ Load sandbox ROM from disk β”‚ -β”‚ ↓ β”‚ -β”‚ rom_->WriteVector(0, sandbox_rom.vector()) β”‚ -β”‚ ↓ (copies entire sandbox ROM β†’ main ROM) β”‚ -β”‚ ROM marked dirty (save prompt appears) β”‚ -β”‚ ↓ β”‚ -β”‚ ProposalRegistry::UpdateStatus(id, kAccepted) β”‚ -β”‚ ↓ β”‚ -β”‚ User: File β†’ Save ROM β”‚ -β”‚ ↓ β”‚ -β”‚ Changes committed βœ… β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ 5. REJECTION (GUI: Click "Reject" button) β”‚ -β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ -β”‚ ProposalDrawer::RejectProposal(proposal_id) β”‚ -β”‚ ↓ β”‚ -β”‚ ProposalRegistry::UpdateStatus(id, kRejected) β”‚ -β”‚ ↓ β”‚ -β”‚ Proposal preserved for audit trail β”‚ -β”‚ Sandbox ROM left untouched (can be cleaned up later) β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -### 5.2. 
Component Interaction Diagram - -``` -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ CLI Layer β”‚ -β”‚ (z3ed commands) β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β”œβ”€β”€β–Ί agent run ──────────┐ - β”œβ”€β”€β–Ί agent list ────────── - └──► agent diff ────────── - β”‚ - β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” - β”‚ CLI Service Layer β”‚ - β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ - β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ - β”‚ β”‚ ProposalRegistry (Singleton) β”‚ β”‚ - β”‚ β”‚ β€’ CreateProposal() β”‚ β”‚ - β”‚ β”‚ β€’ ListProposals() β”‚ β”‚ - β”‚ β”‚ β€’ GetProposal() β”‚ β”‚ - β”‚ β”‚ β€’ UpdateStatus() β”‚ β”‚ - β”‚ β”‚ β€’ RemoveProposal() β”‚ β”‚ - β”‚ β”‚ β€’ LoadProposalsFromDiskLocked() β”‚ β”‚ - β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ - β”‚ β”‚ β”‚ - β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ - β”‚ β”‚ RomSandboxManager (Singleton) β”‚ β”‚ - β”‚ β”‚ β€’ CreateSandbox() β”‚ β”‚ - β”‚ β”‚ β€’ ActiveSandbox() β”‚ β”‚ - β”‚ β”‚ β€’ ListSandboxes() β”‚ β”‚ - β”‚ β”‚ β€’ RemoveSandbox() β”‚ β”‚ - β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ - β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - 
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” - β”‚ Filesystem Layer β”‚ - β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ - β”‚ /tmp/yaze/proposals/ β”‚ - β”‚ └─ proposal--/ β”‚ - β”‚ β”œβ”€ execution.log β”‚ - β”‚ β”œβ”€ diff.txt β”‚ - β”‚ └─ screenshots/ β”‚ - β”‚ β”‚ - β”‚ /tmp/yaze/sandboxes/ β”‚ - β”‚ └─ -/ β”‚ - β”‚ └─ zelda3.sfc (isolated ROM copy) β”‚ - β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β–² - β”‚ - β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” - β”‚ GUI Layer β”‚ - β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ - β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ - β”‚ β”‚ EditorManager β”‚ β”‚ - β”‚ β”‚ β€’ current_rom_ β”‚ β”‚ - β”‚ β”‚ β€’ proposal_drawer_ β”‚ β”‚ - β”‚ β”‚ β€’ Update() { proposal_drawer_.Draw() } β”‚ β”‚ - β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ - β”‚ β”‚ β”‚ - β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ - β”‚ β”‚ ProposalDrawer β”‚ β”‚ - β”‚ β”‚ β€’ rom_ (ptr to EditorManager's ROM) β”‚ β”‚ - β”‚ β”‚ β€’ Draw() β”‚ β”‚ - β”‚ β”‚ β€’ DrawProposalList() β”‚ β”‚ - β”‚ β”‚ β€’ DrawProposalDetail() β”‚ β”‚ - β”‚ β”‚ β€’ AcceptProposal() ← ROM MERGE β”‚ β”‚ - β”‚ β”‚ β€’ RejectProposal() β”‚ β”‚ - β”‚ β”‚ β€’ DeleteProposal() β”‚ β”‚ - β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ - β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -### 5.3. Data Flow: Agent Run to ROM Merge - -``` -User: "Make soldiers wear red armor" - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ MockAIService β”‚ Generates: ["palette export sprites_aux1 4 soldier.col"] -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ RomSandboxManager β”‚ Creates: /tmp/.../sandboxes/20251001T200215-1/zelda3.sfc -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ Command Executor β”‚ Runs: palette export on sandbox ROM -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ ProposalRegistry β”‚ Creates: proposal-20251001T200215-1/ -β”‚ β”‚ β€’ execution.log: "[timestamp] palette export succeeded" -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β€’ diff.txt: (if diff generated) - β”‚ - β”‚ Time passes... user launches GUI - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ ProposalDrawer loads β”‚ Reads: /tmp/.../proposals/proposal-*/ -β”‚ β”‚ Displays: List of proposals -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β”‚ User clicks "Accept" - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ AcceptProposal() β”‚ 1. Find sandbox ROM: /tmp/.../sandboxes/.../zelda3.sfc -β”‚ β”‚ 2. 
Load sandbox ROM -β”‚ β”‚ 3. rom_->WriteVector(0, sandbox_rom.vector()) -β”‚ β”‚ 4. Main ROM now contains all sandbox changes -β”‚ β”‚ 5. ROM marked dirty -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ User: File β†’ Save β”‚ Changes persisted to disk βœ… -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -## 5. Open Questions - -- What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)? \ - ➀ Decision: pursue a hybrid package (`.z3ed-diff`) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog. -- How should the harness authenticate escalation requests for mutation actions? \ - ➀ Still openβ€”evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01). -- Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary? \ - ➀ Investigate during the ImGuiTestHarness spike; compare extending `yaze_test` jobs versus introducing a lightweight automation runner. - -## 4. Work History & Key Decisions - -This section provides a high-level summary of completed workstreams and major architectural decisions. - -### Resource Catalogue Workstream (RC) - βœ… COMPLETE -- **Outcome**: A machine-readable API specification for all `z3ed` commands. -- **Artifact**: `docs/api/z3ed-resources.yaml` is the generated source of truth. -- **Details**: Implemented a schema system and serialization for all CLI resources (ROM, Palette, Agent, etc.), enabling AI consumption. - -### Acceptance Workflow (AW-01, AW-02, AW-03) - βœ… COMPLETE -- **Outcome**: A complete, human-in-the-loop proposal review system. -- **Components**: - - `RomSandboxManager`: For creating isolated ROM copies. 
- - `ProposalRegistry`: For tracking proposals, diffs, and logs with disk persistence. - - `ProposalDrawer`: An ImGui panel for reviewing, accepting, and rejecting proposals, with full ROM merging capabilities. -- **Integration**: The `agent run`, `agent list`, and `agent diff` commands are fully integrated with the registry. The GUI and CLI share the same underlying proposal data. - -### ImGuiTestHarness (IT-01, IT-02) - ✅ CORE COMPLETE -- **Outcome**: A gRPC-based service for automated GUI testing. -- **Decision**: Chose **gRPC** for its performance, cross-platform support, and type safety. -- **Features**: Implemented 6 core RPCs: `Ping`, `Click`, `Type`, `Wait`, `Assert`, and a stubbed `Screenshot`. -- **Integration**: The `z3ed agent test` command can translate natural language prompts into a sequence of gRPC calls to execute tests. - -### Files Modified/Created -A summary of files created or changed during the implementation of the core `z3ed` infrastructure. - -**Core Services & CLI Handlers**: -- `src/cli/service/proposal_registry.{h,cc}` -- `src/cli/service/rom_sandbox_manager.{h,cc}` -- `src/cli/service/resource_catalog.{h,cc}` -- `src/cli/handlers/agent.cc` -- `src/cli/handlers/rom.cc` - -**GUI & Application Integration**: -- `src/app/editor/system/proposal_drawer.{h,cc}` -- `src/app/editor/editor_manager.{h,cc}` -- `src/app/core/service/imgui_test_harness_service.{h,cc}` -- `src/app/core/proto/imgui_test_harness.proto` - -**Build System (CMake)**: -- `src/app/app.cmake` -- `src/app/emu/emu.cmake` -- `src/cli/z3ed.cmake` -- `src/CMakeLists.txt` - -**Documentation & API Specs**: -- `docs/api/z3ed-resources.yaml` -- `docs/z3ed/E6-z3ed-cli-design.md` -- `docs/z3ed/E6-z3ed-implementation-plan.md` -- `docs/z3ed/E6-z3ed-reference.md` -- `docs/z3ed/README.md` - -## 5. Open Questions - -- What serialization format should the proposal registry adopt for diff payloads (binary vs. textual vs. hybrid)? 
\ - ➤ Decision: pursue a hybrid package (`.z3ed-diff`) that wraps binary tile/object deltas alongside a JSON metadata envelope (identifiers, texture descriptors, preview palette info). Capture format draft under RC/AW backlog. -- How should the harness authenticate escalation requests for mutation actions? \ - ➤ Still open; evaluate shared-secret vs. interactive user prompt in the harness spike (IT-01). -- Can we reuse existing regression test infrastructure for nightly ImGui runs or should we spin up a dedicated binary? \ - ➤ Investigate during the ImGuiTestHarness spike; compare extending `yaze_test` jobs versus introducing a lightweight automation runner. - -# Z3ED_AI Flag Migration Guide - -**Date**: October 3, 2025 -**Status**: ✅ Complete and Tested - -## Summary - -This document describes the consolidation of z3ed AI build flags into a single `Z3ED_AI` master flag, fixing a Gemini integration crash, and improving build ergonomics. - -## Problem Statement - -### Before (Issues): -1. **Confusing Build Flags**: Users had to specify `-DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON` to enable AI features -2. **Crash on Startup**: Gemini integration crashed due to `PromptBuilder` using JSON/YAML unconditionally -3. **Poor Modularity**: AI dependencies scattered across multiple conditional blocks -4. **Unclear Documentation**: Users didn't know which flags enabled which features - -### Root Cause of Crash: -```cpp -// GeminiAIService constructor (ALWAYS runs when Gemini key present) -GeminiAIService::GeminiAIService(const GeminiConfig& config) : config_(config) { - // This line crashed when YAZE_WITH_JSON=OFF - prompt_builder_.LoadResourceCatalogue(""); // ❌ Uses nlohmann::json unconditionally -} -``` - -The `PromptBuilder::LoadResourceCatalogue()` function used `nlohmann::json` and `yaml-cpp` without guards, causing segfaults when JSON support wasn't compiled in. - -## Solution - -### 1. 
Created Z3ED_AI Master Flag - -**New CMakeLists.txt** (`/Users/scawful/Code/yaze/CMakeLists.txt`): -```cmake -# Master flag for z3ed AI agent features -option(Z3ED_AI "Enable z3ed AI agent features (Gemini/Ollama integration)" OFF) - -# Auto-enable dependencies -if(Z3ED_AI) - message(STATUS "Z3ED_AI enabled: Activating AI agent dependencies (JSON, YAML, httplib)") - set(YAZE_WITH_JSON ON CACHE BOOL "Enable JSON support" FORCE) -endif() -``` - -**Benefits**: -- ✅ Single flag to enable all AI features: `-DZ3ED_AI=ON` -- ✅ Auto-manages dependencies (JSON, YAML, httplib) -- ✅ Clear intent: "I want AI agent features" -- ✅ Backward compatible: Old flags still work - -### 2. Fixed PromptBuilder Crash - -**Added Compile-Time Guard** (`src/cli/service/ai/prompt_builder.h`): -```cpp -#ifndef YAZE_CLI_SERVICE_PROMPT_BUILDER_H_ -#define YAZE_CLI_SERVICE_PROMPT_BUILDER_H_ - -// Warn at compile time if JSON not available -#if !defined(YAZE_WITH_JSON) -#warning "PromptBuilder requires JSON support. Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON" -#endif -``` - -**Added Runtime Guard** (`src/cli/service/ai/prompt_builder.cc`): -```cpp -absl::Status PromptBuilder::LoadResourceCatalogue(const std::string& yaml_path) { -#ifndef YAZE_WITH_JSON - // Gracefully degrade instead of crashing - std::cerr << "⚠️ PromptBuilder requires JSON support for catalogue loading\n" - << " Build with -DZ3ED_AI=ON or -DYAZE_WITH_JSON=ON\n" - << " AI features will use basic prompts without tool definitions\n"; - return absl::OkStatus(); // Don't crash, just skip advanced features -#else - // ... normal loading code ... -#endif -} -``` - -**Benefits**: -- ✅ No more segfaults when `GEMINI_API_KEY` is set but JSON disabled -- ✅ Clear error messages at compile time and runtime -- ✅ Graceful degradation instead of hard failure - -### 3. 
Updated z3ed Build Configuration - -**New z3ed.cmake** (`src/cli/z3ed.cmake`): -```cmake -# AI Agent Support (Consolidated via Z3ED_AI flag) -if(Z3ED_AI OR YAZE_WITH_JSON) - target_compile_definitions(z3ed PRIVATE YAZE_WITH_JSON) - message(STATUS "✓ z3ed AI agent enabled (Ollama + Gemini support)") - target_link_libraries(z3ed PRIVATE nlohmann_json::nlohmann_json) -endif() - -# SSL/HTTPS Support for Gemini -if((Z3ED_AI OR YAZE_WITH_JSON) AND (YAZE_WITH_GRPC OR Z3ED_AI)) - find_package(OpenSSL) - if(OpenSSL_FOUND) - target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT) - target_link_libraries(z3ed PRIVATE OpenSSL::SSL OpenSSL::Crypto) - message(STATUS "✓ SSL/HTTPS support enabled for z3ed (Gemini API ready)") - else() - message(WARNING "OpenSSL not found - Gemini API will not work") - message(STATUS " • Ollama (local) still works without SSL") - endif() -endif() -``` - -**Benefits**: -- ✅ Clear status messages during build -- ✅ Explains what's enabled and what's missing -- ✅ Guidance on how to fix missing dependencies - -## Migration Instructions - -### For Users - -**Old Way** (still works): -```bash -cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON -cmake --build build --target z3ed -``` - -**New Way** (recommended): -```bash -cmake -B build -DZ3ED_AI=ON -cmake --build build --target z3ed -``` - -**With GUI Testing**: -```bash -cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON -cmake --build build --target z3ed -``` - -### For Developers - -**Check if AI Features Available**: -```cpp -#ifdef YAZE_WITH_JSON - // JSON-dependent code (AI responses, config loading) -#else - // Fallback or warning -#endif -``` - -**Don't use JSON/YAML directly** - use PromptBuilder which handles guards automatically. - -## Testing Results - -### Build Configurations Tested ✅ - -1. **Minimal Build** (no AI): - ```bash - cmake -B build - ./build/bin/z3ed --help # ✅ Works, shows "AI disabled" message - ``` - -2. 
**AI Enabled** (new flag): - ```bash - cmake -B build -DZ3ED_AI=ON - export GEMINI_API_KEY="..." - ./build/bin/z3ed agent plan --prompt "test" # ✅ Works, connects to Gemini - ``` - -3. **Full Stack** (AI + gRPC): - ```bash - cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON - ./build/bin/z3ed agent test --prompt "..." # ✅ Works, GUI automation available - ``` - -### Crash Scenarios Fixed ✅ - -**Before**: -```bash -export GEMINI_API_KEY="..." -cmake -B build # JSON disabled by default -./build/bin/z3ed agent plan --prompt "test" -# Result: Segmentation fault (139) ❌ -``` - -**After**: -```bash -export GEMINI_API_KEY="..." -cmake -B build # JSON disabled by default -./build/bin/z3ed agent plan --prompt "test" -# Result: ⚠️ Warning message, graceful degradation ✅ -``` - -```bash -export GEMINI_API_KEY="..." -cmake -B build -DZ3ED_AI=ON # JSON enabled -./build/bin/z3ed agent plan --prompt "Place a tree at 10, 10" -# Result: ✅ Gemini responds, creates proposal -``` - -## Impact on Build Modularization - -This change aligns with the goals in `build_modularization_plan.md` and `build_modularization_implementation.md`: - -### Before: -- Scattered conditional compilation flags -- Dependencies unclear -- Hard to add to modular library system - -### After: -- ✅ Clear feature flag: `Z3ED_AI` -- ✅ Can create `libyaze_agent.a` with `if(Z3ED_AI)` guard -- ✅ Easy to make optional in modular build: - ```cmake - if(Z3ED_AI) - add_library(yaze_agent STATIC ${YAZE_AGENT_SOURCES}) - target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON) - target_link_libraries(yaze_agent PUBLIC nlohmann_json::nlohmann_json yaml-cpp) - endif() - ``` - -### Future Modular Build Integration - -When implementing modular builds (Phase 6-7 from `build_modularization_plan.md`): - -```cmake -# src/cli/agent/agent_library.cmake (NEW) -if(Z3ED_AI) - add_library(yaze_agent STATIC - cli/service/ai/ai_service.cc - cli/service/ai/ollama_ai_service.cc - cli/service/ai/gemini_ai_service.cc - 
cli/service/ai/prompt_builder.cc - cli/service/agent/conversational_agent_service.cc - # ... other agent sources - ) - - target_compile_definitions(yaze_agent PUBLIC YAZE_WITH_JSON) - - target_link_libraries(yaze_agent PUBLIC - yaze_util - nlohmann_json::nlohmann_json - yaml-cpp - ) - - # Optional SSL for Gemini - if(OpenSSL_FOUND) - target_compile_definitions(yaze_agent PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT) - target_link_libraries(yaze_agent PRIVATE OpenSSL::SSL OpenSSL::Crypto) - endif() - - message(STATUS "✓ yaze_agent library built with AI support") -endif() -``` - -**Benefits for Modular Build**: -- Agent library clearly optional -- Can rebuild just agent library when AI code changes -- z3ed links to `yaze_agent` instead of individual sources -- Faster incremental builds - -## Documentation Updates - -Updated files: -- ✅ `docs/z3ed/README.md` - Added Z3ED_AI flag documentation -- ✅ `docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md` - This document -- 📋 TODO: Update `docs/02-build-instructions.md` with Z3ED_AI flag -- 📋 TODO: Update CI/CD workflows to use Z3ED_AI - -## Backward Compatibility - -### Old Flags Still Work ✅ - -```bash -# These all enable AI features: -cmake -B build -DYAZE_WITH_JSON=ON # ✅ Works -cmake -B build -DYAZE_WITH_GRPC=ON # ✅ Works (auto-enables JSON) -cmake -B build -DZ3ED_AI=ON # ✅ Works (new way) - -# Combining flags: -cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON # ✅ Full stack -``` - -### No Breaking Changes - -- Existing build scripts continue to work -- CI/CD pipelines don't need immediate updates -- Users can migrate at their own pace - -## Next Steps - -### Short Term (Complete) -- ✅ Fix Gemini crash -- ✅ Create Z3ED_AI master flag -- ✅ Update z3ed build configuration -- ✅ Test all build configurations -- ✅ Update README documentation - -### Medium Term (Recommended) -- [ ] Update CI/CD workflows to use `-DZ3ED_AI=ON` -- [ ] Add Z3ED_AI to preset configurations -- [ ] Update main build instructions docs -- [ ] 
Create agent library module (see above) - -### Long Term (Integration with Modular Build) -- [ ] Implement `yaze_agent` library (Phase 6) -- [ ] Add agent to modular dependency graph -- [ ] Create agent-specific unit tests -- [ ] Optional: Split Gemini/Ollama into separate modules - -## References - -- **Related Issues**: Gemini crash (segfault 139) with GEMINI_API_KEY set -- **Related Docs**: - - `docs/build_modularization_plan.md` - Future library structure - - `docs/build_modularization_implementation.md` - Implementation guide - - `docs/z3ed/README.md` - User-facing z3ed documentation - - `docs/z3ed/AGENT-ROADMAP.md` - AI agent development plan - -## Summary - -This migration successfully: -1. ✅ **Fixed crash**: Gemini no longer segfaults when JSON disabled -2. ✅ **Simplified builds**: One flag (`Z3ED_AI`) replaces multiple flags -3. ✅ **Improved UX**: Clear error messages and build status -4. ✅ **Maintained compatibility**: Old flags still work -5. ✅ **Prepared for modularization**: Clear path to `libyaze_agent.a` -6. ✅ **Tested thoroughly**: All configurations verified working - -The z3ed AI agent is now production-ready with Gemini and Ollama support! - -## 6. 
References - -**Active Documentation**: -- `E6-z3ed-cli-design.md` - Overall CLI design and architecture -- `E6-z3ed-reference.md` - Technical command and API reference -- `docs/api/z3ed-resources.yaml` - Machine-readable API reference (generated) - -**Source Code**: -- `src/cli/service/` - Core services (proposal registry, sandbox manager, resource catalog) -- `src/app/editor/system/proposal_drawer.{h,cc}` - GUI review panel -- `src/app/core/service/imgui_test_harness_service.{h,cc}` - gRPC automation server - ---- - -**Last Updated**: [Current Date] -**Contributors**: @scawful, GitHub Copilot -**License**: Same as YAZE (see ../../LICENSE) - -# Z3ED GUI Integration & Enhanced Gemini Support - -**Date**: October 3, 2025 -**Status**: Ready for Testing - -## Overview - -This update brings two major enhancements to the z3ed AI agent system: - -1. **GUI Chat Widget** - Interactive conversational agent interface in the YAZE application -2. **Enhanced Gemini Function Calling** - Improved AI tool integration with proper schema support - -## New Features - -### 1. GUI Agent Chat Widget - -A fully-featured ImGui chat interface that provides the same conversational agent capabilities as the TUI, but integrated directly into the YAZE GUI application. - -**Location**: `src/app/gui/widgets/agent_chat_widget.{h,cc}` - -**Key Features**: -- Real-time conversation with AI agent -- Automatic table rendering for JSON tool results -- Chat history persistence (save/load) -- Timestamps and message styling -- Auto-scroll and multi-line input -- ROM context awareness -- Color-coded messages (user vs. agent) - -**Access**: -- Menu: `Debug → Agent Chat` (in YAZE GUI) -- Keyboard: Check application shortcuts menu - -**Usage Example**: -```cpp -// In your editor code: -AgentChatWidget chat_widget; -chat_widget.Initialize(&rom); - -// In your render loop: -bool show_chat = true; -chat_widget.Render(&show_chat); -``` - -### 2. 
Enhanced Gemini Function Calling - -The GeminiAIService now supports proper function calling with structured tool schemas, enabling the AI to autonomously invoke ROM inspection tools. - -**Available Tools**: -1. `resource_list` - Enumerate labeled resources (dungeons, sprites, palettes) -2. `dungeon_list_sprites` - List sprites in a dungeon room -3. `overworld_find_tile` - Find tile16 occurrences on maps -4. `overworld_describe_map` - Get map summary information -5. `overworld_list_warps` - List entrance/exit/hole points - -**Function Schema Format** (Gemini API): -```json -{ - "name": "overworld_find_tile", - "description": "Find all occurrences of a specific tile16 ID on overworld maps", - "parameters": { - "type": "object", - "properties": { - "tile": { - "type": "string", - "description": "Tile16 ID in hex format (e.g., 0x02E)" - }, - "map": { - "type": "string", - "description": "Optional: specific map ID to search" - }, - "format": { - "type": "string", - "enum": ["json", "text"], - "default": "json" - } - }, - "required": ["tile"] - } -} -``` - -**API Reference**: https://ai.google.dev/gemini-api/docs/function-calling - -### 3. ASCII Logo Branding - -Z3ED now features a distinctive ASCII art logo with a Triforce symbol, displayed in both the TUI main menu and CLI help output. 
- -**Variants**: -- `kZ3edLogo` - Full logo (default) -- `kZ3edLogoCompact` - Bordered version for smaller spaces -- `kZ3edLogoMinimal` - Compact version for constrained displays -- `GetColoredLogo()` - Terminal-colored version with ANSI codes - -**Preview**: -``` - ███████╗██████╗ ███████╗██████╗ - ╚══███╔╝╚════██╗██╔════╝██╔══██╗ - ███╔╝ █████╔╝█████╗ ██║ ██║ - ███╔╝ ╚═══██╗██╔══╝ ██║ ██║ - ███████╗██████╔╝███████╗██████╔╝ - ╚══════╝╚═════╝ ╚══════╝╚═════╝ - - ▲ Zelda 3 Editor - ▲ ▲ AI-Powered CLI - ▲▲▲▲▲ -``` - -## Build Requirements - -### GUI Chat Widget -```bash -cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON -cmake --build build --target yaze -``` - -**Dependencies**: -- Z3ED_AI=ON (enables JSON, YAML, httplib) -- YAZE_WITH_GRPC=ON (optional, for test harness) -- ImGui (automatically included with YAZE) - -### Enhanced Gemini Support -```bash -cmake -B build -DZ3ED_AI=ON -cmake --build build --target z3ed -``` - -**Dependencies**: -- Z3ED_AI=ON (enables JSON for function calling) -- OpenSSL (optional, for HTTPS - auto-detected) -- Gemini API key: `export GEMINI_API_KEY="your-key"` - -## Testing - -### Test GUI Chat Widget - -1. **Launch YAZE with ROM**: -```bash -./build/bin/yaze.app/Contents/MacOS/yaze --rom assets/zelda3.sfc -``` - -2. **Open Agent Chat**: - - Menu → Debug → Agent Chat - - Or use keyboard shortcut - -3. **Try Commands**: - - "List all dungeons in this project" - - "Find tile 0x02E on map 0x05" - - "Describe map 0x00" - - "List all warps" - -### Test Enhanced Gemini Function Calling - -1. **Set API Key**: -```bash -export GEMINI_API_KEY="your-api-key-here" -``` - -2. 
**Verify Function Calling**: -```bash -./build/bin/z3ed agent chat --rom assets/zelda3.sfc -``` - -3. **Test Natural Language**: - - Type: "What dungeons are available?" - - Expected: AI calls `resource_list` tool autonomously - - Type: "Find all trees on the light world" - - Expected: AI calls `overworld_find_tile` with appropriate parameters - -### Test ASCII Logo - -1. **TUI Main Menu**: -```bash -./build/bin/z3ed --tui -``` - -2. **CLI Help**: -```bash -./build/bin/z3ed --help -``` - -3. **Verify Colors**: - - Cyan: Z3ED text - - Yellow: Triforce - - White/Gray: Subtitle - -## Implementation Details - -### AgentChatWidget Architecture - -``` -AgentChatWidget -├── RenderChatHistory() // Displays message bubbles -├── RenderInputArea() // Multi-line input with send button -├── RenderToolbar() // History controls and settings -├── RenderMessageBubble() // Individual message rendering -├── RenderTableFromJson() // Automatic table generation -└── SendMessage() // Message processing via ConversationalAgentService -``` - -**Message Flow**: -1. User types message → `SendMessage()` -2. `ConversationalAgentService::ProcessMessage()` invoked -3. AI generates response (may include tool calls) -4. Tool results rendered as tables or text -5. 
History updated with auto-scroll - -### Gemini Function Calling Flow - -``` -User Prompt - ↓ -GeminiAIService::GenerateResponse() - ↓ -BuildFunctionCallSchemas() → Adds tool definitions - ↓ -Gemini API Request (with tools parameter) - ↓ -Gemini Response (may include tool_calls) - ↓ -ParseGeminiResponse() → Extracts tool_calls - ↓ -ConversationalAgentService → Dispatches to ToolDispatcher - ↓ -Tool Execution → Returns JSON result - ↓ -Result shown in chat / CLI output -``` - -## Configuration - -### GUI Widget Settings - -Customize in `AgentChatWidget` constructor: -```cpp -// Color scheme -colors_.user_bubble = ImVec4(0.2f, 0.4f, 0.8f, 1.0f); // Blue -colors_.agent_bubble = ImVec4(0.3f, 0.3f, 0.35f, 1.0f); // Dark gray -colors_.tool_call_bg = ImVec4(0.2f, 0.5f, 0.3f, 0.3f); // Green tint - -// UI behavior -auto_scroll_ = true; // Auto-scroll on new messages -show_timestamps_ = true; // Display message timestamps -show_reasoning_ = false; // Show AI reasoning (if available) -message_spacing_ = 12.0f; // Space between messages (pixels) -``` - -### Gemini AI Settings - -Configure via `GeminiConfig`: -```cpp -GeminiConfig config; -config.api_key = "your-key"; -config.model = "gemini-2.5-flash"; // Or gemini-1.5-pro -config.temperature = 0.7f; -config.max_output_tokens = 2048; -config.use_enhanced_prompting = true; // Enable few-shot examples - -GeminiAIService service(config); -service.EnableFunctionCalling(true); // Enable tool calling -``` - -### Function Calling Control - -```cpp -// Disable function calling (fallback to command generation) -service.EnableFunctionCalling(false); - -// Check available tools -auto tools = service.GetAvailableTools(); -for (const auto& tool : tools) { - std::cout << "Tool: " << tool << std::endl; -} -``` - -## Troubleshooting - -### GUI Chat Widget Issues - -**Problem**: Widget not appearing -**Solution**: Check build flags - requires `Z3ED_AI=ON` - -**Problem**: "AI features not available" error -**Solution**: Rebuild with 
`-DZ3ED_AI=ON`: -```bash -rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build -``` - -**Problem**: JSON tables not rendering -**Solution**: Verify `YAZE_WITH_JSON` is enabled (auto-enabled by Z3ED_AI) - -**Problem**: Chat history not saving -**Solution**: Check `.yaze/` directory exists and is writable - -### Gemini Function Calling Issues - -**Problem**: Tools not being called -**Solution**: -1. Verify `function_calling_enabled_ = true` -2. Check Gemini API response includes `tool_calls` field -3. Ensure `responseMimeType` is set to `"application/json"` - -**Problem**: "Invalid tool schema" warnings -**Solution**: Validate schema JSON in `BuildFunctionCallSchemas()` - must match Gemini spec - -**Problem**: SSL/HTTPS errors -**Solution**: Install OpenSSL: -```bash -# macOS -brew install openssl - -# Linux -sudo apt install libssl-dev -``` - -### ASCII Logo Issues - -**Problem**: Logo garbled/misaligned -**Solution**: Ensure terminal supports UTF-8 and Unicode box-drawing characters - -**Problem**: Colors not showing -**Solution**: Use `GetColoredLogo()` for ANSI color support in terminals - -## Next Steps - -According to [AGENT-ROADMAP.md](AGENT-ROADMAP.md), the priority order is: - -1. **✅ COMPLETE**: GUI Chat Widget -2. **✅ COMPLETE**: Enhanced Gemini Function Calling -3. **✅ COMPLETE**: ASCII Logo Branding -4. **🎯 NEXT UP**: Live LLM Testing (1-2 hours) - - Verify Gemini generates correct `tool_calls` JSON - - Test multi-turn conversations with context - - Exercise all 5 tools with natural language prompts -5. 
**📋 PLANNED**: Expand Tool Coverage (8-10 hours) - - Dialogue/text search tools - - Sprite inspection tools - - Advanced overworld tools - -## Related Documentation - -- **[AGENT-ROADMAP.md](AGENT-ROADMAP.md)** - Strategic vision and next steps -- **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - Implementation tracker -- **[README.md](README.md)** - Quick start guide -- **[BUILD_QUICK_REFERENCE.md](BUILD_QUICK_REFERENCE.md)** - Build instructions -- **Gemini Function Calling**: https://ai.google.dev/gemini-api/docs/function-calling - -## Examples - -### Example 1: Using GUI Chat for ROM Exploration - -``` -User: "What dungeons are in this ROM?" -Agent: [Calls resource_list tool] - Renders table with dungeon IDs, names, and labels - -User: "Show me sprites in the first dungeon" -Agent: [Calls dungeon_list_sprites with room 0x000] - Displays sprite table with IDs, types, positions - -User: "Find all water tiles on map 5" -Agent: [Calls overworld_find_tile with tile=water_id, map=0x05] - Shows coordinates where water appears -``` - -### Example 2: Programmatic Function Calling - -```cpp -#include "cli/service/ai/gemini_ai_service.h" -#include "cli/service/agent/conversational_agent_service.h" - -// Initialize services -GeminiConfig config("your-api-key"); -config.use_enhanced_prompting = true; -GeminiAIService ai_service(config); -ai_service.SetRomContext(&rom); - -agent::ConversationalAgentService agent; -agent.SetRomContext(&rom); - -// Natural language query -auto result = agent.SendMessage("List all palace dungeons"); - -// Result includes tool call execution -std::cout << result.value().message << std::endl; -// Output: JSON table of palace dungeons -``` - -### Example 3: Custom Tool Integration - -To add a new tool to Gemini function calling: - -1. 
**Add schema to `BuildFunctionCallSchemas()`**: -```cpp -{ - "name": "dialogue_search", - "description": "Search for text in ROM dialogue", - "parameters": { - "type": "object", - "properties": { - "text": { - "type": "string", - "description": "Search term" - } - }, - "required": ["text"] - } -} -``` - -2. **Implement in `ToolDispatcher`**: -```cpp -if (tool_name == "dialogue_search") { - return DialogueSearchTool(args); -} -``` - -3. **Update `GetAvailableTools()`**: -```cpp -return { - "resource_list", - "dungeon_list_sprites", - "overworld_find_tile", - "overworld_describe_map", - "overworld_list_warps", - "dialogue_search" // New tool -}; -``` - -## Success Criteria - -- ✅ GUI chat widget renders correctly in YAZE -- ✅ Messages display with proper formatting -- ✅ JSON tables render from tool results -- ✅ Chat history persists across sessions -- ✅ Gemini function calling works with all 5 tools -- ✅ Tool results properly formatted and returned -- ✅ ASCII logo displays in TUI and CLI help -- ✅ Colors render correctly in terminal - -## Performance Notes - -- **GUI Rendering**: ~60 FPS with 100+ messages in history -- **Table Rendering**: Automatic scrolling for large result sets -- **Function Calling Latency**: ~1-3 seconds per Gemini API call -- **Memory Usage**: ~50 MB for chat history (1000 messages) - -## Security Considerations - -- API keys stored in environment variables (not version controlled) -- Chat history saved to `.yaze/` (local filesystem only) -- No telemetry or external logging of conversations -- Tool execution sandboxed to read-only operations -- ROM modifications require explicit proposal acceptance - ---- - -**Questions or Issues?** -See [AGENT-ROADMAP.md](AGENT-ROADMAP.md) for the roadmap and open issues. 
- -# z3ed Implementation Status - -**Last Updated**: October 3, 2025 -**Status**: Core Infrastructure Complete | Integration Phase Active - -## Summary - -All core conversational agent infrastructure is implemented and functional. The focus is now on: -1. Testing function calling with live LLMs -2. Expanding tool coverage -3. Connecting chat conversations to proposal generation - -## Completed Infrastructure ✅ - -### Conversational Agent Service -- ✅ `ConversationalAgentService` - Full multi-step tool execution loop -- ✅ Chat history management with structured messages -- ✅ Table/JSON rendering support in chat messages -- ✅ ROM context integration -- ✅ Tool result replay without recursion - -### Chat Interfaces (3 Modes) -1. **FTXUI Chat** (`z3ed agent chat`) ✅ - - Full-screen interactive terminal - - Table rendering from JSON - - Syntax highlighting - - Production ready - -2. **Simple Chat** (`z3ed agent simple-chat`) ✅ NEW! - - Text-based REPL (no FTXUI) - - Batch mode support (`--file`) - - Better for AI/automation testing - - Commands: `quit`, `exit`, `reset` - -3. 
**GUI Chat Widget** ✅ (Already Integrated) - - Lives in `src/app/editor/system/agent_chat_widget.{h,cc}` - - Accessible via Debug → Agent Chat menu - - Shares `ConversationalAgentService` backend - - Table rendering for structured data - - Auto-scrolling, syntax highlighting - -### Tool System -- ✅ `ToolDispatcher` - Routes tool calls to handlers -- ✅ 5 read-only tools operational: - - `resource-list` - Enumerate labeled resources - - `dungeon-list-sprites` - Inspect room sprites - - `overworld-find-tile` - Search for tile16 IDs - - `overworld-describe-map` - Get map metadata - - `overworld-list-warps` - List entrances/exits/holes -- ✅ Automatic JSON output formatting -- ✅ CLI and agent service can both invoke tools - -### AI Backends -- ✅ Ollama (local) - qwen2.5-coder recommended -- ✅ Gemini (cloud) - Gemini 2.0 with function calling -- ✅ Health checks and auto-detection -- ✅ Graceful degradation with clear errors - -### Build System -- ✅ Z3ED_AI master flag consolidation -- ✅ Auto-managed dependencies (JSON, YAML, httplib, OpenSSL) -- ✅ Backward compatibility -- ✅ Clear error messages - -## In Progress 🚧 - -### Priority 1: Live LLM Testing (1-2h) -**Goal**: Verify function calling works end-to-end - -**Status**: Infrastructure complete, needs real-world testing -- Tool schemas generated -- System prompts include function definitions -- Response parsing implemented -- Dispatcher operational - -**Remaining**: -- Test with Gemini 2.0: "What dungeons exist?" 
-- Test with Ollama (qwen2.5-coder) -- Validate multi-step conversations -- Exercise all 5 tools with natural language - -### Priority 2: Proposal Integration (6-8h) -**Goal**: Connect chat to ROM modification workflow - -**Status**: Proposal system exists, needs chat integration -- ProposalRegistry ✅ operational -- Tile16ProposalGenerator ✅ working -- ProposalDrawer GUI ✅ integrated -- Sandbox ROM manager ✅ complete - -**Remaining**: -- Detect action intents in conversation -- Generate proposal from chat context -- Link proposal to conversation history -- GUI notification when proposal ready - -### Priority 3: Tool Coverage (8-10h) -**Goal**: Enable deeper ROM introspection - -**Next Tools**: -- Dialogue/text search -- Sprite info inspection -- Region/teleport tools -- Room connections -- Item locations - -## Code Files Status - -### New Files Created ✅ -- `src/cli/service/agent/simple_chat_session.h` ✅ -- `src/cli/service/agent/simple_chat_session.cc` ✅ -- CLI handler: `HandleSimpleChatCommand()` ✅ - -### Modified Files ✅ -- `src/cli/handlers/agent/commands.h` - Added simple-chat declaration -- `src/cli/handlers/agent/general_commands.cc` - Implemented handler -- `src/cli/handlers/agent.cc` - Added routing -- `src/cli/agent.cmake` - Added simple_chat_session.cc to build -- `docs/z3ed/README.md` - Condensed and clarified -- `docs/z3ed/AGENT-ROADMAP.md` - Streamlined with priorities - -### Existing Files (Already Working) -- `src/app/editor/system/agent_chat_widget.{h,cc}` - GUI widget ✅ -- `src/cli/service/agent/conversational_agent_service.{h,cc}` ✅ -- `src/cli/service/agent/tool_dispatcher.{h,cc}` ✅ -- `src/cli/tui/chat_tui.{h,cc}` - FTXUI interface ✅ - -### Removed/Unused Files -- `src/app/gui/widgets/agent_chat_widget.*` - DUPLICATE (not used) - - The real implementation is in `src/app/editor/system/` - - Should be removed to avoid confusion - -## Next Steps - -### Immediate (Today) -1. 
**Test Live LLM Function Calling** (1-2h) - ```bash - # Test Gemini - export GEMINI_API_KEY="your-key" - z3ed agent simple-chat --rom zelda3.sfc - > What dungeons are defined? - - # Test Ollama - ollama serve - z3ed agent simple-chat --rom zelda3.sfc - > List sprites in room 0x012 - ``` - -2. **Validate Simple Chat Mode** (30min) - ```bash - # Interactive - z3ed agent simple-chat --rom zelda3.sfc - - # Batch mode - echo "What dungeons exist?" > test.txt - echo "Find tile 0x02E" >> test.txt - z3ed agent simple-chat --file test.txt --rom zelda3.sfc - ``` - -### Short Term (This Week) -1. **Add Dialogue Tools** (3h) - - `dialogue-search --text "search term"` - - `dialogue-get --id 0x...` - -2. **Add Sprite Tools** (3h) - - `sprite-get-info --id 0x...` - - `overworld-list-sprites --map 0x...` - -3. **Start Proposal Integration** (4h) - - Detect "create", "add", "place" intents - - Generate proposal from chat context - - Link to ProposalGenerator - -### Medium Term (Next 2 Weeks) -1. **Complete Proposal Integration** - - GUI notifications - - Conversation → Proposal workflow - - Testing and refinement - -2. **Expand Tool Coverage** - - Region tools - - Connection/warp tools - - Advanced overworld queries - -3. **Performance Optimizations** - - Response caching - - Token usage tracking - - Streaming responses (optional) - -## Testing Checklist - -### Manual Testing -- [ ] Simple chat interactive mode -- [ ] Simple chat batch mode -- [ ] FTXUI chat with tables -- [ ] GUI chat widget in YAZE -- [ ] All 5 tools with natural language -- [ ] Multi-step conversations -- [ ] ROM context switching - -### LLM Testing -- [ ] Gemini function calling -- [ ] Ollama function calling -- [ ] Tool result incorporation -- [ ] Error handling -- [ ] Multi-turn context - -### Integration Testing -- [ ] Chat → Proposal generation -- [ ] Proposal review in GUI -- [ ] Accept/reject workflow -- [ ] Sandbox ROM management - -## Known Issues - -1. 
**Duplicate Widget Files** - - `src/app/gui/widgets/agent_chat_widget.*` not used - - Should remove to avoid confusion - - Real implementation in `src/app/editor/system/` - -2. **Function Calling Not Tested Live** - - Infrastructure complete but untested with real LLMs - - Need to verify Gemini/Ollama can call tools - -3. **No Proposal Integration** - - Chat conversations don't generate proposals yet - - Need to detect action intents and trigger generators - -## Build Commands - -```bash -# Full AI features -cmake -B build -DZ3ED_AI=ON -cmake --build build --target z3ed - -# With GUI automation -cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON -cmake --build build - -# Test -./build/bin/z3ed agent simple-chat --rom assets/zelda3.sfc -``` - -## Documentation Status - -### Updated βœ… -- `README.md` - Condensed with clear examples -- `AGENT-ROADMAP.md` - Streamlined priorities -- `IMPLEMENTATION_STATUS.md` - This file (NEW) - -### Still Current -- `E6-z3ed-cli-design.md` - Architecture reference -- `E6-z3ed-reference.md` - Command reference -- `E6-z3ed-implementation-plan.md` - Detailed plan - -### Could Be Condensed (Low Priority) -- `E6-z3ed-implementation-plan.md` - Very detailed, some overlap -- `E6-z3ed-reference.md` - Could merge with README - -## Success Metrics - -### Phase 1: Foundation βœ… COMPLETE -- [x] Conversational agent service -- [x] 3 chat interfaces (TUI, simple, GUI) -- [x] 5 read-only tools -- [x] Build system consolidation - -### Phase 2: Integration 🚧 IN PROGRESS -- [ ] Live LLM testing with function calling -- [ ] Proposal generation from chat -- [ ] 10+ read-only tools -- [ ] End-to-end workflow tested - -### Phase 3: Production πŸ“‹ PLANNED -- [ ] Response caching -- [ ] Token usage tracking -- [ ] Error recovery -- [ ] User testing and feedback diff --git a/docs/z3ed/E6-z3ed-reference.md b/docs/z3ed/E6-z3ed-reference.md deleted file mode 100644 index b144e0c5..00000000 --- a/docs/z3ed/E6-z3ed-reference.md +++ /dev/null @@ -1,1332 +0,0 @@ -# 
z3ed CLI Technical Reference - -**Version**: 0.1.0-alpha -**Last Updated**: [Current Date] -**Status**: Production Ready (macOS), Windows Testing Pending - ---- - -## Table of Contents - -1. [Architecture Overview](#architecture-overview) -2. [Command Reference](#command-reference) -3. [Implementation Guide](#implementation-guide) -4. [Testing & Validation](#testing--validation) -5. [Development Workflows](#development-workflows) -6. [Troubleshooting](#troubleshooting) -7. [API Reference](#api-reference) -8. [Platform Notes](#platform-notes) - ---- - -## Architecture Overview - -### System Components - -``` -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ AI Agent Layer (LLM) β”‚ -β”‚ └─ Natural language prompts β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ z3ed CLI (Command-Line Interface) β”‚ -β”‚ β”œβ”€ agent run --prompt "..." --sandbox β”‚ -β”‚ β”œβ”€ agent test --prompt "..." 
(IT-02) β”‚ -β”‚ β”œβ”€ agent list β”‚ -β”‚ β”œβ”€ agent diff --proposal-id β”‚ -β”‚ β”œβ”€ agent describe [--resource ] β”‚ -β”‚ β”œβ”€ rom info/validate/diff/generate-golden β”‚ -β”‚ β”œβ”€ palette export/import/list β”‚ -β”‚ β”œβ”€ overworld get-tile/find-tile/set-tile β”‚ -β”‚ └─ dungeon list-rooms/add-object β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ Service Layer (Singleton Services) β”‚ -β”‚ β”œβ”€ ProposalRegistry (proposal tracking) β”‚ -β”‚ β”œβ”€ RomSandboxManager (isolated ROM copies) β”‚ -β”‚ β”œβ”€ ResourceCatalog (machine-readable API specs) β”‚ -β”‚ β”œβ”€ GuiAutomationClient (gRPC wrapper) β”‚ -β”‚ β”œβ”€ TestWorkflowGenerator (NL β†’ test steps) β”‚ -β”‚ └─ PolicyEvaluator (YAML constraints) [Planned] β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ ImGuiTestHarness (gRPC Server) β”‚ -β”‚ β”œβ”€ Ping (health check) β”‚ -β”‚ β”œβ”€ Click (button, menu, tab) β”‚ -β”‚ β”œβ”€ Type (text input) β”‚ -β”‚ β”œβ”€ Wait (condition polling) β”‚ -β”‚ β”œβ”€ Assert (state validation) β”‚ -β”‚ β”œβ”€ Screenshot (capture) [Stub β†’ IT-08] β”‚ -β”‚ β”œβ”€ GetTestStatus (query test execution) [IT-05] β”‚ -β”‚ β”œβ”€ ListTests (enumerate tests) [IT-05] β”‚ -β”‚ β”œβ”€ GetTestResults (detailed results) [IT-05] β”‚ -β”‚ β”œβ”€ DiscoverWidgets (widget enumeration) [IT-06] β”‚ -β”‚ β”œβ”€ StartRecording (test recording) [IT-07] 
β”‚ -β”‚ β”œβ”€ StopRecording (finish recording) [IT-07] β”‚ -β”‚ └─ ReplayTest (execute test script) [IT-07] β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ YAZE GUI (ImGui Application) β”‚ -β”‚ β”œβ”€ ProposalDrawer (Debug β†’ Agent Proposals) β”‚ -β”‚ β”‚ β”œβ”€ List/detail views β”‚ -β”‚ β”‚ β”œβ”€ Accept/Reject/Delete β”‚ -β”‚ β”‚ └─ ROM merging β”‚ -β”‚ └─ Editor Windows β”‚ -β”‚ β”œβ”€ Overworld Editor β”‚ -β”‚ β”œβ”€ Dungeon Editor β”‚ -β”‚ β”œβ”€ Palette Editor β”‚ -β”‚ └─ Graphics Editor β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -### Data Flow: Proposal Lifecycle - -``` -User: z3ed agent run "Make soldiers red" --sandbox - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ MockAIService β”‚ β†’ ["palette export sprites_aux1 4 soldier.col"] -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ RomSandboxManager β”‚ β†’ Creates: /tmp/.../sandboxes/20251002T100000/zelda3.sfc -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ Execute Commands β”‚ β†’ Runs: palette export on sandbox ROM -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ ProposalRegistry β”‚ β†’ Creates: 
proposal-20251002T100000/ -β”‚ β”‚ β€’ execution.log -β”‚ β”‚ β€’ diff.txt (if generated) -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό (User opens YAZE GUI) -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ ProposalDrawer β”‚ β†’ Displays: List of proposals -β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ - β”‚ - β–Ό (User clicks "Accept") -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ AcceptProposal() β”‚ β†’ 1. Load sandbox ROM -β”‚ β”‚ 2. rom_->WriteVector(0, sandbox_rom.vector()) -β”‚ β”‚ 3. ROM marked dirty -β”‚ β”‚ 4. User saves ROM -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - ---- - -## Command Reference - -### Agent Commands - -#### `agent run` - Execute AI-driven ROM modifications -```bash -z3ed agent run --prompt "" --rom [--sandbox] - -Options: - --prompt Natural language description of desired changes - --rom Path to ROM file (default: current ROM) - --sandbox Create isolated copy for testing (recommended) - -Example: - z3ed agent run --prompt "Change soldier armor to red" \ - --rom=zelda3.sfc --sandbox -``` - -**Output**: -- Proposal ID -- Sandbox path -- Command execution log -- Next steps guidance - -#### `agent list` - Show all proposals -```bash -z3ed agent list - -Example Output: -=== Agent Proposals === - -ID: proposal-20251002T100000-1 - Status: Pending - Created: 2025-10-02 10:00:00 - Prompt: Change soldier armor to red - Commands: 3 - Bytes Changed: 128 - -Total: 1 proposal(s) -``` - -#### `agent diff` - Show proposal changes -```bash -z3ed agent diff [--proposal-id ] - -Options: - --proposal-id View specific proposal (default: latest pending) - -Example: - z3ed agent diff --proposal-id proposal-20251002T100000-1 -``` - -**Output**: -- Proposal metadata -- Execution log -- Diff content -- Next steps - -#### `agent describe` - Export machine-readable API 
specs -```bash -z3ed agent describe [--format ] [--resource ] [--output ] - -Options: - --format Output format: yaml or json (default: yaml) - --resource Filter to specific resource (rom, palette, etc.) - --output Write to file instead of stdout - -Examples: - z3ed agent describe --format yaml - z3ed agent describe --format json --resource rom - z3ed agent describe --output docs/api/z3ed-resources.yaml -``` - -**Resources Available**: -- `rom` - ROM file operations -- `patch` - Patch application -- `palette` - Palette manipulation -- `overworld` - Overworld editing -- `dungeon` - Dungeon editing -- `agent` - Agent commands - -#### `agent resource-list` - Enumerate labeled resources for the AI -```bash -z3ed agent resource-list --type [--format ] - -Options: - --type Required label family (dungeon, overworld, sprite, palette, etc.) - --format Output format, defaults to `table`. Use `json` for LLM tooling. - -Examples: - # Show dungeon labels in a table - z3ed agent resource-list --type dungeon - - # Emit JSON for the conversation agent to consume - z3ed agent resource-list --type overworld --format json -``` - -**Notes**: -- When the conversation agent invokes this tool, JSON output is requested automatically. -- Labels are loaded from `ResourceContextBuilder`, so the command reflects project-specific metadata. - -#### `agent dungeon-list-sprites` - Inspect sprites in a dungeon room -```bash -z3ed agent dungeon-list-sprites --room [--format ] - -Options: - --room Dungeon room ID (hexadecimal). Accepts `0x` prefixes or decimal. - --format Output format, defaults to `table`. - -Examples: - z3ed agent dungeon-list-sprites --room 0x012 - z3ed agent dungeon-list-sprites --room 18 --format json -``` - -**Output**: -- Table view prints sprite id/x/y in hex+decimal for quick inspection. -- JSON view is tailored for the LLM toolchain and is returned automatically during tool calls. 
-
-#### `agent chat` - Interactive terminal chat (TUI prototype)
-```bash
-z3ed agent chat
-```
-
-- Opens an FTXUI-based interface with scrolling history and input box.
-- Uses the shared `ConversationalAgentService`, so the same backend powers the GUI widget.
-- Useful for manual testing of tool dispatching and new prompting strategies.
-
-#### `agent test` - Automated GUI testing (IT-02)
-```bash
-z3ed agent test --prompt "" [--host ] [--port ]
-
-Options:
-  --prompt Natural language test description
-  --host Test harness hostname (default: localhost)
-  --port Test harness port (default: 50052)
-  --timeout Maximum test duration (default: 30)
-
-Supported Prompt Patterns:
-  - "Open editor"
-  - "Open and verify it loads"
-  - "Click