# z3ed Agent Roadmap

**Last Updated**: October 3, 2025

## Current Status

### ✅ Production Ready

- **Build System**: Z3ED_AI flag consolidation complete
- **AI Backends**: Ollama (local) and Gemini (cloud) operational
- **Conversational Agent**: Multi-step tool execution with chat history
- **Tool Dispatcher**: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
- **TUI Chat**: FTXUI-based interactive terminal interface
- **Simple Chat**: Text-mode REPL for AI testing (no FTXUI dependencies)
- **GUI Chat Widget**: ImGui-based widget (needs integration into main app)

### 🚧 Active Work

1. **Live LLM Testing** (1-2h): Verify function calling with real models
2. **GUI Integration** (4-6h): Wire AgentChatWidget into YAZE editor
3. **Proposal Workflow** (6-8h): End-to-end integration from chat to ROM changes

## Core Vision

Transform z3ed from a command-line tool into a **conversational ROM hacking assistant** where users can:

- Ask questions about ROM contents ("What dungeons exist?")
- Inspect game data interactively ("How many soldiers in room X?")
- Build changes incrementally through dialogue
- Generate proposals from conversation context

## Technical Architecture

### 1. Conversational Agent Service ✅

**Status**: Complete

- `ConversationalAgentService`: Manages chat sessions and tool execution
- Integrates with Ollama/Gemini AI services
- Handles tool calls with automatic JSON formatting
- Maintains conversation history and context

### 2. Read-Only Tools ✅

**Status**: 5 tools implemented

- `resource-list`: Enumerate labeled resources
- `dungeon-list-sprites`: Inspect sprites in rooms
- `overworld-find-tile`: Search for tile16 IDs
- `overworld-describe-map`: Get map metadata
- `overworld-list-warps`: List entrances/exits/holes

**Next**: Add dialogue, sprite info, and region inspection tools

### 3. Chat Interfaces

**Status**: Multiple modes available

- **TUI (FTXUI)**: Full-screen interactive terminal (✅ complete)
- **Simple Mode**: Text REPL for automation/testing (✅ complete)
- **GUI (ImGui)**: Dockable widget in YAZE (⚠️ needs integration)

### 4. Proposal Workflow Integration

**Status**: Planned

**Goal**: When the user requests ROM changes, the agent generates a proposal:

1. User chats to explore the ROM
2. User requests a change ("add two more soldiers")
3. Agent generates commands → creates a proposal
4. User reviews with `agent diff` or the GUI
5. User accepts/rejects the proposal

## Immediate Priorities

### Priority 1: Live LLM Testing (1-2 hours)

Verify function calling works end-to-end:

- Test Gemini 2.0 with natural language prompts
- Test Ollama (qwen2.5-coder) with tool discovery
- Validate multi-step conversations
- Exercise all 5 tools

### Priority 2: GUI Chat Integration (4-6 hours)

Wire AgentChatWidget into the main YAZE editor:

- Add menu item: Debug → Agent Chat
- Connect to shared ConversationalAgentService
- Test with loaded ROM context
- Add history persistence

### Priority 3: Proposal Generation (6-8 hours)

Connect chat to the ROM modification workflow (detailed under Next Steps below).

## Technical Implementation Plan

### 1. Conversational Agent Service

- **Description**: A new service that manages the back-and-forth between the user and the LLM. It maintains chat history and orchestrates the agent's different modes (Q&A vs. command generation).
- **Components**:
  - `ConversationalAgentService`: The main class for managing the chat session.
  - Integration with existing `AIService` implementations (Ollama, Gemini).
- **Status**: In progress — baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.
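The sketch below shows one way such a multi-step tool loop can be structured. It is illustrative only: the `AIService`/`ToolDispatcher` interfaces, the message structs, and the plain-string history are assumptions made for this roadmap, not the actual yaze classes.

```cpp
// Illustrative sketch only: AIService, ToolDispatcher, and the message
// structs are assumed shapes, not the real yaze interfaces.
#include <string>
#include <vector>

struct ToolCall {
  std::string name;  // e.g. "dungeon-list-sprites"
  std::string args;  // JSON-encoded arguments produced by the model
};

struct AgentReply {
  std::string text;                  // natural-language answer (may be empty)
  std::vector<ToolCall> tool_calls;  // empty once the model is done
};

class AIService {
 public:
  virtual ~AIService() = default;
  virtual AgentReply Complete(const std::vector<std::string>& history) = 0;
};

class ToolDispatcher {
 public:
  // Stub: the real dispatcher routes to the five read-only commands.
  std::string Run(const ToolCall& call) {
    return R"({"status":"ok","tool":")" + call.name + "\"}";
  }
};

class ConversationalAgentService {
 public:
  ConversationalAgentService(AIService* ai, ToolDispatcher* tools)
      : ai_(ai), tools_(tools) {}

  // One user turn: loop while the model requests tools, replaying every tool
  // result into the history so the next completion can build on it.
  std::string SendMessage(const std::string& user_message) {
    history_.push_back("user: " + user_message);
    for (int step = 0; step < kMaxToolSteps; ++step) {
      AgentReply reply = ai_->Complete(history_);
      if (reply.tool_calls.empty()) {
        history_.push_back("assistant: " + reply.text);
        return reply.text;  // final answer for this turn
      }
      for (const ToolCall& call : reply.tool_calls) {
        // Recording results as history entries (rather than re-entering the
        // model inside this loop body) is what prevents recursion.
        history_.push_back("tool(" + call.name + "): " + tools_->Run(call));
      }
    }
    return "Tool budget exceeded; please narrow the request.";
  }

 private:
  static constexpr int kMaxToolSteps = 8;
  AIService* ai_;
  ToolDispatcher* tools_;
  std::vector<std::string> history_;
};
```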
### 2. Read-Only "Tools" for the Agent

- **Description**: To enable the agent to answer questions, we need to expand `z3ed` with a suite of read-only commands that the LLM can call. This aligns with the "tool use" or "function calling" capabilities of modern LLMs.
- **Example Tools to Implement**:
  - `resource list --type <type>`: List all user-defined labels of a certain type.
  - `dungeon list-sprites --room <room-id>`: List all sprites in a given room.
  - `dungeon get-info --room <room-id>`: Get metadata for a specific room.
  - `overworld find-tile --tile <tile-id>`: Find all occurrences of a specific tile on the overworld map.
- **Advanced Editing Tools (for future implementation)**:
  - `overworld set-area --map <map-id> --x <x> --y <y> --width <width> --height <height> --tile <tile-id>`
  - `overworld replace-tile --map <map-id> --from <tile-id> --to <tile-id>`
  - `overworld blend-tiles --map <map-id> --pattern <pattern> --density <density>`
- **Status**: Foundational commands (`resource-list`, `dungeon-list-sprites`) are live with JSON output. Focus is shifting to high-value overworld and dialogue inspection tools.

### 3. TUI and GUI Chat Interfaces

- **Description**: User-facing components for interacting with the `ConversationalAgentService`.
- **Components**:
  - **TUI**: A new full-screen component in `z3ed` using FTXUI, providing a rich chat experience in the terminal.
  - **GUI**: A new ImGui widget that can be docked into the main `yaze` application window.
- **Status**: In progress — CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Input ergonomics and synchronized history navigation still need improvement.

### 4. Integration with the Proposal Workflow

- **Description**: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the `ConversationalAgentService` will trigger the existing `Tile16ProposalGenerator` (and future proposal generators for other resource types) to create a proposal.
- **Workflow**:
  1. User chats with the agent to explore the ROM.
  2. User asks the agent to make a change.
  3. `ConversationalAgentService` generates the commands and passes them to the appropriate `ProposalGenerator`.
  4. A new proposal is created and saved.
  5. The TUI/GUI notifies the user that a proposal is ready for review.
  6. User uses the `agent diff` and `agent accept` commands (or UI equivalents) to review and apply the changes.
- **Status**: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
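As a rough illustration of step 3 of this workflow, the sketch below shows one possible shape for the hand-off from conversation to proposal. The `ProposalGenerator` interface, `Proposal` struct, and helper names are assumptions for this document (only `Tile16ProposalGenerator` is an existing component named above), and the keyword check is a placeholder for the model's own function-calling output.

```cpp
// Assumed shapes for the conversation -> proposal hand-off; not the yaze API.
#include <string>
#include <vector>

struct Proposal {
  std::string id;
  std::vector<std::string> commands;  // z3ed commands that implement the edit
};

class ProposalGenerator {
 public:
  virtual ~ProposalGenerator() = default;
  // Turns a list of generated z3ed commands into a reviewable proposal.
  virtual Proposal Generate(const std::vector<std::string>& commands) = 0;
};

// Very rough intent check; the real service would rely on the LLM's
// function-calling output rather than keyword matching.
inline bool LooksLikeEditRequest(const std::string& prompt) {
  return prompt.find("add") != std::string::npos ||
         prompt.find("replace") != std::string::npos ||
         prompt.find("change") != std::string::npos;
}

// Called once the model has produced concrete commands
// (e.g. from "Okay, now add two more soldiers").
inline Proposal MakeProposalFromChat(
    ProposalGenerator& generator, const std::vector<std::string>& commands) {
  Proposal proposal = generator.Generate(commands);
  // The TUI/GUI would now notify the user that `agent diff` / `agent accept`
  // can be used to review and apply the proposal.
  return proposal;
}
```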
## Next Steps

### Immediate Priorities

1. **✅ Build System Consolidation** (COMPLETE - Oct 3, 2025):
   - ✅ Created Z3ED_AI master flag for simplified builds
   - ✅ Fixed Gemini crash with graceful degradation
   - ✅ Updated documentation with new build instructions
   - ✅ Tested both Ollama and Gemini backends
   - **Next**: Update CI/CD workflows to use `-DZ3ED_AI=ON`
2. **Live LLM Testing** (NEXT UP - 1-2 hours):
   - Verify function calling works with real Ollama/Gemini
   - Test multi-step tool execution
   - Validate all 5 tools with natural language prompts
3. **Expand Overworld Tool Coverage**:
   - ✅ Ship read-only tile searches (`overworld find-tile`) with shared formatting for CLI and agent calls.
   - Next: add area summaries, teleport destination lookups, and keep JSON/Text parity for all new tools.
4. **Polish the TUI Chat Experience**:
   - Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
   - Align log file output with on-screen formatting for easier debugging.
5. **Document & Test the New Tooling**:
   - Update the main `README.md` and relevant docs to cover the new chat formatting.
   - Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
6. **Build GUI Chat Widget**:
   - Create the ImGui component.
   - Ensure it shares the same backend service as the TUI.
7. **Full Integration with Proposal System**:
   - Implement the logic for the agent to transition from conversation to proposal generation.
8. **Expand Tool Arsenal**:
   - Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
9. **Multi-Modal Agent**:
   - Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
10. **Advanced Configuration**:
    - Implement environment variables for selecting AI providers and models (e.g., `YAZE_AI_PROVIDER`, `OLLAMA_MODEL`).
    - Add CLI flags for overriding the provider and model on a per-command basis.
11. **Performance and Cost-Saving**:
    - Implement a response cache to reduce latency and API costs.
    - Add token usage tracking and reporting.

## Current Status & Next Steps (Updated: October 3, 2025)

We have made significant progress in laying the foundation for the conversational agent.

### ✅ Completed

- **Build System Consolidation**: ✅ **NEW** Z3ED_AI master flag (Oct 3, 2025)
  - Single flag enables all AI features: `-DZ3ED_AI=ON`
  - Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
  - Fixed Gemini crash when API key set but JSON disabled
  - Graceful degradation with clear error messages
  - Backward compatible with old flags
  - Ready for build modularization (enables optional `libyaze_agent.a`)
  - **Docs**: `docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md`
- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
  - Handles tool calls with automatic JSON output format
  - Prevents recursion through proper tool result replay
  - Supports conversation history and context management
- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
  - Renders tables from JSON tool results
  - Pretty-prints JSON payloads with syntax formatting
  - Scrollable history with user/agent distinction
- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
  - `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
  - `dungeon-list-sprites`: Inspect sprites in dungeon rooms
  - `overworld-find-tile`: Search for tile16 IDs across maps
  - `overworld-describe-map`: Get comprehensive map metadata
  - `overworld-list-warps`: List entrances/exits/holes with filtering
- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON (see the sketch after this list)
  - Automatic table generation from JSON arrays/objects
  - Column-aligned formatting with headers
  - Graceful fallback to text for malformed data
- **ROM Context Integration**: ✅ Tools can access the loaded ROM or load one from the `--rom` flag
  - Shared ROM context passed through ConversationalAgentService
  - Automatic ROM loading with error handling
- **AI Service Foundation**: ✅ Ollama and Gemini services operational
  - Enhanced prompting system with resource catalogue loading
  - System instruction generation with examples
  - Health checks and model availability validation
  - Both backends tested and working in production
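The sketch below illustrates the structured-output path noted in the list above: turning a JSON array returned by a tool into a column-aligned table, with a fallback to pretty-printed JSON. It assumes the nlohmann::json dependency that `Z3ED_AI` already pulls in; the function name and formatting details are placeholders, not the real TUI renderer.

```cpp
// Sketch of table rendering for JSON tool results; assumes nlohmann::json.
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

#include <nlohmann/json.hpp>

// Renders a JSON array of flat objects as a column-aligned table and falls
// back to pretty-printed JSON for anything else.
void RenderToolResult(const std::string& payload) {
  nlohmann::json doc =
      nlohmann::json::parse(payload, nullptr, /*allow_exceptions=*/false);
  if (!doc.is_array() || doc.empty() || !doc.front().is_object()) {
    std::cout << doc.dump(2) << "\n";  // graceful fallback
    return;
  }
  // Column names come from the first row's keys.
  std::vector<std::string> headers;
  for (auto& item : doc.front().items()) headers.push_back(item.key());

  auto cell_text = [](const nlohmann::json& row, const std::string& key) {
    return row.contains(key) ? row.at(key).dump() : std::string("-");
  };

  // Column widths start at the header width and grow to fit cell values.
  std::vector<std::size_t> widths;
  for (const auto& h : headers) widths.push_back(h.size());
  for (const auto& row : doc)
    for (std::size_t i = 0; i < headers.size(); ++i)
      widths[i] = std::max(widths[i], cell_text(row, headers[i]).size());

  auto print_row = [&](const std::vector<std::string>& cells) {
    for (std::size_t i = 0; i < cells.size(); ++i)
      std::cout << std::left << std::setw(static_cast<int>(widths[i]) + 2)
                << cells[i];
    std::cout << "\n";
  };
  print_row(headers);
  for (const auto& row : doc) {
    std::vector<std::string> cells;
    for (const auto& h : headers) cells.push_back(cell_text(row, h));
    print_row(cells);
  }
}
```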
### 🚧 In Progress

- **Live LLM Testing**: Ready to execute with real Ollama/Gemini
  - All infrastructure complete (function calling, tool schemas, response parsing)
  - Need to verify multi-step tool execution with live models
  - Test scenarios prepared for all 5 tools
  - **Estimated Time**: 1-2 hours
- **GUI Chat Widget**: Not yet started
  - TUI implementation complete and can serve as reference
  - Should reuse table/JSON rendering logic from TUI
  - Target: `src/app/gui/debug/agent_chat_widget.{h,cc}`
  - **Estimated Time**: 6-8 hours

### 🚀 Next Steps (Priority Order)

#### Priority 1: Live LLM Testing with Function Calling (1-2 hours)

**Goal**: Verify Ollama/Gemini can autonomously invoke tools in production

**Infrastructure Complete** ✅:

- ✅ Tool schema generation (`BuildFunctionCallSchemas()`)
- ✅ System prompts include function definitions
- ✅ AI services parse `tool_calls` from responses
- ✅ ConversationalAgentService dispatches to ToolDispatcher
- ✅ All 5 tools tested independently

**Testing Tasks**:

1. **Gemini Testing** (30 min)
   - Verify Gemini 2.0 generates correct `tool_calls` JSON
   - Test prompt: "What dungeons are in this ROM?"
   - Verify tool result fed back into conversation
   - Test multi-step: "Now list sprites in the first dungeon"
2. **Ollama Testing** (30 min)
   - Verify qwen2.5-coder discovers and calls tools
   - Same test prompts as Gemini
   - Compare response quality between models
3. **Tool Coverage Testing** (30 min)
   - Exercise all 5 tools with natural language prompts
   - Verify JSON output formats correctly
   - Test error handling (invalid room IDs, etc.)

**Success Criteria**:

- LLM autonomously calls tools without explicit command syntax
- Tool results incorporated into follow-up responses
- Multi-turn conversations work with context

#### Priority 2: Implement GUI Chat Widget (6-8 hours)

**Goal**: Unified chat experience in the YAZE application

1. **Create ImGui Chat Widget** (4 hours)
   - File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
   - Reuse table/JSON rendering logic from the TUI implementation
   - Add to Debug menu: `Debug → Agent Chat`
   - Share `ConversationalAgentService` instance with the TUI
2. **Add Chat History Persistence** (2 hours)
   - Save chat history to `.yaze/agent_chat_history.json` (see the sketch after this list)
   - Load on startup, display in GUI/TUI
   - Add "Clear History" button
3. **Polish Input Experience** (2 hours)
   - Multi-line input support (Shift+Enter for newline, Enter to send)
   - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
   - Auto-scroll to bottom on new messages
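A minimal sketch of the planned history persistence, assuming a simple role/text message record serialized with nlohmann::json. The path matches the plan above; the struct and function names are placeholders rather than existing yaze code.

```cpp
// Sketch of chat history persistence to .yaze/agent_chat_history.json.
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

#include <nlohmann/json.hpp>

struct ChatMessage {
  std::string role;  // "user", "assistant", or "tool"
  std::string text;
};

inline std::filesystem::path HistoryPath() {
  return std::filesystem::path(".yaze") / "agent_chat_history.json";
}

inline void SaveHistory(const std::vector<ChatMessage>& messages) {
  nlohmann::json doc = nlohmann::json::array();
  for (const auto& m : messages)
    doc.push_back(nlohmann::json{{"role", m.role}, {"text", m.text}});
  std::filesystem::create_directories(HistoryPath().parent_path());
  std::ofstream out(HistoryPath());
  out << doc.dump(2);
}

inline std::vector<ChatMessage> LoadHistory() {
  std::vector<ChatMessage> messages;
  std::ifstream in(HistoryPath());
  if (!in) return messages;  // first run: nothing saved yet
  nlohmann::json doc =
      nlohmann::json::parse(in, nullptr, /*allow_exceptions=*/false);
  if (!doc.is_array()) return messages;  // ignore corrupt files
  for (const auto& entry : doc)
    messages.push_back({entry.value("role", ""), entry.value("text", "")});
  return messages;
}
```

Sharing these helpers between the TUI and the ImGui widget would keep both front-ends reading and writing the same history file.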
#### Priority 3: Proposal Generation (6-8 hours)

Connect chat to the ROM modification workflow:

- Detect action intents in conversation
- Generate proposal from accumulated context
- Link proposal to chat history
- GUI notification when proposal ready

#### Expand Tool Coverage (Dialogue, Sprite, Overworld)

**Goal**: Enable deeper ROM introspection for level design questions

1. **Dialogue/Text Tools** (3 hours)
   - `dialogue-search --text "search term"`: Find text in ROM dialogue
   - `dialogue-get --id 0x...`: Get dialogue by message ID
2. **Sprite Tools** (3 hours)
   - `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
   - `overworld-list-sprites --map 0x...`: Sprites on overworld map
3. **Advanced Overworld Tools** (4 hours)
   - `overworld-get-region --map 0x...`: Region boundaries and properties
   - `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
   - `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value

#### Priority 4: Performance and Caching (4-6 hours)

1. **Response Caching** (3 hours)
   - Implement LRU cache for identical prompts
   - Cache tool results by (tool_name, args) key
   - Configurable TTL (default: 5 minutes for ROM introspection)
2. **Token Usage Tracking** (2 hours)
   - Log tokens per request (Ollama and Gemini APIs provide this)
   - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
   - Add `--show-token-usage` flag to CLI commands
3. **Streaming Responses** (optional, 3-4 hours)
   - Use Ollama/Gemini streaming APIs
   - Update GUI/TUI to show partial responses as they arrive
   - Improves perceived latency for long responses

## Command Reference

### Chat Modes

```bash
# Interactive TUI chat (FTXUI)
z3ed agent chat --rom zelda3.sfc

# Simple text mode (for automation/AI testing)
z3ed agent simple-chat --rom zelda3.sfc

# Batch mode from file
z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
```

### Tool Commands (for direct testing)

```bash
# List dungeons
z3ed agent resource-list --type dungeon --format json

# Find tiles
z3ed agent overworld-find-tile --tile 0x02E --map 0x05

# List sprites in room
z3ed agent dungeon-list-sprites --room 0x012
```

## Build Quick Reference

```bash
# Full AI features
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

# With GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build

# Minimal (no AI)
cmake -B build
cmake --build build --target z3ed
```

## Future Enhancements

### Short Term (1-2 months)

- Dialogue/text search tools
- Sprite info inspection
- Region/teleport tools
- Response caching (see the sketch below)
- Token usage tracking

### Medium Term (3-6 months)

- Multi-modal agent (image generation)
- Advanced configuration (env vars, model selection)
- Proposal templates for common edits
- Undo/redo in conversations

### Long Term (6+ months)

- Visual diff viewer for proposals
- Collaborative editing sessions
- Learning from user feedback
- Custom tool plugins
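The response caching called out under Priority 4 and in the short-term list above could take roughly the following shape: a cache keyed by `(tool_name, args)` with a configurable TTL. This is a sketch under assumed names; LRU eviction and thread-safety are deliberately omitted.

```cpp
// Sketch of a (tool_name, args)-keyed tool-result cache with a TTL.
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

class ToolResultCache {
 public:
  using Clock = std::chrono::steady_clock;

  explicit ToolResultCache(std::chrono::seconds ttl = std::chrono::minutes(5))
      : ttl_(ttl) {}

  std::optional<std::string> Get(const std::string& tool,
                                 const std::string& args) {
    auto it = entries_.find(Key(tool, args));
    if (it == entries_.end()) return std::nullopt;
    if (Clock::now() - it->second.stored_at > ttl_) {
      entries_.erase(it);  // expired: ROM introspection data may be stale
      return std::nullopt;
    }
    return it->second.result;
  }

  void Put(const std::string& tool, const std::string& args,
           std::string result) {
    entries_[Key(tool, args)] = {std::move(result), Clock::now()};
  }

 private:
  struct Entry {
    std::string result;
    Clock::time_point stored_at;
  };
  static std::string Key(const std::string& tool, const std::string& args) {
    return tool + "\n" + args;  // args assumed to be canonical JSON
  }
  std::chrono::seconds ttl_;
  std::unordered_map<std::string, Entry> entries_;
};
```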
## z3ed Build Quick Reference

```bash
# Full AI features (Ollama + Gemini)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

# AI + GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed

# Minimal build (no AI)
cmake -B build
cmake --build build --target z3ed
```

## Build Flags Explained

| Flag | Purpose | Dependencies | When to Use |
|------|---------|--------------|-------------|
| `Z3ED_AI=ON` | **Master flag** for AI features | JSON, YAML, httplib, (OpenSSL*) | Want Ollama or Gemini support |
| `YAZE_WITH_GRPC=ON` | GUI automation & testing | gRPC, Protobuf, (auto-enables JSON) | Want GUI test harness |
| `YAZE_WITH_JSON=ON` | Low-level JSON support | nlohmann_json | Auto-enabled by above flags |

\*OpenSSL optional - required for Gemini (HTTPS); Ollama works without it

## Feature Matrix

| Feature | No Flags | Z3ED_AI | Z3ED_AI + GRPC |
|---------|----------|---------|----------------|
| Basic CLI | ✅ | ✅ | ✅ |
| Ollama (local) | ❌ | ✅ | ✅ |
| Gemini (cloud) | ❌ | ✅* | ✅* |
| TUI Chat | ❌ | ✅ | ✅ |
| GUI Test Automation | ❌ | ❌ | ✅ |
| Tool Dispatcher | ❌ | ✅ | ✅ |
| Function Calling | ❌ | ✅ | ✅ |

\*Requires OpenSSL for HTTPS

## Common Build Scenarios

### Developer (AI features, no GUI testing)

```bash
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
```

### Full Stack (AI + GUI automation)

```bash
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed -j8
```

### CI/CD (minimal, fast)

```bash
cmake -B build -DYAZE_MINIMAL_BUILD=ON
cmake --build build -j$(nproc)
```

### Release Build (optimized)

```bash
cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --target z3ed -j8
```

## Migration from Old Flags

### Before (Confusing)

```bash
cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
```

### After (Clear Intent)

```bash
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
```

**Note**: Old flags still work for backward compatibility!

## Troubleshooting

### "Build with -DZ3ED_AI=ON" warning

**Symptom**: AI commands fail with "JSON support required"

**Fix**: Rebuild with the AI flag

```bash
rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build
```

### "OpenSSL not found" warning

**Symptom**: Gemini API doesn't work

**Impact**: Only affects Gemini (cloud);
Ollama (local) works fine

**Fix (optional)**:

```bash
# macOS
brew install openssl

# Linux
sudo apt install libssl-dev

# Then rebuild
cmake -B build -DZ3ED_AI=ON && cmake --build build
```

### Ollama vs Gemini not auto-detecting

**Symptom**: Wrong backend selected

**Fix**: Set an explicit provider

```bash
# Force Ollama
export YAZE_AI_PROVIDER=ollama
./build/bin/z3ed agent plan --prompt "test"

# Force Gemini
export YAZE_AI_PROVIDER=gemini
export GEMINI_API_KEY="your-key"
./build/bin/z3ed agent plan --prompt "test"
```

## Environment Variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `YAZE_AI_PROVIDER` | auto | Force `ollama` or `gemini` |
| `GEMINI_API_KEY` | - | Gemini API key (enables Gemini) |
| `OLLAMA_MODEL` | `qwen2.5-coder:7b` | Override Ollama model |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Override Gemini model |

## Platform-Specific Notes

### macOS

- OpenSSL auto-detected via Homebrew
- Keychain integration for SSL certs
- Recommended: `brew install openssl ollama`

### Linux

- OpenSSL typically pre-installed
- Install via: `sudo apt install libssl-dev`
- Ollama: download from https://ollama.com

### Windows

- Use Ollama (no SSL required)
- Gemini requires OpenSSL (harder to set up on Windows)
- Recommended: focus on Ollama for Windows builds

## Performance Tips

### Faster Incremental Builds

```bash
# Use Ninja instead of Make
cmake -B build -GNinja -DZ3ED_AI=ON
ninja -C build z3ed

# Enable ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake -B build -DZ3ED_AI=ON
```

### Reduce Build Scope

```bash
# Only build z3ed (not the full yaze app)
cmake --build build --target z3ed

# Parallel build
cmake --build build --target z3ed -j$(nproc)
```

## Related Documentation

- **Migration Guide**: [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
- **Technical Roadmap**: [AGENT-ROADMAP.md](AGENT-ROADMAP.md)
- **Main README**: [README.md](README.md)
- **Build Modularization**: `../../build_modularization_plan.md`

## Quick Test

Verify your build works:

```bash
# Check z3ed runs
./build/bin/z3ed --version

# Test AI detection
./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5

# Expected output (with Z3ED_AI=ON):
# 🤖 Using Gemini AI with model: gemini-2.5-flash
# or
# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
# or
# 🤖 Using MockAIService (no LLM configured)
```

## Support

If you encounter issues:

1. Check this guide's troubleshooting section
2. Review [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
3. Verify the CMake output for warnings
4. Open an issue with build logs

## Summary

**Recommended for most users**:

```bash
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
./build/bin/z3ed agent chat
```

This gives you:

- ✅ Ollama support (local, free)
- ✅ Gemini support (cloud, API key required)
- ✅ TUI chat interface
- ✅ Tool dispatcher with 5 commands
- ✅ Function calling support
- ✅ All AI agent features