z3ed Agent Roadmap
Last Updated: October 3, 2025
Current Status
✅ Production Ready
- Build System: Z3ED_AI flag consolidation complete
- AI Backends: Ollama (local) and Gemini (cloud) operational
- Conversational Agent: Multi-step tool execution with chat history
- Tool Dispatcher: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
- TUI Chat: FTXUI-based interactive terminal interface
- Simple Chat: Text-mode REPL for AI testing (no FTXUI dependencies)
- GUI Chat Widget: ImGui-based widget (needs integration into main app)
🚧 Active Work
- Live LLM Testing (1-2h): Verify function calling with real models
- GUI Integration (4-6h): Wire AgentChatWidget into YAZE editor
- Proposal Workflow (6-8h): End-to-end integration from chat to ROM changes
Core Vision
Transform z3ed from a command-line tool into a conversational ROM hacking assistant where users can:
- Ask questions about ROM contents ("What dungeons exist?")
- Inspect game data interactively ("How many soldiers in room X?")
- Build changes incrementally through dialogue
- Generate proposals from conversation context
Technical Architecture
1. Conversational Agent Service ✅
Status: Complete
- ConversationalAgentService: Manages chat sessions and tool execution
- Integrates with Ollama/Gemini AI services
- Handles tool calls with automatic JSON formatting
- Maintains conversation history and context
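A minimal sketch of how such a tool-execution loop can be structured, for orientation only. The class and method names here (AIService, ToolDispatcher, RunAgentTurn) are illustrative stand-ins rather than the actual z3ed API; only the overall pattern, replaying tool results into the history until the model returns plain text with a step cap to prevent recursion, reflects the roadmap.

```cpp
// Sketch of a multi-step tool-execution loop. AIService, ToolDispatcher,
// and RunAgentTurn are illustrative stand-ins, not the actual z3ed classes.
#include <string>
#include <vector>

struct ToolCall {
  std::string name;       // e.g. "resource-list"
  std::string args_json;  // JSON-encoded arguments
};

struct AgentReply {
  std::string text;                  // natural-language answer, if any
  std::vector<ToolCall> tool_calls;  // requested tool invocations, if any
};

class AIService {  // wraps Ollama or Gemini
 public:
  virtual ~AIService() = default;
  virtual AgentReply Complete(const std::vector<std::string>& history) = 0;
};

class ToolDispatcher {  // executes read-only tools and returns JSON text
 public:
  std::string Dispatch(const ToolCall& call) {
    // A real dispatcher routes to resource-list, dungeon-list-sprites, etc.
    return "{}";
  }
};

// Replay tool results into the conversation until the model produces a
// plain-text answer, with a step cap to prevent unbounded recursion.
std::string RunAgentTurn(AIService& ai, ToolDispatcher& tools,
                         std::vector<std::string>& history,
                         const std::string& user_prompt, int max_steps = 8) {
  history.push_back("user: " + user_prompt);
  for (int step = 0; step < max_steps; ++step) {
    AgentReply reply = ai.Complete(history);
    if (reply.tool_calls.empty()) {
      history.push_back("agent: " + reply.text);
      return reply.text;  // final answer
    }
    for (const ToolCall& call : reply.tool_calls) {
      history.push_back("tool(" + call.name + "): " + tools.Dispatch(call));
    }
  }
  return "Tool loop exceeded max steps";  // safety valve
}
```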
2. Read-Only Tools ✅
Status: 5 tools implemented
- resource-list: Enumerate labeled resources
- dungeon-list-sprites: Inspect sprites in rooms
- overworld-find-tile: Search for tile16 IDs
- overworld-describe-map: Get map metadata
- overworld-list-warps: List entrances/exits/holes
Next: Add dialogue, sprite info, and region inspection tools
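For orientation, a hypothetical registry sketch showing how read-only tools can map names to handlers that take JSON arguments and return JSON results. Only the tool names and the JSON output convention come from this roadmap; the registry API, argument names, and placeholder values are assumptions.

```cpp
// Hypothetical registry of read-only tools: each handler takes JSON args and
// returns a JSON result. Only the tool names and JSON convention come from
// the roadmap; everything else is illustrative.
#include <functional>
#include <map>
#include <string>
#include <nlohmann/json.hpp>

using nlohmann::json;
using ToolHandler = std::function<json(const json& args)>;

std::map<std::string, ToolHandler> BuildToolRegistry() {
  std::map<std::string, ToolHandler> registry;

  // resource-list: enumerate labeled resources of a given type.
  registry["resource-list"] = [](const json& args) {
    const std::string type = args.value("type", "dungeon");
    // A real handler would query the loaded ROM's label database.
    return json{{"type", type}, {"resources", json::array()}};
  };

  // dungeon-list-sprites: inspect sprites in a room.
  registry["dungeon-list-sprites"] = [](const json& args) {
    const std::string room = args.value("room", "0x000");
    return json{{"room", room}, {"sprites", json::array()}};
  };

  return registry;
}
```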
3. Chat Interfaces
Status: Multiple modes available
- TUI (FTXUI): Full-screen interactive terminal (✅ complete)
- Simple Mode: Text REPL for automation/testing (✅ complete)
- GUI (ImGui): Dockable widget in YAZE (⚠️ needs integration)
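A minimal ImGui layout sketch of what the dockable chat panel could look like: a scrollable history region plus an input row that submits on Enter. The function name, ChatLine struct, and include path are assumptions; the real AgentChatWidget will differ.

```cpp
// Layout sketch for a dockable ImGui chat panel: scrollable history plus an
// input row that sends on Enter. Function and struct names are assumptions.
#include <cstddef>
#include <functional>
#include <string>
#include <vector>
#include "imgui.h"

struct ChatLine {
  bool from_user;
  std::string text;
};

void DrawAgentChat(std::vector<ChatLine>& history, char* input_buf,
                   size_t input_len,
                   const std::function<void(std::string)>& on_send) {
  if (!ImGui::Begin("Agent Chat")) {
    ImGui::End();
    return;
  }

  // Scrollable history region, leaving room for the input row below.
  ImGui::BeginChild("history",
                    ImVec2(0, -ImGui::GetFrameHeightWithSpacing()));
  for (const ChatLine& line : history) {
    ImGui::TextWrapped("%s %s", line.from_user ? "you:" : "agent:",
                       line.text.c_str());
  }
  ImGui::EndChild();

  // Input row: Enter submits the prompt to the shared agent service.
  if (ImGui::InputText("##prompt", input_buf, input_len,
                       ImGuiInputTextFlags_EnterReturnsTrue)) {
    on_send(input_buf);
    input_buf[0] = '\0';
  }
  ImGui::End();
}
```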
4. Proposal Workflow Integration
Status: Planned
Goal: When user requests ROM changes, agent generates proposal
- User chats to explore ROM
- User requests change ("add two more soldiers")
- Agent generates commands → creates proposal
- User reviews with agent diff or the GUI
- User accepts/rejects proposal
Immediate Priorities
Priority 1: Live LLM Testing (1-2 hours)
Verify function calling works end-to-end:
- Test Gemini 2.0 with natural language prompts
- Test Ollama (qwen2.5-coder) with tool discovery
- Validate multi-step conversations
- Exercise all 5 tools
Priority 2: GUI Chat Integration (4-6 hours)
Wire AgentChatWidget into main YAZE editor:
- Add menu item: Debug → Agent Chat
- Connect to shared ConversationalAgentService
- Test with loaded ROM context
- Add history persistence
Priority 3: Proposal Generation (6-8 hours)
Technical Implementation Plan
1. Conversational Agent Service
- Description: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
- Components:
- ConversationalAgentService: The main class for managing the chat session.
- Integration with existing AIService implementations (Ollama, Gemini).
- Status: In progress — baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.
2. Read-Only "Tools" for the Agent
- Description: To enable the agent to answer questions, we need to expand z3ed with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
- Example Tools to Implement:
  - resource list --type <dungeon|sprite|...>: List all user-defined labels of a certain type.
  - dungeon list-sprites --room <id|label>: List all sprites in a given room.
  - dungeon get-info --room <id|label>: Get metadata for a specific room.
  - overworld find-tile --tile <id>: Find all occurrences of a specific tile on the overworld map.
- Advanced Editing Tools (for future implementation):
  - overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <id>
  - overworld replace-tile --map <id> --from <old_id> --to <new_id>
  - overworld blend-tiles --map <id> --pattern <name> --density <percent>
- Status: Foundational commands (resource-list, dungeon-list-sprites) are live with JSON output. Focus is shifting to high-value Overworld and dialogue inspection tools.
3. TUI and GUI Chat Interfaces
- Description: User-facing components for interacting with the ConversationalAgentService.
- Components:
  - TUI: A new full-screen component in z3ed using FTXUI, providing a rich chat experience in the terminal.
  - GUI: A new ImGui widget that can be docked into the main yaze application window.
- Status: In progress — CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Need to improve input ergonomics and synchronized history navigation.
4. Integration with the Proposal Workflow
- Description: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the ConversationalAgentService will trigger the existing Tile16ProposalGenerator (and future proposal generators for other resource types) to create a proposal.
- Workflow:
  - User chats with the agent to explore the ROM.
  - User asks the agent to make a change.
  - ConversationalAgentService generates the commands and passes them to the appropriate ProposalGenerator.
  - A new proposal is created and saved.
  - The TUI/GUI notifies the user that a proposal is ready for review.
  - User uses the agent diff and agent accept commands (or UI equivalents) to review and apply the changes.
- Status: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
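A rough sketch of the conversation-to-proposal handoff described above. ConversationalAgentService and Tile16ProposalGenerator are names taken from this roadmap, but the Proposal struct and method signatures shown here are hypothetical.

```cpp
// Sketch of the conversation-to-proposal handoff. The Proposal struct and
// method signatures are hypothetical; only the class names come from the
// roadmap.
#include <string>
#include <vector>

struct Proposal {
  std::string id;
  std::vector<std::string> commands;  // z3ed commands the agent generated
};

class Tile16ProposalGenerator {
 public:
  Proposal Generate(const std::vector<std::string>& commands) {
    // A real generator would validate commands against the loaded ROM and
    // persist the proposal for later `agent diff` / `agent accept`.
    return Proposal{"proposal-001", commands};
  }
};

class ConversationalAgentService {
 public:
  // Called once the agent decides the user's prompt implies a ROM edit.
  Proposal CreateProposal(const std::vector<std::string>& generated_commands) {
    Proposal proposal = generator_.Generate(generated_commands);
    // The TUI/GUI would now notify the user that the proposal is ready.
    return proposal;
  }

 private:
  Tile16ProposalGenerator generator_;
};
```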
Next Steps
Immediate Priorities
- ✅ Build System Consolidation (COMPLETE - Oct 3, 2025):
- ✅ Created Z3ED_AI master flag for simplified builds
- ✅ Fixed Gemini crash with graceful degradation
- ✅ Updated documentation with new build instructions
- ✅ Tested both Ollama and Gemini backends
- Next: Update CI/CD workflows to use -DZ3ED_AI=ON
- Live LLM Testing (NEXT UP - 1-2 hours):
- Verify function calling works with real Ollama/Gemini
- Test multi-step tool execution
- Validate all 5 tools with natural language prompts
- Expand Overworld Tool Coverage:
  - ✅ Ship read-only tile searches (overworld find-tile) with shared formatting for CLI and agent calls.
  - Next: add area summaries, teleport destination lookups, and keep JSON/Text parity for all new tools.
- Polish the TUI Chat Experience:
- Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
- Align log file output with on-screen formatting for easier debugging.
- Document & Test the New Tooling:
  - Update the main README.md and relevant docs to cover the new chat formatting.
  - Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
- Build GUI Chat Widget:
- Create the ImGui component.
- Ensure it shares the same backend service as the TUI.
- Full Integration with Proposal System:
- Implement the logic for the agent to transition from conversation to proposal generation.
- Expand Tool Arsenal:
- Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
- Multi-Modal Agent:
- Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
- Advanced Configuration:
  - Implement environment variables for selecting AI providers and models (e.g., YAZE_AI_PROVIDER, OLLAMA_MODEL); see the sketch after this list.
  - Add CLI flags for overriding the provider and model on a per-command basis.
- Performance and Cost-Saving:
  - Implement a response cache to reduce latency and API costs.
  - Add token usage tracking and reporting.
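A small sketch of environment-driven provider selection, assuming the variable names and defaults documented later in this file (YAZE_AI_PROVIDER, GEMINI_API_KEY, OLLAMA_MODEL, GEMINI_MODEL). The auto-detection rule shown (prefer Gemini when an API key is set, otherwise Ollama) is an assumption, not confirmed behavior.

```cpp
// Sketch of environment-driven provider/model selection. Variable names and
// default models mirror the environment-variable table below; the fallback
// logic is an assumption.
#include <cstdlib>
#include <string>

struct AIConfig {
  std::string provider;  // "ollama" or "gemini"
  std::string model;
};

static std::string EnvOr(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return value ? std::string(value) : fallback;
}

AIConfig ResolveAIConfig() {
  AIConfig config;
  config.provider = EnvOr("YAZE_AI_PROVIDER", "");
  if (config.provider.empty()) {
    // Assumed auto-detection: prefer Gemini when an API key is present.
    config.provider = std::getenv("GEMINI_API_KEY") ? "gemini" : "ollama";
  }
  config.model = (config.provider == "gemini")
                     ? EnvOr("GEMINI_MODEL", "gemini-2.5-flash")
                     : EnvOr("OLLAMA_MODEL", "qwen2.5-coder:7b");
  return config;
}
```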
Current Status & Next Steps (Updated: October 3, 2025)
We have made significant progress in laying the foundation for the conversational agent.
✅ Completed
- Build System Consolidation: ✅ NEW Z3ED_AI master flag (Oct 3, 2025)
  - Single flag enables all AI features: -DZ3ED_AI=ON
  - Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
  - Fixed Gemini crash when API key set but JSON disabled
  - Graceful degradation with clear error messages
  - Backward compatible with old flags
  - Ready for build modularization (enables optional libyaze_agent.a)
  - Docs: docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md
- ConversationalAgentService: ✅ Fully operational with multi-step tool execution loop
- Handles tool calls with automatic JSON output format
- Prevents recursion through proper tool result replay
- Supports conversation history and context management
- TUI Chat Interface: ✅ Production-ready (z3ed agent chat)
- Renders tables from JSON tool results
- Pretty-prints JSON payloads with syntax formatting
- Scrollable history with user/agent distinction
- Tool Dispatcher: ✅ Complete with 5 read-only tools
- resource-list: Enumerate labeled resources (dungeons, sprites, palettes)
- dungeon-list-sprites: Inspect sprites in dungeon rooms
- overworld-find-tile: Search for tile16 IDs across maps
- overworld-describe-map: Get comprehensive map metadata
- overworld-list-warps: List entrances/exits/holes with filtering
- Structured Output Rendering: ✅ Both TUI formats support tables and JSON (see the rendering sketch after this list)
- Automatic table generation from JSON arrays/objects
- Column-aligned formatting with headers
- Graceful fallback to text for malformed data
- ROM Context Integration: ✅ Tools can access loaded ROM or load from --rom flag
- Shared ROM context passed through ConversationalAgentService
- Automatic ROM loading with error handling
- AI Service Foundation: ✅ Ollama and Gemini services operational
- Enhanced prompting system with resource catalogue loading
- System instruction generation with examples
- Health checks and model availability validation
- Both backends tested and working in production
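For reference, an illustrative version of the JSON-to-table rendering described under Structured Output Rendering above: a flat JSON array of objects becomes a column-aligned text table, with a graceful fallback to the raw payload when the data is not tabular. It uses nlohmann_json (already a z3ed dependency); the function name is hypothetical.

```cpp
// Illustrative JSON-to-table renderer with graceful fallback to raw text.
// Uses nlohmann::json; the function name is hypothetical.
#include <algorithm>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using nlohmann::json;

std::string RenderJsonAsTable(const std::string& payload) {
  json data = json::parse(payload, nullptr, /*allow_exceptions=*/false);
  if (!data.is_array() || data.empty() || !data.front().is_object()) {
    return payload;  // fallback: show the raw text unchanged
  }

  // Column names come from the first row; widths fit the widest cell.
  std::vector<std::string> columns;
  for (const auto& item : data.front().items()) columns.push_back(item.key());

  auto cell = [](const json& row, const std::string& col) {
    return row.contains(col) ? row[col].dump() : std::string();
  };

  std::vector<size_t> widths;
  for (const auto& col : columns) widths.push_back(col.size());
  for (const auto& row : data)
    for (size_t i = 0; i < columns.size(); ++i)
      widths[i] = std::max(widths[i], cell(row, columns[i]).size());

  auto pad = [](const std::string& s, size_t w) {
    return s + std::string(w - s.size(), ' ') + "  ";
  };

  // Emit the header row, then one padded line per JSON object.
  std::string out;
  for (size_t i = 0; i < columns.size(); ++i) out += pad(columns[i], widths[i]);
  out += "\n";
  for (const auto& row : data) {
    for (size_t i = 0; i < columns.size(); ++i)
      out += pad(cell(row, columns[i]), widths[i]);
    out += "\n";
  }
  return out;
}
```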
🚧 In Progress
- Live LLM Testing: Ready to execute with real Ollama/Gemini
- All infrastructure complete (function calling, tool schemas, response parsing)
- Need to verify multi-step tool execution with live models
- Test scenarios prepared for all 5 tools
- Estimated Time: 1-2 hours
- GUI Chat Widget: Not yet started
- TUI implementation complete and can serve as reference
- Should reuse table/JSON rendering logic from TUI
- Target: src/app/gui/debug/agent_chat_widget.{h,cc}
- Estimated Time: 6-8 hours
🚀 Next Steps (Priority Order)
Priority 1: Live LLM Testing with Function Calling (1-2 hours)
Goal: Verify Ollama/Gemini can autonomously invoke tools in production
Infrastructure Complete ✅:
- ✅ Tool schema generation (BuildFunctionCallSchemas())
- ✅ System prompts include function definitions
- ✅ AI services parse tool_calls from responses
- ✅ ConversationalAgentService dispatches to ToolDispatcher
- ✅ All 5 tools tested independently
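To make the testing tasks below concrete, here is a sketch of what parsing tool_calls out of a model response can look like. The exact wire format differs between Ollama and Gemini, so the field names and payload shape used here are illustrative only.

```cpp
// Sketch of extracting function calls from a model response with
// nlohmann::json. Field names are illustrative, not the literal wire format.
#include <iostream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using nlohmann::json;

struct ToolCall {
  std::string name;
  json args;
};

std::vector<ToolCall> ParseToolCalls(const std::string& response_body) {
  std::vector<ToolCall> calls;
  json response = json::parse(response_body, nullptr, /*allow_exceptions=*/false);
  if (!response.is_object() || !response.contains("tool_calls")) return calls;

  for (const auto& call : response["tool_calls"]) {
    ToolCall parsed;
    parsed.name = call.value("name", "");
    parsed.args = call.contains("arguments") ? call["arguments"] : json::object();
    calls.push_back(parsed);
  }
  return calls;
}

int main() {
  // Example payload in the shape this parser expects (illustrative).
  const std::string body = R"({
    "tool_calls": [
      {"name": "dungeon-list-sprites", "arguments": {"room": "0x012"}}
    ]
  })";
  for (const auto& call : ParseToolCalls(body)) {
    std::cout << call.name << " -> " << call.args.dump() << "\n";
  }
}
```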
Testing Tasks:
- Gemini Testing (30 min)
  - Verify Gemini 2.0 generates correct tool_calls JSON
  - Test prompt: "What dungeons are in this ROM?"
  - Verify tool result fed back into conversation
  - Test multi-step: "Now list sprites in the first dungeon"
- Ollama Testing (30 min)
  - Verify qwen2.5-coder discovers and calls tools
  - Same test prompts as Gemini
  - Compare response quality between models
- Tool Coverage Testing (30 min)
  - Exercise all 5 tools with natural language prompts
  - Verify JSON output formats correctly
  - Test error handling (invalid room IDs, etc.)
Success Criteria:
- LLM autonomously calls tools without explicit command syntax
- Tool results incorporated into follow-up responses
- Multi-turn conversations work with context
Priority 2: Implement GUI Chat Widget (6-8 hours)
Goal: Unified chat experience in YAZE application
- Create ImGui Chat Widget (4 hours)
  - File: src/app/gui/debug/agent_chat_widget.{h,cc}
  - Reuse table/JSON rendering logic from TUI implementation
  - Add to Debug menu: Debug → Agent Chat
  - Share ConversationalAgentService instance with TUI
- Add Chat History Persistence (2 hours)
  - Save chat history to .yaze/agent_chat_history.json (see the persistence sketch after this list)
  - Load on startup, display in GUI/TUI
  - Add "Clear History" button
- Polish Input Experience (2 hours)
  - Multi-line input support (Shift+Enter for newline, Enter to send)
  - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
  - Auto-scroll to bottom on new messages
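A possible shape for the chat-history persistence task above, reading and writing .yaze/agent_chat_history.json with nlohmann_json. The on-disk format (role/text pairs) is an assumption.

```cpp
// Sketch of chat-history persistence to .yaze/agent_chat_history.json.
// The serialization format shown here is an assumption.
#include <fstream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using nlohmann::json;

struct ChatLine {
  bool from_user;
  std::string text;
};

void SaveHistory(const std::vector<ChatLine>& history, const std::string& path) {
  json lines = json::array();
  for (const auto& line : history) {
    lines.push_back({{"role", line.from_user ? "user" : "agent"},
                     {"text", line.text}});
  }
  std::ofstream(path) << lines.dump(2);
}

std::vector<ChatLine> LoadHistory(const std::string& path) {
  std::vector<ChatLine> history;
  std::ifstream in(path);
  if (!in) return history;  // no saved history yet
  json lines = json::parse(in, nullptr, /*allow_exceptions=*/false);
  if (!lines.is_array()) return history;
  for (const auto& line : lines) {
    history.push_back({line.value("role", "agent") == "user",
                       line.value("text", "")});
  }
  return history;
}
```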
Priority 3: Proposal Generation (6-8 hours)
Connect chat to ROM modification workflow:
- Detect action intents in conversation
- Generate proposal from accumulated context
- Link proposal to chat history
- GUI notification when proposal ready
Command Reference
Chat Modes
# Interactive TUI chat (FTXUI)
z3ed agent chat --rom zelda3.sfc
# Simple text mode (for automation/AI testing)
z3ed agent simple-chat --rom zelda3.sfc
# Batch mode from file
z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
Tool Commands (for direct testing)
# List dungeons
z3ed agent resource-list --type dungeon --format json
# Find tiles
z3ed agent overworld-find-tile --tile 0x02E --map 0x05
# List sprites in room
z3ed agent dungeon-list-sprites --room 0x012
Build Quick Reference
# Full AI features
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed
# With GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build
# Minimal (no AI)
cmake -B build
cmake --build build --target z3ed
Future Enhancements
Short Term (1-2 months)
- Dialogue/text search tools
- Sprite info inspection
- Region/teleport tools
- Response caching
- Token usage tracking
Medium Term (3-6 months)
- Multi-modal agent (image generation)
- Advanced configuration (env vars, model selection)
- Proposal templates for common edits
- Undo/redo in conversations
Long Term (6+ months)
- Visual diff viewer for proposals
- Collaborative editing sessions
- Learning from user feedback
- Custom tool plugins
Expand Tool Coverage
Goal: Enable deeper ROM introspection for level design questions
- Dialogue/Text Tools (3 hours)
  - dialogue-search --text "search term": Find text in ROM dialogue
  - dialogue-get --id 0x...: Get dialogue by message ID
- Sprite Tools (3 hours)
  - sprite-get-info --id 0x...: Sprite metadata (HP, damage, AI)
  - overworld-list-sprites --map 0x...: Sprites on overworld map
- Advanced Overworld Tools (4 hours)
  - overworld-get-region --map 0x...: Region boundaries and properties
  - overworld-list-transitions --from-map 0x...: Map transitions/scrolling
  - overworld-get-tile-at --map 0x... --x N --y N: Get specific tile16 value
Priority 4: Performance and Caching (4-6 hours)
- Response Caching (3 hours)
  - Implement LRU cache for identical prompts
  - Cache tool results by (tool_name, args) key (see the cache sketch after this list)
  - Configurable TTL (default: 5 minutes for ROM introspection)
- Token Usage Tracking (2 hours)
  - Log tokens per request (Ollama and Gemini APIs provide this)
  - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
  - Add --show-token-usage flag to CLI commands
- Streaming Responses (optional, 3-4 hours)
  - Use Ollama/Gemini streaming APIs
  - Update GUI/TUI to show partial responses as they arrive
  - Improves perceived latency for long responses
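A sketch of the tool-result cache outlined above, keyed by (tool_name, args) with a configurable TTL. A production version would add LRU eviction and size bounds; all names here are hypothetical.

```cpp
// Illustrative tool-result cache keyed by (tool_name, serialized args) with a
// TTL. LRU eviction and size limits are omitted; names are hypothetical.
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

class ToolResultCache {
 public:
  explicit ToolResultCache(std::chrono::seconds ttl = std::chrono::minutes(5))
      : ttl_(ttl) {}

  std::optional<std::string> Get(const std::string& tool,
                                 const std::string& args) {
    auto it = entries_.find(tool + "\x1f" + args);
    if (it == entries_.end()) return std::nullopt;
    if (std::chrono::steady_clock::now() - it->second.stored > ttl_) {
      entries_.erase(it);  // expired entry
      return std::nullopt;
    }
    return it->second.result;
  }

  void Put(const std::string& tool, const std::string& args,
           const std::string& result) {
    entries_[tool + "\x1f" + args] = {result, std::chrono::steady_clock::now()};
  }

 private:
  struct Entry {
    std::string result;
    std::chrono::steady_clock::time_point stored;
  };
  std::chrono::seconds ttl_;
  std::unordered_map<std::string, Entry> entries_;
};
```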
z3ed Build Quick Reference
# Full AI features (Ollama + Gemini)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed
# AI + GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed
# Minimal build (no AI)
cmake -B build
cmake --build build --target z3ed
Build Flags Explained
| Flag | Purpose | Dependencies | When to Use |
|---|---|---|---|
| Z3ED_AI=ON | Master flag for AI features | JSON, YAML, httplib, (OpenSSL*) | Want Ollama or Gemini support |
| YAZE_WITH_GRPC=ON | GUI automation & testing | gRPC, Protobuf, (auto-enables JSON) | Want GUI test harness |
| YAZE_WITH_JSON=ON | Low-level JSON support | nlohmann_json | Auto-enabled by above flags |
*OpenSSL optional - required for Gemini (HTTPS), Ollama works without it
Feature Matrix
| Feature | No Flags | Z3ED_AI | Z3ED_AI + GRPC |
|---|---|---|---|
| Basic CLI | ✅ | ✅ | ✅ |
| Ollama (local) | ❌ | ✅ | ✅ |
| Gemini (cloud) | ❌ | ✅* | ✅* |
| TUI Chat | ❌ | ✅ | ✅ |
| GUI Test Automation | ❌ | ❌ | ✅ |
| Tool Dispatcher | ❌ | ✅ | ✅ |
| Function Calling | ❌ | ✅ | ✅ |
*Requires OpenSSL for HTTPS
Common Build Scenarios
Developer (AI features, no GUI testing)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
Full Stack (AI + GUI automation)
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed -j8
CI/CD (minimal, fast)
cmake -B build -DYAZE_MINIMAL_BUILD=ON
cmake --build build -j$(nproc)
Release Build (optimized)
cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --target z3ed -j8
Migration from Old Flags
Before (Confusing)
cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
After (Clear Intent)
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
Note: Old flags still work for backward compatibility!
Troubleshooting
"Build with -DZ3ED_AI=ON" warning
Symptom: AI commands fail with "JSON support required"
Fix: Rebuild with AI flag
rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build
"OpenSSL not found" warning
Symptom: Gemini API doesn't work
Impact: Only affects Gemini (cloud). Ollama (local) works fine
Fix (optional):
# macOS
brew install openssl
# Linux
sudo apt install libssl-dev
# Then rebuild
cmake -B build -DZ3ED_AI=ON && cmake --build build
Ollama vs Gemini not auto-detecting
Symptom: Wrong backend selected
Fix: Set explicit provider
# Force Ollama
export YAZE_AI_PROVIDER=ollama
./build/bin/z3ed agent plan --prompt "test"
# Force Gemini
export YAZE_AI_PROVIDER=gemini
export GEMINI_API_KEY="your-key"
./build/bin/z3ed agent plan --prompt "test"
Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| YAZE_AI_PROVIDER | auto | Force ollama or gemini |
| GEMINI_API_KEY | - | Gemini API key (enables Gemini) |
| OLLAMA_MODEL | qwen2.5-coder:7b | Override Ollama model |
| GEMINI_MODEL | gemini-2.5-flash | Override Gemini model |
Platform-Specific Notes
macOS
- OpenSSL auto-detected via Homebrew
- Keychain integration for SSL certs
- Recommended: brew install openssl ollama
Linux
- OpenSSL typically pre-installed
- Install via: sudo apt install libssl-dev
- Ollama: Download from https://ollama.com
Windows
- Use Ollama (no SSL required)
- Gemini requires OpenSSL (harder to set up on Windows)
- Recommend: Focus on Ollama for Windows builds
Performance Tips
Faster Incremental Builds
# Use Ninja instead of Make
cmake -B build -GNinja -DZ3ED_AI=ON
ninja -C build z3ed
# Enable ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake -B build -DZ3ED_AI=ON
Reduce Build Scope
# Only build z3ed (not full yaze app)
cmake --build build --target z3ed
# Parallel build
cmake --build build --target z3ed -j$(nproc)
Related Documentation
- Migration Guide: Z3ED_AI_FLAG_MIGRATION.md
- Technical Roadmap: AGENT-ROADMAP.md
- Main README: README.md
- Build Modularization: ../../build_modularization_plan.md
Quick Test
Verify your build works:
# Check z3ed runs
./build/bin/z3ed --version
# Test AI detection
./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5
# Expected output (with Z3ED_AI=ON):
# 🤖 Using Gemini AI with model: gemini-2.5-flash
# or
# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
# or
# 🤖 Using MockAIService (no LLM configured)
Support
If you encounter issues:
- Check this guide's troubleshooting section
- Review Z3ED_AI_FLAG_MIGRATION.md
- Verify CMake output for warnings
- Open an issue with build logs
Summary
Recommended for most users:
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
./build/bin/z3ed agent chat
This gives you:
- ✅ Ollama support (local, free)
- ✅ Gemini support (cloud, API key required)
- ✅ TUI chat interface
- ✅ Tool dispatcher with 5 commands
- ✅ Function calling support
- ✅ All AI agent features