# z3ed Agent Roadmap
**Last Updated**: October 3, 2025
## Current Status
### ✅ Production Ready
- **Build System**: Z3ED_AI flag consolidation complete
- **AI Backends**: Ollama (local) and Gemini (cloud) operational
- **Conversational Agent**: Multi-step tool execution with chat history
- **Tool Dispatcher**: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
- **TUI Chat**: FTXUI-based interactive terminal interface
- **Simple Chat**: Text-mode REPL for AI testing (no FTXUI dependencies)
- **GUI Chat Widget**: ImGui-based widget (needs integration into main app)
### 🚧 Active Work
1. **Live LLM Testing** (1-2h): Verify function calling with real models
2. **GUI Integration** (4-6h): Wire AgentChatWidget into YAZE editor
3. **Proposal Workflow** (6-8h): End-to-end integration from chat to ROM changes
## Core Vision
Transform z3ed from a command-line tool into a **conversational ROM hacking assistant** where users can:
- Ask questions about ROM contents ("What dungeons exist?")
- Inspect game data interactively ("How many soldiers in room X?")
- Build changes incrementally through dialogue
- Generate proposals from conversation context
## Technical Architecture
### 1. Conversational Agent Service ✅
**Status**: Complete
- `ConversationalAgentService`: Manages chat sessions and tool execution
- Integrates with Ollama/Gemini AI services
- Handles tool calls with automatic JSON formatting
- Maintains conversation history and context
### 2. Read-Only Tools ✅
**Status**: 5 tools implemented
- `resource-list`: Enumerate labeled resources
- `dungeon-list-sprites`: Inspect sprites in rooms
- `overworld-find-tile`: Search for tile16 IDs
- `overworld-describe-map`: Get map metadata
- `overworld-list-warps`: List entrances/exits/holes
**Next**: Add dialogue, sprite info, and region inspection tools
### 3. Chat Interfaces
**Status**: Multiple modes available
- **TUI (FTXUI)**: Full-screen interactive terminal (✅ complete)
- **Simple Mode**: Text REPL for automation/testing (✅ complete)
- **GUI (ImGui)**: Dockable widget in YAZE (⚠️ needs integration)
### 4. Proposal Workflow Integration
**Status**: Planned
**Goal**: When user requests ROM changes, agent generates proposal
1. User chats to explore ROM
2. User requests change ("add two more soldiers")
3. Agent generates commands → creates proposal
4. User reviews with `agent diff` or GUI
5. User accepts/rejects proposal
## Immediate Priorities
### Priority 1: Live LLM Testing (1-2 hours)
Verify function calling works end-to-end:
- Test Gemini 2.0 with natural language prompts
- Test Ollama (qwen2.5-coder) with tool discovery
- Validate multi-step conversations
- Exercise all 5 tools
### Priority 2: GUI Chat Integration (4-6 hours)
Wire AgentChatWidget into main YAZE editor:
- Add menu item: Debug → Agent Chat
- Connect to shared ConversationalAgentService
- Test with loaded ROM context
- Add history persistence
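
To make the wiring concrete, the sketch below shows the rough shape of such a panel: a scrollable transcript plus an input line that submits on Enter. The function name, message struct, and the hand-off to the agent service are illustrative assumptions, not the shipped `AgentChatWidget` API; only the ImGui calls themselves are standard.

```cpp
#include <string>
#include <vector>
#include "imgui.h"

struct ChatMessage { bool from_user; std::string text; };

// Minimal sketch of a dockable chat panel (hypothetical integration points; the
// real widget would forward input to the shared ConversationalAgentService).
void DrawAgentChatWindow(std::vector<ChatMessage>& history, bool* open) {
  if (!ImGui::Begin("Agent Chat", open)) { ImGui::End(); return; }

  // Scrollable transcript, leaving room for the input line below.
  ImGui::BeginChild("transcript", ImVec2(0, -ImGui::GetFrameHeightWithSpacing()));
  for (const auto& msg : history)
    ImGui::TextWrapped("%s: %s", msg.from_user ? "You" : "Agent", msg.text.c_str());
  if (ImGui::GetScrollY() >= ImGui::GetScrollMaxY())
    ImGui::SetScrollHereY(1.0f);  // stay pinned to the newest message
  ImGui::EndChild();

  static char input[1024] = "";
  if (ImGui::InputText("##prompt", input, sizeof(input),
                       ImGuiInputTextFlags_EnterReturnsTrue)) {
    history.push_back({true, input});
    // TODO: send `input` to the agent service and append its reply here.
    input[0] = '\0';
    ImGui::SetKeyboardFocusHere(-1);  // keep focus on the input box
  }
  ImGui::End();
}
```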
### Priority 3: Proposal Generation (6-8 hours)
See "Integration with the Proposal Workflow" and the detailed Priority 3 plan below.
## Technical Implementation Plan
### 1. Conversational Agent Service
- **Description**: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
- **Components**:
- `ConversationalAgentService`: The main class for managing the chat session.
- Integration with existing `AIService` implementations (Ollama, Gemini).
- **Status**: In progress — baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.
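
As a point of reference, the multi-step tool loop can be sketched as below. The `Message` and `ModelReply` types and the callback signatures are hypothetical stand-ins; the real `ConversationalAgentService` and `AIService` interfaces live in the codebase and may differ.

```cpp
#include <functional>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

struct Message { std::string role; std::string content; };
struct ModelReply { std::string text; std::vector<nlohmann::json> tool_calls; };

using ModelFn = std::function<ModelReply(const std::vector<Message>&)>;
using ToolFn = std::function<nlohmann::json(const std::string&, const nlohmann::json&)>;

// One user turn: ask the model, execute any requested tools, replay their
// results into the history, and repeat until the model answers in plain text.
std::string RunTurn(std::vector<Message>& history, const std::string& user_input,
                    const ModelFn& call_model, const ToolFn& dispatch_tool,
                    int max_tool_steps = 4) {
  history.push_back({"user", user_input});
  for (int step = 0; step < max_tool_steps; ++step) {
    ModelReply reply = call_model(history);
    if (reply.tool_calls.empty()) {
      history.push_back({"assistant", reply.text});
      return reply.text;  // model answered directly
    }
    for (const auto& call : reply.tool_calls) {
      nlohmann::json result = dispatch_tool(
          call.at("name").get<std::string>(),
          call.value("args", nlohmann::json::object()));
      history.push_back({"tool", result.dump()});  // replaying results avoids recursion
    }
  }
  return "Tool budget exhausted; please rephrase the request.";
}
```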
### 2. Read-Only "Tools" for the Agent
- **Description**: To enable the agent to answer questions, we need to expand `z3ed` with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
- **Example Tools to Implement**:
- `resource list --type <dungeon|sprite|...>`: List all user-defined labels of a certain type.
- `dungeon list-sprites --room <id|label>`: List all sprites in a given room.
- `dungeon get-info --room <id|label>`: Get metadata for a specific room.
- `overworld find-tile --tile <id>`: Find all occurrences of a specific tile on the overworld map.
- **Advanced Editing Tools (for future implementation)**:
- `overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <id>`
- `overworld replace-tile --map <id> --from <old_id> --to <new_id>`
- `overworld blend-tiles --map <id> --pattern <name> --density <percent>`
- **Status**: Foundational commands (`resource-list`, `dungeon-list-sprites`) are live with JSON output. Focus is shifting to high-value Overworld and dialogue inspection tools.
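
To illustrate what a read-only tool looks like from the dispatcher's point of view, here is a minimal sketch: each tool carries a description, a JSON-Schema-style parameter block (which also feeds the function-calling prompt), and a handler that returns structured JSON. The `ToolSpec` type and `Dispatch` function are illustrative, not the actual `ToolDispatcher` API.

```cpp
#include <functional>
#include <map>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// A tool the LLM can call: metadata for the prompt, handler for the lookup.
struct ToolSpec {
  std::string description;
  json parameters;
  std::function<json(const json&)> handler;
};

std::map<std::string, ToolSpec> RegisterTools() {
  std::map<std::string, ToolSpec> tools;

  json params;
  params["type"] = "object";
  params["properties"]["tile"] = {{"type", "string"}, {"description", "tile16 ID, e.g. 0x02E"}};
  params["properties"]["map"] = {{"type", "string"}, {"description", "optional map ID filter"}};
  params["required"] = {"tile"};

  tools["overworld-find-tile"] = ToolSpec{
      "Find all occurrences of a tile16 ID on the overworld.", params,
      [](const json& args) {
        // The real handler scans loaded ROM data; this stub only echoes the query.
        return json{{"tile", args.value("tile", "")}, {"matches", json::array()}};
      }};
  return tools;
}

json Dispatch(const std::map<std::string, ToolSpec>& tools,
              const std::string& name, const json& args) {
  auto it = tools.find(name);
  if (it == tools.end()) return json{{"error", "unknown tool: " + name}};
  return it->second.handler(args);
}
```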
### 3. TUI and GUI Chat Interfaces
- **Description**: User-facing components for interacting with the `ConversationalAgentService`.
- **Components**:
- **TUI**: A new full-screen component in `z3ed` using FTXUI, providing a rich chat experience in the terminal.
- **GUI**: A new ImGui widget that can be docked into the main `yaze` application window.
- **Status**: In progress — CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Need to improve input ergonomics and synchronized history navigation.
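
The table rendering mentioned above boils down to turning a JSON array of flat objects into column-aligned text, with a fallback to pretty-printed JSON for non-tabular payloads. A standalone sketch using nlohmann::json (the shipped renderer may differ in details):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Render a JSON array of flat objects as a column-aligned text table.
// Falls back to raw pretty-printed JSON when the shape is not tabular.
std::string RenderTable(const json& rows) {
  if (!rows.is_array() || rows.empty() || !rows.front().is_object())
    return rows.dump(2);

  std::vector<std::string> headers;
  for (auto it = rows.front().begin(); it != rows.front().end(); ++it)
    headers.push_back(it.key());

  auto cell = [](const json& row, const std::string& key) {
    if (!row.contains(key)) return std::string();
    const json& v = row.at(key);
    return v.is_string() ? v.get<std::string>() : v.dump();
  };

  std::vector<std::size_t> widths(headers.size());
  for (std::size_t c = 0; c < headers.size(); ++c) {
    widths[c] = headers[c].size();
    for (const auto& row : rows)
      widths[c] = std::max(widths[c], cell(row, headers[c]).size());
  }

  std::string out;
  auto emit_row = [&](const std::vector<std::string>& cells) {
    for (std::size_t c = 0; c < cells.size(); ++c) {
      out += cells[c];
      out.append(widths[c] - cells[c].size() + 2, ' ');  // pad to column width
    }
    out += '\n';
  };
  emit_row(headers);
  for (const auto& row : rows) {
    std::vector<std::string> cells;
    for (const auto& h : headers) cells.push_back(cell(row, h));
    emit_row(cells);
  }
  return out;
}
```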
### 4. Integration with the Proposal Workflow
- **Description**: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the `ConversationalAgentService` will trigger the existing `Tile16ProposalGenerator` (and future proposal generators for other resource types) to create a proposal.
- **Workflow**:
1. User chats with the agent to explore the ROM.
2. User asks the agent to make a change.
3. `ConversationalAgentService` generates the commands and passes them to the appropriate `ProposalGenerator`.
4. A new proposal is created and saved.
5. The TUI/GUI notifies the user that a proposal is ready for review.
6. User uses the `agent diff` and `agent accept` commands (or UI equivalents) to review and apply the changes.
- **Status**: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
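
A rough sketch of that hand-off, under the assumption that the agent first gates on whether a message looks like an edit request and then packages its planned commands into a proposal for review. The `Proposal` struct and `MaybeCreateProposal` helper are hypothetical; the real path goes through `Tile16ProposalGenerator` and future generators.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Very rough intent gate: does the message ask for a ROM change rather than a
// read-only question? Real intent detection would use the LLM itself.
bool LooksLikeEditRequest(const std::string& message) {
  static const std::vector<std::string> verbs = {"add", "remove", "replace",
                                                 "change", "set", "move", "swap"};
  std::string lower;
  lower.reserve(message.size());
  for (char c : message)
    lower.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
  for (const auto& verb : verbs)
    if (lower.find(verb) != std::string::npos) return true;
  return false;
}

// Hypothetical hand-off: planned commands become a proposal the user later
// reviews with `agent diff` / `agent accept`.
struct Proposal { std::string id; std::vector<std::string> commands; };

std::optional<Proposal> MaybeCreateProposal(const std::string& user_message,
                                            const std::vector<std::string>& planned_commands) {
  if (!LooksLikeEditRequest(user_message) || planned_commands.empty())
    return std::nullopt;
  return Proposal{"proposal-0001", planned_commands};
}
```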
## Next Steps
### Immediate Priorities
1. **✅ Build System Consolidation** (COMPLETE - Oct 3, 2025):
- ✅ Created Z3ED_AI master flag for simplified builds
- ✅ Fixed Gemini crash with graceful degradation
- ✅ Updated documentation with new build instructions
- ✅ Tested both Ollama and Gemini backends
- **Next**: Update CI/CD workflows to use `-DZ3ED_AI=ON`
2. **Live LLM Testing** (NEXT UP - 1-2 hours):
- Verify function calling works with real Ollama/Gemini
- Test multi-step tool execution
- Validate all 5 tools with natural language prompts
3. **Expand Overworld Tool Coverage**:
- ✅ Shipped read-only tile searches (`overworld find-tile`) with shared formatting for CLI and agent calls.
- Next: add area summaries and teleport destination lookups, and keep JSON/text parity for all new tools.
4. **Polish the TUI Chat Experience**:
- Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
- Align log file output with on-screen formatting for easier debugging.
5. **Document & Test the New Tooling**:
- Update the main `README.md` and relevant docs to cover the new chat formatting.
- Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
6. **Build GUI Chat Widget**:
- Create the ImGui component.
- Ensure it shares the same backend service as the TUI.
7. **Full Integration with Proposal System**:
- Implement the logic for the agent to transition from conversation to proposal generation.
8. **Expand Tool Arsenal**:
- Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
9. **Multi-Modal Agent**:
- Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
10. **Advanced Configuration**:
- Implement environment variables for selecting AI providers and models (e.g., `YAZE_AI_PROVIDER`, `OLLAMA_MODEL`).
- Add CLI flags for overriding the provider and model on a per-command basis.
11. **Performance and Cost-Saving**:
- Implement a response cache to reduce latency and API costs.
- Add token usage tracking and reporting.
## Current Status & Next Steps (Updated: October 3, 2025)
We have made significant progress in laying the foundation for the conversational agent.
### ✅ Completed
- **Build System Consolidation**: ✅ **NEW** Z3ED_AI master flag (Oct 3, 2025)
- Single flag enables all AI features: `-DZ3ED_AI=ON`
- Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
- Fixed Gemini crash when API key set but JSON disabled
- Graceful degradation with clear error messages
- Backward compatible with old flags
- Ready for build modularization (enables optional `libyaze_agent.a`)
- **Docs**: `docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md`
- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
- Handles tool calls with automatic JSON output format
- Prevents recursion through proper tool result replay
- Supports conversation history and context management
- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
- Renders tables from JSON tool results
- Pretty-prints JSON payloads with syntax formatting
- Scrollable history with user/agent distinction
- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
- `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
- `dungeon-list-sprites`: Inspect sprites in dungeon rooms
- `overworld-find-tile`: Search for tile16 IDs across maps
- `overworld-describe-map`: Get comprehensive map metadata
- `overworld-list-warps`: List entrances/exits/holes with filtering
- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON
- Automatic table generation from JSON arrays/objects
- Column-aligned formatting with headers
- Graceful fallback to text for malformed data
- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
- Shared ROM context passed through ConversationalAgentService
- Automatic ROM loading with error handling
- **AI Service Foundation**: ✅ Ollama and Gemini services operational
- Enhanced prompting system with resource catalogue loading
- System instruction generation with examples
- Health checks and model availability validation
- Both backends tested and working in production
### 🚧 In Progress
- **Live LLM Testing**: Ready to execute with real Ollama/Gemini
- All infrastructure complete (function calling, tool schemas, response parsing)
- Need to verify multi-step tool execution with live models
- Test scenarios prepared for all 5 tools
- **Estimated Time**: 1-2 hours
- **GUI Chat Widget**: Widget implemented, but not yet integrated into the main editor
- TUI implementation complete and can serve as reference
- Should reuse table/JSON rendering logic from TUI
- Target: `src/app/gui/debug/agent_chat_widget.{h,cc}`
- **Estimated Time**: 6-8 hours
### 🚀 Next Steps (Priority Order)
#### Priority 1: Live LLM Testing with Function Calling (1-2 hours)
**Goal**: Verify Ollama/Gemini can autonomously invoke tools in production
**Infrastructure Complete** ✅:
- ✅ Tool schema generation (`BuildFunctionCallSchemas()`)
- ✅ System prompts include function definitions
- ✅ AI services parse `tool_calls` from responses
- ✅ ConversationalAgentService dispatches to ToolDispatcher
- ✅ All 5 tools tested independently
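For reference during this testing pass, tool calls arrive as structured JSON in the backend response. The sketch below parses the Ollama-style shape (`message.tool_calls[].function.{name, arguments}`); Gemini nests the equivalent data under `functionCall` parts, so each backend needs its own parser. The exact payloads should be confirmed against live responses.

```cpp
#include <iostream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

struct ToolCall { std::string name; json args; };

// Extract tool calls from an Ollama-style /api/chat response.
std::vector<ToolCall> ParseOllamaToolCalls(const json& response) {
  std::vector<ToolCall> calls;
  const json& msg = response.value("message", json::object());
  for (const auto& entry : msg.value("tool_calls", json::array())) {
    const json& fn = entry.value("function", json::object());
    calls.push_back({fn.value("name", ""), fn.value("arguments", json::object())});
  }
  return calls;
}

int main() {
  // Example payload in the shape Ollama returns when a model requests a tool.
  json response = json::parse(R"({
    "message": {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {"function": {"name": "dungeon-list-sprites", "arguments": {"room": "0x012"}}}
      ]
    }
  })");
  for (const auto& call : ParseOllamaToolCalls(response))
    std::cout << call.name << " " << call.args.dump() << "\n";
}
```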
**Testing Tasks**:
1. **Gemini Testing** (30 min)
- Verify Gemini 2.0 generates correct `tool_calls` JSON
- Test prompt: "What dungeons are in this ROM?"
- Verify tool result fed back into conversation
- Test multi-step: "Now list sprites in the first dungeon"
2. **Ollama Testing** (30 min)
- Verify qwen2.5-coder discovers and calls tools
- Same test prompts as Gemini
- Compare response quality between models
3. **Tool Coverage Testing** (30 min)
- Exercise all 5 tools with natural language prompts
- Verify JSON output formats correctly
- Test error handling (invalid room IDs, etc.)
**Success Criteria**:
- LLM autonomously calls tools without explicit command syntax
- Tool results incorporated into follow-up responses
- Multi-turn conversations work with context
#### Priority 2: Implement GUI Chat Widget (6-8 hours)
**Goal**: Unified chat experience in YAZE application
1. **Create ImGui Chat Widget** (4 hours)
- File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
- Reuse table/JSON rendering logic from TUI implementation
- Add to Debug menu: `Debug → Agent Chat`
- Share `ConversationalAgentService` instance with TUI
2. **Add Chat History Persistence** (2 hours; see the sketch after this list)
- Save chat history to `.yaze/agent_chat_history.json`
- Load on startup, display in GUI/TUI
- Add "Clear History" button
3. **Polish Input Experience** (2 hours)
- Multi-line input support (Shift+Enter for newline, Enter to send)
- Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
- Auto-scroll to bottom on new messages
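
A minimal sketch of the history persistence from step 2, assuming nlohmann::json and the `.yaze/agent_chat_history.json` path mentioned above. The on-disk schema shown here (`role`/`text` pairs) is an assumption, not a decided format.

```cpp
#include <fstream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

struct ChatMessage { std::string role; std::string text; };

// Hypothetical history file; assumes the .yaze/ directory already exists.
constexpr const char* kHistoryPath = ".yaze/agent_chat_history.json";

void SaveHistory(const std::vector<ChatMessage>& history) {
  json out = json::array();
  for (const auto& msg : history)
    out.push_back({{"role", msg.role}, {"text", msg.text}});
  std::ofstream(kHistoryPath) << out.dump(2);
}

std::vector<ChatMessage> LoadHistory() {
  std::vector<ChatMessage> history;
  std::ifstream in(kHistoryPath);
  if (!in) return history;                 // first run: nothing saved yet
  json parsed = json::parse(in, nullptr, /*allow_exceptions=*/false);
  if (!parsed.is_array()) return history;  // corrupt file: start fresh
  for (const auto& msg : parsed)
    history.push_back({msg.value("role", "user"), msg.value("text", "")});
  return history;
}
```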
#### Priority 3: Proposal Generation (6-8 hours)
Connect chat to ROM modification workflow:
- Detect action intents in conversation
- Generate proposal from accumulated context
- Link proposal to chat history
- GUI notification when proposal ready
## Command Reference
### Chat Modes
```bash
# Interactive TUI chat (FTXUI)
z3ed agent chat --rom zelda3.sfc
# Simple text mode (for automation/AI testing)
z3ed agent simple-chat --rom zelda3.sfc
# Batch mode from file
z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
```
### Tool Commands (for direct testing)
```bash
# List dungeons
z3ed agent resource-list --type dungeon --format json
# Find tiles
z3ed agent overworld-find-tile --tile 0x02E --map 0x05
# List sprites in room
z3ed agent dungeon-list-sprites --room 0x012
```
## Future Enhancements
### Short Term (1-2 months)
- Dialogue/text search tools
- Sprite info inspection
- Region/teleport tools
- Response caching
- Token usage tracking
### Medium Term (3-6 months)
- Multi-modal agent (image generation)
- Advanced configuration (env vars, model selection)
- Proposal templates for common edits
- Undo/redo in conversations
### Long Term (6+ months)
- Visual diff viewer for proposals
- Collaborative editing sessions
- Learning from user feedback
- Custom tool plugins
#### Priority 4: Expand Tool Coverage
**Goal**: Enable deeper ROM introspection for level design questions
1. **Dialogue/Text Tools** (3 hours)
- `dialogue-search --text "search term"`: Find text in ROM dialogue
- `dialogue-get --id 0x...`: Get dialogue by message ID
2. **Sprite Tools** (3 hours)
- `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
- `overworld-list-sprites --map 0x...`: Sprites on overworld map
3. **Advanced Overworld Tools** (4 hours)
- `overworld-get-region --map 0x...`: Region boundaries and properties
- `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
- `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
#### Priority 5: Performance and Caching (4-6 hours)
1. **Response Caching** (3 hours; see the sketch after this list)
- Implement LRU cache for identical prompts
- Cache tool results by (tool_name, args) key
- Configurable TTL (default: 5 minutes for ROM introspection)
2. **Token Usage Tracking** (2 hours)
- Log tokens per request (Ollama and Gemini APIs provide this)
- Display in chat footer: "Last response: 1234 tokens, ~$0.02"
- Add `--show-token-usage` flag to CLI commands
3. **Streaming Responses** (optional, 3-4 hours)
- Use Ollama/Gemini streaming APIs
- Update GUI/TUI to show partial responses as they arrive
- Improves perceived latency for long responses
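
A minimal sketch of the tool-result cache from item 1, keyed by (tool name, serialized args) with a TTL. The class name is illustrative, and the LRU eviction and wiring into the agent service are omitted for brevity.

```cpp
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;
using Clock = std::chrono::steady_clock;

// TTL cache for read-only tool results; ROM introspection rarely changes
// within a session, so a short TTL is enough.
class ToolResultCache {
 public:
  explicit ToolResultCache(std::chrono::seconds ttl = std::chrono::minutes(5)) : ttl_(ttl) {}

  std::optional<json> Get(const std::string& tool, const json& args) {
    auto it = entries_.find(Key(tool, args));
    if (it == entries_.end() || Clock::now() - it->second.stored > ttl_)
      return std::nullopt;
    return it->second.result;
  }

  void Put(const std::string& tool, const json& args, json result) {
    entries_[Key(tool, args)] = {std::move(result), Clock::now()};
  }

 private:
  static std::string Key(const std::string& tool, const json& args) {
    return tool + "\n" + args.dump();  // dump() is deterministic enough for a cache key
  }
  struct Entry { json result; Clock::time_point stored; };
  std::chrono::seconds ttl_;
  std::map<std::string, Entry> entries_;
};
```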
## z3ed Build Quick Reference
```bash
# Full AI features (Ollama + Gemini)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed
# AI + GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed
# Minimal build (no AI)
cmake -B build
cmake --build build --target z3ed
```
## Build Flags Explained
| Flag | Purpose | Dependencies | When to Use |
|------|---------|--------------|-------------|
| `Z3ED_AI=ON` | **Master flag** for AI features | JSON, YAML, httplib, (OpenSSL*) | Want Ollama or Gemini support |
| `YAZE_WITH_GRPC=ON` | GUI automation & testing | gRPC, Protobuf, (auto-enables JSON) | Want GUI test harness |
| `YAZE_WITH_JSON=ON` | Low-level JSON support | nlohmann_json | Auto-enabled by above flags |
*OpenSSL is optional: required for Gemini (HTTPS); Ollama works without it
## Feature Matrix
| Feature | No Flags | Z3ED_AI | Z3ED_AI + GRPC |
|---------|----------|---------|----------------|
| Basic CLI | ✅ | ✅ | ✅ |
| Ollama (local) | ❌ | ✅ | ✅ |
| Gemini (cloud) | ❌ | ✅* | ✅* |
| TUI Chat | ❌ | ✅ | ✅ |
| GUI Test Automation | ❌ | ❌ | ✅ |
| Tool Dispatcher | ❌ | ✅ | ✅ |
| Function Calling | ❌ | ✅ | ✅ |
*Requires OpenSSL for HTTPS
## Common Build Scenarios
### Developer (AI features, no GUI testing)
```bash
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
```
### Full Stack (AI + GUI automation)
```bash
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed -j8
```
### CI/CD (minimal, fast)
```bash
cmake -B build -DYAZE_MINIMAL_BUILD=ON
cmake --build build -j$(nproc)
```
### Release Build (optimized)
```bash
cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --target z3ed -j8
```
## Migration from Old Flags
### Before (Confusing)
```bash
cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
```
### After (Clear Intent)
```bash
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
```
**Note**: Old flags still work for backward compatibility!
## Troubleshooting
### "Build with -DZ3ED_AI=ON" warning
**Symptom**: AI commands fail with "JSON support required"
**Fix**: Rebuild with AI flag
```bash
rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build
```
### "OpenSSL not found" warning
**Symptom**: Gemini API doesn't work
**Impact**: Only affects Gemini (cloud). Ollama (local) works fine
**Fix (optional)**:
```bash
# macOS
brew install openssl
# Linux
sudo apt install libssl-dev
# Then rebuild
cmake -B build -DZ3ED_AI=ON && cmake --build build
```
### Ollama vs Gemini not auto-detecting
**Symptom**: Wrong backend selected
**Fix**: Set explicit provider
```bash
# Force Ollama
export YAZE_AI_PROVIDER=ollama
./build/bin/z3ed agent plan --prompt "test"
# Force Gemini
export YAZE_AI_PROVIDER=gemini
export GEMINI_API_KEY="your-key"
./build/bin/z3ed agent plan --prompt "test"
```
## Environment Variables
| Variable | Default | Purpose |
|----------|---------|---------|
| `YAZE_AI_PROVIDER` | auto | Force `ollama` or `gemini` |
| `GEMINI_API_KEY` | - | Gemini API key (enables Gemini) |
| `OLLAMA_MODEL` | `qwen2.5-coder:7b` | Override Ollama model |
| `GEMINI_MODEL` | `gemini-2.5-flash` | Override Gemini model |
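
The selection order these variables imply can be sketched as follows: an explicit `YAZE_AI_PROVIDER` wins, otherwise the presence of `GEMINI_API_KEY` selects Gemini, otherwise Ollama with its default model. This is an illustration of the documented behaviour, not the shipped selection code.

```cpp
#include <cstdlib>
#include <string>

enum class Provider { kOllama, kGemini };

// Read an environment variable with a fallback value.
std::string EnvOr(const char* name, const std::string& fallback) {
  const char* value = std::getenv(name);
  return value ? std::string(value) : fallback;
}

Provider SelectProvider() {
  std::string forced = EnvOr("YAZE_AI_PROVIDER", "");
  if (forced == "gemini") return Provider::kGemini;
  if (forced == "ollama") return Provider::kOllama;
  if (!EnvOr("GEMINI_API_KEY", "").empty()) return Provider::kGemini;
  return Provider::kOllama;  // local default; a mock backend covers the no-LLM case
}

std::string SelectModel(Provider provider) {
  return provider == Provider::kGemini
             ? EnvOr("GEMINI_MODEL", "gemini-2.5-flash")
             : EnvOr("OLLAMA_MODEL", "qwen2.5-coder:7b");
}
```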
## Platform-Specific Notes
### macOS
- OpenSSL auto-detected via Homebrew
- Keychain integration for SSL certs
- Recommended: `brew install openssl ollama`
### Linux
- OpenSSL typically pre-installed
- Install via: `sudo apt install libssl-dev`
- Ollama: Download from https://ollama.com
### Windows
- Use Ollama (no SSL required)
- Gemini requires OpenSSL (harder to set up on Windows)
- Recommended: focus on Ollama for Windows builds
## Performance Tips
### Faster Incremental Builds
```bash
# Use Ninja instead of Make
cmake -B build -GNinja -DZ3ED_AI=ON
ninja -C build z3ed
# Enable ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake -B build -DZ3ED_AI=ON
```
### Reduce Build Scope
```bash
# Only build z3ed (not full yaze app)
cmake --build build --target z3ed
# Parallel build
cmake --build build --target z3ed -j$(nproc)
```
## Related Documentation
- **Migration Guide**: [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
- **Main README**: [README.md](README.md)
- **Build Modularization**: `../../build_modularization_plan.md`
## Quick Test
Verify your build works:
```bash
# Check z3ed runs
./build/bin/z3ed --version
# Test AI detection
./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5
# Expected output (with Z3ED_AI=ON):
# 🤖 Using Gemini AI with model: gemini-2.5-flash
# or
# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
# or
# 🤖 Using MockAIService (no LLM configured)
```
## Support
If you encounter issues:
1. Check this guide's troubleshooting section
2. Review [Z3ED_AI_FLAG_MIGRATION.md](Z3ED_AI_FLAG_MIGRATION.md)
3. Verify CMake output for warnings
4. Open an issue with build logs
## Summary
**Recommended for most users**:
```bash
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
./build/bin/z3ed agent chat
```
This gives you:
- ✅ Ollama support (local, free)
- ✅ Gemini support (cloud, API key required)
- ✅ TUI chat interface
- ✅ Tool dispatcher with 5 commands
- ✅ Function calling support
- ✅ All AI agent features