scawful/yaze


scawful e1304384bc feat: Add simple chat session implementation and integrate into build system

2025-10-04 00:02:01 -04:00


z3ed Agent Roadmap

Last Updated: October 3, 2025

Current Status

✅ Production Ready

Build System: Z3ED_AI flag consolidation complete
AI Backends: Ollama (local) and Gemini (cloud) operational
Conversational Agent: Multi-step tool execution with chat history
Tool Dispatcher: 5 read-only tools (resource-list, dungeon-list-sprites, overworld-find-tile, overworld-describe-map, overworld-list-warps)
TUI Chat: FTXUI-based interactive terminal interface
Simple Chat: Text-mode REPL for AI testing (no FTXUI dependencies)
GUI Chat Widget: ImGui-based widget (needs integration into main app)

🚧 Active Work

Live LLM Testing (1-2h): Verify function calling with real models
GUI Integration (4-6h): Wire AgentChatWidget into YAZE editor
Proposal Workflow (6-8h): End-to-end integration from chat to ROM changes

Core Vision

Transform z3ed from a command-line tool into a conversational ROM hacking assistant where users can:

Ask questions about ROM contents ("What dungeons exist?")
Inspect game data interactively ("How many soldiers in room X?")
Build changes incrementally through dialogue
Generate proposals from conversation context

Technical Architecture

1. Conversational Agent Service ✅

Status: Complete

ConversationalAgentService: Manages chat sessions and tool execution
Integrates with Ollama/Gemini AI services
Handles tool calls with automatic JSON formatting
Maintains conversation history and context
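The multi-step tool loop can be illustrated with a minimal C++ sketch. The types (`ChatMessage`, `ToolCall`, `ModelReply`) and the `RunTurn` function are hypothetical stand-ins, not the actual `ConversationalAgentService` API. The model and the tool dispatcher are injected as callbacks so the loop can run without a live LLM, and the `max_steps` budget is an assumed safeguard against a model that keeps requesting tools.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical message/tool types; the real service's types may differ.
struct ChatMessage {
  std::string role;  // "user", "agent", or "tool"
  std::string text;
};

struct ToolCall {
  std::string name;  // e.g. "resource-list"
  std::string args;  // serialized arguments, e.g. "--type dungeon"
};

struct ModelReply {
  std::string text;                   // final answer when no tool is requested
  std::optional<ToolCall> tool_call;  // set when the model asks for a tool
};

using ModelFn = std::function<ModelReply(const std::vector<ChatMessage>&)>;
using ToolFn = std::function<std::string(const ToolCall&)>;

// Runs one user turn: each tool result is appended to the history and the
// model is asked again, until it answers in plain text or the step budget
// (an assumed safeguard) runs out.
std::string RunTurn(std::vector<ChatMessage>& history, const std::string& prompt,
                    const ModelFn& model, const ToolFn& dispatch,
                    int max_steps = 4) {
  assert(max_steps > 0);
  history.push_back({"user", prompt});
  for (int step = 0; step < max_steps; ++step) {
    ModelReply reply = model(history);
    if (!reply.tool_call) {
      history.push_back({"agent", reply.text});
      return reply.text;
    }
    // Replay the tool output as its own message so the next call can see it.
    history.push_back({"tool", dispatch(*reply.tool_call)});
  }
  const std::string stopped = "Stopped: tool-call limit reached.";
  history.push_back({"agent", stopped});
  return stopped;
}
```

Feeding each tool result back as a history message, rather than recursing into a fresh request, keeps the whole exchange in one transcript that the next model call (and the chat UI) can read.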

2. Read-Only Tools ✅

Status: 5 tools implemented

resource-list: Enumerate labeled resources
dungeon-list-sprites: Inspect sprites in rooms
overworld-find-tile: Search for tile16 IDs
overworld-describe-map: Get map metadata
overworld-list-warps: List entrances/exits/holes

Next: Add dialogue, sprite info, and region inspection tools
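A name-based dispatcher is one straightforward way to route model tool calls to these commands. In this sketch only the tool names come from the list above; `ToolDispatcher`, `ToolArgs`, and the handler signature are assumptions for illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical dispatcher shape: tool names match the z3ed commands above,
// but ToolArgs and the handler signature are assumptions for illustration.
using ToolArgs = std::map<std::string, std::string>;
using ToolHandler = std::function<std::string(const ToolArgs&)>;

class ToolDispatcher {
 public:
  void Register(const std::string& name, ToolHandler handler) {
    handlers_[name] = std::move(handler);
  }

  // Returns the handler's (JSON) output, or a JSON error object for unknown
  // tools so the conversation loop can report the failure to the model.
  std::string Dispatch(const std::string& name, const ToolArgs& args) const {
    auto it = handlers_.find(name);
    if (it == handlers_.end()) {
      return R"({"error":"unknown tool: )" + name + R"("})";
    }
    return it->second(args);
  }

 private:
  std::map<std::string, ToolHandler> handlers_;
};
```

Returning an error object instead of throwing lets the model see the failure and retry with a valid tool name.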

3. Chat Interfaces

Status: Multiple modes available

TUI (FTXUI): Full-screen interactive terminal (✅ complete)
Simple Mode: Text REPL for automation/testing (✅ complete)
GUI (ImGui): Dockable widget in YAZE (⚠️ needs integration)

4. Proposal Workflow Integration

Status: Planned

Goal: When the user requests ROM changes, the agent generates a proposal

User chats to explore ROM
User requests change ("add two more soldiers")
Agent generates commands → creates proposal
User reviews with agent diff or GUI
User accepts/rejects proposal

Immediate Priorities

Priority 1: Live LLM Testing (1-2 hours)

Verify function calling works end-to-end:

Test Gemini 2.0 with natural language prompts
Test Ollama (qwen2.5-coder) with tool discovery
Validate multi-step conversations
Exercise all 5 tools

Priority 2: GUI Chat Integration (4-6 hours)

Wire AgentChatWidget into main YAZE editor:

Add menu item: Debug → Agent Chat
Connect to shared ConversationalAgentService
Test with loaded ROM context
Add history persistence
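History persistence mostly comes down to serializing the message list, for example to `.yaze/agent_chat_history.json` as planned under Priority 2. The sketch below hand-rolls JSON escaping to stay dependency-free. Because the project already builds against nlohmann_json, a real implementation would more likely use that library. The `{"role", "text"}` schema and the function names are hypothetical.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct ChatMessage {
  std::string role;  // "user" or "agent"
  std::string text;
};

// Minimal JSON string escaping: quotes, backslashes, and control characters.
std::string JsonEscape(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {  // other control characters become \u00XX
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Serializes the history as a JSON array of {"role", "text"} objects
// (a hypothetical on-disk schema).
std::string SerializeHistory(const std::vector<ChatMessage>& history) {
  std::string json = "[";
  for (size_t i = 0; i < history.size(); ++i) {
    if (i > 0) json += ",";
    json += "{\"role\":\"" + JsonEscape(history[i].role) + "\",\"text\":\"" +
            JsonEscape(history[i].text) + "\"}";
  }
  return json + "]";
}
```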

Priority 3: Proposal Generation (6-8 hours)

Technical Implementation Plan

1. Conversational Agent Service

Description: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
Components:
- ConversationalAgentService: The main class for managing the chat session.
- Integration with existing AIService implementations (Ollama, Gemini).
Status: In progress — baseline service exists with chat history, tool loop handling, and structured response parsing. Next up: wiring in live ROM context and richer session state.

2. Read-Only "Tools" for the Agent

Description: To enable the agent to answer questions, we need to expand z3ed with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
Example Tools to Implement:
- resource list --type <dungeon|sprite|...>: List all user-defined labels of a certain type.
- dungeon list-sprites --room <id|label>: List all sprites in a given room.
- dungeon get-info --room <id|label>: Get metadata for a specific room.
- overworld find-tile --tile <id>: Find all occurrences of a specific tile on the overworld map.
Advanced Editing Tools (for future implementation):
- overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <id>
- overworld replace-tile --map <id> --from <old_id> --to <new_id>
- overworld blend-tiles --map <id> --pattern <name> --density <percent>
Status: Foundational commands (resource-list, dungeon-list-sprites) are live with JSON output. Focus is shifting to high-value Overworld and dialogue inspection tools.
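Both a tile search and the proposed replace-tile command reduce to a scan over a map's tile16 IDs. This sketch assumes a flat, row-major `std::vector<std::uint16_t>` per map. yaze's actual overworld data structures are likely different, and `FindTile16`/`ReplaceTile16` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TilePos {
  int x;
  int y;
};

// Returns every position in a row-major grid of tile16 IDs that holds tile_id.
// The flat-grid representation is an assumption, not yaze's data model.
std::vector<TilePos> FindTile16(const std::vector<std::uint16_t>& tiles,
                                int width, std::uint16_t tile_id) {
  assert(width > 0);
  const auto w = static_cast<std::size_t>(width);
  std::vector<TilePos> hits;
  for (std::size_t i = 0; i < tiles.size(); ++i) {
    if (tiles[i] == tile_id) {
      hits.push_back({static_cast<int>(i % w), static_cast<int>(i / w)});
    }
  }
  return hits;
}

// Replaces every `from` tile with `to` and returns the number of cells changed,
// i.e. the effect the proposed replace-tile command describes.
int ReplaceTile16(std::vector<std::uint16_t>& tiles, std::uint16_t from,
                  std::uint16_t to) {
  int changed = 0;
  for (auto& tile : tiles) {
    if (tile == from) {
      tile = to;
      ++changed;
    }
  }
  return changed;
}
```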

3. TUI and GUI Chat Interfaces

Description: User-facing components for interacting with the ConversationalAgentService.
Components:
- TUI: A new full-screen component in z3ed using FTXUI, providing a rich chat experience in the terminal.
- GUI: A new ImGui widget that can be docked into the main yaze application window.
Status: In progress — CLI/TUI and GUI chat widgets exist, now rendering tables/JSON with readable formatting. Need to improve input ergonomics and synchronized history navigation.

4. Integration with the Proposal Workflow

Description: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the ConversationalAgentService will trigger the existing Tile16ProposalGenerator (and future proposal generators for other resource types) to create a proposal.
Workflow:
1. User chats with the agent to explore the ROM.
2. User asks the agent to make a change.
3. ConversationalAgentService generates the commands and passes them to the appropriate ProposalGenerator.
4. A new proposal is created and saved.
5. The TUI/GUI notifies the user that a proposal is ready for review.
6. User uses the agent diff and agent accept commands (or UI equivalents) to review and apply the changes.
Status: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
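Step 6 implies that each proposal starts out pending and receives exactly one decision. The sketch below captures only that invariant. `Proposal`, `ProposalStatus`, and `Decide` are hypothetical and do not reflect the existing proposal system's API.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical proposal record; the existing proposal system's fields may differ.
enum class ProposalStatus { kPending, kAccepted, kRejected };

struct Proposal {
  std::string id;
  std::vector<std::string> commands;  // z3ed commands generated from the chat
  ProposalStatus status = ProposalStatus::kPending;
};

// Records the user's decision. A proposal can be accepted or rejected once;
// deciding again is treated as a logic error.
void Decide(Proposal& proposal, bool accept) {
  if (proposal.status != ProposalStatus::kPending) {
    throw std::logic_error("proposal " + proposal.id + " was already decided");
  }
  proposal.status = accept ? ProposalStatus::kAccepted : ProposalStatus::kRejected;
}
```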

Next Steps

Immediate Priorities

✅ Build System Consolidation (COMPLETE - Oct 3, 2025):
- ✅ Created Z3ED_AI master flag for simplified builds
- ✅ Fixed Gemini crash with graceful degradation
- ✅ Updated documentation with new build instructions
- ✅ Tested both Ollama and Gemini backends
- Next: Update CI/CD workflows to use -DZ3ED_AI=ON
Live LLM Testing (NEXT UP - 1-2 hours):
- Verify function calling works with real Ollama/Gemini
- Test multi-step tool execution
- Validate all 5 tools with natural language prompts
Expand Overworld Tool Coverage:
- ✅ Ship read-only tile searches (overworld find-tile) with shared formatting for CLI and agent calls.
- Next: add area summaries, teleport destination lookups, and keep JSON/Text parity for all new tools.
Polish the TUI Chat Experience:
- Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
- Align log file output with on-screen formatting for easier debugging.
Document & Test the New Tooling:
- Update the main README.md and relevant docs to cover the new chat formatting.
- Add regression tests (unit or golden JSON fixtures) for the new Overworld tools.
Build GUI Chat Widget:
- Create the ImGui component.
- Ensure it shares the same backend service as the TUI.
Full Integration with Proposal System:
- Implement the logic for the agent to transition from conversation to proposal generation.
Expand Tool Arsenal:
- Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
Multi-Modal Agent:
- Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
Advanced Configuration:
- Implement environment variables for selecting AI providers and models (e.g., YAZE_AI_PROVIDER, OLLAMA_MODEL).
- Add CLI flags for overriding the provider and model on a per-command basis.
Performance and Cost-Saving:
- Implement a response cache to reduce latency and API costs.
- Add token usage tracking and reporting.

Current Status & Next Steps (Updated: October 3, 2025)

We have made significant progress in laying the foundation for the conversational agent.

✅ Completed

Build System Consolidation: ✅ NEW Z3ED_AI master flag (Oct 3, 2025)
- Single flag enables all AI features: -DZ3ED_AI=ON
- Auto-manages dependencies (JSON, YAML, httplib, OpenSSL)
- Fixed a Gemini crash when an API key was set but JSON support was disabled
- Graceful degradation with clear error messages
- Backward compatible with old flags
- Ready for build modularization (enables optional libyaze_agent.a)
- Docs: docs/z3ed/Z3ED_AI_FLAG_MIGRATION.md
ConversationalAgentService: ✅ Fully operational with multi-step tool execution loop
- Handles tool calls with automatic JSON output format
- Prevents recursion through proper tool result replay
- Supports conversation history and context management
TUI Chat Interface: ✅ Production-ready (z3ed agent chat)
- Renders tables from JSON tool results
- Pretty-prints JSON payloads with syntax formatting
- Scrollable history with user/agent distinction
Tool Dispatcher: ✅ Complete with 5 read-only tools
- resource-list: Enumerate labeled resources (dungeons, sprites, palettes)
- dungeon-list-sprites: Inspect sprites in dungeon rooms
- overworld-find-tile: Search for tile16 IDs across maps
- overworld-describe-map: Get comprehensive map metadata
- overworld-list-warps: List entrances/exits/holes with filtering
Structured Output Rendering: ✅ Both TUI formats support tables and JSON
- Automatic table generation from JSON arrays/objects
- Column-aligned formatting with headers
- Graceful fallback to text for malformed data
ROM Context Integration: ✅ Tools can use the loaded ROM or load one via the --rom flag
- Shared ROM context passed through ConversationalAgentService
- Automatic ROM loading with error handling
AI Service Foundation: ✅ Ollama and Gemini services operational
- Enhanced prompting system with resource catalogue loading
- System instruction generation with examples
- Health checks and model availability validation
- Both backends tested and working in production
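The structured output renderer turns JSON arrays into column-aligned tables. At its core, it computes one width per column and pads each cell to that width. In this sketch, a `std::map<std::string, std::string>` per row stands in for a JSON object, which avoids assuming any particular JSON library's API. `RenderTable` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for one JSON object: column name -> cell text.
using Row = std::map<std::string, std::string>;

// Renders rows as a column-aligned text table with a header line. Columns are
// taken from the first row (std::map orders them by name); a missing cell
// renders as an empty string. Columns are separated by two spaces.
std::string RenderTable(const std::vector<Row>& rows) {
  if (rows.empty()) return "(no results)\n";

  std::vector<std::string> cols;
  for (const auto& kv : rows.front()) cols.push_back(kv.first);

  auto cell = [&](const Row& row, std::size_t col) {
    auto it = row.find(cols[col]);
    return it == row.end() ? std::string() : it->second;
  };

  // Each column is as wide as its header or its widest cell.
  std::vector<std::size_t> width;
  for (std::size_t c = 0; c < cols.size(); ++c) {
    std::size_t w = cols[c].size();
    for (const auto& row : rows) w = std::max(w, cell(row, c).size());
    width.push_back(w);
  }

  std::ostringstream out;
  auto emit_line = [&](const std::vector<std::string>& cells) {
    for (std::size_t c = 0; c < cells.size(); ++c) {
      out << cells[c];
      if (c + 1 < cells.size()) out << std::string(width[c] - cells[c].size() + 2, ' ');
    }
    out << '\n';
  };

  emit_line(cols);
  for (const auto& row : rows) {
    std::vector<std::string> cells;
    for (std::size_t c = 0; c < cols.size(); ++c) cells.push_back(cell(row, c));
    emit_line(cells);
  }
  return out.str();
}
```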

🚧 In Progress

Live LLM Testing: Ready to execute with real Ollama/Gemini
- All infrastructure complete (function calling, tool schemas, response parsing)
- Need to verify multi-step tool execution with live models
- Test scenarios prepared for all 5 tools
- Estimated Time: 1-2 hours
GUI Chat Widget: Not yet started
- TUI implementation complete and can serve as reference
- Should reuse table/JSON rendering logic from TUI
- Target: src/app/gui/debug/agent_chat_widget.{h,cc}
- Estimated Time: 6-8 hours

🚀 Next Steps (Priority Order)

Priority 1: Live LLM Testing with Function Calling (1-2 hours)

Goal: Verify Ollama/Gemini can autonomously invoke tools in production

Infrastructure Complete ✅:

✅ Tool schema generation (BuildFunctionCallSchemas())
✅ System prompts include function definitions
✅ AI services parse tool_calls from responses
✅ ConversationalAgentService dispatches to ToolDispatcher
✅ All 5 tools tested independently

Testing Tasks:

Gemini Testing (30 min)
- Verify Gemini 2.0 generates correct tool_calls JSON
- Test prompt: "What dungeons are in this ROM?"
- Verify tool result fed back into conversation
- Test multi-step: "Now list sprites in the first dungeon"
Ollama Testing (30 min)
- Verify qwen2.5-coder discovers and calls tools
- Same test prompts as Gemini
- Compare response quality between models
Tool Coverage Testing (30 min)
- Exercise all 5 tools with natural language prompts
- Verify JSON output formats correctly
- Test error handling (invalid room IDs, etc.)

Success Criteria:

LLM autonomously calls tools without explicit command syntax
Tool results incorporated into follow-up responses
Multi-turn conversations work with context

Priority 2: Implement GUI Chat Widget (6-8 hours)

Goal: Unified chat experience in YAZE application

Create ImGui Chat Widget (4 hours)
- File: src/app/gui/debug/agent_chat_widget.{h,cc}
- Reuse table/JSON rendering logic from TUI implementation
- Add to Debug menu: Debug → Agent Chat
- Share ConversationalAgentService instance with TUI
Add Chat History Persistence (2 hours)
- Save chat history to .yaze/agent_chat_history.json
- Load on startup, display in GUI/TUI
- Add "Clear History" button
Polish Input Experience (2 hours)
- Multi-line input support (Shift+Enter for newline, Enter to send)
- Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
- Auto-scroll to bottom on new messages
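The proposed Enter/Shift+Enter behavior can be expressed independently of ImGui or FTXUI, so the GUI and TUI can share one rule. In this sketch, `Key`, `InputBox`, and `OnKey` are hypothetical. A real widget would translate its toolkit's key events into these calls.

```cpp
#include <optional>
#include <string>

// Toolkit-neutral key events; a real widget would map ImGui or FTXUI events
// onto these (the Key enum itself is hypothetical).
enum class Key { kChar, kEnter };

struct InputBox {
  std::string buffer;

  // Enter sends the buffered message; Shift+Enter inserts a newline.
  // Returns the message to send, or std::nullopt if nothing is sent.
  std::optional<std::string> OnKey(Key key, bool shift, char ch = '\0') {
    if (key == Key::kChar) {
      buffer += ch;
      return std::nullopt;
    }
    if (shift) {
      buffer += '\n';
      return std::nullopt;
    }
    if (buffer.empty()) return std::nullopt;  // ignore empty sends
    std::string message;
    message.swap(buffer);  // hand off the text and clear the input
    return message;
  }
};
```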

Priority 3: Proposal Generation (6-8 hours)

Connect chat to ROM modification workflow:

Detect action intents in conversation
Generate proposal from accumulated context
Link proposal to chat history
GUI notification when proposal ready
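Intent detection could start as a cheap heuristic before the service defers to the LLM. The function below is a naive keyword check and an assumption, not the planned design: it flags a message as a change request when its first meaningful word is an editing verb. It handles the roadmap's own example ("Okay, now add two more soldiers") but would miss phrasings such as "I want more soldiers".

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <sstream>
#include <string>

// Naive heuristic (hypothetical): skip filler words, then check whether the
// first remaining word is an editing verb.
bool LooksLikeChangeRequest(const std::string& message) {
  static const std::array<const char*, 6> kVerbs = {
      "add", "remove", "replace", "change", "set", "move"};
  std::istringstream words(message);
  std::string word;
  while (words >> word) {
    // Normalize: lowercase and strip punctuation such as "Okay," -> "okay".
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    if (word == "please" || word == "okay" || word == "ok" || word == "now") continue;
    return std::find(kVerbs.begin(), kVerbs.end(), word) != kVerbs.end();
  }
  return false;
}
```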

Command Reference

Chat Modes

# Interactive TUI chat (FTXUI)
z3ed agent chat --rom zelda3.sfc

# Simple text mode (for automation/AI testing)
z3ed agent simple-chat --rom zelda3.sfc

# Batch mode from file
z3ed agent simple-chat --file tests.txt --rom zelda3.sfc
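Batch mode reads prompts from a file. The file format is not specified here, so this sketch assumes one prompt per line and skips blank lines and lines starting with `#`. `ParseBatchPrompts` is a hypothetical name, and the caller would first read `tests.txt` into a string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits batch-file contents into prompts. Assumed format: one prompt per
// line; blank lines and lines starting with '#' are skipped.
std::vector<std::string> ParseBatchPrompts(const std::string& contents) {
  std::istringstream in(contents);
  std::vector<std::string> prompts;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    if (line.empty() || line[0] == '#') continue;
    prompts.push_back(line);
  }
  return prompts;
}
```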

Tool Commands (for direct testing)

# List dungeons
z3ed agent resource-list --type dungeon --format json

# Find tiles
z3ed agent overworld-find-tile --tile 0x02E --map 0x05

# List sprites in room
z3ed agent dungeon-list-sprites --room 0x012
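These commands take IDs in hex (`0x012`), and the technical plan also allows `--room <id|label>`. A hypothetical resolver for such arguments could look like the following. The label map stands in for the project's resource labels, and the six-digit cap is an arbitrary guard.

```cpp
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Resolves an "<id|label>" argument such as --room 0x012: a user-defined label
// first, then hex ("0x012") or decimal ("18"). The label map is a hypothetical
// stand-in for the project's resource labels.
std::optional<int> ResolveId(const std::string& arg,
                             const std::map<std::string, int>& labels) {
  if (auto it = labels.find(arg); it != labels.end()) return it->second;

  int base = 10;
  std::string digits = arg;
  if (arg.size() > 2 && arg[0] == '0' && (arg[1] == 'x' || arg[1] == 'X')) {
    base = 16;
    digits = arg.substr(2);
  }
  // Reject empty or overly long input; the cap also keeps std::stoi in range.
  if (digits.empty() || digits.size() > 6) return std::nullopt;
  for (unsigned char c : digits) {
    const bool valid = base == 16 ? std::isxdigit(c) != 0 : std::isdigit(c) != 0;
    if (!valid) return std::nullopt;
  }
  return std::stoi(digits, nullptr, base);
}
```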

Build Quick Reference

# Full AI features
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

# With GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build

# Minimal (no AI)
cmake -B build
cmake --build build --target z3ed

Future Enhancements

Short Term (1-2 months)

Dialogue/text search tools
Sprite info inspection
Region/teleport tools
Response caching
Token usage tracking
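Token tracking mainly needs the counts each backend reports plus a price. This sketch formats the footer proposed under Priority 4 ("Last response: 1234 tokens, ~$0.02"). `FormatTokenFooter` is hypothetical, and the blended per-million-token price is a caller-supplied assumption: providers typically price input and output tokens differently, and rates change over time. Passing 0 (for example, for local Ollama models) omits the cost.

```cpp
#include <cstdio>
#include <string>

// Builds the chat-footer text. usd_per_million_tokens is an assumed blended
// rate supplied by the caller; 0 or less omits the cost estimate.
std::string FormatTokenFooter(long prompt_tokens, long completion_tokens,
                              double usd_per_million_tokens) {
  const long total = prompt_tokens + completion_tokens;
  std::string footer = "Last response: " + std::to_string(total) + " tokens";
  if (usd_per_million_tokens > 0) {
    char cost[32];
    std::snprintf(cost, sizeof(cost), ", ~$%.2f",
                  total * usd_per_million_tokens / 1e6);
    footer += cost;
  }
  return footer;
}
```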

Medium Term (3-6 months)

Multi-modal agent (image generation)
Advanced configuration (env vars, model selection)
Proposal templates for common edits
Undo/redo in conversations

Long Term (6+ months)

Visual diff viewer for proposals
Collaborative editing sessions
Learning from user feedback
Custom tool plugins

Goal: Enable deeper ROM introspection for level design questions

Dialogue/Text Tools (3 hours)
- dialogue-search --text "search term": Find text in ROM dialogue
- dialogue-get --id 0x...: Get dialogue by message ID
Sprite Tools (3 hours)
- sprite-get-info --id 0x...: Sprite metadata (HP, damage, AI)
- overworld-list-sprites --map 0x...: Sprites on overworld map
Advanced Overworld Tools (4 hours)
- overworld-get-region --map 0x...: Region boundaries and properties
- overworld-list-transitions --from-map 0x...: Map transitions/scrolling
- overworld-get-tile-at --map 0x... --x N --y N: Get specific tile16 value

Priority 4: Performance and Caching (4-6 hours)

Response Caching (3 hours)
- Implement LRU cache for identical prompts
- Cache tool results by (tool_name, args) key
- Configurable TTL (default: 5 minutes for ROM introspection)
Token Usage Tracking (2 hours)
- Log tokens per request (Ollama and Gemini APIs provide this)
- Display in chat footer: "Last response: 1234 tokens, ~$0.02"
- Add --show-token-usage flag to CLI commands
Streaming Responses (optional, 3-4 hours)
- Use Ollama/Gemini streaming APIs
- Update GUI/TUI to show partial responses as they arrive
- Improves perceived latency for long responses
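The Response Caching item can be sketched as an LRU list plus an index keyed by `(tool_name, args)`, where entries expire after a TTL. `ToolResultCache` is hypothetical. The current time is passed in explicitly so expiry can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Sketch of an LRU cache with a TTL, keyed by (tool_name, args).
class ToolResultCache {
 public:
  using Clock = std::chrono::steady_clock;
  using Key = std::pair<std::string, std::string>;  // (tool_name, args)

  ToolResultCache(std::size_t capacity, Clock::duration ttl)
      : capacity_(capacity), ttl_(ttl) {
    assert(capacity > 0);
  }

  std::optional<std::string> Get(const Key& key, Clock::time_point now) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    if (now - it->second->stored_at > ttl_) {  // expired: drop it and miss
      entries_.erase(it->second);
      index_.erase(it);
      return std::nullopt;
    }
    entries_.splice(entries_.begin(), entries_, it->second);  // most recent
    return it->second->value;
  }

  void Put(const Key& key, std::string value, Clock::time_point now) {
    if (auto it = index_.find(key); it != index_.end()) {
      entries_.erase(it->second);
      index_.erase(it);
    }
    entries_.push_front({key, std::move(value), now});
    index_[key] = entries_.begin();
    if (entries_.size() > capacity_) {  // evict the least recently used entry
      index_.erase(entries_.back().key);
      entries_.pop_back();
    }
  }

 private:
  struct Entry {
    Key key;
    std::string value;
    Clock::time_point stored_at;
  };
  std::size_t capacity_;
  Clock::duration ttl_;
  std::list<Entry> entries_;  // front = most recently used
  std::map<Key, std::list<Entry>::iterator> index_;
};
```

The planned 5-minute default would be `ToolResultCache cache(64, std::chrono::minutes(5));` (the capacity of 64 is arbitrary). A cached introspection result goes stale as soon as the ROM is edited, so a real cache would also need to be cleared when the ROM changes.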

z3ed Build Quick Reference

# Full AI features (Ollama + Gemini)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

# AI + GUI automation/testing
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed

# Minimal build (no AI)
cmake -B build
cmake --build build --target z3ed

Build Flags Explained

Flag	Purpose	Dependencies	When to Use
`Z3ED_AI=ON`	Master flag for AI features	JSON, YAML, httplib, (OpenSSL*)	Want Ollama or Gemini support
`YAZE_WITH_GRPC=ON`	GUI automation & testing	gRPC, Protobuf, (auto-enables JSON)	Want GUI test harness
`YAZE_WITH_JSON=ON`	Low-level JSON support	nlohmann_json	Auto-enabled by above flags

*OpenSSL is optional: Gemini requires it for HTTPS, while Ollama works without it

Feature Matrix

Feature	No Flags	Z3ED_AI	Z3ED_AI + GRPC
Basic CLI	✅	✅	✅
Ollama (local)	❌	✅	✅
Gemini (cloud)	❌	✅*	✅*
TUI Chat	❌	✅	✅
GUI Test Automation	❌	❌	✅
Tool Dispatcher	❌	✅	✅
Function Calling	❌	✅	✅

*Requires OpenSSL for HTTPS

Common Build Scenarios

Developer (AI features, no GUI testing)

cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8

Full Stack (AI + GUI automation)

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed -j8

CI/CD (minimal, fast)

cmake -B build -DYAZE_MINIMAL_BUILD=ON
cmake --build build -j$(nproc)

Release Build (optimized)

cmake -B build -DZ3ED_AI=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --target z3ed -j8

Migration from Old Flags

Before (Confusing)

cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON

After (Clear Intent)

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON

Note: Old flags still work for backward compatibility!

Troubleshooting

"Build with -DZ3ED_AI=ON" warning

Symptom: AI commands fail with "JSON support required"
Fix: Rebuild with AI flag

rm -rf build && cmake -B build -DZ3ED_AI=ON && cmake --build build

"OpenSSL not found" warning

Symptom: Gemini API doesn't work
Impact: Only Gemini (cloud) is affected; Ollama (local) works fine.
Fix (optional):

# macOS
brew install openssl

# Linux
sudo apt install libssl-dev

# Then rebuild
cmake -B build -DZ3ED_AI=ON && cmake --build build

Ollama vs Gemini not auto-detecting

Symptom: Wrong backend selected
Fix: Set explicit provider

# Force Ollama
export YAZE_AI_PROVIDER=ollama
./build/bin/z3ed agent plan --prompt "test"

# Force Gemini
export YAZE_AI_PROVIDER=gemini
export GEMINI_API_KEY="your-key"
./build/bin/z3ed agent plan --prompt "test"

Environment Variables

Variable	Default	Purpose
`YAZE_AI_PROVIDER`	auto	Force `ollama` or `gemini`
`GEMINI_API_KEY`	-	Gemini API key (enables Gemini)
`OLLAMA_MODEL`	`qwen2.5-coder:7b`	Override Ollama model
`GEMINI_MODEL`	`gemini-2.5-flash`	Override Gemini model
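These variables could resolve into a provider and model as sketched below. The precedence used here (an explicit `YAZE_AI_PROVIDER`, then a non-empty `GEMINI_API_KEY`, then a mock fallback) is an assumption inferred from this table and the Quick Test output, not confirmed z3ed behavior. `AIConfig` and `ResolveAIConfig` are hypothetical, and the environment lookup is injectable for testing.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

struct AIConfig {
  std::string provider;  // "ollama", "gemini", or "mock"
  std::string model;
};

using EnvLookup = std::function<const char*(const char*)>;

// Hypothetical resolution order (an assumption, not confirmed z3ed logic):
// explicit YAZE_AI_PROVIDER, else Gemini when GEMINI_API_KEY is set, else mock.
// Model defaults mirror the table above.
AIConfig ResolveAIConfig(const EnvLookup& env = [](const char* name) {
  return std::getenv(name);
}) {
  auto get = [&](const char* name, const std::string& fallback) {
    const char* value = env(name);
    return (value && *value) ? std::string(value) : fallback;  // empty = unset
  };
  std::string provider = get("YAZE_AI_PROVIDER", "auto");
  if (provider == "auto") {
    provider = get("GEMINI_API_KEY", "").empty() ? "mock" : "gemini";
  }
  if (provider == "gemini") return {provider, get("GEMINI_MODEL", "gemini-2.5-flash")};
  if (provider == "ollama") return {provider, get("OLLAMA_MODEL", "qwen2.5-coder:7b")};
  return {"mock", ""};
}
```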

Platform-Specific Notes

macOS

OpenSSL auto-detected via Homebrew
Keychain integration for SSL certs
Recommended: brew install openssl ollama

Linux

OpenSSL typically pre-installed
Install via: sudo apt install libssl-dev
Ollama: Download from https://ollama.com

Windows

Use Ollama (no SSL required)
Gemini requires OpenSSL (harder to set up on Windows)
Recommendation: focus on Ollama for Windows builds

Performance Tips

Faster Incremental Builds

# Use Ninja instead of Make (start from a fresh build directory if build/ was already configured with another generator)
cmake -B build -GNinja -DZ3ED_AI=ON
ninja -C build z3ed

# Enable ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
cmake -B build -DZ3ED_AI=ON

Reduce Build Scope

# Only build z3ed (not full yaze app)
cmake --build build --target z3ed

# Parallel build
cmake --build build --target z3ed -j$(nproc)

Related Documentation

Migration Guide: Z3ED_AI_FLAG_MIGRATION.md
Technical Roadmap: AGENT-ROADMAP.md
Main README: README.md
Build Modularization: ../../build_modularization_plan.md

Quick Test

Verify your build works:

# Check z3ed runs
./build/bin/z3ed --version

# Test AI detection
./build/bin/z3ed agent plan --prompt "test" 2>&1 | head -5

# Expected output (with Z3ED_AI=ON):
# 🤖 Using Gemini AI with model: gemini-2.5-flash
# or
# 🤖 Using Ollama AI with model: qwen2.5-coder:7b
# or
# 🤖 Using MockAIService (no LLM configured)

Support

If you encounter issues:

Check this guide's troubleshooting section
Review Z3ED_AI_FLAG_MIGRATION.md
Verify CMake output for warnings
Open an issue with build logs

Summary

Recommended for most users:

cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed -j8
./build/bin/z3ed agent chat

This gives you:

✅ Ollama support (local, free)
✅ Gemini support (cloud, API key required)
✅ TUI chat interface
✅ Tool dispatcher with 5 commands
✅ Function calling support
✅ All AI agent features
