docs: Update roadmap and implementation plan to reflect progress on conversational agent and LLM integration

scawful
2025-10-03 22:17:38 -04:00
parent 5123b0ee5f
commit 799d8117ed
4 changed files with 170 additions and 37 deletions


@@ -86,26 +86,133 @@ This vision will be realized through a shared interface available in both the `z
- Implement a response cache to reduce latency and API costs.
- Add token usage tracking and reporting.
## Current Status & Next Steps (As of Oct 3, Session 2)
## Current Status & Next Steps (Updated: October 3, 2025)
We have made significant progress in laying the foundation for the conversational agent.
### ✅ Completed
- **Initial `ConversationalAgentService`**: The basic service is in place.
- **TUI Chat Stub**: A functional `agent chat` command exists.
- **GUI Chat Widget Stub**: An `AgentChatWidget` is integrated into the main GUI.
- **Initial Agent "Tools"**: `resource-list` and `dungeon-list-sprites` commands are implemented.
- **Tool Use Foundation**: The `ToolDispatcher` is implemented, and the AI services are aware of the new tool call format.
- **Tool Loop Improvements**: Conversational flow now handles multi-step tool calls with default JSON output, allowing results to feed back into the chat without recursion.
- **Structured Tool Output Rendering**: Both the TUI and GUI chat widgets now display tables and JSON payloads with friendly formatting, drastically improving readability.
- **Overworld Inspection Suite**: Added `overworld describe-map` and `overworld list-warps` commands producing text/JSON summaries for map metadata and warp points, with agent tooling hooks.
- **Overworld Tile Search Tool**: Added `overworld find-tile` across CLI and agent tooling with shared ROM context handling and regression tests.
- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
- Handles tool calls with automatic JSON output format
- Prevents recursion through proper tool result replay
- Supports conversation history and context management
- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
- Renders tables from JSON tool results
- Pretty-prints JSON payloads with syntax formatting
- Scrollable history with user/agent distinction
- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
- `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
- `dungeon-list-sprites`: Inspect sprites in dungeon rooms
- `overworld-find-tile`: Search for tile16 IDs across maps
- `overworld-describe-map`: Get comprehensive map metadata
- `overworld-list-warps`: List entrances/exits/holes with filtering
- **Structured Output Rendering**: ✅ The TUI renders both tables and JSON
- Automatic table generation from JSON arrays/objects
- Column-aligned formatting with headers
- Graceful fallback to text for malformed data
- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
- Shared ROM context passed through ConversationalAgentService
- Automatic ROM loading with error handling
- **AI Service Foundation**: ✅ Ollama and Gemini services operational
- Enhanced prompting system with resource catalogue loading
- System instruction generation with examples
- Health checks and model availability validation
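The multi-step tool execution loop described above (tool results replayed into the conversation rather than handled by recursion) can be sketched roughly as follows. This is a minimal illustration, not the actual `ConversationalAgentService` code: the `ToolCall`, `AgentResponse`, and `ChatMessage` structs, the `Model` and `Dispatcher` aliases, the `RunToolLoop` function, and the `max_steps` limit are all assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real yaze types may look different.
struct ToolCall {
  std::string tool_name;
  std::string args;
};

struct AgentResponse {
  std::string text;                  // final answer, if any
  std::vector<ToolCall> tool_calls;  // empty when the model is done
};

struct ChatMessage {
  std::string role;  // "user", "agent", or "tool"
  std::string content;
};

using Model = std::function<AgentResponse(const std::vector<ChatMessage>&)>;
using Dispatcher = std::function<std::string(const ToolCall&)>;

// Iterative loop: every tool result is appended to the history and replayed
// to the model on the next pass, so the service never calls itself.
std::string RunToolLoop(const std::string& user_message, const Model& model,
                        const Dispatcher& dispatch,
                        std::vector<ChatMessage>& history, int max_steps = 4) {
  assert(max_steps > 0);
  history.push_back({"user", user_message});
  for (int step = 0; step < max_steps; ++step) {
    AgentResponse response = model(history);
    if (response.tool_calls.empty()) {
      history.push_back({"agent", response.text});
      return response.text;
    }
    for (const ToolCall& call : response.tool_calls) {
      history.push_back({"tool", dispatch(call)});
    }
  }
  // Assumed fallback: give up instead of looping forever.
  return "Stopped: tool step limit reached.";
}
```

Bounding the loop with a step limit is one simple way to keep a misbehaving model from requesting tools indefinitely; the real service may use a different safeguard.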
### 🚀 Next Steps
1. **Integrate Tool Use with LLM**:
- Modify the `AIService` to support function calling/tool use.
- Teach the agent to call the new read-only commands to answer questions.
2. **Polish the TUI Chat Experience**:
- Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
- Align log file output with on-screen formatting for easier debugging.
2. **Expand Tool Coverage**: Target additional Overworld navigation helpers (region summaries, teleport lookups) and dialogue inspectors. Prioritize commands that unblock common level-design questions and emit concise table/JSON payloads.
### 🚧 In Progress
- **GUI Chat Widget**: ⚠️ **NOT YET IMPLEMENTED**
- No `AgentChatWidget` found in `src/app/gui/` directory
- TUI implementation exists but GUI integration is pending
- **Action Required**: Create `src/app/gui/debug/agent_chat_widget.{h,cc}`
- **LLM Function Calling**: ⚠️ **PARTIALLY IMPLEMENTED**
- ToolDispatcher exists and is used by ConversationalAgentService
- AI services (Ollama, Gemini) parse tool calls from responses
- **Gap**: LLM prompt needs explicit tool schema definitions for function calling
- **Action Required**: Add tool definitions to system prompts (see Next Steps)
### 🚀 Next Steps (Priority Order)
#### Priority 1: Complete LLM Function Calling Integration (4-6 hours)
**Goal**: Enable Ollama/Gemini to autonomously invoke read-only tools
1. **Add Tool Definitions to System Prompts** (2 hours)
- Generate JSON schema for all 5 tools in `ToolDispatcher`
- Inject tool definitions into `PromptBuilder::BuildSystemInstruction()`
- Format: OpenAI-compatible function calling format
```json
{
  "name": "resource-list",
  "description": "List all labeled resources of a given type",
  "parameters": {
    "type": "object",
    "properties": {
      "type": {"type": "string", "enum": ["dungeon", "sprite", "overworld"]},
      "format": {"type": "string", "enum": ["table", "json"]}
    },
    "required": ["type"]
  }
}
```
2. **Parse Function Calls from LLM Responses** (2 hours)
- Update `OllamaAIService::GenerateResponse()` to detect function calls in JSON
- Update `GeminiAIService::GenerateResponse()` for Gemini's function calling format
- Populate `AgentResponse.tool_calls` with parsed ToolCall objects
- **File**: `src/cli/service/ai/ollama_ai_service.cc:176-294`
- **File**: `src/cli/service/ai/gemini_ai_service.cc:104-285`
3. **Test Tool Invocation Round-Trip** (1-2 hours)
- Verify LLM can discover available tools from system prompt
- Test: "What dungeons are in this ROM?" → should call `resource-list --type dungeon`
- Test: "Find all water tiles on map 0" → should call `overworld-find-tile --tile 0x...`
- Create regression test script: `scripts/test_agent_tool_calling.sh`
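As a rough illustration of step 1 above, tool definitions could be serialized into the OpenAI-style schema and appended to the system prompt along these lines. Everything here is a sketch under assumptions: `ToolParam`, `ToolSpec`, `JsonQuote`, `ToolSpecToJson`, and `AppendToolDefinitions` are hypothetical names, all parameters are treated as strings, and the real `PromptBuilder` may well use a JSON library instead of manual string building.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one string-typed tool parameter.
struct ToolParam {
  std::string name;
  std::vector<std::string> allowed_values;  // empty means free-form string
  bool required;
};

struct ToolSpec {
  std::string name;
  std::string description;
  std::vector<ToolParam> params;
};

// Escapes quotes, backslashes, and newlines; other control characters are
// not handled in this sketch.
std::string JsonQuote(const std::string& text) {
  std::string out = "\"";
  for (char c : text) {
    if (c == '\n') { out += "\\n"; continue; }
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out + "\"";
}

// Emits one compact function-calling definition for a tool.
std::string ToolSpecToJson(const ToolSpec& tool) {
  assert(!tool.name.empty());
  std::ostringstream json;
  std::string required;
  json << "{\"name\":" << JsonQuote(tool.name)
       << ",\"description\":" << JsonQuote(tool.description)
       << ",\"parameters\":{\"type\":\"object\",\"properties\":{";
  for (std::size_t i = 0; i < tool.params.size(); ++i) {
    const ToolParam& p = tool.params[i];
    if (i > 0) json << ",";
    json << JsonQuote(p.name) << ":{\"type\":\"string\"";
    if (!p.allowed_values.empty()) {
      json << ",\"enum\":[";
      for (std::size_t j = 0; j < p.allowed_values.size(); ++j) {
        if (j > 0) json << ",";
        json << JsonQuote(p.allowed_values[j]);
      }
      json << "]";
    }
    json << "}";
    if (p.required) {
      if (!required.empty()) required += ",";
      required += JsonQuote(p.name);
    }
  }
  json << "},\"required\":[" << required << "]}}";
  return json.str();
}

// Appends one definition per line under an assumed section header.
std::string AppendToolDefinitions(const std::string& system_prompt,
                                  const std::vector<ToolSpec>& tools) {
  std::string out = system_prompt + "\n\nAvailable tools (JSON):\n";
  for (const ToolSpec& tool : tools) out += ToolSpecToJson(tool) + "\n";
  return out;
}
```

Generating the definitions from one list of specs keeps the prompt and the dispatcher from drifting apart as tools are added.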
#### Priority 2: Implement GUI Chat Widget (6-8 hours)
**Goal**: Unified chat experience in YAZE application
1. **Create ImGui Chat Widget** (4 hours)
- File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
- Reuse table/JSON rendering logic from TUI implementation
- Add to Debug menu: `Debug → Agent Chat`
- Share `ConversationalAgentService` instance with TUI
2. **Add Chat History Persistence** (2 hours)
- Save chat history to `.yaze/agent_chat_history.json`
- Load on startup, display in GUI/TUI
- Add "Clear History" button
3. **Polish Input Experience** (2 hours)
- Multi-line input support (Shift+Enter for newline, Enter to send)
- Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
- Auto-scroll to bottom on new messages
#### Priority 3: Expand Tool Coverage (8-10 hours)
**Goal**: Enable deeper ROM introspection for level design questions
1. **Dialogue/Text Tools** (3 hours)
- `dialogue-search --text "search term"`: Find text in ROM dialogue
- `dialogue-get --id 0x...`: Get dialogue by message ID
2. **Sprite Tools** (3 hours)
- `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
- `overworld-list-sprites --map 0x...`: Sprites on overworld map
3. **Advanced Overworld Tools** (4 hours)
- `overworld-get-region --map 0x...`: Region boundaries and properties
- `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
- `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
#### Priority 4: Performance and Caching (4-6 hours)
1. **Response Caching** (3 hours)
- Implement LRU cache for identical prompts
- Cache tool results by (tool_name, args) key
- Configurable TTL (default: 5 minutes for ROM introspection)
2. **Token Usage Tracking** (2 hours)
- Log tokens per request (Ollama and Gemini APIs provide this)
- Display in chat footer: "Last response: 1234 tokens, ~$0.02"
- Add `--show-token-usage` flag to CLI commands
3. **Streaming Responses** (optional, 3-4 hours)
- Use Ollama/Gemini streaming APIs
- Update GUI/TUI to show partial responses as they arrive
- Improves perceived latency for long responses
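The response-caching item above (LRU eviction, a `(tool_name, args)` key, and a configurable TTL) could look roughly like the sketch below. `ToolResultCache` and its methods are hypothetical names, and the explicit `now` parameter is an assumption added so that expiry can be exercised without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class ToolResultCache {
 public:
  using Clock = std::chrono::steady_clock;

  ToolResultCache(std::size_t capacity, std::chrono::seconds ttl)
      : capacity_(capacity), ttl_(ttl) {
    assert(capacity > 0);
  }

  // Returns true and fills *result on a fresh hit; expired entries are dropped.
  bool Get(const std::string& tool, const std::string& args,
           std::string* result, Clock::time_point now = Clock::now()) {
    auto it = index_.find(MakeKey(tool, args));
    if (it == index_.end()) return false;
    if (now - it->second->stored_at >= ttl_) {
      entries_.erase(it->second);
      index_.erase(it);
      return false;
    }
    // Move the entry to the front to mark it as most recently used.
    entries_.splice(entries_.begin(), entries_, it->second);
    *result = it->second->value;
    return true;
  }

  void Put(const std::string& tool, const std::string& args,
           const std::string& value, Clock::time_point now = Clock::now()) {
    const std::string key = MakeKey(tool, args);
    auto it = index_.find(key);
    if (it != index_.end()) {  // replace any existing entry for this key
      entries_.erase(it->second);
      index_.erase(it);
    }
    entries_.push_front(Entry{key, value, now});
    index_[key] = entries_.begin();
    if (entries_.size() > capacity_) {  // evict the least recently used entry
      index_.erase(entries_.back().key);
      entries_.pop_back();
    }
  }

  std::size_t size() const { return entries_.size(); }

 private:
  struct Entry {
    std::string key;
    std::string value;
    Clock::time_point stored_at;
  };

  static std::string MakeKey(const std::string& tool, const std::string& args) {
    return tool + '\n' + args;  // newline separates the two key parts
  }

  std::size_t capacity_;
  std::chrono::seconds ttl_;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

With the 5-minute default mentioned above, the cache would be constructed as `ToolResultCache cache(capacity, std::chrono::seconds(300))`. A ROM edit would presumably also need to invalidate cached tool results, which this sketch does not cover.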


@@ -143,19 +143,24 @@ The generative workflow has been refined to incorporate more detailed planning a
- **`rom generate-golden`**: Implemented.
- **Project Scaffolding**: Implemented.
### Phase 4: Agentic Framework & Generative AI (In Progress)
### Phase 4: Agentic Framework & Generative AI (✅ Foundation Complete, 🚧 LLM Integration In Progress)
- **`z3ed agent` command**: ✅ Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
- **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
- **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
- **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
- **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
- **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
- **AI Model Interaction**: In progress, with `MockAIService` and `GeminiAIService` (conditional) implemented.
- **Execution Loop (MCP)**: In progress, with command parsing and execution logic.
- **Leveraging `ImGuiTestEngine`**: In progress, with `agent test` subcommand for GUI verification.
- **Conversational Agent Service**: ✅ Complete - multi-step tool execution loop with history management.
- **Tool Dispatcher**: ✅ Complete - 5 read-only tools for ROM introspection (`resource-list`, `dungeon-list-sprites`, `overworld-find-tile`, `overworld-describe-map`, `overworld-list-warps`).
- **TUI Chat Interface**: ✅ Complete - production-ready with table/JSON rendering (`z3ed agent chat`).
- **AI Service Backends**: ✅ Operational - Ollama (local) and Gemini (cloud) with enhanced prompting.
- **LLM Function Calling**: 🚧 In Progress - ToolDispatcher exists, needs tool schema injection into prompts and response parsing.
- **GUI Chat Widget**: 📋 Planned - TUI implementation complete, ImGui widget pending.
- **Execution Loop (MCP)**: ✅ Complete - command parsing and execution logic operational.
- **Leveraging `ImGuiTestEngine`**: ✅ Complete - `agent test` subcommand for GUI verification (see IT-01/02).
- **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management.
- **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management.
- **Granular Data Commands**: Partially complete - rom, palette, overworld, dungeon commands operational.
- **Granular Data Commands**: ✅ Complete - rom, palette, overworld, dungeon commands operational.
- **SpriteBuilder CLI**: Deprioritized.
### Phase 5: Code Structure & UX Improvements (Completed)


@@ -17,12 +17,15 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
- **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
**🔄 Active Phase**:
- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform with a renewed emphasis on system-wide error reporting
- **Test Harness Enhancements (IT-05 to IT-09)**: ✅ Core infrastructure complete (IT-05/07/08 shipped, IT-09 CLI tooling complete)
- **Conversational Agent Implementation**: 🚧 Foundation complete, LLM function calling integration in progress
**📋 Next Phases**:
- **Priority 1**: LLM Integration (Ollama + Gemini + Claude) - Make AI agent system production-ready (see [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md))
- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- **Priority 3**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
**📋 Next Phases (Updated Oct 3, 2025)**:
- **Priority 1**: Complete LLM Function Calling (4-6h) - Add tool schema to prompts, parse function calls
- **Priority 2**: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
- **Priority 3**: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
- **Priority 4**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- **Priority 5**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
- **Deprioritized**: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration
**Recent Accomplishments** (Updated: October 2025):


@@ -128,19 +128,37 @@ Here are some example prompts you can try with either Ollama or Gemini:
2. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Detailed architecture and design philosophy.
3. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation.
## Current Status (October 2025)
## Current Status (October 3, 2025)
The project is currently focused on implementing a conversational AI agent. See [AGENT-ROADMAP.md](AGENT-ROADMAP.md) for a detailed breakdown of what's complete, in progress, and planned.
### 🔄 In Progress
- **Conversational Agent**: Building a chat-like interface for the TUI and GUI.
- **Agent "Tools"**: Adding more read-only commands for the agent to inspect the ROM.
- **ResourceLabels Integration**: Integrating user-defined names for AI context.
### ✅ Completed
- **Conversational Agent Service**: ✅ Multi-step tool execution loop operational
- **TUI Chat Interface**: ✅ Production-ready with table/JSON rendering (`z3ed agent chat`)
- **Tool Dispatcher**: ✅ 5 read-only tools for ROM introspection
- `resource-list`: Labeled resource enumeration
- `dungeon-list-sprites`: Sprite inspection in dungeon rooms
- `overworld-find-tile`: Tile16 search across overworld maps
- `overworld-describe-map`: Comprehensive map metadata
- `overworld-list-warps`: Entrance/exit/hole enumeration
- **AI Service Backends**: ✅ Ollama (local) and Gemini (cloud) operational
- **Enhanced Prompting**: ✅ Resource catalogue loading with system instruction generation
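The table rendering mentioned for the TUI chat interface boils down to computing one width per column and padding every cell to it. The sketch below assumes the JSON tool result has already been parsed into rows of strings; `Row` and `RenderTable` are hypothetical names, and widths are measured in bytes, so multi-byte UTF-8 text would need extra care.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Renders a header row, a rule, and the data rows with padded columns.
// Cells beyond the header count are ignored; missing cells render as blanks.
std::string RenderTable(const Row& headers, const std::vector<Row>& rows) {
  assert(!headers.empty());
  std::vector<std::size_t> widths;
  for (const std::string& h : headers) widths.push_back(h.size());
  for (const Row& row : rows) {
    for (std::size_t c = 0; c < row.size() && c < widths.size(); ++c) {
      widths[c] = std::max(widths[c], row[c].size());
    }
  }

  auto render_row = [&widths](const Row& row) -> std::string {
    std::string line;
    for (std::size_t c = 0; c < widths.size(); ++c) {
      std::string cell = c < row.size() ? row[c] : std::string();
      cell.resize(widths[c], ' ');  // pad with spaces up to the column width
      if (c > 0) line += " | ";
      line += cell;
    }
    return line + "\n";
  };

  std::string out = render_row(headers);
  for (std::size_t c = 0; c < widths.size(); ++c) {
    if (c > 0) out += "-+-";
    out += std::string(widths[c], '-');
  }
  out += "\n";
  for (const Row& row : rows) out += render_row(row);
  return out;
}
```

A caller would fall back to printing the raw text whenever the payload cannot be turned into rows, matching the graceful-fallback behaviour listed for structured output.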
### 📋 Planned
- **GUI Chat Widget**: A shared chat interface for the main `yaze` application.
- **Dungeon Editing Support**: Object/sprite placement via AI.
- **Visual Diff Generation**: Before/after screenshots for proposals.
### 🔄 In Progress (Priority Order)
1. **LLM Function Calling**: Partially implemented - needs tool schema injection into prompts
2. **GUI Chat Widget**: Not yet started - TUI exists, GUI integration pending
3. **Tool Coverage Expansion**: 5 tools working, 8+ planned (dialogue, sprites, regions)
### 📋 Next Steps (See AGENT-ROADMAP.md for details)
1. **Complete LLM Function Calling** (4-6h): Add tool definitions to system prompts
2. **Implement GUI Chat Widget** (6-8h): Create ImGui widget matching TUI experience
3. **Expand Tool Coverage** (8-10h): Add dialogue search, sprite info, region queries
4. **Performance Optimizations** (4-6h): Response caching, token tracking, streaming
### 📋 Future Plans
- **Dungeon Editing Support**: Object/sprite placement via AI (after tool foundation complete)
- **Visual Diff Generation**: Before/after screenshots for proposals
- **Multi-Modal Agent**: Image generation for dungeon room maps
## AI Editing Focus Areas