docs: Update roadmap and implementation plan to reflect progress on conversational agent and LLM integration

2025-10-03 22:17:38 -04:00
parent 5123b0ee5f
commit 799d8117ed
4 changed files with 170 additions and 37 deletions
--- a/docs/z3ed/AGENT-ROADMAP.md
+++ b/docs/z3ed/AGENT-ROADMAP.md
@@ -86,26 +86,133 @@ This vision will be realized through a shared interface available in both the `z
    - Implement a response cache to reduce latency and API costs.
    - Add token usage tracking and reporting.

-## Current Status & Next Steps (As of Oct 3, Session 2)
+## Current Status & Next Steps (Updated: October 3, 2025)

 We have made significant progress in laying the foundation for the conversational agent.

 ### ✅ Completed
- **Initial `ConversationalAgentService`**: The basic service is in place.
- **TUI Chat Stub**: A functional `agent chat` command exists.
- **GUI Chat Widget Stub**: An `AgentChatWidget` is integrated into the main GUI.
- **Initial Agent "Tools"**: `resource-list` and `dungeon-list-sprites` commands are implemented.
- **Tool Use Foundation**: The `ToolDispatcher` is implemented, and the AI services are aware of the new tool call format.
- - **Tool Loop Improvements**: Conversational flow now handles multi-step tool calls with default JSON output, allowing results to feed back into the chat without recursion.
- **Structured Tool Output Rendering**: Both the TUI and GUI chat widgets now display tables and JSON payloads with friendly formatting, drastically improving readability.
- **Overworld Inspection Suite**: Added `overworld describe-map` and `overworld list-warps` commands producing text/JSON summaries for map metadata and warp points, with agent tooling hooks.
- **Overworld Tile Search Tool**: Added `overworld find-tile` across CLI and agent tooling with shared ROM context handling and regression tests.
+- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
+  - Handles tool calls with automatic JSON output format
+  - Prevents recursion through proper tool result replay
+  - Supports conversation history and context management
+- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
+  - Renders tables from JSON tool results
+  - Pretty-prints JSON payloads with syntax formatting
+  - Scrollable history with user/agent distinction
+- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
+  - `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
+  - `dungeon-list-sprites`: Inspect sprites in dungeon rooms
+  - `overworld-find-tile`: Search for tile16 IDs across maps
+  - `overworld-describe-map`: Get comprehensive map metadata
+  - `overworld-list-warps`: List entrances/exits/holes with filtering
+- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON
+  - Automatic table generation from JSON arrays/objects
+  - Column-aligned formatting with headers
+  - Graceful fallback to text for malformed data
+- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
+  - Shared ROM context passed through ConversationalAgentService
+  - Automatic ROM loading with error handling
+- **AI Service Foundation**: ✅ Ollama and Gemini services operational
+  - Enhanced prompting system with resource catalogue loading
+  - System instruction generation with examples
+  - Health checks and model availability validation

-### 🚀 Next Steps
-1.  **Integrate Tool Use with LLM**:
-    - Modify the `AIService` to support function calling/tool use.
-    - Teach the agent to call the new read-only commands to answer questions.
-2.  **Polish the TUI Chat Experience**:
-    - Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
-    - Align log file output with on-screen formatting for easier debugging.
-2.  **Expand Tool Coverage**: Target additional Overworld navigation helpers (region summaries, teleport lookups) and dialogue inspectors. Prioritize commands that unblock common level-design questions and emit concise table/JSON payloads.
+### 🚧 In Progress
+- **GUI Chat Widget**: ⚠️ **NOT YET IMPLEMENTED**
+  - No `AgentChatWidget` found in `src/app/gui/` directory
+  - TUI implementation exists but GUI integration is pending
+  - **Action Required**: Create `src/app/gui/debug/agent_chat_widget.{h,cc}`
+- **LLM Function Calling**: ⚠️ **PARTIALLY IMPLEMENTED**
+  - ToolDispatcher exists and is used by ConversationalAgentService
+  - AI services (Ollama, Gemini) parse tool calls from responses
+  - **Gap**: LLM prompt needs explicit tool schema definitions for function calling
+  - **Action Required**: Add tool definitions to system prompts (see Next Steps)
+
+### 🚀 Next Steps (Priority Order)
+
+#### Priority 1: Complete LLM Function Calling Integration (4-6 hours)
+**Goal**: Enable Ollama/Gemini to autonomously invoke read-only tools
+
+1. **Add Tool Definitions to System Prompts** (2 hours)
+   - Generate JSON schema for all 5 tools in `ToolDispatcher`
+   - Inject tool definitions into `PromptBuilder::BuildSystemInstruction()`
+   - Format: OpenAI-compatible function calling format
+   ```json
+   {
+     "name": "resource-list",
+     "description": "List all labeled resources of a given type",
+     "parameters": {
+       "type": "object",
+       "properties": {
+         "type": {"type": "string", "enum": ["dungeon", "sprite", "overworld"]},
+         "format": {"type": "string", "enum": ["table", "json"]}
+       },
+       "required": ["type"]
+     }
+   }
+   ```
+
+2. **Parse Function Calls from LLM Responses** (2 hours)
+   - Update `OllamaAIService::GenerateResponse()` to detect function calls in JSON
+   - Update `GeminiAIService::GenerateResponse()` for Gemini's function calling format
+   - Populate `AgentResponse.tool_calls` with parsed ToolCall objects
+   - **File**: `src/cli/service/ai/ollama_ai_service.cc:176-294`
+   - **File**: `src/cli/service/ai/gemini_ai_service.cc:104-285`
+
+3. **Test Tool Invocation Round-Trip** (1-2 hours)
+   - Verify LLM can discover available tools from system prompt
+   - Test: "What dungeons are in this ROM?" → should call `resource-list --type dungeon`
+   - Test: "Find all water tiles on map 0" → should call `overworld-find-tile --tile 0x..."`
+   - Create regression test script: `scripts/test_agent_tool_calling.sh`
+
+#### Priority 2: Implement GUI Chat Widget (6-8 hours)
+**Goal**: Unified chat experience in YAZE application
+
+1. **Create ImGui Chat Widget** (4 hours)
+   - File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
+   - Reuse table/JSON rendering logic from TUI implementation
+   - Add to Debug menu: `Debug → Agent Chat`
+   - Share `ConversationalAgentService` instance with TUI
+
+2. **Add Chat History Persistence** (2 hours)
+   - Save chat history to `.yaze/agent_chat_history.json`
+   - Load on startup, display in GUI/TUI
+   - Add "Clear History" button
+
+3. **Polish Input Experience** (2 hours)
+   - Multi-line input support (Shift+Enter for newline, Enter to send)
+   - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
+   - Auto-scroll to bottom on new messages
+
+#### Priority 3: Expand Tool Coverage (8-10 hours)
+**Goal**: Enable deeper ROM introspection for level design questions
+
+1. **Dialogue/Text Tools** (3 hours)
+   - `dialogue-search --text "search term"`: Find text in ROM dialogue
+   - `dialogue-get --id 0x...`: Get dialogue by message ID
+
+2. **Sprite Tools** (3 hours)
+   - `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
+   - `overworld-list-sprites --map 0x...`: Sprites on overworld map
+
+3. **Advanced Overworld Tools** (4 hours)
+   - `overworld-get-region --map 0x...`: Region boundaries and properties
+   - `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
+   - `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
+
+#### Priority 4: Performance and Caching (4-6 hours)
+
+1. **Response Caching** (3 hours)
+   - Implement LRU cache for identical prompts
+   - Cache tool results by (tool_name, args) key
+   - Configurable TTL (default: 5 minutes for ROM introspection)
+
+2. **Token Usage Tracking** (2 hours)
+   - Log tokens per request (Ollama and Gemini APIs provide this)
+   - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
+   - Add `--show-token-usage` flag to CLI commands
+
+3. **Streaming Responses** (optional, 3-4 hours)
+   - Use Ollama/Gemini streaming APIs
+   - Update GUI/TUI to show partial responses as they arrive
+   - Improves perceived latency for long responses
--- a/docs/z3ed/E6-z3ed-cli-design.md
+++ b/docs/z3ed/E6-z3ed-cli-design.md
@@ -143,19 +143,24 @@ The generative workflow has been refined to incorporate more detailed planning a
 - **`rom generate-golden`**: Implemented.
 - **Project Scaffolding**: Implemented.

-### Phase 4: Agentic Framework & Generative AI (In Progress)
+### Phase 4: Agentic Framework & Generative AI (✅ Foundation Complete, 🚧 LLM Integration In Progress)
 - **`z3ed agent` command**: ✅ Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
 - **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
 - **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
 - **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
 - **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
 - **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
- **AI Model Interaction**: In progress, with `MockAIService` and `GeminiAIService` (conditional) implemented.
- **Execution Loop (MCP)**: In progress, with command parsing and execution logic.
- **Leveraging `ImGuiTestEngine`**: In progress, with `agent test` subcommand for GUI verification.
+- **Conversational Agent Service**: ✅ Complete - multi-step tool execution loop with history management.
+- **Tool Dispatcher**: ✅ Complete - 5 read-only tools for ROM introspection (`resource-list`, `dungeon-list-sprites`, `overworld-find-tile`, `overworld-describe-map`, `overworld-list-warps`).
+- **TUI Chat Interface**: ✅ Complete - production-ready with table/JSON rendering (`z3ed agent chat`).
+- **AI Service Backends**: ✅ Operational - Ollama (local) and Gemini (cloud) with enhanced prompting.
+- **LLM Function Calling**: 🚧 In Progress - ToolDispatcher exists, needs tool schema injection into prompts and response parsing.
+- **GUI Chat Widget**: 📋 Planned - TUI implementation complete, ImGui widget pending.
+- **Execution Loop (MCP)**: ✅ Complete - command parsing and execution logic operational.
+- **Leveraging `ImGuiTestEngine`**: ✅ Complete - `agent test` subcommand for GUI verification (see IT-01/02).
 - **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management.
 - **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management.
- **Granular Data Commands**: Partially complete - rom, palette, overworld, dungeon commands operational.
+- **Granular Data Commands**: ✅ Complete - rom, palette, overworld, dungeon commands operational.
 - **SpriteBuilder CLI**: Deprioritized.

 ### Phase 5: Code Structure & UX Improvements (Completed)
--- a/docs/z3ed/E6-z3ed-implementation-plan.md
+++ b/docs/z3ed/E6-z3ed-implementation-plan.md
@@ -17,12 +17,15 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 - **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)

 **🔄 Active Phase**:
- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform with a renewed emphasis on system-wide error reporting
+- **Test Harness Enhancements (IT-05 to IT-09)**: ✅ Core infrastructure complete (IT-05/07/08 shipped, IT-09 CLI tooling complete)
+- **Conversational Agent Implementation**: 🚧 Foundation complete, LLM function calling integration in progress

-**📋 Next Phases**:
- **Priority 1**: LLM Integration (Ollama + Gemini + Claude) - Make AI agent system production-ready (see [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md))
- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- **Priority 3**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
+**📋 Next Phases (Updated Oct 3, 2025)**:
+- **Priority 1**: Complete LLM Function Calling (4-6h) - Add tool schema to prompts, parse function calls
+- **Priority 2**: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
+- **Priority 3**: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
+- **Priority 4**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
+- **Priority 5**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
 - **Deprioritized**: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration

 **Recent Accomplishments** (Updated: October 2025):
--- a/docs/z3ed/README.md
+++ b/docs/z3ed/README.md
@@ -128,19 +128,37 @@ Here are some example prompts you can try with either Ollama or Gemini:
 2. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Detailed architecture and design philosophy.
 3. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation.

-## Current Status (October 2025)
+## Current Status (October 3, 2025)

 The project is currently focused on implementing a conversational AI agent. See [AGENT-ROADMAP.md](AGENT-ROADMAP.md) for a detailed breakdown of what's complete, in progress, and planned.

-### 🔄 In Progress
- **Conversational Agent**: Building a chat-like interface for the TUI and GUI.
- **Agent "Tools"**: Adding more read-only commands for the agent to inspect the ROM.
- **ResourceLabels Integration**: Integrating user-defined names for AI context.
+### ✅ Completed
+- **Conversational Agent Service**: ✅ Multi-step tool execution loop operational
+- **TUI Chat Interface**: ✅ Production-ready with table/JSON rendering (`z3ed agent chat`)
+- **Tool Dispatcher**: ✅ 5 read-only tools for ROM introspection
+  - `resource-list`: Labeled resource enumeration
+  - `dungeon-list-sprites`: Sprite inspection in dungeon rooms
+  - `overworld-find-tile`: Tile16 search across overworld maps
+  - `overworld-describe-map`: Comprehensive map metadata
+  - `overworld-list-warps`: Entrance/exit/hole enumeration
+- **AI Service Backends**: ✅ Ollama (local) and Gemini (cloud) operational
+- **Enhanced Prompting**: ✅ Resource catalogue loading with system instruction generation

-### 📋 Planned
- **GUI Chat Widget**: A shared chat interface for the main `yaze` application.
- **Dungeon Editing Support**: Object/sprite placement via AI.
- **Visual Diff Generation**: Before/after screenshots for proposals.
+### 🔄 In Progress (Priority Order)
+1. **LLM Function Calling**: Partially implemented - needs tool schema injection into prompts
+2. **GUI Chat Widget**: Not yet started - TUI exists, GUI integration pending
+3. **Tool Coverage Expansion**: 5 tools working, 8+ planned (dialogue, sprites, regions)
+
+### 📋 Next Steps (See AGENT-ROADMAP.md for details)
+1. **Complete LLM Function Calling** (4-6h): Add tool definitions to system prompts
+2. **Implement GUI Chat Widget** (6-8h): Create ImGui widget matching TUI experience
+3. **Expand Tool Coverage** (8-10h): Add dialogue search, sprite info, region queries
+4. **Performance Optimizations** (4-6h): Response caching, token tracking, streaming
+
+### 📋 Future Plans
+- **Dungeon Editing Support**: Object/sprite placement via AI (after tool foundation complete)
+- **Visual Diff Generation**: Before/after screenshots for proposals
+- **Multi-Modal Agent**: Image generation for dungeon room maps

 ## AI Editing Focus Areas