From 799d8117ed830e61f0b5bf6c47df41df68f688c2 Mon Sep 17 00:00:00 2001
From: scawful
Date: Fri, 3 Oct 2025 22:17:38 -0400
Subject: [PATCH] docs: Update roadmap and implementation plan to reflect progress on conversational agent and LLM integration

---
 docs/z3ed/AGENT-ROADMAP.md               | 143 ++++++++++++++++++++---
 docs/z3ed/E6-z3ed-cli-design.md          |  15 ++-
 docs/z3ed/E6-z3ed-implementation-plan.md |  13 ++-
 docs/z3ed/README.md                      |  36 ++++--
 4 files changed, 170 insertions(+), 37 deletions(-)

diff --git a/docs/z3ed/AGENT-ROADMAP.md b/docs/z3ed/AGENT-ROADMAP.md
index c317fc47..d324a5d7 100644
--- a/docs/z3ed/AGENT-ROADMAP.md
+++ b/docs/z3ed/AGENT-ROADMAP.md
@@ -86,26 +86,133 @@ This vision will be realized through a shared interface available in both the `z
 - Implement a response cache to reduce latency and API costs.
 - Add token usage tracking and reporting.
 
-## Current Status & Next Steps (As of Oct 3, Session 2)
+## Current Status & Next Steps (Updated: October 3, 2025)
 
 We have made significant progress in laying the foundation for the conversational agent.
 
 ### ✅ Completed
-- **Initial `ConversationalAgentService`**: The basic service is in place.
-- **TUI Chat Stub**: A functional `agent chat` command exists.
-- **GUI Chat Widget Stub**: An `AgentChatWidget` is integrated into the main GUI.
-- **Initial Agent "Tools"**: `resource-list` and `dungeon-list-sprites` commands are implemented.
-- **Tool Use Foundation**: The `ToolDispatcher` is implemented, and the AI services are aware of the new tool call format.
-  - **Tool Loop Improvements**: Conversational flow now handles multi-step tool calls with default JSON output, allowing results to feed back into the chat without recursion.
-- **Structured Tool Output Rendering**: Both the TUI and GUI chat widgets now display tables and JSON payloads with friendly formatting, drastically improving readability.
-- **Overworld Inspection Suite**: Added `overworld describe-map` and `overworld list-warps` commands producing text/JSON summaries for map metadata and warp points, with agent tooling hooks.
-- **Overworld Tile Search Tool**: Added `overworld find-tile` across CLI and agent tooling with shared ROM context handling and regression tests.
+- **`ConversationalAgentService`**: ✅ Fully operational with multi-step tool execution loop
+  - Handles tool calls with automatic JSON output format
+  - Prevents recursion through proper tool result replay
+  - Supports conversation history and context management
+- **TUI Chat Interface**: ✅ Production-ready (`z3ed agent chat`)
+  - Renders tables from JSON tool results
+  - Pretty-prints JSON payloads with syntax formatting
+  - Scrollable history with user/agent distinction
+- **Tool Dispatcher**: ✅ Complete with 5 read-only tools
+  - `resource-list`: Enumerate labeled resources (dungeons, sprites, palettes)
+  - `dungeon-list-sprites`: Inspect sprites in dungeon rooms
+  - `overworld-find-tile`: Search for tile16 IDs across maps
+  - `overworld-describe-map`: Get comprehensive map metadata
+  - `overworld-list-warps`: List entrances/exits/holes with filtering
+- **Structured Output Rendering**: ✅ Both TUI formats support tables and JSON
+  - Automatic table generation from JSON arrays/objects
+  - Column-aligned formatting with headers
+  - Graceful fallback to text for malformed data
+- **ROM Context Integration**: ✅ Tools can access loaded ROM or load from `--rom` flag
+  - Shared ROM context passed through ConversationalAgentService
+  - Automatic ROM loading with error handling
+- **AI Service Foundation**: ✅ Ollama and Gemini services operational
+  - Enhanced prompting system with resource catalogue loading
+  - System instruction generation with examples
+  - Health checks and model availability validation
 
-### 🚀 Next Steps
-1. **Integrate Tool Use with LLM**:
-   - Modify the `AIService` to support function calling/tool use.
-   - Teach the agent to call the new read-only commands to answer questions.
-2. **Polish the TUI Chat Experience**:
-   - Tighten keyboard shortcuts, scrolling, and copy-to-clipboard behaviour.
-   - Align log file output with on-screen formatting for easier debugging.
-2. **Expand Tool Coverage**: Target additional Overworld navigation helpers (region summaries, teleport lookups) and dialogue inspectors. Prioritize commands that unblock common level-design questions and emit concise table/JSON payloads.
\ No newline at end of file
+### 🚧 In Progress
+- **GUI Chat Widget**: ⚠️ **NOT YET IMPLEMENTED**
+  - No `AgentChatWidget` found in `src/app/gui/` directory
+  - TUI implementation exists but GUI integration is pending
+  - **Action Required**: Create `src/app/gui/debug/agent_chat_widget.{h,cc}`
+- **LLM Function Calling**: ⚠️ **PARTIALLY IMPLEMENTED**
+  - ToolDispatcher exists and is used by ConversationalAgentService
+  - AI services (Ollama, Gemini) parse tool calls from responses
+  - **Gap**: LLM prompt needs explicit tool schema definitions for function calling
+  - **Action Required**: Add tool definitions to system prompts (see Next Steps)
+
+### 🚀 Next Steps (Priority Order)
+
+#### Priority 1: Complete LLM Function Calling Integration (4-6 hours)
+**Goal**: Enable Ollama/Gemini to autonomously invoke read-only tools
+
+1. **Add Tool Definitions to System Prompts** (2 hours)
+   - Generate JSON schema for all 5 tools in `ToolDispatcher`
+   - Inject tool definitions into `PromptBuilder::BuildSystemInstruction()`
+   - Format: OpenAI-compatible function calling format
+   ```json
+   {
+     "name": "resource-list",
+     "description": "List all labeled resources of a given type",
+     "parameters": {
+       "type": "object",
+       "properties": {
+         "type": {"type": "string", "enum": ["dungeon", "sprite", "overworld"]},
+         "format": {"type": "string", "enum": ["table", "json"]}
+       },
+       "required": ["type"]
+     }
+   }
+   ```
+
+2. **Parse Function Calls from LLM Responses** (2 hours)
+   - Update `OllamaAIService::GenerateResponse()` to detect function calls in JSON
+   - Update `GeminiAIService::GenerateResponse()` for Gemini's function calling format
+   - Populate `AgentResponse.tool_calls` with parsed ToolCall objects
+   - **File**: `src/cli/service/ai/ollama_ai_service.cc:176-294`
+   - **File**: `src/cli/service/ai/gemini_ai_service.cc:104-285`
+
+3. **Test Tool Invocation Round-Trip** (1-2 hours)
+   - Verify LLM can discover available tools from system prompt
+   - Test: "What dungeons are in this ROM?" → should call `resource-list --type dungeon`
+   - Test: "Find all water tiles on map 0" → should call `overworld-find-tile --tile 0x..."`
+   - Create regression test script: `scripts/test_agent_tool_calling.sh`
+
+#### Priority 2: Implement GUI Chat Widget (6-8 hours)
+**Goal**: Unified chat experience in YAZE application
+
+1. **Create ImGui Chat Widget** (4 hours)
+   - File: `src/app/gui/debug/agent_chat_widget.{h,cc}`
+   - Reuse table/JSON rendering logic from TUI implementation
+   - Add to Debug menu: `Debug → Agent Chat`
+   - Share `ConversationalAgentService` instance with TUI
+
+2. **Add Chat History Persistence** (2 hours)
+   - Save chat history to `.yaze/agent_chat_history.json`
+   - Load on startup, display in GUI/TUI
+   - Add "Clear History" button
+
+3. **Polish Input Experience** (2 hours)
+   - Multi-line input support (Shift+Enter for newline, Enter to send)
+   - Keyboard shortcuts: Ctrl+L to clear, Ctrl+C to copy last response
+   - Auto-scroll to bottom on new messages
+
+#### Priority 3: Expand Tool Coverage (8-10 hours)
+**Goal**: Enable deeper ROM introspection for level design questions
+
+1. **Dialogue/Text Tools** (3 hours)
+   - `dialogue-search --text "search term"`: Find text in ROM dialogue
+   - `dialogue-get --id 0x...`: Get dialogue by message ID
+
+2. **Sprite Tools** (3 hours)
+   - `sprite-get-info --id 0x...`: Sprite metadata (HP, damage, AI)
+   - `overworld-list-sprites --map 0x...`: Sprites on overworld map
+
+3. **Advanced Overworld Tools** (4 hours)
+   - `overworld-get-region --map 0x...`: Region boundaries and properties
+   - `overworld-list-transitions --from-map 0x...`: Map transitions/scrolling
+   - `overworld-get-tile-at --map 0x... --x N --y N`: Get specific tile16 value
+
+#### Priority 4: Performance and Caching (4-6 hours)
+
+1. **Response Caching** (3 hours)
+   - Implement LRU cache for identical prompts
+   - Cache tool results by (tool_name, args) key
+   - Configurable TTL (default: 5 minutes for ROM introspection)
+
+2. **Token Usage Tracking** (2 hours)
+   - Log tokens per request (Ollama and Gemini APIs provide this)
+   - Display in chat footer: "Last response: 1234 tokens, ~$0.02"
+   - Add `--show-token-usage` flag to CLI commands
+
+3. **Streaming Responses** (optional, 3-4 hours)
+   - Use Ollama/Gemini streaming APIs
+   - Update GUI/TUI to show partial responses as they arrive
+   - Improves perceived latency for long responses
\ No newline at end of file
diff --git a/docs/z3ed/E6-z3ed-cli-design.md b/docs/z3ed/E6-z3ed-cli-design.md
index b423a0da..c3242c0f 100644
--- a/docs/z3ed/E6-z3ed-cli-design.md
+++ b/docs/z3ed/E6-z3ed-cli-design.md
@@ -143,19 +143,24 @@ The generative workflow has been refined to incorporate more detailed planning a
 - **`rom generate-golden`**: Implemented.
 - **Project Scaffolding**: Implemented.
 
-### Phase 4: Agentic Framework & Generative AI (In Progress)
+### Phase 4: Agentic Framework & Generative AI (✅ Foundation Complete, 🚧 LLM Integration In Progress)
 - **`z3ed agent` command**: ✅ Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
 - **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
 - **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
 - **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
 - **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
 - **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
-- **AI Model Interaction**: In progress, with `MockAIService` and `GeminiAIService` (conditional) implemented.
-- **Execution Loop (MCP)**: In progress, with command parsing and execution logic.
-- **Leveraging `ImGuiTestEngine`**: In progress, with `agent test` subcommand for GUI verification.
+- **Conversational Agent Service**: ✅ Complete - multi-step tool execution loop with history management.
+- **Tool Dispatcher**: ✅ Complete - 5 read-only tools for ROM introspection (`resource-list`, `dungeon-list-sprites`, `overworld-find-tile`, `overworld-describe-map`, `overworld-list-warps`).
+- **TUI Chat Interface**: ✅ Complete - production-ready with table/JSON rendering (`z3ed agent chat`).
+- **AI Service Backends**: ✅ Operational - Ollama (local) and Gemini (cloud) with enhanced prompting.
+- **LLM Function Calling**: 🚧 In Progress - ToolDispatcher exists, needs tool schema injection into prompts and response parsing.
+- **GUI Chat Widget**: 📋 Planned - TUI implementation complete, ImGui widget pending.
+- **Execution Loop (MCP)**: ✅ Complete - command parsing and execution logic operational.
+- **Leveraging `ImGuiTestEngine`**: ✅ Complete - `agent test` subcommand for GUI verification (see IT-01/02).
 - **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management.
 - **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management.
-- **Granular Data Commands**: Partially complete - rom, palette, overworld, dungeon commands operational.
+- **Granular Data Commands**: ✅ Complete - rom, palette, overworld, dungeon commands operational.
 - **SpriteBuilder CLI**: Deprioritized.
 
 ### Phase 5: Code Structure & UX Improvements (Completed)
diff --git a/docs/z3ed/E6-z3ed-implementation-plan.md b/docs/z3ed/E6-z3ed-implementation-plan.md
index da41b65f..7ba9eeaa 100644
--- a/docs/z3ed/E6-z3ed-implementation-plan.md
+++ b/docs/z3ed/E6-z3ed-implementation-plan.md
@@ -17,12 +17,15 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 - **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
 
 **🔄 Active Phase**:
-- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform with a renewed emphasis on system-wide error reporting
+- **Test Harness Enhancements (IT-05 to IT-09)**: ✅ Core infrastructure complete (IT-05/07/08 shipped, IT-09 CLI tooling complete)
+- **Conversational Agent Implementation**: 🚧 Foundation complete, LLM function calling integration in progress
 
-**📋 Next Phases**:
-- **Priority 1**: LLM Integration (Ollama + Gemini + Claude) - Make AI agent system production-ready (see [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md))
-- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
-- **Priority 3**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
+**📋 Next Phases (Updated Oct 3, 2025)**:
+- **Priority 1**: Complete LLM Function Calling (4-6h) - Add tool schema to prompts, parse function calls
+- **Priority 2**: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
+- **Priority 3**: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
+- **Priority 4**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
+- **Priority 5**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
 - **Deprioritized**: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration
 
 **Recent Accomplishments** (Updated: October 2025):
diff --git a/docs/z3ed/README.md b/docs/z3ed/README.md
index 9ce54af2..1819baa5 100644
--- a/docs/z3ed/README.md
+++ b/docs/z3ed/README.md
@@ -128,19 +128,37 @@ Here are some example prompts you can try with either Ollama or Gemini:
 2. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Detailed architecture and design philosophy.
 3. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation.
 
-## Current Status (October 2025)
+## Current Status (October 3, 2025)
 
 The project is currently focused on implementing a conversational AI agent. See [AGENT-ROADMAP.md](AGENT-ROADMAP.md) for a detailed breakdown of what's complete, in progress, and planned.
 
-### 🔄 In Progress
-- **Conversational Agent**: Building a chat-like interface for the TUI and GUI.
-- **Agent "Tools"**: Adding more read-only commands for the agent to inspect the ROM.
-- **ResourceLabels Integration**: Integrating user-defined names for AI context.
+### ✅ Completed
+- **Conversational Agent Service**: ✅ Multi-step tool execution loop operational
+- **TUI Chat Interface**: ✅ Production-ready with table/JSON rendering (`z3ed agent chat`)
+- **Tool Dispatcher**: ✅ 5 read-only tools for ROM introspection
+  - `resource-list`: Labeled resource enumeration
+  - `dungeon-list-sprites`: Sprite inspection in dungeon rooms
+  - `overworld-find-tile`: Tile16 search across overworld maps
+  - `overworld-describe-map`: Comprehensive map metadata
+  - `overworld-list-warps`: Entrance/exit/hole enumeration
+- **AI Service Backends**: ✅ Ollama (local) and Gemini (cloud) operational
+- **Enhanced Prompting**: ✅ Resource catalogue loading with system instruction generation
 
-### 📋 Planned
-- **GUI Chat Widget**: A shared chat interface for the main `yaze` application.
-- **Dungeon Editing Support**: Object/sprite placement via AI.
-- **Visual Diff Generation**: Before/after screenshots for proposals.
+### 🔄 In Progress (Priority Order)
+1. **LLM Function Calling**: Partially implemented - needs tool schema injection into prompts
+2. **GUI Chat Widget**: Not yet started - TUI exists, GUI integration pending
+3. **Tool Coverage Expansion**: 5 tools working, 8+ planned (dialogue, sprites, regions)
+
+### 📋 Next Steps (See AGENT-ROADMAP.md for details)
+1. **Complete LLM Function Calling** (4-6h): Add tool definitions to system prompts
+2. **Implement GUI Chat Widget** (6-8h): Create ImGui widget matching TUI experience
+3. **Expand Tool Coverage** (8-10h): Add dialogue search, sprite info, region queries
+4. **Performance Optimizations** (4-6h): Response caching, token tracking, streaming
+
+### 📋 Future Plans
+- **Dungeon Editing Support**: Object/sprite placement via AI (after tool foundation complete)
+- **Visual Diff Generation**: Before/after screenshots for proposals
+- **Multi-Modal Agent**: Image generation for dungeon room maps
 
 ## AI Editing Focus Areas