feat: Implement LLM function calling schemas and enhance prompt builder with tool definitions

scawful
2025-10-03 22:20:29 -04:00
parent 799d8117ed
commit bcdb7b3ad0
5 changed files with 130 additions and 42 deletions

View File

@@ -130,40 +130,46 @@ We have made significant progress in laying the foundation for the conversationa
 ### 🚀 Next Steps (Priority Order)
-#### Priority 1: Complete LLM Function Calling Integration (4-6 hours)
+#### Priority 1: Complete LLM Function Calling Integration ✅ COMPLETE (Oct 3, 2025)
 **Goal**: Enable Ollama/Gemini to autonomously invoke read-only tools
-1. **Add Tool Definitions to System Prompts** (2 hours)
-   - Generate JSON schema for all 5 tools in `ToolDispatcher`
-   - Inject tool definitions into `PromptBuilder::BuildSystemInstruction()`
-   - Format: OpenAI-compatible function calling format
-   ```json
-   {
-     "name": "resource-list",
-     "description": "List all labeled resources of a given type",
-     "parameters": {
-       "type": "object",
-       "properties": {
-         "type": {"type": "string", "enum": ["dungeon", "sprite", "overworld"]},
-         "format": {"type": "string", "enum": ["table", "json"]}
-       },
-       "required": ["type"]
-     }
-   }
-   ```
-2. **Parse Function Calls from LLM Responses** (2 hours)
-   - Update `OllamaAIService::GenerateResponse()` to detect function calls in JSON
-   - Update `GeminiAIService::GenerateResponse()` for Gemini's function calling format
+**Completed Tasks:**
+1. **Tool Schema Generation** - Added `BuildFunctionCallSchemas()` method
+   - Generates OpenAI-compatible function calling schemas from tool specifications
+   - Properly formats parameters with types, descriptions, and examples
+   - Marks required vs optional arguments
+   - **File**: `src/cli/service/ai/prompt_builder.{h,cc}`
+2. **System Prompt Enhancement** - Injected tool definitions
+   - Updated `BuildConstraintsSection()` to include tool schemas
+   - Added tool usage guidance (tools for questions, commands for modifications)
+   - Included example tool call in JSON format
+   - **File**: `src/cli/service/ai/prompt_builder.cc`
+3. **LLM Response Parsing** - Already implemented
+   - Both `OllamaAIService` and `GeminiAIService` parse `tool_calls` from JSON
    - Populate `AgentResponse.tool_calls` with parsed ToolCall objects
-   - **File**: `src/cli/service/ai/ollama_ai_service.cc:176-294`
-   - **File**: `src/cli/service/ai/gemini_ai_service.cc:104-285`
-3. **Test Tool Invocation Round-Trip** (1-2 hours)
-   - Verify LLM can discover available tools from system prompt
-   - Test: "What dungeons are in this ROM?" → should call `resource-list --type dungeon`
-   - Test: "Find all water tiles on map 0" → should call `overworld-find-tile --tile 0x...`
-   - Create regression test script: `scripts/test_agent_tool_calling.sh`
+   - **Files**: `src/cli/service/ai/{ollama,gemini}_ai_service.cc`
+4. **Infrastructure Verification** - Created test scripts
+   - `scripts/test_tool_schemas.sh` - Verifies tool definitions in catalogue
+   - `scripts/test_agent_mock.sh` - Validates component integration
+   - All 5 tools properly defined with arguments and examples
+   - **Status**: Ready for live LLM testing
+**What's Working:**
+- ✅ Tool definitions loaded from `assets/agent/prompt_catalogue.yaml`
+- ✅ Function schemas generated in OpenAI format
+- ✅ System prompts include tool definitions with usage guidance
+- ✅ AI services parse tool_calls from LLM responses
+- ✅ ConversationalAgentService dispatches tools via ToolDispatcher
+- ✅ Tools return JSON results that feed back into conversation
+**Next Step: Live LLM Testing** (1-2 hours)
+- Test with Ollama: Verify qwen2.5-coder can discover and invoke tools
+- Test with Gemini: Verify Gemini 2.0 generates correct tool_calls
+- Create example prompts that exercise all 5 tools
+- Verify multi-step tool execution (agent asks follow-up questions)
 #### Priority 2: Implement GUI Chat Widget (6-8 hours)
 **Goal**: Unified chat experience in YAZE application

View File

@@ -16,12 +16,11 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 - **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
 - **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
-**🔄 Active Phase**:
-- **Test Harness Enhancements (IT-05 to IT-09)**: ✅ Core infrastructure complete (IT-05/07/08 shipped, IT-09 CLI tooling complete)
-- **Conversational Agent Implementation**: 🚧 Foundation complete, LLM function calling integration in progress
+**🎯 Active Phase**:
+- **Conversational Agent Implementation**: ✅ Foundation complete, LLM function calling ✅ COMPLETE (Oct 3, 2025)
 **📋 Next Phases (Updated Oct 3, 2025)**:
-- **Priority 1**: Complete LLM Function Calling (4-6h) - Add tool schema to prompts, parse function calls
+- **Priority 1**: Live LLM Testing (1-2h) - Verify function calling with Ollama/Gemini
 - **Priority 2**: GUI Chat Widget (6-8h) - Create ImGui widget matching TUI experience
 - **Priority 3**: Expand Tool Coverage (8-10h) - Add dialogue, sprite, region inspection tools
 - **Priority 4**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions

View File

@@ -143,14 +143,15 @@ The project is currently focused on implementing a conversational AI agent. See
 - `overworld-list-warps`: Entrance/exit/hole enumeration
 - **AI Service Backends**: ✅ Ollama (local) and Gemini (cloud) operational
 - **Enhanced Prompting**: ✅ Resource catalogue loading with system instruction generation
+- **LLM Function Calling**: ✅ Complete - Tool schemas injected into system prompts, response parsing implemented
 ### 🔄 In Progress (Priority Order)
-1. **LLM Function Calling**: Partially implemented - needs tool schema injection into prompts
-2. **GUI Chat Widget**: Not yet started - TUI exists, GUI integration pending
-3. **Tool Coverage Expansion**: 5 tools working, 8+ planned (dialogue, sprites, regions)
+1. **Live LLM Testing**: Verify function calling with Ollama/Gemini (1-2h)
+2. **GUI Chat Widget**: Not yet started - TUI exists, GUI integration pending (6-8h)
+3. **Tool Coverage Expansion**: 5 tools working, 8+ planned (dialogue, sprites, regions) (8-10h)
 ### 📋 Next Steps (See AGENT-ROADMAP.md for details)
-1. **Complete LLM Function Calling** (4-6h): Add tool definitions to system prompts
+1. **Live LLM Testing** (1-2h): Verify function calling with real Ollama/Gemini
 2. **Implement GUI Chat Widget** (6-8h): Create ImGui widget matching TUI experience
 3. **Expand Tool Coverage** (8-10h): Add dialogue search, sprite info, region queries
 4. **Performance Optimizations** (4-6h): Response caching, token tracking, streaming

View File

@@ -406,6 +406,57 @@ std::string PromptBuilder::BuildToolReference() const {
   return oss.str();
 }
+
+std::string PromptBuilder::BuildFunctionCallSchemas() const {
+  if (tool_specs_.empty()) {
+    return "[]";
+  }
+
+  nlohmann::json tools_array = nlohmann::json::array();
+  for (const auto& spec : tool_specs_) {
+    nlohmann::json tool;
+    tool["type"] = "function";
+
+    nlohmann::json function;
+    function["name"] = spec.name;
+    function["description"] = spec.description;
+    if (!spec.usage_notes.empty()) {
+      function["description"] = spec.description + " " + spec.usage_notes;
+    }
+
+    nlohmann::json parameters;
+    parameters["type"] = "object";
+    nlohmann::json properties = nlohmann::json::object();
+    nlohmann::json required = nlohmann::json::array();
+
+    for (const auto& arg : spec.arguments) {
+      nlohmann::json arg_schema;
+      arg_schema["type"] = "string";  // All CLI args are strings
+      arg_schema["description"] = arg.description;
+      if (!arg.example.empty()) {
+        arg_schema["example"] = arg.example;
+      }
+      properties[arg.name] = arg_schema;
+      if (arg.required) {
+        required.push_back(arg.name);
+      }
+    }
+
+    parameters["properties"] = properties;
+    if (!required.empty()) {
+      parameters["required"] = required;
+    }
+
+    function["parameters"] = parameters;
+    tool["function"] = function;
+    tools_array.push_back(tool);
+  }
+
+  return tools_array.dump(2);
+}
+
 std::string PromptBuilder::BuildFewShotExamplesSection() const {
   std::ostringstream oss;
@@ -460,26 +511,54 @@ std::string PromptBuilder::BuildConstraintsSection() const {
"reasoning": "Your thought process." "reasoning": "Your thought process."
} }
- `text_response` is for conversational replies. - `text_response` is for conversational replies.
- `tool_calls` is for asking questions about the ROM. Use the available tools. - `tool_calls` is for asking questions about the ROM. Use the available tools listed below.
- `commands` is for generating commands to modify the ROM. - `commands` is for generating commands to modify the ROM.
- All fields are optional. - All fields are optional, but you should always provide at least one.
2. **Command Syntax:** Follow the exact syntax shown in examples 2. **Tool Usage:** When the user asks a question about the ROM state, use tool_calls instead of commands
- Tools are read-only and return information
- Commands modify the ROM and should only be used when explicitly requested
- You can call multiple tools in one response
- Always use JSON format for tool results
3. **Command Syntax:** Follow the exact syntax shown in examples
- Use correct flag names (--group, --id, --to, --from, etc.) - Use correct flag names (--group, --id, --to, --from, etc.)
- Use hex format for colors (0xRRGGBB) and tile IDs (0xNNN) - Use hex format for colors (0xRRGGBB) and tile IDs (0xNNN)
- Coordinates are 0-based indices - Coordinates are 0-based indices
3. **Common Patterns:** 4. **Common Patterns:**
- Palette modifications: export set-color import - Palette modifications: export set-color import
- Multiple tile placement: multiple overworld set-tile commands - Multiple tile placement: multiple overworld set-tile commands
- Validation: single rom validate command - Validation: single rom validate command
4. **Error Prevention:** 5. **Error Prevention:**
- Always export before modifying palettes - Always export before modifying palettes
- Use temporary file names (temp_*.json) for intermediate files - Use temporary file names (temp_*.json) for intermediate files
- Validate coordinates are within bounds - Validate coordinates are within bounds
)"; )";
if (!tool_specs_.empty()) {
oss << "\n# Available Tools for ROM Inspection\n\n";
oss << "You have access to the following tools to answer questions:\n\n";
oss << "```json\n";
oss << BuildFunctionCallSchemas();
oss << "\n```\n\n";
oss << "**Tool Call Example:**\n";
oss << "```json\n";
oss << R"({
"text_response": "Let me check the dungeons in this ROM.",
"tool_calls": [
{
"tool_name": "resource-list",
"args": {
"type": "dungeon"
}
}
]
})";
oss << "\n```\n";
}
if (!tile_reference_.empty()) { if (!tile_reference_.empty()) {
oss << "\n" << BuildTileReferenceSection(); oss << "\n" << BuildTileReferenceSection();
} }

View File

@@ -86,6 +86,9 @@ class PromptBuilder {
     return tile_reference_;
   }
+
+  // Generate OpenAI-compatible function call schemas (JSON format)
+  std::string BuildFunctionCallSchemas() const;
+
   // Set verbosity level (0=minimal, 1=standard, 2=verbose)
   void SetVerbosity(int level) { verbosity_ = level; }