# Ollama Integration Status

## ✅ Completed

### Infrastructure
- ✅ Flag parsing for AI provider configuration: `--ai_provider`, `--ai_model`, `--ollama_host`, and `--gemini_api_key` are parsed in `cli_main.cc`, and the provider is detected correctly (the `🤖 AI Provider: ollama` message appears)
- ✅ Ollama service with health checks: `OllamaAIService` connects to the local Ollama server and validates model availability (qwen2.5-coder:7b)
- ✅ JSON parsing: tool calls and text responses are correctly extracted from Ollama's response format
- ✅ Simple chat modes (4 input methods working)
- ✅ Colorful terminal output with loading indicators
- ✅ Verbose mode for diagnostics
- ✅ Configurable max-tool-iterations and max-retries
- ✅ File-based prompt system (`assets/agent/*.txt`)

### Tool System
The tool dispatcher routes tool calls to the appropriate handlers. 5 read-only tools are registered:
- `resource-list` - List labeled resources
- `dungeon-list-sprites` - Inspect room sprites
- `overworld-find-tile` - Search for tile placements
- `overworld-describe-map` - Get map metadata
- `overworld-list-warps` - List entrances/exits/holes

### Simple Chat Modes
All 4 input methods are working:
1. ✅ Single message mode: `z3ed agent simple-chat "message" --rom=file.sfc --ai_provider=ollama`
2. ✅ Interactive mode: `z3ed agent simple-chat --rom=file.sfc --ai_provider=ollama`
3. ✅ Piped input mode: `echo "message" | z3ed agent simple-chat --rom=file.sfc --ai_provider=ollama`
4. ✅ Batch file mode: `z3ed agent simple-chat --file=queries.txt --rom=file.sfc --ai_provider=ollama`

## 🚧 Current Issue: Empty Tool Results

**Problem**: The `resource-list` tool returns empty JSON (`{}`) when dungeon labels are requested.

**Root Cause**: The embedded labels in `Zelda3Labels` only include:
- `room` - 297 room names ✅
- `entrance` - 133 entrance names ✅
- `sprite` - 256 sprite names ✅
- `overlord` - 26 overlord names ✅
- `item` - 104 item names ✅

but **not** `dungeon` as a separate category.

**Diagnosis**:
```bash
# Works (returns data):
./z3ed agent resource-list --type=room --format=json
./z3ed agent resource-list --type=entrance --format=json
./z3ed agent resource-list --type=sprite --format=json

# Fails (returns empty {}):
./z3ed agent resource-list --type=dungeon --format=json
```
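The empty-vs-populated distinction can also be checked mechanically, e.g. in a regression test. A minimal illustration (Python sketch, not part of the z3ed codebase; the label data is a stand-in):

```python
import json

def resource_list_is_populated(payload: str) -> bool:
    """Return True when a resource-list JSON payload contains labels.

    An empty object ({}) is the symptom seen with --type=dungeon:
    the category simply has no embedded labels behind it.
    """
    data = json.loads(payload)
    if isinstance(data, (dict, list)):
        return len(data) > 0
    return False

# Stand-in data for illustration only.
print(resource_list_is_populated('{"0": "Ganon"}'))  # True
print(resource_list_is_populated('{}'))              # False
```

A check like this, run per category, would have flagged the missing `dungeon` labels before the LLM ever saw an empty result.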
**Solution Options**:
1. **Quick fix**: Update the prompt examples to use valid categories
   - Change `type: dungeon` → `type: room` in the examples
   - Update the tool descriptions to clarify the available categories
2. **Proper fix**: Add dungeon labels to the embedded labels
   - Modify `Zelda3Labels::ToResourceLabels()` to include a `dungeon` category
   - Map dungeon IDs (0-11) to their names
3. **Alternative**: Document that "dungeons" are accessed via room labels
   - Dungeon rooms use the `room` category
   - Provide ID ranges (e.g., rooms 0-119 are Hyrule Castle, etc.)

## 🚧 In Progress: Tool Calling Loop

**Problem**: The agent enters an infinite tool-calling loop without providing a final text response.

**Symptoms**:
```
Error: Agent did not produce a response after executing tools.
```

**Current Behavior**:
- The LLM successfully calls tools (e.g., `resource-list` with `type=dungeon`)
- The tool executes and returns JSON results
- The LLM receives the results in the conversation history
- The LLM then either calls tools again (a loop is detected after 4 iterations) or omits the `text_response` field from its JSON

**Root Cause**: The system prompt needs refinement to instruct the LLM to:
1. Call tools when needed
2. Wait for the tool results
3. **Then provide a final `text_response` based on the tool results**
4. Stop calling tools after receiving results

**Solution Needed**: Update the system prompt with explicit instructions such as:
```
When you call a tool:
1. The tool will execute and return results
2. You will receive the results in the next message
3. After receiving tool results, you MUST provide a text_response that answers the user's question using the tool data
4. Do NOT call the same tool again
5. Example flow:
   User: "What dungeons are there?"
   Assistant (first response): { "tool_calls": [{"tool_name": "resource-list", "args": {"type": "dungeon"}}] }
   [Tool executes and returns dungeon list]
   Assistant (second response): { "text_response": "Based on the resource list, there are X dungeons: [list them]" }
```
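The iteration guard and the failure mode above can be sketched in a few lines (illustrative Python; the actual loop lives in the C++ agent service, and all names here are hypothetical):

```python
def run_agent_turn(llm, execute_tool, user_msg, max_tool_iterations=4):
    """Drive the tool-calling loop, bailing out once the limit is hit."""
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_tool_iterations):
        reply = llm(history)  # dict with optional tool_calls / text_response
        if reply.get("text_response"):
            return reply["text_response"]
        calls = reply.get("tool_calls") or []
        if not calls:
            break  # neither text nor tools: nothing more to do
        for call in calls:
            result = execute_tool(call["tool_name"], call.get("args", {}))
            history.append({"role": "tool", "content": result})
    # This is the error seen in practice when the model keeps calling tools.
    raise RuntimeError("Agent did not produce a response after executing tools.")
```

A model that never emits `text_response` exhausts the loop and raises, which is exactly the observed symptom.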
## 🎨 New Features Added

### Verbose Mode
```bash
z3ed agent simple-chat "query" --verbose
```
Shows:
- Iteration count
- Agent response analysis (tool calls, commands, `text_response` status)
- LLM reasoning
- Tool output preview
- Step-by-step execution flow

### Configuration Parameters
```bash
--max-tool-iterations=6   # Default: 4
--max-retries=5           # Default: 3
--no-reasoning            # Hide LLM reasoning
```

### Colorful Output
- 🔧 Tool calls in magenta
- ✓ Success messages in green
- ⚠ Warnings in yellow
- ✗ Errors in red
- Info in blue
- 💭 Reasoning in dim yellow
- ⠋ Loading spinner (cyan)
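As a rough illustration of what `--max-retries` implies (a Python sketch assuming simple exponential backoff; the real C++ implementation may schedule retries differently):

```python
import time

def with_retries(request_fn, max_retries=3, base_delay=0.5):
    """Retry a flaky request, doubling the delay after each failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return request_fn()
        except ConnectionError as err:
            last_error = err
            time.sleep(base_delay * (2 ** attempt))
    raise last_error
```

Transient Ollama connection drops are absorbed up to the retry limit; a persistently unreachable server still surfaces the last error.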
## 📋 Testing

### Test Script
`test_simple_chat_ollama.sh` runs a comprehensive set of tests:
- ✅ Prerequisites check (Ollama, model, ROM)
- ✅ Single message mode test
- ✅ Piped input test
- ✅ Interactive mode test (with auto-exit)
- ✅ Batch mode test
- ⚠️ Tool calling verification (needs prompt refinement)

### Manual Test Results

**Test 1: Single Message**
```bash
./build_test/bin/z3ed agent simple-chat "What dungeons are in this ROM?" \
  --rom=assets/zelda3.sfc --ai_provider=ollama
```
**Result**:
- ✅ Ollama connects successfully
- ✅ Model loads (qwen2.5-coder:7b)
- ❌ Hits the 4-iteration limit without a final response

**Test 2: Tool Availability**
```bash
./build_test/bin/z3ed agent resource-list --type=dungeon --format=json --rom=assets/zelda3.sfc
```
**Result**: ❌ Returns empty `{}` (no `dungeon` category in the embedded labels; see Current Issue above)
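For reference, the extraction step these tests exercise can be approximated as follows (a Python sketch; the envelope fields match Ollama's non-streaming `/api/chat` response, but the inner `tool_calls`/`text_response` contract is this project's own convention):

```python
import json

def extract_agent_reply(ollama_response: str) -> dict:
    """Pull the model's JSON reply out of an Ollama /api/chat envelope.

    The model is prompted to answer with a JSON object containing either
    'tool_calls' or 'text_response'; unparsable content is treated as
    plain text.
    """
    envelope = json.loads(ollama_response)
    content = envelope["message"]["content"]
    try:
        reply = json.loads(content)
    except json.JSONDecodeError:
        return {"text_response": content}
    if isinstance(reply, dict):
        return reply
    return {"text_response": content}

sample = json.dumps({"message": {"role": "assistant", "content":
    '{"tool_calls": [{"tool_name": "resource-list", "args": {"type": "room"}}]}'}})
print(extract_agent_reply(sample)["tool_calls"][0]["tool_name"])  # resource-list
```

Treating unparsable content as a plain `text_response` keeps chatty models usable even when they ignore the JSON instructions.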
### Testing Commands
```bash
# Test with verbose mode
./build_test/bin/z3ed agent simple-chat "What rooms are there?" \
  --rom=assets/zelda3.sfc --ai_provider=ollama --verbose --max-tool-iterations=6

# Test resource categories
./build_test/bin/z3ed agent resource-list --type=room --rom=assets/zelda3.sfc
./build_test/bin/z3ed agent resource-list --type=entrance --rom=assets/zelda3.sfc
./build_test/bin/z3ed agent resource-list --type=sprite --rom=assets/zelda3.sfc

# Test with Gemini (if an API key is available)
export GEMINI_API_KEY='your-key'
./build_test/bin/z3ed agent simple-chat "What rooms are in this ROM?" \
  --rom=assets/zelda3.sfc --ai_provider=gemini --verbose
```
## 🔧 Next Steps

### Priority 1: Fix Empty Tool Results (High)
1. Add a `dungeon` category to the embedded labels, or update all prompt examples to use `room` instead of `dungeon`
2. Test that the tools return actual data
3. Verify that the LLM can process tool results

### Priority 2: Fix the Tool Calling Loop (Medium)
Once the tools return data:
1. **Update the system prompt** in `prompt_builder.cc`:
   - Add explicit instructions for the tool-usage workflow
   - Include examples showing tool call → results → final response
   - Emphasize that `text_response` is required after receiving tool results
2. **Enhance the examples** in `prompt_catalogue.yaml`:
   - Add multi-turn examples showing tool usage
   - Show the correct pattern: question → tool_call → (wait) → text_response with tool data
3. **Improve response validation** in `ollama_ai_service.cc`:
   - Detect when tool results are in the history but no `text_response` was provided
   - Add warning messages for debugging
4. Test with different Ollama models:
   - qwen2.5-coder:7b (current)
   - llama3:8b
   - codellama:7b

### Priority 3: Testing & Validation (Medium)
Create a regression test suite for tool calling:
- Test each tool individually
- Test multi-tool sequences
- Test conversation context preservation

### Priority 4: Documentation (Low)
1. Document the available resource categories and add example queries for each tool
2. Update `simple_chat_input_methods.md` with:
   - A known-limitations section
   - Troubleshooting for tool calling issues
   - Recommended models and configurations
3. Create `ollama_best_practices.md`:
   - Model recommendations
   - Temperature/parameter tuning
   - Prompt engineering tips

## 📊 Performance
- **Ollama response time**: ~2-5 seconds per query (qwen2.5-coder:7b on typical hardware)
- **Tool execution**: <100ms per tool call
- **Total interaction**: ~2-5 seconds for simple queries, longer for multi-turn conversations with tools
- **Loading indicator**: smooth 80ms refresh rate
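The missing-`text_response` failure mode is easy to detect mechanically; a check along these lines (illustrative Python with hypothetical names, sketching the validation that would live in the Ollama service):

```python
def diagnose_reply(reply, history):
    """Return a debug warning when tool results are present in the
    conversation history but the model still gave no text_response."""
    has_tool_results = any(m.get("role") == "tool" for m in history)
    if has_tool_results and not reply.get("text_response"):
        return ("tool results are in history but no text_response was "
                "returned; the prompt likely needs refinement")
    return None
```

Logging this warning on each iteration would make the loop visible immediately in verbose mode instead of only after the iteration limit is exhausted.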
## 🐛 Known Issues
1. **Tool Calling Loop**: Agent doesn't provide final response after tool execution (see above)
2. **No Streaming**: Responses are blocking (not streamed), so user sees delay
3. **Limited Context**: Prompt builder doesn't include full conversation context in system prompt
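On the streaming gap: Ollama's `/api/chat` can stream newline-delimited JSON chunks, each carrying a `message.content` fragment plus a `done` flag. A minimal sketch of assembling such a stream (illustrative Python):

```python
import json

def assemble_stream(ndjson_lines):
    """Concatenate the content fragments of Ollama streaming chunks.

    With "stream": true, /api/chat emits one JSON object per line until
    a chunk arrives with done=true.
    """
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

chunks = [
    '{"message": {"content": "There are "}, "done": false}',
    '{"message": {"content": "12 dungeons."}, "done": true}',
]
print(assemble_stream(chunks))  # There are 12 dungeons.
```

Printing each fragment as it arrives, rather than joining at the end, is what would remove the perceived delay.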
## 💡 Recommendations
### For Users
- Use MockAIService for testing until tool calling is fixed
- For production, prefer Gemini (has native function calling support)
- Keep queries simple and direct
### For Developers
- Focus on fixing the tool calling loop first
- Consider implementing streaming responses
- Add debug logging to track tool call cycles
- Test with multiple Ollama models to find best performer
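One cheap way to get the suggested debug logging is to flag repeated identical tool calls, the telltale sign of the loop (illustrative Python; names are hypothetical):

```python
import json

def make_cycle_detector():
    """Flag a tool call repeated with identical arguments."""
    seen = set()
    def check(tool_name, args):
        # Canonicalize the args so dict ordering does not matter.
        key = (tool_name, json.dumps(args, sort_keys=True))
        if key in seen:
            return True  # repeated call: log it, consider aborting early
        seen.add(key)
        return False
    return check

check = make_cycle_detector()
print(check("resource-list", {"type": "dungeon"}))  # False (first call)
print(check("resource-list", {"type": "dungeon"}))  # True  (repeat)
```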
## 📝 Related Files
- `src/cli/cli_main.cc` - Flag parsing (fixed ✅)
- `src/cli/service/ai/ollama_ai_service.cc` - Ollama integration
- `src/cli/service/ai/prompt_builder.cc` - System prompt generation (needs fix 🚧)
- `src/cli/service/agent/conversational_agent_service.cc` - Tool execution loop
- `assets/agent/prompt_catalogue.yaml` - Tool definitions and examples (needs enhancement 🚧)
- `docs/simple_chat_input_methods.md` - User documentation
- `test_simple_chat_ollama.sh` - Test script
## 🎯 Success Criteria

### Completed
- [x] Colorful, user-friendly output
- [x] Verbose mode for debugging
- [x] Configurable parameters
- [x] File-based prompts for easy updates

### Minimum Viable
- [ ] Tools return actual data (blocked on missing dungeon labels)
- [ ] LLM successfully calls tools
- [ ] LLM provides a final `text_response` after receiving tool results
- [ ] No infinite loops (completes within the iteration limit)
- [ ] Accurate answers to simple questions ("What dungeons are there?", "List the sprites in room X")

### Full Success
- [ ] All 5 tools work correctly with Ollama
- [ ] Multi-turn conversations maintain context
- [ ] Works with 3+ different Ollama models
- [ ] Response time <5 seconds for typical queries
- [ ] Comprehensive test coverage
---
**Last Updated**: October 4, 2025
**Status**: 🟡 Blocked on empty tool results; the missing dungeon labels must be fixed first
**Next Action**: Add a `dungeon` category to the embedded labels, or update the prompt examples to use `room`