Add LLM integration summary and quickstart script for Ollama
- Created LLM-INTEGRATION-SUMMARY.md detailing the integration plan for Ollama, Gemini, and Claude.
- Updated README.md to reflect the shift in focus towards LLM integration.
- Added quickstart_ollama.sh script to facilitate testing of Ollama integration with z3ed.
docs/z3ed/LLM-IMPLEMENTATION-CHECKLIST.md (new file, 261 lines)

# LLM Integration Implementation Checklist

**Created**: October 3, 2025
**Status**: Ready to Begin
**Estimated Time**: 12-15 hours total

> 📋 **Main Guide**: See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for detailed implementation instructions.

## Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE

### Prerequisites

- [ ] Install Ollama: `brew install ollama` (macOS)
- [ ] Start Ollama server: `ollama serve`
- [ ] Pull recommended model: `ollama pull qwen2.5-coder:7b`
- [ ] Test connectivity: `curl http://localhost:11434/api/tags`

### Implementation Tasks

#### 1.1 Create OllamaAIService Class

- [ ] Create `src/cli/service/ollama_ai_service.h`
  - [ ] Define `OllamaConfig` struct
  - [ ] Declare `OllamaAIService` class with `GetCommands()` override
  - [ ] Add `CheckAvailability()` and `ListAvailableModels()` methods
- [ ] Create `src/cli/service/ollama_ai_service.cc`
  - [ ] Implement constructor with config
  - [ ] Implement `BuildSystemPrompt()` with z3ed command documentation
  - [ ] Implement `CheckAvailability()` with health check
  - [ ] Implement `GetCommands()` with Ollama API call
  - [ ] Add JSON parsing for command extraction
  - [ ] Add error handling for connection failures

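The tasks above can be sketched as a minimal header. This is an illustrative outline only: the `AIService` base interface, field names, and defaults are assumptions, not the final API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Assumed minimal base interface shared by all AI services (hypothetical).
class AIService {
 public:
  virtual ~AIService() = default;
  // Turns a natural-language prompt into a list of z3ed CLI commands.
  virtual std::vector<std::string> GetCommands(const std::string& prompt) = 0;
};

struct OllamaConfig {
  std::string base_url = "http://localhost:11434";
  std::string model = "qwen2.5-coder:7b";
  double temperature = 0.1;  // low temperature favors deterministic commands
};

class OllamaAIService : public AIService {
 public:
  explicit OllamaAIService(OllamaConfig config) : config_(std::move(config)) {}

  std::vector<std::string> GetCommands(const std::string& prompt) override {
    // Real implementation would POST to the Ollama API via httplib and
    // parse the JSON response; stubbed out in this sketch.
    return {};
  }

  // Real implementation would GET /api/tags as a health check.
  bool CheckAvailability() const { return false; }

 private:
  OllamaConfig config_;
};
```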
#### 1.2 Update CMake Configuration

- [ ] Add `YAZE_WITH_HTTPLIB` option to `CMakeLists.txt`
- [ ] Add httplib detection (vcpkg or bundled)
- [ ] Add compile definition `YAZE_WITH_HTTPLIB`
- [ ] Update z3ed target to link httplib when available

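A possible shape for that CMake wiring, assuming the z3ed target already exists; the option description and fallback behavior are illustrative:

```cmake
option(YAZE_WITH_HTTPLIB "Enable HTTP-based AI services (Ollama/Gemini/Claude)" ON)

if(YAZE_WITH_HTTPLIB)
  find_package(httplib CONFIG QUIET)  # vcpkg port; else use the bundled copy
  target_compile_definitions(z3ed PRIVATE YAZE_WITH_HTTPLIB)
  if(httplib_FOUND)
    target_link_libraries(z3ed PRIVATE httplib::httplib)
  endif()
endif()
```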
#### 1.3 Wire into Agent Commands

- [ ] Update `src/cli/handlers/agent/general_commands.cc`
  - [ ] Add `#include "cli/service/ollama_ai_service.h"`
  - [ ] Create `CreateAIService()` helper function
  - [ ] Implement provider selection logic (env vars)
  - [ ] Add health check with fallback to MockAIService
- [ ] Update `HandleRunCommand()` to use service factory
- [ ] Update `HandlePlanCommand()` to use service factory

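The env-var provider selection could look like the following sketch; the precedence order (explicit override, then API keys, then local Ollama) is an assumption based on this checklist, not the final logic:

```cpp
#include <cstdlib>
#include <string>

// Picks which AI provider CreateAIService() should construct (hypothetical
// helper; the real factory would return a std::unique_ptr<AIService>).
std::string SelectAIProvider() {
  // An explicit override always wins.
  if (const char* p = std::getenv("YAZE_AI_PROVIDER")) return p;
  // Otherwise pick the first provider with credentials configured.
  if (std::getenv("GEMINI_API_KEY")) return "gemini";
  if (std::getenv("CLAUDE_API_KEY")) return "claude";
  // Ollama needs no key; the caller falls back to MockAIService when the
  // Ollama health check fails.
  return "ollama";
}
```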
#### 1.4 Testing & Validation

- [ ] Create `scripts/test_ollama_integration.sh`
  - [ ] Check Ollama server availability
  - [ ] Verify model is pulled
  - [ ] Test `z3ed agent run` with simple prompt
  - [ ] Verify proposal creation
  - [ ] Review generated commands
- [ ] Run end-to-end test
- [ ] Document any issues encountered

### Success Criteria

- [ ] `z3ed agent run --prompt "Validate ROM"` generates the correct command
- [ ] Health check reports clear errors when Ollama is unavailable
- [ ] Service fallback to MockAIService works correctly
- [ ] Test script passes without manual intervention

---

## Phase 2: Improve Gemini Integration (2-3 hours)

### Implementation Tasks

#### 2.1 Fix GeminiAIService

- [ ] Update `src/cli/service/gemini_ai_service.cc`
  - [ ] Fix system instruction format
  - [ ] Update to use `gemini-1.5-flash` model
  - [ ] Add generation config (temperature, maxOutputTokens)
  - [ ] Add safety settings
  - [ ] Implement markdown code block stripping
  - [ ] Improve error messages with actionable guidance

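The markdown-stripping step is needed because LLMs often wrap the JSON command array in ```` ```json ```` fences. A minimal sketch (the function name is an assumption):

```cpp
#include <string>

// Removes a surrounding markdown code fence (e.g. ```json ... ```) so the
// remaining text can be handed to the JSON parser.
std::string StripMarkdownFences(std::string text) {
  // Trim leading/trailing whitespace first.
  const auto first = text.find_first_not_of(" \t\r\n");
  const auto last = text.find_last_not_of(" \t\r\n");
  if (first == std::string::npos) return "";
  text = text.substr(first, last - first + 1);

  if (text.rfind("```", 0) == 0) {     // response starts with a fence
    auto nl = text.find('\n');         // skip the ```json line
    if (nl != std::string::npos) text = text.substr(nl + 1);
    auto close = text.rfind("```");    // drop the closing fence
    if (close != std::string::npos) text = text.substr(0, close);
  }
  return text;
}
```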
#### 2.2 Wire into Service Factory

- [ ] Update `CreateAIService()` to check for `GEMINI_API_KEY`
- [ ] Add Gemini as provider option
- [ ] Test with real API key

#### 2.3 Testing

- [ ] Test with various prompts
- [ ] Verify JSON array parsing
- [ ] Test error handling (invalid key, network issues)

### Success Criteria

- [ ] Gemini generates valid command arrays
- [ ] Markdown stripping works reliably
- [ ] Error messages guide the user to API key setup

---

## Phase 3: Add Claude Integration (2-3 hours)

### Implementation Tasks

#### 3.1 Create ClaudeAIService

- [ ] Create `src/cli/service/claude_ai_service.h`
  - [ ] Define class with API key constructor
  - [ ] Add `GetCommands()` override
- [ ] Create `src/cli/service/claude_ai_service.cc`
  - [ ] Implement Claude Messages API call
  - [ ] Use `claude-3-5-sonnet-20241022` model
  - [ ] Add markdown stripping
  - [ ] Add error handling

#### 3.2 Wire into Service Factory

- [ ] Update `CreateAIService()` to check for `CLAUDE_API_KEY`
- [ ] Add Claude as provider option

#### 3.3 Testing

- [ ] Test with various prompts
- [ ] Compare output quality vs Gemini/Ollama

### Success Criteria

- [ ] Claude service works interchangeably with the others
- [ ] Quality comparable to or better than Gemini

---

## Phase 4: Enhanced Prompt Engineering (3-4 hours)

### Implementation Tasks

#### 4.1 Create PromptBuilder Utility

- [ ] Create `src/cli/service/prompt_builder.h`
- [ ] Create `src/cli/service/prompt_builder.cc`
  - [ ] Implement `LoadResourceCatalogue()` (read z3ed-resources.yaml)
  - [ ] Implement `BuildSystemPrompt()` with full command docs
  - [ ] Implement `BuildFewShotExamples()` with proven examples
  - [ ] Implement `BuildContextPrompt()` with ROM state

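A minimal sketch of how `PromptBuilder` could assemble the system prompt with few-shot examples; the prompt wording, method names beyond those listed above, and example content are illustrative assumptions:

```cpp
#include <string>
#include <utility>
#include <vector>

class PromptBuilder {
 public:
  // Registers one few-shot pair: a user request and the expected JSON array.
  void AddFewShot(std::string user, std::string commands) {
    examples_.emplace_back(std::move(user), std::move(commands));
  }

  // Concatenates the instruction, command docs, and few-shot examples.
  std::string BuildSystemPrompt(const std::string& command_docs) const {
    std::string prompt =
        "You are the z3ed agent. Respond ONLY with a JSON array of z3ed "
        "commands.\n\nAvailable commands:\n" + command_docs + "\n";
    for (const auto& [user, commands] : examples_) {
      prompt += "\nUser: " + user + "\nCommands: " + commands + "\n";
    }
    return prompt;
  }

 private:
  std::vector<std::pair<std::string, std::string>> examples_;
};
```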
#### 4.2 Integrate into Services

- [ ] Update OllamaAIService to use PromptBuilder
- [ ] Update GeminiAIService to use PromptBuilder
- [ ] Update ClaudeAIService to use PromptBuilder

#### 4.3 Testing

- [ ] Test with complex prompts
- [ ] Measure accuracy improvement
- [ ] Document which models perform best

### Success Criteria

- [ ] System prompts include the full resource catalogue
- [ ] Few-shot examples raise accuracy above 90%
- [ ] Context injection provides relevant ROM info

---

## Configuration & Documentation

### Environment Variables Setup

- [ ] Document `YAZE_AI_PROVIDER` options
- [ ] Document `OLLAMA_MODEL` override
- [ ] Document API key requirements
- [ ] Create example `.env` file

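The example `.env` file could look like this; the variable names come from this checklist, while the values are placeholders:

```
YAZE_AI_PROVIDER=ollama        # ollama | gemini | claude | mock
OLLAMA_MODEL=qwen2.5-coder:7b  # optional override of the default Ollama model
GEMINI_API_KEY=your-key-here   # required for the Gemini provider
CLAUDE_API_KEY=your-key-here   # required for the Claude provider
```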
### User Documentation

- [ ] Create `docs/z3ed/AI-SERVICE-SETUP.md`
  - [ ] Ollama quick start
  - [ ] Gemini setup guide
  - [ ] Claude setup guide
  - [ ] Troubleshooting section
- [ ] Update README with LLM setup instructions
- [ ] Add examples to main docs

### CLI Enhancements

- [ ] Add `--ai-provider` flag to override env
- [ ] Add `--ai-model` flag to override model
- [ ] Add `--dry-run` flag to see commands without executing
- [ ] Add `--interactive` flag to confirm each command
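A sketch of how those flags could be parsed into a struct; the struct name, field names, and flag handling are assumptions for illustration:

```cpp
#include <string>
#include <vector>

struct AgentFlags {
  std::string provider;     // empty -> fall back to YAZE_AI_PROVIDER
  std::string model;        // empty -> provider default
  bool dry_run = false;     // print commands without executing
  bool interactive = false; // confirm each command before running
};

// Scans the agent subcommand's arguments for the AI-related flags.
AgentFlags ParseAgentFlags(const std::vector<std::string>& args) {
  AgentFlags flags;
  for (size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "--ai-provider" && i + 1 < args.size()) {
      flags.provider = args[++i];
    } else if (args[i] == "--ai-model" && i + 1 < args.size()) {
      flags.model = args[++i];
    } else if (args[i] == "--dry-run") {
      flags.dry_run = true;
    } else if (args[i] == "--interactive") {
      flags.interactive = true;
    }
  }
  return flags;
}
```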
---
## Testing Matrix

| Provider | Model | Test Prompt | Expected Commands | Status |
|----------|-------|-------------|-------------------|--------|
| Ollama | qwen2.5-coder:7b | "Validate ROM" | `["rom validate --rom zelda3.sfc"]` | ⬜ |
| Ollama | codellama:13b | "Export first palette" | `["palette export ..."]` | ⬜ |
| Gemini | gemini-1.5-flash | "Make soldiers red" | `["palette export ...", "palette set-color ...", ...]` | ⬜ |
| Claude | claude-3.5-sonnet | "Change tile at (10,20)" | `["overworld set-tile ..."]` | ⬜ |

---

## Rollout Plan

### Week 1 (Oct 7-11, 2025)

- **Monday**: Phase 1 implementation (OllamaAIService class)
- **Tuesday**: Phase 1 CMake + wiring
- **Wednesday**: Phase 1 testing + documentation
- **Thursday**: Phase 2 (Gemini fixes)
- **Friday**: Buffer day + code review

### Week 2 (Oct 14-18, 2025)

- **Monday**: Phase 3 (Claude integration)
- **Tuesday**: Phase 4 (PromptBuilder)
- **Wednesday**: Enhanced testing across all services
- **Thursday**: Documentation completion
- **Friday**: User validation + demos

---

## Known Risks & Mitigation

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Ollama not available on CI | Medium | Low | Add `YAZE_AI_PROVIDER=mock` for CI builds |
| LLM output format inconsistent | High | Medium | Strict system prompts + validation layer |
| API rate limits | Medium | Medium | Cache responses, implement retry backoff |
| Model accuracy insufficient | High | Low | Multiple few-shot examples + prompt tuning |
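The retry-backoff mitigation for rate limits could be sketched as below; the callback signature and retry policy (doubling delay per attempt) are assumptions:

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Calls `request` up to `max_attempts` times, sleeping base * 2^attempt
// between failures (exponential backoff). Returns the first successful
// response, or nullopt so the caller can surface an actionable error.
std::optional<std::string> CallWithBackoff(
    const std::function<std::optional<std::string>()>& request,
    int max_attempts = 4,
    std::chrono::milliseconds base = std::chrono::milliseconds(250)) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (auto response = request()) return response;
    if (attempt + 1 < max_attempts) {
      std::this_thread::sleep_for(base * (1 << attempt));
    }
  }
  return std::nullopt;
}
```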
---

## Success Metrics

**Phase 1 Complete**:

- ✅ Ollama service operational on local machine
- ✅ Can generate valid z3ed commands from prompts
- ✅ End-to-end test passes

**Phase 2-3 Complete**:

- ✅ All three providers (Ollama, Gemini, Claude) work interchangeably
- ✅ Service selection transparent to user

**Phase 4 Complete**:

- ✅ Command accuracy >90% on standard prompts
- ✅ Resource catalogue integrated into system prompts

**Production Ready**:

- ✅ Documentation complete with setup guides
- ✅ Error messages are actionable
- ✅ Works on macOS (primary target)
- ✅ At least one user validates the workflow

---

## Next Steps After Completion

1. **Gather User Feedback**: Share with the ROM hacking community
2. **Measure Accuracy**: Track the success rate of generated commands
3. **Model Comparison**: Document which models work best
4. **Fine-Tuning**: Consider fine-tuning local models on a z3ed corpus
5. **Agentic Loop**: Add self-correction based on execution results

---

## Notes & Observations

_Add notes here as you progress through implementation:_

-
-
-

---

**Last Updated**: October 3, 2025
**Next Review**: After Phase 1 completion