Source: yaze/docs/z3ed/LLM-IMPLEMENTATION-CHECKLIST.md (commit 40a4e43db9 by scawful, 2025-10-03: "Add LLM integration summary and quickstart script for Ollama")

LLM Integration Implementation Checklist

Created: October 3, 2025
Status: Ready to Begin
Estimated Time: 12-15 hours total

📋 Main Guide: See LLM-INTEGRATION-PLAN.md for detailed implementation instructions.

Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE

Prerequisites

  • Install Ollama: brew install ollama (macOS)
  • Start Ollama server: ollama serve
  • Pull recommended model: ollama pull qwen2.5-coder:7b
  • Test connectivity: curl http://localhost:11434/api/tags

Implementation Tasks

1.1 Create OllamaAIService Class

  • Create src/cli/service/ollama_ai_service.h
    • Define OllamaConfig struct
    • Declare OllamaAIService class with GetCommands() override
    • Add CheckAvailability() and ListAvailableModels() methods
  • Create src/cli/service/ollama_ai_service.cc
    • Implement constructor with config
    • Implement BuildSystemPrompt() with z3ed command documentation
    • Implement CheckAvailability() with health check
    • Implement GetCommands() with Ollama API call
    • Add JSON parsing for command extraction
    • Add error handling for connection failures
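The 1.1 tasks above can be sketched roughly as follows. This is a hypothetical outline, not the actual z3ed source: the struct/function names (`OllamaConfig`, `ExtractCommands`) follow the checklist wording, and the command extraction shown is a deliberately naive quoted-string scan — a real implementation should use a proper JSON parser and handle escape sequences.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Connection and model settings for the Ollama service (defaults match the
// prerequisites listed above).
struct OllamaConfig {
  std::string host = "localhost";
  int port = 11434;
  std::string model = "qwen2.5-coder:7b";
};

// Naive extraction of quoted strings from a JSON array response such as
// ["rom validate --rom zelda3.sfc"]. Does not handle escaped quotes;
// illustration only.
std::vector<std::string> ExtractCommands(const std::string& json) {
  std::vector<std::string> commands;
  size_t pos = 0;
  while ((pos = json.find('"', pos)) != std::string::npos) {
    size_t end = json.find('"', pos + 1);
    if (end == std::string::npos) break;
    commands.push_back(json.substr(pos + 1, end - pos - 1));
    pos = end + 1;
  }
  return commands;
}
```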

1.2 Update CMake Configuration

  • Add YAZE_WITH_HTTPLIB option to CMakeLists.txt
  • Add httplib detection (vcpkg or bundled)
  • Add compile definition YAZE_WITH_HTTPLIB
  • Update z3ed target to link httplib when available
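A minimal sketch of the CMake wiring described above. The option and target names follow the checklist; the bundled fallback path and the exact httplib package/target names are assumptions and may differ in yaze's actual build files.

```cmake
# Hypothetical sketch; actual variable and target names may differ.
option(YAZE_WITH_HTTPLIB "Enable cpp-httplib for AI service HTTP calls" ON)

if(YAZE_WITH_HTTPLIB)
  find_package(httplib CONFIG QUIET)            # vcpkg-provided package
  if(NOT httplib_FOUND)
    add_subdirectory(third_party/httplib)       # bundled fallback (assumed path)
  endif()
  target_compile_definitions(z3ed PRIVATE YAZE_WITH_HTTPLIB)
  target_link_libraries(z3ed PRIVATE httplib::httplib)
endif()
```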

1.3 Wire into Agent Commands

  • Update src/cli/handlers/agent/general_commands.cc
    • Add #include "cli/service/ollama_ai_service.h"
    • Create CreateAIService() helper function
    • Implement provider selection logic (env vars)
    • Add health check with fallback to MockAIService
    • Update HandleRunCommand() to use service factory
    • Update HandlePlanCommand() to use service factory
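The provider-selection logic in 1.3 could look roughly like this. The function name and the fallback order (explicit env override, then Ollama, then Gemini, then Claude, then mock) are assumptions for illustration, not the actual z3ed implementation; the real `CreateAIService()` would read `YAZE_AI_PROVIDER` and the API-key env vars itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical selection logic inside CreateAIService(). Inputs are passed
// explicitly here so the function is pure and testable; the real factory
// would read env vars and run the Ollama health check.
std::string SelectProvider(const char* explicit_provider,
                           bool gemini_key_set, bool claude_key_set,
                           bool ollama_reachable) {
  if (explicit_provider && *explicit_provider) return explicit_provider;
  if (ollama_reachable) return "ollama";
  if (gemini_key_set) return "gemini";
  if (claude_key_set) return "claude";
  return "mock";  // fall back to MockAIService
}
```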

1.4 Testing & Validation

  • Create scripts/test_ollama_integration.sh
    • Check Ollama server availability
    • Verify model is pulled
    • Test z3ed agent run with simple prompt
    • Verify proposal creation
    • Review generated commands
  • Run end-to-end test
  • Document any issues encountered

Success Criteria

  • z3ed agent run --prompt "Validate ROM" generates correct command
  • Health check reports clear errors when Ollama unavailable
  • Service fallback to MockAIService works correctly
  • Test script passes without manual intervention

Phase 2: Improve Gemini Integration (2-3 hours)

Implementation Tasks

2.1 Fix GeminiAIService

  • Update src/cli/service/gemini_ai_service.cc
    • Fix system instruction format
    • Update to use gemini-1.5-flash model
    • Add generation config (temperature, maxOutputTokens)
    • Add safety settings
    • Implement markdown code block stripping
    • Improve error messages with actionable guidance
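The markdown-stripping step in 2.1 exists because LLMs frequently wrap JSON output in ```json fences even when told not to. A minimal sketch (helper name assumed, not the actual GeminiAIService code):

```cpp
#include <cassert>
#include <string>

// If the text contains a fenced code block (```json ... ```), return only
// the fence body; otherwise return the text unchanged.
std::string StripMarkdownFences(const std::string& text) {
  size_t start = text.find("```");
  if (start == std::string::npos) return text;
  size_t line_end = text.find('\n', start);   // skip the ```json line
  if (line_end == std::string::npos) return text;
  size_t close = text.find("```", line_end);  // closing fence
  if (close == std::string::npos) return text;
  return text.substr(line_end + 1, close - line_end - 1);
}
```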

2.2 Wire into Service Factory

  • Update CreateAIService() to check for GEMINI_API_KEY
  • Add Gemini as provider option
  • Test with real API key

2.3 Testing

  • Test with various prompts
  • Verify JSON array parsing
  • Test error handling (invalid key, network issues)

Success Criteria

  • Gemini generates valid command arrays
  • Markdown stripping works reliably
  • Error messages guide user to API key setup

Phase 3: Add Claude Integration (2-3 hours)

Implementation Tasks

3.1 Create ClaudeAIService

  • Create src/cli/service/claude_ai_service.h
    • Define class with API key constructor
    • Add GetCommands() override
  • Create src/cli/service/claude_ai_service.cc
    • Implement Claude Messages API call
    • Use claude-3-5-sonnet-20241022 model
    • Add markdown stripping
    • Add error handling
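The Claude Messages API takes the model, token limit, system prompt, and a messages array in one JSON body (`system` is a top-level field, unlike the per-message role used by some other APIs). A sketch of building that body as a raw string — a real implementation should use a JSON library so prompts containing quotes are escaped correctly; the helper name is assumed:

```cpp
#include <cassert>
#include <string>

// Build a Claude Messages API request body. Raw string concatenation for
// illustration only; inputs are not JSON-escaped here.
std::string BuildClaudeRequest(const std::string& model,
                               const std::string& system_prompt,
                               const std::string& user_prompt,
                               int max_tokens) {
  return std::string("{")
      + "\"model\":\"" + model + "\","
      + "\"max_tokens\":" + std::to_string(max_tokens) + ","
      + "\"system\":\"" + system_prompt + "\","
      + "\"messages\":[{\"role\":\"user\",\"content\":\"" + user_prompt
      + "\"}]}";
}
```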

3.2 Wire into Service Factory

  • Update CreateAIService() to check for CLAUDE_API_KEY
  • Add Claude as provider option

3.3 Testing

  • Test with various prompts
  • Compare output quality vs Gemini/Ollama

Success Criteria

  • Claude service works interchangeably with others
  • Quality comparable or better than Gemini

Phase 4: Enhanced Prompt Engineering (3-4 hours)

Implementation Tasks

4.1 Create PromptBuilder Utility

  • Create src/cli/service/prompt_builder.h
  • Create src/cli/service/prompt_builder.cc
    • Implement LoadResourceCatalogue() (read z3ed-resources.yaml)
    • Implement BuildSystemPrompt() with full command docs
    • Implement BuildFewShotExamples() with proven examples
    • Implement BuildContextPrompt() with ROM state
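The few-shot assembly step could be sketched as below. The struct and formatting (User/Commands pairs separated by blank lines) are assumptions; the real `BuildFewShotExamples()` would draw its examples from proven z3ed sessions and the resource catalogue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One proven prompt -> command-array pair used as an in-context example.
struct FewShotExample {
  std::string prompt;
  std::string commands_json;
};

// Render examples as alternating User/Commands lines for the system prompt.
std::string BuildFewShotExamples(const std::vector<FewShotExample>& examples) {
  std::string out;
  for (const auto& ex : examples) {
    out += "User: " + ex.prompt + "\n";
    out += "Commands: " + ex.commands_json + "\n\n";
  }
  return out;
}
```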

4.2 Integrate into Services

  • Update OllamaAIService to use PromptBuilder
  • Update GeminiAIService to use PromptBuilder
  • Update ClaudeAIService to use PromptBuilder

4.3 Testing

  • Test with complex prompts
  • Measure accuracy improvement
  • Document which models perform best

Success Criteria

  • System prompts include full resource catalogue
  • Few-shot examples raise command accuracy above 90%
  • Context injection provides relevant ROM info

Configuration & Documentation

Environment Variables Setup

  • Document YAZE_AI_PROVIDER options
  • Document OLLAMA_MODEL override
  • Document API key requirements
  • Create example .env file

User Documentation

  • Create docs/z3ed/AI-SERVICE-SETUP.md
    • Ollama quick start
    • Gemini setup guide
    • Claude setup guide
    • Troubleshooting section
  • Update README with LLM setup instructions
  • Add examples to main docs

CLI Enhancements

  • Add --ai-provider flag to override env
  • Add --ai-model flag to override model
  • Add --dry-run flag to see commands without executing
  • Add --interactive flag to confirm each command
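The four proposed flags map naturally onto a small options struct. A sketch, assuming a hand-rolled scan over the argument vector — z3ed's actual argument parser may handle this differently:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Parsed values for the proposed agent flags.
struct AgentFlags {
  std::string provider;      // --ai-provider (overrides YAZE_AI_PROVIDER)
  std::string model;         // --ai-model (overrides OLLAMA_MODEL)
  bool dry_run = false;      // --dry-run: show commands without executing
  bool interactive = false;  // --interactive: confirm each command
};

AgentFlags ParseAgentFlags(const std::vector<std::string>& args) {
  AgentFlags flags;
  for (size_t i = 0; i < args.size(); ++i) {
    if (args[i] == "--dry-run") flags.dry_run = true;
    else if (args[i] == "--interactive") flags.interactive = true;
    else if (args[i] == "--ai-provider" && i + 1 < args.size())
      flags.provider = args[++i];
    else if (args[i] == "--ai-model" && i + 1 < args.size())
      flags.model = args[++i];
  }
  return flags;
}
```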

Testing Matrix

| Provider | Model | Test Prompt | Expected Commands | Status |
|----------|-------|-------------|-------------------|--------|
| Ollama | qwen2.5-coder:7b | "Validate ROM" | `["rom validate --rom zelda3.sfc"]` | |
| Ollama | codellama:13b | "Export first palette" | `["palette export ..."]` | |
| Gemini | gemini-1.5-flash | "Make soldiers red" | `["palette export ...", "palette set-color ...", ...]` | |
| Claude | claude-3.5-sonnet | "Change tile at (10,20)" | `["overworld set-tile ..."]` | |

Rollout Plan

Week 1 (Oct 7-11, 2025)

  • Monday: Phase 1 implementation (OllamaAIService class)
  • Tuesday: Phase 1 CMake + wiring
  • Wednesday: Phase 1 testing + documentation
  • Thursday: Phase 2 (Gemini fixes)
  • Friday: Buffer day + code review

Week 2 (Oct 14-18, 2025)

  • Monday: Phase 3 (Claude integration)
  • Tuesday: Phase 4 (PromptBuilder)
  • Wednesday: Enhanced testing across all services
  • Thursday: Documentation completion
  • Friday: User validation + demos

Known Risks & Mitigation

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Ollama not available on CI | Medium | Low | Add `YAZE_AI_PROVIDER=mock` for CI builds |
| LLM output format inconsistent | High | Medium | Strict system prompts + validation layer |
| API rate limits | Medium | Medium | Cache responses, implement retry backoff |
| Model accuracy insufficient | High | Low | Multiple few-shot examples + prompt tuning |
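The retry-backoff mitigation for rate limits is typically exponential backoff with a cap. A sketch of the delay schedule (the base delay and cap are assumed values, and a real client would also add jitter and honor any Retry-After header):

```cpp
#include <algorithm>
#include <cassert>

// Delay before the given retry attempt: doubles each attempt, capped.
// Constants are illustrative, not from the z3ed codebase.
int BackoffDelayMs(int attempt, int base_ms = 500, int cap_ms = 8000) {
  int delay = base_ms;
  for (int i = 0; i < attempt && delay < cap_ms; ++i) delay *= 2;
  return std::min(delay, cap_ms);
}
```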

Success Metrics

Phase 1 Complete:

  • Ollama service operational on local machine
  • Can generate valid z3ed commands from prompts
  • End-to-end test passes

Phase 2-3 Complete:

  • All three providers (Ollama, Gemini, Claude) work interchangeably
  • Service selection transparent to user

Phase 4 Complete:

  • Command accuracy >90% on standard prompts
  • Resource catalogue integrated into system prompts

Production Ready:

  • Documentation complete with setup guides
  • Error messages are actionable
  • Works on macOS (primary target)
  • At least one user validates the workflow

Next Steps After Completion

  1. Gather User Feedback: Share with ROM hacking community
  2. Measure Accuracy: Track success rate of generated commands
  3. Model Comparison: Document which models work best
  4. Fine-Tuning: Consider fine-tuning local models on z3ed corpus
  5. Agentic Loop: Add self-correction based on execution results

Notes & Observations

Add notes here as you progress through implementation:


Last Updated: October 3, 2025
Next Review: After Phase 1 completion