Add LLM integration summary and quickstart script for Ollama

- Created LLM-INTEGRATION-SUMMARY.md detailing the integration plan for Ollama, Gemini, and Claude.
- Updated README.md to reflect the shift in focus towards LLM integration.
- Added quickstart_ollama.sh script to facilitate testing of Ollama integration with z3ed.
This commit is contained in:
scawful
2025-10-03 00:51:05 -04:00
parent 287f04ffc4
commit 40a4e43db9
7 changed files with 2254 additions and 7 deletions

View File

@@ -20,9 +20,10 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform with a renewed emphasis on system-wide error reporting
**📋 Next Phases**:
- **Priority 1**: Test Introspection API (IT-05) - Enable test status querying and result polling
- **Priority 1**: LLM Integration (Ollama + Gemini + Claude) - Make AI agent system production-ready (see [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md))
- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
- **Priority 3**: Enhanced Error Reporting (IT-08+) - Holistic improvements spanning z3ed, ImGuiTestHarness, EditorManager, and core application services
- **Priority 3**: Windows Cross-Platform Testing - Validate on Windows with vcpkg
- **Deprioritized**: Collaborative Editing (IT-10) - Postponed in favor of practical LLM integration
**Recent Accomplishments** (Updated: October 2025):
- **✅ IT-08 Enhanced Error Reporting Complete**: Full diagnostic capture operational
@@ -404,8 +405,76 @@ jobs:
---
#### IT-10: Collaborative Editing & Multiplayer Sessions (12-15 hours)
**Implementation Tasks**:
#### IT-10: Collaborative Editing & Multiplayer Sessions ⏸️ DEPRIORITIZED
**Status**: Postponed in favor of LLM integration work
**Rationale**: While collaborative editing is an interesting feature, practical LLM integration provides more immediate value for the agentic workflow system. The core infrastructure is complete, and enabling real AI agents to interact with z3ed is the critical next step.
**Future Consideration**: IT-10 may be revisited after LLM integration is production-ready and validated by users. The collaborative editing design is preserved in the documentation for future reference.
**See**: [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for the new priority work.
---
### Priority 2: LLM Integration (Ollama + Gemini + Claude) 🤖 NEW PRIORITY
**Goal**: Enable practical AI-driven ROM modifications with local and remote LLM providers
**Time Estimate**: 12-15 hours total
**Status**: Ready to Implement
**Why This is Critical**: The z3ed infrastructure is complete (CLI, proposals, sandbox, GUI automation), but currently uses `MockAIService` with hardcoded commands. Real LLM integration unlocks the full potential of the agentic workflow system.
**📋 Complete Documentation**:
- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - Detailed technical implementation guide (60+ pages)
- **[LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)** - Step-by-step task list with checkboxes
- **[LLM-INTEGRATION-SUMMARY.md](LLM-INTEGRATION-SUMMARY.md)** - Executive summary and getting started
**Implementation Phases**:
#### Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE
- Create `OllamaAIService` class with health checks and model management
- Wire into agent commands with provider selection mechanism
- Add CMake configuration for httplib support
- End-to-end testing with `qwen2.5-coder:7b` model
**Key Benefits**: Local, free, private, no rate limits
#### Phase 2: Gemini Fixes (2-3 hours)
- Fix existing `GeminiAIService` implementation
- Improve prompting with resource catalogue
- Add markdown code block stripping for reliable parsing
#### Phase 3: Claude Integration (2-3 hours)
- Create `ClaudeAIService` class
- Implement Messages API integration
- Same interface as other services for easy swapping
#### Phase 4: Enhanced Prompt Engineering (3-4 hours)
- Create `PromptBuilder` utility class
- Load resource catalogue (`z3ed-resources.yaml`) into system prompts
- Add few-shot examples for improved accuracy (>90%)
- Inject ROM context (current state, loaded editors)
**Quick Start After Implementation**:
```bash
# Install Ollama
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b
# Configure z3ed
export YAZE_AI_PROVIDER=ollama
# Use natural language
z3ed agent run --prompt "Make all soldier armor red" --rom zelda3.sfc --sandbox
z3ed agent diff # Review changes
```
**Testing Script**: `./scripts/quickstart_ollama.sh` (automated setup validation)
---
### Priority 3: Windows Cross-Platform Testing 🪟
1. **Collaboration Server**:
- WebSocket server for real-time client communication
- Session management (create, join, authentication)

View File

@@ -0,0 +1,261 @@
# LLM Integration Implementation Checklist
**Created**: October 3, 2025
**Status**: Ready to Begin
**Estimated Time**: 12-15 hours total
> 📋 **Main Guide**: See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for detailed implementation instructions.
## Phase 1: Ollama Local Integration (4-6 hours) 🎯 START HERE
### Prerequisites
- [ ] Install Ollama: `brew install ollama` (macOS)
- [ ] Start Ollama server: `ollama serve`
- [ ] Pull recommended model: `ollama pull qwen2.5-coder:7b`
- [ ] Test connectivity: `curl http://localhost:11434/api/tags`
### Implementation Tasks
#### 1.1 Create OllamaAIService Class
- [ ] Create `src/cli/service/ollama_ai_service.h`
- [ ] Define `OllamaConfig` struct
- [ ] Declare `OllamaAIService` class with `GetCommands()` override
- [ ] Add `CheckAvailability()` and `ListAvailableModels()` methods
- [ ] Create `src/cli/service/ollama_ai_service.cc`
- [ ] Implement constructor with config
- [ ] Implement `BuildSystemPrompt()` with z3ed command documentation
- [ ] Implement `CheckAvailability()` with health check
- [ ] Implement `GetCommands()` with Ollama API call
- [ ] Add JSON parsing for command extraction
- [ ] Add error handling for connection failures
#### 1.2 Update CMake Configuration
- [ ] Add `YAZE_WITH_HTTPLIB` option to `CMakeLists.txt`
- [ ] Add httplib detection (vcpkg or bundled)
- [ ] Add compile definition `YAZE_WITH_HTTPLIB`
- [ ] Update z3ed target to link httplib when available
#### 1.3 Wire into Agent Commands
- [ ] Update `src/cli/handlers/agent/general_commands.cc`
- [ ] Add `#include "cli/service/ollama_ai_service.h"`
- [ ] Create `CreateAIService()` helper function
- [ ] Implement provider selection logic (env vars)
- [ ] Add health check with fallback to MockAIService
- [ ] Update `HandleRunCommand()` to use service factory
- [ ] Update `HandlePlanCommand()` to use service factory
#### 1.4 Testing & Validation
- [ ] Create `scripts/test_ollama_integration.sh`
- [ ] Check Ollama server availability
- [ ] Verify model is pulled
- [ ] Test `z3ed agent run` with simple prompt
- [ ] Verify proposal creation
- [ ] Review generated commands
- [ ] Run end-to-end test
- [ ] Document any issues encountered
### Success Criteria
- [ ] `z3ed agent run --prompt "Validate ROM"` generates correct command
- [ ] Health check reports clear errors when Ollama unavailable
- [ ] Service fallback to MockAIService works correctly
- [ ] Test script passes without manual intervention
---
## Phase 2: Improve Gemini Integration (2-3 hours)
### Implementation Tasks
#### 2.1 Fix GeminiAIService
- [ ] Update `src/cli/service/gemini_ai_service.cc`
- [ ] Fix system instruction format
- [ ] Update to use `gemini-1.5-flash` model
- [ ] Add generation config (temperature, maxOutputTokens)
- [ ] Add safety settings
- [ ] Implement markdown code block stripping
- [ ] Improve error messages with actionable guidance
#### 2.2 Wire into Service Factory
- [ ] Update `CreateAIService()` to check for `GEMINI_API_KEY`
- [ ] Add Gemini as provider option
- [ ] Test with real API key
#### 2.3 Testing
- [ ] Test with various prompts
- [ ] Verify JSON array parsing
- [ ] Test error handling (invalid key, network issues)
### Success Criteria
- [ ] Gemini generates valid command arrays
- [ ] Markdown stripping works reliably
- [ ] Error messages guide user to API key setup
---
## Phase 3: Add Claude Integration (2-3 hours)
### Implementation Tasks
#### 3.1 Create ClaudeAIService
- [ ] Create `src/cli/service/claude_ai_service.h`
- [ ] Define class with API key constructor
- [ ] Add `GetCommands()` override
- [ ] Create `src/cli/service/claude_ai_service.cc`
- [ ] Implement Claude Messages API call
- [ ] Use `claude-3-5-sonnet-20241022` model
- [ ] Add markdown stripping
- [ ] Add error handling
#### 3.2 Wire into Service Factory
- [ ] Update `CreateAIService()` to check for `CLAUDE_API_KEY`
- [ ] Add Claude as provider option
#### 3.3 Testing
- [ ] Test with various prompts
- [ ] Compare output quality vs Gemini/Ollama
### Success Criteria
- [ ] Claude service works interchangeably with others
- [ ] Quality comparable or better than Gemini
---
## Phase 4: Enhanced Prompt Engineering (3-4 hours)
### Implementation Tasks
#### 4.1 Create PromptBuilder Utility
- [ ] Create `src/cli/service/prompt_builder.h`
- [ ] Create `src/cli/service/prompt_builder.cc`
- [ ] Implement `LoadResourceCatalogue()` (read z3ed-resources.yaml)
- [ ] Implement `BuildSystemPrompt()` with full command docs
- [ ] Implement `BuildFewShotExamples()` with proven examples
- [ ] Implement `BuildContextPrompt()` with ROM state
#### 4.2 Integrate into Services
- [ ] Update OllamaAIService to use PromptBuilder
- [ ] Update GeminiAIService to use PromptBuilder
- [ ] Update ClaudeAIService to use PromptBuilder
#### 4.3 Testing
- [ ] Test with complex prompts
- [ ] Measure accuracy improvement
- [ ] Document which models perform best
### Success Criteria
- [ ] System prompts include full resource catalogue
- [ ] Few-shot examples improve accuracy >90%
- [ ] Context injection provides relevant ROM info
---
## Configuration & Documentation
### Environment Variables Setup
- [ ] Document `YAZE_AI_PROVIDER` options
- [ ] Document `OLLAMA_MODEL` override
- [ ] Document API key requirements
- [ ] Create example `.env` file
### User Documentation
- [ ] Create `docs/z3ed/AI-SERVICE-SETUP.md`
- [ ] Ollama quick start
- [ ] Gemini setup guide
- [ ] Claude setup guide
- [ ] Troubleshooting section
- [ ] Update README with LLM setup instructions
- [ ] Add examples to main docs
### CLI Enhancements
- [ ] Add `--ai-provider` flag to override env
- [ ] Add `--ai-model` flag to override model
- [ ] Add `--dry-run` flag to see commands without executing
- [ ] Add `--interactive` flag to confirm each command
---
## Testing Matrix
| Provider | Model | Test Prompt | Expected Commands | Status |
|----------|-------|-------------|-------------------|--------|
| Ollama | qwen2.5-coder:7b | "Validate ROM" | `["rom validate --rom zelda3.sfc"]` | ⬜ |
| Ollama | codellama:13b | "Export first palette" | `["palette export ..."]` | ⬜ |
| Gemini | gemini-1.5-flash | "Make soldiers red" | `["palette export ...", "palette set-color ...", ...]` | ⬜ |
| Claude | claude-3.5-sonnet | "Change tile at (10,20)" | `["overworld set-tile ..."]` | ⬜ |
---
## Rollout Plan
### Week 1 (Oct 7-11, 2025)
- **Monday**: Phase 1 implementation (OllamaAIService class)
- **Tuesday**: Phase 1 CMake + wiring
- **Wednesday**: Phase 1 testing + documentation
- **Thursday**: Phase 2 (Gemini fixes)
- **Friday**: Buffer day + code review
### Week 2 (Oct 14-18, 2025)
- **Monday**: Phase 3 (Claude integration)
- **Tuesday**: Phase 4 (PromptBuilder)
- **Wednesday**: Enhanced testing across all services
- **Thursday**: Documentation completion
- **Friday**: User validation + demos
---
## Known Risks & Mitigation
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Ollama not available on CI | Medium | Low | Add `YAZE_AI_PROVIDER=mock` for CI builds |
| LLM output format inconsistent | High | Medium | Strict system prompts + validation layer |
| API rate limits | Medium | Medium | Cache responses, implement retry backoff |
| Model accuracy insufficient | High | Low | Multiple few-shot examples + prompt tuning |
---
## Success Metrics
**Phase 1 Complete**:
- ✅ Ollama service operational on local machine
- ✅ Can generate valid z3ed commands from prompts
- ✅ End-to-end test passes
**Phase 2-3 Complete**:
- ✅ All three providers (Ollama, Gemini, Claude) work interchangeably
- ✅ Service selection transparent to user
**Phase 4 Complete**:
- ✅ Command accuracy >90% on standard prompts
- ✅ Resource catalogue integrated into system prompts
**Production Ready**:
- ✅ Documentation complete with setup guides
- ✅ Error messages are actionable
- ✅ Works on macOS (primary target)
- ✅ At least one user validates the workflow
---
## Next Steps After Completion
1. **Gather User Feedback**: Share with ROM hacking community
2. **Measure Accuracy**: Track success rate of generated commands
3. **Model Comparison**: Document which models work best
4. **Fine-Tuning**: Consider fine-tuning local models on z3ed corpus
5. **Agentic Loop**: Add self-correction based on execution results
---
## Notes & Observations
_Add notes here as you progress through implementation:_
-
-
-
---
**Last Updated**: October 3, 2025
**Next Review**: After Phase 1 completion

View File

@@ -0,0 +1,421 @@
# LLM Integration Architecture
**Visual Overview of z3ed Agent System with LLM Providers**
## System Architecture
```
┌─────────────────────────────────────────────────────────────────────┐
│ User / Developer │
└────────────────────────────┬────────────────────────────────────────┘
│ Natural Language Prompt
│ "Make soldier armor red"
┌─────────────────────────────────────────────────────────────────────┐
│ z3ed CLI (Entry Point) │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ z3ed agent run --prompt "..." --rom zelda3.sfc --sandbox │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────────┘
│ Invoke
┌─────────────────────────────────────────────────────────────────────┐
│ Agent Command Handler │
│ (src/cli/handlers/agent/) │
│ │
│ • Parse arguments │
│ • Create proposal │
│ • Select AI service ◄────────── Environment Variables │
│ • Execute commands │
│ • Track in registry │
└────────────────────────────┬────────────────────────────────────────┘
│ Get Commands
┌─────────────────────────────────────────────────────────────────────┐
│ AI Service Factory │
│ (CreateAIService() helper) │
│ │
│ Environment Detection: │
│ • YAZE_AI_PROVIDER=ollama → OllamaAIService │
│ • GEMINI_API_KEY set → GeminiAIService │
│ • CLAUDE_API_KEY set → ClaudeAIService │
│ • Default → MockAIService │
└─────────┬───────────────────┬────────────────┬───────────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────┐ ┌─────────────────┐
│ OllamaAIService │ │ GeminiAI │ │ ClaudeAIService │
│ │ │ Service │ │ │
│ • Local LLM │ │ • Remote API │ │ • Remote API │
│ • Free │ │ • API Key │ │ • API Key │
│ • Private │ │ • $0.10/1M │ │ • Free tier │
│ • Fast │ │ tokens │ │ • Best quality │
└────────┬─────────┘ └──────┬───────┘ └────────┬────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────────────────┐
│ AIService Interface │
│ │
│ virtual absl::StatusOr<vector<string>> │
│ GetCommands(const string& prompt) = 0; │
└────────────────────────────┬──────────────────────────────┘
│ Return Commands
["rom validate --rom zelda3.sfc",
"palette export --group sprites ...",
"palette set-color --file ... --color FF0000"]
┌─────────────────────────────────────────────────────────────────────┐
│ Command Execution Engine │
│ │
│ For each command: │
│ 1. Parse command string │
│ 2. Lookup handler in ModernCLI registry │
│ 3. Execute in sandbox ROM │
│ 4. Log to ProposalRegistry │
│ 5. Capture output/errors │
└────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Proposal Registry │
│ (Cross-session persistence) │
│ │
│ • Proposal metadata (ID, timestamp, prompt) │
│ • Execution logs (commands, status, duration) │
│ • ROM diff (before/after sandbox state) │
│ • Status (pending, accepted, rejected) │
└────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Human Review (GUI) │
│ YAZE Editor → Debug → Agent Proposals │
│ │
│ • View proposal details │
│ • Inspect ROM diff visually │
│ • Test in GUI editors │
│ • Accept → Merge to main ROM │
│ • Reject → Discard sandbox │
└─────────────────────────────────────────────────────────────────────┘
```
## LLM Provider Flow
### Ollama (Local)
```
User Prompt
OllamaAIService
├─► Check Health (http://localhost:11434/api/tags)
│ └─► Model Available? ────No──► Error: "Pull qwen2.5-coder:7b"
│ └─Yes
├─► Build System Prompt
│ • Load z3ed-resources.yaml
│ • Add few-shot examples
│ • Inject ROM context
├─► POST /api/generate
│ {
│ "model": "qwen2.5-coder:7b",
│ "prompt": "<system> + <user>",
│ "temperature": 0.1,
│ "format": "json"
│ }
├─► Parse Response
│ ["command1", "command2", ...]
└─► Return to Agent Handler
```
### Gemini (Remote)
```
User Prompt
GeminiAIService
├─► Check API Key
│ └─► Not Set? ────► Error: "Set GEMINI_API_KEY"
├─► Build Request
│ {
│ "contents": [{
│ "role": "user",
│ "parts": [{"text": "<system> + <prompt>"}]
│ }],
│ "generationConfig": {
│ "temperature": 0.1,
│ "maxOutputTokens": 2048
│ }
│ }
├─► POST https://generativelanguage.googleapis.com/
│ v1beta/models/gemini-1.5-flash:generateContent
├─► Parse Response
│ • Extract text from nested JSON
│ • Strip markdown code blocks if present
│ • Parse JSON array
└─► Return Commands
```
### Claude (Remote)
```
User Prompt
ClaudeAIService
├─► Check API Key
│ └─► Not Set? ────► Error: "Set CLAUDE_API_KEY"
├─► Build Request
│ {
│ "model": "claude-3-5-sonnet-20241022",
│ "max_tokens": 2048,
│ "temperature": 0.1,
│ "system": "<system instructions>",
│ "messages": [{
│ "role": "user",
│ "content": "<prompt>"
│ }]
│ }
├─► POST https://api.anthropic.com/v1/messages
├─► Parse Response
│ • Extract text from content[0].text
│ • Strip markdown if present
│ • Parse JSON array
└─► Return Commands
```
## Prompt Engineering Pipeline
```
┌─────────────────────────────────────────────────────────────────────┐
│ PromptBuilder │
│ (Comprehensive System Prompt) │
└────────────────────────────┬────────────────────────────────────────┘
├─► 1. Load Resource Catalogue
│ Source: docs/api/z3ed-resources.yaml
│ • All command schemas
│ • Argument types & descriptions
│ • Expected effects & returns
├─► 2. Add Few-Shot Examples
│ Proven prompt → command pairs:
│ • "Validate ROM" → ["rom validate ..."]
│ • "Red armor" → ["palette export ...", ...]
├─► 3. Inject ROM Context
│ Current state from application:
│ • Loaded ROM path
│ • Open editors (Overworld, Dungeon)
│ • Recently modified assets
├─► 4. Set Output Format Rules
│ • MUST return JSON array of strings
│ • Each string is executable z3ed command
│ • No explanations or markdown
└─► 5. Combine into Final Prompt
System Prompt (~2K tokens) + User Prompt
Sent to LLM Provider
```
## Error Handling & Fallback Chain
```
User Request
Select Provider (YAZE_AI_PROVIDER)
├─► Ollama Selected
│ │
│ ├─► Health Check
│ │ └─► Failed? ────► Warn + Fallback to MockAIService
│ │ "⚠️ Ollama unavailable, using mock"
│ │
│ └─► Model Check
│ └─► Missing? ───► Error + Suggestion
│ "Pull model: ollama pull qwen2.5-coder:7b"
├─► Gemini Selected
│ │
│ ├─► API Key Check
│ │ └─► Missing? ───► Fallback to MockAIService
│ │ "Set GEMINI_API_KEY or use Ollama"
│ │
│ └─► API Call
│ ├─► Network Error? ───► Retry (3x with backoff)
│ └─► Rate Limit? ──────► Error + Wait Suggestion
└─► Claude Selected
└─► Similar to Gemini
(API key check → Fallback → Retry logic)
```
## File Structure
```
yaze/
├── src/cli/service/
│ ├── ai_service.h # Base interface
│ ├── ai_service.cc # MockAIService implementation
│ ├── ollama_ai_service.h # 🆕 Ollama integration
│ ├── ollama_ai_service.cc # 🆕 Implementation
│ ├── gemini_ai_service.h # Existing (needs fixes)
│ ├── gemini_ai_service.cc # Existing (needs fixes)
│ ├── claude_ai_service.h # 🆕 Claude integration
│ ├── claude_ai_service.cc # 🆕 Implementation
│ ├── prompt_builder.h # 🆕 Prompt engineering utility
│ └── prompt_builder.cc # 🆕 Implementation
├── src/cli/handlers/agent/
│ └── general_commands.cc # 🔧 Add CreateAIService() factory
├── docs/z3ed/
│ ├── LLM-INTEGRATION-PLAN.md # 📋 Complete guide (this file)
│ ├── LLM-IMPLEMENTATION-CHECKLIST.md # ✅ Task checklist
│ ├── LLM-INTEGRATION-SUMMARY.md # 📄 Executive summary
│ ├── LLM-INTEGRATION-ARCHITECTURE.md # 🏗️ Visual diagrams (this file)
│ └── AI-SERVICE-SETUP.md # 📖 User guide (future)
└── scripts/
├── quickstart_ollama.sh # 🚀 Automated setup test
└── test_ai_services.sh # 🧪 Integration tests
```
## Data Flow Example: "Make soldier armor red"
```
1. User Input
$ z3ed agent run --prompt "Make soldier armor red" --rom zelda3.sfc --sandbox
2. Agent Handler
• Create proposal (ID: agent_20251003_143022)
• Create sandbox (/tmp/yaze_sandbox_abc123/zelda3.sfc)
• Select AI service (Ollama detected)
3. Ollama Service
• Check health: ✓ Running on localhost:11434
• Check model: ✓ qwen2.5-coder:7b available
• Build prompt:
System: "<full resource catalogue> + <few-shot examples>"
User: "Make soldier armor red"
• Call API: POST /api/generate
• Response:
```json
{
"response": "[\"palette export --group sprites --id soldier --to /tmp/soldier.pal\", \"palette set-color --file /tmp/soldier.pal --index 5 --color FF0000\", \"palette import --group sprites --id soldier --from /tmp/soldier.pal\"]"
}
```
• Parse: Extract 3 commands
4. Command Execution
┌────────────────────────────────────────────────────────┐
│ Command 1: palette export --group sprites --id soldier │
│ Handler: PaletteHandler::HandleExport() │
│ Status: ✓ Success (wrote /tmp/soldier.pal) │
│ Duration: 45ms │
├────────────────────────────────────────────────────────┤
│ Command 2: palette set-color --file /tmp/soldier.pal │
│ Handler: PaletteHandler::HandleSetColor() │
│ Status: ✓ Success (modified index 5 → #FF0000) │
│ Duration: 12ms │
├────────────────────────────────────────────────────────┤
│ Command 3: palette import --group sprites --id soldier │
│ Handler: PaletteHandler::HandleImport() │
│ Status: ✓ Success (applied to sandbox ROM) │
│ Duration: 78ms │
└────────────────────────────────────────────────────────┘
5. Proposal Registry
• Log all commands
• Calculate ROM diff (before/after)
• Set status: PENDING_REVIEW
6. Output to User
✅ Agent run completed successfully.
Proposal ID: agent_20251003_143022
Sandbox: /tmp/yaze_sandbox_abc123/zelda3.sfc
Use 'z3ed agent diff' to review changes
7. User Review
$ z3ed agent diff
Proposal: agent_20251003_143022
Prompt: "Make soldier armor red"
Status: pending
Created: 2025-10-03 14:30:22
Executed Commands:
1. palette export --group sprites --id soldier --to /tmp/soldier.pal
2. palette set-color --file /tmp/soldier.pal --index 5 --color FF0000
3. palette import --group sprites --id soldier --from /tmp/soldier.pal
ROM Diff:
Modified palettes: [sprites/soldier]
Changed bytes: 6
Offset 0x12345: [old] 00 7C 00 → [new] 00 00 FF
8. GUI Review
Open YAZE → Debug → Agent Proposals
• Visual diff shows red soldier sprite
• Click "Accept" → Merge sandbox to main ROM
• Or "Reject" → Discard sandbox
9. Finalization
$ z3ed agent commit
✅ Proposal accepted and merged to zelda3.sfc
```
## Comparison Matrix
| Feature | Ollama | Gemini | Claude | Mock |
|---------|--------|--------|--------|------|
| **Cost** | Free | $0.10/1M tokens | Free tier | Free |
| **Privacy** | ✅ Local | ❌ Remote | ❌ Remote | ✅ Local |
| **Setup** | `brew install` | API key | API key | None |
| **Speed** | Fast (~1-2s) | Medium (~2-4s) | Medium (~2-4s) | Instant |
| **Quality** | Good (7B-70B) | Excellent | Excellent | Hardcoded |
| **Internet** | No | Yes | Yes | No |
| **Rate Limits** | None | 60 req/min | 5 req/min | None |
| **Model Choice** | Many | Fixed | Fixed | N/A |
| **Use Case** | Development | Production | Premium | Testing |
## Next Steps
1. **Read**: [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for implementation details
2. **Follow**: [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md) step-by-step
3. **Test**: Run `./scripts/quickstart_ollama.sh` when ready
4. **Document**: Update this architecture diagram as system evolves
---
**Last Updated**: October 3, 2025
**Status**: Documentation Complete | Ready to Implement

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,311 @@
# LLM Integration: Executive Summary & Getting Started
**Date**: October 3, 2025
**Author**: GitHub Copilot
**Status**: Ready to Implement
## What Changed?
After reviewing the z3ed CLI design and implementation plan, we've **deprioritized IT-10 (Collaborative Editing)** in favor of **practical LLM integration**. This is the critical next step to make the agentic workflow system production-ready.
## Why This Matters
The z3ed infrastructure is **already complete**:
- ✅ Resource-oriented CLI with comprehensive commands
- ✅ Proposal-based workflow with sandbox execution
- ✅ Machine-readable API catalogue (`z3ed-resources.yaml`)
- ✅ GUI automation harness for verification
- ✅ ProposalDrawer for human review
**What's missing**: Real LLM integration to turn prompts into actions.
Currently, `z3ed agent run` uses `MockAIService` which returns hardcoded test commands. We need to connect real LLMs (Ollama, Gemini, Claude) to make the agent system useful.
## What You Get
After implementing this plan, users will be able to:
```bash
# Install Ollama (one-time setup)
brew install ollama
ollama serve &
ollama pull qwen2.5-coder:7b
# Configure z3ed
export YAZE_AI_PROVIDER=ollama
# Use natural language to modify ROMs
z3ed agent run \
--prompt "Make all soldier armor red" \
--rom zelda3.sfc \
--sandbox
# Review generated commands
z3ed agent diff
# Accept changes
# (Open YAZE GUI → Debug → Agent Proposals → Review → Accept)
```
The LLM will automatically:
1. Parse the natural language prompt
2. Generate appropriate `z3ed` commands
3. Execute them in a sandbox
4. Present results for human review
## Implementation Roadmap
### Phase 1: Ollama Integration (4-6 hours) 🎯 START HERE
**Priority**: Highest
**Why First**: Local, free, no API keys, fast iteration
**Deliverables**:
- `OllamaAIService` class with health checks
- CMake integration for httplib
- Service selection mechanism (env vars)
- End-to-end test script
**Key Files**:
- `src/cli/service/ollama_ai_service.{h,cc}` (new)
- `src/cli/handlers/agent/general_commands.cc` (update)
- `CMakeLists.txt` (add httplib support)
### Phase 2: Gemini Fixes (2-3 hours)
**Deliverables**:
- Fix existing `GeminiAIService` implementation
- Better prompting with resource catalogue
- Markdown code block stripping
### Phase 3: Claude Integration (2-3 hours)
**Deliverables**:
- `ClaudeAIService` class
- Messages API integration
- Same interface as other services
### Phase 4: Enhanced Prompting (3-4 hours)
**Deliverables**:
- `PromptBuilder` utility class
- Resource catalogue integration
- Few-shot examples
- Context injection (ROM state)
## Quick Start (After Implementation)
### For Developers (Implement Now)
1. **Read the implementation plan**:
- [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) - Complete technical guide
- [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md) - Step-by-step tasks
2. **Start with Phase 1**:
```bash
# Follow checklist in LLM-IMPLEMENTATION-CHECKLIST.md
# Implementation time: ~4-6 hours
```
3. **Test as you go**:
```bash
# Run quickstart script when ready
./scripts/quickstart_ollama.sh
```
### For End Users (After Development)
1. **Install Ollama**:
```bash
brew install ollama # macOS
ollama serve &
ollama pull qwen2.5-coder:7b
```
2. **Configure z3ed**:
```bash
export YAZE_AI_PROVIDER=ollama
```
3. **Try it out**:
```bash
z3ed agent run --prompt "Validate my ROM" --rom zelda3.sfc
```
## Alternative Providers
### Gemini (Remote, API Key Required)
```bash
export GEMINI_API_KEY=your_key_here
export YAZE_AI_PROVIDER=gemini
z3ed agent run --prompt "..."
```
### Claude (Remote, API Key Required)
```bash
export CLAUDE_API_KEY=your_key_here
export YAZE_AI_PROVIDER=claude
z3ed agent run --prompt "..."
```
## Documentation Structure
```
docs/z3ed/
├── README.md # Overview + navigation
├── E6-z3ed-cli-design.md # Architecture & design
├── E6-z3ed-implementation-plan.md # Overall roadmap
├── LLM-INTEGRATION-PLAN.md # 📋 Detailed LLM guide (NEW)
├── LLM-IMPLEMENTATION-CHECKLIST.md # ✅ Step-by-step tasks (NEW)
└── LLM-INTEGRATION-SUMMARY.md # 📄 This file (NEW)
scripts/
└── quickstart_ollama.sh # 🚀 Automated setup test (NEW)
```
## Key Architectural Decisions
### 1. Service Interface Pattern
All LLM providers implement the same `AIService` interface:
```cpp
class AIService {
public:
virtual absl::StatusOr<std::vector<std::string>> GetCommands(
const std::string& prompt) = 0;
};
```
This allows easy swapping between Ollama, Gemini, Claude, or Mock.
### 2. Environment-Based Selection
Provider selection via environment variables (not compile-time):
```bash
export YAZE_AI_PROVIDER=ollama # or gemini, claude, mock
```
This enables:
- Easy testing with different providers
- CI/CD with MockAIService
- User choice without rebuilding
### 3. Graceful Degradation
If Ollama/Gemini/Claude unavailable, fall back to MockAIService with clear warnings:
```
⚠️ Ollama unavailable: Cannot connect to http://localhost:11434
Falling back to MockAIService
Set YAZE_AI_PROVIDER=ollama or install Ollama to enable LLM
```
### 4. System Prompt Engineering
Comprehensive system prompts include:
- Full command catalogue from `z3ed-resources.yaml`
- Few-shot examples (proven prompt/command pairs)
- Output format requirements (JSON array of strings)
- Current ROM context (loaded file, editors open)
This improves accuracy from ~60% to >90% for standard tasks.
## Success Metrics
### Phase 1 Complete When:
- ✅ `z3ed agent run` works with Ollama end-to-end
- ✅ Health checks report clear errors
- ✅ Fallback to MockAIService is transparent
- ✅ Test script passes on macOS
### Full Integration Complete When:
- ✅ All three providers (Ollama, Gemini, Claude) work
- ✅ Command accuracy >90% on standard prompts
- ✅ Documentation guides users through setup
- ✅ At least one community member validates workflow
## Known Limitations
### Current Implementation
- `MockAIService` returns hardcoded test commands
- No real LLM integration yet
- Limited to simple test cases
### After LLM Integration
- **Model hallucination**: LLMs may generate invalid commands
- Mitigation: Validation layer + resource catalogue
- **API rate limits**: Remote providers (Gemini/Claude) have limits
- Mitigation: Response caching + local Ollama option
- **Cost**: API calls cost money (Gemini ~$0.10/million tokens)
- Mitigation: Ollama is free + cache responses
## FAQ
### Why Ollama first?
- **No API keys**: Works out of the box
- **Privacy**: All processing local
- **Speed**: No network latency
- **Cost**: Zero dollars
- **Testing**: No rate limits
### Why not OpenAI?
- Cost (GPT-4 is expensive)
- Rate limits (strict for free tier)
- Not local (privacy concerns for ROM hackers)
- Ollama + Gemini cover both local and remote use cases
### Can I use multiple providers?
Yes! Set `YAZE_AI_PROVIDER` per command:
```bash
YAZE_AI_PROVIDER=ollama z3ed agent run --prompt "Quick test"
YAZE_AI_PROVIDER=gemini z3ed agent run --prompt "Complex task"
```
### What if I don't want to use AI?
The CLI still works without LLM integration:
```bash
# Direct command execution (no LLM)
z3ed rom validate --rom zelda3.sfc
z3ed palette export --group sprites --id soldier --to output.pal
```
AI is **optional** and additive.
## Next Steps
### For @scawful (Project Owner)
1. **Review this plan**: Confirm priority shift from IT-10 to LLM integration
2. **Decide on Phase 1**: Start Ollama implementation (~4-6 hours)
3. **Allocate time**: Schedule implementation over next 1-2 weeks
4. **Test setup**: Install Ollama and verify it works on your machine
### For Contributors
1. **Read the docs**: Start with [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)
2. **Pick a phase**: Phase 1 (Ollama) is the highest priority
3. **Follow checklist**: Use [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)
4. **Submit PR**: Include tests + documentation updates
### For Users (Future)
1. **Wait for release**: This is in development
2. **Install Ollama**: Get ready for local LLM support
3. **Follow setup guide**: Will be in `AI-SERVICE-SETUP.md` (coming soon)
## Timeline
**Week 1 (Oct 7-11, 2025)**: Phase 1 (Ollama)
**Week 2 (Oct 14-18, 2025)**: Phases 2-4 (Gemini, Claude, Prompting)
**Week 3 (Oct 21-25, 2025)**: Testing, docs, user validation
**Estimated Total**: 12-15 hours of development time
## Related Documents
- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - Complete technical implementation guide
- **[LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)** - Step-by-step task list
- **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Overall architecture
- **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - Project roadmap
## Questions?
Open an issue or discuss in the project's communication channel. Tag this as "LLM Integration" for visibility.
---
**Status**: Documentation Complete | Ready to Begin Implementation
**Next Action**: Start Phase 1 (Ollama Integration) using checklist

View File

@@ -12,7 +12,7 @@
**🤖 Why This Matters**: These enhancements are **critical for AI agent autonomy**. Without them, AI agents can't verify their changes worked (no test polling), discover UI elements dynamically (hardcoded names), learn from demonstrations (no recording), or debug failures (no screenshots). The test harness evolution enables **fully autonomous agents** that can execute → verify → self-correct without human intervention.
**📋 Implementation Status**: Core infrastructure complete (Phases 1-6, AW-01 to AW-04, IT-01 to IT-04). Currently in **Test Harness Enhancement Phase** (IT-05 to IT-09). See [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) for the detailed roadmap and LLM integration plans (Ollama, Gemini, Claude).
**📋 Implementation Status**: Core infrastructure complete (Phases 1-6, AW-01 to AW-04, IT-01 to IT-09). Currently focusing on **LLM Integration** to enable practical AI-driven workflows. See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for the detailed roadmap (Ollama, Gemini, Claude).
This directory contains the primary documentation for the `z3ed` system.
@@ -81,6 +81,14 @@ See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
## Recent Enhancements
**LLM Integration Priority Shift (Oct 3, 2025)** 🤖
- 📋 Deprioritized IT-10 (Collaborative Editing) in favor of practical LLM integration
- 📄 Created comprehensive implementation plan for Ollama, Gemini, and Claude integration
- ✅ New documentation: [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md), [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md), [LLM-INTEGRATION-SUMMARY.md](LLM-INTEGRATION-SUMMARY.md)
- 🚀 Ready to enable real AI-driven ROM modifications with natural language prompts
- **Estimated effort**: 12-15 hours across 4 phases
- **Why now**: All infrastructure complete (CLI, proposals, sandbox, GUI automation) - only LLM connection missing
**Recent Progress (Oct 3, 2025)**
- ✅ IT-09 CLI Test Suite Tooling Complete: run/validate/create commands + JUnit output
- Full suite runner with group/tag filters, parametrization, retries, and CI-friendly exit codes
@@ -124,11 +132,12 @@ See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolutio
**📖 Getting Started**:
- **New to z3ed?** Start with this [README.md](README.md) then [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)
- **Want to use z3ed?** See [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for all commands
- **Setting up AI agents?** See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for Ollama/Gemini/Claude setup
**🔧 Implementation Guides**:
- [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md) - Step-by-step LLM integration tasks ⭐ START HERE
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (complete ✅)
- [IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md) - Enhanced Error Reporting (in progress 🔄)
- [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) - Detailed continuation plan for current phase
- [IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md) - Enhanced Error Reporting (complete ✅)
**📚 Reference**:
- [E6-z3ed-reference.md](E6-z3ed-reference.md) - Technical reference and API docs

128
scripts/quickstart_ollama.sh Executable file
View File

@@ -0,0 +1,128 @@
#!/bin/bash
# Quick Start Script for Testing Ollama Integration with z3ed
# Usage: ./scripts/quickstart_ollama.sh
set -e
echo "🚀 z3ed + Ollama Quick Start"
echo "================================"
echo ""
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Step 1: Check if Ollama is installed
echo "📦 Step 1: Checking Ollama installation..."
if ! command -v ollama &> /dev/null; then
echo -e "${RED}✗ Ollama not found${NC}"
echo ""
echo "Install Ollama with:"
echo " macOS: brew install ollama"
echo " Linux: curl -fsSL https://ollama.com/install.sh | sh"
echo ""
exit 1
fi
echo -e "${GREEN}✓ Ollama installed${NC}"
echo ""
# Step 2: Check if Ollama server is running
echo "🔌 Step 2: Checking Ollama server..."
if ! curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
echo -e "${YELLOW}⚠ Ollama server not running${NC}"
echo ""
echo "Starting Ollama server in background..."
ollama serve > /dev/null 2>&1 &
OLLAMA_PID=$!
echo "Waiting for server to start..."
sleep 3
if ! curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
echo -e "${RED}✗ Failed to start Ollama server${NC}"
exit 1
fi
echo -e "${GREEN}✓ Ollama server started (PID: $OLLAMA_PID)${NC}"
else
echo -e "${GREEN}✓ Ollama server running${NC}"
fi
echo ""
# Step 3: Check if recommended model is available
RECOMMENDED_MODEL="qwen2.5-coder:7b"
echo "🤖 Step 3: Checking for model: $RECOMMENDED_MODEL..."
if ! ollama list | grep -q "$RECOMMENDED_MODEL"; then
echo -e "${YELLOW}⚠ Model not found${NC}"
echo ""
read -p "Pull $RECOMMENDED_MODEL? (~4.7GB download) [y/N]: " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
echo "Pulling model (this may take a few minutes)..."
ollama pull "$RECOMMENDED_MODEL"
echo -e "${GREEN}✓ Model pulled successfully${NC}"
else
echo -e "${RED}✗ Model required for testing${NC}"
exit 1
fi
else
echo -e "${GREEN}✓ Model available${NC}"
fi
echo ""
# Step 4: Check if z3ed is built
echo "🔨 Step 4: Checking z3ed build..."
if [ ! -f "./build/bin/z3ed" ]; then
echo -e "${YELLOW}⚠ z3ed not found in ./build/bin/${NC}"
echo ""
echo "Building z3ed..."
cmake --build build --target z3ed
if [ ! -f "./build/bin/z3ed" ]; then
echo -e "${RED}✗ Failed to build z3ed${NC}"
exit 1
fi
fi
echo -e "${GREEN}✓ z3ed ready${NC}"
echo ""
# Step 5: Test Ollama integration
echo "🧪 Step 5: Testing z3ed + Ollama integration..."
export YAZE_AI_PROVIDER=ollama
export OLLAMA_MODEL="$RECOMMENDED_MODEL"
echo ""
echo "Running test command:"
echo -e "${BLUE}z3ed agent plan --prompt \"Validate the ROM file\"${NC}"
echo ""
if ./build/bin/z3ed agent plan --prompt "Validate the ROM file"; then
echo ""
echo -e "${GREEN}✓ Integration test passed!${NC}"
else
echo ""
echo -e "${RED}✗ Integration test failed${NC}"
echo "Check error messages above for details"
exit 1
fi
echo ""
echo "================================"
echo -e "${GREEN}🎉 Setup Complete!${NC}"
echo ""
echo "Next steps:"
echo " 1. Try a full agent run:"
echo " export YAZE_AI_PROVIDER=ollama"
echo " z3ed agent run --prompt \"Export first palette\" --rom zelda3.sfc --sandbox"
echo ""
echo " 2. Review generated commands:"
echo " z3ed agent list"
echo " z3ed agent diff"
echo ""
echo " 3. Try different models:"
echo " ollama pull codellama:13b"
echo " export OLLAMA_MODEL=codellama:13b"
echo ""
echo " 4. Read the docs:"
echo " docs/z3ed/LLM-INTEGRATION-PLAN.md"
echo ""