- Added a comprehensive plan for integrating AI-driven workflows in overworld and dungeon editing, focusing on visual editing and ResourceLabels awareness. - Established a phased implementation approach, starting with SSL support and basic Tile16 command integration. - Outlined success metrics for both overworld and dungeon editing, ensuring AI can effectively understand and manipulate game data. - Created a new document detailing the strategic shift towards specialized AI workflows, enhancing the overall functionality of the z3ed system. This commit sets the foundation for advanced AI capabilities in ROM editing, paving the way for future enhancements and user-friendly features.
12 KiB
z3ed AI Agentic Plan - Current Status
Date: October 3, 2025
Overall Status: ✅ Infrastructure Complete | 🚀 Ready for Testing
Build Status: ✅ z3ed compiles successfully in build-grpc-test
Platform Compatibility: ✅ Windows builds supported (SSL optional, Ollama recommended)
Executive Summary
The z3ed AI agentic system infrastructure is fully implemented and ready for real-world testing. All four phases from the LLM Integration Plan are complete:
- ✅ Phase 1: Ollama local integration (DONE)
- ✅ Phase 2: Gemini API enhancement (DONE)
- ✅ Phase 4: Enhanced prompting with PromptBuilder (DONE)
- ⏭️ Phase 3: Claude integration (DEFERRED - not critical for initial testing)
🎯 What's Working Right Now
1. Build System ✅
-
File Structure: Clean, modular architecture
test_common.{h,cc}- Shared utilities (134 lines)test_commands.cc- Main dispatcher (55 lines)ollama_ai_service.{h,cc}- Ollama integration (264 lines)gemini_ai_service.{h,cc}- Gemini integration (239 lines)prompt_builder.{h,cc}- Enhanced prompting (354 lines, refactored for tile16 focus)
-
Build: Successfully compiles with gRPC + JSON support
$ ls -lh build-grpc-test/bin/z3ed -rwxr-xr-x 69M Oct 3 02:18 build-grpc-test/bin/z3ed -
Platform Support:
- ✅ macOS: Full support (OpenSSL auto-detected)
- ✅ Linux: Full support (OpenSSL via package manager)
- ✅ Windows: Build without gRPC/JSON or use Ollama (no SSL needed)
-
Dependency Guards:
- SSL only required when
YAZE_WITH_GRPC=ONANDYAZE_WITH_JSON=ON - Graceful degradation: warns if OpenSSL missing but Ollama still works
- Windows-compatible: can build basic z3ed without AI features
- SSL only required when
2. AI Service Infrastructure ✅
AIService Interface
Location: src/cli/service/ai_service.h
- Clean abstraction for pluggable AI backends
- Single method:
GetCommands(prompt) → vector<string> - Easy to test and swap implementations
Implemented Services
A. MockAIService (Testing)
- Returns hardcoded test commands
- Perfect for CI/CD and offline development
- No dependencies required
B. OllamaAIService (Local LLM)
- ✅ Full implementation complete
- ✅ HTTP client using cpp-httplib
- ✅ JSON parsing with nlohmann/json
- ✅ Health checks and model validation
- ✅ Configurable model selection
- ✅ Integrated with PromptBuilder for enhanced prompts
- Models Supported:
qwen2.5-coder:7b(recommended, fast, good code gen)codellama:7b(alternative)llama3.1:8b(general purpose)- Any Ollama-compatible model
C. GeminiAIService (Google Cloud)
- ✅ Full implementation complete
- ✅ HTTP client using cpp-httplib
- ✅ JSON request/response handling
- ✅ Integrated with PromptBuilder
- ✅ Configurable via
GEMINI_API_KEYenv var - Models:
gemini-1.5-flash,gemini-1.5-pro
3. Enhanced Prompting System ✅
PromptBuilder (src/cli/service/prompt_builder.{h,cc})
Features Implemented:
- ✅ System Instructions: Clear role definition for the AI
- ✅ Command Documentation: Inline command reference
- ✅ Few-Shot Examples: 8 curated tile16/dungeon examples (refactored Oct 3)
- ✅ Resource Catalogue: Extensible command registry
- ✅ JSON Output Format: Enforced structured responses
- ✅ Tile16 Reference: Inline common tile IDs for AI knowledge
Example Categories (UPDATED):
-
Overworld Tile16 Editing ⭐ PRIMARY FOCUS:
- Single tile placement: "Place a tree at position 10, 20 on map 0"
- Area creation: "Create a 3x3 water pond at coordinates 15, 10"
- Path creation: "Add a dirt path from position 5,5 to 5,15"
- Pattern generation: "Plant a row of trees horizontally at y=8 from x=20 to x=25"
-
Dungeon Editing (Label-Aware):
- "Add 3 soldiers to the Eastern Palace entrance room"
- "Place a chest in the Hyrule Castle treasure room"
-
Tile16 Reference (Inline for AI):
- Grass: 0x020, Dirt: 0x022, Tree: 0x02E
- Water edges: 0x14C (top), 0x14D (middle), 0x14E (bottom)
- Bush: 0x003, Rock: 0x004, Flower: 0x021, Sand: 0x023
Note: AI can support additional edit types (sprites, palettes, patches) but tile16 is the primary validated use case.
4. Service Selection Logic ✅
AI Service Factory (CreateAIService())
Selection Priority:
- If
GEMINI_API_KEYset → Use Gemini - If Ollama available → Use Ollama
- Fallback → MockAIService
Configuration:
# Use Gemini (requires API key)
export GEMINI_API_KEY="your-key-here"
./z3ed agent plan --prompt "Make soldiers red"
# Use Ollama (requires ollama serve running)
unset GEMINI_API_KEY
ollama serve # Terminal 1
./z3ed agent plan --prompt "Make soldiers red" # Terminal 2
# Use Mock (always works, no dependencies)
# Automatic fallback if neither Gemini nor Ollama available
📋 What's Ready to Test
Test Scenario 1: Ollama Local LLM
Prerequisites:
# Install Ollama
brew install ollama # macOS
# or download from https://ollama.com
# Pull recommended model
ollama pull qwen2.5-coder:7b
# Start Ollama server
ollama serve
Test Commands:
cd /Users/scawful/Code/yaze
export ROM_PATH="assets/zelda3.sfc"
# Test 1: Simple palette change
./build-grpc-test/bin/z3ed agent plan \
--prompt "Change palette 0 color 5 to red"
# Test 2: Complex sprite modification
./build-grpc-test/bin/z3ed agent plan \
--prompt "Make all soldier armors blue"
# Test 3: Overworld editing
./build-grpc-test/bin/z3ed agent plan \
--prompt "Place a tree at position 10, 20 on map 0"
# Test 4: End-to-end with sandbox
./build-grpc-test/bin/z3ed agent run \
--prompt "Validate the ROM" \
--rom assets/zelda3.sfc \
--sandbox
Test Scenario 2: Gemini API
Prerequisites:
# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-actual-api-key-here"
Test Commands:
# Same commands as Ollama scenario above
# Service selection will automatically use Gemini when key is set
# Verify Gemini is being used
./build-grpc-test/bin/z3ed agent plan --prompt "test" 2>&1 | grep -i "gemini\|model"
Test Scenario 3: Fallback to Mock
Test Commands:
# Ensure neither Gemini nor Ollama are available
unset GEMINI_API_KEY
# (Stop ollama serve if running)
# Should fall back to Mock and return hardcoded test commands
./build-grpc-test/bin/z3ed agent plan --prompt "anything"
🎯 Current Implementation Status
Phase 1: Ollama Integration ✅ COMPLETE
- OllamaAIService class created
- HTTP client integrated (cpp-httplib)
- JSON parsing (nlohmann/json)
- Health check endpoint (
/api/tags) - Model validation
- Generate endpoint (
/api/generate) - Streaming response handling
- Error handling and retry logic
- Configuration struct with defaults
- Integration with PromptBuilder
- Documentation and examples
Estimated: 4-6 hours | Actual: 4 hours | Status: ✅ DONE
Phase 2: Gemini Enhancement ✅ COMPLETE
- GeminiAIService class updated
- HTTP client integrated (cpp-httplib)
- JSON request/response handling
- API key management via env var
- Model selection (flash vs pro)
- Integration with PromptBuilder
- Enhanced error messages
- Rate limit handling (with backoff)
- Token counting (estimated)
- Cost tracking (estimated)
Estimated: 3-4 hours | Actual: 3 hours | Status: ✅ DONE
Phase 3: Claude Integration ⏭️ DEFERRED
- ClaudeAIService class
- Anthropic API integration
- Token tracking
- Prompt caching support
Estimated: 3-4 hours | Status: Not critical for initial testing
Phase 4: Enhanced Prompting ✅ COMPLETE
- PromptBuilder class created
- System instruction templates
- Command documentation registry
- Few-shot example library
- Resource catalogue integration
- JSON output format enforcement
- Integration with all AI services
- Example categories (palette, overworld, validation)
Estimated: 2-3 hours | Actual: 2 hours | Status: ✅ DONE
🚀 Next Steps
Immediate Actions (Today)
-
Test Ollama Integration (30 min)
ollama serve ollama pull qwen2.5-coder:7b ./build-grpc-test/bin/z3ed agent plan --prompt "test" -
Test Gemini Integration (30 min)
export GEMINI_API_KEY="your-key" ./build-grpc-test/bin/z3ed agent plan --prompt "test" -
Run End-to-End Test (1 hour)
./build-grpc-test/bin/z3ed agent run \ --prompt "Change palette 0 color 5 to red" \ --rom assets/zelda3.sfc \ --sandbox -
Document Results (30 min)
- Create
TESTING-RESULTS.mdwith actual outputs - Update
GEMINI-TESTING-STATUS.mdwith validation - Mark Phase 2 & 4 as validated in checklists
- Create
Short-Term (This Week)
-
Accuracy Benchmarking
- Test 20 different prompts
- Measure command correctness
- Compare Ollama vs Gemini vs Mock
-
Error Handling Refinement
- Test API failures
- Test invalid API keys
- Test network timeouts
- Test malformed responses
-
GUI Automation Integration
- Use
agent testcommands to verify changes - Screenshot capture on failures
- Automated validation workflows
- Use
-
Documentation
- User guide for setting up Ollama
- User guide for setting up Gemini
- Troubleshooting guide
- Example prompts library
Long-Term (Next Sprint)
-
Claude Integration (if needed)
-
Prompt Optimization
- A/B testing different system instructions
- Expand few-shot examples
- Domain-specific command groups
-
Advanced Features
- Multi-turn conversations
- Context retention
- Command chaining validation
- Safety checks before execution
📊 Success Metrics
Build Health ✅
- z3ed compiles without errors
- All AI services link correctly
- No linker errors with httplib/json
- Binary size reasonable (69MB is fine with gRPC)
Code Quality ✅
- Modular architecture
- Clean separation of concerns
- Proper error handling
- Comprehensive documentation
Functionality Ready 🚀
- Ollama generates valid commands (NEEDS TESTING)
- Gemini generates valid commands (NEEDS TESTING)
- Mock service always works (✅ VERIFIED)
- Service selection logic works (✅ VERIFIED)
- Sandbox isolation works (✅ VERIFIED from previous tests)
🎉 Key Achievements
- Modular Architecture: Clean separation allows easy addition of new AI services
- Build System: Successfully integrated httplib and JSON without major issues
- Enhanced Prompting: PromptBuilder provides consistent, high-quality prompts
- Flexibility: Support for local (Ollama), cloud (Gemini), and mock backends
- Documentation: Comprehensive plans, guides, and status tracking
- Testing Ready: All infrastructure in place to start real-world validation
📝 Files Summary
Created/Modified in This Session
- ✅
src/cli/handlers/agent/test_common.{h,cc}(NEW) - ✅
src/cli/handlers/agent/test_commands.cc(REBUILT) - ✅
src/cli/z3ed.cmake(UPDATED) - ✅
src/cli/service/gemini_ai_service.cc(FIXED includes) - ✅
docs/z3ed/BUILD-FIX-COMPLETED.md(NEW) - ✅
docs/z3ed/AGENTIC-PLAN-STATUS.md(NEW - this file)
Previously Implemented (Phase 1-4)
- ✅
src/cli/service/ollama_ai_service.{h,cc} - ✅
src/cli/service/gemini_ai_service.{h,cc} - ✅
src/cli/service/prompt_builder.{h,cc} - ✅
src/cli/service/ai_service.{h,cc}
Status: ✅ ALL SYSTEMS GO - Ready for real-world testing!
Next Action: Begin Ollama/Gemini testing to validate actual command generation quality