z3ed AI Agentic Plan - Current Status

Date: October 3, 2025
Overall Status: Infrastructure Complete | 🚀 Ready for Testing
Build Status: z3ed compiles successfully in build-grpc-test
Platform Compatibility: Windows builds supported (SSL optional, Ollama recommended)

Executive Summary

The z3ed AI agentic system infrastructure is fully implemented and ready for real-world testing. Three of the four phases from the LLM Integration Plan are complete, with Claude integration deferred:

  • Phase 1: Ollama local integration (DONE)
  • Phase 2: Gemini API enhancement (DONE)
  • Phase 4: Enhanced prompting with PromptBuilder (DONE)
  • ⏭️ Phase 3: Claude integration (DEFERRED - not critical for initial testing)

🎯 What's Working Right Now

1. Build System

  • File Structure: Clean, modular architecture

    • test_common.{h,cc} - Shared utilities (134 lines)
    • test_commands.cc - Main dispatcher (55 lines)
    • ollama_ai_service.{h,cc} - Ollama integration (264 lines)
    • gemini_ai_service.{h,cc} - Gemini integration (239 lines)
    • prompt_builder.{h,cc} - Enhanced prompting (354 lines, refactored for tile16 focus)
  • Build: Successfully compiles with gRPC + JSON support

    $ ls -lh build-grpc-test/bin/z3ed
    -rwxr-xr-x  69M Oct  3 02:18 build-grpc-test/bin/z3ed
    
  • Platform Support:

    • macOS: Full support (OpenSSL auto-detected)
    • Linux: Full support (OpenSSL via package manager)
    • Windows: Build without gRPC/JSON or use Ollama (no SSL needed)
  • Dependency Guards:

    • SSL only required when YAZE_WITH_GRPC=ON AND YAZE_WITH_JSON=ON
    • Graceful degradation: warns if OpenSSL missing but Ollama still works
    • Windows-compatible: can build basic z3ed without AI features

2. AI Service Infrastructure

AIService Interface

Location: src/cli/service/ai_service.h

  • Clean abstraction for pluggable AI backends
  • Single method: GetCommands(prompt) → vector<string>
  • Easy to test and swap implementations
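
For orientation, a minimal sketch of what this abstraction looks like, based on the signature above (the exact declaration in ai_service.h may differ, and the example subclass name is illustrative):

#include <string>
#include <vector>

// Sketch of the abstraction; the exact declaration in ai_service.h may differ.
class AIService {
 public:
  virtual ~AIService() = default;
  // Turn a natural-language prompt into a list of z3ed CLI commands.
  virtual std::vector<std::string> GetCommands(const std::string& prompt) = 0;
};

// Any backend (Mock, Ollama, Gemini) just implements GetCommands(); e.g. a
// fixed-output service for offline testing, in the spirit of MockAIService:
class FixedCommandService : public AIService {
 public:
  std::vector<std::string> GetCommands(const std::string& prompt) override {
    return {"<hardcoded test command>"};  // stand-in, not real z3ed syntax
  }
};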

Implemented Services

A. MockAIService (Testing)

  • Returns hardcoded test commands
  • Perfect for CI/CD and offline development
  • No dependencies required

B. OllamaAIService (Local LLM)

  • Full implementation complete
  • HTTP client using cpp-httplib
  • JSON parsing with nlohmann/json
  • Health checks and model validation
  • Configurable model selection
  • Integrated with PromptBuilder for enhanced prompts
  • Models Supported:
    • qwen2.5-coder:7b (recommended, fast, good code gen)
    • codellama:7b (alternative)
    • llama3.1:8b (general purpose)
    • Any Ollama-compatible model
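
For reference, the core request against Ollama's /api/generate endpoint looks roughly like this. This is a sketch using cpp-httplib and nlohmann/json, not the exact code in ollama_ai_service.cc; health checks, retries, and PromptBuilder plumbing are trimmed:

#include <string>

#include <httplib.h>
#include <nlohmann/json.hpp>

// Sketch: request a completion from a local Ollama server.
std::string GenerateWithOllama(const std::string& prompt) {
  httplib::Client client("http://localhost:11434");

  nlohmann::json request = {{"model", "qwen2.5-coder:7b"},
                            {"prompt", prompt},
                            {"stream", false}};  // one JSON reply, no stream

  auto res = client.Post("/api/generate", request.dump(), "application/json");
  if (!res || res->status != 200) return "";

  // Ollama puts the completion text in the "response" field.
  return nlohmann::json::parse(res->body).value("response", "");
}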

C. GeminiAIService (Google Cloud)

  • Full implementation complete
  • HTTP client using cpp-httplib
  • JSON request/response handling
  • Integrated with PromptBuilder
  • Configurable via GEMINI_API_KEY env var
  • Models: gemini-1.5-flash, gemini-1.5-pro
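
The Gemini call goes through the public generateContent REST endpoint. A rough sketch (the real service adds PromptBuilder output, rate-limit backoff, and richer error reporting):

#include <cstdlib>
#include <string>

#include <httplib.h>
#include <nlohmann/json.hpp>

// Sketch: one-shot call to Gemini's generateContent endpoint. Requires
// cpp-httplib built with OpenSSL support because the API is HTTPS-only.
std::string GenerateWithGemini(const std::string& prompt) {
  const char* key = std::getenv("GEMINI_API_KEY");
  if (key == nullptr) return "";

  nlohmann::json part;
  part["text"] = prompt;
  nlohmann::json content;
  content["parts"] = nlohmann::json::array({part});
  nlohmann::json request;
  request["contents"] = nlohmann::json::array({content});

  httplib::Client client("https://generativelanguage.googleapis.com");
  std::string path =
      "/v1beta/models/gemini-1.5-flash:generateContent?key=" + std::string(key);
  auto res = client.Post(path.c_str(), request.dump(), "application/json");
  if (!res || res->status != 200) return "";

  // The generated text sits at candidates[0].content.parts[0].text.
  // Error handling for malformed replies is trimmed in this sketch.
  auto body = nlohmann::json::parse(res->body);
  return body["candidates"][0]["content"]["parts"][0]["text"].get<std::string>();
}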

3. Enhanced Prompting System

PromptBuilder (src/cli/service/prompt_builder.{h,cc})

Features Implemented:

  • System Instructions: Clear role definition for the AI
  • Command Documentation: Inline command reference
  • Few-Shot Examples: 8 curated tile16/dungeon examples (refactored Oct 3)
  • Resource Catalogue: Extensible command registry
  • JSON Output Format: Enforced structured responses
  • Tile16 Reference: Inline common tile IDs for AI knowledge
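
Conceptually the builder concatenates these pieces into a single prompt string. The snippet below only illustrates that composition with a free-standing helper; it is not the real PromptBuilder API:

#include <string>
#include <vector>

// Illustration of the composition idea; the real PromptBuilder differs in API.
std::string ComposePrompt(const std::string& user_request,
                          const std::vector<std::string>& few_shot_examples) {
  std::string prompt =
      "You are an assistant that edits Zelda 3 ROMs by emitting z3ed commands.\n"
      "Reply with JSON only, following the documented output schema.\n\n";
  for (const auto& example : few_shot_examples) {
    prompt += example + "\n\n";  // curated request/command pairs
  }
  prompt += "Request: " + user_request + "\n";
  return prompt;
}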

Example Categories (UPDATED):

  1. Overworld Tile16 Editing (PRIMARY FOCUS):

    • Single tile placement: "Place a tree at position 10, 20 on map 0"
    • Area creation: "Create a 3x3 water pond at coordinates 15, 10"
    • Path creation: "Add a dirt path from position 5,5 to 5,15"
    • Pattern generation: "Plant a row of trees horizontally at y=8 from x=20 to x=25"
  2. Dungeon Editing (Label-Aware):

    • "Add 3 soldiers to the Eastern Palace entrance room"
    • "Place a chest in the Hyrule Castle treasure room"
  3. Tile16 Reference (Inline for AI):

    • Grass: 0x020, Dirt: 0x022, Tree: 0x02E
    • Water edges: 0x14C (top), 0x14D (middle), 0x14E (bottom)
    • Bush: 0x003, Rock: 0x004, Flower: 0x021, Sand: 0x023

Note: AI can support additional edit types (sprites, palettes, patches) but tile16 is the primary validated use case.
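
The enforced output schema is not reproduced in this document, so the field name below is an assumption (a flat {"commands": [...]} object). Parsing such a reply might look like this:

#include <string>
#include <vector>

#include <nlohmann/json.hpp>

// Assumed reply shape: {"commands": ["...", "..."]}. Adjust the field name to
// whatever schema PromptBuilder actually enforces.
std::vector<std::string> ParseCommandReply(const std::string& model_output) {
  std::vector<std::string> commands;
  auto reply = nlohmann::json::parse(model_output, nullptr,
                                     /*allow_exceptions=*/false);
  if (reply.is_discarded() || !reply.contains("commands")) return commands;
  for (const auto& command : reply["commands"]) {
    commands.push_back(command.get<std::string>());
  }
  return commands;
}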

4. Service Selection Logic

AI Service Factory (CreateAIService())

Selection Priority:

  1. If GEMINI_API_KEY set → Use Gemini
  2. If Ollama available → Use Ollama
  3. Fallback → MockAIService
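
A sketch of that priority order follows. OllamaIsReachable() is a hypothetical helper (e.g. a GET against Ollama's /api/tags), and the service constructors may take configuration in the real factory:

#include <cstdlib>
#include <memory>

// Sketch of the selection priority; details of the real CreateAIService() may differ.
std::unique_ptr<AIService> CreateAIService() {
  if (std::getenv("GEMINI_API_KEY") != nullptr) {
    return std::make_unique<GeminiAIService>();  // 1. cloud backend
  }
  if (OllamaIsReachable()) {                     // hypothetical availability probe
    return std::make_unique<OllamaAIService>();  // 2. local backend
  }
  return std::make_unique<MockAIService>();      // 3. always-available fallback
}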

Configuration:

# Use Gemini (requires API key)
export GEMINI_API_KEY="your-key-here"
./z3ed agent plan --prompt "Make soldiers red"

# Use Ollama (requires ollama serve running)
unset GEMINI_API_KEY
ollama serve  # Terminal 1
./z3ed agent plan --prompt "Make soldiers red"  # Terminal 2

# Use Mock (always works, no dependencies)
# Automatic fallback if neither Gemini nor Ollama available

📋 What's Ready to Test

Test Scenario 1: Ollama Local LLM

Prerequisites:

# Install Ollama
brew install ollama  # macOS
# or download from https://ollama.com

# Pull recommended model
ollama pull qwen2.5-coder:7b

# Start Ollama server
ollama serve

Test Commands:

cd /Users/scawful/Code/yaze
export ROM_PATH="assets/zelda3.sfc"

# Test 1: Simple palette change
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Change palette 0 color 5 to red"

# Test 2: Complex sprite modification
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Make all soldier armors blue"

# Test 3: Overworld editing
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Place a tree at position 10, 20 on map 0"

# Test 4: End-to-end with sandbox
./build-grpc-test/bin/z3ed agent run \
  --prompt "Validate the ROM" \
  --rom assets/zelda3.sfc \
  --sandbox

Test Scenario 2: Gemini API

Prerequisites:

# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-actual-api-key-here"

Test Commands:

# Same commands as Ollama scenario above
# Service selection will automatically use Gemini when key is set

# Verify Gemini is being used
./build-grpc-test/bin/z3ed agent plan --prompt "test" 2>&1 | grep -i "gemini\|model"

Test Scenario 3: Fallback to Mock

Test Commands:

# Ensure neither Gemini nor Ollama are available
unset GEMINI_API_KEY
# (Stop ollama serve if running)

# Should fall back to Mock and return hardcoded test commands
./build-grpc-test/bin/z3ed agent plan --prompt "anything"

🎯 Current Implementation Status

Phase 1: Ollama Integration COMPLETE

  • OllamaAIService class created
  • HTTP client integrated (cpp-httplib)
  • JSON parsing (nlohmann/json)
  • Health check endpoint (/api/tags)
  • Model validation
  • Generate endpoint (/api/generate)
  • Streaming response handling
  • Error handling and retry logic
  • Configuration struct with defaults
  • Integration with PromptBuilder
  • Documentation and examples

Estimated: 4-6 hours | Actual: 4 hours | Status: DONE

Phase 2: Gemini Enhancement COMPLETE

  • GeminiAIService class updated
  • HTTP client integrated (cpp-httplib)
  • JSON request/response handling
  • API key management via env var
  • Model selection (flash vs pro)
  • Integration with PromptBuilder
  • Enhanced error messages
  • Rate limit handling (with backoff)
  • Token counting (estimated)
  • Cost tracking (estimated)

Estimated: 3-4 hours | Actual: 3 hours | Status: DONE

Phase 3: Claude Integration ⏭️ DEFERRED

  • ClaudeAIService class
  • Anthropic API integration
  • Token tracking
  • Prompt caching support

Estimated: 3-4 hours | Status: Not critical for initial testing

Phase 4: Enhanced Prompting COMPLETE

  • PromptBuilder class created
  • System instruction templates
  • Command documentation registry
  • Few-shot example library
  • Resource catalogue integration
  • JSON output format enforcement
  • Integration with all AI services
  • Example categories (palette, overworld, validation)

Estimated: 2-3 hours | Actual: 2 hours | Status: DONE

🚀 Next Steps

Immediate Actions (Next Session)

  1. Integrate Tile16ProposalGenerator into Agent Commands (2 hours; see the sketch after this list)

    • Modify HandlePlanCommand() to use the generator
    • Modify HandleRunCommand() to apply proposals
    • Add HandleAcceptCommand() for accepting proposals
  2. Integrate ResourceContextBuilder into PromptBuilder (1 hour)

    • Update BuildContextualPrompt() to inject labels
    • Test with actual labels file from user project
  3. Test End-to-End Workflow (1 hour)

    ollama serve
    ./build-grpc-test/bin/z3ed agent plan \
      --prompt "Create a 3x3 water pond at 15, 10"
    
    # Verify proposal generation
    # Verify tile16 changes are correct
    
  4. Add Visual Diff Implementation (2-3 hours)

    • Render tile16 bitmaps from overworld
    • Create side-by-side comparison images
    • Highlight changed tiles
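
As a starting point for item 1 above, the wiring could take roughly this shape. Apart from the class and handler names already mentioned in this document, everything here (the Rom type, the absl::Status plumbing, GenerateFromCommands(), PrintSummary()) is a hypothetical placeholder, not existing code:

// Hypothetical sketch for wiring the proposal generator into `agent plan`.
absl::Status HandlePlanCommand(const std::string& prompt, Rom& rom) {
  auto ai_service = CreateAIService();
  std::vector<std::string> commands = ai_service->GetCommands(prompt);

  Tile16ProposalGenerator generator;
  auto proposal = generator.GenerateFromCommands(rom, commands);  // hypothetical
  if (!proposal.ok()) return proposal.status();

  proposal->PrintSummary();  // `agent accept` would later apply the proposal
  return absl::OkStatus();
}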

Short-Term (This Week)

  1. Accuracy Benchmarking

    • Test 20 different prompts
    • Measure command correctness
    • Compare Ollama vs Gemini vs Mock
  2. Error Handling Refinement

    • Test API failures
    • Test invalid API keys
    • Test network timeouts
    • Test malformed responses
  3. GUI Automation Integration

    • Use agent test commands to verify changes
    • Screenshot capture on failures
    • Automated validation workflows
  4. Documentation

    • User guide for setting up Ollama
    • User guide for setting up Gemini
    • Troubleshooting guide
    • Example prompts library

Long-Term (Next Sprint)

  1. Claude Integration (if needed)

  2. Prompt Optimization

    • A/B testing different system instructions
    • Expand few-shot examples
    • Domain-specific command groups
  3. Advanced Features

    • Multi-turn conversations
    • Context retention
    • Command chaining validation
    • Safety checks before execution

📊 Success Metrics

Build Health

  • z3ed compiles without errors
  • All AI services link correctly
  • No linker errors with httplib/json
  • Binary size reasonable (69MB is fine with gRPC)

Code Quality

  • Modular architecture
  • Clean separation of concerns
  • Proper error handling
  • Comprehensive documentation

Functionality Ready 🚀

  • Ollama generates valid commands (NEEDS TESTING)
  • Gemini generates valid commands (NEEDS TESTING)
  • Mock service always works (VERIFIED)
  • Service selection logic works (VERIFIED)
  • Sandbox isolation works (VERIFIED from previous tests)

🎉 Key Achievements

  1. Modular Architecture: Clean separation allows easy addition of new AI services
  2. Build System: Successfully integrated httplib and JSON without major issues
  3. Enhanced Prompting: PromptBuilder provides consistent, high-quality prompts
  4. Flexibility: Support for local (Ollama), cloud (Gemini), and mock backends
  5. Documentation: Comprehensive plans, guides, and status tracking
  6. Testing Ready: All infrastructure in place to start real-world validation

📝 Files Summary

Created/Modified Recently

  • src/cli/handlers/agent/test_common.{h,cc} (NEW)
  • src/cli/handlers/agent/test_commands.cc (REBUILT)
  • src/cli/z3ed.cmake (UPDATED)
  • src/cli/service/gemini_ai_service.cc (FIXED includes)
  • src/cli/service/tile16_proposal_generator.{h,cc} (NEW - Oct 3)
  • src/cli/service/resource_context_builder.{h,cc} (NEW - Oct 3)
  • src/app/zelda3/overworld/overworld.h (UPDATED - SetTile method)
  • src/cli/handlers/overworld.cc (UPDATED - SetTile implementation)
  • docs/z3ed/IMPLEMENTATION-SESSION-OCT3-CONTINUED.md (NEW)
  • docs/z3ed/AGENTIC-PLAN-STATUS.md (UPDATED - this file)

Previously Implemented (Phase 1-4)

  • src/cli/service/ollama_ai_service.{h,cc}
  • src/cli/service/gemini_ai_service.{h,cc}
  • src/cli/service/prompt_builder.{h,cc}
  • src/cli/service/ai_service.{h,cc}

Status: ALL SYSTEMS GO - Ready for real-world testing!
Next Action: Begin Ollama/Gemini testing to validate actual command generation quality