Introduce Overworld & Dungeon AI Integration Plan

- Added a comprehensive plan for integrating AI-driven workflows in overworld and dungeon editing, focusing on visual editing and ResourceLabels awareness.
- Established a phased implementation approach, starting with SSL support and basic Tile16 command integration.
- Outlined success metrics for both overworld and dungeon editing, ensuring AI can effectively understand and manipulate game data.
- Created a new document detailing the strategic shift towards specialized AI workflows, enhancing the overall functionality of the z3ed system.

This commit sets the foundation for advanced AI capabilities in ROM editing, paving the way for future enhancements and user-friendly features.
scawful
2025-10-03 09:20:37 -04:00
parent 923f5af068
commit 3473d37be4
14 changed files with 1639 additions and 2894 deletions

@@ -1,142 +1,283 @@
# z3ed: AI-Powered CLI for YAZE
**Status**: Active Development | Test Harness Enhancement Phase
**Status**: Active Development | AI Integration Phase
**Latest Update**: October 3, 2025
## Overview
`z3ed` is a command-line interface for YAZE that enables AI-driven ROM modifications through a proposal-based workflow. It provides both human-accessible commands for developers and machine-readable APIs for LLM integration, forming the backbone of an agentic development ecosystem.
`z3ed` is a command-line interface for YAZE that enables AI-driven ROM modifications through a proposal-based workflow. It provides both human-accessible commands for developers and machine-readable APIs for LLM integration.
**Recent Focus**: Evolving the ImGuiTestHarness from basic GUI automation into a comprehensive testing platform that serves dual purposes:
1. **AI-Driven Workflows**: Widget discovery, test introspection, and dynamic interaction learning
2. **Traditional GUI Testing**: Test recording/replay, CI/CD integration, and regression testing
**🤖 Why This Matters**: These enhancements are **critical for AI agent autonomy**. Without them, AI agents can't verify their changes worked (no test polling), discover UI elements dynamically (hardcoded names), learn from demonstrations (no recording), or debug failures (no screenshots). The test harness evolution enables **fully autonomous agents** that can execute → verify → self-correct without human intervention.
**📋 Implementation Status**: Core infrastructure complete (Phases 1-6, AW-01 to AW-04, IT-01 to IT-09). Currently focusing on **LLM Integration** to enable practical AI-driven workflows. See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for the detailed roadmap (Ollama, Gemini, Claude).
This directory contains the primary documentation for the `z3ed` system.
**📋 Documentation Status**: Consolidated (Oct 2, 2025) - 10 core files, 6,547 lines
## Core Documentation
Start here to understand the architecture, learn how to use the commands, and see the current development status.
1. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - **Design & Architecture**
* The "source of truth" for the system's architecture, design goals, and the agentic workflow framework. Read this first to understand *why* the system is built the way it is.
2. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - **Technical Reference & Guides**
* A complete command reference, API documentation, implementation guides, and troubleshooting tips. Use this as your day-to-day manual for working with `z3ed`.
3. **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - **Roadmap & Status**
* The project's task backlog, roadmap, progress tracking, and a list of known issues. Check this document for current priorities and to see what's next.
**Core Capabilities**:
1. **AI-Driven Editing**: Natural language prompts → ROM modifications (overworld tile16, dungeon objects, sprites, palettes)
2. **GUI Test Automation**: Widget discovery, test recording/replay, introspection for debugging
3. **Proposal System**: Safe sandbox editing with accept/reject workflow
4. **Multiple AI Backends**: Ollama (local), Gemini (cloud), Claude (planned)
## Quick Start
### Build z3ed
### Build Options
```bash
# Basic build (without GUI automation support)
# Basic z3ed (CLI only, no AI/testing features)
cmake --build build --target z3ed
# Build with gRPC support (for GUI automation)
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
# Full build with AI agent and testing suite
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON
cmake --build build-grpc-test --target z3ed
```
### Common Commands
**Dependencies for Full Build**:
- gRPC (GUI automation)
- nlohmann/json (AI service communication)
- OpenSSL (optional, for Gemini HTTPS - auto-detected on macOS/Linux)
### AI Agent Commands
```bash
# Create an agent proposal in a safe sandbox
z3ed agent run --prompt "Make all soldier armor red" --rom=zelda3.sfc --sandbox
# Generate commands from natural language prompt
z3ed agent plan --prompt "Place a tree at position 10, 10 on map 0"
# List all active and past proposals
# Execute in sandbox with auto-approval
z3ed agent run --prompt "Create a 3x3 water pond at 15, 20" --rom zelda3.sfc --sandbox
# List all proposals
z3ed agent list
# View the changes for the latest proposal
z3ed agent diff
# View proposal details
z3ed agent diff --proposal <id>
```
# Run an automated GUI test (requires test harness to be running)
z3ed agent test --prompt "Open the Overworld editor and verify it loads"
### GUI Testing Commands
# Discover available GUI widgets for AI interaction
z3ed agent gui discover --window "Overworld" --type button
```bash
# Run automated test
z3ed agent test --prompt "Open Overworld editor and verify it loads"
# Record a test session for regression testing
z3ed agent test record start --output tests/overworld_load.json
# ... perform actions ...
# Query test status
z3ed agent test status --test-id <id> --follow
# Record manual workflow
z3ed agent test record start --output tests/my_test.json
# ... perform actions in GUI ...
z3ed agent test record stop
# Replay recorded test
z3ed agent test replay tests/overworld_load.json
# Query test execution status
z3ed agent test status --test-id grpc_click_12345678 --follow
z3ed agent test replay tests/my_test.json
```
See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
## AI Service Setup
## Recent Enhancements
### Ollama (Local LLM - Recommended for Development)
**LLM Integration Priority Shift (Oct 3, 2025)** 🤖
- 📋 Deprioritized IT-10 (Collaborative Editing) in favor of practical LLM integration
- 📄 Created comprehensive implementation plan for Ollama, Gemini, and Claude integration
- ✅ New documentation: [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md), [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md), [LLM-INTEGRATION-SUMMARY.md](LLM-INTEGRATION-SUMMARY.md)
- 🚀 Ready to enable real AI-driven ROM modifications with natural language prompts
- **Estimated effort**: 12-15 hours across 4 phases
- **Why now**: All infrastructure complete (CLI, proposals, sandbox, GUI automation) - only LLM connection missing
```bash
# Install Ollama
brew install ollama # macOS
# or download from https://ollama.com
**Recent Progress (Oct 3, 2025)**
- ✅ IT-09 CLI Test Suite Tooling Complete: run/validate/create commands + JUnit output
- Full suite runner with group/tag filters, parametrization, retries, and CI-friendly exit codes
- Interactive `agent test suite create` scaffolds YAML definitions in `tests/`
- Default JUnit reports under `test-results/junit/` for CI upload
- ✅ IT-08 Enhanced Error Reporting Complete: Full diagnostic capture on test failures
- IT-08a: Screenshot RPC with SDL capture (BMP format, 1536x864)
- IT-08b: Auto-capture execution context on failures (frame, window, widget)
- IT-08c: Widget state dumps with comprehensive UI snapshot (JSON format)
- Proto schema supports screenshot_path, failure_context, and widget_state
- GetTestResults RPC returns full failure diagnostics for debugging
- ✅ IT-05 Implementation Complete: Test introspection API fully operational
- GetTestStatus, ListTests, and GetTestResults RPCs implemented and tested
- CLI commands (`z3ed agent test {status,list,results}`) fully functional
- E2E validation script confirms production readiness
- Thread-safe execution history with bounded memory management
- ✅ IT-08a Screenshot RPC Complete: Visual debugging now available
- SDL-based screenshot capture implemented (1536x864 BMP format)
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
- AI agents can now capture visual context for debugging
- ✅ IT-07 Test Recording & Replay Complete: Regression testing workflow operational
- ✅ Server-side wiring for test lifecycle tracking inside `TestManager`
- ✅ gRPC status mapping helper to surface accurate error codes back to clients
- ✅ CLI integration with YAML/JSON output formats
- ✅ End-to-end introspection tests with comprehensive validation
# Pull recommended model
ollama pull qwen2.5-coder:7b
**Next Priority**: IT-08b (Auto-capture on failure) + IT-08c (Widget state dumps) to complete enhanced error reporting
# Start server
ollama serve
**Test Harness Evolution** (In Progress: IT-05 to IT-09 | 78% Complete):
- **Test Introspection**: ✅ Query test status, results, and execution history
- **Widget Discovery**: ✅ AI agents can enumerate available GUI interactions dynamically
- **Test Recording**: ✅ Capture manual workflows as JSON scripts for regression testing
- **Enhanced Debugging**: 🔄 Screenshot capture (✅ IT-08a), widget state dumps (📋 IT-08c), execution context on failures (📋 IT-08b)
- **CI/CD Integration**: 📋 Standardized test suite format with JUnit XML output
# z3ed will auto-detect Ollama at localhost:11434
z3ed agent plan --prompt "test"
```
See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolution-from-automation-to-platform)** for detailed architecture and implementation roadmap.
### Gemini (Google Cloud API)
## Quick Navigation
```bash
# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-key-here"
**📖 Getting Started**:
- **New to z3ed?** Start with this [README.md](README.md) then [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)
- **Want to use z3ed?** See [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for all commands
- **Setting up AI agents?** See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for Ollama/Gemini/Claude setup
# z3ed will auto-select Gemini when key is set
z3ed agent plan --prompt "test"
```
**🔧 Implementation Guides**:
- [LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md) - Step-by-step LLM integration tasks ⭐ START HERE
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (complete ✅)
- [IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md) - Enhanced Error Reporting (complete ✅)
**Note**: Gemini requires OpenSSL (HTTPS). Build with `-DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON` to enable SSL support. OpenSSL is auto-detected on macOS/Linux. Windows users can use Ollama instead.
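The backend selection described above (Gemini when `GEMINI_API_KEY` is set, otherwise a local Ollama server) can be pictured with a small sketch. This is only an illustration of the stated behavior, not z3ed's actual implementation: the function name `pick_backend` is hypothetical, and the precedence of Gemini over Ollama when both are available is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the selection rule described in this README.
# Assumption: a non-empty GEMINI_API_KEY takes precedence over Ollama.
pick_backend() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini"
  else
    # Falls back to the local Ollama server (localhost:11434 per this README).
    echo "ollama"
  fi
}

echo "Selected backend: $(pick_backend)"
```

Running the sketch before `z3ed agent plan` is a quick way to see which backend your current shell environment would point at under these assumptions.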
**📚 Reference**:
- [E6-z3ed-reference.md](E6-z3ed-reference.md) - Technical reference and API docs
- [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md) - Task backlog and roadmap
- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Quick command reference
## Core Documentation
### Essential Reads
1. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Architecture, design philosophy, agentic workflow framework
2. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation
3. **[AGENTIC-PLAN-STATUS.md](AGENTIC-PLAN-STATUS.md)** - Current implementation status and roadmap
### Quick References
- **[QUICK_REFERENCE.md](QUICK_REFERENCE.md)** - Condensed command cheatsheet
- **[QUICK-START-GEMINI.md](QUICK-START-GEMINI.md)** - Gemini API setup and testing guide
- **[OVERWORLD-DUNGEON-AI-PLAN.md](OVERWORLD-DUNGEON-AI-PLAN.md)** - Tile16 editing strategy and ResourceLabels integration
### Implementation Guides
- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - LLM integration roadmap (Ollama, Gemini, Claude)
- **[LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)** - Step-by-step implementation tasks
- **[IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)** - Test introspection API (complete ✅)
- **[IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md)** - Enhanced error reporting (complete ✅)
## Current Status (October 2025)
### ✅ Complete
- **CLI Infrastructure**: Command parsing, handlers, TUI components
- **Proposal System**: Sandbox creation, diff generation, accept/reject workflow
- **AI Services**: Ollama integration, Gemini integration, PromptBuilder
- **GUI Automation**: Widget discovery, test recording/replay, gRPC harness
- **Test Introspection**: Status polling, results query, execution history
- **Error Reporting**: Screenshots, failure context, widget state dumps
### 🔄 In Progress
- **Tile16 Editing Workflow**: Accept/reject for overworld canvas edits
- **ResourceLabels Integration**: User-defined names for AI context
- **Dungeon Editing Support**: Object/sprite placement via AI
### 📋 Planned
- **Visual Diff Generation**: Before/after screenshots for proposals
- **Batch Operations**: Multiple tile16 changes in single proposal
- **Pattern Library**: Pre-defined tile patterns (rivers, forests, etc.)
- **Claude Integration**: Anthropic API support
## AI Editing Focus Areas
z3ed is optimized for practical ROM editing workflows:
### Overworld Tile16 Editing ⭐ PRIMARY FOCUS
**Why**: Simple data model (uint16 IDs), visual feedback, reversible, safe
- Single tile placement (trees, rocks, bushes)
- Area creation (water ponds, dirt patches)
- Path creation (connecting points with tiles)
- Pattern generation (tree rows, forests, boundaries)
### Dungeon Editing
- Sprite placement with label awareness ("eastern palace entrance")
- Object placement (chests, doors, switches)
- Entrance configuration
- Room property editing
### Palette Editing
- Color modification by index
- Sprite palette adjustments
- Export/import workflows
### Additional Capabilities
- Sprite data editing
- Compression/decompression
- ROM validation
- Patch application
## Example Workflows
### Basic Tile16 Edit
```bash
# AI generates command
z3ed agent plan --prompt "Place a tree at 10, 10"
# Output: overworld set-tile --map 0 --x 10 --y 10 --tile 0x02E
# Execute manually
z3ed overworld set-tile --map 0 --x 10 --y 10 --tile 0x02E
# Or auto-execute with sandbox
z3ed agent run --prompt "Place a tree at 10, 10" --rom zelda3.sfc --sandbox
```
### Complex Multi-Step Edit
```bash
# AI generates multiple commands
z3ed agent plan --prompt "Create a 3x3 water pond at 15, 20"
# Review proposal
z3ed agent diff --latest
# Accept and apply
z3ed agent accept --latest
```
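To make the multi-step case concrete, the sketch below prints the kind of command list a 3x3 area edit might expand to, one `overworld set-tile` call per cell. It is an illustration only: `pond_commands` is a hypothetical helper, the tile ID `0x000` is a placeholder rather than the real water tile16 ID, and treating the prompt coordinates as the top-left corner is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one set-tile command per cell of a 3x3 area.
# Assumptions: (x0, y0) is the top-left corner; the tile ID is a placeholder.
pond_commands() {
  local map="$1" x0="$2" y0="$3" tile="$4"
  local dx dy
  for dy in 0 1 2; do
    for dx in 0 1 2; do
      echo "z3ed overworld set-tile --map ${map} --x $((x0 + dx)) --y $((y0 + dy)) --tile ${tile}"
    done
  done
}

# Print the nine commands for a pond at 15, 20 on map 0 (nothing is executed).
pond_commands 0 15 20 0x000
```

The script only echoes commands, so the list can be reviewed before any of them is run against a ROM.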
### Label-Aware Dungeon Edit
```bash
# AI uses ResourceLabels from your project
z3ed agent plan --prompt "Add 3 soldiers to my custom fortress entrance"
# AI explains: "Using label 'custom_fortress' for dungeon 0x04"
```
## Dependencies Guard
AI agent features require:
- `YAZE_WITH_GRPC=ON` - GUI automation and test harness
- `YAZE_WITH_JSON=ON` - AI service communication
- OpenSSL (optional) - Gemini HTTPS support (auto-detected)
**Windows Compatibility**: Build without gRPC/JSON for basic z3ed functionality. Use Ollama (localhost) instead of Gemini for AI features without SSL dependency.
## Recent Changes (Oct 3, 2025)
### SSL/HTTPS Support
- ✅ OpenSSL now optional (guarded by YAZE_WITH_GRPC + YAZE_WITH_JSON)
- ✅ Graceful degradation when OpenSSL not found (Ollama still works)
- ✅ Windows builds work without SSL dependencies
### Prompt Engineering
- ✅ Refocused examples on tile16 editing workflows
- ✅ Added dungeon editing with label awareness
- ✅ Inline tile16 reference for AI knowledge
- ✅ Practical multi-step examples (water ponds, paths, patterns)
### Documentation Consolidation
- ✅ Removed 10 outdated/redundant documents
- ✅ Consolidated status into AGENTIC-PLAN-STATUS.md
- ✅ Updated README with clear dependency requirements
- ✅ Added Windows compatibility notes
## Troubleshooting
### "OpenSSL not found" warning
**Impact**: Gemini API won't work (HTTPS required)
**Solutions**:
- Use Ollama instead (no SSL needed, runs locally)
- Install OpenSSL: `brew install openssl` (macOS) or `apt-get install libssl-dev` (Debian/Ubuntu)
- Windows: Build without gRPC/JSON, use Ollama
### "gRPC not available" error
**Impact**: GUI testing and automation disabled
**Solution**: Rebuild with `-DYAZE_WITH_GRPC=ON`
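Before rebuilding, it can help to confirm how the existing build directory was configured. The sketch below reads CMake's `CMakeCache.txt`, which stores options as `NAME:TYPE=VALUE` lines. The helper name `grpc_enabled` is hypothetical, and the check only recognizes the upper-case spellings `ON`, `1`, `TRUE`, and `YES`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if YAZE_WITH_GRPC is enabled in a build dir's cache.
# Assumption: the option was set with one of the spellings ON, 1, TRUE, or YES.
grpc_enabled() {
  local cache="$1/CMakeCache.txt"
  [ -f "$cache" ] && grep -Eq '^YAZE_WITH_GRPC:[A-Z]+=(ON|1|TRUE|YES)$' "$cache"
}

if grpc_enabled build-grpc-test; then
  echo "gRPC is enabled in build-grpc-test"
else
  echo "gRPC is not enabled (or build-grpc-test has not been configured)"
fi
```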
### AI generates invalid commands
**Causes**: Vague prompt, unfamiliar tile IDs, missing context
**Solutions**:
- Use specific coordinates and tile types
- Reference tile16 IDs from documentation
- Provide map context ("Light World", "map 0")
- Check ResourceLabels are loaded for your project
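One lightweight safeguard is to check the shape of a generated command before running it. The sketch below accepts only lines that look like the `overworld set-tile` example shown earlier in this README. `is_valid_set_tile` is a hypothetical helper, and it checks syntax only: it does not verify that the map, coordinates, or tile ID exist in the ROM.

```bash
#!/usr/bin/env bash
# Hypothetical helper: syntax-only check for a generated set-tile command.
# Assumption: the flag order matches the example output shown in this README.
is_valid_set_tile() {
  local re='^overworld set-tile --map [0-9]+ --x [0-9]+ --y [0-9]+ --tile 0x[0-9A-Fa-f]+$'
  [[ "$1" =~ $re ]]
}

cmd="overworld set-tile --map 0 --x 10 --y 10 --tile 0x02E"
if is_valid_set_tile "$cmd"; then
  echo "looks well-formed: $cmd"
else
  echo "rejected: $cmd"
fi
```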
## Contributing
### Adding AI Prompt Examples
Edit `LoadDefaultExamples()` in `src/cli/service/prompt_builder.cc`:
- Add practical, multi-step examples
- Include explanation of tile IDs and reasoning
- Test with both Ollama and Gemini
### Adding CLI Commands
1. Create handler in `src/cli/handlers/<category>.cc`
2. Register in command dispatcher
3. Add to `E6-z3ed-reference.md` documentation
4. Add example prompt to `prompt_builder.cc`
### Testing
```bash
# Run unit tests
cd build-grpc-test && ctest --output-on-failure
# Test AI integration (still inside build-grpc-test)
./bin/z3ed agent plan --prompt "test prompt" --verbose
```
---
**Getting Help**:
- Read [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md) for architecture
- Check [AGENTIC-PLAN-STATUS.md](AGENTIC-PLAN-STATUS.md) for current status
- Review [QUICK-START-GEMINI.md](QUICK-START-GEMINI.md) for AI setup
**Quick Test** (verifies AI is working):
```bash
export GEMINI_API_KEY="your-key" # or run `ollama serve` in another terminal
./build-grpc-test/bin/z3ed agent plan --prompt "Place a tree at 10, 10"
```