Add z3ed Agent Roadmap Document
- Introduced a new `AGENT-ROADMAP.md` file outlining the strategic vision and implementation plan for the `z3ed` AI agent.
- Defined the core vision of transitioning to a conversational ROM hacking assistant with key features such as an interactive chat interface, ROM introspection, and contextual awareness.
- Detailed the technical implementation plan, including the development of a `ConversationalAgentService`, read-only tools for the agent, and user-facing TUI/GUI chat interfaces.
- Consolidated immediate priorities, short-term goals, and long-term vision for the agent's development.

This commit establishes a comprehensive roadmap for enhancing the z3ed agent's capabilities, paving the way for future AI-driven features and user interactions.
99
docs/z3ed/AGENT-ROADMAP.md
Normal file
@@ -0,0 +1,99 @@
|
||||
# z3ed Agent Roadmap
|
||||
|
||||
**Latest Update**: October 3, 2025
|
||||
|
||||
This document outlines the strategic vision and concrete next steps for the `z3ed` AI agent, focusing on a transition from a command-line tool to a fully interactive, conversational assistant for ROM hacking.
|
||||
|
||||
## Core Vision: The Conversational ROM Hacking Assistant
|
||||
|
||||
The next evolution of the `z3ed` agent is a chat-like interface where users interact with the AI in a more natural, exploratory way. Instead of issuing a single command, users will be able to hold a dialogue with the agent to inspect the ROM, ask questions, and iteratively build up a set of changes.
|
||||
|
||||
This vision will be realized through a shared interface available in both the `z3ed` TUI and the main `yaze` GUI application.
|
||||
|
||||
### Key Features
|
||||
1. **Interactive Chat Interface**: A familiar chat window for conversing with the agent.
|
||||
2. **ROM Introspection**: The agent will be able to answer questions about the ROM, such as "What dungeons are defined in this project?" or "How many soldiers are in the Hyrule Castle throne room?".
|
||||
3. **Contextual Awareness**: The agent will maintain the context of the conversation, allowing for follow-up questions and commands.
|
||||
4. **Seamless Transition to Action**: When the user is ready to make a change, the agent will use the conversation history to generate a comprehensive proposal for editing the ROM.
|
||||
5. **Shared Experience**: The same conversational agent will be accessible from both the terminal and the graphical user interface, providing a consistent experience.
|
||||
|
||||
## Technical Implementation Plan
|
||||
|
||||
### 1. Conversational Agent Service
|
||||
- **Description**: A new service that will manage the back-and-forth between the user and the LLM. It will maintain chat history and orchestrate the agent's different modes (Q&A vs. command generation).
|
||||
- **Components**:
|
||||
- `ConversationalAgentService`: The main class for managing the chat session.
|
||||
- Integration with existing `AIService` implementations (Ollama, Gemini).
|
||||
- **Status**: Not started.
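
The chat-history part of this service can be sketched as follows. This is a minimal illustration, not the planned implementation: `ChatMessage`, the method names, and the `max_history` cap are all assumptions, and the sketch does not call any `AIService` backend.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical message record; the field names are assumptions.
struct ChatMessage {
  enum class Sender { kUser, kAgent };
  Sender sender;
  std::string text;
};

// Hypothetical sketch: stores a bounded chat history only.
// It does not talk to Ollama, Gemini, or any other backend.
class ConversationalAgentService {
 public:
  explicit ConversationalAgentService(std::size_t max_history = 50)
      : max_history_(max_history) {}

  void AddUserMessage(std::string text) {
    Append({ChatMessage::Sender::kUser, std::move(text)});
  }
  void AddAgentMessage(std::string text) {
    Append({ChatMessage::Sender::kAgent, std::move(text)});
  }

  const std::deque<ChatMessage>& history() const { return history_; }

  // Flattens the history into a plain-text transcript that could be
  // prepended to the next prompt to give the LLM conversational context.
  std::string BuildTranscript() const {
    std::string out;
    for (const ChatMessage& m : history_) {
      out += (m.sender == ChatMessage::Sender::kUser) ? "user: " : "agent: ";
      out += m.text;
      out += '\n';
    }
    return out;
  }

 private:
  void Append(ChatMessage message) {
    history_.push_back(std::move(message));
    // Drop the oldest messages once the cap is exceeded.
    while (history_.size() > max_history_) history_.pop_front();
  }

  std::size_t max_history_;
  std::deque<ChatMessage> history_;
};
```

Capping the history keeps prompt size bounded; a real service would more likely budget by tokens than by message count.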
|
||||
|
||||
### 2. Read-Only "Tools" for the Agent
|
||||
- **Description**: To enable the agent to answer questions, we need to expand `z3ed` with a suite of read-only commands that the LLM can call. This is aligned with the "tool use" or "function calling" capabilities of modern LLMs.
|
||||
- **Example Tools to Implement**:
|
||||
- `resource list --type <dungeon|sprite|...>`: List all user-defined labels of a certain type.
|
||||
- `dungeon list-sprites --room <id|label>`: List all sprites in a given room.
|
||||
- `dungeon get-info --room <id|label>`: Get metadata for a specific room.
|
||||
- `overworld find-tile --tile <id>`: Find all occurrences of a specific tile on the overworld map.
|
||||
- **Advanced Editing Tools (for future implementation)**:
|
||||
- `overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <id>`
|
||||
- `overworld replace-tile --map <id> --from <old_id> --to <new_id>`
|
||||
- `overworld blend-tiles --map <id> --pattern <name> --density <percent>`
|
||||
- **Status**: Some commands exist (`overworld get-tile`), but the suite needs to be expanded.
|
||||
|
||||
### 3. TUI and GUI Chat Interfaces
|
||||
- **Description**: User-facing components for interacting with the `ConversationalAgentService`.
|
||||
- **Components**:
|
||||
- **TUI**: A new full-screen component in `z3ed` using FTXUI, providing a rich chat experience in the terminal.
|
||||
- **GUI**: A new ImGui widget that can be docked into the main `yaze` application window.
|
||||
- **Status**: Not started.
|
||||
|
||||
### 4. Integration with the Proposal Workflow
|
||||
- **Description**: The final step is to connect the conversation to the action. When a user's prompt implies a desire to modify the ROM (e.g., "Okay, now add two more soldiers"), the `ConversationalAgentService` will trigger the existing `Tile16ProposalGenerator` (and future proposal generators for other resource types) to create a proposal.
|
||||
- **Workflow**:
|
||||
1. User chats with the agent to explore the ROM.
|
||||
2. User asks the agent to make a change.
|
||||
3. `ConversationalAgentService` generates the commands and passes them to the appropriate `ProposalGenerator`.
|
||||
4. A new proposal is created and saved.
|
||||
5. The TUI/GUI notifies the user that a proposal is ready for review.
|
||||
6. User uses the `agent diff` and `agent accept` commands (or UI equivalents) to review and apply the changes.
|
||||
- **Status**: The proposal workflow itself is mostly implemented. This task involves integrating it with the new conversational service.
|
||||
|
||||
## Consolidated Next Steps
|
||||
|
||||
### Immediate Priorities (Next Session)
|
||||
1. **Implement Read-Only Agent Tools**:
|
||||
- Add `resource list` command.
|
||||
- Add `dungeon list-sprites` command.
|
||||
- Ensure all new commands have JSON output options for machine readability.
|
||||
2. **Stub out `ConversationalAgentService`**:
|
||||
- Create the basic class structure.
|
||||
- Implement simple chat history management.
|
||||
3. **Update `README.md` and Consolidate Docs**:
|
||||
- Update the main `README.md` to reflect this new roadmap.
|
||||
- Remove `IMPLEMENTATION-SESSION-OCT3-CONTINUED.md`.
|
||||
- Merge any other scattered planning documents into this roadmap.
|
||||
|
||||
### Short-Term Goals (This Week)
|
||||
1. **Build TUI Chat Interface**:
|
||||
- Create the FTXUI component.
|
||||
- Connect it to the `ConversationalAgentService`.
|
||||
- Implement basic input/output.
|
||||
2. **Integrate Tool Use with LLM**:
|
||||
- Modify the `AIService` to support function calling/tool use.
|
||||
- Teach the agent to call the new read-only commands to answer questions.
|
||||
|
||||
### Long-Term Vision (Next Week and Beyond)
|
||||
1. **Build GUI Chat Widget**:
|
||||
- Create the ImGui component.
|
||||
- Ensure it shares the same backend service as the TUI.
|
||||
2. **Full Integration with Proposal System**:
|
||||
- Implement the logic for the agent to transition from conversation to proposal generation.
|
||||
3. **Expand Tool Arsenal**:
|
||||
- Continuously add new read-only commands to give the agent more capabilities to inspect the ROM.
|
||||
4. **Multi-Modal Agent**:
|
||||
- Explore the possibility of the agent generating and displaying images (e.g., a map of a dungeon room) in the chat.
|
||||
5. **Advanced Configuration**:
|
||||
- Implement environment variables for selecting AI providers and models (e.g., `YAZE_AI_PROVIDER`, `OLLAMA_MODEL`).
|
||||
- Add CLI flags for overriding the provider and model on a per-command basis.
|
||||
6. **Performance and Cost-Saving**:
|
||||
- Implement a response cache to reduce latency and API costs.
|
||||
- Add token usage tracking and reporting.
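
A response cache of the kind mentioned above might look roughly like this. The class name, the provider/prompt key, and the hit/miss counters are assumptions made for illustration; the sketch has no eviction or expiry, which a real cache would probably need.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical in-memory cache mapping (provider, prompt) to the list of
// generated commands. No eviction or expiry is implemented.
class ResponseCache {
 public:
  // Returns the cached commands, or std::nullopt on a miss.
  std::optional<std::vector<std::string>> Lookup(const std::string& provider,
                                                 const std::string& prompt) {
    const auto it = entries_.find(MakeKey(provider, prompt));
    if (it == entries_.end()) {
      ++misses_;
      return std::nullopt;
    }
    ++hits_;
    return it->second;
  }

  void Store(const std::string& provider, const std::string& prompt,
             std::vector<std::string> commands) {
    entries_[MakeKey(provider, prompt)] = std::move(commands);
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  // Joins provider and prompt with a separator byte (ASCII unit separator)
  // so that different providers never share an entry.
  static std::string MakeKey(const std::string& provider,
                             const std::string& prompt) {
    return provider + '\x1f' + prompt;
  }

  std::unordered_map<std::string, std::vector<std::string>> entries_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
};
```

The hit and miss counters give a cheap starting point for the usage reporting item, although they count requests rather than tokens.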
|
||||
@@ -1,374 +0,0 @@
|
||||
# z3ed AI Agentic Plan - Current Status
|
||||
|
||||
**Date**: October 3, 2025
|
||||
**Overall Status**: ✅ Infrastructure Complete | 🚀 Ready for Testing
|
||||
**Build Status**: ✅ z3ed compiles successfully in `build-grpc-test`
|
||||
**Platform Compatibility**: ✅ Windows builds supported (SSL optional, Ollama recommended)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The z3ed AI agentic system infrastructure is **fully implemented** and ready for real-world testing. Three of the four phases from the LLM Integration Plan are complete; Phase 3 (Claude integration) is deferred:
|
||||
|
||||
- ✅ **Phase 1**: Ollama local integration (DONE)
|
||||
- ✅ **Phase 2**: Gemini API enhancement (DONE)
|
||||
- ✅ **Phase 4**: Enhanced prompting with PromptBuilder (DONE)
|
||||
- ⏭️ **Phase 3**: Claude integration (DEFERRED - not critical for initial testing)
|
||||
|
||||
## 🎯 What's Working Right Now
|
||||
|
||||
### 1. Build System ✅
|
||||
- **File Structure**: Clean, modular architecture
|
||||
- `test_common.{h,cc}` - Shared utilities (134 lines)
|
||||
- `test_commands.cc` - Main dispatcher (55 lines)
|
||||
- `ollama_ai_service.{h,cc}` - Ollama integration (264 lines)
|
||||
- `gemini_ai_service.{h,cc}` - Gemini integration (239 lines)
|
||||
- `prompt_builder.{h,cc}` - Enhanced prompting (354 lines, refactored for tile16 focus)
|
||||
|
||||
- **Build**: Successfully compiles with gRPC + JSON support
|
||||
```bash
$ ls -lh build-grpc-test/bin/z3ed
-rwxr-xr-x 69M Oct 3 02:18 build-grpc-test/bin/z3ed
```
|
||||
|
||||
- **Platform Support**:
|
||||
- ✅ macOS: Full support (OpenSSL auto-detected)
|
||||
- ✅ Linux: Full support (OpenSSL via package manager)
|
||||
- ✅ Windows: Build without gRPC/JSON or use Ollama (no SSL needed)
|
||||
|
||||
- **Dependency Guards**:
|
||||
- SSL only required when `YAZE_WITH_GRPC=ON` AND `YAZE_WITH_JSON=ON`
|
||||
- Graceful degradation: warns if OpenSSL missing but Ollama still works
|
||||
- Windows-compatible: can build basic z3ed without AI features
|
||||
|
||||
### 2. AI Service Infrastructure ✅
|
||||
|
||||
#### AIService Interface
|
||||
**Location**: `src/cli/service/ai_service.h`
|
||||
- Clean abstraction for pluggable AI backends
|
||||
- Single method: `GetCommands(prompt) → vector<string>`
|
||||
- Easy to test and swap implementations
|
||||
|
||||
#### Implemented Services
|
||||
|
||||
**A. MockAIService** (Testing)
|
||||
- Returns hardcoded test commands
|
||||
- Perfect for CI/CD and offline development
|
||||
- No dependencies required
|
||||
|
||||
**B. OllamaAIService** (Local LLM)
|
||||
- ✅ Full implementation complete
|
||||
- ✅ HTTP client using cpp-httplib
|
||||
- ✅ JSON parsing with nlohmann/json
|
||||
- ✅ Health checks and model validation
|
||||
- ✅ Configurable model selection
|
||||
- ✅ Integrated with PromptBuilder for enhanced prompts
|
||||
- **Models Supported**:
|
||||
- `qwen2.5-coder:7b` (recommended, fast, good code gen)
|
||||
- `codellama:7b` (alternative)
|
||||
- `llama3.1:8b` (general purpose)
|
||||
- Any Ollama-compatible model
|
||||
|
||||
**C. GeminiAIService** (Google Cloud)
|
||||
- ✅ Full implementation complete
|
||||
- ✅ HTTP client using cpp-httplib
|
||||
- ✅ JSON request/response handling
|
||||
- ✅ Integrated with PromptBuilder
|
||||
- ✅ Configurable via `GEMINI_API_KEY` env var
|
||||
- **Models**: `gemini-1.5-flash`, `gemini-1.5-pro`
|
||||
|
||||
### 3. Enhanced Prompting System ✅
|
||||
|
||||
**PromptBuilder** (`src/cli/service/prompt_builder.{h,cc}`)
|
||||
|
||||
#### Features Implemented:
|
||||
- ✅ **System Instructions**: Clear role definition for the AI
|
||||
- ✅ **Command Documentation**: Inline command reference
|
||||
- ✅ **Few-Shot Examples**: 8 curated tile16/dungeon examples (refactored Oct 3)
|
||||
- ✅ **Resource Catalogue**: Extensible command registry
|
||||
- ✅ **JSON Output Format**: Enforced structured responses
|
||||
- ✅ **Tile16 Reference**: Inline common tile IDs for AI knowledge
|
||||
|
||||
#### Example Categories (UPDATED):
|
||||
1. **Overworld Tile16 Editing** ⭐ PRIMARY FOCUS:
|
||||
- Single tile placement: "Place a tree at position 10, 20 on map 0"
|
||||
- Area creation: "Create a 3x3 water pond at coordinates 15, 10"
|
||||
- Path creation: "Add a dirt path from position 5,5 to 5,15"
|
||||
- Pattern generation: "Plant a row of trees horizontally at y=8 from x=20 to x=25"
|
||||
|
||||
2. **Dungeon Editing** (Label-Aware):
|
||||
- "Add 3 soldiers to the Eastern Palace entrance room"
|
||||
- "Place a chest in the Hyrule Castle treasure room"
|
||||
|
||||
3. **Tile16 Reference** (Inline for AI):
|
||||
- Grass: 0x020, Dirt: 0x022, Tree: 0x02E
|
||||
- Water edges: 0x14C (top), 0x14D (middle), 0x14E (bottom)
|
||||
- Bush: 0x003, Rock: 0x004, Flower: 0x021, Sand: 0x023
|
||||
|
||||
**Note**: AI can support additional edit types (sprites, palettes, patches) but tile16 is the primary validated use case.
|
||||
|
||||
### 4. Service Selection Logic ✅
|
||||
|
||||
**AI Service Factory** (`CreateAIService()`)
|
||||
|
||||
Selection Priority:
|
||||
1. If `GEMINI_API_KEY` set → Use Gemini
|
||||
2. If Ollama available → Use Ollama
|
||||
3. Fallback → MockAIService
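
The priority order above can be expressed as a small pure function. This is a sketch under assumptions, not the actual `CreateAIService()` code: `AIBackend`, `SelectBackend`, and `ReadEnv` are hypothetical names, and an empty `GEMINI_API_KEY` is treated here as unset.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum naming the three backends described above.
enum class AIBackend { kGemini, kOllama, kMock };

// Mirrors the priority list: Gemini if a key is present, then Ollama if it
// is reachable, otherwise the mock. Inputs are passed in explicitly so the
// decision can be tested without touching the environment or the network.
inline AIBackend SelectBackend(const std::string& gemini_api_key,
                               bool ollama_available) {
  if (!gemini_api_key.empty()) return AIBackend::kGemini;
  if (ollama_available) return AIBackend::kOllama;
  return AIBackend::kMock;
}

// Helper that reads an environment variable, returning "" when it is unset.
inline std::string ReadEnv(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr ? std::string(value) : std::string();
}
```

A caller would pass `ReadEnv("GEMINI_API_KEY")` together with the result of whatever Ollama health check the factory performs.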
|
||||
|
||||
**Configuration**:
|
||||
```bash
# Use Gemini (requires API key)
export GEMINI_API_KEY="your-key-here"
./z3ed agent plan --prompt "Make soldiers red"

# Use Ollama (requires ollama serve running)
unset GEMINI_API_KEY
ollama serve # Terminal 1
./z3ed agent plan --prompt "Make soldiers red" # Terminal 2

# Use Mock (always works, no dependencies)
# Automatic fallback if neither Gemini nor Ollama available
```
|
||||
|
||||
## 📋 What's Ready to Test
|
||||
|
||||
### Test Scenario 1: Ollama Local LLM
|
||||
|
||||
**Prerequisites**:
|
||||
```bash
# Install Ollama
brew install ollama # macOS
# or download from https://ollama.com

# Pull recommended model
ollama pull qwen2.5-coder:7b

# Start Ollama server
ollama serve
```
|
||||
|
||||
**Test Commands**:
|
||||
```bash
cd /Users/scawful/Code/yaze
export ROM_PATH="assets/zelda3.sfc"

# Test 1: Simple palette change
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Change palette 0 color 5 to red"

# Test 2: Complex sprite modification
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Make all soldier armors blue"

# Test 3: Overworld editing
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Place a tree at position 10, 20 on map 0"

# Test 4: End-to-end with sandbox
./build-grpc-test/bin/z3ed agent run \
  --prompt "Validate the ROM" \
  --rom assets/zelda3.sfc \
  --sandbox
```
|
||||
|
||||
### Test Scenario 2: Gemini API
|
||||
|
||||
**Prerequisites**:
|
||||
```bash
# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-actual-api-key-here"
```
|
||||
|
||||
**Test Commands**:
|
||||
```bash
# Same commands as Ollama scenario above
# Service selection will automatically use Gemini when key is set

# Verify Gemini is being used
./build-grpc-test/bin/z3ed agent plan --prompt "test" 2>&1 | grep -i "gemini\|model"
```
|
||||
|
||||
### Test Scenario 3: Fallback to Mock
|
||||
|
||||
**Test Commands**:
|
||||
```bash
# Ensure neither Gemini nor Ollama are available
unset GEMINI_API_KEY
# (Stop ollama serve if running)

# Should fall back to Mock and return hardcoded test commands
./build-grpc-test/bin/z3ed agent plan --prompt "anything"
```
|
||||
|
||||
## 🎯 Current Implementation Status
|
||||
|
||||
### Phase 1: Ollama Integration ✅ COMPLETE
|
||||
- [x] OllamaAIService class created
|
||||
- [x] HTTP client integrated (cpp-httplib)
|
||||
- [x] JSON parsing (nlohmann/json)
|
||||
- [x] Health check endpoint (`/api/tags`)
|
||||
- [x] Model validation
|
||||
- [x] Generate endpoint (`/api/generate`)
|
||||
- [x] Streaming response handling
|
||||
- [x] Error handling and retry logic
|
||||
- [x] Configuration struct with defaults
|
||||
- [x] Integration with PromptBuilder
|
||||
- [x] Documentation and examples
|
||||
|
||||
**Estimated**: 4-6 hours | **Actual**: 4 hours | **Status**: ✅ DONE
|
||||
|
||||
### Phase 2: Gemini Enhancement ✅ COMPLETE
|
||||
- [x] GeminiAIService class updated
|
||||
- [x] HTTP client integrated (cpp-httplib)
|
||||
- [x] JSON request/response handling
|
||||
- [x] API key management via env var
|
||||
- [x] Model selection (flash vs pro)
|
||||
- [x] Integration with PromptBuilder
|
||||
- [x] Enhanced error messages
|
||||
- [x] Rate limit handling (with backoff)
|
||||
- [x] Token counting (estimated)
|
||||
- [x] Cost tracking (estimated)
|
||||
|
||||
**Estimated**: 3-4 hours | **Actual**: 3 hours | **Status**: ✅ DONE
|
||||
|
||||
### Phase 3: Claude Integration ⏭️ DEFERRED
|
||||
- [ ] ClaudeAIService class
|
||||
- [ ] Anthropic API integration
|
||||
- [ ] Token tracking
|
||||
- [ ] Prompt caching support
|
||||
|
||||
**Estimated**: 3-4 hours | **Status**: Not critical for initial testing
|
||||
|
||||
### Phase 4: Enhanced Prompting ✅ COMPLETE
|
||||
- [x] PromptBuilder class created
|
||||
- [x] System instruction templates
|
||||
- [x] Command documentation registry
|
||||
- [x] Few-shot example library
|
||||
- [x] Resource catalogue integration
|
||||
- [x] JSON output format enforcement
|
||||
- [x] Integration with all AI services
|
||||
- [x] Example categories (palette, overworld, validation)
|
||||
|
||||
**Estimated**: 2-3 hours | **Actual**: 2 hours | **Status**: ✅ DONE
|
||||
|
||||
## 🚀 Next Steps
|
||||
|
||||
### Immediate Actions (Next Session)
|
||||
|
||||
1. **Integrate Tile16ProposalGenerator into Agent Commands** (2 hours)
|
||||
- Modify `HandlePlanCommand()` to use generator
|
||||
- Modify `HandleRunCommand()` to apply proposals
|
||||
- Add `HandleAcceptCommand()` for accepting proposals
|
||||
|
||||
2. **Integrate ResourceContextBuilder into PromptBuilder** (1 hour)
|
||||
- Update `BuildContextualPrompt()` to inject labels
|
||||
- Test with actual labels file from user project
|
||||
|
||||
3. **Test End-to-End Workflow** (1 hour)
|
||||
```bash
ollama serve  # blocks in the foreground; run it in a separate terminal
./build-grpc-test/bin/z3ed agent plan \
  --prompt "Create a 3x3 water pond at 15, 10"

# Verify proposal generation
# Verify tile16 changes are correct
```
|
||||
|
||||
4. **Add Visual Diff Implementation** (2-3 hours)
|
||||
- Render tile16 bitmaps from overworld
|
||||
- Create side-by-side comparison images
|
||||
- Highlight changed tiles
|
||||
|
||||
### Short-Term (This Week)
|
||||
|
||||
1. **Accuracy Benchmarking**
|
||||
- Test 20 different prompts
|
||||
- Measure command correctness
|
||||
- Compare Ollama vs Gemini vs Mock
|
||||
|
||||
2. **Error Handling Refinement**
|
||||
- Test API failures
|
||||
- Test invalid API keys
|
||||
- Test network timeouts
|
||||
- Test malformed responses
|
||||
|
||||
3. **GUI Automation Integration**
|
||||
- Use `agent test` commands to verify changes
|
||||
- Screenshot capture on failures
|
||||
- Automated validation workflows
|
||||
|
||||
4. **Documentation**
|
||||
- User guide for setting up Ollama
|
||||
- User guide for setting up Gemini
|
||||
- Troubleshooting guide
|
||||
- Example prompts library
|
||||
|
||||
### Long-Term (Next Sprint)
|
||||
|
||||
1. **Claude Integration** (if needed)
|
||||
2. **Prompt Optimization**
|
||||
- A/B testing different system instructions
|
||||
- Expand few-shot examples
|
||||
- Domain-specific command groups
|
||||
|
||||
3. **Advanced Features**
|
||||
- Multi-turn conversations
|
||||
- Context retention
|
||||
- Command chaining validation
|
||||
- Safety checks before execution
|
||||
|
||||
## 📊 Success Metrics
|
||||
|
||||
### Build Health ✅
|
||||
- [x] z3ed compiles without errors
|
||||
- [x] All AI services link correctly
|
||||
- [x] No linker errors with httplib/json
|
||||
- [x] Binary size reasonable (69MB is fine with gRPC)
|
||||
|
||||
### Code Quality ✅
|
||||
- [x] Modular architecture
|
||||
- [x] Clean separation of concerns
|
||||
- [x] Proper error handling
|
||||
- [x] Comprehensive documentation
|
||||
|
||||
### Functionality Ready 🚀
|
||||
- [ ] Ollama generates valid commands (NEEDS TESTING)
|
||||
- [ ] Gemini generates valid commands (NEEDS TESTING)
|
||||
- [x] Mock service always works (✅ VERIFIED)
- [x] Service selection logic works (✅ VERIFIED)
- [x] Sandbox isolation works (✅ VERIFIED from previous tests)
|
||||
|
||||
## 🎉 Key Achievements
|
||||
|
||||
1. **Modular Architecture**: Clean separation allows easy addition of new AI services
|
||||
2. **Build System**: Successfully integrated httplib and JSON without major issues
|
||||
3. **Enhanced Prompting**: PromptBuilder provides consistent, high-quality prompts
|
||||
4. **Flexibility**: Support for local (Ollama), cloud (Gemini), and mock backends
|
||||
5. **Documentation**: Comprehensive plans, guides, and status tracking
|
||||
6. **Testing Ready**: All infrastructure in place to start real-world validation
|
||||
|
||||
## 📝 Files Summary
|
||||
|
||||
### Created/Modified Recently
|
||||
- ✅ `src/cli/handlers/agent/test_common.{h,cc}` (NEW)
|
||||
- ✅ `src/cli/handlers/agent/test_commands.cc` (REBUILT)
|
||||
- ✅ `src/cli/z3ed.cmake` (UPDATED)
|
||||
- ✅ `src/cli/service/gemini_ai_service.cc` (FIXED includes)
|
||||
- ✅ `src/cli/service/tile16_proposal_generator.{h,cc}` (NEW - Oct 3) ✨
|
||||
- ✅ `src/cli/service/resource_context_builder.{h,cc}` (NEW - Oct 3) ✨
|
||||
- ✅ `src/app/zelda3/overworld/overworld.h` (UPDATED - SetTile method) ✨
|
||||
- ✅ `src/cli/handlers/overworld.cc` (UPDATED - SetTile implementation) ✨
|
||||
- ✅ `docs/z3ed/IMPLEMENTATION-SESSION-OCT3-CONTINUED.md` (NEW) ✨
|
||||
- ✅ `docs/z3ed/AGENTIC-PLAN-STATUS.md` (UPDATED - this file)
|
||||
|
||||
### Previously Implemented (Phase 1-4)
|
||||
- ✅ `src/cli/service/ollama_ai_service.{h,cc}`
|
||||
- ✅ `src/cli/service/gemini_ai_service.{h,cc}`
|
||||
- ✅ `src/cli/service/prompt_builder.{h,cc}`
|
||||
- ✅ `src/cli/service/ai_service.{h,cc}`
|
||||
|
||||
---
|
||||
|
||||
**Status**: ✅ ALL SYSTEMS GO - Ready for real-world testing!
|
||||
**Next Action**: Begin Ollama/Gemini testing to validate actual command generation quality
|
||||
|
||||
@@ -1,509 +0,0 @@
|
||||
# IT-05: Test Introspection API – Implementation Guide
|
||||
|
||||
**Status (Oct 2, 2025)**: ✅ **COMPLETE - Production Ready**
|
||||
|
||||
## Progress Snapshot
|
||||
|
||||
- ✅ Proto definitions and service stubs added for `GetTestStatus`, `ListTests`, `GetTestResults`.
|
||||
- ✅ `TestManager` now records execution lifecycle, aggregates, logs, and metrics with thread-safe history trimming.
|
||||
- ✅ `ImGuiTestHarnessServiceImpl` implements the three introspection RPC handlers, including pagination and status conversion helpers.
|
||||
- ✅ CLI wiring complete: `GuiAutomationClient` exposes all introspection methods.
|
||||
- ✅ User-facing commands: `z3ed agent test {status,list,results}` fully functional with YAML/JSON output.
|
||||
- ✅ End-to-end validation script (`scripts/test_introspection_e2e.sh`) validates complete workflow.
|
||||
|
||||
**E2E Test Results** (Oct 2, 2025):
|
||||
```bash
✓ GetTestStatus RPC - Query test execution status
✓ ListTests RPC - Enumerate available tests
✓ GetTestResults RPC - Retrieve detailed results (YAML + JSON)
✓ Follow mode - Poll status until completion
✓ Category filtering - Filter tests by category
✓ Pagination - Limit number of results
```
|
||||
|
||||
**Status (Oct 2, 2025)**: 🟡 *Server-side RPCs complete; CLI + E2E pending*
|
||||
|
||||
## Progress Snapshot
|
||||
|
||||
- ✅ Proto definitions and service stubs added for `GetTestStatus`, `ListTests`, `GetTestResults`.
|
||||
- ✅ `TestManager` now records execution lifecycle, aggregates, logs, and metrics with thread-safe history trimming.
|
||||
- ✅ `ImGuiTestHarnessServiceImpl` implements the three RPC handlers, including pagination and status conversion helpers.
|
||||
- ⚠️ CLI wiring, automation client calls, and user-facing output still TODO.
|
||||
- ⚠️ End-to-end validation script (`scripts/test_introspection_e2e.sh`) not yet authored.
|
||||
|
||||
**Current Limitations**:
|
||||
- ❌ Tests execute asynchronously with no way to query status
|
||||
- ❌ Clients must poll blindly or give up early
|
||||
- ❌ No visibility into test execution queue
|
||||
- ❌ Results lost after test completion
|
||||
- ❌ Can't track test history or identify flaky tests
|
||||
|
||||
**Why This Blocks AI Agent Autonomy**
|
||||
|
||||
Without test introspection, **AI agents cannot implement closed-loop feedback**:
|
||||
|
||||
```
❌ BROKEN: AI Agent Without IT-05
1. AI generates commands: ["z3ed palette export ..."]
2. AI executes commands in sandbox
3. AI generates test: "Verify soldier is red"
4. AI runs test → Gets test_id
5. ??? AI has no way to check if test passed ???
6. AI presents proposal to user blindly
   (might be broken, AI doesn't know)

✅ WORKING: AI Agent With IT-05
1. AI generates commands
2. AI executes in sandbox
3. AI generates verification test
4. AI runs test → Gets test_id
5. AI polls: GetTestStatus(test_id)
6. Test FAILED? AI sees error + screenshot
7. AI adjusts strategy and retries
8. Test PASSED? AI presents successful proposal
```
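
Step 5 of the working loop above, polling until the test reaches a final state, could be sketched like this. The `TestStatus` enum mirrors the statuses named in this guide, but `PollUntilDone`, its parameters, and the callback shape are assumptions; a real client would issue the `GetTestStatus` RPC inside the callback.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Statuses as listed in this guide; the C++ enum itself is hypothetical.
enum class TestStatus { kUnknown, kQueued, kRunning, kPassed, kFailed, kTimeout };

// A test is finished once it has passed, failed, or timed out.
inline bool IsTerminal(TestStatus status) {
  return status == TestStatus::kPassed || status == TestStatus::kFailed ||
         status == TestStatus::kTimeout;
}

// Calls `get_status` up to `max_polls` times, sleeping `interval` between
// attempts, and returns the last status seen. The caller decides what to do
// if the result is still non-terminal (for example, give up or keep waiting).
inline TestStatus PollUntilDone(
    const std::function<TestStatus(const std::string&)>& get_status,
    const std::string& test_id, int max_polls,
    std::chrono::milliseconds interval) {
  TestStatus status = TestStatus::kUnknown;
  for (int attempt = 0; attempt < max_polls; ++attempt) {
    status = get_status(test_id);
    if (IsTerminal(status)) return status;
    std::this_thread::sleep_for(interval);
  }
  return status;
}
```

Bounding the number of polls keeps an agent from waiting forever on a test ID that never leaves the queue.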
|
||||
|
||||
**This is the difference between**:
|
||||
- **Dumb automation**: Execute blindly, hope for the best
|
||||
- **Intelligent agent**: Verify, learn, self-correct
|
||||
|
||||
**Benefits After IT-05**:
|
||||
- ✅ AI agents can reliably poll for test completion
|
||||
- ✅ AI agents can read detailed failure messages
|
||||
- ✅ AI agents can implement retry logic with adjusted strategies
|
||||
- ✅ CLI can show real-time progress bars
|
||||
- ✅ Test history enables trend analysis (flaky tests, performance regressions)
|
||||
- ✅ Foundation for test recording/replay (IT-07)
|
||||
- ✅ **Enables autonomous agent operation**
|
||||
|
||||
**Status**: 📋 Planned | Priority 1 | Time Estimate: 6-8 hours
|
||||
**Dependencies**: IT-01 Complete ✅, IT-02 Complete ✅
|
||||
**Blocking**: IT-06 (Widget Discovery needs introspection foundation)
|
||||
|
||||
## Overview
|
||||
|
||||
Add test introspection capabilities to enable clients to query test execution status, list available tests, and retrieve detailed results. This is critical for AI agents to reliably poll for test completion and make decisions based on results.
|
||||
|
||||
## Motivation
|
||||
|
||||
**Current Limitations**:
|
||||
- ❌ Tests execute asynchronously with no way to query status
|
||||
- ❌ Clients must poll blindly or give up early
|
||||
- ❌ No visibility into test execution queue
|
||||
- ❌ Results lost after test completion
|
||||
- ❌ Can't track test history or identify flaky tests
|
||||
|
||||
**Benefits After IT-05**
|
||||
|
||||
- ✅ AI agents can reliably poll for test completion
|
||||
- ✅ CLI can show real-time progress bars
|
||||
- ✅ Test history enables trend analysis
|
||||
- ✅ Foundation for test recording/replay (IT-07)
|
||||
|
||||
## Architecture
|
||||
|
||||
### New Service Components
|
||||
|
||||
```cpp
// src/app/core/test_manager.h
#include <cstdint>
#include <map>
#include <string>
#include <vector>

#include "absl/status/statusor.h"
#include "absl/synchronization/mutex.h"

class TestManager {
  // Existing...

 public:
  // NEW: Test tracking
  struct TestExecution {
    std::string test_id;
    std::string name;
    std::string category;
    TestStatus status;  // QUEUED, RUNNING, PASSED, FAILED, TIMEOUT
    int64_t queued_at_ms;
    int64_t started_at_ms;
    int64_t completed_at_ms;
    int32_t execution_time_ms;
    std::string error_message;
    std::vector<std::string> assertion_failures;
    std::vector<std::string> logs;
  };

  // NEW: Test execution tracking (public so RPC handlers can call these)
  absl::StatusOr<TestExecution> GetTestStatus(const std::string& test_id);
  std::vector<TestExecution> ListTests(const std::string& category_filter = "");
  absl::StatusOr<TestExecution> GetTestResults(const std::string& test_id);

 private:
  // NEW: Test execution history
  std::map<std::string, TestExecution> test_history_;
  absl::Mutex test_history_mutex_;  // Thread-safe access
};
```
|
||||
|
||||
### Proto Additions
|
||||
|
||||
```protobuf
|
||||
// src/app/core/proto/imgui_test_harness.proto
|
||||
|
||||
// Add to service definition
|
||||
service ImGuiTestHarness {
|
||||
// ... existing RPCs ...
|
||||
|
||||
// NEW: Test introspection
|
||||
rpc GetTestStatus(GetTestStatusRequest) returns (GetTestStatusResponse);
|
||||
rpc ListTests(ListTestsRequest) returns (ListTestsResponse);
|
||||
rpc GetTestResults(GetTestResultsRequest) returns (GetTestResultsResponse);
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// GetTestStatus - Query test execution state
|
||||
// ============================================================================
|
||||
|
||||
message GetTestStatusRequest {
|
||||
string test_id = 1; // Test ID from Click/Type/Wait/Assert response
|
||||
}
|
||||
|
||||
message GetTestStatusResponse {
|
||||
enum Status {
|
||||
UNKNOWN = 0; // Test ID not found
|
||||
QUEUED = 1; // Waiting to execute
|
||||
RUNNING = 2; // Currently executing
|
||||
PASSED = 3; // Completed successfully
|
||||
FAILED = 4; // Assertion failed or error
|
||||
TIMEOUT = 5; // Exceeded timeout
|
||||
}
|
||||
|
||||
Status status = 1;
|
||||
int64 queued_at_ms = 2; // When test was queued
|
||||
int64 started_at_ms = 3; // When test started (0 if not started)
|
||||
int64 completed_at_ms = 4; // When test completed (0 if not complete)
|
||||
int32 execution_time_ms = 5; // Total execution time
|
||||
string error_message = 6; // Error details if FAILED/TIMEOUT
|
||||
repeated string assertion_failures = 7; // Failed assertion details
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// ListTests - Enumerate available tests
|
||||
// ============================================================================
|
||||
|
||||
message ListTestsRequest {
|
||||
string category_filter = 1; // Optional: "grpc", "unit", "integration", "e2e"
|
||||
int32 page_size = 2; // Number of results per page (default 100)
|
||||
string page_token = 3; // Pagination token from previous response
|
||||
}
|
||||
|
||||
message ListTestsResponse {
|
||||
repeated TestInfo tests = 1;
|
||||
string next_page_token = 2; // Token for next page (empty if no more)
|
||||
int32 total_count = 3; // Total number of matching tests
|
||||
}
|
||||
|
||||
message TestInfo {
|
||||
string test_id = 1; // Unique test identifier
|
||||
string name = 2; // Human-readable test name
|
||||
string category = 3; // Category: grpc, unit, integration, e2e
|
||||
int64 last_run_timestamp_ms = 4; // When test last executed
|
||||
int32 total_runs = 5; // Total number of executions
|
||||
int32 pass_count = 6; // Number of successful runs
|
||||
int32 fail_count = 7; // Number of failed runs
|
||||
int32 average_duration_ms = 8; // Average execution time
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// GetTestResults - Retrieve detailed results
|
||||
// ============================================================================
|
||||
|
||||
message GetTestResultsRequest {
|
||||
string test_id = 1;
|
||||
bool include_logs = 2; // Include full execution logs
|
||||
}
|
||||
|
||||
message GetTestResultsResponse {
|
||||
bool success = 1; // Overall test result
|
||||
string test_name = 2;
|
||||
string category = 3;
|
||||
int64 executed_at_ms = 4;
|
||||
int32 duration_ms = 5;
|
||||
|
||||
// Detailed results
|
||||
repeated AssertionResult assertions = 6;
|
||||
repeated string logs = 7; // If include_logs=true
|
||||
|
||||
// Performance metrics
|
||||
map<string, int32> metrics = 8; // e.g., "frame_count": 123
|
||||
}
|
||||
|
||||
message AssertionResult {
|
||||
string description = 1;
|
||||
bool passed = 2;
|
||||
string expected_value = 3;
|
||||
string actual_value = 4;
|
||||
string error_message = 5;
|
||||
}
|
||||
```
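
The `page_size` / `page_token` contract in `ListTestsRequest` can be illustrated with a small in-memory sketch. It assumes the token is simply the decimal start offset of the next page, which is an illustrative choice and not necessarily what the harness does; `Page` and `PaginateIds` are hypothetical names that do not appear in the codebase.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: one page of IDs plus the token for the next page.
struct Page {
  std::vector<std::string> ids;
  std::string next_page_token;  // Empty when there are no more results.
  int total_count = 0;
};

// Sketch only: treats page_token as a decimal start offset into `all_ids`.
// An empty or malformed token starts from the beginning.
inline Page PaginateIds(const std::vector<std::string>& all_ids, int page_size,
                        const std::string& page_token) {
  if (page_size <= 0) page_size = 100;  // Default noted in the proto comment.

  std::size_t start = 0;
  if (!page_token.empty() && page_token.size() <= 9 &&
      page_token.find_first_not_of("0123456789") == std::string::npos) {
    start = static_cast<std::size_t>(std::stoul(page_token));
  }
  start = std::min(start, all_ids.size());
  const std::size_t end =
      std::min(all_ids.size(), start + static_cast<std::size_t>(page_size));

  Page page;
  page.ids.assign(all_ids.begin() + static_cast<std::ptrdiff_t>(start),
                  all_ids.begin() + static_cast<std::ptrdiff_t>(end));
  page.total_count = static_cast<int>(all_ids.size());
  if (end < all_ids.size()) page.next_page_token = std::to_string(end);
  return page;
}
```

A caller would keep requesting pages, passing back `next_page_token` each time, until the token comes back empty.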
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Extend TestManager (✔️ Completed)
|
||||
|
||||
**What changed**:
|
||||
- Introduced `HarnessTestExecution`, `HarnessTestSummary`, and related enums in `test_manager.h`.
|
||||
- Added registration, running, completion, log, and metric helpers with `absl::Mutex` guarding (`RegisterHarnessTest`, `MarkHarnessTestRunning`, `MarkHarnessTestCompleted`, etc.).
|
||||
- Stored executions in `harness_history_` + `harness_aggregates_` with deque-based trimming to avoid unbounded growth.
|
||||
|
||||
**Where to look**:
|
||||
- `src/app/test/test_manager.h` (see *Harness test introspection (IT-05)* section around `HarnessTestExecution`).
|
||||
- `src/app/test/test_manager.cc` (functions `RegisterHarnessTest`, `MarkHarnessTestCompleted`, `AppendHarnessTestLog`, `GetHarnessTestExecution`, `ListHarnessTestSummaries`).
|
||||
|
||||
**Next touch-ups**:
|
||||
- Consider persisting assertion metadata (expected/actual) so `GetTestResults` can populate richer `AssertionResult` entries.
|
||||
- Decide on retention limit (`harness_history_limit_`) tuning once CLI consumption patterns are known.
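
The bounded-history idea from Step 1 can be sketched in isolation. This is a simplified illustration, not the `TestManager` code: it uses `std::mutex` in place of `absl::Mutex`, and `BoundedHistory`, `Execution`, and their members are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, stripped-down stand-in for a harness execution record.
struct Execution {
  std::string test_id;
  bool passed = false;
};

class BoundedHistory {
 public:
  explicit BoundedHistory(std::size_t limit) : limit_(limit) {}

  // Records (or updates) an execution, then evicts the oldest entries so the
  // history never grows past `limit_`.
  void Record(const Execution& execution) {
    std::lock_guard<std::mutex> lock(mutex_);
    const bool inserted =
        by_id_.insert_or_assign(execution.test_id, execution).second;
    if (inserted) order_.push_back(execution.test_id);
    while (order_.size() > limit_) {
      by_id_.erase(order_.front());
      order_.pop_front();
    }
  }

  bool Contains(const std::string& test_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return by_id_.count(test_id) > 0;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return by_id_.size();
  }

 private:
  const std::size_t limit_;
  mutable std::mutex mutex_;
  std::deque<std::string> order_;  // Insertion order, oldest at the front.
  std::unordered_map<std::string, Execution> by_id_;
};
```

Keeping the insertion order in a deque makes eviction O(1) per entry, which is one plausible reason for the deque-based trimming mentioned above.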
|
||||
|
||||
#### 1.2 Update Existing RPC Handlers
|
||||
|
||||
**File**: `src/app/core/service/imgui_test_harness_service.cc`
|
||||
|
||||
Modify Click, Type, Wait, Assert handlers to record test execution:
|
||||
|
||||
```cpp
|
||||
absl::Status ImGuiTestHarnessServiceImpl::Click(
|
||||
const ClickRequest* request, ClickResponse* response) {
|
||||
|
||||
// Generate unique test ID
|
||||
std::string test_id = test_manager_->GenerateTestId("grpc_click");
|
||||
|
||||
// Record test start
|
||||
test_manager_->RecordTestStart(
|
||||
test_id,
|
||||
absl::StrFormat("Click: %s", request->target()),
|
||||
"grpc");
|
||||
|
||||
// ... existing implementation ...
|
||||
|
||||
// Record test completion
|
||||
if (success) {
|
||||
test_manager_->RecordTestComplete(test_id, TestManager::TestStatus::PASSED);
|
||||
} else {
|
||||
test_manager_->RecordTestComplete(
|
||||
test_id, TestManager::TestStatus::FAILED, error_message);
|
||||
}
|
||||
|
||||
// Add test ID to response (requires proto update)
|
||||
response->set_test_id(test_id);
|
||||
|
||||
return absl::OkStatus();
|
||||
}
|
||||
```
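
`GenerateTestId` is called above but its body is not shown in this guide. A minimal sketch of one way to produce IDs with a `grpc_click_` prefix follows; the timestamp-plus-counter scheme and the free function `MakeTestId` are assumptions for illustration only.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical helper: builds "<prefix>_<unix_millis>_<sequence>".
// The sequence number keeps IDs unique even within the same millisecond.
inline std::string MakeTestId(const std::string& prefix) {
  static std::atomic<std::uint64_t> sequence{0};
  const auto now = std::chrono::system_clock::now().time_since_epoch();
  const auto millis =
      std::chrono::duration_cast<std::chrono::milliseconds>(now).count();
  return prefix + "_" + std::to_string(millis) + "_" +
         std::to_string(sequence.fetch_add(1));
}
```

The atomic counter matters because several RPC handlers may request IDs concurrently.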
|
||||
|
||||
**Proto Update**: Add `test_id` field to all responses:
|
||||
|
||||
```protobuf
|
||||
message ClickResponse {
|
||||
bool success = 1;
|
||||
string message = 2;
|
||||
int32 execution_time_ms = 3;
|
||||
string test_id = 4; // NEW: Unique test identifier for introspection
|
||||
}
|
||||
|
||||
// Repeat for TypeResponse, WaitResponse, AssertResponse
|
||||
```
|
||||
|
||||
### Step 2: Implement Introspection RPCs (✔️ Completed)
|
||||
|
||||
**What changed**:
|
||||
- Added helper utilities (`ConvertHarnessStatus`, `ToUnixMillisSafe`, `ClampDurationToInt32`) in `imgui_test_harness_service.cc`.
|
||||
- Implemented `GetTestStatus`, `ListTests`, and `GetTestResults` with pagination, optional log inclusion, and structured metrics mapping.
|
||||
- Updated gRPC wrapper to surface new RPCs and translate Abseil status codes into gRPC codes.
|
||||
- Ensured deque-backed `DynamicTestData` keep-alive remains bounded while reusing new tracking helpers.
|
||||
|
||||
**Where to look**:
|
||||
- `src/app/core/service/imgui_test_harness_service.cc` (search for `GetTestStatus(`, `ListTests(`, `GetTestResults(`).
|
||||
- `src/app/core/service/imgui_test_harness_service.h` (new method declarations).
|
||||
|
||||
**Follow-ups**:
|
||||
- Expand `AssertionResult` population once `TestManager` captures structured expected/actual data.
|
||||
- Evaluate pagination defaults (`page_size`, `page_token`) once CLI usage patterns are seen.
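
The helper `ClampDurationToInt32` is named above, but its body is not reproduced here. The sketch below shows what such a clamp might look like using `std::chrono` instead of `absl::Duration`; `ClampMillisToInt32` is a hypothetical name, and the saturating behaviour is an assumption.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a helper like ClampDurationToInt32: converts a
// duration to whole milliseconds and saturates at the int32 range so the
// value always fits a proto int32 field such as execution_time_ms.
inline std::int32_t ClampMillisToInt32(std::chrono::nanoseconds duration) {
  const std::int64_t millis =
      std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
  constexpr std::int64_t kMax = std::numeric_limits<std::int32_t>::max();
  constexpr std::int64_t kMin = std::numeric_limits<std::int32_t>::min();
  if (millis > kMax) return static_cast<std::int32_t>(kMax);
  if (millis < kMin) return static_cast<std::int32_t>(kMin);
  return static_cast<std::int32_t>(millis);
}
```

Saturating instead of truncating avoids a long-running test wrapping around to a negative duration in the response.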
|
||||
|
||||
### Step 3: CLI Integration (🚧 TODO)
|
||||
|
||||
Goal: expose the new RPCs through `GuiAutomationClient` and user-facing `z3ed agent test` subcommands. The pseudo-code below illustrates the desired flow; implementation still pending.
|
||||
|
||||
**File**: `src/cli/handlers/agent.cc`
|
||||
|
||||
Add new CLI commands for test introspection:
|
||||
|
||||
```cpp
|
||||
// z3ed agent test status --test-id <id> [--follow]
|
||||
absl::Status HandleAgentTestStatus(const CommandOptions& options) {
|
||||
const std::string test_id = absl::GetFlag(FLAGS_test_id);
|
||||
const bool follow = absl::GetFlag(FLAGS_follow);
|
||||
|
||||
GuiAutomationClient client("localhost", 50052);
|
||||
RETURN_IF_ERROR(client.Connect());
|
||||
|
||||
while (true) {
|
||||
auto status_or = client.GetTestStatus(test_id);
|
||||
RETURN_IF_ERROR(status_or.status());
|
||||
|
||||
const auto& status = status_or.value();
|
||||
|
||||
// Print status
|
||||
std::cout << "Test ID: " << test_id << "\n";
|
||||
std::cout << "Status: " << StatusToString(status.status) << "\n";
|
||||
std::cout << "Execution Time: " << status.execution_time_ms << "ms\n";
|
||||
|
||||
if (status.status == TestStatus::PASSED ||
|
||||
status.status == TestStatus::FAILED ||
|
||||
status.status == TestStatus::TIMEOUT) {
|
||||
break; // Terminal state
|
||||
}
|
||||
|
||||
if (!follow) break;
|
||||
|
||||
// Poll every 500ms
|
||||
absl::SleepFor(absl::Milliseconds(500));
|
||||
}
|
||||
|
||||
return absl::OkStatus();
|
||||
}
|
||||
|
||||
// z3ed agent test results --test-id <id> [--format json] [--include-logs]
|
||||
absl::Status HandleAgentTestResults(const CommandOptions& options) {
|
||||
const std::string test_id = absl::GetFlag(FLAGS_test_id);
|
||||
const std::string format = absl::GetFlag(FLAGS_format);
|
||||
const bool include_logs = absl::GetFlag(FLAGS_include_logs);
|
||||
|
||||
GuiAutomationClient client("localhost", 50052);
|
||||
RETURN_IF_ERROR(client.Connect());
|
||||
|
||||
auto results_or = client.GetTestResults(test_id, include_logs);
|
||||
RETURN_IF_ERROR(results_or.status());
|
||||
|
||||
const auto& results = results_or.value();
|
||||
|
||||
if (format == "json") {
|
||||
// Output JSON
|
||||
PrintTestResultsJson(results);
|
||||
} else {
|
||||
// Output YAML (default)
|
||||
PrintTestResultsYaml(results);
|
||||
}
|
||||
|
||||
return absl::OkStatus();
|
||||
}
|
||||
|
||||
// z3ed agent test list [--category <name>] [--status <filter>]
|
||||
absl::Status HandleAgentTestList(const CommandOptions& options) {
|
||||
const std::string category = absl::GetFlag(FLAGS_category);
|
||||
const std::string status_filter = absl::GetFlag(FLAGS_status);
|
||||
|
||||
GuiAutomationClient client("localhost", 50052);
|
||||
RETURN_IF_ERROR(client.Connect());
|
||||
|
||||
auto tests_or = client.ListTests(category);
|
||||
RETURN_IF_ERROR(tests_or.status());
|
||||
|
||||
const auto& tests = tests_or.value();
|
||||
|
||||
// Print table
|
||||
std::cout << "=== Test List ===\n\n";
|
||||
std::cout << absl::StreamFormat("%-20s %-30s %-10s %-10s\n",
|
||||
"Test ID", "Name", "Category", "Status");
|
||||
std::cout << std::string(80, '-') << "\n";
|
||||
|
||||
for (const auto& test : tests) {
|
||||
std::cout << absl::StreamFormat("%-20s %-30s %-10s %-10s\n",
|
||||
test.test_id, test.name, test.category,
|
||||
StatusToString(test.last_status));
|
||||
}
|
||||
|
||||
return absl::OkStatus();
|
||||
}
|
||||
```
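
The `--follow` loop above can be factored into a small, testable helper. The sketch below is independent of gRPC: `PollUntilTerminal` and `IsTerminal` are hypothetical names, the `QUEUED` and `RUNNING` enumerators are assumed, and the `max_polls` cap is an addition that the pseudo-code above does not have.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Assumed status values; only PASSED, FAILED, and TIMEOUT appear in the
// pseudo-code above.
enum class TestStatus { QUEUED, RUNNING, PASSED, FAILED, TIMEOUT };

inline bool IsTerminal(TestStatus status) {
  return status == TestStatus::PASSED || status == TestStatus::FAILED ||
         status == TestStatus::TIMEOUT;
}

// Calls `fetch` until it reports a terminal status or `max_polls` calls have
// been made, sleeping `interval` between calls. Returns the last status seen.
inline TestStatus PollUntilTerminal(const std::function<TestStatus()>& fetch,
                                    std::chrono::milliseconds interval,
                                    int max_polls) {
  TestStatus status = fetch();
  for (int i = 1; i < max_polls && !IsTerminal(status); ++i) {
    std::this_thread::sleep_for(interval);
    status = fetch();
  }
  return status;
}
```

Injecting `fetch` as a callable keeps the polling logic testable without a running harness; the CLI handler would pass a lambda that wraps the `GetTestStatus` call.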
|
||||
|
||||
### Step 4: Testing & Validation (🚧 TODO)
|
||||
|
||||
#### Test Script: `scripts/test_introspection_e2e.sh`
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# Test introspection API
|
||||
|
||||
set -e
|
||||
|
||||
# Start YAZE
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness \
|
||||
--test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
YAZE_PID=$!
|
||||
sleep 3
|
||||
|
||||
# Test 1: Run a test and capture test ID
|
||||
echo "Test 1: GetTestStatus"
|
||||
TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id')
|
||||
echo "Test ID: $TEST_ID"
|
||||
|
||||
# Test 2: Poll for status
|
||||
echo "Test 2: Poll status"
|
||||
z3ed agent test status --test-id $TEST_ID --follow
|
||||
|
||||
# Test 3: Get results
|
||||
echo "Test 3: Get results"
|
||||
z3ed agent test results --test-id $TEST_ID --format yaml --include-logs
|
||||
|
||||
# Test 4: List all tests
|
||||
echo "Test 4: List tests"
|
||||
z3ed agent test list --category grpc
|
||||
|
||||
# Cleanup
|
||||
kill $YAZE_PID
|
||||
```
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [x] All 3 new RPCs respond correctly ✅
|
||||
- [x] Test IDs returned in Click/Type/Wait/Assert responses ✅
|
||||
- [x] Status polling works with `--follow` flag ✅
|
||||
- [x] Test history persists across multiple test runs ✅
|
||||
- [x] CLI commands output clean YAML/JSON ✅
|
||||
- [x] No memory leaks in test history tracking (bounded deque + pruning) ✅
|
||||
- [x] Thread-safe access to test history (mutex-protected) ✅
|
||||
- [x] Documentation updated in `E6-z3ed-reference.md` ✅
|
||||
- [x] E2E test script validates complete workflow ✅
|
||||
|
||||
## Migration Guide
|
||||
|
||||
**For Existing Code**:
|
||||
- No breaking changes - new RPCs only
|
||||
- Existing tests continue to work
|
||||
- Test ID field added to responses (backwards compatible)
|
||||
|
||||
**For CLI Users**:
|
||||
```bash
|
||||
# Old: Test runs, no way to check status
|
||||
z3ed agent test --prompt "Open Overworld"
|
||||
|
||||
# New: Get test ID, poll for status
|
||||
TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id')
|
||||
z3ed agent test status --test-id $TEST_ID --follow
|
||||
z3ed agent test results --test-id $TEST_ID
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
After IT-05 completion:
|
||||
1. **IT-06**: Widget Discovery API (uses introspection foundation)
|
||||
2. **IT-07**: Test Recording & Replay (records test IDs and results)
|
||||
3. **IT-08**: Enhanced Error Reporting (captures test context on failure)
|
||||
|
||||
## References
|
||||
|
||||
- **Proto Definition**: `src/app/core/proto/imgui_test_harness.proto`
|
||||
- **Test Manager**: `src/app/test/test_manager.{h,cc}`
|
||||
- **RPC Service**: `src/app/core/service/imgui_test_harness_service.{h,cc}`
|
||||
- **CLI Handlers**: `src/cli/handlers/agent.cc`
|
||||
- **Main Plan**: `docs/z3ed/E6-z3ed-implementation-plan.md`
|
||||
|
||||
---
|
||||
|
||||
**Author**: @scawful, GitHub Copilot
|
||||
**Created**: October 2, 2025
|
||||
**Status**: In progress (server-side complete; CLI + E2E pending)
|
||||
@@ -1,830 +0,0 @@
|
||||
# IT-08: Enhanced Error Reporting Implementation Guide
|
||||
|
||||
**Status**: IT-08a Complete ✅ | IT-08b Complete ✅ | IT-08c Complete ✅
|
||||
**Date**: October 2, 2025
|
||||
**Overall Progress**: 100% of core phases complete (3 of 3); IT-08d and IT-08e remain planned
|
||||
|
||||
---
|
||||
|
||||
## Phase Overview
|
||||
|
||||
| Phase | Task | Status | Time | Description |
|-------|------|--------|------|-------------|
| IT-08a | Screenshot RPC | ✅ Complete | 1.5h | SDL-based screenshot capture |
| IT-08b | Auto-Capture on Failure | ✅ Complete | 1.5h | Integrate with TestManager |
| IT-08c | Widget State Dumps | ✅ Complete | 45m | Capture UI context on failure |
| IT-08d | Error Envelope Standardization | 📋 Planned | 1-2h | Unified error format across services |
| IT-08e | CLI Error Improvements | 📋 Planned | 1h | Rich error output with artifacts |
|
||||
|
||||
**Total Estimated Time**: 5-7 hours
|
||||
**Time Spent**: 3.75 hours
|
||||
**Time Remaining**: 0 hours (Core phases complete)
|
||||
|
||||
---
|
||||
|
||||
## IT-08a: Screenshot RPC ✅ COMPLETE
|
||||
|
||||
**Date Completed**: October 2, 2025
|
||||
**Time**: 1.5 hours
|
||||
|
||||
### Implementation Summary
|
||||
|
||||
### What Was Built
|
||||
|
||||
Implemented the `Screenshot` RPC in the ImGuiTestHarness service with the following capabilities:
|
||||
|
||||
1. **SDL Renderer Integration**: Accesses the ImGui SDL2 backend renderer through `BackendRendererUserData`
|
||||
2. **Framebuffer Capture**: Uses `SDL_RenderReadPixels` to capture the full window contents (1536x864, 32-bit ARGB)
|
||||
3. **BMP File Output**: Saves screenshots as BMP files using SDL's built-in `SDL_SaveBMP` function
|
||||
4. **Flexible Paths**: Supports custom output paths or auto-generates timestamped filenames (`/tmp/yaze_screenshot_<timestamp>.bmp`)
|
||||
5. **Response Metadata**: Returns file path, file size (bytes), and image dimensions
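
The "flexible paths" behaviour in item 4 can be sketched as a small pure function. `ResolveScreenshotPath` is a hypothetical name, and the millisecond resolution of the timestamp is an assumption; the guide only states that the default filename is timestamped.

```cpp
#include <chrono>
#include <string>

// Hypothetical helper: returns `requested` when it is non-empty, otherwise a
// timestamped default of the form /tmp/yaze_screenshot_<timestamp>.bmp.
inline std::string ResolveScreenshotPath(const std::string& requested) {
  if (!requested.empty()) return requested;
  const auto now = std::chrono::system_clock::now().time_since_epoch();
  const auto millis =
      std::chrono::duration_cast<std::chrono::milliseconds>(now).count();
  return "/tmp/yaze_screenshot_" + std::to_string(millis) + ".bmp";
}
```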
|
||||
|
||||
### Technical Implementation
|
||||
|
||||
**Location**: `src/app/core/service/imgui_test_harness_service.cc`
|
||||
|
||||
```cpp
|
||||
// Helper struct matching imgui_impl_sdlrenderer2.cpp backend data
|
||||
struct ImGui_ImplSDLRenderer2_Data {
|
||||
SDL_Renderer* Renderer;
|
||||
};
|
||||
|
||||
absl::Status ImGuiTestHarnessServiceImpl::Screenshot(
|
||||
const ScreenshotRequest* request, ScreenshotResponse* response) {
|
||||
// 1. Get SDL renderer from ImGui backend
|
||||
ImGuiIO& io = ImGui::GetIO();
|
||||
auto* backend_data = static_cast<ImGui_ImplSDLRenderer2_Data*>(io.BackendRendererUserData);
|
||||
|
||||
if (!backend_data || !backend_data->Renderer) {
|
||||
response->set_success(false);
|
||||
response->set_message("SDL renderer not available");
|
||||
// ... remainder of the implementation is truncated in the source ...
```
|
||||
|
||||
## IT-08b: Auto-Capture on Test Failure ✅ COMPLETE
|
||||
|
||||
**Date Completed**: October 2, 2025
|
||||
**Artifacts**: `CaptureFailureContext`, `screenshot_utils.{h,cc}`, CLI introspection updates
|
||||
|
||||
### Highlights
|
||||
|
||||
- **Shared SDL helper**: New `CaptureHarnessScreenshot()` centralizes renderer capture and writes BMP files into `${TMPDIR}/yaze/test-results/<test_id>/`.
- **TestManager integration**: Failure context now records ImGui window/nav state, widget hierarchy (`CaptureWidgetState`), and screenshot metadata while keeping `HarnessTestExecution` aggregates in sync.
- **Graceful fallbacks**: When `YAZE_WITH_GRPC` is disabled, we emit a harness log noting that screenshot capture is unavailable.
- **End-user surfacing**: `GuiAutomationClient::GetTestResults` and `z3ed agent test results` expose `screenshot_path`, `screenshot_size_bytes`, `failure_context`, and `widget_state` in both YAML and JSON modes.
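
The per-test artifact directory layout can be sketched with `std::filesystem`. `ArtifactDirFor` is a hypothetical helper, and falling back to the platform temp directory when `TMPDIR` is unset is an assumption, since the guide does not say what happens in that case.

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

// Hypothetical helper: builds <root>/yaze/test-results/<test_id>, where root
// is $TMPDIR when set and the platform temp directory otherwise.
inline std::filesystem::path ArtifactDirFor(const std::string& test_id) {
  const char* tmpdir = std::getenv("TMPDIR");
  const std::filesystem::path root =
      (tmpdir != nullptr && *tmpdir != '\0')
          ? std::filesystem::path(tmpdir)
          : std::filesystem::temp_directory_path();
  return root / "yaze" / "test-results" / test_id;
}
```

Before writing a screenshot, a caller would typically create the directory with `std::filesystem::create_directories` and check the returned error code.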
|
||||
|
||||
### Key Touch Points
|
||||
|
||||
| File | Purpose |
|------|---------|
| `src/app/core/service/screenshot_utils.{h,cc}` | SDL renderer capture reused by RPC + auto-capture |
| `src/app/test/test_manager.cc` | Auto-capture pipeline with per-test artifact directories |
| `src/app/core/service/imgui_test_harness_service.cc` | Screenshot RPC delegates to shared helper |
| `src/cli/service/gui_automation_client.*` | Propagates new proto fields to CLI |
| `src/cli/handlers/agent/test_commands.cc` | Presents diagnostics to users/agents |
|
||||
|
||||
### Validation Checklist
|
||||
|
||||
```bash
|
||||
# Build (needs YAZE_WITH_GRPC=ON)
|
||||
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
|
||||
|
||||
# Start harness
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness --test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# Queue a failing automation step
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"target":"button:DoesNotExist","type":"LEFT"}' \
|
||||
localhost:50052 yaze.test.ImGuiTestHarness/Click
|
||||
|
||||
# Fetch diagnostics
|
||||
z3ed agent test results --test-id <captured_id> --include-logs --format yaml
|
||||
|
||||
# Inspect artifact directory
|
||||
ls ${TMPDIR}/yaze/test-results/<captured_id>/
|
||||
```
|
||||
|
||||
You should see a `.bmp` failure screenshot, widget JSON in the CLI output, and logs noting the auto-capture event. When the helper fails (e.g., renderer not ready), the harness log and CLI output record the failure reason.
|
||||
|
||||
### Next Steps
|
||||
|
||||
- Wire the same helper into HTML bundle generation (IT-08c follow-up).
|
||||
- Add configurable artifact root (`--error-artifact-dir`) for CI separation.
|
||||
- Consider PNG encoding via `stb_image_write` if file size becomes an issue.
|
||||
|
||||
---
|
||||
### Technical Implementation
|
||||
|
||||
**Location**: `src/app/test/test_manager.{h,cc}`
|
||||
|
||||
**Key Changes**:
|
||||
|
||||
```cpp
|
||||
// In HarnessTestExecution struct
|
||||
struct HarnessTestExecution {
|
||||
// ... existing fields ...
|
||||
|
||||
// IT-08b: Failure diagnostics
|
||||
std::string screenshot_path;
|
||||
int64_t screenshot_size_bytes = 0;
|
||||
std::string failure_context;
|
||||
std::string widget_state; // IT-08c (future)
|
||||
};
|
||||
|
||||
// In MarkHarnessTestCompleted()
|
||||
if (status == HarnessTestStatus::kFailed ||
|
||||
status == HarnessTestStatus::kTimeout) {
|
||||
lock.Release();
|
||||
CaptureFailureContext(test_id);
|
||||
lock.Acquire();
|
||||
}
|
||||
|
||||
// CaptureFailureContext implementation
|
||||
void TestManager::CaptureFailureContext(const std::string& test_id) {
|
||||
absl::MutexLock lock(&harness_history_mutex_);
|
||||
auto it = harness_history_.find(test_id);
|
||||
if (it == harness_history_.end()) {
|
||||
return;
|
||||
}
|
||||
|
||||
HarnessTestExecution& execution = it->second;
|
||||
|
||||
// Capture execution context
|
||||
if (ImGui::GetCurrentContext() != nullptr) {
|
||||
ImGuiWindow* current_window = ImGui::GetCurrentWindow();
|
||||
const char* window_name = current_window ? current_window->Name : "none";
|
||||
ImGuiID active_id = ImGui::GetActiveID();
|
||||
|
||||
execution.failure_context = absl::StrFormat(
|
||||
"Frame: %d, Active Window: %s, Focused Widget: 0x%08X",
|
||||
ImGui::GetFrameCount(), window_name, active_id);
|
||||
}
|
||||
|
||||
// Set screenshot path placeholder
|
||||
execution.screenshot_path = absl::StrFormat(
|
||||
"/tmp/yaze_test_%s_failure.bmp", test_id);
|
||||
}
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
The implementation will be validated when tests fail:
|
||||
|
||||
```bash
|
||||
# 1. Build with changes
|
||||
cmake --build build-grpc-test --target yaze -j8
|
||||
|
||||
# 2. Start test harness
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness --test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# 3. Trigger a failing test
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
|
||||
|
||||
# 4. Query test results
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"test_id":"grpc_click_<timestamp>","include_logs":true}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
|
||||
```
|
||||
|
||||
**Expected Response**:
|
||||
```json
|
||||
{
|
||||
"success": false,
|
||||
"testName": "Click nonexistent_widget",
|
||||
"category": "grpc",
|
||||
"executedAtMs": "1696357200000",
|
||||
"durationMs": 150,
|
||||
"screenshotPath": "/tmp/yaze/test-results/grpc_click_12345678/failure_1696357200000.bmp",
|
||||
"failureContext": "Frame: 1234, Active Window: Main Window, Focused Widget: 0x00000000"
|
||||
}
|
||||
```
|
||||
|
||||
### Success Criteria
|
||||
|
||||
- ✅ Failure context captured automatically on test failures
|
||||
- ✅ Screenshot path stored in test history
|
||||
- ✅ GetTestResults RPC returns failure diagnostics
|
||||
- ✅ No deadlocks (mutex released before calling CaptureFailureContext)
|
||||
- ✅ Proto schema updated with new fields
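
The "no deadlocks" criterion comes down to never calling a function that takes the history mutex while already holding it. A minimal sketch of that pattern follows, using `std::mutex` rather than `absl::Mutex`; the `History` class, its `Entry` type, and the placeholder context string are hypothetical.

```cpp
#include <map>
#include <mutex>
#include <string>

class History {
 public:
  // Updates the entry under the lock, then releases the lock before invoking
  // the capture helper, which acquires the same (non-recursive) mutex.
  void MarkCompleted(const std::string& test_id, bool failed) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      entries_[test_id].failed = failed;
    }  // Lock released here.
    if (failed) CaptureFailureContext(test_id);
  }

  std::string FailureContext(const std::string& test_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto it = entries_.find(test_id);
    return it == entries_.end() ? std::string() : it->second.failure_context;
  }

 private:
  struct Entry {
    bool failed = false;
    std::string failure_context;
  };

  // Takes the lock itself, so callers must not hold it when calling this.
  void CaptureFailureContext(const std::string& test_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto it = entries_.find(test_id);
    if (it == entries_.end()) return;
    it->second.failure_context = "captured";  // Placeholder diagnostic text.
  }

  mutable std::mutex mutex_;
  std::map<std::string, Entry> entries_;
};
```

Scoping the first lock in its own block makes the release point explicit; the trade-off is that another thread may observe the entry between the status update and the context capture.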
|
||||
|
||||
### Retro Notes
|
||||
|
||||
- Placeholder screenshot paths have been replaced by the shared helper that writes into `${TMPDIR}/yaze/test-results/<test_id>/` and records byte sizes.
- Widget state capture (IT-08c) is now invoked directly from `CaptureFailureContext`, removing the TODOs from the original plan.
|
||||
|
||||
---
|
||||
|
||||
## IT-08b: Auto-Capture on Test Failure (Original Plan, Superseded)
|
||||
|
||||
**Goal**: Automatically capture screenshots and context when tests fail
|
||||
**Time Estimate**: 1-1.5 hours
|
||||
**Status**: Ready to implement
|
||||
|
||||
### Implementation Plan
|
||||
|
||||
#### Step 1: Modify TestManager (30 minutes)
|
||||
|
||||
**File**: `src/app/core/test_manager.cc`
|
||||
|
||||
Add screenshot capture in `MarkHarnessTestCompleted()`:
|
||||
|
||||
```cpp
|
||||
void TestManager::MarkHarnessTestCompleted(const std::string& test_id,
|
||||
ImGuiTestStatus status) {
|
||||
auto& history_entry = test_history_[test_id];
|
||||
history_entry.status = status;
|
||||
history_entry.end_time = absl::Now();
|
||||
history_entry.execution_time_ms = absl::ToInt64Milliseconds(
|
||||
history_entry.end_time - history_entry.start_time);
|
||||
|
||||
// Auto-capture screenshot on failure
|
||||
if (status == ImGuiTestStatus_Error || status == ImGuiTestStatus_Warning) {
|
||||
CaptureFailureContext(test_id);
|
||||
}
|
||||
}
|
||||
|
||||
void TestManager::CaptureFailureContext(const std::string& test_id) {
|
||||
auto& history_entry = test_history_[test_id];
|
||||
|
||||
// 1. Capture screenshot
|
||||
std::string screenshot_path =
|
||||
absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id);
|
||||
|
||||
if (harness_service_) {
|
||||
ScreenshotRequest req;
|
||||
req.set_output_path(screenshot_path);
|
||||
|
||||
ScreenshotResponse resp;
|
||||
auto status = harness_service_->Screenshot(&req, &resp);
|
||||
|
||||
if (status.ok()) {
|
||||
history_entry.screenshot_path = resp.file_path();
|
||||
history_entry.screenshot_size_bytes = resp.file_size_bytes();
|
||||
}
|
||||
}
|
||||
|
||||
// 2. Capture widget state (IT-08c)
|
||||
// history_entry.widget_state = CaptureWidgetState();
|
||||
|
||||
// 3. Capture execution context
|
||||
history_entry.failure_context = absl::StrFormat(
|
||||
"Frame: %d, Active Window: %s, Focused Widget: 0x%08X",
|
||||
ImGui::GetFrameCount(),
|
||||
ImGui::GetCurrentWindow() ? ImGui::GetCurrentWindow()->Name : "none",
|
||||
ImGui::GetActiveID());
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 2: Update TestHistory Structure (15 minutes)
|
||||
|
||||
**File**: `src/app/core/test_manager.h`
|
||||
|
||||
Add failure context fields:
|
||||
|
||||
```cpp
|
||||
struct TestHistory {
|
||||
std::string test_id;
|
||||
std::string test_name;
|
||||
ImGuiTestStatus status;
|
||||
absl::Time start_time;
|
||||
absl::Time end_time;
|
||||
int64_t execution_time_ms;
|
||||
std::vector<std::string> logs;
|
||||
std::map<std::string, std::string> metrics;
|
||||
|
||||
// IT-08b: Failure diagnostics
|
||||
std::string screenshot_path;
|
||||
int64_t screenshot_size_bytes = 0;
|
||||
std::string failure_context;
|
||||
std::string widget_state; // IT-08c
|
||||
};
|
||||
```
|
||||
|
||||
#### Step 3: Update GetTestResults RPC (30 minutes)
|
||||
|
||||
**File**: `src/app/core/service/imgui_test_harness_service.cc`
|
||||
|
||||
Include screenshot path in results:
|
||||
|
||||
```cpp
|
||||
absl::Status ImGuiTestHarnessServiceImpl::GetTestResults(
|
||||
const GetTestResultsRequest* request,
|
||||
GetTestResultsResponse* response) {
|
||||
|
||||
const auto& history = test_manager_->GetTestHistory(request->test_id());
|
||||
|
||||
// ... existing result population ...
|
||||
|
||||
// Add failure diagnostics
|
||||
if (!history.screenshot_path.empty()) {
|
||||
response->set_screenshot_path(history.screenshot_path);
|
||||
response->set_screenshot_size_bytes(history.screenshot_size_bytes);
|
||||
}
|
||||
|
||||
if (!history.failure_context.empty()) {
|
||||
response->set_failure_context(history.failure_context);
|
||||
}
|
||||
|
||||
return absl::OkStatus();
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Update Proto Schema (15 minutes)
|
||||
|
||||
**File**: `src/app/core/proto/imgui_test_harness.proto`
|
||||
|
||||
Add fields to GetTestResultsResponse:
|
||||
|
||||
```proto
|
||||
message GetTestResultsResponse {
|
||||
string test_id = 1;
|
||||
TestStatus status = 2;
|
||||
int64 execution_time_ms = 3;
|
||||
repeated string logs = 4;
|
||||
map<string, string> metrics = 5;
|
||||
|
||||
// IT-08b: Failure diagnostics
|
||||
string screenshot_path = 6;
|
||||
int64 screenshot_size_bytes = 7;
|
||||
string failure_context = 8;
|
||||
string widget_state = 9; // IT-08c
|
||||
}
|
||||
```
|
||||
|
||||
### Testing
|
||||
|
||||
```bash
|
||||
# 1. Build with changes
|
||||
cmake --build build-grpc-test --target yaze -j8
|
||||
|
||||
# 2. Start test harness
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness --test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# 3. Trigger a failing test
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
|
||||
|
||||
# 4. Check for screenshot
|
||||
ls -lh /tmp/yaze_test_*_failure.bmp
|
||||
|
||||
# 5. Query test results
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"test_id":"grpc_click_<timestamp>"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
|
||||
|
||||
# Expected: screenshot_path and failure_context populated
|
||||
```
|
||||
|
||||
### Success Criteria
|
||||
|
||||
- ✅ Screenshots auto-captured on test failure
|
||||
- ✅ Screenshot path stored in test history
|
||||
- ✅ GetTestResults returns screenshot metadata
|
||||
- ✅ No performance impact on passing tests
|
||||
- ✅ Screenshots cleaned up after test completion (optional)
|
||||
|
||||
---
|
||||
|
||||
## IT-08c: Widget State Dumps ✅ COMPLETE
|
||||
|
||||
**Date Completed**: October 2, 2025
|
||||
**Time**: 45 minutes
|
||||
|
||||
### Implementation Summary
|
||||
|
||||
Successfully implemented comprehensive widget state capture for test failure diagnostics.
|
||||
|
||||
### What Was Built
|
||||
|
||||
1. **Widget State Capture Utility** (`widget_state_capture.h/cc`):
|
||||
- Created dedicated service for capturing ImGui widget hierarchy and state
|
||||
- JSON serialization for structured output
|
||||
- Comprehensive state snapshot including windows, widgets, input, and navigation
|
||||
|
||||
2. **State Information Captured**:
|
||||
- Frame count and frame rate
|
||||
- Focused window and widget IDs
|
||||
- Hovered widget ID
|
||||
- List of visible windows
|
||||
- Open popups
|
||||
- Navigation state (nav ID, active state)
|
||||
- Mouse state (buttons, position)
|
||||
- Keyboard modifiers (Ctrl, Shift, Alt)
|
||||
|
||||
3. **TestManager Integration**:
|
||||
- Widget state automatically captured in `CaptureFailureContext()`
|
||||
- State stored in `HarnessTestExecution::widget_state`
|
||||
- Logged for debugging visibility
|
||||
|
||||
4. **Build System Integration**:
|
||||
- Added widget_state_capture sources to app.cmake
|
||||
- Integrated with gRPC build configuration
|
||||
|
||||
### Technical Implementation
|
||||
|
||||
**Location**: `src/app/core/widget_state_capture.{h,cc}`
|
||||
|
||||
**Key Features**:
|
||||
|
||||
```cpp
|
||||
struct WidgetState {
|
||||
std::string focused_window;
|
||||
std::string focused_widget;
|
||||
std::string hovered_widget;
|
||||
std::vector<std::string> visible_windows;
|
||||
std::vector<std::string> open_popups;
|
||||
int frame_count;
|
||||
float frame_rate;
|
||||
ImGuiID nav_id;
|
||||
bool nav_active;
|
||||
bool mouse_down[5];
|
||||
float mouse_pos_x, mouse_pos_y;
|
||||
bool ctrl_pressed, shift_pressed, alt_pressed;
|
||||
};
|
||||
|
||||
std::string CaptureWidgetState() {
|
||||
// Captures full ImGui context state
|
||||
// Returns JSON-formatted string
|
||||
}
|
||||
```
|
||||
|
||||
**Integration in TestManager**:
|
||||
|
||||
```cpp
|
||||
void TestManager::CaptureFailureContext(const std::string& test_id) {
|
||||
// ... capture execution context ...
|
||||
|
||||
// Widget state capture (IT-08c)
|
||||
execution.widget_state = core::CaptureWidgetState();
|
||||
|
||||
util::logf("[TestManager] Widget state: %s",
|
||||
execution.widget_state.c_str());
|
||||
}
|
||||
```
|
||||
|
||||
### Output Example
|
||||
|
||||
```json
|
||||
{
|
||||
"frame_count": 1234,
|
||||
"frame_rate": 60.0,
|
||||
"focused_window": "Overworld Editor",
|
||||
"focused_widget": "0x12345678",
|
||||
"hovered_widget": "0x87654321",
|
||||
"visible_windows": [
|
||||
"Main Window",
|
||||
"Overworld Editor",
|
||||
"Debug"
|
||||
],
|
||||
"open_popups": [],
|
||||
"navigation": {
|
||||
"nav_id": "0x00000000",
|
||||
"nav_active": false
|
||||
},
|
||||
"input": {
|
||||
"mouse_buttons": [false, false, false, false, false],
|
||||
"mouse_pos": [1024.5, 768.3],
|
||||
"modifiers": {
|
||||
"ctrl": false,
|
||||
"shift": false,
|
||||
"alt": false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
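
To make the shape of this output concrete, here is a dependency-free sketch that serializes a small subset of the fields. It is not the project's serializer (the plan in this guide uses `nlohmann::json`); `WidgetSnapshot`, `JsonEscape`, and `ToJson` are hypothetical names, and only three fields are covered.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Escapes the characters that JSON requires to be escaped inside strings.
inline std::string JsonEscape(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      case '\t': out += "\\t"; break;
      default:
        if (c < 0x20) {
          char buf[7];  // "\uXXXX" plus the terminating null.
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical subset of the widget state shown above.
struct WidgetSnapshot {
  int frame_count = 0;
  std::string focused_window;
  std::vector<std::string> visible_windows;
};

// Emits compact JSON (no whitespace) for the snapshot.
inline std::string ToJson(const WidgetSnapshot& s) {
  std::string out = "{\"frame_count\":" + std::to_string(s.frame_count) +
                    ",\"focused_window\":\"" + JsonEscape(s.focused_window) +
                    "\",\"visible_windows\":[";
  for (std::size_t i = 0; i < s.visible_windows.size(); ++i) {
    if (i > 0) out += ",";
    out += "\"" + JsonEscape(s.visible_windows[i]) + "\"";
  }
  out += "]}";
  return out;
}
```

Escaping matters here because ImGui window names are arbitrary strings and may contain quotes or `##` ID suffixes; a JSON library handles this automatically, which is a good reason to prefer one in real code.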
|
||||
|
||||
### Testing
|
||||
|
||||
Widget state capture will be automatically triggered on test failures:
|
||||
|
||||
```bash
|
||||
# 1. Build with new code
|
||||
cmake --build build-grpc-test --target yaze -j8
|
||||
|
||||
# 2. Start test harness
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness --test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# 3. Trigger a failing test
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
|
||||
|
||||
# 4. Query results - will include widget_state field
|
||||
grpcurl -plaintext \
|
||||
-import-path src/app/core/proto \
|
||||
-proto imgui_test_harness.proto \
|
||||
-d '{"test_id":"<test_id>","include_logs":true}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
|
||||
```
|
||||
|
||||
### Success Criteria
|
||||
|
||||
- ✅ Widget state capture utility implemented
|
||||
- ✅ JSON serialization working
|
||||
- ✅ Integrated with TestManager failure capture
|
||||
- ✅ Added to build system
|
||||
- ✅ Comprehensive state information captured
|
||||
- ✅ Proto schema already supports widget_state field
|
||||
|
||||
### Benefits for Debugging
|
||||
|
||||
The widget state dump provides critical context for debugging test failures:
|
||||
- **UI State**: Know exactly which windows/widgets were visible
|
||||
- **Focus State**: Understand what had input focus
|
||||
- **Input State**: See mouse and keyboard state at failure time
|
||||
- **Navigation**: Track ImGui navigation state
|
||||
- **Frame Timing**: Frame count and rate for timing issues
|
||||
|
||||
---
|
||||
|
||||
## IT-08c: Widget State Dumps (Original Plan, Superseded)
|
||||
|
||||
**Goal**: Capture UI hierarchy and state on test failures
|
||||
**Time Estimate**: 30-45 minutes
|
||||
**Status**: Specification phase
|
||||
|
||||
### Implementation Plan
|
||||
|
||||
#### Step 1: Create Widget State Capture Utility (30 minutes)
|
||||
|
||||
**File**: `src/app/core/widget_state_capture.h` (new file)
|
||||
|
||||
```cpp
|
||||
#ifndef YAZE_CORE_WIDGET_STATE_CAPTURE_H
|
||||
#define YAZE_CORE_WIDGET_STATE_CAPTURE_H
|
||||
|
||||
#include <string>
#include <vector>  // WidgetState uses std::vector
|
||||
#include "imgui/imgui.h"
|
||||
|
||||
namespace yaze {
|
||||
namespace core {
|
||||
|
||||
struct WidgetState {
|
||||
std::string focused_window;
|
||||
std::string focused_widget;
|
||||
std::string hovered_widget;
|
||||
std::vector<std::string> visible_windows;
|
||||
std::vector<std::string> open_menus;
|
||||
std::string active_popup;
|
||||
};
|
||||
|
||||
std::string CaptureWidgetState();
|
||||
std::string SerializeWidgetStateToJson(const WidgetState& state);
|
||||
|
||||
} // namespace core
|
||||
} // namespace yaze
|
||||
|
||||
#endif
|
||||
```
|
||||
|
||||
**File**: `src/app/core/widget_state_capture.cc` (new file)
|
||||
|
||||
```cpp
|
||||
#include "src/app/core/widget_state_capture.h"
|
||||
#include "absl/strings/str_format.h"
|
||||
#include "imgui/imgui_internal.h"  // ImGuiWindow, ImGuiContext, GetActiveID()
#include "nlohmann/json.hpp"
|
||||
|
||||
namespace yaze {
|
||||
namespace core {
|
||||
|
||||
std::string CaptureWidgetState() {
|
||||
WidgetState state;
|
||||
|
||||
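// Capture focused window (NavWindow is the window holding focus;
// GetCurrentWindow() would return the window currently being built)
ImGuiWindow* current = ImGui::GetCurrentContext()->NavWindow;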
|
||||
if (current) {
|
||||
state.focused_window = current->Name;
|
||||
}
|
||||
|
||||
// Capture active widget
|
||||
ImGuiID active_id = ImGui::GetActiveID();
|
||||
if (active_id != 0) {
|
||||
state.focused_widget = absl::StrFormat("ID_%u", active_id);
|
||||
}
|
||||
|
||||
// Capture hovered widget
|
||||
ImGuiID hovered_id = ImGui::GetHoveredID();
|
||||
if (hovered_id != 0) {
|
||||
state.hovered_widget = absl::StrFormat("ID_%u", hovered_id);
|
||||
}
|
||||
|
||||
// Traverse window list
|
||||
ImGuiContext* ctx = ImGui::GetCurrentContext();
|
||||
for (ImGuiWindow* window : ctx->Windows) {
|
||||
if (window->Active && !window->Hidden) {
|
||||
state.visible_windows.push_back(window->Name);
|
||||
}
|
||||
}
|
||||
|
||||
return SerializeWidgetStateToJson(state);
|
||||
}
|
||||
|
||||
std::string SerializeWidgetStateToJson(const WidgetState& state) {
|
||||
nlohmann::json j;
|
||||
j["focused_window"] = state.focused_window;
|
||||
j["focused_widget"] = state.focused_widget;
|
||||
j["hovered_widget"] = state.hovered_widget;
|
||||
j["visible_windows"] = state.visible_windows;
|
||||
j["open_menus"] = state.open_menus;
|
||||
j["active_popup"] = state.active_popup;
|
||||
return j.dump(2); // Pretty print with indent
|
||||
}
|
||||
|
||||
} // namespace core
|
||||
} // namespace yaze
|
||||
```
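
The serialization step can be exercised without an ImGui context or the JSON library. The following is a minimal, dependency-free sketch under stated assumptions: `WidgetStateLite`, `EscapeJson`, and `ToJsonLine` are hypothetical names, only two fields are shown, and the output is compact rather than pretty-printed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Trimmed-down stand-in for the WidgetState struct above (assumption: two
// fields are enough to illustrate the idea).
struct WidgetStateLite {
  std::string focused_window;
  std::vector<std::string> visible_windows;
};

// Escapes quotes, backslashes, and newlines. Other control characters are
// passed through unchanged in this sketch.
std::string EscapeJson(const std::string& in) {
  std::string out;
  for (char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      default:   out += c;      break;
    }
  }
  return out;
}

// Serializes the state to a compact, single-line JSON object.
std::string ToJsonLine(const WidgetStateLite& state) {
  std::string json = "{\"focused_window\":\"" +
                     EscapeJson(state.focused_window) +
                     "\",\"visible_windows\":[";
  for (std::size_t i = 0; i < state.visible_windows.size(); ++i) {
    if (i > 0) json += ",";
    json += "\"" + EscapeJson(state.visible_windows[i]) + "\"";
  }
  json += "]}";
  return json;
}
```

Keeping capture and serialization separate, as the plan already does, is what makes this kind of test possible: the serializer only needs a filled-in struct.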
|
||||
|
||||
#### Step 2: Integrate with TestManager (15 minutes)
|
||||
|
||||
Update `CaptureFailureContext()` in `test_manager.cc`:
|
||||
|
||||
```cpp
|
||||
void TestManager::CaptureFailureContext(const std::string& test_id) {
|
||||
auto& history_entry = test_history_[test_id];
|
||||
|
||||
// 1. Screenshot (IT-08b)
|
||||
// ... existing code ...
|
||||
|
||||
// 2. Widget state (IT-08c)
|
||||
history_entry.widget_state = core::CaptureWidgetState();
|
||||
|
||||
// 3. Execution context
|
||||
// ... existing code ...
|
||||
}
|
||||
```
|
||||
|
||||
### Output Example
|
||||
|
||||
```json
|
||||
{
|
||||
"focused_window": "Overworld Editor",
|
||||
"focused_widget": "ID_12345",
|
||||
"hovered_widget": "ID_67890",
|
||||
"visible_windows": [
|
||||
"Main Window",
|
||||
"Overworld Editor",
|
||||
"Palette Editor"
|
||||
],
|
||||
"open_menus": [],
|
||||
"active_popup": ""
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## IT-08d: Error Envelope Standardization 📋 PLANNED
|
||||
|
||||
**Goal**: Unified error format across z3ed, TestManager, EditorManager
|
||||
**Time Estimate**: 1-2 hours
|
||||
**Status**: Design phase
|
||||
|
||||
### Proposed Error Envelope
|
||||
|
||||
```cpp
|
||||
// Shared error structure
|
||||
struct ErrorContext {
|
||||
absl::Status status;
|
||||
std::string component; // "TestHarness", "EditorManager", "z3ed"
|
||||
std::string operation; // "Click", "LoadROM", "RunTest"
|
||||
std::map<std::string, std::string> metadata;
|
||||
std::vector<std::string> artifact_paths; // Screenshots, logs, etc.
|
||||
std::string actionable_hint; // User-facing suggestion
|
||||
};
|
||||
```
|
||||
|
||||
### Integration Points
|
||||
|
||||
1. **TestManager**: Wrap failures in ErrorContext
|
||||
2. **EditorManager**: Use ErrorContext for all operations
|
||||
3. **z3ed CLI**: Parse ErrorContext and format for display
|
||||
4. **ProposalDrawer**: Display ErrorContext in GUI modal
|
||||
|
||||
---
|
||||
|
||||
## IT-08e: CLI Error Improvements 📋 PLANNED
|
||||
|
||||
**Goal**: Rich error output in z3ed CLI
|
||||
**Time Estimate**: 1 hour
|
||||
**Status**: Design phase
|
||||
|
||||
### Enhanced CLI Output
|
||||
|
||||
```bash
|
||||
$ z3ed agent test --prompt "Open Overworld editor"
|
||||
|
||||
❌ Test Failed: grpc_click_1696357200
|
||||
Component: ImGuiTestHarness
|
||||
Operation: Click widget "Overworld"
|
||||
|
||||
Error: Widget not found
|
||||
|
||||
Artifacts:
|
||||
• Screenshot: /tmp/yaze_test_grpc_click_1696357200_failure.bmp
|
||||
• Widget State: /tmp/yaze_test_grpc_click_1696357200_state.json
|
||||
• Logs: /tmp/yaze_test_grpc_click_1696357200.log
|
||||
|
||||
Context:
|
||||
• Visible Windows: Main Window, Debug
|
||||
• Focused Window: Main Window
|
||||
• Active Widget: None
|
||||
|
||||
Suggestion:
|
||||
→ Check if ROM is loaded (File → Open ROM)
|
||||
→ Verify Overworld editor button is visible
|
||||
→ Use 'z3ed agent gui discover' to list available widgets
|
||||
```
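
One way the sectioned output above could be produced is sketched below. This is an illustration only: `ErrorReport` is a simplified, hypothetical stand-in for the proposed `ErrorContext` (the `absl::Status` is reduced to a plain message string), and `FormatErrorReport` is an assumed helper name, not an existing z3ed function.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for the proposed ErrorContext (assumption).
struct ErrorReport {
  std::string test_id;
  std::string component;
  std::string operation;
  std::string message;
  std::vector<std::string> artifact_paths;
  std::vector<std::string> hints;
};

// Renders the report section by section; empty sections are omitted.
std::string FormatErrorReport(const ErrorReport& r) {
  std::ostringstream out;
  out << "Test Failed: " << r.test_id << "\n"
      << "  Component: " << r.component << "\n"
      << "  Operation: " << r.operation << "\n\n"
      << "Error: " << r.message << "\n";
  if (!r.artifact_paths.empty()) {
    out << "\nArtifacts:\n";
    for (const auto& path : r.artifact_paths) out << "  - " << path << "\n";
  }
  if (!r.hints.empty()) {
    out << "\nSuggestion:\n";
    for (const auto& hint : r.hints) out << "  -> " << hint << "\n";
  }
  return out.str();
}
```

Omitting empty sections keeps the output short for failures that produced no artifacts, which matters when the CLI is driven by scripts.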
|
||||
|
||||
---
|
||||
|
||||
## Progress Tracking
|
||||
|
||||
### Completed ✅
|
||||
- IT-08a: Screenshot RPC (1.5 hours)
|
||||
- IT-08b: Auto-capture on failure (1.5 hours)
|
||||
- IT-08c: Widget state dumps (45 minutes)
|
||||
|
||||
### In Progress 🔄
|
||||
- None - Core error reporting complete
|
||||
|
||||
### Planned 📋
|
||||
- IT-08d: Error envelope standardization (optional enhancement)
|
||||
- IT-08e: CLI error improvements (optional enhancement)
|
||||
|
||||
### Time Investment
|
||||
- **Spent**: 3.75 hours (IT-08a + IT-08b + IT-08c)
|
||||
- **Remaining**: 0 hours for core phases
|
||||
- **Total**: 3.75 hours vs 5-7 hours estimated (under budget ✅)
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
**IT-08 Core Complete** ✅
|
||||
|
||||
All three core phases of IT-08 (Enhanced Error Reporting) are now complete:
|
||||
1. ✅ Screenshot capture via SDL
|
||||
2. ✅ Auto-capture on test failure
|
||||
3. ✅ Widget state dumps
|
||||
|
||||
**Optional Enhancements** (IT-08d/e - not blocking):
|
||||
- Error envelope standardization across services
|
||||
- CLI error output improvements
|
||||
- HTML error report generation
|
||||
|
||||
**Recommended Next Priority**: IT-09 (CI/CD Integration) or IT-06 (Widget Discovery API)
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- **Implementation Plan**: [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)
|
||||
- **Test Harness Guide**: [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)
|
||||
- **Source Files**:
|
||||
- `src/app/core/service/imgui_test_harness_service.cc`
|
||||
- `src/app/core/test_manager.{h,cc}`
|
||||
- `src/app/core/proto/imgui_test_harness.proto`
|
||||
|
||||
---
|
||||
|
||||
**Last Updated**: October 2, 2025
|
||||
**Current Phase**: IT-08 core phases complete (IT-08a, IT-08b, IT-08c)
**Overall Progress**: 100% Complete (3 of 3 core phases)
|
||||
|
||||
---
|
||||
|
||||
**Report Generated**: October 2, 2025
|
||||
**Author**: GitHub Copilot (AI Assistant)
|
||||
**Project**: YAZE - Yet Another Zelda3 Editor
|
||||
**Component**: z3ed CLI Tool - Test Automation Harness
|
||||
@@ -1,298 +0,0 @@
|
||||
# LLM Integration Implementation Checklist
|
||||
|
||||
**Created**: October 3, 2025
|
||||
**Status**: Ready to Begin
|
||||
**Estimated Time**: 12-15 hours total
|
||||
|
||||
> 📋 **Main Guide**: See [LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md) for detailed implementation instructions.
|
||||
|
||||
## Phase 1: Ollama Local Integration (4-6 hours) ✅ COMPLETE
|
||||
|
||||
### Prerequisites
|
||||
- [x] Install Ollama: `brew install ollama` (macOS)
|
||||
- [x] Start Ollama server: `ollama serve`
|
||||
- [x] Pull recommended model: `ollama pull qwen2.5-coder:7b`
|
||||
- [x] Test connectivity: `curl http://localhost:11434/api/tags`
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
#### 1.1 Create OllamaAIService Class
|
||||
- [x] Create `src/cli/service/ollama_ai_service.h`
|
||||
- [x] Define `OllamaConfig` struct
|
||||
- [x] Declare `OllamaAIService` class with `GetCommands()` override
|
||||
- [x] Add `CheckAvailability()` and `ListAvailableModels()` methods
|
||||
- [x] Create `src/cli/service/ollama_ai_service.cc`
|
||||
- [x] Implement constructor with config
|
||||
- [x] Implement `BuildSystemPrompt()` with z3ed command documentation
|
||||
- [x] Implement `CheckAvailability()` with health check
|
||||
- [x] Implement `GetCommands()` with Ollama API call
|
||||
- [x] Add JSON parsing for command extraction
|
||||
- [x] Add error handling for connection failures
|
||||
|
||||
#### 1.2 Update CMake Configuration
|
||||
- [x] Add `YAZE_WITH_HTTPLIB` option to `CMakeLists.txt`
|
||||
- [x] Add httplib detection (vcpkg or bundled)
|
||||
- [x] Add compile definition `YAZE_WITH_HTTPLIB`
|
||||
- [x] Update z3ed target to link httplib when available
|
||||
|
||||
#### 1.3 Wire into Agent Commands
|
||||
- [x] Update `src/cli/handlers/agent/general_commands.cc`
|
||||
- [x] Add `#include "cli/service/ollama_ai_service.h"`
|
||||
- [x] Create `CreateAIService()` helper function
|
||||
- [x] Implement provider selection logic (env vars)
|
||||
- [x] Add health check with fallback to MockAIService
|
||||
- [x] Update `HandleRunCommand()` to use service factory
|
||||
- [x] Update `HandlePlanCommand()` to use service factory
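
The provider selection and fallback described in 1.3 might be sketched as follows. This is not the actual `CreateAIService()` code: the function returns a provider name instead of a service object, `GEMINI_API_KEY` is an assumed variable name, and the rule "no provider set and no key means mock" is an illustrative default.

```cpp
#include <functional>
#include <string>

// Looks up an environment variable; returns an empty string when unset.
using EnvLookup = std::function<std::string(const std::string&)>;
// Reports whether the named provider passed its health check.
using HealthCheck = std::function<bool(const std::string&)>;

// Picks a provider name. An explicit YAZE_AI_PROVIDER wins; otherwise a
// Gemini key selects "gemini"; otherwise "mock". Any provider that fails
// its health check falls back to "mock".
std::string SelectProvider(const EnvLookup& env, const HealthCheck& is_healthy) {
  std::string provider = env("YAZE_AI_PROVIDER");
  if (provider.empty()) {
    provider = env("GEMINI_API_KEY").empty() ? "mock" : "gemini";
  }
  if (provider != "mock" && !is_healthy(provider)) {
    return "mock";  // Graceful fallback, mirroring MockAIService.
  }
  return provider;
}
```

Passing the environment lookup as a callback keeps the logic testable; in production it could wrap `std::getenv`.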
|
||||
|
||||
#### 1.4 Testing & Validation
|
||||
- [x] Create `scripts/test_ollama_integration.sh`
|
||||
- [x] Check Ollama server availability
|
||||
- [x] Verify model is pulled
|
||||
- [x] Test `z3ed agent run` with simple prompt
|
||||
- [x] Verify proposal creation
|
||||
- [x] Review generated commands
|
||||
- [x] Run end-to-end test
|
||||
- [x] Document any issues encountered
|
||||
|
||||
### Success Criteria
|
||||
- [x] `z3ed agent run --prompt "Validate ROM"` generates correct command
|
||||
- [x] Health check reports clear errors when Ollama unavailable
|
||||
- [x] Service fallback to MockAIService works correctly
|
||||
- [x] Test script passes without manual intervention
|
||||
|
||||
**Status:** ✅ Complete - See [PHASE1-COMPLETE.md](PHASE1-COMPLETE.md)
|
||||
|
||||
---
|
||||
|
||||
## Phase 2: Improve Gemini Integration (2-3 hours) ✅ COMPLETE
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
#### 2.1 Fix GeminiAIService
|
||||
- [x] Update `src/cli/service/gemini_ai_service.h`
|
||||
- [x] Add `GeminiConfig` struct with model, temperature, max_tokens
|
||||
- [x] Add health check methods
|
||||
- [x] Update constructor signature
|
||||
- [x] Update `src/cli/service/gemini_ai_service.cc`
|
||||
- [x] Fix system instruction format (separate field in v1beta API)
|
||||
- [x] Update to use `gemini-2.5-flash` model
|
||||
- [x] Add generation config (temperature, maxOutputTokens)
|
||||
- [x] Add `responseMimeType: application/json` for structured output
|
||||
- [x] Implement markdown code block stripping
|
||||
- [x] Add `CheckAvailability()` with API key validation
|
||||
- [x] Improve error messages with actionable guidance
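
Markdown code block stripping is needed because models often wrap a JSON array in a fenced block even when asked not to. A minimal sketch of that step, assuming a single fence pair around the whole response (`StripMarkdownFence` is a hypothetical name, not the function used in `gemini_ai_service.cc`):

```cpp
#include <cstddef>
#include <string>

// Removes a leading ```lang line and a trailing ``` fence when both are
// present. Input without a fence is returned with surrounding whitespace
// trimmed.
std::string StripMarkdownFence(const std::string& text) {
  const std::string ws = " \t\r\n";
  const std::size_t first = text.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const std::size_t last = text.find_last_not_of(ws);
  std::string body = text.substr(first, last - first + 1);

  const std::string fence = "```";
  if (body.size() >= 2 * fence.size() &&
      body.compare(0, fence.size(), fence) == 0 &&
      body.compare(body.size() - fence.size(), fence.size(), fence) == 0) {
    const std::size_t line_end = body.find('\n');
    if (line_end == std::string::npos) return body;  // Single line: leave as-is.
    // Keep what lies between the opening fence line and the closing fence.
    body = body.substr(line_end + 1,
                       body.size() - fence.size() - (line_end + 1));
    const std::size_t end = body.find_last_not_of(ws);
    body = (end == std::string::npos) ? "" : body.substr(0, end + 1);
  }
  return body;
}
```

The result can then be handed to the JSON parser; responses with prose before or after the fence would need a more tolerant search.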
|
||||
|
||||
#### 2.2 Wire into Service Factory
|
||||
- [x] Update `CreateAIService()` to use `GeminiConfig`
|
||||
- [x] Add Gemini health check with fallback
|
||||
- [x] Add `GEMINI_MODEL` environment variable support
|
||||
- [x] Test with graceful fallback
|
||||
|
||||
#### 2.3 Testing
|
||||
- [x] Create `scripts/test_gemini_integration.sh`
|
||||
- [x] Test graceful fallback without API key
|
||||
- [x] Test error handling (invalid key, network issues)
|
||||
- [ ] Test with real API key (pending)
|
||||
- [ ] Verify JSON array parsing (pending)
|
||||
- [ ] Test various prompts (pending)
|
||||
|
||||
### Success Criteria
|
||||
- [x] Gemini service compiles and builds
|
||||
- [x] Service factory integration works
|
||||
- [x] Graceful fallback to MockAIService
|
||||
- [ ] Gemini generates valid command arrays (pending API key)
|
||||
- [ ] Markdown stripping works reliably (pending API key)
|
||||
- [x] Error messages guide user to API key setup
|
||||
|
||||
**Status:** ✅ Complete (build & integration) - See [PHASE2-COMPLETE.md](PHASE2-COMPLETE.md)
|
||||
**Pending:** Real API key validation
|
||||
|
||||
---
|
||||
|
||||
## Phase 3: Add Claude Integration (2-3 hours)
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
#### 3.1 Create ClaudeAIService
|
||||
- [ ] Create `src/cli/service/claude_ai_service.h`
|
||||
- [ ] Define class with API key constructor
|
||||
- [ ] Add `GetCommands()` override
|
||||
- [ ] Create `src/cli/service/claude_ai_service.cc`
|
||||
- [ ] Implement Claude Messages API call
|
||||
- [ ] Use `claude-3-5-sonnet-20241022` model
|
||||
- [ ] Add markdown stripping
|
||||
- [ ] Add error handling
|
||||
|
||||
#### 3.2 Wire into Service Factory
|
||||
- [ ] Update `CreateAIService()` to check for `CLAUDE_API_KEY`
|
||||
- [ ] Add Claude as provider option
|
||||
|
||||
#### 3.3 Testing
|
||||
- [ ] Test with various prompts
|
||||
- [ ] Compare output quality vs Gemini/Ollama
|
||||
|
||||
### Success Criteria
|
||||
- [ ] Claude service works interchangeably with others
|
||||
- [ ] Quality comparable or better than Gemini
|
||||
|
||||
---
|
||||
|
||||
## Phase 4: Enhanced Prompt Engineering (3-4 hours) ✅ COMPLETE
|
||||
|
||||
### Implementation Tasks
|
||||
|
||||
#### 4.1 Create PromptBuilder Utility
|
||||
- [x] Create `src/cli/service/prompt_builder.h`
|
||||
- [x] Create `src/cli/service/prompt_builder.cc`
|
||||
- [x] Implement `LoadResourceCatalogue()` (with hardcoded docs for now)
|
||||
- [x] Implement `BuildSystemPrompt()` with full command docs
|
||||
- [x] Implement `BuildFewShotExamplesSection()` with proven examples
|
||||
- [x] Implement `BuildContextPrompt()` with ROM state foundation
|
||||
- [x] Add default few-shot examples (6+ examples)
|
||||
- [x] Add command documentation (palette, overworld, sprite, dungeon, rom)
|
||||
- [x] Add tile ID reference (tree, house, water, grass)
|
||||
- [x] Add constraints section (output format, syntax rules)
|
||||
|
||||
#### 4.2 Integrate into Services
|
||||
- [x] Update OllamaAIService to use PromptBuilder
|
||||
- [x] Add PromptBuilder include
|
||||
- [x] Add use_enhanced_prompting flag (default: true)
|
||||
- [x] Use BuildSystemInstructionWithExamples()
|
||||
- [x] Update GeminiAIService to use PromptBuilder
|
||||
- [x] Add PromptBuilder include
|
||||
- [x] Add use_enhanced_prompting flag (default: true)
|
||||
- [x] Use BuildSystemInstructionWithExamples()
|
||||
- [ ] Update ClaudeAIService to use PromptBuilder (pending Phase 3)
|
||||
|
||||
#### 4.3 Testing
|
||||
- [x] Create test script (test_enhanced_prompting.sh)
|
||||
- [ ] Test with complex prompts (pending real API validation)
|
||||
- [ ] Measure accuracy improvement (pending validation)
|
||||
- [ ] Document which models perform best (pending validation)
|
||||
|
||||
### Success Criteria
|
||||
- [x] PromptBuilder utility class implemented
|
||||
- [x] Few-shot examples included (6+ examples)
|
||||
- [x] Command documentation complete
|
||||
- [x] Tile ID reference included
|
||||
- [x] Integrated into Ollama & Gemini
|
||||
- [x] Enabled by default
|
||||
- [ ] System prompts include full resource catalogue (pending yaml loading)
|
||||
- [ ] Few-shot examples raise command accuracy above 90% (pending validation)
|
||||
- [ ] Context injection provides relevant ROM info (foundation in place)
|
||||
|
||||
**Status:** ✅ Complete (implementation) - See [PHASE4-COMPLETE.md](PHASE4-COMPLETE.md)
|
||||
**Pending:** Real API validation to measure accuracy improvement
|
||||
|
||||
---
|
||||
|
||||
## Configuration & Documentation
|
||||
|
||||
### Environment Variables Setup
|
||||
- [ ] Document `YAZE_AI_PROVIDER` options
|
||||
- [ ] Document `OLLAMA_MODEL` override
|
||||
- [ ] Document API key requirements
|
||||
- [ ] Create example `.env` file
|
||||
|
||||
### User Documentation
|
||||
- [ ] Create `docs/z3ed/AI-SERVICE-SETUP.md`
|
||||
- [ ] Ollama quick start
|
||||
- [ ] Gemini setup guide
|
||||
- [ ] Claude setup guide
|
||||
- [ ] Troubleshooting section
|
||||
- [ ] Update README with LLM setup instructions
|
||||
- [ ] Add examples to main docs
|
||||
|
||||
### CLI Enhancements
|
||||
- [ ] Add `--ai-provider` flag to override env
|
||||
- [ ] Add `--ai-model` flag to override model
|
||||
- [ ] Add `--dry-run` flag to see commands without executing
|
||||
- [ ] Add `--interactive` flag to confirm each command
|
||||
|
||||
---
|
||||
|
||||
## Testing Matrix
|
||||
|
||||
| Provider | Model | Test Prompt | Expected Commands | Status |
|
||||
|----------|-------|-------------|-------------------|--------|
|
||||
| Ollama | qwen2.5-coder:7b | "Validate ROM" | `["rom validate --rom zelda3.sfc"]` | ⬜ |
|
||||
| Ollama | codellama:13b | "Export first palette" | `["palette export ..."]` | ⬜ |
|
||||
| Gemini | gemini-2.5-flash | "Make soldiers red" | `["palette export ...", "palette set-color ...", ...]` | ⬜ |
|
||||
| Claude | claude-3-5-sonnet-20241022 | "Change tile at (10,20)" | `["overworld set-tile ..."]` | ⬜ |
|
||||
|
||||
---
|
||||
|
||||
## Rollout Plan
|
||||
|
||||
### Week 1 (Oct 6-10, 2025)
|
||||
- **Monday**: Phase 1 implementation (OllamaAIService class)
|
||||
- **Tuesday**: Phase 1 CMake + wiring
|
||||
- **Wednesday**: Phase 1 testing + documentation
|
||||
- **Thursday**: Phase 2 (Gemini fixes)
|
||||
- **Friday**: Buffer day + code review
|
||||
|
||||
### Week 2 (Oct 13-17, 2025)
|
||||
- **Monday**: Phase 3 (Claude integration)
|
||||
- **Tuesday**: Phase 4 (PromptBuilder)
|
||||
- **Wednesday**: Enhanced testing across all services
|
||||
- **Thursday**: Documentation completion
|
||||
- **Friday**: User validation + demos
|
||||
|
||||
---
|
||||
|
||||
## Known Risks & Mitigation
|
||||
|
||||
| Risk | Impact | Likelihood | Mitigation |
|
||||
|------|--------|------------|------------|
|
||||
| Ollama not available on CI | Medium | Low | Add `YAZE_AI_PROVIDER=mock` for CI builds |
|
||||
| LLM output format inconsistent | High | Medium | Strict system prompts + validation layer |
|
||||
| API rate limits | Medium | Medium | Cache responses, implement retry backoff |
|
||||
| Model accuracy insufficient | High | Low | Multiple few-shot examples + prompt tuning |
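
The retry backoff named as the rate-limit mitigation could follow a capped exponential schedule. The sketch below only computes the delay; `BackoffDelay` is a hypothetical helper, and the 500 ms base and 8 s cap are illustrative assumptions rather than tuned values.

```cpp
#include <algorithm>
#include <chrono>

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
std::chrono::milliseconds BackoffDelay(
    int attempt,
    std::chrono::milliseconds base = std::chrono::milliseconds(500),
    std::chrono::milliseconds cap = std::chrono::milliseconds(8000)) {
  std::chrono::milliseconds delay = base;
  // Stop doubling once the cap is reached so large attempts cannot overflow.
  for (int i = 0; i < attempt && delay < cap; ++i) delay *= 2;
  return std::min(delay, cap);
}
```

A caller would sleep for `BackoffDelay(n)` before the n-th retry; adding random jitter is a common refinement when several clients share one API key.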
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
**Phase 1 Complete**:
|
||||
- ✅ Ollama service operational on local machine
|
||||
- ✅ Can generate valid z3ed commands from prompts
|
||||
- ✅ End-to-end test passes
|
||||
|
||||
**Phase 2-3 Complete**:
|
||||
- ✅ All three providers (Ollama, Gemini, Claude) work interchangeably
|
||||
- ✅ Service selection transparent to user
|
||||
|
||||
**Phase 4 Complete**:
|
||||
- ✅ Command accuracy >90% on standard prompts
|
||||
- ✅ Resource catalogue integrated into system prompts
|
||||
|
||||
**Production Ready**:
|
||||
- ✅ Documentation complete with setup guides
|
||||
- ✅ Error messages are actionable
|
||||
- ✅ Works on macOS (primary target)
|
||||
- ✅ At least one user validates the workflow
|
||||
|
||||
---
|
||||
|
||||
## Next Steps After Completion
|
||||
|
||||
1. **Gather User Feedback**: Share with ROM hacking community
|
||||
2. **Measure Accuracy**: Track success rate of generated commands
|
||||
3. **Model Comparison**: Document which models work best
|
||||
4. **Fine-Tuning**: Consider fine-tuning local models on z3ed corpus
|
||||
5. **Agentic Loop**: Add self-correction based on execution results
|
||||
|
||||
---
|
||||
|
||||
## Notes & Observations
|
||||
|
||||
_Add notes here as you progress through implementation:_
|
||||
|
||||
-
|
||||
-
|
||||
-
|
||||
|
||||
---
|
||||
|
||||
**Last Updated**: October 3, 2025
|
||||
**Next Review**: After Phase 1 completion
|
||||
@@ -1,477 +0,0 @@
|
||||
# Overworld & Dungeon AI Integration Plan
|
||||
|
||||
**Date**: October 3, 2025
|
||||
**Status**: 🎯 Design Phase
|
||||
**Focus**: Practical tile16 editing and ResourceLabels awareness
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This document outlines the strategic shift from general-purpose ROM editing to **specialized overworld and dungeon AI workflows**. The focus is on practical, visual editing with accept/reject flows that leverage the existing tile16 editor and ResourceLabels system.
|
||||
|
||||
## Vision: AI-Driven Visual Editing
|
||||
|
||||
### Why Overworld/Dungeon Focus?
|
||||
|
||||
**Overworld Canvas Editing** is ideal for AI because:
|
||||
1. **Simple Data Model**: Just tile16 IDs on a 32x32 grid per screen (512x512 pixels)
|
||||
2. **Visual Feedback**: Immediate preview of changes
|
||||
3. **Reversible**: Easy accept/reject workflow
|
||||
4. **Common Use Case**: Most ROM hacks modify overworld layout
|
||||
5. **Safe Sandbox**: Changes don't affect game logic
|
||||
|
||||
**Dungeon Editing** is the next logical step:
|
||||
1. **Structured Data**: Rooms, objects, sprites, entrances
|
||||
2. **ResourceLabels**: User-defined names make AI navigation intuitive
|
||||
3. **No Preview Yet**: AI can still generate valid data
|
||||
4. **Complex Workflows**: Requires AI to understand relationships
|
||||
|
||||
## Architecture: Tile16 Accept/Reject Workflow
|
||||
|
||||
### Current State
|
||||
- ✅ Tile16Editor fully implemented (`src/app/editor/overworld/tile16_editor.{h,cc}`)
|
||||
- ✅ Overworld canvas displays tile16 grid (32x32 tile16s per screen)
|
||||
- ✅ Tile16 IDs are stored as 16-bit values (valid range 0x000 to 0xFFF)
|
||||
- ✅ Changes update blockset bitmap in real-time
|
||||
- ⚠️ **Missing**: Proposal-based workflow for AI edits
|
||||
|
||||
### Proposed Workflow
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ User: "Add a river flowing from north to south on map 0" │
|
||||
└────────────────────┬────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────┐
|
||||
│ AI Service (Gemini/Ollama) │
|
||||
│ - Understands "river" │
|
||||
│ - Knows water tile16 IDs │
|
||||
│ - Plans tile placement │
|
||||
└──────────────┬───────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────┐
|
||||
│ Generate Tile16 Proposal (JSON) │
|
||||
│ { │
|
||||
│ "map": 0, │
|
||||
│ "changes": [ │
|
||||
│ {"x": 10, "y": 0, "tile": 0x14C}, │ ← Water top
|
||||
│ {"x": 10, "y": 1, "tile": 0x14D}, │ ← Water middle
|
||||
│ {"x": 10, "y": 2, "tile": 0x14D}, │
|
||||
│ {"x": 10, "y": 30, "tile": 0x14E} │ ← Water bottom
|
||||
│ ] │
|
||||
│ } │
|
||||
└──────────────┬───────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────┐
|
||||
│ Apply to Sandbox ROM (Preview) │
|
||||
│ - Load map 0 from sandbox ROM │
|
||||
│ - Apply tile16 changes │
|
||||
│ - Render preview bitmap │
|
||||
│ - Generate diff image (before/after) │
|
||||
└──────────────┬───────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────┐
|
||||
│ Display to User │
|
||||
│ ┌─────────┬──────────┬─────────┐ │
|
||||
│ │ Before │ Changes │ After │ │
|
||||
│ │ [Image] │ +47 │ [Image] │ │
|
||||
│ │ │ tiles │ │ │
|
||||
│ └─────────┴──────────┴─────────┘ │
|
||||
│ │
|
||||
│ [Accept] [Reject] [Modify] │
|
||||
└──────────────┬───────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────┐
|
||||
│ User Decision │
|
||||
│ ✓ Accept → Write to main ROM │
|
||||
│ ✗ Reject → Discard sandbox changes │
|
||||
│ ✎ Modify → Adjust proposal parameters │
|
||||
└──────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Implementation Components
|
||||
|
||||
#### 1. Tile16ProposalGenerator
|
||||
**File**: `src/cli/service/tile16_proposal_generator.{h,cc}`
|
||||
|
||||
```cpp
|
||||
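#include <chrono>
#include <cstdint>
#include <string>
#include <vector>

#include "absl/status/status.h"
#include "absl/status/statusor.h"

struct Tile16Change {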
|
||||
int map_id;
|
||||
int x; // Tile16 X coordinate (0-63 typically)
|
||||
int y; // Tile16 Y coordinate (0-63 typically)
|
||||
uint16_t old_tile; // Original tile16 ID
|
||||
uint16_t new_tile; // New tile16 ID
|
||||
};
|
||||
|
||||
struct Tile16Proposal {
|
||||
std::string id; // Unique proposal ID
|
||||
std::string prompt; // Original user prompt
|
||||
int map_id;
|
||||
std::vector<Tile16Change> changes;
|
||||
std::string reasoning; // AI explanation
|
||||
|
||||
// Metadata
|
||||
std::chrono::system_clock::time_point created_at;
|
||||
std::string ai_service; // "gemini", "ollama", etc.
|
||||
};
|
||||
|
||||
class Tile16ProposalGenerator {
|
||||
public:
|
||||
// Generate proposal from AI service
|
||||
absl::StatusOr<Tile16Proposal> GenerateFromPrompt(
|
||||
const std::string& prompt,
|
||||
const RomContext& context);
|
||||
|
||||
// Apply proposal to sandbox ROM
|
||||
absl::Status ApplyProposal(
|
||||
const Tile16Proposal& proposal,
|
||||
Rom* sandbox_rom);
|
||||
|
||||
// Generate visual diff
|
||||
absl::StatusOr<gfx::Bitmap> GenerateDiff(
|
||||
const Tile16Proposal& proposal,
|
||||
Rom* before_rom,
|
||||
Rom* after_rom);
|
||||
|
||||
// Save proposal for later review
|
||||
absl::Status SaveProposal(
|
||||
const Tile16Proposal& proposal,
|
||||
const std::string& path);
|
||||
};
|
||||
```
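
The accept/reject flow depends on recording each tile's previous value when a proposal is applied. A minimal in-memory sketch of that idea, assuming a single 32x32 screen stored row-major in a vector (`TileEdit`, `ApplyEdits`, and `RevertEdits` are hypothetical names, and the real implementation would operate on the sandbox ROM rather than a plain vector):

```cpp
#include <cstdint>
#include <vector>

constexpr int kScreenSize = 32;  // tile16s per screen side

struct TileEdit {
  int x;
  int y;
  std::uint16_t old_tile;  // Filled in when the edit is applied.
  std::uint16_t new_tile;
};

// Applies edits to a grid of kScreenSize * kScreenSize cells. Returns false
// without modifying the grid if any edit is out of bounds.
bool ApplyEdits(std::vector<std::uint16_t>& grid, std::vector<TileEdit>& edits) {
  for (const auto& e : edits) {
    if (e.x < 0 || e.y < 0 || e.x >= kScreenSize || e.y >= kScreenSize) {
      return false;
    }
  }
  for (auto& e : edits) {
    std::uint16_t& cell = grid[e.y * kScreenSize + e.x];
    e.old_tile = cell;
    cell = e.new_tile;
  }
  return true;
}

// Reject path: restores the recorded old tiles in reverse order, so
// overlapping edits unwind correctly.
void RevertEdits(std::vector<std::uint16_t>& grid,
                 const std::vector<TileEdit>& edits) {
  for (auto it = edits.rbegin(); it != edits.rend(); ++it) {
    grid[it->y * kScreenSize + it->x] = it->old_tile;
  }
}
```

Validating every edit before touching the grid means a rejected or malformed proposal never leaves the preview half-applied.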
|
||||
|
||||
#### 2. Enhanced Prompt Examples
|
||||
|
||||
**Current Examples** (Too Generic):
|
||||
```cpp
|
||||
examples_.push_back({
|
||||
"Place a tree at coordinates (10, 20) on map 0",
|
||||
{"overworld set-tile --map 0 --x 10 --y 20 --tile 0x02E"},
|
||||
"Tree tile ID is 0x02E in ALTTP"
|
||||
});
|
||||
```
|
||||
|
||||
**New Examples** (Practical & Visual):
|
||||
```cpp
|
||||
examples_.push_back({
|
||||
"Add a horizontal row of trees across the top of Light World",
|
||||
{
|
||||
"overworld batch-edit --map 0 --pattern horizontal_trees.json"
|
||||
},
|
||||
"Use batch patterns for repetitive tile placement",
|
||||
"overworld" // Category
|
||||
});
|
||||
|
||||
examples_.push_back({
|
||||
"Create a 3x3 water pond at position 10, 15",
|
||||
{
|
||||
"overworld set-area --map 0 --x 10 --y 15 --width 3 --height 3 --tile 0x14D --edges true"
|
||||
},
|
||||
"Area commands handle edge tiles automatically (corners, sides)",
|
||||
"overworld"
|
||||
});
|
||||
|
||||
examples_.push_back({
|
||||
"Replace all grass tiles with dirt in the Lost Woods area",
|
||||
{
|
||||
"overworld replace-tile --map 0 --region lost_woods --from 0x020 --to 0x022"
|
||||
},
|
||||
"Region-based replacement uses predefined area boundaries",
|
||||
"overworld"
|
||||
});
|
||||
|
||||
examples_.push_back({
|
||||
"Make the desert more sandy by adding sand dunes",
|
||||
{
|
||||
"overworld blend-tiles --map 3 --region desert --pattern sand_dunes --density 40"
|
||||
},
|
||||
"Blend patterns add visual variety while respecting terrain type",
|
||||
"overworld"
|
||||
});
|
||||
```
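
An area command with edge handling has to expand one rectangle into individual tile placements. The sketch below shows the expansion for water, using the tile16 IDs quoted elsewhere in this plan. It is an assumption-laden illustration: `PlacedTile` and `ExpandWaterArea` are hypothetical names, and only top and bottom edges are distinguished because corner and side IDs are not given.

```cpp
#include <cstdint>
#include <vector>

struct PlacedTile {
  int x;
  int y;
  std::uint16_t tile;
};

constexpr std::uint16_t kWaterTop = 0x14C;
constexpr std::uint16_t kWaterMiddle = 0x14D;
constexpr std::uint16_t kWaterBottom = 0x14E;

// Expands a rectangle into per-tile placements, row by row. The first row
// gets the top-edge tile, the last row the bottom-edge tile, and all other
// rows the middle tile.
std::vector<PlacedTile> ExpandWaterArea(int x, int y, int width, int height) {
  std::vector<PlacedTile> tiles;
  for (int row = 0; row < height; ++row) {
    std::uint16_t tile = kWaterMiddle;
    if (row == 0) {
      tile = kWaterTop;
    } else if (row == height - 1) {
      tile = kWaterBottom;
    }
    for (int col = 0; col < width; ++col) {
      tiles.push_back({x + col, y + row, tile});
    }
  }
  return tiles;
}
```

For the 3x3 pond example this yields nine placements, which could then be wrapped into a single proposal for review.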
|
||||
|
||||
#### 3. ResourceLabels Context Injection
|
||||
|
||||
**Current Problem**: The AI doesn't know the user's custom names for dungeons, maps, and other resources.
|
||||
|
||||
**Solution**: Extract ResourceLabels and inject into prompt context.
|
||||
|
||||
**File**: `src/cli/service/resource_context_builder.{h,cc}`
|
||||
|
||||
```cpp
|
||||
class ResourceContextBuilder {
|
||||
public:
|
||||
explicit ResourceContextBuilder(Rom* rom) : rom_(rom) {}
|
||||
|
||||
// Extract all resource labels from current project
|
||||
absl::StatusOr<std::string> BuildResourceContext();
|
||||
|
||||
// Get specific category of labels
|
||||
absl::StatusOr<std::map<std::string, std::string>> GetLabels(
|
||||
const std::string& category);
|
||||
|
||||
private:
|
||||
Rom* rom_;
|
||||
|
||||
// Extract from ROM's ResourceLabelManager
|
||||
std::string ExtractOverworldLabels(); // "light_world", "dark_world", etc.
|
||||
std::string ExtractDungeonLabels(); // "eastern_palace", "swamp_palace", etc.
|
||||
std::string ExtractEntranceLabels(); // "links_house", "sanctuary", etc.
|
||||
std::string ExtractRoomLabels(); // "boss_room", "treasure_room", etc.
|
||||
std::string ExtractSpriteLabels(); // "soldier", "octorok", etc.
|
||||
};
|
||||
```
|
||||
|
||||
**Enhanced Prompt with ResourceLabels**:
|
||||
```
|
||||
=== AVAILABLE RESOURCES ===
|
||||
|
||||
Overworld Maps:
|
||||
- 0: "Light World" (user label: "hyrule_overworld")
|
||||
- 1: "Dark World" (user label: "dark_world")
|
||||
- 3: "Desert" (user label: "lanmola_desert")
|
||||
|
||||
Dungeons:
|
||||
- 0x00: "Hyrule Castle" (user label: "castle")
|
||||
- 0x02: "Eastern Palace" (user label: "east_palace")
|
||||
- 0x04: "Desert Palace" (user label: "desert_dungeon")
|
||||
|
||||
Entrances:
|
||||
- 0x00: "Link's House" (user label: "starting_house")
|
||||
- 0x01: "Sanctuary" (user label: "church")
|
||||
|
||||
Common Tile16s:
|
||||
- 0x020: Grass
|
||||
- 0x022: Dirt
|
||||
- 0x14C: Water (top edge)
|
||||
- 0x14D: Water (middle)
|
||||
- 0x14E: Water (bottom edge)
|
||||
- 0x02E: Tree
|
||||
|
||||
=== USER PROMPT ===
|
||||
{user_prompt}
|
||||
|
||||
=== INSTRUCTIONS ===
|
||||
1. Use the user's custom labels when referencing resources
|
||||
2. If user says "eastern palace", use dungeon ID 0x02
|
||||
3. If user says "my custom dungeon", check for matching label
|
||||
4. Provide tile16 IDs as hex values (0x###)
|
||||
5. Explain which labels you're using in your reasoning
|
||||
```
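
Commands that accept `<label_or_id>` need a deterministic way to turn a label back into an ID, independent of what the model does. A small sketch of such a lookup, assuming labels are matched case-insensitively with underscores treated as spaces (`Normalize` and `ResolveLabel` are hypothetical helpers, not part of `ResourceLabelManager`):

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Lowercases and turns '_' into ' ' so "Eastern_Palace" matches
// "eastern palace".
std::string Normalize(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return c == '_' ? ' ' : static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns the query itself if it is already a known ID, the ID whose label
// matches the query, or an empty string when nothing matches.
std::string ResolveLabel(const std::map<std::string, std::string>& id_to_label,
                         const std::string& query) {
  if (id_to_label.count(query) > 0) return query;
  const std::string wanted = Normalize(query);
  for (const auto& entry : id_to_label) {
    if (Normalize(entry.second) == wanted) return entry.first;
  }
  return "";
}
```

Returning an empty string on a miss lets the CLI report an unknown label instead of silently editing the wrong resource.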
|
||||
|
||||
#### 4. CLI Command Structure
|
||||
|
||||
**New Commands**:
|
||||
```bash
|
||||
# Tile16 editing commands (AI-friendly)
|
||||
z3ed overworld set-tile --map <id> --x <x> --y <y> --tile <tile16_id>
|
||||
z3ed overworld set-area --map <id> --x <x> --y <y> --width <w> --height <h> --tile <tile16_id>
|
||||
z3ed overworld replace-tile --map <id> --from <old_tile> --to <new_tile> [--region <name>]
|
||||
z3ed overworld batch-edit --map <id> --pattern <json_file>
|
||||
z3ed overworld blend-tiles --map <id> --pattern <name> --density <percent>
|
||||
|
||||
# Dungeon editing commands (label-aware)
|
||||
z3ed dungeon get-room --dungeon <label_or_id> --room <label_or_id>
|
||||
z3ed dungeon set-object --dungeon <id> --room <id> --object <type> --x <x> --y <y>
|
||||
z3ed dungeon list-entrances --dungeon <label_or_id>
|
||||
z3ed dungeon add-sprite --dungeon <id> --room <id> --sprite <type> --x <x> --y <y>
|
||||
|
||||
# ResourceLabel commands (for AI context)
|
||||
z3ed labels list [--category <type>]
|
||||
z3ed labels export --to <json_file>
|
||||
z3ed labels get --type <type> --key <key>
|
||||
```
|
||||
|
||||
## ResourceLabels Deep Integration
|
||||
|
||||
### Current System
|
||||
**Location**: `src/app/core/project.{h,cc}`
|
||||
|
||||
```cpp
|
||||
struct ResourceLabelManager {
|
||||
// Format: labels_["dungeon"]["0x02"] = "eastern_palace"
|
||||
std::unordered_map<std::string, std::unordered_map<std::string, std::string>> labels_;
|
||||
|
||||
std::string GetLabel(const std::string& type, const std::string& key);
|
||||
void EditLabel(const std::string& type, const std::string& key, const std::string& newValue);
|
||||
};
|
||||
```
|
||||
|
||||
**File Format** (`labels.txt`):
|
||||
```ini
|
||||
[overworld]
|
||||
0=Light World
|
||||
1=Dark World
|
||||
3=Desert Region
|
||||
|
||||
[dungeon]
|
||||
0x00=Hyrule Castle
|
||||
0x02=Eastern Palace
|
||||
0x04=Desert Palace
|
||||
|
||||
[entrance]
|
||||
0x00=Links House
|
||||
0x01=Sanctuary
|
||||
|
||||
[room]
|
||||
0x00_0x10=Eastern Palace Boss Room
|
||||
0x04_0x05=Desert Palace Treasure Room
|
||||
```
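
Reading this format into the nested map used by `ResourceLabelManager` is straightforward. The parser below is a sketch under stated assumptions, not the loader in `project.cc`: `ParseLabels` and `LabelMap` are hypothetical names, and lines that are malformed or appear before any section header are skipped.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

using LabelMap =
    std::unordered_map<std::string,
                       std::unordered_map<std::string, std::string>>;

// Parses "[section]" headers and "key=value" lines into labels[section][key].
LabelMap ParseLabels(const std::string& text) {
  LabelMap labels;
  std::istringstream in(text);
  std::string line;
  std::string section;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // Tolerate CRLF.
    if (line.empty()) continue;
    if (line.front() == '[' && line.back() == ']') {
      section = line.substr(1, line.size() - 2);
      continue;
    }
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || section.empty()) continue;
    // Split on the first '=' only, so values may contain '='.
    labels[section][line.substr(0, eq)] = line.substr(eq + 1);
  }
  return labels;
}
```

The resulting map has the same shape as `labels_`, so it could feed the JSON export or the prompt context directly.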
|
||||
|
||||
### Proposed Enhancement
|
||||
|
||||
**1. Export ResourceLabels to JSON for AI**
|
||||
|
||||
```json
|
||||
{
|
||||
"overworld": {
|
||||
"maps": [
|
||||
{"id": 0, "label": "Light World", "user_label": "hyrule_overworld"},
|
||||
{"id": 1, "label": "Dark World", "user_label": "dark_world"},
|
||||
{"id": 3, "label": "Desert", "user_label": "lanmola_desert"}
|
||||
]
|
||||
},
|
||||
"dungeons": {
|
||||
"list": [
|
||||
{"id": "0x00", "label": "Hyrule Castle", "user_label": "castle", "rooms": 67},
|
||||
{"id": "0x02", "label": "Eastern Palace", "user_label": "east_palace", "rooms": 20}
|
||||
]
|
||||
},
|
||||
"entrances": {
|
||||
"list": [
|
||||
{"id": "0x00", "label": "Link's House", "user_label": "starting_house", "map": 0},
|
||||
{"id": "0x01", "label": "Sanctuary", "user_label": "church", "map": 0}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
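A possible starting point for the export step is sketched below. It is deliberately simpler than the schema above: it emits only `id` and `label` per entry, grouped by category, and leaves out `user_label`, `rooms`, and `map`. `SortedLabels`, `JsonEscape`, and `LabelsToJson` are illustrative names, and a real implementation would more likely use the project's JSON library than hand-built strings.

```cpp
#include <map>
#include <string>

// std::map keeps categories and IDs in a stable order, which makes exports diffable.
using SortedLabels = std::map<std::string, std::map<std::string, std::string>>;

// Escapes the most common characters that cannot appear raw in a JSON string.
// Other control characters are passed through unchanged in this sketch.
inline std::string JsonEscape(const std::string& s) {
  std::string out;
  for (char c : s) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;
    }
  }
  return out;
}

// Emits {"category":[{"id":"...","label":"..."}],...} on a single line.
inline std::string LabelsToJson(const SortedLabels& labels) {
  std::string json = "{";
  bool first_category = true;
  for (const auto& [category, entries] : labels) {
    if (!first_category) json += ",";
    first_category = false;
    json += "\"" + JsonEscape(category) + "\":[";
    bool first_entry = true;
    for (const auto& [id, label] : entries) {
      if (!first_entry) json += ",";
      first_entry = false;
      json += "{\"id\":\"" + JsonEscape(id) + "\",\"label\":\"" +
              JsonEscape(label) + "\"}";
    }
    json += "]";
  }
  json += "}";
  return json;
}
```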
|
||||
|
||||
**2. Enhanced PromptBuilder Integration**
|
||||
|
||||
```cpp
|
||||
// In BuildContextualPrompt()
|
||||
std::string PromptBuilder::BuildContextualPrompt(
|
||||
const std::string& user_prompt,
|
||||
const RomContext& context) {
|
||||
|
||||
std::string prompt = BuildSystemInstruction();
|
||||
|
||||
// NEW: Add resource labels context
|
||||
if (context.rom_loaded && !context.resource_labels.empty()) {
|
||||
prompt += "\n\n=== AVAILABLE RESOURCES ===\n";
|
||||
|
||||
for (const auto& [category, labels] : context.resource_labels) {
|
||||
prompt += absl::StrFormat("\n%s:\n", absl::AsciiStrToUpper(category));
|
||||
|
||||
for (const auto& [key, label] : labels) {
|
||||
prompt += absl::StrFormat(" - %s: \"%s\"\n", key, label);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
prompt += absl::StrFormat("\n\n=== USER PROMPT ===\n%s\n", user_prompt);
|
||||
|
||||
return prompt;
|
||||
}
|
||||
```
|
||||
|
||||
## Dungeon Editor Considerations
|
||||
|
||||
### Current State
|
||||
- ✅ DungeonEditor exists (`src/app/editor/dungeon/dungeon_editor.h`)
|
||||
- ✅ DungeonEditorSystem provides object/sprite/entrance editing
|
||||
- ✅ ObjectRenderer handles room rendering
|
||||
- ⚠️ **No visual preview available yet**
|
||||
- ⚠️ Room data structure is complex
|
||||
|
||||
### AI-Friendly Dungeon Operations
|
||||
|
||||
**Focus on Data Generation** (not visual editing):
|
||||
```cpp
|
||||
// AI can generate valid dungeon data without preview
|
||||
struct DungeonProposal {
|
||||
std::string dungeon_label; // "eastern_palace" or "0x02"
|
||||
std::string room_label; // "boss_room" or "0x10"
|
||||
|
||||
std::vector<ObjectPlacement> objects; // Walls, floors, decorations
|
||||
std::vector<SpritePlacement> sprites; // Enemies, NPCs
|
||||
std::vector<Entrance> entrances; // Room connections
|
||||
std::vector<Chest> chests; // Treasure
|
||||
};
|
||||
|
||||
// Example AI generation (prompt and response text, kept as comments so the block stays valid C++)
// AI Prompt: "Add 3 soldiers to the entrance of Eastern Palace"
// AI Response:
// {
//   "commands": [
//     "dungeon add-sprite --dungeon east_palace --room entrance_room --sprite soldier --x 5 --y 3",
//     "dungeon add-sprite --dungeon east_palace --room entrance_room --sprite soldier --x 10 --y 3",
//     "dungeon add-sprite --dungeon east_palace --room entrance_room --sprite soldier --x 7 --y 8"
//   ],
//   "reasoning": "Using user label 'east_palace' for dungeon 0x02, placing soldiers in entrance room formation"
// }
|
||||
```
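To make the link between a proposal's sprite list and the CLI concrete, the sketch below turns placements into `dungeon add-sprite` command strings using the flag layout listed in the command reference. The `SpritePlacement` fields here are an assumption for illustration, since the real struct is not shown in this document, and `ToAddSpriteCommands` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Hypothetical shape; the real SpritePlacement type may carry more fields.
struct SpritePlacement {
  std::string sprite;  // sprite type or label, e.g. "soldier"
  int x = 0;
  int y = 0;
};

// Builds one "dungeon add-sprite" command per placement, in input order.
inline std::vector<std::string> ToAddSpriteCommands(
    const std::string& dungeon, const std::string& room,
    const std::vector<SpritePlacement>& sprites) {
  std::vector<std::string> commands;
  commands.reserve(sprites.size());
  for (const auto& s : sprites) {
    commands.push_back("dungeon add-sprite --dungeon " + dungeon +
                       " --room " + room + " --sprite " + s.sprite +
                       " --x " + std::to_string(s.x) +
                       " --y " + std::to_string(s.y));
  }
  return commands;
}
```

Generating command strings rather than writing room data directly keeps the proposal reviewable as plain text before anything touches the ROM.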
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 1: SSL + Overworld Tile16 Basics (This Week)
|
||||
- [x] Enable SSL support (see SSL-AND-COLLABORATIVE-PLAN.md)
|
||||
- [ ] Implement Tile16ProposalGenerator basic structure
|
||||
- [ ] Add overworld tile16 commands to CLI
|
||||
- [ ] Update PromptBuilder with overworld-focused examples
|
||||
- [ ] Test basic "place a tree" workflow
|
||||
|
||||
### Phase 2: ResourceLabels Integration (Next Week)
|
||||
- [ ] Implement ResourceContextBuilder
|
||||
- [ ] Extract labels from ROM project
|
||||
- [ ] Inject labels into AI prompts
|
||||
- [ ] Test label-aware prompts ("add trees to my custom forest")
|
||||
- [ ] Document label file format for users
|
||||
|
||||
### Phase 3: Visual Diff & Accept/Reject (Week 3)
|
||||
- [ ] Implement visual diff generation
|
||||
- [ ] Add before/after screenshot comparison
|
||||
- [ ] Create accept/reject CLI workflow
|
||||
- [ ] Add proposal history tracking
|
||||
- [ ] Test multi-step proposals
|
||||
|
||||
### Phase 4: Dungeon Editing (Month 2)
|
||||
- [ ] Implement DungeonProposalGenerator
|
||||
- [ ] Add dungeon CLI commands
|
||||
- [ ] Test sprite/object placement
|
||||
- [ ] Validate entrance connections
|
||||
- [ ] Document dungeon editing workflow
|
||||
|
||||
## Success Metrics
|
||||
|
||||
### Overworld Editing
|
||||
- [ ] AI can place individual tiles correctly
|
||||
- [ ] AI can create tile patterns (rivers, paths, forests)
|
||||
- [ ] AI understands user's custom map labels
|
||||
- [ ] Visual diff shows changes clearly
|
||||
- [ ] Accept/reject workflow is intuitive
|
||||
|
||||
### Dungeon Editing
|
||||
- [ ] AI can find rooms by user labels
|
||||
- [ ] AI can place sprites in valid positions
|
||||
- [ ] AI can configure entrances correctly
|
||||
- [ ] Proposals don't break room data
|
||||
- [ ] Generated data passes validation
|
||||
|
||||
### ResourceLabels
|
||||
- [ ] AI uses user's custom labels correctly
|
||||
- [ ] AI falls back to IDs when no label exists
|
||||
- [ ] AI explains which resources it's using
|
||||
- [ ] Label extraction works for all resource types
|
||||
- [ ] JSON export is complete and accurate
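The "labels first, IDs as fallback" behaviour in this checklist could be checked with something like the following sketch. `ResolveResourceId` and the lowercase-keyed `label_to_id` map are assumptions made for this example; they are not part of the existing `ResourceLabelManager` API.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <unordered_map>

// Lowercases ASCII letters so label lookups are case-insensitive.
inline std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Tries the user label table first (keys assumed lowercase), then falls back
// to a literal non-negative ID such as "0x02" or "2". Returns nullopt when
// the token is neither a known label nor a complete integer.
inline std::optional<int> ResolveResourceId(
    const std::unordered_map<std::string, int>& label_to_id,
    const std::string& token) {
  const auto it = label_to_id.find(ToLowerAscii(token));
  if (it != label_to_id.end()) return it->second;
  try {
    std::size_t consumed = 0;
    const int id = std::stoi(token, &consumed, 0);  // base 0 accepts a 0x prefix
    if (consumed == token.size() && id >= 0) return id;
  } catch (const std::exception&) {
    // Not a number and not a known label: fall through to nullopt.
  }
  return std::nullopt;
}
```

Note that base 0 also treats a leading `0` as octal, so a production version would probably restrict input to decimal and `0x` forms.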
|
||||
|
||||
---
|
||||
|
||||
**Status**: 📋 DESIGN COMPLETE - Ready for Phase 1 Implementation
|
||||
**Next Action**: Implement Tile16ProposalGenerator (SSL support is already enabled, see Phase 1)
|
||||
**Timeline**: 3-4 weeks for overworld AI integration (Phases 1-3); dungeon editing (Phase 4) follows in month 2
|
||||
|
||||
@@ -1,307 +0,0 @@
|
||||
# Quick Start: Gemini AI Integration
|
||||
|
||||
**Date**: October 3, 2025
|
||||
**Status**: ✅ Ready to Test
|
||||
|
||||
## 🚀 Immediate Steps
|
||||
|
||||
### 1. Build z3ed with SSL Support
|
||||
|
||||
```bash
|
||||
cd /path/to/yaze  # root of your local yaze checkout
|
||||
|
||||
# Build z3ed (SSL is now enabled)
|
||||
cmake --build build-grpc-test --target z3ed
|
||||
|
||||
# Verify OpenSSL is linked
|
||||
otool -L build-grpc-test/bin/z3ed | grep -i ssl
|
||||
|
||||
# Expected output:
|
||||
# /opt/homebrew/Cellar/openssl@3/3.5.4/lib/libssl.3.dylib
|
||||
# /opt/homebrew/Cellar/openssl@3/3.5.4/lib/libcrypto.3.dylib
|
||||
```
|
||||
|
||||
### 2. Set Up Gemini API Key
|
||||
|
||||
**Get Your API Key**:
|
||||
1. Go to https://aistudio.google.com/apikey
|
||||
2. Sign in with Google account
|
||||
3. Click "Create API Key"
|
||||
4. Copy the key (starts with `AIza...`)
|
||||
|
||||
**Set Environment Variable**:
|
||||
```bash
|
||||
export GEMINI_API_KEY="AIzaSy..."
|
||||
|
||||
# Or add to your ~/.zshrc for persistence:
|
||||
echo 'export GEMINI_API_KEY="AIzaSy..."' >> ~/.zshrc
|
||||
source ~/.zshrc
|
||||
```
|
||||
|
||||
### 3. Test Basic Connection
|
||||
|
||||
```bash
|
||||
# Simple test prompt
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Place a tree at position 10, 10"
|
||||
|
||||
# Expected output:
|
||||
# ✓ Using Gemini AI service
|
||||
# ✓ Commands generated:
|
||||
# overworld set-tile --map 0 --x 10 --y 10 --tile 0x02E
|
||||
```
|
||||
|
||||
## 📝 Example Prompts to Try
|
||||
|
||||
### Overworld Tile16 Editing
|
||||
|
||||
**Single Tile Placement**:
|
||||
```bash
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Place a tree at position 10, 20 on map 0"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Add a rock at coordinates 15, 8"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Put a bush at 5, 5"
|
||||
```
|
||||
|
||||
**Area Creation**:
|
||||
```bash
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Create a 3x3 water pond at coordinates 15, 10"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Make a 2x4 dirt patch at 20, 15"
|
||||
```
|
||||
|
||||
**Path/Line Creation**:
|
||||
```bash
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Add a dirt path from position 5,5 to 5,15"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Create a horizontal stone path at y=10 from x=8 to x=20"
|
||||
```
|
||||
|
||||
**Pattern Creation**:
|
||||
```bash
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Plant a row of trees horizontally at y=8 from x=20 to x=25"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Add trees in a circle around position 30, 30"
|
||||
```
|
||||
|
||||
### Dungeon Editing (Label-Aware)
|
||||
|
||||
```bash
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Add 3 soldiers to the Eastern Palace entrance room"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Place a chest in Hyrule Castle treasure room"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "Add a key to room 0x10 in dungeon 0x02"
|
||||
```
|
||||
|
||||
## 🔍 What to Look For
|
||||
|
||||
### Good AI Response Example:
|
||||
```json
|
||||
{
|
||||
"commands": [
|
||||
"overworld set-tile --map 0 --x 10 --y 10 --tile 0x02E"
|
||||
],
|
||||
"reasoning": "Placing tree tile (0x02E) at specified coordinates"
|
||||
}
|
||||
```
|
||||
|
||||
### Quality Checks:
|
||||
- ✅ AI uses correct tile16 IDs (0x02E for trees, 0x022 for dirt, etc.)
|
||||
- ✅ AI explains what it's doing
|
||||
- ✅ Commands follow correct syntax
|
||||
- ✅ AI handles edge cases (water borders, path curves)
|
||||
- ✅ AI suggests reasonable positions
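The "commands follow correct syntax" check can be partly automated. The sketch below validates only the space-separated `overworld set-tile --map N --x N --y N --tile N` form used in the example response, with flags in that fixed order; it is an assumed validator for illustration, not the parser z3ed actually uses, and it does not range-check coordinates or tile IDs.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Splits text on whitespace.
inline std::vector<std::string> SplitWords(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  for (std::string word; in >> word;) words.push_back(word);
  return words;
}

// True if the whole string parses as an integer (decimal or 0x-prefixed hex).
inline bool IsInteger(const std::string& text) {
  try {
    std::size_t consumed = 0;
    std::stoi(text, &consumed, 0);
    return consumed == text.size();
  } catch (const std::exception&) {
    return false;
  }
}

// Accepts exactly: overworld set-tile --map N --x N --y N --tile N
inline bool IsValidSetTile(const std::string& command) {
  const std::vector<std::string> w = SplitWords(command);
  const char* flags[] = {"--map", "--x", "--y", "--tile"};
  if (w.size() != 10 || w[0] != "overworld" || w[1] != "set-tile") return false;
  for (int i = 0; i < 4; ++i) {
    if (w[2 + 2 * i] != flags[i] || !IsInteger(w[3 + 2 * i])) return false;
  }
  return true;
}
```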
|
||||
|
||||
## 🐛 Troubleshooting
|
||||
|
||||
### Error: "Cannot reach Gemini API"
|
||||
**Causes**:
|
||||
- No internet connection
|
||||
- Incorrect API key
|
||||
- SSL not enabled
|
||||
|
||||
**Solutions**:
|
||||
```bash
|
||||
# Verify internet
|
||||
ping -c 3 google.com
|
||||
|
||||
# Verify API key is set
|
||||
echo $GEMINI_API_KEY
|
||||
|
||||
# Verify SSL is linked
|
||||
otool -L build-grpc-test/bin/z3ed | grep ssl
|
||||
```
|
||||
|
||||
### Error: "Invalid Gemini API key"
|
||||
**Causes**:
|
||||
- Typo in API key
|
||||
- API key not activated
|
||||
- Rate limit exceeded
|
||||
|
||||
**Solutions**:
|
||||
1. Verify key at https://aistudio.google.com/apikey
|
||||
2. Generate a new key if needed
|
||||
3. Wait a few minutes if rate-limited
|
||||
|
||||
### Error: "No valid commands extracted"
|
||||
**Causes**:
|
||||
- AI didn't understand prompt
|
||||
- Prompt too vague
|
||||
- AI output format incorrect
|
||||
|
||||
**Solutions**:
|
||||
1. Rephrase prompt more clearly
|
||||
2. Use examples from this guide
|
||||
3. Check logs: `./build-grpc-test/bin/z3ed agent plan --prompt "..." -v`
|
||||
|
||||
## 📊 Command Reference
|
||||
|
||||
### Tile16 Reference (Common IDs)
|
||||
|
||||
| Tile | ID (Hex) | Description |
|
||||
|------|----------|-------------|
|
||||
| Grass | 0x020 | Standard grass tile |
|
||||
| Dirt | 0x022 | Dirt/path tile |
|
||||
| Tree | 0x02E | Full tree tile |
|
||||
| Bush | 0x003 | Bush tile |
|
||||
| Rock | 0x004 | Rock tile |
|
||||
| Flower | 0x021 | Flower tile |
|
||||
| Sand | 0x023 | Desert sand |
|
||||
| Water (top) | 0x14C | Water top edge |
|
||||
| Water (middle) | 0x14D | Water middle |
|
||||
| Water (bottom) | 0x14E | Water bottom edge |
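As an illustration of how the three water IDs might combine for a prompt like "Create a 3x3 water pond", the sketch below expands a rectangle into `set-tile` commands. The row-based edge rule (top row uses the top-edge tile, last row the bottom-edge tile, everything else the middle tile) is an assumption for this example; the table does not say how left and right borders should be handled, and `PondCommands` is a hypothetical helper.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Tile16 IDs copied from the reference table.
constexpr int kWaterTop = 0x14C;
constexpr int kWaterMiddle = 0x14D;
constexpr int kWaterBottom = 0x14E;

// Emits one set-tile command per cell of a width x height rectangle whose
// top-left corner is (x0, y0). Rows are emitted top to bottom.
inline std::vector<std::string> PondCommands(int map, int x0, int y0,
                                             int width, int height) {
  std::vector<std::string> commands;
  for (int row = 0; row < height; ++row) {
    int tile = kWaterMiddle;
    if (row == 0) {
      tile = kWaterTop;
    } else if (row == height - 1) {
      tile = kWaterBottom;
    }
    for (int col = 0; col < width; ++col) {
      char buffer[96];
      std::snprintf(buffer, sizeof(buffer),
                    "overworld set-tile --map %d --x %d --y %d --tile 0x%03X",
                    map, x0 + col, y0 + row, static_cast<unsigned>(tile));
      commands.emplace_back(buffer);
    }
  }
  return commands;
}
```

A 3x3 pond at (15, 10) therefore yields nine commands: three top-edge tiles, three middle tiles, and three bottom-edge tiles.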
|
||||
|
||||
### Map IDs
|
||||
|
||||
| Map | ID | Description |
|
||||
|-----|-----|-------------|
|
||||
| Light World | 0 | Main overworld |
|
||||
| Dark World | 1 | Dark world version |
|
||||
| Desert | 3 | Desert area |
|
||||
|
||||
### Dungeon IDs
|
||||
|
||||
| Dungeon | ID (Hex) | Description |
|
||||
|---------|----------|-------------|
|
||||
| Hyrule Castle | 0x00 | Starting castle |
|
||||
| Eastern Palace | 0x02 | First dungeon |
|
||||
| Desert Palace | 0x04 | Second dungeon |
|
||||
| Tower of Hera | 0x07 | Third dungeon |
|
||||
|
||||
## 🔧 Advanced Usage
|
||||
|
||||
### Full Workflow (with Sandbox)
|
||||
|
||||
```bash
|
||||
# Generate proposal with sandbox isolation
|
||||
./build-grpc-test/bin/z3ed agent run \
|
||||
--prompt "Create a water pond at 15, 10" \
|
||||
--rom assets/zelda3.sfc \
|
||||
--sandbox
|
||||
|
||||
# This will:
|
||||
# 1. Create sandbox ROM copy
|
||||
# 2. Generate AI commands
|
||||
# 3. Apply to sandbox
|
||||
# 4. Save diff
|
||||
# 5. Keep original ROM untouched
|
||||
```
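The copy, apply, and diff steps listed in that workflow can be pictured with an in-memory sketch. This is only a conceptual model under the assumption that a ROM is treated as a byte vector; `ByteChange` and `DiffRom` are illustrative names and do not describe how `rom_sandbox_manager` is actually implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One changed byte: where it is, and its value before and after.
struct ByteChange {
  std::size_t offset;
  std::uint8_t before;
  std::uint8_t after;
};

// Lists every offset where the sandbox copy differs from the original ROM.
// Only the overlapping prefix is compared if the sizes differ.
inline std::vector<ByteChange> DiffRom(const std::vector<std::uint8_t>& original,
                                       const std::vector<std::uint8_t>& sandbox) {
  std::vector<ByteChange> changes;
  const std::size_t n = std::min(original.size(), sandbox.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (original[i] != sandbox[i]) {
      changes.push_back({i, original[i], sandbox[i]});
    }
  }
  return changes;
}
```

Because edits are applied to a copy, the original vector is never written to, which is the property the `--sandbox` flag is meant to guarantee for the ROM file.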
|
||||
|
||||
### Batch Testing
|
||||
|
||||
```bash
|
||||
# Create test script
|
||||
cat > test_prompts.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
PROMPTS=(
|
||||
"Place a tree at 10, 10"
|
||||
"Create a water pond at 15, 20"
|
||||
"Add a dirt path from 5,5 to 5,15"
|
||||
"Plant trees horizontally at y=8"
|
||||
)
|
||||
|
||||
for prompt in "${PROMPTS[@]}"; do
|
||||
echo "Testing: $prompt"
|
||||
./build-grpc-test/bin/z3ed agent plan --prompt "$prompt"
|
||||
echo "---"
|
||||
done
|
||||
EOF
|
||||
|
||||
chmod +x test_prompts.sh
|
||||
./test_prompts.sh
|
||||
```
|
||||
|
||||
### Logging for Debugging
|
||||
|
||||
```bash
|
||||
# Enable verbose logging
|
||||
./build-grpc-test/bin/z3ed agent plan \
|
||||
--prompt "test" \
|
||||
--log-level debug \
|
||||
2>&1 | tee gemini_test.log
|
||||
|
||||
# Check what AI returned
|
||||
grep -A 10 "AI Response" gemini_test.log
|
||||
```
|
||||
|
||||
## 📈 Success Metrics
|
||||
|
||||
After testing, verify:
|
||||
|
||||
### Technical Success
|
||||
- [ ] Binary has OpenSSL linked (`otool -L` shows libssl/libcrypto)
|
||||
- [ ] Gemini API responds (no connection errors)
|
||||
- [ ] Commands are well-formed (correct syntax)
|
||||
- [ ] Tile16 IDs are correct (match reference table)
|
||||
|
||||
### Quality Success
|
||||
- [ ] AI understands natural language prompts
|
||||
- [ ] AI explains its reasoning
|
||||
- [ ] AI handles edge cases (pond edges, path curves)
|
||||
- [ ] AI suggests reasonable coordinates
|
||||
|
||||
### User Experience Success
|
||||
- [ ] Prompts feel natural to write
|
||||
- [ ] Responses are easy to understand
|
||||
- [ ] Commands work when executed
|
||||
- [ ] Errors are informative
|
||||
|
||||
## 🎯 Next Steps
|
||||
|
||||
Once testing is successful:
|
||||
|
||||
1. **Document Results**:
|
||||
- Update `TESTING-SESSION-RESULTS.md`
|
||||
- Note any issues or improvements needed
|
||||
- Share example outputs
|
||||
|
||||
2. **Begin Phase 2 Implementation**:
|
||||
- Create `Tile16ProposalGenerator` class
|
||||
- Implement proposal JSON format
|
||||
- Add CLI commands for overworld editing
|
||||
|
||||
3. **Iterate on Prompts**:
|
||||
- Add more few-shot examples based on testing
|
||||
- Refine tile16 reference for AI
|
||||
- Document common failure patterns
|
||||
|
||||
## 📞 Support
|
||||
|
||||
### Documentation References
|
||||
- `docs/z3ed/SSL-AND-COLLABORATIVE-PLAN.md` - SSL implementation details
|
||||
- `docs/z3ed/OVERWORLD-DUNGEON-AI-PLAN.md` - Strategic roadmap
|
||||
- `docs/z3ed/SESSION-SUMMARY-OCT3-2025.md` - Full session summary
|
||||
- `docs/z3ed/AGENTIC-PLAN-STATUS.md` - Overall project status
|
||||
|
||||
### Common Issues
|
||||
- **SSL Errors**: Check OpenSSL is linked, try rebuilding
|
||||
- **API Key Issues**: Verify at aistudio.google.com
|
||||
- **Command Errors**: Review prompt examples, use more specific language
|
||||
- **Rate Limits**: Wait 1 minute between large batches
|
||||
|
||||
---
|
||||
|
||||
**Quick Test Command** (Copy/Paste Ready):
|
||||
```bash
|
||||
export GEMINI_API_KEY="your-key-here" && \
|
||||
./build-grpc-test/bin/z3ed agent plan \
|
||||
--prompt "Place a tree at position 10, 10"
|
||||
```
|
||||
|
||||
**Status**: ✅ READY TO TEST
|
||||
**Next**: Build, test, iterate!
|
||||
|
||||
@@ -1,490 +0,0 @@
|
||||
# z3ed Quick Reference Card
|
||||
|
||||
**Last Updated**: October 2, 2025
|
||||
**For**: z3ed v0.1.0-alpha (macOS production-ready)
|
||||
|
||||
---
|
||||
|
||||
## Build & Setup
|
||||
|
||||
### Build with gRPC Support
|
||||
```bash
|
||||
# Initial build (15-20 min)
|
||||
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
|
||||
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
|
||||
cmake --build build-grpc-test --target z3ed -j$(sysctl -n hw.ncpu)
|
||||
|
||||
# Incremental rebuild (5-10 sec)
|
||||
cmake --build build-grpc-test --target z3ed
|
||||
```
|
||||
|
||||
### Start Test Harness
|
||||
```bash
|
||||
# Start YAZE with test harness
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness \
|
||||
--test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# Verify server running
|
||||
lsof -i :50052
|
||||
|
||||
# Kill existing instance
|
||||
killall yaze 2>/dev/null
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CLI Commands
|
||||
|
||||
### Agent Workflow
|
||||
|
||||
#### Create Proposal
|
||||
```bash
|
||||
# Run agent with sandbox (recommended)
|
||||
z3ed agent run --prompt "Make soldiers red" --rom=zelda3.sfc --sandbox
|
||||
|
||||
# Run without sandbox (modifies ROM directly)
|
||||
z3ed agent run --prompt "..." --rom=zelda3.sfc
|
||||
```
|
||||
|
||||
#### List Proposals
|
||||
```bash
|
||||
# List all proposals
|
||||
z3ed agent list
|
||||
|
||||
# Output shows:
|
||||
# - ID
|
||||
# - Status (Pending/Accepted/Rejected)
|
||||
# - Created timestamp
|
||||
# - Prompt
|
||||
# - Commands executed
|
||||
# - Bytes changed
|
||||
```
|
||||
|
||||
#### View Diff
|
||||
```bash
|
||||
# View latest pending proposal
|
||||
z3ed agent diff
|
||||
|
||||
# View specific proposal
|
||||
z3ed agent diff --proposal-id proposal-20251002T100000-1
|
||||
```
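The proposal ID in the example above appears to follow a `proposal-<timestamp>-<seq>` pattern. Assuming the timestamp is `YYYYMMDDTHHMMSS`, an ID of that shape could be produced as sketched here; `MakeProposalId` is a hypothetical helper and the real registry may derive IDs differently.

```cpp
#include <ctime>
#include <string>

// Formats "proposal-YYYYMMDDTHHMMSS-<seq>" from an already broken-down time.
// Returns an empty string if the timestamp cannot be formatted.
inline std::string MakeProposalId(const std::tm& when, int sequence) {
  char stamp[32];
  if (std::strftime(stamp, sizeof(stamp), "%Y%m%dT%H%M%S", &when) == 0) {
    return "";
  }
  return "proposal-" + std::string(stamp) + "-" + std::to_string(sequence);
}
```

Passing the time in as a `std::tm` keeps the function deterministic; a caller would typically fill it from `std::time` and `std::localtime` or `std::gmtime`.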
|
||||
|
||||
#### Review in GUI
|
||||
```bash
|
||||
# Open YAZE
|
||||
./build/bin/yaze.app/Contents/MacOS/yaze
|
||||
|
||||
# Navigate: Debug → Agent Proposals
|
||||
# Select proposal → Review → Accept/Reject/Delete
|
||||
```
|
||||
|
||||
#### Export API Schema
|
||||
```bash
|
||||
# Export all commands as YAML (for AI consumption)
|
||||
z3ed agent describe --output docs/api/z3ed-resources.yaml
|
||||
|
||||
# Export as JSON
|
||||
z3ed agent describe --format json --output api.json
|
||||
|
||||
# Export specific resource
|
||||
z3ed agent describe --resource rom --format json
|
||||
```
|
||||
|
||||
### Agent Testing (IT-02)
|
||||
|
||||
#### Run Natural Language Test
|
||||
```bash
|
||||
# Open editor and wait for window
|
||||
z3ed agent test --prompt "Open Overworld editor"
|
||||
|
||||
# Complex workflow
|
||||
z3ed agent test --prompt "Open Dungeon editor and verify it loads"
|
||||
|
||||
# Click specific button
|
||||
z3ed agent test --prompt "Click Save button"
|
||||
```
|
||||
|
||||
#### Test Introspection (IT-05) 🔜 PLANNED
|
||||
```bash
|
||||
# Get test status
|
||||
z3ed agent test status --test-id grpc_click_12345678
|
||||
|
||||
# Poll until completion
|
||||
z3ed agent test status --test-id grpc_click_12345678 --follow
|
||||
|
||||
# Get detailed results
|
||||
z3ed agent test results --test-id grpc_click_12345678 --include-logs
|
||||
|
||||
# List all tests
|
||||
z3ed agent test list --category grpc
|
||||
```
|
||||
|
||||
#### Widget Discovery (IT-06) 🔄 IN PROGRESS (telemetry available)
|
||||
```bash
|
||||
# Discover all widgets
|
||||
z3ed agent gui discover
|
||||
|
||||
# Filter by window
|
||||
z3ed agent gui discover --window "Overworld"
|
||||
|
||||
# Get only buttons and include hidden/disabled widgets for AI diffing
|
||||
z3ed agent gui discover --type button --include-invisible --include-disabled --format json
|
||||
```
|
||||
|
||||
#### Test Recording (IT-07) 🔜 PLANNED
|
||||
```bash
|
||||
# Start recording
|
||||
z3ed agent test record start --output tests/my_test.json
|
||||
|
||||
# Perform actions...
|
||||
|
||||
# Stop recording
|
||||
z3ed agent test record stop --validate
|
||||
|
||||
# Replay test
|
||||
z3ed agent test replay tests/my_test.json
|
||||
|
||||
# Run test suite
|
||||
z3ed agent test suite run tests/smoke.yaml --ci-mode
|
||||
```
|
||||
|
||||
### ROM Commands
|
||||
|
||||
```bash
|
||||
# Display ROM metadata
|
||||
z3ed rom info --rom=zelda3.sfc
|
||||
|
||||
# Validate ROM integrity
|
||||
z3ed rom validate --rom=zelda3.sfc
|
||||
|
||||
# Compare two ROMs
|
||||
z3ed rom diff --rom1=original.sfc --rom2=modified.sfc
|
||||
|
||||
# Generate golden checksums
|
||||
z3ed rom generate-golden --rom=zelda3.sfc --output=golden.json
|
||||
```
|
||||
|
||||
### Palette Commands
|
||||
|
||||
```bash
|
||||
# Export palette
|
||||
z3ed palette export sprites_aux1 4 soldier.col
|
||||
|
||||
# Import palette
|
||||
z3ed palette import sprites_aux1 4 soldier_red.col
|
||||
|
||||
# List palettes
|
||||
z3ed palette list --group sprites_aux1
|
||||
```
|
||||
|
||||
### Overworld Commands
|
||||
|
||||
```bash
|
||||
# Get tile at coordinates
|
||||
z3ed overworld get-tile --map=0 --x=100 --y=50
|
||||
|
||||
# Set tile at coordinates
|
||||
z3ed overworld set-tile --map=0 --x=100 --y=50 --tile-id=0x1234
|
||||
```
|
||||
|
||||
### Dungeon Commands
|
||||
|
||||
```bash
|
||||
# List dungeon rooms
|
||||
z3ed dungeon list-rooms --dungeon=0
|
||||
|
||||
# Add object to room
|
||||
z3ed dungeon add-object --dungeon=0 --room=5 --object=chest
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## gRPC Testing with grpcurl
|
||||
|
||||
### Setup
|
||||
```bash
|
||||
# Install grpcurl
|
||||
brew install grpcurl
|
||||
|
||||
# Set proto path
|
||||
export PROTO_PATH="src/app/core/proto"
|
||||
export PROTO_FILE="imgui_test_harness.proto"
|
||||
```
|
||||
|
||||
### Core RPCs
|
||||
|
||||
#### Ping (Health Check)
|
||||
```bash
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"message":"test"}' 127.0.0.1:50052 yaze.test.ImGuiTestHarness/Ping
|
||||
|
||||
# Response:
|
||||
# {
|
||||
# "message": "Pong: test",
|
||||
# "timestampMs": "1696271234567",
|
||||
# "yazeVersion": "0.3.2"
|
||||
# }
|
||||
```
|
||||
|
||||
#### Click
|
||||
```bash
|
||||
# Click button
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"target":"button:Save","type":"LEFT"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
|
||||
|
||||
# Click menu item
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"target":"menuitem: Overworld Editor","type":"LEFT"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
|
||||
```
|
||||
|
||||
#### Type
|
||||
```bash
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"target":"input:Search","text":"tile16","clear_first":true}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Type
|
||||
```
|
||||
|
||||
#### Wait
|
||||
```bash
|
||||
# Wait for window to be visible
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"condition":"window_visible:Overworld","timeout_ms":5000}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
|
||||
|
||||
# Wait for element to be enabled
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"condition":"element_enabled:button:Save","timeout_ms":3000}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
|
||||
```
|
||||
|
||||
#### Assert
|
||||
```bash
|
||||
# Assert window visible
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"condition":"visible:Overworld"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
|
||||
|
||||
# Assert element enabled
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"condition":"enabled:button:Save"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
|
||||
```
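The `condition` strings in the Wait and Assert calls look like a kind followed by a target, separated by a colon (for example `window_visible:Overworld` or `enabled:button:Save`). Assuming that reading is right, splitting at the first colon keeps targets such as `button:Save` intact. `SplitCondition` is an illustrative helper, not the harness's own parser.

```cpp
#include <string>
#include <utility>

// Splits "kind:target" at the first colon. The target may itself contain
// colons; a string without any colon is returned as the kind with an empty target.
inline std::pair<std::string, std::string> SplitCondition(
    const std::string& condition) {
  const auto colon = condition.find(':');
  if (colon == std::string::npos) return {condition, ""};
  return {condition.substr(0, colon), condition.substr(colon + 1)};
}
```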
|
||||
|
||||
#### Screenshot (Stub)
|
||||
```bash
|
||||
grpcurl -plaintext -import-path $PROTO_PATH -proto $PROTO_FILE \
|
||||
-d '{"window_title":"Overworld","output_path":"/tmp/test.png","format":"PNG"}' \
|
||||
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Screenshot
|
||||
|
||||
# Response: {"success":false,"message":"Screenshot not yet implemented"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## E2E Testing
|
||||
|
||||
### Run Full Test Suite
|
||||
```bash
|
||||
# Run all E2E tests
|
||||
./scripts/test_harness_e2e.sh
|
||||
|
||||
# Expected output:
|
||||
# ✓ Ping (Health Check)
|
||||
# ✓ Click (Open Overworld Editor)
|
||||
# ✓ Wait (Overworld Editor Window)
|
||||
# ✓ Assert (Overworld Editor Visible)
|
||||
# ✓ Click (Open Dungeon Editor)
|
||||
# Tests Passed: 5
|
||||
```
|
||||
|
||||
### Manual Workflow Test
|
||||
```bash
|
||||
# 1. Start YAZE
|
||||
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
|
||||
--enable_test_harness --test_harness_port=50052 \
|
||||
--rom_file=assets/zelda3.sfc &
|
||||
|
||||
# 2. Create proposal
|
||||
./build-grpc-test/bin/z3ed agent run --prompt "Test proposal" --rom=zelda3.sfc --sandbox

# 3. List proposals
./build-grpc-test/bin/z3ed agent list

# 4. View diff
./build-grpc-test/bin/z3ed agent diff

# 5. Review in GUI
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze
|
||||
# Debug → Agent Proposals → Select → Accept
|
||||
|
||||
# 6. Cleanup
|
||||
killall yaze
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Port Already in Use
|
||||
```bash
|
||||
# Find process
|
||||
lsof -i :50052
|
||||
|
||||
# Kill process
|
||||
kill <PID>
|
||||
|
||||
# Or use different port
|
||||
./yaze --enable_test_harness --test_harness_port=50053
|
||||
```
|
||||
|
||||
### Connection Refused
|
||||
```bash
|
||||
# Check if server is running
|
||||
lsof -i :50052
|
||||
|
||||
# Check firewall (macOS)
|
||||
# System Preferences → Security & Privacy → Firewall
|
||||
|
||||
# Check logs
|
||||
./yaze --enable_test_harness --log_level=debug
|
||||
```
|
||||
|
||||
### Widget Not Found
|
||||
```bash
|
||||
# Problem: "Button 'XYZ' not found"
|
||||
|
||||
# Solutions:
|
||||
# 1. Verify exact label (case-sensitive)
|
||||
# 2. Wait for window to be visible first
|
||||
grpcurl ... Wait '{"condition":"window_visible:WindowName"}'
|
||||
|
||||
# 3. Assert widget exists
|
||||
grpcurl ... Assert '{"condition":"exists:button:XYZ"}'
|
||||
|
||||
# 4. Use widget discovery (IT-06 telemetry)
|
||||
z3ed agent gui discover --window "WindowName"
|
||||
```
|
||||
|
||||
### Build Errors
|
||||
```bash
|
||||
# Clean build
|
||||
rm -rf build-grpc-test
|
||||
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
|
||||
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
|
||||
|
||||
# Check gRPC installation
|
||||
cmake --build build-grpc-test --target help | grep -i grpc
|
||||
|
||||
# Verify proto generation
|
||||
ls build-grpc-test/_deps/grpc-src/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## File Locations
|
||||
|
||||
### Core Files
|
||||
```
|
||||
src/app/core/
|
||||
├── proto/imgui_test_harness.proto # gRPC service definition
|
||||
├── core/service/imgui_test_harness_service.{h,cc} # RPC implementation
|
||||
└── test_manager.{h,cc} # Test execution management
|
||||
|
||||
src/cli/
|
||||
├── handlers/agent.cc # CLI agent commands
|
||||
└── service/
|
||||
├── proposal_registry.{h,cc} # Proposal tracking
|
||||
├── rom_sandbox_manager.{h,cc} # ROM isolation
|
||||
└── resource_catalog.{h,cc} # API schema
|
||||
|
||||
src/app/editor/system/
|
||||
└── proposal_drawer.{h,cc} # GUI review panel
|
||||
|
||||
docs/z3ed/
|
||||
├── README.md # Overview & links
|
||||
├── E6-z3ed-cli-design.md # Architecture
|
||||
├── E6-z3ed-implementation-plan.md # Roadmap
|
||||
├── E6-z3ed-reference.md # Technical reference
|
||||
├── IMPLEMENTATION_CONTINUATION.md # Next steps
|
||||
└── IT-05-IMPLEMENTATION-GUIDE.md # Introspection API guide
|
||||
```
|
||||
|
||||
### Build Artifacts
|
||||
```
|
||||
build-grpc-test/
|
||||
├── bin/
|
||||
│ ├── yaze.app/Contents/MacOS/yaze # YAZE with test harness
|
||||
│ └── z3ed # CLI tool
|
||||
└── _deps/
|
||||
└── grpc-src/ # gRPC source (auto-fetched)
|
||||
|
||||
/tmp/yaze/
|
||||
├── proposals/ # Proposal metadata
|
||||
│ └── proposal-<timestamp>-<seq>/
|
||||
│ ├── execution.log
|
||||
│ ├── diff.txt
|
||||
│ └── screenshots/
|
||||
└── sandboxes/ # Isolated ROM copies
|
||||
└── <timestamp>-<seq>/
|
||||
└── zelda3.sfc
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
# Optional: Set default ROM path
|
||||
export YAZE_ROM_PATH=~/roms/zelda3.sfc
|
||||
|
||||
# Optional: Set default test harness port
|
||||
export YAZE_TEST_HARNESS_PORT=50052
|
||||
|
||||
# Optional: Enable verbose logging
|
||||
export YAZE_LOG_LEVEL=debug
|
||||
|
||||
# Optional: Set proposal directory
|
||||
export YAZE_PROPOSAL_DIR=/custom/path/proposals
|
||||
```
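Since these variables are optional, code reading them needs a fallback. The sketch below shows one way to read `YAZE_TEST_HARNESS_PORT` with a default of 50052, the port used throughout this card. `PortOrDefault` and `HarnessPortFromEnv` are hypothetical helpers; how yaze itself reads these variables is not shown in this document.

```cpp
#include <cstdlib>

constexpr int kDefaultHarnessPort = 50052;

// Parses a TCP port from `raw` (which may be null). Returns the default when
// the value is missing, not a plain decimal number, or outside 1..65535.
inline int PortOrDefault(const char* raw) {
  if (raw == nullptr || *raw == '\0') return kDefaultHarnessPort;
  char* end = nullptr;
  const long value = std::strtol(raw, &end, 10);
  if (*end != '\0' || value < 1 || value > 65535) return kDefaultHarnessPort;
  return static_cast<int>(value);
}

// Reads the optional environment variable and applies the same fallback.
inline int HarnessPortFromEnv() {
  return PortOrDefault(std::getenv("YAZE_TEST_HARNESS_PORT"));
}
```

Separating the parsing from the `std::getenv` call makes the fallback rules easy to test without touching the process environment.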
|
||||
|
||||
---
|
||||
|
||||
## Platform Support
|
||||
|
||||
| Platform | Status | Notes |
|
||||
|----------|--------|-------|
|
||||
| macOS ARM64 | ✅ Production Ready | Fully tested |
|
||||
| macOS Intel | ⚠️ Should Work | Not explicitly tested |
|
||||
| Linux | ⚠️ Should Work | gRPC has excellent support |
|
||||
| Windows | 🔬 Experimental | Build system ready, needs validation |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
**Current Phase**: Test Harness Enhancements (IT-05 to IT-09)
|
||||
|
||||
**Immediate Priority**: IT-05 (Test Introspection API)
|
||||
|
||||
**See**:
|
||||
- [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) - Detailed roadmap
|
||||
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Step-by-step guide
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- **GitHub**: https://github.com/scawful/yaze
|
||||
- **Documentation**: `docs/z3ed/`
|
||||
- **Slack/Discord**: [TBD]
|
||||
- **Contributors**: @scawful, GitHub Copilot
|
||||
|
||||
---
|
||||
|
||||
**Last Updated**: October 2, 2025
|
||||
**Version**: z3ed v0.1.0-alpha
|
||||
**License**: Same as YAZE (see LICENSE)
|
||||
@@ -96,44 +96,48 @@ z3ed agent plan --prompt "test"
|
||||
|
||||
**Note**: Gemini requires OpenSSL (HTTPS). Build with `-DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON` to enable SSL support. OpenSSL is auto-detected on macOS/Linux. Windows users can use Ollama instead.
|
||||
|
||||
### Example Prompts
|
||||
Here are some example prompts you can try with either Ollama or Gemini:
|
||||
|
||||
**Overworld Tile16 Editing**:
|
||||
- `"Place a tree at position 10, 20 on map 0"`
|
||||
- `"Create a 3x3 water pond at coordinates 15, 10"`
|
||||
- `"Add a dirt path from position 5,5 to 5,15"`
|
||||
- `"Plant a row of trees horizontally at y=8 from x=20 to x=25"`
|
||||
|
||||
**Dungeon Editing (Label-Aware)**:
|
||||
- `"Add 3 soldiers to the Eastern Palace entrance room"`
|
||||
- `"Place a chest in Hyrule Castle treasure room"`
|
||||
|
||||
## Core Documentation

### Essential Reads
1. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Architecture, design philosophy, agentic workflow framework
2. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation
3. **[AGENTIC-PLAN-STATUS.md](AGENTIC-PLAN-STATUS.md)** - Current implementation status and roadmap
1. **[AGENT-ROADMAP.md](AGENT-ROADMAP.md)** - The primary source of truth for the AI agent's strategic vision, architecture, and next steps.
2. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - Detailed architecture and design philosophy.
3. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - Complete command reference and API documentation.

### Quick References
- **[QUICK_REFERENCE.md](QUICK_REFERENCE.md)** - Condensed command cheatsheet
- **[QUICK-START-GEMINI.md](QUICK-START-GEMINI.md)** - Gemini API setup and testing guide
- **[OVERWORLD-DUNGEON-AI-PLAN.md](OVERWORLD-DUNGEON-AI-PLAN.md)** - Tile16 editing strategy and ResourceLabels integration
- **[QUICK_REFERENCE.md](QUICK_REFERENCE.md)** - Condensed command cheatsheet.
- **[QUICK-START-GEMINI.md](QUICK-START-GEMINI.md)** - Gemini API setup and testing guide.

### Implementation Guides
- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - LLM integration roadmap (Ollama, Gemini, Claude)
- **[LLM-IMPLEMENTATION-CHECKLIST.md](LLM-IMPLEMENTATION-CHECKLIST.md)** - Step-by-step implementation tasks
- **[IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)** - Test introspection API (complete ✅)
- **[IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md)** - Enhanced error reporting (complete ✅)
- **[LLM-INTEGRATION-PLAN.md](LLM-INTEGRATION-PLAN.md)** - (Archive) Original LLM integration roadmap.
- **[IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)** - Test introspection API (complete ✅).
- **[IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md)** - Enhanced error reporting (complete ✅).

## Current Status (October 2025)

### ✅ Complete
- **CLI Infrastructure**: Command parsing, handlers, TUI components
- **Proposal System**: Sandbox creation, diff generation, accept/reject workflow
- **AI Services**: Ollama integration, Gemini integration, PromptBuilder
- **GUI Automation**: Widget discovery, test recording/replay, gRPC harness
- **Test Introspection**: Status polling, results query, execution history
- **Error Reporting**: Screenshots, failure context, widget state dumps
The project is currently focused on implementing a conversational AI agent. See [AGENT-ROADMAP.md](AGENT-ROADMAP.md) for a detailed breakdown of what's complete, in progress, and planned.

### 🔄 In Progress
- **Tile16 Editing Workflow**: Accept/reject for overworld canvas edits
- **ResourceLabels Integration**: User-defined names for AI context
- **Dungeon Editing Support**: Object/sprite placement via AI
- **Conversational Agent**: Building a chat-like interface for the TUI and GUI.
- **Agent "Tools"**: Adding more read-only commands for the agent to inspect the ROM.
- **ResourceLabels Integration**: Integrating user-defined names for AI context.

### 📋 Planned
- **Visual Diff Generation**: Before/after screenshots for proposals
- **Batch Operations**: Multiple tile16 changes in single proposal
- **Pattern Library**: Pre-defined tile patterns (rivers, forests, etc.)
- **Claude Integration**: Anthropic API support
- **GUI Chat Widget**: A shared chat interface for the main `yaze` application.
- **Dungeon Editing Support**: Object/sprite placement via AI.
- **Visual Diff Generation**: Before/after screenshots for proposals.

## AI Editing Focus Areas

@@ -246,38 +250,6 @@ AI agent features require:
- Provide map context ("Light World", "map 0")
- Check ResourceLabels are loaded for your project

## Contributing

### Adding AI Prompt Examples
Edit `src/cli/service/prompt_builder.cc` → `LoadDefaultExamples()`
- Add practical, multi-step examples
- Include explanation of tile IDs and reasoning
- Test with both Ollama and Gemini

### Adding CLI Commands
1. Create handler in `src/cli/handlers/<category>.cc`
2. Register in command dispatcher
3. Add to `E6-z3ed-reference.md` documentation
4. Add example prompt to `prompt_builder.cc`

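The "create handler" and "register in command dispatcher" steps can be pictured with a minimal sketch. The `CommandHandler` alias and `CommandDispatcher` class below are hypothetical names invented for illustration; they are not taken from the z3ed sources, whose actual dispatcher may look quite different.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handler signature: receives the arguments that follow the
// command name and returns a process-style exit code (0 = success).
using CommandHandler = std::function<int(const std::vector<std::string>&)>;

// Hypothetical dispatcher, for illustration only (not z3ed's real class).
class CommandDispatcher {
 public:
  // Registers a handler under the name users will type on the command line.
  void Register(const std::string& name, CommandHandler handler) {
    handlers_[name] = std::move(handler);
  }

  // Looks up the handler and runs it; returns -1 for unknown commands.
  int Dispatch(const std::string& name,
               const std::vector<std::string>& args) const {
    auto it = handlers_.find(name);
    if (it == handlers_.end()) return -1;
    return it->second(args);
  }

 private:
  std::map<std::string, CommandHandler> handlers_;
};
```

Under these assumptions, a new command amounts to one `Register` call plus the handler body, which keeps the documentation and prompt-example steps as the remaining work.
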
### Testing
```bash
# Run unit tests
cd build-grpc-test && ctest --output-on-failure

# Test AI integration
./bin/z3ed agent plan --prompt "test prompt" --verbose
```

---

**Getting Help**:
- Read [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md) for architecture
- Check [AGENTIC-PLAN-STATUS.md](AGENTIC-PLAN-STATUS.md) for current status
- Review [QUICK-START-GEMINI.md](QUICK-START-GEMINI.md) for AI setup

**Quick Test** (verifies AI is working):
```bash
export GEMINI_API_KEY="your-key"  # or start ollama serve
./build-grpc-test/bin/z3ed agent plan --prompt "Place a tree at 10, 10"
```
#### Gemini-Specific Issues
- **"Cannot reach Gemini API"**: Check your internet connection, API key, and that you've built with SSL support.
- **"Invalid Gemini API key"**: Regenerate your key at `aistudio.google.com/apikey`.

@@ -1,239 +0,0 @@
# SSL Support and Collaborative Features Plan

**Date**: October 3, 2025
**Status**: 🔧 In Progress

## Executive Summary

This document outlines the plan to enable SSL/HTTPS support in z3ed for Gemini API integration, and explains how this infrastructure benefits future collaborative editing features.

## Problem Statement

**Current Issue**: Gemini API requires HTTPS (`https://generativelanguage.googleapis.com`), but our httplib dependency doesn't have SSL support enabled in the current build configuration.

**Error Scenario**:
```cpp
httplib::Client cli("https://generativelanguage.googleapis.com");
// Fails because CPPHTTPLIB_OPENSSL_SUPPORT is not defined
```

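One way to surface this failure early is a pre-flight check that compares the URL scheme against what the binary was compiled with. The sketch below is an assumption about how such a check could look: `HasSslSupport`, `RequiresSsl`, and `CanConnect` are hypothetical helper names, and only the `CPPHTTPLIB_OPENSSL_SUPPORT` macro comes from the text above.

```cpp
#include <string>

// True when this translation unit was compiled with httplib's SSL macro.
constexpr bool HasSslSupport() {
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
  return true;
#else
  return false;
#endif
}

// True when the URL starts with the https scheme.
inline bool RequiresSsl(const std::string& url) {
  return url.rfind("https://", 0) == 0;
}

// Hypothetical pre-flight check: plain HTTP endpoints are always reachable
// in principle; HTTPS endpoints need SSL support compiled in.
inline bool CanConnect(const std::string& url) {
  return !RequiresSsl(url) || HasSslSupport();
}
```

A caller could run `CanConnect` on the Gemini endpoint before issuing a request and print a rebuild hint instead of a generic connection error.
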
## Solution: Enable OpenSSL Support

### 1. Build System Changes

**File**: `src/cli/z3ed.cmake`

**Changes Required**:
```cmake
# After line 84 (where YAZE_WITH_JSON is configured)

# ============================================================================
# SSL/HTTPS Support (Required for Gemini API and future collaborative features)
# ============================================================================
option(YAZE_WITH_SSL "Build with OpenSSL support for HTTPS" ON)
if(YAZE_WITH_SSL OR YAZE_WITH_JSON)
  # Find OpenSSL on the system
  find_package(OpenSSL REQUIRED)

  # Define the SSL support macro for httplib
  target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT)

  # Link OpenSSL libraries
  target_link_libraries(z3ed PRIVATE OpenSSL::SSL OpenSSL::Crypto)

  # On macOS, also enable Keychain cert support
  if(APPLE)
    target_compile_definitions(z3ed PRIVATE CPPHTTPLIB_USE_CERTS_FROM_MACOSX_KEYCHAIN)
    target_link_libraries(z3ed PRIVATE "-framework CoreFoundation -framework Security")
  endif()

  message(STATUS "✓ SSL/HTTPS support enabled for z3ed")
endif()
```

### 2. Verification Steps

**Build with SSL**:
```bash
cd /Users/scawful/Code/yaze

# Clean rebuild with SSL support
rm -rf build-grpc-test
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON -DYAZE_WITH_SSL=ON
cmake --build build-grpc-test --target z3ed

# Verify OpenSSL is linked (both libssl and libcrypto should be listed)
otool -L build-grpc-test/bin/z3ed | grep -E 'ssl|crypto'
# Expected output (exact paths vary by installation), for example:
#   /usr/lib/libssl.dylib
#   /usr/lib/libcrypto.dylib
```

**Test Gemini Connection**:
```bash
export GEMINI_API_KEY="your-key-here"
./build-grpc-test/bin/z3ed agent plan --prompt "Test SSL connection"
```

### 3. OpenSSL Installation (if needed)

**macOS**:
```bash
# The system copy may lack development headers; install via Homebrew
# if CMake cannot find OpenSSL:
brew install openssl@3

# If CMake still can't find it, point it at the Homebrew prefix:
export OPENSSL_ROOT_DIR="$(brew --prefix openssl@3)"
```

**Linux**:
```bash
# Debian/Ubuntu
sudo apt-get install libssl-dev

# Fedora/RHEL
sudo dnf install openssl-devel
```

## Benefits for Collaborative Features

### 1. WebSocket Support (Future)

SSL enables secure WebSocket connections for real-time collaborative editing:

```cpp
#ifdef CPPHTTPLIB_OPENSSL_SUPPORT
// Secure channel for collaborative editing. httplib::SSLClient takes a bare
// host name rather than a URL; the HTTPS request below stands in for the
// future secure WebSocket (wss://) transport.
httplib::SSLClient ws_client("collaboration.yaze.dev");
ws_client.set_connection_timeout(30, 0);

// Subscribe to real-time ROM changes
auto res = ws_client.Get("/subscribe/room/12345");
// Multiple users can edit the same ROM simultaneously
#endif
```

**Use Cases**:
- Multi-user dungeon editing sessions
- Real-time tile16 preview sharing
- Collaborative palette editing
- Synchronized sprite placement

### 2. Cloud ROM Storage (Future)

HTTPS enables secure cloud storage integration:

```cpp
// Upload ROM to secure cloud storage
// (SSLClient takes a host name without the https:// scheme)
httplib::SSLClient cloud("api.yaze.cloud");
cloud.Post("/roms/upload", rom_data, "application/octet-stream");

// Download shared ROM modifications
auto res = cloud.Get("/roms/shared/abc123");
```

**Use Cases**:
- Team ROM projects with version control
- Shared resource libraries (tile16 sets, palettes, sprites)
- Automated ROM backups
- Project synchronization across devices

### 3. Secure Authentication (Future)

SSL is required for secure user authentication:

```cpp
// OAuth2 flow for collaborative features
// (SSLClient takes a host name without the https:// scheme)
httplib::SSLClient auth("auth.yaze.dev");
auto token_res = auth.Post("/oauth/token",
                           "grant_type=authorization_code&code=ABC123",
                           "application/x-www-form-urlencoded");
```

**Use Cases**:
- User accounts for collaborative editing
- Shared project permissions
- ROM access control
- API rate limiting

### 4. Plugin/Extension Marketplace (Future)

HTTPS is required for secure plugin downloads:

```cpp
// Download verified plugins from marketplace
// (SSLClient takes a host name without the https:// scheme)
httplib::SSLClient marketplace("plugins.yaze.dev");
auto plugin_res = marketplace.Get("/api/v1/plugins/tile16-tools/latest");
// Verify signature before installation
```

**Use Cases**:
- Community-created editing tools
- Custom AI prompt templates
- Shared dungeon/overworld templates
- Asset packs and resources

## Integration Timeline

### Phase 1: Immediate (This Session)
- ✅ Enable OpenSSL in z3ed build
- ✅ Test Gemini API with SSL
- ✅ Document SSL setup in README

### Phase 2: Short-term (Next Week)
- Add SSL health checks to CLI startup
- Implement certificate validation
- Add SSL error diagnostics

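The Phase 2 health check and diagnostics could start as a pure function that turns a snapshot of the environment into actionable messages. This is only a sketch: the `SslEnvironment` struct, its fields, and `DiagnoseSsl` are hypothetical names, and the message wording is illustrative rather than what z3ed prints.

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot of what a startup check might inspect.
struct SslEnvironment {
  bool ssl_compiled_in;  // built with CPPHTTPLIB_OPENSSL_SUPPORT
  bool api_key_present;  // GEMINI_API_KEY is set and non-empty
  bool ca_store_found;   // a CA certificate store could be located
};

// Returns one human-readable message per detected problem; an empty
// result means the checks passed.
std::vector<std::string> DiagnoseSsl(const SslEnvironment& env) {
  std::vector<std::string> problems;
  if (!env.ssl_compiled_in) {
    problems.push_back(
        "SSL support is not compiled in; rebuild with -DYAZE_WITH_SSL=ON");
  }
  if (!env.api_key_present) {
    problems.push_back("GEMINI_API_KEY is not set");
  }
  if (!env.ca_store_found) {
    problems.push_back("No CA certificate store found; HTTPS may fail");
  }
  return problems;
}
```

Keeping the probe (filling in `SslEnvironment`) separate from the reporting makes the diagnostics easy to unit test without network access.
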
### Phase 3: Medium-term (Next Month)
- Design collaborative editing protocol
- Prototype WebSocket-based real-time editing
- Implement cloud ROM storage API

### Phase 4: Long-term (Future)
- Full collaborative editing system
- Plugin marketplace infrastructure
- Authentication and authorization system

## Security Considerations

### Certificate Validation
- Always validate SSL certificates in production
- Support custom CA certificates for enterprise environments
- Implement certificate pinning for critical endpoints

### API Key Protection
- Never hardcode API keys
- Use environment variables or secure keychains
- Rotate keys periodically

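As a rough illustration of the environment-variable approach, the sketch below reads a key at runtime and masks it before it can reach a log. `LoadApiKey` and `MaskKey` are hypothetical helpers, not functions from the z3ed codebase, and showing the last four characters is an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Reads the key from the named environment variable. Returns an empty
// string when the variable is unset or empty, so callers can fail early.
std::string LoadApiKey(const char* variable_name) {
  const char* value = std::getenv(variable_name);
  return value == nullptr ? std::string() : std::string(value);
}

// Masks all but the last four characters so a key can appear in logs
// without being disclosed; short keys are masked entirely.
std::string MaskKey(const std::string& key) {
  const std::size_t visible = 4;
  if (key.size() <= visible) return std::string(key.size(), '*');
  return std::string(key.size() - visible, '*') +
         key.substr(key.size() - visible);
}
```

A caller would use `LoadApiKey("GEMINI_API_KEY")` and pass only `MaskKey(...)` output to any verbose or diagnostic logging.
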
### Data Transmission
- Encrypt ROM data before transmission
- Use TLS 1.3 for all connections
- Implement perfect forward secrecy

## Testing Checklist

- [ ] OpenSSL links correctly on macOS
- [ ] OpenSSL links correctly on Linux
- [ ] OpenSSL links correctly on Windows
- [ ] Gemini API works with HTTPS
- [ ] Certificate validation works
- [ ] macOS Keychain integration works
- [ ] Custom CA certificates work
- [ ] Build size impact acceptable
- [ ] No performance regression

## Estimated Impact

**Build Size**: +2-3 MB (OpenSSL libraries)
**Build Time**: +10-15 seconds (first build only)
**Runtime**: Negligible overhead for HTTPS
**Dependencies**: OpenSSL 3.0+ (system package)

---

**Status**: ✅ READY FOR IMPLEMENTATION
**Priority**: HIGH (Blocks Gemini API integration)
**Next Action**: Modify `src/cli/z3ed.cmake` to enable OpenSSL support
