feat: Enhance ROM loading options and proposal management

- Introduced `RomLoadOptions` struct to manage various loading configurations for ROM files, including options for stripping headers, populating metadata, and loading Zelda 3 content.
- Updated `Rom::LoadFromFile` and `Rom::LoadFromData` methods to accept `RomLoadOptions`, allowing for more flexible ROM loading behavior.
- Implemented `MaybeStripSmcHeader` function to conditionally remove SMC headers from ROM data.
- Added new command handler `RomInfo` to display basic ROM information, including title and size.
- Created `ProposalRegistry` class to manage agent-generated proposals, including creation, logging, and status updates.
- Enhanced CLI commands to support proposal listing and detailed diff viewing, improving user interaction with agent-generated modifications.
- Updated resource catalog to include new actions for ROM info and agent proposal management.
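
A minimal sketch of the conditional header strip described above. The commit message does not show the detection rule, so this assumes the common heuristic that a copier (SMC) header is 512 bytes and is present when the file size modulo 1024 equals 512. The field names in `RomLoadOptions` and the signature of `MaybeStripSmcHeader` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical field names: the commit only says the struct covers header
// stripping, metadata population, and Zelda 3 content loading.
struct RomLoadOptions {
  bool strip_header = true;
  bool populate_metadata = true;
  bool load_zelda3_content = true;
};

constexpr std::size_t kSmcHeaderSize = 512;  // assumed copier header length
constexpr std::size_t kSizeAlignment = 1024; // assumed alignment of headerless dumps

// Removes a leading 512-byte header when the size heuristic matches.
// Returns true if bytes were removed, false if the data was left untouched.
bool MaybeStripSmcHeader(std::vector<std::uint8_t>& data,
                         const RomLoadOptions& options) {
  if (!options.strip_header) return false;
  if (data.size() % kSizeAlignment != kSmcHeaderSize) return false;
  data.erase(data.begin(),
             data.begin() + static_cast<std::ptrdiff_t>(kSmcHeaderSize));
  return true;
}
```

Returning a flag rather than silently mutating lets a caller record whether the on-disk file carried a header, which matters if the ROM is later written back in its original layout.
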
scawful
2025-10-01 18:18:48 -04:00
parent 04a4d04f4e
commit 02c6985201
13 changed files with 1373 additions and 72 deletions

@@ -93,11 +93,18 @@ The generative workflow has been refined to incorporate more detailed planning a
- **Project Scaffolding**: Implemented.
### Phase 4: Agentic Framework & Generative AI (In Progress)
-- **`z3ed agent` command**: Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, and `learn` subcommands.
+- **`z3ed agent` command**: Implemented with `run`, `plan`, `diff`, `test`, `commit`, `revert`, `describe`, `learn`, and `list` subcommands.
- **Resource Catalog System**: ✅ Complete - comprehensive schema for all CLI commands with effects and returns metadata.
- **Agent Describe Command**: ✅ Fully operational - exports command catalog in JSON/YAML formats for AI consumption.
- **Agent List Command**: ✅ Complete - enumerates all proposals with status and metadata.
- **Agent Diff Enhancement**: ✅ Complete - reads proposals from registry, supports `--proposal-id` flag, displays execution logs and metadata.
- **Machine-Readable API**: ✅ `docs/api/z3ed-resources.yaml` generated and maintained for automation.
- **AI Model Interaction**: In progress, with `MockAIService` and `GeminiAIService` (conditional) implemented.
- **Execution Loop (MCP)**: In progress, with command parsing and execution logic.
-- **Leveraging `ImGuiTestEngine`**: In progress, with `agent test` subcommand.
-- **Granular Data Commands**: Not started, but planned.
+- **Leveraging `ImGuiTestEngine`**: In progress, with `agent test` subcommand for GUI verification.
+- **Sandbox ROM Management**: ✅ Complete - `RomSandboxManager` operational with full lifecycle management.
+- **Proposal Tracking**: ✅ Complete - `ProposalRegistry` implemented with metadata, diffs, logs, and lifecycle management.
+- **Granular Data Commands**: Partially complete - rom, palette, overworld, dungeon commands operational.
- **SpriteBuilder CLI**: Deprioritized.
### Phase 5: Code Structure & UX Improvements (Completed)
@@ -108,6 +115,27 @@ The generative workflow has been refined to incorporate more detailed planning a
- **Build System**: Streamlined CMake configuration with proper dependency management and conditional compilation.
- **Code Quality**: Resolved linting errors and improved code maintainability through better header organization and forward declarations.
### Phase 6: Resource Catalogue & API Documentation (✅ Completed - Oct 1, 2025)
- **Resource Schema System**: ✅ Comprehensive schema definitions for all CLI resources (ROM, Patch, Palette, Overworld, Dungeon, Agent).
- **Metadata Annotations**: ✅ All commands annotated with arguments, effects, returns, and stability levels.
- **Serialization Framework**: ✅ Dual-format export (JSON compact, YAML human-readable) with resource filtering.
- **Agent Describe Command**: ✅ Full implementation with `--format`, `--resource`, `--output`, `--version` flags.
- **API Documentation Generation**: ✅ Automated generation of `docs/api/z3ed-resources.yaml` for AI/tooling consumption.
- **Flag-Based Dispatch**: ✅ Hardened command routing - all ROM commands use `FLAGS_rom` consistently.
- **ROM Info Fix**: ✅ Created dedicated `RomInfo` handler, resolving segfault issue.
**Key Achievements**:
- Machine-readable API catalog enables LLM integration for automated ROM hacking workflows
- Comprehensive command documentation with argument types, effects, and return schemas
- Stable foundation for AI agents to discover and invoke CLI commands programmatically
- Validation layer for ensuring command compatibility and argument correctness
**Testing Coverage**:
- ✅ All ROM commands tested: `info`, `validate`, `diff`, `generate-golden`
- ✅ Agent describe tested: YAML output, JSON output, resource filtering, file generation
- ✅ Help system integration verified with updated command listings
- ✅ Build system validated on macOS (arm64) with no critical warnings
## 8. Agentic Framework Architecture - Advanced Dive
The agentic framework is designed to allow an AI agent to make edits to the ROM based on high-level natural language prompts. The framework is built around the `z3ed` CLI and the `ImGuiTestEngine`. This section provides a more advanced look into its architecture and future development.
@@ -118,10 +146,12 @@ The `z3ed agent` command is the main entry point for the agent. It has the follo
- `run --prompt "..."`: Executes a prompt by generating and running a sequence of `z3ed` commands.
- `plan --prompt "..."`: Shows the sequence of `z3ed` commands the AI plans to execute.
-- `diff`: Shows a diff of the changes made to the ROM after running a prompt.
+- `diff [--proposal-id <id>]`: Shows a diff of the changes made to the ROM after running a prompt. Displays the latest pending proposal by default, or a specific proposal if ID is provided.
- `list`: Lists all proposals with their status, creation time, prompt, and execution statistics.
- `test --prompt "..."`: Generates changes and then runs an `ImGuiTestEngine` test to verify them.
- `commit`: Saves the modified ROM and any new assets to the project.
- `revert`: Reverts the changes made by the agent.
- `describe [--resource <name>]`: Returns machine-readable schemas for CLI commands, enabling AI/LLM integration.
- `learn --description "..."`: Records a sequence of user actions (CLI commands and GUI interactions) and associates them with a natural language description, allowing the agent to learn new workflows.
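
The `list` and `diff` behavior above depends on a registry that can enumerate proposals and pick the latest pending one. The following is a rough in-memory sketch of that idea only. The status values, method names, and id scheme are assumptions for illustration and are not taken from the real `ProposalRegistry`.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical status set; the actual lifecycle states are not shown here.
enum class ProposalStatus { kPending, kAccepted, kRejected };

struct Proposal {
  std::string id;
  std::string prompt;
  ProposalStatus status = ProposalStatus::kPending;
  std::vector<std::string> log;  // execution log lines
};

class ProposalRegistry {
 public:
  // Creates a pending proposal and returns its generated id.
  std::string Create(std::string prompt) {
    Proposal p;
    p.id = "proposal-" + std::to_string(proposals_.size() + 1);
    p.prompt = std::move(prompt);
    proposals_.push_back(std::move(p));
    return proposals_.back().id;
  }

  bool AppendLog(const std::string& id, std::string line) {
    Proposal* p = Find(id);
    if (p == nullptr) return false;
    p->log.push_back(std::move(line));
    return true;
  }

  bool SetStatus(const std::string& id, ProposalStatus status) {
    Proposal* p = Find(id);
    if (p == nullptr) return false;
    p->status = status;
    return true;
  }

  // Backs a `list`-style command: every proposal in creation order.
  const std::vector<Proposal>& List() const { return proposals_; }

  // Backs `diff --proposal-id <id>`.
  std::optional<Proposal> Get(const std::string& id) const {
    for (const auto& p : proposals_) {
      if (p.id == id) return p;
    }
    return std::nullopt;
  }

  // Backs `diff` with no id: the newest proposal that is still pending.
  std::optional<Proposal> LatestPending() const {
    for (auto it = proposals_.rbegin(); it != proposals_.rend(); ++it) {
      if (it->status == ProposalStatus::kPending) return *it;
    }
    return std::nullopt;
  }

 private:
  Proposal* Find(const std::string& id) {
    for (auto& p : proposals_) {
      if (p.id == id) return &p;
    }
    return nullptr;
  }

  std::vector<Proposal> proposals_;
};
```
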
### 8.2. The Agentic Loop (MCP) - Detailed Workflow
@@ -414,6 +444,37 @@ Allowing an LLM to drive the ImGui UI safely requires a structured bridge betwee
- **Synchronization Primitives**: Provide `WaitForIdle`, `WaitForCondition`, and `Delay` primitives so LLMs can coordinate with frame updates. Each primitive enforces timeouts and returns explicit success/failure statuses.
- **State Queries**: Implement reflection endpoints retrieving ImGui widget hierarchy, enabling the agent to confirm UI states before issuing the next action—mirroring how `ImGuiTestEngine` DSL scripts work today.
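
To make the timeout contract of the synchronization primitives concrete, here is a small polling sketch of a `WaitForCondition`-style helper. The signature, the `WaitStatus` enum, and the poll interval are assumptions; a real harness would more likely re-check the predicate once per rendered frame than sleep on a wall-clock interval.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Explicit outcome, mirroring the "success/failure status" requirement.
enum class WaitStatus { kOk, kTimeout };

// Polls `condition` until it returns true or `timeout` elapses.
// The condition is always evaluated at least once.
WaitStatus WaitForCondition(
    const std::function<bool()>& condition,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll = std::chrono::milliseconds(5)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) return WaitStatus::kOk;
    if (std::chrono::steady_clock::now() >= deadline) {
      return WaitStatus::kTimeout;
    }
    std::this_thread::sleep_for(poll);
  }
}
```

Using `steady_clock` keeps the deadline immune to system clock adjustments, which is the usual choice for timeouts.
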
#### 13.1.1. Transport & Envelope
- **Session bootstrap**: `yaze_test --automation=<socket path>` spins up the harness and prints a connection URI. The CLI or external agent opens a persistent stream (Unix domain socket on macOS/Linux, named pipe + overlapped IO on Windows). TLS is out-of-scope; trust is derived from local IPC.
- **Message format**: Each frame is a length-prefixed JSON envelope with optional binary attachments. Core fields (`type` is one of `event`, `query`, `expect`, or `control`; `payload` holds the type-specific body):
```json
{
  "id": "req-42",
  "type": "event",
  "payload": {},
  "attachments": [
    { "slot": 0, "mime": "image/png" }
  ]
}
```
Binary blobs (e.g., screenshots) follow immediately after the JSON payload in the same frame to avoid out-of-band coordination.
- **Streaming semantics**: Responses reuse the `id` field and include `status`, `error`, and optional attachments. Long-running operations (`WaitForCondition`) stream periodic `progress` updates before returning `status: "ok"` or `status: "timeout"`.
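
One way to picture the length-prefixed framing is the sketch below. The exact wire layout is not specified in this section, so the layout used here is an assumption: a 4-byte big-endian body length, a 4-byte big-endian JSON length, the JSON bytes, then the raw attachment bytes. The JSON is treated as an opaque string; parsing it is out of scope for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct Frame {
  std::string json;   // serialized envelope
  Bytes attachments;  // blobs that follow the JSON in the same frame
};

// Appends a 32-bit value in big-endian byte order.
void PutU32(Bytes& out, std::uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
  }
}

// Reads a big-endian 32-bit value; the caller guarantees 4 readable bytes.
std::uint32_t GetU32(const Bytes& in, std::size_t offset) {
  std::uint32_t v = 0;
  for (std::size_t i = 0; i < 4; ++i) v = (v << 8) | in[offset + i];
  return v;
}

// Assumed layout: [body_len][json_len][json][attachments],
// where body_len counts everything after the first 4 bytes.
Bytes EncodeFrame(const Frame& f) {
  Bytes out;
  PutU32(out, static_cast<std::uint32_t>(4 + f.json.size() +
                                         f.attachments.size()));
  PutU32(out, static_cast<std::uint32_t>(f.json.size()));
  out.insert(out.end(), f.json.begin(), f.json.end());
  out.insert(out.end(), f.attachments.begin(), f.attachments.end());
  return out;
}

// Returns nullopt when the buffer is incomplete or the lengths disagree.
std::optional<Frame> DecodeFrame(const Bytes& in) {
  if (in.size() < 8) return std::nullopt;
  const std::size_t body_len = GetU32(in, 0);
  const std::size_t json_len = GetU32(in, 4);
  if (body_len < 4 || in.size() - 4 < body_len) return std::nullopt;
  if (json_len > body_len - 4) return std::nullopt;
  const auto json_begin = in.begin() + 8;
  const auto json_end = json_begin + static_cast<std::ptrdiff_t>(json_len);
  const auto frame_end =
      in.begin() + static_cast<std::ptrdiff_t>(4 + body_len);
  Frame f;
  f.json.assign(json_begin, json_end);
  f.attachments.assign(json_end, frame_end);
  return f;
}
```

Carrying the JSON length separately from the body length is what lets the reader find the attachment boundary without scanning the JSON text.
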
#### 13.1.2. Harness Runtime Lifecycle
1. **Attach**: Agent sends a `control` message (`{"command":"attach"}`) to lock in a session. Harness responds with negotiated capabilities (available input devices, screenshot formats, rate limits).
2. **Activate context**: Agent issues an `event` to focus a specific ImGui context (e.g., "main", "palette_editor"). Harness binds to the corresponding `ImGuiTestEngine` backend fixture.
3. **Execute actions**: Agent streams `event` objects (`click`, `drag`, `keystroke`, `text_input`). Harness feeds them into the ImGui event queue at the start of the next frame, waits for the frame to settle, then replies.
4. **Query & assert**: Agent interleaves `query` messages (`get_widget_tree`, `capture_screenshot`, `read_value`) and `expect` messages (`assert_property`, `assert_pixel`). Harness routes these to existing ImGuiTestEngine inspectors, lifting the results into structured JSON.
5. **Detach**: Agent issues `{"command":"detach"}` (or connection closes). Harness flushes pending frames, releases sandbox locks, and tears down the socket.
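
The five lifecycle steps above amount to a small state machine. The sketch below models only the ordering rules (attach before anything else, focus a context before actions, queries, or assertions). The class name, the string return values, and the `focus_context` event name are hypothetical and stand in for whatever the harness actually uses.

```cpp
#include <string>

enum class SessionState { kDetached, kAttached, kContextActive };

class HarnessSession {
 public:
  // Returns "ok" or an "error: ..." string for a (type, command) pair.
  std::string Handle(const std::string& type, const std::string& command) {
    if (type == "control") {
      if (command == "attach") {
        if (state_ != SessionState::kDetached) {
          return "error: already attached";
        }
        state_ = SessionState::kAttached;
        return "ok";
      }
      if (command == "detach") {
        state_ = SessionState::kDetached;  // step 5: release the session
        return "ok";
      }
      return "error: unknown control command";
    }
    if (state_ == SessionState::kDetached) return "error: not attached";
    // Step 2: "focus_context" is an assumed name for the focus event.
    if (type == "event" && command == "focus_context") {
      state_ = SessionState::kContextActive;
      return "ok";
    }
    // Steps 3 and 4 require an active ImGui context.
    if (type == "event" || type == "query" || type == "expect") {
      if (state_ != SessionState::kContextActive) {
        return "error: no active context";
      }
      return "ok";
    }
    return "error: unknown message type";
  }

  SessionState state() const { return state_; }

 private:
  SessionState state_ = SessionState::kDetached;
};
```
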
#### 13.1.3. Integration with `z3ed agent`
- **Plan annotation**: The CLI plan schema gains a new step kind `imgui_action` with fields `harness_uri`, `actions[]`, and optional `expect[]`. During execution `z3ed agent run` opens the harness stream, feeds each action, and short-circuits on first failure.
- **Sandbox awareness**: Harness sessions inherit the active sandbox ROM path from `RomSandboxManager`, ensuring UI assertions operate on the same data snapshot as CLI mutations.
- **Telemetry hooks**: Every harness response is appended to the proposal timeline (see §12) with thumbnails for screenshots. Failures bubble up as structured errors with hints (`"missing_widget": "Palette/Cell[12]"`).
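
The short-circuit behavior of an `imgui_action` step could look roughly like the following. The struct mirrors the three fields named above; everything else (`StepResult`, `Sender`, `RunStep`, and the use of plain strings for actions) is an assumption made to keep the sketch self-contained, with the transport abstracted behind a callback.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Mirrors the plan-step fields named in the text; element types are assumed.
struct ImGuiActionStep {
  std::string harness_uri;
  std::vector<std::string> actions;
  std::vector<std::string> expect;
};

struct StepResult {
  bool ok = true;
  std::size_t executed = 0;  // messages sent, including the failing one
  std::string failed;        // the item that failed, empty when ok
};

// Hypothetical transport hook: sends one message, returns true on success.
using Sender =
    std::function<bool(const std::string& type, const std::string& body)>;

// Sends actions first, then expectations, stopping at the first failure.
StepResult RunStep(const ImGuiActionStep& step, const Sender& send) {
  StepResult result;
  auto run_all = [&](const char* type, const std::vector<std::string>& items) {
    for (const auto& item : items) {
      ++result.executed;
      if (!send(type, item)) {
        result.ok = false;
        result.failed = item;
        return false;
      }
    }
    return true;
  };
  if (run_all("event", step.actions)) run_all("expect", step.expect);
  return result;
}
```

Recording the failing item in the result is what would allow the caller to surface a structured hint instead of a bare failure flag.
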
### 13.2. Safety & Sandboxing
- **Read-Only Default**: Harness sessions start in read-only mode; mutation commands must explicitly request escalation after presenting a plan (triggering a UI prompt for the user to authorize). Without authorization, only `capture` and `assert` operations succeed.
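
As an illustration of the read-only default, the gate below allows `capture` and `assert` unconditionally and everything else only after authorization. The class and method names are hypothetical, and the user-facing authorization prompt is reduced to a single `GrantMutation` call.

```cpp
#include <string>

class HarnessPermissions {
 public:
  // Called once the user has approved the presented plan.
  void GrantMutation() { mutation_authorized_ = true; }

  // Read-only operations always pass; others need prior authorization.
  bool IsAllowed(const std::string& operation) const {
    if (operation == "capture" || operation == "assert") return true;
    return mutation_authorized_;
  }

 private:
  bool mutation_authorized_ = false;  // sessions start read-only
};
```
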