1. Overview
This document outlines a design for the evolution of z3ed, the command-line interface for the YAZE project. The goal is to transform z3ed from a collection of utility commands into a powerful, scriptable, and extensible tool for both manual and automated ROM hacking, with a forward-looking approach to AI-driven generative development.
1.1. Current State
z3ed has evolved significantly. The initial limitations regarding scope, inconsistent structure, basic TUI, and limited scriptability have largely been addressed through the implementation of resource-oriented commands, modular TUI components, and structured output. The focus has shifted towards building a robust foundation for AI-driven generative hacking.
2. Design Goals
The proposed redesign focuses on three core pillars:
- Power & Usability for ROM Hackers: Empower users with fine-grained control over all aspects of the ROM directly from the command line.
- Testability & Automation: Provide robust commands for validating ROM integrity and automating complex testing scenarios.
- AI & Generative Hacking: Establish a powerful, scriptable API that an AI agent (MCP) can use to perform complex, generative tasks on the ROM.
3. Proposed CLI Architecture: Resource-Oriented Commands
The CLI has adopted a z3ed <resource> <action> [options] structure, similar to modern CLIs like gcloud or kubectl, improving clarity and extensibility.
3.1. Top-Level Resources
- rom: Commands for interacting with the ROM file itself.
- patch: Commands for applying and creating patches.
- gfx: Commands for graphics manipulation.
- palette: Commands for palette manipulation.
- overworld: Commands for overworld editing.
- dungeon: Commands for dungeon editing.
- sprite: Commands for sprite management and creation.
- test: Commands for running tests.
- tui: The entrypoint for the enhanced Text User Interface.
- agent: Commands for interacting with the AI agent.
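As an illustration of the z3ed <resource> <action> [options] shape, the following is a minimal sketch of how the arguments could be split into their parts. ParsedCommand, KnownResources, and ParseCommand are hypothetical names, not the project's actual API. The sketch also assumes every resource requires an action, which may not hold for tui.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical parsed form of `z3ed <resource> <action> [options]`.
struct ParsedCommand {
  std::string resource;
  std::string action;
  std::vector<std::string> options;
};

// Resource names taken from the list in section 3.1.
inline const std::set<std::string>& KnownResources() {
  static const std::set<std::string> kResources = {
      "rom",     "patch",  "gfx",  "palette", "overworld",
      "dungeon", "sprite", "test", "tui",     "agent"};
  return kResources;
}

// Takes the arguments that follow the program name. Returns std::nullopt
// when the resource is unknown or the action is missing (an assumption of
// this sketch, not a documented rule).
inline std::optional<ParsedCommand> ParseCommand(
    const std::vector<std::string>& args) {
  if (args.size() < 2 || KnownResources().count(args[0]) == 0) {
    return std::nullopt;
  }
  ParsedCommand cmd;
  cmd.resource = args[0];
  cmd.action = args[1];
  cmd.options.assign(args.begin() + 2, args.end());
  return cmd;
}
```

Rejecting unknown resources before dispatch keeps the error path in one place, so individual handlers only see well-formed input.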
3.2. Example Command Mapping
The command mapping has been successfully implemented, transitioning from the old flat structure to the new resource-oriented approach.
4. New Features & Commands
4.1. For the ROM Hacker (Power & Scriptability)
These commands focus on exporting data to, and importing data from, the original SCAD (Nintendo Super Famicom/SNES CAD) binary formats found in the gigaleak, as well as other relevant binary formats. This enables direct interaction with development assets, version control, and sharing. Many of these commands have been implemented or are in progress.
- Dungeon Editing: Commands for exporting, importing, listing, and adding objects.
- Overworld Editing: Commands for getting, setting tiles, listing, and moving sprites.
- Graphics & Palettes: Commands for exporting/importing sheets and palettes.
4.2. For Testing & Automation
- ROM Validation & Comparison: z3ed rom validate, z3ed rom diff, and z3ed rom generate-golden have been implemented.
- Test Execution: z3ed test run and z3ed test list-suites are in progress.
5. TUI Enhancements
The --tui flag now launches a significantly enhanced, interactive terminal application built with FTXUI. The TUI has been decomposed into a set of modular components, with each command handler responsible for its own TUI representation, making it more extensible and easier to maintain.
- Dashboard View: The main screen is evolving into a dashboard.
- Interactive Palette Editor: In progress.
- Interactive Hex Viewer: Implemented.
- Command Palette: In progress.
- Tabbed Layout: Implemented.
6. Generative & Agentic Workflows (MCP Integration)
The redesigned CLI serves as the foundational API for an AI-driven Model-Code-Program (MCP) loop. The AI agent's "program" is a script of z3ed commands.
6.1. The Generative Workflow
The generative workflow has been refined to incorporate more detailed planning and verification steps, leveraging the z3ed agent commands.
6.2. Key Enablers
- Granular Commands: The CLI provides commands to manipulate data within the binary formats (e.g., palette set-color, gfx set-pixel), abstracting complexity from the AI agent.
- Idempotency: Commands are designed to be idempotent where possible.
- SpriteBuilder CLI: Deprioritized for now, pending further research and development of the underlying assembly generation capabilities.
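To make the idempotency point concrete, here is a minimal sketch of what an idempotent palette set-color operation could look like. The Palette struct and SetColor function are hypothetical. The 16-entry size is an assumption for illustration, and colors are stored in the SNES 15-bit BGR format.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory palette: 16 colors in 15-bit BGR format.
struct Palette {
  std::array<std::uint16_t, 16> colors{};
};

// Sets an absolute value rather than applying a delta, so repeating the
// call leaves the palette unchanged. Returns true only if the palette was
// actually modified.
inline bool SetColor(Palette& palette, std::size_t index,
                     std::uint16_t color) {
  if (index >= palette.colors.size()) return false;  // Out of range: no-op.
  if (palette.colors[index] == color) return false;  // Already applied.
  palette.colors[index] = color;
  return true;
}
```

Because the second identical call reports no change, an agent can safely re-run a partially executed plan without compounding its edits.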
7. Implementation Roadmap
Phase 1: Core CLI & TUI Foundation (Done)
- CLI Structure: Implemented.
- Command Migration: Implemented.
- TUI Decomposition: Implemented.
Phase 2: Interactive TUI & Command Palette (Done)
- Interactive Palette Editor: Implemented.
- Interactive Hex Viewer: Implemented.
- Command Palette: Implemented.
Phase 3: Testing & Project Management (Done)
- rom validate: Implemented.
- rom diff: Implemented.
- rom generate-golden: Implemented.
- Project Scaffolding: Implemented.
Phase 4: Agentic Framework & Generative AI (In Progress)
- z3ed agent command: Implemented with run, plan, diff, test, commit, revert, and learn subcommands.
- AI Model Interaction: In progress, with MockAIService and GeminiAIService (conditional) implemented.
- Execution Loop (MCP): In progress, with command parsing and execution logic.
- Leveraging ImGuiTestEngine: In progress, with agent test subcommand.
- Granular Data Commands: Not started, but planned.
- SpriteBuilder CLI: Deprioritized.
Phase 5: Code Structure & UX Improvements (Done)
- Modular Architecture: Refactored CLI handlers into clean, focused modules with proper separation of concerns.
- TUI Component System: Implemented TuiComponent interface for consistent UI components across the application.
- Unified Command Interface: Standardized CommandHandler base class with both CLI and TUI execution paths.
- Error Handling: Improved error handling with consistent absl::Status usage throughout the codebase.
- Build System: Streamlined CMake configuration with proper dependency management and conditional compilation.
- Code Quality: Resolved linting errors and improved code maintainability through better header organization and forward declarations.
8. Agentic Framework Architecture - Deep Dive
The agentic framework is designed to allow an AI agent to make edits to the ROM based on high-level natural language prompts. The framework is built around the z3ed CLI and the ImGuiTestEngine. This section takes a more detailed look at its architecture and future development.
8.1. The z3ed agent Command
The z3ed agent command is the main entry point for the agent. It has the following subcommands:
- run --prompt "...": Executes a prompt by generating and running a sequence of z3ed commands.
- plan --prompt "...": Shows the sequence of z3ed commands the AI plans to execute.
- diff: Shows a diff of the changes made to the ROM after running a prompt.
- test --prompt "...": Generates changes and then runs an ImGuiTestEngine test to verify them.
- commit: Saves the modified ROM and any new assets to the project.
- revert: Reverts the changes made by the agent.
- learn --description "...": Records a sequence of user actions (CLI commands and GUI interactions) and associates them with a natural language description, allowing the agent to learn new workflows.
8.2. The Agentic Loop (MCP) - Detailed Workflow
- Model (Planner): The agent receives a high-level natural language prompt. It leverages an LLM to break down this goal into a detailed, executable plan. This plan is a sequence of z3ed CLI commands, potentially interleaved with ImGuiTestEngine test steps for intermediate verification. The LLM's prompt includes the user's request, a comprehensive list of available z3ed commands (with their parameters and expected effects), and relevant contextual information about the current ROM state (e.g., loaded ROM, project files, current editor view).
- Code (Command & Test Generation): The LLM returns the generated plan as a structured JSON object. This JSON object contains an array of actions, where each action specifies a z3ed command (with its arguments) or an ImGuiTestEngine test to execute. This structured output is crucial for reliable parsing and execution by the z3ed agent.
- Program (Execution Engine): The z3ed agent parses the JSON plan and executes each command sequentially. For z3ed commands, it directly invokes the corresponding internal CommandHandler methods. For ImGuiTestEngine steps, it launches the yaze_test executable with the appropriate test arguments. The output (stdout, stderr, exit codes) of each executed command is captured. This output, along with any visual feedback from ImGuiTestEngine (e.g., screenshots), can be fed back to the LLM for iterative refinement of the plan.
- Verification (Tester): The ImGuiTestEngine plays a critical role here. After the agent executes a sequence of commands, it can generate and run a specific ImGuiTestEngine script. This script can interact with the YAZE GUI (e.g., open a specific editor, navigate to a location, assert visual properties) to verify that the changes were applied correctly and as intended. The results of these tests (pass/fail, detailed logs, comparison screenshots) are reported back to the user and can be used by the LLM to self-correct or refine its strategy.
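The execution step of the loop above can be sketched as follows. This is a simplified illustration, not the actual agent code. PlanAction, ActionResult, Handler, and ExecutePlan are hypothetical names, the plan is assumed to be already parsed from JSON, and stopping at the first failure (and using exit code 127 for an unknown command) is a design choice of the sketch rather than documented behavior.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// One step of a plan, as it might look after the JSON has been parsed.
struct PlanAction {
  std::string command;  // e.g. "palette set-color"
  std::vector<std::string> args;
};

// Captured outcome of a step, suitable for feeding back to the model.
struct ActionResult {
  int exit_code;
  std::string output;
};

using Handler = std::function<ActionResult(const std::vector<std::string>&)>;

// Runs the plan in order and stops at the first failing step, so the
// collected results describe exactly what was executed.
inline std::vector<ActionResult> ExecutePlan(
    const std::vector<PlanAction>& plan,
    const std::map<std::string, Handler>& handlers) {
  std::vector<ActionResult> results;
  for (const PlanAction& action : plan) {
    auto it = handlers.find(action.command);
    if (it == handlers.end()) {
      results.push_back({127, "unknown command: " + action.command});
      break;
    }
    results.push_back(it->second(action.args));
    if (results.back().exit_code != 0) break;
  }
  return results;
}
```

Returning the per-step results instead of a single status gives the planner enough detail to refine the plan on the next iteration.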
8.3. AI Model & Protocol Strategy
- Models: The framework will support both local and remote AI models, offering flexibility and catering to different user needs.
- Local Models (macOS Setup): For privacy, offline use, and reduced operational costs, integration with local LLMs via Ollama is a priority. Users can easily install Ollama on macOS and pull models optimized for code generation, such as codellama:7b. The z3ed agent will communicate with Ollama's local API endpoint.
- Remote Models (Gemini API): For more complex tasks requiring advanced reasoning capabilities, integration with powerful remote models like the Gemini API will be available. Users will need to provide a GEMINI_API_KEY environment variable. A new GeminiAIService class will be implemented to handle the secure API requests and responses.
- Protocol: A robust, yet simple, JSON-based protocol will be used for communication between z3ed and the AI model. This ensures structured data exchange, critical for reliable parsing and execution. The z3ed tool will serialize the user's prompt, current ROM context, available z3ed commands, and any relevant ImGuiTestEngine capabilities into a JSON object. The AI model will be expected to return a JSON object containing the sequence of commands to be executed, along with potential explanations or confidence scores.
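As a rough illustration of the request side of this protocol, the sketch below serializes a prompt, a piece of ROM context, and the available commands into a JSON object. The field names (prompt, context, rom_title, commands) and the functions JsonEscape and BuildRequest are assumptions made for this example. A real implementation would more likely use a JSON library than hand-built strings, which are used here only to keep the example dependency-free.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Escapes quotes, backslashes, newlines, and tabs. Other control characters
// are not handled in this sketch.
inline std::string JsonEscape(const std::string& text) {
  std::string out;
  for (char c : text) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;      break;
    }
  }
  return out;
}

// Hypothetical request layout; the field names are illustrative only.
inline std::string BuildRequest(const std::string& prompt,
                                const std::string& rom_title,
                                const std::vector<std::string>& commands) {
  std::string json = "{\"prompt\":\"" + JsonEscape(prompt) + "\",";
  json += "\"context\":{\"rom_title\":\"" + JsonEscape(rom_title) + "\"},";
  json += "\"commands\":[";
  for (std::size_t i = 0; i < commands.size(); ++i) {
    if (i > 0) json += ",";
    json += "\"" + JsonEscape(commands[i]) + "\"";
  }
  json += "]}";
  return json;
}
```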
8.4. GUI Integration & User Experience
- Agent Control Panel: A dedicated TUI/GUI panel will be created for managing the agent. This panel will serve as the primary interface for users to interact with the AI. It will feature:
- A multi-line text input for entering natural language prompts.
- Buttons for Run, Plan, Diff, Test, Commit, Revert, and Learn actions.
- A real-time log view displaying the agent's thought process, executed commands, and their outputs.
- A status bar indicating the agent's current state (e.g., "Idle", "Planning", "Executing Commands", "Verifying Changes").
- Diff Editing UI: A TUI-based visual diff viewer will be implemented. This UI will present a side-by-side comparison of the original ROM state (or a previous checkpoint) and the changes proposed or made by the agent. Users will be able to:
- Navigate through individual differences (e.g., changed bytes, modified tiles, added objects).
- Highlight specific changes.
- Accept or reject individual changes or groups of changes, providing fine-grained control over the agent's output.
- Interactive Planning: The agent will present its generated plan in a human-readable format within the GUI. Users will have the opportunity to:
- Review each step of the plan.
- Approve the entire plan for execution.
- Reject specific steps or the entire plan.
- Edit the plan directly (e.g., modify command arguments, reorder steps, insert new commands) before allowing the agent to proceed.
8.5. Testing & Verification
- ImGuiTestEngine Integration: The agent will be able to dynamically generate and execute ImGuiTestEngine tests. This allows for automated visual verification of the agent's work, ensuring that changes are not only functionally correct but also visually appealing and consistent with design principles. The agent can be trained to generate test scripts that assert specific pixel colors, UI element positions, or overall visual layouts.
- Mock Testing Framework: A robust "mock" mode will be implemented for the z3ed agent. In this mode, the agent will simulate the execution of commands without modifying the actual ROM. This is crucial for safe and fast testing of the agent's planning and command generation capabilities. The existing MockRom class will be extended to fully support all z3ed commands, providing a consistent interface for both real and mock execution.
- User-Facing Tests: A "tutorial" or "challenge" mode will be created where users can test the agent with a series of predefined tasks. This will serve as an educational tool for users to understand the agent's capabilities and provide a way to benchmark its performance against specific ROM hacking challenges.
8.6. Safety & Sandboxing
- Dry Run Mode: The agent will always offer a "dry run" mode, where it only shows the commands it would execute without making any actual changes to the ROM. This provides a critical safety net for users.
- Command Whitelisting: The agent's execution environment will enforce a strict command whitelisting policy. Only a predefined set of "safe" z3ed commands will be executable by the AI. Any attempt to execute an unauthorized command will be blocked.
- Resource Limits: The agent will operate within defined resource limits (e.g., maximum number of commands per plan, maximum data modification size) to prevent unintended extensive changes or infinite loops.
- Human Oversight: Given the inherent unpredictability of AI models, human oversight will be a fundamental principle. The interactive planning and diff editing UIs are designed to keep the user in control at all times.
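The whitelisting and resource-limit checks described above could be combined into a single pre-execution validation pass. The following is a minimal sketch under that assumption. SafetyPolicy and ValidatePlan are hypothetical names, and only the command-count limit is modeled.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical policy: which commands the agent may run and how many.
struct SafetyPolicy {
  std::set<std::string> allowed_commands;
  std::size_t max_commands_per_plan;
};

// Returns an empty string when the plan passes, otherwise the reason it was
// rejected. The whole plan is checked before anything is executed.
inline std::string ValidatePlan(const std::vector<std::string>& plan,
                                const SafetyPolicy& policy) {
  if (plan.size() > policy.max_commands_per_plan) {
    return "plan exceeds the command limit";
  }
  for (const std::string& command : plan) {
    if (policy.allowed_commands.count(command) == 0) {
      return "command not allowed: " + command;
    }
  }
  return "";
}
```

Validating the full plan up front means a disallowed command near the end cannot leave the ROM half-modified.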
8.7. Optional JSON Dependency
To avoid breaking platform builds where a JSON library is not available or desired, the JSON-related code will be conditionally compiled using a preprocessor macro (e.g., YAZE_WITH_JSON). When this macro is not defined, the agentic features that rely on JSON will be disabled. The nlohmann/json library will be added as a submodule to the project and included in the build only when YAZE_WITH_JSON is defined.
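A minimal sketch of how this conditional compilation might look in practice. AgentFeaturesEnabled and CountPlannedCommands are hypothetical functions, and the "commands" key is an assumed field name. Only the YAZE_WITH_JSON macro and the nlohmann/json library come from the design above.

```cpp
#include <string>

#ifdef YAZE_WITH_JSON
#include <nlohmann/json.hpp>
#endif

// Reports whether the JSON-dependent agent features were compiled in.
constexpr bool AgentFeaturesEnabled() {
#ifdef YAZE_WITH_JSON
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: counts the entries under "commands" in a model
// response. Returns -1 when JSON support is not built in, or when the
// response cannot be parsed or lacks the key.
inline int CountPlannedCommands(const std::string& response) {
#ifdef YAZE_WITH_JSON
  // The third argument disables exceptions; failures yield a discarded value.
  const auto parsed = nlohmann::json::parse(response, nullptr, false);
  if (parsed.is_discarded() || !parsed.contains("commands")) return -1;
  return static_cast<int>(parsed["commands"].size());
#else
  (void)response;  // Unused without JSON support.
  return -1;
#endif
}
```

Keeping the #ifdef inside the function bodies lets callers compile unchanged in both configurations, at the cost of a sentinel return value when the feature is disabled.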
8.8. Contextual Awareness & Feedback Loop
- Contextual Information: The agent's prompts to the LLM will be enriched with comprehensive contextual information, including:
- The current state of the loaded ROM (e.g., ROM header, loaded assets, current editor view).
- Relevant project files (e.g., .yaze project configuration, symbol files).
- User preferences and previous interactions.
- A dynamic list of available z3ed commands and their detailed usage.
- Feedback Loop for Learning: The results of ImGuiTestEngine verifications and user accept/reject actions will form a crucial feedback loop. This data can be used to fine-tune the LLM or train smaller, specialized models to improve the agent's planning and command generation capabilities over time.
8.9. Error Handling and Recovery
- Robust Error Reporting: The agent will provide clear and actionable error messages when commands fail or unexpected situations arise.
- Rollback Mechanisms: The revert command provides a basic rollback. More advanced mechanisms, such as transactional changes or snapshotting, could be explored for complex multi-step operations.
- Interactive Debugging: In case of errors, the agent could pause execution and allow the user to inspect the current state, modify the plan, or provide corrective instructions.
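One way the snapshotting idea could work is sketched below. RomSession is a hypothetical class, not an existing YAZE type, and the sketch copies the entire buffer on each snapshot, which is only reasonable because SNES ROM images are small.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical editing session that keeps full copies of the ROM buffer.
class RomSession {
 public:
  explicit RomSession(std::vector<std::uint8_t> data)
      : data_(std::move(data)) {}

  // Saves the current state so a later Revert() can return to it.
  void Snapshot() { snapshots_.push_back(data_); }

  // Writes one byte; out-of-range offsets are ignored.
  void WriteByte(std::size_t offset, std::uint8_t value) {
    if (offset < data_.size()) data_[offset] = value;
  }

  // Restores the most recent snapshot. Returns false if none exists.
  bool Revert() {
    if (snapshots_.empty()) return false;
    data_ = std::move(snapshots_.back());
    snapshots_.pop_back();
    return true;
  }

  const std::vector<std::uint8_t>& data() const { return data_; }

 private:
  std::vector<std::uint8_t> data_;
  std::vector<std::vector<std::uint8_t>> snapshots_;
};
```

Taking a snapshot before each plan step would allow reverting a multi-step operation one step at a time, since snapshots are restored in last-in, first-out order.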
8.10. Extensibility
- Modular Command Handlers: The z3ed CLI's modular design allows for easy addition of new commands, which automatically become available to the AI agent.
- Pluggable AI Models: The AIService interface enables seamless integration of different AI models (local or remote) without modifying the core agent logic.
- Custom Test Generation: Users or developers can extend the ImGuiTestEngine capabilities to create custom verification tests for specific hacking scenarios.
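The pluggable-model idea can be illustrated with a small interface sketch. The class names AIService and MockAIService appear in this document, but the GetCommands method, its signature, and the Agent class shown here are assumptions made for the example. The real interface may differ.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the interface; the real AIService signature may differ.
class AIService {
 public:
  virtual ~AIService() = default;
  // Returns the z3ed commands proposed for the given prompt.
  virtual std::vector<std::string> GetCommands(const std::string& prompt) = 0;
};

// Deterministic stand-in that needs no network access or API key.
class MockAIService : public AIService {
 public:
  std::vector<std::string> GetCommands(const std::string& prompt) override {
    if (prompt.empty()) return {};
    return {"rom validate"};
  }
};

// The agent depends only on the interface, so the backend can be swapped
// without touching the planning logic.
class Agent {
 public:
  explicit Agent(std::unique_ptr<AIService> service)
      : service_(std::move(service)) {}

  std::vector<std::string> Plan(const std::string& prompt) {
    return service_->GetCommands(prompt);
  }

 private:
  std::unique_ptr<AIService> service_;
};
```

Under this arrangement a local or remote backend would be another subclass passed to the Agent constructor, and tests can run against the mock alone.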
9. UX Improvements and Architectural Decisions
9.1. TUI Component Architecture
The TUI system has been redesigned around a consistent component architecture:
- TuiComponent Interface: All UI components implement a standard interface with a Render() method, ensuring consistency across the application.
- Component Composition: Complex UIs are built by composing simpler components, making the code more maintainable and testable.
- Event Handling: Standardized event handling patterns across all components for consistent user experience.
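The component pattern described above can be sketched without FTXUI as follows. This is a simplified stand-in: here Render() returns plain text, whereas the real TuiComponent presumably returns an FTXUI type. Label and VerticalBox are hypothetical components invented for the example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the interface: Render() returns plain text here.
class TuiComponent {
 public:
  virtual ~TuiComponent() = default;
  virtual std::string Render() const = 0;
};

// Leaf component that renders a single line of text.
class Label : public TuiComponent {
 public:
  explicit Label(std::string text) : text_(std::move(text)) {}
  std::string Render() const override { return text_; }

 private:
  std::string text_;
};

// Composite that renders its children top to bottom, one per line.
class VerticalBox : public TuiComponent {
 public:
  void Add(std::unique_ptr<TuiComponent> child) {
    children_.push_back(std::move(child));
  }
  std::string Render() const override {
    std::string out;
    for (const auto& child : children_) out += child->Render() + "\n";
    return out;
  }

 private:
  std::vector<std::unique_ptr<TuiComponent>> children_;
};
```

Because the composite is itself a TuiComponent, boxes can be nested inside other boxes, which is what makes larger screens testable piece by piece.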
9.2. Command Handler Unification
The CLI and TUI systems now share a unified command handler architecture:
- Dual Execution Paths: Each command handler supports both CLI (Run()) and TUI (RunTUI()) execution modes.
- Shared State Management: Common functionality like ROM loading and validation is centralized in the base CommandHandler class.
- Consistent Error Handling: All commands use absl::Status for uniform error reporting across CLI and TUI modes.
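A rough sketch of how a base class might provide both execution paths while centralizing the ROM check. The Status struct is a minimal stand-in for absl::Status so the example has no dependencies. The Execute hook, set_rom_path, last_screen, and RomValidateHandler are hypothetical, and the real Run() and RunTUI() signatures may differ.

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal stand-in for absl::Status.
struct Status {
  bool ok;
  std::string message;
};

class CommandHandler {
 public:
  virtual ~CommandHandler() = default;

  // CLI path: validates shared preconditions, then runs the command.
  Status Run(const std::vector<std::string>& args) {
    Status loaded = EnsureRomLoaded();
    if (!loaded.ok) return loaded;
    return Execute(args);
  }

  // TUI path: same checks and logic. Displaying the result is simulated by
  // storing the text that would be shown on screen.
  Status RunTUI() {
    Status result = Run({});
    last_screen_ = result.ok ? result.message : "Error: " + result.message;
    return result;
  }

  void set_rom_path(std::string path) { rom_path_ = std::move(path); }
  const std::string& last_screen() const { return last_screen_; }

 protected:
  // Command-specific logic supplied by each handler.
  virtual Status Execute(const std::vector<std::string>& args) = 0;

 private:
  Status EnsureRomLoaded() const {
    if (rom_path_.empty()) return {false, "no ROM loaded"};
    return {true, ""};
  }
  std::string rom_path_;
  std::string last_screen_;
};

// Example handler: only the command-specific logic lives in the subclass.
class RomValidateHandler : public CommandHandler {
 protected:
  Status Execute(const std::vector<std::string>&) override {
    return {true, "ROM is valid"};
  }
};
```

Routing both paths through the same precondition check is what keeps error messages identical between CLI and TUI modes.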
9.3. Interface Consolidation
Several interfaces have been combined and simplified:
- Unified Menu System: The main menu now serves as a central hub for both direct command execution and TUI mode switching.
- Integrated Help System: Help information is accessible from both CLI and TUI modes with consistent formatting.
- Streamlined Navigation: Reduced cognitive load by consolidating related functionality into single interfaces.
9.4. Code Organization Improvements
The codebase has been restructured for better maintainability:
- Header Organization: Proper forward declarations and include management to reduce compilation dependencies.
- Namespace Management: Clean namespace usage to avoid conflicts and improve code clarity.
- Build System Optimization: Streamlined CMake configuration with conditional compilation for optional features.
9.5. Future UX Enhancements
Based on the current architecture, several UX improvements are planned:
- Progressive Disclosure: Complex commands will offer both simple and advanced modes.
- Context-Aware Help: Help text will adapt based on current ROM state and available commands.
- Undo/Redo System: Command history tracking for safer experimentation.
- Batch Operations: Support for executing multiple related commands as a single operation.