yaze/docs/E6-z3ed-cli-design.md at ba50d89e7d12e09e9ee4d899af31b29fb2a13c74

scawful ba50d89e7d Update z3ed CLI tool and project build configuration

2025-10-01 08:57:10 -04:00

1. Overview

This document outlines a design for the evolution of z3ed, the command-line interface for the YAZE project. The goal is to transform z3ed from a collection of utility commands into a powerful, scriptable, and extensible tool for both manual and automated ROM hacking, with a forward-looking approach to AI-driven generative development.

1.1. Current State

The initial limitations of z3ed (narrow scope, inconsistent command structure, a basic TUI, and limited scriptability) have largely been addressed through the implementation of resource-oriented commands, modular TUI components, and structured output. The focus has shifted towards building a robust foundation for AI-driven generative hacking.

2. Design Goals

The proposed redesign focuses on three core pillars:

Power & Usability for ROM Hackers: Empower users with fine-grained control over all aspects of the ROM directly from the command line.
Testability & Automation: Provide robust commands for validating ROM integrity and automating complex testing scenarios.
AI & Generative Hacking: Establish a powerful, scriptable API that an AI agent can use, through the Model-Code-Program (MCP) loop described in Section 6, to perform complex, generative tasks on the ROM.

3. Proposed CLI Architecture: Resource-Oriented Commands

The CLI has adopted a z3ed <resource> <action> [options] structure, similar to modern CLIs like gcloud or kubectl, improving clarity and extensibility.
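
As a rough illustration of the `z3ed <resource> <action> [options]` shape, the sketch below splits an argument list into its parts. The `ParsedCommand` struct, the `ParseCommand` function, and the `--key=value` option syntax are assumptions made for this example; they are not taken from the z3ed source.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed form of `z3ed <resource> <action> [options]`.
struct ParsedCommand {
  std::string resource;                        // e.g. "rom"
  std::string action;                          // e.g. "validate"
  std::map<std::string, std::string> options;  // "--key=value" or bare "--flag"
  std::vector<std::string> positionals;        // arguments not starting with "--"
};

// Parses the arguments that follow the program name. Returns std::nullopt
// when the resource or the action is missing.
std::optional<ParsedCommand> ParseCommand(const std::vector<std::string>& args) {
  if (args.size() < 2) return std::nullopt;
  ParsedCommand cmd;
  cmd.resource = args[0];
  cmd.action = args[1];
  for (std::size_t i = 2; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg.rfind("--", 0) == 0) {  // starts with "--"
      const std::size_t eq = arg.find('=');
      if (eq == std::string::npos) {
        cmd.options[arg.substr(2)] = "";  // bare flag, stored with an empty value
      } else {
        cmd.options[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
      }
    } else {
      cmd.positionals.push_back(arg);
    }
  }
  return cmd;
}
```

Keeping the resource and action as the first two positional tokens means a dispatcher can look up a handler by that pair before it inspects any options, which is one reason this layout extends easily to new resources.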

3.1. Top-Level Resources

rom: Commands for interacting with the ROM file itself.
patch: Commands for applying and creating patches.
gfx: Commands for graphics manipulation.
palette: Commands for palette manipulation.
overworld: Commands for overworld editing.
dungeon: Commands for dungeon editing.
sprite: Commands for sprite management and creation.
test: Commands for running tests.
tui: The entrypoint for the enhanced Text User Interface.
agent: Commands for interacting with the AI agent.

3.2. Example Command Mapping

The command mapping has been successfully implemented, transitioning from the old flat structure to the new resource-oriented approach.

4. New Features & Commands

4.1. For the ROM Hacker (Power & Scriptability)

These commands focus on exporting data to, and importing data from, the original SCAD (Nintendo Super Famicom/SNES CAD) binary formats found in the gigaleak, as well as other relevant binary formats. This enables direct interaction with development assets and makes them easier to version-control and share. Many of these commands have been implemented or are in progress.

Dungeon Editing: Commands for exporting, importing, listing, and adding objects.
Overworld Editing: Commands for getting and setting tiles, and for listing and moving sprites.
Graphics & Palettes: Commands for exporting/importing sheets and palettes.

4.2. For Testing & Automation

ROM Validation & Comparison: z3ed rom validate, z3ed rom diff, and z3ed rom generate-golden have been implemented.
Test Execution: z3ed test run and z3ed test list-suites are in progress.

5. TUI Enhancements

The --tui flag now launches a significantly enhanced, interactive terminal application built with FTXUI. The TUI has been decomposed into a set of modular components, with each command handler responsible for its own TUI representation, making it more extensible and easier to maintain.

Dashboard View: The main screen is evolving into a dashboard.
Interactive Palette Editor: In progress.
Interactive Hex Viewer: Implemented.
Command Palette: In progress.
Tabbed Layout: Implemented.

6. Generative & Agentic Workflows (MCP Integration)

The redesigned CLI serves as the foundational API for an AI-driven Model-Code-Program (MCP) loop. The AI agent's "program" is a script of z3ed commands.

6.1. The Generative Workflow

The generative workflow has been refined to incorporate more detailed planning and verification steps, leveraging the z3ed agent commands.

6.2. Key Enablers

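Granular Commands: The CLI is intended to provide commands that manipulate data within the binary formats (e.g., palette set-color, gfx set-pixel), abstracting complexity from the AI agent. These granular data commands are planned but not yet started (see the roadmap in Section 7).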
Idempotency: Commands are designed to be idempotent where possible.
SpriteBuilder CLI: Deprioritized for now, pending further research and development of the underlying assembly generation capabilities.
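
To make the idempotency goal concrete, here is a minimal sketch of what a `palette set-color` style operation could look like internally. The `Palette` struct, the 16-entry size, and the `SetResult` enum are assumptions for illustration only. The point is that running the same command twice leaves the same state, and the second call can report that nothing changed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory palette: 16 entries of 15-bit SNES color.
struct Palette {
  std::array<std::uint16_t, 16> colors{};
};

enum class SetResult { kChanged, kUnchanged, kOutOfRange };

// Sets one color. Repeating the same call is harmless: the second call
// finds the value already in place and reports kUnchanged.
SetResult SetColor(Palette& palette, std::size_t index, std::uint16_t color) {
  if (index >= palette.colors.size()) return SetResult::kOutOfRange;
  if (palette.colors[index] == color) return SetResult::kUnchanged;
  palette.colors[index] = color;
  return SetResult::kChanged;
}
```

An agent that retries a failed plan from the start benefits from this property, since steps that already succeeded do not compound their effects.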

7. Implementation Roadmap

Phase 1: Core CLI & TUI Foundation (Done)

CLI Structure: Implemented.
Command Migration: Implemented.
TUI Decomposition: Implemented.

Phase 2: Interactive TUI & Command Palette (Done)

Interactive Palette Editor: Implemented.
Interactive Hex Viewer: Implemented.
Command Palette: Implemented.

Phase 3: Testing & Project Management (Done)

rom validate: Implemented.
rom diff: Implemented.
rom generate-golden: Implemented.
Project Scaffolding: Implemented.

Phase 4: Agentic Framework & Generative AI (In Progress)

z3ed agent command: Implemented with run, plan, diff, test, commit, revert, and learn subcommands.
AI Model Interaction: In progress, with MockAIService and GeminiAIService (conditional) implemented.
Execution Loop (MCP): In progress, with command parsing and execution logic.
Leveraging ImGuiTestEngine: In progress, with agent test subcommand.
Granular Data Commands: Not started, but planned.
SpriteBuilder CLI: Deprioritized.

Phase 5: Code Structure & UX Improvements (Done)

Modular Architecture: Refactored CLI handlers into clean, focused modules with proper separation of concerns.
TUI Component System: Implemented TuiComponent interface for consistent UI components across the application.
Unified Command Interface: Standardized CommandHandler base class with both CLI and TUI execution paths.
Error Handling: Improved error handling with consistent absl::Status usage throughout the codebase.
Build System: Streamlined CMake configuration with proper dependency management and conditional compilation.
Code Quality: Resolved linting errors and improved code maintainability through better header organization and forward declarations.

8. Agentic Framework Architecture - Deep Dive

The agentic framework is designed to allow an AI agent to make edits to the ROM based on high-level natural language prompts. The framework is built around the z3ed CLI and the ImGuiTestEngine. This section takes a more detailed look at its architecture and planned development.

8.1. The `z3ed agent` Command

The z3ed agent command is the main entry point for the agent. It has the following subcommands:

run --prompt "...": Executes a prompt by generating and running a sequence of z3ed commands.
plan --prompt "...": Shows the sequence of z3ed commands the AI plans to execute.
diff: Shows a diff of the changes made to the ROM after running a prompt.
test --prompt "...": Generates changes and then runs an ImGuiTestEngine test to verify them.
commit: Saves the modified ROM and any new assets to the project.
revert: Reverts the changes made by the agent.
learn --description "...": Records a sequence of user actions (CLI commands and GUI interactions) and associates them with a natural language description, allowing the agent to learn new workflows.

8.2. The Agentic Loop (MCP) - Detailed Workflow

Model (Planner): The agent receives a high-level natural language prompt. It leverages an LLM to break down this goal into a detailed, executable plan. This plan is a sequence of z3ed CLI commands, potentially interleaved with ImGuiTestEngine test steps for intermediate verification. The LLM's prompt includes the user's request, a comprehensive list of available z3ed commands (with their parameters and expected effects), and relevant contextual information about the current ROM state (e.g., loaded ROM, project files, current editor view).
Code (Command & Test Generation): The LLM returns the generated plan as a structured JSON object. This JSON object contains an array of actions, where each action specifies a z3ed command (with its arguments) or an ImGuiTestEngine test to execute. This structured output is crucial for reliable parsing and execution by the z3ed agent.
Program (Execution Engine): The z3ed agent parses the JSON plan and executes each command sequentially. For z3ed commands, it directly invokes the corresponding internal CommandHandler methods. For ImGuiTestEngine steps, it launches the yaze_test executable with the appropriate test arguments. The output (stdout, stderr, exit codes) of each executed command is captured. This output, along with any visual feedback from ImGuiTestEngine (e.g., screenshots), can be fed back to the LLM for iterative refinement of the plan.
Verification (Tester): The ImGuiTestEngine plays a critical role here. After the agent executes a sequence of commands, it can generate and run a specific ImGuiTestEngine script. This script can interact with the YAZE GUI (e.g., open a specific editor, navigate to a location, assert visual properties) to verify that the changes were applied correctly and as intended. The results of these tests (pass/fail, detailed logs, comparison screenshots) are reported back to the user and can be used by the LLM to self-correct or refine its strategy.
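
The Program step above can be sketched as a small execution loop. This is only an illustration under stated assumptions: the plan is taken as already decoded from JSON, `PlanStep`, `StepResult`, `Handler`, and `ExecutePlan` are hypothetical names, handlers are looked up by a plain "resource action" string, and exit code 127 for an unknown command is an arbitrary choice borrowed from shell convention.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// One step of a plan, as it might look after the JSON has been decoded.
struct PlanStep {
  std::string command;            // e.g. "palette set-color"
  std::vector<std::string> args;  // arguments for that command
};

// Captured outcome of a step; this is what could be fed back to the LLM.
struct StepResult {
  std::string command;
  int exit_code;
  std::string output;
};

using Handler = std::function<StepResult(const PlanStep&)>;

// Runs the steps in order and stops at the first failure, so later steps
// never run against a state the plan did not anticipate.
std::vector<StepResult> ExecutePlan(
    const std::vector<PlanStep>& plan,
    const std::map<std::string, Handler>& handlers) {
  std::vector<StepResult> log;
  for (const PlanStep& step : plan) {
    const auto it = handlers.find(step.command);
    if (it == handlers.end()) {
      log.push_back({step.command, 127, "unknown command"});
      break;
    }
    StepResult result = it->second(step);
    const bool failed = result.exit_code != 0;
    log.push_back(result);
    if (failed) break;
  }
  return log;
}
```

The returned log contains one entry per attempted step, including the failing one, which gives the planner enough context to revise the remainder of the plan.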

8.3. AI Model & Protocol Strategy

Models: The framework will support both local and remote AI models, offering flexibility and catering to different user needs.
- Local Models (macOS Setup): For privacy, offline use, and reduced operational costs, integration with local LLMs via Ollama is a priority. Users can easily install Ollama on macOS and pull models optimized for code generation, such as codellama:7b. The z3ed agent will communicate with Ollama's local API endpoint.
- Remote Models (Gemini API): For more complex tasks requiring advanced reasoning capabilities, integration with remote models such as the Gemini API will be available. Users must set the GEMINI_API_KEY environment variable. The GeminiAIService class, which the roadmap lists as conditionally implemented, handles the API requests and responses.
Protocol: A robust, yet simple, JSON-based protocol will be used for communication between z3ed and the AI model. This ensures structured data exchange, critical for reliable parsing and execution. The z3ed tool will serialize the user's prompt, current ROM context, available z3ed commands, and any relevant ImGuiTestEngine capabilities into a JSON object. The AI model will be expected to return a JSON object containing the sequence of commands to be executed, along with potential explanations or confidence scores.
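
The request side of such a protocol could be serialized roughly as follows. The field names (`prompt`, `rom_context`, `commands`) and the `AgentRequest` struct are assumptions, since the document does not fix a schema, and the hand-written serializer is used only to keep the sketch free of dependencies; a real build would more likely use the JSON library discussed in Section 8.7.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Escapes a string for use inside a JSON string literal.
std::string JsonEscape(const std::string& in) {
  std::string out;
  for (unsigned char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20) {  // remaining control characters need \u00XX form
          char buf[7];
          std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  return out;
}

// Hypothetical request payload; the field names are not from the z3ed source.
struct AgentRequest {
  std::string prompt;
  std::string rom_context;
  std::vector<std::string> available_commands;
};

// Serializes the request as a single-line JSON object.
std::string ToJson(const AgentRequest& req) {
  std::string json = "{\"prompt\":\"" + JsonEscape(req.prompt) + "\",";
  json += "\"rom_context\":\"" + JsonEscape(req.rom_context) + "\",";
  json += "\"commands\":[";
  for (std::size_t i = 0; i < req.available_commands.size(); ++i) {
    if (i > 0) json += ",";
    json += "\"" + JsonEscape(req.available_commands[i]) + "\"";
  }
  json += "]}";
  return json;
}
```

Escaping the prompt matters here because it is free-form user text: an unescaped quote or newline would produce a payload the model endpoint cannot parse.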

8.4. GUI Integration & User Experience

Agent Control Panel: A dedicated TUI/GUI panel will be created for managing the agent. This panel will serve as the primary interface for users to interact with the AI. It will feature:
- A multi-line text input for entering natural language prompts.
- Buttons for Run, Plan, Diff, Test, Commit, Revert, and Learn actions.
- A real-time log view displaying the agent's thought process, executed commands, and their outputs.
- A status bar indicating the agent's current state (e.g., "Idle", "Planning", "Executing Commands", "Verifying Changes").
Diff Editing UI: A TUI-based visual diff viewer will be implemented. This UI will present a side-by-side comparison of the original ROM state (or a previous checkpoint) and the changes proposed or made by the agent. Users will be able to:
- Navigate through individual differences (e.g., changed bytes, modified tiles, added objects).
- Highlight specific changes.
- Accept or reject individual changes or groups of changes, providing fine-grained control over the agent's output.
Interactive Planning: The agent will present its generated plan in a human-readable format within the GUI. Users will have the opportunity to:
- Review each step of the plan.
- Approve the entire plan for execution.
- Reject specific steps or the entire plan.
- Edit the plan directly (e.g., modify command arguments, reorder steps, insert new commands) before allowing the agent to proceed.

8.5. Testing & Verification

ImGuiTestEngine Integration: The agent will be able to dynamically generate and execute ImGuiTestEngine tests. This allows for automated visual verification of the agent's work, checking that changes render as intended in the editor and not only that the underlying data was written. The agent can be trained to generate test scripts that assert specific pixel colors, UI element positions, or overall visual layouts.
Mock Testing Framework: A robust "mock" mode will be implemented for the z3ed agent. In this mode, the agent will simulate the execution of commands without modifying the actual ROM. This is crucial for safe and fast testing of the agent's planning and command generation capabilities. The existing MockRom class will be extended to fully support all z3ed commands, providing a consistent interface for both real and mock execution.
User-Facing Tests: A "tutorial" or "challenge" mode will be created where users can test the agent with a series of predefined tasks. This will serve as an educational tool for users to understand the agent's capabilities and provide a way to benchmark its performance against specific ROM hacking challenges.

8.6. Safety & Sandboxing

Dry Run Mode: The agent will always offer a "dry run" mode, where it only shows the commands it would execute without making any actual changes to the ROM. This provides a critical safety net for users.
Command Whitelisting: The agent's execution environment will enforce a strict command whitelisting policy. Only a predefined set of "safe" z3ed commands will be executable by the AI. Any attempt to execute an unauthorized command will be blocked.
Resource Limits: The agent will operate within defined resource limits (e.g., maximum number of commands per plan, maximum data modification size) to prevent unintended extensive changes or infinite loops.
Human Oversight: Given the inherent unpredictability of AI models, human oversight will be a fundamental principle. The interactive planning and diff editing UIs are designed to keep the user in control at all times.
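
The whitelisting and resource-limit rules above could be enforced with a check that runs before any step of a plan executes. The sketch below assumes a plan is a list of "resource action" strings; `SafetyPolicy` and `CheckPlan` are hypothetical names, and the error strings are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical policy: which commands the agent may run, and how many.
struct SafetyPolicy {
  std::set<std::string> allowed_commands;
  std::size_t max_commands;
};

// Returns an empty string when the plan passes, otherwise the reason it
// was rejected. Nothing is executed here; this is a pre-flight check.
std::string CheckPlan(const std::vector<std::string>& plan,
                      const SafetyPolicy& policy) {
  if (plan.size() > policy.max_commands) {
    return "plan exceeds the command limit";
  }
  for (const std::string& command : plan) {
    if (policy.allowed_commands.count(command) == 0) {
      return "command not allowed: " + command;
    }
  }
  return "";
}
```

Rejecting the whole plan up front, instead of skipping individual steps, avoids executing a partial plan whose remaining steps depended on the blocked one.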

8.7. Optional JSON Dependency

To avoid breaking platform builds where a JSON library is not available or desired, the JSON-related code will be conditionally compiled using a preprocessor macro (e.g., YAZE_WITH_JSON). When this macro is not defined, the agentic features that rely on JSON will be disabled. The nlohmann/json library will be added as a submodule to the project and included in the build only when YAZE_WITH_JSON is defined.
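
A minimal sketch of how the `YAZE_WITH_JSON` guard might look in a source file follows. It assumes the nlohmann/json header is reachable as `<nlohmann/json.hpp>` when the macro is defined; `kJsonEnabled` and `IsValidPlanJson` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <string>

#ifdef YAZE_WITH_JSON
#include <nlohmann/json.hpp>  // only pulled in when the feature is enabled
#endif

// Lets callers ask at compile time whether JSON-backed features exist.
constexpr bool kJsonEnabled =
#ifdef YAZE_WITH_JSON
    true;
#else
    false;
#endif

// Returns true if `text` parses as JSON. In builds without YAZE_WITH_JSON
// the function still exists, so callers compile, but it always returns false.
bool IsValidPlanJson(const std::string& text) {
#ifdef YAZE_WITH_JSON
  // The third argument disables exceptions; a parse failure yields a
  // "discarded" value instead of throwing.
  return !nlohmann::json::parse(text, nullptr, false).is_discarded();
#else
  (void)text;  // unused in this configuration
  return false;
#endif
}
```

Keeping the function signature identical in both configurations confines the `#ifdef` to one place, so the rest of the agent code does not need its own guards.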

8.8. Contextual Awareness & Feedback Loop

Contextual Information: The agent's prompts to the LLM will be enriched with comprehensive contextual information, including:
- The current state of the loaded ROM (e.g., ROM header, loaded assets, current editor view).
- Relevant project files (e.g., .yaze project configuration, symbol files).
- User preferences and previous interactions.
- A dynamic list of available z3ed commands and their detailed usage.
Feedback Loop for Learning: The results of ImGuiTestEngine verifications and user accept/reject actions will form a crucial feedback loop. This data can be used to fine-tune the LLM or train smaller, specialized models to improve the agent's planning and command generation capabilities over time.

8.9. Error Handling and Recovery

Robust Error Reporting: The agent will provide clear and actionable error messages when commands fail or unexpected situations arise.
Rollback Mechanisms: The revert command provides a basic rollback. More advanced mechanisms, such as transactional changes or snapshotting, could be explored for complex multi-step operations.
Interactive Debugging: In case of errors, the agent could pause execution and allow the user to inspect the current state, modify the plan, or provide corrective instructions.
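
The snapshotting idea mentioned above can be illustrated with a stack of full ROM copies. This is a deliberately simple sketch: `RomBytes` and `SnapshotStack` are hypothetical names, and copying the whole buffer is an assumption chosen for clarity, not a statement about how `revert` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using RomBytes = std::vector<std::uint8_t>;

// Keeps full copies of the ROM buffer so a multi-step operation can be
// rolled back to the state before it started.
class SnapshotStack {
 public:
  // Records the current contents of `rom`.
  void Save(const RomBytes& rom) { snapshots_.push_back(rom); }

  // Restores the most recent snapshot into `rom` and discards it.
  // Returns false, leaving `rom` untouched, when no snapshot exists.
  bool Revert(RomBytes& rom) {
    if (snapshots_.empty()) return false;
    rom = std::move(snapshots_.back());
    snapshots_.pop_back();
    return true;
  }

  std::size_t depth() const { return snapshots_.size(); }

 private:
  std::vector<RomBytes> snapshots_;
};
```

Full copies are affordable for SNES-sized ROMs of a few megabytes; a transactional scheme that records only the changed byte ranges would use less memory at the cost of more bookkeeping.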

8.10. Extensibility

Modular Command Handlers: The z3ed CLI's modular design allows for easy addition of new commands, which automatically become available to the AI agent.
Pluggable AI Models: The AIService interface enables seamless integration of different AI models (local or remote) without modifying the core agent logic.
Custom Test Generation: Users or developers can extend the ImGuiTestEngine capabilities to create custom verification tests for specific hacking scenarios.
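
The pluggable-model idea can be sketched as an abstract interface with a mock implementation. The class names `AIService` and `MockAIService` appear in this document, but the `GetCommands` signature, the `AddResponse` helper, and `PlanWith` are assumptions for illustration; the real interface may differ, for example by returning a status-wrapped result.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Abstract model backend. The method signature is hypothetical.
class AIService {
 public:
  virtual ~AIService() = default;
  // Returns the z3ed commands the model proposes for `prompt`.
  virtual std::vector<std::string> GetCommands(const std::string& prompt) = 0;
};

// Canned-response backend for tests: no network access, no model.
class MockAIService : public AIService {
 public:
  void AddResponse(const std::string& prompt,
                   std::vector<std::string> commands) {
    responses_[prompt] = std::move(commands);
  }

  std::vector<std::string> GetCommands(const std::string& prompt) override {
    const auto it = responses_.find(prompt);
    if (it == responses_.end()) return {};  // unknown prompt: empty plan
    return it->second;
  }

 private:
  std::map<std::string, std::vector<std::string>> responses_;
};

// Agent logic depends only on the interface, so a local or remote backend
// can be substituted without changes here.
std::vector<std::string> PlanWith(AIService& service,
                                  const std::string& prompt) {
  return service.GetCommands(prompt);
}
```

Because `PlanWith` takes the base type, the same agent code path can be exercised in tests with the mock and in production with a remote-backed service.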

9. UX Improvements and Architectural Decisions

9.1. TUI Component Architecture

The TUI system has been redesigned around a consistent component architecture:

TuiComponent Interface: All UI components implement a standard interface with a Render() method, ensuring consistency across the application.
Component Composition: Complex UIs are built by composing simpler components, making the code more maintainable and testable.
Event Handling: Standardized event handling patterns across all components for consistent user experience.

9.2. Command Handler Unification

The CLI and TUI systems now share a unified command handler architecture:

Dual Execution Paths: Each command handler supports both CLI (Run()) and TUI (RunTUI()) execution modes.
Shared State Management: Common functionality like ROM loading and validation is centralized in the base CommandHandler class.
Consistent Error Handling: All commands use absl::Status for uniform error reporting across CLI and TUI modes.
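
The dual execution paths can be sketched as follows. `CommandHandler`, `Run()`, and `RunTUI()` are named in this document; everything else is an assumption. In particular, `Status` is a small stand-in for `absl::Status` so the example compiles on its own, `RunTUI` takes a `std::ostream` in place of whatever screen type the real method uses, and `RomValidateHandler` only checks that a path was supplied; it does not inspect a ROM.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Stand-in for absl::Status, reduced to what this sketch needs.
struct Status {
  bool ok;
  std::string message;
  static Status Ok() { return {true, ""}; }
  static Status Error(std::string m) { return {false, std::move(m)}; }
};

class CommandHandler {
 public:
  virtual ~CommandHandler() = default;
  // CLI path: arguments come from the command line.
  virtual Status Run(const std::vector<std::string>& args) = 0;
  // TUI path: a std::ostream stands in for the real screen type.
  virtual Status RunTUI(std::ostream& screen) = 0;

 protected:
  // Shared check used by both paths of every handler.
  static Status CheckRomPath(const std::string& path) {
    if (path.empty()) return Status::Error("no ROM path given");
    return Status::Ok();
  }
};

// Placeholder handler: validates only that a ROM path is present.
class RomValidateHandler : public CommandHandler {
 public:
  Status Run(const std::vector<std::string>& args) override {
    return CheckRomPath(args.empty() ? std::string() : args[0]);
  }

  Status RunTUI(std::ostream& screen) override {
    const Status status = CheckRomPath(tui_path);
    screen << (status.ok ? std::string("ROM path accepted")
                         : "Error: " + status.message)
           << '\n';
    return status;
  }

  std::string tui_path;  // what a TUI input field would hold
};
```

Both paths route through the same shared check and return the same status type, so an error is reported with identical wording whether the command was typed on the command line or triggered from the TUI.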

9.3. Interface Consolidation

Several interfaces have been combined and simplified:

Unified Menu System: The main menu now serves as a central hub for both direct command execution and TUI mode switching.
Integrated Help System: Help information is accessible from both CLI and TUI modes with consistent formatting.
Streamlined Navigation: Reduced cognitive load by consolidating related functionality into single interfaces.

9.4. Code Organization Improvements

The codebase has been restructured for better maintainability:

Header Organization: Proper forward declarations and include management to reduce compilation dependencies.
Namespace Management: Clean namespace usage to avoid conflicts and improve code clarity.
Build System Optimization: Streamlined CMake configuration with conditional compilation for optional features.

9.5. Future UX Enhancements

Based on the current architecture, several UX improvements are planned:

Progressive Disclosure: Complex commands will offer both simple and advanced modes.
Context-Aware Help: Help text will adapt based on current ROM state and available commands.
Undo/Redo System: Command history tracking for safer experimentation.
Batch Operations: Support for executing multiple related commands as a single operation.
