z3ed: AI-Powered CLI for YAZE

Version: 0.1.0-alpha
Last Updated: October 4, 2025

1. Overview

This document is the source of truth for the z3ed CLI architecture, design, and roadmap. It outlines the evolution of z3ed into a powerful, scriptable, and extensible tool for both manual and AI-driven ROM hacking.

The core infrastructure of z3ed is implemented, and the tool is production-ready on macOS.

Core Capabilities

  1. Conversational Agent: Chat with an AI (Ollama or Gemini) to explore ROM contents and plan changes using natural language—available from the CLI, terminal UI, and now directly within the YAZE editor.
  2. GUI Test Automation: A gRPC-based test harness allows for widget discovery, test recording/replay, and introspection for debugging and AI-driven validation.
  3. Proposal System: A safe, sandboxed editing workflow where all changes are tracked as "proposals" that require human review and acceptance.
  4. Resource-Oriented CLI: A clean z3ed <resource> <action> command structure that is both human-readable and machine-parsable.

2. Quick Start

Build

A single CMake flag, Z3ED_AI=ON, enables all AI features and pulls in the required JSON, YAML, and httplib dependencies, keeping the build process simple.

# Build with AI features (RECOMMENDED)
cmake -B build -DZ3ED_AI=ON
cmake --build build --target z3ed

# For GUI automation features, also include gRPC
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed

AI Setup

Ollama (Recommended for Development):

brew install ollama              # macOS
ollama pull qwen2.5-coder:7b    # Pull recommended model
ollama serve                     # Start server

Gemini (Cloud API):

# Get API key from https://aistudio.google.com/apikey
export GEMINI_API_KEY="your-key-here"

Example Commands

Conversational Agent:

# Interactive chat (FTXUI)
z3ed agent chat --rom zelda3.sfc

# Simple text mode (better for AI/automation)
z3ed agent simple-chat --rom zelda3.sfc

# Batch mode
z3ed agent simple-chat --file queries.txt --rom zelda3.sfc

Proposal Workflow:

# Generate from prompt
z3ed agent run --prompt "Place tree at 10,10" --rom zelda3.sfc --sandbox

# List proposals
z3ed agent list

# Review
z3ed agent diff --proposal-id <id>

# Accept
z3ed agent accept --proposal-id <id>

3. Architecture

The z3ed system is composed of several layers, from the high-level AI agent down to the YAZE GUI and test harness.

System Components Diagram

┌─────────────────────────────────────────────────────────┐
│ AI Agent Layer (LLM: Ollama, Gemini)                    │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ z3ed CLI (Command-Line Interface)                       │
│  ├─ agent run/plan/diff/test/list/describe              │
│  └─ rom/palette/overworld/dungeon commands              │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ Service Layer (Singleton Services)                      │
│  ├─ ProposalRegistry (Proposal Tracking)                │
│  ├─ RomSandboxManager (Isolated ROM Copies)             │
│  ├─ ResourceCatalog (Machine-Readable API Specs)        │
│  └─ ConversationalAgentService (Chat & Tool Dispatch)   │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ ImGuiTestHarness (gRPC Server in YAZE)                  │
│  ├─ Ping, Click, Type, Wait, Assert, Screenshot         │
│  └─ Introspection & Discovery RPCs                      │
└────────────────────┬────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────────────┐
│ YAZE GUI (ImGui Application)                            │
│  └─ ProposalDrawer & Editor Windows                     │
└─────────────────────────────────────────────────────────┘

4. Agentic & Generative Workflow (MCP)

The z3ed CLI is the foundation for an AI-driven Model-Code-Program (MCP) loop, where the AI agent's "program" is a script of z3ed commands.

  1. Model (Planner): The agent receives a natural language prompt and leverages an LLM to create a plan, which is a sequence of z3ed commands.
  2. Code (Generation): The LLM returns the plan as a structured JSON object containing actions.
  3. Program (Execution): The z3ed agent parses the plan and executes each command sequentially in a sandboxed ROM environment.
  4. Verification (Tester): The ImGuiTestHarness is used to run automated GUI tests to verify that the changes were applied correctly.
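
For example, a simple tree-placement prompt might produce a plan like the sketch below. The JSON shape in the comments is illustrative only; the exact schema is defined by the resource catalog.

# Show the plan without executing it
z3ed agent plan --prompt "Place tree at 10,10" --rom zelda3.sfc

# Illustrative output (field names are assumptions):
# {
#   "actions": [
#     { "command": "overworld set-tile", "args": { "x": 10, "y": 10, "tile": "tree" } }
#   ]
# }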

5. Command Reference

Agent Commands

  • agent run --prompt "...": Executes an AI-driven ROM modification in a sandbox.
  • agent plan --prompt "...": Shows the sequence of commands the AI plans to execute.
  • agent list: Shows all proposals and their status.
  • agent diff [--proposal-id <id>]: Shows the changes, logs, and metadata for a proposal.
  • agent describe [--resource <name>]: Exports machine-readable API specifications for AI consumption.
  • agent chat: Opens an interactive terminal chat (TUI) with the AI agent.
  • agent simple-chat: A lightweight, non-TUI chat mode for scripting and automation.
  • agent test ...: Commands for running and managing automated GUI tests.
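
For example, agent describe can export the catalog for a single resource or for everything at once:

# Machine-readable spec for the overworld resource
z3ed agent describe --resource overworld

# Full catalog for all resources
z3ed agent describe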

Resource Commands

  • rom info|validate|diff: Commands for ROM file inspection and comparison.
  • palette export|import|list: Commands for palette manipulation.
  • overworld get-tile|find-tile|set-tile: Commands for overworld editing.
  • dungeon list-sprites|list-rooms: Commands for dungeon inspection.
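
A few illustrative invocations are shown below. Flag names other than --rom are assumptions; run z3ed help <category> for the exact options.

# Inspect ROM header and metadata
z3ed rom info --rom zelda3.sfc

# Export a palette (output flag is assumed)
z3ed palette export --rom zelda3.sfc --output palettes.json

# Read a single overworld tile (coordinate flags are assumed)
z3ed overworld get-tile --rom zelda3.sfc --map 0 --x 10 --y 10

# List sprites in a dungeon room (room flag is assumed)
z3ed dungeon list-sprites --rom zelda3.sfc --room 5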

6. Chat Modes

FTXUI Chat (agent chat)

Full-screen interactive terminal with table rendering, syntax highlighting, and scrollable history. Best for manual exploration.

Simple Chat (agent simple-chat)

Lightweight, scriptable text-based REPL that supports single messages, interactive sessions, piped input, and batch files.
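
For example, the piped and batch forms look like this (the batch form matches the Quick Start; the piped form is a sketch of the same REPL):

# Single question via a pipe
echo "What sprites are in room 5?" | z3ed agent simple-chat --rom zelda3.sfc

# Batch mode: one query per line in queries.txt
z3ed agent simple-chat --file queries.txt --rom zelda3.sfc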

GUI Chat Widget (Editor Integration)

Accessible from Debug → Agent Chat inside YAZE. Provides the same conversation loop as the CLI, including streaming history, JSON/table inspection, and ROM-aware tool dispatch.

New Features:

  • Persistent Chat History: Chat conversations are automatically saved and restored
  • Collaborative Sessions: Multiple users can join the same session and share a chat history
  • Multimodal Vision: Capture screenshots of your ROM editor and ask Gemini to analyze them

7. AI Provider Configuration

Z3ED supports multiple AI providers. Configuration is resolved with command-line flags taking precedence over environment variables.

  • --ai_provider=<provider>: Selects the AI provider (mock, ollama, gemini).
  • --ai_model=<model>: Specifies the model name (e.g., qwen2.5-coder:7b, gemini-1.5-flash).
  • --gemini_api_key=<key>: Your Gemini API key.
  • --ollama_host=<url>: The URL for your Ollama server (default: http://localhost:11434).
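
Combining these flags, an explicit invocation might look like:

# Local Ollama backend
z3ed agent chat --rom zelda3.sfc --ai_provider=ollama --ai_model=qwen2.5-coder:7b --ollama_host=http://localhost:11434

# Gemini backend, passing the key as a flag instead of GEMINI_API_KEY
z3ed agent chat --rom zelda3.sfc --ai_provider=gemini --ai_model=gemini-1.5-flash --gemini_api_key="your-key-here"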

8. CLI Output & Help System

The z3ed CLI features a modernized output system designed to be clean for users and informative for developers.

Verbose Logging

By default, z3ed provides clean, user-facing output. For detailed debugging, including API calls and internal state, use the --verbose flag.

Default (Clean):

🤖 AI Provider: gemini
   Model: gemini-2.5-flash
⠋ Thinking...
🔧 Calling tool: resource-list (type=room)
✓ Tool executed successfully

Verbose Mode:

# z3ed agent simple-chat "What is room 5?" --verbose
🤖 AI Provider: gemini
   Model: gemini-2.5-flash
[DEBUG] Initializing Gemini service...
[DEBUG] Function calling: disabled
[DEBUG] Using curl for HTTPS request...
⠋ Thinking...
[DEBUG] Parsing response...
🔧 Calling tool: resource-list (type=room)
✓ Tool executed successfully

Hierarchical Help System

The help system is organized by category for easy navigation.

  • Main Help: z3ed --help or z3ed -h shows a high-level overview of command categories.
  • Category Help: z3ed help <category> provides detailed information for a specific group of commands (e.g., agent, patch, rom).
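
For example:

# High-level overview of command categories
z3ed --help

# Detailed help for the agent command group
z3ed help agent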

9. Collaborative Sessions & Multimodal Vision

Collaborative Sessions

Z3ED supports both local (filesystem-based) and network (WebSocket-based) collaborative sessions for sharing chat conversations and working together on ROM hacks.

Local Collaboration Mode

How to Use:

  1. Open YAZE and go to Debug → Agent Chat
  2. In the Agent Chat widget, select "Local" mode
  3. Host a Session:
    • Enter a session name (e.g., "Evening ROM Hack")
    • Click "Host"
    • Share the generated 6-character code (e.g., ABC123) with collaborators on the same machine
  4. Join a Session:
    • Enter the session code provided by the host
    • Click "Join"
    • Your chat will now sync with others in the session

Features:

  • Shared chat history stored in ~/.yaze/agent/sessions/<code>_history.json
  • Automatic synchronization when sending/receiving messages (2-second polling)
  • Participant list shows all connected users
  • Perfect for multiple YAZE instances on the same machine
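
Because the shared history is a plain JSON file, you can inspect a session directly from the shell (the session code ABC123 here is just the example from above):

# View the shared history for session ABC123
cat ~/.yaze/agent/sessions/ABC123_history.json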

Network Collaboration Mode (NEW!)

Requirements:

  • Node.js installed on the server machine
  • yaze-collab-server repository cloned alongside yaze
  • Network connectivity between collaborators

Setup:

  1. Start the Collaboration Server:

    # From z3ed CLI:
    z3ed collab start [--port=8765]
    
    # Or manually:
    cd yaze-collab-server
    npm install
    node server.js
    
  2. Connect from YAZE:

    • Open YAZE and go to Debug → Agent Chat
    • Select "Network" mode
    • Enter server URL (e.g., ws://localhost:8765)
    • Click "Connect to Server"
  3. Collaborate:

    • Host or join sessions just like local mode
    • Collaborate with anyone who can reach your server
    • Real-time message broadcasting via WebSockets

Features:

  • Real-time collaboration over the internet
  • Session management with unique codes
  • Participant tracking and notifications
  • Persistent message history
  • Perfect for remote pair programming

Multimodal Vision (Gemini)

Ask Gemini to analyze screenshots of your ROM editor to get visual feedback and suggestions.

Requirements:

  • GEMINI_API_KEY environment variable set
  • YAZE built with -DYAZE_WITH_GRPC=ON and -DZ3ED_AI=ON
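
A minimal setup, combining the build flags and API key from earlier sections:

# Build YAZE with both AI and gRPC support
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build

# Provide the Gemini key before launching YAZE
export GEMINI_API_KEY="your-key-here"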

How to Use:

  1. Open the Agent Chat widget (Debug → Agent Chat)
  2. Expand the "Gemini Multimodal (Preview)" panel
  3. Click "Capture Map Snapshot" to take a screenshot of the current view
  4. Enter a prompt in the text box (e.g., "What issues do you see with this overworld layout?")
  5. Click "Send to Gemini" to get visual analysis

Example Prompts:

  • "Analyze the tile placement in this overworld screen"
  • "What's wrong with the palette colors in this screenshot?"
  • "Suggest improvements for this dungeon room layout"
  • "Does this screen follow good level design practices?"

The AI response will appear in your chat history and can reference specific details from the screenshot.

10. Roadmap & Implementation Status

Last Updated: October 4, 2025

Completed

  • Core Infrastructure: Resource-oriented CLI, proposal workflow, sandbox manager, and resource catalog are all production-ready.
  • AI Backends: Both Ollama (local) and Gemini (cloud) are operational.
  • Conversational Agent: The agent service, tool dispatcher (with 5 read-only tools), TUI and simple-chat interfaces, and the ImGui editor chat widget with persistent history are all in place.
  • GUI Test Harness: A comprehensive GUI testing platform with introspection, widget discovery, recording/replay, and CI integration support.
  • Collaborative Sessions: Local filesystem-based sessions with shared chat history, plus an initial WebSocket-based network mode (Section 9).
  • Multimodal Vision: Gemini vision API integration for analyzing ROM editor screenshots.

Active & Next Steps

  1. Live LLM Testing (1-2h): Verify function calling with real models (Ollama/Gemini).
  2. Expand Tool Coverage (8-10h): Add new read-only tools for inspecting dialogue, sprites, and regions.
  3. Network Collaboration Hardening: The initial WebSocket-based network mode (Section 9) is available; continue hardening remote connections and evaluate a gRPC transport.
  4. Windows Cross-Platform Testing (8-10h): Validate z3ed and the test harness on Windows.

11. Troubleshooting

  • "Build with -DZ3ED_AI=ON" warning: AI features are disabled. Rebuild with the flag to enable them.
  • "gRPC not available" error: GUI testing is disabled. Rebuild with -DYAZE_WITH_GRPC=ON.
  • AI generates invalid commands: The prompt may be vague. Use specific coordinates, tile IDs, and map context.
  • Chat mode freezes: Use agent simple-chat instead of the FTXUI-based agent chat for better stability, especially in scripts.
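
If both build-related warnings above appear, a single rebuild with all feature flags (as in the Quick Start) resolves them:

cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
cmake --build build --target z3ed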