- Introduced the `agent learn` command to manage learned knowledge, including user preferences, ROM patterns, project context, and conversation memory. - Added functionality for setting, getting, listing preferences, and managing project contexts. - Implemented memory storage and retrieval, allowing the agent to remember past conversations and key facts. - Enhanced CLI output with usage instructions and error handling for the learn command. - Created a new `LearnedKnowledgeService` to handle persistent data storage and retrieval.
901 lines
30 KiB
Markdown
901 lines
30 KiB
Markdown
# z3ed: AI-Powered CLI for YAZE
|
|
|
|
**Version**: 0.1.0-alpha
|
|
**Last Updated**: October 4, 2025
|
|
|
|
## 1. Overview
|
|
|
|
This document is the **source of truth** for the z3ed CLI architecture, design, and roadmap. It outlines the evolution of `z3ed` into a powerful, scriptable, and extensible tool for both manual and AI-driven ROM hacking.
|
|
|
|
`z3ed` has successfully implemented its core infrastructure and is **production-ready on macOS**.
|
|
|
|
### Core Capabilities
|
|
|
|
1. **Conversational Agent**: Chat with an AI (Ollama or Gemini) to explore ROM contents and plan changes using natural language—available from the CLI, terminal UI, and now directly within the YAZE editor.
|
|
2. **GUI Test Automation**: A gRPC-based test harness allows for widget discovery, test recording/replay, and introspection for debugging and AI-driven validation.
|
|
3. **Proposal System**: A safe, sandboxed editing workflow where all changes are tracked as "proposals" that require human review and acceptance.
|
|
4. **Resource-Oriented CLI**: A clean `z3ed <resource> <action>` command structure that is both human-readable and machine-parsable.
|
|
|
|
## 2. Quick Start
|
|
|
|
### Build
|
|
|
|
A single `Z3ED_AI=ON` CMake flag enables all AI features, including JSON, YAML, and httplib dependencies. This simplifies the build process.
|
|
|
|
```bash
|
|
# Build with AI features (RECOMMENDED)
|
|
cmake -B build -DZ3ED_AI=ON
|
|
cmake --build build --target z3ed
|
|
|
|
# For GUI automation features, also include gRPC
|
|
cmake -B build -DZ3ED_AI=ON -DYAZE_WITH_GRPC=ON
|
|
cmake --build build --target z3ed
|
|
```
|
|
|
|
### AI Setup
|
|
|
|
**Ollama (Recommended for Development)**:
|
|
```bash
|
|
brew install ollama # macOS
|
|
ollama pull qwen2.5-coder:7b # Pull recommended model
|
|
ollama serve # Start server
|
|
```
|
|
|
|
**Gemini (Cloud API)**:
|
|
```bash
|
|
# Get API key from https://aistudio.google.com/apikey
|
|
export GEMINI_API_KEY="your-key-here"
|
|
```
|
|
|
|
### Example Commands
|
|
|
|
**Conversational Agent**:
|
|
```bash
|
|
# Interactive chat (FTXUI)
|
|
z3ed agent chat --rom zelda3.sfc
|
|
|
|
# Simple text mode (better for AI/automation)
|
|
z3ed agent simple-chat --rom zelda3.sfc
|
|
|
|
# Batch mode
|
|
z3ed agent simple-chat --file queries.txt --rom zelda3.sfc
|
|
```
|
|
|
|
**Proposal Workflow**:
|
|
```bash
|
|
# Generate from prompt
|
|
z3ed agent run --prompt "Place tree at 10,10" --rom zelda3.sfc --sandbox
|
|
|
|
# List proposals
|
|
z3ed agent list
|
|
|
|
# Review
|
|
z3ed agent diff --proposal-id <id>
|
|
|
|
# Accept
|
|
z3ed agent accept --proposal-id <id>
|
|
```
|
|
|
|
## 3. Architecture
|
|
|
|
The z3ed system is composed of several layers, from the high-level AI agent down to the YAZE GUI and test harness.
|
|
|
|
### System Components Diagram
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────┐
|
|
│ AI Agent Layer (LLM: Ollama, Gemini) │
|
|
└────────────────────┬────────────────────────────────────┘
|
|
│
|
|
┌────────────────────▼────────────────────────────────────┐
|
|
│ z3ed CLI (Command-Line Interface) │
|
|
│ ├─ agent run/plan/diff/test/list/describe │
|
|
│ └─ rom/palette/overworld/dungeon commands │
|
|
└────────────────────┬────────────────────────────────────┘
|
|
│
|
|
┌────────────────────▼────────────────────────────────────┐
|
|
│ Service Layer (Singleton Services) │
|
|
│ ├─ ProposalRegistry (Proposal Tracking) │
|
|
│ ├─ RomSandboxManager (Isolated ROM Copies) │
|
|
│ ├─ ResourceCatalog (Machine-Readable API Specs) │
|
|
│ └─ ConversationalAgentService (Chat & Tool Dispatch) │
|
|
└────────────────────┬────────────────────────────────────┘
|
|
│
|
|
┌────────────────────▼────────────────────────────────────┐
|
|
│ ImGuiTestHarness (gRPC Server in YAZE) │
|
|
│ ├─ Ping, Click, Type, Wait, Assert, Screenshot │
|
|
│ └─ Introspection & Discovery RPCs │
|
|
└────────────────────┬────────────────────────────────────┘
|
|
│
|
|
┌────────────────────▼────────────────────────────────────┐
|
|
│ YAZE GUI (ImGui Application) │
|
|
│ └─ ProposalDrawer & Editor Windows │
|
|
└─────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## 4. Agentic & Generative Workflow (MCP)
|
|
|
|
The `z3ed` CLI is the foundation for an AI-driven Model-Code-Program (MCP) loop, where the AI agent's "program" is a script of `z3ed` commands.
|
|
|
|
1. **Model (Planner)**: The agent receives a natural language prompt and leverages an LLM to create a plan, which is a sequence of `z3ed` commands.
|
|
2. **Code (Generation)**: The LLM returns the plan as a structured JSON object containing actions.
|
|
3. **Program (Execution)**: The `z3ed agent` parses the plan and executes each command sequentially in a sandboxed ROM environment.
|
|
4. **Verification (Tester)**: The `ImGuiTestHarness` is used to run automated GUI tests to verify that the changes were applied correctly.
|
|
|
|
## 5. Command Reference
|
|
|
|
### Agent Commands
|
|
|
|
- `agent run --prompt "..."`: Executes an AI-driven ROM modification in a sandbox.
|
|
- `agent plan --prompt "..."`: Shows the sequence of commands the AI plans to execute.
|
|
- `agent list`: Shows all proposals and their status.
|
|
- `agent diff [--proposal-id <id>]`: Shows the changes, logs, and metadata for a proposal.
|
|
- `agent describe [--resource <name>]`: Exports machine-readable API specifications for AI consumption.
|
|
- `agent chat`: Opens an interactive terminal chat (TUI) with the AI agent.
|
|
- `agent simple-chat`: A lightweight, non-TUI chat mode for scripting and automation.
|
|
- `agent test ...`: Commands for running and managing automated GUI tests.
|
|
- `agent learn ...`: **NEW**: Manage learned knowledge (preferences, ROM patterns, project context, conversation memory).
|
|
|
|
### Resource Commands
|
|
|
|
- `rom info|validate|diff`: Commands for ROM file inspection and comparison.
|
|
- `palette export|import|list`: Commands for palette manipulation.
|
|
- `overworld get-tile|find-tile|set-tile`: Commands for overworld editing.
|
|
- `dungeon list-sprites|list-rooms`: Commands for dungeon inspection.
|
|
|
|
## 6. Chat Modes
|
|
|
|
### FTXUI Chat (`agent chat`)
|
|
Full-screen interactive terminal with table rendering, syntax highlighting, and scrollable history. Best for manual exploration.
|
|
|
|
### Simple Chat (`agent simple-chat`)
|
|
Lightweight, scriptable text-based REPL that supports single messages, interactive sessions, piped input, and batch files.
|
|
|
|
### GUI Chat Widget (Editor Integration)
|
|
Accessible from **Debug → Agent Chat** inside YAZE. Provides the same conversation loop as the CLI, including streaming history, JSON/table inspection, and ROM-aware tool dispatch.
|
|
|
|
**✨ New Features:**
|
|
- **Persistent Chat History**: Chat conversations are automatically saved and restored
|
|
- **Collaborative Sessions**: Multiple users can join the same session and share a chat history
|
|
- **Multimodal Vision**: Capture screenshots of your ROM editor and ask Gemini to analyze them
|
|
|
|
## 7. AI Provider Configuration
|
|
|
|
Z3ED supports multiple AI providers. Configuration is resolved with command-line flags taking precedence over environment variables.
|
|
|
|
- `--ai_provider=<provider>`: Selects the AI provider (`mock`, `ollama`, `gemini`).
|
|
- `--ai_model=<model>`: Specifies the model name (e.g., `qwen2.5-coder:7b`, `gemini-2.5-flash`).
|
|
- `--gemini_api_key=<key>`: Your Gemini API key.
|
|
- `--ollama_host=<url>`: The URL for your Ollama server (default: `http://localhost:11434`).
|
|
|
|
### System Prompt Versions
|
|
|
|
Z3ED includes multiple system prompt versions for different use cases:
|
|
|
|
- **v1 (default)**: Original reactive prompt with basic tool calling
|
|
- **v2**: Enhanced with better JSON formatting and error handling
|
|
- **v3 (latest)**: Proactive prompt with intelligent tool chaining and implicit iteration - **RECOMMENDED**
|
|
|
|
To use v3 prompt: Set environment variable `Z3ED_PROMPT_VERSION=v3` or it will be auto-selected for Gemini 2.0+ models.
|
|
|
|
## 8. Learn Command - Knowledge Management
|
|
|
|
The learn command enables the AI agent to remember preferences, patterns, and context across sessions.
|
|
|
|
### Basic Usage
|
|
|
|
```bash
|
|
# Store a preference
|
|
z3ed agent learn --preference "default_palette=2"
|
|
|
|
# Get a preference
|
|
z3ed agent learn --get-preference default_palette
|
|
|
|
# List all preferences
|
|
z3ed agent learn --list-preferences
|
|
|
|
# View statistics
|
|
z3ed agent learn --stats
|
|
|
|
# Export all learned data
|
|
z3ed agent learn --export my_learned_data.json
|
|
|
|
# Import learned data
|
|
z3ed agent learn --import my_learned_data.json
|
|
```
|
|
|
|
### Project Context
|
|
|
|
Store project-specific information that the agent can reference:
|
|
|
|
```bash
|
|
# Save project context
|
|
z3ed agent learn --project "myrom" --context "Vanilla+ difficulty hack, focus on dungeon redesign"
|
|
|
|
# List projects
|
|
z3ed agent learn --list-projects
|
|
|
|
# Get project details
|
|
z3ed agent learn --get-project "myrom"
|
|
```
|
|
|
|
### Conversation Memory
|
|
|
|
The agent automatically stores summaries of conversations for future reference:
|
|
|
|
```bash
|
|
# View recent memories
|
|
z3ed agent learn --recent-memories 10
|
|
|
|
# Search memories by topic
|
|
z3ed agent learn --search-memories "room 5"
|
|
```
|
|
|
|
### Storage Location
|
|
|
|
All learned data is stored in `~/.yaze/agent/`:
|
|
- `preferences.json`: User preferences
|
|
- `patterns.json`: Learned ROM patterns
|
|
- `projects.json`: Project contexts
|
|
- `memories.json`: Conversation summaries
|
|
|
|
## 9. CLI Output & Help System
|
|
|
|
The `z3ed` CLI features a modernized output system designed to be clean for users and informative for developers.
|
|
|
|
### Verbose Logging
|
|
|
|
By default, `z3ed` provides clean, user-facing output. For detailed debugging, including API calls and internal state, use the `--verbose` flag.
|
|
|
|
**Default (Clean):**
|
|
```bash
|
|
🤖 AI Provider: gemini
|
|
Model: gemini-2.5-flash
|
|
⠋ Thinking...
|
|
🔧 Calling tool: resource-list (type=room)
|
|
✓ Tool executed successfully
|
|
```
|
|
|
|
**Verbose Mode:**
|
|
```bash
|
|
# z3ed agent simple-chat "What is room 5?" --verbose
|
|
🤖 AI Provider: gemini
|
|
Model: gemini-2.5-flash
|
|
[DEBUG] Initializing Gemini service...
|
|
[DEBUG] Function calling: disabled
|
|
[DEBUG] Using curl for HTTPS request...
|
|
⠋ Thinking...
|
|
[DEBUG] Parsing response...
|
|
🔧 Calling tool: resource-list (type=room)
|
|
✓ Tool executed successfully
|
|
```
|
|
|
|
### Hierarchical Help System
|
|
|
|
The help system is organized by category for easy navigation.
|
|
|
|
- **Main Help**: `z3ed --help` or `z3ed -h` shows a high-level overview of command categories.
|
|
- **Category Help**: `z3ed help <category>` provides detailed information for a specific group of commands (e.g., `agent`, `patch`, `rom`).
|
|
|
|
## 10. Collaborative Sessions & Multimodal Vision
|
|
|
|
### Overview
|
|
|
|
YAZE supports real-time collaboration for ROM hacking through dual modes: **Local** (filesystem-based) for same-machine collaboration, and **Network** (WebSocket-based via yaze-server v2.0) for internet-based collaboration with advanced features including ROM synchronization, snapshot sharing, and AI agent integration.
|
|
|
|
---
|
|
|
|
### Local Collaboration Mode
|
|
|
|
Perfect for multiple YAZE instances on the same machine or cloud-synced folders (Dropbox, iCloud).
|
|
|
|
#### How to Use
|
|
|
|
1. Open YAZE → **Debug → Agent Chat**
|
|
2. Select **"Local"** mode
|
|
3. **Host a Session:**
|
|
- Enter session name: `Evening ROM Hack`
|
|
- Click **"Host Session"**
|
|
- Share the 6-character code (e.g., `ABC123`)
|
|
4. **Join a Session:**
|
|
- Enter the session code
|
|
- Click **"Join Session"**
|
|
- Chat history syncs automatically
|
|
|
|
#### Features
|
|
|
|
- **Shared History**: `~/.yaze/agent/sessions/<code>_history.json`
|
|
- **Auto-Sync**: 2-second polling for new messages
|
|
- **Participant Tracking**: Real-time participant list
|
|
- **Toast Notifications**: Get notified when collaborators send messages
|
|
- **Zero Setup**: No server required
|
|
|
|
#### Cloud Folder Workaround
|
|
|
|
Enable internet collaboration without a server:
|
|
|
|
```bash
|
|
# Link your sessions directory to Dropbox/iCloud
|
|
ln -s ~/Dropbox/yaze-sessions ~/.yaze/agent/sessions
|
|
|
|
# Have your collaborator do the same
|
|
# Now you can collaborate through cloud sync!
|
|
```
|
|
|
|
---
|
|
|
|
### Network Collaboration Mode (yaze-server v2.0)
|
|
|
|
Real-time collaboration over the internet with advanced features powered by the yaze-server v2.0.
|
|
|
|
#### Requirements
|
|
|
|
- **Server**: Node.js 18+ with yaze-server running
|
|
- **Client**: YAZE built with `-DYAZE_WITH_GRPC=ON` and `-DZ3ED_AI=ON`
|
|
- **Network**: Connectivity between collaborators
|
|
|
|
#### Server Setup
|
|
|
|
**Option 1: Using z3ed CLI**
|
|
```bash
|
|
z3ed collab start [--port=8765]
|
|
```
|
|
|
|
**Option 2: Manual Launch**
|
|
```bash
|
|
cd /path/to/yaze-server
|
|
npm install
|
|
npm start
|
|
|
|
# Server starts on http://localhost:8765
|
|
# Health check: curl http://localhost:8765/health
|
|
```
|
|
|
|
**Option 3: Docker**
|
|
```bash
|
|
docker build -t yaze-server .
|
|
docker run -p 8765:8765 yaze-server
|
|
```
|
|
|
|
#### Client Connection
|
|
|
|
1. Open YAZE → **Debug → Agent Chat**
|
|
2. Select **"Network"** mode
|
|
3. Enter server URL: `ws://localhost:8765` (or remote server)
|
|
4. Click **"Connect to Server"**
|
|
5. Host or join sessions like local mode
|
|
|
|
#### Core Features
|
|
|
|
**Session Management:**
|
|
- Unique 6-character session codes
|
|
- Participant tracking with join/leave notifications
|
|
- Real-time message broadcasting
|
|
- Persistent chat history
|
|
|
|
**Connection Management:**
|
|
- Health monitoring endpoints (`/health`, `/metrics`)
|
|
- Graceful shutdown notifications
|
|
- Automatic cleanup of inactive sessions
|
|
- Rate limiting (100 messages/minute per IP)
|
|
|
|
#### Advanced Features (v2.0)
|
|
|
|
**🎮 ROM Synchronization**
|
|
Share ROM edits in real-time:
|
|
- Send base64-encoded diffs to all participants
|
|
- Automatic ROM hash tracking
|
|
- Size limit: 5MB per diff
|
|
- Conflict detection via hash comparison
|
|
|
|
**📸 Multimodal Snapshot Sharing**
|
|
Share screenshots and images:
|
|
- Capture and share specific editor views
|
|
- Support for multiple snapshot types (overworld, dungeon, sprite, etc.)
|
|
- Base64 encoding for efficient transfer
|
|
- Size limit: 10MB per snapshot
|
|
|
|
**💡 Proposal Management**
|
|
Collaborative proposal workflow:
|
|
- Share AI-generated proposals with all participants
|
|
- Track proposal status: pending, accepted, rejected
|
|
- Real-time status updates broadcast to all users
|
|
- Proposal history tracked in server database
|
|
|
|
**🤖 AI Agent Integration**
|
|
Server-routed AI queries:
|
|
- Send queries through the collaboration server
|
|
- Shared AI responses visible to all participants
|
|
- Query history tracked in database
|
|
- Optional: Disable AI per session
|
|
|
|
#### Protocol Reference
|
|
|
|
The server uses JSON WebSocket messages over HTTP/WebSocket transport.
|
|
|
|
**Client → Server Messages:**
|
|
|
|
```json
|
|
// Host Session (v2.0 with optional ROM hash and AI control)
|
|
{
|
|
"type": "host_session",
|
|
"payload": {
|
|
"session_name": "My Session",
|
|
"username": "alice",
|
|
"rom_hash": "abc123...", // optional
|
|
"ai_enabled": true // optional, default true
|
|
}
|
|
}
|
|
|
|
// Join Session
|
|
{
|
|
"type": "join_session",
|
|
"payload": {
|
|
"session_code": "ABC123",
|
|
"username": "bob"
|
|
}
|
|
}
|
|
|
|
// Chat Message (v2.0 with metadata support)
|
|
{
|
|
"type": "chat_message",
|
|
"payload": {
|
|
"sender": "alice",
|
|
"message": "Hello!",
|
|
"message_type": "chat", // optional: chat, system, ai
|
|
"metadata": {...} // optional metadata
|
|
}
|
|
}
|
|
|
|
// ROM Sync (NEW in v2.0)
|
|
{
|
|
"type": "rom_sync",
|
|
"payload": {
|
|
"sender": "alice",
|
|
"diff_data": "base64_encoded_diff...",
|
|
"rom_hash": "sha256_hash"
|
|
}
|
|
}
|
|
|
|
// Snapshot Share (NEW in v2.0)
|
|
{
|
|
"type": "snapshot_share",
|
|
"payload": {
|
|
"sender": "alice",
|
|
"snapshot_data": "base64_encoded_image...",
|
|
"snapshot_type": "overworld_editor"
|
|
}
|
|
}
|
|
|
|
// Proposal Share (NEW in v2.0)
|
|
{
|
|
"type": "proposal_share",
|
|
"payload": {
|
|
"sender": "alice",
|
|
"proposal_data": {
|
|
"title": "Add new sprite",
|
|
"description": "...",
|
|
"changes": [...]
|
|
}
|
|
}
|
|
}
|
|
|
|
// Proposal Update (NEW in v2.0)
|
|
{
|
|
"type": "proposal_update",
|
|
"payload": {
|
|
"proposal_id": "uuid",
|
|
"status": "accepted" // pending, accepted, rejected
|
|
}
|
|
}
|
|
|
|
// AI Query (NEW in v2.0)
|
|
{
|
|
"type": "ai_query",
|
|
"payload": {
|
|
"username": "alice",
|
|
"query": "What enemies are in the eastern palace?"
|
|
}
|
|
}
|
|
|
|
// Leave Session
|
|
{ "type": "leave_session" }
|
|
|
|
// Ping
|
|
{ "type": "ping" }
|
|
```
|
|
|
|
**Server → Client Messages:**
|
|
|
|
```json
|
|
// Session Hosted
|
|
{
|
|
"type": "session_hosted",
|
|
"payload": {
|
|
"session_id": "uuid",
|
|
"session_code": "ABC123",
|
|
"session_name": "My Session",
|
|
"participants": ["alice"],
|
|
"rom_hash": "abc123...",
|
|
"ai_enabled": true
|
|
}
|
|
}
|
|
|
|
// Session Joined
|
|
{
|
|
"type": "session_joined",
|
|
"payload": {
|
|
"session_id": "uuid",
|
|
"session_code": "ABC123",
|
|
"session_name": "My Session",
|
|
"participants": ["alice", "bob"],
|
|
"messages": [...]
|
|
}
|
|
}
|
|
|
|
// Chat Message (broadcast)
|
|
{
|
|
"type": "chat_message",
|
|
"payload": {
|
|
"sender": "alice",
|
|
"message": "Hello!",
|
|
"timestamp": 1709567890123,
|
|
"message_type": "chat",
|
|
"metadata": null
|
|
}
|
|
}
|
|
|
|
// ROM Sync (broadcast, NEW in v2.0)
|
|
{
|
|
"type": "rom_sync",
|
|
"payload": {
|
|
"sync_id": "uuid",
|
|
"sender": "alice",
|
|
"diff_data": "base64...",
|
|
"rom_hash": "sha256...",
|
|
"timestamp": 1709567890123
|
|
}
|
|
}
|
|
|
|
// Snapshot Shared (broadcast, NEW in v2.0)
|
|
{
|
|
"type": "snapshot_shared",
|
|
"payload": {
|
|
"snapshot_id": "uuid",
|
|
"sender": "alice",
|
|
"snapshot_data": "base64...",
|
|
"snapshot_type": "overworld_editor",
|
|
"timestamp": 1709567890123
|
|
}
|
|
}
|
|
|
|
// Proposal Shared (broadcast, NEW in v2.0)
|
|
{
|
|
"type": "proposal_shared",
|
|
"payload": {
|
|
"proposal_id": "uuid",
|
|
"sender": "alice",
|
|
"proposal_data": {...},
|
|
"status": "pending",
|
|
"timestamp": 1709567890123
|
|
}
|
|
}
|
|
|
|
// Proposal Updated (broadcast, NEW in v2.0)
|
|
{
|
|
"type": "proposal_updated",
|
|
"payload": {
|
|
"proposal_id": "uuid",
|
|
"status": "accepted",
|
|
"timestamp": 1709567890123
|
|
}
|
|
}
|
|
|
|
// AI Response (broadcast, NEW in v2.0)
|
|
{
|
|
"type": "ai_response",
|
|
"payload": {
|
|
"query_id": "uuid",
|
|
"username": "alice",
|
|
"query": "What enemies are in the eastern palace?",
|
|
"response": "The eastern palace contains...",
|
|
"timestamp": 1709567890123
|
|
}
|
|
}
|
|
|
|
// Participant Events
|
|
{
|
|
"type": "participant_joined", // or "participant_left"
|
|
"payload": {
|
|
"username": "bob",
|
|
"participants": ["alice", "bob"]
|
|
}
|
|
}
|
|
|
|
// Server Shutdown (NEW in v2.0)
|
|
{
|
|
"type": "server_shutdown",
|
|
"payload": {
|
|
"message": "Server is shutting down. Please reconnect later."
|
|
}
|
|
}
|
|
|
|
// Pong
|
|
{
|
|
"type": "pong",
|
|
"payload": { "timestamp": 1709567890123 }
|
|
}
|
|
|
|
// Error
|
|
{
|
|
"type": "error",
|
|
"payload": { "error": "Session ABC123 not found" }
|
|
}
|
|
```
|
|
|
|
#### Server Configuration
|
|
|
|
**Environment Variables:**
|
|
- `PORT` - Server port (default: 8765)
|
|
- `ENABLE_AI_AGENT` - Enable AI agent integration (default: true)
|
|
- `AI_AGENT_ENDPOINT` - External AI agent endpoint URL
|
|
|
|
**Rate Limiting:**
|
|
- Window: 60 seconds
|
|
- Max messages: 100 per IP per window
|
|
- Max snapshot size: 10 MB
|
|
- Max ROM diff size: 5 MB
|
|
|
|
#### Database Schema (Server v2.0)
|
|
|
|
The server uses SQLite with the following tables:
|
|
|
|
- **sessions**: Session metadata, ROM hash, AI enabled flag
|
|
- **participants**: User tracking with last_seen timestamps
|
|
- **messages**: Chat history with message types and metadata
|
|
- **rom_syncs**: ROM diff history with hashes
|
|
- **snapshots**: Shared screenshots and images
|
|
- **proposals**: AI proposal tracking with status
|
|
- **agent_interactions**: AI query and response history
|
|
|
|
#### Deployment
|
|
|
|
**Heroku:**
|
|
```bash
|
|
cd /path/to/yaze-server
|
|
heroku create yaze-collab
|
|
git push heroku main
|
|
heroku config:set ENABLE_AI_AGENT=true
|
|
```
|
|
|
|
**VPS (with PM2):**
|
|
```bash
|
|
git clone https://github.com/scawful/yaze-server
|
|
cd yaze-server
|
|
npm install
|
|
npm install -g pm2
|
|
pm2 start server.js --name yaze-collab
|
|
pm2 startup
|
|
pm2 save
|
|
```
|
|
|
|
**Docker:**
|
|
```bash
|
|
docker build -t yaze-server .
|
|
docker run -p 8765:8765 -e ENABLE_AI_AGENT=true yaze-server
|
|
```
|
|
|
|
#### Testing
|
|
|
|
**Health Check:**
|
|
```bash
|
|
curl http://localhost:8765/health
|
|
curl http://localhost:8765/metrics
|
|
```
|
|
|
|
**Test with wscat:**
|
|
```bash
|
|
npm install -g wscat
|
|
wscat -c ws://localhost:8765
|
|
|
|
# Host session
|
|
> {"type":"host_session","payload":{"session_name":"Test","username":"alice","ai_enabled":true}}
|
|
|
|
# Join session (in another terminal)
|
|
> {"type":"join_session","payload":{"session_code":"ABC123","username":"bob"}}
|
|
|
|
# Send message
|
|
> {"type":"chat_message","payload":{"sender":"alice","message":"Hello!"}}
|
|
```
|
|
|
|
#### Security Considerations
|
|
|
|
**Current Implementation:**
|
|
⚠️ Basic security - suitable for trusted networks
|
|
- No authentication or encryption by default
|
|
- Plain text message transmission
|
|
- Session codes are the only access control
|
|
|
|
**Recommended for Production:**
|
|
1. **SSL/TLS**: Use `wss://` with valid certificates
|
|
2. **Authentication**: Implement JWT tokens or OAuth
|
|
3. **Session Passwords**: Optional per-session passwords
|
|
4. **Persistent Storage**: Use PostgreSQL/MySQL for production
|
|
5. **Monitoring**: Add logging to CloudWatch/Datadog
|
|
6. **Backup**: Regular database backups
|
|
|
|
---
|
|
|
|
### Multimodal Vision (Gemini)
|
|
|
|
Analyze screenshots of your ROM editor using Gemini's vision capabilities for visual feedback and suggestions.
|
|
|
|
#### Requirements
|
|
|
|
- `GEMINI_API_KEY` environment variable set
|
|
- YAZE built with `-DYAZE_WITH_GRPC=ON` and `-DZ3ED_AI=ON`
|
|
|
|
#### Capture Modes
|
|
|
|
**Full Window**: Captures the entire YAZE application window
|
|
|
|
**Active Editor** (default): Captures only the currently focused editor window
|
|
|
|
**Specific Window**: Captures a named window (e.g., "Overworld Editor")
|
|
|
|
#### How to Use
|
|
|
|
1. Open **Debug → Agent Chat**
|
|
2. Expand **"Gemini Multimodal (Preview)"** panel
|
|
3. Select capture mode:
|
|
- ○ Full Window
|
|
- ● Active Editor (default)
|
|
- ○ Specific Window
|
|
4. If Specific Window, enter window name: `Overworld Editor`
|
|
5. Click **"Capture Snapshot"**
|
|
6. Enter prompt: `"What issues do you see with this layout?"`
|
|
7. Click **"Send to Gemini"**
|
|
|
|
#### Example Prompts
|
|
|
|
- "Analyze the tile placement in this overworld screen"
|
|
- "What's wrong with the palette colors in this screenshot?"
|
|
- "Suggest improvements for this dungeon room layout"
|
|
- "Does this screen follow good level design practices?"
|
|
- "Are there any visual glitches or tile conflicts?"
|
|
- "How can I improve the composition of this room?"
|
|
|
|
The AI response appears in your chat history and can reference specific details from the screenshot. In network collaboration mode, multimodal snapshots can be shared with all participants.
|
|
|
|
---
|
|
|
|
### Architecture
|
|
|
|
```
|
|
┌──────────────────────────────────────────────────────┐
|
|
│ YAZE Editor │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────┐ │
|
|
│ │ Agent Chat Widget (ImGui) │ │
|
|
│ │ │ │
|
|
│ │ [Collaboration Panel] │ │
|
|
│ │ ├─ Local Mode (filesystem) ✓ Working │ │
|
|
│ │ └─ Network Mode (websocket) ✓ Working │ │
|
|
│ │ │ │
|
|
│ │ [Multimodal Panel] │ │
|
|
│ │ ├─ Capture Mode Selection ✓ Working │ │
|
|
│ │ ├─ Screenshot Capture ✓ Working │ │
|
|
│ │ └─ Send to Gemini ✓ Working │ │
|
|
│ └─────────────────────────────────────────────┘ │
|
|
│ │ │ │
|
|
│ ▼ ▼ │
|
|
│ ┌──────────────────┐ ┌──────────────────┐ │
|
|
│ │ Collaboration │ │ Screenshot │ │
|
|
│ │ Coordinators │ │ Utils │ │
|
|
│ └──────────────────┘ └──────────────────┘ │
|
|
│ │ │ │
|
|
└───────────┼────────────────────┼────────────────────┘
|
|
│ │
|
|
▼ ▼
|
|
┌──────────────────┐ ┌──────────────────┐
|
|
│ ~/.yaze/agent/ │ │ Gemini Vision │
|
|
│ sessions/ │ │ API │
|
|
└──────────────────┘ └──────────────────┘
|
|
│
|
|
▼
|
|
┌──────────────────────────────────────────┐
|
|
│ yaze-server v2.0 │
|
|
│ - WebSocket Server (Node.js) │
|
|
│ - SQLite Database │
|
|
│ - Session Management │
|
|
│ - ROM Sync │
|
|
│ - Snapshot Sharing │
|
|
│ - Proposal Management │
|
|
│ - AI Agent Integration │
|
|
└──────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
### Troubleshooting
|
|
|
|
**"Failed to start collaboration server"**
|
|
- Ensure Node.js is installed: `node --version`
|
|
- Check port availability: `lsof -i :8765`
|
|
- Verify server directory exists
|
|
|
|
**"Not connected to collaboration server"**
|
|
- Verify server is running: `curl http://localhost:8765/health`
|
|
- Check firewall settings
|
|
- Confirm server URL is correct
|
|
|
|
**"Session not found"**
|
|
- Verify session code is correct (case-insensitive)
|
|
- Check if session expired (server restart clears sessions)
|
|
- Try hosting a new session
|
|
|
|
**"Rate limit exceeded"**
|
|
- Server enforces 100 messages per minute per IP
|
|
- Wait 60 seconds and try again
|
|
|
|
**Participants not updating**
|
|
- Click "Refresh Session" button
|
|
- Check network connectivity
|
|
- Verify server logs for errors
|
|
|
|
**Messages not broadcasting**
|
|
- Ensure all clients are in the same session
|
|
- Check session code matches exactly
|
|
- Verify network connectivity between client and server
|
|
|
|
---
|
|
|
|
### References
|
|
|
|
- **Server Repository**: [yaze-server](https://github.com/scawful/yaze-server)
|
|
- **Agent Editor Docs**: `src/app/editor/agent/README.md`
|
|
- **Integration Guide**: `docs/z3ed/YAZE_SERVER_V2_INTEGRATION.md`
|
|
|
|
## 11. Roadmap & Implementation Status
|
|
|
|
**Last Updated**: October 4, 2025
|
|
|
|
### ✅ Completed
|
|
|
|
- **Core Infrastructure**: Resource-oriented CLI, proposal workflow, sandbox manager, and resource catalog are all production-ready.
|
|
- **AI Backends**: Both Ollama (local) and Gemini (cloud) are operational.
|
|
- **Conversational Agent**: The agent service, tool dispatcher (with 5 read-only tools), TUI/simple chat interfaces, and ImGui editor chat widget with persistent history.
|
|
- **GUI Test Harness**: A comprehensive GUI testing platform with introspection, widget discovery, recording/replay, and CI integration support.
|
|
- **Collaborative Sessions**:
|
|
- Local filesystem-based collaborative editing with shared chat history
|
|
- Network WebSocket-based collaboration via yaze-server v2.0
|
|
- Dual-mode support (Local/Network) with seamless switching
|
|
- **Multimodal Vision**: Gemini vision API integration with multiple capture modes (Full Window, Active Editor, Specific Window).
|
|
- **yaze-server v2.0**: Production-ready Node.js WebSocket server with:
|
|
- ROM synchronization with diff broadcasting
|
|
- Multimodal snapshot sharing
|
|
- Collaborative proposal management
|
|
- AI agent integration and query routing
|
|
- Health monitoring and metrics endpoints
|
|
- Rate limiting and security features
|
|
|
|
### 🚧 Active & Next Steps
|
|
|
|
1. **Live LLM Testing (1-2h)**: Verify function calling with real models (Ollama/Gemini).
|
|
2. **Expand Tool Coverage (8-10h)**: Add new read-only tools for inspecting dialogue, sprites, and regions.
|
|
3. **Full WebSocket Protocol (2-3 days)**: Upgrade from HTTP polling to true WebSocket frames using ixwebsocket or websocketpp.
|
|
4. **Collaboration UI Enhancements (1 day)**: Add UI elements for ROM sync, snapshot sharing, and proposal management in the Agent Chat widget.
|
|
5. **Windows Cross-Platform Testing (8-10h)**: Validate `z3ed` and the test harness on Windows.
|
|
|
|
### ✅ Recently Completed (v0.2.0-alpha)
|
|
|
|
- **Enhanced System Prompt (v3)**: Proactive tool chaining with implicit iteration to minimize back-and-forth
|
|
- **Learn Command**: Full implementation with preferences, ROM patterns, project context, and conversation memory
|
|
- **gRPC Windows Build Optimization**: Documented vcpkg approach and optimization strategies
|
|
|
|
## 12. Troubleshooting
|
|
|
|
- **"Build with -DZ3ED_AI=ON" warning**: AI features are disabled. Rebuild with the flag to enable them.
|
|
- **"gRPC not available" error**: GUI testing is disabled. Rebuild with `-DYAZE_WITH_GRPC=ON`.
|
|
- **AI generates invalid commands**: The prompt may be vague. Use specific coordinates, tile IDs, and map context.
|
|
- **Chat mode freezes**: Use `agent simple-chat` instead of the FTXUI-based `agent chat` for better stability, especially in scripts. |