yaze/assets/agent/gui_automation_instructions.txt
2025-10-17 12:10:25 -04:00

# GUI Automation with YAZE Test Harness
## Overview
You can control the YAZE GUI directly through a test harness. This lets you perform visual edits, interact with UI elements, and capture screenshots for multimodal analysis.
## Prerequisites
- YAZE must be running with the `--enable-test-harness` flag
- The test harness server listens on port 50052 by default
- GUI automation tools only work while the YAZE GUI is active
## Available GUI Tools
### 1. gui-discover
**Purpose**: Discover available widgets and windows in the YAZE interface
**When to use**: Before performing any GUI actions, discover what UI elements are available
**Example usage**:
```json
{
  "tool_calls": [{
    "tool_name": "gui-discover",
    "args": {
      "window": "Overworld",
      "type": "button"
    }
  }]
}
```
### 2. gui-click
**Purpose**: Automate clicking buttons and UI elements
**When to use**: To open editors, switch modes, or trigger actions in the GUI
**Example usage**:
```json
{
  "tool_calls": [{
    "tool_name": "gui-click",
    "args": {
      "target": "ModeButton:Draw (2)",
      "click_type": "left"
    }
  }]
}
```
### 3. gui-place-tile
**Purpose**: Automate tile placement in the overworld editor
**When to use**: When the user wants to see the tile placed visually in the GUI (not just a ROM data edit)
**Example usage**:
```json
{
  "tool_calls": [{
    "tool_name": "gui-place-tile",
    "args": {
      "tile": "0x02E",
      "x": "15",
      "y": "20"
    }
  }]
}
}
```
### 4. gui-screenshot
**Purpose**: Capture visual state of the GUI
**When to use**: For visual verification, multimodal analysis, or user feedback
**Example usage**:
```json
{
  "tool_calls": [{
    "tool_name": "gui-screenshot",
    "args": {
      "region": "full",
      "format": "PNG"
    }
  }]
}
```
## GUI Automation Workflow
### Typical Pattern for GUI Edits
1. **Discover** - Find available widgets with `gui-discover`
2. **Navigate** - Use `gui-click` to open the right editor or switch modes
3. **Edit** - Use specific tools like `gui-place-tile` for the actual modification
4. **Verify** - Capture a screenshot with `gui-screenshot` to confirm changes
### Example: Place a tree tile in the overworld
```
User: "Use the GUI to place a tree at position 10, 15"

Step 1: Call gui-place-tile
{
  "tool_calls": [{
    "tool_name": "gui-place-tile",
    "args": {
      "tile": "0x02E",
      "x": "10",
      "y": "15"
    }
  }],
  "reasoning": "The user wants visual GUI interaction. Tree tile is 0x02E."
}

Step 2: After receiving tool result, inform user
{
  "text_response": "I've generated the GUI automation script to place a tree tile at position (10, 15). The test harness will execute this action if YAZE is running with --enable-test-harness.",
  "reasoning": "Tool call succeeded, provide confirmation to user."
}
```
## When to Use GUI Tools vs ROM Tools
### Use GUI Tools When:
- User explicitly requests "use the GUI" or "show me"
- User wants to see visual feedback
- User wants to learn how to use the editor
- User wants a workflow demonstrated
### Use ROM Tools When:
- User wants batch operations
- User needs precise control over ROM data
- GUI is not running
- User needs faster automated operations
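The two lists above can be read as a small decision rule. The following Python sketch is only an illustration of that rule: the function name, the trigger phrases, and the default of falling back to ROM tools when nothing asks for the GUI are assumptions, not part of YAZE or z3ed.

```python
# Hypothetical helper: names, phrases, and the "rom" default are illustrative.
GUI_HINTS = ("use the gui", "show me")


def choose_tool_family(request: str, gui_running: bool, batch: bool = False) -> str:
    """Return "gui" or "rom" following the guidance in this section."""
    if not gui_running:
        return "rom"  # GUI tools need a running GUI
    if batch:
        return "rom"  # batch operations go through ROM tools
    text = request.lower()
    if any(hint in text for hint in GUI_HINTS):
        return "gui"  # the user explicitly asked for visual interaction
    return "rom"  # assumed default when the request gives no GUI signal
```

For example, `choose_tool_family("Use the GUI to place a tree at position 10, 15", gui_running=True)` selects the GUI tools, while the same request with `gui_running=False` selects ROM tools.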
## Important Notes
1. **GUI tools require connection**: Every GUI tool checks whether the test harness is connected. If it is not, the tool returns a mock response instead of acting on the GUI.
2. **Coordinate systems**: GUI coordinates are tile-based (0-63 for overworld), matching the ROM data coordinates.
3. **Widget paths**: Widget paths are hierarchical, like "ModeButton:Draw (2)" or "ToolbarAction:Toggle Tile16 Selector". Use `gui-discover` to find exact paths.
4. **Error handling**: If a GUI tool fails, fall back to ROM tools so the user's request is still fulfilled.
5. **Test scripts**: Tools like `gui-place-tile` generate test scripts that can be saved and replayed later.
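As a rough illustration of note 2, the sketch below builds a `gui-place-tile` call and rejects coordinates outside the 0-63 overworld range before anything is sent. The helper name and the choice to raise `ValueError` are assumptions. The string-valued arguments and the three-digit hex tile format simply mirror the examples in this document.

```python
# Hypothetical helper, not part of YAZE: validates coordinates against the
# 0-63 overworld range stated in the notes and formats args like the examples.
OVERWORLD_MAX = 63


def build_place_tile_call(tile: int, x: int, y: int) -> dict:
    """Build one gui-place-tile tool call with string-valued args."""
    for name, value in (("x", x), ("y", y)):
        if not 0 <= value <= OVERWORLD_MAX:
            raise ValueError(f"{name}={value} is outside 0-{OVERWORLD_MAX}")
    return {
        "tool_name": "gui-place-tile",
        "args": {"tile": f"0x{tile:03X}", "x": str(x), "y": str(y)},
    }
```

Validating up front avoids sending a call that the harness could only reject, or that would otherwise need a ROM-tool fallback.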
## Integration with Multimodal Features
Combine GUI automation with screenshot capture to verify edits visually:
```
1. Capture before state: gui-screenshot
2. Perform edit: gui-place-tile
3. Capture after state: gui-screenshot
4. Compare visually or send to vision model for verification
```
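The before/edit/after pattern above could be wrapped in a small helper. This is a sketch under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever dispatches a tool call, and the screenshot arguments are copied from the `gui-screenshot` example earlier in this document.

```python
from typing import Any, Callable, Dict

ToolCall = Dict[str, Any]


def edit_with_snapshots(
    call_tool: Callable[[ToolCall], Any], edit: ToolCall
) -> Dict[str, Any]:
    """Run one edit between two full-window screenshots.

    call_tool is a hypothetical dispatcher supplied by the caller.
    """
    shot = {"tool_name": "gui-screenshot", "args": {"region": "full", "format": "PNG"}}
    before = call_tool(shot)   # 1. capture the state before the edit
    result = call_tool(edit)   # 2. perform the edit
    after = call_tool(shot)    # 3. capture the state after the edit
    return {"before": before, "edit": result, "after": after}
```

The returned `before` and `after` results can then be compared directly or passed to a vision model, as step 4 describes.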
## Troubleshooting
### "Connection refused" errors
- Ensure YAZE is running with the `--enable-test-harness` flag
- Check that port 50052 is available
- Verify that no firewall is blocking localhost connections
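A quick way to check the first two points is to attempt a plain TCP connection to the harness port. The sketch below uses only the Python standard library. The function name is hypothetical, and a successful connection only shows that something is listening on the port, not that it is the YAZE test harness.

```python
import socket


def harness_reachable(
    host: str = "127.0.0.1", port: int = 50052, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If this returns `False` on the default port, the "Connection refused" checklist above is the place to start.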
### "Widget not found" errors
- Run `gui-discover` first to get the current widget list
- Check that the right editor window is open
- Verify widget path spelling and case
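When a path fails because of spelling or case, the list returned by `gui-discover` can be searched for near matches. The following sketch uses `difflib.get_close_matches` from the Python standard library. The helper name and the 0.6 similarity cutoff are assumptions, and the widget list is taken from the examples in this document.

```python
import difflib
from typing import List


def suggest_widget_paths(requested: str, discovered: List[str], limit: int = 3) -> List[str]:
    """Return discovered widget paths that closely match the requested one.

    Matching is case-insensitive; results keep their original spelling.
    """
    by_lower = {path.lower(): path for path in discovered}
    matches = difflib.get_close_matches(
        requested.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    return [by_lower[m] for m in matches]


widgets = ["ModeButton:Draw (2)", "ToolbarAction:Toggle Tile16 Selector"]
print(suggest_widget_paths("modebutton:draw (2)", widgets))
```

A suggestion is only a candidate: the exact path from `gui-discover` is still what should be passed to `gui-click`.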
### "Tool not implemented" errors
- Ensure YAZE was built with `-DYAZE_WITH_GRPC=ON`
- Verify that the z3ed binary includes gRPC support
## Example Conversations
### Example 1: Simple tile placement
```
User: "Use the GUI to place grass at 5, 10"
Assistant: [Calls gui-place-tile with tile=0x020, x=5, y=10]
Assistant: "I've queued a GUI action to place a grass tile at position (5, 10)."
```
### Example 2: Discover and click workflow
```
User: "Open the Tile16 selector"
Assistant: [Calls gui-discover with window=Overworld]
Assistant: [Receives widget list including "ToolbarAction:Toggle Tile16 Selector"]
Assistant: [Calls gui-click with target="ToolbarAction:Toggle Tile16 Selector"]
Assistant: "I've clicked the Tile16 Selector button to open the selector panel."
```
### Example 3: Visual verification
```
User: "Show me what the current map looks like"
Assistant: [Calls gui-screenshot with region=full]
Assistant: "Here's a screenshot of the current editor state: /tmp/yaze_screenshot.png"
```
## Advanced Features
### Chaining GUI Actions
You can chain multiple GUI tools in a single response for complex workflows:
```json
{
  "tool_calls": [
    {"tool_name": "gui-discover", "args": {"window": "Overworld"}},
    {"tool_name": "gui-click", "args": {"target": "ModeButton:Draw (2)"}},
    {"tool_name": "gui-place-tile", "args": {"tile": "0x02E", "x": "10", "y": "10"}},
    {"tool_name": "gui-screenshot", "args": {"region": "full"}}
  ],
  "reasoning": "Complete workflow: discover widgets, switch to draw mode, place tile, capture result"
}
```
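A chained payload like the one above could also be assembled programmatically. This sketch is an assumption-laden illustration: `build_chain` is a hypothetical helper, and the set of tool names is limited to the four GUI tools described in this document.

```python
import json
from typing import Dict, List, Tuple

# The four GUI tools described in this document.
KNOWN_GUI_TOOLS = {"gui-discover", "gui-click", "gui-place-tile", "gui-screenshot"}


def build_chain(steps: List[Tuple[str, Dict[str, str]]], reasoning: str) -> str:
    """Serialize an ordered list of (tool_name, args) pairs into one response."""
    calls = []
    for tool_name, args in steps:
        if tool_name not in KNOWN_GUI_TOOLS:
            raise ValueError(f"unknown GUI tool: {tool_name}")
        calls.append({"tool_name": tool_name, "args": args})
    return json.dumps({"tool_calls": calls, "reasoning": reasoning}, indent=2)
```

Rejecting unknown tool names before serializing catches typos such as `gui-place-tiles` early, instead of surfacing them as a failed tool call.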
### Recording and Replay
GUI actions can be recorded for later replay:
1. Actions are logged as test scripts
2. Scripts can be saved to YAML/JSON files
3. Replay with `z3ed agent test replay <script.yaml>`
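For step 3, a caller might build the replay command as an argument list before handing it to a process runner. The sketch below assumes only the command shape quoted above. The helper name and the extension check (based on the YAML/JSON note in step 2) are hypothetical.

```python
from pathlib import Path
from typing import List

# Extensions assumed from the "YAML/JSON files" note above.
SCRIPT_SUFFIXES = {".yaml", ".json"}


def build_replay_command(script_path: str) -> List[str]:
    """Return the argv list for `z3ed agent test replay <script>`."""
    path = Path(script_path)
    if path.suffix.lower() not in SCRIPT_SUFFIXES:
        raise ValueError(f"expected a .yaml or .json script, got {path.name!r}")
    return ["z3ed", "agent", "test", "replay", str(path)]
```

The resulting list can be passed to `subprocess.run`, which avoids shell quoting problems with paths that contain spaces.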
## Summary
GUI automation tools extend your capabilities beyond ROM data manipulation to include visual, interactive editing workflows. Use them when users want to see changes happen in real time or when you are demonstrating features of the YAZE editor.
Remember: Always start with `gui-discover` to understand what's available, then use specific tools for your task.