Files

scawful 0465d07a55 feat: Add GUI automation client and test workflow generator

- Implemented GuiAutomationClient for gRPC communication with the test harness.
- Added methods for various GUI actions: Click, Type, Wait, Assert, and Screenshot.
- Created TestWorkflowGenerator to convert natural language prompts into structured test workflows.
- Enhanced HandleTestCommand to support new command-line arguments for GUI automation.
- Updated CMakeLists.txt to include new source files for GUI automation and workflow generation.

2025-10-02 01:01:19 -04:00

15 KiB

Raw Blame History

End-to-End Workflow Validation Guide

Created: October 2, 2025
Status: Priority 1 - Ready to Execute
Time Estimate: 2-3 hours

Overview

This guide provides a comprehensive checklist for validating the complete z3ed agent workflow from proposal creation through ROM commit. This is the final validation step before declaring the agentic workflow system operational.

Prerequisites

Build Requirements

# Build z3ed CLI
cmake --build build --target z3ed -j8

# Build YAZE with gRPC support
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)

# Verify grpcurl is installed
brew install grpcurl

Test Assets

ROM file: assets/zelda3.sfc (required)
Empty workspace for proposals: /tmp/yaze/ (auto-created)

Validation Checklist

✅ Phase 1: Automated Test Script (30 minutes)

1.1. Run E2E Test Script

./scripts/test_harness_e2e.sh

Expected Output:

=== ImGuiTestHarness E2E Test ===

Starting YAZE with test harness...
YAZE PID: 12345
Waiting for server to start...
✓ Server started successfully

=== Running RPC Tests ===

Test 1: Ping (Health Check)
✓ PASSED

Test 2: Click (Button)
✓ PASSED

Test 3: Type (Text Input)
✓ PASSED

Test 4: Wait (Window Visible)
✓ PASSED

Test 5: Assert (Window Visible)
✓ PASSED

Test 6: Screenshot (Not Implemented)
✓ PASSED

=== Test Summary ===
Tests Run:    6
Tests Passed: 6
Tests Failed: 0

All tests passed!

Success Criteria:

All 6 tests pass
No connection errors
No port conflicts
Server starts and stops cleanly

Troubleshooting:

If port in use: killall yaze && sleep 2
If grpcurl missing: brew install grpcurl
If binary not found: Check build-grpc-test/bin/ directory

✅ Phase 2: Manual Proposal Workflow (60 minutes)

2.1. Create Test Proposal

# Create a proposal via CLI
./build/bin/z3ed agent run \
  --rom=assets/zelda3.sfc \
  --prompt "Test proposal for E2E validation" \
  --sandbox

# Expected output:
# ✅ Agent run completed successfully.
#    Proposal ID: <UUID>
#    Sandbox: /tmp/yaze/sandboxes/<UUID>/zelda3.sfc
#    Use 'z3ed agent diff' to review changes

Verification Steps:

Command completes without error
Proposal ID is displayed
Sandbox ROM file exists at shown path
No crashes or hangs

2.2. List Proposals

./build/bin/z3ed agent list

# Expected output:
# === Agent Proposals ===
#
# ID: <UUID>
#   Status: Pending
#   Created: <timestamp>
#   Prompt: Test proposal for E2E validation
#   Commands: 0
#   Bytes Changed: 0
#
# Total: 1 proposal(s)

Verification Steps:

Proposal appears in list
Status shows "Pending"
All metadata fields populated
Prompt matches input

2.3. View Proposal Diff

./build/bin/z3ed agent diff

# Expected output:
# === Proposal Diff ===
# Proposal ID: <UUID>
# Sandbox ID: <UUID>
# Prompt: Test proposal for E2E validation
# Description: Agent-generated ROM modifications
# Status: Pending
# Created: <timestamp>
# Commands Executed: 0
# Bytes Changed: 0
#
# --- Diff Content ---
# (No changes yet for mock implementation)
#
# --- Execution Log ---
# Starting agent run with prompt: Test proposal for E2E validation
# Generated 0 commands
# Completed execution of 0 commands
#
# === Next Steps ===
# To accept changes: z3ed agent commit
# To reject changes: z3ed agent revert
# To review in GUI: yaze --proposal=<UUID>

Verification Steps:

Diff displays correctly
Execution log shows all steps
Metadata matches proposal
No errors reading files

2.4. Launch YAZE GUI

# Start YAZE normally (not test harness mode)
./build/bin/yaze.app/Contents/MacOS/yaze

# Navigate to: Debug → Agent Proposals

Verification Steps:

YAZE launches without crashes
"Agent Proposals" menu item exists
ProposalDrawer opens when clicked
Drawer appears on right side (400px width)

2.5. Test ProposalDrawer UI

List View Verification:

Proposal appears in list
Status badge shows "Pending" in yellow
Prompt text is visible
Created timestamp displayed
Click proposal to open detail view

Detail View Verification:

All metadata displayed correctly
Execution log visible and scrollable
Diff section shows (empty for mock)
Accept/Reject/Delete buttons visible
Back button returns to list

Filtering Verification:

"All" filter shows proposal
"Pending" filter shows proposal
"Accepted" filter hides proposal (not accepted yet)
"Rejected" filter hides proposal (not rejected yet)

Refresh Verification:

Click "Refresh" button
Proposal count updates if needed
No crashes or errors

2.6. Test Accept Workflow

Steps:

Select proposal in list view
Open detail view
Click "Accept" button
Confirm in dialog (if shown)
Wait for processing

Verification:

Accept button triggers action
Status changes to "Accepted"
Status badge turns green
ROM data merged successfully (check logs)
Sandbox ROM remains unchanged
No crashes during merge

Post-Accept Checks:

# Verify proposal status persists
./build/bin/z3ed agent list
# Should show Status: Accepted

# Verify ROM was modified (if changes were made)
# For mock implementation, this will be no-op

2.7. Test Reject Workflow

Create another proposal:

./build/bin/z3ed agent run \
  --rom=assets/zelda3.sfc \
  --prompt "Proposal to reject" \
  --sandbox

Steps:

Open ProposalDrawer in YAZE
Select new proposal
Click "Reject" button
Confirm in dialog (if shown)

Verification:

Reject button triggers action
Status changes to "Rejected"
Status badge turns red
ROM remains unchanged
Sandbox ROM unchanged
No crashes

2.8. Test Delete Workflow

Create another proposal:

./build/bin/z3ed agent run \
  --rom=assets/zelda3.sfc \
  --prompt "Proposal to delete" \
  --sandbox

Steps:

Open ProposalDrawer in YAZE
Select new proposal
Click "Delete" button
Confirm in dialog

Verification:

Delete button triggers action
Proposal removed from list
Files cleaned up from disk
No crashes

File Cleanup Check:

# Verify proposal directory was removed
ls /tmp/yaze/proposals/
# Should NOT show deleted proposal ID

# Verify sandbox was removed
ls /tmp/yaze/sandboxes/
# Should NOT show deleted sandbox ID

3.1. Start Test Harness

# Terminal 1: Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Wait for startup
sleep 3

# Verify server is listening
lsof -i :50052
# Should show yaze process

3.2. Test Overworld Editor Workflow

# Terminal 2: Run automation commands

# Click Overworld button
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

# Wait for window to appear
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

# Assert window is visible
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Overworld Editor"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert

Verification:

Click RPC succeeds
Overworld Editor window opens in YAZE
Wait RPC succeeds (condition met)
Assert RPC succeeds (window visible)
No timeouts or errors

3.3. Test Dungeon Editor Workflow

# Click Dungeon button
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"target":"button:Dungeon","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

# Wait for window
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Dungeon Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

# Assert visible
grpcurl -plaintext \
  -import-path src/app/core/proto \
  -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Dungeon Editor"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert

Verification:

Click RPC succeeds
Dungeon Editor window opens
Wait RPC succeeds
Assert RPC succeeds
No errors

3.4. Test CLI Agent Test Command

# Build z3ed with gRPC support first
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
cmake --build build-grpc-test --target z3ed -j8

# Test simple open editor command
./build-grpc-test/bin/z3ed agent test \
  --prompt "Open Overworld editor"

# Expected output:
# === GUI Automation Test ===
# Prompt: Open Overworld editor
# Server: localhost:50052
#
# Generated workflow:
# Workflow: Open Overworld Editor
#   1. Click(button:Overworld)
#   2. Wait(window_visible:Overworld Editor, 5000ms)
#
# ✓ Connected to test harness
#
# [1/2] Click(button:Overworld) ... ✓ (125ms)
# [2/2] Wait(window_visible:Overworld Editor, 5000ms) ... ✓ (1250ms)
#
# ✅ Test passed in 1375ms

Verification:

Command parses prompt correctly
Workflow generation succeeds
Connection to test harness succeeds
All steps execute successfully
Timing information displayed
Exit code is 0

Test Additional Prompts:

# Open and verify
./build-grpc-test/bin/z3ed agent test \
  --prompt "Open Dungeon editor and verify it loads"

# Click button
./build-grpc-test/bin/z3ed agent test \
  --prompt "Click Overworld button"

Verification for Each:

Prompt recognized
Workflow generated correctly
All steps pass
No crashes or errors

✅ Phase 4: Documentation Updates (30 minutes)

4.1. Update IT-01-QUICKSTART.md

Add section on CLI agent test command:

## CLI Agent Test Command

You can now automate GUI testing with natural language prompts:

\`\`\`bash
# Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Run automated test
./build-grpc-test/bin/z3ed agent test \
  --prompt "Open Overworld editor and verify it loads"
\`\`\`

### Supported Prompt Patterns

1. **Open Editor**: "Open Overworld editor"
2. **Open and Verify**: "Open Dungeon editor and verify it loads"
3. **Click Button**: "Click Open ROM button"
4. **Type Input**: "Type 'zelda3.sfc' in filename input"

Tasks:

Add CLI agent test section
Document supported prompts
Add troubleshooting tips
Update examples

4.2. Update E6-z3ed-implementation-plan.md

Mark Priority 1 complete:

### Priority 1: End-to-End Workflow Validation ✅ COMPLETE

**Completion Date**: October 2, 2025  
**Time Spent**: 3 hours  
**Status**: All validation checks passed

**Completed Tasks**:
1. ✅ E2E test script validation
2. ✅ Manual proposal workflow testing
3. ✅ Real widget automation testing
4. ✅ CLI agent test command implementation
5. ✅ Documentation updates

**Key Findings**:
- All systems working as expected
- No critical issues identified
- Performance acceptable (< 2s per step)
- Ready for production use

**Next Priority**: IT-02 (CLI Agent Test Command - already implemented!)

Tasks:

Mark Priority 1 complete
Document completion details
List any issues found
Update status summary

4.3. Update README.md

Update current status:

### ✅ Priority 1: End-to-End Workflow Validation (COMPLETE)
**Goal**: Validated complete proposal lifecycle with real GUI and widgets  
**Time Invested**: 3 hours  
**Status**: All checks passed

### ✅ Priority 2: CLI Agent Test Command (COMPLETE)
**Goal**: Natural language prompt → automated GUI test workflow  
**Time Invested**: 2 hours (implemented alongside Priority 1)  
**Status**: Fully operational

**Implementation**:
- GuiAutomationClient: gRPC wrapper for CLI usage
- TestWorkflowGenerator: Natural language prompt parsing
- `z3ed agent test` command: End-to-end automation

**See**: [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md) for usage examples

Tasks:

Update completion status
Add implementation details
Update quick start guide
Add examples

Success Criteria Summary

Must Pass (Critical)

E2E test script: All 6 tests pass
Proposal creation: Works without errors
ProposalDrawer: Opens and displays proposals
Accept workflow: ROM merging works correctly
GUI automation: Real widgets respond to RPCs
CLI agent test: At least 3 prompts work

Should Pass (Important)

Reject workflow: Status updates correctly
Delete workflow: Files cleaned up
Cross-session persistence: Proposals survive restart
Error handling: Helpful messages on failure
Performance: < 5s per automation step

Nice to Have (Optional)

Screenshots: Capture and save images
Policy evaluation: Basic constraint checking
Telemetry: Usage metrics collected

Known Issues & Limitations

Current Limitations

MockAIService: Not using real LLM (placeholder commands)
Screenshot: Not yet implemented (returns stub)
Policy Evaluation: Not yet implemented (AW-04)
Windows Support: Test harness not available on Windows

Workarounds

Mock service sufficient for testing infrastructure
Screenshot can be added later (non-blocking)
Policy framework is Priority 3
Windows users can use manual testing

Next Steps

After completing this validation:

Mark Priority 1 Complete: Update all documentation
Mark Priority 2 Complete: CLI agent test implemented
Begin Priority 3: Policy Evaluation Framework (AW-04)
Production Deployment: System ready for real usage

Reporting Issues

If any validation step fails, document:

What failed: Specific step/command
Error message: Full output or screenshot
Environment: OS, build config, ROM file
Reproduction: Steps to reproduce
Workaround: Any temporary fixes found

Report issues in: docs/z3ed/VALIDATION_ISSUES.md

Last Updated: October 2, 2025
Contributors: @scawful, GitHub Copilot
License: Same as YAZE (see ../../LICENSE)

15 KiB Raw Blame History

End-to-End Workflow Validation Guide

Overview

Prerequisites

Build Requirements

Test Assets

Validation Checklist

✅ Phase 1: Automated Test Script (30 minutes)

1.1. Run E2E Test Script

✅ Phase 2: Manual Proposal Workflow (60 minutes)

2.1. Create Test Proposal

2.2. List Proposals

2.3. View Proposal Diff

2.4. Launch YAZE GUI

2.5. Test ProposalDrawer UI

2.6. Test Accept Workflow

2.7. Test Reject Workflow

2.8. Test Delete Workflow

✅ Phase 3: Real Widget Testing (60 minutes)

3.1. Start Test Harness

3.2. Test Overworld Editor Workflow

3.3. Test Dungeon Editor Workflow

3.4. Test CLI Agent Test Command

✅ Phase 4: Documentation Updates (30 minutes)

4.1. Update IT-01-QUICKSTART.md

4.2. Update E6-z3ed-implementation-plan.md

4.3. Update README.md

Success Criteria Summary

Must Pass (Critical)

Should Pass (Important)

Nice to Have (Optional)

Known Issues & Limitations

Current Limitations

Workarounds

Next Steps

Reporting Issues

15 KiB

Raw Blame History