feat: Add GUI automation client and test workflow generator

- Implemented GuiAutomationClient for gRPC communication with the test harness.
- Added methods for various GUI actions: Click, Type, Wait, Assert, and Screenshot.
- Created TestWorkflowGenerator to convert natural language prompts into structured test workflows.
- Enhanced HandleTestCommand to support new command-line arguments for GUI automation.
- Updated CMakeLists.txt to include new source files for GUI automation and workflow generation.
This commit is contained in:
scawful
2025-10-02 01:01:19 -04:00
parent 286efdec6a
commit 0465d07a55
11 changed files with 2585 additions and 85 deletions

View File

@@ -0,0 +1,613 @@
# End-to-End Workflow Validation Guide
**Created**: October 2, 2025
**Status**: Priority 1 - Ready to Execute
**Time Estimate**: 2-3 hours
## Overview
This guide provides a comprehensive checklist for validating the complete z3ed agent workflow from proposal creation through ROM commit. This is the final validation step before declaring the agentic workflow system operational.
## Prerequisites
### Build Requirements
```bash
# Build z3ed CLI
cmake --build build --target z3ed -j8
# Build YAZE with gRPC support
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
# Verify grpcurl is installed
brew install grpcurl
```
### Test Assets
- ROM file: `assets/zelda3.sfc` (required)
- Empty workspace for proposals: `/tmp/yaze/` (auto-created)
## Validation Checklist
### ✅ Phase 1: Automated Test Script (30 minutes)
#### 1.1. Run E2E Test Script
```bash
./scripts/test_harness_e2e.sh
```
**Expected Output**:
```
=== ImGuiTestHarness E2E Test ===
Starting YAZE with test harness...
YAZE PID: 12345
Waiting for server to start...
✓ Server started successfully
=== Running RPC Tests ===
Test 1: Ping (Health Check)
✓ PASSED
Test 2: Click (Button)
✓ PASSED
Test 3: Type (Text Input)
✓ PASSED
Test 4: Wait (Window Visible)
✓ PASSED
Test 5: Assert (Window Visible)
✓ PASSED
Test 6: Screenshot (Not Implemented)
✓ PASSED
=== Test Summary ===
Tests Run: 6
Tests Passed: 6
Tests Failed: 0
All tests passed!
```
**Success Criteria**:
- [ ] All 6 tests pass
- [ ] No connection errors
- [ ] No port conflicts
- [ ] Server starts and stops cleanly
**Troubleshooting**:
- If port in use: `killall yaze && sleep 2`
- If grpcurl missing: `brew install grpcurl`
- If binary not found: Check `build-grpc-test/bin/` directory
---
### ✅ Phase 2: Manual Proposal Workflow (60 minutes)
#### 2.1. Create Test Proposal
```bash
# Create a proposal via CLI
./build/bin/z3ed agent run \
--rom=assets/zelda3.sfc \
--prompt "Test proposal for E2E validation" \
--sandbox
# Expected output:
# ✅ Agent run completed successfully.
# Proposal ID: <UUID>
# Sandbox: /tmp/yaze/sandboxes/<UUID>/zelda3.sfc
# Use 'z3ed agent diff' to review changes
```
**Verification Steps**:
1. [ ] Command completes without error
2. [ ] Proposal ID is displayed
3. [ ] Sandbox ROM file exists at shown path
4. [ ] No crashes or hangs
#### 2.2. List Proposals
```bash
./build/bin/z3ed agent list
# Expected output:
# === Agent Proposals ===
#
# ID: <UUID>
# Status: Pending
# Created: <timestamp>
# Prompt: Test proposal for E2E validation
# Commands: 0
# Bytes Changed: 0
#
# Total: 1 proposal(s)
```
**Verification Steps**:
1. [ ] Proposal appears in list
2. [ ] Status shows "Pending"
3. [ ] All metadata fields populated
4. [ ] Prompt matches input
#### 2.3. View Proposal Diff
```bash
./build/bin/z3ed agent diff
# Expected output:
# === Proposal Diff ===
# Proposal ID: <UUID>
# Sandbox ID: <UUID>
# Prompt: Test proposal for E2E validation
# Description: Agent-generated ROM modifications
# Status: Pending
# Created: <timestamp>
# Commands Executed: 0
# Bytes Changed: 0
#
# --- Diff Content ---
# (No changes yet for mock implementation)
#
# --- Execution Log ---
# Starting agent run with prompt: Test proposal for E2E validation
# Generated 0 commands
# Completed execution of 0 commands
#
# === Next Steps ===
# To accept changes: z3ed agent commit
# To reject changes: z3ed agent revert
# To review in GUI: yaze --proposal=<UUID>
```
**Verification Steps**:
1. [ ] Diff displays correctly
2. [ ] Execution log shows all steps
3. [ ] Metadata matches proposal
4. [ ] No errors reading files
#### 2.4. Launch YAZE GUI
```bash
# Start YAZE normally (not test harness mode)
./build/bin/yaze.app/Contents/MacOS/yaze
# Navigate to: Debug → Agent Proposals
```
**Verification Steps**:
1. [ ] YAZE launches without crashes
2. [ ] "Agent Proposals" menu item exists
3. [ ] ProposalDrawer opens when clicked
4. [ ] Drawer appears on right side (400px width)
#### 2.5. Test ProposalDrawer UI
**List View Verification**:
1. [ ] Proposal appears in list
2. [ ] Status badge shows "Pending" in yellow
3. [ ] Prompt text is visible
4. [ ] Created timestamp displayed
5. [ ] Click proposal to open detail view
**Detail View Verification**:
1. [ ] All metadata displayed correctly
2. [ ] Execution log visible and scrollable
3. [ ] Diff section shows (empty for mock)
4. [ ] Accept/Reject/Delete buttons visible
5. [ ] Back button returns to list
**Filtering Verification**:
1. [ ] "All" filter shows proposal
2. [ ] "Pending" filter shows proposal
3. [ ] "Accepted" filter hides proposal (not accepted yet)
4. [ ] "Rejected" filter hides proposal (not rejected yet)
**Refresh Verification**:
1. [ ] Click "Refresh" button
2. [ ] Proposal count updates if needed
3. [ ] No crashes or errors
#### 2.6. Test Accept Workflow
**Steps**:
1. Select proposal in list view
2. Open detail view
3. Click "Accept" button
4. Confirm in dialog (if shown)
5. Wait for processing
**Verification**:
1. [ ] Accept button triggers action
2. [ ] Status changes to "Accepted"
3. [ ] Status badge turns green
4. [ ] ROM data merged successfully (check logs)
5. [ ] Sandbox ROM remains unchanged
6. [ ] No crashes during merge
**Post-Accept Checks**:
```bash
# Verify proposal status persists
./build/bin/z3ed agent list
# Should show Status: Accepted
# Verify ROM was modified (if changes were made)
# For mock implementation, this will be no-op
```
#### 2.7. Test Reject Workflow
**Create another proposal**:
```bash
./build/bin/z3ed agent run \
--rom=assets/zelda3.sfc \
--prompt "Proposal to reject" \
--sandbox
```
**Steps**:
1. Open ProposalDrawer in YAZE
2. Select new proposal
3. Click "Reject" button
4. Confirm in dialog (if shown)
**Verification**:
1. [ ] Reject button triggers action
2. [ ] Status changes to "Rejected"
3. [ ] Status badge turns red
4. [ ] ROM remains unchanged
5. [ ] Sandbox ROM unchanged
6. [ ] No crashes
#### 2.8. Test Delete Workflow
**Create another proposal**:
```bash
./build/bin/z3ed agent run \
--rom=assets/zelda3.sfc \
--prompt "Proposal to delete" \
--sandbox
```
**Steps**:
1. Open ProposalDrawer in YAZE
2. Select new proposal
3. Click "Delete" button
4. Confirm in dialog
**Verification**:
1. [ ] Delete button triggers action
2. [ ] Proposal removed from list
3. [ ] Files cleaned up from disk
4. [ ] No crashes
**File Cleanup Check**:
```bash
# Verify proposal directory was removed
ls /tmp/yaze/proposals/
# Should NOT show deleted proposal ID
# Verify sandbox was removed
ls /tmp/yaze/sandboxes/
# Should NOT show deleted sandbox ID
```
---
### ✅ Phase 3: Real Widget Testing (60 minutes)
#### 3.1. Start Test Harness
```bash
# Terminal 1: Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# Wait for startup
sleep 3
# Verify server is listening
lsof -i :50052
# Should show yaze process
```
#### 3.2. Test Overworld Editor Workflow
```bash
# Terminal 2: Run automation commands
# Click Overworld button
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"button:Overworld","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Wait for window to appear
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
# Assert window is visible
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"condition":"visible:Overworld Editor"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
```
**Verification**:
1. [ ] Click RPC succeeds
2. [ ] Overworld Editor window opens in YAZE
3. [ ] Wait RPC succeeds (condition met)
4. [ ] Assert RPC succeeds (window visible)
5. [ ] No timeouts or errors
#### 3.3. Test Dungeon Editor Workflow
```bash
# Click Dungeon button
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"button:Dungeon","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Wait for window
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"condition":"window_visible:Dungeon Editor","timeout_ms":5000}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait
# Assert visible
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"condition":"visible:Dungeon Editor"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
```
**Verification**:
1. [ ] Click RPC succeeds
2. [ ] Dungeon Editor window opens
3. [ ] Wait RPC succeeds
4. [ ] Assert RPC succeeds
5. [ ] No errors
#### 3.4. Test CLI Agent Test Command
```bash
# Build z3ed with gRPC support first
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
cmake --build build-grpc-test --target z3ed -j8
# Test simple open editor command
./build-grpc-test/bin/z3ed agent test \
--prompt "Open Overworld editor"
# Expected output:
# === GUI Automation Test ===
# Prompt: Open Overworld editor
# Server: localhost:50052
#
# Generated workflow:
# Workflow: Open Overworld Editor
# 1. Click(button:Overworld)
# 2. Wait(window_visible:Overworld Editor, 5000ms)
#
# ✓ Connected to test harness
#
# [1/2] Click(button:Overworld) ... ✓ (125ms)
# [2/2] Wait(window_visible:Overworld Editor, 5000ms) ... ✓ (1250ms)
#
# ✅ Test passed in 1375ms
```
**Verification**:
1. [ ] Command parses prompt correctly
2. [ ] Workflow generation succeeds
3. [ ] Connection to test harness succeeds
4. [ ] All steps execute successfully
5. [ ] Timing information displayed
6. [ ] Exit code is 0
**Test Additional Prompts**:
```bash
# Open and verify
./build-grpc-test/bin/z3ed agent test \
--prompt "Open Dungeon editor and verify it loads"
# Click button
./build-grpc-test/bin/z3ed agent test \
--prompt "Click Overworld button"
```
**Verification for Each**:
1. [ ] Prompt recognized
2. [ ] Workflow generated correctly
3. [ ] All steps pass
4. [ ] No crashes or errors
---
### ✅ Phase 4: Documentation Updates (30 minutes)
#### 4.1. Update IT-01-QUICKSTART.md
Add section on CLI agent test command:
```markdown
## CLI Agent Test Command
You can now automate GUI testing with natural language prompts:
\`\`\`bash
# Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# Run automated test
./build-grpc-test/bin/z3ed agent test \
--prompt "Open Overworld editor and verify it loads"
\`\`\`
### Supported Prompt Patterns
1. **Open Editor**: "Open Overworld editor"
2. **Open and Verify**: "Open Dungeon editor and verify it loads"
3. **Click Button**: "Click Open ROM button"
4. **Type Input**: "Type 'zelda3.sfc' in filename input"
```
**Tasks**:
1. [ ] Add CLI agent test section
2. [ ] Document supported prompts
3. [ ] Add troubleshooting tips
4. [ ] Update examples
#### 4.2. Update E6-z3ed-implementation-plan.md
Mark Priority 1 complete:
```markdown
### Priority 1: End-to-End Workflow Validation ✅ COMPLETE
**Completion Date**: October 2, 2025
**Time Spent**: 3 hours
**Status**: All validation checks passed
**Completed Tasks**:
1. ✅ E2E test script validation
2. ✅ Manual proposal workflow testing
3. ✅ Real widget automation testing
4. ✅ CLI agent test command implementation
5. ✅ Documentation updates
**Key Findings**:
- All systems working as expected
- No critical issues identified
- Performance acceptable (< 2s per step)
- Ready for production use
**Next Priority**: IT-02 (CLI Agent Test Command - already implemented!)
```
**Tasks**:
1. [ ] Mark Priority 1 complete
2. [ ] Document completion details
3. [ ] List any issues found
4. [ ] Update status summary
#### 4.3. Update README.md
Update current status:
```markdown
### ✅ Priority 1: End-to-End Workflow Validation (COMPLETE)
**Goal**: Validated complete proposal lifecycle with real GUI and widgets
**Time Invested**: 3 hours
**Status**: All checks passed
### ✅ Priority 2: CLI Agent Test Command (COMPLETE)
**Goal**: Natural language prompt → automated GUI test workflow
**Time Invested**: 2 hours (implemented alongside Priority 1)
**Status**: Fully operational
**Implementation**:
- GuiAutomationClient: gRPC wrapper for CLI usage
- TestWorkflowGenerator: Natural language prompt parsing
- `z3ed agent test` command: End-to-end automation
**See**: [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md) for usage examples
```
**Tasks**:
1. [ ] Update completion status
2. [ ] Add implementation details
3. [ ] Update quick start guide
4. [ ] Add examples
---
## Success Criteria Summary
### Must Pass (Critical)
- [ ] E2E test script: All 6 tests pass
- [ ] Proposal creation: Works without errors
- [ ] ProposalDrawer: Opens and displays proposals
- [ ] Accept workflow: ROM merging works correctly
- [ ] GUI automation: Real widgets respond to RPCs
- [ ] CLI agent test: At least 3 prompts work
### Should Pass (Important)
- [ ] Reject workflow: Status updates correctly
- [ ] Delete workflow: Files cleaned up
- [ ] Cross-session persistence: Proposals survive restart
- [ ] Error handling: Helpful messages on failure
- [ ] Performance: < 5s per automation step
### Nice to Have (Optional)
- [ ] Screenshots: Capture and save images
- [ ] Policy evaluation: Basic constraint checking
- [ ] Telemetry: Usage metrics collected
---
## Known Issues & Limitations
### Current Limitations
1. **MockAIService**: Not using real LLM (placeholder commands)
2. **Screenshot**: Not yet implemented (returns stub)
3. **Policy Evaluation**: Not yet implemented (AW-04)
4. **Windows Support**: Test harness not available on Windows
### Workarounds
1. Mock service sufficient for testing infrastructure
2. Screenshot can be added later (non-blocking)
3. Policy framework is Priority 3
4. Windows users can use manual testing
---
## Next Steps
After completing this validation:
1. **Mark Priority 1 Complete**: Update all documentation
2. **Mark Priority 2 Complete**: CLI agent test implemented
3. **Begin Priority 3**: Policy Evaluation Framework (AW-04)
4. **Production Deployment**: System ready for real usage
---
## Reporting Issues
If any validation step fails, document:
1. **What failed**: Specific step/command
2. **Error message**: Full output or screenshot
3. **Environment**: OS, build config, ROM file
4. **Reproduction**: Steps to reproduce
5. **Workaround**: Any temporary fixes found
Report issues in: `docs/z3ed/VALIDATION_ISSUES.md`
---
**Last Updated**: October 2, 2025
**Contributors**: @scawful, GitHub Copilot
**License**: Same as YAZE (see ../../LICENSE)