From 3a573c076434cea0015cc3e0c8e21cd03aed5185 Mon Sep 17 00:00:00 2001
From: scawful
Date: Thu, 2 Oct 2025 15:00:34 -0400
Subject: [PATCH] doc: Plan test harness with introspection capabilities (IT-05)

---
 docs/z3ed/E6-z3ed-cli-design.md          | 265 ++++++++-
 docs/z3ed/E6-z3ed-implementation-plan.md | 461 +++++++++++----
 docs/z3ed/E6-z3ed-reference.md           | 169 +++++-
 docs/z3ed/IT-05-IMPLEMENTATION-GUIDE.md  | 722 +++++++++++++++++++++++
 docs/z3ed/README.md                      |  53 +-
 5 files changed, 1552 insertions(+), 118 deletions(-)
 create mode 100644 docs/z3ed/IT-05-IMPLEMENTATION-GUIDE.md

diff --git a/docs/z3ed/E6-z3ed-cli-design.md b/docs/z3ed/E6-z3ed-cli-design.md
index 12de6e42..56af8fbc 100644
--- a/docs/z3ed/E6-z3ed-cli-design.md
+++ b/docs/z3ed/E6-z3ed-cli-design.md
@@ -24,7 +24,12 @@ This document is the **source of truth** for the z3ed CLI architecture and desig
 - **Proposal Registry**: Cross-session proposal tracking with disk persistence

 **🔄 In Progress**:
-- **E2E Validation**: Testing complete workflow (80% done, window detection needs fix)
+- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform
+  - Test introspection APIs for status/results polling
+  - Widget discovery for AI-driven interactions
+  - Test recording/replay for regression testing
+  - Enhanced error reporting with screenshots
+  - CI/CD integration with standardized test formats

 **📋 Planned Next**:
 - **Policy Evaluation Framework (AW-04)**: YAML-based constraints for proposal acceptance
@@ -51,6 +56,13 @@ The z3ed CLI is built on three core pillars:

 **gRPC Test Harness**: Embedded gRPC server in YAZE enables remote GUI automation for testing and AI-driven workflows.
+**Comprehensive Testing Platform**: Test harness evolved beyond basic automation to support:
+- **Widget Discovery**: AI agents can enumerate available GUI interactions dynamically
+- **Test Introspection**: Query test status, results, and execution queue in real-time
+- **Recording & Replay**: Capture test sessions as JSON scripts for regression testing
+- **CI/CD Integration**: Standardized test suite format with JUnit XML output
+- **Enhanced Debugging**: Screenshot capture, widget state dumps, and execution context on failures
+
 **Cross-Platform Foundation**: Core built for macOS/Linux with Windows support planned via vcpkg.

 ## 3. Proposed CLI Architecture: Resource-Oriented Commands
@@ -203,6 +215,257 @@ The `z3ed agent` command is the main entry point for the agent. It has the follo
 ### 8.3. AI Model & Protocol Strategy

 - **Models**: The framework will support both local and remote AI models, offering flexibility and catering to different user needs.
+
+---
+
+## 9. Test Harness Evolution: From Automation to Platform
+
+The ImGuiTestHarness has evolved from a basic GUI automation tool into a comprehensive testing platform that serves dual purposes: **AI-driven generative workflows** and **traditional GUI testing**.
+
+### 9.1. Current Capabilities (IT-01 to IT-04) ✅
+
+**Core Automation** (6 RPCs):
+- `Ping` - Health check and version verification
+- `Click` - Button, menu, and tab interactions
+- `Type` - Text input with focus management
+- `Wait` - Condition polling (window visibility, element state)
+- `Assert` - State validation (visible, enabled, exists)
+- `Screenshot` - Capture (stub, needs implementation)
+
+**Integration Points**:
+- ImGuiTestEngine dynamic test registration
+- Async test queue with frame-accurate timing
+- gRPC server embedded in YAZE process
+- Cross-platform build (macOS validated, Windows planned)
+
+**Proven Use Cases**:
+- Menu-driven editor opening (Overworld, Dungeon, etc.)
+- Window visibility validation
+- Multi-step workflows with timing dependencies
+- Natural language test prompts via `z3ed agent test`
+
+### 9.2. Limitations Identified
+
+**For AI Agents**:
+- ❌ Can't discover available widgets → must hardcode target names
+- ❌ No way to query test results → async tests return immediately with no status
+- ❌ No structured error context → failures lack screenshots and state dumps
+- ❌ Limited to predefined actions → can't learn new interaction patterns
+
+**For Traditional Testing**:
+- ❌ No test recording → can't capture manual workflows for regression
+- ❌ No test suite format → can't organize tests into smoke/regression/nightly groups
+- ❌ No CI integration → can't run tests in automated pipelines
+- ❌ No result persistence → test history lost between sessions
+- ❌ Poor debugging → failures don't capture visual or state context
+
+### 9.3. Enhancement Roadmap (IT-05 to IT-09)
+
+#### IT-05: Test Introspection API (6-8 hours)
+**Problem**: Tests execute asynchronously with no way to query status or results. Clients poll blindly or give up early.
+
+**Solution**: Add 3 new RPCs:
+- `GetTestStatus(test_id)` → Returns queued/running/passed/failed/timeout with execution time
+- `ListTests(category_filter)` → Enumerates all registered tests with metadata
+- `GetTestResults(test_id)` → Retrieves detailed results: logs, assertions, metrics
+
+**Benefits**:
+- AI agents can poll for test completion reliably
+- CLI can show real-time progress bars
+- Test history enables trend analysis (flaky tests, performance regressions)
+
+**Example Flow**:
+```bash
+# Queue test (returns immediately with test_id)
+TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id')
+
+# Poll until complete
+while true; do
+  STATUS=$(z3ed agent test status --test-id $TEST_ID --format json | jq -r '.status')
+  [[ "$STATUS" =~ ^(PASSED|FAILED|TIMEOUT)$ ]] && break
+  sleep 0.5
+done
+
+# Get results
+z3ed agent test results --test-id $TEST_ID --include-logs
+```
+
+#### IT-06: Widget Discovery API (4-6 hours)
+**Problem**: AI agents must know widget names in advance. Can't adapt to UI changes or learn new editors.
+
+**Solution**: Add `DiscoverWidgets` RPC:
+- Enumerates all windows currently open
+- Lists interactive widgets per window: buttons, inputs, menus, tabs
+- Returns metadata: ID, label, type, enabled state, position
+- Provides suggested action templates (e.g., "Click button:Save")
+
+**Benefits**:
+- AI agents discover GUI capabilities dynamically
+- Test scripts validate expected widgets exist
+- LLM prompts improved with natural language descriptions
+- Reduces brittleness from hardcoded widget names
+
+**Example Flow**:
+```python
+# AI agent workflow
+widgets = z3ed_client.DiscoverWidgets(window_filter="Overworld")
+
+# LLM prompt: "Which buttons are available in the Overworld editor?"
+available_actions = [w.suggested_action for w in widgets.buttons if w.is_enabled]
+
+# LLM generates: "Click button:Save Changes"
+z3ed_client.Click(target="button:Save Changes")
+```
+
+#### IT-07: Test Recording & Replay (8-10 hours)
+**Problem**: No way to capture manual workflows for regression. Testers repeat same actions every release.
+
+**Solution**: Add recording workflow:
+- `StartRecording(output_file)` → Begins capturing all RPC calls
+- `StopRecording()` → Saves to JSON test script
+- `ReplayTest(test_script)` → Executes recorded actions with validation
+
+**Test Script Format** (JSON):
+```json
+{
+  "name": "Overworld Tile Edit Test",
+  "steps": [
+    { "action": "Click", "target": "menuitem: Overworld Editor" },
+    { "action": "Wait", "condition": "window_visible:Overworld", "timeout_ms": 5000 },
+    { "action": "Click", "target": "button:Select Tile" },
+    { "action": "Assert", "condition": "enabled:button:Apply" }
+  ]
+}
+```
+
+**Benefits**:
+- QA engineers record test scenarios once, replay forever
+- Test scripts version controlled alongside code
+- Parameterized tests (e.g., test with different ROMs)
+- Foundation for test suite management (smoke, regression, nightly)
+
+#### IT-08: Enhanced Error Reporting (3-4 hours)
+**Problem**: Test failures lack context. Developer sees "Window not visible" but doesn't know why.
+
+**Solution**: Capture rich context on failure:
+- Screenshot (implement stub RPC)
+- Widget state dump (full hierarchy with properties)
+- Execution context (active window, recent events, resource stats)
+- HTML report generation with annotated screenshots
+
+**Example Error Report**:
+```json
+{
+  "test_id": "grpc_wait_12345678",
+  "failure_reason": "Timeout waiting for window_visible:Overworld",
+  "screenshot": "test-results/failure_12345678.png",
+  "widget_state": {
+    "visible_windows": ["Main Window", "Debug"],
+    "overworld_window": { "exists": true, "visible": false, "reason": "not_initialized" }
+  },
+  "execution_context": {
+    "last_click": "menuitem: Overworld Editor",
+    "frames_since_click": 150,
+    "resource_stats": { "memory_mb": 245, "framerate": 58.3 }
+  }
+}
+```
+
+**Benefits**:
+- Developers fix failing tests faster (visual + state context)
+- Flaky test debugging (see exact UI state at failure)
+- Test reports shareable with QA/PM (HTML with screenshots)
+
+#### IT-09: CI/CD Integration (2-3 hours)
+**Problem**: Tests run manually. No automated regression on PR/merge.
+
+**Solution**: Standardize test execution for CI:
+- YAML test suite format (groups, dependencies, parallel execution)
+- `z3ed test suite run` command with `--ci-mode`
+- JUnit XML output for CI parsers (Jenkins, GitHub Actions)
+- Exit codes: 0=pass, 1=fail, 2=error
+
+**GitHub Actions Example**:
+```yaml
+name: GUI Tests
+on: [push, pull_request]
+jobs:
+  gui-tests:
+    runs-on: macos-latest
+    steps:
+      - name: Build YAZE
+        run: cmake --build build --target yaze --target z3ed
+      - name: Start test harness
+        run: ./build/bin/yaze --enable_test_harness --headless &
+      - name: Run smoke tests
+        run: ./build/bin/z3ed test suite run tests/smoke.yaml --ci-mode
+      - name: Upload results
+        uses: actions/upload-artifact@v2
+        with:
+          name: test-results
+          path: test-results/
+```
+
+**Benefits**:
+- Catch regressions before merge
+- Test history tracked in CI dashboard
+- Parallel execution for faster feedback
+- Flaky test detection (retry logic, failure rates)
+
+### 9.4. Unified Testing Vision
+
+The enhanced test harness serves three audiences:
+
+**For AI Agents** (Generative Workflows):
+- Widget discovery enables dynamic learning
+- Test introspection provides reliable feedback loops
+- Recording captures expert workflows for training data
+
+**For Developers** (Unit/Integration Testing):
+- Test suites organize tests by scope (smoke, regression, nightly)
+- CI integration catches regressions early
+- Rich error reporting speeds up debugging
+
+**For QA Engineers** (Manual Testing Automation):
+- Record manual workflows once, replay forever
+- Parameterized tests reduce maintenance burden
+- Visual test reports simplify communication
+
+**Shared Infrastructure**:
+- Single gRPC server handles all test types
+- Consistent test script format (JSON/YAML)
+- Common result storage and reporting
+- Cross-platform support (macOS, Windows, Linux)
+
+### 9.5.
Implementation Priority
+
+**Phase 1: Foundation** (Already Complete ✅)
+- Core automation RPCs (Ping, Click, Type, Wait, Assert)
+- ImGuiTestEngine integration
+- gRPC server lifecycle
+- Basic E2E validation
+
+**Phase 2: Introspection & Discovery** (IT-05, IT-06 - 10-14 hours)
+- Test status/results querying
+- Widget enumeration API
+- Async test management
+- *Critical for AI agents*
+
+**Phase 3: Recording & Replay** (IT-07 - 8-10 hours)
+- Test script format
+- Recording workflow
+- Replay engine
+- *Unlocks regression testing*
+
+**Phase 4: Production Readiness** (IT-08, IT-09 - 5-7 hours)
+- Screenshot implementation
+- Error context capture
+- CI/CD integration
+- *Enables automated pipelines*
+
+**Total Estimated Effort**: 23-31 hours beyond current implementation
+
+---
 - **Local Models (macOS Setup)**: For privacy, offline use, and reduced operational costs, integration with local LLMs via [Ollama](https://ollama.ai/) is a priority. Users can easily install Ollama on macOS and pull models optimized for code generation, such as `codellama:7b`. The `z3ed` agent will communicate with Ollama's local API endpoint.
 - **Remote Models (Gemini API)**: For more complex tasks requiring advanced reasoning capabilities, integration with powerful remote models like the Gemini API will be available. Users will need to provide a `GEMINI_API_KEY` environment variable. A new `GeminiAIService` class will be implemented to handle the secure API requests and responses.
 - **Protocol**: A robust, yet simple, JSON-based protocol will be used for communication between `z3ed` and the AI model. This ensures structured data exchange, critical for reliable parsing and execution. The `z3ed` tool will serialize the user's prompt, current ROM context, available `z3ed` commands, and any relevant `ImGuiTestEngine` capabilities into a JSON object.
The AI model will be expected to return a JSON object containing the sequence of commands to be executed, along with potential explanations or confidence scores.
diff --git a/docs/z3ed/E6-z3ed-implementation-plan.md b/docs/z3ed/E6-z3ed-implementation-plan.md
index f484e77d..f5acb1f5 100644
--- a/docs/z3ed/E6-z3ed-implementation-plan.md
+++ b/docs/z3ed/E6-z3ed-implementation-plan.md
@@ -1,9 +1,28 @@
-# z3ed Agentic Wo**Active Phase**:
-- **Policy Evaluation Framework (AW-04)**: YAML-based constraint system for gating proposal acceptance - implementation complete, ready for production testing.
+# z3ed Agentic Workflow Plan
+
+**Last Updated**: October 2, 2025
+**Status**: Core Infrastructure Complete | Test Harness Enhancement Phase 🎯
+
+> 📋 **Quick Start**: See [README.md](README.md) for essential links and project status.
+
+## Executive Summary
+
+The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:
+
+**✅ Completed Phases**:
+- **Phase 6**: Resource Catalogue - Machine-readable API specs for AI consumption
+- **AW-01/02/03**: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
+- **AW-04**: Policy Evaluation Framework - YAML-based constraint system for proposal acceptance
+- **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
+- **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
+
+**🔄 Active Phase**:
+- **Test Harness Enhancements (IT-05 to IT-09)**: Expanding from basic automation to comprehensive testing platform

 **📋 Next Phases**:
-- **Priority 1**: Production Testing - Validate policy enforcement with real ROM modification proposals.
-- **Priority 2**: Windows Cross-Platform Testing - Ensure z3ed works on Windows targets with gRPC integration.
+- **Priority 1**: Test Introspection API (IT-05) - Enable test status querying and result polling
+- **Priority 2**: Widget Discovery API (IT-06) - AI agents enumerate available GUI interactions
+- **Priority 3**: Test Recording & Replay (IT-07) - Capture workflows for regression testing

 **Recent Accomplishments** (Updated: January 2025):
 - **✅ Policy Framework Complete**: PolicyEvaluator service fully integrated with ProposalDrawer GUI
@@ -20,49 +39,17 @@
 - **Build System**: Hardened CMake configuration with reliable gRPC integration
 - **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)

-**Known Limitations** (Non-Blocking):
-- **Screenshot RPC**: Stub implementation (returns "not implemented" - planned for production phase)
-- **Widget Naming**: Documentation needed for icon prefixes and naming conventions
+**Known Limitations & Improvement Opportunities**:
+- **Screenshot RPC**: Stub implementation → needs SDL_Surface capture + PNG encoding
+- **Test Introspection**: No way to query test status, results, or queue → add GetTestStatus/ListTests RPCs
+- **Widget Discovery**: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
+- **Test Recording**: No record/replay for regression testing → add RecordSession/ReplaySession RPCs
+- **Synchronous Wait**: Async tests return immediately → add blocking mode or result polling
+- **Error Context**: Test failures lack screenshots/state dumps → enhance error reporting
 - **Performance**: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)
 - **YAML Parsing**: Simple parser implemented, consider yaml-cpp for complex scenarios

-**Time Investment**: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)on Plan
-
-**Last Updated**: [Current Date]
-**Status**: Core Infrastructure Complete | E2E Validation In Progress 🎯
-
-> 📋 **Quick Start**: See [README.md](README.md) for essential links and project
status.
-
-## Executive Summary
-
-The z3ed CLI and AI agent workflow system has completed major infrastructure milestones:
-
-**✅ Completed Phases**:
-- **Phase 6**: Resource Catalogue - Machine-readable API specs for AI consumption
-- **AW-01/02/03**: Acceptance Workflow - Proposal tracking, sandbox management, GUI review with ROM merging
-- **IT-01**: ImGuiTestHarness - Full GUI automation via gRPC + ImGuiTestEngine (all 3 phases complete)
-- **IT-02**: CLI Agent Test - Natural language → automated GUI testing (implementation complete)
-
-**🔄 Active Phase**:
-- **E2E Validation**: Testing complete proposal lifecycle with real GUI widgets (window detection debugging in progress)
-
-**📋 Next Phases**:
-- **Priority 1**: Complete E2E Validation - Fix window detection after menu actions (2-3 hours)
-- **Priority 2**: Policy Evaluation Framework (AW-04) - YAML-based constraints for proposal acceptance (6-8 hours)
-
-**Recent Accomplishments** (October 2, 2025):
-- IT-02 implementation complete with async test queue pattern
-- Build system fixes for z3ed target (gRPC integration)
-- Documentation consolidated into clean structure
-- E2E test script operational (5/6 RPCs working)
-- Menu interaction verified via ImGuiTestEngine
-
-**Known Issues**:
-- Window detection timing after menu clicks needs refinement
-- Screenshot RPC proto mismatch (non-critical)
-
-**Time Investment**: 20.5 hours total (IT-01: 11h, IT-02: 7.5h, Docs: 2h)
-**Code Quality**: All targets compile cleanly, no crashes, partial test coverage
+**Time Investment**: 28.5 hours total (IT-01: 11h, IT-02: 7.5h, E2E: 2h, Policy: 6h, Docs: 2h)

 ## Quick Reference

@@ -94,83 +81,326 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 ## 1.
Current Priorities (Week of Oct 2-8, 2025)

-**Status**: IT-01 Complete ✅ | IT-02 Complete ✅ | E2E Tests Running ⚡
+**Status**: Core Infrastructure Complete ✅ | Test Harness Enhancement Phase 🔧

-### Priority 0: E2E Test Validation (IMMEDIATE) 🎯
-**Goal**: Validate test harness with real YAZE widgets
-**Time Estimate**: 30-60 minutes
-**Status**: Test script running, needs real widget names
+### Priority 1: Test Harness Enhancements (IT-05 to IT-09) 🔧 ACTIVE
+**Goal**: Transform test harness from basic automation to comprehensive testing platform
+**Time Estimate**: 20-25 hours total
+**Blocking Dependency**: IT-01 Complete ✅

-**Current Results**:
-- ✅ Ping RPC working
-- ⚠️ Tests 2-5 using fake widget names
-- 📋 Need to identify real widget names from YAZE source
-- 🔧 Screenshot RPC needs proto fix
-
-**Task Checklist**:
-1. ✅ **E2E Test Script**: Already created (`scripts/test_harness_e2e.sh`)
-2. 📋 **Manual Testing Workflow**:
-   - Start YAZE with test harness enabled
-   - Create proposal via CLI: `z3ed agent run "Test prompt" --sandbox`
-   - Verify proposal appears in ProposalDrawer GUI
-   - Test Accept → validate ROM merge and save prompt
-   - Test Reject → validate status update
-   - Test Delete → validate cleanup
-3. 📋 **Real Widget Testing**:
-   - Click actual YAZE buttons (Overworld, Dungeon, etc.)
-   - Type into real input fields
-   - Wait for actual windows to appear
-   - Assert on real widget states
-4.
📋 **Document Edge Cases**:
-   - Widget not found scenarios
-   - Timeout handling
-   - Error recovery patterns
-
-### Priority 2: CLI Agent Test Command (IT-02) 📋 NEXT
-**Goal**: Natural language → automated GUI testing via gRPC
-**Time Estimate**: 4-6 hours
-**Blocking Dependency**: Priority 1 completion
+**Motivation**: Current test harness supports basic GUI automation but lacks features for:
+- **AI Agent Development**: No widget discovery API for LLMs to learn available interactions
+- **Regression Testing**: No recording/replay mechanism for test suite management
+- **CI/CD Integration**: No standardized test format for automated pipelines
+- **Debugging**: Limited error context when tests fail (no screenshots, state dumps)
+- **Test Management**: Can't query test status, results, or execution queue

+#### IT-05: Test Introspection API (6-8 hours)
 **Implementation Tasks**:
-1. **Create `z3ed agent test` command**:
-   - Parse natural language prompt
-   - Generate RPC call sequence (Click → Wait → Assert)
-   - Execute via gRPC client
-   - Capture results and screenshots
+1. **Add GetTestStatus RPC**:
+   - Query status of queued/running tests by ID
+   - Return test state: queued, running, passed, failed, timeout
+   - Include execution time, error messages, assertion failures

-2. **Example Usage**:
-   ```bash
-   z3ed agent test --prompt "Open Overworld editor and verify it loads" \
-     --rom zelda3.sfc
+2. **Add ListTests RPC**:
+   - Enumerate all registered tests in ImGuiTestEngine
+   - Filter by category (grpc, unit, integration, e2e)
+   - Return test metadata: name, category, last run time, pass/fail count

-   # Generated workflow:
-   # 1. Click "button:Overworld"
-   # 2. Wait "window_visible:Overworld Editor" (5s)
-   # 3. Assert "visible:Overworld Editor"
-   # 4. Screenshot "full"
-   ```
+3.
**Add GetTestResults RPC**: + - Retrieve detailed results for completed tests + - Include assertion logs, performance metrics, resource usage + - Support pagination for large result sets -3. **Implementation Files**: - - `src/cli/handlers/agent.cc` - Add `HandleTestCommand()` - - `src/cli/service/gui_automation_client.{h,cc}` - gRPC client wrapper - - `src/cli/service/test_workflow_generator.{h,cc}` - Prompt โ†’ RPC translator +**Example Usage**: +```bash +# Queue a test +z3ed agent test --prompt "Open Overworld editor" -### Priority 3: Policy Evaluation Framework (AW-04) ๐Ÿ“‹ -**Goal**: YAML-based constraint system for gating proposal acceptance -**Time Estimate**: 6-8 hours -**Blocking Dependency**: None (can work in parallel) +# Poll for completion +z3ed test status --test-id grpc_click_12345678 -> ๏ฟฝ **Detailed Guides**: See [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for complete implementation breakdowns with code examples. +# Retrieve results +z3ed test results --test-id grpc_click_12345678 --format json +``` ---- +**API Schema**: +```proto +message GetTestStatusRequest { + string test_id = 1; +} -## 2. Workstreams Overview +message GetTestStatusResponse { + enum Status { QUEUED = 0; RUNNING = 1; PASSED = 2; FAILED = 3; TIMEOUT = 4; } + Status status = 1; + int64 execution_time_ms = 2; + string error_message = 3; + repeated string assertion_failures = 4; +} + +message ListTestsRequest { + string category_filter = 1; // Optional: "grpc", "unit", etc. + int32 page_size = 2; + string page_token = 3; +} + +message ListTestsResponse { + repeated TestInfo tests = 1; + string next_page_token = 2; +} + +message TestInfo { + string test_id = 1; + string name = 2; + string category = 3; + int64 last_run_timestamp_ms = 4; + int32 total_runs = 5; + int32 pass_count = 6; + int32 fail_count = 7; +} +``` + +#### IT-06: Widget Discovery API (4-6 hours) +**Implementation Tasks**: +1. 
**Add DiscoverWidgets RPC**:
+   - Enumerate all windows currently open in YAZE GUI
+   - List all interactive widgets (buttons, inputs, menus, tabs) per window
+   - Return widget metadata: ID, type, label, enabled state, position
+   - Support filtering by window name or widget type
+
+2. **AI-Friendly Output Format**:
+   - JSON schema describing available interactions
+   - Natural language descriptions for each widget
+   - Suggested action templates (e.g., "Click button:{label}")
+
+**Example Usage**:
+```bash
+# Discover all widgets
+z3ed gui discover
+
+# Filter by window
+z3ed gui discover --window "Overworld"
+
+# Get only buttons
+z3ed gui discover --type button
+```
+
+**API Schema**:
+```proto
+message DiscoverWidgetsRequest {
+  string window_filter = 1;  // Optional: filter by window name
+  enum WidgetType { ALL = 0; BUTTON = 1; INPUT = 2; MENU = 3; TAB = 4; CHECKBOX = 5; }
+  WidgetType type_filter = 2;
+}
+
+message DiscoverWidgetsResponse {
+  repeated WindowInfo windows = 1;
+}
+
+message WindowInfo {
+  string name = 1;
+  bool is_visible = 2;
+  repeated WidgetInfo widgets = 3;
+}
+
+message WidgetInfo {
+  string id = 1;
+  string label = 2;
+  string type = 3;  // "button", "input", "menu", etc.
+  bool is_enabled = 4;
+  string position = 5;  // "x,y,width,height"
+  string suggested_action = 6;  // "Click button:Open ROM"
+}
+```
+
+**Benefits for AI Agents**:
+- LLMs can dynamically learn available GUI interactions
+- Agents can adapt to UI changes without hardcoded widget names
+- Natural language descriptions enable better prompt engineering
+
+#### IT-07: Test Recording & Replay (8-10 hours)
+**Implementation Tasks**:
+1. **Add StartRecording/StopRecording RPCs**:
+   - Capture all RPC calls during a session
+   - Record timing, parameters, and results
+   - Save to JSON test script format
+
+2.
**Add ReplayTest RPC**:
+   - Load JSON test script
+   - Execute recorded actions sequentially
+   - Validate expected results match actual results
+   - Support parameterization (e.g., replace ROM filename)
+
+3. **Test Script Format**:
+   - Human-readable JSON with comments
+   - Support assertions and conditionals
+   - Enable test suite composition (call other scripts)
+
+**Example Workflow**:
+```bash
+# Start recording
+z3ed test record start --output overworld_test.json
+
+# Perform actions (manually or via agent)
+z3ed agent test --prompt "Open Overworld editor"
+z3ed agent test --prompt "Click tile at 10,20"
+
+# Stop recording
+z3ed test record stop
+
+# Replay test
+z3ed test replay overworld_test.json
+
+# Run in CI
+z3ed test replay tests/*.json --ci-mode
+```
+
+**JSON Test Script Example**:
+```json
+{
+  "name": "Overworld Editor Load Test",
+  "description": "Verify Overworld editor opens and tile selection works",
+  "steps": [
+    {
+      "action": "Click",
+      "target": "menuitem: Overworld Editor",
+      "expected_result": { "success": true }
+    },
+    {
+      "action": "Wait",
+      "condition": "window_visible:Overworld",
+      "timeout_ms": 5000
+    },
+    {
+      "action": "Assert",
+      "condition": "visible:Overworld",
+      "expected": { "success": true, "actual_value": "visible" }
+    }
+  ]
+}
+```
+
+#### IT-08: Enhanced Error Reporting (3-4 hours)
+**Implementation Tasks**:
+1. **Screenshot on Failure**:
+   - Implement Screenshot RPC (complete stub)
+   - Automatically capture screenshot when test fails
+   - Save to proposal directory or test results folder
+
+2. **Widget State Dumps**:
+   - Capture full widget tree on assertion failure
+   - Include widget properties (enabled, visible, position, text)
+   - Generate HTML report with annotated screenshots
+
+3.
**Execution Context**:
+   - Log ImGui state: active window, focused widget, frame count
+   - Capture recent ImGui events (clicks, key presses, hovers)
+   - Include resource stats: memory, textures, framerate
+
+**Error Report Example**:
+```json
+{
+  "test_id": "grpc_assert_12345678",
+  "failure_time": "2025-10-02T14:23:45Z",
+  "assertion": "visible:Overworld",
+  "expected": "visible",
+  "actual": "hidden",
+  "screenshot": "/tmp/yaze_test_12345678.png",
+  "widget_state": {
+    "active_window": "Main Window",
+    "focused_widget": null,
+    "visible_windows": ["Main Window", "Debug"],
+    "overworld_window": { "exists": true, "visible": false, "position": "0,0,0,0" }
+  },
+  "execution_context": {
+    "frame_count": 1234,
+    "recent_events": ["Click: menuitem: Overworld Editor", "Wait: window_visible:Overworld"],
+    "resource_stats": { "memory_mb": 245, "textures": 12, "framerate": 60.0 }
+  }
+}
+```
+
+#### IT-09: CI/CD Integration (2-3 hours)
+**Implementation Tasks**:
+1. **Standardized Test Suite Format**:
+   - YAML/JSON format for test suite definitions
+   - Support test groups (smoke, regression, nightly)
+   - Enable parallel execution with dependencies
+
+2. **CI-Friendly CLI**:
+   - `z3ed test run-suite tests/suite.yaml --ci-mode`
+   - Exit codes: 0 = all passed, 1 = failures, 2 = errors
+   - JUnit XML output for CI parsers
+   - GitHub Actions integration examples
+
+3.
**Documentation**:
+   - Add `.github/workflows/gui-tests.yml` example
+   - Create sample test suites for common scenarios
+   - Document best practices for flaky test handling
+
+**Test Suite Format**:
+```yaml
+name: YAZE GUI Test Suite
+description: Comprehensive tests for YAZE editor functionality
+version: 1.0
+
+config:
+  timeout_per_test: 30s
+  retry_on_failure: 2
+  parallel_execution: false
+
+test_groups:
+  - name: smoke
+    description: Fast tests for basic functionality
+    tests:
+      - tests/overworld_load.json
+      - tests/dungeon_load.json
+
+  - name: regression
+    description: Full test suite for release validation
+    depends_on: [smoke]
+    tests:
+      - tests/palette_edit.json
+      - tests/sprite_load.json
+      - tests/rom_save.json
+```
+
+**GitHub Actions Integration**:
+```yaml
+name: GUI Tests
+on: [push, pull_request]
+
+jobs:
+  gui-tests:
+    runs-on: macos-latest
+    steps:
+      - uses: actions/checkout@v2
+      - name: Build YAZE with test harness
+        run: |
+          cmake -B build -DYAZE_WITH_GRPC=ON
+          cmake --build build --target yaze --target z3ed
+      - name: Start test harness
+        run: |
+          ./build/bin/yaze --enable_test_harness --headless &
+          sleep 5
+      - name: Run test suite
+        run: |
+          ./build/bin/z3ed test run-suite tests/suite.yaml --ci-mode
+      - name: Upload test results
+        if: always()
+        uses: actions/upload-artifact@v2
+        with:
+          name: test-results
+          path: test-results/
+```
+
+### Priority 2: Windows Cross-Platform Testing 🪟
+**Goal**: Validate z3ed and test harness on Windows
+**Time Estimate**: 8-10 hours
+**Blocking Dependency**: IT-05 Complete (need stable API)
+
+> 📋 **Detailed Guides**: See [NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md) for complete implementation breakdowns with code examples.

-This plan decomposes the design additions into actionable engineering tasks. Each workstream contains milestones, blocking dependencies, and expected deliverables.
-1. `src/cli/handlers/rom.cc` - Added `RomInfo::Run` implementation
-2.
`src/cli/z3ed.h` - Added `RomInfo` class declaration
-3. `src/cli/modern_cli.cc` - Updated `HandleRomInfoCommand` routing
-4. `src/cli/service/resource_catalog.cc` - Added `rom info` schema entry
 ---

 ## 2. Workstreams Overview
@@ -225,6 +455,11 @@ This plan decomposes the design additions into actionable engineering tasks. Eac
 | IT-02 | Implement CLI agent step translation (`imgui_action` → harness call). | ImGuiTest Bridge | Code | ✅ Done | `z3ed agent test` command with natural language prompts (7.5 hours) |
 | IT-03 | Provide synchronization primitives (`WaitForIdle`, etc.). | ImGuiTest Bridge | Code | ✅ Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 |
 | IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | ✅ Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer |
+| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - Enable clients to poll test results and query execution state |
+| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
+| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | 📋 Planned | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts |
+| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - Capture widget state on failure for debugging |
+| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | 📋 Planned | IT-07 - JSON/YAML test suite format compatible with CI/CD pipelines |
 | VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |
 | VP-02 | Add harness integration tests with replay scripts.
| Verification Pipeline | Test | ๐Ÿ“‹ Planned | IT tasks | | VP-03 | Create CI job running agent smoke tests with `YAZE_WITH_JSON`. | Verification Pipeline | Infra | ๐Ÿ“‹ Planned | VP-01, VP-02 | @@ -234,10 +469,10 @@ This plan decomposes the design additions into actionable engineering tasks. Eac _Status Legend: ๐Ÿ”„ Active ยท ๐Ÿ“‹ Planned ยท โœ… Done_ **Progress Summary**: -- โœ… Completed: 11 tasks (61%) -- ๐Ÿ”„ Active: 1 task (6%) -- ๐Ÿ“‹ Planned: 6 tasks (33%) -- **Total**: 18 tasks +- โœ… Completed: 11 tasks (48%) +- ๐Ÿ”„ Active: 1 task (4%) +- ๐Ÿ“‹ Planned: 11 tasks (48%) +- **Total**: 23 tasks (5 new test harness enhancements added) ## 3. Immediate Next Steps (Week of Oct 1-7, 2025) diff --git a/docs/z3ed/E6-z3ed-reference.md b/docs/z3ed/E6-z3ed-reference.md index 86f21e90..b3285170 100644 --- a/docs/z3ed/E6-z3ed-reference.md +++ b/docs/z3ed/E6-z3ed-reference.md @@ -59,7 +59,14 @@ โ”‚ โ”œโ”€ Type (text input) โ”‚ โ”‚ โ”œโ”€ Wait (condition polling) โ”‚ โ”‚ โ”œโ”€ Assert (state validation) โ”‚ -โ”‚ โ””โ”€ Screenshot (capture) [Stub] โ”‚ +โ”‚ โ”œโ”€ Screenshot (capture) [Stub โ†’ IT-08] โ”‚ +โ”‚ โ”œโ”€ GetTestStatus (query test execution) [IT-05] โ”‚ +โ”‚ โ”œโ”€ ListTests (enumerate tests) [IT-05] โ”‚ +โ”‚ โ”œโ”€ GetTestResults (detailed results) [IT-05] โ”‚ +โ”‚ โ”œโ”€ DiscoverWidgets (widget enumeration) [IT-06] โ”‚ +โ”‚ โ”œโ”€ StartRecording (test recording) [IT-07] โ”‚ +โ”‚ โ”œโ”€ StopRecording (finish recording) [IT-07] โ”‚ +โ”‚ โ””โ”€ ReplayTest (execute test script) [IT-07] โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” @@ -226,14 +233,170 @@ Examples: **Prerequisites**: 1. 
YAZE running with test harness: ```bash - ./yaze --enable_test_harness --test_harness_port=50052 --rom_file=zelda3.sfc & + ./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \ + --enable_test_harness \ + --test_harness_port=50052 \ + --rom_file=assets/zelda3.sfc & ``` 2. z3ed built with gRPC support: ```bash cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON - cmake --build build-grpc-test --target z3ed + cmake --build build-grpc-test --target z3ed -j$(sysctl -n hw.ncpu) ``` +#### `agent gui` - GUI Introspection & Control (IT-05/IT-06) + +##### `agent gui discover` - Enumerate available widgets +```bash +z3ed agent gui discover [--window ] [--type ] [--format ] + +Options: + --window Filter by window name (e.g. "Overworld") + --type Filter by widget type: button, input, menu, tab, checkbox + --format Output format: json or yaml (default: yaml) + +Examples: + # Discover all widgets + z3ed agent gui discover + + # Find all buttons in Overworld editor + z3ed agent gui discover --window "Overworld" --type button + + # Get JSON for AI consumption + z3ed agent gui discover --format json > widgets.json +``` + +**Output Example**: +```yaml +windows: + - name: Main Window + visible: true + widgets: + - id: menu_file + label: File + type: menu + enabled: true + suggested_action: "Click menuitem: File" + - name: Overworld + visible: true + widgets: + - id: btn_save + label: Save + type: button + enabled: true + position: "10,20,100,30" + suggested_action: "Click button:Save" +``` + +**Use Cases**: +- AI agents discover available GUI interactions dynamically +- Test scripts validate expected widgets are present +- Documentation generation for GUI features + +##### `agent test status` - Query test execution state +```bash +z3ed agent test status --test-id [--follow] + +Options: + --test-id Test ID from test command output + --follow Continuously poll until test completes (blocking) + +Example: + z3ed agent test status --test-id grpc_click_12345678 --follow +``` + +**Output**: 
+```yaml +test_id: grpc_click_12345678 +status: PASSED +execution_time_ms: 1234 +started_at: 2025-10-02T14:23:45Z +completed_at: 2025-10-02T14:23:46Z +assertions_passed: 3 +assertions_failed: 0 +``` + +##### `agent test results` - Get detailed test results +```bash +z3ed agent test results --test-id [--format ] [--include-logs] + +Options: + --test-id Test ID to retrieve results for + --format Output format (default: yaml) + --include-logs Include full execution logs + +Example: + z3ed agent test results --test-id grpc_click_12345678 --include-logs +``` + +##### `agent test list` - List all tests +```bash +z3ed agent test list [--category ] [--status ] + +Options: + --category Filter by category: grpc, unit, integration, e2e + --status Filter by status: passed, failed, running, queued + +Example: + z3ed agent test list --category grpc --status failed +``` + +#### `agent test record` - Record test sessions (IT-07) + +##### `agent test record start` - Begin recording +```bash +z3ed agent test record start --output [--description "..."] + +Options: + --output Output file for test script (JSON) + --description Human-readable test description + +Example: + z3ed agent test record start --output tests/overworld_load.json \ + --description "Test Overworld editor loading" +``` + +##### `agent test record stop` - Finish recording +```bash +z3ed agent test record stop [--validate] + +Options: + --validate Run recorded test immediately to verify it works + +Example: + z3ed agent test record stop --validate +``` + +#### `agent test replay` - Execute recorded tests +```bash +z3ed agent test replay [--ci-mode] [--output-dir ] + +Options: + --ci-mode Exit with code 1 on failure, generate JUnit XML + --output-dir Directory for test results (default: test-results/) + +Examples: + # Run single test + z3ed agent test replay tests/overworld_load.json + + # Run test suite in CI + z3ed agent test replay tests/suite.yaml --ci-mode +``` + +#### `agent test suite` - Manage test suites 
(IT-09) +```bash +z3ed agent test suite [options] + +Actions: + run Run test suite (YAML/JSON) + create Create new test suite interactively + validate Validate test suite format + +Examples: + z3ed agent test suite run tests/smoke.yaml + z3ed agent test suite validate tests/regression.yaml +``` + ### ROM Commands #### `rom info` - Display ROM metadata diff --git a/docs/z3ed/IT-05-IMPLEMENTATION-GUIDE.md b/docs/z3ed/IT-05-IMPLEMENTATION-GUIDE.md new file mode 100644 index 00000000..efba3806 --- /dev/null +++ b/docs/z3ed/IT-05-IMPLEMENTATION-GUIDE.md @@ -0,0 +1,722 @@ +# IT-05: T## Motivation + +**Current Limitations**: +- โŒ Tests execute asynchronously with no way to query status +- โŒ Clients must poll blindly or give up early +- โŒ No visibility into test execution queue +- โŒ Results lost after test completion +- โŒ Can't track test history or identify flaky tests + +**Why This Blocks AI Agent Autonomy**: + +Without test introspection, **AI agents cannot implement closed-loop feedback**: + +``` +โŒ BROKEN: AI Agent Without IT-05 +1. AI generates commands: ["z3ed palette export ..."] +2. AI executes commands in sandbox +3. AI generates test: "Verify soldier is red" +4. AI runs test โ†’ Gets test_id +5. ??? AI has no way to check if test passed ??? +6. AI presents proposal to user blindly + (might be broken, AI doesn't know) + +โœ… WORKING: AI Agent With IT-05 +1. AI generates commands +2. AI executes in sandbox +3. AI generates verification test +4. AI runs test โ†’ Gets test_id +5. AI polls: GetTestStatus(test_id) +6. Test FAILED? AI sees error + screenshot +7. AI adjusts strategy and retries +8. Test PASSED? 
AI presents successful proposal +``` + +**This is the difference between**: +- **Dumb automation**: Execute blindly, hope for the best +- **Intelligent agent**: Verify, learn, self-correct + +**Benefits After IT-05**: +- โœ… AI agents can reliably poll for test completion +- โœ… AI agents can read detailed failure messages +- โœ… AI agents can implement retry logic with adjusted strategies +- โœ… CLI can show real-time progress bars +- โœ… Test history enables trend analysis (flaky tests, performance regressions) +- โœ… Foundation for test recording/replay (IT-07) +- โœ… **Enables autonomous agent operation**ion API - Implementation Guide + +**Status**: ๐Ÿ“‹ Planned | Priority 1 | Time Estimate: 6-8 hours +**Dependencies**: IT-01 Complete โœ…, IT-02 Complete โœ… +**Blocking**: IT-06 (Widget Discovery needs introspection foundation) + +## Overview + +Add test introspection capabilities to enable clients to query test execution status, list available tests, and retrieve detailed results. This is critical for AI agents to reliably poll for test completion and make decisions based on results. + +## Motivation + +**Current Limitations**: +- โŒ Tests execute asynchronously with no way to query status +- โŒ Clients must poll blindly or give up early +- โŒ No visibility into test execution queue +- โŒ Results lost after test completion +- โŒ Can't track test history or identify flaky tests + +**Benefits After IT-05**: +- โœ… AI agents can reliably poll for test completion +- โœ… CLI can show real-time progress bars +- โœ… Test history enables trend analysis +- โœ… Foundation for test recording/replay (IT-07) + +## Architecture + +### New Service Components + +```cpp +// src/app/core/test_manager.h +class TestManager { + // Existing... 
+ + // NEW: Test tracking + struct TestExecution { + std::string test_id; + std::string name; + std::string category; + TestStatus status; // QUEUED, RUNNING, PASSED, FAILED, TIMEOUT + int64_t queued_at_ms; + int64_t started_at_ms; + int64_t completed_at_ms; + int32_t execution_time_ms; + std::string error_message; + std::vector assertion_failures; + std::vector logs; + }; + + // NEW: Test execution tracking + absl::StatusOr GetTestStatus(const std::string& test_id); + std::vector ListTests(const std::string& category_filter = ""); + absl::StatusOr GetTestResults(const std::string& test_id); + + private: + // NEW: Test execution history + std::map test_history_; + absl::Mutex test_history_mutex_; // Thread-safe access +}; +``` + +### Proto Additions + +```protobuf +// src/app/core/proto/imgui_test_harness.proto + +// Add to service definition +service ImGuiTestHarness { + // ... existing RPCs ... + + // NEW: Test introspection + rpc GetTestStatus(GetTestStatusRequest) returns (GetTestStatusResponse); + rpc ListTests(ListTestsRequest) returns (ListTestsResponse); + rpc GetTestResults(GetTestResultsRequest) returns (GetTestResultsResponse); +} + +// ============================================================================ +// GetTestStatus - Query test execution state +// ============================================================================ + +message GetTestStatusRequest { + string test_id = 1; // Test ID from Click/Type/Wait/Assert response +} + +message GetTestStatusResponse { + enum Status { + UNKNOWN = 0; // Test ID not found + QUEUED = 1; // Waiting to execute + RUNNING = 2; // Currently executing + PASSED = 3; // Completed successfully + FAILED = 4; // Assertion failed or error + TIMEOUT = 5; // Exceeded timeout + } + + Status status = 1; + int64 queued_at_ms = 2; // When test was queued + int64 started_at_ms = 3; // When test started (0 if not started) + int64 completed_at_ms = 4; // When test completed (0 if not complete) + int32 execution_time_ms 
= 5; // Total execution time + string error_message = 6; // Error details if FAILED/TIMEOUT + repeated string assertion_failures = 7; // Failed assertion details +} + +// ============================================================================ +// ListTests - Enumerate available tests +// ============================================================================ + +message ListTestsRequest { + string category_filter = 1; // Optional: "grpc", "unit", "integration", "e2e" + int32 page_size = 2; // Number of results per page (default 100) + string page_token = 3; // Pagination token from previous response +} + +message ListTestsResponse { + repeated TestInfo tests = 1; + string next_page_token = 2; // Token for next page (empty if no more) + int32 total_count = 3; // Total number of matching tests +} + +message TestInfo { + string test_id = 1; // Unique test identifier + string name = 2; // Human-readable test name + string category = 3; // Category: grpc, unit, integration, e2e + int64 last_run_timestamp_ms = 4; // When test last executed + int32 total_runs = 5; // Total number of executions + int32 pass_count = 6; // Number of successful runs + int32 fail_count = 7; // Number of failed runs + int32 average_duration_ms = 8; // Average execution time +} + +// ============================================================================ +// GetTestResults - Retrieve detailed results +// ============================================================================ + +message GetTestResultsRequest { + string test_id = 1; + bool include_logs = 2; // Include full execution logs +} + +message GetTestResultsResponse { + bool success = 1; // Overall test result + string test_name = 2; + string category = 3; + int64 executed_at_ms = 4; + int32 duration_ms = 5; + + // Detailed results + repeated AssertionResult assertions = 6; + repeated string logs = 7; // If include_logs=true + + // Performance metrics + map metrics = 8; // e.g., "frame_count": 123 +} + +message 
AssertionResult { + string description = 1; + bool passed = 2; + string expected_value = 3; + string actual_value = 4; + string error_message = 5; +} +``` + +## Implementation Steps + +### Step 1: Extend TestManager (2-3 hours) + +#### 1.1 Add Test Execution Tracking + +**File**: `src/app/core/test_manager.h` + +```cpp +#include +#include +#include "absl/synchronization/mutex.h" +#include "absl/time/time.h" + +class TestManager { + public: + enum class TestStatus { + UNKNOWN = 0, + QUEUED = 1, + RUNNING = 2, + PASSED = 3, + FAILED = 4, + TIMEOUT = 5 + }; + + struct TestExecution { + std::string test_id; + std::string name; + std::string category; + TestStatus status; + absl::Time queued_at; + absl::Time started_at; + absl::Time completed_at; + absl::Duration execution_time; + std::string error_message; + std::vector assertion_failures; + std::vector logs; + std::map metrics; + }; + + // NEW: Introspection API + absl::StatusOr GetTestStatus(const std::string& test_id); + std::vector ListTests(const std::string& category_filter = ""); + absl::StatusOr GetTestResults(const std::string& test_id); + + // NEW: Recording test execution + void RecordTestStart(const std::string& test_id, const std::string& name, + const std::string& category); + void RecordTestComplete(const std::string& test_id, TestStatus status, + const std::string& error_message = ""); + void AddTestLog(const std::string& test_id, const std::string& log_entry); + void AddTestMetric(const std::string& test_id, const std::string& key, + int32_t value); + + private: + std::map test_history_ ABSL_GUARDED_BY(history_mutex_); + absl::Mutex history_mutex_; + + // Helper: Generate unique test ID + std::string GenerateTestId(const std::string& prefix); +}; +``` + +**File**: `src/app/core/test_manager.cc` + +```cpp +#include "src/app/core/test_manager.h" +#include "absl/strings/str_format.h" +#include "absl/time/clock.h" +#include + +std::string TestManager::GenerateTestId(const std::string& prefix) { + static 
std::random_device rd; + static std::mt19937 gen(rd()); + static std::uniform_int_distribution<> dis(10000000, 99999999); + + return absl::StrFormat("%s_%d", prefix, dis(gen)); +} + +void TestManager::RecordTestStart(const std::string& test_id, + const std::string& name, + const std::string& category) { + absl::MutexLock lock(&history_mutex_); + + TestExecution& exec = test_history_[test_id]; + exec.test_id = test_id; + exec.name = name; + exec.category = category; + exec.status = TestStatus::RUNNING; + exec.started_at = absl::Now(); + exec.queued_at = exec.started_at; // For now, no separate queue +} + +void TestManager::RecordTestComplete(const std::string& test_id, + TestStatus status, + const std::string& error_message) { + absl::MutexLock lock(&history_mutex_); + + auto it = test_history_.find(test_id); + if (it == test_history_.end()) return; + + TestExecution& exec = it->second; + exec.status = status; + exec.completed_at = absl::Now(); + exec.execution_time = exec.completed_at - exec.started_at; + exec.error_message = error_message; +} + +void TestManager::AddTestLog(const std::string& test_id, + const std::string& log_entry) { + absl::MutexLock lock(&history_mutex_); + + auto it = test_history_.find(test_id); + if (it != test_history_.end()) { + it->second.logs.push_back(log_entry); + } +} + +void TestManager::AddTestMetric(const std::string& test_id, + const std::string& key, + int32_t value) { + absl::MutexLock lock(&history_mutex_); + + auto it = test_history_.find(test_id); + if (it != test_history_.end()) { + it->second.metrics[key] = value; + } +} + +absl::StatusOr TestManager::GetTestStatus( + const std::string& test_id) { + absl::MutexLock lock(&history_mutex_); + + auto it = test_history_.find(test_id); + if (it == test_history_.end()) { + return absl::NotFoundError( + absl::StrFormat("Test ID '%s' not found", test_id)); + } + + return it->second; +} + +std::vector TestManager::ListTests( + const std::string& category_filter) { + absl::MutexLock 
lock(&history_mutex_); + + std::vector results; + for (const auto& [id, exec] : test_history_) { + if (category_filter.empty() || exec.category == category_filter) { + results.push_back(exec); + } + } + + return results; +} + +absl::StatusOr TestManager::GetTestResults( + const std::string& test_id) { + // Same as GetTestStatus for now + return GetTestStatus(test_id); +} +``` + +#### 1.2 Update Existing RPC Handlers + +**File**: `src/app/core/imgui_test_harness_service.cc` + +Modify Click, Type, Wait, Assert handlers to record test execution: + +```cpp +absl::Status ImGuiTestHarnessServiceImpl::Click( + const ClickRequest* request, ClickResponse* response) { + + // Generate unique test ID + std::string test_id = test_manager_->GenerateTestId("grpc_click"); + + // Record test start + test_manager_->RecordTestStart( + test_id, + absl::StrFormat("Click: %s", request->target()), + "grpc"); + + // ... existing implementation ... + + // Record test completion + if (success) { + test_manager_->RecordTestComplete(test_id, TestManager::TestStatus::PASSED); + } else { + test_manager_->RecordTestComplete( + test_id, TestManager::TestStatus::FAILED, error_message); + } + + // Add test ID to response (requires proto update) + response->set_test_id(test_id); + + return absl::OkStatus(); +} +``` + +**Proto Update**: Add `test_id` field to all responses: + +```protobuf +message ClickResponse { + bool success = 1; + string message = 2; + int32 execution_time_ms = 3; + string test_id = 4; // NEW: Unique test identifier for introspection +} + +// Repeat for TypeResponse, WaitResponse, AssertResponse +``` + +### Step 2: Implement Introspection RPCs (2-3 hours) + +**File**: `src/app/core/imgui_test_harness_service.cc` + +```cpp +absl::Status ImGuiTestHarnessServiceImpl::GetTestStatus( + const GetTestStatusRequest* request, + GetTestStatusResponse* response) { + + auto status_or = test_manager_->GetTestStatus(request->test_id()); + if (!status_or.ok()) { + 
response->set_status(GetTestStatusResponse::UNKNOWN); + return absl::OkStatus(); // Not an RPC error, just test not found + } + + const auto& exec = status_or.value(); + + // Map internal status to proto status + switch (exec.status) { + case TestManager::TestStatus::QUEUED: + response->set_status(GetTestStatusResponse::QUEUED); + break; + case TestManager::TestStatus::RUNNING: + response->set_status(GetTestStatusResponse::RUNNING); + break; + case TestManager::TestStatus::PASSED: + response->set_status(GetTestStatusResponse::PASSED); + break; + case TestManager::TestStatus::FAILED: + response->set_status(GetTestStatusResponse::FAILED); + break; + case TestManager::TestStatus::TIMEOUT: + response->set_status(GetTestStatusResponse::TIMEOUT); + break; + default: + response->set_status(GetTestStatusResponse::UNKNOWN); + } + + // Convert absl::Time to milliseconds since epoch + response->set_queued_at_ms(absl::ToUnixMillis(exec.queued_at)); + response->set_started_at_ms(absl::ToUnixMillis(exec.started_at)); + response->set_completed_at_ms(absl::ToUnixMillis(exec.completed_at)); + response->set_execution_time_ms(absl::ToInt64Milliseconds(exec.execution_time)); + response->set_error_message(exec.error_message); + + for (const auto& failure : exec.assertion_failures) { + response->add_assertion_failures(failure); + } + + return absl::OkStatus(); +} + +absl::Status ImGuiTestHarnessServiceImpl::ListTests( + const ListTestsRequest* request, + ListTestsResponse* response) { + + auto tests = test_manager_->ListTests(request->category_filter()); + + // TODO: Implement pagination if needed + response->set_total_count(tests.size()); + + for (const auto& exec : tests) { + auto* test_info = response->add_tests(); + test_info->set_test_id(exec.test_id); + test_info->set_name(exec.name); + test_info->set_category(exec.category); + test_info->set_last_run_timestamp_ms(absl::ToUnixMillis(exec.completed_at)); + test_info->set_total_runs(1); // TODO: Track across multiple runs + + if 
(exec.status == TestManager::TestStatus::PASSED) { + test_info->set_pass_count(1); + test_info->set_fail_count(0); + } else { + test_info->set_pass_count(0); + test_info->set_fail_count(1); + } + + test_info->set_average_duration_ms( + absl::ToInt64Milliseconds(exec.execution_time)); + } + + return absl::OkStatus(); +} + +absl::Status ImGuiTestHarnessServiceImpl::GetTestResults( + const GetTestResultsRequest* request, + GetTestResultsResponse* response) { + + auto status_or = test_manager_->GetTestResults(request->test_id()); + if (!status_or.ok()) { + return absl::NotFoundError( + absl::StrFormat("Test '%s' not found", request->test_id())); + } + + const auto& exec = status_or.value(); + + response->set_success(exec.status == TestManager::TestStatus::PASSED); + response->set_test_name(exec.name); + response->set_category(exec.category); + response->set_executed_at_ms(absl::ToUnixMillis(exec.completed_at)); + response->set_duration_ms(absl::ToInt64Milliseconds(exec.execution_time)); + + // Include logs if requested + if (request->include_logs()) { + for (const auto& log : exec.logs) { + response->add_logs(log); + } + } + + // Add metrics + for (const auto& [key, value] : exec.metrics) { + (*response->mutable_metrics())[key] = value; + } + + return absl::OkStatus(); +} +``` + +### Step 3: CLI Integration (1-2 hours) + +**File**: `src/cli/handlers/agent.cc` + +Add new CLI commands for test introspection: + +```cpp +// z3ed agent test status --test-id [--follow] +absl::Status HandleAgentTestStatus(const CommandOptions& options) { + const std::string test_id = absl::GetFlag(FLAGS_test_id); + const bool follow = absl::GetFlag(FLAGS_follow); + + GuiAutomationClient client("localhost", 50052); + RETURN_IF_ERROR(client.Connect()); + + while (true) { + auto status_or = client.GetTestStatus(test_id); + RETURN_IF_ERROR(status_or.status()); + + const auto& status = status_or.value(); + + // Print status + std::cout << "Test ID: " << test_id << "\n"; + std::cout << "Status: " 
<< StatusToString(status.status) << "\n"; + std::cout << "Execution Time: " << status.execution_time_ms << "ms\n"; + + if (status.status == TestStatus::PASSED || + status.status == TestStatus::FAILED || + status.status == TestStatus::TIMEOUT) { + break; // Terminal state + } + + if (!follow) break; + + // Poll every 500ms + absl::SleepFor(absl::Milliseconds(500)); + } + + return absl::OkStatus(); +} + +// z3ed agent test results --test-id [--format json] [--include-logs] +absl::Status HandleAgentTestResults(const CommandOptions& options) { + const std::string test_id = absl::GetFlag(FLAGS_test_id); + const std::string format = absl::GetFlag(FLAGS_format); + const bool include_logs = absl::GetFlag(FLAGS_include_logs); + + GuiAutomationClient client("localhost", 50052); + RETURN_IF_ERROR(client.Connect()); + + auto results_or = client.GetTestResults(test_id, include_logs); + RETURN_IF_ERROR(results_or.status()); + + const auto& results = results_or.value(); + + if (format == "json") { + // Output JSON + PrintTestResultsJson(results); + } else { + // Output YAML (default) + PrintTestResultsYaml(results); + } + + return absl::OkStatus(); +} + +// z3ed agent test list [--category ] [--status ] +absl::Status HandleAgentTestList(const CommandOptions& options) { + const std::string category = absl::GetFlag(FLAGS_category); + const std::string status_filter = absl::GetFlag(FLAGS_status); + + GuiAutomationClient client("localhost", 50052); + RETURN_IF_ERROR(client.Connect()); + + auto tests_or = client.ListTests(category); + RETURN_IF_ERROR(tests_or.status()); + + const auto& tests = tests_or.value(); + + // Print table + std::cout << "=== Test List ===\n\n"; + std::cout << absl::StreamFormat("%-20s %-30s %-10s %-10s\n", + "Test ID", "Name", "Category", "Status"); + std::cout << std::string(80, '-') << "\n"; + + for (const auto& test : tests) { + std::cout << absl::StreamFormat("%-20s %-30s %-10s %-10s\n", + test.test_id, test.name, test.category, + 
StatusToString(test.last_status)); + } + + return absl::OkStatus(); +} +``` + +### Step 4: Testing & Validation (1 hour) + +#### Test Script: `scripts/test_introspection_e2e.sh` + +```bash +#!/bin/bash +# Test introspection API + +set -e + +# Start YAZE +./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \ + --enable_test_harness \ + --test_harness_port=50052 \ + --rom_file=assets/zelda3.sfc & + +YAZE_PID=$! +sleep 3 + +# Test 1: Run a test and capture test ID +echo "Test 1: GetTestStatus" +TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id') +echo "Test ID: $TEST_ID" + +# Test 2: Poll for status +echo "Test 2: Poll status" +z3ed agent test status --test-id $TEST_ID --follow + +# Test 3: Get results +echo "Test 3: Get results" +z3ed agent test results --test-id $TEST_ID --format yaml --include-logs + +# Test 4: List all tests +echo "Test 4: List tests" +z3ed agent test list --category grpc + +# Cleanup +kill $YAZE_PID +``` + +## Success Criteria + +- [ ] All 3 new RPCs respond correctly +- [ ] Test IDs returned in Click/Type/Wait/Assert responses +- [ ] Status polling works with `--follow` flag +- [ ] Test history persists across multiple test runs +- [ ] CLI commands output clean YAML/JSON +- [ ] No memory leaks in test history tracking +- [ ] Thread-safe access to test history +- [ ] Documentation updated in E6-z3ed-reference.md + +## Migration Guide + +**For Existing Code**: +- No breaking changes - new RPCs only +- Existing tests continue to work +- Test ID field added to responses (backwards compatible) + +**For CLI Users**: +```bash +# Old: Test runs, no way to check status +z3ed agent test --prompt "Open Overworld" + +# New: Get test ID, poll for status +TEST_ID=$(z3ed agent test --prompt "Open Overworld" --output json | jq -r '.test_id') +z3ed agent test status --test-id $TEST_ID --follow +z3ed agent test results --test-id $TEST_ID +``` + +## Next Steps + +After IT-05 completion: +1. 
**IT-06**: Widget Discovery API (uses introspection foundation)
+2. **IT-07**: Test Recording & Replay (records test IDs and results)
+3. **IT-08**: Enhanced Error Reporting (captures test context on failure)
+
+## References
+
+- **Proto Definition**: `src/app/core/proto/imgui_test_harness.proto`
+- **Test Manager**: `src/app/core/test_manager.{h,cc}`
+- **RPC Service**: `src/app/core/imgui_test_harness_service.{h,cc}`
+- **CLI Handlers**: `src/cli/handlers/agent.cc`
+- **Main Plan**: `docs/z3ed/E6-z3ed-implementation-plan.md`
+
+---
+
+**Author**: @scawful, GitHub Copilot
+**Created**: October 2, 2025
+**Status**: Ready for implementation
diff --git a/docs/z3ed/README.md b/docs/z3ed/README.md
index f97288ab..4cae2dee 100644
--- a/docs/z3ed/README.md
+++ b/docs/z3ed/README.md
@@ -1,11 +1,19 @@
 # z3ed: AI-Powered CLI for YAZE
 
-**Status**: Active Development
+**Status**: Active Development | Test Harness Enhancement Phase
 
 ## Overview
 
 `z3ed` is a command-line interface for YAZE that enables AI-driven ROM modifications through a proposal-based workflow. It provides both human-accessible commands for developers and machine-readable APIs for LLM integration, forming the backbone of an agentic development ecosystem.
 
+**Recent Focus**: Evolving the ImGuiTestHarness from basic GUI automation into a comprehensive testing platform that serves dual purposes:
+1. **AI-Driven Workflows**: Widget discovery, test introspection, and dynamic interaction learning
+2. **Traditional GUI Testing**: Test recording/replay, CI/CD integration, and regression testing
+
+**🤖 Why This Matters**: These enhancements are **critical for AI agent autonomy**. Without them, AI agents can't verify their changes worked (no test polling), discover UI elements dynamically (hardcoded names), learn from demonstrations (no recording), or debug failures (no screenshots). The test harness evolution enables **fully autonomous agents** that can execute → verify → self-correct without human intervention.
+
+**📋 Implementation Status**: Core infrastructure complete (Phases 1-6, AW-01 to AW-04, IT-01 to IT-04). Currently in **Test Harness Enhancement Phase** (IT-05 to IT-09). See [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) for the detailed roadmap and LLM integration plans (Ollama, Gemini, Claude).
+
 This directory contains the primary documentation for the `z3ed` system.
 
 ## Core Documentation
@@ -21,6 +29,9 @@ Start here to understand the architecture, learn how to use the commands, and se
 3. **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - **Roadmap & Status**
     * The project's task backlog, roadmap, progress tracking, and a list of known issues. Check this document for current priorities and to see what's next.
 
+4. **[IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md)** - **Current Phase & Next Steps** ⭐
+    * Detailed continuation plan for test harness enhancements (IT-05 to IT-09). Start here to resume implementation with clear task breakdowns and success criteria.
+
 ## Quick Start
 
 ### Build z3ed
@@ -48,6 +59,46 @@ z3ed agent diff
 
 # Run an automated GUI test (requires test harness to be running)
 z3ed agent test --prompt "Open the Overworld editor and verify it loads"
+
+# Discover available GUI widgets for AI interaction
+z3ed agent gui discover --window "Overworld" --type button
+
+# Record a test session for regression testing
+z3ed agent test record start --output tests/overworld_load.json
+# ... perform actions ...
+z3ed agent test record stop
+
+# Replay recorded test
+z3ed agent test replay tests/overworld_load.json
+
+# Query test execution status
+z3ed agent test status --test-id grpc_click_12345678 --follow
 ```
 
 See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
+
+## Recent Enhancements
+
+**Test Harness Evolution** (Planned: IT-05 to IT-09):
+- **Test Introspection**: Query test status, results, and execution history
+- **Widget Discovery**: AI agents can enumerate available GUI interactions dynamically
+- **Test Recording**: Capture manual workflows as JSON scripts for regression testing
+- **Enhanced Debugging**: Screenshot capture, widget state dumps, execution context on failures
+- **CI/CD Integration**: Standardized test suite format with JUnit XML output
+
+See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolution-from-automation-to-platform)** for detailed architecture and implementation roadmap.
+
+## Quick Navigation
+
+**📖 Getting Started**:
+- **New to z3ed?** Start with this [README.md](README.md) then [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)
+- **Want to use z3ed?** See [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for all commands
+- **Resume implementation?** Read [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md)
+
+**🔧 Implementation Guides**:
+- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (next priority)
+- [STATUS_REPORT_OCT2.md](STATUS_REPORT_OCT2.md) - Complete progress summary
+
+**📚 Reference**:
+- [E6-z3ed-reference.md](E6-z3ed-reference.md) - Technical reference and API docs
+- [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md) - Task backlog and roadmap
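
Editor's note (not part of the patch): the status-polling flow this patch plans for IT-05 (run a test, receive a `test_id`, poll `GetTestStatus` until a terminal state) could be sketched roughly as below. This is a minimal sketch under assumptions, not the project's implementation: `StatusFetcher`, `IsTerminal`, and `PollUntilDone` are hypothetical names, the fetcher stands in for the planned RPC, and the enum only mirrors the status values listed in the planned proto. Unlike the `--follow` loop in the guide, the sketch caps the number of polls and stops on `UNKNOWN`, on the assumption that an unrecognised test ID will not resolve later.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Mirrors the status values of the planned GetTestStatusResponse message.
enum class TestStatus { kUnknown, kQueued, kRunning, kPassed, kFailed, kTimeout };

// A test in one of these states will not change state again.
inline bool IsTerminal(TestStatus s) {
  return s == TestStatus::kPassed || s == TestStatus::kFailed ||
         s == TestStatus::kTimeout;
}

// Hypothetical stand-in for a GetTestStatus call (assumption, not a real API).
using StatusFetcher = std::function<TestStatus(const std::string& test_id)>;

// Polls until the test reaches a terminal state or the ID is reported unknown.
// Returns std::nullopt if max_polls attempts pass without a final answer.
inline std::optional<TestStatus> PollUntilDone(
    const StatusFetcher& fetch, const std::string& test_id, int max_polls,
    std::chrono::milliseconds interval) {
  for (int attempt = 0; attempt < max_polls; ++attempt) {
    const TestStatus status = fetch(test_id);
    if (IsTerminal(status) || status == TestStatus::kUnknown) {
      return status;
    }
    // Sleep only between attempts, not after the last one.
    if (attempt + 1 < max_polls) {
      std::this_thread::sleep_for(interval);
    }
  }
  return std::nullopt;
}
```

A caller would pass a lambda that wraps whatever client call eventually exposes `GetTestStatus`, and treat a `std::nullopt` result as "still running, give up or back off" rather than as a failure.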