z3ed Next Priorities - October 2, 2025

Current Status: IT-01 Complete | AW-03 Complete | Ready for E2E Validation

This document outlines the immediate next steps for the z3ed agent workflow system after completing IT-01 Phase 3 (ImGuiTestEngine integration).


Priority 1: End-to-End Workflow Validation (ACTIVE) 🔄

Goal: Validate the complete AI agent workflow from proposal creation through ROM commit
Time Estimate: 2-3 hours
Status: Ready to execute
Blocking: None - all prerequisites complete

Why This First?

  • Validate all systems work together in production
  • Identify any integration issues before building more features
  • Establish baseline for acceptable UX and performance
  • Document real-world usage patterns for future improvements

Task Breakdown

1.1. Automated Test Script Validation (30 min)

Goal: Verify E2E test script works correctly

# Run the automated test script
./scripts/test_harness_e2e.sh

# Expected: All 6 tests pass
# - Ping (health check)
# - Click (button interaction)
# - Type (text input)
# - Wait (condition polling)
# - Assert (state validation)
# - Screenshot (stub - not implemented message)

Success Criteria:

  • Script runs without errors
  • All implemented RPCs return success responses (Screenshot returns its not-implemented stub message)
  • Server starts and stops cleanly
  • No port conflicts or hanging processes

Troubleshooting:

  • If port 50052 is in use: run killall yaze or use a different port
  • If grpcurl is missing: brew install grpcurl
  • If the binary is not found: build with cmake --build build-grpc-test

1.2. Manual Workflow Testing (60 min)

Goal: Test complete proposal lifecycle with real GUI

Steps:

  1. Create Proposal via CLI:

    # Build z3ed
    cmake --build build --target z3ed -j8
    
    # Create test proposal with sandbox
    ./build/bin/z3ed agent run "Test proposal for validation" --sandbox
    
    # Verify proposal created
    ./build/bin/z3ed agent list
    ./build/bin/z3ed agent diff --proposal-id <ID>
    
  2. Launch YAZE GUI:

    ./build/bin/yaze.app/Contents/MacOS/yaze
    
    # Open ROM: File → Open ROM → assets/zelda3.sfc
    # Open drawer: Debug → Agent Proposals
    
  3. Test ProposalDrawer UI:

    • Verify proposal appears in list
    • Click proposal to select
    • Review metadata (ID, timestamp, sandbox_id)
    • Review execution log content
    • Review diff content (if any)
    • Test filtering (All/Pending/Accepted/Rejected)
    • Test Refresh button
  4. Test Accept Workflow:

    • Click "Accept" button
    • Confirm dialog appears
    • Verify ROM marked dirty (save prompt)
    • File → Save ROM
    • Verify proposal status changes to "Accepted"
  5. Test Reject Workflow:

    • Create another test proposal
    • Click "Reject" button
    • Confirm dialog appears
    • Verify status changes to "Rejected"
    • Verify sandbox ROM unchanged
  6. Test Delete Workflow:

    • Create another test proposal
    • Click "Delete" button
    • Confirm dialog appears
    • Verify proposal removed from list
    • Verify files cleaned up from disk

Success Criteria:

  • All workflows complete without crashes
  • ROM merging works correctly
  • Status updates persist across sessions
  • UI responsive and intuitive

Known Issues to Document:

  • Any UX friction points
  • Performance concerns with large diffs
  • Edge cases that need handling

1.3. Real Widget Testing (60 min)

Goal: Test GUI automation with actual YAZE widgets

Workflow 1: Open Overworld Editor:

# Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Wait for startup
sleep 2

# Test workflow
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Overworld Editor"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert

Workflow 2: Open Dungeon Editor:

  • Click "button:Dungeon"
  • Wait "window_visible:Dungeon Editor"
  • Assert "visible:Dungeon Editor"

Workflow 3: Type in Input Field (if applicable):

  • Click "input:FieldName"
  • Type text with clear_first
  • Assert text_contains (partial implementation)

Success Criteria:

  • All real widgets respond to automation
  • Timeouts work correctly (5s default)
  • Error messages helpful when widgets not found
  • No crashes or hangs during automation

Document:

  • Widget naming conventions (button:Name, window:Name, input:Name)
  • Common timeout values needed
  • Edge cases (disabled buttons, hidden windows, etc.)
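
The naming convention above (button:Name, window:Name, input:Name) can be checked mechanically when documenting widgets. The following is a minimal sketch, assuming targets always use the first colon as the separator; `WidgetTarget` and `ParseWidgetTarget` are hypothetical names for illustration and are not part of the harness.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical value type: "button:Overworld" -> kind "button", name "Overworld".
struct WidgetTarget {
  std::string kind;
  std::string name;
};

// Splits a target at its first colon. Returns std::nullopt when the separator,
// the kind, or the name is missing.
std::optional<WidgetTarget> ParseWidgetTarget(const std::string& target) {
  const std::size_t colon = target.find(':');
  if (colon == std::string::npos || colon == 0 || colon + 1 >= target.size()) {
    return std::nullopt;
  }
  return WidgetTarget{target.substr(0, colon), target.substr(colon + 1)};
}
```

Splitting at the first colon keeps names that contain spaces or further colons intact, for example "window:Overworld Editor".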

1.4. Documentation Updates (30 min)

Goal: Capture learnings and update guides

Files to Update:

  1. IT-01-QUICKSTART.md:

    • Add real widget examples
    • Document common workflows
    • Add troubleshooting for real scenarios
  2. E6-z3ed-implementation-plan.md:

    • Mark Priority 1 as complete
    • Add lessons learned section
    • Update known limitations
  3. STATE_SUMMARY_2025-10-02.md:

    • Add E2E validation results
    • Update status metrics
    • Document performance characteristics

Success Criteria:

  • New users can follow guides without getting stuck
  • Common issues documented with solutions
  • Real-world examples added

Priority 2: CLI Agent Test Command (IT-02) 📋

Goal: Natural language prompt → automated GUI test workflow
Time Estimate: 4-6 hours
Status: Ready to start after Priority 1
Blocking Dependency: Priority 1 completion

Why This Next?

  • Enables AI agents to drive YAZE GUI automatically
  • Makes GUI automation accessible via simple CLI commands
  • Provides foundation for complex multi-step workflows
  • Demonstrates value of IT-01 infrastructure

Design Overview

User Input:
  z3ed agent test --prompt "Open Overworld editor and verify it loads"

Workflow:
  1. Parse prompt → identify intent (open editor, verify visibility)
  2. Generate RPC sequence:
     - Click "button:Overworld"
     - Wait "window_visible:Overworld Editor" (5s timeout)
     - Assert "visible:Overworld Editor"
  3. Execute RPCs via gRPC client
  4. Capture results and report
  5. Optional: Screenshot for LLM feedback

Output:
  ✓ Clicked button:Overworld (85ms)
  ✓ Waited for window_visible:Overworld Editor (1234ms)
  ✓ Asserted visible:Overworld Editor (12ms)
  
  Test passed in 1.331s
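
The per-step timings in the sample output sum to the reported total (85 + 1234 + 12 = 1331 ms). A minimal sketch of how such a report could be formatted follows; `StepTiming`, `FormatStepLine`, and `FormatSummary` are hypothetical names, and the real command may format its output differently.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of one executed step and how long it took.
struct StepTiming {
  std::string description;  // e.g. "Clicked button:Overworld"
  int elapsed_ms;
};

// Formats one progress line in the style of the sample output.
std::string FormatStepLine(const StepTiming& step) {
  return "✓ " + step.description + " (" + std::to_string(step.elapsed_ms) +
         "ms)";
}

// Sums the step timings and reports the total in seconds with millisecond
// precision.
std::string FormatSummary(const std::vector<StepTiming>& steps) {
  int total_ms = 0;
  for (const StepTiming& step : steps) {
    total_ms += step.elapsed_ms;
  }
  char buffer[64];
  std::snprintf(buffer, sizeof(buffer), "Test passed in %d.%03ds",
                total_ms / 1000, total_ms % 1000);
  return std::string(buffer);
}
```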

Implementation Tasks

2.1. Create gRPC Client Library (2 hours)

Files:

  • src/cli/service/gui_automation_client.h
  • src/cli/service/gui_automation_client.cc

Interface:

class GuiAutomationClient {
 public:
  static GuiAutomationClient& Instance();
  
  absl::Status Connect(const std::string& host, int port);
  absl::StatusOr<PingResponse> Ping(const std::string& message);
  absl::StatusOr<ClickResponse> Click(const std::string& target, ClickType type);
  absl::StatusOr<TypeResponse> Type(const std::string& target, 
                                    const std::string& text,
                                    bool clear_first);
  absl::StatusOr<WaitResponse> Wait(const std::string& condition,
                                    int timeout_ms,
                                    int poll_interval_ms);
  absl::StatusOr<AssertResponse> Assert(const std::string& condition);
  absl::StatusOr<ScreenshotResponse> Screenshot(const std::string& region,
                                                 const std::string& format);
  
 private:
  std::unique_ptr<yaze::test::ImGuiTestHarness::Stub> stub_;
};

Implementation Notes:

  • Use gRPC C++ client API
  • Handle connection errors gracefully
  • Support timeout configuration
  • Return structured results (not raw proto messages)
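
The Wait signature above takes both a timeout and a poll interval. The semantics those two parameters suggest can be sketched as a simple polling loop. This is an illustration only: `PollUntil` is a hypothetical helper, and the harness may implement condition polling differently (for example inside the ImGuiTestEngine frame loop).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `condition` every `poll_interval_ms` until it
// returns true (result: true) or `timeout_ms` has elapsed (result: false).
bool PollUntil(const std::function<bool()>& condition, int timeout_ms,
               int poll_interval_ms) {
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + std::chrono::milliseconds(timeout_ms);
  while (true) {
    if (condition()) {
      return true;
    }
    if (Clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(poll_interval_ms));
  }
}
```

The condition is checked once before any sleep, so an already-visible window returns immediately, and a steady clock keeps the deadline unaffected by wall-clock adjustments.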

2.2. Create Test Workflow Generator (1.5 hours)

Files:

  • src/cli/service/test_workflow_generator.h
  • src/cli/service/test_workflow_generator.cc

Interface:

struct TestStep {
  enum Type { kClick, kType, kWait, kAssert, kScreenshot };
  Type type;
  std::string target;
  std::string value;
  int timeout_ms = 5000;
};

struct TestWorkflow {
  std::string description;
  std::vector<TestStep> steps;
};

class TestWorkflowGenerator {
 public:
  static absl::StatusOr<TestWorkflow> GenerateFromPrompt(
      const std::string& prompt);
  
 private:
  static absl::StatusOr<TestWorkflow> ParseSimplePrompt(
      const std::string& prompt);
  static absl::StatusOr<TestWorkflow> ParseComplexPrompt(
      const std::string& prompt);
};

Supported Prompt Patterns:

  1. Simple Open: "Open Overworld editor"

    • Click "button:Overworld"
    • Wait "window_visible:Overworld Editor"
  2. Open and Verify: "Open Dungeon editor and verify it loads"

    • Click "button:Dungeon"
    • Wait "window_visible:Dungeon Editor"
    • Assert "visible:Dungeon Editor"
  3. Type and Validate: "Type 'zelda3.sfc' in filename input"

    • Click "input:Filename"
    • Type "zelda3.sfc" with clear_first
    • Assert "text_contains:Filename:zelda3.sfc"
  4. Multi-Step: "Open Overworld, click tile, verify properties panel"

    • Click "button:Overworld"
    • Wait "window_visible:Overworld Editor"
    • Click "canvas:Overworld" (x, y coordinates)
    • Wait "window_visible:Properties"

Implementation Strategy:

  • Start with simple regex/pattern matching
  • Add more complex patterns iteratively
  • Return error for unsupported prompts
  • Suggest valid alternatives
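
As a starting point for the regex-based strategy, the first two prompt patterns ("Simple Open" and "Open and Verify") could be matched as sketched below. `MatchOpenEditor` is a hypothetical helper, the regular expression is an assumption about how prompts are phrased, and the editor name is used exactly as typed in the prompt.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Same shape as the TestStep struct in the interface above.
struct TestStep {
  enum Type { kClick, kType, kWait, kAssert, kScreenshot };
  Type type;
  std::string target;
  std::string value;
  int timeout_ms = 5000;
};

// Hypothetical matcher: handles "Open <Name> editor" with an optional
// "and verify ..." suffix. Returns std::nullopt for any other prompt so the
// caller can report it as unsupported.
std::optional<std::vector<TestStep>> MatchOpenEditor(
    const std::string& prompt) {
  static const std::regex kOpen(
      R"(^open\s+(\w+)(?:\s+editor)?(\s+and\s+verify.*)?$)",
      std::regex::icase);
  std::smatch match;
  if (!std::regex_match(prompt, match, kOpen)) {
    return std::nullopt;
  }
  const std::string editor = match[1].str();
  const std::string window = editor + " Editor";
  std::vector<TestStep> steps;
  steps.push_back({TestStep::kClick, "button:" + editor, "", 5000});
  steps.push_back({TestStep::kWait, "window_visible:" + window, "", 5000});
  if (match[2].matched) {
    // The "and verify" suffix adds an explicit assertion step.
    steps.push_back({TestStep::kAssert, "visible:" + window, "", 5000});
  }
  return steps;
}
```

Prompts that fall outside the pattern, such as the multi-step example, yield no match, which is where the "return error and suggest valid alternatives" behaviour would hook in.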

2.3. Implement z3ed agent test Command (1.5 hours)

Files:

  • src/cli/handlers/agent.cc (add HandleTestCommand)
  • Update src/cli/modern_cli.cc routing

Command Interface:

z3ed agent test --prompt "..." [--host localhost] [--port 50052] [--timeout 30s]

Implementation:

absl::Status HandleTestCommand(const AgentOptions& options) {
  // 1. Parse prompt → workflow
  auto workflow_result = TestWorkflowGenerator::GenerateFromPrompt(
      options.prompt);
  if (!workflow_result.ok()) {
    return workflow_result.status();
  }
  TestWorkflow workflow = std::move(*workflow_result);
  
  // 2. Connect to test harness
  auto& client = GuiAutomationClient::Instance();
  auto status = client.Connect(options.host, options.port);
  if (!status.ok()) {
    return status;
  }
  
  // 3. Execute workflow steps
  for (const auto& step : workflow.steps) {
    auto result = ExecuteStep(client, step);
    if (!result.ok()) {
      return result.status();
    }
    PrintStepResult(step, *result);
  }
  
  std::cout << "\nTest passed!\n";
  return absl::OkStatus();
}

Output Format:

  • Progress indicators for each step
  • Execution time per step
  • Success/failure status
  • Error messages with context
  • Final summary

2.4. Testing and Documentation (1 hour)

Test Cases:

  1. Simple open editor test
  2. Multi-step workflow test
  3. Timeout handling test
  4. Connection error test
  5. Invalid widget test

Documentation:

  • Add IT-02 completion doc
  • Update implementation plan
  • Add examples to IT-01-QUICKSTART.md
  • Update resource catalog with agent test command

Success Criteria:

  • z3ed agent test works with 5+ different prompts
  • Error messages helpful for debugging
  • Documentation complete with examples
  • Ready for AI agent integration

Priority 3: Policy Evaluation Framework (AW-04) 📋

Goal: YAML-based constraint system for gating proposal acceptance
Time Estimate: 6-8 hours
Status: Can work in parallel with Priority 2
Blocking Dependency: None (UI integration depends on AW-03, which is complete)

Why This Matters?

  • Prevents dangerous/unwanted changes from being accepted
  • Enforces project-specific constraints (byte limits, bank restrictions)
  • Requires test coverage before acceptance
  • Provides audit trail for policy violations

Design Overview

Policy Configuration (.yaze/policies/agent.yaml):

version: 1.0
policies:
  # Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
  
  # Change Constraints
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
      - forbidden_ranges:
          - start: 0xFFB0  # ROM header
            end: 0xFFFF
          - start: 0x0000  # System RAM
            end: 0x1FFF
  
  # Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    rules:
      - if: bytes_changed > 1024
        then: require_diff_review
      - if: commands_executed > 10
        then: require_log_review
      - if: new_files_created
        then: require_approval
  
  # Security Checks
  - name: security_validation
    type: security_check
    enabled: true
    severity: critical
    rules:
      - check: no_known_cves
        message: "Dependencies must not have known CVEs"
      - check: checksum_valid
        message: "ROM checksum must be valid after changes"

Implementation Tasks

3.1. Policy Schema and Parser (2 hours)

Files:

  • src/cli/service/policy_evaluator.h
  • src/cli/service/policy_evaluator.cc
  • .yaze/policies/agent.yaml (example)

Data Structures:

enum class PolicySeverity { kCritical, kWarning, kInfo };
enum class PolicyType {
  kTestRequirement,
  kChangeConstraint,
  kReviewRequirement,
  kSecurityCheck
};

struct PolicyRule {
  std::string condition;
  std::string action;
  std::map<std::string, std::string> parameters;
};

struct Policy {
  std::string name;
  PolicyType type;
  PolicySeverity severity;
  bool enabled;
  std::vector<PolicyRule> rules;
};

struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
  std::string actual_value;
  std::string expected_value;
};

struct PolicyResult {
  bool passed;
  std::vector<PolicyViolation> violations;
  
  bool HasCriticalViolations() const;
  bool HasWarnings() const;
};
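
The two PolicyResult helpers are declared above without bodies. One possible implementation is sketched here, assuming they simply scan the collected violations by severity; `HasSeverity` is a hypothetical free function introduced for the example, and the structs are trimmed to the fields the sketch needs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class PolicySeverity { kCritical, kWarning, kInfo };

// Trimmed copy of PolicyViolation: only the fields this sketch uses.
struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
};

// Hypothetical helper: true if any violation carries the given severity.
bool HasSeverity(const std::vector<PolicyViolation>& violations,
                 PolicySeverity severity) {
  return std::any_of(violations.begin(), violations.end(),
                     [severity](const PolicyViolation& violation) {
                       return violation.severity == severity;
                     });
}

struct PolicyResult {
  bool passed = true;
  std::vector<PolicyViolation> violations;

  bool HasCriticalViolations() const {
    return HasSeverity(violations, PolicySeverity::kCritical);
  }
  bool HasWarnings() const {
    return HasSeverity(violations, PolicySeverity::kWarning);
  }
};
```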

YAML Parsing:

  • Use yaml-cpp library (already in vcpkg)
  • Parse policy file on startup
  • Validate schema (version, required fields)
  • Cache parsed policies in memory
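
Part of schema validation is mapping the string values in the YAML file onto the enums above. The sketch below assumes the strings are exactly those used in the example policy file; `ParseSeverity` and `ParsePolicyType` are hypothetical names, and the YAML loading itself (via yaml-cpp) is left out.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class PolicySeverity { kCritical, kWarning, kInfo };
enum class PolicyType {
  kTestRequirement,
  kChangeConstraint,
  kReviewRequirement,
  kSecurityCheck
};

// Maps the `severity` field to its enum value; std::nullopt marks a schema
// error that the loader can report.
std::optional<PolicySeverity> ParseSeverity(const std::string& text) {
  if (text == "critical") return PolicySeverity::kCritical;
  if (text == "warning") return PolicySeverity::kWarning;
  if (text == "info") return PolicySeverity::kInfo;
  return std::nullopt;
}

// Maps the `type` field to its enum value in the same way.
std::optional<PolicyType> ParsePolicyType(const std::string& text) {
  if (text == "test_requirement") return PolicyType::kTestRequirement;
  if (text == "change_constraint") return PolicyType::kChangeConstraint;
  if (text == "review_requirement") return PolicyType::kReviewRequirement;
  if (text == "security_check") return PolicyType::kSecurityCheck;
  return std::nullopt;
}
```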

3.2. Policy Evaluation Engine (2.5 hours)

Interface:

class PolicyEvaluator {
 public:
  static PolicyEvaluator& Instance();
  
  absl::Status LoadPolicies(const std::string& policy_dir = ".yaze/policies");
  absl::StatusOr<PolicyResult> EvaluateProposal(const std::string& proposal_id);
  
 private:
  absl::StatusOr<PolicyResult> EvaluateTestRequirements(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateChangeConstraints(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateReviewRequirements(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateSecurityChecks(
      const ProposalMetadata& proposal);
  
  std::vector<Policy> policies_;
};

Evaluation Logic:

  1. Load proposal metadata (bytes changed, commands executed, etc.)
  2. Load proposal diff (for bank/range analysis)
  3. For each enabled policy:
    • Evaluate all rules
    • Collect violations
    • Determine overall pass/fail
  4. Return structured result

Example Evaluations:

  • Test Requirements: Check if test results exist and meet thresholds
  • Change Constraints: Analyze diff for byte count, bank ranges, forbidden areas
  • Review Requirements: Check metadata (bytes, commands, files)
  • Security Checks: Run ROM validation, checksum verification
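
The change-constraint evaluation can be illustrated with a small, self-contained check. This sketch assumes the diff has already been reduced to a flat list of modified addresses and covers only the max_bytes_changed and forbidden_ranges rules; `AddressRange`, `ChangeLimits`, and `CheckChangeConstraints` are hypothetical names, and how addresses map to ROM banks is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Inclusive address range, mirroring a forbidden_ranges entry (start/end).
struct AddressRange {
  std::uint32_t start;
  std::uint32_t end;
};

// Hypothetical container for the two rule values used in this sketch.
struct ChangeLimits {
  std::size_t max_bytes_changed;
  std::vector<AddressRange> forbidden_ranges;
};

// Returns one message per violated rule; an empty vector means the proposal
// passed these checks.
std::vector<std::string> CheckChangeConstraints(
    const std::vector<std::uint32_t>& changed_addresses,
    const ChangeLimits& limits) {
  std::vector<std::string> violations;
  if (changed_addresses.size() > limits.max_bytes_changed) {
    violations.push_back("bytes changed exceeds max_bytes_changed");
  }
  for (const AddressRange& range : limits.forbidden_ranges) {
    for (std::uint32_t address : changed_addresses) {
      if (address >= range.start && address <= range.end) {
        violations.push_back("forbidden range modified");
        break;  // Report each forbidden range at most once.
      }
    }
  }
  return violations;
}
```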

3.3. ProposalDrawer Integration (2 hours)

Files:

  • src/app/editor/system/proposal_drawer.cc (update)

UI Changes:

  1. Add Policy Status Section (in detail view):

    Policy Status: [✓ Passed | ⚠ Warnings | ⛔ Failed]
    
    Critical Issues:
      ⛔ Test pass rate 85% < 95% (overworld_rendering)
      ⛔ Forbidden range modified: 0xFFB0-0xFFFF (ROM header)
    
    Warnings:
      ⚠ 2048 bytes changed > 1024 (requires diff review)
    
  2. Gate Accept Button:

    • Disable if critical violations exist
    • Show tooltip: "Accept blocked: 2 critical policy violations"
    • Enable override button (with confirmation + logging)
  3. Policy Override Dialog:

    Override Policy Violations?
    
    This action will be logged for audit purposes.
    
    Violations:
      • Test pass rate below threshold
      • ROM header modified
    
    Reason (required): [___________________________]
    
    [Cancel] [Override and Accept]
    

Integration Points:

void ProposalDrawer::DrawProposalDetail(const ProposalMetadata& proposal) {
  // ... existing metadata, diff, log sections ...
  
  // Add policy section
  ImGui::Separator();
  if (ImGui::CollapsingHeader("Policy Status", ImGuiTreeNodeFlags_DefaultOpen)) {
    DrawPolicyStatus(proposal.id);
  }
}

void ProposalDrawer::DrawPolicyStatus(const std::string& proposal_id) {
  auto& evaluator = PolicyEvaluator::Instance();
  auto result = evaluator.EvaluateProposal(proposal_id);
  
  if (!result.ok()) {
    ImGui::TextColored(ImVec4(1, 0, 0, 1), "Error evaluating policies");
    return;
  }
  
  const auto& policy_result = *result;
  
  // Show overall status
  if (policy_result.passed) {
    ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
  } else if (policy_result.HasCriticalViolations()) {
    ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations");
  } else {
    ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠ Warnings present");
  }
  
  // List violations
  for (const auto& violation : policy_result.violations) {
    DrawViolation(violation);
  }
}

void ProposalDrawer::AcceptProposal(const std::string& proposal_id) {
  // Evaluate policies before accepting
  auto& evaluator = PolicyEvaluator::Instance();
  auto result = evaluator.EvaluateProposal(proposal_id);
  
  if (result.ok() && result->HasCriticalViolations()) {
    // Show override dialog instead of accepting directly
    show_policy_override_dialog_ = true;
    pending_accept_proposal_id_ = proposal_id;
    return;
  }
  
  // ... existing accept logic ...
}
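
The override dialog requires a reason and promises an audit log entry. A minimal sketch of that step follows, assuming a blank reason must block the override; `OverrideRecord`, `MakeOverrideRecord`, `FormatAuditLine`, and the log line format are all hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical audit record for one policy override.
struct OverrideRecord {
  std::string proposal_id;
  std::string reason;
  int violations_overridden;
};

// Builds a record only when the reason contains non-whitespace text, matching
// the "Reason (required)" field in the dialog mock-up.
std::optional<OverrideRecord> MakeOverrideRecord(const std::string& proposal_id,
                                                 const std::string& reason,
                                                 int violations_overridden) {
  if (reason.find_first_not_of(" \t\r\n") == std::string::npos) {
    return std::nullopt;
  }
  return OverrideRecord{proposal_id, reason, violations_overridden};
}

// Formats the record as a single log line (format chosen for illustration).
std::string FormatAuditLine(const OverrideRecord& record) {
  return "override proposal=" + record.proposal_id +
         " violations=" + std::to_string(record.violations_overridden) +
         " reason=\"" + record.reason + "\"";
}
```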

3.4. Testing and Documentation (1.5 hours)

Test Cases:

  1. Valid proposal (all policies pass)
  2. Test requirement violation
  3. Change constraint violation
  4. Multiple violations
  5. Policy override workflow

Documentation:

  • Create AW-04-POLICY-FRAMEWORK.md with:
    • Policy schema reference
    • Built-in policy examples
    • How to write custom policies
    • Override audit trail
  • Update implementation plan
  • Update ProposalDrawer documentation

Success Criteria:

  • Policies loaded and evaluated correctly
  • UI clearly shows policy status
  • Accept button gated on critical violations
  • Override workflow functional with logging
  • Documentation complete

Timeline Summary

Week of Oct 2-8, 2025:

  • Days 1-2: Priority 1 (E2E Validation)
  • Days 3-4: Priority 2 (CLI Agent Test)
  • Days 5-7: Priority 3 (Policy Framework)

Expected Completion: October 8, 2025

Next After This:

  • Windows cross-platform testing
  • Screenshot implementation
  • Production telemetry (opt-in)
  • Advanced policy features

Success Metrics

By End of Week:

  • Complete proposal workflow validated end-to-end
  • z3ed agent test command operational with 5+ prompt patterns
  • Policy framework implemented and integrated
  • Documentation updated for all new features
  • Zero known blockers for production use

Quality Bar:

  • All code builds cleanly on macOS ARM64
  • No crashes or hangs in normal workflows
  • Error messages helpful and actionable
  • Documentation sufficient for new contributors
  • Ready for Windows testing phase

Last Updated: October 2, 2025
Contributors: @scawful, GitHub Copilot
License: Same as YAZE (see ../../LICENSE)