yaze/docs/z3ed/NEXT_PRIORITIES_OCT2.md
scawful 286efdec6a Enhance ImGuiTestHarness with dynamic test integration and end-to-end validation
- Updated README.md to reflect the completion of IT-01 and the transition to end-to-end validation phase.
- Introduced a new end-to-end test script (scripts/test_harness_e2e.sh) for validating all RPC methods of the ImGuiTestHarness gRPC service.
- Implemented dynamic test functionality in ImGuiTestHarnessService for Type, Wait, and Assert methods, utilizing ImGuiTestEngine.
- Enhanced error handling and response messages for better clarity during test execution.
- Updated existing methods to support dynamic test registration and execution, ensuring robust interaction with the GUI elements.
2025-10-02 00:49:28 -04:00


# z3ed Next Priorities - October 2, 2025
**Current Status**: IT-01 Complete ✅ | AW-03 Complete ✅ | Ready for E2E Validation
This document outlines the immediate next steps for the z3ed agent workflow system after completing IT-01 Phase 3 (ImGuiTestEngine integration).
---
## Priority 1: End-to-End Workflow Validation (ACTIVE) 🔄
**Goal**: Validate the complete AI agent workflow from proposal creation through ROM commit
**Time Estimate**: 2-3 hours
**Status**: Ready to execute
**Blocking**: None - all prerequisites complete
### Why This First?
- Validate all systems work together in production
- Identify any integration issues before building more features
- Establish baseline for acceptable UX and performance
- Document real-world usage patterns for future improvements
### Task Breakdown
#### 1.1. Automated Test Script Validation (30 min)
**Goal**: Verify E2E test script works correctly
```bash
# Run the automated test script
./scripts/test_harness_e2e.sh
# Expected: All 6 tests pass
# - Ping (health check)
# - Click (button interaction)
# - Type (text input)
# - Wait (condition polling)
# - Assert (state validation)
# - Screenshot (stub; expected to return a "not implemented" message)
```
**Success Criteria**:
- Script runs without errors
- All RPCs return success responses
- Server starts and stops cleanly
- No port conflicts or hanging processes
**Troubleshooting**:
- If port 50052 is in use: run `killall yaze` or use a different port
- If `grpcurl` is missing: `brew install grpcurl`
- If the binary is not found: build with `cmake --build build-grpc-test`
#### 1.2. Manual Workflow Testing (60 min)
**Goal**: Test complete proposal lifecycle with real GUI
**Steps**:
1. **Create Proposal via CLI**:
```bash
# Build z3ed
cmake --build build --target z3ed -j8
# Create test proposal with sandbox
./build/bin/z3ed agent run "Test proposal for validation" --sandbox
# Verify proposal created
./build/bin/z3ed agent list
./build/bin/z3ed agent diff --proposal-id <ID>
```
2. **Launch YAZE GUI**:
```bash
./build/bin/yaze.app/Contents/MacOS/yaze
# Open ROM: File → Open ROM → assets/zelda3.sfc
# Open drawer: Debug → Agent Proposals
```
3. **Test ProposalDrawer UI**:
- ✅ Verify proposal appears in list
- ✅ Click proposal to select
- ✅ Review metadata (ID, timestamp, sandbox_id)
- ✅ Review execution log content
- ✅ Review diff content (if any)
- ✅ Test filtering (All/Pending/Accepted/Rejected)
- ✅ Test Refresh button
4. **Test Accept Workflow**:
- ✅ Click "Accept" button
- ✅ Confirm dialog appears
- ✅ Verify ROM marked dirty (save prompt)
- ✅ File → Save ROM
- ✅ Verify proposal status changes to "Accepted"
5. **Test Reject Workflow**:
- ✅ Create another test proposal
- ✅ Click "Reject" button
- ✅ Confirm dialog appears
- ✅ Verify status changes to "Rejected"
- ✅ Verify sandbox ROM unchanged
6. **Test Delete Workflow**:
- ✅ Create another test proposal
- ✅ Click "Delete" button
- ✅ Confirm dialog appears
- ✅ Verify proposal removed from list
- ✅ Verify files cleaned up from disk
**Success Criteria**:
- All workflows complete without crashes
- ROM merging works correctly
- Status updates persist across sessions
- UI responsive and intuitive
**Known Issues to Document**:
- Any UX friction points
- Performance concerns with large diffs
- Edge cases that need handling
#### 1.3. Real Widget Testing (60 min)
**Goal**: Test GUI automation with actual YAZE widgets
**Workflow 1: Open Overworld Editor**:
```bash
# Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Wait for startup
sleep 2

# Test workflow
grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"target":"button:Overworld","type":"LEFT"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click

grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"window_visible:Overworld Editor","timeout_ms":5000}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Wait

grpcurl -plaintext -import-path src/app/core/proto -proto imgui_test_harness.proto \
  -d '{"condition":"visible:Overworld Editor"}' \
  127.0.0.1:50052 yaze.test.ImGuiTestHarness/Assert
```
**Workflow 2: Open Dungeon Editor**:
- Click "button:Dungeon"
- Wait "window_visible:Dungeon Editor"
- Assert "visible:Dungeon Editor"
**Workflow 3: Type in Input Field** (if applicable):
- Click "input:FieldName"
- Type text with clear_first
- Assert text_contains (partial implementation)
**Success Criteria**:
- All real widgets respond to automation
- Timeouts work correctly (5s default)
- Error messages helpful when widgets not found
- No crashes or hangs during automation
**Document**:
- Widget naming conventions (button:Name, window:Name, input:Name)
- Common timeout values needed
- Edge cases (disabled buttons, hidden windows, etc.)
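
The `kind:Name` convention listed above (`button:Overworld`, `window_visible:Overworld Editor`) can be split with a small helper. This is a minimal sketch that assumes splitting on the first colon only; `ParsedTarget` and `ParseTarget` are hypothetical names for illustration, not part of the YAZE codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper type: the two halves of a "kind:Name" target string.
struct ParsedTarget {
  std::string kind;  // e.g. "button", "window_visible"
  std::string name;  // e.g. "Overworld Editor"
};

// Splits on the FIRST colon only, so the name part may itself contain colons
// (as in "text_contains:Filename:zelda3.sfc"). Returns nullopt when the colon
// is missing or either half is empty.
inline std::optional<ParsedTarget> ParseTarget(std::string_view target) {
  const std::size_t colon = target.find(':');
  if (colon == std::string_view::npos || colon == 0 ||
      colon + 1 == target.size()) {
    return std::nullopt;
  }
  ParsedTarget parsed{std::string(target.substr(0, colon)),
                      std::string(target.substr(colon + 1))};
  // Postcondition guaranteed by the checks above.
  assert(!parsed.kind.empty() && !parsed.name.empty());
  return parsed;
}
```

Rejecting malformed targets up front is one way to produce the "helpful error when a widget is not found" behavior before any RPC is sent.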
#### 1.4. Documentation Updates (30 min)
**Goal**: Capture learnings and update guides
**Files to Update**:
1. **IT-01-QUICKSTART.md**:
- Add real widget examples
- Document common workflows
- Add troubleshooting for real scenarios
2. **E6-z3ed-implementation-plan.md**:
- Mark Priority 1 as complete
- Add lessons learned section
- Update known limitations
3. **STATE_SUMMARY_2025-10-02.md**:
- Add E2E validation results
- Update status metrics
- Document performance characteristics
**Success Criteria**:
- New users can follow guides without getting stuck
- Common issues documented with solutions
- Real-world examples added
---
## Priority 2: CLI Agent Test Command (IT-02) 📋
**Goal**: Natural language prompt → automated GUI test workflow
**Time Estimate**: 4-6 hours
**Status**: Ready to start after Priority 1
**Blocking Dependency**: Priority 1 completion
### Why This Next?
- Enables AI agents to drive YAZE GUI automatically
- Makes GUI automation accessible via simple CLI commands
- Provides foundation for complex multi-step workflows
- Demonstrates value of IT-01 infrastructure
### Design Overview
```
User Input:
  z3ed agent test --prompt "Open Overworld editor and verify it loads"

Workflow:
  1. Parse prompt → identify intent (open editor, verify visibility)
  2. Generate RPC sequence:
     - Click "button:Overworld"
     - Wait "window_visible:Overworld Editor" (5s timeout)
     - Assert "visible:Overworld Editor"
  3. Execute RPCs via gRPC client
  4. Capture results and report
  5. Optional: Screenshot for LLM feedback

Output:
  ✓ Clicked button:Overworld (85ms)
  ✓ Waited for window:Overworld Editor (1234ms)
  ✓ Asserted visible:Overworld Editor (12ms)

  Test passed in 1.331s
```
### Implementation Tasks
#### 2.1. Create gRPC Client Library (2 hours)
**Files**:
- `src/cli/service/gui_automation_client.h`
- `src/cli/service/gui_automation_client.cc`
**Interface**:
```cpp
class GuiAutomationClient {
 public:
  static GuiAutomationClient& Instance();

  absl::Status Connect(const std::string& host, int port);

  absl::StatusOr<PingResponse> Ping(const std::string& message);
  absl::StatusOr<ClickResponse> Click(const std::string& target, ClickType type);
  absl::StatusOr<TypeResponse> Type(const std::string& target,
                                    const std::string& text,
                                    bool clear_first);
  absl::StatusOr<WaitResponse> Wait(const std::string& condition,
                                    int timeout_ms,
                                    int poll_interval_ms);
  absl::StatusOr<AssertResponse> Assert(const std::string& condition);
  absl::StatusOr<ScreenshotResponse> Screenshot(const std::string& region,
                                                const std::string& format);

 private:
  std::unique_ptr<yaze::test::ImGuiTestHarness::Stub> stub_;
};
```
**Implementation Notes**:
- Use gRPC C++ client API
- Handle connection errors gracefully
- Support timeout configuration
- Return structured results (not raw proto messages)
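
The `timeout_ms` and `poll_interval_ms` parameters of `Wait` suggest a deadline-based polling loop. The sketch below shows one common way to honor such a pair using only the standard library. `PollUntil` is a hypothetical name, and this document does not say whether polling happens in the client or inside the harness, so treat this as an illustration of the semantics rather than the harness implementation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` every `poll_interval_ms` until it returns true or
// `timeout_ms` elapses. The condition is always checked at least once, even
// with a zero timeout. Returns true if the condition was met in time.
inline bool PollUntil(const std::function<bool()>& condition, int timeout_ms,
                      int poll_interval_ms) {
  assert(timeout_ms >= 0 && poll_interval_ms > 0);
  // steady_clock is monotonic, so wall-clock adjustments cannot shorten or
  // extend the wait.
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + std::chrono::milliseconds(timeout_ms);
  while (true) {
    if (condition()) return true;
    if (Clock::now() >= deadline) return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(poll_interval_ms));
  }
}
```

Checking the condition before the deadline test means a widget that is already visible succeeds immediately instead of paying one poll interval.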
#### 2.2. Create Test Workflow Generator (1.5 hours)
**Files**:
- `src/cli/service/test_workflow_generator.h`
- `src/cli/service/test_workflow_generator.cc`
**Interface**:
```cpp
struct TestStep {
  enum Type { kClick, kType, kWait, kAssert, kScreenshot };
  Type type;
  std::string target;
  std::string value;
  int timeout_ms = 5000;
};

struct TestWorkflow {
  std::string description;
  std::vector<TestStep> steps;
};

class TestWorkflowGenerator {
 public:
  static absl::StatusOr<TestWorkflow> GenerateFromPrompt(
      const std::string& prompt);

 private:
  static absl::StatusOr<TestWorkflow> ParseSimplePrompt(
      const std::string& prompt);
  static absl::StatusOr<TestWorkflow> ParseComplexPrompt(
      const std::string& prompt);
};
```
**Supported Prompt Patterns**:
1. **Simple Open**: "Open Overworld editor"
- Click "button:Overworld"
- Wait "window_visible:Overworld Editor"
2. **Open and Verify**: "Open Dungeon editor and verify it loads"
- Click "button:Dungeon"
- Wait "window_visible:Dungeon Editor"
- Assert "visible:Dungeon Editor"
3. **Type and Validate**: "Type 'zelda3.sfc' in filename input"
- Click "input:Filename"
- Type "zelda3.sfc" with clear_first
- Assert "text_contains:Filename:zelda3.sfc"
4. **Multi-Step**: "Open Overworld, click tile, verify properties panel"
- Click "button:Overworld"
- Wait "window_visible:Overworld Editor"
- Click "canvas:Overworld" (x, y coordinates)
- Wait "window_visible:Properties"
**Implementation Strategy**:
- Start with simple regex/pattern matching
- Add more complex patterns iteratively
- Return error for unsupported prompts
- Suggest valid alternatives
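
As an illustration of the regex-first strategy, the sketch below handles only the "Simple Open" and "Open and Verify" patterns. It is stdlib-only, so it returns `std::optional` where the interface above uses `absl::StatusOr<TestWorkflow>`; the regular expression and the trimmed `TestStep` are assumptions made for this example.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Trimmed copy of the TestStep struct from the interface above.
struct TestStep {
  enum Type { kClick, kWait, kAssert };
  Type type;
  std::string target;
  int timeout_ms = 5000;
};

// Handles "Open <Name> editor" with an optional "and verify ..." tail.
// Returns nullopt for anything else so the caller can report an unsupported
// prompt. The editor name is used exactly as typed (case is preserved).
inline std::optional<std::vector<TestStep>> ParseSimplePrompt(
    const std::string& prompt) {
  static const std::regex kOpenEditor(
      R"(^\s*open\s+(\w+)\s+editor(\s+and\s+verify\b.*)?\s*$)",
      std::regex::icase);
  std::smatch match;
  if (!std::regex_match(prompt, match, kOpenEditor)) return std::nullopt;
  assert(match.size() == 3);  // whole match plus two capture groups

  const std::string editor = match[1].str();  // e.g. "Overworld"
  std::vector<TestStep> steps = {
      {TestStep::kClick, "button:" + editor},
      {TestStep::kWait, "window_visible:" + editor + " Editor"},
  };
  if (match[2].matched) {
    steps.push_back({TestStep::kAssert, "visible:" + editor + " Editor"});
  }
  return steps;
}
```

Prompts that fall outside the pattern, such as the multi-step example, return `nullopt` here; that is the point at which a fuller generator would return an error and suggest valid alternatives.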
#### 2.3. Implement `z3ed agent test` Command (1.5 hours)
**Files**:
- `src/cli/handlers/agent.cc` (add `HandleTestCommand`)
- Update `src/cli/modern_cli.cc` routing
**Command Interface**:
```bash
z3ed agent test --prompt "..." [--host localhost] [--port 50052] [--timeout 30s]
```
**Implementation**:
```cpp
absl::Status HandleTestCommand(const AgentOptions& options) {
  // 1. Parse prompt → workflow
  auto workflow_result = TestWorkflowGenerator::GenerateFromPrompt(
      options.prompt);
  if (!workflow_result.ok()) {
    return workflow_result.status();
  }
  TestWorkflow workflow = std::move(*workflow_result);

  // 2. Connect to test harness
  auto& client = GuiAutomationClient::Instance();
  auto status = client.Connect(options.host, options.port);
  if (!status.ok()) {
    return status;
  }

  // 3. Execute workflow steps
  for (const auto& step : workflow.steps) {
    // ExecuteStep is dereferenced below, so it must return a StatusOr.
    auto result = ExecuteStep(client, step);
    if (!result.ok()) {
      // Return the underlying Status; a StatusOr cannot be returned directly
      // from a function whose return type is absl::Status.
      return result.status();
    }
    PrintStepResult(step, *result);
  }

  std::cout << "\nTest passed!\n";
  return absl::OkStatus();
}
```
**Output Format**:
- Progress indicators for each step
- Execution time per step
- Success/failure status
- Error messages with context
- Final summary
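
A sketch of how the per-step lines and final summary could be formatted, matching the sample output in the Design Overview. `StepReport`, `FormatStepLine`, and `FormatSummary` are hypothetical names, and the failure wording ("Test failed after") is an assumption, since this document only shows the passing case.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of one executed step.
struct StepReport {
  std::string description;  // e.g. "Clicked button:Overworld"
  int elapsed_ms;
  bool ok;
};

// Produces e.g. "✓ Clicked button:Overworld (85ms)".
inline std::string FormatStepLine(const StepReport& step) {
  assert(step.elapsed_ms >= 0);
  return std::string(step.ok ? "✓ " : "✗ ") + step.description + " (" +
         std::to_string(step.elapsed_ms) + "ms)";
}

// Sums step times and reports them in seconds with millisecond precision.
// Integer arithmetic avoids floating-point rounding in the printed total.
inline std::string FormatSummary(const std::vector<StepReport>& steps) {
  long total_ms = 0;
  bool all_ok = true;
  for (const StepReport& step : steps) {
    total_ms += step.elapsed_ms;
    all_ok = all_ok && step.ok;
  }
  char seconds[32];
  std::snprintf(seconds, sizeof(seconds), "%ld.%03lds", total_ms / 1000,
                total_ms % 1000);
  return std::string(all_ok ? "Test passed in " : "Test failed after ") +
         seconds;
}
```

With the three sample steps (85 ms, 1234 ms, 12 ms) the summary reads "Test passed in 1.331s".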
#### 2.4. Testing and Documentation (1 hour)
**Test Cases**:
1. Simple open editor test
2. Multi-step workflow test
3. Timeout handling test
4. Connection error test
5. Invalid widget test
**Documentation**:
- Add IT-02 completion doc
- Update implementation plan
- Add examples to IT-01-QUICKSTART.md
- Update resource catalog with `agent test` command
**Success Criteria**:
- `z3ed agent test` works with 5+ different prompts
- Error messages helpful for debugging
- Documentation complete with examples
- Ready for AI agent integration
---
## Priority 3: Policy Evaluation Framework (AW-04) 📋
**Goal**: YAML-based constraint system for gating proposal acceptance
**Time Estimate**: 6-8 hours
**Status**: Can work in parallel with Priority 2
**Blocking Dependency**: None (UI integration requires AW-03, which is already complete)
### Why This Matters
- Prevents dangerous/unwanted changes from being accepted
- Enforces project-specific constraints (byte limits, bank restrictions)
- Requires test coverage before acceptance
- Provides audit trail for policy violations
### Design Overview
**Policy Configuration** (`.yaze/policies/agent.yaml`):
```yaml
version: 1.0
policies:
  # Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0

  # Change Constraints
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E]  # Graphics banks only
      - forbidden_ranges:
          - start: 0xFFB0  # ROM header
            end: 0xFFFF
          - start: 0x0000  # System RAM
            end: 0x1FFF

  # Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    rules:
      - if: bytes_changed > 1024
        then: require_diff_review
      - if: commands_executed > 10
        then: require_log_review
      - if: new_files_created
        then: require_approval

  # CVE Checks
  - name: security_validation
    type: security_check
    enabled: true
    severity: critical
    rules:
      - check: no_known_cves
        message: "Dependencies must not have known CVEs"
      - check: checksum_valid
        message: "ROM checksum must be valid after changes"
```
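
The `if:` strings under `human_review_required` need a small evaluator. The sketch below is one possible reading of that syntax: a whitespace-separated `<metric> > <number>` comparison, or a bare flag name that is true when non-zero. The grammar, the `EvaluateCondition` name, and the use of a plain metric map are all assumptions; this document does not define the condition language.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Evaluates "<metric> > <threshold>" or a bare "<flag>" against a map of
// proposal metrics. Tokens must be separated by whitespace. Returns nullopt
// for unknown metrics, unsupported operators, or trailing input, so the
// caller can reject the policy file instead of silently passing.
inline std::optional<bool> EvaluateCondition(
    const std::string& condition, const std::map<std::string, long>& metrics) {
  std::istringstream in(condition);
  std::string metric;
  if (!(in >> metric)) return std::nullopt;
  const auto it = metrics.find(metric);
  if (it == metrics.end()) return std::nullopt;

  std::string op;
  if (!(in >> op)) return it->second != 0;  // bare flag: non-zero is true

  long threshold = 0;
  std::string trailing;
  if (op != ">" || !(in >> threshold) || (in >> trailing)) {
    return std::nullopt;
  }
  return it->second > threshold;
}
```

Returning `nullopt` rather than `false` for malformed input keeps a typo in a policy file from being mistaken for a rule that did not fire.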
### Implementation Tasks
#### 3.1. Policy Schema and Parser (2 hours)
**Files**:
- `src/cli/service/policy_evaluator.h`
- `src/cli/service/policy_evaluator.cc`
- `.yaze/policies/agent.yaml` (example)
**Data Structures**:
```cpp
enum class PolicySeverity { kCritical, kWarning, kInfo };

enum class PolicyType {
  kTestRequirement,
  kChangeConstraint,
  kReviewRequirement,
  kSecurityCheck
};

struct PolicyRule {
  std::string condition;
  std::string action;
  std::map<std::string, std::string> parameters;
};

struct Policy {
  std::string name;
  PolicyType type;
  PolicySeverity severity;
  bool enabled;
  std::vector<PolicyRule> rules;
};

struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
  std::string actual_value;
  std::string expected_value;
};

struct PolicyResult {
  bool passed;
  std::vector<PolicyViolation> violations;

  bool HasCriticalViolations() const;
  bool HasWarnings() const;
};
```
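
`HasCriticalViolations()` and `HasWarnings()` are declared above without bodies. A minimal sketch of how they might be implemented follows, with the structs trimmed to the fields used here; the private `Has` helper is a hypothetical addition.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class PolicySeverity { kCritical, kWarning, kInfo };

// Trimmed copy of PolicyViolation from the data structures above.
struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
};

struct PolicyResult {
  bool passed = true;
  std::vector<PolicyViolation> violations;

  bool HasCriticalViolations() const { return Has(PolicySeverity::kCritical); }
  bool HasWarnings() const { return Has(PolicySeverity::kWarning); }

 private:
  // True if at least one recorded violation has the given severity.
  bool Has(PolicySeverity severity) const {
    return std::any_of(violations.begin(), violations.end(),
                       [severity](const PolicyViolation& v) {
                         return v.severity == severity;
                       });
  }
};
```

Deriving both answers from `violations` keeps them consistent with the list the UI displays, rather than tracking separate flags that could drift.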
**YAML Parsing**:
- Use `yaml-cpp` library (already in vcpkg)
- Parse policy file on startup
- Validate schema (version, required fields)
- Cache parsed policies in memory
#### 3.2. Policy Evaluation Engine (2.5 hours)
**Interface**:
```cpp
class PolicyEvaluator {
 public:
  static PolicyEvaluator& Instance();

  absl::Status LoadPolicies(const std::string& policy_dir = ".yaze/policies");
  absl::StatusOr<PolicyResult> EvaluateProposal(const std::string& proposal_id);

 private:
  absl::StatusOr<PolicyResult> EvaluateTestRequirements(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateChangeConstraints(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateReviewRequirements(
      const ProposalMetadata& proposal);
  absl::StatusOr<PolicyResult> EvaluateSecurityChecks(
      const ProposalMetadata& proposal);

  std::vector<Policy> policies_;
};
```
**Evaluation Logic**:
1. Load proposal metadata (bytes changed, commands executed, etc.)
2. Load proposal diff (for bank/range analysis)
3. For each enabled policy:
- Evaluate all rules
- Collect violations
- Determine overall pass/fail
4. Return structured result
**Example Evaluations**:
- **Test Requirements**: Check if test results exist and meet thresholds
- **Change Constraints**: Analyze diff for byte count, bank ranges, forbidden areas
- **Review Requirements**: Check metadata (bytes, commands, files)
- **Security Checks**: Run ROM validation, checksum verification
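
The change-constraint evaluation can be sketched as a scan over the modified offsets. The example below checks `max_bytes_changed` and `forbidden_ranges` only, and assumes the diff can be reduced to one offset per changed byte; the real diff format is not specified in this document, and `AddressRange`, `ChangeConstraints`, and `CheckChangeConstraints` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct AddressRange {
  std::uint32_t start;
  std::uint32_t end;  // inclusive, matching the YAML example
};

struct ChangeConstraints {
  std::size_t max_bytes_changed;
  std::vector<AddressRange> forbidden_ranges;
};

// Returns one human-readable message per violated constraint.
// `changed_offsets` holds one entry per modified byte (an assumption).
inline std::vector<std::string> CheckChangeConstraints(
    const std::vector<std::uint32_t>& changed_offsets,
    const ChangeConstraints& constraints) {
  std::vector<std::string> violations;
  if (changed_offsets.size() > constraints.max_bytes_changed) {
    violations.push_back(std::to_string(changed_offsets.size()) +
                         " bytes changed > " +
                         std::to_string(constraints.max_bytes_changed));
  }
  for (const AddressRange& range : constraints.forbidden_ranges) {
    assert(range.start <= range.end);
    for (std::uint32_t offset : changed_offsets) {
      if (offset >= range.start && offset <= range.end) {
        char text[64];
        std::snprintf(text, sizeof(text),
                      "Forbidden range modified: 0x%04X-0x%04X",
                      static_cast<unsigned>(range.start),
                      static_cast<unsigned>(range.end));
        violations.push_back(text);
        break;  // report each forbidden range at most once
      }
    }
  }
  return violations;
}
```

Each message could then be wrapped in a `PolicyViolation` carrying the severity of the policy that owns the rule.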
#### 3.3. ProposalDrawer Integration (2 hours)
**Files**:
- `src/app/editor/system/proposal_drawer.cc` (update)
**UI Changes**:
1. **Add Policy Status Section** (in detail view):
```
Policy Status: [✓ Passed | ⚠ Warnings | ⛔ Failed]
Critical Issues:
⛔ Test pass rate 85% < 95% (overworld_rendering)
⛔ Forbidden range modified: 0xFFB0-0xFFFF (ROM header)
Warnings:
⚠ 2048 bytes changed > 1024 (requires diff review)
```
2. **Gate Accept Button**:
- Disable if critical violations exist
- Show tooltip: "Accept blocked: 2 critical policy violations"
- Enable override button (with confirmation + logging)
3. **Policy Override Dialog**:
```
Override Policy Violations?
This action will be logged for audit purposes.
Violations:
• Test pass rate below threshold
• ROM header modified
Reason (required): [___________________________]
[Cancel] [Override and Accept]
```
**Integration Points**:
```cpp
void ProposalDrawer::DrawProposalDetail(const ProposalMetadata& proposal) {
  // ... existing metadata, diff, log sections ...

  // Add policy section
  ImGui::Separator();
  if (ImGui::CollapsingHeader("Policy Status", ImGuiTreeNodeFlags_DefaultOpen)) {
    DrawPolicyStatus(proposal.id);
  }
}

void ProposalDrawer::DrawPolicyStatus(const std::string& proposal_id) {
  auto& evaluator = PolicyEvaluator::Instance();
  auto result = evaluator.EvaluateProposal(proposal_id);

  if (!result.ok()) {
    ImGui::TextColored(ImVec4(1, 0, 0, 1), "Error evaluating policies");
    return;
  }

  const auto& policy_result = *result;

  // Show overall status
  if (policy_result.passed) {
    ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
  } else if (policy_result.HasCriticalViolations()) {
    ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations");
  } else {
    ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠ Warnings present");
  }

  // List violations
  for (const auto& violation : policy_result.violations) {
    DrawViolation(violation);
  }
}

void ProposalDrawer::AcceptProposal(const std::string& proposal_id) {
  // Evaluate policies before accepting
  auto& evaluator = PolicyEvaluator::Instance();
  auto result = evaluator.EvaluateProposal(proposal_id);

  // NOTE: if evaluation itself fails (!result.ok()), control falls through to
  // the accept path below. Decide explicitly whether that is acceptable or
  // whether accept should be blocked when policies cannot be evaluated.
  if (result.ok() && result->HasCriticalViolations()) {
    // Show override dialog instead of accepting directly
    show_policy_override_dialog_ = true;
    pending_accept_proposal_id_ = proposal_id;
    return;
  }

  // ... existing accept logic ...
}
```
#### 3.4. Testing and Documentation (1.5 hours)
**Test Cases**:
1. Valid proposal (all policies pass)
2. Test requirement violation
3. Change constraint violation
4. Multiple violations
5. Policy override workflow
**Documentation**:
- Create AW-04-POLICY-FRAMEWORK.md with:
- Policy schema reference
- Built-in policy examples
- How to write custom policies
- Override audit trail
- Update implementation plan
- Update ProposalDrawer documentation
**Success Criteria**:
- Policies loaded and evaluated correctly
- UI clearly shows policy status
- Accept button gated on critical violations
- Override workflow functional with logging
- Documentation complete
---
## Timeline Summary
**Week of Oct 2-8, 2025**:
- Days 1-2: Priority 1 (E2E Validation)
- Days 3-4: Priority 2 (CLI Agent Test)
- Days 5-7: Priority 3 (Policy Framework)
**Expected Completion**: October 8, 2025
**Next After This**:
- Windows cross-platform testing
- Screenshot implementation
- Production telemetry (opt-in)
- Advanced policy features
---
## Success Metrics
**By End of Week**:
- ✅ Complete proposal workflow validated end-to-end
- ✅ `z3ed agent test` command operational with 5+ prompt patterns
- ✅ Policy framework implemented and integrated
- ✅ Documentation updated for all new features
- ✅ Zero known blockers for production use
**Quality Bar**:
- All code builds cleanly on macOS ARM64
- No crashes or hangs in normal workflows
- Error messages helpful and actionable
- Documentation sufficient for new contributors
- Ready for Windows testing phase
---
**Last Updated**: October 2, 2025
**Contributors**: @scawful, GitHub Copilot
**License**: Same as YAZE (see ../../LICENSE)