Update documentation

scawful
2025-10-02 20:55:28 -04:00
parent e3621d7a1f
commit 0fb8ba4202
9 changed files with 1059 additions and 1997 deletions

@@ -1,627 +0,0 @@
# Policy Evaluation Framework (AW-04)
**Status**: Implementation In Progress
**Priority**: High (Next Phase)
**Time Estimate**: 6-8 hours
**Last Updated**: October 2, 2025
## Overview
The Policy Evaluation Framework provides a YAML-based constraint system for gating proposal acceptance in the z3ed agent workflow. It ensures that AI-generated ROM modifications meet quality, safety, and testing requirements before being merged into the main ROM.
## Goals
1. **Quality Gates**: Enforce minimum test pass rates and code quality standards
2. **Safety Constraints**: Prevent modifications to critical ROM regions (headers, checksums)
3. **Scope Limits**: Restrict changes to reasonable byte counts and specific banks
4. **Human Review**: Require manual review for large or complex changes
5. **Flexibility**: Allow policy overrides with confirmation and logging
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ ProposalDrawer (GUI) │
│ └─ Accept button gated by PolicyEvaluator │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ PolicyEvaluator (Singleton Service) │
│ ├─ LoadPolicies() from .yaze/policies/ │
│ ├─ EvaluateProposal(proposal_id) → PolicyResult │
│ └─ Cache of parsed YAML policies │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ .yaze/policies/agent.yaml (YAML Configuration) │
│ ├─ test_requirements (min pass rates) │
│ ├─ change_constraints (byte limits, allowed banks) │
│ ├─ review_requirements (human review triggers) │
│ └─ forbidden_ranges (protected ROM regions) │
└─────────────────────────────────────────────────────────┘
```
## YAML Policy Schema
### Example Policy File
```yaml
# .yaze/policies/agent.yaml
version: 1.0
enabled: true

policies:
  # Policy 1: Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
      - test_suite: "dungeon_logic"
        min_pass_rate: 0.90
    message: "All required test suites must pass before accepting proposal"

  # Policy 2: Change Scope Limits
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E, 0x0F]  # Graphics banks only
      - max_commands_executed: 20
    message: "Proposal exceeds allowed change scope"

  # Policy 3: Protected ROM Regions
  - name: protect_critical_regions
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0  # ROM header
        end: 0xFFFF
        reason: "ROM header is protected"
      - start: 0x00FFC0  # Internal header
        end: 0x00FFDF
        reason: "Internal ROM header"
    message: "Proposal modifies protected ROM region"

  # Policy 4: Human Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    conditions:
      - if: bytes_changed > 1024
        then: require_diff_review
        message: "Large change requires diff review"
      - if: commands_executed > 10
        then: require_log_review
        message: "Complex operation requires log review"
      - if: test_failures > 0
        then: require_explanation
        message: "Test failures require explanation"

  # Policy 5: Palette Modifications
  - name: palette_safety
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_palettes_changed: 5
      - preserve_transparency: true  # Don't modify color index 0
    message: "Palette changes exceed safety threshold"
```
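The `limit_change_scope` rules above could be checked roughly as follows. This is a minimal sketch, not the yaze implementation: `ChangeSummary`, `ScopeLimits`, and `CheckChangeScope` are hypothetical names, and it assumes the caller already knows which banks a proposal touched.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what a proposal changed (not a yaze type).
struct ChangeSummary {
  int bytes_changed = 0;
  int commands_executed = 0;
  std::vector<int> banks_touched;
};

// Mirrors the rules of a change_constraint policy; -1 means "no limit".
struct ScopeLimits {
  int max_bytes_changed = -1;
  int max_commands_executed = -1;
  std::vector<int> allowed_banks;  // empty means "any bank is allowed"
};

// Returns one human-readable reason per violated rule (empty if in scope).
std::vector<std::string> CheckChangeScope(const ChangeSummary& change,
                                          const ScopeLimits& limits) {
  assert(change.bytes_changed >= 0 && change.commands_executed >= 0);
  std::vector<std::string> reasons;
  if (limits.max_bytes_changed >= 0 &&
      change.bytes_changed > limits.max_bytes_changed) {
    reasons.push_back("bytes_changed " + std::to_string(change.bytes_changed) +
                      " exceeds limit " +
                      std::to_string(limits.max_bytes_changed));
  }
  if (limits.max_commands_executed >= 0 &&
      change.commands_executed > limits.max_commands_executed) {
    reasons.push_back("commands_executed " +
                      std::to_string(change.commands_executed) +
                      " exceeds limit " +
                      std::to_string(limits.max_commands_executed));
  }
  if (!limits.allowed_banks.empty()) {
    for (int bank : change.banks_touched) {
      const bool allowed =
          std::find(limits.allowed_banks.begin(), limits.allowed_banks.end(),
                    bank) != limits.allowed_banks.end();
      if (!allowed) {
        reasons.push_back("bank " + std::to_string(bank) +
                          " is not in allowed_banks");
      }
    }
  }
  return reasons;
}
```

Returning a list of reasons rather than a single boolean lets each reason be surfaced in the `details` field of a violation.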
### Schema Definition
```yaml
# Policy file structure
version: string       # Semantic version (e.g., "1.0")
enabled: boolean      # Master enable/disable
policies:
  - name: string        # Unique policy identifier
    type: enum          # test_requirement | change_constraint | forbidden_range | review_requirement
    enabled: boolean    # Policy-specific enable/disable
    severity: enum      # critical | warning | info
    # Type-specific fields:
    rules: array        # For test_requirement, change_constraint
    ranges: array       # For forbidden_range
    conditions: array   # For review_requirement
    message: string     # User-facing error message
```
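The `severity` enum in the schema maps naturally onto the three severity levels. The sketch below shows one way the strings might be converted; `ParseSeverity` and `BlocksAcceptance` are hypothetical helper names, and rejecting unknown strings (instead of defaulting) is an assumption about the desired behavior.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Same three levels as the schema: critical | warning | info.
enum class PolicySeverity { kInfo, kWarning, kCritical };

// Converts a schema string to a severity; unknown values yield nullopt so
// the caller can report a malformed policy file instead of guessing.
std::optional<PolicySeverity> ParseSeverity(const std::string& text) {
  if (text == "info") return PolicySeverity::kInfo;
  if (text == "warning") return PolicySeverity::kWarning;
  if (text == "critical") return PolicySeverity::kCritical;
  return std::nullopt;
}

// Only critical violations block acceptance; warnings can be overridden.
bool BlocksAcceptance(PolicySeverity severity) {
  return severity == PolicySeverity::kCritical;
}
```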
## Implementation Plan
### Phase 1: Core Infrastructure (2 hours)
#### 1.1 Create PolicyEvaluator Service
**File**: `src/cli/service/policy_evaluator.h`
```cpp
#ifndef YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#define YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#include <string>
#include <vector>
#include <memory>
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "absl/strings/string_view.h"
namespace yaze {
namespace cli {
// Policy violation severity levels
enum class PolicySeverity {
kInfo, // Informational, doesn't block acceptance
kWarning, // Warning, can be overridden
kCritical // Critical, blocks acceptance
};
// Individual policy violation
struct PolicyViolation {
std::string policy_name;
PolicySeverity severity;
std::string message;
std::string details; // Additional context
};
// Result of policy evaluation
struct PolicyResult {
bool passed; // True if all critical policies passed
std::vector<PolicyViolation> violations;
// Categorized violations
std::vector<PolicyViolation> critical_violations;
std::vector<PolicyViolation> warnings;
std::vector<PolicyViolation> info;
// Helper methods
bool has_critical_violations() const { return !critical_violations.empty(); }
bool can_accept_with_override() const {
return !has_critical_violations() && !warnings.empty();
}
};
// Singleton service for evaluating proposals against policies
class PolicyEvaluator {
public:
static PolicyEvaluator& GetInstance();
// Load policies from disk (.yaze/policies/agent.yaml)
absl::Status LoadPolicies(absl::string_view policy_dir = ".yaze/policies");
// Evaluate a proposal against all loaded policies
absl::StatusOr<PolicyResult> EvaluateProposal(
absl::string_view proposal_id);
// Reload policies from disk (for live editing)
absl::Status ReloadPolicies();
// Check if policies are loaded and enabled
bool IsEnabled() const { return enabled_; }
// Get policy configuration path
std::string GetPolicyPath() const { return policy_path_; }
private:
PolicyEvaluator() = default;
~PolicyEvaluator() = default;
// Non-copyable, non-movable
PolicyEvaluator(const PolicyEvaluator&) = delete;
PolicyEvaluator& operator=(const PolicyEvaluator&) = delete;
// Parse YAML policy file
absl::Status ParsePolicyFile(absl::string_view yaml_content);
// Evaluate individual policy types
void EvaluateTestRequirements(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateChangeConstraints(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateForbiddenRanges(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateReviewRequirements(
absl::string_view proposal_id, PolicyResult* result);
bool enabled_ = false;
std::string policy_path_;
// Parsed policy structures (implementation detail)
struct PolicyConfig;
std::unique_ptr<PolicyConfig> config_;
};
} // namespace cli
} // namespace yaze
#endif // YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
```
#### 1.2 Create Policy Configuration Structures
**File**: `src/cli/service/policy_evaluator.cc` (partial)
```cpp
#include "src/cli/service/policy_evaluator.h"
#include <fstream>
#include <memory>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>
#include "absl/strings/str_format.h"
#include "src/cli/service/proposal_registry.h"
// If YAML parsing is available
#ifdef YAZE_WITH_YAML
#include <yaml-cpp/yaml.h>
#endif
namespace yaze {
namespace cli {
// Internal policy configuration structures
struct PolicyEvaluator::PolicyConfig {
std::string version;
bool enabled;
struct TestRequirement {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::pair<std::string, double>> test_suites; // suite name → min pass rate
std::string message;
};
struct ChangeConstraint {
std::string name;
bool enabled;
PolicySeverity severity;
int max_bytes_changed = -1;
std::vector<int> allowed_banks;
int max_commands_executed = -1;
int max_palettes_changed = -1;
bool preserve_transparency = false;
std::string message;
};
struct ForbiddenRange {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::tuple<int, int, std::string>> ranges; // start, end, reason
std::string message;
};
struct ReviewRequirement {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::string> conditions;
std::string message;
};
std::vector<TestRequirement> test_requirements;
std::vector<ChangeConstraint> change_constraints;
std::vector<ForbiddenRange> forbidden_ranges;
std::vector<ReviewRequirement> review_requirements;
};
// Singleton instance
PolicyEvaluator& PolicyEvaluator::GetInstance() {
static PolicyEvaluator instance;
return instance;
}
absl::Status PolicyEvaluator::LoadPolicies(absl::string_view policy_dir) {
policy_path_ = absl::StrFormat("%s/agent.yaml", policy_dir);
// Check if file exists
std::ifstream file(policy_path_);
if (!file.good()) {
// No policy file - policies disabled
enabled_ = false;
return absl::OkStatus();
}
// Read file content
std::stringstream buffer;
buffer << file.rdbuf();
std::string yaml_content = buffer.str();
return ParsePolicyFile(yaml_content);
}
absl::Status PolicyEvaluator::ParsePolicyFile(absl::string_view yaml_content) {
#ifndef YAZE_WITH_YAML
return absl::UnimplementedError(
"YAML support not compiled. Build with YAZE_WITH_YAML=ON");
#else
try {
YAML::Node root = YAML::Load(std::string(yaml_content));
config_ = std::make_unique<PolicyConfig>();
config_->version = root["version"].as<std::string>("1.0");
config_->enabled = root["enabled"].as<bool>(true);
if (!config_->enabled) {
enabled_ = false;
return absl::OkStatus();
}
// Parse policies array
if (root["policies"]) {
for (const auto& policy_node : root["policies"]) {
std::string type = policy_node["type"].as<std::string>();
if (type == "test_requirement") {
// Parse test requirement policy
// ... (implementation continues)
} else if (type == "change_constraint") {
// Parse change constraint policy
// ... (implementation continues)
} else if (type == "forbidden_range") {
// Parse forbidden range policy
// ... (implementation continues)
} else if (type == "review_requirement") {
// Parse review requirement policy
// ... (implementation continues)
}
}
}
enabled_ = true;
return absl::OkStatus();
} catch (const YAML::Exception& e) {
return absl::InvalidArgumentError(
absl::StrFormat("Failed to parse policy YAML: %s", e.what()));
}
#endif
}
// ... (implementation continues with evaluation methods)
} // namespace cli
} // namespace yaze
```
### Phase 2: Policy Evaluation Logic (2-3 hours)
Implement the core evaluation methods that check proposals against each policy type.
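As an illustration of one such evaluation method, a forbidden-range check reduces to an interval-overlap test. This is a sketch under assumptions: `ByteRange` and `FindForbiddenWrites` are hypothetical names, range bounds are treated as inclusive, and how a proposal exposes its modified spans is not specified in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A protected region from a forbidden_range policy (bounds inclusive).
struct ForbiddenRange {
  uint32_t start;
  uint32_t end;
  std::string reason;
};

// Hypothetical description of one contiguous write made by a proposal.
struct ByteRange {
  uint32_t start;
  uint32_t length;
};

// True when the write touches at least one byte of the protected region.
bool Overlaps(const ByteRange& change, const ForbiddenRange& range) {
  if (change.length == 0) return false;
  // Use 64-bit arithmetic so start + length cannot wrap around.
  const uint64_t change_last =
      static_cast<uint64_t>(change.start) + change.length - 1;
  return change.start <= range.end && change_last >= range.start;
}

// Collects the reason of every protected region hit by any write.
std::vector<std::string> FindForbiddenWrites(
    const std::vector<ByteRange>& changes,
    const std::vector<ForbiddenRange>& ranges) {
  std::vector<std::string> reasons;
  for (const auto& range : ranges) {
    for (const auto& change : changes) {
      if (Overlaps(change, range)) {
        reasons.push_back(range.reason);
        break;  // one report per protected region is enough
      }
    }
  }
  return reasons;
}
```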
### Phase 3: GUI Integration (2 hours)
#### 3.1 Update ProposalDrawer
**File**: `src/app/editor/system/proposal_drawer.cc`
Add policy status display and gating logic:
```cpp
#include <string>

#include "src/cli/service/policy_evaluator.h"

void ProposalDrawer::DrawProposalDetail(const std::string& proposal_id) {
  // ... existing detail view code ...

  // === Policy Status Section ===
  ImGui::Separator();
  ImGui::TextUnformatted("Policy Status:");

  auto& policy_eval = cli::PolicyEvaluator::GetInstance();
  if (policy_eval.IsEnabled()) {
    auto policy_result = policy_eval.EvaluateProposal(proposal_id);
    if (policy_result.ok()) {
      const auto& result = policy_result.value();
      if (result.passed) {
        ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
      } else {
        // Show violations
        if (result.has_critical_violations()) {
          ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations:");
          for (const auto& violation : result.critical_violations) {
            ImGui::BulletText("%s: %s", violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }
        if (!result.warnings.empty()) {
          ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠️ Warnings:");
          for (const auto& violation : result.warnings) {
            ImGui::BulletText("%s: %s", violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }
      }

      // Gate Accept button
      ImGui::Separator();
      bool can_accept = !result.has_critical_violations();
      if (!can_accept) {
        ImGui::BeginDisabled();
      }
      if (ImGui::Button("Accept Proposal")) {
        if (result.can_accept_with_override() && !override_confirmed_) {
          // Show override confirmation dialog
          ImGui::OpenPopup("Override Policy");
        } else {
          AcceptProposal(proposal_id);
        }
      }
      if (!can_accept) {
        ImGui::EndDisabled();
        ImGui::SameLine();
        ImGui::TextColored(ImVec4(1, 0, 0, 1),
                           "(Accept blocked by policy violations)");
      }

      // Override confirmation dialog
      if (ImGui::BeginPopupModal("Override Policy", nullptr,
                                 ImGuiWindowFlags_AlwaysAutoResize)) {
        ImGui::Text("This proposal has policy warnings.");
        ImGui::Text("Do you want to override and accept anyway?");
        ImGui::Text("This action will be logged.");
        ImGui::Separator();
        if (ImGui::Button("Override and Accept")) {
          override_confirmed_ = true;
          AcceptProposal(proposal_id);
          // Reset so the next proposal with warnings is confirmed again.
          override_confirmed_ = false;
          ImGui::CloseCurrentPopup();
        }
        ImGui::SameLine();
        if (ImGui::Button("Cancel")) {
          ImGui::CloseCurrentPopup();
        }
        ImGui::EndPopup();
      }
    } else {
      // absl::string_view is not guaranteed to be null-terminated, so copy
      // the message into a std::string before passing it to a %s format.
      const std::string error(policy_result.status().message());
      ImGui::TextColored(ImVec4(1, 0, 0, 1), "Policy evaluation failed: %s",
                         error.c_str());
    }
  } else {
    ImGui::TextColored(ImVec4(0.5f, 0.5f, 0.5f, 1.0f),
                       "No policies configured");
  }
}
```
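The gating rules in the drawer code can also be expressed as a small pure function, which makes them testable without an ImGui context. This is only a sketch: `AcceptDecision` and `DecideAccept` are hypothetical names and are not part of `ProposalDrawer`.

```cpp
#include <cassert>

// Possible outcomes when the user presses "Accept Proposal".
enum class AcceptDecision {
  kBlocked,        // critical violations: the button stays disabled
  kNeedsOverride,  // warnings only: show the confirmation dialog first
  kAccept          // nothing to confirm, or the override was confirmed
};

// Encodes the gating rules: critical violations always block, warnings
// require an explicit override, everything else is accepted directly.
AcceptDecision DecideAccept(int critical_count, int warning_count,
                            bool override_confirmed) {
  if (critical_count > 0) return AcceptDecision::kBlocked;
  if (warning_count > 0 && !override_confirmed) {
    return AcceptDecision::kNeedsOverride;
  }
  return AcceptDecision::kAccept;
}
```

Keeping the decision separate from the drawing code would let the unit tests in Phase 4 cover the override workflow directly.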
### Phase 4: Testing & Documentation (1-2 hours)
#### 4.1 Example Policy File
Create `.yaze/policies/agent.yaml.example`:
```yaml
# Example agent policy configuration
# Copy to .yaze/policies/agent.yaml and customize
version: 1.0
enabled: true

policies:
  # Require test suites to pass
  - name: require_tests
    type: test_requirement
    enabled: false  # Disabled by default (no tests yet)
    severity: critical
    rules:
      - test_suite: "smoke_test"
        min_pass_rate: 1.0
    message: "All smoke tests must pass"

  # Limit change scope
  - name: limit_changes
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_bytes_changed: 5120  # 5KB
      - max_commands_executed: 15
    message: "Keep changes small and focused"

  # Protect ROM header
  - name: protect_header
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0
        end: 0xFFFF
        reason: "ROM header"
    message: "Cannot modify ROM header"
```
#### 4.2 Unit Tests
Create `test/cli/policy_evaluator_test.cc`:
```cpp
#include "src/cli/service/policy_evaluator.h"
#include "gtest/gtest.h"
namespace yaze {
namespace cli {
namespace {
TEST(PolicyEvaluatorTest, LoadPoliciesSuccess) {
auto& eval = PolicyEvaluator::GetInstance();
auto status = eval.LoadPolicies("test/fixtures/policies");
EXPECT_TRUE(status.ok());
EXPECT_TRUE(eval.IsEnabled());
}
TEST(PolicyEvaluatorTest, EvaluateProposal_NoViolations) {
// ... test implementation
}
TEST(PolicyEvaluatorTest, EvaluateProposal_CriticalViolation) {
// ... test implementation
}
} // namespace
} // namespace cli
} // namespace yaze
```
## Deliverables
- [x] Policy evaluator service interface
- [ ] YAML policy parser implementation
- [ ] Policy evaluation logic for all 4 types
- [ ] ProposalDrawer GUI integration
- [ ] Policy override workflow
- [ ] Example policy configurations
- [ ] Unit tests
- [ ] Documentation and usage guide
## Success Criteria
1. **Functional**:
- Policies load from YAML files
- Proposals evaluated against all enabled policies
- Accept button gated by critical violations
- Override workflow for warnings
2. **User Experience**:
- Clear policy status display in ProposalDrawer
- Helpful violation messages
- Override confirmation dialog
- Policy evaluation fast (< 100ms)
3. **Quality**:
- Unit test coverage > 80%
- No crashes or memory leaks
- Graceful handling of malformed YAML
- Works with policies disabled
## Future Enhancements
- Policy templates for common scenarios
- Policy violation history/analytics
- Auto-fix suggestions for violations
- Integration with CI/CD for automated policy checks
- Policy versioning and migration
---
**Status**: Ready for implementation
**Next Step**: Create PolicyEvaluator skeleton and wire into build system
**Estimated Completion**: October 3-4, 2025

@@ -25,6 +25,10 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 - **Priority 3**: Enhanced Error Reporting (IT-08+) - Holistic improvements spanning z3ed, ImGuiTestHarness, EditorManager, and core application services
 **Recent Accomplishments** (Updated: October 2025):
+- **✅ IT-08a Screenshot RPC Complete**: SDL-based screenshot capture operational
+  - Captures 1536x864 BMP files via SDL_RenderReadPixels
+  - Successfully tested via gRPC (5.3MB output files)
+  - Foundation for auto-capture on test failures
 - **✅ Policy Framework Complete**: PolicyEvaluator service fully integrated with ProposalDrawer GUI
   - 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
   - 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
@@ -41,8 +45,8 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
 - **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)
 **Known Limitations & Improvement Opportunities**:
-- **Screenshot RPC**: Stub implementation → needs SDL_Surface capture + PNG encoding
-- **Test Introspection**: No way to query test status, results, or queue → add GetTestStatus/ListTests RPCs
+- **Screenshot Auto-Capture**: Manual RPC only → needs integration with TestManager failure detection
+- **Test Introspection**: ✅ Complete - GetTestStatus/ListTests/GetResults RPCs operational
 - **Widget Discovery**: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
 - **Test Recording**: No record/replay for regression testing → add RecordSession/ReplaySession RPCs
 - **Synchronous Wait**: Async tests return immediately → add blocking mode or result polling
@@ -236,13 +240,15 @@ message WidgetInfo {
 **Outcome**: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).
-#### IT-08: Enhanced Error Reporting (5-7 hours)
+#### IT-08: Enhanced Error Reporting (5-7 hours) 🔄 ACTIVE
+**Status**: IT-08a Complete ✅ | IT-08b In Progress 🔄
 **Objective**: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.
 **Implementation Tracks**:
 1. **Harness-Level Diagnostics**
-   - Implement Screenshot RPC (convert stub into working SDL capture pipeline)
-   - Auto-capture screenshots, widget tree dumps, and recent ImGui events on failure
+   - ✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
+   - 📋 IT-08b: Auto-capture screenshots on test failure
+   - 📋 IT-08c: Widget tree dumps and recent ImGui events on failure
    - Serialize results to both structured JSON (for automation) and human-friendly HTML bundles
    - Persist artifacts under `test-results/<test_id>/` with timestamped directories
@@ -516,9 +522,10 @@ z3ed collab replay session_2025_10_02.yaml --speed 2x
| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) | | IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) |
| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs | | IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | ✅ Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts | | IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | ✅ Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | <EFBFBD> Active | IT-01 - Capture widget state on failure for debugging | | IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | 🔄 Active | IT-01 - Capture widget state on failure for debugging |
| IT-08a | Adopt shared error envelope across CLI & services | ImGuiTest Bridge | Code | 🔄 Active | IT-08 | | IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Screenshot capture complete (Oct 2, 2025) |
| IT-08b | EditorManager diagnostic overlay & logging | ImGuiTest Bridge | UX | 📋 Planned | IT-08 | | IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | 🔄 Active | IT-08a - Integrate with TestManager |
| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | 📋 Planned | IT-08b - Enhanced failure diagnostics |
| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | 📋 Planned | IT-07 - JSON/YAML test suite format compatible with CI/CD pipelines | | IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | 📋 Planned | IT-07 - JSON/YAML test suite format compatible with CI/CD pipelines |
| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) | | IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks | | VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |

@@ -0,0 +1,647 @@
# IT-08: Enhanced Error Reporting Implementation Guide
**Status**: IT-08a Complete ✅ | IT-08b In Progress 🔄 | IT-08c Planned 📋
**Date**: October 2, 2025
**Overall Progress**: 20% Complete (1 of 5 phases)
---
## Phase Overview
| Phase | Task | Status | Time | Description |
|-------|------|--------|------|-------------|
| IT-08a | Screenshot RPC | ✅ Complete | 1.5h | SDL-based screenshot capture |
| IT-08b | Auto-Capture on Failure | 🔄 Active | 1-1.5h | Integrate with TestManager |
| IT-08c | Widget State Dumps | 📋 Planned | 30-45m | Capture UI context on failure |
| IT-08d | Error Envelope Standardization | 📋 Planned | 1-2h | Unified error format across services |
| IT-08e | CLI Error Improvements | 📋 Planned | 1h | Rich error output with artifacts |
**Total Estimated Time**: 5-7 hours
**Time Spent**: 1.5 hours
**Time Remaining**: 3.5-5.5 hours
---
## IT-08a: Screenshot RPC ✅ COMPLETE
**Date Completed**: October 2, 2025
**Time**: 1.5 hours
### Implementation Summary
### What Was Built
Implemented the `Screenshot` RPC in the ImGuiTestHarness service with the following capabilities:
1. **SDL Renderer Integration**: Accesses the ImGui SDL2 backend renderer through `BackendRendererUserData`
2. **Framebuffer Capture**: Uses `SDL_RenderReadPixels` to capture the full window contents (1536x864, 32-bit ARGB)
3. **BMP File Output**: Saves screenshots as BMP files using SDL's built-in `SDL_SaveBMP` function
4. **Flexible Paths**: Supports custom output paths or auto-generates timestamped filenames (`/tmp/yaze_screenshot_<timestamp>.bmp`)
5. **Response Metadata**: Returns file path, file size (bytes), and image dimensions
### Technical Implementation
**Location**: `src/app/core/service/imgui_test_harness_service.cc`
```cpp
// Helper struct matching imgui_impl_sdlrenderer2.cpp backend data.
// Only the first member is mirrored here; re-check when upgrading ImGui.
struct ImGui_ImplSDLRenderer2_Data {
  SDL_Renderer* Renderer;
};

absl::Status ImGuiTestHarnessServiceImpl::Screenshot(
    const ScreenshotRequest* request, ScreenshotResponse* response) {
  // Report a failure through both the response and the returned status.
  auto fail = [response](const std::string& what) {
    response->set_success(false);
    response->set_message(what);
    return absl::InternalError(what);
  };

  // 1. Get SDL renderer from ImGui backend
  ImGuiIO& io = ImGui::GetIO();
  auto* backend_data =
      static_cast<ImGui_ImplSDLRenderer2_Data*>(io.BackendRendererUserData);
  if (!backend_data || !backend_data->Renderer) {
    response->set_success(false);
    response->set_message("SDL renderer not available");
    return absl::FailedPreconditionError("No SDL renderer available");
  }
  SDL_Renderer* renderer = backend_data->Renderer;

  // 2. Get renderer output size
  int width = 0;
  int height = 0;
  if (SDL_GetRendererOutputSize(renderer, &width, &height) != 0) {
    return fail(absl::StrFormat("Cannot query output size: %s",
                                SDL_GetError()));
  }

  // 3. Create surface to hold screenshot
  SDL_Surface* surface = SDL_CreateRGBSurface(0, width, height, 32,
                                              0x00FF0000, 0x0000FF00,
                                              0x000000FF, 0xFF000000);
  if (!surface) {
    return fail(absl::StrFormat("Cannot create surface: %s", SDL_GetError()));
  }

  // 4. Read pixels from renderer (ARGB8888 format)
  if (SDL_RenderReadPixels(renderer, nullptr, SDL_PIXELFORMAT_ARGB8888,
                           surface->pixels, surface->pitch) != 0) {
    const std::string error =
        absl::StrFormat("Cannot read pixels: %s", SDL_GetError());
    SDL_FreeSurface(surface);
    return fail(error);
  }

  // 5. Determine output path (custom or auto-generated)
  std::string output_path = request->output_path();
  if (output_path.empty()) {
    output_path = absl::StrFormat("/tmp/yaze_screenshot_%lld.bmp",
                                  absl::ToUnixMillis(absl::Now()));
  }

  // 6. Save to BMP file, then release the surface whatever the outcome
  const bool saved = SDL_SaveBMP(surface, output_path.c_str()) == 0;
  const std::string save_error = saved ? "" : SDL_GetError();
  SDL_FreeSurface(surface);
  if (!saved) {
    return fail(absl::StrFormat("Cannot save %s: %s", output_path,
                                save_error));
  }

  // 7. Get file size
  std::ifstream file(output_path, std::ios::binary | std::ios::ate);
  const int64_t file_size = static_cast<int64_t>(file.tellg());

  // 8. Return success response
  response->set_success(true);
  response->set_message(absl::StrFormat("Screenshot saved to %s (%dx%d)",
                                        output_path, width, height));
  response->set_file_path(output_path);
  response->set_file_size_bytes(file_size);
  return absl::OkStatus();
}
```
### Testing Results
**Test Command**:
```bash
grpcurl -plaintext \
-import-path /Users/scawful/Code/yaze/src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"output_path": "/tmp/test_screenshot.bmp"}' \
localhost:50052 yaze.test.ImGuiTestHarness/Screenshot
```
**Response**:
```json
{
"success": true,
"message": "Screenshot saved to /tmp/test_screenshot.bmp (1536x864)",
"filePath": "/tmp/test_screenshot.bmp",
"fileSizeBytes": "5308538"
}
```
**File Verification**:
```bash
$ ls -lh /tmp/test_screenshot.bmp
-rw-r--r-- 1 scawful wheel 5.1M Oct 2 20:16 /tmp/test_screenshot.bmp
$ file /tmp/test_screenshot.bmp
/tmp/test_screenshot.bmp: PC bitmap, Windows 95/NT4 and newer format, 1536 x 864 x 32, cbSize 5308538, bits offset 122
```
**Result**: Screenshot successfully captured, saved, and validated!
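The reported file size can be cross-checked from the image dimensions alone. The sketch below computes the expected size of an uncompressed BMP; `ExpectedBmpSize` is a hypothetical helper, and the 122-byte header is taken from the `bits offset 122` field in the `file` output above.

```cpp
#include <cassert>
#include <cstdint>

// Expected size in bytes of an uncompressed BMP file.
// Each pixel row is padded to a multiple of 4 bytes, as the format requires.
constexpr int64_t ExpectedBmpSize(int width, int height, int bits_per_pixel,
                                  int header_bytes) {
  const int64_t row_bytes =
      (static_cast<int64_t>(width) * bits_per_pixel + 31) / 32 * 4;
  return row_bytes * height + header_bytes;
}

// 1536 x 864 at 32 bits per pixel with a 122-byte header matches the
// fileSizeBytes value returned by the RPC.
static_assert(ExpectedBmpSize(1536, 864, 32, 122) == 5308538,
              "size should match the captured screenshot");
```

The same arithmetic explains the two figures in the listing: 5,308,538 bytes is about 5.3 MB in decimal units and about 5.1 MiB as shown by `ls -lh`.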
---
## Design Decisions
### Why BMP Format?
**Chosen**: SDL's built-in `SDL_SaveBMP` function
**Rationale**:
- ✅ Zero external dependencies (no need for libpng, stb_image_write, etc.)
- ✅ Guaranteed to work on all platforms where SDL works
- ✅ Simple, reliable, and fast
- ✅ Adequate for debugging/error reporting (file size not critical)
- ⚠️ Larger file sizes (5.3MB vs ~500KB for PNG), but acceptable for temporary debug files
**Future Consideration**: If disk space becomes an issue, can add PNG encoding using stb_image_write (single-header library, easy to integrate)
### SDL Backend Integration
**Challenge**: How to access the SDL_Renderer from ImGui?
**Solution**:
- ImGui's `BackendRendererUserData` points to an `ImGui_ImplSDLRenderer2_Data` struct
- This struct contains the `Renderer` pointer as its first member
- Cast `BackendRendererUserData` to access the renderer; this relies on the backend's private struct layout, so it should be re-verified when upgrading ImGui
**Why Not Store Renderer Globally?**
- Multiple ImGui contexts could use different renderers
- Backend data pattern follows ImGui's architecture conventions
- More maintainable and future-proof
---
## Integration with Test System
### Current Usage (Manual RPC)
AI agents or CLI tools can manually capture screenshots:
```bash
# Capture screenshot after opening editor
z3ed agent test --prompt "Open Overworld Editor"
grpcurl ... yaze.test.ImGuiTestHarness/Screenshot
```
### Next Step: Auto-Capture on Failure
The screenshot RPC is now ready to be integrated with TestManager to automatically capture context when tests fail:
**Planned Implementation** (IT-08b):
```cpp
// In TestManager::MarkHarnessTestCompleted()
if (test_result == IMGUI_TEST_STATUS_FAILED ||
    test_result == IMGUI_TEST_STATUS_TIMEOUT) {
  // Auto-capture screenshot
  ScreenshotRequest req;
  req.set_output_path(absl::StrFormat("/tmp/test_%s_failure.bmp", test_id));
  ScreenshotResponse resp;
  harness_service_->Screenshot(&req, &resp);
  test_history_[test_id].screenshot_path = resp.file_path();

  // Also capture widget state (IT-08c)
  test_history_[test_id].widget_state = CaptureWidgetState();
}
```
---
## IT-08b: Auto-Capture on Test Failure 🔄 IN PROGRESS
**Goal**: Automatically capture screenshots and context when tests fail
**Time Estimate**: 1-1.5 hours
**Status**: Ready to implement
### Implementation Plan
#### Step 1: Modify TestManager (30 minutes)
**File**: `src/app/core/test_manager.cc`
Add screenshot capture in `MarkHarnessTestCompleted()`:
```cpp
void TestManager::MarkHarnessTestCompleted(const std::string& test_id,
                                           ImGuiTestStatus status) {
  auto& history_entry = test_history_[test_id];
  history_entry.status = status;
  history_entry.end_time = absl::Now();
  history_entry.execution_time_ms = absl::ToInt64Milliseconds(
      history_entry.end_time - history_entry.start_time);

  // Auto-capture screenshot on failure
  if (status == ImGuiTestStatus_Error || status == ImGuiTestStatus_Warning) {
    CaptureFailureContext(test_id);
  }
}

void TestManager::CaptureFailureContext(const std::string& test_id) {
  auto& history_entry = test_history_[test_id];

  // 1. Capture screenshot
  std::string screenshot_path =
      absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id);
  if (harness_service_) {
    ScreenshotRequest req;
    req.set_output_path(screenshot_path);
    ScreenshotResponse resp;
    auto status = harness_service_->Screenshot(&req, &resp);
    if (status.ok()) {
      history_entry.screenshot_path = resp.file_path();
      history_entry.screenshot_size_bytes = resp.file_size_bytes();
    }
  }

  // 2. Capture widget state (IT-08c)
  // history_entry.widget_state = CaptureWidgetState();

  // 3. Capture execution context
  // GetCurrentWindow() and GetActiveID() are declared in imgui_internal.h.
  // GetActiveID() returns a numeric ImGuiID, so it is formatted as hex
  // rather than with %s.
  history_entry.failure_context = absl::StrFormat(
      "Frame: %d, Active Window: %s, Focused Widget: 0x%08X",
      ImGui::GetFrameCount(),
      ImGui::GetCurrentWindow() ? ImGui::GetCurrentWindow()->Name : "none",
      ImGui::GetActiveID());
}
```
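The plan above hard-codes `/tmp` and embeds the test ID directly in the file name. A possible refinement is sketched below; `SanitizeTestId` and `FailureScreenshotPath` are hypothetical helpers, and the assumption is that test IDs may contain characters that are awkward in file names.

```cpp
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>

// Replaces every character other than letters, digits, '_' and '-' with
// '_', so the ID is safe to embed in a file name.
std::string SanitizeTestId(const std::string& test_id) {
  std::string out;
  out.reserve(test_id.size());
  for (unsigned char c : test_id) {
    if (std::isalnum(c) || c == '_' || c == '-') {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back('_');
    }
  }
  return out.empty() ? std::string("unknown") : out;
}

// Builds "<dir>/yaze_test_<id>_failure.bmp" using the naming pattern from
// the plan above.
std::filesystem::path FailureScreenshotPath(const std::filesystem::path& dir,
                                            const std::string& test_id) {
  return dir / ("yaze_test_" + SanitizeTestId(test_id) + "_failure.bmp");
}
```

Passing `std::filesystem::temp_directory_path()` as `dir` would avoid the hard-coded `/tmp` on platforms where the temporary directory lives elsewhere.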
#### Step 2: Update TestHistory Structure (15 minutes)
**File**: `src/app/core/test_manager.h`
Add failure context fields:
```cpp
struct TestHistory {
  std::string test_id;
  std::string test_name;
  ImGuiTestStatus status;
  absl::Time start_time;
  absl::Time end_time;
  int64_t execution_time_ms;
  std::vector<std::string> logs;
  std::map<std::string, std::string> metrics;

  // IT-08b: Failure diagnostics
  std::string screenshot_path;
  int64_t screenshot_size_bytes = 0;
  std::string failure_context;
  std::string widget_state;  // IT-08c
};
```
#### Step 3: Update GetTestResults RPC (30 minutes)
**File**: `src/app/core/service/imgui_test_harness_service.cc`
Include screenshot path in results:
```cpp
absl::Status ImGuiTestHarnessServiceImpl::GetTestResults(
const GetTestResultsRequest* request,
GetTestResultsResponse* response) {
const auto& history = test_manager_->GetTestHistory(request->test_id());
// ... existing result population ...
// Add failure diagnostics
if (!history.screenshot_path.empty()) {
response->set_screenshot_path(history.screenshot_path);
response->set_screenshot_size_bytes(history.screenshot_size_bytes);
}
if (!history.failure_context.empty()) {
response->set_failure_context(history.failure_context);
}
return absl::OkStatus();
}
```
#### Step 4: Update Proto Schema (15 minutes)
**File**: `src/app/core/proto/imgui_test_harness.proto`
Add fields to GetTestResultsResponse:
```proto
message GetTestResultsResponse {
string test_id = 1;
TestStatus status = 2;
int64 execution_time_ms = 3;
repeated string logs = 4;
map<string, string> metrics = 5;
// IT-08b: Failure diagnostics
string screenshot_path = 6;
int64 screenshot_size_bytes = 7;
string failure_context = 8;
string widget_state = 9; // IT-08c
}
```
### Testing
```bash
# 1. Build with changes
cmake --build build-grpc-test --target yaze -j8
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness --test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# 3. Trigger a failing test
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# 4. Check for screenshot
ls -lh /tmp/yaze_test_*_failure.bmp
# 5. Query test results
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"test_id":"grpc_click_<timestamp>"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
# Expected: screenshot_path and failure_context populated
```
### Success Criteria
- ✅ Screenshots auto-captured on test failure
- ✅ Screenshot path stored in test history
- ✅ GetTestResults returns screenshot metadata
- ✅ No performance impact on passing tests
- ✅ Screenshots cleaned up after test completion (optional)
---
## IT-08c: Widget State Dumps 📋 PLANNED
**Goal**: Capture UI hierarchy and state on test failures
**Time Estimate**: 30-45 minutes
**Status**: Specification phase
### Implementation Plan
#### Step 1: Create Widget State Capture Utility (30 minutes)
**File**: `src/app/core/widget_state_capture.h` (new file)
```cpp
#ifndef YAZE_CORE_WIDGET_STATE_CAPTURE_H
#define YAZE_CORE_WIDGET_STATE_CAPTURE_H
#include <string>
#include <vector>
#include "imgui/imgui.h"
namespace yaze {
namespace core {
struct WidgetState {
std::string focused_window;
std::string focused_widget;
std::string hovered_widget;
std::vector<std::string> visible_windows;
std::vector<std::string> open_menus;
std::string active_popup;
};
std::string CaptureWidgetState();
std::string SerializeWidgetStateToJson(const WidgetState& state);
} // namespace core
} // namespace yaze
#endif
```
**File**: `src/app/core/widget_state_capture.cc` (new file)
```cpp
#include "src/app/core/widget_state_capture.h"
#include "absl/strings/str_format.h"
#include "imgui/imgui_internal.h"  // ImGuiContext, ImGuiWindow, GetActiveID()
#include "nlohmann/json.hpp"
namespace yaze {
namespace core {
std::string CaptureWidgetState() {
WidgetState state;
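  // Capture the current window. GetCurrentWindowRead() returns the pointer
  // without dereferencing it, so the null check is safe outside Begin()/End().
  ImGuiWindow* current = ImGui::GetCurrentWindowRead();
  if (current) {
    state.focused_window = current->Name;
  }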
// Capture active widget
ImGuiID active_id = ImGui::GetActiveID();
if (active_id != 0) {
state.focused_widget = absl::StrFormat("ID_%u", active_id);
}
// Capture hovered widget
ImGuiID hovered_id = ImGui::GetHoveredID();
if (hovered_id != 0) {
state.hovered_widget = absl::StrFormat("ID_%u", hovered_id);
}
// Traverse window list
ImGuiContext* ctx = ImGui::GetCurrentContext();
for (ImGuiWindow* window : ctx->Windows) {
if (window->Active && !window->Hidden) {
state.visible_windows.push_back(window->Name);
}
}
return SerializeWidgetStateToJson(state);
}
std::string SerializeWidgetStateToJson(const WidgetState& state) {
nlohmann::json j;
j["focused_window"] = state.focused_window;
j["focused_widget"] = state.focused_widget;
j["hovered_widget"] = state.hovered_widget;
j["visible_windows"] = state.visible_windows;
j["open_menus"] = state.open_menus;
j["active_popup"] = state.active_popup;
return j.dump(2); // Pretty print with indent
}
} // namespace core
} // namespace yaze
```
#### Step 2: Integrate with TestManager (15 minutes)
Update `CaptureFailureContext()` in `test_manager.cc`:
```cpp
void TestManager::CaptureFailureContext(const std::string& test_id) {
auto& history_entry = test_history_[test_id];
// 1. Screenshot (IT-08b)
// ... existing code ...
// 2. Widget state (IT-08c)
history_entry.widget_state = core::CaptureWidgetState();
// 3. Execution context
// ... existing code ...
}
```
### Output Example
```json
{
"focused_window": "Overworld Editor",
"focused_widget": "ID_12345",
"hovered_widget": "ID_67890",
"visible_windows": [
"Main Window",
"Overworld Editor",
"Palette Editor"
],
"open_menus": [],
"active_popup": ""
}
```
---
## IT-08d: Error Envelope Standardization 📋 PLANNED
**Goal**: Unified error format across z3ed, TestManager, EditorManager
**Time Estimate**: 1-2 hours
**Status**: Design phase
### Proposed Error Envelope
```cpp
// Shared error structure
struct ErrorContext {
absl::Status status;
std::string component; // "TestHarness", "EditorManager", "z3ed"
std::string operation; // "Click", "LoadROM", "RunTest"
std::map<std::string, std::string> metadata;
std::vector<std::string> artifact_paths; // Screenshots, logs, etc.
std::string actionable_hint; // User-facing suggestion
};
```
### Integration Points
1. **TestManager**: Wrap failures in ErrorContext
2. **EditorManager**: Use ErrorContext for all operations
3. **z3ed CLI**: Parse ErrorContext and format for display
4. **ProposalDrawer**: Display ErrorContext in GUI modal
---
## IT-08e: CLI Error Improvements 📋 PLANNED
**Goal**: Rich error output in z3ed CLI
**Time Estimate**: 1 hour
**Status**: Design phase
### Enhanced CLI Output
```bash
$ z3ed agent test --prompt "Open Overworld editor"
❌ Test Failed: grpc_click_1696357200
Component: ImGuiTestHarness
Operation: Click widget "Overworld"
Error: Widget not found
Artifacts:
• Screenshot: /tmp/yaze_test_grpc_click_1696357200_failure.bmp
• Widget State: /tmp/yaze_test_grpc_click_1696357200_state.json
• Logs: /tmp/yaze_test_grpc_click_1696357200.log
Context:
• Visible Windows: Main Window, Debug
• Focused Window: Main Window
• Active Widget: None
Suggestion:
→ Check if ROM is loaded (File → Open ROM)
→ Verify Overworld editor button is visible
→ Use 'z3ed agent gui discover' to list available widgets
```
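
Output of this shape could be rendered from the `ErrorContext` proposed in IT-08d. The sketch below is only an illustration under stated assumptions, not the z3ed implementation: `ErrorReport` is a hypothetical, dependency-free stand-in (a plain string replaces `absl::Status`), and `FormatForCli` is an invented helper name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed ErrorContext; a plain string
// replaces absl::Status so the sketch compiles without Abseil.
struct ErrorReport {
  std::string test_id;
  std::string component;
  std::string operation;
  std::string error;
  std::vector<std::string> artifact_paths;
  std::vector<std::string> hints;
};

// Renders the report as indented plain text, skipping empty sections.
std::string FormatForCli(const ErrorReport& report) {
  std::ostringstream out;
  out << "Test Failed: " << report.test_id << "\n"
      << "  Component: " << report.component << "\n"
      << "  Operation: " << report.operation << "\n"
      << "  Error: " << report.error << "\n";
  if (!report.artifact_paths.empty()) {
    out << "Artifacts:\n";
    for (const auto& path : report.artifact_paths) {
      out << "  - " << path << "\n";
    }
  }
  if (!report.hints.empty()) {
    out << "Suggestion:\n";
    for (const auto& hint : report.hints) {
      out << "  -> " << hint << "\n";
    }
  }
  return out.str();
}
```

Keeping the formatter separate from the data structure would let the ProposalDrawer modal and the CLI share one envelope while rendering it differently.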
---
## Progress Tracking
### Completed ✅
- IT-08a: Screenshot RPC (1.5 hours)
### In Progress 🔄
- IT-08b: Auto-capture on failure (next priority)
### Planned 📋
- IT-08c: Widget state dumps
- IT-08d: Error envelope standardization
- IT-08e: CLI error improvements
### Time Investment
- **Spent**: 1.5 hours (IT-08a)
- **Remaining**: 3.5-5.5 hours (IT-08b/c/d/e)
- **Total**: 5-7 hours (as estimated)
---
## Next Steps
**Immediate** (IT-08b - 1-1.5 hours):
1. Modify TestManager to capture screenshots on failure
2. Update TestHistory structure
3. Update GetTestResults RPC
4. Test with intentional failures
**Short-term** (IT-08c - 30-45 minutes):
1. Create widget state capture utility
2. Integrate with TestManager
3. Add to GetTestResults RPC
**Medium-term** (IT-08d/e - 2-3 hours):
1. Design unified error envelope
2. Implement across all services
3. Update CLI output formatting
4. Add ProposalDrawer error modal
---
## References
- **Implementation Plan**: [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)
- **Test Harness Guide**: [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)
- **Source Files**:
- `src/app/core/service/imgui_test_harness_service.cc`
- `src/app/core/test_manager.{h,cc}`
- `src/app/core/proto/imgui_test_harness.proto`
---
**Last Updated**: October 2, 2025
**Current Phase**: IT-08b (Auto-capture on failure)
**Overall Progress**: 33% Complete (1 of 3 core phases)
---
**Report Generated**: October 2, 2025
**Author**: GitHub Copilot (AI Assistant)
**Project**: YAZE - Yet Another Zelda3 Editor
**Component**: z3ed CLI Tool - Test Automation Harness

@@ -1,347 +0,0 @@
# IT-08 Screenshot RPC - Completion Report
**Date**: October 2, 2025
**Task**: IT-08 Enhanced Error Reporting - Screenshot Capture Implementation
**Status**: ✅ Screenshot RPC Complete (30% of IT-08)
---
## Implementation Summary
### What Was Built
Implemented the `Screenshot` RPC in the ImGuiTestHarness service with the following capabilities:
1. **SDL Renderer Integration**: Accesses the ImGui SDL2 backend renderer through `BackendRendererUserData`
2. **Framebuffer Capture**: Uses `SDL_RenderReadPixels` to capture the full window contents (1536x864, 32-bit ARGB)
3. **BMP File Output**: Saves screenshots as BMP files using SDL's built-in `SDL_SaveBMP` function
4. **Flexible Paths**: Supports custom output paths or auto-generates timestamped filenames (`/tmp/yaze_screenshot_<timestamp>.bmp`)
5. **Response Metadata**: Returns file path, file size (bytes), and image dimensions
### Technical Implementation
**Location**: `/Users/scawful/Code/yaze/src/app/core/service/imgui_test_harness_service.cc`
```cpp
// Helper struct matching imgui_impl_sdlrenderer2.cpp backend data
struct ImGui_ImplSDLRenderer2_Data {
SDL_Renderer* Renderer;
};
absl::Status ImGuiTestHarnessServiceImpl::Screenshot(
const ScreenshotRequest* request, ScreenshotResponse* response) {
// 1. Get SDL renderer from ImGui backend
ImGuiIO& io = ImGui::GetIO();
auto* backend_data = static_cast<ImGui_ImplSDLRenderer2_Data*>(io.BackendRendererUserData);
if (!backend_data || !backend_data->Renderer) {
response->set_success(false);
response->set_message("SDL renderer not available");
return absl::FailedPreconditionError("No SDL renderer available");
}
SDL_Renderer* renderer = backend_data->Renderer;
// 2. Get renderer output size
int width, height;
SDL_GetRendererOutputSize(renderer, &width, &height);
// 3. Create surface to hold screenshot
SDL_Surface* surface = SDL_CreateRGBSurface(0, width, height, 32,
0x00FF0000, 0x0000FF00,
0x000000FF, 0xFF000000);
// 4. Read pixels from renderer (ARGB8888 format)
SDL_RenderReadPixels(renderer, nullptr, SDL_PIXELFORMAT_ARGB8888,
surface->pixels, surface->pitch);
// 5. Determine output path (custom or auto-generated)
std::string output_path = request->output_path();
if (output_path.empty()) {
output_path = absl::StrFormat("/tmp/yaze_screenshot_%lld.bmp",
absl::ToUnixMillis(absl::Now()));
}
// 6. Save to BMP file
SDL_SaveBMP(surface, output_path.c_str());
// 7. Get file size and clean up
std::ifstream file(output_path, std::ios::binary | std::ios::ate);
int64_t file_size = file.tellg();
SDL_FreeSurface(surface);
// 8. Return success response
response->set_success(true);
response->set_message(absl::StrFormat("Screenshot saved to %s (%dx%d)",
output_path, width, height));
response->set_file_path(output_path);
response->set_file_size_bytes(file_size);
return absl::OkStatus();
}
```
### Testing Results
**Test Command**:
```bash
grpcurl -plaintext \
-import-path /Users/scawful/Code/yaze/src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"output_path": "/tmp/test_screenshot.bmp"}' \
localhost:50052 yaze.test.ImGuiTestHarness/Screenshot
```
**Response**:
```json
{
"success": true,
"message": "Screenshot saved to /tmp/test_screenshot.bmp (1536x864)",
"filePath": "/tmp/test_screenshot.bmp",
"fileSizeBytes": "5308538"
}
```
**File Verification**:
```bash
$ ls -lh /tmp/test_screenshot.bmp
-rw-r--r-- 1 scawful wheel 5.1M Oct 2 20:16 /tmp/test_screenshot.bmp
$ file /tmp/test_screenshot.bmp
/tmp/test_screenshot.bmp: PC bitmap, Windows 95/NT4 and newer format, 1536 x 864 x 32, cbSize 5308538, bits offset 122
```
**Result**: Screenshot successfully captured, saved, and validated!
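
The reported file size is consistent with an uncompressed 32-bit image. As a quick sanity check, assuming the 122-byte pixel-data offset shown in the `file` output and standard BMP row padding to 4-byte boundaries, the arithmetic can be sketched as follows (`ExpectedBmpSize` is an illustrative helper, not part of yaze):

```cpp
#include <cstdint>

// Illustrative helper: expected size in bytes of an uncompressed BMP.
// Each row is padded to a multiple of 4 bytes; header_bytes is the offset
// at which pixel data starts (122 in the `file` output above).
constexpr std::int64_t ExpectedBmpSize(std::int64_t width, std::int64_t height,
                                       std::int64_t bits_per_pixel,
                                       std::int64_t header_bytes) {
  return header_bytes + ((width * bits_per_pixel + 31) / 32) * 4 * height;
}

// 1536 x 864 at 32 bpp: 6144 bytes per row * 864 rows + 122 = 5308538.
static_assert(ExpectedBmpSize(1536, 864, 32, 122) == 5308538,
              "matches the fileSizeBytes value in the response");
```

A check like this can flag truncated or partially written screenshots before they are attached to a test result.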
---
## Design Decisions
### Why BMP Format?
**Chosen**: SDL's built-in `SDL_SaveBMP` function
**Rationale**:
- ✅ Zero external dependencies (no need for libpng, stb_image_write, etc.)
- ✅ Guaranteed to work on all platforms where SDL works
- ✅ Simple, reliable, and fast
- ✅ Adequate for debugging/error reporting (file size not critical)
- ⚠️ Larger file sizes (5.3MB vs ~500KB for PNG), but acceptable for temporary debug files
**Future Consideration**: If disk space becomes an issue, can add PNG encoding using stb_image_write (single-header library, easy to integrate)
### SDL Backend Integration
**Challenge**: How to access the SDL_Renderer from ImGui?
**Solution**:
- ImGui's `BackendRendererUserData` points to an `ImGui_ImplSDLRenderer2_Data` struct
- This struct contains the `Renderer` pointer as its first member
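- Casting `BackendRendererUserData` to a struct with the same leading member reaches the renderer; this relies on the backend's private layout, so it should be re-checked when upgrading ImGui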
**Why Not Store Renderer Globally?**
- Multiple ImGui contexts could use different renderers
- Backend data pattern follows ImGui's architecture conventions
- More maintainable and future-proof
---
## Integration with Test System
### Current Usage (Manual RPC)
AI agents or CLI tools can manually capture screenshots:
```bash
# Capture screenshot after opening editor
z3ed agent test --prompt "Open Overworld Editor"
grpcurl ... yaze.test.ImGuiTestHarness/Screenshot
```
### Next Step: Auto-Capture on Failure
The screenshot RPC is now ready to be integrated with TestManager to automatically capture context when tests fail:
**Planned Implementation** (IT-08 Phase 2):
```cpp
// In TestManager::MarkHarnessTestCompleted()
if (test_result == IMGUI_TEST_STATUS_FAILED ||
test_result == IMGUI_TEST_STATUS_TIMEOUT) {
// Auto-capture screenshot
ScreenshotRequest req;
req.set_output_path(absl::StrFormat("/tmp/test_%s_failure.bmp", test_id));
ScreenshotResponse resp;
harness_service_->Screenshot(&req, &resp);
test_history_[test_id].screenshot_path = resp.file_path();
// Also capture widget state (IT-08 Phase 3)
test_history_[test_id].widget_state = CaptureWidgetState();
}
```
---
## Remaining Work (IT-08 Phases 2-3)
### Phase 2: Auto-Capture on Test Failure (1-1.5 hours)
**Tasks**:
1. Modify `TestManager::MarkHarnessTestCompleted()` to detect failures
2. Call Screenshot RPC automatically when `status == FAILED || status == TIMEOUT`
3. Store screenshot path in test history
4. Update `GetTestResults` RPC to include screenshot paths in response
5. Test with intentional test failures
**Files to Modify**:
- `src/app/core/test_manager.cc` (auto-capture logic)
- `src/app/core/service/imgui_test_harness_service.cc` (store screenshot in history)
### Phase 3: Widget State Dump (30-45 minutes)
**Tasks**:
1. Implement `CaptureWidgetState()` function to traverse ImGui window hierarchy
2. Capture: focused window, focused widget, hovered widget, open menus
3. Store as JSON string in test history
4. Include in `GetTestResults` response
**Files to Create**:
- `src/app/core/widget_state_capture.{h,cc}` (traversal logic)
**Example Output**:
```json
{
"focused_window": "Overworld Editor",
"hovered_widget": "canvas_overworld_main",
"open_menus": [],
"visible_windows": ["Overworld Editor", "Palette Editor", "Tile16 Editor"]
}
```
---
## Performance Considerations
### Current Performance
- **Screenshot Capture Time**: ~10-20ms (depends on resolution)
- **File Write Time**: ~50-100ms (5.3MB BMP)
- **Total Impact**: ~60-120ms per screenshot
**Analysis**: Acceptable for failure scenarios (only captures when test fails, not on every frame)
### Optimization Options (If Needed)
1. **Async Capture**: Move screenshot to background thread (complex, may not be necessary)
2. **PNG Compression**: Reduce file size from 5.3MB to ~500KB (10x smaller)
3. **Downscaling**: Capture at 50% resolution (768x432) for faster I/O
4. **Skip Screenshots for Fast Tests**: Only capture for tests >1 second
**Recommendation**: Current performance is fine for debugging. Only optimize if users report slowdowns.
---
## CLI Integration
### z3ed CLI Usage
The Screenshot RPC is accessible via the CLI automation client:
```cpp
// In gui_automation_client.cc
absl::StatusOr<ScreenshotResponse> GuiAutomationClient::TakeScreenshot(
const std::string& output_path) {
ScreenshotRequest request;
request.set_output_path(output_path);
ScreenshotResponse response;
grpc::ClientContext context;
auto status = stub_->Screenshot(&context, request, &response);
if (!status.ok()) {
return absl::InternalError(status.error_message());
}
return response;
}
```
### Agent Mode Integration
AI agents can now request screenshots to understand GUI state:
```yaml
# Example agent workflow
- action: click
target: "Overworld Editor##tab"
- action: screenshot
output: "/tmp/overworld_state.bmp"
- action: analyze
image: "/tmp/overworld_state.bmp"
prompt: "Verify Overworld Editor opened successfully"
```
---
## Next Steps
### Immediate (Continue IT-08)
1. **Build and Test**: ✅ Complete (Oct 2, 2025)
2. **Auto-Capture on Failure**: 📋 Next (1-1.5 hours)
3. **Widget State Dump**: 📋 After auto-capture (30-45 minutes)
### After IT-08 Completion
**IT-09: CI/CD Integration** (2-3 hours):
- Test suite YAML format
- JUnit XML output for GitHub Actions
- Example workflow file
---
## Success Metrics
- **Screenshot RPC Works**: Successfully captures 1536x864, 32-bit BMP files
- **Integration Ready**: Can be called from CLI, agents, or test harness
- **Performance Acceptable**: ~60-120ms total impact per capture
- **Error Handling**: Returns clear error messages if renderer unavailable
**Overall IT-08 Progress**: 30% complete (1 of 3 phases done)
---
## Documentation Updates
### Files Updated
- `src/app/core/service/imgui_test_harness_service.cc` (Screenshot implementation)
- `docs/z3ed/IT-08-SCREENSHOT-COMPLETION.md` (this file)
### Files to Update Next
- `docs/z3ed/IMPLEMENTATION_CONTINUATION.md` (mark Screenshot complete)
- `docs/z3ed/STATUS_REPORT_OCT2.md` (update progress to 30%)
- `docs/z3ed/NEXT_STEPS_OCT2.md` (shift focus to Phase 2)
---
## Conclusion
The Screenshot RPC is fully functional and tested. It provides the foundation for IT-08's enhanced error reporting system by capturing visual context when tests fail.
**Key Achievement**: AI agents can now "see" what's on screen, enabling visual debugging and verification workflows.
**What's Next**: Integrate screenshot capture with the test failure detection system so every failed test automatically includes a screenshot + widget state dump.
**Estimated Time to Complete IT-08**: 1.5-2 hours remaining (auto-capture + widget state)
---
**Report Generated**: October 2, 2025
**Author**: GitHub Copilot (AI Assistant)
**Project**: YAZE - Yet Another Zelda3 Editor
**Component**: z3ed CLI Tool - Test Automation Harness

@@ -0,0 +1,388 @@
# IT-08b: Auto-Capture on Test Failure - Implementation Guide
**Status**: 🔄 Ready to Implement
**Priority**: High (Next Phase of IT-08)
**Time Estimate**: 1-1.5 hours
**Date**: October 2, 2025
---
## Overview
Automatically capture screenshots and execution context when tests fail, enabling better debugging and diagnostics for AI agents.
**Goal**: Every failed test produces:
- Screenshot of GUI state at failure
- Execution context (frame count, active windows, focused widgets)
- Foundation for IT-08c (widget state dumps)
---
## Implementation Steps
### Step 1: Update TestHistory Structure (15 minutes)
**File**: `src/app/core/test_manager.h`
Add failure diagnostics fields:
```cpp
struct TestHistory {
std::string test_id;
std::string test_name;
ImGuiTestStatus status;
absl::Time start_time;
absl::Time end_time;
int64_t execution_time_ms;
std::vector<std::string> logs;
std::map<std::string, std::string> metrics;
// IT-08b: Failure diagnostics
std::string screenshot_path;
int64_t screenshot_size_bytes = 0;
std::string failure_context;
// IT-08c: Widget state (future)
std::string widget_state;
};
```
### Step 2: Add CaptureFailureContext Method (30 minutes)
**File**: `src/app/core/test_manager.cc`
Add new method after `MarkHarnessTestCompleted`:
```cpp
void TestManager::CaptureFailureContext(const std::string& test_id) {
if (test_history_.find(test_id) == test_history_.end()) {
return;
}
auto& history = test_history_[test_id];
// 1. Capture screenshot via harness service
if (harness_service_) {
std::string screenshot_path =
absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id);
ScreenshotRequest req;
req.set_output_path(screenshot_path);
ScreenshotResponse resp;
auto status = harness_service_->Screenshot(&req, &resp);
if (status.ok() && resp.success()) {
history.screenshot_path = resp.file_path();
history.screenshot_size_bytes = resp.file_size_bytes();
} else {
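      // An OK status can still carry success == false, so report whichever
      // message is actually populated.
      YAZE_LOG(ERROR) << "Failed to capture screenshot for " << test_id
                      << ": "
                      << (status.ok() ? absl::string_view(resp.message())
                                      : status.message());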
}
}
// 2. Capture execution context
ImGuiContext* ctx = ImGui::GetCurrentContext();
if (ctx) {
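    // GetCurrentWindowRead() does not dereference the current window, so the
    // null check on the next line is safe outside Begin()/End().
    ImGuiWindow* current_window = ImGui::GetCurrentWindowRead();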
std::string window_name = current_window ? current_window->Name : "none";
ImGuiID active_id = ImGui::GetActiveID();
ImGuiID hovered_id = ImGui::GetHoveredID();
history.failure_context = absl::StrFormat(
"Frame: %d, Window: %s, Active: %u, Hovered: %u",
ImGui::GetFrameCount(),
window_name,
active_id,
hovered_id);
}
// 3. Widget state capture (IT-08c - placeholder)
// history.widget_state = CaptureWidgetState();
}
```
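
The context string is easy to exercise in isolation. The sketch below reproduces only the formatting step with the standard library, under the assumption that the four values have already been read from ImGui. `FormatFailureContext` is a hypothetical helper name, and the real method would presumably keep using `absl::StrFormat`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats already-collected values in the same layout
// as the StrFormat call above. ImGuiID is an unsigned int in Dear ImGui,
// so plain `unsigned int` stands in for it here. Output longer than the
// buffer is truncated by snprintf.
std::string FormatFailureContext(int frame, const std::string& window,
                                 unsigned int active_id,
                                 unsigned int hovered_id) {
  char buffer[256];
  std::snprintf(buffer, sizeof(buffer),
                "Frame: %d, Window: %s, Active: %u, Hovered: %u", frame,
                window.empty() ? "none" : window.c_str(), active_id,
                hovered_id);
  return std::string(buffer);
}
```

Separating collection from formatting would let the string layout be unit-tested without a live ImGui context.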
### Step 3: Integrate with MarkHarnessTestCompleted (15 minutes)
**File**: `src/app/core/test_manager.cc`
Modify existing method to call CaptureFailureContext:
```cpp
void TestManager::MarkHarnessTestCompleted(const std::string& test_id,
ImGuiTestStatus status) {
if (test_history_.find(test_id) == test_history_.end()) {
return;
}
auto& history = test_history_[test_id];
history.status = status;
history.end_time = absl::Now();
history.execution_time_ms = absl::ToInt64Milliseconds(
history.end_time - history.start_time);
// Auto-capture diagnostics on failure
if (status == ImGuiTestStatus_Error || status == ImGuiTestStatus_Warning) {
CaptureFailureContext(test_id);
}
// Notify waiting threads
cv_.notify_all();
}
```
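
The gating condition is what keeps passing tests free of capture overhead, so it may be worth isolating in a small predicate. This is a sketch only: `HarnessStatus` is a hypothetical stand-in enum so the example compiles without ImGui, and its members merely mirror the `ImGuiTestStatus_Error` and `ImGuiTestStatus_Warning` values used above.

```cpp
// Hypothetical stand-in for ImGuiTestStatus, for illustration only.
enum class HarnessStatus { kUnknown, kSuccess, kRunning, kWarning, kError };

// Returns true only for the statuses that should trigger
// CaptureFailureContext(); every other status skips the capture path.
constexpr bool ShouldCaptureDiagnostics(HarnessStatus status) {
  return status == HarnessStatus::kError || status == HarnessStatus::kWarning;
}
```

A named predicate also gives one place to change if timeouts or other statuses should later trigger a capture.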
### Step 4: Update GetTestResults RPC (30 minutes)
**File**: `src/app/core/proto/imgui_test_harness.proto`
Add fields to response:
```proto
message GetTestResultsResponse {
string test_id = 1;
TestStatus status = 2;
int64 execution_time_ms = 3;
repeated string logs = 4;
map<string, string> metrics = 5;
// IT-08b: Failure diagnostics
string screenshot_path = 6;
int64 screenshot_size_bytes = 7;
string failure_context = 8;
// IT-08c: Widget state (future)
string widget_state = 9;
}
```
**File**: `src/app/core/service/imgui_test_harness_service.cc`
Update implementation:
```cpp
absl::Status ImGuiTestHarnessServiceImpl::GetTestResults(
const GetTestResultsRequest* request,
GetTestResultsResponse* response) {
const std::string& test_id = request->test_id();
auto history = test_manager_->GetTestHistory(test_id);
if (!history.has_value()) {
return absl::NotFoundError(
absl::StrFormat("Test not found: %s", test_id));
}
const auto& h = history.value();
// Basic info
response->set_test_id(h.test_id);
response->set_status(ConvertImGuiTestStatusToProto(h.status));
response->set_execution_time_ms(h.execution_time_ms);
// Logs and metrics
for (const auto& log : h.logs) {
response->add_logs(log);
}
for (const auto& [key, value] : h.metrics) {
(*response->mutable_metrics())[key] = value;
}
// IT-08b: Failure diagnostics
if (!h.screenshot_path.empty()) {
response->set_screenshot_path(h.screenshot_path);
response->set_screenshot_size_bytes(h.screenshot_size_bytes);
}
if (!h.failure_context.empty()) {
response->set_failure_context(h.failure_context);
}
// IT-08c: Widget state (future)
if (!h.widget_state.empty()) {
response->set_widget_state(h.widget_state);
}
return absl::OkStatus();
}
```
---
## Testing
### Build and Start Test Harness
```bash
# 1. Rebuild with changes
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
### Trigger Test Failure
```bash
# 3. Trigger a failing test (nonexistent widget)
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Response should indicate failure
```
### Verify Screenshot Captured
```bash
# 4. Check for auto-captured screenshot
ls -lh /tmp/yaze_test_*_failure.bmp
# Expected: BMP file created (5.3MB)
```
### Query Test Results
```bash
# 5. Get test results (replace <test_id> with actual ID from Click response)
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"test_id":"<test_id>"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
# Expected output:
{
"testId": "grpc_click_12345678",
"status": "FAILED",
"executionTimeMs": "1234",
"logs": [...],
"screenshotPath": "/tmp/yaze_test_grpc_click_12345678_failure.bmp",
"screenshotSizeBytes": "5308538",
"failureContext": "Frame: 1234, Window: Main Window, Active: 0, Hovered: 0"
}
```
### End-to-End Test Script
Create `scripts/test_auto_capture.sh`:
```bash
#!/bin/bash
set -e
echo "=== IT-08b Auto-Capture Test ==="
# Clean up old screenshots
rm -f /tmp/yaze_test_*_failure.bmp
# Start YAZE with test harness
echo "Starting YAZE..."
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
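YAZE_PID=$!
# Stop YAZE even if a later command fails under `set -e`
trap 'kill "$YAZE_PID" 2>/dev/null || true' EXIT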
# Wait for server to start
sleep 3
# Trigger failing test
echo "Triggering test failure..."
TEST_ID=$(grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click | \
jq -r '.testId')
echo "Test ID: $TEST_ID"
# Wait for test to complete
sleep 2
# Check screenshot captured
if [ -f "/tmp/yaze_test_${TEST_ID}_failure.bmp" ]; then
echo "✅ Screenshot captured: /tmp/yaze_test_${TEST_ID}_failure.bmp"
else
echo "❌ Screenshot NOT captured"
kill $YAZE_PID
exit 1
fi
# Query test results
echo "Querying test results..."
RESULTS=$(grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d "{\"test_id\":\"$TEST_ID\"}" \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults)
echo "$RESULTS"
# Verify fields present
if echo "$RESULTS" | jq -e '.screenshotPath' > /dev/null; then
echo "✅ Screenshot path in results"
else
echo "❌ Screenshot path missing"
kill $YAZE_PID
exit 1
fi
if echo "$RESULTS" | jq -e '.failureContext' > /dev/null; then
echo "✅ Failure context in results"
else
echo "❌ Failure context missing"
kill $YAZE_PID
exit 1
fi
echo "=== All tests passed! ==="
# Cleanup
kill $YAZE_PID
```
---
## Success Criteria
- ✅ Screenshots auto-captured on test failure (Error or Warning status)
- ✅ Screenshot path stored in TestHistory
- ✅ Failure context captured (frame, window, widgets)
- ✅ GetTestResults RPC returns screenshot_path and failure_context
- ✅ No performance impact on passing tests (capture only on failure)
- ✅ Clean error handling if screenshot capture fails
---
## Files Modified
1. `src/app/core/test_manager.h` - TestHistory structure
2. `src/app/core/test_manager.cc` - CaptureFailureContext method
3. `src/app/core/proto/imgui_test_harness.proto` - GetTestResultsResponse fields
4. `src/app/core/service/imgui_test_harness_service.cc` - GetTestResults implementation
---
## Next Steps
**After IT-08b Complete**:
1. IT-08c: Widget State Dumps (30-45 minutes)
2. IT-08d: Error Envelope Standardization (1-2 hours)
3. IT-08e: CLI Error Improvements (1 hour)
**Documentation Updates**:
1. Update `IT-08-IMPLEMENTATION-GUIDE.md` with IT-08b complete status
2. Update `E6-z3ed-implementation-plan.md` progress tracking
3. Update `README.md` with new capabilities
---
**Last Updated**: October 2, 2025
**Status**: Ready to implement
**Estimated Completion**: October 2-3, 2025 (1-1.5 hours)

@@ -1,251 +0,0 @@
# Policy Evaluation Framework - Implementation Complete ✅
**Date**: October 2025
**Task**: AW-04 - Policy Evaluation Framework
**Status**: ✅ Complete - Ready for Production Testing
**Time**: 6 hours actual (estimated 6-8 hours)
## Overview
The Policy Evaluation Framework enables safe AI-driven ROM modifications by gating proposal acceptance based on YAML-configured constraints. This prevents the agent from making dangerous changes (corrupting ROM headers, exceeding byte limits, bypassing test requirements) while maintaining flexibility through configurable policies.
## Implementation Summary
### Core Components
1. **PolicyEvaluator Service** (`src/cli/service/policy_evaluator.{h,cc}`)
- Singleton service managing policy loading and evaluation
- 377 lines of implementation code
- Thread-safe with absl::StatusOr error handling
- Auto-loads from `.yaze/policies/agent.yaml` on first use
2. **Policy Types** (4 implemented):
- **test_requirement**: Gates on test status (critical severity)
- **change_constraint**: Limits bytes modified (warning/critical)
- **forbidden_range**: Blocks specific memory regions (critical)
- **review_requirement**: Flags proposals needing scrutiny (warning)
3. **Severity Levels** (3 levels):
- **Info**: Informational only, no blocking
- **Warning**: User can override with confirmation
- **Critical**: Blocks acceptance completely
4. **GUI Integration** (`src/app/editor/system/proposal_drawer.{h,cc}`)
- `DrawPolicyStatus()`: Color-coded violation display
- ⛔ Red for critical violations
- ⚠️ Yellow for warnings
- Blue for info messages
- Accept button gating: Disabled when critical violations present
- Override dialog: Confirmation required for warnings
5. **Configuration** (`.yaze/policies/agent.yaml`)
- Simple YAML-like format for policy definitions
- Example configuration with 4 policies provided
- User can enable/disable individual policies
- Supports comments and version tracking
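
The severity rules above reduce to a small decision function. The following is a minimal sketch under stated assumptions: the type and function names (`Severity`, `Violation`, `AcceptDecision`, `DecideAccept`) are invented for illustration and need not match the declarations in `policy_evaluator.h`.

```cpp
#include <string>
#include <vector>

enum class Severity { kInfo, kWarning, kCritical };

struct Violation {
  std::string policy_name;
  Severity severity;
};

enum class AcceptDecision { kAllowed, kNeedsOverride, kBlocked };

// Critical violations block acceptance, warnings require an explicit
// override, and info-level entries never affect the outcome.
AcceptDecision DecideAccept(const std::vector<Violation>& violations) {
  bool has_warning = false;
  for (const Violation& v : violations) {
    if (v.severity == Severity::kCritical) return AcceptDecision::kBlocked;
    if (v.severity == Severity::kWarning) has_warning = true;
  }
  return has_warning ? AcceptDecision::kNeedsOverride
                     : AcceptDecision::kAllowed;
}
```

In the GUI, `kBlocked` would correspond to a disabled Accept button and `kNeedsOverride` to the confirmation dialog.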
### Build System Integration
- Added `cli/service/policy_evaluator.cc` to:
- `src/cli/z3ed.cmake` (z3ed CLI target)
- `src/app/app.cmake` (yaze GUI target, with `YAZE_ENABLE_POLICY_FRAMEWORK=1`)
- **Conditional Compilation**: Policy framework only enabled in main `yaze` target
- `yaze_emu` (emulator) builds without policy support
- Uses `#ifdef YAZE_ENABLE_POLICY_FRAMEWORK` to wrap optional code
- Clean build with no errors (warnings only for Abseil version mismatch)
## Code Changes
### Files Created (4 new files):
1. **docs/z3ed/AW-04-POLICY-FRAMEWORK.md** (1,234 lines)
- Complete implementation specification
- YAML schema documentation
- Architecture diagrams and examples
- 4-phase implementation plan
2. **src/cli/service/policy_evaluator.h** (85 lines)
- PolicyEvaluator singleton interface
- PolicyResult, PolicyViolation structures
- PolicySeverity enum
- Public API: LoadPolicies(), EvaluateProposal(), ReloadPolicies()
3. **src/cli/service/policy_evaluator.cc** (377 lines)
- ParsePolicyFile(): Simple YAML parser
- Evaluate[Test|Change|Forbidden|Review](): Policy evaluation logic
- CategorizeViolations(): Severity-based filtering
4. **.yaze/policies/agent.yaml** (34 lines)
- Example policy configuration
- 4 sample policies with detailed comments
- Ready for production use
### Files Modified (5 files):
1. **src/app/editor/system/proposal_drawer.h**
- Added: `DrawPolicyStatus()` method
- Added: `show_override_dialog_` member variable
2. **src/app/editor/system/proposal_drawer.cc** (~100 lines added)
- Integrated PolicyEvaluator::Get().EvaluateProposal()
- Implemented DrawPolicyStatus() with color-coded violations
- Modified DrawActionButtons() to gate Accept button
- Added policy override confirmation dialog
3. **src/cli/z3ed.cmake**
- Added: `cli/service/policy_evaluator.cc` to z3ed sources
4. **src/app/app.cmake**
- Added: `cli/service/policy_evaluator.cc` to yaze sources
- Added: `YAZE_ENABLE_POLICY_FRAMEWORK=1` compile definition
- Note: `yaze_emu` target does NOT include policy framework (optional feature)
5. **src/app/editor/system/proposal_drawer.cc** (same file as item 2, additional change)
- Wrapped policy code with `#ifdef YAZE_ENABLE_POLICY_FRAMEWORK`
- Gracefully degrades when policy framework disabled
6. **docs/z3ed/E6-z3ed-implementation-plan.md**
- Updated: AW-04 status from "📋 Next" to "✅ Done"
- Updated: Active phase to Policy Framework complete
- Updated: Time investment to 28.5 hours total
## Technical Details
### Conditional Compilation
The policy framework uses conditional compilation to allow building without policy support:
```cpp
#ifdef YAZE_ENABLE_POLICY_FRAMEWORK
auto& policy_eval = cli::PolicyEvaluator::Get();
auto policy_result = policy_eval.EvaluateProposal(p.id);
// ... policy evaluation logic ...
#endif
```
**Build Targets**:
- `yaze` (main editor): Policy framework **enabled**
- `yaze_emu` (emulator): Policy framework **disabled** (not needed)
- `z3ed` (CLI): Policy framework **enabled**
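The graceful degradation described above can be sketched as a single gating function whose body changes with the compile definition. `GateDecision` and `GateAccept` are hypothetical names used only for illustration; the real drawer code works with `PolicyResult`.

```cpp
#include <string>

// Hypothetical stand-in for the information the drawer needs.
struct GateDecision {
  bool accept_enabled;
  std::string reason;
};

// Decides whether the Accept button is enabled. The boolean input is an
// illustrative simplification, not a field of the real PolicyResult.
inline GateDecision GateAccept(bool has_critical_violation) {
#ifdef YAZE_ENABLE_POLICY_FRAMEWORK
  if (has_critical_violation) {
    return {false, "blocked by policy"};
  }
  return {true, "policy checks passed"};
#else
  (void)has_critical_violation;  // policy support compiled out
  return {true, "policy framework disabled"};
#endif
}
```

In a target built without the definition, the function always enables Accept, which mirrors how `yaze_emu` builds without policy support.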
### API Usage Patterns
**StatusOr Error Handling**:
```cpp
auto proposal_result = registry.GetProposal(proposal_id);
if (!proposal_result.ok()) {
return PolicyResult{false, {}, {}, {}, {}};
}
const auto& proposal = proposal_result.value();
```
**String View Conversions**:
```cpp
// Explicit conversion required for absl::string_view → std::string
std::string trimmed = std::string(absl::StripAsciiWhitespace(line));
config_->version = std::string(absl::StripAsciiWhitespace(parts[1]));
```
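The same rule holds for `std::string_view`, so the conversion can be shown with the standard library alone. This is a minimal sketch; `StripWs` and `OwnedTrim` are hypothetical helpers standing in for `absl::StripAsciiWhitespace` and the call sites above.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for absl::StripAsciiWhitespace: returns a view into
// the input, so no characters are copied here.
inline std::string_view StripWs(std::string_view s) {
  const auto b = s.find_first_not_of(" \t\r\n");
  if (b == std::string_view::npos) return {};
  const auto e = s.find_last_not_of(" \t\r\n");
  return s.substr(b, e - b + 1);
}

inline std::string OwnedTrim(std::string_view line) {
  // std::string s = StripWs(line);   // rejected: the conversion is explicit
  return std::string(StripWs(line));  // explicit copy into an owning string
}
```

The explicit copy matters whenever the trimmed text must outlive the buffer that the view points into.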
**Singleton Pattern**:
```cpp
PolicyEvaluator& evaluator = PolicyEvaluator::Get();
PolicyResult result = evaluator.EvaluateProposal(proposal_id);
```
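One common way to provide such a `Get()` accessor is a function-local static. The sketch below assumes that approach; `PolicyEvaluatorSketch` and its `policy_dir` member are hypothetical and do not reproduce the real class.

```cpp
#include <string>
#include <utility>

class PolicyEvaluatorSketch {
 public:
  // Function-local static: constructed on first use, and initialization is
  // thread-safe since C++11.
  static PolicyEvaluatorSketch& Get() {
    static PolicyEvaluatorSketch instance;
    return instance;
  }

  // A singleton should not be copied.
  PolicyEvaluatorSketch(const PolicyEvaluatorSketch&) = delete;
  PolicyEvaluatorSketch& operator=(const PolicyEvaluatorSketch&) = delete;

  void set_policy_dir(std::string dir) { policy_dir_ = std::move(dir); }
  const std::string& policy_dir() const { return policy_dir_; }

 private:
  PolicyEvaluatorSketch() = default;
  std::string policy_dir_ = ".yaze/policies";
};
```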
### Compilation Fixes Applied
1. **Include Paths**: Changed from `src/cli/service/...` to `cli/service/...`
2. **StatusOr API**: Used `.ok()` and `.value()` instead of `.has_value()`
3. **String Numbers**: Added `#include "absl/strings/numbers.h"` for SimpleAtoi
4. **String View**: Explicit `std::string()` cast for all absl::StripAsciiWhitespace() calls
5. **Conditional Compilation**: Wrapped policy code with `YAZE_ENABLE_POLICY_FRAMEWORK` to fix yaze_emu build
## Testing Plan
### Phase 1: Manual Validation (Next Step)
- [ ] Launch yaze GUI and open Proposal Drawer
- [ ] Create test proposal and verify policy evaluation runs
- [ ] Test critical violation blocking (Accept button disabled)
- [ ] Test warning override flow (confirmation dialog)
- [ ] Verify policy status display with all severity levels
### Phase 2: Policy Testing
- [ ] Test forbidden_range detection (ROM header protection)
- [ ] Test change_constraint limits (byte count enforcement)
- [ ] Test test_requirement gating (blocks without passing tests)
- [ ] Test review_requirement flagging (complex proposals)
- [ ] Test policy enable/disable toggle
### Phase 3: Edge Cases
- [ ] Invalid YAML syntax handling
- [ ] Missing policy file behavior
- [ ] Malformed policy definitions
- [ ] Policy reload during runtime
- [ ] Multiple policies of same type
### Phase 4: Unit Tests
- [ ] PolicyEvaluator::ParsePolicyFile() unit tests
- [ ] Individual policy type evaluation tests
- [ ] Severity categorization tests
- [ ] Integration tests with ProposalRegistry
## Known Limitations
1. **YAML Parsing**: Simple custom parser implemented
- Works for current format but not full YAML spec
- Consider yaml-cpp for complex nested structures
2. **Forbidden Range Checking**: Requires ROM diff parsing
- Currently placeholder implementation
- Will need integration with .z3ed-diff format
3. **Review Requirement Conditions**: Complex expression evaluation
   - Currently uses simple string matching
- May need expression parser for production
4. **Performance**: No profiling done yet
- Target: < 100ms per evaluation
- Likely well under target given simple logic
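Once changed byte ranges can be extracted from a diff, the forbidden range check from limitation 2 reduces to interval overlap. The sketch below assumes half-open ranges; `ByteRange`, `Overlaps`, and `TouchesForbidden` are hypothetical names, and reading the `.z3ed-diff` format is not shown.

```cpp
#include <cstdint>
#include <vector>

// Half-open byte range [start, end) within the ROM file.
struct ByteRange {
  std::uint32_t start;
  std::uint32_t end;
};

// Two half-open ranges overlap when each starts before the other ends.
inline bool Overlaps(const ByteRange& a, const ByteRange& b) {
  return a.start < b.end && b.start < a.end;
}

// Returns true if any changed range intersects any forbidden range.
inline bool TouchesForbidden(const std::vector<ByteRange>& changes,
                             const std::vector<ByteRange>& forbidden) {
  for (const auto& c : changes) {
    for (const auto& f : forbidden) {
      if (Overlaps(c, f)) return true;
    }
  }
  return false;
}
```

With a forbidden range of `{0x7FC0, 0x8000}` (an illustrative value, not taken from the shipped policy file), a change at `{0x7FF0, 0x7FF4}` would be flagged while one starting at `0x8000` would not.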
## Production Readiness Checklist
- ✅ Core implementation complete
- ✅ Build system integration
- ✅ GUI integration
- ✅ Example configuration
- ✅ Documentation complete
- ⏳ Manual testing (next step)
- ⏳ Unit test coverage
- ⏳ Windows cross-platform validation
- ⏳ Performance profiling
## Next Steps
**Immediate** (30 minutes):
1. Launch yaze and test policy evaluation in ProposalDrawer
2. Verify all 4 policy types work correctly
3. Test override workflow for warnings
**Short-term** (2-3 hours):
1. Add unit tests for PolicyEvaluator
2. Test on Windows build
3. Document policy configuration in user guide
**Medium-term** (4-6 hours):
1. Integrate with .z3ed-diff for forbidden range detection
2. Implement full YAML parser (yaml-cpp)
3. Add policy reload command to CLI
4. Performance profiling and optimization
## References
- **Specification**: [AW-04-POLICY-FRAMEWORK.md](AW-04-POLICY-FRAMEWORK.md)
- **Implementation Plan**: [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)
- **Example Config**: `.yaze/policies/agent.yaml`
- **Source Files**:
- `src/cli/service/policy_evaluator.{h,cc}`
- `src/app/editor/system/proposal_drawer.{h,cc}`
---
**Accomplishment**: The Policy Evaluation Framework is now fully implemented and ready for production testing. This represents a major safety milestone for the z3ed agentic workflow system, enabling confident AI-driven ROM modifications with human-defined constraints.


@@ -16,6 +16,8 @@
This directory contains the primary documentation for the `z3ed` system.
**📋 Documentation Status**: Consolidated (Oct 2, 2025) - 10 core files, 6,547 lines
## Core Documentation
Start here to understand the architecture, learn how to use the commands, and see the current development status.
@@ -90,6 +92,7 @@ See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
- AI agents can now capture visual context for debugging
- ✅ IT-07 Test Recording & Replay Complete: Regression testing workflow operational
- ✅ Server-side wiring for test lifecycle tracking inside `TestManager`
- ✅ gRPC status mapping helper to surface accurate error codes back to clients
- ✅ CLI integration with YAML/JSON output formats
@@ -97,11 +100,11 @@ See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
**Next Priority**: IT-08b (Auto-capture on failure) + IT-08c (Widget state dumps) to complete enhanced error reporting
**Test Harness Evolution** (In Progress: IT-05 to IT-09 | 76% Complete): **Test Harness Evolution** (In Progress: IT-05 to IT-09 | 78% Complete):
- **Test Introspection**: ✅ Query test status, results, and execution history
- **Widget Discovery**: ✅ AI agents can enumerate available GUI interactions dynamically
- **Test Recording**: ✅ Capture manual workflows as JSON scripts for regression testing
- **Enhanced Debugging**: 🔄 Screenshot capture (✅), widget state dumps (📋), execution context on failures (📋) - **Enhanced Debugging**: 🔄 Screenshot capture (✅ IT-08a), widget state dumps (📋 IT-08c), execution context on failures (📋 IT-08b)
- **CI/CD Integration**: 📋 Standardized test suite format with JUnit XML output
See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolution-from-automation-to-platform)** for detailed architecture and implementation roadmap.
@@ -111,12 +114,13 @@ See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolutio
**📖 Getting Started**:
- **New to z3ed?** Start with this [README.md](README.md) then [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)
- **Want to use z3ed?** See [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for all commands
- **Resume implementation?** Read [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md)
**🔧 Implementation Guides**:
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (next priority) - [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (complete ✅)
- [STATUS_REPORT_OCT2.md](STATUS_REPORT_OCT2.md) - Complete progress summary - [IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md) - Enhanced Error Reporting (in progress 🔄)
- [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) - Detailed continuation plan for current phase
**📚 Reference**:
- [E6-z3ed-reference.md](E6-z3ed-reference.md) - Technical reference and API docs
- [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md) - Task backlog and roadmap
- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Quick command reference


@@ -1,402 +0,0 @@
# Remote Control Agent Workflows
**Date**: October 2, 2025
**Status**: Functional - Test Harness + Widget Registry Integration
**Purpose**: Enable AI agents to remotely control YAZE for automated editing
## Overview
The remote control system allows AI agents to interact with YAZE through gRPC, using the ImGuiTestHarness and Widget ID Registry to perform real editing tasks.
## Quick Start
### 1. Start YAZE with Test Harness
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
### 2. Open Overworld Editor
In YAZE GUI:
- Click "Overworld" button
- This registers 13 toolset widgets for remote control
### 3. Run Test Script
```bash
./scripts/test_remote_control.sh
```
Expected output:
- ✓ All 8 practical workflows pass
- Agent can switch modes, open tools, control zoom
## Supported Workflows
### Mode Switching
**Draw Tile Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:DrawTile","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables tile painting on overworld map
- Agent can then click canvas to draw selected tiles
**Pan Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Pan","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables map navigation
- Agent can drag canvas to reposition view
**Entrances Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Entrances","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables entrance editing
- Agent can click to place/move entrances
**Exits Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Exits","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables exit editing
- Agent can click to place/move exits
**Sprites Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Sprites","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables sprite editing
- Agent can place/move sprites on overworld
**Items Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Items","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables item placement
- Agent can add items to overworld
### Tool Opening
**Tile16 Editor**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Tile16Editor","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Opens Tile16 Editor window
- Agent can select tiles for drawing
### View Controls
**Zoom In**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:ZoomIn","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
**Zoom Out**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:ZoomOut","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
**Fullscreen Toggle**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Fullscreen","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
## Multi-Step Workflows
### Workflow 1: Draw Custom Tiles
**Goal**: Agent draws specific tiles on the overworld map
**Steps**:
1. Switch to Draw Tile mode
2. Open Tile16 Editor
3. Select desired tile (TODO: needs canvas click support)
4. Click on overworld canvas at (x, y) to draw
**Current Status**: Steps 1-2 working, 3-4 need implementation
### Workflow 2: Reposition Entrance
**Goal**: Agent moves an entrance to a new location
**Steps**:
1. Switch to Entrances mode
2. Click on existing entrance to select
3. Drag to new location (TODO: needs drag support)
4. Verify entrance properties updated
**Current Status**: Step 1 working, 2-4 need implementation
### Workflow 3: Place Sprites
**Goal**: Agent adds sprites to overworld
**Steps**:
1. Switch to Sprites mode
2. Select sprite from palette (TODO)
3. Click canvas to place sprite
4. Adjust sprite properties if needed
**Current Status**: Step 1 working, 2-4 need implementation
## Widget Registry Integration
### Hierarchical Widget IDs
The test harness now supports hierarchical widget IDs from the registry:
```
Format: <Editor>/<Section>/<Type>:<Name>
Example: Overworld/Toolset/button:DrawTile
```
**Benefits**:
- Stable, predictable widget references
- Better error messages with suggestions
- Backwards compatible with legacy format
- Self-documenting structure
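To make the `<Editor>/<Section>/<Type>:<Name>` format concrete, the sketch below splits an ID into its four parts. It is an illustration only: `WidgetPath` and `ParseWidgetPath` are hypothetical names and are not part of the registry API.

```cpp
#include <optional>
#include <string>

// The four components of a hierarchical widget ID.
struct WidgetPath {
  std::string editor;
  std::string section;
  std::string type;
  std::string name;
};

// Splits "<Editor>/<Section>/<Type>:<Name>". Returns std::nullopt for any
// string that does not have all four non-empty parts, which includes the
// legacy "button:DrawTile" form.
inline std::optional<WidgetPath> ParseWidgetPath(const std::string& path) {
  const auto first = path.find('/');
  if (first == std::string::npos) return std::nullopt;
  const auto second = path.find('/', first + 1);
  if (second == std::string::npos) return std::nullopt;
  const auto colon = path.find(':', second + 1);
  if (colon == std::string::npos) return std::nullopt;

  WidgetPath out{path.substr(0, first),
                 path.substr(first + 1, second - first - 1),
                 path.substr(second + 1, colon - second - 1),
                 path.substr(colon + 1)};
  if (out.editor.empty() || out.section.empty() || out.type.empty() ||
      out.name.empty()) {
    return std::nullopt;
  }
  return out;
}
```

A caller could use the `std::nullopt` result to decide when to fall back to the legacy lookup.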
### Pattern Matching
When a widget isn't found, the system suggests alternatives:
```bash
# Typo in widget name
grpcurl ... -d '{"target":"Overworld/Toolset/button:DrawTyle"}'
# Response:
# "Widget not found: DrawTyle. Did you mean:
# Overworld/Toolset/button:DrawTile?"
```
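One way to produce a suggestion for a typo such as `DrawTyle` is edit distance against the registered paths. This is a sketch of that technique, not necessarily how the registry matches: `EditDistance` and `SuggestWidget` are hypothetical names, and the default threshold of 2 is an arbitrary assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance using two rolling rows.
inline std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Returns the closest known path within max_distance edits, or an empty
// string when nothing is close enough.
inline std::string SuggestWidget(const std::string& target,
                                 const std::vector<std::string>& known,
                                 std::size_t max_distance = 2) {
  std::string best;
  std::size_t best_distance = max_distance + 1;
  for (const auto& candidate : known) {
    const std::size_t d = EditDistance(target, candidate);
    if (d < best_distance) {
      best_distance = d;
      best = candidate;
    }
  }
  return best;
}
```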
### Widget Discovery
Future enhancement - list all available widgets:
```bash
z3ed agent discover --pattern "Overworld/*"
# Lists all Overworld widgets
z3ed agent discover --pattern "*/button:*"
# Lists all buttons across editors
```
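Patterns such as `Overworld/*` and `*/button:*` only need a `*` wildcard. The sketch below shows one way to match them, assuming `*` may span any characters including `/`; `GlobMatch` is a hypothetical helper, not the registry's matcher.

```cpp
#include <cstddef>
#include <string>

// Returns true when text matches pattern, where '*' matches any run of
// characters (possibly empty). Iterative matching with backtracking to the
// most recent '*'.
inline bool GlobMatch(const std::string& pattern, const std::string& text) {
  std::size_t p = 0, t = 0;
  std::size_t star = std::string::npos;  // position of the last '*' seen
  std::size_t mark = 0;                  // text position that '*' resumes from
  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // let the last '*' absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars
  return p == pattern.size();
}
```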
## Implementation Details
### Test Harness Changes
**File**: `src/app/core/service/imgui_test_harness_service.cc`
**Changes**:
1. Added widget registry include
2. Click RPC tries hierarchical lookup first
3. Fallback to legacy string-based lookup
4. Pattern matching for suggestions
**Code**:
```cpp
// Try hierarchical widget ID lookup first
auto& registry = gui::WidgetIdRegistry::Instance();
ImGuiID widget_id = registry.GetWidgetId(target);
if (widget_id != 0) {
// Found in registry - use ImGui ID directly
ctx->ItemClick(widget_id, mouse_button);
} else {
// Fallback to legacy lookup
ctx->ItemClick(widget_label.c_str(), mouse_button);
}
```
### Widget Registration
**File**: `src/app/editor/overworld/overworld_editor.cc`
**Registered Widgets** (13 total):
- Overworld/Toolset/button:Pan
- Overworld/Toolset/button:DrawTile
- Overworld/Toolset/button:Entrances
- Overworld/Toolset/button:Exits
- Overworld/Toolset/button:Items
- Overworld/Toolset/button:Sprites
- Overworld/Toolset/button:Transports
- Overworld/Toolset/button:Music
- Overworld/Toolset/button:ZoomIn
- Overworld/Toolset/button:ZoomOut
- Overworld/Toolset/button:Fullscreen
- Overworld/Toolset/button:Tile16Editor
- Overworld/Toolset/button:CopyMap
## Next Steps
### Priority 1: Canvas Interaction (2-3 hours)
**Goal**: Enable agent to click on canvas at specific coordinates
**Implementation**:
1. Add canvas click to Click RPC
2. Support coordinate-based clicking: `{"target":"canvas:Overworld","x":100,"y":200}`
3. Test drawing tiles programmatically
**Use Cases**:
- Draw tiles at specific locations
- Select entities by clicking
- Navigate by clicking minimap
### Priority 2: Tile Selection (1-2 hours)
**Goal**: Enable agent to select tiles from Tile16 Editor
**Implementation**:
1. Register Tile16 Editor canvas widgets
2. Support tile palette clicking
3. Track selected tile state
**Use Cases**:
- Select tile before drawing
- Change tile selection mid-workflow
- Verify correct tile selected
### Priority 3: Entity Manipulation (2-3 hours)
**Goal**: Enable dragging of entrances, exits, sprites
**Implementation**:
1. Add Drag RPC to proto
2. Implement drag operation in test harness
3. Support drag start + end coordinates
**Use Cases**:
- Move entrances to new positions
- Reposition sprites
- Adjust exit locations
### Priority 4: Workflow Chaining (1-2 hours)
**Goal**: Combine multiple operations into workflows
**Implementation**:
1. Create workflow definition format
2. Execute sequence of RPCs
3. Handle errors gracefully
**Example Workflow**:
```yaml
workflow: draw_custom_tile
steps:
- click: Overworld/Toolset/button:DrawTile
- click: Overworld/Toolset/button:Tile16Editor
- wait: window_visible:Tile16 Editor
- click: canvas:Tile16Editor
x: 64
y: 64
- click: canvas:Overworld
x: 512
y: 384
```
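Executing such a workflow amounts to running steps in order and stopping at the first failure. The sketch below assumes the YAML has already been parsed into steps; `WorkflowStep`, `WorkflowResult`, `StepRunner`, and `RunWorkflow` are hypothetical names, and the callback stands in for the actual RPC call.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One parsed workflow step. x and y are -1 when the step has no coordinates.
struct WorkflowStep {
  std::string action;  // e.g. "click" or "wait"
  std::string target;  // e.g. "Overworld/Toolset/button:DrawTile"
  int x;
  int y;
};

struct WorkflowResult {
  bool ok = true;
  std::size_t steps_run = 0;  // number of steps that completed successfully
  std::string error;          // description of the first failure, if any
};

// Callback that performs one step; returns false and fills *error on failure.
using StepRunner =
    std::function<bool(const WorkflowStep&, std::string* error)>;

// Runs steps in order and stops at the first failure.
inline WorkflowResult RunWorkflow(const std::vector<WorkflowStep>& steps,
                                  const StepRunner& run) {
  WorkflowResult result;
  for (const auto& step : steps) {
    std::string error;
    if (!run(step, &error)) {
      result.ok = false;
      result.error = step.action + " " + step.target + ": " + error;
      return result;
    }
    ++result.steps_run;
  }
  return result;
}
```

Keeping the runner as a callback lets the same loop be exercised with a fake in tests and with a gRPC client in production.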
## Testing Strategy
### Manual Testing
1. Start test harness
2. Run test script: `./scripts/test_remote_control.sh`
3. Observe mode changes in GUI
4. Verify no crashes or errors
### Automated Testing
1. Add to CI pipeline
2. Run as part of E2E validation
3. Test on multiple platforms
### Integration Testing
1. Test with real agent workflows
2. Validate agent can complete tasks
3. Measure reliability and timing
## Performance Characteristics
**Click Latency**: < 200ms
- gRPC overhead: ~10ms
- Test queue time: ~50ms
- ImGui event processing: ~100ms
- Total: ~160ms average
**Mode Switch Time**: < 500ms
- Includes UI update
- State transition
- Visual feedback
**Tool Opening**: < 1s
- Window creation
- Content loading
- Layout calculation
## Troubleshooting
### Widget Not Found
**Problem**: "Widget not found: Overworld/Toolset/button:DrawTile"
**Solutions**:
1. Verify Overworld editor is open (widgets registered on open)
2. Check widget name spelling
3. Look at suggestions in error message
4. Try legacy format: "button:DrawTile"
### Click Not Working
**Problem**: Click succeeds but nothing happens
**Solutions**:
1. Check if widget is enabled (not grayed out)
2. Verify correct mode/context for action
3. Add delay between clicks
4. Check ImGui event queue
### Test Timeout
**Problem**: "Test timeout - widget not found or unresponsive"
**Solutions**:
1. Increase timeout (default 5s)
2. Check if GUI is responsive
3. Verify widget is visible (not hidden)
4. Look for modal dialogs blocking interaction
## References
**Documentation**:
- [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md)
- [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md)
- [E2E_VALIDATION_GUIDE.md](E2E_VALIDATION_GUIDE.md)
**Code Files**:
- `src/app/core/service/imgui_test_harness_service.cc` - Test harness implementation
- `src/app/gui/widget_id_registry.{h,cc}` - Widget registry
- `src/app/editor/overworld/overworld_editor.cc` - Widget registrations
- `scripts/test_remote_control.sh` - Test script
---
**Last Updated**: October 2, 2025, 11:45 PM
**Status**: Functional - Basic mode switching works
**Next**: Canvas interaction + tile selection


@@ -1,357 +0,0 @@
# Widget ID Refactoring - Next Actions
**Date**: October 2, 2025
**Status**: Phase 1 Complete - Testing & Integration Phase
**Previous Session**: [SESSION_SUMMARY_OCT2_NIGHT.md](SESSION_SUMMARY_OCT2_NIGHT.md)
## Quick Start - Next Session
### Option 1: Manual Testing (15 minutes) 🎯 RECOMMENDED FIRST
**Goal**: Verify widgets register correctly in running GUI
```bash
# 1. Launch YAZE
./build/bin/yaze.app/Contents/MacOS/yaze
# 2. Open a ROM
# File → Open ROM → assets/zelda3.sfc
# 3. Open Overworld Editor
# Click "Overworld" button in main window
# 4. Test toolset buttons
# Click through: Pan, DrawTile, Entrances, etc.
# Expected: All work normally, no crashes
# 5. Check console output
# Look for any errors or warnings
# Widget registrations happen silently
```
**Success Criteria**:
- ✅ GUI launches without crashes
- ✅ Overworld editor opens normally
- ✅ All toolset buttons clickable
- ✅ No error messages in console
---
### Option 2: Add Widget Discovery Command (30 minutes)
**Goal**: Create CLI command to list registered widgets
**File to Edit**: `src/cli/handlers/agent.cc`
**Add New Command**: `z3ed agent discover`
```cpp
// Add to agent.cc:
absl::Status HandleDiscoverCommand(const std::vector<std::string>& args) {
// Parse --pattern flag (default "*")
std::string pattern = "*";
for (size_t i = 0; i < args.size(); ++i) {
if (args[i] == "--pattern" && i + 1 < args.size()) {
pattern = args[++i];
}
}
// Get widget registry
auto& registry = gui::WidgetIdRegistry::Instance();
auto matches = registry.FindWidgets(pattern);
if (matches.empty()) {
std::cout << "No widgets found matching pattern: " << pattern << "\n";
return absl::NotFoundError("No widgets found");
}
std::cout << "=== Registered Widgets ===\n\n";
std::cout << "Pattern: " << pattern << "\n";
std::cout << "Count: " << matches.size() << "\n\n";
for (const auto& path : matches) {
const auto* info = registry.GetWidgetInfo(path);
if (info) {
std::cout << path << "\n";
std::cout << " Type: " << info->type << "\n";
std::cout << " ImGui ID: " << info->imgui_id << "\n";
if (!info->description.empty()) {
std::cout << " Description: " << info->description << "\n";
}
std::cout << "\n";
}
}
return absl::OkStatus();
}
// Add routing in HandleAgentCommand:
if (subcommand == "discover") {
return HandleDiscoverCommand(args);
}
```
**Test**:
```bash
# Rebuild
cmake --build build --target z3ed -j8
# Test discovery (will fail - widgets registered at runtime)
./build/bin/z3ed agent discover
# Note: This requires YAZE to be running with widgets registered
# We'll need a different approach - see Option 3
```
---
### Option 3: Widget Export at Shutdown (30 minutes) 🎯 BETTER APPROACH
**Goal**: Export widget catalog when YAZE exits
**File to Edit**: `src/app/editor/editor_manager.cc`
**Add Destructor or Shutdown Method**:
```cpp
// In editor_manager.cc destructor or Shutdown():
void EditorManager::Shutdown() {
// Export widget catalog for z3ed agent
auto& registry = gui::WidgetIdRegistry::Instance();
std::string catalog_path = "/tmp/yaze_widgets.yaml";
try {
registry.ExportCatalogToFile(catalog_path, "yaml");
std::cout << "Widget catalog exported to: " << catalog_path << "\n";
} catch (const std::exception& e) {
std::cerr << "Failed to export widget catalog: " << e.what() << "\n";
}
}
```
**Test**:
```bash
# 1. Rebuild
cmake --build build --target yaze -j8
# 2. Launch YAZE
./build/bin/yaze.app/Contents/MacOS/yaze
# 3. Open Overworld editor
# (registers widgets)
# 4. Quit YAZE
# File → Quit or Cmd+Q
# 5. Check exported catalog
cat /tmp/yaze_widgets.yaml
# Expected output:
# widgets:
# - path: "Overworld/Toolset/button:Pan"
# type: button
# imgui_id: 12345
# context:
# editor: Overworld
# tab: Toolset
# ...
```
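The catalog shape shown in the expected output could be produced by a small serializer along these lines. This is a sketch under assumptions: `WidgetEntry` and `ToYaml` are hypothetical, only three fields are written, and quotes inside paths are not escaped.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical catalog entry holding a subset of the exported fields.
struct WidgetEntry {
  std::string path;
  std::string type;
  std::uint32_t imgui_id;
};

// Writes entries as a YAML list under a top-level "widgets" key.
inline std::string ToYaml(const std::vector<WidgetEntry>& widgets) {
  std::ostringstream out;
  out << "widgets:\n";
  for (const auto& w : widgets) {
    out << "  - path: \"" << w.path << "\"\n"
        << "    type: " << w.type << "\n"
        << "    imgui_id: " << w.imgui_id << "\n";
  }
  return out.str();
}
```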
---
### Option 4: Test Harness Integration (1-2 hours)
**Goal**: Enable test harness to click widgets by hierarchical ID
**Files to Edit**:
1. `src/app/core/service/imgui_test_harness_service.cc`
2. `src/app/core/proto/imgui_test_harness.proto` (optional - add DiscoverWidgets RPC)
**Implementation**:
```cpp
// In imgui_test_harness_service.cc, update Click RPC:
absl::Status ImGuiTestHarnessServiceImpl::Click(
const ClickRequest* request, ClickResponse* response) {
const std::string& target = request->target();
// Try hierarchical widget ID first
auto& registry = gui::WidgetIdRegistry::Instance();
ImGuiID widget_id = registry.GetWidgetId(target);
if (widget_id != 0) {
// Found in registry - use ImGui ID directly
std::string test_name = absl::StrFormat("DynamicClick_%s", target);
auto* dynamic_test = ImGuiTest_CreateDynamicTest(
test_manager_->GetEngine(), test_category_.c_str(), test_name.c_str());
dynamic_test->GuiFunc = [widget_id](ImGuiTestContext* ctx) {
ctx->ItemClick(widget_id);
};
ImGuiTest_RunTest(test_manager_->GetEngine(), dynamic_test);
response->set_success(true);
response->set_message(absl::StrFormat("Clicked widget: %s", target));
return absl::OkStatus();
}
// Fallback to legacy string-based lookup
// ... existing code ...
// If not found, suggest alternatives
auto matches = registry.FindWidgets("*" + target + "*");
if (!matches.empty()) {
std::string suggestions = absl::StrJoin(matches, ", ");
return absl::NotFoundError(
absl::StrFormat("Widget not found: %s. Did you mean: %s?",
target, suggestions));
}
return absl::NotFoundError(
absl::StrFormat("Widget not found: %s", target));
}
```
**Test**:
```bash
# 1. Rebuild with gRPC
cmake --build build-grpc-test --target yaze -j8
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# 3. Open Overworld editor in GUI
# (registers widgets)
# 4. Test hierarchical click
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"Overworld/Toolset/button:DrawTile","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Expected: Click succeeds, DrawTile mode activated
```
---
## Recommended Sequence
### Tonight (30 minutes)
1. **Option 1**: Manual testing - verify no crashes
2. 📋 **Option 3**: Add widget export at shutdown
3. 📋 Inspect exported YAML, verify 13 toolset widgets
### Tomorrow Morning (1-2 hours)
1. 📋 **Option 4**: Test harness integration
2. 📋 Test clicking widgets via hierarchical IDs
3. 📋 Update E2E test script with new IDs
### Tomorrow Afternoon (2-3 hours)
1. 📋 Complete Overworld editor (canvas, properties)
2. 📋 Add DiscoverWidgets RPC to proto
3. 📋 Document patterns and best practices
---
## Files to Modify Next
### High Priority
1. `src/app/editor/editor_manager.cc` - Add widget export at shutdown
2. `src/app/core/service/imgui_test_harness_service.cc` - Registry lookup in Click RPC
### Medium Priority
3. `src/app/core/proto/imgui_test_harness.proto` - Add DiscoverWidgets RPC
4. `src/app/editor/overworld/overworld_editor.cc` - Add canvas/properties widgets
### Low Priority
5. `scripts/test_harness_e2e.sh` - Update with hierarchical IDs
6. `docs/z3ed/IT-01-QUICKSTART.md` - Add widget ID examples
---
## Success Criteria
### Phase 1 (Complete) ✅
- [x] Widget registry in build
- [x] 13 toolset widgets registered
- [x] Clean build
- [x] Documentation updated
### Phase 2 (Current) 🔄
- [ ] Manual testing passes
- [ ] Widget export works
- [ ] Test harness can click by hierarchical ID
- [ ] At least 1 E2E test updated
### Phase 3 (Next) 📋
- [ ] Complete Overworld editor (30+ widgets)
- [ ] DiscoverWidgets RPC working
- [ ] All E2E tests use hierarchical IDs
- [ ] Performance validated (< 1ms overhead)
---
## Quick Commands
### Build
```bash
# Regular build
cmake --build build --target yaze -j8
# Test harness build
cmake --build build-grpc-test --target yaze -j8
# CLI build
cmake --build build --target z3ed -j8
```
### Test
```bash
# Manual test
./build/bin/yaze.app/Contents/MacOS/yaze
# Test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc
```
### Cleanup
```bash
# Kill running YAZE instances
killall yaze
# Clean build
rm -rf build/CMakeFiles build/bin
cmake --build build -j8
```
---
## References
**Progress Docs**:
- [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md) - Detailed tracker
- [SESSION_SUMMARY_OCT2_NIGHT.md](SESSION_SUMMARY_OCT2_NIGHT.md) - Tonight's work
**Design Docs**:
- [IMGUI_ID_MANAGEMENT_REFACTORING.md](IMGUI_ID_MANAGEMENT_REFACTORING.md) - Complete plan
- [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md) - Test harness guide
**Code References**:
- `src/app/gui/widget_id_registry.{h,cc}` - Registry implementation
- `src/app/editor/overworld/overworld_editor.cc` - Usage example
- `src/app/core/service/imgui_test_harness_service.cc` - Test harness
---
**Last Updated**: October 2, 2025, 11:30 PM
**Next Action**: Option 1 (Manual Testing) or Option 3 (Widget Export)
**Time Estimate**: 15-30 minutes