Update documentation

scawful
2025-10-02 20:55:28 -04:00
parent e3621d7a1f
commit 0fb8ba4202
9 changed files with 1059 additions and 1997 deletions


@@ -1,627 +0,0 @@
# Policy Evaluation Framework (AW-04)
**Status**: Implementation In Progress
**Priority**: High (Next Phase)
**Time Estimate**: 6-8 hours
**Last Updated**: October 2, 2025
## Overview
The Policy Evaluation Framework provides a YAML-based constraint system for gating proposal acceptance in the z3ed agent workflow. It ensures that AI-generated ROM modifications meet quality, safety, and testing requirements before being merged into the main ROM.
## Goals
1. **Quality Gates**: Enforce minimum test pass rates and code quality standards
2. **Safety Constraints**: Prevent modifications to critical ROM regions (headers, checksums)
3. **Scope Limits**: Restrict changes to reasonable byte counts and specific banks
4. **Human Review**: Require manual review for large or complex changes
5. **Flexibility**: Allow policy overrides with confirmation and logging
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ ProposalDrawer (GUI) │
│ └─ Accept button gated by PolicyEvaluator │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ PolicyEvaluator (Singleton Service) │
│ ├─ LoadPolicies() from .yaze/policies/ │
│ ├─ EvaluateProposal(proposal_id) → PolicyResult │
│ └─ Cache of parsed YAML policies │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ .yaze/policies/agent.yaml (YAML Configuration) │
│ ├─ test_requirements (min pass rates) │
│ ├─ change_constraints (byte limits, allowed banks) │
│ ├─ review_requirements (human review triggers) │
│ └─ forbidden_ranges (protected ROM regions) │
└─────────────────────────────────────────────────────────┘
```
## YAML Policy Schema
### Example Policy File
```yaml
# .yaze/policies/agent.yaml
version: 1.0
enabled: true
policies:
  # Policy 1: Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
      - test_suite: "dungeon_logic"
        min_pass_rate: 0.90
    message: "All required test suites must pass before accepting proposal"
  # Policy 2: Change Scope Limits
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E, 0x0F]  # Graphics banks only
      - max_commands_executed: 20
    message: "Proposal exceeds allowed change scope"
  # Policy 3: Protected ROM Regions
  - name: protect_critical_regions
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0  # ROM header
        end: 0xFFFF
        reason: "ROM header is protected"
      - start: 0x00FFC0  # Internal header
        end: 0x00FFDF
        reason: "Internal ROM header"
    message: "Proposal modifies protected ROM region"
  # Policy 4: Human Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    conditions:
      - if: bytes_changed > 1024
        then: require_diff_review
        message: "Large change requires diff review"
      - if: commands_executed > 10
        then: require_log_review
        message: "Complex operation requires log review"
      - if: test_failures > 0
        then: require_explanation
        message: "Test failures require explanation"
  # Policy 5: Palette Modifications
  - name: palette_safety
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_palettes_changed: 5
      - preserve_transparency: true  # Don't modify color index 0
    message: "Palette changes exceed safety threshold"
```
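The `forbidden_range` policy above reduces to an interval-overlap test between each protected range and each modified span. The following is a minimal sketch of that check, not the project's implementation: the `ForbiddenRange` and `ByteChange` types and both function names are hypothetical, and it assumes `start` and `end` are inclusive offsets, as the `0xFFB0`-`0xFFFF` example suggests.
```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration; not taken from the yaze codebase.
struct ForbiddenRange {
  uint32_t start;      // first protected offset (assumed inclusive)
  uint32_t end;        // last protected offset (assumed inclusive)
  std::string reason;  // user-facing explanation from the policy file
};

struct ByteChange {
  uint32_t offset;  // first modified ROM offset
  uint32_t length;  // number of modified bytes
};

// Returns true when the modified span [offset, offset + length - 1]
// shares at least one byte with the protected range [start, end].
inline bool Overlaps(const ForbiddenRange& range, const ByteChange& change) {
  if (change.length == 0) return false;  // an empty change touches nothing
  const uint64_t change_end =
      static_cast<uint64_t>(change.offset) + change.length - 1;
  return change.offset <= range.end && change_end >= range.start;
}

// Collects the reason string of every protected range touched by any change.
inline std::vector<std::string> FindForbiddenHits(
    const std::vector<ForbiddenRange>& ranges,
    const std::vector<ByteChange>& changes) {
  std::vector<std::string> hits;
  for (const auto& range : ranges) {
    for (const auto& change : changes) {
      if (Overlaps(range, change)) {
        hits.push_back(range.reason);
        break;  // one hit per range is enough
      }
    }
  }
  return hits;
}
```
A change of 16 bytes at `0xFFA0` ends at `0xFFAF` and is allowed, while a 2-byte change at `0xFFAF` reaches `0xFFB0` and is reported.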
### Schema Definition
```yaml
# Policy file structure
version: string        # Semantic version (e.g., "1.0")
enabled: boolean       # Master enable/disable
policies:
  - name: string       # Unique policy identifier
    type: enum         # test_requirement | change_constraint | forbidden_range | review_requirement
    enabled: boolean   # Policy-specific enable/disable
    severity: enum     # critical | warning | info
    # Type-specific fields:
    rules: array       # For test_requirement, change_constraint
    ranges: array      # For forbidden_range
    conditions: array  # For review_requirement
    message: string    # User-facing error message
```
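The `severity` enum in the schema has to be mapped from its YAML string form onto a C++ value at parse time. A possible sketch is shown here; `ParseSeverity` is a hypothetical helper name, and the enum is redeclared only so the snippet compiles on its own. Returning `std::nullopt` for unknown strings is an assumption about how malformed files might be handled.
```cpp
#include <optional>
#include <string_view>

// Redeclared for a self-contained example; mirrors the three schema values.
enum class PolicySeverity { kInfo, kWarning, kCritical };

// Maps the YAML `severity` string onto the enum. Unknown values yield
// std::nullopt so the caller can report a malformed policy file instead
// of silently picking a default.
inline std::optional<PolicySeverity> ParseSeverity(std::string_view text) {
  if (text == "info") return PolicySeverity::kInfo;
  if (text == "warning") return PolicySeverity::kWarning;
  if (text == "critical") return PolicySeverity::kCritical;
  return std::nullopt;
}
```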
## Implementation Plan
### Phase 1: Core Infrastructure (2 hours)
#### 1.1 Create PolicyEvaluator Service
**File**: `src/cli/service/policy_evaluator.h`
```cpp
#ifndef YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#define YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#include <string>
#include <vector>
#include <memory>
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "absl/strings/string_view.h"
namespace yaze {
namespace cli {
// Policy violation severity levels
enum class PolicySeverity {
kInfo, // Informational, doesn't block acceptance
kWarning, // Warning, can be overridden
kCritical // Critical, blocks acceptance
};
// Individual policy violation
struct PolicyViolation {
std::string policy_name;
PolicySeverity severity;
std::string message;
std::string details; // Additional context
};
// Result of policy evaluation
struct PolicyResult {
bool passed; // True if all critical policies passed
std::vector<PolicyViolation> violations;
// Categorized violations
std::vector<PolicyViolation> critical_violations;
std::vector<PolicyViolation> warnings;
std::vector<PolicyViolation> info;
// Helper methods
bool has_critical_violations() const { return !critical_violations.empty(); }
bool can_accept_with_override() const {
return !has_critical_violations() && !warnings.empty();
}
};
// Singleton service for evaluating proposals against policies
class PolicyEvaluator {
public:
static PolicyEvaluator& GetInstance();
// Load policies from disk (.yaze/policies/agent.yaml)
absl::Status LoadPolicies(absl::string_view policy_dir = ".yaze/policies");
// Evaluate a proposal against all loaded policies
absl::StatusOr<PolicyResult> EvaluateProposal(
absl::string_view proposal_id);
// Reload policies from disk (for live editing)
absl::Status ReloadPolicies();
// Check if policies are loaded and enabled
bool IsEnabled() const { return enabled_; }
// Get policy configuration path
std::string GetPolicyPath() const { return policy_path_; }
private:
PolicyEvaluator() = default;
~PolicyEvaluator() = default;
// Non-copyable, non-movable
PolicyEvaluator(const PolicyEvaluator&) = delete;
PolicyEvaluator& operator=(const PolicyEvaluator&) = delete;
// Parse YAML policy file
absl::Status ParsePolicyFile(absl::string_view yaml_content);
// Evaluate individual policy types
void EvaluateTestRequirements(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateChangeConstraints(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateForbiddenRanges(
absl::string_view proposal_id, PolicyResult* result);
void EvaluateReviewRequirements(
absl::string_view proposal_id, PolicyResult* result);
bool enabled_ = false;
std::string policy_path_;
// Parsed policy structures (implementation detail)
struct PolicyConfig;
std::unique_ptr<PolicyConfig> config_;
};
} // namespace cli
} // namespace yaze
#endif // YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
```
#### 1.2 Create Policy Configuration Structures
**File**: `src/cli/service/policy_evaluator.cc` (partial)
```cpp
#include "src/cli/service/policy_evaluator.h"
#include <fstream>
#include <memory>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>
#include "absl/strings/str_format.h"
#include "src/cli/service/proposal_registry.h"
// If YAML parsing is available
#ifdef YAZE_WITH_YAML
#include <yaml-cpp/yaml.h>
#endif
namespace yaze {
namespace cli {
// Internal policy configuration structures
struct PolicyEvaluator::PolicyConfig {
std::string version;
bool enabled;
struct TestRequirement {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::pair<std::string, double>> test_suites; // suite name → min pass rate
std::string message;
};
struct ChangeConstraint {
std::string name;
bool enabled;
PolicySeverity severity;
int max_bytes_changed = -1;
std::vector<int> allowed_banks;
int max_commands_executed = -1;
int max_palettes_changed = -1;
bool preserve_transparency = false;
std::string message;
};
struct ForbiddenRange {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::tuple<int, int, std::string>> ranges; // start, end, reason
std::string message;
};
struct ReviewRequirement {
std::string name;
bool enabled;
PolicySeverity severity;
std::vector<std::string> conditions;
std::string message;
};
std::vector<TestRequirement> test_requirements;
std::vector<ChangeConstraint> change_constraints;
std::vector<ForbiddenRange> forbidden_ranges;
std::vector<ReviewRequirement> review_requirements;
};
// Singleton instance
PolicyEvaluator& PolicyEvaluator::GetInstance() {
static PolicyEvaluator instance;
return instance;
}
absl::Status PolicyEvaluator::LoadPolicies(absl::string_view policy_dir) {
policy_path_ = absl::StrFormat("%s/agent.yaml", policy_dir);
// Check if file exists
std::ifstream file(policy_path_);
if (!file.good()) {
// No policy file - policies disabled
enabled_ = false;
return absl::OkStatus();
}
// Read file content
std::stringstream buffer;
buffer << file.rdbuf();
std::string yaml_content = buffer.str();
return ParsePolicyFile(yaml_content);
}
absl::Status PolicyEvaluator::ParsePolicyFile(absl::string_view yaml_content) {
#ifndef YAZE_WITH_YAML
return absl::UnimplementedError(
"YAML support not compiled. Build with YAZE_WITH_YAML=ON");
#else
try {
YAML::Node root = YAML::Load(std::string(yaml_content));
config_ = std::make_unique<PolicyConfig>();
config_->version = root["version"].as<std::string>("1.0");
config_->enabled = root["enabled"].as<bool>(true);
if (!config_->enabled) {
enabled_ = false;
return absl::OkStatus();
}
// Parse policies array
if (root["policies"]) {
for (const auto& policy_node : root["policies"]) {
std::string type = policy_node["type"].as<std::string>();
if (type == "test_requirement") {
// Parse test requirement policy
// ... (implementation continues)
} else if (type == "change_constraint") {
// Parse change constraint policy
// ... (implementation continues)
} else if (type == "forbidden_range") {
// Parse forbidden range policy
// ... (implementation continues)
} else if (type == "review_requirement") {
// Parse review requirement policy
// ... (implementation continues)
}
}
}
enabled_ = true;
return absl::OkStatus();
} catch (const YAML::Exception& e) {
return absl::InvalidArgumentError(
absl::StrFormat("Failed to parse policy YAML: %s", e.what()));
}
#endif
}
// ... (implementation continues with evaluation methods)
} // namespace cli
} // namespace yaze
```
### Phase 2: Policy Evaluation Logic (2-3 hours)
Implement the core evaluation methods that check proposals against each policy type.
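As one illustration of what such an evaluation method might look like, the sketch below checks observed test pass rates against the configured minimums and files each violation by severity. It is not the project's code: the types are simplified copies, `AddViolation` and `CheckTestRequirements` are hypothetical names, and treating a suite with no recorded result as a 0.0 pass rate is an assumption.
```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class PolicySeverity { kInfo, kWarning, kCritical };

struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
};

struct PolicyResult {
  bool passed = true;  // stays true until a critical violation is recorded
  std::vector<PolicyViolation> critical_violations;
  std::vector<PolicyViolation> warnings;
  std::vector<PolicyViolation> info;
};

// Files a violation into its severity bucket; only critical ones fail the result.
inline void AddViolation(PolicyResult& result, PolicyViolation violation) {
  switch (violation.severity) {
    case PolicySeverity::kCritical:
      result.passed = false;
      result.critical_violations.push_back(std::move(violation));
      break;
    case PolicySeverity::kWarning:
      result.warnings.push_back(std::move(violation));
      break;
    case PolicySeverity::kInfo:
      result.info.push_back(std::move(violation));
      break;
  }
}

// Compares each required suite's observed pass rate with its minimum.
// A suite with no recorded result counts as 0.0 (assumption).
inline void CheckTestRequirements(
    const std::string& policy_name, PolicySeverity severity,
    const std::vector<std::pair<std::string, double>>& required,
    const std::map<std::string, double>& observed, PolicyResult& result) {
  for (const auto& [suite, min_rate] : required) {
    const auto it = observed.find(suite);
    const double rate = (it == observed.end()) ? 0.0 : it->second;
    if (rate < min_rate) {
      AddViolation(result,
                   {policy_name, severity,
                    "Suite '" + suite + "' is below its minimum pass rate"});
    }
  }
}
```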
### Phase 3: GUI Integration (2 hours)
#### 3.1 Update ProposalDrawer
**File**: `src/app/editor/system/proposal_drawer.cc`
Add policy status display and gating logic:
```cpp
#include "src/cli/service/policy_evaluator.h"
void ProposalDrawer::DrawProposalDetail(const std::string& proposal_id) {
// ... existing detail view code ...
// === Policy Status Section ===
ImGui::Separator();
ImGui::TextUnformatted("Policy Status:");
auto& policy_eval = cli::PolicyEvaluator::GetInstance();
if (policy_eval.IsEnabled()) {
auto policy_result = policy_eval.EvaluateProposal(proposal_id);
if (policy_result.ok()) {
const auto& result = policy_result.value();
if (result.passed) {
ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
} else {
// Show violations
if (result.has_critical_violations()) {
ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations:");
for (const auto& violation : result.critical_violations) {
ImGui::BulletText("%s: %s",
violation.policy_name.c_str(),
violation.message.c_str());
}
}
if (!result.warnings.empty()) {
ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠️ Warnings:");
for (const auto& violation : result.warnings) {
ImGui::BulletText("%s: %s",
violation.policy_name.c_str(),
violation.message.c_str());
}
}
}
// Gate Accept button
ImGui::Separator();
bool can_accept = !result.has_critical_violations();
if (!can_accept) {
ImGui::BeginDisabled();
}
if (ImGui::Button("Accept Proposal")) {
if (result.can_accept_with_override() && !override_confirmed_) {
// Show override confirmation dialog
ImGui::OpenPopup("Override Policy");
} else {
AcceptProposal(proposal_id);
}
}
if (!can_accept) {
ImGui::EndDisabled();
ImGui::SameLine();
ImGui::TextColored(ImVec4(1, 0, 0, 1),
"(Accept blocked by policy violations)");
}
// Override confirmation dialog
if (ImGui::BeginPopupModal("Override Policy", nullptr,
ImGuiWindowFlags_AlwaysAutoResize)) {
ImGui::Text("This proposal has policy warnings.");
ImGui::Text("Do you want to override and accept anyway?");
ImGui::Text("This action will be logged.");
ImGui::Separator();
if (ImGui::Button("Override and Accept")) {
override_confirmed_ = true;
AcceptProposal(proposal_id);
ImGui::CloseCurrentPopup();
}
ImGui::SameLine();
if (ImGui::Button("Cancel")) {
ImGui::CloseCurrentPopup();
}
ImGui::EndPopup();
}
} else {
ImGui::TextColored(ImVec4(1, 0, 0, 1),
"Policy evaluation failed: %s",
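// message() returns a string_view that is not guaranteed to be
// null-terminated, so copy it into a std::string before using %s.
std::string(policy_result.status().message()).c_str());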
}
} else {
ImGui::TextColored(ImVec4(0.5, 0.5, 0.5, 1),
"No policies configured");
}
}
```
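The gating rules in the drawer code above can also be expressed as a small pure function, which makes them testable without an ImGui context. This is only a sketch of that idea; `AcceptAction` and `DecideAccept` are hypothetical names that do not appear in the codebase.
```cpp
// Possible outcomes when the user presses "Accept Proposal".
enum class AcceptAction {
  kBlocked,        // critical violations: the button stays disabled
  kNeedsOverride,  // warnings only: ask for confirmation first
  kAccept          // no violations, or the override was already confirmed
};

// Mirrors the drawer logic: critical violations always block, warnings
// require a confirmed override, and everything else is accepted directly.
inline AcceptAction DecideAccept(bool has_critical_violations,
                                 bool has_warnings,
                                 bool override_confirmed) {
  if (has_critical_violations) return AcceptAction::kBlocked;
  if (has_warnings && !override_confirmed) return AcceptAction::kNeedsOverride;
  return AcceptAction::kAccept;
}
```
Keeping the decision separate from the rendering code would let the unit tests in Phase 4 cover every combination of flags directly.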
### Phase 4: Testing & Documentation (1-2 hours)
#### 4.1 Example Policy File
Create `.yaze/policies/agent.yaml.example`:
```yaml
# Example agent policy configuration
# Copy to .yaze/policies/agent.yaml and customize
version: 1.0
enabled: true
policies:
  # Require test suites to pass
  - name: require_tests
    type: test_requirement
    enabled: false  # Disabled by default (no tests yet)
    severity: critical
    rules:
      - test_suite: "smoke_test"
        min_pass_rate: 1.0
    message: "All smoke tests must pass"
  # Limit change scope
  - name: limit_changes
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_bytes_changed: 5120  # 5KB
      - max_commands_executed: 15
    message: "Keep changes small and focused"
  # Protect ROM header
  - name: protect_header
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0
        end: 0xFFFF
        reason: "ROM header"
    message: "Cannot modify ROM header"
```
#### 4.2 Unit Tests
Create `test/cli/policy_evaluator_test.cc`:
```cpp
#include "src/cli/service/policy_evaluator.h"
#include "gtest/gtest.h"
namespace yaze {
namespace cli {
namespace {
TEST(PolicyEvaluatorTest, LoadPoliciesSuccess) {
auto& eval = PolicyEvaluator::GetInstance();
auto status = eval.LoadPolicies("test/fixtures/policies");
EXPECT_TRUE(status.ok());
EXPECT_TRUE(eval.IsEnabled());
}
TEST(PolicyEvaluatorTest, EvaluateProposal_NoViolations) {
// ... test implementation
}
TEST(PolicyEvaluatorTest, EvaluateProposal_CriticalViolation) {
// ... test implementation
}
} // namespace
} // namespace cli
} // namespace yaze
```
## Deliverables
- [x] Policy evaluator service interface
- [ ] YAML policy parser implementation
- [ ] Policy evaluation logic for all 4 types
- [ ] ProposalDrawer GUI integration
- [ ] Policy override workflow
- [ ] Example policy configurations
- [ ] Unit tests
- [ ] Documentation and usage guide
## Success Criteria
1. **Functional**:
- Policies load from YAML files
- Proposals evaluated against all enabled policies
- Accept button gated by critical violations
- Override workflow for warnings
2. **User Experience**:
- Clear policy status display in ProposalDrawer
- Helpful violation messages
- Override confirmation dialog
- Policy evaluation fast (< 100ms)
3. **Quality**:
- Unit test coverage > 80%
- No crashes or memory leaks
- Graceful handling of malformed YAML
- Works with policies disabled
## Future Enhancements
- Policy templates for common scenarios
- Policy violation history/analytics
- Auto-fix suggestions for violations
- Integration with CI/CD for automated policy checks
- Policy versioning and migration
---
**Status**: Ready for implementation
**Next Step**: Create PolicyEvaluator skeleton and wire into build system
**Estimated Completion**: October 3-4, 2025


@@ -25,6 +25,10 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
- **Priority 3**: Enhanced Error Reporting (IT-08+) - Holistic improvements spanning z3ed, ImGuiTestHarness, EditorManager, and core application services
**Recent Accomplishments** (Updated: October 2025):
- **✅ IT-08a Screenshot RPC Complete**: SDL-based screenshot capture operational
- Captures 1536x864 BMP files via SDL_RenderReadPixels
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
- **✅ Policy Framework Complete**: PolicyEvaluator service fully integrated with ProposalDrawer GUI
- 4 policy types implemented: test_requirement, change_constraint, forbidden_range, review_requirement
- 3 severity levels: Info (informational), Warning (overridable), Critical (blocks acceptance)
@@ -41,8 +45,8 @@ The z3ed CLI and AI agent workflow system has completed major infrastructure mil
- **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)
**Known Limitations & Improvement Opportunities**:
- **Screenshot RPC**: Stub implementation → needs SDL_Surface capture + PNG encoding
- **Test Introspection**: No way to query test status, results, or queue → add GetTestStatus/ListTests RPCs
- **Screenshot Auto-Capture**: Manual RPC only → needs integration with TestManager failure detection
- **Test Introspection**: ✅ Complete - GetTestStatus/ListTests/GetResults RPCs operational
- **Widget Discovery**: AI agents can't enumerate available widgets → add DiscoverWidgets RPC
- **Test Recording**: No record/replay for regression testing → add RecordSession/ReplaySession RPCs
- **Synchronous Wait**: Async tests return immediately → add blocking mode or result polling
@@ -236,13 +240,15 @@ message WidgetInfo {
**Outcome**: Recording/replay is production-ready; focus shifts to surfacing rich failure diagnostics (IT-08).
#### IT-08: Enhanced Error Reporting (5-7 hours)
#### IT-08: Enhanced Error Reporting (5-7 hours) 🔄 ACTIVE
**Status**: IT-08a Complete ✅ | IT-08b In Progress 🔄
**Objective**: Deliver a unified, high-signal error reporting pipeline spanning ImGuiTestHarness, z3ed CLI, EditorManager, and core application services.
**Implementation Tracks**:
1. **Harness-Level Diagnostics**
- Implement Screenshot RPC (convert stub into working SDL capture pipeline)
- Auto-capture screenshots, widget tree dumps, and recent ImGui events on failure
- ✅ IT-08a: Screenshot RPC implemented (SDL-based, BMP format, 1536x864)
- 📋 IT-08b: Auto-capture screenshots on test failure
- 📋 IT-08c: Widget tree dumps and recent ImGui events on failure
- Serialize results to both structured JSON (for automation) and human-friendly HTML bundles
- Persist artifacts under `test-results/<test_id>/` with timestamped directories
@@ -516,9 +522,10 @@ z3ed collab replay session_2025_10_02.yaml --speed 2x
| IT-05 | Add test introspection RPCs (GetTestStatus, ListTests, GetResults) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Enable clients to poll test results and query execution state (Oct 2, 2025) |
| IT-06 | Implement widget discovery API for AI agents | ImGuiTest Bridge | Code | 📋 Planned | IT-01 - DiscoverWidgets RPC to enumerate windows, buttons, inputs |
| IT-07 | Add test recording/replay for regression testing | ImGuiTest Bridge | Code | ✅ Done | IT-05 - RecordSession/ReplaySession RPCs with JSON test scripts |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | 🔄 Active | IT-01 - Capture widget state on failure for debugging |
| IT-08a | Adopt shared error envelope across CLI & services | ImGuiTest Bridge | Code | 🔄 Active | IT-08 |
| IT-08b | EditorManager diagnostic overlay & logging | ImGuiTest Bridge | UX | 📋 Planned | IT-08 |
| IT-08 | Enhance error reporting with screenshots and state dumps | ImGuiTest Bridge | Code | 🔄 Active | IT-01 - Capture widget state on failure for debugging |
| IT-08a | Screenshot RPC implementation (SDL capture) | ImGuiTest Bridge | Code | ✅ Done | IT-01 - Screenshot capture complete (Oct 2, 2025) |
| IT-08b | Auto-capture screenshots on test failure | ImGuiTest Bridge | Code | 🔄 Active | IT-08a - Integrate with TestManager |
| IT-08c | Widget state dumps and execution context | ImGuiTest Bridge | Code | 📋 Planned | IT-08b - Enhanced failure diagnostics |
| IT-09 | Create standardized test suite format for CI integration | ImGuiTest Bridge | Infra | 📋 Planned | IT-07 - JSON/YAML test suite format compatible with CI/CD pipelines |
| IT-10 | Collaborative editing & multiplayer sessions with shared AI | Collaboration | Feature | 📋 Planned | IT-05, IT-08 - Real-time multi-user editing with live cursors, shared proposals (12-15 hours) |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |


@@ -0,0 +1,647 @@
# IT-08: Enhanced Error Reporting Implementation Guide
**Status**: IT-08a Complete ✅ | IT-08b In Progress 🔄 | IT-08c Planned 📋
**Date**: October 2, 2025
**Overall Progress**: 20% Complete (1 of the 5 phases listed below)
---
## Phase Overview
| Phase | Task | Status | Time | Description |
|-------|------|--------|------|-------------|
| IT-08a | Screenshot RPC | ✅ Complete | 1.5h | SDL-based screenshot capture |
| IT-08b | Auto-Capture on Failure | 🔄 Active | 1-1.5h | Integrate with TestManager |
| IT-08c | Widget State Dumps | 📋 Planned | 30-45m | Capture UI context on failure |
| IT-08d | Error Envelope Standardization | 📋 Planned | 1-2h | Unified error format across services |
| IT-08e | CLI Error Improvements | 📋 Planned | 1h | Rich error output with artifacts |
**Total Estimated Time**: 5-7 hours
**Time Spent**: 1.5 hours
**Time Remaining**: 3.5-5.5 hours
---
## IT-08a: Screenshot RPC ✅ COMPLETE
**Date Completed**: October 2, 2025
**Time**: 1.5 hours
### Implementation Summary
### What Was Built
Implemented the `Screenshot` RPC in the ImGuiTestHarness service with the following capabilities:
1. **SDL Renderer Integration**: Accesses the ImGui SDL2 backend renderer through `BackendRendererUserData`
2. **Framebuffer Capture**: Uses `SDL_RenderReadPixels` to capture the full window contents (1536x864, 32-bit ARGB)
3. **BMP File Output**: Saves screenshots as BMP files using SDL's built-in `SDL_SaveBMP` function
4. **Flexible Paths**: Supports custom output paths or auto-generates timestamped filenames (`/tmp/yaze_screenshot_<timestamp>.bmp`)
5. **Response Metadata**: Returns file path, file size (bytes), and image dimensions
### Technical Implementation
**Location**: `src/app/core/service/imgui_test_harness_service.cc`
```cpp
// Helper struct mirroring the leading member of the backend data in
// imgui_impl_sdlrenderer2.cpp; only the renderer pointer is needed here.
struct ImGui_ImplSDLRenderer2_Data {
  SDL_Renderer* Renderer;
};
absl::Status ImGuiTestHarnessServiceImpl::Screenshot(
    const ScreenshotRequest* request, ScreenshotResponse* response) {
  // 1. Get SDL renderer from ImGui backend
  ImGuiIO& io = ImGui::GetIO();
  auto* backend_data =
      static_cast<ImGui_ImplSDLRenderer2_Data*>(io.BackendRendererUserData);
  if (!backend_data || !backend_data->Renderer) {
    response->set_success(false);
    response->set_message("SDL renderer not available");
    return absl::FailedPreconditionError("No SDL renderer available");
  }
  SDL_Renderer* renderer = backend_data->Renderer;
  // 2. Get renderer output size
  int width = 0;
  int height = 0;
  SDL_GetRendererOutputSize(renderer, &width, &height);
  // 3. Create surface to hold screenshot (returns nullptr on failure)
  SDL_Surface* surface = SDL_CreateRGBSurface(0, width, height, 32,
                                              0x00FF0000, 0x0000FF00,
                                              0x000000FF, 0xFF000000);
  if (!surface) {
    response->set_success(false);
    response->set_message(SDL_GetError());
    return absl::InternalError("Failed to create SDL surface");
  }
  // 4. Read pixels from renderer (ARGB8888 format); returns 0 on success
  if (SDL_RenderReadPixels(renderer, nullptr, SDL_PIXELFORMAT_ARGB8888,
                           surface->pixels, surface->pitch) != 0) {
    SDL_FreeSurface(surface);
    response->set_success(false);
    response->set_message(SDL_GetError());
    return absl::InternalError("Failed to read renderer pixels");
  }
  // 5. Determine output path (custom or auto-generated)
  std::string output_path = request->output_path();
  if (output_path.empty()) {
    output_path = absl::StrFormat("/tmp/yaze_screenshot_%lld.bmp",
                                  absl::ToUnixMillis(absl::Now()));
  }
  // 6. Save to BMP file, then release the surface in every case
  const int save_result = SDL_SaveBMP(surface, output_path.c_str());
  SDL_FreeSurface(surface);
  if (save_result != 0) {
    response->set_success(false);
    response->set_message(SDL_GetError());
    return absl::InternalError("Failed to save BMP file");
  }
  // 7. Get file size
  std::ifstream file(output_path, std::ios::binary | std::ios::ate);
  int64_t file_size = file.tellg();
  // 8. Return success response
  response->set_success(true);
  response->set_message(absl::StrFormat("Screenshot saved to %s (%dx%d)",
                                        output_path, width, height));
  response->set_file_path(output_path);
  response->set_file_size_bytes(file_size);
  return absl::OkStatus();
}
```
### Testing Results
**Test Command**:
```bash
grpcurl -plaintext \
-import-path /Users/scawful/Code/yaze/src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"output_path": "/tmp/test_screenshot.bmp"}' \
localhost:50052 yaze.test.ImGuiTestHarness/Screenshot
```
**Response**:
```json
{
"success": true,
"message": "Screenshot saved to /tmp/test_screenshot.bmp (1536x864)",
"filePath": "/tmp/test_screenshot.bmp",
"fileSizeBytes": "5308538"
}
```
**File Verification**:
```bash
$ ls -lh /tmp/test_screenshot.bmp
-rw-r--r-- 1 scawful wheel 5.1M Oct 2 20:16 /tmp/test_screenshot.bmp
$ file /tmp/test_screenshot.bmp
/tmp/test_screenshot.bmp: PC bitmap, Windows 95/NT4 and newer format, 1536 x 864 x 32, cbSize 5308538, bits offset 122
```
**Result**: Screenshot successfully captured, saved, and validated!
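The reported file size can be cross-checked against the image dimensions. The sketch below recomputes it from the values printed by `file` above (1536 x 864, 32 bits per pixel, pixel data at offset 122); `ExpectedBmpSize` is a hypothetical helper written for this check, and it assumes an uncompressed BMP whose rows are padded to 4-byte boundaries.
```cpp
#include <cstdint>

// Size in bytes of an uncompressed BMP: the header bytes that precede the
// pixel data plus one padded row per scanline.
constexpr int64_t ExpectedBmpSize(int width, int height, int bits_per_pixel,
                                  int pixel_data_offset) {
  // BMP rows are padded to a multiple of 4 bytes.
  const int64_t row_bytes =
      ((static_cast<int64_t>(width) * bits_per_pixel + 31) / 32) * 4;
  return pixel_data_offset + row_bytes * height;
}

// 1536 * 4 = 6144 bytes per row, 6144 * 864 = 5308416, plus 122 header bytes.
static_assert(ExpectedBmpSize(1536, 864, 32, 122) == 5308538,
              "matches the fileSizeBytes value in the RPC response");
```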
---
## Design Decisions
### Why BMP Format?
**Chosen**: SDL's built-in `SDL_SaveBMP` function
**Rationale**:
- ✅ Zero external dependencies (no need for libpng, stb_image_write, etc.)
- ✅ Guaranteed to work on all platforms where SDL works
- ✅ Simple, reliable, and fast
- ✅ Adequate for debugging/error reporting (file size not critical)
- ⚠️ Larger file sizes (5.3MB vs ~500KB for PNG), but acceptable for temporary debug files
**Future Consideration**: If disk space becomes an issue, can add PNG encoding using stb_image_write (single-header library, easy to integrate)
### SDL Backend Integration
**Challenge**: How to access the SDL_Renderer from ImGui?
**Solution**:
- ImGui's `BackendRendererUserData` points to an `ImGui_ImplSDLRenderer2_Data` struct
- This struct contains the `Renderer` pointer as its first member
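- Casting `BackendRendererUserData` to a local struct with the same leading member gives access to the renderer; this depends on the backend's private layout, so the helper struct must stay in sync with `imgui_impl_sdlrenderer2.cpp`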
**Why Not Store Renderer Globally?**
- Multiple ImGui contexts could use different renderers
- Backend data pattern follows ImGui's architecture conventions
- More maintainable and future-proof
---
## Integration with Test System
### Current Usage (Manual RPC)
AI agents or CLI tools can manually capture screenshots:
```bash
# Capture screenshot after opening editor
z3ed agent test --prompt "Open Overworld Editor"
grpcurl ... yaze.test.ImGuiTestHarness/Screenshot
```
### Next Step: Auto-Capture on Failure
The screenshot RPC is now ready to be integrated with TestManager to automatically capture context when tests fail:
**Planned Implementation** (IT-08b):
```cpp
// In TestManager::MarkHarnessTestCompleted()
if (test_result == IMGUI_TEST_STATUS_FAILED ||
test_result == IMGUI_TEST_STATUS_TIMEOUT) {
// Auto-capture screenshot
ScreenshotRequest req;
req.set_output_path(absl::StrFormat("/tmp/test_%s_failure.bmp", test_id));
ScreenshotResponse resp;
harness_service_->Screenshot(&req, &resp);
test_history_[test_id].screenshot_path = resp.file_path();
// Also capture widget state (IT-08c)
test_history_[test_id].widget_state = CaptureWidgetState();
}
```
---
## IT-08b: Auto-Capture on Test Failure 🔄 IN PROGRESS
**Goal**: Automatically capture screenshots and context when tests fail
**Time Estimate**: 1-1.5 hours
**Status**: Ready to implement
### Implementation Plan
#### Step 1: Modify TestManager (30 minutes)
**File**: `src/app/core/test_manager.cc`
Add screenshot capture in `MarkHarnessTestCompleted()`:
```cpp
void TestManager::MarkHarnessTestCompleted(const std::string& test_id,
ImGuiTestStatus status) {
auto& history_entry = test_history_[test_id];
history_entry.status = status;
history_entry.end_time = absl::Now();
history_entry.execution_time_ms = absl::ToInt64Milliseconds(
history_entry.end_time - history_entry.start_time);
// Auto-capture screenshot on failure
if (status == ImGuiTestStatus_Error || status == ImGuiTestStatus_Warning) {
CaptureFailureContext(test_id);
}
}
void TestManager::CaptureFailureContext(const std::string& test_id) {
auto& history_entry = test_history_[test_id];
// 1. Capture screenshot
std::string screenshot_path =
absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id);
if (harness_service_) {
ScreenshotRequest req;
req.set_output_path(screenshot_path);
ScreenshotResponse resp;
auto status = harness_service_->Screenshot(&req, &resp);
if (status.ok()) {
history_entry.screenshot_path = resp.file_path();
history_entry.screenshot_size_bytes = resp.file_size_bytes();
}
}
// 2. Capture widget state (IT-08c)
// history_entry.widget_state = CaptureWidgetState();
// 3. Capture execution context
history_entry.failure_context = absl::StrFormat(
// ImGuiID is an unsigned integer, so it is formatted with %u, not %s.
"Frame: %d, Active Window: %s, Focused Widget: ID_%u",
ImGui::GetFrameCount(),
ImGui::GetCurrentWindow() ? ImGui::GetCurrentWindow()->Name : "none",
ImGui::GetActiveID());
}
```
#### Step 2: Update TestHistory Structure (15 minutes)
**File**: `src/app/core/test_manager.h`
Add failure context fields:
```cpp
struct TestHistory {
std::string test_id;
std::string test_name;
ImGuiTestStatus status;
absl::Time start_time;
absl::Time end_time;
int64_t execution_time_ms;
std::vector<std::string> logs;
std::map<std::string, std::string> metrics;
// IT-08b: Failure diagnostics
std::string screenshot_path;
int64_t screenshot_size_bytes = 0;
std::string failure_context;
std::string widget_state; // IT-08c
};
```
#### Step 3: Update GetTestResults RPC (30 minutes)
**File**: `src/app/core/service/imgui_test_harness_service.cc`
Include screenshot path in results:
```cpp
absl::Status ImGuiTestHarnessServiceImpl::GetTestResults(
const GetTestResultsRequest* request,
GetTestResultsResponse* response) {
const auto& history = test_manager_->GetTestHistory(request->test_id());
// ... existing result population ...
// Add failure diagnostics
if (!history.screenshot_path.empty()) {
response->set_screenshot_path(history.screenshot_path);
response->set_screenshot_size_bytes(history.screenshot_size_bytes);
}
if (!history.failure_context.empty()) {
response->set_failure_context(history.failure_context);
}
return absl::OkStatus();
}
```
#### Step 4: Update Proto Schema (15 minutes)
**File**: `src/app/core/proto/imgui_test_harness.proto`
Add fields to GetTestResultsResponse:
```proto
message GetTestResultsResponse {
string test_id = 1;
TestStatus status = 2;
int64 execution_time_ms = 3;
repeated string logs = 4;
map<string, string> metrics = 5;
// IT-08b: Failure diagnostics
string screenshot_path = 6;
int64 screenshot_size_bytes = 7;
string failure_context = 8;
string widget_state = 9; // IT-08c
}
```
### Testing
```bash
# 1. Build with changes
cmake --build build-grpc-test --target yaze -j8
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness --test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# 3. Trigger a failing test
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# 4. Check for screenshot
ls -lh /tmp/yaze_test_*_failure.bmp
# 5. Query test results
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"test_id":"grpc_click_<timestamp>"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
# Expected: screenshot_path and failure_context populated
```
### Success Criteria
- ✅ Screenshots auto-captured on test failure
- ✅ Screenshot path stored in test history
- ✅ GetTestResults returns screenshot metadata
- ✅ No performance impact on passing tests
- ✅ Screenshots cleaned up after test completion (optional)
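The optional cleanup criterion could be met with a small helper built on `std::filesystem`. The following is a sketch under assumptions: `IsFailureScreenshot` and `RemoveFailureScreenshots` are hypothetical names, and the filename pattern is taken from the `/tmp/yaze_test_<test_id>_failure.bmp` path used in Step 1.
```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True for names of the form yaze_test_<test_id>_failure.bmp.
inline bool IsFailureScreenshot(const std::string& name) {
  const std::string prefix = "yaze_test_";
  const std::string suffix = "_failure.bmp";
  return name.size() >= prefix.size() + suffix.size() &&
         name.compare(0, prefix.size(), prefix) == 0 &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Deletes every failure screenshot in `dir` and returns how many files were
// removed. A missing or unreadable directory simply yields 0.
inline std::size_t RemoveFailureScreenshots(const fs::path& dir) {
  std::error_code ec;
  std::vector<fs::path> matches;
  // Collect first so files are not deleted while the directory is iterated.
  for (const auto& entry : fs::directory_iterator(dir, ec)) {
    if (IsFailureScreenshot(entry.path().filename().string())) {
      matches.push_back(entry.path());
    }
  }
  std::size_t removed = 0;
  for (const auto& path : matches) {
    if (fs::remove(path, ec)) ++removed;
  }
  return removed;
}
```
Calling `RemoveFailureScreenshots("/tmp")` after a test run would clear the captured failure images while leaving manually requested screenshots such as `yaze_screenshot_<timestamp>.bmp` in place.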
---
## IT-08c: Widget State Dumps 📋 PLANNED
**Goal**: Capture UI hierarchy and state on test failures
**Time Estimate**: 30-45 minutes
**Status**: Specification phase
### Implementation Plan
#### Step 1: Create Widget State Capture Utility (30 minutes)
**File**: `src/app/core/widget_state_capture.h` (new file)
```cpp
#ifndef YAZE_CORE_WIDGET_STATE_CAPTURE_H
#define YAZE_CORE_WIDGET_STATE_CAPTURE_H
#include <string>
#include <vector>  // WidgetState uses std::vector
#include "imgui/imgui.h"
namespace yaze {
namespace core {
struct WidgetState {
std::string focused_window;
std::string focused_widget;
std::string hovered_widget;
std::vector<std::string> visible_windows;
std::vector<std::string> open_menus;
std::string active_popup;
};
std::string CaptureWidgetState();
std::string SerializeWidgetStateToJson(const WidgetState& state);
} // namespace core
} // namespace yaze
#endif
```
**File**: `src/app/core/widget_state_capture.cc` (new file)
```cpp
#include "src/app/core/widget_state_capture.h"
#include "absl/strings/str_format.h"
#include "imgui/imgui_internal.h"  // ImGuiContext, ImGuiWindow, GetActiveID()
#include "nlohmann/json.hpp"
namespace yaze {
namespace core {
std::string CaptureWidgetState() {
WidgetState state;
ImGuiContext* ctx = ImGui::GetCurrentContext();
if (ctx == nullptr) {
// No ImGui context: return an empty dump instead of dereferencing null
return SerializeWidgetStateToJson(state);
}
// Capture focused window (NavWindow holds keyboard/navigation focus;
// GetCurrentWindow() would return the window currently being built instead)
if (ctx->NavWindow != nullptr) {
state.focused_window = ctx->NavWindow->Name;
}
// Capture active widget
ImGuiID active_id = ImGui::GetActiveID();
if (active_id != 0) {
state.focused_widget = absl::StrFormat("ID_%u", active_id);
}
// Capture hovered widget
ImGuiID hovered_id = ImGui::GetHoveredID();
if (hovered_id != 0) {
state.hovered_widget = absl::StrFormat("ID_%u", hovered_id);
}
// Traverse window list
for (ImGuiWindow* window : ctx->Windows) {
if (window->Active && !window->Hidden) {
state.visible_windows.push_back(window->Name);
}
}
// open_menus and active_popup are not populated yet in this draft
return SerializeWidgetStateToJson(state);
}
std::string SerializeWidgetStateToJson(const WidgetState& state) {
nlohmann::json j;
j["focused_window"] = state.focused_window;
j["focused_widget"] = state.focused_widget;
j["hovered_widget"] = state.hovered_widget;
j["visible_windows"] = state.visible_windows;
j["open_menus"] = state.open_menus;
j["active_popup"] = state.active_popup;
return j.dump(2); // Pretty print with indent
}
} // namespace core
} // namespace yaze
```
#### Step 2: Integrate with TestManager (15 minutes)
Update `CaptureFailureContext()` in `test_manager.cc`:
```cpp
void TestManager::CaptureFailureContext(const std::string& test_id) {
auto& history_entry = test_history_[test_id];
// 1. Screenshot (IT-08b)
// ... existing code ...
// 2. Widget state (IT-08c)
history_entry.widget_state = core::CaptureWidgetState();
// 3. Execution context
// ... existing code ...
}
```
### Output Example
```json
{
"focused_window": "Overworld Editor",
"focused_widget": "ID_12345",
"hovered_widget": "ID_67890",
"visible_windows": [
"Main Window",
"Overworld Editor",
"Palette Editor"
],
"open_menus": [],
"active_popup": ""
}
```
---
## IT-08d: Error Envelope Standardization 📋 PLANNED
**Goal**: Unified error format across z3ed, TestManager, EditorManager
**Time Estimate**: 1-2 hours
**Status**: Design phase
### Proposed Error Envelope
```cpp
// Shared error structure
struct ErrorContext {
absl::Status status;
std::string component; // "TestHarness", "EditorManager", "z3ed"
std::string operation; // "Click", "LoadROM", "RunTest"
std::map<std::string, std::string> metadata;
std::vector<std::string> artifact_paths; // Screenshots, logs, etc.
std::string actionable_hint; // User-facing suggestion
};
```
### Integration Points
1. **TestManager**: Wrap failures in ErrorContext
2. **EditorManager**: Use ErrorContext for all operations
3. **z3ed CLI**: Parse ErrorContext and format for display
4. **ProposalDrawer**: Display ErrorContext in GUI modal
---
## IT-08e: CLI Error Improvements 📋 PLANNED
**Goal**: Rich error output in z3ed CLI
**Time Estimate**: 1 hour
**Status**: Design phase
### Enhanced CLI Output
```bash
$ z3ed agent test --prompt "Open Overworld editor"
❌ Test Failed: grpc_click_1696357200
Component: ImGuiTestHarness
Operation: Click widget "Overworld"
Error: Widget not found
Artifacts:
• Screenshot: /tmp/yaze_test_grpc_click_1696357200_failure.bmp
• Widget State: /tmp/yaze_test_grpc_click_1696357200_state.json
• Logs: /tmp/yaze_test_grpc_click_1696357200.log
Context:
• Visible Windows: Main Window, Debug
• Focused Window: Main Window
• Active Widget: None
Suggestion:
→ Check if ROM is loaded (File → Open ROM)
→ Verify Overworld editor button is visible
→ Use 'z3ed agent gui discover' to list available widgets
```
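The layout above could be produced from the proposed error envelope with a small formatter. The following is a minimal sketch under stated assumptions: `ErrorContextSketch` and `FormatErrorForCli` are hypothetical names, the `absl::Status` member is replaced by a plain message string so the example has no dependencies, and the emoji and bullet glyphs are simplified to ASCII.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, dependency-free stand-in for the proposed ErrorContext.
struct ErrorContextSketch {
  std::string status_message;  // stands in for absl::Status
  std::string component;
  std::string operation;
  std::map<std::string, std::string> metadata;
  std::vector<std::string> artifact_paths;
  std::string actionable_hint;
};

// Renders one failure as multi-line CLI text. Sections with no data are
// omitted so short errors stay short.
std::string FormatErrorForCli(const std::string& test_id,
                              const ErrorContextSketch& e) {
  std::ostringstream os;
  os << "Test Failed: " << test_id << "\n"
     << "Component: " << e.component << "\n"
     << "Operation: " << e.operation << "\n"
     << "Error: " << e.status_message << "\n";
  if (!e.artifact_paths.empty()) {
    os << "Artifacts:\n";
    for (const std::string& path : e.artifact_paths) {
      os << "  * " << path << "\n";
    }
  }
  if (!e.metadata.empty()) {
    os << "Context:\n";
    for (const auto& kv : e.metadata) {
      os << "  * " << kv.first << ": " << kv.second << "\n";
    }
  }
  if (!e.actionable_hint.empty()) {
    os << "Suggestion:\n  -> " << e.actionable_hint << "\n";
  }
  return os.str();
}
```

Keeping the formatter separate from the envelope would let the ProposalDrawer modal and the CLI share one data structure while rendering it differently.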
---
## Progress Tracking
### Completed ✅
- IT-08a: Screenshot RPC (1.5 hours)
### In Progress 🔄
- IT-08b: Auto-capture on failure (next priority)
### Planned 📋
- IT-08c: Widget state dumps
- IT-08d: Error envelope standardization
- IT-08e: CLI error improvements
### Time Investment
- **Spent**: 1.5 hours (IT-08a)
- **Remaining**: 3.5-5.5 hours (IT-08b/c/d/e)
- **Total**: 5-7 hours (as estimated)
---
## Next Steps
**Immediate** (IT-08b - 1-1.5 hours):
1. Modify TestManager to capture screenshots on failure
2. Update TestHistory structure
3. Update GetTestResults RPC
4. Test with intentional failures
**Short-term** (IT-08c - 30-45 minutes):
1. Create widget state capture utility
2. Integrate with TestManager
3. Add to GetTestResults RPC
**Medium-term** (IT-08d/e - 2-3 hours):
1. Design unified error envelope
2. Implement across all services
3. Update CLI output formatting
4. Add ProposalDrawer error modal
---
## References
- **Implementation Plan**: [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)
- **Test Harness Guide**: [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md)
- **Source Files**:
- `src/app/core/service/imgui_test_harness_service.cc`
- `src/app/core/test_manager.{h,cc}`
- `src/app/core/proto/imgui_test_harness.proto`
---
**Last Updated**: October 2, 2025
**Current Phase**: IT-08b (Auto-capture on failure)
**Overall Progress**: 33% Complete (1 of 3 core phases)
---
**Report Generated**: October 2, 2025
**Author**: GitHub Copilot (AI Assistant)
**Project**: YAZE - Yet Another Zelda3 Editor
**Component**: z3ed CLI Tool - Test Automation Harness


@@ -1,347 +0,0 @@
# IT-08 Screenshot RPC - Completion Report
**Date**: October 2, 2025
**Task**: IT-08 Enhanced Error Reporting - Screenshot Capture Implementation
**Status**: ✅ Screenshot RPC Complete (30% of IT-08)
---
## Implementation Summary
### What Was Built
Implemented the `Screenshot` RPC in the ImGuiTestHarness service with the following capabilities:
1. **SDL Renderer Integration**: Accesses the ImGui SDL2 backend renderer through `BackendRendererUserData`
2. **Framebuffer Capture**: Uses `SDL_RenderReadPixels` to capture the full window contents (1536x864, 32-bit ARGB)
3. **BMP File Output**: Saves screenshots as BMP files using SDL's built-in `SDL_SaveBMP` function
4. **Flexible Paths**: Supports custom output paths or auto-generates timestamped filenames (`/tmp/yaze_screenshot_<timestamp>.bmp`)
5. **Response Metadata**: Returns file path, file size (bytes), and image dimensions
### Technical Implementation
**Location**: `/Users/scawful/Code/yaze/src/app/core/service/imgui_test_harness_service.cc`
```cpp
// Helper struct matching imgui_impl_sdlrenderer2.cpp backend data
struct ImGui_ImplSDLRenderer2_Data {
SDL_Renderer* Renderer;
};
absl::Status ImGuiTestHarnessServiceImpl::Screenshot(
const ScreenshotRequest* request, ScreenshotResponse* response) {
// 1. Get SDL renderer from ImGui backend
ImGuiIO& io = ImGui::GetIO();
auto* backend_data = static_cast<ImGui_ImplSDLRenderer2_Data*>(io.BackendRendererUserData);
if (!backend_data || !backend_data->Renderer) {
response->set_success(false);
response->set_message("SDL renderer not available");
return absl::FailedPreconditionError("No SDL renderer available");
}
SDL_Renderer* renderer = backend_data->Renderer;
// 2. Get renderer output size
int width, height;
SDL_GetRendererOutputSize(renderer, &width, &height);
// 3. Create surface to hold screenshot
SDL_Surface* surface = SDL_CreateRGBSurface(0, width, height, 32,
0x00FF0000, 0x0000FF00,
0x000000FF, 0xFF000000);
// 4. Read pixels from renderer (ARGB8888 format)
SDL_RenderReadPixels(renderer, nullptr, SDL_PIXELFORMAT_ARGB8888,
surface->pixels, surface->pitch);
// 5. Determine output path (custom or auto-generated)
std::string output_path = request->output_path();
if (output_path.empty()) {
output_path = absl::StrFormat("/tmp/yaze_screenshot_%lld.bmp",
absl::ToUnixMillis(absl::Now()));
}
// 6. Save to BMP file
SDL_SaveBMP(surface, output_path.c_str());
// 7. Get file size and clean up
std::ifstream file(output_path, std::ios::binary | std::ios::ate);
int64_t file_size = file.tellg();
SDL_FreeSurface(surface);
// 8. Return success response
response->set_success(true);
response->set_message(absl::StrFormat("Screenshot saved to %s (%dx%d)",
output_path, width, height));
response->set_file_path(output_path);
response->set_file_size_bytes(file_size);
return absl::OkStatus();
}
```
### Testing Results
**Test Command**:
```bash
grpcurl -plaintext \
-import-path /Users/scawful/Code/yaze/src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"output_path": "/tmp/test_screenshot.bmp"}' \
localhost:50052 yaze.test.ImGuiTestHarness/Screenshot
```
**Response**:
```json
{
"success": true,
"message": "Screenshot saved to /tmp/test_screenshot.bmp (1536x864)",
"filePath": "/tmp/test_screenshot.bmp",
"fileSizeBytes": "5308538"
}
```
**File Verification**:
```bash
$ ls -lh /tmp/test_screenshot.bmp
-rw-r--r-- 1 scawful wheel 5.1M Oct 2 20:16 /tmp/test_screenshot.bmp
$ file /tmp/test_screenshot.bmp
/tmp/test_screenshot.bmp: PC bitmap, Windows 95/NT4 and newer format, 1536 x 864 x 32, cbSize 5308538, bits offset 122
```
**Result**: Screenshot successfully captured, saved, and validated!
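The reported file size can be cross-checked by hand: an uncompressed BMP is its header (the 122-byte "bits offset" that `file` reports) plus one padded row of pixels per scanline. The sketch below is only an illustration; `ExpectedBmpSize` is a hypothetical helper, not part of the harness.

```cpp
#include <cstdint>

// Size in bytes of an uncompressed BMP: the pixel data offset plus `height`
// rows, where each row is padded up to a multiple of 4 bytes.
constexpr std::int64_t ExpectedBmpSize(std::int64_t width, std::int64_t height,
                                       int bits_per_pixel,
                                       std::int64_t pixel_data_offset) {
  return pixel_data_offset +
         ((width * bits_per_pixel + 31) / 32) * 4 * height;
}

// 1536 * 4 bytes per row * 864 rows = 5,308,416 bytes of pixels, plus the
// 122-byte header, matches the fileSizeBytes value in the response.
static_assert(ExpectedBmpSize(1536, 864, 32, 122) == 5308538,
              "size matches the Screenshot RPC response");
```

Under the same assumptions, a capture at 768x432 would come to 1,327,226 bytes, about a quarter of the full-resolution file.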
---
## Design Decisions
### Why BMP Format?
**Chosen**: SDL's built-in `SDL_SaveBMP` function
**Rationale**:
- ✅ Zero external dependencies (no need for libpng, stb_image_write, etc.)
- ✅ Guaranteed to work on all platforms where SDL works
- ✅ Simple, reliable, and fast
- ✅ Adequate for debugging/error reporting (file size not critical)
- ⚠️ Larger file sizes (5.3MB vs ~500KB for PNG), but acceptable for temporary debug files
**Future Consideration**: If disk space becomes an issue, can add PNG encoding using stb_image_write (single-header library, easy to integrate)
### SDL Backend Integration
**Challenge**: How to access the SDL_Renderer from ImGui?
**Solution**:
- ImGui's `BackendRendererUserData` points to an `ImGui_ImplSDLRenderer2_Data` struct
- This struct contains the `Renderer` pointer as its first member
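- Cast `BackendRendererUserData` to a matching struct to reach the renderer; this depends on the backend's private struct layout, so re-check it when upgrading ImGui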
**Why Not Store Renderer Globally?**
- Multiple ImGui contexts could use different renderers
- Backend data pattern follows ImGui's architecture conventions
- More maintainable and future-proof
---
## Integration with Test System
### Current Usage (Manual RPC)
AI agents or CLI tools can manually capture screenshots:
```bash
# Capture screenshot after opening editor
z3ed agent test --prompt "Open Overworld Editor"
grpcurl ... yaze.test.ImGuiTestHarness/Screenshot
```
### Next Step: Auto-Capture on Failure
The screenshot RPC is now ready to be integrated with TestManager to automatically capture context when tests fail:
**Planned Implementation** (IT-08 Phase 2):
```cpp
// In TestManager::MarkHarnessTestCompleted()
if (test_result == IMGUI_TEST_STATUS_FAILED ||
test_result == IMGUI_TEST_STATUS_TIMEOUT) {
// Auto-capture screenshot
ScreenshotRequest req;
req.set_output_path(absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id));
ScreenshotResponse resp;
harness_service_->Screenshot(&req, &resp);
test_history_[test_id].screenshot_path = resp.file_path();
// Also capture widget state (IT-08 Phase 3)
test_history_[test_id].widget_state = CaptureWidgetState();
}
```
---
## Remaining Work (IT-08 Phases 2-3)
### Phase 2: Auto-Capture on Test Failure (1-1.5 hours)
**Tasks**:
1. Modify `TestManager::MarkHarnessTestCompleted()` to detect failures
2. Call Screenshot RPC automatically when `status == FAILED || status == TIMEOUT`
3. Store screenshot path in test history
4. Update `GetTestResults` RPC to include screenshot paths in response
5. Test with intentional test failures
**Files to Modify**:
- `src/app/core/test_manager.cc` (auto-capture logic)
- `src/app/core/service/imgui_test_harness_service.cc` (store screenshot in history)
### Phase 3: Widget State Dump (30-45 minutes)
**Tasks**:
1. Implement `CaptureWidgetState()` function to traverse ImGui window hierarchy
2. Capture: focused window, focused widget, hovered widget, open menus
3. Store as JSON string in test history
4. Include in `GetTestResults` response
**Files to Create**:
- `src/app/core/widget_state_capture.{h,cc}` (traversal logic)
**Example Output**:
```json
{
"focused_window": "Overworld Editor",
"hovered_widget": "canvas_overworld_main",
"open_menus": [],
"visible_windows": ["Overworld Editor", "Palette Editor", "Tile16 Editor"]
}
```
---
## Performance Considerations
### Current Performance
- **Screenshot Capture Time**: ~10-20ms (depends on resolution)
- **File Write Time**: ~50-100ms (5.3MB BMP)
- **Total Impact**: ~60-120ms per screenshot
**Analysis**: Acceptable for failure scenarios (only captures when test fails, not on every frame)
### Optimization Options (If Needed)
1. **Async Capture**: Move screenshot to background thread (complex, may not be necessary)
2. **PNG Compression**: Reduce file size from 5.3MB to ~500KB (10x smaller)
3. **Downscaling**: Capture at 50% resolution (768x432) for faster I/O
4. **Skip Screenshots for Fast Tests**: Only capture for tests >1 second
**Recommendation**: Current performance is fine for debugging. Only optimize if users report slowdowns.
---
## CLI Integration
### z3ed CLI Usage
The Screenshot RPC is accessible via the CLI automation client:
```cpp
// In gui_automation_client.cc
absl::StatusOr<ScreenshotResponse> GuiAutomationClient::TakeScreenshot(
const std::string& output_path) {
ScreenshotRequest request;
request.set_output_path(output_path);
ScreenshotResponse response;
grpc::ClientContext context;
auto status = stub_->Screenshot(&context, request, &response);
if (!status.ok()) {
return absl::InternalError(status.error_message());
}
return response;
}
```
### Agent Mode Integration
AI agents can now request screenshots to understand GUI state:
```yaml
# Example agent workflow
- action: click
target: "Overworld Editor##tab"
- action: screenshot
output: "/tmp/overworld_state.bmp"
- action: analyze
image: "/tmp/overworld_state.bmp"
prompt: "Verify Overworld Editor opened successfully"
```
---
## Next Steps
### Immediate (Continue IT-08)
1. **Build and Test**: ✅ Complete (Oct 2, 2025)
2. **Auto-Capture on Failure**: 📋 Next (1-1.5 hours)
3. **Widget State Dump**: 📋 After auto-capture (30-45 minutes)
### After IT-08 Completion
**IT-09: CI/CD Integration** (2-3 hours):
- Test suite YAML format
- JUnit XML output for GitHub Actions
- Example workflow file
---
## Success Metrics
- **Screenshot RPC Works**: Successfully captures 1536x864 @ 32-bit BMP files
- **Integration Ready**: Can be called from CLI, agents, or test harness
- **Performance Acceptable**: ~60-120ms total impact per capture
- **Error Handling**: Returns clear error messages if renderer unavailable
**Overall IT-08 Progress**: 30% complete (1 of 3 phases done)
---
## Documentation Updates
### Files Updated
- `src/app/core/service/imgui_test_harness_service.cc` (Screenshot implementation)
- `docs/z3ed/IT-08-SCREENSHOT-COMPLETION.md` (this file)
### Files to Update Next
- `docs/z3ed/IMPLEMENTATION_CONTINUATION.md` (mark Screenshot complete)
- `docs/z3ed/STATUS_REPORT_OCT2.md` (update progress to 30%)
- `docs/z3ed/NEXT_STEPS_OCT2.md` (shift focus to Phase 2)
---
## Conclusion
The Screenshot RPC is fully functional and tested. It provides the foundation for IT-08's enhanced error reporting system by capturing visual context when tests fail.
**Key Achievement**: AI agents can now "see" what's on screen, enabling visual debugging and verification workflows.
**What's Next**: Integrate screenshot capture with the test failure detection system so every failed test automatically includes a screenshot + widget state dump.
**Estimated Time to Complete IT-08**: 1.5-2 hours remaining (auto-capture + widget state)
---
**Report Generated**: October 2, 2025
**Author**: GitHub Copilot (AI Assistant)
**Project**: YAZE - Yet Another Zelda3 Editor
**Component**: z3ed CLI Tool - Test Automation Harness


@@ -0,0 +1,388 @@
# IT-08b: Auto-Capture on Test Failure - Implementation Guide
**Status**: 🔄 Ready to Implement
**Priority**: High (Next Phase of IT-08)
**Time Estimate**: 1-1.5 hours
**Date**: October 2, 2025
---
## Overview
Automatically capture screenshots and execution context when tests fail, enabling better debugging and diagnostics for AI agents.
**Goal**: Every failed test produces:
- Screenshot of GUI state at failure
- Execution context (frame count, active windows, focused widgets)
- Foundation for IT-08c (widget state dumps)
---
## Implementation Steps
### Step 1: Update TestHistory Structure (15 minutes)
**File**: `src/app/core/test_manager.h`
Add failure diagnostics fields:
```cpp
struct TestHistory {
std::string test_id;
std::string test_name;
ImGuiTestStatus status;
absl::Time start_time;
absl::Time end_time;
int64_t execution_time_ms;
std::vector<std::string> logs;
std::map<std::string, std::string> metrics;
// IT-08b: Failure diagnostics
std::string screenshot_path;
int64_t screenshot_size_bytes = 0;
std::string failure_context;
// IT-08c: Widget state (future)
std::string widget_state;
};
```
### Step 2: Add CaptureFailureContext Method (30 minutes)
**File**: `src/app/core/test_manager.cc`
Add new method after `MarkHarnessTestCompleted`:
```cpp
void TestManager::CaptureFailureContext(const std::string& test_id) {
if (test_history_.find(test_id) == test_history_.end()) {
return;
}
auto& history = test_history_[test_id];
// 1. Capture screenshot via harness service
if (harness_service_) {
std::string screenshot_path =
absl::StrFormat("/tmp/yaze_test_%s_failure.bmp", test_id);
ScreenshotRequest req;
req.set_output_path(screenshot_path);
ScreenshotResponse resp;
auto status = harness_service_->Screenshot(&req, &resp);
if (status.ok() && resp.success()) {
history.screenshot_path = resp.file_path();
history.screenshot_size_bytes = resp.file_size_bytes();
} else {
YAZE_LOG(ERROR) << "Failed to capture screenshot for " << test_id
<< ": " << status.message();
}
}
// 2. Capture execution context
ImGuiContext* ctx = ImGui::GetCurrentContext();
if (ctx) {
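// GetCurrentWindowRead() (imgui_internal.h) does not dereference the current
// window, so the null check on the next line is meaningful
ImGuiWindow* current_window = ImGui::GetCurrentWindowRead();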
std::string window_name = current_window ? current_window->Name : "none";
ImGuiID active_id = ImGui::GetActiveID();
ImGuiID hovered_id = ImGui::GetHoveredID();
history.failure_context = absl::StrFormat(
"Frame: %d, Window: %s, Active: %u, Hovered: %u",
ImGui::GetFrameCount(),
window_name,
active_id,
hovered_id);
}
// 3. Widget state capture (IT-08c - placeholder)
// history.widget_state = CaptureWidgetState();
}
```
### Step 3: Integrate with MarkHarnessTestCompleted (15 minutes)
**File**: `src/app/core/test_manager.cc`
Modify existing method to call CaptureFailureContext:
```cpp
void TestManager::MarkHarnessTestCompleted(const std::string& test_id,
ImGuiTestStatus status) {
if (test_history_.find(test_id) == test_history_.end()) {
return;
}
auto& history = test_history_[test_id];
history.status = status;
history.end_time = absl::Now();
history.execution_time_ms = absl::ToInt64Milliseconds(
history.end_time - history.start_time);
// Auto-capture diagnostics on failure
if (status == ImGuiTestStatus_Error || status == ImGuiTestStatus_Warning) {
CaptureFailureContext(test_id);
}
// Notify waiting threads
cv_.notify_all();
}
```
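The failure-only hook in Steps 2 and 3 can be exercised without ImGui or gRPC by injecting the capture step as a callback. This is a minimal sketch under assumptions: `HistorySketch`, `HistoryEntry`, and the `TestStatus` enum are hypothetical stand-ins for `TestManager`, `TestHistory`, and `ImGuiTestStatus`, and the callback stands in for the Screenshot RPC.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

enum class TestStatus { kPassed, kFailed, kTimeout };

struct HistoryEntry {
  TestStatus status = TestStatus::kPassed;
  std::string screenshot_path;  // stays empty unless a capture succeeded
};

class HistorySketch {
 public:
  // The callback returns the saved path, or "" if the capture failed.
  using CaptureFn = std::function<std::string(const std::string& test_id)>;

  explicit HistorySketch(CaptureFn capture) : capture_(std::move(capture)) {}

  void Start(const std::string& test_id) { entries_[test_id]; }

  void MarkCompleted(const std::string& test_id, TestStatus status) {
    auto it = entries_.find(test_id);
    if (it == entries_.end()) return;  // unknown test: nothing to record
    it->second.status = status;
    // Passing tests never reach the capture path, so they pay no extra cost.
    if (status != TestStatus::kPassed && capture_) {
      it->second.screenshot_path = capture_(test_id);
    }
  }

  const HistoryEntry* Find(const std::string& test_id) const {
    auto it = entries_.find(test_id);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  CaptureFn capture_;
  std::map<std::string, HistoryEntry> entries_;
};
```

Injecting the capture function keeps the bookkeeping testable in isolation; a unit test can count how often the callback runs and confirm it fires only for failed or timed-out tests.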
### Step 4: Update GetTestResults RPC (30 minutes)
**File**: `src/app/core/proto/imgui_test_harness.proto`
Add fields to response:
```proto
message GetTestResultsResponse {
string test_id = 1;
TestStatus status = 2;
int64 execution_time_ms = 3;
repeated string logs = 4;
map<string, string> metrics = 5;
// IT-08b: Failure diagnostics
string screenshot_path = 6;
int64 screenshot_size_bytes = 7;
string failure_context = 8;
// IT-08c: Widget state (future)
string widget_state = 9;
}
```
**File**: `src/app/core/service/imgui_test_harness_service.cc`
Update implementation:
```cpp
absl::Status ImGuiTestHarnessServiceImpl::GetTestResults(
const GetTestResultsRequest* request,
GetTestResultsResponse* response) {
const std::string& test_id = request->test_id();
auto history = test_manager_->GetTestHistory(test_id);
if (!history.has_value()) {
return absl::NotFoundError(
absl::StrFormat("Test not found: %s", test_id));
}
const auto& h = history.value();
// Basic info
response->set_test_id(h.test_id);
response->set_status(ConvertImGuiTestStatusToProto(h.status));
response->set_execution_time_ms(h.execution_time_ms);
// Logs and metrics
for (const auto& log : h.logs) {
response->add_logs(log);
}
for (const auto& [key, value] : h.metrics) {
(*response->mutable_metrics())[key] = value;
}
// IT-08b: Failure diagnostics
if (!h.screenshot_path.empty()) {
response->set_screenshot_path(h.screenshot_path);
response->set_screenshot_size_bytes(h.screenshot_size_bytes);
}
if (!h.failure_context.empty()) {
response->set_failure_context(h.failure_context);
}
// IT-08c: Widget state (future)
if (!h.widget_state.empty()) {
response->set_widget_state(h.widget_state);
}
return absl::OkStatus();
}
```
---
## Testing
### Build and Start Test Harness
```bash
# 1. Rebuild with changes
cmake --build build-grpc-test --target yaze -j$(sysctl -n hw.ncpu)
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
### Trigger Test Failure
```bash
# 3. Trigger a failing test (nonexistent widget)
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Response should indicate failure
```
### Verify Screenshot Captured
```bash
# 4. Check for auto-captured screenshot
ls -lh /tmp/yaze_test_*_failure.bmp
# Expected: BMP file created (5.3MB)
```
### Query Test Results
```bash
# 5. Get test results (replace <test_id> with actual ID from Click response)
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"test_id":"<test_id>"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults
# Expected output:
{
"testId": "grpc_click_12345678",
"status": "FAILED",
"executionTimeMs": "1234",
"logs": [...],
"screenshotPath": "/tmp/yaze_test_grpc_click_12345678_failure.bmp",
"screenshotSizeBytes": "5308538",
"failureContext": "Frame: 1234, Window: Main Window, Active: 0, Hovered: 0"
}
```
### End-to-End Test Script
Create `scripts/test_auto_capture.sh`:
```bash
#!/bin/bash
set -e
echo "=== IT-08b Auto-Capture Test ==="
# Clean up old screenshots
rm -f /tmp/yaze_test_*_failure.bmp
# Start YAZE with test harness
echo "Starting YAZE..."
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
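YAZE_PID=$!
# Stop YAZE even if a later command fails and `set -e` aborts the script
trap 'kill "$YAZE_PID" 2>/dev/null || true' EXIT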
# Wait for server to start
sleep 3
# Trigger failing test
echo "Triggering test failure..."
TEST_ID=$(grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"nonexistent_widget","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click | \
jq -r '.testId')
echo "Test ID: $TEST_ID"
# Wait for test to complete
sleep 2
# Check screenshot captured
if [ -f "/tmp/yaze_test_${TEST_ID}_failure.bmp" ]; then
echo "✅ Screenshot captured: /tmp/yaze_test_${TEST_ID}_failure.bmp"
else
echo "❌ Screenshot NOT captured"
kill $YAZE_PID
exit 1
fi
# Query test results
echo "Querying test results..."
RESULTS=$(grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d "{\"test_id\":\"$TEST_ID\"}" \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/GetTestResults)
echo "$RESULTS"
# Verify fields present
if echo "$RESULTS" | jq -e '.screenshotPath' > /dev/null; then
echo "✅ Screenshot path in results"
else
echo "❌ Screenshot path missing"
kill $YAZE_PID
exit 1
fi
if echo "$RESULTS" | jq -e '.failureContext' > /dev/null; then
echo "✅ Failure context in results"
else
echo "❌ Failure context missing"
kill $YAZE_PID
exit 1
fi
echo "=== All tests passed! ==="
# Cleanup
kill $YAZE_PID
```
---
## Success Criteria
- ✅ Screenshots auto-captured on test failure (Error or Warning status)
- ✅ Screenshot path stored in TestHistory
- ✅ Failure context captured (frame, window, widgets)
- ✅ GetTestResults RPC returns screenshot_path and failure_context
- ✅ No performance impact on passing tests (capture only on failure)
- ✅ Clean error handling if screenshot capture fails
---
## Files Modified
1. `src/app/core/test_manager.h` - TestHistory structure
2. `src/app/core/test_manager.cc` - CaptureFailureContext method
3. `src/app/core/proto/imgui_test_harness.proto` - GetTestResultsResponse fields
4. `src/app/core/service/imgui_test_harness_service.cc` - GetTestResults implementation
---
## Next Steps
**After IT-08b Complete**:
1. IT-08c: Widget State Dumps (30-45 minutes)
2. IT-08d: Error Envelope Standardization (1-2 hours)
3. IT-08e: CLI Error Improvements (1 hour)
**Documentation Updates**:
1. Update `IT-08-IMPLEMENTATION-GUIDE.md` with IT-08b complete status
2. Update `E6-z3ed-implementation-plan.md` progress tracking
3. Update `README.md` with new capabilities
---
**Last Updated**: October 2, 2025
**Status**: Ready to implement
**Estimated Completion**: October 2-3, 2025 (1-1.5 hours)


@@ -1,251 +0,0 @@
# Policy Evaluation Framework - Implementation Complete ✅
**Date**: October 2025
**Task**: AW-04 - Policy Evaluation Framework
**Status**: ✅ Complete - Ready for Production Testing
**Time**: 6 hours actual (estimated 6-8 hours)
## Overview
The Policy Evaluation Framework enables safe AI-driven ROM modifications by gating proposal acceptance based on YAML-configured constraints. This prevents the agent from making dangerous changes (corrupting ROM headers, exceeding byte limits, bypassing test requirements) while maintaining flexibility through configurable policies.
## Implementation Summary
### Core Components
1. **PolicyEvaluator Service** (`src/cli/service/policy_evaluator.{h,cc}`)
- Singleton service managing policy loading and evaluation
- 377 lines of implementation code
- Thread-safe with absl::StatusOr error handling
- Auto-loads from `.yaze/policies/agent.yaml` on first use
2. **Policy Types** (4 implemented):
- **test_requirement**: Gates on test status (critical severity)
- **change_constraint**: Limits bytes modified (warning/critical)
- **forbidden_range**: Blocks specific memory regions (critical)
- **review_requirement**: Flags proposals needing scrutiny (warning)
3. **Severity Levels** (3 levels):
- **Info**: Informational only, no blocking
- **Warning**: User can override with confirmation
- **Critical**: Blocks acceptance completely
4. **GUI Integration** (`src/app/editor/system/proposal_drawer.{h,cc}`)
- `DrawPolicyStatus()`: Color-coded violation display
- ⛔ Red for critical violations
- ⚠️ Yellow for warnings
- Blue for info messages
- Accept button gating: Disabled when critical violations present
- Override dialog: Confirmation required for warnings
5. **Configuration** (`.yaze/policies/agent.yaml`)
- Simple YAML-like format for policy definitions
- Example configuration with 4 policies provided
- User can enable/disable individual policies
- Supports comments and version tracking
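The severity rules above reduce to a small decision function: any critical violation blocks acceptance, any warning requires an override, and info-only results pass. The sketch below assumes a hypothetical `PolicyViolation` layout and an `AcceptGate` enum that do not appear in the source; only the three severity levels are taken from the description.

```cpp
#include <string>
#include <vector>

enum class PolicySeverity { kInfo, kWarning, kCritical };

// Hypothetical shape of a single violation; the real struct may differ.
struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
};

enum class AcceptGate { kAllowed, kNeedsOverride, kBlocked };

// Maps a set of violations to the state of the Accept button.
AcceptGate GateAccept(const std::vector<PolicyViolation>& violations) {
  AcceptGate gate = AcceptGate::kAllowed;
  for (const PolicyViolation& v : violations) {
    if (v.severity == PolicySeverity::kCritical) {
      return AcceptGate::kBlocked;  // one critical violation is enough
    }
    if (v.severity == PolicySeverity::kWarning) {
      gate = AcceptGate::kNeedsOverride;
    }
  }
  return gate;
}
```

In this sketch a drawer would disable the button on `kBlocked` and open the confirmation dialog on `kNeedsOverride`.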
### Build System Integration
- Added `cli/service/policy_evaluator.cc` to:
- `src/cli/z3ed.cmake` (z3ed CLI target)
- `src/app/app.cmake` (yaze GUI target, with `YAZE_ENABLE_POLICY_FRAMEWORK=1`)
- **Conditional Compilation**: Policy framework only enabled in main `yaze` target
- `yaze_emu` (emulator) builds without policy support
- Uses `#ifdef YAZE_ENABLE_POLICY_FRAMEWORK` to wrap optional code
- Clean build with no errors (warnings only for Abseil version mismatch)
## Code Changes
### Files Created (4 new files):
1. **docs/z3ed/AW-04-POLICY-FRAMEWORK.md** (1,234 lines)
- Complete implementation specification
- YAML schema documentation
- Architecture diagrams and examples
- 4-phase implementation plan
2. **src/cli/service/policy_evaluator.h** (85 lines)
- PolicyEvaluator singleton interface
- PolicyResult, PolicyViolation structures
- PolicySeverity enum
- Public API: LoadPolicies(), EvaluateProposal(), ReloadPolicies()
3. **src/cli/service/policy_evaluator.cc** (377 lines)
- ParsePolicyFile(): Simple YAML parser
- Evaluate[Test|Change|Forbidden|Review](): Policy evaluation logic
- CategorizeViolations(): Severity-based filtering
4. **.yaze/policies/agent.yaml** (34 lines)
- Example policy configuration
- 4 sample policies with detailed comments
- Ready for production use
### Files Modified (5 files):
1. **src/app/editor/system/proposal_drawer.h**
- Added: `DrawPolicyStatus()` method
- Added: `show_override_dialog_` member variable
2. **src/app/editor/system/proposal_drawer.cc** (~100 lines added)
- Integrated PolicyEvaluator::Get().EvaluateProposal()
- Implemented DrawPolicyStatus() with color-coded violations
- Modified DrawActionButtons() to gate Accept button
- Added policy override confirmation dialog
3. **src/cli/z3ed.cmake**
- Added: `cli/service/policy_evaluator.cc` to z3ed sources
4. **src/app/app.cmake**
- Added: `cli/service/policy_evaluator.cc` to yaze sources
- Added: `YAZE_ENABLE_POLICY_FRAMEWORK=1` compile definition
- Note: `yaze_emu` target does NOT include policy framework (optional feature)
5. **src/app/editor/system/proposal_drawer.cc**
- Wrapped policy code with `#ifdef YAZE_ENABLE_POLICY_FRAMEWORK`
- Gracefully degrades when policy framework disabled
6. **docs/z3ed/E6-z3ed-implementation-plan.md**
- Updated: AW-04 status from "📋 Next" to "✅ Done"
- Updated: Active phase to Policy Framework complete
- Updated: Time investment to 28.5 hours total
## Technical Details
### Conditional Compilation
The policy framework uses conditional compilation to allow building without policy support:
```cpp
#ifdef YAZE_ENABLE_POLICY_FRAMEWORK
auto& policy_eval = cli::PolicyEvaluator::Get();
auto policy_result = policy_eval.EvaluateProposal(p.id);
// ... policy evaluation logic ...
#endif
```
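One way to keep call sites free of `#ifdef` noise is to confine the guard to a single helper that falls back to a permissive answer when the framework is compiled out. The helper name `IsAcceptEnabled` is hypothetical and the sketch only illustrates the pattern, not the actual ProposalDrawer code.

```cpp
// Hypothetical helper: reports whether the Accept button should be enabled.
// With YAZE_ENABLE_POLICY_FRAMEWORK undefined (as in yaze_emu), the policy
// check compiles away and acceptance is never blocked.
bool IsAcceptEnabled(int critical_violations) {
#ifdef YAZE_ENABLE_POLICY_FRAMEWORK
  return critical_violations == 0;
#else
  (void)critical_violations;  // unused without the policy framework
  return true;
#endif
}
```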
**Build Targets**:
- `yaze` (main editor): Policy framework **enabled**
- `yaze_emu` (emulator): Policy framework **disabled** (not needed)
- `z3ed` (CLI): Policy framework **enabled**
### API Usage Patterns
**StatusOr Error Handling**:
```cpp
auto proposal_result = registry.GetProposal(proposal_id);
if (!proposal_result.ok()) {
return PolicyResult{false, {}, {}, {}, {}};
}
const auto& proposal = proposal_result.value();
```
**String View Conversions**:
```cpp
// Explicit conversion required for absl::string_view → std::string
std::string trimmed = std::string(absl::StripAsciiWhitespace(line));
config_->version = std::string(absl::StripAsciiWhitespace(parts[1]));
```
**Singleton Pattern**:
```cpp
PolicyEvaluator& evaluator = PolicyEvaluator::Get();
PolicyResult result = evaluator.EvaluateProposal(proposal_id);
```
### Compilation Fixes Applied
1. **Include Paths**: Changed from `src/cli/service/...` to `cli/service/...`
2. **StatusOr API**: Used `.ok()` and `.value()` instead of `.has_value()`
3. **String Numbers**: Added `#include "absl/strings/numbers.h"` for SimpleAtoi
4. **String View**: Explicit `std::string()` cast for all absl::StripAsciiWhitespace() calls
5. **Conditional Compilation**: Wrapped policy code with `YAZE_ENABLE_POLICY_FRAMEWORK` to fix yaze_emu build
## Testing Plan
### Phase 1: Manual Validation (Next Step)
- [ ] Launch yaze GUI and open Proposal Drawer
- [ ] Create test proposal and verify policy evaluation runs
- [ ] Test critical violation blocking (Accept button disabled)
- [ ] Test warning override flow (confirmation dialog)
- [ ] Verify policy status display with all severity levels
### Phase 2: Policy Testing
- [ ] Test forbidden_range detection (ROM header protection)
- [ ] Test change_constraint limits (byte count enforcement)
- [ ] Test test_requirement gating (blocks without passing tests)
- [ ] Test review_requirement flagging (complex proposals)
- [ ] Test policy enable/disable toggle
### Phase 3: Edge Cases
- [ ] Invalid YAML syntax handling
- [ ] Missing policy file behavior
- [ ] Malformed policy definitions
- [ ] Policy reload during runtime
- [ ] Multiple policies of same type
### Phase 4: Unit Tests
- [ ] PolicyEvaluator::ParsePolicyFile() unit tests
- [ ] Individual policy type evaluation tests
- [ ] Severity categorization tests
- [ ] Integration tests with ProposalRegistry
## Known Limitations
1. **YAML Parsing**: Simple custom parser implemented
- Works for current format but not full YAML spec
- Consider yaml-cpp for complex nested structures
2. **Forbidden Range Checking**: Requires ROM diff parsing
- Currently placeholder implementation
- Will need integration with .z3ed-diff format
3. **Review Requirement Conditions**: Complex expression evaluation
- Currently checks simple string matching
- May need expression parser for production
4. **Performance**: No profiling done yet
- Target: < 100ms per evaluation
- Likely well under target given simple logic
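Once changed byte ranges can be extracted from a diff, the forbidden-range check itself reduces to interval overlap. The following is a minimal sketch under that assumption; `ByteRange` and `TouchesForbidden` are hypothetical names, and the `.z3ed-diff` parsing that would produce the ranges is out of scope here.

```cpp
#include <cstdint>
#include <vector>

// Half-open byte range [start, end) within the ROM file (assumed type).
struct ByteRange {
  std::uint32_t start;
  std::uint32_t end;
};

// Two half-open ranges overlap when each starts before the other ends.
bool Overlaps(const ByteRange& a, const ByteRange& b) {
  return a.start < b.end && b.start < a.end;
}

// Returns true if any changed range intersects any forbidden range.
bool TouchesForbidden(const std::vector<ByteRange>& changes,
                      const std::vector<ByteRange>& forbidden) {
  for (const ByteRange& change : changes) {
    for (const ByteRange& range : forbidden) {
      if (Overlaps(change, range)) return true;
    }
  }
  return false;
}
```

For example, with an illustrative protected header range of `{0x7FC0, 0x8000}`, a change covering `{0x7FFF, 0x8001}` is flagged while one starting exactly at `0x8000` is not.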
## Production Readiness Checklist
- ✅ Core implementation complete
- ✅ Build system integration
- ✅ GUI integration
- ✅ Example configuration
- ✅ Documentation complete
- ⏳ Manual testing (next step)
- ⏳ Unit test coverage
- ⏳ Windows cross-platform validation
- ⏳ Performance profiling
## Next Steps
**Immediate** (30 minutes):
1. Launch yaze and test policy evaluation in ProposalDrawer
2. Verify all 4 policy types work correctly
3. Test override workflow for warnings
**Short-term** (2-3 hours):
1. Add unit tests for PolicyEvaluator
2. Test on Windows build
3. Document policy configuration in user guide
**Medium-term** (4-6 hours):
1. Integrate with .z3ed-diff for forbidden range detection
2. Implement full YAML parser (yaml-cpp)
3. Add policy reload command to CLI
4. Performance profiling and optimization
## References
- **Specification**: [AW-04-POLICY-FRAMEWORK.md](AW-04-POLICY-FRAMEWORK.md)
- **Implementation Plan**: [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)
- **Example Config**: `.yaze/policies/agent.yaml`
- **Source Files**:
- `src/cli/service/policy_evaluator.{h,cc}`
- `src/app/editor/system/proposal_drawer.{h,cc}`
---
**Accomplishment**: The core of the Policy Evaluation Framework is implemented and ready for manual testing. Forbidden-range checking is still a placeholder, and the YAML parser handles only the current policy format (see Known Limitations). This is a major safety milestone for the z3ed agentic workflow system: AI-driven ROM modifications can now be gated by human-defined constraints.


@@ -16,6 +16,8 @@
This directory contains the primary documentation for the `z3ed` system.
**📋 Documentation Status**: Consolidated (Oct 2, 2025) - 10 core files, 6,547 lines
## Core Documentation
Start here to understand the architecture, learn how to use the commands, and see the current development status.
@@ -90,6 +92,7 @@ See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
- Successfully tested via gRPC (5.3MB output files)
- Foundation for auto-capture on test failures
- AI agents can now capture visual context for debugging
- ✅ IT-07 Test Recording & Replay Complete: Regression testing workflow operational
- ✅ Server-side wiring for test lifecycle tracking inside `TestManager`
- ✅ gRPC status mapping helper to surface accurate error codes back to clients
- ✅ CLI integration with YAML/JSON output formats
@@ -97,11 +100,11 @@ See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.
**Next Priority**: IT-08b (Auto-capture on failure) + IT-08c (Widget state dumps) to complete enhanced error reporting
**Test Harness Evolution** (In Progress: IT-05 to IT-09 | 76% Complete):
**Test Harness Evolution** (In Progress: IT-05 to IT-09 | 78% Complete):
- **Test Introspection**: ✅ Query test status, results, and execution history
- **Widget Discovery**: ✅ AI agents can enumerate available GUI interactions dynamically
- **Test Recording**: ✅ Capture manual workflows as JSON scripts for regression testing
- **Enhanced Debugging**: 🔄 Screenshot capture (✅), widget state dumps (📋), execution context on failures (📋)
- **Enhanced Debugging**: 🔄 Screenshot capture (✅ IT-08a), widget state dumps (📋 IT-08c), execution context on failures (📋 IT-08b)
- **CI/CD Integration**: 📋 Standardized test suite format with JUnit XML output
See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolution-from-automation-to-platform)** for detailed architecture and implementation roadmap.
@@ -111,12 +114,13 @@ See **[E6-z3ed-cli-design.md § 9](E6-z3ed-cli-design.md#9-test-harness-evolutio
**📖 Getting Started**:
- **New to z3ed?** Start with this [README.md](README.md) then [E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)
- **Want to use z3ed?** See [QUICK_REFERENCE.md](QUICK_REFERENCE.md) for all commands
- **Resume implementation?** Read [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md)
**🔧 Implementation Guides**:
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (next priority)
- [STATUS_REPORT_OCT2.md](STATUS_REPORT_OCT2.md) - Complete progress summary
- [IT-05-IMPLEMENTATION-GUIDE.md](IT-05-IMPLEMENTATION-GUIDE.md) - Test Introspection API (complete ✅)
- [IT-08-IMPLEMENTATION-GUIDE.md](IT-08-IMPLEMENTATION-GUIDE.md) - Enhanced Error Reporting (in progress 🔄)
- [IMPLEMENTATION_CONTINUATION.md](IMPLEMENTATION_CONTINUATION.md) - Detailed continuation plan for current phase
**📚 Reference**:
- [E6-z3ed-reference.md](E6-z3ed-reference.md) - Technical reference and API docs
- [E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md) - Task backlog and roadmap
- [QUICK_REFERENCE.md](QUICK_REFERENCE.md) - Quick command reference


@@ -1,402 +0,0 @@
# Remote Control Agent Workflows
**Date**: October 2, 2025
**Status**: Functional - Test Harness + Widget Registry Integration
**Purpose**: Enable AI agents to remotely control YAZE for automated editing
## Overview
The remote control system allows AI agents to interact with YAZE through gRPC, using the ImGuiTestHarness and Widget ID Registry to perform real editing tasks.
## Quick Start
### 1. Start YAZE with Test Harness
```bash
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
```
### 2. Open Overworld Editor
In YAZE GUI:
- Click "Overworld" button
- This registers 13 toolset widgets for remote control
### 3. Run Test Script
```bash
./scripts/test_remote_control.sh
```
Expected output:
- ✓ All 8 practical workflows pass
- Agent can switch modes, open tools, control zoom
## Supported Workflows
### Mode Switching
**Draw Tile Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:DrawTile","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables tile painting on overworld map
- Agent can then click canvas to draw selected tiles
**Pan Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Pan","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables map navigation
- Agent can drag canvas to reposition view
**Entrances Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Entrances","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables entrance editing
- Agent can click to place/move entrances
**Exits Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Exits","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables exit editing
- Agent can click to place/move exits
**Sprites Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Sprites","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables sprite editing
- Agent can place/move sprites on overworld
**Items Mode**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Items","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Enables item placement
- Agent can add items to overworld
### Tool Opening
**Tile16 Editor**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Tile16Editor","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
- Opens Tile16 Editor window
- Agent can select tiles for drawing
### View Controls
**Zoom In**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:ZoomIn","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
**Zoom Out**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:ZoomOut","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
**Fullscreen Toggle**:
```bash
grpcurl -plaintext -d '{"target":"Overworld/Toolset/button:Fullscreen","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
```
## Multi-Step Workflows
### Workflow 1: Draw Custom Tiles
**Goal**: Agent draws specific tiles on the overworld map
**Steps**:
1. Switch to Draw Tile mode
2. Open Tile16 Editor
3. Select desired tile (TODO: needs canvas click support)
4. Click on overworld canvas at (x, y) to draw
**Current Status**: Steps 1-2 working, 3-4 need implementation
### Workflow 2: Reposition Entrance
**Goal**: Agent moves an entrance to a new location
**Steps**:
1. Switch to Entrances mode
2. Click on existing entrance to select
3. Drag to new location (TODO: needs drag support)
4. Verify entrance properties updated
**Current Status**: Step 1 working, 2-4 need implementation
### Workflow 3: Place Sprites
**Goal**: Agent adds sprites to overworld
**Steps**:
1. Switch to Sprites mode
2. Select sprite from palette (TODO)
3. Click canvas to place sprite
4. Adjust sprite properties if needed
**Current Status**: Step 1 working, 2-4 need implementation
## Widget Registry Integration
### Hierarchical Widget IDs
The test harness now supports hierarchical widget IDs from the registry:
```
Format: <Editor>/<Section>/<Type>:<Name>
Example: Overworld/Toolset/button:DrawTile
```
**Benefits**:
- Stable, predictable widget references
- Better error messages with suggestions
- Backwards compatible with legacy format
- Self-documenting structure
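To make the ID format concrete, here is a small parser that splits a hierarchical ID into its four parts. It is a sketch only: `WidgetPath` and `ParseWidgetPath` are hypothetical names and do not describe how `WidgetIdRegistry` stores paths internally.

```cpp
#include <string>

// The four components of <Editor>/<Section>/<Type>:<Name> (assumed struct).
struct WidgetPath {
  std::string editor;
  std::string section;
  std::string type;
  std::string name;
};

// Returns false for anything that is not a full hierarchical ID,
// including the legacy "button:DrawTile" form.
bool ParseWidgetPath(const std::string& path, WidgetPath* out) {
  const std::string::size_type first = path.find('/');
  if (first == std::string::npos) return false;
  const std::string::size_type second = path.find('/', first + 1);
  if (second == std::string::npos) return false;
  const std::string::size_type colon = path.find(':', second + 1);
  if (colon == std::string::npos) return false;

  WidgetPath parsed;
  parsed.editor = path.substr(0, first);
  parsed.section = path.substr(first + 1, second - first - 1);
  parsed.type = path.substr(second + 1, colon - second - 1);
  parsed.name = path.substr(colon + 1);
  if (parsed.editor.empty() || parsed.section.empty() ||
      parsed.type.empty() || parsed.name.empty()) {
    return false;
  }
  *out = parsed;
  return true;
}
```

Rejecting the legacy form here lets a caller decide explicitly when to fall back to string-based lookup.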
### Pattern Matching
When a widget isn't found, the system suggests alternatives:
```bash
# Typo in widget name
grpcurl ... -d '{"target":"Overworld/Toolset/button:DrawTyle"}'
# Response:
# "Widget not found: DrawTyle. Did you mean:
# Overworld/Toolset/button:DrawTile?"
```
### Widget Discovery
Future enhancement - list all available widgets:
```bash
z3ed agent discover --pattern "Overworld/*"
# Lists all Overworld widgets
z3ed agent discover --pattern "*/button:*"
# Lists all buttons across editors
```
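The patterns above suggest simple `*` wildcard matching. Since the discovery command is not implemented yet, the matcher below is an assumption about the intended semantics (`*` matches any run of characters, including `/`); `GlobMatch` is a hypothetical helper, not the registry's `FindWidgets`.

```cpp
#include <string>

// Wildcard match where '*' matches any sequence of characters (possibly empty).
bool GlobMatch(const std::string& pattern, const std::string& text) {
  std::string::size_type p = 0, t = 0;
  std::string::size_type star = std::string::npos;  // position of last '*'
  std::string::size_type mark = 0;  // text position where that '*' began

  while (t < text.size()) {
    if (p < pattern.size() && pattern[p] == '*') {
      star = p++;  // remember the star and first try matching it to nothing
      mark = t;
    } else if (p < pattern.size() && pattern[p] == text[t]) {
      ++p;
      ++t;
    } else if (star != std::string::npos) {
      p = star + 1;  // backtrack: let the last star absorb one more character
      t = ++mark;
    } else {
      return false;
    }
  }
  // Only trailing stars may remain in the pattern.
  while (p < pattern.size() && pattern[p] == '*') ++p;
  return p == pattern.size();
}
```

Filtering the 13 registered toolset paths with `GlobMatch("*/button:*", path)` would select all of them under these semantics.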
## Implementation Details
### Test Harness Changes
**File**: `src/app/core/service/imgui_test_harness_service.cc`
**Changes**:
1. Added widget registry include
2. Click RPC tries hierarchical lookup first
3. Fallback to legacy string-based lookup
4. Pattern matching for suggestions
**Code**:
```cpp
// Try hierarchical widget ID lookup first
auto& registry = gui::WidgetIdRegistry::Instance();
ImGuiID widget_id = registry.GetWidgetId(target);
if (widget_id != 0) {
// Found in registry - use ImGui ID directly
ctx->ItemClick(widget_id, mouse_button);
} else {
// Fallback to legacy lookup
ctx->ItemClick(widget_label.c_str(), mouse_button);
}
```
### Widget Registration
**File**: `src/app/editor/overworld/overworld_editor.cc`
**Registered Widgets** (13 total):
- Overworld/Toolset/button:Pan
- Overworld/Toolset/button:DrawTile
- Overworld/Toolset/button:Entrances
- Overworld/Toolset/button:Exits
- Overworld/Toolset/button:Items
- Overworld/Toolset/button:Sprites
- Overworld/Toolset/button:Transports
- Overworld/Toolset/button:Music
- Overworld/Toolset/button:ZoomIn
- Overworld/Toolset/button:ZoomOut
- Overworld/Toolset/button:Fullscreen
- Overworld/Toolset/button:Tile16Editor
- Overworld/Toolset/button:CopyMap
## Next Steps
### Priority 1: Canvas Interaction (2-3 hours)
**Goal**: Enable agent to click on canvas at specific coordinates
**Implementation**:
1. Add canvas click to Click RPC
2. Support coordinate-based clicking: `{"target":"canvas:Overworld","x":100,"y":200}`
3. Test drawing tiles programmatically
**Use Cases**:
- Draw tiles at specific locations
- Select entities by clicking
- Navigate by clicking minimap
### Priority 2: Tile Selection (1-2 hours)
**Goal**: Enable agent to select tiles from Tile16 Editor
**Implementation**:
1. Register Tile16 Editor canvas widgets
2. Support tile palette clicking
3. Track selected tile state
**Use Cases**:
- Select tile before drawing
- Change tile selection mid-workflow
- Verify correct tile selected
### Priority 3: Entity Manipulation (2-3 hours)
**Goal**: Enable dragging of entrances, exits, sprites
**Implementation**:
1. Add Drag RPC to proto
2. Implement drag operation in test harness
3. Support drag start + end coordinates
**Use Cases**:
- Move entrances to new positions
- Reposition sprites
- Adjust exit locations
### Priority 4: Workflow Chaining (1-2 hours)
**Goal**: Combine multiple operations into workflows
**Implementation**:
1. Create workflow definition format
2. Execute sequence of RPCs
3. Handle errors gracefully
**Example Workflow**:
```yaml
workflow: draw_custom_tile
steps:
- click: Overworld/Toolset/button:DrawTile
- click: Overworld/Toolset/button:Tile16Editor
- wait: window_visible:Tile16 Editor
- click: canvas:Tile16Editor
x: 64
y: 64
- click: canvas:Overworld
x: 512
y: 384
```
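Executing such a definition amounts to running the steps in order and stopping at the first failure. The sketch below assumes each step has already been turned into a callable (for instance, one that issues a Click RPC); `Step`, `WorkflowResult`, and `RunWorkflow` are hypothetical names, and YAML loading is left out.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One workflow step: a label plus an action that reports success.
struct Step {
  std::string name;
  std::function<bool()> run;
};

struct WorkflowResult {
  bool ok;
  std::size_t steps_run;
  std::string failed_step;  // empty when every step succeeded
};

// Runs steps in order and stops at the first failure so later steps
// never act on an unexpected GUI state.
WorkflowResult RunWorkflow(const std::vector<Step>& steps) {
  WorkflowResult result{true, 0, ""};
  for (const Step& step : steps) {
    ++result.steps_run;
    if (!step.run()) {
      result.ok = false;
      result.failed_step = step.name;
      break;
    }
  }
  return result;
}
```

Reporting the failed step by name gives an agent enough context to retry or abandon the workflow.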
## Testing Strategy
### Manual Testing
1. Start test harness
2. Run test script: `./scripts/test_remote_control.sh`
3. Observe mode changes in GUI
4. Verify no crashes or errors
### Automated Testing
1. Add to CI pipeline
2. Run as part of E2E validation
3. Test on multiple platforms
### Integration Testing
1. Test with real agent workflows
2. Validate agent can complete tasks
3. Measure reliability and timing
## Performance Characteristics
**Click Latency**: < 200ms
- gRPC overhead: ~10ms
- Test queue time: ~50ms
- ImGui event processing: ~100ms
- Total: ~160ms average
**Mode Switch Time**: < 500ms
- Includes UI update
- State transition
- Visual feedback
**Tool Opening**: < 1s
- Window creation
- Content loading
- Layout calculation
## Troubleshooting
### Widget Not Found
**Problem**: "Widget not found: Overworld/Toolset/button:DrawTile"
**Solutions**:
1. Verify Overworld editor is open (widgets registered on open)
2. Check widget name spelling
3. Look at suggestions in error message
4. Try legacy format: "button:DrawTile"
### Click Not Working
**Problem**: Click succeeds but nothing happens
**Solutions**:
1. Check if widget is enabled (not grayed out)
2. Verify correct mode/context for action
3. Add delay between clicks
4. Check ImGui event queue
### Test Timeout
**Problem**: "Test timeout - widget not found or unresponsive"
**Solutions**:
1. Increase timeout (default 5s)
2. Check if GUI is responsive
3. Verify widget is visible (not hidden)
4. Look for modal dialogs blocking interaction
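On the client side, a bounded polling loop is one way to wait for a widget to become available instead of failing on the first attempt. This is a generic sketch using only the standard library; `WaitUntil` is a hypothetical helper, and the condition callback would wrap whatever check the client actually performs.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` until it returns true or `timeout` elapses.
// The condition is always checked at least once.
bool WaitUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) return true;
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::sleep_for(poll_interval);
  }
}
```

Using `steady_clock` keeps the deadline immune to wall-clock adjustments during the wait.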
## References
**Documentation**:
- [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md)
- [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md)
- [E2E_VALIDATION_GUIDE.md](E2E_VALIDATION_GUIDE.md)
**Code Files**:
- `src/app/core/service/imgui_test_harness_service.cc` - Test harness implementation
- `src/app/gui/widget_id_registry.{h,cc}` - Widget registry
- `src/app/editor/overworld/overworld_editor.cc` - Widget registrations
- `scripts/test_remote_control.sh` - Test script
---
**Last Updated**: October 2, 2025, 11:45 PM
**Status**: Functional - Basic mode switching works
**Next**: Canvas interaction + tile selection


@@ -1,357 +0,0 @@
# Widget ID Refactoring - Next Actions
**Date**: October 2, 2025
**Status**: Phase 1 Complete - Testing & Integration Phase
**Previous Session**: [SESSION_SUMMARY_OCT2_NIGHT.md](SESSION_SUMMARY_OCT2_NIGHT.md)
## Quick Start - Next Session
### Option 1: Manual Testing (15 minutes) 🎯 RECOMMENDED FIRST
**Goal**: Verify widgets register correctly in running GUI
```bash
# 1. Launch YAZE
./build/bin/yaze.app/Contents/MacOS/yaze
# 2. Open a ROM
# File → Open ROM → assets/zelda3.sfc
# 3. Open Overworld Editor
# Click "Overworld" button in main window
# 4. Test toolset buttons
# Click through: Pan, DrawTile, Entrances, etc.
# Expected: All work normally, no crashes
# 5. Check console output
# Look for any errors or warnings
# Widget registrations happen silently
```
**Success Criteria**:
- ✅ GUI launches without crashes
- ✅ Overworld editor opens normally
- ✅ All toolset buttons clickable
- ✅ No error messages in console
---
### Option 2: Add Widget Discovery Command (30 minutes)
**Goal**: Create CLI command to list registered widgets
**File to Edit**: `src/cli/handlers/agent.cc`
**Add New Command**: `z3ed agent discover`
```cpp
// Add to agent.cc:
absl::Status HandleDiscoverCommand(const std::vector<std::string>& args) {
// Parse --pattern flag (default "*")
std::string pattern = "*";
for (size_t i = 0; i < args.size(); ++i) {
if (args[i] == "--pattern" && i + 1 < args.size()) {
pattern = args[++i];
}
}
// Get widget registry
auto& registry = gui::WidgetIdRegistry::Instance();
auto matches = registry.FindWidgets(pattern);
if (matches.empty()) {
std::cout << "No widgets found matching pattern: " << pattern << "\n";
return absl::NotFoundError("No widgets found");
}
std::cout << "=== Registered Widgets ===\n\n";
std::cout << "Pattern: " << pattern << "\n";
std::cout << "Count: " << matches.size() << "\n\n";
for (const auto& path : matches) {
const auto* info = registry.GetWidgetInfo(path);
if (info) {
std::cout << path << "\n";
std::cout << " Type: " << info->type << "\n";
std::cout << " ImGui ID: " << info->imgui_id << "\n";
if (!info->description.empty()) {
std::cout << " Description: " << info->description << "\n";
}
std::cout << "\n";
}
}
return absl::OkStatus();
}
// Add routing in HandleAgentCommand:
if (subcommand == "discover") {
return HandleDiscoverCommand(args);
}
```
**Test**:
```bash
# Rebuild
cmake --build build --target z3ed -j8
# Test discovery (will fail - widgets registered at runtime)
./build/bin/z3ed agent discover
# Note: the widget registry lives in the memory of the running YAZE process,
# so a separate z3ed process sees an empty registry.
# We'll need a different approach - see Option 3
```
---
### Option 3: Widget Export at Shutdown (30 minutes) 🎯 BETTER APPROACH
**Goal**: Export widget catalog when YAZE exits
**File to Edit**: `src/app/editor/editor_manager.cc`
**Add Destructor or Shutdown Method**:
```cpp
// In editor_manager.cc destructor or Shutdown():
void EditorManager::Shutdown() {
// Export widget catalog for z3ed agent
auto& registry = gui::WidgetIdRegistry::Instance();
std::string catalog_path = "/tmp/yaze_widgets.yaml";
try {
registry.ExportCatalogToFile(catalog_path, "yaml");
std::cout << "Widget catalog exported to: " << catalog_path << "\n";
} catch (const std::exception& e) {
std::cerr << "Failed to export widget catalog: " << e.what() << "\n";
}
}
```
**Test**:
```bash
# 1. Rebuild
cmake --build build --target yaze -j8
# 2. Launch YAZE
./build/bin/yaze.app/Contents/MacOS/yaze
# 3. Open Overworld editor
# (registers widgets)
# 4. Quit YAZE
# File → Quit or Cmd+Q
# 5. Check exported catalog
cat /tmp/yaze_widgets.yaml
# Expected output:
# widgets:
# - path: "Overworld/Toolset/button:Pan"
# type: button
# imgui_id: 12345
# context:
# editor: Overworld
# tab: Toolset
# ...
```
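The expected catalog layout is simple enough to emit without a YAML library. The sketch below shows one way to produce the `path`, `type`, and `imgui_id` fields; `WidgetEntry` and `ToYaml` are hypothetical names, the `context` block is omitted, and this is not the actual `ExportCatalogToFile` implementation.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Minimal catalog entry (assumed fields, mirroring the expected output).
struct WidgetEntry {
  std::string path;
  std::string type;
  std::uint32_t imgui_id;
};

// Serializes entries as a YAML list. Assumes paths contain no double
// quotes or backslashes, so no escaping is performed.
std::string ToYaml(const std::vector<WidgetEntry>& widgets) {
  std::ostringstream out;
  out << "widgets:\n";
  for (const WidgetEntry& widget : widgets) {
    out << "  - path: \"" << widget.path << "\"\n"
        << "    type: " << widget.type << "\n"
        << "    imgui_id: " << widget.imgui_id << "\n";
  }
  return out.str();
}
```

Quoting the path matters because hierarchical IDs contain `:`, which a YAML parser could otherwise read as a key separator.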
---
### Option 4: Test Harness Integration (1-2 hours)
**Goal**: Enable test harness to click widgets by hierarchical ID
**Files to Edit**:
1. `src/app/core/service/imgui_test_harness_service.cc`
2. `src/app/core/proto/imgui_test_harness.proto` (optional - add DiscoverWidgets RPC)
**Implementation**:
```cpp
// In imgui_test_harness_service.cc, update Click RPC:
absl::Status ImGuiTestHarnessServiceImpl::Click(
const ClickRequest* request, ClickResponse* response) {
const std::string& target = request->target();
// Try hierarchical widget ID first
auto& registry = gui::WidgetIdRegistry::Instance();
ImGuiID widget_id = registry.GetWidgetId(target);
if (widget_id != 0) {
// Found in registry - use ImGui ID directly
std::string test_name = absl::StrFormat("DynamicClick_%s", target);
auto* dynamic_test = ImGuiTest_CreateDynamicTest(
test_manager_->GetEngine(), test_category_.c_str(), test_name.c_str());
dynamic_test->GuiFunc = [widget_id](ImGuiTestContext* ctx) {
ctx->ItemClick(widget_id);
};
ImGuiTest_RunTest(test_manager_->GetEngine(), dynamic_test);
response->set_success(true);
response->set_message(absl::StrFormat("Clicked widget: %s", target));
return absl::OkStatus();
}
// Fallback to legacy string-based lookup
// ... existing code ...
// If not found, suggest alternatives
auto matches = registry.FindWidgets("*" + target + "*");
if (!matches.empty()) {
std::string suggestions = absl::StrJoin(matches, ", ");
return absl::NotFoundError(
absl::StrFormat("Widget not found: %s. Did you mean: %s?",
target, suggestions));
}
return absl::NotFoundError(
absl::StrFormat("Widget not found: %s", target));
}
```
**Test**:
```bash
# 1. Rebuild with gRPC
cmake --build build-grpc-test --target yaze -j8
# 2. Start test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc &
# 3. Open Overworld editor in GUI
# (registers widgets)
# 4. Test hierarchical click
grpcurl -plaintext \
-import-path src/app/core/proto \
-proto imgui_test_harness.proto \
-d '{"target":"Overworld/Toolset/button:DrawTile","type":"LEFT"}' \
127.0.0.1:50052 yaze.test.ImGuiTestHarness/Click
# Expected: Click succeeds, DrawTile mode activated
```
---
## Recommended Sequence
### Tonight (30 minutes)
1. **Option 1**: Manual testing - verify no crashes
2. 📋 **Option 3**: Add widget export at shutdown
3. 📋 Inspect exported YAML, verify 13 toolset widgets
### Tomorrow Morning (1-2 hours)
1. 📋 **Option 4**: Test harness integration
2. 📋 Test clicking widgets via hierarchical IDs
3. 📋 Update E2E test script with new IDs
### Tomorrow Afternoon (2-3 hours)
1. 📋 Complete Overworld editor (canvas, properties)
2. 📋 Add DiscoverWidgets RPC to proto
3. 📋 Document patterns and best practices
---
## Files to Modify Next
### High Priority
1. `src/app/editor/editor_manager.cc` - Add widget export at shutdown
2. `src/app/core/service/imgui_test_harness_service.cc` - Registry lookup in Click RPC
### Medium Priority
3. `src/app/core/proto/imgui_test_harness.proto` - Add DiscoverWidgets RPC
4. `src/app/editor/overworld/overworld_editor.cc` - Add canvas/properties widgets
### Low Priority
5. `scripts/test_harness_e2e.sh` - Update with hierarchical IDs
6. `docs/z3ed/IT-01-QUICKSTART.md` - Add widget ID examples
---
## Success Criteria
### Phase 1 (Complete) ✅
- [x] Widget registry in build
- [x] 13 toolset widgets registered
- [x] Clean build
- [x] Documentation updated
### Phase 2 (Current) 🔄
- [ ] Manual testing passes
- [ ] Widget export works
- [ ] Test harness can click by hierarchical ID
- [ ] At least 1 E2E test updated
### Phase 3 (Next) 📋
- [ ] Complete Overworld editor (30+ widgets)
- [ ] DiscoverWidgets RPC working
- [ ] All E2E tests use hierarchical IDs
- [ ] Performance validated (< 1ms overhead)
---
## Quick Commands
### Build
```bash
# Regular build
cmake --build build --target yaze -j8
# Test harness build
cmake --build build-grpc-test --target yaze -j8
# CLI build
cmake --build build --target z3ed -j8
```
### Test
```bash
# Manual test
./build/bin/yaze.app/Contents/MacOS/yaze
# Test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
--enable_test_harness \
--test_harness_port=50052 \
--rom_file=assets/zelda3.sfc
```
### Cleanup
```bash
# Kill running YAZE instances
killall yaze
# Clean build (rebuilds all targets without deleting CMake's generated files)
cmake --build build --clean-first -j8
```
---
## References
**Progress Docs**:
- [WIDGET_ID_REFACTORING_PROGRESS.md](WIDGET_ID_REFACTORING_PROGRESS.md) - Detailed tracker
- [SESSION_SUMMARY_OCT2_NIGHT.md](SESSION_SUMMARY_OCT2_NIGHT.md) - Tonight's work
**Design Docs**:
- [IMGUI_ID_MANAGEMENT_REFACTORING.md](IMGUI_ID_MANAGEMENT_REFACTORING.md) - Complete plan
- [IT-01-QUICKSTART.md](IT-01-QUICKSTART.md) - Test harness guide
**Code References**:
- `src/app/gui/widget_id_registry.{h,cc}` - Registry implementation
- `src/app/editor/overworld/overworld_editor.cc` - Usage example
- `src/app/core/service/imgui_test_harness_service.cc` - Test harness
---
**Last Updated**: October 2, 2025, 11:30 PM
**Next Action**: Option 1 (Manual Testing) or Option 3 (Widget Export)
**Time Estimate**: 15-30 minutes