docs: Update implementation plan and README for clarity and recent accomplishments
627
docs/z3ed/AW-04-POLICY-FRAMEWORK.md
Normal file
@@ -0,0 +1,627 @@
# Policy Evaluation Framework (AW-04)

**Status**: Implementation In Progress
**Priority**: High (Next Phase)
**Time Estimate**: 6-8 hours
**Last Updated**: October 2, 2025

## Overview

The Policy Evaluation Framework provides a YAML-based constraint system for gating proposal acceptance in the z3ed agent workflow. It ensures that AI-generated ROM modifications meet quality, safety, and testing requirements before being merged into the main ROM.

## Goals

1. **Quality Gates**: Enforce minimum test pass rates and code quality standards
2. **Safety Constraints**: Prevent modifications to critical ROM regions (headers, checksums)
3. **Scope Limits**: Restrict changes to reasonable byte counts and specific banks
4. **Human Review**: Require manual review for large or complex changes
5. **Flexibility**: Allow policy overrides with confirmation and logging

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ ProposalDrawer (GUI)                                    │
│   └─ Accept button gated by PolicyEvaluator             │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ PolicyEvaluator (Singleton Service)                     │
│   ├─ LoadPolicies() from .yaze/policies/                │
│   ├─ EvaluateProposal(proposal_id) → PolicyResult       │
│   └─ Cache of parsed YAML policies                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ .yaze/policies/agent.yaml (YAML Configuration)          │
│   ├─ test_requirements (min pass rates)                 │
│   ├─ change_constraints (byte limits, allowed banks)    │
│   ├─ review_requirements (human review triggers)        │
│   └─ forbidden_ranges (protected ROM regions)           │
└─────────────────────────────────────────────────────────┘
```

## YAML Policy Schema

### Example Policy File

```yaml
# .yaze/policies/agent.yaml
version: "1.0"
enabled: true

policies:
  # Policy 1: Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
      - test_suite: "dungeon_logic"
        min_pass_rate: 0.90
    message: "All required test suites must pass before accepting proposal"

  # Policy 2: Change Scope Limits
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E, 0x0F]  # Graphics banks only
      - max_commands_executed: 20
    message: "Proposal exceeds allowed change scope"

  # Policy 3: Protected ROM Regions
  - name: protect_critical_regions
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0  # ROM header
        end: 0xFFFF
        reason: "ROM header is protected"
      - start: 0x00FFC0  # Internal header
        end: 0x00FFDF
        reason: "Internal ROM header"
    message: "Proposal modifies protected ROM region"

  # Policy 4: Human Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    conditions:
      - if: bytes_changed > 1024
        then: require_diff_review
        message: "Large change requires diff review"
      - if: commands_executed > 10
        then: require_log_review
        message: "Complex operation requires log review"
      - if: test_failures > 0
        then: require_explanation
        message: "Test failures require explanation"

  # Policy 5: Palette Modifications
  - name: palette_safety
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_palettes_changed: 5
      - preserve_transparency: true  # Don't modify color index 0
    message: "Palette changes exceed safety threshold"
```
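A `forbidden_range` policy such as Policy 3 reduces to an interval-overlap test. The following is a minimal sketch, assuming that `start` and `end` are both inclusive (the plan does not say) and that each write made by a proposal is available as an offset and a length. `ByteRange`, `WriteSpan`, and `FirstForbiddenHit` are hypothetical names used only for illustration; they are not part of the yaze codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of one entry under `ranges:`; bounds assumed inclusive.
struct ByteRange {
  uint32_t start;
  uint32_t end;
  std::string reason;
};

// Hypothetical description of one contiguous write made by a proposal.
struct WriteSpan {
  uint32_t offset;
  uint32_t length;  // number of bytes written, must be > 0
};

// Returns the first forbidden range touched by `write`, or nullopt if none.
std::optional<ByteRange> FirstForbiddenHit(
    const std::vector<ByteRange>& forbidden, const WriteSpan& write) {
  assert(write.length > 0);
  // Last byte touched by the write; widened to avoid 32-bit overflow.
  const uint64_t last = static_cast<uint64_t>(write.offset) + write.length - 1;
  for (const ByteRange& range : forbidden) {
    // Two inclusive intervals overlap when each starts before the other ends.
    if (write.offset <= range.end && last >= range.start) {
      return range;
    }
  }
  return std::nullopt;
}
```

Under these assumptions, a one-byte write at `0xFFAF` passes the check, while a two-byte write at the same offset reaches `0xFFB0` and is reported with the `reason` of the matching range.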

### Schema Definition

```yaml
# Policy file structure
version: string          # Semantic version (e.g., "1.0")
enabled: boolean         # Master enable/disable

policies:
  - name: string         # Unique policy identifier
    type: enum           # test_requirement | change_constraint | forbidden_range | review_requirement
    enabled: boolean     # Policy-specific enable/disable
    severity: enum       # critical | warning | info

    # Type-specific fields:
    rules: array         # For test_requirement, change_constraint
    ranges: array        # For forbidden_range
    conditions: array    # For review_requirement

    message: string      # User-facing error message
```
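The `severity` enum in the schema has to be mapped from its YAML string form to a C++ value at load time. A minimal sketch of that mapping follows, assuming unknown strings should be rejected rather than silently defaulted (the plan does not specify this). `ParseSeverity` is a hypothetical helper name, and the enum is repeated here only so the snippet stands on its own.

```cpp
#include <optional>
#include <string_view>

// Same three levels as the `severity` field in the schema.
enum class PolicySeverity { kInfo, kWarning, kCritical };

// Maps the YAML `severity` string to the enum. Returns nullopt for unknown
// values so the caller can reject the policy file instead of guessing.
constexpr std::optional<PolicySeverity> ParseSeverity(std::string_view text) {
  if (text == "info") return PolicySeverity::kInfo;
  if (text == "warning") return PolicySeverity::kWarning;
  if (text == "critical") return PolicySeverity::kCritical;
  return std::nullopt;
}
```

Returning `std::optional` keeps the decision about malformed input with the caller, which fits the "graceful handling of malformed YAML" goal better than falling back to a default severity.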

## Implementation Plan

### Phase 1: Core Infrastructure (2 hours)

#### 1.1 Create PolicyEvaluator Service

**File**: `src/cli/service/policy_evaluator.h`

```cpp
#ifndef YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#define YAZE_CLI_SERVICE_POLICY_EVALUATOR_H

#include <string>
#include <vector>
#include <memory>
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "absl/strings/string_view.h"

namespace yaze {
namespace cli {

// Policy violation severity levels
enum class PolicySeverity {
  kInfo,      // Informational, doesn't block acceptance
  kWarning,   // Warning, can be overridden
  kCritical   // Critical, blocks acceptance
};

// Individual policy violation
struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
  std::string details;  // Additional context
};

// Result of policy evaluation
struct PolicyResult {
  bool passed;  // True if all critical policies passed
  std::vector<PolicyViolation> violations;

  // Categorized violations
  std::vector<PolicyViolation> critical_violations;
  std::vector<PolicyViolation> warnings;
  std::vector<PolicyViolation> info;

  // Helper methods
  bool has_critical_violations() const { return !critical_violations.empty(); }
  bool can_accept_with_override() const {
    return !has_critical_violations() && !warnings.empty();
  }
};

// Singleton service for evaluating proposals against policies
class PolicyEvaluator {
 public:
  static PolicyEvaluator& GetInstance();

  // Load policies from disk (.yaze/policies/agent.yaml)
  absl::Status LoadPolicies(absl::string_view policy_dir = ".yaze/policies");

  // Evaluate a proposal against all loaded policies
  absl::StatusOr<PolicyResult> EvaluateProposal(
      absl::string_view proposal_id);

  // Reload policies from disk (for live editing)
  absl::Status ReloadPolicies();

  // Check if policies are loaded and enabled
  bool IsEnabled() const { return enabled_; }

  // Get policy configuration path
  std::string GetPolicyPath() const { return policy_path_; }

 private:
  PolicyEvaluator() = default;
  ~PolicyEvaluator() = default;

  // Non-copyable, non-movable
  PolicyEvaluator(const PolicyEvaluator&) = delete;
  PolicyEvaluator& operator=(const PolicyEvaluator&) = delete;

  // Parse YAML policy file
  absl::Status ParsePolicyFile(absl::string_view yaml_content);

  // Evaluate individual policy types
  void EvaluateTestRequirements(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateChangeConstraints(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateForbiddenRanges(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateReviewRequirements(
      absl::string_view proposal_id, PolicyResult* result);

  bool enabled_ = false;
  std::string policy_path_;

  // Parsed policy structures (implementation detail)
  struct PolicyConfig;
  std::unique_ptr<PolicyConfig> config_;
};

}  // namespace cli
}  // namespace yaze

#endif  // YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
```
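`PolicyResult` keeps both a flat `violations` list and three per-severity lists, so every evaluation method has to file each violation in two places and keep `passed` consistent. One way to centralize that is a small helper, sketched below. `RecordViolation` is a hypothetical name, and the structs are trimmed copies of the ones in the header so that the snippet compiles on its own; the `passed = true` default is also an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class PolicySeverity { kInfo, kWarning, kCritical };

struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
};

// Trimmed-down copy of PolicyResult from the header.
struct PolicyResult {
  bool passed = true;
  std::vector<PolicyViolation> violations;
  std::vector<PolicyViolation> critical_violations;
  std::vector<PolicyViolation> warnings;
  std::vector<PolicyViolation> info;
};

// Hypothetical helper: files a violation under both the flat list and its
// severity bucket, and keeps `passed` in sync with the critical bucket.
void RecordViolation(PolicyResult* result, PolicyViolation violation) {
  assert(result != nullptr);
  result->violations.push_back(violation);
  switch (violation.severity) {
    case PolicySeverity::kCritical:
      result->critical_violations.push_back(std::move(violation));
      break;
    case PolicySeverity::kWarning:
      result->warnings.push_back(std::move(violation));
      break;
    case PolicySeverity::kInfo:
      result->info.push_back(std::move(violation));
      break;
  }
  // Matches the header comment: passed means no critical policy failed.
  result->passed = result->critical_violations.empty();
}
```

With a helper like this, the four `Evaluate...` methods only need to construct a `PolicyViolation` and hand it over, and the invariant between `passed` and `critical_violations` lives in one place.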

#### 1.2 Create Policy Configuration Structures

**File**: `src/cli/service/policy_evaluator.cc` (partial)

```cpp
#include "src/cli/service/policy_evaluator.h"

#include <fstream>
#include <sstream>
#include <tuple>    // std::tuple, used by ForbiddenRange
#include <utility>  // std::pair, used by TestRequirement
#include "absl/strings/str_format.h"
#include "src/cli/service/proposal_registry.h"

// If YAML parsing is available
#ifdef YAZE_WITH_YAML
#include <yaml-cpp/yaml.h>
#endif

namespace yaze {
namespace cli {

// Internal policy configuration structures
struct PolicyEvaluator::PolicyConfig {
  std::string version;
  bool enabled;

  struct TestRequirement {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::pair<std::string, double>> test_suites;  // suite name → min pass rate
    std::string message;
  };

  struct ChangeConstraint {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    int max_bytes_changed = -1;
    std::vector<int> allowed_banks;
    int max_commands_executed = -1;
    int max_palettes_changed = -1;
    bool preserve_transparency = false;
    std::string message;
  };

  struct ForbiddenRange {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::tuple<int, int, std::string>> ranges;  // start, end, reason
    std::string message;
  };

  struct ReviewRequirement {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::string> conditions;
    std::string message;
  };

  std::vector<TestRequirement> test_requirements;
  std::vector<ChangeConstraint> change_constraints;
  std::vector<ForbiddenRange> forbidden_ranges;
  std::vector<ReviewRequirement> review_requirements;
};

// Singleton instance
PolicyEvaluator& PolicyEvaluator::GetInstance() {
  static PolicyEvaluator instance;
  return instance;
}

absl::Status PolicyEvaluator::LoadPolicies(absl::string_view policy_dir) {
  policy_path_ = absl::StrFormat("%s/agent.yaml", policy_dir);

  // Check if file exists
  std::ifstream file(policy_path_);
  if (!file.good()) {
    // No policy file - policies disabled
    enabled_ = false;
    return absl::OkStatus();
  }

  // Read file content
  std::stringstream buffer;
  buffer << file.rdbuf();
  std::string yaml_content = buffer.str();

  return ParsePolicyFile(yaml_content);
}

absl::Status PolicyEvaluator::ParsePolicyFile(absl::string_view yaml_content) {
#ifndef YAZE_WITH_YAML
  return absl::UnimplementedError(
      "YAML support not compiled. Build with YAZE_WITH_YAML=ON");
#else
  try {
    YAML::Node root = YAML::Load(std::string(yaml_content));

    config_ = std::make_unique<PolicyConfig>();
    config_->version = root["version"].as<std::string>("1.0");
    config_->enabled = root["enabled"].as<bool>(true);

    if (!config_->enabled) {
      enabled_ = false;
      return absl::OkStatus();
    }

    // Parse policies array
    if (root["policies"]) {
      for (const auto& policy_node : root["policies"]) {
        std::string type = policy_node["type"].as<std::string>();

        if (type == "test_requirement") {
          // Parse test requirement policy
          // ... (implementation continues)
        } else if (type == "change_constraint") {
          // Parse change constraint policy
          // ... (implementation continues)
        } else if (type == "forbidden_range") {
          // Parse forbidden range policy
          // ... (implementation continues)
        } else if (type == "review_requirement") {
          // Parse review requirement policy
          // ... (implementation continues)
        }
      }
    }

    enabled_ = true;
    return absl::OkStatus();

  } catch (const YAML::Exception& e) {
    return absl::InvalidArgumentError(
        absl::StrFormat("Failed to parse policy YAML: %s", e.what()));
  }
#endif
}

// ... (implementation continues with evaluation methods)

}  // namespace cli
}  // namespace yaze
```
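`ReviewRequirement` stores its `conditions` as plain strings such as `bytes_changed > 1024`, so something has to turn those strings into a comparison at evaluation time. The sketch below shows one possible approach, assuming only the whitespace-separated `<metric> > <integer>` form seen in the example policy file. `Condition`, `ParseCondition`, and `IsTriggered` are hypothetical names, and treating an unknown metric as "not triggered" is an assumption of this sketch rather than something the plan specifies.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical parsed form of a condition such as "bytes_changed > 1024".
struct Condition {
  std::string metric;
  long threshold = 0;
};

// Parses "<metric> > <integer>" (tokens separated by whitespace).
// Returns nullopt for any other operator, a missing token, or extra tokens.
std::optional<Condition> ParseCondition(const std::string& text) {
  std::istringstream in(text);
  Condition cond;
  std::string op;
  if (!(in >> cond.metric >> op >> cond.threshold) || op != ">") {
    return std::nullopt;
  }
  std::string trailing;
  if (in >> trailing) return std::nullopt;  // reject extra tokens
  return cond;
}

// True when the proposal's value for the metric exceeds the threshold.
// Metrics missing from the map are treated as not triggered (an assumption).
bool IsTriggered(const Condition& cond,
                 const std::map<std::string, long>& metrics) {
  assert(!cond.metric.empty());  // a parsed condition always names a metric
  const auto it = metrics.find(cond.metric);
  return it != metrics.end() && it->second > cond.threshold;
}
```

Parsing once at load time and storing `Condition` values instead of raw strings would let malformed conditions be reported together with other YAML errors, rather than surfacing later during evaluation.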

### Phase 2: Policy Evaluation Logic (2-3 hours)

Implement the core evaluation methods that check proposals against each policy type.
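As one illustration of such a method, the sketch below checks suite pass rates against the `min_pass_rate` values of a `test_requirement` policy. It assumes test results arrive as per-suite passed/total counts; `SuiteResult` and `FailingSuites` are hypothetical names, and counting a missing or empty suite as failing is an assumption, since the plan does not cover that case.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary of one suite's run, as the evaluator might receive it.
struct SuiteResult {
  std::string suite;
  int passed;
  int total;
};

// Returns the names of required suites whose pass rate is below the minimum.
// `required` uses the same (suite name, min pass rate) pairs as
// TestRequirement::test_suites. A required suite that is missing from
// `results` or ran zero tests counts as failing.
std::vector<std::string> FailingSuites(
    const std::vector<std::pair<std::string, double>>& required,
    const std::vector<SuiteResult>& results) {
  std::vector<std::string> failing;
  for (const auto& [name, min_rate] : required) {
    double rate = 0.0;
    for (const SuiteResult& r : results) {
      if (r.suite == name && r.total > 0) {
        assert(r.passed >= 0 && r.passed <= r.total);
        rate = static_cast<double>(r.passed) / r.total;
        break;
      }
    }
    if (rate < min_rate) failing.push_back(name);
  }
  return failing;
}
```

Each returned name would then become one `PolicyViolation` carrying the policy's `severity` and `message`.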

### Phase 3: GUI Integration (2 hours)

#### 3.1 Update ProposalDrawer

**File**: `src/app/editor/system/proposal_drawer.cc`

Add policy status display and gating logic:

```cpp
#include <string>

#include "src/cli/service/policy_evaluator.h"

void ProposalDrawer::DrawProposalDetail(const std::string& proposal_id) {
  // ... existing detail view code ...

  // === Policy Status Section ===
  ImGui::Separator();
  ImGui::TextUnformatted("Policy Status:");

  auto& policy_eval = cli::PolicyEvaluator::GetInstance();
  if (policy_eval.IsEnabled()) {
    auto policy_result = policy_eval.EvaluateProposal(proposal_id);

    if (policy_result.ok()) {
      const auto& result = policy_result.value();

      if (result.passed) {
        ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
      } else {
        // Show violations
        if (result.has_critical_violations()) {
          ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations:");
          for (const auto& violation : result.critical_violations) {
            ImGui::BulletText("%s: %s",
                              violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }

        if (!result.warnings.empty()) {
          ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠️ Warnings:");
          for (const auto& violation : result.warnings) {
            ImGui::BulletText("%s: %s",
                              violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }
      }

      // Gate Accept button
      ImGui::Separator();
      bool can_accept = !result.has_critical_violations();

      if (!can_accept) {
        ImGui::BeginDisabled();
      }

      if (ImGui::Button("Accept Proposal")) {
        if (result.can_accept_with_override() && !override_confirmed_) {
          // Show override confirmation dialog
          ImGui::OpenPopup("Override Policy");
        } else {
          AcceptProposal(proposal_id);
        }
      }

      if (!can_accept) {
        ImGui::EndDisabled();
        ImGui::SameLine();
        ImGui::TextColored(ImVec4(1, 0, 0, 1),
                           "(Accept blocked by policy violations)");
      }

      // Override confirmation dialog
      if (ImGui::BeginPopupModal("Override Policy", nullptr,
                                 ImGuiWindowFlags_AlwaysAutoResize)) {
        ImGui::Text("This proposal has policy warnings.");
        ImGui::Text("Do you want to override and accept anyway?");
        ImGui::Text("This action will be logged.");
        ImGui::Separator();

        if (ImGui::Button("Override and Accept")) {
          override_confirmed_ = true;
          AcceptProposal(proposal_id);
          ImGui::CloseCurrentPopup();
        }
        ImGui::SameLine();
        if (ImGui::Button("Cancel")) {
          ImGui::CloseCurrentPopup();
        }
        ImGui::EndPopup();
      }
    } else {
      // message() returns an absl::string_view, which is not guaranteed to be
      // null-terminated; copy it into a std::string before using "%s".
      ImGui::TextColored(ImVec4(1, 0, 0, 1),
                         "Policy evaluation failed: %s",
                         std::string(policy_result.status().message()).c_str());
    }
  } else {
    ImGui::TextColored(ImVec4(0.5, 0.5, 0.5, 1),
                       "No policies configured");
  }
}
```
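The gating rules in the drawer code can also be expressed as a pure function, which makes them testable without an ImGui context. This is a sketch under that assumption only; `AcceptAction` and `DecideAcceptAction` are hypothetical names and do not exist in `ProposalDrawer`.

```cpp
#include <cstddef>

// What the Accept button should do for a given evaluation outcome.
enum class AcceptAction { kBlocked, kConfirmOverride, kAccept };

// Hypothetical pure helper mirroring the gating rules in the drawer code:
// critical violations block acceptance, warnings require a confirmed
// override, and anything else is accepted directly.
constexpr AcceptAction DecideAcceptAction(std::size_t critical_count,
                                          std::size_t warning_count,
                                          bool override_confirmed) {
  if (critical_count > 0) return AcceptAction::kBlocked;
  if (warning_count > 0 && !override_confirmed) {
    return AcceptAction::kConfirmOverride;
  }
  return AcceptAction::kAccept;
}
```

A unit test can then cover the three outcomes directly, and the drawer only has to translate the returned action into `BeginDisabled`, `OpenPopup`, or `AcceptProposal` calls.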

### Phase 4: Testing & Documentation (1-2 hours)

#### 4.1 Example Policy File

Create `.yaze/policies/agent.yaml.example`:

```yaml
# Example agent policy configuration
# Copy to .yaze/policies/agent.yaml and customize

version: "1.0"
enabled: true

policies:
  # Require test suites to pass
  - name: require_tests
    type: test_requirement
    enabled: false  # Disabled by default (no tests yet)
    severity: critical
    rules:
      - test_suite: "smoke_test"
        min_pass_rate: 1.0
    message: "All smoke tests must pass"

  # Limit change scope
  - name: limit_changes
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_bytes_changed: 5120  # 5KB
      - max_commands_executed: 15
    message: "Keep changes small and focused"

  # Protect ROM header
  - name: protect_header
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0
        end: 0xFFFF
        reason: "ROM header"
    message: "Cannot modify ROM header"
```

#### 4.2 Unit Tests

Create `test/cli/policy_evaluator_test.cc`:

```cpp
#include "src/cli/service/policy_evaluator.h"
#include "gtest/gtest.h"

namespace yaze {
namespace cli {
namespace {

TEST(PolicyEvaluatorTest, LoadPoliciesSuccess) {
  auto& eval = PolicyEvaluator::GetInstance();
  auto status = eval.LoadPolicies("test/fixtures/policies");
  EXPECT_TRUE(status.ok());
  EXPECT_TRUE(eval.IsEnabled());
}

TEST(PolicyEvaluatorTest, EvaluateProposal_NoViolations) {
  // ... test implementation
}

TEST(PolicyEvaluatorTest, EvaluateProposal_CriticalViolation) {
  // ... test implementation
}

}  // namespace
}  // namespace cli
}  // namespace yaze
```

## Deliverables

- [x] Policy evaluator service interface
- [ ] YAML policy parser implementation
- [ ] Policy evaluation logic for all 4 types
- [ ] ProposalDrawer GUI integration
- [ ] Policy override workflow
- [ ] Example policy configurations
- [ ] Unit tests
- [ ] Documentation and usage guide

## Success Criteria

1. **Functional**:
   - Policies load from YAML files
   - Proposals evaluated against all enabled policies
   - Accept button gated by critical violations
   - Override workflow for warnings

2. **User Experience**:
   - Clear policy status display in ProposalDrawer
   - Helpful violation messages
   - Override confirmation dialog
   - Policy evaluation fast (< 100ms)

3. **Quality**:
   - Unit test coverage > 80%
   - No crashes or memory leaks
   - Graceful handling of malformed YAML
   - Works with policies disabled

## Future Enhancements

- Policy templates for common scenarios
- Policy violation history/analytics
- Auto-fix suggestions for violations
- Integration with CI/CD for automated policy checks
- Policy versioning and migration

---

**Status**: Ready for implementation
**Next Step**: Create PolicyEvaluator skeleton and wire into build system
**Estimated Completion**: October 3-4, 2025

@@ -5,18 +5,19 @@
- **Priority 1**: Complete E2E Validation by implementing identified fixes for window detection and thread safety.
- **Priority 2**: Begin Policy Evaluation Framework (AW-04) - a YAML-based constraint system for proposal acceptance.

**Recent Accomplishments**:
- **gRPC Test Harness (IT-01 & IT-02)**: Core implementation of all 6 RPCs (Ping, Click, Type, Wait, Assert, Screenshot) is complete, enabling automated GUI testing from natural language prompts.
- **Root Cause Analysis**: Identified key sources of test flakiness, including a window-creation timing issue and a thread-safety bug in RPC handlers. Solutions have been designed.
- **Build System**: Hardened the CMake build for reliable gRPC integration.
- **Proposal Workflow**: The agentic proposal system (create, list, diff, review in GUI) is fully operational.
**Recent Accomplishments** (Updated: October 2, 2025):
- **✅ E2E Validation Complete**: All 5 functional RPC tests passing (Ping, Click, Type, Wait, Assert)
  - Window detection timing issue **resolved** with 10-frame yield buffer in Wait RPC
  - Thread safety issues **resolved** with shared_ptr state management
  - Test harness validated on macOS ARM64 with real YAZE GUI interactions
- **gRPC Test Harness (IT-01 & IT-02)**: Full implementation complete with natural language → GUI testing
- **Build System**: Hardened CMake configuration with reliable gRPC integration
- **Proposal Workflow**: Agentic proposal system fully operational (create, list, diff, review in GUI)

**Known Issues**:
- **Test Flakiness**: The e2e test script (`test_harness_e2e.sh`) is flaky due to a timing issue where `Click` actions that open new windows return before the window is interactable.
  - **Solution**: The `Click` RPC handler must call `ctx->Yield()` after performing the click to allow the ImGui frame to update before the RPC returns.
- **RPC Handler Crashes**: The `Wait` and `Assert` RPCs can crash due to unsafe state sharing between the gRPC thread and the test engine thread.
  - **Solution**: A thread-safe pattern using a `std::shared_ptr` to a state struct must be implemented for these handlers.
- **Screenshot RPC**: The Screenshot RPC is a non-functional stub.
**Known Limitations** (Non-Blocking):
- **Screenshot RPC**: Stub implementation (returns "not implemented" - planned for production phase)
- **Widget Naming**: Documentation needed for icon prefixes and naming conventions
- **Performance**: Tests add ~166ms per Wait call due to frame yielding (acceptable trade-off)

**Time Investment**: 20.5 hours total (IT-01: 11h, IT-02: 7.5h, Docs: 2h)

@@ -216,7 +217,7 @@ This plan decomposes the design additions into actionable engineering tasks. Eac
| IT-01 | Create `ImGuiTestHarness` IPC service embedded in `yaze_test`. | ImGuiTest Bridge | Code | ✅ Done | Phase 1+2+3 Complete - Full GUI automation with gRPC + ImGuiTestEngine (11 hours) |
| IT-02 | Implement CLI agent step translation (`imgui_action` → harness call). | ImGuiTest Bridge | Code | ✅ Done | `z3ed agent test` command with natural language prompts (7.5 hours) |
| IT-03 | Provide synchronization primitives (`WaitForIdle`, etc.). | ImGuiTest Bridge | Code | ✅ Done | Wait RPC with condition polling already implemented in IT-01 Phase 3 |
| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | 🔄 Active | IT-02, Fix window detection after menu actions (2-3 hours) |
| IT-04 | Complete E2E validation with real YAZE widgets | ImGuiTest Bridge | Test | ✅ Done | IT-02 - All 5 functional tests passing, window detection fixed with yield buffer |
| VP-01 | Expand CLI unit tests for new commands and sandbox flow. | Verification Pipeline | Test | 📋 Planned | RC/AW tasks |
| VP-02 | Add harness integration tests with replay scripts. | Verification Pipeline | Test | 📋 Planned | IT tasks |
| VP-03 | Create CI job running agent smoke tests with `YAZE_WITH_JSON`. | Verification Pipeline | Infra | 📋 Planned | VP-01, VP-02 |

@@ -1,224 +1,53 @@
# z3ed: AI-Powered CLI for YAZE

**Status**: Active Development
**Version**: 0.1.0-alpha
**Last Updated**: October 2, 2025 (E2E Validation 80% Complete)
**Status**: Active Development

## Overview

z3ed is a command-line interface for YAZE that enables AI-driven ROM modifications through a proposal-based workflow. It provides both human-accessible commands and machine-readable APIs for LLM integration.
`z3ed` is a command-line interface for YAZE that enables AI-driven ROM modifications through a proposal-based workflow. It provides both human-accessible commands for developers and machine-readable APIs for LLM integration, forming the backbone of an agentic development ecosystem.

This directory contains the primary documentation for the `z3ed` system.

## Core Documentation

### Essential Documents (Read These First)
Start here to understand the architecture, learn how to use the commands, and see the current development status.

1. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - **SOURCE OF TRUTH**
   - Architecture overview
   - Design goals and principles
   - Command structure
   - Agentic workflow framework
1. **[E6-z3ed-cli-design.md](E6-z3ed-cli-design.md)** - **Design & Architecture**
   * The "source of truth" for the system's architecture, design goals, and the agentic workflow framework. Read this first to understand *why* the system is built the way it is.

2. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - **TECHNICAL REFERENCE**
   - Complete command reference
   - API documentation
   - Implementation guides
   - Troubleshooting
2. **[E6-z3ed-reference.md](E6-z3ed-reference.md)** - **Technical Reference & Guides**
   * A complete command reference, API documentation, implementation guides, and troubleshooting tips. Use this as your day-to-day manual for working with `z3ed`.

3. **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - **IMPLEMENTATION TRACKER**
   - Task backlog and roadmap
   - Progress tracking
   - Known issues
   - Next priorities

### Quick Start Guides

4. **[IT-01-QUICKSTART.md](IT-01-QUICKSTART.md)** - Test harness quick start
   - Starting the gRPC server
   - Testing with grpcurl
   - Common workflows

5. **[AGENT_TEST_QUICKREF.md](AGENT_TEST_QUICKREF.md)** - CLI agent test command
   - Supported prompt patterns
   - Example workflows
   - Error handling

6. **[E2E_VALIDATION_GUIDE.md](E2E_VALIDATION_GUIDE.md)** - Complete validation checklist
   - Testing procedures
   - Success criteria
   - Issue reporting

### Implementation Guides

7. **[IMGUI_ID_MANAGEMENT_REFACTORING.md](IMGUI_ID_MANAGEMENT_REFACTORING.md)** - GUI ID management refactoring
   - Hierarchical widget ID system
   - Widget registry for test automation
   - Migration guide for editors
   - Integration with z3ed agent

### Status Documents

8. **[PROJECT_STATUS_OCT2.md](PROJECT_STATUS_OCT2.md)** - Current project status
   - Component completion percentages
   - Performance metrics
   - Known limitations

9. **[NEXT_PRIORITIES_OCT2.md](NEXT_PRIORITIES_OCT2.md)** - Detailed next steps
   - Priority 0-3 task breakdowns
   - Implementation guides
   - Time estimates
3. **[E6-z3ed-implementation-plan.md](E6-z3ed-implementation-plan.md)** - **Roadmap & Status**
   * The project's task backlog, roadmap, progress tracking, and a list of known issues. Check this document for current priorities and to see what's next.

## Quick Start

### Build z3ed

```bash
# Basic build (without gRPC)
cmake --build build --target z3ed -j8
# Basic build (without GUI automation support)
cmake --build build --target z3ed

# With gRPC support (for GUI automation)
# Build with gRPC support (for GUI automation)
cmake -B build-grpc-test -DYAZE_WITH_GRPC=ON
cmake --build build-grpc-test --target z3ed -j$(sysctl -n hw.ncpu)
cmake --build build-grpc-test --target z3ed
```

### Basic Usage
### Common Commands

```bash
# Display ROM info
z3ed rom info --rom=zelda3.sfc
# Create an agent proposal in a safe sandbox
z3ed agent run --prompt "Make all soldier armor red" --rom=zelda3.sfc --sandbox

# Export a palette
z3ed palette export sprites_aux1 4 soldier.col

# Create an agent proposal
z3ed agent run --prompt "Make soldiers red" --rom=zelda3.sfc --sandbox

# List all proposals
# List all active and past proposals
z3ed agent list

# View proposal changes
# View the changes for the latest proposal
z3ed agent diff

# Automated GUI testing (requires test harness)
z3ed agent test --prompt "Open Overworld editor and verify it loads"
# Run an automated GUI test (requires test harness to be running)
z3ed agent test --prompt "Open the Overworld editor and verify it loads"
```

### Start Test Harness (Optional)

```bash
# Terminal 1: Start YAZE with test harness
./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \
  --enable_test_harness \
  --test_harness_port=50052 \
  --rom_file=assets/zelda3.sfc &

# Terminal 2: Run automated test
./build-grpc-test/bin/z3ed agent test \
  --prompt "Open Overworld editor"
```

## Documentation Structure

```
docs/z3ed/
├── Core Documentation (3 files)
│   ├── E6-z3ed-cli-design.md              [Source of Truth]
│   ├── E6-z3ed-reference.md               [Technical Reference]
│   └── E6-z3ed-implementation-plan.md     [Tracker]
│
├── Quick Start Guides (3 files)
│   ├── IT-01-QUICKSTART.md                [Test Harness]
│   ├── AGENT_TEST_QUICKREF.md             [CLI Agent Test]
│   └── E2E_VALIDATION_GUIDE.md            [Validation]
│
├── Implementation Guides (1 file)
│   └── IMGUI_ID_MANAGEMENT_REFACTORING.md [GUI ID System]
│
├── Status Documents (4 files)
│   ├── README.md                          [This file]
│   ├── PROJECT_STATUS_OCT2.md             [Current Status]
│   ├── NEXT_PRIORITIES_OCT2.md            [Next Steps]
│   └── WORK_SUMMARY_OCT2.md               [Recent Work]
│
└── Archive (15+ files)
    └── Historical documentation and implementation notes
```

## Key Features

### ✅ Completed (Production-Ready on macOS)

- **Resource-Oriented CLI**: Clean command structure (`z3ed <resource> <action>`)
- **Resource Catalogue**: Machine-readable API specs (`docs/api/z3ed-resources.yaml`)
- **Acceptance Workflow**: Proposal tracking, sandbox management, GUI review
- **ImGuiTestHarness (IT-01)**: gRPC-based GUI automation (6 RPC methods)
- **CLI Agent Test (IT-02)**: Natural language → automated GUI tests
- **ProposalDrawer**: Integrated proposal review UI in YAZE
- **ROM Operations**: info, validate, diff, generate-golden
- **Palette Operations**: export, import, list
- **Overworld Operations**: get-tile, set-tile
- **Dungeon Operations**: list-rooms, add-object

### 🔄 In Progress (80% Complete)

- **E2E Validation**: Full workflow testing (window detection needs fix)

### 📋 Planned (Next Priorities)

1. **Policy Evaluation Framework (AW-04)**: YAML-based constraints
2. **Windows Cross-Platform Testing**: Validate on Windows with vcpkg
3. **Production Readiness**: Telemetry, screenshot, expanded tests

## Architecture Highlights

### Proposal-Based Workflow

```
User Prompt → AI Service → Sandbox ROM → Execute Commands →
Create Proposal → Review in GUI → Accept/Reject → Commit to ROM
```

### Component Stack

```
┌─────────────────────────────────┐
│         AI Agent (LLM)          │
├─────────────────────────────────┤
│            z3ed CLI             │
├─────────────────────────────────┤
│          Service Layer          │
│  • ProposalRegistry             │
│  • RomSandboxManager            │
│  • GuiAutomationClient          │
│  • TestWorkflowGenerator        │
├─────────────────────────────────┤
│     ImGuiTestHarness (gRPC)     │
├─────────────────────────────────┤
│    YAZE GUI + ProposalDrawer    │
└─────────────────────────────────┘
```

## Resources

**Machine-Readable API**: `docs/api/z3ed-resources.yaml`
**Proto Schema**: `src/app/core/proto/imgui_test_harness.proto`
**Test Script**: `scripts/test_harness_e2e.sh`

## Contributing

See **[B1-contributing.md](../B1-contributing.md)** for general contribution guidelines.

For z3ed-specific development:
1. Read **E6-z3ed-cli-design.md** for architecture
2. Check **E6-z3ed-implementation-plan.md** for open tasks
3. Use **E6-z3ed-reference.md** for API details
4. Follow **NEXT_PRIORITIES_OCT2.md** for current work

## License

Same as YAZE - see `LICENSE` in repository root.

---

**Last Updated**: October 2, 2025
**Contributors**: @scawful, GitHub Copilot
**Next Milestone**: E2E Validation Complete (Est. Oct 3, 2025)
See the **[Technical Reference](E6-z3ed-reference.md)** for a full command list.