# Policy Evaluation Framework (AW-04)

**Status**: Implementation In Progress
**Priority**: High (Next Phase)
**Time Estimate**: 6-8 hours
**Last Updated**: October 2, 2025

## Overview

The Policy Evaluation Framework provides a YAML-based constraint system for gating proposal acceptance in the z3ed agent workflow. It ensures that AI-generated ROM modifications meet quality, safety, and testing requirements before being merged into the main ROM.

## Goals

1. **Quality Gates**: Enforce minimum test pass rates and code quality standards
2. **Safety Constraints**: Prevent modifications to critical ROM regions (headers, checksums)
3. **Scope Limits**: Restrict changes to reasonable byte counts and specific banks
4. **Human Review**: Require manual review for large or complex changes
5. **Flexibility**: Allow policy overrides with confirmation and logging

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│ ProposalDrawer (GUI)                                    │
│   └─ Accept button gated by PolicyEvaluator             │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ PolicyEvaluator (Singleton Service)                     │
│   ├─ LoadPolicies() from .yaze/policies/                │
│   ├─ EvaluateProposal(proposal_id) → PolicyResult       │
│   └─ Cache of parsed YAML policies                      │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│ .yaze/policies/agent.yaml (YAML Configuration)          │
│   ├─ test_requirements (min pass rates)                 │
│   ├─ change_constraints (byte limits, allowed banks)    │
│   ├─ review_requirements (human review triggers)        │
│   └─ forbidden_ranges (protected ROM regions)           │
└─────────────────────────────────────────────────────────┘
```

## YAML Policy Schema

### Example Policy File

```yaml
# .yaze/policies/agent.yaml
version: 1.0
enabled: true

policies:
  # Policy 1: Test Requirements
  - name: require_tests
    type: test_requirement
    enabled: true
    severity: critical  # critical | warning | info
    rules:
      - test_suite: "overworld_rendering"
        min_pass_rate: 0.95
      - test_suite: "palette_integrity"
        min_pass_rate: 1.0
      - test_suite: "dungeon_logic"
        min_pass_rate: 0.90
    message: "All required test suites must pass before accepting proposal"

  # Policy 2: Change Scope Limits
  - name: limit_change_scope
    type: change_constraint
    enabled: true
    severity: critical
    rules:
      - max_bytes_changed: 10240  # 10KB limit
      - allowed_banks: [0x00, 0x01, 0x0E, 0x0F]  # Graphics banks only
      - max_commands_executed: 20
    message: "Proposal exceeds allowed change scope"

  # Policy 3: Protected ROM Regions
  - name: protect_critical_regions
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0  # ROM header
        end: 0xFFFF
        reason: "ROM header is protected"
      - start: 0x00FFC0  # Internal header
        end: 0x00FFDF
        reason: "Internal ROM header"
    message: "Proposal modifies protected ROM region"

  # Policy 4: Human Review Requirements
  - name: human_review_required
    type: review_requirement
    enabled: true
    severity: warning
    conditions:
      - if: bytes_changed > 1024
        then: require_diff_review
        message: "Large change requires diff review"
      - if: commands_executed > 10
        then: require_log_review
        message: "Complex operation requires log review"
      - if: test_failures > 0
        then: require_explanation
        message: "Test failures require explanation"

  # Policy 5: Palette Modifications
  - name: palette_safety
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_palettes_changed: 5
      - preserve_transparency: true  # Don't modify color index 0
    message: "Palette changes exceed safety threshold"
```

### Schema Definition

```yaml
# Policy file structure
version: string           # Semantic version (e.g., "1.0")
enabled: boolean          # Master enable/disable
policies:
  - name: string          # Unique policy identifier
    type: enum            # test_requirement | change_constraint |
                          # forbidden_range | review_requirement
    enabled: boolean      # Policy-specific enable/disable
    severity: enum        # critical | warning | info

    # Type-specific fields:
    rules: array          # For test_requirement, change_constraint
    ranges: array         # For forbidden_range
    conditions: array     # For review_requirement

    message: string       # User-facing error message
```

## Implementation Plan

### Phase 1: Core Infrastructure (2 hours)

#### 1.1 Create PolicyEvaluator Service

**File**: `src/cli/service/policy_evaluator.h`

```cpp
#ifndef YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
#define YAZE_CLI_SERVICE_POLICY_EVALUATOR_H

#include <memory>
#include <string>
#include <vector>

#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "absl/strings/string_view.h"

namespace yaze {
namespace cli {

// Policy violation severity levels
enum class PolicySeverity {
  kInfo,      // Informational, doesn't block acceptance
  kWarning,   // Warning, can be overridden
  kCritical   // Critical, blocks acceptance
};

// Individual policy violation
struct PolicyViolation {
  std::string policy_name;
  PolicySeverity severity;
  std::string message;
  std::string details;  // Additional context
};

// Result of policy evaluation
struct PolicyResult {
  bool passed;  // True if all critical policies passed
  std::vector<PolicyViolation> violations;

  // Categorized violations
  std::vector<PolicyViolation> critical_violations;
  std::vector<PolicyViolation> warnings;
  std::vector<PolicyViolation> info;

  // Helper methods
  bool has_critical_violations() const { return !critical_violations.empty(); }
  bool can_accept_with_override() const {
    return !has_critical_violations() && !warnings.empty();
  }
};

// Singleton service for evaluating proposals against policies
class PolicyEvaluator {
 public:
  static PolicyEvaluator& GetInstance();

  // Load policies from disk (.yaze/policies/agent.yaml)
  absl::Status LoadPolicies(absl::string_view policy_dir = ".yaze/policies");

  // Evaluate a proposal against all loaded policies
  absl::StatusOr<PolicyResult> EvaluateProposal(
      absl::string_view proposal_id);

  // Reload policies from disk (for live editing)
  absl::Status ReloadPolicies();

  // Check if policies are loaded and enabled
  bool IsEnabled() const { return enabled_; }

  // Get policy configuration path
  std::string GetPolicyPath() const { return policy_path_; }

 private:
  PolicyEvaluator() = default;
  ~PolicyEvaluator() = default;

  // Non-copyable, non-movable
  PolicyEvaluator(const PolicyEvaluator&) = delete;
  PolicyEvaluator& operator=(const PolicyEvaluator&) = delete;

  // Parse YAML policy file
  absl::Status ParsePolicyFile(absl::string_view yaml_content);

  // Evaluate individual policy types
  void EvaluateTestRequirements(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateChangeConstraints(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateForbiddenRanges(
      absl::string_view proposal_id, PolicyResult* result);
  void EvaluateReviewRequirements(
      absl::string_view proposal_id, PolicyResult* result);

  bool enabled_ = false;
  std::string policy_path_;

  // Parsed policy structures (implementation detail)
  struct PolicyConfig;
  std::unique_ptr<PolicyConfig> config_;
};

}  // namespace cli
}  // namespace yaze

#endif  // YAZE_CLI_SERVICE_POLICY_EVALUATOR_H
```

#### 1.2 Create Policy Configuration Structures

**File**: `src/cli/service/policy_evaluator.cc` (partial)

```cpp
#include "src/cli/service/policy_evaluator.h"

#include <fstream>
#include <sstream>
#include <tuple>
#include <utility>

#include "absl/strings/str_format.h"
#include "src/cli/service/proposal_registry.h"

// If YAML parsing is available
#ifdef YAZE_WITH_YAML
#include <yaml-cpp/yaml.h>
#endif

namespace yaze {
namespace cli {

// Internal policy configuration structures
struct PolicyEvaluator::PolicyConfig {
  std::string version;
  bool enabled;

  struct TestRequirement {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::pair<std::string, double>> test_suites;  // suite name → min pass rate
    std::string message;
  };

  struct ChangeConstraint {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    int max_bytes_changed = -1;
    std::vector<int> allowed_banks;
    int max_commands_executed = -1;
    int max_palettes_changed = -1;
    bool preserve_transparency = false;
    std::string message;
  };

  struct ForbiddenRange {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::tuple<int, int, std::string>> ranges;  // start, end, reason
    std::string message;
  };

  struct ReviewRequirement {
    std::string name;
    bool enabled;
    PolicySeverity severity;
    std::vector<std::string> conditions;
    std::string message;
  };

  std::vector<TestRequirement> test_requirements;
  std::vector<ChangeConstraint> change_constraints;
  std::vector<ForbiddenRange> forbidden_ranges;
  std::vector<ReviewRequirement> review_requirements;
};

// Singleton instance
PolicyEvaluator& PolicyEvaluator::GetInstance() {
  static PolicyEvaluator instance;
  return instance;
}

absl::Status PolicyEvaluator::LoadPolicies(absl::string_view policy_dir) {
  policy_path_ = absl::StrFormat("%s/agent.yaml", policy_dir);

  // Check if file exists
  std::ifstream file(policy_path_);
  if (!file.good()) {
    // No policy file - policies disabled
    enabled_ = false;
    return absl::OkStatus();
  }

  // Read file content
  std::stringstream buffer;
  buffer << file.rdbuf();
  std::string yaml_content = buffer.str();

  return ParsePolicyFile(yaml_content);
}

absl::Status PolicyEvaluator::ParsePolicyFile(absl::string_view yaml_content) {
#ifndef YAZE_WITH_YAML
  return absl::UnimplementedError(
      "YAML support not compiled. Build with YAZE_WITH_YAML=ON");
#else
  try {
    YAML::Node root = YAML::Load(std::string(yaml_content));

    config_ = std::make_unique<PolicyConfig>();
    config_->version = root["version"].as<std::string>("1.0");
    config_->enabled = root["enabled"].as<bool>(true);

    if (!config_->enabled) {
      enabled_ = false;
      return absl::OkStatus();
    }

    // Parse policies array
    if (root["policies"]) {
      for (const auto& policy_node : root["policies"]) {
        std::string type = policy_node["type"].as<std::string>();

        if (type == "test_requirement") {
          // Parse test requirement policy
          // ... (implementation continues)
        } else if (type == "change_constraint") {
          // Parse change constraint policy
          // ... (implementation continues)
        } else if (type == "forbidden_range") {
          // Parse forbidden range policy
          // ... (implementation continues)
        } else if (type == "review_requirement") {
          // Parse review requirement policy
          // ... (implementation continues)
        }
      }
    }

    enabled_ = true;
    return absl::OkStatus();
  } catch (const YAML::Exception& e) {
    return absl::InvalidArgumentError(
        absl::StrFormat("Failed to parse policy YAML: %s", e.what()));
  }
#endif
}

// ... (implementation continues with evaluation methods)

}  // namespace cli
}  // namespace yaze
```

### Phase 2: Policy Evaluation Logic (2-3 hours)

Implement the core evaluation methods that check proposals against each policy type.

### Phase 3: GUI Integration (2 hours)

#### 3.1 Update ProposalDrawer

**File**: `src/app/editor/system/proposal_drawer.cc`

Add policy status display and gating logic:

```cpp
#include "src/cli/service/policy_evaluator.h"

void ProposalDrawer::DrawProposalDetail(const std::string& proposal_id) {
  // ... existing detail view code ...

  // === Policy Status Section ===
  ImGui::Separator();
  ImGui::TextUnformatted("Policy Status:");

  auto& policy_eval = cli::PolicyEvaluator::GetInstance();
  if (policy_eval.IsEnabled()) {
    auto policy_result = policy_eval.EvaluateProposal(proposal_id);

    if (policy_result.ok()) {
      const auto& result = policy_result.value();

      if (result.passed) {
        ImGui::TextColored(ImVec4(0, 1, 0, 1), "✓ All policies passed");
      } else {
        // Show violations
        if (result.has_critical_violations()) {
          ImGui::TextColored(ImVec4(1, 0, 0, 1), "⛔ Critical violations:");
          for (const auto& violation : result.critical_violations) {
            ImGui::BulletText("%s: %s", violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }

        if (!result.warnings.empty()) {
          ImGui::TextColored(ImVec4(1, 1, 0, 1), "⚠️ Warnings:");
          for (const auto& violation : result.warnings) {
            ImGui::BulletText("%s: %s", violation.policy_name.c_str(),
                              violation.message.c_str());
          }
        }
      }

      // Gate Accept button
      ImGui::Separator();
      bool can_accept = !result.has_critical_violations();

      if (!can_accept) {
        ImGui::BeginDisabled();
      }

      if (ImGui::Button("Accept Proposal")) {
        if (result.can_accept_with_override() && !override_confirmed_) {
          // Show override confirmation dialog
          ImGui::OpenPopup("Override Policy");
        } else {
          AcceptProposal(proposal_id);
        }
      }

      if (!can_accept) {
        ImGui::EndDisabled();
        ImGui::SameLine();
        ImGui::TextColored(ImVec4(1, 0, 0, 1),
                           "(Accept blocked by policy violations)");
      }

      // Override confirmation dialog
      if (ImGui::BeginPopupModal("Override Policy", nullptr,
                                 ImGuiWindowFlags_AlwaysAutoResize)) {
        ImGui::Text("This proposal has policy warnings.");
        ImGui::Text("Do you want to override and accept anyway?");
        ImGui::Text("This action will be logged.");
        ImGui::Separator();

        if (ImGui::Button("Override and Accept")) {
          override_confirmed_ = true;
          AcceptProposal(proposal_id);
          ImGui::CloseCurrentPopup();
        }
        ImGui::SameLine();
        if (ImGui::Button("Cancel")) {
          ImGui::CloseCurrentPopup();
        }
        ImGui::EndPopup();
      }
    } else {
      // status().message() is a string_view and may not be null-terminated,
      // so copy it into a std::string before passing it to a %s format.
      const std::string error_message(policy_result.status().message());
      ImGui::TextColored(ImVec4(1, 0, 0, 1),
                         "Policy evaluation failed: %s",
                         error_message.c_str());
    }
  } else {
    ImGui::TextColored(ImVec4(0.5f, 0.5f, 0.5f, 1), "No policies configured");
  }
}
```

### Phase 4: Testing & Documentation (1-2 hours)

#### 4.1 Example Policy File

Create `.yaze/policies/agent.yaml.example`:

```yaml
# Example agent policy configuration
# Copy to .yaze/policies/agent.yaml and customize

version: 1.0
enabled: true

policies:
  # Require test suites to pass
  - name: require_tests
    type: test_requirement
    enabled: false  # Disabled by default (no tests yet)
    severity: critical
    rules:
      - test_suite: "smoke_test"
        min_pass_rate: 1.0
    message: "All smoke tests must pass"

  # Limit change scope
  - name: limit_changes
    type: change_constraint
    enabled: true
    severity: warning
    rules:
      - max_bytes_changed: 5120  # 5KB
      - max_commands_executed: 15
    message: "Keep changes small and focused"

  # Protect ROM header
  - name: protect_header
    type: forbidden_range
    enabled: true
    severity: critical
    ranges:
      - start: 0xFFB0
        end: 0xFFFF
        reason: "ROM header"
    message: "Cannot modify ROM header"
```

#### 4.2 Unit Tests

Create `test/cli/policy_evaluator_test.cc`:

```cpp
#include "src/cli/service/policy_evaluator.h"

#include "gtest/gtest.h"

namespace yaze {
namespace cli {
namespace {

TEST(PolicyEvaluatorTest, LoadPoliciesSuccess) {
  auto& eval = PolicyEvaluator::GetInstance();
  auto status = eval.LoadPolicies("test/fixtures/policies");
  EXPECT_TRUE(status.ok());
  EXPECT_TRUE(eval.IsEnabled());
}

TEST(PolicyEvaluatorTest, EvaluateProposal_NoViolations) {
  // ... test implementation
}

TEST(PolicyEvaluatorTest, EvaluateProposal_CriticalViolation) {
  // ... test implementation
}

}  // namespace
}  // namespace cli
}  // namespace yaze
```

## Deliverables

- [x] Policy evaluator service interface
- [ ] YAML policy parser implementation
- [ ] Policy evaluation logic for all 4 types
- [ ] ProposalDrawer GUI integration
- [ ] Policy override workflow
- [ ] Example policy configurations
- [ ] Unit tests
- [ ] Documentation and usage guide

## Success Criteria

1. **Functional**:
   - Policies load from YAML files
   - Proposals evaluated against all enabled policies
   - Accept button gated by critical violations
   - Override workflow for warnings
2. **User Experience**:
   - Clear policy status display in ProposalDrawer
   - Helpful violation messages
   - Override confirmation dialog
   - Policy evaluation fast (< 100ms)
3. **Quality**:
   - Unit test coverage > 80%
   - No crashes or memory leaks
   - Graceful handling of malformed YAML
   - Works with policies disabled

## Future Enhancements

- Policy templates for common scenarios
- Policy violation history/analytics
- Auto-fix suggestions for violations
- Integration with CI/CD for automated policy checks
- Policy versioning and migration

---

**Status**: Ready for implementation
**Next Step**: Create PolicyEvaluator skeleton and wire into build system
**Estimated Completion**: October 3-4, 2025