# z3ed CLI UX/TUI Improvement Proposals
Status: IN_PROGRESS
Owner: ai-infra-architect
Created: 2025-12-01
Last Reviewed: 2025-12-02
Next Review: 2025-12-08
Board: docs/internal/agents/coordination-board.md (2025-12-01 ai-infra-architect z3ed CLI UX/TUI Improvement Proposals)
## Progress Update (2025-12-02)
### Completed
- **Doctor Suite Expansion**: Added `dungeon-doctor` and `rom-doctor` commands
- **Test CLI Infrastructure**: Added `test-list`, `test-run`, `test-status` commands
- **OutputFormatter Integration**: All diagnostic commands now use structured output
- **RequiresRom Fix**: Commands that don't need a ROM can now run without the `--rom` flag
- **JSON/Text Output**: All new commands support `--format json|text`
### New Commands Added
| Command | Description | Requires ROM |
|---------|-------------|--------------|
| `dungeon-doctor` | Room data integrity, object/sprite limits, chest conflicts | Yes |
| `rom-doctor` | Header validation, checksums, expansion status, free space | Yes |
| `test-list` | List available test suites with labels and requirements | No |
| `test-run` | Run tests with structured output | No |
| `test-status` | Show test configuration (ROM path, presets, enabled suites) | No |
### Remaining Work
- TUI consolidation (single `--tui` entry)
- Command palette driven from CommandRegistry
- Agent-aligned test harness refinements
## Summary
- Unify CLI argument/help surfaces and ensure every command emits consistent, machine-friendly output for agents while remaining legible for humans.
- Expand and harden the “doctor” style diagnostics into a repeatable health suite (overworld + future dungeon/ROM integrity) with safe fix paths and baseline comparison.
- Consolidate the TUI experience (ChatTUI vs unified layout vs enhanced TUI shell) into one interactive mode that is useful for human operators and exposes the same commands/tools agents call.
- Extend the same UX standards to tests and helper tools so agents and humans can run, triage, and record suites from CLI/TUI/editor with structured results and predictable flags.
## Current Observations
- Entry path duplication: `cli_main.cc` handles `--tui` by launching `ShowMain()` (unified layout) while `ModernCLI::Run` also special-cases `--tui` to `ChatTUI`, creating divergent UX and help flows (`PrintCompactHelp()` vs `ModernCLI::ShowHelp()`).
- Command metadata is pieced together inside `CommandRegistry::RegisterAllCommands()` instead of being driven by handlers; `ExportFunctionSchemas()` returns `{}` and `GenerateHelp()` is not surfaced via `z3ed --help <cmd>`.
- Argument parsing is minimal (`ArgumentParser` only supports `--key value` and `--key=value`) and handlers often skip validation (`overworld-doctor`, `rom-compare`, and `overworld-validate` accept anything). Format handling is inconsistent (`--json` flags vs `--format` vs raw `std::cout`).
- Doctor/compare tooling writes heavy ASCII art directly to `std::cout` and ignores `OutputFormatter`, so agents cannot consume structured output; no dry-run, no severity levels, and no notion of “fix plan vs applied fixes”.
- TUI pieces are fragmented: `tui/command_palette.cc` hardcodes demo commands, `UnifiedLayout` shows placeholder status/workflows, `ChatTUI` has its own shortcuts/history, and the ANSI `EnhancedTUI` shell is disconnected from ftxui flows. No TUI path renders real command metadata or schemas.
## Proposed Improvements
### 1) Argument/Help/Schema Consolidation
- Make `CommandRegistry` the single source for help and schemas: require handlers to supply description/examples/requirements, expose `z3ed help <command|category|all>` using `GenerateHelp/GenerateCategoryHelp`, and implement `ExportFunctionSchemas()` for AI tool discovery.
- Standardize global/common flags (`--rom`, `--mock-rom`, `--format {json,text,table}`, `--verbose`, `--grpc`) and teach `ArgumentParser` to parse booleans/ints/enum values with better errors and `--` passthrough for prompts.
- Add per-command validation hooks that surface actionable errors (missing required args, invalid ranges) and return status codes without dumping stack traces to stdout; ensure `ValidateArgs` is used in all handlers.
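
As a rough illustration of the parsing rules proposed above, the sketch below handles `--key value`, `--key=value`, boolean flags, and `--` passthrough, and reports an actionable error for an out-of-range enum value. `ParsedArgs`, `ParseArgs`, and `CheckEnum` are hypothetical names invented for this example; they are not the existing `ArgumentParser` API, and the real implementation may look quite different.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for illustration; not the existing ArgumentParser.
struct ParsedArgs {
  std::map<std::string, std::string> flags;  // from "--key value" / "--key=value"
  std::vector<std::string> positional;       // bare tokens seen before "--"
  std::vector<std::string> passthrough;      // everything after "--", untouched
};

// Parses `tokens`. Flags listed in `boolean_flags` never consume a value and
// are stored as "true". Throws std::invalid_argument when a value is missing.
ParsedArgs ParseArgs(const std::vector<std::string>& tokens,
                     const std::vector<std::string>& boolean_flags) {
  ParsedArgs out;
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    const std::string& tok = tokens[i];
    if (tok == "--") {  // stop parsing; keep the rest verbatim (e.g. a prompt)
      out.passthrough.assign(
          tokens.begin() + static_cast<std::ptrdiff_t>(i) + 1, tokens.end());
      break;
    }
    if (tok.rfind("--", 0) != 0) {  // token does not start with "--"
      out.positional.push_back(tok);
      continue;
    }
    const std::string body = tok.substr(2);
    const std::size_t eq = body.find('=');
    if (eq != std::string::npos) {  // "--key=value"
      out.flags[body.substr(0, eq)] = body.substr(eq + 1);
      continue;
    }
    if (std::find(boolean_flags.begin(), boolean_flags.end(), body) !=
        boolean_flags.end()) {
      out.flags[body] = "true";
      continue;
    }
    // A following token that itself starts with "--" is treated as a flag,
    // so the current flag is reported as missing its value.
    if (i + 1 >= tokens.size() || tokens[i + 1].rfind("--", 0) == 0) {
      throw std::invalid_argument("missing value for --" + body);
    }
    out.flags[body] = tokens[++i];  // "--key value"
  }
  return out;
}

// Returns an actionable error message if `key` is set to a value outside
// `allowed`; returns an empty string when the flag is absent or valid.
std::string CheckEnum(const ParsedArgs& args, const std::string& key,
                      const std::vector<std::string>& allowed) {
  const auto it = args.flags.find(key);
  if (it == args.flags.end()) return "";
  if (std::find(allowed.begin(), allowed.end(), it->second) != allowed.end()) {
    return "";
  }
  std::string msg =
      "invalid value '" + it->second + "' for --" + key + " (expected one of:";
  for (const std::string& a : allowed) msg += " " + a;
  return msg + ")";
}
```

A handler's validation hook could call `CheckEnum(args, "format", {"json", "text", "table"})` and turn a non-empty result into a non-zero status code rather than printing a stack trace.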
### 2) Doctor Suite (Diagnostics + Fixes)
- Convert `overworld-doctor`, `overworld-validate`, and `rom-compare` to use `OutputFormatter` with a compact JSON schema (summary, findings with severities, suggested actions, fix_applied flags) plus a readable text mode for humans.
- Split diagnose vs fix: `doctor overworld diagnose [--baseline … --include-tail --format json]` and `doctor overworld fix [--baseline … --output … --dry-run]`, with safety gates for pointer-table expansion and backup writing.
- Add baseline handling and snapshotting: auto-load vanilla baseline from configured path, emit diff stats, and allow `--save-report <path>` (JSON/markdown) for agents to ingest.
- Roadmap new scopes: `doctor dungeon`, `doctor rom-header/checksums`, and `doctor sprites/palettes` that reuse the same report schema so agents can stack health checks.
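
To make the shared report schema more concrete, here is a minimal sketch of a report with a summary, findings with severities, suggested actions, and `fix_applied` flags, rendered as JSON or as text. Every type and function name here (`DoctorReport`, `Finding`, `ReportToJson`, `ReportToText`, `ReportExitCode`) and the exact JSON field layout are assumptions for illustration; the real `OutputFormatter` would replace the hand-rolled JSON emission.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class Severity { kInfo = 0, kWarning = 1, kError = 2 };

struct Finding {
  std::string id;
  Severity severity;
  std::string message;
  std::string suggested_action;
  bool fix_applied;
};

struct DoctorReport {
  std::string scope;  // e.g. "overworld"
  std::vector<Finding> findings;
};

const char* SeverityName(Severity s) {
  switch (s) {
    case Severity::kInfo: return "info";
    case Severity::kWarning: return "warning";
    case Severity::kError: return "error";
  }
  return "unknown";
}

// Minimal escaping for quotes, backslashes, and newlines only.
std::string EscapeJson(const std::string& in) {
  std::string out;
  for (char c : in) {
    if (c == '"' || c == '\\') { out += '\\'; out += c; }
    else if (c == '\n') { out += "\\n"; }
    else { out += c; }
  }
  return out;
}

// Compact JSON: scope, per-severity summary counts, then the findings.
std::string ReportToJson(const DoctorReport& r) {
  int counts[3] = {0, 0, 0};
  for (const Finding& f : r.findings) ++counts[static_cast<int>(f.severity)];
  std::ostringstream os;
  os << "{\"scope\":\"" << EscapeJson(r.scope) << "\",\"summary\":{\"info\":"
     << counts[0] << ",\"warning\":" << counts[1] << ",\"error\":" << counts[2]
     << "},\"findings\":[";
  for (std::size_t i = 0; i < r.findings.size(); ++i) {
    const Finding& f = r.findings[i];
    if (i > 0) os << ",";
    os << "{\"id\":\"" << EscapeJson(f.id) << "\",\"severity\":\""
       << SeverityName(f.severity) << "\",\"message\":\""
       << EscapeJson(f.message) << "\",\"suggested_action\":\""
       << EscapeJson(f.suggested_action) << "\",\"fix_applied\":"
       << (f.fix_applied ? "true" : "false") << "}";
  }
  os << "]}";
  return os.str();
}

// Readable text mode for humans: one line per finding.
std::string ReportToText(const DoctorReport& r) {
  std::ostringstream os;
  os << "doctor " << r.scope << ": " << r.findings.size() << " finding(s)\n";
  for (const Finding& f : r.findings) {
    os << "  [" << SeverityName(f.severity) << "] " << f.id << ": "
       << f.message << (f.fix_applied ? " (fixed)" : "") << "\n";
  }
  return os.str();
}

// Exit code 1 if any error-severity finding is still unfixed, else 0.
int ReportExitCode(const DoctorReport& r) {
  for (const Finding& f : r.findings) {
    if (f.severity == Severity::kError && !f.fix_applied) return 1;
  }
  return 0;
}
```

Keeping diagnose and fix on the same report type means a `--dry-run` fix can emit the same findings with `fix_applied` left `false`, which is one way to express "fix plan vs applied fixes".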
### 3) Interactive TUI for Humans + Agents
- Collapse the two TUI modes into one `--tui` entry: single ftxui layout that hosts chat, command palette, status, and tool output panes; retire the duplicate ChatTUI path in `ModernCLI::Run` or make it a sub-mode inside the unified layout.
- Drive the TUI command palette from `CommandRegistry` (real command list, usage, examples, requirements), with fuzzy search, previews, and a “run with args” form that populates flags for common tasks (rom load, format).
- Pipe tool/doctor output into a scrollback pane with toggles for text vs JSON, and surface quick actions for common diagnostics (overworld diagnose, rom compare, palette inspect) plus agent handoff buttons (run in simple-chat with the same args).
- Share history/autocomplete between TUI and `simple-chat` so agents and humans see the same recent commands/prompts; add inline help overlay (hotkey) that renders registry metadata instead of static placeholder text.
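
The fuzzy search for the palette could start from something as simple as the sketch below: a case-insensitive subsequence match with a small bonus for consecutive hits, applied to the command names from the registry. `FuzzyScore` and `FilterCommands` are hypothetical helpers, the scoring weights are arbitrary, and the greedy left-to-right matching is a simplification that will not always find the best-scoring alignment.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns -1 if `query` is not a case-insensitive subsequence of `candidate`.
// Otherwise returns a score: 1 per matched character, 3 when the match
// directly follows the previous matched character.
int FuzzyScore(const std::string& query, const std::string& candidate) {
  std::size_t qi = 0;
  int score = 0;
  bool prev_matched = false;
  for (std::size_t ci = 0; ci < candidate.size() && qi < query.size(); ++ci) {
    const bool match =
        std::tolower(static_cast<unsigned char>(candidate[ci])) ==
        std::tolower(static_cast<unsigned char>(query[qi]));
    if (match) {
      score += prev_matched ? 3 : 1;
      ++qi;
    }
    prev_matched = match;
  }
  return qi == query.size() ? score : -1;
}

// Keeps only matching names, best score first; ties keep registry order.
std::vector<std::string> FilterCommands(const std::string& query,
                                        const std::vector<std::string>& names) {
  std::vector<std::pair<int, std::string>> scored;
  for (const std::string& name : names) {
    const int score = FuzzyScore(query, name);
    if (score >= 0) scored.emplace_back(score, name);
  }
  std::stable_sort(scored.begin(), scored.end(),
                   [](const std::pair<int, std::string>& a,
                      const std::pair<int, std::string>& b) {
                     return a.first > b.first;
                   });
  std::vector<std::string> out;
  for (const auto& entry : scored) out.push_back(entry.second);
  return out;
}
```

With the five commands listed earlier in this document as input, the query `rom` keeps only `rom-doctor`, and `tst` keeps the three `test-*` commands in their original order.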
### 4) Agent & Automation Alignment
- Enforce that all agent-callable commands emit JSON by default; mark human-only commands as `available_to_agent=false` in metadata and warn when agents attempt them.
- Add `--capture <file>` / `--emit-schema` options so agents can snapshot outputs without scraping stdout, and wire doctor results into the agent TODO manager for follow-up actions.
- Provide a thin `z3ed doctor --profile minimal|full` wrapper that batches key diagnostics for CI/agents and returns a single aggregated status code plus JSON report.
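
One way to picture the metadata-driven agent gate is sketched below: each command carries an `available_to_agent` flag, agent-callable commands default to JSON, human-only commands produce a warning when an agent calls them, and only agent-callable commands are exported for tool discovery. The `CommandMeta` fields other than `available_to_agent`, the function names, and the exported JSON shape are all assumptions made for this example, not the current `CommandRegistry` interface.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-command metadata supplied by each handler.
struct CommandMeta {
  std::string name;
  std::string description;
  bool requires_rom;
  bool available_to_agent;
};

// Agent-callable commands default to JSON; human-only commands to text.
// An explicit --format value always wins.
std::string DefaultFormat(const CommandMeta& cmd,
                          const std::string& explicit_format) {
  if (!explicit_format.empty()) return explicit_format;
  return cmd.available_to_agent ? "json" : "text";
}

// Returns a warning when an agent tries a human-only command, else "".
std::string AgentAccessWarning(const CommandMeta& cmd, bool caller_is_agent) {
  if (caller_is_agent && !cmd.available_to_agent) {
    return "command '" + cmd.name + "' is marked available_to_agent=false";
  }
  return "";
}

// Wraps a string in quotes, escaping quotes and backslashes only.
std::string QuoteJson(const std::string& in) {
  std::string out = "\"";
  for (char c : in) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out + "\"";
}

// Exports only the agent-callable commands as a JSON array.
std::string ExportAgentSchemas(const std::vector<CommandMeta>& cmds) {
  std::ostringstream os;
  os << "[";
  bool first = true;
  for (const CommandMeta& c : cmds) {
    if (!c.available_to_agent) continue;
    if (!first) os << ",";
    first = false;
    os << "{\"name\":" << QuoteJson(c.name)
       << ",\"description\":" << QuoteJson(c.description)
       << ",\"requires_rom\":" << (c.requires_rom ? "true" : "false") << "}";
  }
  os << "]";
  return os.str();
}
```

Because the gate and the export read the same flag, a command hidden from tool discovery is also the one that triggers the warning, so the two cannot drift apart.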
## Test & Tools UX Proposals
### Current Observations
- Tests are well-documented for humans (`test/README.md`), but there is no machine-readable manifest of suites/labels or CLI entry to run/parse results; agents must shell out to `ctest` and scrape text.
- Agent-side test commands (`agent test run/list/status/...`) print ad-hoc logs and lack `OutputFormatter`/metadata, making automation fragile; no JSON status, exit codes, or artifact paths are surfaced.
- Test helper tools (`tools-*/` commands, `tools/test_helpers/*`) mix stdout banners with file emission and manual path requirements; they are not discoverable via TUI or CommandRegistry-driven palettes and do not expose dry-run/plan outputs.
- TUI/editor have no test surface: no panel to run `stable/gui/rom_dependent/experimental` suites, inspect failing cases, or attach ROM paths/presets; quick actions and history are missing.
- Build/preset coupling is implicit: there is no guided flow to pick `mac-test/mac-ai/mac-dev`, enable ROM/AI flags, or attach `YAZE_TEST_ROM_VANILLA`, so agents/humans can misconfigure the build and get empty test sets.
### Proposed Improvements
- **Unified test CLI/TUI API**
  - Add `z3ed test list --format json` (labels, targets, requirements, presets) and `z3ed test run --label stable|gui|rom_dependent --preset <preset> [--rom …] [--artifact <path>]` backed by `ctest` with structured OutputFormatter.
  - Emit JSON summaries (pass/fail, duration, failing tests, log paths) with clear exit codes; support `--capture` to write reports for agents and CI.
  - Map labels to presets and requirements automatically (ROM path, AI runtime) and surface actionable errors instead of silent skips.
- **TUI/editor integration**
  - Add a Tests panel in the unified TUI: quick buttons for `stable`, `stable+gui`, `rom`, `experimental`; show live progress, failures, and links to logs/artifacts; allow rerun of last failure set.
  - Mirror the panel in ImGui editor (if available) with a lightweight runner that shells through the same CLI API to keep behavior identical.
- **Agent-aligned test harness**
  - Refactor `agent test *` commands to use CommandRegistry metadata and OutputFormatter (JSON default, text fallback), including workflow generation/replay, recording state, and results paths.
  - Provide a `test manifest` JSON file (generated from CMake/ctest) listing suites, labels, and prerequisites; expose via `z3ed --export-test-manifest`.
- **Tools/test-helpers cleanup**
  - Convert `tools-harness-state`, `tools-extract-values`, `tools-extract-golden`, and `tools-patch-v3` to strict arg validation, `--format {json,text}`, and `--dry-run`/`--output` defaults; summarize emitted artifacts in JSON.
  - Register these tools in the TUI command palette with real metadata/examples; add quick actions (“Generate harness state from ROM”, “Extract vanilla values as JSON”).
- **Build/preset ergonomics**
  - Add `z3ed test configure --profile {fast,ai,rom,full}` to set the right CMake preset and flags, prompt for ROM path when needed, and persist the choice for the session.
  - Surface preset/flag status in the TUI status bar and in `z3ed test status` so agents/humans know why suites are skipped.
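
The JSON summary and the "actionable errors instead of silent skips" idea from the proposals above might look roughly like the following sketch. The type and function names, the JSON field names, and the rule that only the `rom_dependent` label needs a ROM are assumptions for illustration; a real implementation would derive requirements from the generated test manifest and populate results from `ctest`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct SuiteResult {
  std::string name;
  bool passed;
  long duration_ms;
};

struct SuiteSummary {
  int passed;
  int failed;
  long duration_ms;
  std::vector<std::string> failing;
};

// Aggregates individual results into pass/fail counts and total duration.
SuiteSummary SummarizeRun(const std::vector<SuiteResult>& results) {
  SuiteSummary s{0, 0, 0, {}};
  for (const SuiteResult& r : results) {
    s.duration_ms += r.duration_ms;
    if (r.passed) {
      ++s.passed;
    } else {
      ++s.failed;
      s.failing.push_back(r.name);
    }
  }
  return s;
}

// Wraps a test name in quotes, escaping quotes and backslashes only.
std::string QuoteName(const std::string& in) {
  std::string out = "\"";
  for (char c : in) {
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out + "\"";
}

std::string SummaryToJson(const SuiteSummary& s) {
  std::ostringstream os;
  os << "{\"passed\":" << s.passed << ",\"failed\":" << s.failed
     << ",\"duration_ms\":" << s.duration_ms << ",\"failing\":[";
  for (std::size_t i = 0; i < s.failing.size(); ++i) {
    if (i > 0) os << ",";
    os << QuoteName(s.failing[i]);
  }
  os << "]}";
  return os.str();
}

// Clear exit code: 0 only when nothing failed.
int RunExitCode(const SuiteSummary& s) { return s.failed == 0 ? 0 : 1; }

// Returns an actionable error instead of silently skipping a suite.
// Assumption: only the "rom_dependent" label requires a ROM path.
std::string RequirementError(const std::string& label,
                             const std::string& rom_path) {
  if (label == "rom_dependent" && rom_path.empty()) {
    return "label 'rom_dependent' needs a ROM: pass --rom or set "
           "YAZE_TEST_ROM_VANILLA";
  }
  return "";
}
```

Checking `RequirementError` before invoking the runner lets `z3ed test run` fail fast with a message the caller can act on, rather than returning an empty but "successful" test set.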
## Deliverables / Exit Criteria
- Implemented help/schema surface (`z3ed help`, `z3ed --export-schemas`) backed by handler-supplied metadata; `ExportFunctionSchemas()` returns real data.
- All doctor/validate/compare commands emit structured output via `OutputFormatter` with diagnose/fix separation, dry-run, and baseline inputs; text mode remains readable.
- Single `--tui` experience that pulls commands from `CommandRegistry`, executes them, and displays outputs/history consistently for humans and agents.
- Updated documentation and examples reflecting the consolidated flag/command layout, plus quick-start snippets for agents (JSON) and humans (text).