- Introduced `AIGUIController` class to manage AI-driven GUI automation with vision feedback, enabling natural language command execution and iterative action refinement.
- Implemented `VisionActionRefiner` class to analyze screenshots and refine actions based on visual feedback, improving action success rates.
- Added header and implementation files for both classes, along with necessary methods for screenshot analysis, action verification, and UI element location.
- Updated CMake configuration to include new source files for the AI GUI controller and vision action refiner functionalities.
- Introduced `AIActionParser` class to parse natural language commands into structured GUI actions, supporting commands like placing tiles and opening editors.
- Implemented helper functions for extracting coordinates and parsing hex/decimal values.
- Added action types for various AI actions, including selecting and placing tiles, saving changes, and clicking buttons.
- Created header file `ai_action_parser.h` to define the action types and parser interface.
- Added implementation file `ai_action_parser.cc` with command parsing logic and pattern matching for different action types.
- Revised function schemas in `function_schemas.json` to streamline resource listing and searching, including new parameters for dungeon and overworld queries.
- Introduced a new system prompt in `system_prompt_v3.txt` to improve the agent's proactive exploration and multi-tool chaining strategies.
- Updated `GeminiAIService` to support the new prompt version and enhanced function calling logic for better tool integration.
- Added tests for multimodal image analysis and error handling in `test_gemini_vision.cc` to ensure robust functionality.
- Implemented new commands for listing overworld sprites and retrieving entrance details.
- Enhanced CLI functionality to support filtering by map, world, and sprite ID with JSON and text output formats.
- Introduced tile statistics analysis command for detailed tile usage insights.
- Updated function schemas and system prompts to reflect the new commands and their parameters.
- Implemented conditional compilation for executing curl commands using _popen and _pclose on Windows.
- Ensured consistent handling of command execution across different operating systems.
- Updated response generation methods to maintain functionality in Windows environments.
- Updated README.md to include detailed instructions for both local and network collaboration modes.
- Added setup requirements and steps for starting a collaboration server using Node.js.
- Enhanced feature descriptions for real-time collaboration, session management, and participant tracking.
- Improved clarity on how to host and join sessions in both collaboration modes.
- Implemented `resource-search` command to allow fuzzy searching of resource labels.
- Added `dungeon-describe-room` command to summarize metadata for a specified dungeon room.
- Enhanced `agent` command handler to support new commands and updated usage documentation.
- Introduced read-only accessors for room metadata in the Room class.
- Updated AI service to recognize and handle new commands for resource searching and room description.
- Improved metrics tracking for user interactions, including command execution and response times.
- Enhanced TUI to display command metrics and session summaries.
- Added a `--verbose` flag to enable detailed debug output for the Gemini AI service.
- Updated `GeminiAIService` constructor to log initialization details when verbose mode is enabled.
- Modified `CreateAIService` to pass the verbose flag to the Gemini configuration.
- Enhanced command help in `ModernCLI` to categorize commands and provide detailed descriptions.
- Refactored `HandleSimpleChatCommand` to accept a pointer to `Rom` instead of a reference.
- Updated `ShowCategoryHelp` to display command categories and examples.
- Improved error handling and logging in `GeminiAIService` for better debugging.
- Integrated yaml-cpp library into the project for YAML file parsing.
- Updated ConversationalAgentService to set ROM context in AI services.
- Extended AIService interface with SetRomContext method for context injection.
- Implemented SetRomContext in GeminiAIService and OllamaAIService.
- Enhanced PromptBuilder to load resource catalogues from YAML files.
- Added functions to parse commands, tools, examples, and tile references from YAML.
- Improved error handling for loading YAML files and added search paths for catalogues.
- Updated CMake configuration to fetch yaml-cpp if not found.
- Modified vcpkg.json to include yaml-cpp as a dependency.
- Added support for JSON in CMake configuration for AI integrations.
- Implemented new tool commands: resource-list and dungeon-list-sprites.
- Created ToolDispatcher for managing tool command execution.
- Refactored CMake structure to include agent sources and improve build configuration.
- Updated agent roadmap and README documentation to reflect current status and next steps.
- Introduced `ToolDispatcher` class to handle tool calls from the AI agent, allowing for dynamic execution of commands based on user requests.
- Updated `ConversationalAgentService` to integrate tool dispatching, enabling the agent to respond to tool calls and manage execution results.
- Enhanced `AgentResponse` structure to include a list of tool calls, facilitating communication between the AI and the tool dispatcher.
- Modified AI service implementations to parse and include tool calls in responses, improving the agent's interactive capabilities.
This commit significantly enhances the z3ed system's ability to manage and execute tool calls, paving the way for more complex interactions in ROM hacking.
- Added support for AI agent services, including `ConversationalAgentService`, to facilitate user interactions through a chat interface.
- Implemented `ChatTUI` for a terminal-based chat experience, allowing users to send messages and receive responses from the AI agent.
- Updated `EditorManager` to include options for displaying the agent chat widget and performance dashboard.
- Enhanced CMake configurations to include new source files for AI services and chat interface components.
This commit significantly expands the functionality of the z3ed system, paving the way for a more interactive and user-friendly experience in ROM hacking.
- Restructured CLI service source files to improve organization, moving files into dedicated directories for better maintainability.
- Introduced new AI service components, including `AIService`, `MockAIService`, and `GeminiAIService`, to facilitate natural language command generation.
- Implemented `PolicyEvaluator` and `ProposalRegistry` for enhanced proposal management and policy enforcement in AI workflows.
- Updated CMake configurations to reflect new file paths and ensure proper linking of the restructured components.
- Enhanced test suite with new test workflow generation capabilities, improving the robustness of automated testing.
This commit significantly advances the architecture of the z3ed system, laying the groundwork for more sophisticated AI-driven features and streamlined development processes.