feat: Revamp agent test suite script for improved functionality and usability

- Converted the agent test suite script to a more comprehensive format, consolidating multiple tests into a single script.
- Enhanced pre-flight checks for AI provider availability, including Ollama and Gemini.
- Implemented detailed test execution and result logging, providing clearer output and recommendations for troubleshooting.
- Removed outdated test scripts to streamline the testing process and improve maintainability.
- Updated README to reflect changes in the test suite and added build environment verification instructions.
@@ -106,3 +106,29 @@ cmake --build build --config Debug
- **`extract_changelog.py`** - Extract changelog for releases
- **`quality_check.sh`** - Code quality checks (Linux/macOS)
- **`create-macos-bundle.sh`** - Create macOS application bundle for releases

## Build Environment Verification

This directory also contains build environment verification scripts.

### `verify-build-environment.ps1` / `.sh`

A comprehensive script that checks:

- ✅ **CMake Installation** - Version 3.16+ required
- ✅ **Git Installation** - With submodule support
- ✅ **C++ Compiler** - GCC 13+, Clang 16+, or MSVC 2019+
- ✅ **Platform Tools** - Xcode (macOS), Visual Studio (Windows), build-essential (Linux)
- ✅ **Git Submodules** - All dependencies synchronized

### Usage

**Windows (PowerShell):**
```powershell
.\scripts\verify-build-environment.ps1
```

**macOS/Linux:**
```bash
./scripts/verify-build-environment.sh
```

@@ -1,365 +0,0 @@
|
||||
# YAZE Build Environment Verification Scripts
|
||||
|
||||
This directory contains build environment verification and setup scripts for YAZE development.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Verify Build Environment
|
||||
|
||||
**Windows (PowerShell):**
|
||||
```powershell
|
||||
.\scripts\verify-build-environment.ps1
|
||||
```
|
||||
|
||||
**macOS/Linux:**
|
||||
```bash
|
||||
./scripts/verify-build-environment.sh
|
||||
```
|
||||
|
||||
## Scripts Overview
|
||||
|
||||
### `verify-build-environment.ps1` / `.sh`
|
||||
|
||||
Comprehensive build environment verification script that checks:
|
||||
|
||||
- ✅ **CMake Installation** - Version 3.16+ required
|
||||
- ✅ **Git Installation** - With submodule support
|
||||
- ✅ **C++ Compiler** - GCC 13+, Clang 16+, or MSVC 2019+
|
||||
- ✅ **Platform Tools** - Xcode (macOS), Visual Studio (Windows), build-essential (Linux)
|
||||
- ✅ **Git Submodules** - All dependencies synchronized (auto-fixes if missing/empty)
|
||||
- ✅ **CMake Cache** - Freshness check (warns if >7 days old)
|
||||
- ✅ **Dependency Compatibility** - gRPC isolation, httplib, nlohmann/json
|
||||
- ✅ **CMake Configuration** - Test configuration (verbose mode only)
|
||||
|
||||
**Automatic Fixes:**
|
||||
The script now automatically fixes common issues without requiring `-FixIssues`:
|
||||
- 🔧 **Missing/Empty Submodules** - Automatically runs `git submodule update --init --recursive`
|
||||
- 🔧 **Old CMake Cache** - Prompts for confirmation when using `-FixIssues` (auto-skips otherwise)
|
||||
|
||||
#### Usage
|
||||
|
||||
**Windows:**
|
||||
```powershell
|
||||
# Basic verification (auto-fixes submodules)
|
||||
.\scripts\verify-build-environment.ps1
|
||||
|
||||
# With interactive fixes (prompts for cache cleaning)
|
||||
.\scripts\verify-build-environment.ps1 -FixIssues
|
||||
|
||||
# Force clean old CMake cache (no prompts)
|
||||
.\scripts\verify-build-environment.ps1 -CleanCache
|
||||
|
||||
# Verbose output (includes CMake configuration test)
|
||||
.\scripts\verify-build-environment.ps1 -Verbose
|
||||
|
||||
# Combined options
|
||||
.\scripts\verify-build-environment.ps1 -FixIssues -Verbose
|
||||
```
|
||||
|
||||
**macOS/Linux:**
|
||||
```bash
|
||||
# Basic verification (auto-fixes submodules)
|
||||
./scripts/verify-build-environment.sh
|
||||
|
||||
# With interactive fixes (prompts for cache cleaning)
|
||||
./scripts/verify-build-environment.sh --fix
|
||||
|
||||
# Force clean old CMake cache (no prompts)
|
||||
./scripts/verify-build-environment.sh --clean
|
||||
|
||||
# Verbose output
|
||||
./scripts/verify-build-environment.sh --verbose
|
||||
|
||||
# Combined options
|
||||
./scripts/verify-build-environment.sh --fix --verbose
|
||||
```
|
||||
|
||||
#### Exit Codes
|
||||
|
||||
- `0` - Success, environment ready for development
|
||||
- `1` - Issues found, manual intervention required
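
Because the script reports readiness through its exit code, a wrapper can refuse to build when verification fails. The following is a minimal sketch, not part of the YAZE scripts: `verify_then`, `verify_ok`, and `verify_fail` are hypothetical names, and the two stand-in functions replace the real script so the example runs anywhere.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a verification command, then run the follow-up
# command only if verification exited with 0.
verify_then() {
  local verify_cmd="$1"; shift
  if "$verify_cmd"; then
    "$@"   # exit code 0: environment ready, run the follow-up command
  else
    local status=$?
    echo "verification failed with exit code $status; skipping: $*" >&2
    return "$status"
  fi
}

# Stand-ins for the real verification script (assumptions for the demo).
verify_ok()   { return 0; }
verify_fail() { return 1; }

verify_then verify_ok   echo "configure and build would run here"
verify_then verify_fail echo "never printed" || echo "build skipped"
```

In a real checkout the first argument would presumably be `./scripts/verify-build-environment.sh` and the remaining arguments the build command, for example `cmake --build build`.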
|
||||
|
||||
## Common Workflows
|
||||
|
||||
### First-Time Setup
|
||||
|
||||
```bash
|
||||
# 1. Clone repository with submodules
|
||||
git clone --recursive https://github.com/scawful/yaze.git
|
||||
cd yaze
|
||||
|
||||
# 2. Verify environment
|
||||
./scripts/verify-build-environment.sh --verbose
|
||||
|
||||
# 3. If issues found, fix automatically
|
||||
./scripts/verify-build-environment.sh --fix
|
||||
|
||||
# 4. Build
|
||||
cmake --preset debug # macOS
|
||||
# OR
|
||||
cmake -B build -DCMAKE_BUILD_TYPE=Debug # All platforms
|
||||
cmake --build build
|
||||
```
|
||||
|
||||
### After Pulling Changes
|
||||
|
||||
```bash
|
||||
# 1. Update submodules
|
||||
git submodule update --init --recursive
|
||||
|
||||
# 2. Verify environment (check cache age)
|
||||
./scripts/verify-build-environment.sh
|
||||
|
||||
# 3. If cache is old, clean and rebuild
|
||||
./scripts/verify-build-environment.sh --clean
|
||||
cmake -B build -DCMAKE_BUILD_TYPE=Debug
|
||||
cmake --build build
|
||||
```
|
||||
|
||||
### Troubleshooting Build Issues
|
||||
|
||||
```bash
|
||||
# 1. Clean everything and verify
|
||||
./scripts/verify-build-environment.sh --clean --fix --verbose
|
||||
|
||||
# 2. This will:
|
||||
# - Sync all git submodules
|
||||
# - Remove old CMake cache
|
||||
# - Test CMake configuration
|
||||
# - Report any issues
|
||||
|
||||
# 3. Follow recommended actions in output
|
||||
```
|
||||
|
||||
### Before Opening Pull Request
|
||||
|
||||
```bash
|
||||
# Verify clean build environment
|
||||
./scripts/verify-build-environment.sh --verbose
|
||||
|
||||
# Should report: "Build Environment Ready for Development!"
|
||||
```
|
||||
|
||||
## Automatic Fixes
|
||||
|
||||
The script automatically fixes common issues when detected:
|
||||
|
||||
### Always Auto-Fixed (No Confirmation Required)
|
||||
|
||||
1. **Missing/Empty Git Submodules**
|
||||
```bash
|
||||
git submodule sync --recursive
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
- Runs automatically when submodules are missing or empty
|
||||
- No user confirmation required
|
||||
- Re-verifies after sync to ensure success
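
The detection logic itself is not shown in this README. One way a "missing or empty" check could be written is sketched here; `is_missing_or_empty` is a hypothetical helper, and the demo uses a temporary directory tree rather than a real checkout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the directory is absent or has no entries.
is_missing_or_empty() {
  local dir="$1"
  [ ! -d "$dir" ] && return 0
  [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# Demo tree: imgui exists but is empty, asar has content, SDL is absent.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/src/lib/imgui" "$demo_root/src/lib/asar"
touch "$demo_root/src/lib/asar/README"

for path in src/lib/imgui src/lib/asar src/lib/SDL; do
  if is_missing_or_empty "$demo_root/$path"; then
    echo "needs sync: $path"
  else
    echo "ok: $path"
  fi
done
rm -rf "$demo_root"
```

A script built on this idea would run the two `git submodule` commands above whenever any path is reported as needing a sync, then repeat the check.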
|
||||
|
||||
### Fixed with `-FixIssues` / `--fix` Flag
|
||||
|
||||
2. **Clean Old CMake Cache** (with confirmation prompt)
|
||||
- Prompts user before removing build directories
|
||||
- Only when cache is older than 7 days
|
||||
- User can choose to skip
|
||||
|
||||
### Fixed with `-CleanCache` / `--clean` Flag
|
||||
|
||||
3. **Force Clean CMake Cache** (no confirmation)
|
||||
- Removes `build/`, `build_test/`, `build-grpc-test/`
|
||||
- Removes Visual Studio cache (`out/`)
|
||||
- No prompts, immediate cleanup
|
||||
|
||||
### Optional Verbose Tests
|
||||
|
||||
When run with `--verbose` or `-Verbose`:
|
||||
|
||||
4. **Test CMake Configuration**
|
||||
- Creates temporary build directory
|
||||
- Tests minimal configuration
|
||||
- Reports success/failure
|
||||
- Cleans up test directory
|
||||
|
||||
## Integration with Visual Studio
|
||||
|
||||
The verification script integrates with Visual Studio CMake workflow:
|
||||
|
||||
1. **Pre-Build Check**: Run verification before opening VS
|
||||
2. **Submodule Sync**: Ensures all dependencies are present
|
||||
3. **Cache Management**: Prevents stale CMake cache issues
|
||||
|
||||
**Visual Studio Workflow:**
|
||||
```powershell
|
||||
# 1. Verify environment
|
||||
.\scripts\verify-build-environment.ps1 -Verbose
|
||||
|
||||
# 2. Open in Visual Studio
|
||||
# File → Open → Folder → Select yaze directory
|
||||
|
||||
# 3. Visual Studio detects CMakeLists.txt automatically
|
||||
# 4. Select Debug/Release from toolbar
|
||||
# 5. Press F5 to build and run
|
||||
```
|
||||
|
||||
## What Gets Checked
|
||||
|
||||
### CMake (Required)
|
||||
- Minimum version 3.16
|
||||
- Command available in PATH
|
||||
- Compatible with project CMake files
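
A minimum-version check has to compare components numerically, since a plain string comparison would rank `3.9` above `3.16`. The sketch below shows one way to do that in bash; `version_ge` is a hypothetical helper, not taken from the verification script, and it assumes purely numeric dotted versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Usage: version_ge <actual> <minimum>
# Succeeds when <actual> is greater than or equal to <minimum>.
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)   # split "3.28.1" into (3 28 1)
  local i h w
  for i in 0 1 2; do
    h="${have[i]:-0}"            # missing components count as 0
    w="${want[i]:-0}"
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

version_ge "3.28.1" "3.16" && echo "3.28.1 satisfies 3.16+"
version_ge "3.9.6" "3.16" || echo "3.9.6 is too old"
```

The installed version can usually be read from the third word of the first line of `cmake --version` and passed as the first argument.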
|
||||
|
||||
### Git (Required)
|
||||
- Git command available
|
||||
- Submodule support
|
||||
- All submodules present and synchronized:
|
||||
- `src/lib/SDL`
|
||||
- `src/lib/abseil-cpp`
|
||||
- `src/lib/asar`
|
||||
- `src/lib/imgui`
|
||||
- `third_party/json`
|
||||
- `third_party/httplib`
|
||||
|
||||
### Compilers (Required)
|
||||
- **Windows**: Visual Studio 2019+ with C++ workload
|
||||
- **macOS**: Xcode Command Line Tools
|
||||
- **Linux**: GCC 13+ or Clang 16+, build-essential package
|
||||
|
||||
### Platform Dependencies
|
||||
|
||||
**Linux Specific:**
|
||||
- GTK+3 development libraries (`libgtk-3-dev`)
|
||||
- DBus development libraries (`libdbus-1-dev`)
|
||||
- pkg-config tool
|
||||
|
||||
**macOS Specific:**
|
||||
- Xcode Command Line Tools
|
||||
- Cocoa framework (automatic)
|
||||
|
||||
**Windows Specific:**
|
||||
- Visual Studio 2022 recommended
|
||||
- Windows SDK 10.0.19041.0 or later
|
||||
|
||||
### CMake Cache
|
||||
|
||||
Checks for build directories:
|
||||
- `build/` - Main build directory
|
||||
- `build_test/` - Test build directory
|
||||
- `build-grpc-test/` - gRPC test builds
|
||||
- `out/` - Visual Studio CMake output
|
||||
|
||||
Warns if cache files are older than 7 days.
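
As an illustration of the age check, the sketch below lists `CMakeCache.txt` files older than a threshold using `find -mtime`. `stale_caches` is a hypothetical helper and the directory tree is a throwaway demo; the real script may measure age differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every CMakeCache.txt under <root> that was last
# modified more than <max_age_days> days ago (default 7). find counts whole
# 24-hour periods, so the boundary is approximate.
stale_caches() {
  local root="$1" max_age_days="${2:-7}"
  find "$root" -type f -name CMakeCache.txt -mtime +"$max_age_days"
}

# Demo tree: one fresh cache, one back-dated to January 2020.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/build" "$demo_root/build_test"
touch "$demo_root/build/CMakeCache.txt"
touch -t 202001010000 "$demo_root/build_test/CMakeCache.txt"

stale_caches "$demo_root" 7 | while read -r cache; do
  echo "warning: stale CMake cache: ${cache#"$demo_root"/}"
done
rm -rf "$demo_root"
```

Only the back-dated cache in `build_test/` should be flagged in this demo.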
|
||||
|
||||
### Dependencies
|
||||
|
||||
**gRPC Isolation (when enabled):**
|
||||
- Verifies `CMAKE_DISABLE_FIND_PACKAGE_Protobuf=TRUE`
|
||||
- Verifies `CMAKE_DISABLE_FIND_PACKAGE_absl=TRUE`
|
||||
- Prevents system package conflicts
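
How the script verifies these two settings is not described here. Assuming they end up as entries in `CMakeCache.txt` (which stores values as `NAME:TYPE=VALUE`), a check could look like the following sketch; `cache_entry_is_true` is a hypothetical helper and the cache file is generated for the demo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when <name> is set to TRUE, ON, or 1 in the
# given CMake cache file.
cache_entry_is_true() {
  local cache_file="$1" name="$2"
  grep -Eq "^${name}(:[A-Z]+)?=(TRUE|ON|1)[[:space:]]*$" "$cache_file"
}

# Demo cache: the Protobuf entry is present, the absl entry is missing.
demo_cache="$(mktemp)"
cat > "$demo_cache" <<'EOF'
CMAKE_BUILD_TYPE:STRING=Debug
CMAKE_DISABLE_FIND_PACKAGE_Protobuf:BOOL=TRUE
EOF

for name in CMAKE_DISABLE_FIND_PACKAGE_Protobuf CMAKE_DISABLE_FIND_PACKAGE_absl; do
  if cache_entry_is_true "$demo_cache" "$name"; then
    echo "ok: $name is set"
  else
    echo "warning: $name is not TRUE in the cache"
  fi
done
rm -f "$demo_cache"
```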
|
||||
|
||||
**Header-Only Libraries:**
|
||||
- `third_party/httplib` - cpp-httplib HTTP library
|
||||
- `third_party/json` - nlohmann/json library
|
||||
|
||||
## Automatic Fixes
|
||||
|
||||
When run with `--fix` or `-FixIssues`:
|
||||
|
||||
1. **Sync Git Submodules**
|
||||
```bash
|
||||
git submodule sync --recursive
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
2. **Clean CMake Cache** (when combined with `--clean`)
|
||||
- Removes `build/`, `build_test/`, `build-grpc-test/`
|
||||
- Removes Visual Studio cache (`out/`)
|
||||
|
||||
3. **Test CMake Configuration**
|
||||
- Creates temporary build directory
|
||||
- Tests minimal configuration
|
||||
- Reports success/failure
|
||||
- Cleans up test directory
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
The verification script can be integrated into CI/CD pipelines:
|
||||
|
||||
```yaml
|
||||
# Example GitHub Actions step
|
||||
- name: Verify Build Environment
|
||||
run: |
|
||||
./scripts/verify-build-environment.sh --verbose
|
||||
shell: bash
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Script Reports "CMake Not Found"
|
||||
|
||||
**Windows:**
|
||||
```powershell
|
||||
# Check if CMake is installed
|
||||
cmake --version
|
||||
|
||||
# If not found, add to PATH or install
|
||||
choco install cmake
|
||||
|
||||
# Restart PowerShell
|
||||
```
|
||||
|
||||
**macOS/Linux:**
|
||||
```bash
|
||||
# Check if CMake is installed
|
||||
cmake --version
|
||||
|
||||
# Install if missing
|
||||
brew install cmake # macOS
|
||||
sudo apt install cmake # Ubuntu/Debian
|
||||
```
|
||||
|
||||
### "Git Submodules Missing"
|
||||
|
||||
```bash
|
||||
# Manually sync and update
|
||||
git submodule sync --recursive
|
||||
git submodule update --init --recursive
|
||||
|
||||
# Or use fix option
|
||||
./scripts/verify-build-environment.sh --fix
|
||||
```
|
||||
|
||||
### "CMake Cache Too Old"
|
||||
|
||||
```bash
|
||||
# Clean automatically
|
||||
./scripts/verify-build-environment.sh --clean
|
||||
|
||||
# Or manually
|
||||
rm -rf build build_test build-grpc-test
|
||||
```
|
||||
|
||||
### "Visual Studio Not Found" (Windows)
|
||||
|
||||
```powershell
|
||||
# Install Visual Studio 2022 with C++ workload
|
||||
# Download from: https://visualstudio.microsoft.com/
|
||||
|
||||
# Required workload:
|
||||
# "Desktop development with C++"
|
||||
```
|
||||
|
||||
### Script Fails on Network Issues (gRPC)
|
||||
|
||||
The script verifies configuration but doesn't download gRPC unless building with `-DYAZE_WITH_GRPC=ON`.
|
||||
|
||||
If you encounter network issues:
|
||||
```bash
|
||||
# Use minimal build (no gRPC)
|
||||
cmake -B build -DYAZE_MINIMAL_BUILD=ON
|
||||
```
|
||||
|
||||
## See Also
|
||||
|
||||
- [Build Instructions](../docs/02-build-instructions.md) - Complete build guide
|
||||
- [Getting Started](../docs/01-getting-started.md) - First-time setup
|
||||
- [Platform Compatibility](../docs/B2-platform-compatibility.md) - Platform-specific notes
|
||||
- [Contributing](../docs/B1-contributing.md) - Development guidelines
|
||||
303
scripts/agent_test_suite.sh
Normal file → Executable file
@@ -1,93 +1,238 @@
|
||||
#!/bin/bash
|
||||
# Comprehensive test script for Ollama and Gemini AI providers with tool calling
|
||||
|
||||
# Comprehensive test suite for the z3ed AI Agent.
|
||||
# This script consolidates multiple older test scripts into one.
|
||||
#
|
||||
# Usage: ./scripts/agent_test_suite.sh <provider>
|
||||
# provider: ollama, gemini, or mock
|
||||
set -e
|
||||
|
||||
set -e # Exit immediately if a command exits with a non-zero status.
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
RED='\033[0;31m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# --- Configuration ---
|
||||
Z3ED_BIN="/Users/scawful/Code/yaze/build_test/bin/z3ed"
|
||||
ROM_PATH="/Users/scawful/Code/yaze/assets/zelda3.sfc"
|
||||
TEST_DIR="/Users/scawful/Code/yaze/assets/agent"
|
||||
TEST_FILES=(
|
||||
"context_and_followup.txt"
|
||||
"complex_command_generation.txt"
|
||||
"error_handling_and_edge_cases.txt"
|
||||
)
|
||||
Z3ED="./build_test/bin/z3ed"
|
||||
ROM="assets/zelda3.sfc"
|
||||
RESULTS_FILE="/tmp/z3ed_ai_test_results.txt"
|
||||
|
||||
# --- Helper Functions ---
|
||||
print_header() {
|
||||
echo ""
|
||||
echo "================================================="
|
||||
echo "$1"
|
||||
echo "================================================="
|
||||
echo "=========================================="
|
||||
echo " Z3ED AI Provider Test Suite"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
|
||||
# Clear results file
|
||||
> "$RESULTS_FILE"
|
||||
|
||||
# Check if z3ed exists
|
||||
if [ ! -f "$Z3ED" ]; then
|
||||
echo -e "${RED}✗ z3ed not found at $Z3ED${NC}"
|
||||
echo " Try building with: cmake --build build_test"
|
||||
exit 1
|
||||
fi
|
||||
echo -e "${GREEN}✓ z3ed found${NC}"
|
||||
|
||||
# Check if ROM exists
|
||||
if [ ! -f "$ROM" ]; then
|
||||
echo -e "${RED}✗ ROM not found at $ROM${NC}"
|
||||
exit 1
|
||||
fi
|
||||
echo -e "${GREEN}✓ ROM found${NC}"
|
||||
|
||||
# Test Ollama availability
|
||||
OLLAMA_AVAILABLE=false
|
||||
if command -v ollama &> /dev/null && curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
|
||||
if ollama list | grep -q "qwen2.5-coder"; then
|
||||
OLLAMA_AVAILABLE=true
|
||||
echo -e "${GREEN}✓ Ollama available (qwen2.5-coder)${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}⚠ Ollama available but qwen2.5-coder not found${NC}"
|
||||
echo " Install with: ollama pull qwen2.5-coder:7b"
|
||||
fi
|
||||
else
|
||||
echo -e "${YELLOW}⚠ Ollama not available${NC}"
|
||||
fi
|
||||
|
||||
# Test Gemini availability
|
||||
GEMINI_AVAILABLE=false
|
||||
if [ -n "$GEMINI_API_KEY" ]; then
|
||||
GEMINI_AVAILABLE=true
|
||||
echo -e "${GREEN}✓ Gemini API key configured${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}⚠ Gemini API key not set${NC}"
|
||||
echo " Set with: export GEMINI_API_KEY='your-key'"
|
||||
fi
|
||||
|
||||
if [ "$OLLAMA_AVAILABLE" = false ] && [ "$GEMINI_AVAILABLE" = false ]; then
|
||||
echo -e "${RED}✗ No AI providers available${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test function
|
||||
run_test() {
|
||||
local test_name="$1"
|
||||
local provider="$2"
|
||||
local query="$3"
|
||||
local expected_pattern="$4"
|
||||
local extra_args="$5"
|
||||
|
||||
echo "=========================================="
|
||||
echo " Test: $test_name"
|
||||
echo " Provider: $provider"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
echo "Query: $query"
|
||||
echo ""
|
||||
|
||||
local cmd="$Z3ED agent simple-chat \"$query\" --rom=\"$ROM\" --ai_provider=$provider $extra_args"
|
||||
echo "Running: $cmd"
|
||||
echo ""
|
||||
|
||||
local output
|
||||
local exit_code=0
|
||||
output=$($cmd 2>&1) || exit_code=$?
|
||||
|
||||
echo "$output"
|
||||
echo ""
|
||||
|
||||
# Check for expected patterns
|
||||
local result="UNKNOWN"
|
||||
if [ $exit_code -ne 0 ]; then
|
||||
result="FAILED (exit code: $exit_code)"
|
||||
elif echo "$output" | grep -qi "$expected_pattern"; then
|
||||
result="PASSED"
|
||||
echo -e "${GREEN}✓ Response contains expected pattern: '$expected_pattern'${NC}"
|
||||
else
|
||||
result="FAILED (pattern not found)"
|
||||
echo -e "${YELLOW}⚠ Response missing expected pattern: '$expected_pattern'${NC}"
|
||||
fi
|
||||
|
||||
# Check for error indicators
|
||||
if echo "$output" | grep -qi "error\|failed\|infinite loop"; then
|
||||
result="FAILED (error detected)"
|
||||
echo -e "${RED}✗ Error detected in output${NC}"
|
||||
fi
|
||||
|
||||
# Record result
|
||||
echo "$test_name | $provider | $result" >> "$RESULTS_FILE"
|
||||
echo ""
|
||||
echo -e "${BLUE}Result: $result${NC}"
|
||||
echo ""
|
||||
|
||||
sleep 2 # Avoid rate limiting
|
||||
}
|
||||
|
||||
# --- Pre-flight Checks ---
|
||||
print_header "Performing Pre-flight Checks"
|
||||
# Test Suite
|
||||
|
||||
if [ -z "$1" ]; then
|
||||
echo "❌ Error: No AI provider specified."
|
||||
echo "Usage: $0 <ollama|gemini|mock>"
|
||||
exit 1
|
||||
fi
|
||||
PROVIDER=$1
|
||||
echo "✅ Provider: $PROVIDER"
|
||||
|
||||
if [ ! -f "$Z3ED_BIN" ]; then
|
||||
echo "❌ Error: z3ed binary not found at $Z3ED_BIN"
|
||||
echo "Please build the project first (e.g., in build_test)."
|
||||
exit 1
|
||||
fi
|
||||
echo "✅ z3ed binary found."
|
||||
|
||||
if [ ! -f "$ROM_PATH" ]; then
|
||||
echo "❌ Error: ROM not found at $ROM_PATH"
|
||||
exit 1
|
||||
fi
|
||||
echo "✅ ROM file found."
|
||||
|
||||
if [ "$PROVIDER" == "gemini" ] && [ -z "$GEMINI_API_KEY" ]; then
|
||||
echo "❌ Error: GEMINI_API_KEY environment variable is not set."
|
||||
echo "Please set it to your Gemini API key to run this test."
|
||||
exit 1
|
||||
fi
|
||||
if [ "$PROVIDER" == "gemini" ]; then
|
||||
echo "✅ GEMINI_API_KEY is set."
|
||||
if [ "$OLLAMA_AVAILABLE" = true ]; then
|
||||
echo ""
|
||||
echo "=========================================="
|
||||
echo " OLLAMA TESTS"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
|
||||
run_test "Ollama: Simple Question" "ollama" \
|
||||
"What dungeons are in this ROM?" \
|
||||
"dungeon\|palace\|castle"
|
||||
|
||||
run_test "Ollama: Sprite Query" "ollama" \
|
||||
"What sprites are in room 0?" \
|
||||
"sprite\|room"
|
||||
|
||||
run_test "Ollama: Tile Search" "ollama" \
|
||||
"Where can I find trees in the overworld?" \
|
||||
"tree\|0x02E\|map\|coordinate"
|
||||
|
||||
run_test "Ollama: Map Description" "ollama" \
|
||||
"Describe overworld map 0" \
|
||||
"light world\|map\|overworld"
|
||||
|
||||
run_test "Ollama: Warp List" "ollama" \
|
||||
"List the warps in the Light World" \
|
||||
"warp\|entrance\|exit"
|
||||
fi
|
||||
|
||||
if [ "$PROVIDER" == "ollama" ]; then
|
||||
if ! pgrep -x "Ollama" > /dev/null && ! pgrep -x "ollama" > /dev/null; then
|
||||
echo "⚠️ Warning: Ollama server process not found. The script might fail if it's not running."
|
||||
if [ "$GEMINI_AVAILABLE" = true ]; then
|
||||
echo ""
|
||||
echo "=========================================="
|
||||
echo " GEMINI TESTS"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
|
||||
run_test "Gemini: Simple Question" "gemini" \
|
||||
"What dungeons are in this ROM?" \
|
||||
"dungeon\|palace\|castle" \
|
||||
"--gemini_api_key=\"$GEMINI_API_KEY\""
|
||||
|
||||
run_test "Gemini: Sprite Query" "gemini" \
|
||||
"What sprites are in room 0?" \
|
||||
"sprite\|room" \
|
||||
"--gemini_api_key=\"$GEMINI_API_KEY\""
|
||||
|
||||
run_test "Gemini: Tile Search" "gemini" \
|
||||
"Where can I find trees in the overworld?" \
|
||||
"tree\|0x02E\|map\|coordinate" \
|
||||
"--gemini_api_key=\"$GEMINI_API_KEY\""
|
||||
|
||||
run_test "Gemini: Map Description" "gemini" \
|
||||
"Describe overworld map 0" \
|
||||
"light world\|map\|overworld" \
|
||||
"--gemini_api_key=\"$GEMINI_API_KEY\""
|
||||
|
||||
run_test "Gemini: Warp List" "gemini" \
|
||||
"List the warps in the Light World" \
|
||||
"warp\|entrance\|exit" \
|
||||
"--gemini_api_key=\"$GEMINI_API_KEY\""
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "=========================================="
|
||||
echo " TEST SUMMARY"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
|
||||
if [ -f "$RESULTS_FILE" ]; then
|
||||
cat "$RESULTS_FILE"
|
||||
echo ""
|
||||
|
||||
total=$(wc -l < "$RESULTS_FILE" | tr -d ' ')
|
||||
passed=$(grep -c "PASSED" "$RESULTS_FILE" || true)
|
||||
failed=$(grep -c "FAILED" "$RESULTS_FILE" || true)
|
||||
|
||||
echo "Total Tests: $total"
|
||||
echo -e "${GREEN}Passed: $passed${NC}"
|
||||
echo -e "${RED}Failed: $failed${NC}"
|
||||
echo ""
|
||||
|
||||
if [ "$passed" -eq "$total" ]; then
|
||||
echo -e "${GREEN}🎉 All tests passed!${NC}"
|
||||
elif [ "$passed" -gt 0 ]; then
|
||||
echo -e "${YELLOW}⚠ Some tests failed. Review output above.${NC}"
|
||||
else
|
||||
echo "✅ Ollama server process found."
|
||||
echo -e "${RED}✗ All tests failed. Check configuration.${NC}"
|
||||
fi
|
||||
else
|
||||
echo -e "${RED}✗ No results file generated${NC}"
|
||||
fi
|
||||
|
||||
# --- Run Test Suite ---
|
||||
for test_file in "${TEST_FILES[@]}"; do
|
||||
print_header "Running Test File: $test_file (Provider: $PROVIDER)"
|
||||
FULL_TEST_PATH="$TEST_DIR/$test_file"
|
||||
|
||||
if [ ! -f "$FULL_TEST_PATH" ]; then
|
||||
echo "❌ Error: Test file not found: $FULL_TEST_PATH"
|
||||
continue
|
||||
fi
|
||||
|
||||
# Construct the command. Use --quiet for cleaner test logs.
|
||||
COMMAND="$Z3ED_BIN agent simple-chat --file=$FULL_TEST_PATH --rom=$ROM_PATH --ai_provider=$PROVIDER --quiet"
|
||||
|
||||
echo "Executing command..."
|
||||
echo "--- Agent Output for $test_file ---"
|
||||
|
||||
# Execute the command and print its output
|
||||
eval $COMMAND
|
||||
|
||||
echo "--- Test Complete ---"
|
||||
echo ""
|
||||
done
|
||||
|
||||
print_header "✅ All tests completed successfully!"
|
||||
echo ""
|
||||
echo "=========================================="
|
||||
echo " Recommendations"
|
||||
echo "=========================================="
|
||||
echo ""
|
||||
echo "If tests are failing:"
|
||||
echo " 1. Check that the ROM is valid and loaded properly"
|
||||
echo " 2. Verify tool definitions in prompt_catalogue.yaml"
|
||||
echo " 3. Review system prompts in prompt_builder.cc"
|
||||
echo " 4. Check AI provider connectivity and quotas"
|
||||
echo " 5. Examine tool execution logs for errors"
|
||||
echo ""
|
||||
echo "For Ollama:"
|
||||
echo " - Try different models: ollama pull llama3:8b"
|
||||
echo " - Adjust temperature in ollama_ai_service.cc"
|
||||
echo ""
|
||||
echo "For Gemini:"
|
||||
echo " - Verify API key is valid"
|
||||
echo " - Check quota at: https://aistudio.google.com"
|
||||
echo ""
|
||||
echo "Results saved to: $RESULTS_FILE"
|
||||
echo ""
|
||||
|
||||
@@ -1,79 +0,0 @@
|
||||
#!/bin/bash
|
||||
# Test Phase 4: Enhanced Prompting
|
||||
# Compares command quality with and without few-shot examples
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
|
||||
PROJECT_ROOT="$SCRIPT_DIR/.."
|
||||
Z3ED_BIN="$PROJECT_ROOT/build/bin/z3ed"
|
||||
|
||||
echo "🧪 Phase 4: Enhanced Prompting Test"
|
||||
echo "======================================"
|
||||
echo ""
|
||||
|
||||
# Color output helpers
|
||||
GREEN='\033[0;32m'
|
||||
BLUE='\033[0;34m'
|
||||
YELLOW='\033[0;33m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Test prompts
|
||||
declare -a TEST_PROMPTS=(
|
||||
"Change palette 0 color 5 to red"
|
||||
"Place a tree at coordinates (10, 20) on map 0"
|
||||
"Make all soldiers wear red armor"
|
||||
"Export palette 0, change color 3 to blue, and import it back"
|
||||
"Validate the ROM"
|
||||
)
|
||||
|
||||
echo -e "${BLUE}Testing with Enhanced Prompting (few-shot examples)${NC}"
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo ""
|
||||
|
||||
for prompt in "${TEST_PROMPTS[@]}"; do
|
||||
echo -e "${YELLOW}Prompt:${NC} \"$prompt\""
|
||||
echo ""
|
||||
|
||||
# Test with Gemini if available
|
||||
if [ -n "$GEMINI_API_KEY" ]; then
|
||||
echo "Testing with Gemini (enhanced prompting)..."
|
||||
OUTPUT=$($Z3ED_BIN agent plan --prompt "$prompt" 2>&1)
|
||||
|
||||
echo "$OUTPUT"
|
||||
|
||||
# Count commands
|
||||
COMMAND_COUNT=$(echo "$OUTPUT" | grep -c -E "^\s*-" || true)
|
||||
echo ""
|
||||
echo "Commands generated: $COMMAND_COUNT"
|
||||
|
||||
else
|
||||
echo "⚠️ GEMINI_API_KEY not set - using MockAIService"
|
||||
OUTPUT=$($Z3ED_BIN agent plan --prompt "$prompt" 2>&1 || true)
|
||||
echo "$OUTPUT"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||
echo ""
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "🎉 Enhanced Prompting Tests Complete!"
|
||||
echo ""
|
||||
echo "Key Improvements with Phase 4:"
|
||||
echo " • Few-shot examples show the model how to format commands"
|
||||
echo " • Comprehensive command reference included in system prompt"
|
||||
echo " • Tile ID references (tree=0x02E, house=0x0C0, etc.)"
|
||||
echo " • Multi-step workflow examples (export → modify → import)"
|
||||
echo " • Clear constraints on output format"
|
||||
echo ""
|
||||
echo "Expected Accuracy Improvement:"
|
||||
echo " • Before: ~60-70% (guessing command syntax)"
|
||||
echo " • After: ~90%+ (following proven patterns)"
|
||||
echo ""
|
||||
echo "Next Steps:"
|
||||
echo " 1. Review command quality and accuracy"
|
||||
echo " 2. Add more few-shot examples for edge cases"
|
||||
echo " 3. Load z3ed-resources.yaml when available"
|
||||
echo " 4. Add ROM context injection"
|
||||
@@ -1,153 +0,0 @@
|
||||
#!/bin/bash
|
||||
# End-to-end test script for ImGuiTestHarness gRPC service
|
||||
# Tests all RPC methods to validate Phase 3 implementation
|
||||
|
||||
set -e # Exit on error
|
||||
|
||||
# Colors for output
|
||||
GREEN='\033[0;32m'
|
||||
RED='\033[0;31m'
|
||||
YELLOW='\033[1;33m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Configuration
|
||||
YAZE_BIN="./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze"
|
||||
TEST_PORT=50052
|
||||
PROTO_PATH="src/app/core/proto"
|
||||
PROTO_FILE="imgui_test_harness.proto"
|
||||
ROM_FILE="assets/zelda3.sfc"
|
||||
|
||||
echo -e "${YELLOW}=== ImGuiTestHarness E2E Test ===${NC}\n"
|
||||
|
||||
# Check if YAZE binary exists
|
||||
if [ ! -f "$YAZE_BIN" ]; then
|
||||
echo -e "${RED}Error: YAZE binary not found at $YAZE_BIN${NC}"
|
||||
echo "Please build with: cmake --build build-grpc-test --target yaze"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if ROM file exists
|
||||
if [ ! -f "$ROM_FILE" ]; then
|
||||
echo -e "${RED}Error: ROM file not found at $ROM_FILE${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check if grpcurl is installed
|
||||
if ! command -v grpcurl &> /dev/null; then
|
||||
echo -e "${RED}Error: grpcurl not found${NC}"
|
||||
echo "Install with: brew install grpcurl"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Kill any existing YAZE instances
|
||||
echo -e "${YELLOW}Cleaning up existing YAZE instances...${NC}"
|
||||
killall yaze 2>/dev/null || true
|
||||
sleep 1
|
||||
|
||||
# Start YAZE in background
|
||||
echo -e "${YELLOW}Starting YAZE with test harness...${NC}"
|
||||
$YAZE_BIN \
|
||||
--enable_test_harness \
|
||||
--test_harness_port=$TEST_PORT \
|
||||
--rom_file=$ROM_FILE &
|
||||
|
||||
YAZE_PID=$!
|
||||
echo "YAZE PID: $YAZE_PID"
|
||||
|
||||
# Wait for server to be ready
|
||||
echo -e "${YELLOW}Waiting for server to start...${NC}"
|
||||
sleep 3
|
||||
|
||||
# Check if server is running
|
||||
if ! lsof -i :$TEST_PORT > /dev/null 2>&1; then
|
||||
echo -e "${RED}Error: Server not listening on port $TEST_PORT${NC}"
|
||||
kill $YAZE_PID 2>/dev/null || true
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓ Server started successfully${NC}\n"
|
||||
|
||||
# Test counter
|
||||
TESTS_RUN=0
|
||||
TESTS_PASSED=0
|
||||
TESTS_FAILED=0
|
||||
|
||||
# Helper function to run a test
|
||||
run_test() {
|
||||
local test_name="$1"
|
||||
local rpc_method="$2"
|
||||
local request_data="$3"
|
||||
|
||||
TESTS_RUN=$((TESTS_RUN + 1))
|
||||
echo -e "${YELLOW}Test $TESTS_RUN: $test_name${NC}"
|
||||
|
||||
if grpcurl -plaintext \
|
||||
-import-path $PROTO_PATH \
|
||||
-proto $PROTO_FILE \
|
||||
-d "$request_data" \
|
||||
127.0.0.1:$TEST_PORT \
|
||||
yaze.test.ImGuiTestHarness/$rpc_method 2>&1 | tee /tmp/grpc_test_output.txt; then
|
||||
|
||||
# Check for success in response
|
||||
if grep -q '"success":.*true' /tmp/grpc_test_output.txt || \
|
||||
grep -q '"message":.*"Pong' /tmp/grpc_test_output.txt || \
|
||||
grep -q 'yazeVersion' /tmp/grpc_test_output.txt; then
|
||||
echo -e "${GREEN}✓ PASSED${NC}\n"
|
||||
TESTS_PASSED=$((TESTS_PASSED + 1))
|
||||
else
|
||||
echo -e "${RED}✗ FAILED (unexpected response)${NC}\n"
|
||||
TESTS_FAILED=$((TESTS_FAILED + 1))
|
||||
fi
|
||||
else
|
||||
echo -e "${RED}✗ FAILED (connection/RPC error)${NC}\n"
|
||||
TESTS_FAILED=$((TESTS_FAILED + 1))
|
||||
fi
|
||||
}
|
||||
|
||||
# Run all tests
|
||||
echo -e "${YELLOW}=== Running RPC Tests ===${NC}\n"
|
||||
|
||||
# 1. Ping - Health Check
|
||||
run_test "Ping (Health Check)" "Ping" '{"message":"test"}'
|
||||
|
||||
# 2. Click - Menu Item (Open Overworld Editor)
|
||||
# Note: Menu items in YAZE use format "menuitem:<Icon> Name"
|
||||
run_test "Click (Open Overworld Editor)" "Click" '{"target":"menuitem: Overworld Editor","type":"CLICK_TYPE_LEFT"}'
|
||||
|
||||
# 3. Wait - Window Visible (Overworld Editor should open)
|
||||
run_test "Wait (Overworld Editor Window)" "Wait" '{"condition":"window_visible:Overworld","timeout_ms":15000,"poll_interval_ms":100}'
|
||||
|
||||
# 4. Assert - Window Visible (Overworld Editor should be open)
|
||||
run_test "Assert (Overworld Editor Visible)" "Assert" '{"condition":"visible:Overworld"}'
|
||||
|
||||
# 5. Click - Another menu item (Dungeon Editor)
|
||||
run_test "Click (Open Dungeon Editor)" "Click" '{"target":"menuitem: Dungeon Editor","type":"CLICK_TYPE_LEFT"}'
|
||||
|
||||
# 6. Screenshot - Not Implemented (stub)
|
||||
echo -e "${YELLOW}Test 6: Screenshot (Not Implemented - Stub)${NC}"
|
||||
echo -e "${YELLOW}(Skipping - proto field mismatch needs fix)${NC}\n"
|
||||
TESTS_RUN=$((TESTS_RUN + 1))
|
||||
|
||||
# Summary
|
||||
echo -e "${YELLOW}=== Test Summary ===${NC}"
|
||||
echo "Tests Run: $TESTS_RUN"
|
||||
echo -e "${GREEN}Tests Passed: $TESTS_PASSED${NC}"
|
||||
if [ $TESTS_FAILED -gt 0 ]; then
|
||||
echo -e "${RED}Tests Failed: $TESTS_FAILED${NC}"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Cleanup
|
||||
echo -e "${YELLOW}Cleaning up...${NC}"
|
||||
kill $YAZE_PID 2>/dev/null || true
|
||||
rm -f /tmp/grpc_test_output.txt
|
||||
sleep 1
|
||||
|
||||
# Exit with appropriate code
|
||||
if [ $TESTS_FAILED -gt 0 ]; then
|
||||
echo -e "${RED}Some tests failed${NC}"
|
||||
exit 1
|
||||
else
|
||||
echo -e "${GREEN}All tests passed!${NC}"
|
||||
exit 0
|
||||
fi
|
||||
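The pass/fail accounting at the end of `run_test` can be exercised without a gRPC server. The following is a minimal sketch, not the script's actual function: `run_check` is a hypothetical name, and it assumes a test passes when the command exits zero and its output contains an expected substring.

```bash
# Hypothetical helper (not from the original script): run a command, look for
# an expected substring in its output, and update the same three counters.
TESTS_RUN=0
TESTS_PASSED=0
TESTS_FAILED=0

run_check() {
  local name=$1
  local expected=$2
  local output
  shift 2
  TESTS_RUN=$((TESTS_RUN + 1))
  # A non-zero exit status is treated like a connection/RPC error.
  if output=$("$@" 2>&1); then
    case $output in
      *"$expected"*)
        printf 'PASSED: %s\n' "$name"
        TESTS_PASSED=$((TESTS_PASSED + 1)) ;;
      *)
        printf 'FAILED (unexpected response): %s\n' "$name"
        TESTS_FAILED=$((TESTS_FAILED + 1)) ;;
    esac
  else
    printf 'FAILED (command error): %s\n' "$name"
    TESTS_FAILED=$((TESTS_FAILED + 1))
  fi
}

# Stand-in commands instead of grpcurl calls.
run_check "matching output" "pong" echo "pong"
run_check "wrong output" "pong" echo "ping"
run_check "failing command" "pong" false
```

Keeping the counters in one function means the summary and the exit code further down only have to read `TESTS_FAILED`.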
@@ -1,180 +0,0 @@
|
||||
#!/bin/bash
|
||||
# Test script to verify ImGuiTestHarness gRPC service integration
|
||||
# Ensures the GUI automation infrastructure is working
|
||||
|
||||
set -e
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
|
||||
YAZE_APP="${PROJECT_ROOT}/build/bin/yaze.app/Contents/MacOS/yaze"
|
||||
|
||||
# Colors
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m'
|
||||
|
||||
echo "========================================="
|
||||
echo "ImGui Test Harness Verification"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
|
||||
# Check if YAZE is built with gRPC support
|
||||
if [ ! -f "$YAZE_APP" ]; then
|
||||
echo -e "${RED}✗ YAZE application not found at $YAZE_APP${NC}"
|
||||
echo ""
|
||||
echo "Build with gRPC support:"
|
||||
echo " cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON"
|
||||
echo " cmake --build build --target yaze"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓ YAZE application found${NC}"
|
||||
echo ""
|
||||
|
||||
# Check if gRPC libraries are linked
|
||||
echo "Checking gRPC dependencies..."
|
||||
echo "------------------------------"
|
||||
|
||||
if otool -L "$YAZE_APP" 2>/dev/null | grep -q "libgrpc"; then
|
||||
echo -e "${GREEN}✓ gRPC libraries linked${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}⚠ gRPC libraries may not be linked${NC}"
|
||||
echo " This might be expected if gRPC is statically linked"
|
||||
fi
|
||||
|
||||
# Check for test harness service code
|
||||
TEST_HARNESS_IMPL="${PROJECT_ROOT}/src/app/core/service/imgui_test_harness_service.cc"
|
||||
if [ -f "$TEST_HARNESS_IMPL" ]; then
|
||||
echo -e "${GREEN}✓ Test harness implementation found${NC}"
|
||||
else
|
||||
echo -e "${RED}✗ Test harness implementation not found${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Check if the service is properly integrated
|
||||
echo "Verifying test harness integration..."
|
||||
echo "--------------------------------------"
|
||||
|
||||
# Look for the service registration in the codebase
|
||||
if grep -q "ImGuiTestHarnessServer" "${PROJECT_ROOT}/src/app/core/service/imgui_test_harness_service.h"; then
|
||||
echo -e "${GREEN}✓ ImGuiTestHarnessServer class defined${NC}"
|
||||
else
|
||||
echo -e "${RED}✗ ImGuiTestHarnessServer class not found${NC}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check for gRPC server initialization
|
||||
if grep -rq "ImGuiTestHarnessServer.*Start" "${PROJECT_ROOT}/src/app" 2>/dev/null; then
|
||||
echo -e "${GREEN}✓ Server startup code found${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}⚠ Could not verify server startup code${NC}"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Test gRPC port availability
|
||||
echo "Testing gRPC server availability..."
|
||||
echo "------------------------------------"
|
||||
|
||||
GRPC_PORT=50051
|
||||
echo "Checking if port $GRPC_PORT is available..."
|
||||
|
||||
if lsof -Pi :$GRPC_PORT -sTCP:LISTEN -t >/dev/null 2>&1; then
|
||||
echo -e "${YELLOW}⚠ Port $GRPC_PORT is already in use${NC}"
|
||||
echo " If YAZE is running, this is expected"
|
||||
SERVER_RUNNING=true
|
||||
else
|
||||
echo -e "${GREEN}✓ Port $GRPC_PORT is available${NC}"
|
||||
SERVER_RUNNING=false
|
||||
fi
|
||||
|
||||
echo ""
|
||||
|
||||
# Interactive test option
|
||||
if [ "$SERVER_RUNNING" = false ]; then
|
||||
echo "========================================="
|
||||
echo "Interactive Test Options"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
echo "The test harness server is not currently running."
|
||||
echo ""
|
||||
echo "To test the full integration:"
|
||||
echo ""
|
||||
echo "1. Start YAZE in one terminal:"
|
||||
echo " $YAZE_APP"
|
||||
echo ""
|
||||
echo "2. In another terminal, verify the gRPC server:"
|
||||
echo " lsof -Pi :$GRPC_PORT -sTCP:LISTEN"
|
||||
echo ""
|
||||
echo "3. Test with z3ed GUI automation:"
|
||||
echo " z3ed agent test --prompt 'Open Overworld editor'"
|
||||
echo ""
|
||||
else
|
||||
echo "========================================="
|
||||
echo "Live Server Test"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
echo -e "${GREEN}✓ gRPC server appears to be running on port $GRPC_PORT${NC}"
|
||||
echo ""
|
||||
|
||||
# Try to connect to the server
|
||||
if command -v grpcurl &> /dev/null; then
|
||||
echo "Testing server connection with grpcurl..."
|
||||
if grpcurl -plaintext localhost:$GRPC_PORT list 2>&1 | grep -q "yaze.test.ImGuiTestHarness"; then
|
||||
echo -e "${GREEN}✅ ImGuiTestHarness service is available!${NC}"
|
||||
echo ""
|
||||
echo "Available RPC methods:"
|
||||
grpcurl -plaintext localhost:$GRPC_PORT list yaze.test.ImGuiTestHarness 2>&1 | sed 's/^/ /'
|
||||
else
|
||||
echo -e "${YELLOW}⚠ Could not verify service availability${NC}"
|
||||
fi
|
||||
else
|
||||
echo -e "${YELLOW}⚠ grpcurl not installed, skipping connection test${NC}"
|
||||
echo " Install with: brew install grpcurl"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "========================================="
|
||||
echo "Summary"
|
||||
echo "========================================="
|
||||
echo ""
|
||||
echo "Test Harness Components:"
|
||||
echo " [✓] Source files present"
|
||||
echo " [✓] gRPC integration compiled"
|
||||
|
||||
if [ "$SERVER_RUNNING" = true ]; then
|
||||
echo " [✓] Server running on port $GRPC_PORT"
|
||||
else
|
||||
echo " [ ] Server not currently running"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo -e "The ImGuiTestHarness service is ${GREEN}ready${NC} for:"
|
||||
echo " - Widget discovery and introspection"
|
||||
echo " - Automated GUI testing via z3ed agent test"
|
||||
echo " - Recording and playback of user interactions"
|
||||
echo ""
|
||||
|
||||
# Additional checks for agent chat widget
|
||||
echo "Checking for Agent Chat Widget..."
|
||||
echo "----------------------------------"
|
||||
|
||||
if grep -rq "AgentChatWidget" "${PROJECT_ROOT}/src/app/gui" 2>/dev/null; then
|
||||
echo -e "${GREEN}✓ AgentChatWidget found in GUI code${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}⚠ AgentChatWidget not yet implemented${NC}"
|
||||
echo " This is the next priority item in the roadmap"
|
||||
echo " Location: src/app/gui/debug/agent_chat_widget.{h,cc}"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "Next Steps:"
|
||||
echo " 1. Run YAZE and verify gRPC server starts: $YAZE_APP"
|
||||
echo " 2. Test conversation agent: z3ed agent test-conversation"
|
||||
echo " 3. Implement AgentChatWidget for GUI integration"
|
||||
echo ""
|
||||
@@ -1,128 +0,0 @@
|
||||
#!/bin/bash
|
||||
# End-to-end smoke test for test introspection CLI commands
|
||||
# Requires YAZE to be built with gRPC support (build-grpc-test preset)
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
GREEN='\033[0;32m'
|
||||
RED='\033[0;31m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m'
|
||||
|
||||
Z3ED_BIN="./build-grpc-test/bin/z3ed"
|
||||
YAZE_BIN="./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze"
|
||||
ROM_FILE="assets/zelda3.sfc"
|
||||
TEST_PORT="${TEST_PORT:-50052}"
|
||||
PROMPT="Open Overworld editor and verify it loads"
|
||||
HOST="localhost"
|
||||
|
||||
STATUS_LOG="$(mktemp /tmp/z3ed_status.XXXXXX)"
|
||||
RESULTS_LOG="$(mktemp /tmp/z3ed_results.XXXXXX)"
|
||||
LIST_LOG="$(mktemp /tmp/z3ed_list.XXXXXX)"
|
||||
RUN_LOG="$(mktemp /tmp/z3ed_run.XXXXXX)"
|
||||
|
||||
cleanup() {
|
||||
if [[ -n "${YAZE_PID:-}" ]]; then
|
||||
kill "${YAZE_PID}" 2>/dev/null || true
|
||||
fi
|
||||
rm -f "$STATUS_LOG" "$RESULTS_LOG" "$LIST_LOG" "$RUN_LOG"
|
||||
}
|
||||
trap cleanup EXIT
|
||||
|
||||
if [[ ! -x "$Z3ED_BIN" ]]; then
|
||||
echo -e "${RED}Error:${NC} z3ed binary not found at $Z3ED_BIN"
|
||||
echo "Build with: cmake --build build-grpc-test --target z3ed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ ! -x "$YAZE_BIN" ]]; then
|
||||
echo -e "${RED}Error:${NC} YAZE binary not found at $YAZE_BIN"
|
||||
echo "Build with: cmake --build build-grpc-test --target yaze"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ ! -f "$ROM_FILE" ]]; then
|
||||
echo -e "${RED}Error:${NC} ROM file not found at $ROM_FILE"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${YELLOW}=== Test Harness Introspection E2E ===${NC}"
|
||||
|
||||
# Ensure no previous YAZE instance is running
|
||||
killall yaze 2>/dev/null || true
|
||||
sleep 1
|
||||
|
||||
echo -e "${BLUE}→ Starting YAZE (port $TEST_PORT)...${NC}"
|
||||
"$YAZE_BIN" \
|
||||
--enable_test_harness \
|
||||
--test_harness_port="$TEST_PORT" \
|
||||
--rom_file="$ROM_FILE" &
|
||||
YAZE_PID=$!
|
||||
|
||||
ready=0
|
||||
for attempt in {1..20}; do
|
||||
if lsof -Pi ":$TEST_PORT" -sTCP:LISTEN -t >/dev/null 2>&1; then
|
||||
ready=1
|
||||
break
|
||||
fi
|
||||
sleep 0.5
|
||||
done
|
||||
|
||||
if [[ "$ready" -ne 1 ]]; then
|
||||
echo -e "${RED}Error:${NC} ImGuiTestHarness server did not start on port $TEST_PORT"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓ Harness ready${NC}"
|
||||
|
||||
echo -e "${BLUE}→ Running agent test workflow: $PROMPT${NC}"
|
||||
if ! "$Z3ED_BIN" agent test --prompt "$PROMPT" --host "$HOST" --port "$TEST_PORT" | tee "$RUN_LOG"; then
|
||||
echo -e "${RED}Error:${NC} agent test run failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
PRIMARY_TEST_ID=$(sed -n 's/.*Test ID: \([^][]*\).*/\1/p' "$RUN_LOG" | tail -n 1 | tr -d ' ]')
|
||||
if [[ -z "$PRIMARY_TEST_ID" ]]; then
|
||||
echo -e "${RED}Error:${NC} Unable to extract test id from agent test output"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓ Captured Test ID:${NC} $PRIMARY_TEST_ID"
|
||||
|
||||
echo -e "${BLUE}→ Checking status${NC}"
|
||||
"$Z3ED_BIN" agent test status --test-id "$PRIMARY_TEST_ID" --host "$HOST" --port "$TEST_PORT" | tee "$STATUS_LOG"
|
||||
if ! grep -q "Status: " "$STATUS_LOG"; then
|
||||
echo -e "${RED}Error:${NC} status command did not return a status"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if grep -q "Status: PASSED" "$STATUS_LOG"; then
|
||||
echo -e "${GREEN}✓ Status indicates PASS${NC}"
|
||||
else
|
||||
echo -e "${YELLOW}! Status is not PASSED (see $STATUS_LOG)${NC}"
|
||||
fi
|
||||
|
||||
echo -e "${BLUE}→ Fetching detailed results (YAML)${NC}"
|
||||
"$Z3ED_BIN" agent test results --test-id "$PRIMARY_TEST_ID" --include-logs --host "$HOST" --port "$TEST_PORT" | tee "$RESULTS_LOG"
|
||||
if ! grep -q "success: " "$RESULTS_LOG"; then
|
||||
echo -e "${RED}Error:${NC} results command failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${BLUE}→ Listing recent grpc tests${NC}"
|
||||
"$Z3ED_BIN" agent test list --category grpc --limit 5 --host "$HOST" --port "$TEST_PORT" | tee "$LIST_LOG"
|
||||
if ! grep -q "Test ID:" "$LIST_LOG"; then
|
||||
echo -e "${RED}Error:${NC} list command returned no tests"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "${GREEN}✓ Introspection commands completed successfully${NC}"
|
||||
|
||||
echo -e "${YELLOW}Artifacts:${NC}"
|
||||
echo " Status log: $STATUS_LOG"
|
||||
echo " Results log: $RESULTS_LOG"
|
||||
echo " List log: $LIST_LOG"
|
||||
|
||||
echo -e "${GREEN}All checks passed!${NC}"
|
||||
exit 0
|
||||
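The readiness loop in this script (20 attempts, 0.5 s apart) is a generic retry pattern. A minimal sketch of it as a reusable function follows; `wait_until` is a hypothetical name that does not appear in the original scripts, and the commented usage line assumes `lsof` is installed.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  local attempts=$1
  local delay=$2
  local i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    # Success of the probed command ends the wait immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Equivalent of the harness readiness check in the script above:
# wait_until 20 0.5 lsof -Pi ":$TEST_PORT" -sTCP:LISTEN -t
```

Returning a status instead of setting a `ready` flag lets the caller write `if ! wait_until ...; then exit 1; fi` directly.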
@@ -1,276 +0,0 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Test Remote Control - Practical Agent Workflows
|
||||
#
|
||||
# This script demonstrates the agent's ability to remotely control YAZE
|
||||
# and perform real editing tasks like drawing tiles, moving entities, etc.
|
||||
#
|
||||
# Usage: ./scripts/test_remote_control.sh
|
||||
|
||||
set -e # Exit on error
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
|
||||
PROTO_DIR="$PROJECT_ROOT/src/app/core/proto"
|
||||
PROTO_FILE="imgui_test_harness.proto"
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# Test harness connection
|
||||
HOST="127.0.0.1"
|
||||
PORT="50052"
|
||||
|
||||
echo -e "${BLUE}=== YAZE Remote Control Test ===${NC}\n"
|
||||
|
||||
# Check if grpcurl is available
|
||||
if ! command -v grpcurl &> /dev/null; then
|
||||
echo -e "${RED}Error: grpcurl not found${NC}"
|
||||
echo "Install with: brew install grpcurl"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Helper function to make gRPC calls
|
||||
grpc_call() {
|
||||
local method=$1
|
||||
local data=$2
|
||||
grpcurl -plaintext \
|
||||
-import-path "$PROTO_DIR" \
|
||||
-proto "$PROTO_FILE" \
|
||||
-d "$data" \
|
||||
"$HOST:$PORT" \
|
||||
"yaze.test.ImGuiTestHarness/$method"
|
||||
}
|
||||
|
||||
# Helper function to print test status
|
||||
print_test() {
|
||||
local test_num=$1
|
||||
local test_name=$2
|
||||
echo -e "\n${BLUE}Test $test_num: $test_name${NC}"
|
||||
}
|
||||
|
||||
print_success() {
|
||||
echo -e "${GREEN}✓ PASSED${NC}"
|
||||
}
|
||||
|
||||
print_failure() {
|
||||
echo -e "${RED}✗ FAILED: $1${NC}"
|
||||
}
|
||||
|
||||
# Test 0: Check server connection
|
||||
print_test "0" "Server Connection"
|
||||
if grpc_call "Ping" '{"message":"hello"}' &> /dev/null; then
|
||||
print_success
|
||||
else
|
||||
print_failure "Server not responding"
|
||||
echo -e "${YELLOW}Start the test harness:${NC}"
|
||||
echo "./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \\"
|
||||
echo " --enable_test_harness \\"
|
||||
echo " --test_harness_port=50052 \\"
|
||||
echo " --rom_file=assets/zelda3.sfc"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo -e "\n${BLUE}=== Practical Agent Workflows ===${NC}\n"
|
||||
|
||||
# Workflow 1: Activate Draw Tile Mode
|
||||
print_test "1" "Activate Draw Tile Mode"
|
||||
echo "Action: Click DrawTile button in Overworld toolset"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:DrawTile",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now paint tiles on the overworld"
|
||||
else
|
||||
print_failure "Could not activate draw tile mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 2: Select Pan Mode
|
||||
print_test "2" "Select Pan Mode"
|
||||
echo "Action: Click Pan button to enable map navigation"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Pan",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now pan the overworld map"
|
||||
else
|
||||
print_failure "Could not activate pan mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 3: Open Tile16 Editor
|
||||
print_test "3" "Open Tile16 Editor"
|
||||
echo "Action: Click Tile16Editor button to open editor"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Tile16Editor",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Tile16 Editor window should now be open"
|
||||
echo "Agent can select tiles for drawing"
|
||||
else
|
||||
print_failure "Could not open Tile16 Editor"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 4: Test Entrances Mode
|
||||
print_test "4" "Switch to Entrances Mode"
|
||||
echo "Action: Click Entrances button"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Entrances",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now edit overworld entrances"
|
||||
else
|
||||
print_failure "Could not activate entrances mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 5: Test Exits Mode
|
||||
print_test "5" "Switch to Exits Mode"
|
||||
echo "Action: Click Exits button"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Exits",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now edit overworld exits"
|
||||
else
|
||||
print_failure "Could not activate exits mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 6: Test Sprites Mode
|
||||
print_test "6" "Switch to Sprites Mode"
|
||||
echo "Action: Click Sprites button"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Sprites",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now edit sprite placements"
|
||||
else
|
||||
print_failure "Could not activate sprites mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 7: Test Items Mode
|
||||
print_test "7" "Switch to Items Mode"
|
||||
echo "Action: Click Items button"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:Items",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Agent can now place items on the overworld"
|
||||
else
|
||||
print_failure "Could not activate items mode"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
# Workflow 8: Test Zoom Controls
|
||||
print_test "8" "Test Zoom Controls"
|
||||
echo "Action: Zoom in on the map"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:ZoomIn",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Zoom level increased"
|
||||
|
||||
# Zoom back out
|
||||
sleep 0.5
|
||||
grpc_call "Click" '{
|
||||
"target": "Overworld/Toolset/button:ZoomOut",
|
||||
"type": "LEFT"
|
||||
}' &> /dev/null || true
|
||||
echo "Zoom level restored"
|
||||
else
|
||||
print_failure "Could not zoom"
|
||||
echo "Response: $response"
|
||||
fi
|
||||
|
||||
# Workflow 9: Legacy Format Fallback Test
|
||||
print_test "9" "Legacy Format Fallback"
|
||||
echo "Action: Test old-style widget reference"
|
||||
|
||||
response=$(grpc_call "Click" '{
|
||||
"target": "button:Overworld",
|
||||
"type": "LEFT"
|
||||
}' 2>&1) || true
|
||||
|
||||
if echo "$response" | grep -q '"success": true'; then
|
||||
print_success
|
||||
echo "Legacy format still works (backwards compatible)"
|
||||
else
|
||||
# This is expected if Overworld Editor isn't in main window
|
||||
echo -e "${YELLOW}Legacy format may not work (expected)${NC}"
|
||||
fi
|
||||
|
||||
# Summary
|
||||
echo -e "\n${BLUE}=== Test Summary ===${NC}\n"
|
||||
echo "Remote control capabilities verified:"
|
||||
echo " ✓ Mode switching (Draw, Pan, Entrances, Exits, Sprites, Items)"
|
||||
echo " ✓ Tool opening (Tile16 Editor)"
|
||||
echo " ✓ Zoom controls"
|
||||
echo " ✓ Widget registry integration"
|
||||
echo ""
|
||||
echo "Agent can now:"
|
||||
echo " • Switch between editing modes"
|
||||
echo " • Open auxiliary editors"
|
||||
echo " • Control view settings"
|
||||
echo " • Prepare for complex editing operations"
|
||||
echo ""
|
||||
echo "Next steps for full automation:"
|
||||
echo " 1. Add canvas click support (x,y coordinates)"
|
||||
echo " 2. Add tile selection in Tile16 Editor"
|
||||
echo " 3. Add entity dragging support"
|
||||
echo " 4. Implement workflow chaining (mode + select + draw)"
|
||||
echo ""
|
||||
echo -e "${GREEN}Remote control system functional!${NC}"
|
||||
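Workflows 1 through 8 in this script repeat the same click-and-grep pattern with a different button name each time. A table-driven sketch of that pattern is shown below. It is an illustration only: `click_button` and the `Unknown` button are hypothetical, and `grpc_call` is replaced by a stub that returns canned JSON so the example runs without a YAZE instance.

```bash
# Stand-in for the script's grpc_call (assumption: canned responses only).
# It reports success for every target except one containing "Unknown".
grpc_call() {
  case $2 in
    *Unknown*) printf '{ "success": false }\n' ;;
    *)         printf '{ "success": true }\n' ;;
  esac
}

passed=0
failed=0

# Hypothetical helper: click one Overworld toolset button and record the result.
click_button() {
  local button=$1
  local response
  # Same request shape as the workflows above; "|| true" keeps a failed call
  # from aborting the run when "set -e" is active.
  response=$(grpc_call "Click" \
    "{\"target\": \"Overworld/Toolset/button:$button\", \"type\": \"LEFT\"}" 2>&1) || true
  if printf '%s\n' "$response" | grep -q '"success": true'; then
    passed=$((passed + 1))
    printf 'PASSED: %s\n' "$button"
  else
    failed=$((failed + 1))
    printf 'FAILED: %s\n' "$button"
  fi
}

for button in DrawTile Pan Tile16Editor Entrances Exits Sprites Items ZoomIn Unknown; do
  click_button "$button"
done
printf 'passed=%d failed=%d\n' "$passed" "$failed"
```

With real counters in place, the summary could report only the capabilities that actually passed instead of printing every check mark unconditionally.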