feat: Revamp agent test suite script for improved functionality and usability

- Converted the agent test suite script to a more comprehensive format, consolidating multiple tests into a single script.
- Enhanced pre-flight checks for AI provider availability, including Ollama and Gemini.
- Implemented detailed test execution and result logging, providing clearer output and recommendations for troubleshooting.
- Removed outdated test scripts to streamline the testing process and improve maintainability.
- Updated README to reflect changes in the test suite and added build environment verification instructions.
scawful
2025-10-04 14:10:04 -04:00
parent 3ef157b991
commit 99d37a8747
8 changed files with 250 additions and 1260 deletions


@@ -106,3 +106,29 @@ cmake --build build --config Debug
- **`extract_changelog.py`** - Extract changelog for releases
- **`quality_check.sh`** - Code quality checks (Linux/macOS)
- **`create-macos-bundle.sh`** - Create macOS application bundle for releases
## Build Environment Verification
This directory also contains build environment verification scripts.
### `verify-build-environment.ps1` / `.sh`
A comprehensive script that checks:
- ✅ **CMake Installation** - Version 3.16+ required
- ✅ **Git Installation** - With submodule support
- ✅ **C++ Compiler** - GCC 13+, Clang 16+, or MSVC 2019+
- ✅ **Platform Tools** - Xcode (macOS), Visual Studio (Windows), build-essential (Linux)
- ✅ **Git Submodules** - All dependencies synchronized
### Usage
**Windows (PowerShell):**
```powershell
.\scripts\verify-build-environment.ps1
```
**macOS/Linux:**
```bash
./scripts/verify-build-environment.sh
```


@@ -1,365 +0,0 @@
# YAZE Build Environment Verification Scripts
This directory contains build environment verification and setup scripts for YAZE development.
## Quick Start
### Verify Build Environment
**Windows (PowerShell):**
```powershell
.\scripts\verify-build-environment.ps1
```
**macOS/Linux:**
```bash
./scripts/verify-build-environment.sh
```
## Scripts Overview
### `verify-build-environment.ps1` / `.sh`
Comprehensive build environment verification script that checks:
- ✅ **CMake Installation** - Version 3.16+ required
- ✅ **Git Installation** - With submodule support
- ✅ **C++ Compiler** - GCC 13+, Clang 16+, or MSVC 2019+
- ✅ **Platform Tools** - Xcode (macOS), Visual Studio (Windows), build-essential (Linux)
- ✅ **Git Submodules** - All dependencies synchronized (auto-fixes if missing/empty)
- ✅ **CMake Cache** - Freshness check (warns if >7 days old)
- ✅ **Dependency Compatibility** - gRPC isolation, httplib, nlohmann/json
- ✅ **CMake Configuration** - Test configuration (verbose mode only)
**Automatic Fixes:**
The script fixes missing or empty submodules on every run, without requiring `-FixIssues`; cleaning an old CMake cache still needs a flag:
- 🔧 **Missing/Empty Submodules** - Automatically runs `git submodule update --init --recursive`
- 🔧 **Old CMake Cache** - Prompts for confirmation when using `-FixIssues` (auto-skips otherwise)
#### Usage
**Windows:**
```powershell
# Basic verification (auto-fixes submodules)
.\scripts\verify-build-environment.ps1
# With interactive fixes (prompts for cache cleaning)
.\scripts\verify-build-environment.ps1 -FixIssues
# Force clean old CMake cache (no prompts)
.\scripts\verify-build-environment.ps1 -CleanCache
# Verbose output (includes CMake configuration test)
.\scripts\verify-build-environment.ps1 -Verbose
# Combined options
.\scripts\verify-build-environment.ps1 -FixIssues -Verbose
```
**macOS/Linux:**
```bash
# Basic verification (auto-fixes submodules)
./scripts/verify-build-environment.sh
# With interactive fixes (prompts for cache cleaning)
./scripts/verify-build-environment.sh --fix
# Force clean old CMake cache (no prompts)
./scripts/verify-build-environment.sh --clean
# Verbose output
./scripts/verify-build-environment.sh --verbose
# Combined options
./scripts/verify-build-environment.sh --fix --verbose
```
#### Exit Codes
- `0` - Success, environment ready for development
- `1` - Issues found, manual intervention required
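A calling script can branch on these exit codes. The sketch below assumes a hypothetical helper, `run_if_env_ready`, which is not part of the repository; it runs a follow-up command only when the verifier exits with `0`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of YAZE): run a follow-up command only
# when the verification script exits with status 0.
run_if_env_ready() {
  local verify_script="$1"
  shift
  if "$verify_script"; then
    "$@"   # exit code 0: environment ready, run the remaining arguments
  else
    echo "Environment check failed; resolve the reported issues first." >&2
    return 1   # mirror the verifier's "issues found" code
  fi
}
```

For example, `run_if_env_ready ./scripts/verify-build-environment.sh cmake --build build` would start the build only after a clean check.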
## Common Workflows
### First-Time Setup
```bash
# 1. Clone repository with submodules
git clone --recursive https://github.com/scawful/yaze.git
cd yaze
# 2. Verify environment
./scripts/verify-build-environment.sh --verbose
# 3. If issues found, fix automatically
./scripts/verify-build-environment.sh --fix
# 4. Build
cmake --preset debug # macOS
# OR
cmake -B build -DCMAKE_BUILD_TYPE=Debug # All platforms
cmake --build build
```
### After Pulling Changes
```bash
# 1. Update submodules
git submodule update --init --recursive
# 2. Verify environment (check cache age)
./scripts/verify-build-environment.sh
# 3. If cache is old, clean and rebuild
./scripts/verify-build-environment.sh --clean
cmake -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build
```
### Troubleshooting Build Issues
```bash
# 1. Clean everything and verify
./scripts/verify-build-environment.sh --clean --fix --verbose
# 2. This will:
# - Sync all git submodules
# - Remove old CMake cache
# - Test CMake configuration
# - Report any issues
# 3. Follow recommended actions in output
```
### Before Opening Pull Request
```bash
# Verify clean build environment
./scripts/verify-build-environment.sh --verbose
# Should report: "Build Environment Ready for Development!"
```
## Automatic Fixes
The script automatically fixes common issues when detected:
### Always Auto-Fixed (No Confirmation Required)
1. **Missing/Empty Git Submodules**
```bash
git submodule sync --recursive
git submodule update --init --recursive
```
- Runs automatically when submodules are missing or empty
- No user confirmation required
- Re-verifies after sync to ensure success
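The "missing or empty" condition can be sketched as a small shell test. This is an illustration only: `submodule_needs_init` is a hypothetical name, and the real script may detect the condition differently.

```bash
#!/usr/bin/env bash
# Hypothetical check (an assumption, not the script's actual code):
# succeed when a submodule directory is absent or contains no entries.
submodule_needs_init() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    return 0                              # directory missing
  fi
  [ -z "$(ls -A "$dir" 2>/dev/null)" ]    # directory present but empty
}
```

A caller could loop over the submodule paths and run `git submodule update --init --recursive` once any of them reports true.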
### Fixed with `-FixIssues` / `--fix` Flag
2. **Clean Old CMake Cache** (with confirmation prompt)
- Prompts user before removing build directories
- Only when cache is older than 7 days
- User can choose to skip
### Fixed with `-CleanCache` / `--clean` Flag
3. **Force Clean CMake Cache** (no confirmation)
- Removes `build/`, `build_test/`, `build-grpc-test/`
- Removes Visual Studio cache (`out/`)
- No prompts, immediate cleanup
### Optional Verbose Tests
When run with `--verbose` or `-Verbose`:
4. **Test CMake Configuration**
- Creates temporary build directory
- Tests minimal configuration
- Reports success/failure
- Cleans up test directory
## Integration with Visual Studio
The verification script integrates with Visual Studio CMake workflow:
1. **Pre-Build Check**: Run verification before opening VS
2. **Submodule Sync**: Ensures all dependencies are present
3. **Cache Management**: Prevents stale CMake cache issues
**Visual Studio Workflow:**
```powershell
# 1. Verify environment
.\scripts\verify-build-environment.ps1 -Verbose
# 2. Open in Visual Studio
# File → Open → Folder → Select yaze directory
# 3. Visual Studio detects CMakeLists.txt automatically
# 4. Select Debug/Release from toolbar
# 5. Press F5 to build and run
```
## What Gets Checked
### CMake (Required)
- Minimum version 3.16
- Command available in PATH
- Compatible with project CMake files
### Git (Required)
- Git command available
- Submodule support
- All submodules present and synchronized:
- `src/lib/SDL`
- `src/lib/abseil-cpp`
- `src/lib/asar`
- `src/lib/imgui`
- `third_party/json`
- `third_party/httplib`
### Compilers (Required)
- **Windows**: Visual Studio 2019+ with C++ workload
- **macOS**: Xcode Command Line Tools
- **Linux**: GCC 13+ or Clang 16+, build-essential package
### Platform Dependencies
**Linux Specific:**
- GTK+3 development libraries (`libgtk-3-dev`)
- DBus development libraries (`libdbus-1-dev`)
- pkg-config tool
**macOS Specific:**
- Xcode Command Line Tools
- Cocoa framework (automatic)
**Windows Specific:**
- Visual Studio 2022 recommended
- Windows SDK 10.0.19041.0 or later
### CMake Cache
Checks for build directories:
- `build/` - Main build directory
- `build_test/` - Test build directory
- `build-grpc-test/` - gRPC test builds
- `out/` - Visual Studio CMake output
Warns if cache files are older than 7 days.
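One way to express the age check is with `find -mtime`. The sketch below assumes the file inspected is `CMakeCache.txt` inside each build directory and uses a hypothetical function name, `cache_is_stale`; the actual script may use a different file or method.

```bash
#!/usr/bin/env bash
# Hypothetical staleness check (assumes CMakeCache.txt is the file inspected).
cache_is_stale() {
  local build_dir="$1" max_days="${2:-7}"
  local cache="$build_dir/CMakeCache.txt"
  [ -f "$cache" ] || return 1   # no cache file, nothing to flag
  # find prints the path only when the file's age in whole days exceeds max_days
  [ -n "$(find "$cache" -mtime +"$max_days" -print 2>/dev/null)" ]
}
```

Because `-mtime` counts whole 24-hour periods, a file is flagged once it is at least eight days old when `max_days` is 7.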
### Dependencies
**gRPC Isolation (when enabled):**
- Verifies `CMAKE_DISABLE_FIND_PACKAGE_Protobuf=TRUE`
- Verifies `CMAKE_DISABLE_FIND_PACKAGE_absl=TRUE`
- Prevents system package conflicts
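A check like this could be approximated by searching the CMake cache, where entries are stored as `NAME:TYPE=VALUE`. The sketch assumes both variables are cache entries (if the project sets them as normal variables they will not appear in `CMakeCache.txt`), and `cache_var_is_true` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when a cache entry NAME:TYPE=VALUE holds TRUE or ON.
# Assumes the variable is stored in the cache at all.
cache_var_is_true() {
  local cache_file="$1" name="$2"
  grep -Eq "^${name}:[A-Z]+=(TRUE|ON)\$" "$cache_file"
}
```

Usage would look like `cache_var_is_true build/CMakeCache.txt CMAKE_DISABLE_FIND_PACKAGE_Protobuf`.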
**Header-Only Libraries:**
- `third_party/httplib` - cpp-httplib HTTP library
- `third_party/json` - nlohmann/json library
## Automatic Fixes
When run with `--fix` or `-FixIssues`:
1. **Sync Git Submodules**
```bash
git submodule sync --recursive
git submodule update --init --recursive
```
2. **Clean CMake Cache** (when combined with `--clean`)
- Removes `build/`, `build_test/`, `build-grpc-test/`
- Removes Visual Studio cache (`out/`)
3. **Test CMake Configuration**
- Creates temporary build directory
- Tests minimal configuration
- Reports success/failure
- Cleans up test directory
## CI/CD Integration
The verification script can be integrated into CI/CD pipelines:
```yaml
# Example GitHub Actions step
- name: Verify Build Environment
run: |
./scripts/verify-build-environment.sh --verbose
shell: bash
```
## Troubleshooting
### Script Reports "CMake Not Found"
**Windows:**
```powershell
# Check if CMake is installed
cmake --version
# If not found, add to PATH or install
choco install cmake
# Restart PowerShell
```
**macOS/Linux:**
```bash
# Check if CMake is installed
cmake --version
# Install if missing
brew install cmake # macOS
sudo apt install cmake # Ubuntu/Debian
```
### "Git Submodules Missing"
```bash
# Manually sync and update
git submodule sync --recursive
git submodule update --init --recursive
# Or use fix option
./scripts/verify-build-environment.sh --fix
```
### "CMake Cache Too Old"
```bash
# Clean automatically
./scripts/verify-build-environment.sh --clean
# Or manually
rm -rf build build_test build-grpc-test
```
### "Visual Studio Not Found" (Windows)
```powershell
# Install Visual Studio 2022 with C++ workload
# Download from: https://visualstudio.microsoft.com/
# Required workload:
# "Desktop development with C++"
```
### Script Fails on Network Issues (gRPC)
The script only verifies configuration and does not download anything itself; gRPC is fetched only when you build with `-DYAZE_WITH_GRPC=ON`.
If you encounter network issues:
```bash
# Use minimal build (no gRPC)
cmake -B build -DYAZE_MINIMAL_BUILD=ON
```
## See Also
- [Build Instructions](../docs/02-build-instructions.md) - Complete build guide
- [Getting Started](../docs/01-getting-started.md) - First-time setup
- [Platform Compatibility](../docs/B2-platform-compatibility.md) - Platform-specific notes
- [Contributing](../docs/B1-contributing.md) - Development guidelines

scripts/agent_test_suite.sh Normal file → Executable file

@@ -1,93 +1,238 @@
#!/bin/bash
# Comprehensive test script for Ollama and Gemini AI providers with tool calling
# Comprehensive test suite for the z3ed AI Agent.
# This script consolidates multiple older test scripts into one.
#
# Usage: ./scripts/agent_test_suite.sh <provider>
# provider: ollama, gemini, or mock
set -e
set -e # Exit immediately if a command exits with a non-zero status.
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# --- Configuration ---
Z3ED_BIN="/Users/scawful/Code/yaze/build_test/bin/z3ed"
ROM_PATH="/Users/scawful/Code/yaze/assets/zelda3.sfc"
TEST_DIR="/Users/scawful/Code/yaze/assets/agent"
TEST_FILES=(
"context_and_followup.txt"
"complex_command_generation.txt"
"error_handling_and_edge_cases.txt"
)
Z3ED="./build_test/bin/z3ed"
ROM="assets/zelda3.sfc"
RESULTS_FILE="/tmp/z3ed_ai_test_results.txt"
# --- Helper Functions ---
print_header() {
echo ""
echo "================================================="
echo "$1"
echo "================================================="
echo "=========================================="
echo " Z3ED AI Provider Test Suite"
echo "=========================================="
echo ""
# Clear results file
> "$RESULTS_FILE"
# Check if z3ed exists
if [ ! -f "$Z3ED" ]; then
echo -e "${RED}✗ z3ed not found at $Z3ED${NC}"
echo " Try building with: cmake --build build_test"
exit 1
fi
echo -e "${GREEN}✓ z3ed found${NC}"
# Check if ROM exists
if [ ! -f "$ROM" ]; then
echo -e "${RED}✗ ROM not found at $ROM${NC}"
exit 1
fi
echo -e "${GREEN}✓ ROM found${NC}"
# Test Ollama availability
OLLAMA_AVAILABLE=false
if command -v ollama &> /dev/null && curl -s http://localhost:11434/api/tags > /dev/null 2>&1; then
if ollama list | grep -q "qwen2.5-coder"; then
OLLAMA_AVAILABLE=true
echo -e "${GREEN}✓ Ollama available (qwen2.5-coder)${NC}"
else
echo -e "${YELLOW}⚠ Ollama available but qwen2.5-coder not found${NC}"
echo " Install with: ollama pull qwen2.5-coder:7b"
fi
else
echo -e "${YELLOW}⚠ Ollama not available${NC}"
fi
# Test Gemini availability
GEMINI_AVAILABLE=false
if [ -n "$GEMINI_API_KEY" ]; then
GEMINI_AVAILABLE=true
echo -e "${GREEN}✓ Gemini API key configured${NC}"
else
echo -e "${YELLOW}⚠ Gemini API key not set${NC}"
echo " Set with: export GEMINI_API_KEY='your-key'"
fi
if [ "$OLLAMA_AVAILABLE" = false ] && [ "$GEMINI_AVAILABLE" = false ]; then
echo -e "${RED}✗ No AI providers available${NC}"
exit 1
fi
echo ""
# Test function
run_test() {
local test_name="$1"
local provider="$2"
local query="$3"
local expected_pattern="$4"
local extra_args="$5"
echo "=========================================="
echo " Test: $test_name"
echo " Provider: $provider"
echo "=========================================="
echo ""
echo "Query: $query"
echo ""
local cmd="$Z3ED agent simple-chat \"$query\" --rom=\"$ROM\" --ai_provider=$provider $extra_args"
echo "Running: $cmd"
echo ""
local output
local exit_code=0
output=$($cmd 2>&1) || exit_code=$?
echo "$output"
echo ""
# Check for expected patterns
local result="UNKNOWN"
if [ $exit_code -ne 0 ]; then
result="FAILED (exit code: $exit_code)"
elif echo "$output" | grep -qi "$expected_pattern"; then
result="PASSED"
echo -e "${GREEN}✓ Response contains expected pattern: '$expected_pattern'${NC}"
else
result="FAILED (pattern not found)"
echo -e "${YELLOW}⚠ Response missing expected pattern: '$expected_pattern'${NC}"
fi
# Check for error indicators
if echo "$output" | grep -qi "error\|failed\|infinite loop"; then
result="FAILED (error detected)"
echo -e "${RED}✗ Error detected in output${NC}"
fi
# Record result
echo "$test_name | $provider | $result" >> "$RESULTS_FILE"
echo ""
echo -e "${BLUE}Result: $result${NC}"
echo ""
sleep 2 # Avoid rate limiting
}
# --- Pre-flight Checks ---
print_header "Performing Pre-flight Checks"
# Test Suite
if [ -z "$1" ]; then
echo "❌ Error: No AI provider specified."
echo "Usage: $0 <ollama|gemini|mock>"
exit 1
fi
PROVIDER=$1
echo "✅ Provider: $PROVIDER"
if [ ! -f "$Z3ED_BIN" ]; then
echo "❌ Error: z3ed binary not found at $Z3ED_BIN"
echo "Please build the project first (e.g., in build_test)."
exit 1
fi
echo "✅ z3ed binary found."
if [ ! -f "$ROM_PATH" ]; then
echo "❌ Error: ROM not found at $ROM_PATH"
exit 1
fi
echo "✅ ROM file found."
if [ "$PROVIDER" == "gemini" ] && [ -z "$GEMINI_API_KEY" ]; then
echo "❌ Error: GEMINI_API_KEY environment variable is not set."
echo "Please set it to your Gemini API key to run this test."
exit 1
fi
if [ "$PROVIDER" == "gemini" ]; then
echo "✅ GEMINI_API_KEY is set."
if [ "$OLLAMA_AVAILABLE" = true ]; then
echo ""
echo "=========================================="
echo " OLLAMA TESTS"
echo "=========================================="
echo ""
run_test "Ollama: Simple Question" "ollama" \
"What dungeons are in this ROM?" \
"dungeon\|palace\|castle"
run_test "Ollama: Sprite Query" "ollama" \
"What sprites are in room 0?" \
"sprite\|room"
run_test "Ollama: Tile Search" "ollama" \
"Where can I find trees in the overworld?" \
"tree\|0x02E\|map\|coordinate"
run_test "Ollama: Map Description" "ollama" \
"Describe overworld map 0" \
"light world\|map\|overworld"
run_test "Ollama: Warp List" "ollama" \
"List the warps in the Light World" \
"warp\|entrance\|exit"
fi
if [ "$PROVIDER" == "ollama" ]; then
if ! pgrep -x "Ollama" > /dev/null && ! pgrep -x "ollama" > /dev/null; then
echo "⚠️ Warning: Ollama server process not found. The script might fail if it's not running."
if [ "$GEMINI_AVAILABLE" = true ]; then
echo ""
echo "=========================================="
echo " GEMINI TESTS"
echo "=========================================="
echo ""
run_test "Gemini: Simple Question" "gemini" \
"What dungeons are in this ROM?" \
"dungeon\|palace\|castle" \
"--gemini_api_key=\"$GEMINI_API_KEY\""
run_test "Gemini: Sprite Query" "gemini" \
"What sprites are in room 0?" \
"sprite\|room" \
"--gemini_api_key=\"$GEMINI_API_KEY\""
run_test "Gemini: Tile Search" "gemini" \
"Where can I find trees in the overworld?" \
"tree\|0x02E\|map\|coordinate" \
"--gemini_api_key=\"$GEMINI_API_KEY\""
run_test "Gemini: Map Description" "gemini" \
"Describe overworld map 0" \
"light world\|map\|overworld" \
"--gemini_api_key=\"$GEMINI_API_KEY\""
run_test "Gemini: Warp List" "gemini" \
"List the warps in the Light World" \
"warp\|entrance\|exit" \
"--gemini_api_key=\"$GEMINI_API_KEY\""
fi
echo ""
echo "=========================================="
echo " TEST SUMMARY"
echo "=========================================="
echo ""
if [ -f "$RESULTS_FILE" ]; then
cat "$RESULTS_FILE"
echo ""
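# `local` is only valid inside a function, so plain variables are used here.
total=$(wc -l < "$RESULTS_FILE" | tr -d ' ')
# grep -c prints 0 itself when nothing matches; `|| true` only guards `set -e`.
passed=$(grep -c "PASSED" "$RESULTS_FILE" || true)
failed=$(grep -c "FAILED" "$RESULTS_FILE" || true)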
echo "Total Tests: $total"
echo -e "${GREEN}Passed: $passed${NC}"
echo -e "${RED}Failed: $failed${NC}"
echo ""
if [ "$passed" -eq "$total" ]; then
echo -e "${GREEN}🎉 All tests passed!${NC}"
elif [ "$passed" -gt 0 ]; then
echo -e "${YELLOW}⚠ Some tests failed. Review output above.${NC}"
else
echo "✅ Ollama server process found."
echo -e "${RED}✗ All tests failed. Check configuration.${NC}"
fi
else
echo -e "${RED}✗ No results file generated${NC}"
fi
# --- Run Test Suite ---
for test_file in "${TEST_FILES[@]}"; do
print_header "Running Test File: $test_file (Provider: $PROVIDER)"
FULL_TEST_PATH="$TEST_DIR/$test_file"
if [ ! -f "$FULL_TEST_PATH" ]; then
echo "❌ Error: Test file not found: $FULL_TEST_PATH"
continue
fi
# Construct the command. Use --quiet for cleaner test logs.
COMMAND="$Z3ED_BIN agent simple-chat --file=$FULL_TEST_PATH --rom=$ROM_PATH --ai_provider=$PROVIDER --quiet"
echo "Executing command..."
echo "--- Agent Output for $test_file ---"
# Execute the command and print its output
eval $COMMAND
echo "--- Test Complete ---"
echo ""
done
print_header "✅ All tests completed successfully!"
echo ""
echo "=========================================="
echo " Recommendations"
echo "=========================================="
echo ""
echo "If tests are failing:"
echo " 1. Check that the ROM is valid and loaded properly"
echo " 2. Verify tool definitions in prompt_catalogue.yaml"
echo " 3. Review system prompts in prompt_builder.cc"
echo " 4. Check AI provider connectivity and quotas"
echo " 5. Examine tool execution logs for errors"
echo ""
echo "For Ollama:"
echo " - Try different models: ollama pull llama3:8b"
echo " - Adjust temperature in ollama_ai_service.cc"
echo ""
echo "For Gemini:"
echo " - Verify API key is valid"
echo " - Check quota at: https://aistudio.google.com"
echo ""
echo "Results saved to: $RESULTS_FILE"
echo ""
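
Note on `run_test`: the version in this diff assembles the command as one string (`cmd="$Z3ED agent simple-chat \"$query\" ..."`) and expands it unquoted, so the shell splits a multi-word query on whitespace and passes the escaped quotes through literally. A minimal sketch of an array-based alternative follows; `run_chat` is a hypothetical name, and the `agent simple-chat`, `--rom`, and `--ai_provider` arguments are copied from the script above.

```bash
#!/usr/bin/env bash
# Hypothetical replacement for the string-built command: each array element
# reaches the program as exactly one argument, spaces included.
run_chat() {
  local bin="$1" rom="$2" provider="$3" query="$4"
  shift 4   # any remaining arguments are passed through as extra flags
  local cmd=("$bin" agent simple-chat "$query" "--rom=$rom" "--ai_provider=$provider" "$@")
  "${cmd[@]}" 2>&1
}
```

Inside `run_test` this could be used as `output=$(run_chat "$Z3ED" "$ROM" "$provider" "$query") || exit_code=$?`.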


@@ -1,79 +0,0 @@
#!/bin/bash
# Test Phase 4: Enhanced Prompting
# Compares command quality with and without few-shot examples
set -e
SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
PROJECT_ROOT="$SCRIPT_DIR/.."
Z3ED_BIN="$PROJECT_ROOT/build/bin/z3ed"
echo "🧪 Phase 4: Enhanced Prompting Test"
echo "======================================"
echo ""
# Color output helpers
GREEN='\033[0;32m'
BLUE='\033[0;34m'
YELLOW='\033[0;33m'
NC='\033[0m' # No Color
# Test prompts
declare -a TEST_PROMPTS=(
"Change palette 0 color 5 to red"
"Place a tree at coordinates (10, 20) on map 0"
"Make all soldiers wear red armor"
"Export palette 0, change color 3 to blue, and import it back"
"Validate the ROM"
)
echo -e "${BLUE}Testing with Enhanced Prompting (few-shot examples)${NC}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
for prompt in "${TEST_PROMPTS[@]}"; do
echo -e "${YELLOW}Prompt:${NC} \"$prompt\""
echo ""
# Test with Gemini if available
if [ -n "$GEMINI_API_KEY" ]; then
echo "Testing with Gemini (enhanced prompting)..."
OUTPUT=$($Z3ED_BIN agent plan --prompt "$prompt" 2>&1)
echo "$OUTPUT"
# Count commands
COMMAND_COUNT=$(echo "$OUTPUT" | grep -c -E "^\s*-" || true)
echo ""
echo "Commands generated: $COMMAND_COUNT"
else
echo "⚠️ GEMINI_API_KEY not set - using MockAIService"
OUTPUT=$($Z3ED_BIN agent plan --prompt "$prompt" 2>&1 || true)
echo "$OUTPUT"
fi
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
done
echo ""
echo "🎉 Enhanced Prompting Tests Complete!"
echo ""
echo "Key Improvements with Phase 4:"
echo " • Few-shot examples show the model how to format commands"
echo " • Comprehensive command reference included in system prompt"
echo " • Tile ID references (tree=0x02E, house=0x0C0, etc.)"
echo " • Multi-step workflow examples (export → modify → import)"
echo " • Clear constraints on output format"
echo ""
echo "Expected Accuracy Improvement:"
echo " • Before: ~60-70% (guessing command syntax)"
echo " • After: ~90%+ (following proven patterns)"
echo ""
echo "Next Steps:"
echo " 1. Review command quality and accuracy"
echo " 2. Add more few-shot examples for edge cases"
echo " 3. Load z3ed-resources.yaml when available"
echo " 4. Add ROM context injection"


@@ -1,153 +0,0 @@
#!/bin/bash
# End-to-end test script for ImGuiTestHarness gRPC service
# Tests all RPC methods to validate Phase 3 implementation
set -e # Exit on error
# Colors for output
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Configuration
YAZE_BIN="./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze"
TEST_PORT=50052
PROTO_PATH="src/app/core/proto"
PROTO_FILE="imgui_test_harness.proto"
ROM_FILE="assets/zelda3.sfc"
echo -e "${YELLOW}=== ImGuiTestHarness E2E Test ===${NC}\n"
# Check if YAZE binary exists
if [ ! -f "$YAZE_BIN" ]; then
echo -e "${RED}Error: YAZE binary not found at $YAZE_BIN${NC}"
echo "Please build with: cmake --build build-grpc-test --target yaze"
exit 1
fi
# Check if ROM file exists
if [ ! -f "$ROM_FILE" ]; then
echo -e "${RED}Error: ROM file not found at $ROM_FILE${NC}"
exit 1
fi
# Check if grpcurl is installed
if ! command -v grpcurl &> /dev/null; then
echo -e "${RED}Error: grpcurl not found${NC}"
echo "Install with: brew install grpcurl"
exit 1
fi
# Kill any existing YAZE instances
echo -e "${YELLOW}Cleaning up existing YAZE instances...${NC}"
killall yaze 2>/dev/null || true
sleep 1
# Start YAZE in background
echo -e "${YELLOW}Starting YAZE with test harness...${NC}"
$YAZE_BIN \
--enable_test_harness \
--test_harness_port=$TEST_PORT \
--rom_file=$ROM_FILE &
YAZE_PID=$!
echo "YAZE PID: $YAZE_PID"
# Wait for server to be ready
echo -e "${YELLOW}Waiting for server to start...${NC}"
sleep 3
# Check if server is running
if ! lsof -i :$TEST_PORT > /dev/null 2>&1; then
echo -e "${RED}Error: Server not listening on port $TEST_PORT${NC}"
kill $YAZE_PID 2>/dev/null || true
exit 1
fi
echo -e "${GREEN}✓ Server started successfully${NC}\n"
# Test counter
TESTS_RUN=0
TESTS_PASSED=0
TESTS_FAILED=0
# Helper function to run a test
run_test() {
local test_name="$1"
local rpc_method="$2"
local request_data="$3"
TESTS_RUN=$((TESTS_RUN + 1))
echo -e "${YELLOW}Test $TESTS_RUN: $test_name${NC}"
if grpcurl -plaintext \
-import-path $PROTO_PATH \
-proto $PROTO_FILE \
-d "$request_data" \
127.0.0.1:$TEST_PORT \
yaze.test.ImGuiTestHarness/$rpc_method 2>&1 | tee /tmp/grpc_test_output.txt; then
# Check for success in response
if grep -q '"success":.*true' /tmp/grpc_test_output.txt || \
grep -q '"message":.*"Pong' /tmp/grpc_test_output.txt || \
grep -q 'yazeVersion' /tmp/grpc_test_output.txt; then
echo -e "${GREEN}✓ PASSED${NC}\n"
TESTS_PASSED=$((TESTS_PASSED + 1))
else
echo -e "${RED}✗ FAILED (unexpected response)${NC}\n"
TESTS_FAILED=$((TESTS_FAILED + 1))
fi
else
echo -e "${RED}✗ FAILED (connection/RPC error)${NC}\n"
TESTS_FAILED=$((TESTS_FAILED + 1))
fi
}
# Run all tests
echo -e "${YELLOW}=== Running RPC Tests ===${NC}\n"
# 1. Ping - Health Check
run_test "Ping (Health Check)" "Ping" '{"message":"test"}'
# 2. Click - Menu Item (Open Overworld Editor)
# Note: Menu items in YAZE use format "menuitem:<Icon> Name"
run_test "Click (Open Overworld Editor)" "Click" '{"target":"menuitem: Overworld Editor","type":"CLICK_TYPE_LEFT"}'
# 3. Wait - Window Visible (Overworld Editor should open)
run_test "Wait (Overworld Editor Window)" "Wait" '{"condition":"window_visible:Overworld","timeout_ms":15000,"poll_interval_ms":100}'
# 4. Assert - Window Visible (Overworld Editor should be open)
run_test "Assert (Overworld Editor Visible)" "Assert" '{"condition":"visible:Overworld"}'
# 5. Click - Another menu item (Dungeon Editor)
run_test "Click (Open Dungeon Editor)" "Click" '{"target":"menuitem: Dungeon Editor","type":"CLICK_TYPE_LEFT"}'
# 6. Screenshot - Not Implemented (stub)
echo -e "${YELLOW}Test 6: Screenshot (Not Implemented - Stub)${NC}"
echo -e "${YELLOW}(Skipping - proto field mismatch needs fix)${NC}\n"
TESTS_RUN=$((TESTS_RUN + 1))
# Summary
echo -e "${YELLOW}=== Test Summary ===${NC}"
echo "Tests Run: $TESTS_RUN"
echo -e "${GREEN}Tests Passed: $TESTS_PASSED${NC}"
if [ $TESTS_FAILED -gt 0 ]; then
echo -e "${RED}Tests Failed: $TESTS_FAILED${NC}"
fi
echo ""
# Cleanup
echo -e "${YELLOW}Cleaning up...${NC}"
kill $YAZE_PID 2>/dev/null || true
rm -f /tmp/grpc_test_output.txt
sleep 1
# Exit with appropriate code
if [ $TESTS_FAILED -gt 0 ]; then
echo -e "${RED}Some tests failed${NC}"
exit 1
else
echo -e "${GREEN}All tests passed!${NC}"
exit 0
fi
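
The script above waits a fixed three seconds before checking the port. A polling helper is one possible alternative; `wait_until` below is a hypothetical name, not an existing function in the repository.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper: retry a command until it succeeds or the
# attempt budget runs out.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 0; i < attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0          # condition met
    fi
    sleep "$delay"      # pause before the next attempt
  done
  return 1              # gave up after all attempts
}
```

With the variables from the script, `wait_until 20 0.5 lsof -i ":$TEST_PORT"` would poll for up to roughly ten seconds and return as soon as the port is open.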


@@ -1,180 +0,0 @@
#!/bin/bash
# Test script to verify ImGuiTestHarness gRPC service integration
# Ensures the GUI automation infrastructure is working
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
YAZE_APP="${PROJECT_ROOT}/build/bin/yaze.app/Contents/MacOS/yaze"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
echo "========================================="
echo "ImGui Test Harness Verification"
echo "========================================="
echo ""
# Check if YAZE is built with gRPC support
if [ ! -f "$YAZE_APP" ]; then
echo -e "${RED}✗ YAZE application not found at $YAZE_APP${NC}"
echo ""
echo "Build with gRPC support:"
echo " cmake -B build -DYAZE_WITH_GRPC=ON -DYAZE_WITH_JSON=ON"
echo " cmake --build build --target yaze"
exit 1
fi
echo -e "${GREEN}✓ YAZE application found${NC}"
echo ""
# Check if gRPC libraries are linked
echo "Checking gRPC dependencies..."
echo "------------------------------"
if otool -L "$YAZE_APP" 2>/dev/null | grep -q "libgrpc"; then
echo -e "${GREEN}✓ gRPC libraries linked${NC}"
else
echo -e "${YELLOW}⚠ gRPC libraries may not be linked${NC}"
echo " This might be expected if gRPC is statically linked"
fi
# Check for test harness service code
TEST_HARNESS_IMPL="${PROJECT_ROOT}/src/app/core/service/imgui_test_harness_service.cc"
if [ -f "$TEST_HARNESS_IMPL" ]; then
echo -e "${GREEN}✓ Test harness implementation found${NC}"
else
echo -e "${RED}✗ Test harness implementation not found${NC}"
exit 1
fi
echo ""
# Check if the service is properly integrated
echo "Verifying test harness integration..."
echo "--------------------------------------"
# Look for the service registration in the codebase
if grep -q "ImGuiTestHarnessServer" "${PROJECT_ROOT}/src/app/core/service/imgui_test_harness_service.h"; then
echo -e "${GREEN}✓ ImGuiTestHarnessServer class defined${NC}"
else
echo -e "${RED}✗ ImGuiTestHarnessServer class not found${NC}"
exit 1
fi
# Check for gRPC server initialization
if grep -rq "ImGuiTestHarnessServer.*Start" "${PROJECT_ROOT}/src/app" 2>/dev/null; then
echo -e "${GREEN}✓ Server startup code found${NC}"
else
echo -e "${YELLOW}⚠ Could not verify server startup code${NC}"
fi
echo ""
# Test gRPC port availability
echo "Testing gRPC server availability..."
echo "------------------------------------"
GRPC_PORT=50051
echo "Checking if port $GRPC_PORT is available..."
if lsof -Pi :$GRPC_PORT -sTCP:LISTEN -t >/dev/null 2>&1; then
echo -e "${YELLOW}⚠ Port $GRPC_PORT is already in use${NC}"
echo " If YAZE is running, this is expected"
SERVER_RUNNING=true
else
echo -e "${GREEN}✓ Port $GRPC_PORT is available${NC}"
SERVER_RUNNING=false
fi
echo ""
# Interactive test option
if [ "$SERVER_RUNNING" = false ]; then
echo "========================================="
echo "Interactive Test Options"
echo "========================================="
echo ""
echo "The test harness server is not currently running."
echo ""
echo "To test the full integration:"
echo ""
echo "1. Start YAZE in one terminal:"
echo " $YAZE_APP"
echo ""
echo "2. In another terminal, verify the gRPC server:"
echo " lsof -Pi :$GRPC_PORT -sTCP:LISTEN"
echo ""
echo "3. Test with z3ed GUI automation:"
echo " z3ed agent test --prompt 'Open Overworld editor'"
echo ""
else
echo "========================================="
echo "Live Server Test"
echo "========================================="
echo ""
echo -e "${GREEN}✓ gRPC server appears to be running on port $GRPC_PORT${NC}"
echo ""
# Try to connect to the server
if command -v grpcurl &> /dev/null; then
echo "Testing server connection with grpcurl..."
if grpcurl -plaintext localhost:$GRPC_PORT list 2>&1 | grep -q "yaze.test.ImGuiTestHarness"; then
echo -e "${GREEN}✅ ImGuiTestHarness service is available!${NC}"
echo ""
echo "Available RPC methods:"
grpcurl -plaintext localhost:$GRPC_PORT list yaze.test.ImGuiTestHarness 2>&1 | sed 's/^/ /'
else
echo -e "${YELLOW}⚠ Could not verify service availability${NC}"
fi
else
echo -e "${YELLOW}⚠ grpcurl not installed, skipping connection test${NC}"
echo " Install with: brew install grpcurl"
fi
fi
echo ""
echo "========================================="
echo "Summary"
echo "========================================="
echo ""
echo "Test Harness Components:"
echo " [✓] Source files present"
echo " [✓] gRPC integration compiled"
if [ "$SERVER_RUNNING" = true ]; then
echo " [✓] Server running on port $GRPC_PORT"
else
echo " [ ] Server not currently running"
fi
echo ""
echo -e "The ImGuiTestHarness service is ${GREEN}ready${NC} for:"
echo " - Widget discovery and introspection"
echo " - Automated GUI testing via z3ed agent test"
echo " - Recording and playback of user interactions"
echo ""
# Additional checks for agent chat widget
echo "Checking for Agent Chat Widget..."
echo "----------------------------------"
if grep -rq "AgentChatWidget" "${PROJECT_ROOT}/src/app/gui" 2>/dev/null; then
echo -e "${GREEN}✓ AgentChatWidget found in GUI code${NC}"
else
echo -e "${YELLOW}⚠ AgentChatWidget not yet implemented${NC}"
echo " This is the next priority item in the roadmap"
echo " Location: src/app/gui/debug/agent_chat_widget.{h,cc}"
fi
echo ""
echo "Next Steps:"
echo " 1. Run YAZE and verify gRPC server starts: $YAZE_APP"
echo " 2. Test conversation agent: z3ed agent test-conversation"
echo " 3. Implement AgentChatWidget for GUI integration"
echo ""


@@ -1,128 +0,0 @@
#!/bin/bash
# End-to-end smoke test for test introspection CLI commands
# Requires YAZE to be built with gRPC support (build-grpc-test preset)
set -euo pipefail
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
Z3ED_BIN="./build-grpc-test/bin/z3ed"
YAZE_BIN="./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze"
ROM_FILE="assets/zelda3.sfc"
TEST_PORT="${TEST_PORT:-50052}"
PROMPT="Open Overworld editor and verify it loads"
HOST="localhost"
STATUS_LOG="$(mktemp /tmp/z3ed_status_XXXX.log)"
RESULTS_LOG="$(mktemp /tmp/z3ed_results_XXXX.log)"
LIST_LOG="$(mktemp /tmp/z3ed_list_XXXX.log)"
RUN_LOG="$(mktemp /tmp/z3ed_run_XXXX.log)"
cleanup() {
if [[ -n "${YAZE_PID:-}" ]]; then
kill "${YAZE_PID}" 2>/dev/null || true
fi
rm -f "$STATUS_LOG" "$RESULTS_LOG" "$LIST_LOG" "$RUN_LOG"
}
trap cleanup EXIT
if [[ ! -x "$Z3ED_BIN" ]]; then
echo -e "${RED}Error:${NC} z3ed binary not found at $Z3ED_BIN"
echo "Build with: cmake --build build-grpc-test --target z3ed"
exit 1
fi
if [[ ! -x "$YAZE_BIN" ]]; then
echo -e "${RED}Error:${NC} YAZE binary not found at $YAZE_BIN"
echo "Build with: cmake --build build-grpc-test --target yaze"
exit 1
fi
if [[ ! -f "$ROM_FILE" ]]; then
echo -e "${RED}Error:${NC} ROM file not found at $ROM_FILE"
exit 1
fi
echo -e "${YELLOW}=== Test Harness Introspection E2E ===${NC}"
# Ensure no previous YAZE instance is running
killall yaze 2>/dev/null || true
sleep 1
echo -e "${BLUE}→ Starting YAZE (port $TEST_PORT)...${NC}"
"$YAZE_BIN" \
--enable_test_harness \
--test_harness_port="$TEST_PORT" \
--rom_file="$ROM_FILE" &
YAZE_PID=$!
ready=0
for attempt in {1..20}; do
if lsof -i ":$TEST_PORT" >/dev/null 2>&1; then
ready=1
break
fi
sleep 0.5
done
if [[ "$ready" -ne 1 ]]; then
echo -e "${RED}Error:${NC} ImGuiTestHarness server did not start on port $TEST_PORT"
exit 1
fi
echo -e "${GREEN}✓ Harness ready${NC}"
echo -e "${BLUE}→ Running agent test workflow: $PROMPT${NC}"
if ! "$Z3ED_BIN" agent test --prompt "$PROMPT" --host "$HOST" --port "$TEST_PORT" | tee "$RUN_LOG"; then
echo -e "${RED}Error:${NC} agent test run failed"
exit 1
fi
PRIMARY_TEST_ID=$(sed -n 's/.*Test ID: \([^][]*\).*/\1/p' "$RUN_LOG" | tail -n 1 | tr -d ' ]')
if [[ -z "$PRIMARY_TEST_ID" ]]; then
echo -e "${RED}Error:${NC} Unable to extract test id from agent test output"
exit 1
fi
echo -e "${GREEN}✓ Captured Test ID:${NC} $PRIMARY_TEST_ID"
echo -e "${BLUE}→ Checking status${NC}"
"$Z3ED_BIN" agent test status --test-id "$PRIMARY_TEST_ID" --host "$HOST" --port "$TEST_PORT" | tee "$STATUS_LOG"
if ! grep -q "Status: " "$STATUS_LOG"; then
echo -e "${RED}Error:${NC} status command did not return a status"
exit 1
fi
if grep -q "Status: PASSED" "$STATUS_LOG"; then
echo -e "${GREEN}✓ Status indicates PASS${NC}"
else
echo -e "${YELLOW}! Status is not PASSED (see $STATUS_LOG)${NC}"
fi
echo -e "${BLUE}→ Fetching detailed results (YAML)${NC}"
"$Z3ED_BIN" agent test results --test-id "$PRIMARY_TEST_ID" --include-logs --host "$HOST" --port "$TEST_PORT" | tee "$RESULTS_LOG"
if ! grep -q "success: " "$RESULTS_LOG"; then
echo -e "${RED}Error:${NC} results command failed"
exit 1
fi
echo -e "${BLUE}→ Listing recent grpc tests${NC}"
"$Z3ED_BIN" agent test list --category grpc --limit 5 --host "$HOST" --port "$TEST_PORT" | tee "$LIST_LOG"
if ! grep -q "Test ID:" "$LIST_LOG"; then
echo -e "${RED}Error:${NC} list command returned no tests"
exit 1
fi
echo -e "${GREEN}✓ Introspection commands completed successfully${NC}"
echo -e "${YELLOW}Artifacts:${NC}"
echo " Status log: $STATUS_LOG"
echo " Results log: $RESULTS_LOG"
echo " List log: $LIST_LOG"
echo -e "${GREEN}All checks passed!${NC}"
exit 0
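
The readiness loop in the script above polls `lsof` up to 20 times at 0.5-second intervals before giving up. A minimal sketch of the same polling pattern as a reusable helper follows. The name `wait_until` and its argument order are hypothetical and not part of the original script.

```bash
#!/bin/bash
# Sketch only: wait_until is a hypothetical helper, not part of the YAZE scripts.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 0; i < attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Demonstration with a command that always succeeds.
wait_until 3 0 true && echo "ready"
```

In the script above, the loop could plausibly be replaced with `wait_until 20 0.5 lsof -i ":$TEST_PORT"`, followed by the existing error message and `exit 1` on failure.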

@@ -1,276 +0,0 @@
#!/bin/bash
# Test Remote Control - Practical Agent Workflows
#
# This script demonstrates the agent's ability to remotely control YAZE
# and perform real editing tasks like drawing tiles, moving entities, etc.
#
# Usage: ./scripts/test_remote_control.sh
set -e # Exit on error
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
PROTO_DIR="$PROJECT_ROOT/src/app/core/proto"
PROTO_FILE="imgui_test_harness.proto"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Test harness connection
HOST="127.0.0.1"
PORT="50052"
echo -e "${BLUE}=== YAZE Remote Control Test ===${NC}\n"
# Check if grpcurl is available
if ! command -v grpcurl &> /dev/null; then
echo -e "${RED}Error: grpcurl not found${NC}"
echo "Install with: brew install grpcurl"
exit 1
fi
# Helper function to make gRPC calls
grpc_call() {
local method=$1
local data=$2
grpcurl -plaintext \
-import-path "$PROTO_DIR" \
-proto "$PROTO_FILE" \
-d "$data" \
"$HOST:$PORT" \
"yaze.test.ImGuiTestHarness/$method"
}
# Helper function to print test status
print_test() {
local test_num=$1
local test_name=$2
echo -e "\n${BLUE}Test $test_num: $test_name${NC}"
}
print_success() {
echo -e "${GREEN}✓ PASSED${NC}"
}
print_failure() {
echo -e "${RED}✗ FAILED: $1${NC}"
}
# Test 0: Check server connection
print_test "0" "Server Connection"
if grpc_call "Ping" '{"message":"hello"}' &> /dev/null; then
print_success
else
print_failure "Server not responding"
echo -e "${YELLOW}Start the test harness:${NC}"
echo "./build-grpc-test/bin/yaze.app/Contents/MacOS/yaze \\"
echo " --enable_test_harness \\"
echo " --test_harness_port=50052 \\"
echo " --rom_file=assets/zelda3.sfc"
exit 1
fi
echo -e "\n${BLUE}=== Practical Agent Workflows ===${NC}\n"
# Workflow 1: Activate Draw Tile Mode
print_test "1" "Activate Draw Tile Mode"
echo "Action: Click DrawTile button in Overworld toolset"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:DrawTile",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now paint tiles on the overworld"
else
print_failure "Could not activate draw tile mode"
echo "Response: $response"
fi
sleep 1
# Workflow 2: Select Pan Mode
print_test "2" "Select Pan Mode"
echo "Action: Click Pan button to enable map navigation"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Pan",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now pan the overworld map"
else
print_failure "Could not activate pan mode"
echo "Response: $response"
fi
sleep 1
# Workflow 3: Open Tile16 Editor
print_test "3" "Open Tile16 Editor"
echo "Action: Click Tile16Editor button to open editor"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Tile16Editor",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Tile16 Editor window should now be open"
echo "Agent can select tiles for drawing"
else
print_failure "Could not open Tile16 Editor"
echo "Response: $response"
fi
sleep 1
# Workflow 4: Test Entrances Mode
print_test "4" "Switch to Entrances Mode"
echo "Action: Click Entrances button"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Entrances",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now edit overworld entrances"
else
print_failure "Could not activate entrances mode"
echo "Response: $response"
fi
sleep 1
# Workflow 5: Test Exits Mode
print_test "5" "Switch to Exits Mode"
echo "Action: Click Exits button"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Exits",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now edit overworld exits"
else
print_failure "Could not activate exits mode"
echo "Response: $response"
fi
sleep 1
# Workflow 6: Test Sprites Mode
print_test "6" "Switch to Sprites Mode"
echo "Action: Click Sprites button"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Sprites",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now edit sprite placements"
else
print_failure "Could not activate sprites mode"
echo "Response: $response"
fi
sleep 1
# Workflow 7: Test Items Mode
print_test "7" "Switch to Items Mode"
echo "Action: Click Items button"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:Items",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Agent can now place items on the overworld"
else
print_failure "Could not activate items mode"
echo "Response: $response"
fi
sleep 1
# Workflow 8: Test Zoom Controls
print_test "8" "Test Zoom Controls"
echo "Action: Zoom in on the map"
response=$(grpc_call "Click" '{
"target": "Overworld/Toolset/button:ZoomIn",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Zoom level increased"
# Zoom back out
sleep 0.5
if grpc_call "Click" '{
"target": "Overworld/Toolset/button:ZoomOut",
"type": "LEFT"
}' &> /dev/null; then
echo "Zoom level restored"
else
echo -e "${YELLOW}Could not zoom back out${NC}"
fi
else
print_failure "Could not zoom"
echo "Response: $response"
fi
# Workflow 9: Legacy Format Fallback Test
print_test "9" "Legacy Format Fallback"
echo "Action: Test old-style widget reference"
response=$(grpc_call "Click" '{
"target": "button:Overworld",
"type": "LEFT"
}' 2>&1) || true # with set -e, a failed call would otherwise abort before the check below
if echo "$response" | grep -q '"success": true'; then
print_success
echo "Legacy format still works (backwards compatible)"
else
# This is expected if Overworld Editor isn't in main window
echo -e "${YELLOW}Legacy format may not work (expected)${NC}"
fi
# Summary
echo -e "\n${BLUE}=== Test Summary ===${NC}\n"
echo "Remote control capabilities verified:"
echo " ✓ Mode switching (Draw, Pan, Entrances, Exits, Sprites, Items)"
echo " ✓ Tool opening (Tile16 Editor)"
echo " ✓ Zoom controls"
echo " ✓ Widget registry integration"
echo ""
echo "Agent can now:"
echo " • Switch between editing modes"
echo " • Open auxiliary editors"
echo " • Control view settings"
echo " • Prepare for complex editing operations"
echo ""
echo "Next steps for full automation:"
echo " 1. Add canvas click support (x,y coordinates)"
echo " 2. Add tile selection in Tile16 Editor"
echo " 3. Add entity dragging support"
echo " 4. Implement workflow chaining (mode + select + draw)"
echo ""
echo -e "${GREEN}Remote control system functional!${NC}"
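
The summary above prints its check marks unconditionally, regardless of which workflows reported a failure. One possible way to tie the summary to the actual outcomes is to count results as the checks run. The sketch below is illustrative only: `run_check`, `PASSED`, and `FAILED` are hypothetical names that do not appear in the original script, and `true` and `false` stand in for real gRPC calls.

```bash
#!/bin/bash
# Sketch only: a pass/fail tally; all names here are hypothetical.
PASSED=0
FAILED=0

# Run a check command, then count and report the outcome without aborting.
run_check() {
  local name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    PASSED=$((PASSED + 1))
    echo "PASS: $name"
  else
    FAILED=$((FAILED + 1))
    echo "FAIL: $name"
  fi
}

# Stand-in commands; a real script would pass a gRPC call here instead.
run_check "always succeeds" true
run_check "always fails" false

echo "Summary: $PASSED passed, $FAILED failed"
```

Because the command runs inside an `if` condition, a failing check does not trigger `set -e`, so the tally keeps going and the final line reflects what actually happened.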