Codebase Investigation: Yaze vs Mesen2 SNES Emulation

Executive Summary

This investigation compares the architecture of yaze (Yet Another Zelda Editor's emulator) with Mesen2 (a high-accuracy multi-system emulator). The goal is to identify areas where yaze can be improved to approach Mesen2's level of accuracy.

Fundamental Difference:

  • Yaze is an instruction-level / scanline-based emulator. It executes entire CPU instructions at once and catches up other subsystems (APU, PPU) at specific checkpoints (memory access, scanline end).
  • Mesen2 is a bus-level / cycle-based emulator. It advances the system state (timers, DMA, interrupts) on every single CPU bus cycle (read/write/idle), allowing for sub-instruction synchronization.

Detailed Comparison

1. CPU Timing & Bus Arbitration

| Feature | Yaze (Snes::RunOpcode, Cpu::ExecuteInstruction) | Mesen2 (SnesCpu::Exec, Read/Write) |
|---|---|---|
| Granularity | Executes full instruction, then adds cycles. Batches bus cycles around memory accesses. | Executes micro-ops. Read/Write calls ProcessCpuCycle to advance system state per byte. |
| Timing | Snes::CpuRead runs access_time - 4 cycles, reads, then 4 cycles. | SnesCpu::Read determines speed (GetCpuSpeed), runs cycles, then reads. |
| Interrupts | Checked at instruction boundaries (RunOpcode). | Checked on every cycle (ProcessCpuCycle -> DetectNmiSignalEdge). |

Improvement Opportunity: The current yaze approach of batching cycles in CpuRead (RunCycles(access_time - 4)) is a good approximation but fails for edge cases where an IRQ or DMA might trigger during an instruction's execution (e.g., between operand bytes).

  • Recommendation: Refactor Cpu::ReadByte / Cpu::WriteByte callbacks to advance the system clock before returning data. This moves yaze closer to a cycle-stepped architecture without rewriting the entire core state machine.
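As a rough illustration of this recommendation, the sketch below shows a read callback that advances the clock around each byte access, so an IRQ can become visible between two operand bytes. `SketchBus`, `irq_at_cycle`, and the single-cycle timer check are hypothetical stand-ins, not yaze's real classes; only the `access_time - 4` / `4` split is taken from the comparison above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal bus; names do not mirror yaze's actual classes.
struct SketchBus {
  uint64_t master_cycles = 0;
  uint64_t irq_at_cycle = UINT64_MAX;  // assumed: cycle at which a timer IRQ fires
  bool irq_pending = false;
  std::vector<uint8_t> ram = std::vector<uint8_t>(0x10000, 0);

  // Advance the system one master cycle at a time so timers can fire
  // in the middle of an instruction, not only at its boundary.
  void RunCycles(int n) {
    for (int i = 0; i < n; ++i) {
      ++master_cycles;
      if (master_cycles >= irq_at_cycle) irq_pending = true;
    }
  }

  // Read callback: the clock moves before and after the data is sampled,
  // using the access_time - 4 / 4 split described for Snes::CpuRead.
  uint8_t Read(uint16_t addr, int access_time) {
    RunCycles(access_time - 4);
    uint8_t value = ram[addr];
    RunCycles(4);
    return value;
  }
};
```

With `irq_at_cycle = 10` and two 8-cycle reads, the IRQ is still clear after the first byte and pending after the second, which is the sub-instruction visibility a per-instruction cycle count cannot provide.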

2. PPU Rendering & Raster Effects

| Feature | Yaze (Ppu::RunLine) | Mesen2 (SnesPpu) |
|---|---|---|
| Rendering | Scanline-based. Renders full line at H=512 (next_horiz_event). | Dot-based (effectively). Handles cycle-accurate register writes. |
| Mid-Line Changes | Register writes (WriteBBus) update internal state immediately, but rendering only happens later. Raster effects (H-IRQ) will apply to the whole line or be missed. | Register writes catch up the renderer to the current dot before applying changes. |

Improvement Opportunity: This is the biggest accuracy gap. Games like Tales of Phantasia or Star Ocean that use raster effects (changing color/brightness/windowing mid-scanline) will not render correctly in yaze.

  • Recommendation: Implement a "Just-In-Time" PPU Catch-up.
    • Add a Ppu::CatchUp(uint16_t h_pos) method.
    • Call ppu_.CatchUp(memory_.h_pos()) inside Snes::WriteBBus (PPU register writes).
    • CatchUp should render pixels from last_rendered_x to current_x, then update last_rendered_x.
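The catch-up idea can be sketched as follows. This is a minimal model under stated assumptions, not the real Ppu: `SketchPpu` tracks only a brightness register, the visible line is assumed to be 256 dots wide, and each dot is assumed to take 4 master cycles starting at H=0.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr int kSketchLineWidth = 256;    // assumed visible dots per line
constexpr int kSketchCyclesPerDot = 4;   // simplifying assumption

// Hypothetical stand-in for the PPU; only one register is modeled.
class SketchPpu {
 public:
  void StartLine() { last_rendered_x_ = 0; }

  // Render dots [last_rendered_x_, current_x) with the register state
  // that was in effect before the pending write.
  void CatchUp(uint16_t h_pos) {
    int current_x = std::min<int>(h_pos / kSketchCyclesPerDot, kSketchLineWidth);
    for (int x = last_rendered_x_; x < current_x; ++x) line_[x] = brightness_;
    last_rendered_x_ = std::max(last_rendered_x_, current_x);
  }

  // Register write: catch up first, then apply the new value.
  void WriteBrightness(uint16_t h_pos, uint8_t value) {
    CatchUp(h_pos);
    brightness_ = value;
  }

  // End of line: render whatever is left.
  void EndLine() { CatchUp(kSketchLineWidth * kSketchCyclesPerDot); }

  const std::array<uint8_t, kSketchLineWidth>& line() const { return line_; }

 private:
  std::array<uint8_t, kSketchLineWidth> line_{};
  int last_rendered_x_ = 0;
  uint8_t brightness_ = 0;
};
```

A write at H=512 then splits the line at dot 128: dots to the left keep the old brightness and dots to the right use the new one, instead of the whole line taking a single value.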

3. APU Synchronization

| Feature | Yaze (Snes::CatchUpApu) | Mesen2 (Spc::IncCycleCount) |
|---|---|---|
| Sync Method | Catch-up. Runs APU to match CPU master cycles on every port read/write (ReadBBus/WriteBBus). | Cycle interleaved. |
| Ratio | Fixed-point math (kApuCyclesNumerator...). | Floating point ratio derived from sample rates. |

Assessment: yaze's APU synchronization strategy is sound. Calling CatchUpApu on every I/O port access ($2140-$2143) ensures the SPC700 sees port data at the correct time relative to the CPU. The handshake tracker (ApuHandshakeTracker) confirms that this logic works well for boot sequences.

  • Recommendation: No major architectural changes needed here. Focus on Spc700 opcode accuracy and DSP mixing quality.
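For readers unfamiliar with the catch-up approach, here is a minimal sketch of integer-ratio synchronization. The constants are assumptions chosen for illustration (an NTSC master clock of 21,477,272 Hz and an SPC700 clock of 1,024,000 Hz) and are not yaze's actual numerator and denominator values; `SketchApuSync` is likewise hypothetical.

```cpp
#include <cstdint>

// Assumed clock rates for illustration only.
constexpr uint64_t kSketchApuHz = 1024000;
constexpr uint64_t kSketchMasterHz = 21477272;

struct SketchApuSync {
  uint64_t apu_cycles_run = 0;

  // Returns how many APU cycles must run so the APU matches the given
  // master cycle count. Integer math avoids floating-point drift; the
  // multiplication would overflow only after days of emulated time.
  uint64_t CatchUp(uint64_t master_cycles) {
    uint64_t target = master_cycles * kSketchApuHz / kSketchMasterHz;
    uint64_t delta = target > apu_cycles_run ? target - apu_cycles_run : 0;
    apu_cycles_run += delta;
    return delta;
  }
};
```

Computing the target from the absolute master cycle count, instead of accumulating per-call deltas, keeps rounding error from building up across many small port accesses.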

4. Input & Auto-Joypad Reading

| Feature | Yaze (Snes::HandleInput) | Mesen2 (InternalRegisters::ProcessAutoJoypad) |
|---|---|---|
| Timing | Runs once at VBlank start. Populates all registers immediately. | Runs continuously over ~4224 master clocks during VBlank. |
| Accuracy | Games reading $4218 too early in VBlank will see finished data (correct values, wrong timing). | Games reading too early see 0 or partial data. |

Improvement Opportunity: Some games rely on the duration of the auto-joypad read to time their VBlank routines.

  • Recommendation: Implement a state machine for auto-joypad reading in Snes::RunCycle. Instead of filling port_auto_read_ instantly, fill it bit-by-bit over the correct number of cycles.
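One possible shape for such a state machine is sketched below. It assumes the ~4224 master clocks are spread evenly over 16 bits (264 cycles per bit) and that bits shift in most-significant first; both are simplifications, and `SketchAutoJoypad` is a hypothetical type, not yaze's existing input code.

```cpp
#include <cstdint>

constexpr int kSketchAutoJoyCycles = 4224;  // figure from the comparison above
constexpr int kSketchAutoJoyBits = 16;
// Assumption: bits are spaced evenly across the read window.
constexpr int kSketchCyclesPerBit = kSketchAutoJoyCycles / kSketchAutoJoyBits;

struct SketchAutoJoypad {
  uint16_t latched = 0;  // controller state sampled when the read starts
  uint16_t result = 0;   // what a read of the result register would return
  int bits_done = kSketchAutoJoyBits;
  int cycle_in_bit = 0;

  bool busy() const { return bits_done < kSketchAutoJoyBits; }

  // Called at VBlank start instead of filling the register instantly.
  void Start(uint16_t pad_state) {
    latched = pad_state;
    result = 0;
    bits_done = 0;
    cycle_in_bit = 0;
  }

  // Called once per master cycle; shifts in one bit every 264 cycles.
  void RunCycle() {
    if (!busy()) return;
    if (++cycle_in_bit < kSketchCyclesPerBit) return;
    cycle_in_bit = 0;
    uint16_t bit = (latched >> (15 - bits_done)) & 1;
    result = static_cast<uint16_t>((result << 1) | bit);
    ++bits_done;
  }
};
```

A game that polls early now observes a partially shifted value and a busy flag, and only sees the complete word once the full window has elapsed.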

5. AI & Editor Integration Architecture

To support AI-driven debugging and dynamic editor integration (e.g., "Teleport & Test"), the emulator must evolve from a "black box" to an observable, controllable simulation.

A. Dynamic State Injection (The "Test Sprite" Button)

Currently, testing requires a full reset or loading a binary save state. We need a State Patching API to programmatically set up game scenarios.

  • Proposal: Emulator::InjectState(const GameStatePatch& patch)
    • GameStatePatch: A structure containing target WRAM values (e.g., Room ID, Coordinates, Inventory) and CPU state (PC location).
    • Workflow:
      1. Reset & Fast-Boot: Reset emulator and fast-forward past the boot sequence (e.g., until GameMode RAM indicates "Gameplay").
      2. Injection: Pause execution and write the patch values directly to WRAM/SRAM.
      3. Resume: Hand control to the user or AI agent.
    • Use Case: "Test this sprite in Room 0x12." -> The editor builds a patch setting ROOM_ID=0x12, LINK_X=StartPos, and injects it.
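A minimal sketch of what the patch structure and injection step might look like follows. The field layout, the `SketchEmulator` type, the bounds check, and the example offset are all assumptions made for illustration; the proposal itself does not fix any of them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One byte to write into WRAM (hypothetical layout).
struct WramWrite {
  uint32_t offset;
  uint8_t value;
};

// Hypothetical patch: target WRAM values plus an optional PC override.
struct GameStatePatch {
  std::vector<WramWrite> wram_writes;
  bool has_pc = false;
  uint16_t pc = 0;
};

struct SketchEmulator {
  std::vector<uint8_t> wram = std::vector<uint8_t>(0x20000, 0);  // 128 KiB
  uint16_t pc = 0;
  bool paused = false;

  // Validates the whole patch first so a bad offset cannot leave the
  // game state half-written, then applies it while paused.
  bool InjectState(const GameStatePatch& patch) {
    for (std::size_t i = 0; i < patch.wram_writes.size(); ++i) {
      if (patch.wram_writes[i].offset >= wram.size()) return false;
    }
    paused = true;
    for (std::size_t i = 0; i < patch.wram_writes.size(); ++i) {
      wram[patch.wram_writes[i].offset] = patch.wram_writes[i].value;
    }
    if (patch.has_pc) pc = patch.pc;
    paused = false;  // resume: hand control back to the user or agent
    return true;
  }
};
```

For the "Room 0x12" use case, the editor would push a write such as `WramWrite{room_id_offset, 0x12}`, where `room_id_offset` comes from the project's symbol data; the offset is left as a placeholder here.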

B. Semantic Inspection Layer (The "AI Eyes")

Multimodal models struggle with raw pixel streams for precise logic debugging. They need a "semantic overlay" that grounds visuals in game data.

  • Proposal: SemanticIntrospectionEngine
    • Symbol Mapping: Uses SymbolProvider and MemoryMap (from yaze project) to decode raw RAM into meaningful concepts.
    • Structured Context: Expose a method GetSemanticState() returning JSON/Struct:
      {
        "mode": "Underworld",
        "room_id": 24,
        "link": { "x": 1200, "y": 800, "state": "SwordSlash", "hp": 16 },
        "sprites": [
          { "id": 0, "type": "Stalfos", "x": 1250, "y": 800, "state": "Active", "hp": 2 }
        ]
      }
      
    • Visual Grounding: Provide an API to generate "debug frames" where hitboxes and interaction zones are drawn over the game feed. This allows Vision Models to correlate "Link is overlapping Stalfos" visually with Link.x ~= Stalfos.x logically.
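To make the symbol-mapping step concrete, the sketch below decodes a few fields from a WRAM snapshot and serializes them. The three offsets are placeholders, not verified ALTTP addresses, and `SketchPlayerState`, `DecodePlayer`, and `ToJson` are hypothetical helpers; a real engine would obtain addresses from SymbolProvider.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Placeholder offsets; real values would come from the symbol map.
constexpr uint32_t kSketchLinkYOffset = 0x20;
constexpr uint32_t kSketchLinkXOffset = 0x22;
constexpr uint32_t kSketchRoomIdOffset = 0xA0;

struct SketchPlayerState {
  uint16_t x;
  uint16_t y;
  uint8_t room_id;
};

// The 65816 stores 16-bit values little-endian.
inline uint16_t ReadWord(const std::vector<uint8_t>& wram, uint32_t offset) {
  return static_cast<uint16_t>(wram[offset] | (wram[offset + 1] << 8));
}

inline SketchPlayerState DecodePlayer(const std::vector<uint8_t>& wram) {
  SketchPlayerState s;
  s.x = ReadWord(wram, kSketchLinkXOffset);
  s.y = ReadWord(wram, kSketchLinkYOffset);
  s.room_id = wram[kSketchRoomIdOffset];
  return s;
}

// Hand-rolled JSON keeps the sketch dependency-free.
inline std::string ToJson(const SketchPlayerState& s) {
  return "{\"room_id\":" + std::to_string(s.room_id) +
         ",\"link\":{\"x\":" + std::to_string(s.x) +
         ",\"y\":" + std::to_string(s.y) + "}}";
}
```

Separating decoding from serialization lets the same struct feed both the JSON context for an agent and the overlay drawing used for visual grounding.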

C. Headless & Fast-Forward Control

For automated verification (e.g., "Does entering this room crash?"), rendering overhead is unnecessary.

  • Proposal: Decoupled Rendering Pipeline
    • Allow Emulator to run in "Headless Mode":
      • PPU renders to a simplified RAM buffer (or skips rendering if only logic is being tested).
      • Audio backend is disabled or set to NullBackend.
      • Execution speed is uncapped (limited only by the host CPU).
    • RunUntil(Condition) API: Allow the agent to execute complex commands like:
      • RunUntil(PC == 0x8000) (Breakpoint match)
      • RunUntil(Memory[0x10] == 0x01) (Game mode change)
      • RunUntil(FrameCount == Target + 60) (Time duration)
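One way to express RunUntil is as a predicate over emulator state plus a step budget, so a condition that never becomes true cannot hang an automated run. Everything in the sketch is hypothetical: `SketchCore` fakes execution by incrementing the PC and counting a frame every 100 steps, and the step limit is an addition not mentioned in the proposal.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical core with just enough state to evaluate conditions.
struct SketchCore {
  uint16_t pc = 0;
  uint64_t frame_count = 0;
  uint8_t memory[0x100] = {};

  // Fake instruction step: a real core would execute an opcode here.
  void StepInstruction() {
    ++pc;
    if (pc % 100 == 0) ++frame_count;
  }
};

enum class RunResult { kConditionMet, kStepLimit };

// Steps the core until the predicate holds or the budget is exhausted.
inline RunResult RunUntil(SketchCore& core,
                          const std::function<bool(const SketchCore&)>& condition,
                          uint64_t max_steps) {
  for (uint64_t i = 0; i < max_steps; ++i) {
    if (condition(core)) return RunResult::kConditionMet;
    core.StepInstruction();
  }
  return condition(core) ? RunResult::kConditionMet : RunResult::kStepLimit;
}
```

A breakpoint-style call then looks like `RunUntil(core, [](const SketchCore& c) { return c.pc == 0x8000; }, 100000)`, and the memory and frame-count conditions listed above take the same form with a different lambda.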

Recent Improvements

SDL3 Audio Backend (2025-11-23)

A new SDL3 audio backend has been implemented to modernize the emulator's audio subsystem:

Implementation Details:

  • Stream-based architecture: Replaces SDL2's queue-based approach with SDL3's SDL_AudioStream API
  • Files added:
    • src/app/emu/audio/sdl3_audio_backend.h/cc - Complete SDL3 backend implementation
    • src/app/platform/sdl_compat.h - Cross-version compatibility layer
  • Factory integration: AudioBackendFactory now supports BackendType::SDL3
  • Resampling support: Native handling of the SPC700's 32 kHz output to device rate
  • Volume control: Optimized fast-path for unity gain (common case)

Benefits:

  • Potential for lower audio latency with stream-based processing
  • Better synchronization between audio and video subsystems
  • Native resampling reduces CPU overhead for rate conversion
  • Future-proof architecture aligned with SDL3's design philosophy

Testing:

  • Unit tests added in test/unit/sdl3_audio_backend_test.cc
  • Conditional compilation via YAZE_USE_SDL3 flag ensures backward compatibility
  • Seamless fallback to SDL2 when SDL3 is unavailable

Action Plan

To upgrade yaze for both accuracy and AI integration, follow this implementation order:

  1. PPU Catch-up (Accuracy - High Impact)
    • Modify Ppu to track last_rendered_x.
    • Split RunLine into RenderRange(start_x, end_x).
    • Inject ppu_.CatchUp() calls in Snes::WriteBBus.

  2. Semantic Inspection API (AI - High Impact)
    • Create SemanticIntrospectionEngine class.
    • Connect it to Memory and SymbolProvider.
    • Implement basic GetPlayerState() and GetSpriteState() using known ALTTP RAM offsets.

  3. State Injection API (Integration - Medium Impact)
    • Implement Emulator::InjectState.
    • Add specific "presets" for common ALTTP testing scenarios (e.g., "Dungeon Test", "Overworld Test").

  4. Refined CPU Timing (Accuracy - Low Impact, High Effort)
    • Audit Cpu::ExecuteInstruction for missing callbacks_.idle() calls.
    • Ensure "dummy read" cycles in RMW instructions trigger side effects.

  5. Auto-Joypad Progressive Read (Accuracy - Low Impact)
    • Change auto_joy_timer_ to drive bit-shifting in port_auto_read_ registers.