From b7c642611a1e4e6d581c4c3cd2f52f5ea11c1540 Mon Sep 17 00:00:00 2001 From: scawful Date: Fri, 10 Oct 2025 17:23:22 -0400 Subject: [PATCH] docs: add comprehensive APU timing analysis - Document current bstep mechanism and its fragility - Identify root cause of handshake timing failure - Design atomic instruction execution approach - Plan fixed-point cycle ratio conversion - Outline implementation strategy Relates to #APU-timing-fix --- docs/apu-timing-analysis.md | 324 ++++++++++++++++++++++++++++++++++++ 1 file changed, 324 insertions(+) create mode 100644 docs/apu-timing-analysis.md diff --git a/docs/apu-timing-analysis.md b/docs/apu-timing-analysis.md new file mode 100644 index 00000000..198e3385 --- /dev/null +++ b/docs/apu-timing-analysis.md @@ -0,0 +1,324 @@ +# APU Timing Fix - Technical Analysis + +**Branch:** `feature/apu-timing-fix` +**Date:** October 10, 2025 +**Status:** Analysis Complete, Implementation Starting + +## Problem Summary + +The APU fails to load and play music because the SPC700 gets stuck during the initial CPU-APU handshake. This handshake uploads the sound driver from ROM to APU RAM. The timing desynchronization causes infinite loops detected by the watchdog timer. + +--- + +## Current Implementation Analysis + +### 1. **Cycle Counting System** (`spc700.cc`) + +**Current Approach:** +```cpp +// In spc700.h line 87: +int last_opcode_cycles_ = 0; + +// In RunOpcode() line 80: +last_opcode_cycles_ = spc700_cycles[opcode]; // Static lookup +``` + +**Problem:** The `spc700_cycles[]` array provides BASELINE cycle counts only. It does NOT account for: +- Addressing mode variations +- Page boundary crossings (+1 cycle) +- Branch taken vs not taken (+2 cycles if taken) +- Memory access penalties + +### 2. **The `bstep` Mechanism** (`spc700.cc`) + +**What is `bstep`?** + +`bstep` is a "business step" counter used to spread complex multi-step instructions across multiple calls to `RunOpcode()`. + +**Example from line 1108-1115 (opcode 0xCB - MOVSY dp):** +```cpp +case 0xcb: { // movsy dp + if (bstep == 0) { + adr = dp(); // Save address for bstep=1 + } + if (adr == 0x00F4 && bstep == 1) { + LOG_DEBUG("SPC", "MOVSY writing Y=$%02X to F4 at PC=$%04X", Y, PC); + } + MOVSY(adr); // Use saved address + break; +} +``` + +The `MOVSY()` function internally increments `bstep` to track progress: +- `bstep=0`: Call `dp()` to get address +- `bstep=1`: Actually perform the write +- `bstep=2`: Reset to 0, instruction complete + +**Why this is fragile:** +1. **Non-atomic execution**: An instruction takes 2-3 calls to `RunOpcode()` to complete +2. **State leakage**: If `bstep` gets out of sync, all future instructions fail +3. **Cycle accounting errors**: Cycles are consumed incrementally, not atomically +4. **Debugging nightmare**: Hard to trace when an instruction "really" executes + +### 3. **APU Main Loop** (`apu.cc:73-143`) + +**Current implementation:** +```cpp +void Apu::RunCycles(uint64_t master_cycles) { + const double ratio = memory_.pal_timing() ? apuCyclesPerMasterPal : apuCyclesPerMaster; + uint64_t master_delta = master_cycles - g_last_master_cycles; + g_last_master_cycles = master_cycles; + + const uint64_t target_apu_cycles = cycles_ + static_cast(master_delta * ratio); + + while (cycles_ < target_apu_cycles) { + spc700_.RunOpcode(); // Variable cycles + int spc_cycles = spc700_.GetLastOpcodeCycles(); + + for (int i = 0; i < spc_cycles; ++i) { + Cycle(); // Advance DSP/timers + } + } +} +``` + +**Problems:** +1. **Floating-point `ratio`**: `apuCyclesPerMaster` is `double` (line 17), causing precision drift +2. **Opcode-level granularity**: Advances by opcode, not by cycle +3. **No sub-cycle accuracy**: Can't model instructions that span multiple cycles + +### 4. **Floating-Point Precision** (`apu.cc:17`) + +```cpp +static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0); +``` + +**Calculation:** +- Numerator: 32040 * 32 = 1,025,280 +- Denominator: 1364 * 262 * 60.0 = 21,437,280 +- Result: ~0.04783 (floating point) + +**Problem:** Over thousands of cycles, tiny rounding errors accumulate, causing timing drift. + +--- + +## Root Cause: Handshake Timing Failure + +### The Handshake Protocol + +1. **APU Ready**: SPC700 writes `$AA` to `$F4`, `$BB` to `$F5` +2. **CPU Waits**: Main CPU polls for `$BBAA` +3. **CPU Initiates**: Writes `$CC` to APU input port +4. **APU Acknowledges**: SPC700 sees `$CC`, prepares to receive +5. **Byte Transfer Loop**: CPU sends byte, waits for echo confirmation, sends next byte + +### Where It Gets Stuck + +The SPC700 enters an infinite loop because: +- **SPC700 is waiting** for a byte from CPU (hasn't arrived yet) +- **CPU is waiting** for acknowledgment from SPC700 (already sent, but missed) + +This happens because cycle counts are off by 1-2 cycles per instruction, which accumulates over the ~500-1000 instructions in the handshake. + +--- + +## Solution Design + +### Phase 1: Atomic Instruction Execution + +**Goal:** Eliminate `bstep` mechanism entirely. + +**New Design:** +```cpp +// New function signature +int Spc700::Step() { + if (reset_wanted_) { /* handle reset */ return 8; } + if (stopped_) { /* handle stop */ return 2; } + + // Fetch opcode + uint8_t opcode = ReadOpcode(); + + // Calculate EXACT cycle cost upfront + int cycles = CalculatePreciseCycles(opcode); + + // Execute instruction COMPLETELY + ExecuteInstructionAtomic(opcode); + + return cycles; // Return exact cycles consumed +} +``` + +**Benefits:** +- One call = one complete instruction +- Cycles calculated before execution +- No state leakage between calls +- Easier debugging + +### Phase 2: Precise Cycle Calculation + +**New function:** +```cpp +int Spc700::CalculatePreciseCycles(uint8_t opcode) { + int base_cycles = spc700_cycles[opcode]; + + // Account for addressing mode penalties + switch (opcode) { + case 0x10: case 0x30: /* ... branches ... */ + // Branches: +2 cycles if taken (handled in execution) + break; + case 0x15: case 0x16: /* ... abs+X, abs+Y ... */ + // Check if page boundary crossed (+1 cycle) + if (will_cross_page_boundary(opcode)) { + base_cycles += 1; + } + break; + // ... more addressing mode checks ... + } + + return base_cycles; +} +``` + +### Phase 3: Refactor `Apu::RunCycles` to Cycle Budget Model + +**New implementation:** +```cpp +void Apu::RunCycles(uint64_t master_cycles) { + // 1. Calculate target using FIXED-POINT ratio (Phase 4) + uint64_t master_delta = master_cycles - g_last_master_cycles; + g_last_master_cycles = master_cycles; + + // 2. Fixed-point conversion (avoiding floating point) + uint64_t target_apu_cycles = cycles_ + (master_delta * kApuCyclesNumerator) / kApuCyclesDenominator; + + // 3. Run until budget exhausted + while (cycles_ < target_apu_cycles) { + // 4. Execute ONE instruction atomically + int spc_cycles_consumed = spc700_.Step(); + + // 5. Advance DSP/timers for each cycle + for (int i = 0; i < spc_cycles_consumed; ++i) { + Cycle(); // Ticks DSP, timers, increments cycles_ + } + } +} +``` + +### Phase 4: Fixed-Point Cycle Ratio + +**Replace floating-point with integer ratio:** +```cpp +// Old (apu.cc:17) +static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0); + +// New +static constexpr uint64_t kApuCyclesNumerator = 32040 * 32; // 1,025,280 +static constexpr uint64_t kApuCyclesDenominator = 1364 * 262 * 60; // 21,437,280 +``` + +**Conversion:** +```cpp +apu_cycles = (master_cycles * kApuCyclesNumerator) / kApuCyclesDenominator; +``` + +**Benefits:** +- Perfect precision (no floating-point drift) +- Integer arithmetic is faster +- Deterministic across platforms + +--- + +## Implementation Plan + +### Step 1: Add `Spc700::Step()` Function +- Add new `Step()` method to `spc700.h` +- Implement atomic instruction execution +- Keep `RunOpcode()` temporarily for compatibility + +### Step 2: Implement Precise Cycle Calculation +- Create `CalculatePreciseCycles()` helper +- Handle branch penalties +- Handle page boundary crossings +- Add tests to verify against known SPC700 timings + +### Step 3: Eliminate `bstep` Mechanism +- Refactor all multi-step instructions (0xCB, 0xD0, 0xD7, etc.) +- Remove `bstep` variable +- Remove `step` variable +- Verify all 256 opcodes work atomically + +### Step 4: Refactor `Apu::RunCycles` +- Switch to cycle budget model +- Use `Step()` instead of `RunOpcode()` +- Add cycle budget logging for debugging + +### Step 5: Convert to Fixed-Point Ratio +- Replace `apuCyclesPerMaster` double +- Use integer numerator/denominator +- Add constants for PAL timing too + +### Step 6: Testing +- Test with vanilla Zelda3 ROM +- Verify handshake completes +- Verify music plays +- Check for watchdog timeouts +- Measure timing accuracy + +--- + +## Files to Modify + +1. **src/app/emu/audio/spc700.h** + - Add `int Step()` method + - Add `int CalculatePreciseCycles(uint8_t opcode)` + - Remove `bstep` and `step` variables + +2. **src/app/emu/audio/spc700.cc** + - Implement `Step()` + - Implement `CalculatePreciseCycles()` + - Refactor `ExecuteInstructions()` to be atomic + - Remove all `bstep` logic + +3. **src/app/emu/audio/apu.h** + - Update cycle ratio constants + +4. **src/app/emu/audio/apu.cc** + - Refactor `RunCycles()` to use `Step()` + - Convert to fixed-point ratio + - Remove floating-point arithmetic + +5. **test/unit/spc700_timing_test.cc** (new) + - Test cycle accuracy for all opcodes + - Test handshake simulation + - Verify no regressions + +--- + +## Success Criteria + +- [ ] All SPC700 instructions execute atomically (one `Step()` call) +- [ ] Cycle counts accurate to ±1 cycle per instruction +- [ ] APU handshake completes without watchdog timeout +- [ ] Music loads and plays in vanilla Zelda3 +- [ ] No floating-point drift over long emulation sessions +- [ ] Unit tests pass for all 256 opcodes + +--- + +## Next Steps + +1. ✅ Create feature branch +2. ✅ Analyze current implementation +3. ⏳ Implement `Spc700::Step()` function +4. ⏳ Add precise cycle calculation +5. ⏳ Refactor `Apu::RunCycles` +6. ⏳ Convert to fixed-point ratio +7. ⏳ Test with Zelda3 ROM +8. ⏳ Write unit tests + +--- + +**References:** +- [SPC700 Opcode Reference](https://problemkaputt.de/fullsnes.htm#snesapucpu) +- [APU Timing Documentation](https://wiki.superfamicom.org/spc700-reference) +- docs/E6-emulator-improvements.md