docs: add comprehensive APU timing analysis

- Document current bstep mechanism and its fragility
- Identify root cause of handshake timing failure
- Design atomic instruction execution approach
- Plan fixed-point cycle ratio conversion
- Outline implementation strategy

Relates to #APU-timing-fix
This commit is contained in:
scawful
2025-10-10 17:23:22 -04:00
parent cc7e374911
commit b7c642611a

324
docs/apu-timing-analysis.md Normal file
View File

@@ -0,0 +1,324 @@
# APU Timing Fix - Technical Analysis
**Branch:** `feature/apu-timing-fix`
**Date:** October 10, 2025
**Status:** Analysis Complete, Implementation Starting
## Problem Summary
The APU fails to load and play music because the SPC700 gets stuck during the initial CPU-APU handshake. This handshake uploads the sound driver from ROM to APU RAM. The timing desynchronization causes infinite loops detected by the watchdog timer.
---
## Current Implementation Analysis
### 1. **Cycle Counting System** (`spc700.cc`)
**Current Approach:**
```cpp
// In spc700.h line 87:
int last_opcode_cycles_ = 0;
// In RunOpcode() line 80:
last_opcode_cycles_ = spc700_cycles[opcode]; // Static lookup
```
**Problem:** The `spc700_cycles[]` array provides BASELINE cycle counts only. It does NOT account for:
- Addressing mode variations
- Page boundary crossings (+1 cycle)
- Branch taken vs not taken (+2 cycles if taken)
- Memory access penalties
### 2. **The `bstep` Mechanism** (`spc700.cc`)
**What is `bstep`?**
`bstep` is a "business step" counter used to spread complex multi-step instructions across multiple calls to `RunOpcode()`.
**Example from line 1108-1115 (opcode 0xCB - MOVSY dp):**
```cpp
case 0xcb: { // movsy dp
if (bstep == 0) {
adr = dp(); // Save address for bstep=1
}
if (adr == 0x00F4 && bstep == 1) {
LOG_DEBUG("SPC", "MOVSY writing Y=$%02X to F4 at PC=$%04X", Y, PC);
}
MOVSY(adr); // Use saved address
break;
}
```
The `MOVSY()` function internally increments `bstep` to track progress:
- `bstep=0`: Call `dp()` to get address
- `bstep=1`: Actually perform the write
- `bstep=2`: Reset to 0, instruction complete
**Why this is fragile:**
1. **Non-atomic execution**: An instruction takes 2-3 calls to `RunOpcode()` to complete
2. **State leakage**: If `bstep` gets out of sync, all future instructions fail
3. **Cycle accounting errors**: Cycles are consumed incrementally, not atomically
4. **Debugging nightmare**: Hard to trace when an instruction "really" executes
### 3. **APU Main Loop** (`apu.cc:73-143`)
**Current implementation:**
```cpp
void Apu::RunCycles(uint64_t master_cycles) {
const double ratio = memory_.pal_timing() ? apuCyclesPerMasterPal : apuCyclesPerMaster;
uint64_t master_delta = master_cycles - g_last_master_cycles;
g_last_master_cycles = master_cycles;
const uint64_t target_apu_cycles = cycles_ + static_cast<uint64_t>(master_delta * ratio);
while (cycles_ < target_apu_cycles) {
spc700_.RunOpcode(); // Variable cycles
int spc_cycles = spc700_.GetLastOpcodeCycles();
for (int i = 0; i < spc_cycles; ++i) {
Cycle(); // Advance DSP/timers
}
}
}
```
**Problems:**
1. **Floating-point `ratio`**: `apuCyclesPerMaster` is `double` (line 17), causing precision drift
2. **Opcode-level granularity**: Advances by opcode, not by cycle
3. **No sub-cycle accuracy**: Can't model instructions that span multiple cycles
### 4. **Floating-Point Precision** (`apu.cc:17`)
```cpp
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
```
**Calculation:**
- Numerator: 32040 * 32 = 1,025,280
- Denominator: 1364 * 262 * 60.0 = 21,437,280
- Result: ~0.04783 (floating point)
**Problem:** Over thousands of cycles, tiny rounding errors accumulate, causing timing drift.
---
## Root Cause: Handshake Timing Failure
### The Handshake Protocol
1. **APU Ready**: SPC700 writes `$AA` to `$F4`, `$BB` to `$F5`
2. **CPU Waits**: Main CPU polls for `$BBAA`
3. **CPU Initiates**: Writes `$CC` to APU input port
4. **APU Acknowledges**: SPC700 sees `$CC`, prepares to receive
5. **Byte Transfer Loop**: CPU sends byte, waits for echo confirmation, sends next byte
### Where It Gets Stuck
The SPC700 enters an infinite loop because:
- **SPC700 is waiting** for a byte from CPU (hasn't arrived yet)
- **CPU is waiting** for acknowledgment from SPC700 (already sent, but missed)
This happens because cycle counts are off by 1-2 cycles per instruction, which accumulates over the ~500-1000 instructions in the handshake.
---
## Solution Design
### Phase 1: Atomic Instruction Execution
**Goal:** Eliminate `bstep` mechanism entirely.
**New Design:**
```cpp
// New function signature
int Spc700::Step() {
if (reset_wanted_) { /* handle reset */ return 8; }
if (stopped_) { /* handle stop */ return 2; }
// Fetch opcode
uint8_t opcode = ReadOpcode();
// Calculate EXACT cycle cost upfront
int cycles = CalculatePreciseCycles(opcode);
// Execute instruction COMPLETELY
ExecuteInstructionAtomic(opcode);
return cycles; // Return exact cycles consumed
}
```
**Benefits:**
- One call = one complete instruction
- Cycles calculated before execution
- No state leakage between calls
- Easier debugging
### Phase 2: Precise Cycle Calculation
**New function:**
```cpp
int Spc700::CalculatePreciseCycles(uint8_t opcode) {
int base_cycles = spc700_cycles[opcode];
// Account for addressing mode penalties
switch (opcode) {
case 0x10: case 0x30: /* ... branches ... */
// Branches: +2 cycles if taken (handled in execution)
break;
case 0x15: case 0x16: /* ... abs+X, abs+Y ... */
// Check if page boundary crossed (+1 cycle)
if (will_cross_page_boundary(opcode)) {
base_cycles += 1;
}
break;
// ... more addressing mode checks ...
}
return base_cycles;
}
```
### Phase 3: Refactor `Apu::RunCycles` to Cycle Budget Model
**New implementation:**
```cpp
void Apu::RunCycles(uint64_t master_cycles) {
// 1. Calculate target using FIXED-POINT ratio (Phase 4)
uint64_t master_delta = master_cycles - g_last_master_cycles;
g_last_master_cycles = master_cycles;
// 2. Fixed-point conversion (avoiding floating point)
uint64_t target_apu_cycles = cycles_ + (master_delta * kApuCyclesNumerator) / kApuCyclesDenominator;
// 3. Run until budget exhausted
while (cycles_ < target_apu_cycles) {
// 4. Execute ONE instruction atomically
int spc_cycles_consumed = spc700_.Step();
// 5. Advance DSP/timers for each cycle
for (int i = 0; i < spc_cycles_consumed; ++i) {
Cycle(); // Ticks DSP, timers, increments cycles_
}
}
}
```
### Phase 4: Fixed-Point Cycle Ratio
**Replace floating-point with integer ratio:**
```cpp
// Old (apu.cc:17)
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
// New
static constexpr uint64_t kApuCyclesNumerator = 32040 * 32; // 1,025,280
static constexpr uint64_t kApuCyclesDenominator = 1364 * 262 * 60; // 21,437,280
```
**Conversion:**
```cpp
apu_cycles = (master_cycles * kApuCyclesNumerator) / kApuCyclesDenominator;
```
**Benefits:**
- Perfect precision (no floating-point drift)
- Integer arithmetic is faster
- Deterministic across platforms
---
## Implementation Plan
### Step 1: Add `Spc700::Step()` Function
- Add new `Step()` method to `spc700.h`
- Implement atomic instruction execution
- Keep `RunOpcode()` temporarily for compatibility
### Step 2: Implement Precise Cycle Calculation
- Create `CalculatePreciseCycles()` helper
- Handle branch penalties
- Handle page boundary crossings
- Add tests to verify against known SPC700 timings
### Step 3: Eliminate `bstep` Mechanism
- Refactor all multi-step instructions (0xCB, 0xD0, 0xD7, etc.)
- Remove `bstep` variable
- Remove `step` variable
- Verify all 256 opcodes work atomically
### Step 4: Refactor `Apu::RunCycles`
- Switch to cycle budget model
- Use `Step()` instead of `RunOpcode()`
- Add cycle budget logging for debugging
### Step 5: Convert to Fixed-Point Ratio
- Replace `apuCyclesPerMaster` double
- Use integer numerator/denominator
- Add constants for PAL timing too
### Step 6: Testing
- Test with vanilla Zelda3 ROM
- Verify handshake completes
- Verify music plays
- Check for watchdog timeouts
- Measure timing accuracy
---
## Files to Modify
1. **src/app/emu/audio/spc700.h**
- Add `int Step()` method
- Add `int CalculatePreciseCycles(uint8_t opcode)`
- Remove `bstep` and `step` variables
2. **src/app/emu/audio/spc700.cc**
- Implement `Step()`
- Implement `CalculatePreciseCycles()`
- Refactor `ExecuteInstructions()` to be atomic
- Remove all `bstep` logic
3. **src/app/emu/audio/apu.h**
- Update cycle ratio constants
4. **src/app/emu/audio/apu.cc**
- Refactor `RunCycles()` to use `Step()`
- Convert to fixed-point ratio
- Remove floating-point arithmetic
5. **test/unit/spc700_timing_test.cc** (new)
- Test cycle accuracy for all opcodes
- Test handshake simulation
- Verify no regressions
---
## Success Criteria
- [ ] All SPC700 instructions execute atomically (one `Step()` call)
- [ ] Cycle counts accurate to ±1 cycle per instruction
- [ ] APU handshake completes without watchdog timeout
- [ ] Music loads and plays in vanilla Zelda3
- [ ] No floating-point drift over long emulation sessions
- [ ] Unit tests pass for all 256 opcodes
---
## Next Steps
1. ✅ Create feature branch
2. ✅ Analyze current implementation
3. ⏳ Implement `Spc700::Step()` function
4. ⏳ Add precise cycle calculation
5. ⏳ Refactor `Apu::RunCycles`
6. ⏳ Convert to fixed-point ratio
7. ⏳ Test with Zelda3 ROM
8. ⏳ Write unit tests
---
**References:**
- [SPC700 Opcode Reference](https://problemkaputt.de/fullsnes.htm#snesapucpu)
- [APU Timing Documentation](https://wiki.superfamicom.org/spc700-reference)
- docs/E6-emulator-improvements.md