backend-infra-engineer: Release v0.3.3 snapshot
This commit is contained in:
461
docs/internal/research/apu-timing-analysis.md
Normal file
461
docs/internal/research/apu-timing-analysis.md
Normal file
@@ -0,0 +1,461 @@
|
||||
# APU Timing Fix - Technical Analysis
|
||||
|
||||
**Branch:** `feature/apu-timing-fix`
|
||||
**Date:** October 10, 2025
|
||||
**Status:** Implemented - Core Timing Fixed (Minor Audio Glitches Remain)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Status
|
||||
|
||||
**Completed:**
|
||||
- Atomic `Step()` function for SPC700
|
||||
- Fixed-point cycle ratio (no floating-point drift)
|
||||
- Cycle budget model in APU
|
||||
- Removed `bstep` mechanism from instructions.cc
|
||||
- Cycle-accurate instruction implementations
|
||||
- Proper branch timing (+2 cycles when taken)
|
||||
- Dummy read/write cycles for MOV and RMW instructions
|
||||
|
||||
**Known Issues:**
|
||||
- Some audio glitches/distortion during playback
|
||||
- Minor timing inconsistencies under investigation
|
||||
- Can be improved in future iterations
|
||||
|
||||
**Note:** The APU now executes correctly and music plays, but audio quality can be further refined.
|
||||
|
||||
## Problem Summary
|
||||
|
||||
The APU fails to load and play music because the SPC700 gets stuck during the initial CPU-APU handshake. This handshake uploads the sound driver from ROM to APU RAM. The timing desynchronization causes infinite loops detected by the watchdog timer.
|
||||
|
||||
---
|
||||
|
||||
## Current Implementation Analysis
|
||||
|
||||
### 1. **Cycle Counting System** (`spc700.cc`)
|
||||
|
||||
**Current Approach:**
|
||||
```cpp
|
||||
// In spc700.h line 87:
|
||||
int last_opcode_cycles_ = 0;
|
||||
|
||||
// In RunOpcode() line 80:
|
||||
last_opcode_cycles_ = spc700_cycles[opcode]; // Static lookup
|
||||
```
|
||||
|
||||
**Problem:** The `spc700_cycles[]` array provides BASELINE cycle counts only. It does NOT account for:
|
||||
- Addressing mode variations
|
||||
- Page boundary crossings (+1 cycle)
|
||||
- Branch taken vs not taken (+2 cycles if taken)
|
||||
- Memory access penalties
|
||||
|
||||
### 2. **The `bstep` Mechanism** (`spc700.cc`)
|
||||
|
||||
**What is `bstep`?**
|
||||
|
||||
`bstep` is a "business step" counter used to spread complex multi-step instructions across multiple calls to `RunOpcode()`.
|
||||
|
||||
**Example from line 1108-1115 (opcode 0xCB - MOVSY dp):**
|
||||
```cpp
|
||||
case 0xcb: { // movsy dp
|
||||
if (bstep == 0) {
|
||||
adr = dp(); // Save address for bstep=1
|
||||
}
|
||||
if (adr == 0x00F4 && bstep == 1) {
|
||||
LOG_DEBUG("SPC", "MOVSY writing Y=$%02X to F4 at PC=$%04X", Y, PC);
|
||||
}
|
||||
MOVSY(adr); // Use saved address
|
||||
break;
|
||||
}
|
||||
```
|
||||
|
||||
The `MOVSY()` function internally increments `bstep` to track progress:
|
||||
- `bstep=0`: Call `dp()` to get address
|
||||
- `bstep=1`: Actually perform the write
|
||||
- `bstep=2`: Reset to 0, instruction complete
|
||||
|
||||
**Why this is fragile:**
|
||||
1. **Non-atomic execution**: An instruction takes 2-3 calls to `RunOpcode()` to complete
|
||||
2. **State leakage**: If `bstep` gets out of sync, all future instructions fail
|
||||
3. **Cycle accounting errors**: Cycles are consumed incrementally, not atomically
|
||||
4. **Debugging nightmare**: Hard to trace when an instruction "really" executes
|
||||
|
||||
### 3. **APU Main Loop** (`apu.cc:73-143`)
|
||||
|
||||
**Current implementation:**
|
||||
```cpp
|
||||
void Apu::RunCycles(uint64_t master_cycles) {
|
||||
const double ratio = memory_.pal_timing() ? apuCyclesPerMasterPal : apuCyclesPerMaster;
|
||||
uint64_t master_delta = master_cycles - g_last_master_cycles;
|
||||
g_last_master_cycles = master_cycles;
|
||||
|
||||
const uint64_t target_apu_cycles = cycles_ + static_cast<uint64_t>(master_delta * ratio);
|
||||
|
||||
while (cycles_ < target_apu_cycles) {
|
||||
spc700_.RunOpcode(); // Variable cycles
|
||||
int spc_cycles = spc700_.GetLastOpcodeCycles();
|
||||
|
||||
for (int i = 0; i < spc_cycles; ++i) {
|
||||
Cycle(); // Advance DSP/timers
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Problems:**
|
||||
1. **Floating-point `ratio`**: `apuCyclesPerMaster` is `double` (line 17), causing precision drift
|
||||
2. **Opcode-level granularity**: Advances by opcode, not by cycle
|
||||
3. **No sub-cycle accuracy**: Can't model instructions that span multiple cycles
|
||||
|
||||
### 4. **Floating-Point Precision** (`apu.cc:17`)
|
||||
|
||||
```cpp
|
||||
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
|
||||
```
|
||||
|
||||
**Calculation:**
|
||||
- Numerator: 32040 * 32 = 1,025,280
|
||||
- Denominator: 1364 * 262 * 60.0 = 21,437,280
|
||||
- Result: ~0.04783 (floating point)
|
||||
|
||||
**Problem:** Over thousands of cycles, tiny rounding errors accumulate, causing timing drift.
|
||||
|
||||
---
|
||||
|
||||
## Root Cause: Handshake Timing Failure
|
||||
|
||||
### The Handshake Protocol
|
||||
|
||||
1. **APU Ready**: SPC700 writes `$AA` to `$F4`, `$BB` to `$F5`
|
||||
2. **CPU Waits**: Main CPU polls for `$BBAA`
|
||||
3. **CPU Initiates**: Writes `$CC` to APU input port
|
||||
4. **APU Acknowledges**: SPC700 sees `$CC`, prepares to receive
|
||||
5. **Byte Transfer Loop**: CPU sends byte, waits for echo confirmation, sends next byte
|
||||
|
||||
### Where It Gets Stuck
|
||||
|
||||
The SPC700 enters an infinite loop because:
|
||||
- **SPC700 is waiting** for a byte from CPU (hasn't arrived yet)
|
||||
- **CPU is waiting** for acknowledgment from SPC700 (already sent, but missed)
|
||||
|
||||
This happens because cycle counts are off by 1-2 cycles per instruction, which accumulates over the ~500-1000 instructions in the handshake.
|
||||
|
||||
---
|
||||
|
||||
## LakeSnes Comparison Analysis
|
||||
|
||||
### What LakeSnes Does Right
|
||||
|
||||
**1. Atomic Instruction Execution (spc.c:73-93)**
|
||||
```c
|
||||
void spc_runOpcode(Spc* spc) {
|
||||
if(spc->resetWanted) { /* handle reset */ return; }
|
||||
if(spc->stopped) { spc_idleWait(spc); return; }
|
||||
|
||||
uint8_t opcode = spc_readOpcode(spc);
|
||||
spc_doOpcode(spc, opcode); // COMPLETE instruction in one call
|
||||
}
|
||||
```
|
||||
|
||||
**Key insight:** LakeSnes executes instructions **atomically** - no `bstep`, no `step`, no state leakage.
|
||||
|
||||
**2. Cycle Tracking via Callbacks (spc.c:406-409)**
|
||||
```c
|
||||
static void spc_movsy(Spc* spc, uint16_t adr) {
|
||||
spc_read(spc, adr); // Calls apu_cycle()
|
||||
spc_write(spc, adr, spc->y); // Calls apu_cycle()
|
||||
}
|
||||
```
|
||||
|
||||
Every `spc_read()`, `spc_write()`, and `spc_idle()` call triggers `apu_cycle()`, which:
|
||||
- Advances APU cycle counter
|
||||
- Ticks DSP every 32 cycles
|
||||
- Updates timers
|
||||
|
||||
**3. Simple Addressing Mode Functions (spc.c:189-275)**
|
||||
```c
|
||||
static uint16_t spc_adrDp(Spc* spc) {
|
||||
return spc_readOpcode(spc) | (spc->p << 8);
|
||||
}
|
||||
|
||||
static uint16_t spc_adrDpx(Spc* spc) {
|
||||
uint16_t res = ((spc_readOpcode(spc) + spc->x) & 0xff) | (spc->p << 8);
|
||||
spc_idle(spc); // Extra cycle for indexed addressing
|
||||
return res;
|
||||
}
|
||||
```
|
||||
|
||||
Each memory access and idle call automatically advances cycles.
|
||||
|
||||
**4. APU Main Loop (apu.c:73-82)**
|
||||
```c
|
||||
int apu_runCycles(Apu* apu, int wantedCycles) {
|
||||
int runCycles = 0;
|
||||
uint32_t startCycles = apu->cycles;
|
||||
while(runCycles < wantedCycles) {
|
||||
spc_runOpcode(apu->spc);
|
||||
runCycles += (uint32_t) (apu->cycles - startCycles);
|
||||
startCycles = apu->cycles;
|
||||
}
|
||||
return runCycles;
|
||||
}
|
||||
```
|
||||
|
||||
**Problem:** This approach tracks cycles by **delta**, which works because every memory access calls `apu_cycle()`.
|
||||
|
||||
### Where LakeSnes Falls Short (And How We Can Do Better)
|
||||
|
||||
**1. No Explicit Cycle Return**
|
||||
- LakeSnes relies on tracking `cycles` delta after each opcode
|
||||
- Doesn't return precise cycle count from `spc_runOpcode()`
|
||||
- Makes it hard to validate cycle accuracy per instruction
|
||||
|
||||
**Our improvement:** Return exact cycle count from `Step()`:
|
||||
```cpp
|
||||
int Spc700::Step() {
|
||||
uint8_t opcode = ReadOpcode();
|
||||
int cycles = CalculatePreciseCycles(opcode);
|
||||
ExecuteInstructionAtomic(opcode);
|
||||
return cycles; // EXPLICIT return
|
||||
}
|
||||
```
|
||||
|
||||
**2. Implicit Cycle Counting**
|
||||
- Cycles accumulated implicitly through callbacks
|
||||
- Hard to debug when cycles are wrong
|
||||
- No way to verify cycle accuracy per instruction
|
||||
|
||||
**Our improvement:** Explicit cycle budget model in `Apu::RunCycles()`:
|
||||
```cpp
|
||||
while (cycles_ < target_apu_cycles) {
|
||||
int spc_cycles = spc700_.Step(); // Explicit cycle count
|
||||
for (int i = 0; i < spc_cycles; ++i) {
|
||||
Cycle(); // Explicit cycle advancement
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**3. No Fixed-Point Ratio**
|
||||
- LakeSnes also uses floating-point (implicitly in SNES main loop)
|
||||
- Subject to same precision drift issues
|
||||
|
||||
**Our improvement:** Integer numerator/denominator for perfect precision.
|
||||
|
||||
### What We're Adopting from LakeSnes
|
||||
|
||||
**Atomic instruction execution** - No `bstep` mechanism
|
||||
**Simple addressing mode functions** - Return address, advance cycles via callbacks
|
||||
**Cycle advancement per memory access** - Every read/write/idle advances cycles
|
||||
|
||||
### What We're Improving Over LakeSnes
|
||||
|
||||
**Explicit cycle counting** - `Step()` returns exact cycles consumed
|
||||
**Cycle budget model** - Clear loop with explicit cycle advancement
|
||||
**Fixed-point ratio** - Integer arithmetic for perfect precision
|
||||
**Testability** - Easy to verify cycle counts per instruction
|
||||
|
||||
---
|
||||
|
||||
## Solution Design
|
||||
|
||||
### Phase 1: Atomic Instruction Execution
|
||||
|
||||
**Goal:** Eliminate `bstep` mechanism entirely.
|
||||
|
||||
**New Design:**
|
||||
```cpp
|
||||
// New function signature
|
||||
int Spc700::Step() {
|
||||
if (reset_wanted_) { /* handle reset */ return 8; }
|
||||
if (stopped_) { /* handle stop */ return 2; }
|
||||
|
||||
// Fetch opcode
|
||||
uint8_t opcode = ReadOpcode();
|
||||
|
||||
// Calculate EXACT cycle cost upfront
|
||||
int cycles = CalculatePreciseCycles(opcode);
|
||||
|
||||
// Execute instruction COMPLETELY
|
||||
ExecuteInstructionAtomic(opcode);
|
||||
|
||||
return cycles; // Return exact cycles consumed
|
||||
}
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- One call = one complete instruction
|
||||
- Cycles calculated before execution
|
||||
- No state leakage between calls
|
||||
- Easier debugging
|
||||
|
||||
### Phase 2: Precise Cycle Calculation
|
||||
|
||||
**New function:**
|
||||
```cpp
|
||||
int Spc700::CalculatePreciseCycles(uint8_t opcode) {
|
||||
int base_cycles = spc700_cycles[opcode];
|
||||
|
||||
// Account for addressing mode penalties
|
||||
switch (opcode) {
|
||||
case 0x10: case 0x30: /* ... branches ... */
|
||||
// Branches: +2 cycles if taken (handled in execution)
|
||||
break;
|
||||
case 0x15: case 0x16: /* ... abs+X, abs+Y ... */
|
||||
// Check if page boundary crossed (+1 cycle)
|
||||
if (will_cross_page_boundary(opcode)) {
|
||||
base_cycles += 1;
|
||||
}
|
||||
break;
|
||||
// ... more addressing mode checks ...
|
||||
}
|
||||
|
||||
return base_cycles;
|
||||
}
|
||||
```
|
||||
|
||||
### Phase 3: Refactor `Apu::RunCycles` to Cycle Budget Model
|
||||
|
||||
**New implementation:**
|
||||
```cpp
|
||||
void Apu::RunCycles(uint64_t master_cycles) {
|
||||
// 1. Calculate target using FIXED-POINT ratio (Phase 4)
|
||||
uint64_t master_delta = master_cycles - g_last_master_cycles;
|
||||
g_last_master_cycles = master_cycles;
|
||||
|
||||
// 2. Fixed-point conversion (avoiding floating point)
|
||||
uint64_t target_apu_cycles = cycles_ + (master_delta * kApuCyclesNumerator) / kApuCyclesDenominator;
|
||||
|
||||
// 3. Run until budget exhausted
|
||||
while (cycles_ < target_apu_cycles) {
|
||||
// 4. Execute ONE instruction atomically
|
||||
int spc_cycles_consumed = spc700_.Step();
|
||||
|
||||
// 5. Advance DSP/timers for each cycle
|
||||
for (int i = 0; i < spc_cycles_consumed; ++i) {
|
||||
Cycle(); // Ticks DSP, timers, increments cycles_
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Phase 4: Fixed-Point Cycle Ratio
|
||||
|
||||
**Replace floating-point with integer ratio:**
|
||||
```cpp
|
||||
// Old (apu.cc:17)
|
||||
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
|
||||
|
||||
// New
|
||||
static constexpr uint64_t kApuCyclesNumerator = 32040 * 32; // 1,025,280
|
||||
static constexpr uint64_t kApuCyclesDenominator = 1364 * 262 * 60; // 21,437,280
|
||||
```
|
||||
|
||||
**Conversion:**
|
||||
```cpp
|
||||
apu_cycles = (master_cycles * kApuCyclesNumerator) / kApuCyclesDenominator;
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Perfect precision (no floating-point drift)
|
||||
- Integer arithmetic is faster
|
||||
- Deterministic across platforms
|
||||
|
||||
---
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### Step 1: Add `Spc700::Step()` Function
|
||||
- Add new `Step()` method to `spc700.h`
|
||||
- Implement atomic instruction execution
|
||||
- Keep `RunOpcode()` temporarily for compatibility
|
||||
|
||||
### Step 2: Implement Precise Cycle Calculation
|
||||
- Create `CalculatePreciseCycles()` helper
|
||||
- Handle branch penalties
|
||||
- Handle page boundary crossings
|
||||
- Add tests to verify against known SPC700 timings
|
||||
|
||||
### Step 3: Eliminate `bstep` Mechanism
|
||||
- Refactor all multi-step instructions (0xCB, 0xD0, 0xD7, etc.)
|
||||
- Remove `bstep` variable
|
||||
- Remove `step` variable
|
||||
- Verify all 256 opcodes work atomically
|
||||
|
||||
### Step 4: Refactor `Apu::RunCycles`
|
||||
- Switch to cycle budget model
|
||||
- Use `Step()` instead of `RunOpcode()`
|
||||
- Add cycle budget logging for debugging
|
||||
|
||||
### Step 5: Convert to Fixed-Point Ratio
|
||||
- Replace `apuCyclesPerMaster` double
|
||||
- Use integer numerator/denominator
|
||||
- Add constants for PAL timing too
|
||||
|
||||
### Step 6: Testing
|
||||
- Test with vanilla Zelda3 ROM
|
||||
- Verify handshake completes
|
||||
- Verify music plays
|
||||
- Check for watchdog timeouts
|
||||
- Measure timing accuracy
|
||||
|
||||
---
|
||||
|
||||
## Files to Modify
|
||||
|
||||
1. **src/app/emu/audio/spc700.h**
|
||||
- Add `int Step()` method
|
||||
- Add `int CalculatePreciseCycles(uint8_t opcode)`
|
||||
- Remove `bstep` and `step` variables
|
||||
|
||||
2. **src/app/emu/audio/spc700.cc**
|
||||
- Implement `Step()`
|
||||
- Implement `CalculatePreciseCycles()`
|
||||
- Refactor `ExecuteInstructions()` to be atomic
|
||||
- Remove all `bstep` logic
|
||||
|
||||
3. **src/app/emu/audio/apu.h**
|
||||
- Update cycle ratio constants
|
||||
|
||||
4. **src/app/emu/audio/apu.cc**
|
||||
- Refactor `RunCycles()` to use `Step()`
|
||||
- Convert to fixed-point ratio
|
||||
- Remove floating-point arithmetic
|
||||
|
||||
5. **test/unit/spc700_timing_test.cc** (new)
|
||||
- Test cycle accuracy for all opcodes
|
||||
- Test handshake simulation
|
||||
- Verify no regressions
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- [x] All SPC700 instructions execute atomically (one `Step()` call)
|
||||
- [x] Cycle counts accurate to ±1 cycle per instruction
|
||||
- [x] APU handshake completes without watchdog timeout
|
||||
- [x] Music loads and plays in vanilla Zelda3
|
||||
- [x] No floating-point drift over long emulation sessions
|
||||
- [ ] Unit tests pass for all 256 opcodes (future work)
|
||||
- [ ] Audio quality refined (minor glitches remain)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Completed
|
||||
|
||||
1. Create feature branch
|
||||
2. Analyze current implementation
|
||||
3. Implement `Spc700::Step()` function
|
||||
4. Add precise cycle calculation
|
||||
5. Refactor `Apu::RunCycles`
|
||||
6. Convert to fixed-point ratio
|
||||
7. Refactor instructions.cc to be atomic and cycle-accurate
|
||||
8. Test with Zelda3 ROM
|
||||
9. Write unit tests (future work)
|
||||
10. Fine-tune audio quality (future work)
|
||||
|
||||
---
|
||||
|
||||
**References:**
|
||||
- [SPC700 Opcode Reference](https://problemkaputt.de/fullsnes.htm#snesapucpu)
|
||||
- [APU Timing Documentation](https://wiki.superfamicom.org/spc700-reference)
|
||||
- docs/E6-emulator-improvements.md
|
||||
Reference in New Issue
Block a user