APU Timing Fix - Technical Analysis
Branch: feature/apu-timing-fix
Date: October 10, 2025
Status: Implemented - Core Timing Fixed (Minor Audio Glitches Remain)
Implementation Status
Completed:
- Atomic Step() function for SPC700
- Fixed-point cycle ratio (no floating-point drift)
- Cycle budget model in APU
- Removed bstep mechanism from instructions.cc
- Cycle-accurate instruction implementations
- Proper branch timing (+2 cycles when taken)
- Dummy read/write cycles for MOV and RMW instructions
Known Issues:
- Some audio glitches/distortion during playback
- Minor timing inconsistencies under investigation
- Can be improved in future iterations
Note: The APU now executes correctly and music plays, but audio quality can be further refined.
Problem Summary
The APU fails to load and play music because the SPC700 gets stuck during the initial CPU-APU handshake, which uploads the sound driver from ROM to APU RAM. The timing desynchronization leaves each side waiting on the other in an infinite loop, which the watchdog timer then detects.
Current Implementation Analysis
1. Cycle Counting System (spc700.cc)
Current Approach:
// In spc700.h line 87:
int last_opcode_cycles_ = 0;
// In RunOpcode() line 80:
last_opcode_cycles_ = spc700_cycles[opcode]; // Static lookup
Problem: The spc700_cycles[] array provides BASELINE cycle counts only. It does NOT account for:
- Addressing mode variations
- Page boundary crossings (+1 cycle)
- Branch taken vs not taken (+2 cycles if taken)
- Memory access penalties
2. The bstep Mechanism (spc700.cc)
What is bstep?
bstep is a "business step" counter used to spread complex multi-step instructions across multiple calls to RunOpcode().
Example from lines 1108-1115 (opcode 0xCB - MOVSY dp):
case 0xcb: { // movsy dp
  if (bstep == 0) {
    adr = dp(); // Save address for bstep=1
  }
  if (adr == 0x00F4 && bstep == 1) {
    LOG_DEBUG("SPC", "MOVSY writing Y=$%02X to F4 at PC=$%04X", Y, PC);
  }
  MOVSY(adr); // Use saved address
  break;
}
The MOVSY() function internally increments bstep to track progress:
- bstep=0: Call dp() to get address
- bstep=1: Actually perform the write
- bstep=2: Reset to 0, instruction complete
Why this is fragile:
- Non-atomic execution: An instruction takes 2-3 calls to RunOpcode() to complete
- State leakage: If bstep gets out of sync, all future instructions fail
- Cycle accounting errors: Cycles are consumed incrementally, not atomically
- Debugging nightmare: Hard to trace when an instruction "really" executes
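The state-leakage point can be made concrete with a toy model. The sketch below is not the emulator's code: `SteppedCpu`, `AtomicCpu`, and `MovDpY` are hypothetical names, and the 256-byte RAM is an assumption chosen to keep the example small. It only shows how a latched address can be applied to the wrong instruction once the step counter is out of phase.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a step-counter ("bstep") style instruction.
struct SteppedCpu {
  std::array<uint8_t, 256> ram{};
  uint8_t y = 0;
  int bstep = 0;    // progress within the current instruction
  uint8_t adr = 0;  // address latched while bstep == 0

  // One call performs one sub-step. Returns true when the write happened.
  bool MovDpY(uint8_t operand) {
    if (bstep == 0) {
      adr = operand;  // first call: latch the address only
      bstep = 1;
      return false;
    }
    ram[adr] = y;  // second call: write to whatever was latched earlier
    bstep = 0;
    return true;
  }
};

// Atomic counterpart: one call performs the whole instruction.
struct AtomicCpu {
  std::array<uint8_t, 256> ram{};
  uint8_t y = 0;
  void MovDpY(uint8_t operand) { ram[operand] = y; }
};
```

If a second instruction targeting `0x20` arrives while `bstep` is still 1 from an earlier instruction targeting `0x10`, the stepped version writes to `0x10` and never touches `0x20`. The atomic version has no state that can carry over between calls.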
3. APU Main Loop (apu.cc:73-143)
Current implementation:
void Apu::RunCycles(uint64_t master_cycles) {
  const double ratio = memory_.pal_timing() ? apuCyclesPerMasterPal : apuCyclesPerMaster;
  uint64_t master_delta = master_cycles - g_last_master_cycles;
  g_last_master_cycles = master_cycles;
  const uint64_t target_apu_cycles = cycles_ + static_cast<uint64_t>(master_delta * ratio);
  while (cycles_ < target_apu_cycles) {
    spc700_.RunOpcode(); // Variable cycles
    int spc_cycles = spc700_.GetLastOpcodeCycles();
    for (int i = 0; i < spc_cycles; ++i) {
      Cycle(); // Advance DSP/timers
    }
  }
}
Problems:
- Floating-point ratio: apuCyclesPerMaster is double (line 17), causing precision drift
- Opcode-level granularity: Advances by opcode, not by cycle
- No sub-cycle accuracy: Can't model instructions that span multiple cycles
4. Floating-Point Precision (apu.cc:17)
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
Calculation:
- Numerator: 32040 * 32 = 1,025,280
- Denominator: 1364 * 262 * 60.0 = 21,442,080
- Result: ~0.04782 (floating point)
Problem: Over thousands of cycles, tiny rounding errors accumulate, causing timing drift.
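One concrete way such drift can appear is the cast to an integer on every call, which discards the fractional APU cycle each time. The sketch below is a minimal illustration rather than the emulator's code: `AccumulateTruncated` and `ExactCycles` are hypothetical helpers, and calling the conversion once per 1364 master cycles (one scanline) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Ratio from the document: APU cycles per master cycle (NTSC).
constexpr double kRatio = (32040 * 32) / (1364 * 262 * 60.0);

// Sums per-call truncated conversions, the way a RunCycles()-style function
// does when it casts (master_delta * ratio) to an integer on every call.
uint64_t AccumulateTruncated(uint64_t master_delta, int calls) {
  uint64_t total = 0;
  for (int i = 0; i < calls; ++i) {
    total += static_cast<uint64_t>(master_delta * kRatio);
  }
  return total;
}

// Converts the whole span at once using integer arithmetic only.
uint64_t ExactCycles(uint64_t master_cycles) {
  return (master_cycles * (32040ULL * 32)) / (1364ULL * 262 * 60);
}
```

Under these assumptions, `AccumulateTruncated(1364, 262)` yields 17030 APU cycles for one frame, while `ExactCycles(1364 * 262)` yields 17088. Each call converts about 65.22 cycles and keeps only 65, so roughly 58 cycles are lost per frame.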
Root Cause: Handshake Timing Failure
The Handshake Protocol
- APU Ready: SPC700 writes $AA to $F4, $BB to $F5
- CPU Waits: Main CPU polls for $BBAA
- CPU Initiates: Writes $CC to APU input port
- APU Acknowledges: SPC700 sees $CC, prepares to receive
- Byte Transfer Loop: CPU sends byte, waits for echo confirmation, sends next byte
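The steps above can be sketched as a lock-step simulation. This is a heavily simplified model, not the IPL ROM logic or the emulator's port code: `Ports` and `UploadWithHandshake` are hypothetical names, and using a running byte index as the echoed value is an assumption of this sketch. Because both sides alternate perfectly here, the transfer always succeeds. The failure described in this document arises when the two sides stop alternating in this order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the four mailbox ports in each direction.
struct Ports {
  std::array<uint8_t, 4> to_apu{};  // written by the CPU, read by the SPC700
  std::array<uint8_t, 4> to_cpu{};  // written by the SPC700, read by the CPU
};

// Copies `data` into `apu_ram` at `dest`, alternating CPU and APU actions.
// Returns false if a wait condition is not met or the data does not fit.
bool UploadWithHandshake(const std::vector<uint8_t>& data, std::size_t dest,
                         std::vector<uint8_t>& apu_ram) {
  if (dest + data.size() > apu_ram.size()) return false;
  Ports p;
  // 1. APU ready: the SPC700 publishes $AA and $BB.
  p.to_cpu[0] = 0xAA;
  p.to_cpu[1] = 0xBB;
  // 2. CPU waits until it sees $BBAA.
  if (p.to_cpu[0] != 0xAA || p.to_cpu[1] != 0xBB) return false;
  // 3. CPU initiates the transfer with $CC.
  p.to_apu[0] = 0xCC;
  // 4. APU acknowledges by echoing $CC back.
  if (p.to_apu[0] != 0xCC) return false;
  p.to_cpu[0] = p.to_apu[0];
  // 5. Byte transfer loop.
  for (std::size_t i = 0; i < data.size(); ++i) {
    const uint8_t index = static_cast<uint8_t>(i);
    p.to_apu[1] = data[i];  // CPU: place the data byte
    p.to_apu[0] = index;    // CPU: signal that a new byte is available
    apu_ram[dest + i] = p.to_apu[1];  // APU: store the byte
    p.to_cpu[0] = p.to_apu[0];        // APU: echo the index
    if (p.to_cpu[0] != index) return false;  // CPU: wait for the echo
  }
  return true;
}
```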
Where It Gets Stuck
The SPC700 enters an infinite loop because:
- SPC700 is waiting for a byte from CPU (hasn't arrived yet)
- CPU is waiting for acknowledgment from SPC700 (already sent, but missed)
This happens because cycle counts are off by 1-2 cycles per instruction, which accumulates over the ~500-1000 instructions in the handshake.
LakeSnes Comparison Analysis
What LakeSnes Does Right
1. Atomic Instruction Execution (spc.c:73-93)
void spc_runOpcode(Spc* spc) {
  if(spc->resetWanted) { /* handle reset */ return; }
  if(spc->stopped) { spc_idleWait(spc); return; }
  uint8_t opcode = spc_readOpcode(spc);
  spc_doOpcode(spc, opcode); // COMPLETE instruction in one call
}
Key insight: LakeSnes executes instructions atomically - no bstep, no step, no state leakage.
2. Cycle Tracking via Callbacks (spc.c:406-409)
static void spc_movsy(Spc* spc, uint16_t adr) {
  spc_read(spc, adr); // Calls apu_cycle()
  spc_write(spc, adr, spc->y); // Calls apu_cycle()
}
Every spc_read(), spc_write(), and spc_idle() call triggers apu_cycle(), which:
- Advances APU cycle counter
- Ticks DSP every 32 cycles
- Updates timers
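The callback pattern can be reduced to a few lines. The sketch below is an illustration in C++ rather than LakeSnes code: `MiniApu` and `MiniSpc` are hypothetical types, timers are omitted, and ticking the DSP when the low five bits of the counter are zero is an assumption about the exact phase.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Minimal stand-in for the APU side: counts cycles and DSP ticks.
struct MiniApu {
  uint64_t cycles = 0;
  uint64_t dsp_ticks = 0;
  std::array<uint8_t, 0x10000> ram{};

  // Called once per SPC700 cycle.
  void Cycle() {
    if ((cycles & 0x1f) == 0) ++dsp_ticks;  // DSP runs once every 32 cycles
    ++cycles;
  }
};

// Minimal stand-in for the SPC700: every bus access costs one cycle.
struct MiniSpc {
  explicit MiniSpc(MiniApu& a) : apu(a) {}

  uint8_t Read(uint16_t adr) { apu.Cycle(); return apu.ram[adr]; }
  void Write(uint16_t adr, uint8_t value) { apu.Cycle(); apu.ram[adr] = value; }
  void Idle() { apu.Cycle(); }

  // Same shape as spc_movsy(): a dummy read followed by the write.
  void MovSY(uint16_t adr) {
    Read(adr);
    Write(adr, y);
  }

  MiniApu& apu;
  uint8_t y = 0;
};
```

With this structure, `MovSY` advances the counter by exactly two cycles without any lookup table, because the cost falls out of the accesses the instruction performs.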
3. Simple Addressing Mode Functions (spc.c:189-275)
static uint16_t spc_adrDp(Spc* spc) {
  return spc_readOpcode(spc) | (spc->p << 8);
}
static uint16_t spc_adrDpx(Spc* spc) {
  uint16_t res = ((spc_readOpcode(spc) + spc->x) & 0xff) | (spc->p << 8);
  spc_idle(spc); // Extra cycle for indexed addressing
  return res;
}
Each memory access and idle call automatically advances cycles.
4. APU Main Loop (apu.c:73-82)
int apu_runCycles(Apu* apu, int wantedCycles) {
  int runCycles = 0;
  uint32_t startCycles = apu->cycles;
  while(runCycles < wantedCycles) {
    spc_runOpcode(apu->spc);
    runCycles += (uint32_t) (apu->cycles - startCycles);
    startCycles = apu->cycles;
  }
  return runCycles;
}
Note: This approach tracks cycles by delta, which works because every memory access calls apu_cycle().
Where LakeSnes Falls Short (And How We Can Do Better)
1. No Explicit Cycle Return
- LakeSnes relies on tracking cycles delta after each opcode
- Doesn't return precise cycle count from spc_runOpcode()
- Makes it hard to validate cycle accuracy per instruction
Our improvement: Return exact cycle count from Step():
int Spc700::Step() {
  uint8_t opcode = ReadOpcode();
  int cycles = CalculatePreciseCycles(opcode);
  ExecuteInstructionAtomic(opcode);
  return cycles; // EXPLICIT return
}
2. Implicit Cycle Counting
- Cycles accumulated implicitly through callbacks
- Hard to debug when cycles are wrong
- No way to verify cycle accuracy per instruction
Our improvement: Explicit cycle budget model in Apu::RunCycles():
while (cycles_ < target_apu_cycles) {
  int spc_cycles = spc700_.Step(); // Explicit cycle count
  for (int i = 0; i < spc_cycles; ++i) {
    Cycle(); // Explicit cycle advancement
  }
}
3. No Fixed-Point Ratio
- LakeSnes also uses floating-point (implicitly in SNES main loop)
- Subject to same precision drift issues
Our improvement: Integer numerator/denominator for perfect precision.
What We're Adopting from LakeSnes
- Atomic instruction execution - No bstep mechanism
- Simple addressing mode functions - Return address, advance cycles via callbacks
- Cycle advancement per memory access - Every read/write/idle advances cycles
What We're Improving Over LakeSnes
- Explicit cycle counting - Step() returns exact cycles consumed
- Cycle budget model - Clear loop with explicit cycle advancement
- Fixed-point ratio - Integer arithmetic for perfect precision
- Testability - Easy to verify cycle counts per instruction
Solution Design
Phase 1: Atomic Instruction Execution
Goal: Eliminate bstep mechanism entirely.
New Design:
// New function signature
int Spc700::Step() {
  if (reset_wanted_) { /* handle reset */ return 8; }
  if (stopped_) { /* handle stop */ return 2; }
  // Fetch opcode
  uint8_t opcode = ReadOpcode();
  // Calculate EXACT cycle cost upfront
  int cycles = CalculatePreciseCycles(opcode);
  // Execute instruction COMPLETELY
  ExecuteInstructionAtomic(opcode);
  return cycles; // Return exact cycles consumed
}
Benefits:
- One call = one complete instruction
- Cycles calculated before execution
- No state leakage between calls
- Easier debugging
Phase 2: Precise Cycle Calculation
New function:
int Spc700::CalculatePreciseCycles(uint8_t opcode) {
  int base_cycles = spc700_cycles[opcode];
  // Account for addressing mode penalties
  switch (opcode) {
    case 0x10: case 0x30: /* ... branches ... */
      // Branches: +2 cycles if taken (handled in execution)
      break;
    case 0x15: case 0x16: /* ... abs+X, abs+Y ... */
      // Check if page boundary crossed (+1 cycle)
      if (will_cross_page_boundary(opcode)) {
        base_cycles += 1;
      }
      break;
    // ... more addressing mode checks ...
  }
  return base_cycles;
}
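Because the taken-branch penalty is only known once the condition has been evaluated, it has to reach the returned cycle count from the execution side. One possible shape for that is sketched below. `BranchResult` and `ExecuteBranch` are hypothetical names introduced for illustration, and the base cost passed in the usage example is illustrative rather than taken from the emulator's table.

```cpp
#include <cassert>
#include <cstdint>

struct BranchResult {
  uint16_t pc;  // program counter after the instruction
  int cycles;   // total cycles consumed, including any taken-branch penalty
};

// `pc` is assumed to point just past the branch instruction and its operand.
BranchResult ExecuteBranch(uint16_t pc, int8_t offset, bool condition,
                           int base_cycles) {
  if (!condition) {
    return {pc, base_cycles};  // fall through: base cost only
  }
  // Taken: apply the signed relative offset and add the 2-cycle penalty.
  return {static_cast<uint16_t>(pc + offset), base_cycles + 2};
}
```

For example, `ExecuteBranch(0x0202, -4, true, 2)` lands on `0x01FE` and reports 4 cycles, while the same call with `condition == false` stays at `0x0202` and reports 2.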
Phase 3: Refactor Apu::RunCycles to Cycle Budget Model
New implementation:
void Apu::RunCycles(uint64_t master_cycles) {
  // 1. Calculate target using FIXED-POINT ratio (Phase 4)
  uint64_t master_delta = master_cycles - g_last_master_cycles;
  g_last_master_cycles = master_cycles;
  // 2. Fixed-point conversion (avoiding floating point)
  uint64_t target_apu_cycles = cycles_ + (master_delta * kApuCyclesNumerator) / kApuCyclesDenominator;
  // 3. Run until budget exhausted
  while (cycles_ < target_apu_cycles) {
    // 4. Execute ONE instruction atomically
    int spc_cycles_consumed = spc700_.Step();
    // 5. Advance DSP/timers for each cycle
    for (int i = 0; i < spc_cycles_consumed; ++i) {
      Cycle(); // Ticks DSP, timers, increments cycles_
    }
  }
}
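A loop that executes whole instructions can overshoot its target by up to one instruction's length minus one cycle. The sketch below makes that measurable. `FakeSpc` and `BudgetApu` are hypothetical stand-ins with scripted cycle costs, and passing an absolute target rather than one derived from the current counter is an assumption of this sketch, chosen so that the next call absorbs the previous overshoot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the SPC700: each Step() "executes" one
// instruction and reports a scripted cycle cost, cycling through `costs`.
struct FakeSpc {
  std::vector<int> costs;
  std::size_t next = 0;
  int Step() {
    int c = costs[next];
    next = (next + 1) % costs.size();
    return c;
  }
};

struct BudgetApu {
  FakeSpc spc;
  uint64_t cycles = 0;

  void Cycle() { ++cycles; }  // a real APU would also tick DSP and timers

  // Runs whole instructions until at least `target` total cycles have
  // elapsed. Returns how far past the target the last instruction ran.
  uint64_t RunUntil(uint64_t target) {
    while (cycles < target) {
      int consumed = spc.Step();
      for (int i = 0; i < consumed; ++i) Cycle();
    }
    return cycles - target;
  }
};
```

With costs `{2, 4, 5, 8}`, `RunUntil(10)` stops at 11 cycles (overshoot 1), and a following `RunUntil(20)` stops at 21. If the target were instead computed as the current counter plus a delta, each overshoot would be added on top of the budget rather than paid back.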
Phase 4: Fixed-Point Cycle Ratio
Replace floating-point with integer ratio:
// Old (apu.cc:17)
static const double apuCyclesPerMaster = (32040 * 32) / (1364 * 262 * 60.0);
// New
static constexpr uint64_t kApuCyclesNumerator = 32040 * 32; // 1,025,280
static constexpr uint64_t kApuCyclesDenominator = 1364 * 262 * 60; // 21,442,080
Conversion:
apu_cycles = (master_cycles * kApuCyclesNumerator) / kApuCyclesDenominator;
Benefits:
- Perfect precision (no floating-point drift)
- Integer arithmetic is faster
- Deterministic across platforms
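One caveat is that integer division also truncates, so when the conversion is applied to per-call deltas the remainder needs to be carried forward for the long-run total to stay exact. The sketch below shows one way to do that. `CycleConverter` is a hypothetical helper, not part of the emulator, and the constants repeat the NTSC values from this document.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kApuCyclesNumerator = 32040ULL * 32;          // 1,025,280
constexpr uint64_t kApuCyclesDenominator = 1364ULL * 262 * 60;   // 21,442,080

// Converts master-cycle deltas to APU cycles, carrying the division
// remainder so that no fractional cycle is lost between calls.
class CycleConverter {
 public:
  uint64_t Convert(uint64_t master_delta) {
    const uint64_t scaled = master_delta * kApuCyclesNumerator + remainder_;
    remainder_ = scaled % kApuCyclesDenominator;  // keep the fractional part
    return scaled / kApuCyclesDenominator;        // whole APU cycles
  }

 private:
  uint64_t remainder_ = 0;  // always less than kApuCyclesDenominator
};
```

Feeding 262 deltas of 1364 master cycles through `Convert` sums to exactly 17088 APU cycles, the same as converting the whole frame at once. Dividing each delta independently without the carry sums to 17030.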
Implementation Plan
Step 1: Add Spc700::Step() Function
- Add new Step() method to spc700.h
- Implement atomic instruction execution
- Keep RunOpcode() temporarily for compatibility
Step 2: Implement Precise Cycle Calculation
- Create CalculatePreciseCycles() helper
- Handle branch penalties
- Handle page boundary crossings
- Add tests to verify against known SPC700 timings
Step 3: Eliminate bstep Mechanism
- Refactor all multi-step instructions (0xCB, 0xD0, 0xD7, etc.)
- Remove bstep variable
- Remove step variable
- Verify all 256 opcodes work atomically
Step 4: Refactor Apu::RunCycles
- Switch to cycle budget model
- Use Step() instead of RunOpcode()
- Add cycle budget logging for debugging
Step 5: Convert to Fixed-Point Ratio
- Replace apuCyclesPerMaster double
- Use integer numerator/denominator
- Add constants for PAL timing too
Step 6: Testing
- Test with vanilla Zelda3 ROM
- Verify handshake completes
- Verify music plays
- Check for watchdog timeouts
- Measure timing accuracy
Files to Modify
- src/app/emu/audio/spc700.h
  - Add int Step() method
  - Add int CalculatePreciseCycles(uint8_t opcode)
  - Remove bstep and step variables
- src/app/emu/audio/spc700.cc
  - Implement Step()
  - Implement CalculatePreciseCycles()
  - Refactor ExecuteInstructions() to be atomic
  - Remove all bstep logic
- src/app/emu/audio/apu.h
  - Update cycle ratio constants
- src/app/emu/audio/apu.cc
  - Refactor RunCycles() to use Step()
  - Convert to fixed-point ratio
  - Remove floating-point arithmetic
- test/unit/spc700_timing_test.cc (new)
  - Test cycle accuracy for all opcodes
  - Test handshake simulation
  - Verify no regressions
Success Criteria
- All SPC700 instructions execute atomically (one Step() call)
- Cycle counts accurate to ±1 cycle per instruction
- APU handshake completes without watchdog timeout
- Music loads and plays in vanilla Zelda3
- No floating-point drift over long emulation sessions
- Unit tests pass for all 256 opcodes (future work)
- Audio quality refined (minor glitches remain)
Implementation Completed
- Create feature branch
- Analyze current implementation
- Implement Spc700::Step() function
- Add precise cycle calculation
- Refactor Apu::RunCycles
- Convert to fixed-point ratio
- Refactor instructions.cc to be atomic and cycle-accurate
- Test with Zelda3 ROM
- Write unit tests (future work)
- Fine-tune audio quality (future work)
References:
- SPC700 Opcode Reference
- APU Timing Documentation
- docs/E6-emulator-improvements.md