docs: add LakeSnes comparison analysis to APU timing

- Document what LakeSnes does right (atomic execution, cycle callbacks)
- Identify where LakeSnes falls short (implicit counting, no explicit return)
- Define what we're adopting vs improving
- Clarify hybrid approach: LakeSnes simplicity + explicit validation
2025-10-10 17:28:00 -04:00
parent b7c642611a
commit b5cecedbb0
1 changed files with 114 additions and 0 deletions
--- a/docs/apu-timing-analysis.md
+++ b/docs/apu-timing-analysis.md
@@ -122,6 +122,120 @@ This happens because cycle counts are off by 1-2 cycles per instruction, which a

 ---

+## LakeSnes Comparison Analysis
+
+### What LakeSnes Does Right
+
+**1. Atomic Instruction Execution (spc.c:73-93)**
+```c
+void spc_runOpcode(Spc* spc) {
+  if(spc->resetWanted) { /* handle reset */ return; }
+  if(spc->stopped) { spc_idleWait(spc); return; }
+
+  uint8_t opcode = spc_readOpcode(spc);
+  spc_doOpcode(spc, opcode);  // COMPLETE instruction in one call
+}
+```
+
+**Key insight:** LakeSnes executes instructions **atomically** - no `bstep`, no `step`, no state leakage.
+
+**2. Cycle Tracking via Callbacks (spc.c:406-409)**
+```c
+static void spc_movsy(Spc* spc, uint16_t adr) {
+  spc_read(spc, adr);          // Calls apu_cycle()
+  spc_write(spc, adr, spc->y); // Calls apu_cycle()
+}
+```
+
+Every `spc_read()`, `spc_write()`, and `spc_idle()` call triggers `apu_cycle()`, which:
+- Advances APU cycle counter
+- Ticks DSP every 32 cycles
+- Updates timers
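Annotation: a minimal C++ sketch of the callback pattern described above, not LakeSnes code. `ApuSketch`, `SpcSketch`, and their members are hypothetical names, and the DSP and timers are reduced to counters so the cycle flow is visible.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the APU side of the callback.
struct ApuSketch {
  uint64_t cycles = 0;
  uint64_t dsp_ticks = 0;      // stands in for the DSP being ticked
  uint64_t timer_updates = 0;  // stands in for the timer logic

  // Called once per SPC700 bus cycle (read, write, or idle).
  void Cycle() {
    if ((cycles & 0x1f) == 0) ++dsp_ticks;  // once every 32 cycles
    ++timer_updates;
    ++cycles;
  }
};

// Hypothetical stand-in for the CPU side: every bus operation costs one cycle.
struct SpcSketch {
  explicit SpcSketch(ApuSketch& apu_ref) : apu(apu_ref) {}

  uint8_t Read(uint16_t adr) { apu.Cycle(); return ram[adr]; }
  void Write(uint16_t adr, uint8_t value) { apu.Cycle(); ram[adr] = value; }
  void Idle() { apu.Cycle(); }

  ApuSketch& apu;
  std::array<uint8_t, 0x10000> ram{};
};
```

With this shape, an instruction's cost is simply the number of `Read`, `Write`, and `Idle` calls it makes, which is why no instruction needs to state its own cycle count.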
+
+**3. Simple Addressing Mode Functions (spc.c:189-275)**
+```c
+static uint16_t spc_adrDp(Spc* spc) {
+  return spc_readOpcode(spc) | (spc->p << 8);
+}
+
+static uint16_t spc_adrDpx(Spc* spc) {
+  uint16_t res = ((spc_readOpcode(spc) + spc->x) & 0xff) | (spc->p << 8);
+  spc_idle(spc);  // Extra cycle for indexed addressing
+  return res;
+}
+```
+
+Each memory access and idle call automatically advances cycles.
+
+**4. APU Main Loop (apu.c:73-82)**
+```c
+int apu_runCycles(Apu* apu, int wantedCycles) {
+  int runCycles = 0;
+  uint32_t startCycles = apu->cycles;
+  while(runCycles < wantedCycles) {
+    spc_runOpcode(apu->spc);
+    runCycles += (uint32_t) (apu->cycles - startCycles);
+    startCycles = apu->cycles;
+  }
+  return runCycles;
+}
+```
+
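+**Note:** This loop measures cycles by **delta**: it compares `apu->cycles` before and after each opcode, which works only because every memory access and idle call goes through `apu_cycle()`. Because instructions run atomically, the loop can also run past `wantedCycles` by part of one instruction.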
+
+### Where LakeSnes Falls Short (And How We Can Do Better)
+
+**1. No Explicit Cycle Return**
+- LakeSnes relies on tracking `cycles` delta after each opcode
+- Doesn't return precise cycle count from `spc_runOpcode()`
+- Makes it hard to validate cycle accuracy per instruction
+
+**Our improvement:** Return exact cycle count from `Step()`:
+```cpp
+int Spc700::Step() {
+  uint8_t opcode = ReadOpcode();
+  int cycles = CalculatePreciseCycles(opcode);
+  ExecuteInstructionAtomic(opcode);
+  return cycles;  // EXPLICIT return
+}
+```
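Annotation: one way the explicit return could be combined with callback-style counting, as a hedged sketch. `Spc700Sketch` and `ExpectedCycles` are hypothetical, only three opcodes are modeled, flags are ignored, and the direct page is assumed to be page 0. The cycle counts in the table are the commonly documented SPC700 timings for those opcodes and should be checked against a reference before use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical reference table, kept separate from execution so the two
// can be compared in a test.
inline int ExpectedCycles(uint8_t opcode) {
  switch (opcode) {
    case 0x00: return 2;  // NOP
    case 0xE8: return 2;  // MOV A,#imm
    case 0xE4: return 3;  // MOV A,dp
    default:   return -1; // not modeled in this sketch
  }
}

class Spc700Sketch {
 public:
  std::array<uint8_t, 0x10000> ram{};
  uint16_t pc = 0;
  uint8_t a = 0;

  // Executes one whole instruction and returns the cycles it consumed,
  // counted from the bus operations it actually performed.
  int Step() {
    step_cycles_ = 0;
    const uint8_t opcode = Read(pc++);
    switch (opcode) {
      case 0x00: Idle(); break;
      case 0xE8: a = Read(pc++); break;
      case 0xE4: {
        const uint8_t dp = Read(pc++);
        a = Read(dp);  // assumes direct page 0
        break;
      }
      default: assert(false && "opcode not modeled in this sketch");
    }
    return step_cycles_;
  }

 private:
  uint8_t Read(uint16_t adr) { ++step_cycles_; return ram[adr]; }
  void Idle() { ++step_cycles_; }
  int step_cycles_ = 0;
};
```

Because `Step()` reports what it counted, a unit test can compare each return value against `ExpectedCycles(opcode)` and fail on the exact instruction whose timing is off.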
+
+**2. Implicit Cycle Counting**
+- Cycles accumulated implicitly through callbacks
+- Hard to debug when cycles are wrong
+- No way to verify cycle accuracy per instruction
+
+**Our improvement:** Explicit cycle budget model in `Apu::RunCycles()`:
+```cpp
+while (cycles_ < target_apu_cycles) {
+  int spc_cycles = spc700_.Step();  // Explicit cycle count
+  for (int i = 0; i < spc_cycles; ++i) {
+    Cycle();  // Explicit cycle advancement
+  }
+}
+```
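Annotation: a self-contained sketch of the budget loop above, under assumptions. `ApuBudgetSketch` is a hypothetical name, and `step` is a stand-in for `spc700_.Step()` so the loop can be exercised without a CPU core.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

struct ApuBudgetSketch {
  uint64_t cycles = 0;
  std::function<int()> step;  // stand-in for spc700_.Step()

  void Cycle() { ++cycles; }  // the real version would tick the DSP and timers

  // Runs whole instructions until the absolute cycle target is reached.
  void RunUntil(uint64_t target_apu_cycles) {
    while (cycles < target_apu_cycles) {
      const int spc_cycles = step();
      assert(spc_cycles > 0);  // a zero-cycle step would never terminate
      for (int i = 0; i < spc_cycles; ++i) {
        Cycle();
      }
    }
  }
};
```

Since the target is an absolute cycle count, any overshoot from the last instruction is automatically deducted from the next call instead of being lost.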
+
+**3. No Fixed-Point Ratio**
+- LakeSnes also uses floating-point (implicitly in SNES main loop)
+- Subject to same precision drift issues
+
+**Our improvement:** Integer numerator/denominator ratio, so rounding error does not accumulate over time.
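Annotation: a minimal sketch of an integer-ratio clock converter, assuming the remainder is carried between calls. `ClockRatioSketch` is a hypothetical name, and the numerator and denominator in the usage note are placeholder nominal clock rates, not values taken from either codebase.

```cpp
#include <cassert>
#include <cstdint>

// Converts elapsed master-clock cycles to APU cycles with integer math only.
class ClockRatioSketch {
 public:
  ClockRatioSketch(uint64_t numerator, uint64_t denominator)
      : num_(numerator), den_(denominator) {
    assert(den_ != 0);
  }

  // Returns the whole APU cycles owed for `master_cycles`, keeping the
  // fractional part as an integer remainder for the next call.
  uint64_t Convert(uint64_t master_cycles) {
    const uint64_t scaled = master_cycles * num_ + remainder_;
    remainder_ = scaled % den_;
    return scaled / den_;
  }

 private:
  uint64_t num_;
  uint64_t den_;
  uint64_t remainder_ = 0;
};
```

For example, `ClockRatioSketch ratio(1024000, 21477272);` followed by repeated `ratio.Convert(1364)` calls yields the same total as a single conversion of the summed master cycles, because nothing is rounded away between calls. The multiplication must stay within 64 bits, which holds for frame-sized inputs.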
+
+### What We're Adopting from LakeSnes
+
+- ✅ **Atomic instruction execution** - No `bstep` mechanism
+- ✅ **Simple addressing mode functions** - Return address, advance cycles via callbacks
+- ✅ **Cycle advancement per memory access** - Every read/write/idle advances cycles
+
+### What We're Improving Over LakeSnes
+
+- ✅ **Explicit cycle counting** - `Step()` returns exact cycles consumed
+- ✅ **Cycle budget model** - Clear loop with explicit cycle advancement
+- ✅ **Fixed-point ratio** - Integer arithmetic for perfect precision
+- ✅ **Testability** - Easy to verify cycle counts per instruction
+
+---
+
 ## Solution Design

 ### Phase 1: Atomic Instruction Execution