Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.
Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.
Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
callback path (audit-048) still uses the asynchronous LR-sentinel
inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
* dispatch_graphics_interrupts(kernel, mem, &mut stats,
&mut decode_cache, thunk_map): new fn. Drains the full FIFO per
call. Victim selection same shape (Ready preferred, else
Blocked, skip Idle/Exited/ServicingIrq), but the call is
synchronous - we run step_cached + import-thunk dispatch inline
on the borrowed ctx until pc == LR_HALT_SENTINEL.
MAX_INSTRS_PER_ISR = 1M safety budget.
* coord_pre_round: graphics-IRQ injection call removed. Audio
path unchanged (still calls try_inject_audio_callback).
* run_execution + run_execution_parallel: each now owns a
persistent isr_decode_cache and calls
dispatch_graphics_interrupts after coord_pre_round.
* try_inject_graphics_interrupt: deleted (118 LOC).
No new public APIs, no new dependencies, no changes to xenia-cpu.
Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synthetic v-sync ticker used a per-instruction proxy
(VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep
throughput → 60 Hz. Audit M11 observed this drifts under
`--parallel`: with 6 worker threads sharing the kernel mutex,
the dispatcher executes more PPC instructions per tick
callback, so the accumulator never crosses 150 k. Result:
~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel.
Hybrid solution preserves lockstep determinism (which the
goldens depend on) while fixing --parallel:
* `tick_vsync_instr(instr_count)` — legacy instruction-count
ticker, used by lockstep. Bit-stable across runs.
* `tick_vsync_wallclock()` — new Instant-based ticker. Fires
`floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor
and advances the anchor by that many full periods (no
lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a
forward-jumping clock can't overflow the FIFO.
* `KernelState.parallel_active` flag set at startup from
`--parallel` / `XENIA_PARALLEL=1`. Read by `coord_pre_round`
in main.rs to choose between the two tickers.
Verification:
* cargo test --workspace --release: 561 passing (+3 new
wall-clock tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to
pre-Phase-3 baseline. interrupts_delivered preserved at
~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered
rose from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps
burst delivery; that's a separate bottleneck — addressed
by raising cap when --parallel queue depth becomes the
next blocker.)
Trade-off: --parallel runs are non-deterministic at the
v-sync rate by design (per audit M05 PPCBUG-703 already).
Lockstep stays bit-identical, so the `sylpheed_n*m.json`
goldens are untouched.
Audit IDs: KRNBUG-D08 (closed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Major HLE buildout in exports.rs: KeInitializeSemaphore now seeds
count/limit, XexGet{Module,Procedure}Address use distinct
HMODULE_XBOXKRNL/HMODULE_XAM pseudo-handles with a reverse
(ModuleId,ordinal)→thunk_addr map, plus sweeping additions across
sync primitives, file I/O, semaphores, events, threads, and
allocator paths needed to advance Sylpheed past VdSwap=2.
New modules:
- thread.rs — ThreadRef + per-thread suspension/wake plumbing
- interrupts.rs — IRQ delivery, pending-IRQ slots, IPI helpers
- path.rs — guest path normalization (D:\\, game:\\, etc.)
- audit.rs — --trace-handles harness backing the handle audit
- ui_bridge.rs — kernel-side endpoint of the xenia-ui bridge
(input snapshots, framebuffer publish handles)
state.rs grows to own the HW-slot scheduler state, the new audit /
UI bridge handles, and the per-handle reverse maps. xam.rs and
objects.rs follow suit for the HLE additions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>