AUDIT-059 round 15 — diagnostic. When `--audit-r3-dump-bytes=N` is set,
every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line
with N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap
256B). Sized for the 80-byte stack-local struct at sub_82452DC0's
`r31+96` (probe sub_8245B000 entry where r3 IS the struct ptr).
Settable via `XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest
unaffected (empty-set fast path in fire_audit_pc_probe_if_match).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-14 of AUDIT-2BF (singleton-dump). The bctrl at sub_822F1AA8+0x90
(PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences
its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs.
To name the resolved target we need to dump the (singleton, vtable,
vtable[0]) chain on probe firing.
Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single
guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel
emits a paired `AUDIT-MEM-READ` line with three guest reads:
AUDIT-MEM-READ addr=0x828E1F08 val=<*addr> vtable=<**addr> \
vtable[0]=<***addr+0> vtable[24]=<***addr+24> ...
`vtable[24]` is included as the slot-6 method (audit-059 round 9
documented the canary silph chain dispatching slot 6 of a vtable here).
Read-only; lockstep digest unaffected. ~30 LOC across state.rs and
main.rs. `cmd_check` opts out of the flag (same policy as the existing
audit_pc_probe_hex).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned
for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8
(PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable
slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw,
cycle, lr, r3, r11) plus four guest-memory dereferences off r3 — the
vtable, slot-6 method pointer, auxiliary handle field, and embedded
sub-object vtable — so the line can be compared head-to-head with
canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644,
slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to
identify whether ours dispatches to the wrong vtable on a correct
object (case A) or to a wrong object entirely (case B).
Why this addition rather than reuse of an existing probe: --lr-trace
emits JSONL designed for canary-side diffing and only captures
r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR
flags and lr but again no memory; --ctor-probe is single-shot per PC
and walks the stack back-chain. None of them load the four indirect
fields needed to identify a vtable-shape divergence.
Implementation:
- state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper
`fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path
keeps the cost to one is_empty() check per worker_prologue call
when the flag is unused. Read-only — no guest state mutation,
lockstep digest unchanged.
- main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma
parsing (tolerates `0x` prefix), settable also via
XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner;
cmd_check passes None so check digests are unaffected.
Probe wired into worker_prologue alongside fire_ctor_probe / fire_-
branch_probe / fire_lr_trace. Like its siblings, it fires once per
basic-block entry — known limitation (audit-045 reading-error class
13); use a block-entry PC if probing a mid-block instruction.
Verification: kernel 127/127, app 5/5 non-ignored, no behaviour
change with empty flag.
Cross-references audit-059 round 9's canary capture and lays the
groundwork for the round-10 ours-side comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lockstep vsync delivery was capped at 54/run despite the ticker firing
333 periods and dispatcher being called 1.2M times. Root cause: the
blanket `is_in_callback()` gate skipped dispatch entirely whenever the
async audio path held `interrupts.saved`, which is essentially the
entire boot (audio worker rarely hits its LR_HALT_SENTINEL between
back-to-back callbacks). 5.85M dispatch_skip_in_callback events drowned
out the 55 with-pending windows.
Graphics dispatch (iterate-2.BE) runs the ISR synchronously and
restores the borrowed context before returning — it doesn't touch
`interrupts.saved`. The only real conflict is if graphics picks the
*same* thread audio borrowed (which would stomp audio's
SavedCallbackCtx). Replace the blanket gate with per-thread exclusion:
when audio is mid-flight, exclude only its `injected_ref` from
victim selection. Falls through to the existing no-victim drop if
that's the only candidate.
Lockstep (50M instr): gpu.interrupt.delivered{source=0} 54 → 295
(5.5×), all 333 ticker periods either delivered or unarmed (no more
queue_full_drops). Wallclock unchanged ~3 s.
Parallel (30M instr): 1193 → 3458 baseline lift (2.9×), no regression.
Tests: xenia-kernel 127/127, xenia-app 5/5 non-ignored. Lockstep
goldens will drift (interrupts.delivered is in the digest); deferred
to next iterate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The iterate-2.BE host-driven synchronous ISR dispatcher relies on
something queueing v-syncs. In lockstep that's `tick_vsync_instr`,
called from `coord_pre_round` per round. If the scheduler stalls into
`coord_idle_advance` (no Ready threads), the instruction counter
freezes — the accumulator stops incrementing, the ticker stops
queueing, and the dispatcher is left starved for the duration of the
idle wait.
Tick `tick_vsync_wallclock` at the top of `coord_idle_advance` so
v-syncs keep firing on host time even when the guest scheduler is
parked. The dispatcher in the outer loop drains whatever we queue on
the next iteration. Same MMIO `D1MODE_VBLANK_VLINE_STATUS` bit-set as
the production path.
Note: empirically in Sylpheed at 50M/500M instruction horizons,
`coord_idle_advance` is never reached (tids 9/10/12 stay Ready through
the early-boot deadlock), so this commit doesn't move
`gpu.interrupt.delivered{source=0}` off 54 for this title at these
horizons. It is the correct fix for the documented starvation pattern
and will activate as soon as the kernel reaches a state where Ready
threads drop to zero with timers/waits pending.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.
Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.
Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
callback path (audit-048) still uses the asynchronous LR-sentinel
inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
* dispatch_graphics_interrupts(kernel, mem, &mut stats,
&mut decode_cache, thunk_map): new fn. Drains the full FIFO per
call. Victim selection same shape (Ready preferred, else
Blocked, skip Idle/Exited/ServicingIrq), but the call is
synchronous - we run step_cached + import-thunk dispatch inline
on the borrowed ctx until pc == LR_HALT_SENTINEL.
MAX_INSTRS_PER_ISR = 1M safety budget.
* coord_pre_round: graphics-IRQ injection call removed. Audio
path unchanged (still calls try_inject_audio_callback).
* run_execution + run_execution_parallel: each now owns a
persistent isr_decode_cache and calls
dispatch_graphics_interrupts after coord_pre_round.
* try_inject_graphics_interrupt: deleted (118 LOC).
No new public APIs, no new dependencies, no changes to xenia-cpu.
Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 18:58:40 +02:00
3 changed files with 576 additions and 124 deletions
/// M12 — diagnostic. If the live PC for HW slot `hw_id` is in
/// `self.lr_trace_pcs`, emit one JSONL record. Format mirrors what
/// xenia-canary's `--log_lr_on_pc` patch emits, plus the cycle
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.