9 Commits

Author SHA1 Message Date
MechaCat02
40f208ea4e [2.BF] Silph WorkerCtx: install canary's real sub-vtable at [+0x2C][0]
Round-21 pivot of the audit-059 synth-spawn module. Round 20 made the
silph::WorkerCtx workers run by attaching a 32-slot stub sub-vtable
where every entry was a `li r3, 0; blr` stub — workers spawned but
spun forever because slots 15/17 short-circuited to NULL ("no work").

Round 21 reads canary's real sub-vtable VA out of the XEX `.rdata` —
`0x8200A168` — and points `[sub_object + 0]` at it directly. The
vtable bytes live in the static image both engines map, so no guest
memory is consumed and slot 15 (= `sub_824FCCC8`) and slot 17
(= `sub_824FCE38`) — the only slots `sub_82506B08` ever calls —
become working game methods.

Discovery method (canary probes in
`audit-runs/audit-059-handle-disambiguation/round21-subvtable-canary/`):
  1. `--audit_jit_prolog_pc=0x82506B08` to catch the first WorkerCtx
     virtual-dispatch entry; `[r3+0x2C]` revealed the sub-object VA.
  2. Re-run with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to deref
     `[sub-object + 0]` = sub-vtable VA = 0x8200A168.
  3. PE inspection (`xex-text/xex-rdata` is the static image) reads
     all 31 slots; slot 15 -> sub_824FCCC8, slot 17 -> sub_824FCE38.
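The probe chain in steps 1-3 amounts to three 32-bit loads. The sketch below replays it against a toy sparse memory map; `GuestMem`, `resolve_sub_slot`, and the addresses used in the checks are hypothetical stand-ins, not the emulator's API. It assumes 4-byte vtable entries, so slot N sits at byte offset 4*N (slot 15 at offset 60, matching the `lwz r11, 60(r11)` load in the worker bodies).

```rust
use std::collections::HashMap;

/// Toy guest memory: sparse map of 4-byte-aligned VA -> 32-bit word.
/// Hypothetical stand-in for the emulator's real memory interface.
pub struct GuestMem {
    words: HashMap<u32, u32>,
}

impl GuestMem {
    pub fn new() -> Self {
        GuestMem { words: HashMap::new() }
    }

    /// Store a 32-bit word at `va` (must be 4-byte aligned).
    pub fn write_u32(&mut self, va: u32, val: u32) {
        assert_eq!(va % 4, 0, "unaligned store at {va:#010x}");
        self.words.insert(va, val);
    }

    /// Load a 32-bit word; unmapped addresses read as 0.
    pub fn read_u32(&self, va: u32) -> u32 {
        self.words.get(&va).copied().unwrap_or(0)
    }
}

/// Byte offset of vtable slot `n` (32-bit function pointers).
pub const fn slot_offset(n: u32) -> u32 {
    n * 4
}

/// Resolve the method a WorkerCtx worker would `bctrl` into for slot `n`:
/// [ctx+0x2C] -> sub-object, [sub+0] -> sub-vtable, [vtable + 4n] -> method.
/// Returns None if any link in the chain is NULL.
pub fn resolve_sub_slot(mem: &GuestMem, ctx: u32, n: u32) -> Option<u32> {
    let sub_object = mem.read_u32(ctx.wrapping_add(0x2C));
    if sub_object == 0 {
        return None;
    }
    let vtable = mem.read_u32(sub_object);
    if vtable == 0 {
        return None;
    }
    let method = mem.read_u32(vtable.wrapping_add(slot_offset(n)));
    if method == 0 { None } else { Some(method) }
}
```

With canary's sub-vtable VA, slot 15's entry lives at 0x8200A168 + 0x3C = 0x8200A1A4.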

Smoke metrics (50M instructions, `XENIA_CACHE_PERSIST=1
XENIA_SILPH_SYNTH=1`, audit-runs/audit-059-handle-disambiguation/
round21-real-vtable/):
  * 4/4 workers spawned, no crash, no new fault
  * KeSetEvent 633885 -> 431860 (-32%)
  * KeWaitForSingleObject 258441 -> 185762 (-28%)
  * Per-handle state unchanged on the focused stalled set
    (0x1020/0x1090 still `<NO_SIGNALS_DESPITE_WAITS>`,
    0x12a4/0x12ac/0x1218/0x1224 still `<UNCREATED>`).
  * No VdSwap/draws progression observed in this window.

Verdict: B (partial). The workers no longer spin in a stub loop
(internal call density shifted), but the focused wedge handles still
don't get signalled. Suspected root cause (unconfirmed): the workers
are now waiting on the WorkerCtx's own KEVENTs (synthesised at
+0x54/+0x94) for upstream work that no producer enqueues.

Net LOC: 29 ins / 31 del. Tests: workspace passes (lockstep app
tests, kernel 127/127, hir 288/288, scheduler 38/38).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:19:52 +02:00
MechaCat02
8683fb59ed [2.BF] Silph WorkerCtx: synthesize sub-object + vtable at [+0x2C]
Audit-059 round 19 isolated the round-18 worker fault: the four
silph::WorkerCtx worker bodies all execute the sequence

    lwz r3, 44(rN)     ; r3 = [ctx+0x2C] — sub-object pointer
    lwz r11, 0(r3)     ; r11 = sub-object vtable
    lwz r11, 60(r11)   ; r11 = sub-object vtable[15]
    mtctr r11
    bctrl

Ours left [ctx+0x2C] NULL → PC=0 fault on first virtual dispatch. Round 19
recommended materialising a sub-object whose vtable slots all point at an
existing trivial-return stub, so workers idle live (returning NULL work)
without crashing.

Changes (silph_synth.rs only, +63/-6):

- Grow SILPH_CTX_SIZE 0x500 → 0x800 to embed sub-object at +0x300 and a
  32-slot sub-vtable at +0x500 in the same heap_alloc.
- After ctx header init, write sub-object pointer at [ctx+0x2C], the XEX-
  resident wrapper constant 0xBE568F00 (round-7 finding) at [ctx+0x30],
  and leave [ctx+0x28] NULL (matches canary first-fire snapshot).
- Populate every slot of the 32-entry sub-vtable with VA 0x8216CAA4, the
  first 4-byte-aligned standalone `li r3, 0; blr` stub located by a fresh
  PE-text scan (preceded by a `blr` terminating the previous function).
- Sub-object body itself is zero-filled apart from the [+0]=vtable_ptr
  write; round-19 disassembly confirms workers only touch slots 15/17.
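The stub search and the round-19 layout can be sketched as below, under stated assumptions: the `.text` section is available as a big-endian byte slice with a known base VA, and `find_null_return_stub` / `synth_sub_object_stores` are illustrative names, not functions from silph_synth.rs. The instruction words are the standard PowerPC encodings: `li r3, 0` is `addi r3, 0, 0` = `0x38600000`, and `blr` = `0x4E800020`.

```rust
/// `li r3, 0` (addi r3, 0, 0) and `blr`, big-endian PowerPC encodings.
const LI_R3_0: u32 = 0x3860_0000;
const BLR: u32 = 0x4E80_0020;

fn word_at(text: &[u8], off: usize) -> Option<u32> {
    let b = text.get(off..off + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// VA of the first 4-byte-aligned `li r3, 0; blr` pair that is immediately
/// preceded by a `blr`, i.e. a standalone two-instruction function rather
/// than the tail of a larger one.
pub fn find_null_return_stub(text: &[u8], base_va: u32) -> Option<u32> {
    let mut off = 4; // leave room for the preceding `blr`
    while off + 8 <= text.len() {
        if word_at(text, off - 4) == Some(BLR)
            && word_at(text, off) == Some(LI_R3_0)
            && word_at(text, off + 4) == Some(BLR)
        {
            return Some(base_va + off as u32);
        }
        off += 4;
    }
    None
}

/// Round-19 layout inside one 0x800-byte ctx allocation, expressed as
/// (ctx-relative offset, 32-bit value) stores.
pub fn synth_sub_object_stores(ctx_va: u32, stub_va: u32) -> Vec<(u32, u32)> {
    const SUB_OBJECT_OFF: u32 = 0x300;
    const SUB_VTABLE_OFF: u32 = 0x500;
    const SUB_VTABLE_SLOTS: u32 = 32;
    let mut stores = vec![
        (0x28, 0),                                          // left NULL, as in canary's snapshot
        (0x2C, ctx_va.wrapping_add(SUB_OBJECT_OFF)),        // sub-object pointer
        (0x30, 0xBE56_8F00),                                // XEX-resident wrapper constant
        (SUB_OBJECT_OFF, ctx_va.wrapping_add(SUB_VTABLE_OFF)), // [sub+0] = sub-vtable
    ];
    // Every slot points at the same null-returning stub.
    for slot in 0..SUB_VTABLE_SLOTS {
        stores.push((SUB_VTABLE_OFF + slot * 4, stub_va));
    }
    stores
}
```

Returning (offset, value) pairs instead of writing guest memory keeps the sketch independent of any memory API; a caller would issue each store big-endian at `ctx + offset`. The last sub-vtable slot ends at +0x580, well inside the 0x800 allocation.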

Smoke (XENIA_SILPH_SYNTH=1, persistent cache, 5e7 instr):

- Lockstep: no crash, all 4 workers (tid=6/7/8/9) reach Ready in deep
  worker-body PCs (0x825067xx/0x825089xx/0x825091xx). Verdict (D) —
  workers run their idle loop returning NULL; existing silph waiters
  (0x1020, 0x1090) remain <NO_SIGNALS_DESPITE_WAITS> because we
  deliberately neutered productive work.
- Parallel: identical picture, no PC=0/PC=garbage fault anywhere.

No regression in 765-test suite.

Next round: feed real work-items into the intrusive ring at ctx+0x210
so workers' returned-NULL idle becomes returned-work productive; or
discover which sub-vtable slots actually need real callees (slot 15
worker drain, slot 17 producer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 21:04:04 +02:00
MechaCat02
b5885b8560 [2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing)
Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059
identified as never reached by ours' static-init chain (real chain entry
sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8).
Ctx layout follows round 5's live hexdump from canary:

  +0x00   vtable = 0x8200A1E8
  +0x04   self
  +0x08   intrusive list head -> self
  +0x0C   init flag = 1
  +0x10   packed byte field
  +0x18   2x float ~1.0 (UI rates)
  +0x24   flag = 1
  +0x28..+0x30  3x foreign-arena pointers (left NULL — see below)
  +0x54..+0x84  4x X_KEVENT auto-reset, state=0
  +0x94..+0xC4  4x X_KEVENT manual-reset, state=1 (pre-signaled)
  +0x210..+0x250  4-entry intrusive work-ring, empty
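The layout table can be pinned down as constants. This sketch assumes a 16-byte `X_KEVENT` (a bare dispatcher header), which is what makes the listed start offsets step by 0x10: 0x54/0x64/0x74/0x84 and 0x94/0xA4/0xB4/0xC4. The constant names are hypothetical, and the event type codes (0 = notification/manual-reset, 1 = synchronization/auto-reset) follow the NT convention rather than anything stated in this commit.

```rust
/// ctx-relative offsets from the round-5 canary hexdump (round-18 layout).
pub const CTX_VTABLE: u32 = 0x8200_A1E8;
pub const OFF_SELF: u32 = 0x04;
pub const OFF_LIST_HEAD: u32 = 0x08;
pub const OFF_INIT_FLAG: u32 = 0x0C;
pub const OFF_AUTO_EVENTS: u32 = 0x54; // 4x auto-reset, state = 0
pub const OFF_MANUAL_EVENTS: u32 = 0x94; // 4x manual-reset, state = 1
pub const OFF_WORK_RING: u32 = 0x210;
pub const SILPH_CTX_SIZE: u32 = 0x500; // round 18; grown to 0x800 in round 19

/// Assumed X_KEVENT size: a bare 16-byte dispatcher header.
pub const X_KEVENT_SIZE: u32 = 16;

/// (type code, initial signal state) per bank, NT convention assumed.
pub const AUTO_RESET: (u8, u32) = (1, 0);
pub const MANUAL_RESET: (u8, u32) = (0, 1);

/// ctx-relative offsets of the four events in one bank.
pub fn event_offsets(bank_off: u32) -> [u32; 4] {
    std::array::from_fn(|i| bank_off + i as u32 * X_KEVENT_SIZE)
}
```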

Worker spawn mirrors AUDIT-048's audio-worker pattern in
xaudio_register_render_driver: per-worker allocate_thread_image +
state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first
dat/* VFS open (ours' earliest is dat/files.tbl), which is when canary
runs the equivalent chain.

ROUND 18 OUTCOME — opt-in only:

With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at
cycle ~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8)
spawns. Per task constraints this is STOP-and-report: the ctx fields
+0x28/+0x2C/+0x30 (foreign heap pointers — canary's 0x30057018,
0xBCE25640, 0xBE568F00, distinct arenas per audit-059 round 7) are
left NULL, and the worker bodies plausibly dereference one of them.
Synthesising those is a fresh investigation (round 19+).

With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot
completes normally (11 spawns, VdSwap=1, KeSetEvent=2,
KeReleaseSemaphore=1 — matches default baseline). The ctx remains
materialised in guest memory at the logged VA for downstream probing.

Default (env var unset): no synth, no regression.
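The three modes and the trigger condition can be sketched as follows. Assumptions: the env value is matched exactly (`1`, `suspend`), anything else falls back to the default no-synth behaviour, and VFS paths arrive relative and slash-separated (as in `dat/files.tbl`); `SilphSynthMode` and `DatOpenTrigger` are hypothetical names.

```rust
/// How the synthetic WorkerCtx workers are created, per `XENIA_SILPH_SYNTH`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SilphSynthMode {
    Off,       // unset: no synth
    Ready,     // "1": workers spawned Ready
    Suspended, // "suspend": ctx materialised, workers spawned Suspended
}

impl SilphSynthMode {
    /// Map the raw env value; unrecognised values are treated as Off
    /// (an assumption of this sketch).
    pub fn from_env_value(v: Option<&str>) -> Self {
        match v.map(str::trim) {
            Some("1") => SilphSynthMode::Ready,
            Some("suspend") => SilphSynthMode::Suspended,
            _ => SilphSynthMode::Off,
        }
    }
}

/// One-shot trigger: fires on the first VFS open under `dat/` only.
#[derive(Default)]
pub struct DatOpenTrigger {
    fired: bool,
}

impl DatOpenTrigger {
    pub fn on_vfs_open(&mut self, path: &str) -> bool {
        if self.fired || !path.starts_with("dat/") {
            return false;
        }
        self.fired = true;
        true
    }
}
```

A caller would build the mode once with `SilphSynthMode::from_env_value(std::env::var("XENIA_SILPH_SYNTH").ok().as_deref())` and consult the trigger from the VFS open hook.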

Files:
  crates/xenia-kernel/src/silph_synth.rs   (new, 225 LOC)
  crates/xenia-kernel/src/lib.rs           (+1 LOC, register module)
  crates/xenia-kernel/src/exports.rs       (+37 LOC, hook in open_vfs_file)
  crates/xenia-kernel/src/state.rs         (+18 LOC, 4 silph_synth_* fields)

Tests: cargo test --release --workspace = 765 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 20:44:29 +02:00
MechaCat02
9340ff4592 [Audit] --audit-r3-dump-bytes: dump N bytes at r3 when probe fires
AUDIT-059 round 15 — diagnostic. When `--audit-r3-dump-bytes=N` is set,
every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line
with N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap
256B). Sized for the 80-byte stack-local struct at sub_82452DC0's
`r31+96` (probe sub_8245B000 entry where r3 IS the struct ptr).

Settable via `XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest
unaffected (empty-set fast path in fire_audit_pc_probe_if_match).
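A sketch of the size normalisation and lane formatting. The commit says only "4-byte aligned, cap 256B", so rounding *up* to the next multiple of 4 is an assumption, as is the exact line layout; `read_u32` stands in for a big-endian guest load.

```rust
/// Documented cap on a single dump.
pub const R3_DUMP_CAP: u32 = 256;

/// Round the requested count up to a multiple of 4, then clamp to the cap.
/// (Round-up rather than round-down is an assumption of this sketch.)
pub fn r3_dump_len(requested: u32) -> u32 {
    (requested.saturating_add(3) / 4 * 4).min(R3_DUMP_CAP)
}

/// Format `requested` bytes at `r3` as u32 lanes. `read_u32` stands in for
/// the emulator's big-endian guest load; the line layout is illustrative.
pub fn format_r3_dump(r3: u32, requested: u32, read_u32: impl Fn(u32) -> u32) -> String {
    let len = r3_dump_len(requested);
    let lanes: Vec<String> = (0..len / 4)
        .map(|i| format!("{:08x}", read_u32(r3.wrapping_add(i * 4))))
        .collect();
    format!("AUDIT-R3-DUMP r3={r3:#010x} len={len} {}", lanes.join(" "))
}
```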

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 19:39:22 +02:00
MechaCat02
bcd018659b [Audit] --audit-mem-dump-chain: deref a guest address N levels for diagnosis
Round 14 of AUDIT-2BF (singleton dump). The bctrl at sub_822F1AA8+0x90
(PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences
its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs.
To name the resolved target we need to dump the (singleton, vtable,
vtable[0]) chain on probe firing.

Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single
guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel
emits a paired `AUDIT-MEM-READ` line that follows the pointer chain
three levels deep from that VA:

  AUDIT-MEM-READ addr=0x828E1F08 val=<*addr> vtable=<**addr> \
                 vtable[0]=<***addr+0> vtable[24]=<***addr+24> ...

`vtable[24]` (byte offset 24, i.e. slot 6 with 4-byte entries) is included
because audit-059 round 9 documented the canary silph chain dispatching
slot 6 of a vtable here.
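A minimal sketch of the chain walk, assuming a big-endian `read_u32` guest load and 4-byte vtable entries (so `vtable[24]` means byte offset 24, slot 6). Stopping at the first NULL link, rather than dereferencing address 0, is this sketch's choice; the commit does not say how the real probe handles a NULL singleton.

```rust
/// Result of the 3-deep dereference emitted alongside a probe fire.
#[derive(Debug, PartialEq, Eq)]
pub struct MemReadChain {
    pub addr: u32,
    pub val: u32,    // *addr: the singleton pointer stored in the global
    pub vtable: u32, // **addr: the singleton's vtable
    pub slot0: u32,  // vtable[0]: byte offset 0
    pub slot6: u32,  // vtable[24]: byte offset 24 = slot 6
}

/// `read_u32` stands in for a big-endian guest load.
pub fn read_chain(addr: u32, read_u32: impl Fn(u32) -> u32) -> Option<MemReadChain> {
    let val = read_u32(addr);
    if val == 0 {
        return None;
    }
    let vtable = read_u32(val);
    if vtable == 0 {
        return None;
    }
    Some(MemReadChain {
        addr,
        val,
        vtable,
        slot0: read_u32(vtable),
        slot6: read_u32(vtable.wrapping_add(24)),
    })
}
```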

Read-only; lockstep digest unaffected. ~30 LOC across state.rs and
main.rs. `cmd_check` opts out of the flag (same policy as the existing
audit_pc_probe_hex).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 12:13:42 +02:00
MechaCat02
09e59e09b7 Audit-2BF.delta: add --audit-pc-probe-hex for silph-init bctrl probe
Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned
for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8
(PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable
slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw,
cycle, lr, r3, r11) plus four guest-memory dereferences off r3 — the
vtable, slot-6 method pointer, auxiliary handle field, and embedded
sub-object vtable — so the line can be compared head-to-head with
canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644,
slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to
identify whether ours dispatches to the wrong vtable on a correct
object (case A) or to a wrong object entirely (case B).

Why this addition rather than reuse of an existing probe: --lr-trace
emits JSONL designed for canary-side diffing and only captures
r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR
flags and lr but again no memory; --ctor-probe is single-shot per PC
and walks the stack back-chain. None of them load the four indirect
fields needed to identify a vtable-shape divergence.

Implementation:
  - state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper
    `fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path
    keeps the cost to one is_empty() check per worker_prologue call
    when the flag is unused. Read-only — no guest state mutation,
    lockstep digest unchanged.
  - main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma
    parsing (tolerates `0x` prefix), settable also via
    XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner;
    cmd_check passes None so check digests are unaffected.
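The empty-set fast path described above looks roughly like this. `ProbeState` and `fire_if_match` are simplified stand-ins for `audit_pc_probe_pcs` / `fire_audit_pc_probe_if_match`: they record lines into a `Vec` instead of logging and carry only a subset of the real record's fields.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the kernel's probe state (hypothetical names).
#[derive(Default)]
pub struct ProbeState {
    pub audit_pc_probe_pcs: HashSet<u32>,
    pub fired: Vec<String>,
}

impl ProbeState {
    /// Called on every block entry. When nothing is armed, the only cost is
    /// one `is_empty()` check; no hashing, no guest-memory reads.
    pub fn fire_if_match(&mut self, pc: u32, tid: u32, r3: u32) {
        if self.audit_pc_probe_pcs.is_empty() {
            return;
        }
        if self.audit_pc_probe_pcs.contains(&pc) {
            // Read-only with respect to guest state: only a line is recorded.
            self.fired
                .push(format!("AUDIT-PC-PROBE pc={pc:#010x} tid={tid} r3={r3:#010x}"));
        }
    }
}
```

Because the hook only sees block-entry PCs, an armed PC in the middle of a basic block never matches.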

Probe wired into worker_prologue alongside fire_ctor_probe /
fire_branch_probe / fire_lr_trace. Like its siblings, it fires once per
basic-block entry — known limitation (audit-045 reading-error class
13); use a block-entry PC if probing a mid-block instruction.

Verification: kernel 127/127, app 5/5 non-ignored, no behaviour
change with empty flag.

Cross-references audit-059 round 9's canary capture and lays the
groundwork for the round-10 ours-side comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 10:59:03 +02:00
MechaCat02
5a8fe21ad5 Iterate-2.BF.γ: refine is_in_callback gate to per-thread exclusion
Lockstep vsync delivery was capped at 54/run despite the ticker firing
333 periods and the dispatcher being called 1.2M times. Root cause: the
blanket `is_in_callback()` gate skipped dispatch entirely whenever the
async audio path held `interrupts.saved`, which is essentially the
entire boot (audio worker rarely hits its LR_HALT_SENTINEL between
back-to-back callbacks). 5.85M dispatch_skip_in_callback events drowned
out the 55 with-pending windows.

Graphics dispatch (iterate-2.BE) runs the ISR synchronously and
restores the borrowed context before returning — it doesn't touch
`interrupts.saved`. The only real conflict is if graphics picks the
*same* thread audio borrowed (which would stomp audio's
SavedCallbackCtx). Replace the blanket gate with per-thread exclusion:
when audio is mid-flight, exclude only its `injected_ref` from
victim selection. Falls through to the existing no-victim drop if
that's the only candidate.
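The per-thread exclusion can be sketched as a pure selection function. `HwState` here is a simplified stand-in (the real `Blocked` and `ServicingIrq` variants carry a wait reason), and `audio_injected` models the audio path's `injected_ref` as a plain thread index.

```rust
/// Simplified thread states; the real Blocked/ServicingIrq carry a reason.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwState {
    Ready,
    Blocked,
    Idle,
    Exited,
    ServicingIrq,
}

/// Choose a thread to borrow for a graphics ISR.
/// Preference: Ready, then Blocked. Idle/Exited/ServicingIrq are never
/// chosen, and neither is the thread the in-flight audio callback borrowed
/// (`audio_injected`). `None` means the source is dropped.
pub fn pick_graphics_victim(states: &[HwState], audio_injected: Option<usize>) -> Option<usize> {
    let first_with = |want: HwState| {
        states
            .iter()
            .enumerate()
            .find(|&(i, &s)| s == want && Some(i) != audio_injected)
            .map(|(i, _)| i)
    };
    first_with(HwState::Ready).or_else(|| first_with(HwState::Blocked))
}
```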

Lockstep (50M instr): gpu.interrupt.delivered{source=0} 54 → 295
(5.5×), all 333 ticker periods either delivered or unarmed (no more
queue_full_drops). Wallclock unchanged ~3 s.

Parallel (30M instr): 1193 → 3458 baseline lift (2.9×), no regression.

Tests: xenia-kernel 127/127, xenia-app 5/5 non-ignored. Lockstep
goldens will drift (interrupts.delivered is in the digest); deferred
to next iterate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 19:52:16 +02:00
MechaCat02
51489e34db Iterate-2.BE Path β: tick vsync from coord_idle_advance
The iterate-2.BE host-driven synchronous ISR dispatcher relies on
something queueing v-syncs. In lockstep that's `tick_vsync_instr`,
called from `coord_pre_round` per round. If the scheduler stalls into
`coord_idle_advance` (no Ready threads), the instruction counter
freezes — the accumulator stops incrementing, the ticker stops
queueing, and the dispatcher is left starved for the duration of the
idle wait.

Tick `tick_vsync_wallclock` at the top of `coord_idle_advance` so
v-syncs keep firing on host time even when the guest scheduler is
parked. The dispatcher in the outer loop drains whatever we queue on
the next iteration. Same MMIO `D1MODE_VBLANK_VLINE_STATUS` bit-set as
the production path.
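A sketch of a host-clock ticker plus the status-bit set, assuming a fixed ~16.7 ms (60 Hz) period and coalescing of missed periods into a single tick. `WallclockVsync` is a hypothetical stand-in for `tick_vsync_wallclock`, whose internals this commit does not show.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical host-clock v-sync ticker.
pub struct WallclockVsync {
    next_due: Instant,
    period: Duration,
}

impl WallclockVsync {
    pub fn new(start: Instant, period: Duration) -> Self {
        assert!(!period.is_zero(), "v-sync period must be non-zero");
        WallclockVsync { next_due: start + period, period }
    }

    /// True if at least one period has elapsed by `now`. Several missed
    /// periods collapse into one tick (an assumption of this sketch).
    pub fn tick(&mut self, now: Instant) -> bool {
        if now < self.next_due {
            return false;
        }
        while self.next_due <= now {
            self.next_due += self.period;
        }
        true
    }
}

/// Set the v-blank bit (bit 0) without disturbing the other status bits.
pub fn raise_vblank(status: &AtomicU32) {
    status.fetch_or(0x1, Ordering::Relaxed);
}
```

`fetch_or` performs the load-OR-store as one atomic operation; a separate `load`/`store` pair gives the same result only while no other thread writes the register between the two calls.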

Note: empirically in Sylpheed at 50M/500M instruction horizons,
`coord_idle_advance` is never reached (tids 9/10/12 stay Ready through
the early-boot deadlock), so this commit doesn't move
`gpu.interrupt.delivered{source=0}` off 54 for this title at these
horizons. It is the correct fix for the documented starvation pattern
and will activate as soon as the kernel reaches a state where Ready
threads drop to zero with timers/waits pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 19:22:03 +02:00
MechaCat02
9a93152981 Iterate-2.BE: host-driven synchronous graphics ISR delivery
Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
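The borrow/run/restore shape can be sketched with a toy context and a caller-supplied `step` closure standing in for the interpreter. The sentinel value, the use of r3/r4 for the ISR arguments, and the budget handling are assumptions for illustration; the real dispatcher additionally swaps `scheduler.current` and flips `Blocked` victims to `ServicingIrq` around the run.

```rust
/// Hypothetical sentinel return address; the ISR's final `blr` lands here.
pub const LR_HALT_SENTINEL: u32 = 0xBCBC_BCBC;

/// Toy guest register context (only what the sketch needs).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PpcContext {
    pub pc: u32,
    pub lr: u32,
    pub gpr: [u64; 32],
}

/// Borrow `ctx`, run the ISR at `isr_pc` inline until it returns to the
/// sentinel, then restore the thread's original registers. `step` executes
/// one guest instruction. Returns Ok(instructions executed), or Err(budget)
/// if the ISR did not return within `max_instrs`.
pub fn run_isr_inline(
    ctx: &mut PpcContext,
    isr_pc: u32,
    args: (u32, u32),
    max_instrs: u64,
    mut step: impl FnMut(&mut PpcContext),
) -> Result<u64, u64> {
    let saved = ctx.clone(); // the victim's state, restored below
    ctx.pc = isr_pc;
    ctx.lr = LR_HALT_SENTINEL; // the ISR's blr returns here
    ctx.gpr[3] = u64::from(args.0); // assumed argument registers
    ctx.gpr[4] = u64::from(args.1);
    let mut executed = 0;
    let result = loop {
        if ctx.pc == LR_HALT_SENTINEL {
            break Ok(executed);
        }
        if executed == max_instrs {
            break Err(executed);
        }
        step(&mut *ctx);
        executed += 1;
    };
    *ctx = saved; // restore regardless of outcome
    result
}
```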

Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.
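The ~11× figure is a per-second comparison; worked out from the counts and durations above:

```latex
\[
\frac{54}{3.9\ \mathrm{s}} \approx 13.8\ \mathrm{s}^{-1}, \qquad
\frac{4712}{30\ \mathrm{s}} \approx 157.1\ \mathrm{s}^{-1}, \qquad
\frac{4712/30}{54/3.9} = \frac{4712 \times 3.9}{30 \times 54} = \frac{18376.8}{1620} \approx 11.3
\]
```

By contrast, the raw count ratio 4712/54 ≈ 87.3 compares a 30 s run with a 3.9 s run, so it overstates the rate gap.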

Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.

Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
  SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
  callback path (audit-048) still uses the asynchronous LR-sentinel
  inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
  * dispatch_graphics_interrupts(kernel, mem, &mut stats,
    &mut decode_cache, thunk_map): new fn. Drains the full FIFO per
    call. Victim selection same shape (Ready preferred, else
    Blocked, skip Idle/Exited/ServicingIrq), but the call is
    synchronous: we run step_cached + import-thunk dispatch inline
    on the borrowed ctx until pc == LR_HALT_SENTINEL.
    MAX_INSTRS_PER_ISR = 1M safety budget.
  * coord_pre_round: graphics-IRQ injection call removed. Audio
    path unchanged (still calls try_inject_audio_callback).
  * run_execution + run_execution_parallel: each now owns a
    persistent isr_decode_cache and calls
    dispatch_graphics_interrupts after coord_pre_round.
  * try_inject_graphics_interrupt: deleted (118 LOC).

No new public APIs, no new dependencies, no changes to xenia-cpu.

Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-06 18:58:40 +02:00
6 changed files with 912 additions and 124 deletions


@@ -242,6 +242,44 @@ enum Commands {
 /// line). Stdout when omitted.
 #[arg(long)]
 lr_trace_out: Option<String>,
+/// AUDIT-2BF — comma-separated list of guest PCs (hex, no `0x`
+/// prefix required) to capture as one-line `AUDIT-PC-PROBE`
+/// records on every fire. Designed for the silph init chain
+/// virtual-dispatch site at `sub_82172BA0+0x1E8` (PC
+/// `0x82172D88`, a `bctrl` after a 3-deep vtable-slot-6 load).
+/// Each record carries (pc, tid, hw, cycle, lr, r3, r11) plus
+/// four guest-memory dereferences off r3: `[r3+0]` (vtable),
+/// `[[r3+0]+24]` (slot 6 method = bctrl target), `[r3+0x0C]`
+/// (auxiliary handle), `[r3+0x30]` (embedded sub-object vtable).
+/// Compares directly against canary's round-9 capture:
+/// r3=0xBCCC52C0, [r3+0]=0x820A3644, slot6=sub_821B55D8,
+/// [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870. Read-only;
+/// lockstep digest unaffected. Settable via
+/// `XENIA_AUDIT_PC_PROBE`. Example:
+/// `--audit-pc-probe-hex=82172D88,82172D80`.
+#[arg(long)]
+audit_pc_probe_hex: Option<String>,
+/// AUDIT-2BF round 14 — guest VA (hex, optional `0x` prefix) to
+/// dereference 3 deep on every `--audit-pc-probe-hex` fire.
+/// Emits a paired `AUDIT-MEM-READ` line with the singleton value,
+/// vtable, vtable[0] (= first virtual method, the bctrl target
+/// at `0x822F1B4C`), and vtable[24] (= slot 6 = canary's silph
+/// chain target `sub_821B55D8`). Compare ours vs canary to
+/// determine whether the bctrl dispatches to the same function
+/// or a different one. Read-only; lockstep digest unaffected.
+/// Settable via `XENIA_AUDIT_MEM_READ`. Example:
+/// `--audit-mem-read-hex=828E1F08`.
+#[arg(long)]
+audit_mem_read_hex: Option<String>,
+/// AUDIT-052 — number of bytes (4-byte aligned, max 256) to
+/// dump from `r3` on every `--audit-pc-probe-hex` fire. Emits a
+/// paired `AUDIT-R3-DUMP` line with the u32 lanes. Designed for
+/// the 80-byte stack-local struct at `sub_82452DC0` (`r31+96`)
+/// when probing `sub_8245B000` entry — where `r3` IS the struct
+/// pointer. Read-only; lockstep digest unaffected. Settable via
+/// `XENIA_AUDIT_R3_DUMP_BYTES`. Example: `--audit-r3-dump-bytes=80`.
+#[arg(long)]
+audit_r3_dump_bytes: Option<u32>,
 },
 /// Browse XISO disc image contents
 Browse {
@@ -405,6 +443,9 @@ fn main() -> Result<()> {
 probe_db,
 lr_trace,
 lr_trace_out,
+audit_pc_probe_hex,
+audit_mem_read_hex,
+audit_r3_dump_bytes,
 } => cmd_exec(
 &path,
 max_instructions,
@@ -431,6 +472,9 @@ fn main() -> Result<()> {
 probe_db.as_deref(),
 lr_trace.as_deref(),
 lr_trace_out.as_deref(),
+audit_pc_probe_hex.as_deref(),
+audit_mem_read_hex.as_deref(),
+audit_r3_dump_bytes,
 ),
 Commands::Browse { path } => cmd_browse(&path),
 Commands::Info { path } => cmd_info(&path),
@@ -662,6 +706,9 @@ fn cmd_exec(
 probe_db: Option<&str>,
 lr_trace: Option<&str>,
 lr_trace_out: Option<&str>,
+audit_pc_probe_hex: Option<&str>,
+audit_mem_read_hex: Option<&str>,
+audit_r3_dump_bytes: Option<u32>,
 ) -> Result<()> {
 cmd_exec_inner(
 path,
@@ -689,6 +736,9 @@ fn cmd_exec(
 probe_db,
 lr_trace,
 lr_trace_out,
+audit_pc_probe_hex,
+audit_mem_read_hex,
+audit_r3_dump_bytes,
 None,
 None,
 false,
@@ -735,6 +785,9 @@ fn cmd_check(
 None, // probe_db — same
 None, // lr_trace — same
 None, // lr_trace_out — same
+None, // audit_pc_probe_hex — diagnostic, never wanted on goldens
+None, // audit_mem_read_hex — same
+None, // audit_r3_dump_bytes — same
 out,
 expect,
 stable_digest,
@@ -767,6 +820,9 @@ fn cmd_exec_inner(
 probe_db: Option<&str>,
 lr_trace: Option<&str>,
 lr_trace_out: Option<&str>,
+audit_pc_probe_hex: Option<&str>,
+audit_mem_read_hex: Option<&str>,
+audit_r3_dump_bytes: Option<u32>,
 digest_out: Option<&str>,
 digest_expect: Option<&str>,
 stable_digest: bool,
@@ -1167,6 +1223,84 @@ fn cmd_exec_inner(
 }
 }
+// AUDIT-2BF — `--audit-pc-probe-hex=82172D88,...`. Bare-hex tokens
+// (with or without `0x` prefix). Parses every comma-separated entry
+// as a u32 PC and inserts into `kernel.audit_pc_probe_pcs`. Empty
+// set is the hot-path no-op (single is_empty() check).
+let audit_pc_probe_combined: Option<String> = match (
+audit_pc_probe_hex, std::env::var("XENIA_AUDIT_PC_PROBE").ok(),
+) {
+(Some(s), _) => Some(s.to_string()),
+(None, Some(s)) if !s.is_empty() => Some(s),
+_ => None,
+};
+if let Some(list) = audit_pc_probe_combined {
+for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
+let hex = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")).unwrap_or(token);
+let pc = u32::from_str_radix(hex, 16)
+.map_err(|e| anyhow::anyhow!("--audit-pc-probe-hex {token:?}: {e}"))?;
+kernel.audit_pc_probe_pcs.insert(pc);
+}
+if !quiet && !kernel.audit_pc_probe_pcs.is_empty() {
+let mut pcs: Vec<u32> = kernel.audit_pc_probe_pcs.iter().copied().collect();
+pcs.sort_unstable();
+let strs: Vec<String> = pcs.iter().map(|p| format!("{p:#010x}")).collect();
+tracing::info!(
+"audit-pc-probe armed: {} ({})",
+kernel.audit_pc_probe_pcs.len(),
+strs.join(", "),
+);
+}
+}
+// AUDIT-2BF round 14 — `--audit-mem-read-hex=828E1F08`. Single
+// hex VA (optional `0x` prefix). Stored on `kernel.audit_mem_read_addr`.
+// Paired with `audit_pc_probe_pcs`: on every probe fire, the kernel
+// emits a second `AUDIT-MEM-READ` line dereferencing 3 deep so we can
+// resolve vtable[0] / vtable[24] at the singleton.
+let audit_mem_read_combined: Option<String> = match (
+audit_mem_read_hex, std::env::var("XENIA_AUDIT_MEM_READ").ok(),
+) {
+(Some(s), _) => Some(s.to_string()),
+(None, Some(s)) if !s.is_empty() => Some(s),
+_ => None,
+};
+if let Some(tok) = audit_mem_read_combined {
+let tok = tok.trim();
+if !tok.is_empty() {
+let hex = tok.strip_prefix("0x").or_else(|| tok.strip_prefix("0X")).unwrap_or(tok);
+let addr = u32::from_str_radix(hex, 16)
+.map_err(|e| anyhow::anyhow!("--audit-mem-read-hex {tok:?}: {e}"))?;
+kernel.audit_mem_read_addr = Some(addr);
+if !quiet {
+tracing::info!("audit-mem-read armed: {:#010x}", addr);
+}
+}
+}
+// AUDIT-052 — `--audit-r3-dump-bytes=80`. When set, every
+// `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line
+// with N bytes from `r3` (4-byte aligned, capped at 256). Sized for
+// the 80-byte stack-local struct at `sub_82452DC0`'s `r31+96` —
+// probe `sub_8245B000` entry where `r3 == parent's r31+96`.
+let audit_r3_dump_combined: Option<u32> = match (
+audit_r3_dump_bytes, std::env::var("XENIA_AUDIT_R3_DUMP_BYTES").ok(),
+) {
+(Some(n), _) => Some(n),
+(None, Some(s)) if !s.is_empty() => Some(
+s.parse::<u32>().map_err(|e| anyhow::anyhow!("--audit-r3-dump-bytes {s:?}: {e}"))?,
+),
+_ => None,
+};
+if let Some(n) = audit_r3_dump_combined {
+if n > 0 {
+kernel.audit_r3_dump_bytes = Some(n);
+if !quiet {
+tracing::info!("audit-r3-dump armed: {} bytes", n);
+}
+}
+}
 // Diagnostic. Parse `--dump-addr=0x828F3D08,...` (or
 // `XENIA_DUMP_ADDR=...`) into `kernel.dump_addrs`. The contents
 // are dumped at end-of-run by `dump_thread_diagnostic`. Pure
@@ -1990,7 +2124,13 @@ fn coord_pre_round(
 }
 kernel.fire_due_timers();
-try_inject_graphics_interrupt(kernel);
+// Graphics-interrupt delivery is no longer done here — see
+// `dispatch_graphics_interrupts`, called from the outer loop with
+// `mem` and `&mut stats` in scope. The audio path still uses the
+// asynchronous LR-sentinel inject because each XAudio client has a
+// dedicated worker thread (audit-048 Plan B) that the callback
+// runs on; we just queue the source and the worker_prologue's
+// halt-sentinel restore path closes the loop.
 if kernel.xaudio_tick_enabled {
 try_inject_audio_callback(kernel);
 }
@@ -2010,6 +2150,24 @@ fn coord_idle_advance(
 shutdown: &Option<std::sync::Arc<std::sync::atomic::AtomicBool>>,
 stats: &ExecStats,
 ) -> RoundCtl {
+// Path β (iterate-2.BE follow-up): when the scheduler has no Ready
+// threads, `coord_pre_round`'s instruction-count vsync ticker stops
+// advancing (instruction_count is frozen). That starves the
+// host-driven graphics ISR dispatcher: queue stays empty, no
+// deliveries occur, and the very stall we're trying to break out of
+// gets worse. Tick vsync from wallclock here unconditionally — it's
+// a host-clock read, independent of instruction count, and the
+// dispatcher in the outer loop will drain whatever we queue on the
+// next pass. Mirrors the `--parallel` ticker choice in
+// `coord_pre_round` (`tick_vsync_wallclock` branch).
+if kernel.interrupts.tick_vsync_wallclock() {
+use std::sync::atomic::Ordering;
+let mmio = kernel.gpu.mmio();
+let prev = mmio.d1mode_vblank_vline_status.load(Ordering::Relaxed);
+mmio.d1mode_vblank_vline_status
+.store(prev | 0x1, Ordering::Relaxed);
+}
 let next_timer = kernel.earliest_timer_deadline();
 let next_wait = kernel.scheduler.earliest_wait_deadline();
 let target = match (next_timer, next_wait) {
@@ -2218,6 +2376,7 @@ fn worker_prologue(
 // the helper, no overhead on the hot path.
 kernel.fire_ctor_probe_if_match(hw_id, mem);
 kernel.fire_branch_probe_if_match(hw_id);
+kernel.fire_audit_pc_probe_if_match(hw_id, mem);
 kernel.fire_lr_trace_if_match(hw_id);
 if mem.has_mem_watch() {
@@ -2595,12 +2754,21 @@ fn run_execution(
 let mut workers: [WorkerCtx; xenia_cpu::scheduler::HW_THREAD_COUNT] =
 std::array::from_fn(|i| WorkerCtx::new(i as u8, force_per_instr));
+// Iterate-2.BE — decode cache used by the synchronous ISR
+// dispatcher. ISRs are short (~40 PPC instructions) but fire
+// every ~16.7 ms, so persisting the cache across calls avoids
+// re-decoding the same handful of pages 60×/s.
+let mut isr_decode_cache = xenia_cpu::decoder::DecodeCache::new();
 'outer: loop {
 // Per-round prologue: budget / shutdown / heartbeat / vsync /
-// timers / graphics-interrupt injection. Carved into
+// timers / audio-interrupt injection. Carved into
 // `coord_pre_round` so the parallel scheduler (Step 03+) can
 // call the same coordination logic between phaser barriers
-// without duplicating it from the lockstep path.
+// without duplicating it from the lockstep path. The
+// graphics-interrupt dispatch is hoisted out — it runs
+// *synchronously* (host-driven, iterate-2.BE) and needs `mem`
+// + `&mut stats` which aren't in `coord_pre_round`'s scope.
 match coord_pre_round(
 kernel,
 &stats,
@@ -2612,6 +2780,13 @@ fn run_execution(
 RoundCtl::BreakOuter => break,
 RoundCtl::Continue => {}
 }
+dispatch_graphics_interrupts(
+kernel,
+mem,
+&mut stats,
+&mut isr_decode_cache,
+thunk_map,
+);
 // Snapshot round schedule. `round_schedule` also advances rng state
 // when seeded; mutation is intentional.
@@ -2789,6 +2964,10 @@ fn run_execution_parallel(
 let throttle_start = Instant::now();
+// Iterate-2.BE — decode cache for the synchronous ISR dispatcher.
+// Lives on the coordinator (this) thread; workers never touch it.
+let mut isr_decode_cache = xenia_cpu::decoder::DecodeCache::new();
 const COORD_ID: u8 = xenia_cpu::scheduler::HW_THREAD_COUNT as u8; // = 6
 const PARTY_COUNT: u32 = xenia_cpu::scheduler::HW_THREAD_COUNT as u32 + 1;
@@ -3025,6 +3204,22 @@ fn run_execution_parallel(
 }
 let mut guard = pre_outcome.1;
+// Iterate-2.BE — host-driven synchronous ISR dispatch.
+// Runs under the kernel lock while workers are still parked
+// at the phaser B2 barrier (the coordinator hasn't published
+// the runnable mask or arrived at the phaser yet), so no
+// contention with worker steps.
+{
+let mut s = stats_mtx.lock().expect("stats mutex poisoned");
+dispatch_graphics_interrupts(
+&mut *guard,
+mem,
+&mut *s,
+&mut isr_decode_cache,
+thunk_map,
+);
+}
 guard.scheduler.begin_round();
 let order = guard.scheduler.round_schedule();
@@ -3140,146 +3335,275 @@ fn run_execution_parallel(
stats_mtx.into_inner().expect("stats mutex poisoned") stats_mtx.into_inner().expect("stats mutex poisoned")
} }
/// First-Pixels M2 — inject a queued graphics interrupt into HW thread 0
/// when it's safe to do so (callback registered, no interrupt already
/// running). Called at the top of each scheduler round.
///
/// Unlike the earlier P6 version which only delivered when HW 0 was
/// `Ready`, this one also delivers when HW 0 is `Blocked`: the injector
/// stashes the block reason into the new `HwState::ServicingIrq(reason)`
/// variant, flips the thread to that state so `round_schedule` runs it,
/// and — on callback return to `LR_HALT_SENTINEL` — the restore path
/// re-creates `Blocked(reason)`, unless a `wake()` during the callback
/// (e.g. `KeSetEvent` → `wake_eligible_waiters`) flipped it to `Ready`,
/// in which case the wait was resolved and we leave it.
///
/// This is the fix that unblocks games (like Sylpheed) which gate their
/// main loop on a v-sync callback signaling an event the main thread
/// waits on. The earlier "only-when-Ready" policy dropped 397 of 399
/// observed v-syncs on a 1 B-instruction Sylpheed probe; now they
/// actually get delivered.
fn try_inject_graphics_interrupt(kernel: &mut xenia_kernel::KernelState) {
/// Iterate-2.BE — host-driven synchronous dispatch of all queued
/// graphics interrupts. Mirrors canary's
/// [`EmulateCPInterruptDPC`](../../../../xenia-canary/src/xenia/kernel/kernel_state.cc#L1370)
/// → [`Processor::Execute`](../../../../xenia-canary/src/xenia/cpu/processor.cc#L413)
/// path: pick a guest thread, borrow its `PpcContext`, jam the ISR
/// PC + args into it, and **run the interpreter inline on the host
/// thread** until the ISR returns to `LR_HALT_SENTINEL`. Then restore
/// the borrowed context and continue.
///
/// Drains the full pending FIFO each call — canary's frame-limiter
/// runs at its own cadence and our queue can already hold up to
/// `INTERRUPT_QUEUE_CAP` coalesced v-sync events.
///
/// Why this replaces the prior victim-mutate-then-wait scheme: with
/// the old asynchronous injection, when every guest thread idled (post
/// boot, when Sylpheed's main thread reaches its WAIT_FOREVER on the
/// vsync-driven PKEVENT and all worker threads are likewise Blocked),
/// the next scheduler round had no `Ready` victim and `Blocked` ones
/// callback. Audit-059 measured `gpu.interrupt.delivered = 54` over
/// 3.9 s vs canary's 4712 — an 87× shortfall. Host-driven dispatch
/// makes delivery rate a function of wall clock, not guest-thread
/// readiness.
///
/// Victim selection still mirrors the canary precedent: prefer Ready
/// (no state mangling), else any Blocked thread (we temporarily flip
/// to `ServicingIrq(reason)` for the duration of the inline run so
/// `call_export` etc. see a coherent thread state, and restore the
/// `Blocked(reason)` on the way out unless the ISR itself signaled a
/// wake). Idle / Exited / already-ServicingIrq slots are skipped — if
/// nothing remains the source is dropped (still the right behavior;
/// canary's `XThread::GetCurrentThread()` would assert).
///
/// All execution while in-flight runs against the borrowed thread's
/// `ctx`. We set `scheduler.current = Some(target_ref)` so kernel
/// imports (`KeSetEvent`, `KeReleaseSemaphore`, etc.) reach the right
/// context, then restore the previous `current` on the way out. The
/// dispatch is single-threaded — under `--parallel` it runs on the
/// coordinator with workers parked at the phaser barrier, so there is
/// no contention.
fn dispatch_graphics_interrupts(
kernel: &mut xenia_kernel::KernelState,
mem: &xenia_memory::GuestMemory,
stats: &mut ExecStats,
decode_cache: &mut xenia_cpu::decoder::DecodeCache,
thunk_map: &HashMap<u32, (ModuleId, u16, String)>,
) {
use xenia_cpu::interpreter::{step_cached, StepResult};
use xenia_cpu::scheduler::HwState;
const LR_HALT: u32 = xenia_cpu::context::LR_HALT_SENTINEL as u32;
/// Defensive cap so a runaway ISR can't lock the coordinator on
/// the per-tick dispatch. Real Sylpheed vsync ISR is ~40 PPC
/// instructions; canary's `Processor::Execute` has no analogous
/// cap because it runs on a dedicated host thread, but we run
/// inline on the coordinator so a budget is prudent.
const MAX_INSTRS_PER_ISR: u64 = 1_000_000;
if kernel.interrupts.is_in_callback() {
return;
}
let Some(cb) = kernel.interrupts.callback else {
// No callback registered; drain any pending entries (they
// wouldn't have made it into the queue per `queue_interrupt`'s
// own `callback.is_none()` guard, but be defensive).
kernel.interrupts.pending.clear();
return;
};
let Some(source) = kernel.interrupts.peek_next() else {
return;
};
// Canary's `EmulateCPInterruptDPC` (kernel_state.cc:1373) dispatches on
// whatever the current thread happens to be — real hardware fires the
// interrupt on CPU 2 and the kernel impersonates a DPC on top of
// whichever thread is active. Hard-anchoring to HW 0 breaks the moment
// `main()` returns: Sylpheed's main thread exits right after init, the
// render worker spins on a `PKEVENT` inside the interrupt callback's
// user_data struct (`user_data + 0x5C`), and because HW 0 is now
// `Exited(_)` our injector drops every subsequent vsync — the PKEVENT
// is never signaled and the worker polls forever.
//
// Pick the first HW thread we can plausibly run the callback on:
// 1. Prefer `Ready` (no state-mangling needed)
// 2. Else take a `Blocked(reason)` thread and swap to
// `ServicingIrq(reason)` so the round scheduler runs it; the
// LR-sentinel restore path reinstates the block on callback return
// 3. Skip `Idle`, `Exited`, or already-`ServicingIrq` slots
//
// The callback itself just signals a game-side event and returns — it
// doesn't care which HW thread it ran on.
// Pass 1: find any Ready thread across all slots.
let mut victim: Option<xenia_cpu::ThreadRef> = None;
'outer_ready: for (hw_id, slot) in kernel.scheduler.slots.iter().enumerate() {
for (idx, t) in slot.runqueue.iter().enumerate() {
if matches!(t.state, HwState::Ready) {
victim = Some(xenia_cpu::ThreadRef::new(hw_id as u8, idx as u16));
break 'outer_ready;
}
}
}
// Pass 2: any Blocked thread (we'll flip it to ServicingIrq).
if victim.is_none() {
'outer_blocked: for (hw_id, slot) in kernel.scheduler.slots.iter().enumerate() {
for (idx, t) in slot.runqueue.iter().enumerate() {
if matches!(t.state, HwState::Blocked(_)) {
victim = Some(xenia_cpu::ThreadRef::new(hw_id as u8, idx as u16));
break 'outer_blocked;
}
}
}
}
let Some(target_ref) = victim else {
// All threads Idle/Exited/already servicing — nothing to inject on.
kernel.interrupts.take_next();
kernel.interrupts.dropped += 1;
return;
};
let t = kernel.scheduler.thread_mut(target_ref);
let prev_state = t.state.clone();
match prev_state {
HwState::Ready => {}
HwState::Blocked(reason) => {
t.state = HwState::ServicingIrq(reason);
}
_ => unreachable!("victim selection above filtered out other variants"),
}
let _ = kernel.interrupts.take_next();
let t = kernel.scheduler.thread_mut(target_ref);
let saved = xenia_kernel::SavedCallbackCtx::capture(&t.ctx, source);
kernel.interrupts.injected_ref = Some(target_ref);
t.ctx.pc = cb.callback_pc;
t.ctx.lr = xenia_cpu::context::LR_HALT_SENTINEL;
// Canary `Processor::Execute` decrements the guest SP by 176 before
// running the callback and restores on return (see Canary
// processor.cc:383). Without this pad the callback's
// `__savegprlr_N` prologue stomps the interrupted function's
// already-saved LR at [r1-8], so when the interrupted function
// later returns via `__restgprlr_N -> bclr` it jumps to
// `LR_HALT_SENTINEL` and the thread exits prematurely. Matching
// restore lives in `SavedCallbackCtx::restore` (which now also
// restores r1).
t.ctx.gpr[1] = t
.ctx
.gpr[1]
.wrapping_sub(xenia_kernel::interrupts::CALLBACK_STACK_PAD as u64);
t.ctx.gpr[3] = source as u64;
t.ctx.gpr[4] = cb.user_data as u64;
kernel.interrupts.saved = Some(saved);
metrics::counter!("gpu.interrupt.delivered", "source" => format!("{source}"))
.increment(1);
tracing::debug!(
source,
hw_id = target_ref.hw_id,
idx = target_ref.idx,
callback = format_args!("{:#010x}", cb.callback_pc),
"graphics interrupt: injecting"
);
// Iterate-2.BF.γ: graphics dispatch is fully synchronous (host-driven,
// iterate-2.BE) — it borrows a guest thread, runs the ISR to
// LR_HALT_SENTINEL, and restores all in-call before returning. So it
// CAN safely coexist with an audio callback mid-flight, *as long as we
// pick a different victim thread* than the one audio borrowed. The old
// blanket `is_in_callback()` gate caused 5.85M skipped dispatches in
// lockstep boot (vs 55 with-pending dispatches) — audio is essentially
// always mid-flight on its dedicated worker, which choked vsync
// delivery at ~54. Exclude only audio's borrowed thread; the queue
// drains synchronously and graphics ISR completion does not touch
// `interrupts.saved` (used exclusively by the async audio path).
let audio_borrowed = if kernel.interrupts.is_in_callback() {
kernel.interrupts.injected_ref
} else {
None
};
while let Some(source) = kernel.interrupts.peek_next() {
// Victim selection: Ready first, then Blocked (canary's
// `XThread::GetCurrentThread()` analog — any live thread will
// do for borrowing context). Skip Idle/Exited/ServicingIrq.
// Skip the audio-borrowed thread (if any) to avoid clobbering
// its `SavedCallbackCtx` mid-flight.
let excluded = audio_borrowed;
let mut victim: Option<xenia_cpu::ThreadRef> = None;
'outer_ready: for (hw_id, slot) in kernel.scheduler.slots.iter().enumerate() {
for (idx, t) in slot.runqueue.iter().enumerate() {
let r = xenia_cpu::ThreadRef::new(hw_id as u8, idx as u16);
if excluded == Some(r) {
continue;
}
if matches!(t.state, HwState::Ready) {
victim = Some(r);
break 'outer_ready;
}
}
}
if victim.is_none() {
'outer_blocked: for (hw_id, slot) in kernel.scheduler.slots.iter().enumerate() {
for (idx, t) in slot.runqueue.iter().enumerate() {
let r = xenia_cpu::ThreadRef::new(hw_id as u8, idx as u16);
if excluded == Some(r) {
continue;
}
if matches!(t.state, HwState::Blocked(_)) {
victim = Some(r);
break 'outer_blocked;
}
}
}
}
let Some(target_ref) = victim else {
// No donor at all — drop and exit (no point looping if the
// next source has the same problem).
kernel.interrupts.take_next();
kernel.interrupts.dropped += 1;
return;
};
// Commit: pop the queue, flag temporary state.
let _ = kernel.interrupts.take_next();
let prev_state = kernel.scheduler.thread(target_ref).state.clone();
let was_blocked = matches!(prev_state, HwState::Blocked(_));
if let HwState::Blocked(reason) = prev_state.clone() {
kernel.scheduler.thread_mut(target_ref).state =
HwState::ServicingIrq(reason);
}
// Save the borrowed ctx fields the ISR will clobber. Matches
// canary's processor.cc:387-394 (save prev lr, run, restore).
let saved = {
let t = kernel.scheduler.thread_mut(target_ref);
let saved = xenia_kernel::SavedCallbackCtx::capture(&t.ctx, source);
t.ctx.pc = cb.callback_pc;
t.ctx.lr = xenia_cpu::context::LR_HALT_SENTINEL;
// Canary processor.cc:383 — pad SP so the callback's
// __savegprlr_N prologue doesn't stomp the interrupted
// function's saved LR at [r1-8].
t.ctx.gpr[1] = t
.ctx
.gpr[1]
.wrapping_sub(xenia_kernel::interrupts::CALLBACK_STACK_PAD as u64);
t.ctx.gpr[3] = source as u64;
t.ctx.gpr[4] = cb.user_data as u64;
saved
};
// Stash the previous `scheduler.current` (call_export reaches
// it; imports the ISR calls must dispatch on the borrowed
// thread). Restore on the way out.
let prev_current = kernel.scheduler.current;
kernel.scheduler.current = Some(target_ref);
metrics::counter!("gpu.interrupt.delivered", "source" => format!("{source}"))
.increment(1);
tracing::debug!(
source,
hw_id = target_ref.hw_id,
idx = target_ref.idx,
callback = format_args!("{:#010x}", cb.callback_pc),
"graphics interrupt: dispatching synchronously (iterate-2.BE)"
);
// Inline interpreter loop on the borrowed context until the
// ISR returns to LR_HALT_SENTINEL (its `blr` writes
// `lr → pc`). Per-instruction step handles imports via
// thunk_map (the ISR typically just calls `KeSetEvent`).
let mut isr_instrs: u64 = 0;
loop {
let pc = kernel.scheduler.ctx_mut_ref(target_ref).pc;
if pc == LR_HALT {
break;
}
if isr_instrs >= MAX_INSTRS_PER_ISR {
tracing::warn!(
pc = format_args!("{:#010x}", pc),
isr_instrs,
"graphics ISR exceeded MAX_INSTRS_PER_ISR; aborting"
);
break;
}
// Import-thunk intercept: same shape as worker_prologue's
// step 2 (line ~2287).
if let Some((module, ordinal, _name)) = thunk_map.get(&pc) {
let module = *module;
let ordinal_u32 = *ordinal as u32;
kernel.call_export(module, ordinal_u32, mem);
let post_ref = kernel.scheduler.current;
let c = match post_ref {
Some(r) => kernel.scheduler.ctx_mut_ref(r),
None => kernel.scheduler.ctx_mut_ref(target_ref),
};
c.pc = c.lr as u32;
c.cycle_count += 1;
c.timebase += 1;
stats.instruction_count += 1;
stats.import_count += 1;
isr_instrs += 1;
continue;
}
if !mem.is_mapped(pc) {
tracing::error!(
pc = format_args!("{:#010x}", pc),
isr_instrs,
"graphics ISR hit unmapped PC; aborting"
);
break;
}
let ctx = kernel.scheduler.ctx_mut_ref(target_ref);
let page_ver = mem.page_version(ctx.pc);
let r = step_cached(ctx, mem, decode_cache, page_ver);
stats.instruction_count += 1;
isr_instrs += 1;
match r {
StepResult::Continue => {}
StepResult::SystemCall => {
tracing::warn!("graphics ISR hit `sc` instruction; aborting");
break;
}
StepResult::Trap => {
tracing::warn!("graphics ISR hit trap; aborting");
break;
}
StepResult::Halted => break,
StepResult::Unimplemented(op) => {
tracing::warn!(?op, "graphics ISR hit unimplemented opcode; aborting");
break;
}
}
}
// Restore the borrowed context.
saved.restore(kernel.scheduler.ctx_mut_ref(target_ref));
kernel.scheduler.current = prev_current;
kernel.interrupts.delivered += 1;
// Restore thread state. If the ISR signaled a wake on the
// borrowed thread (e.g. canary `KeSetEvent` → scheduler wake)
// the state may already be Ready; only re-block if still
// ServicingIrq.
if was_blocked {
let t = kernel.scheduler.thread_mut(target_ref);
if let HwState::ServicingIrq(reason) = t.state.clone() {
t.state = HwState::Blocked(reason);
}
}
}
}
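The two-pass victim policy in `dispatch_graphics_interrupts` (Ready first, then Blocked, never the audio-borrowed thread) and the post-ISR state restore can be exercised in isolation. This is a simplified sketch, not the real scheduler. `State` and the `(hw_id, idx)` tuple `Ref` are hypothetical stand-ins for `HwState` and `xenia_cpu::ThreadRef`, and the `u32` payload is a placeholder for the real block reason.

```rust
/// Simplified stand-in for `HwState`; the `u32` payload models the block reason.
#[derive(Clone, Debug, PartialEq)]
enum State {
    Ready,
    Blocked(u32),
    ServicingIrq(u32),
    Idle,
    Exited,
}

/// `(hw_id, idx)`, standing in for `xenia_cpu::ThreadRef`.
type Ref = (u8, u16);

/// Ready threads win; otherwise the first Blocked one. `excluded` (the
/// audio-borrowed thread) is never chosen. `None` means "drop the source".
fn pick_victim(slots: &[Vec<State>], excluded: Option<Ref>) -> Option<Ref> {
    // Every (ref, state) pair across all runqueues, minus the excluded one.
    let candidates = || {
        slots
            .iter()
            .enumerate()
            .flat_map(|(hw, q)| {
                q.iter()
                    .enumerate()
                    .map(move |(idx, s)| ((hw as u8, idx as u16), s))
            })
            .filter(move |(r, _)| Some(*r) != excluded)
    };
    candidates()
        .find(|(_, s)| matches!(s, State::Ready))
        .or_else(|| candidates().find(|(_, s)| matches!(s, State::Blocked(_))))
        .map(|(r, _)| r)
}

/// After the ISR: re-block only if nothing (e.g. a KeSetEvent wake) changed
/// the temporary `ServicingIrq` state in the meantime.
fn restore_after_isr(state: &mut State, was_blocked: bool) {
    if was_blocked {
        if let State::ServicingIrq(reason) = *state {
            *state = State::Blocked(reason);
        }
    }
}
```

Folding both passes into one iterator chain keeps the exclusion check in a single place. The real code repeats that check in each labelled loop.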
/// AUDIT-032 Plan B — inject a pending XAudio buffer-complete callback
/// into the **dedicated audio worker** registered for the head-of-queue
/// client. Mirrors
/// [`try_inject_graphics_interrupt`] (same SP-pad, same saved-context
/// restore-on-sentinel) but the target thread is fixed at registration
/// time instead of selected via the random-victim policy. The pre-fix
/// random-victim path corrupted unrelated thread state
/// (APUBUG-PRODUCER-001 "HW-thread hijack"); per-client workers eliminate
/// that whole class of regression.
///
/// Mutual exclusion with the graphics path is via the shared
/// `interrupts.saved` slot — if a graphics callback is already in flight,
/// `is_in_callback()` returns true and we bail until it returns to the
/// `LR_HALT_SENTINEL`.
/// AUDIT-032 Plan B — inject a pending XAudio buffer-complete callback
/// into the **dedicated audio worker** registered for the head-of-queue
/// client. Uses the asynchronous LR-sentinel injection mechanism (same
/// SP-pad, same `SavedCallbackCtx` restore-on-sentinel as the pre-iterate-2.BE
/// graphics path) but the target thread is fixed at registration time
/// instead of selected via the random-victim policy. The pre-fix
/// random-victim path corrupted unrelated thread state
/// (APUBUG-PRODUCER-001 "HW-thread hijack"); per-client workers eliminate
/// that whole class of regression.
///
/// Coexistence with the graphics path (which is now synchronous; see
/// `dispatch_graphics_interrupts`) is keyed off the shared
/// `interrupts.saved` slot: while an audio callback is in flight,
/// `is_in_callback()` returns true and `dispatch_graphics_interrupts`
/// excludes the audio-borrowed thread (`interrupts.injected_ref`) from
/// victim selection instead of deferring.
fn try_inject_audio_callback(kernel: &mut xenia_kernel::KernelState) {
use xenia_cpu::scheduler::HwState;

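The inline ISR loop in `dispatch_graphics_interrupts` runs until the borrowed context's `pc` equals the halt sentinel. `MAX_INSTRS_PER_ISR` acts as a safety budget. The following toy model shows only that control flow. `Ctx`, `Outcome`, the `step` closure, and the sentinel value are hypothetical. Import-thunk interception, traps, and unmapped-PC checks are omitted.

```rust
/// Hypothetical halt sentinel; the real value is `LR_HALT_SENTINEL`.
const LR_HALT: u32 = 0xBCBC_BCBC;

/// Minimal stand-in for the borrowed `PpcContext`.
struct Ctx {
    pc: u32,
    lr: u32,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// ISR returned to the sentinel after this many steps.
    Returned(u64),
    /// Budget exhausted before the ISR returned.
    BudgetExceeded,
}

/// Step `ctx` until it returns to `LR_HALT` or `max_instrs` steps are spent.
fn run_isr(ctx: &mut Ctx, step: impl Fn(&mut Ctx), max_instrs: u64) -> Outcome {
    let mut executed = 0u64;
    loop {
        if ctx.pc == LR_HALT {
            return Outcome::Returned(executed);
        }
        if executed >= max_instrs {
            return Outcome::BudgetExceeded;
        }
        step(ctx);
        executed += 1;
    }
}
```

If a step function never branches to the sentinel, it trips the budget instead of wedging the caller. That is the reason for bounding inline work on the coordinator.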
@@ -980,6 +980,43 @@ fn open_vfs_file(
// see a null handle later and trigger `XamShowDirtyDiscErrorUI`.
let path = crate::path::object_attributes_to_vfs_path(mem, obj_attrs_ptr)
.unwrap_or_default();
// AUDIT-2.BF — synthetic silph::WorkerCtx spawn. AUDIT-058/059
// identified that ours never activates the 6-level static caller
// ladder that ends in `sub_825070F0`, so the four worker threads
// it would normally spawn (entries 0x82506528/58/88/B8) never run.
// Canary's chain originally fires right after `DiscImageDevice::
// ResolvePath("\\dat\\movie")` (audit-058); ours never opens
// `dat/movie` because tid=13 wedges before reaching it. We
// therefore trigger on the first `dat/*` open — the earliest
// such open in ours is `dat/files.tbl` (immediately preceding
// tid=12/13 spawn at audit-059 round 1).
//
// **Round 18 finding** (this commit): when the workers are
// spawned runnable, they fault almost immediately (`PC=0` at
// cycle ~5.5M on the hw thread carrying worker_3), preempting
// ours' boot before the normal guest threads even spawn. The
// ctx layout from audit-059 round 5 is incomplete — at least
// one of `[+0x28]`/`[+0x2C]`/`[+0x30]` (the three foreign-
// arena pointers) must be populated for the worker bodies to
// run. Synthesising those is a fresh investigation (round 19+).
//
// Until then the synth path is **opt-in**: set
// `XENIA_SILPH_SYNTH=1` (or `run` / `runnable`) to enable the runnable
// spawn (crashed boot as of round 18), or `XENIA_SILPH_SYNTH=suspend`
// (or `suspended`) to spawn but keep them in `Blocked(Suspended)` (lets
// boot complete with the ctx materialised in memory for downstream
// probes). Default: disabled, which preserves the existing boot
// trajectory.
if !state.silph_synth_done && path.starts_with("dat/") {
match std::env::var("XENIA_SILPH_SYNTH").as_deref() {
Ok("1") | Ok("run") | Ok("runnable") => {
let _ = crate::silph_synth::spawn_silph_workers(state, mem, false);
}
Ok("suspend") | Ok("suspended") => {
let _ = crate::silph_synth::spawn_silph_workers(state, mem, true);
}
_ => {}
}
}
if path.is_empty() && obj_attrs_ptr == 0 {
if handle_out != 0 {
mem.write_u32(handle_out, 0);

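The `XENIA_SILPH_SYNTH` gate in `open_vfs_file` accepts a fixed set of values. Splitting the string match from the `std::env::var` read makes that table unit-testable. `SilphSynthMode`, `parse_mode`, and `mode_from_env` are hypothetical names and are not part of the codebase.

```rust
/// Hypothetical enum for the three behaviours `open_vfs_file` selects.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SilphSynthMode {
    Disabled,
    Runnable,
    Suspended,
}

/// Same string table as the `match` in `open_vfs_file` (case-sensitive).
fn parse_mode(value: Option<&str>) -> SilphSynthMode {
    match value {
        Some("1") | Some("run") | Some("runnable") => SilphSynthMode::Runnable,
        Some("suspend") | Some("suspended") => SilphSynthMode::Suspended,
        _ => SilphSynthMode::Disabled,
    }
}

/// Read the variable; unset or non-UTF-8 values map to `Disabled`.
fn mode_from_env() -> SilphSynthMode {
    let raw = std::env::var("XENIA_SILPH_SYNTH").ok();
    parse_mode(raw.as_deref())
}
```

With an enum like this, the call site would pass `mode == SilphSynthMode::Suspended` as the `suspended` flag for any mode other than `Disabled`.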
@@ -8,13 +8,18 @@
//! guest-issued command stream; source code 1 (`INTERRUPT_SOURCE_CP`).
//!
//! Canary's [xboxkrnl_video.cc:303-310](xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc#L303-L310)
//! dispatches the callback on HW thread 0. We follow the same convention.
//!
//! The delivery model is cooperative: we inject the callback entry into HW
//! thread 0 at the top of a scheduler round when it's safe (not mid-export,
//! not already inside another interrupt). When the callback returns to
//! [`LR_HALT_SENTINEL`] the main loop restores the saved [`PpcContext`]
//! fields and the HW thread picks up where it left off.
//! guest-issued command stream; source code 1 (`INTERRUPT_SOURCE_CP`).
//!
//! Canary's [xboxkrnl_video.cc:303-310](xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc#L303-L310)
//! dispatches the callback on HW thread 0. We follow the same convention
//! for picking a *context donor*, but as of iterate-2.BE the dispatch
//! itself is **synchronous and host-driven**: the main loop runs the ISR
//! inline on the borrowed guest context, mirroring canary's
//! `EmulateCPInterruptDPC → Processor::Execute` path
//! ([kernel_state.cc:1370](../../../../xenia-canary/src/xenia/kernel/kernel_state.cc#L1370),
//! [processor.cc:413](../../../../xenia-canary/src/xenia/cpu/processor.cc#L413)).
//! This holds whether the donor guest thread was Ready or Blocked.
//!
//! The audio callback path (audit-048) still uses asynchronous LR-sentinel
//! injection on a dedicated per-client worker thread; the
//! [`SavedCallbackCtx`] machinery below remains in use there.
use std::collections::VecDeque;
use std::time::{Duration, Instant};

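`SavedCallbackCtx` supports a capture / pad / restore pattern. The audio path uses it, and the graphics path used it before iterate-2.BE. A sketch of that pattern follows. The saved field set (`pc`, `lr`, `r1`, `r3`, `r4`) is an assumption, based on which registers the injection code overwrites. The 176-byte pad comes from the removed comment that cites canary's processor.cc. `Ctx`, `Saved`, and `inject` are hypothetical.

```rust
/// Pad value taken from the removed comment ("decrements the guest SP by 176").
const CALLBACK_STACK_PAD: u64 = 176;

/// Hypothetical minimal guest context.
#[derive(Clone, Debug, PartialEq)]
struct Ctx {
    pc: u32,
    lr: u64,
    gpr: [u64; 32],
}

/// Fields the injection overwrites (assumed subset of the real saved ctx).
struct Saved {
    pc: u32,
    lr: u64,
    r1: u64,
    r3: u64,
    r4: u64,
}

impl Saved {
    fn capture(c: &Ctx) -> Self {
        Saved { pc: c.pc, lr: c.lr, r1: c.gpr[1], r3: c.gpr[3], r4: c.gpr[4] }
    }

    fn restore(&self, c: &mut Ctx) {
        c.pc = self.pc;
        c.lr = self.lr;
        c.gpr[1] = self.r1;
        c.gpr[3] = self.r3;
        c.gpr[4] = self.r4;
    }
}

/// Point `c` at the callback: return address = sentinel, SP padded so the
/// callback's prologue cannot overwrite the interrupted frame's saved LR,
/// r3/r4 = (source, user_data). Returns what `restore` needs.
fn inject(c: &mut Ctx, callback_pc: u32, halt: u64, source: u32, user_data: u32) -> Saved {
    let saved = Saved::capture(c);
    c.pc = callback_pc;
    c.lr = halt;
    c.gpr[1] = c.gpr[1].wrapping_sub(CALLBACK_STACK_PAD);
    c.gpr[3] = source as u64;
    c.gpr[4] = user_data as u64;
    saved
}
```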
@@ -3,6 +3,7 @@ pub mod exports;
pub mod interrupts;
pub mod objects;
pub mod path;
pub mod silph_synth;
pub mod state;
pub mod thread;
pub mod ui_bridge;

@@ -0,0 +1,280 @@
//! AUDIT-2.BF — synthetic spawn of the silph::WorkerCtx worker quartet.
//!
//! AUDIT-058/059 traced a 6-level static-caller ladder
//! (`sub_824F7800 ← sub_824F7CD0 ← sub_824F8398 ← sub_821B55D8 ← sub_821B6DF4`,
//! topped by virtual-dispatch from `sub_82172BA0+0x1E8`) that activates
//! `sub_825070F0` in canary at ~1× / 30 s, kicking off four worker threads
//! initialised against a single ~0x440-byte ctx. In ours none of those PCs
//! fire (audit-059 round 9 confirmed sub_821B6DF4 = 0×, real chain entry =
//! virtual-dispatch from sub_82172BA0+0x1E8 hits wrong-vtable slot).
//!
//! Rather than chase the wrong-vtable break, this module reproduces the end
//! state directly: at the first `dat/*` VFS open (canary's chain fires after
//! `dat/movie`, which ours never reaches), we synthesise the ctx structure in
//! guest memory per audit-059 round 5's live hexdump and spawn the four worker
//! entry points the same way AUDIT-048's audio host-pump spawns its dedicated
//! client worker.
//!
//! Only the ctx fields the workers actually dereference matter. Per the
//! round 5 dump (`audit-runs/audit-059-handle-disambiguation/round5-ctx-dump/canary.log`):
//!
//! +0x00 vtable = 0x8200A1E8 (XEX .rdata, valid in both engines)
//! +0x04 self = ctx
//! +0x08 intrusive head= ctx
//! +0x0C init flag = 1
//! +0x10 packed byte = 0x01000000
//! +0x18 float ~1.0 = 0x3F7FCCCC
//! +0x1C float ~1.0 = 0x3F802D83
//! +0x24 flag = 1
//! +0x28..+0x30 = three foreign pointers, NULL initially
//! +0x54..+0x84 = 4× X_KEVENT auto-reset, state=0
//! +0x94..+0xC4 = 4× X_KEVENT manual-reset, state=1
//! +0x210..+0x250 = 4-entry intrusive work-ring, empty
//!
//! Worker entries (each takes r3 = ctx_ptr):
//! 0x82506528, 0x82506558, 0x82506588, 0x825065B8
use xenia_cpu::scheduler::{BlockReason, SpawnParams};
use xenia_cpu::ThreadRef;
use xenia_memory::{GuestMemory, MemoryAccess};
use crate::objects::KernelObject;
use crate::state::{GuestMemoryPcr, KernelState};
use crate::thread::allocate_thread_image;
/// XEX `.rdata` vtable for the silph::WorkerCtx singleton (audit-059 round 5).
const SILPH_CTX_VTABLE: u32 = 0x8200_A1E8;
/// 4-element fixed entry table — guest text PCs for the four worker bodies.
const SILPH_WORKER_ENTRIES: [u32; 4] = [
0x8250_6528,
0x8250_6558,
0x8250_6588,
0x8250_65B8,
];
/// Allocation size for the ctx: the ~0x440-byte ctx rounded up (originally to
/// 0x500) so the alloc never straddles a page boundary in heap_alloc's
/// bookkeeping. Round 20 grew the alloc from 0x500 to 0x800 to make room for
/// a synthesised sub-object at +0x300 and its 32-slot vtable at +0x500
/// (= ctx + 0x500..0x580). Round 21 keeps the embedded sub-object but drops
/// the synthesised vtable (we now point at canary's real XEX-resident
/// sub-vtable directly), so the 0x500..0x580 region is unused but harmless.
const SILPH_CTX_SIZE: u32 = 0x800;
/// Offset within the ctx allocation of the synthetic sub-object referenced
/// at `[ctx+0x2C]`. Canary's sub-object sits ~0x300 bytes above the ctx and
/// varies per-instance; we keep it embedded in the same alloc so a single
/// `heap_alloc` covers everything.
const SILPH_SUBOBJ_OFFSET: u32 = 0x300;
/// XEX `.rdata` VA of canary's real sub-object vtable (audit-059 round 21).
/// Discovered by:
/// 1. Probing canary at `pc=0x82506B08` (= `sub_82506B08`, method 35 of
/// the WorkerCtx vtable, the first sub-object method called by every
/// `sub_82506528/58/88/B8` worker entry).
/// 2. Capturing `[ctx+0x2C]` from the JIT-prolog dump (= sub-object VA
/// in canary's heap).
/// 3. Re-running with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to read
/// `[sub-object + 0]` = sub-vtable VA = **`0x8200A168`**.
/// PE inspection confirms slot 15 (called via `[r11+0x3C]` at
/// `sub_82506B08+0x44`) = `sub_824FCCC8` and slot 17 (`[r11+0x44]` at
/// `sub_82506B08+0x70`) = `sub_824FCE38`. Both are real game methods in
/// the same `.text` region as the rest of the worker dispatch surface.
const SILPH_SUB_VTABLE_SOURCE_VA: u32 = 0x8200_A168;
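The worker sequence `lwz r3,44(rN); lwz r11,0(r3); lwz r11,60(r11); bctrl` is a three-load pointer chase. It goes from ctx to sub-object (`+0x2C`), then from sub-object to vtable (`+0`), then from vtable to method (`slot * 4`). A toy model over a word-addressed map makes the slot arithmetic concrete. `Mem`, `slot_offset`, and `resolve_sub_method` are hypothetical. Real guest memory is a flat big-endian address space, not a `HashMap`.

```rust
use std::collections::HashMap;

/// Toy guest memory: only the 32-bit words we populate are readable.
struct Mem(HashMap<u32, u32>);

impl Mem {
    /// Model of PPC `lwz`: load the word at `addr`, if mapped.
    fn lwz(&self, addr: u32) -> Option<u32> {
        self.0.get(&addr).copied()
    }
}

/// Byte displacement of a vtable slot with 4-byte pointers (slot 15 -> 60).
const fn slot_offset(slot: u32) -> u32 {
    slot * 4
}

/// ctx -> [ctx+0x2C] sub-object -> [sub+0] vtable -> [vtable+slot*4] method.
fn resolve_sub_method(mem: &Mem, ctx: u32, slot: u32) -> Option<u32> {
    let sub_obj = mem.lwz(ctx + 0x2C)?;
    let vtable = mem.lwz(sub_obj)?;
    mem.lwz(vtable + slot_offset(slot))
}
```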
/// Round-19 XEX-resident wrapper constant observed at `[ctx+0x30]` in every
/// canary ctx (audit-059 round 7). Same value for all four ctxes — opaque
/// pointer / handle the worker passes through without dereferencing.
const SILPH_CTX_FIELD_30_CONST: u32 = 0xBE56_8F00;
/// 64 KiB worker stack (mirrors AUDIT-048 audio worker), half of canary's
/// 128 KiB default.
const SILPH_WORKER_STACK: u32 = 0x10_000;
/// Idempotently synthesise the silph::WorkerCtx and spawn the four worker
/// threads it normally drives.
///
/// `suspended` controls whether the spawned threads enter the runqueue as
/// `Ready` (false) or as `Blocked(Suspended)` (true). Use `true` for
/// diagnostic baselines where you want the ctx materialised in guest memory
/// for downstream probes but don't want the worker bodies executing (e.g.
/// when round-5 ctx fields like the foreign-arena pointers at +0x28/+0x2C/
/// +0x30 are still NULL and the workers would fault on first dereference).
///
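/// Returns `Some(ctx VA)` on the first successful call, or `None` if the
/// ctx `heap_alloc` fails. Subsequent calls return the cached VA without
/// re-spawning, which is `Some(0)` if that first `heap_alloc` failed.
/// Failures inside spawn are logged, but the `silph_synth_done` latch is
/// still flipped so we don't retry-loop.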
///
/// Mirrors the AUDIT-048 audio-worker spawn pattern in
/// `xaudio_register_render_driver` (`exports.rs:3122`).
pub fn spawn_silph_workers(
state: &mut KernelState,
mem: &GuestMemory,
suspended: bool,
) -> Option<u32> {
if state.silph_synth_done {
return Some(state.silph_synth_ctx);
}
state.silph_synth_done = true;
let Some(ctx) = state.heap_alloc(SILPH_CTX_SIZE, mem) else {
tracing::warn!("silph_synth: heap_alloc({:#x}) failed for ctx", SILPH_CTX_SIZE);
return None;
};
state.silph_synth_ctx = ctx;
// Zero the whole ctx allocation first. heap_alloc returns freshly mapped
// memory, but we want the audit-059-round-5 layout to be canonical
// regardless of any future allocator behaviour change.
for off in (0..SILPH_CTX_SIZE).step_by(4) {
mem.write_u32(ctx + off, 0);
}
// ---- Header scalars (per audit-059 round 5 hexdump) ----
mem.write_u32(ctx + 0x00, SILPH_CTX_VTABLE);
mem.write_u32(ctx + 0x04, ctx); // self
mem.write_u32(ctx + 0x08, ctx); // intrusive list head pointing at self
mem.write_u32(ctx + 0x0C, 0x0000_0001); // init flag / refcount
mem.write_u32(ctx + 0x10, 0x0100_0000); // packed byte field
mem.write_u32(ctx + 0x18, 0x3F7F_CCCC); // float ~1.0 (UI rate A)
mem.write_u32(ctx + 0x1C, 0x3F80_2D83); // float ~1.0 (UI rate B)
mem.write_u32(ctx + 0x24, 0x0000_0001);
// +0x28..+0x30 = three foreign pointers.
// +0x28 — canary's first-fire snapshot has NULL here. Round-19 fault
// analysis shows worker bodies don't dereference this on
// first entry, so we leave it NULL too.
// +0x2C — sub-object pointer. Worker bodies do
// `lwz r3,44(rN); lwz r11,0(r3); lwz r11,60(r11); bctrl`,
// i.e. virtual-dispatch through slot 15 of the sub-object's
// vtable. Point this at our synthesised sub-object embedded
// at ctx + SILPH_SUBOBJ_OFFSET.
// +0x30 — XEX-resident wrapper constant 0xBE568F00 (round 7). Opaque
// but identical across all four canary ctxes.
let subobj_ptr = ctx + SILPH_SUBOBJ_OFFSET;
mem.write_u32(ctx + 0x2C, subobj_ptr);
mem.write_u32(ctx + 0x30, SILPH_CTX_FIELD_30_CONST);
// ---- Embedded sub-object at +0x300 ----
// Round-21 pivot: instead of synthesising a stub vtable that returns
// NULL from every slot, point `[sub_object + 0]` directly at canary's
// real XEX-resident sub-vtable VA. The vtable bytes are part of the
// same static image both engines map, so referring to it costs zero
// guest memory and gives the workers a working virtual-method surface
// (slot 15 = sub_824FCCC8, slot 17 = sub_824FCE38, plus 29 other real
// methods). Round-19 disassembly shows worker bodies only touch the
// sub-object's vtable; the rest of the sub-object is opaque so we
// leave it zero-filled.
mem.write_u32(subobj_ptr, SILPH_SUB_VTABLE_SOURCE_VA);
// ---- 4× X_KEVENT auto-reset at +0x54/+0x64/+0x74/+0x84, state = 0 ----
// X_DISPATCH_HEADER layout (canary xobject.h:35):
// +0x00 type (u8: 0=manual-event, 1=auto-event, 2=mutant, ...)
// +0x01 abandoned (u8)
// +0x02 size (u8 dwords)
// +0x03 inserted (u8)
// +0x04 signal_state (u32 BE)
// +0x08..+0x0F list_head (two pointers — self-link = empty list)
for i in 0..4u32 {
let off = ctx + 0x54 + (i * 0x10);
mem.write_u8(off, 1); // type = auto-reset Event
mem.write_u32(off + 4, 0); // signal_state = 0
// List head self-link denotes empty waiter list.
mem.write_u32(off + 8, off + 8);
mem.write_u32(off + 12, off + 8);
}
// ---- 4× X_KEVENT manual-reset at +0x94..+0xC4, state = 1 (pre-signaled) ----
for i in 0..4u32 {
let off = ctx + 0x94 + (i * 0x10);
mem.write_u8(off, 0); // type = manual-reset Event
mem.write_u32(off + 4, 1); // signal_state = 1 (pre-signaled)
mem.write_u32(off + 8, off + 8);
mem.write_u32(off + 12, off + 8);
}
// ---- 4-entry intrusive work-ring at +0x210, initially empty ----
// Each entry: [+0]=0x01000000 [+4]=0 [+8]=self_ptr [+0xC]=self_ptr.
for i in 0..4u32 {
let off = ctx + 0x210 + (i * 0x10);
mem.write_u32(off, 0x0100_0000);
mem.write_u32(off + 4, 0);
mem.write_u32(off + 8, off + 8);
mem.write_u32(off + 12, off + 8);
}
// +0x250 "XEN"-tagged descriptors and +0x2E0 resource-index table left
// zero — they may be populated lazily by the workers themselves.
// ---- Spawn the 4 worker guest threads ----
use std::sync::atomic::Ordering;
let mut spawned = 0usize;
for (i, &entry) in SILPH_WORKER_ENTRIES.iter().enumerate() {
let Some(image) = allocate_thread_image(state, mem, SILPH_WORKER_STACK, 0) else {
tracing::warn!("silph_synth: allocate_thread_image failed for worker {}", i);
continue;
};
let tid = state.next_thread_id.fetch_add(1, Ordering::Relaxed);
let handle = state.alloc_handle_for(KernelObject::Thread {
id: tid,
hw_id: None,
exit_code: None,
waiters: Vec::new(),
});
let tls_slot_count = state.next_tls_index.load(Ordering::Relaxed);
let params = SpawnParams {
entry,
start_context: ctx, // r3 = ctx_ptr
stack_base: image.stack_base,
stack_size: image.stack_size,
pcr_base: image.pcr_base,
tls_base: image.tls_base,
thread_handle: handle,
guest_tid: tid,
create_suspended: suspended,
is_initial: false,
tls_slot_count,
affinity_mask: 0,
priority: 0,
ideal_processor: None,
};
match state.scheduler.spawn(params, &mut GuestMemoryPcr(mem)) {
Ok(hw_id) => {
if let Some(KernelObject::Thread { hw_id: slot, .. }) =
state.objects.get_mut(&handle)
{
*slot = Some(hw_id);
}
let tref = ThreadRef::new(
hw_id,
(state.scheduler.slots[hw_id as usize].runqueue.len() - 1) as u16,
);
state.silph_synth_handles[i] = Some(handle);
state.silph_synth_refs[i] = Some(tref);
spawned += 1;
tracing::info!(
"silph_synth: spawned worker {} tid={} handle={:#x} entry={:#010x} ctx={:#010x}",
i, tid, handle, entry, ctx
);
}
Err(_) => {
tracing::warn!(
"silph_synth: scheduler.spawn failed for worker {} entry={:#010x}",
i, entry
);
}
}
// Reference `BlockReason` so its `use` isn't reported as an unused import.
let _ = BlockReason::WaitAny {
handles: Vec::new(),
deadline: None,
};
}
tracing::info!(
"silph_synth: ctx={:#010x} workers_spawned={}/4",
ctx, spawned
);
Some(ctx)
}
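The event and work-ring initialisation in `spawn_silph_workers` can be modelled on a host byte buffer to check the offsets. Values are stored big-endian to match the guest's `u32 BE` fields. `put_u32`, `get_u32`, `write_event`, and `build_ctx` are hypothetical helpers. Only the KEVENT and work-ring regions from the round-5 layout are written.

```rust
/// Region offsets from the round-5 layout (each entry is 0x10 bytes).
const AUTO_EVENTS: u32 = 0x54;
const MANUAL_EVENTS: u32 = 0x94;
const WORK_RING: u32 = 0x210;
const STRIDE: u32 = 0x10;
const CTX_SIZE: usize = 0x800;

fn put_u32(buf: &mut [u8], off: u32, v: u32) {
    let o = off as usize;
    buf[o..o + 4].copy_from_slice(&v.to_be_bytes());
}

fn get_u32(buf: &[u8], off: u32) -> u32 {
    let o = off as usize;
    let mut w = [0u8; 4];
    w.copy_from_slice(&buf[o..o + 4]);
    u32::from_be_bytes(w)
}

/// Dispatcher header: type byte at +0, signal state at +4, and a wait-list
/// head at +8/+0xC that points at itself (empty list). `base` is the guest
/// VA that `buf[0]` represents, so the self-links are guest addresses.
fn write_event(buf: &mut [u8], base: u32, off: u32, ty: u8, signaled: u32) {
    buf[off as usize] = ty;
    put_u32(buf, off + 4, signaled);
    let head = base + off + 8;
    put_u32(buf, off + 8, head);
    put_u32(buf, off + 12, head);
}

/// Build the event and work-ring regions for a ctx at guest VA `base`.
fn build_ctx(base: u32) -> Vec<u8> {
    let mut buf = vec![0u8; CTX_SIZE];
    for i in 0..4u32 {
        // Auto-reset events (type 1), initially non-signaled.
        write_event(&mut buf, base, AUTO_EVENTS + i * STRIDE, 1, 0);
        // Manual-reset events (type 0), pre-signaled.
        write_event(&mut buf, base, MANUAL_EVENTS + i * STRIDE, 0, 1);
        // Work-ring entry: packed word, zero, then a self-linked head.
        let e = WORK_RING + i * STRIDE;
        put_u32(&mut buf, e, 0x0100_0000);
        let head = base + e + 8;
        put_u32(&mut buf, e + 8, head);
        put_u32(&mut buf, e + 12, head);
    }
    buf
}
```

The last auto-reset event lands at 0x84 and the last manual-reset event at 0xC4. The fourth work-ring entry occupies 0x240..0x250, which matches the ranges in the module doc.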

@@ -244,6 +244,41 @@ pub struct KernelState {
/// Distinct from `ctor_probe_pcs` because that helper emits 8
/// frames of back-chain per hit — too noisy for branch tracing.
pub branch_probe_pcs: std::collections::HashSet<u32>,
/// AUDIT-2BF — diagnostic. PCs at which to emit a structured one-line
/// `AUDIT-PC-PROBE` record on every fire, designed for the silph init
/// chain virtual-dispatch site at `sub_82172BA0+0x1E8` (PC
/// `0x82172D88`, a `bctrl` after a 3-deep load of vtable slot 6). The
/// emitted line carries (pc, tid, hw, cycle, lr, r3, r11) plus four
/// guest-memory dereferences off `r3`: `[r3+0]` (vtable), `[[r3+0]+24]`
/// (slot 6 method pointer = the bctrl target), `[r3+0x0C]` (audit-059
/// round-9 canary-known auxiliary handle `0xF80000D8`), and `[r3+0x30]`
/// (canary-known embedded sub-object vtable `0x820A1870`). Distinct
/// from `branch_probe_pcs` because that helper only logs registers (no
/// memory) and from `lr_trace_pcs` because that emits JSON intended
/// for canary diffing, not the four hard-coded indirect dereferences
/// needed here. Read-only — no guest state mutation. Lockstep
/// digest unaffected. Settable via `--audit-pc-probe-hex` /
/// `XENIA_AUDIT_PC_PROBE`.
pub audit_pc_probe_pcs: std::collections::HashSet<u32>,
/// AUDIT-2BF round 14 — diagnostic. Optional guest VA. When set, each
/// `AUDIT-PC-PROBE` fire emits a paired `AUDIT-MEM-READ` line with
/// `addr`, `*addr` (singleton value), `**addr` (vtable), `*(**addr + 0)`
/// (vtable[0] = first virtual method), and `*(**addr + 24)` (vtable[6]
/// at a 4-byte stride = slot 6 = silph chain bctrl target). The three-deep
/// dereference resolves the vtable[0] target at the bctrl site
/// `0x822F1B4C` inside `sub_822F1AA8`. Read-only; lockstep digest
/// unaffected. Settable via `--audit-mem-read-hex` /
/// `XENIA_AUDIT_MEM_READ`.
pub audit_mem_read_addr: Option<u32>,
/// AUDIT-052 — diagnostic. When set, each `AUDIT-PC-PROBE` fire
/// additionally emits an `AUDIT-R3-DUMP` line with N bytes of guest
/// memory read from `r3` as `u32` lanes (N is capped at 256 and
/// rounded down to a multiple of 4).
/// Sized for audit-051's 80-byte stack-local struct at `r31+96`
/// inside `sub_82452DC0` (probe `sub_8245B000` entry where
/// `r3 == parent's r31+96`). Read-only; lockstep digest unaffected.
/// Settable via `--audit-r3-dump-bytes` /
/// `XENIA_AUDIT_R3_DUMP_BYTES`.
pub audit_r3_dump_bytes: Option<u32>,
/// M12 — diagnostic. PCs at which to emit a structured JSONL record
/// per fire, designed for diffing against xenia-canary's
/// `--log_lr_on_pc` patch output. Each line carries
@@ -264,6 +299,20 @@ pub struct KernelState {
pub dump_addrs: Vec<u32>,
/// `--dump-section=BASE:LEN:PATH` end-of-run snapshot, page-gated by `is_mapped`.
pub dump_section: Option<(u32, u32, std::path::PathBuf)>,
/// AUDIT-2.BF — synthetic silph::WorkerCtx spawn one-shot latch. Set on
/// first call to [`crate::silph_synth::spawn_silph_workers`] (triggered
/// by the first observation of a load-bearing VFS path such as
/// `dat/movie`) and never cleared, so subsequent triggers are no-ops.
pub silph_synth_done: bool,
/// AUDIT-2.BF — VA of the synthesised silph::WorkerCtx. Zero before the
/// first spawn; set to the ctx base by `spawn_silph_workers`. Held on
/// the kernel state so future export hooks can find it (no caller does
/// yet — placeholder for round 19+ wiring).
pub silph_synth_ctx: u32,
/// AUDIT-2.BF — kernel handles for the 4 synthetic worker threads.
pub silph_synth_handles: [Option<u32>; 4],
/// AUDIT-2.BF — `ThreadRef` cache for the 4 synthetic workers.
pub silph_synth_refs: [Option<xenia_cpu::ThreadRef>; 4],
}
impl KernelState {
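The `silph_synth_done` / `silph_synth_ctx` fields describe a one-shot spawn: the first load-bearing VFS path triggers `spawn_silph_workers`, and every later trigger is a no-op. The following is a minimal standalone sketch of that latch pattern, not the crate's code. `SilphLatch`, `on_vfs_path`, and `is_load_bearing` are hypothetical names, and recognising the trigger by a `dat/movie` substring is an assumption.

```rust
/// Hypothetical stand-in for the `silph_synth_done` / `silph_synth_ctx` pair.
#[derive(Default)]
struct SilphLatch {
    done: bool, // one-shot latch; never cleared once set
    ctx: u32,   // zero until a spawn returns a ctx base
}

impl SilphLatch {
    /// Hypothetical VFS-observation hook. Runs `spawn` at most once, on the
    /// first load-bearing path, and returns true only for that call.
    fn on_vfs_path<F: FnOnce() -> Option<u32>>(&mut self, path: &str, spawn: F) -> bool {
        if self.done || !is_load_bearing(path) {
            return false;
        }
        // Latch before spawning: the field doc says it is set on the first
        // call, so a spawn that returns None is not retried later.
        self.done = true;
        if let Some(ctx) = spawn() {
            self.ctx = ctx;
        }
        true
    }
}

/// Assumed trigger rule: substring match on a known load-bearing directory.
fn is_load_bearing(path: &str) -> bool {
    path.contains("dat/movie")
}
```

Because `done` is checked before `is_load_bearing`, a triggered latch costs one boolean test per VFS observation afterwards.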
@@ -327,10 +376,17 @@ impl KernelState {
ctor_probe_pcs: std::collections::HashSet::new(),
pc_probe_consumers: HashMap::new(),
branch_probe_pcs: std::collections::HashSet::new(),
audit_pc_probe_pcs: std::collections::HashSet::new(),
audit_mem_read_addr: None,
audit_r3_dump_bytes: None,
lr_trace_pcs: std::collections::HashSet::new(),
lr_trace_writer: None,
dump_addrs: Vec::new(),
dump_section: None,
silph_synth_done: false,
silph_synth_ctx: 0,
silph_synth_handles: [None; 4],
silph_synth_refs: [None; 4],
};
crate::exports::register_exports(&mut state);
crate::xam::register_exports(&mut state);
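The constructor starts `audit_pc_probe_pcs` empty, which keeps `fire_audit_pc_probe_if_match` down to a single `is_empty()` test. The field docs say the set is filled from `--audit-pc-probe-hex` / `XENIA_AUDIT_PC_PROBE`, but the value format is not part of this diff. The sketch below assumes a comma-separated list of hex PCs with an optional `0x` prefix; `parse_pc_list` is a hypothetical helper, not the crate's parser.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Hypothetical parser for an `--audit-pc-probe-hex` style value.
/// Assumed format: comma-separated hex PCs, optional `0x`/`0X` prefix.
fn parse_pc_list(s: &str) -> Result<HashSet<u32>, ParseIntError> {
    let mut pcs = HashSet::new();
    for tok in s.split(',') {
        let tok = tok.trim();
        if tok.is_empty() {
            continue; // tolerate empty input and trailing commas
        }
        let digits = tok
            .strip_prefix("0x")
            .or_else(|| tok.strip_prefix("0X"))
            .unwrap_or(tok);
        pcs.insert(u32::from_str_radix(digits, 16)?);
    }
    Ok(pcs)
}
```

An unset or empty value yields an empty set, which preserves the cheap early return on the hot path.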
@@ -797,6 +853,91 @@ impl KernelState {
);
}
/// AUDIT-2BF — diagnostic. If the live PC for HW slot `hw_id` is in
/// `self.audit_pc_probe_pcs`, emit a single one-line
/// `AUDIT-PC-PROBE` record with (pc, tid, hw, cycle, lr, r3, r11)
/// plus four guest-memory dereferences off r3: `[r3+0]` (vtable),
/// `[[r3+0]+24]` (slot 6 method = bctrl target), `[r3+0x0C]`
/// (auxiliary handle field), `[r3+0x30]` (embedded sub-object
/// vtable field). Tuned for the silph init chain virtual-dispatch
/// site at `sub_82172BA0+0x1E8` (PC `0x82172D88`).
///
/// Read-only. No guest-state mutation; lockstep digest unaffected.
/// Empty set is the common case → single `is_empty()` test on the
/// hot path.
pub fn fire_audit_pc_probe_if_match(&self, hw_id: u8, mem: &GuestMemory) {
if self.audit_pc_probe_pcs.is_empty() {
return;
}
let ctx = self.scheduler.ctx(hw_id);
let pc = ctx.pc;
if !self.audit_pc_probe_pcs.contains(&pc) {
return;
}
let tid = self.scheduler.tid(hw_id).unwrap_or(0);
let r3 = ctx.gpr[3] as u32;
let r11 = ctx.gpr[11] as u32;
let lr = ctx.lr as u32;
let cycle = ctx.cycle_count;
// Memory dereferences. Guest pointers may be unmapped/garbage;
// `read_u32` returns 0 for unmapped pages (heap.rs:510 returns
// a default), so an all-zero block in the output most likely
// indicates an invalid `r3` (a mapped but zeroed object reads
// the same way).
let vtable = mem.read_u32(r3);
let slot6_method = if vtable != 0 {
mem.read_u32(vtable.wrapping_add(24))
} else {
0
};
let aux_handle = mem.read_u32(r3.wrapping_add(0x0C));
let sub_vt = mem.read_u32(r3.wrapping_add(0x30));
println!(
"AUDIT-PC-PROBE pc={:#010x} tid={} hw={} cycle={} lr={:#010x} r3={:#010x} r11={:#010x} \
[r3+0]={:#010x} [[r3+0]+24]={:#010x} [r3+0x0C]={:#010x} [r3+0x30]={:#010x}",
pc, tid, hw_id, cycle, lr, r3, r11,
vtable, slot6_method, aux_handle, sub_vt,
);
// AUDIT-2BF round 14 — paired memory-read. When
// `audit_mem_read_addr` is set, dereference 3 deep: singleton
// pointer → vtable → vtable[0] / vtable[24]. Defensively
// null-checks each level. `read_u32` returns 0 for unmapped
// pages so all-zero output is the unmapped/uninitialized
// signature.
if let Some(addr) = self.audit_mem_read_addr {
let val = mem.read_u32(addr);
let vt = if val != 0 { mem.read_u32(val) } else { 0 };
let m0 = if vt != 0 { mem.read_u32(vt) } else { 0 };
let m6 = if vt != 0 { mem.read_u32(vt.wrapping_add(24)) } else { 0 };
println!(
"AUDIT-MEM-READ addr={:#010x} val={:#010x} vtable={:#010x} \
vtable[0]={:#010x} vtable[24]={:#010x} pc={:#010x} tid={} cycle={}",
addr, val, vt, m0, m6, pc, tid, cycle,
);
}
// AUDIT-052 — dump N bytes of guest memory from r3 as u32 lanes
// when `audit_r3_dump_bytes` is set. Sized for the 80-byte
// stack-local struct at sub_82452DC0's `r31+96` (probe is
// sub_8245B000 entry where r3 IS the struct ptr). Output
// format: `AUDIT-R3-DUMP pc=… tid=… cycle=… r3=… +0x00=… +0x04=… …`.
if let Some(n) = self.audit_r3_dump_bytes {
let n = n.min(256) & !3u32; // cap 256B, 4-byte align
let mut out = String::with_capacity(64 + (n as usize) * 16);
use std::fmt::Write as _;
let _ = write!(
&mut out,
"AUDIT-R3-DUMP pc={:#010x} tid={} cycle={} r3={:#010x}",
pc, tid, cycle, r3,
);
let mut off: u32 = 0;
while off < n {
let v = mem.read_u32(r3.wrapping_add(off));
let _ = write!(&mut out, " +0x{:02x}={:#010x}", off, v);
off = off.wrapping_add(4);
}
println!("{}", out);
}
}
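The probe relies on `read_u32` returning 0 for unmapped pages and gates each deeper read on the previous one being non-zero. A small model with a map-backed memory makes those read patterns, and the dump-length clamp, checkable offline. `MockMem`, `mem_read_chain`, `r3_dump_len`, and `r3_dump_offsets` are hypothetical; they mirror the arithmetic in `fire_audit_pc_probe_if_match` rather than calling the real `GuestMemory`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `GuestMemory`: absent (unmapped) addresses
/// read as 0, matching the behaviour the probe's comments describe.
struct MockMem(HashMap<u32, u32>);

impl MockMem {
    fn read_u32(&self, addr: u32) -> u32 {
        self.0.get(&addr).copied().unwrap_or(0)
    }
}

/// Mirrors the `AUDIT-MEM-READ` chain:
/// (*addr, **addr, *(**addr), *(**addr + 24)), null-checked per level.
fn mem_read_chain(mem: &MockMem, addr: u32) -> (u32, u32, u32, u32) {
    let val = mem.read_u32(addr);
    let vt = if val != 0 { mem.read_u32(val) } else { 0 };
    let m0 = if vt != 0 { mem.read_u32(vt) } else { 0 };
    let m6 = if vt != 0 { mem.read_u32(vt.wrapping_add(24)) } else { 0 };
    (val, vt, m0, m6)
}

/// Mirrors the `AUDIT-R3-DUMP` clamp: cap at 256 bytes, round down to 4.
fn r3_dump_len(n: u32) -> u32 {
    n.min(256) & !3u32
}

/// Byte offsets the dump loop visits for a requested length `n`.
fn r3_dump_offsets(n: u32) -> Vec<u32> {
    (0..r3_dump_len(n)).step_by(4).collect()
}
```

One consequence shows up in the model: an unmapped `addr` and a mapped slot that holds 0 both yield `(0, 0, 0, 0)`, so the zero signature alone cannot distinguish the two cases.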
/// M12 — diagnostic. If the live PC for HW slot `hw_id` is in
/// `self.lr_trace_pcs`, emit one JSONL record. Format mirrors what
/// xenia-canary's `--log_lr_on_pc` patch emits, plus the cycle