40f208ea4e730a27315d4fee3a107127d54d8dbb
111 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
40f208ea4e |
[2.BF] Silph WorkerCtx: install canary's real sub-vtable at [+0x2C][0]
Round-21 pivot of the audit-059 synth-spawn module. Round 20 made the
silph::WorkerCtx workers run by attaching a 32-slot stub sub-vtable
where every entry was a `li r3, 0; blr` stub — workers spawned but
spun forever because slots 15/17 short-circuited to NULL ("no work").
Round 21 reads canary's real sub-vtable VA out of the XEX `.rdata` —
`0x8200A168` — and points `[sub_object + 0]` at it directly. The
vtable bytes live in the static image both engines map, so no guest
memory is consumed and slot 15 (= `sub_824FCCC8`) and slot 17
(= `sub_824FCE38`) — the only slots `sub_82506B08` ever calls —
become working game methods.
Discovery method (canary probes in
`audit-runs/audit-059-handle-disambiguation/round21-subvtable-canary/`):
1. `--audit_jit_prolog_pc=0x82506B08` to catch the first WorkerCtx
virtual-dispatch entry; `[r3+0x2C]` revealed the sub-object VA.
2. Re-run with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to deref
`[sub-object + 0]` = sub-vtable VA = 0x8200A168.
3. PE inspection (`xex-text/xex-rdata` is the static image) reads
all 31 slots; slot 15 -> sub_824FCCC8, slot 17 -> sub_824FCE38.
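Step 3 above amounts to reading big-endian pointers out of the static image. A minimal sketch, assuming the `.rdata` bytes are available as a slice whose first byte sits at a known guest VA; `read_vtable_slots` and its parameters are hypothetical names, not the project's API:

```rust
/// Hypothetical helper: read `count` big-endian u32 vtable slots from a
/// section image. `section_base` is the guest VA of `section[0]`.
/// Returns None if the requested range falls outside the section.
fn read_vtable_slots(
    section: &[u8],
    section_base: u32,
    vtable_va: u32,
    count: usize,
) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(section_base)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = section.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Slot N lives at byte offset 4*N from the vtable VA, so slot 15 is at +60 and slot 17 at +68.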
Smoke metrics (50M instructions, `XENIA_CACHE_PERSIST=1
XENIA_SILPH_SYNTH=1`, audit-runs/audit-059-handle-disambiguation/
round21-real-vtable/):
* 4/4 workers spawned, no crash, no new fault
* KeSetEvent 633885 -> 431860 (-32%)
* KeWaitForSingleObject 258441 -> 185762 (-28%)
* Per-handle state unchanged on the focused stalled set
(0x1020/0x1090 still `<NO_SIGNALS_DESPITE_WAITS>`,
0x12a4/0x12ac/0x1218/0x1224 still `<UNCREATED>`).
* No VdSwap/draws progression observed in this window.
Verdict: B (partial). The workers no longer spin in a stub-loop —
internal call density shifted — but the focused wedge handles still
don't get signalled. Likely root cause: workers may now be waiting
on the WorkerCtx's own KEVENTs (which we synthesised at
+0x54/+0x94) for upstream work that no producer is enqueuing.
Net LOC: 29 ins / 31 del. Tests: workspace passes (lockstep app
tests, kernel 127/127, hir 288/288, scheduler 38/38).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8683fb59ed |
[2.BF] Silph WorkerCtx: synthesize sub-object + vtable at [+0x2C]
Audit-059 round 19 isolated the round-18 worker fault: the four
silph::WorkerCtx worker bodies all execute the sequence
    lwz   r3, 44(rN)     ; r3 = [ctx+0x2C], sub-object pointer
    lwz   r11, 0(r3)     ; r11 = sub-object vtable
    lwz   r11, 60(r11)   ; r11 = sub-object vtable[15]
    mtctr r11
    bctrl
Ours left [ctx+0x2C] NULL → PC=0 fault on first virtual dispatch. Round 19
recommended materialising a sub-object whose vtable points entirely at an
existing trivial-return stub so workers idle live, returning NULL work,
without crashing.
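The three-load dispatch chain can be modelled in a few lines. This is a sketch under a simplifying assumption (guest VA equals the index into a flat byte slice); `read_be_u32` and `resolve_virtual_target` are hypothetical names:

```rust
/// Big-endian u32 read from a flat guest-memory model (assumption: guest VA
/// is the index into `mem`). Returns None when the read is out of range.
fn read_be_u32(mem: &[u8], va: u32) -> Option<u32> {
    let start = va as usize;
    let bytes = mem.get(start..start.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Follows ctx -> sub-object -> vtable -> slot. None means the chain hit a
/// NULL or unreadable pointer, i.e. the case that ends in a PC=0 fault.
fn resolve_virtual_target(mem: &[u8], ctx: u32, slot: u32) -> Option<u32> {
    // lwz r3, 44(rN)
    let sub_object = read_be_u32(mem, ctx.checked_add(0x2C)?)?;
    if sub_object == 0 {
        return None;
    }
    // lwz r11, 0(r3)
    let vtable = read_be_u32(mem, sub_object)?;
    if vtable == 0 {
        return None;
    }
    // lwz r11, slot*4(r11)
    let target = read_be_u32(mem, vtable.checked_add(slot.checked_mul(4)?)?)?;
    (target != 0).then_some(target)
}
```

With `[ctx+0x2C]` left at zero the function returns `None`, which corresponds to the branch-to-zero the round-18 crash showed.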
Changes (silph_synth.rs only, +63/-6):
- Grow SILPH_CTX_SIZE 0x500 → 0x800 to embed sub-object at +0x300 and a
32-slot sub-vtable at +0x500 in the same heap_alloc.
- After ctx header init, write sub-object pointer at [ctx+0x2C], the XEX-
resident wrapper constant 0xBE568F00 (round-7 finding) at [ctx+0x30],
and leave [ctx+0x28] NULL (matches canary first-fire snapshot).
- Populate every slot of the 32-entry sub-vtable with VA 0x8216CAA4, the
first 4-byte-aligned standalone `li r3, 0; blr` stub located by a fresh
PE-text scan (preceded by a `blr` terminating the previous function).
- Sub-object body itself is zero-filled apart from the [+0]=vtable_ptr
write; round-19 disassembly confirms workers only touch slots 15/17.
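The layout described in the list can be sketched as a host-side buffer. Only `SILPH_CTX_SIZE` and the numeric values come from the commit text; the other constant names and `build_ctx_image` are hypothetical, and the ctx header fields are deliberately left out:

```rust
const SILPH_CTX_SIZE: usize = 0x800; // grown from 0x500 per the commit
const SUB_OBJECT_OFF: usize = 0x300; // hypothetical name
const SUB_VTABLE_OFF: usize = 0x500; // hypothetical name
const SUB_VTABLE_SLOTS: usize = 32; // hypothetical name
const STUB_VA: u32 = 0x8216_CAA4; // `li r3, 0; blr` stub
const WRAPPER_CONST: u32 = 0xBE56_8F00; // value written at [ctx+0x30]

/// Store a u32 in guest (big-endian) byte order.
fn put_be_u32(buf: &mut [u8], off: usize, value: u32) {
    buf[off..off + 4].copy_from_slice(&value.to_be_bytes());
}

/// Build the sub-object part of the ctx allocation for a ctx placed at
/// guest VA `ctx_va`. Header initialisation is omitted in this sketch.
fn build_ctx_image(ctx_va: u32) -> Vec<u8> {
    let mut buf = vec![0u8; SILPH_CTX_SIZE];
    // [ctx+0x28] stays NULL (zero-filled by the allocation above).
    put_be_u32(&mut buf, 0x2C, ctx_va.wrapping_add(SUB_OBJECT_OFF as u32));
    put_be_u32(&mut buf, 0x30, WRAPPER_CONST);
    // Sub-object body is zero apart from its vtable pointer at [+0].
    put_be_u32(&mut buf, SUB_OBJECT_OFF, ctx_va.wrapping_add(SUB_VTABLE_OFF as u32));
    for slot in 0..SUB_VTABLE_SLOTS {
        put_be_u32(&mut buf, SUB_VTABLE_OFF + slot * 4, STUB_VA);
    }
    buf
}
```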
Smoke (XENIA_SILPH_SYNTH=1, persistent cache, 5e7 instr):
- Lockstep: no crash, all 4 workers (tid=6/7/8/9) reach Ready in deep
worker-body PCs (0x825067xx/0x825089xx/0x825091xx). Verdict (D) —
workers run their idle loop returning NULL; existing silph waiters
(0x1020, 0x1090) remain <NO_SIGNALS_DESPITE_WAITS> because we
deliberately neutered productive work.
- Parallel: identical picture, no PC=0/PC=garbage fault anywhere.
No regression in 765-test suite.
Next round: feed real work-items into the intrusive ring at ctx+0x210
so workers' returned-NULL idle becomes returned-work productive; or
discover which sub-vtable slots actually need real callees (slot 15
worker drain, slot 17 producer).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b5885b8560 |
[2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing)
Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059
identified as never reached by ours' static-init chain (real chain entry
sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8).
Ctx layout follows round 5's live hexdump from canary:
  +0x00           vtable = 0x8200A1E8
  +0x04           self
  +0x08           intrusive list head -> self
  +0x0C           init flag = 1
  +0x10           packed byte field
  +0x18           2x float ~1.0 (UI rates)
  +0x24           flag = 1
  +0x28..+0x30    3x foreign-arena pointers (left NULL, see below)
  +0x54..+0x84    4x X_KEVENT auto-reset, state=0
  +0x94..+0xC4    4x X_KEVENT manual-reset, state=1 (pre-signaled)
  +0x210..+0x250  4-entry intrusive work-ring, empty
Worker spawn mirrors AUDIT-048's audio-worker pattern in
xaudio_register_render_driver: per-worker allocate_thread_image +
state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first dat/*
VFS open (ours' earliest is dat/files.tbl), which is when canary runs the
equivalent chain.
ROUND 18 OUTCOME: opt-in only.
With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at cycle
~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8) spawns.
Per task constraints this is STOP-and-report: the ctx fields
+0x28/+0x2C/+0x30 (foreign heap pointers; canary's 0x30057018, 0xBCE25640,
0xBE568F00, distinct arenas per audit-059 round 7) are left NULL, and the
worker bodies plausibly dereference one of them. Synthesising those is a
fresh investigation (round 19+).
With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot completes
normally (11 spawns, VdSwap=1, KeSetEvent=2, KeReleaseSemaphore=1; matches
default baseline). The ctx remains materialised in guest memory at the
logged VA for downstream probing.
Default (env var unset): no synth, no regression.
Files:
  crates/xenia-kernel/src/silph_synth.rs (new, 225 LOC)
  crates/xenia-kernel/src/lib.rs (+1 LOC, register module)
  crates/xenia-kernel/src/exports.rs (+37 LOC, hook in open_vfs_file)
  crates/xenia-kernel/src/state.rs (+18 LOC, 4 silph_synth_* fields)
Tests: cargo test --release --workspace = 765 pass / 0 fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9340ff4592 |
[Audit] --audit-r3-dump-bytes: dump N bytes at r3 when probe fires
AUDIT-059 round 15, diagnostic. When `--audit-r3-dump-bytes=N` is set,
every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line with
N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap 256B).
Sized for the 80-byte stack-local struct at sub_82452DC0's `r31+96` (probe
sub_8245B000 entry where r3 IS the struct ptr). Settable via
`XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest unaffected
(empty-set fast path in fire_audit_pc_probe_if_match).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bcd018659b |
[Audit] --audit-mem-dump-chain: deref a guest address N levels for diagnosis
Round-14 of AUDIT-2BF (singleton-dump). The bctrl at sub_822F1AA8+0x90
(PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences
its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs.
To name the resolved target we need to dump the (singleton, vtable,
vtable[0]) chain on probe firing.
Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single
guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel
emits a paired `AUDIT-MEM-READ` line with three guest reads:
    AUDIT-MEM-READ addr=0x828E1F08 val=<*addr> vtable=<**addr> \
        vtable[0]=<***addr+0> vtable[24]=<***addr+24> ...
`vtable[24]` is included as the slot-6 method (audit-059 round 9
documented the canary silph chain dispatching slot 6 of a vtable here).
Read-only; lockstep digest unaffected. ~30 LOC across state.rs and
main.rs. `cmd_check` opts out of the flag (same policy as the existing
audit_pc_probe_hex).
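The (singleton, vtable, vtable[0], vtable[24]) chain is a sequence of dependent reads in which any link may fail. A minimal sketch, assuming a sparse word-addressed memory model held in a `HashMap`; `MemReadChain` and `read_chain` are hypothetical names and the real probe reads guest memory directly:

```rust
use std::collections::HashMap;

/// One probe result; each field is None if its read (or a read it
/// depends on) could not be performed.
#[derive(Debug, PartialEq)]
struct MemReadChain {
    val: Option<u32>,    // *addr, the singleton pointer
    vtable: Option<u32>, // **addr
    slot0: Option<u32>,  // vtable[0], byte offset 0
    slot24: Option<u32>, // byte offset 24, i.e. slot 6
}

/// `mem` maps a guest VA to the u32 stored there (sketch-only model).
fn read_chain(mem: &HashMap<u32, u32>, addr: u32) -> MemReadChain {
    let read = |va: u32| mem.get(&va).copied();
    let val = read(addr);
    let vtable = val.and_then(|v| read(v));
    let slot0 = vtable.and_then(|vt| read(vt));
    let slot24 = vtable.and_then(|vt| read(vt.wrapping_add(24)));
    MemReadChain { val, vtable, slot0, slot24 }
}
```

Keeping every level optional means a NULL or unmapped singleton still yields a line to log rather than a fault in the probe itself.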
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
09e59e09b7 |
Audit-2BF.delta: add --audit-pc-probe-hex for silph-init bctrl probe
Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned
for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8
(PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable
slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw,
cycle, lr, r3, r11) plus four guest-memory dereferences off r3 — the
vtable, slot-6 method pointer, auxiliary handle field, and embedded
sub-object vtable — so the line can be compared head-to-head with
canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644,
slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to
identify whether ours dispatches to the wrong vtable on a correct
object (case A) or to a wrong object entirely (case B).
Why this addition rather than reuse of an existing probe: --lr-trace
emits JSONL designed for canary-side diffing and only captures
r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR
flags and lr but again no memory; --ctor-probe is single-shot per PC
and walks the stack back-chain. None of them load the four indirect
fields needed to identify a vtable-shape divergence.
Implementation:
- state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper
`fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path
keeps the cost to one is_empty() check per worker_prologue call
when the flag is unused. Read-only — no guest state mutation,
lockstep digest unchanged.
- main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma
parsing (tolerates `0x` prefix), settable also via
XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner;
cmd_check passes None so check digests are unaffected.
Probe wired into worker_prologue alongside fire_ctor_probe /
fire_branch_probe / fire_lr_trace. Like its siblings, it fires once per
basic-block entry — known limitation (audit-045 reading-error class
13); use a block-entry PC if probing a mid-block instruction.
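The flag parsing and the empty-set fast path can be sketched as follows. This is an illustration only; `parse_pc_list` and `probe_matches` are hypothetical names, not the functions in main.rs or state.rs:

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parse a comma-separated list of hex PCs, tolerating an optional `0x`
/// prefix on each entry. Empty entries are skipped.
fn parse_pc_list(arg: &str) -> Result<HashSet<u32>, ParseIntError> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| {
            let digits = s
                .strip_prefix("0x")
                .or_else(|| s.strip_prefix("0X"))
                .unwrap_or(s);
            u32::from_str_radix(digits, 16)
        })
        .collect()
}

/// Fast path: when no probe PCs are configured the check costs a single
/// `is_empty()` call and never touches the hash table.
fn probe_matches(pcs: &HashSet<u32>, pc: u32) -> bool {
    !pcs.is_empty() && pcs.contains(&pc)
}
```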
Verification: kernel 127/127, app 5/5 non-ignored, no behaviour
change with empty flag.
Cross-references audit-059 round 9's canary capture and lays the
groundwork for the round-10 ours-side comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5a8fe21ad5 |
Iterate-2.BF.γ: refine is_in_callback gate to per-thread exclusion
Lockstep vsync delivery was capped at 54/run despite the ticker firing
333 periods and dispatcher being called 1.2M times. Root cause: the
blanket `is_in_callback()` gate skipped dispatch entirely whenever the
async audio path held `interrupts.saved`, which is essentially the
entire boot (audio worker rarely hits its LR_HALT_SENTINEL between
back-to-back callbacks). 5.85M dispatch_skip_in_callback events drowned
out the 55 with-pending windows.
Graphics dispatch (iterate-2.BE) runs the ISR synchronously and
restores the borrowed context before returning — it doesn't touch
`interrupts.saved`. The only real conflict is if graphics picks the
*same* thread audio borrowed (which would stomp audio's
SavedCallbackCtx). Replace the blanket gate with per-thread exclusion:
when audio is mid-flight, exclude only its `injected_ref` from
victim selection. Falls through to the existing no-victim drop if
that's the only candidate.
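Per-thread exclusion can be sketched as a filter on victim selection. The types here are hypothetical stand-ins, and the Ready-then-Blocked preference is taken from the Iterate-2.BE description rather than from this commit:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Blocked,
    Idle,
    Exited,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ThreadRef(u32);

/// Pick a thread whose context the graphics ISR may borrow. The thread
/// that audio currently has borrowed (`audio_borrowed`) is skipped; if it
/// was the only candidate the result is None and the interrupt is dropped.
fn pick_victim(
    threads: &[(ThreadRef, ThreadState)],
    audio_borrowed: Option<ThreadRef>,
) -> Option<ThreadRef> {
    let eligible = |want: ThreadState| {
        threads
            .iter()
            .find(|(r, s)| *s == want && Some(*r) != audio_borrowed)
            .map(|(r, _)| *r)
    };
    eligible(ThreadState::Ready).or_else(|| eligible(ThreadState::Blocked))
}
```

Compared with a blanket gate, only one candidate is removed while audio is mid-flight, so dispatch still proceeds whenever any other Ready or Blocked thread exists.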
Lockstep (50M instr): gpu.interrupt.delivered{source=0} 54 → 295
(5.5×), all 333 ticker periods either delivered or unarmed (no more
queue_full_drops). Wallclock unchanged ~3 s.
Parallel (30M instr): 1193 → 3458 baseline lift (2.9×), no regression.
Tests: xenia-kernel 127/127, xenia-app 5/5 non-ignored. Lockstep
goldens will drift (interrupts.delivered is in the digest); deferred
to next iterate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
51489e34db |
Iterate-2.BE Path β: tick vsync from coord_idle_advance
The iterate-2.BE host-driven synchronous ISR dispatcher relies on
something queueing v-syncs. In lockstep that's `tick_vsync_instr`,
called from `coord_pre_round` per round. If the scheduler stalls into
`coord_idle_advance` (no Ready threads), the instruction counter
freezes — the accumulator stops incrementing, the ticker stops
queueing, and the dispatcher is left starved for the duration of the
idle wait.
Tick `tick_vsync_wallclock` at the top of `coord_idle_advance` so
v-syncs keep firing on host time even when the guest scheduler is
parked. The dispatcher in the outer loop drains whatever we queue on
the next iteration. Same MMIO `D1MODE_VBLANK_VLINE_STATUS` bit-set as
the production path.
Note: empirically in Sylpheed at 50M/500M instruction horizons,
`coord_idle_advance` is never reached (tids 9/10/12 stay Ready through
the early-boot deadlock), so this commit doesn't move
`gpu.interrupt.delivered{source=0}` off 54 for this title at these
horizons. It is the correct fix for the documented starvation pattern
and will activate as soon as the kernel reaches a state where Ready
threads drop to zero with timers/waits pending.
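The starvation pattern can be illustrated with a toy ticker that has both an instruction-driven and a wallclock-driven accumulator. `VsyncTicker` and its fields are hypothetical; the real `tick_vsync_instr` / `tick_vsync_wallclock` also set the MMIO status bit, which this sketch omits:

```rust
use std::time::Duration;

/// Toy model: counts queued v-syncs from two independent accumulators.
/// Both periods must be non-zero.
struct VsyncTicker {
    instr_period: u64,
    instr_accum: u64,
    wall_period: Duration,
    wall_accum: Duration,
    queued: u32,
}

impl VsyncTicker {
    /// Instruction-driven tick: makes no progress while `executed` is 0,
    /// which is the situation during an idle wait.
    fn tick_instr(&mut self, executed: u64) {
        self.instr_accum += executed;
        while self.instr_accum >= self.instr_period {
            self.instr_accum -= self.instr_period;
            self.queued += 1;
        }
    }

    /// Host-time tick: keeps queueing even when no guest code runs.
    fn tick_wallclock(&mut self, elapsed: Duration) {
        self.wall_accum += elapsed;
        while self.wall_accum >= self.wall_period {
            self.wall_accum -= self.wall_period;
            self.queued += 1;
        }
    }
}
```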
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9a93152981 |
Iterate-2.BE: host-driven synchronous graphics ISR delivery
Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
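The borrow, run and restore cycle can be sketched with a reduced context and a caller-supplied step function. The three-register `PpcContext`, the sentinel value and `run_isr_inline` are assumptions made for illustration; only the 1M instruction budget is taken from the commit text:

```rust
/// Reduced stand-in for the real guest context (assumption: three fields
/// are enough to show the save/restore shape).
#[derive(Clone, Debug, PartialEq)]
struct PpcContext {
    pc: u32,
    lr: u32,
    r3: u32,
}

const LR_HALT_SENTINEL: u32 = 0xFFFF_FF00; // hypothetical value
const MAX_INSTRS_PER_ISR: u32 = 1_000_000; // safety budget from the commit

/// Run an ISR on a borrowed context until it returns to the sentinel,
/// then restore the borrowed context. `step` executes one instruction.
fn run_isr_inline(
    ctx: &mut PpcContext,
    isr_pc: u32,
    arg: u32,
    mut step: impl FnMut(&mut PpcContext),
) -> Result<u32, &'static str> {
    let saved = ctx.clone();
    ctx.pc = isr_pc;
    ctx.lr = LR_HALT_SENTINEL;
    ctx.r3 = arg;
    let mut executed = 0;
    let result = loop {
        if ctx.pc == LR_HALT_SENTINEL {
            break Ok(executed);
        }
        if executed == MAX_INSTRS_PER_ISR {
            break Err("ISR exceeded instruction budget");
        }
        step(ctx);
        executed += 1;
    };
    // Restore unconditionally so the victim thread never sees ISR state.
    *ctx = saved;
    result
}
```

Because the context is restored before returning, the victim thread only needs to exist; it does not need to be scheduled for delivery to complete.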
Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.
Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.
Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
callback path (audit-048) still uses the asynchronous LR-sentinel
inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
* dispatch_graphics_interrupts(kernel, mem, &mut stats,
&mut decode_cache, thunk_map): new fn. Drains the full FIFO per
call. Victim selection same shape (Ready preferred, else
Blocked, skip Idle/Exited/ServicingIrq), but the call is
synchronous - we run step_cached + import-thunk dispatch inline
on the borrowed ctx until pc == LR_HALT_SENTINEL.
MAX_INSTRS_PER_ISR = 1M safety budget.
* coord_pre_round: graphics-IRQ injection call removed. Audio
path unchanged (still calls try_inject_audio_callback).
* run_execution + run_execution_parallel: each now owns a
persistent isr_decode_cache and calls
dispatch_graphics_interrupts after coord_pre_round.
* try_inject_graphics_interrupt: deleted (118 LOC).
No new public APIs, no new dependencies, no changes to xenia-cpu.
Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ac2f89a7bb |
Re-baseline sylpheed_n50m golden post-AUDIT-054
instructions: 50000002 → 50000001 (1-instr shift from FILE_DIRECTORY_FILE
plumbing on NtCreateFile path; all other digest fields unchanged:
imports/swaps/draws/render-targets/shaders/textures all match prior golden).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a8ff9515d |
AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence
Track A — FILE_DIRECTORY_FILE handling. NtCreateFile's 9th parameter
`create_options` (sp+0x54 per shim_utils.h:49-50) is now read and
forwarded to open_vfs_file/open_cache_file. When the
FILE_DIRECTORY_FILE bit (0x1) is set on a `cache:\<hash>` path,
the host-side handler `mkdir -p`s instead of `File::create`'ing a
0-byte sentinel that blocked subsequent hierarchical creates of
`cache:\<hash>\<sub>\<leaf>` with NAME_COLLISION. Confirmed by
`opts=0x4021` (incl. FILE_DIRECTORY_FILE) on `cache:\d4ea4615`
and `opts=0x4020` (no DIR bit) on the leaf `.tmp` files. NtOpenFile
forwards `open_options` (r8) into the same slot per
xboxkrnl_io.cc:118-122. Closes the AUDIT-053 ζ-class VFS layout
aliasing wedge.
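The Track A branch can be sketched with `std::fs`. The function name and signature are hypothetical; only the 0x1 bit value and the mkdir-versus-create split come from the text above:

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

const FILE_DIRECTORY_FILE: u32 = 0x1;

/// Create the host-side entry for a `cache:` path. With the directory bit
/// set this behaves like `mkdir -p`; otherwise it creates an empty file.
fn create_cache_entry(host_path: &Path, create_options: u32) -> io::Result<()> {
    if create_options & FILE_DIRECTORY_FILE != 0 {
        fs::create_dir_all(host_path)
    } else {
        File::create(host_path).map(|_| ())
    }
}
```

With `opts=0x4021` the hash path becomes a real directory, so later creates of entries beneath it no longer collide with a 0-byte file of the same name.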
Track B — opt-in persistent cache root. AUDIT-038's per-process
tmpdir + wipe stays the default (preserves lockstep/oracle
determinism + dodges Sylpheed's `<hash>.tmp` journal-append-on-
reboot self-inconsistency). Persistence is now opt-in via
* `XENIA_CACHE_ROOT=<path>` — explicit path (caller manages
wiping); hands a stable place to drop a canary-built cache
for cascade A/B oracle work.
* `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache`
(or `$HOME/.local/share/xenia-rs/cache`).
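The opt-in resolution can be sketched as a pure function over an environment lookup. The `CacheRoot` enum and `resolve_cache_root` are hypothetical, and the precedence of `XENIA_CACHE_ROOT` over `XENIA_CACHE_PERSIST` is an assumption the commit text does not state:

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum CacheRoot {
    Explicit(PathBuf),   // XENIA_CACHE_ROOT, caller manages wiping
    Persistent(PathBuf), // XENIA_CACHE_PERSIST=1
    PerProcessTmp,       // default: fresh tmpdir, wiped on init
}

/// `get` abstracts the environment so the logic can be tested without
/// mutating process state (pass `|k| std::env::var(k).ok()` in real use).
fn resolve_cache_root(get: impl Fn(&str) -> Option<String>) -> CacheRoot {
    // Assumption: an explicit root wins over the persist switch.
    if let Some(path) = get("XENIA_CACHE_ROOT") {
        return CacheRoot::Explicit(PathBuf::from(path));
    }
    if get("XENIA_CACHE_PERSIST").as_deref() == Some("1") {
        if let Some(xdg) = get("XDG_DATA_HOME") {
            return CacheRoot::Persistent(PathBuf::from(xdg).join("xenia-rs/cache"));
        }
        if let Some(home) = get("HOME") {
            return CacheRoot::Persistent(
                PathBuf::from(home).join(".local/share/xenia-rs/cache"),
            );
        }
    }
    CacheRoot::PerProcessTmp
}
```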
Cold-start (-n 500M, default tmpfs) with FILE_DIRECTORY_FILE fix:
swaps=1 draws=0 imports=40454 cxx_throw=0 — matches master baseline,
no regression. Cache hierarchy now mkdir-p'd correctly: `cache:/`
contains 9 hash dirs (e.g. `d4ea4615/e/`, `aab216c3/5/`) instead
of the 0-byte sentinel files AUDIT-053 found masquerading as
directories.
LOC: +88 / -14 = +74 net (≤80 budget). All 127 xenia-kernel unit
tests pass.
Trace: audit-runs/audit-054-vfs-layout-fix/
cold-start-digest.json + warm-start-digest.json (defaults)
persist-cold-digest.json + persist-warm-digest.json (opt-in)
baseline-master-digest.json (master
|
||
|
|
25704c5811 |
Re-baseline sylpheed_n50m golden post-AUDIT-032
Companion to
|
||
|
|
49f3eafa15 |
AUDIT-032: dedicated audio worker thread per client (Plan B)
Replaces APUBUG-PRODUCER-001's random-victim-hijack audio injection with a
dedicated per-client guest worker thread, mirroring xenia-canary's
apu/audio_system.cc:84-159 WorkerThreadMain pattern in xenia-rs's threading
model. Audio callback ticker is now safe to enable by default.
## What changed
- xenia-kernel/src/xaudio.rs: new XAudioState fields worker_handles +
  worker_refs (one slot for each of the XAUDIO_MAX_CLIENTS=8 clients).
  Synthetic park-handle helper (0xF000_0000 | client_idx), outside the
  normal alloc range so wake_eligible_waiters never finds it; the only
  legitimate state-flip is via try_inject_audio_callback.
- xenia-kernel/src/exports.rs: xaudio_register_render_driver spawns a
  64KB-stack guest thread (create_suspended=true) via
  state.scheduler.spawn after registration succeeds. Immediately flips the
  spawned thread's state from Blocked(Suspended) to
  Blocked(WaitAny[synthetic]) so it's parked but not woken. Stores the
  kernel handle so find_by_handle resolves a fresh ThreadRef after slot
  compaction. Failure paths log + leave xaudio.worker_refs[i] = None, in
  which case the ticker drops fires (no random-victim fallback).
- xenia-app/src/main.rs: try_inject_audio_callback resolves the worker via
  worker_handles[index] instead of scanning runqueues for a Ready or
  Blocked victim. The PC+r3 injection and SavedCallbackCtx capture are
  unchanged; the existing LR_HALT restore path re-blocks the worker on its
  synthetic handle for the next tick. Flag handling reworked:
  --xaudio-tick / XENIA_XAUDIO_TICK now act as explicit override (truthy =
  force on, falsey = force off, absent = use the KernelState default).
- xenia-kernel/src/state.rs: xaudio_tick_enabled default flipped from
  false to true. Pre-fix it was off because the random-victim hijack
  regressed swaps=2->1; with the dedicated worker that whole class of
  regression is gone.
## Cascade verification at -n 500M (audit-runs/audit-048-audio-host-pump/)
Pre-fix baseline: audit-runs/audit-047-gamma-wedges/ours-end-state.log.
| Dim | Predicted (AUDIT-032)             | Observed                            |
|-----|-----------------------------------|-------------------------------------|
| A   | tid=9 leaves Blocked[0x828A3254]  | Ready @ pc=0x824d1404               |
| B   | tid=10 leaves Blocked[0x828A3230] | Ready @ same pc/lr                  |
| C   | XAudioSubmitRenderDriverFrame > 0 | Mixer setup path executed           |
| D   | KeReleaseSemaphore 0 -> non-zero  | 0 -> 1; xaudio.callback.delivered=1 |
Bonus: audit-042's tid=6 worker pair on 0x10A0+0x10A4 also went
Blocked->Ready as a downstream effect.
Boot trajectory shifted significantly: NtWaitForSingleObjectEx 1,489,791
-> 30; NtSetEvent 3,334 -> 68; new exports firing (StfsCreateDevice,
ObCreateSymbolicLink, XamContentCreateEnumerator, XamEnumerate,
XamTaskSchedule, ExCreateThread x10, KeSetAffinityThread x7,
NtCreateSemaphore x4, NtWaitForMultipleObjectsEx x94, NtDuplicateObject
x14, XeCryptSha, XeKeysConsolePrivateKeySign). The system left the
audio-wait busy loop and entered the savegame/content/crypto init phase.
swaps regressed 2 -> 1 (degenerate splash repeat lost; main thread now
advances past splash entirely, blocked on a different handle). draws
unchanged at 0, expected per AUDIT-032 (audio gate != renderer gate).
## Tests + scope
- cargo build --release succeeds, no new warnings.
- cargo test -p xenia-kernel --lib: 127/127 pass (incl. xaudio).
- cargo test -p xenia-app --lib: 5/5 non-ignored pass.
- Lockstep goldens (sylpheed_n2m / sylpheed_n50m) WILL drift on this fix
  and need re-baselining as a follow-up commit.
75 net non-comment LOC across 4 files, within AUDIT-032's 60-120 LOC
budget.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e428ce33aa |
M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle. All four
are smaller-scope and additive; lockstep determinism unaffected
(analyzer-only changes).
## M9.5 - __CxxFrameHandler scope-table parsing
- New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the three
  documented MSVC FuncInfo signatures (0x19930520/21/22) on 4-byte
  alignment. Each match is parsed as the documented struct (BE u32
  fields), with sanity caps on max_state / n_try_blocks / pointer validity.
- Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap
  (TryBlockMapEntry, 20 bytes) into one row each.
- New tables eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) / 10,019 unwind
  entries / 315 try-blocks.
## M11.5 - Static-init driver chain detection
- New `xenia_analysis::static_init` module. Walks every function looking
  for the canonical _initterm loop:
      lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4
  bounded by a compare against another constant register. Extracts
  (array_start, array_end) and reads the array.
- Reuses `function_pointer_arrays` table: drivers' arrays land with
  kind='static_init' (replacing M11's prologue-heuristic output where the
  structurally-grounded pattern fires).
- Sylpheed yield: 0 drivers detected. The binary's static-init structure
  does not match the canonical CRT loop. Infrastructure ready; future
  M11.6 can relax.
## VMX vector-store xrefs (M6 follow-up)
- Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31 dispatch:
  lvx/lvxl/lvebx/lvehx/lvewx (reads) and stvx/stvxl/stvebx/stvehx/stvewx
  (writes), all addr_mode='x_form_indexed'. Static resolution still
  requires both rA and rB constant.
- Sylpheed yield: 110 newly-detected stvx writes.
## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)
- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS X
  0208 lead/trail byte ranges + half-width katakana pass-through) and
  scan_utf8 (2- and 3-byte sequences). At least one multi-byte unit
  required so pure-ASCII strings aren't double-counted.
- SJIS bytes rendered as \xHH escapes for diagnostic readability; full
  SJIS→UTF-8 decoding deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text) + 39
  UTF-8.
## Tests
- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop,
  rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)
Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results
signature updated for new EH parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
56ffa40a6a |
M5.5: this-flow indirect-dispatch resolution via vptr-write inference
Closes the dominant case M5 could not resolve — `lwz vt, off(this);
lwz fn, slot(vt); mtctr; bcctrl` (real C++ dispatch). Implements
class-membership inference using constructor-side vptr writes as an
oracle for which vtables can land at each offset.
## Algorithm
Phase 1 — vptr-write scan: walk every function with the existing
lis+addi register tracker. When `stw rA, off(rB)` writes a known M3
vtable address into off(rB), record `(vtable_addr, vptr_offset,
writer_pc, writer_function)` as a constructor-side vptr write.
Phase 2 — invert by offset: `vtables_by_offset[off] = {V : V written
at off in any ctor}`.
Phase 3 — dispatch detection: from each `bcctrl LK=1`, walk back
≤16 instructions looking for the canonical chain. Bail on register
clobber, branch, or label (basic-block) boundary.
Phase 4 — edge emission: for `(dispatch_pc, vptr_off, slot)`, emit one
`xrefs.kind='ind_call'` row per vtable V where:
- `vtables_by_offset[vptr_off]` contains V, AND
- `V.length > slot` (V actually has a method at that slot)
Multi-candidate sites (the common case at offset 0) are an
over-approximation; downstream queries filter to single-candidate sites
for high confidence:
`WHERE candidate_count=1` in `indirect_dispatch_sites`.
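Phases 2 and 4 reduce to a map inversion followed by a filtered join. A minimal sketch, assuming vtables are available as address-to-method-list maps; all type and function names here are hypothetical:

```rust
use std::collections::{BTreeMap, BTreeSet};

struct VptrWrite {
    vtable_addr: u32,
    vptr_offset: u32,
}

struct DispatchSite {
    dispatch_pc: u32,
    vptr_offset: u32,
    slot: usize,
}

/// Phase 2: for each vptr offset, the set of vtables any ctor wrote there.
fn invert_by_offset(writes: &[VptrWrite]) -> BTreeMap<u32, BTreeSet<u32>> {
    let mut by_offset: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for w in writes {
        by_offset.entry(w.vptr_offset).or_default().insert(w.vtable_addr);
    }
    by_offset
}

/// Phase 4: one (dispatch_pc, vtable, method) edge per candidate vtable
/// that was written at this offset and is long enough to have the slot.
fn emit_candidates(
    site: &DispatchSite,
    by_offset: &BTreeMap<u32, BTreeSet<u32>>,
    vtables: &BTreeMap<u32, Vec<u32>>,
) -> Vec<(u32, u32, u32)> {
    by_offset
        .get(&site.vptr_offset)
        .into_iter()
        .flatten()
        .filter_map(|vtable| {
            let methods = vtables.get(vtable)?;
            let method = *methods.get(site.slot)?; // enforces V.length > slot
            Some((site.dispatch_pc, *vtable, method))
        })
        .collect()
}
```

A site with exactly one returned edge corresponds to the `candidate_count=1` high-confidence case.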
## Schema
NEW TABLES:
- `vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function)`
- `indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count)`
- `indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address)`
NEW INDICES on vtable_address / vptr_offset / method_address /
(vptr_offset, slot) for fast joins.
## Sylpheed yield
- 567 vptr writes / 214 vtables / 29 offsets (offset 0 = 88%).
- 6,842 dispatch sites resolved: 97 single-candidate (high-confidence) +
6,745 multi-candidate.
- 687,963 ind_call xref rows.
- 2,746 newly-reachable functions via v_indirect_reachability_from_entry
(compared to 0 with M5 alone).
- Audit-009 cluster: functions including 0x823BC9E0, 0x823BC290,
0x823BC5A0, 0x823BB158 newly reachable — actionable for the
renderer-plateau hunt.
Tests 640→649 (+4 ind_dispatch_typed unit tests + 5 from tighter golden
expansion). Schema golden + write_analysis_results signature updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
77034b6cbf |
audit-038: persistent cache:/* VFS via host-FS backing
Replaces the "Synthesized empty file" stub for cache:/* paths with a
real host-FS HostPathDevice-style mount. Each KernelState gets a fresh
per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/ which is
cleared on init for lockstep determinism (mirrors canary's
xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") +
HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc).
NtCreateFile now honours create_disposition for cache: paths:
    FILE_OPEN         -> NOT_FOUND if missing
    FILE_CREATE       -> NAME_COLLISION if present
    FILE_OPEN_IF      -> open or create
    FILE_OVERWRITE_IF -> create or truncate
    FILE_OVERWRITE    -> NOT_FOUND if missing, else truncate
    FILE_SUPERSEDE    -> create or truncate
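The disposition table maps directly onto a match over (disposition, file exists). The enum and function names below are hypothetical; the outcomes follow the table above:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Disposition {
    Supersede,
    Open,
    Create,
    OpenIf,
    Overwrite,
    OverwriteIf,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Opened,
    Created,
    Truncated,
    NotFound,
    NameCollision,
}

/// Decide what NtCreateFile should do for a `cache:` path, given whether
/// the host file already exists.
fn apply_disposition(d: Disposition, exists: bool) -> Outcome {
    use Disposition::*;
    match (d, exists) {
        (Open, true) => Outcome::Opened,
        (Open, false) => Outcome::NotFound,
        (Create, true) => Outcome::NameCollision,
        (Create, false) => Outcome::Created,
        (OpenIf, true) => Outcome::Opened,
        (OpenIf, false) => Outcome::Created,
        (Overwrite, true) => Outcome::Truncated,
        (Overwrite, false) => Outcome::NotFound,
        (OverwriteIf | Supersede, true) => Outcome::Truncated,
        (OverwriteIf | Supersede, false) => Outcome::Created,
    }
}
```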
NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation)
/ NtQueryInformationFile / NtQueryFullAttributesFile route through
std::fs against the per-handle host_path; non-cache paths keep their
legacy semantics (read-only disc image, synth-empty stubs).
Verified by audit-037 cascade:
- sub_82459D18 (cache-miss restore): 0 fires (was firing constantly)
- sub_8245D230 (resize/zero-fill): 0 fires (was firing constantly)
- 105+ real cache-file writes per 500M run; 4+ MB of game data persisting
to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc.
- Lockstep deterministic at instructions=100000004 / imports=987485
across 3+ reruns (digest shifted as expected; goldens re-baselined).
- swaps=2 plateau still in place; cluster L1 unactivated. Cascade
dimension D (cluster activation) — UNKNOWN, no L1 fires.
Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5af792c9fc |
M8+M9+M10+M11+M12: LOW-tier milestones — funcptr-arrays, EH flag, TLS, lr-trace
Five LOW-priority milestones bundled. Total ~700 LOC across 11 files.
## M9 - has_eh derived from pdata.flags exception bit
- New `functions.has_eh BOOLEAN NOT NULL` column. Derived from M1's
  already-parsed `pdata.flags` (bit 31 of the packed word, the
  exception-handler-present flag, distinct from bit 30 which is the
  always-1 32-bit-code flag). Index idx_functions_has_eh.
- Sylpheed: 2,975 of 23,073 pdata-validated functions have EH (12.9%).
## M10 - .tls section / IMAGE_TLS_DIRECTORY32 parser
- New `xenia_xex::tls::parse_tls` parses the directory + zero-terminated
  callback array. Returns None when the binary has no .tls section.
- New `tls_info` (singleton row) + `tls_callbacks(slot, address)` tables.
- New `DbWriter::write_tls()` no-ops on None.
- Sylpheed has no .tls section → 0 rows; infra ready for binaries with
  __declspec(thread).
## M8 + M11 - function_pointer_arrays (dispatch tables + static initialisers)
- New `xenia_analysis::funcptr_arrays::analyze` widens M3's vtable scan:
  detects runs of ≥2 function pointers in .rdata and classifies each as
  `vtable` (M3 re-emit), `dispatch_table` (M8), or `static_init` (M11) via
  a constructor-prologue heuristic (mfspr + small stwu).
- New tables `function_pointer_arrays(address PK, length, kind)` and
  `function_pointer_array_entries(array_address, slot, function_address)`.
- Sylpheed: 722 vtables + 388 dispatch_tables = 1,110 arrays / 6,347
  slots. 0 static_init detected (Sylpheed's ctors don't all match the
  conservative heuristic; M11.5 future work can chain via the
  entry-point's static-init driver).
## M12 - --lr-trace runtime canary-diff harness
- New CLI `exec --lr-trace=PC[,PC,...]` and `--lr-trace-out=PATH` flags.
  Symbolic resolution (Class::method, Class::*) via M4 lookup. Env vars
  XENIA_LR_TRACE / XENIA_LR_TRACE_OUT also work.
- New `KernelState::lr_trace_pcs` + `lr_trace_writer` + helper
  `fire_lr_trace_if_match(hw_id)` invoked from the per-instr probe slot.
- JSONL output: pc/tid/hw/cycle/r3/r4/r5/r6/lr, a superset of what
  xenia-canary's --log_lr_on_pc patch emits, with a cycle counter for
  cross-run reproducibility. Diff-friendly via `jq`.
- Lockstep digest unaffected: smoke test on entry-point PC fires once with
  cycle=0/lr=BCBCBCBC/all-GPR-zero (correct initial state).
Tests 636→640 (+2 TLS tests, +2 funcptr_arrays tests). Schema golden
updated for new tables + has_eh column. Lockstep determinism preserved
(instructions=2000005 ×2 reruns identical).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
38d8871e8d |
M6: addr_mode column on xrefs + extended store/load classes
Adds finer-grained addressing-mode classification to every data xref row
plus new dispatch for instruction families not previously emitted:
- New `xrefs.addr_mode VARCHAR NULL` column. NULL for control-flow edges
  (call / ind_call / j / br); one of d_form / lis_addi / lis_ori /
  multiword / x_form_indexed / x_form_byterev / atomic / dcbz for data
  edges. Index idx_xrefs_addr_mode.
- New `xenia_analysis::xref::AddrMode` enum + Xref::addr_mode field.
- Opcode 46/47 (lmw/stmw) expand to one xref per slot: D-form multi-word
  load/store now resolves all (32-rS) consecutive addresses.
- Opcode 31 X-form dispatch:
  stwx/stbx/sthx/stwux/stbux/sthux/stdx/stdux,
  lwzx/lbzx/lhzx/lhax/lwzux/lbzux/lhzux/lhaux/ldx/ldux,
  stwcx./stdcx. (atomic), stwbrx/sthbrx/lwbrx/lhbrx (byte-reverse),
  dcbz (cache-line clear).
- X-form rows are emitted ONLY when both rA and rB resolve to known
  constants (rare but present); the dominant runtime-indexed pattern
  remains correctly skipped.
Sylpheed yield (regen on master + merge):
- 442 newly-detected x_form_indexed reads (lwzx/lhzx into static tables).
- 40 newly-detected atomic writes (stwcx./stdcx. with resolvable address).
- 28,834 lis_addi refs, 18,485 d_form reads, 3,288 d_form writes; every
  pre-existing data row now tagged.
- 0 multiword / dcbz / byterev (these instructions exist but aren't on
  lis+addi-tracked code paths).
Tests 633→636 (+3 xref unit tests covering AddrMode tag uniqueness,
data-edge addr_mode round-trip, control-edge None invariant). Schema
golden updated (xrefs gains addr_mode column).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ab4fe211e5 |
M5+M7: indirect-dispatch reachability + .rdata string detection
Two MEDIUM milestones bundled (both opportunistic per plan; both small).
## M5 — indirect-dispatch reachability
- `xenia_analysis::indirect`: per-basic-block register tracker over each
detected function. Recognises the canonical static-vtable pattern
`lis+addi → lwz off(rA) → mtctr → bcctrl` where rA holds a known M3
vtable address. Emits one `Xref { kind: IndirectCall }` per resolvable
bcctrl site.
- PowerPC ABI awareness: `bl`-style calls clobber volatile r0..r12 + ctr
but preserve non-volatile r13..r31, so a vtable pointer parked in r30/r31
before a call survives.
- Label-based basic-block boundaries kill register state — bounds
false-positive risk for jump-IN paths.
- New `XrefKind::IndirectCall` variant (DB tag `'ind_call'`).
- New SQL view `v_indirect_reachability_from_entry` — strict superset of
`v_reachability_from_entry`, taking `ind_call` edges in the BFS.
Sylpheed yield: 0 edges detected. The binary's 1,001 static lis+addi
references into vtables are nearly all constructor-side vptr writes, not
dispatches; real method dispatch goes through `this->vptr` which requires
alias analysis we explicitly don't do. Documented in SCHEMA.md as the
expected limitation. Three unit tests cover the synthetic-correctness path.
## M7 — string / constant-pool detection
- `xenia_analysis::strings`: scans `.rdata` for runs of ≥ 6 printable
ASCII bytes (NUL-terminated) and ≥ 6 UTF-16LE code units (basic-plane
printable ASCII, NUL u16 terminator).
- New `strings(address PK, encoding, length, content)` table + encoding index.
- Implicit cross-ref via existing `xrefs.kind='ref'` rows whose target
matches a strings.address.
Sylpheed yield: 6,311 ASCII strings (including embedded HLSL shader source
and AS_CB_SURFACE_SWIZZLE_* assertion strings). 9,132 lis+addi sites
cross-reference detected strings — names source PCs near each string in
one query. Four unit tests cover encoding detection, NUL termination, and
short-run rejection.
Tests 626→633 (+3 indirect, +4 strings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
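The ASCII half of the M7 scan can be sketched like this. It is an illustration under stated assumptions: `scan_ascii_strings` is a hypothetical name, "printable" is taken to mean bytes 0x20..=0x7E, and the UTF-16LE pass is omitted.

```rust
/// Scan a section image for NUL-terminated runs of at least `min_len`
/// printable ASCII bytes. Returns (guest address, content) pairs.
/// Hypothetical helper; the real `xenia_analysis::strings` API may differ.
pub fn scan_ascii_strings(data: &[u8], base: u32, min_len: usize) -> Vec<(u32, String)> {
    let mut found = Vec::new();
    let mut start = 0usize; // start offset of the current printable run
    for (i, &b) in data.iter().enumerate() {
        if (0x20..=0x7E).contains(&b) {
            continue; // still inside a printable run
        }
        // The run [start, i) ended at byte i; accept it only if the
        // terminator is NUL and the run is long enough.
        if b == 0 && i - start >= min_len {
            let text = String::from_utf8_lossy(&data[start..i]).into_owned();
            found.push((base.wrapping_add(start as u32), text));
        }
        start = i + 1;
    }
    // A run that reaches the end of the buffer has no NUL terminator and is dropped.
    found
}
```

With `min_len = 6`, short runs and unterminated runs are rejected, which matches the short-run rejection and NUL-termination cases the unit tests are said to cover.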
|
||
|
|
4ff08f6116 |
M4: class-aware probe tokens via M3 vtable+method tables
CLI extension only, no schema change. Adds symbolic resolution for --pc-probe / --branch-probe / --ctor-probe tokens:
- `0xADDR` / `2186674160`: numeric (current behavior, no DB load).
- `Class::method`: joins classes × methods × demangled_names.
- `Class::*`: joins classes × methods (all slots).
- `function_name`: falls back to functions.name for free functions / saverestore stubs / labels.
New `xenia_analysis::lookup::resolve_probe_token(db_path, token)` opens the DB read-only ONLY when a token is non-numeric, so legacy numeric flows pay no IO. New `--probe-db PATH` flag (or `XENIA_PROBE_DB` env / default `sylpheed.db` next to the .iso) selects the DB.
Symbolic resolution happens BEFORE any guest exec, so it cannot affect the lockstep digest. Verified deterministic across two reruns at -n 2M (instructions=2000005 identical).
End-to-end smoke test on Sylpheed: `--pc-probe='ANON_Class_6B674251::*'` resolves to all 45 method PCs of that anonymous class (matching the methods-table row count for that vtable).
Tests 621→626 (+5 lookup unit tests covering numeric passthrough, symbolic-without-DB error, Class::method resolution, Class::* expansion, and functions.name fallback).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
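The four token shapes can be told apart before any database is opened. The sketch below shows only that classification step; `ProbeToken` and `classify_probe_token` are hypothetical names, and the actual joins performed by `resolve_probe_token` are not reproduced here.

```rust
/// Hypothetical classification of a probe token, prior to any DB lookup.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeToken {
    Numeric(u32),
    ClassMethod { class: String, method: String },
    ClassAll(String),
    FunctionName(String),
}

pub fn classify_probe_token(token: &str) -> ProbeToken {
    let t = token.trim();
    // Hex form: 0xADDR
    if let Some(hex) = t.strip_prefix("0x").or_else(|| t.strip_prefix("0X")) {
        if let Ok(v) = u32::from_str_radix(hex, 16) {
            return ProbeToken::Numeric(v);
        }
    }
    // Decimal form: 2186674160
    if let Ok(v) = t.parse::<u32>() {
        return ProbeToken::Numeric(v);
    }
    // Class::method or Class::*; split on the last `::` so nested
    // namespaces stay attached to the class part.
    if let Some((class, method)) = t.rsplit_once("::") {
        if method == "*" {
            return ProbeToken::ClassAll(class.to_string());
        }
        return ProbeToken::ClassMethod {
            class: class.to_string(),
            method: method.to_string(),
        };
    }
    // Anything else is treated as a plain function name.
    ProbeToken::FunctionName(t.to_string())
}
```

Keeping the numeric check first is what would let numeric-only invocations skip the database entirely, as the commit describes.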
||
|
|
1d6c51fbf8 |
M3: vtable scan + MSVC RTTI walk + 3 new tables
Adds detection of statically-allocated MSVC vtables in .rdata/.data:
- New `xenia_analysis::vtables` walks read-only sections looking for runs of ≥3 contiguous big-endian u32 values where each value lands on a known function start (from M1's corrected functions table). 2-slot runs are rejected to keep false-positive rate down.
- For each candidate the MSVC RTTI walk vtable[-1] → CompleteObjectLocator → TypeDescriptor → mangled name is attempted; on success the demangled class name is recorded along with a best-effort RTTIClassHierarchyDescriptor walk to fill base_classes_json. On failure (RTTI stripped, common for shipped game binaries) the class is named ANON_Class_<fnv1a-hash> keyed by sorted method-PC list, so identical vtables collapse to one entry.
- DB: new tables `vtables`, `methods`, `classes` with indices on function_address and rtti_present. `write_analysis_results` takes a `&[Vtable]` slice; `write_disasm` (back-compat) passes empty.
- cmd_dis wires the scan after xref analysis using `func_analysis.functions.keys()` as the function-start oracle.
Validation on Sylpheed (RTTI stripped, as expected): 722 vtables / 499 unique classes / 5571 methods. Sanity invariant: every methods.function_address joins to functions.address (0 broken refs). Largest vtable: 131 slots.
Tests 617→621 (+4 vtable unit tests covering 3-slot detect, 2-slot reject, synth name stability, and synth name divergence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
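The run-detection rule (at least three contiguous big-endian words, each a known function start) can be sketched as below. `scan_vtables` is a hypothetical name, and the sketch assumes the section is scanned at 4-byte alignment; the RTTI walk and the synthetic class naming are left out.

```rust
use std::collections::BTreeSet;

/// Find runs of `min_slots` or more consecutive big-endian u32 values that
/// all land on known function starts. Returns (vtable address, slots).
/// Hypothetical sketch; not the crate's actual signature.
pub fn scan_vtables(
    section: &[u8],
    base: u32,
    func_starts: &BTreeSet<u32>,
    min_slots: usize,
) -> Vec<(u32, Vec<u32>)> {
    let mut out = Vec::new();
    let mut run: Vec<u32> = Vec::new();
    let mut run_start = base;
    for (i, word) in section.chunks_exact(4).enumerate() {
        let value = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        if func_starts.contains(&value) {
            if run.is_empty() {
                run_start = base.wrapping_add((i * 4) as u32);
            }
            run.push(value);
        } else if !run.is_empty() {
            // The run ended: keep it only if it is long enough.
            if run.len() >= min_slots {
                out.push((run_start, std::mem::take(&mut run)));
            } else {
                run.clear();
            }
        }
    }
    // Flush a run that extends to the end of the section.
    if run.len() >= min_slots {
        out.push((run_start, run));
    }
    out
}
```

Calling it with `min_slots = 3` reproduces the 2-slot rejection described above.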
||
|
|
89f5f7e4a9 |
M2: MSVC C++ demangler + demangled_names DB table
Adds an MSVC name-demangling layer in front of M3's vtable / RTTI work:
- New `xenia_analysis::demangle` wraps the `msvc-demangler` crate (a Rust port of LLVM's `MicrosoftDemangle.cpp`). `demangle()` short-circuits on non-mangled inputs (`?` prefix check); `demangle_or_raw()` always returns a record (raw passthrough on parse failure).
- Heuristic split of the formatted demangled string into structured fields `(namespace_path, class_name, method_name, params_signature)`. Top-level paren / template-bracket aware, so `a::b<c::d>::e` and signatures with templated arg types parse correctly.
- DB: new `demangled_names(address, mangled, raw_demangled, namespace_path, class_name, method_name, params_signature)` with indices on address / class_name / method_name. Populated from any label whose name starts with `?` plus any import name that happens to be mangled.
For Sylpheed (a fully stripped binary) this table is empty out-of-the-box; the layer's value lands in M3, which will append rows for every RTTI TypeDescriptor name found in `.rdata`.
Tests 610→617 (+7 demangler unit tests covering early-out, raw fallback, member function form, RTTI form, qname split, paren-template safety, and top-level `::` splitting).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
70120465a3 |
M1: parse .pdata RUNTIME_FUNCTION; cross-validate function boundaries
Adds an authoritative function-boundary source from the linker:
- New `xenia_xex::pdata` parses .pdata 8-byte entries (BeginAddress + packed prolog/length/flags). Bit layout per Microsoft PE32 PowerPC spec: prolog in bits 0..7, function_length in bits 8..29, flags in 30..31.
- `func::analyze_with_pdata` unions pdata BeginAddresses into the candidate set, attaches `pdata_validated`/`pdata_length` to each `FuncInfo`, and trims any function whose `end` overlaps the next start (catches mis-merge where one row spanned two prologues: the audit-031 sub_824D23B0/sub_824D29F0 case).
- DB: extends `functions` with `pdata_validated BOOLEAN`, `pdata_length BIGINT`; new table `pdata_entries`; index on pdata_validated.
- New `crates/xenia-analysis/SCHEMA.md` documents M1 layer + forward work.
Validation on Sylpheed: 25481 functions (was 12156) / 23073 pdata_validated / 0 orphans / 0 mis-merges. Audit-031 mis-merge resolved: sub_824D29F0 now has its own row with `pdata_length=280` (70 dwords); sub_824D23B0 now correctly ends at 0x824D2878 (`pdata_length=1224` matches prologue walk).
Tests 605→610. New 5-test pdata unit suite covers bit layout + sentinel + out-of-range filtering + real-world layout round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
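Decoding one 8-byte entry with the bit layout quoted in this commit might look like the sketch below. It assumes both words are stored big-endian and that "bits 0..7" counts from the least-significant bit; `PdataEntry` and `parse_pdata_entry` are hypothetical names, and the unit of `function_length` (instruction words here, given the "280 bytes / 70 dwords" note) is likewise an assumption.

```rust
/// One decoded .pdata RUNTIME_FUNCTION entry (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,   // bits 0..7 of the packed word
    pub function_length: u32, // bits 8..29 of the packed word
    pub flags: u32,           // bits 30..31 of the packed word
}

/// Decode a single 8-byte entry: BeginAddress followed by the packed word.
pub fn parse_pdata_entry(raw: &[u8; 8]) -> PdataEntry {
    let begin_address = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let packed = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address,
        prolog_length: packed & 0xFF,
        function_length: (packed >> 8) & 0x003F_FFFF, // 22-bit field
        flags: packed >> 30,
    }
}
```

If the length field counts instruction words, multiplying by 4 gives the byte length, so a field value of 70 would correspond to the 280 bytes reported for sub_824D29F0.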
||
|
|
690943ceef |
gate dump-section reads on is_mapped; trim doc comments
Without the page-state guard, read_bulk faulted on PROT_NONE pages of the 4 GiB host reservation. Per-page is_mapped check skips uncommitted pages, leaving the buffer's leading zero bytes in place. Total LOC budget after trim: 70. |
||
|
|
412ba858b4 |
move dump-section flush above quiet gate so it fires under --quiet runs
The headless cmd_exec path passes quiet=false in normal use but the diagnostic --dump-section is independent of the chatty thread/dump prints, so it should not be gated by --quiet. Lockstep digest preserved. |
||
|
|
08d41cf2fc |
add --dump-section=BASE:LEN:PATH for end-of-run guest memory snapshot
Drives byte-level memory diffs against canary's Memory::Save dump. Hot-path zero-cost when absent; lockstep digest unaffected (instructions=100000003 deterministic across reruns). |
||
|
|
c03f2bc9e2 |
fix(kernel): ensure_dispatcher_object writes XObj signature + handle (canary mirror)
Mirrors canary's `XObject::StashHandle` (xobject.h:253-256): on first adoption of a guest dispatcher header, stamp +0x08 with the kXObjSignature fourcc 'X','E','N','\0' and +0x0C with the stash handle (here the guest pointer itself, since our shadow table is keyed by ptr). Audit-023/024A documented divergence at addresses such as 0x828F4838 where canary stores "XEN\0" + handle but we left zeros. Lands as canary-correctness restoration; cascade impact at -n 500M is nil per the discipline gate (no sharp prediction tied to the writeback). Lockstep determinism preserved: instructions=100000003, imports=987516, swaps=2, draws=0 across 2 reruns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
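The writeback itself is two big-endian stores into the dispatcher header. A minimal sketch over a plain byte slice, assuming the header is at least 16 bytes and that `stash_handle` is a hypothetical stand-in for the real guest-memory write:

```rust
/// 'X','E','N','\0' signature written at header offset +0x08.
pub const XOBJ_SIGNATURE: [u8; 4] = *b"XEN\0";

/// Stamp the signature at +0x08 and the stash handle at +0x0C (big-endian).
/// Returns false if the slice is too short to hold both fields.
/// Hypothetical helper operating on a byte slice instead of guest memory.
pub fn stash_handle(header: &mut [u8], handle: u32) -> bool {
    if header.len() < 0x10 {
        return false;
    }
    header[0x08..0x0C].copy_from_slice(&XOBJ_SIGNATURE);
    header[0x0C..0x10].copy_from_slice(&handle.to_be_bytes());
    true
}
```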
||
|
|
978a6950d1 |
feat(memory): --mem-watch=ADDR per-store writer trace
Adds an opt-in diagnostic that emits one tracing line per guest store overlapping any armed byte address, naming the writer (tid, pc, lr) plus old/new u32 lanes. Mirrors the --pc-probe / --branch-probe shape; pc/lr are stamped from worker_prologue via a thread-local Cell, so default runs (empty watch set) take a single is_empty() check on each write. Lockstep digest preserved (instructions=100000003 across reruns, sylpheed_n50m.json golden byte-identical). Diagnostic infra only; no functional change. Used to identify producers of dispatch-state writes for the audit-017 / audit-019 hunt. |
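The overlap test behind such a watch can be sketched as a range query over the armed addresses. This is an assumption about the data structure, not the commit's code: `store_hits_watch` is a hypothetical name and a `BTreeSet<u32>` stands in for whatever set type the emulator uses.

```rust
use std::collections::BTreeSet;

/// True if a store of `len` bytes at `addr` overlaps any armed byte address.
/// The empty-set check comes first so default runs pay only that test.
pub fn store_hits_watch(watch: &BTreeSet<u32>, addr: u32, len: u32) -> bool {
    if watch.is_empty() || len == 0 {
        return false;
    }
    // Last byte touched by the store, clamped at the top of the address space.
    let end = addr.saturating_add(len - 1);
    watch.range(addr..=end).next().is_some()
}
```

A 4-byte store at 0x828F4070 would therefore hit a watch armed on 0x828F4072, while a store at 0x828F4074 would not.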
||
|
|
76dfe7fd7a |
fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror
Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise, matching canary's Resume()/!thread return semantics.
Cascade-prediction scorecard (audit-018 -> post-fix):
- A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230.
- B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0.
- C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only.
- D FAIL: gamma-cluster blocker unchanged; pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0.
Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster.
Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
5d2401f9c5 |
fix(xam): XamUserGetSigninState returns SignedInLocally=1 for user 0
Mirrors canary xam_user.cc:90-101. User 0 returns 1 (SignedInLocally), all other indices return 0. Replaces stub_return_zero registration that was reaching guest-side branches looking up signin state. Tests: 599 -> 600. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b78e6fd205 |
fix(kernel): KRNBUG-IO-004 — real XamNotifyCreateListener + XNotifyGetNext per canary
Canary's RegisterNotifyListener (kernel_state.cc:1013-1033) auto-enqueues four
startup notifications on the first listener whose mask covers kXNotifySystem
(SystemUI=0x09 + SystemSignInChanged=0x0A) and kXNotifyLive
(LiveConnectionChanged=0x02000001 + LiveLinkStateChanged=0x02000003). XNotifyGetNext
(xam_notify.cc:22-96) pops the queue; mask + version filtering is applied at enqueue time per
xnotifylistener.cc:38-51. Our prior stubs returned 0 forever; the dispatch loop
at 0x822f1be8 in sub_822F1AA8 was thus bypassed indefinitely.
Implementation:
- KernelObject::NotifyListener { mask, max_version, queue, waiters } variant.
- KernelState::has_notified_startup + has_notified_live_startup gates.
- xam_notify_create_listener: mask=r3 (qword), max_version=r4 (clamped <=10),
alloc handle, conditional 4-tuple startup enqueue.
- xnotify_get_next: handle/match_id/id_ptr/param_ptr in r3..r6; pop_front
(or scan-by-id), with mask + version filter applied at enqueue time.
- 5 unit tests covering: full-mask 4 startup notifications, second-listener
no re-fire, system-only mask filtering, max_version=0 too-new drop,
unknown handle returning 0.
Tests: 594 -> 599. Lockstep `-n 100M` instructions=100000012 deterministic
across 2 reruns; bit-identical run-to-run diff.
Cascade (verified at -n 500M):
- dispatch arm 0x822f1be8 fires; sub_82173DC8 entered.
- 3/21 renderer-cluster L1 PCs newly reached: 0x822c6870 (2 workers),
0x824563e0, 0x823ddb50.
- canary-only export delta 7 -> 3 (reclassified to fired:
KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule).
- worker thread count 18 -> 20.
- signal_attempts on handle 0x15e0 = 1 (primary=1), was 0.
- draws=0 still expected at this step.
LOC: 119 (97 impl + 22 scaffolding pattern matches across main.rs / objects.rs
/ state.rs) <= 120.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a1a7265f29 |
fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl
Replace the stub_success registration of NtDeviceIoControlFile at
exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive
geometry) and 0x74004 (partition info), mirroring xenia-canary
xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004
response with cache_size=0xFF000 at OUT+8 is the gate that lets
sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11
XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007.
Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the
caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox
360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area
sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export
in the codebase to need 9+ args.
Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held):
- XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓
- XamTaskSchedule count 0 → 1 ✓
- canary-only exports 7 → 3 (audit predicted ≤3) ✓
- 0x15e0 semaphore signal_attempts 0 → 1 (bonus)
- 0x100c worker spawn DID NOT fire (still UNCREATED) ✗
- 0x1004 signal_attempts unchanged ✗
- Worker spawn count unchanged at 19 ✗
Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡
run3, byte-identical). instructions=100000010 → 100000019, imports
407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists.
sylpheed_n50m golden re-baselined instructions=50000004→50000003,
imports=407362→407255. sylpheed_n2m unchanged.
Still canary-only after this fix: ExTerminateThread,
KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream
gate is somewhere past XamTaskSchedule's completion path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
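The stack-argument offsets quoted in this commit follow a simple rule that can be sketched as below. `stack_arg_offset` is a hypothetical helper; the 0x54 base and the 8-byte stride are taken from the message, and the function says nothing about how the handler then reads guest memory.

```rust
/// Offset from the caller's sp of the N-th argument (1-based) when it is
/// passed on the stack. Arguments 1..=8 travel in r3..r10, so they have no
/// stack offset and yield None.
pub fn stack_arg_offset(arg_index: u32) -> Option<u32> {
    if arg_index < 9 {
        None
    } else {
        // First stack argument sits at sp+0x54; later ones follow every 8 bytes.
        Some(0x54 + (arg_index - 9) * 8)
    }
}
```

For NtDeviceIoControlFile this gives sp+0x54 for argument 9 (OutputBuffer) and sp+0x5C for argument 10 (OutputBufferLength), matching the offsets stated above.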
|
||
|
|
c51f51f9cb |
feat(kernel): KRNBUG-AUDIT-007 — --branch-probe instrumentation; sub_824A9710 exit gate identified
Sister to --pc-probe / --ctor-probe but emits a single compact one-line BRANCH-PROBE record per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6 flags) with no back-chain. Designed for tracing every conditional-branch fire inside a candidate-gate function so the last PC reached before the function epilogue identifies the exit branch.
Runtime trace at audit-runs/audit-007/sub_824A9710-trace.log decisively identifies the priv-11 gate:
- Exit branch: 0x824a9944 (post bl sub_824ABD88 first call)
- Responsible kernel call: NtDeviceIoControlFile, FsCtlCode=0x74004 (registered as stub_success at exports.rs:90)
- Mechanical chain: stub returns 0/SUCCESS without writing OUT, game reads [out_buf+8], finds zero, assigns hardcoded 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND) at sub_824ABD88:0x824abea8-ac, exits via 0x824a9944's lt branch before priv-11 site at 0x824a99a0.
592→592 tests; lockstep instructions=100000010, swaps=2, draws=0 deterministic across reruns. Read-only diagnostic; no fix this session.
Next session: KRNBUG-IO-003 (real NtDeviceIoControlFile per canary NullDevice::IoControl for FsCtlCodes 0x70000 + 0x74004).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7675035082 |
fix(kernel): KRNBUG-IO-002 — vol-info class-3 returns 0x10000 alloc unit (canary NullDevice)
`nt_query_volume_information_file` class-3 (`FileFsSizeInformation`) was returning sectors_per_unit=1, bytes_per_sector=2048 (alloc unit 2048). Replaced with canary's NullDevice byte-identical values sectors=0x80, bps=0x200 (alloc unit 0x10000), with total / available allocation units lowered to 0x10 / 0x10 to match.
Reference: xenia-canary/src/xenia/vfs/devices/null_device.h:38-46 (`NullDevice::sectors_per_allocation_unit()` and `bytes_per_sector()`); consumed by canary's `NtQueryVolumeInformationFile_entry` at xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:355-365.
Tests 591 → 592 (added `nt_query_volume_information_file_class3_returns_64k_alloc_unit`). Lockstep `instructions=100000010, swaps=2, draws=0` deterministic across two `--stable-digest -n 100M` reruns. sylpheed_n50m oracle still matches its existing golden, so the change is observably a no-op at -n 50M.
The audit-006-predicted 7→0 cascade did NOT fire (canary-only exports still 7, identical set; XexCheckExecutablePrivilege still priv=0xA only; XamTaskSchedule still 0). All 16 NtQueryVolumeInformationFile calls in our 500M trace originate from a single LR 0x82611f38 and complete successfully, so vol-info is not the priv-11 gate. The fix value is correct (canary-byte-identical) but is not load-bearing for the gate; landing it anyway because it's the right value and introduces no regression.
Stop condition triggered per the IO-002 task brief: no second fix this session. Next-session: --pc-probe on sub_824A9710 entry to find the actual upstream gate. See `audit-findings.md` (KRNBUG-IO-002 entry) and `audit-runs/post-IO-002/` for the full diagnostic trail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bef9793aec |
feat(kernel): KRNBUG-IO-001 — NtReadFile on synth-empty file returns SUCCESS+0, not EOF
AUDIT-005's static attribution to sub_824ABA98 was wrong. The 0xC0000011
(STATUS_END_OF_FILE) at lr=0x824a97e4 traces to the NtReadFile call at
0x824a9810 inside sub_824A9710 — the cache-loader reads 1024 B from
offset 2048 of `\Device\Harddisk0\partition0`. Our synth-empty fallback
returned EOF (start_pos 2048 > size 0), so the function bailed via
RtlNtStatusToDosError before sub_824ABA98 was ever called.
Canary mounts partition0 to a NullDevice; `NullFile::ReadSync`
([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc))
returns X_STATUS_SUCCESS with bytes_read=0 and never touches the
buffer. Sylpheed's caller pre-zeroes the 1024-byte stack buffer
(`memset(sp+208, 0, 1024)` at sub_824A9710 prologue), validates a
"Josh" magic on the first read, and falls back to the cache-recreate
path when the magic doesn't match.
The fix mirrors NullFile semantics: when the open synthesized a
zero-length file (`data.is_empty() && size == 0`), NtReadFile returns
SUCCESS with information=0 and the buffer untouched.
Effects (chain-of-effects verification at -n 500M):
- tests: 590 → 591 (added regression covering NullDevice semantics)
- lockstep: deterministic across 3 reruns (same instructions=100000010,
swaps=2)
- sylpheed_n50m golden re-baselined: instructions 50000004→50000000,
imports 407416→407362
- canary kernel-call diff: 10 → 7 missing exports
(XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile
now run; the cache-recreate path executes through to NtWriteFile)
- boot reaches silph::Silph::Impl::OnInit: 19 worker threads spawn
(was 6 before the fix)
- parked-handle 0x1004 still signal_attempts=0; the original 0x100c
and 0x15e0 are now <UNCREATED> because cascade walked past them and
the handle assignments shifted; new parked sites: 0x12fc/0x1600/
0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044
- draws=0 plateau persists; renderer is multi-causal blocked
Next blocker: per the canary-only diff, XamTaskSchedule + the cluster
of XAM exports (XamTaskCloseHandle, XamUserReadProfileSettings,
ObCreateSymbolicLink) and the post-thread-exit chain (ExTerminateThread,
KeReleaseSemaphore, KeResetEvent) are the next-up frontier.
|
||
|
|
19659d7f76 |
feat(kernel): KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI), not 0x16
Mirrors canary's cvars::avpack default (xam_info.cc:35) and Sylpheed's
accepted set {3,4,6,8} (xam_info.cc:250-251). With KRNBUG-XEX-001 having
flipped the priv-10 gate, XGetAVPack now reaches its caller in
sub_824AB578; returning 0x16 caused Sylpheed to abort the AV/crypto
block before XeCryptSha. Cascade walks one step (canary-only export
list 11 → 10); sub_824ABA98 is the next candidate.
Tests: 589 → 590. Goldens re-baselined (n50m: 50000005→50000004,
imports 407417→407416). Lockstep deterministic across 3 reruns at
-n 100M (instructions=100000010, import_calls=987686 +2.4×, swaps=2).
9-PC producer probe still 0×; parked handles 0x1004/0x100c/0x15e0
still signal_attempts=0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1a892d4641 |
feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap
Replace stub_return_zero with a canary-faithful implementation that returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS (key 0x00030000) bitmap. Mirrors xenia-canary xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0.
Plumbing:
- xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor.
- xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged HashSet for one-shot per-priv tracing.
- xenia-app: kernel.xex_system_flags wired in cmd_exec_inner.
- xenia-kernel/exports.rs: real export body + unit test covering bits 10/11/0/64 + zero-flags case.
Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE, bit 10). At -n 500M with the fix:
- XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped).
- 10 other canary-only exports + 9 producer PCs + 3 parked handles unchanged. Priv-11 site at sub_824A9710 is downstream and still not reached: the AV/crypto block aborts after XGetAVPack returns our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts only 3/4/6/8 per xenia-canary xam_info.cc:250-251).
Tests 588 -> 589. Lockstep deterministic (3 reruns identical): n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 / draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle test green. Full chain-of-effects + next-frontier hand-off in audit-findings.md under KRNBUG-XEX-001.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
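The bit test quoted in this commit is small enough to show in full. The function name below is hypothetical and the sketch ignores the one-shot tracing; only the `(flags >> priv) & 1` rule and the priv < 32 guard come from the message.

```rust
/// Return bit `privilege` of the XEX system-flags bitmap, or 0 when the
/// index is outside the 32-bit word. Hypothetical free function; the real
/// export reads its argument from guest registers.
pub fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege < 32 {
        (system_flags >> privilege) & 1
    } else {
        0
    }
}
```

With Sylpheed's bitmap 0x00000400 this returns 1 for privilege 10 and 0 for privilege 11, which is the pair of answers the two call sites described in this log depend on.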
||
|
|
3e2fc1ec88 |
feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade
Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with
the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper
additionally logs `[disp+off]` — what the producer's
`lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both
flags share parser + storage.
Read-only diagnostic. Lockstep digest preserved (`run digest matches
golden` at -n 50M `--stable-digest`). 588 tests green.
Decisive findings (full deliverable in `audit-findings.md` /
`audit-runs/audit-005/`):
- Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call
sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0× at
-n 500M. The producer code path is not reached.
- Set-diff of kernel-call sequences (canary.log oracle vs ours.log
at -n 500M) identifies 11 exports canary calls and we don't:
XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign,
ObCreateSymbolicLink, NtDeviceIoControlFile (×2),
XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle,
KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2).
- XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0).
The 4 instructions immediately preceding it are:
addi r3, r0, 10 ; privilege bit 10
bl XexCheckExecutablePrivilege
cmpli 0, r3, 0
bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
- exports.rs:193 registers XexCheckExecutablePrivilege as
stub_return_zero. Always returning 0 -> guest takes the branch
and skips the entire AV/crypto/save-data init block.
- The other call site (sub_824A9710 at 0x824A99A0) queries privilege
11 with opposite polarity (bne) -> gates XamTaskSchedule on the
privilege-NOT-set arm. With the stub returning 0 at both sites, the guest
walks the wrong arm of every privilege-gated branch.
- This explains why the dispatcher fields read zero
([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps):
the ctors run, but the producers that would populate those fields
with a non-zero handle never execute.
Next session: replace XexCheckExecutablePrivilege stub with real
priv-bit lookup from XEX header. See audit-findings.md
KRNBUG-AUDIT-005 for the validation matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7108d6d131 |
feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump
Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.
Adds two read-only diagnostics for the parked-waiter producer hunt:
* `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
line capturing live r3 (= `this` in MSVC PPC ctors), lr
(= return site), sp, plus an 8-frame back-chain with
saved-r31/r30 per frame. Fires once per hit, exactly what the
8-instance-pool probe needed.
* `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
run (after the FOCUS report in `dump_thread_diagnostic`), each
address gets a 128-byte hex + be32 + ASCII dump. Used to
inspect the static dispatcher / job-queue struct layouts
AUDIT-003 identified.
Both gated default-off; empty set is a single `is_empty()` test on
the hot path. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
three handles are Meyers-style singletons with one dispatcher
each. The "called 8 times" claim came from miscounting raw
entries to the OUTER getter sub_8217C850 — but that getter is
itself a Meyers-singleton-getter; only the FIRST entry cascades
through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).
2. **The producer indirection layer is the singleton-getter
itself.** Static byte-scan of .rdata / .data shows 0 hits for
the dispatcher addresses — no static registry table holds them.
But the xrefs table for the OUTER getters reveals 5–6 callers
each, MOSTLY non-create-chain, sharing the canonical producer
pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
AUDIT-003 xref audit was necessary but not sufficient — it
correctly saw "no direct producer references" but missed the
singleton-getter indirection layer.
3. **Dispatcher struct layouts** (128-byte dumps captured at -n
50M --halt-on-deadlock):
- 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
thread_handle at +0x48 (0x1010), self-pointer at +0x74,
capacity 7 at +0x28, queue empty (+0x00/+0x3C = -1).
- 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
- 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in
0x4xxxxxxx range — noticeably different layout from the
other two pure POD job queues.
Files:
crates/xenia-kernel/src/state.rs ctor_probe_pcs / dump_addrs +
fire_ctor_probe_if_match + 2 tests
crates/xenia-app/src/main.rs Exec --ctor-probe / --dump-addr
CLI parsing, prologue hook,
end-of-run struct dumper
audit-findings.md KRNBUG-AUDIT-004 entry
audit-runs/audit-004/ 50M probe runs (v1 outer-getter
hits, v2 inner-ctor hits proving
the singleton hypothesis)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f84e947547 |
feat(kernel): KRNBUG-AUDIT-003 — vtable/RTTI class probe at handle creation + wait
Adds a read-only MSVC RTTI traversal helper (`read_class_at_this`)
and a `probe_create_stack_classes` integration that walks each
captured back-chain frame for handle creates in `--trace-handles-focus`
and probes each frame's most-likely `this` candidate (live r31/r30/r3
for frame 0; saved-r31/r30 from the prologue spill area at [fp-12]/
[fp-16] for deeper frames). False-positive guard rejects the CRT
static-init iterator pattern (vtable's first two slots must be image-
range function pointers — PPC instruction words like `mflr r12` are
not in 0x82xxxxxx).
`dump_thread_diagnostic` now takes `&GuestMemory` so the FOCUS report
prints, for each parked waiter, a WAIT-THREAD block with full back-
chain frames and per-slot saved-register dump for offline lookup.
End-to-end finding (-n 500M producer-trace):
* Handle 0x100c dispatcher = 0x828F3D08 (image rdata; verified by
sub_82181750 disasm + xref table). [this+0] = -1 sentinel — POD
job queue, NOT a C++ polymorphic class.
* Handle 0x15e0 dispatcher = 0x828F4070 (same shape).
* Handle 0x1004's 8-instance pool members still TBD (MSVC ctors
didn't preserve `this` in r31).
* 0x42450b5c is a separate audit class (heap-allocated, parks via
non-`do_wait_single` path).
Decisive xref audit: every reference to 0x828F3D08 / 0x828F4070 in
the static analysis is in a ctor or the CRT init driver. NO producer
code references either dispatcher base. Confirms `signal_attempts=0`
is unreachable-producer, not broken-producer.
Tests: 581 → 586 green (+5: RTTI-intact / RTTI-stripped / non-object
/ cstring / probe_create_stack integration). `--stable-digest -n
100M` instructions=100000002 unchanged. Master HEAD prior:
|
||
|
|
2a9fd1fc86 |
feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation
Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a
`record_create_with_stack` audit hook gated on `--trace-handles-focus`.
NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now
route through the new helper so focused handles capture up to 6 stack
frames at allocation time. Diagnostic-only, read-only memory access:
unfocused handles pay one HashSet lookup, focused ones pay six
back-chain dereferences. Lockstep determinism preserved.
End-to-end finding: handles 0x1004 (8-instance pool via static ctor at
0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton
in distinct cluster) are silph-framework dispatcher objects whose
producer code is unreached at -n 500M. The producer hunt now has class
ownership; vtable/RTTI readout is the next step.
Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002
unchanged. Master HEAD prior:
|
||
|
|
07068e7616 |
feat(audio): APUBUG-PRODUCER-001 — XAudio register driver client + opt-in callback ticker
Replace the three XAudio kernel-export stubs
(Register/Unregister/SubmitFrame) with canary-faithful implementations
and add a periodic buffer-complete callback ticker reusing the existing
SavedCallbackCtx injection machinery.
Canary parity:
- xboxkrnl_audio.cc:56-93: read callback_ptr[0..1], wrap callback_arg
  in a 4-byte big-endian guest heap buffer (`wrapped_callback_arg`),
  write `0x4155_xxxx` to *driver_ptr.
- audio_system.cc:139-141: guest callback receives r3 = wrapped
  pointer, not raw callback_arg.
- audio_driver.h:21-24: frame rate 256 samples / 48 kHz ≈ 5.33 ms.
Implementation:
- New `crates/xenia-kernel/src/xaudio.rs`: `XAudioClient`,
  `XAudioState` (8-slot table, pending FIFO, dual-mode ticker),
  `XAUDIO_INSTR_PERIOD = 48_000` (lockstep) and
  `XAUDIO_PERIOD = 5.333 ms` (--parallel), same pattern as KRNBUG-D08
  v-sync.
- `try_inject_audio_callback` in xenia-app mirrors
  `try_inject_graphics_interrupt`, shares `interrupts.saved` slot for
  mutex with graphics callbacks.
Gating: ticker + injector run only when `--xaudio-tick` /
`XENIA_XAUDIO_TICK=1`. Default off because Sylpheed's audio callback
enters an infinite `KeWaitForSingleObject` loop on first invocation
(canary's host worker thread provides the buffer-completion fence we
don't model), which hijacks a guest HW thread and regresses
`swaps=2 → 1`. Default-off preserves the lockstep `sylpheed_n*m.json`
goldens exactly.
Producer hunt outcome (FALSIFIED for parked handles
0x1004/0x100c/0x15e4): at `-n 500M --xaudio-tick` all 3 handles still
show `signal_attempts=0 (primary=0, ghost=0)`. Audio callback is not
the missing producer. Next candidate per audit-findings.md is Timer DPC
delivery (KeSetTimer / KeInsertQueueDpc).
Tests: 562 → 576 green (10 in `xaudio.rs`, 4 in `exports.rs`). Lockstep
`--stable-digest -n 100M` default-off: instructions=100000002, swaps=2
(matches pre-change baseline byte-for-byte).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
691404e36e |
fix(xam): XAMBUG-PRODUCER-001 — XamTaskSchedule spawns a real guest thread
Replaces the no-op stub at xam.rs:204 with a canary-faithful
implementation mirroring
xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80. Allocates a
ThreadImage, allocates a KernelObject::Thread handle, and routes
through Scheduler::spawn with entry=callback and
start_context=message_ptr (canary's third positional XThread ctor arg).
Stack size = max(0x4000, page-aligned 0x10_0000).
Producer-hypothesis outcome (500M --trace-handles-focus run): the call
site at 0x824a9a10 is never reached during this boot horizon, so
XamTaskSchedule cannot be the missing producer for the 3 parked
Event/Manual handles (0x1004, 0x100c, 0x15e4). The fix still lands: the
stub was a real correctness bug that would manifest the moment the boot
advances past the current deadlock. Next candidate per
audit-findings.md: XAudioRegisterRenderDriverClient.
- Workspace tests: 561 → 562 green (new test
  xam::tests::xam_task_schedule_spawns_real_thread).
- --stable-digest -n 100M: instructions=100000002 unchanged from
  baseline; lockstep determinism preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
27d3608174 |
fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel
The synthetic v-sync ticker used a per-instruction proxy
(VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep throughput →
60 Hz. Audit M11 observed this drifts under `--parallel`: with 6 worker
threads sharing the kernel mutex, the dispatcher executes more PPC
instructions per tick callback, so the accumulator never crosses 150 k.
Result: ~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel.
Hybrid solution preserves lockstep determinism (which the goldens
depend on) while fixing --parallel:
* `tick_vsync_instr(instr_count)`: legacy instruction-count ticker,
  used by lockstep. Bit-stable across runs.
* `tick_vsync_wallclock()`: new Instant-based ticker. Fires
  `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor and advances
  the anchor by that many full periods (no lazy backlog). Capped at
  INTERRUPT_QUEUE_CAP per call so a forward-jumping clock can't
  overflow the FIFO.
* `KernelState.parallel_active` flag set at startup from `--parallel` /
  `XENIA_PARALLEL=1`. Read by `coord_pre_round` in main.rs to choose
  between the two tickers.
Verification:
* cargo test --workspace --release: 561 passing (+3 new wall-clock
  tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to pre-Phase-3
  baseline. interrupts_delivered preserved at ~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered rose
  from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps burst delivery;
  that's a separate bottleneck, addressed by raising cap when
  --parallel queue depth becomes the next blocker.)
Trade-off: --parallel runs are non-deterministic at the v-sync rate by
design (per audit M05 PPCBUG-703 already). Lockstep stays
bit-identical, so the `sylpheed_n*m.json` goldens are untouched.
Audit IDs: KRNBUG-D08 (closed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
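A rough sketch of the wall-clock ticker idea from KRNBUG-D08, not the actual `tick_vsync_wallclock`. The struct name, the explicit `now` parameter (added so the behaviour can be checked deterministically), and the exact period constant are assumptions. The sketch also assumes the anchor advances by every elapsed period while at most `QUEUE_CAP` v-syncs are reported, so the excess is dropped rather than queued.

```rust
use std::time::{Duration, Instant};

/// Assumed ~60 Hz period; the real VSYNC_PERIOD value is not shown in the log.
const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667);
/// Stands in for INTERRUPT_QUEUE_CAP (4 per the commit message).
const QUEUE_CAP: u32 = 4;

struct WallclockTicker {
    anchor: Instant,
}

impl WallclockTicker {
    /// Returns how many v-syncs to deliver for the time elapsed up to `now`.
    fn tick(&mut self, now: Instant) -> u32 {
        let elapsed = now.saturating_duration_since(self.anchor);
        // floor(elapsed / VSYNC_PERIOD), clamped so the cast cannot wrap.
        let due = (elapsed.as_nanos() / VSYNC_PERIOD.as_nanos()).min(u32::MAX as u128) as u32;
        // Advance by whole periods only, so the fractional remainder carries over.
        self.anchor += VSYNC_PERIOD * due;
        // Report at most QUEUE_CAP; anything beyond that is dropped, not backlogged.
        due.min(QUEUE_CAP)
    }
}
```

Advancing the anchor by whole periods instead of resetting it to `now` keeps the long-run rate at the nominal frequency even when calls arrive at irregular intervals.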
||
|
|
d1105aafae |
diag(audit): KRNBUG-AUDIT-001 — focused parked-waiter ghost-trail diagnostic
Adds a one-run diagnostic that distinguishes "guest never called
Nt/KeSetEvent on this handle" from "signal landed but waiter wasn't
woken", for any handle named via `--trace-handles-focus`.
Parked-waiter context (project_xenia_rs_sylpheed_stage3_2026_04_29):
four worker threads block Sylpheed past `draws=0` on handles
0x1004 / 0x100c / 0x15e4 / 0x42450b5c (mr=true, sig=false). The
pre-existing audit dropped signal-attempts that targeted handles
without a primary trail, so we couldn't tell whether the producer
was unreachable in the guest or whether the signal landed but missed
its waiter.
Three changes:
* audit.rs: `HandleAudit` gains `focus: HashSet<u32>` and
`ghost_trails: HashMap<u32, GhostTrail>`. `record_signal`
auto-falls-through to a new `record_signal_attempt_ghost` when no
primary trail exists AND the handle is in `focus`. Bounded by
AUDIT_RING_CAPACITY per handle. Two new tests cover the focus
ghost-trail and no-double-record invariants.
* main.rs: new `--trace-handles-focus=<LIST>` flag (hex 0x or decimal,
comma-separated) populates `kernel.audit.focus`. Implies
`--trace-handles`. New "=== Handle audit (focus) ===" section in
`dump_thread_diagnostic` emits per-handle:
- signal_attempts (primary + ghost), waits, wakes
- merged cycle-sorted timeline (last 16)
- GuestExport / KernelInternal classification
- <AUDIT_BLIND> marker when waiter_count > 0 but the audit
saw no waits (i.e. waiter parked via a non-audit path —
CS / spinlock / DPC).
- DIAGNOSIS conclusion that selects between five branches.
* `cmd_check` passes None for focus → goldens unaffected.
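The fall-through rule in the first change can be sketched as follows. This is a simplified model, not the real `HandleAudit`: trails are reduced to bounded rings of cycle numbers, and the type name, field names and `RING_CAP` value are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stands in for AUDIT_RING_CAPACITY; the real value is not shown in the log.
const RING_CAP: usize = 4;

#[derive(Default)]
struct HandleAuditSketch {
    focus: HashSet<u32>,
    primary: HashMap<u32, VecDeque<u64>>,
    ghost: HashMap<u32, VecDeque<u64>>,
}

impl HandleAuditSketch {
    /// Records a signal attempt at `cycle`. Returns false when the attempt is
    /// dropped (no primary trail and the handle is not focused).
    fn record_signal(&mut self, handle: u32, cycle: u64) -> bool {
        let ring = match self.primary.get_mut(&handle) {
            // A primary trail exists: record there and never in the ghost trail.
            Some(r) => r,
            // No primary trail, but the handle is focused: use a ghost trail.
            None if self.focus.contains(&handle) => self.ghost.entry(handle).or_default(),
            None => return false,
        };
        if ring.len() == RING_CAP {
            ring.pop_front(); // bounded ring: evict the oldest entry
        }
        ring.push_back(cycle);
        true
    }
}
```

Choosing exactly one destination per attempt is what gives the no-double-record property; unfocused handles without a primary trail cost only the lookups.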
Empirical run output at -n 500M lockstep with
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:
handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
signal_attempts=0 (primary=0, ghost=0)
waits=1 wakes=0
created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
=> producer is a missing kernel signal source
(or BST-paradox upstream)
... (same shape for 0x100c, 0x15e4)
handle=0x42450b5c kind=<UNCREATED> waiters=1 signal_attempts=0
waits=0 wakes=0 <AUDIT_BLIND>
=> waiter parked via non-audited path
Conclusion: hypothesis (A) confirmed for all 4 handles. The stall is
NOT a wake/eligibility bug; the kernel signal source is genuinely
missing. The 3 Event/Manual handles share a creator
(lr=0x824a9f6c, tid=1) and the same wait-call wrapper at
lr=0x824ac578: these are 3 worker threads all parked on
"work-available" notifications that never come.
Verification:
* cargo test --workspace --release: 558 passing (+2 new ghost-trail
tests vs prior 556 baseline)
* lockstep -n 100M --stable-digest: bit-identical to master HEAD
Audit IDs: KRNBUG-AUDIT-001 (closed — diagnostic instrumentation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7a1b6b3306 |
fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel
The Phase-C VdSwap PM4 ring path (commit |
||
|
|
780e854c2f |
fix(memory): XMODBUG-002 — write_bulk bumps page_versions for touched pages
`GuestMemory::write_bulk` did the bulk copy via raw `copy_nonoverlapping`
without bumping page_versions for any of the pages it touched. The
scalar `write_u8/u16/u32` methods all bump page_versions after their
store; downstream caches (texture cache, shader cache) Acquire-load the
slot to invalidate stale entries on guest writes. Without the bulk
bump, a caller like `NtReadFile` writing a texture/shader resource into
guest memory would leave any cache that had already keyed on the prior
version handing back stale decoded bytes.
After the copy, walk every page the write touched and bump it. Cheap:
the typical bulk write spans a few pages (NtReadFile uses 64-128 KB
chunks → 16-32 pages).
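A small sketch of the "walk every touched page and bump it" step, under stated assumptions: 4 KiB pages (consistent with 64 KB → 16 pages above), a flat slice of per-page `AtomicU32` version counters, and a hypothetical helper name `bump_touched_pages`. The real `page_versions` layout in `GuestMemory` may differ.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// 4 KiB pages: an assumption inferred from "64-128 KB chunks → 16-32 pages".
const PAGE_SHIFT: u32 = 12;

/// Bumps the version of every page overlapped by `[addr, addr + len)` and
/// returns how many pages were bumped.
fn bump_touched_pages(page_versions: &[AtomicU32], addr: u32, len: u32) -> usize {
    if len == 0 {
        return 0;
    }
    let first = (addr >> PAGE_SHIFT) as usize;
    // Widen to u64 so a write ending at the top of the address space cannot overflow.
    let last = ((addr as u64 + len as u64 - 1) >> PAGE_SHIFT) as usize;
    let mut bumped = 0;
    for page in first..=last {
        if let Some(version) = page_versions.get(page) {
            // Release so a cache doing an Acquire-load sees the copied bytes.
            version.fetch_add(1, Ordering::Release);
            bumped += 1;
        }
    }
    bumped
}
```

An unaligned write that straddles a page boundary bumps both pages, which is the case a single "bump the start page" shortcut would miss.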
Reservation-table invalidation for `lwarx`/`stwcx.` (XMODBUG-001's
sibling) is NOT addressed here — the reservation table lives on
KernelState, not GuestMemory, and plumbing it through requires a wider
change. Callers that bulk-write code-bearing or atomic-bearing memory
should call `kernel.reservations.invalidate_for_write(addr)`
themselves; XEX-loader and NtReadFile are doing data-bearing writes
that don't intersect lwarx targets, so this is acceptable for now.
Verification at -n 100M lockstep:
swaps: 2 → 2 (unchanged)
draws: 0 → 0
texture_cache_entries: 0 → 0 (Sylpheed hasn't issued IM_LOAD yet
— the bump is silent until a cache
keys on a touched page, which won't
happen until Phase F2/F3 unblocks
the resource-loader workers)
packets: ~59M (within noise)
Tests: 16 memory pass.
Closes XMODBUG-002 (P1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8fc1b1dfed |
fix(gpu): GPUBUG-006 — sync_with_mmio Acquire/Release pair the producer
The producer side (`mmio_region.rs:78`, the guest's CP_RB_WPTR MMIO
write callback) uses `Ordering::Release` so any ring-memory writes the
guest performed before bumping WPTR are visible to a paired
`Acquire`-load on the consumer. The consumer here at `sync_with_mmio`
was using `Ordering::Relaxed` for both the WPTR load and the RPTR
mirror store, leaving the Release/Acquire pairing broken.
Under `--parallel`, this broken pairing means the GPU worker can
observe a fresh WPTR value while still reading stale ring-memory
contents at the corresponding offsets: garbage PM4 packets. The audit's
M11 grid run confirmed --parallel is non-deterministic beyond the
documented `packets` ±5% noise; this fix is one strand of that.
Symmetric fix on the RPTR mirror store: Release pairs with any
guest-side Acquire-load of CP_RB_RPTR for ring-writeback bookkeeping.
Verification at -n 100M lockstep:
swaps: 2 → 2 (unchanged)
draws: 0 → 0 (unchanged)
packets: ~60M (within noise)
Tests: 149 (no count change; this is a memory-ordering correctness fix,
not a behavioral change visible at the digest level in lockstep).
Closes GPUBUG-006 (P1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
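To illustrate the pairing GPUBUG-006 restores, here is a toy single-producer, single-consumer ring. It is a sketch only: `RingSketch`, its four-slot size and its method names are hypothetical, and it has no overflow check. What it shows is the ordering discipline: the producer's Release store of WPTR pairs with the consumer's Acquire load.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct RingSketch {
    slots: [AtomicU32; 4],
    wptr: AtomicU32,
    rptr: AtomicU32,
}

impl RingSketch {
    fn new() -> Self {
        Self {
            slots: [AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0)],
            wptr: AtomicU32::new(0),
            rptr: AtomicU32::new(0),
        }
    }

    /// Producer: write the packet word first, then publish WPTR with Release
    /// so the slot write is visible to whoever Acquire-loads the new WPTR.
    fn produce(&self, word: u32) {
        let w = self.wptr.load(Ordering::Relaxed);
        self.slots[(w as usize) % 4].store(word, Ordering::Relaxed);
        self.wptr.store(w.wrapping_add(1), Ordering::Release);
    }

    /// Consumer: Acquire-load WPTR before reading slots, then Release-store
    /// the RPTR mirror for any reader that Acquire-loads it.
    fn drain(&self) -> Vec<u32> {
        let w = self.wptr.load(Ordering::Acquire);
        let mut r = self.rptr.load(Ordering::Relaxed);
        let mut out = Vec::new();
        while r != w {
            out.push(self.slots[(r as usize) % 4].load(Ordering::Relaxed));
            r = r.wrapping_add(1);
        }
        self.rptr.store(r, Ordering::Release);
        out
    }
}
```

With `Relaxed` on the WPTR load, the consumer could see the new pointer value without the slot contents written before it, which is the stale-packet symptom the commit describes.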
||
|
|
e7d0fcf2c9 |
fix(kernel): KRNBUG-017 — real Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql
The Kf-family spinlock exports were registered as stubs:
KfAcquireSpinLock → stub_return_zero (didn't write lock)
KfReleaseSpinLock → stub_success (didn't clear lock)
KeReleaseSpinLockFromRaisedIrql → stub_success (same)
KeTryToAcquireSpinLockAtRaisedIrql → returned 1 but didn't set lock value
Guest code that read the lock value back (e.g. nested acquire/release
sanity checks, debug assertions) saw 0 even after "acquiring", and
could enter critical regions without contention serialization. Under
`--parallel` the coarse Arc<Mutex<KernelState>> already serializes us,
so the audit's P0-under-parallel ranking is about correctness of the
lock value visible to guest code, not mutual-exclusion (which is
provided by the host mutex).
Implementation mirrors canary's
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`:
- KfAcquireSpinLock: write 1 to *SpinLock, return 0 (old IRQL)
- KfReleaseSpinLock: write 0 to *SpinLock
- KeReleaseSpinLockFromRaisedIrql: write 0 to *SpinLock
- KeTryToAcquireSpinLockAtRaisedIrql: write 1 to *SpinLock, return 1
Single-threaded HLE: contention can never be observed (we never run two
guest threads simultaneously without holding the kernel mutex), so the
spin-loop can degenerate to an unconditional acquire.
Verification at -n 100M lockstep:
swaps: 2 → 2 (unchanged)
draws: 0 → 0 (gated by F2/F3/G)
packets: ~59M (within noise)
Tests: 76 kernel pass (no count change; existing harness covers the new
write semantics implicitly via guest-memory smoke tests).
Closes KRNBUG-017 (P0 under --parallel).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8723d6826b |
fix(gpu): GPUBUG-103/104/105 — fix 8 draw-state register addresses + index_size bit
Eight of the register-index constants in draw_state.rs::reg pointed at
completely unrelated registers because the canonical canary table
(register_table.inc) was misread when the module was first authored.
Re-validated each value against canary's lines 1232-1336.
| Register | Pre-fix | Canary | Was-actually |
| ------------------------- | ------- | ------ | ------------- |
| VGT_DRAW_INITIATOR | 0x2281 | 0x21FC | (junk) |
| VGT_DMA_BASE | 0x2282 | 0x21FA | (junk) |
| VGT_DMA_SIZE | 0x2283 | 0x21FB | (junk) |
| PA_SC_WINDOW_SCISSOR_TL | 0x200E | 0x2081 | SCREEN_SCIS_TL|
| PA_SC_WINDOW_SCISSOR_BR | 0x200F | 0x2082 | SCREEN_SCIS_BR|
| RB_COLOR_INFO_1 | 0x2010 | 0x2003 | COHER_DEST_BASE_10|
| RB_COLOR_INFO_2 | 0x2011 | 0x2004 | COHER_DEST_BASE_11|
| RB_COLOR_INFO_3 | 0x2012 | 0x2005 | COHER_DEST_BASE_12|
| PA_SU_VTX_CNTL | 0x2083 | 0x2302 | PA_SC_CLIPRECT_RULE|
Also corrected the `index_size` bit position in VGT_DRAW_INITIATOR
extraction: was bit 8 (which is `major_mode[0]`), should be bit 11 per
canary `registers.h:324` (`xenos::IndexFormat index_size : 1; // +11`).
The block comment in `extract()` was also wrong about the
intermediate field layout and has been refreshed.
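The bit-position fix amounts to a one-line shift change. A minimal sketch, assuming only what the message states (a 1-bit `index_size` field at bit 11 of VGT_DRAW_INITIATOR); the function and constant names are hypothetical and not taken from `draw_state.rs`:

```rust
/// Bit position of the 1-bit index_size field, per the commit message.
const INDEX_SIZE_SHIFT: u32 = 11;
/// The pre-fix position, which the message says is major_mode[0].
const OLD_WRONG_SHIFT: u32 = 8;

/// Extracts index_size (0 or 1) from a raw VGT_DRAW_INITIATOR value.
fn index_size_bit(vgt_draw_initiator: u32) -> u32 {
    (vgt_draw_initiator >> INDEX_SIZE_SHIFT) & 1
}

/// What the pre-fix extraction would have returned, for comparison.
fn index_size_bit_prefix(vgt_draw_initiator: u32) -> u32 {
    (vgt_draw_initiator >> OLD_WRONG_SHIFT) & 1
}
```

A register value with only bit 11 set reads as 1 through the corrected extraction and 0 through the old one, so the pre-fix code picked up an unrelated field.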
Verification at -n 100M lockstep:
swaps: 2 → 2 (unchanged)
draws: 0 → 0 (still gated — see below)
packets: ~61M (within noise)
Tests: 149 (no count change; existing draw_state tests cover the
new constants implicitly via behavioral round-trip).
The audit predicted Phases C+D+E together would unlock `draws > 0`,
but the runtime plateau is multi-causal per the audit's own analysis
(`project_xenia_rs_audit_2026_05_02.md`). The likely remaining
blockers in -n 100M:
* 4 parked-waiter worker threads (handles 0x1004, 0x100c, 0x15e4,
0x42450b5c) — Phase F's XAM/spinlock fixes target this.
* shader_blobs_live=0 after 100M — the game hasn't issued IM_LOAD
yet because workers haven't loaded shader resources.
The register fixes here are still load-bearing for any draw that
DOES happen (every register read at 0x2281 was junk before this
commit) — landing them now is correct even if draws=0 persists until
Phase F unparks the resource-loader threads.
Closes GPUBUG-103, GPUBUG-104, GPUBUG-105 (P0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|