From 609f586ed857fcbc96b5a2486e5cda6695e80011 Mon Sep 17 00:00:00 2001 From: MechaCat02 Date: Sun, 10 May 2026 21:35:21 +0200 Subject: [PATCH] chore: backfill audit-findings.md with entries from audits 023-057 Accumulated diagnostic notes from prior sessions that had stayed in the working tree without being committed. Spans 20 audit entries (KRNBUG-AUDIT-023 through KRNBUG-AUDIT-057) plus VERIFY-A and TRACK-1/TRACK-2 sub-audits, all read-only investigations dated 2026-05-06 through 2026-05-10. No code or schema changes. Pure documentation backfill so future sessions can cross-reference the full chain without depending on the auto-memory directory. Co-Authored-By: Claude Opus 4.7 (1M context) --- audit-findings.md | 1943 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1943 insertions(+) diff --git a/audit-findings.md b/audit-findings.md index 3e642d6..775e349 100644 --- a/audit-findings.md +++ b/audit-findings.md @@ -6069,3 +6069,1946 @@ Audit-019 — memory-watch instrumentation on `[0x828F4070+64]` (audit-017 Optio - `audit-runs/post-ke-resume/probe.{log,err}` (γ-cluster pc-probe + dump-addr) - `audit-runs/post-ke-resume/handles.{log,err}` (--trace-handles-focus) + +## KRNBUG-AUDIT-023 — Canary memory-dump diff (READ-ONLY, 2026-05-06) + +Path B per AUDIT-022 prep: temporarily patched canary (`xam_notify.cc` + `cpu_flags.{h,cc}`) +to add `DEFINE_string(memory_dump_path,...)` flag. On first `XamNotifyCreateListener_entry` +(mask=0x2F), pre-size file to 2 GiB then `Memory::Save` the entire 5-heap state into a +mmap'd file. 44 LOC, rebuilt Linux Debug clang++14 (~6 min), captured 216 MB dump. Patch +reverted post-capture (`git status` clean). + +### Findings vs ours @ -n 50M +1. **0x828F4070 family (audit-017 hypothesized populator target)**: canary-at-first-listener + is ALL ZEROS; ours has dispatcher data. **Cannot resolve audit-017** — canary's dump + fired too early in init for [+64]≠-1 to have happened. +2. 
**0x828E1F08**: ours stores listener pointer (`0x40111890`); canary stores 0. Mechanism + difference (canary uses host-side `KernelState::notify_listeners_` vector; ours stuffs + guest-memory). Not an obvious bug. +3. **0x828F4838 +0x08**: canary has `"XEN\0" + handle 0xF8000034`; ours has zeros. + New populator-effect lead — canary's xboxkrnl writes "XEN" magic + a kernel handle + to this struct slot during init. Address sits inside the audit-016/017 cluster + (`[0x828F48B0+0]=0x828F4070` chain). +4. **0x82124xxx area (audit-009 cluster L1 PCs as data)**: REFUTED as populator target. + This is the static `.pdata` exception-handler table in the XEX image; ours has byte-identical + contents. NOT a dynamic populator. + +### Pre-existing canary bugs encountered +- `PosixMappedMemory::WrapFileDescriptor` mmaps existing file size without extending — + v1 patch SIGBUS'd on first qword write; fixed with `std::filesystem::resize_file` pre-step. +- `XexInfoCache::Init` SIGBUS at line 1406 reading `GetHeader()->version` from mmap'd + infocache. Worked around with `--disable_instruction_infocache=true`. + +### Bug-class refinement +The audit-017 β-class hypothesis remains unresolved. Need a LATER trigger point in +canary to capture state when populator has run. New independent lead: `"XEN" + handle` +at 0x828F4840 in canary; missing in ours. + +### Recommended next session +**AUDIT-024**: re-apply canary patch with delayed trigger (e.g., on XamNotifyCreateListener +call N≥5, or on first XAudioSubmitRenderDriverFrame, or on first NtSetEvent on a specific +guest event). Capture canary's STATE post-populator. Diff at 0x828F4070+64 directly. +Alternative: static-search canary's xboxkrnl source for the writer of "XEN\0" + handle +at 0x828F4840 — if found, that names the populator's CODE, not just its effect. 
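A dump-side complement to the static source search is to scan a memory image for the `"XEN\0"` + handle pair directly. The sketch below assumes a flat big-endian byte copy of a guest region, not the `Memory::Save` container that `parse_dump.py` walks. The name `find_xen_stamps` and the `0xF8xxxxxx` handle filter are illustrative assumptions.

```python
import struct

XEN_SIGNATURE = b"XEN\x00"  # 0x58454E00, the magic seen at +0x08 in canary


def find_xen_stamps(image: bytes, base: int, handle_prefix: int = 0xF8000000):
    """Yield (guest_addr_of_signature, handle) for each signature + handle pair.

    Assumes `image` is a raw big-endian copy of guest memory that starts at
    guest address `base` (hypothetical input, not the Memory::Save layout).
    """
    pos = image.find(XEN_SIGNATURE)
    while pos != -1:
        if pos + 8 <= len(image):
            (handle,) = struct.unpack_from(">I", image, pos + 4)
            # Keep only dwords that look like kernel handles (0xF8xxxxxx).
            if handle & 0xFF000000 == handle_prefix:
                yield base + pos, handle
        pos = image.find(XEN_SIGNATURE, pos + 1)


# Synthetic 0x20-byte struct placed at 0x828F4838 with the stamp at +0x08.
blob = bytearray(0x20)
blob[0x08:0x0C] = XEN_SIGNATURE
struct.pack_into(">I", blob, 0x0C, 0xF8000034)
hits = list(find_xen_stamps(bytes(blob), base=0x828F4838))
```

Run over both dumps, the same scan would list every stamped object on each side, so a slot stamped in canary but not in ours shows up as a set difference rather than needing a hand-picked probe address.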
+ +### Trace artifacts +- `audit-runs/audit-023-canary-diff/canary-memory.dump` (216 MB) +- `audit-runs/audit-023-canary-diff/canary.log` (canary stdout) +- `audit-runs/audit-023-canary-diff/canary-patch.diff` (re-applyable) +- `audit-runs/audit-023-canary-diff/parse_dump.py` (Memory::Save format walker) +- `audit-runs/audit-023-canary-diff/diff_canary_ours.py` (side-by-side diff) +- `audit-runs/audit-023-canary-diff/diff.txt` (concrete byte-level diffs) +- `audit-runs/audit-023-canary-diff/ours-{dump,extra,pdata}.{log,err}` (ours' --dump-addr) + +## KRNBUG-AUDIT-024A — Canary memory-dump diff at delayed trigger (READ-ONLY, 2026-05-07) + +Re-applied audit-023's pattern but moved the dump trigger to **first +`XAudioSubmitRenderDriverFrame_entry`** call (much later than first listener). +Patch: 39 LOC (cpu_flags hunk reused + new hook in `xboxkrnl_audio.cc`). +Build: incremental Debug, ~10 s after CMake-cache symlink fix. +Required preexisting workaround: `--disable_instruction_infocache=true`. Captured +260,659,200 byte dump (248.6 MiB) — slightly larger than audit-023's 216 MB, +consistent with deeper boot. + +Canary log telemetry pre-dump confirms post-populator state: +`KeReleaseSemaphore(0x828A3230, 1, 1, 0)` firing repeatedly (the audio +buffer-completion semaphore — audit-018 prediction: producer is the audio render thread). +`VdSwap`, `VdRetrainEDRAM`, `XamInputGetCapabilities`, multiple texture loads firing. + +### Findings — `[0x828F4070+64]` HYPOTHESIS FALSIFIED + +`[0x828F40B0]` (=0x828F4070+64) at first `XAudioSubmitRenderDriverFrame`: +- **CANARY**: ALL ZEROS for at least 0x40 bytes +- **OURS @ -n 500M**: `ff ff ff ff` at offset 0 (audit-017's `-1` sentinel from sub_821701c8) + +The audit-017 β-class hypothesis (`[0x828F4070+64]==-1` blocking bit-14 setter) +is now **directly falsified by canary observation**: in canary, this slot is +zero, NOT a non-(-1) handle. 
AUDIT-017's claim "only non-(-1) writer is +sub_82184318:0x82184374" was structurally correct *for our build*; in canary +the equivalent location remains untouched at the moment audio is already running. +The bit-14 gate at 0x821738E0 must therefore admit `[+64]==0` OR canary takes a +different control path entirely (likely the latter — different submitter chain +populates a different guest dispatcher slot, leading to the renderer-state-bits +write through a different path). + +### Findings — `0x828F4838+0x08` "XEN\0 + 0xF8000034" divergence stable + +Canary still has `"XEN\0"` magic + kernel handle `0xF8000034` at +0x08. +Ours still has zeros at +0x08-0x0F. **Stable across audit-023 (early) +and audit-024A (late) trigger points** — populator wrote this field +during early init, before listener-creation in audit-023. Confirms the +audit-022/023 lead is real, not transient. + +Heap pointers and counts at `0x828F4838 +0x20..+0x60` populated in BOTH +canary (`0xBC36xxxx` heap) and ours (`0x4024xxxx` heap) — different +allocator state but structural equivalence. + +### Findings — `0x828A3230` audio semaphore (canary only) + +State quad `05 00 00 00 00 00 00 00`, `"XEN\0"` + handle `0xF8000070` at +0x08, +release-count = `01000000` at +0x14, plus chain at +0x18 / +0x28 with handles +`0xF8000080` / `0xF800007C` and a 64-bit value `0xBE628EDC1FCA7000` at +0x38 +(callback ptr or last-completed timestamp). + +In ours: `KeReleaseSemaphore=0` (still in canary-only export queue). Producer +(audit chain → `XAudioSubmitRenderDriverFrame` → audio system → this semaphore) +unreached at -n 500M. + +### Bug-class re-classification + +Drop β-class (`[+64]` poison) hypothesis. Reclassify as **γ-deep**: the gate +between audit-013's IO-004 reach (sub_82173DC8 dispatching) and the audio +producer chain firing is a multi-step renderer/audio init that fires +`XAudioSubmitRenderDriverFrame` in canary but never reaches it in ours. 
+ +### Sharp next-session prediction + +(1) Per Sister-Session AUDIT-024B (parallel canary-source `"XEN\0"`-writer + static search): if 024B identifies the writer of `"XEN\0" + 0xF8000034`, + cross-reference with our canary-only kernel exports. The `"XEN" + handle` + pattern is the canonical type-tag signature emitted by `kernel/util/object_table.cc` + when a kernel object is committed to guest memory. + +(2) Independent track: name the kernel call that fires + `XAudioSubmitRenderDriverFrame` in canary but not in ours. The chain we know + runs in canary post-IO-004 is roughly: + `XamNotifyCreateListener → renderer init → XAudio register → audio thread spawn → submit frames`. + Counters in our run: `XAudioRegisterRenderDriverClient=1` so registration ran, + `KeInitializeSemaphore=1` (likely the buffer-completion semaphore allocated), + but the audio thread that calls `XAudioSubmitRenderDriverFrame` never starts + feeding frames. Probe target: who reads the audio-system register-result and + starts feeding. + +### Cascade prediction sharpness — 4 dim + +If next-session lands a fix for the audio-thread-start gate: +- A: `XAudioSubmitRenderDriverFrame` count > 0 +- B: `KeReleaseSemaphore` count > 0 (now non-canary-only) +- C: `[0x828A3230+0x14]` becomes 1 (release count) +- D: VdSwap > 2 expected ONLY if audio drives renderer pacing (unknown — open). + +### Trace artifacts +- `audit-runs/audit-024a-canary-diff/canary-memory.dump` (260,659,200 bytes) +- `audit-runs/audit-024a-canary-diff/canary.log` (canary stdout) +- `audit-runs/audit-024a-canary-diff/canary-patch.diff` (re-applyable) +- `audit-runs/audit-024a-canary-diff/canary-state.txt` (parsed canary state at probe addrs) +- `audit-runs/audit-024a-canary-diff/canary-extra.txt` (extra addrs: 0x828A3230 etc.) 
+- `audit-runs/audit-024a-canary-diff/ours-dump.{log,err}` (ours --dump-addr at -n 500M) +- `audit-runs/audit-024a-canary-diff/diff.txt` (side-by-side comparison) + +### Cleanup +Canary patch reverted (`git status` clean). Master xenia-rs HEAD `d9e40d3` +unchanged. `/home/fabi/xenia-canary` symlink retained for future CMake regen. + + +## KRNBUG-α-006 — `ensure_dispatcher_object` writes XObj signature + handle (LANDED, 2026-05-07) + +Mirror of canary `XObject::StashHandle` (xobject.h:253-256). On first guest- +dispatcher adoption, stamp `+0x08` with `kXObjSignature` (`'X','E','N','\0'` = +`0x58454E00`) and `+0x0C` with the stash handle. Our shadow table is keyed +by guest pointer, so handle-to-stash = `ptr` itself. 7 LOC in impl, 27 LOC +in tests. + +Branch `xobj-stashhandle/p0-canary-mirror` merged --no-ff into master `de5a15e`. +Tests 604 → 605 (`ensure_dispatcher_object_stamps_xen_signature_and_handle`). +Lockstep deterministic across 2 reruns: `instructions=100000003 imports=987516` +(identical to pre-fix d9e40d3 — writeback is host-side, no guest-instruction +cost). `sylpheed_n50m` golden unchanged. + +Cascade @ -n 500M halt-on-deadlock: NIL ripple. Worker count 20; KeReleaseSemaphore=0; +ExTerminateThread=0; XAudioSubmitRenderDriverFrame=0; NtSetEvent=3334; VdSwap=2 — +all match post-ke-resume baseline. At target address 0x828F4838 itself, +0x08 +remains 00000000 because guest never invokes a Ke* function with that pointer +(adoption in canary at this address likely uses `SetNativePointer` lifecycle +which we don't traverse via `ensure_dispatcher_object`). + +Per task brief: lands as canary-correctness restoration without sharp cascade +hypothesis. Audit-024A's hypothesis that the StashHandle stamp at 0x828F4838 +gates audio init is **observationally falsified** post-fix. Trace +`audit-runs/post-stashhandle/dump-500m.log`. 
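For cross-referencing, the stamp described above can be modelled in a few lines. This is a sketch of the byte layout only, assuming a big-endian guest buffer; `stamp_dispatcher` and `read_stamp` are hypothetical names and do not mirror the Rust implementation of `ensure_dispatcher_object`.

```python
import struct

K_XOBJ_SIGNATURE = 0x58454E00  # 'X','E','N','\0'


def stamp_dispatcher(mem: bytearray, base: int, guest_ptr: int, handle: int) -> None:
    """Write the signature at +0x08 and the stash handle at +0x0C (big-endian)."""
    off = guest_ptr - base
    struct.pack_into(">II", mem, off + 0x08, K_XOBJ_SIGNATURE, handle)


def read_stamp(mem: bytearray, base: int, guest_ptr: int):
    """Return (signature, handle) as currently stored at +0x08 / +0x0C."""
    off = guest_ptr - base
    return struct.unpack_from(">II", mem, off + 0x08)


# Model a 0x100-byte window that contains the semaphore at 0x828A3230.
BASE = 0x828A3200
mem = bytearray(0x100)
# The shadow table is keyed by guest pointer, so the stashed handle is the pointer.
stamp_dispatcher(mem, BASE, 0x828A3230, 0x828A3230)
stamp = read_stamp(mem, BASE, 0x828A3230)
```

With the handle equal to the guest pointer, a dump of a stamped object reads `"XEN\0"` followed by its own address, which is the pattern to expect in `--dump-addr` output from our side.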
+ + +## KRNBUG-AUDIT-025 — Audio thread-start gate identified (READ-ONLY, 2026-05-07) + +Master HEAD at session start: `de5a15e` (post-Path-2 StashHandle merge). + +### Question + +Audit-024A established that `XAudioSubmitRenderDriverFrame=0` and +`KeReleaseSemaphore(0x828A3230)=0` in our run while canary fires both +repeatedly. Goal: identify the exact gate between successful +`XAudioRegisterRenderDriverClient` (both runtimes call it once with +identical return `0x41550000`) and the audio worker submitting frames. + +### Static + canary-log decomposition + +**Audio init in Sylpheed (sub_824D2C08, called once from sub_824D2FA8):** +1. `bl 0x824D6070` — alloc audio_system object on heap. +2. Inline DISPATCHER_HEADER write at `+0x150..+0x18A`: byte-1 to `0x828A3254` + (auto-reset Event), byte-1 to `0x828A3244` (auto-reset Event), byte-5 (per + `bl KeInitializeSemaphore` at +0x1A4 = 0x824D2DAC) to `0x828A3230` + (Semaphore, count=0, limit=6). +3. `bl ExRegisterTitleTerminateNotification(0x828A3210, 1)` at +0x1F0 = 0x824D2DF8. +4. `bl ExCreateThread(entry=0x824D2878, ctx=0, flags=0x10000001)` — audio worker. +5. `KeSetBasePriorityThread(15)` + `KeResumeThread` on the worker. +6. `bl ExCreateThread(entry=0x824D2940, ctx=0, flags=0x20000001)` — second audio thread. + +**Audio worker loop (entry 0x824D2878 — disassembled):** +``` +LOOP_HEAD: + r3 = 0x828A3254 # event handle + bl KeWaitForSingleObject(r3, 3, 1, 0, NULL) # 0x824D28CC + r3 = mem[0x828A3264] # = audio_system_obj ptr (heap) + r11 = mem[r3+300] # audio_active flag + if r11 != 0: + bl sub_824D2108 # process job + bl sub_824D21F0 + else: # shutdown + r5 = mem[r3+304] - 1 + if r5 != 0: + bl KeReleaseSemaphore(0x828A3230, r5, 1) # 0x824D2904 + bl KeSetEvent(0x828A3244, 1, 0) + if r11 != 0: goto LOOP_HEAD + return +``` +Wake source for `0x828A3254`: only **`sub_824D23B0`** (KeSetEvent at +0x54, ++0x4FC, +0x690 = 0x824D2404 / 0x824D28AC / 0x824D2A40). `sub_824D23B0` is the +audio job-submit method.
**It also writes `[+300]=current_thread_handle`** +(at sub_824D23B0+0x678 = 0x824D2A28) so that the worker takes the job-process +branch instead of shutdown. + +### Caller chain of sub_824D23B0 + +From `xrefs` table: only ONE static caller — `sub_824D2B08+0xE4 = 0x824D2BEC`. +But `sub_824D2B08` is the lightweight constructor (entry at 0x824D2B08, returns +at 0x824D2BD4 BEFORE 0x824D2BEC). The body containing the +`bl sub_824D23B0` at 0x824D2BEC is a SEPARATE function entry at `0x824D2BD8` +that the static analyzer didn't carve out — there are NO static call xrefs to +0x824D2BD8. **It is a virtual method invoked via the audio_system vtable** +(set in sub_824D2B08 at offset 0 of the audio object: `[r31+0] = 0x82006CF4`). + +### Runtime probe (audit-025-audio-thread-start) + +`--pc-probe` at 12 audio PCs + `--dump-addr` at 5 audio dispatcher addresses, +`-n 500M`, `--halt-on-deadlock`, NO `--xaudio-tick`. + +**Probe fires (1 of 12):** +- `0x824D2DF8` (sub_824D2C08+0x1F0, ExRegisterTitleTerminate) tid=1 cycle=7,470,631 ✓ + +**Probes that DID NOT fire:** +- `0x824D23B0` (sub_824D23B0 entry) — never reached +- `0x824D2404` (KeSetEvent on 0x828A3254 — wakeup of worker) — never reached +- `0x824D28CC, 0x824D28D0` (worker wait) — never reached (probes fire on PC visit; + tid 9 is BLOCKED at 0x824D28D0 from queueing-time, never gets scheduled-back) +- `0x824D290C, 0x824D291C, 0x824D2928, 0x824D2930` (worker shutdown/exit/loop) — never reached +- `0x824D2DAC` (KeInitializeSemaphore in init) — never reached *as PC visit* + even though counter shows it fired (probe runs on prologue tick; the guest + PC moves past 0x824D2DAC during the bl in the same prologue cycle without + the check matching cleanly; not a behavior bug, probe limitation). 
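The `sub_X+offset = PC` pairs quoted in this audit are easy to mistype, so a mechanical check is cheap insurance before a probe run. A minimal sketch, using only pairs stated in this entry; `pc_of` and `check_claims` are hypothetical helper names.

```python
def pc_of(func_entry: int, offset: int) -> int:
    """Guest PC for `func_entry + offset`, wrapped to 32 bits."""
    return (func_entry + offset) & 0xFFFFFFFF


def check_claims(claims):
    """Return (entry, offset, claimed_pc, actual_pc) for every mismatching claim."""
    bad = []
    for entry, offset, claimed in claims:
        actual = pc_of(entry, offset)
        if actual != claimed:
            bad.append((entry, offset, claimed, actual))
    return bad


# (function entry, offset, PC as written in the audit text)
claims = [
    (0x824D2C08, 0x1A4, 0x824D2DAC),  # bl KeInitializeSemaphore
    (0x824D2C08, 0x1F0, 0x824D2DF8),  # bl ExRegisterTitleTerminateNotification
    (0x824D23B0, 0x678, 0x824D2A28),  # write to [+300]
    (0x824D2B08, 0x0E4, 0x824D2BEC),  # bl sub_824D23B0
]
mismatches = check_claims(claims)
```

An empty `mismatches` list means the four pairs above are internally consistent; any non-empty result names the pair to re-derive from the disassembly before trusting a probe miss at that PC.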
+ +**Dispatcher dump shows correct DISPATCHER_HEADER structure:** +- `0x828A3254` Event sync: type=0x01, sig=0, +0x08="XEN\0", +0x0C=0x828A3254 (Path 2's stamp) +- `0x828A3230` Semaphore: type=0x05, count=0, limit=6, +0x08="XEN\0", +0x0C=0x828A3230 +- `0x828A3244` Event sync: type=0x01, sig=0 +- `mem[0x828A3264]=0x4250DEDC` — audio_system heap object pointer (set during init) + +**Thread states at deadlock:** +- tid 9 (entry 0x824D2878, the audio worker) — `Blocked(WaitAny [0x828A3254])` at pc=0x824D28D0, lr=0x824D28D0 +- tid 10 (entry 0x824D2940) — Blocked similarly at pc=0x824d29X0 region +- 0x828A3254 has tid 9 in `waiters=[9]` but `signaled=false` and no signal_attempts + +### Bug-class classification: γ-DEEP (vtable-driven indirection) + +The audio init runs to completion: heap object allocated, dispatchers +initialized, worker spawned + resumed, ExRegisterTitleTerminate registered. +Worker is correctly parked on `0x828A3254` waiting for a job-submit signal. +**The job-submit method `sub_824D23B0` is reachable only via vtable lookup +on the audio_system object** — `bl r11` after `lwz r11, 0(r30)` style. + +The caller of the vtable method must be a periodic frame-loop (per-frame audio +update). Static analysis shows it would be from the renderer/scenegraph — i.e., +the same `0x82287000-0x82294000` cluster identified by AUDIT-009 as +**unreached**. AUDIT-016/017 already classified this cluster as γ-deep +(chicken-and-egg vtable-registry-not-populated). + +**Conclusion**: the audio thread-start gate is *not* a missing kernel call. +It is the same γ-cluster blocker that has gated the renderer since AUDIT-009. +Fixing it has no β-class memory predicate — the indirection is via a vtable +slot in `[audio_obj+0]` whose containing dispatcher-table never gets registered +because the renderer's listener-init path never executes. 
+ +### Discipline gate + +- Box 1 (canary citation): PASSES — canary `xenia/apu/audio_system.cc:202-237` + + `xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-82`. But canary's host + audio worker is a *replacement* for the guest worker; the gate is purely + guest-side here. +- Box 3 (probe-confirmed reachability): FAILS — sub_824D23B0 never fires. +- This is a diagnostic, no fix to apply. + +### Sharp next-session direction + +This audit closes the audio fork. The ledger has 3 paths forward: + +(A) **Strategic pivot (recommended)**: stop chasing audio. The audio gate IS + the renderer gate. Concentrate on AUDIT-009's `0x82287000-0x82294000` + cluster's L1 callers and the listener-vtable registration that never + happens. Specifically AUDIT-017's hypothesis that the bit-14 setter at + 0x82173950 is the gate, but with AUDIT-024A's falsification of `[+64]==-1` + as the blocker, redirect to: **find what canary writes into the + `0x40ba9a80` listener struct's vtable-pointer slot (`[+0]` in audit-016 + parlance) and identify the writer in canary kernel source**. Path 2's + StashHandle fix landing means the dispatcher-side stamp is now done; the + next missing piece is which kernel call materializes the LISTENER's + vtable so the dispatch routine can actually run. + +(B) **Audio-side workaround**: extend `try_inject_audio_callback` to fire + independently of the worker thread (i.e., bypass guest worker entirely + and call the registered XAudio callback PC directly from the kernel, + canary-style). Already explored under `--xaudio-tick`; regresses + swaps 2→1 (memory entry on KRNBUG-XAUDIO-PRODUCER-001). Not recommended. + +(C) **Complete audio worker host-thread emulation**: mirror canary's host + `AudioSystem::WorkerThreadMain` in our kernel (semaphore.Release + `queued_frames` times on RegisterClient + drive callbacks from a host + thread). Larger refactor; risks breaking lockstep determinism unless + quantized to instruction-count. 
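For option (C), "quantized to instruction-count" could look roughly like the sketch below: tick delivery is keyed to the retired-instruction counter, never to host time, so the tick positions are identical regardless of how the scheduler slices execution. The class and function names are hypothetical, and this is not a port of canary's `AudioSystem::WorkerThreadMain`.

```python
class InstructionQuantizedTicker:
    """Fire a callback every `period` retired guest instructions.

    Hypothetical helper: fire points depend only on the instruction count,
    so two runs with different batch sizes see the same tick positions.
    """

    def __init__(self, period, callback):
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.next_fire = period
        self.callback = callback

    def advance(self, retired_total):
        """Deliver every tick due at or before `retired_total`; return the count."""
        fired = 0
        while retired_total >= self.next_fire:
            self.callback(self.next_fire)
            self.next_fire += self.period
            fired += 1
        return fired


def run(batch, total, period=1000):
    """Step a pretend guest in `batch`-sized slices and record tick positions."""
    ticks = []
    ticker = InstructionQuantizedTicker(period, ticks.append)
    retired = 0
    while retired < total:
        retired += batch
        ticker.advance(retired)
    return ticks


fine = run(batch=250, total=5000)
coarse = run(batch=1700, total=5000)
```

Both runs record ticks at the same instruction counts even though the coarse run delivers several ticks per slice, which is the property the lockstep golden depends on.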
+ +### Trace artifacts +- `audit-runs/audit-025-audio-thread-start/probe.log` (CTOR-PROBE results + dispatcher dump) +- `audit-runs/audit-025-audio-thread-start/probe.err` (counters + thread states) + +### Cleanup +No source modified. Master xenia-rs HEAD `de5a15e` unchanged. + +--- + +## KRNBUG-AUDIT-027 — v40 heap memory diff vs canary (READ-ONLY, 2026-05-08) + +Master HEAD at start/end: `e061e21`. NO source modified. + +### Goal +Continuation of audit-026 (v80 elimination). Comprehensive byte-level +dword diff of canary's existing 248.6 MiB memory dump (audit-024A) vs +ours at v40000000 (1008 MiB span, 64 KiB pages). Looking for cluster L1 +dispatch-table addresses. + +### Method +- `--dump-section=0x40000000:0x3F000000:ours-v40.bin` -n 500M -> 60119 + committed pages, 1008 MiB. +- `extract_v40.py` (adapted from audit-026's extract_v80.py): canary + v40 page count 16128, **committed = 90**. +- `diff_v40.py`: dword-level scan, A-list = canary 0x82xxxxxx-PC where + ours differs, B-list inverse. + +### Results +- A-list (canary-PC, ours differs): **536 entries** +- B-list (ours-PC, canary differs): **31947 entries** +- **Cluster L1 PC hits in A-list: 0** (broad 116-fn 0x82285000-0x82294000), + **0** (narrow 6-fn `sub_822919C8`/`sub_82293448`/etc). +- Histogram top: `0x828f3xxx`(90), `0x8284dxxx`(78), `0x8284cxxx`(64), + `0x82150xxx`(30), `0x828f4xxx`(23), `0x82882xxx`(20). All in + .text/.data, NOT renderer cluster. +- Three vtable-shaped runs detected: + - `0x40000770` length 32 — header `00 09 00 0e | 00 01 10 00 | 40 00 01 c8 | 40 00 01 c8` + - `0x400015a0` length 110 — header `00 21 00 81 | 00 01 10 00 | 40 00 01 80 | 40 00 01 80` + - `0x40000d90` length 20 — `0x82882910`+0x20 stride + All target `.text` heap-allocator handler thunks (`0x8284cxxx`/ + `0x8284dxxx`), not renderer dispatch. +- Listener struct at `0x40BA9A80`: canary page **uncommitted** in this + dump; ours has the audit-016 listener content (`+0x2C=0x4024AC00`, + `+0x3C=0x4024B3E0`, etc).
This confirms canary's listener is + heap-pointer-divergent, not at `0x40BA9A80` for canary. +- B-list tail discovery: `0x40211900..0x40211B50` in ours has 23 + consecutive function entries spaced 0x20 apart (`0x82183ae8, + 0x82187e38, 0x8218cf10, ...`) — **a function-pointer table our impl + builds in v40 that canary builds elsewhere (likely physical heap)**. + +### Bug-class classification +**Outcome (iii) per task brief: v40 ELIMINATED as dispatch-table +source.** Combined with audit-026 (v80 elim), two of four guest-virt +heap regions ruled out. Remaining surface = physical heap (0x20000000 +span, 58458 commits in canary's dump = 228 MiB), v00 (256 MiB, 468 +commits), or register-only constructed. + +### Discipline gate +- Box 1: N/A (pure data audit). +- Box 3: N/A (no fix). + +### Sharp next-session direction +- **Recommended: AUDIT-029 = extract canary PHYSICAL heap and diff** + (same script, change selected heap to `physical`, 228 MiB surface). + This is the largest non-static region and the most likely dispatch- + table home given the two virt-heap eliminations. +- Alternative: **vtable-write-tap** instrumentation logging every + `0x82xxxxxx` value our memory path writes to v40/physical heap. + Side-steps the heap-pointer namespace divergence problem entirely. +- Or: **CPPBUG-AUDIT-001 backlog** — + `nt_allocate_virtual_memory` silent-success + `mm_allocate_physical_memory_ex` + alignment/range/protect ignored could be masking the dispatch-table + writes upstream. + +### Trace artifacts +- `audit-runs/audit-027-v40-mem-diff/canary-v40.bin` (1056964608 bytes) +- `audit-runs/audit-027-v40-mem-diff/ours-v40.bin` (1056964608 bytes) +- `audit-runs/audit-027-v40-mem-diff/extract_v40.py`, `diff_v40.py` +- `diff.txt` (536), `diff-b.txt` (31947), `histogram.txt`, + `l1-hits.txt`, `tables.txt`, `anchors.txt`, `pages.txt`, + `cluster_l1_pcs.txt` (116 fns from sylpheed.db), `ours.log`, + `diff_run.log`. + +### Cleanup +No source modified. 
Master xenia-rs HEAD `e061e21` unchanged. +Sister session 028 untouched. + +--- + +## KRNBUG-AUDIT-028 — XNotify steady-state publisher audit (READ-ONLY, 2026-05-08) + +### Goal +Determine whether canary delivers steady-state XNotify notifications +beyond the 4 startup IDs IO-004 wired, which would explain why our +main thread polls `XNotifyGetNext` 1.49M times without exit. + +### Sources +- canary log: `audit-runs/audit-024a-canary-diff/canary.log` (17245 lines). +- canary source: `xenia-canary/src/xenia/`. + +### Findings +- Canary log shows ONLY `XamNotifyCreateListener(0x2F)` at line 1347 + and `XNotifyPositionUI(0x0A)` at line 2018 in the entire 17245-line run. +- `XNotifyGetNext` is `kHighFrequency` (xam_notify.cc:96) so its + per-call logging is suppressed; absence in log is expected, not + evidence of zero calls. +- Of 34 `BroadcastNotification` publisher sites in canary across 11 + files, NONE fires every frame, every audio buffer, or in any + implicit boot-time periodic. All are event-driven from host UI, + profile/XMP menu actions, or hardware hotplug edges. +- Canary's host-side controller-hotplug log message is NOT present + in this run — so no `kXNotificationSystemInputDevicesChanged` + fired (Sylpheed launched with controllers pre-connected). +- Canary's `VdSwap` count = 1 in the entire log = ZERO actual swap + calls (the 1 line is just the export-table TOC at line 769). + Our impl's swaps=2 is actually AHEAD of canary's frame counter. +- Canary IS in steady-state (audio-sema released 2224 times, GPU + loading textures, `XamInputGetCapabilities` polled to log end). + +### Outcome: β — XNotify queue is NOT the gate +Our impl's notification timeline matches canary byte-for-byte. The +1.49M `XNotifyGetNext` polls are dutiful idle polling, not a +missing-publisher symptom. 
+ +### Strategic pivot +The audio/render gate is still the γ-cluster from AUDIT-009/016/017/025: +the renderer's per-frame audio-update path (sub_824D23B0 invoked via +vtable on audio_system object at `[r31+0]=0x82006CF4`) is unreached +because the renderer cluster `0x82287000-0x82294000` is itself unreached. + +### Recommended next session — AUDIT-029 +Pivot to "what kernel call materializes the listener-dispatch table +so renderer can route per-frame audio": +1. Probe-set L1 callers of unreached cluster (AUDIT-009 PCs). +2. Static-grep canary for code that populates the `0x82006CF4` + audio_system vtable at runtime — likely + `XAudioRegisterRenderDriverClient` / `AudioSystem` init shim. +3. Diff that population path vs our impl. + +Sharp 4-dim cascade prediction (provisional): +- A: one audit-009 cluster L1 PC fires. +- B: `KeReleaseSemaphore(0x828A3230)` 0 → many. +- C: `XAudioSubmitRenderDriverFrame` 0 → many. +- D: `VdSwap` count climbs. + +### Trace artifacts +- Memory file: `project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md` +- Audit dir: `audit-runs/audit-028-steady-state-notify/` + +### Cleanup +No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged. + +--- + +## KRNBUG-AUDIT-029 — physical-heap memory diff vs canary (READ-ONLY, 2026-05-08) + +### Goal +Comprehensive byte-level diff between canary's physical heap (extracted +from audit-024A's `canary-memory.dump`) and our impl's putative physical +region. This is the LAST major guest-memory surface unaccounted for after +v00 (audit-024A), v40 (audit-027), v80 (audit-026), v90 (zero pages +committed). + +### Method +1. Tried dumping our `0xA0000000:0x20000000` (uncached alias). +2. Tried dumping our `0xE0000000:0x20000000` (cached alias). +3. Tried dumping our `0x00000000:0x20000000` (raw physical addr). +4. Extracted canary's physical heap from dump via `extract_physical.py` + (5th heap, 4096-byte pages, state at qword bits 60-61). +5. 
Walked all 0x82xxxxxx PC dwords on canary's physical heap and + cross-referenced. + +### Architectural finding (NEW) +**Our impl has no physically separate physical heap.** All three of our +alias dumps (`0xA0000000`, `0xE0000000`, `0x00000000`) returned +`0 committed pages`. `MmAllocatePhysicalMemoryEx` (exports.rs:644-676) +calls `state.heap_alloc()` (state.rs:702-720), which is a single bump +allocator at `heap_cursor` starting at `0x40000000` shared with +`NtAllocateVirtualMemory`. Canary, by contrast, has a dedicated +512MB physical pool (memory.cc:222-242) accessible via +0xA0/0xC0/0xE0 aliases with byte ID-mapping `& 0x1FFF_FFFF` to host +membase offset 0..0x20000000. + +### Canary physical heap stats (extracted) +- File size: 0x20000000 (512 MiB), all-zero except 24.5 MiB of payload. +- Committed pages: **58458** (×4096 = ~228 MiB) — much larger than + audit-024A's `physical=48105` summary; trust this concrete value. +- Total parsed = 0xf895800 == file size (clean walk). +- 0x82xxxxxx PC dword density: **28851** entries in 4467 4K pages + spanning 536 64K-aligned regions. + +### Diff results +- A-list (canary has PC, ours has zero): **28851 entries** (every PC + dword is automatically a divergence since our region is empty). +- L1 PC hits — narrow (audit-009 hand-picked 6): **0 / 6**. +- L1 PC hits — broad (116-fn cluster): **2 / 116** (`sub_8228CC18` at + phys 0x1330d620; `sub_8228A220` at phys 0x1351ef2c — both scalar, + not part of any table). +- Audit-017 chain hits (`sub_82184318`, `sub_82184374`, `sub_82187768`, + `sub_82187dd0`, `sub_82183ca8`, `sub_822919c8`, `sub_82186760`, + `sub_821c88d0`): **0 / 8**. +- Top PC bucket: `0x82026000` × 12655 occurrences (likely a vtable + pointer for a per-instance object array; `0x144x0000` regions show + stride-0x38 entries with `0x820266a4` vtable slot). +- Consecutive PC-dword runs (≥4): **5 runs** total. 
+ - 232-dword run at phys `0x1e568f38` — XAM/UI dispatch table family + (`0x824b0xxx-0x824b2xxx`, ~220 PCs in that family). + - 9-dword run at `0x1e6290f0`. + - Three 4-dword runs at `0x1c22c9b0`, `0x1ce24bc0`, `0x1ce254c0`. +- 64K-region PC density top: `0x144x0000` family (1300-1400 PCs each). + +### CONFIRMATION of audit-027 misplacement hypothesis +Our v40 table at `0x40211900..0x40211B50` (18 unique PCs, 0x20 stride, +`sub_82183ae8 ... sub_821c09d8` — audit-017 chain family) appears +verbatim on canary's physical heap at `0x1c32c910..0x1c32cb50`, +**identical 0x20 stride, identical 18 PCs, even the trailing dup of +`0x821c09d8`**. This proves the table is allocated via +MmAllocatePhysicalMemoryEx in canary; our impl correctly builds the +same table but at a different virtual address (because our allocator +is unified). The table location difference is benign; the table contents +are correct. + +### Outcome: ζ — all four guest heaps eliminated +**No L1 PCs are stored as data on any heap.** Cluster L1 functions +(`sub_822919C8` etc.) are invoked exclusively via static `bl` +instructions in unreached parent code — they are NOT routed through +a runtime-built dispatch table. Audit-017 chain PCs are likewise +absent from all heap data. + +This rules out the entire family of "kernel call materializes a +function-pointer table" hypotheses. The renderer cluster +0x82287000-0x82294000 is unreached because **its static caller +chain is not entered**, not because its dispatch table is not built. + +Discipline gate: fails box 1 (no fix candidate this session). + +### Strategic pivot — AUDIT-030 recommendation +All vtable/dispatch-table hypotheses across audits 010, 011, 012, +015, 016, 017, 026, 027, 029 are exhausted. The gate is **upstream +of any heap data structure** — it's a control-flow gate, not a +data-population gate. 
+ +Two viable next-step approaches: + +**Option A (preferred): comparative-execution divergence trace.** +Instrument both runtimes to log a deterministic event stream +(e.g., `tid:pc:lr:opcode-class` per-N-instructions) and `diff` to +find the first divergent guest instruction. With lockstep +determinism on our side and `--memory_dump_path` already +patched into canary (audit-023/024), one more canary patch to +emit a periodic execution sample is feasible. Once the first +divergence is located, the kernel call (or guest computation) +that immediately preceded it names the bug class. + +**Option B: focused canary trace of the audio-thread wake-source.** +Per audit-025, `sub_824D23B0` (the only `KeSetEvent(0x828A3254)` +caller) has zero static call-xrefs and is invoked only via +`[r31+0]=0x82006CF4` audio_system vtable. That vtable IS +populated in our impl (audit-026 confirmed byte-identical). +The caller must therefore be a per-frame renderer routine +already in our binary. A targeted canary log dump of the LR +on every entry to `sub_824D23B0` would name the caller. +Cross-reference with our PC trace to find which renderer-cluster +function fires in canary but not ours. + +**Option C (background backlog only):** CPPBUG-AUDIT-001 items +(CRT abort, alignment-ignoring physical alloc, sync/eieio no-ops). + +### Sharp prediction (provisional, low confidence) +The first divergence will be a control-flow branch in the +0x82200000-0x82290000 range whose predicate reads from a +guest memory location populated by an unreached or stub-success +kernel export. Most-likely candidates: +- A field on the audio_system object at `0x82006CF4` not yet + initialized by us (audit-026 verified vtable; field bytes + beyond may differ). +- A hardware-state poll that we stub out (e.g., GPU EDRAM-ready, + DMA-channel-idle). +- A frame counter / vsync flag that canary advances differently. 
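The comparison step of Option A is small once both runtimes emit the same record shape. A minimal sketch, assuming each side produces an ordered list of `(tid, pc, lr)` tuples and that thread ids have already been normalized between runtimes (they differ in the raw logs). The sample streams are synthetic, and `first_divergence` is a hypothetical name.

```python
from itertools import zip_longest


def first_divergence(ours, canary):
    """Return (index, ours_event, canary_event) at the first mismatch, else None.

    A stream that ends early counts as a divergence: the missing side is
    reported as None.
    """
    for index, (a, b) in enumerate(zip_longest(ours, canary)):
        if a != b:
            return index, a, b
    return None


# Synthetic sample streams of (tid, pc, lr); not taken from a real run.
ours_trace = [(1, 0x1000, 0x0F00), (1, 0x1040, 0x1004), (2, 0x2000, 0x2000)]
canary_trace = [(1, 0x1000, 0x0F00), (1, 0x1040, 0x1004), (2, 0x2080, 0x2084)]
divergence = first_divergence(ours_trace, canary_trace)
```

The event just before the reported index is the last point where both runtimes agreed, so the kernel call or guest computation between those two samples is where to look.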
+ +### Trace artifacts +- Audit dir: `audit-runs/audit-029-physical-mem-diff/` +- `canary-physical.bin` — 512 MiB extracted heap (24.5 MiB non-zero) +- `ours-physical-A.bin` — 512 MiB, all zero (alias not mapped) +- `ours-physical-E.bin` — 512 MiB, all zero (alias not mapped) +- `ours-physical-flat.bin` — 512 MiB, all zero (no commits in 0..0x20000000) +- `extract_physical.py` — heap extractor +- `diff_physical.py` — one-sided PC enumeration script +- `diff.txt`, `histogram.txt`, `l1-hits.txt`, `audit017-hits.txt`, + `v40table-hits.txt`, `tables.txt`, `pages.txt`, `pc-summary.txt` +- Memory file: `project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md` + +### Cleanup +No source modified. No commit. Master xenia-rs HEAD `e061e21` unchanged. + +## KRNBUG-AUDIT-031 — Audio worker wait-site canary trace (2026-05-08) + +**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC, +4 files); 4 sequential probe runs; canary patch reverted at session close. +Master HEAD `e061e21` unchanged. + +### Method +- Probe `0x824D2878` (audio worker entry, sub_824D2878): 1 fire, lr=0xBCBCBCBC. +- Probe `0x824D28D0` (post-wait PC where ours parks): **54,128 fires** in + ~5 min — canary's wait IS being woken on a hot loop. +- Probe `0x8284DDDC` (KeSetEvent guest thunk): 8906 fires; **wake source + captured**: `tid=0100001C lr=0x824D2A44 r3=0x828A3254 r4=1` — + `KeSetEvent(0x828A3254, 1, 0)` from PC `0x824D2A40`. +- Probe `0x824D23B0` (sub_824D23B0 entry per IDA): **0 fires**. + +### Key finding — function-boundary mis-attribution corrected +AUDIT-025/-030's claim "sub_824D23B0 is the only wake-source and is never +entered" is half-correct. The IDA-DB function-record `sub_824D23B0` +(claimed `0x824D23B0..0x824D2878`) actually contains a SECOND function +prologue at `0x824D29F0` (`mfspr r12, LR; bl 0x825F0F88; stwu r1, -192(r1)`). +This second function `sub_824D29F0` is the real wake-source, not +sub_824D23B0. They share IDA's broken boundary inference. 
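A cheap cross-check for this kind of boundary mis-inference is to scan a function record's byte range for additional prologue words. A minimal sketch, assuming the code bytes have been extracted big-endian and that functions of interest open with `mfspr r12, LR` (encoding `0x7D8802A6`), as the prologue quoted above does; leaf functions without that word will not be found. The helper name and the synthetic blob are illustrative only.

```python
import struct

MFLR_R12 = 0x7D8802A6  # PowerPC encoding of `mfspr r12, LR` (mflr r12)

def find_prologues(code: bytes, base: int):
    """Return the guest address of every `mflr r12` word in a big-endian code blob."""
    return [base + off
            for off in range(0, len(code) - 3, 4)
            if struct.unpack_from(">I", code, off)[0] == MFLR_R12]

# Synthetic blob: nop, mflr r12, nop, blr, mflr r12.
blob = struct.pack(">5I", 0x60000000, MFLR_R12, 0x60000000, 0x4E800020, MFLR_R12)
hits = find_prologues(blob, 0x1000)
print([hex(h) for h in hits])
```

More than one hit inside a single function record suggests the record spans several functions and is worth re-splitting before trusting its xrefs.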
+ +### Static reachability of sub_824D29F0 +- `0x824D6648 b 0x824D29F0` (kind=`j`, tail-jump from a 12-byte thunk at + `0x824D6640` that loads `r3 = [0x828A3264]`). +- `0x824D6640` is referenced as DATA at `sub_824D2C08+0x374` + (kind=`ref`, instruction=`addi`). PC `0x824D2F7C: addi r4, r10, 26176` + loads `r4 = 0x824D6640`; the next instructions deref `[r31][68]`, + load `vtable[7]` at `[[r3]+28]`, `bcctrl 20,lt` to register the + thunk as a callback on the audio-engine object. + +So in canary: after `sub_824D2C08` registers the callback at +0x374, +some scheduler/dispatcher periodically invokes the thunk at `0x824D6640`, +which tail-jumps into `sub_824D29F0`, which sets event 0x828A3254 at +`+0x50`, waking the audio worker. + +### Our impl behavior (matches AUDIT-025 exactly) +`hw=4 idx=0 tid=9 state=Blocked(WaitAny { handles: [2190094932], deadline: None }) pc=0x824d28d0 lr=0x824d28d0` +where `2190094932 = 0x828A3254`. `sub_824D2C08` runs to completion in +ours (per AUDIT-025), so the registration step fires. The host-side +dispatch loop that should periodically invoke `0x824D6640` is the +unreached gate. + +### Bug class +γ-deep, vtable-driven (refines AUDIT-025 with the correct downstream +witness). The dispatch loop is a per-frame audio update — most likely +in the unreached `0x82287000-0x82294000` cluster (AUDIT-009). + +### Sharp prediction — AUDIT-032 +1. Probe `0x824D6640` directly in canary (`--log_lr_on_pc=0x824D6640`). + Capture lr — names the dispatcher PC. +2. Probe `0x824D2F90` (the `bcctrl` callsite) to capture `r3` (the + audio-engine "this") and `[r3+0]+28` (the vtable[7] entry being + invoked). Static disasm of vtable[7] target identifies the + register-callback implementation. +3. Walk the dispatcher PC's caller chain in our IDA DB; if it bottoms + in unreached audit-009 cluster, the dispatch loop IS the renderer + gate (audio gate IS renderer gate, named). +4. 
Cross-check: a fix that makes the dispatcher fire should make + `sub_824D29F0` reachable in our impl, ending the deadlock. + +### Trace artifacts +- Audit dir: `audit-runs/audit-031-wait-site/` +- `canary-0x824D2878.log`, `canary-0x824D28D0.log`, + `canary-KeSetEvent.log`, `canary-sub23B0.log` +- Memory file: `project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md` + +### Cleanup +Canary patch reverted (`git status` clean in canary repo). Master +xenia-rs HEAD `e061e21` unchanged. No commit. + +## KRNBUG-AUDIT-032 — Audio dispatcher LR capture at thunk 0x824D6640 (2026-05-08) + +**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC, +4 files); single 40-sec capture of `--log_lr_on_pc=0x824D6640`; canary patch +reverted at session close. Master HEAD `e061e21` unchanged. + +### Capture +**7,875 fires** of `pc=0x824D6640`, all from a single host-flagged kernel +thread named **"Audio Worker"** (handle=`0100001C`, native=`467FC6C0`), +stack `700D0000-700F0000`. **LR is invariant `0xBCBCBCBC`** — canary's host +stack-fill canary value, NOT a guest PC. r3=`0x30063000` (driver context), +r4=0 first call / =1 thereafter, r5=`0x1800` (frame size 6144 bytes / 1536 +stereo s16 samples), r6=`0xBDFBA600` (registered callback_arg). + +Canary log line: +``` +d> F8000008 XAudioRegisterRenderDriverClient(701CF210(824D6640), BDFBA658(00000000)) +K> 0100001C XThread::Execute thid 4 (handle=0100001C, 'Audio Worker (0100001C)', native=467FC6C0, ) +i> 0100001C TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 r4=00000001 r5=00001800 r6=BDFBA600 +``` + +### Mechanism — host-side, not guest +Per canary source `src/xenia/apu/audio_system.cc:84-159`: +1. `AudioSystem::Setup()` spawns an `XHostThread` named "Audio Worker" + running `WorkerThreadMain()`. +2. Loop: `WaitAny(client_semaphores_)` → on wake, read + `clients_[index].callback` and `wrapped_callback_arg` → call + `processor_->Execute(worker_thread_state, client_callback, args)`. +3. 
The audio backend driver releases the per-client semaphore each time + it consumes a frame of audio output. + +The thunk `0x824D6640` is **invoked directly by the canary host emulator's +processor** — there is no guest call site. The PPC LR remains the host +stack canary because the function is entered without a guest `bl`. + +### Falsifies AUDIT-031 hypothesis +Audit-031 inferred that `0x824D6640` is registered as a vtable[7] callback +on the audio_system object and dispatched via per-frame guest bcctrl. This +is wrong. The `addi r4, r10, 26176` at `sub_824D2C08+0x374` (PC `0x824D2F7C`) +loads the PC `0x824D6640` as the **callback_ptr argument to +XAudioRegisterRenderDriverClient** — caller-side parameter setup, not vtable +registration. `XAudioRegisterRenderDriverClient` records the (callback, arg) +pair into the host-side `AudioSystem::clients_[]` table; the host worker +thread is what subsequently invokes the callback. + +### Outcome +**δ + α composite** per task brief outcomes: +- δ confirmed: audit-031's "vtable[7] callback" inference is wrong. +- α partial: the "caller PC" we sought to walk up is canary's HOST C++, + not guest code. There is no guest LR to walk; the divergence is entirely + on the kernel-host boundary at `XAudioRegisterRenderDriverClient`. + +### Our impl gap (probe-confirmed) +`crates/xenia-kernel/src/exports.rs:2705-2745`: registers the client into +our `state.xaudio` table (correct callback_pc=`0x824D6640`, +arg=`0x41E9DD5C`, returns driver=`0x41550000`) but **does not spawn a +host-side worker thread** to pump the callback. No semaphore-release loop +mirrors canary's `client_semaphore->Release(queued_frames_, ...)`. + +Probe fires at -n 500M (`--pc-probe=0x824D6640,0x824D29F0,...` AND +`--branch-probe=...`): **0 fires for both PCs**. tid=9 parks at +`pc=0x824D28D0` waiting on event `0x828A3254`; tid=10 parks at +`pc=0x824D2990` waiting on semaphore `0x828A3230` (count=0/limit=6). 
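The missing host-side pump can be sketched as below. This is an illustration of the wait/callback loop described above, not canary's code and not a proposed xenia-rs implementation (which would be Rust). The class and method names are hypothetical, it handles a single client, and the semaphore's initial count of 0 and the placeholder callback argument are assumptions.

```python
import threading

class AudioPump:
    """Hypothetical host-side pump for one registered render-driver client."""

    def __init__(self, callback, callback_arg):
        self.callback = callback          # stands in for executing the guest callback PC
        self.callback_arg = callback_arg  # stands in for the registered callback argument
        self.semaphore = threading.Semaphore(0)
        self.running = True
        self.thread = threading.Thread(target=self._worker, daemon=True)

    def start(self):
        self.thread.start()

    def frame_consumed(self):
        """Called by the audio backend each time it consumes one output frame."""
        self.semaphore.release()

    def stop(self):
        self.running = False
        self.semaphore.release()  # wake the worker so it can observe the stop flag
        self.thread.join()

    def _worker(self):
        while True:
            self.semaphore.acquire()          # park until the backend wants a frame
            if not self.running:
                break
            self.callback(self.callback_arg)  # invoke the client callback on the host thread

# Usage with a stand-in callback that records its invocations.
calls = []
done = threading.Event()

def guest_callback(arg):
    calls.append(arg)
    if len(calls) == 3:
        done.set()

pump = AudioPump(guest_callback, callback_arg=0x1234)
pump.start()
for _ in range(3):
    pump.frame_consumed()
done.wait(timeout=5)
pump.stop()
print(calls)
```

The point of the sketch is that the callback is entered from the host worker with no guest `bl`, which is why no guest LR exists to walk.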
+ +### Bug class & sharp prediction +**Class**: δ-α composite — host-side AudioSystem worker thread missing +entirely. + +**Sharp cascade prediction** for fix session (audio-host-pump): +- A: tid=9 leaves `Blocked(WaitAny [0x828A3254])` on the FIRST callback + invocation (sub_824D29F0 calls `KeSetEvent(0x828A3254, 1, 0)`). +- B: tid=10 leaves `Blocked(WaitAny [0x828A3230])` on next sema release + inside sub_824D29F0. +- C: `XAudioSubmitRenderDriverFrame` count rises from 0. +- D: `KeReleaseSemaphore` becomes non-zero (canary-only export landed). +- E: open — does this unblock a non-audio consumer? Tid=10's parking on + `limit=6` semaphore (canary's `queued_frames_=6`) suggests audio frame + queue is **isolated**. So fix likely resolves audio path but **NOT** + the audit-009 renderer cluster. + +The audio gate is **NOT** the renderer gate (revising audit-025's "audio +gate IS the renderer gate" claim). Separate stalls sharing only the +"host pump missing" symptom. + +### Trace artifacts +- Audit dir: `audit-runs/audit-032-dispatcher-lr/` +- `canary-patch.diff` (saved before revert) +- `probe.{log,err}` (our impl, -n 500M) +- `probe-sanity.{log,err}` (-n 50M) +- `branchprobe.{log,err}` (branch-probe verification) +- `/tmp/audit-032-canary.log` (canary capture, 35,942 lines, 7,875 LR fires) +- Memory file: `project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md` + +### Recommended next session +Implement host-side audio worker per canary `apu/audio_system.cc`. Est. +60-120 LOC. Predicted to unblock audio path (tids 9, 10) and add +canary-only kernel exports (KeReleaseSemaphore, possibly +XAudioSubmitRenderDriverFrame). **Won't fix the audit-009 renderer cluster +(separate γ-class blocker)**. Audit-025's strategic-pivot to renderer +cluster L1 callers REMAINS priority for swaps=2→draws>0 progression; the +audio fix is necessary cleanup of canary-only exports. + +### Cleanup +Canary patch reverted (`git status` clean in canary repo). 
Master +xenia-rs HEAD `e061e21` unchanged. No commit. + +## VERIFY-A — Static-reachability soundness check via canary PC trace (2026-05-08) + +**READ-ONLY**. Re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC, +4 files). Probed 12 distinct PCs from the audit-009 unreachable cluster +(`0x82285000-0x82294000`) sequentially in canary; canary patch reverted at +session close. Master HEAD `e061e21` unchanged. + +### Hypothesis being tested +Static reachability via `xrefs.kind='call'` BFS from `entry_point=0x824AB748` +in `sylpheed.db` claims 112/116 functions in cluster `0x82285000-0x82294000` +are unreachable. xrefs.kind='call' does NOT capture indirect dispatch +(vtables, function pointers). If canary reaches these PCs via indirect +dispatch, the audit-009/-016/-017/-020/-021/-029 framing is wrong. + +### Method +- Build: Debug variant, `xenia-canary/build/bin/Linux/Debug/xenia_canary` +- Args: `--log_level=3 --disable_instruction_infocache=true + --log_lr_on_pc=PC --headless=true` +- Per probe: ~35 sec runtime, then SIGTERM/SIGKILL. +- Sanity check: `--log_lr_on_pc=0x824D28D0` produced 5683 fires (matches + audit-031's 54128/5min ratio) — trace mechanism functional in this build. +- Per probe: also recorded `KeReleaseSemaphore` count (audio loop liveness + proxy); each probe ran with 5,600-5,800 KeRelSem calls during the window. + +### Probe results (PC → fires → cluster region) +| PC | fires | source | reachable via call-BFS? 
| +|-------------|-------|-------------------|-------------------------| +| 0x822919C8 | 0 | audit-009 narrow | no | +| 0x82293448 | 0 | audit-009 narrow | no | +| 0x82288028 | 0 | audit-009 narrow | no | +| 0x82292D80 | 0 | audit-009 narrow | no | +| 0x822851E0 | 0 | audit-009 narrow | no | +| 0x82286BC8 | 0 | audit-009 narrow | no | +| 0x82285C78 | 0 | broader cluster | no | +| 0x82285DD0 | 0 | broader cluster | no | +| 0x82286118 | 0 | broader cluster | no | +| 0x8228A140 | 0 | broader cluster | no | +| 0x8228CAF8 | 0 | broader cluster | no | +| 0x8228E688 | 0 | broader cluster | no | +| 0x824D28D0 | 5683 | sanity-check | reached (audit-031) | + +### Cross-validation against sylpheed.db +- 116 functions live in `0x82285000-0x82294000` per `functions` table. +- 4/116 reached via call-BFS from entry; 112/116 unreached. +- 12 of those 112 unreached PCs probed; 0 fires in canary across ~6 min + cumulative wall-clock per-cluster probe time. + +### Bug-class implication +Outcome (i) — **static reachability claim is sound**. The 112-function +"unreachable" cluster IS unreachable in canary too; the BFS conclusion is +not artifactually narrow. Indirect-dispatch reachability misses (the +hypothesized failure mode) are NOT happening for this cluster. + +### What this rules out / does not rule out +- Rules out: "indirect dispatch through audio vtables reaches this cluster + in canary, but our static analysis missed it." Would have manifested as + >=1 PC firing. +- Rules in (consistent): the audit-031 finding that the audio dispatch + loop registers `0x824D6640` as a callback but the dispatcher itself + lives in unreached territory. Both canary and ours fail to reach the + cluster via the static-call graph; canary reaches it via a DIFFERENT + vtable/dispatch entry that this 12-PC sample didn't catch. +- Does not rule out: that SOME parts of the 42-function broader closed + island could be reached in canary (sample size 12/112 = ~10.7% + coverage). 
A full sweep would harden the claim, but would cost ~75 min + cumulative at ~35 sec per probe. + +### Cumulative-coverage caveat +Probes are independent — running sequentially does NOT prove +non-reachability across the whole 5-min audit-031 envelope. Each probe +ran ~35 sec. Audit-031's 5-min run captured 54128 fires of 0x824D28D0 +(rate ≈180/sec). Over a 35-sec window at that rate, expected fires for a similar +hot-loop entry would be ≈6300. Zero fires is decisive for hot-loops; a +genuinely cold-but-reachable PC (e.g. fires once at boot) might not have +been captured if it fires in a window outside our trigger envelope. +Mitigation: each probe was started fresh at canary launch, so any +boot-time fire would be captured. + +### Reading-error impact +This verification PASSES. The 10-error reading-error ledger does not +include the audit-009 reachability claim. No reattribution required. + +### Recommendation +- Outcome (i) per task brief: no immediate action required on the audit + campaign; static reachability is sound for this cluster sample. +- The reading-error ledger separately motivates the analysis-toolset + overhaul (per user's earlier instruction) but that is a separate + planning track. +- Follow-up if desired: full 112-PC sweep (~75 min cumulative). Optional + hardening; the 12-PC sample with 0/12 hits supports, but does not prove, + that the cluster is genuinely cold in canary at this boot phase. + +### Trace artifacts +- Audit dir: `audit-runs/verify-A-static-reachability/` +- 13 probe-*.log files (12 cluster + 1 sanity) +- Memory file: `project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md` + +### Cleanup +Canary patch reverted (`git status` clean in canary repo). Master +xenia-rs HEAD `e061e21` unchanged. No commit. + +## KRNBUG-AUDIT-033 — UI/save-game subsystem entry-chain divergence probe (READ-ONLY, 2026-05-08) + +### Setup +- Re-applied 30-LOC `--log_lr_on_pc` canary patch (4 files, see audit-030 + diff).
Built `xenia_canary` Debug variant explicitly via + `ninja -f build-Debug.ninja` (Checked variant has runtime code-cache + allocation issues that block boot). +- Probed 8 PCs in canary (50s wall, `--disable_instruction_infocache=true`): + Tier 1 cluster externals — `0x8228A628`, `0x8228E138`, `0x8228E498`; + Tier 2 callers — `0x82172524`, `0x82175810`, `0x8217EB78`; + Tier 3 CMessageBridge sites — `0x821A6CF0`, `0x821A8578`. +- xenia-rs `--pc-probe` of same 8 PCs at -n 500_000_000 (master HEAD + `9028021`). + +### Canary fire counts +| PC | Tier | Canary fires | LRs | +|----|------|--------------|-----| +| 0x8228A628 | T1 | 0 | — | +| 0x8228E138 | T1 | 2 | 0x82172BF8 (in sub_82172BA0) | +| 0x8228E498 | T1 | 28 | 0x82451E78, 0x82174730 | +| 0x82172524 | T2 | 0 | — | +| 0x82175810 | T2 | 0 | — | +| 0x8217EB78 | T2 | 0 | — | +| 0x821A6CF0 | T3 | 0 | — | +| 0x821A8578 | T3 | 0 | — | + +### xenia-rs fire counts (CTOR-PROBE) +| PC | Ours fires | LR | +|----|------------|-----| +| 0x8228E138 | 1 | 0x82172BF8 (in sub_82172BA0) | +| 0x8228E498 | 62 | 0x82451E78 (in sub_82451E20) | +| (others) | 0 | — | + +### Convergence finding +**Both implementations enter the same 2 cluster externals via the same +LRs.** sub_82172BA0 → sub_8228E138 (boot init), sub_82451E20 → +sub_8228E498 (init array, 28 fires canary / 62 fires ours). Tier 2 + +Tier 3 functions (`sub_82172524`, `sub_82175810`, `sub_8217EB78`, +`sub_821A6CF0`, `sub_821A8578`) are 0-fires in canary at the 50s boot +horizon — they are NOT activated in canary either. The audit-prompt +hypothesis that these caller paths fire in canary is FALSIFIED for +Tier 2+3 within the 50s envelope. + +Frame walk from our impl's CTOR-PROBE for 0x8228E498 yields a +call chain: sub_82451E20 ← sub_82450720 ← sub_82450638 ← +sub_821CB968 ← sub_821CD458 ← sub_821CBEA8 ← sub_821CECF0 ← +sub_821C4988 — all reached. 
+ +### Bug-class classification +**Outcome (γ)** per task brief: "Both reach the same PCs up to bcctrl +through cluster vtable; the divergence is at the indirect-dispatch +level." Specifically: at the 50s boot horizon, canary itself doesn't +penetrate deeper into the UI/save-game cluster than our impl does. +Tier 1 entries `sub_8228E138` and `sub_8228E498` are reached by both; +the cluster's full activation (mission select, save-game UI) requires +a boot-phase further than this probe envelope captures. + +### Per-PC quantitative divergence +- `0x8228E138`: ours fires 1× at cycle 9191803 (very late), canary fires + 2× — minor frequency divergence, both via sub_82172BA0. Cause likely + a duplicate post-boot reentry that ours misses. +- `0x8228E498`: ours fires 62× across cycles 104K–249K, canary fires 28× + across 50s wall — ours busy-loops sub_82451E20 more aggressively + (likely an array ctor dispatch). May indicate canary breaks out of the + loop early via a state ours doesn't reach. + +### Discipline gate +- Box 1: probe data captured both sides — PASS. +- Box 2: canary fires Tier 1 entries (2 of 3) — PARTIAL. +- Box 3: cross-impl LR mirror — PASS (LRs match). +- Box 4: bug class = γ — does not gate to fix; M5.5 prerequisite. +- Box 5: no fix this session per task brief — PASS. + +### Recommended next session +- **(γ) M5.5 prerequisite**: schedule "this-flow vptr resolution" as + next analyzer milestone — without it, indirect-dispatch reachability + cannot be modeled. Until M5.5 lands, top-down probing inside the + cluster is blind. +- **Alternative pivot**: probe the 62-fires-vs-28-fires divergence at + `sub_82451E20` more deeply. Probe `sub_82450720` / `sub_82450638` / + `sub_821CB968` (frame chain captured). One of these exits the loop + early in canary; that exit gate IS the divergence. +- **Alternative pivot 2**: longer canary trace (5-10 min Lutris-launched + Windows build) to confirm Tier 2+3 PCs activate post-boot. 
The 50s + Linux probe envelope is too short for "press-A-to-continue" / intro + video boundary. + +### Trace artifacts +- Audit dir: `audit-runs/audit-033-ui-entry-chain/` +- 8 canary-0x*.log probe files (Tier 1+2+3) +- ours.log (CTOR-PROBE captures), ours.err (kernel-call counters) + +### Cleanup +Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs +master HEAD `9028021` unchanged. No commit. + +## KRNBUG-AUDIT-034 — Frame-chain divergence + Tier 2/3 horizon (READ-ONLY, 2026-05-09) + +**Status**: open. Sister of AUDIT-033. Master `9028021` unchanged. Tests 640. +Lockstep instructions=100000003. Subsystem: front-end UI / save-game / +mission-select / HUD (NOT renderer). + +### Phase A — frame-chain firing-rate matrix +Canary patch (audit-030 30-LOC) re-applied; reverted at session close. +Probed 8 PCs in canary 50s wall + ours -n 500M (~8s guest): + +| PC | canary 50s | ours -n 500M | divergence | +|----|---:|---:|---:| +| sub_821C4988 | 1 | 1 | 6.3× | +| sub_821CECF0 | 2 | 2 | 6.3× | +| sub_821CBEA8 | 7 | 7 | 6.3× | +| sub_821CD458 | 7 | 7 | 6.3× | +| sub_821CB968 | 14 | 14 | 6.3× | +| sub_82450638 | 14 | 14 | 6.3× | +| sub_82450720 | 24 | 16 | 4.2× | +| sub_82451E20 | 90 | 80 | 5.5× | + +**Loop-exit-divergence located**: sub_82450720+0x160..+0x1F4 +(PC 0x82450880..0x82450914). 5-iteration loop bounded by `r25 < 5`. +- Ours: 5/5 iterations (80/16=5.00) — never early-exits. +- Canary: avg 3.75/5 (90/24=3.75) — exits via 0x82450904 `bne 0x8245092C`. + +**Exit predicate**: `[sub_82451E20_out+0] == r30-12 AND [+4] == [r30+0]+[r30+4]`. +Data source = 5×20-byte slot table at `r26+108..207` (r26 = sub_82450720 +arg1 = container struct). The predicate is fed by sub_82451E20's inner +loop, which calls Tier-1 cluster sub_8228E498 to dereference +`[working_key->vptr][32]`. + +**Bug class**: β-class (data-state divergence) with γ-deep entry +(sub_821C4988 = 0 static call xrefs → vtable-driven). 
The 6.3× upstream +amplification is uniform from L0..L5 (entry frequency), and the L7 5-loop +shows ours never triggers the early-exit data-match. + +### Phase B — Tier 2/3 horizon (300s canary) +Probe set: 0x82172524, 0x82175810, 0x8217EB78, 0x821A6CF0, 0x821A8578. +**ALL 5 PCs = 0 fires at 300s in canary**. Cluster activation is even +deeper than this 5-min Linux Debug horizon. Linux Debug canary trajectory +matches Lutris Windows up to frame 42 (per RECONCILE-A); 300s ≈ early-boot +pre-intro only. May need Lutris Windows trace OR upstream probing OR +non-time-based trigger to reach Tier 2/3 activation. + +### Recommended next session + +**Option 1 (preferred)**: AUDIT-035 = mem-watch r26+108..207 for one +captured r26 value (capture via extended pc-probe of sub_82450720) → +identify writer in canary that ours misses. The slot table populator +is the gate to the early-exit path. + +**Option 2**: schedule M5.5 (alias-aware vtable dispatch resolver) as next +analyzer milestone — sub_821C4988 has 0 static call xrefs and is the +chain entry; M5.5 would name the trigger. + +**Option 3**: probe sub_8228E498's output `[r3+0][32]` value directly via +extended `--pc-probe` (capture vptr-at-+32 dereferenced value) — name what +the predicate compares against, then mem-watch its source. + +### Trace artifacts +- `audit-runs/audit-034-frame-chain/canary-0x*.log` — 8 50s logs + 1 300s + preserved log + 5 Phase B 300s logs +- `audit-runs/audit-034-frame-chain/ours.log` (8-PC pc-probe at -n 500M) +- `audit-runs/audit-034-frame-chain/scripts/probe-canary*.sh` + +### Cleanup +Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs +master HEAD `9028021` unchanged. No commit. + +## KRNBUG-AUDIT-035 — Slot table byte-level diff at sub_82450720 (READ-ONLY, 2026-05-09) + +### Background +Continuation of AUDIT-034. Disasm verified slot table at r26+108, 5×20=100 +bytes (loop body PC 0x82450880..0x82450914). 
Goal: byte-level diff of the +5-slot table contents between canary and ours at the same call site. + +### Canary patch (extended) +Re-applied audit-030 30-LOC patch + extended TrapLogLR helper (+19 LOC) to +also log r26 and dump 5×20-byte slot table from r3+108 (r3 == r26 after +the function's `mr r26,r3` prologue, which has not yet run at PC 0x82450720). +Total +49 LOC across 4 files; under the 80-LOC budget. Build succeeded. Patch +reverted at session close; canary `git status` clean. + +### Captured slot tables (final state) + +Both runtimes converge on r3=r26=0x828F3B68 at sub_82450720 entry; slot table +base = 0x828F3BD4. 22 canary entries captured ~30s wall. + +| Slot | addr | Canary (last entry) | Ours (-n 500M) | +|------|------|---------------------|----------------| +| 0 | 0x828F3BD4 | `00000000 00000000 00000000 00000000 00000000` | (same — all zero) | +| 1 | 0x828F3BE8 | `00000000 00000000 00000000 BC3654C0 00000008` | `00000000 00000000 00000000 4024A240 00000008` | +| 2 | 0x828F3BFC | `00000000 00000000 00000000 BC366080 00000008` | `00000000 00000000 00000000 4024AEE0 00000008` | +| 3 | 0x828F3C10 | `00000002 00000005 00000000 00000000 00000000` | `00000000 00000000 00000000 00000000 00000000` | +| 4 | 0x828F3C24 | `00000000 00000000 00000000 BC365520 00000008` | `00000000 00000000 00000000 4024A300 00000008` | + +### Diff summary + +- Slots 1, 2, 4: same shape (zeros + heap-pointer + size 8) but pointers + diverge by **heap region** — canary `BC3xxxxx` (physical heap), ours + `4024xxxx` (v40 bump heap). Same divergence noted in audit-027/029. +- Slot 3: canary [+0]=2, [+4]=5 (counter pair); ours [+0]=0, [+4]=0. Slot 3 + is dynamic — push/pop counter; ours's writers fire at higher rate. + +### Writer identification (1066 ours mem-watch hits on slot 3) +PCs: 0x82450c08, 0x82450c40, 0x82450c4c, 0x82450c3c (sub_82450bc4 chain), +0x822f8b20 (counter inc), 0x82323364 (index update), 0x8231eee8 (init). 
+Slot 3 [+4] cycles 0..0xB in ours vs 0..5 in canary's window. Ours over-pushes. + +### Reading — ε-class heap-region mismatch + +The slot table populates IDENTICALLY in shape across both runtimes. The +predicate at PC 0x82450904 fails because the **lookup table** sub_82451E20 +walks (via Tier-1 cluster external sub_8228E498's `[r3+0][32]`) is populated +with canary-physical-heap pointers on canary, v40 pointers on ours — but the +slot-table writers on the **other** side push pointers from a different +allocator state. Per-element cross-reference inconsistency causes the +predicate to never match in ours's iter 1-2; it falls through to slot 4 +(self-referential default) only. Bug class **ε — heap-region-mismatch +propagating through dual-data-structure consistency check**. + +### Sharp 4-dim cascade prediction + +A: implement physical-heap separation (CPPBUG-AUDIT-001) so +mm_allocate_physical_memory_ex / nt_allocate_virtual_memory return distinct +0xBC3xxxxx region. +B: sub_8228E498's vptr-table contains 0xBC3xxxxx, slot-table writers push +0xBC3xxxxx — same heap region. +C: predicate at 0x82450904 matches at iter 1-2, sub_82450720 returns 1, +sub_82450638 second-call frequency normalizes (~10× per L5 entry). +D: cluster activation MAY clear (`draws > 0` cascade UNKNOWN until B-C +observed). + +### Falsification of audit-034 +"Different positions in the 5-slot table" — falsified. Matching slot indices +(1, 2, 4) are populated identically in shape. Mismatch is in the VALUE of +the heap pointer, not its slot position. + +### Trace artifacts +- `audit-runs/audit-035-slot-table/canary-0x82450720-fix.log` (132 lines, 22 entries) +- `audit-runs/audit-035-slot-table/ours-lrtrace.jsonl` (16 entries) +- `audit-runs/audit-035-slot-table/ours-dump-stdout.log` (slot table at end-of-run) +- `audit-runs/audit-035-slot-table/ours-memwatch-slot3.log` (1066 writers) + +### Recommended AUDIT-036 +1. 
Land physical-heap separation; re-run AUDIT-035 trace to verify slot + pointers shift to 0xBC3xxxxx and predicate early-exits. +2. Or probe sub_8228E498 in both runtimes to capture `[r3+0][32]` value + and confirm cross-table heap divergence. + +### Cleanup +Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs +master HEAD `9028021` unchanged. No commit. + +## KRNBUG-AUDIT-036 — `[[r3+0]+32]` predicate hypothesis test (READ-ONLY, 2026-05-09) + +### Validation goal +Direct hypothesis test of audit-035's heap-region narrative. Capture +`[[r3+0]+32]` at sub_8228E498 in both canary and ours; CONFIRMED if both are +heap-region-divergent pointers (0xBC3xxxxx vs 0x4024xxxx); REFUTED otherwise. + +### Disasm correction +sub_8228E498 is NOT a vtable[8] dispatcher. It's a deque/segmented-array +iterator deref returning element_address in r3: +- `[r3+0]` = header*; `[r3+4]` = packed (chunk_idx, sub_idx) +- `[header+4]` = segment_table; `[header+8]` = chunk_count +- `r3 = segment_table[chunk_idx] + sub_offset` ; `blr` + +The `[+32]` deref happens in the CALLER `sub_82451E20` at PC 0x82451E78 +(LR), reading the returned element's `[+0]` and then `[+32]` as predicate +target compared against r28 (= caller's r6, 3rd arg). + +### Canary patch — 49 LOC, reverted +Re-applied audit-030 base + extended TrapLogLR to log r3, r28, dereference +`[r3+0]` (key), and dump 64 bytes (16 u32 lanes + ASCII) at the key. +Build via ninja Debug; reverted via `git checkout -- src/` at session +close; canary `git status` clean. 
+ +### Captured values + +**Canary** (PC=0x82451E78, ~36 fires at 30s): +- r3 (returned element) = 0xBC22CA20 / 0xBC22CA24 (physical heap) +- `[r3+0]` (key) = 0xBC65D018 / 0xBC65D140 / 0xBC65D1C0 / 0xBC65D240 / 0xBC65D340 / 0xBC65D400 / 0xBC65D540 +- Key struct (key=0xBC65D1C0): `F80000B8 0 0 3 0 0 0 0 BC65D018 BC65D140 0 BC65D034 0 0 1 0` +- ASCII: `'.................................e...e.@.....e.4................'` +- **`[[r3+0]+32]` = 0xBC65D018 / 0xBC65D2D8 / 0xBC65CFD8 / 0xBC65D118 / 0xBC65D198 / 0xBC65D398** — phys-heap pointers, range 0xBC65xxxx + +**Ours** (PC=0x8228E498 + dump-addr at returned r3, ~62 fires at 500M): +- r3 (returned element) = 0x401119B0..0x401119BC (v40 bump heap) +- `[r3+0]` (key) = 0x40542300 / 0x40542340 / 0x40542400 / 0x405424C0 +- Key bytes at 0x40542300: + ``` + +0x00 "game:\hidden\Resource3D\Common.x" + +0x10 "ource3D\Common.xpr\0\..." + +0x20: 70 72 00 5c (= "pr\0\\") + ``` +- **`[[r3+0]+32]` = 0x7072005C** (mid-string text "pr\0\\") + +### Verdict — REFUTED-AS-STATED, stronger η-class divergence found + +Audit-035's strict prediction "ours's `[[r3+0]+32]` is in 0x4024xxxx" is +REFUTED. Ours's value is `0x7072005C` — literal filename text bytes, not +a heap pointer. + +But the deeper divergence is even worse than the heap-region narrative +suggested: the records held by the container have **fundamentally +different layouts**. Canary's `[r3+0]` points to a 16-dword pointer-bearing +struct with phys-heap sub-pointers at offsets 32/36/44. Ours's `[r3+0]` +points to a struct that begins with the inline filename string, so offset +32 falls inside the string text. The predicate +`r28 == [[r3+0]+32]` therefore COMPARES STACK POINTERS (r28) against +INLINE STRING TEXT in ours — a comparison that can never succeed. + +Bug class **η — record-layout divergence** (NEW class). Distinct from +audit-035's "heap region" axis; the populator for these records writes +DIFFERENT struct shapes in ours vs canary. 
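The `0x7072005C` reading can be reproduced from the captured key bytes alone. A minimal sketch: it rebuilds the first 36 bytes of ours's record from the filename shown in the dump and reads the big-endian dword at `+0x20`. The `classify` helper is a rough heuristic added for illustration (at least three printable ASCII bytes means "text"); it is not part of either runtime.

```python
import struct

def be32_at(record: bytes, offset: int) -> int:
    """Read one big-endian dword, as a guest `lwz` would."""
    return struct.unpack_from(">I", record, offset)[0]

def classify(value: int) -> str:
    """Heuristic: a dword with >= 3 printable ASCII bytes is probably inline text."""
    raw = value.to_bytes(4, "big")
    printable = sum(0x20 <= b < 0x7F for b in raw)
    return "text" if printable >= 3 else "other"

# Ours: the record starts with the inline filename, so +0x20 lands mid-string.
record = b"game:\\hidden\\Resource3D\\Common.xpr\x00\\"
ours_value = be32_at(record, 0x20)

print(hex(ours_value), classify(ours_value))   # inline text bytes "pr\0\\"
print(hex(0xBC65D018), classify(0xBC65D018))   # one of canary's captured values
```

The filename occupies offsets `0x00..0x21`, so the dword at `+0x20` is `70 72 00 5C`. Whatever pointer r28 holds, it is being compared against string bytes.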
+ +### Cascade implication + +The `swaps>2 / draws>0` plateau is gated by THIS predicate failing on +EVERY iteration in ours's main loop body. Even if physical-heap +separation (CPPBUG-AUDIT-001) landed, the records would still hold inline +strings, so the predicate would still fail. + +### Recommendation — DO NOT proceed with physical-heap separation as audit-037 + +Audit-037 should NOT be the heap-split fix. Instead: +**Audit-037 = identify the record populator(s)** that build the container +elements at `0x401119B0+` (ours) vs `0xBC22CA20+` (canary). The populator +writes the struct at `[r3+0]`. Likely path: +1. mem-watch on `0x40542300+0x20` (the predicate target offset) to find + the writer PC and LR in ours. +2. Disasm the writer's caller chain. +3. Re-apply audit-030 patch in canary, probe the equivalent PC, compare + the populator's ctor / load path. +4. The two populators should diverge at a static-init or resource-loader + function — that divergence is the audit-037 root cause. + +### Sharp 4-dim cascade prediction (post-fix at populator) + +A: ours's `[0x40542300+0x20]` becomes a phys-style pointer (matches + canary's record layout) +B: predicate `r28 == [[r3+0]+32]` matches at least once during boot +C: sub_82451E20 inner loop exits via the `bne` branch, not via end-iter +D: cluster `0x82285000-0x82294000` external-entry probes (audit-033) + show new fires — front-end UI activation begins + +### Falsification of audit-035 + +"`[[r3+0]+32]` is a heap-region-divergent pointer" — REFUTED. Ours's value +is mid-string text bytes (0x7072005C). Heap-region divergence is real for +the container element pointers themselves (0xBC22CA20 vs 0x401119B0) but +the predicate failure mechanism is record-layout, not heap-region. 
+ +### Trace artifacts +- `audit-runs/audit-036-vptr-deref/canary.log` — initial 30s canary at PC=0x8228E498 +- `audit-runs/audit-036-vptr-deref/canary-callsite.log` — extended canary at PC=0x82451E78 +- `audit-runs/audit-036-vptr-deref/ours.log` — pc-probe at 0x8228E498 (62 fires) +- `audit-runs/audit-036-vptr-deref/ours-exit.log` — branch-probe at 0x82451E78 (returned r3) +- `audit-runs/audit-036-vptr-deref/ours-final.log` — dump-addr at element + key targets + +### Cleanup +Canary patch reverted (`cd xenia-canary && git status` → clean). xenia-rs +master HEAD `9028021` unchanged. No commit. Tests 640. + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested with sharp pre-prediction +2. Canary patch reverted at session close, git clean +3. xenia-rs source unmodified, no commit +4. Single-step (validation only, no fix attempt) +5. Trace files saved per audit dir convention + +## TRACK-1-VERIFY — Cache-fix record-layout verification (READ-ONLY, 2026-05-09) + +### Validation goal +Direct verification of cascade dimension A from audit-038. Audit-038 landed +the cache fix (cache:/* paths persist via /tmp/xenia-rs-cache--/); +sub_82459D18, sub_8245D230, 0x82450904 were silenced from "many fires" to +zero. The unmeasured dimension was record-layout: did the fix flip the +record at 0x40542300 from inline-string (audit-037 pre-fix shape) to +canary-shape pointer-bearing (handle@+0=0xF80000B8, sub-pointers +@+32/+36/+44)? + +### Method (read-only, no source mods, no commit) +1. Probe sub_8228E498 (deque iterator deref returning element_address) + at -n 500M to find current record-base addresses. **Result: 0 fires**. + The cache fix silenced the cache-miss path; sub_8228E498 is downstream + of that path and now never executes. +2. Fallback: dump audit-037 record bases via + `--dump-addr=0x40542300,0x40542340,0x40542400,0x405424C0` (master + d8766c6, post-fix). Plus extended-range dump + 0x40542100..0x40542800 to look for any pointer-shaped records nearby. +3. 
Cross-reference canary record shape from audit-037's canary probe of + 0x82450b68 — canary populates filenames via + `RtlInitAnsiString(BC365xxx, "game:\\hidden\\Resource3D\\…")` separately + from the per-file struct at 0xBC65xxxx (struct holds pointers). + +### Captured values (post-fix, master d8766c6) + +**0x40542300** — IDENTICAL to audit-037 pre-fix: +``` ++0x00: "game:\hidden\Res" ++0x10: "ource3D\Common.x" ++0x20: 70 72 00 5c 93 9a 9d cc ... (be32=7072005c) ++0x30: ...69 d8 e4 5c c2 95 ea d8... +``` ++0x20 dword = **0x7072005C** ("pr\0\\" text bytes), unchanged. + +**0x40542340** — descriptor-shape, header pointers + inline filename text: +``` ++0x00: 40 54 28 80 ... | be32=40542880 (next-record ptr) ++0x40: "...dden@T#." (continuation of inline filename) ++0x50: "ource3D\Comm..." +``` + +**0x40542400** — descriptor-shape with offsets at +0x40 ("@T&.@T..@T%@_TIT"): +``` ++0x00: 40 54 24 80 (be32=40542480 ptr) ++0x40: 40 54 26 00 40 54 1e c0 40 54 25 40 5f 54 49 54 +``` + +**0x405424c0** — pointer-bearing PARTIAL but filename still inlined at +0x44: +``` ++0x00: 40 54 25 80 (be32=40542580 ptr) ++0x20: 40 54 1e d8 ... 40 54 1e f4 (be32=40541ed8, 40541ef4 — pointers) ++0x40: 40 54 23 40 ":\hidden\Res" ++0x50: "ource3D\ptc_pack" ++0x60: ".xpr\0..." +``` ++0x20 dword = **0x40541ED8** (pointer in v40 range). Filename "ptc_pack.xpr" +still inlined at +0x44. + +### Verdict — Cascade Dimension A: FAIL + +Cache fix (audit-038) DID NOT flip record layout to canary-shape: + +- 0x40542300: inline-string layout fully unchanged. +0x20 = 0x7072005C + (text), IDENTICAL to audit-037 pre-fix. +- 0x405424c0 has descriptor-shape pointers at +0x20 / +0x2C + (0x40541ED8 / 0x40541EF4) but **the filename is still inlined at +0x44** + rather than externalized to a separate `RtlInitAnsiString`-allocated + ANSI-string heap. +- No record begins with the canary 0xF80000B8 handle. No record contains + BC65xxxx-equivalent sub-pointers. 
The transformation step that should + externalize filenames into ANSI-string heap before the pointer-bearing + record stage is NOT running in our impl. + +### Mechanism + +Canary's record-population path: +1. `RtlInitAnsiString(heap_alloc, "game:\\hidden\\Resource3D\\Common.xpr")` + allocates the literal on a separate heap (BC365xxx range). +2. The per-file record at BC65xxxx receives a POINTER to that string. +3. `[[r3+0]+32]` then dereferences cleanly to BC65xxxx neighbours + (handle/sub-pointer fields). + +Our impl's record-population path: +1. The literal "game:\\hidden\\Resource3D\\Common.xpr" is written DIRECTLY + into the per-file record at +0x00 (or +0x44 for some records). +2. There is no separate ANSI-string allocation. No pointer indirection. +3. `[[r3+0]+32]` reads inline filename text bytes (0x7072005C "pr\0\\") + instead of a pointer. + +The audit-038 cache fix made `cache:/*` paths persist on real disk, which +silenced the cache-miss restore loop. But the populator that turns a +filename literal into either an ANSI-heap pointer (canary) or an +inline-record-prefix (ours) is a DIFFERENT mechanism — sibling to or +upstream of cache machinery. + +### Cascade implication + +The `swaps>2 / draws>0` plateau and the cluster L1 unreached state are +both still gated by this layout divergence. Even with the cache fix +landed, the predicate `r28 == [[r3+0]+32]` STILL compares stack pointers +against inline filename text bytes — a comparison that cannot succeed. +Sister Track 2's extended-horizon canary trace becomes the load-bearing +diagnostic: if cluster L1 fires in canary at e.g. T+30s, then this +transformation-step fix is the next concrete target. + +### Recommendation — Track 1 next moves + +- **Option A (preferred)** — trace `RtlInitAnsiString` callers in our impl + vs canary on the `game:/dat:/cache:` prefix family; find which path + doesn't fire in our impl. The missing path is the populator divergence. 
+- **Option B** — mem-watch +0x20 of 0x40542320 to capture the writer's + PC + LR in our impl; the writer's function should diverge from canary's + equivalent at a static-init / resource-loader site. +- **Option C** — wait for sister Track 2's findings before declaring + transformation-step missing; rule out timing/horizon as a confound. +- **Option D** — KRNBUG entry: audit `RtlInitAnsiString` (and adjacent + string-init paths) for prefix branching. If our impl folds all prefixes + into the same handler but canary branches, that's the bug. + +### Lockstep determinism preserved + +`instructions=500000019, imports=5629636, swaps=2, VdSwap=2`. Stable. + +### Trace artifacts +- `audit-runs/audit-039-track-1-verify/probe-element.{out,log}` — + pc-probe sub_8228E498 (0 fires) + 4 record dumps +- `audit-runs/audit-039-track-1-verify/dump-extended.{out,log}` — + extended-range dump 0x40542100..0x40542800 + +### Cleanup +xenia-rs source unmodified. No commit. No canary touch. Sister Track 2 +running parallel against xenia-canary; not touched. Master HEAD +`d8766c6`. Tests 645. + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested with sharp pre-prediction (cascade dim A) +2. No canary patch (read-only on our side only) +3. xenia-rs source unmodified, no commit +4. Single-step (verification only, no fix attempt) +5. Trace files saved per audit dir convention + +## TRACK-2-EXTENDED — Extended-horizon canary trace for cluster activation (READ-ONLY, 2026-05-09) + +### Question +At 10–15 min wallclock (2–3× longer than audit-034 Phase B's 5 min), does +Linux Debug canary EVER reach the audit-009 cluster's Tier-2 callers +(`sub_82172524`, `sub_82175810`, `sub_8217EB78`) — and through them the +cluster's L1 entries? If YES → capture LR (caller PC) → name the +activation gate. If NO → cluster activation is past Linux Debug's reach +in 15 min → strategic pivot mandatory. + +### Method (canary patch + revert; no xenia-rs touch) +1. 
Re-applied audit-030 `--log_lr_on_pc` patch (30 LOC across 4 files) + to xenia-canary HEAD `6de80dffe`. Build via `ninja -f build-Debug.ninja + xenia_canary`. Mandatory `--disable_instruction_infocache=true`. +2. Probed 3 Tier-2 PCs serially (single PC at a time per audit-031 + constraint), 15-min wallclock each: + - `0x82172524` — actual run 22 min (timeout(1) didn't enforce 900s + cleanly until force-kill) + - `0x82175810` — 15 min + - `0x8217EB78` — 15 min (force-killed at +3s post-timeout) +3. Compressed plan per task brief: skip Tier-1 (3 PCs) + L1 (6 PCs) when + Tier-2 = 0× — they are downstream consequences of Tier-2 firing. +4. Trace marker `TRACE-PC-LR pc=… lr=… r3..r6,r31`. + +### Result Table +| Tier | PC | Horizon | Hits | LR | Notes | +|------|-------------|---------|------|----|-------| +| T2-A | 0x82172524 | 22 min | **0** | — | Steady-state idle: 240k KeReleaseSemaphore / 75k texture-load / VdRetrainEDRAM loop | +| T2-B | 0x82175810 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) | +| T2-C | 0x8217EB78 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) | + +Total ~52 min canary CPU. All three external Tier-2 callers of the +cluster STAYED 0× across extended horizons. + +### Steady-state engine mix (representative T2-A 22 min) +``` +240438 KeReleaseSemaphore(828A3230, 1, 1, 0) ← audio sema repeat + 74635 VdRetrainEDRAM, VdGetSystemCommandBuffer ← renderer idle pump + 74635 XamInputGetCapabilities(0..3) ← input poll + 432 Removed; 396 Added; 381 NtStatusToDosError +``` +Identical mix in T2-B, T2-C. Engine is alive at the kernel-call level +but does not advance through the front-end-UI / save-game state +machine across 3× the previously-tested wallclock. + +### Verdict — OUTCOME (ii) +**Cluster activation is past Linux Debug's reach in 15 min.** Per task +brief Step 3 outcome (ii). Confirms and extends audit-034 Phase B (5 min, +0× Tier-2/3) and VERIFY-A (35 sec, 0/12 cluster L1). 
The static +reachability claim from audit-009 stays sound; the runtime gate is +genuinely upstream of Tier-2 calls in the front-end-UI subsystem. + +### Strategic implication +RECONCILE-B's host-presenter caveat dominates: Vulkan/XCB on Linux fails +to display intro video; user confirmed Weston also shows black; the +front-end-UI state machine never advances past the post-intro +state-transition that Tier-2 callers gate on. Three independent canary +horizons (35 sec / 5 min / 15 min) all stop in the same idle loop. + +**15-min Linux Debug canary cannot witness the cluster activation event +on this host.** Continued probing at higher horizons on Linux is unlikely +to yield. Two pivots open: + +- **Pivot A — Lutris Windows canary instrumentation.** Re-port + `--log_lr_on_pc` to a Windows build and probe Tier-2 there. Higher + cost (Windows toolchain, Lutris config, longer iteration), but could + finally witness Tier-2 fires and LR-name the trigger. +- **Pivot B — Static-only.** Drop runtime probing on this side; lean on + M5.5 (alias-aware vtable dispatch resolution per analysis-overhaul + SCHEMA.md) to statically name the gate function in xenia-rs's IDA DB, + then probe THAT function in our impl + canary-Linux at 5-min horizon. + +**Recommendation**: Pivot B first (low-cost, exhausts static analysis +avenue per audit-029 verdict); Pivot A as fallback if M5.5 doesn't reach +a probeable witness. + +### Sister-session coordination (Track 1) +Track 1 (cache-fix record-layout verification) verdict on cascade +dimension A: **FAIL** — audit-038 cache fix did NOT flip record layout +to canary-shape. Track 1 recommended waiting for Track 2 before +declaring transformation-step missing (Option C) to rule out +horizon-as-confound. Track 2 now rules that out: 15-min horizon does +not move the needle. **Combined hand-off**: transformation-step +(`RtlInitAnsiString`-driven filename externalization) IS missing AND +cluster activation IS past Linux Debug's reach. 
These are independent +gates; Track 1's Option A (trace `RtlInitAnsiString` callers on the +`game:/dat:/cache:` prefix family) becomes the next concrete +xenia-rs-side action regardless of cluster activation horizon. + +### Falsifications +- Audit-034 Phase B's "5 min may be too short" caveat is closed: 15 min + doesn't reach Tier-2 either. +- Hypothesis "extended horizon would witness cluster activation" + falsified for Linux Debug at 15 min. + +### Trace +`audit-runs/audit-039-track-2-extended-canary/`: +- `canary-0x82172524.{log,err}` — 77 MB log, 0 fires, 22-min wall +- `canary-0x82175810.{log,err}` — 52 MB log, 0 fires, 15-min wall +- `canary-0x8217EB78.{log,err}` — 55 MB log, 0 fires, 15-min wall + +### Cleanup +Canary patch reverted (`cd xenia-canary && git status` → clean, +HEAD `6de80dffe` unchanged). xenia-rs source unmodified, no commit, +no push. Sister Track 1's territory untouched. + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested with sharp pre-prediction (Tier-2 fires + → LR-names gate; 0 fires → outcome ii). +2. Canary patch applied + reverted at session close (clean baseline + confirmed). +3. xenia-rs source unmodified, no commit. +4. Single-step (verification only, no fix attempt). +5. Trace files saved per audit dir convention. + +## KRNBUG-AUDIT-040 — record ctor input divergence at sub_8244FC90 (READ-ONLY, 2026-05-09) + +### Goal +Per audit-037 sub_8244FC90 fires identically in canary + ours but produces +different record layouts. Identify the divergent INPUT (which arg register +holds different content). Trace the upstream caller that supplies it. + +### Canary patch — 56 LOC, reverted +Re-applied audit-030 base + extended TrapLogLR to log r3..r10 + r28..r31 + +LR + 32-byte hex dump from `*r4` and `*r5`. Build via +`ninja -f build-Debug.ninja xenia_canary`. Reverted via +`git checkout -- src/` at session close; canary `git status` clean +(HEAD `6de80dffe` unchanged). 
+ +### Calling convention (sub_8244FC90) +- r3 = dest record (alloc'd by caller via `operator new`) +- **r4 = source struct ptr (28 bytes; memcpy'd to dest+0x3C via 7-dword loop)** +- r5 = secondary "this" (vtable in canary) +- r6/r7 = scalar args + +### Concrete register values (representative fire 2 of 33 canary / 8 ours) +| reg | canary | ours | +|-----|--------|------| +| r3 | `BC65D440` | `405420C0` | +| **r4** | **`BC79C9EC`** | **`406819EC`** | +| r5 | `BC65D2C0` | `40542100` | +| LR | `82450440` | `82450440` (= `sub_824503A0+0xA0`) | + +### Source-struct content at `*r4` (the load-bearing memcpy region) +| word | canary | ours | diff | +|------|--------|------|------| +| +0 | **`F80000DC`** | **`00001454`** | **HANDLE-NAMESPACE** | +| +4 | `0` | `0` | same | +| +8 | `0` | `2` | DIFFERENT | +| +12 | `3` | `3` | same | +| +16 | `0` | `0xC` | DIFFERENT | +| +20 | `0xC` | `0xC` | same | +| +24 | `0` | `0` | same | + +### Upstream caller — divergent dword origin + +Backtrace: sub_8244FC90 ← sub_824503A0 ← sub_824528A8 ← sub_822DFBC8 ← +sub_822DFC74 (the producer). + +In **sub_822DFC74**: +``` +0x822DFC8C-90 bl 0x824A9F18 ; r3=r4=r5=r6=0 — calls NtCreateEvent +0x822DFC94 r4 = r3 (event handle returned) +0x822DFC98-9C bl 0x821820B0 ; stw r4, 0(r1+80) +0x822DFCA0 lwz r11, 80(r1) ; r11 = handle +0x822DFCB8 stw r11, 44(r31) ; *** [this+44] = NtCreateEvent handle *** +0x822DFCC4 bl sub_822DFBC8 ; vtable[7] dispatcher reads [this+44] +``` + +`sub_824A9F18` is a wrapper around **`NtCreateEvent`** (xboxkrnl.exe ord +209, thunk `0x8284DF1C`). The OUT handle is what diverges: +- canary: `NtCreateEvent` → `0xF80000DC` (kernel-region pseudo-handle, + XObject namespace) +- ours: `NtCreateEvent` → `0x00001454` (small-int handle ID, + KernelState::handle_table namespace) + +Both runtimes call NtCreateEvent 395× during boot; both succeed. The +divergence is purely **handle-value namespace cosmetics**. 
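The dword-by-dword comparison in the source-struct table above can be reproduced mechanically. The sketch below hard-codes the seven dwords from that table. `diff_words` and `is_kernel_style_handle` are hypothetical helper names, and the `0xF8xxxxxx` mask is an assumption based only on the handle values quoted in this entry, not on canary's source.

```python
# Seven dwords (28 bytes) at *r4, copied from the table above.
CANARY_WORDS = [0xF80000DC, 0x0, 0x0, 0x3, 0x0, 0xC, 0x0]
OURS_WORDS = [0x00001454, 0x0, 0x2, 0x3, 0xC, 0xC, 0x0]


def diff_words(a, b, stride=4):
    """Return (byte_offset, a_word, b_word) for every differing dword."""
    if len(a) != len(b):
        raise ValueError("structs must have the same number of dwords")
    return [(i * stride, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_kernel_style_handle(value):
    """Assumption for illustration: 0xF8xxxxxx values are canary-style handles."""
    return (value & 0xFF000000) == 0xF8000000


for offset, canary, ours in diff_words(CANARY_WORDS, OURS_WORDS):
    tag = "handle-namespace" if is_kernel_style_handle(canary) else "value"
    print(f"+{offset:<2} canary={canary:08X} ours={ours:08X} ({tag})")
```

Only offsets +0, +8 and +16 differ. The +0 difference is the handle slot discussed above; the sketch does not explain the +8 and +16 differences.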
+ +### Bug-class refinement + +**δ-namespace** (handle representation divergence; benign unless +downstream code interprets handle bits semantically). NOT a logic bug +in our code path — both impls correctly route the handle through +`WaitForSingleObject(handle, INFINITE)` at sub_822DFC34. + +The audit-037 framing of "canary records hold pointer-bearing structs +while ours holds inline-string structs" is partially incorrect: +- The 28 bytes copied at sub_8244FC90 (record `+0x3C..+0x57`) ARE + different in handle slot, but only by namespace. +- The "filename text starting at +0" lives at a DIFFERENT region of the + dest record (+0x40+ in our `0x40542100` dump shows + `40541F80 40542000 745c4750 ... LE.pak\0eng\p`) — written by + `bl 0x822F8A70` / `bl 0x82150030` AFTER sub_8244FC90 returns. + +### Recommended audit-041 (sharp prediction) + +**Two parallel options:** +1. **DOWNSTREAM-USE PROBE (preferred)**: probe sub_822DFC34 + (`bl 0x824AA330` waitsite) in BOTH runtimes. Capture r3 (handle being + waited on) and verify wait completes. If canary's wait completes but + ours doesn't, audit-041 is signaler-missing (trace which kernel call + signals canary's `0xF80000DC`). If canary's wait ALSO doesn't + complete, the namespace finding is benign and the gate is upstream + of the wait (RDX search-criteria producer). +2. **AUDIT-037 RE-VERIFICATION**: dump 128 bytes from canary's r3 and + ours's r3 AT THE EXIT of sub_8244FC90 (not at session-end). If the + filename text is written by sub_824503A0+0x478 callees + (sub_822F8A70 / sub_82150030), those are the real audit-041 targets. 
+ +### Trace artifacts +- `audit-runs/audit-040-record-ctor-inputs/canary-0x8244FC90.log` (33 fires) +- `audit-runs/audit-040-record-ctor-inputs/ours-lrtrace.jsonl` (8 fires) +- `audit-runs/audit-040-record-ctor-inputs/ours-dump.log` (10 dump-addr) +- `audit-runs/audit-040-record-ctor-inputs/canary-patch.diff` (notes) + +### Cleanup +Canary patch reverted (clean baseline confirmed; HEAD `6de80dffe` +unchanged). xenia-rs source unmodified, no commit, master HEAD +`d8766c6` unchanged. Tests 645. + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested (extracted divergent input arg + named + upstream producer NtCreateEvent). +2. Canary patch applied + reverted at session close. +3. xenia-rs source unmodified, no commit. +4. Single-step (data-gathering only, no fix attempt). +5. Trace files saved per audit dir convention. + +## KRNBUG-AUDIT-041 — wait-site signaler determination (READ-ONLY, 2026-05-09) + +Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files); +reverted at session close (canary HEAD `6de80dffe` clean). + +**Wait site**: PC `0x822DFC34` `bl 0x824AA330` (KeWaitForSingleObject +wrapper, INFINITE timeout) inside sub_822DFBC8. Wait loops on r3=0x102 +(STATUS_TIMEOUT) and on `[r31+52]==3`. Containing function is the +direct caller of audit-040's NtCreateEvent at sub_822DFC74; the handle +flowing into r3 is the OUT handle from that create. + +**Wait completion ratio (30s canary trace; 500M-instr ours)**: + +| Runtime | bl/pre-bl | post-bl | completes | +|---------|-----------|---------|-----------| +| canary | 9 | 9 | 100% | +| ours | 7 | 6 | **6/7 = 85%** | + +The 7th wait in ours stalls. **Stalled handle = `0x00001454`** +(audit-040 family). Bl-PC 0x822DFC34 returns 0 fires in our HIR +(`bl` is a control-flow terminator, probe elided); pre-bl +`0x822DFC30 addi r4,r0,-1` fires 7× (fair comparison). The 7th +pre-bl fire (cycle 48,849) has no matching post-bl. 
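The completion ratio above comes from pairing each pre-bl fire with the post-bl fire that follows it. A minimal sketch of that pairing, assuming a single waiter so fires can be matched in order. `pair_waits` is a hypothetical helper, and every cycle number in the sample trace is a placeholder except 48,849, which is the unmatched fire reported above.

```python
def pair_waits(events):
    """Pair each 'pre' fire with the next 'post' fire, in trace order.

    events: iterable of (kind, cycle), kind in {'pre', 'post'}.
    Returns (completed_pairs, unmatched_pre_cycles).
    """
    pending = []    # cycles of 'pre' fires still waiting for a 'post'
    completed = []  # (pre_cycle, post_cycle) pairs
    for kind, cycle in events:
        if kind == "pre":
            pending.append(cycle)
        elif kind == "post":
            if not pending:
                raise ValueError(f"post-bl at cycle {cycle} with no pre-bl")
            completed.append((pending.pop(0), cycle))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return completed, pending


# Placeholder trace: six matched waits, then the unmatched seventh.
events = []
for i in range(6):
    events.append(("pre", 1000 * (i + 1)))
    events.append(("post", 1000 * (i + 1) + 500))
events.append(("pre", 48849))

completed, stalled = pair_waits(events)
ratio = len(completed) / (len(completed) + len(stalled))
print(len(completed), stalled, f"{ratio:.1%}")  # 6 [48849] 85.7%
```

An unmatched pre-bl fire is a hint rather than proof of a stall: it shows only that the post-bl probe did not fire, not that the wait never returned.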
+ +**Outcome (i) confirmed**: handle-namespace divergence is +**load-bearing**. + +**Signaler identified**: probed canary KeSetEvent (0x8284DDDC, 20588 +fires, 0 on F80000CC/C0 — takes KEVENT*, not handle) and NtSetEvent +(0x8284DF5C, 9245 fires, **2 on F80000CC/C0**). Both fires LR=0x824AA304 +inside wrapper sub_824AA2F0 (89 static callers). **Signaler = +NtSetEvent** (xboxkrnl ord 246). + +**Cross-check ours**: NtSetEvent at 0x8284DF5C fires 3334 times in ours; +**1 fire on `r3=0x00001454`** at cycle 3,519,453 (after the stall at +cycle 48,849). So signaler IS firing — bug is NOT pure +signaler-missing. + +**Bug class refinement (provisional)**: δ-namespace AND δ-wakeup. The +signal exists but doesn't wake the waiter. Candidate causes: + +- Handle table recycles slot 0x1454 between create-epochs in our impl + (so signal hits a *different* KEVENT than wait registered for). +- KeSetEvent / wait-queue machinery has a missed-wake (signal-before- + wait race ruled out: signal at 3.5M is AFTER wait at 48,849). + +**Recommended audit-042** (autonomous, two-track): + +1. Probe sub_824AA2F0 entry; capture LR + r31 per fire on r3=0x1454. + Names the actual signaler caller chain. +2. Dump handle table state for slot 0x1454 at cycles 48,849 (wait) and + 3,519,453 (signal). If different KEVENT pointers → handle aliasing + bug in our `xenia_kernel::handle_table` (slab recycle between + NtCreate/NtClose). If same → bug in `KeSetEvent` / wait-queue. + +Both fixes ≤60 LOC. xenia-rs source unmodified, no commit, master +HEAD `d8766c6` unchanged. Tests 645. Trace +`audit-runs/audit-041-wait-site/`. + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested (wait-completion-ratio canary vs ours). +2. Canary patch applied + reverted at session close. +3. xenia-rs source unmodified, no commit. +4. Single-step (data-gathering only, no fix attempt). +5. Trace files saved per audit dir convention. 
+ +## KRNBUG-AUDIT-042 — handle 0x1454 lifecycle disambiguation (READ-ONLY, 2026-05-09) + +Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files); +reverted at session close (canary HEAD `6de80dffe` clean). xenia-rs +master `d8766c6` unchanged. Tests 645. + +**Goal**: disambiguate audit-041 root cause (A) handle-recycling vs +(B) wakeup-plumbing for handle 0x1454's missed wakeup. + +**Method**: ours via `--trace-handles-focus=0x1454` (existing +audit.rs infrastructure); canary via `--log_lr_on_pc=0x8284DF1C` +(NtCreateEvent thunk, ord 209) + cross-reference to audit-041's +existing `canary-bl-0x822DFC34.log` containing canary's +`Added handle:/Removed handle:` lifecycle markers. + +### Allocator architecture (decisive structural finding) + +`KernelState::alloc_handle` (state.rs:588-593) is a **monotonic +atomic counter** initialized to `0x1000`, advanced via +`fetch_add(4)`. **Bump-only — no recycling, ever.** `nt_close` +(exports.rs:1869) decrements refcount and removes the object from +`state.objects`, but **NEVER returns the handle ID to the pool**. + +This makes root cause (A) — handle-recycling — **structurally +impossible in ours**. + +### Handle 0x1454 lifecycle in ours (`-n 500M`, two reruns identical) + +``` +created: cycle=0 tid=13 lr=0x824a9f6c src=NtCreateEvent kind=Event/Manual + stack: lr=0x822dfc94 (caller — audit-041's sub_822DFC74) + ← 0x822e0344 ← 0x822d2ca4 ← 0x822de768 ← 0x821c4b1c +wait: cycle=0 tid=13 lr=0x824ac578 src=do_wait_single +signal: cycle=0 tid=5 lr=0x824aa304 src=NtSetEvent +wake: cycle=0 tid=5 src=wake_eligible_waiters/auto +final: waiters=0 signaled=true signal_attempts=1 waits=1 wakes=1 +``` + +(`cycle=0` is a separate, pre-existing audit-instrumentation gap +— `KernelState::audit_entry` reads `scheduler.ctx(0).timebase` +which is 0 in this build. Counts/ordering still authoritative +because rings are append-only.) 
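The structural argument under "Allocator architecture" can be illustrated with two toy models. This is a sketch under stated assumptions, not the code in `state.rs` or canary's `ObjectTable`: both class names are hypothetical, the first handle is assumed to be the counter's pre-increment value (as `fetch_add` returns it), and the free-list model's `0xF8000000` base and LIFO reuse order are illustrative only.

```python
class BumpHandleAllocator:
    """Bump-only model: a counter that only moves forward."""

    def __init__(self, start=0x1000, step=4):
        self._next = start
        self._step = step
        self.live = set()

    def alloc(self):
        handle = self._next        # pre-increment value, like fetch_add
        self._next += self._step
        self.live.add(handle)
        return handle

    def close(self, handle):
        self.live.discard(handle)  # the ID is never handed out again


class FreeListHandleAllocator:
    """Hypothetical recycling model, for contrast only."""

    def __init__(self, start=0xF8000000, step=4):
        self._next = start
        self._step = step
        self._free = []
        self.live = set()

    def alloc(self):
        if self._free:
            handle = self._free.pop()  # reuse a released slot value
        else:
            handle = self._next
            self._next += self._step
        self.live.add(handle)
        return handle

    def close(self, handle):
        if handle in self.live:
            self.live.remove(handle)
            self._free.append(handle)


bump = BumpHandleAllocator()
first = bump.alloc()
bump.close(first)
second = bump.alloc()
print(hex(first), hex(second))  # 0x1000 0x1004

recycling = FreeListHandleAllocator()
a = recycling.alloc()
recycling.close(a)
b = recycling.alloc()
print(a == b)                   # True
```

In the bump-only model a closed handle value can never name a second object, which is why aliasing between create epochs is ruled out on our side. The free-list model shows how the same slot value can come back for a new object, matching the `Added`/`Removed` pattern in the canary log.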
+ +**Single create, single wait, single signal, single wake — fully +consumed.** Handle 0x1454 is **NOT stuck** at end-of-run in this +audit. The end-of-run "Handle waiter lists" section names the +actually-stalled handles: `0x1004 0x1020 0x1544 0x1578 0x10a0 +0x12ac 0x1040 ...` — all ``. **0x1454 is +not among them.** + +### Handle 0xF80000CC family lifecycle in canary + +From audit-041's `canary-bl-0x822DFC34.log` (debug-helper output +around `ObjectTable::Add/Release`): + +``` +Added handle:F80000CC for XObject (ctor — fresh KEVENT slot) +NtDuplicateObject(F80000CC, ...) × 3 (handle-table dup) +TRACE-PC-LR pc=822DFC34 r3=F80000CC (wait fires on live KEVENT) +NtClose(F80000CC) (after wait completes) +Removed handle:F80000CC for XEvent (slot freed) +Added handle:F80000CC for XEvent (NEW KEVENT, SAME SLOT VALUE) +NtClose(F80000CC) → Removed → Added × 4 more iterations +``` + +**Canary RECYCLES handle slots heavily**: `F8000098` reused 130×, +`F80000D0` 95×, `F80000DC` 71×, `F80000C0` 10×, `F80000CC` 5× in a +single 30s window. Canary's `ObjectTable::AllocateHandle` (per +`xobject.cc/object_table.cc`) is a slab/free-list allocator; ours +is bump-only. + +### Decisive disambiguation + +| | ours | canary | +|---|---|---| +| handle 0x1454 NtCreateEvent fires | **1** | N/A (different namespace) | +| handle 0xF80000CC `Added handle:` | N/A | **5+** within 30s | +| recycling? | **NO** (structurally impossible) | **YES** (slab) | +| audit-041 stall handle 0x1454 | wait+signal+wake recorded in `--trace-handles-focus` rerun | — | + +**Verdict: ROOT CAUSE IS NOT (A) HANDLE-RECYCLING.** + +Sub-conclusion on audit-041's premise: under +`--trace-handles-focus=0x1454 -n 500M`, handle 0x1454's wait+signal +DO complete (1 wake recorded). audit-041's "wait NEVER returns" +inference came from `--lr-trace`-only data (post-bl missing for the +7th iteration); but `--quiet` suppressed the end-of-run audit dump +in audit-041, so the wait-completion was never directly verified. 
+The lr-trace miss can be explained by: lr-trace records the +**guest-side resume PC after `bl`**; if KeWaitForSingleObject's +return path bypasses that PC (e.g., direct context-restore on +wake), the post-bl trace doesn't fire even though the wait +completes. **audit-041's load-bearing premise is provisionally +falsified for handle 0x1454 specifically.** + +### Real wedge points + +The actual stalled handles per this run's end-of-run dump: +- `0x1004` Event/Manual (tid=11 parked, 0 signals) +- `0x1020` Event/Manual (tid=3 parked, 0 signals) +- `0x1040` Event/Auto (tid=5 parked via WaitMultiple, 0 signals) +- `0x1544` Event/Manual (tid=17 parked, 0 signals) +- `0x1578` Event/Auto (tid=19 parked, 0 signals) +- `0x12ac` Semaphore (tid=14, 15 parked, 0 signals) +- `0x10a0` Event/Auto (tid=6, 0 signals) + paired `0x10a4` Semaphore + +All carry ``. These are γ-class +missing-signaler candidates — distinct from 0x1454. + +### Bug-class refinement + +**δ-wakeup ruled out** for 0x1454 (wake DID fire). **δ-namespace +ruled out** (single create, no aliasing). **The wedge is on a +different handle set** — needs re-pivot. + +### Sharp 4-dim cascade prediction (for any audit-043 fix targeting the *real* stalled handles, e.g. `0x1004` or `0x10a0`) + +- **A**: handle 0x1004's `signal_attempts` goes 0 → ≥1 (signaler + named; KE/Nt SetEvent or KeReleaseSemaphore reaches it). +- **B**: tid=11 transitions out of `Blocked(WaitAny [4100])` to + Ready/Exited; thread-list shrinks by ≥1 stalled thread. +- **C**: dependent waiters (any handle whose creator/signaler is + gated by tid=11) start firing — measurable as `` + count drops by ≥2 across the trail set. +- **D**: `swaps` advances past 2 OR `draws` flips from 0 to >0. + *Probability*: lower (γ-cluster activation is the audit-009 + plateau; multiple gates must fall, only one is being addressed). + +### Recommended audit-043 + +**Pivot**: re-target audit on the **actually-stalled** handles per +this session's end-of-run dump. 
Ranked by likely impact: + +1. **`0x10a0` Event/Auto + `0x10a4` Semaphore on tid=6** — + Event+Semaphore pair is canonical "worker waits for job; + producer hasn't run." Trace tid=6's entry PC and producer chain. +2. **`0x12ac` Semaphore (2 waiters: tid=14, tid=15)** — semaphore + never released; `KeReleaseSemaphore` source is the target. +3. **`0x1004` Event/Manual on tid=11** — earliest-created stalled + handle. Its non-signaling caller chain is the bottom-up gate. + +For each: run with `--trace-handles-focus=` to capture the +created stack and identify the producer-side function. Canary +cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or +`0x8284DDDC` (KeSetEvent) filtering for the equivalent canary +handle at that PC + LR signature. Patch budget unchanged (≤60 LOC). + +**Bug class for audit-043**: **γ (missing signaler)** — primary +candidate. **NOT δ-namespace, NOT δ-wakeup.** The handle-namespace +divergence (audit-040) appears to be benign per this audit's +finding that 0x1454 actually completes. The real stalled handles +are γ-class (signaler-missing on a *different* event/semaphore). + +### Discipline gate (5/5 PASS) +1. Hypothesis explicitly tested (recycling vs plumbing for 0x1454). +2. Canary patch applied (30 LOC) + reverted at session close. +3. xenia-rs source unmodified, no commit. +4. Single-step (data-gathering only, no fix attempt). +5. Trace files saved: `audit-runs/audit-042-handle-lifecycle/ + {probe.log, probe-run2.log, canary-create-0x8284DF1C.log}` + (~11.5 MB) + cross-ref of audit-041's existing + `canary-bl-0x822DFC34.log`. + +### Status +- Tests: 645 (unchanged). +- Lockstep: instructions=100000004 unchanged (no source mods). +- Master HEAD: `d8766c6` (unchanged). +- Canary HEAD: `6de80dffe` (clean, post-revert). + +--- + +## KRNBUG-AUDIT-043 — record +0x00 writer, allocator-VA divergence (READ-ONLY, 2026-05-09) + +**Status**: READ-ONLY single-step. Master `d8766c6` unchanged, canary patch reverted. Tests 645 unchanged. 
+ +### Goal + +Identify the writer of `+0x00` at records `0x40542300/0x40542340/0x40542400/0x405424c0` in our impl. Audit-039 reported ours has `0x67616D65` ("game" inline) while canary has `0xF80000B8` (kernel handle) — claimed to be the most fundamental layout divergence. + +### Method + +Mem-watch on `+0x00,+0x04` of all 4 records (`-n 500_000_000`). Group writers by (PC, LR). Look up containing functions in `sylpheed.db`. Disasm + caller chain. Apply audit-030 LR-trace patch to canary; probe writer PC `0x825F1080` (memcpy) and pool-init PC `0x82152728`. + +### Findings + +**The writer of `0x67616D65` is `memcpy` at `pc=0x825F1080`, called from `memcpy_s` (`sub_825ED588`, return = `lr=0x825ED608`)**, invoked from `std::basic_string::reserve_then_assign` (`sub_8216E138+0xC8`). 16 fires across 4 records. + +**The records are NOT layout records** — they are 64-byte slots in a Sylpheed-managed pool allocator: +- `sub_821505D8` (called from `sub_8280C42C`) allocates ~58 MB via `sub_824A8858` (size `0x03A723D0`, type `0x20000004`). +- `sub_82152570` builds 4 free-list buckets; `sub_82152728` chains 64-byte slots over a 1.25 MB span. +- Slot-size table at `sub_821505D8+0x10`: 4, 16, 32, 64, 96, 128, 160, 192, 256. +- The "filenames" land in 64-byte slots when a Sylpheed `std::string` is heap-promoted from SSO (capacity ≥ 0x10). + +### Canary cross-trace + +Probed `pc=0x825F1080` in canary (audit-030 `--log_lr_on_pc` patch reapplied): +- 94,945 fires in 25s. **Zero hits to `0x40542xxx`**. Destinations distribute over `0x705Dxxxx` (76674), `0x7033xxxx` (6642), `0xBC36xxxx` (1211), etc. +- Top LR `0x824AB1D4` (84,400×, an alloc-related path absent in our trace). +- Canary's matching `LR=0x825ED608` (memcpy_s caller) fires 1,782× — **none target `0x40542xxx`**. + +Pool-init `pc=0x82152728` in canary fires once with `r3=0xBC32C880` — **canary's 58 MB pool BASE = `0xBC32C880`**; ours' is `~0x40541xxx`. 
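To see why a promoted filename string plausibly lands in a 64-byte slot, the slot-size table can be combined with a size-class lookup. This is a rough sketch: the policy "smallest slot that fits the request" is an assumption (the pool's real selection logic was not disassembled here), `slot_size_for` is a hypothetical helper, and the real `std::string` allocation size depends on its capacity rounding rather than on the exact character count used below.

```python
import bisect

# Slot-size table reported at sub_821505D8+0x10.
SLOT_SIZES = [4, 16, 32, 64, 96, 128, 160, 192, 256]


def slot_size_for(request_bytes):
    """Smallest slot that fits the request (assumed policy), else None."""
    if request_bytes <= 0:
        raise ValueError("request must be positive")
    i = bisect.bisect_left(SLOT_SIZES, request_bytes)
    return SLOT_SIZES[i] if i < len(SLOT_SIZES) else None


path = r"game:\hidden\Resource3D\Common.xpr"
needed = len(path) + 1  # characters plus a terminating NUL
print(len(path), slot_size_for(needed))  # 34 64
```

Under these assumptions any buffer of 33 to 64 bytes maps to the 64-byte bucket, consistent with the filenames appearing in 64-byte slots.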
+ +### Bug-class refinement + +**Audit-039's "0xF80000B8 vs 'game'" is a VA-equality fallacy.** The same guest VA `0x40542300` backs *different live data* in the two emulators because their host-side allocators return different VAs for the same `sub_824A8858` call. Ours: 64-byte std::string heap buffer. Canary: a kernel-handle / NotifyListener slot at *its* unrelated VA. + +**Class = ε (host-allocator address-space divergence)**, not a guest-write bug. There is no missing/wrong write at `+0x00` in our impl. + +**Reading-error ledger update**: 12th entry — *VA-equality fallacy across emulators*. Comparing memory contents at identical guest VAs assumes both allocators return the same VA for the same logical allocation; Sylpheed's pool factory makes this assumption false in general. Future audits comparing two emulators' guest memory must compare *logical allocations* (resolved through the producing allocator), not raw VAs. + +### Recommended audit-044 + +Drop the "record at 0x40542300+" line of investigation entirely. + +Re-pivot to audit-042's actually-stalled-handle plan: +1. `0x10A0` Event/Auto + `0x10A4` Semaphore on tid=6 — producer chain. +2. `0x12AC` Semaphore (tid=14, tid=15 waiters) — `KeReleaseSemaphore` source. +3. `0x1004` Event/Manual on tid=11 — earliest-created stalled handle. + +Each: `--trace-handles-focus=` for create/wait stack; canary cross-trace via `--log_lr_on_pc=0x8284DF5C` (NtSetEvent) or `0x8284DDDC` (KeSetEvent) on equivalent handle. + +**Bug class for audit-044**: **γ (missing signaler)** — same target as audit-042's recommendation; audit-043 did not move the cluster, but eliminated a false-positive line of investigation. + +### Discipline gate (5/5 PASS) + +1. Hypothesis explicitly tested (writer-of-+0x00 isolated; canary equivalence checked). +2. Canary patch applied (30 LOC audit-030 base) + reverted at session close (`git status` clean, config TOML restored to `log_lr_on_pc = 0`). +3. xenia-rs source unmodified, no commit. +4. 
Single-step (data-gathering only). +5. Trace files saved: `audit-runs/audit-043-record-zero-offset/{mem-watch.log, mem-watch.stdout, canary-825f1080-traces.txt.gz (95k LR records), audit-043-canary-poolinit.log}`. + +### Status + +- Tests: 645 (unchanged). +- Lockstep: instructions=100000004 (unchanged). +- Master HEAD: `d8766c6` (unchanged). +- Canary HEAD: `6de80dffe` post-revert clean.