3f8d3b6f1c7039de74e300bf8eb1e34f3e93090c
53 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
3f8d3b6f1c |
[iterate-3AD] Fix 2nd splash logo rendering black: re-upload evolving atlas
The publisher (SQUARE ENIX) and the 2nd developer/studio splash logo share
one K8888 atlas at physical base 0x4dbee000, sampled at different UVs. The
publisher's white text occupies the top V-bands; the developer logo's
(bluish/gold) artwork is CPU-written into the SAME surface AFTER the publisher
frame, so the atlas evolves across frames.
The UI host texture cache (`texture_cache_host::upload`) only re-uploads a
`TextureKey` when `version_when_uploaded` increases. But the per-draw bind in
`render.rs` hardcoded `version_when_uploaded = 1` for every draw, so once the
atlas was first uploaded (during the publisher frame, with only the top bands
filled) the cache pinned that partial upload. The 2nd logo, sampling a V-band
that was still zero at first-upload time, read transparent-black -> rendered
nothing (the "white-triangle / black stub" the user saw after SQUARE ENIX).
Verdict: (G) a legitimate 2nd LOGO item whose real artwork lives in the same
evolving atlas — NOT a spurious 3rd item, and NOT a geometry/shader/blend gap.
Measured via readback: the 2nd-logo geometry rasterizes correctly (3 on-screen
quads), interp1 (UV) and interp0 (color) reach the PS with real values, the
texture content at the sampled bands exists — only the bound wgpu texture was
the stale partial upload.
Fix (UI-only, deterministic core untouched):
- `gpu_system`: thread the real content `version` (from `span_max_version`)
into `last_draw_textures` (now `(key, version, bytes)`).
- `draw_capture::DrawCapture.textures`: same 3-tuple.
- `render.rs`: use the real `version` (not a hardcoded 1) so the host cache
re-uploads when the guest fills more of the atlas.
- `exports.rs` `vd_swap`: the legacy single-texture `publish_texture` bridge
drops the version (`(key, _v, bytes) -> (key, bytes)`).
Readback (env-gated probe, removed before commit): after the fix the 2nd logo
renders real varied artwork (blue + gold texels in a centered strip) instead
of black. Determinism: `check -n50m --gpu-inline --stable-digest` byte-
identical to the
|
||
|
|
504592ac13 |
[iterate-3O] Real-render slice: replay guest geometry in --ui (Route A)
Replace the synthetic placeholder triangle in the --ui window with the
splash's REAL guest geometry, proving the faithful-render pipe end to end.
Architecture: Route A (UI-side replay). A per-draw capture channel carries
each PM4_DRAW_INDX*'s real state to the UI, which replays it through the
existing wgpu Xenos pipeline. The deterministic headless core is untouched:
capture is gated on an Option<Vec<DrawCapture>> that is None in headless
mode and only enabled on the --ui path, so the --gpu-inline n50m golden is
byte-identical (verified 2x).
The hard part was sourcing real vertices. The WGSL VS already does
format-aware vertex fetch from the b4 storage buffer at the address from the
fetch constant -- but b4 was never populated and the fetch address is an
absolute guest dword address. The slice:
* xenia-gpu/draw_capture.rs: parse the active VS, find its first vertex
fetch, read that fetch constant, copy a bounded window of guest memory
at the fetch base. Best-effort: has_real_vertices=false falls back to
procedural geometry (never fabricated pixels).
* gpu_system.rs: accumulate one DrawCapture per draw into frame_captures.
* exports.rs (vd_swap): drain + publish the frame's captures to the UI.
* ui_bridge/bridge.rs: new publish_geometry channel + UiHandles.geometry.
* WGSL (interp + translator): rebase the absolute fetch address by a new
DrawConstants.vertex_base_dwords so it indexes the uploaded window.
* render.rs: dispatch_xenos_captures uploads each draw's real vertex
window + matching shader, issues real DrawRequests (real prim type,
host vertex count, vs/ps keys).
* app.rs: prefer the real-capture replay; HUD adds real-geo=N counter.
Verified in --ui on Sylpheed: "first Xenos capture batch replayed (real
geometry) captures=24 real_vertex_draws=24" -- all draws resolved a real
guest vertex window; WGSL compiles; no validation errors over 1616 swaps.
Still synthetic-free but not yet pixel-perfect: textures/UVs, DMA index
buffers (auto-index only for now), and kCopy resolve routing are staged
for follow-ups. Faithful: real vertex data, prim types, shaders, constants.
cargo test --workspace green; n50m golden unchanged (2x byte-identical).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
3f5d5cf5f7 |
[iterate-2Z] Implement NtSetInformationFile FileRenameInformation for cache: files
The GamePart title-logo gate first-divergence: Sylpheed's asset cache
decompresses each packed resource to a staging `cache:\<hash><tail>.tmp`
file, then renames it into its final nested path `cache:\<hash>\<dir>\<file>`
(e.g. the title logo texture `\69d8e45c\e\534ffea`) via
NtSetInformationFile class 10 (XFileRenameInformation). Our handler treated
class 10 as a permissive no-op (catch-all `_ => STATUS_SUCCESS`), so the host
rename never happened: the nested target directories were created but left
EMPTY while the decompressed data stayed in the flat `.tmp` file. When the
title later reads back `\69d8e45c\...` to build the logo texture the read
misses, so the textured logo pixel shader (canary `E59B2B3D`, tfetch2D) is
never dispatched and the logo never renders.
Fix: implement class 10 faithfully, mirroring canary
`xboxkrnl_io_info.cc:226` (`X_FILE_RENAME_INFORMATION{ replace_existing@0,
root_dir_handle@4, ANSI_STRING@8 }` -> `file->Rename(TranslateAnsiPath)`).
Read the target path from the embedded ANSI_STRING at info_ptr+8, resolve it
against the host cache backing dir (`resolve_cache_path`), create the parent
dirs, `std::fs::rename` the backing file, and update the handle's `path` +
`host_path`. Non-cache (read-only VFS) sources keep the prior permissive
acknowledge. Verified at runtime: 20 renames/80M now move
`69d8e45ce534ffea.tmp -> 69d8e45c/e/534ffea` etc., and the nested cache tree
now matches canary's HostPathDevice layout byte-for-byte (data present, not
empty dirs).
Made `path::read_ansi_string` pub so the handler can parse the rename target.
Deterministic + golden-invariant: two `check --gpu-inline --stable-digest
-n 50000000` runs are byte-identical and the 50M stable digest is unchanged
(draws=718/swaps=147/6 shaders/tex=0); the logo read-back occurs later than
the observable window so GPU counters at 1B/2.5B are unchanged
(2.5B: draws=48734, swaps=16060, still 6 flat shaders, texture_decodes=0).
The fix is a verified-necessary precondition — without it the nested asset
read-back is guaranteed to miss. A downstream gate (the 2nd title thread's
load-completion post skipped when its notify target `[r29+8]==0`, and the
later read-back phase being beyond 2.5B) remains for follow-up.
New test: `nt_set_information_file_rename_moves_cache_file` (678 total, was
677).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
2f55d1fd7d |
[iterate-2X] Texture pipeline: un-stub RectangleList + draw-time texture decode
Two faithful, deterministic GPU-backend changes that make the texture path
correct for whatever textured draw the splash eventually dispatches. Both are
currently inert on Sylpheed (the textured logo draw is still gated downstream
— see below), but neither shifts the stable-digest golden, so they land safely.
1. Un-stub RectangleList primitive expansion (primitive.rs). The splash submits
2819 RectangleList draws at 200M, all of which were REJECTED by the P3 stub
(`gpu.primitive.rejected{rectangle_list}`) → only ~592 flat point/quad draws
rasterized. Mirror canary's intent (primitive_processor.cc:389-456
kRectangleListAsTriangleStrip) within our CPU index-rewrite idiom: emit each
rect's 3 real vertices as one TriangleList triangle (v0,v1,v2), rejected=false,
faithful host_vertex_count. The full quad (synthesized 4th corner v3=v0+v2-v1)
needs real vertex fetch in vs_main — left as a documented TODO. Rejection
warnings drop 2819→0.
2. Draw-time texture decode keyed off the active PS's real tfetch slots
(gpu_system.rs + exports.rs vd_swap). Previously vd_swap decoded a hardcoded
fetch-constant slot 0 at swap time. Now the DRAW handler parses the bound
pixel shader (ucode::parse_shader), collects its tfetch fetch_const slots via
new shader_metrics::tfetch_slots, reads each 6-dword fetch constant, and
decode+caches it into GpuSystem::last_draw_textures. vd_swap publishes the
first of these (UI binds one texture today), falling back to the legacy slot-0
probe on flat-only frames. New span_max_version helper walks page_version over
the trait (draw-time &dyn MemoryAccess lacks the heap's inherent
max_page_version). Pure function of guest writes — deterministic.
Status: texture_decodes stays 0 on Sylpheed because all 6 live shaders are flat
(no tfetch); canary's textured logo shaders E59B2B3D/F7B1457 are not yet
dispatched by ours (a downstream title-state gate, the next frontier). The full
P5 decode→publish→upload→sample path is already wired; this makes the decode
side key off the real shader instead of a guess.
Validation: stable-digest golden sylpheed_n50m unchanged (draws=718 swaps=147
tex=0), regenerated twice byte-identical; 200M run shows 0 RectangleList
rejections. cargo test --workspace green (677, +2: rectangle_list_expansion,
tfetch_slots_extracts_texture_fetch_constants). No temp hooks. Branch only;
not pushed/merged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
a91f4c550b |
[iterate-2W] Sustain the title present loop: viewport-size register + ISR CPU impersonation
The title's per-frame loop (sub_822F1AA8) is clock-B-paced and only re-fires when the swap count [controller+88] changes, which advances only on source=1 CP swap-complete interrupts. Each present batch the guest submits (via the sub_824CE348 -> sub_824BF4D0 builder) ends with a WAIT_REG_MEM on a per-CPU swap-acknowledge fence [GCTX+0] (GCTX = [device+10772]); the GPU parks there until the graphics ISR (sub_824BE9A0) clears that CPU's bit. Two coupled gaps kept ours emitting only ONE source=1 then dead-locking (draws plateaued at 28, run halted ~19.27M): 1. GPU MMIO register 0x1961 (AVIVO_D1MODE_VIEWPORT_SIZE) read as 0. The swap callback sub_824CE2B8 divides by its low 12 bits (display height) as a refresh-pacing term, so a 0 read tripped its `twi` divide-by-zero guard and aborted the ISR before it reached the fence-clear. Mirror canary GraphicsSystem::ReadRegister (graphics_system.cc:311): return 0x050002D0 (1280x720). 2. The ISR ran on an arbitrary borrowed thread, so [r13+268] (the PCR processor number) did not match the interrupt's target CPU. The ISR clears `1 << current_cpu` from the fence; running on the wrong CPU cleared the wrong bit and the fence (bit 2, from cpu_mask 0x4) never reached 0. Carry the target CPU through the interrupt queue (bit index of the PM4_INTERRUPT cpu_mask for CP, 2 for vsync per canary DispatchInterruptCallback(0, 2)) and impersonate it on the borrowed thread's PCR around the ISR, mirroring canary EmulateCPInterruptDPC -> XThread::SetActiveCpu. With both fixes the fence clears, the GPU drains each present batch, source=1 sustains per-present, clock B advances, and the loop runs continuously. Draws climb linearly with the budget (no re-stall): 50M 28->718, 200M ->3411, 1B ->18734; swaps 2->147/950/6060. No "Unanticipated CPU_INTERRUPT" trap. Inline-deterministic (--stable-digest byte-identical x2); n50m golden re-baselined. 675 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
66bd805726 |
[iterate-2V] VdSwap: stop bumping primary CP_RB_WPTR out-of-band (canary-faithful)
Ours' `vd_swap` wrote its 64-dword XE_SWAP block at the guest's reserved `buffer_ptr` slot AND then bumped the primary ring `CP_RB_WPTR` out-of-band via `state.gpu.extend_write_ptr_by(64)`. That bump was a bug: `buffer_ptr` (~0x4add6efc) is NOT inside the primary ring (base ~0x4adcd000, 8192 dwords) — it lives ~10k dwords past it, in the renderer indirect-buffer region. The bogus WPTR bump pushed the GPU read-pointer PAST the guest's real write-pointer; the drain treated the overshoot as a circular wrap and re-executed the splash's draw indirect-buffers ~2×, inflating draws to 78 (the real splash geometry is ~28 draws; 12 INDIRECT_BUFFERs vs the real 6). Canary's `VdSwap_entry` (xenia-canary xboxkrnl_video.cc:518-548) writes the fetch-constant patch + PM4_XE_SWAP + NOP pad into the reserved slot and returns — it NEVER touches CP_RB_WPTR. The guest advances the primary ring write-pointer itself via its own doorbell once it has populated the slot; swap-complete CP interrupts come only from the game's in-stream PM4_INTERRUPT packets, never from VdSwap. This fix removes only the out-of-band `extend_write_ptr_by(64)` call, keeping the buffer_ptr block write intact and byte-faithful to canary. Effect at `--gpu-inline -n 50M`: draws 78→28, INDIRECT_BUFFER 12→6 (re-execution artifact gone), swaps 4→2. The run now halts at ~19.27M instructions (worker threads exit) instead of spinning to 50M, because removing the corruption unmasks the real per-present-interrupt deadlock — the title loop needs a per-present PM4_INTERRUPT that the stalled game never submits. That deadlock is a SEPARATE, known gate tracked/addressed elsewhere; it is intentionally NOT papered over here. Re-baselined golden crates/xenia-app/tests/golden/sylpheed_n50m.json to the new honest values (regenerated twice, byte-identical). sylpheed_n2m.json is unaffected (draws=0 at 2M). cargo test --workspace: 675 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
873c197ff1 |
[iterate-2T] VdSwap: route present through ring PM4_XE_SWAP, drop out-of-band swap interrupt
Make ours' VdSwap present path faithful to xenia-canary `VdSwap_entry` (xboxkrnl_video.cc:518-548): write the reserved 64-dword ring slot with a PM4_TYPE0 fetch-constant patch + PM4_TYPE3(PM4_XE_SWAP) + NOP padding, then let the natural drain consume the swap packet in command-stream order. Remove the synthetic CP swap-complete interrupt that `notify_xe_swap` raised out-of-band. Root found this session (the actual present-path bug): ours' `notify_xe_swap` pushed an `InterruptSource::Swap` (→ INTERRUPT_SOURCE_CP) interrupt directly from the VdSwap HLE, decoupled from the GPU command stream. When that interrupt reached the graphics ISR `sub_824BE9A0` before D3D had armed its swap-callback slot (`[gfx+10772]+16` still the `0xBADF00D` placeholder), the ISR took its error path and hit the assert "ERR[D3D]: Unanticipated CPU_INTERRUPT. Sign of a corrupt command buffer?" (`bl sub_824C5DF0; twi` at 0x824BE9DC) — 2x per run on master. Canary's VdSwap raises NO interrupt; swap-complete CP interrupts come only from in-stream PM4_INTERRUPT packets, which are naturally ordered after the callback-arming Type-0 writes. Routing the swap through the ring packet matches that ordering and eliminates the trap (2 -> 0). Canary oracle confirmation (muted, audit_mem_watch + audit_jit_prolog_pc): canary's early/loading loop is present-driven — swap counter [gfx+15160] (0xBE56CA38) advances ~per-vblank from vblank 65 onward, reaching 0xD02 (3330) in ~60s via 6184 CP source=1 interrupts, with VdSwap called only ONCE. So the present interrupts are entirely in-stream, not from the VdSwap export. This is a correctness/faithfulness fix; it does NOT cascade. draws stay 78 at 200M and 1B because the upstream gate persists: the game submits one render batch then stalls (renderer sub_82506xxx 0x; 2nd title thread 0x821748F0 never spawns). The per-frame loop sub_822F1AA8 runs ~1207 iterations on vsync but clock B (swap count) only advances ~once, so the manager update sub_821741C8 fires once. That is the iterate-2Q/2F title-pipeline gate, not a present/ interrupt bug. swaps 3 -> 4 (the in-stream PM4_XE_SWAP now drains). Deterministic in inline mode (n50m --gpu-inline --stable-digest regenerated byte-identical twice; golden re-baselined: swaps 3 -> 4). cargo test --workspace 675 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
034ec8b47f |
[iterate-2O] GPU: drain indirect buffers correctly — Sylpheed renders splash (draws 0→78)
Ours' GPU never drained the D3D driver's system command buffer past the first 11-dword indirect buffer, so DRAW_INDX / reg-0x57C-arm packets never executed and draws stayed 0 (the long-hunted render gate; see UPDATE-18). Runtime tracing (temporary, removed) showed the guest submits 6 INDIRECT_BUFFER packets at boot (CP_RB_WPTR 22→37) but ours executed exactly ONE IB and then spun 15.7M packets inside it. Three coupled command-processor bugs, all corrected to match canary: 1. `sync_with_mmio` applied the primary CP_RB_WPTR to whichever ring was active, including an executing indirect buffer — `37 % 11 = 3` clobbered the IB's write pointer so its read pointer looped 0→2→5→0 forever and never popped back to the primary ring. CP_RB_WPTR governs ONLY the primary ring; while an IB executes, the primary is the bottom of the IB stack. Canary executes each IB through a separate `RingBuffer reader_` (command_processor.cc), so the primary write pointer is structurally inapplicable to an IB. 2. Indirect buffers were treated as circular rings: read wrapped at `size_dwords` (`11 % 11 = 0`) and never reached the fixed write pointer, so even without the clobber the IB could not terminate. An IB is a fixed *linear* sub-stream; add `RingBufferView.indirect` and drain `[0, ib_size)` monotonically, then pop. 3. `is_ready` only checked the active ring, so an IB that now correctly exhausts would never get `execute_one` called again to pop back to the primary ring (whose WPTR may have advanced). Check the whole IB stack. Also: the ring was sized `1 << size_log2` bytes (1024 dwords) vs canary's `1 << (size_log2 + 3)` (8192 dwords) — an 8× undersize that desynced WPTR-wrap math from the guest. Fixed in `GpuSystem::initialize_ring_buffer` (and the dead bookkeeping copy in `vd_initialize_ring_buffer`). Cascade (deterministic; threaded-default backend, byte-identical across runs): reg 0x57C now written, IB jumps 1→12, packets 15.7M→9,825, and the splash renders — draws 0→78, shaders 0→3, render_targets 0→2, swaps 2→3 — stable at 50M / 200M / 1B. Boot then reaches a new downstream gate (draws plateau at 78, interrupts keep climbing → engine alive, not deadlocked). golden `sylpheed_n50m.json` re-baselined (draws 78). `cargo test --workspace` green (674; +2 ring_view regression tests). vd_swap's synthetic-swap short-circuit is now redundant but left untouched (cascade works without changing it); cleaning it up is a separate follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
93f60a3ba0 |
[iterate-2M] PCR+0x10C (PRCB.current_cpu): init per-HW-thread to unwedge spin-barrier
Ours never initialized the PRCB `current_cpu` byte at PCR+0x10C (prcb_data@0x100 + current_cpu@0xC). Canary sets it from `GetFakeCpuNumber(affinity)` (xthread.cc:847 `pcr->prcb_data.current_cpu = cpu_index`), which equals the HW thread id ours already writes at PCR+0x2C. Left unwritten it read 0 for every thread. Guest spin-barrier `sub_824D1328` (used by the audio/update pump threads at entries 0x824D2878 / 0x824D2940, ours tid 9 / tid 10) indexes a per-HW-thread occupancy byte array via `lbz r11, 268(r13)` then `stbx ..., [array+index]`. With index 0 for all threads, every thread marked slot 0; the multi-byte rendezvous signature it then spins on (`ld [obj+0x164]` compared against the packed per-slot expectation) could never assemble. Both pump threads busied at pc 0x824d140c/0x824d1410 forever (Ready, 5M+ barrier iterations) and never ran their `KeSetEvent` loops — so the events they signal (the 21k-per-thread heartbeat in canary) never fired, starving the downstream worker handshake. Fix: write `hw_id` to PCR+0x10C alongside PCR+0x2C in both the static thread image init (thread.rs) and the dynamic PcrWriter (state.rs, used by scheduler spawn + affinity migration) so the two stay in sync. Runtime-verified BOTH engines. Post-fix the pump threads escape the barrier (barrier iterations 5M+ -> 3) and advance into their loop bodies, now correctly Blocked(WaitAny) at pc 0x824d28d0 / 0x824d29c0 (was spinning at 0x824d140c). imports at n50M 339,766 -> 451,508; deterministic (two cold runs byte-identical). draws still 0 (a later, separate render gate). golden re-baselined. cargo test --workspace: 672 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
2bdb93e51e |
[iterate-2K] GPU physical-mirror aliasing: ring/IB/RPtr/resolve read wrong host region
Root cause (physical-mirror aliasing gap → GPU read wrong region → ring never truly drained → render worker ring-space wait → no frame → no draw): The Xbox 360 maps its 512 MB of physical DRAM into several virtual mirror windows differing only in cache policy — bare physical (0x0xxxxxxx), write-combine (0x4xxxxxxx), and cached 0xA/0xC/0xExxxxxxx — all aliasing addr & 0x1FFF_FFFF. Ours has one flat membase and `heap_alloc` (MmAllocatePhysicalMemoryEx) commits physical backing in the 0x4xxxxxxx window. The guest masks its CP-ring allocation base to bare physical (0x4adcc000 & 0x1FFFFFFF = 0x0adcc000) before handing it to VdInitializeRingBuffer, and PM4 INDIRECT_BUFFER / writeback / resolve pointers are likewise bare-physical. Ours stored those verbatim and read `membase + 0x0adcc000`, a never-committed zero-filled page — so the GPU drained ~718k zero PM4 headers, never executed the real Type3/DRAW stream, and the RPtr writeback landed on a zero page the render worker (tid=8) polls, freezing it forever. Fix (GPU/Vd-boundary translation, not memory-layer): add `physical_to_backing(addr)` deriving the committed backing exactly from `heap_alloc`'s placement (0x4000_0000 | (addr & 0x1FFF_FFFF), idempotent for the WC window, flat for non-physical code/stack). Apply it at every point the GPU/kernel consumes a guest physical address: ring base (initialize_ring_buffer), RPtr writeback (enable_rptr_writeback), PM4 INDIRECT_BUFFER pointer, WAIT_REG_MEM / COND_WRITE memory poll+write, REG_TO_MEM / MEM_WRITE / EVENT_WRITE* / LOAD_ALU_CONSTANT / IM_LOAD addresses, the resolve dest write, and the vd_swap frontbuffer present read. This was chosen over memory-layer aliasing because the latter re-projects every CPU load/store and corrupts the guest's flat 0xA/0xC/0xE accesses (it caused an early PC=0xfffffffc fault). Two adjacent GPU-backend gates this exposed and also fixed (canary-faithful): - WaitCmp::from_wait_info was off by one vs canary's MatchValueAndRef selector (it decoded wait_info&7==3 as NotEqual instead of Equal), inverting the standard CP coherency wait so the GPU parked forever on the first INDIRECT_BUFFER. Remapped to 1=Less..7=Always, 0=Never. - Added MakeCoherent: a WAIT polling COHER_STATUS_HOST clears the status bit (mirrors command_processor.cc:801-838) so the coherency handshake resolves. Result: the GPU now decodes the real Type3 packets at 0x4adcc000 (ME_INIT, INDIRECT_BUFFER → real Type0/WAIT_REG_MEM at 0x4adf5080) instead of zero-headers; RPtr at 0x408619fc advances (0x13, 0x16, … written by the GPU worker); the frame loop sub_822F1AA8 actively writes the controller at 0x40d09a40 (0x20→0x21→0x23); no fault, full 200M/1B budget runs clean. draws_seen is still 0: the remaining gate is upstream and separate — the main frame loop never sets controller bit-28 (frame-ready) at [0x40d09a40] (stalls at 0x23, the known iterate-2C state-divergence gate), so the guest never enqueues a render IB; the GPU only ever replays the init IB. This fix correctly unblocks the GPU ring/IB/RPtr data path (gate-2 GPU backend); the bit-28 frame-ready gate is the next target. Stable golden (sylpheed_n50m) unchanged (draws/swaps/RTs/shaders identical at 50M); regenerated twice byte-identical. cargo test --workspace: 672 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
ed2e0e72fd |
[iterate-2J] KeTimeStampBundle deterministic tick: fix frozen+mislaid guest clock
The xboxkrnl data export KeTimeStampBundle (ordinal 0x00AD, import slot
0x820007d0 — confirmed via sylpheed.db imports table) was set up with TWO
defects in the import-patch pass:
1. FROZEN: the block was written once at boot and never updated, so every
field stayed a constant for the whole run (observed: the guest's clock
reader sub_824AA830 = [[0x820007d0]+0x10] returned a constant
0x01d6bc0c from 5M..150M instructions).
2. WRONG LAYOUT: it stuffed the FILETIME high-dword at +0x10. The canonical
X_TIME_STAMP_BUNDLE (xenia-canary kernel_state.h) is:
+0x00 interrupt_time u64 (100ns since boot)
+0x08 system_time u64 (FILETIME 100ns since 1601)
+0x10 tick_count u32 (milliseconds since boot)
+0x14 padding
so [block+0x10] is tick_count in ms, not a FILETIME dword.
Fix (deterministic, no wall-clock):
* Initialize the block with the correct field layout (tick_count = 0 at
boot, system_time = FILETIME base, interrupt_time = 0).
* Store the block VA on KernelState::timestamp_bundle_addr during the
import patch.
* Add KernelState::update_timestamp_bundle(mem, clock) and call it every
round in BOTH the lockstep (run_execution) and parallel
(run_execution_parallel) outer loops, right where the deterministic
Scheduler::global_clock is advanced. The clock is the retired-instruction
monotonic global_clock, so every guest-visible time value stays a pure
function of guest progress (lockstep byte-reproducible).
* Cadence: 1 global_clock unit = 100ns (coherent with parse_timeout, which
divides 100ns timeouts by 100 onto the same basis), so
INSTRUCTIONS_PER_MS = 10_000. tick_count now advances 0 -> ~4999ms over
a 50M-instruction window. Also make KeQuerySystemTime read the same
100ns clock instead of a frozen FILETIME constant.
Verification: tick_count at 0x40002010 now advances (deadline arm at
0x82450d0c stores clock+66 = 0x260,0x269,...,0x51d,... advancing, vs the
frozen 0x01d6bc4e before the fix). Determinism: two cold --stable-digest
runs are byte-identical; the n50m golden is UNCHANGED (the clock-affected
counter is not in the stable digest). 672/672 tests pass.
HONEST CAVEAT — the predicted render cascade did NOT materialize on this
branch. The diagnosed consuming gate at 0x82450b10 (the clock-vs-deadline
compare in the worker-hub channel loop sub_82450A68) is unreachable here:
the loop always branches away at 0x82450b0c ([this+220] >= channel-index),
so the hub already dispatches sub_82450B68 342x in BOTH the frozen and
fixed builds. Guest trajectory (imports 339766@50M / 1738001@200M /
9212446@1B), draws (0), swaps (2) and thread topology (tid14 Ready, not
blocked on 0x109c) are identical frozen-vs-fixed. This commit is therefore
a correct latent-clock-bug fix and determinism-safe prerequisite, NOT the
render unblock. The 0x109c/tid14 starvation premise was not reproduced at
f75bc96; the next gate must be re-localized.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
||
|
|
7e2603a9e5 |
[iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix)
Lockstep livelocked the scheduler the same way --parallel did before
|
||
|
|
5aaadfec36 |
[iterate-2E] Add XENIA_AUDIT_DEREF pointer-chase probe
On each AUDIT-PC-PROBE fire, treat gpr[reg] as a base object, dump its first 64 bytes, follow [base+off] to a sub-object, dump that, then follow [[base+off]+0] to its vtable and dump 48 slots. Env-gated (XENIA_AUDIT_DEREF=<reg>:<off>), read-only, lockstep digest unaffected. Captures the live work-item + stream object + vtable at sub_824510E0 before the pool recycles the slot — which overturned the prior session's "infinite spin" diagnosis: the streaming read PROGRESSES 68/68 128KB chunks of a 9MB file, then the hub (tid=5) blocks INFINITE on a self-created Event/Manual (0x1060) that is never signaled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
0332d1990d |
[Track 2] Parallel-scoped global clock fixes timebase-desync livelock
In --parallel mode a long run livelocked: the scheduler spun "advanced to deadline 3000 waking hw=2 idx=0" ~14k times in microseconds. Root cause: each guest thread owns ctx.timebase (+1/instr in step_block), and all kernel deadline arithmetic read Scheduler::ctx(hw_id).timebase as "now". But the parallel worker extracts its PpcContext via mem::replace(ctx_mut_ref, PpcContext::new()) — leaving a ZEROED timebase in the slot while it steps unlocked — and advance_all_timebases_to only walks runqueue (never idle_ctx). So the coordinator's coord_pre_round drain and a woken thread's parse_timeout could read a zeroed/stale basis decoupled from the deadline the scheduler just advanced to. The thread re-armed the same constant deadline forever; the global clock never moved. Fix: add a single monotonic Scheduler::global_clock, advanced by the per-block retired-instruction count on each parallel writeback and floored up by advance_all_timebases_to. Kernel deadline reads route through KernelState::now_basis_at(hw_id), which returns global_clock ONLY when parallel_active; lockstep keeps reading the exact pre-existing ctx(hw_id).timebase expression, so the deterministic lockstep trace is byte-identical (sylpheed_n50m golden unchanged, zero re-baseline). Verified: - 50M --parallel run completes (was: hung). Deadlines now strictly increasing 5.4M -> 49.1M (18097 unique of 18116; max repeat 2) vs pre-fix constant 3000 x ~14000. - sylpheed_n50m golden byte-identical via plain `check` (no persist). - Full suite 665/665 green. Note: an intermittent parallel hang/crash (~1-2/20 at -n 5M) is pre-existing (master 1/20, this build 2/20 — within noise) and distinct from the timebase livelock: it is a parallel-race class (e.g. the unsafe block_ptr deref in run_execution_parallel). Tracked separately; lockstep remains the recommendation for long runs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|
|
b20c99f141 |
[Subsystem-fixes] 6 verified ours-vs-canary divergence fixes
From the 2026-06-12 5-subsystem differential audit. All verified against canary as oracle; 660/660 workspace tests green (655 + 5 new). 1. nt_create_event polarity (exports.rs) — `manual_reset = gpr[5] != 0` was INVERTED. Canary xboxkrnl_threading.cc:668 `Initialize(!event_type,..)` + xevent.cc:41 (type 0 = NotificationEvent = manual, type 1 = Sync = auto). Now `== 0`. Was the dormant 2.AI fix on chore/portable-snapshot, never merged. The Ke-path was already correct; only the Nt-path was wrong. 2. 2.AF deadline drain (main.rs coord_pre_round) — expired KeWait/KeDelay deadlines never fired under load because advance_to_next_wake_if_due was only called in coord_idle_advance (no-Ready-threads path). Added a per-round drain loop; covers BOTH lockstep and parallel outer loops since both call coord_pre_round. Was the dormant 2.AF fix, never merged. 3. handle slab-recycle ABA guard (state.rs + scheduler.rs) — release_handle_slot (my round-34 regression) recycled a closed slot even with a thread still parked on it, risking a stale-waiter wake when the slot is re-minted. Added Scheduler::any_thread_waiting_on; decline to recycle a still-waited slot. 4. vpkpx pixel-pack (vmx.rs) — wrong field mapping (~100% mismatch). Now exact canary ppc_emit_altivec.cc:1795 shift/mask (red 6b out[15:10] from w[24:19], green out[9:5] from w[14:10], blue out[4:0] from w[7:3]; no fabricated alpha bit). +unit test. 5. VFS GDFX attribute plumbing (vfs/*, exports.rs query fns) — VfsEntry now carries the real on-disc attribute byte (GDFX dirent +12, canary disc_image_device.cc:136/154) instead of inferring directory-ness from path shape. Query exports report the real FILE_ATTRIBUTE_* bits. Candidate driver of the XamShowDirtyDiscErrorUI gate. +tests. 6. MmGetPhysicalAddress region-aware mirror (exports.rs) — flat 0x1FFFFFFF mask missed canary's +0x1000 host_address_offset for 0xE0000000+ mirror (memory.cc:2317). Read-only query; proven byte-identical 50M digest. +test. Investigated and intentionally NOT changed: - zero-on-recommit: no-op; ours has no region-reuse path (bump allocators, free is a stub). - 32-bit ALU writeback truncation (PPCBUG-020): documented-deliberate; premise (MSR.SF=0) is questionable but flipping it is out of scope here. - KeSetEvent/NtSetEvent return value: ours returns true previous state (hardware-faithful); canary returns constant 1 — NOT an ours bug. sylpheed_n50m golden will need re-baselining (legit behavior change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
db90ad0f7d |
[AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis
Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0, start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128) and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY instructions. Env-gated; default off. SR4 verdict B (partial unwedge): - handle 0x1078 signal_attempts 0->1 - tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108 - ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15) - New downstream wedges 0x1084 + 0x1088 - cxx_throw runtime_error on tid=5 inside R26 dispatcher (BST not-registered instance lhs=0x715a7af0) - VdSwap stays 1; no draws (POC is diagnostic, not final fix) Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST instance upstream; this POC only does (a). Reading-error class #20: ctx.lr at kernel export entry is the thunk wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain 1 step to get frames[1].lr. Reading-error class #21: --parallel and lockstep have SEPARATE outer loops in main.rs (run_execution_parallel line 2928 vs run_execution line 2706). Per-round hooks must be wired in BOTH paths. Files: - crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context fields + spawn() population + current_thread_entry_and_ctx() helper - crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint, maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals - crates/xenia-kernel/src/exports.rs: hook in nt_create_event - crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops - audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md Tests 655/655 green. Default behavior byte-identical when env unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
229b46c765 |
[Kernel] Slab-recycle handle allocator (AUDIT-059 R34)
Adds a FIFO free list of closed handle slots so alloc_handle returns recycled IDs before bumping next_handle. Mirrors canary's slab-style ObjectTable: F8000098 reused 130x per 30s window in canary, but ours' monotonic bump allocator never reused slots — so a recycled slot in canary maps to a fresh, never-reused slot in ours, drifting kernel object identity per AUDIT-042's analysis. release_handle_slot is wired into nt_close's refcount==0 branch and gated to the canonical [0x1000, 0xF000_0000) range so synthetic XAudio park handles (AUDIT-048) are never recycled. Verified: all 655 workspace tests green, smoke tests at -n 50M show NtClose 115/run with handle table renumbering active (round-34 max handle 0x12ac vs round-16 baseline 0x12b8 over same workload). γ- cluster #2 wedge unchanged — silph wait still parks tid=13 on the renumbered handle (4216=0x1078 here vs 0x12a4 baseline), confirming the wedge is independent of allocator policy. Lands as a parity fix to bring our kernel-object identity in line with canary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
40f208ea4e |
[2.BF] Silph WorkerCtx: install canary's real sub-vtable at [+0x2C][0]
Round-21 pivot of the audit-059 synth-spawn module. Round 20 made the
silph::WorkerCtx workers run by attaching a 32-slot stub sub-vtable
where every entry was a `li r3, 0; blr` stub — workers spawned but
spun forever because slots 15/17 short-circuited to NULL ("no work").
Round 21 reads canary's real sub-vtable VA out of the XEX `.rdata` —
`0x8200A168` — and points `[sub_object + 0]` at it directly. The
vtable bytes live in the static image both engines map, so no guest
memory is consumed and slot 15 (= `sub_824FCCC8`) and slot 17
(= `sub_824FCE38`) — the only slots `sub_82506B08` ever calls —
become working game methods.
Discovery method (canary probes in
`audit-runs/audit-059-handle-disambiguation/round21-subvtable-canary/`):
1. `--audit_jit_prolog_pc=0x82506B08` to catch the first WorkerCtx
virtual-dispatch entry; `[r3+0x2C]` revealed the sub-object VA.
2. Re-run with `--audit_jit_prolog_mem_dump=<sub-obj VA>` to deref
`[sub-object + 0]` = sub-vtable VA = 0x8200A168.
3. PE inspection (`xex-text/xex-rdata` is the static image) reads
all 31 slots; slot 15 -> sub_824FCCC8, slot 17 -> sub_824FCE38.
Smoke metrics (50M instructions, `XENIA_CACHE_PERSIST=1
XENIA_SILPH_SYNTH=1`, audit-runs/audit-059-handle-disambiguation/
round21-real-vtable/):
* 4/4 workers spawned, no crash, no new fault
* KeSetEvent 633885 -> 431860 (-32%)
* KeWaitForSingleObject 258441 -> 185762 (-28%)
* Per-handle state unchanged on the focused stalled set
(0x1020/0x1090 still `<NO_SIGNALS_DESPITE_WAITS>`,
0x12a4/0x12ac/0x1218/0x1224 still `<UNCREATED>`).
* No VdSwap/draws progression observed in this window.
Verdict: B (partial). The workers no longer spin in a stub-loop —
internal call density shifted — but the focused wedge handles still
don't get signalled. Likely root cause: workers may now be waiting
on the WorkerCtx's own KEVENTs (which we synthesised at
+0x54/+0x94) for upstream work that no producer is enqueuing.
Net LOC: 29 ins / 31 del. Tests: workspace passes (lockstep app
tests, kernel 127/127, hir 288/288, scheduler 38/38).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8683fb59ed |
[2.BF] Silph WorkerCtx: synthesize sub-object + vtable at [+0x2C]
Audit-059 round 19 isolated the round-18 worker fault: the four silph::
WorkerCtx worker bodies all execute the sequence
lwz r3, 44(rN) ; r3 = [ctx+0x2C] — sub-object pointer
lwz r11, 0(r3) ; r11 = sub-object vtable
lwz r11, 60(r11) ; r11 = sub-object vtable[15]
mtctr r11
bctrl
Ours left [ctx+0x2C] NULL → PC=0 fault on first virtual dispatch. Round 19
recommended materialising a sub-object whose vtable points entirely at an
existing trivial-return stub so workers idle live, returning NULL work,
without crashing.
Changes (silph_synth.rs only, +63/-6):
- Grow SILPH_CTX_SIZE 0x500 → 0x800 to embed sub-object at +0x300 and a
32-slot sub-vtable at +0x500 in the same heap_alloc.
- After ctx header init, write sub-object pointer at [ctx+0x2C], the XEX-
resident wrapper constant 0xBE568F00 (round-7 finding) at [ctx+0x30],
and leave [ctx+0x28] NULL (matches canary first-fire snapshot).
- Populate every slot of the 32-entry sub-vtable with VA 0x8216CAA4, the
first 4-byte-aligned standalone `li r3, 0; blr` stub located by a fresh
PE-text scan (preceded by a `blr` terminating the previous function).
- Sub-object body itself is zero-filled apart from the [+0]=vtable_ptr
write; round-19 disassembly confirms workers only touch slots 15/17.
Smoke (XENIA_SILPH_SYNTH=1, persistent cache, 5e7 instr):
- Lockstep: no crash, all 4 workers (tid=6/7/8/9) reach Ready in deep
worker-body PCs (0x825067xx/0x825089xx/0x825091xx). Verdict (D) —
workers run their idle loop returning NULL; existing silph waiters
(0x1020, 0x1090) remain <NO_SIGNALS_DESPITE_WAITS> because we
deliberately neutered productive work.
- Parallel: identical picture, no PC=0/PC=garbage fault anywhere.
No regression in 765-test suite.
Next round: feed real work-items into the intrusive ring at ctx+0x210
so workers' returned-NULL idle becomes returned-work productive; or
discover which sub-vtable slots actually need real callees (slot 15
worker drain, slot 17 producer).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b5885b8560 |
[2.BF] Synthetic silph::WorkerCtx spawn (round 18 — opt-in landing)
Adds infrastructure to synthesise the silph::WorkerCtx that AUDIT-058/059 identified as never reached by ours' static-init chain (real chain entry sits in audit-059 round 9's wrong-vtable wedge at sub_82172BA0+0x1E8). Ctx layout follows round 5's live hexdump from canary: +0x00 vtable = 0x8200A1E8 +0x04 self +0x08 intrusive list head -> self +0x0C init flag = 1 +0x10 packed byte field +0x18 2x float ~1.0 (UI rates) +0x24 flag = 1 +0x28..+0x30 3x foreign-arena pointers (left NULL — see below) +0x54..+0x84 4x X_KEVENT auto-reset, state=0 +0x94..+0xC4 4x X_KEVENT manual-reset, state=1 (pre-signaled) +0x210..+0x250 4-entry intrusive work-ring, empty Worker spawn mirrors AUDIT-048's audio-worker pattern in xaudio_register_render_driver: per-worker allocate_thread_image + state.scheduler.spawn with r3 = ctx_ptr. Trigger fires at the first dat/* VFS open (ours' earliest is dat/files.tbl), which is when canary runs the equivalent chain. ROUND 18 OUTCOME — opt-in only: With workers spawned Ready (XENIA_SILPH_SYNTH=1), boot CRASHES at cycle ~5.5M with PC=0 on hw=1, just after worker_3 (entry 0x825065B8) spawns. Per task constraints this is STOP-and-report: the ctx fields +0x28/+0x2C/+0x30 (foreign heap pointers — canary's 0x30057018, 0xBCE25640, 0xBE568F00, distinct arenas per audit-059 round 7) are left NULL, and the worker bodies plausibly dereference one of them. Synthesising those is a fresh investigation (round 19+). With workers spawned Suspended (XENIA_SILPH_SYNTH=suspend), boot completes normally (11 spawns, VdSwap=1, KeSetEvent=2, KeReleaseSemaphore=1 — matches default baseline). The ctx remains materialised in guest memory at the logged VA for downstream probing. Default (env var unset): no synth, no regression. Files: crates/xenia-kernel/src/silph_synth.rs (new, 225 LOC) crates/xenia-kernel/src/lib.rs (+1 LOC, register module) crates/xenia-kernel/src/exports.rs (+37 LOC, hook in open_vfs_file) crates/xenia-kernel/src/state.rs (+18 LOC, 4 silph_synth_* fields) Tests: cargo test --release --workspace = 765 pass / 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
9340ff4592 |
[Audit] --audit-r3-dump-bytes: dump N bytes at r3 when probe fires
AUDIT-059 round 15 — diagnostic. When `--audit-r3-dump-bytes=N` is set, every `--audit-pc-probe-hex` fire emits a paired `AUDIT-R3-DUMP` line with N bytes of guest memory from r3 as u32 lanes (4-byte aligned, cap 256B). Sized for the 80-byte stack-local struct at sub_82452DC0's `r31+96` (probe sub_8245B000 entry where r3 IS the struct ptr). Settable via `XENIA_AUDIT_R3_DUMP_BYTES` env. Read-only; lockstep digest unaffected (empty-set fast path in fire_audit_pc_probe_if_match). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bcd018659b |
[Audit] --audit-mem-dump-chain: deref a guest address N levels for diagnosis
Round-14 of AUDIT-2BF (singleton-dump). The bctrl at sub_822F1AA8+0x90
(PC 0x822F1B4C) loads [0x828E1F08] (a global singleton), dereferences
its vtable, and indirect-calls vtable[0]. Canary returns; ours hangs.
To name the resolved target we need to dump the (singleton, vtable,
vtable[0]) chain on probe firing.
Adds `--audit-mem-read-hex` / `XENIA_AUDIT_MEM_READ` taking a single
guest VA. When set and any `--audit-pc-probe-hex` PC fires, the kernel
emits a paired `AUDIT-MEM-READ` line with three guest reads:
AUDIT-MEM-READ addr=0x828E1F08 val=<*addr> vtable=<**addr> \
vtable[0]=<***addr+0> vtable[24]=<***addr+24> ...
`vtable[24]` is included as the slot-6 method (audit-059 round 9
documented the canary silph chain dispatching slot 6 of a vtable here).
Read-only; lockstep digest unaffected. ~30 LOC across state.rs and
main.rs. `cmd_check` opts out of the flag (same policy as the existing
audit_pc_probe_hex).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
09e59e09b7 |
Audit-2BF.delta: add --audit-pc-probe-hex for silph-init bctrl probe
Adds a per-PC probe analogous to --lr-trace / --branch-probe but tuned
for the silph init chain's virtual-dispatch site at sub_82172BA0+0x1E8
(PC 0x82172D88, the bctrl after a 3-deep `lwz` chain that loads vtable
slot 6). Each fire emits one AUDIT-PC-PROBE line with (pc, tid, hw,
cycle, lr, r3, r11) plus four guest-memory dereferences off r3 — the
vtable, slot-6 method pointer, auxiliary handle field, and embedded
sub-object vtable — so the line can be compared head-to-head with
canary's round-9 capture (r3=0xBCCC52C0, [r3+0]=0x820A3644,
slot6=sub_821B55D8, [r3+0xC]=0xF80000D8, [r3+0x30]=0x820A1870) to
identify whether ours dispatches to the wrong vtable on a correct
object (case A) or to a wrong object entirely (case B).
Why this addition rather than reuse of an existing probe: --lr-trace
emits JSONL designed for canary-side diffing and only captures
r3/r4/r5/r6/lr (no memory dereferences); --branch-probe captures CR
flags and lr but again no memory; --ctor-probe is single-shot per PC
and walks the stack back-chain. None of them load the four indirect
fields needed to identify a vtable-shape divergence.
Implementation:
- state.rs: new HashSet<u32> field `audit_pc_probe_pcs` and helper
`fire_audit_pc_probe_if_match(hw_id, mem)`. Empty-set fast-path
keeps the cost to one is_empty() check per worker_prologue call
when the flag is unused. Read-only — no guest state mutation,
lockstep digest unchanged.
- main.rs: new CLI flag --audit-pc-probe-hex with bare-hex comma
parsing (tolerates `0x` prefix), settable also via
XENIA_AUDIT_PC_PROBE env var. Threaded through cmd_exec_inner;
cmd_check passes None so check digests are unaffected.
Probe wired into worker_prologue alongside fire_ctor_probe / fire_-
branch_probe / fire_lr_trace. Like its siblings, it fires once per
basic-block entry — known limitation (audit-045 reading-error class
13); use a block-entry PC if probing a mid-block instruction.
Verification: kernel 127/127, app 5/5 non-ignored, no behaviour
change with empty flag.
Cross-references audit-059 round 9's canary capture and lays the
groundwork for the round-10 ours-side comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9a93152981 |
Iterate-2.BE: host-driven synchronous graphics ISR delivery
Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.
Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.
Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
callback path (audit-048) still uses the asynchronous LR-sentinel
inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
* dispatch_graphics_interrupts(kernel, mem, &mut stats,
&mut decode_cache, thunk_map): new fn. Drains the full FIFO per
call. Victim selection same shape (Ready preferred, else
Blocked, skip Idle/Exited/ServicingIrq), but the call is
synchronous - we run step_cached + import-thunk dispatch inline
on the borrowed ctx until pc == LR_HALT_SENTINEL.
MAX_INSTRS_PER_ISR = 1M safety budget.
* coord_pre_round: graphics-IRQ injection call removed. Audio
path unchanged (still calls try_inject_audio_callback).
* run_execution + run_execution_parallel: each now owns a
persistent isr_decode_cache and calls
dispatch_graphics_interrupts after coord_pre_round.
* try_inject_graphics_interrupt: deleted (118 LOC).
No new public APIs, no new dependencies, no changes to xenia-cpu.
Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2a8ff9515d |
AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence
Track A — FILE_DIRECTORY_FILE handling. NtCreateFile's 9th parameter
`create_options` (sp+0x54 per shim_utils.h:49-50) is now read and
forwarded to open_vfs_file/open_cache_file. When the
FILE_DIRECTORY_FILE bit (0x1) is set on a `cache:\<hash>` path,
the host-side handler `mkdir -p`s instead of `File::create`'ing a
0-byte sentinel that blocked subsequent hierarchical creates of
`cache:\<hash>\<sub>\<leaf>` with NAME_COLLISION. Confirmed by
`opts=0x4021` (incl. FILE_DIRECTORY_FILE) on `cache:\d4ea4615`
and `opts=0x4020` (no DIR bit) on the leaf `.tmp` files. NtOpenFile
forwards `open_options` (r8) into the same slot per
xboxkrnl_io.cc:118-122. Closes the AUDIT-053 ζ-class VFS layout
aliasing wedge.
Track B — opt-in persistent cache root. AUDIT-038's per-process
tmpdir + wipe stays the default (preserves lockstep/oracle
determinism + dodges Sylpheed's `<hash>.tmp` journal-append-on-
reboot self-inconsistency). Persistence is now opt-in via
* `XENIA_CACHE_ROOT=<path>` — explicit path (caller manages
wiping); hands a stable place to drop a canary-built cache
for cascade A/B oracle work.
* `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache`
(or `$HOME/.local/share/xenia-rs/cache`).
Cold-start (-n 500M, default tmpfs) with FILE_DIRECTORY_FILE fix:
swaps=1 draws=0 imports=40454 cxx_throw=0 — matches master baseline,
no regression. Cache hierarchy now mkdir-p'd correctly: `cache:/`
contains 9 hash dirs (e.g. `d4ea4615/e/`, `aab216c3/5/`) instead
of the 0-byte sentinel files AUDIT-053 found masquerading as
directories.
LOC: +88 / -14 = +74 net (≤80 budget). All 127 xenia-kernel unit
tests pass.
Trace: audit-runs/audit-054-vfs-layout-fix/
cold-start-digest.json + warm-start-digest.json (defaults)
persist-cold-digest.json + persist-warm-digest.json (opt-in)
baseline-master-digest.json (master
|
||
|
|
49f3eafa15 |
AUDIT-032: dedicated audio worker thread per client (Plan B)
Replaces APUBUG-PRODUCER-001's random-victim-hijack audio injection with a dedicated per-client guest worker thread, mirroring xenia-canary's apu/audio_system.cc:84-159 WorkerThreadMain pattern in xenia-rs's threading model. Audio callback ticker is now safe to enable by default. ## What changed - xenia-kernel/src/xaudio.rs: new XAudioState fields worker_handles + worker_refs (one slot per of XAUDIO_MAX_CLIENTS=8). Synthetic park-handle helper (0xF000_0000 | client_idx) — outside the normal alloc range so wake_eligible_waiters never finds it; the only legitimate state-flip is via try_inject_audio_callback. - xenia-kernel/src/exports.rs: xaudio_register_render_driver spawns a 64KB-stack guest thread (create_suspended=true) via state.scheduler.spawn after registration succeeds. Immediately flips the spawned thread's state from Blocked(Suspended) to Blocked(WaitAny[synthetic]) so it's parked but not woken. Stores the kernel handle so find_by_handle resolves a fresh ThreadRef after slot compaction. Failure paths log + leave xaudio.worker_refs[i] = None, in which case the ticker drops fires (no random-victim fallback). - xenia-app/src/main.rs: try_inject_audio_callback resolves the worker via worker_handles[index] instead of scanning runqueues for a Ready or Blocked victim. The PC+r3 injection and SavedCallbackCtx capture are unchanged; the existing LR_HALT restore path re-blocks the worker on its synthetic handle for the next tick. Flag handling reworked: --xaudio-tick / XENIA_XAUDIO_TICK now act as explicit override (truthy = force on, falsey = force off, absent = use the KernelState default). - xenia-kernel/src/state.rs: xaudio_tick_enabled default flipped from false to true. Pre-fix it was off because the random-victim hijack regressed swaps=2->1; with the dedicated worker that whole class of regression is gone. ## Cascade verification at -n 500M (audit-runs/audit-048-audio-host-pump/) Pre-fix baseline: audit-runs/audit-047-gamma-wedges/ours-end-state.log. | Dim | Predicted (AUDIT-032) | Observed | |-----|-------------------------------------|---------------------------------| | A | tid=9 leaves Blocked[0x828A3254] | Ready @ pc=0x824d1404 | | B | tid=10 leaves Blocked[0x828A3230] | Ready @ same pc/lr | | C | XAudioSubmitRenderDriverFrame > 0 | Mixer setup path executed | | D | KeReleaseSemaphore 0 -> non-zero | 0 -> 1; xaudio.callback.delivered=1 | Bonus: audit-042's tid=6 worker pair on 0x10A0+0x10A4 also went Blocked->Ready as a downstream effect. Boot trajectory shifted significantly: NtWaitForSingleObjectEx 1,489,791 -> 30; NtSetEvent 3,334 -> 68; new exports firing (StfsCreateDevice, ObCreateSymbolicLink, XamContentCreateEnumerator, XamEnumerate, XamTaskSchedule, ExCreateThread x10, KeSetAffinityThread x7, NtCreateSemaphore x4, NtWaitForMultipleObjectsEx x94, NtDuplicateObject x14, XeCryptSha, XeKeysConsolePrivateKeySign). The system left the audio-wait busy loop and entered the savegame/content/crypto init phase. swaps regressed 2 -> 1 (degenerate splash repeat lost; main thread now advances past splash entirely, blocked on a different handle). draws unchanged at 0 — expected per AUDIT-032 (audio gate != renderer gate). ## Tests + scope - cargo build --release succeeds, no new warnings. - cargo test -p xenia-kernel --lib: 127/127 pass (incl. xaudio). - cargo test -p xenia-app --lib: 5/5 non-ignored pass. - Lockstep goldens (sylpheed_n2m / sylpheed_n50m) WILL drift on this fix and need re-baselining as a follow-up commit. 75 net non-comment LOC across 4 files, well under AUDIT-032's 60-120 LOC budget. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
77034b6cbf |
audit-038: persistent cache:/* VFS via host-FS backing
Replaces the "Synthesized empty file" stub for cache:/* paths with a
real host-FS HostPathDevice-style mount. Each KernelState gets a fresh
per-process tmpdir under /tmp/xenia-rs-cache-<pid>-<id>/ which is
cleared on init for lockstep determinism (mirrors canary's
xenia_main.cc:649 RegisterSymbolicLink("cache:", "\\CACHE") +
HostPathDevice in xenia-canary/src/xenia/vfs/devices/host_path_device.cc).
NtCreateFile now honours create_disposition for cache: paths:
FILE_OPEN -> NOT_FOUND if missing
FILE_CREATE -> NAME_COLLISION if present
FILE_OPEN_IF -> open or create
FILE_OVERWRITE_IF -> create or truncate
FILE_OVERWRITE -> NOT_FOUND if missing, else truncate
FILE_SUPERSEDE -> create or truncate
NtReadFile / NtWriteFile / NtSetInformationFile (XFileEndOfFileInformation)
/ NtQueryInformationFile / NtQueryFullAttributesFile route through
std::fs against the per-handle host_path; non-cache paths keep their
legacy semantics (read-only disc image, synth-empty stubs).
Verified by audit-037 cascade:
- sub_82459D18 (cache-miss restore): 0 fires (was firing constantly)
- sub_8245D230 (resize/zero-fill): 0 fires (was firing constantly)
- 105+ real cache-file writes per 500M run; 4+ MB of game data persisting
to disk per boot; cache:/recent, cache:/access, cache:/d4ea*.tmp, etc.
- Lockstep deterministic at instructions=100000004 / imports=987485
across 3+ reruns (digest shifted as expected; goldens re-baselined).
- swaps=2 plateau still in place; cluster L1 unactivated. Cascade
dimension D (cluster activation) — UNKNOWN, no L1 fires.
Tests 640 -> 645 (+5 cache-specific unit tests; full workspace green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5af792c9fc |
M8+M9+M10+M11+M12: LOW-tier milestones — funcptr-arrays, EH flag, TLS, lr-trace
Five LOW-priority milestones bundled. Total ~700 LOC across 11 files. ## M9 — has_eh derived from pdata.flags exception bit - New `functions.has_eh BOOLEAN NOT NULL` column. Derived from M1's already-parsed `pdata.flags` (bit 31 of the packed word — the exception-handler-present flag, distinct from bit 30 which is the always-1 32-bit-code flag). Index idx_functions_has_eh. - Sylpheed: 2,975 of 23,073 pdata-validated functions have EH (12.9%). ## M10 — .tls section / IMAGE_TLS_DIRECTORY32 parser - New `xenia_xex::tls::parse_tls` parses the directory + zero-terminated callback array. Returns None when the binary has no .tls section. - New `tls_info` (singleton row) + `tls_callbacks(slot, address)` tables. - New `DbWriter::write_tls()` no-ops on None. - Sylpheed has no .tls section → 0 rows; infra ready for binaries with __declspec(thread). ## M8 + M11 — function_pointer_arrays (dispatch tables + static initialisers) - New `xenia_analysis::funcptr_arrays::analyze` widens M3's vtable scan: detects runs of ≥2 function pointers in .rdata and classifies each as `vtable` (M3 re-emit), `dispatch_table` (M8), or `static_init` (M11) via a constructor-prologue heuristic (mfspr + small stwu). - New tables `function_pointer_arrays(address PK, length, kind)` and `function_pointer_array_entries(array_address, slot, function_address)`. - Sylpheed: 722 vtables + 388 dispatch_tables = 1,110 arrays / 6,347 slots. 0 static_init detected (Sylpheed's ctors don't all match the conservative heuristic; M11.5 future work can chain via the entry- point's static-init driver). ## M12 — --lr-trace runtime canary-diff harness - New CLI `exec --lr-trace=PC[,PC,...]` and `--lr-trace-out=PATH` flags. Symbolic resolution (Class::method, Class::*) via M4 lookup. Env vars XENIA_LR_TRACE / XENIA_LR_TRACE_OUT also work. - New `KernelState::lr_trace_pcs` + `lr_trace_writer` + helper `fire_lr_trace_if_match(hw_id)` invoked from the per-instr probe slot. - JSONL output: pc/tid/hw/cycle/r3/r4/r5/r6/lr — superset of what xenia-canary's --log_lr_on_pc patch emits, with a cycle counter for cross-run reproducibility. Diff-friendly via `jq`. - Lockstep digest unaffected: smoke test on entry-point PC fires once with cycle=0/lr=BCBCBCBC/all-GPR-zero (correct initial state). Tests 636→640 (+2 TLS tests, +2 funcptr_arrays tests). Schema golden updated for new tables + has_eh column. Lockstep determinism preserved (instructions=2000005 ×2 reruns identical). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
690943ceef |
gate dump-section reads on is_mapped; trim doc comments
Without the page-state guard, read_bulk faulted on PROT_NONE pages of the 4 GiB host reservation. Per-page is_mapped check skips uncommitted pages, leaving the buffer's leading zero bytes in place. Total LOC budget after trim: 70. |
||
|
|
08d41cf2fc |
add --dump-section=BASE:LEN:PATH for end-of-run guest memory snapshot
Drives byte-level memory diffs against canary's Memory::Save dump. Hot-path zero-cost when absent; lockstep digest unaffected (instructions=100000003 deterministic across reruns). |
||
|
|
c03f2bc9e2 |
fix(kernel): ensure_dispatcher_object writes XObj signature + handle (canary mirror)
Mirrors canary's `XObject::StashHandle` (xobject.h:253-256): on first adoption of a guest dispatcher header, stamp +0x08 with the kXObjSignature fourcc 'X','E','N','\0' and +0x0C with the stash handle (here the guest pointer itself, since our shadow table is keyed by ptr). Audit-023/024A documented divergence at addresses such as 0x828F4838 where canary stores "XEN\0" + handle but we left zeros. Lands as canary-correctness restoration; cascade impact at -n 500M is nil per the discipline gate (no sharp prediction tied to the writeback). Lockstep determinism preserved: instructions=100000003, imports=987516, swaps=2, draws=0 across 2 reruns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
76dfe7fd7a |
fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror
Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
5d2401f9c5 |
fix(xam): XamUserGetSigninState returns SignedInLocally=1 for user 0
Mirrors canary xam_user.cc:90-101. User 0 returns 1 (SignedInLocally), all other indices return 0. Replaces stub_return_zero registration that was reaching guest-side branches looking up signin state. Tests: 599 -> 600. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
b78e6fd205 |
fix(kernel): KRNBUG-IO-004 — real XamNotifyCreateListener + XNotifyGetNext per canary
Canary's RegisterNotifyListener (kernel_state.cc:1013-1033) auto-enqueues four
startup notifications on the first listener whose mask covers kXNotifySystem
(SystemUI=0x09 + SystemSignInChanged=0x0A) and kXNotifyLive
(LiveConnectionChanged=0x02000001 + LiveLinkStateChanged=0x02000003). XNotifyGetNext
(xam_notify.cc:22-96) pops the queue with mask + version filtering on enqueue per
xnotifylistener.cc:38-51. Our prior stubs returned 0 forever; the dispatch loop
at 0x822f1be8 in sub_822F1AA8 was thus bypassed indefinitely.
Implementation:
- KernelObject::NotifyListener { mask, max_version, queue, waiters } variant.
- KernelState::has_notified_startup + has_notified_live_startup gates.
- xam_notify_create_listener: mask=r3 (qword), max_version=r4 (clamped <=10),
alloc handle, conditional 4-tuple startup enqueue.
- xnotify_get_next: handle/match_id/id_ptr/param_ptr in r3..r6; pop_front
(or scan-by-id), with mask + version filter applied at enqueue time.
- 5 unit tests covering: full-mask 4 startup notifications, second-listener
no re-fire, system-only mask filtering, max_version=0 too-new drop,
unknown handle returning 0.
Tests: 594 -> 599. Lockstep `-n 100M` instructions=100000012 deterministic
across 2 reruns; bit-identical run-to-run diff.
Cascade (verified at -n 500M):
- dispatch arm 0x822f1be8 fires; sub_82173DC8 entered.
- 3/21 renderer-cluster L1 PCs newly reached: 0x822c6870 (2 workers),
0x824563e0, 0x823ddb50.
- canary-only export delta 7 -> 3 (reclassified to fired:
KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule).
- worker thread count 18 -> 20.
- signal_attempts on handle 0x15e0 = 1 (primary=1), was 0.
- draws=0 still expected at this step.
LOC: 119 (97 impl + 22 scaffolding pattern matches across main.rs / objects.rs
/ state.rs) <= 120.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
a1a7265f29 |
fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl
Replace the stub_success registration of NtDeviceIoControlFile at
exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive
geometry) and 0x74004 (partition info), mirroring xenia-canary
xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004
response with cache_size=0xFF000 at OUT+8 is the gate that lets
sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11
XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007.
Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the
caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox
360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area
sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export
in the codebase to need 9+ args.
Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held):
- XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓
- XamTaskSchedule count 0 → 1 ✓
- canary-only exports 7 → 3 (audit predicted ≤3) ✓
- 0x15e0 semaphore signal_attempts 0 → 1 (bonus)
- 0x100c worker spawn DID NOT fire (still UNCREATED) ✗
- 0x1004 signal_attempts unchanged ✗
- Worker spawn count unchanged at 19 ✗
Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡
run3, byte-identical). instructions=100000010 → 100000019, imports
407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists.
sylpheed_n50m golden re-baselined instructions=50000004→50000003,
imports=407362→407255. sylpheed_n2m unchanged.
Still canary-only after this fix: ExTerminateThread,
KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream
gate is somewhere past XamTaskSchedule's completion path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c51f51f9cb |
feat(kernel): KRNBUG-AUDIT-007 — --branch-probe instrumentation; sub_824A9710 exit gate identified
Sister to --pc-probe / --ctor-probe but emits a single compact one-line BRANCH-PROBE record per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6 flags) with no back-chain. Designed for tracing every conditional-branch fire inside a candidate-gate function so the last PC reached before the function epilogue identifies the exit branch. Runtime trace at audit-runs/audit-007/sub_824A9710-trace.log decisively identifies the priv-11 gate: - Exit branch: 0x824a9944 (post bl sub_824ABD88 first call) - Responsible kernel call: NtDeviceIoControlFile, FsCtlCode=0x74004 (registered as stub_success at exports.rs:90) - Mechanical chain: stub returns 0/SUCCESS without writing OUT, game reads [out_buf+8], finds zero, assigns hardcoded 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND) at sub_824ABD88:0x824abea8-ac, exits via 0x824a9944's lt branch before priv-11 site at 0x824a99a0. 592→592 tests; lockstep instructions=100000010, swaps=2, draws=0 deterministic across reruns. Read-only diagnostic — no fix this session. Next session: KRNBUG-IO-003 (real NtDeviceIoControlFile per canary NullDevice::IoControl for FsCtlCodes 0x70000 + 0x74004). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7675035082 |
fix(kernel): KRNBUG-IO-002 — vol-info class-3 returns 0x10000 alloc unit (canary NullDevice)
`nt_query_volume_information_file` class-3 (`FileFsSizeInformation`) was returning sectors_per_unit=1, bytes_per_sector=2048 (alloc unit 2048). Replaced with canary's NullDevice byte-identical values sectors=0x80, bps=0x200 (alloc unit 0x10000), with total / available allocation units lowered to 0x10 / 0x10 to match. Reference: xenia-canary/src/xenia/vfs/devices/null_device.h:38-46 (`NullDevice::sectors_per_allocation_unit()` and `bytes_per_sector()`); consumed by canary's `NtQueryVolumeInformationFile_entry` at xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:355-365. Tests 591 → 592 (added `nt_query_volume_information_file_class3_returns_64k_alloc_unit`). Lockstep `instructions=100000010, swaps=2, draws=0` deterministic across two `--stable-digest -n 100M` reruns. sylpheed_n50m oracle still matches its existing golden — observably a no-op at -n 50M. The audit-006-predicted 7→0 cascade did NOT fire (canary-only exports still 7, identical set; XexCheckExecutablePrivilege still priv=0xA only; XamTaskSchedule still 0). All 16 NtQueryVolumeInformationFile calls in our 500M trace originate from a single LR 0x82611f38 and complete successfully — vol-info is therefore not the priv-11 gate. The fix value is correct (canary-byte-identical) but is not load-bearing for the gate; landing it anyway because it's the right value and unblocks no regression. Stop condition triggered per the IO-002 task brief — no second fix this session. Next-session: --pc-probe on sub_824A9710 entry to find the actual upstream gate. See `audit-findings.md` (KRNBUG-IO-002 entry) and `audit-runs/post-IO-002/` for the full diagnostic trail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bef9793aec |
feat(kernel): KRNBUG-IO-001 — NtReadFile on synth-empty file returns SUCCESS+0, not EOF
AUDIT-005's static attribution to sub_824ABA98 was wrong. The 0xC0000011
(STATUS_END_OF_FILE) at lr=0x824a97e4 traces to the NtReadFile call at
0x824a9810 inside sub_824A9710 — the cache-loader reads 1024 B from
offset 2048 of `\Device\Harddisk0\partition0`. Our synth-empty fallback
returned EOF (start_pos 2048 > size 0), so the function bailed via
RtlNtStatusToDosError before sub_824ABA98 was ever called.
Canary mounts partition0 to a NullDevice; `NullFile::ReadSync`
([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc))
returns X_STATUS_SUCCESS with bytes_read=0 and never touches the
buffer. Sylpheed's caller pre-zeroes the 1024-byte stack buffer
(`memset(sp+208, 0, 1024)` at sub_824A9710 prologue), validates a
"Josh" magic on the first read, and falls back to the cache-recreate
path when the magic doesn't match.
The fix mirrors NullFile semantics: when the open synthesized a
zero-length file (`data.is_empty() && size == 0`), NtReadFile returns
SUCCESS with information=0 and the buffer untouched.
Effects (chain-of-effects verification at -n 500M):
- tests: 590 → 591 (added regression covering NullDevice semantics)
- lockstep: deterministic across 3 reruns (same instructions=100000010,
swaps=2)
- sylpheed_n50m golden re-baselined: instructions 50000004→50000000,
imports 407416→407362
- canary kernel-call diff: 10 → 7 missing exports
(XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile
now run; the cache-recreate path executes through to NtWriteFile)
- boot reaches silph::Silph::Impl::OnInit: 19 worker threads spawn
(was 6 before the fix)
- parked-handle 0x1004 still signal_attempts=0; the original 0x100c
and 0x15e0 are now <UNCREATED> because cascade walked past them and
the handle assignments shifted; new parked sites: 0x12fc/0x1600/
0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044
- draws=0 plateau persists; renderer is multi-causal blocked
Next blocker: per the canary-only diff, XamTaskSchedule + the cluster
of XAM exports (XamTaskCloseHandle, XamUserReadProfileSettings,
ObCreateSymbolicLink) and the post-thread-exit chain (ExTerminateThread,
KeReleaseSemaphore, KeResetEvent) are the next-up frontier.
|
||
|
|
19659d7f76 |
feat(kernel): KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI), not 0x16
Mirrors canary's cvars::avpack default (xam_info.cc:35) and Sylpheed's
accepted set {3,4,6,8} (xam_info.cc:250-251). With KRNBUG-XEX-001 having
flipped the priv-10 gate, XGetAVPack now reaches its caller in
sub_824AB578; returning 0x16 caused Sylpheed to abort the AV/crypto
block before XeCryptSha. Cascade walks one step (canary-only export
list 11 → 10); sub_824ABA98 is the next candidate.
Tests: 589 → 590. Goldens re-baselined (n50m: 50000005→50000004,
imports 407417→407416). Lockstep deterministic across 3 reruns at
-n 100M (instructions=100000010, import_calls=987686 +2.4×, swaps=2).
9-PC producer probe still 0×; parked handles 0x1004/0x100c/0x15e0
still signal_attempts=0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1a892d4641 |
feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap
Replace stub_return_zero with a canary-faithful implementation that returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS (key 0x00030000) bitmap. Mirrors xenia-canary xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0. Plumbing: - xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor. - xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged HashSet for one-shot per-priv tracing. - xenia-app: kernel.xex_system_flags wired in cmd_exec_inner. - xenia-kernel/exports.rs: real export body + unit test covering bits 10/11/0/64 + zero-flags case. Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE, bit 10). At -n 500M with the fix: - XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped). - 10 other canary-only exports + 9 producer PCs + 3 parked handles unchanged. Priv-11 site at sub_824A9710 is downstream and still not reached — AV/crypto block aborts after XGetAVPack returns our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts only 3/4/6/8 per xenia-canary xam_info.cc:250-251). Tests 588 -> 589. Lockstep deterministic (3 reruns identical): n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 / draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle test green. Full chain-of-effects + next-frontier hand-off in audit-findings.md under KRNBUG-XEX-001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
3e2fc1ec88 |
feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade
Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with
the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper
additionally logs `[disp+off]` — what the producer's
`lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both
flags share parser + storage.
Read-only diagnostic. Lockstep digest preserved (`run digest matches
golden` at -n 50M `--stable-digest`). 588 tests green.
Decisive findings (full deliverable in `audit-findings.md` /
`audit-runs/audit-005/`):
- Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call
sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0x at
-n 500M. The producer code path is not reached.
- Set-diff of kernel-call sequences (canary.log oracle vs ours.log
at -n 500M) identifies 11 exports canary calls and we don't:
XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign,
ObCreateSymbolicLink, NtDeviceIoControlFile (×2),
XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle,
KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2).
- XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0).
The 4 instructions immediately preceding it are:
addi r3, r0, 10 ; privilege bit 10
bl XexCheckExecutablePrivilege
cmpli 0, r3, 0
bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
- exports.rs:193 registers XexCheckExecutablePrivilege as
stub_return_zero. Always returning 0 -> guest takes the branch
and skips the entire AV/crypto/save-data init block.
- The other call site (sub_824A9710 at 0x824A99A0) queries privilege
11 with opposite polarity (bne) -> gates XamTaskSchedule on the
privilege-NOT-set arm. With both stubs returning 0, the guest
walks the wrong arm of every privilege-gated branch.
- This explains why the dispatcher fields read zero
([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps):
the ctors run, but the producers that would populate those fields
with a non-zero handle never execute.
Next session: replace XexCheckExecutablePrivilege stub with real
priv-bit lookup from XEX header. See audit-findings.md
KRNBUG-AUDIT-005 for the validation matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7108d6d131 |
feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump
Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.
Adds two read-only diagnostics for the parked-waiter producer hunt:
* `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
line capturing live r3 (= `this` in MSVC PPC ctors), lr
(= return site), sp, plus an 8-frame back-chain with
saved-r31/r30 per frame. Fires once per hit, exactly what the
8-instance-pool probe needed.
* `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
run (after the FOCUS report in `dump_thread_diagnostic`), each
address gets a 128-byte hex + be32 + ASCII dump. Used to
inspect the static dispatcher / job-queue struct layouts
AUDIT-003 identified.
Both gated default-off; empty set is a single `is_empty()` test on
the hot path. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
three handles are Meyers-style singletons with one dispatcher
each. The "called 8 times" claim came from miscounting raw
entries to the OUTER getter sub_8217C850 — but that getter is
itself a Meyers-singleton-getter; only the FIRST entry cascades
through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).
2. **The producer indirection layer is the singleton-getter
itself.** Static byte-scan of .rdata / .data shows 0 hits for
the dispatcher addresses — no static registry table holds them.
But the xrefs table for the OUTER getters reveals 5–6 callers
each, MOSTLY non-create-chain, sharing the canonical producer
pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
AUDIT-003 xref audit was necessary but not sufficient — it
correctly saw "no direct producer references" but missed the
singleton-getter indirection layer.
3. **Dispatcher struct layouts** (128-byte dumps captured at -n
50M --halt-on-deadlock):
- 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
thread_handle at +0x48 (0x1010), self-pointer at +0x74,
capacity 7 at +0x28, queue empty (+0/+3C = -1).
- 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
- 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in
0x4xxxxxxx range — noticeably different layout from the
other two pure POD job queues.
Files:
crates/xenia-kernel/src/state.rs ctor_probe_pcs / dump_addrs +
fire_ctor_probe_if_match + 2 tests
crates/xenia-app/src/main.rs Exec --ctor-probe / --dump-addr
CLI parsing, prologue hook,
end-of-run struct dumper
audit-findings.md KRNBUG-AUDIT-004 entry
audit-runs/audit-004/ 50M probe runs (v1 outer-getter
hits, v2 inner-ctor hits proving
the singleton hypothesis)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f84e947547 |
feat(kernel): KRNBUG-AUDIT-003 — vtable/RTTI class probe at handle creation + wait
Adds a read-only MSVC RTTI traversal helper (`read_class_at_this`)
and a `probe_create_stack_classes` integration that walks each
captured back-chain frame for handle creates in `--trace-handles-focus`
and probes each frame's most-likely `this` candidate (live r31/r30/r3
for frame 0; saved-r31/r30 from the prologue spill area at [fp-12]/
[fp-16] for deeper frames). False-positive guard rejects the CRT
static-init iterator pattern (vtable's first two slots must be image-
range function pointers — PPC instruction words like `mflr r12` are
not in 0x82xxxxxx).
`dump_thread_diagnostic` now takes `&GuestMemory` so the FOCUS report
prints, for each parked waiter, a WAIT-THREAD block with full back-
chain frames and per-slot saved-register dump for offline lookup.
End-to-end finding (-n 500M producer-trace):
* Handle 0x100c dispatcher = 0x828F3D08 (image rdata; verified by
sub_82181750 disasm + xref table). [this+0] = -1 sentinel — POD
job queue, NOT a C++ polymorphic class.
* Handle 0x15e0 dispatcher = 0x828F4070 (same shape).
* Handle 0x1004's 8-instance pool members still TBD (MSVC ctors
didn't preserve `this` in r31).
* 0x42450b5c is a separate audit class (heap-allocated, parks via
non-`do_wait_single` path).
Decisive xref audit: every reference to 0x828F3D08 / 0x828F4070 in
the static analysis is in a ctor or the CRT init driver. NO producer
code references either dispatcher base. Confirms `signal_attempts=0`
is unreachable-producer, not broken-producer.
Tests: 581 → 586 green (+5: RTTI-intact / RTTI-stripped / non-object
/ cstring / probe_create_stack integration). `--stable-digest -n
100M` instructions=100000002 unchanged. Master HEAD prior:
|
||
|
|
2a9fd1fc86 |
feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation
Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a
`record_create_with_stack` audit hook gated on `--trace-handles-focus`.
NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now
route through the new helper so focused handles capture up to 6 stack
frames at allocation time. Diagnostic-only, read-only memory access:
unfocused handles pay one HashSet lookup, focused ones pay six
back-chain dereferences. Lockstep determinism preserved.
End-to-end finding: handles 0x1004 (8-instance pool via static ctor at
0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton
in distinct cluster) are silph-framework dispatcher objects whose
producer code is unreached at -n 500M. The producer hunt now has class
ownership; vtable/RTTI readout is the next step.
Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002
unchanged. Master HEAD prior:
|
||
|
|
07068e7616 |
feat(audio): APUBUG-PRODUCER-001 — XAudio register driver client + opt-in callback ticker
Replace the three XAudio kernel-export stubs (Register/Unregister/SubmitFrame) with canary-faithful implementations and add a periodic buffer-complete callback ticker reusing the existing SavedCallbackCtx injection machinery. Canary parity: - xboxkrnl_audio.cc:56-93 — read callback_ptr[0..1], wrap callback_arg in a 4-byte big-endian guest heap buffer (`wrapped_callback_arg`), write `0x4155_xxxx` to *driver_ptr. - audio_system.cc:139-141 — guest callback receives r3 = wrapped pointer, not raw callback_arg. - audio_driver.h:21-24 — frame rate 256 samples / 48 kHz ≈ 5.33 ms. Implementation: - New `crates/xenia-kernel/src/xaudio.rs` — `XAudioClient`, `XAudioState` (8-slot table, pending FIFO, dual-mode ticker), `XAUDIO_INSTR_PERIOD = 48_000` (lockstep) and `XAUDIO_PERIOD = 5.333 ms` (--parallel), same pattern as KRNBUG-D08 v-sync. - `try_inject_audio_callback` in xenia-app mirrors `try_inject_graphics_interrupt`, shares `interrupts.saved` slot for mutex with graphics callbacks. Gating: ticker + injector run only when `--xaudio-tick` / `XENIA_XAUDIO_TICK=1`. Default off because Sylpheed's audio callback enters an infinite `KeWaitForSingleObject` loop on first invocation (canary's host worker thread provides the buffer-completion fence we don't model), which hijacks a guest HW thread and regresses `swaps=2 → 1`. Default-off preserves the lockstep `sylpheed_n*m.json` goldens exactly. Producer hunt outcome (FALSIFIED for parked handles 0x1004/0x100c/0x15e4): at `-n 500M --xaudio-tick` all 3 handles still show `signal_attempts=0 (primary=0, ghost=0)`. Audio callback is not the missing producer. Next candidate per audit-findings.md is Timer DPC delivery (KeSetTimer / KeInsertQueueDpc). Tests: 562 → 576 green (10 in `xaudio.rs`, 4 in `exports.rs`). Lockstep `--stable-digest -n 100M` default-off: instructions=100000002, swaps=2 (matches pre-change baseline byte-for-byte). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
691404e36e |
fix(xam): XAMBUG-PRODUCER-001 — XamTaskSchedule spawns a real guest thread
Replaces the no-op stub at xam.rs:204 with a canary-faithful implementation mirroring xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80. Allocates a ThreadImage, allocates a KernelObject::Thread handle, and routes through Scheduler::spawn with entry=callback and start_context=message_ptr (canary's third positional XThread ctor arg). Stack size = max(0x4000, page-aligned 0x10_0000). Producer-hypothesis outcome (500M --trace-handles-focus run): the call site at 0x824a9a10 is never reached during this boot horizon, so XamTaskSchedule cannot be the missing producer for the 3 parked Event/Manual handles (0x1004, 0x100c, 0x15e4). The fix still lands — the stub was a real correctness bug that would manifest the moment the boot advances past the current deadlock. Next candidate per audit-findings.md: XAudioRegisterRenderDriverClient. - Workspace tests: 561 → 562 green (new test xam::tests::xam_task_schedule_spawns_real_thread). - --stable-digest -n 100M: instructions=100000002 unchanged from baseline; lockstep determinism preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
27d3608174 |
fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel
The synthetic v-sync ticker used a per-instruction proxy (VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep throughput → 60 Hz. Audit M11 observed this drifts under `--parallel`: with 6 worker threads sharing the kernel mutex, the dispatcher executes more PPC instructions per tick callback, so the accumulator never crosses 150 k. Result: ~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel. Hybrid solution preserves lockstep determinism (which the goldens depend on) while fixing --parallel: * `tick_vsync_instr(instr_count)` — legacy instruction-count ticker, used by lockstep. Bit-stable across runs. * `tick_vsync_wallclock()` — new Instant-based ticker. Fires `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor and advances the anchor by that many full periods (no lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a forward-jumping clock can't overflow the FIFO. * `KernelState.parallel_active` flag set at startup from `--parallel` / `XENIA_PARALLEL=1`. Read by `coord_pre_round` in main.rs to choose between the two tickers. Verification: * cargo test --workspace --release: 561 passing (+3 new wall-clock tests vs prior 558 baseline). * lockstep -n 100M --stable-digest: BIT-IDENTICAL to pre-Phase-3 baseline. interrupts_delivered preserved at ~630 (was ~629 pre-fix). * --parallel --reservations-table -n 30M: interrupts_delivered rose from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps burst delivery; that's a separate bottleneck — addressed by raising cap when --parallel queue depth becomes the next blocker.) Trade-off: --parallel runs are non-deterministic at the v-sync rate by design (per audit M05 PPCBUG-703 already). Lockstep stays bit-identical, so the `sylpheed_n*m.json` goldens are untouched. Audit IDs: KRNBUG-D08 (closed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d1105aafae |
diag(audit): KRNBUG-AUDIT-001 — focused parked-waiter ghost-trail diagnostic
Adds a one-run diagnostic that distinguishes "guest never called
Nt/KeSetEvent on this handle" from "signal landed but waiter wasn't
woken", for any handle named via `--trace-handles-focus`.
Parked-waiter context (project_xenia_rs_sylpheed_stage3_2026_04_29):
four worker threads block Sylpheed past `draws=0` on handles
0x1004 / 0x100c / 0x15e4 / 0x42450b5c (mr=true, sig=false). The
pre-existing audit dropped signal-attempts that targeted handles
without a primary trail, so we couldn't tell whether the producer
was unreachable in the guest or whether the signal landed but missed
its waiter.
Three changes:
* audit.rs: `HandleAudit` gains `focus: HashSet<u32>` and
`ghost_trails: HashMap<u32, GhostTrail>`. `record_signal`
auto-falls-through to a new `record_signal_attempt_ghost` when no
primary trail exists AND the handle is in `focus`. Bounded by
AUDIT_RING_CAPACITY per handle. Two new tests cover the focus
ghost-trail and no-double-record invariants.
* main.rs: new `--trace-handles-focus=<LIST>` flag (hex 0x or decimal,
comma-separated) populates `kernel.audit.focus`. Implies
`--trace-handles`. New "=== Handle audit (focus) ===" section in
`dump_thread_diagnostic` emits per-handle:
- signal_attempts (primary + ghost), waits, wakes
- merged cycle-sorted timeline (last 16)
- GuestExport / KernelInternal classification
- <AUDIT_BLIND> marker when waiter_count > 0 but the audit
saw no waits (i.e. waiter parked via a non-audit path —
CS / spinlock / DPC).
- DIAGNOSIS conclusion that selects between five branches.
* `cmd_check` passes None for focus → goldens unaffected.
Empirical run output at -n 500M lockstep with
`--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`:
handle=0x00001004 kind=Event/Manual waiters=1 signaled=false
signal_attempts=0 (primary=0, ghost=0)
waits=1 wakes=0
created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent
=> producer is a missing kernel signal source
(or BST-paradox upstream)
... (same shape for 0x100c, 0x15e4)
handle=0x42450b5c kind=<UNCREATED> waiters=1 signal_attempts=0
waits=0 wakes=0 <AUDIT_BLIND>
=> waiter parked via non-audited path
Conclusion: hypothesis (A) confirmed for all 4 handles. Producer is
NOT a wake/eligibility bug — it is a genuinely missing kernel signal
source. The 3 Event/Manual handles share a creator
(lr=0x824a9f6c, tid=1) and the same wait-call wrapper at
lr=0x824ac578 — these are 3 worker threads all parked on
"work-available" notifications that never come.
Verification:
* cargo test --workspace --release: 558 passing (+2 new ghost-trail
tests vs prior 556 baseline)
* lockstep -n 100M --stable-digest: bit-identical to master HEAD
Audit IDs: KRNBUG-AUDIT-001 (closed — diagnostic instrumentation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7a1b6b3306 |
fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel
The Phase-C VdSwap PM4 ring path (commit |
||
|
|
e7d0fcf2c9 |
fix(kernel): KRNBUG-017 — real Kf*SpinLock + KeReleaseSpinLockFromRaisedIrql
The Kf-family spinlock exports were registered as stubs: KfAcquireSpinLock → stub_return_zero (didn't write lock) KfReleaseSpinLock → stub_success (didn't clear lock) KeReleaseSpinLockFromRaisedIrql → stub_success (same) KeTryToAcquireSpinLockAtRaisedIrql → returned 1 but didn't set lock value Guest code that read the lock value back (e.g. nested acquire/release sanity checks, debug assertions) saw 0 even after "acquiring", and could enter critical regions without contention serialization. Under `--parallel` the coarse Arc<Mutex<KernelState>> already serializes us, so the audit's P0-under-parallel ranking is about correctness of the lock value visible to guest code, not mutual-exclusion (which is provided by the host mutex). Implementation mirrors canary's `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`: - KfAcquireSpinLock: write 1 to *SpinLock, return 0 (old IRQL) - KfReleaseSpinLock: write 0 to *SpinLock - KeReleaseSpinLockFromRaisedIrql: write 0 to *SpinLock - KeTryToAcquireSpinLockAtRaisedIrql: write 1 to *SpinLock, return 1 Single-threaded HLE: contention can never be observed (we never run two guest threads simultaneously without holding the kernel mutex), so the spin-loop can degenerate to an unconditional acquire. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (gated by F2/F3/G) packets: ~59M (within noise) Tests: 76 kernel pass (no count change; existing harness covers the new write semantics implicitly via guest-memory smoke tests). Closes KRNBUG-017 (P0 under --parallel). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |