xenia-rs

Author	SHA1	Message	Date
MechaCat02	451b3b28fe	Merge canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade (KRNBUG-AUDIT-005)	2026-05-04 18:06:26 +02:00
MechaCat02	3e2fc1ec88	feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper additionally logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both flags share parser + storage. Read-only diagnostic. Lockstep digest preserved (`run digest matches golden` at -n 50M `--stable-digest`). 588 tests green. Decisive findings (full deliverable in `audit-findings.md` / `audit-runs/audit-005/`): - Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0 times at -n 500M. The producer code path is not reached. - Set-diff of kernel-call sequences (canary.log oracle vs ours.log at -n 500M) identifies 11 exports canary calls and we don't: XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign, ObCreateSymbolicLink, NtDeviceIoControlFile (×2), XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle, KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2). - XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0). The 4 instructions immediately preceding it are: `addi r3, r0, 10` (privilege bit 10); `bl XexCheckExecutablePrivilege`; `cmpli 0, r3, 0`; `bc 12, eq, 0x824AB724` (if r3==0, skip whole block). - exports.rs:193 registers XexCheckExecutablePrivilege as stub_return_zero. Always returning 0 -> guest takes the branch and skips the entire AV/crypto/save-data init block. - The other call site (sub_824A9710 at 0x824A99A0) queries privilege 11 with opposite polarity (bne) -> gates XamTaskSchedule on the privilege-NOT-set arm. With the stub returning 0 at both call sites, the guest walks the wrong arm of every privilege-gated branch. 
- This explains why the dispatcher fields read zero ([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps): the ctors run, but the producers that would populate those fields with a non-zero handle never execute. Next session: replace XexCheckExecutablePrivilege stub with real priv-bit lookup from XEX header. See audit-findings.md KRNBUG-AUDIT-005 for the validation matrix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 18:06:22 +02:00
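The planned replacement for the `stub_return_zero` registration can be sketched as a single bit test. This is a minimal sketch, not the project's implementation: `check_executable_privilege`, `av_init_block_runs`, and the idea that the privilege bits arrive as one 32-bit `system_flags` word read from the XEX header are assumptions made for illustration.

```rust
/// Hypothetical helper: test one privilege bit in a 32-bit flags word.
/// Returns 1 when the bit is set and 0 otherwise, which is what the guest's
/// `cmpli 0, r3, 0` compares against.
/// Assumption: `system_flags` was already read from the XEX header.
pub fn check_executable_privilege(system_flags: u32, privilege: u32) -> u32 {
    if privilege >= 32 {
        // Bits outside the single flags word are treated as "not set" here.
        return 0;
    }
    (system_flags >> privilege) & 1
}

/// Models the privilege-10 call site: `bc 12, eq, ...` skips the
/// AV/crypto/save-data init block when the export returned 0.
pub fn av_init_block_runs(system_flags: u32) -> bool {
    check_executable_privilege(system_flags, 10) != 0
}
```

With a constant-zero stub, `av_init_block_runs` is false for every title, which matches the skipped block described in the commit.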
MechaCat02	6a070bedc6	Merge dispatcher-probe-audit/p0-ctor-probe-and-struct-dump (KRNBUG-AUDIT-004)	2026-05-04 17:09:52 +02:00
MechaCat02	7108d6d131	feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump Diagnostic-only, read-only. Lockstep `instructions=100000002` preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests. Adds two read-only diagnostics for the parked-waiter producer hunt: * `--ctor-probe=0x8217C850,0x...` — at every interpreter step, if `ctx.pc` is in the configured set, print one `CTOR-PROBE` line capturing live r3 (= `this` in MSVC PPC ctors), lr (= return site), sp, plus an 8-frame back-chain with saved-r31/r30 per frame. Fires once per hit, exactly what the 8-instance-pool probe needed. * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of run (after the FOCUS report in `dump_thread_diagnostic`), each address gets a 128-byte hex + be32 + ASCII dump. Used to inspect the static dispatcher / job-queue struct layouts AUDIT-003 identified. Both gated default-off; empty set is a single `is_empty()` test on the hot path. No guest state is mutated, so the `sylpheed_nm.json` lockstep digest is preserved. KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003): 1. The "8-instance pool" hypothesis for handle 0x1004 is FALSE. Probing the inner per-instance ctors `[0x821783D8, 0x82181750, 0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All three handles are Meyers-style singletons with one dispatcher each. The "called 8 times" claim came from miscounting raw entries to the OUTER getter sub_8217C850 — but that getter is itself a Meyers-singleton-getter; only the FIRST entry cascades through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`). 2. The producer indirection layer is the singleton-getter itself. Static byte-scan of .rdata / .data shows 0 hits for the dispatcher addresses — no static registry table holds them. 
But the xrefs table for the OUTER getters reveals 5–6 callers each, MOSTLY non-create-chain, sharing the canonical producer pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl 0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the AUDIT-003 xref audit was necessary but not sufficient — it correctly saw "no direct producer references" but missed the singleton-getter indirection layer. 3. Dispatcher struct layouts (128-byte dumps captured at -n 50M --halt-on-deadlock): - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c), thread_handle at +0x48 (0x1010), self-pointer at +0x74, capacity 7 at +0x28, queue empty (+0/+3C = -1). - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0), sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1). - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004), 4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in 0x4xxxxxxx range — noticeably different layout from the other two pure POD job queues. Files: crates/xenia-kernel/src/state.rs ctor_probe_pcs / dump_addrs + fire_ctor_probe_if_match + 2 tests crates/xenia-app/src/main.rs Exec --ctor-probe / --dump-addr CLI parsing, prologue hook, end-of-run struct dumper audit-findings.md KRNBUG-AUDIT-004 entry audit-runs/audit-004/ 50M probe runs (v1 outer-getter hits, v2 inner-ctor hits proving the singleton hypothesis) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:09:47 +02:00
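The miscount described in finding 1 is easy to reproduce with a toy model of a guarded singleton getter. The sketch below is illustrative only: `SingletonGetter` and its fields are hypothetical names, and the guard word simply stands in for the bit-0 test at `[0x828F48D8]`.

```rust
/// Toy model of a guarded singleton getter: bit 0 of `guard` records
/// "already constructed", so only the first entry runs the inner ctor.
pub struct SingletonGetter {
    guard: u32,
    instance_addr: u32,
    /// Raw entries into the outer getter (what the earlier audit counted).
    pub getter_entries: u32,
    /// Entries that actually reached the inner constructor.
    pub ctor_calls: u32,
}

impl SingletonGetter {
    pub fn new(instance_addr: u32) -> Self {
        Self { guard: 0, instance_addr, getter_entries: 0, ctor_calls: 0 }
    }

    /// Returns the instance address; constructs on the first call only.
    pub fn get(&mut self) -> u32 {
        self.getter_entries += 1;
        if self.guard & 1 == 0 {
            self.guard |= 1;
            self.ctor_calls += 1;
        }
        self.instance_addr
    }
}
```

Counting `getter_entries` suggests a pool of eight; counting `ctor_calls` shows one instance, which is the distinction the inner-ctor probe made.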
MechaCat02	48eed258f0	Merge xam-handle-stack-trace/p0-class-probe (KRNBUG-AUDIT-003) vtable/RTTI class probe at handle creation + wait. Read-only diagnostic; lockstep determinism preserved. Tests 581 → 586 green. --stable-digest -n 100M instructions=100000002. Identifies handle 0x100c dispatcher at 0x828F3D08 and handle 0x15e0 dispatcher at 0x828F4070 — both POD job queues, not C++ classes (`[this+0]=-1` sentinel, no vtable). Decisive xref audit shows every reference to either base is in a ctor or the CRT — NO producer code exists in static analysis. Producer hunt deliverable: confirms unreachable-producer, not broken-producer. Master HEAD prior: `6440261`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 21:15:03 +02:00
MechaCat02	f84e947547	feat(kernel): KRNBUG-AUDIT-003 — vtable/RTTI class probe at handle creation + wait Adds a read-only MSVC RTTI traversal helper (`read_class_at_this`) and a `probe_create_stack_classes` integration that walks each captured back-chain frame for handle creates in `--trace-handles-focus` and probes each frame's most-likely `this` candidate (live r31/r30/r3 for frame 0; saved-r31/r30 from the prologue spill area at [fp-12]/ [fp-16] for deeper frames). False-positive guard rejects the CRT static-init iterator pattern (vtable's first two slots must be image- range function pointers — PPC instruction words like `mflr r12` are not in 0x82xxxxxx). `dump_thread_diagnostic` now takes `&GuestMemory` so the FOCUS report prints, for each parked waiter, a WAIT-THREAD block with full back- chain frames and per-slot saved-register dump for offline lookup. End-to-end finding (-n 500M producer-trace): * Handle 0x100c dispatcher = 0x828F3D08 (image rdata; verified by sub_82181750 disasm + xref table). [this+0] = -1 sentinel — POD job queue, NOT a C++ polymorphic class. * Handle 0x15e0 dispatcher = 0x828F4070 (same shape). * Handle 0x1004's 8-instance pool members still TBD (MSVC ctors didn't preserve `this` in r31). * 0x42450b5c is a separate audit class (heap-allocated, parks via non-`do_wait_single` path). Decisive xref audit: every reference to 0x828F3D08 / 0x828F4070 in the static analysis is in a ctor or the CRT init driver. NO producer code references either dispatcher base. Confirms `signal_attempts=0` is unreachable-producer, not broken-producer. Tests: 581 → 586 green (+5: RTTI-intact / RTTI-stripped / non-object / cstring / probe_create_stack integration). `--stable-digest -n 100M` instructions=100000002 unchanged. Master HEAD prior: `6440261`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 21:14:56 +02:00
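The false-positive guard in this commit can be sketched as a range check on the first two vtable slots. This is a sketch under stated assumptions: `plausible_vtable` is a hypothetical name, and the image range is taken to be exactly `0x82000000..0x83000000` based on the "0x82xxxxxx" wording.

```rust
/// Assumed image range, derived from the "0x82xxxxxx" description.
pub const IMAGE_BASE: u32 = 0x8200_0000;
pub const IMAGE_END: u32 = 0x8300_0000; // exclusive

fn in_image(addr: u32) -> bool {
    (IMAGE_BASE..IMAGE_END).contains(&addr)
}

/// A candidate `this` is only treated as a polymorphic object when the first
/// two words of its would-be vtable look like image-range function pointers.
/// PPC instruction words such as `mflr r12` (0x7D8802A6) fail this test.
pub fn plausible_vtable(slot0: u32, slot1: u32) -> bool {
    in_image(slot0) && in_image(slot1)
}
```

The same check also rejects the `-1` sentinel found at `[this+0]` of the POD job queues.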
MechaCat02	6440261e2e	Merge xam-handle-stack-trace/p0-multiframe-walker (KRNBUG-AUDIT-002) Multi-frame back-chain capture at NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule, gated on --trace-handles-focus. Read- only diagnostic; lockstep determinism unaffected. Tests 576 → 581 green. --stable-digest -n 100M instructions=100000002. Identifies: 0x1004 = 8-instance pool via static ctor at 0x8280F810; 0x100c = singleton inside main(); 0x15e0 = singleton in distinct cluster. All three are silph-framework dispatchers; producer hunt continues with vtable/RTTI readout next session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:41:12 +02:00
MechaCat02	2a9fd1fc86	feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a `record_create_with_stack` audit hook gated on `--trace-handles-focus`. NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now route through the new helper so focused handles capture up to 6 stack frames at allocation time. Diagnostic-only, read-only memory access: unfocused handles pay one HashSet lookup, focused ones pay six back-chain dereferences. Lockstep determinism preserved. End-to-end finding: handles 0x1004 (8-instance pool via static ctor at 0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton in distinct cluster) are silph-framework dispatcher objects whose producer code is unreached at -n 500M. The producer hunt now has class ownership; vtable/RTTI readout is the next step. Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002 unchanged. Master HEAD prior: `9d45efe`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 20:41:06 +02:00
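A back-chain walk of the kind this commit describes can be sketched as follows. It is not the project's `walk_guest_back_chain`: the function name, the closure-based memory reader, and the termination rules (null, misaligned, or non-increasing back-chain pointer) are assumptions chosen to keep the sketch safe on corrupt stacks.

```rust
use std::collections::HashMap;

/// Walks a PowerPC-style back chain: the word at each stack pointer holds the
/// caller's stack pointer. Collects at most `max_frames` frame addresses.
/// `read_be32` is a stand-in for a read-only guest memory accessor.
pub fn walk_back_chain(
    read_be32: impl Fn(u32) -> Option<u32>,
    sp: u32,
    max_frames: usize,
) -> Vec<u32> {
    let mut frames = Vec::new();
    let mut cur = sp;
    while frames.len() < max_frames {
        if cur == 0 || cur % 4 != 0 {
            break; // null or misaligned pointer ends the walk
        }
        frames.push(cur);
        match read_be32(cur) {
            // The stack grows down, so a caller frame must sit at a higher address.
            Some(next) if next > cur => cur = next,
            _ => break,
        }
    }
    frames
}

/// Builds a tiny fake stack for experimentation (addresses are made up).
pub fn demo_stack() -> HashMap<u32, u32> {
    HashMap::from([
        (0x7000_0000, 0x7000_0040),
        (0x7000_0040, 0x7000_0080),
        (0x7000_0080, 0),
    ])
}
```

Requiring a strictly increasing chain also stops the walk on self-referencing frames, so the cost stays bounded by `max_frames` reads.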
MechaCat02	9d45efe5d5	Merge xaudio-register-driver/p0-real-callback-loop (APUBUG-PRODUCER-001) Adds canary-faithful XAudioRegisterRenderDriverClient + Unregister + Submit implementations and a default-off audio buffer-complete callback ticker (`--xaudio-tick` / `XENIA_XAUDIO_TICK=1`). Producer hypothesis FALSIFIED for handles 0x1004/0x100c/0x15e4 — all three still show signal_attempts=0 at -n 500M with the ticker enabled. Tests: 562 → 576 green. Lockstep goldens preserved at default settings (instructions=100000002, swaps=2 unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:50:29 +02:00
MechaCat02	07068e7616	feat(audio): APUBUG-PRODUCER-001 — XAudio register driver client + opt-in callback ticker Replace the three XAudio kernel-export stubs (Register/Unregister/SubmitFrame) with canary-faithful implementations and add a periodic buffer-complete callback ticker reusing the existing SavedCallbackCtx injection machinery. Canary parity: - xboxkrnl_audio.cc:56-93 — read callback_ptr[0..1], wrap callback_arg in a 4-byte big-endian guest heap buffer (`wrapped_callback_arg`), write `0x4155_xxxx` to driver_ptr. - audio_system.cc:139-141 — guest callback receives r3 = wrapped pointer, not raw callback_arg. - audio_driver.h:21-24 — frame rate 256 samples / 48 kHz ≈ 5.33 ms. Implementation: - New `crates/xenia-kernel/src/xaudio.rs` — `XAudioClient`, `XAudioState` (8-slot table, pending FIFO, dual-mode ticker), `XAUDIO_INSTR_PERIOD = 48_000` (lockstep) and `XAUDIO_PERIOD = 5.333 ms` (--parallel), same pattern as KRNBUG-D08 v-sync. - `try_inject_audio_callback` in xenia-app mirrors `try_inject_graphics_interrupt`, shares `interrupts.saved` slot for mutex with graphics callbacks. Gating: ticker + injector run only when `--xaudio-tick` / `XENIA_XAUDIO_TICK=1`. Default off because Sylpheed's audio callback enters an infinite `KeWaitForSingleObject` loop on first invocation (canary's host worker thread provides the buffer-completion fence we don't model), which hijacks a guest HW thread and regresses `swaps=2 → 1`. Default-off preserves the lockstep `sylpheed_nm.json` goldens exactly. Producer hunt outcome (FALSIFIED for parked handles 0x1004/0x100c/0x15e4): at `-n 500M --xaudio-tick` all 3 handles still show `signal_attempts=0 (primary=0, ghost=0)`. Audio callback is not the missing producer. Next candidate per audit-findings.md is Timer DPC delivery (KeSetTimer / KeInsertQueueDpc). Tests: 562 → 576 green (10 in `xaudio.rs`, 4 in `exports.rs`). 
Lockstep `--stable-digest -n 100M` default-off: instructions=100000002, swaps=2 (matches pre-change baseline byte-for-byte). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 19:50:22 +02:00
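The frame-rate arithmetic and the argument wrapping quoted from canary can be checked with a few lines. The constant and function names below are hypothetical; only the numbers (256 samples, 48 kHz, a 4-byte big-endian wrapper) come from the commit message.

```rust
use std::time::Duration;

pub const SAMPLES_PER_FRAME: u64 = 256;
pub const SAMPLE_RATE_HZ: u64 = 48_000;

/// Wall-clock length of one audio frame: 256 samples at 48 kHz.
pub fn frame_period() -> Duration {
    Duration::from_nanos(SAMPLES_PER_FRAME * 1_000_000_000 / SAMPLE_RATE_HZ)
}

/// The guest callback receives a pointer to a 4-byte big-endian buffer that
/// holds `callback_arg`, not the raw value; this builds that buffer's bytes.
pub fn wrap_callback_arg(callback_arg: u32) -> [u8; 4] {
    callback_arg.to_be_bytes()
}
```

Integer division truncates the period to 5,333,333 ns, which agrees with the "≈ 5.33 ms" figure.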
MechaCat02	38f78c88a8	Merge xam-task-schedule-producer/p0-spawn-real-thread (XAMBUG-PRODUCER-001)	2026-05-03 18:32:44 +02:00
MechaCat02	691404e36e	fix(xam): XAMBUG-PRODUCER-001 — XamTaskSchedule spawns a real guest thread Replaces the no-op stub at xam.rs:204 with a canary-faithful implementation mirroring xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80. Allocates a ThreadImage, allocates a KernelObject::Thread handle, and routes through Scheduler::spawn with entry=callback and start_context=message_ptr (canary's third positional XThread ctor arg). Stack size = max(0x4000, page-aligned 0x10_0000). Producer-hypothesis outcome (500M --trace-handles-focus run): the call site at 0x824a9a10 is never reached during this boot horizon, so XamTaskSchedule cannot be the missing producer for the 3 parked Event/Manual handles (0x1004, 0x100c, 0x15e4). The fix still lands — the stub was a real correctness bug that would manifest the moment the boot advances past the current deadlock. Next candidate per audit-findings.md: XAudioRegisterRenderDriverClient. - Workspace tests: 561 → 562 green (new test xam::tests::xam_task_schedule_spawns_real_thread). - --stable-digest -n 100M: instructions=100000002 unchanged from baseline; lockstep determinism preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 18:32:40 +02:00
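The stack-size rule in this commit (round up to a page, then enforce a 0x4000 floor) can be written as a small pure function. This is a sketch: `task_stack_size` is a hypothetical name and a 4 KiB page size is assumed.

```rust
/// Rounds `requested` up to a 4 KiB page (assumed page size) and enforces a
/// minimum of 0x4000 bytes.
pub fn task_stack_size(requested: u32) -> u32 {
    let page_aligned = requested.saturating_add(0xFFF) & !0xFFF;
    page_aligned.max(0x4000)
}
```

For the 0x10_0000 value named in the commit the result is unchanged, since it is already page-aligned and above the floor.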
MechaCat02	b54aa48d10	Merge audit-2026-05-fix/p2-session-closeout	2026-05-03 17:35:37 +02:00
MechaCat02	eb71fe8daf	docs(audit): close out follow-up session 2026-05-03 3 IDs landed: GPUBUG-DRAIN-001, KRNBUG-AUDIT-001, KRNBUG-D08. Tests 556 → 561. Lockstep digest BIT-IDENTICAL on stable fields. draws=0 persists; parked-waiter producer-trace confirms hypothesis (A) for 3 of 4 handles — guest code never calls Nt/KeSetEvent on 0x1004 / 0x100c / 0x15e4 — so the renderer plateau is a missing kernel signal source, NOT a wake-eligibility bug or BST-paradox. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:35:37 +02:00
MechaCat02	866855000c	Merge audit-2026-05-fix/p2-vsync-wallclock (KRNBUG-D08)	2026-05-03 17:34:30 +02:00
MechaCat02	27d3608174	fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel The synthetic v-sync ticker used a per-instruction proxy (VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep throughput → 60 Hz. Audit M11 observed this drifts under `--parallel`: with 6 worker threads sharing the kernel mutex, the dispatcher executes more PPC instructions per tick callback, so the accumulator never crosses 150 k. Result: ~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel. Hybrid solution preserves lockstep determinism (which the goldens depend on) while fixing --parallel: * `tick_vsync_instr(instr_count)` — legacy instruction-count ticker, used by lockstep. Bit-stable across runs. * `tick_vsync_wallclock()` — new Instant-based ticker. Fires `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor and advances the anchor by that many full periods (no lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a forward-jumping clock can't overflow the FIFO. * `KernelState.parallel_active` flag set at startup from `--parallel` / `XENIA_PARALLEL=1`. Read by `coord_pre_round` in main.rs to choose between the two tickers. Verification: * cargo test --workspace --release: 561 passing (+3 new wall-clock tests vs prior 558 baseline). * lockstep -n 100M --stable-digest: BIT-IDENTICAL to pre-Phase-3 baseline. interrupts_delivered preserved at ~630 (was ~629 pre-fix). * --parallel --reservations-table -n 30M: interrupts_delivered rose from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps burst delivery; that's a separate bottleneck — addressed by raising cap when --parallel queue depth becomes the next blocker.) Trade-off: --parallel runs are non-deterministic at the v-sync rate by design (per audit M05 PPCBUG-703 already). Lockstep stays bit-identical, so the `sylpheed_n*m.json` goldens are untouched. Audit IDs: KRNBUG-D08 (closed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:34:30 +02:00
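The wall-clock ticker's rule (fire whole elapsed periods, advance the anchor, cap the burst) can be sketched with explicit timestamps so it stays testable. The names below are hypothetical, the 60 Hz period is an assumed value, and the reading that capped-off periods are dropped rather than carried over is an interpretation of "no lazy backlog".

```rust
use std::time::Duration;

pub const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667); // assumed 60 Hz
pub const INTERRUPT_QUEUE_CAP: u32 = 4;

pub struct WallclockVsync {
    anchor: Duration,
}

impl WallclockVsync {
    pub fn new(start: Duration) -> Self {
        Self { anchor: start }
    }

    /// Returns how many v-syncs to deliver at time `now`.
    /// The anchor advances by every full elapsed period, so periods beyond
    /// the cap are dropped instead of being queued for later calls.
    pub fn tick(&mut self, now: Duration) -> u32 {
        let elapsed = now.saturating_sub(self.anchor);
        let periods = (elapsed.as_nanos() / VSYNC_PERIOD.as_nanos())
            .min(u32::MAX as u128) as u32;
        self.anchor += VSYNC_PERIOD * periods;
        periods.min(INTERRUPT_QUEUE_CAP)
    }
}
```

Passing `now` in, rather than calling `Instant::now()` inside, keeps the sketch deterministic; a real ticker would read the clock itself.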
MechaCat02	b82919bdd0	Merge audit-2026-05-fix/p2-parked-waiter-trace (KRNBUG-AUDIT-001)	2026-05-03 17:22:14 +02:00
MechaCat02	d1105aafae	diag(audit): KRNBUG-AUDIT-001 — focused parked-waiter ghost-trail diagnostic Adds a one-run diagnostic that distinguishes "guest never called Nt/KeSetEvent on this handle" from "signal landed but waiter wasn't woken", for any handle named via `--trace-handles-focus`. Parked-waiter context (project_xenia_rs_sylpheed_stage3_2026_04_29): four worker threads block Sylpheed past `draws=0` on handles 0x1004 / 0x100c / 0x15e4 / 0x42450b5c (mr=true, sig=false). The pre-existing audit dropped signal-attempts that targeted handles without a primary trail, so we couldn't tell whether the producer was unreachable in the guest or whether the signal landed but missed its waiter. Three changes: * audit.rs: `HandleAudit` gains `focus: HashSet<u32>` and `ghost_trails: HashMap<u32, GhostTrail>`. `record_signal` auto-falls-through to a new `record_signal_attempt_ghost` when no primary trail exists AND the handle is in `focus`. Bounded by AUDIT_RING_CAPACITY per handle. Two new tests cover the focus ghost-trail and no-double-record invariants. * main.rs: new `--trace-handles-focus=<LIST>` flag (hex 0x or decimal, comma-separated) populates `kernel.audit.focus`. Implies `--trace-handles`. New "=== Handle audit (focus) ===" section in `dump_thread_diagnostic` emits per-handle: - signal_attempts (primary + ghost), waits, wakes - merged cycle-sorted timeline (last 16) - GuestExport / KernelInternal classification - <AUDIT_BLIND> marker when waiter_count > 0 but the audit saw no waits (i.e. waiter parked via a non-audit path — CS / spinlock / DPC). - DIAGNOSIS conclusion that selects between five branches. * `cmd_check` passes None for focus → goldens unaffected. 
Empirical run output at -n 500M lockstep with `--trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c`: handle=0x00001004 kind=Event/Manual waiters=1 signaled=false signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0 created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent => producer is a missing kernel signal source (or BST-paradox upstream) ... (same shape for 0x100c, 0x15e4) handle=0x42450b5c kind=<UNCREATED> waiters=1 signal_attempts=0 waits=0 wakes=0 <AUDIT_BLIND> => waiter parked via non-audited path Conclusion: hypothesis (A) confirmed for all 4 handles. Producer is NOT a wake/eligibility bug — it is a genuinely missing kernel signal source. The 3 Event/Manual handles share a creator (lr=0x824a9f6c, tid=1) and the same wait-call wrapper at lr=0x824ac578 — these are 3 worker threads all parked on "work-available" notifications that never come. Verification: * cargo test --workspace --release: 558 passing (+2 new ghost-trail tests vs prior 556 baseline) * lockstep -n 100M --stable-digest: bit-identical to master HEAD Audit IDs: KRNBUG-AUDIT-001 (closed — diagnostic instrumentation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:22:14 +02:00
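The fall-through rule for ghost trails can be sketched as below. This is a simplified model, not the code in `audit.rs`: trails are reduced to rings of cycle numbers, `AUDIT_RING_CAPACITY` is given an assumed value of 16, and `record_create` and `signal_attempts` are hypothetical helpers.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

pub const AUDIT_RING_CAPACITY: usize = 16; // assumed value

#[derive(Default)]
pub struct HandleAudit {
    pub focus: HashSet<u32>,
    pub primary_trails: HashMap<u32, VecDeque<u64>>,
    pub ghost_trails: HashMap<u32, VecDeque<u64>>,
}

impl HandleAudit {
    /// Creating a handle opens its primary trail.
    pub fn record_create(&mut self, handle: u32) {
        self.primary_trails.entry(handle).or_default();
    }

    /// Records into the primary trail when one exists; otherwise into a ghost
    /// trail, but only for focused handles. Never records in both.
    pub fn record_signal(&mut self, handle: u32, cycle: u64) {
        let ring = match self.primary_trails.get_mut(&handle) {
            Some(ring) => ring,
            None if self.focus.contains(&handle) => {
                self.ghost_trails.entry(handle).or_default()
            }
            None => return,
        };
        if ring.len() == AUDIT_RING_CAPACITY {
            ring.pop_front(); // bounded ring: drop the oldest entry
        }
        ring.push_back(cycle);
    }

    /// Returns (primary, ghost) signal-attempt counts for a handle.
    pub fn signal_attempts(&self, handle: u32) -> (usize, usize) {
        let count = |m: &HashMap<u32, VecDeque<u64>>| m.get(&handle).map_or(0, |r| r.len());
        (count(&self.primary_trails), count(&self.ghost_trails))
    }
}
```

A `(0, 0)` result for a focused handle is the "guest never signalled it" case the diagnostic was built to distinguish.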
MechaCat02	0e95e38813	Merge audit-2026-05-fix/p2-vdswap-parallel-fallback (GPUBUG-DRAIN-001)	2026-05-03 17:12:19 +02:00
MechaCat02	7a1b6b3306	fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel The Phase-C VdSwap PM4 ring path (commit `82f3d61`) emits two "PM4_XE_SWAP not consumed by drain" warnings when running: exec sylpheed.iso --ui --quiet --halt-on-deadlock \ --parallel --reservations-table Lockstep -n 100M never trips it. Two distinct race windows: (a) Inline backend (--ui forces it): drain(mem, 4096) hit its fixed packet cap before reaching the PM4_XE_SWAP we'd just injected at the WPTR tail. With 6 CPU threads, the ring accumulates >4096 packets between vd_swap callbacks. (b) Threaded backend (--parallel without --ui): the worker's DrainFence handler has a 900 ms deadline and game-batched IBs (8-10 M packets observed) keep it from reaching the tail in any reasonable budget. If the worker eventually drained past the injected packet later, the safety-net direct notify would double-count. Three changes: * gpu_system.rs: new `drain_until_wptr(target, time_budget)` draining by the canary `WorkerThreadMain` predicate (read_offset != target) instead of a fixed packet count. 900 ms deadline mirrors the threaded DrainFence handler. * handle.rs: inline `drain_to_current_wptr` switches to `drain_until_wptr`. DrainFence handler publishes the digest mirror BEFORE replying so the CPU's post-drain `digest_snapshot` sees fresh stats. * exports.rs (vd_swap): skip the PM4 ring injection unconditionally and route swap notification through `notify_xe_swap` directly. Tail-injection is unreliable under --parallel for both backends. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001); draws=0 today so a stale slot 0 has no observable effect. Verification: * cargo test --workspace --release: 556 passing (unchanged). * Lockstep -n 100M --stable-digest: bit-identical to pre-fix master HEAD `aa3f1d3`. {instructions:100000002, imports:987685, unimpl:0, draws:0, swaps:2, ...} * check --parallel --reservations-table -n 30M: 0 warnings (was 2). swaps=2. 
* exec --gpu-inline --parallel --reservations-table -n 30M: 0 warnings (was 2 with drained=8M-10M observed). swaps=2. Audit IDs: GPUBUG-DRAIN-001 (closed), GPUBUG-FETCH-PATCH-001 (filed, deferred). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 17:12:15 +02:00
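The change from a fixed packet cap to a "drain until the read offset reaches the target" predicate can be sketched as follows. The `Ring` type and its one-dword-per-packet stepping are simplifications invented for the example; only the predicate (`read_offset != target`) and the time budget come from the commit message.

```rust
use std::time::{Duration, Instant};

/// Simplified ring: every packet is one dword, so draining a packet just
/// advances `read_offset` by one. Assumes `size_dwords > 0`.
pub struct Ring {
    pub read_offset: u32,
    pub size_dwords: u32,
    pub packets_drained: u64,
}

impl Ring {
    /// Drains until `read_offset == target` or the time budget runs out.
    /// Returns true when the target was reached.
    pub fn drain_until_wptr(&mut self, target: u32, budget: Duration) -> bool {
        let deadline = Instant::now() + budget;
        while self.read_offset != target {
            if Instant::now() >= deadline {
                return false;
            }
            self.read_offset = (self.read_offset + 1) % self.size_dwords;
            self.packets_drained += 1;
        }
        true
    }
}
```

Unlike a fixed cap, the loop's exit condition depends on the write pointer itself, so a packet injected at the tail is reached whenever the budget allows.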
MechaCat02	aa3f1d344f	Merge audit-2026-05-fix/tracker-close-out: fix-session 2026-05-03 close-out	2026-05-03 14:35:08 +02:00
MechaCat02	c7fccccbc6	docs(audit): close out fix session 2026-05-03 — 12 IDs applied Records the outcome of the audit-2026-05 fix sprint into the master tracker. Documents: - 12 closed IDs (10 P0 + 2 P1) with their commit SHAs and verification deltas - 4 deferred IDs (XAMBUG-001, XAMBUG-002, KRNBUG-D08/XMODBUG-011, PPCBUG-720/721/722) with explicit reasons - Sprint acceptance criteria status: A-E lands cleanly with swaps=2, but draws=0 persists (renderer plateau is multi-causal as the audit predicted; parked-waiter handles unresolved) - Recommended next session: trace producers for the 4 parked-waiter handles directly Closed IDs: SWAPBUG-001 / PPCBUG-001 (P0) → `9ab986e` ORACBUG-004 (P0; partial ORACBUG-006) → `1f416aa` KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013 (3× P0) → `82f3d61` GPUBUG-101 (P0) → `78ea81c` GPUBUG-100 (P0; abs deferred) → `c5c6713` GPUBUG-102 (P0) → `ec2d955` GPUBUG-103/104/105 (3× P0) → `8723d68` KRNBUG-017 (P0-under-parallel) → `e7d0fcf` GPUBUG-006 (P1) → `8fc1b1d` XMODBUG-002 (P1) → `780e854` Test count at sprint close: 556 (+5 from 551 baseline). Workspace clean; no dangling branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:35:08 +02:00
MechaCat02	6f851a2083	Merge audit-2026-05-fix/p1-xmodbug-002-write-bulk	2026-05-03 14:30:22 +02:00
MechaCat02	780e854c2f	fix(memory): XMODBUG-002 — write_bulk bumps page_versions for touched pages `GuestMemory::write_bulk` did the bulk copy via raw `copy_nonoverlapping` without bumping page_versions for any of the pages it touched. The per-byte `write_u8/u16/u32` methods all bump page_versions after their store; downstream caches (texture cache, shader cache) Acquire-load the slot to invalidate stale entries on guest writes. Without the bulk bump, a caller like `NtReadFile` writing a texture/shader resource into guest memory would leave any cache that had already keyed on the prior version handing back stale decoded bytes. After the copy, walk every page the write touched and bump it. Cheap: the typical bulk write spans a few pages (NtReadFile uses 64-128 KB chunks → 16-32 pages). Reservation-table invalidation for `lwarx`/`stwcx.` (XMODBUG-001's sibling) is NOT addressed here — the reservation table lives on KernelState, not GuestMemory, and plumbing it through requires a wider change. Callers that bulk-write code-bearing or atomic-bearing memory should call `kernel.reservations.invalidate_for_write(addr)` themselves; XEX-loader and NtReadFile are doing data-bearing writes that don't intersect lwarx targets, so this is acceptable for now. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 texture_cache_entries: 0 → 0 (Sylpheed hasn't issued IM_LOAD yet — the bump is silent until a cache keys on a touched page, which won't happen until Phase F2/F3 unblocks the resource-loader workers) packets: ~59M (within noise) Tests: 16 memory pass. Closes XMODBUG-002 (P1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:30:22 +02:00
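The fix amounts to bumping a version counter for every page a bulk write touches. The sketch below uses a flat `Vec<u8>` and a hypothetical `GuestMemory` layout with 4 KiB pages (consistent with "64-128 KB chunks → 16-32 pages"); the real type and its raw-pointer copy are not reproduced here.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

pub const PAGE_SHIFT: usize = 12; // 4 KiB pages (assumed)

pub struct GuestMemory {
    bytes: Vec<u8>,
    page_versions: Vec<AtomicU32>,
}

impl GuestMemory {
    pub fn new(size: usize) -> Self {
        let pages = (size + (1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
        Self {
            bytes: vec![0; size],
            page_versions: (0..pages).map(|_| AtomicU32::new(0)).collect(),
        }
    }

    /// Copies `data` to `addr`, then bumps the version of every touched page
    /// so caches keyed on the old version see the write.
    pub fn write_bulk(&mut self, addr: usize, data: &[u8]) {
        if data.is_empty() {
            return;
        }
        self.bytes[addr..addr + data.len()].copy_from_slice(data);
        let first = addr >> PAGE_SHIFT;
        let last = (addr + data.len() - 1) >> PAGE_SHIFT;
        for page in first..=last {
            self.page_versions[page].fetch_add(1, Ordering::Release);
        }
    }

    pub fn page_version(&self, page: usize) -> u32 {
        self.page_versions[page].load(Ordering::Acquire)
    }
}
```

Computing `last` from the final written byte, not one past it, avoids bumping a page the write never reached.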
MechaCat02	104078dc29	Merge audit-2026-05-fix/p1-gpubug-006-mmio-ordering	2026-05-03 14:26:09 +02:00
MechaCat02	8fc1b1dfed	fix(gpu): GPUBUG-006 — sync_with_mmio Acquire/Release pair the producer The producer side (`mmio_region.rs:78`, the guest's CP_RB_WPTR MMIO write callback) uses `Ordering::Release` so any ring-memory writes the guest performed before bumping WPTR are visible to a paired `Acquire`-load on the consumer. The consumer here at `sync_with_mmio` was using `Ordering::Relaxed` for both the WPTR load and the RPTR mirror store — leaving the Release/Acquire pairing broken. Under `--parallel`, this broken pairing means the GPU worker can observe a fresh WPTR value while still reading stale ring-memory contents at the corresponding offsets — garbage PM4 packets. The audit's M11 grid run confirmed --parallel is non-deterministic beyond the documented `packets` ±5% noise; this fix is one strand of that. Symmetric fix on the RPTR mirror store: Release pairs with any guest-side Acquire-load of CP_RB_RPTR for ring-writeback bookkeeping. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (unchanged) packets: ~60M (within noise) Tests: 149 (no count change; this is a memory-ordering correctness fix, not a behavioral change visible at the digest level in lockstep). Closes GPUBUG-006 (P1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:26:09 +02:00
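The Release/Acquire pairing this commit restores can be demonstrated in isolation. The sketch is a stand-alone model: `Ring` and `produce_then_consume` are hypothetical, and ring memory is modeled with atomics so the example stays in safe Rust.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

pub struct Ring {
    pub slots: Vec<AtomicU32>,
    pub wptr: AtomicU32,
}

/// The producer fills the ring, then publishes WPTR with Release.
/// The consumer Acquire-loads WPTR, so once it sees the new value it is
/// also guaranteed to see the slot contents written before the store.
pub fn produce_then_consume(packets: &[u32]) -> Vec<u32> {
    let ring = Arc::new(Ring {
        slots: packets.iter().map(|_| AtomicU32::new(0)).collect(),
        wptr: AtomicU32::new(0),
    });
    let expected = packets.len() as u32;

    let producer = {
        let ring = Arc::clone(&ring);
        let packets = packets.to_vec();
        thread::spawn(move || {
            for (slot, packet) in ring.slots.iter().zip(&packets) {
                slot.store(*packet, Ordering::Relaxed);
            }
            ring.wptr.store(packets.len() as u32, Ordering::Release);
        })
    };

    // Consumer side: with a Relaxed load here the slot reads below could
    // legally observe stale zeros.
    while ring.wptr.load(Ordering::Acquire) != expected {
        std::hint::spin_loop();
    }
    let seen = ring.slots.iter().map(|s| s.load(Ordering::Relaxed)).collect();
    producer.join().unwrap();
    seen
}
```

The slot accesses themselves can stay Relaxed because the ordering guarantee is carried entirely by the WPTR store/load pair.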
MechaCat02	fceaa81f46	Merge audit-2026-05-fix/kernel-p0-spinlock-xam: KRNBUG-017 Kf-spinlock	2026-05-03 14:25:00 +02:00
MechaCat02	e7d0fcf2c9	fix(kernel): KRNBUG-017 — real KfSpinLock + KeReleaseSpinLockFromRaisedIrql The Kf-family spinlock exports were registered as stubs: KfAcquireSpinLock → stub_return_zero (didn't write lock) KfReleaseSpinLock → stub_success (didn't clear lock) KeReleaseSpinLockFromRaisedIrql → stub_success (same) KeTryToAcquireSpinLockAtRaisedIrql → returned 1 but didn't set lock value Guest code that read the lock value back (e.g. nested acquire/release sanity checks, debug assertions) saw 0 even after "acquiring", and could enter critical regions without contention serialization. Under `--parallel` the coarse Arc<Mutex<KernelState>> already serializes us, so the audit's P0-under-parallel ranking is about correctness of the lock value visible to guest code, not mutual-exclusion (which is provided by the host mutex). Implementation mirrors canary's `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc`: - KfAcquireSpinLock: write 1 to SpinLock, return 0 (old IRQL) - KfReleaseSpinLock: write 0 to SpinLock - KeReleaseSpinLockFromRaisedIrql: write 0 to SpinLock - KeTryToAcquireSpinLockAtRaisedIrql: write 1 to *SpinLock, return 1 Single-threaded HLE: contention can never be observed (we never run two guest threads simultaneously without holding the kernel mutex), so the spin-loop can degenerate to an unconditional acquire. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (gated by F2/F3/G) packets: ~59M (within noise) Tests: 76 kernel pass (no count change; existing harness covers the new write semantics implicitly via guest-memory smoke tests). Closes KRNBUG-017 (P0 under --parallel). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:24:47 +02:00
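The four write semantics listed in this commit can be sketched against a toy memory map. `Mem` and the snake_case function names are hypothetical stand-ins; only the written values and return values are taken from the commit message.

```rust
use std::collections::HashMap;

/// Toy guest memory: address -> 32-bit word (unwritten addresses read as 0).
#[derive(Default)]
pub struct Mem(HashMap<u32, u32>);

impl Mem {
    pub fn read_u32(&self, addr: u32) -> u32 {
        self.0.get(&addr).copied().unwrap_or(0)
    }
    pub fn write_u32(&mut self, addr: u32, value: u32) {
        self.0.insert(addr, value);
    }
}

/// Writes 1 to the lock word and returns 0 as the old IRQL.
pub fn kf_acquire_spin_lock(mem: &mut Mem, lock: u32) -> u32 {
    mem.write_u32(lock, 1);
    0
}

/// Clears the lock word.
pub fn kf_release_spin_lock(mem: &mut Mem, lock: u32) {
    mem.write_u32(lock, 0);
}

/// Same store as the Kf release, used when already at raised IRQL.
pub fn ke_release_spin_lock_from_raised_irql(mem: &mut Mem, lock: u32) {
    mem.write_u32(lock, 0);
}

/// Writes 1 to the lock word and reports success (1).
pub fn ke_try_to_acquire_spin_lock_at_raised_irql(mem: &mut Mem, lock: u32) -> u32 {
    mem.write_u32(lock, 1);
    1
}
```

There is no spin loop because, as the commit notes, contention cannot be observed while the host mutex serializes guest threads.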
MechaCat02	537d789deb	Merge audit-2026-05-fix/drawstate-p0-register-addresses: GPUBUG-103/104/105	2026-05-03 14:22:09 +02:00
MechaCat02	8723d6826b	fix(gpu): GPUBUG-103/104/105 — fix 8 draw-state register addresses + index_size bit Eight of the register-index constants in draw_state.rs::reg pointed at completely unrelated registers because the canonical canary table (register_table.inc) was misread when the module was first authored. Re-validated each value against canary's lines 1232-1336.
| Register | Pre-fix | Canary | Was-actually |
| --- | --- | --- | --- |
| VGT_DRAW_INITIATOR | 0x2281 | 0x21FC | (junk) |
| VGT_DMA_BASE | 0x2282 | 0x21FA | (junk) |
| VGT_DMA_SIZE | 0x2283 | 0x21FB | (junk) |
| PA_SC_WINDOW_SCISSOR_TL | 0x200E | 0x2081 | SCREEN_SCIS_TL |
| PA_SC_WINDOW_SCISSOR_BR | 0x200F | 0x2082 | SCREEN_SCIS_BR |
| RB_COLOR_INFO_1 | 0x2010 | 0x2003 | COHER_DEST_BASE_10 |
| RB_COLOR_INFO_2 | 0x2011 | 0x2004 | COHER_DEST_BASE_11 |
| RB_COLOR_INFO_3 | 0x2012 | 0x2005 | COHER_DEST_BASE_12 |
| PA_SU_VTX_CNTL | 0x2083 | 0x2302 | PA_SC_CLIPRECT_RULE |
Also corrected the `index_size` bit position in VGT_DRAW_INITIATOR extraction: was bit 8 (which is `major_mode[0]`), should be bit 11 per canary `registers.h:324` (`xenos::IndexFormat index_size : 1; // +11`). The block comment in `extract()` was also wrong about the intermediate field layout and has been refreshed. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (still gated — see below) packets: ~61M (within noise) Tests: 149 (no count change; existing draw_state tests cover the new constants implicitly via behavioral round-trip). The audit predicted Phases C+D+E together would unlock `draws > 0`, but the runtime plateau is multi-causal per the audit's own analysis (`project_xenia_rs_audit_2026_05_02.md`). The likely remaining blockers in -n 100M: * 4 parked-waiter worker threads (handles 0x1004, 0x100c, 0x15e4, 0x42450b5c) — Phase F's XAM/spinlock fixes target this. 
* shader_blobs_live=0 after 100M — the game hasn't issued IM_LOAD yet because workers haven't loaded shader resources. The register fixes here are still load-bearing for any draw that DOES happen (every register read at 0x2281 was junk before this commit) — landing them now is correct even if draws=0 persists until Phase F unparks the resource-loader threads. Closes GPUBUG-103, GPUBUG-104, GPUBUG-105 (P0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:22:04 +02:00
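The corrected constants and the bit-11 extraction can be collected in one place for reference. This is a sketch: the `reg` module below only mirrors the "Canary" column of the table in this commit, and `index_size_bit` is a hypothetical helper name.

```rust
/// Register indices from the "Canary" column of the commit's table.
pub mod reg {
    pub const VGT_DMA_BASE: u32 = 0x21FA;
    pub const VGT_DMA_SIZE: u32 = 0x21FB;
    pub const VGT_DRAW_INITIATOR: u32 = 0x21FC;
    pub const RB_COLOR_INFO_1: u32 = 0x2003;
    pub const RB_COLOR_INFO_2: u32 = 0x2004;
    pub const RB_COLOR_INFO_3: u32 = 0x2005;
    pub const PA_SC_WINDOW_SCISSOR_TL: u32 = 0x2081;
    pub const PA_SC_WINDOW_SCISSOR_BR: u32 = 0x2082;
    pub const PA_SU_VTX_CNTL: u32 = 0x2302;
}

/// Extracts the 1-bit `index_size` field, which sits at bit 11 of
/// VGT_DRAW_INITIATOR (bit 8, used before the fix, belongs to `major_mode`).
pub fn index_size_bit(vgt_draw_initiator: u32) -> u32 {
    (vgt_draw_initiator >> 11) & 1
}
```

A value with only bit 8 set now decodes as index size 0, which is the behavioral difference the fix introduces.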
MechaCat02	a07784349d	Merge audit-2026-05-fix/shader-p0-operand-modifiers: GPUBUG-100/101/102	2026-05-03 14:18:51 +02:00
MechaCat02	ec2d955dbd	fix(gpu): GPUBUG-102 — apply per-format endian byte-swap to vertex fetch The vertex fetch constant (canary `xe_gpu_vertex_fetch_t`, xenos.h:1158-1172) holds an `endian` field (low 2 bits of dword_1) selecting kNone/k8in16/k8in32/k16in32 swap patterns per `GpuSwapInline` (xenos.h:1090-1109). Xbox 360 vertex data is stored big-endian; the host is little-endian. Pre-fix every dword was bitcast as-is — vertex positions decoded as byte-reversed garbage, producing clipped or NaN positions in any draw that survived to the host. Mechanical changes: - crates/xenia-gpu/src/translator.rs: AOT `emit_vfetch` reads fetch_const dword 1 (endian) and wraps each lane's load in `gpu_swap(value, endian)`. New `gpu_swap` helper added to the emitted module header. - crates/xenia-gpu/src/shaders/xenos_interp.wgsl: matching `gpu_swap` helper added to the runtime interpreter shader. `interpret_vertex_fetch` reads fc1, computes the endian, and wraps every format's per-lane load (including 8_8_8_8 and 16_16_FLOAT paths). Mirrors the AOT translator's emission. Verification at -n 100M lockstep: swaps: 2 → 2 (gated by Phase E for draws) draws: 0 → 0 packets: ~60M (within noise) Tests: +1 (vfetch_emit_includes_gpu_swap_helper_call). Closes GPUBUG-102 (P0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:18:46 +02:00
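The four swap patterns named in this commit can be written as a host-side reference function. This is a sketch in Rust rather than the WGSL the commit emits; the mapping of the 2-bit `endian` value to kNone, k8in16, k8in32, k16in32 in the order 0 to 3 is an assumption based on the order in which the commit lists them.

```rust
/// Applies a vertex-fetch endian swap to one dword.
/// Assumed encoding: 0 = kNone, 1 = k8in16, 2 = k8in32, 3 = k16in32.
pub fn gpu_swap(value: u32, endian: u32) -> u32 {
    match endian & 0x3 {
        0 => value,
        // Swap the two bytes inside each 16-bit half.
        1 => ((value << 8) & 0xFF00_FF00) | ((value >> 8) & 0x00FF_00FF),
        // Reverse all four bytes.
        2 => value.swap_bytes(),
        // Swap the two 16-bit halves.
        _ => value.rotate_left(16),
    }
}
```

Such a reference is handy for unit-testing the emitted shader helper against known byte patterns.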
MechaCat02	c5c6713419	fix(gpu): GPUBUG-100 — apply per-operand swizzle + negate to ALU sources Word-1 of every ALU triple holds three 8-bit component-relative swizzles (`src1_swiz`/`src2_swiz`/`src3_swiz` at bits 16-23/8-15/0-7 per canary ucode.h:2064-2066) and three per-operand negate flags (bits 24/25/26). Pre-fix, both the WGSL interpreter and the AOT translator discarded word-1 entirely with `_ = w1;` — every ALU result was missing its swizzle (broadcast/permute patterns like `.zyxw`, `.xxxx`) and any negated operand was used positive instead. Component-relative semantics (canary's `AluInstruction::GetSwizzledComponentIndex`, ucode.h:1996): for output component i, the source component is `((swizzle >> (2*i)) + i) & 3`. Identity swizzle is 0x00, NOT 0xE4 — the original `apply_swizzle` in the interpreter shader treated it as absolute, also incorrect. Mechanical changes: - crates/xenia-gpu/src/ucode/alu.rs: extend AluInstruction with src_X_swiz (u8) and src_X_negate (bool) fields. decode_alu unpacks them from word 1. - crates/xenia-gpu/src/shaders/xenos_interp.wgsl: apply_swizzle uses component-relative semantics. interpret_alu decodes the modifiers and applies via apply_swizzle + apply_modifiers (with abs=false). - crates/xenia-gpu/src/translator.rs: src_operand emits the precomputed swizzle inline as `vec4<f32>(base.x, base.y, ...)`, then wraps in `(-…)` when negated. Identity swizzle (0x00) emits a bare base expression so it round-trips with the trivial-shader fixture. Abs is omitted in this commit — the abs flag is dual-meaning (for temps it lives at bit 7 of the src byte; for constants at word-2 bit 7 `abs_constants`). Wiring it up correctly requires more careful case-split logic; deferred to Phase G. Verification at -n 100M lockstep: swaps: 2 → 2 (gated by Phase E for draws) draws: 0 → 0 packets: ~58M (within noise) Tests: 554 → 555 (+1 swizzle/negate test, no count change otherwise because identity swizzle test merged into D1's parameterised test). 
WGSL still validates via naga (combined_module_parses_as_wgsl). Closes GPUBUG-100 (P0). Abs deferred to Phase G. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:15:07 +02:00
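The component-relative rule quoted in the commit above, `((swizzle >> (2*i)) + i) & 3`, can be checked with a small host-side sketch. The function names here are hypothetical and the real implementations are the WGSL `apply_swizzle` and the translator's `src_operand`:

```rust
/// Source component index for output component `i` (0..=3),
/// using the component-relative formula from the commit message.
pub fn swizzled_component(swizzle: u8, i: u32) -> usize {
    (((u32::from(swizzle) >> (2 * i)) + i) & 3) as usize
}

/// Apply a component-relative swizzle, then an optional negate, to a vec4.
pub fn apply_swizzle_negate(src: [f32; 4], swizzle: u8, negate: bool) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        let v = src[swizzled_component(swizzle, i as u32)];
        out[i] = if negate { -v } else { v };
    }
    out
}
```

Under this rule 0x00 is the identity and 0x6C broadcasts `.xxxx`, while 0xE4 (the identity in an absolute encoding) permutes the vector, which is the mistake the commit describes in the original `apply_swizzle`.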
MechaCat02	78ea81c12a	fix(gpu): GPUBUG-101 — decode src1/2/3_sel temp-vs-constant selector Per canary AluInstruction layout (xenia-canary/src/xenia/gpu/ucode.h: 2078-2086), word-0 bits 29-31 are the per-operand `srcN_sel` flags selecting temp register (1) vs ALU constant (0); the corresponding 8-bit src byte indexes either: - a temp register (bits 5:0 = index, bits 6/7 reserved for relative-addressing / abs flags consumed by Phase D2), or - an ALU constant (full 8-bit index). Pre-fix, the WGSL interpreter and AOT translator both masked `& 0x7F` on the src byte and emitted `r[low7]` regardless of the operand class. Every shader's WVP matrix / light constant / per-frame uniform read came back as r[low7] — typically zero — yielding invisible rendering. Mechanical changes: - crates/xenia-gpu/src/ucode/alu.rs: decode src_a_is_temp / src_b_is_temp / src_c_is_temp from w0 bits 29/30/31. Note that our src_a (low byte of w0) is canary's third operand, hence its selector is bit 29 (canary src3_sel), not bit 31. - crates/xenia-gpu/src/shaders/xenos_interp.wgsl: `read_src` now takes the is_temp flag; constants index xenos_consts.alu directly. - crates/xenia-gpu/src/translator.rs: `src_operand` mirrors the interpreter — `r[idx]` when temp, `xenos_consts.alu[idx]` when constant. The trivial-shader synthetic test was updated to set the temp flags so its `r[0u] = (r[0u] + r[0u])` assertion remains valid; without the flags set, all sources would now resolve as constants. Bank-selection (cf-level relative addressing for higher banks of the 512 ALU constants) remains a Phase G+ extension — covers c0..c127 in bank 0, which most Sylpheed shaders use directly. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged — gated by D2/D3/E for draws) draws: 0 → 0 packets: ~61M (within noise) Tests: 552 → 554 (+2 translator tests for the temp/constant decode). Closes GPUBUG-101 (P0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:10:11 +02:00
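The temp-versus-constant split described in the commit above can be sketched for the one operand whose bit positions the message states explicitly: `src_a` in the low byte of word 0, selected by bit 29. The `AluSource` enum and `decode_src_a` are hypothetical names; the positions of the other two operands are not spelled out in the message and are left out.

```rust
/// Hypothetical decoded operand: a temp register or an ALU constant.
#[derive(Debug, PartialEq)]
pub enum AluSource {
    Temp(u8),
    Constant(u8),
}

/// Decode `src_a` from ALU word 0.
/// Bit 29 set selects a temp register (index in bits 5:0 of the src byte);
/// clear selects an ALU constant (full 8-bit index).
pub fn decode_src_a(w0: u32) -> AluSource {
    let src_byte = (w0 & 0xFF) as u8;
    let is_temp = (w0 >> 29) & 1 == 1;
    if is_temp {
        AluSource::Temp(src_byte & 0x3F)
    } else {
        AluSource::Constant(src_byte)
    }
}
```

The pre-fix behaviour corresponds to always taking the `Temp` arm with a `& 0x7F` mask, so a constant index such as 0xC5 was read as a temp register.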
MechaCat02	1b74db6fa7	Merge audit-2026-05-fix/renderer-p0-vdswap-pm4: VdSwap PM4 ring path	2026-05-03 14:00:27 +02:00
MechaCat02	82f3d611e2	fix(gpu,kernel): KRNBUG-Vd-04 / GPUBUG-001 / XMODBUG-013 — VdSwap PM4 ring path The pre-fix VdSwap zero-filled the guest's reserved buffer with NOPs and called `state.gpu.notify_xe_swap` directly — bypassing the ring, leaving the PM4_XE_SWAP handler at gpu_system.rs:1232 dead code, and skipping the PM4_TYPE0(SHADER_CONSTANT_FETCH_00_0, 6) patch. Sylpheed's bloom/ blur "sample frame N for frame N+1" path samples fetch-constant slot 0 expecting the frontbuffer descriptor; without the patch, slot 0 stayed stale and any shader sampling it read garbage. This commit writes the canary VdSwap PM4 sequence directly into the primary ring at the current write pointer (read via the shared MMIO atomic), then advances WPTR over the injection. The natural CP drain consumes PM4_XE_SWAP — bumping `swaps_seen` and patching fetch-constant slot 0 — without going through any direct kernel→GPU bypass. Sequence per xenia-canary VdSwap_entry (xboxkrnl_video.cc:438-521): 1) PM4_TYPE0(0x4800, count=6) + 6 fetch-header dwords (with base_address re-patched from virtual to physical >> 12). 2) PM4_TYPE3(PM4_XE_SWAP, count=4) + signature + frontbuffer_phys + width + height. Mechanism notes: - buffer_ptr in xenia-rs is in the system command buffer, NOT the primary ring (verified empirically: buffer_ptr=0x4acd4df8 vs ring_base=0x0accb000, size 4 KB). Canary's VdSwap writes to buffer_ptr because its ring layout maps the reserved slot inside the ring; xenia-rs's doesn't, so we have to write at the actual ring WPTR address (cached on KernelState.ring_base from VdInitializeRingBuffer). - The original "buffer_ptr zero-fill + bump WPTR by 64" path is preserved before the injection — it exposes any game-batched PM4 packets and keeps the buffer_ptr region skippable per existing game compat behavior. - A safety-net fallback at the end calls `notify_xe_swap` directly if swaps_seen didn't advance during the drain (e.g. a ring-arithmetic edge case). 
Idempotent — only fires when the PM4 path didn't. - KRNBUG-Mm-04 deferred: virt→phys uses the masked stub `virt & 0x1FFF_FFFF`, sufficient for the standard heap. Mechanical changes: - crates/xenia-gpu/src/pm4.rs: add make_packet_type0 / type2 / type3 helpers + round-trip unit test (mirrors canary xenos.h:1682-1709). - crates/xenia-gpu/src/handle.rs: add mmio_cp_rb_wptr_load accessor (Acquire-load) so the kernel can compute ring offsets. - crates/xenia-kernel/src/state.rs: cache ring_base / ring_size_dwords on KernelState (set by VdInitializeRingBuffer). - crates/xenia-kernel/src/exports.rs: rewrite the vd_swap PM4-emit block; patch fetch_dwords[1] base_address virt→phys before injection. Verification at -n 100M lockstep: swaps: 2 → 2 (game fires VdSwap exactly twice) draws: 0 → 0 (gated by Phases D+E) fallback warning: 0 occurrences (PM4 path consumed both swaps) instructions: ~100M Tests: 552 passing (553 with new pm4 round-trip test). Lockstep stable-fields determinism: byte-identical across two 100M runs. The "swaps > 2" prediction in the audit's plan assumed the game would fire VdSwap more often once the path worked; empirically Sylpheed only calls VdSwap twice within 100M instructions (this is the renderer plateau the audit identified). The success criterion for Phase C is that the PM4 path is now operational, which Phases D+E require for visible draws. Closes KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 14:00:23 +02:00
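The packet headers named in the commit above can be sketched as below. The bit layout is the conventional PM4 header layout and is an assumption here, not a copy of the helpers added to pm4.rs; `virt_to_phys` mirrors the masked stub the message quotes.

```rust
/// Type-0 header: write `count` consecutive registers starting at `index`.
/// Assumed layout: type in bits 31:30, count-1 in bits 29:16, index in bits 14:0.
pub fn make_packet_type0(index: u32, count: u32) -> u32 {
    debug_assert!(count >= 1 && count <= 0x4000);
    (((count - 1) & 0x3FFF) << 16) | (index & 0x7FFF)
}

/// Type-2 header: a no-op filler packet.
pub fn make_packet_type2() -> u32 {
    2u32 << 30
}

/// Type-3 header: `opcode` followed by `count` payload dwords.
/// Assumed layout: count-1 in bits 29:16, opcode in bits 14:8.
pub fn make_packet_type3(opcode: u32, count: u32) -> u32 {
    debug_assert!(count >= 1 && count <= 0x4000);
    (3u32 << 30) | (((count - 1) & 0x3FFF) << 16) | ((opcode & 0x7F) << 8)
}

/// Masked virtual-to-physical stub quoted in the commit message.
pub fn virt_to_phys(virt: u32) -> u32 {
    virt & 0x1FFF_FFFF
}
```

With this layout, `make_packet_type0(0x4800, 6)` is the header for the six fetch-constant dwords in step 1 of the sequence.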
MechaCat02	0590bffdd9	Merge audit-2026-05-fix/oracle-sylpheed-n50m-n4b: ORACBUG-004 sylpheed_n50m oracle	2026-05-03 13:46:06 +02:00
MechaCat02	1f416aaa2e	test(check): ORACBUG-004 — sylpheed_n50m stable-digest oracle

Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep, covering the first VdSwap pair (the n2m oracle is swap-blind because the first VdSwap fires at ~18M instructions).

The new --stable-digest flag emits/compares only fields that are deterministic in lockstep:
  instructions, imports, unimpl, draws, swaps, unique_render_targets, shader_blobs_live, texture_cache_entries

Excluded:
  packets — empirically ±2-8% lockstep variance (GPU thread race per audit M11)
  resolves, interrupts_delivered, interrupts_dropped, texture_decodes — scheduling-sensitive under --parallel
  path — cwd-dependent

Empirical determinism: 3 consecutive lockstep -n 50M runs produce byte-identical stable-digest output.

The n4b canonical-invocation golden, which the audit's recommended next sprint also called for, is deferred. Per audit memory, `--parallel --reservations-table` is pathologically slow (>32 min for -n 100M), so -n 4B in that mode would take many hours per run, not the 5-15 min the plan estimated. n4b will be captured one-shot post-renderer-unblock as a manual artifact under audit-runs/post-fix/, not as a test golden. See crates/xenia-app/tests/golden/README.md.

Test infrastructure:
- crates/xenia-app/tests/sylpheed_oracles.rs — invokes CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO env var (skips gracefully if missing).
- #[ignore]-gated; run via:
    cargo test --release -p xenia-app --test sylpheed_oracles \
      -- --ignored --nocapture

Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:46:02 +02:00
MechaCat02	62f673d094	Merge audit-2026-05-fix/swapbug-001-revert-addi-truncation: SWAPBUG-001 revert	2026-05-03 13:38:05 +02:00
MechaCat02	9ab986ec09	fix(cpu): SWAPBUG-001 — revert addi 32-bit truncation The addi opcode was truncating its result to 32 bits per the post-P4-batch3 "32-bit ABI" rationale (commit `bf8208e`). Hunk-level bisection during the 2026-05 audit (M11) isolated this single cast as the cause of the post-P8 swap regression: swaps dropped 2 → 1 and the renderer lost a frame. PowerISA mandates sign-extension to 64 bits; canary does not truncate addi. The truncation was a canary-divergent over-extension of the addis fix (which IS canary-divergent by design, see addis at interpreter.rs:121-134). The addi_li_neg_one_zero_extends_upper test encoded the wrong invariant. Replaced with a sign-extension test asserting canonical PowerISA behavior (gpr[3] == 0xFFFF_FFFF_FFFF_FFFF for `li r3, -1`). Verification at -n 100M lockstep: swaps: 1 → 2 (gate met) draws: 0 → 0 (unchanged — gated by Phase C+D+E) instructions: ~100M (unchanged) imports: 11.4M → 987k (game escapes retry loop) packets: 281M → 57M (same) interrupts_delivered: 629 → 630 Tests: 551 passing (unchanged). Lockstep determinism: byte-identical across two 100M runs except packets (±5%, GPU-thread-race noise floor). Closes SWAPBUG-001 / PPCBUG-001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 13:37:51 +02:00
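The difference between the reverted behaviour and the restored one can be shown on the `li r3, -1` case the commit above cites. A minimal sketch with hypothetical function names, assuming the caller has already substituted 0 for the rA operand when rA is r0:

```rust
/// addi with the 16-bit immediate sign-extended to 64 bits (restored behaviour).
pub fn addi(ra_value: u64, simm: i16) -> u64 {
    ra_value.wrapping_add(simm as i64 as u64)
}

/// The reverted variant: same sum, then truncated to 32 bits.
pub fn addi_truncated(ra_value: u64, simm: i16) -> u64 {
    addi(ra_value, simm) as u32 as u64
}
```

`li r3, -1` is `addi r3, 0, -1`, so the two variants disagree in the upper 32 bits of the result.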
MechaCat02	caa37fc595	docs(audit): post-P8 end-to-end review findings + acid test result Document the post-P8 cross-cutting review and acid test outcome: End-to-end reviewer caught: - BLOCKING-LIKELY: lwa/lwax/lwaux ISA deviation (fixed in `f1166d0`) - Cosmetic: fpscr round_single_toward_zero duplicate-branch (fixed in `09c6c92`) - Minor performance: reservation table active_reservers as slot-occupancy - Asymmetry note: extswx remains 64-bit ABI per audit PPCBUG-038 (wontfix) Acid test (-n 4B --parallel --reservations-table, pre-lwa-hotfix build): - swaps=1, draws=0 - exit 0, no panics, no errors, no RtlRaiseException - 14 thread spawns, 2 LR-sentinel exits - Renderer plateau NOT unblocked by cumulative P1-P8 correctness fixes Implication: the Sylpheed `draws=0` plateau has a non-PPC-correctness root cause. PPC fixes were correctness-justified independent of the renderer (well-grounded against canary). Next investigation tracks: graphics pipeline (EDRAM resolve, RT readback), kernel HLE (event signaling, timers), or the unresolved BST-validation paradox per `project_xenia_rs_sylpheed_event_chain_2026_04_29.md`. Out of scope for the PPC instruction audit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:49:43 +02:00
MechaCat02	09c6c927bd	refactor(cpu): fpscr round_single_toward_zero — collapse duplicate-branch ULP step Post-P8 review nit: the if/else branches were identical (`adj_bits - 1` either way). Both positive and negative finite f32 values use the IEEE-754 sign bit as the MSB, and subtracting 1 from `to_bits()` always reduces magnitude by one ULP. Replace the mock-conditional with the unconditional form + a comment explaining why one operation works for both signs. No behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:45:55 +02:00
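The claim in the commit above, that one subtraction on the raw bits shrinks the magnitude for either sign, can be sketched as follows. The function name is hypothetical and the sketch assumes a finite, nonzero input:

```rust
/// Step a finite, nonzero f32 one ULP toward zero.
/// The sign lives in the MSB, so decrementing the bit pattern reduces the
/// magnitude field for both positive and negative values.
pub fn step_toward_zero(x: f32) -> f32 {
    debug_assert!(x.is_finite() && x != 0.0);
    f32::from_bits(x.to_bits() - 1)
}
```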
MechaCat02	f1166d0f75	fix(cpu): revert PPCBUG-105 — lwa/lwax/lwaux sign-extend per PowerISA Post-P8 end-to-end review caught an ISA deviation introduced by P4 batch 5. The original code used `as i32 as i64 as u64` (correct PowerISA sign-extension; canary's `SignExtend(INT64_TYPE)`). My P4 batch 5 commit (`20a730d`) changed all three to `as u64` (zero-extend), citing the audit's "32-bit-ABI hazard" note for PPCBUG-105. This deviation is wrong per PowerISA and any 64-bit-mode kernel code that uses `lwa rT, off(rA)` will silently produce the wrong rT for negative words (e.g. memory 0x80000000 should yield 0xFFFFFFFF_80000000 but was yielding 0x00000000_80000000). Restore ISA-spec sign-extension for all three forms (lwa, lwax, lwaux). The audit's 32-bit-ABI hazard concern was speculative — there's no evidence that Xbox 360 user code emits `lwa` (compilers use `lwz`). If a real bug surfaces from a 32-bit-ABI consumer that feeds an `lwa`-loaded value into a u64 unsigned compare, that's a separate issue to debug at the consumer site. Test renamed: lwa_high_bit_set_zero_extends_upper → lwa_sign_extends_to_i64 with assertion flipped to expect 0xFFFFFFFF_80000000. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:43:47 +02:00
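The two cast chains the commit above contrasts behave as follows on the example word 0x80000000. The function names are hypothetical wrappers around the casts quoted in the message:

```rust
/// Restored behaviour: sign-extend the loaded word to 64 bits.
pub fn lwa_sign_extend(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Reverted behaviour: zero-extend, which is what `lwz` does.
pub fn zero_extend(word: u32) -> u64 {
    word as u64
}
```

The two agree for any word with the high bit clear, so the difference only shows up for negative words.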
MechaCat02	9de18a9eec	chore(audit): mark P8 PPCBUGs applied; append P8 progress section; AUDIT-FIX-COMPLETE P8 phase merged at `4029041`. Update audit-findings.md status fields (38 PPCBUGs marked applied) and append the P8 progress section to audit-report-2026-04-29.md. This closes the eight-phase audit-application sweep. Total ~161 PPCBUGs applied across P1-P8. ~12 LOW test-gap IDs remain Status: open and can be closed incrementally without blocking any functionality. Next session: deferred acid test (`xenia-rs check sylpheed.iso -n 4B --parallel --reservations-table`) to see if cumulative correctness fixes unblock the Sylpheed renderer plateau (draws=0). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:24:24 +02:00
MechaCat02	4029041618	Merge branch 'ppc-audit-fix/p8-tests' — Phase 8 test gap closure Phase 8 of the PPC instruction audit fix application: pure test gap closure for opcode groups that previously had near-zero unit test coverage. 53 new tests across 5 commits (4 batches + review-nit rename). - `9827b03`: Batch 1 — branch/CR-logical/SPR/MSR/FPSCR/sync (12 tests) - `2d223ee`: Batch 2 — load/store base + lswx/stswx with XER TBC (15 tests) - `ebfd18a`: Batch 3 — FPU + VMX float (14 tests) - `2614806`: Batch 4 — VMX integer/permute/load-store (12 tests) - `1f9696a`: review-fix nit — vmsum3fp_… → vmaddfp_lane_fma rename Independent reviewer verdict: LGTM, no blocking issues, no rubber- stamp tests, no encoding bugs (every hand-encoded raw cross-checked against canary's INSTRUCTION table). Two minor follow-ups: the test rename was applied immediately; the audit cross-reference in batch-4 body is loose (one representative test per group, not 1:1) — accepted. The XER-TBC tests (`lswx_uses_xer_tbc_for_byte_count`, `stswx_uses_xer_tbc_for_byte_count`) are load-bearing: they directly exercise the P6 XER TBC infrastructure, both opcodes were permanent no-ops pre-P6. Closed IDs (28): 055, 067, 070, 081, 082, 083, 084, 085, 089, 091, 100, 109, 110, 111, 118, 127, 129, 132, 146, 147, 153, 163, 171, 187, 208, 228, 240, 277, 316/320, 321/323, 370, 438, 439, 440, 490, 517. Remaining `Status: Open` test-gap LOW IDs are tracked in audit-findings.md; they don't block any functionality and can be closed in incremental future work. Verification at merge: cargo test --workspace --release reports 551 passed, 0 failed (up from 498 at P7 merge; 53 net new tests). Acid test deferred to end of all phases per user direction.	2026-05-02 14:23:04 +02:00
MechaCat02	1f9696ad47	test(cpu): rename vmsum3fp_… to vmaddfp_lane_fma per reviewer nit P8 review feedback (non-blocking): the test fn name said vmsum3fp but the encoding/body actually tests vmaddfp. Rename + clarify comment; no behavior change. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:22:39 +02:00
MechaCat02	261480616c	test(cpu): PPCBUG-240/277/278/316/321/370/490/517 P8 batch 4 — VMX integer/permute/load-store Phase 8 batch 4 — VMX integer + permute/pack + multiply-sum + load/store. 12 new tests: - VMX add/sub (240): vaddubm byte add, vsubuwm word sub. - VMX compare (277): vcmpequb lane mask. - VMX min/max (278): vmaxsw signed lane max. - VMX shift/rotate (316): vsl 128-bit left shift, vsraw arithmetic per-lane. - VMX logical (321): vand lane-wise AND. - VMX permute (370): vsldoi byte concatenation + shift. - VMX multiply-sum (490): vmaddfp lane FMA. - VMX load/store (517): lvx aligned quadword load, stvx aligned store, lvebx byte-lane load. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:16:51 +02:00
MechaCat02	ebfd18a64e	test(cpu): PPCBUG-187/208/228/438/439/440 P8 batch 3 — FPU + VMX float Phase 8 batch 3 — FPU and VMX float test gap closure. 14 new tests: - Single FPU (187): fadds, fmuls - Double FPU (208): fmul, fdiv (zero-numerator), fneg, fabs, fmr - FPU convert/compare (228): fcmpu, fcfid - VMX float compare (438): vcmpeqfp lane mask - VMX rounding (439): vrfip, vrfim, vrfiz - VMX convert (440): vctsxs saturation to INT_MAX/INT_MIN The VMX VX-form encoding nit (XO is 11 bits at PPC 21-31, host bits 10-0, with bit 0 the LSB — not bit 1) was caught by initial test failures and fixed before commit. VC-form (vcmpeqfp) has the same "XO at bit 0" layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:14:10 +02:00
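The VX-form encoding point in the commit above (an 11-bit XO in host bits 10-0, with bit 0 as its LSB) can be sketched with a hand-encoder of the kind the tests use. The helper names are hypothetical; primary opcode 4 and the 5-bit register fields follow the standard VX-form layout.

```rust
/// Hand-encode a VX-form instruction: primary opcode 4, three 5-bit vector
/// register fields, and an 11-bit extended opcode starting at host bit 0.
pub fn encode_vx(vd: u32, va: u32, vb: u32, xo: u32) -> u32 {
    (4u32 << 26) | ((vd & 31) << 21) | ((va & 31) << 16) | ((vb & 31) << 11) | (xo & 0x7FF)
}

/// Extract the 11-bit XO field. No shift is needed because bit 0 is the LSB.
pub fn vx_xo(raw: u32) -> u32 {
    raw & 0x7FF
}
```

Shifting the XO left by one, as if it started at bit 1, would produce a different extended opcode, which is the kind of error the initial test failures caught.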
MechaCat02	2d223eee69	test(cpu): PPCBUG-091/100/109-111/118/127/129/132/146-147/153/163/171 P8 batch 2 — load/store Phase 8 batch 2 — load/store test gap closure. 15 new tests across the load/store opcodes: - lbz zero-extend (091), lwbrx byte-swap (109/110), lwarx smoke (111), ld doubleword (118), lmw + lswi (127), lswx with XER TBC (127), lfs single-to-double widening (129). - stb (132), sth, stw (146), std (153), stmw + stswx (163), stfs (171). `lswx_uses_xer_tbc_for_byte_count` and `stswx_uses_xer_tbc_for_byte_count` specifically lock in the new XER TBC infrastructure landed in P6 (`68c0ee5`); both opcodes were permanent no-ops before that. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:10:26 +02:00
MechaCat02	9827b03f1a	test(cpu): PPCBUG-055/067/070/081-085/089 P8 batch 1 — branch/CR/SPR/sync Phase 8 batch 1 — test gap closure for the branch/CR-logical/SPR/MSR/ FPSCR/cache+sync groups. 12 new tests across the affected groups: - PPCBUG-055 branch: blr, bctr, bcl-LK-on-not-taken - PPCBUG-070 CR logical: cror, crand, crxor (crclr idiom) - PPCBUG-067 trap+sc: sc smoke, tw TO=0 never-traps - PPCBUG-081-085 SPR/MSR/FPSCR moves: mfcr 8-field assembly, mtfsb1/mtfsb0 - PPCBUG-089 cache+sync: sync state-non-mutation smoke These groups previously had near-zero unit test coverage. New tests lock in the current ISA-correct behavior; would catch a regression in any of the dispatch/encoding/result paths. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-02 14:08:54 +02:00


111 Commits