xenia-rs

Author	SHA1	Message	Date
MechaCat02	de21c7a544	[iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate The silph title state machine (tid13) blocked on event 0x10a0, never signaled. Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instr) — so the producer never reached the event-create/duplicate/signal dance the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle). Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they reset and the spinner reclaims the slot — fair alternation, no priority inversion, pure function of slot state (deterministic). Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. draws still 0 (the splash's first draw is a further-upstream gate). Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m golden unchanged (db16cyc not reached in first 2M). Tests 670/670. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 10:38:17 +02:00
MechaCat02	db90ad0f7d	[AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0, start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128) and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY instructions. Env-gated; default off. SR4 verdict B (partial unwedge): - handle 0x1078 signal_attempts 0->1 - tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108 - ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15) - New downstream wedges 0x1084 + 0x1088 - cxx_throw runtime_error on tid=5 inside R26 dispatcher (BST not-registered instance lhs=0x715a7af0) - VdSwap stays 1; no draws (POC is diagnostic, not final fix) Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST instance upstream; this POC only does (a). Reading-error class #20: ctx.lr at kernel export entry is the thunk wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain 1 step to get frames[1].lr. Reading-error class #21: --parallel and lockstep have SEPARATE outer loops in main.rs (run_execution_parallel line 2928 vs run_execution line 2706). Per-round hooks must be wired in BOTH paths. Files: - crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context fields + spawn() population + current_thread_entry_and_ctx() helper - crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint, maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals - crates/xenia-kernel/src/exports.rs: hook in nt_create_event - crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops - audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md Tests 655/655 green. Default behavior byte-identical when env unset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-11 18:38:38 +02:00
MechaCat02	481591fdb2	[AUDIT-059 R-C1] Phase C: bit-28 setter hypothesis REFUTED via dump-addr Phase A's diagnosis (bit 28 of [0x40d09a40] gets set to exit sub_822F1AA8's loop) is falsified by direct probe + --dump-addr in 4 sub-rounds. Key evidence: - sub_821B55D8 candidate fn fires 0× in ours; sub_824AA858 (XamInputSetState wrapper) fires 0× in canary too — chain is dead code in both engines. - end-of-run dump shows [0x40d09a40+0] = 0x00000021, same as at entry — bit 28 is NEVER set. - bcctrl at PC 0x822F1B4C (sub_822F1AA8+0xA4) fires (LR=0x822F1B50) but the post-bcctrl BB head 0x822F1B50 fires 0× — bcctrl never returns. - sub_82173990 (vtable[0] of singleton at [0x828E1F08]) is the call target; tid=1 wedges inside this 768-byte function on a thread-join to handle 0x1070 (= tid=13's thread handle). - tid=13 (entry=sub_821748F0, ctx=0x4024a840, handle=0x1070) reaches sub_821C4EB0 (silph::UImpl@GamePart_Title) at cycle 1882 → audit-049 cluster IS reached, wedges on handle 0x1078 there. C.2 force-clear POC NOT EXECUTED — would be no-op since bit 28 is never set. Per plan stopping criterion, hand back instead of proceeding blind. Adds reading-error class #19: disasm-pattern-match without runtime verification (Phase A scanned 49 oris-0x1000 sites and declared one the setter without ever observing the bit get set). No xenia-rs source changes. Canary repo also unchanged (config edit reverted clean). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-11 17:57:27 +02:00
MechaCat02	52c30d82a7	[AUDIT-059 R-A] Phase A backward-trace: divergence is sub_822F1AA8 loop exit, not factory/registry Round-37 anchor reframe: both engines install the SAME static .rdata vtable 0x820A183C at [0x828E1F08]. Instance VAs differ only because of ε-class allocator divergence (audit-043). vtable bytes byte-identical; the user prompt's "factory/registry" framing was falsified. Phase A walkthrough (rounds A1..A8): - A.1 canary --audit_jit_prolog_pc=0x821741C8: tid=6, r3=0xBCCC4A80 (= inner sub-object of [0x828E1F08]'s singleton), LR=0x822F1D5C (return-from-bctrl inside sub_822F1AA8) - A.2 found tid=6 spawn site sub_821746B0 at PC 0x82174824 spawning entry=sub_821748F0 ctx=BC365700/BC366DA0. sub_822F1AA8 ALSO spawns a second thread (entry=sub_822F1EE0 ctx=BCE24A40) at PC 0x822F1B08 - A.3 sub_822F1AA8 has 2 callers, both in sub_8216EA68 (its sole caller is sub_824AB748 = entry_point) - A.4 ours mirror probe: sub_821746B0 enters, [0x828E2B14] gate passes, ExCreateThread fires returning handle 0x1070 (= tid=13). Ours' tid=13 IS the same logical thread as canary's spawned silph initializer - A.5 canary --audit_jit_prolog_pc=0x821749C0: fires only 2× on short-lived tid=17, tid=26 (the spawned initializers — NOT tid=6) - A.6 canary --audit_jit_prolog_pc=0x822F1AA8: fires 1× on tid=6 with r3=0xBCE24A40 LR=0x8216EE14 (the second sub_822F1AA8 call site) - A.7 canary --audit_jit_prolog_pc=0x824AB748 (entry_point): fires on tid=00000006. CONFIRMS canary's tid=6 = canary's main thread. Verdict: identical call chain entry_point → sub_8216EA68 → sub_822F1AA8 in both engines; same controller (ε-divergent VA, byte-identical fields). Canary's main thread stays in sub_822F1AA8's dispatcher loop firing sub_821741C8 ~1678×/30s. Ours' main thread exits the loop and thread-joins on the spawned initializer (tid=13), which is itself wedged on handle 0x1078 forever. Loop exit is gated by bit 28 of [r30+0] (the controller's flag word). Same value 0x21 at function entry in both engines. Some code between entry and loop check sets bit 28 in ours but not in canary. Mem-watch on 0x40d09a40 shows zero guest stores in ours' 50M parallel run — setter is either a kernel-side store, computed alias, or probe-quantum-elided JIT store. Phase B classification: Class 3a (state-divergence on controller object). The vtable is the same; the controller's bit 28 evolves differently during sub_822F1AA8 setup. Class 4 (synthesis) is now less attractive since we correctly reach the dispatcher with the right inputs — we just exit too soon. Phase C will need either JIT instrumentation to identify the bit-28 setter, or a kernel-side hook to clear bit 28 on entry to the loop check site. Findings notes: - round-A4b-ours-spawn-gate/FINDINGS.md (spawn topology + tid mapping) - round-A8-ours-822F1AA8-trace/FINDINGS.md (full loop structure + bit-28 gate) New reading-error class #18: probe-output anchor misframing (singleton[VA]=X vtable=Y was misread as "Y is canary-only vtable" when Y is the same .rdata vtable in both engines). Branch: iterate-2C/silph-ui-spawn-trace off master @ `229b46c`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-11 17:02:20 +02:00
MechaCat02	76dfe7fd7a	fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 20:46:46 +02:00
MechaCat02	91a7df5f6a	docs(audit): KRNBUG-IO-004 entry + canary export queue post-fix delta audit-findings.md: full IO-004 entry with cascade-prediction scorecard. audit-runs/audit-006/canary_export_queue.md: post-IO-004 status note (7 -> 3 canary-only; 4 reclassified RE-FIRES). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 16:55:57 +02:00
MechaCat02	50a488776f	docs(audit): KRNBUG-AUDIT-008 + KRNBUG-AUDIT-009 diagnostics — renderer cluster fully unreached Captures two consecutive read-only diagnostic sessions: AUDIT-008 (2026-05-05): IO-003 model reset. The 0x100c / 0x1004 / 0x15e0 workers ARE spawned post-IO-003; the IO-003 prediction-scorecard's "UNCREATED" markers were misclassifications (handle audit already showed the workers parked on lifecycle events, just unlinked from dispatcher addresses). Hypothesized the gate among the 5 non-create-chain callers of sub_821800D8 whose parents live in 0x82287000-0x82292FFF. AUDIT-009 (2026-05-05): falsifies AUDIT-008's β-hypothesis. A 21-PC --branch-probe (6 parents + 5 shims + dispatcher + 9 audit-005 producer-callsites) shows 0/21 firings at -n 500M — the entire 0x82287000-0x82294000 cluster is unreached. Static analysis confirms the cluster's level-1 roots have zero non-call xrefs in sylpheed.db. The gate is structurally above the cluster (vtable / function-pointer that's never written). Stop condition 1 triggered; discipline gate fails on box 1 + box 3; no fix this session. Also updates audit-runs/audit-006/canary_export_queue.md to reflect the AUDIT-009 evidence: 3 canary-only exports remain REAL_BUT_UNREACHED (ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings) — none is the immediate gate. No code changes; --branch-probe machinery from AUDIT-007 sufficed. Trace artifacts left untracked under audit-runs/audit-008/ + audit-runs/audit-009/ (consistent with prior audit-runs/* convention). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 18:53:32 +02:00
MechaCat02	a1a7265f29	fix(kernel): KRNBUG-IO-003 — NtDeviceIoControlFile real impl mirroring NullDevice::IoControl Replace the stub_success registration of NtDeviceIoControlFile at exports.rs:90 with a real handler for FsCtlCodes 0x70000 (drive geometry) and 0x74004 (partition info), mirroring xenia-canary xboxkrnl_io.cc:645-678 + null_device.{h,cc}. The 16-byte 0x74004 response with cache_size=0xFF000 at OUT+8 is the gate that lets sub_824ABD88 return SUCCESS and sub_824A9710 reach the priv-11 XexCheckExecutablePrivilege site identified by KRNBUG-AUDIT-007. Stack args 9-10 (OutputBuffer, OutputBufferLength) read from the caller's parameter save area at [sp+0x54] / [sp+0x5C] per the Xbox 360 PowerPC EABI (linkage area sp+0..sp+8, 8-quadword spill area sp+0x14..sp+0x54, then stack args every 8 bytes). First HLE export in the codebase to need 9+ args. Cascade vs. KRNBUG-AUDIT-007 prediction (5/8 held): - XexCheckExecutablePrivilege count 1 → 2 (priv=0xA + priv=0xB) ✓ - XamTaskSchedule count 0 → 1 ✓ - canary-only exports 7 → 3 (audit predicted ≤3) ✓ - 0x15e0 semaphore signal_attempts 0 → 1 (bonus) - 0x100c worker spawn DID NOT fire (still UNCREATED) ✗ - 0x1004 signal_attempts unchanged ✗ - Worker spawn count unchanged at 19 ✗ Tests: 592 → 594. Lockstep deterministic at -n 100M (run1 ≡ run2 ≡ run3, byte-identical). instructions=100000010 → 100000019, imports 407417 → 987524 (+2.4×). swaps=2 draws=0 plateau persists. sylpheed_n50m golden re-baselined instructions=50000004→50000003, imports=407362→407255. sylpheed_n2m unchanged. Still canary-only after this fix: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings. The next downstream gate is somewhere past XamTaskSchedule's completion path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 22:00:12 +02:00
MechaCat02	7108d6d131	feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump Diagnostic-only, read-only. Lockstep `instructions=100000002` preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests. Adds two read-only diagnostics for the parked-waiter producer hunt: * `--ctor-probe=0x8217C850,0x...` — at every interpreter step, if `ctx.pc` is in the configured set, print one `CTOR-PROBE` line capturing live r3 (= `this` in MSVC PPC ctors), lr (= return site), sp, plus an 8-frame back-chain with saved-r31/r30 per frame. Fires once per hit, exactly what the 8-instance-pool probe needed. * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of run (after the FOCUS report in `dump_thread_diagnostic`), each address gets a 128-byte hex + be32 + ASCII dump. Used to inspect the static dispatcher / job-queue struct layouts AUDIT-003 identified. Both gated default-off; empty set is a single `is_empty()` test on the hot path. No guest state is mutated, so the `sylpheed_nm.json` lockstep digest is preserved. KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003): 1. The "8-instance pool" hypothesis for handle 0x1004 is FALSE.* Probing the inner per-instance ctors `[0x821783D8, 0x82181750, 0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All three handles are Meyers-style singletons with one dispatcher each. The "called 8 times" claim came from miscounting raw entries to the OUTER getter sub_8217C850 — but that getter is itself a Meyers-singleton-getter; only the FIRST entry cascades through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`). 2. The producer indirection layer is the singleton-getter itself. Static byte-scan of .rdata / .data shows 0 hits for the dispatcher addresses — no static registry table holds them. But the xrefs table for the OUTER getters reveals 5–6 callers each, MOSTLY non-create-chain, sharing the canonical producer pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl 0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the AUDIT-003 xref audit was necessary but not sufficient — it correctly saw "no direct producer references" but missed the singleton-getter indirection layer. 3. Dispatcher struct layouts (128-byte dumps captured at -n 50M --halt-on-deadlock): - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c), thread_handle at +0x48 (0x1010), self-pointer at +0x74, capacity 7 at +0x28, queue empty (+0/+3C = -1). - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0), sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1). - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004), 4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in 0x4xxxxxxx range — noticeably different layout from the other two pure POD job queues. Files: crates/xenia-kernel/src/state.rs ctor_probe_pcs / dump_addrs + fire_ctor_probe_if_match + 2 tests crates/xenia-app/src/main.rs Exec --ctor-probe / --dump-addr CLI parsing, prologue hook, end-of-run struct dumper audit-findings.md KRNBUG-AUDIT-004 entry audit-runs/audit-004/ 50M probe runs (v1 outer-getter hits, v2 inner-ctor hits proving the singleton hypothesis) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 17:09:47 +02:00

9 Commits
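
The spin-hint recognition described in de21c7a544 can be illustrated with a minimal Rust sketch. It assumes the standard PowerPC X-form layout for `or` (primary opcode 31, extended opcode 444). The names `encode_or`, `classify`, and the two-variant `StepResult` are hypothetical; only the `Yield` variant name and the 0x7FFFFB78 word come from the commit message, and the real interpreter and scheduler types in xenia-rs are not shown in this log.

```rust
/// Hypothetical stand-in for the interpreter's step outcome. Only the
/// `Yield` variant name is taken from the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum StepResult {
    Continue,
    Yield,
}

/// Encoding of `or r31,r31,r31`, the db16cyc hint quoted in the commit.
pub const DB16CYC_HINT: u32 = 0x7FFF_FB78;

/// Assemble `or rA,rS,rB` in X-form with Rc = 0:
/// opcode 31 | RS | RA | RB | extended opcode 444 | Rc.
pub fn encode_or(ra: u32, rs: u32, rb: u32) -> u32 {
    (31 << 26) | ((rs & 31) << 21) | ((ra & 31) << 16) | ((rb & 31) << 11) | (444 << 1)
}

/// Classify one fetched instruction word (illustrative only): the exact
/// spin-hint word requests a cooperative yield, anything else continues.
pub fn classify(instr: u32) -> StepResult {
    if instr == DB16CYC_HINT {
        StepResult::Yield
    } else {
        StepResult::Continue
    }
}
```

Under these assumptions `encode_or(31, 31, 31)` reproduces the 0x7FFFFB78 word from the commit, so the check is an exact match on the instruction word rather than a general `or` decode. Other `or rN,rN,rN` forms, such as `or r1,r1,r1`, fall through to `Continue` in this sketch.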