Commit Graph

4 Commits

Author SHA1 Message Date
MechaCat02
de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00
MechaCat02
db90ad0f7d [AUDIT-059 R-D2] Phase D auto-signal POC confirms audit-049 wedge diagnosis
Hook NtCreateEvent for the silph::UImpl tid=13 chain (entry=0x821748F0,
start_context=0x4024a840, frame-1 LR=0x821CB15C inside sub_821CB030+0x128)
and auto-signal the resulting handle after XENIA_SILPH_UI_AUTOSIGNAL_DELAY
instructions. Env-gated; default off.

SR4 verdict B (partial unwedge):
- handle 0x1078 signal_attempts 0->1
- tid=13 Blocked(WaitAny[0x1078]) -> Ready pc=0x824a9108
- ExCreateThread 10 -> 12 (new silph::UImpl tid=14, worker tid=15)
- New downstream wedges 0x1084 + 0x1088
- cxx_throw runtime_error on tid=5 inside R26 dispatcher
  (BST not-registered instance lhs=0x715a7af0)
- VdSwap stays 1; no draws (POC is diagnostic, not final fix)

Confirms Phase C diagnosis end-to-end. The real signaler must (a) drive
NtSetEvent on the silph KEVENT AND (b) register the dispatcher's BST
instance upstream; this POC only does (a).

Reading-error class #20: ctx.lr at kernel export entry is the thunk
wrapper's return slot, NOT the guest caller's post-bl PC. Walk back-chain
1 step to get frames[1].lr.

Reading-error class #21: --parallel and lockstep have SEPARATE outer
loops in main.rs (run_execution_parallel line 2928 vs run_execution
line 2706). Per-round hooks must be wired in BOTH paths.

Files:
- crates/xenia-cpu/src/scheduler.rs: GuestThread.start_entry/start_context
  fields + spawn() population + current_thread_entry_and_ctx() helper
- crates/xenia-kernel/src/state.rs: AutoSignalPending struct, env-parsed
  silph_autosignal_delay, pending Vec, last_cycle_hint, set_now_cycle_hint,
  maybe_register_silph_autosignal (walks back-chain), fire_due_silph_autosignals
- crates/xenia-kernel/src/exports.rs: hook in nt_create_event
- crates/xenia-app/src/main.rs: fire-site + cycle hint in both outer loops
- audit-runs/audit-059-handle-disambiguation/round-D2-autosignal-poc/FINDINGS.md

Tests 655/655 green. Default behavior byte-identical when env unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 18:38:38 +02:00
MechaCat02
481591fdb2 [AUDIT-059 R-C1] Phase C: bit-28 setter hypothesis REFUTED via dump-addr
Phase A's diagnosis (bit 28 of [0x40d09a40] gets set to exit
sub_822F1AA8's loop) is falsified by direct probe + --dump-addr in 4
sub-rounds.

Key evidence:
- sub_821B55D8 candidate fn fires 0× in ours; sub_824AA858
  (XamInputSetState wrapper) fires 0× in canary too — chain is dead code
  in both engines.
- end-of-run dump shows [0x40d09a40+0] = 0x00000021, same as at entry —
  bit 28 is NEVER set.
- bcctrl at PC 0x822F1B4C (sub_822F1AA8+0xA4) fires (LR=0x822F1B50) but
  the post-bcctrl BB head 0x822F1B50 fires 0× — bcctrl never returns.
- sub_82173990 (vtable[0] of singleton at [0x828E1F08]) is the call
  target; tid=1 wedges inside this 768-byte function on a thread-join
  to handle 0x1070 (= tid=13's thread handle).
- tid=13 (entry=sub_821748F0, ctx=0x4024a840, handle=0x1070) reaches
  sub_821C4EB0 (silph::UImpl@GamePart_Title) at cycle 1882 → audit-049
  cluster IS reached, wedges on handle 0x1078 there.

C.2 force-clear POC NOT EXECUTED — would be no-op since bit 28 is never
set. Per plan stopping criterion, hand back instead of proceeding blind.

Adds reading-error class #19: disasm-pattern-match without runtime
verification (Phase A scanned 49 oris-0x1000 sites and declared one the
setter without ever observing the bit get set).

No xenia-rs source changes. Canary repo also unchanged (config edit
reverted clean).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:57:27 +02:00
MechaCat02
52c30d82a7 [AUDIT-059 R-A] Phase A backward-trace: divergence is sub_822F1AA8 loop exit, not factory/registry
Round-37 anchor reframe: both engines install the SAME static .rdata vtable
0x820A183C at [0x828E1F08]. Instance VAs differ only because of ε-class
allocator divergence (audit-043). vtable bytes byte-identical; the user
prompt's "factory/registry" framing was falsified.

Phase A walkthrough (rounds A1..A8):
- A.1 canary --audit_jit_prolog_pc=0x821741C8: tid=6, r3=0xBCCC4A80 (= inner
  sub-object of [0x828E1F08]'s singleton), LR=0x822F1D5C (return-from-bctrl
  inside sub_822F1AA8)
- A.2 found tid=6 spawn site sub_821746B0 at PC 0x82174824 spawning
  entry=sub_821748F0 ctx=BC365700/BC366DA0. sub_822F1AA8 ALSO spawns a
  second thread (entry=sub_822F1EE0 ctx=BCE24A40) at PC 0x822F1B08
- A.3 sub_822F1AA8 has 2 callers, both in sub_8216EA68 (its sole caller is
  sub_824AB748 = entry_point)
- A.4 ours mirror probe: sub_821746B0 enters, [0x828E2B14] gate passes,
  ExCreateThread fires returning handle 0x1070 (= tid=13). Ours' tid=13
  IS the same logical thread as canary's spawned silph initializer
- A.5 canary --audit_jit_prolog_pc=0x821749C0: fires only 2× on short-lived
  tid=17, tid=26 (the spawned initializers — NOT tid=6)
- A.6 canary --audit_jit_prolog_pc=0x822F1AA8: fires 1× on tid=6 with
  r3=0xBCE24A40 LR=0x8216EE14 (the second sub_822F1AA8 call site)
- A.7 canary --audit_jit_prolog_pc=0x824AB748 (entry_point): fires on
  tid=00000006. CONFIRMS canary's tid=6 = canary's main thread.

Verdict: identical call chain entry_point → sub_8216EA68 → sub_822F1AA8 in
both engines; same controller (ε-divergent VA, byte-identical fields).
Canary's main thread stays in sub_822F1AA8's dispatcher loop firing
sub_821741C8 ~1678×/30s. Ours' main thread exits the loop and thread-joins
on the spawned initializer (tid=13), which is itself wedged on handle 0x1078
forever.

Loop exit is gated by bit 28 of [r30+0] (the controller's flag word). Same
value 0x21 at function entry in both engines. Some code between entry and
loop check sets bit 28 in ours but not in canary. Mem-watch on 0x40d09a40
shows zero guest stores in ours' 50M parallel run — setter is either a
kernel-side store, computed alias, or probe-quantum-elided JIT store.

Phase B classification: Class 3a (state-divergence on controller object).
The vtable is the same; the controller's bit 28 evolves differently during
sub_822F1AA8 setup. Class 4 (synthesis) is now less attractive since we
correctly reach the dispatcher with the right inputs — we just exit too
soon.

Phase C will need either JIT instrumentation to identify the bit-28 setter,
or a kernel-side hook to clear bit 28 on entry to the loop check site.

Findings notes:
- round-A4b-ours-spawn-gate/FINDINGS.md (spawn topology + tid mapping)
- round-A8-ours-822F1AA8-trace/FINDINGS.md (full loop structure + bit-28 gate)

New reading-error class #18: probe-output anchor misframing (singleton[VA]=X
vtable=Y was misread as "Y is canary-only vtable" when Y is the same
.rdata vtable in both engines).

Branch: iterate-2C/silph-ui-spawn-trace off master @ 229b46c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-11 17:02:20 +02:00