xenia-rs/audit-runs/iterate-2D-deferred-fixes/DEFERRED_FIXES.md
MechaCat02 de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).
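
The hint encoding quoted above can be checked by hand. This is a minimal sketch, assuming the standard PowerPC X-form layout for `or` (primary opcode 31, extended opcode 444, Rc = 0). `encode_or` and `is_db16cyc` are hypothetical helper names used for illustration, not functions from this codebase.

```rust
/// Encode the X-form `or rA,rS,rB` (primary opcode 31, extended opcode 444, Rc = 0).
fn encode_or(ra: u32, rs: u32, rb: u32) -> u32 {
    (31 << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444 << 1)
}

/// True when the instruction word is the db16cyc spin hint `or r31,r31,r31`.
fn is_db16cyc(code: u32) -> bool {
    code == encode_or(31, 31, 31)
}

fn main() {
    // Print the hint encoding and compare it against an ordinary register move.
    println!("or r31,r31,r31 = {:#010X}", encode_or(31, 31, 31));
    println!("or r1,r1,r1 is hint: {}", is_db16cyc(encode_or(1, 1, 1)));
}
```

Only the exact all-r31 form matches; any other `or` (for example the plain move `or r1,r1,r1`) is treated as a normal instruction.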

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so that begin_slot_visit picks one of them next
round; the promoted peers then reset and the spinner reclaims the slot. The
result is fair alternation with no priority inversion, and the choice is a pure
function of slot state (deterministic).
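
The alternation described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual scheduler: the struct layout, the `owner` field, the `STARVE_LIMIT` value, the stand-in spinner tid 99 and the `simulate` driver are all hypothetical. Only the names `StepResult::Yield`, `yield_current`, `begin_slot_visit` and `STARVE_LIMIT` come from the commit message.

```rust
const DB16CYC: u32 = 0x7FFF_FB78; // `or r31,r31,r31`
const NOP: u32 = 0x6000_0000; // `ori r0,r0,0`, stands in for real work
const STARVE_LIMIT: u32 = 8; // hypothetical value

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum StepResult {
    Continue,
    Yield,
}

/// Map one instruction word to a step outcome (only the hint is modeled).
fn step(code: u32) -> StepResult {
    if code == DB16CYC { StepResult::Yield } else { StepResult::Continue }
}

struct GuestThread {
    tid: u32,
    ready: bool,
    starve: u32,
    code: u32, // the single instruction this toy thread keeps executing
}

struct Slot {
    threads: Vec<GuestThread>,
    owner: usize, // index of the thread that runs when no peer is starved
}

impl Slot {
    /// Promote every Ready peer of `current` past the starvation limit.
    fn yield_current(&mut self, current: usize) {
        for (i, t) in self.threads.iter_mut().enumerate() {
            if i != current && t.ready {
                t.starve = STARVE_LIMIT + 1;
            }
        }
    }

    /// Pick the lowest-index starved Ready thread (resetting it), else the owner.
    fn begin_slot_visit(&mut self) -> usize {
        let starved = self
            .threads
            .iter()
            .position(|t| t.ready && t.starve > STARVE_LIMIT);
        match starved {
            Some(i) => {
                self.threads[i].starve = 0;
                i
            }
            None => self.owner,
        }
    }
}

/// Run `rounds` slot visits and return the tid chosen in each round.
fn simulate(rounds: usize) -> Vec<u32> {
    let mut slot = Slot {
        threads: vec![
            GuestThread { tid: 99, ready: true, starve: 0, code: DB16CYC }, // spinner
            GuestThread { tid: 14, ready: true, starve: 0, code: NOP },     // worker
        ],
        owner: 0,
    };
    let mut trace = Vec::with_capacity(rounds);
    for _ in 0..rounds {
        let i = slot.begin_slot_visit();
        trace.push(slot.threads[i].tid);
        if step(slot.threads[i].code) == StepResult::Yield {
            slot.yield_current(i);
        }
    }
    trace
}

fn main() {
    println!("{:?}", simulate(6));
}
```

Because the pick depends only on the slot's own counters, repeated runs of `simulate` produce the same trace, which is the property the determinism check relies on.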

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00

iterate-2D Deferred Structural Fixes — Outcome

Branch iterate-2D/subsystem-fixes. After verification + the user's go-ahead:

Issue 1 — 32-bit word-form ALU truncation (PPCBUG-020) — FIXED & LANDED

Commit 341196a. Confirmed load-bearing via a runtime ours-vs-canary capture: for a 16 ms wait, Sylpheed's ms→LARGE_INTEGER converter sub_824ACA88 (clrldi; mulli r11,r11,-10000; std) produced 0x00000000_FFFD8F00 in ours where canary produces the correct 0xFFFFFFFF_FFFD8F00. The truncated value is positive, so it was treated as an absolute timeout instead of a relative one, giving a ~26000× over-wait that froze the main frame loop. The fix corrects the 17 data-losing word-form ops (full 64-bit result; CA/OV/CR0 preserved byte-identical), updates 7 bug-asserting tests, and re-baselines sylpheed_n50m (imports 40454→1790936); sylpheed_n2m is unchanged. 660/660 tests plus the ignored oracle are green, and lockstep determinism is preserved. Boot is unwedged (parallel NtWaitForMultipleObjectsEx 94→30428; the previously frozen worker/critical-section loops now run). VdSwap is still 1: rendering progression needs the out-of-scope acd1656 fixes (nt_create_event polarity + 2.AF), which are not in this branch.
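
The two captured values can be reproduced with a few lines of arithmetic. This is a sketch, assuming the bug amounts to keeping the low 32 bits of the product and zero-extending them; `mulli`, `mulli_truncated` and `is_relative` are illustrative names, not the emulator's functions. The sign rule (negative means relative, positive means absolute) is the usual NT timeout convention.

```rust
/// Correct `mulli`: low 64 bits of register times sign-extended 16-bit immediate.
fn mulli(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// Assumed buggy variant: result truncated to 32 bits and zero-extended.
fn mulli_truncated(ra: u64, simm: i16) -> u64 {
    mulli(ra, simm) & 0xFFFF_FFFF
}

/// NT convention: a negative timeout is a relative interval in 100 ns units.
fn is_relative(timeout: u64) -> bool {
    (timeout as i64) < 0
}

fn main() {
    let good = mulli(16, -10_000); // 16 ms converted to 100 ns units, negated
    let bad = mulli_truncated(16, -10_000);
    println!("correct   = {:#018X} relative={}", good, is_relative(good));
    println!("truncated = {:#018X} relative={}", bad, is_relative(bad));
    // Ratio of the truncated magnitude to the intended 160000-unit interval.
    println!("magnitude ratio = {}", bad / good.wrapping_neg());
}
```

The correct product is -160000 (a 16 ms relative wait), while the truncated one reads as a large positive number, which is why the sign of the timeout flipped along with its magnitude.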

Issue 2 — Memory page-size per-region collapse — DEFERRED (verified NOT load-bearing)

Sylpheed calls MmAllocatePhysicalMemoryEx with flags=0 and alignment (r8)=0 (the default); ours returns self-consistent 4K-aligned addresses and boots. Ours has no 0xA0/0xC0/0xE0 physical-region model at all, so a faithful fix is a region-model rewrite that shifts every physical guest VA (golden-breaking, and it invalidates the audit-059 VA map) with no demonstrated boot benefit. A partial, page-size-only change would shift VAs for zero correctness gain, so do NOT do it piecemeal. Pursue this only if a render-path struct is proven to depend on the physical region or alignment.

Issue 3 — Timing — LEFT (not load-bearing / determinism-coupled)

  • 3d DPC/APC: INERT — the only timer (NtSetTimerEx) passes a NULL APC routine; no NtQueueApcThread/KeInsertQueueDpc imported.
  • 3b timeout sign: was a SYMPTOM of Issue 1 (the "positive absolute" timeouts were mulli-corruption artifacts) — resolved by the Issue 1 fix.
  • 3a/3c timebase/skew: timebase = instruction-count IS the deterministic lockstep clock; must not become wallclock. 2.AF deadline-drain already present. Not load-bearing for Sylpheed.

Issue 4 — VFS synthesized-success-on-miss — LEFT (risky / coupled to Issue 1 trajectory)

The synthesis fallback handles a mix of cases: writable-partition probes (partition0/Cache0) and a genuine disc miss (dat/files.tbl, verified absent from the ISO). Canary doesn't fire XamShowDirtyDiscErrorUI during boot (the one "DirtyDisc" log hit is the import-table declaration). The two cases are not cleanly separable without heuristic disc-vs-partition routing. Re-verify on the corrected post-Issue-1 (and post-acd1656) trajectory before changing anything.

Issue 5 — Mutant object — SKIPPED (verified unused)

Sylpheed's XEX import table contains NO mutant symbols (NtCreateMutant/NtReleaseMutant/KeReleaseMutant/KeInitializeMutant/NtQueryMutant), so the game cannot call them; unimplemented=0 across boot. A correct implementation needs mutant hand-off semantics plus an owner-type redesign (the existing Mutex { owner: Option<u8> } tracks a HW slot, not a thread) in the determinism-critical wait path, for code that never executes. Per the mandate's skip-if-unused criterion, it is left unimplemented. It can be added on request as a pure canary-parity / future-title feature (determinism-safe, since no Sylpheed mutant ever exists at runtime).
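
For reference, the owner-type redesign and hand-off semantics mentioned above might look roughly like the following. This is a hypothetical sketch only: nothing like it exists in the branch, and the `Mutant` struct, the `ThreadId` alias, and the FIFO waiter queue are assumptions chosen to keep hand-off order deterministic.

```rust
use std::collections::VecDeque;

/// Hypothetical thread identifier, replacing the HW-slot `u8` owner.
type ThreadId = u32;

#[derive(Debug, Default)]
struct Mutant {
    owner: Option<ThreadId>,
    recursion: u32,
    waiters: VecDeque<ThreadId>, // FIFO so hand-off order is deterministic
}

impl Mutant {
    /// Returns true if `tid` now owns the mutant, false if it was queued.
    fn acquire(&mut self, tid: ThreadId) -> bool {
        match self.owner {
            None => {
                self.owner = Some(tid);
                self.recursion = 1;
                true
            }
            Some(o) if o == tid => {
                self.recursion += 1; // recursive acquire by the owner
                true
            }
            Some(_) => {
                self.waiters.push_back(tid);
                false
            }
        }
    }

    /// Release once; on the final release, hand ownership to the oldest waiter.
    /// Returns the thread that received ownership, if any.
    fn release(&mut self, tid: ThreadId) -> Result<Option<ThreadId>, &'static str> {
        if self.owner != Some(tid) {
            return Err("caller does not own the mutant");
        }
        self.recursion -= 1;
        if self.recursion > 0 {
            return Ok(None);
        }
        self.owner = self.waiters.pop_front();
        if self.owner.is_some() {
            self.recursion = 1;
        }
        Ok(self.owner)
    }
}

fn main() {
    let mut m = Mutant::default();
    m.acquire(13);
    m.acquire(14); // queued behind tid 13
    println!("{:?}", m.release(13)); // ownership passes to the waiter
}
```

Keying ownership on a thread instead of a HW slot matters here because two guest threads can share one slot, as tid14 and the spinner do in the commit above.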