xenia-rs/audit-runs/iterate-2D-deferred-fixes/DEFERRED_FIXES.md
MechaCat02 de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).
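
The yield path described above can be sketched roughly as follows. This is a minimal illustration, not the actual xenia-rs scheduler: the `Thread` struct, the `ThreadState` enum, the numeric value of `STARVE_LIMIT`, and the "lowest index wins by default" policy are all assumptions made for the example. Only the hint encoding 0x7FFFFB78 and the names `StepResult::Yield`, `yield_current`, `begin_slot_visit`, and `STARVE_LIMIT` come from the commit message.

```rust
/// Encoding of `or r31,r31,r31`, the db16cyc spin hint.
const DB16CYC_HINT: u32 = 0x7FFF_FB78;

/// Hypothetical threshold; the real STARVE_LIMIT value is not stated in the source.
const STARVE_LIMIT: u32 = 4;

#[derive(Debug, PartialEq, Eq)]
enum StepResult {
    Continue,
    Yield,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Waiting,
}

/// Hypothetical per-thread scheduler record for one HW slot.
#[derive(Debug)]
struct Thread {
    state: ThreadState,
    starve: u32,
}

/// Classify one instruction word: the spin hint becomes a cooperative yield.
fn step(instr: u32) -> StepResult {
    if instr == DB16CYC_HINT {
        StepResult::Yield
    } else {
        StepResult::Continue
    }
}

/// Push every Ready peer of `current` past STARVE_LIMIT so it wins the next visit.
fn yield_current(slot: &mut [Thread], current: usize) {
    for (i, t) in slot.iter_mut().enumerate() {
        if i != current && t.state == ThreadState::Ready {
            t.starve = STARVE_LIMIT + 1;
        }
    }
}

/// Pick the thread to run on this slot. A starved Ready thread wins and has its
/// counter reset; otherwise the first Ready thread runs (assumed default policy).
fn begin_slot_visit(slot: &mut [Thread]) -> Option<usize> {
    let starved = slot
        .iter()
        .position(|t| t.state == ThreadState::Ready && t.starve > STARVE_LIMIT);
    if let Some(i) = starved {
        slot[i].starve = 0;
        return Some(i);
    }
    slot.iter().position(|t| t.state == ThreadState::Ready)
}
```

With a spinner at index 0 and a Ready peer at index 1, a yield hands the next visit to the peer and the visit after that back to the spinner. The choice depends only on the slot's own state, which is the property the commit message relies on for determinism.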

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00

# iterate-2D Deferred Structural Fixes — Outcome
Branch `iterate-2D/subsystem-fixes`. Outcomes after verification and the user's go-ahead:
## Issue 1 — 32-bit word-form ALU truncation (PPCBUG-020) — ✅ FIXED & LANDED
Commit **341196a**. Confirmed load-bearing via runtime ours-vs-canary capture:
Sylpheed's ms→LARGE_INTEGER converter `sub_824ACA88` (`clrldi; mulli r11,r11,-10000; std`)
produced `0x00000000_FFFD8F00` in ours vs canary's correct `0xFFFFFFFF_FFFD8F00` for a 16 ms
wait — a positive (absolute) timeout → ~26000× over-wait that froze the main frame loop.
Fixed the 17 data-losing word-form ops (full 64-bit result, CA/OV/CR0 preserved byte-identical),
updated 7 bug-asserting tests, re-baselined `sylpheed_n50m` (imports 40454→1790936), `sylpheed_n2m`
unchanged. 660/660 + ignored oracle green; lockstep determinism preserved. Boot unwedged
(parallel NtWaitForMultipleObjectsEx 94→30428; frozen worker/critical-section loops now run).
VdSwap still 1 — rendering progression needs the out-of-scope acd1656 fixes (nt_create_event
polarity + 2.AF), not in this branch.
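
The arithmetic behind the two captured values can be reproduced with a small sketch. The function names below are hypothetical and this is not the interpreter code; it only models a `mulli` whose result is kept at full 64-bit width versus one truncated to the low 32 bits, and the NT convention that a negative timeout is relative while a non-negative one is absolute (both in 100 ns units).

```rust
/// `mulli rD,rA,SIMM` with the full 64-bit product kept (the fixed behaviour).
fn mulli_64(ra: u64, simm: i16) -> u64 {
    (ra as i64).wrapping_mul(simm as i64) as u64
}

/// The same multiply with the result cut to the low 32 bits and zero-extended,
/// which is the data-losing behaviour described above.
fn mulli_truncated(ra: u64, simm: i16) -> u64 {
    mulli_64(ra, simm) & 0xFFFF_FFFF
}

/// NT-style timeout interpretation: negative means relative, otherwise absolute.
fn is_relative_timeout(timeout: u64) -> bool {
    (timeout as i64) < 0
}
```

For a 16 ms wait, `mulli_64(16, -10000)` yields `0xFFFFFFFF_FFFD8F00` (a relative 160000-tick wait), while `mulli_truncated(16, -10000)` yields `0x00000000_FFFD8F00`, which reads as a large positive, and therefore absolute, value.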
## Issue 2 — Memory page-size per-region collapse — DEFERRED (verified NOT load-bearing)
Sylpheed requests `MmAllocatePhysicalMemoryEx` with flags=0, alignment(r8)=0 (default); ours returns
self-consistent 4K-aligned addresses and boots. Ours has no 0xA0/0xC0/0xE0 physical-region model at
all, so a faithful fix is a region-model rewrite that shifts every physical guest VA (golden-breaking,
invalidates the audit-059 VA map) with no demonstrated boot benefit. A partial page-size-only change
would shift VAs for zero correctness gain — do NOT do it piecemeal. Pursue only if a render-path
struct is proven to depend on physical region/alignment.
## Issue 3 — Timing — LEFT (not load-bearing / determinism-coupled)
- 3d DPC/APC: INERT — the only timer (NtSetTimerEx) passes a NULL APC routine; no
NtQueueApcThread/KeInsertQueueDpc imported.
- 3b timeout sign: was a SYMPTOM of Issue 1 (the "positive absolute" timeouts were mulli-corruption
artifacts) — resolved by the Issue 1 fix.
- 3a/3c timebase/skew: timebase = instruction-count IS the deterministic lockstep clock; must not
become wallclock. 2.AF deadline-drain already present. Not load-bearing for Sylpheed.
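
Why an instruction-count timebase stays deterministic can be shown with a small sketch. Everything here is hypothetical (`LockstepClock`, `drain_expired`, the `(deadline, wait id)` tuple layout) and is not the 2.AF deadline-drain code; it only illustrates that when time is a pure function of retired instructions, the set and order of expired waits is the same on every run.

```rust
/// Hypothetical lockstep clock: guest time is the count of retired guest instructions.
#[derive(Debug, Default)]
struct LockstepClock {
    retired: u64,
}

impl LockstepClock {
    /// Advance the clock by the number of instructions just executed.
    fn retire(&mut self, instructions: u64) {
        self.retired += instructions;
    }

    /// The timebase is the instruction count itself, never wallclock time.
    fn timebase(&self) -> u64 {
        self.retired
    }
}

/// Remove and return the ids of all waits whose deadline has been reached,
/// ordered by deadline and then by id so the result is reproducible.
fn drain_expired(clock: &LockstepClock, deadlines: &mut Vec<(u64, u32)>) -> Vec<u32> {
    let now = clock.timebase();
    deadlines.sort();
    let split = deadlines.partition_point(|&(deadline, _)| deadline <= now);
    deadlines.drain(..split).map(|(_, id)| id).collect()
}
```

Replacing `timebase()` with a wallclock read would make the drained set depend on host speed, which is the coupling the notes above warn against.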
## Issue 4 — VFS synthesized-success-on-miss — LEFT (risky / coupled to Issue 1 trajectory)
The synthesis fallback handles a MIX (writable-partition probes partition0/Cache0 + a genuine disc
miss dat/files.tbl, verified absent from the ISO). Canary doesn't fire XamShowDirtyDiscErrorUI during
boot (the one "DirtyDisc" log hit is the import-table declaration). Not cleanly separable without
heuristic disc-vs-partition routing. Re-verify on the corrected post-Issue-1 (and post-acd1656)
trajectory before changing.
## Issue 5 — Mutant object — SKIPPED (verified unused)
Sylpheed's XEX import table contains NO mutant symbols (NtCreateMutant/NtReleaseMutant/KeReleaseMutant/
KeInitializeMutant/NtQueryMutant) — the game cannot call them; unimplemented=0 across boot. A correct
implementation needs mutant hand-off semantics + an owner-type redesign (the existing
`Mutex { owner: Option<u8> }` tracks a HW slot, not a thread) in the determinism-critical wait path,
for code that never executes. Per the mandate's skip-if-unused criterion, left unimplemented. Can be
added on request as a pure canary-parity / future-title feature (determinism-safe since no Sylpheed
mutant ever exists at runtime).
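
For reference, the owner-type change mentioned above might look roughly like the sketch below. It is an assumption-laden illustration, not a proposed implementation: `ThreadId`, `Mutant`, `try_acquire`, and `release` are hypothetical names, and abandonment, wait queues, and integration with the wait path are left out. The point is only that ownership is keyed on a thread identity with a recursion count, instead of the HW slot tracked by `Mutex { owner: Option<u8> }`.

```rust
/// Hypothetical guest thread identity (not a HW slot index).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ThreadId(u32);

/// Hypothetical thread-owned mutant with recursive acquisition.
#[derive(Debug, Default)]
struct Mutant {
    owner: Option<ThreadId>,
    recursion: u32,
}

impl Mutant {
    /// Acquire if free or already owned by `tid`; return false if another thread owns it.
    fn try_acquire(&mut self, tid: ThreadId) -> bool {
        match self.owner {
            None => {
                self.owner = Some(tid);
                self.recursion = 1;
                true
            }
            Some(owner) if owner == tid => {
                self.recursion += 1;
                true
            }
            Some(_) => false,
        }
    }

    /// Release one level. Ok(true) means the mutant is now free and can be handed
    /// to a waiter; Ok(false) means the owner still holds it; Err means `tid` is
    /// not the owner.
    fn release(&mut self, tid: ThreadId) -> Result<bool, ()> {
        if self.owner != Some(tid) {
            return Err(());
        }
        self.recursion -= 1;
        if self.recursion == 0 {
            self.owner = None;
            Ok(true)
        } else {
            Ok(false)
        }
    }
}
```

Two threads that share a HW slot would be distinguishable here, which a slot-indexed owner cannot express.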