xenia-rs/audit-runs/phase-c23-scheduler-determinism-plan/candidate-strategies.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Candidate strategies — Phase C+23

Six candidate strategies, (α) through (ζ), for aligning canary↔ours contention. Each is evaluated on: implementation, scope, behavior risk, coverage, compatibility with existing absorbers.

(α) Lockstep cooperative scheduler — both engines

What

Run both engines as single-host-thread cooperative schedulers, with a shared deterministic policy for "which guest thread runs next at each scheduling boundary". Canary would lose its 1-host-per-1-guest model; ours is already cooperative.

Scope

  • canary: ~2000-3000 LOC across kernel/xthread.cc, base/threading.cc, base/threading_posix.cc, base/threading_win.cc, cpu/processor.cc. Replace Thread::Create with a fiber/coroutine runtime. All pthread_cond_wait-style waits become explicit scheduler calls.
  • ours: ~0 LOC (already in this model).

Behavior risk

HIGH. Canary is the oracle. Reworking its scheduling philosophy could cause game-compatibility regressions (other titles depend on the host-thread behavior). Re-validating Sylpheed alone would not certify the change for the broader canary test corpus.

Coverage

ALL contention sources, deterministically.

Compatibility

Replaces the C+18 / C+21 / D-extension absorbers (they become moot once canary is bit-deterministic). However, if the cooperative canary picks a different schedule than ours, the matched-prefix gain is zero: both engines still diverge, just deterministically. A shared policy is therefore required.

Verdict

Overscoped. Already rejected in the 2026-05-18 plan as approach B.


(β) Deterministic preemption points — both engines

What

Define a finite set of scheduling boundaries that BOTH engines honor (e.g., kernel-call entry, xeKeWaitForSingleObject, RtlEnterCriticalSection, quantum exhaustion, page-boundary crossings). Between these points, threads run monolithically. The policy at each point is deterministic (e.g., "lowest tid among Ready wins").
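
The example policy "lowest tid among Ready wins" can be sketched as a pure function that both engines would evaluate at each boundary. This is a minimal illustration, not the engines' actual code: `ThreadState`, `GuestThread`, and `pick_next` are hypothetical names introduced here.

```rust
/// Hypothetical guest-thread states; the names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ThreadState {
    Ready,
    Waiting,
    Terminated,
}

/// Hypothetical minimal view of a guest thread at a scheduling boundary.
#[derive(Clone, Copy, Debug)]
pub struct GuestThread {
    pub tid: u32,
    pub state: ThreadState,
}

/// Shared policy: the Ready thread with the lowest tid runs next.
/// Returns None when no thread is runnable.
pub fn pick_next(threads: &[GuestThread]) -> Option<u32> {
    threads
        .iter()
        .filter(|t| t.state == ThreadState::Ready)
        .map(|t| t.tid)
        .min()
}
```

Because the result depends only on the set of Ready tids and not on the order in which the host happened to enqueue them, two engines that agree on the thread states at a boundary would agree on the next thread.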

Scope

  • canary: ~1000 LOC. Add a xe::DeterministicScheduler layer that intercepts kernel-call entry; if multiple guest threads are competing, picks via the shared policy. Disable host preemption outside boundaries (set per-thread SCHED_FIFO or use a global scheduler_mutex released only at boundaries).
  • ours: ~200 LOC. Modify Scheduler::round_schedule and decrement_quantum to honor the same boundary set.

Behavior risk

HIGH on canary. Same oracle-stability concern as (α). MEDIUM on ours; the rotation-at-boundaries is a small generalization of existing logic.

Coverage

ALL kernel-mediated contention. Does NOT cover non-kernel guest atomics (rare in Sylpheed — probed at 0 occurrences in import inventory).

Compatibility

Subsumes C+18 / C+21 / D-extension. Same shared-policy requirement as (α).

Verdict

The right structural answer in principle, but the engineering investment (1200+ LOC across two engines, including a host-side priority-inversion-safe mutex layer in canary) is heavy: a multi-session, potentially multi-month subaudit. Not justified for the residual divergence past 105,046 unless future titles need it.


(γ) Recorded scheduling trace — canary records, ours replays

What

Canary emits a high-fidelity scheduling trace (every park, wake, and context switch, plus the guest cycle at which each happens). Ours consumes this trace as its scheduling oracle: at each scheduling point, ours forces its decision to match the trace.

This generalizes Phase D's contention-manifest from "1 event class on 1 primitive" to "every scheduling decision."

Scope

  • canary: ~200 LOC (extend kernel_emit_contention to emit sched.park, sched.wake, sched.yield, sched.priority_change).
  • ours: ~400 LOC (a generalized SchedulingTraceReplayer consulted at every park / wake / quantum decision).
  • Diff tool: ~50 LOC engine-local kinds.
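
A rough sketch of what the replay side could look like follows. Only the name `SchedulingTraceReplayer` and the four event kinds come from the notes above; the struct layout, the `decide` method, and the `Decision` type are assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Event kinds mirroring sched.park / sched.wake / sched.yield /
/// sched.priority_change from the scope list.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SchedKind {
    Park,
    Wake,
    Yield,
    PriorityChange,
}

/// One recorded scheduling decision (assumed layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SchedEvent {
    pub kind: SchedKind,
    pub tid: u32,
    pub guest_cycle: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// The trace dictates which tid this decision applies to.
    Forced(u32),
    /// Trace exhausted or head of trace is a different kind:
    /// fall back to the default scheduling policy.
    Fallback,
}

pub struct SchedulingTraceReplayer {
    pending: VecDeque<SchedEvent>,
}

impl SchedulingTraceReplayer {
    pub fn new(trace: Vec<SchedEvent>) -> Self {
        Self { pending: trace.into() }
    }

    /// Consulted at each scheduling point. Consumes the head of the trace
    /// only when its kind matches the decision being made.
    pub fn decide(&mut self, kind: SchedKind) -> Decision {
        match self.pending.front().copied() {
            Some(ev) if ev.kind == kind => {
                self.pending.pop_front();
                Decision::Forced(ev.tid)
            }
            _ => Decision::Fallback,
        }
    }

    /// Number of recorded events not yet replayed.
    pub fn remaining(&self) -> usize {
        self.pending.len()
    }
}
```

Falling back instead of panicking on a mismatch keeps the default mode reachable when the trace and the live run drift apart; a real replayer would probably also compare `guest_cycle` and log the drift.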

Behavior risk

LOW on canary (additive emit only, cvar-gated default-off). MEDIUM on ours (replay mode is a new schedule policy; default mode unchanged).

Coverage

ALL kernel-mediated contention, ALL wait timeouts, ALL priority adjustments. Strong.

Compatibility

Mostly subsumes C+18 / C+21 absorbers (they remain as safety nets). D-extension absorber may still be needed if upstream state-mutation timing differs by a few host instructions in regions canary's trace doesn't precisely cover.

Verdict

The "right next step" if structural alignment is the goal. The Phase D Stages 1-4 work is the foundation for this; γ broadens to other event classes. Risk: the trace can be enormous (millions of entries for Sylpheed), and the cost-benefit depends on how many additional events past 105,046 a broader trace would unlock.


(δ) Wine-level controls — single-CPU pin + RT priority

What

Run canary under Wine with taskset -c 0 and chrt --rr 99, and with kernel preemption flags disabled. The aim is to reduce canary's host-OS jitter without modifying code.

Scope

  • 0 LOC engine. ~10 LOC bash wrapper.

Behavior risk

MEDIUM. Wine's internal threads (ntdll server, GPU shim) still race with the game's guest threads; pinning all of them to one core serializes but doesn't guarantee a specific interleaving order. Aggressive RT priority could hang the rig if a tight spin loop forms.

Coverage

PARTIAL: narrows the jitter range but does not eliminate it. The empirical jitter profile suggests the range is already small (0-3 wait.begin events per cold run), so the marginal reduction is small.

Compatibility

Orthogonal — works alongside absorbers. Could be combined with γ to reduce trace size by reducing canary's natural variance.

Verdict

Cheap, worth trying as a probe, but unlikely to bit-stabilize canary because Wine itself has internal non-determinism. Recommend as a small empirical experiment, not as the structural fix.


(ε) Atomic-operation determinism — ours emulates canary's host

What

Change the atomic-op semantics in ours so that, e.g., when tid=1 in ours performs atomic_cas(-1, 0, &cs->lock_count), the outcome matches what canary's host atomics would produce given the same instruction ordering. Requires modeling canary's host-OS scheduling decisions inside ours.
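
To see why the outcome of that compare-and-swap is an ordering question, the following sketch replays the same CAS from several threads in a given order and reports which tid acquired the lock. It is a single-threaded model under stated assumptions: `cas_winner` is a hypothetical helper, and -1 is taken to mean "unowned" as in the example above.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Replays CAS(expected = -1, new = 0) on a shared lock_count, once per
/// tid in `order`, and returns the tid whose CAS succeeded (if any).
pub fn cas_winner(order: &[u32]) -> Option<u32> {
    // -1 stands for "unowned" in this sketch.
    let lock_count = AtomicI32::new(-1);
    let mut winner = None;
    for &tid in order {
        // Only the first attempt sees -1; every later attempt observes 0
        // and fails, so the winner is fully determined by the ordering.
        if lock_count
            .compare_exchange(-1, 0, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            winner = Some(tid);
        }
    }
    winner
}
```

With `order = [1, 2]` the winner is tid 1, and with `order = [2, 1]` it is tid 2. Matching canary's outcome therefore means reproducing canary's ordering, which is the scheduling information (γ) already records at a coarser grain.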

Scope

Effectively (γ) but at a finer grain. ~600 LOC.

Behavior risk

HIGH. Atomic-op semantics are a fundamental primitive; changing them risks breaking unrelated PowerPC instruction emulation.

Coverage

ALL contention. But the LOC growth is large because PowerPC has multiple atomic instruction pairs (lwarx/stwcx., ldarx/stdcx., etc.), each needing the replay hook.

Compatibility

Subsumes everything. Conflicts with the existing Scheduler.

Verdict

Theoretical only. Don't pursue.


(ζ) Stay with the band-aid

What

Accept that the matched-prefix metric is unreliable in contention regions. Continue using C+18 / C+21 / D-extension absorbers; if new divergence classes appear past 105,046, add narrow absorbers as needed.
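
For orientation, the matched-prefix metric with an absorber can be modelled as below. This is a simplified sketch, not the diff tool's implementation: `matched_prefix` and the `absorb` predicate are hypothetical, and the model assumes an absorbed pair stays position-aligned, which the real absorbers may not require.

```rust
/// Length of the matched prefix of two event streams. A pair of events
/// counts as matched when the events are equal, or when `absorb` accepts
/// the pair as a known-benign divergence.
pub fn matched_prefix<T, F>(canary: &[T], ours: &[T], absorb: F) -> usize
where
    T: PartialEq,
    F: Fn(&T, &T) -> bool,
{
    canary
        .iter()
        .zip(ours.iter())
        // Stop at the first pair that is neither equal nor absorbed.
        .take_while(|(c, o)| *c == *o || absorb(*c, *o))
        .count()
}
```

Passing an `absorb` that always returns false gives the raw metric; each narrow absorber widens the predicate for one divergence class, which is why the prefix grows without any engine change.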

Scope

0 LOC engine. Diff-tool absorber additions: ~50-150 LOC per new class as it appears.

Behavior risk

LOW. Band-aids are explicitly annotated; the absorber chain has 3 layers but each is narrow.

Coverage

Complete up to a small residual (ε in the arbitrarily-small sense, not strategy (ε)). The 104,607 cap is unblocked to 105,046. The next cap (VdInitializeEngines, the VD-subsystem bug) is unrelated to scheduling.

Compatibility

Self-consistent. Already in production.

Verdict

Cheapest viable answer. The next divergence is not scheduling; no further scheduler-determinism work is needed UNTIL a future cap recurs from scheduler asymmetry.