xenia-rs/audit-runs/phase-c23-scheduler-determinism-plan/candidate-strategies.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Candidate strategies — Phase C+23
Six candidate strategies for aligning canary↔ours contention. Each
is evaluated on: implementation, scope, behavior risk, coverage, and
compatibility with existing absorbers.
## (α) Lockstep cooperative scheduler — both engines
### What
Run both engines as single-host-thread cooperative schedulers, with
a shared deterministic policy for "which guest thread runs next at
each scheduling boundary". Canary would lose its 1-host-per-1-guest
model; ours is already cooperative.
### Scope
- canary: ~2000-3000 LOC across `kernel/xthread.cc`, `base/threading.cc`,
`base/threading_posix.cc`, `base/threading_win.cc`, `cpu/processor.cc`.
Replace `Thread::Create` with a fiber/coroutine runtime. All
`pthread_cond_wait`-style waits become explicit scheduler calls.
- ours: ~0 LOC (already in this model).
### Behavior risk
**HIGH.** Canary is the *oracle*. Reworking its scheduling philosophy
could break game-compat regression (other titles depend on the
host-thread behavior). Re-validating Sylpheed alone would not certify
this for the broader canary test corpus.
### Coverage
ALL contention sources, deterministically.
### Compatibility
Replaces C+18 / C+21 / D-extension absorbers (they become moot once
canary is bit-deterministic). But: if the cooperative canary picks a
*different* schedule than ours, the matched-prefix gain is zero —
both still diverge, just deterministically. Needs a *shared policy*.
### Verdict
Overscoped. Already rejected in the 2026-05-18 plan as approach B.
---
## (β) Deterministic preemption points — both engines
### What
Define a finite set of scheduling boundaries that BOTH engines honor
(e.g., kernel-call entry, `xeKeWaitForSingleObject`, `RtlEnterCriticalSection`,
quantum exhaustion, page-boundary crossings). Between these points,
threads run monolithically. The policy at each point is deterministic
(e.g., "lowest tid among Ready wins").
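
The example policy above can be sketched in a few lines. This is a minimal illustration only: `ThreadState`, `GuestThread`, and `pick_next` are hypothetical names, not types from either engine, and a real policy would also have to account for priorities and affinity.

```rust
/// Hypothetical guest-thread states; the real engines track more than this.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Waiting,
}

/// Hypothetical minimal view of a guest thread at a scheduling boundary.
#[derive(Clone, Copy, Debug)]
struct GuestThread {
    tid: u32,
    state: ThreadState,
}

/// Shared policy: the Ready thread with the lowest tid runs next.
/// Returns None when no thread is Ready.
fn pick_next(threads: &[GuestThread]) -> Option<u32> {
    threads
        .iter()
        .filter(|t| t.state == ThreadState::Ready)
        .map(|t| t.tid)
        .min()
}
```

Because the result depends only on the set of Ready threads, two engines that reach the same boundary with the same thread states would pick the same thread, which is the property the shared policy needs.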
### Scope
- canary: ~1000 LOC. Add a `xe::DeterministicScheduler` layer that
intercepts kernel-call entry; if multiple guest threads are
competing, picks via the shared policy. Disable host preemption
outside boundaries (set per-thread `SCHED_FIFO` or use a global
`scheduler_mutex` released only at boundaries).
- ours: ~200 LOC. Modify `Scheduler::round_schedule` and
`decrement_quantum` to honor the same boundary set.
### Behavior risk
**HIGH** on canary. Same oracle-stability concern as (α). MEDIUM on
ours; the rotation-at-boundaries is a small generalization of
existing logic.
### Coverage
ALL kernel-mediated contention. Does NOT cover non-kernel guest
atomics (rare in Sylpheed — probed at 0 occurrences in import
inventory).
### Compatibility
Subsumes C+18 / C+21 / D-extension. Same shared-policy requirement
as (α).
### Verdict
The right structural answer in principle, but the engineering
investment (1200+ LOC across two engines, including a host-side
priority-inversion-safe mutex layer in canary) is heavy: a
multi-session, multi-month subaudit. Not justified for the residual
divergence past 105,046 unless future titles need it.
---
## (γ) Recorded scheduling trace — canary records, ours replays
### What
Canary emits a high-fidelity scheduling trace (every park/wake/
context-switch + the guest-cycle each happens at). Ours consumes
this trace as its scheduling oracle: at each scheduling point, ours
forces its decision to match the trace.
This generalizes Phase D's contention-manifest from "1 event class
on 1 primitive" to "every scheduling decision."
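
A rough sketch of what the replay side could look like, assuming the trace is a flat, ordered list of events. The `SchedKind`, `SchedEvent`, and field names here are assumptions made for illustration; only the `SchedulingTraceReplayer` name comes from this plan, and its real interface is not specified here.

```rust
use std::collections::VecDeque;

/// Hypothetical event kinds, loosely mirroring sched.park / sched.wake /
/// sched.yield / sched.priority_change.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SchedKind {
    Park,
    Wake,
    Yield,
    PriorityChange,
}

/// Hypothetical trace entry: what happened, to which guest thread, and when.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SchedEvent {
    kind: SchedKind,
    tid: u32,
    guest_cycle: u64,
}

/// Consumes a recorded trace in order and answers scheduling queries from it.
struct SchedulingTraceReplayer {
    pending: VecDeque<SchedEvent>,
}

impl SchedulingTraceReplayer {
    fn new(trace: Vec<SchedEvent>) -> Self {
        Self { pending: trace.into() }
    }

    /// Called at a scheduling point of the given kind. If the next recorded
    /// event has that kind, it is consumed and returned so the caller can
    /// force the same decision; otherwise None, and the caller falls back
    /// to its default policy.
    fn decide(&mut self, kind: SchedKind) -> Option<SchedEvent> {
        let matches = self.pending.front().map_or(false, |ev| ev.kind == kind);
        if matches {
            self.pending.pop_front()
        } else {
            None
        }
    }
}
```

Returning `None` on a mismatch rather than panicking keeps the default mode reachable when the trace and the live run disagree; whether a mismatch should instead abort the run is a design choice this sketch leaves open.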
### Scope
- canary: ~200 LOC (extend `kernel_emit_contention` to emit `sched.park`,
`sched.wake`, `sched.yield`, `sched.priority_change`).
- ours: ~400 LOC (a generalized `SchedulingTraceReplayer` consulted at
every park / wake / quantum decision).
- Diff tool: ~50 LOC engine-local kinds.
### Behavior risk
LOW on canary (additive emit only, cvar-gated default-off).
MEDIUM on ours (replay mode is a new schedule policy; default mode
unchanged).
### Coverage
ALL kernel-mediated contention, ALL wait timeouts, ALL priority
adjustments. Strong.
### Compatibility
Mostly subsumes C+18 / C+21 absorbers (they remain as safety nets).
D-extension absorber may still be needed if upstream state-mutation
timing differs by a few host instructions in regions canary's trace
doesn't precisely cover.
### Verdict
The "right next step" if structural alignment is the goal. The Phase
D Stages 1-4 work is the *foundation* for this; γ broadens to other
event classes. Risk: the trace can be enormous (millions of entries
for Sylpheed), and the cost-benefit depends on how many *additional*
events past 105,046 a broader trace would unlock.
---
## (δ) Wine-level controls — single-CPU pin + RT priority
### What
Run canary under Wine with `taskset -c 0` and `chrt --rr 99`, and
disable kernel preemption flags. This reduces canary's host-OS jitter
without modifying code.
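
For illustration, the wrapped command line could be assembled as below. This is a sketch, not the planned wrapper (which would be a short bash script): `pinned_canary_command` and its arguments are hypothetical, the executable and image paths are placeholders, and the kernel preemption settings are not covered. The function only builds the command and does not run it; `chrt --rr 99` normally requires elevated privileges.

```rust
use std::process::Command;

/// Builds (but does not spawn) the equivalent of:
///   taskset -c 0 chrt --rr 99 wine <canary_exe> <game_image>
/// Both path arguments are placeholders supplied by the caller.
fn pinned_canary_command(canary_exe: &str, game_image: &str) -> Command {
    let mut cmd = Command::new("taskset");
    cmd.args(["-c", "0"]) // pin to CPU 0
        .args(["chrt", "--rr", "99"]) // round-robin realtime policy, priority 99
        .args(["wine", canary_exe, game_image]);
    cmd
}
```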
### Scope
- 0 LOC engine. ~10 LOC bash wrapper.
### Behavior risk
MEDIUM. Wine's internal threads (ntdll server, GPU shim) still race
with the game's guest threads; pinning all of them to one core
serializes but doesn't guarantee a specific interleaving order.
Aggressive RT priority could hang the rig if a tight spin loop
forms.
### Coverage
PARTIAL: reduces the jitter range but doesn't eliminate it. The
empirical jitter profile suggests the range is already small (0-3
wait.begin events per cold), so the marginal reduction is small.
### Compatibility
Orthogonal — works alongside absorbers. Could be combined with γ
to reduce trace size by reducing canary's natural variance.
### Verdict
Cheap, worth trying as a probe, but unlikely to bit-stabilize canary
because Wine itself has internal non-determinism. **Recommend as a
small empirical experiment, not as the structural fix.**
---
## (ε) Atomic-operation determinism — ours emulates canary's host
### What
Change the atomic-op semantics in ours so that, e.g., when tid=1 in ours
performs `atomic_cas(-1, 0, &cs->lock_count)`, the outcome matches
what canary's host atomics would produce given the same instruction
ordering. Requires modeling canary's host-OS scheduling decisions
inside ours.
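
To make the ordering dependence concrete, here is a minimal sketch of that compare-and-swap using Rust's standard atomics. It assumes the call reads as "if `lock_count` is -1 (unowned), set it to 0"; `CriticalSection` and `try_acquire` are hypothetical stand-ins, not the engine's types.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Hypothetical stand-in for the guest critical section; -1 is assumed to
/// mean "unowned".
struct CriticalSection {
    lock_count: AtomicI32,
}

/// Mirrors atomic_cas(-1, 0, &cs->lock_count) under the assumption above:
/// succeeds only if the lock is currently unowned.
fn try_acquire(cs: &CriticalSection) -> bool {
    cs.lock_count
        .compare_exchange(-1, 0, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}
```

Whichever thread issues the compare-and-swap first succeeds and every later attempt fails, so the result is a pure function of arrival order. Reproducing canary's outcome therefore means reproducing canary's arrival order, which is why this approach collapses into modeling the host scheduler.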
### Scope
Effectively (γ) but at a finer grain. ~600 LOC.
### Behavior risk
HIGH. Atomic-op semantics are a fundamental primitive; changing
them risks breaking unrelated PowerPC instruction emulation.
### Coverage
ALL contention. But the LOC growth is large because PowerPC has
multiple atomic instructions (lwarx/stwcx., ldarx/stdcx., etc.)
each needing the replay hook.
### Compatibility
Subsumes everything. Conflicts with the existing Scheduler.
### Verdict
Theoretical only. Don't pursue.
---
## (ζ) Stay with the band-aid
### What
Accept that the matched-prefix metric is unreliable in contention
regions. Continue using C+18 / C+21 / D-extension absorbers; if new
divergence classes appear past 105,046, add narrow absorbers as
needed.
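
As a rough illustration of how an absorber interacts with the matched-prefix metric, the sketch below counts leading events that agree between two traces while skipping any event an absorber predicate accepts. Everything here is assumed for the example: the `Event` shape, the `matched_prefix` function, and the idea that an absorber is a simple per-event filter. The actual C+18 / C+21 / D-extension absorbers may work differently.

```rust
/// Hypothetical event record from either engine's trace.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Event {
    kind: String,
    tid: u32,
}

/// Counts how many leading events agree between the two traces. Events for
/// which `absorb` returns true are dropped from both sides and not counted.
fn matched_prefix(
    canary: &[Event],
    ours: &[Event],
    absorb: impl Fn(&Event) -> bool,
) -> usize {
    let mut a = canary.iter().filter(|e| !absorb(*e));
    let mut b = ours.iter().filter(|e| !absorb(*e));
    let mut matched = 0;
    loop {
        match (a.next(), b.next()) {
            (Some(x), Some(y)) if x == y => matched += 1,
            // First disagreement, or either trace exhausted.
            _ => return matched,
        }
    }
}
```

With a predicate that absorbs nothing, one extra event on one side ends the prefix at that point; absorbing that single event kind lets the comparison continue, which is the narrow, per-class behavior this option relies on.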
### Scope
0 LOC engine. Diff-tool absorber additions: ~50-150 LOC per new
class as it appears.
### Behavior risk
LOW. Band-aids are explicitly annotated; the absorber chain has
3 layers but each is narrow.
### Coverage
Up to ε. The 104,607 cap is unblocked to 105,046. The NEXT cap
(`VdInitializeEngines`, the VD-subsystem bug) is unrelated to
scheduling.
### Compatibility
Self-consistent. Already in production.
### Verdict
**Cheapest viable answer.** The next divergence is *not* scheduling;
no further scheduler-determinism work is needed UNTIL a future cap
recurs from scheduler asymmetry.