handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,214 @@
# Candidate strategies — Phase C+23

Six candidate strategies, (α) through (ζ), for aligning canary↔ours
contention. Each is evaluated on: implementation, scope, behavior risk,
coverage, compatibility with existing absorbers.

## (α) Lockstep cooperative scheduler — both engines

### What
Run both engines as single-host-thread cooperative schedulers, with
a shared deterministic policy for "which guest thread runs next at
each scheduling boundary". Canary would lose its 1-host-per-1-guest
model; ours is already cooperative.

### Scope
- canary: ~2000-3000 LOC across `kernel/xthread.cc`, `base/threading.cc`,
  `base/threading_posix.cc`, `base/threading_win.cc`, `cpu/processor.cc`.
  Replace `Thread::Create` with a fiber/coroutine runtime. All
  `pthread_cond_wait`-style waits become explicit scheduler calls.
- ours: ~0 LOC (already in this model).

### Behavior risk
**HIGH.** Canary is the *oracle*. Reworking its scheduling philosophy
could regress game compatibility (other titles depend on the
host-thread behavior). Re-validating Sylpheed alone would not certify
this for the broader canary test corpus.

### Coverage
ALL contention sources, deterministically.

### Compatibility
Replaces C+18 / C+21 / D-extension absorbers (they become moot once
canary is bit-deterministic). But if the cooperative canary picks a
*different* schedule than ours, the matched-prefix gain is zero:
both still diverge, just deterministically. Needs a *shared policy*.

### Verdict
Overscoped. Already rejected in the 2026-05-18 plan as approach B.

---

## (β) Deterministic preemption points — both engines

### What
Define a finite set of scheduling boundaries that BOTH engines honor
(e.g., kernel-call entry, `xeKeWaitForSingleObject`, `RtlEnterCriticalSection`,
quantum exhaustion, page-boundary crossings). Between these points,
threads run monolithically. The policy at each point is deterministic
(e.g., "lowest tid among Ready wins").
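
As a minimal sketch of the example policy quoted above ("lowest tid among Ready wins"), the Rust fragment below picks the next guest thread at a scheduling boundary. `GuestThread`, `ThreadState`, and `pick_next` are hypothetical names used only for illustration; they are not taken from either engine's source.

```rust
// Hypothetical types for illustration; not the engines' real data structures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Waiting,
    Terminated,
}

#[derive(Debug, Clone, Copy)]
struct GuestThread {
    tid: u32,
    state: ThreadState,
}

/// At a scheduling boundary, pick the Ready thread with the lowest tid.
/// Returns None when no thread is runnable.
fn pick_next(threads: &[GuestThread]) -> Option<u32> {
    threads
        .iter()
        .filter(|t| t.state == ThreadState::Ready)
        .map(|t| t.tid)
        .min()
}
```

Because the choice depends only on the set of Ready tids and not on host timing, two engines that reach the same boundary with the same Ready set would make the same pick. That is the property the shared policy needs.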

### Scope
- canary: ~1000 LOC. Add a `xe::DeterministicScheduler` layer that
  intercepts kernel-call entry; if multiple guest threads are
  competing, it picks via the shared policy. Disable host preemption
  outside boundaries (set per-thread `SCHED_FIFO` or use a global
  `scheduler_mutex` released only at boundaries).
- ours: ~200 LOC. Modify `Scheduler::round_schedule` and
  `decrement_quantum` to honor the same boundary set.

### Behavior risk
**HIGH** on canary. Same oracle-stability concern as (α). MEDIUM on
ours; the rotation-at-boundaries is a small generalization of
existing logic.

### Coverage
ALL kernel-mediated contention. Does NOT cover non-kernel guest
atomics (rare in Sylpheed; probed at 0 occurrences in import
inventory).

### Compatibility
Subsumes C+18 / C+21 / D-extension. Same shared-policy requirement
as (α).

### Verdict
The right structural answer in principle, but the engineering
investment (1200+ LOC across two engines, including a host-side
priority-inversion-safe mutex layer in canary) is heavy: a
multi-session, multi-month-long subaudit. Not justified for the residual
divergence past 105,046 unless future titles need it.

---

## (γ) Recorded scheduling trace — canary records, ours replays

### What
Canary emits a high-fidelity scheduling trace (every park, wake, and
context switch, plus the guest cycle at which each happens). Ours consumes
this trace as its scheduling oracle: at each scheduling point, ours
forces its decision to match the trace.

This generalizes Phase D's contention-manifest from "1 event class
on 1 primitive" to "every scheduling decision."

### Scope
- canary: ~200 LOC (extend `kernel_emit_contention` to emit `sched.park`,
  `sched.wake`, `sched.yield`, `sched.priority_change`).
- ours: ~400 LOC (a generalized `SchedulingTraceReplayer` consulted at
  every park / wake / quantum decision).
- Diff tool: ~50 LOC engine-local kinds.
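
The record/replay split above could look roughly like the following Rust sketch. Only the name `SchedulingTraceReplayer` and the four event kinds come from these notes; the `TraceEntry` layout, the `next_decision` method, and the "entry is due once the guest cycle is reached" rule are assumptions made for illustration.

```rust
use std::collections::VecDeque;

// Event kinds mirror the sched.* names listed in the scope above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SchedKind {
    Park,
    Wake,
    Yield,
    PriorityChange,
}

// Assumed entry layout: which thread, what happened, and at which guest cycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TraceEntry {
    kind: SchedKind,
    tid: u32,
    guest_cycle: u64,
}

struct SchedulingTraceReplayer {
    pending: VecDeque<TraceEntry>,
}

impl SchedulingTraceReplayer {
    fn new(entries: impl IntoIterator<Item = TraceEntry>) -> Self {
        Self { pending: entries.into_iter().collect() }
    }

    /// Consulted at each scheduling point: returns the next recorded
    /// decision once its guest cycle has been reached, otherwise None
    /// (the caller would then keep running the current thread).
    fn next_decision(&mut self, now_guest_cycle: u64) -> Option<TraceEntry> {
        let due = self
            .pending
            .front()
            .map_or(false, |e| e.guest_cycle <= now_guest_cycle);
        if due {
            self.pending.pop_front()
        } else {
            None
        }
    }
}
```

A queue consumed strictly in order keeps replay cheap per decision, but it also means any entry the trace misses shifts every later decision, which is one reason the absorbers may still be wanted as safety nets.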

### Behavior risk
LOW on canary (additive emit only, cvar-gated default-off).
MEDIUM on ours (replay mode is a new schedule policy; default mode
unchanged).

### Coverage
ALL kernel-mediated contention, ALL wait timeouts, ALL priority
adjustments. Strong.

### Compatibility
Mostly subsumes C+18 / C+21 absorbers (they remain as safety nets).
D-extension absorber may still be needed if upstream state-mutation
timing differs by a few host instructions in regions canary's trace
doesn't precisely cover.

### Verdict
The "right next step" if structural alignment is the goal. The Phase
D Stages 1-4 work is the *foundation* for this; γ broadens to other
event classes. Risk: the trace can be enormous (millions of entries
for Sylpheed), and the cost-benefit depends on how many *additional*
events past 105,046 a broader trace would unlock.

---

## (δ) Wine-level controls — single-CPU pin + RT priority

### What
Run canary under Wine with `taskset -c 0` and `chrt --rr 99`, and
disable kernel preemption flags. Reduce canary's host-OS jitter without
modifying code.

### Scope
- 0 LOC engine. ~10 LOC bash wrapper.

### Behavior risk
MEDIUM. Wine's internal threads (ntdll server, GPU shim) still race
with the game's guest threads; pinning all of them to one core
serializes them but doesn't guarantee a specific interleaving order.
Aggressive RT priority could hang the rig if a tight spin loop
forms.

### Coverage
PARTIAL: reduces the jitter range but doesn't eliminate it. The empirical
jitter profile suggests the range is already small (0-3 wait.begin events
per cold), so the marginal reduction is small.

### Compatibility
Orthogonal: works alongside absorbers. Could be combined with γ
to reduce trace size by reducing canary's natural variance.

### Verdict
Cheap, worth trying as a probe, but unlikely to bit-stabilize canary
because Wine itself has internal non-determinism. **Recommend as a
small empirical experiment, not as the structural fix.**

---

## (ε) Atomic-operation determinism — ours emulates canary's host

### What
Change ours's atomic-op semantics so that, e.g., when ours's tid=1
performs `atomic_cas(-1, 0, &cs->lock_count)`, the outcome matches
what canary's host atomics would produce given the same instruction
ordering. Requires modeling canary's host-OS scheduling decisions
inside ours.
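
To make the ordering dependence concrete, here is a small Rust sketch of the quoted compare-and-swap. It assumes the call reads as "if `lock_count` is -1, set it to 0", and `try_enter` is a hypothetical helper name, not a function from either engine.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Attempt the CAS from the text: succeed only if lock_count is still -1.
/// Returns true when this caller is the one that flipped it to 0.
fn try_enter(lock_count: &AtomicI32) -> bool {
    lock_count
        .compare_exchange(-1, 0, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}
```

Whichever thread issues the CAS first sees `true`; every later caller sees `false` until the count is reset. Matching canary's outcome therefore means matching which thread got there first, which is exactly the host scheduling information ours would have to model.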

### Scope
Effectively (γ) but at a finer grain. ~600 LOC.

### Behavior risk
HIGH. Atomic-op semantics are a fundamental primitive; changing
them risks breaking unrelated PowerPC instruction emulation.

### Coverage
ALL contention. But the LOC growth is large because PowerPC has
multiple atomic instructions (lwarx/stwcx., ldarx/stdcx., etc.),
each needing the replay hook.

### Compatibility
Subsumes everything. Conflicts with the existing Scheduler.

### Verdict
Theoretical only. Don't pursue.

---

## (ζ) Stay with the band-aid

### What
Accept that the matched-prefix metric is unreliable in contention
regions. Continue using C+18 / C+21 / D-extension absorbers; if new
divergence classes appear past 105,046, add narrow absorbers as
needed.
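
One way to read the matched-prefix metric is as the number of leading events on which the canary and ours traces agree, with an absorber tolerating a known divergence class instead of ending the prefix. The Rust sketch below works under that reading; the function name, the generic event type, and the absorber-as-predicate shape are assumptions, since the diff tool's actual interface is not shown in these notes.

```rust
/// Count how many leading event pairs match, treating a pair as matched
/// when the events are equal or when `absorb` accepts the mismatch.
/// Stops at the first pair that is neither equal nor absorbed.
fn matched_prefix_len<T: PartialEq>(
    canary: &[T],
    ours: &[T],
    absorb: impl Fn(&T, &T) -> bool,
) -> usize {
    let mut matched = 0;
    for (c, o) in canary.iter().zip(ours.iter()) {
        if c == o || absorb(c, o) {
            matched += 1;
        } else {
            break;
        }
    }
    matched
}
```

With a predicate that always returns `false` this is the plain common-prefix length. Each narrow absorber only widens what counts as a match for one divergence class, which is why the chain stays low-risk as long as each predicate is tightly scoped.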

### Scope
0 LOC engine. Diff-tool absorber additions: ~50-150 LOC per new
class as it appears.

### Behavior risk
LOW. Band-aids are explicitly annotated; the absorber chain has
3 layers but each is narrow.

### Coverage
Up to ε. The 104,607 cap is unblocked to 105,046. The NEXT cap
(`VdInitializeEngines`, the VD-subsystem bug) is unrelated to
scheduling.

### Compatibility
Self-consistent. Already in production.

### Verdict
**Cheapest viable answer.** The next divergence is *not* scheduling;
no further scheduler-determinism work is needed UNTIL a future cap
arises from scheduler asymmetry.