# Phase C+22 — ESCALATION (2026-05-18)

## Decision: ESCALATE

C+22's target divergence at canary tid=6→1 idx=104,607 (canary
`import.call RtlEnterCriticalSection`, an extra nested Enter, vs
ours `import.call RtlLeaveCriticalSection`) is classified as
**(A) scheduler-determinism + post-wait state-mutation downstream
effect**, the same class that C+20 escalated. C+21's wait.begin
floating-absorb correctly removed the visible wait.begin jitter
event (verified: `floating_wait (c/o) = 2/0` engaged on this
chain in the fresh c22 sample). However, the *post-wait branch* in
canary's guest code, taken because shared state was mutated
during the wait, cannot be papered over at the diff layer
without committing reading-error #23 (matching genuinely different
guest behavior).

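
The interaction between the floating-absorb and the remaining divergence can be illustrated with a minimal sketch. This is not the actual `diff_events.py` logic: the `(kind, name)` event shape, the `FLOATING_KINDS` set, and the `matched_prefix` function are assumptions made up for illustration. It shows how an unmatched `wait.begin` can be skipped on one side while a genuine Enter-vs-Leave mismatch still stops the match.

```python
from typing import NamedTuple, Sequence, Tuple

# Hypothetical event shape: (kind, name),
# e.g. ("import.call", "RtlEnterCriticalSection").
Event = Tuple[str, str]

# Assumption for illustration: only wait.begin is treated as "floating".
FLOATING_KINDS = frozenset({"wait.begin"})


class PrefixResult(NamedTuple):
    matched: int          # event pairs matched before the walk stopped
    absorbed_canary: int  # floating events skipped on the canary side
    absorbed_ours: int    # floating events skipped on our side
    diverged: bool        # True if a non-absorbable mismatch was hit


def matched_prefix(canary: Sequence[Event], ours: Sequence[Event],
                   floating=FLOATING_KINDS) -> PrefixResult:
    i = j = matched = absorbed_c = absorbed_o = 0
    while i < len(canary) and j < len(ours):
        if canary[i] == ours[j]:
            i += 1
            j += 1
            matched += 1
        elif canary[i][0] in floating:
            # Unmatched floating event on the canary side: skip it.
            i += 1
            absorbed_c += 1
        elif ours[j][0] in floating:
            # Unmatched floating event on our side: skip it.
            j += 1
            absorbed_o += 1
        else:
            # Genuinely different guest behavior: do not absorb.
            return PrefixResult(matched, absorbed_c, absorbed_o, True)
    return PrefixResult(matched, absorbed_c, absorbed_o, False)


canary_events = [
    ("import.call", "RtlEnterCriticalSection"),
    ("wait.begin", ""),
    ("import.call", "RtlEnterCriticalSection"),  # extra nested Enter
]
ours_events = [
    ("import.call", "RtlEnterCriticalSection"),
    ("import.call", "RtlLeaveCriticalSection"),
]
result = matched_prefix(canary_events, ours_events)
print(result)
```

In this toy input the `wait.begin` is absorbed on the canary side, but the walk still reports a divergence at the following Enter-vs-Leave pair, which is the situation the classification above describes.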
## What was done

1. Backed up both canary cache locations.
2. Wiped both canary caches + ours's cache.
3. Cold-ran ours (50M instructions, against the `.iso`).
4. Cold-ran canary (90s timeout, against the `.iso`).
5. Truncated the canary log, keeping all tids (first 250k events per
   tid), so the C+18/C+21 cross-tid shared-global heuristic has
   the multi-tid evidence it needs.
6. Ran `diff_events.py` with the full multi-tid map.
7. Verified main matched prefix = 104,607 (matches C+21).
8. Verified sister chains unchanged: 11/32/3/41/16.
9. Verified C+21 floating-absorb engaged: `floating_create (c/o)
   = 1/0`, `floating_wait (c/o) = 2/0` on main chain.
10. Restored canary caches.

Discovered along the way:

- **Reading-error class #34** (NEW): cold-run determinism
  depends on the form of the input path. The `.xex` and `.iso` paths
  produce different boot trajectories. All cold-vs-cold runs
  MUST use the `.iso` path. Documented in
  `investigation.md` §"Methodology note".

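
Step 5 (per-tid truncation) can be sketched as follows. This is a minimal illustration, not the script that was actually run: it assumes each event is a dict carrying a `tid` key, and the function name `truncate_per_tid` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def truncate_per_tid(events: Iterable[Dict], limit: int = 250_000,
                     tid_key: str = "tid") -> Iterator[Dict]:
    """Yield events in original order, keeping at most `limit` per tid.

    Assumes every event dict has a `tid_key` entry (hypothetical schema).
    """
    kept = defaultdict(int)  # tid -> number of events already emitted
    for event in events:
        tid = event[tid_key]
        if kept[tid] < limit:
            kept[tid] += 1
            yield event


# Tiny usage example with limit=2 instead of 250k.
sample = [{"tid": t, "seq": n} for n, t in enumerate([1, 1, 6, 1, 6, 6])]
truncated = list(truncate_per_tid(sample, limit=2))
print([e["seq"] for e in truncated])
```

Capping per tid rather than capping the whole file keeps late-starting threads represented, which is what a cross-tid heuristic needs.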
## What was NOT done

- No engine source changed (per ESCALATE classification).
- No diff-tool changes (the existing C+18/C+21 absorbers
  already work correctly for this region; over-absorbing the
  post-wait Enter/Leave block would cross into matching
  genuinely different guest behavior).
- A Phase A emitter additive for the `cs_ptr` arg was considered but
  deferred: it is not needed to establish the escalation decision
  and would only refine the cause-of-branch story, which is already
  established by the C+20 analysis.
- D-NEW-2 NOT touched (explicitly out of scope per prompt).

## Why we can't fix this in C+22's authorized scope

The C+22 prompt authorizes modifications to:

- `crates/xenia-kernel/src/exports.rs` (rtl_enter_critical_section,
  rtl_leave_critical_section, related CS state)
- `crates/xenia-kernel/src/state.rs` if the CS state model needs
  adjustment
- `tools/diff-events/diff_events.py` if a new race pattern is
  identified
- Tests, Phase A emitter additive if needed, documentation

It explicitly forbids:

- Refactoring the scheduler / thread model
- Refactoring CS primitives broadly
- Touching GPU/audio/HID
- Landing deferred items
- Fixing D-NEW-2 in this session

The actual root cause is **scheduler determinism**: ours's
single-stepping scheduler runs tid=1 monolithically through this
region, denying other tids the opportunity to claim the shared
CS that is contended in canary. The fix requires one of:

1. Reworking ours's scheduler to interleave threads at finer
   granularity (multi-thousand-LOC refactor; NOT AUTHORIZED).
2. Recording canary's scheduling trace and replaying it in ours
   (new subsystem; NOT AUTHORIZED).
3. Adding wait.begin emission to ours's RtlEnter park path AND
   re-architecting the CS contention model so that, when ours
   DOES contend, it produces canary-symmetric state mutations.
   This is only a partial measure and would not fix this case,
   because ours takes the fast path here and never parks.
4. Modifying Sylpheed guest code (out of scope, and defeats the
   parity goal).

None of (1)-(4) fits C+22's authorized scope. **Escalation is the
correct decision.**

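
The root-cause mechanism can be reproduced in a toy model. The sketch below is not the engine's scheduler or the guest code: the generator-based "threads", the `dirty` flag, and both scheduler functions are assumptions invented for illustration. It only shows how the same guest logic yields an Enter/Leave trace under a run-to-completion scheduler and a nested Enter/Enter/Leave/Leave trace once another thread is allowed to mutate shared state in between.

```python
def main_thread(shared, trace):
    trace.append("Enter")      # acquire the critical section
    yield                      # point where a finer-grained scheduler may switch
    if shared["dirty"]:        # post-wait branch: state changed in the meantime
        trace.append("Enter")  # nested Enter on the same CS
        trace.append("Leave")
    trace.append("Leave")


def worker_thread(shared, trace):
    shared["dirty"] = True     # mutate state that main_thread re-checks
    yield


def run_monolithic(threads):
    """Run each thread to completion before starting the next."""
    for thread in threads:
        for _ in thread:
            pass


def run_interleaved(threads):
    """Round-robin: advance each live thread by one step per pass."""
    pending = list(threads)
    while pending:
        for thread in list(pending):
            try:
                next(thread)
            except StopIteration:
                pending.remove(thread)


def simulate(scheduler):
    shared = {"dirty": False}
    trace = []
    scheduler([main_thread(shared, trace), worker_thread(shared, trace)])
    return trace


print(simulate(run_monolithic))   # main thread never observes the mutation
print(simulate(run_interleaved))  # main thread takes the post-wait branch
```

In this model the two traces first differ at the second event (Leave vs Enter), and no amount of event absorption in a diff tool would make them equivalent without hiding a real behavioral difference.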
## Recommended next-target sequence

1. **C+23 = D-NEW-2** (independent ε-class fix on a different
   sister chain): `KeWaitForSingleObject` `timeout_ns`
   sign/scale asymmetry. Out of scope for C+22 per prompt; in
   scope for C+23.
2. **C+24 = D-NEW-3** (canary tid=14→9 idx=41:
   `XAudioGetVoiceCategoryVolumeChangeMask` vs ours's
   `RtlEnterCriticalSection`). Likely a missing or stubbed
   XAudio export.
3. **Parallel scheduler-determinism track**: a dedicated
   multi-session refactor to attack the C+20/C+22 family at the root.
   Scope per C+20: per-CS-pointer "expected contention"
   inference from canary logs + scheduler driver + diff-tool
   "scheduling-trace replay" event class.

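
For context on item 1, the sketch below illustrates the kind of sign/scale normalization a `timeout_ns` field involves. It assumes the documented Windows NT convention for `KeWaitForSingleObject` timeouts (100 ns units, negative = relative, positive = absolute, null = wait forever) also applies to the guest kernel. It does not claim which side of the D-NEW-2 asymmetry is wrong, and the function name is hypothetical.

```python
from typing import Optional


def nt_timeout_to_relative_ns(timeout_100ns: Optional[int],
                              now_100ns: int = 0) -> Optional[int]:
    """Normalize an NT-style timeout to a non-negative relative duration in ns.

    Assumed convention (Windows NT documentation):
      None      -> no timeout pointer, wait indefinitely
      negative  -> relative interval, in 100 ns units
      otherwise -> absolute system time, in 100 ns units
    """
    if timeout_100ns is None:
        return None
    if timeout_100ns < 0:
        return -timeout_100ns * 100
    # Absolute deadline: remaining time relative to `now_100ns`, never negative.
    return max(0, timeout_100ns - now_100ns) * 100


one_ms_relative = nt_timeout_to_relative_ns(-10_000)
print(one_ms_relative)
```

Logging the normalized value on both sides, rather than the raw field, would make a pure sign or scale difference show up as equal events.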
## Confidence

- Classification confidence: HIGH (95%+). Verified by
  multi-sample canary cold runs showing a structurally identical
  EE-LL nested pattern across all 4 samples; the C+21 absorber
  engaged exactly as predicted; the mechanism (post-wait
  state-mutation branch) is consistent with C+20's analysis.

- Escalation correctness: HIGH (95%+). No authorized
  modification within C+22's scope can fix this; reading-error
  #23 explicitly applies if we over-absorb in the diff tool.

- Reading-error #34 discovery: HIGH (verified by repeat
  experiment: 2 ours-cold runs against `.iso` were byte-identical
  modulo timestamps, and identical to the C+19 archive).

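
A "byte-identical modulo timestamps" check of the kind used for the reading-error #34 verification could look like the sketch below. It is an illustration only: it assumes JSONL event logs with a timestamp stored under a `ts` key, which may not match the real trace schema, and both function names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def normalize(lines: Iterable[str],
              timestamp_keys: Tuple[str, ...] = ("ts",)) -> List[str]:
    """Parse JSONL lines, drop timestamp fields, and re-serialize canonically."""
    normalized = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        event = json.loads(line)
        for key in timestamp_keys:
            event.pop(key, None)
        normalized.append(json.dumps(event, sort_keys=True))
    return normalized


def identical_modulo_timestamps(run_a: Iterable[str],
                                run_b: Iterable[str]) -> bool:
    return normalize(run_a) == normalize(run_b)


run_1 = ['{"tid": 1, "kind": "import.call", "ts": 100}',
         '{"tid": 6, "kind": "wait.begin", "ts": 250}']
run_2 = ['{"tid": 1, "kind": "import.call", "ts": 104}',
         '{"tid": 6, "kind": "wait.begin", "ts": 261}']
print(identical_modulo_timestamps(run_1, run_2))
```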