handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,61 @@
# Phase C+23 broad-impact assessment (2026-05-18)
## Resolved
- **D-NEW-2 (C+17 catalog)** — KeWait timeout `wait.begin.timeout_ns`
on canary tid=12 → ours tid=7 idx=3. Was 429466729600 (the high
half was zero-extended instead of sign-extended, so the value read as
positive); now -30000000, matching canary exactly.
Sister chain advances 3 → 4.
## Advanced
- **`addis` opcode semantics across the entire CPU interpreter**.
Any place in any guest binary using `lis`/`addis` with a negative
immediate that flows into a 64-bit `std`, `mr`, `orx`, or anything
else that propagates the full GPR value will now produce
canary-equivalent behavior. The KeWait timeout was just the first
observable symptom in Sylpheed's cold-vs-cold trajectory.
## Persisted (no change)
- Main matched-prefix tid=6→1 = **104,607** (the C+22 scheduler-
determinism asymmetry is unaffected by this fix).
- Sister chains tid=4→11 (11), tid=7→2 (32), tid=14→9 (41),
tid=15→10 (16).
- Phase B `image_canonical_sha256` =
`ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`.
- Kernel tests 204 (unchanged — no kernel-side code change).
## NEW
- **D-NEW-2.1 (idx=4 sister chain canary tid=12 → ours tid=7)**:
`KeWaitForSingleObject` return value canary=258 (TIMEOUT) vs
ours=0 (SUCCESS). Same C+20/C+22 family — scheduler-determinism;
ours's monolithic-thread runner allows the wait to return
SUCCESS where canary's contended scheduler lets the 30 ms timeout
elapse with no signaler. Out of scope for this phase; same
parallel scheduler-determinism track recommended in C+22.
- **CPU correctness probe**: opportunity for a low-key audit of
every "32-bit ABI defensive truncation" in
`xenia-cpu/src/interpreter.rs` for other ops that legitimately
need 64-bit sign-extension at the producer rather than truncation.
Quick mental scan suggests `addis` was likely the only such case;
`addi` already extends via `simm16 as i64 as u64`, `ori`/`oris`
are pure 64-bit OR. Not a blocking concern.
## Test count delta
| crate | pre-C+23 | post-C+23 | delta |
|---------------|----------|-----------|-------|
| xenia-cpu | 288 | 291 | +3 |
| xenia-kernel | 204 | 204 | 0 |
## Cold-stable digest delta
| baseline | pre-C+23 | post-C+23 |
|------------------------|-----------------------------------|-----------------------------------|
| ours-cold det-fields | e1dfcb1559f987b35012a7f2dc6d93f5 | 23cf4c4cbf61a577caa4118ab2308ba6 |
| Phase B image_canonical| ea8d160e9369… | ea8d160e9369… (UNCHANGED) |
3× ours-cold runs all yield the new digest. Determinism preserved.


@@ -0,0 +1,134 @@
# Phase C+23 cold-vs-cold result (2026-05-18)
## Outcome: ENGINE FIX LANDED
`addis` sign-extension fix at `xenia-cpu/src/interpreter.rs` resolves
D-NEW-2 (ε-class timeout sign-extension on the canary tid=12 → ours
tid=7 sister chain). 5 LOC effective. Determinism preserved (3× cold
runs byte-identical post-fix).
## Matched-prefix table (vs C+22 baseline)
| chain | C+22 | C+23 (fresh) | delta |
|--------------------------------|---------|--------------|-------|
| canary tid=6 → ours tid=1 main | 104,607 | 104,607 | 0 |
| canary tid=4 → ours tid=11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 3 | **4** | **+1** |
| canary tid=14 → ours tid=9 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 0 |
## Floating-event absorption counts (fresh c23)
| chain | floating_create (c/o) | floating_wait (c/o) |
|--------------------------------|-----------------------|---------------------|
| canary tid=6 → ours tid=1 main | 2 / 0 | 3 / 0 |
| canary tid=15 → ours tid=10 | 0 / 1 | 0 / 0 |
| others | 0 / 0 | 0 / 0 |
C+18 absorber engaged on main chain (2 canary handle.create floated)
and on tid=15→10 (1 ours handle.create floated). C+21 absorber engaged
on main chain (3 canary wait.begin events floated — this canary cold
sample took the contended slow path 3 times).
## Cold-stable invariants
- **ours-cold byte-identical (det-fields) across 3 runs**:
digest `23cf4c4cbf61a577caa4118ab2308ba6`. Replaces C+22's
`e1dfcb1559f987b35012a7f2dc6d93f5` baseline (digest moved due
to engine source change). New baseline anchored here.
- **Event count** unchanged: 121,569 ours events (matches C+22).
- **Phase B `image_canonical_sha256` =
`ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`**
— UNCHANGED. Image-loading path untouched.
- **Engine source change**: `xenia-cpu/src/interpreter.rs::addis`
(5 LOC effective, ~25 LOC including comment + commented-out
truncation). No `xenia-canary` source changes. No diff-tool changes.
- **Tests**: kernel 204 unchanged; cpu 288 → 291 (3 new regression
tests for the addis fix).
## Direct fix-verification at the divergence point
ours-cold post-fix, tid=7 events 0-4:
```
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=6e3d96c5a52bf429
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=0 status=0x00000000
```
canary-cold, tid=12 events 0-4:
```
[0] import.call KeWaitForSingleObject
[1] kernel.call KeWaitForSingleObject
[2] handle.create sid=c49d8f0ab90401ea (different SID, absorbed)
[3] wait.begin {timeout_ns: -30000000, alertable: false, wait_type: any}
[4] kernel.return return_value=258 status=0x00000102 (TIMEOUT)
```
`timeout_ns: -30000000` MATCHES across engines (was `429466729600` pre-fix).
## New downstream divergence at idx=4 (C+23 → C+24+ target)
The advance reveals the next-class issue at idx=4:
```
canary: [4] kernel.return KeWaitForSingleObject return_value=258 (TIMEOUT)
ours: [4] kernel.return KeWaitForSingleObject return_value=0 (SUCCESS)
```
Classification: **(A) scheduler-determinism**, same family as the C+20
and C+22 escalations. Ours's monolithic-thread runner does not let
the 30 ms timeout window elapse with no signaler, so the wait returns
SUCCESS: either the event was already signaled on entry (not yet
confirmed) or the wait was implicitly fast-served. Canary's contended
scheduler lets the timeout fire. An engine-side fix requires the
parallel scheduler-determinism track (a multi-session refactor).
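For reference, the two return values at idx=4 are NTSTATUS codes: 258 is `0x00000102` (`STATUS_TIMEOUT`) and 0 is `STATUS_SUCCESS`. A minimal sketch of that mapping follows; `describe_wait_status` is a hypothetical helper for illustration and is not part of either engine.

```rust
// NTSTATUS values relevant to the idx=4 divergence.
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_TIMEOUT: u32 = 0x0000_0102;

/// Hypothetical helper: render a wait return value the way the trace
/// `status` field does, with a symbolic name appended.
fn describe_wait_status(return_value: u32) -> String {
    let name = match return_value {
        STATUS_SUCCESS => "SUCCESS",
        STATUS_TIMEOUT => "TIMEOUT",
        _ => "OTHER",
    };
    format!("0x{:08x} ({})", return_value, name)
}
```

With these definitions, canary's `return_value=258` renders as `0x00000102 (TIMEOUT)` and ours's `return_value=0` as `0x00000000 (SUCCESS)`.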
## Verification that fix is NOT diff-tool jitter
Four independent pieces of evidence:
1. **Direct ours-cold inspection**: the `wait.begin.timeout_ns`
field is read directly from ours-cold.jsonl (no diff-tool
interpretation), and it is now -30000000.
2. **Unit tests**: `lis_ori_std_negative_timeout_writes_sign_extended_doubleword`
in xenia-cpu asserts the architectural fact directly.
3. **Determinism**: 3× cold runs produce a byte-identical det-fields
digest. The fix is not a race that flickered on this one sample.
4. **Phase B image hash unchanged**: the fix is purely behavioral,
in the CPU interpreter, not a re-link or image change.
## Cascade outcome
- A=verify canary's timeout read logic: PASS (identical formula).
- B=identify encoding bug class: PASS — (d) sign-extension.
- C=land fix: PASS — 5 LOC + 3 tests.
- D=tid=12→7 advances past 3: PASS (3 → 4).
- E=no regression on main or other sisters: PASS (all preserved).
## Files
- `investigation.md`
- `cold-vs-cold-result.md` (this file)
- `diff-cold-vs-cold.md`
- `re-validation.md`
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`
## Next-target recommendation
- **C+24 = D-NEW-3** (canary tid=14 → ours tid=9 idx=41): canary
calls `XAudioGetVoiceCategoryVolumeChangeMask`; ours calls
`RtlEnterCriticalSection`. Likely missing/stubbed XAudio export
in ours causing fallback. Independent of scheduler-determinism.
- **Parallel scheduler-determinism track**: tackle the C+20/C+22 +
the newly-surfaced C+23-idx=4 family at the root via a
per-CS-pointer expected-contention inference layer. Multi-session.


@@ -0,0 +1,137 @@
# Phase A diff report
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
## Summary
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|---|---|---|---|---|
| 4 | 11 | 11 | 20000 | 11 | — | 0/0 | 0/0 |
| 6 | 1 | 104607 | 250000 | 108507 | 104607 | 2/0 | 3/0 |
| 7 | 2 | 32 | 32 | 33 | — | 0/0 | 0/0 |
| 12 | 7 | 4 | 20000 | 5 | 4 | 0/0 | 0/0 |
| 14 | 9 | 41 | 20000 | 77 | 41 | 0/0 | 0/0 |
| 15 | 10 | 16 | 20000 | 17 | — | 0/1 | 0/0 |
*`floating_create (c/o)` counts shared-global `handle.create` events absorbed by Phase C+18 cross-tid SID matching. `floating_wait (c/o)` counts `wait.begin` events on shared-global dispatchers absorbed by Phase C+21 (scheduling-jitter window — canary's contention slow path may fire while ours fast-paths or vice versa). See schema-v1.md §"Shared-global SIDs" and §"Wait-begin floating absorb".*
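As a rough illustration of how the `matched` and floating columns relate, the sketch below walks two per-thread streams in lockstep, skips events flagged as floating, and stops at the first mismatch. It is a simplified model under stated assumptions, not the harness itself: the `Event` struct, the `floating` flag (the real tool decides this through cross-tid SID matching), and the comparison on `kind` and `name` only are all hypothetical.

```rust
/// Hypothetical, simplified event: the real schema carries a full payload.
#[derive(Clone, Copy)]
struct Event {
    kind: &'static str,
    name: &'static str,
    floating: bool, // assumed to be pre-computed by an absorber pass
}

fn ev(kind: &'static str, name: &'static str, floating: bool) -> Event {
    Event { kind, name, floating }
}

#[derive(Debug, Default, PartialEq)]
struct PrefixStats {
    matched: usize,
    floating_canary: usize,
    floating_ours: usize,
    diverged: bool,
}

/// Advance `idx` past any run of floating events; return how many were skipped.
fn skip_floating(events: &[Event], idx: &mut usize) -> usize {
    let start = *idx;
    while *idx < events.len() && events[*idx].floating {
        *idx += 1;
    }
    *idx - start
}

fn matched_prefix(canary: &[Event], ours: &[Event]) -> PrefixStats {
    let mut stats = PrefixStats::default();
    let (mut c, mut o) = (0usize, 0usize);
    loop {
        stats.floating_canary += skip_floating(canary, &mut c);
        stats.floating_ours += skip_floating(ours, &mut o);
        if c >= canary.len() || o >= ours.len() {
            break; // one stream ended: no divergence within the compared events
        }
        if canary[c].kind != ours[o].kind || canary[c].name != ours[o].name {
            stats.diverged = true; // first divergence
            break;
        }
        stats.matched += 1;
        c += 1;
        o += 1;
    }
    stats
}
```

Under this model, absorbed events advance only one side's index, which is why the canary and ours indices in the tid=6 → tid=1 pre-context differ by five (2 floating creates plus 3 floating waits).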
## canary_tid=4 → ours_tid=11
No divergence within the 11 compared events (canary has 20000, ours has 11).
## canary_tid=6 → ours_tid=1
First divergence at `tid_event_idx=104607`: payload.ord: canary=293 ours=304
**Pre-context (last 5 matching events):**
```
canary: [104607] kernel.call RtlLeaveCriticalSection
ours: [104602] kernel.call RtlLeaveCriticalSection
canary: [104608] kernel.return RtlLeaveCriticalSection
ours: [104603] kernel.return RtlLeaveCriticalSection
canary: [104609] import.call RtlEnterCriticalSection
ours: [104604] import.call RtlEnterCriticalSection
canary: [104610] kernel.call RtlEnterCriticalSection
ours: [104605] kernel.call RtlEnterCriticalSection
canary: [104611] kernel.return RtlEnterCriticalSection
ours: [104606] kernel.return RtlEnterCriticalSection
```
**Divergent event:**
```
canary: [104612] import.call RtlEnterCriticalSection
ours: [104607] import.call RtlLeaveCriticalSection
```
**Next event after the divergence (if any):**
```
canary: [104613] kernel.call RtlEnterCriticalSection
ours: [104608] kernel.call RtlLeaveCriticalSection
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1475067700, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlEnterCriticalSection", "ord": 293}, "schema_version": 1, "tid": 6, "tid_event_idx": 104612}
{"deterministic": true, "engine": "ours", "guest_cycle": 5517276, "host_ns": 478624573, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlLeaveCriticalSection", "ord": 304}, "schema_version": 1, "tid": 1, "tid_event_idx": 104607}
```
## canary_tid=7 → ours_tid=2
No divergence within the 32 compared events (canary has 32, ours has 33).
## canary_tid=12 → ours_tid=7
First divergence at `tid_event_idx=4`: payload.return_value: canary=258 ours=0
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
canary: [2] handle.create sid=c49d8f0ab90401ea
ours: [2] handle.create sid=6e3d96c5a52bf429
canary: [3] wait.begin {'handles_semantic_ids': ['c49d8f0ab90401ea'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] wait.begin {'handles_semantic_ids': ['6e3d96c5a52bf429'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
```
**Divergent event:**
```
canary: [4] kernel.return KeWaitForSingleObject
ours: [4] kernel.return KeWaitForSingleObject
```
**Next event after the divergence (if any):**
```
canary: [5] import.call RtlEnterCriticalSection
ours: <end of stream>
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1582904700, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 258, "side_effects": [], "status": "0x00000102"}, "schema_version": 1, "tid": 12, "tid_event_idx": 4}
{"deterministic": true, "engine": "ours", "guest_cycle": 30, "host_ns": 488185483, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 7, "tid_event_idx": 4}
```
## canary_tid=14 → ours_tid=9
First divergence at `tid_event_idx=41`: payload.ord: canary=503 ours=293
**Pre-context (last 5 matching events):**
```
canary: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
ours: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
canary: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
ours: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
canary: [38] import.call KfLowerIrql
ours: [38] import.call KfLowerIrql
canary: [39] kernel.call KfLowerIrql
ours: [39] kernel.call KfLowerIrql
canary: [40] kernel.return KfLowerIrql
ours: [40] kernel.return KfLowerIrql
```
**Divergent event:**
```
canary: [41] import.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [41] import.call RtlEnterCriticalSection
```
**Next event after the divergence (if any):**
```
canary: [42] kernel.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [42] kernel.call RtlEnterCriticalSection
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1770114500, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "XAudioGetVoiceCategoryVolumeChangeMask", "ord": 503}, "schema_version": 1, "tid": 14, "tid_event_idx": 41}
{"deterministic": true, "engine": "ours", "guest_cycle": 417, "host_ns": 1612544262, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlEnterCriticalSection", "ord": 293}, "schema_version": 1, "tid": 9, "tid_event_idx": 41}
```
## canary_tid=15 → ours_tid=10
No divergence within the 16 compared events (canary has 20000, ours has 17).


@@ -0,0 +1,6 @@
{
"run": 1,
"total_events": 121569,
"det_fields_md5": "23cf4c4cbf61a577caa4118ab2308ba6",
"phase_b_image_canonical_sha256": "ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18"
}


@@ -0,0 +1,6 @@
{
"run": 2,
"total_events": 121569,
"det_fields_md5": "23cf4c4cbf61a577caa4118ab2308ba6",
"phase_b_image_canonical_sha256": "ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18"
}


@@ -0,0 +1,6 @@
{
"run": 3,
"total_events": 121569,
"det_fields_md5": "23cf4c4cbf61a577caa4118ab2308ba6",
"phase_b_image_canonical_sha256": "ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18"
}


@@ -0,0 +1,136 @@
diff --git a/crates/xenia-cpu/src/interpreter.rs b/crates/xenia-cpu/src/interpreter.rs
index 0e150e8..9101b54 100644
--- a/crates/xenia-cpu/src/interpreter.rs
+++ b/crates/xenia-cpu/src/interpreter.rs
@@ -117,17 +117,27 @@ fn execute(ctx: &mut PpcContext, mem: &dyn MemoryAccess, instr: &DecodedInstr) -
ctx.pc += 4;
}
PpcOpcode::addis => {
- // Xbox 360 user mode is 32-bit ABI (MSR.SF=0), so addis must
- // produce a value whose upper 32 bits don't pollute downstream
- // 64-bit arithmetic. The PPC ISA in 64-bit mode sign-extends
- // simm16 before the shift, producing 0xFFFFFFFF_xxxx0000 for
- // negative simm16 (high bit set). When this value flows into
- // a 64-bit subfc against a zero-extended lwz value, the unsigned
- // 64-bit comparison yields wrong CA. Truncate to 32 bits to
- // simulate 32-bit ABI behavior.
- let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
- let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16);
- ctx.gpr[instr.rd()] = result as u32 as u64;
+ // Phase C+23: `addis` (and the `lis` simplified mnemonic) must
+ // sign-extend the shifted immediate to the full 64 bits before
+ // storing into the GPR, matching canary's HIR emitter
+ // (`InstrEmit_addis` in `ppc_emit_alu.cc`: `EXTS16(SI) << 16`
+ // as a 64-bit constant). Game code commonly builds a negative
+ // 32-bit value via `lis rN, 0xFFFB; ori rN, rN, 0x6C20`
+ // (yielding the i32 -300,000 for a 30ms `KeWait` timeout) and
+ // then stores it as a 64-bit doubleword via `std`. Without
+ // sign extension the high half on the wire was 0x00000000,
+ // turning the timeout into a positive ~4.3-billion-tick
+ // absolute deadline (~7 minutes) instead of a 30ms relative
+ // wait — surfacing as `wait.begin.timeout_ns=429466729600`
+ // on canary tid=12 → ours tid=7 idx=3 sister chain
+ // (cold-vs-cold C+22 baseline). Defensive 32-bit truncation
+ // for the arithmetic chain consumers (`subfcx`/`addex`/etc.)
+ // is already implemented at each consumer site (see PPCBUG-002/
+ // 007/etc.), so widening `addis` here does NOT regress them.
+ let ra_val = if instr.ra() == 0 { 0i64 } else { ctx.gpr[instr.ra()] as i64 };
+ let shifted = (instr.simm16() as i64) << 16;
+ let result = ra_val.wrapping_add(shifted);
+ ctx.gpr[instr.rd()] = result as u64;
ctx.pc += 4;
}
PpcOpcode::addic => {
@@ -4934,6 +4944,92 @@ mod tests {
assert_eq!(ctx.gpr[3], 0x10000);
}
+ /// Phase C+23 regression: `addis rD, 0, neg_simm` (the `lis` form
+ /// with a negative immediate) must sign-extend the result to the
+ /// full 64 bits, matching canary's HIR emitter. Without this fix,
+ /// game code that builds a 32-bit negative value via
+ /// `lis r11, 0xFFFB; ori r11, r11, 0x6C20` and then stores the
+ /// result as a 64-bit doubleword via `std` would put 0x00000000
+ /// in the high half instead of the correct 0xFFFFFFFF, turning a
+ /// 30 ms relative `KeWaitForSingleObject` timeout into a positive
+ /// absolute deadline ~7 minutes away. Anchored by the cold-vs-cold
+ /// sister chain canary tid=12 → ours tid=7 idx=3 divergence.
+ #[test]
+ fn addis_with_negative_simm_sign_extends_to_64_bits() {
+ let mut ctx = PpcContext::new();
+ let mut mem = TestMem::new();
+ // addis r11, r0, 0xFFFB (lis r11, 0xFFFB)
+ // op=15, rd=11, ra=0, simm=0xFFFB.
+ let raw = (15u32 << 26) | (11u32 << 21) | (0u32 << 16) | 0xFFFBu32;
+ write_instr(&mut mem, 0, raw);
+ ctx.pc = 0;
+ step(&mut ctx, &mut mem);
+ assert_eq!(
+ ctx.gpr[11], 0xFFFFFFFF_FFFB0000u64,
+ "addis with negative simm must sign-extend to 64 bits"
+ );
+ }
+
+ /// Phase C+23 regression: the full `lis + ori + std` sequence that
+ /// builds the -300,000 timeout tick count used by Sylpheed for its
+ /// 30 ms `KeWait` calls must produce 0xFFFFFFFFFFFB6C20 on the wire,
+ /// not 0x00000000FFFB6C20. This is the proximate cause of the
+ /// `wait.begin.timeout_ns = 429466729600` divergence on canary tid=12
+ /// → ours tid=7 idx=3 in the cold-vs-cold C+22 baseline.
+ #[test]
+ fn lis_ori_std_negative_timeout_writes_sign_extended_doubleword() {
+ let mut ctx = PpcContext::new();
+ let mut mem = TestMem::new();
+ // r1 = 0x100 (stack pointer surrogate). Storage slot at r1+8.
+ ctx.gpr[1] = 0x100;
+ // lis r11, 0xFFFB ; r11 = 0xFFFFFFFFFFFB0000
+ let lis = (15u32 << 26) | (11u32 << 21) | (0u32 << 16) | 0xFFFBu32;
+ // ori r11, r11, 0x6C20 ; r11 = 0xFFFFFFFFFFFB6C20
+ // op=24 (ori): D-form encoding | rs(11) | ra(11) | uimm.
+ let ori = (24u32 << 26) | (11u32 << 21) | (11u32 << 16) | 0x6C20u32;
+ // std r11, 8(r1) ; mem[0x108..0x110] = 0xFFFFFFFFFFFB6C20
+ // op=62, DS-form, ds_field=8>>2=2, xo=0.
+ let std_op = (62u32 << 26) | (11u32 << 21) | (1u32 << 16) | (8u32 & 0xFFFCu32);
+ write_instr(&mut mem, 0, lis);
+ write_instr(&mut mem, 4, ori);
+ write_instr(&mut mem, 8, std_op);
+ ctx.pc = 0;
+ step(&mut ctx, &mut mem); // lis
+ assert_eq!(ctx.gpr[11], 0xFFFFFFFF_FFFB0000u64);
+ step(&mut ctx, &mut mem); // ori
+ assert_eq!(ctx.gpr[11], 0xFFFFFFFF_FFFB6C20u64);
+ step(&mut ctx, &mut mem); // std
+ let stored = mem.read_u64(0x108);
+ assert_eq!(
+ stored, 0xFFFFFFFF_FFFB6C20u64,
+ "std must persist all 64 bits of the sign-extended GPR"
+ );
+ // Interpreting the stored doubleword as a 100ns NT TIMEOUT tick
+ // count: it must round-trip to -300,000 (30 ms relative wait),
+ // NOT to +4,294,667,296 (the C+22 broken value).
+ assert_eq!(stored as i64, -300_000i64);
+ assert_eq!((stored as i64).wrapping_mul(100), -30_000_000i64);
+ }
+
+ /// Phase C+23 regression: ensure `addis` against a non-zero rA still
+ /// performs the canonical Add with 64-bit semantics. Used by
+ /// arithmetic chains that combine a sign-extended `lis` high half
+ /// with a subsequent `addi` low half. Equivalent to canary's HIR
+ /// `Add(LoadGPR(rA), const_i64(simm << 16))`.
+ #[test]
+ fn addis_with_nonzero_ra_adds_in_64_bit() {
+ let mut ctx = PpcContext::new();
+ let mut mem = TestMem::new();
+ // r4 = 0x1234 already. addis r5, r4, 0xFFFE => r5 = r4 + (-2<<16)
+ // = 0x1234 + 0xFFFFFFFFFFFE0000
+ ctx.gpr[4] = 0x1234;
+ let raw = (15u32 << 26) | (5u32 << 21) | (4u32 << 16) | 0xFFFEu32;
+ write_instr(&mut mem, 0, raw);
+ ctx.pc = 0;
+ step(&mut ctx, &mut mem);
+ assert_eq!(ctx.gpr[5], 0xFFFFFFFF_FFFE1234u64);
+ }
+
#[test]
fn test_lwz_stw() {
let mut ctx = PpcContext::new();


@@ -0,0 +1,243 @@
# Phase C+23 investigation — KeWaitForSingleObject timeout encoding (2026-05-18)
## Divergence (input from C+22)
D-NEW-2 at canary tid=12 → ours tid=7 idx=3 sister chain:
```
canary: [3] wait.begin {handles_semantic_ids: ['c49d8f0ab90401ea'],
timeout_ns: -30000000, alertable: False, wait_type: 'any'}
ours: [3] wait.begin {handles_semantic_ids: ['6e3d96c5a52bf429'],
timeout_ns: 429466729600, alertable: False, wait_type: 'any'}
```
Canary: -30,000,000 ns = -300,000 100ns-ticks = 30 ms relative wait.
Ours: +429,466,729,600 ns = +4,294,667,296 100ns-ticks ≈ 7 minutes
(429.5 s), read as an absolute deadline. Wrong by a sign-extension-class error.
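The arithmetic above can be checked with a short sketch. `decode_timeout` is a hypothetical helper (it is neither engine's code) that reinterprets the raw 64-bit value as signed 100 ns ticks and applies the NT convention that negative values are relative waits and positive values are absolute deadlines.

```rust
/// Hypothetical helper: interpret the raw 64-bit value stored at
/// `timeout_ptr` as a signed count of 100 ns ticks.
/// Returns (timeout in nanoseconds, whether the wait is relative).
fn decode_timeout(raw: u64) -> (i64, bool) {
    let ticks = raw as i64; // reinterpret the bits as signed
    let ns = ticks.saturating_mul(100); // 100 ns per tick
    (ns, ticks < 0) // NT convention: negative = relative
}
```

Feeding it `0xFFFF_FFFF_FFFB_6C20` yields `(-30_000_000, true)`, while `0x0000_0000_FFFB_6C20` yields `(429_466_729_600, false)`; the two inputs differ only in the high 32 bits.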
## Step 1 — Verify framing (reading-error #28)
### Canary's `xeKeWaitForSingleObject`
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:969-1013`:
```cpp
uint32_t xeKeWaitForSingleObject(void* object_ptr, uint32_t wait_reason,
uint32_t processor_mode, uint32_t alertable,
uint64_t* timeout_ptr) {
...
if (phase_a::IsEnabled()) {
uint64_t sid = 0;
if (!object->handles().empty()) {
sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
}
int64_t timeout_ns = timeout_ptr
? (static_cast<int64_t>(*timeout_ptr) * 100) : -1;
phase_a::EmitWaitBegin(&sid, 1, timeout_ns, alertable != 0, false);
}
...
}
dword_result_t KeWaitForSingleObject_entry(lpvoid_t object_ptr,
dword_t wait_reason,
dword_t processor_mode,
dword_t alertable,
lpqword_t timeout_ptr) {
uint64_t timeout = timeout_ptr ? static_cast<uint64_t>(*timeout_ptr) : 0u;
return xeKeWaitForSingleObject(...);
}
```
`lpqword_t` is Xenia's BE-swapped 64-bit-aligned pointer accessor.
Formula: read 8 BE bytes as int64, multiply by 100.
### Ours's `ke_wait_for_single_object`
`xenia-rs/crates/xenia-kernel/src/exports.rs:5051-5083` (and
`decode_timeout_ns` at 4987-4995):
```rust
fn decode_timeout_ns(mem: &GuestMemory, timeout_ptr: u32) -> i64 {
if timeout_ptr == 0 { return -1; }
let raw = mem.read_u64(timeout_ptr) as i64;
raw.saturating_mul(100)
}
```
`mem.read_u64` reads 8 BE bytes (xenia-memory/heap.rs:521-533).
Formula: read 8 BE bytes as int64, multiply by 100. **Identical to canary.**
### Conclusion of Step 1
Both engines read 8 BE bytes from the same conceptual `timeout_ptr` and
multiply by 100. If both read the **same bytes** from the **same address**,
they produce the same `timeout_ns`. The divergence implies one of:
1. The `timeout_ptr` address differs (upstream).
2. The bytes at the same address differ (upstream).
3. Wrong-register read in one of the engines (reading-error #25).
## Step 2 — Sample the actual guest call (reading-error #25 discipline)
Added a TEMPORARY diagnostic dump to `ke_wait_for_single_object`
(removed before landing the fix). Ran cold ours; first hit for tid=7:
```
XRS_C23 KeWait tid=7 lr=0x824cd4f4 r3=0x42453b5c r4=0x3 r5=0x1 r6=0x0
r7=0x71187eb0 r8=0x0 r9=0x0 r10=0x2
bytes_at_r7=hi=0x0 lo=0xfffb6c20
```
- r3 = `0x42453b5c` — object pointer (PKEVENT at ctx+0x20).
- r7 = `0x71187eb0` — timeout pointer (stack-allocated).
- bytes at r7 = `0x00000000 0xFFFB6C20` (BE) → full 8 BE bytes =
`0x00000000_FFFB6C20` = +4,294,667,296. **Matches ours's output.**
For canary's -300,000 (= -30,000,000 / 100), the 8 BE bytes would be
`0xFFFFFFFF_FFFB6C20`. So **the high 4 bytes are zero in ours but
all-Fs in canary**. The low 32 bits match exactly.
The guest is writing the LARGE_INTEGER to its stack and our engine
sees `0x00000000_FFFB6C20` while canary sees `0xFFFFFFFF_FFFB6C20`.
Different bytes at the same conceptual location ⇒ upstream divergence
in how the guest computes the value.
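To make the byte-level comparison concrete, here is a small sketch that rebuilds the doubleword from the `hi`/`lo` halves printed by the diagnostic and decodes 8 big-endian bytes as a signed tick count. Both function names are illustrative assumptions; only the standard-library `from_be_bytes`/`to_be_bytes` calls are real APIs.

```rust
/// Join the `hi`/`lo` 32-bit halves printed by the diagnostic dump.
fn join_halves(hi: u32, lo: u32) -> u64 {
    ((hi as u64) << 32) | (lo as u64)
}

/// Decode 8 big-endian guest bytes as a signed 100 ns tick count.
fn ticks_from_be(bytes: [u8; 8]) -> i64 {
    i64::from_be_bytes(bytes)
}
```

`join_halves(0x0, 0xfffb6c20)` reads as +4,294,667,296 when cast to `i64`, whereas the bytes `FF FF FF FF FF FB 6C 20` decode to -300,000.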
## Step 3 — Identify the encoding bug (root cause)
LR at the KeWait call = 0x824cd4f4. The thread entry (from
`thread.create.entry_pc`) is `0x824cd458`. Disassembling
`0x824cd458 … 0x824cd4f0` (the prolog through the call):
```
824cd470: 0x3d60fffb lis r11, 0xFFFB ; high half of -300,000
824cd478: 0x3ba10050 addi r29, r1, 80 ; r29 = stack timeout slot
824cd47c: 0x616b6c20 ori r11, r11, 0x6C20 ; r11 |= 0x6C20
824cd480: 0xf9610050 std r11, 80(r1) ; store r11 as 64-bit DW
...
824cd4dc: 0x7fa7eb78 mr r7, r29 ; r7 = timeout pointer
...
824cd4f0: 0x483808dd bl KeWaitForSingleObject
```
In canonical PowerPC, `lis r11, 0xFFFB` is `addis r11, 0, 0xFFFB` and
**sign-extends the shifted immediate to 64 bits**:
```
r11 = EXTS(0xFFFB) << 16 = 0xFFFFFFFF_FFFB0000
```
Canary's HIR emitter (`InstrEmit_addis`, at
`xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:138-150`) does exactly that:
```cpp
Value* si = f.LoadConstantInt64(XEEXTS16(i.D.DS) << 16);
```
Subsequent `ori r11, r11, 0x6C20` produces `0xFFFFFFFF_FFFB6C20`, and
`std r11, 80(r1)` writes all 64 bits → canary's wire bytes
`0xFFFFFFFF_FFFB6C20` = -300,000 as int64.
**Ours's `addis` at
`xenia-rs/crates/xenia-cpu/src/interpreter.rs:119-132` (before fix)**:
```rust
PpcOpcode::addis => {
// (per the comment) truncate to 32 bits to simulate 32-bit ABI.
let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16);
ctx.gpr[instr.rd()] = result as u32 as u64; // ⬅ ZERO-extends to 64
ctx.pc += 4;
}
```
The `result as u32 as u64` cast **drops the high 32 bits before storage**,
producing `0x00000000_FFFB0000` instead of `0xFFFFFFFF_FFFB0000`.
After `ori` → `0x00000000_FFFB6C20`. After `std` (which stores all 64
bits of the GPR) → wire bytes `0x00000000_FFFB6C20` = +4,294,667,296
as int64. **This is the C+22 divergence value exactly.**
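The two behaviours can be compared side by side on plain integers. This is a standalone sketch, assuming free functions in place of the interpreter's `PpcContext`/`DecodedInstr` plumbing; the function names are illustrative only.

```rust
/// Pre-fix behaviour: the sum is truncated to 32 bits, then zero-extended.
fn addis_truncating(ra_val: u64, simm16: i16) -> u64 {
    let result = ra_val.wrapping_add(((simm16 as i64) << 16) as u64);
    result as u32 as u64
}

/// Architectural behaviour: EXTS(simm16) << 16 is added in full 64 bits.
fn addis_sign_extending(ra_val: u64, simm16: i16) -> u64 {
    ra_val.wrapping_add(((simm16 as i64) << 16) as u64)
}

/// `ori` is a plain 64-bit OR with a zero-extended 16-bit immediate.
fn ori(rs: u64, uimm: u16) -> u64 {
    rs | uimm as u64
}
```

For `lis r11, 0xFFFB; ori r11, r11, 0x6C20` (with `0xFFFB` passed as the `i16` value -5), the truncating variant ends at `0x0000_0000_FFFB_6C20` and the sign-extending one at `0xFFFF_FFFF_FFFB_6C20`.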
### Encoding bug class: (d) Sign-extension. Specifically:
> `addis` performed a defensive 32-bit zero-extension truncation that
> defeats the architectural sign-extension semantics required when the
> result later flows into a 64-bit memory store (`std`).
### Why the defensive truncation existed
The C+22-era comment cites correctness of the `subfc`/`lwz` carry
chain in 32-bit ABI mode. Inspection of every consumer of GPRs that
might receive an `addis` result confirms: every 32-bit-meaningful
arithmetic op (`subfcx`, `addic`, `addicx`, `subficx`, etc.) already
defensively truncates BOTH operands to u32 BEFORE computing. So the
upstream sign-extended high bits never enter their result; they only
become visible via `std`/`mr`/`orx` (operations that legitimately
propagate the full 64-bit value).
Reverting the `addis` truncation does NOT regress any PPCBUG-002/-007/
-etc. fix; those operate at their consumer site, not at the producer.
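The consumer-side argument can be illustrated with a simplified model of the `subfc` carry (`rD = ~rA + rB + 1`). Both functions below are hypothetical sketches, not the interpreter's code: the first truncates both operands to 32 bits as the text describes, the second is the naive 64-bit computation that the old `addis` comment was guarding against.

```rust
/// Consumer-side model: truncate both operands to u32, then take the
/// carry out of bit 31 of `!a + b + 1`.
fn subfc_carry_32(ra: u64, rb: u64) -> bool {
    let a = ra as u32;
    let b = rb as u32;
    let sum = (!a as u64) + (b as u64) + 1;
    sum > u32::MAX as u64
}

/// Naive 64-bit model: carry out of bit 63, with no truncation.
fn subfc_carry_64_naive(ra: u64, rb: u64) -> bool {
    let sum = (!ra as u128) + (rb as u128) + 1;
    sum > u64::MAX as u128
}
```

With `rb = 0xFFFC_0000` (a zero-extended `lwz` value), the truncating model returns the same carry whether `ra` is `0xFFFF_FFFF_FFFB_0000` or `0x0000_0000_FFFB_0000`; only the naive 64-bit model is sensitive to the producer's high bits.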
## The fix (5 LOC effective)
`xenia-rs/crates/xenia-cpu/src/interpreter.rs:119-138`:
```rust
PpcOpcode::addis => {
// Phase C+23: sign-extend the shifted immediate to 64 bits before
// adding to rA, matching canary's HIR emitter. Defensive 32-bit
// truncation at each consumer site already handles the 32-bit-ABI
// arithmetic chain correctness (see PPCBUG-002/-007/etc.).
let ra_val = if instr.ra() == 0 { 0i64 } else { ctx.gpr[instr.ra()] as i64 };
let shifted = (instr.simm16() as i64) << 16;
let result = ra_val.wrapping_add(shifted);
ctx.gpr[instr.rd()] = result as u64;
ctx.pc += 4;
}
```
### Tests added (3 new in xenia-cpu)
- `addis_with_negative_simm_sign_extends_to_64_bits` — direct
unit test for `lis r11, 0xFFFB` producing `0xFFFFFFFFFFFB0000`.
- `lis_ori_std_negative_timeout_writes_sign_extended_doubleword`:
end-to-end regression that runs the actual 3-instruction sequence
used by Sylpheed's KeWait setup and asserts wire bytes
`0xFFFFFFFFFFFB6C20` and an int64 round-trip to -300,000.
- `addis_with_nonzero_ra_adds_in_64_bit` — ensures the rA-non-zero
case still uses canonical 64-bit Add semantics.
## Cross-engine encoding bug class summary
Per the prompt's hint catalog:
- (a) Wrong register: ruled out. r3-r10 dump confirms r7 holds the
timeout pointer in ours, matching canary's 5-arg ABI signature.
- (b) Wrong-direction LARGE_INTEGER dereference: ruled out. Both
engines read 8 BE bytes via the same idiom.
- (c) Endianness: ruled out. Both BE.
- (d) Sign-extension: **CONFIRMED.** Bug is in the CPU interpreter's
`addis` opcode, not the wait subsystem.
## Validation evidence
- ours-cold (post-fix) tid=7 idx=3 `wait.begin.timeout_ns = -30000000`,
matching canary exactly.
- Sister chain canary tid=12 → ours tid=7 advances from matched=3 to
matched=4.
- New divergence at idx=4 is `return_value: canary=258 (TIMEOUT) ours=0
(SUCCESS)` — the C+22-class scheduler-determinism issue (ours's
monolithic-thread runner sees no contention, so the 30 ms timeout
doesn't fire). Out of scope for this phase.
- Main chain matched-prefix 104,607 preserved (no regression).
- All other sister chains at C+22 baseline.
## Files
- `investigation.md` (this file)
- `cold-vs-cold-result.md`
- `diff-cold-vs-cold.md` — full Phase A diff report
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `re-validation.md`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`


@@ -0,0 +1,86 @@
# Phase C+23 re-validation (2026-05-18)
## Protocol followed
Cold-vs-cold per reading-error #31 + #32 + #33 + #34.
1. ✓ Backed up both canary cache locations
(`xenia-canary/build-cross/bin/Windows/Debug/cache/` and
`~/.local/share/Xenia/cache/`) to tarballs in
`xenia-rs/audit-runs/phase-c23-keWait-timeout-encoding/`.
2. ✓ Wiped both canary caches + ours's
(`~/.local/share/xenia-rs/cache/` + `/tmp/xrs-cache-c23-*`).
3. ✓ Cold-ran ours (50M instructions) against the `.iso`
path — NOT the loose `default.xex` (per #34).
4. ✓ Cold-ran canary with `--mute=true` and `phase_a_event_log_path` set;
killed it after a ~95 s timeout.
5. ✓ Truncated canary log to first 250k events for tid=6 / 20k for
sisters (using existing C+19 truncate.py).
6. ✓ Ran `diff_events.py` with full tid map
`6=1,7=2,4=11,12=7,14=9,15=10`.
7. ✓ Restored both canary cache backups.
8. ✓ Reverted canary config (`phase_a_event_log_path` back to `""`).
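The tid map in step 6 is a comma-separated list of `canary_tid=ours_tid` pairs. How `diff_events.py` parses it is not shown here; the sketch below is a hypothetical parser for that string format, written only to make the mapping explicit.

```rust
use std::collections::BTreeMap;

/// Hypothetical parser for a "canary=ours,canary=ours,..." tid map.
/// Keys are canary thread ids, values are the matching ours thread ids.
fn parse_tid_map(spec: &str) -> Result<BTreeMap<u32, u32>, String> {
    let mut map = BTreeMap::new();
    for pair in spec.split(',') {
        let (canary, ours) = pair
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {:?}", pair))?;
        let canary_tid: u32 = canary
            .trim()
            .parse()
            .map_err(|e| format!("bad canary tid {:?}: {}", canary, e))?;
        let ours_tid: u32 = ours
            .trim()
            .parse()
            .map_err(|e| format!("bad ours tid {:?}: {}", ours, e))?;
        if map.insert(canary_tid, ours_tid).is_some() {
            return Err(format!("duplicate canary tid {}", canary_tid));
        }
    }
    Ok(map)
}
```

Parsing `6=1,7=2,4=11,12=7,14=9,15=10` this way gives six entries, including 12 → 7 for the sister chain that this phase advances.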
## Determinism check
3 cold ours runs against `.iso`, det-fields-only MD5:
| run | digest |
|-----|-------------------------------------|
| 1 | 23cf4c4cbf61a577caa4118ab2308ba6 |
| 2 | 23cf4c4cbf61a577caa4118ab2308ba6 |
| 3 | 23cf4c4cbf61a577caa4118ab2308ba6 |
PASS — bit-stable. Replaces C+22's `e1dfcb1559f987b35012a7f2dc6d93f5`
baseline (digest moved due to addis behavioral change).
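The PASS criterion amounts to "every run reports the same digest string". A minimal sketch of that check, assuming the digests have already been extracted from the three `digest-cold-stable-*.json` files (`stable_digest` is a hypothetical name):

```rust
/// Return the shared digest if every run produced the same one,
/// or `None` if the list is empty or any run differs.
fn stable_digest<'a>(digests: &[&'a str]) -> Option<&'a str> {
    let first = *digests.first()?;
    if digests.iter().all(|d| *d == first) {
        Some(first)
    } else {
        None
    }
}
```

Note that this only establishes run-to-run stability; comparing against the previous phase's baseline digest is a separate check.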
## Phase B image hash
`image_canonical_sha256` =
`ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`:
UNCHANGED. The image-loading path is unaffected.
## Gate matrix
| gate | result |
|---------------------------------------------|--------------------------------|
| Engine source change minimal | PASS (~25 LOC, 1 file) |
| CPU tests (xenia-cpu) — pre vs post | 288 → 291 (3 new, 0 regressions) |
| Kernel tests (xenia-kernel) | 204 unchanged (no regressions) |
| Diff-tool source unchanged | PASS |
| Phase A schema version 1 unchanged | PASS |
| ours-cold byte-stable across 3 runs | PASS (same digest in all 3 runs) |
| Main matched-prefix preserved at C+22 | PASS (104,607) |
| Sister tid=12→7 advanced | PASS (3 → 4) |
| Sister tid=4→11 / 7→2 / 14→9 / 15→10 | PASS (all preserved) |
| Phase B image hash preserved | PASS (ea8d160e…) |
| Canary caches restored | PASS |
| Canary config restored | PASS |
| Workspace build clean | PASS |
| `--mute=true` used | PASS |
| Renamed binary used (xrs-c23) | PASS |
| Cold-vs-cold against `.iso` | PASS (per #34) |
## What changed in the diff-tool report compared to C+22 baseline
| metric | C+22 | C+23 | delta |
|------------------------------|---------|---------|-------|
| matched tid=6→1 | 104,607 | 104,607 | 0 |
| matched tid=4→11 | 11 | 11 | 0 |
| matched tid=7→2 | 32 | 32 | 0 |
| matched tid=12→7 | 3 | **4** | **+1** |
| matched tid=14→9 | 41 | 41 | 0 |
| matched tid=15→10 | 16 | 16 | 0 |
| ours-cold total events | 121,569 | 121,569 | 0 |
| ours-cold det-fields digest | e1dfcb15… | 23cf4c4c… | NEW |
| Phase B image hash | ea8d160e… | ea8d160e… | 0 |
## Outcome
- C+23 = LANDED. tid=12→7 chain advances +1 (3 → 4).
- 5 LOC effective engine change (CPU interpreter `addis`).
- 3 new regression tests in xenia-cpu (lib tests 288 → 291).
- Determinism preserved. Phase B image hash preserved.
- All other sister chains and main preserved.
- New downstream divergence at idx=4 is C+22-class scheduler
determinism (out of scope for this phase).