Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
244 lines
9.3 KiB
Markdown
# Phase C+23 investigation — KeWaitForSingleObject timeout encoding (2026-05-18)
## Divergence (input from C+22)

D-NEW-2 on the sister chain canary tid=12 → ours tid=7 diverges at idx=3:

```
canary: [3] wait.begin {handles_semantic_ids: ['c49d8f0ab90401ea'],
                        timeout_ns: -30000000, alertable: False, wait_type: 'any'}
ours:   [3] wait.begin {handles_semantic_ids: ['6e3d96c5a52bf429'],
                        timeout_ns: 429466729600, alertable: False, wait_type: 'any'}
```

- Canary: -30,000,000 ns = -300,000 100ns-ticks = a 30 ms relative wait.
- Ours: +429,466,729,600 ns = +4,294,667,296 100ns-ticks. A positive NT timeout is an absolute deadline, so this one lands only about 429 s (roughly 7 minutes) after the 1601-01-01 system-time epoch.

In hex the two tick counts are `0xFFFFFFFF_FFFB6C20` and `0x00000000_FFFB6C20`: the low 32 bits agree and only the high 32 bits differ, so the error is of the sign-extension class.
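
The relative/absolute reading above follows the NT convention for LARGE_INTEGER timeouts: negative means an interval relative to now, positive means absolute system time in 100 ns ticks. A minimal sketch that classifies a traced `timeout_ns`, assuming the Xbox 360 kernel follows that convention and that the trace uses -1 for a null timeout pointer (as both engines' emitters do). `NtTimeout` and `classify_timeout_ns` are hypothetical names, not part of either engine.

```rust
/// Nanoseconds per NT timeout tick.
const NS_PER_TICK: i64 = 100;

/// How an NT-style LARGE_INTEGER timeout is interpreted (hypothetical type).
#[derive(Debug, PartialEq)]
enum NtTimeout {
    /// Null timeout pointer: wait forever (traced as -1).
    Infinite,
    /// Zero: test the object's state without blocking.
    Poll,
    /// Negative: an interval relative to the current time.
    Relative { ns: u64 },
    /// Positive: absolute system time in 100 ns ticks since 1601-01-01 UTC.
    Absolute { ticks_since_1601: u64 },
}

/// Classify a `wait.begin` `timeout_ns` value (raw ticks * 100, or -1).
fn classify_timeout_ns(timeout_ns: i64) -> NtTimeout {
    if timeout_ns == -1 {
        return NtTimeout::Infinite;
    }
    let ticks = timeout_ns / NS_PER_TICK;
    if ticks == 0 {
        NtTimeout::Poll
    } else if ticks < 0 {
        NtTimeout::Relative { ns: ticks.unsigned_abs() * NS_PER_TICK as u64 }
    } else {
        NtTimeout::Absolute { ticks_since_1601: ticks as u64 }
    }
}
```

Under these rules canary's value classifies as a 30 ms relative wait, while ours classifies as an absolute deadline of 4,294,667,296 ticks, about 429 s after the epoch.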
## Step 1 — Verify framing (reading-error #28)
### Canary's `xeKeWaitForSingleObject`

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:969-1013`:

```cpp
uint32_t xeKeWaitForSingleObject(void* object_ptr, uint32_t wait_reason,
                                 uint32_t processor_mode, uint32_t alertable,
                                 uint64_t* timeout_ptr) {
  ...
  if (phase_a::IsEnabled()) {
    uint64_t sid = 0;
    if (!object->handles().empty()) {
      sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
    }
    int64_t timeout_ns = timeout_ptr
        ? (static_cast<int64_t>(*timeout_ptr) * 100) : -1;
    phase_a::EmitWaitBegin(&sid, 1, timeout_ns, alertable != 0, false);
  }
  ...
}

dword_result_t KeWaitForSingleObject_entry(lpvoid_t object_ptr,
                                           dword_t wait_reason,
                                           dword_t processor_mode,
                                           dword_t alertable,
                                           lpqword_t timeout_ptr) {
  uint64_t timeout = timeout_ptr ? static_cast<uint64_t>(*timeout_ptr) : 0u;
  return xeKeWaitForSingleObject(...);
}
```

`lpqword_t` is Xenia's BE-swapped 64-bit-aligned pointer accessor.

Formula: read 8 BE bytes as int64, multiply by 100.
### Our `ke_wait_for_single_object`

`xenia-rs/crates/xenia-kernel/src/exports.rs:5051-5083` (and
`decode_timeout_ns` at 4987-4995):

```rust
fn decode_timeout_ns(mem: &GuestMemory, timeout_ptr: u32) -> i64 {
    if timeout_ptr == 0 { return -1; }
    let raw = mem.read_u64(timeout_ptr) as i64;
    raw.saturating_mul(100)
}
```

`mem.read_u64` reads 8 BE bytes (xenia-memory/heap.rs:521-533).

Formula: read 8 BE bytes as int64, multiply by 100. **Identical to canary** for every in-range value; the only difference is that `saturating_mul` clamps on overflow, which the timeouts seen here never approach.
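
Because the two formulas are the same decode, the decode itself cannot produce the divergence. A minimal Rust sketch of that shared decode (the function name and the byte-array parameter are hypothetical simplifications; the real code reads through `GuestMemory` or `lpqword_t`) shows that the two traced `timeout_ns` values come from two different 8-byte patterns, which is what Step 2 then checks.

```rust
/// Simplified, hypothetical stand-in for both engines' decode: `bytes` plays
/// the role of the 8 guest bytes at `timeout_ptr`, and `None` a null pointer.
fn decode_be_timeout_ns(bytes: Option<[u8; 8]>) -> i64 {
    match bytes {
        // Both engines report a null timeout pointer as -1.
        None => -1,
        // Read 8 big-endian bytes as a signed 64-bit tick count, scale to ns.
        Some(b) => i64::from_be_bytes(b).saturating_mul(100),
    }
}
```

Feeding it `00 00 00 00 FF FB 6C 20` yields 429,466,729,600, and `FF FF FF FF FF FB 6C 20` yields -30,000,000.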
### Conclusion of Step 1

Both engines read 8 BE bytes from the same conceptual `timeout_ptr` and
multiply by 100. If both read the **same bytes** from the **same address**,
they produce the same `timeout_ns`. The divergence implies one of:

1. The `timeout_ptr` address differs (upstream).
2. The bytes at the same address differ (upstream).
3. Wrong-register read in one of the engines (reading-error #25).
## Step 2 — Sample the actual guest call (reading-error #25 discipline)
Added a TEMPORARY diagnostic dump to `ke_wait_for_single_object`
(removed before landing the fix) and ran ours cold. First hit for tid=7:

```
XRS_C23 KeWait tid=7 lr=0x824cd4f4 r3=0x42453b5c r4=0x3 r5=0x1 r6=0x0
r7=0x71187eb0 r8=0x0 r9=0x0 r10=0x2
bytes_at_r7=hi=0x0 lo=0xfffb6c20
```

- r3 = `0x42453b5c`: object pointer (PKEVENT at ctx+0x20).
- r7 = `0x71187eb0`: timeout pointer (stack-allocated).
- Bytes at r7 = `0x00000000 0xFFFB6C20` (BE), i.e. the full 8 BE bytes are
  `0x00000000_FFFB6C20` = +4,294,667,296. **Matches our output.**

For canary's -300,000 (= -30,000,000 / 100), the 8 BE bytes would be
`0xFFFFFFFF_FFFB6C20`. So **the high 4 bytes are zero in ours but
all-Fs in canary**. The low 32 bits match exactly.

The guest writes the LARGE_INTEGER to its own stack, and our engine
sees `0x00000000_FFFB6C20` there while canary sees `0xFFFFFFFF_FFFB6C20`.
Different bytes at the same conceptual location point to an upstream
divergence in how the guest computes the value (hypothesis 2 above).
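
The two doublewords are exactly the sign- and zero-extensions of the same low word `0xFFFB6C20`, which already hints that some 32-to-64-bit widening step, rather than the wait code, differs between the engines. A small Rust sketch (helper names are hypothetical) that reassembles the dump's `hi`/`lo` words and compares both widenings:

```rust
/// Join the `hi`/`lo` words printed by the diagnostic dump into one doubleword.
fn from_words(hi: u32, lo: u32) -> u64 {
    ((hi as u64) << 32) | lo as u64
}

/// Widen a 32-bit value by copying bit 31 into bits 32..=63.
fn sign_extend_32(lo: u32) -> u64 {
    lo as i32 as i64 as u64
}

/// Widen a 32-bit value by filling bits 32..=63 with zeros.
fn zero_extend_32(lo: u32) -> u64 {
    lo as u64
}
```

With `lo = 0xFFFB6C20`, the zero-extension equals the dumped value (`hi = 0`) and reads as +4,294,667,296, while the sign-extension is `0xFFFFFFFF_FFFB6C20`, i.e. -300,000 as int64.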
## Step 3 — Identify the encoding bug (root cause)
LR at the KeWait call is 0x824cd4f4, the return address just past the
`bl` at 0x824cd4f0. The thread entry (from `thread.create.entry_pc`) is
`0x824cd458`. Disassembling `0x824cd458 … 0x824cd4f0` (the prolog
through the call):

```
824cd470: 0x3d60fffb  lis  r11, 0xFFFB       ; high half of -300,000
824cd478: 0x3ba10050  addi r29, r1, 80       ; r29 = stack timeout slot
824cd47c: 0x616b6c20  ori  r11, r11, 0x6C20  ; r11 |= 0x6C20
824cd480: 0xf9610050  std  r11, 80(r1)       ; store r11 as 64-bit DW
...
824cd4dc: 0x7fa7eb78  mr   r7, r29           ; r7 = timeout pointer
...
824cd4f0: 0x483808dd  bl   KeWaitForSingleObject
```
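
The hand disassembly can be cross-checked mechanically. The sketch below is a hypothetical, deliberately tiny decoder that understands only the six encodings in the listing (D-form `addis`/`addi`/`ori`, DS-form `std`, X-form `or` used as `mr`, I-form `bl`); field positions follow the PowerPC big-endian bit numbering, and the `bl` target is resolved numerically rather than by symbol.

```rust
/// Sign-extend the low `bits` bits of `v` to an i64.
fn sext(v: u32, bits: u32) -> i64 {
    let shift = 64 - bits;
    (((v as u64) << shift) as i64) >> shift
}

/// Disassemble one instruction word at address `pc`. Only the forms used in
/// the listing are recognised; anything else returns `None`.
fn disasm(pc: u32, w: u32) -> Option<String> {
    let opcd = w >> 26;        // bits 0..5: primary opcode
    let rt = (w >> 21) & 0x1f; // bits 6..10: rD / rS
    let ra = (w >> 16) & 0x1f; // bits 11..15: rA
    let rb = (w >> 11) & 0x1f; // bits 16..20: rB (X-form)
    let imm = w & 0xffff;      // bits 16..31: D-form immediate
    match opcd {
        // addis with rA = 0 is the `lis` idiom.
        15 if ra == 0 => Some(format!("lis r{rt}, 0x{imm:04X}")),
        14 => Some(format!("addi r{rt}, r{ra}, {}", sext(imm, 16))),
        // ori: rS sits in the rt slot, rA is the destination.
        24 => Some(format!("ori r{ra}, r{rt}, 0x{imm:04X}")),
        // DS-form: low 2 bits are the extended opcode (0 = std);
        // the byte offset is DS || 0b00, i.e. the masked immediate.
        62 if (w & 3) == 0 => Some(format!("std r{rt}, {}(r{ra})", sext(imm & 0xfffc, 16))),
        // X-form `or rA, rS, rB` (XO = 444, Rc = 0) with rS == rB is `mr`.
        31 if ((w >> 1) & 0x3ff) == 444 && rt == rb && (w & 1) == 0 => {
            Some(format!("mr r{ra}, r{rt}"))
        }
        // I-form relative branch (AA = 0): LI || 0b00 is a signed 26-bit offset.
        18 if (w & 2) == 0 => {
            let target = (pc as i64 + sext(w & 0x03ff_fffc, 26)) as u32;
            let mnemonic = if (w & 1) == 1 { "bl" } else { "b" };
            Some(format!("{mnemonic} 0x{target:08x}"))
        }
        _ => None,
    }
}
```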
In canonical PowerPC, `lis r11, 0xFFFB` is `addis r11, 0, 0xFFFB` and
**sign-extends the shifted immediate to 64 bits**:

```
r11 = EXTS(0xFFFB) << 16 = 0xFFFFFFFF_FFFB0000
```

Canary's HIR emitter at
`xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:138-150` (`InstrEmit_addis`)
does exactly that:

```cpp
Value* si = f.LoadConstantInt64(XEEXTS16(i.D.DS) << 16);
```

The subsequent `ori r11, r11, 0x6C20` produces `0xFFFFFFFF_FFFB6C20`, and
`std r11, 80(r1)` writes all 64 bits, so canary's wire bytes are
`0xFFFFFFFF_FFFB6C20` = -300,000 as int64.

**Our `addis` at
`xenia-rs/crates/xenia-cpu/src/interpreter.rs:119-132` (before fix)**:

```rust
PpcOpcode::addis => {
    // (per the comment) truncate to 32 bits to simulate 32-bit ABI.
    let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] };
    let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16);
    ctx.gpr[instr.rd()] = result as u32 as u64; // ⬅ ZERO-extends to 64
    ctx.pc += 4;
}
```

The `result as u32 as u64` cast **drops the high 32 bits before storage**,
producing `0x00000000_FFFB0000` instead of `0xFFFFFFFF_FFFB0000`.
After `ori` the register holds `0x00000000_FFFB6C20`. After `std` (which
stores all 64 bits of the GPR) the wire bytes are `0x00000000_FFFB6C20` =
+4,294,667,296 as int64. **This is exactly the C+22 divergence value.**
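
The difference between the two `addis` behaviours can be isolated in a few pure functions. This is a hypothetical model, not the interpreter's API (there is no `ctx` or `instr` here): it replays the `lis`/`ori`/`std` sequence under both the pre-fix truncation and the sign-extending semantics.

```rust
/// Pre-fix behaviour: add in 64 bits, then keep only the low 32 bits
/// (zero-extended back to 64).
fn addis_truncating(ra_val: u64, simm16: i16) -> u64 {
    let result = ra_val.wrapping_add((simm16 as i64 as u64) << 16);
    result as u32 as u64
}

/// Fixed behaviour: (rA|0) + EXTS(SIMM || 0x0000), all 64 bits kept.
fn addis_sign_extending(ra_val: u64, simm16: i16) -> u64 {
    (ra_val as i64).wrapping_add((simm16 as i64) << 16) as u64
}

/// `ori` ORs in a zero-extended 16-bit immediate; bits 16..=63 pass through.
fn ori(rs_val: u64, uimm16: u16) -> u64 {
    rs_val | uimm16 as u64
}

/// `std` stores all 64 bits big-endian; return the 8 bytes written.
fn std_bytes(rs_val: u64) -> [u8; 8] {
    rs_val.to_be_bytes()
}
```

In this model, `ori(addis_truncating(0, 0xFFFBu16 as i16), 0x6C20)` is `0x00000000_FFFB6C20`, while the sign-extending version gives `0xFFFFFFFF_FFFB6C20`, which `i64::from_be_bytes(std_bytes(..))` reads back as -300,000. With a nonzero `rA` such as `0x00000001_00000000`, the truncating variant also discards `rA`'s high word, which is the kind of case the `addis_with_nonzero_ra_adds_in_64_bit` test guards against.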
### Encoding bug class: (d) Sign-extension. Specifically:

> `addis` performed a defensive 32-bit zero-extension truncation that
> defeats the architectural sign-extension semantics required when the
> result later flows into a 64-bit memory store (`std`).

### Why the defensive truncation existed

The C+22-era comment cites correctness of the `subfc`/`lwz` carry
chain in 32-bit ABI mode. Inspecting every GPR consumer that might
receive an `addis` result confirms that every 32-bit-meaningful
arithmetic op (`subfcx`, `addic`, `addicx`, `subficx`, etc.) already
defensively truncates BOTH operands to u32 BEFORE computing. The
sign-extended high bits produced upstream therefore never enter those
results; they only become visible via `std`/`mr`/`orx` (operations that
legitimately propagate the full 64-bit value).

Reverting the `addis` truncation does NOT regress any of the PPCBUG-002,
PPCBUG-007, etc. fixes; those operate at their consumer sites, not at the
producer.
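
The consumer-side argument can be illustrated with a hypothetical 32-bit-mode `subfc`-style helper that, like the consumers described above, truncates both operands to u32 before computing. Whether the producer sign- or zero-extended, such a consumer sees identical operands; only a full-width consumer such as `std` observes the high bits. This illustrates the argument and is not the interpreter's `subfcx`.

```rust
/// Hypothetical 32-bit-mode `subfc`-style consumer: computes rB - rA on the
/// low 32 bits only and returns the result with the carry bit, where
/// CA = true means no borrow occurred (rB >= rA as unsigned 32-bit values).
fn subfc_32(ra_val: u64, rb_val: u64) -> (u32, bool) {
    let a = ra_val as u32; // defensive truncation of both operands
    let b = rb_val as u32;
    let (diff, borrowed) = b.overflowing_sub(a);
    (diff, !borrowed)
}

/// A full-width consumer: `std` writes all 64 bits big-endian.
fn std_bytes(rs_val: u64) -> [u8; 8] {
    rs_val.to_be_bytes()
}
```

Passing `0xFFFFFFFF_FFFB0000` or `0x00000000_FFFB0000` to `subfc_32` gives the same result and carry, whereas `std_bytes` of the two values differs in its first four bytes.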
## The fix (5 LOC effective)

`xenia-rs/crates/xenia-cpu/src/interpreter.rs:119-138`:

```rust
PpcOpcode::addis => {
    // Phase C+23: sign-extend the shifted immediate to 64 bits before
    // adding to rA, matching canary's HIR emitter. Defensive 32-bit
    // truncation at each consumer site already handles the 32-bit-ABI
    // arithmetic chain correctness (see PPCBUG-002/-007/etc.).
    let ra_val = if instr.ra() == 0 { 0i64 } else { ctx.gpr[instr.ra()] as i64 };
    let shifted = (instr.simm16() as i64) << 16;
    let result = ra_val.wrapping_add(shifted);
    ctx.gpr[instr.rd()] = result as u64;
    ctx.pc += 4;
}
```
### Tests added (3 new in xenia-cpu)

- `addis_with_negative_simm_sign_extends_to_64_bits`: direct
  unit test for `lis r11, 0xFFFB` producing `0xFFFFFFFFFFFB0000`.
- `lis_ori_std_negative_timeout_writes_sign_extended_doubleword`:
  end-to-end regression that runs the actual 3-instruction sequence
  used by Sylpheed's KeWait setup and asserts wire bytes
  `0xFFFFFFFFFFFB6C20` and an int64 round-trip to -300,000.
- `addis_with_nonzero_ra_adds_in_64_bit`: ensures the rA-non-zero
  case still uses canonical 64-bit Add semantics.
## Cross-engine encoding bug class summary

Per the prompt's hint catalog:

- (a) Wrong register: ruled out. The r3-r10 dump confirms r7 holds the
  timeout pointer in ours, matching canary's 5-arg ABI signature
  (r3 through r7 carry the five arguments, so `timeout_ptr` is r7).
- (b) Wrong-direction LARGE_INTEGER dereference: ruled out. Both
  engines read 8 BE bytes via the same idiom.
- (c) Endianness: ruled out. Both BE.
- (d) Sign-extension: **CONFIRMED.** The bug is in the CPU interpreter's
  `addis` opcode, not the wait subsystem.
## Validation evidence

- ours-cold (post-fix) tid=7 idx=3 `wait.begin.timeout_ns = -30000000`,
  matching canary exactly.
- Sister chain canary tid=12 → ours tid=7 advances from matched=3 to
  matched=4.
- The new divergence at idx=4 is `return_value: canary=258 (TIMEOUT) ours=0
  (SUCCESS)` (258 = 0x102, STATUS_TIMEOUT). This is the C+22-class
  scheduler-determinism issue: our monolithic-thread runner sees no
  contention, so the 30 ms timeout doesn't fire. Out of scope for this phase.
- Main chain matched-prefix 104,607 preserved (no regression).
- All other sister chains at C+22 baseline.
## Files

- `investigation.md` (this file)
- `cold-vs-cold-result.md`
- `diff-cold-vs-cold.md`: full Phase A diff report
- `ours-cold.jsonl` / `ours-cold-stdout.log` / `ours-cold-stderr.log`
- `canary-cold-trunc.jsonl` / `canary-cold-stdout.log`
- `canary-binary-cache-pre-wipe.tar.gz` / `canary-xdg-cache-pre-wipe.tar.gz`
- `re-validation.md`
- `digest-cold-stable-1.json` / `-2.json` / `-3.json`
- `fix.diff`