handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,251 @@
# Iterate 2.K — Longer-budget cache-wipe replay (writer report)
**Date:** 2026-05-28. **LOC delta:** engine **0**, canary **0**. Pure
measurement.
**Tests:** N/A (no source modifications).
## Headline
**INSTALL-CHAIN-ABSENT-NEW-BLOCKER.** 500M-instruction budget run
(10× 2.J's 50M) reaches the budget cap cleanly at wallclock=13.96s
**but emits ZERO new Phase-A events past 2.J's terminus.** Event count
121,569 bit-identical to 2.J. tid=1 max guest_cycle 9,169,116 bit-identical
to 2.J. The keystone `sub_824F8398` install chain still **0 fires**;
`sub_825070F0` worker fan-out still **0 fires**. Final-state dump
reveals **9 of 13 threads parked in `Blocked(WaitAny ...)` waits (8 with
`deadline: None`), 5 of them at PC `0x824ac578`**, the exact AUDIT-049
wedge PC; the other 4 are `Ready` but never advance. The 2.J "wedge
moved / wait returns success" observation was a budget-truncated
artifact: under a longer budget, the engine re-converges
to a deadlock at the same call site. **2.J's `NtWaitForSingleObjectEx
return=0` events are the wrapper successfully returning on prior
iterations of a tight `wait → return → wait` loop; the FINAL wait of
each tid blocks forever and never emits a `kernel.return`.** Cache
parity was load-bearing but is NOT THE keystone. Next blocker is
upstream of the install chain at the wedge-loop level.
## Mode
ZERO LOC. Invocation:
```
XENIA_CACHE_WIPE=1 timeout 600 ./xenia-rs/target/release/xenia-rs exec \
-n 500000000 --quiet \
--phase-a-event-log audit-runs/iterate-2K-longer-budget-replay/ours-cold.jsonl \
"Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
```
Identical to 2.J except `-n 50000000` → `-n 500000000`. XDG cache
already absent (no `/home/fabi/.local/share/xenia-rs/cache/`) before run;
`XENIA_CACHE_WIPE=1` set for belt-and-braces.
Run completed `EXIT=0` at wallclock 13.96s. Final reason from non-quiet
diagnostic re-run: `reached max instruction count limit=500000000`
(instruction budget hit, not a panic/fault/timeout). Total instructions
executed: 500,000,004.
## Primary gate results
| gate | 2.J | 2.K | result |
|------|-----|-----|--------|
| `sub_824F8398` install-chain fires | 0 | **0** | UNCHANGED |
| `sub_825070F0` worker fan-out fires | 0 | **0** | UNCHANGED |
Grep against full ours-cold.jsonl (case-insensitive on hex literal,
plus per-tid first-kernel-call signature): zero hits for either symbol
across all kinds (thread.create, import.call, kernel.call,
kernel.return, payload fields). The canary's tids 15/27/28 (the
`sub_825070F0` family workers) and tid 14 (audio worker
`sub_824D2878`-driven) are **structurally absent from ours's thread
fan-out at this trajectory point**, even given 10× the instruction
budget.
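The zero-hit check above can be sketched as a case-insensitive substring scan over each serialized event, tallied per event kind. This is a minimal sketch, not the tooling actually used: the `kind` field name and the sample records are assumptions about the JSONL schema.

```python
import json

def count_symbol_hits(lines, needles):
    """Tally, per needle and per event kind, the events that mention the needle."""
    lowered = [n.lower() for n in needles]
    hits = {n: {} for n in lowered}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        haystack = raw.lower()  # scans every field, payloads included
        kind = json.loads(raw).get("kind", "?")  # "kind" is an assumed field name
        for n in lowered:
            if n in haystack:
                hits[n][kind] = hits[n].get(kind, 0) + 1
    return hits

# Hypothetical sample records, for illustration only.
sample = [
    '{"kind": "thread.create", "tid": 1, "entry_pc": "0x82181830"}',
    '{"kind": "kernel.call", "tid": 1, "name": "NtWaitForSingleObjectEx", "lr": "0x824AC578"}',
]
result = count_symbol_hits(sample, ["824F8398", "825070F0", "824ac578"])
print(result)
```

An empty inner dict for a needle corresponds to the "zero hits across all kinds" result reported for `sub_824F8398` and `sub_825070F0`.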
## Secondary cascade gate results
### Thread set
**10 thread.create entries, bit-identical to 2.J** (same entry_pcs,
same ctx_ptrs). Per tripstone #28 (don't key on integer tid):
| entry_pc | ctx_ptr | canary analog |
|----------|---------|---------------|
| 0x82181830 | 0x828f3d08 | main bootstrap |
| 0x8245a5d0 | 0x828f4838 | early helper |
| 0x82450a28 | 0x828f3b68 | producer (AUDIT-069) |
| 0x82457ef0 | 0x828f3b08 | dispatcher tid=5 |
| 0x824cd458 | 0xbe8cbb3c | per-AUDIT-068 sister |
| 0x822f1ee0 | 0xbd184a40 | helper |
| 0x824d2878 | 0x00000000 | audio worker (no kernel calls) |
| 0x824d2940 | 0x00000000 | audio companion (no kernel calls) |
| 0x82178950 | 0x828f3ec0 | input/lifecycle |
| 0x821748f0 | 0xbc6c5640 | early helper |
NB: `sub_824D2878` IS in the spawn set but its tid emits no kernel
calls in the entire 500M-instruction run (matches 2.J). Workers
`sub_825070F0` × 4 + secondary-burst tids never spawn.
### VdSwap / draws (gameplay progression — tripstone #39)
- **VdSwap = 1** (same single swap at cycle=5,577,303 / host_ns=493.5ms
as 2.J). Bit-identical timestamp.
- **Draws = 0** (no `*Draw*` kernel name emitted).
- **Gameplay progression NOT achieved.** Honest "no" per #39.
### Total event count
- **121,569 events** (bit-identical to 2.J).
- File size 28,724,871 bytes vs 2.J's 28,667,xxx (approximate); content is
identical up to floating host_ns jitter and structurally equal.
- Implication: between 50M and 500M instructions the engine emitted
**0 new kernel calls, 0 new wait.begin, 0 new handle events**. The
host clock advanced (~3× wallclock, ~5s to 13.96s) but the
guest committed no observable progress.
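"Structurally equal up to host_ns jitter" can be checked mechanically by dropping the host-clock field before comparing events. A minimal sketch, assuming each JSONL line is one JSON object and that the volatile field is literally named `host_ns`; the sample records are hypothetical.

```python
import json

VOLATILE_FIELDS = {"host_ns"}  # assumed name of the host-clock field

def structural_view(lines):
    """Parse events and strip fields expected to differ between runs."""
    view = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        view.append({k: v for k, v in event.items() if k not in VOLATILE_FIELDS})
    return view

def first_divergence(a_lines, b_lines):
    """Return the index of the first structurally different event, or None."""
    a, b = structural_view(a_lines), structural_view(b_lines)
    for i, (ea, eb) in enumerate(zip(a, b)):
        if ea != eb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one log is a strict prefix of the other
    return None

# Hypothetical records: same guest-side content, different host timestamps.
run_j = ['{"kind": "kernel.call", "tid": 1, "guest_cycle": 9169116, "host_ns": 493500000}']
run_k = ['{"kind": "kernel.call", "tid": 1, "guest_cycle": 9169116, "host_ns": 493512345}']
print(first_divergence(run_j, run_k))
```

A `None` result means the two logs agree on every field except the ones listed in `VOLATILE_FIELDS`.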
### Wedge state (final-state dump, non-quiet diagnostic re-run)
At budget exhaustion, no thread was making progress (9 `Blocked`, 4 `Ready`):
| tid | PC | LR | state | handle waiting on |
|-----|----|----|-------|-------------------|
| 1 | 0x824ac578 | 0x824ac578 | Blocked(WaitAny, no deadline) | 0x12c8 = Thread(id=13) |
| 11 | 0x824d2a94 | 0x824d2a94 | Blocked(WaitAny, no deadline) | 0x828a3244 = Event(sig=false) |
| 2 | 0x824a95f8 | 0x824a95f8 | Blocked(WaitAny, no deadline) | 0x8287093c = Event(sig=false) |
| 13 | 0x824ac578 | 0x824ac578 | Blocked(WaitAny, no deadline) | 0x12d0 = Event(sig=false) |
| 7 | 0x824cd4f4 | 0x824cd4f4 | Blocked(WaitAny, deadline=3000) | 0xbe8cbb5c = Event |
| 8 | 0x824ab214 | 0x824ab214 | Blocked(WaitAny, no deadline) | 0x10d8 = Semaphore(0/2^31-1) |
| 4 | 0x824ac578 | 0x824ac578 | Blocked(WaitAny, no deadline) | 0x1028 = Semaphore(0/2^31-1) |
| 5 | 0x824ac578 | 0x824ac578 | Blocked(WaitAny, no deadline) | 0x12e4 = Event(sig=false) |
| 9 | 0x824d1404 | 0x824d22b4 | Ready | — |
| 6 | 0x824ab214 | 0x824ab214 | Ready | — |
| 10 | 0x824d1404 | 0x824d22b4 | Ready | — |
| 12 | 0x824aa6a4 | 0x824aa6a4 | Ready | — |
| 3 | 0x824ac578 | 0x824ac578 | Blocked(WaitAny, no deadline) | 0x1020 = Event(sig=false) |
**5 of 13 tids parked at PC `0x824ac578`** (the AUDIT-049 wedge),
including the canonical tid=1 → Thread(id=13) → Event circular wait.
4 tids in `Ready` state but never re-scheduled to advance.
tid=1's last `kernel.return` in Phase-A log shows
`NtWaitForSingleObjectEx return_value=0 status=0x00000000` at
cycle=9,169,116, but this is from an **earlier** iteration of the
wait loop, NOT the wait it is currently blocked on. The final wait
(handle 0x12c8 = tid=13 thread handle) NEVER returned; no
`kernel.return` event was emitted for it because the wrapper is parked
indefinitely.
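The table can be tallied mechanically. The sketch below transcribes the rows above (tid, PC, state, and the tid being waited on where the handle is a thread handle), counts blocked threads per PC, and follows thread-on-thread waits. The tuple layout and helper names are illustrative, not engine output.

```python
from collections import Counter

# (tid, pc, state, waits_on_tid) transcribed from the final-state table.
FINAL_STATE = [
    (1, 0x824AC578, "Blocked", 13),
    (11, 0x824D2A94, "Blocked", None),
    (2, 0x824A95F8, "Blocked", None),
    (13, 0x824AC578, "Blocked", None),
    (7, 0x824CD4F4, "Blocked", None),
    (8, 0x824AB214, "Blocked", None),
    (4, 0x824AC578, "Blocked", None),
    (5, 0x824AC578, "Blocked", None),
    (9, 0x824D1404, "Ready", None),
    (6, 0x824AB214, "Ready", None),
    (10, 0x824D1404, "Ready", None),
    (12, 0x824AA6A4, "Ready", None),
    (3, 0x824AC578, "Blocked", None),
]

state_counts = Counter(state for _, _, state, _ in FINAL_STATE)
blocked_by_pc = Counter(pc for _, pc, state, _ in FINAL_STATE if state == "Blocked")
WAITS_ON_TID = {tid: target for tid, _, _, target in FINAL_STATE if target is not None}

def wait_chain(tid):
    """Follow thread-handle waits from tid until a non-thread handle or a cycle."""
    chain = [tid]
    while chain[-1] in WAITS_ON_TID:
        nxt = WAITS_ON_TID[chain[-1]]
        chain.append(nxt)
        if chain.count(nxt) > 1:  # revisited a tid: genuine thread-level cycle
            break
    return chain

print(dict(state_counts))
print({hex(pc): n for pc, n in blocked_by_pc.items()})
print(wait_chain(1))
```

On this data the tally gives 9 `Blocked` and 4 `Ready` threads, 5 blocked at `0x824ac578`, and a thread-level chain of tid 1 → tid 13 that ends at the event handle `0x12d0`.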
### Reading-error #41 candidate (new this iterate)
**Phase-A "kernel.return success" events do NOT imply forward progress
when the call site is a tight wait-loop.** 2.J's report observed "tid=1
NtWait returns success, wedge moved or absent" — but the events
captured were prior loop iterations that **fed back into the SAME wait
call** which then blocks forever. The honest interpretation is "wait
wrapper made N successful round-trips, then the (N+1)th call blocked
indefinitely." Recommend registering: **return-success in Phase-A does
not prove wedge resolution; cross-check against final-state thread
diagnostic dump under the longest available budget.**
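One way to apply this cross-check from the log alone is to look for calls that never receive a matching return. The sketch below assumes that a `kernel.call` event is emitted at call entry (before any block) and that events carry `kind`, `tid`, and `name` fields; all three are assumptions about the Phase-A schema, and the sample records are hypothetical.

```python
import json

def pending_calls(lines):
    """Per tid, return the kernel.call events that have no later kernel.return."""
    open_calls = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        kind, tid = event.get("kind"), event.get("tid")
        if kind == "kernel.call":
            open_calls.setdefault(tid, []).append(event)
        elif kind == "kernel.return" and open_calls.get(tid):
            open_calls[tid].pop()  # a return closes the innermost open call
    return {tid: calls for tid, calls in open_calls.items() if calls}

# Hypothetical wait loop: one successful round-trip, then a call that never returns.
sample = [
    '{"kind": "kernel.call", "tid": 1, "name": "NtWaitForSingleObjectEx"}',
    '{"kind": "kernel.return", "tid": 1, "name": "NtWaitForSingleObjectEx", "return_value": 0}',
    '{"kind": "kernel.call", "tid": 1, "name": "NtWaitForSingleObjectEx"}',
]
pending = pending_calls(sample)
print({tid: [c["name"] for c in calls] for tid, calls in pending.items()})
```

A tid whose last event is a successful return but which still has a pending call is in exactly the N-successes-then-block pattern described above. The final-state dump remains the authoritative check.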
## Comparison: 2.H → 2.J → 2.K
| gate | 2.H (no wipe) | 2.J (wipe, 50M) | 2.K (wipe, 500M) |
|------|-----|-----|-----|
| cache probe `0xc000000f` | FAIL | PASS (9/9) | PASS (9/9) |
| total events | 118,149 | 121,569 | **121,569** |
| tid=4 events | 160 | 2,075 | **2,075** |
| thread.create count | 10 | 10 | **10** |
| tid=1 last cycle | 9,140,200 | 9,169,116 | **9,169,116** |
| VdSwap count | 1 | 1 | **1** |
| draws | 0 | 0 | **0** |
| `sub_824F8398` fires | 0 | 0 | **0** |
| `sub_825070F0` fires | 0 | 0 | **0** |
| wedge PC `0x824ac578` parked | yes | "moved" (budget short) | **5 tids parked there** |
| termination | 50M budget | 50M budget | 500M budget cleanly |
| wallclock to terminate | ~5s | ~5s | **13.96s** |
**Critical finding: 2.J ≡ 2.K at the Phase-A event level.** All
gates identical to 2.J. The 10× budget bought ~3× more wallclock (~5s to 13.96s) but
zero additional observable guest progress. The engine is genuinely
wedged from somewhere between cycle 9,140,200 and 9,169,116 onward.
## Tripstone audit
- **#28** (cross-engine tid stability): All ours-internal claims keyed
on entry_pc, not integer tid. 2.J ↔ 2.K both ours-side so integer tid
stable; entry_pc/ctx_ptr columns bit-stable.
- **#39** (gameplay progression IS progression): Headline does NOT
claim progression. VdSwap=1, draws=0 — same as 2.J. PASS claim is on
*characterization* of the wedge (now visible at the same PC as
AUDIT-049), not on cascade.
- **#40** (single-keystone framing): The 2.J framing "cache parity is
the keystone, longer budget will reveal the install chain" is
**FALSIFIED** by 2.K. Neither cache parity nor longer budget
unblocks `sub_824F8398`. The reading-error #40 class recurs
(this iterate's expectation that 10× budget unblocks the chain).
Recommend registering reading-error **#41**: Phase-A
`kernel.return success` events do not prove wedge resolution when
the call site is a tight wait-loop with N successful spins before
the (N+1)th terminal block.
## Confidence
- **HIGH** that 2.K reached 500M instructions cleanly (`exec complete
wall_ms=13959 instructions=500000004` in diagnostic re-run).
- **HIGH** that Phase-A event log is bit-identical to 2.J at the
structural level (count, last tid_event_idx, last guest_cycle).
- **HIGH** that 5 tids parked at `0x824ac578` at budget exhaustion
(final-state dump direct evidence).
- **HIGH** that `sub_824F8398` and `sub_825070F0` are 0 fires (grepped
across all event kinds + payload fields).
- **HIGH** that wallclock-vs-events ratio diverges 3:1 between 2.J and
2.K: the engine is consuming host time without making
guest-observable progress, i.e. spinning in the JIT loop on
re-execution of already-blocked waits or busy-loops.
## Next iterate recommendation
**Iterate 2.L should be ONE of:**
1. **Walk the wedge backward from `0x824ac578` to find the missing
signaler** (~0-50 LOC instrumentation). Each parked tid is waiting
on a specific event/semaphore handle. Identify per-tid: (a) who in
canary signals that handle and when; (b) whether the signaler tid
exists in ours; (c) if it exists, why doesn't it reach the signal
site. The wedge handles in this run are:
- tid=1 → 0x12c8 = Thread(id=13) — waiting for tid=13 to exit
- tid=13 → 0x12d0 = Event — needs an external signaler
- tid=3,4,5 → various Event/Semaphore handles
- tid=8 → 0x10d8 = Semaphore (the AUDIT-069 work-semaphore class)
This is essentially AUDIT-069 territory: producer-underrun at the
work-semaphore. ~0 LOC if reusing existing `--lr-trace` /
`--branch-probe` infra.
2. **Push budget further (-n 5000000000, 10× 2.K's budget) to see if anything
eventually fires** (~0 LOC, ~2.5 min wallclock estimate, decisive
negative). LOW PRIORITY — based on 2.K's flat-zero events 50M-500M,
strongly predict 0 events 500M-5000M.
3. **2.D-style diff re-measure** of (op, lr) missing-tuple count from
the IAT producer LR side (~0-30 LOC). 2.J said "expected unchanged
at 28/28". 2.K confirms structurally identical to 2.J, so
missing-tuple count is also expected unchanged. Re-measure to
CONFIRM (and to refresh the producer-rate at LR 0x824AB168
which was 9.97% in 2.D). Useful as cascade-sanity even if
negative.
**Recommended priority: (1)** — direct per-handle waiter→signaler
walk on the 5 parked tids at `0x824ac578`. Will identify the most
upstream missing signaler and likely lead to either AUDIT-069's
producer-underrun root or a new state-parity divergence upstream of
the install epoch. ~0-50 LOC, ~30-60 min.
**DO NOT pursue (2)** without first attempting (1) — the structural
evidence (event count flat, max-cycle flat, final-state genuine
wedge) makes "longer budget" a high-confidence negative.
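The wallclock estimate for option (2) can be sanity-checked by extrapolating from 2.K's measured throughput. This is a rough sketch that assumes instruction throughput stays constant at the larger budget, which the report does not establish.

```python
# Figures from the 2.K diagnostic re-run.
instructions_2k = 500_000_004
wall_seconds_2k = 13.959

# Proposed budget for option (2).
budget = 5_000_000_000

rate = instructions_2k / wall_seconds_2k   # instructions per second
estimate_seconds = budget / rate           # assumes linear scaling
estimate_minutes = estimate_seconds / 60

print(f"{rate / 1e6:.1f} M instr/s, about {estimate_seconds:.0f} s ({estimate_minutes:.1f} min)")
```

Under that assumption the run would take roughly 140 s, in line with the ~2.5 min estimate quoted for option (2).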
## Artifacts
Under `xenia-rs/audit-runs/iterate-2K-longer-budget-replay/`:
- `ours-cold.jsonl` (121,569 events, 500M-instr quiet run, ~28MB)
- `ours-cold.stdout.log` / `ours-cold.stderr.log` (empty — quiet mode)
- `exit-diag-full.log` (390 lines, non-quiet diagnostic re-run
capturing budget-hit message + final-state dump + thread diagnostics
+ metrics summary)
- `exit-diag.log` (50-line tail of first diagnostic run)
- `exit-diag-head.log` (100-line head of second diagnostic run)
- `writer-report.md` (this file)
Cache wiped via `XENIA_CACHE_WIPE=1` env (per-process tmpdir at
`/tmp/xenia-rs-cache-244570-0/`). No XDG cache pre-existed.