handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,109 @@
# Step 0 — framing verification
Read-only checks of the crowbar's expected parameters against
`xenia-rs/audit-runs/phase-nonmatch-investigation/create-thread-events.json`,
the AUDIT-068 S3/S4 memory dossier (write epoch 9.4-9.6 s, vtable
base `0x8200A1E8`), and our `ExCreateThread`
(`crates/xenia-kernel/src/exports.rs:294`).
## The 4 thread.create events (from canary-jitter-1.jsonl)
| Index | host_ns | creator tid | entry_pc | ctx_ptr | stack (bytes) | suspended | affinity | priority |
|------:|---------------:|--------------:|------------:|------------:|-------:|-----:|----:|-----:|
| 20 | 10,382,912,900 | 6 | 0x82506528 | 0xBCE251C0 | 65536 | true | 0 | 0 |
| 21 | 10,383,282,200 | 6 | 0x82506558 | 0xBCE251C0 | 65536 | true | 0 | 0 |
| 22 | 10,383,647,200 | 6 | 0x82506588 | 0xBCE251C0 | 65536 | true | 0 | 0 |
| 23 | 10,384,161,700 | 6 | 0x825065B8 | 0xBCE251C0 | 65536 | true | 0 | 0 |
All 4 share `ctx_ptr=0xBCE251C0` and were created on canary tid=6 (main), spaced 365 to 515 µs apart (gaps of 369.3, 365.0 and 514.5 µs; mean ≈ 416 µs). `affinity=0` leaves core selection to the scheduler; `priority=0` is the default.
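A quick check of the spacing and stride claims against the four table rows can be written as a small Rust sketch. `ThreadCreate`, `EVENTS` and the helper functions are hypothetical names for illustration, not part of xenia-rs; the numbers are transcribed from the table.

```rust
/// One row of the canary `thread.create` table, trimmed to the fields
/// the gap/stride check needs (hypothetical type, not from xenia-rs).
#[derive(Clone, Copy, Debug)]
pub struct ThreadCreate {
    pub host_ns: u64,
    pub entry_pc: u32,
    pub ctx_ptr: u32,
}

/// Events 20..=23, transcribed from the table.
pub const EVENTS: [ThreadCreate; 4] = [
    ThreadCreate { host_ns: 10_382_912_900, entry_pc: 0x8250_6528, ctx_ptr: 0xBCE2_51C0 },
    ThreadCreate { host_ns: 10_383_282_200, entry_pc: 0x8250_6558, ctx_ptr: 0xBCE2_51C0 },
    ThreadCreate { host_ns: 10_383_647_200, entry_pc: 0x8250_6588, ctx_ptr: 0xBCE2_51C0 },
    ThreadCreate { host_ns: 10_384_161_700, entry_pc: 0x8250_65B8, ctx_ptr: 0xBCE2_51C0 },
];

/// Nanosecond gaps between consecutive spawns.
pub fn spawn_gaps_ns(events: &[ThreadCreate]) -> Vec<u64> {
    events.windows(2).map(|w| w[1].host_ns - w[0].host_ns).collect()
}

/// Byte distance between consecutive worker entry stubs.
pub fn entry_strides(events: &[ThreadCreate]) -> Vec<u32> {
    events.windows(2).map(|w| w[1].entry_pc - w[0].entry_pc).collect()
}

/// True when every event uses the first event's ctx_ptr.
pub fn shared_ctx(events: &[ThreadCreate]) -> bool {
    events.iter().all(|e| e.ctx_ptr == events[0].ctx_ptr)
}
```

Against the table, the gaps are uneven (the last is roughly 40% longer than the first two), while the entry stride is a constant 0x30.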
Canary's natural resume happens later, via `NtResumeThread` from worker code. That call is not captured in this jsonl excerpt and is deferred: for the crowbar we resume the threads directly after the 4-spawn burst, since the natural resume gate sits downstream of the wedge.
## The ctx layout @ ctx_ptr (per AUDIT-068 S3/S4)
At install epoch host_ns ≈ 9.416 s on canary tid=6, three u32 slots are written simultaneously by the POD copy at guest PC `sub_824FD240+0x24`:
```
[ctx_ptr + 0x00] = 0x8200A1E8 (vtable BASE — class ANON_Class_713383D7)
[ctx_ptr + 0x04] = ctx_ptr (self pointer — doubly-linked list head)
[ctx_ptr + 0x08] = ctx_ptr (self pointer — doubly-linked list head)
[ctx_ptr + 0x0C] = (refcount, observed = 1 at later epoch per S4)
```
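Step 1 of the host-side analog writes this header into a freshly allocated page. The sketch below builds the 16-byte image and assumes the guest's big-endian layout: the Xbox 360's PowerPC CPU is big-endian, so each u32 is stored most-significant byte first. `build_ctx_header` and `read_be_u32` are hypothetical helpers that operate on a host buffer. They do not touch xenia-rs guest memory or `state.heap_alloc`.

```rust
/// Vtable base written at the install epoch (AUDIT-068 S3).
pub const VTABLE_BASE: u32 = 0x8200_A1E8;

/// Build the 16-byte ctx header as it would appear in guest memory:
/// +0x00 vtable base, +0x04/+0x08 self pointers (empty list head),
/// +0x0C refcount. Guest memory is big-endian, hence to_be_bytes.
pub fn build_ctx_header(ctx_ptr: u32, refcount: u32) -> [u8; 16] {
    let words = [VTABLE_BASE, ctx_ptr, ctx_ptr, refcount];
    let mut buf = [0u8; 16];
    for (i, w) in words.iter().enumerate() {
        buf[i * 4..i * 4 + 4].copy_from_slice(&w.to_be_bytes());
    }
    buf
}

/// Read a big-endian u32 at `offset`, mirroring what a guest `lwz` sees.
pub fn read_be_u32(buf: &[u8], offset: usize) -> u32 {
    let mut word = [0u8; 4];
    word.copy_from_slice(&buf[offset..offset + 4]);
    u32::from_be_bytes(word)
}
```

The refcount is a parameter because the install POD copy writes only the first three slots; the value 1 comes from the later S4 observation.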
**Reading-error #37 discipline**: the value `0x8200A1E8` is the
vtable BASE, NOT slot-N address. `0x8200A208` cited in older
AUDIT-058/060/067 is `base + 0x20` = slot-8 address within the
vtable, mistaken for the base in those audits. The install value
is `0x8200A1E8` per AUDIT-068 S3 measurement.
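The base-versus-slot distinction is plain arithmetic on 4-byte slots, because guest pointers are 32-bit. The sketch below uses hypothetical helper names and maps a cited address back to a slot index. With it, a value like `0x8200A208` is immediately recognizable as slot 8 rather than a base.

```rust
/// Width of one vtable slot on the 32-bit PPC guest.
pub const SLOT_BYTES: u32 = 4;

/// Address of slot `n` in a vtable starting at `base`.
pub fn slot_addr(base: u32, n: u32) -> u32 {
    base + n * SLOT_BYTES
}

/// Inverse of `slot_addr`: which slot of `base` does `addr` name?
/// Returns None if `addr` lies below `base` or is not slot-aligned.
pub fn slot_index(base: u32, addr: u32) -> Option<u32> {
    let delta = addr.checked_sub(base)?;
    if delta % SLOT_BYTES == 0 {
        Some(delta / SLOT_BYTES)
    } else {
        None
    }
}
```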
## Worker entry stubs
Per `sub_825070F0.md`, each of the 4 entries (`0x82506528`, +0x30,
+0x60, +0x90) is a thin stub that does:
```
lwz r11, 0(r3) ; load vtable base from ctx
lwz r11, 140(r11) ; load fn ptr from vtable[35]
; (each entry uses a different slot: 35/36/37/38)
mtctr r11
bctr
```
So the workers dispatch through ctx's vtable. Slots 35-38 may be unpopulated, or the vtable at `0x8200A1E8` may sit in `.rdata` so that the slot reads are valid. Either way, each worker jumps to whatever guest code lives at the address its slot holds. The dossier describes the vtable as "7 entries", but the worker stubs load from offsets 140/144/148/152. At 4 bytes per slot those are slots 35-38, so the actual class has at least 39 vtable entries. This is consistent with AUDIT-058's "this is a wider parent class" framing.
The risk that the workers fault on a bad vtable load is real, but it is an honest one: the crowbar exists to test exactly this.
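The stub-to-slot mapping and the resulting lower bound on vtable size can be reproduced as follows. The sketch assumes that consecutive stubs sit 0x30 apart and read consecutive 4-byte slots, as the 140/144/148/152 displacements indicate. All names are illustrative.

```rust
/// First worker entry stub (table row 20) and the spacing between stubs.
pub const FIRST_ENTRY_PC: u32 = 0x8250_6528;
pub const STUB_STRIDE: u32 = 0x30;
/// Displacement in the first stub's `lwz r11, 140(r11)`.
pub const FIRST_VTABLE_DISP: u32 = 140;

/// (entry_pc, lwz displacement, vtable slot) for worker `i`.
/// Assumes each stub's displacement is one 4-byte slot past the previous one.
pub fn worker_stub(i: u32) -> (u32, u32, u32) {
    let disp = FIRST_VTABLE_DISP + 4 * i;
    (FIRST_ENTRY_PC + STUB_STRIDE * i, disp, disp / 4)
}

/// Smallest vtable length that keeps every stub's slot load in bounds.
pub fn min_vtable_entries(workers: u32) -> u32 {
    (0..workers).map(|i| worker_stub(i).2 + 1).max().unwrap_or(0)
}
```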
## What our `ex_create_thread` does today
`ex_create_thread` (`crates/xenia-kernel/src/exports.rs:294-405`) takes 6 PPC registers and allocates the thread image (stack + PCR + TLS) and a thread handle. It then calls `scheduler.spawn(SpawnParams { ... })`, installs the self-reference via `state.retain_handle(handle)`, and writes the handle to `r3` and the tid to `r5`. A Phase A `thread.create` event is emitted when `event_log::is_enabled()`.
The host-side analog therefore only needs:
1. Allocate ctx page via `state.heap_alloc(0x1000, mem)` → write the
4 u32s described above into it.
2. For each of 4 entries: call a host-side `ex_create_thread`-like
helper that takes (entry, ctx_ptr, stack_size, suspended, affinity,
priority) directly, skipping the PPC-reg-marshalling.
3. Resume each of the 4 spawned threads via the scheduler's
`resume_ref`.
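Steps 2 and 3 can be sketched against a mock scheduler to pin down the intended ordering: all four workers are created suspended with the canary parameters, and only then resumed. `HostSpawn`, `MockScheduler` and `crowbar_burst` are hypothetical stand-ins. The real path would go through `scheduler.spawn(SpawnParams { ... })` and `resume_ref`, whose exact signatures are not reproduced here.

```rust
/// Hypothetical bundle of the six values the host-side helper takes
/// directly instead of reading PPC registers.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct HostSpawn {
    pub entry: u32,
    pub ctx_ptr: u32,
    pub stack_size: u32,
    pub suspended: bool,
    pub affinity: u32,
    pub priority: i32,
}

/// Stand-in scheduler: it only records spawns and resumes.
#[derive(Default, Debug)]
pub struct MockScheduler {
    pub spawned: Vec<(u32, HostSpawn)>, // (tid, params)
    pub running: Vec<u32>,              // tids that are runnable
    next_tid: u32,
}

impl MockScheduler {
    /// Register a thread; it is runnable immediately only if not suspended.
    pub fn spawn(&mut self, p: HostSpawn) -> u32 {
        self.next_tid += 1;
        let tid = self.next_tid;
        self.spawned.push((tid, p));
        if !p.suspended {
            self.running.push(tid);
        }
        tid
    }

    /// Make a suspended thread runnable (idempotent).
    pub fn resume(&mut self, tid: u32) {
        if !self.running.contains(&tid) {
            self.running.push(tid);
        }
    }
}

/// Spawn the four workers suspended with the canary parameters, then
/// resume them only after the whole burst exists.
pub fn crowbar_burst(sched: &mut MockScheduler, ctx_ptr: u32) -> Vec<u32> {
    let tids: Vec<u32> = (0..4)
        .map(|i| {
            sched.spawn(HostSpawn {
                entry: 0x8250_6528 + 0x30 * i,
                ctx_ptr,
                stack_size: 65_536,
                suspended: true,
                affinity: 0,
                priority: 0,
            })
        })
        .collect();
    for &tid in &tids {
        sched.resume(tid);
    }
    tids
}
```

Keeping every spawn suspended until the loop finishes means no worker can start dispatching through the vtable before its siblings exist.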
## Trigger choice
`coord_pre_round` in `xenia-app/src/main.rs:2038` runs once per outer round and has access to both `KernelState` and `ExecStats`. Adding a one-shot check on `stats.instruction_count >= threshold` there is purely additive.
Threshold default = 20_000_000. At ~6.7M instr/sec in lockstep, that is ~3 s of wallclock. This is well past the 10-thread initial spawn burst, which peaks around the boot-init swap, yet early enough to leave the workers time to run before the 200M cap.
Configurable via env `XENIA_CROWBAR_TRIGGER_INSTR=N`.
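A one-shot trigger with the env override might look like the following sketch. `CrowbarTrigger`, `parse_trigger` and `secs_at_rate` are hypothetical names. The notes plan a `CrowbarConfig` field and `tick_crowbar` but do not specify their shape. Only the env var name and the 20_000_000 default come from the notes. Falling back to the default on an unparsable value, rather than aborting, is an assumption.

```rust
use std::env;

/// Default threshold from the notes: 20M guest instructions.
pub const DEFAULT_TRIGGER_INSTR: u64 = 20_000_000;

/// Parse a raw XENIA_CROWBAR_TRIGGER_INSTR value; fall back to the
/// default when it is missing or not a valid u64.
pub fn parse_trigger(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_TRIGGER_INSTR)
}

/// Read the threshold from the process environment.
pub fn trigger_from_env() -> u64 {
    parse_trigger(env::var("XENIA_CROWBAR_TRIGGER_INSTR").ok().as_deref())
}

/// One-shot gate: reports true the first time the instruction count
/// reaches the threshold, and false on every later call.
pub struct CrowbarTrigger {
    threshold: u64,
    fired: bool,
}

impl CrowbarTrigger {
    pub fn new(threshold: u64) -> Self {
        Self { threshold, fired: false }
    }

    /// Intended to be polled once per outer round.
    pub fn check(&mut self, instruction_count: u64) -> bool {
        if !self.fired && instruction_count >= self.threshold {
            self.fired = true;
            return true;
        }
        false
    }
}

/// Wallclock seconds needed to retire `instrs` at a given lockstep rate.
pub fn secs_at_rate(instrs: u64, instr_per_sec: f64) -> f64 {
    instrs as f64 / instr_per_sec
}
```

`secs_at_rate(20_000_000, 6.7e6)` gives about 2.99 s, matching the ~3 s estimate.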
## LOC estimate
- `xenia-kernel/src/exports.rs` host_spawn_worker_thread helper: ~50 LOC
- `xenia-kernel/src/state.rs` crowbar `CrowbarConfig` field + `tick_crowbar`: ~40 LOC
- `xenia-app/src/main.rs` cvar + trigger wire-up: ~30 LOC
- Tests: ~50 LOC
Total ~170 LOC; trim by inlining the helper or sharing
`SpawnParams` boilerplate. Target ≤150 LOC.