xenia-rs/audit-runs/audit-059-handle-disambiguation/ROUND_34_PLAN.md
MechaCat02 de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00

# Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN
## Background
Rounds 23-33 drove γ-cluster #2 down to the actual gate: **`sub_821741C8`** (silph worker-dispatch loop) fires 0× in ours / 471× in canary (tid=6). It's invoked via dynamic vtable slot 9 from the `sub_821752C0` thunk. The vtable writer is in the audit-050 unreachability island, so there's no static caller chain to hook into.
The fix shape is a synth module analogous to `silph_synth.rs` (rounds 18-21):
- Synthesize a singleton-like object with the right vtable
- Spawn a guest thread at the right entry with this object as r3
- Let the dispatch chain do the rest
Landing cluster A's analog took four rounds (18-21) and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.
## Sub-round breakdown (estimated 5-8 rounds)
### 34.α — Probe canary's dispatcher singleton (1 round)
Capture canary's runtime state at `sub_821741C8` entry:
- `r3 = 0xBCA44C00` (canary tid=6's dispatcher singleton)
- Dump `r3..r3+0x80` to identify all fields
- Note vtable address at `[r3+0]`
```bash
WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
--audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
--audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
...
```
### 34.β — Probe full vtable layout (1 round)
Read the vtable bytes statically from the PE (canary's `[r3+0]` IS a static XEX VA — same trick as round 21):
- Read 32-64 slots from PE at file offset = vtable VA - 0x82000000
- Confirm slot 9 = `sub_821C7CB8` and `vtable+0x24` thunk to `sub_821741C8`
- Look at all other slots — do any reference deep guest code that needs more init?
Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it will be called from within the chain, so it needs to exist.
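The slot read described above can be sketched as follows. This is a minimal illustration, not the project's code: `IMAGE_BASE`, `slot_offset`, and `read_vtable_slots` are hypothetical names, and it assumes `image` is a flat byte buffer in which offset = VA - 0x82000000 holds (a raw on-disk image with relocated sections would need a section lookup first). Slots are read as big-endian 32-bit pointers, matching the guest's byte order.

```rust
/// Assumed image base (hypothetical constant); the plan subtracts this from the vtable VA.
const IMAGE_BASE: u32 = 0x8200_0000;

/// Byte offset of slot `index` in a vtable of 32-bit pointers (slot 9 -> 0x24).
fn slot_offset(index: usize) -> usize {
    index * 4
}

/// Read `count` big-endian u32 slots starting at `vtable_va`.
/// Returns None if the VA is below the base or the range runs past the buffer.
fn read_vtable_slots(image: &[u8], vtable_va: u32, count: usize) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(IMAGE_BASE)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = image.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

With the slots in hand, the slot 9 check reduces to comparing `slots[9]` against the expected function VA, and `slot_offset(9)` gives the `vtable+0x24` offset named in the list.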
### 34.γ — Skeleton synth + thread spawn (1 round)
Create `crates/xenia-kernel/src/silph_ui_synth.rs` mirroring `silph_synth.rs` structure:
```rust
pub fn spawn_silph_ui_dispatcher(
    state: &mut KernelState,
    mem: &GuestMemory,
    scheduler: &mut Scheduler,
) -> Result<u32, &'static str> {
    // Idempotent: return the existing context if the synth already ran.
    if state.silph_ui_synth_done {
        return Ok(state.silph_ui_synth_ctx);
    }
    // Allocate ~0x100-0x200 bytes for the dispatcher singleton
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);
    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA); // discovered in 34.β
    // Other init fields from 34.α dump
    // ...
    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams {
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        // ... (remaining fields elided)
    })?;
    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
}
```
Hook point: first reach of `sub_821CB030` in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).
Add 3-mode env gate: `XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}`.
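One way the three-mode gate could be parsed is sketched below. `SynthMode`, `parse_synth_mode`, and `synth_mode_from_env` are hypothetical names, and two behaviours are assumptions rather than anything the plan specifies: unrecognized values fall back to off, and `suspend` is taken to mean "synthesize but spawn the thread suspended".

```rust
/// Hypothetical three-state gate for XENIA_SILPH_UI_SYNTH.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SynthMode {
    /// Variable unset (or unrecognized, by assumption): default behavior unchanged.
    Off,
    /// `=suspend`: assumed to mean the synth runs but the thread starts suspended.
    Suspend,
    /// `=1`: synth runs and the dispatcher thread starts immediately.
    On,
}

/// Pure parser, kept separate from the environment read so it is easy to test.
fn parse_synth_mode(raw: Option<&str>) -> SynthMode {
    match raw {
        Some("suspend") => SynthMode::Suspend,
        Some("1") => SynthMode::On,
        _ => SynthMode::Off,
    }
}

/// Read the gate from the process environment.
fn synth_mode_from_env() -> SynthMode {
    parse_synth_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```

Under these assumptions the hook site could pass `create_suspended: mode == SynthMode::Suspend` and skip the synth entirely when the mode is `Off`.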
### 34.δ — Run + diagnose first crash (1 round)
The first run will almost certainly crash on a NULL dereference of one of the singleton's fields. Use round 19's pattern:
- Probe at thread entry + early BB heads
- Identify the offset that's accessed
- Compare to canary's value at that offset
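The comparison step can be sketched as a word-level diff of two dumps of the singleton. `diff_fields` is a hypothetical helper, and it assumes both dumps are raw guest bytes of equal length (for example the 0x80-byte window from 34.α) with fields on 4-byte boundaries.

```rust
/// Compare two dumps of the same object as big-endian 32-bit words.
/// Returns (byte offset, our value, canary's value) for every word that differs.
/// Trailing bytes that do not fill a whole word are ignored.
fn diff_fields(ours: &[u8], canary: &[u8]) -> Vec<(usize, u32, u32)> {
    ours.chunks_exact(4)
        .zip(canary.chunks_exact(4))
        .enumerate()
        .filter_map(|(i, (a, b))| {
            let a = u32::from_be_bytes([a[0], a[1], a[2], a[3]]);
            let b = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
            // Keep only mismatching words, tagged with their byte offset.
            (a != b).then_some((i * 4, a, b))
        })
        .collect()
}
```

An offset that is zero on our side and non-zero in canary is a natural candidate for the field behind the NULL dereference.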
### 34.ε..η — Iterate on field fills (2-4 rounds)
Each crash identifies one more required field. Fill it. Re-run. Continue until workers idle (verdict D analog).
### 34.θ — Producer-side seeding (1 round)
Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are `sub_821CBEA8 / sub_821D24A0 / sub_821CD458` — they may need their own bootstrap. Probe what triggers them in canary.
## Verification at each stage
After every commit:
- `cargo test --release --workspace` — 765/765 must pass
- `XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac`
- Check:
  - No crash
  - `sub_821741C8` fires
  - `sub_82450b68` r4=3 fires increase
  - Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
  - Eventually: `VdSwap > 1, draws > 0`
## Risk register
- **High**: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
- **High**: foreign-arena pointers in canary's heap (similar to round 19's `[+0x28/+0x2C/+0x30]`) may need their own synthesis
- **Medium**: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
- **Low**: workspace tests breaking (probe infrastructure is solid)
- **Low**: existing iterate-2BE work regressing (it's on a separate branch)
## Off-ramps
If we hit a wall at any sub-round, the off-ramps are:
1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers
## Branch plan
New branch: `iterate-2BF/silph-ui-synth` off `iterate-2BF/synthetic-silph-spawn` HEAD `40f208e`. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.
## When ready to execute
Dispatch with the prompt recommended by the round-33 agent, starting at sub-round 34.α.