xenia-rs/audit-runs/audit-059-handle-disambiguation/ROUND_34_PLAN.md
MechaCat02 de21c7a544 [iterate-2G] db16cyc spin-hint cooperative yield: unblock title-screen 0x10a0 gate
The silph title state machine (tid13) blocked on event 0x10a0, never signaled.
Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0,
our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/
barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the
db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our
round-robin lockstep the spinner consumed its whole block every round and
starved the co-located tid14 (only 9 progress hits over 200M instr) — so the
producer never reached the event-create/duplicate/signal dance the canary
oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated
handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's
InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new
StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on
the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they
reset and the spinner reclaims the slot — fair alternation, no priority
inversion, pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall
into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter
re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2.
draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden
re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m
golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 10:38:17 +02:00


Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN

Background

Rounds 23-33 narrowed γ-cluster #2 down to the actual gate: sub_821741C8 (the silph worker-dispatch loop) fires 0× in ours versus 471× in canary (tid=6). It is invoked through dynamic vtable slot 9 from the sub_821752C0 thunk. The vtable writer sits in the audit-050 unreachability island, so there is no static caller chain to hook into.

The fix shape is a synth module analogous to silph_synth.rs (rounds 18-21):

  • Synthesize a singleton-like object with the right vtable
  • Spawn a guest thread at the right entry with this object as r3
  • Let the dispatch chain do the rest

Landing cluster A's analog took four rounds (18-21) and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.

Sub-round breakdown (estimated 7-9 rounds, summing the per-stage estimates below)

34.α — Probe canary's dispatcher singleton (1 round)

Capture canary's runtime state at sub_821741C8 entry:

  • r3 = 0xBCA44C00 (canary tid=6's dispatcher singleton)
  • Dump r3..r3+0x80 to identify all fields
  • Note vtable address at [r3+0]

WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
  --audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
  --audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
  ...
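
Once the 128-byte dump is captured, it helps to view it as a table of 32-bit fields. The following is a minimal sketch, assuming the dump is available as a raw byte slice and that guest words are big-endian (as on the Xbox 360 PowerPC). The names `Field`, `split_fields`, and `looks_like_image_va` are hypothetical helpers, not existing xenia-rs APIs, and the upper bound in the address heuristic is an illustrative assumption.

```rust
/// One 32-bit field of a dumped guest object.
#[derive(Debug, PartialEq, Eq)]
pub struct Field {
    /// Byte offset from r3.
    pub offset: u32,
    /// Big-endian word found at that offset.
    pub value: u32,
}

/// Split a raw dump (e.g. the 0x80 bytes at r3) into big-endian 32-bit fields.
/// Trailing bytes that do not fill a whole word are ignored.
pub fn split_fields(dump: &[u8]) -> Vec<Field> {
    dump.chunks_exact(4)
        .enumerate()
        .map(|(i, w)| Field {
            offset: (i * 4) as u32,
            value: u32::from_be_bytes([w[0], w[1], w[2], w[3]]),
        })
        .collect()
}

/// Hypothetical heuristic: a word "looks like" a static XEX address if it lies
/// at or above the 0x82000000 base used in this plan. The upper bound is an
/// assumption chosen for illustration only.
pub fn looks_like_image_va(value: u32) -> bool {
    (0x8200_0000..0x9000_0000).contains(&value)
}
```

Fields that pass the heuristic are candidates for static pointers such as the vtable at [r3+0]. Non-zero fields that fail it may be heap or foreign-arena pointers and are the ones most likely to need their own synthesis.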

34.β — Probe full vtable layout (1 round)

Read the vtable bytes statically from the PE (canary's [r3+0] IS a static XEX VA — same trick as round 21):

  • Read 32-64 slots from PE at file offset = vtable VA - 0x82000000
  • Confirm slot 9 = sub_821C7CB8 and vtable+0x24 thunk to sub_821741C8
  • Look at all other slots — do any reference deep guest code that needs more init?

Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it will be called from within the chain, so it needs to exist.
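
The static read can be sketched as below. This is an illustration under stated assumptions: `image` is a flat byte buffer whose byte 0 corresponds to VA 0x82000000 (the plan's "VA - 0x82000000" rule), slots are 4-byte big-endian entries, and `read_vtable_slots` and `slot_offset` are hypothetical names. If the PE on disk uses section file offsets that differ from its memory layout, the offset computation would need a section lookup instead.

```rust
/// Image base assumed by the plan's "VA - 0x82000000" rule.
pub const IMAGE_BASE: u32 = 0x8200_0000;

/// Read `count` big-endian 32-bit vtable slots starting at `vtable_va`.
/// `image` is assumed to be laid out flat, with byte 0 at IMAGE_BASE.
/// Returns None if the address is below the base or the read runs past the image.
pub fn read_vtable_slots(image: &[u8], vtable_va: u32, count: usize) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(IMAGE_BASE)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = image.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|w| u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
            .collect(),
    )
}

/// Byte offset of slot `index` inside a vtable of 4-byte entries.
pub fn slot_offset(index: u32) -> u32 {
    index * 4
}
```

With 4-byte entries, slot 9 sits at byte offset 0x24, which matches the vtable+0x24 reference above.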

34.γ — Skeleton synth + thread spawn (1 round)

Create crates/xenia-kernel/src/silph_ui_synth.rs mirroring silph_synth.rs structure:

/// Synthesize the silph UI dispatcher singleton and spawn its guest thread.
/// Idempotent: later calls return the context allocated by the first call.
pub fn spawn_silph_ui_dispatcher(
    state: &mut KernelState,
    mem: &GuestMemory,
    scheduler: &mut Scheduler,
) -> Result<u32, &'static str> {
    if state.silph_ui_synth_done {
        return Ok(state.silph_ui_synth_ctx);
    }

    // Allocate 0x200 bytes, the upper end of the ~0x100-0x200 estimate
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);

    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA); // discovered in 34.β

    // Other init fields from 34.α dump
    // ...

    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams {
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        ... // remaining fields elided in this plan
    })?;

    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
}

Hook point: first reach of sub_821CB030 in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).

Add a three-mode env gate: XENIA_SILPH_UI_SYNTH unset, =suspend, or =1.
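
A minimal sketch of how the three-mode gate could be parsed follows. The `SynthMode` enum and both function names are hypothetical. The interpretation of each value (unset disables the synth, `suspend` creates the thread suspended, `1` runs it) is an assumption based on the `create_suspended` field above, not something this plan specifies. Unrecognized values fall back to off so that default behavior stays unchanged.

```rust
/// Hypothetical modes for the XENIA_SILPH_UI_SYNTH gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SynthMode {
    /// Variable unset or unrecognized: synth disabled (default behavior).
    Off,
    /// "suspend": assumed to mean the thread is created but left suspended.
    Suspend,
    /// "1": assumed to mean the thread is created and allowed to run.
    Run,
}

/// Map the raw variable value to a mode. Kept separate from the environment
/// lookup so it can be exercised without touching process state.
pub fn parse_synth_mode(raw: Option<&str>) -> SynthMode {
    match raw.map(str::trim) {
        Some("1") => SynthMode::Run,
        Some("suspend") => SynthMode::Suspend,
        _ => SynthMode::Off,
    }
}

/// Read the gate from the process environment.
pub fn synth_mode_from_env() -> SynthMode {
    parse_synth_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```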

34.δ — Run + diagnose first crash (1 round)

The first run will almost certainly crash on a NULL dereference of one of the singleton's fields. Use round 19's pattern:

  • Probe at thread entry + early BB heads
  • Identify the offset that's accessed
  • Compare to canary's value at that offset

34.ε..η — Iterate on field fills (2-4 rounds)

Each crash identifies one more required field. Fill it, re-run, and continue until the workers idle (verdict D analog).
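
The compare step in this loop can be made mechanical by diffing our synthesized singleton against canary's dump word by word. The sketch below assumes both dumps are raw byte slices covering the same offsets and holding big-endian words. `FieldDiff` and `diff_fields` are hypothetical names.

```rust
/// A field whose value differs between our object and canary's.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldDiff {
    /// Byte offset from the start of the object.
    pub offset: u32,
    pub ours: u32,
    pub canary: u32,
}

/// Compare two dumps as big-endian 32-bit words and list the offsets that differ.
/// Only the common prefix is compared if the dumps have different lengths.
pub fn diff_fields(ours: &[u8], canary: &[u8]) -> Vec<FieldDiff> {
    ours.chunks_exact(4)
        .zip(canary.chunks_exact(4))
        .enumerate()
        .filter_map(|(i, (a, b))| {
            let ours = u32::from_be_bytes([a[0], a[1], a[2], a[3]]);
            let canary = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
            (ours != canary).then(|| FieldDiff {
                offset: (i * 4) as u32,
                ours,
                canary,
            })
        })
        .collect()
}
```

Heap pointers will differ between runs even when the field is filled correctly, so a reported difference is a prompt to inspect that offset rather than proof of a missing field.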

34.θ — Producer-side seeding (1 round)

Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are sub_821CBEA8 / sub_821D24A0 / sub_821CD458 — they may need their own bootstrap. Probe what triggers them in canary.

Verification at each stage

After every commit:

  • cargo test --release --workspace — 765/765 must pass
  • XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac
  • Check:
    • No crash
    • sub_821741C8 fires
    • sub_82450b68 r4=3 fires increase
    • Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
    • Eventually: VdSwap > 1, draws > 0

Risk register

  • High: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
  • High: foreign-arena pointers in canary's heap (similar to round 19's [+0x28/+0x2C/+0x30]) may need their own synthesis
  • Medium: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
  • Low: workspace tests breaking (probe infrastructure is solid)
  • Low: existing iterate-2BE work regressing (it's on a separate branch)

Off-ramps

If we hit a wall at any sub-round, the off-ramps are:

  1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
  2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
  3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers

Branch plan

New branch: iterate-2BF/silph-ui-synth off iterate-2BF/synthetic-silph-spawn HEAD 40f208e. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.

When ready to execute

Dispatch using the prompt recommended by the round-33 agent, starting at sub-round 34.α.