The silph title state machine (tid13) blocked on event 0x10a0, never signaled.

Root: the event's producer chain runs on the silph worker (entry 0x821C4AD0, our tid14), which was starved. tid14 shares a HW slot with a guest spinlock/barrier participant (sub_824D1328, entry 0x824D2940) that busy-spins on the db16cyc hint `or r31,r31,r31` (encoding 0x7FFFFB78) at 0x824D140C. Under our round-robin lockstep the spinner consumed its whole block every round and starved the co-located tid14 (only 9 progress hits over 200M instr), so the producer never reached the event-create/duplicate/signal dance the canary oracle performs (handle F80000E8 set by the submitter F8000044 via a duplicated handle).

Fix (canary-faithful): recognize the db16cyc spin hint exactly as canary's InstrEmit_orx does (code 0x7FFFFB78 -> DelayExecution) and surface it as a new StepResult::Yield. The scheduler's yield_current() promotes every Ready peer on the slot past STARVE_LIMIT so begin_slot_visit picks one next round, then they reset and the spinner reclaims the slot: fair alternation, no priority inversion, pure function of slot state (deterministic).

Result (lockstep, cache-persist, -n 200M): tid14 progresses past its old stall into a real wait; tid13 advances off 0x10a0 to a new event; hub/submitter re-enter their wait loops. imports 280k->592k, packets 124M->164M, swaps 1->2. draws still 0 (the splash's first draw is a further-upstream gate).

Determinism preserved (two cold n50m runs byte-identical). n50m golden re-baselined (imports 90296->339766, swaps 1->2; draws unchanged 0). n2m golden unchanged (db16cyc not reached in first 2M). Tests 670/670.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
117 lines
5.5 KiB
Markdown
# Round 34 — silph_ui_synth.rs (cluster B sibling) — DEFERRED PLAN

## Background

Rounds 23-33 drove γ-cluster #2 down to the actual gate: **`sub_821741C8`** (silph worker-dispatch loop) fires 0× in ours versus 471× in canary (tid=6). It is invoked through dynamic vtable slot 9 from the `sub_821752C0` thunk. The vtable writer sits in the audit-050 unreachability island, so there is no static caller chain to hook into.

The fix shape is a synth module analogous to `silph_synth.rs` (rounds 18-21):

- Synthesize a singleton-like object with the right vtable
- Spawn a guest thread at the right entry with this object as r3
- Let the dispatch chain do the rest

Rounds 18-21 took 4 rounds to land cluster A's analog and ended at "workers run live but idle" because of missing foreign-pointer fields. Cluster B will face similar challenges.

## Sub-round breakdown (estimated 5-8 rounds)

### 34.α — Probe canary's dispatcher singleton (1 round)

Capture canary's runtime state at `sub_821741C8` entry:

- `r3 = 0xBCA44C00` (canary tid=6's dispatcher singleton)
- Dump `r3..r3+0x80` to identify all fields
- Note vtable address at `[r3+0]`

```bash
WINEDEBUG=-all wine xenia_canary.exe --mute=true --audit_handle_lifecycle=true \
  --audit_jit_prolog_pc=0x821741C8 --audit_jit_prolog_r3_bytes=128 \
  --audit_jit_prolog_mem_dump=<vtable_va_from_r3+0> \
  ...
```

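Once the 128-byte dump at `r3` is captured, it helps to tabulate it as big-endian 32-bit words so that pointer-like fields stand out. The following is a minimal sketch, not part of the existing probe tooling: `dump_words`, `looks_like_xex_va`, and `describe_dump` are hypothetical helpers, and the `0x82000000..0x83000000` window used to flag static XEX addresses is an assumption based on the addresses quoted in this plan.

```rust
/// Split a raw dump into (offset, value) pairs of big-endian u32 words.
/// Guest memory is big-endian, so each word is decoded with from_be_bytes.
pub fn dump_words(bytes: &[u8]) -> Vec<(usize, u32)> {
    bytes
        .chunks_exact(4)
        .enumerate()
        .map(|(i, w)| (i * 4, u32::from_be_bytes([w[0], w[1], w[2], w[3]])))
        .collect()
}

/// Assumption: static XEX code/data lives in 0x82000000..0x83000000.
pub fn looks_like_xex_va(value: u32) -> bool {
    (0x8200_0000..0x8300_0000).contains(&value)
}

/// Render one line per non-zero word, tagging likely static XEX addresses.
pub fn describe_dump(bytes: &[u8]) -> Vec<String> {
    dump_words(bytes)
        .into_iter()
        .filter(|&(_, v)| v != 0)
        .map(|(off, v)| {
            let tag = if looks_like_xex_va(v) { " (static XEX VA?)" } else { "" };
            format!("[+0x{off:02X}] = 0x{v:08X}{tag}")
        })
        .collect()
}
```

Zero words are skipped only to keep the listing short; whether a zero field is meaningful still has to be judged against the guest code that reads it.
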
### 34.β — Probe full vtable layout (1 round)

Read the vtable bytes statically from the PE (canary's `[r3+0]` IS a static XEX VA, the same trick as round 21):

- Read 32-64 slots from the PE at file offset = vtable VA - 0x82000000
- Confirm slot 9 = `sub_821C7CB8` and `vtable+0x24` thunk to `sub_821741C8`
- Look at all other slots: do any reference deep guest code that needs more init?

Cross-reference each slot's DB reach. If a slot is the dispatcher's own method body, it will be called from within the chain, so it needs to exist.

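The slot arithmetic above can be sketched as follows. Guest pointers are 4 bytes wide, which is why slot 9 lands at `vtable+0x24`. This is an illustrative sketch only: `read_vtable_slots` and `slot_offset` are hypothetical names, and the code assumes `image` is an already decrypted and decompressed PE image in which the plan's rule "file offset = VA - 0x82000000" holds linearly.

```rust
/// Image base used by this plan's "file offset = VA - 0x82000000" rule.
pub const IMAGE_BASE: u32 = 0x8200_0000;

/// Byte offset of a vtable slot: 4 bytes per slot, so slot 9 is at +0x24.
pub const fn slot_offset(slot: u32) -> u32 {
    slot * 4
}

/// Read `count` big-endian function pointers starting at `vtable_va`.
/// Returns None if the VA is below the image base or the read would
/// run past the end of the image.
pub fn read_vtable_slots(image: &[u8], vtable_va: u32, count: usize) -> Option<Vec<u32>> {
    let start = vtable_va.checked_sub(IMAGE_BASE)? as usize;
    let end = start.checked_add(count.checked_mul(4)?)?;
    let bytes = image.get(start..end)?;
    Some(
        bytes
            .chunks_exact(4)
            .map(|w| u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
            .collect(),
    )
}
```

Returning `Option` rather than panicking keeps a bad vtable VA from the 34.α dump from aborting the whole probe run.
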
### 34.γ — Skeleton synth + thread spawn (1 round)

Create `crates/xenia-kernel/src/silph_ui_synth.rs` mirroring `silph_synth.rs` structure:

```rust
pub fn spawn_silph_ui_dispatcher(
    state: &mut KernelState,
    mem: &GuestMemory,
    scheduler: &mut Scheduler,
) -> Result<u32, &'static str> {
    // Idempotent: a second call returns the already-synthesized singleton.
    if state.silph_ui_synth_done {
        return Ok(state.silph_ui_synth_ctx);
    }

    // Allocate ~0x100-0x200 bytes for the dispatcher singleton
    let ctx = state.heap_alloc(0x200, 16)?;
    mem.write_zeros(ctx, 0x200);

    // Install static-XEX vtable at [+0]
    mem.write_u32(ctx + 0x00, VTABLE_VA); // discovered in 34.β

    // Other init fields from 34.α dump
    // ...

    // Spawn dispatcher thread at sub_821748F0 with r3=ctx
    scheduler.spawn(SpawnParams {
        entry: 0x821748F0,
        start_context: ctx,
        create_suspended: false,
        ...
    })?;

    state.silph_ui_synth_done = true;
    state.silph_ui_synth_ctx = ctx;
    Ok(ctx)
}
```

Hook point: first reach of `sub_821CB030` in the existing silph factory chain (the call site that should normally trigger this dispatcher's creation in canary).

Add a 3-mode env gate: `XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}`.

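One way the 3-mode gate could be parsed is sketched below. The enum and function names are hypothetical, and the meaning attached to each mode (in particular that `suspend` maps to spawning the thread suspended, and that unrecognized values fall back to off) is an assumption, not something this plan specifies.

```rust
/// Three-way gate mirroring XENIA_SILPH_UI_SYNTH={unset|=suspend|=1}.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SynthMode {
    /// Variable unset: synth disabled, default behavior unchanged.
    Off,
    /// "suspend": assumed to mean "build the singleton, spawn suspended".
    Suspended,
    /// "1": build the singleton and let the thread run.
    Live,
}

/// Pure parser, kept separate from the environment lookup so it can be tested.
pub fn parse_synth_mode(raw: Option<&str>) -> SynthMode {
    match raw.map(str::trim) {
        Some("1") => SynthMode::Live,
        Some("suspend") => SynthMode::Suspended,
        // Assumption: unrecognized values are treated as Off rather than an error.
        _ => SynthMode::Off,
    }
}

/// Read the gate from the process environment.
pub fn synth_mode_from_env() -> SynthMode {
    parse_synth_mode(std::env::var("XENIA_SILPH_UI_SYNTH").ok().as_deref())
}
```

Under these assumptions the hook would call `synth_mode_from_env()` once and return early on `SynthMode::Off`, which keeps the default path identical to today's behavior.
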
### 34.δ — Run + diagnose first crash (1 round)

The first run will almost certainly crash on a NULL dereference of one of the singleton's fields. Use round 19's pattern:

- Probe at thread entry + early BB heads
- Identify the offset that's accessed
- Compare to canary's value at that offset

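The "compare to canary's value" step can be mechanized by diffing our synthesized object against the canary dump word by word. This is a sketch under assumptions: `FieldDiff` and `diff_fields` are hypothetical names, and both inputs are taken to be raw byte dumps starting at the singleton's base address.

```rust
/// One mismatching 32-bit field between our object and canary's.
#[derive(Debug, PartialEq, Eq)]
pub struct FieldDiff {
    pub offset: usize,
    pub ours: u32,
    pub canary: u32,
}

/// Compare two dumps as big-endian u32 words and return every offset
/// where the values differ. Extra bytes in the longer dump are ignored.
pub fn diff_fields(ours: &[u8], canary: &[u8]) -> Vec<FieldDiff> {
    ours.chunks_exact(4)
        .zip(canary.chunks_exact(4))
        .enumerate()
        .filter_map(|(i, (a, b))| {
            let ours = u32::from_be_bytes([a[0], a[1], a[2], a[3]]);
            let canary = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
            (ours != canary).then(|| FieldDiff { offset: i * 4, ours, canary })
        })
        .collect()
}
```

Heap pointers will differ between runs even when the field is correctly filled, so a mismatch is a lead to investigate rather than proof of a missing field.
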
### 34.ε..η — Iterate on field fills (2-4 rounds)

Each crash identifies one more required field. Fill it. Re-run. Continue until workers idle (verdict D analog).

### 34.θ — Producer-side seeding (1 round)

Even with the dispatcher running, work-items may not flow. Per round 32 it's pool 3 that's starved (271 fires in canary). The producers are `sub_821CBEA8`, `sub_821D24A0`, and `sub_821CD458`; they may need their own bootstrap. Probe what triggers them in canary.

## Verification at each stage

After every commit:

- `cargo test --release --workspace`: 765/765 must pass
- `XENIA_CACHE_PERSIST=1 XENIA_SILPH_UI_SYNTH=1 ./target/release/xenia-rs exec <ISO> -n 50000000 --trace-handles-focus=0x1218,0x1224,0x12a4,0x12ac`
- Check:
  - No crash
  - `sub_821741C8` fires
  - `sub_82450b68` r4=3 fires increase
  - Handle 0x1224 / 0x1218 transition out of NO_SIGNALS_DESPITE_WAITS
  - Eventually: `VdSwap > 1, draws > 0`

## Risk register

- **High**: dispatcher singleton may require many more fields than the analog WorkerCtx (rounds 18-21 needed 8 KEVENTs + ring + descriptors + index table; UI dispatcher likely has similar scope)
- **High**: foreign-arena pointers in canary's heap (similar to round 19's `[+0x28/+0x2C/+0x30]`) may need their own synthesis
- **Medium**: cluster B's worker may itself spawn threads which need contexts which need... cascading scope
- **Low**: workspace tests breaking (probe infrastructure is solid)
- **Low**: existing iterate-2BE work regressing (it's on a separate branch)

## Off-ramps

If we hit a wall at any sub-round, the off-ramps are:

1. Land the infrastructure as opt-in (rounds 18-21 pattern) and ship cluster A + cluster B both as opt-in env vars
2. Drop cluster B entirely and PR the iterate-2BE work to master (production-ready architectural fix)
3. Pivot to lockstep diff of inflate function (round 30 hypothesis (i)) if cluster B keeps producing crash-fix layers

## Branch plan

New branch: `iterate-2BF/silph-ui-synth` off `iterate-2BF/synthetic-silph-spawn` HEAD `40f208e`. Each sub-round = 1 commit. All commits opt-in via env var; default behavior unchanged.

## When ready to execute

Dispatch with the prompt recommended by the round-33 agent, starting at sub-round 34.α.