handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
85
audit-runs/review-a-step1-force-spawn/spec.md
Normal file
@@ -0,0 +1,85 @@
# Review A Step 1 — `--force-spawn-workers` crowbar spec

**Date**: 2026-05-27
**Status**: LANDED; PRIMARY gate FAIL (progression metric unmoved).

This run re-validates and documents the pre-existing v1/v2/v3 crowbar
implementation under the canonical "Review A Step 1" framing. The
implementation already lives in the working tree (treated as committed,
but not yet `git add`-ed). This session re-runs the gates and lands the
default-OFF determinism + PRIMARY-gate verdict on the present HEAD.

## Implementation surface (already in working tree)

- `crates/xenia-kernel/src/exports.rs`
  - `CROWBAR_WORKER_ENTRIES = [0x82506528, 0x82506558, 0x82506588, 0x825065B8]`
  - `CROWBAR_VTABLE_BASE = 0x8200_A1E8` (reading-error #37 honoured: this is the vtable BASE, not slot-N)
  - `CROWBAR_STACK_SIZE = 65_536`
  - `crowbar_spawn_one_worker()`: allocates thread image, allocates
    handle, spawns via `state.scheduler.spawn(SpawnParams { ... })` with
    `create_suspended=true, affinity=0, priority=0`, retains self-ref.
  - `crowbar_dump_vtable_region()`: read-only diag dumping 128 vtable
    u32 slots so we see slots 35-38 (offsets 140/144/148/152) used by
    the worker entry stubs.
  - `crowbar_maybe_install_vtable_from_file()`: v2 opt-in via env var
    `XENIA_CROWBAR_VTABLE_BIN`; no-op if unset (this run leaves it unset
    because ours already has the vtable populated; see results).
  - `crowbar_maybe_install_ctx_from_file()`: v3 opt-in via env var
    `XENIA_CROWBAR_CTX_BIN`; installs canary-captured 64-byte ctx
    blob (vptr / self / self / refcount / sentinels /
    secondary-obj-ptr / float).
  - `crowbar_force_spawn_workers()`: orchestrator. Allocates a 0x1000
    ctx page, installs `{vptr, self, self, refcount=1}` POD-copy
    head, optionally installs vtable + ctx blobs, spawns 4 workers,
    resumes 4 workers. Returns count resumed.

- `crates/xenia-kernel/src/state.rs`
  - `KernelState::crowbar_workers_enabled` (bool, default false)
  - `KernelState::crowbar_workers_trigger_instr` (u64, default
    `20_000_000`)
  - `KernelState::crowbar_workers_fired` (bool latch)
  - `KernelState::try_fire_crowbar_workers(&mut self, &GuestMemory,
    instruction_count)`: at-most-once helper; no-op when disabled, when
    already fired, or before threshold.

- `crates/xenia-app/src/main.rs`
  - `--force-spawn-workers` CLI flag on the `Exec` subcommand
    (line ~278) → sets `XENIA_CROWBAR_WORKERS=1` for downstream wire-up
    (line ~455).
  - Env-var wire-up in `cmd_exec_inner` (~line 1212): reads
    `XENIA_CROWBAR_WORKERS=1` and `XENIA_CROWBAR_TRIGGER_INSTR=N`.
  - **The trigger call** is at `coord_pre_round` (~line 2479) inside
    the per-round prologue, gated on
    `kernel.crowbar_workers_enabled && !kernel.crowbar_workers_fired`.
  - `check` subcommand has NO `--force-spawn-workers` flag; activation
    via env var works for both `exec` and `check`.

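The slot arithmetic and the ctx head described above can be sketched as follows. This is an illustrative sketch only, not the code in `exports.rs`: `slot_offset`, `slot_addr`, `ctx_head` and the `ctx_base` value are hypothetical names, and it assumes 4-byte head fields written big-endian with `vptr` pointing at `CROWBAR_VTABLE_BASE`.

```rust
/// Size of one vtable slot in bytes (the spec describes u32 slots).
const SLOT_BYTES: u32 = 4;
/// Vtable base address, copied from the spec.
const CROWBAR_VTABLE_BASE: u32 = 0x8200_A1E8;

/// Byte offset of slot `n` relative to the vtable base.
fn slot_offset(n: u32) -> u32 {
    n * SLOT_BYTES
}

/// Guest address of slot `n`.
fn slot_addr(n: u32) -> u32 {
    CROWBAR_VTABLE_BASE + slot_offset(n)
}

/// Builds a 16-byte head `{vptr, self, self, refcount=1}`.
/// ASSUMPTION: four u32 fields, big-endian, vptr == vtable base.
/// `ctx_base` is a hypothetical guest address of the ctx page.
fn ctx_head(ctx_base: u32) -> [u8; 16] {
    let fields = [CROWBAR_VTABLE_BASE, ctx_base, ctx_base, 1u32];
    let mut out = [0u8; 16];
    for (i, field) in fields.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&field.to_be_bytes());
    }
    out
}

fn main() {
    // Slots 35-38 are the ones the worker entry stubs use.
    for n in 35..=38 {
        println!("slot {n}: offset {} addr {:#010X}", slot_offset(n), slot_addr(n));
    }
    // 0x4000_0000 is a placeholder address, not a value from any run.
    println!("{:02X?}", ctx_head(0x4000_0000));
}
```

Under these assumptions, slots 35-38 map to byte offsets 140/144/148/152, which matches the offsets quoted for the vtable dump.
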
## Crowbar firing-moment choice

**Option β = fixed threshold of 20M instructions.**

20M ≈ 3 s wallclock at lockstep cadence, well past:
- the 10-thread initial spawn burst that peaks around boot-init
  VdSwap, and
- the AUDIT-049 wedge crystallisation at host_ns ≈ 1.728 s
  (~12-15M instr).

The trigger fires once and latches `crowbar_workers_fired = true` so
the helper is at-most-once per process lifetime.

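The at-most-once latch can be illustrated with a minimal sketch. `CrowbarLatch` is a hypothetical stand-in for the three `KernelState` fields, not the real type, and the inclusive `>=` comparison at the threshold is an assumption; the spec only says the helper is a no-op "before threshold".

```rust
/// Hypothetical stand-in for the three crowbar fields on `KernelState`.
struct CrowbarLatch {
    enabled: bool,
    trigger_instr: u64,
    fired: bool,
}

impl CrowbarLatch {
    fn new(enabled: bool) -> Self {
        // 20_000_000 mirrors the documented default threshold.
        Self { enabled, trigger_instr: 20_000_000, fired: false }
    }

    /// Returns true only on the single call that first reaches the threshold.
    /// No-op when disabled, when already fired, or before the threshold.
    fn try_fire(&mut self, instruction_count: u64) -> bool {
        if !self.enabled || self.fired || instruction_count < self.trigger_instr {
            return false;
        }
        self.fired = true; // latch: never fires again in this process
        true
    }
}

fn main() {
    let mut latch = CrowbarLatch::new(true);
    assert!(!latch.try_fire(19_999_999)); // before threshold
    assert!(latch.try_fire(20_000_000)); // fires once
    assert!(!latch.try_fire(20_000_001)); // latched

    let mut off = CrowbarLatch::new(false);
    assert!(!off.try_fire(50_000_000)); // disabled: never fires
}
```

Keeping the latch as plain state checked once per round makes the disabled path a single boolean test, which is consistent with the default-off goal of the spec.
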
## Default-off invariant

- `crowbar_workers_enabled` defaults to `false` in `KernelState::with_gpu()`.
- The trigger condition `kernel.crowbar_workers_enabled && !kernel.crowbar_workers_fired`
  short-circuits the helper when disabled.
- Env-var read returns `false` when `XENIA_CROWBAR_WORKERS` is unset.
- Therefore: zero behaviour change in normal runs. 3× OFF cold runs
  are bit-identical (see `re-validation.md`).

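A sketch of how the env-var read could preserve the default-off invariant is shown below. The helper names `parse_enabled` and `parse_trigger_instr` are hypothetical, and the handling of malformed values (falling back to the default threshold) is an assumption; the spec only states that an unset `XENIA_CROWBAR_WORKERS` reads as `false`.

```rust
use std::env;

/// Documented default for `crowbar_workers_trigger_instr`.
const DEFAULT_TRIGGER_INSTR: u64 = 20_000_000;

/// Enabled only when the variable is exactly "1"; unset means false.
fn parse_enabled(raw: Option<&str>) -> bool {
    raw == Some("1")
}

/// Parses `XENIA_CROWBAR_TRIGGER_INSTR=N`.
/// ASSUMPTION: unset or unparsable values fall back to the default.
fn parse_trigger_instr(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_TRIGGER_INSTR)
}

fn main() {
    let enabled = parse_enabled(env::var("XENIA_CROWBAR_WORKERS").ok().as_deref());
    let trigger =
        parse_trigger_instr(env::var("XENIA_CROWBAR_TRIGGER_INSTR").ok().as_deref());
    println!("enabled={enabled} trigger_instr={trigger}");
}
```

Splitting the parsing from the `env::var` lookup keeps the default-off path testable without mutating process environment.
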
## Determinism under crowbar-on

3× ON cold runs are bit-identical (`instructions=20000159`, identical
fault PC/LR/CTR/r3/r4/r29/r30/r31, identical tid=16). The crowbar
fires deterministically at the threshold instruction count, the 4
spawned tids are bit-stable across runs, and the fault site is
bit-stable.

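One way to express the cross-run comparison is sketched below. `RunSummary` and `all_identical` are hypothetical names, and comparing a per-run summary struct is an assumption about how the check might be automated; the register values are zero placeholders, not data from the actual runs.

```rust
/// Hypothetical per-run summary of the fields the spec compares.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RunSummary {
    instructions: u64,
    fault_tid: u32,
    /// PC, LR, CTR, r3, r4, r29, r30, r31 at the fault site.
    fault_regs: [u64; 8],
}

/// True when every run matches its neighbour, hence all runs match.
fn all_identical(runs: &[RunSummary]) -> bool {
    runs.windows(2).all(|pair| pair[0] == pair[1])
}

fn main() {
    // Placeholder register values; only `instructions` and `fault_tid`
    // are taken from the spec text.
    let run = RunSummary { instructions: 20_000_159, fault_tid: 16, fault_regs: [0; 8] };
    let runs = vec![run.clone(), run.clone(), run.clone()];
    assert!(all_identical(&runs));

    // A single differing field is enough to break determinism.
    let mut drifted = run;
    drifted.instructions += 1;
    assert!(!all_identical(&[runs[0].clone(), drifted]));
}
```
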