docs: handoff report for continuing on another machine

Snapshot of repo layout, branch map, gitignore policy for the heavy
local artifacts, and the iterate-2.BC investigation state (next steps
2.BD handle disambiguation -> 2.BE host-driven ISR delivery).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author: MechaCat02, 2026-06-05 07:21:32 +02:00
Commit: a4926c73f4 (parent ef93a4fa14)

New file: HANDOFF.md (119 lines)

# Handoff — Project Sylpheed RE / xenia-rs
_Generated 2026-06-05 to continue work on another machine._
## TL;DR
Reverse-engineering **Project Sylpheed — Arc of Deception** boot under a Rust
Xbox 360 emulator (`xenia-rs`), using **Xenia Canary** as the reference
("canary") oracle. The game boots but **wedges after ~26 frames**: a render /
VSync-event producer stops firing post-boot, so guest threads block forever.
Investigation is at **iterate 2.BC**; next step is **2.BD (handle
disambiguation)**, then **2.BE (architecture fix)**.
## Repo / machine layout
Workspace root: `/home/fabi/RE - Project Sylpheed/` (NOT a git repo itself).
| Dir | Git remote | Purpose |
|-----|-----------|---------|
| `xenia-rs/` | `git.mc02.dev/fabi/xenia-rs.git` | **Main project** — the Rust emulator + all RE work |
| `/home/fabi/Xenia-Canary/` | `git.mc02.dev/fabi/Xenia-Canary.git` | Reference Canary build (branch `xenia-rs`) |
| `xenia-canary/` | `github.com/xenia-canary/xenia-canary` | Upstream canary checkout (incl. `third_party/snappy` submodule) |
| `xenia-canary/third_party/snappy` | `git.mc02.dev/fabi/Snappy.git` (fork) | snappy + cross-build patch |
| `xenia/` | upstream xenia | Reference only |
| `sylpheed-reborn/` | — | **DEAD — ignore** |

The big game asset (`*.iso`, 7.8 GB) and `*.pe`/`*.xex.json` live at the workspace
root and are **not** in git.
### What is and isn't in git (xenia-rs)
`.gitignore` now excludes the heavy, regenerable local artifacts so the repo
stays portable:
- **Committed:** all source, plus `audit-runs/**` analysis **notes**
(`.md`/`.txt`/small `.json` digests, ~6 MB).
- **Ignored:** `audit-runs/**` raw traces (`.jsonl`, `.jsonl.gz`, `.gz`, `.csv`,
`.stdout`, `.stderr`, `.log`: **~146 GB**), `.claude/` agent worktrees
(~66 GB), `*.bin` dumps, `exit-thread-state.json`, `*.bak`.

To regenerate traces on the new machine, re-run the emulator/diff harnesses
(see `docs/` and the per-iterate `audit-runs/iterate-*/` note files).
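
To sanity-check a path against the policy above before committing, a small classifier can mirror the rules. This is a hypothetical helper: `classify` and `Disposition` are not part of xenia-rs. It is also a simplification of the real `.gitignore`. It assumes repo-relative, forward-slash paths and a `.claude/` directory at the repo root. `git check-ignore -v <path>` remains the authoritative answer.

```rust
/// Where a path ends up under the handoff's gitignore policy (hypothetical helper).
#[derive(Debug, PartialEq, Eq)]
pub enum Disposition {
    Committed,
    Ignored,
}

/// Raw-trace suffixes ignored under `audit-runs/` (`.jsonl.gz` is caught by `.gz`).
const RAW_TRACE_SUFFIXES: &[&str] = &[".jsonl", ".gz", ".csv", ".stdout", ".stderr", ".log"];

/// Classify a repo-relative path. Assumption: this only approximates the real
/// `.gitignore`; `git check-ignore -v <path>` is the source of truth.
pub fn classify(path: &str) -> Disposition {
    // `rsplit` always yields at least one piece, so this is the file name.
    let file_name = path.rsplit('/').next().unwrap_or(path);

    // Agent worktrees and one-off dumps are ignored wherever they appear.
    if path.starts_with(".claude/")
        || file_name.ends_with(".bin")
        || file_name.ends_with(".bak")
        || file_name == "exit-thread-state.json"
    {
        return Disposition::Ignored;
    }

    // Under audit-runs/, raw traces are ignored; notes and small digests stay.
    if path.starts_with("audit-runs/")
        && RAW_TRACE_SUFFIXES.iter().any(|s| file_name.ends_with(*s))
    {
        return Disposition::Ignored;
    }

    Disposition::Committed
}
```

For example, `audit-runs/iterate-2BC/notes.md` classifies as `Committed`, while a `.jsonl.gz` trace in the same directory classifies as `Ignored`.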
## Branch map (xenia-rs, all pushed)
- `master` — golden baseline (`sylpheed_n50m` golden, post-AUDIT-054).
- `chore/portable-snapshot`: **active line**; HEAD `ef93a4f` carries the
dormant parity fixes (`nt_create_event` polarity, MMIO VSync hardcode) +
iterate notes.
- `iterate-2AT-deref`, `iterate-2AU-xaudio`, `iterate-2AZ-vsync` — throwaway
probe instrumentation, preserved as-is (inert per findings; do not merge
blindly).
- `worktree-agent-a0848e51cc0d72503` — stale worktree ref (no unique work).
## Where the investigation stands (iterate 2.BC)
The authoritative running log is the persistent memory at
`~/.claude/projects/-home-fabi-RE---Project-Sylpheed/memory/` (`MEMORY.md`
index + topic files). Key state:
- **The wedge is a genuine producer bug, independent of cadence mode.** Running
the game with `--parallel` (wall-clock 60 Hz VSync) also wedges, after ~2 frames
(iterate 2.BB), so it is **not** a lockstep artifact. The cadence-clock direction
is a dead end.
- **Canary's frame pacing = a host "GPU Frame limiter" thread** (canary tid=2,
`graphics_system.cc:146`) that calls `NtSetEvent` ~4660× at 60 Hz and runs
the guest VSync ISR **synchronously on the host thread**
(`MarkVblank → DispatchInterruptCallback → EmulateCPInterruptDPC →
processor_->Execute`), scheduler-independent (iterate 2.BA).
- **Ours has no host frame-limiter.** It injects the ISR onto a guest *victim*
thread (`try_inject_graphics_interrupt`, `crates/xenia-app/src/main.rs`
~line 3729). Once the guest blocks/idles after boot, ISR delivery stops: ours
fires the signal path ~96× during early boot, then **stops**.
- **`opt_callback` signal path IS wired in ours** (iterate 2.BC; this falsifies the
earlier 2.AT "NULL delegate" claim): `sub_822F2248` body = 3 parts; part C
`@0x822F22CC` calls `bl 0x822F13B0` (singleton `0x828f3844`) →
`NtSetEvent(ev0)` via `0x824AA2F0`. Runtime: this reaches `NtSetEvent` 96× on
**handle 0x108c** then stops. So divergence = **cadence/delivery
architecture**, not a missing delegate.
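
The delivery gap in the bullets above can be shown with a deterministic toy model. This is not xenia-rs code. `Delivery`, `simulate`, and the guest behaviour are assumptions chosen for illustration: the guest boots freely, then parks on the VSync event after every frame. The model only shows why victim-thread injection stops once the guest blocks, while host-driven delivery keeps going.

```rust
/// How the VSync ISR reaches the guest in this toy model (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Delivery {
    /// Injected onto a guest thread, so it needs a runnable "victim" thread.
    VictimThread,
    /// Run synchronously on a dedicated host thread every vblank.
    HostThread,
}

/// Simulate `vblanks` ticks and return how often the ISR signalled the VSync
/// event. The guest runs `boot_frames` frames without waiting, then parks on
/// the VSync event after every frame and only becomes runnable when signalled.
pub fn simulate(delivery: Delivery, vblanks: u32, boot_frames: u32) -> u32 {
    let mut guest_runnable = true;
    let mut frames_done = 0u32;
    let mut isr_signals = 0u32;

    for _ in 0..vblanks {
        let can_deliver = match delivery {
            // Victim injection only happens while some guest thread can run.
            Delivery::VictimThread => guest_runnable,
            // The host thread fires regardless of guest state.
            Delivery::HostThread => true,
        };
        if can_deliver {
            isr_signals += 1; // ISR -> NtSetEvent(vsync event)
            guest_runnable = true; // the signal wakes the parked guest
        }
        if guest_runnable {
            frames_done += 1;
            if frames_done > boot_frames {
                guest_runnable = false; // post-boot: park until the next signal
            }
        }
    }
    isr_signals
}
```

With `boot_frames = 10` and 600 vblanks, `VictimThread` signals 11 times and then never again, while `HostThread` signals on all 600. These are arbitrary model parameters, not measurements.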
### Open question to resolve FIRST — iterate 2.BD (~0 LOC)
**Handle disambiguation.** `opt_callback` signals **0x108c**, but `tid=1` was
recorded wedging on **0x10e8**. Are these the same event or different?
- If `tid=1`'s wait is really 0x108c (0x10e8 a mislabel) → the cadence/delivery
fix unwedges tid=1.
- If 0x10e8 is a separate event → it needs its own producer.

Map who-waits / who-signals for `0x108c / 0x1090 / 0x10e8 / 0x1004` in **both**
ours and canary before writing any fix. (`0x1004` = tid=12 DPC work-queue wake,
also dead post-boot.)
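
The mapping can be done mechanically from traces. The sketch below assumes a hypothetical per-operation record, `HandleOp`, holding the thread id, the handle, wait or signal, and whether the operation happened after boot. The real ours/canary trace formats differ and would each need their own parser. `map_roles` builds the per-handle waiter/signaller sets for a watch list. `starved` lists handles that have waiters but no post-boot signal, which makes them wedge candidates.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kind of event-handle operation seen in a trace (hypothetical record shape).
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Wait,
    Signal,
}

#[derive(Clone, Copy, Debug)]
pub struct HandleOp {
    pub tid: u32,
    pub handle: u32,
    pub op: Op,
    pub post_boot: bool,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct HandleRoles {
    pub waiters: BTreeSet<u32>,
    pub signallers: BTreeSet<u32>,
    pub post_boot_signals: u32,
}

/// Who waits on / who signals each handle in `watch`; other handles are skipped.
pub fn map_roles(ops: &[HandleOp], watch: &[u32]) -> BTreeMap<u32, HandleRoles> {
    let mut map: BTreeMap<u32, HandleRoles> =
        watch.iter().map(|&h| (h, HandleRoles::default())).collect();
    for op in ops {
        if let Some(roles) = map.get_mut(&op.handle) {
            match op.op {
                Op::Wait => {
                    roles.waiters.insert(op.tid);
                }
                Op::Signal => {
                    roles.signallers.insert(op.tid);
                    if op.post_boot {
                        roles.post_boot_signals += 1;
                    }
                }
            }
        }
    }
    map
}

/// Handles someone waits on but nobody signals after boot (wedge candidates).
pub fn starved(map: &BTreeMap<u32, HandleRoles>) -> Vec<u32> {
    map.iter()
        .filter(|(_, r)| !r.waiters.is_empty() && r.post_boot_signals == 0)
        .map(|(&h, _)| h)
        .collect()
}
```

Run it once on each engine's traces and compare the maps. If canary's post-boot signaller for 0x10e8 is the same producer that signals 0x108c, the handles share a producer. If it is a different one, 0x10e8 needs its own producer.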
### Then iterate 2.BE — architecture fix (~2060 LOC, MEDIUM)
Replace victim-thread ISR injection with **host-driven synchronous ISR
delivery** mirroring canary's `EmulateCPInterruptDPC` frame-limiter, so VSync
keeps firing after the guest blocks. Fix surface:
`crates/xenia-kernel/src/interrupts.rs` + `crates/xenia-app/src/main.rs`.
This is why the 2.AZ clock-swap was inert — the gap is *delivery
architecture*, not the clock.
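
A minimal sketch of the shape of that fix. `spawn_frame_limiter` and `AutoResetEvent` are hypothetical stand-ins: neither exists under these names in xenia-rs or canary, and the callback is a placeholder for the real `interrupts.rs` dispatch. The sketch illustrates the property canary's frame limiter relies on. The ISR runs on a dedicated host thread at a fixed cadence, so a guest thread that is blocked on the event still gets woken.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Minimal auto-reset event standing in for a guest kernel event (hypothetical).
#[derive(Default)]
pub struct AutoResetEvent {
    signalled: Mutex<bool>,
    cv: Condvar,
}

impl AutoResetEvent {
    pub fn set(&self) {
        *self.signalled.lock().unwrap() = true;
        self.cv.notify_one();
    }

    /// Block until set or until `timeout`; consumes the signal. False on timeout.
    pub fn wait_timeout(&self, timeout: Duration) -> bool {
        let guard = self.signalled.lock().unwrap();
        let (mut guard, _) = self
            .cv
            .wait_timeout_while(guard, timeout, |signalled| !*signalled)
            .unwrap();
        let was_set = *guard;
        *guard = false; // auto-reset
        was_set
    }
}

/// Host-side vblank loop: every `period`, run `isr` synchronously on this host
/// thread. Nothing here depends on any guest thread being runnable.
/// Returns the number of ticks delivered once `stop` is set.
pub fn spawn_frame_limiter<F>(period: Duration, stop: Arc<AtomicBool>, mut isr: F) -> JoinHandle<u64>
where
    F: FnMut() + Send + 'static,
{
    thread::spawn(move || {
        let mut ticks = 0u64;
        let mut next = Instant::now() + period;
        while !stop.load(Ordering::Relaxed) {
            let now = Instant::now();
            if next > now {
                thread::sleep(next - now);
            }
            // Fixed-rate schedule: advance the deadline, not "now + period".
            next += period;
            isr(); // placeholder for MarkVblank -> ... -> guest VSync ISR
            ticks += 1;
        }
        ticks
    })
}
```

For 60 Hz, `period` would be `Duration::from_nanos(1_000_000_000 / 60)`. Scheduling from the previous deadline rather than from the current time keeps the average rate at the target even when individual sleeps overshoot.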
## Workflow notes
- **The user drives dispatch cadence.** After a research iterate completes,
sync memory + report concisely, then **pause for explicit go** — do not
auto-dispatch the next sub-agent.
- Methodology rule earned the hard way (#44/#46): before claiming "X never
fires / signal missing", trace the **whole** function body at runtime
(`--lr-trace`) and verify that the reference engine's slot is actually non-null;
an empty slot is only a bug if canary's is populated.
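
Once both engines' slot values have been observed at runtime, the rule reduces to a small decision table. `judge_slot` and `SlotVerdict` are hypothetical names used only for illustration, and `None` stands for a NULL slot.

```rust
/// Outcome of comparing a callback/delegate slot between ours and canary.
#[derive(Debug, PartialEq, Eq)]
pub enum SlotVerdict {
    /// Both populate it: the divergence is elsewhere (e.g. delivery/cadence).
    BothPopulated,
    /// Canary populates it and ours does not: a real parity bug.
    MissingInOurs,
    /// Ours populates it and canary does not: extra behaviour worth checking.
    ExtraInOurs,
    /// Neither populates it: the empty slot is expected, not a bug.
    EmptyInBoth,
}

/// `ours` / `canary`: slot values observed at runtime after tracing the whole
/// function body (`None` = NULL).
pub fn judge_slot(ours: Option<u32>, canary: Option<u32>) -> SlotVerdict {
    match (ours, canary) {
        (Some(_), Some(_)) => SlotVerdict::BothPopulated,
        (None, Some(_)) => SlotVerdict::MissingInOurs,
        (Some(_), None) => SlotVerdict::ExtraInOurs,
        (None, None) => SlotVerdict::EmptyInBoth,
    }
}
```

In the 2.BC `opt_callback` case ours turned out to be populated, so the verdict could not be `MissingInOurs`. That is why the divergence was attributed to delivery architecture instead.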
## Verify the checkout on the new machine
```sh
cd xenia-rs
git checkout chore/portable-snapshot
cargo test # 300 + 230 + 149 + 11 suites expected green
# determinism baseline: sylpheed_n50m golden should be bit-identical to master
```