xenia-rs/HANDOFF.md
MechaCat02 a4926c73f4 docs: handoff report for continuing on another machine
Snapshot of repo layout, branch map, gitignore policy for the heavy
local artifacts, and the iterate-2.BC investigation state (next steps
2.BD handle disambiguation -> 2.BE host-driven ISR delivery).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:21:32 +02:00


Handoff — Project Sylpheed RE / xenia-rs

Generated 2026-06-05 to continue work on another machine.

TL;DR

Reverse-engineering Project Sylpheed — Arc of Deception boot under a Rust Xbox 360 emulator (xenia-rs), using Xenia Canary as the reference ("canary") oracle. The game boots but wedges after ~26 frames: a render / VSync-event producer stops firing post-boot, so guest threads block forever. Investigation is at iterate 2.BC; next step is 2.BD (handle disambiguation), then 2.BE (architecture fix).

Repo / machine layout

Workspace root: /home/fabi/RE - Project Sylpheed/ (NOT a git repo itself).

| Dir | Git remote | Purpose |
|---|---|---|
| xenia-rs/ | git.mc02.dev/fabi/xenia-rs.git | Main project: the Rust emulator + all RE work |
| /home/fabi/Xenia-Canary/ | git.mc02.dev/fabi/Xenia-Canary.git | Reference Canary build (branch xenia-rs) |
| xenia-canary/ | github.com/xenia-canary/xenia-canary | Upstream canary checkout (incl. third_party/snappy submodule) |
| xenia-canary/third_party/snappy | git.mc02.dev/fabi/Snappy.git (fork) | snappy + cross-build patch (see below) |
| xenia/ | upstream xenia | Reference only |
| sylpheed-reborn/ | | DEAD, ignore |

The big game asset (*.iso, 7.8 GB) and *.pe/*.xex.json live at workspace root and are not in git.

What is and isn't in git (xenia-rs)

.gitignore now excludes the heavy, regenerable local artifacts so the repo stays portable:

  • Committed: all source, plus audit-runs/** analysis notes (.md/.txt/small .json digests, ~6 MB).
  • Ignored: audit-runs/** raw traces (.jsonl, .jsonl.gz, .gz, .csv, .stdout, .stderr, .log; ~146 GB), .claude/ agent worktrees (~66 GB), *.bin dumps, exit-thread-state.json, *.bak.

To regenerate traces on the new machine, re-run the emulator/diff harnesses (see docs/ and the per-iterate audit-runs/iterate-*/ note files).

Branch map (xenia-rs, all pushed)

  • master — golden baseline (sylpheed_n50m golden, post-AUDIT-054).
  • chore/portable-snapshot: active line; HEAD ef93a4f carries the dormant parity fixes (nt_create_event polarity, MMIO VSync hardcode) + iterate notes.
  • iterate-2AT-deref, iterate-2AU-xaudio, iterate-2AZ-vsync — throwaway probe instrumentation, preserved as-is (inert per findings; do not merge blindly).
  • worktree-agent-a0848e51cc0d72503 — stale worktree ref (no unique work).

Where the investigation stands (iterate 2.BC)

The authoritative running log is the persistent memory at ~/.claude/projects/-home-fabi-RE---Project-Sylpheed/memory/ (MEMORY.md index + topic files). Key state:

  • The wedge is a genuine producer bug, independent of cadence mode. Running the game --parallel (wall-clock 60 Hz VSync) also wedges after ~2 frames (iterate 2.BB), so it is not a lockstep artifact. Cadence-clock direction is a dead end.
  • Canary's frame pacing = a host "GPU Frame limiter" thread (canary tid=2, graphics_system.cc:146) that calls NtSetEvent ~4660× at 60 Hz and runs the guest VSync ISR synchronously on the host thread (MarkVblank → DispatchInterruptCallback → EmulateCPInterruptDPC → processor_->Execute), scheduler-independent (iterate 2.BA).
  • Ours has no host frame-limiter. It injects the ISR onto a guest victim thread (try_inject_graphics_interrupt, crates/xenia-app/src/main.rs ~3729). Once the guest blocks/idles after boot, ISR delivery stops — ours fires the signal path ~96× early-boot then stops.
  • opt_callback signal path IS wired in ours (iterate 2.BC, falsifies the earlier 2.AT "NULL delegate" claim): sub_822F2248 body = 3 parts; part C @0x822F22CC calls bl 0x822F13B0 (singleton 0x828f3844) → NtSetEvent(ev0) via 0x824AA2F0. Runtime: this reaches NtSetEvent 96× on handle 0x108c then stops. So divergence = cadence/delivery architecture, not a missing delegate.
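
The delivery gap in the bullets above can be reduced to a toy model. This is only a sketch: `GuestState`, `Delivery` and `isr_fires` are hypothetical names made up for illustration, not xenia-rs or Canary APIs, and one slice entry stands in for one VSync tick.

```rust
/// Toy model of the two VSync ISR delivery strategies.
/// All names are illustrative, not xenia-rs or Canary APIs.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum GuestState {
    Running,
    Blocked,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Delivery {
    /// ISR is injected onto a running guest "victim" thread (ours).
    VictimThread,
    /// ISR runs synchronously on a host frame-limiter thread (canary).
    HostDriven,
}

/// Count how many VSync ISRs fire over `states`, one entry per tick.
pub fn isr_fires(delivery: Delivery, states: &[GuestState]) -> usize {
    states
        .iter()
        .filter(|&&s| match delivery {
            // Needs a running guest thread to hijack.
            Delivery::VictimThread => s == GuestState::Running,
            // Independent of guest scheduling.
            Delivery::HostDriven => true,
        })
        .count()
}
```

In this model, a run of `Running` ticks followed by any number of `Blocked` ticks freezes the `VictimThread` count at the number of `Running` ticks, while the `HostDriven` count keeps growing with every tick. That is the same shape as the observed "fires early-boot, then stops" behaviour.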

Open question to resolve FIRST — iterate 2.BD (~0 LOC)

Handle disambiguation. opt_callback signals 0x108c, but tid=1 was recorded wedging on 0x10e8. Are these the same event or different?

  • If tid=1's wait is really 0x108c (0x10e8 a mislabel) → the cadence/delivery fix unwedges tid=1.
  • If 0x10e8 is a separate event → it needs its own producer.

Map who-waits / who-signals for 0x108c / 0x1090 / 0x10e8 / 0x1004 in both ours and canary before writing any fix. (0x1004 = tid=12 DPC work-queue wake, also dead post-boot.)
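
One possible shape for the who-waits / who-signals map, assuming the traces can be reduced to (tid, op, handle) records. `HandleEvent`, `HandleUsers`, `who_touches` and `orphaned_waits` are hypothetical helpers, not existing harness code, and the real trace format may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kind of access seen in a trace record (assumed reduction of the raw trace).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Wait,
    Signal,
}

/// One reduced trace record: which thread touched which handle, and how.
#[derive(Clone, Copy, Debug)]
pub struct HandleEvent {
    pub tid: u32,
    pub op: Op,
    pub handle: u32,
}

/// Threads that waited on / signalled a given handle.
#[derive(Default, Debug, PartialEq, Eq)]
pub struct HandleUsers {
    pub waiters: BTreeSet<u32>,
    pub signalers: BTreeSet<u32>,
}

/// Build the per-handle map of waiting and signalling thread ids.
pub fn who_touches(events: &[HandleEvent]) -> BTreeMap<u32, HandleUsers> {
    let mut map: BTreeMap<u32, HandleUsers> = BTreeMap::new();
    for ev in events {
        let users = map.entry(ev.handle).or_default();
        match ev.op {
            Op::Wait => {
                users.waiters.insert(ev.tid);
            }
            Op::Signal => {
                users.signalers.insert(ev.tid);
            }
        }
    }
    map
}

/// Handles that have at least one waiter but no signaler in the trace.
pub fn orphaned_waits(map: &BTreeMap<u32, HandleUsers>) -> Vec<u32> {
    map.iter()
        .filter(|(_, u)| !u.waiters.is_empty() && u.signalers.is_empty())
        .map(|(h, _)| *h)
        .collect()
}
```

Building this map once for ours and once for canary, then comparing the entries for the four handles side by side, would show directly whether 0x10e8 has a producer in canary that is absent in ours.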

Then iterate 2.BE — architecture fix (~2060 LOC, MEDIUM)

Replace victim-thread ISR injection with host-driven synchronous ISR delivery mirroring canary's EmulateCPInterruptDPC frame-limiter, so VSync keeps firing after the guest blocks. Fix surface: crates/xenia-kernel/src/interrupts.rs + crates/xenia-app/src/main.rs. This is why the 2.AZ clock-swap was inert — the gap is delivery architecture, not the clock.
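
A minimal sketch of what a host-driven frame limiter could look like, using only the Rust standard library. `FrameLimiter` and its methods are hypothetical; the `isr` closure is a stand-in for whatever ends up running the guest VSync ISR synchronously, and nothing here reflects the actual interfaces in interrupts.rs or main.rs.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Host thread that ticks at a fixed period, independent of guest threads.
pub struct FrameLimiter {
    stop: Arc<AtomicBool>,
    ticks: Arc<AtomicU64>,
    worker: Option<JoinHandle<()>>,
}

impl FrameLimiter {
    /// Spawn the limiter; `isr` is called once per tick on the host thread.
    pub fn spawn<F>(period: Duration, mut isr: F) -> Self
    where
        F: FnMut(u64) + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        let ticks = Arc::new(AtomicU64::new(0));
        let (stop_t, ticks_t) = (Arc::clone(&stop), Arc::clone(&ticks));
        let worker = thread::spawn(move || {
            while !stop_t.load(Ordering::Acquire) {
                let n = ticks_t.fetch_add(1, Ordering::AcqRel);
                // Placeholder for synchronous guest ISR execution.
                isr(n);
                thread::sleep(period);
            }
        });
        Self { stop, ticks, worker: Some(worker) }
    }

    /// Number of ticks started so far.
    pub fn ticks(&self) -> u64 {
        self.ticks.load(Ordering::Acquire)
    }

    /// Stop the thread, wait for it to exit, and return the final tick count.
    pub fn shutdown(mut self) -> u64 {
        self.stop.store(true, Ordering::Release);
        if let Some(worker) = self.worker.take() {
            worker.join().expect("frame limiter thread panicked");
        }
        self.ticks.load(Ordering::Acquire)
    }
}
```

The point of the design is that the tick loop never consults guest thread state, so delivery cannot stall when the guest blocks. A fixed `thread::sleep` drifts over time; a real limiter would more likely pace against a deadline, which this sketch leaves out.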

Workflow notes

  • The user drives dispatch cadence. After a research iterate completes, sync memory + report concisely, then pause for explicit go — do not auto-dispatch the next sub-agent.
  • Methodology rule earned the hard way (#44/#46): before claiming "X never fires / signal missing", trace the whole function body at runtime (--lr-trace) and verify the reference engine actually has it non-null — an empty slot is only a bug if canary's is populated.

Verify the checkout on the new machine

cd xenia-rs
git checkout chore/portable-snapshot
cargo test          # 300 + 230 + 149 + 11 suites expected green
# determinism baseline: sylpheed_n50m golden should be bit-identical to master
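
For the bit-identical check, a small helper like the following could report where two golden outputs first diverge. This is a sketch that assumes the goldens can be loaded as raw bytes; `first_mismatch` is a hypothetical name, not part of the existing test suites.

```rust
/// Return the offset of the first differing byte, or `None` if the two
/// buffers are bit-identical. A length difference counts as a mismatch
/// at the end of the shorter buffer.
pub fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}
```

Feeding it the contents of the two golden files (for example via `std::fs::read`) gives either `None` for a clean baseline or the byte offset to start inspecting.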