docs: handoff report for continuing on another machine
Snapshot of repo layout, branch map, gitignore policy for the heavy local artifacts, and the iterate-2.BC investigation state (next steps 2.BD handle disambiguation -> 2.BE host-driven ISR delivery).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
119
HANDOFF.md
Normal file
@@ -0,0 +1,119 @@
# Handoff: Project Sylpheed RE / xenia-rs

_Generated 2026-06-05 to continue work on another machine._

## TL;DR

Reverse-engineering the boot of **Project Sylpheed: Arc of Deception** under a
Rust Xbox 360 emulator (`xenia-rs`), using **Xenia Canary** as the reference
("canary") oracle. The game boots but **wedges after ~2–6 frames**: a render /
VSync-event producer stops firing post-boot, so guest threads block forever.
Investigation is at **iterate 2.BC**; next step is **2.BD (handle
disambiguation)**, then **2.BE (architecture fix)**.

## Repo / machine layout

Workspace root: `/home/fabi/RE - Project Sylpheed/` (NOT a git repo itself).

| Dir | Git remote | Purpose |
|-----|-----------|---------|
| `xenia-rs/` | `git.mc02.dev/fabi/xenia-rs.git` | **Main project**: the Rust emulator + all RE work |
| `/home/fabi/Xenia-Canary/` | `git.mc02.dev/fabi/Xenia-Canary.git` | Reference Canary build (branch `xenia-rs`) |
| `xenia-canary/` | `github.com/xenia-canary/xenia-canary` | Upstream canary checkout (incl. `third_party/snappy` submodule) |
| `xenia-canary/third_party/snappy` | `git.mc02.dev/fabi/Snappy.git` (fork) | snappy + cross-build patch (see below) |
| `xenia/` | upstream xenia | Reference only |
| `sylpheed-reborn/` | (none) | **DEAD, ignore** |

The big game asset (`*.iso`, 7.8 GB) and `*.pe`/`*.xex.json` live at workspace
root and are **not** in git.

### What is and isn't in git (xenia-rs)

`.gitignore` now excludes the heavy, regenerable local artifacts so the repo
stays portable:

- **Committed:** all source, plus `audit-runs/**` analysis **notes**
  (`.md`/`.txt`/small `.json` digests, ~6 MB).
- **Ignored:** `audit-runs/**` raw traces (`.jsonl`, `.jsonl.gz`, `.gz`, `.csv`,
  `.stdout`, `.stderr`, `.log`; **~146 GB**), `.claude/` agent worktrees
  (~66 GB), `*.bin` dumps, `exit-thread-state.json`, `*.bak`.

To regenerate traces on the new machine, re-run the emulator/diff harnesses
(see `docs/` and the per-iterate `audit-runs/iterate-*/` note files).

## Branch map (xenia-rs, all pushed)

- `master`: golden baseline (`sylpheed_n50m` golden, post-AUDIT-054).
- `chore/portable-snapshot`: **active line**; HEAD `ef93a4f` carries the
  dormant parity fixes (`nt_create_event` polarity, MMIO VSync hardcode) +
  iterate notes.
- `iterate-2AT-deref`, `iterate-2AU-xaudio`, `iterate-2AZ-vsync`: throwaway
  probe instrumentation, preserved as-is (inert per findings; do not merge
  blindly).
- `worktree-agent-a0848e51cc0d72503`: stale worktree ref (no unique work).

## Where the investigation stands (iterate 2.BC)

The authoritative running log is the persistent memory at
`~/.claude/projects/-home-fabi-RE---Project-Sylpheed/memory/` (`MEMORY.md`
index + topic files). Key state:

- **The wedge is a genuine producer bug, independent of cadence mode.** Running
  the game with `--parallel` (wall-clock 60 Hz VSync) also wedges after ~2
  frames (iterate 2.BB), so it is **not** a lockstep artifact. The
  cadence-clock direction is a dead end.
- **Canary's frame pacing = a host "GPU Frame limiter" thread** (canary tid=2,
  `graphics_system.cc:146`) that calls `NtSetEvent` ~4660× at 60 Hz and runs
  the guest VSync ISR **synchronously on the host thread**
  (`MarkVblank → DispatchInterruptCallback → EmulateCPInterruptDPC →
  processor_->Execute`), independent of the guest scheduler (iterate 2.BA).
- **Ours has no host frame-limiter.** It injects the ISR onto a guest *victim*
  thread (`try_inject_graphics_interrupt`, `crates/xenia-app/src/main.rs`
  ~3729). Once the guest blocks/idles after boot, ISR delivery stops: ours
  fires the signal path ~96× during early boot, then **stops**.
- **The `opt_callback` signal path IS wired in ours** (iterate 2.BC, falsifies
  the earlier 2.AT "NULL delegate" claim): the `sub_822F2248` body has 3 parts;
  part C `@0x822F22CC` calls `bl 0x822F13B0` (singleton `0x828f3844`) →
  `NtSetEvent(ev0)` via `0x824AA2F0`. At runtime this reaches `NtSetEvent` 96×
  on **handle 0x108c**, then stops. So the divergence is in the
  **cadence/delivery architecture**, not a missing delegate.

### Open question to resolve FIRST: iterate 2.BD (~0 LOC)

**Handle disambiguation.** `opt_callback` signals **0x108c**, but `tid=1` was
recorded wedging on **0x10e8**. Are these the same event or different ones?

- If `tid=1`'s wait is really on 0x108c (0x10e8 being a mislabel) → the
  cadence/delivery fix unwedges tid=1.
- If 0x10e8 is a separate event → it needs its own producer.

Map who-waits / who-signals for `0x108c / 0x1090 / 0x10e8 / 0x1004` in **both**
ours and canary before writing any fix. (`0x1004` = tid=12 DPC work-queue wake,
also dead post-boot.)

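The who-waits / who-signals mapping above could be tabulated with something like the following. This is only a minimal sketch: the `Record` type, the `Op` enum, and both helper functions are hypothetical and not part of xenia-rs, and the sample records in `main` are invented for illustration, not taken from any trace. It assumes the wait/signal calls have already been extracted from a trace into `(tid, handle, op)` triples.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical kind of kernel event operation seen in a trace.
#[derive(Debug, Clone, Copy)]
enum Op {
    Wait,
    Signal,
}

/// Hypothetical trace record: one thread touching one event handle.
#[derive(Debug, Clone, Copy)]
struct Record {
    tid: u32,
    handle: u32,
    op: Op,
}

/// Per-handle summary: which threads wait on it and which signal it.
#[derive(Debug, Default, PartialEq, Eq)]
struct HandleUse {
    waiters: BTreeSet<u32>,
    signalers: BTreeSet<u32>,
}

/// Group trace records by handle.
fn map_handles(records: &[Record]) -> BTreeMap<u32, HandleUse> {
    let mut map: BTreeMap<u32, HandleUse> = BTreeMap::new();
    for r in records {
        let entry = map.entry(r.handle).or_default();
        match r.op {
            Op::Wait => entry.waiters.insert(r.tid),
            Op::Signal => entry.signalers.insert(r.tid),
        };
    }
    map
}

/// Handles with at least one waiter and no signaler (candidate missing producers).
fn orphaned_waits(map: &BTreeMap<u32, HandleUse>) -> Vec<u32> {
    map.iter()
        .filter(|(_, u)| !u.waiters.is_empty() && u.signalers.is_empty())
        .map(|(h, _)| *h)
        .collect()
}

fn main() {
    // Invented sample data, NOT real trace output.
    let records = [
        Record { tid: 3, handle: 0x108c, op: Op::Wait },
        Record { tid: 7, handle: 0x108c, op: Op::Signal },
        Record { tid: 1, handle: 0x10e8, op: Op::Wait },
    ];
    let map = map_handles(&records);
    for (h, u) in &map {
        println!("{h:#06x}: waiters={:?} signalers={:?}", u.waiters, u.signalers);
    }
    println!("waited on but never signaled: {:x?}", orphaned_waits(&map));
}
```

Running the same tabulation over an ours trace and a canary trace and diffing the two maps would show directly whether 0x10e8 has a signaler in canary that ours lacks.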
### Then iterate 2.BE: architecture fix (~20–60 LOC, MEDIUM)

Replace victim-thread ISR injection with **host-driven synchronous ISR
delivery** mirroring canary's `EmulateCPInterruptDPC` frame-limiter, so VSync
keeps firing after the guest blocks. Fix surface:
`crates/xenia-kernel/src/interrupts.rs` + `crates/xenia-app/src/main.rs`.
This is why the 2.AZ clock-swap was inert: the gap is in the *delivery
architecture*, not the clock.

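To make the intended shape of host-driven delivery concrete, here is a minimal standalone sketch. It is an assumption-laden illustration, not the xenia-rs or canary implementation: `FrameLimiter`, `spawn`, `shutdown`, and the `deliver_isr` closure are hypothetical names, and the closure merely stands in for whatever would execute the guest VSync ISR. The point it shows is that the tick comes from a dedicated host thread on a wall-clock period, so it keeps firing no matter what guest threads are doing.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical host-side frame limiter: invokes `deliver_isr` once per
/// period from its own host thread, independent of guest thread state.
struct FrameLimiter {
    stop: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl FrameLimiter {
    fn spawn<F>(period: Duration, mut deliver_isr: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let stop = Arc::new(AtomicBool::new(false));
        let stop_flag = Arc::clone(&stop);
        let handle = thread::spawn(move || {
            let mut next = Instant::now() + period;
            while !stop_flag.load(Ordering::Acquire) {
                let now = Instant::now();
                if now < next {
                    thread::sleep(next - now);
                }
                // Run the interrupt callback synchronously on this host thread.
                deliver_isr();
                // Schedule against the previous deadline to limit drift.
                next += period;
            }
        });
        Self { stop, handle: Some(handle) }
    }

    /// Ask the thread to stop and wait for it to exit.
    fn shutdown(mut self) {
        self.stop.store(true, Ordering::Release);
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }
}

fn main() {
    let ticks = Arc::new(AtomicU64::new(0));
    let counter = Arc::clone(&ticks);
    // ~60 Hz period; the closure is a placeholder for real ISR delivery.
    let limiter = FrameLimiter::spawn(Duration::from_micros(16_667), move || {
        counter.fetch_add(1, Ordering::Relaxed);
    });
    thread::sleep(Duration::from_millis(200));
    limiter.shutdown();
    println!("vblank callbacks delivered: {}", ticks.load(Ordering::Relaxed));
}
```

In a real emulator the callback would also need to synchronize with guest state before executing guest code; that part is deliberately left out here because it depends on xenia-rs internals.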
## Workflow notes

- **The user drives dispatch cadence.** After a research iterate completes,
  sync memory + report concisely, then **pause for explicit go**; do not
  auto-dispatch the next sub-agent.
- Methodology rule earned the hard way (#44/#46): before claiming "X never
  fires / signal missing", trace the **whole** function body at runtime
  (`--lr-trace`) and verify the reference engine actually has it non-null.
  An empty slot is only a bug if canary's is populated.

## Verify the checkout on the new machine

```sh
cd xenia-rs
git checkout chore/portable-snapshot
cargo test # 300 + 230 + 149 + 11 suites expected green
# determinism baseline: sylpheed_n50m golden should be bit-identical to master
```