docs: handoff report for continuing on another machine

Snapshot of repo layout, branch map, gitignore policy for the heavy local artifacts, and the iterate-2.BC investigation state (next steps: 2.BD handle disambiguation -> 2.BE host-driven ISR delivery).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
119
HANDOFF.md
Normal file
@@ -0,0 +1,119 @@
# Handoff: Project Sylpheed RE / xenia-rs

_Generated 2026-06-05 to continue work on another machine._

## TL;DR

Reverse-engineering **Project Sylpheed: Arc of Deception** boot under a Rust
Xbox 360 emulator (`xenia-rs`), using **Xenia Canary** as the reference
("canary") oracle. The game boots but **wedges after ~2–6 frames**: a render /
VSync-event producer stops firing post-boot, so guest threads block forever.
Investigation is at **iterate 2.BC**; next step is **2.BD (handle
disambiguation)**, then **2.BE (architecture fix)**.

## Repo / machine layout

Workspace root: `/home/fabi/RE - Project Sylpheed/` (NOT a git repo itself).

| Dir | Git remote | Purpose |
|-----|-----------|---------|
| `xenia-rs/` | `git.mc02.dev/fabi/xenia-rs.git` | **Main project**: the Rust emulator + all RE work |
| `/home/fabi/Xenia-Canary/` | `git.mc02.dev/fabi/Xenia-Canary.git` | Reference Canary build (branch `xenia-rs`) |
| `xenia-canary/` | `github.com/xenia-canary/xenia-canary` | Upstream canary checkout (incl. `third_party/snappy` submodule) |
| `xenia-canary/third_party/snappy` | `git.mc02.dev/fabi/Snappy.git` (fork) | snappy + cross-build patch (see below) |
| `xenia/` | upstream xenia | Reference only |
| `sylpheed-reborn/` | (none) | **DEAD, ignore** |

The big game asset (`*.iso`, 7.8 GB) and the `*.pe`/`*.xex.json` files live at
the workspace root and are **not** in git.

### What is and isn't in git (xenia-rs)

`.gitignore` now excludes the heavy, regenerable local artifacts so the repo
stays portable:

- **Committed:** all source, plus `audit-runs/**` analysis **notes**
  (`.md`/`.txt`/small `.json` digests, ~6 MB).
- **Ignored:** `audit-runs/**` raw traces (`.jsonl`, `.jsonl.gz`, `.gz`, `.csv`,
  `.stdout`, `.stderr`, `.log`; **~146 GB**), `.claude/` agent worktrees
  (~66 GB), `*.bin` dumps, `exit-thread-state.json`, `*.bak`.

To regenerate traces on the new machine, re-run the emulator/diff harnesses
(see `docs/` and the per-iterate `audit-runs/iterate-*/` note files).

## Branch map (xenia-rs, all pushed)

- `master`: golden baseline (`sylpheed_n50m` golden, post-AUDIT-054).
- `chore/portable-snapshot`: **active line**; HEAD `ef93a4f` carries the
  dormant parity fixes (`nt_create_event` polarity, MMIO VSync hardcode) +
  iterate notes.
- `iterate-2AT-deref`, `iterate-2AU-xaudio`, `iterate-2AZ-vsync`: throwaway
  probe instrumentation, preserved as-is (inert per findings; do not merge
  blindly).
- `worktree-agent-a0848e51cc0d72503`: stale worktree ref (no unique work).

## Where the investigation stands (iterate 2.BC)

The authoritative running log is the persistent memory at
`~/.claude/projects/-home-fabi-RE---Project-Sylpheed/memory/` (`MEMORY.md`
index + topic files). Key state:

- **The wedge is a genuine producer bug, independent of cadence mode.** Running
  the game with `--parallel` (wall-clock 60 Hz VSync) also wedges after ~2 frames
  (iterate 2.BB), so it is **not** a lockstep artifact. The cadence-clock
  direction is a dead end.
- **Canary's frame pacing = a host "GPU Frame limiter" thread** (canary tid=2,
  `graphics_system.cc:146`) that calls `NtSetEvent` ~4660× at 60 Hz and runs
  the guest VSync ISR **synchronously on the host thread**
  (`MarkVblank → DispatchInterruptCallback → EmulateCPInterruptDPC →
  processor_->Execute`), scheduler-independent (iterate 2.BA).
- **Ours has no host frame-limiter.** It injects the ISR onto a guest *victim*
  thread (`try_inject_graphics_interrupt`, `crates/xenia-app/src/main.rs`
  ~3729). Once the guest blocks/idles after boot, ISR delivery stops: ours
  fires the signal path ~96× during early boot, then **stops**.
- **`opt_callback` signal path IS wired in ours** (iterate 2.BC, falsifies the
  earlier 2.AT "NULL delegate" claim): the `sub_822F2248` body has 3 parts; part C
  `@0x822F22CC` calls `bl 0x822F13B0` (singleton `0x828f3844`) →
  `NtSetEvent(ev0)` via `0x824AA2F0`. Runtime: this reaches `NtSetEvent` 96× on
  **handle 0x108c**, then stops. So the divergence is **cadence/delivery
  architecture**, not a missing delegate.

### Open question to resolve FIRST: iterate 2.BD (~0 LOC)

**Handle disambiguation.** `opt_callback` signals **0x108c**, but `tid=1` was
recorded wedging on **0x10e8**. Are these the same event or different?
- If `tid=1`'s wait is really 0x108c (0x10e8 a mislabel) → the cadence/delivery
  fix unwedges tid=1.
- If 0x10e8 is a separate event → it needs its own producer.

Map who-waits / who-signals for `0x108c / 0x1090 / 0x10e8 / 0x1004` in **both**
ours and canary before writing any fix. (`0x1004` = tid=12 DPC work-queue wake,
also dead post-boot.)
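
The who-waits / who-signals map could be built mechanically from trace records. The following is a minimal sketch, assuming a hypothetical `Record` shape (thread id, handle, wait-or-signal); the real trace format is not described here, and the records in `main` are illustrative placeholders, not measured data. Running the same grouping over an ours trace and a canary trace would give two maps to compare side by side.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kind of kernel-object access seen in a trace record.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Wait,
    Signal,
}

/// One trace record (hypothetical shape, not the real trace format).
#[derive(Clone, Copy, Debug)]
struct Record {
    tid: u32,
    handle: u32,
    op: Op,
}

/// Waiters and signalers observed for a single handle.
#[derive(Default, Debug, PartialEq, Eq)]
struct HandleUse {
    waiters: BTreeSet<u32>,
    signalers: BTreeSet<u32>,
}

/// Group records by handle into who-waits / who-signals sets.
fn build_map(records: &[Record]) -> BTreeMap<u32, HandleUse> {
    let mut map: BTreeMap<u32, HandleUse> = BTreeMap::new();
    for r in records {
        let entry = map.entry(r.handle).or_default();
        match r.op {
            Op::Wait => entry.waiters.insert(r.tid),
            Op::Signal => entry.signalers.insert(r.tid),
        };
    }
    map
}

/// Handles that at least one thread waits on but no thread ever signals.
fn orphaned(map: &BTreeMap<u32, HandleUse>) -> Vec<u32> {
    map.iter()
        .filter(|(_, u)| !u.waiters.is_empty() && u.signalers.is_empty())
        .map(|(h, _)| *h)
        .collect()
}

fn main() {
    // Illustrative records only; tid 99 is a placeholder for "whoever signals".
    let records = [
        Record { tid: 99, handle: 0x108c, op: Op::Signal },
        Record { tid: 1, handle: 0x10e8, op: Op::Wait },
        Record { tid: 12, handle: 0x1004, op: Op::Wait },
    ];
    let map = build_map(&records);
    for (handle, usage) in &map {
        println!(
            "{handle:#06x}: waiters={:?} signalers={:?}",
            usage.waiters, usage.signalers
        );
    }
    println!("waited-on but never signaled: {:x?}", orphaned(&map));
}
```

A handle that shows up as waited-on but never signaled in ours, while canary's map has a signaler for it, would point at a missing producer rather than a mislabel.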

### Then iterate 2.BE: architecture fix (~20–60 LOC, MEDIUM)

Replace victim-thread ISR injection with **host-driven synchronous ISR
delivery** mirroring canary's `EmulateCPInterruptDPC` frame-limiter, so VSync
keeps firing after the guest blocks. Fix surface:
`crates/xenia-kernel/src/interrupts.rs` + `crates/xenia-app/src/main.rs`.
This is why the 2.AZ clock-swap was inert: the gap is *delivery
architecture*, not the clock.
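
The difference between the two delivery models can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `Guest`, `victim_injection`, and `host_driven` are hypothetical names, the frame counts are arbitrary, and nothing here reflects the actual code in `interrupts.rs` or `main.rs`. It only shows why a delivery path that depends on a runnable guest thread goes silent once the guest blocks, whatever clock drives it.

```rust
/// Toy guest: runnable for the first `runnable_frames` vblanks, blocked afterwards.
struct Guest {
    runnable_frames: u32,
    isr_runs: u32,
}

impl Guest {
    fn is_runnable(&self, frame: u32) -> bool {
        frame < self.runnable_frames
    }

    /// Stand-in for executing the guest VSync ISR.
    fn run_isr(&mut self) {
        self.isr_runs += 1;
    }
}

/// Victim-thread injection: the ISR only runs when a guest thread is
/// runnable to carry it, so delivery stops once the guest blocks.
fn victim_injection(guest: &mut Guest, total_frames: u32) {
    for frame in 0..total_frames {
        if guest.is_runnable(frame) {
            guest.run_isr();
        }
    }
}

/// Host-driven delivery: the host loop runs the ISR on every vblank,
/// regardless of whether any guest thread is runnable.
fn host_driven(guest: &mut Guest, total_frames: u32) {
    for _frame in 0..total_frames {
        guest.run_isr();
    }
}

fn main() {
    // Arbitrary numbers for illustration: guest blocks after 6 frames of 600.
    let mut a = Guest { runnable_frames: 6, isr_runs: 0 };
    victim_injection(&mut a, 600);

    let mut b = Guest { runnable_frames: 6, isr_runs: 0 };
    host_driven(&mut b, 600);

    println!("victim injection: {} ISR runs", a.isr_runs);
    println!("host-driven:      {} ISR runs", b.isr_runs);
}
```

In this model the victim-injection count is capped by how long the guest stays runnable, while the host-driven count tracks the number of vblanks.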

## Workflow notes

- **The user drives dispatch cadence.** After a research iterate completes,
  sync memory + report concisely, then **pause for explicit go**; do not
  auto-dispatch the next sub-agent.
- Methodology rule earned the hard way (#44/#46): before claiming "X never
  fires / signal missing", trace the **whole** function body at runtime
  (`--lr-trace`) and verify the reference engine actually has it non-null;
  an empty slot is only a bug if canary's is populated.
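
The "empty slot" rule amounts to a small decision table. The sketch below encodes it with hypothetical names (`SlotVerdict`, `classify`, `is_bug_candidate`) and placeholder slot values; it is an illustration of the rule, not code from the repo.

```rust
/// Outcome of comparing one callback slot between ours and canary.
#[derive(Debug, PartialEq, Eq)]
enum SlotVerdict {
    BothPopulated,
    BothEmpty,
    MissingInOurs,
    ExtraInOurs,
}

/// Compare a slot as observed at runtime in both engines.
/// `None` means the slot was null/empty, `Some(addr)` means populated.
fn classify(ours: Option<u32>, canary: Option<u32>) -> SlotVerdict {
    match (ours, canary) {
        (Some(_), Some(_)) => SlotVerdict::BothPopulated,
        (None, None) => SlotVerdict::BothEmpty,
        (None, Some(_)) => SlotVerdict::MissingInOurs,
        (Some(_), None) => SlotVerdict::ExtraInOurs,
    }
}

/// Per the rule: an empty slot in ours only counts as a bug candidate
/// when canary's slot is populated.
fn is_bug_candidate(verdict: &SlotVerdict) -> bool {
    matches!(verdict, SlotVerdict::MissingInOurs)
}

fn main() {
    // Placeholder addresses, for illustration only.
    let cases: [(Option<u32>, Option<u32>); 3] = [
        (None, None),
        (None, Some(0x1000)),
        (Some(0x1000), Some(0x1000)),
    ];
    for (ours, canary) in cases {
        let verdict = classify(ours, canary);
        println!(
            "ours={ours:x?} canary={canary:x?} -> {verdict:?} (bug candidate: {})",
            is_bug_candidate(&verdict)
        );
    }
}
```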

## Verify the checkout on the new machine

```sh
cd xenia-rs
git checkout chore/portable-snapshot
cargo test   # 300 + 230 + 149 + 11 suites expected green
# determinism baseline: sylpheed_n50m golden should be bit-identical to master
```