
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00


Iterate 2.H — Physical heap `vA0000000` bucket (writer report)

Date: 2026-05-28. LOC delta: engine +99 / -3 (2 files), canary 0. Tests: xenia-kernel 227 PASS (was 226 — +1 new test), xenia-memory 19 PASS. Zero regressions.

Headline

PRIMARY-GATE-PASS-NO-CASCADE. All three diverging ctx_ptr columns now land in the 0xAxxxxxxx-0xBxxxxxxx canary vA0000000 heap range (was 0x4xxxxxxx). The structural address-space-bucket divergence is closed. The secondary cascade (missing producer LRs, canary tids 15/27/28 worker fan-out, tid=1 wedge) is unchanged — the run produces a bit-identical event count (118,149) and the same set of 10 spawned thread entry_pcs as the iterate-2F baseline. Allocation-bucket was not the upstream cause of the worker-fan-out absence.

Mode detected

Boot trajectory captured via exec -n 50000000 --quiet --phase-a-event-log … (same invocation as iterate-2F-vdswap-drain-fix/ours-cold.jsonl). 50M-instruction budget completes in <1 s wallclock and ours wedges at the same set of guest PCs.

Patch

Files

xenia-rs/crates/xenia-kernel/src/state.rs
- +12 LOC: new field physical_heap_cursor: AtomicU32 on KernelState with docstring tying it to canary memory.cc:269-271.
- +3 LOC: init in with_gpu() to 0xC000_0000 (top-exclusive frontier of the 0xA0000000-0xBFFFFFFF bucket).
- +37 LOC: new method physical_heap_alloc(&self, size, mem) -> Option<u32> — 64KB-aligned, top-down, CAS-loop bump allocator with 0xA000_0000 floor check; on success delegates to mem.alloc(base, size, READ|WRITE).
- +22 LOC: smoke test physical_heap_alloc_descends_in_va_range proving 10 consecutive 0x1234-byte allocs are descending, range-bound, and 64KB-aligned.
xenia-rs/crates/xenia-kernel/src/exports.rs
- +18 / -3 LOC in mm_allocate_physical_memory_ex: read protect_bits from r5; route X_MEM_LARGE_PAGES (0x20000000) requests to the new physical_heap_alloc, fall through to existing heap_alloc for non-large-page (4KB / 16MB-page) cases. Mirrors canary xboxkrnl_memory.cc:436-455 flag→heap-bucket dispatch.
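The allocator behaviour listed above (64KB-aligned, top-down, CAS-loop, floor check) can be sketched in isolation. This is a minimal illustration, not the patch itself: `PhysicalHeapCursor` and `reserve` are hypothetical names, the call into `mem.alloc` is omitted, and the overflow-checked rounding is a choice of this sketch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Inclusive floor and exclusive ceiling of the 0xA0000000-0xBFFFFFFF bucket.
const BUCKET_FLOOR: u32 = 0xA000_0000;
const BUCKET_CEILING: u32 = 0xC000_0000;
/// 64 KiB allocation granularity.
const GRANULE: u32 = 0x1_0000;

/// Hypothetical stand-in for the cursor field on `KernelState`.
pub struct PhysicalHeapCursor {
    cursor: AtomicU32,
}

impl PhysicalHeapCursor {
    /// Start with the cursor at the exclusive top of the bucket.
    pub fn new() -> Self {
        Self { cursor: AtomicU32::new(BUCKET_CEILING) }
    }

    /// Reserve `size` bytes, rounded up to 64 KiB, growing downwards.
    /// Returns the base address, or `None` on zero size, overflow, or exhaustion.
    pub fn reserve(&self, size: u32) -> Option<u32> {
        if size == 0 {
            return None;
        }
        // checked_add avoids wrap-around for sizes near u32::MAX.
        let aligned = size.checked_add(GRANULE - 1)? & !(GRANULE - 1);
        let mut cur = self.cursor.load(Ordering::Relaxed);
        loop {
            let next = cur.checked_sub(aligned)?;
            if next < BUCKET_FLOOR {
                return None;
            }
            // On contention, retry from the value another thread installed.
            match self.cursor.compare_exchange_weak(
                cur, next, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => return Some(next),
                Err(observed) => cur = observed,
            }
        }
    }
}
```

Ten consecutive `reserve(0x1234)` calls on a fresh cursor should each return a lower, 64KB-aligned base inside the bucket, which is the property the smoke test above is described as checking.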

Total git diff: 2 files, +99 insertions / -3 deletions = 96 net LOC.

Within the 80-150 target band, well under the 200 hard cap.

Out-of-scope (per prompt SCOPE GUARDS — deferred to follow-up)

vC0000000 (16MB-page bucket) and vE0000000 (4KB bucket) — NOT wired. Non-large-page MmAllocatePhysicalMemoryEx calls still fall through to the legacy heap_alloc at 0x4000_0000 (preserves prior behavior).
mm_get_physical_address masking — untouched.
MmFreePhysicalMemory — untouched (no free-list yet; minimal cursor bump-allocator, per prompt guidance).

Primary gate result

thread.create events with ctx_ptr not in static-allocated 0x828Fxxxx region (the diverging entries called out by the prompt):

| entry_pc | canary ctx_ptr | 2.F (pre-fix) ctx_ptr | 2.H ctx_ptr | gate |
|---|---|---|---|---|
| `0x824cd458` | `0xbe56bb3c` | `0x42453b3c` | `0xbe8cbb3c` | PASS (in 0xAxxx-0xBxxx, low-3-bytes `0x8cbb3c` vs canary `0x56bb3c`, low-2-bytes `0xbb3c` exact-match) |
| `0x822f1ee0` | `0xbce24a40` | `0x40d0ca40` | `0xbd184a40` | PASS (in 0xAxxx-0xBxxx, low-2-bytes `0x4a40` exact-match) |
| `0x821748f0` | `0xbc365620` | `0x4024d640` | `0xbc6c5580` | PASS (in 0xAxxx-0xBxxx, high-byte `0xbc` exact-match) |

The four entries the prompt called "static — already passes" still match exactly (0x828f3d08, 0x828f4838, 0x828f3b68, 0x828f3b08).

Notes:

Exact bit-for-bit ctx_ptr parity vs canary is not expected (and is not required by the gate) because top-down allocation order depends on the specific sequence of intervening MmAllocatePhysicalMemoryEx calls from other engine paths (XEX header preload, kernel objects, audio voice structs, etc.). The 2.H allocator services every X_MEM_LARGE_PAGES request, not just the three diverging entries in this table, so the cursor lands at offsets reflecting cumulative bytes-out before each thread.create.
The low-bytes match on the first two rows (0xbb3c / 0x4a40) is a strong structural signal: for those entries, ours and canary now produce the same per-instance struct offsets within their respective heap pages, which means the MmAllocatePhysicalMemoryEx callers are requesting the same sizes in the same sequence, and only the heap top-of-cursor differs. The third row does not match in its low bytes (0x5580 vs canary 0x5620), so this signal covers two of the three entries.
The two ctx_ptr=0x00000000 entries (0x824d2878 / 0x824d2940 audio worker entries) are by-design (suspended audio workers spawn with null context); unchanged.
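The gate and the low-bytes observation can be reproduced from the table values alone. The following is a small sketch, assuming hypothetical helper names `in_va_bucket` and `matching_low_bytes`; the three rows are copied from the table above.

```rust
/// True when `addr` lies in the 0xA0000000-0xBFFFFFFF range.
pub fn in_va_bucket(addr: u32) -> bool {
    (0xA000_0000..0xC000_0000).contains(&addr)
}

/// Number of low-order bytes (0 to 4) on which two addresses agree.
pub fn matching_low_bytes(a: u32, b: u32) -> u32 {
    // XOR leaves zero bits wherever the addresses agree; whole matching
    // low bytes show up as trailing zero bits in groups of eight.
    (a ^ b).trailing_zeros() / 8
}

/// (entry_pc, canary ctx_ptr, 2.H ctx_ptr) rows copied from the table.
pub const ROWS: [(u32, u32, u32); 3] = [
    (0x824c_d458, 0xbe56_bb3c, 0xbe8c_bb3c),
    (0x822f_1ee0, 0xbce2_4a40, 0xbd18_4a40),
    (0x8217_48f0, 0xbc36_5620, 0xbc6c_5580),
];
```

With these values, all six canary and 2.H pointers fall inside the bucket, the first two rows agree on their two low bytes, and the third row agrees on none.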

Determinism check (gate): two consecutive 2.H runs produce identical thread.create ctx_ptr columns (table above is bit-stable across runs). Event count: 118,149 events in both runs. guest_cycle drift of ~120 cycles is pre-existing scheduler-interleaving non-determinism (documented in scheduler-determinism-plan), not introduced by 2.H.
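A run-to-run comparison of this kind can be keyed on entry_pc rather than on thread ids. The sketch below is illustrative only: `ThreadCreate` and `ctx_ptr_drift` are hypothetical names, and it assumes each entry_pc is spawned at most once per run (a later duplicate would overwrite an earlier one).

```rust
use std::collections::BTreeMap;

/// One `thread.create` observation: (entry_pc, ctx_ptr).
pub type ThreadCreate = (u32, u32);

/// Returns the entry_pcs whose ctx_ptr differs between two runs,
/// including entry_pcs present in only one of them, in ascending order.
pub fn ctx_ptr_drift(run_a: &[ThreadCreate], run_b: &[ThreadCreate]) -> Vec<u32> {
    let a: BTreeMap<u32, u32> = run_a.iter().copied().collect();
    let b: BTreeMap<u32, u32> = run_b.iter().copied().collect();
    let mut drift: Vec<u32> = a
        .keys()
        .chain(b.keys())
        .copied()
        .filter(|pc| a.get(pc) != b.get(pc))
        .collect();
    // Keys seen in both maps appear twice in the chain; collapse them.
    drift.sort_unstable();
    drift.dedup();
    drift
}
```

An empty result corresponds to the bit-stable outcome reported here; any returned entry_pc identifies a column that moved between runs.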

Secondary cascade gate results

Per prompt: cascade gates are not required for the fix to land, but status matters.

(b) Missing (op, lr) tuples (iterate-2D method)

Not re-run. Would require fresh --lr-trace of the IAT thunks (0x8284DDDC,0x8284E49C,0x8284DF5C,0x8284E07C) which is a separate capture mode. The 2.D diff script analyzes that trace and the canary audit-69/70 traces; the new ours-cold.jsonl from phase-a-event-log doesn't feed that pipeline directly. Indirect evidence: the boot trajectory hits 118,149 events identical to 2.F at the kernel-call granularity (same total, same thread set, same wedge location at guest_cycle=450,294 on tid=5 — see "tid=1 wedge" below). High confidence the 2.D fire-pattern result is UNCHANGED. Gate (b): expected UNCHANGED (28/28).

(c) Canary tids 15/27/28 ours analogs

Spawned thread entry_pc set (10 entries) is bit-identical to 2.F baseline:

0x821748f0, 0x82178950, 0x82181830, 0x822f1ee0, 0x82450a28,
0x82457ef0, 0x8245a5d0, 0x824cd458, 0x824d2878, 0x824d2940

The sub_825070F0 post-VdSwap worker fan-out (which would spawn the analogs for canary tids 15/27/28) is still absent. Gate (c): FAIL (0 → 0).

(d) Producer-rate at LR 0x824AB168

Not directly measured (would need --lr-trace=0x824AB158 re-run). Indirect indicator: identical event count + identical thread set → producer-call sequence is structurally unchanged. Gate (d): expected UNCHANGED (~9.97% → ~9.97%).

(e) tid=1 wedge timestamp

Last 3 events on the 2.H run terminate with tid=5 waiting on a single handle (semantic_id d1cc2ba936cfd448) at guest_cycle=450,294 / host_ns ≈ 797,232,750. 2.F's terminal block was tid=1 + tid=13 at the same wedge PC 0x824ac578 per its writer-report; identical event-count + identical thread set implies the same wedge geometry. Wallclock difference is pre-existing (2.F removed the 900ms VdSwap drain). Gate (e): NEUTRAL — wedge presence unchanged; ctx_ptr is now in the right bucket but the wedge is downstream of allocation.

Cascade roll-up

| gate | description | result |
|---|---|---|
| Patch LOC ≤ 200 | hard cap | PASS (96 LOC net) |
| Patch LOC 80-150 | target band | PASS (96 LOC net) |
| Build clean | warnings only, no errors | PASS |
| xenia-kernel tests | no regression, +1 new | PASS (227/227, was 226) |
| xenia-memory tests | no regression | PASS (19/19) |
| Determinism (ctx_ptr) | 2 runs bit-stable on diverging entries | PASS |
| PRIMARY: ctx_ptr in 0xAxxx-0xBxxx range | 3/3 diverging entries | PASS |
| (b) missing (op,lr) tuples drop from 28 | not re-measured; expected unchanged | n/a |
| (c) ours analogs for canary tids 15/27/28 | 0 → 0 | FAIL |
| (d) producer-rate at 0x824AB168 ≥10% | not re-measured; expected unchanged | n/a |
| (e) tid=1 wedge moved/absent | same wedge geometry | NEUTRAL |

Outcome class: PRIMARY-GATE-PASS-NO-CASCADE. The structural address-space-bucket bug is closed. The downstream cascade (worker fan-out, producer rate, wedge) is unaffected.

Why the cascade did not follow

The 2.G report (per memory index) framed the 0xBCE25640 ctx-state installer chain as the next blocker once vA0000000 was mapped. 2.H maps the bucket but does NOT address what writes the vtable at [ctx+44] to point at 0x8200A1E8 / what game-side path leads sub_824FD240+0x24 to be invoked (AUDIT-068 Session 4). Two observations:

The arena VA itself is now allocatable in ours. The previous "unmapped VA" fault under Review A Step 1's --force-spawn-workers crowbar should no longer trip on the mapping (the VA exists). But:
The arena would only be naturally allocated if the upstream guest PPC code-path that calls MmAllocatePhysicalMemoryEx with X_MEM_LARGE_PAGES and lands the arena there ever fires in ours. In 2.H, the boot trajectory still wedges at the same point — meaning the ctx-installer chain (per AUDIT-068 S4 the sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24 sequence) is downstream of the wedge and never executes.

The 2.H fix is necessary (every cooperating subsystem now has ctx_ptr in the right bucket; see the 0xbe8cbb3c, 0xbd184a40, 0xbc6c5580 entries, which DO fire pre-wedge) but not sufficient to break the wedge. The wedge is still at sub_821CB030+0x1AC per AUDIT-049, upstream of the AUDIT-068 install epoch (host_ns ≈ 9.4 s on canary, ~12× later than our wedge at ~810 ms).

Tripstone audit

#28 (per-engine tid stability): the ctx_ptr comparison is keyed on entry_pc (stable across engines) — never on the host-side tid label.
#39 (composite progression metric): the PRIMARY gate is structural (bucket-range parity), explicitly NOT a swaps/draws/RT progression claim. The fix is NOT advertised as progression. Indeed, the event-count is identical to 2.F (118,149) — guest progression is unchanged.
#40 (single-keystone framing): the framing "vA0000000 is the keystone" is PARTIALLY FALSIFIED. The structural gate passes (closing one real bug), but the predicted downstream cascade (workers spawn → producers fire → wedge unblocks) does NOT follow. Retained on its own merits; not advertised as the keystone.

Confidence

HIGH that the patch correctly maps MmAllocatePhysicalMemoryEx large-page requests to the canary vA0000000 heap range. HIGH that this is a real bug fixed (the previous 0x4xxxxxxx addresses are factually wrong vs canary's heap layout). HIGH that the cascade does not follow (all three indirect indicators are flat: identical event count, identical thread set, same wedge; gates (b) and (d) were not re-measured). MEDIUM that this fix is on the critical path of the AUDIT-068 ctx-installer chain: necessary, but downstream of the unidentified upstream cause that prevents sub_824F8398 from firing in ours at all.

Next iterate recommendation

NOT a follow-up vA-bucket-extension iterate. The vC0000000 / vE0000000 buckets are still on the legacy heap_alloc at 0x4000_0000; this is structurally wrong but unobserved on the boot trajectory (no calls in our window request 16MB or 4KB pages — the three diverging thread.creates all routed via the 64KB X_MEM_LARGE_PAGES flag, confirmed by their landing in the new allocator).

Recommended next: iterate-2I attacks the upstream cause of the AUDIT-068 install-chain non-firing. Two candidate angles:

(i) Mine canary phase-a log for the kernel-call sequence in the window host_ns ∈ [0, 1.0]s (well before the install epoch) and diff vs ours's 2.H phase-a log. The first kernel-call mismatch in that window is upstream of every observable wedge / spawn divergence. ~0 engine LOC, pure data work.
(ii) Re-attempt Review A Step 1's --force-spawn-workers now that 0xBCE25640 is allocable. Workers may still fault on missing vtable entries (the [ctx+44] = 0x8200A1E8 write is a game-side ctor that hasn't run), but the fault-class will shift from "unmapped page" to "uninitialized vtable" — a more informative divergence.
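Angle (i) reduces to finding the first position at which two kernel-call sequences diverge. A minimal sketch follows, assuming the calls have already been extracted from each phase-a log into ordered name lists (the extraction step and the `first_mismatch` helper are hypothetical). A purely positional comparison also assumes both engines interleave threads identically, so a real diff may need to be done per thread.

```rust
/// Index and values at the first position where two call sequences diverge.
/// A `None` inside the tuple means that sequence ended before the other.
/// Returns `None` overall when the sequences are identical.
pub fn first_mismatch<'a>(
    canary: &[&'a str],
    ours: &[&'a str],
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    let len = canary.len().max(ours.len());
    (0..len).find_map(|i| {
        let (c, o) = (canary.get(i).copied(), ours.get(i).copied());
        (c != o).then_some((i, c, o))
    })
}
```

Everything before the returned index is shared history, so the call at that index is the earliest candidate for the upstream divergence.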

Artifacts

Under xenia-rs/audit-runs/iterate-2H-physical-heap-vA/:

ours-cold.jsonl (118,149 events, 50M-instr run, phase-a log, md5sum 1aa11b1a4839ca8b670f53f29df2c885)
ours-cold.stdout.log / ours-cold.stderr.log (empty — quiet mode)
writer-report.md (this file)

Patch summary (text form, for review)

diff --git a/crates/xenia-kernel/src/state.rs b/crates/xenia-kernel/src/state.rs
+    pub physical_heap_cursor: std::sync::atomic::AtomicU32,
+            physical_heap_cursor: AtomicU32::new(0xC000_0000),
+    pub fn physical_heap_alloc(&self, size: u32, mem: &GuestMemory) -> Option<u32> {
+        use std::sync::atomic::Ordering;
+        if size == 0 { return None; }
+        let aligned_size = (size + 0xFFFF) & !0xFFFF;
+        let base = loop {
+            let cur = self.physical_heap_cursor.load(Ordering::Relaxed);
+            let new_cur = cur.checked_sub(aligned_size)?;
+            if new_cur < 0xA000_0000 { return None; }
+            match self.physical_heap_cursor.compare_exchange(
+                cur, new_cur, Ordering::Relaxed, Ordering::Relaxed,
+            ) { Ok(_) => break new_cur, Err(_) => continue }
+        };
+        let protect = MemoryProtect::READ | MemoryProtect::WRITE;
+        mem.alloc(base, aligned_size, protect).ok()?;
+        Some(base)
+    }

diff --git a/crates/xenia-kernel/src/exports.rs b/crates/xenia-kernel/src/exports.rs
-    let size = ctx.gpr[4] as u32;
+    let size = ctx.gpr[4] as u32;
+    let protect_bits = ctx.gpr[5] as u32;
…
-    match state.heap_alloc(size, mem) {
+    const X_MEM_LARGE_PAGES: u32 = 0x2000_0000;
+    let result = if protect_bits & X_MEM_LARGE_PAGES != 0 {
+        state.physical_heap_alloc(size, mem)
+    } else {
+        state.heap_alloc(size, mem)
+    };
+    match result {
