xenia-rs/audit-runs/iterate-2H-physical-heap-vA/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.H — Physical heap `vA0000000` bucket (writer report)
**Date:** 2026-05-28. **LOC delta:** engine **+99 / -3** (2 files), canary **0**.
**Tests:** xenia-kernel **227 PASS** (was 226 — +1 new test), xenia-memory **19 PASS**.
**Zero regressions.**
## Headline
**PRIMARY-GATE-PASS-NO-CASCADE.** All three diverging `ctx_ptr` columns now
land in the `0xAxxxxxxx-0xBxxxxxxx` canary `vA0000000` heap range (was
`0x4xxxxxxx`). The structural address-space-bucket divergence is closed.
The secondary cascade (missing producer LRs, canary tids 15/27/28 worker
fan-out, tid=1 wedge) is **unchanged**: the run produces the same
event count (118,149) and the same set of 10 spawned thread entry_pcs as
the iterate-2F baseline. The allocation bucket was not the upstream cause of
the worker-fan-out absence.
## Mode detected
Boot trajectory captured via `exec -n 50000000 --quiet --phase-a-event-log
…` (same invocation as iterate-2F-vdswap-drain-fix/ours-cold.jsonl).
50M-instruction budget completes in <1 s wallclock and ours wedges at
the same set of guest PCs.
## Patch
### Files
- `xenia-rs/crates/xenia-kernel/src/state.rs`
  - **+12 LOC**: new field `physical_heap_cursor: AtomicU32` on `KernelState`
    with docstring tying it to canary memory.cc:269-271.
  - **+3 LOC**: init in `with_gpu()` to `0xC000_0000` (top-exclusive
    frontier of the `0xA0000000-0xBFFFFFFF` bucket).
  - **+37 LOC**: new method `physical_heap_alloc(&self, size, mem) ->
    Option<u32>`: 64KB-aligned, top-down, CAS-loop bump allocator with
    `0xA000_0000` floor check; on success delegates to
    `mem.alloc(base, size, READ|WRITE)`.
  - **+22 LOC**: smoke test `physical_heap_alloc_descends_in_va_range`
    proving 10 consecutive 0x1234-byte allocs are descending, range-bound,
    and 64KB-aligned.
- `xenia-rs/crates/xenia-kernel/src/exports.rs`
  - **+18 / -3 LOC** in `mm_allocate_physical_memory_ex`: read `protect_bits`
    from `r5`; route `X_MEM_LARGE_PAGES` (`0x20000000`) requests to the
    new `physical_heap_alloc`, fall through to existing `heap_alloc` for
    non-large-page (4KB / 16MB-page) cases. Mirrors canary
    `xboxkrnl_memory.cc:436-455` flag→heap-bucket dispatch.
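
The cursor arithmetic described above can be sketched in isolation. This is a simplified stand-in, not the patched code: `PhysicalHeap` is a hypothetical type that holds only the cursor, it does not call `mem.alloc`, and it uses `checked_add` for the round-up, which is an assumption of this sketch rather than something taken from the patch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Exclusive top of the `0xA0000000-0xBFFFFFFF` bucket (initial cursor value).
const HEAP_TOP: u32 = 0xC000_0000;
/// Inclusive floor of the bucket.
const HEAP_FLOOR: u32 = 0xA000_0000;
/// 64KB large-page granularity.
const PAGE: u32 = 0x1_0000;

/// Hypothetical stand-in for the cursor portion of `KernelState`.
pub struct PhysicalHeap {
    cursor: AtomicU32,
}

impl PhysicalHeap {
    pub fn new() -> Self {
        Self { cursor: AtomicU32::new(HEAP_TOP) }
    }

    /// Reserve `size` bytes (rounded up to 64KB) directly below the cursor.
    /// Returns the base address, or `None` for an empty request or one that
    /// would cross the bucket floor.
    pub fn alloc(&self, size: u32) -> Option<u32> {
        if size == 0 {
            return None;
        }
        // Round up to the next 64KB multiple; checked_add guards u32 overflow.
        let aligned = size.checked_add(PAGE - 1)? & !(PAGE - 1);
        let mut cur = self.cursor.load(Ordering::Relaxed);
        loop {
            let new_cur = cur.checked_sub(aligned)?;
            if new_cur < HEAP_FLOOR {
                return None;
            }
            // Retry with the freshly observed cursor if another thread won the race.
            match self.cursor.compare_exchange_weak(
                cur, new_cur, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => return Some(new_cur),
                Err(observed) => cur = observed,
            }
        }
    }
}
```

With these constants, a 0x1234-byte request rounds up to 0x10000, so the first allocation lands at `0xBFFF_0000` and ten consecutive ones end at `0xBFF6_0000`, which is the descending, range-bound, 64KB-aligned pattern the smoke test checks.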
### Total git diff: 2 files, **+99 insertions / -3 deletions = 96 net LOC**.
Within the 80-150 target band, well under the 200 hard cap.
### Out-of-scope (per prompt SCOPE GUARDS — deferred to follow-up)
- `vC0000000` (16MB-page bucket) and `vE0000000` (4KB bucket) — NOT wired.
Non-large-page `MmAllocatePhysicalMemoryEx` calls still fall through
to the legacy `heap_alloc` at `0x4000_0000` (preserves prior behavior).
- `mm_get_physical_address` masking — untouched.
- `MmFreePhysicalMemory` — untouched (no free-list yet; minimal cursor
bump-allocator, per prompt guidance).
## Primary gate result
`thread.create` events with `ctx_ptr` not in static-allocated
`0x828Fxxxx` region (the diverging entries called out by the prompt):
| entry_pc | canary ctx_ptr | 2.F (pre-fix) ctx_ptr | 2.H ctx_ptr | gate |
|---|---|---|---|---|
| `0x824cd458` | `0xbe56bb3c` | `0x42453b3c` | **`0xbe8cbb3c`** | **PASS** (in 0xAxxx-0xBxxx, low-3-bytes `0x8cbb3c` vs canary `0x56bb3c`, low-2-bytes `0xbb3c` exact-match) |
| `0x822f1ee0` | `0xbce24a40` | `0x40d0ca40` | **`0xbd184a40`** | **PASS** (in 0xAxxx-0xBxxx, low-2-bytes `0x4a40` exact-match) |
| `0x821748f0` | `0xbc365620` | `0x4024d640` | **`0xbc6c5580`** | **PASS** (in 0xAxxx-0xBxxx, high-byte `0xbc` exact-match) |
The four entries the prompt called "static — already passes" still
match exactly (`0x828f3d08`, `0x828f4838`, `0x828f3b68`, `0x828f3b08`).
**Notes:**
- Exact bit-for-bit ctx_ptr parity vs canary is not expected (and is not
required by the gate) because top-down allocation order depends on
the specific sequence of intervening `MmAllocatePhysicalMemoryEx`
calls from other engine paths (XEX header preload, kernel objects,
audio voice structs, etc.). The 2.H allocator services every
`X_MEM_LARGE_PAGES` request, not just those behind the three entries
in this table, so the cursor lands at offsets reflecting the cumulative
bytes allocated before each `thread.create`.
- The low-2-bytes match on the first two rows (`0xbb3c` / `0x4a40`) is a
strong structural signal: ours and canary now produce the same
per-instance struct offsets within their respective heap pages, which
suggests the `MmAllocatePhysicalMemoryEx` callers are requesting the
same sizes in the same sequence for those allocations. Only the heap
top-of-cursor differs. The third row matches only in the high byte
(`0xbc`), so this inference does not extend to it.
- The two `ctx_ptr=0x00000000` entries (0x824d2878 / 0x824d2940 audio
worker entries) are by-design (suspended audio workers spawn with
null context); unchanged.
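
The gate criteria used in the table can be restated as a small check. This is a sketch only: the helper names `in_va_bucket` and `same_page_offset` and the `ROWS` constant are hypothetical, and the addresses are copied from the table above.

```rust
/// True if `addr` lies in the `0xA0000000-0xBFFFFFFF` range the gate requires.
pub fn in_va_bucket(addr: u32) -> bool {
    (0xA000_0000..=0xBFFF_FFFF).contains(&addr)
}

/// True if two addresses share the same offset within a 64KB page
/// (the "low-2-bytes exact-match" column note).
pub fn same_page_offset(a: u32, b: u32) -> bool {
    (a & 0xFFFF) == (b & 0xFFFF)
}

/// (entry_pc, canary ctx_ptr, 2.F ctx_ptr, 2.H ctx_ptr), taken from the table.
pub const ROWS: [(u32, u32, u32, u32); 3] = [
    (0x824c_d458, 0xbe56_bb3c, 0x4245_3b3c, 0xbe8c_bb3c),
    (0x822f_1ee0, 0xbce2_4a40, 0x40d0_ca40, 0xbd18_4a40),
    (0x8217_48f0, 0xbc36_5620, 0x4024_d640, 0xbc6c_5580),
];
```

Applied to `ROWS`, every 2.H value passes `in_va_bucket` and every 2.F value fails it; `same_page_offset` holds between canary and 2.H for the first two rows but not the third, which agrees only in its top byte.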
**Determinism check (gate):** two consecutive 2.H runs produce
identical `thread.create` `ctx_ptr` columns (the table above is bit-stable
across runs). The event count is likewise identical across both runs
(118,149 events). The `guest_cycle` drift of ~120 cycles is pre-existing
scheduler-interleaving non-determinism (documented in
scheduler-determinism-plan), not introduced by 2.H.
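
One way to express the run-to-run comparison is to key each `thread.create` on `entry_pc` and diff the resulting maps. The sketch below assumes each `entry_pc` is spawned at most once per run and that the log has already been reduced to `(entry_pc, ctx_ptr)` pairs; `Spawn` and `unstable_entries` are hypothetical names, not part of the existing tooling.

```rust
use std::collections::BTreeMap;

/// One `thread.create` observation reduced to (entry_pc, ctx_ptr). Hypothetical shape.
pub type Spawn = (u32, u32);

/// Returns the entry_pcs whose ctx_ptr differs between the two runs,
/// or that appear in only one of them, in ascending order.
pub fn unstable_entries(run_a: &[Spawn], run_b: &[Spawn]) -> Vec<u32> {
    let a: BTreeMap<u32, u32> = run_a.iter().copied().collect();
    let b: BTreeMap<u32, u32> = run_b.iter().copied().collect();
    let mut out = Vec::new();
    // Entries present in A that are missing from B or carry a different ctx_ptr.
    for (pc, ctx) in &a {
        if b.get(pc) != Some(ctx) {
            out.push(*pc);
        }
    }
    // Entries that exist only in B.
    for pc in b.keys() {
        if !a.contains_key(pc) {
            out.push(*pc);
        }
    }
    out.sort_unstable();
    out
}
```

An empty result corresponds to the "bit-stable" outcome reported here. Because the comparison is keyed on `entry_pc`, the order in which threads were spawned does not affect it.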
## Secondary cascade gate results
Per prompt: cascade gates are not required for the fix to land, but
status matters.
### (b) Missing (op, lr) tuples (iterate-2D method)
Not re-run. Would require fresh `--lr-trace` of the IAT thunks
(`0x8284DDDC,0x8284E49C,0x8284DF5C,0x8284E07C`) which is a separate
capture mode. The 2.D diff script analyzes that trace and the canary
audit-69/70 traces; the new ours-cold.jsonl from phase-a-event-log
doesn't feed that pipeline directly. Indirect evidence: the boot
trajectory hits 118,149 events identical to 2.F at the kernel-call
granularity (same total, same thread set, same wedge location at
guest_cycle=450,294 on tid=5 — see "tid=1 wedge" below). High
confidence the 2.D fire-pattern result is **UNCHANGED**.
**Gate (b): expected UNCHANGED (28/28).**
### (c) Canary tids 15/27/28 ours analogs
Spawned thread entry_pc set (10 entries) is **bit-identical** to 2.F
baseline:
```
0x821748f0, 0x82178950, 0x82181830, 0x822f1ee0, 0x82450a28,
0x82457ef0, 0x8245a5d0, 0x824cd458, 0x824d2878, 0x824d2940
```
The `sub_825070F0` post-VdSwap worker fan-out (which would spawn the
analogs for canary tids 15/27/28) is **still absent**. **Gate (c): FAIL
(0 → 0).**
### (d) Producer-rate at LR 0x824AB168
Not directly measured (would need `--lr-trace=0x824AB158` re-run).
Indirect indicator: identical event count + identical thread set →
producer-call sequence is structurally unchanged. **Gate (d): expected
UNCHANGED (~9.97% → ~9.97%).**
### (e) tid=1 wedge timestamp
Last 3 events on the 2.H run terminate with tid=5 waiting on a single
handle (semantic_id `d1cc2ba936cfd448`) at `guest_cycle=450,294` /
`host_ns ≈ 797,232,750`. 2.F's terminal block was tid=1 + tid=13 at
the same wedge PC `0x824ac578` per its writer-report; identical
event-count + identical thread set implies the same wedge geometry.
Wallclock difference is pre-existing (2.F removed the 900ms VdSwap
drain). **Gate (e): NEUTRAL — wedge presence unchanged; ctx_ptr is now
in the right bucket but the wedge is downstream of allocation.**
## Cascade roll-up
| gate | description | result |
|------|-------------|--------|
| Patch LOC ≤ 200 | hard cap | **PASS** (96 LOC net) |
| Patch LOC 80-150 | target band | **PASS** (96 LOC net) |
| Build clean | warnings only, no errors | **PASS** |
| xenia-kernel tests | no regression, +1 new | **PASS** (227/227, was 226) |
| xenia-memory tests | no regression | **PASS** (19/19) |
| Determinism (ctx_ptr) | 2 runs bit-stable on diverging entries | **PASS** |
| PRIMARY: ctx_ptr in 0xAxxx-0xBxxx range | 3/3 diverging entries | **PASS** |
| (b) missing (op,lr) tuples drop from 28 | not re-measured; expected unchanged | n/a |
| (c) ours analogs for canary tids 15/27/28 | 0 → 0 | **FAIL** |
| (d) producer-rate at 0x824AB168 ≥10% | not re-measured; expected unchanged | n/a |
| (e) tid=1 wedge moved/absent | same wedge geometry | NEUTRAL |
**Outcome class: PRIMARY-GATE-PASS-NO-CASCADE.** The structural
address-space-bucket bug is closed. The downstream cascade (worker
fan-out, producer rate, wedge) is unaffected.
## Why the cascade did not follow
The 2.G report (per memory index) framed the `0xBCE25640` ctx-state
installer chain as the next blocker once vA0000000 was mapped. 2.H
maps the bucket but does NOT address what writes the vtable at
`[ctx+44]` to point at `0x8200A1E8` / what game-side path leads
`sub_824FD240+0x24` to be invoked (AUDIT-068 Session 4). Two observations:
1. The arena VA itself is now allocatable in ours. The previous
"unmapped VA" fault under Review A Step 1's `--force-spawn-workers`
crowbar should no longer trip on the mapping (the VA exists). But:
2. The arena would only be naturally allocated if the upstream guest
PPC code-path that calls `MmAllocatePhysicalMemoryEx` with
`X_MEM_LARGE_PAGES` and lands the arena there ever fires in ours.
In 2.H, the boot trajectory still wedges at the same point —
meaning the ctx-installer chain (per AUDIT-068 S4 the
`sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24`
sequence) is downstream of the wedge and never executes.
The 2.H fix is **necessary** (every cooperating subsystem now has
ctx_ptr in the right bucket — see the 0xbe8cbb3c, 0xbd184a40,
0xbc6c5580 entries which DO fire pre-wedge) but **not sufficient** to
break the wedge. The wedge is still at `sub_821CB030+0x1AC` per AUDIT-049,
upstream of the AUDIT-068 install epoch (host_ns ≈ 9.4 s on canary, roughly
12× later than the wedge in ours at ~810 ms).
## Tripstone audit
- **#28** (per-engine tid stability): the ctx_ptr comparison is keyed on
`entry_pc` (stable across engines) — never on the host-side tid label.
- **#39** (composite progression metric): the PRIMARY gate is
**structural** (bucket-range parity), explicitly NOT a swaps/draws/RT
progression claim. The fix is NOT advertised as progression. Indeed,
the event-count is identical to 2.F (118,149) — guest progression is
unchanged.
- **#40** (single-keystone framing): the framing "vA0000000 is the
keystone" is **PARTIALLY FALSIFIED**. The structural gate passes
(closing one real bug), but the predicted downstream cascade
(workers spawn → producers fire → wedge unblocks) does NOT follow.
Retained on its own merits; not advertised as the keystone.
## Confidence
**HIGH** that the patch correctly maps `MmAllocatePhysicalMemoryEx`
large-page requests to the canary `vA0000000` heap range.
**HIGH** that this is a real bug fixed (the previous `0x4xxxxxxx`
addresses are factually wrong vs canary's heap layout).
**HIGH** that the cascade does not follow (all three cascade indicators
are flat: identical event count, identical thread set, same wedge).
**MEDIUM** that this fix is on the critical path of the AUDIT-068
ctx-installer chain — necessary but downstream of the unidentified
upstream cause that prevents `sub_824F8398` from firing in ours at
all.
## Next iterate recommendation
**NOT a follow-up vA-bucket-extension iterate.** The vC0000000 /
vE0000000 buckets are still on the legacy `heap_alloc` at
`0x4000_0000`; this is structurally wrong but unobserved on the
boot trajectory (no calls in our window request 16MB or 4KB pages —
the three diverging `thread.create`s all routed via the 64KB
`X_MEM_LARGE_PAGES` flag, confirmed by their landing in the new
allocator).
**Recommended next**: iterate-2I attacks the upstream cause of the
AUDIT-068 install-chain non-firing. Two candidate angles:
- (i) Mine canary phase-a log for the kernel-call sequence in the
window `host_ns ∈ [0, 1.0]s` (well before the install epoch) and
diff vs ours's 2.H phase-a log. The first kernel-call mismatch in
that window is upstream of every observable wedge / spawn
divergence. **~0 engine LOC**, pure data work.
- (ii) Re-attempt Review A Step 1's `--force-spawn-workers` now that
`0xBCE25640` is allocable. Workers may still fault on missing
vtable entries (the `[ctx+44] = 0x8200A1E8` write is a game-side
ctor that hasn't run), but the fault-class will shift from
"unmapped page" to "uninitialized vtable" — a more informative
divergence.
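
For angle (i), the core operation is locating the first index at which two kernel-call sequences disagree. A minimal sketch, assuming both logs have already been reduced to comparable records within the window; `first_mismatch` is a hypothetical helper and the record shape is left generic because the phase-a log schema is not restated in this report:

```rust
/// Index of the first position where the two sequences differ.
/// If one sequence is a strict prefix of the other, the mismatch is
/// reported at the end of the shorter one. Returns `None` when equal.
pub fn first_mismatch<T: PartialEq>(canary: &[T], ours: &[T]) -> Option<usize> {
    let common = canary.len().min(ours.len());
    // Scan the overlapping part for the first differing record.
    if let Some(i) = (0..common).find(|&i| canary[i] != ours[i]) {
        return Some(i);
    }
    // No difference in the overlap: lengths decide.
    if canary.len() != ours.len() {
        Some(common)
    } else {
        None
    }
}
```

Given the scheduler-interleaving non-determinism noted earlier, a single global sequence may be too noisy to diff directly. Running this per thread, with threads matched on `entry_pc` as in Tripstone #28, is likely the more useful granularity.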
## Artifacts
Under `xenia-rs/audit-runs/iterate-2H-physical-heap-vA/`:
- `ours-cold.jsonl` (118,149 events, 50M-instr run, phase-a log,
md5sum `1aa11b1a4839ca8b670f53f29df2c885`)
- `ours-cold.stdout.log` / `ours-cold.stderr.log` (empty — quiet mode)
- `writer-report.md` (this file)
## Patch summary (text form, for review)
```
diff --git a/crates/xenia-kernel/src/state.rs b/crates/xenia-kernel/src/state.rs
+    pub physical_heap_cursor: std::sync::atomic::AtomicU32,
+    physical_heap_cursor: AtomicU32::new(0xC000_0000),
+    pub fn physical_heap_alloc(&self, size: u32, mem: &GuestMemory) -> Option<u32> {
+        use std::sync::atomic::Ordering;
+        if size == 0 { return None; }
+        let aligned_size = (size + 0xFFFF) & !0xFFFF;
+        let base = loop {
+            let cur = self.physical_heap_cursor.load(Ordering::Relaxed);
+            let new_cur = cur.checked_sub(aligned_size)?;
+            if new_cur < 0xA000_0000 { return None; }
+            match self.physical_heap_cursor.compare_exchange(
+                cur, new_cur, Ordering::Relaxed, Ordering::Relaxed,
+            ) { Ok(_) => break new_cur, Err(_) => continue }
+        };
+        let protect = MemoryProtect::READ | MemoryProtect::WRITE;
+        mem.alloc(base, aligned_size, protect).ok()?;
+        Some(base)
+    }
diff --git a/crates/xenia-kernel/src/exports.rs b/crates/xenia-kernel/src/exports.rs
-    let size = ctx.gpr[4] as u32;
+    let size = ctx.gpr[4] as u32;
+    let protect_bits = ctx.gpr[5] as u32;
-    match state.heap_alloc(size, mem) {
+    const X_MEM_LARGE_PAGES: u32 = 0x2000_0000;
+    let result = if protect_bits & X_MEM_LARGE_PAGES != 0 {
+        state.physical_heap_alloc(size, mem)
+    } else {
+        state.heap_alloc(size, mem)
+    };
+    match result {
```