# Iterate 2.H: Physical heap `vA0000000` bucket (writer report)

**Date:** 2026-05-28. **LOC delta:** engine **+99 / -3** (2 files), canary **0**.
**Tests:** xenia-kernel **227 PASS** (was 226; +1 new test), xenia-memory **19 PASS**.
**Zero regressions.**

## Headline

**PRIMARY-GATE-PASS-NO-CASCADE.** All three diverging `ctx_ptr` columns now
land in the `0xAxxxxxxx-0xBxxxxxxx` canary `vA0000000` heap range (was
`0x4xxxxxxx`). The structural address-space-bucket divergence is closed.
The secondary cascade (missing producer LRs, canary tids 15/27/28 worker
fan-out, tid=1 wedge) is **unchanged**: the run produces an identical
event count (118,149) and the same set of 10 spawned thread entry_pcs as
the iterate-2F baseline. The allocation bucket was not the upstream cause
of the missing worker fan-out.

## Mode detected

Boot trajectory captured via `exec -n 50000000 --quiet --phase-a-event-log …`
(same invocation as iterate-2F-vdswap-drain-fix/ours-cold.jsonl).
The 50M-instruction budget completes in under 1 s of wall-clock time, and
ours wedges at the same set of guest PCs.

## Patch

### Files

- `xenia-rs/crates/xenia-kernel/src/state.rs`
  - **+12 LOC**: new field `physical_heap_cursor: AtomicU32` on `KernelState`
    with docstring tying it to canary memory.cc:269-271.
  - **+3 LOC**: init in `with_gpu()` to `0xC000_0000` (top-exclusive
    frontier of the `0xA0000000-0xBFFFFFFF` bucket).
  - **+37 LOC**: new method
    `physical_heap_alloc(&self, size, mem) -> Option<u32>`: a 64KB-aligned,
    top-down, CAS-loop bump allocator with a `0xA000_0000` floor check; on
    success it delegates to `mem.alloc(base, size, READ|WRITE)`.
  - **+22 LOC**: smoke test `physical_heap_alloc_descends_in_va_range`
    proving 10 consecutive 0x1234-byte allocs are descending, range-bound,
    and 64KB-aligned.

- `xenia-rs/crates/xenia-kernel/src/exports.rs`
  - **+18 / -3 LOC** in `mm_allocate_physical_memory_ex`: read `protect_bits`
    from `r5`; route `X_MEM_LARGE_PAGES` (`0x20000000`) requests to the
    new `physical_heap_alloc`, and fall through to the existing `heap_alloc`
    for non-large-page (4KB / 16MB-page) cases. Mirrors the canary
    `xboxkrnl_memory.cc:436-455` flag→heap-bucket dispatch.

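The cursor arithmetic described for `physical_heap_alloc` can be sketched in isolation. This is a minimal sketch, not the engine code: `PhysCursor` and `reserve` are hypothetical names, the `mem.alloc` mapping step is omitted, and the size rounding uses `checked_add` as a defensive assumption of this sketch.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Inclusive floor and exclusive top of the large-page bucket named in the report.
const BUCKET_FLOOR: u32 = 0xA000_0000;
const BUCKET_TOP: u32 = 0xC000_0000;
/// 64 KiB allocation granularity.
const PAGE: u32 = 0x1_0000;

/// Hypothetical stand-in for the cursor field; not the real `KernelState`.
pub struct PhysCursor(AtomicU32);

impl PhysCursor {
    pub fn new() -> Self {
        PhysCursor(AtomicU32::new(BUCKET_TOP))
    }

    /// Reserve `size` bytes top-down, rounded up to 64 KiB.
    /// Returns the base address, or `None` if the request is empty,
    /// overflows, or would drop below the bucket floor.
    pub fn reserve(&self, size: u32) -> Option<u32> {
        if size == 0 {
            return None;
        }
        let aligned = size.checked_add(PAGE - 1)? & !(PAGE - 1);
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let next = cur.checked_sub(aligned)?;
            if next < BUCKET_FLOOR {
                return None;
            }
            // Retry with the freshly observed cursor if another thread raced us.
            match self
                .0
                .compare_exchange_weak(cur, next, Ordering::Relaxed, Ordering::Relaxed)
            {
                Ok(_) => return Some(next),
                Err(seen) => cur = seen,
            }
        }
    }
}
```

Under these assumptions a 0x1234-byte request rounds up to 0x10000, so consecutive reservations from a fresh cursor return `0xBFFF_0000`, then `0xBFFE_0000`, and so on, which is the descending, 64KB-aligned pattern the smoke test checks.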
### Total git diff: 2 files, **+99 insertions / -3 deletions = 96 net LOC**.

Within the 80-150 target band, well under the 200 hard cap.

### Out-of-scope (per prompt SCOPE GUARDS; deferred to follow-up)

- `vC0000000` (16MB-page bucket) and `vE0000000` (4KB bucket): NOT wired.
  Non-large-page `MmAllocatePhysicalMemoryEx` calls still fall through
  to the legacy `heap_alloc` at `0x4000_0000` (preserves prior behavior).
- `mm_get_physical_address` masking: untouched.
- `MmFreePhysicalMemory`: untouched (no free-list yet; minimal cursor
  bump-allocator, per prompt guidance).

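The resulting routing, with only the large-page bucket wired, can be summarized in a small sketch. The `Route` enum and `route_for` function are hypothetical names used for illustration; only the `X_MEM_LARGE_PAGES` value comes from the patch.

```rust
/// Flag value taken from the patch; all other bits are ignored in this sketch.
const X_MEM_LARGE_PAGES: u32 = 0x2000_0000;

/// Hypothetical labels for the two allocators a request can reach after 2.H.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Route {
    /// New top-down allocator in the 0xA0000000-0xBFFFFFFF bucket.
    PhysicalHeapVa,
    /// Legacy `heap_alloc` at 0x4000_0000 (still used for 4KB / 16MB pages).
    LegacyHeap,
}

/// Decide where a `MmAllocatePhysicalMemoryEx` request lands, given `protect_bits`.
pub fn route_for(protect_bits: u32) -> Route {
    if protect_bits & X_MEM_LARGE_PAGES != 0 {
        Route::PhysicalHeapVa
    } else {
        Route::LegacyHeap
    }
}
```

Any request without the large-page bit, including the ones that canary would place in `vC0000000` or `vE0000000`, still resolves to the legacy route here.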
## Primary gate result

`thread.create` events with `ctx_ptr` not in the static-allocated
`0x828Fxxxx` region (the diverging entries called out by the prompt):

| entry_pc | canary ctx_ptr | 2.F (pre-fix) ctx_ptr | 2.H ctx_ptr | gate |
|---|---|---|---|---|
| `0x824cd458` | `0xbe56bb3c` | `0x42453b3c` | **`0xbe8cbb3c`** | **PASS** (in 0xAxxx-0xBxxx, low-3-bytes `0x8cbb3c` vs canary `0x56bb3c`, low-2-bytes `0xbb3c` exact-match) |
| `0x822f1ee0` | `0xbce24a40` | `0x40d0ca40` | **`0xbd184a40`** | **PASS** (in 0xAxxx-0xBxxx, low-2-bytes `0x4a40` exact-match) |
| `0x821748f0` | `0xbc365620` | `0x4024d640` | **`0xbc6c5580`** | **PASS** (in 0xAxxx-0xBxxx, high-byte `0xbc` exact-match) |

The four entries the prompt marked as static and already passing still
match exactly (`0x828f3d08`, `0x828f4838`, `0x828f3b68`, `0x828f3b08`).

**Notes:**
- Exact bit-for-bit `ctx_ptr` parity vs canary is not expected (and is not
  required by the gate) because top-down allocation order depends on
  the specific sequence of intervening `MmAllocatePhysicalMemoryEx`
  calls from other engine paths (XEX header preload, kernel objects,
  audio voice structs, etc.). The 2.H allocator services every
  `X_MEM_LARGE_PAGES` request, not just the ones on this table, so
  the cursor lands at offsets reflecting the cumulative bytes handed out
  before each `thread.create`.
- The low-bytes match (`0xbb3c` / `0x4a40`) is a strong structural
  signal: ours and canary now produce the same per-instance struct
  offsets within their respective heap pages, which means the
  `MmAllocatePhysicalMemoryEx` callers are requesting the same sizes
  in the same sequence. Only the heap top-of-cursor differs.
- The two `ctx_ptr=0x00000000` entries (0x824d2878 / 0x824d2940 audio
  worker entries) are by design (suspended audio workers spawn with a
  null context); unchanged.

**Determinism check (gate):** two consecutive 2.H runs produce
identical `thread.create` `ctx_ptr` columns (the table above is bit-stable
across runs). The event count (118,149) is likewise identical across both
runs. A `guest_cycle` drift of ~120 cycles is pre-existing
scheduler-interleaving non-determinism (documented in
scheduler-determinism-plan), not introduced by 2.H.

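The gate checks applied to the table rows can be written down as a short sketch. The helper names `in_va_bucket` and `page_offset` are hypothetical; the addresses are copied from the table, and the 64KB page size used for the offset mask is the allocator granularity stated in the patch section.

```rust
/// True if `ptr` falls in the 0xA0000000-0xBFFFFFFF bucket (top is exclusive).
pub fn in_va_bucket(ptr: u32) -> bool {
    (0xA000_0000u32..0xC000_0000u32).contains(&ptr)
}

/// Offset of `ptr` within its 64KB page (the "low-2-bytes" of the report).
pub fn page_offset(ptr: u32) -> u32 {
    ptr & 0xFFFF
}

/// (entry_pc, canary ctx_ptr, 2.F ctx_ptr, 2.H ctx_ptr), from the table above.
pub const ROWS: [(u32, u32, u32, u32); 3] = [
    (0x824c_d458, 0xbe56_bb3c, 0x4245_3b3c, 0xbe8c_bb3c),
    (0x822f_1ee0, 0xbce2_4a40, 0x40d0_ca40, 0xbd18_4a40),
    (0x8217_48f0, 0xbc36_5620, 0x4024_d640, 0xbc6c_5580),
];

/// Primary gate: every 2.H pointer is in the bucket.
pub fn primary_gate_passes() -> bool {
    ROWS.iter().all(|&(_, _, _, post_fix)| in_va_bucket(post_fix))
}
```

Only the bucket-range test is the gate; `page_offset` is there to reproduce the low-bytes observation for the first two rows, and it intentionally does not match on the third row, where only the high byte agrees.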
## Secondary cascade gate results

Per prompt: cascade gates are not required for the fix to land, but
status matters.

### (b) Missing (op, lr) tuples (iterate-2D method)

Not re-run. It would require a fresh `--lr-trace` of the IAT thunks
(`0x8284DDDC,0x8284E49C,0x8284DF5C,0x8284E07C`), which is a separate
capture mode. The 2.D diff script analyzes that trace and the canary
audit-69/70 traces; the new ours-cold.jsonl from phase-a-event-log
doesn't feed that pipeline directly. Indirect evidence: the boot
trajectory hits 118,149 events, identical to 2.F at kernel-call
granularity (same total, same thread set, same wedge location at
guest_cycle=450,294 on tid=5; see "tid=1 wedge" below). High
confidence the 2.D fire-pattern result is **UNCHANGED**.
**Gate (b): expected UNCHANGED (28/28).**

### (c) Canary tids 15/27/28 ours analogs

Spawned thread entry_pc set (10 entries) is **bit-identical** to the 2.F
baseline:

```
0x821748f0, 0x82178950, 0x82181830, 0x822f1ee0, 0x82450a28,
0x82457ef0, 0x8245a5d0, 0x824cd458, 0x824d2878, 0x824d2940
```

The `sub_825070F0` post-VdSwap worker fan-out (which would spawn the
analogs for canary tids 15/27/28) is **still absent**. **Gate (c): FAIL
(0 → 0).**

### (d) Producer-rate at LR 0x824AB168

Not directly measured (would need a `--lr-trace=0x824AB158` re-run).
Indirect indicator: identical event count + identical thread set →
producer-call sequence is structurally unchanged. **Gate (d): expected
UNCHANGED (~9.97% → ~9.97%).**

### (e) tid=1 wedge timestamp

The last 3 events of the 2.H run terminate with tid=5 waiting on a single
handle (semantic_id `d1cc2ba936cfd448`) at `guest_cycle=450,294` /
`host_ns ≈ 797,232,750`. 2.F's terminal block was tid=1 + tid=13 at
the same wedge PC `0x824ac578` per its writer-report; an identical
event count plus an identical thread set implies the same wedge geometry.
The wall-clock difference is pre-existing (2.F removed the 900ms VdSwap
drain). **Gate (e): NEUTRAL. Wedge presence is unchanged; `ctx_ptr` is now
in the right bucket, but the wedge is downstream of allocation.**

## Cascade roll-up

| gate | description | result |
|------|-------------|--------|
| Patch LOC ≤ 200 | hard cap | **PASS** (96 LOC net) |
| Patch LOC 80-150 | target band | **PASS** (96 LOC net) |
| Build clean | warnings only, no errors | **PASS** |
| xenia-kernel tests | no regression, +1 new | **PASS** (227/227, was 226) |
| xenia-memory tests | no regression | **PASS** (19/19) |
| Determinism (ctx_ptr) | 2 runs bit-stable on diverging entries | **PASS** |
| PRIMARY: ctx_ptr in 0xAxxx-0xBxxx range | 3/3 diverging entries | **PASS** |
| (b) missing (op,lr) tuples drop from 28 | not re-measured; expected unchanged | n/a |
| (c) ours analogs for canary tids 15/27/28 | 0 → 0 | **FAIL** |
| (d) producer-rate at 0x824AB168 ≥10% | not re-measured; expected unchanged | n/a |
| (e) tid=1 wedge moved/absent | same wedge geometry | NEUTRAL |

**Outcome class: PRIMARY-GATE-PASS-NO-CASCADE.** The structural
address-space-bucket bug is closed. The downstream cascade (worker
fan-out, producer rate, wedge) is unaffected.

## Why the cascade did not follow

The 2.G report (per memory index) framed the `0xBCE25640` ctx-state
installer chain as the next blocker once vA0000000 was mapped. 2.H
maps the bucket but does NOT address what writes the vtable at
`[ctx+44]` to point at `0x8200A1E8`, or what game-side path leads
`sub_824FD240+0x24` to be invoked (AUDIT-068 Session 4). Two observations:

1. The arena VA itself is now allocatable in ours. The previous
   "unmapped VA" fault under Review A Step 1's `--force-spawn-workers`
   crowbar should no longer trip on the mapping (the VA exists).
2. However, the arena would only be naturally allocated if the upstream
   guest PPC code-path that calls `MmAllocatePhysicalMemoryEx` with
   `X_MEM_LARGE_PAGES` and lands the arena there ever fires in ours.
   In 2.H, the boot trajectory still wedges at the same point,
   meaning the ctx-installer chain (per AUDIT-068 S4 the
   `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24`
   sequence) is downstream of the wedge and never executes.

The 2.H fix is **necessary** (every cooperating subsystem now has
`ctx_ptr` in the right bucket; see the 0xbe8cbb3c, 0xbd184a40, and
0xbc6c5580 entries, which DO fire pre-wedge) but **not sufficient** to
break the wedge. The wedge is still at `sub_821CB030+0x1AC` per AUDIT-049,
upstream of the AUDIT-068 install epoch (host_ns ≈ 9.4 s on canary, roughly
12× later than the wedge in ours at ~810 ms).

## Tripstone audit

- **#28** (per-engine tid stability): the `ctx_ptr` comparison is keyed on
  `entry_pc` (stable across engines), never on the host-side tid label.
- **#39** (composite progression metric): the PRIMARY gate is
  **structural** (bucket-range parity), explicitly NOT a swaps/draws/RT
  progression claim. The fix is NOT advertised as progression. Indeed,
  the event count is identical to 2.F (118,149), so guest progression is
  unchanged.
- **#40** (single-keystone framing): the framing "vA0000000 is the
  keystone" is **PARTIALLY FALSIFIED**. The structural gate passes
  (closing one real bug), but the predicted downstream cascade
  (workers spawn → producers fire → wedge unblocks) does NOT follow.
  The fix is retained on its own merits; it is not advertised as the
  keystone.

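The `entry_pc`-keyed comparison from #28, which is also what the run-to-run determinism check relies on, can be sketched as follows. This is an illustration under assumptions: `ctx_ptr_mismatches` is a hypothetical helper, each run is given as already-extracted `(entry_pc, ctx_ptr)` pairs from its `thread.create` events, and each `entry_pc` is assumed to appear at most once per run.

```rust
use std::collections::BTreeMap;

/// Return the sorted `entry_pc` values whose `ctx_ptr` differs between two
/// runs, or which appear in only one of them. Host-side tids are never used.
pub fn ctx_ptr_mismatches(run_a: &[(u32, u32)], run_b: &[(u32, u32)]) -> Vec<u32> {
    // Key both runs on entry_pc so event order and tid labels do not matter.
    let a: BTreeMap<u32, u32> = run_a.iter().copied().collect();
    let b: BTreeMap<u32, u32> = run_b.iter().copied().collect();

    let mut out: Vec<u32> = a
        .keys()
        .chain(b.keys())
        .copied()
        .filter(|pc| a.get(pc) != b.get(pc))
        .collect();
    // A key present in both maps with different values is visited twice.
    out.sort_unstable();
    out.dedup();
    out
}
```

An empty result for two consecutive runs of the same build corresponds to the "bit-stable" determinism claim; running the same function on an ours-vs-canary pair lists the diverging entries instead.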
## Confidence

- **HIGH** that the patch correctly maps `MmAllocatePhysicalMemoryEx`
  large-page requests to the canary `vA0000000` heap range.
- **HIGH** that this is a real bug fixed (the previous `0x4xxxxxxx`
  addresses are factually wrong vs canary's heap layout).
- **HIGH** that the cascade does not follow (three of three cascade
  indicators are flat: identical event count, identical thread set, same
  wedge).
- **MEDIUM** that this fix is on the critical path of the AUDIT-068
  ctx-installer chain: it is necessary, but it sits downstream of the
  unidentified upstream cause that prevents `sub_824F8398` from firing in
  ours at all.

## Next iterate recommendation

**NOT a follow-up vA-bucket-extension iterate.** The vC0000000 /
vE0000000 buckets are still on the legacy `heap_alloc` at
`0x4000_0000`; this is structurally wrong but unobserved on the
boot trajectory (no calls in our window request 16MB or 4KB pages;
the three diverging `thread.create`s all routed via the 64KB
`X_MEM_LARGE_PAGES` flag, confirmed by their landing in the new
allocator).

**Recommended next**: iterate-2I attacks the upstream cause of the
AUDIT-068 install-chain non-firing. Two candidate angles:
- (i) Mine the canary phase-a log for the kernel-call sequence in the
  first 1.0 s of host time (well before the install epoch) and
  diff it against the 2.H phase-a log from ours. The first kernel-call
  mismatch in that window is upstream of every observable wedge / spawn
  divergence. **~0 engine LOC**, pure data work.
- (ii) Re-attempt Review A Step 1's `--force-spawn-workers` now that
  `0xBCE25640` is allocatable. Workers may still fault on missing
  vtable entries (the `[ctx+44] = 0x8200A1E8` write is a game-side
  ctor that hasn't run), but the fault class will shift from
  "unmapped page" to "uninitialized vtable", a more informative
  divergence.

## Artifacts

Under `xenia-rs/audit-runs/iterate-2H-physical-heap-vA/`:

- `ours-cold.jsonl` (118,149 events, 50M-instr run, phase-a log,
  md5sum `1aa11b1a4839ca8b670f53f29df2c885`)
- `ours-cold.stdout.log` / `ours-cold.stderr.log` (empty; quiet mode)
- `writer-report.md` (this file)

## Patch summary (text form, for review)

```
diff --git a/crates/xenia-kernel/src/state.rs b/crates/xenia-kernel/src/state.rs
+    pub physical_heap_cursor: std::sync::atomic::AtomicU32,
+            physical_heap_cursor: AtomicU32::new(0xC000_0000),
+    pub fn physical_heap_alloc(&self, size: u32, mem: &GuestMemory) -> Option<u32> {
+        use std::sync::atomic::Ordering;
+        if size == 0 { return None; }
+        let aligned_size = (size + 0xFFFF) & !0xFFFF;
+        let base = loop {
+            let cur = self.physical_heap_cursor.load(Ordering::Relaxed);
+            let new_cur = cur.checked_sub(aligned_size)?;
+            if new_cur < 0xA000_0000 { return None; }
+            match self.physical_heap_cursor.compare_exchange(
+                cur, new_cur, Ordering::Relaxed, Ordering::Relaxed,
+            ) { Ok(_) => break new_cur, Err(_) => continue }
+        };
+        let protect = MemoryProtect::READ | MemoryProtect::WRITE;
+        mem.alloc(base, aligned_size, protect).ok()?;
+        Some(base)
+    }

diff --git a/crates/xenia-kernel/src/exports.rs b/crates/xenia-kernel/src/exports.rs
-    let size = ctx.gpr[4] as u32;
+    let size = ctx.gpr[4] as u32;
+    let protect_bits = ctx.gpr[5] as u32;
…
-    match state.heap_alloc(size, mem) {
+    const X_MEM_LARGE_PAGES: u32 = 0x2000_0000;
+    let result = if protect_bits & X_MEM_LARGE_PAGES != 0 {
+        state.physical_heap_alloc(size, mem)
+    } else {
+        state.heap_alloc(size, mem)
+    };
+    match result {
```