feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump
Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.
Adds two read-only diagnostics for the parked-waiter producer hunt:
* `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
line capturing live r3 (= `this` in MSVC PPC ctors), lr
(= return site), sp, plus an 8-frame back-chain with
saved-r31/r30 per frame. Fires once per hit, exactly what the
8-instance-pool probe needed.
* `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
run (after the FOCUS report in `dump_thread_diagnostic`), each
address gets a 128-byte hex + be32 + ASCII dump. Used to
inspect the static dispatcher / job-queue struct layouts
AUDIT-003 identified.
Both gated default-off; empty set is a single `is_empty()` test on
the hot path. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
three handles are Meyers-style singletons with one dispatcher
each. The "called 8 times" claim came from miscounting raw
entries to the OUTER getter sub_8217C850 — but that getter is
itself a Meyers-singleton-getter; only the FIRST entry cascades
through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).
2. **The producer indirection layer is the singleton-getter
itself.** Static byte-scan of .rdata / .data shows 0 hits for
the dispatcher addresses — no static registry table holds them.
But the xrefs table for the OUTER getters reveals 5–6 callers
each, MOSTLY non-create-chain, sharing the canonical producer
pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
AUDIT-003 xref audit was necessary but not sufficient — it
correctly saw "no direct producer references" but missed the
singleton-getter indirection layer.
3. **Dispatcher struct layouts** (128-byte dumps captured at -n
50M --halt-on-deadlock):
- 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
thread_handle at +0x48 (0x1010), self-pointer at +0x74,
capacity 7 at +0x28, queue empty (+0/+3C = -1).
- 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
- 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in
0x4xxxxxxx range — noticeably different layout from the
other two pure POD job queues.
Files:
crates/xenia-kernel/src/state.rs ctor_probe_pcs / dump_addrs +
fire_ctor_probe_if_match + 2 tests
crates/xenia-app/src/main.rs Exec --ctor-probe / --dump-addr
CLI parsing, prologue hook,
end-of-run struct dumper
audit-findings.md KRNBUG-AUDIT-004 entry
audit-runs/audit-004/ 50M probe runs (v1 outer-getter
hits, v2 inner-ctor hits proving
the singleton hypothesis)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -4457,3 +4457,168 @@ function arg (no constant-load), but the simple
|
|||||||
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
|
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
|
||||||
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
|
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
|
||||||
Open KRNBUG-AUDIT-004 for that wait path.
|
Open KRNBUG-AUDIT-004 for that wait path.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### KRNBUG-AUDIT-004 — `--ctor-probe` PC hook + `--dump-addr` struct dump; producer-indirection layer identified; "8-instance pool" hypothesis falsified
|
||||||
|
|
||||||
|
**Status**: landed on master (no-ff merge of feature branch
|
||||||
|
`dispatcher-probe-audit/p0-ctor-probe-and-struct-dump`). Diagnostic-
|
||||||
|
only, read-only, lockstep-preserved (`instructions=100000002` at
|
||||||
|
`-n 100M --stable-digest`).
|
||||||
|
|
||||||
|
**Tests**: 586 → **588**.
|
||||||
|
|
||||||
|
**What landed (`crates/xenia-kernel/src/state.rs`):**
|
||||||
|
- `pub ctor_probe_pcs: HashSet<u32>` field on `KernelState` (default
|
||||||
|
empty).
|
||||||
|
- `pub fire_ctor_probe_if_match(hw_id, mem)` — fast-rejects when set
|
||||||
|
is empty; on match prints a one-shot record `CTOR-PROBE pc=...
|
||||||
|
tid=... hw=... cycle=... sp=... r3=... lr=...` plus an 8-frame
|
||||||
|
back-chain with saved-r31/r30 per frame. Pure read.
|
||||||
|
- `pub dump_addrs: Vec<u32>` field for end-of-run struct dumps.
|
||||||
|
- 2 unit tests: empty-set no-op, set-membership invariant.
|
||||||
|
|
||||||
|
**What landed (`crates/xenia-app/src/main.rs`):**
|
||||||
|
- `--ctor-probe=0x8217C850,0x82181750,...` CLI flag (and
|
||||||
|
`XENIA_CTOR_PROBE`). Parsed into `kernel.ctor_probe_pcs` at
|
||||||
|
`cmd_exec_inner` startup.
|
||||||
|
- `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` CLI flag (and
|
||||||
|
`XENIA_DUMP_ADDR`). Each address gets a 128-byte hex+be32+ASCII
|
||||||
|
dump at end-of-run, after the per-handle FOCUS report.
|
||||||
|
- `worker_prologue` calls `fire_ctor_probe_if_match` after reading
|
||||||
|
`pc` and before any thunk-dispatch / step-block branch.
|
||||||
|
`dump_thread_diagnostic` consumes `kernel.dump_addrs`.
|
||||||
|
|
||||||
|
**Decisive findings (corrects KRNBUG-AUDIT-002/003):**
|
||||||
|
|
||||||
|
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
|
||||||
|
Probe ran at `-n 50M --halt-on-deadlock` with PCs
|
||||||
|
`[0x821783D8, 0x82181750, 0x821701C8]` (the per-instance ctors
|
||||||
|
for handles 0x1004 / 0x100c / 0x15e0 respectively). Each fired
|
||||||
|
**EXACTLY ONCE**:
|
||||||
|
```
|
||||||
|
CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0 ← handle 0x1004
|
||||||
|
CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08 ← handle 0x100c
|
||||||
|
CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070 ← handle 0x15e0
|
||||||
|
```
|
||||||
|
Handle 0x1004 has a SINGLE dispatcher at **0x828F3EC0**, not 8
|
||||||
|
pool members. The earlier "called 8 times" claim came from
|
||||||
|
counting raw entries to the OUTER getter `sub_8217C850`, but
|
||||||
|
`sub_8217C850` is a Meyers-style singleton-getter — its inner
|
||||||
|
`bl 0x821783D8` (the per-instance ctor) is gated on a one-shot
|
||||||
|
init flag at `[0x828F48D8] bit 0`. Subsequent `sub_8217C850`
|
||||||
|
calls just return the existing slot pointer.
|
||||||
|
|
||||||
|
2. **The producer indirection layer IS the singleton-getter
|
||||||
|
itself.** Static byte-scans of `.rdata` and `.data` show 0 hits
|
||||||
|
for the dispatcher addresses 0x828F3D08 / 0x828F4070 — so no
|
||||||
|
registry table holds them. But the `xrefs` table for the OUTER
|
||||||
|
getters reveals:
|
||||||
|
```
|
||||||
|
sub_821800D8 (outer for 0x828F3D08, handle 0x100c): 6 callers
|
||||||
|
0x821802d8 (sub_82180158+0x180) ← non-create-chain
|
||||||
|
0x821806e0 (sub_821805C8+0x118) ← non-create-chain
|
||||||
|
0x82180b28 (sub_82180A10+0x118) ← non-create-chain
|
||||||
|
0x82180ea0 (sub_82180D90+0x110) ← non-create-chain
|
||||||
|
0x82181254 (sub_821810E0+0x174) ← non-create-chain
|
||||||
|
0x82181c54 (sub_82181C28+0x2C) ← create-chain ONLY
|
||||||
|
|
||||||
|
sub_8216F618 (outer for 0x828F4070, handle 0x15e0): 5 callers
|
||||||
|
0x8216f9d4 (sub_8216F818+0x1BC) ← non-create-chain
|
||||||
|
0x8216fc08 (sub_8216F9F0+0x218) ← non-create-chain
|
||||||
|
0x821700b8 (sub_8216FF70+0x148) ← non-create-chain
|
||||||
|
0x821700f4 (sub_821700E0+0x14) ← non-create-chain
|
||||||
|
0x821707f4 (sub_821707C0+0x34) ← create-chain ONLY
|
||||||
|
```
|
||||||
|
The non-create-chain consumers all share the **canonical
|
||||||
|
producer pattern**:
|
||||||
|
```
|
||||||
|
bl outer_singleton_getter ; r3 = dispatcher ptr
|
||||||
|
lwz r3, OFFSET(r3) ; r3 = an event handle / queue field
|
||||||
|
bl 0x824AA1D8 ; signal/wake function
|
||||||
|
```
|
||||||
|
For 0x100c the offset is 80 (= 0x50); for 0x15e0 the offset is
|
||||||
|
36 (= 0x24).
|
||||||
|
|
||||||
|
So **interpretation (2) of the audit charter is confirmed**:
|
||||||
|
producers reference the dispatchers via a function-call layer of
|
||||||
|
indirection, not through direct address materialization. The
|
||||||
|
xref-table audit in AUDIT-003 (which only catches direct
|
||||||
|
constant-loads of the dispatcher base) was **necessary but not
|
||||||
|
sufficient** — it correctly saw "no direct producer references"
|
||||||
|
but missed the singleton-getter indirection.
|
||||||
|
|
||||||
|
3. **Dispatcher struct layouts** (128-byte dumps at `-n 50M
|
||||||
|
--halt-on-deadlock`):
|
||||||
|
```
|
||||||
|
0x828F3D08 (handle 0x100c, per-instance ctor sub_82181750):
|
||||||
|
+0x00 = 0xFFFFFFFF ; queue head/tail sentinel
|
||||||
|
+0x28 = 0x00000007 ; capacity = 7
|
||||||
|
+0x2C = 0x01000000 ; init flag
|
||||||
|
+0x3C = 0xFFFFFFFF ; secondary sentinel
|
||||||
|
+0x48 = 0x00001010 ; thread_handle (worker thread)
|
||||||
|
+0x4C = 0x0000100C ; event_handle (= self handle 0x100c)
|
||||||
|
+0x50 = 0x00000000 ; producer reads this — currently 0
|
||||||
|
+0x70 = 0x00000001 ; refcount?
|
||||||
|
+0x74 = 0x828F3D08 ; self-pointer
|
||||||
|
|
||||||
|
0x828F4070 (handle 0x15e0, per-instance ctor sub_821701C8):
|
||||||
|
+0x00 = 0x01000000 ; init flag
|
||||||
|
+0x10 = 0xFFFFFFFF ; queue sentinel
|
||||||
|
+0x1C = 0x000015E4 ; sibling-handle (NOT in our parked
|
||||||
|
; set — possibly a thread handle)
|
||||||
|
+0x20 = 0x000015E0 ; event_handle (= self handle 0x15e0)
|
||||||
|
+0x24 = 0x00000000 ; producer reads this — currently 0
|
||||||
|
+0x40 = 0xFFFFFFFF ; secondary sentinel
|
||||||
|
|
||||||
|
0x828F3EC0 (handle 0x1004, per-instance ctor sub_821783D8):
|
||||||
|
+0x00 = 0x01000000 ; init flag
|
||||||
|
+0x10 = 0xFFFFFFFF ; queue sentinel
|
||||||
|
+0x20 = 0x40541BC0 ; heap pointer (sub-buffer #1)
|
||||||
|
+0x30 = 0x00000014 ; size 20
|
||||||
|
+0x34 = 0x0000002F ; size 47
|
||||||
|
+0x38 = 0x414F5F60 ; heap-range payload (or two halfwords)
|
||||||
|
+0x3C = 0x40211CA0 ; heap pointer (sub-buffer #2)
|
||||||
|
+0x44 = 0x405418C0 ; heap pointer (sub-buffer #3)
|
||||||
|
+0x50 = 0x40111840 ; heap pointer (sub-buffer #4)
|
||||||
|
+0x58 = 0xFFFFFFFF ; sentinel
|
||||||
|
+0x5C = 0xFFFFFFFF ; sentinel
|
||||||
|
+0x76 = 0x000012AC ; possibly thread id
|
||||||
|
+0x78 = 0x00001004 ; event_handle (= self handle 0x1004)
|
||||||
|
```
|
||||||
|
The 0x1004 dispatcher is **noticeably different**: it owns 4
|
||||||
|
guest-heap sub-buffers in 0x4xxxxxxx range, suggesting it
|
||||||
|
manages a more complex resource than the other two (which are
|
||||||
|
pure POD job queues). The +0x78 location of the event_handle
|
||||||
|
differs from 0x100c's +0x4C and 0x15e0's +0x20, so each
|
||||||
|
subsystem has its own struct layout (no shared base class).
|
||||||
|
|
||||||
|
**Reproduce:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cargo run --release -p xenia-app -- exec 'sylpheed.iso' \
|
||||||
|
--halt-on-deadlock \
|
||||||
|
--trace-handles-focus=0x1004,0x100c,0x15e0 \
|
||||||
|
--ctor-probe=0x821783D8,0x82181750,0x821701C8 \
|
||||||
|
--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \
|
||||||
|
-n 50000000
|
||||||
|
```
|
||||||
|
|
||||||
|
Trace files saved at:
|
||||||
|
- `audit-runs/audit-004/run-50m-probe.txt` (outer-getter probes)
|
||||||
|
- `audit-runs/audit-004/run-50m-probe-v2.txt` (inner-ctor probes — singleton hypothesis confirmed)
|
||||||
|
|
||||||
|
**Recommendation for next session (do not implement a fix):**
|
||||||
|
|
||||||
|
Hook the entry of each non-create-chain consumer site for handle
|
||||||
|
0x100c (5 sites: 0x821802d8, 0x821806e0, 0x82180b28, 0x82180ea0,
|
||||||
|
0x82181254) and for handle 0x15e0 (4 sites: 0x8216f9d4, 0x8216fc08,
|
||||||
|
0x821700b8, 0x821700f4) using `--ctor-probe=...`. If any fire, then
|
||||||
|
the producer DOES execute and the failure mode is in the wake/signal
|
||||||
|
chain (probably `lwz r3, OFFSET(r3)` reads zero — see dispatcher dump
|
||||||
|
[+0x50] = 0 for 0x100c, [+0x24] = 0 for 0x15e0 — and the wake
|
||||||
|
function 0x824AA1D8 is then called with handle=0). If none fire,
|
||||||
|
the producer chain is gated upstream (likely a feature flag, init
|
||||||
|
phase, or RPC handler that never fires). Either way, the next
|
||||||
|
diagnostic narrows the bug surface dramatically.
|
||||||
|
|||||||
1013
audit-runs/audit-004/run-50m-probe-v2.txt
Normal file
1013
audit-runs/audit-004/run-50m-probe-v2.txt
Normal file
File diff suppressed because it is too large
Load Diff
1151
audit-runs/audit-004/run-50m-probe.txt
Normal file
1151
audit-runs/audit-004/run-50m-probe.txt
Normal file
File diff suppressed because it is too large
Load Diff
@@ -176,6 +176,26 @@ enum Commands {
|
|||||||
/// `XENIA_XAUDIO_TICK=1`.
|
/// `XENIA_XAUDIO_TICK=1`.
|
||||||
#[arg(long)]
|
#[arg(long)]
|
||||||
xaudio_tick: bool,
|
xaudio_tick: bool,
|
||||||
|
/// Diagnostic. Comma-separated list of guest PCs at which the
|
||||||
|
/// worker prologue prints a one-shot `CTOR-PROBE` line capturing
|
||||||
|
/// live r3/lr/sp + an 8-frame back-chain. Hit-once-per-hit — for
|
||||||
|
/// the bridge ctor `sub_8217C850` called 8× from the
|
||||||
|
/// static-init driver, the probe fires 8×. Use to recover
|
||||||
|
/// pool-element `this` addresses that MSVC ctors fail to
|
||||||
|
/// preserve in r31. Read-only; lockstep digest unaffected.
|
||||||
|
/// Examples: `--ctor-probe=0x8217C850`,
|
||||||
|
/// `--ctor-probe=0x82181750,0x821701C8`.
|
||||||
|
#[arg(long)]
|
||||||
|
ctor_probe: Option<String>,
|
||||||
|
/// Diagnostic. Comma-separated list of guest addresses to dump
|
||||||
|
/// (64 bytes each, hex + u32 lanes) at end-of-run, after the
|
||||||
|
/// per-handle FOCUS report. Used to inspect the static
|
||||||
|
/// dispatcher / job-queue / pool struct layouts identified by
|
||||||
|
/// AUDIT-003 (e.g. `--dump-addr=0x828F3D08,0x828F4070`). The
|
||||||
|
/// dump captures a snapshot at the moment `dump_thread_diagnostic`
|
||||||
|
/// runs — typically when `--halt-on-deadlock` triggers.
|
||||||
|
#[arg(long)]
|
||||||
|
dump_addr: Option<String>,
|
||||||
},
|
},
|
||||||
/// Browse XISO disc image contents
|
/// Browse XISO disc image contents
|
||||||
Browse {
|
Browse {
|
||||||
@@ -331,6 +351,8 @@ fn main() -> Result<()> {
|
|||||||
reservations_table,
|
reservations_table,
|
||||||
parallel,
|
parallel,
|
||||||
xaudio_tick,
|
xaudio_tick,
|
||||||
|
ctor_probe,
|
||||||
|
dump_addr,
|
||||||
} => cmd_exec(
|
} => cmd_exec(
|
||||||
&path,
|
&path,
|
||||||
max_instructions,
|
max_instructions,
|
||||||
@@ -349,6 +371,8 @@ fn main() -> Result<()> {
|
|||||||
reservations_table,
|
reservations_table,
|
||||||
parallel,
|
parallel,
|
||||||
xaudio_tick,
|
xaudio_tick,
|
||||||
|
ctor_probe.as_deref(),
|
||||||
|
dump_addr.as_deref(),
|
||||||
),
|
),
|
||||||
Commands::Browse { path } => cmd_browse(&path),
|
Commands::Browse { path } => cmd_browse(&path),
|
||||||
Commands::Info { path } => cmd_info(&path),
|
Commands::Info { path } => cmd_info(&path),
|
||||||
@@ -550,6 +574,8 @@ fn cmd_exec(
|
|||||||
reservations_table: bool,
|
reservations_table: bool,
|
||||||
parallel: bool,
|
parallel: bool,
|
||||||
xaudio_tick: bool,
|
xaudio_tick: bool,
|
||||||
|
ctor_probe: Option<&str>,
|
||||||
|
dump_addr: Option<&str>,
|
||||||
) -> Result<()> {
|
) -> Result<()> {
|
||||||
cmd_exec_inner(
|
cmd_exec_inner(
|
||||||
path,
|
path,
|
||||||
@@ -569,6 +595,8 @@ fn cmd_exec(
|
|||||||
reservations_table,
|
reservations_table,
|
||||||
parallel,
|
parallel,
|
||||||
xaudio_tick,
|
xaudio_tick,
|
||||||
|
ctor_probe,
|
||||||
|
dump_addr,
|
||||||
None,
|
None,
|
||||||
None,
|
None,
|
||||||
false,
|
false,
|
||||||
@@ -607,6 +635,8 @@ fn cmd_check(
|
|||||||
reservations_table,
|
reservations_table,
|
||||||
parallel,
|
parallel,
|
||||||
xaudio_tick,
|
xaudio_tick,
|
||||||
|
None, // ctor_probe — diagnostic, never wanted on goldens
|
||||||
|
None, // dump_addr — same
|
||||||
out,
|
out,
|
||||||
expect,
|
expect,
|
||||||
stable_digest,
|
stable_digest,
|
||||||
@@ -631,6 +661,8 @@ fn cmd_exec_inner(
|
|||||||
reservations_table: bool,
|
reservations_table: bool,
|
||||||
parallel: bool,
|
parallel: bool,
|
||||||
xaudio_tick: bool,
|
xaudio_tick: bool,
|
||||||
|
ctor_probe: Option<&str>,
|
||||||
|
dump_addr: Option<&str>,
|
||||||
digest_out: Option<&str>,
|
digest_out: Option<&str>,
|
||||||
digest_expect: Option<&str>,
|
digest_expect: Option<&str>,
|
||||||
stable_digest: bool,
|
stable_digest: bool,
|
||||||
@@ -879,6 +911,89 @@ fn cmd_exec_inner(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Diagnostic. Parse `--ctor-probe=0x8217C850,0x...` (or
|
||||||
|
// `XENIA_CTOR_PROBE=...`) into `kernel.ctor_probe_pcs`. The
|
||||||
|
// worker prologue checks this set on every step; on a hit it
|
||||||
|
// prints a single back-chain capture line. Empty set = no
|
||||||
|
// probes = no-op fast path.
|
||||||
|
let ctor_probe_combined: Option<String> = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) {
|
||||||
|
(Some(s), _) => Some(s.to_string()),
|
||||||
|
(None, Some(s)) if !s.is_empty() => Some(s),
|
||||||
|
_ => None,
|
||||||
|
};
|
||||||
|
if let Some(list) = ctor_probe_combined {
|
||||||
|
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
|
||||||
|
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
|
||||||
|
u32::from_str_radix(hex, 16)
|
||||||
|
} else {
|
||||||
|
token.parse::<u32>()
|
||||||
|
};
|
||||||
|
match parsed {
|
||||||
|
Ok(pc) => {
|
||||||
|
kernel.ctor_probe_pcs.insert(pc);
|
||||||
|
}
|
||||||
|
Err(_) => {
|
||||||
|
return Err(anyhow::anyhow!(
|
||||||
|
"invalid PC in --ctor-probe: {token:?}"
|
||||||
|
));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !quiet && !kernel.ctor_probe_pcs.is_empty() {
|
||||||
|
let pcs: Vec<String> = kernel
|
||||||
|
.ctor_probe_pcs
|
||||||
|
.iter()
|
||||||
|
.map(|p| format!("{p:#010x}"))
|
||||||
|
.collect();
|
||||||
|
tracing::info!(
|
||||||
|
"ctor probes armed: {} ({})",
|
||||||
|
kernel.ctor_probe_pcs.len(),
|
||||||
|
pcs.join(", ")
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Diagnostic. Parse `--dump-addr=0x828F3D08,...` (or
|
||||||
|
// `XENIA_DUMP_ADDR=...`) into `kernel.dump_addrs`. The contents
|
||||||
|
// are dumped at end-of-run by `dump_thread_diagnostic`. Pure
|
||||||
|
// read; never mutates guest state.
|
||||||
|
let dump_addr_combined: Option<String> = match (dump_addr, std::env::var("XENIA_DUMP_ADDR").ok()) {
|
||||||
|
(Some(s), _) => Some(s.to_string()),
|
||||||
|
(None, Some(s)) if !s.is_empty() => Some(s),
|
||||||
|
_ => None,
|
||||||
|
};
|
||||||
|
if let Some(list) = dump_addr_combined {
|
||||||
|
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
|
||||||
|
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
|
||||||
|
u32::from_str_radix(hex, 16)
|
||||||
|
} else {
|
||||||
|
token.parse::<u32>()
|
||||||
|
};
|
||||||
|
match parsed {
|
||||||
|
Ok(addr) => {
|
||||||
|
kernel.dump_addrs.push(addr);
|
||||||
|
}
|
||||||
|
Err(_) => {
|
||||||
|
return Err(anyhow::anyhow!(
|
||||||
|
"invalid address in --dump-addr: {token:?}"
|
||||||
|
));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !quiet && !kernel.dump_addrs.is_empty() {
|
||||||
|
let addrs: Vec<String> = kernel
|
||||||
|
.dump_addrs
|
||||||
|
.iter()
|
||||||
|
.map(|a| format!("{a:#010x}"))
|
||||||
|
.collect();
|
||||||
|
tracing::info!(
|
||||||
|
"dump addresses armed: {} ({})",
|
||||||
|
kernel.dump_addrs.len(),
|
||||||
|
addrs.join(", ")
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// Install the GPU register aperture MMIO region on the guest memory so
|
// Install the GPU register aperture MMIO region on the guest memory so
|
||||||
// any `0x7FC8xxxx` access routes to our atomic mailbox. Matches canary's
|
// any `0x7FC8xxxx` access routes to our atomic mailbox. Matches canary's
|
||||||
// `graphics_system.cc:141-144`. The callbacks capture Arc clones of the
|
// `graphics_system.cc:141-144`. The callbacks capture Arc clones of the
|
||||||
@@ -1812,6 +1927,13 @@ fn worker_prologue(
|
|||||||
|
|
||||||
let pc = kernel.scheduler.ctx(hw_id).pc;
|
let pc = kernel.scheduler.ctx(hw_id).pc;
|
||||||
|
|
||||||
|
// 0) Diagnostic ctor-probe: if `pc` is in
|
||||||
|
// `kernel.ctor_probe_pcs`, capture live r3/lr/sp + back-chain
|
||||||
|
// and println one record. Read-only; lockstep digest unaffected.
|
||||||
|
// Empty set is the common case → single `is_empty()` test inside
|
||||||
|
// the helper, no overhead on the hot path.
|
||||||
|
kernel.fire_ctor_probe_if_match(hw_id, mem);
|
||||||
|
|
||||||
// 1) Halt-sentinel check (per HW thread).
|
// 1) Halt-sentinel check (per HW thread).
|
||||||
if pc == LR_HALT {
|
if pc == LR_HALT {
|
||||||
let injected_here = kernel.interrupts.saved.is_some()
|
let injected_here = kernel.interrupts.saved.is_some()
|
||||||
@@ -3405,6 +3527,56 @@ fn dump_thread_diagnostic(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Diagnostic. `--dump-addr` content dump at end-of-run. Each
|
||||||
|
// address gets a 64-byte dump in three forms (raw bytes, u32
|
||||||
|
// big-endian lanes, ASCII guess). The lanes form is gold for
|
||||||
|
// pool-element / dispatcher struct reverse-engineering: at
|
||||||
|
// [base+0]=-1 means "empty queue" sentinel from AUDIT-003, while
|
||||||
|
// any image-range pointer `0x82xxxxxx` at a fixed offset hints
|
||||||
|
// at an embedded silph::Event* / vtable / next-link.
|
||||||
|
if !kernel.dump_addrs.is_empty() {
|
||||||
|
println!("\n=== Dump-addr ===");
|
||||||
|
const DUMP_BYTES: usize = 128;
|
||||||
|
for &addr in &kernel.dump_addrs {
|
||||||
|
println!(" addr={:#010x}", addr);
|
||||||
|
let mut bytes = [0u8; DUMP_BYTES];
|
||||||
|
for i in 0..DUMP_BYTES {
|
||||||
|
bytes[i] = mem.read_u8(addr.wrapping_add(i as u32));
|
||||||
|
}
|
||||||
|
for row in 0..(DUMP_BYTES / 16) {
|
||||||
|
let off = row * 16;
|
||||||
|
let hex: String = (0..16)
|
||||||
|
.map(|i| format!("{:02x}", bytes[off + i]))
|
||||||
|
.collect::<Vec<_>>()
|
||||||
|
.join(" ");
|
||||||
|
let words: String = (0..4)
|
||||||
|
.map(|i| {
|
||||||
|
let b = &bytes[off + i * 4..off + i * 4 + 4];
|
||||||
|
format!(
|
||||||
|
"{:08x}",
|
||||||
|
u32::from_be_bytes([b[0], b[1], b[2], b[3]])
|
||||||
|
)
|
||||||
|
})
|
||||||
|
.collect::<Vec<_>>()
|
||||||
|
.join(" ");
|
||||||
|
let ascii: String = (0..16)
|
||||||
|
.map(|i| {
|
||||||
|
let b = bytes[off + i];
|
||||||
|
if (0x20..=0x7E).contains(&b) {
|
||||||
|
b as char
|
||||||
|
} else {
|
||||||
|
'.'
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
println!(
|
||||||
|
" +{:#04x}: {} | be32={} | {}",
|
||||||
|
off, hex, words, ascii,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
#[allow(clippy::too_many_arguments)]
|
#[allow(clippy::too_many_arguments)]
|
||||||
|
|||||||
@@ -179,6 +179,30 @@ pub struct KernelState {
|
|||||||
/// backend (which lives on the worker thread under `--gpu-thread`).
|
/// backend (which lives on the worker thread under `--gpu-thread`).
|
||||||
pub ring_base: u32,
|
pub ring_base: u32,
|
||||||
pub ring_size_dwords: u32,
|
pub ring_size_dwords: u32,
|
||||||
|
/// Diagnostic. PCs at which the worker prologue fires a one-shot
|
||||||
|
/// stack/back-chain dump capturing live `r3` (= `this` in MSVC
|
||||||
|
/// PPC ctors), `lr` (= return site), and the cycle/tid that hit
|
||||||
|
/// the PC. Populated from `--ctor-probe=0x8217C850,0x...` /
|
||||||
|
/// `XENIA_CTOR_PROBE`. Empty by default → check is a single
|
||||||
|
/// `is_empty()` test, no extra cost on the unprobed hot path.
|
||||||
|
/// Read-only diagnostic — no guest state is mutated, so the
|
||||||
|
/// `sylpheed_n*m.json` lockstep digest is preserved.
|
||||||
|
///
|
||||||
|
/// **Why a per-PC probe instead of per-handle?** The MSVC ctors
|
||||||
|
/// at `sub_8217C850` (and friends) don't preserve `this` in r31
|
||||||
|
/// across the inner `bl` to `silph::Event::Construct`, so the
|
||||||
|
/// AUDIT-002 multi-frame back-chain at `NtCreateEvent` only
|
||||||
|
/// recovers stack-relative pointers — never the pool-element
|
||||||
|
/// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this
|
||||||
|
/// before any save/restore can clobber it.
|
||||||
|
pub ctor_probe_pcs: std::collections::HashSet<u32>,
|
||||||
|
/// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32
|
||||||
|
/// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08,
|
||||||
|
/// 0x828F4070`. Used to inspect static dispatcher / job-queue /
|
||||||
|
/// pool struct layouts identified by AUDIT-003. Read-only — the
|
||||||
|
/// dump is performed by `dump_thread_diagnostic`, never during
|
||||||
|
/// the hot interpreter loop, so lockstep determinism is unaffected.
|
||||||
|
pub dump_addrs: Vec<u32>,
|
||||||
}
|
}
|
||||||
|
|
||||||
impl KernelState {
|
impl KernelState {
|
||||||
@@ -230,6 +254,8 @@ impl KernelState {
|
|||||||
ring_base: 0,
|
ring_base: 0,
|
||||||
ring_size_dwords: 0,
|
ring_size_dwords: 0,
|
||||||
parallel_active: false,
|
parallel_active: false,
|
||||||
|
ctor_probe_pcs: std::collections::HashSet::new(),
|
||||||
|
dump_addrs: Vec::new(),
|
||||||
};
|
};
|
||||||
crate::exports::register_exports(&mut state);
|
crate::exports::register_exports(&mut state);
|
||||||
crate::xam::register_exports(&mut state);
|
crate::xam::register_exports(&mut state);
|
||||||
@@ -522,6 +548,48 @@ impl KernelState {
|
|||||||
self.audit.record_wake(handle, entry);
|
self.audit.record_wake(handle, entry);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Diagnostic. If the live PC for HW slot `hw_id` is in
|
||||||
|
/// `self.ctor_probe_pcs`, emit a single `CTOR-PROBE` line with
|
||||||
|
/// the current cycle, tid, hw_id, sp, r3, lr, plus an 8-frame
|
||||||
|
/// back-chain walk. Read-only — no guest state is mutated, so a
|
||||||
|
/// run with the probe set is byte-identical to one without (the
|
||||||
|
/// probe only adds println noise).
|
||||||
|
///
|
||||||
|
/// Intended call site: top of `worker_prologue`, after `pc` has
|
||||||
|
/// been read but before any thunk-dispatch / step-block branch.
|
||||||
|
/// Fires once per hit — if the same PC is reached again (e.g.
|
||||||
|
/// the bridge ctor sub_8217C850 called 8 times by the static-
|
||||||
|
/// init driver), it fires 8 times, which is exactly what we want
|
||||||
|
/// for pool-element identification.
|
||||||
|
pub fn fire_ctor_probe_if_match(&self, hw_id: u8, mem: &GuestMemory) {
|
||||||
|
if self.ctor_probe_pcs.is_empty() {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
let ctx = self.scheduler.ctx(hw_id);
|
||||||
|
let pc = ctx.pc;
|
||||||
|
if !self.ctor_probe_pcs.contains(&pc) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
let tid = self.scheduler.tid(hw_id).unwrap_or(0);
|
||||||
|
let r3 = ctx.gpr[3] as u32;
|
||||||
|
let lr = ctx.lr as u32;
|
||||||
|
let sp = ctx.gpr[1] as u32;
|
||||||
|
let cycle = ctx.cycle_count;
|
||||||
|
let frames = walk_guest_back_chain(sp, lr, mem, 8);
|
||||||
|
println!(
|
||||||
|
"CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}",
|
||||||
|
pc, tid, hw_id, cycle, sp, r3, lr,
|
||||||
|
);
|
||||||
|
for (i, (fp, frame_lr)) in frames.iter().enumerate() {
|
||||||
|
let saved_r31 = mem.read_u32(fp.wrapping_sub(12));
|
||||||
|
let saved_r30 = mem.read_u32(fp.wrapping_sub(16));
|
||||||
|
println!(
|
||||||
|
" CTOR-PROBE frame={} fp={:#010x} lr={:#010x} saved-r31={:#010x} saved-r30={:#010x}",
|
||||||
|
i, fp, frame_lr, saved_r31, saved_r30,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
/// Read a TLS slot for the currently running HW thread.
|
/// Read a TLS slot for the currently running HW thread.
|
||||||
pub fn tls_get(&self, index: u32) -> u64 {
|
pub fn tls_get(&self, index: u32) -> u64 {
|
||||||
self.scheduler.tls_get(index)
|
self.scheduler.tls_get(index)
|
||||||
@@ -1231,6 +1299,38 @@ mod tests {
|
|||||||
/// A NUL-terminated ASCII string is read up to `max`; non-printable
|
/// A NUL-terminated ASCII string is read up to `max`; non-printable
|
||||||
/// bytes mark the candidate as bogus (return empty string). The
|
/// bytes mark the candidate as bogus (return empty string). The
|
||||||
/// `.?A` prefix gating in `read_class_at_this` then rejects them.
|
/// `.?A` prefix gating in `read_class_at_this` then rejects them.
|
||||||
|
/// `fire_ctor_probe_if_match` only emits when `pc` matches a
|
||||||
|
/// configured PC. We assert it's a no-op on miss and a no-panic
|
||||||
|
/// on hit (the println goes to stdout; we just check the helper
|
||||||
|
/// reads the back-chain without faulting).
|
||||||
|
#[test]
|
||||||
|
fn fire_ctor_probe_if_match_no_op_on_empty_set() {
|
||||||
|
let mem = GuestMemory::new().expect("memory init");
|
||||||
|
let state = KernelState::new();
|
||||||
|
// No probes set → must be a no-op even when the scheduler
|
||||||
|
// ctx has whatever PC.
|
||||||
|
state.fire_ctor_probe_if_match(0, &mem);
|
||||||
|
assert!(state.ctor_probe_pcs.is_empty());
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn fire_ctor_probe_if_match_only_fires_on_listed_pc() {
|
||||||
|
// We can't easily redirect stdout under cargo-test, so this
|
||||||
|
// test mostly verifies the membership check + that no panic
|
||||||
|
// occurs when frame walking encounters zero/sentinel pages.
|
||||||
|
// The empty-stack walk returns just `[(sp, lr)]`, exercising
|
||||||
|
// the loop body once safely.
|
||||||
|
let mem = GuestMemory::new().expect("memory init");
|
||||||
|
let mut state = KernelState::new();
|
||||||
|
state.ctor_probe_pcs.insert(0x8217_C850);
|
||||||
|
// The default PpcContext on slot 0 has pc=0 (idle sentinel),
|
||||||
|
// so the probe set membership test misses → no fire.
|
||||||
|
state.fire_ctor_probe_if_match(0, &mem);
|
||||||
|
// Sanity: an unrelated PC isn't claimed.
|
||||||
|
assert!(!state.ctor_probe_pcs.contains(&0x8200_0000));
|
||||||
|
assert!(state.ctor_probe_pcs.contains(&0x8217_C850));
|
||||||
|
}
|
||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn read_ascii_cstring_handles_termination_and_garbage() {
|
fn read_ascii_cstring_handles_termination_and_garbage() {
|
||||||
use xenia_memory::page_table::MemoryProtect;
|
use xenia_memory::page_table::MemoryProtect;
|
||||||
|
|||||||
Reference in New Issue
Block a user