Commit Graph

4 Commits

Author SHA1 Message Date
MechaCat02
acb29db444 [iterate-3AL] Superblock dispatch: chain basic blocks per slot-visit (~1.6x boot-to-splash)
Replace the one-basic-block-per-slot-per-round lockstep dispatch with a
SUPERBLOCK runner: each slot-visit chains straight-line blocks through
their terminating branches up to a deterministic instruction budget,
amortizing the per-round (timebase/coord/round_schedule) and per-slot
(worker_prologue) dispatch tax over ~128 instructions instead of ~6.

Yield-points (end the chain, return to the round) are pure functions of
guest state, preserving the lockstep cross-thread interleaving correctness:
  - non-Continue step result (Yield/SystemCall/Trap/Unimpl/Halted);
    db16cyc Yield is the spin-wait producer hand-off.
  - sync-sensitive block: lwarx/ldarx/stwcx./stdcx. or sync/eieio/isync
    (new PpcOpcode::is_sync_sensitive, flagged on DecodedBlock at build).
  - MMIO touch: new GuestMemory::mmio_access_count() watermark, sampled
    per block, keeps GPU/register ordering at one-block granularity.
  - next PC leaves ordinary guest code (import thunk / halt sentinel /
    unmapped) -> hand to the full worker_prologue next round.
  - instruction budget reached.

Instruction-count/clock accounting stays exact: per-block cycle_count
deltas are summed and handed to worker_epilogue once (instruction_count +
decrement_quantum advance by the precise retired count). XENIA_SUPERBLOCK_BUDGET=1
reproduces the old one-block schedule byte-for-byte.

Budget tuned to 128 (env-overridable): boot progression stays healthy up
to 256, sharp cliff at ~384 (a boot producer/consumer handoff starves);
128 is 3x below the cliff. Also scale the inline-GPU per-round fairness
cap with the budget (flat 64 throttled GPU command processing 17x under
superblocks and collapsed the present loop).

PERF (check -n 100M --gpu-inline): 25.3 -> 42.7 MIPS (1.69x); 1B: 26.0 ->
41.4 MIPS (1.59x). Callgrind n=5M: host instructions 2.178B -> 1.507B
(-31%); worker_prologue -90%, coord_pre_round -91%, begin_slot_visit /
round_schedule_into / coord_post_round / update_timestamp_bundle each
~-90%; interpreter execute byte-identical (real work unchanged).

GATES: C1 boot progression 150M draws 7391/swaps 2164 (baseline 7415/2172),
1B draws 88547/swaps 29228 linear no stall, K8888 decode + RTs=2 intact.
C2 determinism: n50m stable digest byte-identical across fresh runs;
golden re-baselined intentionally (pacing-only deltas: imports 333453->243387,
draws 1274->1279). C3 milestone-1 render: texture_decodes/draws/swaps/
present cadence track baseline (3AJ fade-in pacing preserved). C4: 690
tests green (+2 sync_sensitive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 22:31:54 +02:00
MechaCat02
19659d7f76 feat(kernel): KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI), not 0x16
Mirrors canary's cvars::avpack default (xam_info.cc:35) and Sylpheed's
accepted set {3,4,6,8} (xam_info.cc:250-251). With KRNBUG-XEX-001 having
flipped the priv-10 gate, XGetAVPack now reaches its caller in
sub_824AB578; returning 0x16 caused Sylpheed to abort the AV/crypto
block before XeCryptSha. Cascade walks one step (canary-only export
list 11 → 10); sub_824ABA98 is the next candidate.

Tests: 589 → 590. Goldens re-baselined (n50m: 50000005→50000004,
imports 407417→407416). Lockstep deterministic across 3 reruns at
-n 100M (instructions=100000010, import_calls=987686 +2.4×, swaps=2).
9-PC producer probe still 0×; parked handles 0x1004/0x100c/0x15e0
still signal_attempts=0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:54:24 +02:00
MechaCat02
1a892d4641 feat(kernel): KRNBUG-XEX-001 — real XexCheckExecutablePrivilege from XEX header bitmap
Replace stub_return_zero with a canary-faithful implementation that
returns bit `priv` of the loaded XEX's XEX_HEADER_SYSTEM_FLAGS
(key 0x00030000) bitmap. Mirrors xenia-canary
xboxkrnl_modules.cc:22-39: `(flags >> priv) & 1` for priv < 32, else 0.

Plumbing:
- xenia-xex: header_keys::SYSTEM_FLAGS const + get_system_flags() accessor.
- xenia-kernel/state.rs: pub xex_system_flags: u32 + xex_priv_logged
  HashSet for one-shot per-priv tracing.
- xenia-app: kernel.xex_system_flags wired in cmd_exec_inner.
- xenia-kernel/exports.rs: real export body + unit test covering
  bits 10/11/0/64 + zero-flags case.

Sylpheed's bitmap is 0x00000400 (only XEX_SYSTEM_PAL50_INCOMPATIBLE,
bit 10). At -n 500M with the fix:
- XGetAVPack: 0 -> 1 (priv-10 gate at lr=0x824ab598 flipped).
- 10 other canary-only exports + 9 producer PCs + 3 parked handles
  unchanged. Priv-11 site at sub_824A9710 is downstream and still
  not reached — AV/crypto block aborts after XGetAVPack returns
  our placeholder 0x16 (canary returns 8/HDMI; Sylpheed accepts
  only 3/4/6/8 per xenia-canary xam_info.cc:250-251).

Tests 588 -> 589. Lockstep deterministic (3 reruns identical):
n50m goes 50000008 -> 50000005 instr / 407415 -> 407417 imp / swaps=2 /
draws=0. Goldens re-baselined (sylpheed_n50m, sylpheed_n2m); oracle
test green.

Full chain-of-effects + next-frontier hand-off in audit-findings.md
under KRNBUG-XEX-001.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 18:32:51 +02:00
MechaCat02
bae9305982 xenia-app: observability subsystem, --parallel runtime, stress harness
observability.rs installs the tracing subscriber stack (env-filter +
JSON file appender + chrome trace + error layer) and the metrics
recorder shared by the workspace. main.rs grows the new CLI surface:
--parallel, --reservations-table, --trace-handles, --analyze=
{rust,sql,both}, xenia dis --json, --ui, plus the wiring that runs
the CPU through the new scheduler, drives the GPU's threaded backend,
and surfaces the framebuffer + HUD via xenia-ui.

Add tests/parallel_stress.rs (#[ignore]-gated long form, short form
runs 20×@5M) and tests/golden/sylpheed_n2m.json — the digest the
lockstep/parallel combos compare against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:30:26 +02:00