xenia-rs

fabi/xenia-rs

Commit Graph


Author

SHA1

Message

Date

MechaCat02

9a93152981

Iterate-2.BE: host-driven synchronous graphics ISR delivery

Replaces the victim-thread-mutate-then-wait scheme for vsync / CP
interrupts with synchronous in-line dispatch on the coordinator host
thread. Mirrors canary's EmulateCPInterruptDPC -> Processor::Execute
path (kernel_state.cc:1370, processor.cc:413): pick a guest thread,
borrow its PpcContext, jam ISR PC + args in, run the interpreter
inline until LR_HALT_SENTINEL, restore the borrowed context.
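
The borrow / run / restore sequence described above can be sketched roughly
as follows. This is a minimal illustration, not the code in main.rs: the
reduced `PpcContext`, the sentinel value, the `run_isr_inline` name, the
`step` callback, and passing the ISR arguments in r3/r4 are all assumptions
made for the example. Only LR_HALT_SENTINEL and the 1M instruction budget
are named in this commit message.

```rust
/// Hypothetical, heavily reduced stand-in for the guest CPU context.
#[derive(Clone, Debug, PartialEq)]
struct PpcContext {
    pc: u32,
    lr: u32,
    gpr: [u64; 32],
}

/// Sentinel return address; the concrete value is an assumption.
const LR_HALT_SENTINEL: u32 = 0xFFFF_FFFC;
/// Safety budget quoted in the commit message (1M instructions).
const MAX_INSTRS_PER_ISR: u64 = 1_000_000;

/// Runs an ISR inline on a borrowed context, then restores the context.
///
/// `step` is a hypothetical single-instruction interpreter callback.
/// Returns `Some(n)` if the ISR returned to the sentinel after `n`
/// instructions, or `None` if the budget ran out first.
fn run_isr_inline(
    ctx: &mut PpcContext,
    isr_pc: u32,
    args: [u64; 2],
    mut step: impl FnMut(&mut PpcContext),
) -> Option<u64> {
    // Borrow: remember the victim thread's state so it can be put back.
    let saved = ctx.clone();

    // Jam in the ISR entry point, a return address that marks completion,
    // and the arguments (r3/r4 here, assuming the usual PowerPC calling
    // convention).
    ctx.pc = isr_pc;
    ctx.lr = LR_HALT_SENTINEL;
    ctx.gpr[3] = args[0];
    ctx.gpr[4] = args[1];

    // Run synchronously until the ISR returns or the budget is spent.
    let mut executed = 0u64;
    while ctx.pc != LR_HALT_SENTINEL && executed < MAX_INSTRS_PER_ISR {
        step(ctx);
        executed += 1;
    }
    let completed = ctx.pc == LR_HALT_SENTINEL;

    // Restore: the victim thread never observes that it was borrowed.
    *ctx = saved;
    completed.then_some(executed)
}
```

Because the context is restored unconditionally, a runaway ISR that exhausts
the budget still leaves the victim thread exactly as it was found.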

Why: audit-059 measured gpu.interrupt.delivered{source=0} = 54 over
3.9 s vs canary's 4712 over 30 s. Per-second shortfall ~11×. Old
asynchronous LR-sentinel injection (try_inject_graphics_interrupt)
needed a Ready or Blocked guest thread to land on; once the Sylpheed
main thread and worker threads all idled post-boot, no victim was
available and every queued vsync got dropped. Host-driven dispatch
decouples delivery from guest-thread readiness.

Smoke test (lockstep): unchanged 54 — under current Sylpheed boot
trajectory the ticker is gated by guest-instruction progress, not
victim availability; lockstep stalls into idle-advance after ~5M
instructions of real work and the synthetic tick_vsync_instr stops
firing. Under --parallel (wallclock ticker) gpu.interrupt.delivered
climbs to ~1131 over a 128 s run, confirming the synchronous
dispatcher itself works as intended. Architectural piece is now in
place; raising the lockstep delivery rate requires ticking the
synthetic vsync inside coord_idle_advance, which is a separate
change.

Changes:
- crates/xenia-kernel/src/interrupts.rs: doc-comment update only.
  SavedCallbackCtx + CALLBACK_STACK_PAD retained — the audio
  callback path (audit-048) still uses the asynchronous LR-sentinel
  inject on a dedicated per-client worker.
- crates/xenia-app/src/main.rs:
  * dispatch_graphics_interrupts(kernel, mem, &mut stats,
    &mut decode_cache, thunk_map): new fn. Drains the full FIFO per
    call. Victim selection same shape (Ready preferred, else
    Blocked, skip Idle/Exited/ServicingIrq), but the call is
    synchronous - we run step_cached + import-thunk dispatch inline
    on the borrowed ctx until pc == LR_HALT_SENTINEL.
    MAX_INSTRS_PER_ISR = 1M safety budget.
  * coord_pre_round: graphics-IRQ injection call removed. Audio
    path unchanged (still calls try_inject_audio_callback).
  * run_execution + run_execution_parallel: each now owns a
    persistent isr_decode_cache and calls
    dispatch_graphics_interrupts after coord_pre_round.
  * try_inject_graphics_interrupt: deleted (118 LOC).
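
The victim-selection rule listed above (Ready preferred, else Blocked, skip
Idle/Exited/ServicingIrq) could look something like the sketch below. The
`ThreadState` enum and `pick_victim` function are hypothetical names chosen
for illustration; the real scheduler types are not shown in this commit.

```rust
/// Hypothetical thread states, mirroring the names in the commit message.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ThreadState {
    Ready,
    Blocked,
    Idle,
    Exited,
    ServicingIrq,
}

/// Returns the index of the thread whose context the ISR should borrow:
/// the first Ready thread if there is one, otherwise the first Blocked
/// thread. Idle, Exited and ServicingIrq threads are never chosen.
fn pick_victim(states: &[ThreadState]) -> Option<usize> {
    states
        .iter()
        .position(|s| *s == ThreadState::Ready)
        .or_else(|| states.iter().position(|s| *s == ThreadState::Blocked))
}
```

For example, `pick_victim(&[ThreadState::Idle, ThreadState::Blocked,
ThreadState::Ready])` picks index 2, while a slice containing only Idle,
Exited, or ServicingIrq threads yields `None`.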

No new public APIs, no new dependencies, no changes to xenia-cpu.

Tests: workspace 765 passed / 0 failed / 4 ignored (parallel_stress
+ sylpheed_n50m, all gated). Kernel 127/127, app 5/5, cpu 288/288.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-06-06 18:58:40 +02:00

MechaCat02

27d3608174

fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel

The synthetic v-sync ticker used a per-instruction proxy
(VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep
throughput → 60 Hz. Audit M11 observed this drifts under
`--parallel`: with 6 worker threads sharing the kernel mutex,
the dispatcher executes more PPC instructions per tick
callback, so the accumulator never crosses 150 k. Result:
~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel.

Hybrid solution preserves lockstep determinism (which the
goldens depend on) while fixing --parallel:

* `tick_vsync_instr(instr_count)` — legacy instruction-count
  ticker, used by lockstep. Bit-stable across runs.

* `tick_vsync_wallclock()` — new Instant-based ticker. Fires
  `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor
  and advances the anchor by that many full periods (no
  lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a
  forward-jumping clock can't overflow the FIFO.

* `KernelState.parallel_active` flag set at startup from
  `--parallel` / `XENIA_PARALLEL=1`. Read by `coord_pre_round`
  in main.rs to choose between the two tickers.
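
A rough sketch of the wall-clock ticker described above, under stated
assumptions: the `WallclockTicker` type, its `tick(now)` signature (taking
the time as a parameter so it can be tested), and the ~60 Hz value chosen
for VSYNC_PERIOD are illustrative only. The sketch also reads "no lazy
backlog" as meaning that periods beyond the cap are dropped rather than
carried over to the next call.

```rust
use std::time::{Duration, Instant};

/// Assumed ~60 Hz period; the real constant's value is not shown here.
const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667);
/// Queue depth quoted in this commit message.
const INTERRUPT_QUEUE_CAP: u32 = 4;

/// Hypothetical wall-clock v-sync ticker.
struct WallclockTicker {
    anchor: Instant,
}

impl WallclockTicker {
    fn new(now: Instant) -> Self {
        Self { anchor: now }
    }

    /// Returns how many v-syncs to enqueue at `now`.
    fn tick(&mut self, now: Instant) -> u32 {
        // A clock that appears to run backwards yields zero elapsed time.
        let elapsed = now.saturating_duration_since(self.anchor);
        let periods = elapsed.as_nanos() / VSYNC_PERIOD.as_nanos();
        if periods == 0 {
            return 0;
        }
        // Advance the anchor by every full period that elapsed, keeping
        // only the fractional remainder for the next call.
        let remainder = (elapsed.as_nanos() % VSYNC_PERIOD.as_nanos()) as u64;
        self.anchor = now - Duration::from_nanos(remainder);
        // Cap the burst so a forward-jumping clock cannot overflow the FIFO.
        periods.min(INTERRUPT_QUEUE_CAP as u128) as u32
    }
}
```

With these assumed values, a call 40 ms after the anchor fires two v-syncs,
and a call after a one-second stall fires only four, with the rest
discarded.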

Verification:

* cargo test --workspace --release: 561 passing (+3 new
  wall-clock tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to
  pre-Phase-3 baseline. interrupts_delivered preserved at
  ~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered
  rose from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps
  burst delivery; that's a separate bottleneck — addressed
  by raising cap when --parallel queue depth becomes the
  next blocker.)

Trade-off: --parallel runs are non-deterministic at the
v-sync rate by design (per audit M05 PPCBUG-703 already).
Lockstep stays bit-identical, so the `sylpheed_n*m.json`
goldens are untouched.

Audit IDs: KRNBUG-D08 (closed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-03 17:34:30 +02:00

MechaCat02

5f0d6487ea

xenia-kernel: HLE expansion, scheduler integration, audit + UI bridge

Major HLE buildout in exports.rs: KeInitializeSemaphore now seeds
count/limit, XexGet{Module,Procedure}Address use distinct
HMODULE_XBOXKRNL/HMODULE_XAM pseudo-handles with a reverse
(ModuleId,ordinal)→thunk_addr map, plus sweeping additions across
sync primitives, file I/O, semaphores, events, threads, and
allocator paths needed to advance Sylpheed past VdSwap=2.
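
The pseudo-handle plus reverse-map arrangement can be pictured with the
sketch below. It is an assumption-laden illustration: the handle values,
the `ModuleId` variants, the `ThunkIndex` type, and its method names are
all hypothetical, and the real lookup in exports.rs may be shaped
differently.

```rust
use std::collections::HashMap;

/// Hypothetical module identifiers.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum ModuleId {
    Xboxkrnl,
    Xam,
}

/// Pseudo-handle values are placeholders chosen for this example.
const HMODULE_XBOXKRNL: u32 = 0xFFFF_0001;
const HMODULE_XAM: u32 = 0xFFFF_0002;

/// Maps a pseudo-handle back to the module it stands for.
fn module_from_handle(hmodule: u32) -> Option<ModuleId> {
    match hmodule {
        HMODULE_XBOXKRNL => Some(ModuleId::Xboxkrnl),
        HMODULE_XAM => Some(ModuleId::Xam),
        _ => None,
    }
}

/// Reverse map from (module, ordinal) to the guest address of its thunk.
#[derive(Default)]
struct ThunkIndex {
    by_export: HashMap<(ModuleId, u16), u32>,
}

impl ThunkIndex {
    /// Records the thunk address for an imported export.
    fn register(&mut self, module: ModuleId, ordinal: u16, thunk_addr: u32) {
        self.by_export.insert((module, ordinal), thunk_addr);
    }

    /// Resolves a handle and ordinal to a thunk address, if one is known.
    fn procedure_address(&self, hmodule: u32, ordinal: u16) -> Option<u32> {
        let module = module_from_handle(hmodule)?;
        self.by_export.get(&(module, ordinal)).copied()
    }
}
```

Keeping the two pseudo-handles distinct lets the same ordinal resolve to
different thunks depending on which module the guest asked about.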

New modules:
  - thread.rs   — ThreadRef + per-thread suspension/wake plumbing
  - interrupts.rs — IRQ delivery, pending-IRQ slots, IPI helpers
  - path.rs     — guest path normalization (D:\\, game:\\, etc.)
  - audit.rs    — --trace-handles harness backing the handle audit
  - ui_bridge.rs — kernel-side endpoint of the xenia-ui bridge
                   (input snapshots, framebuffer publish handles)

state.rs grows to own the HW-slot scheduler state, the new audit /
UI bridge handles, and the per-handle reverse maps. xam.rs and
objects.rs follow suit for the HLE additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 16:29:00 +02:00

3 Commits