M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle. All four are smaller in scope and additive; lockstep determinism is unaffected (analyzer-only changes).

## M9.5 — `__CxxFrameHandler` scope-table parsing

- New `xenia_analysis::eh_scope` module. Magic-scans `.rdata` on 4-byte alignment for the three documented MSVC FuncInfo signatures (0x19930520/21/22). Each match is parsed as the documented struct (big-endian u32 fields), with sanity caps on max_state, n_try_blocks and n_ip_map_entries, plus pointer-validity checks.
- Walks pUnwindMap (8-byte UnwindMapEntry) and pTryBlockMap (20-byte TryBlockMapEntry), emitting one row per entry.
- New tables: eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo records (all version 0x19930522), 10,019 unwind entries, 315 try-blocks.

## M11.5 — Static-init driver chain detection

- New `xenia_analysis::static_init` module. Walks every function looking for the canonical `_initterm` loop (`lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4`) bounded by a compare against another constant register. Extracts (array_start, array_end) and reads the array.
- Reuses the `function_pointer_arrays` table: drivers' arrays land with kind='static_init', replacing M11's prologue-heuristic output wherever the structurally grounded pattern fires.
- Sylpheed yield: 0 drivers detected, because the binary's static-init structure does not match the canonical CRT loop. The infrastructure is in place; a future M11.6 can relax the match.

## VMX vector-store xrefs (M6 follow-up)

- Adds the AltiVec/VMX X-form load/store XOs to the M6 opcode-31 dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and stvx/stvxl/stvebx/stvehx/stvewx (writes), all with addr_mode='x_form_indexed'. Static resolution still requires both rA and rB to be constant.
- Sylpheed yield: 110 newly detected stvx writes.

## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)

- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS X 0208 lead/trail byte ranges plus half-width katakana pass-through) and scan_utf8 (2- and 3-byte sequences). At least one multi-byte unit is required, so pure-ASCII strings aren't double-counted.
- SJIS bytes are rendered as `\xHH` escapes for diagnostic readability; full SJIS→UTF-8 decoding is deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug and UI text) and 39 UTF-8 strings.

## Tests

- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)

Tests 649→655 (+6 unit tests). The DB schema golden and the write_analysis_results signature are updated for the new EH parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
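The `static_init` module is not part of this diff. The following is therefore only a sketch of the loop shape that the M11.5 bullet describes. It assumes big-endian PowerPC instruction words already decoded to `u32`.

The sketch looks for these instructions, in order:

1. `lwz rX, 0(rC)`
2. `mtctr rX`
3. `bcctrl`
4. `addi rC, rC, 4`
5. a `cmpw`/`cmplw` of `rC` against some other register `rE`

The search runs within a short window, so an intervening null check (`cmplwi`/`beq`) is tolerated. The helper names and the 10-instruction window are hypothetical choices, not the commit's implementation. Compares written with swapped operands (`cmplw rE, rC`) are not handled.

```rust
// Hypothetical sketch of an `_initterm`-style loop matcher; not the
// xenia_analysis::static_init implementation.
fn opcd(w: u32) -> u32 { w >> 26 }
fn rd(w: u32) -> u32 { (w >> 21) & 31 }
fn ra(w: u32) -> u32 { (w >> 16) & 31 }
fn rb(w: u32) -> u32 { (w >> 11) & 31 }
fn simm(w: u32) -> i16 { w as u16 as i16 }
fn xo(w: u32) -> u32 { (w >> 1) & 0x3FF }

/// `lwz rD, 0(rA)` (primary opcode 32) -> (rD, rA)
fn lwz_zero_disp(w: u32) -> Option<(u32, u32)> {
    (opcd(w) == 32 && simm(w) == 0).then(|| (rd(w), ra(w)))
}
/// `mtctr rS` is `mtspr 9, rS`; mask out rS and compare the rest.
fn mtctr_src(w: u32) -> Option<u32> {
    ((w & 0xFC1F_FFFF) == 0x7C09_03A6).then(|| rd(w))
}
/// `bcctrl` with any BO/BI: opcode 19, XO 528, LK = 1.
fn is_bcctrl(w: u32) -> bool {
    opcd(w) == 19 && xo(w) == 528 && (w & 1) == 1
}
/// `addi rD, rD, 4` (opcode 14) -> rD
fn addi_self_4(w: u32) -> Option<u32> {
    (opcd(w) == 14 && rd(w) == ra(w) && simm(w) == 4).then(|| rd(w))
}
/// `cmpw`/`cmplw` (opcode 31, XO 0 / 32) -> (rA, rB).
/// The L bit is not checked, so `cmpd`/`cmpld` also match.
fn cmp_word_regs(w: u32) -> Option<(u32, u32)> {
    (opcd(w) == 31 && matches!(xo(w), 0 | 32)).then(|| (ra(w), rb(w)))
}

/// Returns (index of the `lwz`, cursor register, end register) for the
/// first matching loop.
pub fn find_initterm_loop(code: &[u32]) -> Option<(usize, u32, u32)> {
    const WINDOW: usize = 10;
    for (start, &w) in code.iter().enumerate() {
        let Some((target, cursor)) = lwz_zero_disp(w) else { continue };
        let end = code.len().min(start + 1 + WINDOW);
        // `any` and `find_map` consume the shared iterator while searching,
        // so each step is only looked for after the previous one.
        let mut rest = code[start + 1..end].iter();
        if !rest.any(|&x| mtctr_src(x) == Some(target)) { continue; }
        if !rest.any(|&x| is_bcctrl(x)) { continue; }
        if !rest.any(|&x| addi_self_4(x) == Some(cursor)) { continue; }
        let bound = rest.find_map(|&x| match cmp_word_regs(x) {
            Some((a, b)) if a == cursor && b != cursor => Some(b),
            _ => None,
        });
        if let Some(end_reg) = bound {
            return Some((start, cursor, end_reg));
        }
    }
    None
}
```

The matcher returns the cursor and end registers. A caller could then recover (array_start, array_end) by looking up the constants loaded into those registers before the loop.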
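The VMX bullet relies on the standard X-form encoding:

- primary opcode 31 in bits 0-5
- rA in bits 11-15
- rB in bits 16-20
- the extended opcode (XO) in bits 21-30

The effective address is EA = (rA|0) + rB, where an rA field of 0 means the literal value 0. Below is a minimal sketch of the classification and static EA resolution. It assumes GPR values are tracked as `Option<u32>`. The `vmx_xform` and `resolve` names are hypothetical, not the M6 dispatch code.

```rust
// Hypothetical sketch: classify opcode-31 AltiVec X-form loads/stores and
// resolve their effective address when the address registers are constant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Extended opcodes (bits 21-30) of the AltiVec X-form loads/stores.
pub fn vmx_xform(xo: u32) -> Option<(&'static str, Access)> {
    use Access::{Read, Write};
    Some(match xo {
        7 => ("lvebx", Read),
        39 => ("lvehx", Read),
        71 => ("lvewx", Read),
        103 => ("lvx", Read),
        359 => ("lvxl", Read),
        135 => ("stvebx", Write),
        167 => ("stvehx", Write),
        199 => ("stvewx", Write),
        231 => ("stvx", Write),
        487 => ("stvxl", Write),
        _ => return None,
    })
}

/// `regs[n]` is the known constant value of GPR n, or None if unknown.
pub fn resolve(word: u32, regs: &[Option<u32>; 32]) -> Option<(&'static str, Access, u32)> {
    if word >> 26 != 31 {
        return None;
    }
    let (name, access) = vmx_xform((word >> 1) & 0x3FF)?;
    let ra = ((word >> 16) & 31) as usize;
    let rb = ((word >> 11) & 31) as usize;
    // EA = (rA|0) + rB: an rA field of 0 means the value 0, not r0.
    let base = if ra == 0 { 0 } else { regs[ra]? };
    let mut ea = base.wrapping_add(regs[rb]?);
    // Full-vector forms ignore the low 4 bits (16-byte aligned access).
    if matches!(name, "lvx" | "lvxl" | "stvx" | "stvxl") {
        ea &= !0xF;
    }
    Some((name, access, ea))
}
```

Returning `None` whenever a needed register is unknown matches the commit's rule that static resolution requires both rA and rB to be constant.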
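Below is a sketch of the Shift_JIS pass. It makes these assumptions:

- Runs are NUL-terminated.
- Lead bytes are 0x81-0x9F and 0xE0-0xEF (the JIS X 0208 rows).
- Trail bytes are 0x40-0x7E and 0x80-0xFC.
- Single bytes 0xA1-0xDF are half-width katakana.

Each accepted run keeps printable ASCII verbatim and renders every other byte as a `\xHH` escape, in the spirit of the diagnostic rendering the commit describes. `scan_shift_jis` here is a hypothetical stand-in, not the function in `xenia_analysis::strings`. The `min_units` threshold is also an assumption.

```rust
// Hypothetical sketch of a NUL-terminated Shift_JIS run detector.
fn is_lead(b: u8) -> bool { matches!(b, 0x81..=0x9F | 0xE0..=0xEF) }
fn is_trail(b: u8) -> bool { matches!(b, 0x40..=0x7E | 0x80..=0xFC) }
fn is_halfwidth_kana(b: u8) -> bool { matches!(b, 0xA1..=0xDF) }

/// Returns (start offset, escaped rendering) for each NUL-terminated run
/// made only of printable ASCII, half-width katakana and lead/trail pairs,
/// containing at least one two-byte pair.
pub fn scan_shift_jis(buf: &[u8], min_units: usize) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < buf.len() {
        let start = i;
        let (mut units, mut pairs) = (0usize, 0usize);
        let mut text = String::new();
        let mut rejected = false;
        while i < buf.len() && buf[i] != 0 {
            let b = buf[i];
            if (0x20..=0x7E).contains(&b) {
                text.push(b as char);
                i += 1;
            } else if is_halfwidth_kana(b) {
                text.push_str(&format!("\\x{b:02X}"));
                i += 1;
            } else if is_lead(b) && i + 1 < buf.len() && is_trail(buf[i + 1]) {
                text.push_str(&format!("\\x{:02X}\\x{:02X}", b, buf[i + 1]));
                i += 2;
                pairs += 1;
            } else {
                rejected = true;
                break;
            }
            units += 1;
        }
        // Pure-ASCII runs (pairs == 0) are left to the ASCII scanner.
        if !rejected && i < buf.len() && pairs > 0 && units >= min_units {
            out.push((start, text));
        }
        // Step past the NUL, or past the rejected byte and restart there.
        i += 1;
    }
    out
}
```

Restarting one byte after a rejection keeps the sketch simple. The cost is that a valid suffix of a garbage run can still be reported.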
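For the UTF-8 side, the standard library already validates sequences, including rejecting overlong forms and surrogates. A sketch can therefore rely on `std::str::from_utf8` and then enforce the commit's rules:

- Only 2- and 3-byte sequences count as multi-byte units.
- At least one such unit must be present.
- Every other character must be printable ASCII.

Splitting on NUL and the `min_chars` threshold are assumptions. `scan_utf8` here is a hypothetical stand-in for the commit's function.

```rust
// Hypothetical sketch of a NUL-terminated UTF-8 run detector.
pub fn scan_utf8(buf: &[u8], min_chars: usize) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    let mut start = 0;
    for (i, &b) in buf.iter().enumerate() {
        if b != 0 {
            continue;
        }
        // from_utf8 rejects malformed, overlong and surrogate sequences.
        if let Ok(s) = std::str::from_utf8(&buf[start..i]) {
            let mut multibyte = 0usize;
            let ok = s.chars().all(|c| match c.len_utf8() {
                1 => (' '..='~').contains(&c),
                2 | 3 => {
                    multibyte += 1;
                    true
                }
                _ => false, // 4-byte sequences are outside this sketch's scope
            });
            // Pure-ASCII runs are left to the ASCII scanner.
            if ok && multibyte > 0 && s.chars().count() >= min_chars {
                out.push((start, s.to_string()));
            }
        }
        start = i + 1;
    }
    out
}
```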
crates/xenia-analysis/src/eh_scope.rs (new file, 296 lines)
```rust
//! M9.5 — MSVC `__CxxFrameHandler` scope-table parsing.
//!
//! When MSVC compiles C++ try/catch on Win32 PowerPC, the compiler emits
//! per-function `FuncInfo` records in `.rdata` containing the scope-state
//! tables that `__CxxFrameHandler` walks during unwinding. Each record
//! starts with one of the documented magic numbers:
//!
//! - `0x19930520` — original FuncInfo (no aligned-state-array)
//! - `0x19930521` — adds `pESTypeList` field
//! - `0x19930522` — adds `EHFlags` field
//!
//! Layout (4-byte little-endian on x86; **on Xbox 360 PowerPC PE the
//! struct is big-endian** because the binary is BE throughout):
//!
//! ```text
//! +0x00  uint32  magicNumber    (one of 0x199305{20,21,22})
//! +0x04  int32   maxState       (number of UnwindMapEntry rows)
//! +0x08  uint32  pUnwindMap     (VA → UnwindMapEntry[])
//! +0x0C  uint32  nTryBlocks
//! +0x10  uint32  pTryBlockMap   (VA → TryBlockMapEntry[])
//! +0x14  uint32  nIPMapEntries  (ignored on x86; present on PPC)
//! +0x18  uint32  pIPtoStateMap  (VA → IPtoStateMapEntry[])
//! +0x1C  uint32  pESTypeList    (only when magic ≥ 0x19930521)
//! +0x20  uint32  EHFlags        (only when magic = 0x19930522)
//! ```
//!
//! Each `UnwindMapEntry` is 8 bytes: `(toState i32, action u32)`.
//! Each `TryBlockMapEntry` is 20 bytes:
//! `(tryLow i32, tryHigh i32, catchHigh i32, nCatches u32, pHandlerArray u32)`.
//!
//! ### What this module does
//!
//! - Magic-scan `.rdata` for the three FuncInfo signatures (read as BE u32).
//! - Parse the FuncInfo record + walk the unwind map and try-block map.
//! - Skip records whose internal pointers don't land in valid sections,
//!   or whose lengths exceed sane caps.
//!
//! ### What this module does NOT do
//!
//! - Does not associate a FuncInfo back to its owning function. The
//!   `bl __CxxFrameHandler` registration would name that linkage, but
//!   it requires walking all `has_eh=true` functions' prologues; a
//!   future M9.6 can do that. For now the FuncInfo record stands on its
//!   own — joins to `functions` by best-effort PC range queries.
//! - Does not parse the `pHandlerArray` per try-block (catch type info).
//!
//! Reference: LLVM `llvm/lib/CodeGen/AsmPrinter/WinException.cpp`,
//! Microsoft openrce.org documentation on FuncInfo.

use xenia_xex::pe::PeSection;

const MAGIC_OLD: u32 = 0x1993_0520;
const MAGIC_V21: u32 = 0x1993_0521;
const MAGIC_V22: u32 = 0x1993_0522;

#[derive(Debug, Clone, Copy)]
pub struct UnwindMapEntry {
    pub to_state: i32,
    pub action_pc: u32, // VA of the cleanup action; 0 if none
}

#[derive(Debug, Clone, Copy)]
pub struct TryBlockMapEntry {
    pub try_low: i32,
    pub try_high: i32,
    pub catch_high: i32,
    pub n_catches: u32,
    pub p_handler_array: u32,
}

#[derive(Debug, Clone)]
pub struct EhFuncInfo {
    pub address: u32, // VA of the FuncInfo record itself
    pub magic: u32,
    pub max_state: i32,
    pub p_unwind_map: u32,
    pub n_try_blocks: u32,
    pub p_try_block_map: u32,
    pub n_ip_map_entries: u32,
    pub p_ip_to_state_map: u32,
    pub p_es_type_list: Option<u32>,
    pub eh_flags: Option<u32>,
    pub unwind_map: Vec<UnwindMapEntry>,
    pub try_blocks: Vec<TryBlockMapEntry>,
}

#[tracing::instrument(skip_all, fields(image_base = format_args!("{:#010x}", image_base)))]
pub fn analyze(
    pe: &[u8],
    image_base: u32,
    sections: &[PeSection],
) -> Vec<EhFuncInfo> {
    let started = std::time::Instant::now();
    let mut out: Vec<EhFuncInfo> = Vec::new();

    // Compute the union of valid VA ranges across all sections — used to
    // sanity-check internal pointers in the FuncInfo records.
    let valid_ranges: Vec<(u32, u32)> = sections.iter()
        .map(|s| (image_base + s.virtual_address,
                  image_base + s.virtual_address + s.virtual_size))
        .collect();
    let in_valid = |va: u32| valid_ranges.iter().any(|(lo, hi)| va >= *lo && va < *hi);

    let read_u32 = |abs: u32| -> Option<u32> {
        let off = abs.wrapping_sub(image_base) as usize;
        if off + 4 > pe.len() { return None; }
        Some(u32::from_be_bytes([pe[off], pe[off + 1], pe[off + 2], pe[off + 3]]))
    };
    let read_i32 = |abs: u32| -> Option<i32> { read_u32(abs).map(|u| u as i32) };

    for section in sections {
        if section.name != ".rdata" { continue; }
        let raw_start = section.virtual_address as usize;
        let raw_end = (section.virtual_address + section.virtual_size) as usize;
        if raw_end > pe.len() { continue; }
        let bytes = &pe[raw_start..raw_end.min(pe.len())];
        let va_base = image_base + section.virtual_address;

        // Walk on 4-byte alignment looking for the magic.
        let mut i = 0;
        while i + 4 <= bytes.len() {
            if !i.is_multiple_of(4) { i += 1; continue; }
            let m = u32::from_be_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
            if m == MAGIC_OLD || m == MAGIC_V21 || m == MAGIC_V22 {
                let addr = va_base + i as u32;
                if let Some(rec) = parse_funcinfo(addr, m, &read_u32, &read_i32, &in_valid) {
                    out.push(rec);
                }
            }
            i += 4;
        }
    }

    let elapsed_ms = started.elapsed().as_millis() as f64;
    let n_unwind: usize = out.iter().map(|r| r.unwind_map.len()).sum();
    let n_try: usize = out.iter().map(|r| r.try_blocks.len()).sum();
    metrics::histogram!("analysis.phase_ms", "phase" => "eh_scope").record(elapsed_ms);
    tracing::info!(
        records = out.len(),
        unwind_entries = n_unwind,
        try_blocks = n_try,
        elapsed_ms,
        "M9.5 EH scope-table scan complete",
    );
    out
}

fn parse_funcinfo(
    addr: u32,
    magic: u32,
    read_u32: &impl Fn(u32) -> Option<u32>,
    read_i32: &impl Fn(u32) -> Option<i32>,
    in_valid: &impl Fn(u32) -> bool,
) -> Option<EhFuncInfo> {
    let max_state = read_i32(addr + 0x04)?;
    let p_unwind_map = read_u32(addr + 0x08)?;
    let n_try_blocks = read_u32(addr + 0x0C)?;
    let p_try_block_map = read_u32(addr + 0x10)?;
    let n_ip_map_entries = read_u32(addr + 0x14)?;
    let p_ip_to_state_map = read_u32(addr + 0x18)?;

    // Sanity caps: real FuncInfo records have max_state ≤ a few thousand,
    // n_try_blocks ≤ a few hundred. Reject obviously bogus values that
    // happened to alias the magic.
    if !(0..=10_000).contains(&max_state) { return None; }
    if n_try_blocks > 1_000 { return None; }
    if n_ip_map_entries > 100_000 { return None; }
    // Pointers must either be NULL or land in a valid section.
    if p_unwind_map != 0 && !in_valid(p_unwind_map) { return None; }
    if p_try_block_map != 0 && !in_valid(p_try_block_map) { return None; }
    if p_ip_to_state_map != 0 && !in_valid(p_ip_to_state_map) { return None; }

    let (p_es_type_list, eh_flags) = if magic == MAGIC_V21 {
        (read_u32(addr + 0x1C), None)
    } else if magic == MAGIC_V22 {
        (read_u32(addr + 0x1C), read_u32(addr + 0x20))
    } else {
        (None, None)
    };

    // Walk unwind map (8-byte entries).
    let mut unwind_map: Vec<UnwindMapEntry> = Vec::with_capacity(max_state as usize);
    if p_unwind_map != 0 && max_state > 0 {
        for i in 0..max_state {
            let p = p_unwind_map.wrapping_add((i * 8) as u32);
            let to_state = read_i32(p)?;
            let action_pc = read_u32(p + 4)?;
            unwind_map.push(UnwindMapEntry { to_state, action_pc });
        }
    }

    // Walk try-block map (20-byte entries).
    let mut try_blocks: Vec<TryBlockMapEntry> = Vec::with_capacity(n_try_blocks as usize);
    if p_try_block_map != 0 && n_try_blocks > 0 {
        for i in 0..n_try_blocks {
            let p = p_try_block_map.wrapping_add(i * 20);
            let try_low = read_i32(p)?;
            let try_high = read_i32(p + 4)?;
            let catch_high = read_i32(p + 8)?;
            let n_catches = read_u32(p + 12)?;
            let p_handler_a = read_u32(p + 16)?;
            try_blocks.push(TryBlockMapEntry {
                try_low, try_high, catch_high, n_catches, p_handler_array: p_handler_a,
            });
        }
    }

    Some(EhFuncInfo {
        address: addr,
        magic,
        max_state,
        p_unwind_map,
        n_try_blocks,
        p_try_block_map,
        n_ip_map_entries,
        p_ip_to_state_map,
        p_es_type_list,
        eh_flags,
        unwind_map,
        try_blocks,
    })
}

#[cfg(test)]
mod tests {
    use super::*;
    use xenia_xex::pe::PeSection;

    fn mk_section(name: &str, va: u32, size: u32) -> PeSection {
        PeSection {
            name: name.into(),
            virtual_address: va, virtual_size: size,
            raw_offset: va, raw_size: size,
            flags: 0x4000_0040,
        }
    }

    fn write_be(pe: &mut [u8], at: usize, v: u32) {
        pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
    }
    fn write_be_i32(pe: &mut [u8], at: usize, v: i32) {
        pe[at..at + 4].copy_from_slice(&v.to_be_bytes());
    }

    #[test]
    fn parses_minimal_funcinfo_v0() {
        let image_base = 0x82000000u32;
        let rdata_va = 0x1000u32;
        let mut pe = vec![0u8; 0x4000];

        // FuncInfo at .rdata + 0x10.
        let fi_off = (rdata_va + 0x10) as usize;
        let fi_va = image_base + rdata_va + 0x10;
        let unwind_off = (rdata_va + 0x80) as usize;
        let unwind_va = image_base + rdata_va + 0x80;

        write_be(&mut pe, fi_off, MAGIC_OLD);       // magic
        write_be_i32(&mut pe, fi_off + 4, 2);       // maxState
        write_be(&mut pe, fi_off + 8, unwind_va);   // pUnwindMap
        write_be(&mut pe, fi_off + 12, 0);          // nTryBlocks
        write_be(&mut pe, fi_off + 16, 0);          // pTryBlockMap
        write_be(&mut pe, fi_off + 20, 0);          // nIPMapEntries
        write_be(&mut pe, fi_off + 24, 0);          // pIPtoStateMap

        // Two unwind entries.
        write_be_i32(&mut pe, unwind_off, -1);                   // to_state
        write_be(&mut pe, unwind_off + 4, image_base + 0x500);   // action_pc
        write_be_i32(&mut pe, unwind_off + 8, 0);
        write_be(&mut pe, unwind_off + 12, image_base + 0x600);

        let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
        let recs = analyze(&pe, image_base, &sections);
        assert_eq!(recs.len(), 1);
        let r = &recs[0];
        assert_eq!(r.address, fi_va);
        assert_eq!(r.magic, MAGIC_OLD);
        assert_eq!(r.max_state, 2);
        assert_eq!(r.unwind_map.len(), 2);
        assert_eq!(r.unwind_map[0].to_state, -1);
        assert_eq!(r.unwind_map[0].action_pc, image_base + 0x500);
        assert_eq!(r.try_blocks.len(), 0);
    }

    #[test]
    fn rejects_bogus_max_state() {
        let image_base = 0x82000000u32;
        let rdata_va = 0x1000u32;
        let mut pe = vec![0u8; 0x4000];
        let fi_off = (rdata_va + 0x10) as usize;
        write_be(&mut pe, fi_off, MAGIC_OLD);
        write_be_i32(&mut pe, fi_off + 4, 0xFFFF); // bogus maxState
        let sections = vec![mk_section(".rdata", rdata_va, 0x100)];
        let recs = analyze(&pe, image_base, &sections);
        assert_eq!(recs.len(), 0);
    }
}
```
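The module records the unwind map but does not interpret it. In the MSVC scheme, each `UnwindMapEntry` names the state to move to (`toState`) and an optional cleanup action to run on the way. The cleanups that run when unwinding from one state down to another are found by following the `to_state` links.

The sketch below shows that walk. It uses a local copy of the struct so that it compiles on its own. `unwind_actions` is a hypothetical helper, and its cycle guard is a defensive assumption for corrupt tables.

```rust
// Hypothetical helper, not part of eh_scope.rs.
/// Local mirror of `eh_scope::UnwindMapEntry`, so this sketch stands alone.
#[derive(Debug, Clone, Copy)]
pub struct UnwindMapEntry {
    pub to_state: i32,
    pub action_pc: u32, // VA of the cleanup action; 0 if none
}

/// Cleanup-action VAs, in the order they would run, when unwinding from
/// state `from` down to state `target` (-1 is the empty state outside every
/// scope). Returns None if the chain leaves the table or loops.
pub fn unwind_actions(map: &[UnwindMapEntry], from: i32, target: i32) -> Option<Vec<u32>> {
    let mut actions = Vec::new();
    let mut state = from;
    let mut steps = 0usize;
    while state != target {
        // A negative or out-of-range state means a malformed chain.
        let entry = map.get(usize::try_from(state).ok()?)?;
        if entry.action_pc != 0 {
            actions.push(entry.action_pc);
        }
        state = entry.to_state;
        steps += 1;
        if steps > map.len() {
            return None; // longer than the table: the links form a cycle
        }
    }
    Some(actions)
}
```

Take the two entries written by `parses_minimal_funcinfo_v0`. Unwinding from state 1 to -1 runs the `image_base + 0x600` action first and then the `image_base + 0x500` action.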
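Similarly, a try block guards the states from `try_low` through `try_high` inclusive. Finding the candidate try blocks for a given state is therefore a range filter over the map.

The sketch below uses a local copy of `TryBlockMapEntry`. `enclosing_try_blocks` is a hypothetical helper. It returns indices in table order and makes no assumption about how nested blocks are ordered.

```rust
// Hypothetical helper, not part of eh_scope.rs.
/// Local mirror of `eh_scope::TryBlockMapEntry`, so this sketch stands alone.
#[derive(Debug, Clone, Copy)]
pub struct TryBlockMapEntry {
    pub try_low: i32,
    pub try_high: i32,
    pub catch_high: i32,
    pub n_catches: u32,
    pub p_handler_array: u32,
}

/// Indices (in table order) of the try blocks whose guarded state range
/// `try_low..=try_high` contains `state`.
pub fn enclosing_try_blocks(blocks: &[TryBlockMapEntry], state: i32) -> Vec<usize> {
    blocks
        .iter()
        .enumerate()
        .filter(|(_, b)| (b.try_low..=b.try_high).contains(&state))
        .map(|(i, _)| i)
        .collect()
}
```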