SOGAR learns Pareto-optimal recourse summaries by solving a bi-objective decision tree optimization that partitions populations and assigns shared low-cost actions per subgroup.
A Unified Approach to Interpreting Model Predictions
Canonical reference. 100% of citing Pith papers cite this work as background.
abstract
Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.
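As an illustration of the additive feature importance idea described in the abstract, the sketch below computes exact Shapley values for a tiny model by enumerating feature subsets. It is a minimal sketch and not the paper's implementation: the names `shapley_values` and `toy_model` are hypothetical, and representing an "absent" feature by substituting a fixed baseline value is an assumption made here for simplicity. The per-feature values sum to the difference between the model output at the input and at the baseline, which is the additive property the abstract refers to.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by subset enumeration.

    Assumption: a feature that is "absent" from a coalition is replaced
    by its value in `baseline`. Cost grows as 2^n, so this only suits
    very small n.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, others from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one pairwise interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# The attributions add up to the change in output relative to the baseline.
total = sum(phi)
gap = toy_model(x) - toy_model(baseline)
print(phi, total, gap)
```

In this toy case the additive term is credited entirely to the first feature, while the interaction term is split evenly between the two features that take part in it.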
fields
cs.LG 13, cs.CL 6, cs.CR 4, cond-mat.mtrl-sci 3, cs.AI 3, cs.SE 3, astro-ph.GA 2, astro-ph.CO 1, astro-ph.EP 1, astro-ph.IM 1
roles
background 5
polarities
background 5
representative citing papers
GlyTwin generates patient-centric counterfactual behavioral interventions to reduce hyperglycemia in type 1 diabetes, evaluated on a new dataset from 50 patients showing 85.8% valid explanations and 87.3% effectiveness.
ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.
Derives a closed-form Shapley value for the squared robust Interval-Mahalanobis distance to explain variable contributions to outlyingness in interval-valued data.
KG-TRACE fuses genomic features with RotatE KG embeddings via an epistemic trust gate for AMR prediction, reporting 0.976 AUROC on isoniazid resistance in the CRyPTIC cohort plus 92.5% symbolic coverage via a new Biological Grounding Ratio metric.
TSM-Bench shows SOTA MGT detectors drop 10-40% in accuracy on task-specific Wikipedia edits versus generic text, with fine-tuning on task-specific data generalizing better than the reverse.
ML-accelerated screening of 8640 AB2C2D variants yields 34 low-hull-energy altermagnets with spin splittings exceeding 1.5 eV, including RbMn2Te2O with 1.88 eV splitting and ~390 K Neel temperature.
CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
A framework using covariance-based spectral signatures and TreeSHAP attributions on AASIST3 branches identifies four operational archetypes and a flawed specialization mode that explains high error rates on specific spoofing attacks.
Neural network classification with CRPS optimization produces calibrated photometric redshift PDFs for DESI Legacy and Pan-STARRS data, achieving σ_NMAD of 0.0153 on LSDR10 and outperforming regression methods.
XGBoost classifier filters interlopers in CSST slitless spectroscopy simulations, retaining 42% of galaxies with 96.6% accurate redshifts and 0.13% outliers.
AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
New unsupervised method adapts the multivariate logrank statistic into a differentiable loss for training any neural network on any data modality to discover prognostically distinct patient clusters, demonstrated on myeloma lab data and lung cancer CT images with post-hoc explainability.
A demographically and topically split Reddit dataset called Splits! is constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena via a two-stage filtering process for promising candidates.
News embeddings from financial text improve out-of-sample realized volatility forecasts for stocks, with stronger effects for stock-specific news and high-volatility periods, and yield gains when combined with benchmarks.
Reinforcement learning on MIR features combined with cargo-fuzz validation reduces false positives in Rust static memory safety analysis, raising precision from 25.6% to 59.0% and accuracy to 65.2%.
The paper presents a taxonomy of seven production-specific failure modes for agentic AI, demonstrates that existing metrics fail to detect four of them entirely, and proposes the PAEF five-dimension framework for continuous production evaluation with an open-source implementation.
A new scale-aware diagnostic framework shows that unconstrained diffusion generative models exhibit structural freezing and instability instead of smooth physical responses under multiscale perturbations.
Cross-modal averaging maps ECG model attributions to CineECG 3D space, raising Dice overlap with expert annotations from 0.47 to 0.56 on 20 cases while filtering attribution noise.
TrajOnco uses a chain-of-agents LLM architecture with memory to perform temporal reasoning on longitudinal EHR, achieving 0.64-0.80 AUROC for 1-year multi-cancer risk prediction in zero-shot mode on matched cohorts while matching supervised ML on lung cancer and outperforming single-agent baselines.
3D CNNs predict elastic moduli of nanoporous metals with R²=0.955, outperforming descriptor-based models, and transfer learning works on smaller denser datasets for large-scale Pareto optimization.
ML regressors trained on APOGEE DR17 red giants predict C, O, Mg, Si abundances from kinematics and [Fe/H] more accurately than [Fe/H] baseline, with external validation on HARPS FGK dwarfs and reproduction of Galactic chemical evolution trends.
ICA and VEIL enable privacy-preserving supervised ML by producing structurally non-invertible encodings aligned with downstream tasks while maintaining predictive utility.
SANJESH applies bi-level optimization to production traces and reveals VM allocation scenarios that cause 4x worse performance than the operator's existing evaluator detected.
citing papers explorer
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
  ExaGPT uses span-level similarity retrieval from human and LLM datastores to detect machine-generated text while supplying the matching spans as human-interpretable evidence, achieving up to 37-point accuracy gains over prior interpretable detectors at 1% FPR.
- TSM-Bench: Detecting LLM-Generated Text in Real-World Wikipedia Editing Practices
  TSM-Bench shows SOTA MGT detectors drop 10-40% in accuracy on task-specific Wikipedia edits versus generic text, with fine-tuning on task-specific data generalizing better than the reverse.
- Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models
  CA-LIG is a unified hierarchical attribution method that computes layer-wise Integrated Gradients fused with class-specific attention gradients to generate signed, context-sensitive explanations for transformer models.
- AttnTrace: Contextual Attribution of Prompt Injection and Knowledge Corruption
  AttnTrace is an attention-weight-based context traceback method for LLMs that claims higher accuracy and efficiency than prior art like TracLLM while aiding prompt injection detection.
- Splits! Flexible Sociocultural Linguistic Investigation at Scale
  A demographically and topically split Reddit dataset called Splits! is constructed and validated to support scalable, flexible investigation of sociocultural linguistic phenomena via a two-stage filtering process for promising candidates.
- Surrogate modeling for interpreting black-box LLMs in medical predictions
  A surrogate modeling method approximates LLM-encoded medical knowledge via prompting to quantify variable influence and flag inaccuracies and racial biases.