Interpretability Can Be Actionable

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
6 Pith papers cite this work.

6 representative citing papers (2026)
citing papers explorer
-
Interpretability Can Be Actionable
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
-
Flag Varieties: A Geometric Framework for Deep Network Alignment
Alignment in deep networks is governed by flag varieties, with subspace intersection dimension as the unique reparameterization-invariant observable, explaining regularization and activation effects from first principles.
-
Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics
Symmetric spectral diagnostics on attention are structurally blind to flow direction, with asymmetry G as the sole control parameter, yielding a two-axis test that distinguishes bottleneck versus diffuse hallucination modes with opposite polarity.
-
Tracing the Thought of a Grandmaster-level Chess-Playing Transformer
Sparse replacement layers decompose the MLP and attention modules of a chess-playing transformer to reveal verifiable tactical reasoning pathways and parallel computation patterns.
-
Automated Attention Pattern Discovery at Scale in Large Language Models
AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.
-
Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models
CTT is a compression pipeline for LLMs that achieves up to 49x memory reduction, 10x faster inference, 81% lower CO2 emissions, and retains 68-98% accuracy on code clone detection, summarization, and generation tasks.
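The "Self-Attention as Transport" entry above claims that symmetric spectral diagnostics on attention are structurally blind to flow direction. A minimal NumPy sketch of that structural point: any matrix splits uniquely into symmetric and antisymmetric parts, and a diagnostic built only from the symmetric part cannot see the antisymmetric component, which is what encodes directionality. The scalar computed here (antisymmetric-to-total Frobenius norm ratio) is a hypothetical stand-in for the paper's asymmetry parameter G, not the paper's own definition.

```python
import numpy as np

def asymmetry(A: np.ndarray) -> float:
    """Ratio of the antisymmetric part's Frobenius norm to the total norm.

    A hypothetical proxy for the flow-direction content of an attention
    matrix: it is zero iff A is exactly symmetric.
    """
    K = 0.5 * (A - A.T)  # antisymmetric part: carries directional flow
    return float(np.linalg.norm(K) / np.linalg.norm(A))

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 8))
# Row-softmax to mimic an attention matrix (rows sum to 1).
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

S = 0.5 * (A + A.T)  # symmetric part: all a symmetric spectral test sees

print(asymmetry(A))  # > 0 for a generic attention matrix
print(asymmetry(S))  # prints 0.0: symmetrizing destroys flow information
```

Because `S` retains the full spectrum used by symmetric diagnostics while `asymmetry(S)` is identically zero, two attention matrices with opposite flow directions can be indistinguishable to such tests — the blindness the abstract describes.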