Advances in Neural Information Processing Systems , volume=

Towards automated circuit discovery for mechanistic interpretability , author=

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

representative citing papers

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.

Cell-Based Representation of Relational Binding in Language Models

cs.CL · 2026-04-21 · unverdicted · novelty 7.0

Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 5 of 5 citing papers.

GKnow: Measuring the Entanglement of Gender Bias and Factual Gender cs.CL · 2026-05-12 · unverdicted · none · ref 26
Gender bias and factual gender knowledge are severely entangled in language model circuits and neurons, making neuron ablation an unreliable method for debiasing.
Cell-Based Representation of Relational Binding in Language Models cs.CL · 2026-04-21 · unverdicted · none · ref 41
Large language models encode relational bindings via a cell-based representation: a low-dimensional linear subspace in which each cell corresponds to an entity-relation index pair and attributes are retrieved from the matching cell.
Interpretability Can Be Actionable cs.LG · 2026-05-11 · conditional · none · ref 153
Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions cs.LG · 2026-05-20 · unverdicted · none · ref 38
Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.
There Will Be a Scientific Theory of Deep Learning stat.ML · 2026-04-23 · unverdicted · none · ref 291
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

Advances in Neural Information Processing Systems , volume=

fields

years

verdicts

representative citing papers

citing papers explorer