Maheep Chaudhary and Atticus Geiger

Evaluating open-source sparse autoencoders on disentangling factual knowledge in GPT-2 small · 2024 · arXiv 2409.04478

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction

cs.LG · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

PLOT localizes causal variables in neural networks by fitting optimal transport couplings between abstract and neural intervention effect geometries, enabling fast handles or guided search.

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

cs.AI · 2026-06-05 · unverdicted · novelty 6.0

AGCLR extends CoCoNuT with a gated concept stream for persistent memory to fix fact loss in latent reasoning, yielding improvements on reasoning benchmarks as depth increases.

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

RL preserves a larger fraction of base model circuits than SFT during fine-tuning on scientific QA, per a new head-level differential circuit vulnerability metric, at the cost of slower adaptation.

citing papers explorer

PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction · cs.LG · 2026-05-07 · unverdicted · none · ref 2 · 2 links
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning · cs.AI · 2026-06-05 · unverdicted · none · ref 35
Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT? · cs.LG · 2026-05-21 · unverdicted · none · ref 3