pith. sign in

hub Canonical reference

Distill , year =

Canonical reference. 100% of citing Pith papers cite this work as background.

29 Pith papers citing it
Background 100% of classified citations

hub tools

citation-role summary

background 5

citation-polarity summary

roles

background 5

polarities

background 5

clear filters

representative citing papers

Toy Models of Superposition

cs.LG · 2022-09-21 · accept · novelty 8.0

Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.

Toward Identifiable Sparse Autoencoders

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

Identifiable sparse autoencoders (iSAEs) are created from TopK SAEs via architecture and training tweaks, yielding improved stability and lower error by linking to dictionary learning where learned dictionaries satisfy an approximate restricted isometry condition.

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

In-context Learning and Induction Heads

cs.LG · 2022-09-24 · unverdicted · novelty 7.0

Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning in transformers.

Feature Identification via the Empirical NTK

cs.LG · 2025-10-01 · unverdicted · novelty 6.0

Eigenanalysis of the empirical NTK surfaces feature directions that align with Fourier features in modular addition networks and grammatical features in Gemma-3-270M, outperforming PCA baselines on activations.

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific

Linear Representations of Sentiment in Large Language Models

cs.LG · 2023-10-23 · unverdicted · novelty 6.0

Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.

citing papers explorer

Showing 5 of 5 citing papers after filters.

  • Toy Models of Superposition cs.LG · 2022-09-21 · accept · none · ref 1

    Toy models demonstrate that polysemanticity arises when neural networks store more sparse features than neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.

  • Data-driven Circuit Discovery for Interpretability of Language Models cs.AI · 2026-05-09 · unverdicted · none · ref 20

    Standard circuit discovery methods produce dataset-specific circuits rather than task-general ones, and a new clustering-based method discovers multiple more faithful circuits per dataset.

  • In-context Learning and Induction Heads cs.LG · 2022-09-24 · unverdicted · none · ref 19

    Induction heads, which implement pattern completion in attention, develop at the same training stage as a sudden rise in in-context learning, providing evidence they are the primary mechanism for in-context learning in transformers.

  • Towards Effective Theory of LLMs: A Representation Learning Approach cs.LG · 2026-05-10 · unverdicted · none · ref 42

    RET learns temporally consistent macrovariables from LLM activations via self-supervised learning to support interpretability, early behavioral prediction, and causal intervention.

  • Engineering Resource-constrained Software Systems with DNN Components: a Concept-based Pruning Approach cs.SE · 2026-04-11 · unverdicted · none · ref 76

    A concept-based pruning method for DNNs guided by interpretable concepts and system requirements produces smaller, computationally efficient models that maintain effectiveness on image classification tasks.