From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit · 2025 · arXiv preprint arXiv:2506.03093

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability

cs.LG · 2026-07-02 · conditional · novelty 8.0

Expander SAEs apply left-d-regular expander masks to TopK SAEs, learning only dn decoder parameters instead of mn and tracing a storage-fidelity frontier that reaches 293x compression with 84% retained performance on Qwen2.5-3B.

A Unifying Framework for Concept-Based Representational Similarity

cs.LG · 2026-06-08 · unverdicted · novelty 7.0

A unifying framework decomposes concept alignment into instance-wise and distributional translation and concept consistency, introduces the InterVenchA benchmark, and shows that joint optimization via CoSAE recovers strong alignment even with 0.1% paired data.

Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Linear probes for Othello board states factor into tensor-product structure with square and color embeddings composed by a binding matrix, from which the linear probes can be directly recovered.

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

Introduces a hierarchical latent selection model showing SFT supplies raw module materials in compound traces while RL decomposes them to identify atomic modules and enable recombination for new reasoning configurations.

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

Formalizes concept learning in sparse autoencoders as set alignment between human-defined and model-induced concepts, distinguishing detection, separation, and approximation with geometric conditions for neuron representation.

Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.

Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal

cs.LG · 2026-06-10 · unverdicted · novelty 5.0

A new pipeline uses interpretability to characterize concepts in preference data and shape rewards via feature or data interventions during LM post-training.

citing papers explorer

Showing 7 of 7 citing papers.

Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability · cs.LG · 2026-07-02 · conditional · none · ref 5
A Unifying Framework for Concept-Based Representational Similarity · cs.LG · 2026-06-08 · unverdicted · none · ref 48
Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions · cs.LG · 2026-05-11 · unverdicted · none · ref 5
From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning · cs.LG · 2026-06-16 · unverdicted · none · ref 91
A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders · cs.LG · 2026-06-05 · unverdicted · none · ref 9
Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space · cs.CL · 2026-05-12 · unverdicted · none · ref 26
Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal · cs.LG · 2026-06-10 · unverdicted · none · ref 26
