Feature Visualization
4 Pith papers cite this work.
Fields: cs.LG
4 representative citing papers
- Toy Models of Superposition
  Toy models demonstrate that polysemanticity arises when neural networks represent more sparse features than they have neurons via superposition, producing a phase transition tied to polytope geometry and increased adversarial vulnerability.
- From Mechanistic to Compositional Interpretability
  Compositional interpretability defines explanations as commuting pairs of syntactic and semantic mappings, grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise, human-aligned decompositions.
- NeuroViz: Real-time Interactive Visualization of Forward and Backward Passes in Neural Network Training
  NeuroViz offers interactive, real-time visualization of neural network forward and backward passes, achieving the top usability scores against existing tools in a 31-participant study.
- Open Problems in Mechanistic Interpretability
  A review that organizes the conceptual, practical, and socio-technical open problems in mechanistic interpretability.
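The superposition idea summarized in the Toy Models entry, representing more sparse features than there are neurons by assigning each feature a non-orthogonal direction, can be sketched in a few lines of NumPy. The 5-feature/2-neuron sizes and the tied linear encoder/decoder are illustrative assumptions, not the paper's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 5, 2  # more features than neurons

# Assign each feature a unit-norm direction in the 2-dimensional neuron space.
# In 2D, five directions cannot all be orthogonal, so features must overlap.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only one feature is active at a time.
x = np.zeros(n_features)
x[3] = 1.0

h = x @ W        # compress 5 features into 2 neurons (superposition)
x_hat = h @ W.T  # tied linear readout

print(np.round(x_hat, 3))
```

Because the directions overlap, `x_hat` is nonzero on inactive features: the readout for feature `i` is the cosine between directions `i` and `3`, so the active feature recovers its full value while the others pick up interference. When inputs are sparse, that interference is rarely paid, which is roughly the regime in which the entry's phase transition into superposition occurs.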