The neighborhoods we consider are of the form Bϵ(x), where x is the embedding of the last token of a prompt at the 31st of GPT2-Large

More specifically, we test how the attribution values of the neurons changes as we consider increasingly large neighborhoods · 2021

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and saliency maps.

citing papers explorer

Showing 1 of 1 citing paper.

The Linear Centroids Hypothesis: Features as Directions Learned by Local Experts cs.LG · 2026-04-13 · unverdicted · none · ref 17
The Linear Centroids Hypothesis reframes network features as directions in centroid spaces of local affine experts, unifying interpretability methods and yielding sparser, more faithful dictionaries, circuits, and saliency maps.

The neighborhoods we consider are of the form Bϵ(x), where x is the embedding of the last token of a prompt at the 31st of GPT2-Large

fields

years

verdicts

representative citing papers

citing papers explorer