Enhancing automated interpretability with output-centric feature descriptions

2 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (2)

years: 2026 (2)

verdicts: unverdicted (2)

representative citing papers

Prototype Language Models

cs.LG · 2026-07-01 · unverdicted · novelty 6.0

PRISM forms predictions as sparse mixtures of learned prototypes trained with clustering objectives, matching dense model accuracy while enabling ~500x faster data attribution and behavior editing without finetuning.

citing papers explorer

Showing 2 of 2 citing papers after filters.

  • Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects · cs.LG · 2026-05-30 · unverdicted · none · ref 17

    Query Lens extends Logit Lens to interpret sparse features via key-value analysis and indirect effects, yielding coherent token signatures where Logit Lens fails, and proposes the Subspace Channel Hypothesis.

  • Prototype Language Models · cs.LG · 2026-07-01 · unverdicted · none · ref 153

    PRISM forms predictions as sparse mixtures of learned prototypes trained with clustering objectives, matching dense model accuracy while enabling ~500x faster data attribution and behavior editing without finetuning.