The average next-token cross-entropy is $\mathrm{CE}_r = -\frac{1}{M} \sum_{i=1}^{M} \log p_i^{(r)}(y_i)$. (16) Here $r = \mathrm{SAE}$ uses $\tilde{h}$, $r = \mathrm{Id}$ uses the original $h$, and $r = 0$ uses $0$

For substitution $r \in \{\mathrm{SAE}, \mathrm{Id}, 0\}$, let $p_i^{(r)}$ be the resulting next-token distribution at evaluated position $i$, let $y_i$ be the target next token · arXiv 8180.9352
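The averaged cross-entropy in Eq. (16) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `log_softmax` and `average_cross_entropy` are hypothetical, and the logits are made-up numbers standing in for the model outputs under each substitution $r$. How those outputs are produced (running the model with $\tilde{h}$, $h$, or $0$ in place of the activation) is not shown.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def average_cross_entropy(logits_per_position, targets):
    """CE_r = -(1/M) * sum_i log p_i^{(r)}(y_i) over M evaluated positions."""
    if len(logits_per_position) != len(targets) or not targets:
        raise ValueError("need one target per evaluated position, M >= 1")
    total = 0.0
    for logits, y in zip(logits_per_position, targets):
        total += log_softmax(logits)[y]
    return -total / len(targets)

# Hypothetical logits for M = 2 positions and a 3-token vocabulary,
# one set per substitution r. The values are invented for illustration only.
targets = [0, 2]
logits_by_r = {
    "Id":  [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
    "SAE": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "0":   [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
}
ce = {r: average_cross_entropy(l, targets) for r, l in logits_by_r.items()}
```

With these invented inputs, `ce["Id"]` is the smallest value and `ce["0"]` the largest. The `"0"` row was chosen to be uniform logits purely for the example, so its cross-entropy equals the log of the vocabulary size.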

1 Pith paper cites this work. Polarity classification is still indexing.



representative citing papers

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

VASAE introduces vocabulary-aligned anchoring to train SAEs that yield features with intrinsic token names, reporting high alignment rates in early layers of GPT-2 and Llama-3.1 without reconstruction loss.

citing papers explorer

Showing 1 of 1 citing paper.

VASAE: Naming SAE Dictionary Directions with Vocabulary-Aligned Anchoring · cs.CL · 2026-06-26 · unverdicted · none · ref 22

