LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
Analyzing Encoded Concepts in Transformer Language Models
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
LLM representations encode essay quality in a linearly decodable form that emerges across layers and includes identifiable scoring neurons whose distribution shifts with essay length.
Pilot study uses pretrained video encoder features from lung ultrasound to predict 30-day CHF readmission, finding lower-lung views and temporal differences most informative with top MLP F1 of 0.80.
citing papers explorer
-
Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models
LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.
-
From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models
LLM representations encode essay quality in a linearly decodable form that emerges across layers and includes identifiable scoring neurons whose distribution shifts with essay length.