LRS trains a latent reward model on final-answer correctness to steer SAE states during inference, improving reasoning performance and implicitly encouraging better cognitive behaviors.
arXiv preprint arXiv:2602.01695 , year=
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
LRS trains a latent reward model on final-answer correctness to steer SAE states during inference, improving reasoning performance and implicitly encouraging better cognitive behaviors.