LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
The Eleventh International Conference on Learning Representations , year=
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.
citing papers explorer
-
Logic-Regularized Verifier Elicits Reasoning from LLMs
LOVER creates an unsupervised logic-regularized verifier that reaches 95% of supervised verifier performance on reasoning tasks across 10 datasets.
-
Steering Language Models With Activation Engineering
Activation Addition steers language models by adding contrastive activation vectors from prompt pairs to control high-level properties like sentiment and toxicity at inference time without training.
-
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
At sufficient scale, LLMs linearly represent the truth value of factual statements, as shown by visualizations, cross-dataset generalization, and causal interventions that flip truth judgments.