TruthfulQA: Measuring How Models Mimic Human Falsehoods
2 Pith papers cite this work.
fields: cs.CL (2)
citing papers explorer
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
  DoLa reduces hallucinations in LLMs by contrasting logits from later versus earlier layers during decoding, improving truthfulness on TruthfulQA by 12-17 absolute points without fine-tuning or retrieval.
- We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong
  AMBS is a 1-to-N Transformer steering framework that shares a base representation across HHH objectives and restricts divergence during inference to produce consistent multi-objective responses in one forward pass.