Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
Transactions of the Association for Computational Linguistics
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CL 3
verdicts
UNVERDICTED 3
representative citing papers
Pragmatic reasoning in LLMs varies substantially by evaluation method and model family, with scalar diversity patterns appearing only in certain conditions rather than reflecting stable competence.
A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.
citing papers explorer
- Chain-of-Verification Reduces Hallucination in Large Language Models
  Chain-of-Verification reduces hallucinations in large language models by drafting responses, planning independent verification questions, answering them separately, and generating a final verified output.
- Evaluating Pragmatic Reasoning in Large Language Models: Evidence from Scalar Diversity
  Pragmatic reasoning in LLMs varies substantially by evaluation method and model family, with scalar diversity patterns appearing only in certain conditions rather than reflecting stable competence.
- Calibrating Model-Based Evaluation Metrics for Summarization
  A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.