On the probability–quality paradox in language generation

Clara Meister, Gian Wiher, Tiago Pimentel, Ryan Cotterell · 2022 · DOI 10.18653/v1/2022.acl-short.5

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.

Can Reasoning Models Detect Changes to their Chains of Thought?

cs.AI · 2026-06-20 · unverdicted · novelty 5.0

Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.

citing papers explorer

Showing 3 of 3 citing papers.

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence? · cs.CL · 2026-05-27 · unverdicted · none · ref 56
LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs · cs.CL · 2026-06-30 · unverdicted · none · ref 70
RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.

Can Reasoning Models Detect Changes to their Chains of Thought? · cs.AI · 2026-06-20 · unverdicted · none · ref 274
Reasoning models detect modifications to their chains of thought with only modest accuracy and cannot reliably identify the nature of those modifications.
