Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong

Chenglei Si, Navita Goyal, Tongshuang Wu, Chen Zhao, Shi Feng, Hal Daumé III, Jordan Boyd-Graber · 2024 · DOI 10.18653/v1/2024.naacl-long.81

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Embodied Explainability and Ontological Obstacles: Why We Struggle to Explain the Answers of Large Language Models (LLMs)

cs.HC · 2026-06-22 · unverdicted · novelty 7.0

An argument paper reframes LLM explainability as an embodied, situated practice based on Dourish and enactivist cognition, identifying ontological obstacles in internal explanations and advocating affordance-based designs.

HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification

cs.HC · 2026-06-17 · unverdicted · novelty 6.0

HANSEL extracts navigable evidence from agent trajectories with 83.7% precision and 88.8% recall on 45 tasks, reduces volume by 61.6%, and improves verification metrics in a 14-participant study.

When LLM Rationales Become User-Facing: Effects on Trust Perception, Decision-Making, and Gaze Behaviors

cs.HC · 2026-06-24 · unverdicted · novelty 5.0

Two linked user studies find that LLM rationale correctness and certainty framing affect trust and decision confidence while presentation format does not, and incorrect rationales increase gaze attention and pupil size.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

citing papers explorer

Embodied Explainability and Ontological Obstacles: Why We Struggle to Explain the Answers of Large Language Models (LLMs)

cs.HC · 2026-06-22 · unverdicted · none · ref 99

HANSEL: Extracting Breadcrumbs from Web Agent Trajectories for Interactive Verification

cs.HC · 2026-06-17 · unverdicted · none · ref 41

When LLM Rationales Become User-Facing: Effects on Trust Perception, Decision-Making, and Gaze Behaviors

cs.HC · 2026-06-24 · unverdicted · none · ref 12

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · none · ref 189 · 2 links
