Agree, Disagree, Explain: Decomposing Human Label Variation in NLI through the Lens of Explanations

2025 · cs.CL · arXiv 2510.16458

3 Pith papers cite this work.

abstract

Natural Language Inference (NLI) datasets often exhibit human label variation. To better understand these variations, explanation-based approaches analyze the underlying reasoning behind annotators' decisions. One such approach is the LiTEx taxonomy, which categorizes free-text explanations in English into reasoning categories. However, previous work applying LiTEx has focused on within-label variation: cases where annotators agree on the NLI label but provide different explanations. This paper broadens the scope by examining how annotators may diverge not only in the reasoning category but also in the labeling. We use explanations as a lens to analyze variation in NLI annotations and to examine individual differences in reasoning. We apply LiTEx to two NLI datasets and align annotation variation from multiple aspects: NLI label agreement, explanation similarity, and taxonomy agreement, with an additional compounding factor of annotators' selection bias. We observe instances where annotators disagree on the label but provide similar explanations, suggesting that surface-level disagreement may mask underlying agreement in interpretation. Moreover, our analysis reveals individual preferences in explanation strategies and label choices. These findings highlight that agreement in reasoning categories better reflects the semantic similarity of explanations than label agreement alone. Our findings underscore the richness of reasoning-based explanations and the need for caution in treating labels as ground truth.

representative citing papers

Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

LLMs can learn annotator-specific label-explanation behavior from human label variation via cross-annotator preference optimization, outperforming prompting and standard fine-tuning on two sentence-pair tasks.

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

CIE-Scorer detects unfaithful CoT by tracing compact sentence-level circuits, building internal-external reasoning graphs, and scoring their discrepancy with Fused Gromov-Wasserstein distance, reporting SOTA results on FaithCoT-Bench with reduced circuit cost.

Quantifying and Predicting Disagreement in Graded Human Ratings

cs.CL · 2026-05-01 · unverdicted · novelty 5.0

Annotation disagreement on toxic language can be moderately predicted from textual features, with high-opposition items proving harder for models to estimate accurately.
