In Proceedings of ICLR

Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations · 2025 · arXiv 2504.14150

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy

cs.AI · 2026-05-25 · unverdicted · novelty 6.0

CIE-Scorer detects unfaithful CoT by tracing compact sentence-level circuits, building internal-external reasoning graphs, and scoring their discrepancy with Fused Gromov-Wasserstein distance, reporting SOTA results on FaithCoT-Bench with reduced circuit cost.

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

An NSM-based explication parser with fixed semantic rules produces emotion labels for events, achieving 0.33 accuracy on held-out crowd-sourced data while shifting empirical risk to an inspectable parser.

citing papers explorer

Detecting Unfaithful Chain-of-Thought via Circuit-Guided Internal-External Discrepancy
cs.AI · 2026-05-25 · unverdicted · none · ref 34
Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications
cs.CL · 2026-07-01 · unverdicted · none · ref 6
