Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, Siva Reddy · 2024 · DOI 10.1162/tacl_a_00667

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages

cs.CL · 2026-07-02 · unverdicted · novelty 5.0

Meta-analysis of 33 ACL papers shows inconsistent LLM-as-a-Judge results, overtrust, and single-model reliance in multilingual/low-resource settings, with recommendations for better practice.

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

cs.CL · 2023-05-30 · conditional · novelty 5.0

Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.

citing papers explorer

Challenges and Recommendations for LLMs-as-a-Judge in Multilingual Settings and Low-Resource Languages

cs.CL · 2026-07-02 · unverdicted · none · ref 123

Meta-analysis of 33 ACL papers shows inconsistent LLM-as-a-Judge results, overtrust, and single-model reliance in multilingual/low-resource settings, with recommendations for better practice.

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

cs.CL · 2023-05-30 · conditional · none · ref 220

Multi-agent debate with tit-for-tat arguments and a judge LLM improves reasoning by preventing LLMs from locking into incorrect initial solutions.
