ARES: An automated evaluation framework for retrieval-augmented generation systems

Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia · 2024 · DOI 10.18653/v1/2024.naacl-long.20

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Evaluating Commercial AI Chatbots as News Intermediaries

cs.CL · 2026-05-21 · conditional · novelty 7.0

Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.

When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG

cs.CL · 2026-06-22 · unverdicted · novelty 6.0

Retrieval-state lock-in causes zero-dispersion errors in 42% of KG-RAG and 59% of dense-retrieval failures; a three-object check rule reaches 91.9% pooled precision at 7.7% coverage.

citing papers explorer

Showing 2 of 2 citing papers.

Evaluating Commercial AI Chatbots as News Intermediaries · cs.CL · 2026-05-21 · conditional · none · ref 49
When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG · cs.CL · 2026-06-22 · unverdicted · none · ref 49