ARES: An automated evaluation framework for retrieval-augmented generation systems
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Years: 2026 (2)
citing papers explorer
- Evaluating Commercial AI Chatbots as News Intermediaries
  Commercial AI chatbots reach over 90% multiple-choice accuracy on recent news facts but lose 11-17% in free response and drop to 19-70% on subtle false-premise questions, with retrieval failures causing most errors and clear Anglophone bias.
- When Confidence Takes the Wrong Path: Diagnosing Retrieval-State Lock-In in RAG
  Retrieval-state lock-in causes zero-dispersion errors in 42% of KG-RAG and 59% of dense-retrieval failures; a three-object check rule reaches 91.9% pooled precision at 7.7% coverage.