In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16366–16393, Bangkok, Thailand.
Two papers indexed on Pith cite this work.
Fields: cs.CL (2). Years: 2026 (2).
Representative citing papers:
- Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models
  Text2DistBench is a new scalable benchmark showing that LLMs outperform random baselines on distributional reading comprehension of YouTube comments, although their performance varies widely by question type and distribution characteristics.
- A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback
  A multi-agent LLM system discovers criteria such as Encouraging, Urgent, and Clear for surgical feedback and uses them to score 4.2k instances, outperforming prior content-based approaches in predicting trainee behavior changes and trainer approval.