QuALITY: Question Answering with Long Input Texts, Yes!

Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel Bowman · 2022 · DOI 10.18653/v1/2022.naacl-main.391

2 Pith papers cite this work.

representative citing papers

Measuring Progress on Scalable Oversight for Large Language Models

cs.HC · 2022-11-04 · unverdicted · novelty 6.0

Humans chatting with an unreliable LLM assistant outperform both the model alone and unaided humans on MMLU and time-limited QuALITY tasks.

Toward Human-AI Complementarity Across Diverse Tasks

cs.HC · 2026-04-13 · unverdicted · novelty 5.0

Human-AI hybrids achieve only +0.4pp over AI alone on diverse tasks because confidence routing fails to identify the small set of cases where humans can correct AI errors.

citing papers explorer

Measuring Progress on Scalable Oversight for Large Language Models cs.HC · 2022-11-04 · unverdicted · none · ref 57
Toward Human-AI Complementarity Across Diverse Tasks cs.HC · 2026-04-13 · unverdicted · none · ref 40