ELI5: Long Form Question Answering
We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions. The dataset comprises 270K threads from the Reddit forum "Explain Like I'm Five" (ELI5), where an online community provides answers to questions which are comprehensible by five year olds. Compared to existing datasets, ELI5 comprises diverse questions requiring multi-sentence answers. We provide a large set of web documents to help answer the question. Automatic and human evaluations show that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline. However, our best model is still far from human performance since raters prefer gold responses in over 86% of cases, leaving ample opportunity for future improvement.
Forward citations
Cited by 5 Pith papers
- Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark
  A binomial multibit watermarking scheme encodes every payload bit at each LLM token with dynamic redirection, outperforming baselines in accuracy and robustness for large payloads.
- How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews
  AI Overviews and Gemini retrieve substantially different sources than traditional Google search (Jaccard similarity <0.2), favor Google-owned content, and appear for 51.5% of queries, especially controversial ones, and are...
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA, with gains of up to 6 ROUGE.
- Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents
  The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains whi...
- Rethinking the Necessity of Adaptive Retrieval-Augmented Generation through the Lens of Adaptive Listwise Ranking
  AdaRankLLM shows adaptive listwise reranking outperforms fixed-depth retrieval for most LLMs by acting as a noise filter for weak models and an efficiency optimizer for strong ones, with lower context use.