Reading Wikipedia to Answer Open-Domain Questions

Adam Fisch, Antoine Bordes, Danqi Chen, Jason Weston

Authors on Pith no claims yet

classification 💻 cs.CL

keywords wikipediaanswerarticlescombinesexistingmachinequestionreading

read the original abstract

This paper proposes to tackle open- domain question answering using Wikipedia as the unique knowledge source: the answer to any factoid question is a text span in a Wikipedia article. This task of machine reading at scale combines the challenges of document retrieval (finding the relevant articles) with that of machine comprehension of text (identifying the answer spans from those articles). Our approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs. Our experiments on multiple existing QA datasets indicate that (1) both modules are highly competitive with respect to existing counterparts and (2) multitask learning using distant supervision on their combination is an effective complete system on this challenging task.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Online Learning-to-Defer with Varying Experts
stat.ML 2026-05 unverdicted novelty 8.0

Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
Passage Re-ranking with BERT
cs.IR 2019-01 unverdicted novelty 8.0

Fine-tuning BERT for query-passage relevance classification achieves state-of-the-art results on TREC-CAR and MS MARCO, with a 27% relative gain in MRR@10 over prior methods.
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
cs.CL 2026-04 conditional novelty 6.0

Loss-based pruning of training data to limit facts and flatten their frequency distribution enables a 110M-parameter GPT-2 model to memorize 1.3 times more entity facts than standard training, matching a 1.3B-paramete...
TIDE: Every Layer Knows the Token Beneath the Context
cs.CL 2026-05 unverdicted novelty 5.0

TIDE augments standard transformers with per-layer token embedding injection via an ensemble of memory blocks and a depth-conditioned router to mitigate rare-token undertraining and contextual collapse.
Plasma GraphRAG: Physics-Grounded Parameter Selection for Gyrokinetic Simulations
physics.plasm-ph 2026-04 unverdicted novelty 5.0

Plasma GraphRAG automates physics-grounded parameter selection for gyrokinetic simulations via a domain-specific knowledge graph and LLMs, reporting over 10% better quality and up to 25% fewer hallucinations than stan...