pith. machine review for the scientific record.


Crowdsourcing Multiple Choice Science Questions

Johannes Welbl, Nelson F. Liu, and Matt Gardner

12 Pith papers cite this work.

abstract

We present a novel method for obtaining high-quality, domain-targeted multiple choice questions from crowd workers. Generating these questions can be difficult without trading away originality, relevance, or diversity in the answer options. Our method addresses these problems by leveraging a large corpus of domain-specific text and a small set of existing questions. It produces model suggestions for document selection and answer distractor choice, which aid the human question-generation process. With this method we have assembled SciQ, a dataset of 13.7K multiple choice science exam questions (dataset available at http://allenai.org/data.html). We demonstrate that the method produces in-domain questions by providing an analysis of this new dataset and by showing that humans cannot distinguish the crowdsourced questions from original questions. When using SciQ as additional training data alongside existing questions, we observe accuracy improvements on real science exams.


representative citing papers

Inducing Artificial Uncertainty in Language Models

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Inducing artificial uncertainty on trivial tasks allows training probes that achieve higher calibration on hard data than standard approaches while retaining performance on easy data.

Scaling Laws for Mixture Pretraining Under Data Constraints

cs.LG · 2026-05-12 · conditional · novelty 7.0

Repetition-aware scaling laws show scarce target data in pretraining mixtures can be repeated 15-20 times optimally, with the best count depending on data size, compute, and model scale.

Scaling and evaluating sparse autoencoders

cs.LG · 2024-06-06 · unverdicted · novelty 7.0

K-sparse autoencoders with dead-latent fixes produce clean scaling laws and better feature quality metrics that improve with size, shown by training a 16-million-latent model on GPT-4 activations.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
