pith. machine review for the scientific record. sign in

arxiv: 1806.00358 · v2 · submitted 2018-06-01 · 💻 cs.AI · cs.CL· cs.IR

Recognition: unknown

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Authors on Pith no claims yet
classification 💻 cs.AI cs.CLcs.IR
keywords reasoningchallengedatasetknowledgequestionstypesansweringdefinitions
0
0 comments X
read the original abstract

The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the Challenge Set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

    cs.CV 2026-04 unverdicted novelty 7.0

    GeoMMBench reveals deficiencies in current multimodal LLMs for geoscience tasks while GeoMMAgent demonstrates that tool-integrated agents achieve significantly higher performance.

  2. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

    cs.LG 2022-10 unverdicted novelty 7.0

    GPTQ quantizes 175B-parameter GPT models to 3-4 bits per weight in one shot using approximate second-order information, achieving negligible accuracy degradation and 3-4x inference speedups.