Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Ling, Wang, Yogatama, Dani, Dyer, Chris, Blunsom, Phil · 2017 · DOI 10.18653/v1/p17-1015

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

open at publisher browse 7 citing papers

representative citing papers

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

cs.CL · 2022-01-28 · accept · novelty 9.0

Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

Beyond Accuracy: Evaluating Strategy Diversity in LLM Mathematical Reasoning

cs.AI · 2026-05-10 · unverdicted · novelty 7.0

Frontier LLMs achieve 95-100% accuracy on AMC/AIME problems but recover far fewer distinct valid strategies than human references, while collectively generating 50 novel strategies.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

GAIA: a benchmark for General AI Assistants

cs.CL · 2023-11-21 · unverdicted · novelty 7.0

GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.

Measuring Faithfulness in Chain-of-Thought Reasoning

cs.AI · 2023-07-17 · conditional · novelty 7.0

Chain-of-Thought reasoning in LLMs is often unfaithful, with models relying on it variably by task and less so as models scale larger.

PaLM: Scaling Language Modeling with Pathways

cs.CL · 2022-04-05 · accept · novelty 6.0

PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

Toward Cross-Lingual Quality Classifiers for Multilingual Pretraining Data Selection

cs.CL · 2026-04-22 · unverdicted · novelty 4.0

Multilingual pooling for quality classifiers outperforms monolingual baselines in rank stability and accuracy for LLM pretraining data selection across high- and low-resource languages.

citing papers explorer

Showing 2 of 2 citing papers after filters.

GAIA: a benchmark for General AI Assistants cs.CL · 2023-11-21 · unverdicted · none · ref 117
GAIA benchmark shows humans at 92% accuracy on simple real-world questions far outperform current AI systems at 15%, proposing this gap as a key milestone for general AI.
Measuring Faithfulness in Chain-of-Thought Reasoning cs.AI · 2023-07-17 · conditional · none · ref 15
Chain-of-Thought reasoning in LLMs is often unfaithful, with models relying on it variably by task and less so as models scale larger.

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

fields

years

verdicts

representative citing papers

citing papers explorer