Training verifiers to solve math word problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman · 2021

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

representative citing papers

Super Apriel: One Checkpoint, Many Speeds

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.

RouterBench: A Benchmark for Multi-LLM Routing System

cs.LG · 2024-03-18 · unverdicted · novelty 7.0

RouterBench supplies a standardized benchmark, 405k+ inference dataset, theoretical framework, and comparative analysis for multi-LLM routing systems.

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing

cs.LG · 2025-06-17 · unverdicted · novelty 6.0

LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.

Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning

cs.CL · 2026-04-20 · unverdicted · novelty 5.0

Systematic testing of prompt engineering for LLM equational reasoning finds a performance ceiling of 60-79% accuracy that extensive engineering cannot exceed, driven by undecidability and model capacity limits.

FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models

cs.AI · 2026-05-05

citing papers explorer

Showing 5 of 5 citing papers.

Super Apriel: One Checkpoint, Many Speeds cs.LG · 2026-04-21 · unverdicted · none · ref 12
A single 15B supernet checkpoint supports runtime switching between attention mixer placements for multiple decode speed presets while retaining 77-96% quality relative to the teacher model.
RouterBench: A Benchmark for Multi-LLM Routing System cs.LG · 2024-03-18 · unverdicted · none · ref 80
RouterBench supplies a standardized benchmark, 405k+ inference dataset, theoretical framework, and comparative analysis for multi-LLM routing systems.
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing cs.LG · 2025-06-17 · unverdicted · none · ref 29
LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.
Less Is More: Cognitive Load and the Single-Prompt Ceiling in LLM Mathematical Reasoning cs.CL · 2026-04-20 · unverdicted · none · ref 3
Systematic testing of prompt engineering for LLM equational reasoning finds a performance ceiling of 60-79% accuracy that extensive engineering cannot exceed, driven by undecidability and model capacity limits.
FinSTaR: Towards Financial Reasoning with Time Series Reasoning Models cs.AI · 2026-05-05 · unreviewed · ref 7

Training verifiers to solve math word problems

fields

years

verdicts

representative citing papers

citing papers explorer