MathGAP: Out-of-distribution evaluation on problems with arbitrarily complex proofs

Andreas Opedal, Haruki Shirakami, Bernhard Schölkopf, Abulhair Saparov, Mrinmaya Sachan · 2024 · arXiv 2410.13502

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

CORE distills contrasts between successful and unsuccessful reasoning traces into compact natural-language insights that enable faster model self-improvement on reasoning tasks with fewer rollouts than parametric or other non-parametric baselines.

FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean

cs.AI · 2026-04-24 · unverdicted · novelty 7.0

FormalScience provides a scalable human-in-the-loop system for autoformalising scientific reasoning into Lean, demonstrated on a new 200-problem physics dataset with perfect formal validity.

citing papers explorer

Showing 2 of 2 citing papers.

CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning · cs.AI · 2026-05-27 · unverdicted · none · ref 21
FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean · cs.AI · 2026-04-24 · unverdicted · none · ref 4
