MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- CORE: Contrastive Reflection Enables Rapid Improvements in Reasoning
  CORE distills contrasts between successful and unsuccessful reasoning traces into compact natural-language insights that enable faster model self-improvement on reasoning tasks with fewer rollouts than parametric or other non-parametric baselines.
- FormalScience: Scalable Human-in-the-Loop Autoformalisation of Science with Agentic Code Generation in Lean
  FormalScience provides a scalable human-in-the-loop system for autoformalising scientific reasoning into Lean, demonstrated on a new 200-problem physics dataset with perfect formal validity.