The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning

Ali Shehper; Angus Gruen; Coco Huang; Davide Passaro; Giorgi Butbaia; Lucas Fagan; Maksymilian Manko; Michele Tarquini; Sergei Gukov

arxiv: 2606.21611 · v1 · pith:ZZHS6EFRnew · submitted 2026-06-19 · 💻 cs.LG · cs.AI· math.GR· math.GT

The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning

Lucas Fagan , Michele Tarquini , Ali Shehper , Maksymilian Manko , Angus Gruen , Coco Huang , Giorgi Butbaia , Davide Passaro

show 1 more author

Sergei Gukov

This is my paper

classification 💻 cs.LG cs.AImath.GRmath.GT

keywords difficultylearningac-trivialchallengedatasetsincludinginstanceslength

0 comments

read the original abstract

Mathematical search problems present a unique challenge for Reinforcement Learning (RL) due to vast search spaces and sparse rewards. In previous works, the Andrews-Curtis (AC) conjecture was established as an illustrative example of such problems. In this work, we identify a critical structural barrier in the AC landscape: a "Two-Hump" distribution, where problem instances are either trivially solvable or effectively impossible, with a scarcity of intermediate "hard-but-solvable" instances required for effective learning. We tackle this challenge through two primary avenues: novel data generation techniques to populate the difficulty gap, and significant algorithmic enhancements including the introduction of supermoves and Transformer-based architectures. We demonstrate substantial performance improvements over previous baselines, and release new comprehensive benchmark datasets including AC-19 (125,192 AC-trivial presentations of varying difficulty with length at most 19) and AC-1M (1,136,154 hard AC-trivial presentations of length at most 30), the first large-scale, publicly available datasets of this kind.

This paper has not been read by Pith yet.

The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning

discussion (0)