pith. sign in

arxiv: 2606.21611 · v1 · pith:ZZHS6EFRnew · submitted 2026-06-19 · 💻 cs.LG · cs.AI· math.GR· math.GT

The Two-Hump Problem: Bridging the Difficulty Gap in Mathematical Reinforcement Learning

classification 💻 cs.LG cs.AImath.GRmath.GT
keywords difficultylearningac-trivialchallengedatasetsincludinginstanceslength
0
0 comments X
read the original abstract

Mathematical search problems present a unique challenge for Reinforcement Learning (RL) due to vast search spaces and sparse rewards. In previous works, the Andrews-Curtis (AC) conjecture was established as an illustrative example of such problems. In this work, we identify a critical structural barrier in the AC landscape: a "Two-Hump" distribution, where problem instances are either trivially solvable or effectively impossible, with a scarcity of intermediate "hard-but-solvable" instances required for effective learning. We tackle this challenge through two primary avenues: novel data generation techniques to populate the difficulty gap, and significant algorithmic enhancements including the introduction of supermoves and Transformer-based architectures. We demonstrate substantial performance improvements over previous baselines, and release new comprehensive benchmark datasets including AC-19 (125,192 AC-trivial presentations of varying difficulty with length at most 19) and AC-1M (1,136,154 hard AC-trivial presentations of length at most 30), the first large-scale, publicly available datasets of this kind.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.