Neural Combinatorial Optimization with Reinforcement Learning

Canonical reference. All classified citations from Pith papers cite this work as background.

33 Pith papers citing it
Background: 100% of classified citations
abstract

This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network using a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the KnapSack, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.
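The training signal described in the abstract (sample a city permutation, score it by negative tour length, apply a policy gradient update) can be illustrated with a minimal sketch. This is not the paper's model: the recurrent network is replaced here by a hypothetical tabular policy `theta[i][j]` (a logit for moving from city i to city j), and the function names, the fixed start at city 0, the learning rate, and the moving-average baseline are all assumptions made for illustration.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def sample_tour(theta, rng):
    """Sample a permutation from the tabular stand-in policy.

    theta[i][j] is a hypothetical logit for moving from city i to city j.
    Visited cities are masked out. Returns the tour plus, per step, a tuple
    (current city, candidate cities, their probabilities, chosen city).
    """
    n = len(theta)
    tour, steps = [0], []  # assumption: every tour starts at city 0
    unvisited = list(range(1, n))
    while unvisited:
        cur = tour[-1]
        top = max(theta[cur][j] for j in unvisited)
        weights = [math.exp(theta[cur][j] - top) for j in unvisited]
        total = sum(weights)
        probs = [w / total for w in weights]
        nxt = rng.choices(unvisited, weights=probs)[0]
        steps.append((cur, list(unvisited), probs, nxt))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour, steps


def reinforce_step(theta, coords, rng, baseline, lr=0.1, decay=0.9):
    """One REINFORCE update using reward = negative tour length."""
    tour, steps = sample_tour(theta, rng)
    reward = -tour_length(coords, tour)
    advantage = reward - baseline
    for cur, candidates, probs, chosen in steps:
        for j, p in zip(candidates, probs):
            # Softmax identity: d log pi(chosen) / d theta[cur][j]
            # equals 1[j == chosen] - p.
            grad = (1.0 if j == chosen else 0.0) - p
            theta[cur][j] += lr * advantage * grad
    # Assumed baseline: exponential moving average of observed rewards.
    new_baseline = decay * baseline + (1 - decay) * reward
    return new_baseline, reward
```

To try it, initialise `theta` as an n-by-n list of zeros, create `rng = random.Random(seed)`, and call `reinforce_step` repeatedly, feeding each returned baseline into the next call. Subtracting a baseline leaves the gradient estimate unbiased while reducing its variance, which is why one is included even in a sketch this small.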

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

TriSearch: Learning to Optimize Triangulations via Bistellar Flips

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

TriSearch is an RL framework that optimizes triangulations of polytopes using bistellar flips with a circuit-supported subtriangulation action representation, generalizing zero-shot to larger instances and outperforming prior samplers in 3D and 4D.

Learning to Discover at Test Time

cs.LG · 2026-01-22 · unverdicted · novelty 7.0

TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.

Training Agents Inside of Scalable World Models

cs.AI · 2025-09-29 · conditional · novelty 7.0

Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.

Linear Decision Tree Policies for Integer Linear Programs

math.OC · 2026-05-04 · unverdicted · novelty 7.0

Linear decision trees can represent optimal solution policies for families of integer linear programs, enabling polynomial-time queries after offline synthesis for fixed feasible sets.

AlphaTransit: Learning to Design City-scale Transit Routes

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

AlphaTransit pairs MCTS with a learned policy-value network to reach 54.6% and 82.1% service rates on a Bloomington transit benchmark, outperforming plain RL and plain MCTS baselines.
