Neural Combinatorial Optimization with Reinforcement Learning

Hieu Pham, Irwan Bello, Mohammad Norouzi, Quoc V. Le, Samy Bengio

Authors on Pith no claims yet

classification 💻 cs.AI cs.LGstat.ML

keywords learningcombinatorialgraphsnetworkneuraloptimizationcitymethod

read the original abstract

This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network using a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the KnapSack, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 15 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Linear Decision Tree Policies for Integer Linear Programs
math.OC 2026-05 unverdicted novelty 7.0

Linear decision trees can represent optimal solution policies for families of integer linear programs, enabling polynomial-time queries after offline synthesis for fixed feasible sets.
Learning to Solve the Quadratic Assignment Problem with Warm-Started MCMC Finetuning
cs.LG 2026-04 unverdicted novelty 7.0

PLMA combines cross-graph attention EBMs with short warm-started MCMC chains to reach near-zero average optimality gaps on QAPLIB and strong robustness on hard Taixxeyy instances.
Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
cs.LG 2026-04 unverdicted novelty 7.0

VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-sho...
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem
quant-ph 2026-05 unverdicted novelty 6.0

Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
Contextual Plackett-Luce: An Efficient Neural Model for Probabilistic Sequence Selection under Ambiguity
cs.LG 2026-05 unverdicted novelty 6.0

Contextual Plackett-Luce extends the classical Plackett-Luce model with context-dependent Ising parameterization to enable efficient parallel scoring followed by incremental autoregressive selection for ambiguous sequ...
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
cs.AI 2026-05 unverdicted novelty 6.0

HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP ...
Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS
cs.LG 2026-05 unverdicted novelty 6.0

Graph Normalization is a convergent dynamical system that approximates MWIS by always reaching a binary maximum independent set via majorization-minimization and evolutionary game equivalence.
A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm
eess.SY 2026-04 unverdicted novelty 6.0

A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.
Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem
cs.LG 2026-04 unverdicted novelty 6.0

A two-stage ML sparsifier for TSP candidate graphs combines alpha-Nearest and POPMUSIC for high recall then trains a model to cut density while preserving coverage across distance types and instance sizes up to 500.
Neural Global Optimization via Iterative Refinement from Noisy Samples
cs.LG 2026-04 unverdicted novelty 6.0

A neural model learns iterative refinement from noisy samples and spline inputs to find global minima, reporting 8.05% mean error on multi-modal tests versus 36.24% for spline initialization alone.
ARMATA: Auto-Regressive Multi-Agent Task Assignment
cs.MA 2026-05 unverdicted novelty 5.0

ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds i...
Finite Expression Method with TranNet-based Function Learning for High-Dimensional Partial Differential Equations
math.NA 2026-04 unverdicted novelty 4.0

An extension of the finite expression method using TranNet-initialized shallow neural operators is proposed as an effective solver for high-dimensional partial differential equations.
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision--Language Models
cs.CL 2026-05 unverdicted novelty 3.0

Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
Gemma 2: Improving Open Language Models at a Practical Size
cs.CL 2024-07 conditional novelty 3.0

Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
math.OC 2026-04 unverdicted novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.