Attention, Learn to Solve Routing Problems!

Wouter Kool , Herke van Hoof , Max Welling

Authors on Pith no claims yet

classification 📊 stat.ML cs.LG

keywords heuristicslearnproblemproblemsattentionbetterclosegetting

read the original abstract

The recently presented idea to learn heuristics for combinatorial optimization problems is promising as it can save costly development. However, to push this idea towards practical implementation, we need better models and better ways of training. We contribute in both directions: we propose a model based on attention layers with benefits over the Pointer Network and we show how to train this model using REINFORCE with a simple baseline based on a deterministic greedy rollout, which we find is more efficient than using a value function. We significantly improve over recent learned heuristics for the Travelling Salesman Problem (TSP), getting close to optimal results for problems up to 100 nodes. With the same hyperparameters, we learn strong heuristics for two variants of the Vehicle Routing Problem (VRP), the Orienteering Problem (OP) and (a stochastic variant of) the Prize Collecting TSP (PCTSP), outperforming a wide range of baselines and getting results close to highly optimized and specialized algorithms.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem
cs.LG 2026-04 unverdicted novelty 7.0

VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-sho...
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem
quant-ph 2026-05 unverdicted novelty 6.0

Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
Machine Learning for Two-Stage Graph Sparsification for the Travelling Salesman Problem
cs.LG 2026-04 unverdicted novelty 6.0

A two-stage ML sparsifier for TSP candidate graphs combines alpha-Nearest and POPMUSIC for high recall then trains a model to cut density while preserving coverage across distance types and instance sizes up to 500.
ARMATA: Auto-Regressive Multi-Agent Task Assignment
cs.MA 2026-05 unverdicted novelty 5.0

ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds i...
NCO4CVRP: Neural Combinatorial Optimization for the Capacitated Vehicle Routing Problem
cs.LG 2026-04 unverdicted novelty 3.0

Adding simulated annealing to random reconstruction and beam search to POMO in neural CVRP solvers reduces optimality gaps on standard benchmarks.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
math.OC 2026-04 unverdicted novelty 2.0

A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.