Neural Combinatorial Optimization with Reinforcement Learning

2016 · cs.AI · arXiv 1611.09940

Canonical reference. 100% of citing Pith papers cite this work as background.

33 Pith papers citing it

Background 100% of classified citations

abstract

This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts a distribution over different city permutations. Using negative tour length as the reward signal, we optimize the parameters of the recurrent network using a policy gradient method. We compare learning the network parameters on a set of training graphs against learning them on individual test graphs. Despite the computational expense, without much engineering and heuristic designing, Neural Combinatorial Optimization achieves close to optimal results on 2D Euclidean graphs with up to 100 nodes. Applied to the KnapSack, another NP-hard problem, the same method obtains optimal solutions for instances with up to 200 items.
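The training loop described in the abstract can be illustrated with a minimal sketch. A stochastic policy samples a tour, the tour is scored with negative tour length, and the policy parameters move along the REINFORCE gradient relative to a moving-average baseline. This is not the paper's model. The recurrent network is swapped for a hypothetical per-instance logit table `theta[i][j]` (a learned preference for moving from city `i` to city `j`), which puts the sketch closest in spirit to learning parameters on an individual test graph. The names `reinforce_tsp` and `sample_tour`, the fixed start at city 0, and the hyperparameters `lr` and `beta` are illustrative assumptions.

```python
import math
import random


def tour_length(coords, tour):
    """Total Euclidean length of a closed tour (returns to the start city)."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def sample_tour(theta, n, rng):
    """Sample a permutation city by city from a softmax over unvisited cities.

    Returns the tour and the gradient of log p(tour) with respect to theta.
    Assumption: every tour starts at city 0 (a tour is a cycle, so this loses nothing).
    """
    tour = [0]
    unvisited = set(range(1, n))
    grad = [[0.0] * n for _ in range(n)]
    while unvisited:
        cur = tour[-1]
        cand = sorted(unvisited)
        logits = [theta[cur][j] for j in cand]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in logits]
        z = sum(weights)
        probs = [w / z for w in weights]
        nxt = rng.choices(cand, weights=probs)[0]
        # d log softmax(logits)[nxt] / d logit_j = 1[j == nxt] - p_j
        for j, p in zip(cand, probs):
            grad[cur][j] += (1.0 if j == nxt else 0.0) - p
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour, grad


def reinforce_tsp(coords, steps=500, lr=0.1, beta=0.9, seed=0):
    """REINFORCE with an exponential moving-average baseline; reward = -tour length."""
    rng = random.Random(seed)
    n = len(coords)
    theta = [[0.0] * n for _ in range(n)]  # uniform policy at the start
    baseline = None
    best_tour, best_len = None, float("inf")
    for _ in range(steps):
        tour, grad = sample_tour(theta, n, rng)
        length = tour_length(coords, tour)
        reward = -length
        if baseline is None:
            baseline = reward
        advantage = reward - baseline  # positive when the tour beats the baseline
        for i in range(n):
            for j in range(n):
                theta[i][j] += lr * advantage * grad[i][j]
        baseline = beta * baseline + (1 - beta) * reward
        if length < best_len:  # keep the best tour sampled so far
            best_tour, best_len = tour, length
    return best_tour, best_len
```

For example, on the four corners of a unit square, `reinforce_tsp([(0, 0), (1, 0), (1, 1), (0, 1)], steps=200)` should return a perimeter tour of length 4.0. The baseline only reduces the variance of the gradient estimate; subtracting it does not change the expected gradient, because the expected score-function term is zero.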

citation-role summary

background 5

citation-polarity summary

background 5

citing papers explorer

Showing 33 of 33 citing papers.

AGAN: Towards Automated Design of Generative Adversarial Networks cs.LG · 2019-06-25 · unverdicted · none · ref 35 · internal anchor
AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
TriSearch: Learning to Optimize Triangulations via Bistellar Flips cs.LG · 2026-05-28 · unverdicted · none · ref 17 · internal anchor
TriSearch is an RL framework that optimizes triangulations of polytopes using bistellar flips with a circuit-supported subtriangulation action representation, generalizing zero-shot to larger instances and outperforming prior samplers in 3D and 4D.
Memory-Guided Tree Search with Cross-Branch Knowledge Transfer for LLM Solver Synthesis cs.AI · 2026-05-17 · unverdicted · none · ref 15 · internal anchor
MEMOIR adds branch-local and global memory with a reflection step to tree search for LLM solver synthesis, reaching 96.7% solution validity and 7.3-point score gains over baselines on seven CO problems with lower run-to-run variance.
Learning to Discover at Test Time cs.LG · 2026-01-22 · unverdicted · none · ref 7 · internal anchor
TTT-Discover applies test-time RL to set new state-of-the-art results on math inequalities, GPU kernels, algorithm contests, and single-cell denoising using an open model and public code.
Training Agents Inside of Scalable World Models cs.AI · 2025-09-29 · conditional · none · ref 36 · internal anchor
Dreamer 4 is the first agent to obtain diamonds in Minecraft from only offline data by reinforcement learning inside a scalable world model that accurately predicts game mechanics.
Linear Decision Tree Policies for Integer Linear Programs math.OC · 2026-05-04 · unverdicted · none · ref 167
Linear decision trees can represent optimal solution policies for families of integer linear programs, enabling polynomial-time queries after offline synthesis for fixed feasible sets.
Learning to Solve the Quadratic Assignment Problem with Warm-Started MCMC Finetuning cs.LG · 2026-04-22 · unverdicted · none · ref 27
PLMA combines cross-graph attention EBMs with short warm-started MCMC chains to reach near-zero average optimality gaps on QAPLIB and strong robustness on hard Taixxeyy instances.
Vehicle-as-Prompt: A Unified Deep Reinforcement Learning Framework for Heterogeneous Fleet Vehicle Routing Problem cs.LG · 2026-04-06 · unverdicted · none · ref 25
VaP-CSMV uses a cross-semantic encoder and multi-view decoder to unify DRL solving of HFVRP variants, outperforming prior neural solvers while matching heuristics at much lower inference time and generalizing zero-shot to unseen scales.
PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding cs.CV · 2026-05-28 · unverdicted · none · ref 13 · internal anchor
PARCEL is a new visual tokenization architecture combining pool-anchored resampling with conditioned elastic queries to enhance performance-efficiency tradeoffs in LVLMs over prior matryoshka methods.
AlphaTransit: Learning to Design City-scale Transit Routes cs.AI · 2026-05-27 · unverdicted · none · ref 4 · internal anchor
AlphaTransit pairs MCTS with a learned policy-value network to reach 54.6% and 82.1% service rates on a Bloomington transit benchmark, outperforming plain RL and plain MCTS baselines.
Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States cond-mat.dis-nn · 2026-05-15 · unverdicted · none · ref 37 · internal anchor
Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.
Rethinking Molecular OOD Generalization via Target-Aware Source Selection cs.LG · 2026-05-13 · unverdicted · none · ref 6 · internal anchor
SCOPE-BENCH shows state-of-the-art molecular models suffer up to 8x higher errors under extreme OOD, while POMA reduces mean absolute error by up to 11.2% via target-aware source selection and dual-scale adaptation.
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem quant-ph · 2026-05-13 · unverdicted · none · ref 2 · internal anchor
Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba cs.LG · 2026-02-24 · unverdicted · none · ref 26 · internal anchor
ECO uses supervised warm-up plus iterative batched DPO on a Mamba backbone to reach top neural performance on TSP and CVRP while lowering memory growth and raising throughput.
Learning to Approximate Uniform Facility Location via Graph Neural Networks cs.LG · 2026-02-13 · unverdicted · none · ref 2 · internal anchor
A differentiable MPNN approximates uniform facility location with provable guarantees and outperforms standard approximation algorithms while closing the gap to exact ILP solutions.
Learning-Optimized Qubit Mapping and Reuse to Minimize Inter-Core Communication in Modular Quantum Architectures quant-ph · 2025-06-11 · unverdicted · none · ref 22 · internal anchor
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
RL-SPH: Learning to Achieve Feasible Solutions for Integer Linear Programs cs.LG · 2024-11-29 · unverdicted · none · ref 4 · internal anchor
RL-SPH is a reinforcement learning start primal heuristic that independently produces feasible solutions for ILPs with non-binary integers at 100% rate and with 28.6× lower primal gap than prior start heuristics.
Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures quant-ph · 2024-06-17 · unverdicted · none · ref 7 · internal anchor
An attention-based DRL agent with Transformer encoder and GNN learns heuristics for qubit-to-core allocation in multi-core quantum systems to minimize state transfers and online compilation time.
Contextual Plackett-Luce: An Efficient Neural Model for Probabilistic Sequence Selection under Ambiguity cs.LG · 2026-05-09 · unverdicted · none · ref 3
Contextual Plackett-Luce extends the classical Plackett-Luce model with context-dependent Ising parameterization to enable efficient parallel scoring followed by incremental autoregressive selection for ambiguous sequence tasks.
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization cs.AI · 2026-05-08 · unverdicted · none · ref 8
HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.
Graph Normalization: Fast Binarizing Dynamics for Differentiable MWIS cs.LG · 2026-05-06 · unverdicted · none · ref 6
Graph Normalization is a convergent dynamical system that approximates MWIS by always reaching a binary maximum independent set via majorization-minimization and evolutionary game equivalence.
A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm eess.SY · 2026-04-23 · unverdicted · none · ref 40
A hybrid RL and self-supervised learning method accelerates generalized Benders decomposition by 57.5% on a MINLP case study while recovering optimal solutions.
Neural Global Optimization via Iterative Refinement from Noisy Samples cs.LG · 2026-04-04 · unverdicted · none · ref 14
A neural model learns iterative refinement from noisy samples and spline inputs to find global minima, reporting 8.05% mean error on multi-modal tests versus 36.24% for spline initialization alone.
Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning cs.IR · 2019-07-18 · unverdicted · none · ref 11 · internal anchor
A modified pointer network trained with actor-critic DRL and Equal Size K-Means clustering is applied to combinatorial keyword recommendation in sponsored search, reporting offline and online gains.
ARMATA: Auto-Regressive Multi-Agent Task Assignment cs.MA · 2026-05-05 · unverdicted · none · ref 25
ARMATA is a new end-to-end autoregressive model with multi-stage decoding that unifies allocation and routing for multi-agent systems and reports up to 20% better solutions than OR-Tools, CPLEX, and LKH-3 in seconds instead of hours.
PaliGemma 2: A Family of Versatile VLMs for Transfer cs.CV · 2024-12-04 · unverdicted · none · ref 6 · internal anchor
PaliGemma 2 is a family of vision-language models that achieves state-of-the-art results on transfer tasks like table structure recognition and radiography report generation by combining SigLIP with Gemma 2 models at various sizes and resolutions.
Finite Expression Method with TranNet-based Function Learning for High-Dimensional Partial Differential Equations math.NA · 2026-04-24 · unverdicted · none · ref 10
An extension of the finite expression method using TranNet-initialized shallow neural operators is proposed as an effective solver for high-dimensional partial differential equations.
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision--Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 81
Large vision-language models applied to multi-scale remote sensing imagery can generate recommendations on built environment design, constructability, land use, and risks for smart city decision-making.
Gemma 2: Improving Open Language Models at a Practical Size cs.CL · 2024-07-31 · conditional · none · ref 59
Gemma 2 models achieve leading performance at their sizes by combining established Transformer modifications with knowledge distillation for the 2B and 9B variants.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers math.OC · 2026-04-13 · unverdicted · none · ref 13
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Convex Compositional Reasoning Models cs.LG · 2026-05-22 · unreviewed · ref 4 · internal anchor
ReVEL: Multi-Turn Reflective LLM-Guided Heuristic Evolution via Structured Performance Feedback cs.AI · 2026-03-05 · unreviewed · ref 1 · internal anchor
Machine Learning-based Two-Stage Graph Sparsification for the Travelling Salesman Problem cs.LG · 2026-04-22 · unreviewed · ref 3