pith. sign in

arxiv: 2606.22776 · v1 · pith:PATNHIJNnew · submitted 2026-06-22 · 💻 cs.LG · cs.AI

GeoRouteNet: Geometry-Enhanced Non-Autoregressive Neural Solver for the Traveling Salesman Problem

Pith reviewed 2026-06-26 09:25 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords traveling salesman problemnon-autoregressive solvergeometric inductive biasreinforcement learninggraph attention networkcombinatorial optimizationeuclidean tsp
0
0 comments X

The pith

GeoRouteNet shows geometry enhancements and multi-candidate RL let non-autoregressive models solve Euclidean TSP with low optimality gaps and fast inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GeoRouteNet to fix weak geometric bias and unstable training in non-autoregressive neural solvers for the traveling salesman problem. It adds centered node features, learnable radial distance basis functions, distance-aware graph attention with explicit edge messaging, LayerNorm-SwiGLU blocks, and cross-layer mixing to the encoder. Training uses multi-candidate self-comparison reinforcement learning that samples multiple tours per instance, builds adaptive baselines from greedy and peer candidates, and adds winner guidance with annealed entropy regularization. These changes produce 0.32 percent optimality gap on 10,000 random TSP50 instances and 1.26 percent on TSP100 under beam-1000 decoding, plus 3.60 percent overall on 27 stratified TSPLIB EUC_2D instances. Ablations indicate geometric structure drives cross-distribution gains while the training method stabilizes quality.

Core claim

GeoRouteNet incorporates centered node features, learnable radial distance basis functions, distance-aware graph attention with explicit edge messaging, LayerNorm-SwiGLU feed-forward blocks, and cross-layer attentive residual mixing, trained via multi-candidate self-comparison RL that samples multiple candidate tours per instance, constructs adaptive baselines from greedy and peer candidates, and adds winner-candidate guidance with annealed entropy regularization. On 10,000 random TSP50 instances this yields a 0.32 percent optimality gap under Beam-1000 decoding; on TSP100 the gap is 1.26 percent. On 27 stratified TSPLIB EUC_2D instances the overall gap falls from 17.12 percent (prior NAR re

What carries the argument

Geometry-enhanced encoder that uses centered features, learnable radial distance basis functions, and distance-aware graph attention with explicit edge messaging, paired with multi-candidate self-comparison reinforcement learning that builds adaptive baselines from multiple sampled tours.

If this is right

  • Non-autoregressive TSP solvers reach sub-2 percent optimality gaps on random instances up to size 100 when equipped with explicit geometric encoding.
  • Geometric structure improvements account for most of the gain on real-world distributions while multi-candidate training provides further stabilization.
  • Batch inference throughput can exceed that of established solvers such as Concorde and LKH3 on batches of TSP instances.
  • Ablation results show the geometric encoder and multi-candidate RL are complementary rather than redundant.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distance-aware attention and radial basis approach could transfer to other geometric routing problems such as vehicle routing or capacitated variants.
  • Explicit modeling of pairwise distances may be a general requirement for scaling non-autoregressive methods beyond synthetic uniform data.
  • The reported throughput advantage suggests the architecture could support real-time routing decisions in logistics settings where exact solvers are too slow.

Load-bearing premise

The added geometric components supply enough inductive bias to improve cross-scale and cross-distribution generalization in non-autoregressive TSP solvers.

What would settle it

Running GeoRouteNet on a fresh collection of TSPLIB EUC_2D instances drawn from the same stratified distribution and checking whether the optimality gap stays below 5 percent would directly test the claimed generalization improvement.

Figures

Figures reproduced from arXiv: 2606.22776 by Xiang Li.

Figure 1
Figure 1. Figure 1: GeoRouteNet model architecture. The model takes node coordinates and pairwise distances as input, constructs geometrically enriched node and edge features, processes them through a distance-aware graph attention encoder with LayerNorm-SwiGLU feed-forward blocks and cross-layer attentive residual mixing, and outputs start-node and edge transition scores for decoding. 3.1 Node Geometric Features Euclidean TS… view at source ↗
Figure 2
Figure 2. Figure 2: Learnable radial distance basis functions before (left) and after (right) training. After training, several RBF centers concentrate on short-to-medium distance ranges while preserving coverage of longer cross-cluster edges. 3.3 Distance-Aware Graph Attention Our graph attention layer uses standard query-key-value (QKV) projection [20, 21] with two geometric augmentations. For the attention from node i to n… view at source ↗
Figure 3
Figure 3. Figure 3: Overview of the MCS-RL training pipeline. For each training instance, the model generates K candidate tours through sampling, plus a greedy tour. An adaptive baseline is constructed from the greedy tour and the shortest among the other candidates. Winner-candidate guidance reinforces the shortest sampled tour, and annealed entropy regularization encourages exploration early in training. 7 [PITH_FULL_IMAGE… view at source ↗
Figure 4
Figure 4. Figure 4: Greedy baseline tour length during training (25-epoch moving average). GeoRouteNet-related curves consistently lie below NAR4TSP-R, indicating faster convergence and lower asymptotic length. especially important for cross-scale generalization: when node count doubles, centered coordinates, radial distance features, and distance-aware attention provide a more stable geometric representation than raw coordin… view at source ↗
Figure 5
Figure 5. Figure 5: Beam-1000 optimality gap across random TSP50, TSP100, and four TSPLIB groups. GeoRouteNet achieves the lowest gap in every setting while NAR4TSP-R shows the steepest degradation from random data to TSPLIB [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Tour visualization comparison on kroA100. Left: Concorde optimal (21,282). Center: NAR4TSP￾R Beam-1000 (21,987, one edge crossing). Right: GeoRouteNet Beam-1000 (21,405, no crossings). 13 [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: maps all models onto a latency-Gap plane, connecting Greedy, Beam-100, and Beam￾1000 points for each model. GeoRouteNet’s curve lies below NAR4TSP-R across the full latency range, indicating a better speed-quality Pareto frontier [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
read the original abstract

The traveling salesman problem (TSP) is a canonical NP-hard combinatorial optimization benchmark that tests the representational capacity and generalization of neural solvers. While non-autoregressive (NAR) approaches offer parallel inference, they often lack sufficient geometric inductive bias and stable training signals, leading to degraded performance under cross-scale and cross-distribution shifts. We propose GeoRouteNet, a geometry-enhanced NAR neural solver for Euclidean TSP. On the model side, GeoRouteNet incorporates centered node features, learnable radial distance basis functions, distance-aware graph attention with explicit edge messaging, LayerNorm-SwiGLU feed-forward blocks, and cross-layer attentive residual mixing. On the training side, we design multi-candidate self-comparison reinforcement learning (MCS-RL), which samples multiple candidate tours per instance, constructs adaptive baselines from greedy and peer candidates, and adds winner-candidate guidance with annealed entropy regularization. On 10,000 random TSP50 instances, GeoRouteNet achieves a 0.32% optimality gap under Beam-1000 decoding. On TSP100, the gap is 1.26%. On 27 stratified TSPLIB EUC_2D instances, the overall gap drops from 17.12% (NAR4TSP reproduction) to 3.60%, while batch inference throughput substantially exceeds that of Concorde and LKH3. Ablation studies confirm that geometric structure enhancement and multi-candidate training are complementary: structure improvements dominate cross-distribution gains, while MCS-RL further stabilizes solution quality when paired with a strong geometric encoder.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GeoRouteNet, a geometry-enhanced non-autoregressive neural solver for Euclidean TSP. It augments a NAR encoder with centered node features, learnable radial distance basis functions, distance-aware graph attention with explicit edge messaging, LayerNorm-SwiGLU blocks, and cross-layer attentive residual mixing. Training employs multi-candidate self-comparison reinforcement learning (MCS-RL) that samples multiple tours per instance, builds adaptive baselines from greedy and peer candidates, and applies winner guidance with annealed entropy. Reported results include 0.32% optimality gap on 10,000 random TSP50 instances and 1.26% on TSP100 under Beam-1000 decoding; on 27 stratified TSPLIB EUC_2D instances the gap falls from 17.12% (NAR4TSP reproduction) to 3.60%, with batch throughput exceeding Concorde and LKH3. Ablations attribute cross-distribution gains primarily to the geometric modules while MCS-RL stabilizes quality.

Significance. If the numerical results and the attribution of TSPLIB gains to geometric inductive bias hold after verification, the work would advance NAR TSP solvers by supplying concrete evidence that explicit geometric structure (centered features, radial basis, distance-aware attention) improves cross-scale and cross-distribution generalization. The explicit separation of structure versus training effects in the ablations, together with the throughput comparisons, supplies falsifiable claims that can be tested on additional real-world instances.

major comments (2)
  1. [Ablation studies] Ablation studies (abstract): the claim that geometric structure improvements dominate the TSPLIB gap reduction (17.12% to 3.60%) is load-bearing for the central generalization argument. The manuscript must confirm that model width, depth, training steps, optimizer, and all MCS-RL hyperparameters remain fixed when the geometric modules (centered features, radial basis functions, distance-aware attention) are removed; otherwise the observed improvement cannot be attributed to the inductive bias.
  2. [Experiments] Experiments section (random-instance and TSPLIB results): the optimality gaps (0.32% TSP50, 1.26% TSP100, 3.60% TSPLIB) are the primary evidence for the performance claims, yet the abstract and results supply no information on the exact optimality verifier, whether gaps are averaged over multiple independent runs with reported variance or error bars, or any post-hoc instance filtering. These details are required to establish the data-to-claim link.
minor comments (1)
  1. [Abstract] The abstract states that batch inference throughput substantially exceeds Concorde and LKH3; a table or figure with concrete tokens-per-second or tours-per-second numbers on identical hardware would strengthen the practical-advantage claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. Below we address the two major comments point by point, providing clarifications and committing to revisions where needed to strengthen the presentation of our results.

read point-by-point responses
  1. Referee: [Ablation studies] Ablation studies (abstract): the claim that geometric structure improvements dominate the TSPLIB gap reduction (17.12% to 3.60%) is load-bearing for the central generalization argument. The manuscript must confirm that model width, depth, training steps, optimizer, and all MCS-RL hyperparameters remain fixed when the geometric modules (centered features, radial basis functions, distance-aware attention) are removed; otherwise the observed improvement cannot be attributed to the inductive bias.

    Authors: The ablation experiments were performed with model width, depth, training steps, optimizer, and all MCS-RL hyperparameters held exactly constant; only the geometric modules were removed or disabled. This controlled setup is described in the experiments section, allowing direct attribution of the TSPLIB gains to the inductive biases. To make the control explicit for readers, we will add a clarifying sentence in the ablation subsection. revision: yes

  2. Referee: [Experiments] Experiments section (random-instance and TSPLIB results): the optimality gaps (0.32% TSP50, 1.26% TSP100, 3.60% TSPLIB) are the primary evidence for the performance claims, yet the abstract and results supply no information on the exact optimality verifier, whether gaps are averaged over multiple independent runs with reported variance or error bars, or any post-hoc instance filtering. These details are required to establish the data-to-claim link.

    Authors: All optimality gaps are computed against exact solutions from the Concorde solver. The reported figures are means over the full sets of 10,000 random instances and 27 TSPLIB instances from a single training run, with no post-hoc filtering of instances. We will revise the experimental section and abstract to state the verifier, the single-run averaging, and the absence of filtering. revision: yes

Circularity Check

0 steps flagged

No significant circularity in claimed generalization or ablations

full rationale

The provided abstract and context describe an empirical neural solver with geometric modules and an MCS-RL training procedure. Reported optimality gaps are measured against external optima (Concorde/LKH3 or known TSPLIB solutions) on held-out random instances and cross-distribution TSPLIB cases. No equations, self-citations, or derivations are shown that reduce a claimed result to its own fitted inputs by construction. MCS-RL uses model samples for adaptive baselines, but this is standard on-policy RL and does not force the external optimality gaps. Ablation claims attribute gains to geometric bias vs. MCS-RL without evidence of capacity or regime changes that would create definitional equivalence. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central performance claims rest on several learnable neural-network parameters and the domain assumption that explicit geometric structure improves NAR generalization. Because only the abstract is available, the ledger is necessarily incomplete and conservative.

free parameters (2)
  • learnable radial distance basis function coefficients
    Explicitly described as learnable components fitted during training to TSP instances.
  • weights inside distance-aware graph attention and LayerNorm-SwiGLU blocks
    Standard neural-network parameters optimized on the training distribution.
axioms (2)
  • domain assumption Euclidean TSP instances possess geometric structure that can be directly exploited by node centering, radial basis functions, and edge-distance messaging to improve non-autoregressive performance
    This premise is invoked in the abstract to motivate the model-side enhancements.
  • domain assumption Multi-candidate self-comparison reinforcement learning supplies stable training signals that complement geometric encoding
    Invoked in the abstract to justify the training-side design and ablation claims.

pith-pipeline@v0.9.1-grok · 5809 in / 1669 out tokens · 47323 ms · 2026-06-26T09:25:30.059804+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1]

    Applegate, Robert E

    David L. Applegate, Robert E. Bixby, Vašek Chvátal, and William J. Cook.The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2006

  2. [2]

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization.arXiv preprint arXiv:1607.06450, 2016

  3. [3]

    Dynamic programming treatment of the travelling salesman problem.Journal of the ACM, 9(1):61–63, 1962

    Richard Bellman. Dynamic programming treatment of the travelling salesman problem.Journal of the ACM, 9(1):61–63, 1962

  4. [4]

    Neural Combinatorial Optimization with Reinforcement Learning

    Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning.arXiv preprint arXiv:1611.09940, 2016

  5. [5]

    Solution of a large-scale traveling-salesman problem.Journal of the Operations Research Society of America, 2(4):393–410, 1954

    George Dantzig, Ray Fulkerson, and Selmer Johnson. Solution of a large-scale traveling-salesman problem.Journal of the Operations Research Society of America, 2(4):393–410, 1954

  6. [6]

    Michael Held and Richard M. Karp. A dynamic programming approach to sequencing problems. Journal of the Society for Industrial and Applied Mathematics, 10(1):196–210, 1962

  7. [7]

    An effective implementation of the Lin-Kernighan traveling salesman heuristic

    Keld Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, 2000

  8. [8]

    An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems

    Keld Helsgaun. An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Technical report, Roskilde University, 2017

  9. [9]

    Bridging large language models and optimization: A unified framework for text-attributed combinatorial optimization.arXiv preprint arXiv:2408.12214, 2024

    Xinyu Jiang, Yuxiang Wu, Yanzhen Wang, and Yu Zhang. Bridging large language models and optimization: A unified framework for text-attributed combinatorial optimization.arXiv preprint arXiv:2408.12214, 2024

  10. [10]

    Joshi, Thomas Laurent, and Xavier Bresson

    Chaitanya K. Joshi, Thomas Laurent, and Xavier Bresson. An efficient graph convolutional network technique for the travelling salesman problem.arXiv preprint arXiv:1906.01227, 2019

  11. [11]

    Joshi, Quentin Cappart, Louis-Martin Rousseau, and Thomas Laurent

    Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau, and Thomas Laurent. Learning the travelling salesperson problem requires rethinking generalization.Constraints, 27(1-2): 70–98, 2022

  12. [12]

    Kingma and Jimmy Ba

    Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015

  13. [13]

    Attention, learn to solve routing problems! InInternational Conference on Learning Representations (ICLR), 2019

    Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! InInternational Conference on Learning Representations (ICLR), 2019

  14. [14]

    POMO: Policy optimization with multiple optima for reinforcement learning

    Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon, and Seungjai Min. POMO: Policy optimization with multiple optima for reinforcement learning. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 18 GeoRouteNet: Geometry-Enhanced NAR TSP Solver

  15. [15]

    Lawler, Jan Karel Lenstra, Alexander H

    Eugene L. Lawler, Jan Karel Lenstra, Alexander H. G. Rinnooy Kan, and David B. Shmoys, editors.The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, 1985

  16. [16]

    Kernighan

    Shen Lin and Brian W. Kernighan. An effective heuristic algorithm for the traveling-salesman problem.Operations Research, 21(2):498–516, 1973

  17. [17]

    TSPLIB: A traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

    Gerhard Reinelt. TSPLIB: A traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

  18. [18]

    GLU Variants Improve Transformer

    Noam Shazeer. GLU variants improve Transformer.arXiv preprint arXiv:2002.05202, 2020

  19. [19]

    DIFUSCO: Graph-based diffusion solvers for combinatorial optimization

    Zhiqing Sun and Yiming Yang. DIFUSCO: Graph-based diffusion solvers for combinatorial optimization. InAdvances in Neural Information Processing Systems (NeurIPS), 2023

  20. [20]

    Gomez, ŁukaszKaiser, andIlliaPolosukhin

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ŁukaszKaiser, andIlliaPolosukhin. Attentionisallyouneed. InAdvances in Neural Information Processing Systems (NeurIPS), 2017

  21. [21]

    Graph attention networks

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. InInternational Conference on Learning Representations (ICLR), 2018

  22. [22]

    Pointer Networks

    Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer Networks. InAdvances in Neural Information Processing Systems (NeurIPS), 2015

  23. [23]

    An efficient diffusion-based non-autoregressive solver for traveling salesman problem

    Mingzheng Wang, You Zhou, Zhiguang Cao, Yubin Xiao, Xuan Wu, Wei Pang, Yanchun Jiang, Hui Yang, Peng Zhao, and Yongsheng Li. An efficient diffusion-based non-autoregressive solver for traveling salesman problem. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 1469–1480, 2025

  24. [24]

    Reinforcement learning-based nonautoregressive solver for traveling salesman problems.IEEE Transactions on Neural Networks and Learning Systems, 36(7):13402–13416, 2025

    Yubin Xiao, Di Wang, Bin Li, Huanhuan Chen, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, and You Zhou. Reinforcement learning-based nonautoregressive solver for traveling salesman problems.IEEE Transactions on Neural Networks and Learning Systems, 36(7):13402–13416, 2025

  25. [25]

    ReEvo: Large language models as hyper-heuristics with reflective evolution

    Haotian Ye, Jie Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. ReEvo: Large language models as hyper-heuristics with reflective evolution. InAdvances in Neural Information Processing Systems (NeurIPS), 2024. 19