N(CO)$^2$: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

Anas Saeed; Marcos Abel Zuzu\'arregui; Stefano Carpin

arxiv: 2606.18514 · v1 · pith:BJ6DX62Anew · submitted 2026-06-16 · 💻 cs.RO · cs.LG

N(CO)²: Neural Combinatorial Optimization with Chance Constraints to Solve Stochastic Orienteering

Anas Saeed , Marcos Abel Zuzu\'arregui , Stefano Carpin This is my paper

Pith reviewed 2026-06-27 00:10 UTC · model grok-4.3

classification 💻 cs.RO cs.LG

keywords neural combinatorial optimizationchance constraintsstochastic orienteering problemreinforcement learningpath planning under uncertaintymixed-integer linear programming

0 comments

The pith

A reinforcement learning model with chance constraints solves the stochastic orienteering problem competitively to MILP without hand-crafted heuristics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces N(CO)^2 to solve the stochastic orienteering problem by combining neural combinatorial optimization with chance constraints inside a reinforcement learning framework. The model learns to select paths that account for uncertainty while balancing exploration and exploitation. It aims to match the results of state-of-the-art mixed-integer linear programming across varied problem instances. Success would mean planners in uncertain settings no longer require manually engineered heuristics for each new scenario.

Core claim

The authors claim that integrating reinforcement learning with chance constraints inside a neural combinatorial optimization model enables path selection under uncertainty for the stochastic orienteering problem, generalizes across diverse instances, and reaches competitive performance against the state-of-the-art mixed-integer linear program.

What carries the argument

N(CO)^2, which embeds chance constraints inside a reinforcement learning loop to optimize path selection under uncertainty.

If this is right

The approach reduces human effort in designing heuristics for stochastic combinatorial problems.
It supports adaptive decision-making in uncertain environments common to automation tasks.
The model achieves competitive performance with the state-of-the-art MILP solver on SOP instances.
Performance holds across diverse SOP instances without instance-specific tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same chance-constraint integration could be tested on other stochastic routing problems such as the traveling salesman problem with time windows.
Deployment on physical robots would reveal whether simulation-trained policies transfer to real sensor noise and dynamics.
A direct comparison against other learning-based SOP solvers on the same benchmark set would clarify relative strengths.

Load-bearing premise

The reinforcement learning framework with chance constraints can optimize path selection under uncertainty and generalize across SOP instances without relying on hand-crafted heuristics.

What would settle it

Evaluating the trained model on a fresh collection of SOP instances with different uncertainty distributions and observing consistent underperformance relative to the MILP solver on solution quality or feasibility would falsify the competitiveness and generalization claims.

Figures

Figures reproduced from arXiv: 2606.18514 by Anas Saeed, Marcos Abel Zuzu\'arregui, Stefano Carpin.

**Figure 2.** Figure 2: Comparison of training performance of the model using Eq. (5) and (6) on four evaluation datasets. For each evaluation [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Evaluation of the compliance of paths sampled using [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: a280 TSP benchmark graph visual of Baseline, N(CO) [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Evaluation of active search algorithm on 2 evaluation [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

read the original abstract

Neural combinatorial optimization (NCO) offers a promising alternative to traditional heuristic-based methods for solving complex graph optimization problems by proposing to learn heuristics through data. This class of problems frequently arises in automation, as it can be used to model a variety of applications. While NCO has been extensively studied for deterministic combinatorial optimization problems, there are only a few works that aim to solve stochastic combinatorial optimization problems. In this work, we present N(CO)$^2$: Neural Combinatorial Optimization with Chance cOnstraints to solve the Stochastic Orienteering Problem (SOP) without the use of hand-crafted heuristics. By integrating a reinforcement learning (RL) framework, the model optimizes path selection under uncertainty, effectively balancing exploration and exploitation. Empirical results demonstrate that our method generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP) for the task. The proposed approach reduces human effort in heuristic design while enabling adaptive and efficient decision-making in uncertain environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The abstract claims competitive MILP-level results and good generalization on stochastic orienteering via RL plus chance constraints, but supplies none of the modeling or experimental details needed to check those claims.

read the letter

The one thing to know is that this paper applies neural combinatorial optimization to the stochastic orienteering problem by folding chance constraints into an RL policy, with the stated goal of avoiding hand-crafted heuristics. The abstract frames this as a step past the mostly deterministic NCO literature and the small number of existing stochastic NCO papers.

What is actually new is the concrete combination for SOP in a robotics setting. The idea of letting an RL decoder learn feasible paths under uncertainty while reducing manual heuristic work is a natural extension, and the target application area is reasonable.

The paper does little else that can be assessed from the given text. It states that the method generalizes across diverse instances and matches state-of-the-art MILP performance, yet reports no instance counts, no train/test distribution shift, no quantitative gaps in objective value or feasibility rate, and no description of how the chance constraint is encoded inside the policy network or decoder. The stress-test note correctly flags exactly these gaps; without them the competitiveness claim cannot be evaluated.

The soft spots are therefore central rather than peripheral. Soundness cannot be judged because the supporting data and implementation choices are absent. There is no sign of circular fitting or invented entities, but that is only because no equations or derivations appear at all.

This is for people already working on neural methods for combinatorial problems who want to see an early attempt at the stochastic case. A reader would get limited value until the experimental protocol and uncertainty modeling are shown in detail. I would not send it to peer review in this form; the claims are too underspecified to justify referee time.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces N(CO)^2, a neural combinatorial optimization method that integrates chance constraints into a reinforcement learning framework to solve the Stochastic Orienteering Problem (SOP) without hand-crafted heuristics. It claims that the approach optimizes path selection under uncertainty, generalizes across diverse SOP instances, and achieves competitive performance relative to state-of-the-art MILP solvers while reducing human effort in heuristic design.

Significance. If the central empirical claims hold, the work would contribute to extending NCO methods from deterministic to stochastic combinatorial problems by providing a data-driven, adaptive alternative to MILP for uncertain environments. The absence of any reported experimental protocol, metrics, feasibility rates, or quantitative comparisons in the abstract, however, prevents assessment of whether the claimed competitiveness and generalization are supported.

major comments (2)

Abstract: the central claim that the method 'generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP)' is asserted without any supporting experimental details, metrics, baselines, instance counts, distribution-shift description, or error analysis, rendering the claim impossible to evaluate.
Abstract: no formulation is supplied for how the chance constraint (violation probability ≤ α) is encoded inside the policy network, decoder, or reward function, which is load-bearing for the feasibility guarantee asserted in the skeptic's note.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review. We address each major comment below and agree that revisions to the abstract are warranted to improve evaluability of the claims.

read point-by-point responses

Referee: Abstract: the central claim that the method 'generalizes well across diverse SOP instances, achieving competitive performance compared to the state-of-the-art mixed-integer linear program (MILP)' is asserted without any supporting experimental details, metrics, baselines, instance counts, distribution-shift description, or error analysis, rendering the claim impossible to evaluate.

Authors: We agree that the abstract's brevity makes the central empirical claim difficult to assess without additional context. The full manuscript details the experimental protocol, including MILP baselines, instance generation procedures, distribution shifts, and quantitative metrics such as feasibility rates and optimality gaps in the Experiments section. To address this, we will revise the abstract to concisely report key results (e.g., average performance relative to MILP across N instances and feasibility under chance constraints). revision: yes
Referee: Abstract: no formulation is supplied for how the chance constraint (violation probability ≤ α) is encoded inside the policy network, decoder, or reward function, which is load-bearing for the feasibility guarantee asserted in the skeptic's note.

Authors: The encoding of the chance constraint (via a penalty term in the reward and a constrained policy gradient update) is formally defined in Section 3.2 of the manuscript. We concur that a high-level indication of this mechanism would strengthen the abstract. We will add a brief clause describing the integration of the chance constraint into the RL objective. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical RL claims rest on training and testing protocol, not self-referential definitions or fitted predictions

full rationale

The manuscript presents an RL framework for stochastic orienteering with chance constraints and reports empirical generalization versus MILP. No equations, ansatzes, uniqueness theorems, or derivation steps appear in the abstract or described content. Claims reduce to model training on SOP instances followed by out-of-sample evaluation; these are standard empirical assertions, not reductions by construction to inputs. No self-citation load-bearing, fitted-input-as-prediction, or renaming patterns are present. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5716 in / 1023 out tokens · 37907 ms · 2026-06-27T00:10:12.361105+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 3 canonical work pages · 2 internal anchors

[1]

Neural Combinatorial Optimization with Reinforcement Learning

I. Bello et al. Neural combinatorial optimization with reinforcement learning.arXiv preprint arXiv:1611.09940, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[2]

Bou et al

A. Bou et al. Torchrl: A data-driven decision-making library for pytorch, 2023

2023
[3]

S. Carpin. Solving stochastic orienteering problems with chance constraints using monte carlo tree search.IEEE Transactions on Automation Science and Engineering, 22:7855–7869, 2025

2025
[4]

Dai et al

H. Dai et al. Learning combinatorial optimization algorithms over graphs. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 6351 – 6361, 2017

2017
[5]

Dai et al

H. Dai et al. Learning to optimize with stochastic dominance constraints. InProceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 8991–9009, 2023

2023
[6]

B. L. Golden et al. The orienteering problem.Naval Research Logistics, 34:307–318, 1987

1987
[7]

Grinsztajn et al

N. Grinsztajn et al. Winner takes it all: Training performant rl populations for combinatorial optimization.Advances in Neural Information Processing Systems, 36:48485–48509, 2023

2023
[8]

Hussain et al

M.S. Hussain et al. Global self-attention as a replacement for graph convolution. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 655–665, 2022

2022
[9]

Joshi et al

C.K. Joshi et al. Learning the travelling salesperson problem requires rethinking generalization.Constraints, 27(1):70–98, 2022

2022
[10]

Attention, Learn to Solve Routing Problems!

W. Kool et al. Attention, learn to solve routing problems!arXiv preprint arXiv:1803.08475, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[11]

Kool et al

W. Kool et al. Buy 4 reinforce samples, get a baseline for free! Deep Reinforcement Learning Meets Structured Prediction, ICLR 2019 Workshop, 2019

2019
[12]

Liu et al

Z. Liu et al. Boosting graph search with attention network for solving the general orienteering problem.AI Open, 5:46–54, 2024

2024
[13]

Loshchilov and F

I. Loshchilov and F. Hutter. Decoupled weight decay regularization. InInternational Conference on Learning Representations, 2019

2019
[14]

Martin et al

L. Martin et al. The competitive pickup and delivery orienteering problem for balancing car-sharing systems.Transportation Science, 55(6):1232–1259, 2021

2021
[15]

Min et al

Y . Min et al. Unsupervised learning for solving the travelling sales- man problem.Advances in Neural Information Processing Systems, 36:47264–47278, 2023

2023
[16]

Pagnoncelli et al

B. Pagnoncelli et al. Sample average approximation method for chance constrained programming: Theory and applications.Journal of Oprtimization Theory and Applications, 142:399–416, 2009

2009
[17]

Qiu et al

R. Qiu et al. Dimes: A differentiable meta solver for combinatorial optimization problems.Advances in Neural Information Processing Systems, 35:25531–25546, 2022

2022
[18]

Tsplib—a traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

Gerhard Reinelt. Tsplib—a traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

1991
[19]

Neural combinatorial optimization for stochastic flexible job shop scheduling problems

I.G Smit et al. Neural combinatorial optimization for stochastic flexible job shop scheduling problems. InThe 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25), December 2024

2024
[20]

Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023

Z Sun et al. Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023

2023
[21]

G. Tong. USCO-solver: Solving undetermined stochastic combinato- rial optimization problems. In A. Beygelzimer et al., editors,Advances in Neural Information Processing Systems, 2021

2021
[22]

Varakantham et al

P. Varakantham et al. Optimization approaches for solving chance constrained stochastic orienteering problems. In P. Perny et al., editors, Algorithmic Decision Theory, pages 387–398, Berlin, Heidelberg,
[23]

Springer Berlin Heidelberg
[24]

Vaswani et al

A. Vaswani et al. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017
[25]

Wang et al

R. Wang et al. Linsatnet: the positive linear satisfiability neural networks. InInternational Conference on Machine Learning, pages 36605–36625. PMLR, 2023

2023
[26]

Williams

R.J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning.Machine Learning, 8(3):229– 256, May 1992

1992
[27]

Xiao et al

P. Xiao et al. Neural combinatorial optimization for robust routing problem with uncertain travel times. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[28]

Xing and S

Z. Xing and S. Tu. A graph neural network assisted monte carlo tree search approach to traveling salesman problem.IEEE Access, 8:108418–108428, 2020

2020
[29]

Zuzu ´arregui and S

M.A. Zuzu ´arregui and S. Carpin. Leveraging LLMs for mission planning in precision agriculture. InProc. of the IEEE International Conference on Robotics and Automation, pages 7146–7152, 2025

2025
[30]

Zuzu ´arregui and S

M.A. Zuzu ´arregui and S. Carpin. Solving Stochastic Orienteering Problems with Chance Constraints Using a GNN Powered Monte Carlo Tree Search. InProceedings of the IEEE International Con- ference on Automation Science and Engineering (also available as arXiv:2409.04653), 2025

work page arXiv 2025

[1] [1]

Neural Combinatorial Optimization with Reinforcement Learning

I. Bello et al. Neural combinatorial optimization with reinforcement learning.arXiv preprint arXiv:1611.09940, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[2] [2]

Bou et al

A. Bou et al. Torchrl: A data-driven decision-making library for pytorch, 2023

2023

[3] [3]

S. Carpin. Solving stochastic orienteering problems with chance constraints using monte carlo tree search.IEEE Transactions on Automation Science and Engineering, 22:7855–7869, 2025

2025

[4] [4]

Dai et al

H. Dai et al. Learning combinatorial optimization algorithms over graphs. InProceedings of the 31st International Conference on Neural Information Processing Systems, pages 6351 – 6361, 2017

2017

[5] [5]

Dai et al

H. Dai et al. Learning to optimize with stochastic dominance constraints. InProceedings of The 26th International Conference on Artificial Intelligence and Statistics, pages 8991–9009, 2023

2023

[6] [6]

B. L. Golden et al. The orienteering problem.Naval Research Logistics, 34:307–318, 1987

1987

[7] [7]

Grinsztajn et al

N. Grinsztajn et al. Winner takes it all: Training performant rl populations for combinatorial optimization.Advances in Neural Information Processing Systems, 36:48485–48509, 2023

2023

[8] [8]

Hussain et al

M.S. Hussain et al. Global self-attention as a replacement for graph convolution. InProceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 655–665, 2022

2022

[9] [9]

Joshi et al

C.K. Joshi et al. Learning the travelling salesperson problem requires rethinking generalization.Constraints, 27(1):70–98, 2022

2022

[10] [10]

Attention, Learn to Solve Routing Problems!

W. Kool et al. Attention, learn to solve routing problems!arXiv preprint arXiv:1803.08475, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[11] [11]

Kool et al

W. Kool et al. Buy 4 reinforce samples, get a baseline for free! Deep Reinforcement Learning Meets Structured Prediction, ICLR 2019 Workshop, 2019

2019

[12] [12]

Liu et al

Z. Liu et al. Boosting graph search with attention network for solving the general orienteering problem.AI Open, 5:46–54, 2024

2024

[13] [13]

Loshchilov and F

I. Loshchilov and F. Hutter. Decoupled weight decay regularization. InInternational Conference on Learning Representations, 2019

2019

[14] [14]

Martin et al

L. Martin et al. The competitive pickup and delivery orienteering problem for balancing car-sharing systems.Transportation Science, 55(6):1232–1259, 2021

2021

[15] [15]

Min et al

Y . Min et al. Unsupervised learning for solving the travelling sales- man problem.Advances in Neural Information Processing Systems, 36:47264–47278, 2023

2023

[16] [16]

Pagnoncelli et al

B. Pagnoncelli et al. Sample average approximation method for chance constrained programming: Theory and applications.Journal of Oprtimization Theory and Applications, 142:399–416, 2009

2009

[17] [17]

Qiu et al

R. Qiu et al. Dimes: A differentiable meta solver for combinatorial optimization problems.Advances in Neural Information Processing Systems, 35:25531–25546, 2022

2022

[18] [18]

Tsplib—a traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

Gerhard Reinelt. Tsplib—a traveling salesman problem library.ORSA Journal on Computing, 3(4):376–384, 1991

1991

[19] [19]

Neural combinatorial optimization for stochastic flexible job shop scheduling problems

I.G Smit et al. Neural combinatorial optimization for stochastic flexible job shop scheduling problems. InThe 39th Annual AAAI Conference on Artificial Intelligence (AAAI-25), December 2024

2024

[20] [20]

Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023

Z Sun et al. Difusco: Graph-based diffusion solvers for combinatorial optimization.Advances in neural information processing systems, 36:3706–3731, 2023

2023

[21] [21]

G. Tong. USCO-solver: Solving undetermined stochastic combinato- rial optimization problems. In A. Beygelzimer et al., editors,Advances in Neural Information Processing Systems, 2021

2021

[22] [22]

Varakantham et al

P. Varakantham et al. Optimization approaches for solving chance constrained stochastic orienteering problems. In P. Perny et al., editors, Algorithmic Decision Theory, pages 387–398, Berlin, Heidelberg,

[23] [23]

Springer Berlin Heidelberg

[24] [24]

Vaswani et al

A. Vaswani et al. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017

[25] [25]

Wang et al

R. Wang et al. Linsatnet: the positive linear satisfiability neural networks. InInternational Conference on Machine Learning, pages 36605–36625. PMLR, 2023

2023

[26] [26]

Williams

R.J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning.Machine Learning, 8(3):229– 256, May 1992

1992

[27] [27]

Xiao et al

P. Xiao et al. Neural combinatorial optimization for robust routing problem with uncertain travel times. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024

[28] [28]

Xing and S

Z. Xing and S. Tu. A graph neural network assisted monte carlo tree search approach to traveling salesman problem.IEEE Access, 8:108418–108428, 2020

2020

[29] [29]

Zuzu ´arregui and S

M.A. Zuzu ´arregui and S. Carpin. Leveraging LLMs for mission planning in precision agriculture. InProc. of the IEEE International Conference on Robotics and Automation, pages 7146–7152, 2025

2025

[30] [30]

Zuzu ´arregui and S

M.A. Zuzu ´arregui and S. Carpin. Solving Stochastic Orienteering Problems with Chance Constraints Using a GNN Powered Monte Carlo Tree Search. InProceedings of the IEEE International Con- ference on Automation Science and Engineering (also available as arXiv:2409.04653), 2025

work page arXiv 2025