PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming

Bo Tang; Elias B. Khalil

arxiv: 2206.14234 · v3 · submitted 2022-06-28 · 🧮 math.OC · cs.LG

PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming

Bo Tang , Elias B. Khalil This is my paper

Pith reviewed 2026-05-24 11:20 UTC · model grok-4.3

classification 🧮 math.OC cs.LG

keywords predict-then-optimizeend-to-end learninglinear programminginteger programmingPyTorchcombinatorial optimizationsurrogate losses

0 comments

The pith

PyEPO is the first generic PyTorch library for end-to-end predict-then-optimize on linear and integer programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PyEPO as a library that trains neural networks to predict objective coefficients while directly accounting for how those predictions affect the quality of the final optimization solution. Traditional approaches separate prediction from optimization, but end-to-end methods adjust the predictions to improve downstream decisions on problems such as shortest paths and knapsacks. PyEPO supplies a single interface that lets users define arbitrary linear or integer programs, select from surrogate losses, black-box solvers, or perturbed optimization methods, and plug in custom network architectures. Experiments on standard combinatorial instances illustrate performance differences across methods and surface practical observations about training stability and solution quality.

Core claim

PyEPO is the first generic tool for linear and integer programming with predicted objective function coefficients. It packages surrogate decision losses, black-box solvers, and perturbed methods into a PyTorch library that offers a user-friendly interface for defining new optimization problems, applying state-of-the-art algorithms, and using custom neural network architectures, with experiments on Shortest Path, Multiple Knapsack, and Traveling Salesperson Problem.

What carries the argument

PyEPO library, which integrates neural-network training loops with optimization solvers so that prediction errors are measured by their effect on the solved objective rather than by standalone prediction accuracy.

If this is right

Users can define new optimization problems and train networks without writing separate prediction and optimization stages.
Multiple algorithms including surrogate losses, black-box solvers, and perturbed methods become interchangeable within the same training pipeline.
Custom neural network architectures can be inserted directly into the predict-then-optimize loop.
Empirical comparisons on Shortest Path, Multiple Knapsack, and TSP instances become reproducible with a single code base.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Wider adoption could reduce the gap between learned predictors and operational decisions in logistics and scheduling domains.
The library's extensibility may accelerate testing of new surrogate losses or perturbation schemes on the same problem set.
Insights from the reported experiments could inform choices between gradient-based and gradient-free training for integer programs.

Load-bearing premise

The described algorithms can be packaged into a single extensible library that correctly integrates neural network training with optimization solvers for arbitrary user-defined linear and integer programs.

What would settle it

Release of an earlier open-source library that already provided the same generic end-to-end interface for arbitrary linear and integer programs with predicted coefficients, or demonstration that PyEPO's integration produces incorrect gradients or infeasible solutions on a standard test instance.

Figures

Figures reproduced from arXiv: 2206.14234 by Bo Tang, Elias B. Khalil.

**Figure 2.** Figure 2: Illustration of the end-to-end predict-then-optimize framework: A la [PITH_FULL_IMAGE:figures/full_fig_p009_2.png] view at source ↗

**Figure 3.** Figure 3: As shown for the learning curves of the training of [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

**Figure 4.** Figure 4: Parallel efficiency: Although there is additional overhead in creating a [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Warcraft terrain shortest path dataset: (Left) Each input feature is a [PITH_FULL_IMAGE:figures/full_fig_p024_5.png] view at source ↗

**Figure 6.** Figure 6: Normalized regret for the shortest path problem on the test set: The [PITH_FULL_IMAGE:figures/full_fig_p026_6.png] view at source ↗

**Figure 7.** Figure 7: Normalized regret for the 2D knapsack problem on the test set: There [PITH_FULL_IMAGE:figures/full_fig_p027_7.png] view at source ↗

**Figure 8.** Figure 8: Normalized regret for the TSP problem on the test set: There are 20 [PITH_FULL_IMAGE:figures/full_fig_p028_8.png] view at source ↗

**Figure 9.** Figure 9: Average training time per iteration for exact and relaxation methods [PITH_FULL_IMAGE:figures/full_fig_p029_9.png] view at source ↗

**Figure 10.** Figure 10: Normalized regret for the 2D knapsack (at the top) and TSP (at [PITH_FULL_IMAGE:figures/full_fig_p030_10.png] view at source ↗

**Figure 11.** Figure 11: Normalized regret and MSE on the test set: In these experiments, we [PITH_FULL_IMAGE:figures/full_fig_p031_11.png] view at source ↗

**Figure 12.** Figure 12: MSE v.s. Regret: The result covers different two-stage methods, [PITH_FULL_IMAGE:figures/full_fig_p032_12.png] view at source ↗

**Figure 13.** Figure 13: MSE v.s. Regret: The result covers different two-stage methods, [PITH_FULL_IMAGE:figures/full_fig_p033_13.png] view at source ↗

**Figure 14.** Figure 14: Learning curve, relative regret, and path accuracy for the shortest [PITH_FULL_IMAGE:figures/full_fig_p034_14.png] view at source ↗

read the original abstract

In deterministic optimization, it is typically assumed that all problem parameters are fixed and known. In practice, however, some parameters may be a priori unknown but can be estimated from contextual information. A typical predict-then-optimize approach separates predictions and optimization into two distinct stages. Recently, end-to-end predict-then-optimize has emerged as an attractive alternative. This work introduces the PyEPO package, a PyTorch-based end-to-end predict-then-optimize library in Python. To the best of our knowledge, PyEPO (pronounced like \textit{pineapple} with a silent ``n") is the first such generic tool for linear and integer programming with predicted objective function coefficients. It includes various algorithms such as surrogate decision losses, black-box solvers, and perturbed methods. PyEPO offers a user-friendly interface for defining new optimization problems, applying state-of-the-art algorithms, and using custom neural network architectures. We conducted experiments comparing various methods on problems such as Shortest Path, Multiple Knapsack, and Traveling Salesperson Problem, and discussed empirical insights that may guide future research. PyEPO and its documentation are available at https://github.com/khalil-research/PyEPO.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PyEPO is a PyTorch library that bundles existing end-to-end predict-then-optimize methods for linear and integer programs with predicted coefficients.

read the letter

PyEPO is a PyTorch library that bundles existing end-to-end predict-then-optimize methods for linear and integer programs with predicted coefficients. The authors position it as the first generic tool of its kind and include surrogate decision losses, black-box solvers, and perturbed methods, plus an interface for users to define custom problems and plug in their own networks. The code is on GitHub with documentation. Experiments compare the methods on shortest path, multiple knapsack, and TSP instances, which are standard test cases in this area. They also share some empirical observations from those runs. This is useful packaging work that saves others from reimplementing the pieces separately. The main limitation is that the contribution is in the integration and user interface rather than new algorithms or theory. The experiments stay within well-studied benchmarks and do not test larger or more realistic instances. Soundness rests on the code behaving correctly for arbitrary user problems, which the abstract does not prove but also does not contradict. The priority claim is properly hedged with “to the best of our knowledge.” This paper is for researchers in operations research or applied machine learning who want a ready starting point for predict-then-optimize pipelines. A reader who needs to try these methods on their own data without writing everything from scratch would get practical value. It shows clear organization of the different approaches into one framework. I would bring it to a reading group focused on ML for optimization. I would not cite the paper itself in my own theoretical work, but the library could be referenced if used. It deserves peer review because a documented, extensible implementation with comparisons can serve as a reference even without novel math.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces PyEPO, a PyTorch-based open-source library for end-to-end predict-then-optimize frameworks applied to linear and integer programs whose objective coefficients are predicted from contextual features. It implements surrogate decision losses, black-box differentiation, and perturbed optimization methods; supplies a user interface for specifying arbitrary user-defined problems; and reports numerical comparisons on shortest-path, multiple-knapsack, and TSP instances together with empirical observations.

Significance. If the library correctly integrates the cited algorithms with standard solvers and the reported experiments are reproducible, the work supplies a concrete, extensible platform that lowers the barrier for researchers to test and compare end-to-end methods on custom linear and integer programs. The public GitHub repository and documentation constitute a reproducible artifact that can serve as a reference implementation for the community.

minor comments (3)

[Introduction] The abstract states that PyEPO is 'the first such generic tool … to the best of our knowledge.' A brief related-work paragraph in the introduction that explicitly contrasts PyEPO with prior specialized implementations (e.g., those limited to particular problem classes) would strengthen this claim.
[Experiments] Section 4 (or the experiments section) should include a short table listing the exact solver interfaces, PyTorch versions, and random seeds used for each benchmark so that the reported runtimes and solution qualities can be reproduced without inspecting the repository.
[Library Design] The user-interface description would benefit from a minimal complete code snippet showing how a new problem class is registered and how a custom neural network is attached; the current prose description leaves the exact API surface unclear.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The report correctly identifies PyEPO's contributions as the first generic PyTorch library for end-to-end predict-then-optimize on linear and integer programs. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The manuscript presents a software library (PyEPO) implementing existing predict-then-optimize algorithms for linear and integer programs. No derivation chain, first-principles result, or fitted prediction is claimed; the contribution is the packaging, interface, and empirical comparison on standard problems. The qualified priority statement ('to the best of our knowledge') is not load-bearing for any mathematical claim and does not reduce to self-citation or self-definition. The work is self-contained as an engineering artifact whose correctness is evidenced by the public repository and reported experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software library introduction paper. No free parameters, mathematical axioms, or invented entities are introduced in a theoretical derivation sense.

pith-pipeline@v0.9.0 · 5747 in / 1113 out tokens · 61327 ms · 2026-05-24T11:20:59.653163+00:00 · methodology

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback
cs.LG 2026-05 unverdicted novelty 7.0

IGT-OMD reduces gradient transport error from quadratic to linear in delay length for delayed bilevel optimization and achieves sublinear regret with adaptive steps.
Decision-Focused Learning via Tangent-Space Projection of Prediction Error
cs.LG 2026-05 unverdicted novelty 7.0

Regret gradients in DFL are the tangent-space projection of prediction error scaled by curvature, enabling efficient direct computation without differentiating through solvers.
Decision-Focused Learning via Tangent-Space Projection of Prediction Error
cs.LG 2026-05 unverdicted novelty 6.0

PEAR computes regret gradients via tangent-space projection of prediction error, delivering top decision quality and efficiency on LP and QP tasks without solver differentiation.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · cited by 2 Pith papers

[1]

(2016) Tensorﬂow: Large-scale machine learning on heterogeneous distributed systems

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, et al. (2016) Tensorﬂow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:160304467

work page 2016
[2]

In: Wallach H, Larochelle H, Beygelzimer A, d'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 32

Agrawal A, Amos B, Barratt S, Boyd S, Diamond S, Kolter JZ (2019) Diﬀerentiable convex optimization layers. In: Wallach H, Larochelle H, Beygelzimer A, d'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 32

work page 2019
[3]

arXiv preprint arXiv:190409043

Agrawal A, Barratt S, Boyd S, Busseti E, Moursi WM (2019) Diﬀerenti- ating through a cone program. arXiv preprint arXiv:190409043

work page 2019
[4]

In: International Conference on Machine Learning, PMLR, pp 136–145

Amos B, Kolter JZ (2017) Optnet: Diﬀerentiable optimization as a layer in neural networks. In: International Conference on Machine Learning, PMLR, pp 136–145

work page 2017
[5]

International Journal of Neural Systems 8(04):433–443

Bengio Y (1997) Using a ﬁnancial training criterion rather than a predic- tion criterion. International Journal of Neural Systems 8(04):433–443

work page 1997
[6]

arXiv preprint arXiv:200208676

Berthet Q, Blondel M, Teboul O, Cuturi M, Vert JP, Bach F (2020) Learning with diﬀerentiable perturbed optimizers. arXiv preprint arXiv:200208676

work page 2020
[7]

J Mach Learn Res 21(35):1–69

Blondel M, Martins AF, Niculae V (2020) Learning with fenchel-young losses. J Mach Learn Res 21(35):1–69

work page 2020
[8]

arXiv preprint arXiv:151201274

Chen T, Li M, Li Y, Lin M, Wang N, Wang M, Xiao T, Xu B, Zhang C, Zhang Z (2015) Mxnet: A ﬂexible and eﬃcient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:151201274

work page 2015
[9]

1: User’s manual for cplex

Cplex II (2009) V12. 1: User’s manual for cplex. International Business Machines Corporation 46(53):157

work page 2009
[10]

arXiv preprint arXiv:220713513

Dalle G, Baty L, Bouvier L, Parmentier A (2022) Learning with com- binatorial optimization layers: a probabilistic approach. arXiv preprint arXiv:220713513

work page 2022
[11]

Journal of the operations research society of America 2(4):393–410

Dantzig G, Fulkerson R, Johnson S (1954) Solution of a large-scale traveling-salesman problem. Journal of the operations research society of America 2(4):393–410

work page 1954
[12]

In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30

Djolonga J, Krause A (2017) Diﬀerentiable learning of submodular models. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30

work page 2017
[13]

In: Artiﬁcial Intelligence and Statistics, PMLR, pp 318–326

Domke J (2012) Generic methods for optimization-based modeling. In: Artiﬁcial Intelligence and Statistics, PMLR, pp 318–326

work page 2012
[14]

In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Infor- mation Processing Systems, Curran Associates, Inc., vol 30

Donti P, Amos B, Kolter JZ (2017) Task-based end-to-end model learning in stochastic optimization. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Infor- mation Processing Systems, Curran Associates, Inc., vol 30

work page 2017
[15]

In: International Con- ference on Machine Learning, PMLR, vol 119, pp 2858–2867 Title Suppressed Due to Excessive Length 37

Elmachtoub A, Liang JCN, McNellis R (2020) Decision trees for decision- making under the predict-then-optimize framework. In: International Con- ference on Machine Learning, PMLR, vol 119, pp 2858–2867 Title Suppressed Due to Excessive Length 37

work page 2020
[16]

predict, then optimize

Elmachtoub AN, Grigas P (2021) Smart “predict, then optimize”. Man- agement Science 0(0)

work page 2021
[17]

In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1504–1511

Ferber A, Wilder B, Dilkina B, Tambe M (2020) Mipaal: Mixed integer program as a layer. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1504–1511

work page 2020
[18]

In: Advances in Neural Information Processing Systems 28 (2015), pp 2962– 2970

Feurer M, Klein A, Eggensperger J Katharina Springenberg, Blum M, Hutter F (2015) Eﬃcient and robust automated machine learning. In: Advances in Neural Information Processing Systems 28 (2015), pp 2962– 2970

work page 2015
[19]

In: International Conference on Decision and Game Theory for Security, Springer, pp 35–56

Ford B, Nguyen T, Tambe M, Sintov N, Delle Fave F (2015) Beware the soothsayer: From attack prediction accuracy to predictive reliability in security games. In: International Conference on Decision and Game Theory for Security, Springer, pp 35–56

work page 2015
[20]

Operations Research Center Working Paper;OR 078-78

Gavish B, Graves SC (1978) The travelling salesman problem and related problems. Operations Research Center Working Paper;OR 078-78

work page 1978
[21]

arXiv preprint arXiv:160705447

Gould S, Fernando B, Cherian A, Anderson P, Cruz RS, Guo E (2016) On diﬀerentiating parameterized argmin and argmax problems with applica- tion to bi-level optimization. arXiv preprint arXiv:160705447

work page 2016
[22]

URL https://www.gurobi.com

Gurobi Optimization, LLC (2021) Gurobi Optimizer Reference Manual. URL https://www.gurobi.com

work page 2021
[23]

Hagberg A, Swart P, S Chult D (2008) Exploring network structure, dy- namics, and function using networkx. Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States)

work page 2008
[24]

(2017) Pyomo-optimization modeling in python, vol 67

Hart WE, Laird CD, Watson JP, Woodruﬀ DL, Hackebeil GA, Nicholson BL, Siirola JD, et al. (2017) Pyomo-optimization modeling in python, vol 67. Springer

work page 2017
[25]

In: Bengio Y, Schu- urmans D, Laﬀerty J, Williams C, Culotta A (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 22

Kao Yh, Roy B, Yan X (2009) Directed regression. In: Bengio Y, Schu- urmans D, Laﬀerty J, Williams C, Culotta A (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 22

work page 2009
[26]

In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 33, pp 7272–7282

Mandi J, Guns T (2020) Interior point solving for lp-based predic- tion+optimisation. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 33, pp 7272–7282

work page 2020
[27]

(2020) Smart predict-and-optimize for hard combinatorial optimization problems

Mandi J, Stuckey PJ, Guns T, et al. (2020) Smart predict-and-optimize for hard combinatorial optimization problems. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1603–1610, DOI 10.1609/ aaai.v34i02.5521

work page 2020
[28]

John Wiley & Sons, Inc

Martello S, Toth P (1990) Knapsack problems: algorithms and computer implementations. John Wiley & Sons, Inc

work page 1990
[29]

Optimization and Engineering 13(1):1–27

Mattingley J, Boyd S (2012) Cvxgen: A code generator for embedded convex optimization. Optimization and Engineering 13(1):1–27

work page 2012
[30]

Journal of the ACM (JACM) 7(4):326–329

Miller CE, Tucker AW, Zemlin RA (1960) Integer programming for- mulation of traveling salesman problems. Journal of the ACM (JACM) 7(4):326–329

work page 1960
[31]

Synthesis Lectures 38 Bo Tang, Elias B

Ortega-Arranz H, Llanos DR, Gonzalez-Escribano A (2014) The shortest- path problem: Analysis and comparison of methods. Synthesis Lectures 38 Bo Tang, Elias B. Khalil on Theoretical Computer Science 1(1):1–87

work page 2014
[32]

In: NIPS 2017 Autodiﬀ Workshop

Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Des- maison A, Antiga L, Lerer A (2017) Automatic diﬀerentiation in pytorch. In: NIPS 2017 Autodiﬀ Workshop

work page 2017
[33]

In: Wallach H, Larochelle H, Beygelzimer A, d 'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp 8024–8035

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d 'Alch´ e-Buc F, Fox E...

work page 2019
[34]

Journal of Machine Learning Research 12:2825–2830

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit- learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830

work page 2011
[35]

In: International Conference on Learning Representations

Poganˇ ci´ c MV, Paulus A, Musil V, Martius G, Rolinek M (2019) Diﬀeren- tiation of blackbox combinatorial solvers. In: International Conference on Learning Representations

work page 2019
[36]

Wilder B, Dilkina B, Tambe M (2019) Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 33, pp 1658–1665 9 Statements and Declarations 9.1 Funding This work was supported by funding from a SCALE AI Research Chair and an NSERC Discovery Grant. 9...

work page 2019

[1] [1]

(2016) Tensorﬂow: Large-scale machine learning on heterogeneous distributed systems

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, et al. (2016) Tensorﬂow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:160304467

work page 2016

[2] [2]

In: Wallach H, Larochelle H, Beygelzimer A, d'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 32

Agrawal A, Amos B, Barratt S, Boyd S, Diamond S, Kolter JZ (2019) Diﬀerentiable convex optimization layers. In: Wallach H, Larochelle H, Beygelzimer A, d'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 32

work page 2019

[3] [3]

arXiv preprint arXiv:190409043

Agrawal A, Barratt S, Boyd S, Busseti E, Moursi WM (2019) Diﬀerenti- ating through a cone program. arXiv preprint arXiv:190409043

work page 2019

[4] [4]

In: International Conference on Machine Learning, PMLR, pp 136–145

Amos B, Kolter JZ (2017) Optnet: Diﬀerentiable optimization as a layer in neural networks. In: International Conference on Machine Learning, PMLR, pp 136–145

work page 2017

[5] [5]

International Journal of Neural Systems 8(04):433–443

Bengio Y (1997) Using a ﬁnancial training criterion rather than a predic- tion criterion. International Journal of Neural Systems 8(04):433–443

work page 1997

[6] [6]

arXiv preprint arXiv:200208676

Berthet Q, Blondel M, Teboul O, Cuturi M, Vert JP, Bach F (2020) Learning with diﬀerentiable perturbed optimizers. arXiv preprint arXiv:200208676

work page 2020

[7] [7]

J Mach Learn Res 21(35):1–69

Blondel M, Martins AF, Niculae V (2020) Learning with fenchel-young losses. J Mach Learn Res 21(35):1–69

work page 2020

[8] [8]

arXiv preprint arXiv:151201274

Chen T, Li M, Li Y, Lin M, Wang N, Wang M, Xiao T, Xu B, Zhang C, Zhang Z (2015) Mxnet: A ﬂexible and eﬃcient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:151201274

work page 2015

[9] [9]

1: User’s manual for cplex

Cplex II (2009) V12. 1: User’s manual for cplex. International Business Machines Corporation 46(53):157

work page 2009

[10] [10]

arXiv preprint arXiv:220713513

Dalle G, Baty L, Bouvier L, Parmentier A (2022) Learning with com- binatorial optimization layers: a probabilistic approach. arXiv preprint arXiv:220713513

work page 2022

[11] [11]

Journal of the operations research society of America 2(4):393–410

Dantzig G, Fulkerson R, Johnson S (1954) Solution of a large-scale traveling-salesman problem. Journal of the operations research society of America 2(4):393–410

work page 1954

[12] [12]

In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30

Djolonga J, Krause A (2017) Diﬀerentiable learning of submodular models. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30

work page 2017

[13] [13]

In: Artiﬁcial Intelligence and Statistics, PMLR, pp 318–326

Domke J (2012) Generic methods for optimization-based modeling. In: Artiﬁcial Intelligence and Statistics, PMLR, pp 318–326

work page 2012

[14] [14]

In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Infor- mation Processing Systems, Curran Associates, Inc., vol 30

Donti P, Amos B, Kolter JZ (2017) Task-based end-to-end model learning in stochastic optimization. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Infor- mation Processing Systems, Curran Associates, Inc., vol 30

work page 2017

[15] [15]

In: International Con- ference on Machine Learning, PMLR, vol 119, pp 2858–2867 Title Suppressed Due to Excessive Length 37

Elmachtoub A, Liang JCN, McNellis R (2020) Decision trees for decision- making under the predict-then-optimize framework. In: International Con- ference on Machine Learning, PMLR, vol 119, pp 2858–2867 Title Suppressed Due to Excessive Length 37

work page 2020

[16] [16]

predict, then optimize

Elmachtoub AN, Grigas P (2021) Smart “predict, then optimize”. Man- agement Science 0(0)

work page 2021

[17] [17]

In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1504–1511

Ferber A, Wilder B, Dilkina B, Tambe M (2020) Mipaal: Mixed integer program as a layer. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1504–1511

work page 2020

[18] [18]

In: Advances in Neural Information Processing Systems 28 (2015), pp 2962– 2970

Feurer M, Klein A, Eggensperger J Katharina Springenberg, Blum M, Hutter F (2015) Eﬃcient and robust automated machine learning. In: Advances in Neural Information Processing Systems 28 (2015), pp 2962– 2970

work page 2015

[19] [19]

In: International Conference on Decision and Game Theory for Security, Springer, pp 35–56

Ford B, Nguyen T, Tambe M, Sintov N, Delle Fave F (2015) Beware the soothsayer: From attack prediction accuracy to predictive reliability in security games. In: International Conference on Decision and Game Theory for Security, Springer, pp 35–56

work page 2015

[20] [20]

Operations Research Center Working Paper;OR 078-78

Gavish B, Graves SC (1978) The travelling salesman problem and related problems. Operations Research Center Working Paper;OR 078-78

work page 1978

[21] [21]

arXiv preprint arXiv:160705447

Gould S, Fernando B, Cherian A, Anderson P, Cruz RS, Guo E (2016) On diﬀerentiating parameterized argmin and argmax problems with applica- tion to bi-level optimization. arXiv preprint arXiv:160705447

work page 2016

[22] [22]

URL https://www.gurobi.com

Gurobi Optimization, LLC (2021) Gurobi Optimizer Reference Manual. URL https://www.gurobi.com

work page 2021

[23] [23]

Hagberg A, Swart P, S Chult D (2008) Exploring network structure, dy- namics, and function using networkx. Tech. rep., Los Alamos National Lab.(LANL), Los Alamos, NM (United States)

work page 2008

[24] [24]

(2017) Pyomo-optimization modeling in python, vol 67

Hart WE, Laird CD, Watson JP, Woodruﬀ DL, Hackebeil GA, Nicholson BL, Siirola JD, et al. (2017) Pyomo-optimization modeling in python, vol 67. Springer

work page 2017

[25] [25]

In: Bengio Y, Schu- urmans D, Laﬀerty J, Williams C, Culotta A (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 22

Kao Yh, Roy B, Yan X (2009) Directed regression. In: Bengio Y, Schu- urmans D, Laﬀerty J, Williams C, Culotta A (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 22

work page 2009

[26] [26]

In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 33, pp 7272–7282

Mandi J, Guns T (2020) Interior point solving for lp-based predic- tion+optimisation. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 33, pp 7272–7282

work page 2020

[27] [27]

(2020) Smart predict-and-optimize for hard combinatorial optimization problems

Mandi J, Stuckey PJ, Guns T, et al. (2020) Smart predict-and-optimize for hard combinatorial optimization problems. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 34, pp 1603–1610, DOI 10.1609/ aaai.v34i02.5521

work page 2020

[28] [28]

John Wiley & Sons, Inc

Martello S, Toth P (1990) Knapsack problems: algorithms and computer implementations. John Wiley & Sons, Inc

work page 1990

[29] [29]

Optimization and Engineering 13(1):1–27

Mattingley J, Boyd S (2012) Cvxgen: A code generator for embedded convex optimization. Optimization and Engineering 13(1):1–27

work page 2012

[30] [30]

Journal of the ACM (JACM) 7(4):326–329

Miller CE, Tucker AW, Zemlin RA (1960) Integer programming for- mulation of traveling salesman problems. Journal of the ACM (JACM) 7(4):326–329

work page 1960

[31] [31]

Synthesis Lectures 38 Bo Tang, Elias B

Ortega-Arranz H, Llanos DR, Gonzalez-Escribano A (2014) The shortest- path problem: Analysis and comparison of methods. Synthesis Lectures 38 Bo Tang, Elias B. Khalil on Theoretical Computer Science 1(1):1–87

work page 2014

[32] [32]

In: NIPS 2017 Autodiﬀ Workshop

Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Des- maison A, Antiga L, Lerer A (2017) Automatic diﬀerentiation in pytorch. In: NIPS 2017 Autodiﬀ Workshop

work page 2017

[33] [33]

In: Wallach H, Larochelle H, Beygelzimer A, d 'Alch´ e-Buc F, Fox E, Garnett R (eds) Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp 8024–8035

Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d 'Alch´ e-Buc F, Fox E...

work page 2019

[34] [34]

Journal of Machine Learning Research 12:2825–2830

Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit- learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830

work page 2011

[35] [35]

In: International Conference on Learning Representations

Poganˇ ci´ c MV, Paulus A, Musil V, Martius G, Rolinek M (2019) Diﬀeren- tiation of blackbox combinatorial solvers. In: International Conference on Learning Representations

work page 2019

[36] [36]

Wilder B, Dilkina B, Tambe M (2019) Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. In: Proceedings of the AAAI Conference on Artiﬁcial Intelligence, vol 33, pp 1658–1665 9 Statements and Declarations 9.1 Funding This work was supported by funding from a SCALE AI Research Chair and an NSERC Discovery Grant. 9...

work page 2019