Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

Hugo Percot; Louis-Martin Rousseau; Quentin Cappart; St\'ephane Eilles-Chan Way; Tias Guns

arxiv: 2606.08797 · v1 · pith:XDZHYGVWnew · submitted 2026-06-07 · 💻 cs.LG · cs.AI

Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

St\'ephane Eilles-Chan Way , Hugo Percot , Quentin Cappart , Tias Guns , Louis-Martin Rousseau This is my paper

Pith reviewed 2026-06-27 18:34 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords decision-focused learningLagrangian decompositionpredict-then-optimizemulti-dimensional knapsackportfolio optimizationsurrogate objectivescaling

0 comments

The pith

Lagrangian decomposition scales decision-focused learning to instances with eight times more variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Decision-focused learning trains prediction models by embedding a constrained optimization step inside each training iteration, but the repeated full solves make it impractical for large problems. The paper replaces that inner optimization with a Lagrangian decomposition that breaks the problem into independent subproblems, yielding a surrogate objective and two new loss functions. These losses let the prediction model be trained while preserving compatibility with existing methods such as SPO+ and IMLE. Experiments on multi-dimensional knapsack and quadratic portfolio problems show the approach stays competitive in decision quality yet runs on instances up to eight times larger than those handled by prior work. The decomposition also admits parallel computation across subproblems.

Core claim

Lagrangian decomposition of the inner constrained optimization produces a separable surrogate objective that supports two new loss functions for decision-focused training. The resulting framework integrates directly with standard methods including SPO+ and IMLE, achieves competitive performance on the multi-dimensional knapsack and quadratic portfolio benchmarks, and extends to instances involving up to eight times more variables than those typically considered in related work.

What carries the argument

Lagrangian decomposition applied to the inner optimization problem, which creates a surrogate objective and the two associated loss functions used for training the prediction model.

If this is right

The framework integrates directly with standard decision-focused learning methods including SPO+ and IMLE.
Training becomes amenable to parallelization across the decomposed subproblems.
The approach achieves competitive or better decision quality on multi-dimensional knapsack and quadratic portfolio problems.
It extends to instances with up to eight times more variables than those used in prior work.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The parallel structure could support distributed training on clusters or GPUs for even larger problems.
The surrogate may reduce dependence on exact solvers during training, enabling use in environments where only approximate or heuristic solvers are available.
Similar decomposition techniques could transfer to other predict-then-optimize settings where Lagrangian methods already work well.

Load-bearing premise

The surrogate objective obtained from Lagrangian decomposition remains a sufficiently faithful proxy for the original constrained optimization problem when used inside the training loop of the prediction model.

What would settle it

On a large instance, if models trained with the decomposed losses produce final decisions whose quality falls materially below those from full-optimization DFL or from a plain predict-then-optimize baseline, the claim that the surrogate enables reliable scaling would be falsified.

Figures

Figures reproduced from arXiv: 2606.08797 by Hugo Percot, Louis-Martin Rousseau, Quentin Cappart, St\'ephane Eilles-Chan Way, Tias Guns.

**Figure 1.** Figure 1: Schematic view of our method. The initial parameterized COP and the parameterized LD define two optimization problems, each [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗

**Figure 2.** Figure 2: Test-set relative regret (mean across seeds [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗

read the original abstract

Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational costs and limited scalability, as it requires solving a constrained optimization problem for each training instance at every iteration. To address these challenges, we propose a novel framework that incorporates Lagrangian decomposition into the decision-focused learning paradigm. Specifically, we introduce a new surrogate objective along with two loss functions for evaluating and training the underlying prediction model. We further propose two variants of our approach, which offer different trade-offs between computational efficiency and solution quality. Our framework can be seamlessly integrated with standard decision-focused learning methods, including Smart Predict-then-Optimize (SPO+) and Implicit Maximum Likelihood Estimation (IMLE). Through experiments on two standard benchmarks, the multi-dimensional knapsack problem and quadratic portfolio optimization, we demonstrate that our approach achieves competitive performance while remaining amenable to parallelization. In particular, it consistently outperforms traditional decision-focused learning methods on large-scale instances, involving up to eight times more variables than those typically considered in related work. The implementation is available at https://github.com/corail-research/DFL-LD.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Lagrangian decomposition gives DFL a practical way to handle bigger instances, but the surrogate's effect on training gradients needs closer scrutiny.

read the letter

The core advance here is folding Lagrangian decomposition into decision-focused learning so the inner optimization can be broken into smaller subproblems that run in parallel. They define a surrogate objective and two new losses that plug into SPO+ and IMLE, then test on multi-dimensional knapsack and quadratic portfolio problems.

The experiments are the strongest part. They report results on instances with up to eight times more variables than typical DFL papers, and the method stays competitive while being amenable to parallelization. That directly tackles the scalability complaint that has kept DFL out of larger operations-research settings.

The soft spot is the fidelity of the surrogate. Lagrangian relaxation introduces a duality gap whose size depends on the multipliers and the integrality of the subproblems. If that gap varies across training examples or produces biased subgradients, the learned predictor can optimize for the wrong decisions. The abstract gives no bounds on the gap, no ablation on multiplier quality, and no check that the surrogate and original problem share the same argmin over the prediction space. Without those, the performance gains could partly reflect implementation choices rather than a reliable training signal.

The work is aimed at people already using predict-then-optimize pipelines who need to move past small benchmarks. The citation pattern looks standard and the code is released, which helps.

It is worth sending to referees. The scalability route is concrete and the results on larger instances are worth verifying with the full derivations and controls.

Referee Report

2 major / 2 minor

Summary. The paper proposes a framework that integrates Lagrangian decomposition into decision-focused learning (DFL) to improve scalability. It defines a new surrogate objective derived from the decomposed Lagrangian, along with two associated loss functions, and shows how these can be combined with existing DFL methods such as SPO+ and IMLE. Experiments on the multi-dimensional knapsack problem and quadratic portfolio optimization report competitive performance on instances up to eight times larger than those in prior work, with an open-source implementation provided.

Significance. If the surrogate losses produce training signals whose argmin aligns with that of the original constrained problem, the approach could meaningfully extend DFL to larger practical instances while retaining the ability to parallelize subproblems. The public code release supports reproducibility and is a clear strength.

major comments (2)

[§3.2] §3.2 (surrogate objective definition): the two new loss functions are obtained by substituting the Lagrangian relaxation into the DFL training loop, yet no bound or empirical characterization is given for the duality gap that arises in the integer multi-dimensional knapsack subproblems; without this, it is unclear whether the surrogate and original objectives share the same argmin over the prediction space.
[§4.2, Table 3] §4.2 and Table 3 (large-scale knapsack results): the reported outperformance versus standard SPO+ and IMLE on instances with 8× more variables is presented as point estimates only; the absence of variance across random seeds or statistical tests leaves open whether the gains are robust to the varying duality gaps across training instances.

minor comments (2)

[§3.3] Notation for the two loss functions (L_LD and L_LD-IMLE) is introduced without an explicit side-by-side comparison of their computational and statistical properties.
[Abstract] The abstract states that the framework 'can be seamlessly integrated' with SPO+ and IMLE, but the integration steps are only sketched at a high level; a short pseudocode block would clarify the required modifications to the original DFL pipelines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below, explaining our position and planned revisions where appropriate.

read point-by-point responses

Referee: [§3.2] §3.2 (surrogate objective definition): the two new loss functions are obtained by substituting the Lagrangian relaxation into the DFL training loop, yet no bound or empirical characterization is given for the duality gap that arises in the integer multi-dimensional knapsack subproblems; without this, it is unclear whether the surrogate and original objectives share the same argmin over the prediction space.

Authors: We thank the referee for this observation. The Lagrangian decomposition is introduced precisely to obtain a tractable surrogate that admits parallel subproblem solves while preserving the decision-focused training loop. Although the integer knapsack subproblems admit a nonzero duality gap in general, the surrogate objective is constructed so that its gradient signal with respect to the predictions remains informative for improving downstream decisions; this is supported by the consistent outperformance on large instances. We acknowledge that a theoretical characterization of when the argmins coincide is not provided. In the revision we will add an empirical study reporting the observed duality gaps (and their variation) across training instances and problem sizes. revision: yes
Referee: [§4.2, Table 3] §4.2 and Table 3 (large-scale knapsack results): the reported outperformance versus standard SPO+ and IMLE on instances with 8× more variables is presented as point estimates only; the absence of variance across random seeds or statistical tests leaves open whether the gains are robust to the varying duality gaps across training instances.

Authors: We agree that variance estimates and statistical tests would strengthen the presentation. The current results are reported as single-run point estimates because the primary focus was demonstrating scalability on instances eight times larger than prior work. In the revised manuscript we will rerun the large-scale experiments over multiple random seeds, report means and standard deviations, and include paired statistical tests to confirm that the observed improvements remain significant despite instance-to-instance variation in duality gaps. revision: yes

Circularity Check

0 steps flagged

No significant circularity; extension of external DFL methods via standard Lagrangian decomposition

full rationale

The paper defines a surrogate objective and two new loss functions by applying Lagrangian decomposition (a classical optimization technique) to the predict-then-optimize setting. These are then integrated with the externally cited SPO+ and IMLE frameworks. No equation reduces a claimed prediction or performance gain to a parameter fitted inside the paper, nor does any load-bearing premise rest on a self-citation chain. The derivation chain remains self-contained against external benchmarks and standard mathematical results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard convex optimization assumptions and the validity of Lagrangian relaxation as a surrogate; no new free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption The underlying optimization problems admit a Lagrangian decomposition whose dual provides a usable surrogate for the original objective during training.
Required for the new surrogate objective and losses to be well-defined.

pith-pipeline@v0.9.1-grok · 5750 in / 1158 out tokens · 13471 ms · 2026-06-27T18:34:37.511344+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 2 canonical work pages

[1]

Doge-train: Discrete optimization on gpu with end- to-end training

[Abbas and Swoboda, 2024] Ahmed Abbas and Paul Swo- boda. Doge-train: Discrete optimization on gpu with end- to-end training. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20623–20631,

2024
[2]

Zico Kolter

[Amos and Kolter, 2017] Brandon Amos and J. Zico Kolter. OptNet: differentiable optimization as a layer in neural networks.Proceedings of the 34th International Confer- ence on Machine Learning - Volume 70, pages 136–145,

2017
[3]

Solver-free decision-focused learning for linear optimization prob- lems.arXiv preprint arXiv:2505.22224,

[Berdenet al., 2025 ] Senne Berden, Ali ˙Irfan Mah- muto˘gulları, Dimos Tsouros, and Tias Guns. Solver-free decision-focused learning for linear optimization prob- lems.arXiv preprint arXiv:2505.22224,

work page arXiv 2025
[4]

Learning valid dual bounds in constraint programming: Boosted lagrangian decomposition with self-supervised learning

[Bessaet al., 2025 ] Swann Bessa, Darius Dabert, Max Bourgeat, Louis-Martin Rousseau, and Quentin Cappart. Learning valid dual bounds in constraint programming: Boosted lagrangian decomposition with self-supervised learning. InProceedings of the AAAI Conference on Arti- ficial Intelligence, volume 39, pages 11113–11121,

2025
[5]

Scalable coupling of deep learning with logical reasoning,

[Defresneet al., 2023 ] Marianne Defresne, Sophie Barbe, and Thomas Schiex. Scalable coupling of deep learning with logical reasoning,

2023
[6]

predict, then optimize

[Elmachtoub and Grigas, 2021] Adam N. Elmachtoub and Paul Grigas. Smart “predict, then optimize”.Manage. Sci., 68(1):9–26, January

2021
[7]

Elmachtoub, Henry Lam, Haofeng Zhang, and Yunfan Zhao

[Elmachtoubet al., 2023 ] Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, and Yunfan Zhao. Estimate-then- optimize versus integrated-estimation-optimization versus sample average approximation: A stochastic dominance perspective,

2023
[8]

and Lam, Henry and Zhang, Haofeng and Zhao, Yunfan , date =

eprint: 2304.06833. [Gouldet al., 2016 ] Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, and Edison Guo. On differentiating parameterized argmin and argmax problems with application to bi-level optimization.arXiv preprint arXiv:1607.05447,

work page arXiv 2016
[9]

Lagrangean decomposition: A model yielding stronger lagrangean bounds.Mathematical programming, 39(2):215–228,

[Guignard and Kim, 1987] Monique Guignard and Siwhan Kim. Lagrangean decomposition: A model yielding stronger lagrangean bounds.Mathematical programming, 39(2):215–228,

1987
[10]

Gurobi Optimizer Reference Manual,

[Gurobi Optimization, LLC, 2024] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual,

2024
[11]

General bounding mech- anism for constraint programs

[H`aet al., 2015 ] Minh Ho `ang H `a, Claude-Guy Quimper, and Louis-Martin Rousseau. General bounding mech- anism for constraint programs. In Gilles Pesant, edi- tor,Principles and Practice of Constraint Programming, pages 158–172. Springer International Publishing,

2015
[12]

Smart predict-and-optimize for hard combina- torial optimization problems

[Mandiet al., 2020 ] Jayanta Mandi, Peter J Stuckey, Tias Guns, et al. Smart predict-and-optimize for hard combina- torial optimization problems. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 1603–1610,

2020
[13]

Decision-focused learning: Foundations, state of the art, benchmark and future op- portunities.Journal of Artificial Intelligence Research, 80:1623–1701,

[Mandiet al., 2024 ] Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, and Ferdinando Fioretto. Decision-focused learning: Foundations, state of the art, benchmark and future op- portunities.Journal of Artificial Intelligence Research, 80:1623–1701,

2024
[14]

Implicit MLE: backpropagating through discrete exponential family distributions

[Niepertet al., 2021 ] Mathias Niepert, Pasquale Minervini, and Luca Franceschi. Implicit MLE: backpropagating through discrete exponential family distributions. InPro- ceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21. Curran Asso- ciates Inc.,

2021
[15]

Learning lagrangian multipliers for the travelling sales- man problem

[Parjadiset al., 2024 ] Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, and Louis-Martin Rousseau. Learning lagrangian multipliers for the travelling sales- man problem. In30th International Conference on Princi- ples and Practice of Constraint Programming (CP 2024), pages 22–1. Schloss Dagstuhl–Leibniz-Zentrum f¨ur Infor- matik,

2024
[16]

Pytorch: An imperative style, high- performance deep learning library

[Paszkeet al., 2019 ] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An impera...

2019
[17]

[Shor, 2012] Naum Zuselevich Shor.Minimization methods for non-differentiable functions, volume

2012
[18]

[Tang and Khalil, 2024b] Bo Tang and Elias B Khalil

Springer Nature Switzerland. [Tang and Khalil, 2024b] Bo Tang and Elias B Khalil. PyEPO: a PyTorch-based end-to-end predict-then- optimize library for linear and integer programming. Mathematical Programming Computation, 16(3):297– 335, 2024

2024

[1] [1]

Doge-train: Discrete optimization on gpu with end- to-end training

[Abbas and Swoboda, 2024] Ahmed Abbas and Paul Swo- boda. Doge-train: Discrete optimization on gpu with end- to-end training. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 20623–20631,

2024

[2] [2]

Zico Kolter

[Amos and Kolter, 2017] Brandon Amos and J. Zico Kolter. OptNet: differentiable optimization as a layer in neural networks.Proceedings of the 34th International Confer- ence on Machine Learning - Volume 70, pages 136–145,

2017

[3] [3]

Solver-free decision-focused learning for linear optimization prob- lems.arXiv preprint arXiv:2505.22224,

[Berdenet al., 2025 ] Senne Berden, Ali ˙Irfan Mah- muto˘gulları, Dimos Tsouros, and Tias Guns. Solver-free decision-focused learning for linear optimization prob- lems.arXiv preprint arXiv:2505.22224,

work page arXiv 2025

[4] [4]

Learning valid dual bounds in constraint programming: Boosted lagrangian decomposition with self-supervised learning

[Bessaet al., 2025 ] Swann Bessa, Darius Dabert, Max Bourgeat, Louis-Martin Rousseau, and Quentin Cappart. Learning valid dual bounds in constraint programming: Boosted lagrangian decomposition with self-supervised learning. InProceedings of the AAAI Conference on Arti- ficial Intelligence, volume 39, pages 11113–11121,

2025

[5] [5]

Scalable coupling of deep learning with logical reasoning,

[Defresneet al., 2023 ] Marianne Defresne, Sophie Barbe, and Thomas Schiex. Scalable coupling of deep learning with logical reasoning,

2023

[6] [6]

predict, then optimize

[Elmachtoub and Grigas, 2021] Adam N. Elmachtoub and Paul Grigas. Smart “predict, then optimize”.Manage. Sci., 68(1):9–26, January

2021

[7] [7]

Elmachtoub, Henry Lam, Haofeng Zhang, and Yunfan Zhao

[Elmachtoubet al., 2023 ] Adam N. Elmachtoub, Henry Lam, Haofeng Zhang, and Yunfan Zhao. Estimate-then- optimize versus integrated-estimation-optimization versus sample average approximation: A stochastic dominance perspective,

2023

[8] [8]

and Lam, Henry and Zhang, Haofeng and Zhao, Yunfan , date =

eprint: 2304.06833. [Gouldet al., 2016 ] Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, and Edison Guo. On differentiating parameterized argmin and argmax problems with application to bi-level optimization.arXiv preprint arXiv:1607.05447,

work page arXiv 2016

[9] [9]

Lagrangean decomposition: A model yielding stronger lagrangean bounds.Mathematical programming, 39(2):215–228,

[Guignard and Kim, 1987] Monique Guignard and Siwhan Kim. Lagrangean decomposition: A model yielding stronger lagrangean bounds.Mathematical programming, 39(2):215–228,

1987

[10] [10]

Gurobi Optimizer Reference Manual,

[Gurobi Optimization, LLC, 2024] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual,

2024

[11] [11]

General bounding mech- anism for constraint programs

[H`aet al., 2015 ] Minh Ho `ang H `a, Claude-Guy Quimper, and Louis-Martin Rousseau. General bounding mech- anism for constraint programs. In Gilles Pesant, edi- tor,Principles and Practice of Constraint Programming, pages 158–172. Springer International Publishing,

2015

[12] [12]

Smart predict-and-optimize for hard combina- torial optimization problems

[Mandiet al., 2020 ] Jayanta Mandi, Peter J Stuckey, Tias Guns, et al. Smart predict-and-optimize for hard combina- torial optimization problems. InProceedings of the AAAI conference on artificial intelligence, volume 34, pages 1603–1610,

2020

[13] [13]

Decision-focused learning: Foundations, state of the art, benchmark and future op- portunities.Journal of Artificial Intelligence Research, 80:1623–1701,

[Mandiet al., 2024 ] Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, and Ferdinando Fioretto. Decision-focused learning: Foundations, state of the art, benchmark and future op- portunities.Journal of Artificial Intelligence Research, 80:1623–1701,

2024

[14] [14]

Implicit MLE: backpropagating through discrete exponential family distributions

[Niepertet al., 2021 ] Mathias Niepert, Pasquale Minervini, and Luca Franceschi. Implicit MLE: backpropagating through discrete exponential family distributions. InPro- ceedings of the 35th International Conference on Neural Information Processing Systems, NIPS ’21. Curran Asso- ciates Inc.,

2021

[15] [15]

Learning lagrangian multipliers for the travelling sales- man problem

[Parjadiset al., 2024 ] Augustin Parjadis, Quentin Cappart, Bistra Dilkina, Aaron Ferber, and Louis-Martin Rousseau. Learning lagrangian multipliers for the travelling sales- man problem. In30th International Conference on Princi- ples and Practice of Constraint Programming (CP 2024), pages 22–1. Schloss Dagstuhl–Leibniz-Zentrum f¨ur Infor- matik,

2024

[16] [16]

Pytorch: An imperative style, high- performance deep learning library

[Paszkeet al., 2019 ] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An impera...

2019

[17] [17]

[Shor, 2012] Naum Zuselevich Shor.Minimization methods for non-differentiable functions, volume

2012

[18] [18]

[Tang and Khalil, 2024b] Bo Tang and Elias B Khalil

Springer Nature Switzerland. [Tang and Khalil, 2024b] Bo Tang and Elias B Khalil. PyEPO: a PyTorch-based end-to-end predict-then- optimize library for linear and integer programming. Mathematical Programming Computation, 16(3):297– 335, 2024

2024