Decision-Focused Learning via Tangent-Space Projection of Prediction Error
Pith reviewed 2026-05-09 14:14 UTC · model grok-4.3
The pith
The regret gradient equals the prediction error projected onto the tangent space of active constraints and scaled by local curvature.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under standard regularity with locally stable active constraints, the regret gradient is equivalent to the prediction error projected onto the tangent space of the active constraints and scaled by local curvature. This characterization shows that regret gradients arise by filtering decision-irrelevant components from the mean-squared-error gradient, yielding a closed-form expression that can be evaluated by solving a reduced linear system over the active constraints alone.
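In the special case of an equality-constrained quadratic program, where the equalities stand in for the locally stable active set, the claimed characterization can be derived in a few lines (notation ours, not the paper's):

```latex
% Stand-in problem: z^*(\hat c) = \arg\min_z \tfrac12 z^\top Q z - \hat c^\top z
% subject to A z = b, with A the active-constraint Jacobian and Q \succ 0 the
% local curvature. The KKT system yields the decision sensitivity
\[
  \frac{\partial z^*}{\partial \hat c}
  \;=\; S \;:=\; Q^{-1} - Q^{-1} A^\top \bigl(A Q^{-1} A^\top\bigr)^{-1} A Q^{-1},
  \qquad A S = 0,
\]
% so S maps into the tangent space \{d : A d = 0\}. For the regret
% R(\hat c) = f(z^*(\hat c); c) - f(z^*(c); c), the chain rule and the
% stationarity condition Q z^*(\hat c) - \hat c + A^\top \lambda = 0 give
\[
  \nabla_{\hat c} R
  \;=\; S^\top \bigl(Q z^*(\hat c) - c\bigr)
  \;=\; S \bigl((\hat c - c) - A^\top \lambda\bigr)
  \;=\; S\,(\hat c - c),
\]
% since S is symmetric and S A^\top = (A S)^\top = 0: the prediction error,
% projected onto the tangent space and scaled by the inverse curvature.
```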
What carries the argument
The tangent-space projection of prediction error onto active constraints, which isolates the decision-relevant component of the gradient without requiring differentiation through the solver.
If this is right
- Regret gradients become computable from a single reduced linear system whose size depends only on the number of active constraints.
- No differentiation through iterative solver steps or extra optimization solves is required to obtain the gradient.
- The resulting training procedure improves downstream decision quality on linear and quadratic programs while using less compute than prior methods.
- Performance advantages remain when the constraint set shifts after the predictor has been trained.
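On the equality-constrained-QP stand-in, the first two bullets can be sketched concretely. This is a minimal reconstruction under the stated assumptions, not the paper's implementation; `projected_error_gradient` is a hypothetical name:

```python
import numpy as np

def projected_error_gradient(Q, A, c_hat, c_true):
    """Regret gradient as one reduced linear solve (a sketch, assuming an
    equality-constrained QP  min 1/2 z'Qz - c'z  s.t.  Az = b  stands in
    for the active set). Solving the KKT-structured system
        [Q  A'] [g ]   [c_hat - c_true]
        [A  0 ] [mu] = [      0       ]
    returns g = S (c_hat - c_true): the prediction error projected onto
    the tangent space {d : Ad = 0} and scaled by the inverse curvature.
    No differentiation through solver iterations is involved."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([c_hat - c_true, np.zeros(m)])
    return np.linalg.solve(K, rhs)[:n]
```

The system size is the decision dimension plus the number of active constraints only; for an inequality-constrained problem one would first identify the active rows of the constraint matrix and pass those as `A`.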
Where Pith is reading between the lines
- The same projection view could supply cheap gradient estimates for any constrained decision problem whose active set can be identified reliably.
- Implementation in existing optimization libraries would require only the Jacobian of the active constraints and a single linear solve rather than custom autodiff rules.
- When predictions are noisy, the filtering step may also reduce the variance of the effective gradient seen by the predictor.
Load-bearing premise
The set of active constraints at the optimal decision stays locally stable under small changes to the predictions.
What would settle it
If finite-difference estimates of the true regret gradient on a problem with fixed active constraints diverge from the value produced by the projection formula, the claimed geometric characterization is false.
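That falsification test is cheap to run on the QP stand-in. The script below is our construction, not the paper's benchmark: it compares central finite differences of the true regret against the single-solve projection formula on a problem with a fixed active set.

```python
import numpy as np

def solve_eq_qp(Q, A, b, c):
    """Optimal z for  min 1/2 z'Qz - c'z  s.t.  Az = b,  via the KKT system."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    return np.linalg.solve(K, np.concatenate([c, b]))[:n]

def regret(Q, A, b, c_hat, c_true):
    """Decision loss of acting on c_hat, minus the loss of acting on c_true."""
    f = lambda z: 0.5 * z @ Q @ z - c_true @ z
    return f(solve_eq_qp(Q, A, b, c_hat)) - f(solve_eq_qp(Q, A, b, c_true))

rng = np.random.default_rng(0)
n, m = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                      # local curvature, positive definite
A = rng.standard_normal((m, n))                  # fixed "active" constraints
b = rng.standard_normal(m)
c_true = rng.standard_normal(n)
c_hat = c_true + 0.3 * rng.standard_normal(n)    # noisy prediction

# Projection formula: one KKT solve with the prediction error on the right-hand side.
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([c_hat - c_true, np.zeros(m)])
g_proj = np.linalg.solve(K, rhs)[:n]

# Central finite differences of the true regret.
eps = 1e-6
g_fd = np.zeros(n)
for i in range(n):
    d = np.zeros(n)
    d[i] = eps
    g_fd[i] = (regret(Q, A, b, c_hat + d, c_true)
               - regret(Q, A, b, c_hat - d, c_true)) / (2 * eps)

print(np.max(np.abs(g_fd - g_proj)))  # near zero iff the characterization holds here
```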
Original abstract
Decision-Focused Learning (DFL) trains predictors to improve downstream decision quality, but computing regret gradients typically requires differentiating through solvers or relying on surrogate losses, which can be computationally expensive or deviate from the true objective. We show that, under standard regularity with locally stable active constraints, the regret gradient admits a closed-form geometric characterization, equivalent to the prediction error projected onto the tangent space of active constraints, scaled by local curvature. This reveals that regret gradients can be obtained by filtering decision-irrelevant components from the MSE gradient, providing a simpler and more direct alternative to existing approaches. Based on this, we propose PEAR (Projected Error As Regret-gradient), which computes regret gradients via a reduced linear system over active constraints, avoiding differentiation through solver iterations or additional optimization solves. Experiments on LP benchmarks and a real-world QP task show that PEAR achieves the best decision quality among all baselines while being the most computationally efficient, with gains that persist under constraint shifts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that, under standard regularity conditions with locally stable active constraints, the regret gradient in decision-focused learning admits a closed-form geometric characterization: it equals the prediction error projected onto the tangent space of the active constraints and scaled by local curvature. This insight yields the PEAR method, which obtains the gradient by solving a reduced linear system over the active constraints, avoiding differentiation through solvers or surrogate losses. Experiments on LP benchmarks and a real-world QP task report that PEAR achieves the best decision quality among baselines while being the most computationally efficient, with gains persisting under constraint shifts.
Significance. If the characterization holds, the work offers a computationally lighter and geometrically interpretable route to regret gradients in DFL, potentially making optimization-aware training more scalable. The parameter-free derivation from tangent-space geometry and active-set stability is a clear strength, as is the emphasis on efficiency without additional optimization solves. The reported outperformance on LP and QP tasks suggests practical value, provided the stability assumption is verified.
major comments (2)
- [Abstract] The claim that 'performance gains persist under constraint shifts' is presented without any reported diagnostic (e.g., active-set monitoring or basis-change counts) confirming that the active constraints remained locally stable during those shifts or during the finite-difference validations of the closed-form expression.
- [Experiments] Experimental section (LP benchmarks): the central equivalence requires locally stable active sets, yet the manuscript provides no evidence that basis changes were absent under the small parameter perturbations used in the benchmarks; polyhedral LPs are known to be sensitive to such flips, which would invalidate the tangent-space projection.
minor comments (1)
- The regularity conditions (e.g., strict complementarity or non-degeneracy) invoked for the projection formula could be stated more explicitly with a short list of required assumptions.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and for emphasizing the need to verify local stability of active constraints, which underpins the tangent-space characterization of regret gradients. We address each major comment below and will incorporate the suggested diagnostics in the revised manuscript.
Point-by-point responses
Referee: [Abstract] The claim that 'performance gains persist under constraint shifts' is presented without any reported diagnostic (e.g., active-set monitoring or basis-change counts) confirming that the active constraints remained locally stable during those shifts or during the finite-difference validations of the closed-form expression.
Authors: We appreciate this point. The manuscript's theoretical results are derived under the explicit assumption of locally stable active constraints (as stated in the introduction and method sections). The abstract claim regarding persistence under constraint shifts was intended as an empirical observation, but we agree that it should be supported by diagnostics. In revision, we will add active-set monitoring (including basis-change counts) for the constraint-shift experiments and finite-difference validations, and we will revise the abstract to qualify the claim accordingly. These diagnostics will show whether the tangent-space projection remains applicable in the reported regimes. revision: yes
Referee: [Experiments] Experimental section (LP benchmarks): the central equivalence requires locally stable active sets, yet the manuscript provides no evidence that basis changes were absent under the small parameter perturbations used in the benchmarks; polyhedral LPs are known to be sensitive to such flips, which would invalidate the tangent-space projection.
Authors: We agree that the sensitivity of polyhedral LPs to basis flips is a valid concern and that explicit verification is needed to support the central equivalence. In the revised experimental section, we will include new analysis reporting the frequency of basis changes (or their absence) under the small perturbations used for both the LP benchmarks and the finite-difference gradient validations. This will directly address whether the active sets remained locally stable, thereby validating the applicability of the projected-error characterization. If any flips are observed, their impact will be discussed. revision: yes
Circularity Check
No significant circularity; derivation rests on geometric assumptions rather than self-reference or fitted inputs
full rationale
The abstract presents the regret gradient as a closed-form geometric characterization (prediction error projected onto tangent space of active constraints, scaled by local curvature) derived under standard regularity and locally stable active constraints. No equations or steps in the provided text reduce the claimed result to a fitted parameter renamed as prediction, a self-definitional loop, or a load-bearing self-citation chain. The derivation is framed as following from properties of the feasible set and active-set stability, which are external to the paper's own outputs. This matches the default expectation of a self-contained derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: standard regularity with locally stable active constraints.