Pith · machine review for the scientific record

arxiv: 2605.12693 · v1 · submitted 2026-05-12 · 💻 cs.LG

Recognition: 2 Lean theorem links

IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback

Benjamin Amoh, Geoffrey G. Parker, Wesley Marrero

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:00 UTC · model grok-4.3

classification 💻 cs.LG
keywords decision-focused learning · delayed feedback · bilevel optimization · online mirror descent · regret bounds · hypergradients · gradient transport

The pith

IGT-OMD corrects gradient staleness in delayed bilevel optimization by re-evaluating stale gradients at current parameters using stored inner solutions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that decision-focused learning in online settings with delayed feedback suffers from staleness amplification, where delays in gradient information interact with the sensitivity of the inner optimization solver to produce worse regret than single-level delayed methods predict. It introduces IGT-OMD, which applies implicit gradient transport inside online mirror descent to fix this by recomputing stale hypergradients at the latest parameters. This change converts the transport error term from quadratic dependence on delay to linear dependence. A sympathetic reader would care because many sequential decision problems involve feedback that arrives only after multiple steps, and without such a correction the end-to-end training approach quickly becomes unusable.

Core claim

IGT-OMD applies Implicit Gradient Transport to hypergradients within Online Mirror Descent by re-evaluating stale gradients at the current parameters using stored inner solutions. This reduces transport error from a quadratic to a linear dependence on delay and produces the first sublinear regret bound for delayed bilevel optimization that holds with queue-length-adaptive step sizes.

What carries the argument

Implicit Gradient Transport applied to hypergradients, which recomputes stale gradients at updated outer parameters using previously stored inner solutions to cancel the effect of delay.
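The mechanism can be sketched in a few lines. Below is a minimal schematic on a toy quadratic bilevel problem, assuming exact inner solutions, a fixed delay, and a Euclidean mirror map; the function names and the problem itself are illustrative, not the paper's implementation.

```python
# Minimal schematic of the IGT-OMD idea on a toy bilevel problem.
# All names and the problem are illustrative, not the paper's code.
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 0.5]])
LAM = 0.1  # outer regularization weight (illustrative)

def inner_solve(theta):
    # Toy inner problem with a closed-form solution: w*(theta) = A @ theta.
    return A @ theta

def hypergrad(theta, w):
    # Hypergradient of the toy outer loss F(theta) = 0.5*||w*(theta)||^2
    # + 0.5*LAM*||theta||^2, evaluated with a (possibly stored) inner solution w.
    return A.T @ w + LAM * theta

def igt_omd(T=100, delay=5, eta=0.02):
    theta = np.array([1.0, -1.0])
    stored = {}  # round -> inner solution computed at that round
    for t in range(T):
        stored[t] = inner_solve(theta)
        s = t - delay  # round whose feedback arrives now
        if s in stored:
            w_stale = stored.pop(s)
            # Implicit gradient transport: the stale hypergradient is
            # re-evaluated at the CURRENT theta using the stored inner
            # solution, instead of replaying the gradient from round s.
            g = hypergrad(theta, w_stale)
            theta = theta - eta * g  # Euclidean mirror-descent step
    return theta
```

With the transport step, the iterate still contracts toward the optimum despite the delay; replacing `hypergrad(theta, w_stale)` with a gradient frozen at round s would reintroduce the staleness error the paper analyzes.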

If this is right

  • Any black-box delayed optimizer pays an irreducible extra regret cost coming from inner-solver approximation error.
  • Without the bilevel-aware correction, gradient staleness alone produces quadratically growing transport error.
  • The benefit of the transport correction is exactly zero at unit delay (p = 1.00) and rises monotonically to 9.5 percent at fifty rounds of delay (p < 0.001).
  • On Linear Quadratic Regulator, Warcraft shortest-path, and Sinkhorn optimal transport tasks, decision loss drops 17 to 55 percent relative to single-level baselines.
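The contrast between the second and third bullets can be sketched with a standard smoothness argument; the symbols below (smoothness constant L, gradient bound G, step size η, delay σ) are generic, not taken from the paper. A gradient computed at θ_{t−σ} but applied at round t is off by at most the parameter drift over the delay window:

```latex
% Informal sketch with generic constants; not the paper's exact bound.
\underbrace{\|\nabla F(\theta_{t-\sigma}) - \nabla F(\theta_t)\|}_{\text{staleness error}}
\;\le\; L\,\|\theta_t - \theta_{t-\sigma}\|
\;\le\; L \sum_{s=t-\sigma}^{t-1} \|\theta_{s+1} - \theta_s\|
\;\le\; L\,\eta\,G\,\sigma .
```

Since each of the σ outstanding gradients in the queue carries such an O(σ) drift error, the uncorrected transport term accumulates as O(σ²) per delay window; re-evaluating at θ_t with the stored inner solution removes the drift factor, which is the quadratic-to-linear reduction the paper claims.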

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Storing and re-using inner solutions may be a practical way to handle delays in other online bilevel settings beyond decision-focused learning.
  • The observed phase transitions suggest that the correction becomes worthwhile only after a certain delay threshold, which could be used to decide when to activate it in variable-delay systems.
  • Similar re-evaluation tricks might be testable in non-convex or non-smooth bilevel problems where current regret proofs do not apply.

Load-bearing premise

Inner solutions can be stored and re-evaluated at current parameters with negligible extra cost, and the bilevel problem satisfies the smoothness and convexity conditions needed for the regret analysis.

What would settle it

An experiment in which IGT-OMD exhibits linear or worse regret growth with increasing delay, or in which the measured transport error remains quadratic after the correction is applied, would show the central claim is false.

Figures

Figures reproduced from arXiv: 2605.12693 by Benjamin Amoh, Geoffrey G. Parker, Wesley Marrero.

Figure 1
Figure 1. Transport error scaling validates Theorem 2(c). Left: log-log regression of cumulative R3 (slope ≈ 1) and Rsq (slope ≈ 2) against σmax, confirming linear vs. quadratic scaling. Right: the ratio Rsq/R3 tracks σmax with near-perfect agreement across all algorithms.
Figure 2
Figure 2. Regret and transport metrics as a function of K (inner solver iterations): cumulative regret vs. inner-solver quality (d = 20) for Adam+IGT and Stale OMD, and the Adam+IGT improvement over Stale OMD across K (7.5–8.6%).
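The slope check reported in Figure 1 can be mimicked on synthetic data. The constants and noise level below are made up; only the log-log regression method mirrors the figure's analysis.

```python
# Hypothetical reproduction of the linear-vs-quadratic slope check on
# synthetic data; the paper's measured transport errors are not used here.
import numpy as np

rng = np.random.default_rng(0)
delays = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])

# Synthetic stand-ins: corrected error ~ c1 * delay, uncorrected ~ c2 * delay^2,
# each with small multiplicative noise.
r3 = 0.3 * delays * np.exp(rng.normal(0.0, 0.02, delays.size))
rsq = 0.1 * delays**2 * np.exp(rng.normal(0.0, 0.02, delays.size))

# Log-log regression: the fitted slope estimates the scaling exponent.
slope_r3 = np.polyfit(np.log(delays), np.log(r3), 1)[0]
slope_rsq = np.polyfit(np.log(delays), np.log(rsq), 1)[0]
print(slope_r3, slope_rsq)  # slopes near 1 and 2, respectively
```

If IGT-OMD's correction works as claimed, the measured transport error of the corrected algorithm should regress to a slope near 1 and the uncorrected one to a slope near 2, as in the figure.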
Original abstract

Decision-focused learning trains predictive models end-to-end against downstream decision loss, but online settings suffer delayed feedback: outcomes may not arrive for many environment interactions. We identify staleness amplification, a failure mode unique to bilevel optimization under delay, in which gradient staleness couples with inner-solver sensitivity to inflate regret beyond single-level delay theory. We prove that any black-box delayed optimizer incurs an irreducible regret cost from inner-solver approximation error, and that gradient staleness contributes a quadratically growing transport error without bilevel-aware correction. Our algorithm, IGT-OMD, applies Implicit Gradient Transport to hypergradients within Online Mirror Descent, re-evaluating stale gradients at the current parameters using stored inner solutions. This method reduces transport error from a quadratic to a linear dependence on delay and achieves the first sublinear regret bound for delayed bilevel optimization with queue-length-adaptive step sizes. Controlled experiments provide a mechanistic fingerprint: transport benefit is exactly 0.0% (p = 1.00) at unit delay and grows monotonically to 9.5% at fifty rounds (p < 0.001), isolating the correction's effect. On Linear Quadratic Regulator, Warcraft shortest-path, and Sinkhorn optimal transport, IGT-OMD reduces decision loss by 17–55% relative to single-level baselines, with phase transitions matching the theory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces IGT-OMD, an online mirror descent algorithm that applies implicit gradient transport to hypergradients for decision-focused learning under delayed feedback in bilevel optimization. It identifies staleness amplification as a bilevel-specific failure mode, proves an irreducible regret cost for any black-box delayed optimizer arising from inner-solver approximation error, and shows that uncorrected gradient staleness produces quadratic transport error. By storing inner solutions and re-evaluating stale gradients at current parameters, IGT-OMD reduces transport error to linear dependence on delay, yielding the first sublinear regret bound under queue-length-adaptive step sizes. Controlled experiments isolate a monotonic transport benefit (0% at unit delay to 9.5% at 50 rounds) and report 17-55% decision-loss reductions on LQR, Warcraft shortest-path, and Sinkhorn optimal transport tasks.

Significance. If the stated smoothness, convexity, and exact-re-evaluation assumptions hold, the result is significant: it supplies the first sublinear regret guarantee for delayed bilevel decision-focused learning and isolates the unique cost of staleness amplification. The mechanistic fingerprint (transport benefit exactly 0% at unit delay, p=1.00, rising monotonically) and the use of stored inner solutions for exact re-evaluation are concrete strengths that make the contribution falsifiable and reproducible. The practical gains on standard benchmarks suggest immediate applicability to delayed-feedback settings common in online decision making.

major comments (2)
  1. [regret analysis] The regret analysis (abstract and §4): the sublinear bound is stated to follow from linear transport error plus queue-length-adaptive steps, but the derivation of the linear error term is not shown to be independent of the inner-solver tolerance; if the stored inner solutions are only approximate, the claimed reduction from quadratic to linear may not hold.
  2. [irreducible regret theorem] Theorem on irreducible regret cost: the proof that black-box methods incur an additive cost from inner-solver error is load-bearing for the claim that IGT-OMD is necessary; the argument should explicitly quantify how this cost scales with delay under the same smoothness assumptions used for the IGT-OMD bound.
minor comments (2)
  1. [experiments] The experimental section reports p<0.001 for the 9.5% benefit at 50 rounds but does not state the exact statistical test or correction for multiple comparisons.
  2. [algorithm] Notation for the queue-length-adaptive step size is introduced without an explicit formula in the main text; a displayed equation would improve readability.
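For readers unfamiliar with the term, a typical queue-length-adaptive step size in the delayed online convex optimization literature (e.g. Joulani et al., reference [4]; Quanrud and Khashabi, reference [6]) shrinks with the number of gradients still in flight. The form below is a generic illustration of the idea, not necessarily the paper's choice:

```latex
% Generic delayed-OCO form; q_s = number of gradients still outstanding
% (the queue length) at round s. Illustrative, not the paper's definition.
\eta_t \;=\; \frac{D}{G\,\sqrt{\,t + \sum_{s=1}^{t} q_s\,}}
```

Here D bounds the domain diameter and G the gradient norms; the sum of the queue lengths recovers the cumulative delay, which is what makes the resulting regret bound adapt to the actual delays observed.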

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on the regret analysis and the irreducible regret theorem. We address each point below and will make the suggested revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [regret analysis] The regret analysis (abstract and §4): the sublinear bound is stated to follow from linear transport error plus queue-length-adaptive steps, but the derivation of the linear error term is not shown to be independent of the inner-solver tolerance; if the stored inner solutions are only approximate, the claimed reduction from quadratic to linear may not hold.

    Authors: We clarify that our analysis assumes exact inner solutions, as stated in the setup for IGT-OMD where we store and re-evaluate using the exact solutions obtained during the inner optimization. This ensures the transport error is linear in the delay and independent of the inner-solver tolerance. To address the concern, we will revise Section 4 to explicitly derive the linear error term, highlighting the independence under the exact re-evaluation assumption. If inner solutions are approximate, an additional error term would appear, but our contribution focuses on the exact case to isolate the staleness amplification effect. revision: yes

  2. Referee: [irreducible regret theorem] Theorem on irreducible regret cost: the proof that black-box methods incur an additive cost from inner-solver error is load-bearing for the claim that IGT-OMD is necessary; the argument should explicitly quantify how this cost scales with delay under the same smoothness assumptions used for the IGT-OMD bound.

    Authors: We agree that quantifying the scaling is important for the necessity claim. We will revise the proof of the irreducible regret theorem to explicitly show how the additive cost from inner-solver approximation error scales with the delay length, using the same smoothness and Lipschitz constants as in the IGT-OMD regret bound. This will demonstrate that the cost is irreducible and grows with delay, underscoring why bilevel-aware correction like IGT-OMD is needed. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The central regret bound for IGT-OMD is derived directly from the algorithm definition: storing inner solutions enables exact re-evaluation of stale hypergradients under Online Mirror Descent, converting quadratic transport error to linear dependence on delay. This follows from the stated smoothness/convexity assumptions and queue-length-adaptive steps without any reduction to fitted inputs, self-definitional loops, or load-bearing self-citations. The black-box lower bound and mechanistic fingerprint experiments are independent of the bound itself and do not rename or smuggle prior results. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard smoothness and strong-convexity assumptions for the inner and outer problems plus the ability to store and re-evaluate inner solutions; no new free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Inner and outer objectives satisfy standard smoothness and convexity conditions required for online mirror descent regret bounds.
    Typical background assumption for deriving sublinear regret in bilevel online optimization.

pith-pipeline@v0.9.0 · 5563 in / 1231 out tokens · 42628 ms · 2026-05-14T21:00:32.254926+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

43 extracted references · 18 canonical work pages · 4 internal anchors

  1. [1] Adam N. Elmachtoub and Paul Grigas. Smart "Predict, then Optimize". Management Science, 68(1):9–26, 2022. doi: 10.1287/mnsc.2020.3922
  2. [2] Bryan Wilder, Bistra Dilkina, and Milind Tambe. Melding the data-decisions pipeline: Decision-focused learning for combinatorial optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33(01), pages 1658–1665, 2019. doi: 10.1609/aaai.v33i01.33011658
  3. [3] Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, and Ferdinando Fioretto. Decision-focused learning: Foundations, state of the art, benchmark and future opportunities. Journal of Artificial Intelligence Research, 81:1623–1701, 2024
  4. [4] Pooria Joulani, András György, and Csaba Szepesvári. Online learning under delayed feedback. In International Conference on Machine Learning, pages 1453–1461. PMLR, 2013
  5. [5] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012
  6. [6] Kent Quanrud and Daniel Khashabi. Online learning with adversarial delays. In Advances in Neural Information Processing Systems, volume 28, pages 1270–1278. Curran Associates, Inc., 2015. URL https://neurips.cc
  8. [8] Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, and Nicolas Le Roux. Reducing the variance in online optimization by transporting past gradients. In Advances in Neural Information Processing Systems, volume 32, 2019
  9. [9] Siyuan Yu, Wei Chen, and H. Vincent Poor. Distributed stochastic gradient descent with staleness: A stochastic delay differential equation based framework. IEEE Transactions on Signal Processing, 2025. Accepted. arXiv:2406.11159
  10. [10] Marin Vlastelica, Anselm Paulus, Vít Musil, Georg Martius, and Michal Rolínek. Differentiation of blackbox combinatorial solvers. In International Conference on Learning Representations, 2020
  11. [11] Bo Tang and Elias B. Khalil. PyEPO: A PyTorch-based end-to-end predict-then-optimize library for linear and integer programming. arXiv preprint arXiv:2206.14234, 2022. doi: 10.48550/arXiv.2206.14234
  12. [12] Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, and Alain Durmus. Online decision-focused learning. arXiv preprint arXiv:2505.13564, 2025
  13. [13] Evgenii Nikishin, Remo Abachi, Rishabh Agarwal, and Pierre-Luc Bacon. Control-oriented model-based reinforcement learning with implicit differentiation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36(7), pages 7886–7894, 2022. doi: 10.1609/aaai.v36i7.20758
  14. [14] Genevieve Flaspohler, Francesco Orabona, Judah Cohen, Soukayna Mouatadid, Miruna Oprescu, Paulo Orenstein, and Lester Mackey. Online learning with optimism and delay. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 3363–3373. PMLR, 2021
  15. [15] Jiatai Huang, Yan Dai, and Longbo Huang. Banker online mirror descent: A universal approach for delayed online bandit learning. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 13818–13847. PMLR, 2023
  16. [16] Alexander Ryabchenko, Idan Attias, and Daniel M. Roy. A reduction from delayed to immediate feedback for online convex optimization with improved guarantees. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40(01), pages 1–9, 2026. URL https://arxiv.org/abs/2602.02634
  17. [17] Hao Qiu, Mengxiao Zhang, and Juliette Achddou. Decentralized online convex optimization with unknown feedback delays. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40(01), pages 1–9, 2026. doi: 10.1609/aaai.v40i30.39688
  18. [18] Saeed Ghadimi and Mengdi Wang. Approximation methods for bilevel programming. arXiv preprint arXiv:1802.02246, 2018. URL https://arxiv.org/abs/1802.02246
  19. [19] Kaiyi Ji, Junjie Yang, and Yingbin Liang. Bilevel optimization: Convergence analysis and enhanced design. In International Conference on Machine Learning, pages 4882–4892. PMLR, 2021. doi: 10.48550/arxiv.2010.07962
  21. [21] Davoud Ataee Tarzanagh, Parvin Nazari, Bojian Hou, Li Shen, and Laura Balzano. Online bilevel optimization: Regret analysis of online alternating gradient methods. In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics, volume 238 of Proceedings of Machine Learning Research, pages 2854–2862. PMLR, 2024
  22. [22] Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, and Ness Shroff. Non-convex bilevel optimization with time-varying objective functions. In Advances in Neural Information Processing Systems, volume 36, pages 12658–12686. Curran Associates, Inc., 2023
  23. [23] Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, and George Michailidis. Stochastic regret guarantees for online zeroth- and first-order bilevel optimization. In Advances in Neural Information Processing Systems, volume 38. Curran Associates, Inc., 2025. URL https://arxiv.org/abs/2511.01126
  24. [24] Asen L. Dontchev and R. Tyrrell Rockafellar. Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer Monographs in Mathematics. Springer New York, 2nd edition, 2014. doi: 10.1007/978-1-4939-1037-3
  25. [25] Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952. doi: 10.6028/jres.049.044
  26. [26] J. Michael Steele. The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. Cambridge University Press, Cambridge, UK, 2004. doi: 10.1017/cbo9780511817106
  27. [27] Arkadi Nemirovski, Anatoli Juditsky, Guanghui Lan, and Alexander Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009. doi: 10.1137/070704277
  28. [28] Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200–217, 1967
  29. [29] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015
  30. [30] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, 3rd edition
  31. [31] Amir Beck and Marc Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003. doi: 10.1016/s0167-6377(02)00231-6
  32. [32] V. Kolmanovskii and A. Myshkis. Stability of Stochastic Functional Differential Equations, volume 463 of Mathematics and Its Applications, pages 387–437. Springer Netherlands, Dordrecht, 1999. doi: 10.1007/978-94-017-1965-0_10
  33. [33] B. G. Pachpatte. Inequalities for Differential and Integral Equations, volume 197 of Mathematics in Science and Engineering. Academic Press, San Diego, CA, 1998
  34. [34] Arieh Iserles. A First Course in the Numerical Analysis of Differential Equations. Cambridge Texts in Applied Mathematics. Cambridge University Press, 2nd edition, 2009. doi: 10.1017/CBO9780511995569
  35. [35] Eliahu I. Jury. Theory and Application of the z-Transform Method. John Wiley & Sons, New York, NY, USA, 1964
  36. [36] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, volume 32, pages 8024–8035. Curran Associates, Inc., 2019
  37. [37] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016

Internal anchors

  38. [38] Lower-bound proof (Part (b)): summing from t0 = O(σmax), Σ_{t=1}^{T} [F(θ_t) − F(θ*)] ≥ Σ_{t=t0}^{T} C0²θ_t²/2 ≥ (T − t0)·(C0²/2)·(ϵ_inner²/(4C0²)) = (T − t0)·ϵ_inner²/8 (Eq. 29); for T ≫ σmax (i.e. T − t0 ≥ T/2), this gives Regret_T(A) ≥ Ω(T·ϵ_inner²). The lower bound holds for any black-box delayed optimizer.
  39. [39] No step-size choice eliminates this bias: the constant bias C0·ϵ_inner in Eq. (25) is independent of η, σmax, and the algorithm's iterate sequence.
  40. [40] Fixed precision floor: any stable algorithm converges to θ_ss = −ϵ_inner/C0, incurring F(θ_ss) = ϵ_inner²/2 per round; the bound requires ϵ_inner to remain bounded away from zero uniformly over all rounds t (Assumption 3). If the inner solver were warm-started so that ϵ_t → 0, the steady-state offset would vanish and the Ω(T) lower bound would collapse.
  41. [41] An unstable algorithm (η > 1/(2C0²σmax)) has |θ_t| → ∞, clipped at B0 since Θ = [−B0, B0] is compact, incurring F(θ_t) ≥ C0²B0²/2 per round, which is strictly worse.
  42. [42] Inner-loop apathy: the adversary chooses θ1 = B0 ≠ 0, guaranteeing a non-trivial transient; the O(σmax) burn-in cost is absorbed into the Ω(T) rate for T ≫ σmax. The hard instance satisfies Assumptions 1 and 2 with μ_w = L_w and L_θ = a² < ∞, completing Part (b).
    Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...