
arxiv: 2606.13605 · v1 · pith:J2PA43W6new · submitted 2026-06-11 · 🧮 math.OC · cs.LG · cs.SY · eess.SY

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Pith reviewed 2026-06-27 05:40 UTC · model grok-4.3

classification 🧮 math.OC · cs.LG · cs.SY · eess.SY
keywords robust trajectory optimization · chance-constrained reinforcement learning · distribution-agnostic · probabilistic feasibility · affine correction law · spacecraft guidance · Earth-Mars transfer · rocket landing

The pith

Chance-constrained reinforcement learning robustifies nominal spacecraft trajectories using only samples of uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that first computes a deterministic nominal trajectory offline and then applies reinforcement learning to learn a correction policy that keeps the actual path feasible with high probability under sampled uncertainties. The correction takes the form of an affine closed-loop law, and chance constraints are handled by checking upper-tail quantiles from many simulated rollouts rather than by assuming any specific probability distribution. The method is shown on an Earth-Mars transfer and on a rocket landing problem, remaining competitive with existing robust optimizers on fuel cost while preserving feasibility even when the uncertainty changes after training. A sympathetic reader would care because most real spacecraft uncertainties are known only through samples and are difficult to capture with standard parametric distributions.

Core claim

The framework combines an offline nominal trajectory with a learned affine closed-loop correction law comprising feedforward adjustments and time-varying feedback gains. Probabilistic feasibility is enforced through empirical upper-tail quantiles computed from rollouts, together with covariance penalties on terminal dispersion. The result is distribution-agnostic robust trajectory optimization that stays competitive in upper-tail fuel cost and transfers without redesign across materially different problems, here a multi-impulse transfer and a continuous-thrust landing.
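The quantile-based feasibility check can be illustrated with a short sketch. It assumes a scalar constraint function g with g <= 0 meaning "feasible", evaluated once per rollout, and an allowed violation probability epsilon; the function name, the sample data, and the 5% level are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def upper_tail_quantile_check(g_samples, epsilon):
    """Empirical test of P(g <= 0) >= 1 - epsilon.

    The chance constraint is treated as satisfied when the empirical
    (1 - epsilon) quantile of the sampled constraint values is non-positive.
    No distributional form is assumed; only samples are used.
    """
    g = np.asarray(g_samples, dtype=float)
    q = float(np.quantile(g, 1.0 - epsilon))
    return q, q <= 0.0

# Hypothetical constraint values from two rollout ensembles (assumed data).
rng = np.random.default_rng(0)
g_safe = rng.normal(loc=-2.0, scale=1.0, size=10_000)   # comfortable margin
g_tight = rng.normal(loc=-1.0, scale=1.0, size=10_000)  # margin too small

q_safe, ok_safe = upper_tail_quantile_check(g_safe, epsilon=0.05)
q_tight, ok_tight = upper_tail_quantile_check(g_tight, epsilon=0.05)
print(ok_safe, ok_tight)
```

Because the check only reads samples, the same function applies unchanged whether the rollouts were drawn from a Gaussian, a bounded uniform, or an empirical data set.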

What carries the argument

The structured affine closed-loop correction law (feedforward adjustment plus time-varying feedback gains) that is learned by reinforcement learning and whose probabilistic feasibility is enforced via rollout-based upper-tail quantiles.
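One common way to write such a law is u_k = u_nom,k + du_k + K_k (x_k - x_nom,k), with du_k the feedforward adjustment and K_k the time-varying feedback gain. The sketch below applies it to a toy double integrator; the dynamics, the hand-picked gain, and all names are assumptions made for illustration, and the gains are not learned by reinforcement learning here.

```python
import numpy as np

def affine_correction(u_nom, du_ff, K, x, x_nom):
    """Nominal control plus feedforward adjustment plus state feedback."""
    return u_nom + du_ff + K @ (x - x_nom)

def rollout(A, B, x0, x_nom, u_nom, du_ff, K, noise):
    """Propagate the toy linear model x_{k+1} = A x_k + B u_k + w_k."""
    x = np.array(x0, dtype=float)
    states = [x]
    for k in range(len(u_nom)):
        u = affine_correction(u_nom[k], du_ff[k], K[k], x, x_nom[k])
        x = A @ x + B @ u + noise[k]
        states.append(x)
    return np.array(states)

# Toy setup (assumed): double integrator with unit time step, zero nominal.
N = 5
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
x_nom = np.zeros((N + 1, 2))
u_nom = np.zeros((N, 1))
du_ff = np.zeros((N, 1))
# Same gain at every node, chosen by hand so that (A + B K)^2 = 0.
K = np.tile(np.array([[-1.0, -1.5]]), (N, 1, 1))
noise = np.zeros((N, 2))

x0 = [1.0, 0.5]  # perturbed initial condition
closed_loop = rollout(A, B, x0, x_nom, u_nom, du_ff, K, noise)
open_loop = rollout(A, B, x0, x_nom, u_nom, du_ff, np.zeros_like(K), noise)
print(np.linalg.norm(closed_loop[-1]), np.linalg.norm(open_loop[-1]))
```

With feedback, the initial deviation is driven back to the nominal path; with the gains zeroed out, it drifts. In the paper's setting the entries of du_k and K_k would be the quantities the policy outputs.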

If this is right

  • Upper-tail fuel cost remains competitive with a recent robust optimization reference under Gaussian uncertainty.
  • Probabilistic feasibility is preserved when the same policy is tested under bounded uniform uncertainty.
  • The policy also maintains feasibility under process disturbances that were never seen during training.
  • The identical robustification structure applies to a short-horizon continuous-thrust landing problem with drag, mass depletion, and glide-slope constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The sample-only requirement would let mission planners substitute real flight telemetry directly for synthetic noise models.
  • The same scaffold could be tested on non-spacecraft problems such as autonomous vehicle path planning where uncertainty distributions are likewise unknown.
  • If the affine correction form proves insufficient for strongly nonlinear uncertainty propagation, the framework would need a richer policy class while keeping the quantile-based chance-constraint mechanism.

Load-bearing premise

Uncertainties can be represented solely through samples of initial conditions and process noise that the method is allowed to draw.

What would settle it

A set of Monte Carlo rollouts under a new uncertainty distribution, not used in training, in which the empirical fraction of trajectories violating the chance constraints exceeds the allowed probability level.
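A minimal sketch of that test, assuming each held-out rollout is reduced to a single constraint value g with g > 0 meaning "violated"; the function names and the sample arrays are hypothetical. A careful version would also account for sampling error in the estimated rate, which this sketch ignores.

```python
import numpy as np

def violation_rate(g_samples):
    """Fraction of rollouts whose constraint value is positive (violated)."""
    g = np.asarray(g_samples, dtype=float)
    return float(np.mean(g > 0.0))

def claim_refuted(g_samples, epsilon):
    """True when the empirical violation rate exceeds the allowed level."""
    return violation_rate(g_samples) > epsilon

# Hypothetical held-out ensembles of 100 rollouts each (assumed data).
g_mostly_ok = np.array([-1.0] * 97 + [0.5] * 3)   # 3% violations
g_too_many = np.array([-1.0] * 92 + [0.5] * 8)    # 8% violations

print(claim_refuted(g_mostly_ok, epsilon=0.05))
print(claim_refuted(g_too_many, epsilon=0.05))
```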

Figures

Figures reproduced from arXiv: 2606.13605 by Harry Holt, Marco Sagliano, Roberto Armellin, Yashdeep Chaudhary.

Figure 1: Closed-loop trajectory solution for the Earth–Mars transfer under Gaussian initial …
Figure 2: Control-effort statistics under Gaussian initial uncertainty. (a) Per-node impulse …
Figure 3: Evolution of ensemble dispersion along the trajectory for the Gaussian case. (a) …
Figure 4: Terminal-state error distributions under Gaussian initial uncertainty. (a) Histogram …
Figure 5: Closed-loop trajectory solution for the Earth–Mars transfer under uniform initial …
Figure 6: Control-effort statistics under uniform initial uncertainty. (a) Per-node impulse …
Figure 7: Evolution of ensemble dispersion along the trajectory for the uniform case. (a) …
Figure 8: Terminal-state error distributions under uniform initial uncertainty. (a) Histogram …
Figure 9: Robustness to two-dimensional variation in unmodeled process noise under Gaussian …
Figure 10: Trade-off between terminal dispersion q̂_0.95(e_r) and 95th-percentile total cost Δv_tot,0.95 over the diagonal sweep η_r = η_v = η. Marker style indicates feasibility and color encodes the process-noise scaling level. The process-noise study shows that the learned closed-loop structure provides meaningful robustness to moderate unmodeled disturbances without retraining, but with a finite robustness radius det…
Figure 11: Closed-loop descent trajectory for the rocket landing case under Gaussian initial …
Figure 12: Applied control and equivalent-Δv statistics for the rocket landing case under Gaussian initial uncertainty with nominal process noise enabled. Panels show: (a) node-wise distribution of applied acceleration magnitude ∥U∥ across the evaluation ensemble and (b) histogram of total Δv_eq over the MC rollout. The terminal targeting statistics are reported in …
Figure 13: Terminal targeting error distributions for the rocket landing case under Gaussian …
Figure 14: Closed-loop descent trajectory for the rocket landing case under bounded uniform …
Figure 15: Applied control and equivalent-Δv statistics for the rocket landing case under bounded uniform initial uncertainty with nominal process noise enabled. Panels show: (a) node-wise distribution of applied acceleration magnitude ∥U∥ across the evaluation ensemble and (b) histogram of total Δv_eq over the MC rollout.
Figure 16: Terminal targeting error distributions for the rocket landing case under bounded …
Original abstract

This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.
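The abstract does not spell out the form of the covariance-feasibility penalty. One plausible reading, sketched below purely as an assumption, penalizes the amount by which the sample covariance of terminal errors exceeds a target covariance in the positive-semidefinite sense. The function name, the target matrix, and the sample data are hypothetical.

```python
import numpy as np

def covariance_feasibility_penalty(terminal_errors, sigma_target):
    """Penalty that is zero when sample covariance <= target (PSD order).

    terminal_errors: array of shape (n_rollouts, n_states).
    sigma_target:    symmetric target covariance, shape (n_states, n_states).
    Returns the largest eigenvalue of (Sigma - Sigma_target), clipped at zero.
    """
    sigma = np.cov(np.asarray(terminal_errors, dtype=float), rowvar=False)
    lam_max = float(np.linalg.eigvalsh(sigma - sigma_target).max())
    return max(0.0, lam_max)

# Assumed terminal errors from four rollouts in a 2-D state space.
errors = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
sigma_target = np.eye(2)

print(covariance_feasibility_penalty(errors, sigma_target))        # tight ensemble
print(covariance_feasibility_penalty(3.0 * errors, sigma_target))  # wide ensemble
```

A penalty of this shape vanishes once the terminal ensemble fits inside the target dispersion, so it shapes the reward without rewarding dispersion tighter than required.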

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a distribution-agnostic robust trajectory optimization framework that first computes a deterministic nominal trajectory offline and then employs reinforcement learning to learn a structured affine closed-loop correction law (feedforward adjustment plus time-varying feedback gains) for robustness against sampled uncertainties in initial conditions and process noise. Probabilistic feasibility is enforced via rollout-based upper-tail quantiles and covariance-feasibility penalties. The method is evaluated on a 3D multi-impulse Earth-Mars transfer (benchmarked under Gaussian uncertainty and tested under uniform and unseen disturbances) and a stochastic atmospheric pinpoint rocket landing problem, with the central claim being competitive upper-tail fuel cost while preserving probabilistic feasibility and portability of the robustification scaffold across heterogeneous problems without core redesign.

Significance. If the empirical results hold with adequate quantitative validation, the work provides a portable sample-based robustification approach for chance-constrained trajectory optimization that requires only the ability to sample uncertainties, which could facilitate application to varied spacecraft planning tasks in aerospace control without strong distributional assumptions or problem-specific redesign.

major comments (2)
  1. [Abstract and § on numerical results] Abstract and results sections: The central claim of remaining 'competitive in upper-tail fuel cost while preserving probabilistic feasibility' is stated without any numerical metrics, training hyperparameters, rollout counts, or explicit feasibility rates in the abstract; the full manuscript must supply these quantitative comparisons (e.g., fuel cost values and violation probabilities versus the reference method) to substantiate the performance assertions on the Earth-Mars and landing cases.
  2. [Method section describing the correction law] Method description: The structured affine closed-loop correction law is presented as the key robustification mechanism, yet the manuscript provides no derivation or justification for why this particular affine form (rather than a more general policy) suffices to maintain the claimed portability and chance-constraint satisfaction across both impulsive and continuous-thrust problems.
minor comments (2)
  1. [Method] Notation for the covariance-feasibility penalties and quantile thresholds should be defined consistently with explicit symbols and units.
  2. [Numerical experiments] The description of the two case studies would benefit from a brief table summarizing problem dimensions, constraint types, and uncertainty sources for quick comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. Below we respond point by point to the major comments and indicate the changes planned for the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract and § on numerical results] Abstract and results sections: The central claim of remaining 'competitive in upper-tail fuel cost while preserving probabilistic feasibility' is stated without any numerical metrics, training hyperparameters, rollout counts, or explicit feasibility rates in the abstract; the full manuscript must supply these quantitative comparisons (e.g., fuel cost values and violation probabilities versus the reference method) to substantiate the performance assertions on the Earth-Mars and landing cases.

    Authors: We agree that the abstract and results sections would be strengthened by the inclusion of explicit numerical metrics. In the revised manuscript we will add quantitative comparisons, including upper-tail fuel costs, feasibility violation probabilities, training hyperparameters, and rollout counts, with direct numerical values versus the reference method for both the Earth-Mars and landing cases. revision: yes

  2. Referee: [Method section describing the correction law] Method description: The structured affine closed-loop correction law is presented as the key robustification mechanism, yet the manuscript provides no derivation or justification for why this particular affine form (rather than a more general policy) suffices to maintain the claimed portability and chance-constraint satisfaction across both impulsive and continuous-thrust problems.

    Authors: The affine structure (feedforward adjustment plus time-varying feedback gains) is selected to enable efficient learning via reinforcement learning while ensuring the same robustification scaffold remains portable across heterogeneous problems without core redesign. We acknowledge that the current manuscript does not provide an explicit derivation of why this form suffices for chance-constraint satisfaction. In the revised version we will add a concise justification in the method section explaining how the affine corrections, combined with rollout-based quantiles and covariance penalties, empirically enforce probabilistic feasibility across the tested impulsive and continuous-thrust settings. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper's central claim rests on an empirical sample-based procedure: a nominal trajectory is computed offline, then RL learns an affine correction law whose probabilistic feasibility is enforced directly via rollout quantiles and covariance penalties. This structure is portable across two distinct problems solely because uncertainty is assumed sampleable—the standard precondition for any Monte-Carlo chance-constraint method. No derivation step reduces by construction to a fitted parameter renamed as prediction, no self-citation supplies a uniqueness theorem that forbids alternatives, and no ansatz is smuggled in via prior work. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Framework rests on sampleability of uncertainty and introduces an affine correction structure whose effectiveness is asserted via case studies.

axioms (1)
  • [domain assumption] Uncertainty can be sampled from an unknown distribution.
    Explicitly stated as the sole requirement for uncertainty representation.
invented entities (1)
  • Structured affine closed-loop correction law (no independent evidence)
    purpose: Robustify the nominal trajectory via a feedforward adjustment and time-varying feedback gains.
    Introduced as the RL policy structure for closed-loop correction.

pith-pipeline@v0.9.1-grok · 5774 in / 1193 out tokens · 30268 ms · 2026-06-27T05:40:39.255137+00:00 · methodology

