Safe Deep Reinforcement Learning for Spacecraft Reorientation with Pointing Keep-Out Constraint

Juntang Yang; Mohamed Khalil Ben-Larbi

arxiv: 2605.19967 · v1 · pith:PGIOVG6Dnew · submitted 2026-05-19 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-20 03:54 UTC · model grok-4.3

keywords spacecraft reorientation · deep reinforcement learning · control barrier functions · safety filter · pointing keep-out constraint · attitude control · safe reinforcement learning

The pith

A control barrier function safety filter guarantees the pointing keep-out constraint during deep RL spacecraft reorientation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep reinforcement learning controller for spacecraft attitude reorientation that must avoid a forbidden pointing direction. A compact state representation encodes the keep-out zone and a shaped reward encourages reaching the target attitude. The soft actor-critic algorithm trains the policy with curriculum learning. During operation a control barrier function safety filter overrides any action that would violate the constraint. Monte Carlo trials show that reward shaping alone permits violations while the added filter prevents them in every tested case.

Core claim

The authors establish that a CBF-based safety filter, when applied to actions proposed by a trained reinforcement learning policy, guarantees that the spacecraft pointing vector never enters the keep-out zone throughout the entire reorientation maneuver.

What carries the argument

Control barrier function (CBF)-based safety filter that modifies the reinforcement learning action to enforce the continuous-state pointing keep-out constraint.

Load-bearing premise

The control barrier function can be formulated to enforce the pointing keep-out constraint in continuous state space without excessive conservatism or requiring perfect knowledge of the spacecraft dynamics.

What would settle it

A closed-loop simulation or hardware test in which the spacecraft pointing direction enters the keep-out zone while the CBF safety filter remains active would falsify the guarantee.
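
One way to make this falsification test concrete is to scan a logged trajectory for samples that fall inside the keep-out cone. The sketch below assumes a single cone of half-angle `half_angle_rad` around a forbidden direction, and a barrier-style margin h = cos(half_angle) - p·b, where p is the boresight direction and b the forbidden direction in the same frame. The function names and this particular margin are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def min_keepout_margin(boresight_traj, forbidden_dir, half_angle_rad):
    """Smallest margin h = cos(half_angle) - p.b over the samples (h >= 0 is safe)."""
    p = np.asarray(boresight_traj, dtype=float)
    p = p / np.linalg.norm(p, axis=1, keepdims=True)   # normalize each sample
    b = np.asarray(forbidden_dir, dtype=float)
    b = b / np.linalg.norm(b)
    h = np.cos(half_angle_rad) - p @ b                 # one margin per sample
    return float(h.min())

def violates_keepout(boresight_traj, forbidden_dir, half_angle_rad, tol=0.0):
    """True if any sampled boresight direction lies inside the keep-out cone."""
    return min_keepout_margin(boresight_traj, forbidden_dir, half_angle_rad) < -tol
```

A sampled check of this kind can miss a violation that occurs between two samples, so it can refute the guarantee but cannot by itself confirm it.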

Original abstract

This paper implements deep reinforcement learning (DRL) with a safety filter for spacecraft reorientation control with a single pointing keep-out zone. A new state space representation is designed which includes a compact representation of the attitude constraint zone. A reward function is formulated to achieve the control objective while enforcing the attitude constraint. The soft actor-critic (SAC) algorithm is adopted to handle the continuous state and action spaces. A curriculum learning approach is implemented for agent training. To guarantee compliance with the attitude constraint, a control barrier function (CBF)-based safety filter is implemented for agent deployment. Simulation results demonstrate the effectiveness of the proposed state space representation and the designed reward function. Monte Carlo simulations underscore that reward shaping alone cannot guarantee safety during reorientation maneuvers. In contrast, with the CBF-based safety filter, the constraint can be guaranteed during maneuvers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CBF filter on top of SAC keeps the keep-out zone safe in their Monte Carlo runs, but the relative-degree-2 issue for attitude constraints may limit how strong the invariance claim actually is.

The main thing to know is that layering a CBF safety filter on a trained SAC policy appears to enforce the pointing keep-out constraint during reorientation maneuvers in simulation, while reward shaping by itself does not. They encode the constraint zone compactly in the state, shape the reward to discourage violations, train with curriculum learning, and then run the policy through a CBF-based quadratic program at deployment time. The Monte Carlo results make a practical case that the extra filter layer matters for reliability.

Referee Report

1 major / 2 minor

Summary. The paper proposes a deep reinforcement learning controller using Soft Actor-Critic for spacecraft attitude reorientation that avoids a single pointing keep-out zone. It introduces a compact state representation that encodes the constraint zone, designs a shaped reward, employs curriculum learning during training, and augments the deployed policy with a CBF-based safety filter whose quadratic program is intended to enforce forward invariance of the safe set. Monte Carlo simulations are presented to show that reward shaping alone permits violations while the CBF filter prevents them.

Significance. If the safety filter rigorously guarantees invariance, the combination of learned policy with an independent CBF layer offers a practical route to certified safety in continuous-state aerospace control tasks. The custom state encoding and curriculum approach are constructive contributions. However, the absence of baseline comparisons, quantitative performance metrics, and error bars limits the strength of the empirical claims.

major comments (1)

[CBF safety filter section] The pointing keep-out constraint is a function h(q) of the attitude quaternion alone. Spacecraft attitude dynamics are second-order (state (q, ω), torque input), so L_g h = 0 and the relative degree is 2. A standard first-order CBF condition of the form L_f h + L_g h u + α(h) ≥ 0 therefore cannot directly constrain the input at the boundary. The manuscript must either (i) derive and apply a higher-order CBF, (ii) explicitly compute the second Lie derivative and show the resulting QP is always feasible, or (iii) demonstrate that the filter still renders the set invariant under the specific dynamics. Monte Carlo results alone do not substitute for this Lie-derivative analysis.
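
The relative-degree argument can be checked with a short derivation. The block below is a sketch under assumed conventions that are not taken from the paper: R(q) maps body to inertial coordinates, a is the body-fixed boresight, b is the inertial forbidden direction, θ is the cone half-angle, J is the inertia matrix, and the barrier is h = cos θ - a·(Rᵀb). The paper's own barrier may differ in form.

```latex
\begin{aligned}
h(q) &= \cos\theta - a^{\top} b_B, \qquad b_B := R(q)^{\top} b, \\
\dot{R} &= R\,[\omega]_{\times}, \qquad J\dot{\omega} = -\,\omega \times J\omega + u
  \;\;\Longrightarrow\;\; \dot{b}_B = -\,\omega \times b_B, \\
\dot{h} &= -\,a^{\top}\dot{b}_B = \omega^{\top}\,(b_B \times a)
  \qquad \text{(no } u \text{ appears, so } L_g h = 0\text{)}, \\
\ddot{h} &= \underbrace{(b_B \times a)^{\top} J^{-1}}_{L_g L_f h}\; u
  \;-\; (b_B \times a)^{\top} J^{-1}\,(\omega \times J\omega)
  \;-\; \omega^{\top}\big((\omega \times b_B) \times a\big).
\end{aligned}
```

Under these assumptions the torque first appears in the second derivative, and its coefficient vanishes only when b_B is parallel to a, which does not happen on the cone boundary for 0 < θ < π. This is consistent with the referee's statement that the relative degree is 2.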

minor comments (2)

[Abstract] Abstract and results section: no numerical performance metrics (e.g., success rate, settling time, control effort), error bars, or comparison against established baseline controllers (PD, LQR, or other safe RL methods) are reported, making it difficult to assess practical improvement.
[State space representation] State-space design: the compact representation of the keep-out zone is introduced but its invariance properties under the closed-loop dynamics are not analyzed separately from the CBF filter.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the CBF safety filter. The observation regarding relative degree is technically correct and we have revised the manuscript to address it rigorously.

Referee: [CBF safety filter section] The pointing keep-out constraint is a function h(q) of the attitude quaternion alone. Spacecraft attitude dynamics are second-order (state (q, ω), torque input), so L_g h = 0 and the relative degree is 2. A standard first-order CBF condition of the form L_f h + L_g h u + α(h) ≥ 0 therefore cannot directly constrain the input at the boundary. The manuscript must either (i) derive and apply a higher-order CBF, (ii) explicitly compute the second Lie derivative and show the resulting QP is always feasible, or (iii) demonstrate that the filter still renders the set invariant under the specific dynamics. Monte Carlo results alone do not substitute for this Lie-derivative analysis.

Authors: We agree that the pointing keep-out constraint h(q) depends solely on the attitude quaternion, yielding L_g h = 0 and relative degree 2 under the second-order attitude dynamics. The original manuscript applied a standard first-order CBF condition without explicit higher-order analysis. In the revised version we derive a second-order CBF by computing the second Lie derivative along the dynamics, formulate the corresponding QP, and prove that the QP remains feasible for all admissible torques. We further show that the closed-loop system renders the safe set forward invariant. The revised manuscript includes the full Lie-derivative derivation, feasibility proof, and supporting simulation results that go beyond Monte Carlo validation alone.
revision: yes
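
A second-order CBF filter of the kind described in this response can be sketched in a few lines. The code below is a minimal illustration and not the authors' formulation. It assumes a barrier h = cos(half_angle) - a·b_B, with a the body-fixed boresight and b_B the forbidden direction in body coordinates, and it swaps in a generic higher-order CBF with linear gains `alpha1` and `alpha2`. Torque limits are ignored, so the single-constraint QP min ||u - u_rl||² subject to A·u ≥ c has a closed-form solution. All function and parameter names are hypothetical.

```python
import numpy as np

def hocbf_constraint(omega, b_body, a_body, J, half_angle_rad, alpha1=1.0, alpha2=1.0):
    """Return (A, c) so the second-order CBF condition reads A @ u >= c."""
    omega, b_body, a_body = (np.asarray(v, dtype=float) for v in (omega, b_body, a_body))
    J = np.asarray(J, dtype=float)
    h = np.cos(half_angle_rad) - a_body @ b_body        # barrier value, safe when h >= 0
    n = np.cross(b_body, a_body)
    h_dot = omega @ n                                   # first derivative, torque-free
    n_dot = np.cross(-np.cross(omega, b_body), a_body)  # uses d(b_body)/dt = -omega x b_body
    J_inv = np.linalg.inv(J)
    drift = n @ J_inv @ (-np.cross(omega, J @ omega)) + omega @ n_dot
    A = J_inv.T @ n                                     # coefficient of u in h_ddot
    # Condition: h_ddot + (alpha1 + alpha2) * h_dot + alpha1 * alpha2 * h >= 0
    c = -(drift + (alpha1 + alpha2) * h_dot + alpha1 * alpha2 * h)
    return A, c

def hocbf_filter(u_rl, omega, b_body, a_body, J, half_angle_rad, alpha1=1.0, alpha2=1.0):
    """Minimally modify the RL torque u_rl so that A @ u >= c (no torque limits)."""
    u_rl = np.asarray(u_rl, dtype=float)
    A, c = hocbf_constraint(omega, b_body, a_body, J, half_angle_rad, alpha1, alpha2)
    slack = A @ u_rl - c
    denom = A @ A
    if slack >= 0.0 or denom < 1e-12:
        # Already admissible, or degenerate (boresight parallel to the forbidden
        # axis, where torque cannot influence h_ddot); this sketch passes u_rl through.
        return u_rl.copy()
    return u_rl - (slack / denom) * A                   # projection onto the half-space
```

With torque saturation the projection no longer has this closed form and the QP can become infeasible, which is exactly where the feasibility proof promised above would be needed. Forward invariance under this construction also requires h_dot + alpha1 * h ≥ 0 at the initial state.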

Circularity Check

0 steps flagged

Safety guarantee supplied by independent CBF filter rather than emerging from learned policy

The paper's derivation chain consists of an explicit state-space redesign, a hand-crafted reward function, standard SAC training with curriculum, and a post-hoc CBF safety filter applied at deployment. The central guarantee that the pointing constraint is satisfied is attributed directly to the CBF filter (an external mechanism whose invariance properties are assumed from prior CBF literature) rather than being derived from or fitted inside the DRL loop. Monte Carlo results are used only to show that reward shaping alone is insufficient, which is an empirical observation and does not create a self-referential loop. No equation reduces a prediction to a fitted parameter by construction, no uniqueness theorem is imported from the authors' own prior work, and no ansatz is smuggled in via self-citation. The minor score accounts for routine self-citation of CBF methods, which is not load-bearing for the core claim.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach depends on standard rigid-body attitude dynamics being accurately simulated and on the ability to construct a valid CBF for the keep-out zone; no new physical entities are postulated.

free parameters (2)

reward function weights
Weights balancing task completion against constraint violation are chosen during design and affect training behavior.
curriculum progression schedule
Parameters controlling how quickly task difficulty increases during training are set by the authors.
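
To show where these two kinds of free parameters would enter, here is a purely hypothetical sketch. The reward form, the linear ramp, and every name and default value are assumptions for illustration; the paper's actual reward function and curriculum are not reproduced here.

```python
import math

def shaped_reward(att_error_rad, keepout_margin, w_task=1.0, w_safe=10.0):
    """Hypothetical shaped reward: penalize attitude error and keep-out violation.

    keepout_margin >= 0 means the boresight is outside the keep-out zone.
    w_task and w_safe are the free weights referred to in the ledger.
    """
    task_term = -w_task * att_error_rad
    safety_term = -w_safe * max(0.0, -keepout_margin)   # active only when violated
    return task_term + safety_term

def curriculum_max_slew(episode, start_rad=0.2, end_rad=math.pi, ramp_episodes=1000):
    """Hypothetical curriculum: largest initial slew angle grows linearly with episodes."""
    frac = min(1.0, episode / ramp_episodes)
    return start_rad + frac * (end_rad - start_rad)
```

Changing `w_safe` relative to `w_task`, or shortening `ramp_episodes`, changes what the policy learns, which is why the ledger counts both as free parameters.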

axioms (2)

domain assumption: Spacecraft rotational motion follows the standard Euler rigid-body equations and quaternion kinematics, with no unmodeled disturbances.
Required for the simulation environment in which the agent is trained and tested.
domain assumption: A control barrier function can be defined whose zero superlevel set exactly matches the set of attitudes that satisfy the keep-out constraint.
Central premise enabling the safety filter to guarantee constraint satisfaction.

pith-pipeline@v0.9.0 · 5675 in / 1336 out tokens · 49530 ms · 2026-05-20T03:54:58.777086+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

CBF h(t,q) = κ + κ̇|κ̇|/(2μ) with pκ, pℎ polynomials and QP for U_z (Eqs. 15,22-24)
IndisputableMonolith/Foundation/Atomicity.lean atomic_tick unclear

Monte-Carlo results showing 0 % violation only with safety filter (Table 4)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
