
arxiv: 2606.10771 · v1 · pith:VMK2H2XNnew · submitted 2026-06-09 · 🌌 astro-ph.IM · cs.LG · cs.RO

On-sky demonstration of reinforcement learning for adaptive optics control

Pith reviewed 2026-06-27 11:38 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · cs.RO
keywords adaptive optics · reinforcement learning · on-sky demonstration · vibration compensation · real-time control · telescope instrumentation · policy optimization

The pith

Reinforcement learning controller PO4AO outperforms standard integrator in first on-sky adaptive optics tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the first on-sky validation of a reinforcement learning controller for adaptive optics, deployed on the Papyrus system at a 1.52 m telescope. PO4AO beat the conventional integrator across multiple nights and conditions while learning vibration patterns and resisting measurement noise. It ran with one fixed set of settings for different targets and seeing levels even though its Python code added latency and occasional frame drops. The results indicate that a properly optimized version could serve as a reliable, turnkey controller for single-conjugate adaptive optics.

Core claim

PO4AO, a policy-optimization reinforcement learning controller, was interfaced with the existing real-time controller via shared memory and tested on sky against a standard integrator. It delivered higher performance in every configuration tested, compensated for vibrations, remained robust to noise, and required no retuning of hyperparameters when flux levels or atmospheric conditions changed.

What carries the argument

The PO4AO reinforcement learning policy that maps wavefront-sensor measurements to deformable-mirror commands and learns corrections online instead of using a fixed integrator.
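
For contrast with the learned policy, the fixed integrator baseline can be sketched as a discrete integrator acting on reconstructed wavefront-sensor measurements. This is the generic textbook form, not the DAO RTC implementation; the names `reconstructor` and `gain`, the gain value, and the toy dimensions are assumptions made for illustration.

```python
import numpy as np

def integrator_step(prev_command, slopes, reconstructor, gain=0.4):
    """One step of a classical AO integrator (illustrative sketch).

    prev_command  : (n_act,) previous deformable-mirror command
    slopes        : (n_meas,) current closed-loop (residual) measurement
    reconstructor : (n_act, n_meas) matrix mapping measurements to commands
    gain          : scalar loop gain (hypothetical value)
    """
    # Accumulate the reconstructed residual; the minus sign drives it to zero.
    return prev_command - gain * (reconstructor @ slopes)

# Toy closed loop: identity optics, static aberration, no noise, no delay.
n_act = 4
reconstructor = np.eye(n_act)
aberration = np.array([1.0, -0.5, 0.25, 0.0])
command = np.zeros(n_act)
for _ in range(50):
    slopes = aberration + command  # the sensor sees aberration plus DM shape
    command = integrator_step(command, slopes, reconstructor)
residual = aberration + command
```

In this idealized loop the residual shrinks by a factor `1 - gain` per step. A learned controller such as PO4AO replaces the fixed update rule with a policy that is adapted online.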

If this is right

  • The controller learns and compensates for vibration patterns present in the real telescope environment.
  • It maintains performance under photon and detector noise without special tuning.
  • A single set of hyperparameters suffices across a range of flux levels and atmospheric conditions.
  • When ported to an optimized real-time language the method becomes a practical turnkey option for single-conjugate adaptive optics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same learning approach could be tested on multi-conjugate or extreme adaptive optics systems where vibrations and misregistrations are more complex.
  • Faster implementations would remove the current latency penalty and allow direct comparison of control bandwidths.
  • RL controllers might be applied to other real-time astronomy tasks such as tip-tilt or coronagraph alignment once the on-sky proof is established.

Load-bearing premise

The performance comparison remains fair even though the Python implementation of PO4AO added 750 microseconds of latency, control jitter, and occasional frame drops that the baseline integrator did not experience.

What would settle it

A side-by-side test in which the standard integrator is also run through the same Python interface with identical added latency and frame-drop statistics, checking whether PO4AO still outperforms.
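
The logic of such a matched test can be illustrated with a toy scalar simulation in which the same integrator is run once under "native" conditions and once with extra delay and random frame drops. Everything here is an assumption for illustration: the AR(1) disturbance, the gain, the delay expressed in whole frames (the source gives 750 microseconds but no loop rate in this summary), and the drop probability.

```python
import numpy as np

def run_integrator(disturbance, gain, delay_frames, drop_prob, rng):
    """Scalar integrator loop with extra delay and random frame drops.

    Returns the variance of the residual over the whole run.
    """
    n = len(disturbance)
    command = 0.0
    residual = np.zeros(n)
    for t in range(n):
        residual[t] = disturbance[t] + command
        # The controller only sees a measurement that is delay_frames old.
        t_meas = t - delay_frames
        dropped = rng.random() < drop_prob
        if t_meas >= 0 and not dropped:
            command -= gain * residual[t_meas]
        # On a dropped frame the previous command is simply held.
    return residual.var()

# Hypothetical slowly varying disturbance (AR(1) process).
rng = np.random.default_rng(0)
n = 20000
disturbance = np.zeros(n)
for t in range(1, n):
    disturbance[t] = 0.99 * disturbance[t - 1] + rng.normal()

# Same controller, two implementations: native vs. matched overhead.
native = run_integrator(disturbance, 0.3, 0, 0.0, np.random.default_rng(1))
matched = run_integrator(disturbance, 0.3, 2, 0.05, np.random.default_rng(1))
print(f"native residual variance:  {native:.3f}")
print(f"matched residual variance: {matched:.3f}")
```

The gap between the two numbers indicates how much of a measured margin could come from implementation overhead alone, which is the quantity the proposed on-sky test would pin down.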

Figures

Figures reproduced from arXiv: 2606.10771 by Angelie Alagao, Benoit Neichel, Byron Engler, Jalo Nousiainen, Jean-Francois Sauvage, Jonathan Dray, Markus Kasper, Romain Fetick, Sylvain Cetre, Vincent Chambouleyron.

Figure 1: Schematic diagram of the PAPYRUS bench during the …
Figure 2: PSF during the 1st night, under the strong vibration. Left …
Figure 3: First night telemetry analysis. Top: Residual variance per …
Figure 2: The vibration was irregular, making it di…
Figure 4: PSF during the second night. Each row compares the best …
Figure 5: Strehl estimation of each PSF data set ( …
Figure 6: Second night telemetry analysis: Vega modal variance …
Figure 9: Control matrices $\hat{C}$ for the integrator and PO4AO controllers. Top: integrator. Bottom: PO4AO.
5.2. Application on PAPYRUS telemetry. Using PAPYRUS telemetry acquired with the integrator controller and the PO4AO closed-loop configuration described in Sec. 3, we computed the control matrix $\hat{C}$ for each case. The computation used $l = 15{,}000$ timesteps and included only the actuators fully illuminated on the d…
Figure 8: Third night telemetry analysis.
To retrieve the control matrix $\hat{C}$, one can build two history matrices $H^s \in \mathbb{R}^{n_\mathrm{act} \times l}$ and $H \in \mathbb{R}^{2 n_\mathrm{act} \times l}$ from $l$ timesteps, such as:

$$H^s = \begin{bmatrix} a(1) & \cdots & a(t) & \cdots & a(l) \end{bmatrix}, \qquad H = \begin{bmatrix} o(1) & \cdots & o(t) & \cdots & o(l) \\ o(0) & \cdots & o(t-1) & \cdots & o(l-1) \end{bmatrix} \tag{7}$$

Using these history matrices, we can estimate the control matrix:

$$\hat{C} = H^s H^{+} \tag{8}$$

where $\cdot^{+}$ denotes the pseudo-inverse.
Figure 12: Eigenvalues for the first $n_\mathrm{act} \times n_\mathrm{act}$ block of the control matrices of the integrator and PO4AO controllers.
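
The control-matrix estimate described in the Figure 8 and Figure 12 text (history matrices, pseudo-inverse, eigenvalues of the first block) can be sketched with NumPy. This is a minimal illustration on synthetic data, not the authors' analysis code: the function name, the random telemetry, and the assumption that each observation vector has length `n_act` (implied by the stated shape of the second history matrix) are ours.

```python
import numpy as np

def estimate_control_matrix(actions, observations):
    """Least-squares estimate C_hat = H_s @ pinv(H).

    actions      : (n_act, l) array, columns a(1) .. a(l)
    observations : (n_act, l + 1) array, columns o(0) .. o(l)
    """
    H_s = actions
    # Stack current observations o(1..l) on top of previous ones o(0..l-1).
    H = np.vstack([observations[:, 1:], observations[:, :-1]])
    return H_s @ np.linalg.pinv(H)

# Synthetic check: commands generated by a known linear controller.
rng = np.random.default_rng(0)
n_act, l = 5, 200
C_true = rng.normal(size=(n_act, 2 * n_act))
obs = rng.normal(size=(n_act, l + 1))
H = np.vstack([obs[:, 1:], obs[:, :-1]])
acts = C_true @ H

C_hat = estimate_control_matrix(acts, obs)

# Eigenvalues of the first n_act x n_act block, the quantity shown in Figure 12.
eigvals = np.linalg.eigvals(C_hat[:, :n_act])
```

With noise-free synthetic data and more timesteps than unknowns, the estimate recovers the generating matrix; on real telemetry it is only the best linear fit to whatever the controller actually did.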
Original abstract

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.
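
The shared-memory coupling mentioned in the abstract can be pictured with a small sketch in which one side writes a measurement frame into a named buffer and the other side attaches to it by name. This uses Python's standard `multiprocessing.shared_memory` module, not the DAO RTC's own shared-memory library, whose API is not described here; the buffer size, dtype, and variable names are assumptions.

```python
import numpy as np
from multiprocessing import shared_memory

n_slopes = 8  # hypothetical measurement length
itemsize = np.dtype(np.float32).itemsize

# "RTC side": create a named buffer and publish one frame of measurements.
shm = shared_memory.SharedMemory(create=True, size=n_slopes * itemsize)
rtc_view = np.ndarray((n_slopes,), dtype=np.float32, buffer=shm.buf)
rtc_view[:] = np.arange(n_slopes, dtype=np.float32)

# "Controller side": attach to the same buffer by name and read the frame.
peer = shared_memory.SharedMemory(name=shm.name)
ctrl_view = np.ndarray((n_slopes,), dtype=np.float32, buffer=peer.buf)
frame = ctrl_view.copy()  # copy out before releasing the buffer

# Release the NumPy views first, then close and remove the buffer.
del rtc_view, ctrl_view
peer.close()
shm.close()
shm.unlink()
```

A real interface also needs a synchronization mechanism (for example a frame counter or semaphore) so the reader knows when a new frame has arrived; the jitter and frame drops reported in the abstract arise at that kind of boundary.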

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript reports the first on-sky demonstration of a reinforcement learning controller (PO4AO) for adaptive optics, implemented on the Papyrus system at the 1.52 m OHP telescope. Across multiple nights and a range of flux levels and atmospheric conditions, PO4AO outperformed a standard integrator controller, learned and compensated for vibration patterns, and exhibited robustness to measurement noise while operating in a turnkey manner with fixed hyperparameters. These gains occurred despite added latency (~750 μs), jitter, and frame drops from a non-optimized Python implementation interfaced via shared memory to the DAO RTC.

Significance. If the performance comparison is shown to be fair, this constitutes a significant empirical result as the first on-sky validation of RL-based AO control. The multi-night dataset across conditions, combined with the demonstration of vibration compensation and noise robustness, provides concrete evidence supporting RL as a practical alternative to classical integrators for single-conjugate AO, with potential for broader operational adoption once optimized.

major comments (1)
  1. [Abstract] The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.
minor comments (1)
  1. [Abstract] No quantitative performance metrics (e.g., Strehl ratio, residual wavefront error, or improvement factors with uncertainties) are provided to support the outperformance claim; adding these would improve clarity and allow readers to assess the magnitude of the gains.
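
As an illustration of the kind of metrics the minor comment asks for, residual wavefront error can be converted to an approximate Strehl ratio with the extended Maréchal approximation, and a variance ratio gives one possible improvement factor. The function names are hypothetical, and which metrics the paper actually reports is not stated in this summary.

```python
import math

def strehl_marechal(rms_wfe_nm, wavelength_nm):
    """Extended Marechal approximation: S ~ exp(-(2*pi*sigma/lambda)^2).

    rms_wfe_nm    : residual wavefront error, RMS, in nanometres
    wavelength_nm : imaging wavelength in nanometres
    """
    phase_rms_rad = 2.0 * math.pi * rms_wfe_nm / wavelength_nm
    return math.exp(-phase_rms_rad ** 2)

def improvement_factor(var_baseline, var_candidate):
    """Ratio of residual variances; values above 1 favour the candidate."""
    return var_baseline / var_candidate
```

The approximation is only reliable for small residuals (moderate to high Strehl), so uncertainties and the regime of validity should be reported alongside any such numbers.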

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and for recognizing the significance of our on-sky demonstration of PO4AO. We provide a point-by-point response to the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.

    Authors: We agree that this information is important for a fair interpretation of the results. The standard integrator controller is the one already deployed in the DAO RTC and was used in its standard operational configuration without additional retuning or optimization for the purpose of this comparison. PO4AO was interfaced via shared memory, introducing the reported additional latency, jitter, and frame drops, while the integrator operated at the native latency of the RTC. We will revise the abstract to explicitly clarify that the baseline integrator was evaluated under its native conditions without matched implementation overhead, thereby strengthening the claim that the performance gains are attributable to the RL algorithm despite these disadvantages.

    Revision: yes

Circularity Check

0 steps flagged

Empirical demonstration with no derivation chain present

Full rationale

The paper reports on-sky experimental results comparing the PO4AO reinforcement learning controller to a standard integrator across flux levels and conditions. No mathematical derivation, first-principles result, fitted parameter renamed as prediction, or self-citation chain is invoked to support a claimed prediction. The central claim (outperformance) is measured against an external baseline controller and is therefore falsifiable by direct observation rather than reducing to the paper's own inputs by construction. No steps matching the enumerated circularity patterns exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical on-sky demonstration rather than a theoretical derivation, so the ledger contains no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5880 in / 1035 out tokens · 17694 ms · 2026-06-27T11:38:26.093171+00:00 · methodology

