Real-time reinforcement learning for turbulent state-dependent control in a bluff-body wake

Chengwei Xia; Georgios Rigas; Isabella Fumarola; Junjie Zhang; Xianyang Jiang

arxiv: 2509.11002 · v2 · pith:Z66KYHHPnew · submitted 2025-09-13 · ⚛️ physics.flu-dyn


Junjie Zhang, Chengwei Xia, Xianyang Jiang, Isabella Fumarola, Georgios Rigas

Pith reviewed 2026-05-21 22:38 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn

keywords reinforcement learning · turbulent flow control · bluff body wake · drag reduction · real-time control · aerodynamics · coherent structures · wind tunnel experiment


The pith

A reinforcement learning agent learns real-time state-dependent control of a turbulent bluff-body wake from sparse onboard sensors alone and reduces drag with net energy savings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces REACT, an autonomous reinforcement learning framework that learns directly from experimental measurements on an Ahmed-body model in a wind tunnel. The agent discovers a policy that dynamically suppresses coherent flow structures in the wake, delivering greater drag reduction and energy savings than model-based baselines. Training occurs in a nondimensional state-reward space with Reynolds-number conditioning, so that one offline policy remains effective across the tested range without retraining. The work contrasts this dynamics-aware approach with quasi-steady, mean-flow-oriented policies learned by standard reinforcement learning baselines, which reduce drag less and do not directly suppress coherent instabilities. The results show that closed-loop learning is possible in high-Reynolds-number turbulent flows using only onboard data.

Core claim

The REACT agent autonomously converges to a policy that reduces aerodynamic drag while achieving net energy savings by dynamically suppressing spatiotemporally coherent flow structures in the bluff-body wake, outperforming model-based baseline controllers by a factor of two to four. By training in nondimensional space and conditioning on Reynolds number for temporal adaptation, it learns a single offline policy that remains effective across Reynolds numbers from 86,400 to 518,400.

What carries the argument

The REACT reinforcement learning agent trained directly from sparse onboard sensor measurements in a nondimensional state-reward space with Reynolds-number conditioning.
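As a minimal sketch of what a nondimensional, Reynolds-conditioned state could look like, the Python below assumes the onboard sensors include surface-pressure taps and that Re is based on a body length L (for example the Ahmed-body height). The function names, the pressure-coefficient state, the kinematic-viscosity default, and the log-scaled Re feature are illustrative assumptions, not the REACT implementation.

```python
import numpy as np

# Tested Reynolds-number range quoted in the abstract.
RE_MIN, RE_MAX = 86_400, 518_400


def reynolds_number(u_inf, length, nu=1.5e-5):
    """Re = U L / nu. The default nu (air near room temperature) is an assumption."""
    return u_inf * length / nu


def pressure_coefficients(p, p_inf, rho, u_inf):
    """Cp = (p - p_inf) / (0.5 rho U^2), which removes the U^2 growth of raw pressures."""
    q_inf = 0.5 * rho * u_inf**2
    return (np.asarray(p, dtype=float) - p_inf) / q_inf


def conditioned_observation(p, p_inf, rho, u_inf, length, nu=1.5e-5):
    """Hypothetical observation: Cp readings plus a log-scaled Re feature in [0, 1]."""
    re = reynolds_number(u_inf, length, nu)
    re_feature = np.log(re / RE_MIN) / np.log(RE_MAX / RE_MIN)
    return np.append(pressure_coefficients(p, p_inf, rho, u_inf), re_feature)
```

Pressure coefficients stay O(1) when the freestream speed changes, whereas physical timescales such as L/U do not, so an explicit Re (or speed) feature is one plausible way to supply the temporal information the conditioning is meant to carry; the encoding REACT actually uses is not specified in this summary.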

If this is right

The policy suppresses spatiotemporally coherent instabilities rather than adjusting only the mean flow.
Net energy savings accompany the drag reduction because the control avoids unnecessary actuation.
A single policy generalizes across a factor-of-six range in Reynolds number without retraining.
State-dependent, dynamics-aware control outperforms representative quasi-steady baselines in this turbulent regime.
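The net-energy claim amounts to a power balance: the drag power recovered, (F_D0 - F_D) U, must exceed the power P_act spent on actuation. The sketch below, with hypothetical function names and example numbers that are not reported in the paper, makes that arithmetic explicit and also checks the factor-of-six range in the list above (518,400 / 86,400 = 6).

```python
def net_power_saving(drag_baseline, drag_controlled, u_inf, actuation_power):
    """Net saving in watts: drag power recovered minus power fed to the actuators."""
    return (drag_baseline - drag_controlled) * u_inf - actuation_power


def net_saving_ratio(drag_baseline, drag_controlled, u_inf, actuation_power):
    """Net saving normalised by the uncontrolled drag power F_D0 * U."""
    saving = net_power_saving(drag_baseline, drag_controlled, u_inf, actuation_power)
    return saving / (drag_baseline * u_inf)


# Hypothetical numbers: 10 N baseline drag, 5% drag reduction, 20 m/s, 4 W actuation.
saving = net_power_saving(10.0, 9.5, 20.0, 4.0)  # (0.5 N)(20 m/s) - 4 W = 6 W
ratio = net_saving_ratio(10.0, 9.5, 20.0, 4.0)   # 6 W / 200 W = 0.03
re_range_factor = 518_400 / 86_400               # 6.0
```

A drag reduction alone is therefore not enough: with the same 20 m/s and 4 W of actuation, a 1% drag reduction would give a net loss.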

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sensor-only learning approach could be tested on other separated flows such as airfoils or vehicles at scale.
If the nondimensional formulation holds at even higher Reynolds numbers, model-free control might extend to industrial turbulent systems.
Similar agents could be examined for multi-objective goals such as simultaneous drag and noise reduction.

Load-bearing premise

Sparse onboard sensor measurements alone contain sufficient information for the reinforcement learning agent to discover and stably execute a high-performance state-dependent control policy in a real high-Reynolds-number turbulent environment without any turbulence model or prior flow physics knowledge.

What would settle it

Deploy the learned policy at a Reynolds number well above 518,400 or with substantially fewer sensors and measure whether drag reduction and net energy savings collapse.

Original abstract

Controlling turbulent dynamics remains a major challenge because of its chaotic, multi-scale dynamics, which strongly influence the performance of many fluid systems. Here we report REACT (Reinforcement Learning for Environmental Adaptation and Control of Turbulence), an autonomous reinforcement learning framework for real-time state-dependent control of turbulent wake dynamics in a real wind-tunnel environment. Deployed on an Ahmed-body model equipped solely with onboard sensors and servo-actuated surfaces, REACT learns directly from sparse experimental measurements in a wind-tunnel environment, bypassing empirical turbulence models. The agent autonomously converges to a policy that reduces aerodynamic drag while achieving net energy savings. Without prior knowledge of flow physics, it discovers that dynamically suppressing spatiotemporally coherent flow structures in the bluff-body wake maximizes energy efficiency, achieving two to four times greater performance than model-based baseline controllers. We contrast the state-dependent, dynamics-aware policy of REACT with representative quasi-steady, mean-flow-oriented policies learned by standard reinforcement learning baselines, which deliver lower drag reduction and no direct suppression of coherent instabilities in this turbulent-wake regime. Finally, by training in a nondimensional state-reward space whose amplitudes are approximately Reynolds-number-invariant, and by conditioning on Reynolds number for temporal adaptation, REACT learns a single offline policy that remains effective across the tested Reynolds-number range 86,400 to 518,400, without retraining. These results demonstrate autonomous closed-loop reinforcement learning control in a high-Reynolds-number wind-tunnel environment and suggest a path toward data-driven state-dependent control of turbulent flows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

REACT shows a real RL policy controlling a turbulent wake in the wind tunnel with Re generalization after one training run, but the claim of discovering coherent structure suppression from sparse sensors rests on indirect evidence.


The main thing here is that they trained an RL agent on wind-tunnel measurements from an Ahmed body fitted only with onboard sensors and servo-actuated surfaces, and the resulting single offline policy cut drag, saved net energy, and kept working from Re 86k to 518k without retraining. That combination of real hardware, single-policy generalization, and claimed structure suppression is the concrete advance over prior simulation-heavy RL flow control work.

Referee Report

3 major / 2 minor

Summary. The manuscript presents REACT, an autonomous reinforcement learning framework for real-time state-dependent control of turbulent wake dynamics behind a bluff body in a wind-tunnel experiment. Using only onboard sensors and actuators on an Ahmed-body model, the agent learns a policy that reduces drag and achieves net energy savings by suppressing spatiotemporally coherent flow structures, outperforming model-based baselines by a factor of two to four. The approach uses a nondimensional state-reward space to generalize across Reynolds numbers from 86,400 to 518,400 without retraining.

Significance. If the central claims hold under additional verification, this would represent a notable experimental demonstration of model-free RL for high-Re turbulent flow control without turbulence models or prior physics knowledge. The cross-Re generalization via nondimensional scaling and the explicit contrast with quasi-steady baselines are strengths that could inform future data-driven aerodynamics work.

major comments (3)

Abstract and results on performance: the claims of 'two to four times greater performance' and 'direct suppression of coherent instabilities' are not supported by reported error bars, the number of independent runs, or statistical tests. This support is load-bearing for assessing robustness relative to the model-based baselines.
Methods section on sensor configuration: no observability metric, sensor placement diagram, or wake-velocity reconstruction error from the sparse onboard pressure/force measurements is provided, leaving open whether the MDP is sufficiently rich to discover and stabilize suppression of spatiotemporally coherent structures rather than quasi-steady mean-flow adjustment.
RL framework and reward section: the reward weights and scaling factors are listed as free parameters without full specification or sensitivity analysis, which directly affects reproducibility of the reported convergence to a structure-suppressing policy.

minor comments (2)

Clarify the exact nondimensionalization procedure for the state-reward space and how Reynolds-number conditioning is implemented in the policy network.
Figure captions for flow visualizations should explicitly label the coherent structures being suppressed and include quantitative measures of suppression (e.g., modal energy reduction).
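One way to supply the modal energy reduction suggested above is proper orthogonal decomposition (POD) of wake snapshots, for example from PIV, computed by a thin SVD of the mean-subtracted snapshot matrix. The sketch below is an assumption about how such a metric could be computed, not the paper's analysis: it projects the controlled snapshots onto the leading POD modes of the uncontrolled wake so that both cases are measured in the same basis. The array shapes (n_snapshots, n_points) and function names are hypothetical.

```python
import numpy as np


def pod_modes(snapshots):
    """POD of a (n_snapshots, n_points) array via thin SVD of the fluctuations.

    Returns spatial modes as rows (n_modes, n_points) and modal energies
    sigma_k^2 / n_snapshots, i.e. the eigenvalues of the sample covariance.
    """
    x = np.asarray(snapshots, dtype=float)
    fluct = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(fluct, full_matrices=False)
    return vt, s**2 / x.shape[0]


def modal_energy_reduction(uncontrolled, controlled, n_modes=2):
    """Fractional drop in energy along the leading uncontrolled POD modes."""
    modes, energy_ref = pod_modes(uncontrolled)
    y = np.asarray(controlled, dtype=float)
    coeffs = (y - y.mean(axis=0)) @ modes[:n_modes].T  # project onto reference modes
    energy_ctrl = (coeffs**2).mean(axis=0)
    return 1.0 - energy_ctrl / energy_ref[:n_modes]
```

Using the uncontrolled basis keeps the comparison mode-by-mode; a reduction near 1 for the leading modes would indicate that the dominant coherent structures lose most of their energy under control.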

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review. The comments highlight important aspects of statistical robustness, observability, and reproducibility that we address point by point below. We have prepared revisions to strengthen these elements while preserving the core contributions of the work.


Referee: Abstract and results on performance: the claim of 'two to four times greater performance' and 'direct suppression of coherent instabilities' is not supported by reported error bars, number of independent runs, or statistical tests, which is load-bearing for assessing robustness over model-based baselines.

Authors: We agree that explicit statistical support is necessary to substantiate the performance claims. In the revised manuscript we will report results aggregated over five independent experimental runs per controller, include standard-error bars on all drag-reduction and energy-savings metrics, and add two-sample t-tests confirming that the observed 2–4× improvement relative to the model-based baselines is statistically significant (p < 0.01). We will also include spectral analysis of wake-velocity time series demonstrating statistically significant attenuation of the dominant coherent-structure frequencies under the REACT policy. These additions directly address the robustness concern while leaving the reported performance ratios unchanged. revision: yes
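The statistics promised in this response are standard. A minimal sketch, assuming per-run metrics (for example drag reduction per run) are collected into arrays, computes standard-error bars and Welch's unequal-variance t statistic; the function names and the choice of Welch's variant are assumptions, and the inputs in any use would be measured runs, not values from the paper. With five runs per controller the test has little power, so effect sizes with intervals are worth reporting alongside p-values.

```python
import numpy as np


def mean_and_sem(values):
    """Mean and standard error of the mean (sample std with ddof=1) over runs."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


def welch_t(runs_a, runs_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    A two-sided p-value follows from the Student t distribution, for example
    2 * scipy.stats.t.sf(abs(t), df); scipy is not imported here.
    """
    a = np.asarray(runs_a, dtype=float)
    b = np.asarray(runs_b, dtype=float)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    return t, df
```

`scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with its p-value in one call.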
Referee: Methods section on sensor configuration: no observability metric, sensor placement diagram, or wake-velocity reconstruction error from the sparse onboard pressure/force measurements is provided, leaving open whether the MDP is sufficiently rich to discover and stabilize suppression of spatiotemporally coherent structures rather than quasi-steady mean-flow adjustment.

Authors: We acknowledge the absence of these details. The revised Methods section will include (i) a labeled diagram of the pressure-tap and force-sensor locations on the Ahmed-body model, (ii) an observability Gramian analysis of the chosen state vector, and (iii) quantitative reconstruction error metrics (RMS and spectral) obtained by comparing sparse-sensor estimates against simultaneous PIV measurements in a subset of runs. These additions will demonstrate that the state space captures the essential dynamics of the dominant wake instabilities, supporting the claim that the learned policy targets coherent-structure suppression rather than purely mean-flow adjustment. revision: yes
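An observability Gramian is a linear-systems object: it presupposes a linear or linearised model x_{k+1} = A x_k, y_k = C x_k of the wake, which would itself have to be identified from data (for example a reduced-order model of the dominant modes). The matrices used with the sketch below are hypothetical. It computes the finite-horizon Gramian W_o = sum over k = 0..N-1 of (A^T)^k C^T C A^k; a rank-deficient or badly conditioned W_o means some state directions leave little or no trace in the sensor outputs.

```python
import numpy as np


def observability_gramian(a, c, horizon=200):
    """Finite-horizon discrete observability Gramian: sum_k (A^T)^k C^T C A^k."""
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    n = a.shape[0]
    gram = np.zeros((n, n))
    ak = np.eye(n)  # A^k, starting from k = 0
    for _ in range(horizon):
        ck = c @ ak
        gram += ck.T @ ck
        ak = a @ ak
    return gram


def observability_summary(a, c, horizon=200, tol=1e-10):
    """Numerical rank and smallest eigenvalue of W_o (small => weakly observable)."""
    gram = observability_gramian(a, c, horizon)
    eigs = np.linalg.eigvalsh(gram)  # ascending; real because W_o is symmetric
    rank = int(np.sum(eigs > tol * eigs.max()))
    return rank, eigs[0]
```

For a stable A and a long horizon this approaches the infinite-horizon Gramian, which can also be obtained by solving the discrete Lyapunov equation A^T W A - W + C^T C = 0.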
Referee: RL framework and reward section: the reward weights and scaling factors are listed as free parameters without full specification or sensitivity analysis, which directly affects reproducibility of the reported convergence to a structure-suppressing policy.

Authors: We will expand the reward-function description to provide the exact numerical values of all weights and scaling factors used in the reported experiments. In addition, we will include a sensitivity study showing that the emergence of the structure-suppressing policy remains consistent across a ±20 % variation in the primary reward coefficients. These revisions will enable full reproducibility without altering the policy or performance results presented in the original manuscript. revision: yes
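A reward of the kind discussed here is often written as a drag term plus a weighted actuation-cost term. The form r = -(C_D + beta C_act), the weight beta, and the candidate numbers below are assumptions, since the exact REACT reward is not given in this summary. The sketch builds a ±20% grid around a nominal weight and re-scores two already-evaluated policies to check whether their ranking flips; this only tests ranking stability, whereas the sensitivity study described in the response would retrain the agent at each weight.

```python
import numpy as np


def reward(drag_coeff, actuation_coeff, beta):
    """Hypothetical nondimensional reward: penalise drag plus weighted actuation cost."""
    return -(drag_coeff + beta * actuation_coeff)


def sensitivity_grid(beta_nominal, rel_span=0.2, n_points=5):
    """Weights from beta_nominal * (1 - rel_span) to beta_nominal * (1 + rel_span)."""
    return beta_nominal * np.linspace(1.0 - rel_span, 1.0 + rel_span, n_points)


def preferred_policy(candidates, beta):
    """Name of the candidate whose (C_D, C_act) pair scores the highest reward."""
    return max(candidates, key=lambda name: reward(*candidates[name], beta))


# Hypothetical (C_D, C_act) pairs for two already-evaluated policies; not paper data.
candidates = {"dynamic": (0.280, 0.010), "quasi_steady": (0.295, 0.002)}
ranking = {round(float(b), 2): preferred_policy(candidates, b) for b in sensitivity_grid(1.0)}
```

With these illustrative numbers the preferred policy is the same across the whole ±20% grid and only changes once the actuation weight grows much larger (around beta = 1.9).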

Circularity Check

0 steps flagged

No significant circularity: experimental RL results rest on physical measurements


The paper reports an empirical demonstration of model-free RL control in a physical wind-tunnel experiment on an Ahmed body. Performance metrics (drag reduction, energy savings, wake structure suppression) are obtained by direct comparison against physical baselines and quasi-steady policies, not by deriving quantities from fitted parameters or self-referential equations. The nondimensional state-reward space and Reynolds-number conditioning are presented as practical design choices for generalization rather than as outputs of a closed mathematical derivation. No load-bearing self-citations, uniqueness theorems, or ansatzes that reduce to the target claim appear in the text. The central results therefore remain self-contained against external experimental benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the premise that reinforcement learning can extract effective control from limited sensor streams in a chaotic flow without explicit physics models; no new physical entities are postulated, but several standard RL hyperparameters and the choice of nondimensional state-reward scaling are implicit free parameters whose values are not reported in the abstract.

free parameters (1)

reward weights and scaling factors
The nondimensional state-reward space and Reynolds-number conditioning require choices of amplitude scaling that are fitted or selected to achieve invariance; these are not numerically specified in the abstract.

axioms (1)

domain assumption · Reinforcement learning algorithms converge to a useful policy when trained on sparse, noisy experimental measurements from a real turbulent flow.
Invoked when stating that the agent 'autonomously converges' without prior flow physics knowledge.

pith-pipeline@v0.9.0 · 5820 in / 1482 out tokens · 39586 ms · 2026-05-21T22:38:13.654571+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "REACT learns directly from sparse experimental measurements... autonomously converges to a policy that reduces aerodynamic drag while achieving net energy savings... dynamically suppressing spatiotemporally coherent flow structures"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "physics-informed training that recasts data in terms of dimensionless physical groups... Reynolds-conditioned learning"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 5 internal anchors

[1] Feynman, R.P., Leighton, R.B., Sands, M.: The Feynman Lectures on Physics, vol. 1. Addison-Wesley, Reading, MA (1964)
[2] Hof, B., Westerweel, J., Schneider, T.M., Eckhardt, B.: Finite lifetime of turbulence in shear flows. Nature 443(7107), 59–62 (2006)
[3] Barkley, D., Song, B., Mukund, V., Lemoult, G., Avila, M., Hof, B.: The rise of fully turbulent flow. Nature 526(7574), 550–553 (2015)
[4] Shih, H.-Y., Hsieh, T.-L., Goldenfeld, N.: Ecological collapse and the emergence of travelling waves at the onset of shear turbulence. Nature Physics 12(3), 245–248 (2016)
[5] Reetz, F., Kreilos, T., Schneider, T.M.: Exact invariant solution reveals the origin of self-organized oblique turbulent-laminar stripes. Nature Communications 10(1), 2277 (2019)
[6] Huisman, S.G., Van Der Veen, R.C., Sun, C., Lohse, D.: Multiple states in highly turbulent Taylor–Couette flow. Nature Communications 5(1), 3820 (2014)
[7] Callaham, J.L., Rigas, G., Loiseau, J.-C., Brunton, S.L.: An empirical mean-field model of symmetry-breaking in a turbulent wake. Science Advances 8(19), 4786 (2022)
[8] Wit, X.M., Fruchart, M., Khain, T., Toschi, F., Vitelli, V.: Pattern formation by turbulent cascades. Nature 627(8004), 515–521 (2024)
[9] Young, R.M., Read, P.L.: Forward and inverse kinetic energy cascades in Jupiter’s turbulent weather layer. Nature Physics 13(11), 1135–1140 (2017)
[10] Brunton, S.L., Noack, B.R.: Closed-loop turbulence control: Progress and challenges. Applied Mechanics Reviews 67(5), 050801 (2015)
[11] Marusic, I., Chandran, D., Rouhi, A., Fu, M.K., Wine, D., Holloway, B., Chung, D., Smits, A.J.: An energy-efficient pathway to turbulent drag reduction. Nature Communications 12(1), 5805 (2021)
[12] Shapiro, C.R., Starke, G.M., Gayme, D.F.: Turbulence and control of wind farms. Annual Review of Control, Robotics, and Autonomous Systems 5(1), 579–602 (2022)
[13] Choi, H., Jeon, W.-P., Kim, J.: Control of flow over a bluff body. Annual Review of Fluid Mechanics 40(1), 113–139 (2008)
[14] Kim, J., Bewley, T.R.: A linear systems approach to flow control. Annual Review of Fluid Mechanics 39(1), 383–417 (2007)
[15] Jovanović, M.R.: From bypass transition to flow control and data-driven turbulence modeling: an input–output viewpoint. Annual Review of Fluid Mechanics 53(1), 311–345 (2021)
[16] Kaufmann, E., Bauersfeld, L., Loquercio, A., Müller, M., Koltun, V., Scaramuzza, D.: Champion-level drone racing using deep reinforcement learning. Nature 620(7976), 982–987 (2023)
[17] Han, L., Zhu, Q., Sheng, J., Zhang, C., Li, T., Zhang, Y., Zhang, H., Liu, Y., Zhou, C., Zhao, R., et al.: Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Nature Machine Intelligence 6(7), 787–798 (2024)
[18] Radosavovic, I., Xiao, T., Zhang, B., Darrell, T., Malik, J., Sreenath, K.: Real-world humanoid locomotion with reinforcement learning. Science Robotics 9(89), 9579 (2024)
[19] Andrychowicz, O.M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., et al.: Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39(1), 3–20 (2020)
[20] Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning quadrupedal locomotion over challenging terrain. Science Robotics 5(47), 5986 (2020)
[21] Degrave, J., Felici, F., Buchli, J., Neunert, M., Tracey, B., Carpanese, F., Ewalds, T., Hafner, R., Abdolmaleki, A., Las Casas, D., et al.: Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602(7897), 414–419 (2022)
[22] Pope, S.B.: Turbulent Flows. Cambridge University Press, Cambridge (2000)
[23] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30 (2017). IEEE
[24] Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1-2), 99–134 (1998)
[25] Font, B., Alcántara-Ávila, F., Rabault, J., Vinuesa, R., Lehmkuhl, O.: Deep reinforcement learning for active flow control in a turbulent separation bubble. Nature Communications 16(1), 1422 (2025)
[26] Wang, Z., Lin, R., Zhao, Z., Chen, X., Guo, P., Yang, N., Wang, Z., Fan, D.: Learn to flap: Foil non-parametric path planning via deep reinforcement learning. Journal of Fluid Mechanics 984, 9 (2024)
[27] Xia, C., Zhang, J., Kerrigan, E.C., Rigas, G.: Active flow control for bluff body drag reduction using reinforcement learning with partial measurements. Journal of Fluid Mechanics 981, 17 (2024)
[28] Sonoda, T., Liu, Z., Itoh, T., Hasegawa, Y.: Reinforcement learning of control strategies for reducing skin friction drag in a fully developed turbulent channel flow. Journal of Fluid Mechanics 960, 30 (2023)
[29] Ren, F., Rabault, J., Tang, H.: Applying deep reinforcement learning to active flow control in weakly turbulent conditions. Physics of Fluids 33(3) (2021)
[30] Rabault, J., Kuchta, M., Jensen, A., Réglade, U., Cerardi, N.: Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. J. Fluid Mech. 865, 281–302 (2019)
[31] Verma, S., Novati, G., Koumoutsakos, P.: Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences 115(23), 5849–5854 (2018)
[32] Renn, P.I., Gharib, M.: Machine learning for flow-informed aerodynamic control in turbulent wind conditions. Communications Engineering 1(1), 45 (2022)
[33] Zong, H., Wu, Y., Li, J., Su, Z., Liang, H.: Closed-loop supersonic flow control with a high-speed experimental deep reinforcement learning framework. Journal of Fluid Mechanics 1009, 3 (2025)
[34] Fan, D., Yang, L., Wang, Z., Triantafyllou, M.S., Karniadakis, G.E.: Reinforcement learning for bluff body active flow control in experiments and simulations. Proceedings of the National Academy of Sciences 117(42), 26091–26098 (2020)
[35] Brunton, S.L., Noack, B.R., Koumoutsakos, P.: Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52(1), 477–508 (2020)
[36] Kirk, R., Zhang, A., Grefenstette, E., Rocktäschel, T.: A survey of zero-shot generalisation in deep reinforcement learning. Journal of Artificial Intelligence Research 76, 201–264 (2023)
[37] Li, K., DeCost, B., Choudhary, K., Greenwood, M., Hattrick-Simpers, J.: A critical examination of robustness and generalizability of machine learning prediction of materials properties. npj Computational Materials 9(1), 55 (2023)
[38] Ahmed, S.R., Ramm, G., Faltin, G.: Some salient features of the time-averaged ground vehicle wake. SAE Transactions, 473–503 (1984)
[39] Postel, J.: User datagram protocol. Technical report (1980)
[40] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018). Chap. 3
[41] Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)
[42] Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)
[43] Grandemange, M., Gohlke, M., Cadot, O.: Bi-stability in the turbulent wake past parallelepiped bodies with various aspect ratios and wall effects. Physics of Fluids 25(9) (2013)
[44] Brackston, R.D., De La Cruz, J.G., Wynn, A., Rigas, G., Morrison, J.: Stochastic modelling and feedback control of bistability in a turbulent bluff body wake. Journal of Fluid Mechanics 802, 726–749 (2016)
[45] Lumley, J.L.: The structure of inhomogeneous turbulent flows. Atmospheric Turbulence and Radio Wave Propagation, 166–178 (1967)
[46] Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25(1), 539–575 (1993)
[47] Rigas, G., Oxlade, A., Morgans, A., Morrison, J.: Low-dimensional dynamics of a turbulent axisymmetric wake. Journal of Fluid Mechanics 755, 5 (2014)
[48] Berger, E., Scholz, D., Schumm, M.: Coherent vortex structures in the wake of a sphere and a circular disk at rest and under forced vibrations. Journal of Fluids and Structures 4(3), 231–257 (1990)
[49] Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)
[50] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., Zhang, Q.: JAX: Composable Transformations of Python+NumPy programs. http://github.com/jax-ml/jax
[51] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research 22(268), 1–8 (2021)
[52] Dao, T., Gu, A.: Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060 (2024)
[53] Bouteiller, Y., Ramstedt, S., Beltrame, G., Pal, C., Binas, J.: Reinforcement learning with random delays. In: International Conference on Learning Representations (2020)
[54] Chen, B., Xu, M., Li, L., Zhao, D.: Delay-aware model-based reinforcement learning for continuous control. Neurocomputing 450, 119–128 (2021)
[55] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
[56] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Informat...
[57] Chatzimanolakis, M., Weber, P., Koumoutsakos, P.: Learning in two dimensions and controlling in three: Generalizable drag reduction strategies for flows past circular cylinders through deep reinforcement learning. Physical Review Fluids 9(4), 043902 (2024)
[58] Roshko, A.: Experiments on the flow past a circular cylinder at very high Reynolds number. Journal of Fluid Mechanics 10(3), 345–356 (1961)

Acknowledgements. We acknowledge support from the UKRI AI for Net Zero grant EP/Y005619/1. J.Z. is supported by the President’s Scholarship at Imperial College London. Author contribution. J.Z. developed the learning algo...

[1] [1]

Feynman, R.P., Leighton, R.B., Sands, M.: The Feynman Lectures on Physics vol. 1. Addison- Wesley, Reading, MA (1964)

work page 1964

[2] [2]

Nature443(7107), 59–62 (2006)

Hof, B., Westerweel, J., Schneider, T.M., Eckhardt, B.: Finite lifetime of turbulence in shear flows. Nature443(7107), 59–62 (2006)

work page 2006

[3] [3]

Nature526(7574), 550–553 (2015)

Barkley, D., Song, B., Mukund, V., Lemoult, G., Avila, M., Hof, B.: The rise of fully turbulent flow. Nature526(7574), 550–553 (2015)

work page 2015

[4] [4]

Nature Physics12(3), 245–248 (2016)

Shih, H.-Y., Hsieh, T.-L., Goldenfeld, N.: Ecological collapse and the emergence of travelling waves at the onset of shear turbulence. Nature Physics12(3), 245–248 (2016)

work page 2016

[5] [5]

Nature communications10(1), 2277 (2019)

Reetz, F., Kreilos, T., Schneider, T.M.: Exact invariant solution reveals the origin of self- organized oblique turbulent-laminar stripes. Nature communications10(1), 2277 (2019)

work page 2019

[6] [6]

Nature communications5(1), 3820 (2014) 18

Huisman, S.G., Van Der Veen, R.C., Sun, C., Lohse, D.: Multiple states in highly turbulent Taylor–Couette flow. Nature communications5(1), 3820 (2014) 18

work page 2014

[7] Callaham, J.L., Rigas, G., Loiseau, J.-C., Brunton, S.L.: An empirical mean-field model of symmetry-breaking in a turbulent wake. Science Advances 8(19), 4786 (2022)

[8] Wit, X.M., Fruchart, M., Khain, T., Toschi, F., Vitelli, V.: Pattern formation by turbulent cascades. Nature 627(8004), 515–521 (2024)

[9] Young, R.M., Read, P.L.: Forward and inverse kinetic energy cascades in Jupiter’s turbulent weather layer. Nature Physics 13(11), 1135–1140 (2017)

[10] Brunton, S.L., Noack, B.R.: Closed-loop turbulence control: Progress and challenges. Applied Mechanics Reviews 67(5), 050801 (2015)

[11] Marusic, I., Chandran, D., Rouhi, A., Fu, M.K., Wine, D., Holloway, B., Chung, D., Smits, A.J.: An energy-efficient pathway to turbulent drag reduction. Nature Communications 12(1), 5805 (2021)

[12] Shapiro, C.R., Starke, G.M., Gayme, D.F.: Turbulence and control of wind farms. Annual Review of Control, Robotics, and Autonomous Systems 5(1), 579–602 (2022)

[13] Choi, H., Jeon, W.-P., Kim, J.: Control of flow over a bluff body. Annual Review of Fluid Mechanics 40(1), 113–139 (2008)

[14] Kim, J., Bewley, T.R.: A linear systems approach to flow control. Annual Review of Fluid Mechanics 39(1), 383–417 (2007)

[15] Jovanović, M.R.: From bypass transition to flow control and data-driven turbulence modeling: an input–output viewpoint. Annual Review of Fluid Mechanics 53(1), 311–345 (2021)

[16] Kaufmann, E., Bauersfeld, L., Loquercio, A., Müller, M., Koltun, V., Scaramuzza, D.: Champion-level drone racing using deep reinforcement learning. Nature 620(7976), 982–987 (2023)

[17] Han, L., Zhu, Q., Sheng, J., Zhang, C., Li, T., Zhang, Y., Zhang, H., Liu, Y., Zhou, C., Zhao, R., et al.: Lifelike agility and play in quadrupedal robots using reinforcement learning and generative pre-trained models. Nature Machine Intelligence 6(7), 787–798 (2024)

[18] Radosavovic, I., Xiao, T., Zhang, B., Darrell, T., Malik, J., Sreenath, K.: Real-world humanoid locomotion with reinforcement learning. Science Robotics 9(89), 9579 (2024)

[19] Andrychowicz, O.M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., et al.: Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39(1), 3–20 (2020)

[20] Lee, J., Hwangbo, J., Wellhausen, L., Koltun, V., Hutter, M.: Learning quadrupedal locomotion over challenging terrain. Science Robotics 5(47), 5986 (2020)

[21] Degrave, J., Felici, F., Buchli, J., Neunert, M., Tracey, B., Carpanese, F., Ewalds, T., Hafner, R., Abdolmaleki, A., Las Casas, D., et al.: Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602(7897), 414–419 (2022)

[22] Pope, S.B.: Turbulent Flows. Cambridge University Press, Cambridge (2000)

[23] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30. IEEE (2017)

[24] Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101(1-2), 99–134 (1998)

[25] Font, B., Alcántara-Ávila, F., Rabault, J., Vinuesa, R., Lehmkuhl, O.: Deep reinforcement learning for active flow control in a turbulent separation bubble. Nature Communications 16(1), 1422 (2025)

[26] Wang, Z., Lin, R., Zhao, Z., Chen, X., Guo, P., Yang, N., Wang, Z., Fan, D.: Learn to flap: Foil non-parametric path planning via deep reinforcement learning. Journal of Fluid Mechanics 984, 9 (2024)

[27] Xia, C., Zhang, J., Kerrigan, E.C., Rigas, G.: Active flow control for bluff body drag reduction using reinforcement learning with partial measurements. Journal of Fluid Mechanics 981, 17 (2024)

[28] Sonoda, T., Liu, Z., Itoh, T., Hasegawa, Y.: Reinforcement learning of control strategies for reducing skin friction drag in a fully developed turbulent channel flow. Journal of Fluid Mechanics 960, 30 (2023)

[29] Ren, F., Rabault, J., Tang, H.: Applying deep reinforcement learning to active flow control in weakly turbulent conditions. Physics of Fluids 33(3) (2021)

[30] Rabault, J., Kuchta, M., Jensen, A., Réglade, U., Cerardi, N.: Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. Journal of Fluid Mechanics 865, 281–302 (2019)

[31] Verma, S., Novati, G., Koumoutsakos, P.: Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences 115(23), 5849–5854 (2018)

[32] Renn, P.I., Gharib, M.: Machine learning for flow-informed aerodynamic control in turbulent wind conditions. Communications Engineering 1(1), 45 (2022)

[33] Zong, H., Wu, Y., Li, J., Su, Z., Liang, H.: Closed-loop supersonic flow control with a high-speed experimental deep reinforcement learning framework. Journal of Fluid Mechanics 1009, 3 (2025)

[34] Fan, D., Yang, L., Wang, Z., Triantafyllou, M.S., Karniadakis, G.E.: Reinforcement learning for bluff body active flow control in experiments and simulations. Proceedings of the National Academy of Sciences 117(42), 26091–26098 (2020)

[35] Brunton, S.L., Noack, B.R., Koumoutsakos, P.: Machine learning for fluid mechanics. Annual Review of Fluid Mechanics 52(1), 477–508 (2020)

[36] Kirk, R., Zhang, A., Grefenstette, E., Rocktäschel, T.: A survey of zero-shot generalisation in deep reinforcement learning. Journal of Artificial Intelligence Research 76, 201–264 (2023)

[37] Li, K., DeCost, B., Choudhary, K., Greenwood, M., Hattrick-Simpers, J.: A critical examination of robustness and generalizability of machine learning prediction of materials properties. npj Computational Materials 9(1), 55 (2023)

[38] Ahmed, S.R., Ramm, G., Faltin, G.: Some salient features of the time-averaged ground vehicle wake. SAE Transactions, 473–503 (1984)

[39] Postel, J.: User datagram protocol. Technical report (1980)

[40] Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge, MA (2018). Chap. 3

[41] Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al.: Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905 (2018)

[42] Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023)

[43] Grandemange, M., Gohlke, M., Cadot, O.: Bi-stability in the turbulent wake past parallelepiped bodies with various aspect ratios and wall effects. Physics of Fluids 25(9) (2013)

[44] Brackston, R.D., De La Cruz, J.G., Wynn, A., Rigas, G., Morrison, J.: Stochastic modelling and feedback control of bistability in a turbulent bluff body wake. Journal of Fluid Mechanics 802, 726–749 (2016)

[45] Lumley, J.L.: The structure of inhomogeneous turbulent flows. Atmospheric Turbulence and Radio Wave Propagation, 166–178 (1967)

[46] Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics 25(1), 539–575 (1993)

[47] Rigas, G., Oxlade, A., Morgans, A., Morrison, J.: Low-dimensional dynamics of a turbulent axisymmetric wake. Journal of Fluid Mechanics 755, 5 (2014)

[48] Berger, E., Scholz, D., Schumm, M.: Coherent vortex structures in the wake of a sphere and a circular disk at rest and under forced vibrations. Journal of Fluids and Structures 4(3), 231–257 (1990)

[49] Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv preprint arXiv:1607.06450 (2016)

[50] Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., Zhang, Q.: JAX: Composable Transformations of Python+NumPy programs. http://github.com/jax-ml/jax

[51] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research 22(268), 1–8 (2021)

[52] Dao, T., Gu, A.: Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060 (2024)

[53] Bouteiller, Y., Ramstedt, S., Beltrame, G., Pal, C., Binas, J.: Reinforcement learning with random delays. In: International Conference on Learning Representations (2020)

[54] Chen, B., Xu, M., Li, L., Zhao, D.: Delay-aware model-based reinforcement learning for continuous control. Neurocomputing 450, 119–128 (2021)

[55] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[56] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: An imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems (2019)

[57] Chatzimanolakis, M., Weber, P., Koumoutsakos, P.: Learning in two dimensions and controlling in three: Generalizable drag reduction strategies for flows past circular cylinders through deep reinforcement learning. Physical Review Fluids 9(4), 043902 (2024)

[58] Roshko, A.: Experiments on the flow past a circular cylinder at very high Reynolds number. Journal of Fluid Mechanics 10(3), 345–356 (1961)

Acknowledgements. We acknowledge support from the UKRI AI for Net Zero grant EP/Y005619/1. J.Z. is supported by the President’s Scholarship at Imperial College London.

Author contribution. J.Z. developed the learning algo...