arxiv: 2605.11981 · v1 · submitted 2026-05-12 · ⚛️ physics.flu-dyn · cs.AI


High-lift Wing Separation Control via Bayesian Optimization and Deep Reinforcement Learning

Bernat Font, Ivette Rodriguez, Oriol Lehmkuhl, Ricard Montalà, Ricardo Vinuesa

Pith reviewed 2026-05-13 04:52 UTC · model grok-4.3


keywords: active flow control, high-lift wing, Bayesian optimization, deep reinforcement learning, synthetic jets, flow separation, large-eddy simulation, aerodynamic efficiency


The pith

Bayesian optimization identifies steady jet velocities that raise high-lift wing efficiency by 10.9 percent via 9.7 percent drag reduction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors compare two strategies for setting synthetic jet strengths to reduce flow separation on a three-element high-lift wing at 23 degrees angle of attack. Bayesian optimization searches over constant jet velocities on the slat, main element, and flap and locates values that cut drag by 9.7 percent while lift stays the same, producing a 10.9 percent efficiency gain. Deep reinforcement learning, which adjusts the jets from live flow sensor readings, yields only small shifts in lift and drag and almost no efficiency improvement. Training records show that the reward function's penalties restricted how far the learning agent could explore new settings. The comparison indicates that open-loop search currently outperforms the closed-loop method for this high-Reynolds-number control problem.

Core claim

The Bayesian optimization framework successfully identified steady jet velocities that increased efficiency by +10.9% through a -9.7% drag reduction while maintaining lift. In contrast, the DRL agent, despite leveraging instantaneous flow information from distributed sensors, achieved only minor improvements in lift and drag, with negligible efficiency gain. Training analysis indicated that the penalty-dominated reward constrained exploration.
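As a quick consistency check on these figures, the short sketch below assumes that efficiency means the lift-to-drag ratio E = Cl/Cd, which is how the referee report further down describes it. The variable names are illustrative and do not come from the paper. Under that assumption, a 9.7 percent drag cut at exactly constant lift would give a gain of about 10.7 percent, so the reported 10.9 percent is consistent with lift staying within roughly 0.1 percent of its baseline.

```python
# Assumption: efficiency is the lift-to-drag ratio, E = Cl / Cd.
drag_change = -0.097        # reported relative change in drag
efficiency_change = 0.109   # reported relative change in efficiency

# Efficiency gain that would follow if lift were exactly unchanged.
gain_at_constant_lift = 1.0 / (1.0 + drag_change) - 1.0

# Relative lift change implied by the two reported figures taken together.
implied_lift_change = (1.0 + efficiency_change) * (1.0 + drag_change) - 1.0

print(f"gain at constant lift: {gain_at_constant_lift:+.2%}")
print(f"implied lift change:   {implied_lift_change:+.2%}")
```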

What carries the argument

Bayesian optimization search over steady synthetic jet velocities placed on the slat, main wing element, and flap.
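To make the idea concrete, here is a minimal sketch of a Bayesian optimization loop over three steady jet velocities. It is not the authors' implementation: this summary does not state the kernel, acquisition function, or number of evaluations, so the squared-exponential kernel, the expected-improvement acquisition, the normalised velocity range [0, 1], and the cheap `toy_efficiency` function standing in for an LES run are all assumptions made for illustration.

```python
import math
import numpy as np

# Hypothetical optimum, for illustration only (not values from the paper).
TOY_OPTIMUM = np.array([0.6, 0.3, 0.5])

def toy_efficiency(v):
    """Cheap stand-in for one LES run: a score for jet velocities v."""
    return -float(np.sum((v - TOY_OPTIMUM) ** 2))

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two sets of points (assumed choice)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf(x_train, x_query)
    mean = k_s.T @ np.linalg.solve(k, y_train)
    var = 1.0 - np.einsum("ij,ij->j", k_s, np.linalg.solve(k, k_s))
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximisation)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, dim=3, n_init=5, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_init, dim))   # initial random designs
    y = np.array([objective(p) for p in x])
    for _ in range(n_iter):
        y_scaled = (y - y.mean()) / y.std()          # standardise for the GP
        candidates = rng.uniform(0.0, 1.0, size=(2000, dim))
        mean, std = gp_posterior(x, y_scaled, candidates)
        ei = expected_improvement(mean, std, y_scaled.max())
        x_next = candidates[np.argmax(ei)]           # most promising candidate
        x = np.vstack([x, x_next])
        y = np.append(y, objective(x_next))          # "run" the expensive case
    best = np.argmax(y)
    return x[best], y[best]

best_v, best_e = bayes_opt(toy_efficiency)
print("best jet velocities (slat, main, flap):", best_v)
print("best toy score:", best_e)
```

Each call to the objective corresponds to one expensive simulation in the real problem, which is why the surrogate model does the searching and the evaluation budget stays small.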

If this is right

Steady jet velocities located by Bayesian optimization reduce drag by 9.7 percent while lift remains unchanged.
Aerodynamic efficiency rises by 10.9 percent under these fixed jet settings.
Deep reinforcement learning produces only minor lift and drag changes when its reward heavily penalizes deviations.
Wall-resolved large-eddy simulations reproduce the baseline flow in agreement with prior measurements.
Reward design must allow greater exploration before closed-loop control can match open-loop search performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Open-loop Bayesian optimization currently provides a more practical route than the tested reinforcement learning setup for reducing separation at this Reynolds number.
A less penalty-heavy reward could allow the reinforcement learning agent to reach or exceed the efficiency gains of the open-loop method.
The jet velocities identified here supply a specific target that could be checked in a wind-tunnel experiment.
The same optimization approach could be applied at other angles of attack to locate efficient control settings across the operating range.

Load-bearing premise

The penalties built into the reinforcement learning reward were the main reason the agent did not discover stronger control settings.

What would settle it

Re-train the reinforcement learning agent with a reward that rewards efficiency gains more directly and check whether the resulting efficiency improvement reaches the 10.9 percent level found by Bayesian optimization.
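The contrast between a penalty-heavy reward and a direct efficiency reward can be sketched as follows. The paper's actual reward is not reproduced in this summary, so both functions, the penalty weight, and every coefficient value here are hypothetical and chosen only to illustrate the mechanism: an exploratory action that cuts drag but briefly loses a little lift scores worse than doing nothing under the penalised reward, yet scores positively when efficiency is rewarded directly.

```python
def penalty_reward(cl, cd, cl0, cd0, weight):
    """Hypothetical reward: relative drag gain minus a weighted lift-loss penalty."""
    drag_gain = (cd0 - cd) / cd0
    lift_loss = max(0.0, (cl0 - cl) / cl0)
    return drag_gain - weight * lift_loss

def efficiency_reward(cl, cd, cl0, cd0):
    """Hypothetical reward: relative change in lift-to-drag ratio."""
    return (cl / cd) / (cl0 / cd0) - 1.0

# Invented baseline and exploratory-action coefficients (not from the paper).
cl0, cd0 = 3.0, 0.10      # uncontrolled case
cl, cd = 2.94, 0.092      # 2 percent lift loss, 8 percent drag reduction

r_penalty = penalty_reward(cl, cd, cl0, cd0, weight=10.0)
r_efficiency = efficiency_reward(cl, cd, cl0, cd0)

print(f"penalty-heavy reward: {r_penalty:+.3f}")     # below the do-nothing value of 0
print(f"efficiency reward:    {r_efficiency:+.3f}")  # above the do-nothing value of 0
```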

Figures

Figures reproduced from arXiv: 2605.11981 by Bernat Font, Ivette Rodriguez, Oriol Lehmkuhl, Ricard Montalà, Ricardo Vinuesa.

**Figure 1.** CFD-DRL set-up. To enable three-dimensional actuation without excessive computational cost, a multi-agent reinforcement learning (MARL) framework is employed [35, 18, 19, 24]. The computational domain is then divided into multiple pseudo-environments, each containing its own set of jets (slat, main, and flap), while all pseudo-environments share a single agent. Thus, based on local flow states, the agent …

**Figure 2.** Pressure coefficient Cp (left) and skin friction coefficient Cf (right) distributions for the nonactuated case over the walls of the wing. The skin friction coefficient is only shown on the suction side of the main and flap elements. The time-averaged aerodynamic coefficients obtained with each mesh are reported in …

**Figure 3.** Instantaneous fields from the coarse (left panel) and fine meshes (right panel) in the uncon…

**Figure 4.** Efficiency E (left), lift coefficient Cl (middle), and drag coefficient Cd (right) manifolds predicted by the GP models as a function of the slat and main jet velocities

**Figure 5.** Slice A (left panel) and B (right panel) of the efficiency …

**Figure 6.** Streamlines of the time-averaged flow field with velocity magnitude in the background for the …

**Figure 7.** Lift coefficient Cl (left), drag coefficient Cd (middle) and local reward r (right) evolution during the DRL training. All quantities are averaged over the last 4U∞/c of each episode and values from each MARL episode are concatenated sequentially. Consequently, the reward r increases over the course of training, indicating a certain degree of learning. However, this improvement primarily stems from the re…

Original abstract

This study investigates active flow control (AFC) of a 30P30N high-lift wing at a Reynolds number $Re_c$ = 450,000 and angle of attack $\alpha$ = 23$^\circ$ using wall-resolved large-eddy simulations (LES). Two optimization strategies are explored: open-loop Bayesian optimization (BO) and closed-loop deep reinforcement learning (DRL), both targeting the mitigation of stall and the improvement of aerodynamic efficiency via synthetic jets on the slat, main, and flap elements. The uncontrolled configuration was validated against literature data, confirming the reliability of the LES setup. The BO framework successfully identified steady jet velocities that increased efficiency by +10.9% through a -9.7% drag reduction while maintaining lift. In contrast, the DRL agent, despite leveraging instantaneous flow information from distributed sensors, achieved only minor improvements in lift and drag, with negligible efficiency gain. Training analysis indicated that the penalty-dominated reward constrained exploration. These results highlight the need for carefully designed rewards and computational acceleration strategies in DRL-based flow control at high Reynolds numbers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 3 minor

Summary. This paper investigates active flow control on a 30P30N high-lift wing at Re_c = 450,000 and α = 23° using wall-resolved LES. It compares open-loop Bayesian optimization (BO) for identifying steady synthetic jet velocities on the slat, main element, and flap against closed-loop deep reinforcement learning (DRL) that uses instantaneous sensor data. The uncontrolled LES is validated against literature data. BO yields a +10.9% efficiency gain via -9.7% drag reduction at fixed lift, while DRL produces only minor lift/drag changes with negligible efficiency improvement, which the authors attribute to a penalty-dominated reward function limiting exploration.

Significance. If the numerical results hold, the work supplies a concrete, validated demonstration that BO can locate effective steady actuation parameters for stall mitigation in a realistic high-lift configuration, delivering a quantifiable aerodynamic-efficiency improvement. The side-by-side comparison with DRL highlights practical difficulties in applying reinforcement learning to high-Re turbulent flows and points to reward design as a key area for future refinement. The explicit LES validation and reporting of specific percentage gains strengthen the contribution to computational aerodynamics and data-driven flow control.

major comments (1)

[DRL training analysis] DRL training analysis section: the attribution of limited DRL performance primarily to the penalty-dominated reward function is interpretive rather than demonstrated; no ablation studies or alternative reward formulations are presented to isolate this factor from sensor placement, episode length, or controlled-case LES fidelity, weakening the contrast drawn with the BO results.

minor comments (3)

[Abstract] Abstract and §3: the definition of aerodynamic efficiency as the lift-to-drag ratio is stated but should be repeated explicitly when the +10.9% figure is first introduced to avoid any ambiguity for readers.
[Methods] Methods: the specific BO acquisition function, kernel choice, and number of evaluations are not detailed; adding these parameters would improve reproducibility of the reported optimum jet velocities.
[Figures and tables] Figure captions and tables: axis labels and units for jet velocity, lift, and drag coefficients should be checked for consistency with the text; several captions are terse and would benefit from one additional sentence describing the key trend shown.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We have carefully considered the major comment and provide a point-by-point response below. We are prepared to revise the manuscript to address the concerns raised.

Point-by-point responses

Referee: [DRL training analysis] DRL training analysis section: the attribution of limited DRL performance primarily to the penalty-dominated reward function is interpretive rather than demonstrated; no ablation studies or alternative reward formulations are presented to isolate this factor from sensor placement, episode length, or controlled-case LES fidelity, weakening the contrast drawn with the BO results.

Authors: We agree that the attribution of limited DRL performance to the penalty-dominated reward function is interpretive, as it is derived from analysis of the observed training curves and reward component breakdowns rather than from controlled ablation experiments. The manuscript's training analysis shows that the penalty term rapidly dominated the cumulative reward, which we interpret as limiting exploration of lift-enhancing or drag-reducing actions. We acknowledge that factors such as sensor placement, episode length, and LES fidelity in the controlled cases could also play a role, and that the absence of ablations weakens the strength of the contrast with the BO results. In the revised manuscript, we will expand the DRL training analysis section with additional details on the reward formulation, include supplementary plots breaking down the individual reward terms over training episodes, and explicitly qualify our conclusions as interpretive while noting the computational constraints that precluded ablations. We will also moderate the language comparing DRL and BO outcomes to reflect these limitations.

Revision: yes

standing simulated objections not resolved

We cannot perform the requested ablation studies or test alternative reward formulations, as they would require extensive additional wall-resolved LES computations at Re_c = 450,000 that exceed available resources.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper reports direct outcomes from wall-resolved LES of the 30P30N configuration at fixed Re and alpha, validated against external literature data for the uncontrolled case. Bayesian optimization and DRL are then applied as standard black-box optimizers to search for jet velocities; the +10.9% efficiency gain is the numerical result of those runs, not a quantity fitted or defined in terms of itself. The observation that the penalty term limited DRL exploration is an empirical training-log conclusion, not a self-referential definition. No load-bearing step reduces to a self-citation, ansatz smuggled via prior work, or renaming of a known result; the derivation chain is self-contained against external benchmarks and computational experiments.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claims depend on the fidelity of the CFD model and the appropriateness of the reward function in DRL, with jet velocities being outputs of the optimization rather than inputs.

axioms (1)

Domain assumption: The wall-resolved large-eddy simulation accurately represents the flow physics for both uncontrolled and controlled cases at the given Reynolds number. Validation is provided only for the uncontrolled configuration against literature.



Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 1 internal anchor

[1] Thibert J J, Reneaux J, Moens F and Preist J 1995 The Aeronautical Journal 99 395–411
[2] Klausmeyer S M and Lin J C 1997 Comparative results from a CFD challenge over a 2D three-element high-lift airfoil Tech. Rep. NASA TM-112858, NASA Langley Research Center, NASA Technical Memorandum
[3] Choudhari M and Lockard D 2015 Assessment of slat noise predictions for 30P30N high-lift configuration from BANC-III workshop 21st AIAA/CEAS Aeroacoustics Conference
[4] Pascioni K, Cattafesta L and Choudhari M 2014 An experimental investigation of the 30P30N multi-element high-lift airfoil 20th AIAA/CEAS Aeroacoustics Conference
[5] Ashton N, West A and Mendonça F 2016 AIAA Journal 54 3657–3667
[6] Gao J, Li X and Lin D 2020 AIAA Journal 58 2517–2532
[7] Ueno Y and Ochi A 2019 Airframe noise prediction using Navier-Stokes code with cartesian and boundary-fitted layer meshes 25th AIAA/CEAS Aeroacoustics Conference
[8] Shur M, Strelets M, Spalart P and Travin A 2023 Journal of Turbulence 24 554–576
[9] Montalà R, Lehmkuhl O and Rodriguez I 2024 Physics of Fluids 36 025125 ISSN 1070-6631
[10] Montalà R, Lehmkuhl O and Rodriguez I 2025 Flow, Turbulence and Combustion 115 51–77
[11] You D and Moin P 2008 Journal of Fluids and Structures 24 1349–1357 ISSN 0889-9746 Unsteady Separated Flows and their Control
[12] Rodriguez I, Lehmkuhl O and Borrell R 2020 Flow, Turbulence and Combustion 105 607–626
[13] Melton L, Yao C and Seifert A 2006 AIAA Journal 44 012017
[14] Shmilovich A and Yadlin Y 2009 Journal of Aircraft 46 1354–1364
[15] Lehmkuhl O, Lozano-Durán A and Rodriguez I 2020 Journal of Physics: Conference Series 1522 012017
[16] Rabault J, Kuchta M, Jensen A, Réglade U and Cerardi N 2019 J. Fluid Mech. 865 281–302
[17] Rabault J and Kuhnle A 2019 Phys. Fluids 31 094105 ISSN 1070-6631
[18] Suárez P, Alcántara-Ávila F, Rabault J, Miró A, Font B, Lehmkuhl O and Vinuesa R 2025 Commun. Eng. 4
[19] Suárez P, Alcántara-Ávila F, Rabault J, Miró A, Font B, Lehmkuhl O and Vinuesa R 2025 Flow Turbul. Combust. 115 3–27
[20] Guastoni L, Rabault J, Schlatter P, Azizpour H and Vinuesa R 2023 Eur. Phys. J. E 46 27
[21] Vasanth J, Rabault J, Alcántara-Ávila F, Mortensen M and Vinuesa R 2024 Flow Turbul. Combust.
[22] Wang Y, Mei Y, Aubry N, Chen Z, Wu P and Wu W 2022 Phys. Fluids 34 033606
[23] Garcia X, Miró A, Suárez P, Alcántara-Ávila F, Rabault J, Font B, Lehmkuhl O and Vinuesa R 2025 Int. J. Heat Fluid Flow 116 109913 ISSN 0142-727X
[24] Font B, Alcántara-Ávila F, Rabault J, Vinuesa R and Lehmkuhl O 2025 Nat. Commun. 16 1422
[25] Montalà R, Font B, Suárez P, Rabault J, Lehmkuhl O, Vinuesa R and Rodriguez I 2025 Deep reinforcement learning for active flow control around a three-dimensional flow-separated wing at Re = 1,000 https://doi.org/10.48550/arXiv.2509.10195
[26] Montalà R, Font B, Suárez P, Rabault J, Lehmkuhl O, Vinuesa R and Rodriguez I 2025 Discovering flow separation control strategies in 3D wings via deep reinforcement learning https://doi.org/10.48550/arXiv.2509.10185
[27] Morita Y, Rezaeiravesh S, Tabatabaei N, Vinuesa R, Fukagata K and Schlatter P 2022 Journal of Computational Physics 449 110788 ISSN 0021-9991
[28] Mahfoze O A, Moody A, Wynn A, Whalley R D and Laizet S 2019 Phys. Rev. Fluids 4(9) 094601
[29] Han B Z, Huang W X and Xu C X 2023 Physics of Fluids 35 115144 ISSN 1070-6631
[30] Li Y, Noack B R, Wang T, Cornejo Maceda G Y, Pickering E, Shaqarin T and Tyliszczak A 2024 Journal of Fluid Mechanics 991 A5
[31] Vreman A W 2004 Physics of Fluids 16 3670–3681
[32] Gasparino L, Spiga F and Lehmkuhl O 2024 Computer Physics Communications 297 109067 ISSN 0010-4655
[33] Guadarrama S, Korattikara A, Ramirez O, Castro P, Holly E, Fishman S, Wang K, Gonina E, Wu N, Kokiopoulou E, Sbaiz L, Smith J, Bartók G, Berent J, Harris C, Vanhoucke V and Brevdo E 2018 TF-Agents: A library for reinforcement learning in TensorFlow URL https://github.com/tensorflow/agents
[34] Partee S, Ellis M, Rigazzi A, Shao A, Bachman S, Marques G and Robbins B 2022 J. Comput. Sci. 62 101707 ISSN 1877-7503
[35] Belus V, Rabault J, Viquerat J, Che Z, Hachem E and Reglade U 2019 AIP Advances 9 125014 ISSN 2158-3226
[36] Schulman J, Wolski F, Dhariwal P, Radford A and Klimov O 2017 Proximal policy optimization algorithms https://doi.org/10.48550/arXiv.1707.06347
[37] Piomelli U and Chasnov J R 1996 Large-Eddy Simulations: Theory and Applications (Dordrecht: Springer Netherlands) pp 269–336 ISBN 978-94-015-8666-5