Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

Joris Degroote; Lilla Koloszar; Lorenzo Schena; Miguel Alfonso Mendez; Romain Poletti

arxiv: 2505.18201 · v2 · pith:CATUV2ICnew · submitted 2025-05-21 · 💻 cs.RO · cs.LG

Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez

keywords: hybrid · control · flapping-wing · learning · model-based · model-free · purely · reinforcement

Abstract

Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.
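The policy referee described above can be illustrated with a minimal Python sketch. The abstract only states that the referee uses digital-twin performance and a real-to-virtual consistency ratio; it does not give the selection rule. Everything concrete below is therefore an assumption made for illustration: the names `AgentScore`, `consistency_ratio`, and `referee`, the definition of the ratio as real cost divided by twin cost, the tolerance `tol`, and the choice to keep the currently acting agent when the twin is not trusted.

```python
from dataclasses import dataclass


@dataclass
class AgentScore:
    """Hypothetical container for one agent's evaluation on the digital twin."""
    name: str         # "MF" (model-free) or "MB" (model-based)
    twin_cost: float  # cost of the agent's policy on the digital twin (lower is better)


def consistency_ratio(real_cost: float, twin_cost: float, eps: float = 1e-12) -> float:
    """Assumed definition: cost observed in reality divided by cost predicted by the twin.

    A value near 1 means the twin reproduces the real performance of the live policy.
    """
    return real_cost / max(twin_cost, eps)


def referee(mf: AgentScore, mb: AgentScore, ratio: float, current: str, tol: float = 0.5) -> str:
    """Return the name of the agent that should act in the real environment.

    Assumed rule: if the twin is inconsistent with reality (ratio far from 1),
    its ranking is not trusted and the currently acting agent is kept.
    Otherwise the agent with the lower twin cost is selected (ties go to MF).
    """
    if abs(ratio - 1.0) > tol:
        return current
    return mf.name if mf.twin_cost <= mb.twin_cost else mb.name


# Illustrative numbers only, not results from the paper.
mf = AgentScore("MF", twin_cost=2.0)
mb = AgentScore("MB", twin_cost=1.5)

trusted = referee(mf, mb, consistency_ratio(1.6, 1.5), current="MF")
untrusted = referee(mf, mb, consistency_ratio(6.0, 1.5), current="MF")
print(trusted, untrusted)
```

In the first call the twin matches reality closely, so the sketch hands control to the agent that performs better on the twin; in the second the mismatch is large, so the acting agent is left unchanged. The rule actually used in the paper may differ.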

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Nonlinear System Identification of Variable-Pitch Propellers Using a Wiener Model
eess.SY · 2026-04 · unverdicted · novelty 4.0

A Wiener model with linear dynamics for actuation followed by a static nonlinearity reproduces experimental PWM-to-thrust responses of variable-pitch propellers with good accuracy under stated assumptions.