Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)
Pith reviewed 2026-05-10 08:39 UTC · model grok-4.3
The pith
Fuzzy logic-based adaptive reward shaping improves RL convergence speed, reduces variability, and boosts success rates by up to 5% in drone racing simulations compared to standard rewards.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
Load-bearing premise
That human expert knowledge can be reliably encoded into a fixed set of fuzzy rules that generalize across varying difficulty levels and random seeds without introducing unintended biases or requiring extensive per-scenario retuning.
Original abstract
Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow down exploration and cause agents to get trapped in local optima. This paper presents a fuzzy logic-based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpretable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
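The mechanism the abstract describes — fuzzy rules that adapt reward contributions to the agent state, trading fast motion against precise control — can be sketched minimally. Everything below is an illustrative assumption, not the paper's implementation: the function names, the choice of distance-to-gate as the fuzzified input, the two reward terms, and the 1 m / 5 m breakpoints are all hypothetical.

```python
def ramp_down(x, lo, hi):
    """Membership function: 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def shaped_reward(dist_to_gate, r_speed, r_precision):
    """Blend two reward terms by fuzzy membership of the agent state.

    Near a gate the "near" rule dominates and the precision term is
    weighted up; far from a gate the "far" rule favours raw speed.
    """
    mu_near = ramp_down(dist_to_gate, 1.0, 5.0)  # illustrative breakpoints
    mu_far = 1.0 - mu_near
    # Weighted-average (Sugeno-style) defuzzification of the two rules
    return (mu_far * r_speed + mu_near * r_precision) / (mu_far + mu_near)
```

Because the blend varies continuously with distance, the shaped reward has no discontinuity between the "fast" and "precise" regimes — a plausible reading of the abstract's "smoother transitions," whereas a crisp if/else switch at a fixed distance would introduce a reward jump.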
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
free parameters (1)
- fuzzy membership function parameters
axioms (1)
- Domain assumption: expert knowledge can be encoded into interpretable fuzzy rules that improve RL stability without overfitting to specific scenarios.
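The "unintended biases" worry in this premise has a standard point of comparison: potential-based shaping (Ng et al., listed as [5] in the reference graph below) is the known form of shaping that provably cannot change the optimal policy. A minimal sketch, with a hypothetical potential function — nothing here claims the paper's fuzzy shaping takes this form; the point is that a state-dependent blend of reward terms generally does not, which is why the bias question is load-bearing:

```python
def potential_shaped_reward(r, s, s_next, gamma, phi):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    Ng et al. [5] prove this form leaves the optimal policy unchanged,
    so any bias it adds affects only learning speed, not the final policy.
    """
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential for a 1-D toy task: negative distance to goal state 10.
phi = lambda s: -abs(10 - s)
```

For example, moving from state 8 to state 9 with base reward 1.0 and gamma = 1.0 yields a shaped reward of 2.0: the shaping term adds phi(9) - phi(8) = 1.0 for progress toward the goal.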
Reference graph
Works this paper leans on
- [1] E. Sayar, G. Iacca, and A. Knoll, "Multi-objective evolutionary hindsight experience replay for robot manipulation tasks," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '24), New York, NY, USA: Association for Computing Machinery, 2024, pp. 403–411. Available: https://doi.org/10.1145/3638529.3654045
- [2] E. Sayar, Z. Bing, C. D'Eramo, O. S. Oguz, and A. Knoll, "Contact energy based hindsight experience prioritization," in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 5434–5440.
- [3] V. H. Dang, A. Redder, H. X. Pham, A. Sarabakha, and E. Kayacan, "VDS-Nav: Volumetric depth-based safe navigation for aerial robots – bridging the sim-to-real gap," IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 11038–11045, 2025.
- [4] H. I. Ugurlu, A. Redder, and E. Kayacan, "Lyapunov-inspired deep reinforcement learning for robot navigation in obstacle environments," in 2025 IEEE Symposium on Computational Intelligence on Engineering/Cyber Physical Systems (CIES), 2025, pp. 1–8.
- [5] A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping," in ICML, vol. 99, 1999, pp. 278–287.
- [6] E. Kayacan and R. Maslim, "Type-2 fuzzy logic trajectory tracking control of quadrotor VTOL aircraft with elliptic membership functions," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 1, pp. 339–348, 2017.
- [7] C. Fu, A. Sarabakha, E. Kayacan, C. Wagner, R. John, and J. M. Garibaldi, "Input uncertainty sensitivity enhanced nonsingleton fuzzy logic controllers for long-term navigation of quadrotor UAVs," IEEE/ASME Transactions on Mechatronics, vol. 23, no. 2, pp. 725–734, 2018.
- [8] A. Sarabakha and E. Kayacan, "Online deep fuzzy learning for control of nonlinear systems using expert knowledge," IEEE Transactions on Fuzzy Systems, vol. 28, no. 7, pp. 1492–1503, 2020.
- [9] A. Sarabakha, C. Fu, and E. Kayacan, "Intuit before tuning: Type-1 and type-2 fuzzy logic controllers," Applied Soft Computing, vol. 81, p. 105495, 2019. Available: https://www.sciencedirect.com/science/article/pii/S1568494619302650
- [10] E. Camci, D. R. Kripalani, L. Ma, E. Kayacan, and M. A. Khanesar, "An aerial robot for rice farm quality inspection with type-2 fuzzy neural networks tuned by particle swarm optimization-sliding mode control hybrid algorithm," Swarm and Evolutionary Computation, vol. 41, pp. 1–8, 2018. Available: https://www.sciencedirect.com/science/article/pii/...
- [11] E. Camci and E. Kayacan, "Game of drones: UAV pursuit-evasion game with type-2 fuzzy logic controllers tuned by reinforcement learning," in 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2016, pp. 618–625.
- [12] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017. Available: https://arxiv.org/abs/1707.06347
- [13] M. Chen, H. K. Lam, Q. Shi, and B. Xiao, "Reinforcement learning-based control of nonlinear systems using Lyapunov stability concept and fuzzy reward scheme," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 10, pp. 2059–2063, 2020.
- [14] M. C. Bingol, "A safe navigation algorithm for differential-drive mobile robots by using fuzzy logic reward function-based deep reinforcement learning," Electronics, vol. 14, no. 8, 2025.
- [15] M. Faessler, A. Franchi, and D. Scaramuzza, "Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 620–626, 2018.
- [16] H. I. Ugurlu, X. H. Pham, and E. Kayacan, "Sim-to-real deep reinforcement learning for safe end-to-end planning of aerial robots," Robotics, vol. 11, no. 5, 2022. Available: https://www.mdpi.com/2218-6581/11/5/109
- [17] Z. Qiao, X. H. Pham, S. Ramasamy, X. Jiang, E. Kayacan, and A. Sarabakha, "Continual learning for robust gate detection under dynamic lighting in autonomous drone racing," in 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8.
- [18] H. X. Pham, A. Sarabakha, M. Odnoshyvkin, and E. Kayacan, "PencilNet: Zero-shot sim-to-real transfer learning for robust gate perception in autonomous drone racing," IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11847–11854, 2022.
- [19] K. F. Andersen, H. X. Pham, H. I. Ugurlu, and E. Kayacan, "Event-based navigation for autonomous drone racing with sparse gated recurrent network," in 2022 European Control Conference (ECC), 2022, pp. 1342–1348.
- [20] H. X. Pham, H. I. Ugurlu, J. Le Fevre, D. Bardakci, and E. Kayacan, "Chapter 15 – Deep learning for vision-based navigation in autonomous drone racing," in Deep Learning for Robot Perception and Cognition, A. Iosifidis and A. Tefas, Eds. Academic Press, 2022, pp. 371–406.
- [21] H. X. Pham, I. Bozcan, A. Sarabakha, S. Haddadin, and E. Kayacan, "GateNet: An efficient deep neural network architecture for gate perception using fish-eye camera in autonomous drone racing," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 4176–4183.
- [22] T. Morales, A. Sarabakha, and E. Kayacan, "Image generation for efficient neural network training in autonomous drone racing," in 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
- [23] H. Wang, J. Xing, N. Messikommer, and D. Scaramuzza, "Environment as policy: Learning to race in unseen tracks," 2026.
- [24] F. Yu, Y. Hu, Y. Su, Y. Deng, L. Zhang, and D. Zou, "Mastering diverse, unknown, and cluttered tracks for robust vision-based drone racing," 2025.
- [25] M. L. Puterman, "Chapter 8: Markov decision processes," in Stochastic Models, ser. Handbooks in Operations Research and Management Science, vol. 2. Elsevier, 1990, pp. 331–434. Available: https://www.sciencedirect.com/science/article/pii/S0927050705801720
- [26] M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Munoz, X. Yao, R. Zurbrügg, N. Rudin et al., "Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning," arXiv preprint arXiv:2511.04831, 2025.
- [27] E. Mamdani, "Application of fuzzy algorithms for control of simple dynamic plant," Proceedings of the Institution of Electrical Engineers, vol. 121, pp. 1585–1588, 1974. Available: https://digital-library.theiet.org/doi/abs/10.1049/piee.1974.0328
- [28] M. Sugeno, Industrial Applications of Fuzzy Control. USA: Elsevier Science Inc., 1985.