Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)
Pith reviewed 2026-05-10 08:39 UTC · model grok-4.3
The pith
Fuzzy logic-based adaptive reward shaping improves RL convergence speed, reduces variability, and boosts success rates by up to 5% in drone racing simulations compared to standard rewards.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
Load-bearing premise
That human expert knowledge can be reliably encoded into a fixed set of fuzzy rules that generalize across varying difficulty levels and random seeds without introducing unintended biases or requiring extensive per-scenario retuning.
Original abstract
Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow down exploration and cause agents to get trapped in local optima. This paper presents a fuzzy logic-based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpretable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
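The mechanism the abstract describes — fuzzy rules that adapt reward contributions to the agent state, trading fast motion against precise control — can be sketched minimally. Everything below is an illustrative assumption, not the paper's implementation: the function names, the choice of distance-to-gate as the fuzzified input, the two reward terms, and the 1 m / 5 m breakpoints are all hypothetical.

```python
def ramp_down(x, lo, hi):
    """Membership function: 1 below lo, 0 above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)

def shaped_reward(dist_to_gate, r_speed, r_precision):
    """Blend two reward terms by fuzzy membership of the agent state.

    Near a gate the "near" rule dominates and the precision term is
    weighted up; far from a gate the "far" rule favours raw speed.
    """
    mu_near = ramp_down(dist_to_gate, 1.0, 5.0)  # illustrative breakpoints
    mu_far = 1.0 - mu_near
    # Weighted-average (Sugeno-style) defuzzification of the two rules
    return (mu_far * r_speed + mu_near * r_precision) / (mu_far + mu_near)
```

Because the blend varies continuously with distance, the shaped reward has no discontinuity between the "fast" and "precise" regimes — a plausible reading of the abstract's "smoother transitions," whereas a crisp if/else switch at a fixed distance would introduce a reward jump.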
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
free parameters (1)
- fuzzy membership function parameters
axioms (1)
- Domain assumption: expert knowledge can be encoded into interpretable fuzzy rules that improve RL stability without overfitting to specific scenarios.
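The "unintended biases" worry in this premise has a standard point of comparison: potential-based shaping (Ng et al., listed as [5] in the reference graph below) is the known form of shaping that provably cannot change the optimal policy. A minimal sketch, with a hypothetical potential function — nothing here claims the paper's fuzzy shaping takes this form; the point is that a state-dependent blend of reward terms generally does not, which is why the bias question is load-bearing:

```python
def potential_shaped_reward(r, s, s_next, gamma, phi):
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    Ng et al. [5] prove this form leaves the optimal policy unchanged,
    so any bias it adds affects only learning speed, not the final policy.
    """
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential for a 1-D toy task: negative distance to goal state 10.
phi = lambda s: -abs(10 - s)
```

For example, moving from state 8 to state 9 with base reward 1.0 and gamma = 1.0 yields a shaped reward of 2.0: the shaping term adds phi(9) - phi(8) = 1.0 for progress toward the goal.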
Reference graph
Works this paper leans on
- [1] E. Sayar, G. Iacca, and A. Knoll, "Multi-objective evolutionary hindsight experience replay for robot manipulation tasks," in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '24), New York, NY, USA: Association for Computing Machinery, 2024, pp. 403–411. Available: https://doi.org/10.1145/3638529.3654045
- [2] E. Sayar, Z. Bing, C. D'Eramo, O. S. Oguz, and A. Knoll, "Contact energy based hindsight experience prioritization," in 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 5434–5440.
- [3] V. H. Dang, A. Redder, H. X. Pham, A. Sarabakha, and E. Kayacan, "VDS-Nav: Volumetric depth-based safe navigation for aerial robots – bridging the sim-to-real gap," IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 11038–11045, 2025.
- [4] H. I. Ugurlu, A. Redder, and E. Kayacan, "Lyapunov-inspired deep reinforcement learning for robot navigation in obstacle environments," in 2025 IEEE Symposium on Computational Intelligence on Engineering/Cyber Physical Systems (CIES), 2025, pp. 1–8.
- [5] A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: Theory and application to reward shaping," in ICML, vol. 99, 1999, pp. 278–287.
- [6] E. Kayacan and R. Maslim, "Type-2 fuzzy logic trajectory tracking control of quadrotor VTOL aircraft with elliptic membership functions," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 1, pp. 339–348, 2017.
- [7] C. Fu, A. Sarabakha, E. Kayacan, C. Wagner, R. John, and J. M. Garibaldi, "Input uncertainty sensitivity enhanced nonsingleton fuzzy logic controllers for long-term navigation of quadrotor UAVs," IEEE/ASME Transactions on Mechatronics, vol. 23, no. 2, pp. 725–734, 2018.
- [8] A. Sarabakha and E. Kayacan, "Online deep fuzzy learning for control of nonlinear systems using expert knowledge," IEEE Transactions on Fuzzy Systems, vol. 28, no. 7, pp. 1492–1503, 2020.
- [9] A. Sarabakha, C. Fu, and E. Kayacan, "Intuit before tuning: Type-1 and type-2 fuzzy logic controllers," Applied Soft Computing, vol. 81, p. 105495, 2019. Available: https://www.sciencedirect.com/science/article/pii/S1568494619302650
- [10] E. Camci, D. R. Kripalani, L. Ma, E. Kayacan, and M. A. Khanesar, "An aerial robot for rice farm quality inspection with type-2 fuzzy neural networks tuned by particle swarm optimization-sliding mode control hybrid algorithm," Swarm and Evolutionary Computation, vol. 41, pp. 1–8, 2018. Available: https://www.sciencedirect.com/science/article/pii/...
- [11] E. Camci and E. Kayacan, "Game of drones: UAV pursuit-evasion game with type-2 fuzzy logic controllers tuned by reinforcement learning," in 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2016, pp. 618–625.
- [12] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," 2017. Available: https://arxiv.org/abs/1707.06347
- [13] M. Chen, H. K. Lam, Q. Shi, and B. Xiao, "Reinforcement learning-based control of nonlinear systems using Lyapunov stability concept and fuzzy reward scheme," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 10, pp. 2059–2063, 2020.
- [14] M. C. Bingol, "A safe navigation algorithm for differential-drive mobile robots by using fuzzy logic reward function-based deep reinforcement learning," Electronics, vol. 14, no. 8, 2025.
- [15] M. Faessler, A. Franchi, and D. Scaramuzza, "Differential flatness of quadrotor dynamics subject to rotor drag for accurate tracking of high-speed trajectories," IEEE Robotics and Automation Letters, vol. 3, no. 2, pp. 620–626, 2018.
- [16] H. I. Ugurlu, X. H. Pham, and E. Kayacan, "Sim-to-real deep reinforcement learning for safe end-to-end planning of aerial robots," Robotics, vol. 11, no. 5, 2022. Available: https://www.mdpi.com/2218-6581/11/5/109
- [17] Z. Qiao, X. H. Pham, S. Ramasamy, X. Jiang, E. Kayacan, and A. Sarabakha, "Continual learning for robust gate detection under dynamic lighting in autonomous drone racing," in 2024 International Joint Conference on Neural Networks (IJCNN), 2024, pp. 1–8.
- [18] H. X. Pham, A. Sarabakha, M. Odnoshyvkin, and E. Kayacan, "PencilNet: Zero-shot sim-to-real transfer learning for robust gate perception in autonomous drone racing," IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 11847–11854, 2022.
- [19] K. F. Andersen, H. X. Pham, H. I. Ugurlu, and E. Kayacan, "Event-based navigation for autonomous drone racing with sparse gated recurrent network," in 2022 European Control Conference (ECC), 2022, pp. 1342–1348.
- [20] H. X. Pham, H. I. Ugurlu, J. Le Fevre, D. Bardakci, and E. Kayacan, "Chapter 15 – Deep learning for vision-based navigation in autonomous drone racing," in Deep Learning for Robot Perception and Cognition, A. Iosifidis and A. Tefas, Eds. Academic Press, 2022, pp. 371–406.
- [21] H. X. Pham, I. Bozcan, A. Sarabakha, S. Haddadin, and E. Kayacan, "GateNet: An efficient deep neural network architecture for gate perception using fish-eye camera in autonomous drone racing," in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 4176–4183.
- [22] T. Morales, A. Sarabakha, and E. Kayacan, "Image generation for efficient neural network training in autonomous drone racing," in 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
- [23] H. Wang, J. Xing, N. Messikommer, and D. Scaramuzza, "Environment as policy: Learning to race in unseen tracks," 2026.
- [24] F. Yu, Y. Hu, Y. Su, Y. Deng, L. Zhang, and D. Zou, "Mastering diverse, unknown, and cluttered tracks for robust vision-based drone racing," 2025.
- [25] M. L. Puterman, "Chapter 8: Markov decision processes," in Stochastic Models, ser. Handbooks in Operations Research and Management Science, vol. 2. Elsevier, 1990, pp. 331–434. Available: https://www.sciencedirect.com/science/article/pii/S0927050705801720
- [26] M. Mittal, P. Roth, J. Tigue, A. Richard, O. Zhang, P. Du, A. Serrano-Munoz, X. Yao, R. Zurbrügg, N. Rudin et al., "Isaac Lab: A GPU-accelerated simulation framework for multi-modal robot learning," arXiv preprint arXiv:2511.04831, 2025.
- [27] E. Mamdani, "Application of fuzzy algorithms for control of simple dynamic plant," Proceedings of the Institution of Electrical Engineers, vol. 121, pp. 1585–1588, 1974. Available: https://digital-library.theiet.org/doi/abs/10.1049/piee.1974.0328
- [28] M. Sugeno, Industrial Applications of Fuzzy Control. USA: Elsevier Science Inc., 1985.