When Web Apps Heal Themselves: A MAPE-K Based Approach to Fault Tolerance and Adaptive Recovery
Pith reviewed 2026-05-20 05:04 UTC · model grok-4.3
The pith
A MAPE-K framework with AutoFix detects web app faults with a mean F1-score of 90.7% and cuts average time-to-recovery by 56.2%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that their modular self-healing framework, built on the MAPE-K model and incorporating an AutoFix-inspired adaptive mechanism, delivers effective fault tolerance for web applications. Evaluation through design and development research with fault injection in twenty scenarios yielded a mean fault detection F1-score of 90.7 percent, a 93.2 percent recovery success rate, and a 56.2 percent reduction in time-to-recovery down to an average of 3.92 seconds, alongside stable throughput and gains from feedback iterations.
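As a quick consistency check on these figures, the reported reduction and the post-reduction average together imply a baseline recovery time. The sketch below assumes the 56.2 percent reduction is measured relative to the mean baseline time-to-recovery, which the summary does not state explicitly; the function name is illustrative only.

```python
def implied_baseline_ttr(ttr_after_s: float, reduction_pct: float) -> float:
    """Baseline TTR implied by a relative reduction.

    Assumes: ttr_after = baseline * (1 - reduction_pct / 100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return ttr_after_s / (1 - reduction_pct / 100)


# Figures taken from the summary above: 3.92 s average after a 56.2% reduction.
baseline = implied_baseline_ttr(3.92, 56.2)
print(f"implied baseline TTR: {baseline:.2f} s")
```

Under that assumption the baseline works out to roughly 8.95 seconds, so the framework would recover about 2.3 times as fast as the unstated baseline.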
What carries the argument
The MAPE-K loop of monitor-analyze-plan-execute over a shared knowledge base, paired with the AutoFix module that selects and refines recovery actions through iterative feedback.
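The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Knowledge` class, the `high_cpu` fault, the action names, and the optimistic scoring of untried actions are all hypothetical, chosen only to show how a plan step might come to prefer actions that earlier feedback marked as successful.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Knowledge:
    """Shared knowledge base: a threshold plus per-action outcome counts."""
    cpu_limit: float = 0.9
    # fault type -> candidate recovery actions (hypothetical names)
    strategies: Dict[str, List[str]] = field(
        default_factory=lambda: {"high_cpu": ["restart_service", "scale_out"]}
    )
    # (fault, action) -> [successes, attempts]
    stats: Dict[tuple, List[int]] = field(default_factory=dict)


def monitor(probe: Callable[[], dict]) -> dict:
    return probe()


def analyze(metrics: dict, k: Knowledge) -> Optional[str]:
    return "high_cpu" if metrics["cpu"] > k.cpu_limit else None


def plan(fault: str, k: Knowledge) -> str:
    # Prefer the action with the best observed success rate.
    # Untried actions score 1.0 so that each gets tried at least once.
    def score(action: str) -> float:
        successes, attempts = k.stats.get((fault, action), [0, 0])
        return successes / attempts if attempts else 1.0
    return max(k.strategies[fault], key=score)


def execute(action: str, actuators: Dict[str, Callable[[], bool]]) -> bool:
    return actuators[action]()


def mape_k_cycle(probe, actuators, k: Knowledge) -> Optional[str]:
    """Run one monitor-analyze-plan-execute pass and record the outcome."""
    fault = analyze(monitor(probe), k)
    if fault is None:
        return None
    action = plan(fault, k)
    ok = execute(action, actuators)
    record = k.stats.setdefault((fault, action), [0, 0])
    record[0] += int(ok)
    record[1] += 1
    return action


# Toy run: the restart always fails, scaling out always succeeds.
probe = lambda: {"cpu": 0.95}
actuators = {"restart_service": lambda: False, "scale_out": lambda: True}
kb = Knowledge()
chosen = [mape_k_cycle(probe, actuators, kb) for _ in range(3)]
print(chosen)
```

In this toy run the first cycle tries the restart, records the failure in the knowledge base, and later cycles switch to scaling out. That is the feedback-driven refinement in miniature; a real system would need richer fault signatures and some safeguard against oscillating between actions.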
If this is right
- System throughput remains between 88 and 95 percent even while faults are active.
- Average response time rises by only 3.1 percent under fault conditions.
- Iterative feedback raises recovery efficiency by 18.6 percent across repeated cycles.
- The framework supplies a concrete starting point for building more autonomous self-healing web applications.
Where Pith is reading between the lines
- Production deployments might surface fault types absent from the twenty controlled scenarios, requiring updates to the recovery library.
- Embedding learning algorithms in the knowledge base could let the system invent new fixes instead of depending only on predefined ones.
- The same monitor-analyze-plan-execute structure could be adapted to improve resilience in related systems such as microservice clusters or cloud services.
Load-bearing premise
The twenty runtime failure scenarios created through controlled fault injection accurately represent the range and frequency of faults encountered in real-world production web application environments.
What would settle it
Deploying the framework on a live production web application and measuring its actual fault detection F1-score and recovery times against the controlled-experiment results over several weeks of normal operation.
Original abstract
Ensuring the reliability and resilience of modern web applications remains a critical challenge due to increasing system complexity and dynamic runtime environments. This study proposes a modular self-healing framework based on the monitor-analyze-plan-execute over a shared knowledge base (MAPE-K) model, integrated with an AutoFix-inspired mechanism for adaptive fault recovery. Using a design and development research (DDR) approach, the system was implemented and evaluated through controlled fault injection experiments across twenty runtime failure scenarios, including service crashes, memory leaks, and database disconnections. Experimental results demonstrate that the proposed framework achieved a mean fault detection F1-score of 90.7% and a recovery success rate of 93.2%. The AutoFix module reduced the average time-to-recovery (TTR) by 56.2%, achieving an average recovery time of 3.92 seconds. System throughput was maintained between 88% and 95% during fault conditions, with only a 3.1% increase in response time. Additionally, iterative feedback mechanisms improved recovery efficiency by 18.6% over multiple cycles. These findings indicate that the proposed framework provides a practical and extensible approach to enhancing fault tolerance in web applications through feedback-driven adaptation. While the current implementation relies on predefined recovery strategies, the integration of learning-oriented feedback establishes a foundation for future development of more autonomous self-healing systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a modular self-healing framework for web applications based on the MAPE-K model integrated with an AutoFix-inspired mechanism for adaptive fault recovery. Implemented via a design and development research approach, the system is evaluated through controlled fault injection experiments across twenty runtime failure scenarios (including service crashes, memory leaks, and database disconnections). It reports a mean fault detection F1-score of 90.7%, a recovery success rate of 93.2%, a 56.2% reduction in average time-to-recovery (TTR) to 3.92 seconds, throughput maintained between 88% and 95% with a 3.1% response time increase, and an 18.6% improvement in recovery efficiency from iterative feedback mechanisms.
Significance. If the results hold under more rigorous validation, the work offers a practical, extensible approach to fault tolerance in dynamic web applications by combining the MAPE-K loop with feedback-driven adaptation. The concrete metrics on detection accuracy, recovery speed, and system performance under faults provide a foundation for future autonomous self-healing systems, though the current reliance on predefined strategies limits full autonomy claims.
major comments (2)
- Evaluation section: The twenty runtime failure scenarios created through controlled fault injection are presented without a quantitative mapping to observed fault frequencies from production logs, without comparison to public web-app failure datasets, and without sensitivity analysis showing how the F1-score of 90.7% or recovery success rate of 93.2% change when fault probabilities are altered. This makes the central performance claims difficult to extrapolate beyond the chosen test harness.
- Results and Abstract: The manuscript states specific performance numbers (mean F1-score of 90.7%, recovery success of 93.2%, TTR reduction of 56.2%) but supplies no information on baselines, statistical tests, error bars, or the precise calculation methods for F1-score and recovery success. This is load-bearing for assessing whether the data support the stated claims.
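The sensitivity analysis requested in the first major comment could take a form like the sketch below, which re-weights per-fault detection scores by an assumed fault mix. All numbers here are invented for illustration and are not taken from the paper; the fault names echo the three examples the abstract mentions, and `weighted_mean_f1` is a hypothetical helper.

```python
def weighted_mean_f1(f1_by_fault: dict, freq_by_fault: dict) -> float:
    """Mean F1 across fault types, weighted by how often each fault occurs."""
    total = sum(freq_by_fault.values())
    if total <= 0:
        raise ValueError("fault frequencies must sum to a positive value")
    return sum(f1_by_fault[f] * w for f, w in freq_by_fault.items()) / total


# Hypothetical per-fault F1 scores (NOT from the paper).
f1 = {"service_crash": 0.95, "memory_leak": 0.80, "db_disconnect": 0.90}

# Two assumed fault mixes: the even mix of a test harness, and a
# production-like mix dominated by the hardest fault to detect.
uniform = {name: 1.0 for name in f1}
leak_heavy = {"service_crash": 0.1, "memory_leak": 0.8, "db_disconnect": 0.1}

for label, mix in [("uniform", uniform), ("leak-heavy", leak_heavy)]:
    print(f"{label}: {weighted_mean_f1(f1, mix):.3f}")
```

With these made-up inputs the aggregate falls from about 0.883 under the uniform mix to 0.825 under the leak-heavy one, which is the kind of shift the referee is asking the authors to quantify.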
minor comments (2)
- Abstract: The claim that 'iterative feedback mechanisms improved recovery efficiency by 18.6% over multiple cycles' lacks detail on the number of cycles, the exact efficiency metric used, or how the improvement was measured.
- Notation and presentation: Ensure consistent use of terms such as 'AutoFix module' versus 'AutoFix-inspired mechanism' across sections to avoid ambiguity in describing the adaptive recovery component.
Simulated Author's Rebuttal
We are grateful to the referee for the valuable feedback on our manuscript. The comments highlight important aspects for improving the rigor of our evaluation and results reporting. We have prepared point-by-point responses and indicate where revisions will be incorporated.
- Referee: Evaluation section: The twenty runtime failure scenarios created through controlled fault injection are presented without a quantitative mapping to observed fault frequencies from production logs, without comparison to public web-app failure datasets, and without sensitivity analysis showing how the F1-score of 90.7% or recovery success rate of 93.2% change when fault probabilities are altered. This makes the central performance claims difficult to extrapolate beyond the chosen test harness.
  Authors: We recognize the importance of grounding the evaluation in real-world data. The twenty scenarios were carefully chosen to cover a representative range of common web application faults, including crashes, leaks, and disconnections, based on established taxonomies in the software engineering literature. However, obtaining quantitative mappings from production logs would require access to proprietary data from specific deployments, which was beyond the scope of this controlled study. We will expand the Evaluation section to provide a detailed rationale for scenario selection, citing relevant prior work on web app failures. We commit to performing a sensitivity analysis by adjusting fault injection probabilities and reporting the impact on key metrics. A comparison to public datasets will be discussed as a future direction, noting that suitable standardized datasets for runtime web app faults are limited.
  Revision: partial
- Referee: Results and Abstract: The manuscript states specific performance numbers (mean F1-score of 90.7%, recovery success of 93.2%, TTR reduction of 56.2%) but supplies no information on baselines, statistical tests, error bars, or the precise calculation methods for F1-score and recovery success. This is load-bearing for assessing whether the data support the stated claims.
  Authors: We agree that additional details are necessary to substantiate the reported figures. The mean F1-score of 90.7% was derived from aggregating detection performance across all scenarios, using standard definitions of precision and recall where a detection is considered correct if the fault type and location are accurately identified within a time window. The recovery success rate of 93.2% reflects the fraction of cases where the planned recovery actions fully restored the application state. To address this, we will revise the manuscript to include baseline comparisons (e.g., to threshold-based monitoring without MAPE-K), specify the exact formulas and data used for calculations, report standard deviations or confidence intervals from repeated experimental runs, and include appropriate statistical tests to validate the significance of the 56.2% TTR reduction. These changes will be made in the revised version.
  Revision: yes
- The absence of a quantitative mapping to production logs and direct comparisons to public datasets, as the study was based on controlled experiments without access to such real-world data sources.
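The detection-scoring rule described in the authors' second response (a detection counts only if fault type and location match within a time window) can be made concrete as follows. This is a sketch under assumptions: the 5-second window, the tuple layout, and the one-to-one matching of detections to injected faults are guesses for illustration, since the rebuttal does not specify them.

```python
def score_detections(injected, detected, window_s=5.0):
    """Return (precision, recall, f1) for one experimental run.

    Both inputs are lists of (time_s, fault_type, location) tuples.
    A detection is a true positive if it matches the type and location
    of a not-yet-matched injected fault and arrives within `window_s`
    seconds after the injection (assumed rule).
    """
    unmatched = list(injected)
    tp = 0
    for t_det, fault_type, location in detected:
        for fault in unmatched:
            t_inj, inj_type, inj_loc = fault
            in_window = 0 <= t_det - t_inj <= window_s
            if inj_type == fault_type and inj_loc == location and in_window:
                unmatched.remove(fault)
                tp += 1
                break
    fp = len(detected) - tp   # detections that matched no injected fault
    fn = len(unmatched)       # injected faults that were never detected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical run: the crash is caught in time, the leak is flagged too late.
injected = [(0.0, "service_crash", "api"), (10.0, "memory_leak", "worker")]
detected = [(1.2, "service_crash", "api"), (30.0, "memory_leak", "worker")]
print(score_detections(injected, detected))
```

Here the late leak detection counts both as a false positive and as a missed fault, so precision, recall, and F1 all come out at 0.5. Per-run scores computed this way could then be summarized with `statistics.mean` and `statistics.stdev` to provide the error bars the referee requests.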
Circularity Check
No circularity: results are direct experimental measurements
The manuscript presents an implemented MAPE-K self-healing framework evaluated via controlled fault-injection experiments on twenty predefined scenarios. No equations, derivations, fitted parameters, or mathematical predictions appear in the provided text or abstract. Performance metrics (F1-score, recovery success, TTR reduction) are reported as observed outcomes of the test harness rather than quantities derived from or equivalent to the input assumptions by construction. No self-citation chains, ansatzes, or renamings of known results are invoked as load-bearing steps. The evaluation therefore remains self-contained against external benchmarks and receives the default non-circularity finding.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "modular self-healing framework based on the monitor–analyze–plan–execute over a shared knowledge base (MAPE-K) model, integrated with an AutoFix-inspired mechanism"
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "fault detection F1-score of 90.7% and a recovery success rate of 93.2%"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Paper venue: Int J Inf & Commun Technol, Vol. 15, No. 2, June 2026, pp. 729-740. ISSN: 2252-8776