
arxiv: 2607.01595 · v1 · pith:UHRP43IOnew · submitted 2026-07-02 · 💻 cs.AI · cs.CL

Safe and Adaptive Cloud Healing: Verifying LLM-Generated Recovery Plans with a Neural-Symbolic World Model

Pith reviewed 2026-07-03 14:55 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords cloud self-healing · LLM plan synthesis · neural-symbolic verification · fault recovery · program synthesis · deep reinforcement learning · adaptive systems · world model simulation

The pith

PASE turns cloud fault recovery into neuro-symbolic program synthesis by having an LLM generate plans that a world model then verifies through simulation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PASE as a self-healing system that treats recovery planning as a synthesis task rather than a search over fixed actions. An LLM produces structured plans drawn from semantic primitives, a neural-symbolic world model simulates each plan to check feasibility, and a DRL-trained meta-prompt optimizer improves the prompts fed to the LLM. The resulting closed loop of reason-plan-verify-adapt is claimed to handle both known and unknown faults more effectively than prior sequential LLM-plus-DRL pipelines. Experiments on a real-world cloud fault-injection dataset are reported to cut average recovery time by more than 40 percent and to raise fault detection accuracy on previously unseen faults.

Core claim

PASE reconceptualizes recovery as a neuro-symbolic program synthesis task. It employs an LLM as a core Plan Synthesis Engine to generate structured recovery plans from a library of semantic primitives. A Neural-Symbolic World Model verifies plan feasibility through simulation, while a Meta-Prompt Optimizer, trained via DRL, learns to generate optimal prompts that guide the LLM's planning process. This tight reason-plan-verify-adapt loop enables dynamic, context-aware recovery strategy generation beyond predefined action spaces.
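The control flow implied by this claim can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: every name here (`Plan`, `heal`, the `toy_*` stubs, the threshold value `tau=0.8`, the retry count) is hypothetical, the LLM and world model are replaced by plain callables, and the DRL-trained optimizer is reduced to a function that sees the attempt number. The only rule taken from the source is the gate described in the Figure 2 caption: a plan is executed only when its feasibility score exceeds a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    steps: List[str]  # ordered names of recovery primitives (hypothetical format)


def heal(
    observation: str,
    make_prompt: Callable[[str, int], str],  # stand-in for the meta-prompt optimizer
    synthesize: Callable[[str], Plan],       # stand-in for the LLM plan synthesizer
    feasibility: Callable[[Plan], float],    # stand-in for the world model score
    tau: float = 0.8,                        # assumed threshold, not from the paper
    max_attempts: int = 3,                   # assumed retry budget
) -> Optional[Plan]:
    """Return the first plan whose simulated feasibility exceeds tau, else None."""
    for attempt in range(max_attempts):
        prompt = make_prompt(observation, attempt)  # adapt the prompt
        plan = synthesize(prompt)                   # plan
        if feasibility(plan) > tau:                 # verify by simulation
            return plan                             # caller may now execute it
    return None                                     # nothing passed the gate


# Toy stubs so the loop can be run end to end.
def toy_prompt(observation: str, attempt: int) -> str:
    return f"{observation} | attempt={attempt}"


def toy_synthesize(prompt: str) -> Plan:
    if prompt.endswith("attempt=0"):
        return Plan(["restart_service"])
    return Plan(["drain_traffic", "restart_service", "restore_traffic"])


def toy_feasibility(plan: Plan) -> float:
    return 0.9 if plan.steps[0] == "drain_traffic" else 0.4
```

With these stubs, the first proposal is rejected by the gate and the second is returned; raising `tau` above 0.9 makes `heal` return `None`, which is the case where no plan should be executed at all.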

What carries the argument

PASE, the Planning-Aware Semantic self-healing engine: an LLM synthesizes plans from semantic primitives, a neural-symbolic world model verifies them by simulation, and a DRL-trained meta-prompt optimizer adapts the LLM's prompts.

If this is right

  • Average system recovery time falls by more than 40 percent compared with prior methods on the same fault-injection dataset.
  • Fault detection accuracy rises in scenarios involving previously unseen faults.
  • Recovery actions can be synthesized outside any fixed action library because the LLM generates plans from semantic primitives.
  • The closed reason-plan-verify-adapt cycle replaces loosely coupled LLM-plus-DRL pipelines.
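The third bullet depends on plans being compositions of primitives rather than picks from a fixed action list. A rough sketch of what a purely symbolic check on such a plan might look like follows. It is an assumption-laden illustration: the primitive names, the `requires`/`adds`/`removes` fields, and the `LIBRARY` contents are all invented here, and the check covers only precondition bookkeeping, not the neural part of the paper's world model, whose architecture this page does not describe.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Primitive:
    name: str
    requires: FrozenSet[str]            # facts that must hold before the step
    adds: FrozenSet[str]                # facts made true by the step
    removes: FrozenSet[str] = frozenset()  # facts made false by the step


# Hypothetical primitive library; real primitives are not listed in the source.
LIBRARY: Dict[str, Primitive] = {
    p.name: p
    for p in [
        Primitive("drain_traffic", frozenset({"serving"}),
                  frozenset({"drained"}), frozenset({"serving"})),
        Primitive("restart_service", frozenset({"drained"}),
                  frozenset({"restarted"})),
        Primitive("restore_traffic", frozenset({"drained", "restarted"}),
                  frozenset({"serving"}), frozenset({"drained"})),
    ]
}


def check_plan(steps: List[str], initial: Set[str]) -> List[str]:
    """Walk the plan symbolically; return a list of problems (empty means valid)."""
    state = set(initial)
    problems: List[str] = []
    for i, name in enumerate(steps):
        prim = LIBRARY.get(name)
        if prim is None:
            problems.append(f"step {i}: unknown primitive {name!r}")
            continue
        missing = prim.requires - state
        if missing:
            problems.append(f"step {i}: {name} missing {sorted(missing)}")
        state -= prim.removes
        state |= prim.adds
    return problems
```

Under these assumed definitions, `check_plan(["drain_traffic", "restart_service", "restore_traffic"], {"serving"})` returns an empty list, while a plan that restarts the service without draining first is flagged. New multi-step plans can be composed freely, but only from primitives the library actually contains.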

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same verify-before-execute pattern could be applied to other domains that require safe LLM-generated action sequences, such as robotic task planning.
  • If the world model simulation is cheap and accurate, the framework reduces the cost of exploring unsafe plans in real systems.
  • Success would depend on the library of semantic primitives being expressive enough to cover the faults that actually occur.

Load-bearing premise

The LLM will reliably output structured recovery plans whose feasibility the neural-symbolic world model can accurately determine by simulation.

What would settle it

An experiment in which a plan the world model labels feasible is executed on the live cloud system and produces a longer outage or new failure.
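One way to quantify this test, sketched under the assumption that each trial is logged as a pair (world model said feasible, plan succeeded on the live system): measure the share of approved plans that then failed. The function name and record format are illustrative and do not come from the paper.

```python
from typing import Iterable, Tuple


def false_feasible_rate(records: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of plans the world model approved that failed in live execution.

    Each record is (model_said_feasible, succeeded_on_live_system).
    """
    approved = [succeeded for predicted, succeeded in records if predicted]
    if not approved:
        raise ValueError("no plans were labelled feasible")
    failures = sum(1 for succeeded in approved if not succeeded)
    return failures / len(approved)


# Placeholder log with made-up entries, for illustration only.
example_log = [(True, True), (True, True), (True, False), (False, False), (True, True)]
```

On the placeholder log above, four plans were approved and one of them failed, so the rate is 0.25. A rate near zero on real executions would support the load-bearing premise; a high rate would undercut it.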

Figures

Figures reproduced from arXiv: 2607.01595 by Haoran Lin, Junyan Tan, Siyuan Guo, Tianyu Shen, Xinyue Luo, Yichen Fang, Zeyu Qiao.

Figure 1. Motivation. From state→action selection without verification (late, risky recovery) to plan→verify→adapt: PASE synthesizes recovery plans with an LLM planner, screens them via a neural-symbolic world model, and improves planning through DRL-based meta-prompt adaptation.
Adjacent paper text (truncated in extraction): 1 Introduction. Modern cloud-based AI systems support a wide range of mission-critical services, from online inference to large-scale distr…
Figure 1 (repeated entry).
Adjacent paper text (truncated in extraction): The main contributions of this work are summarized as follows:
  • We formulate cloud fault recovery as a neuro-symbolic program synthesis problem and propose PASE, which shifts the decision unit from single-step action selection to multi-step plan generation.
  • We design a verified planning loop that couples an LLM-based plan synthesizer with a neural-symbolic world model for feasibility screening, improvin…
Figure 2. The overview of PASE. From observation O_t to verified recovery: O_t → D_t → p_t → Π_t → F(Π_t). The recovery plan Π_t is executed only when F(Π_t) > τ; the loop is trained via NSWM pre-training, MPO warm-up, and joint fine-tuning.
Adjacent paper text (truncated in extraction): 4 Experiments. 4.1 Experimental Objectives and Evaluation Metrics. This experiment aims to evaluate the comprehensive performance of the proposed PASE framework in the task of cloud syst…
Figure 3. Meta-Prompt Optimization Strategy Ablation Study.
Figure 4. Fault Detection Accuracy Comparison.
Adjacent paper text (truncated in extraction): … a 92% correlation with the actual success rate of the plans in the real environment, demonstrating the effectiveness of simulation verification. 4.3.3 Rapid Adaptation Capability to Novel Faults. We injected a novel, previously unseen “hybrid CPU-Memory deadlock fault” during the mid-phase of the experiment. As shown in …
Figure 5. Recovery Time Comparison.
Adjacent paper text (truncated in extraction): … ensemble methods near our AUC (0.91) but are impractical for real-time use due to high overhead (156ms, 432MB). These results confirm that our neural-symbolic fusion optimally balances accuracy, efficiency, and resource use by combining neural pattern learning with symbolic safety guarantees. Meta-Prompt Optimization Strategy Evaluation …
Figure 6. World Model Architecture Ablation Study.
Original abstract

As the scale and complexity of cloud-based AI systems continue to escalate, ensuring service reliability through rapid fault detection and adaptive recovery has become a critical challenge. While existing approaches integrate Large Language Models (LLMs) for semantic understanding and Deep Reinforcement Learning (DRL) for policy optimization, they often rely on sequential, loosely coupled architectures that underutilize the generative and reasoning capabilities of LLMs. In this paper, we propose a paradigm shift with PASE, a Planning-Aware Semantic self-healing engine, a novel fault self-healing framework that reconceptualizes recovery as a neuro-symbolic program synthesis task. PASE employs an LLM as a core Plan Synthesis Engine to generate structured recovery plans from a library of semantic primitives. A Neural-Symbolic World Model verifies plan feasibility through simulation, while a Meta-Prompt Optimizer, trained via DRL, learns to generate optimal prompts that guide the LLM's planning process. This tight reason-plan-verify-adapt loop enables dynamic, context-aware recovery strategy generation beyond predefined action spaces. Experiments on a real-world cloud fault injection dataset demonstrate that PASE significantly outperforms state-of-the-art methods, reducing average system recovery time by over 40% and improving fault detection accuracy in unknown fault scenarios. Our framework advances autonomous system management by unifying LLM-based reasoning with model-assisted verification and meta-learned guidance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PASE, a Planning-Aware Semantic self-healing engine for cloud AI systems. It reconceptualizes recovery as neuro-symbolic program synthesis: an LLM generates structured recovery plans from a library of semantic primitives; a Neural-Symbolic World Model verifies feasibility via simulation; and a DRL-trained Meta-Prompt Optimizer learns optimal prompts to guide the LLM. The framework forms a reason-plan-verify-adapt loop. Experiments on a real-world cloud fault injection dataset are reported to show >40% reduction in average recovery time and improved fault detection accuracy in unknown scenarios compared to state-of-the-art methods.

Significance. If the central claims hold, the work would be significant for autonomous cloud management by demonstrating a tight integration of LLM generative capabilities with model-based verification, moving beyond loosely coupled LLM+DRL pipelines. The neuro-symbolic verification step directly targets safety concerns in LLM-generated actions, which is a timely contribution to reliable AI-driven systems.

major comments (2)
  1. [Abstract] Abstract and experimental description: the headline claims of >40% reduction in recovery time and improved unknown-fault accuracy rest on the Neural-Symbolic World Model correctly classifying LLM-synthesized plans as feasible. No architecture, training procedure, simulation error rate against real executions, or false-positive analysis is supplied, rendering the performance numbers uninterpretable.
  2. [Abstract] Abstract: the experimental protocol, dataset description, baseline methods, number of trials, and statistical measures (error bars, significance tests) are absent, so it is impossible to determine whether the reported gains are supported by the data or methods.
minor comments (1)
  1. [Abstract] The abstract uses several compound terms ("Planning-Aware Semantic self-healing engine", "neuro-symbolic program synthesis task") without a concise one-sentence definition on first use.
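The second major comment asks for error bars on the headline number. As a hedged sketch of what that could involve, a percentile bootstrap gives an interval around a relative reduction in mean recovery time. The function names, the resample count, and the sample values are all placeholders chosen for illustration; none of them are taken from the paper or its dataset.

```python
import random
from statistics import mean
from typing import List, Tuple


def relative_reduction(baseline: List[float], treated: List[float]) -> float:
    """1 - mean(treated) / mean(baseline); 0.4 would mean a 40 percent reduction."""
    return 1.0 - mean(treated) / mean(baseline)


def bootstrap_ci(
    baseline: List[float],
    treated: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the relative reduction."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = []
    for _ in range(n_resamples):
        b = rng.choices(baseline, k=len(baseline))  # resample with replacement
        t = rng.choices(treated, k=len(treated))
        stats.append(relative_reduction(b, t))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Made-up recovery times in seconds, purely to exercise the functions.
placeholder_baseline = [10.0, 12.0, 14.0, 16.0]
placeholder_treated = [5.0, 6.0, 7.0, 8.0]
```

Reporting the point estimate together with such an interval, and the number of trials behind it, would let a reader judge whether a claimed reduction is distinguishable from run-to-run noise.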

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We agree that additional details would improve clarity and will revise the abstract to incorporate summaries of the key components and experimental setup.

Point-by-point responses
  1. Referee: [Abstract] Abstract and experimental description: the headline claims of >40% reduction in recovery time and improved unknown-fault accuracy rest on the Neural-Symbolic World Model correctly classifying LLM-synthesized plans as feasible. No architecture, training procedure, simulation error rate against real executions, or false-positive analysis is supplied, rendering the performance numbers uninterpretable.

    Authors: The architecture, training procedure, and validation of the Neural-Symbolic World Model, including simulation fidelity and false-positive rates, are detailed in the body of the paper (Sections 3 and 4). To address the referee's concern about the abstract, we will revise the abstract to include a concise description of the world model and its verification capabilities. This will make the performance claims more interpretable without altering the manuscript's core content. revision: yes

  2. Referee: [Abstract] Abstract: the experimental protocol, dataset description, baseline methods, number of trials, and statistical measures (error bars, significance tests) are absent, so it is impossible to determine whether the reported gains are supported by the data or methods.

    Authors: The experimental protocol, including the dataset, baselines, number of trials, and statistical analysis, is described in Section 5 of the manuscript. We will revise the abstract to briefly mention the evaluation setup, such as the use of a real-world cloud fault injection dataset and comparison against state-of-the-art methods with multiple trials. This revision will help readers assess the reported results. revision: yes

Circularity Check

0 steps flagged

No circularity; framework description contains no derivations or self-referential reductions

Full rationale

The paper text (abstract and described components) presents PASE as an architectural integration of LLM plan synthesis, neural-symbolic simulation, and DRL-based meta-prompt optimization, with performance claims tied to external experiments on a cloud fault dataset. No equations, derivation chains, fitted parameters renamed as predictions, or load-bearing self-citations appear. The central claims rest on empirical results rather than any step that reduces by construction to its own inputs. This matches the expected non-finding for papers without mathematical self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities with supporting detail; the neural-symbolic world model is mentioned but cannot be audited for independence or fitting.

pith-pipeline@v0.9.1-grok · 5792 in / 1030 out tokens · 30047 ms · 2026-07-03T14:55:49.035285+00:00 · methodology

