pith. machine review for the scientific record.

arxiv: 2604.21092 · v1 · submitted 2026-04-22 · 💻 cs.AI · cs.SE

Recognition: unknown

Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

Alexandros Evangelidis, Gricel Vázquez, Radu Calinescu, Sepeedeh Shahbeigi, Simos Gerasimou

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 23:53 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords prompt engineering · POMDP · cognitive modeling · LLM explanations · self-adaptive systems · task planning · cyber-physical systems

The pith

COMPASS models users' hidden cognitive states as a POMDP to automatically refine prompts for better LLM explanations of task plans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces COMPASS as a proof-of-concept method to automate the tricky process of prompt engineering for large language models that explain complex AI task planning. It treats prompt refinement as a probabilistic decision problem in which hidden user mental factors such as attention, comprehension, and uncertainty are inferred from visible interaction signals. These elements are combined inside a partially observable Markov decision process whose policy then generates explanations and prompt adjustments on the fly. The approach is tested on two cyber-physical system examples, with both numerical measures and human judgments used to check whether the resulting explanations are more reliable across different users. The core idea is that embedding a model of human cognition can make LLM output more consistent without repeated manual prompt fixes.

Core claim

COMPASS formalises prompt engineering as a cognitive and probabilistic decision-making process. It models users' unobservable latent cognitive states, such as attention, comprehension, and uncertainty, together with observable interaction cues as a POMDP, whose synthesised policy enables adaptive generation of explanations and prompt refinements. Evaluation on two cyber-physical system case studies confirms the feasibility of integrating human cognition and user-profile feedback into automated prompt synthesis.

What carries the argument

The POMDP (partially observable Markov decision process) that represents latent cognitive states together with interaction cues and yields a policy for adaptive prompt and explanation generation.
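A minimal sketch makes the belief-update core such a POMDP rests on concrete. Everything below, the three latent states, the cue names, and all probabilities, is invented for illustration and is not taken from the paper:

```python
# Hypothetical sketch: inferring latent cognitive states from interaction cues
# with a discrete Bayes filter, the belief-update step at the heart of a POMDP.
# States, cues, and numbers are illustrative placeholders only.

STATES = ["attentive", "confused", "disengaged"]

# P(s' | s): illustrative latent-state transitions between explanation turns.
TRANSITION = {
    "attentive":  {"attentive": 0.7, "confused": 0.2, "disengaged": 0.1},
    "confused":   {"attentive": 0.3, "confused": 0.5, "disengaged": 0.2},
    "disengaged": {"attentive": 0.1, "confused": 0.3, "disengaged": 0.6},
}

# P(o | s'): likelihood of an observable interaction cue in each latent state.
OBSERVATION = {
    "follow_up_question": {"attentive": 0.5, "confused": 0.4, "disengaged": 0.1},
    "long_pause":         {"attentive": 0.2, "confused": 0.4, "disengaged": 0.6},
    "quick_ack":          {"attentive": 0.6, "confused": 0.2, "disengaged": 0.2},
}

def belief_update(belief, cue):
    """One Bayes-filter step: predict via TRANSITION, weight by P(cue | s'), renormalise."""
    predicted = {
        s2: sum(belief[s1] * TRANSITION[s1][s2] for s1 in STATES) for s2 in STATES
    }
    unnorm = {s2: OBSERVATION[cue][s2] * predicted[s2] for s2 in STATES}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Start from a uniform prior and fold in a short stream of observed cues;
# the resulting belief is what an adaptive policy would act on.
belief = {s: 1 / len(STATES) for s in STATES}
for cue in ["long_pause", "long_pause", "follow_up_question"]:
    belief = belief_update(belief, cue)
```

After the two long pauses and a follow-up question, the belief mass shifts away from "disengaged"; a policy would read the updated distribution, not any single cue, when choosing the next explanation.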

If this is right

  • Explanations of opaque AI task plans become more reliable for diverse users without repeated manual prompt engineering.
  • Self-adaptive systems can incorporate user feedback and cognitive cues directly into LLM output generation.
  • The same modelling approach applies to other complex planning domains where stakeholder understanding varies.
  • Quantitative and qualitative evaluation on cyber-physical systems shows the method can be implemented and assessed in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The technique could extend to other LLM uses such as tutoring or decision aids where user mental states affect output usefulness.
  • If the POMDP policy generalises, organisations deploying LLMs would need less expert prompt-tuning effort.
  • Further tests with larger and more varied user groups would show how faithfully the model tracks real cognitive changes.

Load-bearing premise

That inferring users' hidden cognitive states from interaction cues and encoding them in a POMDP produces measurably better prompt refinements and explanations than static methods.

What would settle it

A controlled user study in which COMPASS-adapted prompts yield no improvement or lower scores in explanation clarity, comprehension, or satisfaction compared with fixed prompts on the same task-planning cases.

Figures

Figures reproduced from arXiv: 2604.21092 by Alexandros Evangelidis, Gricel Vázquez, Radu Calinescu, Sepeedeh Shahbeigi, Simos Gerasimou.

Figure 1. COMPASS stages and artefacts, showing the adaptation of explanations (S2) positioned within the MAPE-K loop.
Figure 2. Planning problem description of the scenario in Sec.
Figure 3. Example of GenAI instructions (𝑖2).
Figure 4. Prompt generator (5) flow. See also Table 3.
Figure 5. Scenario showing diminishing returns of adaptive explanations.
Figure 6. Understanding level for adapted explanation.
read the original abstract

Integrating Large Language Models (LLMs) into complex software systems enables the generation of human-understandable explanations of opaque AI processes, such as automated task planning. However, the quality and reliability of these explanations heavily depend on effective prompt engineering. The lack of a systematic understanding of how diverse stakeholder groups formulate and refine prompts hinders the development of tools that can automate this process. We introduce COMPASS (COgnitive Modelling for Prompt Automated SynthesiS), a proof-of-concept self-adaptive approach that formalises prompt engineering as a cognitive and probabilistic decision-making process. COMPASS models unobservable users' latent cognitive states, such as attention and comprehension, uncertainty, and observable interaction cues as a POMDP, whose synthesised policy enables adaptive generation of explanations and prompt refinements. We evaluate COMPASS using two diverse cyber-physical system case studies to assess the adaptive explanation generation and their qualities, both quantitatively and qualitatively. Our results demonstrate the feasibility of COMPASS integrating human cognition and user profile's feedback into automated prompt synthesis in complex task planning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces COMPASS, a proof-of-concept self-adaptive framework that models prompt engineering for LLM-generated explanations of task plans as a POMDP. It represents users' latent cognitive states (attention, comprehension, uncertainty) together with observable interaction cues, synthesizes a policy for adaptive explanation generation and prompt refinement, and evaluates feasibility via quantitative and qualitative assessment on two cyber-physical system case studies.

Significance. If the POMDP formulation and policy synthesis can be shown to track latent states and measurably outperform non-adaptive baselines, the work would offer a principled way to integrate human cognitive modeling into automated prompt synthesis for explainable planning systems. The case-study approach demonstrates practical applicability in cyber-physical domains, but the absence of formal definitions and controlled experiments currently limits the strength of this contribution.

major comments (3)
  1. [Abstract / Modeling section] Abstract and modeling description: the central claim that COMPASS 'models unobservable users' latent cognitive states ... as a POMDP' is not supported by any equations defining the state space S, observation function O, transition probabilities T, reward function R, or belief-update rule. Without these, it is impossible to determine whether the synthesized policy actually infers or adapts to the claimed latent states rather than applying heuristic prompt adjustments.
  2. [Evaluation] Evaluation section: the paper reports 'quantitatively and qualitatively' assessing adaptive explanation generation on two case studies, yet supplies no concrete metrics, non-adaptive baselines, ablation isolating the POMDP component, or statistical tests. This leaves the claim that the approach 'integrates human cognition ... into automated prompt synthesis' unverified and prevents assessment of whether the reported feasibility stems from genuine cognitive-state tracking.
  3. [Evaluation / Discussion] The weakest assumption—that POMDP inference from interaction cues accurately captures prompt-refinement dynamics—is not tested; no validation of the observation model against human data or comparison of explanation quality with/without the POMDP policy is provided.
minor comments (2)
  1. [Abstract] The abstract states results 'demonstrate the feasibility' but does not define what quantitative thresholds or qualitative criteria were used to reach this conclusion.
  2. [Modeling] Notation for the POMDP components (e.g., how user profiles enter the belief state) is introduced only at a high level; explicit mapping from cues to observations would improve clarity.
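The baseline comparison and statistical testing the referee asks for can be sketched with a stdlib-only permutation test. The comprehension scores below are made-up placeholders, not data from the paper:

```python
# Hypothetical sketch of a two-sample permutation test comparing comprehension
# ratings for COMPASS-adapted prompts against fixed prompts. All scores are
# invented placeholders for illustration.
import random
from statistics import mean

adapted = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.4, 4.2]   # illustrative ratings
fixed   = [3.2, 3.6, 3.1, 3.8, 3.4, 3.0, 3.5, 3.3]   # illustrative ratings

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """P(|mean difference| >= observed) under random reshuffling of group labels."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples

p = permutation_test(adapted, fixed)
```

A permutation test is one reasonable choice here because user-rating samples are small and unlikely to be normally distributed; any equivalent non-parametric test would serve the referee's point equally well.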

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight opportunities to strengthen the formalization and evaluation of COMPASS. We address each major comment below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Abstract / Modeling section] Abstract and modeling description: the central claim that COMPASS 'models unobservable users' latent cognitive states ... as a POMDP' is not supported by any equations defining the state space S, observation function O, transition probabilities T, reward function R, or belief-update rule. Without these, it is impossible to determine whether the synthesized policy actually infers or adapts to the claimed latent states rather than applying heuristic prompt adjustments.

    Authors: We agree that explicit mathematical definitions are essential to substantiate the POMDP claim and distinguish it from heuristics. In the revised manuscript, we will add a dedicated Modeling subsection that formally specifies the full POMDP tuple, including: the state space S (latent cognitive states for attention, comprehension, and uncertainty, combined with observable interaction cues), action space A (prompt refinements and explanation adaptations), transition function T, reward function R (balancing explanation quality, user comprehension, and cognitive load), observation function O (mapping cues to observations), discount factor, and the belief-update rule via standard Bayesian filtering. We will also describe the policy synthesis process (using a POMDP solver) to show how it maintains and acts on beliefs over latent states. revision: yes

  2. Referee: [Evaluation] Evaluation section: the paper reports 'quantitatively and qualitatively' assessing adaptive explanation generation on two case studies, yet supplies no concrete metrics, non-adaptive baselines, ablation isolating the POMDP component, or statistical tests. This leaves the claim that the approach 'integrates human cognition ... into automated prompt synthesis' unverified and prevents assessment of whether the reported feasibility stems from genuine cognitive-state tracking.

    Authors: We acknowledge that the current evaluation description is insufficiently detailed to verify the claims. In the revision, we will expand the Evaluation section to report concrete quantitative metrics (e.g., explanation fidelity, adaptation frequency, and proxy measures of comprehension), direct comparisons against non-adaptive baselines (fixed-prompt variants), ablations that disable the POMDP belief update while retaining other components, and statistical tests on the observed differences. These additions will allow readers to assess whether improvements arise from cognitive-state tracking. revision: yes

  3. Referee: [Evaluation / Discussion] The weakest assumption—that POMDP inference from interaction cues accurately captures prompt-refinement dynamics—is not tested; no validation of the observation model against human data or comparison of explanation quality with/without the POMDP policy is provided.

    Authors: We recognize this as a substantive limitation of the proof-of-concept scope. The revised manuscript will include: (i) a side-by-side comparison of explanation quality with and without the POMDP policy on the case studies, and (ii) an expanded Discussion section that explicitly states the assumption, its grounding in prior HCI literature, and the absence of direct human validation of the observation model. We will also outline planned future user studies for empirical validation. Full human-subject testing of the observation model exceeds the current feasibility-study design but will be clearly flagged as future work. revision: partial
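A one-step greedy rule gives a feel for how a belief over latent states could select among the prompt-refinement actions described above. The states, actions, and reward numbers below are invented for illustration; the solver-synthesised POMDP policy the authors describe would optimise over longer horizons:

```python
# Hypothetical sketch: choosing a prompt-refinement action by maximising
# expected immediate reward under the current belief over latent states.
# All names and numbers are illustrative, not the paper's model.

STATES = ["attentive", "confused", "disengaged"]
ACTIONS = ["keep_prompt", "simplify_explanation", "add_detail"]

# R(s, a): illustrative rewards trading comprehension against cognitive load.
REWARD = {
    "keep_prompt":          {"attentive": 1.0, "confused": -0.5, "disengaged": -1.0},
    "simplify_explanation": {"attentive": 0.2, "confused": 1.0,  "disengaged": 0.5},
    "add_detail":           {"attentive": 0.8, "confused": -1.0, "disengaged": -0.5},
}

def greedy_action(belief):
    """Expected-reward maximisation: argmax_a sum_s b(s) * R(s, a)."""
    expected = {a: sum(belief[s] * REWARD[a][s] for s in STATES) for a in ACTIONS}
    return max(expected, key=expected.get)

print(greedy_action({"attentive": 0.8, "confused": 0.1, "disengaged": 0.1}))
# → keep_prompt
```

The same belief shifted toward "confused" makes simplification the highest-value action, which is the adaptive behaviour the framework aims for.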

Circularity Check

0 steps flagged

No significant circularity in COMPASS POMDP formalization

full rationale

The paper presents COMPASS as a POMDP-based formalization of prompt engineering that models latent cognitive states from interaction cues and synthesizes an adaptive policy. No equations, fitted parameters, or self-citations are shown that reduce the claimed adaptive generation or policy synthesis to a self-defined input by construction. The derivation relies on standard POMDP machinery applied to the domain, with evaluation via external case studies rather than internal consistency that would indicate circularity. This is the most common honest finding for papers that introduce a modeling framework without parameter-fitting loops or load-bearing self-references.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that latent cognitive states can be tracked probabilistically from interaction cues using a standard POMDP; no free parameters or new invented entities are specified in the abstract.

axioms (1)
  • domain assumption: a POMDP can model users' latent cognitive states, such as attention, comprehension, and uncertainty, from observable interaction cues.
    Invoked to formalize prompt engineering as a cognitive decision process.

pith-pipeline@v0.9.0 · 5504 in / 1208 out tokens · 52913 ms · 2026-05-09T23:53:37.899952+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

101 extracted references · 13 canonical work pages · 3 internal anchors

  1. [1]

    Project’s GitHub: https://github.com/Gricel-lee/COMPASS-LLM

  2. [2]

    Mohammed N. Alharbi, Shihong Huang, and David Garlan. 2021. A Probabilistic Model for Effective Explainability Based on Personality Traits. In Software Architecture: 15th European Conference on Software Architecture. 21 pages

  3. [3]

    Kingsley Attai, Moses Ekpenyong, Constance Amannah, Daniel Asuquo, Peterben Ajuga, Okure Obot, Ekemini Johnson, Anietie John, Omosivie Maduka, Christie Akwaowo, and Faith-Michael Uzoka. 2024. Enhancing the Interpretability of Malaria and Typhoid Diagnosis with Explainable AI and Large Language Models. Tropical Medicine and Infectious Disease 9, 9 (2024)

  4. [4]

    Paul Ayres, Joy Yeonjoo Lee, Fred Paas, and Jeroen JG Van Merrienboer. 2021. The validity of physiological measures to identify differences in intrinsic cognitive load. Frontiers in Psychology 12 (2021), 702538

  5. [5]

    Nelly Bencomo, Kris Welsh, Pete Sawyer, and Jon Whittle. 2012. Self-Explanation in Adaptive Systems. In 2012 IEEE 17th International Conference on Engineering of Complex Computer Systems. 157–166

  6. [6]

    Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, and Huan Liu. 2024. Towards LLM-guided Causal Explainability for Black-box Text Classifiers. arXiv:2309.13340 [cs.CL] https://arxiv.org/abs/2309.13340

  7. [7]

    Andrea Bianco and Luca De Alfaro. 1995. Model checking of probabilistic and nondeterministic systems. In International Conference on Foundations of Software Technology and Theoretical Computer Science. Springer, 499–513

  8. [8]

    Ahsan Bilal, David Ebert, and Beiyu Lin. 2025. LLMs for explainable AI: A comprehensive survey. arXiv preprint arXiv:2504.00125 (2025)

  9. [9]

    André Borrmann, Markus König, Christian Koch, and Jakob Beetz. 2018. Building information modeling: why? What? How? In Building Information Modeling: Technology Foundations and Industry Practice. Springer, 1–24

  10. [10]

    James Burton, Noura Al Moubayed, and Amir Enshaei. 2023. Natural Language Explanations for Machine Learning Classification Decisions. In 2023 International Joint Conference on Neural Networks (IJCNN). 1–9

  11. [11]

    Radu Calinescu, Simos Gerasimou, Sinem Getir Yaman, Gricel Vazquez Flores, and Micah Bassett. 2026. Verification of Multi-Model Stochastic Systems. In 48th IEEE/ACM International Conference on Software Engineering (ICSE)

  12. [12]

    Radu Calinescu, Calum Imrie, Ravi Mangal, Genaína Nunes Rodrigues, Corina Păsăreanu, Misael Alpizar Santana, and Gricel Vázquez. 2022. Discrete-event controller synthesis for autonomous systems with deep-learning perception components. arXiv preprint arXiv:2202.03360 (2022)

  13. [13]

    Radu Calinescu, Calum Imrie, Ravi Mangal, Genaína Nunes Rodrigues, Corina Păsăreanu, Misael Alpizar Santana, and Gricel Vázquez. 2024. Controller synthesis for autonomous systems with deep-learning perception components. IEEE Transactions on Software Engineering 50, 6 (2024), 1374–1395

  14. [14]

    Radu Constantin Calinescu, Milan Ceska, Simos Gerasimou, Marta Kwiatkowska, and Nicola Paoletti. 2018. Efficient synthesis of robust models for stochastic systems. Journal of Systems and Software 143 (2018), 140–158

  15. [15]

    Javier Cámara, Gabriel A Moreno, and David Garlan. 2014. Stochastic game analysis and latency awareness for proactive self-adaptation. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 155–164

  16. [16]

    Marco Cascella, Jonathan Montomoli, Valentina Bellini, and Elena Bignami. 2023. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. Journal of Medical Systems 47, 1 (2023), 33

  17. [17]

    Michael Cashmore, Anna Collins, Benjamin Krarup, Senka Krivic, Daniele Magazzeni, and David Smith. 2019. Towards explainable AI planning as a service. In 2nd ICAPS Workshop on Explainable Planning

  18. [18]

    Tathagata Chakraborti, Sarath Sreedharan, Sachin Grover, and Subbarao Kambhampati. 2019. Plan explanations as model reconciliation–an empirical study. In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 258–266

  19. [19]

    Tathagata Chakraborti, Sarath Sreedharan, and Subbarao Kambhampati. 2021. The emerging landscape of explainable automated planning & decision making. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan. Article 669, 9 pages

  20. [20]

    Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, and Kathleen McKeown. 2024. Do models explain themselves? counterfactual simulatability of natural language explanations. In Proceedings of the 41st International Conference on Machine Learning (ICML). JMLR.org

  21. [21]

    Mirko D’Angelo, Simos Gerasimou, Sona Ghahremani, Johannes Grohmann, Ingrid Nunes, Evangelos Pournaras, and Sven Tomforde. 2019. On learning in collective self-adaptive systems: State of practice and a 3D framework. In 2019 IEEE/ACM 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). IEEE, 13–24

  22. [22]

    Neil T Dantam, Zachary K Kingston, Swarat Chaudhuri, and Lydia E Kavraki. Incremental task and motion planning: A constraint-based approach. In Robotics: Science and Systems, Vol. 12. Ann Arbor, MI, USA, 00052

  24. [24]

    Frédéric Dehais, Alex Lafont, Raphaëlle Roy, and Stephen Fairclough. 2020. A neuroergonomics approach to mental workload, engagement and human performance. Frontiers in Neuroscience 14 (2020), 268

  25. [25]

    Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Zhengyu Chen, Alejandro Lopez-Lira, and Hao Wang. 2024. Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models. arXiv:2310.00566 https://arxiv.org/abs/2310.00566

  26. [26]

    Maria Fox and Derek Long. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research 20 (2003)

  27. [27]

    Maria Fox, Derek Long, and Daniele Magazzeni. 2017. Explainable planning. arXiv preprint arXiv:1709.10256 (2017)

  28. [28]

    Lina Gao, Zhongying Zhao, Chao Li, Jianli Zhao, and Qingtian Zeng. 2022. Deep cognitive diagnosis model for predicting students’ performance. Future Generation Computer Systems 126 (2022), 252–262

  29. [29]

    Simos Gerasimou, Radu Calinescu, and Alec Banks. 2014. Efficient runtime quantitative verification using caching, lookahead, and nearly-optimal reconfiguration. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 115–124

  30. [30]

    Simos Gerasimou, Radu Calinescu, and Giordano Tamburrelli. 2018. Synthesis of probabilistic models for quality-of-service software engineering. Automated Software Engineering 25, 4 (2018), 785–831

  31. [31]

    Simos Gerasimou, Javier Cámara, Radu Calinescu, Naif Alasmari, Faisal Alhwikem, and Xinwei Fang. 2021. Evolutionary-guided synthesis of verified pareto-optimal MDP policies. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 842–853

  32. [32]

    Google. 2025. Gemini. https://gemini.google.com Accessed: 2025-10-09

  33. [33]

    Hans Hansson and Bengt Jonsson. 1994. A logic for reasoning about time and reliability. Form. Asp. Comput. 6, 5 (Sept. 1994), 512–535

  34. [34]

    Malte Helmert. 2006. The fast downward planning system. Journal of Artificial Intelligence Research 26 (2006), 191–246

  35. [35]

    Jesse Hoey, Craig Boutilier, Pascal Poupart, Patrick Olivier, Andrew Monk, and Alex Mihailidis. 2013. People, sensors, decisions: Customizable and adaptive technologies for assistance in healthcare. ACM Trans. Interact. Intell. Syst. 2, 4, Article 20 (Jan. 2013), 36 pages

  36. [36]

    Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Tomas Jackson, Noah Brown, Linda Luu, Sergey Levine, Karol Hausman, and brian ichter. 2022. Inner Monologue: Embodied Reasoning through Planning with Language Models. In 6th Annual Conference on Robot Learning

  37. [37]

    Alon Jacovi and Yoav Goldberg. 2020. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 4198–4205

  38. [38]

    Daniel Kahneman, Jack L Knetsch, and Richard H Thaler. 1991. Anomalies: The endowment effect, loss aversion, and status quo bias. Journal of Economic Perspectives 5, 1 (1991), 193–206

  39. [39]

    Daniel Kahneman and Amos Tversky. 2013. Prospect theory: An analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I. World Scientific, 99–127

  40. [40]

    Frank C Keil. 2006. Explanation and understanding. Annu. Rev. Psychol. 57, 1 (2006), 227–254

  41. [41]

    Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2024. The challenge of understanding what users want: Inconsistent preferences and engagement optimization. Management Science 70, 9 (2024), 6336–6355

  42. [42]

    Benjamin Krarup, Michael Cashmore, Daniele Magazzeni, and Tim Miller. 2019. Model-based contrastive explanations for explainable planning. In ICAPS 2019 Workshop on Explainable AI Planning (XAIP). AAAI Press, USA

  43. [43]

    Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, and Himabindu Lakkaraju. 2023. Are Large Language Models Post Hoc Explainers? In XAI in Action: Past, Present, and Future Applications

  44. [44]

    John Langford and Tong Zhang. 2007. The Epoch-Greedy algorithm for contextual multi-armed bandits. In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS), Vancouver, British Columbia, Canada. Curran Associates Inc., Red Hook, NY, USA, 817–824

  45. [45]

    Jialong Li, Mingyue Zhang, Nianyu Li, Danny Weyns, Zhi Jin, and Kenji Tei. Exploring the potential of large language models in self-adaptive systems. In Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 77–83

  47. [47]

    Lihong Li, Wei Chu, John Langford, and Robert E Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web. 661–670

  48. [48]

    Nianyu Li, Sridhar Adepu, Eunsuk Kang, and David Garlan. 2020. Explanations for human-on-the-loop: A probabilistic model checking approach. In Proceedings of the IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 181–187

  49. [49]

    Xuefeng Li, Liwen Wang, Guanting Dong, Keqing He, Jinzheng Zhao, Hao Lei, Jiachi Liu, and Weiran Xu. 2023. Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting. In Findings of the Association for Computational Linguistics

  50. [50]

    Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. 2023. Large language models in finance: A survey. In Proceedings of the Fourth ACM International Conference on AI in Finance. 374–382

  51. [51]

    Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. 2024. Deepseek-v3 technical report. arXiv preprint arXiv:2412.19437 (2024)

  52. [52]

    Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. 2023. LLM+P: Empowering Large Language Models with Optimal Planning Proficiency. arXiv:2304.11477 [cs.AI] https://arxiv.org/abs/2304.11477

  53. [53]

    Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS). 4768–4777

  54. [54]

    Ayuns Luz. 2024. Enhancing the Interpretability and Explainability of AI-Driven Risk Models Using LLM Capabilities. EasyChair Preprint 13368

  55. [55]

    Ulrik Lyngs, Reuben Binns, Max Van Kleek, and Nigel Shadbolt. 2018. So, tell me what users want, what they really, really want!. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. 1–10

  56. [56]

    Qing Lyu, Marianna Apidianaki, and Chris Callison-Burch. 2024. Towards faithful model explanation in NLP: A survey. Computational Linguistics 50, 2 (2024)

  57. [57]

    Yuetian Mao, Junjie He, and Chunyang Chen. 2025. From prompts to templates: A systematic prompt template analysis for real-world LLM apps. In Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering

  58. [58]

    Claudio Menghi, Christos Tsigkanos, Mehrnoosh Askarpour, Patrizio Pelliccione, Gricel Vázquez, Radu Calinescu, and Sergio García. 2022. Mission specification patterns for mobile robots: Providing support for quantitative properties. IEEE Transactions on Software Engineering 49, 4 (2022), 2741–2760

  59. [59]

    Andrea Micheli and Alessandro Valentini. 2021. Synthesis of search heuristics for temporal planning via reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 11895–11902

  60. [60]

    Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (2019), 1–38

  61. [61]

    Gethin Norman, David Parker, and Xueyi Zou. 2017. Verification and control of partially observable probabilistic systems. Real-Time Systems 53, 3 (2017)

  62. [62]

    OpenAI. 2024. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2024). https://arxiv.org/abs/2303.08774

  63. [63]

    OpenAI. 2025. ChatGPT-5: Large Language Model by OpenAI. https://chatgpt.com/. Accessed: 2025-10-24

  64. [64]

    Ashutosh Pandey, Gabriel A Moreno, Javier Cámara, and David Garlan. 2016. Hybrid planning for decision making in self-adaptive systems. In 2016 IEEE 10th International Conference on Self-Adaptive and Self-Organizing Systems (SASO)

  65. [65]

    Juan Marcelo Parra-Ullauri, Antonio García-Domínguez, and Nelly Bencomo. From a Series of (Un)fortunate Events to Global Explainability of Runtime Model-Based Self-Adaptive Systems. In 2021 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion. 807–816

  67. [67]

    Joelle Pineau, Michael Montemerlo, Martha Pollack, Nicholas Roy, and Sebastian Thrun. 2003. Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems 42, 3 (2003), 271–281. doi:10.1016/S0921-8890(02)00381-0. Socially Interactive Robots

  68. [68]

    David V. Pynadath and Stacy C. Marsella. 2005. PsychSim: modeling theory of mind with decision-theoretic agents. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh, Scotland. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1181–1186

  69. [69]

    Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018)

  70. [70]

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Amodei, et al. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9

  71. [71]

    Laria Reynolds and Kyle McDonell. 2021. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. 1–7

  72. [72]

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). Association for Computing Machinery, 1135–1144

  73. [73]

    Alberto Rovetta, Alessandro Trapasso, Alessandro Valentini, et al.

  74. [74]

    Unified Planning Documentation. https://unified-planning.readthedocs.io/en/latest/index.html Accessed: 2024-12-02

  75. [75]

    Ranjan Sapkota, Rizwan Qureshi, Muhammad Usman Hadi, Syed Zohaib Hassan, Ferhat Sadak, Maged Shoman, Muhammad Sajjad, Fayaz Ali Dharejo, Achyut Paudel, Jiajia Li, et al. 2025. Multi-modal LLMs in agriculture: A comprehensive review. IEEE Transactions on Automation Science and Engineering (2025)

  76. [76]

    Enrico Scala, Patrik Haslum, Sylvie Thiébaux, and Miquel Ramirez. 2016. Interval-based relaxation for general numeric planning. In Proceedings of the Twenty-Second European Conference on Artificial Intelligence. 655–663

  77. [77]

    Hua Shen, Chieh-Yang Huang, Tongshuang Wu, and Ting-Hao Kenneth Huang. ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing. In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’23 Companion). Association for Computing Machinery, 384–387

  79. [79]

    Chanda Simfukwe, Young Chul Youn, et al. 2023. CNN for a regression machine learning algorithm for predicting cognitive impairment using qEEG. Neuropsychiatric Disease and Treatment (2023), 851–863

  80. [80]

    Dylan Z. Slack, Satyapriya Krishna, Himabindu Lakkaraju, and Sameer Singh

Showing first 80 references.