
arxiv: 2605.30187 · v1 · pith:RQZQD7BQnew · submitted 2026-05-28 · 💻 cs.AI · cs.CY

Modularizing Educational LLM-Agency for Fostering Responsible Learning Assistance

Pith reviewed 2026-06-29 07:23 UTC · model grok-4.3

classification 💻 cs.AI cs.CY
keywords educational AI · LLM agents · modular architecture · responsible AI · pedagogical guidance · exercise solving · AI chatbots in education

The pith

Modularizing an LLM agent into stage-specific components lets pedagogical rules be injected at each step of exercise solving.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current monolithic LLMs used as educational chatbots tend to bypass pedagogical principles, risking reduced critical thinking and transfer skills in students. It proposes instead an agentic system broken into distinct modules that handle separate phases of problem solving. Each module can then enforce targeted educational constraints drawn from learning science. This structure is meant to produce assistance that remains controllable and transparent for teachers and learners. The authors present the modular design as a practical response to the limitations of single-model approaches.

Core claim

An agentic architecture that splits exercise-solving assistance into separate modules for different stages enables the direct incorporation of pedagogical constraints, producing guidance that is more controllable, transparent, and overseeable than monolithic LLM chatbots.

What carries the argument

Stage-specific modules within an agentic architecture that each embed targeted pedagogical advice during exercise solving.
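As a rough illustration of stage-specific dispatch, the sketch below routes a student turn to either a hint module or an evaluation module, mirroring the behaviour described for Figure 4 (hint for a plain request, evaluation once a partial solution is supplied). All names (`StudentTurn`, `hint_module`, `evaluation_module`, `route`, `assist`) and the rule texts are hypothetical and not taken from the paper, and the model call is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class StudentTurn:
    message: str
    partial_solution: Optional[str] = None  # set when the student submits work


# Hypothetical per-module rules; the paper does not give concrete rule texts.
HINT_RULES = "Give one guiding question. Never state the final answer."
EVAL_RULES = "Point out the first error. Be encouraging. Do not complete the solution."


def stub_llm(system_prompt: str, user_text: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return f"[{system_prompt}] -> reply to: {user_text}"


def hint_module(turn: StudentTurn, llm: Callable[[str, str], str]) -> str:
    return llm(HINT_RULES, turn.message)


def evaluation_module(turn: StudentTurn, llm: Callable[[str, str], str]) -> str:
    return llm(EVAL_RULES, turn.partial_solution or "")


def route(turn: StudentTurn) -> str:
    """Choose a module from the structure of the turn, not from free-form model output."""
    return "evaluation" if turn.partial_solution else "hint"


MODULES = {"hint": hint_module, "evaluation": evaluation_module}


def assist(turn: StudentTurn, llm: Callable[[str, str], str] = stub_llm) -> Tuple[str, str]:
    """Return the name of the module that handled the turn and its reply."""
    name = route(turn)
    return name, MODULES[name](turn, llm)
```

Because each module carries its own rule text, a reply can be traced back to the module and rule set that produced it. The routing criterion used here (presence of a partial solution) is an assumption made for the sketch.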

If this is right

  • Guidance at each stage can discourage direct answers in favor of hints that preserve student effort.
  • Educators gain the ability to inspect or modify rules inside individual modules without retraining the whole system.
  • Risks such as over-reliance on AI or loss of creativity can be addressed at the module level rather than through prompt engineering alone.
  • The same modular skeleton can accept new pedagogical rules for different subjects or age groups.
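The module-level editability described in these points could be sketched as follows, assuming rules are stored as plain data per module. The rule texts, the module names, and the helpers `with_rule` and `build_system_prompt` are hypothetical; the paper does not specify a storage format.

```python
from copy import deepcopy
from typing import Dict, List

# Hypothetical default rule set, keyed by module name.
DEFAULT_RULES: Dict[str, List[str]] = {
    "hint": ["Ask a guiding question.", "Do not reveal the final answer."],
    "evaluation": ["Name the first incorrect step.", "Keep feedback encouraging."],
}


def with_rule(rules: Dict[str, List[str]], module: str, new_rule: str) -> Dict[str, List[str]]:
    """Return a copy of the rule set with one extra rule for a single module."""
    updated = deepcopy(rules)
    updated[module].append(new_rule)
    return updated


def build_system_prompt(rules: Dict[str, List[str]], module: str) -> str:
    """Render one module's rules as a numbered instruction list."""
    return "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules[module], start=1))


# An educator adapts only the hint module for a younger age group.
primary_school = with_rule(DEFAULT_RULES, "hint", "Use vocabulary suitable for ages 8 to 10.")
```

In this sketch the evaluation rules and the default rule set stay untouched, which is the property the points above rely on: a change to one module does not require touching the others or retraining a model.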

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested by measuring whether modular systems maintain pedagogical fidelity better than monolithic ones across multi-turn conversations.
  • If successful, the design points toward reusable module libraries that different educational institutions could adapt without building full agents from scratch.
  • It raises the further question of how to validate that module-level rules actually produce the intended learning outcomes rather than just sounding pedagogical.
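One simple way to put a number on pedagogical fidelity across a multi-turn conversation is the share of assistant replies that do not leak the reference answer. The sketch below assumes a verbatim string match as the leak test, which is a crude proxy, and the two transcripts are invented toy data, not results from the paper.

```python
from typing import List


def leaks_answer(reply: str, reference_answer: str) -> bool:
    """Crude proxy: the reply contains the reference answer verbatim (case-insensitive)."""
    return reference_answer.strip().lower() in reply.lower()


def fidelity(replies: List[str], reference_answer: str) -> float:
    """Fraction of assistant replies that do not leak the reference answer."""
    if not replies:
        raise ValueError("need at least one reply")
    kept = sum(not leaks_answer(reply, reference_answer) for reply in replies)
    return kept / len(replies)


# Invented toy transcripts for illustration only.
modular_replies = [
    "What does the equation tell you about x?",
    "Try isolating x first.",
    "What is 10 minus 4?",
]
monolithic_replies = [
    "Think about isolating x.",
    "Fine, the answer is x = 6.",
    "As I said, x = 6.",
]
```

A real evaluation would need a stronger leak detector (paraphrases, partial derivations) and, as the last point notes, a separate check that high fidelity actually translates into learning outcomes.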

Load-bearing premise

Pedagogical principles can be turned into concrete constraints inside separate modules while keeping the overall interaction coherent and helpful.

What would settle it

A controlled study in which students using the modular system show no measurable gains in critical-thinking or transfer skills compared with students using a standard monolithic LLM tutor.

Figures

Figures reproduced from arXiv: 2605.30187 by Emely Wuenscher, Felix Jahn, Julius Gabelmann, Kevin Baum, Sophie van Rossum, Timo P. Gros, Verena Wolf.

Figure 1. Contribution of modular chatbot architectures to the identified desiderata for a responsible AI usage in education.
Figure 2. Visualization of the modular architecture of the chat component, including the steps of the output generation of each module; grayed steps are hidden to the user.
Figure 3. Adversarial test: Researcher repeatedly requests the complete solution with escalating urgency. The hint module consistently refuses and instead offers guiding questions, demonstrating the system’s adherence to pedagogical constraints despite deliberate pressure.
Figure 4. Example of the system providing appropriate scaffolding to a user. For the first user request, the hint module is activated; for subsequent requests in which the user provides partial solutions, the evaluation module is activated.
Original abstract

The widespread adoption of AI chatbots in education will drastically change learning, making responsible deployment a critical concern. While large language models (LLMs) might have access to sources discussing insights from educational sciences, they are not particularly inclined to adhere to pedagogical concepts, risking negative effects on the learning process, such as a loss of transfer capabilities, critical thinking, or creativity. In this paper, we introduce an agentic AI chatbot architecture assisting students with exercise solving, specifically designed to contribute to more responsible AI use in education. We base our conceptual development on the identification of several desiderata for responsible LLM-based educational systems, argue for the structural shortcomings inherent in monolithic, out-of-the-box solutions, and instead suggest modularizing the agentic architecture. We propose specific modules for different stages of exercise solving, enabling incorporation of targeted pedagogical advice, guiding students through the learning process in a more controllable, transparent, and overseeable manner.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a modular agentic architecture for LLM-based educational chatbots to assist with exercise solving. It identifies desiderata for responsible AI in education, critiques monolithic LLMs for failing to adhere to pedagogical concepts (risking loss of transfer, critical thinking, and creativity), and outlines stage-specific modules that purportedly enable targeted incorporation of educational science principles for more controllable, transparent, and overseeable guidance.

Significance. If the modular design can be realized with enforceable constraints, it could supply a reusable blueprint for aligning LLM agents with pedagogical goals in education, addressing a timely concern as AI chatbots proliferate. The contribution is primarily conceptual, resting on the definitional claim that decomposition by exercise-solving stage facilitates constraint injection; no empirical results or implementations are provided to demonstrate realized benefits.

major comments (2)
  1. [Abstract] Abstract and conceptual development section: the central claim that modularization overcomes the 'structural shortcomings inherent in monolithic, out-of-the-box solutions' and enables 'targeted pedagogical advice' is presented without any concrete specification of module interfaces, constraint mechanisms, or information flow between stages. This makes the controllability and transparency benefits definitional rather than demonstrated.
  2. [conceptual development] The translation of pedagogical concepts into module-level constraints (the weakest assumption noted in the design) is asserted but not illustrated with even a single worked example of how, e.g., a 'critical thinking' desideratum would be encoded in a particular stage module without coherence loss. This is load-bearing for the claim of responsible learning assistance.
minor comments (1)
  1. The manuscript would benefit from an explicit list or diagram of the proposed modules and their inputs/outputs to make the architecture reproducible from the text.
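A listing of module inputs and outputs of the kind requested here might take the following shape. This is a sketch under stated assumptions: the types `ModuleInput`, `ModuleOutput`, and `StageModule`, their field names, and the toy `EchoHintModule` are all hypothetical. The split between a visible reply and hidden steps is modelled on the Figure 2 caption, which says some generation steps are hidden from the user.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass(frozen=True)
class ModuleInput:
    exercise: str
    student_message: str
    constraints: List[str]  # pedagogical rules injected for this stage


@dataclass
class ModuleOutput:
    visible_reply: str  # shown to the student
    hidden_steps: List[str] = field(default_factory=list)  # retained for oversight


class StageModule(Protocol):
    name: str

    def run(self, request: ModuleInput) -> ModuleOutput: ...


class EchoHintModule:
    """Toy module that satisfies the protocol without calling a model."""

    name = "hint"

    def run(self, request: ModuleInput) -> ModuleOutput:
        plan = f"apply {len(request.constraints)} constraint(s)"
        return ModuleOutput(visible_reply="What is given in the exercise?", hidden_steps=[plan])


def run_stage(module: StageModule, request: ModuleInput) -> ModuleOutput:
    """Run one stage and record which module produced the output."""
    out = module.run(request)
    out.hidden_steps.insert(0, f"module={module.name}")
    return out
```

Keeping the hidden steps in the output object, rather than discarding them, is one way a teacher-facing log could be built without showing intermediate reasoning to the student.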

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments correctly identify that the current manuscript remains at a high level of abstraction. We will revise to add the requested concreteness while preserving the paper's conceptual focus.

  1. Referee: [Abstract] Abstract and conceptual development section: the central claim that modularization overcomes the 'structural shortcomings inherent in monolithic, out-of-the-box solutions' and enables 'targeted pedagogical advice' is presented without any concrete specification of module interfaces, constraint mechanisms, or information flow between stages. This makes the controllability and transparency benefits definitional rather than demonstrated.

    Authors: We agree that the absence of explicit module interfaces and information-flow descriptions leaves the claimed advantages at the definitional level. In the revision we will insert a new subsection that defines the input/output signatures of each stage module, the format in which pedagogical constraints are injected, and the protocol by which stage outputs are passed to the next module. revision: yes

  2. Referee: [conceptual development] The translation of pedagogical concepts into module-level constraints (the weakest assumption noted in the design) is asserted but not illustrated with even a single worked example of how, e.g., a 'critical thinking' desideratum would be encoded in a particular stage module without coherence loss. This is load-bearing for the claim of responsible learning assistance.

    Authors: We accept that a concrete worked example is necessary to substantiate the translation step. The revised manuscript will include at least one fully elaborated example showing how the desideratum of critical thinking is operationalized as a constraint within a designated stage module, including the prompt template, verification step, and mechanism for preserving coherence with adjacent stages. revision: yes
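To make the promised ingredients concrete, the sketch below pairs a prompt template with a verification step for a hint-stage module. It is not the authors' example: the template wording, the function names `verify` and `guarded_hint`, the retry count, and the fallback question are assumptions, and the verification is a plain substring check rather than anything described in the paper.

```python
from string import Template
from typing import Callable

# Hypothetical prompt template for a hint-stage module; the wording is illustrative only.
HINT_TEMPLATE = Template(
    "You are a tutor. Exercise: $exercise\n"
    "Rule: ask the student to justify their next step; do not state the result.\n"
    "Student: $message"
)

FALLBACK = "Which step are you unsure about, and why?"


def verify(reply: str, solution: str) -> bool:
    """Verification step: accept a draft only if it does not contain the solution string."""
    return solution.lower() not in reply.lower()


def guarded_hint(llm: Callable[[str], str], exercise: str, message: str,
                 solution: str, max_tries: int = 3) -> str:
    """Draft a hint, re-draft when verification fails, and fall back to a fixed question."""
    prompt = HINT_TEMPLATE.substitute(exercise=exercise, message=message)
    for _ in range(max_tries):
        reply = llm(prompt)
        if verify(reply, solution):
            return reply
    return FALLBACK
```

The sketch covers the template and the verification step only. How coherence with adjacent stages would be preserved is left open here, as it is in the rebuttal.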

Circularity Check

0 steps flagged

No significant circularity


The manuscript is a purely conceptual design proposal. It states desiderata for responsible educational LLMs, notes limitations of monolithic models, and outlines a modular architecture with stage-specific modules. No equations, fitted parameters, predictions, or uniqueness theorems appear. All central claims are definitional to the proposed design (i.e., the architecture is defined to incorporate pedagogical constraints) rather than derived from or reducing to prior fitted results or self-citations. The paper is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on domain assumptions about LLM limitations and the feasibility of modular pedagogical control; no free parameters or invented physical entities are introduced.

axioms (2)
  • domain assumption: Monolithic out-of-the-box LLMs are structurally insufficient for responsible educational use because they do not reliably adhere to pedagogical concepts.
    Stated in the abstract as the motivation for modularization.
  • domain assumption: Pedagogical concepts from educational sciences can be incorporated via targeted module-level advice without compromising overall system coherence.
    Implicit in the claim that modules enable controllable guidance.

pith-pipeline@v0.9.1-grok · 5707 in / 1157 out tokens · 19697 ms · 2026-06-29T07:23:33.872779+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 20 canonical work pages · 3 internal anchors

  1. [1]

    Abdelrahman, G.M., Wang, Q., Nunes, B.P.: Knowledge tracing: A survey. ACM Computing Surveys 55, 1–37 (2022), https://api.semanticscholar.org/CorpusID:246035545

  2. [2]

    Anson, D.W.J.: The impact of large language models on university students’ literacy development: a dialogue with lea and street’s academic literacies framework. Higher Education Research & Development 43(7), 1465–1478 (2024). https://doi.org/10.1080/07294360.2024.2332259

  3. [3]

    Anthropic: The claude 3 model family: Opus, sonnet, haiku. Tech. rep., Anthropic PBC (March 2024), https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf, official model card PDF for Claude 3 family (Opus, Sonnet, Haiku)

  4. [4]

    Blobstein, A., Izmaylov, D., Yifat, T., Levy, M., Segal, A.: Angel: A new generation tool for learning material based questions and answers. In: Proc. of the NeurIPS Workshop on Generative AI for Education (GAIED) (2023)

  5. [5]

    Cheong, B.C.: Transparency and accountability in ai systems: safeguarding well-being in the age of algorithmic decision-making. Frontiers in Human Dynamics 6 (Jul 2024). https://doi.org/10.3389/fhumd.2024.1421273

  6. [6]

    Córdova-Esparza, D.M.: Ai-powered educational agents: Opportunities, innovations, and ethical challenges. Information 16(6) (2025), https://www.mdpi.com/2078-2489/16/6/469

  7. [7]

    Delikoura, I., Fung, Y.R., Hui, P.: From superficial outputs to superficial learning: Risks of large language models in education (2025), https://arxiv.org/abs/2509.21972

  8. [8]

    Demartini, C.G., Sciascia, L., Bosso, A., Manuri, F.: Artificial intelligence bringing improvements to adaptive learning in education: A case study. Sustainability 16(3) (2024). https://doi.org/10.3390/su16031347, https://www.mdpi.com/2071-1050/16/3/1347

  9. [9]

    Fagbohun, O., Harrison, R.M., Dereventsov, A.: An empirical categorization of prompting techniques for large language models: A practitioner’s guide (2024), https://arxiv.org/abs/2402.14837

  10. [10]

    Freeman, J.: Student generative ai survey 2025 (2025), https://www.hepi.ac.uk/reports/student-generative-ai-survey-2025/, based on a survey conducted by Savanta, Foreword by Professor Janice Kay CBE

  11. [11]

    Gemini Team et al.: Gemini: A family of highly capable multimodal models (2025), https://arxiv.org/abs/2312.11805

  12. [12]

    Gnambs, T., Stein, J.P., Zinn, S., Griese, F., Appel, M.: Attitudes, experiences, and usage intentions of artificial intelligence: A population study in germany. Telematics and Informatics 98, 102265 (2025). https://doi.org/10.1016/j.tele.2025.102265, https://www.sciencedirect.com/science/article/pii/S0736585325000279

  13. [13]

    Hattie, J., Timperley, H.: The power of feedback. Review of educational research 77(1), 81–112 (2007)

  14. [14]

    Jose, B., Cleetus, A., Joseph, B., Joseph, L., Jose, B., John, A.K.: Epistemic authority and generative ai in learning spaces: rethinking knowledge in the algorithmic age. Frontiers in Education, Volume 10 (2025). https://doi.org/10.3389/feduc.2025.1647687, https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1647687

  15. [15]

    Kaiser, G., Kaiser, S.: The use of ai in teaching and learning: The power of feedback. In: Foundations and Frameworks for AI in Education, pp. 255–290. IGI Global Scientific Publishing (2026)

  16. [16]

    Kapur, M.: Examining productive failure, productive success, unproductive failure, and unproductive success in learning. Educational psychologist 51(2), 289–299 (2016)

  17. [17]

    Khot, T., Trivedi, H., Finlayson, M., Fu, Y., Richardson, K., Clark, P., Sabharwal, A.: Decomposed prompting: A modular approach for solving complex tasks (2023), https://arxiv.org/abs/2210.02406

  18. [18]

    Koulu, R.: Proceduralizing control and discretion: Human oversight in artificial intelligence policy. Maastricht Journal of European and Comparative Law 27(6), 720–735 (2020). https://doi.org/10.1177/1023263X20978649

  19. [19]

    Krathwohl, D.R.: A revision of bloom’s taxonomy: An overview. Theory into practice 41(4), 212–218 (2002)

  20. [20]

    Larsson, S., Heintz, F.: Transparency in artificial intelligence. Internet Policy Review 9 (05 2020). https://doi.org/10.14763/2020.2.1469

  21. [21]

    Long, D., Magerko, B.: What is ai literacy? competencies and design considerations. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. pp. 1–16. CHI ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3313831.3376727

  22. [22]

    Ma, B., Li, H., Li, G., Chen, L., Tang, C., Xie, Y., Gu, C., Shimada, A., Konomi, S.: Scaffolding metacognition in programming education: Understanding student-ai interactions and design implications (2025), https://arxiv.org/abs/2511.04144

  23. [23]

    Malmqvist, L.: Sycophancy in large language models: Causes and mitigations. In: Arai, K. (ed.) Intelligent Computing. pp. 61–74. Springer Nature Switzerland, Cham (2025)

  24. [24]

    Neumann, A., Kirsten, E., Zafar, M.B., Singh, J.: Position is power: System prompts as a mechanism of bias in large language models (llms). In: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. pp. 573–598 (2025)

  25. [25]

    OpenAI Team et al.: Gpt-4 technical report (2024), https://arxiv.org/abs/2303.08774

  26. [26]

    O’Leary, D.E.: Confirmation and specificity biases in large language models: An explorative study. IEEE Intelligent Systems 40(1), 63–68 (2025). https://doi.org/10.1109/MIS.2024.3513992

  27. [27]

    Pew Research Center: Americans’ awareness of AI and views of use in daily life, control over it (September 2025), https://www.pewresearch.org/science/2025/09/17/ai-in-americans-lives-awareness-experiences-and-attitudes/

  28. [28]

    Schmucker, R., Xia, M., Azaria, A., Mitchell, T.: Ruffle&riley: Towards the automated induction of conversational tutoring systems (2023), https://arxiv.org/abs/2310.01420

  29. [29]

    Shetye, S.: An evaluation of khanmigo, a generative ai tool, as a computer-assisted language learning app. Studies in Applied Linguistics and TESOL 24(1) (2024)

  30. [30]

    Shridhar, K., Macina, J., El-Assady, M., Sinha, T., Kapur, M., Sachan, M.: Automatic generation of socratic subquestions for teaching math word problems. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. pp. 4136–4149 (2022)

  31. [31]

    Shute, V.J.: Focus on formative feedback. Review of educational research 78(1), 153–189 (2008)

  32. [32]

    Stadler, M., Bannert, M., Sailer, M.: Cognitive Ease at a Cost: LLMs Reduce Mental Effort but Compromise Depth in Student Scientific Inquiry. Computers in Human Behavior 160, 108386 (2024). https://doi.org/10.1016/j.chb.2024.108386, https://www.sciencedirect.com/science/article/pii/S0747563224002541

  33. [33]

    Sterz, S., Baum, K., Biewer, S., Hermanns, H., Lauber-Rönsberg, A., Meinel, P., Markus, L.: On the Quest for Effectiveness in Human Oversight: Interdisciplinary Perspectives (June 2024). https://doi.org/10.1145/3630106.3659051, FAccT ’24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency

  34. [34]

    Sweller, J.: Cognitive load theory. In: Psychology of learning and motivation, vol. 55, pp. 37–76. Elsevier (2011)

  35. [35]

    Varangot-Reille, C., Bouvard, C., Gourru, A., Ciancone, M., Schaeffer, M., Jacquenet, F.: Doing more with less: A survey on routing strategies for resource optimisation in large language model-based systems (2025), https://arxiv.org/abs/2502.00409

  36. [36]

    Vygotsky, L.S.: Mind in society: The development of higher psychological processes, vol. 86. Harvard university press (1978)

  37. [37]

    Wood, D., Bruner, J.S., Ross, G.: The role of tutoring in problem solving. Journal of child psychology and psychiatry 17(2), 89–100 (1976)

  38. [38]

    Wu, J.Y., Lee, Y.H., Chai, C.S., Tsai, C.C.: Strengthening human epistemic agency in the symbiotic learning partnership with generative artificial intelligence. Educational Researcher 54(6), 358–368 (2025). https://doi.org/10.3102/0013189X251333628