arxiv: 2605.06963 · v1 · submitted 2026-05-07 · cs.HC · cs.AI · cs.CL · cs.IR

From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle

Anna Ostrowska, Anna Wróblewska, Gabriela Majstrak, Jan Opala, Jan Skwarek, Michał Kukla, Sebastian Pergała

Pith reviewed 2026-05-11 01:07 UTC · model grok-4.3

keywords: AI in education · Moodle plugin · tutoring system · hallucination prevention · Socratic tutoring · human-in-the-loop AI · learning management system · educational technology

The pith

A Moodle-based AI tutoring system uses retrieval from teacher materials to provide faithful Socratic guidance that promotes deep student understanding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes the creation of an AI Teaching and Learning Assistant as a plugin for the Moodle learning platform. The system retrieves relevant sections from materials uploaded by teachers to inform its responses to student queries. By doing so, it seeks to avoid generating false information while using a questioning style to help students build genuine comprehension. Teachers can review and adjust the AI-generated content through a dedicated interface. Testing showed the responses stayed highly consistent with the source materials and users gave the system a solid recommendation rating.

Core claim

The paper demonstrates a dual-focused AI assistant for Moodle that supports students with interactive, question-driven tutoring and assists educators with oversight of content creation, all by grounding large language model outputs in teacher-supplied resources to achieve reliable educational interactions.

What carries the argument

Retrieval-augmented generation from teacher-provided materials, which supplies relevant context to condition responses, combined with Socratic dialogue for students and a supervised workspace for educators.
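The retrieve-then-prompt pattern described above can be illustrated with a minimal sketch. It is not the paper's implementation: the lexical cosine retriever, the function names (`retrieve`, `build_prompt`), the prompt wording, and the sample chunks are all assumptions made for illustration. A production system would more likely use embedding-based retrieval and pass the prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (toy lexical retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a grounded, Socratic-style prompt (hypothetical wording)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Use ONLY the course material below. If it does not cover the question, say so.\n"
        "Guide the student with a question rather than giving the answer outright.\n"
        f"Course material:\n{joined}\n"
        f"Student question: {query}"
    )


# Hypothetical teacher-uploaded chunks, not taken from the paper.
chunks = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "Mitosis is the process by which a cell divides into two identical cells.",
    "Chlorophyll absorbs light, mainly in the blue and red wavelengths.",
]
question = "How does photosynthesis store energy?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

Restricting the prompt to retrieved chunks is what ties the model's answer to the teacher's material. The instruction to respond with a guiding question is one plausible way to encode the Socratic style.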

If this is right

Students engage in tutoring sessions that emphasize conceptual understanding through guided questioning rather than direct answers.
Educators maintain control by supervising AI-generated content to ensure it aligns with their teaching goals.
The approach minimizes the chance of AI introducing incorrect facts into the learning process.
High scores in faithfulness and user approval suggest the tool can integrate effectively into existing classroom workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This grounding technique might be adapted to other educational software to improve AI safety across different systems.
Long-term use could allow the system to identify common student misconceptions from interaction patterns and address them proactively.
Such systems may encourage more teachers to experiment with AI by providing built-in safeguards against loss of content authority.

Load-bearing premise

Retrieval from teacher-provided materials will be sufficient to keep all AI responses accurate and to guide students toward deep rather than shallow learning.

What would settle it

Either of two findings: the AI produces a response containing details absent from the teacher materials, or students using the system score no better on assessments of understanding than students using standard tools.

Figures

Figures reproduced from arXiv: 2605.06963 by Anna Ostrowska, Anna Wróblewska, Gabriela Majstrak, Jan Opala, Jan Skwarek, Michał Kukla, Sebastian Pergała.

**Figure 1.** Our system architecture: Presentation Layer – Moodle Plugin and Frontend Module, Application Layer – Backend Module, Data…

Original abstract

This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin that leverages Retrieval-Augmented Generation (RAG) to deliver high-quality, hallucination-free education. The system employs a dual-centric design, providing students with interactive, Socratic-based tutoring and educators with a "human-in-the-loop" workspace for supervised content generation. By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework and a preliminary user study confirms its effectiveness, achieving faithfulness scores up to 0.97 and a 4.00/5.00 recommendation rate.
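For readers unfamiliar with the metric, Ragas documents faithfulness as the fraction of claims in a generated answer that the retrieved context supports. The sketch below shows only that final ratio. In Ragas an LLM judge extracts the claims and produces the verdicts, whereas here the verdict list is a hand-written placeholder and the function name `faithfulness` is an illustrative assumption, not the library's API.

```python
def faithfulness(verdicts: list[bool]) -> float:
    """Fraction of answer claims judged as supported by the retrieved context."""
    if not verdicts:
        raise ValueError("answer produced no claims to verify")
    return sum(verdicts) / len(verdicts)


# Hypothetical judge verdicts for an answer split into four claims.
verdicts = [True, True, True, False]
score = faithfulness(verdicts)  # 3 supported claims out of 4
```

On this reading, a score of 0.97 means that about 97 percent of the extracted claims were judged to be supported by the context. The score therefore depends on the reliability of the judge model as well as on the tutoring system itself.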

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. This demo paper describes the development of the AI Teaching & Learning Assistant, a modular Moodle plugin using Retrieval-Augmented Generation (RAG) to ground LLM responses in teacher-provided materials. It offers Socratic tutoring for students and a human-in-the-loop workspace for educators, claiming to mitigate misinformation risks while promoting deep conceptual mastery. Evaluation via the Ragas LLM-as-judge framework reports faithfulness scores up to 0.97, and a preliminary user study reports a 4.00/5.00 recommendation rate.

Significance. If the system performs as described, the work could contribute to HCI in education by providing a practical, deployable example of grounded AI tutoring integrated with an existing LMS. The dual-centric design and emphasis on teacher oversight address real deployment concerns around hallucinations. However, the significance for claims of 'deep understanding' is limited without evidence of learning gains beyond faithfulness and satisfaction metrics.

major comments (2)

[Evaluation] Evaluation section: The reported Ragas faithfulness scores (max 0.97) and 4.00/5 recommendation rate from the preliminary user study address hallucination risk and user satisfaction but provide no direct measures of conceptual depth, such as pre/post learning assessments, Bloom's taxonomy analysis of dialogue, or control-group comparisons, which are required to support the central claim of moving from surface learning to deep understanding.
[Abstract] Abstract and introduction: The claim that the system 'encourages deep conceptual mastery' rests on the assumption that RAG grounding plus Socratic interaction produces higher-order learning, yet the evaluation description supplies no details on study design, sample size, statistical tests, or potential biases in the LLM-as-judge setup, leaving the claim unsupported.
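As an illustration of the pre/post and control-group comparison requested above, the sketch below scores each student with Hake's normalized gain, (post − pre) / (max − pre), and averages it per group. The paper does not use this measure. The scores are invented placeholders, and the names `normalized_gain`, `tutor_group`, and `control_group` are assumptions.

```python
from statistics import mean


def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Hake's normalized gain: share of the possible improvement actually achieved."""
    if pre >= max_score:
        raise ValueError("pre-test already at ceiling; gain is undefined")
    return (post - pre) / (max_score - pre)


# Hypothetical (pre, post) test scores; NOT data from the paper.
tutor_group = [(40, 70), (50, 80), (60, 80)]
control_group = [(40, 55), (50, 60), (60, 70)]

tutor_gain = mean(normalized_gain(pre, post) for pre, post in tutor_group)
control_gain = mean(normalized_gain(pre, post) for pre, post in control_group)
```

A real study would also need an adequate sample size and a significance test before attributing any difference between the groups to the tutoring system.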

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our demo paper. We acknowledge that the current evaluation is preliminary and does not directly measure learning gains or conceptual depth. We will revise the manuscript to moderate claims about 'deep conceptual mastery,' provide additional details on the user study where possible, and clarify the scope of the demo. Point-by-point responses follow.

Referee: Evaluation section: The reported Ragas faithfulness scores (max 0.97) and 4.00/5 recommendation rate from the preliminary user study address hallucination risk and user satisfaction but provide no direct measures of conceptual depth, such as pre/post learning assessments, Bloom's taxonomy analysis of dialogue, or control-group comparisons, which are required to support the central claim of moving from surface learning to deep understanding.

Authors: We agree that the evaluation does not include direct measures of conceptual depth or learning outcomes. As this is a demo paper focused on system development and initial deployment feasibility, the reported metrics were limited to RAG faithfulness via the Ragas framework and a small-scale user satisfaction survey. We will revise the Evaluation section to explicitly note these limitations, expand on the preliminary study design and sample characteristics to the extent space allows, and reframe the contribution around mitigating hallucinations and supporting interactive tutoring rather than claiming demonstrated gains in deep understanding. Comprehensive learning assessments are planned as future work.

revision: partial

Referee: Abstract and introduction: The claim that the system 'encourages deep conceptual mastery' rests on the assumption that RAG grounding plus Socratic interaction produces higher-order learning, yet the evaluation description supplies no details on study design, sample size, statistical tests, or potential biases in the LLM-as-judge setup, leaving the claim unsupported.

Authors: The referee correctly identifies that the abstract and introduction advance a claim not fully supported by the evaluation details. We will revise both sections to use more cautious language, describing the system as designed to encourage deeper engagement through grounded Socratic dialogue while clearly stating that empirical evidence of learning gains is not provided in this work. We will also add a brief description of the user study methodology in the Evaluation section to address transparency concerns around the LLM-as-judge approach.

revision: yes

Circularity Check

0 steps flagged

No circularity: system description with empirical evaluation only.

The paper is a demo/system description of a Moodle RAG plugin for Socratic tutoring. It reports faithfulness via Ragas (max 0.97) and a 4.00/5 user-study recommendation rate. No equations, fitted parameters, derivations, or self-citation chains appear. Central claims rest on external metrics and a preliminary study rather than reducing to internal definitions or prior self-work by construction. This matches the default non-circular case for non-mathematical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or new theoretical entities are introduced; the work relies on standard assumptions about RAG reducing hallucinations when sources are curated.

pith-pipeline@v0.9.0 · 5454 in / 1018 out tokens · 22355 ms · 2026-05-11T01:07:37.158638+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear
By grounding Large Language Model (LLM) responses in teacher-provided materials, the assistant addresses the risks of misinformation while encouraging deep conceptual mastery. Evaluation via the Ragas (LLM-as-a-Judge) framework...
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear
foster critical thinking by using pedagogical frameworks like the Socratic method and Bloom’s Taxonomy
