Pith · machine review for the scientific record

arxiv: 2604.04670 · v1 · submitted 2026-04-06 · 📡 eess.IV · cs.AI · cs.CY · eess.SP

Recognition: 1 theorem link

· Lean Theorem

An AI Teaching Assistant for Motion Picture Engineering

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 19:16 UTC · model grok-4.3

classification 📡 eess.IV · cs.AI · cs.CY · eess.SP
keywords AI teaching assistant · retrieval augmented generation · motion picture engineering · open-book exams · student performance · LLM tutors · educational technology

The pith

An AI teaching assistant can be used in open-book exams without changing student performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes the construction of an AI teaching assistant for a master's course in motion picture engineering using a retrieval-augmented generation approach. The system was tuned to answer course-specific questions and then deployed for seven weeks, during which 43 students produced 1,889 queries. The authors tested the tool by permitting its use in three open-book examinations and found no statistically significant difference in scores between students who had access and those who did not. Survey responses indicated that students viewed the assistant as helpful, though they did not strongly prefer it to human tutoring. The results suggest that assessments can be designed to retain their ability to measure learning even when AI support is available.

Core claim

The authors built and tuned a retrieval-augmented generation pipeline to serve as an AI teaching assistant for the motion picture engineering course, supplying students with accurate answers drawn from course materials. In a trial involving 43 students across 296 sessions, statistical tests on three separate exams showed no statistically significant performance difference (p > 0.05) whether the AI tool was accessible or not. This outcome supports the claim that thoughtfully constructed assessments can preserve academic validity. Students rated the tool's usefulness at 4.22 out of 5 yet gave only a moderate preference for it over human tutoring at 2.78 out of 5.

What carries the argument

The retrieval-augmented generation pipeline that retrieves relevant course documents and feeds them to the language model to generate answers tailored to motion picture engineering questions.
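The paper includes its prompt and code; as a generic illustration of the retrieval step such a pipeline relies on, here is a minimal sketch. The hashing embedding, function names, and sample documents below are placeholders invented for this note, not the authors' implementation (which, per Figure 2, is built on Microsoft tooling):

```python
import math
import zlib
from collections import Counter

def embed(text, dim=256):
    # Toy bag-of-words hashing embedding; a real pipeline would call an
    # embedding model here. Returns a unit-normalized sparse vector.
    counts = Counter(zlib.crc32(tok.encode()) % dim for tok in text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {b: c / norm for b, c in counts.items()} if norm else {}

def cosine(u, v):
    # Cosine similarity of two unit-normalized sparse vectors.
    return sum(w * v.get(b, 0.0) for b, w in u.items())

def retrieve(query, documents, k=2):
    # Rank course documents by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(embed(d), q), reverse=True)[:k]

def build_prompt(query, documents):
    # Ground the LLM call in the retrieved course material.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this course material:\n{context}\n\nQuestion: {query}"

# Illustrative course snippets, not from the paper
course_docs = [
    "Chroma subsampling reduces colour resolution in video signals.",
    "Audio codecs compress sound for delivery.",
    "Chroma subsampling schemes include 4:2:0 and 4:2:2.",
]
top = retrieve("what is chroma subsampling in video", course_docs)
```

A production pipeline would swap the toy `embed` for a real embedding model and send the output of `build_prompt` to the LLM.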

If this is right

  • Thoughtfully designed open-book exams can maintain their validity even when students have access to AI assistance.
  • A retrieval-augmented generation system can supply consistent, domain-specific help throughout an entire course term.
  • Usage logs and post-course surveys can quantify how students interact with and value an AI teaching assistant.
  • Students may accept AI support as beneficial while still seeing value in human tutoring for certain tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same implementation could be tried in other engineering or technical courses to check whether the lack of score difference appears more widely.
  • Longer-term use might gradually change how students prepare for exams, shifting emphasis toward skills the AI cannot easily replace.
  • Pairing the AI tool with optional human tutoring sessions might raise overall student satisfaction beyond what either provides alone.

Load-bearing premise

That the retrieval-augmented generation system produced sufficiently accurate, course-relevant answers, and that the survey responses and usage logs captured genuine educational effects without large biases from self-reporting or participation patterns.

What would settle it

A larger, adequately powered follow-up trial: a statistically significant difference in average exam scores between the group allowed to use the AI assistant and the group without access would overturn the claim, while a precise null result would reinforce it.
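Whether such a trial could detect anything depends on its size. A back-of-envelope power calculation (the standard normal-approximation formula for a two-sided, two-sample t-test; the effect sizes are illustrative, not from the paper) suggests why a cohort of 43 limits what a null result can show:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    # Normal-approximation sample size per group for a two-sided,
    # two-sample t-test detecting standardized effect size d.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

medium = n_per_group(0.5)  # "medium" effect
small = n_per_group(0.2)   # "small" effect
```

By this estimate a medium effect (d = 0.5) already needs about 63 students per arm, more than the whole cohort in the study, and a small effect (d = 0.2) roughly 393, so a convincing follow-up would likely span several course offerings.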

Figures

Figures reproduced from arXiv: 2604.04670 by Anil C. Kokaram, Deirdre O'Regan.

Figure 1. A typical RAG-based AI-TA: based on the student’s query, the AI-TA searches course materials, retrieves relevant context …
Figure 2. Our AI-TA’s implementation architecture and core student workflow. To use this system, create a Project in Microsoft …
Figure 3. Workflow for ingesting course materials: first, transform all course materials to text-based documents, then using …
Figure 4. AI-TA engagement: number of MPE student queries per day over 7 weeks. Annotations show correlations between …
Figure 5. Student perceptions of the AI-TA from the course exit survey (question texts summarized for brevity). We calculated …
read the original abstract

The rapid rise of LLMs over the last few years has promoted growing experimentation with LLM-driven AI tutors. However, the details of implementation, as well as the benefit in a teaching environment, are still in the early days of exploration. This article addresses these issues in the context of implementation of an AI Teaching Assistant (AI-TA) using Retrieval Augmented Generation (RAG) for Trinity College Dublin's Master's Motion Picture Engineering (MPE) course. We provide details of our implementation (including the prompt to the LLM, and code), and highlight how we designed and tuned our RAG pipeline to meet course needs. We describe our survey instrument and report on the impact of the AI-TA through a number of quantitative metrics. The scale of our experiment (43 students, 296 sessions, 1,889 queries over 7 weeks) was sufficient to have confidence in our findings. Unlike previous studies, we experimented with allowing the use of the AI-TA in open-book examinations. Statistical analysis across three exams showed no performance differences regardless of AI-TA access (p > 0.05), demonstrating that thoughtfully designed assessments can maintain academic validity. Student feedback revealed that the AI-TA was beneficial (mean = 4.22/5), while students had mixed feelings about preferring it over human tutoring (mean = 2.78/5).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents the development and evaluation of an AI Teaching Assistant (AI-TA) based on Retrieval-Augmented Generation (RAG) for Trinity College Dublin's Master's program in Motion Picture Engineering. It details the implementation, including prompt design and RAG tuning, and reports on a 7-week study with 43 students, 296 sessions, and 1,889 queries. Key findings include no statistically significant differences in exam performance with or without AI-TA access (p > 0.05) in open-book exams, positive student ratings for the AI-TA's helpfulness (mean 4.22/5), and mixed views on preferring it over human tutors (mean 2.78/5).

Significance. If the statistical findings are robustly supported, the paper makes a valuable contribution by providing a reproducible example of RAG-based AI tutoring in a specialized technical field, exploring the integration of such tools into high-stakes assessments without apparent compromise to validity, and offering empirical data on student perceptions at a scale larger than many similar studies. The inclusion of implementation specifics and code supports broader adoption and further research in AI-assisted education.

major comments (3)
  1. The central claim that 'thoughtfully designed assessments can maintain academic validity' is based on non-significant exam score differences (p > 0.05) across three exams. However, the manuscript does not report group sizes per condition, the method of assigning AI-TA access (e.g., randomized, per-exam variation, or self-selection), the specific statistical test used, any adjustments for multiple comparisons, effect sizes, or post-hoc power analysis. With a total of only 43 students, these details are essential to evaluate whether the failure to reject the null hypothesis supports the stronger interpretation of maintained validity.
  2. There is insufficient information on how AI-TA usage was monitored or restricted during the open-book examinations. Potential confounds such as unmonitored access, differential engagement levels, or selection effects could influence the performance comparison, undermining the conclusion that the assessments maintained validity regardless of access.
  3. The reported mean feedback scores (4.22/5 for benefit and 2.78/5 for preference over human tutoring) lack accompanying statistics such as standard deviations, number of respondents, or response rate. Additionally, details on survey design, validation, and administration (e.g., timing, anonymity) are needed to assess the reliability and potential biases in the quantitative metrics.
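One concrete way to handle the multiple-comparisons point in major comment 1 is a Holm step-down adjustment over the three per-exam p-values; it controls the family-wise error rate under arbitrary dependence between the exams. The sketch below is illustrative, not taken from the paper:

```python
def holm_adjust(pvals):
    # Holm step-down adjustment for a family of p-values
    # (e.g., one t-test per exam across the three exams).
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is scaled by m, next by m-1, ...;
        # running_max enforces monotonicity of the adjusted values.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

With three exams, the smallest raw p-value is multiplied by 3, the next by 2, and the largest left alone, so reporting Holm-adjusted values would directly answer the referee's request.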
minor comments (2)
  1. While the paper provides the prompt to the LLM and code, consider including a quantitative assessment of the RAG system's retrieval precision or response accuracy on course-specific queries to strengthen claims about its reliability.
  2. The manuscript would benefit from a table summarizing key usage statistics (e.g., queries per student, session durations) and error bars or confidence intervals on the reported means for better visualization of the results.
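The request for error bars in minor comment 2 can be met with simple resampling (compare reference [10] in the extracted list, a bootstrap resampling script). A percentile-bootstrap sketch on synthetic Likert responses, illustrative data only:

```python
import random
from statistics import fmean

def bootstrap_ci(data, n_boot=5000, ci=0.95, seed=0):
    # Percentile bootstrap confidence interval for the mean:
    # resample with replacement, take the empirical quantiles of the means.
    rng = random.Random(seed)
    means = sorted(fmean(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = means[int((1 - ci) / 2 * n_boot)]
    hi = means[int((1 + ci) / 2 * n_boot)]
    return lo, hi

# Illustrative synthetic Likert responses (not the paper's raw data)
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4, 3, 5, 4, 5]
low, high = bootstrap_ci(ratings)
```

Applied per survey item, the same function would yield the confidence intervals the referee asks for without any distributional assumptions.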

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which highlight important areas for improving the clarity and robustness of our manuscript. We address each major comment below and will revise the paper to incorporate the requested information where possible.

read point-by-point responses
  1. Referee: The central claim that 'thoughtfully designed assessments can maintain academic validity' is based on non-significant exam score differences (p > 0.05) across three exams. However, the manuscript does not report group sizes per condition, the method of assigning AI-TA access (e.g., randomized, per-exam variation, or self-selection), the specific statistical test used, any adjustments for multiple comparisons, effect sizes, or post-hoc power analysis. With a total of only 43 students, these details are essential to evaluate whether the failure to reject the null hypothesis supports the stronger interpretation of maintained validity.

    Authors: We agree that these details are essential for readers to properly interpret the non-significant results. In the revised manuscript we will add the per-exam group sizes, clarify that AI-TA access was assigned on a per-exam basis with randomization where feasible to reduce selection bias, specify the statistical tests (independent-samples t-tests), note that separate analyses per exam meant no multiple-comparison adjustment was applied, report Cohen’s d effect sizes, and include a post-hoc power analysis. These additions will allow a more nuanced evaluation of whether the data support the claim of maintained validity. revision: yes

  2. Referee: There is insufficient information on how AI-TA usage was monitored or restricted during the open-book examinations. Potential confounds such as unmonitored access, differential engagement levels, or selection effects could influence the performance comparison, undermining the conclusion that the assessments maintained validity regardless of access.

    Authors: We acknowledge the need for greater transparency on this point. The revised methods section will describe that access was controlled through authenticated user sessions on the platform, with server logs used to verify usage patterns during exam windows. For exams where the AI-TA was not permitted, the tool was disabled for those students. We will also add an explicit discussion of remaining limitations, including the possibility of students using other AI systems, and how this affects interpretation of the validity claim. revision: yes

  3. Referee: The reported mean feedback scores (4.22/5 for benefit and 2.78/5 for preference over human tutoring) lack accompanying statistics such as standard deviations, number of respondents, or response rate. Additionally, details on survey design, validation, and administration (e.g., timing, anonymity) are needed to assess the reliability and potential biases in the quantitative metrics.

    Authors: We will expand the results and methods sections to report standard deviations, the exact number of respondents and response rate, and full survey details including question wording, administration (voluntary anonymous online form at the end of the course), and any validation steps such as use of items adapted from prior AI-education instruments. This will improve the transparency and interpretability of the student-perception data. revision: yes
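The statistics promised in these responses (independent-samples t-tests plus Cohen's d from the pooled standard deviation) are standard; a sketch with synthetic exam scores, assuming SciPy is available, shows the quantities involved. The numbers are illustrative, not the study's data:

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def compare_groups(with_ai, without_ai):
    # Independent-samples t-test plus Cohen's d (pooled SD),
    # the two quantities the revision promises to report per exam.
    t, p = stats.ttest_ind(with_ai, without_ai)
    na, nb = len(with_ai), len(without_ai)
    pooled_sd = sqrt(((na - 1) * stdev(with_ai) ** 2 +
                      (nb - 1) * stdev(without_ai) ** 2) / (na + nb - 2))
    d = (mean(with_ai) - mean(without_ai)) / pooled_sd
    return t, p, d

# Synthetic scores for the two arms of one exam
t, p, d = compare_groups([62, 71, 68, 74, 66, 70], [64, 69, 67, 73, 65, 71])
```

Reporting d alongside p for each exam would let readers judge how far the observed differences sit from, say, a medium effect, which is exactly what the referee's power concern turns on.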

Circularity Check

0 steps flagged

No circularity: empirical implementation and user study with direct data reporting

full rationale

The paper describes an RAG-based AI-TA implementation for a specific course, reports survey means (e.g., 4.22/5 benefit rating), query logs (1,889 queries), and statistical comparisons of exam scores across conditions (p > 0.05). No equations, derivations, fitted parameters, or first-principles predictions exist that could reduce to inputs by construction. Claims rest on primary data collection rather than self-referential modeling or self-citation chains. The analysis is self-contained and externally falsifiable via the reported exam outcomes and feedback metrics.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The study is an applied empirical evaluation relying on established RAG techniques and standard statistical hypothesis testing rather than novel mathematical constructs.

axioms (1)
  • standard math Standard statistical hypothesis testing to assess no significant difference in exam scores (p > 0.05)
    Invoked when reporting performance comparisons across exam groups with and without AI-TA access.

pith-pipeline@v0.9.0 · 5547 in / 1286 out tokens · 78791 ms · 2026-05-10T19:16:17.052878+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 6 canonical work pages

  1. [1]

    Attention is all you need

    A. Vaswani et al., “Attention is all you need,” in Advances in Neural Information Processing Systems, 2017, pp. 5998–6008. [Online]. Available: https://papers.nips.cc/paper/7181-attention-is-all-you-need

  2. [2]

    Retrieval-augmented generation for knowledge-intensive NLP tasks

    P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459–9474. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html

  3. [3]

    Retrieval-augmented generation for educational application: A systematic survey

    Z. Li et al., “Retrieval-augmented generation for educational application: A systematic survey,” Comput. Educ. Artif. Intell., vol. 8, p. 100417, Jun. 2025, doi: 10.1016/j.caeai.2025.100417. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666920X25000578

  4. [4]

    An LLM-driven chatbot in higher education for databases and information systems

    A. T. Neumann et al., “An LLM-driven chatbot in higher education for databases and information systems,” IEEE Trans. Educ., vol. 68, no. 1, pp. 103–116, Feb. 2025, doi: 10.1109/TE.2024.3467912. [Online]. Available: https://doi.org/10.1109/TE.2024.3467912

  5. [5]

    Teaching CS50 with AI: Leveraging generative artificial intelligence in computer science education

    R. Liu et al., “Teaching CS50 with AI: Leveraging generative artificial intelligence in computer science education,” in Proc. 55th ACM Tech. Symp. Comput. Sci. Educ. (SIGCSE), Portland, OR, USA, Mar. 2024, pp. 750–756, doi: 10.1145/3626252.3630938. [Online]. Available: https://doi.org/10.1145/3626252.3630938

  6. [6]

    HiTA: A RAG-based educational platform that centers educators in the instructional loop

    C. Liu, L. Hoang, A. Stolman, and B. Wu, “HiTA: A RAG-based educational platform that centers educators in the instructional loop,” in Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science, vol. 14830, A. M. Olney, I. A. Chounta, Z. Liu, O. C. Santos, and I. I. Bittencourt, Eds. Cham, Switzerland: Springer, 2024, pp. 405–412, d...

  7. [7]

    Exploring generative AI in higher education: A RAG system to enhance student engagement with scientific literature

    D. Thüs, S. Malone, and R. Brünken, “Exploring generative AI in higher education: A RAG system to enhance student engagement with scientific literature,” Front. Psychol., vol. 15, p. 1474892, 2024, doi: 10.3389/fpsyg.2024.1474892. [Online]. Available: https://doi.org/10.3389/fpsyg.2024.1474892

  8. [8]

    Student use of an LLM incrementally fine-tuned to behave like a teaching assistant in business and marketing

    A. Smeaton, “Student use of an LLM incrementally fine-tuned to behave like a teaching assistant in business and marketing,” in Using GenAI in Teaching, Learning and Assessment in Irish Universities, A. E. Schalk Quintanar and P. Rooney, Eds. Trinity College Dublin and University College Cork, 2025. [Online]. Available: https://ucclibrary.pressbooks.pub/g...

  9. [9]

    A Dialogic Teaching Companion

    R. Alexander, A Dialogic Teaching Companion. London, UK: Routledge, 2020.

  10. [10]

    bootstrap.py - resampling analysis for multiple comparisons

    G.-C. Pascutto, “bootstrap.py - resampling analysis for multiple comparisons,” Computer software, Version 1.0, GNU Affero GPL v3, Feb. 2011. [Online]. Available: https://www.sjeng.org/ftp/bootstrap.py

April 7, 2026 · Accepted for publication in IEEE Signal Processing Magazine