pith. machine review for the scientific record. sign in

arxiv: 2604.05266 · v1 · submitted 2026-04-07 · 💻 cs.MM

Recognition: 2 theorem links

· Lean Theorem

LLM2Manim: Pedagogy-Aware AI Generation of STEM Animations

Authors on Pith no claims yet

Pith reviewed 2026-05-10 19:19 UTC · model grok-4.3

classification 💻 cs.MM
keywords large language modelsManimSTEM animationsmultimedia learninghuman-in-the-loopeducational technologyanimation generationpedagogy-aware AI
0
0 comments X

The pith

A human-in-the-loop LLM pipeline produces Manim animations for STEM topics that improve test scores and engagement over PowerPoint slides.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a semi-automated system can use large language models to generate narrated animations of math and physics concepts in the Manim library while following multimedia learning principles such as segmentation, signaling, and dual coding. This matters because creating high-quality animations has traditionally required specialized skills and time, keeping them rare in everyday teaching despite their potential to support learning. The pipeline stabilizes outputs through constrained prompts, a symbol ledger for consistency, selective regeneration of errors, and expert human review. In a within-subject study with 100 undergraduates, the resulting animations produced higher post-test scores, larger learning gains, greater engagement, and lower cognitive load than equivalent PowerPoint instruction.

Core claim

The authors demonstrate a human-in-the-loop pipeline that converts STEM concepts into narrated Manim animations aligned with multimedia learning ideas. After testing with 100 students in paired lessons, the animation format yielded 83 percent post-test scores versus 78 percent for slides, with effect sizes of 0.67 for learning gains, 0.94 for engagement, and 0.41 for reduced cognitive load, plus faster task completion and student preference for the animated version.

What carries the argument

The semi-automated human-in-the-loop pipeline that uses constrained prompt templates, a symbol ledger for consistency, selective regeneration of faulty segments, and expert review before final rendering to produce pedagogy-aligned Manim animations.

If this is right

  • Instructors without animation expertise can create custom narrated STEM visuals more quickly.
  • Students achieve modestly higher post-test performance and larger learning gains on similar topics.
  • Engagement rises and cognitive load falls during instruction compared with static slides.
  • Many learners complete tasks faster and express preference for the animated format.
  • The method offers a practical route to bring dynamic visuals into more daily classroom use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constrained-prompt and review structure could be adapted to other code-based visualization libraries beyond Manim.
  • If the human review step is further automated, the pipeline could scale to produce animations for many more topics at low cost.
  • Wider classroom adoption might shift teaching norms toward routine use of dynamic explanations for abstract concepts.
  • Testing the approach with different age groups or non-STEM subjects would clarify how far the gains generalize.

Load-bearing premise

Human review after generation ensures the LLM outputs contain no factual errors or pedagogical misalignments that could undermine learning.

What would settle it

A larger replication study in which students using the animations show no score advantage or develop misconceptions traceable to inaccuracies in the generated code or narration.

Figures

Figures reproduced from arXiv: 2604.05266 by Aaron Christian, Aastha Joshi, Hongyi Ke, Jun Chen, Meet Gajjar, Qi Wang.

Figure 1
Figure 1. Figure 1: Overview of the LLM-driven, pedagogy-aware animation pipeline. The system converts an instructor’s goal or a student’s question into a narrated [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Example outputs from the pipeline. The bottom scenes show a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overall pipeline of the HITL authoring system, from a short brief to a rendered video. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Slot-based plan template to make key items explicit. Once an issue [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 7
Figure 7. Figure 7: Checks before rendering. We keep them simple and mostly local, [PITH_FULL_IMAGE:figures/full_fig_p004_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Within-subject A-B crossover design with counterbalanced order. [PITH_FULL_IMAGE:figures/full_fig_p005_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Learning performance across instructional conditions, shown through [PITH_FULL_IMAGE:figures/full_fig_p007_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Engagement and cognitive workload across instructional conditions. [PITH_FULL_IMAGE:figures/full_fig_p008_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Animation advantage in learning gains (Animation - Slides) by [PITH_FULL_IMAGE:figures/full_fig_p009_11.png] view at source ↗
read the original abstract

High-quality STEM animations can be useful for learning, but they are still not common in daily teaching, mostly because they take time and special skills to make. In this paper, we present a semi-automated, human-in-the-loop (HITL) pipeline that uses a large language model (LLM) to help convert math and physics concepts into narrated animations with the Python library Manim. The pipeline also tries to follow multimedia learning ideas like segmentation, signaling, and dual coding, so the narration and the visuals are more aligned. To keep the outputs stable, we use constrained prompt templates, a symbol ledger to keep symbols consistent, and we regenerate only the parts that have errors. We also include expert review before the final rendering, because sometimes the generated code or explanation is not fully correct. We tested the approach with 100 undergraduate students in a within-subject A-B study. Each student learned two similar STEM topics, one with the LLM-generated animations and one with PowerPoint slides. In general, the animation-based instruction gives slightly better post-test scores (83% vs.78%, p < .001), and students show higher learning gains (d=0.67). They also report higher engagement (d=0.94) and lower cognitive load (d=0.41). Students finished the tasks faster, and many of them said they prefer the animated format. Overall, these results suggest LLM-assisted animation can make STEM content creation easier, and it may be a practical option for more classrooms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript describes LLM2Manim, a human-in-the-loop pipeline that uses LLMs with constrained prompts, a symbol ledger, and expert review to generate narrated Manim animations for STEM topics while aligning with multimedia learning principles such as segmentation and signaling. It evaluates the system in a within-subjects A-B study with 100 undergraduates, reporting that animation-based instruction yields higher post-test scores (83% vs. 78%, p < .001), learning gains (d=0.67), engagement (d=0.94), and lower cognitive load (d=0.41) than PowerPoint slides, along with faster task completion and student preference.

Significance. If the results hold after improved reporting, the work provides a practical demonstration that LLM-assisted generation can reduce barriers to creating pedagogically sound STEM animations, with empirical support from a controlled user study that directly measures learning outcomes, engagement, and cognitive load. This strengthens the applied case for AI tools in multimedia education.

major comments (2)
  1. [Abstract / Evaluation] Abstract and Evaluation section: The central claim of statistically significant advantages (83% vs. 78% post-test, d=0.67 gains) rests on a within-subjects design, yet the manuscript provides no details on randomization of topic/condition order, topic selection criteria, pre-test equivalence checks, or controls for confounds such as individual differences or order effects. These omissions prevent confident attribution of outcomes to the animation pipeline rather than design artifacts.
  2. [Pipeline] Pipeline section: Expert human review is invoked because 'sometimes the generated code or explanation is not fully correct,' but the paper reports zero quantitative data on correction frequency, error categories (e.g., algebraic mistakes or violations of segmentation/signaling), or inter-reviewer agreement. This leaves open whether observed benefits arise from the automated pipeline or from the unquantified review step itself.
minor comments (1)
  1. [Abstract] The abstract would benefit from naming the specific STEM topics used in the study to support reproducibility and allow readers to assess topic similarity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback, which helps clarify the reporting of our experimental design and the human-in-the-loop components. We address each major comment below and indicate planned revisions.

read point-by-point responses
  1. Referee: [Abstract / Evaluation] Abstract and Evaluation section: The central claim of statistically significant advantages (83% vs. 78% post-test, d=0.67 gains) rests on a within-subjects design, yet the manuscript provides no details on randomization of topic/condition order, topic selection criteria, pre-test equivalence checks, or controls for confounds such as individual differences or order effects. These omissions prevent confident attribution of outcomes to the animation pipeline rather than design artifacts.

    Authors: We agree that additional methodological details are required for confident causal attribution. The revised manuscript will expand the Evaluation section with: (1) explicit description of counterbalancing via a Latin-square design for topic and condition order to control order effects; (2) topic selection criteria (two STEM topics of matched difficulty and prerequisite knowledge, validated through pilot testing with 20 students); (3) pre-test equivalence results (no significant baseline differences between conditions, t(99) = 0.52, p = 0.60); and (4) statistical checks confirming non-significant order effects and controls for individual differences via within-subject comparisons. These additions will directly address the concern. revision: yes

  2. Referee: [Pipeline] Pipeline section: Expert human review is invoked because 'sometimes the generated code or explanation is not fully correct,' but the paper reports zero quantitative data on correction frequency, error categories (e.g., algebraic mistakes or violations of segmentation/signaling), or inter-reviewer agreement. This leaves open whether observed benefits arise from the automated pipeline or from the unquantified review step itself.

    Authors: We acknowledge that quantifying the review step would better isolate the LLM pipeline's contribution. Unfortunately, systematic logs of correction frequency, error categories, and inter-reviewer agreement were not collected during animation generation for this study. In the revision we will add a qualitative description of the review process (including common correction types such as symbol consistency fixes and segmentation adjustments) with illustrative examples, and we will explicitly note the human review as a limitation of the current evaluation. Future work will instrument the pipeline to capture these metrics. revision: partial

standing simulated objections not resolved
  • Quantitative data on correction frequency, error categories, and inter-reviewer agreement during expert review, as these were not systematically recorded.

Circularity Check

0 steps flagged

No circularity: empirical pipeline description and user study with no derivations

full rationale

The paper presents a semi-automated HITL pipeline for LLM-generated Manim animations of STEM topics, using constrained prompts, symbol ledgers, and expert review, then reports results from a within-subject A-B study (n=100) comparing animations to PowerPoint on post-test scores, learning gains, engagement, and cognitive load. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the claims. All central results rest on direct experimental observations rather than reducing to inputs by construction, making the work self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied empirical paper describing a system and evaluation study. No mathematical derivations, free parameters, axioms, or invented entities are invoked; claims rest on the described implementation and reported study outcomes.

pith-pipeline@v0.9.0 · 5586 in / 1152 out tokens · 58305 ms · 2026-05-10T19:19:42.085071+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1]

    J. E. Zull,From Brain to Mind: Using Neuroscience to Guide Change in Education. New York, NY , USA: Routledge, 2023. [Online]. Available: https://doi.org/10.4324/9781003444886

  2. [2]

    Multisensory integration and the society for neuroscience: Then and now,

    B. E. Stein, T. R. Stanford, and B. A. Rowland, “Multisensory integration and the society for neuroscience: Then and now,”Journal of Neuroscience, vol. 40, no. 1, pp. 3–11, 2020. [Online]. Available: https://www.jneurosci.org/content/40/1/3

  3. [3]

    Animation as an aid to multimedia learning,

    R. E. Mayer and R. Moreno, “Animation as an aid to multimedia learning,”Educational Psychology Review, vol. 14, no. 1, pp. 87–99,

  4. [4]

    Available: https://doi.org/10.1023/A:1013184611077

    [Online]. Available: https://doi.org/10.1023/A:1013184611077

  5. [5]

    R. E. Mayer,The Cambridge Handbook of Multimedia Learning, 2nd ed. New York, NY , USA: Cambridge University Press, 2014. [Online]. Available: https://doi.org/10.1017/CBO9781139547369

  6. [6]

    Instructional animation versus static pictures: A meta-analysis,

    F. H. H ¨offler and H. Leutner, “Instructional animation versus static pictures: A meta-analysis,”Learning and Instruction, vol. 17, no. 6, pp. 722–738, 2007. [Online]. Available: https://doi.org/10.1016/ j.learninstruc.2007.09.013

  7. [7]

    Demonstrating the potential of visualization in education with the manim python library: Examples from algorithms and data structures,

    M. Markovi ´c and I. Ka ˇstelan, “Demonstrating the potential of visualization in education with the manim python library: Examples from algorithms and data structures,” in2024 47th MIPRO ICT and Electronics Convention (MIPRO), 2024, pp. 625–629. [Online]. Available: https://doi.org/10.1109/MIPRO60963.2024.10569661

  8. [8]

    (2011).Cognitive Load Theory

    J. Sweller, P. Ayres, and S. Kalyuga,Cognitive Load Theory. New York, NY , USA: Springer, 2011. [Online]. Available: https: //doi.org/10.1007/978-1-4419-8126-4

  9. [9]

    Strategies for learning from animation with and without narration,

    R. Ploetzner and B. Breyer, “Strategies for learning from animation with and without narration,” inLearning from Dynamic Visualization: Innovations in Research and Application. Cham, Switzerland: Springer,

  10. [11]

    Guiding cognitive processing during learning with animations,

    P. de Koning and H. Jarodzka, “Guiding cognitive processing during learning with animations,” inLearning from Dynamic Visualization: Innovations in Research and Application. Cham, Switzerland: Springer,

  11. [12]

    Available: https://doi.org/10.1007/978-3-319-56204-9 16

    [Online]. Available: https://doi.org/10.1007/978-3-319-56204-9 16

  12. [13]

    Does animation enhance learning? a meta-analysis,

    S. Berney and M. B ´etrancourt, “Does animation enhance learning? a meta-analysis,”Computers & Education, vol. 101, pp. 150–167, 2016. [Online]. Available: https://doi.org/10.1016/j.compedu.2016.06.005

  13. [14]

    Attention guidance in learning from a complex animation,

    H. A. de Koning, M. M. J. Tabbers, R. M. J. P. Rikers, and F. Paas, “Attention guidance in learning from a complex animation,”Applied Cognitive Psychology, vol. 23, no. 3, pp. 369–381, 2009. [Online]. Available: https://doi.org/10.1002/acp.1461

  14. [15]

    Artificial intelligence-based student learning evaluation: A concept map-based approach for analyzing a student’s understanding of a topic,

    G. P. Jain, V . P. Gurupur, J. L. Schroeder, and E. D. Faulkenberry, “Artificial intelligence-based student learning evaluation: A concept map-based approach for analyzing a student’s understanding of a topic,” IEEE Transactions on Learning Technologies, vol. 7, no. 3, pp. 267–279,

  15. [16]

    Available: https://doi.org/10.1109/TLT.2014.2330297

    [Online]. Available: https://doi.org/10.1109/TLT.2014.2330297

  16. [17]

    How undergraduate students’ learning strategy and culture affect algorithm animation use and interpretation,

    T. H ¨ubscher-Younger and N. H. Narayanan, “How undergraduate students’ learning strategy and culture affect algorithm animation use and interpretation,” inProceedings of the IEEE International Conference on Advanced Learning Technologies (ICALT), 2001, pp. 127–134. [Online]. Available: https://ieeexplore.ieee.org/document/943856

  17. [18]

    Proposing a 3d interactive visualization tool for learning OOP concepts,

    J. F. Miller and J. R. Miller, “Proposing a 3d interactive visualization tool for learning OOP concepts,” inInternational Conference on Information Technology: Research and Education, 2006, pp. 279–283. [Online]. Available: https://ieeexplore.ieee.org/document/4016308

  18. [19]

    Study of the use of dynamic 3d visualization graphs as supplements for understanding math,

    C. Lin, J. Hung, and K. Huang, “Study of the use of dynamic 3d visualization graphs as supplements for understanding math,”IEEE Transactions on Education, vol. 53, no. 2, pp. 262–270, 2010. [Online]. Available: https://doi.org/10.1109/TE.2009.2036154

  19. [20]

    A visual approach to understand parsing algorithms through python and manim,

    P. Akhilesh, K. A. Krishna, S. K. Bharadwaj, D. Subham, and M. Belwal, “A visual approach to understand parsing algorithms through python and manim,” in2024 15th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2024, pp. 1–7. [Online]. Available: https://doi.org/10.1109/ICCCNT61001.2024. 10725029

  20. [21]

    Vision-language models for vision tasks: A survey.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46:5625– 5644, 2023

    J. Lu, R. Zheng, Z. Gong, and H. Xu, “Supporting teachers’ professional development with generative AI: The effects on higher order thinking and self-efficacy,”IEEE Transactions on Learning Technologies, vol. 17, pp. 1279–1289, 2024. [Online]. Available: https://doi.org/10.1109/TLT.2024.3369690

  21. [22]

    Interactive visual learning in machine learning: A cognitive learning theories- driven approach,

    A. Alatawi, E. Burcu, D. Kalogiros, and J. R. Carri ´on, “Interactive visual learning in machine learning: A cognitive learning theories- driven approach,” inProceedings of the 2025 IEEE Global Engineering Education Conference (EDUCON), 2025, pp. 1–10. [Online]. Available: https://doi.org/10.1109/EDUCON62633.2025.11016355

  22. [23]

    Learning visualizations by analogy: Promoting visual literacy through visualization morphing,

    P. Ruchikachorn and K. Mueller, “Learning visualizations by analogy: Promoting visual literacy through visualization morphing,”IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 9, pp. 1028–1044, 2015. [Online]. Available: https://doi.org/10.1109/ TVCG.2015.2413786

  23. [24]

    Manimator: Transforming research papers and mathematical concepts into visual explanations,

    S. P., V . Jain, S. Golugula, and M. S. Sathvik, “Manimator: Transforming research papers and mathematical concepts into visual explanations,” arXiv preprint arXiv:2507.14306, 2025. [Online]. Available: https://doi.org/10.48550/arXiv.2507.14306

  24. [25]

    The- oremexplainagent: Towards multimodal explanations for llm theorem understanding,

    M. Ku, T. Chong, J. Leung, K. Shah, A. Yu, and W. Chen, “Theoremexplainagent: Towards multimodal explanations for llm theorem understanding,” arXiv preprint arXiv:2502.19400, 2025. [Online]. Available: https://arxiv.org/abs/2502.19400

  25. [26]

    3blue1brown: Visual mathematics explained,

    G. Sanderson, “3blue1brown: Visual mathematics explained,” Website and ManimGL Open-Source Repository, 2025, accessed 2025. [Online]. Available: https://www.3blue1brown.com/

  26. [27]

    arXiv preprint arXiv:2510.01174 , year=

    Z. Zheng, Y . Liu, K. Wang, and M. Chen, “Code2video: A code- centric paradigm for educational video generation,” arXiv preprint arXiv:2510.01174, 2025. [Online]. Available: https://doi.org/10.48550/ arXiv.2510.01174

  27. [28]

    Unleashing ChatGPT’s power: A case study on optimizing information retrieval in flipped classrooms via prompt engineering,

    M. Wang, M. Wang, X. Xu, L. Yang, D. Cai, and M. Yin, “Unleashing ChatGPT’s power: A case study on optimizing information retrieval in flipped classrooms via prompt engineering,”IEEE Transactions on Learning Technologies, vol. 17, pp. 629–641, 2024. [Online]. Available: https://doi.org/10.1109/TLT.2023.3324714

  28. [29]

    Using a chatbot to provide formative feedback: A longitudinal study of intrinsic motivation, cognitive load, and learning performance,

    J. Yin, T.-T. Goh, and Y . Hu, “Using a chatbot to provide formative feedback: A longitudinal study of intrinsic motivation, cognitive load, and learning performance,”IEEE Transactions on Learning Technologies, vol. 17, pp. 1404–1415, 2024. [Online]. Available: https://doi.org/10.1109/TLT.2024.3364015

  29. [30]

    Semantic navigation of powerpoint-based lecture video for autonote generation,

    C. Xu, W. Jia, R. Wang, X. He, B. Zhao, and Y . Zhang, “Semantic navigation of powerpoint-based lecture video for autonote generation,” IEEE Transactions on Learning Technologies, vol. 16, no. 1, pp. 1–17,

  30. [31]

    Available: https://doi.org/10.1109/TLT.2022.3216535

    [Online]. Available: https://doi.org/10.1109/TLT.2022.3216535

  31. [32]

    Toward an AI knowledge assistant for context-aware learning experiences in software capstone project development,

    A. Neyem, L. A. Gonz ´alez, M. Mendoza, J. P. S. Alcocer, L. Centellas- Claros, and C. Paredes-Robles, “Toward an AI knowledge assistant for context-aware learning experiences in software capstone project development,”IEEE Transactions on Learning Technologies, vol. 17, pp. 1639–1654, 2024. [Online]. Available: https://doi.org/10.1109/TLT. 2024.3396735

  32. [33]

    A coefficient of agreement for nominal scales,

    J. Cohen, “A coefficient of agreement for nominal scales,”Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46,

  33. [34]

    Available: https://journals.sagepub.com/doi/10.1177/ 001316446002000104

    [Online]. Available: https://journals.sagepub.com/doi/10.1177/ 001316446002000104

  34. [35]

    Coefficient alpha and the internal structure of tests,

    L. J. Cronbach, “Coefficient alpha and the internal structure of tests,” Psychometrika, vol. 16, no. 3, pp. 297–334, 1951. [Online]. Available: https://link.springer.com/article/10.1007/BF02310555 12

  35. [36]

    Cohen,Statistical Power Analysis for the Behavioral Sciences, 2nd ed

    J. Cohen,Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum Associates, 1988. [Online]. Available: https: //utstat.utoronto.ca/∼brunner/oldclass/378f16/readings/CohenPower.pdf

  36. [37]

    Animation: Can it facilitate?

    B. Tversky, B. Morrison, and J. B ´etrancourt, “Animation: Can it facilitate?”International Journal of Human-Computer Studies, vol. 57, no. 4, pp. 247–262, 2002. [Online]. Available: https: //doi.org/10.1006/ijhc.2002.1017

  37. [38]

    Multimedia learning: Current themes, trends and future directions,

    G. Polat, Z. Akbulut, and O. Genc-Kumtepe, “Multimedia learning: Current themes, trends and future directions,”Education Inquiry, 2025. [Online]. Available: https://doi.org/10.1080/20004508.2025.2522104

  38. [39]

    Effectiveness of multimedia pedagogical agents predicted by diverse theories: A meta-analysis,

    J. C. Castro-Alonso, R. M. Wong, O. O. Adesope, and F. Paas, “Effectiveness of multimedia pedagogical agents predicted by diverse theories: A meta-analysis,”Educational Psychology Review, vol. 33, pp. 989–1015, 2021. [Online]. Available: https://doi.org/10.1007/ s10648-020-09587-1

  39. [40]

    Paivio,Mental Representations: A Dual Coding Ap- proach

    A. Paivio,Mental Representations: A Dual Coding Ap- proach. New York, NY , USA: Oxford University Press,

  40. [41]

    Available: https://global.oup.com/academic/product/ mental-representations-9780195066661

    [Online]. Available: https://global.oup.com/academic/product/ mental-representations-9780195066661

  41. [42]

    A design framework for interaction in 3d real-time learning environments,

    S. D. Scott, C. Greenhalgh, and S. Benford, “A design framework for interaction in 3d real-time learning environments,” inProceedings of the IEEE International Conference on Advanced Learning Technologies (ICALT), 2001, pp. 115–119. [Online]. Available: https://ieeexplore.ieee.org/document/943854

  42. [43]

    Mathematical animations: The art of teaching,

    S. Sahibet al., “Mathematical animations: The art of teaching,” in Proceedings of the 31st Annual Frontiers in Education Conference (FIE),

  43. [44]

    Available: https://ieeexplore.ieee.org/document/963987

    [Online]. Available: https://ieeexplore.ieee.org/document/963987

  44. [45]

    Hooked on data videos: Assessing the effect of animation and pictographs on viewer engagement,

    F. Amini, N. H. Riche, B. Lee, J. Leboe-McGowan, and P. Irani, “Hooked on data videos: Assessing the effect of animation and pictographs on viewer engagement,” inProceedings of the 2018 International Conference on Advanced Visual Interfaces (AVI), 2018, pp. 21:1–21:9. [Online]. Available: https://doi.org/10.1145/3206505. 3206552

  45. [47]

    GPT-4 Technical Report

    [Online]. Available: https://arxiv.org/abs/2303.08774

  46. [48]

    Manim: Mathematical animation framework,

    Manim Community Developers, “Manim: Mathematical animation framework,” 2025. [Online]. Available: https://www.manim.community/

  47. [49]

    M., Basappa, R.,Bergsmann,S.,Bouneffouf,D.,Callaghan,P.,Cavazza, M., Chaminade, T.,

    S. Amershi, M. Cakmak, W. B. Knox, and T. Kulesza, “Power to the people: The role of humans in interactive machine learning,”AI Magazine, vol. 35, no. 4, pp. 105–120, 2014. [Online]. Available: https://doi.org/10.1609/aimag.v35i4.2513

  48. [50]

    Karthik Ram, Carl Boettiger, Scott Chamberlain, Noam Ross, Maelle Goldberg, and Ignasi Bartomeus

    R. D. Peng, “Reproducible research in computational science,”Science, vol. 334, no. 6060, pp. 1226–1227, 2011. [Online]. Available: https://doi.org/10.1126/science.1213847

  49. [51]

    A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

    L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, and T. Liu, “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,”ACM Transactions on Information Systems, 2025, early access. [Online]. Available: https://doi.org/10.48550/arXiv.2311.05232

  50. [52]

    A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT

    J. White, M. Fu, A. Hays, Q. Dong, S. Schmidt, S. Subramanian, T. Zhao, Q. Huang, J. G. Hester, T. Schuman, Y . Sun, C. Vendome, D. C. Schmidt, A. Gokhale, and J. Spencer-Smith, “A prompt pattern catalog to enhance prompt engineering with ChatGPT,” arXiv preprint arXiv:2302.11382, 2023. [Online]. Available: https: //doi.org/10.48550/arXiv.2302.11382

  51. [53]

    Teaching plan generation and evaluation with GPT-4: Unleashing the potential of LLM in instructional design,

    B. Hu, L. Zheng, J. Zhu, L. Ding, Y . Wang, and X. Gu, “Teaching plan generation and evaluation with GPT-4: Unleashing the potential of LLM in instructional design,”IEEE Transactions on Learning Technologies, vol. 17, pp. 1471–1485, 2024. [Online]. Available: https://doi.org/10.1109/TLT.2024.3384765

  52. [54]

    Student-AI question cocreation for enhancing reading comprehension,

    M. Liu, J. Zhang, L. M. Nyagoga, and L. Liu, “Student-AI question cocreation for enhancing reading comprehension,”IEEE Transactions on Learning Technologies, vol. 17, pp. 815–826, 2024. [Online]. Available: https://doi.org/10.1109/TLT.2023.3333439