Teaching Astronomy with Large Language Models

arxiv: 2506.06921 · v2 · submitted 2025-06-07 · ⚛️ physics.ed-ph · astro-ph.CO · astro-ph.GA · astro-ph.IM · astro-ph.SR

Yuan-Sen Ting, Teaghan O'Briain

Pith reviewed 2026-05-19 10:11 UTC · model grok-4.3

classification ⚛️ physics.ed-ph · astro-ph.CO · astro-ph.GA · astro-ph.IM · astro-ph.SR

keywords astronomy education · large language models · AI literacy · domain-specific tools · student assessment · LLM grading · undergraduate teaching

The pith

Structured LLM integration in astronomy courses reduces student reliance while building critical AI skills.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how final-year undergraduate astronomy students interact with large language models under structured guidance. The authors developed AstroTutor, a specialized tutoring system incorporating curated arXiv papers, and required students to document their AI usage in reflections and surveys. Over the semester, students shifted from seeking basic assistance to using LLMs for verification and strategic tasks, and their overall reliance decreased. LLM grading matched human evaluations while giving more consistent and detailed feedback, and LLM-assisted interviews were piloted for scalable assessments. The findings indicate that transparency requirements and domain-specific tools can improve both astronomy learning and essential AI skills.

Core claim

By integrating general-purpose and domain-specific LLMs with requirements for students to document their interactions, the study shows that students evolve their AI strategies from basic help-seeking to advanced verification and cross-checking workflows. This structured approach leads to decreased reliance on LLMs rather than increased dependence, while fostering metacognitive awareness and effective prompting techniques. Experimental comparisons confirm that LLM-based grading provides feedback comparable to human grading in quality but with greater detail and consistency, and interview-based exams offer a scalable alternative for individualized evaluation.

What carries the argument

AstroTutor, a domain-specific astronomy tutoring system enhanced with curated arXiv content, combined with mandatory documentation of AI usage through homework reflections and surveys.

If this is right

Students develop critical evaluation skills and strategic tool selection over the course of the semester.
LLM grading shows strong correlation with human evaluation while delivering more detailed and consistent feedback.
LLM-facilitated interview-based examinations provide a scalable alternative for individualized student assessment.
Documentation requirements foster metacognitive awareness and evolution from basic assistance to verification workflows.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same structured documentation approach could transfer to other STEM fields to build general AI literacy without increasing dependence.
Making the AstroTutor repository openly available enables other instructors to test and adapt the system for different course contexts.
Decreased LLM reliance may correlate with improved retention of astronomy concepts through greater student engagement in verification.

Load-bearing premise

Student self-documentation through homework reflections and post-course surveys accurately reflects their actual AI interaction strategies without significant social desirability bias or incomplete reporting.

What would settle it

Direct comparison of student self-reported AI usage patterns against actual logged interactions with the LLM tools to measure discrepancies in reported strategies and skill evolution.
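A minimal sketch of how such a comparison might be scored, assuming each student's self-reported strategies and logged strategies have already been coded into label sets. The student IDs, the strategy labels, and the choice of Jaccard similarity are illustrative assumptions, not anything taken from the paper, and the data are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two label sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def discrepancy_report(self_reported, logged):
    """Compare claimed and observed strategy labels for students present in both sources."""
    report = {}
    for student in sorted(self_reported.keys() & logged.keys()):
        claimed, observed = self_reported[student], logged[student]
        report[student] = {
            "jaccard": jaccard(claimed, observed),
            "over_reported": sorted(claimed - observed),   # claimed but not seen in logs
            "under_reported": sorted(observed - claimed),  # seen in logs but not claimed
        }
    return report


# Hypothetical coded data: labels and IDs are invented for illustration.
self_reported = {
    "s01": {"verification", "debugging"},
    "s02": {"verification"},
}
logged = {
    "s01": {"verification", "debugging"},
    "s02": {"verification", "code_generation"},
}

report = discrepancy_report(self_reported, logged)
for student, row in report.items():
    print(student, row)
```

A score near 1 would indicate that a student's reflections track their logged behavior; the over- and under-reported lists show the direction of any mismatch.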

Figures

Figures reproduced from arXiv: 2506.06921 by Teaghan O'Briain, Yuan-Sen Ting.

**Figure 1.** Multi-agent architecture of AstroTutor. The system employs a Retrieval-Augmented Generation (RAG) approach with three specialized agents accessing distinct knowledge domains: course materials and lecture notes adapted from the instructor’s textbook, trusted reference materials, and a curated database of arXiv papers from the astro-ph section. All knowledge sources are stored in ChromaDB vector storage for…
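The retrieval step described in the Figure 1 caption can be illustrated with a toy sketch. This is not AstroTutor's code: it swaps ChromaDB and learned embeddings for a bag-of-words cosine similarity in plain Python, and the store names and documents are invented. It only shows the general pattern of querying three separate knowledge stores and collecting the best match from each.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding': counts of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two token-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical stand-ins for the three knowledge domains named in the caption.
STORES = {
    "course_notes": [
        "Bayesian inference combines a prior with a likelihood to give a posterior.",
        "The Hertzsprung-Russell diagram plots stellar luminosity against temperature.",
    ],
    "reference_texts": [
        "Gaussian mixture models are fit with the expectation maximization algorithm.",
        "Markov chain Monte Carlo draws samples from a posterior distribution.",
    ],
    "arxiv_papers": [
        "We infer stellar ages from spectra using neural networks.",
        "A survey of galaxy clustering measurements and cosmological constraints.",
    ],
}


def retrieve(query, store, k=1):
    """Return the k documents in one store most similar to the query."""
    q = embed(query)
    ranked = sorted(STORES[store], key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_context(query, k=1):
    """Collect the top-k passages from every store, keyed by store name."""
    return {store: retrieve(query, store, k) for store in STORES}


context = build_context("How do I sample a posterior distribution?")
for store, passages in context.items():
    print(store, "->", passages[0])
```

In a real RAG system the retrieved passages would be placed in the prompt sent to the language model; that call is omitted here.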

**Figure 2.** User interface of the AstroTutor system. The interface provides organized access to course materials including lectures, tutorials, and reference textbooks. The main chat interface facilitates pedagogical interactions, offering assistance with course concepts, data analysis and coding support, and paper recommendations for assignments and projects. Students can download chat histories and reset conversatio…

**Figure 3.** Distribution of LLM tool usage among students throughout the semester. The chart displays usage percentages for chat-based LLMs (solid bars) and IDE-integrated tools (hatched bars). ChatGPT was the dominant tool with 90% adoption, followed by AstroTutor at 80%. Students typically used AstroTutor for theoretical understanding and ChatGPT for coding assistance, demonstrating complementary roles. For IDE-in…

**Figure 4.** [Image: figures/full_fig_p006_4.png; no caption extracted.]

**Figure 5.** Student self-assessment of learning outcomes and LLM proficiency development across six key dimensions (1-10 Likert scale). Each panel shows kernel density estimation of student responses with different color gradients to distinguish topics. The survey assessed: (1) awareness of LLM strengths and limitations through documentation, (2) LLM effectiveness for concept understanding, (3) maintenance of problem-…

**Figure 6.** Student evaluation of course implementation and future implications (1-10 Likert scale). Each panel shows kernel density estimation of student responses with different color gradients to distinguish topics. The survey assessed: (8) perceived value of the specialized AstroTutor system compared to general-purpose LLMs, (9) impact of documentation requirements on learning experience, (10) anticipated value of…

**Figure 7.** Example of LLM-generated grading feedback showing detailed error identification and constructive guidance for a student’s analytical approach to calculating distribution moments.

[Body text captured with the caption:] …visualizations, and appropriate result interpretation. The system generated structured JSON responses with specific fields including earned points, detailed error descriptions with point deductions, and feedback for each question.…
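To make the structured grading output concrete, here is a hedged sketch of what parsing and sanity-checking one such JSON response could look like. The passage above only says the fields include earned points, error descriptions with point deductions, and per-question feedback; the exact key names (`earned_points`, `errors`, `points_deducted`, `feedback`), the example response, and the consistency check are assumptions made for illustration.

```python
import json


def parse_grading(raw, max_points):
    """Parse one question's grading JSON and check that the points add up."""
    data = json.loads(raw)
    deductions = sum(err["points_deducted"] for err in data["errors"])
    if not 0 <= data["earned_points"] <= max_points:
        raise ValueError("earned_points outside the allowed range")
    # Assumed rule: earned points equal the maximum minus all listed deductions.
    if abs(max_points - deductions - data["earned_points"]) > 1e-9:
        raise ValueError("deductions do not account for the earned points")
    return data


# Hypothetical LLM response for a 10-point question.
raw_response = json.dumps(
    {
        "question": "Q2",
        "earned_points": 7.0,
        "errors": [
            {"description": "Second moment computed about zero, not the mean.",
             "points_deducted": 2.0},
            {"description": "Units missing in the final answer.",
             "points_deducted": 1.0},
        ],
        "feedback": "Correct setup; recheck the definition of the variance.",
    }
)

graded = parse_grading(raw_response, max_points=10.0)
print(graded["earned_points"], "-", graded["feedback"])
```

A check like this catches responses whose itemized deductions disagree with the reported total before the feedback reaches a student.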

**Figure 8.** Comparison of LLM-assisted grading versus human grader scores across four homework assignments for two different models: Claude-3.7-Sonnet (left) and Gemini-2.5-Flash (right). Individual homework scores are shown as colored points, with black circles representing student averages used for linear regression analysis. The dashed line shows the linear fit based on student averages, while the solid gray line i…
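The analysis described in the Figure 8 caption, a linear fit to per-student averages, can be sketched as follows. The scores are synthetic, the student IDs are invented, and placing human scores on the x-axis is an assumption; the paper's actual data and fitting code are not reproduced here.

```python
from statistics import mean


def student_averages(scores):
    """Map each student to the mean of their assignment scores."""
    return {student: mean(values) for student, values in scores.items()}


def linear_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic percent scores for three students on four assignments.
human = {
    "s01": [80, 90, 70, 80],
    "s02": [60, 70, 50, 60],
    "s03": [90, 100, 90, 80],
}
# In this made-up example the LLM grader is exactly 5 points stricter.
llm = {student: [v - 5 for v in values] for student, values in human.items()}

human_avg = student_averages(human)
llm_avg = student_averages(llm)
students = sorted(human_avg)
slope, intercept = linear_fit(
    [human_avg[s] for s in students], [llm_avg[s] for s in students]
)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A slope near 1 with an intercept near 0 would indicate that the two graders agree on average; a nonzero intercept, as in this synthetic case, points to a systematic offset.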

Original abstract

We present a study of LLM integration in final-year undergraduate astronomy education, examining how students develop AI literacy through structured guidance and documentation requirements. We developed AstroTutor, a domain-specific astronomy tutoring system enhanced with curated arXiv content, and deployed it alongside general-purpose LLMs in the course. Students documented their AI usage through homework reflections and post-course surveys. We analyzed student evolution in AI interaction strategies and conducted experimental comparisons of LLM-assisted versus traditional grading methods. LLM grading showed strong correlation with human evaluation while providing more detailed and consistent feedback. We also piloted LLM-facilitated interview-based examinations as a scalable alternative to traditional assessments, demonstrating potential for individualized evaluation that addresses common testing limitations. Students experienced decreased rather than increased reliance on LLMs over the semester, developing critical evaluation skills and strategic tool selection. They evolved from basic assistance-seeking to verification workflows, with documentation requirements fostering metacognitive awareness. Students developed effective prompting strategies, contextual enrichment techniques, and cross-verification practices. Our findings suggest that structured LLM integration with transparency requirements and domain-specific tools can enhance astronomy education while building essential AI literacy skills. We provide implementation guidelines for educators and make our AstroTutor repository freely available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript reports an empirical study of structured LLM integration in a final-year undergraduate astronomy course. The authors developed AstroTutor, a domain-specific tutoring system incorporating curated arXiv content, and deployed it alongside general-purpose LLMs. Students documented AI usage via required homework reflections and post-course surveys. Key claims include that students exhibited decreased (rather than increased) reliance on LLMs over the semester, evolving toward verification workflows, critical evaluation, and strategic tool selection; that LLM grading correlates strongly with human grading while providing more detailed feedback; and that LLM-facilitated interview-based exams offer a scalable assessment alternative. The paper concludes that transparency requirements and domain-specific tools enhance astronomy education while building AI literacy, and it supplies implementation guidelines plus an open AstroTutor repository.

Significance. If the central observations hold under more rigorous scrutiny, the work could usefully inform astronomy educators seeking to incorporate LLMs without fostering dependence. The emphasis on documentation requirements, the open release of AstroTutor, and the practical guidelines constitute concrete contributions that other instructors could adapt. The observational design and focus on student self-reports, however, limit the strength of claims about skill evolution and literacy gains.

major comments (3)

[Abstract / Results] Abstract and Results sections: the headline finding that students showed decreased rather than increased LLM reliance and developed verification workflows rests entirely on analysis of homework reflections and post-course surveys, yet no sample size, quantitative metrics (e.g., frequency counts or change scores), coding protocol, or inter-rater reliability is reported.
[Methods / Results] Methods / Experimental comparisons: the claim of strong correlation between LLM and human grading is presented without the actual correlation coefficient, number of assignments or students involved, or controls for confounding variables such as assignment difficulty or grader familiarity with the material.
[Discussion] Discussion: the interpretation that documentation requirements fostered metacognitive awareness and reduced dependence assumes self-reported reflections accurately capture actual interaction strategies; the manuscript provides no validation against usage logs from AstroTutor, no pre/post objective prompting tasks, and no control cohort to rule out social-desirability bias or course-specific effects.
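For readers unfamiliar with the inter-rater reliability the first comment asks for, Cohen's kappa is one common statistic for two coders. The sketch below is illustrative only: the theme labels and codings are invented, and nothing in the source says which statistic the authors used or would use.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four student reflections by two coders.
coder_1 = ["verification", "verification", "basic_help", "basic_help"]
coder_2 = ["verification", "verification", "basic_help", "verification"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa={kappa:.2f}")
```

Kappa corrects raw agreement (here 3 of 4 items) for the agreement expected by chance, which is why it is usually reported instead of a simple percentage.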

minor comments (2)

[Abstract] The abstract would benefit from an explicit statement of the number of participating students and the duration of the course.
[Results] Figure or table captions describing LLM grading comparisons should include the precise statistical measure used (Pearson r, Spearman rho, etc.) rather than the qualitative phrase 'strong correlation.'
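The two statistics named in the last comment differ in what they measure: Pearson r captures linear association, while Spearman rho captures monotonic association through ranks. A small sketch with synthetic scores shows both; the numbers are invented and do not come from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Synthetic human and LLM scores for five submissions.
human_scores = [62, 70, 75, 81, 90]
llm_scores = [60, 72, 74, 85, 88]

r = pearson_r(human_scores, llm_scores)
rho = spearman_rho(human_scores, llm_scores)
print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}")
```

Because the synthetic LLM scores preserve the human ordering exactly, rho is 1 while r is slightly lower, which is the kind of difference a caption should make explicit.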

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their constructive comments, which highlight important areas for improving the clarity and rigor of our reporting. We address each major comment below and indicate where revisions will be made to the manuscript.

Point-by-point responses

Referee: [Abstract / Results] Abstract and Results sections: the headline finding that students showed decreased rather than increased LLM reliance and developed verification workflows rests entirely on analysis of homework reflections and post-course surveys, yet no sample size, quantitative metrics (e.g., frequency counts or change scores), coding protocol, or inter-rater reliability is reported.

Authors: We agree that these methodological details were insufficiently reported in the original submission. The analysis drew on reflections submitted by the full class cohort. In the revised manuscript we will explicitly state the sample size, provide quantitative metrics including the proportion of students exhibiting shifts toward verification workflows and frequency counts of key themes across the semester, describe the thematic coding protocol, and report inter-rater reliability for the qualitative analysis. These elements were part of our internal process but omitted from the text. revision: yes

Referee: [Methods / Results] Methods / Experimental comparisons: the claim of strong correlation between LLM and human grading is presented without the actual correlation coefficient, number of assignments or students involved, or controls for confounding variables such as assignment difficulty or grader familiarity with the material.

Authors: We accept that the quantitative details supporting the grading comparison were not included. The revised manuscript will report the correlation coefficient, the number of assignments and students in the comparison, and describe the grading protocol, including steps taken to minimize effects of assignment difficulty and grader familiarity. This will allow readers to evaluate the strength of the observed agreement directly. revision: yes

Referee: [Discussion] Discussion: the interpretation that documentation requirements fostered metacognitive awareness and reduced dependence assumes self-reported reflections accurately capture actual interaction strategies; the manuscript provides no validation against usage logs from AstroTutor, no pre/post objective prompting tasks, and no control cohort to rule out social-desirability bias or course-specific effects.

Authors: We acknowledge the limitations of relying on self-reported data without additional validation. The study was observational and did not collect usage logs, conduct pre/post objective tasks, or include a control cohort. In the revised Discussion we will explicitly state these constraints, discuss the possibility of social-desirability bias and course-specific effects, and frame the findings as initial evidence rather than definitive causal claims. We will also outline directions for future work that could incorporate objective measures. revision: partial

standing simulated objections not resolved

Direct validation against AstroTutor usage logs cannot be added because such logs were not collected during the study.

Circularity Check

0 steps flagged

No circularity: empirical observational study with no derivations or self-referential reductions

Full rationale

The paper reports an empirical study of LLM integration in an astronomy course, including development of AstroTutor, student self-documentation via homework reflections and surveys, analysis of strategy evolution, and comparisons of LLM-assisted grading versus traditional methods. No mathematical derivations, equations, fitted parameters, or first-principles predictions are present that could reduce to inputs by construction. Claims about decreased LLM reliance and skill development rest on direct observational data rather than any self-definitional, fitted-input, or self-citation load-bearing chain. The study is self-contained against its own reported benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central findings depend on the validity of student self-reporting and the representativeness of the single course studied. There are no free parameters in the mathematical sense, but the interpretation of 'decreased reliance' assumes that usage was measured accurately.

axioms (1)

domain assumption: Student self-reports via reflections and surveys validly capture changes in AI usage strategies.
The main conclusions about decreased reliance and skill development rely on this.

invented entities (1)

AstroTutor · no independent evidence
purpose: Domain-specific astronomy tutoring system enhanced with curated arXiv content.
It's a new tool developed for this study.

pith-pipeline@v0.9.0 · 5745 in / 1335 out tokens · 41894 ms · 2026-05-19T10:11:36.638617+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

We developed AstroTutor as a domain-specific tutoring chatbot... Students documented their AI usage through homework reflections and post-course surveys.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 12 internal anchors

[1]

GPT-4 Technical Report

Achiam, J., Adler, S., Agarwal, S., et al. 2023, arXiv e-prints, arXiv:2303.08774. https://arxiv.org/abs/2303.08774

work page internal anchor Pith review Pith/arXiv arXiv 2023
[2]

2022, AI and ethics, 2, doi: 10.1007/s43681-021-00096-7

Akgun, S., & Greenhow, C. 2022, AI and ethics, 2, doi: 10.1007/s43681-021-00096-7

work page doi:10.1007/s43681-021-00096-7 2022
[3]

Alkaissi, H., & McFarlane, S. I. 2023, Cureus, 15, doi: 10.7759/cureus.35179

work page doi:10.7759/cureus.35179 2023
[4]

M., Nguyen, S., Zi, Y., et al

Babe, H. M., Nguyen, S., Zi, Y., et al. 2024, in Findings of the Association for Computational Linguistics: ACL 2024 (Bangkok, Thailand: Association for Computational Linguistics), 8452–8474, doi: 10.18653/v1/2024.findings-acl.501

work page doi:10.18653/v1/2024.findings-acl.501 2024
[5]

2025, Social Sciences & Humanities Open, 11, 101299, doi: 10.1016/j.ssaho.2025.101299

Balalle, H., & Pannilage, S. 2025, Social Sciences & Humanities Open, 11, 101299, doi: 10.1016/j.ssaho.2025.101299

work page doi:10.1016/j.ssaho.2025.101299 2025
[6]

B., & Polikarpova, N

Barke, S., James, M. B., & Polikarpova, N. 2022, arXiv e-prints, arXiv:2206.15000, doi: 10.48550/arXiv.2206.15000

work page doi:10.48550/arxiv.2206.15000 2022
[7]

A., Denny, P., Finnie-Ansley, J., et al

Becker, B. A., Denny, P., Finnie-Ansley, J., et al. 2022, arXiv e-prints, arXiv:2212.01020, doi: 10.48550/arXiv.2212.01020

work page doi:10.48550/arxiv.2212.01020 2022
[8]

Bishop, C. M. 2006, Pattern Recognition and Machine Learning (Information Science and Statistics) (Berlin, Heidelberg: Springer-Verlag)

work page 2006
[9]

Emergent autonomous scientific research capabilities of large language models

Boiko, D. A., MacKnight, R., & Gomes, G. 2023, arXiv e-prints, arXiv:2304.05332, doi: 10.48550/arXiv.2304.05332

work page internal anchor Pith review doi:10.48550/arxiv.2304.05332 2023
[10]

D., Jacoby, S., Carney, K., et al

Borne, K. D., Jacoby, S., Carney, K., et al. 2009, in astro2010: The Astronomy and Astrophysics Decadal

work page 2009
[11]

The Revolution in Astronomy Education: Data Science for the Masses

Survey, Vol. 2010, P7, doi: 10.48550/arXiv.0909.3895

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.0909.3895 2010
[12]

ChemCrow: Augmenting large-language models with chemistry tools

Bran, A. M., Cox, S., Schilter, O., et al. 2023, arXiv e-prints, arXiv:2304.05376, doi: 10.48550/arXiv.2304.05376

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.05376 2023
[13]

2020, Advances in neural information processing systems, 33, 1877 Caldas Ramos, M., Collison, C

Brown, T., Mann, B., Ryder, N., et al. 2020, Advances in neural information processing systems, 33, 1877 Caldas Ramos, M., Collison, C. J., & White, A. D. 2024, arXiv e-prints, arXiv:2407.01603, doi: 10.48550/arXiv.2407.01603

work page doi:10.48550/arxiv.2407.01603 2020
[14]

Chan, C. K. Y. 2023, arXiv e-prints, arXiv:2305.00280, doi: 10.48550/arXiv.2305.00280

work page doi:10.48550/arxiv.2305.00280 2023
[15]

2024a, arXiv e-prints, arXiv:2410.11123, doi: 10.48550/arXiv.2410.11123

Chen, E., Wang, D., Xu, L., et al. 2024a, arXiv e-prints, arXiv:2410.11123, doi: 10.48550/arXiv.2410.11123

work page doi:10.48550/arxiv.2410.11123
[16]

2024b, arXiv e-prints, arXiv:2404.18231, doi: 10.48550/arXiv.2404.18231

Chen, J., Wang, X., Xu, R., et al. 2024b, arXiv e-prints, arXiv:2404.18231, doi: 10.48550/arXiv.2404.18231

work page doi:10.48550/arxiv.2404.18231
[17]

2021, Philosophy & technology, 34, 1581

Coghlan, S., Miller, T., & Paterson, J. 2021, Philosophy & technology, 34, 1581

work page 2021
[18]

2024, Methods in Ecology and Evolution, 15, 1757, doi: 10.1111/2041-210X.14325 de Haan, T., Ting, Y.-S., Ghosal, T., et al

Cooper, N., Clark, A., Lecomte, N., Qiao, H., & Ellison, A. 2024, Methods in Ecology and Evolution, 15, 1757, doi: 10.1111/2041-210X.14325 de Haan, T., Ting, Y.-S., Ghosal, T., et al. 2025a, Scientific Reports, 15, 13751, doi: 10.1038/s41598-025-97131-y —. 2025b, arXiv e-prints, arXiv:2505.17592, doi: 10.48550/arXiv.2505.17592

work page doi:10.1111/2041-210x.14325 2024
[19]

DeepSeek-V3 Technical Report

DeepSeek-AI, Liu, A., Feng, B., et al. 2024, arXiv e-prints, arXiv:2412.19437, doi: 10.48550/arXiv.2412.19437

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2412.19437 2024
[20]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI, Guo, D., Yang, D., et al. 2025, arXiv e-prints, arXiv:2501.12948, doi: 10.48550/arXiv.2501.12948

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948 2025
[21]

The emerging generative artificial intelligence divide in the United States

Deng, R., Jiang, M., Yu, X., Lu, Y., & Liu, S. 2025, Computers & Education, 227, 105224, doi: 10.1016/j.compedu.2024.105224

work page doi:10.1016/j.compedu.2024.105224 2025
[22]

2023, arXiv e-prints, arXiv:2307.16364, doi: 10.48550/arXiv.2307.16364 European Commission, & Directorate-General for

Denny, P., Leinonen, J., Prather, J., et al. 2023, arXiv e-prints, arXiv:2307.16364, doi: 10.48550/arXiv.2307.16364 European Commission, & Directorate-General for

work page doi:10.48550/arxiv.2307.16364 2023
[23]

2022, Ethical guidelines on the use of artificial intelligence (AI) and data in teaching and learning for educators (Publications Office of the European Union), doi: 10.2766/153756

Education, Youth, Sport and Culture. 2022, Ethical guidelines on the use of artificial intelligence (AI) and data in teaching and learning for educators (Publications Office of the European Union), doi: 10.2766/153756

work page doi:10.2766/153756 2022
[24]

2021, Annual Review of Statistics and Its Application, 8, 493, doi: 10.1146/annurev-statistics-042720-112045

Babu, G. 2021, Annual Review of Statistics and Its Application, 8, 493, doi: 10.1146/annurev-statistics-042720-112045

work page doi:10.1146/annurev-statistics-042720-112045 2021
[25]

A., Luxton-Reilly, A., & Prather, J

Finnie-Ansley, J., Denny, P., Becker, B. A., Luxton-Reilly, A., & Prather, J. 2022, in Proceedings of the 24th Australasian Computing Education Conference, ACE ’22 (New York, NY, USA: Association for Computing Machinery), 10–19, doi: 10.1145/3511861.3511863

work page doi:10.1145/3511861.3511863 2022
[26]

G., Chadayammuri, U., et al

Fouesneau, M., Momcheva, I. G., Chadayammuri, U., et al. 2024, arXiv e-prints, arXiv:2409.20252, doi: 10.48550/arXiv.2409.20252

work page doi:10.48550/arxiv.2409.20252 2024
[27]

2025, Societies, 15, 6, doi: 10.3390/soc15010006

Gerlich, M. 2025, Societies, 15, 6, doi: 10.3390/soc15010006

work page doi:10.3390/soc15010006 2025
[28]

2001, International Journal of Artificial Intelligence in Education, 12

Graesser, A., & Harter, D. 2001, International Journal of Artificial Intelligence in Education, 12

work page 2001
[29]

2017, Disability & Society, 32, 1627, doi: 10.1080/09687599.2017.1365695

Andries, C. 2017, Disability & Society, 32, 1627, doi: 10.1080/09687599.2017.1365695

work page doi:10.1080/09687599.2017.1365695 2017
[30]

2007, Review of Educational Research, 77, 81, doi: 10.3102/003465430298487

Hattie, J., & Timperley, H. 2007, Review of Educational Research, 77, 81, doi: 10.3102/003465430298487

work page doi:10.3102/003465430298487 2007
[31]

2021, International Journal of Artificial Intelligence in Education, 32, doi: 10.1007/s40593-021-00239-1

Holmes, W., Porayska-Pomsta, K., Holstein, K., et al. 2021, International Journal of Artificial Intelligence in Education, 32, doi: 10.1007/s40593-021-00239-1

work page doi:10.1007/s40593-021-00239-1 2021
[32]

2008, Higher Education Research & Development, 27, 55, doi: 10.1080/07294360701658765

Hounsell, D., Mccune, V., Hounsell, J., & Litjens, J. 2008, Higher Education Research & Development, 27, 55, doi: 10.1080/07294360701658765

work page doi:10.1080/07294360701658765 2008
[33]

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Huang, L., Yu, W., Ma, W., et al. 2023, arXiv e-prints, arXiv:2311.05232, doi: 10.48550/arXiv.2311.05232 Teaching Astronomy with Large Language Models 19

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2311.05232 2023
[34]

2012, Assessment & Evaluation in Higher Education, 37, 125, doi: 10.1080/02602938.2010.515012

Huxham, M., Campbell, F., & Westwood, J. 2012, Assessment & Evaluation in Higher Education, 37, 125, doi: 10.1080/02602938.2010.515012

work page doi:10.1080/02602938.2010.515012 2012
[35]

2023, Learning and Individual Differences, 103, 102274, doi: 10.1016/j.lindif.2023.102274

Kasneci, E., Sessler, K., K¨ uchemann, S., et al. 2023, Learning and Individual Differences, 103, 102274, doi: 10.1016/j.lindif.2023.102274

work page doi:10.1016/j.lindif.2023.102274 2023
[36]

2024, arXiv e-prints, arXiv:2404.03647, doi: 10.48550/arXiv.2404.03647

Kevian, D., Syed, U., Guo, X., et al. 2024, arXiv e-prints, arXiv:2404.03647, doi: 10.48550/arXiv.2404.03647

work page doi:10.48550/arxiv.2404.03647 2024
[37]

Knoth, N., Tolzin, A., Janson, A., & Leimeister, J. M. 2024, Computers and Education: Artificial Intelligence, 6, 100225, doi: 10.1016/j.caeai.2024.100225

work page doi:10.1016/j.caeai.2024.100225 2024
[38]

2024, arXiv e-prints, arXiv:2308.07702

Kong, A., Zhao, S., Chen, H., et al. 2024, arXiv e-prints, arXiv:2308.07702. https://arxiv.org/abs/2308.07702 K¨ uchemann, S., Steinert, S., Revenga, N., et al. 2023, Phys. Rev. Phys. Educ. Res., 19, 020128, doi: 10.1103/PhysRevPhysEducRes.19.020128

work page doi:10.1103/physrevphyseducres.19.020128 2024
[39]

2023, Int J Educ Integr, 19, doi: 10.1007/s40979-023-00130-7

Kumar, R. 2023, Int J Educ Integr, 19, doi: 10.1007/s40979-023-00130-7

work page doi:10.1007/s40979-023-00130-7 2023
[40]

Kumar, T., & Kats, M. A. 2023, American Journal of Physics, 91, 955, doi: 10.1119/5.0182627

work page doi:10.1119/5.0182627 2023
[41]

B., & Sting, F

Lehmann, M., Cornelius, P. B., & Sting, F. J. 2024, arXiv e-prints, arXiv:2409.09047, doi: 10.48550/arXiv.2409.09047

work page doi:10.48550/arxiv.2409.09047 2024
[42]

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis, P., Perez, E., Piktus, A., et al. 2020, arXiv e-prints, arXiv:2005.11401, doi: 10.48550/arXiv.2005.11401

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2005.11401 2020
[43]

Sycophancy in large language models: Causes and mitigations

Malmqvist, L. 2024, arXiv e-prints, arXiv:2411.15287, doi: 10.48550/arXiv.2411.15287

work page doi:10.48550/arxiv.2411.15287 2024
[44]

M., & Schwartz, R

Mutambuki, J. M., & Schwartz, R. 2018, Chem. Educ. Res. Pract., 19, 106, doi: 10.1039/C7RP00133A O’Flaherty, J., & Phillips, C. 2015, The Internet and Higher Education, 25, 85, doi: 10.1016/j.iheduc.2015.02.002

work page doi:10.1039/c7rp00133a 2018
[45]

2024, arXiv e-prints, arXiv:2409.19750, doi: 10.48550/arXiv.2409.19750

Pan, R., Dung Nguyen, T., Arora, H., et al. 2024, arXiv e-prints, arXiv:2409.19750, doi: 10.48550/arXiv.2409.19750

work page doi:10.48550/arxiv.2409.19750 2024
[46]

2025, arXiv e-prints, arXiv:2503.23989, doi: 10.48550/arXiv.2503.23989

Pathak, A., Gandhi, R., Uttam, V., et al. 2025, arXiv e-prints, arXiv:2503.23989, doi: 10.48550/arXiv.2503.23989

work page doi:10.48550/arxiv.2503.23989 2025
[47]

2025, Royal Society Open Science, 12, doi: 10.1098/rsos.241776

Peters, U., & Chin-Yee, B. 2025, Royal Society Open Science, 12, doi: 10.1098/rsos.241776

work page doi:10.1098/rsos.241776 2025
[48]

L., Santos, J

Raihan, N., Siddiq, M. L., Santos, J. C. S., & Zampieri, M. 2024, arXiv e-prints, arXiv:2410.16349, doi: 10.48550/arXiv.2410.16349

work page doi:10.48550/arxiv.2410.16349 2024
[49]

M., & Jesse, J

Regan, P. M., & Jesse, J. 2019, Ethics Inf Technol, 21, 167

work page 2019
[50]

2024, Frontiers in Education, 9, doi: 10.3389/feduc.2024.1461362

Ruwe, T., & Mayweg, E. 2024, Frontiers in Education, 9, doi: 10.3389/feduc.2024.1461362

work page doi:10.3389/feduc.2024.1461362 2024
[51]

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

Schulhoff, S., Ilie, M., Balepur, N., et al. 2024, arXiv e-prints, arXiv:2406.06608, doi: 10.48550/arXiv.2406.06608

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2406.06608 2024
[52]

J., Lara-Alecio, R., & Guerrero, C

Tong, F., Tang, S., Irby, B. J., Lara-Alecio, R., & Guerrero, C. 2020, International Journal of Educational Research, 99, 101514, doi: 10.1016/j.ijer.2019.101514 Towhidul Islam Tonmoy, S. M., Mehedi Zaman, S. M.,

work page doi:10.1016/j.ijer.2019.101514 2020
[53]

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

Jain, V., et al. 2024, arXiv e-prints, arXiv:2401.01313, doi: 10.48550/arXiv.2401.01313

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2401.01313 2024
[54]

2018, European Journal of Engineering Education, 43, 507, doi: 10.1080/03043797.2017.1290585

Wallin, P., & Adawi, T. 2018, European Journal of Engineering Education, 43, 507, doi: 10.1080/03043797.2017.1290585

work page doi:10.1080/03043797.2017.1290585 2018
[55]

2024, arXiv e-prints, arXiv:2403.18105, doi: 10.48550/arXiv.2403.18105

Wang, S., Xu, T., Li, H., et al. 2024, arXiv e-prints, arXiv:2403.18105, doi: 10.48550/arXiv.2403.18105

work page doi:10.48550/arxiv.2403.18105 2024
[56]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Wei, J., Wang, X., Schuurmans, D., et al. 2022, arXiv e-prints, arXiv:2201.11903, doi: 10.48550/arXiv.2201.11903

[57]

Wu, Y., Jia, F., Zhang, S., et al. 2023, arXiv e-prints, arXiv:2306.01337, doi: 10.48550/arXiv.2306.01337

[58]

Xu, B., Yang, A., Lin, J., et al. 2023, arXiv e-prints, arXiv:2305.14688, doi: 10.48550/arXiv.2305.14688

[59]

ReAct: Synergizing Reasoning and Acting in Language Models

Yao, S., Zhao, J., Yu, D., et al. 2022, arXiv e-prints, arXiv:2210.03629, doi: 10.48550/arXiv.2210.03629

[60]

Zhai, C., Wibowo, S., & Li, L. 2024, Smart Learning Environments, 11, doi: 10.1186/s40561-024-00316-7

[61]

Zheng, M., Pei, J., Logeswaran, L., Lee, M., & Jurgens, D. 2023, arXiv e-prints, arXiv:2311.10054, doi: 10.48550/arXiv.2311.10054

