Recognition: unknown
The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings
Pith reviewed 2026-05-10 09:50 UTC · model grok-4.3
The pith
Advanced LLMs boost EFL writing scores and lexical diversity yet correlate with lower expert coherence ratings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Post-ChatGPT LLMs enhance quantitative measures such as lexical diversity and readability scores for EFL writers, particularly lower-proficiency learners, while increased LLM assistance correlates negatively with qualitative expert ratings, indicating surface fluency without deep coherence. Pedagogy must therefore differentiate ideational scaffolding from textual production and align AI functions with the learner's Zone of Proximal Development.
What carries the argument
Comparison of pre- and post-ChatGPT student compositions through expert qualitative scoring together with quantitative metrics including readability tests, MTLD, and Pearson's correlation coefficient.
If this is right
- Lower-proficiency EFL learners receive measurable boosts in assessment scores and lexical diversity from advanced LLMs.
- Greater LLM assistance can mask students' true current ability by supplying surface-level fluency.
- Human expert ratings of coherence decline as LLM assistance increases, pointing to limits in deep understanding.
- Effective use requires pedagogy to verify the learning process rather than judge final output quality alone.
Where Pith is reading between the lines
- Without explicit checks on the writing process, repeated LLM use may slow the development of independent coherence skills over a school year.
- The same surface-versus-depth pattern could appear in other language tasks or subjects where AI supplies polished output.
- A longitudinal study tracking the same students' unaided writing proficiency across varying levels of permitted LLM support would test whether the masking effect persists.
Load-bearing premise
Observed differences in student writings can be attributed primarily to changes in LLM capabilities rather than shifts in teaching practices, assignment design, or student proficiency over the same period.
What would settle it
A controlled comparison in which post-ChatGPT writings show stable or higher expert coherence ratings when teaching methods and student cohorts are held constant would falsify the claim that increased LLM assistance causes reduced deep coherence.
Figures
read the original abstract
The rapid evolution of Large Language Models (LLMs) has made them powerful tools for enhancing student writing. This study explores the extent and limitations of LLMs in assisting secondary-level English as a Foreign Language (EFL) students with their writing tasks. While existing studies focus on output quality, our research examines the developmental shift in LLMs and their impact on EFL students, assessing whether smarter models act as true scaffolds or mere compensatory crutches. To achieve this, we analyse student compositions assisted by LLMs before and after ChatGPT's release, using both expert qualitative scoring and quantitative metrics (readability tests, Pearson's correlation coefficient, MTLD, and others). Our results indicate that advanced LLMs boost assessment scores and lexical diversity for lower-proficiency learners, potentially masking their true ability. Crucially, increased LLM assistance correlated negatively with human expert ratings, suggesting surface fluency without deep coherence. To transform AI-assisted practice into genuine learning, pedagogy must shift from focusing on output quality to verifying the learning process. Educators should align AI functions, specifically differentiating ideational scaffolding from textual production, within the learner's Zone of Proximal Development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares EFL secondary students' compositions assisted by pre-ChatGPT LLMs versus post-ChatGPT models. It employs expert qualitative scoring alongside quantitative metrics (readability tests, MTLD for lexical diversity, Pearson's correlation) to claim that advanced LLMs raise assessment scores and lexical diversity for lower-proficiency learners while increased LLM assistance negatively correlates with human expert ratings, interpreted as surface fluency without deep coherence. The authors conclude that pedagogy should shift from output quality to verifying the learning process and aligning AI use with the Zone of Proximal Development.
Significance. If the empirical claims survive methodological scrutiny, the work could usefully inform EFL pedagogy and HCI research on generative AI in education by documenting how model capability interacts with learner proficiency. The combination of qualitative expert judgment and quantitative measures (MTLD, readability) is a positive feature that allows triangulation of surface versus deeper writing qualities.
major comments (3)
- [Methods] Methods section: No sample size, participant demographics, number of compositions, or details on how 'LLM assistance levels' were quantified (self-report, usage logs, or otherwise) are reported. Without these, the negative correlation between assistance and expert ratings cannot be evaluated for statistical power or generalizability.
- [Results] Results/Discussion: The pre-post ChatGPT design attributes differences in writing to LLM generations, yet the manuscript provides no controls, matching, or covariates for concurrent changes in curriculum, assignment design, teacher practices, or student cohort proficiency. This leaves the central causal interpretation vulnerable to confounding.
- [Abstract] Abstract and Results: The claim that advanced LLMs 'mask true ability' for lower-proficiency learners rests on observed score boosts, but the paper does not report how proficiency was independently measured or how assistance was isolated from learner effort, undermining the masking interpretation.
minor comments (1)
- [Abstract] The abstract mentions 'Pearson's correlation coefficient' and 'MTLD' without defining the exact variables correlated or the MTLD implementation details; a brief methods paragraph would improve clarity.
Simulated Author's Rebuttal
We are grateful to the referee for their detailed feedback on our manuscript. We have carefully considered each major comment and provide our responses below, indicating the revisions we plan to make.
read point-by-point responses
-
Referee: [Methods] Methods section: No sample size, participant demographics, number of compositions, or details on how 'LLM assistance levels' were quantified (self-report, usage logs, or otherwise) are reported. Without these, the negative correlation between assistance and expert ratings cannot be evaluated for statistical power or generalizability.
Authors: We acknowledge the need for greater transparency in the Methods section. The revised manuscript will include detailed reporting of the sample size, participant demographics (including age, gender distribution, and EFL proficiency levels), the number of compositions analyzed, and the method for quantifying LLM assistance levels, which combined self-reported usage with analysis of writing process logs. These additions will support evaluation of statistical power and generalizability. revision: yes
-
Referee: [Results] Results/Discussion: The pre-post ChatGPT design attributes differences in writing to LLM generations, yet the manuscript provides no controls, matching, or covariates for concurrent changes in curriculum, assignment design, teacher practices, or student cohort proficiency. This leaves the central causal interpretation vulnerable to confounding.
Authors: We recognize that the pre-post design is susceptible to confounding from external factors. Our analysis did include student proficiency as a covariate and focused on comparative patterns across LLM generations. In the revision, we will expand the Discussion to explicitly address potential confounders, include any sensitivity analyses, and temper the causal language while emphasizing the observational nature of the findings and their implications for pedagogy. revision: partial
-
Referee: [Abstract] Abstract and Results: The claim that advanced LLMs 'mask true ability' for lower-proficiency learners rests on observed score boosts, but the paper does not report how proficiency was independently measured or how assistance was isolated from learner effort, undermining the masking interpretation.
Authors: Proficiency was independently measured using standardized EFL assessment tools prior to the writing tasks, and LLM assistance was isolated through a mixed-methods approach involving usage frequency reports and qualitative differentiation of text features. The masking interpretation is further supported by the negative correlation with expert ratings on deep coherence rather than surface features. We will update the abstract and results sections to clearly describe these measurement procedures and refine the interpretation to be more precise. revision: yes
Circularity Check
No significant circularity: empirical observational study without self-referential derivations
full rationale
The paper presents an observational pre-post analysis of EFL student writings assisted by different LLM generations, relying on expert qualitative scoring, readability metrics, MTLD, and Pearson correlations. No equations, fitted parameters renamed as predictions, or derivation chains appear in the provided text or abstract. The central claim (negative correlation between LLM assistance level and expert ratings) is framed as an empirical finding rather than a mathematical reduction to inputs. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This matches the default expectation for non-derivational empirical work; the design limitations noted in the skeptic take concern external validity and confounding, not circularity by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Expert qualitative scoring reliably distinguishes surface fluency from deep coherence in student writing.
- domain assumption Pre- and post-ChatGPT student compositions are comparable after controlling for other variables.
Reference graph
Works this paper leans on
-
[1]
2024. Poe. https://poe.com
2024
-
[2]
F. Al Mahmud. 2023. Investigating EFL Students’ Writing Skills Through Artificial Intelligence: Wordtune Application as a Tool.Journal of Language Teaching and Research14, 5 (2023), 1395–1404. https://doi.org/10.17507/jltr.1405.21
-
[3]
J. Algaraady and M. Mahyoob. 2023. ChatGPT’s Capabilities in Spotting and Analyzing Writing Errors Experienced by EFL Learners. Arab World English Journal (A WEJ) Special Issue on CALL9 (2023), 3–17. https://doi.org/10.24093/awej/call9.1
-
[4]
Shivam Bansal and Chaitanya Aggarwal. 2025. textstat. https://pypi.org/project/textstat/. Accessed: 2024-12-17
2025
-
[5]
1998.Corpus Linguistics: Investigating Language Structure and Use
Douglas Biber, Susan Conrad, and Randi Reppen. 1998.Corpus Linguistics: Investigating Language Structure and Use. Cambridge University Press
1998
-
[6]
O’Reilly Media, Inc
Steven Bird, Ewan Klein, and Edward Loper. 2009.Natural language processing with Python: analyzing text with the natural language toolkit. " O’Reilly Media, Inc. "
2009
-
[7]
R. A. Bjork. 1994.Memory and metamemory considerations in the training of human beings. The MIT Press. 185–205 pages
1994
-
[8]
Sid Black, Leo Gao, Phil Wang, Connor Leahy, and Stella Biderman. 2021. GPT-Neo: Large scale autoregressive language modeling with mesh-tensorflow. URL: http://github.com/eleutherai/gpt-neo
2021
-
[9]
E. Bonner, R. Lege, and E. Frazier. 2023. Large language model-based artificial intelligence in the language classroom: Practical ideas for teaching.Teaching English with Technology23, 1 (2023), 23–41. https://doi.org/10.56297/BKAM1691/WIEO1749
-
[10]
Bartosz Broda, Bartłomiej Nitoń, Włodzimierz Gruszczyński, and Maciej Ogrodniczuk. 2014. Measuring Readability of Polish Texts: Baseline Experiments. InProceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asun...
2014
-
[11]
Noam Chomsky. 1956. Three Models for the Description of Language.IRE Transactions on Information Theory2, 3 (September 1956), 113–124
1956
-
[12]
Meri Coleman and Ta Lin Liau. 1975. A Computer Readability Formula Designed for Machine Scoring.Journal of Applied Psychology60, 2 (1975), 283
1975
-
[13]
Microsoft Corporation. 2025. Microsoft Excel. (2025). https://office.microsoft.com/excel
2025
-
[14]
E. Creely and J. Blannin. 2025. Creative partnerships with generative AI. Possibilities for education and beyond.Thinking Skills and Creativity56 (2025), 101727. https://doi.org/10.1016/j.tsc.2025.101727
-
[15]
Scott A. Crossley. 2025. Developing Linguistic Constructs of Text Readability Using Natural Language Processing.Scientific Studies of Reading29, 2 (2025), 138–160. https://doi.org/10.1080/10888438.2024.2422365
-
[16]
Edgar Dale and Jeanne S. Chall. 1948. A Formula for Predicting Readability.Educational Research Bulletin(1948), 37–54
1948
-
[17]
H. B. Essel, D. Vlachopoulos, A. B. Essuman, and J. O. Amankwa. 2024. ChatGPT effects on cognitive skills of undergraduate students: Receiving instant responses from AI-based conversational large language models (LLMs).Computers and Education: Artificial Intelligence The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings•19 6...
-
[18]
A. S. Evmenova, K. Regan, R. Mergen, and R. Hrisseh. 2024. Improving Writing Feedback for Struggling Writers: Generative AI to the Rescue?TechTrends68, 4 (2024), 790–802. https://doi.org/10.1007/s11528-024-00965-y
-
[19]
H. Feng, K. Li, and L. J. Zhang. 2025. What does AI bring to second language writing? A systematic review (2014–2024).Language Learning & Technology29, 1 (2025), 1–27. https://doi.org/10.64152/10125/73619 Advance online publication
-
[20]
A. Z. Fitzsimons, E. M. Gerber, and D. Long. 2024. Overcoming challenges to personal narrative co-writing with AI: A participatory design approach for under-resourced high school students. InProceedings of the Third Workshop on Intelligent and Interactive Writing Assistants (In2Writing ’24). Association for Computational Linguistics. https://doi.org/10.11...
-
[21]
R. Flesch. 1948. A new readability yardstick.Journal of Applied Psychology32, 3 (1948), 221–233
1948
-
[22]
Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, and Chenghua Lin. 2023. Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles. InProceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks. Association for Computational Lingui...
-
[23]
R. Gunning. 1952.The Technique of Clear Writing. McGraw-Hill, New York
1952
-
[24]
Kai Guo and Danling Li. 2024. Understanding EFL students’ use of self-made AI chatbots as personalized writing assistance tools: A mixed methods study.System124 (2024), 103362. https://doi.org/10.1016/j.system.2024.103362
-
[25]
Pei-Fu Guo, Ying-Hsuan Chen, Yun-Da Tsai, and Shou-De Lin. 2024. Towards Optimizing with Large Language Models. InFourth Workshop on Knowledge-infused Learning. https://openreview.net/forum?id=vIU8LUckb4
2024
-
[26]
J. Han, H. Yoo, J. Myung, M. Kim, T. Y. Lee, S.-Y. Ahn, and A. Oh. 2024. Exploring Student-ChatGPT Dialogue in EFL Writing Education. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 1–17
2024
- [27]
-
[28]
A. Harris. 2019. Defining advanced vocabulary in academic contexts.Journal of English for Academic Purposes34 (2019), 12–25
2019
-
[29]
T. Jansen, A. Horbach, and J. Möller. 2024. Feedback from Generative AI: Correlates of Student Engagement in Text Revision from 655 Classes from Primary and Secondary School. InProceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery. https://doi.org/10.1145/3613904.3642345
-
[30]
An Empirical Study to Understand How Students Use ChatGPT for Writing Essays
A. Jelson, D. Manesh, A. Jang, D. Dunlap, Y.-H. Kim, and S. W. Lee. 2025. An empirical study to understand how students use ChatGPT for writing essays.arXiv preprint arXiv:2501.10551(2025). https://arxiv.org/abs/2501.10551
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[31]
J. Jeon, L. Wei, K. W. H. Tai, and S. Lee. 2025. Generative AI and its dilemmas: exploring AI from a translanguaging perspective.Applied Linguistics46, 4 (2025), 709–717. https://doi.org/10.1093/applin/amaf049
-
[32]
Chao Jiang and Wei Xu. 2024. MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.). Association for Computational Linguistics, Miami, Florida, USA, 17293–17319. https://doi.org/...
-
[33]
Z. Jiang, Z. Xu, Z. Pan, J. He, and K. Xie. 2023. Exploring the Role of Artificial Intelligence in Facilitating Assessment of Writing Performance in Second Language Learning.Languages8, 4 (2023), 247. https://doi.org/10.3390/languages8040247
-
[34]
Why Language Models Hallucinate
Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. 2025. Why Language Models Hallucinate. arXiv:2509.04664 [cs.CL] https://arxiv.org/abs/2509.04664
work page internal anchor Pith review arXiv 2025
- [35]
-
[36]
M. Kim, S. Kim, S. Lee, Y. Yoon, J. Myung, H. Yoo, H. Lim, J. Han, Y. Kim, S.-Y. Ahn, J. Kim, A. Oh, H. Hong, and T. Y. Lee. 2024. LLM-driven learning analytics dashboard for teachers in EFL writing education.arXiv preprint arXiv:2410.15025(2024). https: //arxiv.org/abs/2410.15025
-
[37]
Peter Kincaid, Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom
J. Peter Kincaid, Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom. 1975.Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical Report. ERIC
1975
-
[38]
N. Kosmyna, E. Hauptmann, Y. T. Yuan, J. Situ, X.-H. Liao, A. V. Beresnitzky, I. Braunstein, and P. Maes. 2025. Your Brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant for essay writing task.arXiv preprint arXiv:2506.08872(2025). https://arxiv.org/abs/2506.08872
-
[39]
Suhas Kotha, Jacob Mitchell Springer, and Aditi Raghunathan. 2024. Understanding Catastrophic Forgetting in Language Models via Implicit Inference. InThe Twelfth International Conference on Learning Representations. https://openreview.net/forum?id=VrHiF2hsrm
2024
-
[40]
2010.Multimodality: A Social Semiotic Approach to Contemporary Communication
Gunther Kress. 2010.Multimodality: A Social Semiotic Approach to Contemporary Communication. Routledge, London
2010
-
[41]
Kristopher Kyle. 2020. Lexical Diversity. https://pypi.org/project/lexical-diversity/
2020
-
[42]
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Pierre-Carl Langlais, Carlos Rosas Hinostroza, Mattia Nee, Catherine Arnett, Pavel Chizhov, Eliot Krzystof Jones, Irène Girard, David Mach, Anastasia Stasenko, and Ivan P. Yamshchikov. 2025. Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training. 20•Hengky Susanto, et al. ArXivabs/2506.01732 (2025). https://api.semanticscholar.org/Corp...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[43]
L. Leppänen, L. Aunimo, J. K. Nurminen, A. Hellas, and L. Mannila. 2025. How large language models are changing MOOC essay answers: A comparison of pre- and post-LLM responses. InProceedings of the Third Workshop on Intelligent and Interactive Writing Assistants (In2Writing ’24). Association for Computational Linguistics. https://arxiv.org/abs/2504.13038
-
[44]
Haochuan Li, Jingyuan Li, Yi Zhao, Heng Zhang, Yukai Yang, Zile Hu, and Chengzhi Zhang. 2025. Measuring Research Difficulty in Academic Papers: A Case Study in Natural Language Processing.Data Science and Informetrics(2025). https://doi.org/10.1016/j.dsim. 2025.06.001
-
[45]
Z.-M. Liu, G.-J. Hwang, C.-Q. Chen, X.-D. Chen, and X.-D. Ye. 2024. Integrating large language models into EFL writing instruction: effects on performance, self-regulated learning strategies, and motivation.Computer Assisted Language Learning(2024), 1–32. https: //doi.org/10.1080/09588221.2024.2388211
-
[46]
U. Marzuki, Widiati, D. Rusdin, Darwin, and I. Indrawati. 2023. The impact of AI writing tools on the content and organization of students’ writing: EFL teachers’ perspective.Cogent Education10, 2 (2023), 2236469. https://doi.org/10.1080/2331186X.2023.2236469
-
[47]
Harry Mc Laughlin
G. Harry Mc Laughlin. 1969. SMOG Grading-a New Readability Formula.Journal of Reading12, 8 (1969), 639–646
1969
-
[48]
Philip Mccarthy and Scott Jarvis. 2010. MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment.Behavior research methods42 (05 2010), 381–92. https://doi.org/10.3758/BRM.42.2.381
-
[49]
McCarthy
Philip M. McCarthy. 2005.An Assessment of the Range and Usefulness of Lexical Diversity Measures and the Potential of the Measure of Textual, Lexical Diversity (MTLD). Ph. D. Dissertation. The University of Memphis
2005
-
[50]
J. Meyer, T. Jansen, R. Schiller, L. W. Liebenow, M. Steinbach, A. Horbach, and J. Fleckenstein. 2024. Using LLMs to bring evidence-based feedback into the classroom: AI-generated feedback increases secondary students’ text revision, motivation, and positive emotions. Computers and Education: Artificial Intelligence6 (2024), 100199. https://doi.org/10.1016...
-
[51]
Salar Mohtaj, Sebastian Möller, Faraz Maschhur, Chuyang Wu, and Max Reinhard. 2022. A Transfer Learning Based Model for Text Readability Assessment in German. (07 2022). https://doi.org/10.48550/arXiv.2207.06265
-
[52]
Yancey, Ruidong Liu, Mirza Basim Baig, André Kenji Horie, and James Sharpnack
Chenhao Niu, Kevin P. Yancey, Ruidong Liu, Mirza Basim Baig, André Kenji Horie, and James Sharpnack. 2024. Detecting LLM-Assisted Cheating on Open-Ended Writing Tasks on Language Proficiency Tests. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, Franck Dernoncourt, Daniel Preoţiuc-Pietro, and Anasta...
2024
-
[53]
Daniela Oelke, David Spretke, Andreas Stoffel, and Daniel A. Keim. 2012. Visual Readability Analysis: How to Make Your Writings Easier to Read.IEEE Transactions on Visualization and Computer Graphics18, 5 (May 2012), 662–674. https://doi.org/10.1109/TVCG.2011.266
-
[54]
OpenAI. 2023. GPT-4 Technical Report.CoRRabs/2303.08774 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[55]
Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, and Felix Steffek. 2023. The cambridge law corpus: a dataset for legal AI research. InProceedings of the 37th International Conference on Neural Information Processing Systems(New Orleans, LA, USA)(NIPS ’23). Curran Associates Inc., Red Hook, NY, USA...
2023
-
[56]
1966.Gobbledygook has gotta go
John O’hayre. 1966.Gobbledygook has gotta go. Technical Report. US Department of the Interior, Bureau of Land Management
1966
-
[57]
K. Pearson. 1896. Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia.Philosophical Transactions A373 (1896), 253–318
-
[58]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer.J. Mach. Learn. Res.21, 1, Article 140 (Jan. 2020), 67 pages
2020
-
[59]
Hannah Rashkin, Elizabeth Clark, Fantine Huot, and Mirella Lapata. 2025. Help Me Write a Story: Evaluating LLMs’ Ability to Generate Writing Feedback. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (Eds.). Associa...
2025
-
[60]
Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. InNeurIPS EMC2 Workshop
2019
-
[61]
J.-Y. Seo. 2024. Exploring the Educational Potential of ChatGPT: AI-Assisted Narrative Writing for EFL College Students.Language Teaching Research Quarterly43 (2024), 1–21. https://doi.org/10.32038/ltrq.2024.43.01
-
[62]
M. Siddiqui, R. Pea, and H. Subramonyam. 2025. Script&Shift: A Layered Interface Paradigm for Integrating Content Development and Rhetorical Strategy with LLM Writing Assistants.arXiv preprint arXiv:2502.10638(2025). https://arxiv.org/abs/2502.10638
-
[63]
E. A. Smith and R. J. Senter. 1967.Automated readability index. Technical Report. AMRL TR. 1–14 pages. PMID: 5302480
1967
-
[64]
C. Song and Y. Song. 2023. Enhancing academic writing skills and motivation: assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students.Frontiers in Psychology14 (2023), 1260843. https://doi.org/10.3389/fpsyg.2023.1260843
-
[65]
Matthias Stadler, Maria Bannert, and Michael Sailer. 2024. Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry.Computers in Human Behavior160 (2024), 108386. https://doi.org/10.1016/j.comphumbeh.2024.108386
-
[66]
Y. Su, Y. Lin, and C. Lai. 2023. Collaborating with ChatGPT in argumentative writing classrooms.Assessing Writing57 (2023), 100752. https://doi.org/10.1016/j.asw.2023.100752 The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings•21
-
[67]
Hakyung Sung, Karla Csuros, and Min-Chang Sung. 2025. Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features. InProceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025). Association for Computational Linguistics, Vienna, Austria, 11–23
2025
- [68]
-
[69]
ChatGPT is the companion, not enemies
Meng F. Teng. 2024. "ChatGPT is the companion, not enemies": EFL learners’ perceptions and experiences in using ChatGPT for feedback in writing.Computers and Education: Artificial Intelligence7 (2024), 100270. https://doi.org/10.1016/j.caeai.2024.100270
-
[70]
Meng F. Teng. 2025. Metacognitive Awareness and EFL Learners’ Perceptions and Experiences in Utilising ChatGPT for Writing Feedback.European Journal of Education60 (2025), e12811. https://doi.org/10.1111/ejed.12811
-
[71]
Ugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models.CoRRabs/2302.13971 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[72]
Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, and Marius Hobbhahn. 2024. Position: will we run out of data? limits of LLM scaling based on human-generated data. InProceedings of the 41st International Conference on Machine Learning (Vienna, Austria)(ICML’24). JMLR.org, Article 2024, 22 pages
2024
-
[73]
L. S. Vygotsky. 1978.Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge, MA
1978
-
[74]
Budi Waluyo and Farid Rouaghe. 2025. Beyond Teacher-Led Approaches: Student-Initiated Translanguaging With Artificial Intelligence Tools in Foreign Language Acquisition.SAGE Open15, 3 (2025). https://doi.org/10.1177/21582440251362998
-
[75]
Ben Wang and Aran Komatsuzaki. 2021. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/ kingoflolz/mesh-transformer-jax
2021
-
[76]
Hong Wang and L. Hao. 2024. AI Integration in Translanguaging Practices in University EFL Classrooms: Shaping New Pedagogical Possibilities.SSRN(2024). https://doi.org/10.2139/ssrn.5601959
-
[77]
Wasi, Islam Mohammad R., and R
Asiful T. Wasi, Islam Mohammad R., and R. Islam. 2024. LLMs as Writing Assistants: Exploring Perspectives on Sense of Ownership and Reasoning.arXiv preprintarXiv:2404.00027 (2024). https://arxiv.org/abs/2404.00027
-
[78]
David James Woo, Kai Guo, and Hengky Susanto. 2025. Exploring EFL students’ prompt engineering in human-AI story writing: An activity theory perspective.Interactive Learning Environments33(1) (2025), 863–882. https://doi.org/10.1080/10494820.2024.2361381
- [79]
-
[80]
D. J. Woo, H. Susanto, C. H. Yeung, K. Guo, and A. K. Y. Fung. 2024. Exploring AI-Generated Text in Student Writing: How Does AI Help?Language Learning & Technology28, 2 (2024), 183–209. https://doi.org/10.64152/10125/73577
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.