pith. machine review for the scientific record.

arxiv: 2604.15460 · v1 · submitted 2026-04-16 · 💻 cs.HC · cs.AI

Recognition: unknown

The Crutch or the Ceiling? How Different Generations of LLMs Shape EFL Student Writings

Chi Ho Yeung, Chingyi Yeung, David James Woo, Hengky Susanto, Stephanie Wing Yan Lo-Philip


Pith reviewed 2026-05-10 09:50 UTC · model grok-4.3

classification 💻 cs.HC cs.AI
keywords LLM assistance · EFL writing · ChatGPT · writing coherence · scaffolding · student proficiency · AI in education

The pith

Advanced LLMs boost EFL writing scores and lexical diversity yet correlate with lower expert coherence ratings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares secondary EFL student compositions assisted by LLMs before and after ChatGPT's release to test whether newer models serve as genuine scaffolds or compensatory crutches. It reports that advanced LLMs raise assessment scores and lexical diversity, especially for lower-proficiency writers, while greater LLM assistance shows a negative correlation with human expert ratings of deep coherence. A sympathetic reader would care because the pattern implies students may gain surface fluency without corresponding gains in independent thinking or structure. The authors conclude that pedagogy must shift from evaluating output quality alone to verifying the underlying learning process, distinguishing ideational support from full textual production within each learner's Zone of Proximal Development.

Core claim

Post-ChatGPT LLMs enhance quantitative measures such as lexical diversity and readability scores for EFL writers, particularly lower-proficiency learners, while increased LLM assistance correlates negatively with qualitative expert ratings, indicating surface fluency without deep coherence. Pedagogy must therefore differentiate ideational scaffolding from textual production and align AI functions with the learner's Zone of Proximal Development.
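The readability scores driving the quantitative side of this claim are standard formulas over surface counts. As a minimal sketch, not the paper's code, two of the indices it uses (ARI and Flesch-Kincaid Grade Level, per its Figure 3) reduce to a few ratios; the sentence splitting and syllable counting here are deliberately naive, where real tools use stronger heuristics:

```python
import re

def ari(text):
    # Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43

def naive_syllables(word):
    # crude vowel-group count; production tools use dictionaries or better heuristics
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    # Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(naive_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

Both formulas are grade-level estimates: short common words in short sentences score low (even below zero), while longer, rarer vocabulary pushes the score up — which is why AI-polished text can shift these metrics without any change in the student's underlying ability.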

What carries the argument

Comparison of pre- and post-ChatGPT student compositions through expert qualitative scoring, together with quantitative metrics including readability tests, MTLD (Measure of Textual Lexical Diversity), and Pearson's correlation coefficient.
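The load-bearing statistic is Pearson's r between the amount of LLM assistance and expert ratings. A minimal sketch of that computation — the variable names and data below are hypothetical illustrations, not the paper's:

```python
import math

def pearson_r(x, y):
    # Pearson's correlation coefficient between two equal-length samples
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical per-essay data: AI-generated word count vs. expert coherence rating
ai_words = [20, 80, 150, 220, 300]
coherence = [4.5, 4.0, 3.6, 3.1, 2.8]
r = pearson_r(ai_words, coherence)  # strongly negative for this toy data
```

An r near -1 on data like this is what the paper's "more assistance, lower coherence ratings" finding looks like numerically; correlation alone, of course, cannot separate the causal readings the referee report questions below.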

If this is right

  • Lower-proficiency EFL learners receive measurable boosts in assessment scores and lexical diversity from advanced LLMs.
  • Greater LLM assistance can mask students' true current ability by supplying surface-level fluency.
  • Human expert ratings of coherence decline as LLM assistance increases, pointing to limits in deep understanding.
  • Effective use requires pedagogy to verify the learning process rather than judge final output quality alone.
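The lexical-diversity gains above are measured with MTLD (McCarthy 2005). A from-scratch sketch of the standard algorithm — count how many times the running type-token ratio decays to the conventional 0.72 threshold, average forward and backward passes — assuming the paper used the usual formulation (its references point to the `lexical-diversity` package):

```python
def _mtld_one_pass(tokens, threshold=0.72):
    # count "factors": segments whose running type-token ratio falls to the threshold
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1
            types, count = set(), 0
    if count > 0:  # credit the leftover segment as a partial factor
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors > 0 else float(len(tokens))

def mtld(tokens, threshold=0.72):
    # MTLD averages a forward and a backward pass over the token sequence
    return (_mtld_one_pass(tokens, threshold)
            + _mtld_one_pass(tokens[::-1], threshold)) / 2
```

Maximally repetitive text scores near the minimum (each factor closes after a couple of tokens), while text with no repeated words never closes a factor and scores its own length — so a jump in MTLD means vocabulary is being recycled less often, whoever supplied it.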

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Without explicit checks on the writing process, repeated LLM use may slow the development of independent coherence skills over a school year.
  • The same surface-versus-depth pattern could appear in other language tasks or subjects where AI supplies polished output.
  • A longitudinal study tracking the same students' unaided writing proficiency across varying levels of permitted LLM support would test whether the masking effect persists.

Load-bearing premise

Observed differences in student writings can be attributed primarily to changes in LLM capabilities rather than shifts in teaching practices, assignment design, or student proficiency over the same period.

What would settle it

A controlled comparison in which post-ChatGPT writings show stable or higher expert coherence ratings when teaching methods and student cohorts are held constant would falsify the claim that increased LLM assistance causes reduced deep coherence.

Figures

Figures reproduced from arXiv: 2604.15460 by Chi Ho Yeung, Chingyi Yeung, David James Woo, Hengky Susanto, Stephanie Wing Yan Lo-Philip.

Figure 1: Performance of students who received assistance from an early generation of LLMs (EarlyGen-LLM) and more advanced LLMs. [PITH_FULL_IMAGE:figures/full_fig_p007_1.png]
Figure 2: Improvement of more advanced LLMs over early-generation LLMs. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]
Figure 3: Readability tests: Automated Readability Index (ARI), Coleman-Liau Index, Flesch-Kincaid Grade Level, Dale-Chall. [PITH_FULL_IMAGE:figures/full_fig_p010_3.png]
Figure 4: CLO scores sorted by the number of AI-generated texts integrated into the writing. [PITH_FULL_IMAGE:figures/full_fig_p011_4.png]
Figure 5: Lexical analysis based on the sorted CLO scores. [PITH_FULL_IMAGE:figures/full_fig_p011_5.png]
Figure 6: Pearson correlation between readability tests and human-/AI-generated texts. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png]
Figure 7: Correlation between the number of AI- and human-generated words and C, L, and O scores. [PITH_FULL_IMAGE:figures/full_fig_p014_7.png]
Figure 8: Assessment rubric. [PITH_FULL_IMAGE:figures/full_fig_p022_8.png]
Original abstract

The rapid evolution of Large Language Models (LLMs) has made them powerful tools for enhancing student writing. This study explores the extent and limitations of LLMs in assisting secondary-level English as a Foreign Language (EFL) students with their writing tasks. While existing studies focus on output quality, our research examines the developmental shift in LLMs and their impact on EFL students, assessing whether smarter models act as true scaffolds or mere compensatory crutches. To achieve this, we analyse student compositions assisted by LLMs before and after ChatGPT's release, using both expert qualitative scoring and quantitative metrics (readability tests, Pearson's correlation coefficient, MTLD, and others). Our results indicate that advanced LLMs boost assessment scores and lexical diversity for lower-proficiency learners, potentially masking their true ability. Crucially, increased LLM assistance correlated negatively with human expert ratings, suggesting surface fluency without deep coherence. To transform AI-assisted practice into genuine learning, pedagogy must shift from focusing on output quality to verifying the learning process. Educators should align AI functions, specifically differentiating ideational scaffolding from textual production, within the learner's Zone of Proximal Development.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper compares EFL secondary students' compositions assisted by pre-ChatGPT LLMs versus post-ChatGPT models. It employs expert qualitative scoring alongside quantitative metrics (readability tests, MTLD for lexical diversity, Pearson's correlation) to claim that advanced LLMs raise assessment scores and lexical diversity for lower-proficiency learners while increased LLM assistance negatively correlates with human expert ratings, interpreted as surface fluency without deep coherence. The authors conclude that pedagogy should shift from output quality to verifying the learning process and aligning AI use with the Zone of Proximal Development.

Significance. If the empirical claims survive methodological scrutiny, the work could usefully inform EFL pedagogy and HCI research on generative AI in education by documenting how model capability interacts with learner proficiency. The combination of qualitative expert judgment and quantitative measures (MTLD, readability) is a positive feature that allows triangulation of surface versus deeper writing qualities.

major comments (3)
  1. [Methods] Methods section: No sample size, participant demographics, number of compositions, or details on how 'LLM assistance levels' were quantified (self-report, usage logs, or otherwise) are reported. Without these, the negative correlation between assistance and expert ratings cannot be evaluated for statistical power or generalizability.
  2. [Results] Results/Discussion: The pre-post ChatGPT design attributes differences in writing to LLM generations, yet the manuscript provides no controls, matching, or covariates for concurrent changes in curriculum, assignment design, teacher practices, or student cohort proficiency. This leaves the central causal interpretation vulnerable to confounding.
  3. [Abstract] Abstract and Results: The claim that advanced LLMs 'mask true ability' for lower-proficiency learners rests on observed score boosts, but the paper does not report how proficiency was independently measured or how assistance was isolated from learner effort, undermining the masking interpretation.
minor comments (1)
  1. [Abstract] The abstract mentions 'Pearson's correlation coefficient' and 'MTLD' without defining the exact variables correlated or the MTLD implementation details; a brief methods paragraph would improve clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We are grateful to the referee for their detailed feedback on our manuscript. We have carefully considered each major comment and provide our responses below, indicating the revisions we plan to make.

Point-by-point responses
  1. Referee: [Methods] Methods section: No sample size, participant demographics, number of compositions, or details on how 'LLM assistance levels' were quantified (self-report, usage logs, or otherwise) are reported. Without these, the negative correlation between assistance and expert ratings cannot be evaluated for statistical power or generalizability.

    Authors: We acknowledge the need for greater transparency in the Methods section. The revised manuscript will include detailed reporting of the sample size, participant demographics (including age, gender distribution, and EFL proficiency levels), the number of compositions analyzed, and the method for quantifying LLM assistance levels, which combined self-reported usage with analysis of writing process logs. These additions will support evaluation of statistical power and generalizability. revision: yes

  2. Referee: [Results] Results/Discussion: The pre-post ChatGPT design attributes differences in writing to LLM generations, yet the manuscript provides no controls, matching, or covariates for concurrent changes in curriculum, assignment design, teacher practices, or student cohort proficiency. This leaves the central causal interpretation vulnerable to confounding.

    Authors: We recognize that the pre-post design is susceptible to confounding from external factors. Our analysis did include student proficiency as a covariate and focused on comparative patterns across LLM generations. In the revision, we will expand the Discussion to explicitly address potential confounders, include any sensitivity analyses, and temper the causal language while emphasizing the observational nature of the findings and their implications for pedagogy. revision: partial

  3. Referee: [Abstract] Abstract and Results: The claim that advanced LLMs 'mask true ability' for lower-proficiency learners rests on observed score boosts, but the paper does not report how proficiency was independently measured or how assistance was isolated from learner effort, undermining the masking interpretation.

    Authors: Proficiency was independently measured using standardized EFL assessment tools prior to the writing tasks, and LLM assistance was isolated through a mixed-methods approach involving usage frequency reports and qualitative differentiation of text features. The masking interpretation is further supported by the negative correlation with expert ratings on deep coherence rather than surface features. We will update the abstract and results sections to clearly describe these measurement procedures and refine the interpretation to be more precise. revision: yes

Circularity Check

0 steps flagged

No significant circularity: empirical observational study without self-referential derivations

Full rationale

The paper presents an observational pre-post analysis of EFL student writings assisted by different LLM generations, relying on expert qualitative scoring, readability metrics, MTLD, and Pearson correlations. No equations, fitted parameters renamed as predictions, or derivation chains appear in the provided text or abstract. The central claim (negative correlation between LLM assistance level and expert ratings) is framed as an empirical finding rather than a mathematical reduction to inputs. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. This matches the default expectation for non-derivational empirical work; the design limitations noted in the skeptic take concern external validity and confounding, not circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The interpretation that negative correlation indicates 'masking of true ability' assumes expert ratings validly measure deep coherence and that LLM assistance level can be isolated from other factors; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Expert qualitative scoring reliably distinguishes surface fluency from deep coherence in student writing.
    Invoked when linking lower expert ratings to lack of genuine learning.
  • domain assumption Pre- and post-ChatGPT student compositions are comparable after controlling for other variables.
    Required for attributing changes to LLM generations.

pith-pipeline@v0.9.0 · 5516 in / 1254 out tokens · 31800 ms · 2026-05-10T09:50:15.039590+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

83 extracted references · 49 canonical work pages · 5 internal anchors
