Recognition: no theorem link
The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI
Pith reviewed 2026-05-14 19:52 UTC · model grok-4.3
The pith
Generative AI polishes L2 essays but erases writers' unique cultural voices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study identifies a dimensional divergence within the Semantic Preservation Paradox: GenAI models retain lexicogrammatical accuracy and propositional content, yet all tested models drastically reduce dialogic engagement markers, converting interactive discourse into monologic assertion. Epistemic stance markers, by contrast, vary with model architecture but still drift toward homogenized Anglo-American norms.
What carries the argument
Pragmatic flattening, the systematic erasure of culturally preferred politeness and authorial stance through AI polishing of L2 texts.
Load-bearing premise
The observed reductions in dialogic engagement and epistemic markers result mainly from the AI models' training on Anglo-American writing norms rather than from prompt design or the methods used to measure those markers.
What would settle it
Re-polish the same essays with explicit instructions to preserve the original dialogic and epistemic markers, then re-measure those features; if the reductions disappear, the claim that training-data norms drive the flattening is falsified.
Original abstract
The integration of Generative AI (GenAI) into language learning offers second language (L2) writers powerful tools for text optimization. However, pursuing native-like fluency often sacrifices sociopragmatic diversity. Investigating "pragmatic flattening" - the systematic erasure of culturally preferred politeness and authorial stance - this study conducts a comparative analysis of argumentative essays by Chinese B2-level university students from the ICNALE corpus. The original texts were polished via the APIs of four leading Large Language Models at a zero-temperature setting for reproducibility. Findings reveal a nuanced "dimensional divergence" within the Semantic Preservation Paradox. While models corrected lexicogrammatical errors and retained propositional meaning, sociopragmatic interventions were bifurcated. In the interactive dimension, all models showed a drastic collapse of dialogic engagement markers, turning negotiated discourse into monologic assertions. Conversely, in the epistemic stance dimension, models showed architecture-based variability: some aggressively scrubbed epistemic markers, while others reinforced tentative hedging as decontextualized algorithmic caution. This confirms that while GenAI enhances accuracy, it systematically overwrites L2 writers' unique rhetorical identities into a homogenized Anglo-American paradigm. We argue that future instruction must move beyond error correction, advocating for Critical AI Literacy to empower multilingual writers to use GenAI for linguistic enhancement while safeguarding sociopragmatic diversity and rhetorical agency.
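The polishing protocol described in the abstract (zero-temperature API calls for reproducibility) can be sketched as a simple request builder. This is a minimal illustration, not the authors' pipeline: the prompt wording, model name, and chat-style payload shape are assumptions; only the temperature-0 setting comes from the paper.

```python
# Hedged sketch of a zero-temperature "polishing" request.
# The model name and instruction text below are hypothetical;
# temperature=0 is the paper's stated choice for reproducibility.

def build_polish_request(essay: str, model: str = "example-llm") -> dict:
    """Assemble a chat-style API payload that polishes an L2 essay
    deterministically: at temperature 0 decoding is greedy, so the
    same input yields the same revision on repeated calls."""
    return {
        "model": model,
        "temperature": 0,  # greedy decoding for reproducibility
        "messages": [
            {"role": "system",
             "content": "Polish the following essay for grammar and fluency."},
            {"role": "user", "content": essay},
        ],
    }

req = build_polish_request("Some people think smoking should be banned...")
print(req["temperature"])  # 0
```

Fixing temperature at 0 is what licenses the paper's text-to-text comparisons: any difference between original and polished essays is attributable to the model, not to sampling noise.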
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines the effects of GenAI polishing on L2 argumentative essays by Chinese B2-level students from the ICNALE corpus. Using zero-temperature API calls to four leading LLMs, it compares original texts to AI-revised versions and reports a 'pragmatic flattening' effect: a sharp reduction in dialogic engagement markers (turning negotiated discourse monologic) alongside architecture-dependent shifts in epistemic stance markers. The central claim is that while lexicogrammatical accuracy improves and propositional content is largely preserved, sociopragmatic features are overwritten, erasing authorial voice and converging L2 writing toward a homogenized Anglo-American rhetorical paradigm. The authors advocate Critical AI Literacy to mitigate these effects.
Significance. If substantiated, the result would be significant for applied linguistics and AI-assisted writing pedagogy, documenting a concrete mechanism by which current LLMs can reduce rhetorical diversity in L2 output. The reproducible zero-temperature protocol and use of a public corpus are strengths that facilitate verification. However, the absence of a native-speaker baseline and of transparent coding procedures for the key markers leaves the directional claim (movement toward Anglo-American norms) unsupported, limiting immediate impact.
major comments (3)
- [Methods / Results] The comparative analysis described in the abstract and methods lacks any native English control corpus (e.g., ICNALE native subset or equivalent argumentative essays). Without quantifying the same engagement and epistemic markers in native texts, the observed reductions cannot be shown to constitute convergence on an 'Anglo-American paradigm' rather than generic model tendencies toward monologic assertiveness.
- [Methods] No coding scheme, annotation guidelines, inter-rater reliability statistics, or quantitative operationalization (e.g., normalized frequency counts, statistical tests) are reported for the dialogic engagement markers or epistemic stance features whose 'drastic collapse' and 'architecture-based variability' underpin the central claim.
- [Discussion] The interpretation that sociopragmatic interventions are driven by training on Anglo-American norms (rather than prompt design, temperature=0, or the marker-identification procedure itself) is asserted without a control condition that isolates these factors.
minor comments (2)
- [Abstract / Introduction] The term 'Semantic Preservation Paradox' is introduced without a formal definition or operational criteria distinguishing propositional from sociopragmatic preservation.
- [Results] Figure or table captions should explicitly state the exact marker inventories and normalization method used for the reported collapses.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major concern below and commit to substantial revisions to strengthen the methodological rigor and interpretive claims.
Point-by-point responses
Referee: [Methods / Results] The comparative analysis described in the abstract and methods lacks any native English control corpus (e.g., ICNALE native subset or equivalent argumentative essays). Without quantifying the same engagement and epistemic markers in native texts, the observed reductions cannot be shown to constitute convergence on an 'Anglo-American paradigm' rather than generic model tendencies toward monologic assertiveness.
Authors: We fully agree that a native-speaker baseline is essential to support the directional claim of convergence toward an Anglo-American rhetorical paradigm. In the revised version, we will incorporate the native subset of the ICNALE corpus and perform parallel quantitative analyses of dialogic engagement and epistemic stance markers. This will enable direct comparison showing whether AI-polished L2 texts align more closely with native norms than the originals. revision: yes
Referee: [Methods] No coding scheme, annotation guidelines, inter-rater reliability statistics, or quantitative operationalization (e.g., normalized frequency counts, statistical tests) are reported for the dialogic engagement markers or epistemic stance features whose 'drastic collapse' and 'architecture-based variability' underpin the central claim.
Authors: The referee correctly identifies a significant omission in the methods section. The markers were drawn from established frameworks (Hyland's engagement model and Biber's stance taxonomy), with frequencies normalized per 1,000 words and significance tested via paired t-tests. We will add a new subsection detailing the full coding scheme, annotation guidelines with examples from the corpus, inter-rater reliability (targeting kappa > 0.80), and the exact statistical procedures used. revision: yes
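The two quantitative commitments in this response, per-1,000-word normalization and inter-rater reliability via Cohen's kappa, can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the marker count, word total, and annotator label sequences below are invented.

```python
# Hedged sketch of two measures promised in the rebuttal:
# (1) marker frequency normalized per 1,000 words,
# (2) Cohen's kappa for two annotators' binary marker labels.

def per_thousand(marker_count: int, total_words: int) -> float:
    """Normalize a raw marker count to a per-1,000-word rate."""
    return marker_count / total_words * 1000

def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pe = sum(  # chance agreement from each annotator's marginals
        (a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b)
    )
    return (po - pe) / (1 - pe)

rate = per_thousand(12, 480)  # e.g. 12 engagement markers in a 480-word essay
kappa = cohens_kappa([1, 1, 0, 0, 1, 0, 1, 1],
                     [1, 0, 0, 0, 1, 0, 1, 1])
print(rate, kappa)  # 25.0 per 1,000 words; kappa = 0.75
```

Note that a kappa of 0.75, as in this toy example, would fall short of the > 0.80 threshold the authors target, illustrating why the revised manuscript needs to report the statistic rather than merely name it.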
Referee: [Discussion] The interpretation that sociopragmatic interventions are driven by training on Anglo-American norms (rather than prompt design, temperature=0, or the marker-identification procedure itself) is asserted without a control condition that isolates these factors.
Authors: We acknowledge that the causal attribution to training data norms is interpretive and would benefit from stronger controls. While the zero-temperature setting and consistent effects across four models reduce the likelihood of prompt-specific artifacts, we will revise the discussion to explicitly discuss these potential confounds and outline planned follow-up experiments (e.g., with culturally neutral prompts or alternative temperatures) to isolate the role of pre-training data. revision: partial
Circularity Check
Empirical marker comparison with no derivation chain
Full rationale
The paper is an empirical study that compares dialogic engagement and epistemic stance markers in original ICNALE Chinese B2 essays versus the same texts after zero-temperature polishing by four external LLM APIs. No equations, fitted parameters, or self-referential definitions appear in the provided text; the central observations (collapse of dialogic markers, architecture-dependent epistemic shifts) are reported as direct measurements rather than quantities derived from prior outputs of the same study. The interpretation of movement toward a homogenized Anglo-American paradigm rests on external corpus data and third-party model behavior, not on any self-citation chain or ansatz that reduces the result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: the ICNALE corpus essays are representative of typical B2-level Chinese L2 argumentative writing.
invented entities (2)
- pragmatic flattening: no independent evidence
- Semantic Preservation Paradox: no independent evidence
Reference graph
Works this paper leans on
- [1] https://doi.org/10.24093/awej/vol16no3.14 Barattieri di San Pietro, C., et al. (2025). How inclusive large language models can be? The curious case of pragmatics. Frontiers in Education. https://doi.org/10.3389/feduc.2025.1619662 Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press. Chen, Y., et al...
- [2] https://doi.org/10.1177/00336882231168504 Holliday, A. (2006). Native-speakerism. ELT Journal, 60(4), 385–
- [3] https://doi.org/10.1093/elt/ccl030 Hyland, K. (2005). Metadiscourse: Exploring interaction in writing. Continuum. Ishikawa, S. (2013). The ICNALE and sophisticated contrastive interlanguage analysis of Asian learners of English. In S. Ishikawa (Ed.), Learner corpus studies in Asia and the world (Vol. 1, pp. 91–118). Kobe University. Jiang, L., & Gu, M. M....
- [4] https://doi.org/10.1002/tesq.3138 Kecskés, I., & Dinh, H. (2025). ChatGPT for intercultural pragmatic learning? Potentially, but not yet. Intercultural Pragmatics, 22(2), 369–
- [5] https://doi.org/10.1515/ip-2025-2008 Kurt, G., & Kurt, Y. (2024). Enhancing L2 writing skills: ChatGPT as an automated feedback tool. Journal of Information Technology Education: Research, 23,
- [6] https://doi.org/10.28945/5370 Li, M., et al. (2025). The role of generative AI and hybrid feedback in improving L2 writing skills: A comparative study. Innovation in Language Learning and Teaching. https://doi.org/10.1080/17501229.2025.2503890 Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scor...
- [7] https://doi.org/10.18095/meeso.2025.26.1.352 Saldaña, J. (2021). The coding manual for qualitative researchers (4th ed.). SAGE Publications. Strobl, C., et al. (2025). Collaborative writing based on generative AI models: Revision and deliberation processes in German as a foreign language. Journal of Second Language Writing, 67, 101185. https://doi.org/10....