pith. machine review for the scientific record.

arxiv: 2605.13055 · v1 · submitted 2026-05-13 · 💻 cs.CL · cs.CY

Recognition: no theorem link

The Cost of Perfect English: Pragmatic Flattening and the Erasure of Authorial Voice in L2 Writing Supported by GenAI

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:52 UTC · model grok-4.3

classification 💻 cs.CL cs.CY
keywords pragmatic flattening · L2 writing · generative AI · authorial voice · sociopragmatic diversity · dialogic engagement · epistemic stance · rhetorical agency
0 comments

The pith

Generative AI polishes L2 essays but erases writers' unique cultural voices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how generative AI tools used to refine second-language writing systematically reduce cultural and personal rhetorical features. By comparing original Chinese student essays with versions polished by four major AI models, it finds that while grammar and meaning are preserved, elements like dialogic engagement disappear, turning interactive arguments into direct statements. Epistemic markers vary by model but often get flattened into a uniform cautious style. This suggests AI optimization favors a standardized English academic norm over diverse authorial identities. The authors call for teaching that helps writers preserve their rhetorical agency when using these tools.

Core claim

The study reveals a dimensional divergence within the Semantic Preservation Paradox: GenAI models retain lexicogrammatical accuracy and propositional content, but all tested models drastically collapse dialogic engagement markers, converting interactive discourse into monologic assertions. Epistemic stance markers, by contrast, vary by model architecture yet still contribute to homogenization toward Anglo-American norms.

What carries the argument

Pragmatic flattening, the systematic erasure of culturally preferred politeness and authorial stance through AI polishing of L2 texts.

Load-bearing premise

The observed reductions in dialogic engagement and epistemic markers result mainly from the AI models' training on Anglo-American writing norms rather than from prompt design or the methods used to measure those markers.

What would settle it

Re-polishing the same essays with explicit instructions to preserve original dialogic and epistemic markers, then re-measuring those features, would falsify the claim if the reductions disappear.
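This falsification test can be sketched as two prompt conditions run against the same essays. The prompt wording, model id, and payload shape below are illustrative assumptions (a generic chat-style request), not the paper's actual protocol; only the zero-temperature setting is taken from the study.

```python
# Hypothetical sketch of the two polishing conditions. Prompt wording and
# the "example-llm" model id are illustrative, not from the paper.

BASELINE_PROMPT = (
    "Polish the following L2 English essay for grammar and fluency. "
    "Return only the revised text.\n\n{essay}"
)

PRESERVING_PROMPT = (
    "Polish the following L2 English essay for grammar and fluency, "
    "but keep every dialogic engagement marker (questions to the reader, "
    "reader pronouns, directives) and every epistemic stance marker "
    "(hedges such as 'maybe' or 'I think', boosters such as 'certainly') "
    "exactly where the author placed them. Return only the revised text.\n\n{essay}"
)

def build_request(prompt_template: str, essay: str) -> dict:
    """Assemble a chat-style request payload; temperature=0 mirrors the
    paper's reproducibility setting."""
    return {
        "model": "example-llm",  # placeholder model id
        "temperature": 0,        # deterministic decoding, as in the study
        "messages": [
            {"role": "user", "content": prompt_template.format(essay=essay)}
        ],
    }
```

If marker reductions persist under the preserving condition, prompt design is ruled out as the cause; if they vanish, the training-norms explanation weakens.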

read the original abstract

The integration of Generative AI (GenAI) into language learning offers second language (L2) writers powerful tools for text optimization. However, pursuing native-like fluency often sacrifices sociopragmatic diversity. Investigating "pragmatic flattening" - the systematic erasure of culturally preferred politeness and authorial stance - this study conducts a comparative analysis of argumentative essays by Chinese B2-level university students from the ICNALE corpus. The original texts were polished via the APIs of four leading Large Language Models at a zero-temperature setting for reproducibility. Findings reveal a nuanced "dimensional divergence" within the Semantic Preservation Paradox. While models corrected lexicogrammatical errors and retained propositional meaning, sociopragmatic interventions were bifurcated. In the interactive dimension, all models showed a drastic collapse of dialogic engagement markers, turning negotiated discourse into monologic assertions. Conversely, in the epistemic stance dimension, models showed architecture-based variability: some aggressively scrubbed epistemic markers, while others reinforced tentative hedging as decontextualized algorithmic caution. This confirms that while GenAI enhances accuracy, it systematically overwrites L2 writers' unique rhetorical identities into a homogenized Anglo-American paradigm. We argue that future instruction must move beyond error correction, advocating for Critical AI Literacy to empower multilingual writers to use GenAI for linguistic enhancement while safeguarding sociopragmatic diversity and rhetorical agency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper examines the effects of GenAI polishing on L2 argumentative essays by Chinese B2-level students from the ICNALE corpus. Using zero-temperature API calls to four leading LLMs, it compares original texts to AI-revised versions and reports a 'pragmatic flattening' effect: a sharp reduction in dialogic engagement markers (turning negotiated discourse monologic) alongside architecture-dependent shifts in epistemic stance markers. The central claim is that while lexicogrammatical accuracy improves and propositional content is largely preserved, sociopragmatic features are overwritten, erasing authorial voice and converging L2 writing toward a homogenized Anglo-American rhetorical paradigm. The authors advocate Critical AI Literacy to mitigate these effects.

Significance. If substantiated, the result would be significant for applied linguistics and AI-assisted writing pedagogy, documenting a concrete mechanism by which current LLMs can reduce rhetorical diversity in L2 output. The reproducible zero-temperature protocol and use of a public corpus are strengths that facilitate verification. However, the absence of a native-speaker baseline and of transparent coding procedures for the key markers leaves the directional claim (movement toward Anglo-American norms) unsupported, limiting immediate impact.

major comments (3)
  1. [Methods / Results] The comparative analysis described in the abstract and methods lacks any native English control corpus (e.g., ICNALE native subset or equivalent argumentative essays). Without quantifying the same engagement and epistemic markers in native texts, the observed reductions cannot be shown to constitute convergence on an 'Anglo-American paradigm' rather than generic model tendencies toward monologic assertiveness.
  2. [Methods] No coding scheme, annotation guidelines, inter-rater reliability statistics, or quantitative operationalization (e.g., normalized frequency counts, statistical tests) are reported for the dialogic engagement markers or epistemic stance features whose 'drastic collapse' and 'architecture-based variability' underpin the central claim.
  3. [Discussion] The interpretation that sociopragmatic interventions are driven by training on Anglo-American norms (rather than prompt design, temperature=0, or the marker-identification procedure itself) is asserted without a control condition that isolates these factors.
minor comments (2)
  1. [Abstract / Introduction] The term 'Semantic Preservation Paradox' is introduced without a formal definition or operational criteria distinguishing propositional from sociopragmatic preservation.
  2. [Results] Figure or table captions should explicitly state the exact marker inventories and normalization method used for the reported collapses.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major concern below and commit to substantial revisions to strengthen the methodological rigor and interpretive claims.

read point-by-point responses
  1. Referee: [Methods / Results] The comparative analysis described in the abstract and methods lacks any native English control corpus (e.g., ICNALE native subset or equivalent argumentative essays). Without quantifying the same engagement and epistemic markers in native texts, the observed reductions cannot be shown to constitute convergence on an 'Anglo-American paradigm' rather than generic model tendencies toward monologic assertiveness.

    Authors: We fully agree that a native-speaker baseline is essential to support the directional claim of convergence toward an Anglo-American rhetorical paradigm. In the revised version, we will incorporate the native subset of the ICNALE corpus and perform parallel quantitative analyses of dialogic engagement and epistemic stance markers. This will enable direct comparison showing whether AI-polished L2 texts align more closely with native norms than the originals. revision: yes

  2. Referee: [Methods] No coding scheme, annotation guidelines, inter-rater reliability statistics, or quantitative operationalization (e.g., normalized frequency counts, statistical tests) are reported for the dialogic engagement markers or epistemic stance features whose 'drastic collapse' and 'architecture-based variability' underpin the central claim.

    Authors: The referee correctly identifies a significant omission in the methods section. The markers were drawn from established frameworks (Hyland's engagement model and Biber's stance taxonomy), with frequencies normalized per 1,000 words and significance tested via paired t-tests. We will add a new subsection detailing the full coding scheme, annotation guidelines with examples from the corpus, inter-rater reliability (targeting kappa > 0.80), and the exact statistical procedures used. revision: yes

  3. Referee: [Discussion] The interpretation that sociopragmatic interventions are driven by training on Anglo-American norms (rather than prompt design, temperature=0, or the marker-identification procedure itself) is asserted without a control condition that isolates these factors.

    Authors: We acknowledge that the causal attribution to training data norms is interpretive and would benefit from stronger controls. While the zero-temperature setting and consistent effects across four models reduce the likelihood of prompt-specific artifacts, we will revise the discussion to explicitly discuss these potential confounds and outline planned follow-up experiments (e.g., with culturally neutral prompts or alternative temperatures) to isolate the role of pre-training data. revision: partial
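The quantitative pipeline the rebuttal commits to (marker frequencies normalized per 1,000 words, compared pre- versus post-polishing with a paired t-test) can be sketched as follows. The marker inventory and tokenizer here are toy stand-ins, not the paper's actual coding scheme drawn from Hyland's and Biber's frameworks.

```python
# Minimal sketch: normalize marker counts per 1,000 tokens, then compute a
# paired t statistic over per-essay before/after differences.
import math
import re
import statistics

# Toy inventory standing in for Hyland-style engagement markers.
ENGAGEMENT_MARKERS = {"you", "your", "consider", "note", "?"}

def per_thousand(text: str, markers: set) -> float:
    """Marker frequency normalized per 1,000 tokens."""
    tokens = re.findall(r"\w+|\?", text.lower())
    hits = sum(1 for t in tokens if t in markers)
    return 1000.0 * hits / max(len(tokens), 1)

def paired_t(before, after) -> float:
    """Paired t statistic on per-essay differences (H0: no change)."""
    diffs = [b - a for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
```

A positive t on the engagement dimension would quantify the "drastic collapse" the paper reports; in practice the p-value and effect size would be computed per model and per marker class.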

Circularity Check

0 steps flagged

Empirical marker comparison with no derivation chain

full rationale

The paper is an empirical study that compares dialogic engagement and epistemic stance markers in original ICNALE Chinese B2 essays versus the same texts after zero-temperature polishing by four external LLM APIs. No equations, fitted parameters, or self-referential definitions appear in the provided text; the central observations (collapse of dialogic markers, architecture-dependent epistemic shifts) are reported as direct measurements rather than quantities derived from prior outputs of the same study. The interpretation of movement toward a homogenized Anglo-American paradigm rests on external corpus data and third-party model behavior, not on any self-citation chain or ansatz that reduces the result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The paper relies on standard assumptions about corpus representativeness and introduces new conceptual terms without external validation.

axioms (1)
  • domain assumption The ICNALE corpus essays are representative of typical B2-level Chinese L2 argumentative writing.
    The original texts are drawn from this corpus for the before-after comparison.
invented entities (2)
  • pragmatic flattening no independent evidence
    purpose: To name the systematic removal of culturally preferred politeness and authorial stance by GenAI.
    New term introduced to describe the observed phenomenon.
  • Semantic Preservation Paradox no independent evidence
    purpose: To frame the split between preserved propositional content and altered sociopragmatic features.
    New framing term for the nuanced findings.

pith-pipeline@v0.9.0 · 5548 in / 1353 out tokens · 58277 ms · 2026-05-14T19:52:35.707570+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages

  1. [1] https://doi.org/10.24093/awej/vol16no3.14
     Barattieri di San Pietro, C., et al. (2025). How inclusive large language models can be? The curious case of pragmatics. Frontiers in Education. https://doi.org/10.3389/feduc.2025.1619662
     Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge University Press.
     Chen, Y., et al...

  2. [2] https://doi.org/10.1177/00336882231168504
     Holliday, A. (2006). Native-speakerism. ELT Journal, 60(4), 385–

  3. [3] https://doi.org/10.1093/elt/ccl030
     Hyland, K. (2005). Metadiscourse: Exploring interaction in writing. Continuum.
     Ishikawa, S. (2013). The ICNALE and sophisticated contrastive interlanguage analysis of Asian learners of English. In S. Ishikawa (Ed.), Learner corpus studies in Asia and the world (Vol. 1, pp. 91–118). Kobe University.
     Jiang, L., & Gu, M. M....

  4. [4] https://doi.org/10.1002/tesq.3138
     Kecskés, I., & Dinh, H. (2025). ChatGPT for intercultural pragmatic learning? Potentially, but not yet. Intercultural Pragmatics, 22(2), 369–

  5. [5] https://doi.org/10.1515/ip-2025-2008
     Kurt, G., & Kurt, Y. (2024). Enhancing L2 writing skills: ChatGPT as an automated feedback tool. Journal of Information Technology Education: Research, 23,

  6. [6] https://doi.org/10.28945/5370
     Li, M., et al. (2025). The role of generative AI and hybrid feedback in improving L2 writing skills: A comparative study. Innovation in Language Learning and Teaching. https://doi.org/10.1080/17501229.2025.2503890
     Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scor...

  7. [7] https://doi.org/10.18095/meeso.2025.26.1.352
     Saldaña, J. (2021). The coding manual for qualitative researchers (4th ed.). SAGE Publications.
     Strobl, C., et al. (2025). Collaborative writing based on generative AI models: Revision and deliberation processes in German as a foreign language. Journal of Second Language Writing, 67, 101185. https://doi.org/10....