pith. machine review for the scientific record.

arxiv: 2604.23627 · v1 · submitted 2026-04-26 · 💻 cs.CL · cs.LG

Recognition: unknown

Neural Grammatical Error Correction for Romanian

Mihai Dascalu, Stefan Ruseti, Teodor-Mihai Cotet

Pith reviewed 2026-05-08 06:08 UTC · model grok-4.3

classification 💻 cs.CL · cs.LG

keywords grammatical error correction · Romanian · neural models · transformer · pretraining · artificial data generation · low-resource · POS tagger

The pith

Pretraining a larger Transformer on artificial errors, then finetuning on a new 10k Romanian GEC corpus, reaches an F0.5 of 53.76.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the first grammatical error correction corpus for Romanian consisting of 10,000 sentence pairs and adapts the German ERRANT scorer for Romanian to evaluate edits. Experiments show a baseline small Transformer trained only on this corpus achieves an F0.5 score of 44.38, while pretraining a larger Transformer on errors artificially generated with a POS tagger and then finetuning on the real data improves the score to 53.76. The authors present the artificial data generation method as easily extensible to other languages because it requires only a POS tagger. This matters in low-resource settings where advanced GEC tools beyond basic spellcheckers have been unavailable for Romanian and similar languages.

Core claim

By constructing a 10k-pair GEC dataset for Romanian and adapting an error annotation toolkit, the paper shows that pretraining a larger Transformer model on artificially generated errors created using only a POS tagger, followed by finetuning on the actual corpus, produces the best model with an F0.5 score of 53.76, outperforming direct training of a smaller model solely on the human-annotated data.

What carries the argument

The two-stage process of pretraining a larger Transformer on synthetic grammatical errors generated via a POS tagger before finetuning on the real Romanian GEC corpus.

If this is right

  • The method for generating additional training examples applies to any language, since it requires only a POS tagger.
  • Pretraining strategies are effective for neural GEC in low-resource settings.
  • Larger Transformer models gain more from the artificial pretraining step than smaller models trained directly on the limited real data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same POS-based synthetic pretraining could be applied to GEC for other languages that have a tagger but lack large annotated corpora.
  • If performance gains hold across languages, simple rule-based error simulation may reduce reliance on expensive human annotation for GEC.
  • One could measure how much the improvement depends on the size gap between the synthetic pretraining set and the 10k real pairs.

Load-bearing premise

Errors artificially generated with only a POS tagger are representative enough of real Romanian grammatical mistakes for the pretraining to transfer usefully when finetuning on the human-annotated corpus.
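This premise can be made concrete with a toy corruption routine. The sketch below is illustrative only: the specific rules, probabilities, POS tags, and the fake "-ul" inflection are hypothetical stand-ins, not the paper's actual generation procedure.

```python
import random

def corrupt(tagged_sentence, p=0.15, rng=None):
    """Inject synthetic errors into a POS-tagged sentence.

    tagged_sentence: list of (token, pos_tag) pairs.
    Returns the corrupted token list, i.e. the "source" side of a
    synthetic GEC training pair (the clean tokens are the "target").
    """
    rng = rng or random.Random(0)
    out = []
    for token, pos in tagged_sentence:
        r = rng.random()
        if r < p and pos == "DET":
            continue                                    # drop an article
        if r < p and pos == "NOUN":
            out.append(token + "ul")                    # fake agreement/inflection slip
            continue
        if r < p and pos == "ADP":
            out.append(rng.choice(["de", "la", "pe"]))  # swap a preposition
            continue
        out.append(token)                               # leave the token intact
    return out

clean = [("pisica", "NOUN"), ("doarme", "VERB"), ("pe", "ADP"), ("covor", "NOUN")]
noisy = corrupt(clean, p=0.5)
```

Pretraining pairs are then (noisy, clean); with p = 0 the routine is the identity, a handy sanity check. The load-bearing question is whether any rule set this simple matches the distribution of real Romanian errors.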

What would settle it

Evaluating the best pretrained model on a new collection of real human-annotated Romanian sentences and finding its F0.5 score no higher than the 44.38 baseline would indicate the artificial data did not deliver transferable gains.

read the original abstract

Resources for Grammatical Error Correction (GEC) in non-English languages are scarce, while available spellcheckers in these languages are mostly limited to simple corrections and rules. In this paper we introduce a first GEC corpus for Romanian consisting of 10k pairs of sentences. In addition, the German version of ERRANT (ERRor ANnotation Toolkit) scorer was adapted for Romanian to analyze this corpus and extract edits needed for evaluation. Multiple neural models were experimented, together with pretraining strategies, which proved effective for GEC in low-resource settings. Our baseline consists of a small Transformer model trained only on the GEC dataset (F0.5 of 44.38), whereas the best performing model is produced by pretraining a larger Transformer model on artificially generated data, followed by finetuning on the actual corpus (F0.5 of 53.76). The proposed method for generating additional training examples is easily extensible and can be applied to any language, as it requires only a POS tagger.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the first Romanian GEC corpus of 10k sentence pairs, adapts the ERRANT scorer for Romanian error analysis, and evaluates neural Transformer models. The baseline (small Transformer trained only on the corpus) achieves F0.5 = 44.38; the best result (F0.5 = 53.76) is obtained by pretraining a larger Transformer on synthetic errors generated via a POS tagger and then fine-tuning on the real corpus. The synthetic-generation procedure is presented as language-agnostic and requiring only a POS tagger.

Significance. If the central result holds, the work supplies the first public Romanian GEC resource and demonstrates that POS-based synthetic pretraining can improve low-resource GEC performance. The extensible generation method and ERRANT adaptation are practical contributions that could be reused for other under-resourced languages.

major comments (2)
  1. [Synthetic data generation] Method section: the description states that errors are inserted using only POS tags, yet provides no explicit rules, examples, or frequency tables for Romanian-specific phenomena (definite-article placement, case marking, gender/number agreement). Without a quantitative comparison of ERRANT error-type distributions between the synthetic corpus and the 10k human-annotated pairs, the 9.38-point F0.5 lift cannot be confidently attributed to targeted pretraining rather than model scale or generic noise.
  2. [Results and evaluation] Experimental section: F0.5 scores are reported for single runs without train/test split sizes, hyper-parameter search details, or statistical significance tests. On a 10k-pair corpus, variance across seeds or bootstrap confidence intervals is needed to establish that the two-stage procedure reliably outperforms the baseline.
minor comments (2)
  1. [Abstract] 'the German version of ERRANT' should be clarified as the adapted Romanian version used for both corpus analysis and final scoring.
  2. [Evaluation metrics] Notation: the paper uses F0.5 without defining the beta value or confirming it matches the standard GEC convention (beta = 0.5).
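For reference, the standard GEC convention sets beta = 0.5, which weights precision twice as much as recall over the extracted edits. A minimal sketch of the metric over ERRANT-style edit counts (the example numbers are invented for illustration):

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F_beta over edit counts; beta = 0.5 is the standard GEC setting,
    weighting precision twice as much as recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# e.g. 60 correct edits, 40 spurious, 50 missed:
# precision = 0.60, recall ~ 0.55, F0.5 ~ 0.588
score = f_beta(60, 40, 50)
```

Because precision dominates, a conservative system that proposes few but accurate edits scores higher under F0.5 than under F1.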

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work introducing the first Romanian GEC corpus and POS-based synthetic pretraining. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and analysis.

read point-by-point responses
  1. Referee: [Synthetic data generation] Method section: the description states that errors are inserted using only POS tags, yet provides no explicit rules, examples, or frequency tables for Romanian-specific phenomena (definite-article placement, case marking, gender/number agreement). Without a quantitative comparison of ERRANT error-type distributions between the synthetic corpus and the 10k human-annotated pairs, the 9.38-point F0.5 lift cannot be confidently attributed to targeted pretraining rather than model scale or generic noise.

    Authors: We agree that the method section would benefit from greater explicitness. Although the generation procedure is intentionally language-agnostic and relies solely on POS tags to insert plausible errors, we will expand it with concrete Romanian examples (e.g., definite-article insertion/deletion, gender/number agreement violations, and case-marking errors) together with the frequency tables used to sample error types. In addition, we will compute and report the ERRANT error-type distributions for both the synthetic pretraining corpus and the 10k human-annotated pairs; this comparison will allow readers to assess how closely the synthetic data mirrors real error patterns and will help attribute the observed F0.5 improvement more confidently to the pretraining strategy. revision: yes

  2. Referee: [Results and evaluation] Experimental section: F0.5 scores are reported for single runs without train/test split sizes, hyper-parameter search details, or statistical significance tests. On a 10k-pair corpus, variance across seeds or bootstrap confidence intervals is needed to establish that the two-stage procedure reliably outperforms the baseline.

    Authors: We acknowledge that single-run reporting limits the strength of the claims. In the revised manuscript we will explicitly state the train/test split sizes, document the hyper-parameter search procedure (including the ranges explored and the final selected values), and report F0.5 scores averaged over at least five random seeds together with standard deviations. Where appropriate we will also include bootstrap confidence intervals or paired statistical significance tests to demonstrate that the two-stage pretraining procedure reliably outperforms the baseline. revision: yes
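The resampling the response commits to can be sketched directly. This assumes per-sentence (tp, fp, fn) edit counts are available from the adapted ERRANT scorer (an assumption; the authors' evaluation code is not shown); a percentile bootstrap resamples sentences with replacement and recomputes corpus-level F0.5 on each resample, since F0.5 aggregates counts at the corpus level rather than averaging per-sentence scores.

```python
import random

def f05(tp, fp, fn):
    # Corpus-level F0.5 from aggregated edit counts.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 1.25 * p * r / (0.25 * p + r) if p + r else 0.0

def bootstrap_ci(counts, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for corpus-level F0.5.

    counts: list of per-sentence (tp, fp, fn) edit counts.
    """
    rng = random.Random(seed)
    n = len(counts)
    scores = []
    for _ in range(n_resamples):
        tp = fp = fn = 0
        for _ in range(n):
            t, f, m = counts[rng.randrange(n)]   # resample sentences with replacement
            tp += t; fp += f; fn += m
        scores.append(f05(tp, fp, fn))
    scores.sort()
    lo = scores[int(alpha / 2 * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Non-overlapping intervals for the baseline and the pretrained model would support the claimed improvement; a paired test over the same resamples would be stronger still.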

Circularity Check

0 steps flagged

No circularity: standard supervised GEC training and evaluation on held-out data

full rationale

The paper introduces a 10k-sentence Romanian GEC corpus, adapts ERRANT for evaluation, and reports F0.5 scores from Transformer models trained on the corpus (baseline 44.38) or pretrained on synthetic data then finetuned (53.76). These metrics are produced by conventional supervised training and held-out evaluation; no equation, parameter fit, or self-citation reduces the reported scores to the inputs by construction. The synthetic-data generation method is presented as an extensible heuristic requiring only a POS tagger, but it is not used to define or tautologically guarantee the evaluation outcome. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

This is an applied empirical NLP paper. It introduces a new dataset and applies standard sequence-to-sequence neural methods. No new mathematical axioms, physical constants, or invented entities are postulated. Free parameters consist of ordinary training choices and the details of how synthetic errors are generated from POS tags.

free parameters (2)
  • Transformer model size and architecture hyperparameters
    Chosen for the baseline and larger model experiments; values not specified in abstract.
  • Synthetic error generation rules and volume
    Parameters controlling how artificial errors are created from POS tags and how much synthetic data is used before fine-tuning.
axioms (1)
  • domain assumption: Standard Transformer encoder-decoder architecture is suitable for grammatical error correction as a sequence transduction task
    Invoked implicitly by choosing Transformer models for the GEC task.

pith-pipeline@v0.9.0 · 5476 in / 1550 out tokens · 88706 ms · 2026-05-08T06:08:58.525405+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references · 12 canonical work pages · 2 internal anchors

  1. [1]

    The conll-2014 shared task on grammatical error correction,

    H. T. Ng, S. M. Wu, T. Briscoe, C. Hadiwinoto, R. H. Susanto, and C. Bryant, “The conll-2014 shared task on grammatical error correction,” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–14, 2014

  2. [2]

    Jfleg: A fluency corpus and benchmark for grammatical error correction,

    C. Napoles, K. Sakaguchi, and J. Tetreault, “Jfleg: A fluency corpus and benchmark for grammatical error correction,” arXiv preprint arXiv:1702.04066, 2017

  3. [3]

    Automatic extraction of learner errors in esl sentences using linguistically enhanced alignments,

    M. Felice, C. Bryant, and E. Briscoe, “Automatic extraction of learner errors in esl sentences using linguistically enhanced alignments,” Association for Computational Linguistics, 2016

  4. [4]

    Automatic annotation and evaluation of error types for grammatical error correction,

    C. Bryant, M. Felice, and E. Briscoe, “Automatic annotation and evaluation of error types for grammatical error correction,” Association for Computational Linguistics, 2017

  5. [5]

    Large scale arabic error annotation: Guidelines and framework,

    W. Zaghouani, B. Mohit, N. Habash, O. Obeid, N. Tomeh, A. Rozovskaya, N. Farra, S. Alkuhlani, and K. Oflazer, “Large scale arabic error annotation: Guidelines and framework,” 2014

  6. [6]

    Using wikipedia edits in low resource grammatical error correction,

    A. Boyd, “Using wikipedia edits in low resource grammatical error correction,” in Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pp. 79–84, 2018

  7. [7]

    Grammar error correction in morphologically rich languages: The case of russian,

    A. Rozovskaya and D. Roth, “Grammar error correction in morphologically rich languages: The case of russian,” Transactions of the Association for Computational Linguistics, vol. 7, pp. 1–17, 2019

  8. [8]

    Natural language correction,

    J. Náplava, “Natural language correction,” 2017

  9. [9]

    Overview of grammatical error diagnosis for learning chinese as a foreign language,

    L.-C. Yu, L.-H. Lee, and L.-P. Chang, “Overview of grammatical error diagnosis for learning chinese as a foreign language,” in Proceedings of the 1st Workshop on Natural Language Processing Techniques for Educational Applications, pp. 42–47, 2014

  10. [10]

    Mining revision log of language learning sns for automated japanese error correction of second language learners,

    T. Mizumoto, M. Komachi, M. Nagata, and Y. Matsumoto, “Mining revision log of language learning sns for automated japanese error correction of second language learners,” in Proceedings of 5th International Joint Conference on Natural Language Processing, pp. 147–155, 2011

  11. [11]

    Grammatical error correction in low-resource scenarios,

    J. Náplava and M. Straka, “Grammatical error correction in low-resource scenarios,” arXiv preprint arXiv:1910.00353, 2019

  12. [12]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, pp. 5998–6008, 2017

  13. [13]

    Neural grammatical error correction systems with unsupervised pre-training on synthetic data,

    R. Grundkiewicz, M. Junczys-Dowmunt, and K. Heafield, “Neural grammatical error correction systems with unsupervised pre-training on synthetic data,” in Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 252–263, 2019

  14. [14]

    The bea-2019 shared task on grammatical error correction,

    C. Bryant, M. Felice, Ø. E. Andersen, and T. Briscoe, “The bea-2019 shared task on grammatical error correction,” in Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 52–75, 2019

  15. [15]

    Phrase-based machine translation is state-of-the-art for automatic grammatical error correction,

    M. Junczys-Dowmunt and R. Grundkiewicz, “Phrase-based machine translation is state-of-the-art for automatic grammatical error correction,” arXiv preprint arXiv:1605.06353, 2016

  16. [16]

    Neural language correction with character-based attention,

    Z. Xie, A. Avati, N. Arivazhagan, D. Jurafsky, and A. Y. Ng, “Neural language correction with character-based attention,” arXiv preprint arXiv:1603.09727, 2016

  17. [17]

    Approaching neural grammatical error correction as a low-resource machine translation task,

    M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, and K. Heafield, “Approaching neural grammatical error correction as a low-resource machine translation task,” arXiv preprint arXiv:1804.05940, 2018

  18. [18]

    Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data,

    W. Zhao, L. Wang, K. Shen, R. Jia, and J. Liu, “Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data,” arXiv preprint arXiv:1903.00138, 2019

  19. [19]

    An empirical study of incorporating pseudo data into grammatical error correction,

    S. Kiyono, J. Suzuki, M. Mita, T. Mizumoto, and K. Inui, “An empirical study of incorporating pseudo data into grammatical error correction,” arXiv preprint arXiv:1909.00502, 2019

  20. [20]

    Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction,

    M. Kaneko, M. Mita, S. Kiyono, J. Suzuki, and K. Inui, “Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction,” arXiv preprint arXiv:2005.00987, 2020

  21. [21]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018

  22. [22]

    Language model based grammatical error correction without annotated training data,

    C. Bryant and T. Briscoe, “Language model based grammatical error correction without annotated training data,” in Proceedings of the thirteenth workshop on innovative use of NLP for building educational applications, pp. 247–253, 2018

  23. [23]

    Corpora generation for grammatical error correction,

    J. Lichtarge, C. Alberti, S. Kumar, N. Shazeer, N. Parmar, and S. Tong, “Corpora generation for grammatical error correction,” arXiv preprint arXiv:1904.05780, 2019

  24. [24]

    Constrained grammatical error correction using statistical machine translation,

    Z. Yuan and M. Felice, “Constrained grammatical error correction using statistical machine translation,” in Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pp. 52–61, 2013

  25. [25]

    Readerbench goes online: A comprehension-centered framework for educational purposes,

    G. Gutu, M. Dascalu, S. Trausan-Matu, and P. Dessus, “Readerbench goes online: A comprehension-centered framework for educational purposes,” in International Conference on Human-Computer Interaction (RoCHI 2016) (A. Iftene and J. Vanderdonckt, eds.), pp. 95–102, MATRIX ROM, 2016

  26. [26]

    Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

    Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016

  27. [27]

    Scalable modified kneser-ney language model estimation,

    K. Heafield, I. Pouzyrevsky, J. H. Clark, and P. Koehn, “Scalable modified kneser-ney language model estimation,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 690–696, 2013

  28. [28]

    BLEU: a method for automatic evaluation of machine translation,

    K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th annual meeting on association for computational linguistics, pp. 311–318, Association for Computational Linguistics, 2002

  29. [29]

    Incorporating bert into neural machine translation,

    J. Zhu, Y. Xia, L. Wu, D. He, T. Qin, W. Zhou, H. Li, and T.-Y. Liu, “Incorporating bert into neural machine translation,” arXiv preprint arXiv:2002.06823, 2020

  30. [30]

    Diac+: A professional diacritics recovering system,

    D. Tufiș, A. Ceaușu, et al., “Diac+: A professional diacritics recovering system,” Proceedings of LREC 2008, pp. 167–174, 2008