Neural Grammatical Error Correction for Romanian
Pith reviewed 2026-05-08 06:08 UTC · model grok-4.3
The pith
Pretraining a larger Transformer on artificial errors, then finetuning on a new 10k-pair Romanian GEC corpus, reaches an F0.5 of 53.76.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing a 10k-pair GEC dataset for Romanian and adapting an error annotation toolkit, the paper shows that pretraining a larger Transformer on artificially generated errors, created using only a POS tagger, and then finetuning on the real corpus produces the best model, with an F0.5 of 53.76, outperforming a smaller model trained directly on the human-annotated data alone.
What carries the argument
The two-stage process of pretraining a larger Transformer on synthetic grammatical errors generated via a POS tagger before finetuning on the real Romanian GEC corpus.
If this is right
- The method for generating additional training examples extends to any language, since it requires only a POS tagger.
- Pretraining strategies are effective for neural GEC in low-resource settings.
- Larger Transformer models gain more from the artificial pretraining step than smaller models trained directly on the limited real data.
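The POS-tagger-only generation idea in the bullets above can be sketched concretely. The corruption operations below (determiner deletion, preposition duplication, adjacent-word swap) are hypothetical stand-ins, not the paper's actual rules, and sentences are assumed to arrive pre-tagged:

```python
import random

# Illustrative sketch of POS-conditioned synthetic error generation.
# The specific operations here are invented for demonstration; the
# paper's real rules would target Romanian phenomena such as
# definite-article suffixes and agreement.

def corrupt(tagged, rng):
    """Apply one random POS-conditioned corruption to a (token, POS) sentence."""
    ops = []
    for i, (_, pos) in enumerate(tagged):
        if pos == "DET":
            ops.append(("drop", i))   # missing-article error
        if pos == "ADP":
            ops.append(("dup", i))    # repeated-preposition error
        if i + 1 < len(tagged):
            ops.append(("swap", i))   # word-order error
    op, i = rng.choice(ops)
    toks = [tok for tok, _ in tagged]
    if op == "drop":
        del toks[i]
    elif op == "dup":
        toks.insert(i, toks[i])
    else:
        toks[i], toks[i + 1] = toks[i + 1], toks[i]
    return " ".join(toks)

sent = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
        ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
print(corrupt(sent, random.Random(0)))
```

Running a corruptor like this over a large clean corpus yields (corrupted, clean) pairs for pretraining, with no annotation cost beyond tagging.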
Where Pith is reading between the lines
- The same POS-based synthetic pretraining could be applied to GEC for other languages that have a tagger but lack large annotated corpora.
- If performance gains hold across languages, simple rule-based error simulation may reduce reliance on expensive human annotation for GEC.
- One could measure how much the improvement depends on the size gap between the synthetic pretraining set and the 10k real pairs.
Load-bearing premise
Errors artificially generated with only a POS tagger are representative enough of real Romanian grammatical mistakes for the pretraining to transfer usefully when finetuning on the human-annotated corpus.
What would settle it
Evaluating the best pretrained model on a new collection of real human-annotated Romanian sentences and finding its F0.5 score no higher than the 44.38 baseline would indicate the artificial data did not deliver transferable gains.
read the original abstract
Resources for Grammatical Error Correction (GEC) in non-English languages are scarce, while available spellcheckers in these languages are mostly limited to simple corrections and rules. In this paper we introduce a first GEC corpus for Romanian consisting of 10k pairs of sentences. In addition, the German version of ERRANT (ERRor ANnotation Toolkit) scorer was adapted for Romanian to analyze this corpus and extract edits needed for evaluation. Multiple neural models were experimented, together with pretraining strategies, which proved effective for GEC in low-resource settings. Our baseline consists of a small Transformer model trained only on the GEC dataset (F0.5 of 44.38), whereas the best performing model is produced by pretraining a larger Transformer model on artificially generated data, followed by finetuning on the actual corpus (F0.5 of 53.76). The proposed method for generating additional training examples is easily extensible and can be applied to any language, as it requires only a POS tagger.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the first Romanian GEC corpus of 10k sentence pairs, adapts the ERRANT scorer for Romanian error analysis, and evaluates neural Transformer models. The baseline (small Transformer trained only on the corpus) achieves F0.5 = 44.38; the best result (F0.5 = 53.76) is obtained by pretraining a larger Transformer on synthetic errors generated via a POS tagger and then fine-tuning on the real corpus. The synthetic-generation procedure is presented as language-agnostic and requiring only a POS tagger.
Significance. If the central result holds, the work supplies the first public Romanian GEC resource and demonstrates that POS-based synthetic pretraining can improve low-resource GEC performance. The extensible generation method and ERRANT adaptation are practical contributions that could be reused for other under-resourced languages.
major comments (2)
- [Synthetic data generation] Synthetic data generation (method section): the description states that errors are inserted using only POS tags, yet provides no explicit rules, examples, or frequency tables for Romanian-specific phenomena (definite-article placement, case marking, gender/number agreement). Without a quantitative comparison of ERRANT error-type distributions between the synthetic corpus and the 10k human-annotated pairs, the 9.38-point F0.5 lift cannot be confidently attributed to targeted pretraining rather than model scale or generic noise.
- [Results and evaluation] Results and evaluation (experimental section): F0.5 scores are reported for single runs without train/test split sizes, hyper-parameter search details, or statistical significance tests. On a 10k-pair corpus, variance across seeds or bootstrap confidence intervals is needed to establish that the two-stage procedure reliably outperforms the baseline.
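The distribution comparison requested in the first major comment could be as simple as tabulating relative frequencies of ERRANT-style error-type labels in each corpus. A hedged sketch, with invented labels rather than counts from the paper:

```python
from collections import Counter

# Compare relative frequencies of error-type labels between the
# synthetic and human-annotated corpora. The label lists below are
# fabricated for illustration only.

def type_distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}

synthetic = ["DET", "DET", "DET", "ORTH", "NOUN:NUM", "ADP"]
human = ["DET", "ORTH", "ORTH", "NOUN:NUM", "VERB:FORM", "ADP"]

syn_dist = type_distribution(synthetic)
hum_dist = type_distribution(human)
for t in sorted(set(syn_dist) | set(hum_dist)):
    print(f"{t:10s} synthetic={syn_dist.get(t, 0.0):.2f} "
          f"human={hum_dist.get(t, 0.0):.2f}")
```

A large divergence between the two columns would suggest the pretraining gain comes from generic noise robustness rather than targeted error coverage.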
minor comments (2)
- [Abstract] Abstract: 'the German version of ERRANT' should be clarified as the adapted Romanian version used for both corpus analysis and final scoring.
- [Evaluation metrics] Notation: the paper uses F0.5 without defining the beta value or confirming it matches the standard GEC convention (beta = 0.5).
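For readers unfamiliar with the convention flagged in the last point: F0.5 is the F-beta score with beta = 0.5, which weights precision twice as heavily as recall, computed over edit-level true positives, false positives, and false negatives. A minimal reference implementation:

```python
# F_beta over edit-level counts; the standard GEC convention sets
# beta = 0.5, weighting precision twice as heavily as recall.
def f_beta(tp, fp, fn, beta=0.5):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 0.75, recall 0.60: F0.5 sits closer to precision than F1 does.
print(round(f_beta(tp=60, fp=20, fn=40), 4))            # F0.5 -> 0.7143
print(round(f_beta(tp=60, fp=20, fn=40, beta=1.0), 4))  # F1   -> 0.6667
```

Precision weighting reflects the GEC priority that a correction system should avoid introducing new errors even at the cost of missing some.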
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our work introducing the first Romanian GEC corpus and POS-based synthetic pretraining. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation and analysis.
read point-by-point responses
-
Referee: [Synthetic data generation] Synthetic data generation (method section): the description states that errors are inserted using only POS tags, yet provides no explicit rules, examples, or frequency tables for Romanian-specific phenomena (definite-article placement, case marking, gender/number agreement). Without a quantitative comparison of ERRANT error-type distributions between the synthetic corpus and the 10k human-annotated pairs, the 9.38-point F0.5 lift cannot be confidently attributed to targeted pretraining rather than model scale or generic noise.
Authors: We agree that the method section would benefit from greater explicitness. Although the generation procedure is intentionally language-agnostic and relies solely on POS tags to insert plausible errors, we will expand it with concrete Romanian examples (e.g., definite-article insertion/deletion, gender/number agreement violations, and case-marking errors) together with the frequency tables used to sample error types. In addition, we will compute and report the ERRANT error-type distributions for both the synthetic pretraining corpus and the 10k human-annotated pairs; this comparison will allow readers to assess how closely the synthetic data mirrors real error patterns and will help attribute the observed F0.5 improvement more confidently to the pretraining strategy. revision: yes
-
Referee: [Results and evaluation] Results and evaluation (experimental section): F0.5 scores are reported for single runs without train/test split sizes, hyper-parameter search details, or statistical significance tests. On a 10k-pair corpus, variance across seeds or bootstrap confidence intervals is needed to establish that the two-stage procedure reliably outperforms the baseline.
Authors: We acknowledge that single-run reporting limits the strength of the claims. In the revised manuscript we will explicitly state the train/test split sizes, document the hyper-parameter search procedure (including the ranges explored and the final selected values), and report F0.5 scores averaged over at least five random seeds together with standard deviations. Where appropriate we will also include bootstrap confidence intervals or paired statistical significance tests to demonstrate that the two-stage pretraining procedure reliably outperforms the baseline. revision: yes
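The significance testing promised in the second response can be realized as a paired bootstrap over per-sentence edit counts. A sketch under the assumption that both systems are scored on the same test sentences, with fabricated counts standing in for real system outputs:

```python
import random

# Paired bootstrap 95% CI for a corpus-level F0.5 difference between
# two systems. Per-sentence (tp, fp, fn) counts are fabricated here.

def f05(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 1.25 * p * r / (0.25 * p + r) if p + r > 0 else 0.0

def corpus_f05(counts):
    return f05(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))

def bootstrap_diff_ci(a, b, n_boot=1000, seed=0):
    """95% CI for corpus F0.5(a) - corpus F0.5(b) via paired resampling."""
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sentence indices
        diffs.append(corpus_f05([a[i] for i in idx])
                     - corpus_f05([b[i] for i in idx]))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

rng = random.Random(42)
pretrained = [(rng.randint(1, 5), rng.randint(0, 2), rng.randint(0, 2))
              for _ in range(200)]
baseline = [(rng.randint(0, 4), rng.randint(0, 3), rng.randint(0, 3))
            for _ in range(200)]
lo, hi = bootstrap_diff_ci(pretrained, baseline)
print(f"95% CI for F0.5 difference: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the improvement is unlikely to be resampling noise; reporting it alongside multi-seed averages would address the referee's variance concern.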
Circularity Check
No circularity: standard supervised GEC training and evaluation on held-out data
full rationale
The paper introduces a 10k-sentence Romanian GEC corpus, adapts ERRANT for evaluation, and reports F0.5 scores from Transformer models trained on the corpus (baseline 44.38) or pretrained on synthetic data then finetuned (53.76). These metrics are produced by conventional supervised training and held-out evaluation; no equation, parameter fit, or self-citation reduces the reported scores to the inputs by construction. The synthetic-data generation method is presented as an extensible heuristic requiring only a POS tagger, but it is not used to define or tautologically guarantee the evaluation outcome. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- Transformer model size and architecture hyperparameters
- Synthetic error generation rules and volume
axioms (1)
- domain assumption: A standard Transformer encoder-decoder architecture is suitable for grammatical error correction as a sequence transduction task.
Reference graph
Works this paper leans on
-
[1]
The CoNLL-2014 shared task on grammatical error correction,
H. T. Ng, S. M. Wu, T. Briscoe, C. Hadiwinoto, R. H. Susanto, and C. Bryant, “The CoNLL-2014 shared task on grammatical error correction,” in Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pp. 1–14, 2014
2014
-
[2]
JFLEG: A fluency corpus and benchmark for grammatical error correction,
C. Napoles, K. Sakaguchi, and J. Tetreault, “JFLEG: A fluency corpus and benchmark for grammatical error correction,” arXiv preprint arXiv:1702.04066, 2017
2017
-
[3]
Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments,
M. Felice, C. Bryant, and E. Briscoe, “Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments,” Association for Computational Linguistics, 2016
2016
-
[4]
Automatic annotation and evaluation of error types for grammatical error correction,
C. Bryant, M. Felice, and E. Briscoe, “Automatic annotation and evaluation of error types for grammatical error correction,” Association for Computational Linguistics, 2017
2017
-
[5]
Large scale Arabic error annotation: Guidelines and framework,
W. Zaghouani, B. Mohit, N. Habash, O. Obeid, N. Tomeh, A. Rozovskaya, N. Farra, S. Alkuhlani, and K. Oflazer, “Large scale Arabic error annotation: Guidelines and framework,” 2014
2014
-
[6]
Using Wikipedia edits in low resource grammatical error correction,
A. Boyd, “Using Wikipedia edits in low resource grammatical error correction,” in Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pp. 79–84, 2018
2018
-
[7]
Grammar error correction in morphologically rich languages: The case of Russian,
A. Rozovskaya and D. Roth, “Grammar error correction in morphologically rich languages: The case of Russian,” Transactions of the Association for Computational Linguistics, vol. 7, pp. 1–17, 2019
2019
-
[8]
Natural language correction,
J. Náplava, “Natural language correction,” 2017
2017
-
[9]
Overview of grammatical error diagnosis for learning Chinese as a foreign language,
L.-C. Yu, L.-H. Lee, and L.-P. Chang, “Overview of grammatical error diagnosis for learning Chinese as a foreign language,” in Proceedings of the 1st Workshop on Natural Language Processing Techniques for Educational Applications, pp. 42–47, 2014
2014
-
[10]
Mining revision log of language learning SNS for automated Japanese error correction of second language learners,
T. Mizumoto, M. Komachi, M. Nagata, and Y. Matsumoto, “Mining revision log of language learning SNS for automated Japanese error correction of second language learners,” in Proceedings of 5th International Joint Conference on Natural Language Processing, pp. 147–155, 2011
2011
-
[11]
Grammatical error correction in low-resource scenarios,
J. Náplava and M. Straka, “Grammatical error correction in low-resource scenarios,” arXiv preprint arXiv:1910.00353, 2019
2019
-
[12]
Attention is all you need,
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems, pp. 5998–6008, 2017
2017
-
[13]
Neural grammatical error correction systems with unsupervised pre-training on synthetic data,
R. Grundkiewicz, M. Junczys-Dowmunt, and K. Heafield, “Neural grammatical error correction systems with unsupervised pre-training on synthetic data,” in Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 252–263, 2019
2019
-
[14]
The BEA-2019 shared task on grammatical error correction,
C. Bryant, M. Felice, Ø. E. Andersen, and T. Briscoe, “The BEA-2019 shared task on grammatical error correction,” in Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 52–75, 2019
2019
-
[15]
Phrase-based machine translation is state-of-the-art for automatic grammatical error correction,
M. Junczys-Dowmunt and R. Grundkiewicz, “Phrase-based machine translation is state-of-the-art for automatic grammatical error correction,” arXiv preprint arXiv:1605.06353, 2016
2016
-
[16]
Neural language correction with character-based attention,
Z. Xie, A. Avati, N. Arivazhagan, D. Jurafsky, and A. Y. Ng, “Neural language correction with character-based attention,” arXiv preprint arXiv:1603.09727, 2016
2016
-
[17]
Approaching neural grammatical error correction as a low-resource machine translation task,
M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, and K. Heafield, “Approaching neural grammatical error correction as a low-resource machine translation task,” arXiv preprint arXiv:1804.05940, 2018
2018
-
[18]
Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data,
W. Zhao, L. Wang, K. Shen, R. Jia, and J. Liu, “Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data,” arXiv preprint arXiv:1903.00138, 2019
2019
-
[19]
An empirical study of incorporating pseudo data into grammatical error correction,
S. Kiyono, J. Suzuki, M. Mita, T. Mizumoto, and K. Inui, “An empirical study of incorporating pseudo data into grammatical error correction,” arXiv preprint arXiv:1909.00502, 2019
2019
-
[20]
Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction,
M. Kaneko, M. Mita, S. Kiyono, J. Suzuki, and K. Inui, “Encoder-decoder models can benefit from pre-trained masked language models in grammatical error correction,” arXiv preprint arXiv:2005.00987, 2020
2020
-
[21]
BERT: Pre-training of deep bidirectional transformers for language understanding,
J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018
2018
-
[22]
Language model based grammatical error correction without annotated training data,
C. Bryant and T. Briscoe, “Language model based grammatical error correction without annotated training data,” in Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 247–253, 2018
2018
-
[23]
Corpora generation for grammatical error correction,
J. Lichtarge, C. Alberti, S. Kumar, N. Shazeer, N. Parmar, and S. Tong, “Corpora generation for grammatical error correction,” arXiv preprint arXiv:1904.05780, 2019
2019
-
[24]
Constrained grammatical error correction using statistical machine translation,
Z. Yuan and M. Felice, “Constrained grammatical error correction using statistical machine translation,” in Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task, pp. 52–61, 2013
2013
-
[25]
ReaderBench goes online: A comprehension-centered framework for educational purposes,
G. Gutu, M. Dascalu, S. Trausan-Matu, and P. Dessus, “ReaderBench goes online: A comprehension-centered framework for educational purposes,” in International Conference on Human-Computer Interaction (RoCHI 2016) (A. Iftene and J. Vanderdonckt, eds.), pp. 95–102, MATRIX ROM, 2016
2016
-
[26]
Google's neural machine translation system: Bridging the gap between human and machine translation,
Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al., “Google's neural machine translation system: Bridging the gap between human and machine translation,” arXiv preprint arXiv:1609.08144, 2016
2016
-
[27]
Scalable modified Kneser-Ney language model estimation,
K. Heafield, I. Pouzyrevsky, J. H. Clark, and P. Koehn, “Scalable modified Kneser-Ney language model estimation,” in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 690–696, 2013
2013
-
[28]
BLEU: a method for automatic evaluation of machine translation,
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318, Association for Computational Linguistics, 2002
2002
-
[29]
Incorporating BERT into neural machine translation,
J. Zhu, Y. Xia, L. Wu, D. He, T. Qin, W. Zhou, H. Li, and T.-Y. Liu, “Incorporating BERT into neural machine translation,” arXiv preprint arXiv:2002.06823, 2020
2020
-
[30]
DIAC+: A professional diacritics recovering system,
D. Tufiș, A. Ceaușu, et al., “DIAC+: A professional diacritics recovering system,” Proceedings of LREC 2008, pp. 167–174, 2008
2008