pith. machine review for the scientific record.

arxiv: 2605.13624 · v1 · submitted 2026-05-13 · 💻 cs.CL


Edit-level Majority Voting Mitigates Over-Correction in LLM-based Grammatical Error Correction


Pith reviewed 2026-05-14 19:55 UTC · model grok-4.3

classification 💻 cs.CL
keywords grammatical error correction · LLM inference · majority voting · over-correction · multilingual evaluation · training-free method

The pith

Edit-level majority voting over multiple LLM candidates reduces over-correction in grammatical error correction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that grammatical error correction with large language models frequently produces unnecessary changes that alter correct text. To address this without any retraining or model changes, the authors generate several correction candidates from one LLM and apply majority voting separately to each individual edit. This approach improves results over standard greedy decoding and minimum Bayes risk decoding on nine benchmarks spanning English, Czech, German, Ukrainian, Korean, Hindi, and Romanian. It also keeps correction quality consistent even when the instruction prompt is varied. Readers who use LLMs for writing assistance would value a lightweight inference step that limits unwanted alterations.

Core claim

Performing majority voting at the level of individual edits, rather than on complete sentences, reliably reduces the over-correction problem that arises when a single large language model is prompted to correct grammar.

What carries the argument

Edit-level majority voting: multiple correction candidates are produced from one LLM, edits are identified and aligned across candidates, and the most frequent version of each edit is retained in the final output.
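As a concrete sketch of this machinery: the following illustrative Python uses difflib for edit extraction and a strict-majority filter. The paper's own alignment tooling may differ (e.g. an ERRANT-style aligner); all function names here are ours.

```python
from collections import Counter
from difflib import SequenceMatcher

def extract_edits(source_tokens, candidate_tokens):
    """Represent each edit as a (start, end, replacement) span over the source."""
    sm = SequenceMatcher(a=source_tokens, b=candidate_tokens, autojunk=False)
    return [
        (i1, i2, tuple(candidate_tokens[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]

def edit_majority_vote(source_tokens, candidates):
    """Keep only the edits proposed by a strict majority of candidates."""
    counts = Counter()
    for cand in candidates:
        counts.update(extract_edits(source_tokens, cand))
    kept = sorted(edit for edit, c in counts.items() if c > len(candidates) / 2)
    # Apply surviving edits right to left so earlier spans stay valid.
    out = list(source_tokens)
    for start, end, replacement in reversed(kept):
        out[start:end] = replacement
    return out
```

For source "He go to school" with candidates "He went to school", "He went to school", and "He goes to the school", only the go → went edit clears the majority threshold; the minority insertion of "the" (a would-be over-correction) is discarded.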

If this is right

  • The method outperforms both greedy and minimum Bayes risk decoding on most of the tested multilingual benchmarks.
  • Correction quality remains stable when different instruction prompts are used.
  • No model modification or additional training data is required to obtain the improvement.
  • The approach applies directly to existing LLM inference pipelines for grammatical error correction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same voting step could be tested on other LLM tasks that suffer from over-generation, such as summarization or style transfer.
  • If edit alignment proves robust across languages, the technique might extend to low-resource settings where prompt sensitivity is high.
  • Combining edit-level voting with lightweight reranking could further reduce the remaining errors without extra training.

Load-bearing premise

The generated candidates must contain enough diversity that correct edits appear in the majority while over-corrections appear only in minorities.

What would settle it

On any of the nine benchmarks, if the edit-voted output introduces more over-corrections or lower overall scores than greedy decoding from the same model and prompt, the central claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.13624 by Takumi Goto, Taro Watanabe, Yusuke Sakai.

Figure 1. Overview of the edit-level majority voting.
Figure 2. Score versus average computation time per sentence (seconds) for each k (k = 1, 2, 4, 8, 16, 32) on CWEB-G-dev, BEA19-dev, and JFLEG-dev.
Figure 3. Relationship between edit frequency and pre…
Figure 4. Instruction template for the English experiments.
Figure 5. Instruction template for datasets other than English; [LANG] is replaced with a language name, such as "Czech" or "German."
read the original abstract

Grammatical error correction using large language models often suffers from the over-correction issue. To mitigate this, we propose a training-free inference method that performs edit-level majority voting over multiple candidates generated by a single model, without requiring model modifications or additional training. Across nine benchmarks covering English, Czech, German, Ukrainian, Korean, Hindi, and Romanian, the proposed method outperforms both greedy and MBR decoding in most cases. Moreover, it yields stable correction quality regardless of the instruction prompts used. We release two repositories supporting GEC dataset loading and LLM inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a training-free inference method for LLM-based grammatical error correction that generates multiple candidate outputs from a single model and applies majority voting at the level of individual edits to mitigate over-correction. It reports that this approach outperforms both greedy decoding and minimum Bayes risk (MBR) decoding across nine benchmarks covering English, Czech, German, Ukrainian, Korean, Hindi, and Romanian, while also producing stable correction quality independent of the specific instruction prompts used. Two supporting repositories for dataset loading and LLM inference are released.

Significance. If the central empirical claims hold after clarification of the method, the work offers a lightweight, training-free technique that could be widely adopted to improve reliability of LLM-based GEC systems without model changes. The cross-lingual scope and reported prompt stability are practically relevant, and the open release of code and data strengthens reproducibility.

major comments (3)
  1. [§3] §3 (Proposed Method): The edit extraction and alignment procedure is not given a formal, language-agnostic definition or pseudocode. The manuscript relies on an implicit alignment step without specifying how insertions, deletions, substitutions, and multi-token edits are canonicalized or how edit boundaries are determined across scripts and morphologies. This is load-bearing for the majority-vote guarantee and risks inconsistent aggregation in languages such as Korean and Hindi.
  2. [§4.2] §4.2 (Experimental Results): The tables reporting outperformance over MBR decoding do not include statistical significance tests, confidence intervals, or variance estimates across runs. Without these, it is unclear whether the gains in 'most cases' are robust or attributable to sampling variance in candidate generation.
  3. [§4.3] §4.3 (Analysis): No quantitative analysis of candidate diversity or inter-candidate edit agreement is provided. The central claim that edit-level voting reliably mitigates over-correction presupposes sufficient diversity; absence of such diagnostics leaves the mechanism under-supported.
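For the significance testing requested in major comment 2, one standard choice is a paired bootstrap over per-sentence score differences. A minimal sketch, with helper name and inputs ours rather than the paper's:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap: p-value for 'A does not beat B' and a 95% CI
    on the mean per-sentence score difference A - B."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Resample per-sentence differences with replacement, keep each mean.
    means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    p_value = sum(m <= 0 for m in means) / n_resamples
    ci_95 = (means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples)])
    return p_value, ci_95
```

A small p-value together with a confidence interval excluding zero would support the claimed gains over MBR decoding being more than sampling variance.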
minor comments (2)
  1. A summary table listing the nine benchmarks, their sizes, error-type distributions, and reference sources would improve clarity and allow readers to assess cross-lingual coverage.
  2. [§3] Notation for edit operations (e.g., how an edit is represented as a tuple or string) should be introduced explicitly in §3 before the voting algorithm is described.
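A language-agnostic notation of the kind this comment asks for could canonicalize each edit as an (op, src_start, src_end, replacement) tuple via token-level Levenshtein alignment. This is an editorial sketch, not the manuscript's procedure; merging adjacent operations into multi-token spans is omitted for brevity.

```python
def align_edits(src, tgt):
    """Align token lists and return canonical (op, start, end, replacement)
    edit tuples over the source, via Levenshtein dynamic programming."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    # Backtrace, preferring diagonal moves, then insertions, then deletions.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != tgt[j - 1]):
            if src[i - 1] != tgt[j - 1]:
                ops.append(("sub", i - 1, i, (tgt[j - 1],)))
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ops.append(("ins", i, i, (tgt[j - 1],)))
            j -= 1
        else:
            ops.append(("del", i - 1, i, ()))
            i -= 1
    return sorted(ops, key=lambda e: (e[1], e[2]))
```

Because the alignment operates on opaque tokens, it applies unchanged across scripts; what remains language-specific is the tokenizer that produces the token lists.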

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their insightful comments, which have helped us improve the clarity and rigor of our manuscript. We provide point-by-point responses below.

read point-by-point responses
  1. Referee: [§3] §3 (Proposed Method): The edit extraction and alignment procedure is not given a formal, language-agnostic definition or pseudocode. The manuscript relies on an implicit alignment step without specifying how insertions, deletions, substitutions, and multi-token edits are canonicalized or how edit boundaries are determined across scripts and morphologies. This is load-bearing for the majority-vote guarantee and risks inconsistent aggregation in languages such as Korean and Hindi.

    Authors: We agree that providing a formal, language-agnostic definition and pseudocode for the edit extraction and alignment procedure will enhance reproducibility and address concerns about consistency across languages. In the revised manuscript, we will add a detailed pseudocode algorithm that describes the steps for identifying edits using a standard alignment method (e.g., based on Levenshtein distance at the token level), canonicalizing operations for insertions, deletions, substitutions, and multi-token spans, and handling boundary determination in a script-agnostic manner. This will explicitly support the majority-vote mechanism. revision: yes

  2. Referee: [§4.2] §4.2 (Experimental Results): The tables reporting outperformance over MBR decoding do not include statistical significance tests, confidence intervals, or variance estimates across runs. Without these, it is unclear whether the gains in 'most cases' are robust or attributable to sampling variance in candidate generation.

    Authors: We acknowledge this limitation in the current version. To demonstrate the robustness of our results, we will include statistical significance tests (e.g., paired t-tests or Wilcoxon signed-rank tests) and confidence intervals computed via bootstrapping over multiple independent runs in the revised tables for the comparisons with MBR decoding. revision: yes

  3. Referee: [§4.3] §4.3 (Analysis): No quantitative analysis of candidate diversity or inter-candidate edit agreement is provided. The central claim that edit-level voting reliably mitigates over-correction presupposes sufficient diversity; absence of such diagnostics leaves the mechanism under-supported.

    Authors: We agree that quantifying candidate diversity and edit agreement would better support the mechanism. In the revised analysis section, we will add metrics such as the average number of unique edits per sentence across candidates, the pairwise edit overlap ratio, and the variance in correction quality, to show that there is sufficient diversity for the majority voting to be effective while reducing over-corrections. revision: yes
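Diagnostics of the shape promised above are straightforward to compute. An illustrative helper (ours, not the authors' code) over per-candidate edit sets for a single sentence:

```python
from itertools import combinations

def candidate_diversity(edit_sets):
    """Return (number of unique edits across candidates,
    mean pairwise Jaccard overlap between candidates' edit sets)."""
    unique_edits = set().union(*edit_sets)
    overlaps = [
        len(a & b) / len(a | b) if (a | b) else 1.0
        for a, b in combinations(edit_sets, 2)
    ]
    mean_overlap = sum(overlaps) / len(overlaps) if overlaps else 1.0
    return len(unique_edits), mean_overlap
```

High mean overlap with few unique edits indicates candidates agree and voting is stable; very low overlap would suggest sampling noise could swamp the majority signal.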

Circularity Check

0 steps flagged

No significant circularity; empirical voting heuristic is self-contained

full rationale

The paper describes a training-free inference procedure that generates multiple LLM candidates for grammatical error correction and aggregates them via edit-level majority voting. No equations, fitted parameters, or derivations are present that could reduce to self-definition or input-as-prediction. The method relies on external empirical evaluation across nine benchmarks rather than any self-citation chain, uniqueness theorem, or ansatz imported from prior work. Edit extraction is presented as an implementation detail without circular redefinition of the voting outcome. This is a standard empirical inference setup whose validity is tested against baselines, yielding a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced in the abstract; the approach relies on standard LLM generation and majority voting.

pith-pipeline@v0.9.0 · 5388 in / 1004 out tokens · 56708 ms · 2026-05-14T19:55:17.732370+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 5 internal anchors

  1. [1]

    Abhijeet Awasthi, Sunita Sarawagi, Rasna Goyal, Sabyasachi Ghosh, and Vihari Piratla. 2019. https://doi.org/10.18653/v1/D19-1435 Parallel iterative edit models for local sequence transduction . In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing...

  2. [2]

    Adriane Boyd. 2018. https://doi.org/10.18653/v1/W18-6111 Using Wikipedia edits in low resource grammatical error correction. In Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 79--84, Brussels, Belgium. Association for Computational Linguistics

  3. [3]

    Christopher Bryant, Mariano Felice, Øistein E. Andersen, and Ted Briscoe. 2019. https://doi.org/10.18653/v1/W19-4406 The BEA-2019 shared task on grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52--75, Florence, Italy. Association for Computational Linguistics

  4. [4]

    Christopher Bryant, Mariano Felice, and Ted Briscoe. 2017. https://doi.org/10.18653/v1/P17-1074 Automatic annotation and evaluation of error types for grammatical error correction . In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 793--805, Vancouver, Canada. Association for Computat...

  5. [5]

    Bin Cao, Kai Jiang, Fayu Pan, Chenlei Bao, and Jing Fan. 2024. https://aclanthology.org/2024.lrec-main.772/ Improving grammatical error correction by correction acceptability discrimination . In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8818--8827, Torin...

  6. [6]

    Teodor-Mihai Cotet, Stefan Ruseti, and Mihai Dascalu. 2020. https://doi.org/10.1109/ICTAI50040.2020.00101 Neural grammatical error correction for romanian . In 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), pages 625--631

  7. [7]

    Christopher Davis, Andrew Caines, Øistein E. Andersen, Shiva Taslimipoor, Helen Yannakoudakis, Zheng Yuan, Christopher Bryant, Marek Rei, and Paula Buttery. 2024. https://doi.org/10.18653/v1/2024.findings-acl.711 Prompting open-source and commercial language models for grammatical error correction of English learner text. In Findings of the Association f...

  8. [8]

    Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, and Taro Watanabe. 2024. https://doi.org/10.18653/v1/2024.emnlp-demo.37 mbrs: A library for minimum Bayes risk decoding. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 351--362, Miami, Florida, USA. Association for Computational L...

  9. [9]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

  10. [10]

    Bryan Eikema and Wilker Aziz. 2020. https://doi.org/10.18653/v1/2020.coling-main.398 Is MAP decoding all you need? the inadequacy of the mode in neural machine translation . In Proceedings of the 28th International Conference on Computational Linguistics, pages 4506--4520, Barcelona, Spain (Online). International Committee on Computational Linguistics

  11. [11]

    Mariano Felice, Christopher Bryant, and Ted Briscoe. 2016. https://aclanthology.org/C16-1079/ Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments . In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers , pages 825--835, Osaka, Japan. The COLING 2016 Orga...

  12. [12]

    Simon Flachs, Ophélie Lacroix, Helen Yannakoudakis, Marek Rei, and Anders Søgaard. 2020. https://doi.org/10.18653/v1/2020.emnlp-main.680 Grammatical error correction in low error density domains: A new benchmark and analyses. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8467--8478, Online. A...

  13. [13]

    Vaibhava Goel and William J Byrne. 2000. https://doi.org/10.1006/csla.2000.0138 Minimum bayes-risk automatic speech recognition . Computer Speech & Language, 14(2):115--135

  14. [14]

    Peiyuan Gong, Xuebo Liu, Heyan Huang, and Min Zhang. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.463 Revisiting grammatical error correction evaluation and beyond . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6891--6902, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

  15. [15]

    Takumi Goto, Yusuke Sakai, and Taro Watanabe. 2025a. https://doi.org/10.18653/v1/2025.acl-demo.50 gec-metrics: A unified library for grammatical error correction evaluation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 524--534, Vienna, Austria. Association for Compu...

  16. [16]

    Takumi Goto, Yusuke Sakai, and Taro Watanabe. 2025b. https://doi.org/10.18653/v1/2025.acl-short.92 Rethinking evaluation metrics for grammatical error correction: Why use a different evaluation process than human? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1165--1172, Vienna...

  17. [17]

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others. 2024. https://arxiv.org/abs/2407.21783 The llama 3...

  18. [18]

    Pengcheng He, Jianfeng Gao, and Weizhu Chen. 2023. https://openreview.net/forum?id=sE7-XhLxHA DeBERTaV3 : Improving DeBERTa using ELECTRA -style pre-training with gradient-disentangled embedding sharing . In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023 . OpenReview.net

  19. [19]

    Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. https://openreview.net/forum?id=rygGQyrFvH The curious case of neural text degeneration . In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 . OpenReview.net

  20. [20]

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. https://openreview.net/forum?id=nZeVKeeFYf9 LoRA: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net

  21. [21]

    Masahiro Kaneko and Naoaki Okazaki. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.619 Reducing sequence length by predicting edit spans with large language models . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10017--10029, Singapore. Association for Computational Linguistics

  22. [22]

    Anisia Katinskaia and Roman Yangarber. 2024. https://aclanthology.org/2024.lrec-main.692/ GPT-3.5 for grammatical error correction. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7831--7843, Torino, Italia. ELRA and ICCL

  23. [23]

    Masamune Kobayashi, Masato Mita, and Mamoru Komachi. 2024. https://doi.org/10.1162/tacl_a_00676 Revisiting meta-evaluation for grammatical error correction . Transactions of the Association for Computational Linguistics, 12:837--855

  24. [24]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023a. https://doi.org/10.1145/3600006.3613165 Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th Symposium on Operating Systems Principles, SOSP 2023, Koblenz, Germany, Oc...

  25. [25]

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023b. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles

  26. [26]

    Jiehao Liang, Haihui Yang, Shiping Gao, and Xiaojun Quan. 2025. https://aclanthology.org/2025.coling-main.229/ Edit-wise preference optimization for grammatical error correction . In Proceedings of the 31st International Conference on Computational Linguistics, pages 3401--3414, Abu Dhabi, UAE. Association for Computational Linguistics

  27. [27]

    Ruixi Lin and Hwee Tou Ng. 2021. https://aclanthology.org/2021.ranlp-1.94/ System combination for grammatical error correction based on integer programming . In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 824--829, Held Online. INCOMA Ltd

  28. [28]

    Mengsay Loem, Masahiro Kaneko, Sho Takase, and Naoaki Okazaki. 2023. https://doi.org/10.18653/v1/2023.bea-1.18 Exploring effectiveness of GPT-3 in grammatical error correction: A study on performance and controllability in prompt-based methods. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023),...

  29. [29]

    Koki Maeda, Masahiro Kaneko, and Naoaki Okazaki. 2022. https://aclanthology.org/2022.coling-1.316/ IMPARA : Impact-based metric for GEC using parallel data . In Proceedings of the 29th International Conference on Computational Linguistics, pages 3578--3588, Gyeongju, Republic of Korea. International Committee on Computational Linguistics

  30. [30]

    Jakub Náplava and Milan Straka. 2019. https://doi.org/10.18653/v1/D19-5545 Grammatical error correction in low-resource scenarios. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pages 346--356, Hong Kong, China. Association for Computational Linguistics

  31. [31]

    Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2015. https://doi.org/10.3115/v1/P15-2097 Ground truth for grammatical error correction metrics . In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), ...

  32. [32]

    Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2016. https://arxiv.org/abs/1605.02592 GLEU without tuning . Preprint, arXiv:1605.02592

  33. [33]

    Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault. 2017. https://aclanthology.org/E17-2037/ JFLEG: A fluency corpus and benchmark for grammatical error correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229--234, Valencia, Spain. Association fo...

  34. [34]

    Hwee Tou Ng, Siew Mei Wu, Ted Briscoe, Christian Hadiwinoto, Raymond Hendy Susanto, and Christopher Bryant. 2014. https://doi.org/10.3115/v1/W14-1701 The CoNLL-2014 shared task on grammatical error correction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task, pages 1--14, Baltimore, Maryland. Associat...

  35. [35]

    Kostiantyn Omelianchuk, Vitaliy Atrasevych, Artem Chernodub, and Oleksandr Skurzhanskyi. 2020. https://doi.org/10.18653/v1/2020.bea-1.16 GECToR -- grammatical error correction: Tag, not rewrite. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 163--170, Seattle, WA, USA Online. Association f...

  36. [36]

    Kostiantyn Omelianchuk, Andrii Liubonko, Oleksandr Skurzhanskyi, Artem Chernodub, Oleksandr Korniienko, and Igor Samokhin. 2024. https://aclanthology.org/2024.bea-1.3/ Pillars of grammatical error correction: Comprehensive inspection of contemporary approaches in the era of large language models . In Proceedings of the 19th Workshop on Innovative Use of N...

  37. [37]

    Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D. Manning. 2020. https://doi.org/10.18653/v1/2020.acl-demos.14 Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 101--108, Online. Associat...

  38. [38]

    Mengyang Qiu, Tran Minh Nguyen, Zihao Huang, Zelong Li, Yang Gu, Qingyu Gao, Siliang Liu, and Jungyeul Park. 2025. https://aclanthology.org/2025.bea-1.15/ Multilingual grammatical error annotation: Combining language-agnostic framework with language-specific flexibility . In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educationa...

  39. [39]

    Muhammad Reza Qorib, Seung-Hoon Na, and Hwee Tou Ng. 2022. https://doi.org/10.18653/v1/2022.naacl-main.143 Frustratingly easy system combination for grammatical error correction . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1964--1974, Seattle, Uni...

  40. [40]

    Muhammad Reza Qorib and Hwee Tou Ng. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.785 System combination via quality estimation for grammatical error correction . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12746--12759, Singapore. Association for Computational Linguistics

  41. [41]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. http://jmlr.org/papers/v21/20-074.html Exploring the limits of transfer learning with a unified text-to-text transformer . Journal of Machine Learning Research, 21(140):1--67

  42. [42]

    Vyas Raina and Mark Gales. 2023. https://doi.org/10.18653/v1/2023.ijcnlp-short.12 Minimum Bayes' risk decoding for system combination of grammatical error correction systems. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Lin...

  43. [43]

    Sascha Rothe, Jonathan Mallinson, Eric Malmi, Sebastian Krause, and Aliaksei Severyn. 2021. https://doi.org/10.18653/v1/2021.acl-short.89 A simple recipe for multilingual grammatical error correction . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language ...

  44. [44]

    Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, and Taro Watanabe. 2024. https://doi.org/10.18653/v1/2024.blackboxnlp-1.31 Toward the evaluation of large language models considering score variance across instruction templates . In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 499--529,...

  45. [45]

    Ujjwal Sharma and Pushpak Bhattacharyya. 2025. https://aclanthology.org/2025.coling-main.406/ Hi-GEC: Hindi grammar error correction in low resource scenario. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6063--6075, Abu Dhabi, UAE. Association for Computational Linguistics

  46. [46]

    Alexey Sorokin. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.785 Improved grammatical error correction by ranking elementary edits . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11416--11429, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

  47. [47]

    Ryszard Staruch, Filip Gralinski, and Daniel Dzienisiewicz. 2025. https://doi.org/10.18653/v1/2025.bea-1.9 Adapting LLM s for minimal-edit grammatical error correction . In Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025), pages 118--128, Vienna, Austria. Association for Computational Linguistics

  48. [48]

    Oleksiy Syvokon and Mariana Romanyshyn. 2023. https://doi.org/10.18653/v1/2023.unlp-1.16 The UNLP 2023 shared task on grammatical error correction for Ukrainian. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 132--137, Dubrovnik, Croatia. Association for Computational Linguistics

  49. [49]

    Chenming Tang, Fanyi Qu, and Yunfang Wu. 2024. https://doi.org/10.18653/v1/2024.naacl-long.99 Ungrammatical-syntax-based in-context example selection for grammatical error correction . In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), p...

  50. [50]

    Maksym Tarnavskyi, Artem Chernodub, and Kostiantyn Omelianchuk. 2022. https://doi.org/10.18653/v1/2022.acl-long.266 Ensembling and knowledge distilling of large sequence taggers for grammatical error correction . In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3842--3852, Dublin, Ir...

  51. [51]

    Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, and 197 others. 2025. https://arxiv.org/abs/2503.19786...

  52. [52]

    Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, and 179 others. 2024. https://arxiv.org/abs/2408.00118 Gemma 2: ...

  53. [53]

    Junrui Wang, Mengyang Qiu, Yang Gu, Zihao Huang, and Jungyeul Park. 2025. https://aclanthology.org/2025.coling-main.52/ Refined evaluation for end-to-end grammatical error correction using an alignment-based approach . In Proceedings of the 31st International Conference on Computational Linguistics, pages 774--785, Abu Dhabi, UAE. Association for Computat...

  54. [54]

    Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023. https://openreview.net/forum?id=1PL1NIMMrw Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenR...

  55. [55]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, and 41 others. 2025. https://arxiv.org/abs/2505.09388 Qwen3 technical report . Preprint, arXiv:2505.09388

  56. [56]

    Helen Yannakoudakis, Øistein E Andersen, Ardeshir Geranpayeh, Ted Briscoe, and Diane Nicholls. 2018. https://doi.org/10.1080/08957347.2018.1464447 Developing an automated writing placement system for ESL learners . Applied Measurement in Education, 31(3):251--267

  57. [57]

    Soyoung Yoon, Sungjoon Park, Gyuwan Kim, Junhee Cho, Kihyo Park, Gyu Tae Kim, Minjoon Seo, and Alice Oh. 2023. https://doi.org/10.18653/v1/2023.acl-long.371 Towards standardizing Korean grammatical error correction: Datasets and annotation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)...

  58. [58]

    Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara, and Mamoru Komachi. 2020. https://doi.org/10.18653/v1/2020.coling-main.573 SOME : Reference-less sub-metrics optimized for manual evaluations of grammatical error correction . In Proceedings of the 28th International Conference on Computational Linguistics, pages 6516--6522, Barcelona, Spain (Online). I...

  59. [59]

    Yue Zhang, Bo Zhang, Zhenghua Li, Zuyi Bao, Chen Li, and Min Zhang. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.162 SynGEC: Syntax-enhanced grammatical error correction with a tailored GEC-oriented parser. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2518--2531, Abu Dhabi, United Arab Emirates...

  60. [60]

    Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, and Jingming Liu. 2019. https://doi.org/10.18653/v1/N19-1014 Improving grammatical error correction via pre-training a copy-augmented architecture with unlabeled data. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technolog...

  61. [61]

    Yike Zhao, Xiaoman Wang, Yunshi Lan, and Weining Qian. 2025. https://aclanthology.org/2025.coling-demos.5/ UnifiedGEC: Integrating grammatical error correction approaches for multi-languages with a unified framework. In Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations, pages 37--45, Abu Dhabi, UAE. A...