Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

Mingyi Wang; Shaofeng Zou; Yuheng Bu; Zhuoer Shen

arXiv: 2606.00392 · v1 · pith:3TNGDBKI · submitted 2026-05-29 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-28 23:04 UTC · model grok-4.3

Classification: 💻 cs.LG · cs.AI

Keywords: LLM paraphrasing · AI text detection · constrained reinforcement learning · detector evasion · semantic preservation · policy optimization · Lagrangian methods

The pith

Detector-evasive LLM paraphrasing is cast as a constrained Markov decision process that treats semantic preservation as a hard constraint while maximizing evasion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models the task of generating paraphrases that fool AI-text detectors as a Constrained Markov Decision Process. Detector evasion serves as the objective to maximize, while semantic similarity to the original text is enforced through an explicit constraint rather than a weighted reward. A Lagrangian primal-dual reinforcement learning algorithm called DEPO solves this formulation using a group-based policy update. Experiments across multiple datasets and detectors show the resulting policy achieves high evasion rates while staying inside the prescribed semantic-preservation region and generalizes across domains and prompts.
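The "adaptive balancing" in a Lagrangian primal-dual method comes from the dual update on the multiplier. The toy sketch below is not DEPO's implementation; it assumes the constraint is on a per-batch average similarity score, and the names `tau` (threshold) and `eta` (dual step size), along with the batch values, are hypothetical. It shows the multiplier rising while the constraint is violated and relaxing once it is satisfied.

```python
def dual_step(lam: float, mean_similarity: float, tau: float, eta: float) -> float:
    """One projected dual-descent step on the Lagrange multiplier.

    The constraint is E[similarity] >= tau, with slack (mean_similarity - tau).
    Negative slack (a violation) increases lam; positive slack decreases it.
    The max(0, ...) projection keeps lam in the dual feasible set lam >= 0.
    """
    slack = mean_similarity - tau
    return max(0.0, lam - eta * slack)


# Hypothetical per-batch mean similarities observed over training.
batch_means = [0.70, 0.75, 0.82, 0.88, 0.90, 0.91]
lam = 0.0
trajectory = []
for m in batch_means:
    lam = dual_step(lam, m, tau=0.85, eta=2.0)
    trajectory.append(round(lam, 4))
print(trajectory)
```

The multiplier peaks at the last violating batch and then decays. With a larger `eta` it tends to overshoot and oscillate, which is one source of the approximate feasibility the referee report raises below.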

Core claim

By formulating detector-evasive paraphrasing as a CMDP with evasion as the primary objective and semantic preservation as a hard constraint, then solving it via a Lagrangian primal-dual RL method with GRPO-style updates, DEPO produces paraphrases that reliably evade detectors while precisely satisfying the semantic constraint, unlike scalarized reward approaches that provide only indirect control.
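Why a fixed weight gives only indirect control can be seen in a one-shot selection toy (not a policy-optimization method). Each hypothetical candidate output carries an evasion score and a similarity score; the weights `w` and threshold `tau` are illustrative assumptions. The scalarized choice flips as `w` changes and can land outside the similarity bound, whereas the constrained choice never does.

```python
from typing import List, Optional, Tuple

# Each candidate is (evasion_score, similarity_score), both in [0, 1]. Hypothetical values.
Candidate = Tuple[float, float]
candidates: List[Candidate] = [(0.95, 0.60), (0.80, 0.86), (0.55, 0.97)]


def pick_scalarized(cands: List[Candidate], w: float) -> Candidate:
    """Maximize evasion + w * similarity (fixed-weight scalarization)."""
    return max(cands, key=lambda c: c[0] + w * c[1])


def pick_constrained(cands: List[Candidate], tau: float) -> Optional[Candidate]:
    """Maximize evasion subject to similarity >= tau; None if nothing is feasible."""
    feasible = [c for c in cands if c[1] >= tau]
    return max(feasible, key=lambda c: c[0]) if feasible else None


for w in (0.5, 1.0, 3.0):
    print(w, pick_scalarized(candidates, w))
print("constrained", pick_constrained(candidates, tau=0.85))
```

Here `w = 0.5` selects a candidate below the 0.85 similarity bound, `w = 3.0` gives up evasion the bound did not require, and only the constrained rule lands on the best feasible candidate regardless of any weight.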

What carries the argument

Constrained Markov Decision Process formulation solved by Lagrangian primal-dual reinforcement learning with group-based policy updates, where semantic preservation acts as the explicit constraint.
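In symbols, a generic CMDP of this kind and its Lagrangian relaxation can be written as below. The notation is assumed for illustration and is not taken from the paper: $\pi_\theta$ is the paraphrasing policy, $\mathcal{D}$ the distribution of source texts $x$, $y$ a sampled paraphrase, $R_{\mathrm{ev}}$ a detector-derived evasion reward, $S$ a semantic-similarity score, $\tau$ the semantic-preservation threshold, and $\alpha, \eta$ primal and dual step sizes. The expectation-form constraint is the textbook CMDP choice and is likewise an assumption here.

```latex
\begin{aligned}
&\max_{\theta}\; J_R(\theta) = \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[R_{\mathrm{ev}}(x, y)\big]
\quad \text{s.t.} \quad
J_S(\theta) = \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[S(x, y)\big] \ge \tau, \\
&\mathcal{L}(\theta, \lambda) = J_R(\theta) + \lambda \big(J_S(\theta) - \tau\big), \qquad \lambda \ge 0, \\
&\theta_{k+1} = \theta_k + \alpha \, \nabla_\theta \mathcal{L}(\theta_k, \lambda_k), \qquad
\lambda_{k+1} = \max\!\big(0,\; \lambda_k - \eta \, \big(J_S(\theta_{k+1}) - \tau\big)\big).
\end{aligned}
```

The primal step ascends the Lagrangian in $\theta$; the dual step raises $\lambda$ when $J_S < \tau$ and lowers it otherwise, which is what lets the weight on semantic preservation adapt during training instead of being fixed in advance.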

If this is right

DEPO improves attack success rates while remaining inside the allowed semantic-preservation region.
The method exhibits robustness to changes in domain, detector, and prompt.
Cross-dataset results on MAGE, M4, RAID, and peer-review texts hold against MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors.
The adaptive balancing during training allows the policy to increase evasion without violating the constraint.
Prompt-level consistency indicates the approach does not require per-prompt retuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The CMDP framing could extend to other constrained text-generation tasks such as style transfer under readability constraints.
Detector designers might need to account for policies trained under explicit semantic constraints rather than scalar rewards.
If the constraint enforcement proves stable, similar primal-dual methods could apply to safety constraints in LLM fine-tuning.

Load-bearing premise

Semantic preservation can be encoded and enforced as an explicit hard constraint inside the CMDP without creating new trade-offs or instabilities that reduce the evasion gains.

What would settle it

A direct comparison experiment in which the CMDP-constrained policy produces lower evasion success rates or higher semantic drift than an unconstrained baseline on the same datasets and detectors.

Figures

Figures reproduced from arXiv: 2606.00392 by Mingyi Wang, Shaofeng Zou, Yuheng Bu, Zhuoer Shen.

**Figure 2.** Detection score distributions across attack methods on four evaluation corpora.

**Figure 3.** ROC curves for original texts and DEPO-attacked texts under different reward-detector settings.

**Figure 4.** Qualitative comparison of paraphrasing attack results. This figure presents representative paraphrasing attack outputs for a given AI-generated sentence (Original) using the different methods described in Appendix A.2.

Original abstract

AI-text detectors are vulnerable to paraphrasing and detector-guided paraphrasing attacks, but existing detector-evasion methods often lack precise control over semantic preservation. In particular, optimizing directly for detector evasion can degrade fine-grained semantics, whereas scalarized reward designs provide only indirect, weight-sensitive control over the evasion-semantics trade-off. We address this limitation by formulating detector-evasive LLM paraphrasing as a Constrained Markov Decision Process, where detector evasion is the primary objective and semantic preservation is enforced as an explicit constraint. We propose Detector Evasion Policy Optimization (DEPO), a Lagrangian primal-dual reinforcement learning algorithm with a novel GRPO-style group-based policy update. DEPO adaptively balances semantic preservation and detector evasion during training, enabling the policy to improve attack success within a prescribed semantic-preservation region. Experiments on MAGE, M4, RAID, and peer-review datasets, evaluated against MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors, show that DEPO achieves strong detector evasion while precisely satisfying the semantic preservation constraint. DEPO also exhibits cross-domain, cross-detector, and prompt-level robustness.
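The abstract does not spell out DEPO's group-based update, which it describes as novel. For orientation only, the sketch below applies the standard GRPO advantage (as introduced in DeepSeekMath) to a Lagrangian-shaped reward: for G outputs sampled from one prompt, each reward is evasion plus λ(similarity − τ), and advantages are standardized within the group. The reward shaping, function names, and scores are hypothetical assumptions, and DEPO's actual update may differ.

```python
import statistics
from typing import List


def lagrangian_rewards(evasion: List[float], similarity: List[float],
                       lam: float, tau: float) -> List[float]:
    """Per-sample shaped reward r_i = evasion_i + lam * (similarity_i - tau)."""
    return [e + lam * (s - tau) for e, s in zip(evasion, similarity)]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: standardize rewards within one prompt's group.

    Uses the population standard deviation (implementations vary); eps avoids
    division by zero when every sample in the group gets the same reward.
    """
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Hypothetical scores for G = 4 outputs sampled for the same source text.
ev = [0.9, 0.7, 0.4, 0.8]
sim = [0.70, 0.88, 0.95, 0.86]
r = lagrangian_rewards(ev, sim, lam=1.5, tau=0.85)
adv = group_advantages(r)
print([round(a, 3) for a in adv])
```

Sample 0 has the highest evasion score but the lowest similarity, so the λ term pulls its shaped reward below the group mean and it receives a negative advantage; the group mean acts as a baseline without a learned critic.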

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

DEPO frames detector evasion as a CMDP with an explicit semantic constraint solved by Lagrangian primal-dual RL, but the abstract gives no evidence that the solver actually keeps the constraint satisfied without instability.

The main takeaway is that this paper casts LLM paraphrasing as a constrained MDP where detector evasion is the objective and semantic similarity is a hard constraint, then solves it with a Lagrangian method plus a GRPO-style group update. That formulation is the clearest novelty relative to prior scalarized reward approaches.

The work does a reasonable job of motivating why indirect weighting is insufficient and why an explicit constraint might be preferable for applications that need reliable semantic control. The reported experiments span multiple datasets and detectors, which at least shows an attempt at breadth.

The soft spot is exactly the one the stress-test flags. Lagrangian primal-dual methods routinely produce only approximate feasibility, and the abstract supplies no constraint-violation statistics, multiplier trajectories, or proof that the group update preserves the feasible region. Without those numbers it is impossible to know whether the claimed “precise” satisfaction is real or whether the evasion numbers are achieved by drifting outside the semantic bound. The free parameter (the semantic threshold) also sits right at the center of the claim, yet its sensitivity is not discussed.

This is a paper for people already working on adversarial attacks against text detectors or on constrained RL for language models. A reader who wants to see whether the CMDP framing actually delivers tighter control than existing methods would get value from the full version, but only after the optimization details are checked.

I would send it to peer review so the experimental section and any stability analysis can be examined properly.
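The desk note's concern about approximate feasibility depends partly on which quantity is constrained. If the constraint is on average similarity, as in an expectation-form CMDP, a batch can satisfy it while a sizable share of individual outputs fall below the threshold. The hypothetical helper below reports both views, plus the mean shortfall among violating samples; the threshold and scores are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FeasibilityReport:
    mean_similarity: float  # batch average of the similarity score
    mean_feasible: bool     # does the average meet the threshold?
    violation_rate: float   # fraction of individual samples below the threshold
    mean_shortfall: float   # average (tau - s) over violating samples only


def feasibility_report(similarity: Sequence[float], tau: float) -> FeasibilityReport:
    n = len(similarity)
    if n == 0:
        raise ValueError("need at least one similarity score")
    mean_s = sum(similarity) / n
    shortfalls = [tau - s for s in similarity if s < tau]
    return FeasibilityReport(
        mean_similarity=mean_s,
        mean_feasible=mean_s >= tau,
        violation_rate=len(shortfalls) / n,
        mean_shortfall=sum(shortfalls) / len(shortfalls) if shortfalls else 0.0,
    )


# Hypothetical scores: the batch mean clears tau = 0.85, yet 3 of 8 samples do not.
scores = [0.95, 0.97, 0.93, 0.92, 0.90, 0.80, 0.74, 0.78]
print(feasibility_report(scores, tau=0.85))
```

Reporting only the first two fields would make this batch look feasible; the violation rate and shortfall show how far individual outputs drift outside the bound.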

Referee Report

2 major / 2 minor

Summary. The paper formulates detector-evasive LLM paraphrasing as a Constrained Markov Decision Process (CMDP) with detector evasion as the objective and semantic preservation as an explicit constraint. It introduces Detector Evasion Policy Optimization (DEPO), a Lagrangian primal-dual RL algorithm using a novel GRPO-style group-based policy update. Experiments across MAGE, M4, RAID, and peer-review datasets against MAGE, RoBERTa, RADAR, Binoculars, and Fast-DetectGPT detectors claim strong evasion performance while precisely satisfying the semantic constraint, plus cross-domain, cross-detector, and prompt-level robustness.

Significance. If verified, the CMDP formulation with explicit hard constraint offers a principled alternative to scalarized reward designs for controlling the evasion-semantics trade-off, which is a clear methodological strength. The adaptive balancing during training and reported robustness across datasets and detectors would be notable contributions to understanding AI-text detector vulnerabilities, provided the optimization dynamics support the feasibility claims.

major comments (2)

[Abstract] Abstract: the central claim that DEPO achieves evasion 'while precisely satisfying the semantic preservation constraint' is load-bearing for the contribution; the manuscript provides no constraint-violation statistics, multiplier convergence plots, or feasibility analysis to substantiate this against known primal-dual instabilities in Lagrangian methods.
[Method] Method section (CMDP and Lagrangian setup): the assumption that semantic preservation can be enforced as a hard constraint inside the CMDP without new trade-offs or instabilities requires explicit empirical support; the GRPO-style group update is presented as novel but lacks reported evidence that it preserves feasibility or improves upon standard primal-dual convergence.

minor comments (2)

[Experiments] Experiments: specify the exact semantic-preservation threshold value, the similarity metric used to enforce it, and the fraction of outputs meeting it for each detector/dataset combination.
[Abstract] Abstract and §4: clarify baseline paraphrasing methods and whether they include recent detector-guided attacks for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments correctly identify that the manuscript lacks explicit empirical validation of constraint feasibility and optimization stability for the Lagrangian approach. We agree these additions are necessary to support the central claims and will incorporate them in the revision.

Referee: [Abstract] Abstract: the central claim that DEPO achieves evasion 'while precisely satisfying the semantic preservation constraint' is load-bearing for the contribution; the manuscript provides no constraint-violation statistics, multiplier convergence plots, or feasibility analysis to substantiate this against known primal-dual instabilities in Lagrangian methods.

Authors: We acknowledge that the current manuscript does not report constraint-violation statistics, Lagrange multiplier convergence plots, or a dedicated feasibility analysis. While the experimental results show semantic similarity scores consistently meeting the prescribed threshold alongside high evasion rates, this indirect evidence does not directly address potential primal-dual instabilities. In the revised version we will add: (i) per-epoch average and maximum constraint violation rates on all datasets, (ii) plots of the dual multiplier trajectory during training, and (iii) a short feasibility study comparing observed violations against a standard Lagrangian baseline. These additions will directly substantiate the 'precisely satisfying' claim. revision: yes
Referee: [Method] Method section (CMDP and Lagrangian setup): the assumption that semantic preservation can be enforced as a hard constraint inside the CMDP without new trade-offs or instabilities requires explicit empirical support; the GRPO-style group update is presented as novel but lacks reported evidence that it preserves feasibility or improves upon standard primal-dual convergence.

Authors: We agree that explicit empirical support for the CMDP constraint enforcement and for the GRPO-style update's effect on feasibility is missing. The group-based update is intended to reduce variance in policy gradients while respecting the constraint, yet no ablation or convergence comparison is provided. In revision we will include: (i) training curves of constraint violation for DEPO versus a standard primal-dual RL implementation without the group update, and (ii) quantitative comparison of final feasibility gap and number of epochs to stable multiplier convergence. This will supply the requested evidence on whether the GRPO modification improves stability. revision: yes
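The diagnostics promised in the first response, per-epoch average and maximum constraint violation (item i) and the dual multiplier trajectory (item ii), can be summarized numerically as well as plotted. The functions below are a hypothetical sketch, not something the paper reports: violation is measured as max(0, τ − s) per sample, and the multiplier counts as settled when its last `window` values span less than `tol`. The logs are made-up values.

```python
from typing import List, Sequence, Tuple


def epoch_violation_stats(similarity: Sequence[float], tau: float) -> Tuple[float, float]:
    """Return (average violation, maximum violation) over one epoch's samples.

    A sample's violation is max(0, tau - s), so samples meeting the
    threshold contribute zero.
    """
    if not similarity:
        raise ValueError("need at least one similarity score")
    violations = [max(0.0, tau - s) for s in similarity]
    return sum(violations) / len(violations), max(violations)


def multiplier_settled(lams: Sequence[float], window: int, tol: float) -> bool:
    """True if the last `window` multiplier values span less than `tol`."""
    if len(lams) < window:
        return False
    recent = lams[-window:]
    return max(recent) - min(recent) < tol


# Hypothetical logs: per-epoch similarity samples and the multiplier at each epoch end.
epochs: List[List[float]] = [[0.70, 0.90, 0.80], [0.84, 0.88, 0.86], [0.87, 0.89, 0.86]]
lam_log = [0.9, 0.62, 0.55, 0.54, 0.545]
for i, sims in enumerate(epochs):
    avg_v, max_v = epoch_violation_stats(sims, tau=0.85)
    print(f"epoch {i}: avg violation {avg_v:.3f}, max violation {max_v:.3f}")
print("multiplier settled:", multiplier_settled(lam_log, window=3, tol=0.02))
```

The window and tolerance are arbitrary choices; a stricter check would also require the average violation to stay near zero over the same window.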

Circularity Check

0 steps flagged

No significant circularity; new algorithm with external validation

The paper formulates detector-evasive paraphrasing as a CMDP and introduces the DEPO algorithm with a novel GRPO-style update. All performance claims are evaluated against external datasets (MAGE, M4, RAID, peer-review) and detectors (MAGE, RoBERTa, RADAR, Binoculars, Fast-DetectGPT). No equations, self-citations, or fitted parameters are shown to reduce the claimed results to inputs by construction. The derivation chain is self-contained against independent benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on standard RL modeling assumptions and introduces one tunable constraint threshold; beyond the prescribed semantic region, the abstract postulates no new entities or ad hoc constants.

free parameters (1)

semantic-preservation threshold
The constraint bound that defines the acceptable semantic region is described as prescribed but its selection procedure and sensitivity are not detailed.
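A basic sensitivity check for this free parameter is to sweep the threshold and record how the share of feasible outputs, and the evasion achievable among them, change. The sketch below assumes hypothetical per-sample (evasion, similarity) pairs for one fixed batch of outputs; it does not retrain the policy at each threshold, which a full sensitivity study would need to do.

```python
from typing import List, Sequence, Tuple


def threshold_sweep(samples: Sequence[Tuple[float, float]],
                    taus: Sequence[float]) -> List[Tuple[float, float, float]]:
    """For each tau, return (tau, feasible fraction, mean evasion among feasible).

    Mean evasion is reported as 0.0 when no sample meets the threshold.
    """
    rows = []
    for tau in taus:
        feasible_evasion = [e for e, s in samples if s >= tau]
        frac = len(feasible_evasion) / len(samples)
        mean_ev = sum(feasible_evasion) / len(feasible_evasion) if feasible_evasion else 0.0
        rows.append((tau, frac, mean_ev))
    return rows


# Hypothetical (evasion, similarity) pairs for one fixed batch of outputs.
data = [(0.95, 0.70), (0.90, 0.80), (0.75, 0.86), (0.60, 0.92), (0.40, 0.97)]
for tau, frac, ev in threshold_sweep(data, taus=[0.75, 0.85, 0.95]):
    print(f"tau={tau:.2f}  feasible={frac:.0%}  mean evasion among feasible={ev:.2f}")
```

In this toy batch, raising the threshold shrinks the feasible set and lowers the evasion available inside it, which is the trade-off a sensitivity analysis of the semantic threshold would need to quantify.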

axioms (1)

Domain assumption: paraphrasing can be modeled as a Markov decision process, with the detector output serving as the reward signal and semantic similarity as the constraint signal.
Invoked when the problem is cast as a Constrained MDP in the abstract.

pith-pipeline@v0.9.1-grok · 5738 in / 1415 out tokens · 28585 ms · 2026-06-28T23:04:12.835635+00:00 · methodology


Automatic detection of generated text is easiest when humans are fooled , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

[16] [16]

Findings of the association for computational linguistics: EMNLP 2021 , pages=

Turingbench: A benchmark environment for turing test in the age of neural text generation , author=. Findings of the association for computational linguistics: EMNLP 2021 , pages=

2021

[17] [17]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

MAGE: Machine-generated text detection in the wild , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[18] [18]

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

M4: Multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection , author=. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[19] [19]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Raid: A shared benchmark for robust evaluation of machine-generated text detectors , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[20] [20]

Advances in neural information processing systems , volume=

Defending against neural fake news , author=. Advances in neural information processing systems , volume=

[21] [21]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Roberta: A robustly optimized bert pretraining approach , author=. arXiv preprint arXiv:1907.11692 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1907

[22] [22]

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

Ghostbuster: Detecting text ghostwritten by large language models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2024

[23] [23]

International conference on machine learning , pages=

Detectgpt: Zero-shot machine-generated text detection using probability curvature , author=. International conference on machine learning , pages=. 2023 , organization=

2023

[24] [24]

International Conference on Learning Representations , volume=

Fast-detectgpt: Efficient zero-shot detection of machine-generated text via conditional probability curvature , author=. International Conference on Learning Representations , volume=

[25] [25]

Advances in neural information processing systems , volume=

Radar: Robust ai-text detection via adversarial learning , author=. Advances in neural information processing systems , volume=

[26] [26]

International Conference on Learning Representations , volume=

Dna-gpt: Divergent n-gram analysis for training-free detection of gpt-generated text , author=. International Conference on Learning Representations , volume=

[27] [27]

arXiv preprint arXiv:2401.12070 , year=

Spotting llms with binoculars: Zero-shot detection of machine-generated text , author=. arXiv preprint arXiv:2401.12070 , year=

work page arXiv

[28] [28]

International conference on machine learning , pages=

A watermark for large language models , author=. International conference on machine learning , pages=. 2023 , organization=

2023

[29] [29]

International Conference on Learning Representations , volume=

Language model detectors are easily optimized against , author=. International Conference on Learning Representations , volume=

[30] [30]

Advances in Neural Information Processing Systems , volume=

Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text , author=. Advances in Neural Information Processing Systems , volume=

[31] [31]

arXiv preprint arXiv:2503.08716 , year=

Authormist: Evading ai text detectors with reinforcement learning , author=. arXiv preprint arXiv:2503.08716 , year=

work page arXiv

[32] [32]

Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect) , pages=

SilverSpeak: evading AI-generated text detectors using homoglyphs , author=. Proceedings of the 1stWorkshop on GenAI Content Detection (GenAIDetect) , pages=

[33] [33]

arXiv preprint arXiv:2602.08934 , year=

StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors , author=. arXiv preprint arXiv:2602.08934 , year=

work page arXiv

[34] [34]

arXiv preprint arXiv:2603.16152 , year=

HIPO: Instruction Hierarchy via Constrained Reinforcement Learning , author=. arXiv preprint arXiv:2603.16152 , year=

work page arXiv

[35] [35]

Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers , pages=

Neural paraphrase generation with stacked residual LSTM networks , author=. Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers , pages=

2016

[36] [36]

arXiv preprint arXiv:2305.10847 , year=

Large language models can be guided to evade ai-generated text detection , author=. arXiv preprint arXiv:2305.10847 , year=

work page arXiv

[37] [37]

Proceedings of the aaai conference on artificial intelligence , volume=

A deep generative framework for paraphrase generation , author=. Proceedings of the aaai conference on artificial intelligence , volume=

[38] [38]

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , pages=

Controllable paraphrase generation with a syntactic exemplar , author=. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , pages=

[39] [39]

Findings of the Association for Computational Linguistics: EMNLP 2020 , pages=

ProphetNet: Predicting future n-gram for sequence-to-SequencePre-training , author=. Findings of the Association for Computational Linguistics: EMNLP 2020 , pages=

2020

[40] [40]

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , pages=

Paraphrase generation with deep reinforcement learning , author=. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing , pages=

2018

[41] [41]

Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining , pages=

Unsupervised paraphrasing via deep reinforcement learning , author=. Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining , pages=

[42] [42]

Exploring diverse expressions for paraphrase generation , author=. Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) , pages=

2019

[43] [43]

Sequence Level Training with Recurrent Neural Networks

Sequence level training with recurrent neural networks , author=. arXiv preprint arXiv:1511.06732 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[44] [44]

International Conference on Learning Representations , year =

An Actor-Critic Algorithm for Sequence Prediction , author =. International Conference on Learning Representations , year =

[45] [45]

International Conference on Learning Representations , year =

A Deep Reinforced Model for Abstractive Summarization , author =. International Conference on Learning Representations , year =

[46] [46]

Fine-Tuning Language Models from Human Preferences

Fine-tuning language models from human preferences , author=. arXiv preprint arXiv:1909.08593 , year=

work page internal anchor Pith review Pith/arXiv arXiv 1909

[47] [47]

Advances in neural information processing systems , volume=

Learning to summarize with human feedback , author=. Advances in neural information processing systems , volume=

[48] [48]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

[49] [49]

Advances in neural information processing systems , volume=

Direct preference optimization: Your language model is secretly a reward model , author=. Advances in neural information processing systems , volume=

[50] [50]

1999 , publisher =

Constrained Markov Decision Processes , author =. 1999 , publisher =

1999

[51] [51]

International Conference on Learning Representations , volume=

Safe rlhf: Safe reinforcement learning from human feedback , author=. International Conference on Learning Representations , volume=

[52] [52]

Advances in Neural Information Processing Systems , volume=

Stepwise alignment for constrained language model policy optimization , author=. Advances in Neural Information Processing Systems , volume=

[53] [53]

International conference on machine learning , pages=

Constrained policy optimization , author=. International conference on machine learning , pages=. 2017 , organization=

2017

[54] [54]

Advances in neural information processing systems , volume=

Paraphrasing evades detectors of ai-generated text, but retrieval is an effective defense , author=. Advances in neural information processing systems , volume=

[55] [55]

Transactions on Machine Learning Research , year=

Can AI-generated text be reliably detected? stress testing AI text detectors under various attacks , author=. Transactions on Machine Learning Research , year=

[56] [56]

1999 , publisher=

Nonlinear multiobjective optimization , author=. 1999 , publisher=

1999

[57] [57]

Structural and multidisciplinary optimization , volume=

The weighted sum method for multi-objective optimization: new insights , author=. Structural and multidisciplinary optimization , volume=. 2010 , publisher=

2010

[58] [58]

Reward Constrained Policy Optimization

Reward constrained policy optimization , author=. arXiv preprint arXiv:1805.11074 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[59] [59]

arXiv preprint arXiv:2301.07597 , year=

How close is chatgpt to human experts? comparison corpus, evaluation, and detection , author=. arXiv preprint arXiv:2301.07597 , year=

work page arXiv

[60] [60]

arXiv e-prints , pages=

Is your paper being reviewed by an LLM? A new benchmark dataset and approach for detecting AI text in peer review , author=. arXiv e-prints , pages=

[61] [61]

arXiv preprint arXiv:2302.07731 , year=

Combat ai with ai: Counteract machine-generated fake restaurant reviews on social media , author=. arXiv preprint arXiv:2302.07731 , year=

work page arXiv

[62] [62]

arXiv preprint arXiv:2509.04460 , year=

CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection , author=. arXiv preprint arXiv:2509.04460 , year=

work page arXiv

[63] [63]

Advances in Neural Information Processing Systems , volume=

Detective: Detecting ai-generated text via multi-level contrastive learning , author=. Advances in Neural Information Processing Systems , volume=

[64] [64]

arXiv preprint arXiv:2602.13042 , year=

Gptzero: Robust detection of llm-generated texts , author=. arXiv preprint arXiv:2602.13042 , year=

work page arXiv

[65] [65]

Computational Linguistics , volume=

A survey on llm-generated text detection: Necessity, methods, and future directions , author=. Computational Linguistics , volume=

[66] [66]

Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

A survey on detection of llms-generated content , author=. Findings of the Association for Computational Linguistics: EMNLP 2024 , pages=

2024

[67] [67]

arXiv preprint arXiv:2403.01152 , year=

A survey of ai-generated text forensic systems: Detection, attribution, and characterization , author=. arXiv preprint arXiv:2403.01152 , year=

work page arXiv

[68] [68]

Computer Science Review , volume=

AI-generated text detection: A comprehensive review of methods, datasets, and applications , author=. Computer Science Review , volume=. 2025 , publisher=

2025

[69] [69]

Mathematics , volume=

Enhancing the Robustness of AI-Generated Text Detectors: A Survey , author=. Mathematics , volume=. 2025 , publisher=

2025

[70] [70]

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Are ai-generated text detectors robust to adversarial perturbations? , author=. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[71] [71]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

Humanizing machine-generated content: evading AI-text detection through adversarial attack , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

2024

[72] [72]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Deepseekmath: Pushing the limits of mathematical reasoning in open language models , author=. arXiv preprint arXiv:2402.03300 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[73] [73]

von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin , license =

[74] [74]

Advances in neural information processing systems , volume=

Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=

[75] [75]

Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

G-eval: NLG evaluation using gpt-4 with better human alignment , author=. Proceedings of the 2023 conference on empirical methods in natural language processing , pages=

2023

[76] [76]

, author=

Lora: Low-rank adaptation of large language models. , author=. Iclr , volume=