pith. machine review for the scientific record.

arxiv: 2605.14240 · v1 · submitted 2026-05-14 · 💻 cs.LG

Recognition: no theorem link

Paraphrasing Attack Resilience of Various AI-Generated Text Detection Methods

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:31 UTC · model grok-4.3

classification 💻 cs.LG
keywords AI-generated text detection · paraphrasing attacks · Binoculars detector · RoBERTa classifier · ensemble methods · attack resilience · machine learning

The pith

Binoculars-inclusive ensembles detect AI text most accurately but lose the largest share of that accuracy when text is paraphrased.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates how well current AI-generated text detectors withstand paraphrasing attacks that rewrite content to evade detection. It compares a fine-tuned RoBERTa model, the Binoculars detector, text feature analysis, and Random Forest ensembles built from combinations of these three. The central result is that ensembles containing Binoculars achieve the highest detection rates on unaltered text yet suffer the steepest declines once the text has been paraphrased, exposing a performance-resilience trade-off.
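The stacking design described above, three detector outputs fed to a Random Forest meta-learner, can be sketched roughly as follows. The scoring function here is a random stand-in, not the paper's implementation: the real pipeline would supply scores from fine-tuned RoBERTa, Binoculars, and text feature analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def detector_scores(texts):
    # Stand-in for the three detectors' per-document scores:
    # columns would be RoBERTa, Binoculars, and text-feature outputs.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), 3))

train_texts = ["..."] * 100           # placeholder corpus
y_train = np.array([0, 1] * 50)       # 0 = human-written, 1 = machine-generated

# Stack the three detector scores into a feature matrix and let a
# Random Forest meta-learner combine them.
X_train = detector_scores(train_texts)
meta_learner = RandomForestClassifier(n_estimators=100, random_state=0)
meta_learner.fit(X_train, y_train)

# Classify a new document from its stacked detector scores.
pred = meta_learner.predict(detector_scores(["some document"]))
```

A paraphrasing-attack evaluation would score the same ensemble twice, once on original and once on paraphrased test text, and compare the two F1 scores.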

Core claim

Binoculars-inclusive ensembles yield the strongest detection results but also suffer the most significant losses during attacks, illustrating a dichotomy of performance versus resilience among state-of-the-art AI text detection techniques.

What carries the argument

Comparison of standalone detectors (fine-tuned RoBERTa, Binoculars, text feature analysis) and their Random Forest ensembles under controlled paraphrasing attacks.

If this is right

  • Higher-accuracy detectors are more vulnerable to simple rewriting attacks and may require extra defenses.
  • Practical systems must choose between peak detection rates and consistent behavior under evasion.
  • Ensemble construction improves baseline performance but inherits the attack vulnerabilities of its strongest component.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adversarial training focused on paraphrases could narrow the observed resilience gap for Binoculars-based methods.
  • Detection pipelines might benefit from runtime selection among models depending on detected rewriting patterns.
  • The trade-off may extend to other evasion techniques such as translation or style transfer.

Load-bearing premise

The paraphrasing attacks and evaluation datasets used are representative of real-world evasion attempts and the reported performance differences generalize beyond the specific test conditions.

What would settle it

A follow-up test on a fresh dataset using different paraphrasing tools in which Binoculars-inclusive ensembles no longer show the largest accuracy drops.

Figures

Figures reproduced from arXiv: 2605.14240 by Andrii Shportko, Inessa Verbitsky.

Figure 1
Figure 1: Pipeline of our model, which is described in Appendix A.0.1. First, we chose to fine-tune RoBERTa for AI text detection because it provided a substantial improvement in the model's ability to understand nuanced language differences. In essence, we added a final layer of size 2 for binary classification. It is also a well-tested approach in machine-generated text detection (Liu et al., 2019). We performed f…
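The "final layer of size 2" the caption mentions amounts to a linear map from the encoder's pooled representation to two logits, one per class. A minimal numpy sketch, assuming RoBERTa's usual hidden size of 768 and random stand-in weights in place of the trained encoder and head:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(768, 2))   # classification-head weights (untrained stand-in)
b = np.zeros(2)                              # classification-head bias

# Pooled encoder outputs for 4 documents; in the paper these would come
# from the fine-tuned RoBERTa encoder.
pooled = rng.normal(size=(4, 768))

logits = pooled @ W + b                      # shape (4, 2): human vs. machine
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
```

Fine-tuning would train both the encoder and this head with a cross-entropy loss on the binary labels.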
Figure 4
Figure 4: Pre-attack F1 scores. The ensemble incorporating all modules (Text Features, RoBERTa, and Binoculars) achieves the highest F1 score of 80.61%. The second-best performance is observed when Text Features and Binoculars are combined. While combining Text Features with RoBERTa or RoBERTa with Binoculars also improves performance compared to individual features, they fall short of the comprehensive ensemble. N…
Figure 2
Figure 2: Binoculars results. 4.1.2 Context Window Effect: We observed that the information gain increases as the context window increases. However, the information gain plateaus somewhere after 256–512 tokens. The Jensen-Shannon (JS) divergence score …
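The Binoculars score the figure refers to is, per the cited Hans et al. preprint, the observer model's log-perplexity divided by the observer-performer cross log-perplexity, with low scores indicating machine-like text. A numpy sketch of that ratio, assuming the per-token log-probabilities and distributions have already been extracted from the two language models:

```python
import numpy as np

def log_ppl(token_logprobs):
    # Mean negative log-likelihood of the observed tokens under the observer.
    return -np.mean(token_logprobs)

def cross_log_ppl(observer_dists, performer_logdists):
    # Mean cross-entropy between the observer's next-token distribution and
    # the performer's log-distribution at each position.
    return -np.mean(np.sum(observer_dists * performer_logdists, axis=-1))

def binoculars_score(token_logprobs, observer_dists, performer_logdists):
    return log_ppl(token_logprobs) / cross_log_ppl(observer_dists, performer_logdists)

# Toy two-token, two-symbol example with identical observer and performer.
dists = np.array([[0.5, 0.5], [0.25, 0.75]])
score = binoculars_score(np.log([0.5, 0.75]), dists, np.log(dists))
```

Detection then thresholds the score; the context-window effect in the figure corresponds to how many tokens these per-position averages run over.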
Figure 5
Figure 5: Post-attack F1 scores. Among individual models, RoBERTa demonstrated the highest resilience to paraphrasing attacks, showing almost no degradation …
Figure 3
Figure 3: Binoculars score over context window of 512 w/o quantization.
Figure 6
Figure 6: F1 score comparison and degradation. 5 Discussion, 5.1 Analysis of Results: As demonstrated by our Results, we introduced a Cohesive Testing Framework (CTF) for classifying text as human- versus machine-written. Our system streamlines the ensembling process by feeding the document input into three detectors – Binoculars, Text Features, and RoBERTa – which are then stacked and evaluated by our meta-learner, R…
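The degradation Figure 6 compares is just the relative F1 drop between the pre- and post-attack evaluations. A one-liner makes the metric explicit; 80.61 is the pre-attack F1 the Figure 4 caption reports for the full ensemble, while the post-attack value below is a placeholder, not a result from the paper:

```python
def relative_degradation(pre_f1, post_f1):
    # Fraction of pre-attack F1 lost under the paraphrasing attack.
    return (pre_f1 - post_f1) / pre_f1

# 80.61 is reported pre-attack; 65.0 is a made-up post-attack placeholder.
drop = relative_degradation(80.61, 65.0)
print(f"{drop:.1%}")
```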
read the original abstract

The recent large-scale emergence of LLMs has left an open space for dealing with their consequences, such as plagiarism or the spread of false information on the Internet. Coupling this with the rise of AI detector bypassing tools, reliable machine-generated text detection is in increasingly high demand. We investigate the paraphrasing attack resilience of various machine-generated text detection methods, evaluating three approaches: fine-tuned RoBERTa, Binoculars, and text feature analysis, along with their ensembles using Random Forest classifiers. We discovered that Binoculars-inclusive ensembles yield the strongest results, but they also suffer the most significant losses during attacks. In this paper, we present the dichotomy of performance versus resilience in the world of AI text detection, which complicates the current perception of reliability among state-of-the-art techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates the paraphrasing attack resilience of three AI-generated text detection approaches—fine-tuned RoBERTa, Binoculars, and text feature analysis—along with their Random Forest ensembles. It claims that Binoculars-inclusive ensembles deliver the strongest detection performance but suffer the largest drops under paraphrasing attacks, revealing a performance-resilience trade-off.

Significance. If the empirical comparisons are rigorously supported, the identification of a performance-resilience dichotomy in Binoculars-inclusive ensembles would be a useful practical observation for the design of AI text detectors. It could help the community avoid over-reliance on high-performing but brittle methods when facing real evasion attempts.

major comments (3)
  1. Abstract: the key discovery is stated, but no quantitative results, dataset descriptions, attack implementation details, or error analysis are supplied to support the claimed performance-resilience dichotomy. This omission leaves the magnitude of the reported losses unanchored.
  2. Evaluation section: the representativeness of the paraphrasing attacks (paraphraser models, prompt strategies, lexical diversity) and evaluation datasets is not justified with concrete details on domain coverage or attack strength, which is load-bearing for the generalizability of the trade-off claim.
  3. Results section: performance differences and loss magnitudes under attacks are presented without statistical significance tests, confidence intervals, or ablation on ensemble feature importance, making it impossible to confirm that Binoculars-inclusive ensembles incur reliably larger drops.
minor comments (2)
  1. Add explicit descriptions of the Random Forest hyperparameters and feature sets used in the ensembles to improve reproducibility.
  2. Ensure all tables reporting accuracy or F1 scores include standard deviations across multiple runs or folds.
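The statistical checks the referee asks for can be sketched with scipy; the per-fold F1 drops below are illustrative placeholders, not the paper's numbers.

```python
import numpy as np
from scipy import stats

# Placeholder per-fold F1 drops (fractions) for two ensembles.
drops_binoculars = np.array([0.18, 0.21, 0.19, 0.22, 0.20])
drops_other      = np.array([0.09, 0.11, 0.10, 0.12, 0.08])

# Paired t-test: do Binoculars-inclusive ensembles drop more, fold by fold?
t_stat, p_value = stats.ttest_rel(drops_binoculars, drops_other)

# 95% confidence interval on the mean difference in drops.
diff = drops_binoculars - drops_other
ci = stats.t.interval(0.95, df=len(diff) - 1,
                      loc=diff.mean(), scale=stats.sem(diff))
```

A confidence interval that excludes zero would support the claim that the larger drops are reliable rather than run-to-run noise.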

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for strengthening the presentation of our empirical findings on the performance-resilience trade-off in AI-generated text detectors. We address each major comment below.

read point-by-point responses
  1. Referee: Abstract: the key discovery is stated, but no quantitative results, dataset descriptions, attack implementation details, or error analysis are supplied to support the claimed performance-resilience dichotomy. This omission leaves the magnitude of the reported losses unanchored.

    Authors: We agree that the abstract would be strengthened by including quantitative anchors for the claimed dichotomy. We have revised the abstract to report specific metrics (e.g., pre- and post-attack F1 scores for the top-performing ensembles), along with concise references to the datasets and paraphrasing attack setups used in the study. revision: yes

  2. Referee: Evaluation section: the representativeness of the paraphrasing attacks (paraphraser models, prompt strategies, lexical diversity) and evaluation datasets is not justified with concrete details on domain coverage or attack strength, which is load-bearing for the generalizability of the trade-off claim.

    Authors: We have expanded the Evaluation section to include explicit details on the paraphraser models (e.g., T5-based and GPT variants), prompt strategies, lexical diversity controls, and the multi-domain datasets (news, academic, and web text). We added a justification subsection explaining their selection based on coverage of real-world evasion scenarios and domain diversity to support the generalizability of the observed trade-off. revision: yes

  3. Referee: Results section: performance differences and loss magnitudes under attacks are presented without statistical significance tests, confidence intervals, or ablation on ensemble feature importance, making it impossible to confirm that Binoculars-inclusive ensembles incur reliably larger drops.

    Authors: We acknowledge the value of statistical rigor for validating the larger drops. We have updated the Results section to include paired t-tests with reported p-values, 95% confidence intervals on the performance losses, and an ablation study on Random Forest feature importance. These additions confirm that Binoculars features drive both the high baseline performance and the statistically significant larger drops under attack. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation of existing detectors

full rationale

The paper conducts an empirical comparison of three pre-existing detection methods (fine-tuned RoBERTa, Binoculars, text feature analysis) and their Random Forest ensembles on paraphrased text. No equations, derivations, fitted parameters, or self-citations are used to derive or justify any result by construction. All claims rest on experimental outcomes that can be independently reproduced or falsified on external datasets. This matches the default case of a self-contained empirical study with no load-bearing reductions to prior inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are described in the abstract; the work consists of empirical testing of standard machine learning detection methods.

pith-pipeline@v0.9.0 · 5428 in / 1035 out tokens · 57794 ms · 2026-05-15T01:31:41.008483+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

14 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1] Gptinf: AI content detection bypass tool. https://www.gptinf.com/. Accessed 2025-01-31.
  2. [2] Harika Abburi, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, and Sanmitra Bhattacharya. Generative AI text classification using ensemble LLM approaches. Preprint, arXiv:2309.07755.
  3. [3] Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, and Furong Huang. On the possibilities of AI-generated text detection. Preprint, arXiv:2304.04736.
  4. [4] Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools. Cell Reports Physical Science, 4(6):101426. Epub 2023 Jun.
  5. [5] Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Spotting LLMs with Binoculars: Zero-shot detection of machine-generated text. Preprint, arXiv:2401.12070.
  6. [6] Xiaomeng Hu, Pin-Yu Chen, and Tsung-Yi Ho. RADAR: Robust AI-text detection via adversarial learning. Preprint, arXiv:2307.03838.
  7. [7] Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. Preprint, arXiv:2303.13408.
  8. [8] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
  9. [9] Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares. LuxVeri at GenAI Detection Task 1: Inverse perplexity weighted ensemble for robust detection of AI-generated text across English and multilingual contexts. Preprint, arXiv:2501.11914.
  10. [10] Misconduct in biomedical research: A meta-analysis and systematic review. Journal of International Society of Preventive & Community Dentistry, 13(3):185–193.
  11. [11] Discriminating AI-generated fake news. Procedia Computer Science, 225:3822–3831. 27th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2023).
  12. [12] Vivek Verma, Eve Fleisig, Nicholas Tomlin, and Dan Klein. Ghostbuster: Detecting text ghostwritten by large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1702–1717, Mexico City, Mexico. Association for Computational Linguistics.
  13. [13] Yuxia Wang, Artem Shelmanov, Joni… GenAI content detection task 1: English and multilingual machine-generated text detection: AI vs. human. arXiv preprint arXiv:2501.11012.
  14. [14] Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Derek F. Wong, and Lidia S. Chao. A survey on LLM-generated text detection: Necessity, methods, and future directions. Preprint, arXiv:2310.14724.

Appendix A.0.1 (Dataset): The training dataset contained a total of 610k entries from HC3, M4GT, and MAGE. The test dataset contained a total of 74k entries from CU-DRT, IELTS, NLPeer, PeerSum, and MixSet. We replicated 3 methods as well as thei…