No Hidden Prompts Needed! You Can Game AI Peer Review with Presentation-Only Revisions
Pith reviewed 2026-06-27 06:36 UTC · model grok-4.3
The pith
AI peer reviewers award higher scores after changes to only a paper's abstract, framing and narrative.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10 by modifying only presentation-level content while keeping scientific evidence fixed. Strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes. AI reviewers are easier to impress than to convince, and they can confuse the appearance of addressing a limitation with actually resolving it.
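To make the two headline metrics concrete, the following is a minimal sketch of how an attack success rate and a mean score gain could be computed from paired reviewer scores. The success criterion (any strict score increase) and the sample scores are assumptions for illustration; the abstract does not state the paper's exact definition, and `attack_metrics` is a hypothetical helper.

```python
from statistics import mean

def attack_metrics(before, after):
    """Return (attack_success_rate, mean_score_gain) for paired scores."""
    if len(before) != len(after) or not before:
        raise ValueError("need two non-empty score lists of equal length")
    gains = [a - b for b, a in zip(before, after)]
    # Assumption: an attack "succeeds" when the revised paper scores strictly higher.
    successes = sum(1 for g in gains if g > 0)
    return successes / len(gains), mean(gains)

# Hypothetical scores on a 10-point scale, not data from the paper.
before = [4.0, 5.0, 6.0, 5.0]
after = [5.0, 6.5, 6.0, 7.0]
asr, gain = attack_metrics(before, after)
print(f"success rate: {asr:.1%}, mean gain: {gain:+.2f}/10")
```

A threshold-based criterion (for example, a gain of at least one point) would give a lower success rate on the same data, so the definition matters when comparing reported numbers.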
What carries the argument
Adversarial repackaging: a closed-loop attack that uses AI-reviewer feedback to search for presentation-level revisions while keeping the scientific evidence fixed.
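The closed-loop idea can be sketched as a greedy search: review the paper, propose presentation-level rewrites from the feedback, and keep a rewrite only if the reviewer's score rises. This is an illustrative sketch, not the authors' framework; `repackage`, `toy_review`, and `toy_propose` are hypothetical names, and the toy reviewer stands in for a call to an AI reviewer.

```python
from typing import Callable, List, Tuple

Review = Tuple[float, str]  # (score, feedback text)

def repackage(paper: str,
              review: Callable[[str], Review],
              propose: Callable[[str, str], List[str]],
              rounds: int = 3) -> Tuple[str, float]:
    """Greedy closed-loop search over presentation-only rewrites."""
    best = paper
    best_score, feedback = review(paper)
    for _ in range(rounds):
        improved = False
        # Candidates are generated once per round from the current best version.
        for candidate in propose(best, feedback):
            score, cand_feedback = review(candidate)
            if score > best_score:
                best, best_score, feedback = candidate, score, cand_feedback
                improved = True
        if not improved:
            break  # no candidate raised the score; stop early
    return best, best_score

def toy_review(text: str) -> Review:
    # Toy stand-in: rewards the word "novel". A real reviewer would be an LLM call.
    return min(10.0, 4.0 + text.count("novel")), "clarify novelty"

def toy_propose(text: str, feedback: str) -> List[str]:
    # Toy stand-in: one framing-heavy rewrite and one stripped-down rewrite.
    return [text + " novel", text.replace("novel", "")]

final_text, final_score = repackage("Abstract: a method.", toy_review, toy_propose, rounds=2)
```

In a real setting the proposer would also need a guard that rejects any candidate touching methods, experiments, figures, equations, proofs, or numerical results, since the attack is defined by keeping those fixed.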
If this is right
- Presentation-only revisions can produce large score gains without any alteration to scientific content.
- Interpretation-shifting edits outperform surface-level polishing.
- Highlighting strengths reliably raises perceived merit, whereas attempts to dissolve weaknesses frequently backfire.
- Unchanged evidence can be reinterpreted as a stronger contribution through narrative adjustments.
Where Pith is reading between the lines
- AI review systems may require explicit mechanisms that anchor scores to concrete evidence rather than narrative framing.
- The released contamination-free benchmark enables repeated testing of whether future AI reviewers stay anchored to scientific content.
- In deployed review pipelines, authors could systematically optimize presentation for AI scoring even when the underlying work is unchanged.
- Whether human reviewers exhibit similar sensitivity to presentation repositioning remains outside the scope of the reported experiments.
Load-bearing premise
The modifications to abstract, contribution framing, related work, discussion, and narrative structure constitute purely presentation-level changes that do not alter the interpretation or perceived strength of the underlying scientific evidence.
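One part of this premise is mechanically checkable: that the evidence-bearing sections are textually identical before and after revision. The sketch below assumes the paper is available as a mapping from section name to text; the section names and `changed_protected_sections` are hypothetical. It cannot detect the harder case, where unchanged evidence is reinterpreted through edited framing.

```python
import hashlib

# Assumption: these section names mark the evidence that must stay fixed.
PROTECTED = ("methods", "experiments", "results")

def changed_protected_sections(original, revised, protected=PROTECTED):
    """Return protected section names whose text differs between versions."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    return [name for name in protected
            if digest(original.get(name, "")) != digest(revised.get(name, ""))]

# Hypothetical paper contents for illustration only.
original = {
    "abstract": "We propose X.",
    "methods": "Train for 10 epochs.",
    "results": "Accuracy 81.2%.",
}
revised = dict(original, abstract="We propose X, a principled framework.")
violations = changed_protected_sections(original, revised)
tampered = changed_protected_sections(original, dict(original, results="Accuracy 85%."))
```

An empty `violations` list shows only that the protected text is byte-identical, which is a necessary condition for the premise but not a sufficient one.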
What would settle it
Apply the same set of presentation revisions to a paper, but instruct the AI reviewer to ignore all narrative, abstract, and discussion text and to score only the methods, experiments, and numerical results. If the score gain over the original paper persists unchanged under that instruction, it cannot be attributed to presentation and the claim is falsified.
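One way to read this test is as an ablation that compares two score gains: the gain when the reviewer reads the whole paper, and the gain when it is told to score only the evidence. The sketch below assumes that reading; the function name and the scores are hypothetical, not results from the paper.

```python
from statistics import mean

def presentation_ablation(full_before, full_after, evid_before, evid_after):
    """Return (full_gain, evidence_only_gain, difference) as mean score gains."""
    full_gain = mean(a - b for b, a in zip(full_before, full_after))
    evid_gain = mean(a - b for b, a in zip(evid_before, evid_after))
    # A large difference suggests the gain depends on the narrative text;
    # a difference near zero suggests it survives evidence-only scoring.
    return full_gain, evid_gain, full_gain - evid_gain

# Hypothetical scores on a 10-point scale.
full_gain, evid_gain, diff = presentation_ablation(
    full_before=[5.0, 6.0], full_after=[6.5, 7.0],
    evid_before=[5.0, 6.0], evid_after=[5.0, 6.5],
)
```

The comparison is only as reliable as the reviewer's compliance with the evidence-only instruction, which would need to be checked separately.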
Original abstract
As AI-generated reviews move from experimental tools into peer-review infrastructure, most robustness concerns have focused on explicit attacks such as hidden instructions and prompt injection. We study a harder and more policy-relevant failure mode: no hidden text, no prompt injection, and no changes to methods, experiments, figures, equations, proofs, or numerical results. The attacker modifies only presentation-level content, such as the abstract, contribution framing, related work, discussion, and narrative structure. We introduce adversarial repackaging: a closed-loop attack that uses AI-reviewer feedback to search for presentation-level revisions while keeping the scientific evidence fixed. Across three mainstream AI reviewers, adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10. The effect is not explained by ordinary prose polishing. We also reveal that strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes. Our analysis reveals two deeper structural failure modes. First, AI reviewers are easier to impress than to convince: highlighting strengths reliably increases perceived merit, while attempts to dissolve weaknesses frequently backfire. Second, AI reviewers can confuse the appearance of addressing a limitation with actually resolving it, allowing unchanged evidence to be reinterpreted as stronger scientific contribution. These results show that the deployment risk is not only malicious hidden instructions, but the emergence of paper presentation itself as an optimization surface. We release a contamination-free rolling benchmark and attack framework for testing whether AI reviewers remain anchored to scientific content under presentation-only edits.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that AI peer reviewers can be successfully attacked ('gamed') using only presentation-level revisions—such as changes to the abstract, contribution framing, related work, discussion, and narrative structure—while keeping all scientific evidence, methods, experiments, figures, equations, and numerical results fixed. It introduces a closed-loop 'adversarial repackaging' attack that uses AI-reviewer feedback to optimize these revisions. Across three mainstream AI reviewers, the approach yields a 75.1% attack success rate and mean score gain of +1.21/10. The paper further identifies two structural failure modes (AI reviewers are easier to impress than to convince; they confuse the appearance of addressing limitations with actual resolution) and releases a contamination-free rolling benchmark and attack framework.
Significance. If the strict separation between presentation and interpretive changes can be maintained, the results would demonstrate that AI reviewers are vulnerable to optimization over narrative framing alone, with implications for any deployment of AI in peer review. The release of a contamination-free rolling benchmark and attack framework is a concrete strength that supports reproducibility and future testing of whether AI reviewers remain anchored to scientific content.
major comments (1)
- [Abstract] The central claim requires that all revisions keep 'the scientific evidence fixed' and constitute 'presentation-level content' only. However, the abstract explicitly states that 'strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits'. Related-work repositioning alters perceived novelty and contribution claims; analytical discussion expansion reframes interpretive claims about the fixed results. These are not presentation-only under any standard definition and directly violate the 'evidence fixed' precondition, so the reported 75.1% success rate and +1.21 score gain cannot be isolated to presentation effects.
minor comments (1)
- The abstract refers to a 'contamination-free rolling benchmark' but provides no details on the contamination checks or rolling mechanism; adding a brief description would improve clarity without affecting the core argument.
Simulated Author's Rebuttal
We thank the referee for the detailed comment on the scope of presentation-level revisions. We respond point-by-point below.
Point-by-point responses
- Referee: [Abstract] The central claim requires that all revisions keep 'the scientific evidence fixed' and constitute 'presentation-level content' only. However, the abstract explicitly states that 'strategies that change how the reviewer interprets the paper, such as related-work repositioning and analytical discussion expansion, substantially outperform surface edits'. Related-work repositioning alters perceived novelty and contribution claims; analytical discussion expansion reframes interpretive claims about the fixed results. These are not presentation-only under any standard definition and directly violate the 'evidence fixed' precondition, so the reported 75.1% success rate and +1.21 score gain cannot be isolated to presentation effects.
  Authors: We thank the referee for this observation. The manuscript explicitly includes related-work repositioning and analytical discussion expansion within the scope of presentation-level revisions, as these operations modify only the narrative structure and framing around the unchanged scientific content. Related-work repositioning entails recontextualizing the contribution by adjusting references to prior work without changing the paper's methods or results. Analytical discussion expansion involves elaborating on the implications and interpretations of the fixed experimental findings. These are not changes to the evidence itself but to how it is presented and interpreted by the reviewer. The paper demonstrates that AI reviewers are susceptible to such framing adjustments, which is the core finding. The distinction from surface edits is intentional, as the results show narrative strategies are more impactful. Thus, the success metrics are for this class of revisions as defined. We maintain that the precondition is satisfied and no revision to the manuscript is necessary.
  Revision: no
Circularity Check
No significant circularity; empirical measurement study
Full rationale
The paper is an empirical attack study that directly measures attack success rates and score gains on external AI reviewers. No mathematical derivations, equations, fitted parameters, or self-citation load-bearing steps are present in the provided text or abstract. The central results (75.1% success rate, +1.21 mean gain) are obtained by running the attack against independent systems rather than reducing to any input by construction. The noted tension between 'evidence fixed' and 'interpretation-changing strategies' is a validity concern, not a circularity reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AI reviewers can be influenced by changes in presentation framing without changes to scientific content.