Recognition: unknown
When AI reviews science: Can we trust the referee?
Pith reviewed 2026-05-08 06:15 UTC · model grok-4.3
The pith
AI peer review is vulnerable to manipulation by hidden prompts, prestige framing, and rebuttal sycophancy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The work develops a taxonomy of security and reliability threats to AI peer review and instantiates it with four treatment-control experiments on a stratified sample of ICLR 2025 submissions. Using two advanced LLMs as referees, the probes isolate causal effects of prestige framing, assertion strength, rebuttal sycophancy, and contextual poisoning, producing measurable shifts in review scores. The resulting audit supplies an evidence-based baseline for tracking AI referee reliability and pinpoints concrete failure points that can guide mitigations.
What carries the argument
A lifecycle taxonomy of attacks on AI peer review paired with four treatment-control probes on stratified ICLR 2025 submissions using two LLMs to measure score changes from specific manipulations.
If this is right
- Hidden prompt injections can steer AI reviews toward unjustifiably positive judgments.
- AI referees exhibit measurable brittleness to authority, length, and assertion-strength biases.
- Rebuttals can induce sycophantic adjustments in AI-generated scores.
- Contextual poisoning of manuscripts affects downstream review outcomes.
- The taxonomy and audit together supply a repeatable baseline for monitoring AI peer-review reliability over time.
Where Pith is reading between the lines
- Conferences could add automated scanners for hidden instructions before AI review begins.
- Hybrid human-AI systems might reduce risk by routing flagged manuscripts to human oversight.
- Domain-specific testing at other venues could reveal whether failure modes vary by field or submission volume.
- Specialized review-tuned LLMs might be hardened against the documented attack vectors.
Load-bearing premise
The causal effects seen with two specific LLMs on ICLR 2025 submissions will hold for other models, conferences, and review settings.
What would settle it
A larger experiment using different LLMs on submissions from multiple conferences finds no score shifts from the same prompt injections, prestige framing, or rebuttal phrasing.
Figures
read the original abstract
The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) offer impressive capabilities in summarization, fact checking, and literature triage, making the integration of AI into peer review increasingly attractive -- and, in practice, unavoidable. Yet early deployments and informal adoption have exposed acute failure modes. Recent incidents have revealed that hidden prompt injections embedded in manuscripts can steer LLM-generated reviews toward unjustifiably positive judgments. Complementary studies have also demonstrated brittleness to adversarial phrasing, authority and length biases, and hallucinated claims. These episodes raise a central question for scholarly communication: when AI reviews science, can we trust the AI referee? This paper provides a security- and reliability-centered analysis of AI peer review. We map attacks across the review lifecycle -- training and data retrieval, desk review, deep review, rebuttal, and system-level. We instantiate this taxonomy with four treatment-control probes on a stratified set of ICLR 2025 submissions, using two advanced LLM-based referees to isolate the causal effects of prestige framing, assertion strength, rebuttal sycophancy, and contextual poisoning on review scores. Together, this taxonomy and experimental audit provide an evidence-based baseline for assessing and tracking the reliability of AI peer review and highlight concrete failure points to guide targeted, testable mitigations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to provide a security- and reliability-centered analysis of AI peer review by mapping attacks across the review lifecycle in a taxonomy and instantiating it with four treatment-control probes (prestige framing, assertion strength, rebuttal sycophancy, contextual poisoning) on a stratified set of ICLR 2025 submissions using two advanced LLMs. These elements together are presented as an evidence-based baseline for assessing AI peer review reliability and highlighting failure points for mitigations.
Significance. If the results hold, this work would offer a structured taxonomy and empirical evidence from controlled probes that could serve as a foundation for evaluating and improving the trustworthiness of AI in peer review processes. The experimental approach using treatment-control designs is a strength, allowing isolation of specific causal effects on review scores.
major comments (2)
- [Abstract] The abstract outlines the experimental audit but provides no information on sample sizes, statistical methods, effect sizes, or observed results from the probes. Without these, the support for the central claim of an 'evidence-based baseline' cannot be assessed, as the magnitude and significance of the causal effects remain unknown.
- [Experimental probes (as described in Abstract)] The study uses only two LLMs and submissions from a single conference (ICLR 2025). This narrow scope raises questions about generalizability, as the observed effects on review scores might not extend to other models, conferences, or review contexts, thereby weakening the assertion that the findings provide a baseline for tracking reliability broadly.
minor comments (1)
- Consider adding a table summarizing the probe designs, sample characteristics, and key quantitative outcomes to improve clarity and verifiability.
Simulated Author's Rebuttal
We appreciate the referee's detailed review and the opportunity to clarify and strengthen our manuscript. We address the major comments below and indicate the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] The abstract outlines the experimental audit but provides no information on sample sizes, statistical methods, effect sizes, or observed results from the probes. Without these, the support for the central claim of an 'evidence-based baseline' cannot be assessed, as the magnitude and significance of the causal effects remain unknown.
Authors: We agree with this observation. The current abstract focuses on the structure of the study but omits quantitative details that are present in the main body of the paper. In the revised version, we will expand the abstract to concisely report the sample size of the stratified ICLR 2025 submissions, the statistical methods employed for the treatment-control comparisons, and the key observed effect sizes and their significance. This will better support the claim of providing an evidence-based baseline. revision: yes
-
Referee: [Experimental probes (as described in Abstract)] The study uses only two LLMs and submissions from a single conference (ICLR 2025). This narrow scope raises questions about generalizability, as the observed effects on review scores might not extend to other models, conferences, or review contexts, thereby weakening the assertion that the findings provide a baseline for tracking reliability broadly.
Authors: We acknowledge that the experimental scope is limited to two LLMs and one conference, which is a valid concern for broad generalizability. This design choice was made to ensure high internal validity and control in the causal probes, as explained in the methods. The paper already includes a limitations section discussing this, and we position the results as an initial baseline for the taxonomy rather than a universal finding. To address the comment, we will revise the abstract and discussion to more explicitly temper the claims about providing a 'baseline for tracking reliability broadly' and emphasize the need for future multi-model, multi-conference studies. We believe this partial revision clarifies the contribution without requiring new experiments. revision: partial
Circularity Check
No circularity: purely empirical taxonomy and controlled probes
full rationale
The paper constructs a taxonomy of AI-review attacks and instantiates it via four treatment-control experiments on an external stratified sample of ICLR 2025 submissions evaluated by two fixed LLMs. No equations, fitted parameters, or first-principles derivations appear; the reported causal effects on review scores are direct observations from the probes rather than quantities that reduce to prior fits or self-citations by construction. The baseline is therefore generated from independent data rather than from any self-referential mapping.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Modern large language models offer impressive capabilities in summarization, fact checking, and literature triage that make them suitable for integration into peer review
Reference graph
Works this paper leans on
-
[1]
overwhelmed
Sample I. (2025). Quality of scientific papers ques- tioned as academics “overwhelmed” by the millions published. The Guardian. https://www.theguardian. com/science/2025/jul/13/quality-of-scientific-papers- questioned-as-academicsoverwhelmed-by-the-millions- published
2025
-
[2]
Adam D. (2025). The peer-review crisis: How to fix an over- loaded system. Nature 644:24-27. DOI:10.1038/d41586- 025-02457-2
-
[3]
Bergstrom C.T. and Bak-Coleman J. (2025). AI, peer review and the human activity of science. Nature. DOI:10.1038/d41586-025-01839-w
-
[4]
Khalifa M. and Albadawy M. (2024). Using artificial intelli- gence in academic writing and research: An essential pro- ductivity tool. Comput. Methods Programs Biomed. Update 5:100145. DOI:10.1016/j.cmpbup.2024.100145
-
[5]
Chen Q., Y ang M., Qin L., et al. (2025). AI4Research: A survey of artificial intelligence for scientific research. arXiv preprint. DOI:10.48550/arXiv.2507.01903
-
[6]
Luo Z., Y ang Z., Xu Z., et al. (2025). Llm4sr: A survey on large language models for scientific research. arXiv preprint. DOI:10.48550/arXiv:2501.04306
-
[7]
Liang W., Izzo Z., Zhang Y ., et al. (2024). Monitoring AI- modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews. Proc. Int. Conf. Mach. Learn. 235:1192. DOI:10.5555/3692070.3693262
-
[8]
Wu D. (2025). Researchers are using AI for peer reviews—and finding ways to cheat it. The Washington Post. https://www.washingtonpost.com/nation/2025/07/17/aiuniversity- research-peer-review/
2025
-
[9]
Tong T., Wang F ., Zhao Z., et al. (2025). Badjudge: Backdoor vulnerabilities of llm-asa-judge. arXiv preprint. DOI:10.48550/arXiv.2503.00596
-
[10]
Gibney E. (2025). Scientists hide messages in pa- pers to game AI peer review. Nature 643:887-888. DOI:10.1038/d41586-025-02172-y
-
[11]
Ji Z., Lee N., Frieske R., et al. (2023). Survey of hallucination in natural language generation. ACM Comput. Surv. 55:1-38. DOI:10.1145/3571730
-
[12]
Jin Y ., Zhao Q., Wang Y ., et al. (2024). Agentreview: Ex- ploring peer review dynamics with llm agents. arXiv preprint. DOI:10.48550/arXiv.2406.12708
work page internal anchor Pith review doi:10.48550/arxiv.2406.12708 2024
-
[13]
Y e J., Wang Y ., Huang Y ., et al. (2024). Justice or preju- dice? quantifying biases in llmas-a-judge. arXiv preprint. DOI:10.48550/arXiv.2410.02736
-
[14]
Lin T.-L., Chen W.-C., Hsiao T.-F ., et al. (2025). Breaking the reviewer: Assessing the vulnerability of large language models in automated peer review under textual adversarial attacks. arXiv preprint. DOI:10.48550/arXiv.2506.11113
-
[15]
Li Y ., Jiang Y ., Li Z., et al. (2024). Backdoor learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 35:5-22. DOI:10.1109/TNNLS.2022.3182979
-
[16]
Zhang Y ., Rando J., Evtimov I., et al. (2024). Per- sistent pre-training poisoning of llms. arXiv preprint. DOI:10.48550/arXiv.2410.13722
-
[17]
Ignore Previous Prompt: Attack Techniques For Language Models
Perez F . and Ribeiro I. (2022). Ignore previous prompt: Attack techniques for language models. arXiv preprint. DOI:10.48550/arXiv.2211.09527
work page internal anchor Pith review doi:10.48550/arxiv.2211.09527 2022
-
[19]
Sharma M., Tong M., Korbak T., et al. (2023). Towards un- derstanding sycophancy in language models. arXiv preprint. DOI:10.48550/arXiv.2310.13548
work page internal anchor Pith review doi:10.48550/arxiv.2310.13548 2023
-
[20]
Fanous A., Goldberg J., Agarwal A., et al. (2025). Syceval: Evaluating llm sycophancy. Proc. AAAI/ACM Conf. AI Ethics Soc. 8:893-900. DOI:10.48550/arXiv.2502.08177
-
[21]
Shi J., Yuan Z., Liu Y ., et al. (2024). Optimization- based prompt injection attack to llm-as-a-judge. Proc. ACM SIGSAC Conf. Comput. Commun. Secur. 2024:660-674. DOI:10.1145/3658644.3690291
-
[22]
Malmqvist L. (2025). Sycophancy in large language models: Causes and mitigations. Intell. Comput. Proc. Comput. Conf. 2024:61-74. DOI:10.1007/978-3031-92611-2_5
-
[24]
Nuijten M.B., van Assen M.A.L.M., Hartgerink C.H.J., et al. (2017). The validity of the tool “statcheck” in discover- ing statistical reporting inconsistencies. PsyArXiv preprint. DOI:10.31234/osf.io/tcxaj
-
[25]
Checco A., Bracciale L., Loreti P ., et al. (2021). AI- assisted peer review. Humanit. Soc. Sci. Commun. 8:1-11. DOI:10.1057/s41599-020-00703-8
-
[26]
and Zemel R.S
Charlin L. and Zemel R.S. (2013). The toronto paper matching system: An automated paper-reviewer assign- ment system. ICML PEER. https://www.cs.toronto.edu/ lchar- lin/papers/tpms.pdf 15
2013
-
[27]
Leyton-Brown K., Mausam., Nandwani Y ., et al. (2024). Matching papers and reviewers at large conferences. Ar- tif. Intell. 331:104119. DOI:10.1016/j.artint.2024
-
[28]
Liu R. and Shah N.B. (2023). Reviewergpt? An exploratory study on using large language models for paper reviewing. arXiv preprint. DOI:10.48550/arXiv.2306
-
[29]
Quantum error thresholds for gauge-redundant digitiza- tions of lattice field theories
Gao Z., Brantley K. and Joachims T. (2024). Reviewer2: Optimizing review generation through prompt generation. arXiv preprint. DOI:10.48550/arXiv.2402
-
[30]
Yu J., Ding Z., Tan J., et al. (2024). Automated peer reviewing in paper sea: Standardization, evaluation, and analysis. arXiv preprint. DOI:10.18653/v1/2024. findings-emnlp.595
-
[31]
Wang Q., Zeng Q., Huang L., et al. (2020). ReviewRobot: Explainable paper review generation based on knowledge synthesis. arXiv preprint. DOI:10.18653/v1/2020. inlg-1.44
-
[32]
Weng Y ., Zhu M., Bao G., et al. (2024). Cycleresearcher: Improving automated research via automated review. arXiv preprint. DOI:10.48550/arXiv.2411.00816
-
[33]
D’Arcy M., Hope T., Birnbaum L., et al. (2024). Marg: Multi- agent review generation for scientific papers. arXiv preprint. DOI:10.48550/arXiv.2401.04259
-
[34]
Taechoyotin P ., Wang G., Zeng T., et al. (2024). MAMORX: Multi-agent multi-modal scientific review generation with ex- ternal knowledge. Proc. NeurIPS Workshop Found. Models Sci. https://openreview.net/forum?id=frvkE8rCfX
2024
-
[35]
Sun L., Chan A., Chang Y .S., et al. (2024). ReviewFlow: Intelligent scaffolding to support academic peer review- ing. Proc. Int. Conf. Intell. User Interfaces 2024:120-137. DOI:10.1145/3640543.3645159
-
[36]
Zyska D., Dycke N., Buchmann J., et al. (2023). CARE: Col- laborative AI-assisted reading environment. arXiv preprint. DOI:10.18653/v1/2023.acl-demo.28
-
[37]
Mathur P ., Siu A., Manjunatha V., et al. (2024). DocPilot: Copilot for automating PDF edit workflows in documents. Proc. Annu. Meet. Assoc. Comput. Linguist. 3:232-246. DOI:10.18653/v1/2024.acl-demos.22
-
[38]
Shanahan D. (2016). A peerless review? Automating methodological and statistical review.https://blogs.biomedcentral.com/bmcblog/2016/05/23/peerless- reviewautomating-methodological-statistical-review/
2016
-
[39]
Cyranoski D. (2019). Artificial intelligence is select- ing grant reviewers in China. Nature 569:316-317. DOI:10.1038/d41586-019-01517-8
-
[41]
Lin E., Peng Z. and Fang Y . (2025). Evaluating and enhanc- ing large language models for novelty assessment in schol- arly publications. Proc. Workshop AI Sci. Discov. 2025:46-57. DOI:10.18653/v1/2025.aisd-main.5
- [42]
-
[43]
Couto P .H., Ho Q.P ., Kumari N., et al. (2024). Relevai- reviewer: A benchmark on AI reviewers for survey paper relevance. arXiv preprint. DOI:10.48550/arXiv.2406.10294
-
[44]
Rethinking data augmentation for robust LiDAR semantic segmentation in adverse weather,
Faizullah A.R.B.M., Urlana A. and Mishra R. (2024). Limgen: Probing the llms for generating suggestive limitations of research papers. Proc. Jt. Eur. Conf. Mach. Learn. Knowl. Discov. Databases 2024:106-124. DOI:10.1007/978-3-031- 70344-7_7
-
[45]
Bhatia C., Pradhan T. and Pal S. (2020). Metagen: An academic meta-review generation system. Proc. Int. ACM SIGIR Conf. Res. Dev. Inf. Retr. 2020:1653-1656. DOI:10.1145/3397271.3401190
-
[46]
Shen C., Cheng L., Zhou R., et al. (2022). MReD: A meta-review dataset for structurecontrollable text generation. Findings Assoc. Comput. Linguist. ACL 2022: 2521-2535. DOI:10.18653/v1/2022.findings-acl.198
-
[47]
Zeng Q., Sidhu M., Blume A., et al. (2024). Scientific opin- ion summarization: Paper meta-review generation dataset, methods, and evaluation. Proc. Int. Jt. Conf. Artif. Intell. 2024:20-38. DOI:10.1007/978-981-97-9536-9_2
-
[48]
and Lau J
Li M., Hovy E. and Lau J. (2023). Summarizing multiple doc- uments with conversational structure for meta-review gener- ation. Findings Assoc. Comput. Linguist. EMNLP 2023:7089-
2023
-
[49]
DOI:10.18653/v1/2023.findings-emnlp.472
-
[50]
Sun L., Tao S., Hu J., et al. (2024). MetaWriter: Exploring the the potential and perils of ai writing support in scien- tific peer review. Proc. ACM Hum.-Comput. Interact. 8:1-32. DOI:10.1145/3637371
-
[51]
Darrin M., Arous I., Piantanida P ., et al. (2024). Glimpse: Pragmatically informative multi-document summarization for scholarly reviews. Proc. Annu. Meet. Assoc. Comput. Linguist. 2024:12737-12752. DOI:10.18653/v1/2024.acl- long.688
-
[52]
Sukpanichnant P ., Rapberger A. and Toni F . (2024). Peer- arg: Argumentative peer review with llms. arXiv preprint. DOI:10.48550/arXiv.2409.16813
-
[53]
Hossain E., Sinha S.K., Bansal N., et al. (2025). Llms as meta-reviewers’ assistants: A case study. Proc. Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. 2025:7763-7803. DOI:10.18653/v1/2025.naacl-long.395
-
[54]
Krizhevsky A., Sutskever I. and Hinton G.E. (2017). Ima- geNet classification with deep convolutional neural networks. Commun. ACM 60:84-90. DOI:10.1145/3065386
-
[55]
Hinton G., Deng L., Yu D., et al. (2012). Deep neural net- works for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag. 29:82-97. DOI:10.1109/msp.2012.2205597
-
[56]
Devlin J., Chang M.-W., Lee K., et al. (2019). Bert: Pre-training of deep bidirectional transformers for lan- guage understanding. Proc. NAACL HLT:4171-4186. DOI:10. 18653/v1/N19-1423
2019
-
[57]
Szegedy C., Zaremba W., Sutskever I., et al. (2013). Intriguing properties of neural networks. arXiv preprint. DOI:10.48550/arXiv.1312.6199 16
work page internal anchor Pith review doi:10.48550/arxiv.1312.6199 2013
-
[58]
Biggio B. and Roli F . (2018). Wild patterns: Ten years af- ter the rise of adversarial machine learning. Proc. ACM SIGSAC Conf. Comput. Commun. Secur. 2018:2154-2156. DOI:10.1145/3243734.3264418
-
[59]
Explaining and Harnessing Adversarial Examples
Goodfellow I.J., Shlens J. and Szegedy C. (2014). Explain- ing and harnessing adversarial examples. arXiv preprint. DOI:10.48550/arXiv.1412.6572
work page internal anchor Pith review doi:10.48550/arxiv.1412.6572 2014
-
[60]
Athalye A., Carlini N. and Wagner D. (2018). Obfuscated gradients give a false sense of security: Circumventing de- fenses to adversarial examples. Proc. Int. Conf. Mach. Learn. 2018:274-283. DOI:10.48550/arXiv.1802.00420
-
[61]
Barreno M., Nelson B., Sears R., et al. (2006). Can ma- chine learning be secure? Proc. ACM Symp. Inf. Comput. Commun. Secur. 2006:16-25. DOI:10.1145/1128817
-
[62]
Biggio B., Corona I., Maiorca D., et al. (2013). Evasion attacks against machine learning at test time. Proc. Jt. Eur. Conf. Mach. Learn. Knowl. Discov. Databases 2013:387-402. DOI:10.1007/978-3-642-40994-3_25
-
[64]
Papernot N., McDaniel P ., Jha S., et al. (2016). The limita- tions of deep learning in adversarial settings. Proc. IEEE Eur. Symp. Secur. Priv. 2016:372-387. DOI:10.1109/ Eu- roSP .2016.36
2016
-
[65]
Madry A., Makelov A., Schmidt L., et al. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint. DOI:10.48550/arXiv.1706.06083
work page internal anchor Pith review doi:10.48550/arxiv.1706.06083 2017
-
[66]
Chen P .-Y ., Zhang H., Sharma Y ., et al. (2017). Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute mod- els. Proc. ACM Workshop Artif. Intell. Secur. 2017:15-26. DOI:10.1145/3128572.3140448
-
[67]
Ilyas A., Engstrom L., Athalye A., et al. (2018). Black- box adversarial attacks with limited queries and infor- mation. Proc. Int. Conf. Mach. Learn. 2018:2137-2146. DOI:10.48550/arXiv.1804.08598
- [68]
-
[69]
Fredrikson M., Jha S. and Ristenpart T. (2015). Model inversion attacks that exploit confidence in- formation and basic countermeasures. Proc. ACM SIGSAC Conf. Comput. Commun. Secur. 2015:1322-1333. DOI:10.1145/2810103.2813677
-
[70]
Shokri R., Stronati M., Song C., et al. (2017). Member- ship inference attacks against machine learning models. Proc. IEEE Symp. Secur. Priv. 2017:3-18. DOI:10.1109/SP . 2017.41
work page doi:10.1109/sp 2017
-
[71]
Tramèr F ., Zhang F ., Juels A., et al. (2016). Stealing machine learning models via prediction APIs. Proc. USENIX Secur. Symp. 2016:601-618. DOI:10.5555/3241094
-
[72]
Y eom S., Giacomelli I., Fredrikson M., et al. (2018). Privacy risk in machine learning: Analyzing the connection to over- fitting. Proc. IEEE Comput. Secur. Found. Symp. 2018:268-
2018
-
[73]
DOI:10.1109/CSF .2018.00027
work page doi:10.1109/csf 2018
-
[74]
Biggio B., Nelson B. and Laskov P . (2012). Poisoning attacks against support vector machines. arXiv preprint. DOI:10.5555/3042573.3042761
-
[75]
Tolpegin V., Truex S., Gursoy M.E., et al. (2020). Data poisoning attacks against federated learning systems. Proc. Eur. Symp. Res. Comput. Secur. 2020:480-501. DOI:10.1007/978-3-030-58951-6_24
-
[76]
Gu T., Dolan-Gavitt B. and Garg S. (2017). Badnets: Iden- tifying vulnerabilities in the machine learning model supply chain. arXiv preprint. DOI:10.48550/arXiv.1708
-
[77]
Chen X., Liu C., Li B., et al. (2017). Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint. DOI:10.48550/arXiv.1712.05526
work page internal anchor Pith review doi:10.48550/arxiv.1712.05526 2017
-
[78]
Shafahi A., Huang W.R., Najibi M., et al. (2018). Poison frogs! Targeted clean-label poisoning attacks on neural networks. Adv. Neural Inf. Process. Syst. 31:6106-6116. DOI:10.5555/3327345.3327509
-
[79]
Zhang J., Chen B., Cheng X., et al. (2021). PoisonGAN: Generative poisoning attacks against federated learning in edge computing systems. IEEE Internet Things J. 8:3310-
2021
-
[80]
DOI:10.1109/jiot.2020.3023126
-
[81]
Carlini N., Athalye A., Papernot N., et al. (2019). On evaluating adversarial robustness. arXiv preprint. DOI:10.48550/arXiv.1902.06705
-
[82]
Tramèr F ., Kurakin A., Papernot N., et al. (2017). Ensemble adversarial training: Attacks and defenses. arXiv preprint. DOI:10.48550/arXiv.1705.07204
-
[83]
and Rosenfeld, Elan and Kolter, J
Cohen J., Rosenfeld E. and Kolter Z. (2019). Certi- fied adversarial robustness via randomized smoothing. Proc. Int. Conf. Mach. Learn. 2019:1310-1320. DOI:10. 48550/arXiv.1902.02918
-
[84]
Brown, Benjamin Mann, Nick Ryder, et al
Wu D., Xia S.-T. and Wang Y . (2020). Adversarial weight perturbation helps robust generalization. Adv. Neural Inf. Process. Syst. 33:2958-2969. DOI:10.5555/3495724
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.