Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem
Pith reviewed 2026-05-18 18:39 UTC · model grok-4.3
The pith
MCP toolchains can be hijacked to leak private data by embedding malicious instructions in external sources accessed by LLMs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Model Context Protocol lacks both context-tool isolation and least-privilege enforcement. This enables adversarial instructions to propagate unchecked into sensitive tool invocations, allowing the assembly of legitimate tools into coordinated malicious workflows that culminate in stealthy privacy exfiltration through parasitic ingestion, collection, and disclosure phases.
What carries the argument
The Parasitic Toolchain Attack pattern (MCP-UPD) that infiltrates via external data and coordinates tools for privacy disclosure.
If this is right
- Attackers can achieve malicious goals by targeting the toolchain rather than individual prompts or tools.
- LLM integrations with external systems become vulnerable to indirect, no-interaction attacks.
- Many existing MCP servers contain gadgets that facilitate such attacks.
- Defense mechanisms are urgently needed to secure LLM-integrated environments.
Where Pith is reading between the lines
- Similar vulnerabilities likely exist in other emerging LLM tool orchestration protocols.
- Implementing context isolation could prevent propagation of malicious instructions across tools.
- Regular security censuses of tool ecosystems should be conducted to monitor for exploitable patterns.
- Testing the attack in real production LLM deployments would validate the practical risk.
Load-bearing premise
The tools and servers analyzed are representative of real-world MCP deployments, and the attack can be carried out in production without extra attacker capabilities.
What would settle it
An experiment showing an LLM using MCP tools processes a maliciously crafted external data source and then invokes tools to exfiltrate private user data.
Figures
read the original abstract
Large language models(LLMs) are increasingly integrated with external systems through the Model Context Protocol(MCP),which standardizes tool invocation and has rapidly become a backbone for LLM-powered applications. While this paradigm enhances functionality,it also introduces a fundamental security shift:LLMs transition from passive information processors to autonomous orchestrators of task-oriented toolchains,expanding the attack surface,elevating adversarial goals from manipulating single outputs to hijacking entire execution flows. In this paper,we identify and characterize a systematic privacy-leakage attack pattern,termed Parasitic Toolchain Attacks,instantiated as MCP Unintended Privacy Disclosure(MCP-UPD). These attacks require no direct victim interaction;instead,adversaries embed malicious instructions into external data sources that LLMs access during legitimate tasks. Unlike traditional prompt injection and tool poisoning attacks,our attack targets the interconnected toolchain itself,assembling multiple legitimate tools into a coordinated workflow whose combined behavior accomplishes malicious objectives. In MCP-UPD,the malicious logic infiltrates the toolchain and unfolds in three phases:Parasitic Ingestion,Privacy Collection,and Privacy Disclosure,culminating in stealthy exfiltration of private data. Our root cause analysis reveals that MCP lacks both context-tool isolation and least-privilege enforcement,enabling adversarial instructions to propagate unchecked into sensitive tool invocations. To assess the severity,we design MCP-SEC and conduct the first large-scale security census of the MCP ecosystem,analyzing 12230 tools across 1360 servers. Our findings show that the MCP ecosystem is rife with real-world exploitable gadgets and diverse attack methods,underscoring systemic risks in MCP platforms and the urgent need for defense mechanisms in LLM-integrated environments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Parasitic Toolchain Attacks on the Model Context Protocol (MCP) ecosystem for LLM tool integrations. It defines MCP Unintended Privacy Disclosure (MCP-UPD) as a three-phase attack (Parasitic Ingestion, Privacy Collection, Privacy Disclosure) in which adversaries embed malicious instructions in external data sources that LLMs access during legitimate tasks; these instructions then hijack coordinated legitimate tools to exfiltrate private data without direct victim interaction. The authors perform a root-cause analysis attributing the vulnerability to MCP's lack of context-tool isolation and least-privilege enforcement, design the MCP-SEC scanner, and report a large-scale census of 12,230 tools across 1,360 servers that identifies widespread exploitable gadgets and diverse attack methods.
Significance. If the empirical claims are strengthened by execution-based validation, the work would be significant as the first large-scale security census of the emerging MCP ecosystem. It surfaces a novel attack pattern that targets interconnected toolchains rather than single prompts or tools, and the scale of the scan (over 12k tools) provides concrete evidence of systemic risks that could guide protocol-level defenses in LLM-integrated environments. The empirical focus and identification of real-world gadgets are clear strengths.
major comments (2)
- [§5] §5 (Large-scale analysis / MCP-SEC description): The census catalogs tools with data-access or exfiltration capabilities via static inspection of schemas and descriptions, yet provides no evidence that these gadgets can be assembled into the full three-phase MCP-UPD workflow when an LLM processes external data containing parasitic instructions. Without dynamic execution of the Parasitic Ingestion → Privacy Collection → Privacy Disclosure sequence in a live MCP client connected to an LLM, the central claim that the ecosystem is 'rife with real-world exploitable gadgets' rests on an untested assumption rather than an observed outcome.
- [§4] §4 (Attack construction and root-cause analysis): The diagnosis that MCP lacks context-tool isolation and least-privilege enforcement is plausible from the protocol overview, but the manuscript does not include concrete traces or examples showing how an adversarial instruction embedded in external data actually propagates through the toolchain to trigger a sensitive tool invocation under realistic task conditions. This gap directly affects the severity conclusion.
minor comments (2)
- [Abstract / §1] The abstract and §1 use the term 'MCP-UPD' before it is formally defined; a forward reference or early definition would improve readability.
- [Table 1] Table 1 (or equivalent summary of scan results) reports aggregate counts but does not break down the fraction of tools that were manually validated versus automatically flagged; adding this would clarify the reliability of the 12,230-tool census.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the novelty of the Parasitic Toolchain Attack pattern and the scale of our MCP ecosystem census. We address each major comment below and describe the revisions we will incorporate to strengthen the empirical grounding of our claims.
read point-by-point responses
-
Referee: [§5] §5 (Large-scale analysis / MCP-SEC description): The census catalogs tools with data-access or exfiltration capabilities via static inspection of schemas and descriptions, yet provides no evidence that these gadgets can be assembled into the full three-phase MCP-UPD workflow when an LLM processes external data containing parasitic instructions. Without dynamic execution of the Parasitic Ingestion → Privacy Collection → Privacy Disclosure sequence in a live MCP client connected to an LLM, the central claim that the ecosystem is 'rife with real-world exploitable gadgets' rests on an untested assumption rather than an observed outcome.
Authors: We agree that dynamic validation of full workflows would provide stronger confirmation that the identified gadgets can be composed into complete MCP-UPD attacks. Our MCP-SEC scanner employs static analysis of tool schemas and descriptions precisely to enable a scalable census across more than 12,000 tools, which would be infeasible to replicate dynamically at full scale. Nevertheless, we conducted targeted dynamic experiments during our research to validate representative gadget assemblies. In the revised manuscript we will add a dedicated subsection to §5 that reports these execution-based case studies, including concrete traces of the three-phase sequence executed in a live MCP client connected to an LLM under controlled conditions. revision: yes
-
Referee: [§4] §4 (Attack construction and root-cause analysis): The diagnosis that MCP lacks context-tool isolation and least-privilege enforcement is plausible from the protocol overview, but the manuscript does not include concrete traces or examples showing how an adversarial instruction embedded in external data actually propagates through the toolchain to trigger a sensitive tool invocation under realistic task conditions. This gap directly affects the severity conclusion.
Authors: We concur that explicit propagation traces would make the root-cause analysis more compelling. Section 4 currently grounds the diagnosis in the MCP protocol specification and outlines the three attack phases with illustrative scenarios. To directly address this concern, we will expand §4 in the revision with additional concrete execution traces drawn from our prototype attack implementations, demonstrating step-by-step how an adversarial instruction in external data propagates through the MCP context to invoke sensitive tools under realistic task conditions. revision: yes
Circularity Check
No circularity: empirical census of MCP tools with no derivations or self-referential predictions
full rationale
This is a measurement study that catalogs tool capabilities across 12230 instances and diagnoses protocol-level isolation failures from the MCP specification. No equations, fitted parameters, or first-principles derivations appear. Claims about parasitic ingestion, privacy collection, and disclosure rest directly on the observed tool schemas and the absence of context-tool isolation in the protocol description; they are not redefined or predicted from prior results within the paper itself. The analysis is self-contained against external benchmarks (the public MCP ecosystem) and does not reduce any central finding to a tautology or self-citation chain.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption MCP standardizes tool invocation for LLMs and is rapidly becoming a backbone for LLM-powered applications
invented entities (2)
-
Parasitic Toolchain Attacks
no independent evidence
-
MCP-UPD
no independent evidence
Forward citations
Cited by 8 Pith papers
-
A First Measurement Study on Authentication Security in Real-World Remote MCP Servers
First measurement study of 7,973 remote MCP servers finds 40.55% lack authentication and all 119 tested OAuth servers have flaws that risk data leaks or account takeover.
-
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security
MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
-
From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers
Presents a component-centric PoC dataset of malicious MCP servers and a two-stage behavioral deviation detector Connor achieving 94.6% F1-score.
-
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
MCP lifecycle is defined with four phases and 16 activities; a threat taxonomy of 16 scenarios is constructed, validated via case studies, and paired with phase-specific safeguards.
-
Behavioral Integrity Verification for AI Agent Skills
BIV audits AI agent skills at scale, finding 80% deviate from declared behavior on 49,943 skills and achieving 0.946 F1 for malicious skill detection.
-
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem
MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.
-
MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems
MCPThreatHive automates the full lifecycle of threat intelligence for MCP agentic systems using a new 38-pattern taxonomy mapped to STRIDE and OWASP frameworks plus composite risk scoring.
-
Security Threat Modeling for Emerging AI-Agent Protocols: A Comparative Analysis of MCP, A2A, Agora, and ANP
The paper identifies twelve protocol-level security risks across MCP, A2A, Agora, and ANP and quantifies wrong-provider tool execution risk in MCP via a measurement-driven case study on multi-server composition.
Reference graph
Works this paper leans on
-
[1]
Prompt Security Top 10: Key Security Risks for MCPs, Accessed: 2025-10-27
work page 2025
-
[2]
Protecting against indirect prompt injection attacks in MCP, Accessed: 2025-10-27
work page 2025
-
[3]
MCP-Sec.https://anonymous.4open.science/r/MCP-SEC-3FD0/, Accessed: 2025-11-09
work page 2025
-
[4]
https://secresearcher100.github.io/, Accessed: 2025-11-09
The Demo Site of Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem. https://secresearcher100.github.io/, Accessed: 2025-11-09
work page 2025
-
[5]
https://github.com/punkpeye/awesome-mcp-servers/blob/main/README.md, Ac- cessed: 2025-11-13
Awesome MCP Servers. https://github.com/punkpeye/awesome-mcp-servers/blob/main/README.md, Ac- cessed: 2025-11-13
work page 2025
-
[6]
Cline - AI Coding, Open Source and Uncompromised.https://cline.bot/, Accessed: 2025-11-13
work page 2025
-
[7]
Cursor - The AI Code Editor.https://cursor.com/home, Accessed: 2025-11-13
work page 2025
-
[8]
Discover Top MCP Servers | MCP Market.https://mcpmarket.com/, Accessed: 2025-11-13
work page 2025
-
[9]
Download Claude.https://claude.ai/download, Accessed: 2025-11-13
work page 2025
-
[10]
https://github.com/baranwang/mcp-trends-hub, Accessed: 2025-11- 13
GitHub - baranwang/mcp-tredns-hub. https://github.com/baranwang/mcp-trends-hub, Accessed: 2025-11- 13
work page 2025
-
[11]
https://github.com/brightdata/brightdata-mcp, Accessed: 2025-11- 13
GitHub - brightdata/brightdata-mcp. https://github.com/brightdata/brightdata-mcp, Accessed: 2025-11- 13
work page 2025
-
[12]
https://github.com/cswkim/discogs-mcp-server, Accessed: 2025- 11-13
GitHub - cswkim/discogs-mcp-server. https://github.com/cswkim/discogs-mcp-server, Accessed: 2025- 11-13
work page 2025
-
[13]
GitHub - deanward/hal.https://github.com/deanward/hal, Accessed: 2025-11-13
work page 2025
-
[14]
GitHub - devabdultech/hn-mcp.https://github.com/devabdultech/hn-mcp, Accessed: 2025-11-13
work page 2025
-
[15]
GitHub - evalsone/mcp-connect.https://github.com/evalsone/mcp-connect, Accessed: 2025-11-13
work page 2025
-
[16]
GitHub - ivo-toby/contentful-mcp.https://github.com/ivo-toby/contentful-mcp, Accessed: 2025-11-13
work page 2025
-
[17]
https://github.com/modelcontextprotocol/python-sdk, Ac- cessed: 2025-11-13
GitHub - modelcontextprotocol/python-sdk. https://github.com/modelcontextprotocol/python-sdk, Ac- cessed: 2025-11-13
work page 2025
-
[18]
17 Mind Your Server: A Systematic Study of Parasitic Attacks on the MCP Ecosystem
GitHub - oschina/mcp-gitee.https://github.com/oschina/mcp-gitee, Accessed: 2025-11-13. 17 Mind Your Server: A Systematic Study of Parasitic Attacks on the MCP Ecosystem
work page 2025
-
[19]
https://github.com/pimzino/agentic-tools-mcp, Accessed: 2025- 11-13
GitHub - pimzino/agentic-tools-mcp. https://github.com/pimzino/agentic-tools-mcp, Accessed: 2025- 11-13
work page 2025
-
[20]
https://github.com/wonderwhy-er/ DesktopCommanderMCP, Accessed: 2025-11-13
GitHub - wonderwhy-er/DesktopCommanderMCP. https://github.com/wonderwhy-er/ DesktopCommanderMCP, Accessed: 2025-11-13
work page 2025
-
[21]
https://modelcontextprotocol.io/docs/getting-started/intro, Accessed: 2025-11-13
Introduction - Model Context Protocol. https://modelcontextprotocol.io/docs/getting-started/intro, Accessed: 2025-11-13
work page 2025
-
[22]
MCP Security Notification: Tool Poisoning Attacks. https://invariantlabs.ai/blog/mcp-security- notification-tool-poisoning-attacks, Accessed: 2025-11-13
work page 2025
-
[23]
MCP Server Directory | PulseMCP.https://www.pulsemcp.com/servers, Accessed: 2025-11-13
work page 2025
-
[24]
npx | npm Docs.https://docs.npmjs.com/cli/v8/commands/npx, Accessed: 2025-11-13
work page 2025
-
[25]
Using tools | uv.https://docs.astral.sh/uv/guides/tools/, Accessed: 2025-11-13
work page 2025
-
[26]
Can llm-generated misinformation be detected? In NeurIPS 2023 Workshop on Regulatable ML, 2023
Canyu Chen and Kai Shu. Can llm-generated misinformation be detected? In NeurIPS 2023 Workshop on Regulatable ML, 2023
work page 2023
-
[27]
Bias and unfairness in information retrieval systems: New challenges in the llm era
Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, and Jun Xu. Bias and unfairness in information retrieval systems: New challenges in the llm era. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 6437–6447, 2024
work page 2024
-
[28]
Preetam Prabhu Srikar Dammu, Hayoung Jung, Anjali Singh, Monojit Choudhury, and Tanu Mitra. “they are uncultured”: Unveiling covert harms and social threats in llm generated conversations. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20339–20369, 2024
work page 2024
-
[29]
Realtoxicityprompts: Evaluating neural toxic degeneration in language models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A Smith. Realtoxicityprompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, 2020
work page 2020
-
[30]
Systematic analysis of mcp security,
Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, and Sheng Wen. Systematic analysis of mcp security. arXiv preprint arXiv:2508.12538, 2025
-
[31]
Stabletoolbench: Towards stable large-scale benchmarking on tool learning of large language models
Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, and Yang Liu. Stabletoolbench: Towards stable large-scale benchmarking on tool learning of large language models. In Findings of the Association for Computational Linguistics ACL 2024, pages 11143–11156, 2024
work page 2024
-
[32]
Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, and Ahmed E Hassan. Model context protocol (mcp) at first glance: Studying the security and maintainability of mcp servers. arXiv preprint arXiv:2506.13538, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[33]
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. Model context protocol (mcp): Landscape, security threats, and future research directions. arXiv preprint arXiv:2503.23278, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[34]
Towards mitigating llm hallucination via self reflection
Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. Towards mitigating llm hallucination via self reflection. InFindings of the Association for Computational Linguistics: EMNLP 2023, pages 1827–1843, 2023
work page 2023
-
[35]
Mcp guardian: A security-first layer for safeguarding mcp-based ai system
Sonu Kumar, Anubhav Girdhar, Ritesh Patil, and Divyansh Tripathi. Mcp guardian: A security-first layer for safeguarding mcp-based ai system. arXiv preprint arXiv:2504.12757, 2025
-
[36]
We urgently need privilege management in mcp: A measurement of api usage in mcp ecosystems
Zhihao Li, Kun Li, Boyang Ma, Minghui Xu, Yue Zhang, and Xiuzhen Cheng. We urgently need privilege management in mcp: A measurement of api usage in mcp ecosystems. arXiv preprint arXiv:2507.06250, 2025
-
[37]
Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems, 36:21558–21572, 2023
work page 2023
-
[38]
Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. Trustworthy llms: a survey and guideline for evaluating large language models’ alignment. arXiv preprint arXiv:2308.05374, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[39]
Vineeth Sai Narajala and Idan Habler. Enterprise-grade security for the model context protocol (mcp): Frameworks and mitigation strategies. arXiv preprint arXiv:2504.08623, 2025
-
[40]
{CodexLeaks}: Privacy leaks from code generation language models in {GitHub} copilot
Liang Niu, Shujaat Mirza, Zayd Maradni, and Christina Pöpper. {CodexLeaks}: Privacy leaks from code generation language models in {GitHub} copilot. In 32nd USENIX Security Symposium (USENIX Security 23), pages 2133–2150, 2023. 18 Mind Your Server: A Systematic Study of Parasitic Attacks on the MCP Ecosystem
work page 2023
-
[41]
On the risk of misinformation pollution with large language models
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, and William Wang. On the risk of misinformation pollution with large language models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 1389–1403, 2023
work page 2023
-
[42]
Gorilla: Large language model connected with massive apis
Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive apis. Advances in Neural Information Processing Systems, 37:126544–126565, 2024
work page 2024
-
[43]
Mcp safety audit: Llms with the model context protocol allow major security exploits,
Brandon Radosevich and John Halloran. Mcp safety audit: Llms with the model context protocol allow major security exploits. arXiv preprint arXiv:2504.03767, 2025
-
[44]
Characteristics of harmful text: Towards rigorous benchmarking of language models
Maribeth Rauh, John Mellor, Jonathan Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, Geoffrey Irving, Iason Gabriel, et al. Characteristics of harmful text: Towards rigorous benchmarking of language models. Advances in Neural Information Processing Systems, 35:24720–24739, 2022
work page 2022
-
[45]
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Xinyue Shen, Yixin Wu, Yiting Qu, Michael Backes, Savvas Zannettou, and Yang Zhang. HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns. In USENIX Security Symposium (USENIX Security). USENIX, 2025
work page 2025
-
[46]
Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, and Jiachi Chen. Beyond the protocol: Unveiling attack vectors in the model context protocol ecosystem. arXiv preprint arXiv:2506.02040, 2025
-
[47]
Miles Turpin, Julian Michael, Ethan Perez, and Samuel Bowman. Language models don’t always say what they think: Unfaithful explanations in chain-of-thought prompting. Advances in Neural Information Processing Systems, 36:74952–74965, 2023
work page 2023
-
[48]
Mcptox: A benchmark for tool poisoning attack on real-world mcp servers
Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, and Xiangyang Li. Mcptox: A benchmark for tool poisoning attack on real-world mcp servers. arXiv preprint arXiv:2508.14925, 2025
-
[49]
Zihan Wang, Hongwei Li, Rui Zhang, Yu Liu, Wenbo Jiang, Wenshu Fan, Qingchuan Zhao, and Guowen Xu. Mpma: Preference manipulation attack against model context protocol. arXiv preprint arXiv:2505.11154, 2025
-
[50]
Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, and Dongdong She. On the security of tool-invocation prompts for llm-based agentic systems: An empirical risk assessment. arXiv preprint arXiv:2509.05755, 2025
-
[51]
Mcpsecbench: A systematic security benchmark and playground for testing model context protocols
Yixuan Yang, Daoyuan Wu, and Yufan Chen. Mcpsecbench: A systematic security benchmark and playground for testing model context protocols. arXiv preprint arXiv:2508.13220, 2025
-
[52]
LLM lies: Hallucinations are not bugs, but features as adversarial examples,
Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Yu-Yang Liu, and Li Yuan. Llm lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469, 2023
-
[53]
React: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023
work page 2023
-
[54]
Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, Ruoxi Jia, and Weiyan Shi. How johnny can persuade llms to jailbreak them: Rethinking persuasion to challenge ai safety by humanizing llms. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (V olume1: Long Papers), pages 14322–14350, 2024
work page 2024
-
[55]
On large language models’ resilience to coercive interrogation
Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, and Xiangyu Zhang. On large language models’ resilience to coercive interrogation. In 2024 IEEE Symposium on Security and Privacy (SP), pages 826–844. IEEE, 2024
work page 2024
-
[56]
When mcp servers attack: Taxonomy, feasibility, and mitigation,
Weibo Zhao, Jiahao Liu, Bonan Ruan, Shaofei Li, and Zhenkai Liang. When mcp servers attack: Taxonomy, feasibility, and mitigation. arXiv preprint arXiv:2509.24272, 2025
-
[57]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models. arXiv preprint arXiv:2307.15043, 2023. 19
work page internal anchor Pith review Pith/arXiv arXiv 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.