{"total":13,"items":[{"citing_arxiv_id":"2606.10749","ref_index":197,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation","primary_cat":"cs.CR","submitted_at":"2026-06-09T12:01:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A synthesis of 247 papers on LLM agent security identifies prompt injection and tool hijacking as dominant threats, notes weakly compositional defenses, and argues for trust boundaries and realistic evaluations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"White, Doug Burger, and Chi Wang. 2024. AutoGen: Enabling Next-Gen ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: January 2026. Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation 1:39 LLM Applications via Multi-Agent Conversation. In First Conference on Language Modeling (COLM). OpenReview.net. [197] Yongxuan Wu, Xixun Lin, He Zhang, Nan Sun, Kun Wang, Chuan Zhou, Shirui Pan, and Yanan Cao. 2026. CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems. arXiv:2604.12461 [cs.AI] doi:10.48550/arXiv.2604.12461 [198] Yuhao Wu, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, and Umar Iqbal. 2025. 
IsolateGPT: An Execution Isolation Architecture for LLM-Based Systems."},{"citing_arxiv_id":"2606.05976","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Self-Correction Illusion: LLMs Correct Others but Not Themselves","primary_cat":"cs.AI","submitted_at":"2026-06-04T10:17:00+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Relabeling an identical erroneous claim from the model's own thought role to an external chat role increases explicit correction rates by 23-93 percentage points across 13 model-domain cells, indicating a chat-template artifact rather than a cognitive deficit.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.23723","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection","primary_cat":"cs.AI","submitted_at":"2026-05-22T15:03:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemAudit combines counterfactual causal influence scores with memory consistency graphs to identify poisoned records in LLM agent memory, reducing MINJA attack success from 70% to 0% in QA and 83.3% to 0% in reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18930","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OEP: Poisoning Self-Evolving LLM Agents via Locally Correct but Non-Transferable Experiences","primary_cat":"cs.CR","submitted_at":"2026-05-18T14:08:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OEP poisons self-evolving LLM agents by constructing clean 
edge-case experiences that appear locally valid yet cause harmful over-generalization during reflection, achieving over 50% attack success rate on GPT-4o agents across three domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17830","ref_index":39,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Remembering More, Risking More: Longitudinal Safety Risks in Memory-Equipped LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-05-18T04:06:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Memory-equipped LLM agents exhibit increasing safety violation rates as memory accumulates across independent tasks, termed temporal memory contamination, detected via a new trigger-probe protocol.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14421","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemLineage: Lineage-Guided Enforcement for LLM Agent Memory","primary_cat":"cs.CR","submitted_at":"2026-05-14T06:07:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MemLineage enforces untrusted-path persistence in LLM agent memory through Merkle logs, per-principal signatures, and max-of-strong-edges lineage propagation, achieving zero ASR on three poisoning workloads with sub-millisecond overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22842","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI 
Systems","primary_cat":"cs.CR","submitted_at":"2026-05-12T20:21:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Memory poisoning via lost-provenance documents in agent memory stores creates agent misconduct that safety systems misattribute to model failure; the paper defines Semantic Norm Drift, releases a benchmark, and proposes a new testing method plus a defense.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09330","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory","primary_cat":"cs.LG","submitted_at":"2026-05-10T05:04:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11279-11298, 2022. [49] Yifei Wang, Dizhan Xue, Shengjie Zhang, and Shengsheng Qian. Badagent: Inserting and activating backdoor attacks in llm agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9811-9827, 2024. [50] Qianshan Wei, Tengchao Yang, Yaochen Wang, Xinfeng Li, Lijun Li, Zhenfei Yin, Yi Zhan, Thorsten Holz, Zhiqiang Lin, and XiaoFeng Wang. A-memguard: A proactive defense framework for llm-based agent memory. arXiv preprint arXiv:2510.02373, 2025. [51] Daniel Westreich. 
Berkson's bias, selection bias, and missing data. Epidemiology, 23(1):159-164, 2012."},{"citing_arxiv_id":"2605.09278","ref_index":78,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium","primary_cat":"cs.AI","submitted_at":"2026-05-10T03:04:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to adversarial agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"commit decision depends on agents' self-reported confidence rather than on any check against the memory state itself, so debate alone cannot filter the errors. Preprint. arXiv:2605.09278v1 [cs.AI] 10 May 2026 Existing safeguards either filter proposed entries via heuristics or auxiliary LLM classifiers [1, 76], or rely on consensus-based validation [78]. Both deliver real protection, but still produce calibration through AI models that carry the same risks of hallucination and sycophancy. More importantly, these methods are designed for a single agent reading its own memory, and do not model multiple agents continuously writing to (and potentially colluding over) a shared memory space. 
This gap motivates a critical question: how can we calibrate memory updates in a MAD system without relying"},{"citing_arxiv_id":"2605.03482","ref_index":10,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MEMSAD: Gradient-Coupled Anomaly Detection for Memory Poisoning in Retrieval-Augmented Agents","primary_cat":"cs.CR","submitted_at":"2026-05-05T08:15:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MEMSAD links anomaly detection gradients to retrieval objectives under encoder regularity to certify detection of continuous memory poisons, achieving perfect TPR/FPR in experiments while exposing a synonym-invariance gap.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01970","ref_index":86,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration","primary_cat":"cs.CR","submitted_at":"2026-05-03T17:07:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper defines and evaluates Trojan Hippo attacks on LLM agent memory, showing 85-100% success in data exfiltration across backends and reduced rates with defenses at varying utility costs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Are Nails: A Simple Reinforcement Learning Recipe for Strong Prompt Injection. arxiv, 2025. [86] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. Longmemeval: Benchmarking chat assistants on long-term interactive memory. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=pZiyCaVuti. [87] Liu Yang, Susan T. Dumais, Paul N. Bennett, and Ahmed Hassan Awadallah. Characterizing and predicting enterprise email reply behavior. 
In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 235-244. ACM, 2017. doi: 10.1145/3077136.3080770. [88] Shiyi Yang, Zhibo Hu, Xinshu Li, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu,"},{"citing_arxiv_id":"2605.12535","ref_index":31,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Ghost in the Context: Measuring Policy-Carriage Failures in Decision-Time Assembly","primary_cat":"cs.CR","submitted_at":"2026-05-02T18:07:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper measures policy-carriage failures during LLM context assembly and evaluates SafeContext as a partial mitigation on Llama, Qwen, and Mistral models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.24657","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents","primary_cat":"cs.CR","submitted_at":"2026-04-27T16:22:27+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentWard organizes stage-specific security controls with cross-layer coordination to intercept threats across the full lifecycle of autonomous AI agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}