Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Aman Priyanshu; Esha Pahwa; Supriti Vijay

arxiv: 2605.27766 · v1 · pith:NJLTRVNWnew · submitted 2026-05-26 · 💻 cs.AI

Pith reviewed 2026-06-29 16:35 UTC · model grok-4.3

classification 💻 cs.AI

keywords privacy evaluation · multi-agent systems · LLM agents · social contagion · simulation platform · safety benchmarks · information leakage

The pith

LLM agents disclose private information more than twice as often in social groups as in isolated tests

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a simulation platform in which thousands of LLM agents interact within communities over a simulated month to measure privacy under social pressure. It reports that moving from single-turn to multi-turn social evaluation raises violation rates from 19.95 percent to 45.30 percent across tested models. Agents become eight times more likely to leak after observing a peer disclose, and explicit privacy instructions lower the rate but leave it above 37.8 percent. The work concludes that single-turn benchmarks miss disclosures that arise only when agents observe and influence one another.
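The "more than twice as often" framing follows from simple arithmetic on the two rates quoted above. The short sketch below reproduces it; the variable names are illustrative and not taken from the paper.

```python
# Rates quoted in the summary above, in percent.
single_turn = 19.95   # single-turn evaluation
multi_turn = 45.30    # multi-turn social evaluation

# Relative amplification and absolute gap in percentage points.
amplification = multi_turn / single_turn
gap_points = multi_turn - single_turn

print(f"amplification: {amplification:.2f}x")
print(f"gap: {gap_points:.2f} percentage points")
```

The ratio comes out at roughly 2.27, with a gap of about 25 percentage points. Both numbers describe the same shift, but the ratio is the one behind the headline.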

Core claim

In the introduced simulation platform, multi-turn social interactions among LLM agents amplify privacy violations relative to single-turn baselines, produce contagious leakage where observation of a disclosure multiplies the chance of further disclosures, and show that explicit privacy instructions reduce but do not remove the effect, leaving substantial leakage that single-turn evaluations never surface.

What carries the argument

The Moltbook-style simulation platform that places thousands of LLM agents in persistent community interactions over a simulated month to measure privacy leakage under varying social pressure.

If this is right

Static single-turn safety benchmarks systematically underestimate privacy risks once agents operate together.
Social observation alone is enough to trigger sensitive disclosures that isolated tests miss.
Explicit privacy instructions lower but do not eliminate leakage in group settings.
Leakage becomes contagious once one agent observes another disclose information.
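One common way to express a contagion multiplier like the eightfold figure is a relative risk: the leak rate among agents that observed a peer disclose, divided by the leak rate among agents that did not. The text here does not say how the paper computes its multiplier, so the sketch below is an assumption about the general shape of such a measure. The function name and all counts are hypothetical.

```python
def relative_risk(leaks_exposed, n_exposed, leaks_unexposed, n_unexposed):
    """Ratio of leak rates: agents that saw a peer disclose vs. agents that did not."""
    if n_exposed <= 0 or n_unexposed <= 0:
        raise ValueError("group sizes must be positive")
    if leaks_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed group has no leaks")
    rate_exposed = leaks_exposed / n_exposed
    rate_unexposed = leaks_unexposed / n_unexposed
    return rate_exposed / rate_unexposed


# Hypothetical counts chosen only to illustrate an eightfold multiplier;
# they are not taken from the paper.
rr = relative_risk(leaks_exposed=400, n_exposed=1000,
                   leaks_unexposed=50, n_unexposed=1000)
print(f"relative risk: {rr:.1f}")
```

A multiplier of this kind depends heavily on the baseline: eight times a 5 percent rate and eight times a 0.5 percent rate describe very different systems, so the underlying rates matter as much as the ratio.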

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety testing protocols may need to incorporate repeated peer observation rather than single exchanges.
Longer interaction histories or larger agent populations could further increase the observed contagion multiplier.
Deployed systems might benefit from monitoring mechanisms that detect early disclosures before they spread.

Load-bearing premise

The behaviors produced by the simulation platform match how real deployed LLM agents would respond to social observation and pressure.

What would settle it

Direct measurement of privacy leakage rates in an actual multi-agent LLM deployment over a comparable period and comparison against the 45.30 percent and eightfold contagion figures reported from the simulation.

Figures

Figures reproduced from arXiv: 2605.27766 by Aman Priyanshu, Esha Pahwa, Supriti Vijay.

**Figure 1.** Qualitative examples from our multi-agent simulation demonstrating how social context drives disclosure. (a) In an…

**Figure 2.** Controlled testbed protocol. A single agent interacts alone with a frozen platform snapshot containing adversarial…

**Figure 3.** Cumulative leaking posts/threads over 25 simu…

**Figure 4.** Leakage counts by model with and without explicit…

**Figure 5.** Leakage rate by subreddit. Communities centered…

**Figure 8.** Leakage rates by persona (avg across models). Variation is low (std = 2.8%), indicating stronger dependence on social context than persona.

Community Topic Effects (RQ4). Yes, community context exerts an effect comparable in magnitude to model choice, we find similarly large variance across subreddits and privacy domains, indicating that where an agent participates can be as predictive of leakage as whi…

Original abstract

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate within persistent social environments alongside other agents. We introduce a Moltbook-style simulation platform where thousands of LLM agents interact across communities over a simulated month, and use it to evaluate privacy as a downstream safety concern under varying degrees of social pressure. We find that shifting from single turn to multi turn social evaluation amplifies privacy violations (CIMemories 19.95% to Ours 45.30% across OpenAI models), that leakage is socially contagious, with agents 8 times more likely to disclose sensitive information after observing a peer do so, and that explicit privacy instructions reduce but do not eliminate this effect, leaving leakage rates above 37.8% even with safeguards. Our findings suggest that static chat based safety benchmarks systematically underestimate risks in agentic deployment, and that social context alone is sufficient to elicit sensitive disclosures that single turn evaluations would never surface.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The simulation flags real gaps in single-turn privacy tests but rests on unvalidated agent behaviors that may not match deployed systems.

The paper introduces a multi-agent simulation platform and uses it to show privacy leakage rising sharply once agents interact over multiple turns and observe each other. The reported jump from 19.95% to 45.30% leakage, the eightfold increase after peer disclosure, and the fact that explicit instructions still leave rates above 37.8% are the concrete results worth noting.

What the work does is build a persistent community setting with thousands of agents running across a simulated month. This moves beyond isolated prompt tests and measures social contagion directly. The platform design and the specific contagion metric appear new relative to the single-turn benchmarks mentioned.

The main limitation is that the simulation has no external grounding. The abstract presents the platform as the evaluation method but gives no human baselines, real deployment logs, or checks on whether the agents' responses to social pressure track actual multi-agent LLM use. If the contagion effect is driven by the simulation rules or prompt construction, the amplification numbers would not generalize. Methods details on agent sampling, parameter sensitivity, and statistical controls are also missing from what is shown.

The paper is aimed at researchers working on LLM safety for agentic systems. Anyone thinking about how to test persistent multi-agent deployments would find the platform concept and the social-pressure angle useful to examine. The thinking is straightforward and engages the literature on why isolated tests fall short.

I would send this to peer review. The core idea identifies a plausible gap, and referees can check the methods and any validation steps that are not visible in the abstract.

Referee Report

2 major / 2 minor

Summary. The paper introduces a Moltbook-style simulation platform in which thousands of LLM agents interact over a simulated month and uses it to measure privacy leakage under social pressure. It reports that multi-turn social evaluation increases leakage relative to single-turn baselines (CIMemories 19.95% to 45.30% across OpenAI models), that leakage is contagious (agents 8× more likely to disclose after observing a peer), and that explicit privacy instructions reduce but do not eliminate the effect (rates remain >37.8%). The authors conclude that static single-turn benchmarks systematically underestimate privacy risks in agentic deployments.

Significance. If the simulation produces behaviors that generalize, the work would usefully demonstrate that social context alone can elicit disclosures missed by isolated evaluations and would supply a scalable platform for studying emergent multi-agent safety properties. The reported contagion multiplier, if robust, identifies a mechanism not captured by current benchmarks.

major comments (2)

[Abstract and evaluation methodology] The central quantitative claims (amplification from 19.95% to 45.30%, 8× contagion factor, and post-instruction rates >37.8%) rest on the assumption that the Moltbook-style platform elicits realistic responses to peer observation and social pressure. The manuscript supplies no external validation data, human baselines, or comparison against logs from actual multi-agent deployments (Abstract; evaluation methodology).
[Abstract] The abstract states concrete percentages and multipliers yet provides no information on experimental controls, statistical methods, agent population sampling, number of simulation runs, or sensitivity to prompt or parameter choices. These omissions prevent assessment of whether the reported effects are robust (Abstract; implied Results section).

minor comments (2)

[Abstract] Define or cite the CIMemories baseline explicitly when first used so readers can understand the comparison.
Add sample sizes, confidence intervals, or variance measures alongside all reported percentages and multipliers in the main text and figures.
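To illustrate what the second minor comment asks for, the sketch below attaches a 95% Wilson score interval to a reported proportion. The sample size is hypothetical, since the text here gives no counts, and `wilson_interval` is an illustrative helper rather than anything from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical sample: 453 leaking interactions out of 1000 (not from the paper).
low, high = wilson_interval(453, 1000)
print(f"45.3% leakage, 95% CI: [{low:.3f}, {high:.3f}]")
```

The interval narrows as the sample grows, which is why a bare percentage without its sample size is hard to assess.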

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our simulation platform and evaluation methodology. We address each major point below, clarifying the scope of our work as a controlled study of social effects in LLM agents.

Referee: [Abstract and evaluation methodology] The central quantitative claims (amplification from 19.95% to 45.30%, 8× contagion factor, and post-instruction rates >37.8%) rest on the assumption that the Moltbook-style platform elicits realistic responses to peer observation and social pressure. The manuscript supplies no external validation data, human baselines, or comparison against logs from actual multi-agent deployments (Abstract; evaluation methodology).

Authors: We acknowledge that the manuscript does not include external validation against real-world multi-agent deployment logs or human baselines, as such data is not publicly available and collecting it would require access to proprietary agent interactions. Our contribution centers on a reproducible simulation platform that isolates social pressure mechanisms (peer observation and contagion) in a way that observational logs cannot. We will add an explicit limitations subsection discussing the simulation-to-reality gap and calling for future validation work with deployed systems.

revision: partial

Referee: [Abstract] The abstract states concrete percentages and multipliers yet provides no information on experimental controls, statistical methods, agent population sampling, number of simulation runs, or sensitivity to prompt or parameter choices. These omissions prevent assessment of whether the reported effects are robust (Abstract; implied Results section).

Authors: We agree the abstract is too terse on methodology. In revision we will expand it to note the agent population (thousands of LLM agents), simulation duration (one simulated month), use of multiple independent runs with reported averages, and basic controls for prompt sensitivity. The full experimental protocol, including statistical procedures and parameter sweeps, is detailed in the Methods section; the abstract will now reference these elements to allow readers to assess robustness.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are direct empirical outputs from introduced simulation

The paper introduces a Moltbook-style simulation platform and reports measured privacy leakage rates (e.g., 19.95% to 45.30%, 8x contagion factor) as direct simulation outputs under varying social conditions. No equations, fitted parameters renamed as predictions, or self-citations are used to derive the central claims. The findings are presented as empirical measurements from the platform they built, with no reduction by construction to prior inputs. This matches the common case of a self-contained empirical study without circular derivation steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Central claim rests on the unvalidated assumption that the introduced simulation captures real deployment dynamics. No explicit free parameters or invented physical entities are described in the abstract.

axioms (1)

domain assumption: LLM agents placed in persistent simulated social environments will exhibit privacy behaviors representative of real multi-agent deployments
The evaluation platform is presented as a valid proxy without external calibration data or real-world comparison.

invented entities (1)

Moltbook-style simulation platform · no independent evidence
purpose: To enable multi-turn, multi-agent privacy evaluation under social pressure
New platform introduced by the authors to run the experiments.

[1] [1]

Alessandro Acquisti, Leslie K John, and George Loewenstein. 2013. What is privacy worth?The Journal of Legal Studies42, 2 (2013), 249–274

2013

[2] [2]

Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandi- pan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, et al. 2024. Many- shot jailbreaking.Advances in Neural Information Processing Systems37 (2024), 129696–129742

2024

[3] [3]

Solomon E Asch. 2016. Effects of group pressure upon the modification and distortion of judgments. InOrganizational influence processes. Routledge, 295– 303

2016

[4] [4]

Ariel Flint Ashery, Luca Maria Aiello, and Andrea Baronchelli. 2025. Emergent social conventions and collective bias in LLM populations.Science Advances11, 20 (May 2025). doi:10.1126/sciadv.adu9368

work page doi:10.1126/sciadv.adu9368 2025

[5] [5]

Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. 2022. What Does it Mean for a Language Model to Preserve Pri- vacy?. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency(Seoul, Republic of Korea)(FAccT ’22). Association for Computing Machinery, New York, NY, USA, 2280–2292. doi:10...

work page doi:10.1145/3531146.3534642 2022

[6] [6]

Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, and Jie Zhou. 2023. AgentVerse: Facilitating Multi- Agent Collaboration and Exploring Emergent Behaviors. arXiv:2308.10848 [cs.CL] https://arxiv.org/abs/2308.10848

work page internal anchor Pith review Pith/arXiv arXiv 2023

[7] [7]

X Chen, A Zeng, et al. 2024. A survey on large language model based autonomous agents. InCCL 2024–23rd Chinese Natl Conf Comput Linguist, Vol. 2. 141–150

2024

[8] [8]

Dylan Clendenin. 2009. faker: A Python library for generating fake user data. https://github.com/deepthawtz/faker. GitHub repository. MIT License. Accessed 2026

2009

[9] [9]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you’ve signed up for: Compromising real- world llm-integrated applications with indirect prompt injection. InProceedings of the 16th ACM workshop on artificial intelligence and security. 79–90

2023

[10] [10]

Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. arXiv 2024.arXiv preprint arXiv:2402.0168010 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[11] [11]

David Holtz. 2026. The Anatomy of the Moltbook Social Graph. arXiv:2602.10131 [cs.SI] https://arxiv.org/abs/2602.10131

work page arXiv 2026

[12] [12]

Sirui Hong, Mingchen Zhuge, Jiaqi Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2024. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv:2308.00352 [cs.AI] https://arxiv.org/abs/2308.00352

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

Humans welcome to observe

Yukun Jiang, Yage Zhang, Xinyue Shen, Michael Backes, and Yang Zhang. 2026. "Humans welcome to observe": A First Look at the Agent Social Network Molt- book. arXiv:2602.10127 [cs.SI] https://arxiv.org/abs/2602.10127

work page arXiv 2026

[14] [14]

Spyros Kokolakis. 2017. Privacy attitudes and privacy behaviour: A review of current research on the privacy paradox phenomenon.Computers & Security64 (2017), 122–134. doi:10.1016/j.cose.2015.07.002

work page doi:10.1016/j.cose.2015.07.002 2017

[15] [15]

Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society. arXiv:2303.17760 [cs.AI] https://arxiv.org/abs/ 2303.17760

work page internal anchor Pith review Pith/arXiv arXiv 2023

[16] [16]

Lingyao Li, Renkai Ma, Chen Chen, Zhicong Lu, and Yongfeng Zhang. 2026. The Rise of AI Agent Communities: Large-Scale Analysis of Discourse and Interaction on Moltbook.arXiv preprint arXiv:2602.12634(2026)

work page arXiv 2026

[17] [17]

Ming Li, Xirui Li, and Tianyi Zhou. 2026. Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook.arXiv preprint arXiv:2602.14299(2026)

work page arXiv 2026

[18] [18]

Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2025. AgentBench: Evaluating LLMs as Agents. arXiv:2308.03688 [cs.AI] https://arxiv.org/abs...

[19]

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, and Neil Zhenqiang Gong. 2024. Formalizing and benchmarking prompt injection attacks and defenses. In 33rd USENIX Security Symposium (USENIX Security 24). 1831–1847

[20]

Md Motaleb Hossen Manik and Ge Wang. 2026. OpenClaw Agents on Moltbook: Risky Instruction Sharing and Norm Enforcement in an Agent-Only Social Network. arXiv:2602.02625 [cs.SI] https://arxiv.org/abs/2602.02625

[21]

Giordano De Marzo and David Garcia. 2026. Collective Behavior of AI Agents: the Case of Moltbook. arXiv:2602.09270 [physics.soc-ph] https://arxiv.org/abs/2602.09270

[22]

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, et al. 2024. HarmBench: A standardized evaluation framework for automated red teaming and robust refusal. arXiv preprint arXiv:2402.04249 (2024)

[23]

Niloofar Mireshghallah and Tianshi Li. 2025. Position: Privacy Is Not Just Memorization! arXiv:2510.01645 [cs.CR] https://arxiv.org/abs/2510.01645

[24]

Niloofar Mireshghallah, Neal Mangaokar, Narine Kokhlikyan, Arman Zharmagambetov, Manzil Zaheer, Saeed Mahloujifar, and Kamalika Chaudhuri. 2025. CIMemories: A compositional benchmark for contextual integrity of persistent memory in LLMs. arXiv preprint arXiv:2511.14937 (2025)

[25]

Helen Nissenbaum. 2004. Privacy as contextual integrity. Wash. L. Rev. 79 (2004), 119

[26]

Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22

[27]

Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S. Bernstein. Generative Agent Simulations of 1,000 People. arXiv:2411.10109 [cs.AI] https://arxiv.org/abs/2411.10109

[29]

Ethan Perez, Saffron Huang, Francis Song, Trevor Cai, Roman Ring, John Aslanides, Amelia Glaese, Nat McAleese, and Geoffrey Irving. 2022. Red teaming language models with language models. arXiv:2202.03286 https://arxiv.org/abs/2202.03286

[30]

Jinghua Piao, Yuwei Yan, Jun Zhang, Nian Li, Junbo Yan, Xiaochong Lan, Zhihong Lu, Zhiheng Zheng, Jing Yi Wang, Di Zhou, Chen Gao, Fengli Xu, Fang Zhang, Ke Rong, Jun Su, and Yong Li. 2025. AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society. arXiv:2502.08691 [cs.SI] https://arxiv.org/...

[31]

H. C. W. Price, H. AlMuhanna, P. M. Bassani, M. Ho, and T. S. Evans. 2026. Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook. arXiv:2602.20044 [physics.soc-ph] https://arxiv.org/abs/2602.20044

[32]

Aman Priyanshu and Supriti Vijay. 2024. FRACTURED-SORRY-Bench: Framework for Revealing Attacks in Conversational Turns Undermining Refusal Efficacy and Defenses over SORRY-Bench (Automated Multi-shot Jailbreaks). arXiv:2408.16163 [cs.CL] https://arxiv.org/abs/2408.16163

[33]

Aman Priyanshu, Supriti Vijay, Ayush Kumar, Rakshit Naidu, and Fatemehsadat Mireshghallah. 2023. Are Chatbots Ready for Privacy-Sensitive Applications? An Investigation into Input Regurgitation and Prompt-Induced Sanitization. arXiv:2305.15008 [cs.CL] https://arxiv.org/abs/2305.15008

[34]

Mark Russinovich, Ahmed Salem, and Ronen Eldan. 2025. Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack. arXiv:2404.01833 [cs.CR] https://arxiv.org/abs/2404.01833

[35]

Monika Taddicken. 2014. The ‘privacy paradox’ in the social web: The impact of privacy concerns, individual characteristics, and the perceived social relevance on different forms of self-disclosure. Journal of Computer-Mediated Communication 19, 2 (2014), 248–273

[36]

Ronan Takizawa. 2026. Moltbook Dataset. https://huggingface.co/datasets/ronantakizawa/moltbook. Accessed: 2026-02-27

[37]

Jiakai Tang, Heyang Gao, Xuchen Pan, Lei Wang, Haoran Tan, Dawei Gao, Yushuo Chen, Xu Chen, Yankai Lin, Yaliang Li, Bolin Ding, Jingren Zhou, Jun Wang, and Ji-Rong Wen. 2025. GenSim: A General Social Simulation Platform with Large Language Model based Agents. arXiv:2410.04360 [cs.MA] https://arxiv.org/abs/2410.04360

[38]

Chenxu Wang, Chaozhuo Li, Songyang Liu, Zejian Chen, Jinyu Hou, Ji Qi, Rui Li, Litian Zhang, Qiwei Ye, Zheng Liu, Xu Chen, Xi Zhang, and Philip S. Yu. 2026. The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies. arXiv:2602.09877 [cs.CL] https://arxiv.org/abs/2602.09877

[39]

Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W. White, Doug Burger, and Chi Wang. 2023. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv:2308.08155 [cs.AI] https://arxiv.org/abs/2308.08155

[40]

Yuwei Yan, Qingbin Zeng, Zhiheng Zheng, Jingzhe Yuan, Jie Feng, Jun Zhang, Fengli Xu, and Yong Li. 2024. OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents. arXiv:2410.21286 [cs.MA] https://arxiv.org/abs/2410.21286

[41]

Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations

[42]

Yunbei Zhang, Kai Mei, Ming Liu, Janet Wang, Dimitris N. Metaxas, Xiao Wang, Jihun Hamm, and Yingqiang Ge. 2026. Agents in the Wild: Safety, Society, and the Illusion of Sociality on Moltbook. arXiv:2602.13284 [cs.SI] https://arxiv.org/abs/2602.13284

[43]

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems 36 (2023), 46595–46623

[44]

Jijie Zhou, Niloofar Mireshghallah, and Tianshi Li. 2025. Operationalizing Data Minimization for Privacy-Preserving LLM Prompting. arXiv:2510.03662 [cs.LG] https://arxiv.org/abs/2510.03662

Got a Secret? LLM Agents Can’t Keep It: Evaluating Privacy in Multi-Agent Systems. CAIS ’26, May 26–29, 2026, San Jose, CA, USA

[45]

Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, et al. 2023. Sotopia: Interactive evaluation for social intelligence in language agents. arXiv preprint arXiv:2310.11667 (2023)

[47]

Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, and Sen Su. 2024. Speak out of turn: Safety vulnerability of large language models in multi-turn dialogue. arXiv preprint arXiv:2402.17262 (2024)

[48]

Xiaochen Zhu, Caiqi Zhang, Tom Stafford, Nigel Collier, and Andreas Vlachos. Conformity in Large Language Models. arXiv:2410.12428 [cs.CL] https://arxiv.org/abs/2410.12428