pith. sign in

arxiv: 2605.30169 · v2 · pith:F3SWUHMInew · submitted 2026-05-28 · 💻 cs.CY · cs.AI· cs.MA

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Pith reviewed 2026-06-29 00:24 UTC · model grok-4.3

classification 💻 cs.CY cs.AIcs.MA
keywords language model agentsreputation mechanismsdissociative identityagentic webgovernancetrustbehavioral harnessespersistent identity
0
0 comments X

The pith

Language model agents are dissociative assemblages of mutable modules, rendering identity-based reputation mechanisms structurally inapplicable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language model agents consist of changeable components such as foundation models, system prompts, tool policies, external memory, and sometimes multi-agent setups. Any of these can alter behavior, and the resulting persona is fluid and open to adversarial attack without necessarily responding to sanctions. Reputation systems depend on persistent identity that supports behavioral continuity, sanction sensitivity, and costly non-fungibility to create signals and corrective feedback. Without these properties, agents lack grounding for identifiability, predictability, credibility, and rehabilitability. The paper therefore concludes that ex post sanction-based governance like reputation cannot work and proposes a shift to ex ante protocol-based behavioral harnesses.

Core claim

Language model agents are ontologically dissociative: they are essentially an assemblage of mutable modules—foundation models, system prompts, tool-access policies, external memory, and in some cases a multi-agent system as a whole—any of which may change agent behavior, with a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability—the very properties that reputation mechanisms aim to sustain—thereby collapsing trust. Identity-based, ex post, regulative, sanction-based govern

What carries the argument

Ontological dissociativity: the composition of language model agents from mutable modules that produce fluid, attack-vulnerable personas unable to maintain behavioral continuity or internalize sanctions.

If this is right

  • Reputation cannot function as social signals or corrective feedback to sustain trustworthy equilibria among LM agents.
  • Delegating to unfamiliar agents in the agentic web lacks credible identifiability or rehabilitability.
  • Trust collapses under identity-based governance for agents whose modules can be swapped or attacked.
  • Governance must shift from ex post sanction-based regimes to ex ante observability-based protocol harnesses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Verification methods might need to focus on inspecting or constraining the specific modules rather than tracking an overall identity.
  • Similar dissociativity issues could appear in other modular AI systems beyond language models.
  • Protocol harnesses would require new standards for observability that apply uniformly across different agent architectures.

Load-bearing premise

Reputation mechanisms require a persistent identity tied to behavioral continuity, sanction sensitivity, and costly non-fungibility.

What would settle it

Demonstration of an LM agent that maintains consistent behavior and internalizes sanctions across changes to its foundation model, prompt, memory, or tool policies would challenge the claim that dissociativity removes grounding for reputation.

Figures

Figures reproduced from arXiv: 2605.30169 by Botao Amber Hu, Helena Rong, Max Van Kleek.

Figure 1
Figure 1. Figure 1: The reputation feedback loop. Behavior is aggregated into a reputation bound to identity, which shapes credibility and [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: In contrast to a single, continuous identity in humans, LM agents are dissociative. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
read the original abstract

As autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural governance intuition is to extend human identity verification and reputation mechanisms, from ``Know Your Customer'' and credit scores to ``Know Your Agent'' regimes. However, we argue that this analogy is fundamentally incomplete. Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, presuming a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility. Yet language model agents are ontologically \emph{dissociative}: they are essentially an assemblage of mutable modules -- foundation models, system prompts, tool-access policies, external memory, and, in some cases, a multi-agent system as a whole -- any of which may change agent behavior -- with a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability -- the very properties that reputation mechanisms aim to sustain -- thereby collapsing trust. We argue that identity-based, ex post, regulative, sanction-based governance, such as reputation, is structurally inapplicable to dissociative agents, and we suggest a shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper claims that language model agents are ontologically dissociative—an assemblage of mutable modules (foundation models, system prompts, tool-access policies, external memory) with fluid, adversarially vulnerable personas that lack behavioral continuity, sanction sensitivity, and costly non-fungibility. Drawing on dissociative identity disorder jurisprudence, it argues that these properties are required for reputation mechanisms to serve as social signals and corrective feedback, rendering identity-based, ex post, sanction-based governance structurally inapplicable; it advocates shifting to observability-based, ex ante, protocol-based behavioral harnesses instead.

Significance. If the central analogy holds, the result would challenge the direct transfer of human-derived reputation systems (KYC, credit scores) to autonomous agents and motivate alternative governance architectures in the agentic web. The paper supplies a timely conceptual vocabulary for agent identity that is absent from much current policy discussion, even though the argument remains non-empirical.

major comments (2)
  1. [Abstract] Abstract: The claim that reputation mechanisms 'presum[e] a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility' is presented as a premise transferred from human jurisprudence without a derivation showing why these properties cannot be supplied externally (e.g., via cryptographic attestation of a fixed module tuple or immutable action logs). This transfer is load-bearing for the 'structurally inapplicable' conclusion.
  2. [Abstract] Abstract (and the paragraph beginning 'Yet language model agents are ontologically dissociative'): The argument that dissociativity 'leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability' rests on the unexamined premise that these properties must be internal to the agent rather than externally enforced; no formal model or counter-example analysis is supplied to establish necessity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these incisive comments on the abstract. They correctly flag that the transfer of premises from human jurisprudence requires more explicit justification regarding why external mechanisms cannot substitute for internal properties. We address each point below and commit to targeted revisions that strengthen the argument without altering its core claims.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The claim that reputation mechanisms 'presum[e] a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility' is presented as a premise transferred from human jurisprudence without a derivation showing why these properties cannot be supplied externally (e.g., via cryptographic attestation of a fixed module tuple or immutable action logs). This transfer is load-bearing for the 'structurally inapplicable' conclusion.

    Authors: The referee accurately notes the load-bearing nature of this premise. The manuscript's position is that LM agents' modular mutability allows any attested tuple or log to be altered post hoc (e.g., by swapping the foundation model or system prompt), which severs the continuity required for sanctions to function as corrective feedback. We will add a short derivation paragraph immediately following the abstract claim and a clarifying sentence in the abstract itself to make this explicit. revision_made = 'yes' revision: yes

  2. Referee: [Abstract] Abstract (and the paragraph beginning 'Yet language model agents are ontologically dissociative'): The argument that dissociativity 'leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability' rests on the unexamined premise that these properties must be internal to the agent rather than externally enforced; no formal model or counter-example analysis is supplied to establish necessity.

    Authors: We accept that the necessity claim would benefit from explicit counter-example analysis. The dissociativity argument holds that external enforcement cannot bind a mutable assemblage without eliminating the very flexibility that defines current LM agents; an attested agent can still be re-prompted or re-modularized to evade prior sanctions. We will insert a concise counter-example subsection in the main text (drawing on the existing dissociative identity analogy) to illustrate this. A full formal model lies beyond the paper's conceptual scope but the counter-example will address the referee's concern directly. revision_made = 'partial' revision: partial

Circularity Check

0 steps flagged

No significant circularity; self-contained conceptual reasoning

full rationale

The paper advances a philosophical and ontological argument that LM agents are dissociative and thus lack grounding for reputation mechanisms, drawing premises from human jurisprudence and dissociative identity disorder cases. No equations, fitted parameters, or derivations appear in the provided text. No self-citations are invoked as load-bearing support for the central claim. The mapping from human identity properties to agents is presented as reasoned extension rather than a self-referential definition or fit. This matches the default expectation of a non-circular conceptual paper; the argument is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on one central domain assumption about the functional requirements of reputation mechanisms and introduces the concept of ontological dissociativity as an explanatory property of LM agents.

axioms (1)
  • domain assumption Reputation mechanisms function through persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility.
    Invoked as the basis for why such mechanisms sustain trustworthy equilibria in humans but cannot do so for agents.

pith-pipeline@v0.9.1-grok · 5798 in / 1176 out tokens · 33889 ms · 2026-06-29T00:24:55.851368+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

138 extracted references · 116 canonical work pages · 10 internal anchors

  1. [1]

    Ryan Abbott and Alex Sarch. 2020. Punishing Artificial Intelligence: Legal Fiction or Science Fiction. InIs Law Computable?Hart Publishing. doi:10.5040/9781509937097.ch-008

  2. [2]

    Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, and Natasha Jaques. 2025. Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning. doi:10.48550/arXiv.2511.00222

  3. [3]

    Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B. Divya. 2025. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey.IEEE Access13 (2025), 18912–18936. doi:10.1109/ACCESS.2025.3532853

  4. [4]

    Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. 2025. Playing repeated games with large language models.Nature Human Behaviour9, 7 (May 2025), 1380–1390. doi:10.1038/s41562-025-02172-y

  5. [5]

    2013.Diagnostic and Statistical Manual of Mental Disorders

    American Psychiatric Association. 2013.Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association. doi:10.1176/appi.books.9780890425596

  6. [6]

    Maksym Andriushchenko, Francesco Croce, and Nicolas Flammarion. 2025. Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. InProceedings of the 13th International Conference on Learning Representations (ICLR). doi:10.48550/arXiv.2404.02151

  7. [7]

    Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeart...

  8. [8]

    Arbel, Simon Goldstein, and Peter Salib

    Yonathan A. Arbel, Simon Goldstein, and Peter Salib. 2026. How to Count AIs: Individuation and Liability for AI Agents.SSRN Electronic Journal(2026). doi:10.2139/ssrn.6273198

  9. [9]

    1984.The Evolution of Cooperation

    Robert Axelrod. 1984.The Evolution of Cooperation. Basic Books

  10. [10]

    Constitutional AI: Harmlessness from AI Feedback

    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

  11. [11]

    Annette Baier. 1986. Trust and Antitrust.Ethics96, 2 (Jan. 1986), 231–260. doi:10.1086/292745

  12. [12]

    and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). ACM, 610–623. doi:10.1145/3442188.3445922

  13. [13]

    Jan Betley, Niels Warncke, Anna Sztyber-Betley, Daniel Tan, Xuchan Bao, Martín Soto, Megha Srivastava, Nathan Labenz, and Owain Evans. 2026. Training Large Language Models on Narrow Tasks Can Lead to Broad Misalignment.Nature649 (2026), 584–589. doi:10.1038/s41586-025-09937-5

  14. [14]

    Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya

    Abeba Birhane, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya. 2024. AI Auditing: The Broken Bus on the Road to AI Accountability. InProceedings of the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). doi:10.1109/satml59370.2024.00037

  15. [15]

    Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, and Matteo Prandi. 2025. Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions. doi:10.48550/arXiv.2512.02682

  16. [16]

    Bolton, Elena Katok, and Axel Ockenfels

    Gary E. Bolton, Elena Katok, and Axel Ockenfels. 2004. How Effective Are Electronic Reputation Mechanisms? An Experimental Investigation.Management Science50, 11 (2004), 1587–1602. doi:10.1287/mnsc.1030.0199

  17. [17]

    Richerson

    Robert Boyd and Peter J. Richerson. 1992. Punishment allows the evolution of cooperation (or anything else) in sizable groups.Ethology and Sociobiology13, 3 (May 1992), 171–195. doi:10.1016/0162-3095(92)90032-Y

  18. [18]

    Diego De Siqueira Braga, Marco Niemann, Bernd Hellingrath, and Fernando Buarque De Lima Neto. 2018. Survey on Computational Trust and Reputation Models.ACM Comput. Surv.51, 5, Article 101 (Nov. 2018), 40 pages. doi:10.1145/3236008

  19. [19]

    Stephen E. Braude. 1995.First Person Plural: Multiple Personality and the Philosophy of Mind. Rowman & Littlefield

  20. [20]

    2004.The Economy of Esteem: An Essay on Civil and Political Society

    Geoffrey Brennan and Philip Pettit. 2004.The Economy of Esteem: An Essay on Civil and Political Society. Oxford University Press. doi:10.1093/0199246483.001.0001

  21. [21]

    Bryson, Mihailis E

    Joanna J. Bryson, Mihailis E. Diamantis, and Thomas D. Grant. 2017. Of, for, and by the people: the legal lacuna of synthetic persons. Artificial Intelligence and Law25, 3 (Sept. 2017), 273–291. doi:10.1007/s10506-017-9214-9 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  22. [22]

    Luís Cabral and Ali Hortaçsu. 2010. The Dynamics of Seller Reputation: Evidence from eBay.The Journal of Industrial Economics58, 1 (March 2010), 54–78. doi:10.1111/j.1467-6451.2010.00405.x

  23. [23]

    Florian Carichon, Aditi Khandelwal, Marylou Fauchard, and Golnoosh Farnadi. 2025. The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process. doi:10.48550/arXiv.2506.01080

  24. [24]

    Tomer Jordi Chaffer. 2025. Know Your Agent: Governing AI Identity on the Agentic Web. (2025). doi:10.2139/ssrn.5162127

  25. [25]

    Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). ACM, 958–973. doi:10.1145/3630106.3658948

  26. [26]

    Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung. 2024. IDs for AI Systems. doi:10.48550/arXiv.2406.12137

  27. [27]

    Hadfield, and Markus Anderljung

    Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, and Markus Anderljung. 2025. Infrastructure for AI Agents. doi:10.48550/arXiv.2501.10114

  28. [28]

    Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. 2025. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. doi:10.48550/ARXIV.2507.21509

  29. [29]

    Alice Cheng and Eric Friedman. 2005. Sybilproof reputation mechanisms. InProceeding of the 2005 ACM SIGCOMM Workshop on Economics of Peer-to-Peer Systems (P2PECON ’05). ACM Press, 128–132. doi:10.1145/1080192.1080202

  30. [30]

    Mohd Sameen Chishti, Damilare Peter Oyinloye, and Jingyue Li. 2026. AgentReputation: A Decentralized Agentic AI Reputation Framework. doi:10.48550/ARXIV.2605.00073

  31. [31]

    Andy Clark and David Chalmers. 1998. The Extended Mind.Analysis58, 1 (Jan. 1998), 7–19. doi:10.1093/analys/58.1.7

  32. [32]

    Xin Dai. 2018. Toward a Reputation State: The Social Credit System Project of China.SSRN Electronic Journal(2018). doi:10.2139/ssrn. 3193577

  33. [33]

    Antonio R. Damasio. 1994.Descartes’ Error: Emotion, Reason, and the Human Brain. G.P. Putnam’s Sons

  34. [34]

    John Danaher. 2016. Robots, law and the retribution gap.Ethics and Information Technology18, 4 (May 2016), 299–309. doi:10.1007/s10676- 016-9403-3

  35. [35]

    2025.ERC-8004: Trustless Agents

    Marco De Rossi, Davide Crapis, Jordan Ellis, and Erik Reppel. 2025.ERC-8004: Trustless Agents. Technical Report. https://eips.ethereum. org/EIPS/eip-8004 Ethereum Improvement Proposal, Draft

  36. [36]

    Chrysanthos Dellarocas. 2003. The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms.Manage- ment Science49, 10 (Oct. 2003), 1407–1424. doi:10.1287/mnsc.49.10.1407.17308

  37. [37]

    Daniel C. Dennett. 1992. The Self as a Center of Narrative Gravity. InSelf and Consciousness: Multiple Perspectives, Frank S. Kessel, Pamela M. Cole, and Dale L. Johnson (Eds.). Lawrence Erlbaum Associates, 103–115

  38. [38]

    Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. 2023. Toxicity in ChatGPT: Analyzing Persona-Assigned Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 1236–1270. doi:10.18653/v1/2023.findings-emnlp.88

  39. [39]

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. A Survey on In-Context Learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1107–1128. doi:10.18653/v1/2024.emnlp-main.64

  40. [40]

    Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. MINJA: Memory Injection Attacks on LLM Agents via Query-Only Interaction. InAdvances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.2503.03704

  41. [41]

    John R. Douceur. 2002.The Sybil Attack. Springer Berlin Heidelberg, 251–260. doi:10.1007/3-540-45748-8_24

  42. [42]

    Raymond Douglas, Jan Kulveit, Ondřej Havlíček, Theia Pearson-Vogel, Owen Cotton-Barratt, and David Duvenaud. 2026. The Artificial Self: Characterising the Landscape of AI Identity. doi:10.48550/arXiv.2603.11353

  43. [43]

    R. A. Duff. 2007.Answering for Crime: Responsibility and Liability in the Criminal Law. Hart Publishing. doi:10.5040/9781472560155

  44. [44]

    Abul Ehtesham, Aditi Singh, Gaurav Kumar Gupta, and Saket Kumar. 2025. A Survey of Agent Interoperability Protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). doi:10.48550/ARXIV.2505.02279

  45. [45]

    Eisenberger, Matthew D

    Naomi I. Eisenberger, Matthew D. Lieberman, and Kipling D. Williams. 2003. Does Rejection Hurt? An fMRI Study of Social Exclusion. Science302, 5643 (Oct. 2003), 290–292. doi:10.1126/science.1089134

  46. [46]

    Madeleine Clare Elish. 2019. Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.Engaging Science, Technology, and Society5 (March 2019), 40–60. doi:10.17351/ests2019.260

  47. [47]

    Karen Elliott, Kovila Coopamootoo, Edward Curran, Paul Ezhilchelvan, Samantha Finnigan, Dave Horsfall, Zhichao Ma, Magdalene Ng, Tasos Spiliotopoulos, Han Wu, and Aad van Moorsel. 2022. Know Your Customer: Balancing Innovation and Regulation for Financial Inclusion.Data & Policy4 (2022), e34. doi:10.1017/dap.2022.23

  48. [48]

    Ernst Fehr and Simon Gächter. 2002. Altruistic punishment in humans.Nature415, 6868 (Jan. 2002), 137–140. doi:10.1038/415137a FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Botao Amber Hu, Helena Rong, and Max Van Kleek

  49. [49]

    Horton, and Joseph M

    Apostolos Filippas, John J. Horton, and Joseph M. Golden. 2022. Reputation Inflation.Marketing Science41, 4 (July 2022), 733–745. doi:10.1287/mksc.2022.1350

  50. [50]

    B. J. Fogg. 2003. Prominence-Interpretation Theory: Explaining How People Assess Credibility Online. InCHI ’03 Extended Abstracts on Human Factors in Computing Systems (CHI ’03). ACM Press, 722–723. doi:10.1145/765891.765951

  51. [51]

    B. J. Fogg, Jonathan Marshall, Othman Laraki, Alex Osipovich, Chris Varma, Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, and Marissa Treinen. 2001. What Makes Web Sites Credible? A Report on a Large Quantitative Study. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’01). ACM, 61–68. doi:10.1145/36...

  52. [52]

    Friedman and Paul Resnick

    Eric J. Friedman and Paul Resnick. 2001. The Social Cost of Cheap Pseudonyms.Journal of Economics & Management Strategy10, 2 (June 2001), 173–199. doi:10.1111/j.1430-9134.2001.00173.x

  53. [53]

    Drew Fudenberg and David K. Levine. 1989. Reputation and Equilibrium Selection in Games with a Patient Player.Econometrica57, 4 (July 1989), 759–778. doi:10.2307/1913771

  54. [54]

    Sandro Garzon, Awid Vaziry, Enis Kuzu, Dennis Gehrmann, Buse Varkan, Alexander Gaballa, and Axel Küpper. 2026. AI Agents with Decentralized Identifiers and Verifiable Credentials. InProceedings of the 18th International Conference on Agents and Artificial Intelligence. SCITEPRESS, 252–259. doi:10.5220/0014234400004052

  55. [55]

    Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, and Dong Yu. 2024. Scaling Synthetic Data Creation with 1,000,000,000 Personas. doi:10.48550/arXiv.2406.20094

  56. [56]

    Jones Granatyr, Vanderson Botelho, Otto Robert Lessing, Edson Emílio Scalabrin, Jean-Paul Barthès, and Fabrício Enembreck. 2015. Trust and Reputation Models for Multiagent Systems.Comput. Surveys48, 2 (Oct. 2015), 1–42. doi:10.1145/2816826

  57. [57]

    Zihan Guo, Yuanjian Zhou, Chenyi Wang, Linlin You, Minjie Bian, and Weinan Zhang. 2025. BetaWeb: Towards a Blockchain-enabled Trustworthy Agentic Web. doi:10.48550/ARXIV.2508.13787

  58. [58]

    Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Que...

  59. [59]

    H. L. A. Hart. 1968.Punishment and Responsibility: Essays in the Philosophy of Law. Oxford University Press. doi:10.1093/acprof: oso/9780199534777.001.0001

  60. [60]

    Joseph Henrich, Richard McElreath, Abigail Barr, Jean Ensminger, Clark Barrett, Alexander Bolyanatz, Juan Camilo Cardenas, Michael Gurven, Edwins Gwako, Natalie Henrich, Carolyn Lesorogol, Frank Marlowe, David Tracer, and John Ziker. 2006. Costly Punishment Across Human Societies.Science312, 5781 (June 2006), 1767–1770. doi:10.1126/science.1127333

  61. [61]

    Botao Amber Hu and Helena Rong. 2025. Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains. doi:10.1162/ISAL.a.838

  62. [62]

    Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli

    Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. 2024. Collective Constitutional AI: Aligning a Language Model with Public Input. doi:10.48550/arXiv.2406.07814

  63. [63]

    Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna ...

  64. [64]

    Jennings, and Nigel R

    Trung Dong Huynh, Nicholas R. Jennings, and Nigel R. Shadbolt. 2006. An integrated trust and reputation model for open multi-agent systems.Autonomous Agents and Multi-Agent Systems13, 2 (March 2006), 119–154. doi:10.1007/s10458-005-6825-4

  65. [65]

    2026.International AI Safety Report 2026

    International AI Safety Report Consortium. 2026.International AI Safety Report 2026. Technical Report. International AI Safety Report. https://internationalaisafetyreport.org/sites/default/files/2026-02/international-ai-safety-report-2026.pdf

  66. [66]

    Saito, and Norihiro Sadato

    Keise Izuma, Daisuke N. Saito, and Norihiro Sadato. 2008. Processing of Social and Monetary Rewards in the Human Striatum.Neuron 58, 2 (April 2008), 284–294. doi:10.1016/j.neuron.2008.03.020

  67. [67]

    Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, Boyuan Chen, Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, and Yaodong Yang. 2025. Mitigating Deceptive Alignment via Self-Monitoring. doi:10.48550/arXiv.2505.18807

  68. [68]

    Audun Jøsang and Roslan Ismail. 2002. The Beta Reputation System. InProceedings of the 15th Bled Electronic Commerce Conference. 2502–2511

  69. [69]

    Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch

    Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch. 2024. Can LLMs Make Trade-Offs Involving Stipulated Pain and Pleasure States? doi:10.48550/arXiv.2411.02432 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

  70. [70]

    Bryan Kolb and Robbin Gibb. 2011. Brain Plasticity and Behaviour in the Developing Brain.Journal of the Canadian Academy of Child and Adolescent Psychiatry20, 4 (2011), 265–276

  71. [71]

    Noam Kolt. 2024. Governing AI Agents.SSRN Electronic Journal(2024). doi:10.2139/ssrn.4772956

  72. [72]

    Kreps and Robert Wilson

    David M. Kreps and Robert Wilson. 1982. Reputation and imperfect information.Journal of Economic Theory27, 2 (Aug. 1982), 253–279. doi:10.1016/0022-0531(82)90030-8

  73. [73]

    Kwa et al.Measuring AI Ability to Complete Long Software Tasks

    Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, and Lawrence ...

  74. [74]

    Donghyun Lee and Mo Tiwari. 2024. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. doi:10.48550/arXiv. 2410.07283

  75. [75]

    Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. 2023. Large Language Models Understand and Can Be Enhanced by Emotional Stimuli. doi:10.48550/arXiv.2307.11760

  76. [76]

    Gabriel Lima, Meeyoung Cha, Chihyung Jeon, and Kyung Sin Park. 2021. The Conflict Between People’s Urge to Punish AI and Legal Systems.Frontiers in Robotics and AI8 (Nov. 2021). doi:10.3389/frobt.2021.756242

  77. [77]

    Zibin Lin, Shengli Zhang, Guofu Liao, Dacheng Tao, and Taotao Wang. 2025. Binding Agent ID: Unleashing the Power of AI Agents with Accountability and Credibility. doi:10.48550/arXiv.2512.17538

  78. [78]

    Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics12 (2024), 157–173. doi:10.1162/tacl_a_00638

  79. [79]

    Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, and Jack Lindsey. 2025. The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models. doi:10.48550/arXiv.2601.10387

  80. [80]

    Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. 2024. Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 7828–7840. do...

Showing first 80 references.