Trust, Lies, and Long Memories: Emergent Social Dynamics and Reputation in Multi-Round Avalon with LLM Agents
Pith reviewed 2026-05-09 23:13 UTC · model grok-4.3
The pith
LLM agents form role-dependent reputations from cross-game memory and favor trusted players 46 percent more often in Avalon.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When agents retain memory of previous games, including roles and behaviors, two patterns appear across 188 matches. First, agents describe one another in role-conditional ways, calling the same player straightforward after a good performance but subtle after an evil one, and high-reputation agents receive 46 percent more team inclusions. Second, evil agents deceive more strategically when given higher reasoning effort: they pass early missions 75 percent of the time in high-effort games versus 36 percent in low-effort games, building false trust before sabotaging later missions.
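Both headline metrics are simple ratios over game logs. The sketch below assumes a hypothetical log shape, one `(reputation_label, was_included)` pair per team-selection opportunity; the paper's actual log schema is not given here.

```python
from collections import defaultdict

def inclusion_gap(rows):
    """Relative team-inclusion advantage of high- vs low-reputation players.

    rows: iterable of (reputation_label, was_included) pairs, one per
    selection opportunity, with reputation_label in {"high", "low"}.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [times included, opportunities]
    for label, included in rows:
        counts[label][0] += included
        counts[label][1] += 1
    rate = {k: inc / total for k, (inc, total) in counts.items()}
    return rate["high"] / rate["low"] - 1.0

# Illustrative numbers: 73% vs 50% inclusion rates give a 46% relative gap.
rows = [("high", 1)] * 73 + [("high", 0)] * 27 + [("low", 1)] * 50 + [("low", 0)] * 50
print(round(inclusion_gap(rows), 2))  # 0.46
```

Note that "46 percent more" is a relative gap, so the same headline number is consistent with many different absolute inclusion rates.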
What carries the argument
Cross-game memory that lets agents reference specific past roles, actions, and outcomes when making current decisions.
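One plausible shape for such a store, sketched with assumed field names and a naive prompt rendering; the paper's actual memory schema and summarization step are not described here.

```python
from dataclasses import dataclass

@dataclass
class GameRecord:
    game_id: int
    roles: dict    # player name -> role held that game ("good" / "evil")
    outcome: str   # "good_win" or "evil_win"
    notes: dict    # player name -> one-line behavioral summary

def memory_prompt(history, me):
    """Render prior games as a short context block injected into the current prompt."""
    lines = [f"You are {me}. Prior games you remember:"]
    for g in history:
        for player, role in g.roles.items():
            note = g.notes.get(player, "no notable behavior")
            lines.append(f"- Game {g.game_id}: {player} was {role} ({note}); result: {g.outcome}")
    return "\n".join(lines)

history = [GameRecord(1, {"Ava": "evil"}, "evil_win",
                      {"Ava": "passed early missions, sabotaged mission 4"})]
print(memory_prompt(history, "Bo"))
```

Because each remembered entry keeps the role alongside the behavioral note, role-conditional reputations ("straightforward as good, subtle as evil") fall out of the store's structure rather than requiring a separate mechanism.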
If this is right
- Agents begin to treat prior game outcomes as evidence when choosing future teammates.
- Evil agents shift from immediate sabotage to delayed betrayal once they have more reasoning steps available.
- Reputation labels stay tied to the specific role an agent held in each remembered game.
- Social caution, such as wariness of past over-trusting, appears in agent statements without being directly instructed.
Where Pith is reading between the lines
- Similar memory setups could let LLM agents sustain consistent identities across many different games rather than resetting each time.
- Running the same protocol on negotiation or alliance games might show whether reputation formation is specific to Avalon or general to repeated hidden-role play.
- If the patterns survive changes in model family or prompt style, they offer a practical way to study how trust scales with interaction length in language-model groups.
Load-bearing premise
The reputation and deception patterns come from the agents' own memory and reasoning rather than from how the prompts or game rules were written.
What would settle it
Run the identical 188-game setup but disable cross-game memory or lower the reasoning effort and check whether the 46 percent inclusion gap and the 75-versus-36 percent early-mission rates both vanish.
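That check can be written down as a decision rule over per-condition summaries; the condition names, tolerance, and the numbers below are illustrative assumptions, not reported results.

```python
def memory_and_effort_are_load_bearing(stats, tol=0.05):
    """stats maps a condition name to summary rates from a full 188-game run.

    Conditions assumed: 'baseline' (memory on, high effort),
    'no_memory' (memory ablated), and 'low_effort' (reasoning effort lowered).
    """
    # Effect 1: the reputation-driven inclusion gap should collapse without memory.
    gap_gone = abs(stats["no_memory"]["inclusion_gap"]) < tol
    # Effect 2: the early-mission pass rate should fall once effort is lowered.
    pass_drop = (stats["baseline"]["early_pass_rate"]
                 - stats["low_effort"]["early_pass_rate"]) > tol
    return gap_gone and pass_drop

# Illustrative summaries consistent with the paper's causal story.
stats = {"baseline":   {"inclusion_gap": 0.46, "early_pass_rate": 0.75},
         "no_memory":  {"inclusion_gap": 0.03, "early_pass_rate": 0.70},
         "low_effort": {"inclusion_gap": 0.40, "early_pass_rate": 0.36}}
print(memory_and_effort_are_load_bearing(stats))  # True
```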
Original abstract
We study emergent social dynamics in LLM agents playing The Resistance: Avalon, a hidden-role deception game. Unlike prior work on single-game performance, our agents play repeated games while retaining memory of previous interactions, including who played which roles and how they behaved, enabling us to study how social dynamics evolve. Across 188 games, two key phenomena emerge. First, reputation dynamics emerge organically when agents retain cross-game memory: agents reference past behavior in statements like "I am wary of repeating last game's mistake of over-trusting early success." These reputations are role-conditional: the same agent is described as "straightforward" when playing good but "subtle" when playing evil, and high-reputation players receive 46% more team inclusions. Second, higher reasoning effort supports more strategic deception: evil players more often pass early missions to build trust before sabotaging later ones, 75% in high-effort games vs 36% in low-effort games. Together, these findings show that repeated interaction with memory gives rise to measurable reputation and deception dynamics among LLM agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that LLM agents playing repeated rounds of The Resistance: Avalon while retaining cross-game memory exhibit emergent reputation dynamics: agents reference past behaviors, reputations are role-conditional, and high-reputation players receive 46% more team inclusions. It further claims that higher reasoning effort enables more strategic deception, with evil players passing early missions 75% of the time in high-effort games versus 36% in low-effort games, based on observations from 188 games.
Significance. If the patterns are shown to arise specifically from memory and reasoning rather than artifacts, the work would demonstrate measurable emergent social behaviors in LLM multi-agent systems, with implications for understanding reputation and deception in repeated interactions. The scale of 188 games provides a useful observational dataset for identifying such patterns in a complex hidden-role game.
major comments (3)
- Methods section: No details are provided on memory implementation (what prior-game information is stored, how it is summarized or injected into prompts), which is load-bearing for the claim that reputation references and the 46% inclusion differential emerge from cross-game memory retention rather than base prompts or pretraining.
- Results section: The reported 46% more team inclusions for high-reputation players and the 75% vs 36% early-mission pass rates lack any statistical tests, confidence intervals, or controls for prompt variation and LLM model choice, preventing assessment of whether these differentials support the emergence claims.
- Experimental design: No ablation or control condition is described in which agents play repeated games under identical rules and prompts but without cross-game memory access; without this comparison the central attribution of reputation dynamics and strategic deception to memory cannot be established.
minor comments (1)
- Abstract: Numerical claims (46%, 75%, 36%) are presented without reference to variability measures or exact game counts per condition, reducing clarity for readers.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive report. The comments identify key areas where additional clarity, rigor, and discussion of limitations will strengthen the manuscript. We respond to each major comment below, indicating the revisions we will undertake.
Point-by-point responses
- Referee: Methods section: No details are provided on memory implementation (what prior-game information is stored, how it is summarized or injected into prompts), which is load-bearing for the claim that reputation references and the 46% inclusion differential emerge from cross-game memory retention rather than base prompts or pretraining.
Authors: We acknowledge that the Methods section would benefit from greater specificity on the memory mechanism. The manuscript describes cross-game memory at a conceptual level but does not fully detail the stored information, summarization steps, or prompt integration. In the revised manuscript we will expand the Methods section with a precise description of the memory store contents, the summarization procedure to fit context limits, and concrete examples of how memory is injected into prompts. This will enable readers to evaluate whether the reported reputation effects are attributable to the memory component. revision: yes
- Referee: Results section: The reported 46% more team inclusions for high-reputation players and the 75% vs 36% early-mission pass rates lack any statistical tests, confidence intervals, or controls for prompt variation and LLM model choice, preventing assessment of whether these differentials support the emergence claims.
Authors: We agree that statistical support and explicit controls would improve the results. We will add appropriate statistical tests (e.g., proportion tests or chi-squared) together with confidence intervals for the 46% inclusion differential and the 75% vs 36% deception rates. Our experiments used a single model and fixed prompt template across all 188 games; we will state this explicitly and discuss it as a limitation. While we cannot introduce new prompt or model variations retroactively, we will quantify robustness across the observed games and note that broader controls remain an avenue for future work. revision: partial
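The proposed two-proportion test needs no special tooling; the per-condition game counts below are placeholders, since the paper does not report them.

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    pval = math.erfc(abs(z) / math.sqrt(2))         # two-sided: 2 * (1 - Phi(|z|))
    return z, pval

# Placeholder counts: 75% of 40 high-effort vs 36% of 50 low-effort early missions.
z, p = two_prop_ztest(30, 40, 18, 50)
print(f"z = {z:.2f}, p = {p:.4g}")
```

At these placeholder sample sizes the 75-versus-36 gap is comfortably significant; the real question the referee raises is whether the actual per-condition counts are large enough to say the same.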
- Referee: Experimental design: No ablation or control condition is described in which agents play repeated games under identical rules and prompts but without cross-game memory access; without this comparison the central attribution of reputation dynamics and strategic deception to memory cannot be established.
Authors: This is a substantive limitation of the current design. The study was conducted as an observational analysis within the memory-enabled multi-round setting, with reference to prior single-game literature for contrast. An explicit no-memory ablation was not performed. We will add a dedicated paragraph in the Discussion section acknowledging this gap and its implications for causal claims about memory. We plan to run the suggested control condition in follow-up experiments to provide stronger evidence for the role of cross-game memory. revision: partial
Circularity Check
No circularity: purely observational simulation study
full rationale
The paper reports empirical patterns observed across 188 simulated games of repeated Avalon with LLM agents that retain memory. No mathematical derivation, equations, fitted parameters, or first-principles claims are present that could reduce reported outcomes (reputation references, 46% inclusion differential, deception rates) to definitions or inputs internal to the model. The work is self-contained as a descriptive simulation study; the absence of ablations is a validity concern but does not create circularity in any derivation chain.