EASE Configuration Facilitates A Reproducible Science of LLM Social Simulations

Aurélien Bück-Kaeffer; Jean-François Godbout; Maximilian Puelma Touzel; Reihaneh Rabbany; Sneheel Sarangi; Zachary Yang

arxiv: 2605.30258 · v1 · pith:QJMAPTI4new · submitted 2026-05-28 · 💻 cs.MA

Sneheel Sarangi, Maximilian Puelma Touzel, Aurélien Bück-Kaeffer, Zachary Yang, Jean-François Godbout, Reihaneh Rabbany

Pith reviewed 2026-06-28 23:48 UTC · model grok-4.3

classification 💻 cs.MA

keywords LLM social simulations · multi-agent systems · reproducibility · modular architecture · simulation engines · evaluation metrics · agent-based modeling

The pith

EASE modularization turns ad-hoc LLM social simulators into reproducible research tools by separating environments, agents, engines and metrics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current LLM simulators for social interactions are built as monolithic, one-off systems that resist replication and systematic comparison. The paper proposes a four-part modular breakdown called EASE—environments that define the setting, agents that embody the participants, simulation engines that run the interactions, and evaluation metrics that score the outcomes. This breakdown is placed inside an experimental study schema that ties every run to an explicit research question. An open-source implementation, SiliSocS, puts the structure into practice and is tested in three case studies that re-examine prior questions, probe deeper into complex scenarios, and extend earlier work. If the modular split works as described, design choices become isolatable variables whose effects on results can be measured consistently across independent studies.

Core claim

The central claim is that imposing the EASE modular structure on LLM-based multi-agent simulators produces more reproducible research outputs; the three case studies conducted inside the SiliSocS sandbox demonstrate this by showing how the same configuration can assess existing questions, dive deeper into complex ones, and elaborate on prior studies while isolating the impacts of specific design choices.

What carries the argument

EASE, the explicit separation of a simulator into Environments, Agents, Simulation engines, and Evaluation metrics, which supplies the standardized parts needed to run study-structured workflows around explicit research questions.
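A minimal sketch of how the four-part split and the study wrapper might be represented in code. The class names, field names, and example values are assumptions made for illustration; they are not taken from SiliSocS.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EaseConfig:
    """Hypothetical container for the four EASE parts (names are illustrative)."""
    environment: dict  # the setting, e.g. platform and timeline rules
    agents: dict       # the participants, e.g. model and persona style
    engine: dict       # what runs the interactions
    metrics: tuple     # names of the scores computed on the outcomes


@dataclass(frozen=True)
class Study:
    """Ties one configuration to one explicit research question."""
    research_question: str
    config: EaseConfig

    def __post_init__(self):
        # A study without a stated question is rejected up front.
        if not self.research_question.strip():
            raise ValueError("a study needs an explicit research question")


# Illustrative values only; they do not reproduce any run from the paper.
study = Study(
    research_question="Do rich personas increase post diversity?",
    config=EaseConfig(
        environment={"platform": "microblog"},
        agents={"model": "gpt4o-mini", "persona": "rich"},
        engine={"name": "concordia"},
        metrics=("lexical_diversity", "stance_diversity"),
    ),
)

# A plain nested dict that could be saved next to the results.
record = asdict(study)
```

Keeping each part in its own field means a change to, say, the agents cannot silently alter the environment or the metrics.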

If this is right

Researchers gain a consistent way to orchestrate workflows that center on answering explicit questions inside generated scenarios.
Limitations of existing modeling approaches become visible through repeated, comparable assessments.
The effects of individual design choices on key simulation results can be isolated and measured.
Existing studies can be elaborated or extended using the same modular parts without rebuilding the entire simulator.
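Isolating a design choice amounts to comparing runs whose configurations differ in exactly one setting. The sketch below generates such one-factor variants from a baseline; the function names, keys, and values are hypothetical and only illustrate the idea.

```python
import copy


def one_factor_variants(baseline, alternatives):
    """Return configs that each differ from the baseline in exactly one setting.

    `alternatives` maps (part, key) to the candidate values for that setting.
    """
    variants = []
    for (part, key), values in alternatives.items():
        for value in values:
            if baseline[part][key] == value:
                continue  # identical to the baseline, nothing to isolate
            config = copy.deepcopy(baseline)
            config[part][key] = value
            variants.append((part, key, value, config))
    return variants


def changed_settings(a, b):
    """List the (part, key) pairs on which two configs disagree."""
    return [(part, key) for part in a for key in a[part]
            if a[part][key] != b[part][key]]


# Illustrative settings only.
baseline = {
    "agents": {"model": "gpt4o-mini", "persona": "basic"},
    "environment": {"timeline": "chronological"},
}
alternatives = {
    ("agents", "persona"): ["basic", "rich"],
    ("agents", "model"): ["gpt4o"],
    ("environment", "timeline"): ["recsys"],
}
variants = one_factor_variants(baseline, alternatives)
```

Because every variant is a deep copy with a single override, any difference in the resulting metrics can be attributed to that one setting, up to run-to-run noise.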

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Groups working on different social domains could share and recombine EASE components instead of rewriting entire simulators from scratch.
Standardized evaluation metrics inside EASE might eventually support direct numerical comparison of simulation quality across papers.
The framework could be extended to record exact configuration files alongside published results, making later re-runs trivial.
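One way the configuration-recording idea might look, as a sketch under stated assumptions: the configuration is a JSON-serializable dict, and a SHA-256 digest of its canonical form is stored with the results. The function names and record layout are hypothetical, not part of SiliSocS.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a JSON-serializable config in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def result_record(config, results):
    """Bundle results with the exact config and its fingerprint."""
    return {
        "config": config,
        "config_sha256": config_fingerprint(config),
        "results": results,
    }


# Two dicts with the same content but different key order hash identically.
a = {"agents": {"model": "gpt4o-mini"}, "engine": {"name": "concordia"}}
b = {"engine": {"name": "concordia"}, "agents": {"model": "gpt4o-mini"}}
same = config_fingerprint(a) == config_fingerprint(b)
```

A later re-run can recompute the fingerprint from its own configuration and compare it with the published one before any outputs are compared.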

Load-bearing premise

That forcing simulators into the EASE modular split will produce measurably more reproducible outputs than the current ad-hoc style.

What would settle it

A side-by-side replication exercise in which independent teams rebuild the same social scenario once with ordinary ad-hoc code and once with an EASE-configured system, then compare the variance in generated interaction logs and the success rate of exact replication.
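The two quantities named in this exercise can be computed in a few lines. The sketch below assumes each interaction log is a list of comparable events and each run yields one scalar metric; the function names and toy inputs are placeholders, not data from the paper.

```python
from statistics import pvariance


def exact_replication_rate(reference_log, replica_logs):
    """Fraction of replica logs that match the reference log exactly."""
    if not replica_logs:
        raise ValueError("need at least one replica log")
    matches = sum(log == reference_log for log in replica_logs)
    return matches / len(replica_logs)


def metric_variance(values):
    """Population variance of one metric across repeated runs."""
    return pvariance(values)


# Toy inputs for illustration only.
reference = ["post", "reply"]
replicas = [
    ["post", "reply"],
    ["post", "like"],
    ["post", "reply"],
    ["post", "reply"],
]
rate = exact_replication_rate(reference, replicas)  # 3 of 4 replicas match
spread = metric_variance([2.0, 2.0, 2.0])           # identical runs, no spread
```

Running the same two functions on logs from an ad-hoc build and from an EASE-configured build would give the side-by-side numbers the exercise calls for.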

Figures

Figures reproduced from arXiv: 2605.30258 by Aurélien Bück-Kaeffer, Jean-François Godbout, Maximilian Puelma Touzel, Reihaneh Rabbany, Sneheel Sarangi, Zachary Yang.

**Figure 1.** System Design Exemplified with Application To Style Diversity. The framework consists of EASE simulation configuration (C2; right): Environments, Agents, Simulation Engine, and Evaluation Metrics. These are used to configure a (e.g., Concordia) simulation engine (middle) to run custom simulations within a 7-step research cycle structured in our proposed study schema (C1; left). The entire system is provide…

**Figure 2.** Style diversity study results. (a) gpt4o outperforms gpt4o-mini in having more diverse responses. (b) Post diversity seems unaffected by stronger grounding of agents using rich personas (gpt4o-mini was fixed here). (c) Posts are, nevertheless, more diverse with rich personas, just in stance, not in lexical diversity. (d) Action Prompt rephrasing for distinct goals has little effect on within agent di…

**Figure 3.** Engagement case study panels: (a; left) total actions per active agent per active episode for …

**Figure 4.** Score Diversity of responses to probe questions.

**Figure 5.** Action distribution differences between the Qwen3.5-4B and 9B model.

read the original abstract

LLMs are increasingly deployed to simulate social interactions, yet many of the existing simulators remain ad hoc and monolithic. This lack of architectural standardization prevents reproducible research and complicates downstream evaluation. We advance a rigorous science of LLM-based multi-agent simulation by modularizing core components into Environments, Agents, Simulation engines, and Evaluation metrics (EASE). We demonstrate the utility of EASE configuration by wrapping it in an experimental study schema for orchestrating workflows centered around answering explicit research questions in generated scenarios. We contribute SiliSocS, an open-source, research-ready Silicon Society Sandbox implementing a study-structured EASE configuration to enable highly configurable and reproducible LLM-based social simulations. Using SiliSocS and EASE, we present three case studies, showcasing the system's comprehensive assessment of existing questions, ability to dive deeper into complex questions, and elaboration of existing studies, respectively. Together, these case studies highlight the limitations of current modeling approaches and isolate the impacts of design choices on key results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EASE is a clean packaging of standard modular ideas plus an open platform, but the case studies stay qualitative and do not measure the reproducibility gains they assert.

The main takeaway is that the authors have wrapped the usual pieces of LLM multi-agent work into the EASE acronym (Environments, Agents, Simulation engines, Evaluation metrics) and added a study schema plus the SiliSocS sandbox. That specific four-part framing and workflow are new as a single named package.

They do the field a service by releasing configurable, research-oriented code instead of another one-off simulator. The three case studies show the setup handling assessment of existing questions, deeper dives, and extensions of prior work, which makes the modularity concrete.

The soft spot is exactly the one flagged in the stress-test note. The case studies are presented as qualitative demonstrations that isolate design impacts, with no before-after numbers, variance stats, replication rates, or baseline comparisons on reproducibility. The claim that EASE produces measurably more reliable outputs therefore stays an inference from the structure rather than an observed result.

This paper is for people already running or reviewing LLM social simulations who want a shared vocabulary and starting code. A reader who needs a practical reference for organizing experiments will get something usable; someone looking for proven gains in reliability will come away wanting more data.

It deserves peer review. The framework is coherent, the platform is contributed, and the case studies are at least illustrative. Referees can ask for the quantitative checks that are missing and decide whether the reproducibility argument holds once the full details are visible.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes the EASE framework (Environments, Agents, Simulation engines, Evaluation metrics) to modularize LLM-based multi-agent social simulations, aiming to improve reproducibility over ad-hoc approaches. It contributes the open-source SiliSocS sandbox implementing a study-structured EASE configuration and presents three case studies demonstrating comprehensive assessment, deeper dives into questions, and elaboration of prior work.

Significance. The open-source release of SiliSocS is a concrete strength that could enable community-wide experimentation and comparison. If the modular EASE structure can be shown to deliver measurable reproducibility gains, the work would provide a useful organizational scaffold for the growing area of LLM social simulations.

major comments (1)

[Abstract] Abstract (final sentence) and the case-study descriptions: the central claim that EASE 'isolate[s] the impacts of design choices on key results' and thereby facilitates reproducible science is load-bearing, yet the three case studies are presented only as qualitative demonstrations without any quantitative metrics (variance reduction, inter-run consistency, replication-rate improvement, or explicit before/after comparison to ad-hoc baselines).

minor comments (1)

The experimental study schema that wraps EASE is mentioned but not given a dedicated section or pseudocode; a concise diagram or table enumerating the workflow steps would improve clarity for readers implementing similar setups.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the potential of the open-source SiliSocS contribution. We address the single major comment below and describe the corresponding revisions.

Referee: [Abstract] Abstract (final sentence) and the case-study descriptions: the central claim that EASE 'isolate[s] the impacts of design choices on key results' and thereby facilitates reproducible science is load-bearing, yet the three case studies are presented only as qualitative demonstrations without any quantitative metrics (variance reduction, inter-run consistency, replication-rate improvement, or explicit before/after comparison to ad-hoc baselines).

Authors: We agree that the case studies function as qualitative demonstrations of EASE's modularity rather than as quantitative benchmarks of reproducibility gains. The manuscript's central claim rests on the observation that the explicit EASE decomposition (with study-structured configuration) makes individual design choices transparent and independently variable, which the three case studies illustrate by showing how targeted changes in one module produce observable differences in generated social outcomes. This structure inherently supports reproducibility by enabling others to replicate or extend the exact configuration. At the same time, we acknowledge that the absence of explicit quantitative metrics (e.g., variance across seeds or direct ad-hoc baselines) leaves the reproducibility benefit as an inferred rather than measured property. In revision we will (1) add a short quantitative subsection reporting run-to-run variance for key metrics under fixed EASE configurations, (2) include a brief discussion contrasting the study-structured workflow with a monolithic baseline, and (3) revise the abstract's final sentence to state that EASE "enables isolation of design-choice impacts" rather than claiming it has already been shown to deliver measurable reproducibility gains.

revision: partial

Circularity Check

0 steps flagged

No circularity: EASE is a conceptual modularization proposal without derivation or self-referential reduction

full rationale

The paper introduces EASE as an organizational framework (Environments, Agents, Simulation engines, Evaluation metrics) for LLM multi-agent simulators and implements it in SiliSocS, with three qualitative case studies as demonstrations. No equations, fitted parameters, predictions, or load-bearing self-citations appear. The central claim that EASE improves reproducibility is presented as a consequence of the modular structure itself, supported by case-study illustrations rather than any closed loop that reduces the output to the input by construction. This is a standard non-circular framework proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that LLM agents can usefully stand in for human social actors and that explicit modular boundaries will reduce hidden implementation variance; no free parameters or new physical entities are introduced.

axioms (1)

domain assumption: LLM-based agents can simulate social interactions in a manner that yields scientifically useful outputs.
Invoked by the decision to build simulators around LLMs and to evaluate them on social-science questions.

invented entities (1)

EASE configuration (no independent evidence)
purpose: To enforce modular separation of simulation components for reproducibility.
Newly defined four-part architecture introduced by the authors; independent evidence would require external adoption metrics not present in the abstract.

Reference graph

Works this paper leans on

57 extracted references · 25 canonical work pages · 4 internal anchors

[1]

Zhiheng Xi et al. The Rise and Potential of Large Language Model Based Agents: A Survey. arXiv: 2309.07864 [cs.AI]. URL: https://arxiv.org/abs/2309.07864

[2]

Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour. Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation. 2025. arXiv: 2504.08954 [cs.CY]. URL: https://arxiv.org/abs/2504.08954

[3]

Dingyi Zuo et al. MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics. 2025. arXiv: 2510.12423 [cs.AI]. URL: https://arxiv.org/abs/2510.12423

[4]

Giorgio Piatti et al. “Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents”. In: Advances in Neural Information Processing Systems. Ed. by A. Globerson et al. Vol. 37. Curran Associates, Inc., 2024, pp. 111715–111759. URL: https://proceedings.neurips.cc/paper_files/paper/2024/file/ca9567d8ef6b2ea2da0d...

[5]

Chandler Smith et al. “The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents”. In: NeurIPS 2024 Competition Track. 2024. URL: https://openreview.net/forum?id=dfeFy1PSSw

[6]

Ali Khodabandeh Yalabadi et al. Controlling the Misinformation Diffusion in Social Media by the Effect of Different Classes of Agents. 2024. arXiv: 2401.11524 [cs.MA]. URL: https://arxiv.org/abs/2401.11524

[7]

Gian Marco Orlando et al. Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations. 2025. arXiv: 2510.25003 [cs.MA]. URL: https://arxiv.org/abs/2510.25003

[8]

Natalie Shapira et al. Agents of Chaos. 2026. arXiv: 2602.20021 [cs.AI]. URL: https://arxiv.org/abs/2602.20021

[9]

Aron Vallinder and Edward Hughes. Cultural Evolution of Cooperation among LLM Agents. arXiv: 2412.10270 [cs.MA]. URL: https://arxiv.org/abs/2412.10270

[10]

Maximilian Puelma Touzel et al. “Position: Time to Close The Validation Gap in LLM Social Simulations”. In: Forty-third International Conference on Machine Learning Position Paper Track. 2026. URL: https://openreview.net/forum?id=LpbxLBcOBf

[11]

Ziyi Yang et al. “Oasis: Open agent social interaction simulations with one million agents”. In: arXiv preprint arXiv:2411.11581 (2024).

[12]

Maik Larooij and Petter Törnberg. Do Large Language Models Solve the Problems of Agent-Based Modeling? A Critical Review of Generative Social Simulations. 2025. arXiv: 2504.03274 [cs.MA]. URL: https://arxiv.org/abs/2504.03274

[13]

Jiaxu Zhou et al. The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies. 2026. arXiv: 2509.18052 [cs.CL]. URL: https://arxiv.org/abs/2509.18052

[14]

Laura Ferrarotti et al. Generative AI collective behavior needs an interactionist paradigm. arXiv: 2601.10567 [cs.AI]. URL: https://arxiv.org/abs/2601.10567

[15]

Lynnette Hui Xian Ng and Kathleen M Carley. “Are LLM-Powered Social Media Bots Realistic?” In: International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer. 2025, pp. 14–23.

[16]

Jiaxu Zhou et al. The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies. 2025. arXiv: 2509.18052 [cs.CL]. URL: https://arxiv.org/abs/2509.18052

[17]

Christopher Barrie and Petter Törnberg. Emergent LLM behaviors are observationally equivalent to data leakage. 2025. arXiv: 2505.23796 [cs.CL]. URL: https://arxiv.org/abs/2505.23796

[18]

Maik Larooij and Petter Törnberg. Can We Fix Social Media? Testing Prosocial Interventions using Generative Social Simulation. 2025. arXiv: 2508.03385 [cs.SI]. URL: https://arxiv.org/abs/2508.03385

[19]

Jinghua Piao et al. AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society. 2025. arXiv: 2502.08691 [cs.SI]. URL: https://arxiv.org/abs/2502.08691

[20]

Alexander Sasha Vezhnevets et al. Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. 2023. arXiv: 2312.03664 [cs.AI]. URL: https://arxiv.org/abs/2312.03664

[21]

Alexander Sasha Vezhnevets et al. Multi-Actor Generative Artificial Intelligence as a Game Engine. 2025. arXiv: 2507.08892 [cs.AI]. URL: https://arxiv.org/abs/2507.08892

[22]

Joel Z. Leibo et al. A Theory of Appropriateness That Accounts for Norms of Rationality. 2026. arXiv: 2603.14050 [cs.NE]. URL: https://arxiv.org/abs/2603.14050

[23]

Alberto Sangiovanni-Vincentelli et al. “Benefits and challenges for platform-based design”. In: Proceedings of the 41st Annual Design Automation Conference. DAC ’04. San Diego, CA, USA: Association for Computing Machinery, 2004, pp. 409–414. ISBN: 1581138288. DOI: 10.1145/996566.996684. URL: https://doi.org/10.1145/996566.996684

[24]

Jingtao Ding et al. Understanding World or Predicting Future? A Comprehensive Survey of World Models. 2025. arXiv: 2411.14499 [cs.CL]. URL: https://arxiv.org/abs/2411.14499

[25]

Xuhui Zhou et al. Social World Models. 2025. arXiv: 2509.00559 [cs.AI]. URL: https://arxiv.org/abs/2509.00559

[26]

Joon Sung Park et al. Social Simulacra: Creating Populated Prototypes for Social Computing Systems. 2022. arXiv: 2208.04024 [cs.HC]. URL: https://arxiv.org/abs/2208.04024

[27]

Pranav Narayanan Venkit et al. The Need for a Socially-Grounded Persona Framework for User Simulation. 2026. arXiv: 2601.07110 [cs.CL]. URL: https://arxiv.org/abs/2601.07110

[28]

Aurélien Bück-Kaeffer et al. “BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training”. In: Workshop on Tailoring AI: Exploring Active and Passive LLM Personalization (PALS). EMNLP. 2025. URL: https://pals-nlp-workshop.github.io/

[29]

Jacy Reese Anthis et al. “Position: LLM Social Simulations Are a Promising Research Method”. In: Forty-second International Conference on Machine Learning Position Paper Track. 2025. URL: https://openreview.net/forum?id=cRBg1dtj7o

[30]

Aurélien Bück-Kaeffer et al. The Silicon Society Cookbook: Design Space of LLM-based Social Simulations. 2026. arXiv: 2605.00197 [cs.MA]. URL: https://arxiv.org/abs/2605.00197

[31]

Erica Coppolillo et al. Engagement-Driven Content Generation with Large Language Models. arXiv: 2411.13187 [cs.LG]. URL: https://arxiv.org/abs/2411.13187

[32]

Ahmed El-Kishky et al. “TwHIN: Embedding the Twitter Heterogeneous Information Network for Personalized Recommendation”. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’22. ACM, Aug. 2022, pp. 2842–2850. DOI: 10.1145/3534678.3539080. URL: http://dx.doi.org/10.1145/3534678.3539080

[33]

Qwen Team. Qwen3.5: Towards Native Multimodal Agents. Feb. 2026. URL: https://qwen.ai/blog?id=qwen3.5

[34]

Chenxi Wang et al. “Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks”. In: Proceedings of the 31st International Conference on Computational Linguistics. Ed. by Owen Rambow et al. Abu Dhabi, UAE: Association for Computational Linguistics, Jan. 2025, pp. 3913–3923. URL: https://aclanthology.org/2025.coling-main.264/

[35]

Yev Meyer and Dane Corneil. Nemotron-Personas-USA: Synthetic Personas Aligned to Real-World Distributions. June 2025. URL: https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA

[36]

Maximilian Puelma Touzel et al. “SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents”. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25. Ed. by James Kwok. Demo Track. International Joint Conferences on Artificial Intelligence Organization, Aug. 2025, pp. 11100–11103. DOI: 10.24963/ijcai.2025/1271. URL: https://doi.org/10.24963/ijcai.2025/1271

[37]

Qirui Mi et al. MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework. 2025. arXiv: 2504.21582 [cs.MA]. URL: https://arxiv.org/abs/2504.21582

LLM Disclosure Statement. In this paper, we used the GPT-5.3-Codex model via GitHub Copilot to interpret data and help in the generation of some case study plots. A Reference...

Follower-Chronological: retrieves the 10 most recent posts, replies, or reposts from followed users.

General embedding: uses a general sentence-transformers model to retrieve the top 10 similar posts to the user’s profile, which is generated by combining their persona description, 10 most recent posts, and 10 most recent liked posts.

TwHIN Encoder: same as above, but uses the TwHIN [32] model that is trained on Twitter data to compute similarity. We borrow implementation details of the two recsys algorithms from OASIS [11].

Outcome. We observe that total actions show no significant differences across the different timeline curation settings. Interpretation. Even though we see no meaning...

Similarity exposure: agents observe neighbors with similar beliefs.

Opposing exposure: agents observe neighbors with distant or opposing beliefs.

Random exposure: agents observe eligible neighbors without belief-similarity filtering.

Outcome. Opposing exposure strongly reduces final polarization and global disagreement relative to similarity exposure. Final polarization drops from 2.722 to 1.796, and final global disagreement drops from 2.228 to 1.557. Random exposure produces a weaker version of th...

Exact reproduction: direct neighbor opinion exposure and daily belief update.

Loose social environment: timeline observations, social-media actions, and terminal belief probe.

Outcome. The loose social environment still produces the qualitative echo-chamber signature. With Echo-style memory and self-state feedback, final polarization reaches 2.990±0.150, NCI reaches 0.411±0.108, and global disagreement falls to 2.296±0.085. From ...

With self-state feedback: the agent is reminded of its previous opinion and belief.

Without self-state feedback: the same observations and memory are provided, but explicit previous opinion/belief fields are removed.

Outcome. Removing self-state feedback weakens polarization and local alignment. Under GPT-4o-mini with Echo-style memory, final polarization falls from 2.990 to 2.695, and final NCI falls from 0.411 to 0.295. Belief volatili...

Echo-memory agent: preserves short-term summary, long-term consolidation, and structured belief update.

Simple social agent: uses a simpler observe-memory-act memory path.

Outcome. The simple social agent still shows echo-chamber directionality, but the effect is weaker. With self-state feedback, final polarization falls from 2.990 for the Echo-memory agent to 2.641 for the simple agent, and NCI falls from 0.411 to 0.193. Without self-state feedback, the sim...

Qwen3.5-4B. Outcome. Qwen3.5-4B does not reproduce the GPT-like local-alignment signature. With Echo memory and self-state feedback, Qwen reaches final polarization 2.605, but NCI remains negative at −0.110, and global disagreement increases to 2.841. Without self-state feedback, Qwen Echo-memory agents become highly volatile, with mean belief volatility ...

H5: Algorithmic recommendation system feeds lead to more realistic information dynamic structures such as cascade measurements, virality etc.

H6: Agents can be aligned to real-world distributions for engagement-actions via agent selection or assigning agents pre-set social personas.

H7: By explicitly making the follower-chronological field non-interesting to the user (not aligned with voting goal, or interests), we can elicit a much bigger gap between the recsys-TWHiN and the chronological timeline.

[1] [1]

Zhiheng Xi et al.The Rise and Potential of Large Language Model Based Agents: A Survey

[2] [2]

arXiv:2309.07864 [cs.AI].URL:https://arxiv.org/abs/2309.07864

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Terrence Neumann, Maria De-Arteaga, and Sina Fazelpour.Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation. 2025. arXiv: 2504.08954 [cs.CY]. URL:https://arxiv.org/abs/2504.08954

work page arXiv 2025

[4] [4]

Dingyi Zuo et al.MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics. 2025. arXiv: 2510 . 12423 [cs.AI].URL: https : //arxiv.org/abs/2510.12423

work page arXiv 2025

[5] [5]

Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents

Giorgio Piatti et al. “Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents”. In:Advances in Neural Information Processing Systems. Ed. by A. Globerson et al. V ol. 37. Curran Associates, Inc., 2024, pp. 111715–111759. URL: https : / / proceedings . neurips . cc / paper _ files / paper / 2024 / file / ca9567d8ef6b2ea2da0d...

2024

[6] [6]

The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents

Chandler Smith et al. “The Concordia Contest: Advancing the Cooperative Intelligence of Language Agents”. In:NeurIPS 2024 Competition Track. 2024.URL: https://openreview. net/forum?id=dfeFy1PSSw

2024

[7] [7]

Ali Khodabandeh Yalabadi et al.Controlling the Misinformation Diffusion in Social Media by the Effect of Different Classes of Agents. 2024. arXiv: 2401.11524 [cs.MA].URL: https: //arxiv.org/abs/2401.11524

work page arXiv 2024

[8] [8]

Gian Marco Orlando et al.Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations. 2025. arXiv: 2510 . 25003 [cs.MA].URL:https://arxiv.org/abs/2510.25003

work page arXiv 2025

[9] [9]

Natalie Shapira et al.Agents of Chaos. 2026. arXiv: 2602.20021 [cs.AI] .URL: https: //arxiv.org/abs/2602.20021

work page internal anchor Pith review Pith/arXiv arXiv 2026

[10] [10]

Aron Vallinder and Edward Hughes.Cultural Evolution of Cooperation among LLM Agents

[11] [11]

arXiv:2412.10270 [cs.MA].URL:https://arxiv.org/abs/2412.10270

work page arXiv

[12] [12]

Position: Time to Close The Validation Gap in LLM Social Simulations

Maximilian Puelma Touzel et al. “Position: Time to Close The Validation Gap in LLM Social Simulations”. In:Forty-third International Conference on Machine Learning Position Paper Track. 2026.URL:https://openreview.net/forum?id=LpbxLBcOBf

2026

[13] [13]

Oasis: Open agent social interaction simulations with one million agents,

Ziyi Yang et al. “Oasis: Open agent social interaction simulations with one million agents”. In: arXiv preprint arXiv:2411.11581(2024)

work page arXiv 2024

[14] [14]

Maik Larooij and Petter Törnberg.Do Large Language Models Solve the Problems of Agent- Based Modeling? A Critical Review of Generative Social Simulations. 2025. arXiv: 2504. 03274 [cs.MA].URL:https://arxiv.org/abs/2504.03274

work page arXiv 2025

[15] [15]

Jiaxu Zhou et al.The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies. 2026. arXiv: 2509 . 18052 [cs.CL].URL: https : / / arxiv . org / abs / 2509 . 18052

2026

[16] [16]

Laura Ferrarotti et al.Generative AI collective behavior needs an interactionist paradigm

[17] [17]

arXiv:2601.10567 [cs.AI].URL:https://arxiv.org/abs/2601.10567

work page arXiv

[18]

Lynnette Hui Xian Ng and Kathleen M Carley. “Are LLM-Powered Social Media Bots Realistic?” In: International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer. 2025, pp. 14–23

[19]

Jiaxu Zhou et al. The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies. 2025. arXiv: 2509.18052 [cs.CL]. URL: https://arxiv.org/abs/2509.18052

[20]

Christopher Barrie and Petter Törnberg. Emergent LLM behaviors are observationally equivalent to data leakage. 2025. arXiv: 2505.23796 [cs.CL]. URL: https://arxiv.org/abs/2505.23796

[21]

Maik Larooij and Petter Törnberg. Can We Fix Social Media? Testing Prosocial Interventions using Generative Social Simulation. 2025. arXiv: 2508.03385 [cs.SI]. URL: https://arxiv.org/abs/2508.03385

[22]

Jinghua Piao et al. AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society. 2025. arXiv: 2502.08691 [cs.SI]. URL: https://arxiv.org/abs/2502.08691

[23]

Alexander Sasha Vezhnevets et al. Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. 2023. arXiv: 2312.03664 [cs.AI]. URL: https://arxiv.org/abs/2312.03664

[24]

Alexander Sasha Vezhnevets et al. Multi-Actor Generative Artificial Intelligence as a Game Engine. 2025. arXiv: 2507.08892 [cs.AI]. URL: https://arxiv.org/abs/2507.08892

[25]

Joel Z. Leibo et al. A Theory of Appropriateness That Accounts for Norms of Rationality. 2026. arXiv: 2603.14050 [cs.NE]. URL: https://arxiv.org/abs/2603.14050

[26]

Alberto Sangiovanni-Vincentelli et al. “Benefits and challenges for platform-based design”. In: Proceedings of the 41st Annual Design Automation Conference. DAC ’04. San Diego, CA, USA: Association for Computing Machinery, 2004, pp. 409–414. ISBN: 1581138288. DOI: 10.1145/996566.996684. URL: https://doi.org/10.1145/996566.996684

[27]

Jingtao Ding et al. Understanding World or Predicting Future? A Comprehensive Survey of World Models. 2025. arXiv: 2411.14499 [cs.CL]. URL: https://arxiv.org/abs/2411.14499

[28]

Xuhui Zhou et al. Social World Models. 2025. arXiv: 2509.00559 [cs.AI]. URL: https://arxiv.org/abs/2509.00559

[29]

Joon Sung Park et al. Social Simulacra: Creating Populated Prototypes for Social Computing Systems. 2022. arXiv: 2208.04024 [cs.HC]. URL: https://arxiv.org/abs/2208.04024

[30]

Pranav Narayanan Venkit et al. The Need for a Socially-Grounded Persona Framework for User Simulation. 2026. arXiv: 2601.07110 [cs.CL]. URL: https://arxiv.org/abs/2601.07110

[31]

Aurélien Bück-Kaeffer et al. “BluePrint: A Social Media User Dataset for LLM Persona Evaluation and Training”. In: Workshop on Tailoring AI: Exploring Active and Passive LLM Personalization (PALS). EMNLP. 2025. URL: https://pals-nlp-workshop.github.io/

[32]

Jacy Reese Anthis et al. “Position: LLM Social Simulations Are a Promising Research Method”. In: Forty-second International Conference on Machine Learning Position Paper Track. 2025. URL: https://openreview.net/forum?id=cRBg1dtj7o

[33]

Aurélien Bück-Kaeffer et al. The Silicon Society Cookbook: Design Space of LLM-based Social Simulations. 2026. arXiv: 2605.00197 [cs.MA]. URL: https://arxiv.org/abs/2605.00197

[34]

Erica Coppolillo et al. Engagement-Driven Content Generation with Large Language Models. arXiv: 2411.13187 [cs.LG]. URL: https://arxiv.org/abs/2411.13187

[36]

Ahmed El-Kishky et al. “TwHIN: Embedding the Twitter Heterogeneous Information Network for Personalized Recommendation”. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD ’22. ACM, Aug. 2022, pp. 2842–2850. DOI: 10.1145/3534678.3539080. URL: http://dx.doi.org/10.1145/3534678.3539080

[37]

Qwen Team. Qwen3.5: Towards Native Multimodal Agents. Feb. 2026. URL: https://qwen.ai/blog?id=qwen3.5

[38]

Chenxi Wang et al. “Decoding Echo Chambers: LLM-Powered Simulations Revealing Polarization in Social Networks”. In: Proceedings of the 31st International Conference on Computational Linguistics. Ed. by Owen Rambow et al. Abu Dhabi, UAE: Association for Computational Linguistics, Jan. 2025, pp. 3913–3923. URL: https://aclanthology.org/2025.coling-main.264/

[39]

Yev Meyer and Dane Corneil. Nemotron-Personas-USA: Synthetic Personas Aligned to Real-World Distributions. June 2025. URL: https://huggingface.co/datasets/nvidia/Nemotron-Personas-USA

[40]

Maximilian Puelma Touzel et al. “SandboxSocial: A Sandbox for Social Media Using Multimodal AI Agents”. In: Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25. Ed. by James Kwok. Demo Track. International Joint Conferences on Artificial Intelligence Organization, Aug. 2025, pp. 11100–11103. DOI: 10.24963/i...

work page doi:10.24963/ijcai.2025/1271.url:https://doi.org/10.24963/ijcai.2025/1271 2025

[41]

Qirui Mi et al. MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework. 2025. arXiv: 2504.21582 [cs.MA]. URL: https://arxiv.org/abs/2504.21582

LLM Disclosure Statement. In this paper, we used the GPT-5.3-Codex model via GitHub Copilot to interpret data and help in the generation of some case study plots. A Reference...

[42]

Follower-Chronological: retrieves the 10 most recent posts, replies, or reposts from followed users

[43]

General embedding: Uses a general sentence-transformers model to retrieve the top 10 similar posts to the user’s profile, which is generated by combining their persona description, 10 most recent posts, and 10 most recent liked posts

[44]

TwHIN Encoder: Same as above, but uses the TwHIN [32] model, which is trained on Twitter data, to compute similarity. We borrow implementation details of the two recsys algorithms from OASIS [11]. Outcome. We observe that total actions show no significant differences across the different timeline curation settings. Interpretation. Even though we see no meaning...
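The timeline-curation settings listed above can be illustrated with a minimal sketch. It is not the SiliSocS or OASIS implementation: the `Post` class, the function names, and the assumption that post and profile embeddings are already computed by some encoder (a general sentence encoder or TwHIN) are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Post:
    """Hypothetical post record used only for this sketch."""
    author: str
    timestamp: int          # larger means more recent
    embedding: list[float]  # assumed precomputed by some text encoder


def follower_chronological(posts, followed, k=10):
    """Return the k most recent posts authored by followed users."""
    visible = [p for p in posts if p.author in followed]
    return sorted(visible, key=lambda p: p.timestamp, reverse=True)[:k]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_timeline(posts, profile_embedding, k=10):
    """Return the k posts whose embeddings are most similar to the profile."""
    return sorted(
        posts,
        key=lambda p: cosine(p.embedding, profile_embedding),
        reverse=True,
    )[:k]


# Toy data: three posts with two-dimensional embeddings.
posts = [
    Post("alice", 1, [1.0, 0.0]),
    Post("bob", 2, [0.0, 1.0]),
    Post("carol", 3, [0.7, 0.7]),
]
chrono = follower_chronological(posts, followed={"alice", "carol"}, k=2)
similar = embedding_timeline(posts, profile_embedding=[1.0, 0.0], k=2)
print([p.author for p in chrono])
print([p.author for p in similar])
```

Under this sketch, the general-embedding and TwHIN settings differ only in which encoder produces the vectors, so swapping the encoder leaves the ranking code unchanged.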

[45]

Similarity exposure: agents observe neighbors with similar beliefs

[46]

Opposing exposure: agents observe neighbors with distant or opposing beliefs

[47]

Random exposure: agents observe eligible neighbors without belief-similarity filtering. Outcome. Opposing exposure strongly reduces final polarization and global disagreement relative to similarity exposure. Final polarization drops from 2.722 to 1.796, and final global disagreement drops from 2.228 to 1.557. Random exposure produces a weaker version of th...
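The three exposure conditions above can be sketched as a neighbor-selection rule. This is an illustrative assumption, not the authors' code: the function name, the scalar belief representation, and the use of absolute belief distance as the similarity measure are all hypothetical.

```python
import random


def select_neighbors(agent_belief, neighbor_beliefs, mode, k=3, rng=None):
    """Pick up to k neighbors to expose an agent to.

    neighbor_beliefs maps a neighbor id to a scalar belief (an assumed
    representation). mode is "similarity", "opposing", or "random".
    """
    if mode not in ("similarity", "opposing", "random"):
        raise ValueError(f"unknown exposure mode: {mode}")
    ids = list(neighbor_beliefs)
    if mode == "random":
        # No belief-similarity filtering: sample uniformly among neighbors.
        rng = rng or random.Random()
        return rng.sample(ids, min(k, len(ids)))
    # Rank by belief distance: closest first for similarity exposure,
    # farthest first for opposing exposure.
    ranked = sorted(
        ids,
        key=lambda n: abs(neighbor_beliefs[n] - agent_belief),
        reverse=(mode == "opposing"),
    )
    return ranked[:k]


beliefs = {"a": -2.0, "b": 0.0, "c": 1.5}
print(select_neighbors(2.0, beliefs, "similarity", k=1))
print(select_neighbors(2.0, beliefs, "opposing", k=1))
print(select_neighbors(2.0, beliefs, "random", k=2, rng=random.Random(0)))
```

Keeping the exposure rule in one function makes it a single configurable variable, which matches the document's aim of isolating the effect of individual design choices.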

[48]

Exact reproduction: direct neighbor opinion exposure and daily belief update

[49]

Loose social environment: timeline observations, social-media actions, and terminal belief probe. Outcome. The loose social environment still produces the qualitative echo-chamber signature. With Echo-style memory and self-state feedback, final polarization reaches 2.990±0.150, NCI reaches 0.411±0.108, and global disagreement falls to 2.296±0.085. From ...

[50]

With self-state feedback: the agent is reminded of its previous opinion and belief

[51]

Without self-state feedback: the same observations and memory are provided, but explicit previous opinion/belief fields are removed. Outcome. Removing self-state feedback weakens polarization and local alignment. Under GPT-4o-mini with Echo-style memory, final polarization falls from 2.990 to 2.695, and final NCI falls from 0.411 to 0.295. Belief volatili...

[52]

Echo-memory agent: preserves short-term summary, long-term consolidation, and structured belief update

[53]

Simple social agent: uses a simpler observe-memory-act memory path. Outcome. The simple social agent still shows echo-chamber directionality, but the effect is weaker. With self-state feedback, final polarization falls from 2.990 for the Echo-memory agent to 2.641 for the simple agent, and NCI falls from 0.411 to 0.193. Without self-state feedback, the sim...

[54]

Qwen3.5-4B. Outcome. Qwen3.5-4B does not reproduce the GPT-like local-alignment signature. With Echo memory and self-state feedback, Qwen reaches final polarization 2.605, but NCI remains negative at −0.110, and global disagreement increases to 2.841. Without self-state feedback, Qwen Echo-memory agents become highly volatile, with mean belief volatility ...

[55]

H5: Algorithmic recommendation system feeds lead to more realistic information-dynamics structures, such as cascade measurements and virality.

[56]

H6: Agents can be aligned to real-world distributions of engagement actions via agent selection or by assigning agents pre-set social personas.

[57]

H7: By explicitly making the follower-chronological feed uninteresting to the user (not aligned with their voting goal or interests), we can elicit a much larger gap between the recsys-TwHIN and chronological timelines.