pith. machine review for the scientific record.

arxiv: 2604.17615 · v1 · submitted 2026-04-19 · 💻 cs.HC

Recognition: unknown

WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 05:13 UTC · model grok-4.3

classification 💻 cs.HC
keywords interactive systems · LLM social simulations · policy reasoning · emergency preparedness · agent-based modeling · human-AI interaction · decision making under uncertainty · disaster evacuation planning

The pith

Interactive LLM social simulations let policymakers iteratively steer, compare, and inspect agent behaviors to test plans under uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents WhatIf, an interactive system that enables real-time steering, inspection, and comparison of LLM-powered simulations of how populations respond to policies, such as in disaster evacuations. It derives four design requirements from a formative study with preparedness planners and implements fluid steering, real-time scaling, collaborative use, and multi-level views of agent actions. Evaluation with five professionals across three scenarios showed participants using the system to branch between options dynamically, question their own assumptions when simulated agents acted unexpectedly, uncover previously hidden vulnerabilities, and base conclusions on specific inspectable agent cases rather than summary statistics alone. This positions LLM social simulations as shared reasoning environments that can support decision-making where outcomes depend on unpredictable human coordination.

Core claim

WhatIf demonstrates that policymakers engage with LLM-powered social simulations as spaces for iterative branching and comparison: they reflect on tacit assumptions when agent behaviors violate expectations, surface unrecognized planning vulnerabilities, and ground their reasoning in inspectable agent-level cases. This supports designing such systems as interactive shared reasoning environments, rather than offline predictive tools, to better aid expert decisions under deep uncertainty.

What carries the argument

WhatIf, the interactive system supporting fluid steering of LLM agents, real-time scale, collaborative exploration, and multi-level interpretability of simulation outputs.
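The paper's system overview describes a hybrid loop: agents enter LLM deliberation only when their context changes, producing destinations and rationales, while a deterministic engine advances the world every round. A minimal sketch of that pattern, with the LLM call stubbed out and all names hypothetical (the paper does not publish WhatIf's implementation):

```python
from dataclasses import dataclass

# Illustrative sketch: LLM deliberation is triggered only for agents whose
# context changed (e.g. a new announcement), while a deterministic engine
# advances movement every round. 1-D positions stand in for a map.

@dataclass
class Agent:
    name: str
    position: float          # position along a corridor; exit at x = 10.0
    destination: float = 10.0
    context: str = ""        # latest announcement visible to the agent
    seen_context: str = ""
    rationale: str = ""

def deliberate(agent: Agent) -> tuple[float, str]:
    """Stand-in for an LLM call: choose a destination and a rationale
    from the agent's current context."""
    if "shelter" in agent.context:
        return 0.0, "Announcement says shelter in place, so I stay put."
    return 10.0, "No warning received; I head for the exit."

def step(agents: list[Agent], speed: float = 1.0) -> None:
    """One simulation round: deliberate on context change, then move."""
    for a in agents:
        if a.context != a.seen_context:      # context changed -> deliberate
            a.destination, a.rationale = deliberate(a)
            a.seen_context = a.context
        # deterministic world update: move toward destination, capped by speed
        delta = a.destination - a.position
        a.position += max(-speed, min(speed, delta))

agents = [Agent("P1", 0.0), Agent("P2", 3.0)]
step(agents)                             # both drift toward the exit
agents[0].context = "shelter in place"   # user intervention mid-run
step(agents)                             # P1 re-deliberates and turns back
```

Keeping deliberation event-driven rather than per-tick is what makes real-time scale plausible: the expensive LLM calls happen only when an intervention actually changes what an agent can see.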

If this is right

  • Users shift from evaluating fixed plans to iterative branching and comparison of policy options.
  • Unexpected agent behaviors trigger reflection on unstated assumptions in the planners' mental models.
  • Interaction surfaces previously unrecognized vulnerabilities in the proposed policies.
  • Reasoning draws primarily from inspectable individual agent cases instead of aggregate statistics.
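The branching-and-comparison workflow in the first bullet can be sketched as forking a simulation state before each intervention and running the branches forward side by side. A toy sketch under assumed state and dynamics (the `rate` field and interventions are hypothetical placeholders, not WhatIf's model):

```python
import copy

def run(state: dict, rounds: int = 3) -> dict:
    """Toy dynamics: each round, 'evacuated' grows by the current rate."""
    for _ in range(rounds):
        state["evacuated"] += state["rate"]
    return state

base = {"evacuated": 0, "rate": 10}

# Fork the unfolding scenario into two branches before intervening.
branch_a = copy.deepcopy(base)
branch_b = copy.deepcopy(base)

branch_a["rate"] += 5    # e.g. place an extra coordinator at a bottleneck
branch_b["rate"] -= 5    # e.g. a road closure slows evacuation

outcome_a = run(branch_a)
outcome_b = run(branch_b)

# Compare branches against each other; the base snapshot stays untouched,
# so further forks can start from the same point.
diff = outcome_a["evacuated"] - outcome_b["evacuated"]
```

The deep copy is the essential move: each fork owns its state, so participants can return to the shared snapshot and try a third option without replaying the whole scenario.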

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Combining the system with real-time external data sources could make agent responses more grounded for ongoing crises.
  • The interactive approach could extend to policy domains like public health messaging or climate adaptation where population coordination is key.
  • Repeated use sessions might allow users to refine the underlying simulation models based on observed discrepancies.

Load-bearing premise

The LLM-generated agent behaviors are realistic enough to prompt genuine reflection on real planning assumptions, and qualitative insights from five preparedness professionals generalize to broader policy needs.

What would settle it

A larger controlled study in which policymakers using WhatIf identify no more vulnerabilities or revise plans no more than those using static simulations or tabletop exercises would undermine the claim.

Figures

Figures reproduced from arXiv: 2604.17615 by Hirokazu Shirado, Kyzyl Monteiro, Sauvik Das, Yuxuan Li.

Figure 1. WhatIf enables policymakers to interactively explore large-scale LLM-powered social simulations as shared reasoning… view at source ↗
Figure 2. WhatIf system overview. After user-authored interventions (left), the system executes repeated simulation rounds that combine LLM-driven agent decisions with physics-based world updates (center). Interventions update shared state and agent context. Agents enter LLM deliberation when their context changes, producing destinations, messages, and rationales, while a deterministic engine advances movement, haz… view at source ↗
Figure 3. Fluid steering in WhatIf. The interface supports real-time intervention through complementary modalities, including direct manipulation on the canvas (1), structured tool selection (2–3), and natural-language commands via the What-If Agent (9). Users can modify unfolding scenarios—for example, by placing coordinators or editing announcements—and immediately observe resulting changes in agent behavior and c… view at source ↗
Figure 4. Collaborative exploration in WhatIf. Users access shared project workspaces with multiple simulation runs (1) and join live sessions through a lightweight “Join Project” flow (2). Once connected, collaborators share a synchronized simulation state with visible presence through avatars and live cursors (3). A shared interactive canvas allows users to inspect, annotate, and intervene in the same unfolding sim… view at source ↗
Figure 5. Multi-level interpretability in WhatIf. WhatIf supports inspection across levels of abstraction and time, linking individual agent reasoning to system-level outcomes. (1) A report view summarizes outcomes—including issue detection, evacuation progress, congestion patterns, and trajectory comparisons—to support post-hoc analysis and cross-run evaluation. (2) In-character interviews expose agents’ situate… view at source ↗
Figure 6. Iterative what-if cycles across fork episodes. Each panel corresponds to a different participant (P2, P4, P5) and… view at source ↗
read the original abstract

Policymakers in domains such as emergency management, public health, and urban planning must make decisions under deep uncertainty, where outcomes depend on how large populations interpret information, coordinate, and adapt over time. Existing tools only partially support this process: tabletop exercises enable collaborative discussion but lack dynamic feedback, while computational simulations capture population dynamics but are designed for offline analysis. We present WhatIf, an interactive system that enables policymakers to steer, inspect, and compare LLM-powered social simulations in real time. Informed by a formative study in emergency preparedness planning, we derive four design requirements for interactive policy simulations: fluid steering, real-time scale, collaborative exploration, and multi-level interpretability. We developed WhatIf guided by these requirements and evaluated it with five preparedness professionals across three disaster evacuation scenarios. Our findings show that participants used the system as a space for iterative branching and comparison rather than evaluating fixed plans; reflected on tacit planning assumptions when agent behavior violated expectations; surfaced previously unrecognized planning vulnerabilities; and grounded their reasoning in inspectable agent-level cases rather than aggregate outputs alone. These findings suggest broader design implications for LLM-powered social simulation systems: designing such systems as interactive, shared reasoning environments -- rather than offline predictive tools -- can better support expert decision-making under deep uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents WhatIf, an interactive system allowing policymakers to steer, inspect, and compare LLM-powered social simulations in real time for reasoning under deep uncertainty in domains like emergency management. Drawing on a formative study, it derives four design requirements (fluid steering, real-time scale, collaborative exploration, multi-level interpretability) and implements the system accordingly. A qualitative evaluation with five preparedness professionals across three disaster evacuation scenarios reports that participants used the tool for iterative branching and comparison, reflected on tacit assumptions when agent behaviors violated expectations, surfaced unrecognized vulnerabilities, and grounded reasoning in agent-level cases rather than aggregates. The authors conclude that such systems are better positioned as interactive shared reasoning environments than as offline predictive tools.

Significance. If the core findings hold, the work offers a concrete design contribution to HCI and policy informatics by demonstrating how real-time LLM simulation steering can surface planning assumptions and vulnerabilities that static or aggregate tools miss. The derivation of requirements from a domain-specific formative study and the emphasis on multi-level interpretability provide reusable guidance for future interactive simulation systems. However, the small-scale qualitative evaluation limits the strength of the broader design implications.

major comments (3)
  1. [Evaluation] Evaluation section: the reported findings rest on a qualitative study with five participants across three scenarios, yet no details are provided on study protocol, recruitment, session structure, data collection methods, or analysis approach. This absence makes it impossible to evaluate whether the observed reflections on tacit assumptions and vulnerabilities are attributable to the system's interactive features or to other factors.
  2. [Evaluation] The central claim that participants reflected on planning assumptions specifically because LLM agent behaviors violated expectations requires evidence that the generated behaviors are sufficiently realistic proxies for real populations. No validation is reported (e.g., expert ratings of trace realism against historical data, comparison to non-LLM baselines, or ablation of prompt effects), leaving open the possibility that reflections stem from LLM artifacts rather than transferable policy insights.
  3. [Discussion / Conclusion] The design implications (that interactive LLM simulations outperform offline tools for expert reasoning under uncertainty) are load-bearing for the paper's contribution, yet they are extrapolated from n=5 sessions without quantitative metrics, controls, or comparison conditions. This weakens the generalizability asserted in the abstract and conclusion.
minor comments (2)
  1. [Abstract] The abstract and introduction would benefit from a brief statement of the exact number of participants and scenarios to set expectations for the evaluation's scope.
  2. [System Description] Figure captions and system screenshots should explicitly label which interface elements correspond to the four design requirements (e.g., steering controls, agent inspection panels) to aid reader mapping.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback highlighting gaps in the evaluation methodology and the scope of our claims. We agree that additional details on the study protocol are necessary and that the qualitative findings from a small sample should not be overgeneralized. We will revise the manuscript to incorporate more methodological transparency and to qualify the design implications accordingly, while maintaining the focus on the interactive system's role in supporting reflection.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported findings rest on a qualitative study with five participants across three scenarios, yet no details are provided on study protocol, recruitment, session structure, data collection methods, or analysis approach. This absence makes it impossible to evaluate whether the observed reflections on tacit assumptions and vulnerabilities are attributable to the system's interactive features or to other factors.

    Authors: We agree that these details were omitted and should have been included. In the revised manuscript, we will expand the Evaluation section with a dedicated protocol description covering: recruitment through targeted outreach to emergency preparedness professionals; session structure as 60-minute individual sessions using a think-aloud protocol while exploring the three scenarios; data collection via screen recordings, audio transcripts, and brief post-session debriefs; and analysis via inductive thematic analysis to identify patterns in usage and reflection. These additions will allow readers to better assess the attribution of findings to the system's features. revision: yes

  2. Referee: [Evaluation] The central claim that participants reflected on planning assumptions specifically because LLM agent behaviors violated expectations requires evidence that the generated behaviors are sufficiently realistic proxies for real populations. No validation is reported (e.g., expert ratings of trace realism against historical data, comparison to non-LLM baselines, or ablation of prompt effects), leaving open the possibility that reflections stem from LLM artifacts rather than transferable policy insights.

    Authors: We acknowledge the validity of this point. The study did not conduct formal validation of LLM behavior realism (such as expert ratings or baseline comparisons), which is a genuine limitation. We will revise the Discussion to explicitly state that observed reflections may partly stem from LLM-specific characteristics rather than purely transferable insights, and that the system supports interrogation of assumptions within the generated simulations without claiming predictive fidelity. The contribution centers on the interactive exploration process rather than model accuracy. revision: partial

  3. Referee: [Discussion / Conclusion] The design implications (that interactive LLM simulations outperform offline tools for expert reasoning under uncertainty) are load-bearing for the paper's contribution, yet they are extrapolated from n=5 sessions without quantitative metrics, controls, or comparison conditions. This weakens the generalizability asserted in the abstract and conclusion.

    Authors: The work is framed as a qualitative design study, not a controlled comparative experiment. We will revise the abstract and conclusion to remove or qualify any implication of broad outperformance or generalizability, instead describing the findings as preliminary patterns observed in a small expert sample and calling for future studies with quantitative measures and controls. The reusable design requirements derived from the formative study remain the central contribution. revision: yes

standing simulated objections not resolved
  • The request for validation evidence (e.g., expert ratings or baseline comparisons) demonstrating that LLM agent behaviors are realistic proxies for real populations, as no such validation was included in the study.

Circularity Check

0 steps flagged

No circularity: qualitative system design and user study

full rationale

The paper presents WhatIf, an interactive system for LLM-powered social simulations, with design requirements derived from a formative study and evaluated via qualitative sessions with five professionals across three scenarios. Claims about usage patterns, reflection on assumptions, and design implications rest directly on reported participant behaviors and observations. No equations, fitted parameters, predictions, or derivations appear that could reduce to self-citations or inputs by construction. Self-citations, if present, are not load-bearing for the central claims, which are grounded in the new user study rather than prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an HCI systems paper with no mathematical derivations, fitted parameters, or postulated entities. The central claims rest on qualitative observations from a small user study rather than any formal model or external benchmarks.

pith-pipeline@v0.9.0 · 5530 in / 1288 out tokens · 40964 ms · 2026-05-10T05:13:10.291919+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

69 extracted references · 12 canonical work pages · 3 internal anchors

  1. [1]

    Juliane Adrian, Armin Seyfried, and Anna Sieben. 2020. Crowds in front of bottlenecks at entrances from the perspective of physics and social psychology. Journal of the Royal Society Interface 17, 165 (2020)

  2. [2]

    Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13

  3. [3]

    Anindya Das Antar, Somayeh Molaei, Yan-Ying Chen, Matthew L. Lee, and Nikola Banovic. 2024. VIME: Visual Interactive Model Explorer for Identifying Capabilities and Limitations of Machine Learning Models for Sequential Decision-Making. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (2024). https://api.semanticscholar...

  4. [4]

    Louis Bavoil, Steven P Callahan, Patricia J Crossno, Juliana Freire, Carlos E Scheidegger, Cláudio T Silva, and Huy T Vo. 2005. VisTrails: Enabling interactive multiple-view visualizations. In VIS 05. IEEE Visualization, 2005. IEEE, 135–142

  5. [5]

    Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101. doi:10.1191/1478088706qp063oa

  6. [6]

    John Brooke et al. 1996. SUS: A quick and dirty usability scale. Usability Evaluation in Industry 189, 194 (1996), 4–7

  7. [7]

    Vanessa Colella, Richard Borovoy, and Mitchel Resnick. 1998. Participatory simulations: Using computational objects to learn about dynamic systems. In CHI 98 conference summary on human factors in computing systems. 9–10

  8. [8]

    Zach Cutler, Kiran Gadhave, and Alexander Lex. 2020. Trrack: A library for provenance-tracking in web-based visualizations. In 2020 IEEE Visualization Conference (VIS). IEEE, 116–120

  9. [9]

    Pei Dang, Jun Zhu, Weilian Li, Yakun Xie, and Heng Zhang. 2025. Large-language-model-driven agents for fire evacuation simulation in a cellular automata environment. Safety Science 191 (2025), 106935

  10. [10]

    David J Dausey, James W Buehler, and Nicole Lurie. 2007. Designing and conducting tabletop exercises to assess public health preparedness for manmade and naturally occurring biological threats. BMC Public Health 7, 1 (2007), 92

  11. [11]

    Upol Ehsan, Koustuv Saha, Munmun De Choudhury, and Mark O Riedl. 2023. Charting the sociotechnical gap in explainable AI: A framework to address the gap in XAI. Proceedings of the ACM on Human-Computer Interaction 7, CSCW1 (2023), 1–32

  12. [12]

    Alex Endert, M Shahriar Hossain, Naren Ramakrishnan, Chris North, Patrick Fiaux, and Christopher Andrews. 2014. The human is the loop: new directions for visual analytics. Journal of Intelligent Information Systems 43, 3 (2014), 411–435

  13. [13]

    John Joseph Fruin. 1970. Designing for Pedestrians: A Level of Service Concept. Polytechnic University

  14. [14]

    Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. 2024. Large language models empowered agent-based modeling and simulation: A survey and perspectives. Humanities and Social Sciences Communications 11, 1 (2024), 1–24

  15. [15]

    Stan Geertman and John Stillwell. 2020. Planning support science: Developments and challenges. Environment and Planning B: Urban Analytics and City Science 47, 8 (2020), 1326–1342

  16. [16]

    Stan C. M. Geertman. 2008. Planning Support Systems: A Planner’s Perspective. In Planning Support Systems for Cities and Regions, Richard K. Brail (Ed.). Lincoln Institute of Land Policy, 213–230

  17. [17]

    Abhinav Golas, Rahul Narain, and Ming C Lin. 2014. Continuum modeling of crowd turbulence. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 90, 4 (2014)

  18. [18]

    Kay C Goss. 1998. Guide for All-Hazard Emergency Operations Planning. DIANE Publishing

  19. [19]

    Onder Gurcan. 2024. LLM-augmented agent-based modelling for social simulations: Challenges and opportunities. arXiv preprint arXiv:2405.06700 (2024)

  20. [20]

    Edward T Hall. 1966. The Hidden Dimension. Vol. 609. Anchor

  21. [21]

    Sandra G Hart and Lowell E Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology. Vol. 52. Elsevier, 139–183

  22. [22]

    Jeffrey Heer, Fernanda B Viégas, and Martin Wattenberg. 2007. Voyagers and voyeurs: supporting asynchronous collaborative information visualization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1029–1038

  23. [23]

    Dirk Helbing, Illés Farkas, and Tamas Vicsek. 2000. Simulating dynamical features of escape panic. Nature 407, 6803 (2000), 487–490

  24. [24]

    Dirk Helbing, Anders Johansson, and Habib Zein Al-Abideen. 2007. Dynamics of crowd disasters: An empirical study. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 75, 4 (2007), 046109

  25. [25]

    Dirk Helbing and Peter Molnar. 1995. Social force model for pedestrian dynamics. Physical Review E 51, 5 (1995), 4282

  26. [26]

    Erzhen Hu, Yanhe Chen, Mingyi Li, Vrushank Phadnis, Pingmei Xu, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, and Ruofei Du. 2025. DialogLab: Authoring, Simulating, and Testing Dynamic Human-AI Group Conversations. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST ’25). Association for Computing Machinery,...

  27. [27]

    Hilary Hutchinson, Wendy Mackay, Bo Westerlund, Benjamin B Bederson, Allison Druin, Catherine Plaisant, Michel Beaudouin-Lafon, Stéphane Conversy, Helen Evans, Heiko Hansen, et al. 2003. Technology probes: inspiring design for and with families. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 17–24

  28. [28]

    Petra Isenberg, Danyel Fisher, Sharoda A Paul, Meredith Ringel Morris, Kori Inkpen, and Mary Czerwinski. 2011. Co-located collaborative visual analytics around a tabletop display. IEEE Transactions on Visualization and Computer Graphics 18, 5 (2011), 689–702

  29. [29]

    Feiran Jia, Ziyu Ye, Shiyang Lai, Kai Shu, Jindong Gu, Adel Bibi, Ziniu Hu, David Jurgens, James Evans, Philip H.S. Torr, Bernard Ghanem, Guohao Li, Chengxing Xie, and Canyu Chen. 2024. Can Large Language Model Agents Simulate Human Trust Behavior?. In Advances in Neural Information Processing Systems, A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paq...

  30. [30]

    Elinore J Kaufman, Douglas J Wiebe, Ruiying Aria Xiong, Christopher N Morrison, Mark J Seamon, and M Kit Delgado. 2021. Epidemiologic trends in fatal and nonfatal firearm injuries in the US, 2009-2017. JAMA Internal Medicine 181, 2 (2021), 237–244

  31. [31]

    Max T Kinateder, Erica D Kuligowski, Paul A Reneke, and Richard D Peacock. 2015. Risk perception in fire evacuation behavior revisited: definitions, related concepts, and empirical evidence. Fire Science Reviews 4, 1 (2015), 1

  33. [33]

    Richard E Klosterman. 1999. The what if? Collaborative planning support system. Environment and Planning B: Planning and Design 26, 3 (1999), 393–408

  34. [34]

    Robert Lempert. 2002. Agent-based modeling as organizational and public policy simulators. Proceedings of the National Academy of Sciences 99, suppl_3 (2002), 7195–7196

  35. [35]

    Guohao Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. 2023. CAMEL: Communicative agents for “mind” exploration of large language model society. Advances in Neural Information Processing Systems 36 (2023), 51991–52008

  36. [36]

    Yuxuan Li, Sauvik Das, and Hirokazu Shirado. 2025. What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness. arXiv preprint arXiv:2509.21868 (2025)

  37. [37]

    Yuxuan Li, Leyang Li, Hao-Ping (Hank) Lee, and Sauvik Das. 2026. How Well Can LLM Agents Simulate End-User Security and Privacy Attitudes and Behaviors? ArXiv abs/2602.18464 (2026). https://api.semanticscholar.org/CorpusID:285972806

  38. [38]

    Yuxuan Li, Hirokazu Shirado, and Sauvik Das. 2025. Actions speak louder than words: Agent decisions reveal implicit biases in language models. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency. 3303–3325

  39. [39]

    Michael K Lindell and Ronald W Perry. 2012. The protective action decision model: Theoretical modifications and additional evidence. Risk Analysis: An International Journal 32, 4 (2012), 616–632

  40. [40]

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Yuxian Gu, Han Ding, Kai Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Shengqi Shen, Tianjun Zhang, Sheng Shen, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, and Jie Tang. 2023. AgentBench: Evaluating LLMs as Agents. ArXiv abs/2308.03688 (2023). https://api.semanticscholar.org/CorpusID:260682249

  42. [42]

    Charles M Macal and Michael J North. 2005. Tutorial on agent-based modeling and simulation. In Proceedings of the Winter Simulation Conference, 2005. IEEE, 14–pp

  43. [43]

    Sally Maitlis. 2005. The social processes of organizational sensemaking. Academy of Management Journal 48, 1 (2005), 21–49

  44. [44]

    VAWJ Marchau, Warren E Walker, PJTM Bloemen, and Steven W Popper. 2019. Decision Making under Deep Uncertainty: From Theory to Practice. Springer

  45. [45]

    Jurriaan D Mulder, Robert van Liere, and Jarke J van Wijk. 1998. Computational steering in the CAVE. Future Generation Computer Systems 14, 3-4 (1998), 199–207

  46. [46]

    Arpit Narechania, Arjun Srinivasan, and John T. Stasko. 2020. NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries. IEEE Transactions on Visualization and Computer Graphics 27 (2020), 369–379. https://api.semanticscholar.org/CorpusID:221292836

  47. [47]

    Vittorio Nespeca, Tina Comes, and Frances Brazier. 2023. A methodology to develop agent-based models for policy support via qualitative inquiry. Journal of Artificial Societies and Social Simulation 26, 1 (2023)

  48. [48]

    Joon Sung Park, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 1–22

  49. [49]

    Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Jun Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S. Bernstein. 2024. Generative Agent Simulations of 1,000 People. ArXiv abs/2411.10109 (2024). https://api.semanticscholar.org/CorpusID:274117080

  51. [51]

    Seung In Park, Yong Cao, and Francis K. H. Quek. 2011. Large Scale Crowd Simulation Using A Hybrid Agent Model. https://api.semanticscholar.org/CorpusID:16115924

  52. [52]

    Peter Pelzer, Stan Geertman, Robert van der Heijden, and Etienne A. J. A. Rouwette. 2014. The Added Value of Planning Support Systems: A Practitioner’s Perspective. Computers, Environment and Urban Systems 48 (2014), 16–27. doi:10.1016/j.compenvurbsys.2014.05.002

  53. [53]

    Liliana Perez and Suzana Dragicevic. 2009. An agent-based approach for modeling dynamics of contagious disease spread. International Journal of Health Geographics 8, 1 (2009), 50

  54. [54]

    Jing Piao, Yuwei Yan, Jun Zhang, Nian Li, Junbo Yan, Xiaochong Lan, Zhihong Lu, Zhiheng Zheng, Jing Yi Wang, Di Zhou, Chen Gao, Fengli Xu, Fang Zhang, Ke Rong, Jun Su, and Yong Li. 2025. AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society. ArXiv abs/2502.08691 (2025). https://api.semant...

  55. [55]

    Jaziar Radianti, Santiago Gil Martinez, Bjørn Erik Munkvold, and Morgan Konnestad. 2018. Co-design of a virtual training tool with emergency management stakeholders for extreme weather response. In International Conference of Design, User Experience, and Usability. Springer, 185–202

  56. [56]

    Audrey Reinert, Luke S Snyder, Jieqiong Zhao, Andrew S Fox, Dean F Hougen, Charles Nicholson, and David S Ebert. 2020. Visual analytics for decision-making during pandemics. Computing in Science & Engineering 22, 6 (2020), 48–59

  57. [57]

    Gayani PDP Senanayake, Minh Kieu, Yang Zou, and Kim Dirks. 2024. Agent-based simulation for pedestrian evacuation: A systematic literature review. International Journal of Disaster Risk Reduction 111 (2024), 104705

  58. [58]

    Armin Seyfried, Bernhard Steffen, Wolfram Klingsch, and Maik Boltes. 2005. The fundamental diagram of pedestrian movement revisited. Journal of Statistical Mechanics: Theory and Experiment 2005, 10 (2005), P10002–P10002

  59. [59]

    Aleksandra Solinska-Nowak, Piotr Magnuszewski, Margot Curl, Adam French, Adriana Keating, Junko Mochizuki, Wei Liu, Reinhard Mechler, Michalina Kulakowska, and Lukasz Jarzabek. 2018. An overview of serious games for disaster risk management–Prospects and limitations for informing actions to arrest increasing risk. International Journal of Disaster Risk...

  60. [60]

    G Keith Still. 2014. Introduction to Crowd Science. CRC Press

  61. [61]

    Jur Van Den Berg, Stephen J Guy, Ming Lin, and Dinesh Manocha. 2011. Reciprocal n-body collision avoidance. In Robotics Research: The 14th International Symposium ISRR. Springer, 3–19

  62. [62]

    Fernanda B Viegas, Martin Wattenberg, Frank Van Ham, Jesse Kriss, and Matt McKeon. 2007. ManyEyes: a site for visualization at internet scale. IEEE Transactions on Visualization and Computer Graphics 13, 6 (2007), 1121–1128

  63. [63]

    Warren E Walker, Robert J Lempert, and Jan H Kwakkel. 2012. Deep uncertainty. Delft University of Technology 1, 2 (2012), 1

  64. [64]

    Ulrich Weidmann. 1993. Transporttechnik der Fussgänger. Schriftenreihe des IVT 90, 2 (1993)

  65. [65]

    James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viégas, and Jimbo Wilson. 2019. The What-If Tool: Interactive probing of machine learning models. IEEE Transactions on Visualization and Computer Graphics 26, 1 (2019), 56–65

  66. [66]

    Uri Wilensky and Walter Stroup. 1999. Learning Through Participatory Simulations: Network-Based Design for Systems Learning in Classrooms. In Proceedings of CSCL ’99

  67. [67]

    Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, and Christos Kozyrakis. 2024. AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution. ArXiv abs/2411.03519 (2024). https://api.semanticscholar.org/CorpusID:273850266

  68. [68]

    Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, and Yong Li. 2025. AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms. Companion Proceedings of the ACM on Web Conference 2025 (2025). https://api.semanticscholar.org/CorpusID:276617682

  69. [69]

    Yongchao Zeng, Calum Brown, and Mark Rounsevell. 2026. Too human to model: the uncanny valley of large language models in simulating human systems. npj Complexity 3 (2026). https://api.semanticscholar.org/CorpusID:286172750