Toward Temporal Realism in City-Scale Crisis Response Simulation using LLM Agents
Pith reviewed 2026-06-26 15:14 UTC · model grok-4.3
The pith
Temporal realism in LLM crisis simulations requires decoupling when agents act from what they do.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that standard LLM-only simulators lack temporal realism because their synchronous schedules produce no self-excitation, while real crisis participation exhibits endogenous bursty timing that intensifies during the pandemic. Introducing a data-calibrated self-excitation channel and crisis-activation regime to decide when agents act, querying the LLM only then for task choice, raises median burstiness from -0.14 to approximately 0.37 without degrading the quality of LLM decisions.
What carries the argument
An explicit self-excitation and crisis-activation mechanism governs when each agent acts. It operates independently of the LLM, which determines only the action content at the activated times.
If this is right
- The LLM-only baseline yields no bursty agents.
- A single data-calibrated gate suffices to achieve realistic per-agent timing.
- Timing is largely endogenous and amplified by crisis periods rather than daily cycles.
- Content decisions by the LLM remain unaffected by the added timing layer.
Where Pith is reading between the lines
- The decoupling approach may improve timing accuracy in other domains using LLM agents for social simulation.
- Generalization of the calibrated parameters to new cities or crises would require validation on additional datasets.
- Such simulators could support more accurate predictions of volunteer surges during future events.
Load-bearing premise
The self-excitation parameters calibrated on the Shenzhen volunteering log will generalize to produce realistic timing in other crisis contexts and populations without further fitting.
What would settle it
A test on volunteering or participation logs from another city during a comparable crisis, checking whether the simulated burstiness statistics match the empirical ones; failure to match would indicate the mechanism does not generalize.
Original abstract
Human collective participation is rarely steady in time: it is bursty, with short episodes of intense activity separated by long quiet intervals. In crisis response and community mobilization, predicting when people act matters as much as predicting whether they act. Such settings are increasingly modeled with LLM-based social simulators, yet these simulators are validated on whether each action is individually plausible, not on whether actions are timed as in reality. Their temporal realism, the degree to which simulated activity reproduces the bursty, heavy-tailed timing of real human systems, thus remains untested. We examine this gap using a multi-year, city-scale log of offline volunteering in Shenzhen that spans the COVID-19 pandemic. Empirically, we establish that bursty timing is common at individual and tracked-group levels, that it is largely endogenous and self-exciting, and that it is amplified by the pandemic rather than produced by daily activity cycles. A standard LLM-only simulator reproduces almost none of this timing: its synchronous schedule has no self-excitation channel, so agents act on a near-regular clock. Guided by these findings, we build a simulator in which a data-calibrated self-excitation channel and a crisis-period regime decide when each agent acts and query the LLM only at those moments, leaving it to decide which task to join and whether to commit. The LLM-only baseline yields no bursty agents (median burstiness $B=-0.14$); a single data-calibrated gate is then sufficient to lift per-agent timing above the burst threshold (median $B\approx0.37$) without degrading LLM content decisions. These results indicate that temporal realism in LLM-based crisis-response simulation is best achieved by decoupling when agents act, governed by an explicit self-excitation and crisis-activation mechanism, from what they do, governed by the LLM.
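The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not say, that the self-excitation channel behaves like an exponential-kernel Hawkes process and that the crisis regime multiplies the baseline rate inside a fixed time window. The function names (`simulate_gate`, `choose_task`, `run_agent`) and all parameters (`mu`, `alpha`, `beta`, `boost`, `crisis`) are hypothetical, and `choose_task` is a stub standing in for the LLM call. This is not the authors' implementation.

```python
import math
import random


def simulate_gate(mu, alpha, beta, horizon, crisis=(0.0, 0.0), boost=1.0, seed=0):
    """Sample activation times by Ogata thinning (illustrative assumption).

    Intensity: lambda(t) = mu * c(t) + sum_i alpha * exp(-beta * (t - t_i)),
    where c(t) = boost inside the crisis window and 1 elsewhere.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    rng = random.Random(seed)
    base_max = mu * max(1.0, boost)   # largest possible baseline rate
    t, excitation, events = 0.0, 0.0, []
    while True:
        # Between events the excitation only decays, so this bounds lambda.
        lam_bar = base_max + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            return events
        t += wait
        excitation *= math.exp(-beta * wait)
        in_crisis = crisis[0] <= t < crisis[1]
        lam = mu * (boost if in_crisis else 1.0) + excitation
        if rng.random() * lam_bar <= lam:   # accept the candidate time
            events.append(t)
            excitation += alpha             # each action raises the future rate


def choose_task(agent_id, t):
    """Hypothetical stand-in for the LLM call that decides WHAT to do."""
    return f"task-{(agent_id + int(t)) % 3}"


def run_agent(agent_id, **gate_params):
    """The gate decides WHEN; the (stubbed) LLM is queried only at those times."""
    return [(t, choose_task(agent_id, t)) for t in simulate_gate(**gate_params)]
```

With `alpha = 0` and `boost = 1` the gate reduces to a homogeneous Poisson clock, which gives a simple sanity check before any self-excitation is switched on.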
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that temporal realism (bursty, heavy-tailed activity timing) in LLM-based city-scale crisis simulators is best achieved by decoupling the 'when' decision (an explicit, data-calibrated self-excitation channel plus crisis regime, fitted to a multi-year Shenzhen volunteering log) from the 'what' decision (handled by the LLM). Empirical analysis of the Shenzhen data shows burstiness is endogenous and pandemic-amplified; a standard synchronous LLM simulator yields median burstiness B = -0.14, while adding the single calibrated gate raises per-agent median B to ≈0.37 without degrading LLM content quality.
Significance. If the decoupling result generalizes, the work would be significant for social simulation by showing that explicit timing mechanisms can supply realistic burstiness that LLMs alone do not produce, while preserving the LLM's strength on action content. Credit is due for grounding the approach in a real multi-year city-scale offline volunteering dataset spanning COVID-19, for quantifying burstiness at both individual and group levels, and for reporting concrete before/after metrics on the same data.
major comments (2)
- [Abstract] Abstract: the reported improvement (median B from -0.14 to ≈0.37) is measured on the identical Shenzhen volunteering log used to calibrate the self-excitation parameters; this makes the lift a fitted outcome rather than an independent test of the decoupling claim. The assertion that the approach is 'best achieved' for general city-scale crisis simulation therefore rests on an untested transfer assumption.
- [Abstract] Abstract and methods description: no details are supplied on the calibration procedure for the self-excitation parameters (data exclusion rules, fitting objective, or statistical tests confirming the B improvement is significant and not an artifact of the fitting process). These omissions leave the central empirical support only partially documented.
minor comments (1)
- [Abstract] The burstiness metric B is introduced without an explicit formula or reference in the abstract; a short definition or citation would aid readability.
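For readers who want the missing definition, a commonly used form is the Goh and Barabási burstiness coefficient, B = (σ − μ) / (σ + μ), where μ and σ are the mean and standard deviation of an agent's inter-event times. B is −1 for a perfectly regular clock, about 0 for a Poisson process, and approaches 1 for strongly bursty sequences. Whether the paper uses this form or the finite-size-corrected variant of Kim and Jo is not stated in the abstract, so the sketch below is an assumption, and the function name `burstiness` is illustrative.

```python
import statistics


def burstiness(event_times):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu).

    Assumes event_times is sorted in ascending order. Which variant of B the
    paper actually reports is not stated in the abstract.
    """
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three events")
    mu = statistics.fmean(gaps)       # mean inter-event time
    sigma = statistics.pstdev(gaps)   # population standard deviation
    if mu + sigma == 0:
        raise ValueError("all events coincide")
    return (sigma - mu) / (sigma + mu)


# A perfectly regular schedule sits at the lower bound.
regular = burstiness([0, 1, 2, 3, 4])
# Short clusters separated by long silences give a positive value.
clustered = burstiness([0, 0.1, 0.2, 0.3, 10, 10.1, 10.2, 10.3, 20])
```

Under this definition the reported baseline median of −0.14 lies between the regular and Poisson regimes, while a median near 0.37 lies on the bursty side.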
Simulated Author's Rebuttal
We thank the referee for the careful reading and for identifying these two points on validation and documentation. We agree with both comments and will revise the manuscript to address them directly.
- Referee: [Abstract] Abstract: the reported improvement (median B from -0.14 to ≈0.37) is measured on the identical Shenzhen volunteering log used to calibrate the self-excitation parameters; this makes the lift a fitted outcome rather than an independent test of the decoupling claim. The assertion that the approach is 'best achieved' for general city-scale crisis simulation therefore rests on an untested transfer assumption.
Authors: We agree that the B improvement is shown on the calibration data and therefore constitutes a demonstration that the self-excitation gate can reproduce the observed burstiness when fitted, rather than an independent test. The phrasing that temporal realism is 'best achieved' by decoupling therefore does rest on an untested transfer assumption for other cities or crises. In revision we will qualify the abstract claim, replace 'best achieved' with a more precise statement limited to the Shenzhen setting, and add a limitations paragraph discussing the transfer assumption together with planned cross-dataset validation. revision: yes
- Referee: [Abstract] Abstract and methods description: no details are supplied on the calibration procedure for the self-excitation parameters (data exclusion rules, fitting objective, or statistical tests confirming the B improvement is significant and not an artifact of the fitting process). These omissions leave the central empirical support only partially documented.
Authors: We agree that the calibration procedure is insufficiently documented. The revised methods section will add: (i) explicit data exclusion rules applied to the Shenzhen log, (ii) the fitting objective (matching the empirical distribution of inter-event times and per-agent burstiness), and (iii) the statistical tests used to assess the significance of the B lift (including bootstrap confidence intervals and a permutation test against the null of no self-excitation). These additions will make the empirical support fully reproducible. revision: yes
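As an illustration of the kind of check the rebuttal promises, the following is a minimal sketch of a percentile bootstrap confidence interval for the difference in median per-agent B between two simulator conditions. It assumes agents are resampled independently within each condition. The function name `bootstrap_median_diff`, its parameters, and the toy input values are all hypothetical and are not the authors' procedure or data.

```python
import random
import statistics


def bootstrap_median_diff(baseline, gated, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for median(gated) - median(baseline).

    Illustrative only: resamples agents with replacement within each
    condition and reads the interval off the sorted differences.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        b = rng.choices(baseline, k=len(baseline))
        g = rng.choices(gated, k=len(gated))
        diffs.append(statistics.median(g) - statistics.median(b))
    diffs.sort()
    lo = diffs[int((1 - level) / 2 * n_boot)]
    hi = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Toy per-agent B values, invented for this example (not from the paper).
toy_baseline = [-0.30, -0.22, -0.18, -0.10, -0.05]
toy_gated = [0.25, 0.31, 0.36, 0.42, 0.48]
ci_low, ci_high = bootstrap_median_diff(toy_baseline, toy_gated)
```

An interval that excludes zero would indicate that the lift is stable under resampling of agents. It would not address the separate concern that the gate was fitted on the same log used for evaluation.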
Circularity Check
Data-calibrated self-excitation parameters produce reported burstiness improvement by construction on Shenzhen log
specific steps
- fitted input called prediction [Abstract]: "a single data-calibrated gate is then sufficient to lift per-agent timing above the burst threshold (median B≈0.37) without degrading LLM content decisions"
The self-excitation parameters are calibrated on the multi-year Shenzhen volunteering log (including its COVID amplification) to reproduce observed burstiness; the reported lift in median B is measured on the same log, making the claimed temporal realism a direct outcome of the fit rather than an independent model prediction.
full rationale
The paper's central demonstration that the decoupled self-excitation gate achieves temporal realism (lifting median B from -0.14 to ≈0.37) is obtained by fitting the gate parameters directly to the Shenzhen volunteering log and evaluating the match on the identical dataset. This is the "fitted input called prediction" pattern: a fitted input is presented as the model's predictive success. The abstract explicitly ties the result to calibration on that log and describes no out-of-sample test, so the timing-realism claim reduces to the calibration step rather than an independent derivation from the model equations. No other circularity patterns are present.
Axiom & Free-Parameter Ledger
free parameters (1)
- self-excitation parameters
axioms (2)
- domain assumption: Bursty timing in human volunteering is largely endogenous and self-exciting rather than driven by external daily cycles
- domain assumption: The pandemic period amplifies burstiness in activity