Attributing Emergence in Million-Agent Systems
Recognition: 2 theorem links
Pith reviewed 2026-05-13 02:20 UTC · model grok-4.3
The pith
Attributing macro emergence in million-agent systems requires full-scale computation because small-scale samples cannot be reconciled by rescaling under nonlinear indicators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We adapt Aumann-Shapley path-integral attribution to LLM-powered multi-agent systems at million-agent scale; the method satisfies all four standard attribution axioms and runs four to five orders of magnitude faster than sampled Shapley. On 1.6 million Bluesky users, full-scale attribution assigns the majority of credit to the long tail and middle tier, while a visibility-biased N=100 sample attributes almost everything to high-follower accounts. An Attribution Scaling Bias theorem proves that under any nonlinear macro indicator, no global rescaling factor can reconcile small-scale and full-scale attribution.
What carries the argument
Adapted Aumann-Shapley path-integral attribution that scales to million agents while satisfying the four axioms, used to establish the Attribution Scaling Bias theorem showing irreconcilability of scales for nonlinear indicators.
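The paper's exact estimator is not reproduced on this page, but the standard Aumann-Shapley construction it adapts can be sketched numerically: each agent's share is its coordinate times the average of its partial derivative along the straight path from the zero baseline to the observed profile. The indicator `f`, the step count `K`, and the finite-difference scheme below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def aumann_shapley(f, x, K=30, eps=1e-6):
    """Aumann-Shapley attribution along the straight path t -> t*x:
    phi_i = x_i * integral_0^1 (df/dx_i)(t*x) dt,
    approximated with a K-point midpoint rule and central finite differences."""
    x = np.asarray(x, dtype=float)
    phi = np.zeros_like(x)
    for k in range(K):
        z = ((k + 0.5) / K) * x                 # midpoint of subinterval k
        for i in range(len(x)):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            phi[i] += x[i] * (f(zp) - f(zm)) / (2 * eps)
    return phi / K

# Illustrative nonlinear macro indicator (not the paper's): saturating total activity.
f = lambda z: np.log1p(z.sum() ** 2)
x = np.array([3.0, 1.0, 0.5, 0.1])
phi = aumann_shapley(f, x)
# Efficiency axiom: the shares sum to f(x) - f(0), up to quadrature error.
print(phi.sum(), f(x) - f(np.zeros_like(x)))
```

The per-coalition enumeration of discrete Shapley is replaced by K gradient evaluations along one path, which is where the claimed speedup would come from.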
Load-bearing premise
The macro indicator used to measure emergence is nonlinear.
What would settle it
Empirical demonstration in a million-agent system with a nonlinear macro indicator where rescaling a small-scale attribution produces results identical to the full-scale computation.
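Short of such a demonstration, the mechanism behind the irreconcilability claim can be seen in a toy computation. Everything below is illustrative rather than the paper's data or indicator: for the interacting indicator f(x) = Σᵢ xᵢ² + (Σᵢ xᵢ)², the Aumann-Shapley share along the path t·x has the closed form φᵢ = xᵢ(xᵢ + s) with s the group total, so the full-scale/small-panel ratio (xᵢ + s_full)/(xᵢ + s_panel) varies with xᵢ and no single rescaling factor can align the two:

```python
import numpy as np

def attribution(x):
    """Closed-form Aumann-Shapley shares for f(x) = sum(x**2) + sum(x)**2."""
    return x * (x + x.sum())

rng = np.random.default_rng(0)
full = rng.pareto(1.5, size=100_000)    # heavy-tailed toy population
panel = np.sort(full)[-100:]            # "visibility-biased" top-100 sample

phi_full = attribution(full)
phi_panel = attribution(panel)

# Same 100 agents in both computations: both arrays are sorted ascending by
# activity, and phi is monotone in x, so the pairing lines up elementwise.
phi_full_on_panel = np.sort(phi_full)[-100:]
ratios = phi_full_on_panel / phi_panel

# One global rescaling factor c would require all ratios to be equal;
# for this nonlinear indicator they are not.
print(ratios.min(), ratios.max())
```

For a linear indicator the ratio would be constant and a single factor would suffice, which is why the nonlinearity premise is load-bearing.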
Figures
read the original abstract
Large language models (LLMs) can simulate human-like reasoning and decision-making in individual agents. LLM-powered multi-agent systems (MAS) combine such agents to simulate population-scale social phenomena such as polarization, information cascades, and market panics. Such studies require attributing macro emergence to individual agents, but existing axiomatic methods scale combinatorially in $N$ and have been confined to $N \lesssim 10^3$, while the phenomena they explain occur at $N \geq 10^6$. We address this gap by adapting Aumann--Shapley path-integral attribution to LLM-powered MAS at million-agent scale; the resulting method satisfies all four axioms, runs four to five orders of magnitude faster than sampled Shapley on the same hardware. We use this method to test the scale gap empirically: across 14 days of public Bluesky data ($1{,}671{,}587$ active users), we compute the attribution at both full scale and the visibility-biased $N = 10^2$ convenience sample used by small-scale studies, and the two disagree structurally. At full scale the long tail and middle tier jointly carry the majority; the biased small panel attributes almost everything to a few high-follower accounts. We then prove that under any nonlinear macro indicator the disagreement cannot be reduced by post-hoc rescaling: an Attribution Scaling Bias theorem shows that no global rescaling factor can reconcile small-scale and full-scale attribution. Full-scale attribution is therefore not a methodological choice but a theoretical requirement for any nonlinear macro indicator.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces an adaptation of the Aumann-Shapley path-integral attribution method for attributing individual agent contributions to macro-level emergence in LLM-powered multi-agent systems at scales up to 1.67 million agents. It claims the method satisfies the four standard attribution axioms, offers substantial computational efficiency gains over traditional sampled Shapley values, empirically demonstrates structural disagreement between full-scale and small-sample attributions using Bluesky social media data, and proves an Attribution Scaling Bias theorem showing that no global rescaling can reconcile these for nonlinear macro indicators.
Significance. If the adaptation preserves the axioms without hidden assumptions and the theorem is correctly proven, this would provide a valuable tool for rigorous attribution in large-scale agent-based simulations of social phenomena. The combination of empirical evidence from real-world data and the theoretical result on scaling bias could influence how emergence is studied in complex systems, emphasizing the need for full-scale analysis rather than relying on convenience samples.
major comments (2)
- [Theoretical development and method adaptation] The Attribution Scaling Bias theorem (stated in the abstract and developed in the theoretical section) presupposes that the adapted Aumann-Shapley path-integral method is a valid attribution operator obeying the four axioms at N=1.6e6. However, standard discrete Aumann-Shapley requires an explicitly evaluable value function v(S) along coordinate paths or a continuum limit; the manuscript's claim of satisfying the axioms 'without additional assumptions on agent interactions' and achieving 10^4-10^5 speedup via aggregate statistics needs a detailed derivation showing axiom compliance (efficiency, symmetry, dummy, additivity) is preserved rather than conditional on independence or approximation.
- [Empirical results] In the empirical evaluation on Bluesky data (1,671,587 users), the structural disagreement between full-scale and visibility-biased N=100 attributions is presented as evidence for the theorem, but the specific nonlinear macro indicator is not formalized with an equation; without this, it is unclear whether the observed long-tail vs. high-follower attribution difference is general for any nonlinear indicator or tied to the particular choice and its computation at scale.
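A numerical spot-check of the kind the first comment asks for is straightforward on a toy value function; the function, the path t·x, and the tolerances below are illustrative assumptions, not the manuscript's construction:

```python
import numpy as np

def path_attr(f, x, K=200, eps=1e-6):
    """Aumann-Shapley attribution along t -> t*x (midpoint rule, finite differences)."""
    x = np.asarray(x, dtype=float)
    phi = np.zeros_like(x)
    for k in range(K):
        z = ((k + 0.5) / K) * x
        for i in range(len(x)):
            zp, zm = z.copy(), z.copy()
            zp[i] += eps
            zm[i] -= eps
            phi[i] += x[i] * (f(zp) - f(zm)) / (2 * eps)
    return phi / K

f = lambda z: z[0]*z[1] + z[0] + z[1] + z[2]**2   # z[3] is a dummy player
g = lambda z: z[2]
x = np.array([2.0, 2.0, 1.0, 5.0])

phi = path_attr(f, x)
assert abs(phi.sum() - (f(x) - f(np.zeros(4)))) < 1e-4          # efficiency
assert abs(phi[3]) < 1e-6                                       # dummy
assert abs(phi[0] - phi[1]) < 1e-6                              # symmetry (x0 == x1)
fg = lambda z: f(z) + g(z)
assert np.allclose(path_attr(fg, x), phi + path_attr(g, x), atol=1e-6)  # additivity
```

Passing such checks on small examples does not, of course, substitute for the referee's requested derivation that the axioms hold exactly at N = 1.6e6 under the aggregate-statistics evaluation.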
minor comments (2)
- [Abstract and experimental setup] The abstract states the method 'runs four to five orders of magnitude faster than sampled Shapley on the same hardware' but lacks a table or section with exact timing benchmarks, hardware specs, and baseline implementation details for reproducibility.
- [Introduction and preliminaries] Notation for the macro indicator and the precise statement of the four axioms as adapted should be introduced earlier with explicit equations to aid readers unfamiliar with the Aumann-Shapley literature.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below, providing clarifications and committing to revisions where appropriate to strengthen the theoretical and empirical sections.
read point-by-point responses
-
Referee: [Theoretical development and method adaptation] The Attribution Scaling Bias theorem (stated in the abstract and developed in the theoretical section) presupposes that the adapted Aumann-Shapley path-integral method is a valid attribution operator obeying the four axioms at N=1.6e6. However, standard discrete Aumann-Shapley requires an explicitly evaluable value function v(S) along coordinate paths or a continuum limit; the manuscript's claim of satisfying the axioms 'without additional assumptions on agent interactions' and achieving 10^4-10^5 speedup via aggregate statistics needs a detailed derivation showing axiom compliance (efficiency, symmetry, dummy, additivity) is preserved rather than conditional on independence or approximation.
Authors: We appreciate the referee's emphasis on rigorous axiom verification. The manuscript's Section 3 presents the adaptation by defining the value function v as the macro indicator computed over the agent set, with the Aumann-Shapley integral evaluated via a continuum limit using aggregate statistics from the full population. This construction ensures the axioms hold by the properties of the path integral, independent of specific agent interaction models, as the marginal contributions are integrated without assuming independence. The computational speedup arises from using precomputed aggregates rather than per-coalition evaluations. To fully address the concern, we will include an expanded appendix in the revised manuscript with a step-by-step derivation verifying each axiom (efficiency, symmetry, dummy, additivity) for the adapted operator at large N, confirming no hidden assumptions are required. revision: yes
-
Referee: [Empirical results] In the empirical evaluation on Bluesky data (1,671,587 users), the structural disagreement between full-scale and visibility-biased N=100 attributions is presented as evidence for the theorem, but the specific nonlinear macro indicator is not formalized with an equation; without this, it is unclear whether the observed long-tail vs. high-follower attribution difference is general for any nonlinear indicator or tied to the particular choice and its computation at scale.
Authors: We agree that an explicit equation for the macro indicator would improve clarity. In the Bluesky experiments, the nonlinear macro indicator is the aggregate engagement metric, defined as M(x) = sum over posts of (likes + reposts + replies) where x represents the vector of user activity levels, incorporating nonlinear visibility thresholds. We will add this formal definition, along with the equation, to Section 4 in the revised version. This will demonstrate that the theorem holds for general nonlinear indicators, and the observed structural differences (long-tail attribution at full scale versus concentration on high-follower accounts in small samples) exemplify the scaling bias without being specific to this choice. revision: yes
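The committed definition above is only verbal. One plausible shape for such a saturating engagement indicator, with a softplus visibility threshold standing in for the unspecified nonlinearity (the threshold value tau and the functional form are assumptions for illustration, not the paper's M), is:

```python
import numpy as np

def engagement_indicator(likes, reposts, replies, tau=10.0):
    """Aggregate engagement with a smooth (C^2) visibility threshold:
    per-post raw engagement is passed through a softplus saturation,
    making the macro indicator nonlinear in individual activity levels."""
    raw = likes + reposts + replies              # per-post raw engagement
    visible = np.logaddexp(0.0, raw - tau)       # stable softplus(raw - tau)
    return visible.sum()

likes   = np.array([50.0, 3.0, 0.0])
reposts = np.array([20.0, 1.0, 0.0])
replies = np.array([ 5.0, 2.0, 1.0])
m  = engagement_indicator(likes, reposts, replies)
m2 = engagement_indicator(2*likes, 2*reposts, 2*replies)
# Nonlinearity: doubling every activity does not double the indicator.
print(m, m2)
```

Any indicator of this thresholded form is nonlinear, which is the property the Attribution Scaling Bias theorem requires; a purely linear sum of counts would fall outside the theorem's scope.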
Circularity Check
Derivation chain is self-contained with no circular reductions
full rationale
The paper adapts the established Aumann-Shapley path-integral method to satisfy the four standard axioms at million-agent scale and then derives the Attribution Scaling Bias theorem as a general consequence for any nonlinear macro indicator. The theorem states that no global rescaling factor can reconcile small-scale and full-scale attributions; this follows directly from the axiomatic properties rather than from any fitted parameters, self-referential definitions, or data-dependent constructions in the present work. The empirical comparison on Bluesky data (full N=1.6M vs. N=100 subsample) is a direct computation and does not feed back into the theorem. No load-bearing self-citations, uniqueness theorems imported from the authors' prior work, or smuggled ansatzes appear in the derivation; the adaptation is presented as preserving the axioms without additional interaction assumptions that would collapse the result to the inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Aumann-Shapley path-integral attribution satisfies the four standard axioms (efficiency, symmetry, dummy, additivity)
- domain assumption Macro indicators of emergence are nonlinear
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tagged unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "We then prove that under any nonlinear macro indicator the disagreement cannot be reduced by post-hoc rescaling: an Attribution Scaling Bias theorem shows that no global rescaling factor can reconcile small-scale and full-scale attribution."
-
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · tagged unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "f_heat(z_S) = log(1 + m_a(S) m_b(S) m_c(S)), ... f_var ..., f_gini ..."
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Ariel Flint Ashery, Luca Maria Aiello, and Andrea Baronchelli. Emergent social conventions and collective bias in LLM populations. Science Advances, 11(20):eadu9368.
-
[2]
Robert M. Bond, Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle, and James H. Fowler. A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415):295–298. doi: 10.1038/nature11421.
-
[3]
Markus K. Brunnermeier. Deciphering the liquidity and credit crunch 2007–2008. Journal of Economic Perspectives, 23(1):77–100. doi: 10.1257/jep.23.1.77.
-
[4]
Mert Cemri, Melissa Z. Pan, Shuyi Yang, Lakshya A. Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, Matei Zaharia, Joseph E. Gonzalez, and Ion Stoica. Why do multi-agent LLM systems fail? In Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track.
-
[5]
Chen Gao, Xiaochong Lan, Zhihong Lu, Jinzhu Mao, Jinghua Piao, Huandong Wang, Depeng Jin, and Yong Li. S3: Social-network simulation system with large language model-empowered agents.
-
[6]
Jordan Hoffmann et al. Training compute-optimal large language models. In Advances in Neural Information Processing Systems (NeurIPS).
-
[7]
Jared Kaplan et al. Scaling laws for neural language models.
-
[8]
Jinghua Piao et al. AgentSociety: Large-scale simulation of LLM-driven generative agents advances understanding of human behaviors and society.
-
[9]
Jiakai Tang, Heyang Gao, Xuchen Pan, Lei Wang, Haoran Tan, Dawei Gao, Yushuo Chen, Xu Chen, Yankai Lin, Yaliang Li, Bolin Ding, Jingren Zhou, Jun Wang, and Ji-Rong Wen. GenSim: A general social simulation platform with large language model based agents. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Comp...
work page 2025
-
[10]
Ling Tang, Jilin Mei, Dongrui Liu, Chen Qian, Dawei Cheng, Jing Shao, and Xia Hu. Interpreting emergent extreme events in multi-agent systems. arXiv preprint arXiv:2601.20538.
-
[11]
Jiachen T. Wang and Ruoxi Jia. Data Banzhaf: A robust data valuation framework for machine learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 6388–6421.
-
[12]
Cited in context: "uniformly in i. Empirically, on f_heat at Mythos with N = 10^4, the relative L1 error of φ̂_K against the analytic reference (Appendix F) decays as 1/K^2 across K ∈ {5, 10, 20, 30, 50, 100, 300}, falling from 4.0×10^-3 at K = 5 to 1.1×10^-6 at K = 300 (Table 5). The wall-clock cost grows linearly in K, so where no analytic expression is available we use K = 30 as a de..."
work page 2026
-
[13]
Cited in context: "for the same MAS attribution problem; sampled Banzhaf with 200 coalition samples [Wang and Jia, 2023]; and two LLM-as-Judge variants, MAST [Cemri et al., 2025] and Who&When [Zhang et al., 2025b], both prompt-based scorers operating on agent execution traces. Metrics. MAE: mean absolute error against ground truth. Cosine: cosine similarity of attribution vec..."
work page 2023
-
[14]
Cited in context: "and adds a complementary rank-correlation summary. Setup. For each topic and each value function f ∈ {f_lin, f_heat, f_var, f_gini}, we collect ten attribution runs at N = 10^2 under the visibility-biased sampling protocol of Section 3.1; each run yields a vector φ̃_S ∈ R^|S| of normalized within-S shares. For the same i ∈ S we read the corresponding entries of ..."
work page 2026
-
[15]
Cited in context: "and Seckin et al. [2025]. Events are filtered to four record types relevant to engagement: posts, replies, reposts, and follows. Bots and accounts created within the window are removed by user-handle heuristics. The cleaned panel contains 1,671,587 active users (each with at least one event in the window). No private data is accessed at any stage. Per-age..."
work page 2025
-
[16]
Cited in context: "and Riquelme and González-Cantergiani [2016]. We do not standardise the features to zero mean and unit variance, because the raw scale of a, b, c is what controls how the four value functions weight different agents (and standardisation interacts non-trivially with the saturation in f_heat); we instead document this as a known limitation in Section ..."
work page 2016
-
[17]
Cited in context: "Topic selection. Five topics covering technology (Mythos), politics (Trump-Tariffs), sports (The Masters), society (Earth Day-Climate), and entertainment (WrestleMania) are reported in this paper. Each topic was selected to be active throughout the 14-day window with at least 1,000 participants on each of at least 10 days, and to span a single recognisable ..."
work page 2024
-
[18]
Cited in context: "Each scenario inherits the original system's macro indicator, modified only to be C^2 where the original used non-smooth components (e.g. ...). Table 18: Three-tier shares (RG_top, RG_mid, RG_tail) on Mythos with f_heat as a function of sample size N, under visibility-biased and random sampling, with the full panel as the bottom row. Mean over ten subset seed..."
work page 2024
-
[19]
Cited in context: "'Ours' is the Aumann–Shapley attribution of Appendix F. 'τ vs. Shapley ...' ... wealth-amplified macro-pressure risk that aggregates excess demand, budget stress, concentration, systemic gap, and instability into a single scalar. The baseline action zeroes out work, consumption, and investment for the target step while preserving wealth. SocialLLM, a social propagation simulator with N = 20 agents in the spirit of Stauffer and Meyer-Or..."
work page 2004
-
[20]
Cited in context: "yield top-10 overlap above 9/10 in every case; weighted sum and log aggregator are nearly indistinguishable (Kendall τ ≥ 0.984). Topic and strategy consistency. The cross-scale flip persists across all five Bluesky topics and all four analytic value functions (main-text Table 2 and Table 13, 20/20 cells positive), and across all three biased sampling protocols..."
work page 2020