Drawing with Strangers: Population Scaling Drives Zero-Shot Mutual Intelligibility in Emergent Sketching
Pith reviewed 2026-06-27 14:13 UTC · model grok-4.3
The pith
Scaling training populations in emergent sketching agents improves communication between independent groups without prior exposure.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Scaling the training population substantially improves zero-shot mutual intelligibility across independent groups. As population size grows, in-group communicative variation increases, preventing co-adaptation into homogeneity, while cross-group variation decreases, indicating structural convergence toward universality. This universality is achieved through perceptual grounding, as scaled populations increasingly anchor their emergent sketches on the objective visual resemblance of the target images.
What carries the argument
Zero-shot mutual intelligibility (ZMI) between disjoint agent populations, achieved when population scaling drives convergence on perceptually grounded sketches rather than private conventions.
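One way to make the ZMI measurement concrete is a cross-play score: pair every sender from one population with every receiver from a different population and average the communication success. The sketch below is an illustration under stated assumptions, not the paper's evaluation code. The names `success_rate`, `zmi_score`, and `make_population` are hypothetical, and the toy agents exchange integer codes rather than drawn strokes.

```python
from itertools import product
from statistics import mean


def success_rate(sender, receiver, targets):
    """Fraction of targets the receiver recovers from the sender's message."""
    hits = sum(receiver(sender(t)) == t for t in targets)
    return hits / len(targets)


def zmi_score(populations, targets):
    """Mean success over sender/receiver pairs from *different* populations."""
    rates = []
    for i, pop_a in enumerate(populations):
        for j, pop_b in enumerate(populations):
            if i == j:
                continue  # in-group pairs are excluded: ZMI only counts strangers
            for sender, receiver in product(pop_a["senders"], pop_b["receivers"]):
                rates.append(success_rate(sender, receiver, targets))
    return mean(rates)


def make_population(offset, n_targets=4, size=2):
    """Toy population (assumption): its shared code shifts each target id by `offset`."""
    sender = lambda t: (t + offset) % n_targets
    receiver = lambda m: (m - offset) % n_targets
    return {"senders": [sender] * size, "receivers": [receiver] * size}


targets = list(range(4))
# Two populations that converged on the same code understand each other.
print(zmi_score([make_population(0), make_population(0)], targets))  # 1.0
# A population with a private convention is unintelligible to outsiders.
print(zmi_score([make_population(0), make_population(1)], targets))  # 0.0
```

In this toy setting each population communicates perfectly in-group, so the cross-play score isolates exactly the gap between private conventions and a shared code.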
Load-bearing premise
Training occurs in strictly disjoint populations with no prior exposure, and the rise in in-group communicative variation with scale is the causal driver of reduced cross-group variation and improved zero-shot communication.
What would settle it
The claim would be falsified if the same sketching experiments, run with larger populations, showed no increase in in-group variation or no gain in cross-group communication success.
Original abstract
Generalization in emergent communication has largely focused on novel inputs or linguistic structures, yet the capacity for agents to communicate with strangers from strictly disjoint communities remains relatively unexplored. In this work, we formalize this capability as zero-shot mutual intelligibility (ZMI): successful communication between independently trained populations without prior exposure. Leveraging emergent sketching -- in which agents communicate through sets of drawn strokes -- as a visually grounded modality, we find that scaling the training population substantially improves ZMI across independent groups. Crucially, as we scale the population size, in-group communicative variation increases, preventing co-adaptation into homogeneity. Simultaneously, cross-group variation decreases, indicating a structural convergence toward a certain type of universality. Further analysis reveals that this universality is achieved through perceptual grounding: scaled populations increasingly anchor their emergent sketches on the objective visual resemblance of the target images. Together, these results position ZMI as a distinct axis of generalization in emergent communication and suggest a route toward socially interoperable artificial agents.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formalizes zero-shot mutual intelligibility (ZMI) as successful communication between independently trained populations of sketching agents with no prior exposure. It reports that scaling the size of the training population improves ZMI, accompanied by increased in-group communicative variation (preventing homogeneity), decreased cross-group variation (indicating structural convergence), and a shift toward perceptual grounding on objective visual resemblance of target images.
Significance. If the empirical patterns and proposed mechanism hold after appropriate controls, the work would usefully extend emergent-communication research by identifying population scale as a driver of inter-group interoperability and by distinguishing ZMI from other generalization axes. The emphasis on perceptual grounding supplies a concrete, testable route toward socially interoperable agents.
Major comments (2)
- [Abstract / §4] Abstract and §4 (results on variation): the manuscript states that population scaling increases in-group variation, which in turn prevents homogeneity and produces the observed drop in cross-group variation plus ZMI gains. No mediation analysis, ablation that holds scale fixed while varying communicative diversity, or controlled comparison isolating the variation driver is described; the causal sequence therefore remains an untested assumption rather than a demonstrated mechanism.
- [Abstract] Abstract: the claim that scaled populations 'increasingly anchor their emergent sketches on the objective visual resemblance' is presented as the explanation for universality, yet the text supplies no quantitative measure (e.g., correlation with image-feature similarity, human perceptual judgments, or ablation removing visual grounding) that would distinguish this account from alternative explanations such as richer gradients or implicit regularization.
Minor comments (2)
- [Abstract] The abstract contains no numerical results, error bars, or statistical tests; the full manuscript should include these in the main text or a dedicated results table so readers can evaluate effect sizes.
- [§3 / §4] Notation for 'in-group communicative variation' and 'cross-group variation' should be defined explicitly (e.g., via an equation or distance metric) at first use to avoid ambiguity across figures.
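To illustrate the kind of explicit definition the last comment asks for, the two quantities could be written as distances over sketch feature vectors. The sketch below is one possible formalization, not the paper's metric: it assumes each agent's sketch for a target is summarized as a numeric vector, takes in-group variation to be the mean pairwise Euclidean distance between agents of one population, and takes cross-group variation to be the distance between population centroids. The function names are hypothetical.

```python
from itertools import combinations
from math import dist
from statistics import mean


def in_group_variation(population):
    """Mean pairwise distance between agents' sketches, averaged over targets.

    `population` is a list of agents; each agent is a list of sketch
    vectors, one per target (an assumed representation).
    """
    n_targets = len(population[0])
    per_target = []
    for k in range(n_targets):
        sketches = [agent[k] for agent in population]
        per_target.append(mean([dist(a, b) for a, b in combinations(sketches, 2)]))
    return mean(per_target)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def cross_group_variation(pop_a, pop_b):
    """Distance between the two populations' mean sketches, averaged over targets."""
    n_targets = len(pop_a[0])
    return mean([
        dist(centroid([agent[k] for agent in pop_a]),
             centroid([agent[k] for agent in pop_b]))
        for k in range(n_targets)
    ])


# Two agents per population, one target, 2-D sketch features (toy values).
pop_a = [[[0.0, 0.0]], [[2.0, 0.0]]]
pop_b = [[[1.0, 1.0]], [[1.0, -1.0]]]
print(in_group_variation(pop_a))            # 2.0
print(cross_group_variation(pop_a, pop_b))  # 0.0
```

The toy values show that the two quantities can move independently: both populations vary internally, yet their centroids coincide, which is the pattern the manuscript reports at larger scales.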
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the strength of evidence for the proposed mechanisms. We respond point-by-point below.
- Referee: [Abstract / §4] Abstract and §4 (results on variation): the manuscript states that population scaling increases in-group variation, which in turn prevents homogeneity and produces the observed drop in cross-group variation plus ZMI gains. No mediation analysis, ablation that holds scale fixed while varying communicative diversity, or controlled comparison isolating the variation driver is described; the causal sequence therefore remains an untested assumption rather than a demonstrated mechanism.
  Authors: We agree that the manuscript presents the causal sequence as an inference from observed scaling patterns rather than through formal mediation analysis or an ablation that holds population size fixed while manipulating communicative diversity. The reported experiments show that larger populations reliably produce higher in-group variation, lower cross-group variation, and higher ZMI, but these remain correlational. We will add a mediation analysis and a controlled ablation isolating the variation driver in the revised manuscript.
  Revision: yes
- Referee: [Abstract] Abstract: the claim that scaled populations 'increasingly anchor their emergent sketches on the objective visual resemblance' is presented as the explanation for universality, yet the text supplies no quantitative measure (e.g., correlation with image-feature similarity, human perceptual judgments, or ablation removing visual grounding) that would distinguish this account from alternative explanations such as richer gradients or implicit regularization.
  Authors: The manuscript's further analysis section reports quantitative correlations between emergent sketch features and objective image features that strengthen with population scale, together with supporting human judgment data. These results favor perceptual grounding over purely optimization-based alternatives. We acknowledge that an explicit ablation removing visual grounding would provide a sharper contrast and will include such an ablation in the revision.
  Revision: yes
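A correlation of the kind discussed in this exchange could be computed by comparing pairwise distances between sketches with pairwise distances between the corresponding target images. The sketch below is a minimal illustration under assumptions, not the analysis from the manuscript: `grounding_score` is a hypothetical name, the features are toy one-dimensional values, and Pearson correlation over Euclidean distances is only one of several reasonable choices.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def grounding_score(sketch_feats, image_feats):
    """Correlate sketch-to-sketch distances with image-to-image distances.

    A high score means that images which look alike are drawn alike,
    which is one operational reading of perceptual grounding.
    """
    pairs = list(combinations(range(len(image_feats)), 2))
    sketch_d = [dist(sketch_feats[i], sketch_feats[j]) for i, j in pairs]
    image_d = [dist(image_feats[i], image_feats[j]) for i, j in pairs]
    return pearson(sketch_d, image_d)


image_feats = [[0.0], [1.0], [3.0]]   # toy image features (assumption)
grounded = [[0.0], [2.0], [6.0]]      # sketches that preserve image geometry
arbitrary = [[0.0], [3.0], [1.0]]     # sketches following a private convention
print(grounding_score(grounded, image_feats))
print(grounding_score(arbitrary, image_feats))
```

With these toy values the grounded sketches correlate perfectly with image geometry while the arbitrary ones correlate negatively. The score is undefined when all pairwise distances in either set are identical, so a real analysis would need to guard against zero variance.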
Circularity Check
Empirical simulation results exhibit no circularity
The paper reports experimental findings on population scaling effects in emergent sketching agents, with ZMI, in-group variation, and cross-group convergence measured directly from simulations. No derivation chain, equations, or self-citations are invoked to derive the central claims; results are presented as observations from disjoint training populations. This matches the default expectation for self-contained empirical work evaluated on simulation runs, warranting a circularity score of 0 with no flagged steps.