When transformers learn "impossible" languages, what do they learn?
Pith reviewed 2026-07-01 02:10 UTC · model grok-4.3
The pith
Transformers trained on impossible languages lose grammatical sensitivity only gradually, but fail severely when generating long sentences in them.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using GPT-2-style models trained on perturbed "impossible" variants of English, the paper finds that grammatical sensitivity, measured with BLiMP minimal pairs, degrades only gradually, and that this degradation is mediated by the language's information locality. Generation, by contrast, fails markedly: the models produce substantially fewer high-quality sentences at longer lengths. Together these results support generative deficiency and transmission failure as a plausible linking hypothesis for non-attestation.
What carries the argument
The contrast between BLiMP minimal-pair grammaticality judgments and direct measurement of generation quality across sentence lengths on information-locally perturbed English variants.
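To make the first half of that contrast concrete, here is a minimal sketch of how minimal-pair accuracy is typically scored: a pair counts as correct when the model assigns a higher log-probability to the grammatical sentence than to its ungrammatical twin. The toy bigram model, the tiny corpus, and the names `train_bigram` and `minimal_pair_accuracy` are illustrative assumptions; the paper uses GPT-2-style models and the BLiMP pairs, not this stand-in.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Toy add-one-smoothed bigram scorer (a stand-in for a trained LM)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab_size = len(vocab)

    def log_prob(sentence):
        # Sum of smoothed log-probabilities over consecutive token pairs.
        tokens = ["<s>"] + sentence.split()
        return sum(
            math.log((bigrams[(p, c)] + 1) / (unigrams[p] + vocab_size))
            for p, c in zip(tokens, tokens[1:])
        )

    return log_prob


def minimal_pair_accuracy(log_prob, pairs):
    """Fraction of (grammatical, ungrammatical) pairs ranked correctly."""
    hits = sum(log_prob(good) > log_prob(bad) for good, bad in pairs)
    return hits / len(pairs)


# Hypothetical training data and agreement pairs, for illustration only.
corpus = ["the dog barks", "the dogs bark", "the cat sleeps", "the cats sleep"]
pairs = [
    ("the dog barks", "the dog bark"),
    ("the cats sleep", "the cats sleeps"),
]
scorer = train_bigram(corpus)
accuracy = minimal_pair_accuracy(scorer, pairs)
print(accuracy)
```

Because the score depends only on a ranking between two sentences, it can stay high even when the model's absolute probabilities are poorly calibrated, which is one reason recognition and generation can come apart.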
If this is right
- Grammatical sensitivity alone does not explain model bias against impossible languages.
- Generation quality declines sharply with length under the perturbed conditions.
- Transmission failures become a candidate mechanism connecting model behavior to human non-attestation.
- Information locality modulates how quickly grammatical performance degrades.
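The length-dependent claim in the list above could be checked with a tally like the following sketch, which bins generated sentences by token length and reports the share judged high-quality in each bin. The function name `quality_rate_by_length`, the bin width, and the toy judge (a sentence "passes" if it ends with a full stop) are assumptions for illustration; the review does not state how the paper defines or measures high-quality sentences.

```python
from collections import defaultdict


def quality_rate_by_length(samples, is_high_quality, bin_width=5):
    """Share of high-quality samples per length bin (keyed by bin start)."""
    totals = defaultdict(int)
    good = defaultdict(int)
    for sentence in samples:
        n_tokens = len(sentence.split())
        start = (n_tokens // bin_width) * bin_width
        totals[start] += 1
        good[start] += int(is_high_quality(sentence))
    return {start: good[start] / totals[start] for start in sorted(totals)}


# Hypothetical generations and a deliberately crude placeholder judge.
samples = [
    "the dog barks .",
    "a cat sleeps .",
    "the dog barks and the cat sleeps .",
    "the dog and the cat the the",
]
rates = quality_rate_by_length(samples, lambda s: s.endswith("."))
print(rates)
```

Comparing these per-bin rates between natural English and each perturbed variant, at matched lengths, is the kind of contrast the generation result relies on.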
Where Pith is reading between the lines
- If production limits are central, similar length-dependent failures might appear in other sequence models trained on the same variants.
- The result suggests testing whether human learners also show production bottlenecks before recognition ones on these constructed languages.
- The mediation by information locality implies that locality-preserving perturbations might produce different patterns than locality-breaking ones.
Load-bearing premise
The specific changes made to English preserve the property that the resulting languages remain unacquirable by humans while still permitting fair comparison to natural English.
What would settle it
An experiment that measures whether the same models generate high-quality long sentences from the perturbed variants at rates comparable to natural English, or a human learning study showing that people acquire the perturbed languages at rates predicted by the model results.
Original abstract
Recent work suggests that transformer language models show a bias towards human languages over unnatural ("impossible") languages argued to be unacquirable by humans. However, this literature has largely based these claims on differences in sample efficiency and test-set perplexity, rather than on direct evaluations of the linguistic capacities that could plausibly explain non-attestation in human languages. We evaluate two theoretically motivated linking hypotheses: impossibility arising from deficiencies in grammatical sensitivity or generative production. Using GPT-2 style models trained on perturbed "impossible" variants of English, we measure sensitivity to grammaticality using BLiMP minimal pairs, finding that model performance exhibits only gradual degradation, mediated by the language's information locality. In contrast, these models exhibited pronounced failures in generation, producing substantially fewer high-quality sentences at longer lengths. Together, these results suggest generative deficiency and transmission failures as a plausible linking hypothesis between language model behaviour and non-attestation of impossible languages.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that GPT-2-style transformers trained on perturbed 'impossible' variants of English exhibit only gradual degradation in grammatical sensitivity on BLiMP minimal pairs (mediated by information locality), but show pronounced failures in generation by producing substantially fewer high-quality sentences at longer lengths. These results are used to argue that generative deficiency and transmission failures provide a plausible linking hypothesis for the non-attestation of impossible languages, moving beyond prior work focused on sample efficiency and perplexity.
Significance. If the results hold, the work strengthens the literature by shifting from indirect metrics to direct tests of grammatical sensitivity and generative capacity on held-out sets. The empirical contrast between BLiMP and generation tasks offers a falsifiable basis for connecting model behavior to linguistic non-attestation, with potential implications for both acquisition theory and model evaluation.
major comments (2)
- [language construction section] The perturbations used to create the impossible variants must be explicitly justified against documented linguistic universals or acquisition constraints, to ensure the languages remain unacquirable by humans while preserving structural comparability to English. Absent this grounding, the observed contrast between gradual BLiMP degradation and generation failures cannot securely support the transmission-failure hypothesis.
- [generation evaluation, results section] The definition and measurement of 'high-quality sentences' at longer lengths, including any statistical controls or inter-annotator details, are load-bearing for the central claim of pronounced generative deficiency. Without these, the robustness of the contrast with BLiMP cannot be assessed.
minor comments (2)
- [Abstract] Include the number of models, exact data splits, and perturbation types to allow readers to evaluate the directional results without needing the full methods.
- [Methods] Ensure all baseline comparisons and information-locality mediation analyses are described with sufficient detail for replication.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to strengthen the presentation of our methods and results.
- Referee: [language construction section] The perturbations used to create the impossible variants must be explicitly justified against documented linguistic universals or acquisition constraints, to ensure the languages remain unacquirable by humans while preserving structural comparability to English. Absent this grounding, the observed contrast between gradual BLiMP degradation and generation failures cannot securely support the transmission-failure hypothesis.
  Authors: We agree that more explicit grounding is needed. The language construction section motivates the perturbations from theoretical proposals in the linguistics literature on impossible languages and information locality, but we will add a dedicated subsection that directly links each perturbation to specific documented universals and acquisition constraints (with additional citations). This will clarify how the variants are positioned as unacquirable by humans while preserving structural comparability to English, thereby reinforcing the transmission-failure hypothesis.
  Revision: yes
- Referee: [generation evaluation, results section] The definition and measurement of 'high-quality sentences' at longer lengths, including any statistical controls or inter-annotator details, are load-bearing for the central claim of pronounced generative deficiency. Without these, the robustness of the contrast with BLiMP cannot be assessed.
  Authors: We concur that greater methodological detail is required here. In the revised manuscript we will expand the generation evaluation section to provide an explicit definition of high-quality sentences (based on grammaticality, fluency, and coherence criteria), report inter-annotator agreement metrics, and include statistical controls such as length-matched comparisons and significance testing. These additions will allow a clearer assessment of the contrast with BLiMP results.
  Revision: yes
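As one illustration of what an inter-annotator agreement metric could look like, the sketch below computes Cohen's kappa for two annotators labelling the same generated sentences as high-quality (1) or not (0). The rebuttal does not say which agreement metric would be reported, so the choice of kappa, the function name `cohens_kappa`, and the example labels are all assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum((freq_a[k] / n) * (freq_b[k] / n) for k in categories)
    if expected == 1.0:
        # Both annotators used one identical label throughout; returning
        # 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical quality judgments on four generated sentences.
annotator_a = [1, 1, 0, 0]
annotator_b = [1, 0, 0, 0]
kappa = cohens_kappa(annotator_a, annotator_b)
print(kappa)
```

Here the annotators agree on three of four items (0.75), but half of that agreement is expected by chance, so kappa comes out lower than raw agreement.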
Circularity Check
No significant circularity; empirical results on held-out data
The paper's claims rest on direct experimental measurements: models are trained on perturbed language variants, then evaluated for grammatical sensitivity via BLiMP minimal pairs and for generation quality at varying lengths. These outcomes are reported as observed performance differences on test sets, not as quantities derived from the paper's own equations, fitted parameters renamed as predictions, or self-referential definitions. No load-bearing self-citations, uniqueness theorems, or ansatzes smuggled via prior work are described in the abstract or methodology summary that would reduce the central linking hypothesis to an input by construction. The derivation chain is therefore self-contained against external benchmarks.