pith. machine review for the scientific record.

arxiv: 2605.09271 · v1 · submitted 2026-05-10 · 💻 cs.AI

Recognition: 3 Lean theorem links

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

Jingwen Fu, Masashi Sugiyama, Nanning Zheng, Pei Fu, Bo Han, Yuhan Liu, Zhiqin Yang

Pith reviewed 2026-05-12 04:56 UTC · model grok-4.3

classification 💻 cs.AI
keywords LLM · language representation · schema · knowledge activation · symbolic constructs · intelligence expansion · performance variation · internal activations

The pith

Shaping schemas through advanced language representation is the next frontier for expanding LLM intelligence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper posits that LLMs' ability to apply their internalized knowledge effectively in complex tasks is limited by the expressive power of natural language as the default medium. By designing more sophisticated language representations, ones that use richer structural and symbolic forms to map the real world, one can shape an LLM's schema, the way it activates and organizes its knowledge, and thereby achieve better performance without scaling the model or changing its parameters. The authors formalize this idea and back it with a review of existing methods that gain from representation design, plus new experiments demonstrating changes in both accuracy and internal activations when the same task is framed differently. A sympathetic reader would care because this points to a more efficient path for advancing AI capabilities than the current focus on ever-larger models.

Core claim

An LLM's schema, its knowledge activation and organization, depends heavily on the structural and symbolic sophistication of the language used to represent a given task. Shaping schemas through advanced language representation therefore constitutes the next frontier for expanding LLM intelligence, as shown by empirical practices and controlled experiments where performance and feature activations vary with different representations of the same task.

What carries the argument

The schema: the LLM's knowledge activation and organization, shaped by the language representation, i.e., the linguistic and symbolic constructs that map the real world.

If this is right

  • Deliberate design of language representations can yield substantial performance gains on complex problems without modifying model parameters or increasing scale.
  • LLM internal feature activations change in response to the symbolic structure of the input language for an identical task.
  • Future research should emphasize language representation design as a primary direction alongside or instead of scaling.
  • The bottleneck of natural language's limited expressivity for problem-solving can be addressed through custom symbolic and structural enhancements.
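The premise is concrete enough to sketch. Below, one toy circuit problem is rendered in two of the representation styles the paper's experiments compare, natural language versus a netlist-like form; the circuit, wording, and format details are illustrative stand-ins, not the paper's actual stimuli.

```python
# Sketch: the same two-gate circuit rendered in two representation styles.
# The circuit and the exact formats are hypothetical examples, chosen only
# to show that the logical content stays fixed while the surface varies.

def natural_language(a, b, c):
    return (
        f"Gate G1 is an AND gate with inputs A={a} and B={b}. "
        f"Gate G2 is an OR gate whose inputs are the output of G1 and C={c}. "
        "What is the output of G2?"
    )

def netlist(a, b, c):
    return (
        f"INPUT A={a} B={b} C={c}\n"
        "G1 = AND(A, B)\n"
        "G2 = OR(G1, C)\n"
        "QUERY G2"
    )

def simulate(a, b, c):
    # Ground truth is representation-independent.
    return (a and b) or c

print(natural_language(1, 0, 1))
print(netlist(1, 0, 1))
print(simulate(1, 0, 1))  # 1: the underlying task never changed
```

The paper's claim is that an LLM prompted with these two strings may activate different internal schemas and reach different accuracies, even though `simulate` gives the same answer for both.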

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • One could explore whether automatically generated language representations optimized for schema shaping would outperform human-designed ones on specific domains.
  • The approach may connect to how formal languages in mathematics enable more precise reasoning, suggesting similar benefits for LLMs in technical fields.
  • If schema shaping proves robust, it could reduce reliance on massive training datasets by improving how existing knowledge is accessed and structured.

Load-bearing premise

Performance differences across language representations of the same task arise specifically from schema shaping rather than tokenization, attention patterns, or surface-level prompting effects.

What would settle it

Finding that different language representations of the same task lead to identical internal feature activations and task performance levels, after matching for token count and basic structure, would falsify the claim that representation shapes schema in a distinct and measurable way.
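The test is mechanical enough to sketch. Here is a minimal version of the between-format separation measurement on synthetic activations standing in for real hidden states; the shapes, the offset construction, and the variance decomposition are our assumptions, not the paper's code.

```python
import numpy as np

# If representation does not shape schema, pooled hidden states for the same
# problems should not separate by format. Synthetic activations stand in for
# real model states: identical per-problem "content" plus a format-specific
# offset, so separation is built in here by construction.

rng = np.random.default_rng(0)
n_problems, dim = 50, 16

content = rng.normal(size=(n_problems, dim))          # shared task content
fmt_offsets = rng.normal(size=(2, dim)) * 3.0         # two surface formats
acts = np.concatenate([content + fmt_offsets[0], content + fmt_offsets[1]])
labels = np.array([0] * n_problems + [1] * n_problems)

def between_format_variance_share(x, y):
    """Fraction of total variance explained by format membership."""
    grand = x.mean(axis=0)
    total = ((x - grand) ** 2).sum()
    between = sum(
        (y == g).sum() * ((x[y == g].mean(axis=0) - grand) ** 2).sum()
        for g in np.unique(y)
    )
    return between / total

share = between_format_variance_share(acts, labels)
print(f"between-format variance share: {share:.2f}")
```

A share near 0 after matching content and length would count against schema shaping; the paper reports high separation on real activations (e.g., 96.8% between-format variance in Figure 6).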

Figures

Figures reproduced from arXiv: 2605.09271 by Jingwen Fu, Masashi Sugiyama, Nanning Zheng, Pei Fu, Bo Han, Yuhan Liu, Zhiqin Yang.

Figure 1. Language representation as a frontier for LLM intelligence. Natural language encodes only a fraction of world information (Left). We organize representations along an axis of increasing design sophistication, from natural-language baselines (Level 0) through ambiguity elimination (Level 1) and logical constraints (Level 2) to scientific formalization and world modeling (Level 3). Each level induces progres… view at source ↗
Figure 2. Performance gains and capability expansion. view at source ↗
Figure 2. Example of grammar prompting for a calendar DSL. We interleave the minimal specialized grammar G[y(i)] between the demonstrations x(i) and y(i). During decoding, the LLM first predicts the specialized grammar G̃, and then predicts the program ỹ conditioned on G̃. The blue portion is not part of the actual prompt and only shown for illustrative purposes. view at source ↗
Figure 4. Conceptual overview of shaping schema via language representation design. We posit that the intelligence of Large Language Models (LLMs) can be expanded not just by scaling, but by designing language representations (L) that deliberately induce optimal internal schemas. As conceptualized in our position, a complex task, such as a 48-hour multi-city schedule constrained by punctuality, low-carbon preferences… view at source ↗
Figure 5. Multi-dimensional evaluation of language representation formats across internal dynamics… view at source ↗
Figure 6. t-SNE of last-layer hidden states (mean pooling), colored by representation type, layer 64 of Qwen3-32B. Each point is a single circuit problem; colors denote the 15 surface representations. Despite the underlying logical content being identical across formats, the model's final-layer states form sharply disjoint clusters, one per representation. The silhouette score of 0.93 and the 96.8% between-format va… view at source ↗
Figure 7. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 8. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 9. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 10. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 11. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 12. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 13. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 14. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 15. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 16. An example of the natural language representation for the logic circuit simulation task. view at source ↗
Figure 17. An example of the netlist language representation for the logic circuit simulation task. view at source ↗
Figure 18. An example of the graph adjacency notation representation for the logic circuit simulation task. view at source ↗
Figure 19. An example of the matrix representation for the logic circuit simulation task. view at source ↗
Figure 20. An example of the Lisp tree representation for the logic circuit simulation task. view at source ↗
Figure 21. An example of the dataflow language representation for the logic circuit simulation task. view at source ↗
Figure 22. An example of the partial truth table representation for the logic circuit simulation task. view at source ↗
Figure 23. An example of the compact gate notation representation for the logic circuit simulation task. view at source ↗
Figure 24. An example of the reverse Polish notation representation for the logic circuit simulation task. view at source ↗
Figure 25. An example of the dependency chain language representation for the logic circuit simulation task. view at source ↗
Figure 26. An example of the layered execution plan representation for the logic circuit simulation task. view at source ↗
Figure 27. An example of the signal propagation trace representation for the logic circuit simulation task. view at source ↗
Figure 28. An example of the constraint satisfaction format representation for the logic circuit simulation task. view at source ↗
Figure 29. An example of the canonical Boolean expression representation for the logic circuit simulation task. view at source ↗
Figure 30. An example of the Petri net notation representation for the logic circuit simulation task. view at source ↗
read the original abstract

Although natural language is the default medium for Large Language Models (LLMs), its limited expressive capacity creates a profound bottleneck for complex problem-solving. While recent advancements in AI have relied heavily on scaling, merely internalizing knowledge does not guarantee its effective application. Defining language representation as the linguistic and symbolic constructs used to map and model the real world, this paper argues that shaping schemas through advanced language representation is the next frontier for expanding LLM intelligence. We posit that an LLM's knowledge activation and organization -- its schema -- depends heavily on the structural and symbolic sophistication of the language used to represent a given task. This paper contributes both a formalization of this claim and the empirical evidence to support it. With a new formalization, we present multiple lines of evidence to support our position: Firstly, we review recent empirical practices and emerging methodologies that demonstrate the substantial performance gains achievable through deliberate language representation design, even without modifying model parameters or scale. Secondly, we conduct controlled experiments showing that LLM performance and its internal feature activations vary under different language representations of the same underlying task. Together, these findings highlight language representation design as a promising direction for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that an LLM's schema—its knowledge activation and organization—depends on the structural and symbolic sophistication of the language representation used to encode tasks. It positions deliberate language representation design as the next frontier for LLM intelligence beyond scaling, supported by a new formalization, a review of empirical practices showing performance gains without parameter changes, and controlled experiments demonstrating that performance and internal feature activations vary across different language representations of the same underlying task.

Significance. If the central claim is substantiated with evidence that isolates representation effects from surface-level confounds, the work could usefully redirect attention toward input representation engineering as a scalable complement to model scaling. The review of practices and the formalization provide a conceptual starting point that might stimulate targeted follow-up studies on representation effects.

major comments (2)
  1. [Abstract and Experiments] Abstract / controlled experiments description: the claim that performance and internal activations vary under different language representations is presented as direct support for schema shaping, yet no quantitative results, statistical tests, sample sizes, or controls for tokenization, sequence length, or attention-pattern changes are reported. Without these, the attribution to schema (rather than surface encoding differences) cannot be evaluated.
  2. [Formalization] Formalization: schema is defined as depending on the structural sophistication of the language representation, rendering the posited dependence largely definitional rather than independently testable. An operationalization of schema that is representation-independent would be required to support the causal claim.
minor comments (1)
  1. [Title] The title ends abruptly with 'Expanding' and would benefit from grammatical revision for clarity.
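One of the controls major comment 1 asks for, matching representations on token count before comparing performance, can be sketched as a simple filter. Whitespace tokenization, the tolerance value, and the example strings are stand-ins, not the paper's tokenizer or stimuli.

```python
# Sketch: keep only problems whose two renderings have roughly matched
# token counts, so sequence length cannot explain a performance gap.
# Whitespace tokenization stands in for a real model tokenizer.

def token_count(text):
    return len(text.split())

def matched_pairs(problems_a, problems_b, tolerance=0.5):
    """Return indices i where the two renderings of problem i are within
    `tolerance` relative length of each other. The 50% default is loose
    and purely illustrative; a real control would be far stricter."""
    kept = []
    for i, (a, b) in enumerate(zip(problems_a, problems_b)):
        na, nb = token_count(a), token_count(b)
        if abs(na - nb) <= tolerance * max(na, nb):
            kept.append(i)
    return kept

nl = ["Gate G1 is an AND gate with inputs A and B .",
      "Gate G2 is an OR gate fed by G1 and C ."]
netlist = ["G1 = AND ( A , B )",
           "G2 = OR ( G1 , C )"]
print(matched_pairs(nl, netlist))
```

Tightening `tolerance` shrinks the comparison set; the referee's point is that without some such control, length and tokenization remain live alternative explanations.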

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript. Their comments highlight important areas for clarification and strengthening, particularly regarding the presentation of experimental evidence and the formalization. We address each point below.

read point-by-point responses
  1. Referee: [Abstract and Experiments] Abstract / controlled experiments description: the claim that performance and internal activations vary under different language representations is presented as direct support for schema shaping, yet no quantitative results, statistical tests, sample sizes, or controls for tokenization, sequence length, or attention-pattern changes are reported. Without these, the attribution to schema (rather than surface encoding differences) cannot be evaluated.

    Authors: We agree that the abstract, being a concise summary, does not include specific quantitative results or statistical details. The controlled experiments in the full manuscript demonstrate variations in performance and internal activations across language representations of the same task, but we acknowledge the need for more rigorous reporting. We will revise the abstract to summarize key quantitative findings and expand the experiments section to include sample sizes, statistical tests, and explicit controls for tokenization, sequence length, and attention patterns to better isolate schema effects from surface-level differences. revision: yes

  2. Referee: [Formalization] Formalization: schema is defined as depending on the structural sophistication of the language representation, rendering the posited dependence largely definitional rather than independently testable. An operationalization of schema that is representation-independent would be required to support the causal claim.

    Authors: The formalization is intended to provide a structured way to analyze how language representations influence schema activation, building on established cognitive concepts. While it links the two, the causal claim is supported by the empirical component where we hold the underlying task constant and vary only the representation, observing differences in outcomes and activations. This provides a test independent of the definition. That said, we appreciate the suggestion for a more explicit representation-independent operationalization of schema and will add a subsection discussing potential measures, such as using probing techniques or activation similarity metrics that can be applied uniformly across different representations. revision: partial
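One activation-similarity metric of the kind the rebuttal gestures at is linear centered kernel alignment (CKA), a standard measure that can be applied uniformly across representations because it is invariant to rotations and scaling of the feature basis. Applying it to activations of the same problems under different language representations is our reading of the proposal, not code from the paper; the data here is synthetic.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two activation matrices.

    x, y: (n_examples, dim) activations for the same examples under two
    language representations; the feature dimensions may differ.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den

rng = np.random.default_rng(1)
acts_a = rng.normal(size=(40, 8))          # synthetic activations, format A
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
acts_b = acts_a @ q                        # same content, rotated basis
acts_c = rng.normal(size=(40, 8))          # unrelated content

print(linear_cka(acts_a, acts_a))  # 1.0 by construction
print(linear_cka(acts_a, acts_b))  # stays 1.0: CKA ignores the basis change
print(linear_cka(acts_a, acts_c))  # much lower: genuinely different content
```

Because CKA is basis-invariant, a high score across representations would indicate shared underlying schema despite surface differences, which is exactly the representation-independent operationalization the referee requests.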

Circularity Check

0 steps flagged

No significant circularity; central claim supported by independent empirical review and experiments

full rationale

The paper posits a dependence between schema (defined as knowledge activation and organization) and language representation sophistication, then supports the position via two external lines of evidence: a review of recent empirical practices showing performance gains from representation design without parameter changes, and controlled experiments documenting variations in performance and feature activations for the same task under different representations. No equations, formalization steps, or predictions are shown to reduce by construction to the initial definition or inputs. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted parameters are relabeled as predictions. The derivation chain remains self-contained against the provided evidence sources.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the untested premise that language representation is the primary bottleneck and that schema is a useful, language-dependent construct.

axioms (2)
  • domain assumption Natural language has limited expressive capacity that creates a bottleneck for complex problem-solving in LLMs
    Opening premise of the abstract.
  • ad hoc to paper LLM knowledge activation and organization (schema) is determined by the structural sophistication of the input language representation
    Load-bearing claim being advanced.
invented entities (1)
  • schema no independent evidence
    purpose: To denote the LLM's internal knowledge activation and organization structure
    Introduced as the key mediating concept between language representation and performance.

pith-pipeline@v0.9.0 · 5514 in / 1263 out tokens · 65584 ms · 2026-05-12T04:56:33.431386+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

140 extracted references · 140 canonical work pages · 11 internal anchors

  1. [1]

    Tractatus logico-philosophicus

    Ludwig Wittgenstein. Tractatus logico-philosophicus. 1922

  2. [2]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

  3. [3]

    Language models are few-shot learners.Advances in neural information processing systems, 33:1877– 1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari- wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877– 1901, 2020

  4. [4]

    OpenAI o1 System Card

    Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024

  5. [5]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

  6. [6]

    Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, et al. Kimi k1. 5: Scaling reinforcement learning with llms.arXiv preprint arXiv:2501.12599, 2025

  7. [7]

    Emergent abilities of large language models.Transactions on Machine Learning Research, 2022

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models.Transactions on Machine Learning Research, 2022

  8. [8]

    Training compute-optimal large language models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. InProceedings of the 36th International Conference on Neural Information Processing Systems, pages 30016–30030, 2022

  9. [9]

    Fabio Petroni, Tim Rockt¨aschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. Language models as knowledge bases? InProceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 2463–2473, 2019

  10. [10]

    Adam Roberts, Colin Raffel, and Noam Shazeer. How much knowledge can you pack into the parameters of a language model? InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5418–5426, 2020

  11. [11]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations, 2022

  12. [12]

    Neuroscience-inspired artificial intelligence.Neuron, 95(2):245–258, 2017

    Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. Neuroscience-inspired artificial intelligence.Neuron, 95(2):245–258, 2017

  13. [13]

    When brain-inspired ai meets agi.Meta-Radiology, 1(1):100005, 2023

    Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, et al. When brain-inspired ai meets agi.Meta-Radiology, 1(1):100005, 2023

  14. [14]

    Debates on the nature of artificial general intelligence, 2024

    Melanie Mitchell. Debates on the nature of artificial general intelligence, 2024

  15. [15]

    Remembering: A study in experimental and social psychology

    Frederic C Bartlett. Remembering: A study in experimental and social psychology. 1932

  16. [16]

    Thinking: An experimental and social study

    Frederic Charles Bartlett. Thinking: An experimental and social study. 1958

  17. [17]

    Prentice Hall, 1993

    Gail E Tompkins and Lea M McGee.Teaching reading with literature: Case studies to action plans. Prentice Hall, 1993

  18. [18]

    Yohan J. John. The power of scale in machine learning. https://kempnerinstitute. harvard.edu/news/the-power-of-scale-in-machine-learning/ , Aug 2025. Kemp- ner Institute at Harvard University. 10

  19. [19]

    Ilya sutskever: We’re moving from the age of scaling to the age of research

    Ilya Sutskever and Dwarkesh Patel. Ilya sutskever: We’re moving from the age of scaling to the age of research. The Dwarkesh Podcast, nov 2025. Published on November 25, 2025

  20. [20]

    Gorle, Maahe Zehra Kazmi, Ayesha Mohsin, Muhammad Usman Rafique, Zihao He, Pulkit Mehta, Muham- mad Ali Jamshed, and John M

    Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zeeshan Memon, Muham- mad Ibtsaam Qadir, Sagnik Bhattacharya, Hassan Rizwan, Abhiram R Gorle, Maahe Zehra Kazmi, Ayesha Mohsin, et al. On the fundamental limits of llms at scale.arXiv preprint arXiv:2511.12869, 2025

  21. [21]

    Using machine learning to simultaneously quantify multiple cognitive components of episodic memory.Nature Communications, 16(1):2856, 2025

    Soroush Mirjalili and Audrey Duarte. Using machine learning to simultaneously quantify multiple cognitive components of episodic memory.Nature Communications, 16(1):2856, 2025

  22. [22]

    Prefrontal connectomics: from anatomy to human imaging.Neuropsychopharmacology, 47(1):20–40, 2022

    Suzanne N Haber, Hesheng Liu, Jakob Seidlitz, and Ed Bullmore. Prefrontal connectomics: from anatomy to human imaging.Neuropsychopharmacology, 47(1):20–40, 2022

  23. [23]

    Schemata: The building blocks of cognition

    David E Rumelhart. Schemata: The building blocks of cognition. InTheoretical issues in reading comprehension, pages 33–58. Routledge, 2017

  24. [24]

    Schema for in-context learning.arXiv preprint arXiv:2510.13905, 2025

    Pan Chen, Shaohong Chen, Mark Wang, Shi Xuan Leong, Priscilla Fung, Varinia Bernales, and Alan Aspuru-Guzik. Schema for in-context learning.arXiv preprint arXiv:2510.13905, 2025

  25. [25]

    Schema theory

    Tricia Smith. Schema theory. https://www.ebsco.com/research-starters/ psychology/schema-theory, 2021

  26. [26]

    Semantic encoding during language comprehension at single-cell resolution.Nature, 631(8021):610– 616, 2024

    Mohsen Jamali, Benjamin Grannan, Jing Cai, Arjun R Khanna, William Mu˜noz, Irene Caprara, Angelique C Paulk, Sydney S Cash, Evelina Fedorenko, and Ziv M Williams. Semantic encoding during language comprehension at single-cell resolution.Nature, 631(8021):610– 616, 2024

  27. [27]

    Language, thought, and reality: selected writings of

    Benjamin Lee Whorf. Language, thought, and reality: selected writings of. . . .(edited by john b. carroll.). 1956

  28. [28]

    Linguistic relativity.Annual review of anthropology, 26(1):291–312, 1997

    John A Lucy. Linguistic relativity.Annual review of anthropology, 26(1):291–312, 1997

  29. [29]

    Experience grounds language

    Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, et al. Experience grounds language. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8718–8735, 2020

  30. [30]

    Linguistic skill and stimulus-driven attention: A case for linguistic relativity.Frontiers in Psychology, 13:875744, 2022

    Ulrich Ansorge, Diane Baier, and Soonja Choi. Linguistic skill and stimulus-driven attention: A case for linguistic relativity.Frontiers in Psychology, 13:875744, 2022

  31. [31]

    Meaning without reference in large language models

    Steven Piantadosi and Felix Hill. Meaning without reference in large language models. In NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI), 2022

  32. [32]

    Language and causation: A discursive action model of description and attribution.Psychological review, 100(1):23, 1993

    Derek Edwards and Jonathan Potter. Language and causation: A discursive action model of description and attribution.Psychological review, 100(1):23, 1993

  33. [33]

    English and spanish speakers remember causal agents differently

    Caitlin M Fausey and Lera Boroditsky. English and spanish speakers remember causal agents differently. InProceedings of the Annual Meeting of the Cognitive Science Society, volume 30, 2008

  34. [34]

    MIT Press, 2000

    Leonard Talmy.Toward a cognitive semantics: Concept structuring systems, volume 1. MIT Press, 2000

  35. [35]

    Does language shape thought?: Mandarin and english speakers’ conceptions of time.Cognitive psychology, 43(1):1–22, 2001

    Lera Boroditsky. Does language shape thought?: Mandarin and english speakers’ conceptions of time.Cognitive psychology, 43(1):1–22, 2001

  36. [36]

    Wilhelm von Humboldt.From ‘thought and language’ to ‘thinking for speaking’.Cambridge University Press, 1996

  37. [37]

    How language shapes thought.Scientific American, 304(2):62–65, 2011

    Lera Boroditsky. How language shapes thought.Scientific American, 304(2):62–65, 2011. 11

  38. [38]

    Building machines that learn and think like people.Behavioral and brain sciences, 40:e253, 2017

    Brenden M Lake, Tomer D Ullman, Joshua B Tenenbaum, and Samuel J Gershman. Building machines that learn and think like people.Behavioral and brain sciences, 40:e253, 2017

  39. [39]

    Using cognitive psychology to understand gpt-3.Proceedings of the National Academy of Sciences, 120(6):e2218523120, 2023

    Marcel Binz and Eric Schulz. Using cognitive psychology to understand gpt-3.Proceedings of the National Academy of Sciences, 120(6):e2218523120, 2023

  40. [40]

    Circuit tracing: Revealing computational graphs in language models.Transformer Circuits Thread, 6, 2025

    Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Wes Gurnee, Nicholas L Turner, Brian Chen, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, et al. Circuit tracing: Revealing computational graphs in language models.Transformer Circuits Thread, 6, 2025

  41. [41]

    Semantic structure in large language model embeddings.arXiv preprint arXiv:2508.10003,

    Austin C Kozlowski, Callin Dai, and Andrei Boutyline. Semantic structure in large language model embeddings.arXiv preprint arXiv:2508.10003, 2025

  42. [42]

    Under the shadow of babel: How language shapes reasoning in llms

    Chenxi Wang, Yixuan Zhang, Lang Gao, Zixiang Xu, Zirui Song, Yanbo Wang, and Xiuying Chen. Under the shadow of babel: How language shapes reasoning in llms. InFindings of the Association for Computational Linguistics: EMNLP 2025, pages 24327–24344, 2025

  43. [43]

    A survey on in-context learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 conference on empirical methods in natural language processing, pages 1107–1128, 2024

  44. [44]

    Decoding in-context learning: Neuroscience-inspired analysis of representations in large language models.arXiv preprint arXiv:2310.00313, 2023

    Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Rapha¨el Milli`ere, and Ida Momennejad. Decoding in-context learning: Neuroscience-inspired analysis of representations in large language models.arXiv preprint arXiv:2310.00313, 2023

  45. [45]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

  46. [46]

    Towards understanding chain-of-thought prompting: An empirical study of what matters

    Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, and Huan Sun. Towards understanding chain-of-thought prompting: An empirical study of what matters. InProceedings of the 61st annual meeting of the association for computational linguistics (volume 1: Long papers), pages 2717–2739, 2023

  47. [47]

    Schema and natural language aware in-context learning for improved graphql query generation

    Nitin Gupta, Manish Kesarwani, Sambit Ghosh, Sameep Mehta, Carlos Eberhardt, and Dan Debrunner. Schema and natural language aware in-context learning for improved graphql query generation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Ind...

  48. [48]

    Infusing prompts with syntax and semantics

    Anton Bulle Labate and Fabio Gagliardi Cozman. Infusing prompts with syntax and semantics. arXiv preprint arXiv:2412.06107, 2024

  49. [49]

    Schema-learning and rebinding as mechanisms of in-context learning and emergence

    Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lazaro-Gredilla, and Dileep George. Schema-learning and rebinding as mechanisms of in-context learning and emergence. Advances in neural information processing systems, 36:28785–28804, 2023

  50. [50]

    Improving rule-based reasoning in LLMs using neurosymbolic representations

    Varun Dhanraj and Chris Eliasmith. Improving rule-based reasoning in LLMs using neurosymbolic representations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30577–30596, 2025

  51. [51]

    On the worst prompt performance of large language models

    Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, and Wai Lam. On the worst prompt performance of large language models. Advances in Neural Information Processing Systems, 37:69022–69042, 2024

  52. [52]

    Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts

    Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Gong, et al. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts. In Proceedings of the 1st ACM workshop on large AI systems and models with privacy and safety analysis, pages 57–68, 2023

  53. [53]

    The communicative function of ambiguity in language

    Steven T Piantadosi, Harry Tily, and Edward Gibson. The communicative function of ambiguity in language. Cognition, 122(3):280–291, 2012

  54. [54]

    Climbing towards NLU: On meaning, form, and understanding in the age of data

    Emily M Bender and Alexander Koller. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th annual meeting of the association for computational linguistics, pages 5185–5198, 2020

  55. [55]

    Emergent Abilities of Large Language Models

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022

  56. [56]

    Prompt programming for large language models: Beyond the few-shot paradigm

    Laria Reynolds and Kyle McDonell. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended abstracts of the 2021 CHI conference on human factors in computing systems, pages 1–7, 2021

  57. [57]

    Grammar prompting for domain-specific language generation with large language models

    Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A Saurous, and Yoon Kim. Grammar prompting for domain-specific language generation with large language models. Advances in Neural Information Processing Systems, 36:65030–65055, 2023

  58. [58]

    Constrained language models yield few-shot semantic parsers

    Richard Shin, Christopher Lin, Sam Thomson, Charles Chen Jr, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 conference on empirical methods in natural language processing, pages 7699–7715, 2021

  59. [59]

    Grammar-aligned decoding

    Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni. Grammar-aligned decoding. Advances in Neural Information Processing Systems, 37:24547–24568, 2024

  60. [60]

    Improving parallel program performance through dsl-driven code generation with llm optimizers

    Anjiang Wei, Allen Nie, Thiago SFX Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, and Alex Aiken. Improving parallel program performance through dsl-driven code generation with llm optimizers. arXiv e-prints, pages arXiv–2410, 2024

  61. [61]

    Making pre-trained language models better few-shot learners

    Tianyu Gao, Adam Fisch, and Danqi Chen. Making pre-trained language models better few-shot learners. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 3816–3830, 2021

  62. [62]

    True few-shot learning with language models

    Ethan Perez, Douwe Kiela, and Kyunghyun Cho. True few-shot learning with language models. Advances in neural information processing systems, 34:11054–11070, 2021

  63. [63]

    Stress test evaluation for natural language inference

    Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. Stress test evaluation for natural language inference. arXiv preprint arXiv:1806.00692, 2018

  64. [64]

    The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance

    Abel Salinas and Fred Morstatter. The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance. arXiv preprint arXiv:2401.03729, 2024

  65. [65]

    Solving formal math problems by decomposition and iterative reflection

    Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, et al. Solving formal math problems by decomposition and iterative reflection. arXiv preprint arXiv:2507.15225, 2025

  66. [66]

    Generating structured outputs from language models: Benchmark and studies

    Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, and Harsha Nori. Generating structured outputs from language models: Benchmark and studies. arXiv e-prints, pages arXiv–2501, 2025

  67. [67]

    Combining tsl and llm to automate rest api testing: A comparative study

    Thiago Barradas, Aline Paes, and Vânia de Oliveira Neves. Combining tsl and llm to automate rest api testing: A comparative study. arXiv preprint arXiv:2509.05540, 2025

  68. [68]

    Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models

    Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Sunghwan Mac Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, et al. Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processi...

  69. [69]

    Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning

    Liangming Pan, Alon Albalak, Xinyi Wang, and William Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3806–3824, 2023

  70. [70]

    Vipergpt: Visual inference via python execution for reasoning

    Dídac Surís, Sachit Menon, and Carl Vondrick. Vipergpt: Visual inference via python execution for reasoning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11888–11898, 2023

  71. [71]

    Thinking with blueprints: Assisting vision-language models in spatial reasoning via structured object representation

    Weijian Ma, Shizhao Sun, Tianyu Yu, Ruiyu Wang, Tat-Seng Chua, and Jiang Bian. Thinking with blueprints: Assisting vision-language models in spatial reasoning via structured object representation. arXiv preprint arXiv:2601.01984, 2026

  72. [72]

    Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

    Keshav Ramji, Tahira Naseem, and Ramón Fernandez Astudillo. Thinking without words: Efficient latent reasoning with abstract chain-of-thought. arXiv preprint arXiv:2604.22709, 2026

  73. [73]

    The language instinct

    Steven Pinker. The language instinct. 1994/2007

  74. [74]

    AgentSPEX: An Agent SPecification and EXecution Language

    Pengcheng Wang, Jerry Huang, Jiarui Yao, Rui Pan, Peizhi Niu, Yaowenqi Liu, Ruida Wang, Renhao Lu, Yuwei Guo, and Tong Zhang. Agentspex: An agent specification and execution language. arXiv preprint arXiv:2604.13346, 2026

  75. [75]

    Constituent-constrained word prediction during language comprehension

    Jiajie Zou, David Poeppel, and Nai Ding. Constituent-constrained word prediction during language comprehension. Nature Neuroscience, pages 1–12, 2026

  76. [76]

    Faithful logical reasoning via symbolic chain-of-thought

    Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, and Wynne Hsu. Faithful logical reasoning via symbolic chain-of-thought. arXiv preprint arXiv:2405.18357, 2024

  77. [77]

    Geometrically-constrained agent for spatial reasoning

    Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, and Lu Sheng. Geometrically-constrained agent for spatial reasoning. arXiv preprint arXiv:2511.22659, 2025

  78. [78]

    Automated Planning: theory and practice

    Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: theory and practice. Elsevier, 2004

  79. [79]

    LLM+AL: Bridging large language models and action languages for complex reasoning about actions

    Adam Ishay and Joohyung Lee. LLM+AL: Bridging large language models and action languages for complex reasoning about actions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 24212–24220, 2025

  80. [80]

    Generating consistent pddl domains with large language models

    Pavel Smirnov, Frank Joublin, Antonello Ceravola, and Michael Gienger. Generating consistent pddl domains with large language models. arXiv preprint arXiv:2404.07751, 2024

Showing first 80 references.