pith. machine review for the scientific record.

arxiv: 2605.09271 · v1 · submitted 2026-05-10 · 💻 cs.AI

Recognition: 3 Lean theorem links

Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding

Jingwen Fu, Masashi Sugiyama, Nanning Zheng, Pei Fu, Bo Han, Yuhan Liu, Zhiqin Yang

Pith reviewed 2026-05-12 04:56 UTC · model grok-4.3

classification 💻 cs.AI
keywords LLM · language representation · schema · knowledge activation · symbolic constructs · intelligence expansion · performance variation · internal activations

The pith

Shaping schemas through advanced language representation is the next frontier for expanding LLM intelligence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper posits that LLMs' ability to apply their internalized knowledge effectively in complex tasks is limited by the expressive power of natural language as the default medium. By designing more sophisticated language representations, ones that use richer structural and symbolic forms to map the real world, one can shape an LLM's schema, the way it activates and organizes its knowledge, and thereby achieve better performance without scaling the model or changing its parameters. The authors formalize this idea and back it with a review of existing methods that gain from representation design, plus new experiments demonstrating changes in both accuracy and internal activations when the same task is framed differently. A sympathetic reader would care because this points to a more efficient path for advancing AI capabilities than the current focus on ever-larger models.

Core claim

An LLM's schema, its knowledge activation and organization, depends heavily on the structural and symbolic sophistication of the language used to represent a given task. Shaping schemas through advanced language representation therefore constitutes the next frontier for expanding LLM intelligence, as shown by empirical practices and controlled experiments where performance and feature activations vary with different representations of the same task.

What carries the argument

The schema: the LLM's knowledge activation and organization, shaped by the language representation, i.e., the linguistic and symbolic constructs that map the real world.

If this is right

  • Deliberate design of language representations can yield substantial performance gains on complex problems without modifying model parameters or increasing scale.
  • LLM internal feature activations change in response to the symbolic structure of the input language for an identical task.
  • Future research should emphasize language representation design as a primary direction alongside or instead of scaling.
  • The bottleneck of natural language's limited expressivity for problem-solving can be addressed through custom symbolic and structural enhancements.
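The premise is concrete enough to sketch. Below, one toy circuit problem is rendered in two of the representation styles the paper's experiments compare, natural language versus a netlist-like form; the circuit, wording, and format details are illustrative stand-ins, not the paper's actual stimuli.

```python
# Sketch: the same two-gate circuit rendered in two representation styles.
# The circuit and the exact formats are hypothetical examples, chosen only
# to show that the logical content stays fixed while the surface varies.

def natural_language(a, b, c):
    return (
        f"Gate G1 is an AND gate with inputs A={a} and B={b}. "
        f"Gate G2 is an OR gate whose inputs are the output of G1 and C={c}. "
        "What is the output of G2?"
    )

def netlist(a, b, c):
    return (
        f"INPUT A={a} B={b} C={c}\n"
        "G1 = AND(A, B)\n"
        "G2 = OR(G1, C)\n"
        "QUERY G2"
    )

def simulate(a, b, c):
    # Ground truth is representation-independent.
    return (a and b) or c

print(natural_language(1, 0, 1))
print(netlist(1, 0, 1))
print(simulate(1, 0, 1))  # 1: the underlying task never changed
```

The paper's claim is that an LLM prompted with these two strings may activate different internal schemas and reach different accuracies, even though `simulate` gives the same answer for both.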

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • One could explore whether automatically generated language representations optimized for schema shaping would outperform human-designed ones on specific domains.
  • The approach may connect to how formal languages in mathematics enable more precise reasoning, suggesting similar benefits for LLMs in technical fields.
  • If schema shaping proves robust, it could reduce reliance on massive training datasets by improving how existing knowledge is accessed and structured.

Load-bearing premise

Performance differences across language representations of the same task arise specifically from schema shaping rather than tokenization, attention patterns, or surface-level prompting effects.

What would settle it

Finding that different language representations of the same task lead to identical internal feature activations and task performance levels, after matching for token count and basic structure, would falsify the claim that representation shapes schema in a distinct and measurable way.
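The test is mechanical enough to sketch. Here is a minimal version of the between-format separation measurement on synthetic activations standing in for real hidden states; the shapes, the offset construction, and the variance decomposition are our assumptions, not the paper's code.

```python
import numpy as np

# If representation does not shape schema, pooled hidden states for the same
# problems should not separate by format. Synthetic activations stand in for
# real model states: identical per-problem "content" plus a format-specific
# offset, so separation is built in here by construction.

rng = np.random.default_rng(0)
n_problems, dim = 50, 16

content = rng.normal(size=(n_problems, dim))          # shared task content
fmt_offsets = rng.normal(size=(2, dim)) * 3.0         # two surface formats
acts = np.concatenate([content + fmt_offsets[0], content + fmt_offsets[1]])
labels = np.array([0] * n_problems + [1] * n_problems)

def between_format_variance_share(x, y):
    """Fraction of total variance explained by format membership."""
    grand = x.mean(axis=0)
    total = ((x - grand) ** 2).sum()
    between = sum(
        (y == g).sum() * ((x[y == g].mean(axis=0) - grand) ** 2).sum()
        for g in np.unique(y)
    )
    return between / total

share = between_format_variance_share(acts, labels)
print(f"between-format variance share: {share:.2f}")
```

A share near 0 after matching content and length would count against schema shaping; the paper reports high separation on real activations (e.g., 96.8% between-format variance in Figure 6).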

Figures

Figures reproduced from arXiv: 2605.09271 by Jingwen Fu, Masashi Sugiyama, Nanning Zheng, Pei Fu, Bo Han, Yuhan Liu, Zhiqin Yang.

Figure 1. Language representation as a frontier for LLM intelligence. Natural language encodes only a fraction of world information (Left). We organize representations along an axis of increasing design sophistication, from natural-language baselines (Level 0) through ambiguity elimination (Level 1) and logical constraints (Level 2) to scientific formalization and world modeling (Level 3). Each level induces progres… view at source ↗
Figure 2. Performance gains and capability expansion. view at source ↗
Figure 2. Example of grammar prompting for a calendar DSL. We interleave the minimal specialized grammar G[y(i)] between the demonstrations x(i) and y(i). During decoding, the LLM first predicts the specialized grammar G̃, and then predicts the program ỹ conditioned on G̃. The blue portion is not part of the actual prompt and only shown for illustrative purposes. view at source ↗
Figure 4. Conceptual overview of shaping schema via language representation design. We posit that the intelligence of Large Language Models (LLMs) can be expanded not just by scaling, but by designing language representations (L) that deliberately induce optimal internal schemas. As conceptualized in our position, a complex task, such as a 48-hour multi-city schedule constrained by punctuality, low-carbon preferences… view at source ↗
Figure 5. Multi-dimensional evaluation of language representation formats across internal dynamics… view at source ↗
Figure 6. t-SNE of last-layer hidden states (mean pooling), colored by representation type, layer 64 of Qwen3-32B. Each point is a single circuit problem; colors denote the 15 surface representations. Despite the underlying logical content being identical across formats, the model's final-layer states form sharply disjoint clusters, one per representation. The silhouette score of 0.93 and the 96.8% between-format va… view at source ↗
Figure 7. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 8. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 9. Visualization of attention weights in Layer 6 (early). Rows correspond to distinct language… view at source ↗
Figure 10. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 11. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 12. Visualization of attention weights in Layer 24 (middle). Rows correspond to distinct… view at source ↗
Figure 13. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 14. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 15. Visualization of attention weights in Layer 48 (late). Rows correspond to distinct language… view at source ↗
Figure 16. An example of the natural language representation for the logic circuit simulation task. view at source ↗
Figure 17. An example of the netlist language representation for the logic circuit simulation task. view at source ↗
Figure 18. An example of the graph adjacency notation representation for the logic circuit simulation task. view at source ↗
Figure 19. An example of the matrix representation for the logic circuit simulation task. view at source ↗
Figure 20. An example of the Lisp tree representation for the logic circuit simulation task. view at source ↗
Figure 21. An example of the dataflow language representation for the logic circuit simulation task. view at source ↗
Figure 22. An example of the partial truth table representation for the logic circuit simulation task. view at source ↗
Figure 23. An example of the compact gate notation representation for the logic circuit simulation task. view at source ↗
Figure 24. An example of the reverse Polish notation representation for the logic circuit simulation task. view at source ↗
Figure 25. An example of the dependency chain language representation for the logic circuit simulation task. view at source ↗
Figure 26. An example of the layered execution plan representation for the logic circuit simulation task. view at source ↗
Figure 27. An example of the signal propagation trace representation for the logic circuit simulation task. view at source ↗
Figure 28. An example of the constraint satisfaction format representation for the logic circuit simulation task. view at source ↗
Figure 29. An example of the canonical Boolean expression representation for the logic circuit simulation task. view at source ↗
Figure 30. An example of the Petri net notation representation for the logic circuit simulation task. view at source ↗
read the original abstract

Although natural language is the default medium for Large Language Models (LLMs), its limited expressive capacity creates a profound bottleneck for complex problem-solving. While recent advancements in AI have relied heavily on scaling, merely internalizing knowledge does not guarantee its effective application. Defining language representation as the linguistic and symbolic constructs used to map and model the real world, this paper argues that shaping schemas through advanced language representation is the next frontier for expanding LLM intelligence. We posit that an LLM's knowledge activation and organization -- its schema -- depends heavily on the structural and symbolic sophistication of the language used to represent a given task. This paper contributes both a formalization of this claim and the empirical evidence to support it. With a new formalization, we present multiple lines of evidence to support our position: Firstly, we review recent empirical practices and emerging methodologies that demonstrate the substantial performance gains achievable through deliberate language representation design, even without modifying model parameters or scale. Secondly, we conduct controlled experiments showing that LLM performance and its internal feature activations vary under different language representations of the same underlying task. Together, these findings highlight language representation design as a promising direction for future research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript argues that an LLM's schema—its knowledge activation and organization—depends on the structural and symbolic sophistication of the language representation used to encode tasks. It positions deliberate language representation design as the next frontier for LLM intelligence beyond scaling, supported by a new formalization, a review of empirical practices showing performance gains without parameter changes, and controlled experiments demonstrating that performance and internal feature activations vary across different language representations of the same underlying task.

Significance. If the central claim is substantiated with evidence that isolates representation effects from surface-level confounds, the work could usefully redirect attention toward input representation engineering as a scalable complement to model scaling. The review of practices and the formalization provide a conceptual starting point that might stimulate targeted follow-up studies on representation effects.

major comments (2)
  1. [Abstract and Experiments] Abstract / controlled experiments description: the claim that performance and internal activations vary under different language representations is presented as direct support for schema shaping, yet no quantitative results, statistical tests, sample sizes, or controls for tokenization, sequence length, or attention-pattern changes are reported. Without these, the attribution to schema (rather than surface encoding differences) cannot be evaluated.
  2. [Formalization] Formalization: schema is defined as depending on the structural sophistication of the language representation, rendering the posited dependence largely definitional rather than independently testable. An operationalization of schema that is representation-independent would be required to support the causal claim.
minor comments (1)
  1. [Title] The title ends abruptly with 'Expanding' and would benefit from grammatical revision for clarity.
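One of the controls major comment 1 asks for, matching representations on token count before comparing performance, can be sketched as a simple filter. Whitespace tokenization, the tolerance value, and the example strings are stand-ins, not the paper's tokenizer or stimuli.

```python
# Sketch: keep only problems whose two renderings have roughly matched
# token counts, so sequence length cannot explain a performance gap.
# Whitespace tokenization stands in for a real model tokenizer.

def token_count(text):
    return len(text.split())

def matched_pairs(problems_a, problems_b, tolerance=0.5):
    """Return indices i where the two renderings of problem i are within
    `tolerance` relative length of each other. The 50% default is loose
    and purely illustrative; a real control would be far stricter."""
    kept = []
    for i, (a, b) in enumerate(zip(problems_a, problems_b)):
        na, nb = token_count(a), token_count(b)
        if abs(na - nb) <= tolerance * max(na, nb):
            kept.append(i)
    return kept

nl = ["Gate G1 is an AND gate with inputs A and B .",
      "Gate G2 is an OR gate fed by G1 and C ."]
netlist = ["G1 = AND ( A , B )",
           "G2 = OR ( G1 , C )"]
print(matched_pairs(nl, netlist))
```

Tightening `tolerance` shrinks the comparison set; the referee's point is that without some such control, length and tokenization remain live alternative explanations.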

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive review of our manuscript. Their comments highlight important areas for clarification and strengthening, particularly regarding the presentation of experimental evidence and the formalization. We address each point below.

read point-by-point responses
  1. Referee: [Abstract and Experiments] Abstract / controlled experiments description: the claim that performance and internal activations vary under different language representations is presented as direct support for schema shaping, yet no quantitative results, statistical tests, sample sizes, or controls for tokenization, sequence length, or attention-pattern changes are reported. Without these, the attribution to schema (rather than surface encoding differences) cannot be evaluated.

    Authors: We agree that the abstract, being a concise summary, does not include specific quantitative results or statistical details. The controlled experiments in the full manuscript demonstrate variations in performance and internal activations across language representations of the same task, but we acknowledge the need for more rigorous reporting. We will revise the abstract to summarize key quantitative findings and expand the experiments section to include sample sizes, statistical tests, and explicit controls for tokenization, sequence length, and attention patterns to better isolate schema effects from surface-level differences. revision: yes

  2. Referee: [Formalization] Formalization: schema is defined as depending on the structural sophistication of the language representation, rendering the posited dependence largely definitional rather than independently testable. An operationalization of schema that is representation-independent would be required to support the causal claim.

    Authors: The formalization is intended to provide a structured way to analyze how language representations influence schema activation, building on established cognitive concepts. While it links the two, the causal claim is supported by the empirical component where we hold the underlying task constant and vary only the representation, observing differences in outcomes and activations. This provides a test independent of the definition. That said, we appreciate the suggestion for a more explicit representation-independent operationalization of schema and will add a subsection discussing potential measures, such as using probing techniques or activation similarity metrics that can be applied uniformly across different representations. revision: partial
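One activation-similarity metric of the kind the rebuttal gestures at is linear centered kernel alignment (CKA), a standard measure that can be applied uniformly across representations because it is invariant to rotations and scaling of the feature basis. Applying it to activations of the same problems under different language representations is our reading of the proposal, not code from the paper; the data here is synthetic.

```python
import numpy as np

def linear_cka(x, y):
    """Linear centered kernel alignment between two activation matrices.

    x, y: (n_examples, dim) activations for the same examples under two
    language representations; the feature dimensions may differ.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return num / den

rng = np.random.default_rng(1)
acts_a = rng.normal(size=(40, 8))          # synthetic activations, format A
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
acts_b = acts_a @ q                        # same content, rotated basis
acts_c = rng.normal(size=(40, 8))          # unrelated content

print(linear_cka(acts_a, acts_a))  # 1.0 by construction
print(linear_cka(acts_a, acts_b))  # stays 1.0: CKA ignores the basis change
print(linear_cka(acts_a, acts_c))  # much lower: genuinely different content
```

Because CKA is basis-invariant, a high score across representations would indicate shared underlying schema despite surface differences, which is exactly the representation-independent operationalization the referee requests.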

Circularity Check

0 steps flagged

No significant circularity; central claim supported by independent empirical review and experiments

full rationale

The paper posits a dependence between schema (defined as knowledge activation and organization) and language representation sophistication, then supports the position via two external lines of evidence: a review of recent empirical practices showing performance gains from representation design without parameter changes, and controlled experiments documenting variations in performance and feature activations for the same task under different representations. No equations, formalization steps, or predictions are shown to reduce by construction to the initial definition or inputs. No self-citations are invoked as load-bearing uniqueness theorems, and no fitted parameters are relabeled as predictions. The derivation chain remains self-contained against the provided evidence sources.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the untested premise that language representation is the primary bottleneck and that schema is a useful, language-dependent construct.

axioms (2)
  • domain assumption Natural language has limited expressive capacity that creates a bottleneck for complex problem-solving in LLMs
    Opening premise of the abstract.
  • ad hoc to paper LLM knowledge activation and organization (schema) is determined by the structural sophistication of the input language representation
    Load-bearing claim being advanced.
invented entities (1)
  • schema no independent evidence
    purpose: To denote the LLM's internal knowledge activation and organization structure
    Introduced as the key mediating concept between language representation and performance.

pith-pipeline@v0.9.0 · 5514 in / 1263 out tokens · 65584 ms · 2026-05-12T04:56:33.431386+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

140 extracted references · 140 canonical work pages · 11 internal anchors

  1. [1]

    Tractatus logico-philosophicus

    Ludwig Wittgenstein. Tractatus logico-philosophicus. 1922

  2. [2]

    Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners.OpenAI blog, 1(8):9, 2019

  3. [3]

    Language models are few-shot learners.Advances in neural information processing systems, 33:1877– 1901, 2020

    Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhari- wal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners.Advances in neural information processing systems, 33:1877– 1901, 2020

  4. [4]

    OpenAI o1 System Card

    Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024

  5. [5]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

  6. [6]

    Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, et al. Kimi k1. 5: Scaling reinforcement learning with llms.arXiv preprint arXiv:2501.12599, 2025

  7. [7]

    Emergent abilities of large language models.Transactions on Machine Learning Research, 2022

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models.Transactions on Machine Learning Research, 2022

  8. [8]

    Training compute-optimal large language models

    Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. InProceedings of the 36th International Conference on Neural Information Processing Systems, pages 30016–30030, 2022

  9. [9]

    Fabio Petroni, Tim Rockt¨aschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. Language models as knowledge bases? InProceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 2463–2473, 2019

  10. [10]

    Adam Roberts, Colin Raffel, and Noam Shazeer. How much knowledge can you pack into the parameters of a language model? InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5418–5426, 2020

  11. [11]

    React: Synergizing reasoning and acting in language models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. InThe eleventh international conference on learning representations, 2022

  12. [12]

    Neuroscience-inspired artificial intelligence.Neuron, 95(2):245–258, 2017

    Demis Hassabis, Dharshan Kumaran, Christopher Summerfield, and Matthew Botvinick. Neuroscience-inspired artificial intelligence.Neuron, 95(2):245–258, 2017

  13. [13]

    When brain-inspired ai meets agi.Meta-Radiology, 1(1):100005, 2023

    Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, et al. When brain-inspired ai meets agi.Meta-Radiology, 1(1):100005, 2023

  14. [14]

    Debates on the nature of artificial general intelligence, 2024

    Melanie Mitchell. Debates on the nature of artificial general intelligence, 2024

  15. [15]

    Remembering: A study in experimental and social psychology

    Frederic C Bartlett. Remembering: A study in experimental and social psychology. 1932

  16. [16]

    Thinking: An experimental and social study

    Frederic Charles Bartlett. Thinking: An experimental and social study. 1958

  17. [17]

    Prentice Hall, 1993

    Gail E Tompkins and Lea M McGee.Teaching reading with literature: Case studies to action plans. Prentice Hall, 1993

  18. [18]

    Yohan J. John. The power of scale in machine learning. https://kempnerinstitute. harvard.edu/news/the-power-of-scale-in-machine-learning/ , Aug 2025. Kemp- ner Institute at Harvard University. 10

  19. [19]

    Ilya sutskever: We’re moving from the age of scaling to the age of research

    Ilya Sutskever and Dwarkesh Patel. Ilya sutskever: We’re moving from the age of scaling to the age of research. The Dwarkesh Podcast, nov 2025. Published on November 25, 2025

  20. [20]

    Gorle, Maahe Zehra Kazmi, Ayesha Mohsin, Muhammad Usman Rafique, Zihao He, Pulkit Mehta, Muham- mad Ali Jamshed, and John M

    Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zeeshan Memon, Muham- mad Ibtsaam Qadir, Sagnik Bhattacharya, Hassan Rizwan, Abhiram R Gorle, Maahe Zehra Kazmi, Ayesha Mohsin, et al. On the fundamental limits of llms at scale.arXiv preprint arXiv:2511.12869, 2025

  21. [21]

    Using machine learning to simultaneously quantify multiple cognitive components of episodic memory.Nature Communications, 16(1):2856, 2025

    Soroush Mirjalili and Audrey Duarte. Using machine learning to simultaneously quantify multiple cognitive components of episodic memory.Nature Communications, 16(1):2856, 2025

  22. [22]

    Prefrontal connectomics: from anatomy to human imaging.Neuropsychopharmacology, 47(1):20–40, 2022

    Suzanne N Haber, Hesheng Liu, Jakob Seidlitz, and Ed Bullmore. Prefrontal connectomics: from anatomy to human imaging.Neuropsychopharmacology, 47(1):20–40, 2022

  23. [23]

    Schemata: The building blocks of cognition

    David E Rumelhart. Schemata: The building blocks of cognition. InTheoretical issues in reading comprehension, pages 33–58. Routledge, 2017

  24. [24]

    Schema for in-context learning.arXiv preprint arXiv:2510.13905, 2025

    Pan Chen, Shaohong Chen, Mark Wang, Shi Xuan Leong, Priscilla Fung, Varinia Bernales, and Alan Aspuru-Guzik. Schema for in-context learning.arXiv preprint arXiv:2510.13905, 2025

  25. [25]

    Schema theory

    Tricia Smith. Schema theory. https://www.ebsco.com/research-starters/ psychology/schema-theory, 2021

  26. [26]

    Semantic encoding during language comprehension at single-cell resolution.Nature, 631(8021):610– 616, 2024

    Mohsen Jamali, Benjamin Grannan, Jing Cai, Arjun R Khanna, William Mu˜noz, Irene Caprara, Angelique C Paulk, Sydney S Cash, Evelina Fedorenko, and Ziv M Williams. Semantic encoding during language comprehension at single-cell resolution.Nature, 631(8021):610– 616, 2024

  27. [27]

    Language, thought, and reality: selected writings of

    Benjamin Lee Whorf. Language, thought, and reality: selected writings of. . . .(edited by john b. carroll.). 1956

  28. [28]

    Linguistic relativity.Annual review of anthropology, 26(1):291–312, 1997

    John A Lucy. Linguistic relativity.Annual review of anthropology, 26(1):291–312, 1997

  29. [29]

    Experience grounds language

    Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, et al. Experience grounds language. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8718–8735, 2020

  30. [30]

    Linguistic skill and stimulus-driven attention: A case for linguistic relativity.Frontiers in Psychology, 13:875744, 2022

    Ulrich Ansorge, Diane Baier, and Soonja Choi. Linguistic skill and stimulus-driven attention: A case for linguistic relativity.Frontiers in Psychology, 13:875744, 2022

  31. [31]

    Meaning without reference in large language models

    Steven Piantadosi and Felix Hill. Meaning without reference in large language models. In NeurIPS 2022 Workshop on Neuro Causal and Symbolic AI (nCSI), 2022

  32. [32]

    Language and causation: A discursive action model of description and attribution.Psychological review, 100(1):23, 1993

    Derek Edwards and Jonathan Potter. Language and causation: A discursive action model of description and attribution.Psychological review, 100(1):23, 1993

  33. [33]

    English and spanish speakers remember causal agents differently

    Caitlin M Fausey and Lera Boroditsky. English and spanish speakers remember causal agents differently. InProceedings of the Annual Meeting of the Cognitive Science Society, volume 30, 2008

  34. [34]

    MIT Press, 2000

    Leonard Talmy.Toward a cognitive semantics: Concept structuring systems, volume 1. MIT Press, 2000

  35. [35]

    Does language shape thought?: Mandarin and english speakers’ conceptions of time.Cognitive psychology, 43(1):1–22, 2001

    Lera Boroditsky. Does language shape thought?: Mandarin and english speakers’ conceptions of time.Cognitive psychology, 43(1):1–22, 2001

  36. [36]

    Wilhelm von Humboldt.From ‘thought and language’ to ‘thinking for speaking’.Cambridge University Press, 1996

  37. [37]

    How language shapes thought.Scientific American, 304(2):62–65, 2011

    Lera Boroditsky. How language shapes thought.Scientific American, 304(2):62–65, 2011. 11

  38. [38]

    Building machines that learn and think like people.Behavioral and brain sciences, 40:e253, 2017

    Brenden M Lake, Tomer D Ullman, Joshua B Tenenbaum, and Samuel J Gershman. Building machines that learn and think like people.Behavioral and brain sciences, 40:e253, 2017

  39. [39]

    Using cognitive psychology to understand gpt-3.Proceedings of the National Academy of Sciences, 120(6):e2218523120, 2023

    Marcel Binz and Eric Schulz. Using cognitive psychology to understand gpt-3.Proceedings of the National Academy of Sciences, 120(6):e2218523120, 2023

  40. [40]

    Circuit tracing: Revealing computational graphs in language models.Transformer Circuits Thread, 6, 2025

    Emmanuel Ameisen, Jack Lindsey, Adam Pearce, Wes Gurnee, Nicholas L Turner, Brian Chen, Craig Citro, David Abrahams, Shan Carter, Basil Hosmer, et al. Circuit tracing: Revealing computational graphs in language models.Transformer Circuits Thread, 6, 2025

  41. [41]

    Semantic structure in large language model embeddings.arXiv preprint arXiv:2508.10003,

    Austin C Kozlowski, Callin Dai, and Andrei Boutyline. Semantic structure in large language model embeddings.arXiv preprint arXiv:2508.10003, 2025

  42. [42]

    Under the shadow of babel: How language shapes reasoning in llms

    Chenxi Wang, Yixuan Zhang, Lang Gao, Zixiang Xu, Zirui Song, Yanbo Wang, and Xiuying Chen. Under the shadow of babel: How language shapes reasoning in llms. InFindings of the Association for Computational Linguistics: EMNLP 2025, pages 24327–24344, 2025

  43. [43]

    A survey on in-context learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, et al. A survey on in-context learning. InProceedings of the 2024 conference on empirical methods in natural language processing, pages 1107–1128, 2024

  44. [44]

    Decoding in-context learning: Neuroscience-inspired analysis of representations in large language models.arXiv preprint arXiv:2310.00313, 2023

    Safoora Yousefi, Leo Betthauser, Hosein Hasanbeig, Rapha¨el Milli`ere, and Ida Momennejad. Decoding in-context learning: Neuroscience-inspired analysis of representations in large language models.arXiv preprint arXiv:2310.00313, 2023

  45. [45]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022

  46. [46]

    Towards understanding chain-of-thought prompting: An empirical study of what matters

    Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, and Huan Sun. Towards understanding chain-of-thought prompting: An empirical study of what matters. InProceedings of the 61st annual meeting of the association for computational linguistics (volume 1: Long papers), pages 2717–2739, 2023

  47. [47]

    Schema and natural language aware in-context learning for improved graphql query generation

    Nitin Gupta, Manish Kesarwani, Sambit Ghosh, Sameep Mehta, Carlos Eberhardt, and Dan Debrunner. Schema and natural language aware in-context learning for improved graphql query generation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Ind...

  48. [48]

    Infusing prompts with syntax and semantics

    Anton Bulle Labate and Fabio Gagliardi Cozman. Infusing prompts with syntax and semantics. arXiv preprint arXiv:2412.06107, 2024

  49. [49]

    Schema-learning and rebinding as mechanisms of in-context learning and emergence

    Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lazaro-Gredilla, and Dileep George. Schema-learning and rebinding as mechanisms of in-context learning and emergence. Advances in neural information processing systems, 36:28785–28804, 2023

  50. [50]

    Improving rule-based reasoning in LLMs using neurosymbolic representations

    Varun Dhanraj and Chris Eliasmith. Improving rule-based reasoning in LLMs using neurosymbolic representations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 30577–30596, 2025

  51. [51]

    On the worst prompt performance of large language models

    Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, and Wai Lam. On the worst prompt performance of large language models. Advances in Neural Information Processing Systems, 37:69022–69042, 2024

  52. [52]

    Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts

    Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Gong, et al. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts. In Proceedings of the 1st ACM workshop on large AI systems and models with privacy and safety analysis, pages 57–68, 2023

  53. [53]

    The communicative function of ambiguity in language

    Steven T Piantadosi, Harry Tily, and Edward Gibson. The communicative function of ambiguity in language. Cognition, 122(3):280–291, 2012

  54. [54]

    Climbing towards NLU: On meaning, form, and understanding in the age of data

    Emily M Bender and Alexander Koller. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th annual meeting of the association for computational linguistics, pages 5185–5198, 2020

  55. [55]

    Emergent Abilities of Large Language Models

    Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022

  56. [56]

    Prompt programming for large language models: Beyond the few-shot paradigm

    Laria Reynolds and Kyle McDonell. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended abstracts of the 2021 CHI conference on human factors in computing systems, pages 1–7, 2021

  57. [57]

    Grammar prompting for domain-specific language generation with large language models

    Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A Saurous, and Yoon Kim. Grammar prompting for domain-specific language generation with large language models. Advances in Neural Information Processing Systems, 36:65030–65055, 2023

  58. [58]

    Constrained language models yield few-shot semantic parsers

    Richard Shin, Christopher Lin, Sam Thomson, Charles Chen Jr, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, and Benjamin Van Durme. Constrained language models yield few-shot semantic parsers. In Proceedings of the 2021 conference on empirical methods in natural language processing, pages 7699–7715, 2021

  59. [59]

    Grammar-aligned decoding

    Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni. Grammar-aligned decoding. Advances in Neural Information Processing Systems, 37:24547–24568, 2024

  60. [60]

    Improving parallel program performance through dsl-driven code generation with llm optimizers

    Anjiang Wei, Allen Nie, Thiago SFX Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, and Alex Aiken. Improving parallel program performance through dsl-driven code generation with llm optimizers. arXiv e-prints, pages arXiv–2410, 2024

  61. [61]

    Making pre-trained language models better few-shot learners

    Tianyu Gao, Adam Fisch, and Danqi Chen. Making pre-trained language models better few-shot learners. In Proceedings of the 59th annual meeting of the association for computational linguistics and the 11th international joint conference on natural language processing (volume 1: long papers), pages 3816–3830, 2021

  62. [62]

    True few-shot learning with language models

    Ethan Perez, Douwe Kiela, and Kyunghyun Cho. True few-shot learning with language models. Advances in neural information processing systems, 34:11054–11070, 2021

  63. [63]

    Stress test evaluation for natural language inference

    Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. Stress test evaluation for natural language inference. arXiv preprint arXiv:1806.00692, 2018

  64. [64]

    The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance

    Abel Salinas and Fred Morstatter. The butterfly effect of altering prompts: How small changes and jailbreaks affect large language model performance. arXiv preprint arXiv:2401.03729, 2024

  65. [65]

    Solving formal math problems by decomposition and iterative reflection

    Yichi Zhou, Jianqiu Zhao, Yongxin Zhang, Bohan Wang, Siran Wang, Luoxin Chen, Jiahui Wang, Haowei Chen, Allan Jie, Xinbo Zhang, et al. Solving formal math problems by decomposition and iterative reflection. arXiv preprint arXiv:2507.15225, 2025

  66. [66]

    Generating structured outputs from language models: Benchmark and studies

    Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, and Harsha Nori. Generating structured outputs from language models: Benchmark and studies. arXiv e-prints, pages arXiv–2501, 2025

  67. [67]

    Combining tsl and llm to automate rest api testing: A comparative study

    Thiago Barradas, Aline Paes, and Vânia de Oliveira Neves. Combining tsl and llm to automate rest api testing: A comparative study. arXiv preprint arXiv:2509.05540, 2025

  68. [68]

    Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models

    Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Sunghwan Mac Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, et al. Language models as compilers: Simulating pseudocode execution improves algorithmic reasoning in language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processi...

  69. [69]

    Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning

    Liangming Pan, Alon Albalak, Xinyi Wang, and William Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3806–3824, 2023

  70. [70]

    Vipergpt: Visual inference via python execution for reasoning

    Dídac Surís, Sachit Menon, and Carl Vondrick. Vipergpt: Visual inference via python execution for reasoning. In Proceedings of the IEEE/CVF international conference on computer vision, pages 11888–11898, 2023

  71. [71]

    Thinking with blueprints: Assisting vision-language models in spatial reasoning via structured object representation

    Weijian Ma, Shizhao Sun, Tianyu Yu, Ruiyu Wang, Tat-Seng Chua, and Jiang Bian. Thinking with blueprints: Assisting vision-language models in spatial reasoning via structured object representation. arXiv preprint arXiv:2601.01984, 2026

  72. [72]

    Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought

    Keshav Ramji, Tahira Naseem, and Ramón Fernandez Astudillo. Thinking without words: Efficient latent reasoning with abstract chain-of-thought. arXiv preprint arXiv:2604.22709, 2026

  73. [73]

    The language instinct

    Steven Pinker. The language instinct. 1994/2007

  74. [74]

    AgentSPEX: An Agent SPecification and EXecution Language

    Pengcheng Wang, Jerry Huang, Jiarui Yao, Rui Pan, Peizhi Niu, Yaowenqi Liu, Ruida Wang, Renhao Lu, Yuwei Guo, and Tong Zhang. Agentspex: An agent specification and execution language. arXiv preprint arXiv:2604.13346, 2026

  75. [75]

    Constituent-constrained word prediction during language comprehension

    Jiajie Zou, David Poeppel, and Nai Ding. Constituent-constrained word prediction during language comprehension. Nature Neuroscience, pages 1–12, 2026

  76. [76]

    Faithful logical reasoning via symbolic chain-of-thought

    Jundong Xu, Hao Fei, Liangming Pan, Qian Liu, Mong-Li Lee, and Wynne Hsu. Faithful logical reasoning via symbolic chain-of-thought. arXiv preprint arXiv:2405.18357, 2024

  77. [77]

    Geometrically-constrained agent for spatial reasoning

    Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, and Lu Sheng. Geometrically-constrained agent for spatial reasoning. arXiv preprint arXiv:2511.22659, 2025

  78. [78]

    Automated Planning: theory and practice

    Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: theory and practice. Elsevier, 2004

  79. [79]

    LLM+AL: Bridging large language models and action languages for complex reasoning about actions

    Adam Ishay and Joohyung Lee. LLM+AL: Bridging large language models and action languages for complex reasoning about actions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 24212–24220, 2025

  80. [80]

    Generating consistent pddl domains with large language models

    Pavel Smirnov, Frank Joublin, Antonello Ceravola, and Michael Gienger. Generating consistent pddl domains with large language models. arXiv preprint arXiv:2404.07751, 2024

Showing first 80 references.