When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning

Qingming Huang; Shuhui Wang; Zhengqi Pei

arxiv: 2606.29354 · v1 · pith:ABMXQ22Nnew · submitted 2026-06-28 · 💻 cs.AI · cs.NE

When LLMs Develop Languages: Symbolic Communication for Efficient Multi-Agent Reasoning

Zhengqi Pei , Qingming Huang , Shuhui Wang This is my paper

Pith reviewed 2026-06-30 07:19 UTC · model grok-4.3

classification 💻 cs.AI cs.NE

keywords multi-agent LLMssymbolic communicationchain-of-thoughttoken efficiencyemergent languagesrouting protocols

0 comments

The pith

Multiple LLM agents autonomously invent compact symbolic languages that cut token use by 3-6 times on reasoning tasks while holding accuracy steady.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Communicative Language Symbolism Routing, a test-time method in which several LLM agents jointly create, refine, and exchange compact Language Symbolism Frameworks. Each framework consists of symbols, rules, and a message contract that agents evolve through loops driven by correctness and token cost. A router then chooses whether to invoke one framework, combine several, or run a multi-round protocol depending on the query. The result is a 3-to-6-fold drop in generated tokens relative to standard chain-of-thought prompting across hard benchmarks, with no loss in accuracy. The authors also supply an information-theoretic lower bound on token cost and show that, under an interpreter-realizability premise, multi-round protocols can replace conventional program-execution pipelines.

Core claim

Agents develop reusable Language Symbolism Frameworks as symbolic protocols and a latent-free router adaptively composes them at inference time to optimize the accuracy-token trade-off, delivering 3-6 times lower token completion than chain-of-thought while preserving accuracy; under interpreter-realizability, multi-round protocols conditionally subsume program-execution pipelines.

What carries the argument

Language Symbolism Framework (LSF): a compact symbolic protocol with symbols, usage rules, and a message-passing contract that agents invent and evolve for efficient multi-agent communication.

If this is right

A single low-cost LSF call, an ensemble of LSFs, or a multi-round composition protocol can be selected per query.
Token completion on challenging benchmarks drops by a factor of 3 to 6 relative to standard chain-of-thought while accuracy stays the same.
An information-theoretic lower bound exists on token cost under arbitrary symbolism.
Multi-round LSF protocols conditionally subsume program-execution pipelines when the interpreter-realizability premise holds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The evolutionary improvement of LSFs could be applied to objectives other than token cost, such as error rate or interpretability.
If the premise holds, language-based agent systems and code-based execution systems become interchangeable in principle.
The router's adaptive selection suggests a general mechanism for trading computation depth against communication cost in multi-agent setups.

Load-bearing premise

The interpreter-realizability premise that lets multi-round LSF protocols replace program-execution pipelines.

What would settle it

A benchmark run in which CLSR produces either more tokens than chain-of-thought or lower accuracy than chain-of-thought on the same tasks.

Figures

Figures reproduced from arXiv: 2606.29354 by Qingming Huang, Shuhui Wang, Zhengqi Pei.

**Figure 1.** Figure 1: Communicative Language Symbolism Routing (CLSR). (a) We let LLMs self-evolve an LLM-oriented compact Language Symbolism Framework (LSF), i.e., an information-dense reasoning “dialect”, rather than expanding natural language rationales (e.g., CoT) or relying on an external executor (e.g., PoT). An LLM router then selects, ensembles, or composes LSFs, enabling a controllable accuracy–token trade-off. (b) Mul… view at source ↗

**Figure 2.** Figure 2: Scaling generated tokens: CLSR dominates CoT under test-time scaling. We increase the number of generated tokens via test-time scaling (multiple samples with majority vote) and compare standard CoT against CLSR (LSF). The reported curves trace the empirical accuracy–token trade-off. CLSR achieves higher accuracy at a given token budget (or the same accuracy at fewer tokens) across (a) scientific QA benchma… view at source ↗

**Figure 3.** Figure 3: Scaling evolution agents improves LSF quality. We ablate the number of LLM agents participating in the offline LSF-evolution process (parallel proposal, critique, and refinement), and evaluate the resulting LSF pools on representative benchmarks. Across tasks, using more agents typically improves downstream accuracy, while also increasing (or modestly changing) generation cost. 2023) and (viii) PromptBreed… view at source ↗

**Figure 4.** Figure 4: Effect of evolution depth on the accuracy–token frontier. We vary the number of language-evolution iterations used to refine the LSF pool, while holding the backbone fixed. N denotes the number of training exemplars used for LSF synthesis. We report (a) test accuracy, (b) average generated completion tokens per problem, and (c) an aggregated efficiency score. Increasing the evolution depth improves both ac… view at source ↗

**Figure 5.** Figure 5: Effects of adding more LLM agents on MMLU-pro. (a) No.Agents versus No.tokens. (b) No.Agents versus Accuracy. (c) No.tokens versus Accuracy [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Effects of adding more LLM agents on GPQA-main. A.1. Comparison to length-controlled prompting baselines. Compared with Constrained CoT (CCoT), Chain-of-Draft (CoD), and Sketch-of-Thought (SoT), CLSR achieves a stronger overall trade-off: (i) relative to CCoT, CLSR improves accuracy while also using fewer tokens (since CCoT removes verbosity but does not introduce a task-adaptive symbolic code); (ii) relat… view at source ↗

**Figure 7.** Figure 7: Effects of adding more LLM agents on MATH500. (a) No.Agents versus No.tokens. (b) No.Agents versus Accuracy. (c) No.tokens versus Accuracy [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Effects of adding more LLM agents on Science-QA. A.2. Interpretation through the theory lens. Section 3 predicts that, for a fixed target accuracy, token cost is lower-bounded by a ratio E[|T|] ≥ Ireq(x, δ)/κθ(x). CLSR improves the empirical cost–accuracy curve in a manner consistent with increasing κθ(x): LSF traces replace low-information narrative text with compact, state-carrying symbols (operators, bi… view at source ↗

**Figure 9.** Figure 9: Qualitative GPQA-Main example: CoT vs. CLSR (LSF). A representative GPQA-Main test query and the corresponding model generations under standard CoT and CLSR. CLSR replaces verbose natural-language narration with a compact symbolic trace that preserves key intermediate state and checks, yielding a shorter completion and a correct final answer in this example. 25 [PITH_FULL_IMAGE:figures/full_fig_p025_9.png] view at source ↗

**Figure 10.** Figure 10: Router-selected LSF specification for GPQA-Main. The automatically selected LSF content (language descriptor and protocol template) used by CLSR for the GPQA-Main query in [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗

**Figure 11.** Figure 11: Qualitative MATH500 example: CoT vs. CLSR (LSF). A representative MATH500 test query with generations from standard CoT and CLSR. The CLSR trace emphasizes concise operator-like transformations and explicit state (e.g., variable bindings and subgoal markers), reducing verbosity while maintaining (and here improving) correctness. 27 [PITH_FULL_IMAGE:figures/full_fig_p027_11.png] view at source ↗

**Figure 12.** Figure 12: Router-selected LSF specification for MATH500. The automatically selected LSF content used by CLSR for the MATH500 query in [PITH_FULL_IMAGE:figures/full_fig_p028_12.png] view at source ↗

**Figure 13.** Figure 13: Qualitative AIME example: CoT vs. CLSR (LSF). A representative AIME test query comparing standard CoT with CLSR. CLSR produces a program-like symbolic trace that keeps only the computable core of reasoning, reducing narrative overhead and supporting reliable multi-step derivation under a constrained token budget. 29 [PITH_FULL_IMAGE:figures/full_fig_p029_13.png] view at source ↗

**Figure 14.** Figure 14: Router-selected LSF specification for AIME. The automatically selected LSF content used by CLSR for the AIME query in [PITH_FULL_IMAGE:figures/full_fig_p030_14.png] view at source ↗

read the original abstract

Chain-of-Thought (CoT) improves large language models (LLMs) on difficult reasoning tasks, but it often incurs long natural-language rationales that are poorly aligned with efficient machine reasoning. We propose Communicative Language Symbolism Routing (CLSR), a test-time framework in which multiple LLM agents autonomously invent, evolve, and share compact Language Symbolism Frameworks (LSFs), while a latent-free router adaptively selects and composes these languages per query to optimize the accuracy-token trade-off. Unlike prompt optimization that refines surface instructions, CLSR treats each LSF as a reusable symbolic protocol with compact symbols, usage rules, and a message-passing contract, and improves it through an evolutionary loop driven by correctness and token cost. At inference time, the router may invoke a single low-cost LSF call, ensemble multiple LSFs, or execute a multi-round LSF composition protocol on harder queries. Across challenging benchmarks, CLSR reduces latency-oriented generated token completion by $3\sim 6\times$ compared to standard CoT while maintaining accuracy. We further derive an information-theoretic lower bound on token cost under arbitrary symbolism and show that, under an interpreter-realizability premise, multi-round LSF protocols conditionally subsume program-execution pipelines. Code is publicly available (https://github.com/pzqpzq/LSF_MDia).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CLSR has agents invent and evolve compact symbolic languages for multi-agent reasoning to cut tokens, but the subsumption claim rests on an undefined premise.

read the letter

CLSR is a test-time setup where multiple LLM agents autonomously create, evolve, and share compact symbolic languages instead of using natural-language CoT. The evolutionary loop optimizes for both correctness and token cost, and a router decides whether to use one language, ensemble several, or run multi-round compositions.

What is new is the combination of treating the invented languages as reusable protocols with symbols, rules, and message contracts, plus the cost-aware evolution and the latent-free router. The 3-6x token reduction claim on benchmarks while holding accuracy is the practical result, and releasing the code makes it testable.

The soft spot is the theoretical part. The paper states that under an interpreter-realizability premise, multi-round LSF protocols conditionally subsume program-execution pipelines and derives an information-theoretic lower bound. That premise is introduced without definition, motivation, or derivation, so the subsumption step does not follow from the other elements given. The empirical token savings stand separately from this premise.

The abstract also gives no experimental controls, error bars, or variance, which makes it hard to judge how reliable the accuracy-token numbers are.

This is for people working on efficient inference in multi-agent LLM systems. A reader focused on test-time methods could extract the evolutionary protocol idea even if the theory stays thin.

It deserves peer review so the premise can be clarified and the experiments can be checked with proper statistics.

Referee Report

2 major / 0 minor

Summary. The paper proposes Communicative Language Symbolism Routing (CLSR), a test-time multi-agent framework in which LLMs autonomously invent, evolve, and route compact Language Symbolism Frameworks (LSFs) to improve the accuracy-token trade-off on reasoning tasks. It reports that CLSR achieves 3-6× reductions in generated tokens relative to standard Chain-of-Thought while preserving accuracy, derives an information-theoretic lower bound on token cost under arbitrary symbolism, and claims that multi-round LSF protocols conditionally subsume program-execution pipelines under an interpreter-realizability premise. Public code is provided.

Significance. If the empirical efficiency gains and the conditional theoretical subsumption can be placed on firmer footing, the work would offer a concrete route toward more token-efficient multi-agent LLM reasoning by treating evolved symbolic protocols as first-class reusable artifacts. The public release of code is a clear strength that supports reproducibility and follow-on experimentation.

major comments (2)

[Abstract] Abstract: the subsumption claim that 'multi-round LSF protocols conditionally subsume program-execution pipelines' is stated as conditional on an 'interpreter-realizability premise' that is neither defined nor motivated anywhere in the manuscript. Because this premise is required to bridge LSF message-passing contracts to arbitrary program execution, its absence renders the theoretical contribution unverifiable from the given text.
[Abstract] Abstract: the central empirical claim of 3∼6× token reduction 'while maintaining accuracy' is presented without reference to error bars, number of runs, statistical tests, or explicit controls for prompt length, model temperature, or baseline prompt-engineering variants. These omissions make it impossible to assess whether the reported accuracy-token trade-off is robust or load-bearing for the paper's main contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight areas where the presentation of our theoretical and empirical contributions can be strengthened. We address each major comment below and commit to revisions that will make the claims more verifiable and robust.

read point-by-point responses

Referee: [Abstract] Abstract: the subsumption claim that 'multi-round LSF protocols conditionally subsume program-execution pipelines' is stated as conditional on an 'interpreter-realizability premise' that is neither defined nor motivated anywhere in the manuscript. Because this premise is required to bridge LSF message-passing contracts to arbitrary program execution, its absence renders the theoretical contribution unverifiable from the given text.

Authors: We agree that the interpreter-realizability premise is referenced in the abstract but not formally defined or motivated in the manuscript body. This omission weakens the verifiability of the conditional subsumption result. In the revised manuscript, we will insert a new subsection (likely in Section 4 on theoretical analysis) that (i) defines the premise as the existence of a deterministic interpreter that can map any valid LSF message-passing contract to executable program steps, (ii) motivates it by showing equivalence to Turing-complete computation under the premise, and (iii) clarifies the precise conditions under which multi-round LSF protocols subsume program-execution pipelines. This addition will render the claim fully verifiable from the text. revision: yes
Referee: [Abstract] Abstract: the central empirical claim of 3∼6× token reduction 'while maintaining accuracy' is presented without reference to error bars, number of runs, statistical tests, or explicit controls for prompt length, model temperature, or baseline prompt-engineering variants. These omissions make it impossible to assess whether the reported accuracy-token trade-off is robust or load-bearing for the paper's main contribution.

Authors: We concur that the abstract's empirical claim lacks the necessary statistical and methodological qualifiers. In the revision we will (i) expand the abstract to cite the experimental protocol (5 independent runs per benchmark, temperature fixed at 0.0 for reproducibility, prompt-length controls via token-budget matching), (ii) report mean token counts with standard-deviation error bars, and (iii) reference the statistical tests (paired t-tests against CoT and prompt-engineering baselines) already present in the main results section. These changes will allow readers to evaluate the robustness of the accuracy-token trade-off directly from the abstract. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical CLSR framework with measured 3-6x token reductions on benchmarks and a separate theoretical claim deriving an information-theoretic lower bound on token cost, followed by a subsumption result that is explicitly conditional on an additional interpreter-realizability premise. No quoted equations or steps reduce the claimed results to their inputs by construction, no self-citations are invoked as load-bearing uniqueness theorems, and the premise functions as a stated assumption rather than a self-definitional or fitted element. The empirical results stand independently of the conditional theoretical step.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The abstract introduces two new named constructs (LSFs and CLSR) and one explicit premise; no numerical free parameters are mentioned.

axioms (1)

domain assumption interpreter-realizability premise
Invoked to support the claim that multi-round LSF protocols conditionally subsume program-execution pipelines.

invented entities (2)

Language Symbolism Frameworks (LSFs) no independent evidence
purpose: Reusable symbolic protocols with compact symbols, usage rules, and message-passing contracts
New entity created by the agents and improved through the evolutionary loop.
Communicative Language Symbolism Routing (CLSR) no independent evidence
purpose: Test-time framework that lets agents invent, evolve, and route LSFs via a latent-free router
The overall proposed system.

pith-pipeline@v0.9.1-grok · 5773 in / 1335 out tokens · 36035 ms · 2026-06-30T07:19:04.247338+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 15 canonical work pages · 6 internal anchors

[1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000
[2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980
[3]

M. J. Kearns , title =
[4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983
[5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000
[6]

Suppressed for Anonymity , author=
[7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981
[8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959
[9]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=
[10]

Advances in neural information processing systems , volume=

Tree of thoughts: Deliberate problem solving with large language models , author=. Advances in neural information processing systems , volume=
[11]

Proceedings of the AAAI conference on artificial intelligence , volume=

Graph of thoughts: Solving elaborate problems with large language models , author=. Proceedings of the AAAI conference on artificial intelligence , volume=
[12]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Why prompt design matters and works: A complexity analysis of prompt search space in llms , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[13]

ACM Computing Surveys , volume=

Natural language reasoning, a survey , author=. ACM Computing Surveys , volume=. 2024 , publisher=

2024
[14]

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Towards reasoning era: A survey of long chain-of-thought for reasoning large language models , author=. arXiv preprint arXiv:2503.09567 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[15]

arXiv preprint arXiv:2407.19825 , year=

Concise thoughts: Impact of output length on llm reasoning and cost , author=. arXiv preprint arXiv:2407.19825 , year=

work page arXiv
[16]

CoRR , year=

Training Language Models to Reason Efficiently , author=. CoRR , year=
[17]

The Thirteenth International Conference on Learning Representations , year=

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning , author=. The Thirteenth International Conference on Learning Representations , year=
[18]

The Eleventh International Conference on Learning Representations , year=

Self-Consistency Improves Chain of Thought Reasoning in Language Models , author=. The Eleventh International Conference on Learning Representations , year=
[19]

Advances in Neural Information Processing Systems , volume=

Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting , author=. Advances in Neural Information Processing Systems , volume=
[20]

arXiv preprint arXiv:2511.22891 , year=

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought , author=. arXiv preprint arXiv:2511.22891 , year=

work page arXiv
[21]

The Twelfth International Conference on Learning Representations , year=

Large language models as optimizers , author=. The Twelfth International Conference on Learning Representations , year=
[22]

International Conference on Machine Learning , pages=

Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024
[23]

International Conference on Machine Learning , pages=

Pal: Program-aided language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[24]

The Twelfth International Conference on Learning Representations , year=

LILO: Learning Interpretable Libraries by Compressing and Documenting Code , author=. The Twelfth International Conference on Learning Representations , year=
[25]

Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Compressed chain of thought: Efficient reasoning through dense representations , author=. arXiv preprint arXiv:2412.13171 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[26]

Emergent multi-agent communication in the deep learning era

Emergent multi-agent communication in the deep learning era , author=. arXiv preprint arXiv:2006.02419 , year=

work page arXiv 2006
[27]

Advances in neural information processing systems , volume=

Trading off utility, informativeness, and complexity in emergent communication , author=. Advances in neural information processing systems , volume=
[28]

arXiv preprint arXiv:2502.17216 , year=

Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning , author=. arXiv preprint arXiv:2502.17216 , year=

work page arXiv
[29]

Forty-second International Conference on Machine Learning , year=

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning , author=. Forty-second International Conference on Machine Learning , year=
[30]

arXiv preprint arXiv:2503.05179 , year=

Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching , author=. arXiv preprint arXiv:2503.05179 , year=

work page arXiv
[31]

Advances in Neural Information Processing Systems , volume=

Self-discover: Large language models self-compose reasoning structures , author=. Advances in Neural Information Processing Systems , volume=
[32]

Transactions on Machine Learning Research , year=

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks , author=. Transactions on Machine Learning Research , year=
[33]

Proceedings of the 31st International Conference on Computational Linguistics , pages=

Searching for structure: Investigating emergent communication with large language models , author=. Proceedings of the 31st International Conference on Computational Linguistics , pages=
[34]

International Conference on Machine Learning , pages=

Dynamics-inspired neuromorphic visual representation learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023
[35]

International Conference on Machine Learning , pages=

Data-free Neural Representation Compression with Riemannian Neural Dynamics , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024
[36]

International Conference on Machine Learning , pages=

Modeling Language Tokens as Functionals of Semantic Fields , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024
[37]

arXiv preprint arXiv:2510.16851 , year=

Neuronal Group Communication for Efficient Neural representation , author=. arXiv preprint arXiv:2510.16851 , year=

work page arXiv
[38]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[39]

2024 , journal=

Llama 3 Model Card , author=. 2024 , journal=

2024
[40]

2023 , eprint=

Mistral 7B , author=. 2023 , eprint=

2023
[41]

Advances in Neural Information Processing Systems , volume=

Mmlu-pro: A more robust and challenging multi-task language understanding benchmark , author=. Advances in Neural Information Processing Systems , volume=
[42]

First Conference on Language Modeling , year=

Gpqa: A graduate-level google-proof q&a benchmark , author=. First Conference on Language Modeling , year=
[43]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Gpqa: A graduate-level google-proof q&a benchmark , author=. arXiv preprint arXiv:2311.12022 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[44]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[45]

Measuring Mathematical Problem Solving With the MATH Dataset

Measuring mathematical problem solving with the math dataset , author=. arXiv preprint arXiv:2103.03874 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[46]

2023 , publisher =

Hemish Veeraboina , title =. 2023 , publisher =

2023
[47]

Advances in Neural Information Processing Systems , volume=

Learn to explain: Multimodal reasoning via thought chains for science question answering , author=. Advances in Neural Information Processing Systems , volume=
[48]

Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

HotpotQA: A dataset for diverse, explainable multi-hop question answering , author=. Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

2018
[49]

CoRR , year=

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing , author=. CoRR , year=
[50]

Cognition , volume=

Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication , author=. Cognition , volume=. 2017 , publisher=

2017
[51]

2021 , publisher=

Constrained Markov decision processes , author=. 2021 , publisher=

2021
[52]

Advances in Applied Probability , volume=

The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach , author=. Advances in Applied Probability , volume=. 2012 , publisher=

2012
[53]

arXiv preprint arXiv:1901.00555 , year=

An introductory guide to Fano's inequality with applications in statistical estimation , author=. arXiv preprint arXiv:1901.00555 , year=

work page arXiv 1901
[54]

International Conference on Learning Representations , year=

On the Turing Completeness of Modern Neural Network Architectures , author=. International Conference on Learning Representations , year=
[55]

Journal of Machine Learning Research , volume=

Attention is turing-complete , author=. Journal of Machine Learning Research , volume=
[56]

2014 , publisher=

Markov decision processes: discrete stochastic dynamic programming , author=. 2014 , publisher=

2014
[57]

2025 , eprint=

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author=. 2025 , eprint=

2025
[58]

CoRR , year=

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost , author=. CoRR , year=
[59]

Chain of draft: Thinking faster by writing less.arXiv preprint arXiv:2502.18600, 2025a

Chain of draft: Thinking faster by writing less , author=. arXiv preprint arXiv:2502.18600 , year=

work page arXiv
[60]

Advances in neural information processing systems , volume=

Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=
[61]

The Eleventh International Conference on Learning Representations , year=

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , author=. The Eleventh International Conference on Learning Representations , year=
[62]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
[63]

Advances in Neural Information Processing Systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in Neural Information Processing Systems , volume=
[64]

arXiv preprint arXiv:2505.16838 , year=

R1-compress: Long chain-of-thought compression via chunk compression and search , author=. arXiv preprint arXiv:2505.16838 , year=

work page arXiv
[65]

Advances in Neural Information Processing Systems , volume=

Self-refine: Iterative refinement with self-feedback , author=. Advances in Neural Information Processing Systems , volume=
[66]

Advances in Neural Information Processing Systems , volume=

Reflexion: Language agents with verbal reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=
[67]

Preprint, alphaXiv , pages=

Chain-of-thought is not explainability , author=. Preprint, alphaXiv , pages=
[68]

International Conference on Machine Learning , pages=

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024
[69]

The eleventh international conference on learning representations , year=

React: Synergizing reasoning and acting in language models , author=. The eleventh international conference on learning representations , year=
[70]

Companion Proceedings of the ACM on Web Conference 2025 , pages=

RLG-RAG: Guiding the Knowledge Retrieval and Evaluation in Retrieval-Augmented Generation Framework by Reasoning Logic , author=. Companion Proceedings of the ACM on Web Conference 2025 , pages=

2025
[71]

Electronics Letters , volume=

Reconstruction of three-dimensional sound field system based on machine learning method , author=. Electronics Letters , volume=. 2025 , publisher=

2025

[1] [1]

Langley , title =

P. Langley , title =. Proceedings of the 17th International Conference on Machine Learning (ICML 2000) , address =. 2000 , pages =

2000

[2] [2]

T. M. Mitchell. The Need for Biases in Learning Generalizations. 1980

1980

[3] [3]

M. J. Kearns , title =

[4] [4]

Machine Learning: An Artificial Intelligence Approach, Vol. I. 1983

1983

[5] [5]

R. O. Duda and P. E. Hart and D. G. Stork. Pattern Classification. 2000

2000

[6] [6]

Suppressed for Anonymity , author=

[7] [7]

Newell and P

A. Newell and P. S. Rosenbloom. Mechanisms of Skill Acquisition and the Law of Practice. Cognitive Skills and Their Acquisition. 1981

1981

[8] [8]

A. L. Samuel. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development. 1959

1959

[9] [9]

Advances in neural information processing systems , volume=

Chain-of-thought prompting elicits reasoning in large language models , author=. Advances in neural information processing systems , volume=

[10] [10]

Advances in neural information processing systems , volume=

Tree of thoughts: Deliberate problem solving with large language models , author=. Advances in neural information processing systems , volume=

[11] [11]

Proceedings of the AAAI conference on artificial intelligence , volume=

Graph of thoughts: Solving elaborate problems with large language models , author=. Proceedings of the AAAI conference on artificial intelligence , volume=

[12] [12]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Why prompt design matters and works: A complexity analysis of prompt search space in llms , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[13] [13]

ACM Computing Surveys , volume=

Natural language reasoning, a survey , author=. ACM Computing Surveys , volume=. 2024 , publisher=

2024

[14] [14]

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Towards reasoning era: A survey of long chain-of-thought for reasoning large language models , author=. arXiv preprint arXiv:2503.09567 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

arXiv preprint arXiv:2407.19825 , year=

Concise thoughts: Impact of output length on llm reasoning and cost , author=. arXiv preprint arXiv:2407.19825 , year=

work page arXiv

[16] [16]

CoRR , year=

Training Language Models to Reason Efficiently , author=. CoRR , year=

[17] [17]

The Thirteenth International Conference on Learning Representations , year=

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning , author=. The Thirteenth International Conference on Learning Representations , year=

[18] [18]

The Eleventh International Conference on Learning Representations , year=

Self-Consistency Improves Chain of Thought Reasoning in Language Models , author=. The Eleventh International Conference on Learning Representations , year=

[19] [19]

Advances in Neural Information Processing Systems , volume=

Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting , author=. Advances in Neural Information Processing Systems , volume=

[20] [20]

arXiv preprint arXiv:2511.22891 , year=

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought , author=. arXiv preprint arXiv:2511.22891 , year=

work page arXiv

[21] [21]

The Twelfth International Conference on Learning Representations , year=

Large language models as optimizers , author=. The Twelfth International Conference on Learning Representations , year=

[22] [22]

International Conference on Machine Learning , pages=

Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024

[23] [23]

International Conference on Machine Learning , pages=

Pal: Program-aided language models , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[24] [24]

The Twelfth International Conference on Learning Representations , year=

LILO: Learning Interpretable Libraries by Compressing and Documenting Code , author=. The Twelfth International Conference on Learning Representations , year=

[25] [25]

Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

Compressed chain of thought: Efficient reasoning through dense representations , author=. arXiv preprint arXiv:2412.13171 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[26] [26]

Emergent multi-agent communication in the deep learning era

Emergent multi-agent communication in the deep learning era , author=. arXiv preprint arXiv:2006.02419 , year=

work page arXiv 2006

[27] [27]

Advances in neural information processing systems , volume=

Trading off utility, informativeness, and complexity in emergent communication , author=. Advances in neural information processing systems , volume=

[28] [28]

arXiv preprint arXiv:2502.17216 , year=

Intermediate Languages Matter: Formal Choice Drives Neurosymbolic LLM Reasoning , author=. arXiv preprint arXiv:2502.17216 , year=

work page arXiv

[29] [29]

Forty-second International Conference on Machine Learning , year=

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning , author=. Forty-second International Conference on Machine Learning , year=

[30] [30]

arXiv preprint arXiv:2503.05179 , year=

Sketch-of-thought: Efficient llm reasoning with adaptive cognitive-inspired sketching , author=. arXiv preprint arXiv:2503.05179 , year=

work page arXiv

[31] [31]

Advances in Neural Information Processing Systems , volume=

Self-discover: Large language models self-compose reasoning structures , author=. Advances in Neural Information Processing Systems , volume=

[32] [32]

Transactions on Machine Learning Research , year=

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks , author=. Transactions on Machine Learning Research , year=

[33] [33]

Proceedings of the 31st International Conference on Computational Linguistics , pages=

Searching for structure: Investigating emergent communication with large language models , author=. Proceedings of the 31st International Conference on Computational Linguistics , pages=

[34] [34]

International Conference on Machine Learning , pages=

Dynamics-inspired neuromorphic visual representation learning , author=. International Conference on Machine Learning , pages=. 2023 , organization=

2023

[35] [35]

International Conference on Machine Learning , pages=

Data-free Neural Representation Compression with Riemannian Neural Dynamics , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024

[36] [36]

International Conference on Machine Learning , pages=

Modeling Language Tokens as Functionals of Semantic Fields , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024

[37] [37]

arXiv preprint arXiv:2510.16851 , year=

Neuronal Group Communication for Efficient Neural representation , author=. arXiv preprint arXiv:2510.16851 , year=

work page arXiv

[38] [38]

Qwen3 Technical Report

Qwen3 technical report , author=. arXiv preprint arXiv:2505.09388 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[39] [39]

2024 , journal=

Llama 3 Model Card , author=. 2024 , journal=

2024

[40] [40]

2023 , eprint=

Mistral 7B , author=. 2023 , eprint=

2023

[41] [41]

Advances in Neural Information Processing Systems , volume=

Mmlu-pro: A more robust and challenging multi-task language understanding benchmark , author=. Advances in Neural Information Processing Systems , volume=

[42] [42]

First Conference on Language Modeling , year=

Gpqa: A graduate-level google-proof q&a benchmark , author=. First Conference on Language Modeling , year=

[43] [43]

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Gpqa: A graduate-level google-proof q&a benchmark , author=. arXiv preprint arXiv:2311.12022 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[44] [44]

Training Verifiers to Solve Math Word Problems

Training verifiers to solve math word problems , author=. arXiv preprint arXiv:2110.14168 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[45] [45]

Measuring Mathematical Problem Solving With the MATH Dataset

Measuring mathematical problem solving with the math dataset , author=. arXiv preprint arXiv:2103.03874 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[46] [46]

2023 , publisher =

Hemish Veeraboina , title =. 2023 , publisher =

2023

[47] [47]

Advances in Neural Information Processing Systems , volume=

Learn to explain: Multimodal reasoning via thought chains for science question answering , author=. Advances in Neural Information Processing Systems , volume=

[48] [48]

Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

HotpotQA: A dataset for diverse, explainable multi-hop question answering , author=. Proceedings of the 2018 conference on empirical methods in natural language processing , pages=

2018

[49] [49]

CoRR , year=

Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing , author=. CoRR , year=

[50] [50]

Cognition , volume=

Zipf’s law of abbreviation and the principle of least effort: Language users optimise a miniature lexicon for efficient communication , author=. Cognition , volume=. 2017 , publisher=

2017

[51] [51]

2021 , publisher=

Constrained Markov decision processes , author=. 2021 , publisher=

2021

[52] [52]

Advances in Applied Probability , volume=

The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach , author=. Advances in Applied Probability , volume=. 2012 , publisher=

2012

[53] [53]

arXiv preprint arXiv:1901.00555 , year=

An introductory guide to Fano's inequality with applications in statistical estimation , author=. arXiv preprint arXiv:1901.00555 , year=

work page arXiv 1901

[54] [54]

International Conference on Learning Representations , year=

On the Turing Completeness of Modern Neural Network Architectures , author=. International Conference on Learning Representations , year=

[55] [55]

Journal of Machine Learning Research , volume=

Attention is turing-complete , author=. Journal of Machine Learning Research , volume=

[56] [56]

2014 , publisher=

Markov decision processes: discrete stochastic dynamic programming , author=. 2014 , publisher=

2014

[57] [57]

2025 , eprint=

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning , author=. 2025 , eprint=

2025

[58] [58]

CoRR , year=

Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost , author=. CoRR , year=

[59] [59]

Chain of draft: Thinking faster by writing less.arXiv preprint arXiv:2502.18600, 2025a

Chain of draft: Thinking faster by writing less , author=. arXiv preprint arXiv:2502.18600 , year=

work page arXiv

[60] [60]

Advances in neural information processing systems , volume=

Large language models are zero-shot reasoners , author=. Advances in neural information processing systems , volume=

[61] [61]

The Eleventh International Conference on Learning Representations , year=

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models , author=. The Eleventh International Conference on Learning Representations , year=

[62] [62]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models , author=. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[63] [63]

Advances in Neural Information Processing Systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in Neural Information Processing Systems , volume=

[64] [64]

arXiv preprint arXiv:2505.16838 , year=

R1-compress: Long chain-of-thought compression via chunk compression and search , author=. arXiv preprint arXiv:2505.16838 , year=

work page arXiv

[65] [65]

Advances in Neural Information Processing Systems , volume=

Self-refine: Iterative refinement with self-feedback , author=. Advances in Neural Information Processing Systems , volume=

[66] [66]

Advances in Neural Information Processing Systems , volume=

Reflexion: Language agents with verbal reinforcement learning , author=. Advances in Neural Information Processing Systems , volume=

[67] [67]

Preprint, alphaXiv , pages=

Chain-of-thought is not explainability , author=. Preprint, alphaXiv , pages=

[68] [68]

International Conference on Machine Learning , pages=

Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models , author=. International Conference on Machine Learning , pages=. 2024 , organization=

2024

[69] [69]

The eleventh international conference on learning representations , year=

React: Synergizing reasoning and acting in language models , author=. The eleventh international conference on learning representations , year=

[70] [70]

Companion Proceedings of the ACM on Web Conference 2025 , pages=

RLG-RAG: Guiding the Knowledge Retrieval and Evaluation in Retrieval-Augmented Generation Framework by Reasoning Logic , author=. Companion Proceedings of the ACM on Web Conference 2025 , pages=

2025

[71] [71]

Electronics Letters , volume=

Reconstruction of three-dimensional sound field system based on machine learning method , author=. Electronics Letters , volume=. 2025 , publisher=

2025