Recognition: no theorem link
Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
Pith reviewed 2026-05-13 02:41 UTC · model grok-4.3
The pith
LLM agents can dynamically construct their own task-specific reasoning scaffolds at inference time using structured meta-reasoning in a formal language.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Deep Reasoning treats scaffolding itself as adaptive reasoning: a formal language represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, so that decomposition principles can be encoded once as in-context examples and then used at test time to construct a task-specific scaffold on the fly. When instantiated in the DOLORES agent, this produces better performance than any fixed scaffold on hard benchmarks while reducing premature termination and hallucinations by spreading cognition across more controlled threads.
What carries the argument
A formal language for structured meta-reasoning that encodes decompositions over associative inference, formal computation, and recursive subproblem solving as executable in-context examples for test-time scaffold construction.
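To make the idea concrete, here is a minimal sketch of what such a formal language's typed decompositions might look like. The primitive names (`Associate`, `Compute`, `Recurse`) and the example plan are illustrative assumptions, not the paper's actual syntax.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Associate:
    # Associative inference: an open-ended judgment delegated to the LLM.
    prompt: str

@dataclass
class Compute:
    # Formal computation: a well-specified procedure executed exactly.
    fn: Callable[..., object]
    args: tuple

@dataclass
class Recurse:
    # Recursive subproblem solving: re-enter the solver on a subtask.
    subtask: str

Step = Union[Associate, Compute, Recurse]

@dataclass
class Decomposition:
    """A task-specific scaffold: an ordered plan of typed reasoning steps."""
    steps: list

# A scaffold an agent might construct at test time for a multi-hop question:
plan = Decomposition(steps=[
    Associate("Which entities does the question link together?"),
    Recurse("Find the birth year of the first entity"),
    Recurse("Find the founding year of the second entity"),
    Compute(fn=lambda a, b: a - b, args=(1955, 1903)),  # formal arithmetic step
])
```

Because each step is typed, a downstream interpreter can route associative steps to the model, computational steps to code, and recursive steps back into the solver.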
If this is right
- DOLORES outperforms every evaluated fixed scaffold, improving over the strongest baseline by 24.8 percent on average across model sizes and families.
- An 8B-parameter DOLORES agent surpasses all evaluated 32B baselines from the same family in more than half of the tested settings.
- Distributing work across structured, lower-load reasoning threads reduces premature termination and hallucinations.
- Scaffolding can be treated as just-in-time adaptive reasoning rather than pre-engineered structure.
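The third bullet's mechanism can be sketched in a few lines: split a long-context task across several lower-load reasoning threads, each of which sees only a small slice, then aggregate. Here `call_llm` is a placeholder stub standing in for a real model call, and the chunking policy is an assumption, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Stub: a real agent would call a language model here. This stand-in
    # just echoes the part of the prompt after the first colon, uppercased.
    return prompt.split(":", 1)[1].strip().upper()

def chunk(text: str, size: int) -> list:
    return [text[i:i + size] for i in range(0, len(text), size)]

def solve_distributed(question: str, context: str, size: int = 40) -> str:
    # Each thread sees only one chunk, keeping its cognitive load low.
    prompts = [f"{question}: {c}" for c in chunk(context, size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(call_llm, prompts))
    # A final, short aggregation step combines the partial answers.
    return call_llm("aggregate: " + " | ".join(partials))
```

The claimed benefit is that no single thread ever carries the full context, which is the reviewers' proposed explanation for fewer premature terminations and hallucinations.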
Where Pith is reading between the lines
- The same formal language could be used to inspect or debug an agent's reasoning plan before execution.
- If the language is sufficiently general, it might reduce the amount of task-specific prompt engineering needed when deploying agents to new domains.
- Extending the decomposition primitives to include uncertainty estimation or tool-use planning would be a direct next step within the same framework.
Load-bearing premise
That examples of meta-reasoning decompositions written in the formal language will let the model reliably invent effective task-specific scaffolds for diverse and previously unseen problems without any additional training or hand-crafted prompts.
What would settle it
A controlled test on a suite of novel tasks whose required reasoning structure differs markedly from the in-context decomposition examples, where DOLORES either matches or underperforms the strongest fixed-scaffold baseline.
Original abstract
Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified subproblems. Current LLM agents lack this flexibility, as their scaffolds hard-code such reasoning decisions in advance. These scaffolds are effective when their prescribed structure matches the task, but brittle when solving the task requires adapting the structure of reasoning itself. We introduce Deep Reasoning -- an inference-time approach for constructing task-specific scaffolds through structured meta-reasoning. Deep Reasoning uses a formal language that represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, enabling decomposition principles to be encoded as in-context examples that guide test-time scaffold construction. We instantiate this approach in a general-purpose agent (DOLORES) that distributes complex tasks across more controlled reasoning threads. We evaluate it against state-of-the-art scaffolding methods across four hard benchmarks: multi-hop reasoning, long-chain question answering, long-context aggregation, and deep research-style information seeking. DOLORES outperforms all evaluated scaffolds across three model sizes and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average. DOLORES distributes cognition across structured, lower-load reasoning threads, thereby reducing premature termination and hallucinations. This advantage can even bridge the scaling gap, with an 8B version surpassing all evaluated 32B baselines from the same family in more than half the settings. These results point toward future agentic systems that treat scaffolding as adaptive reasoning, constructing the structure each task requires just-in-time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce 'Deep Reasoning' as an inference-time method for general-purpose LLM agents to construct task-specific scaffolds via structured meta-reasoning in a formal language representing decompositions over associative inference, formal computation, and recursive solving. The DOLORES agent implements this and is evaluated on four benchmarks (multi-hop reasoning, long-chain question answering, long-context aggregation, and deep research-style information seeking), outperforming state-of-the-art scaffolding methods by 24.8% on average across three model sizes and two families, with an 8B model surpassing 32B baselines in over half the settings.
Significance. Should the empirical results prove robust with truly fixed in-context examples, this would be significant for LLM agent research by demonstrating that structured meta-cognition can enable adaptive, just-in-time scaffolding without per-task engineering, addressing the brittleness of current hard-coded approaches. The broad evaluation across model sizes/families and the observation of scaling-gap bridging provide useful evidence of potential impact, while the mechanism of distributing cognition to reduce hallucinations offers a practical direction for more reliable agents.
Major comments (1)
- Abstract: The description states that the formal language enables 'decomposition principles to be encoded as in-context examples that guide test-time scaffold construction' without clarifying whether the same fixed set of examples is used across all four benchmarks or adapted per task type. This is load-bearing for the central claim of reliable meta-reasoning on novel tasks; if examples differ by benchmark, the 24.8% average gain and scaling-gap bridging may reflect benchmark-specific prompt engineering rather than the proposed general approach.
Minor comments (2)
- Abstract: The new term 'Deep Reasoning' is introduced alongside the title's 'Structured Meta-Cognition' without an explicit statement of their relationship; a one-sentence clarification in the introduction would improve readability.
- Abstract: The acronym DOLORES is used without expansion; define it on first use for clarity.
Simulated Author's Rebuttal
We thank the referee for highlighting the need for greater precision in the abstract regarding the in-context examples. This is a substantive point about the generality of the approach, and we address it directly below while committing to a revision.
Point-by-point responses
- Referee: Abstract: The description states that the formal language enables 'decomposition principles to be encoded as in-context examples that guide test-time scaffold construction' without clarifying whether the same fixed set of examples is used across all four benchmarks or adapted per task type. This is load-bearing for the central claim of reliable meta-reasoning on novel tasks; if examples differ by benchmark, the 24.8% average gain and scaling-gap bridging may reflect benchmark-specific prompt engineering rather than the proposed general approach.
  Authors: The in-context examples consist of a single fixed set that encodes general decomposition principles over associative inference, formal computation, and recursive solving. These examples are not adapted or rewritten per benchmark or task type; the same examples are used for all four evaluation settings (multi-hop reasoning, long-chain QA, long-context aggregation, and deep research-style seeking) to test the claim of general-purpose meta-reasoning. Section 3.2 and the supplementary prompt appendix describe the construction of this fixed prompt template. We agree that the abstract's phrasing leaves this ambiguous and will revise it to state explicitly that a fixed set of examples is employed across benchmarks. Revision: yes.
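The rebuttal's claim, that one fixed example set is assembled into the same prompt regardless of benchmark, can be sketched as follows. The example strings, template, and `build_prompt` helper are illustrative assumptions, not the paper's actual prompt.

```python
# A single fixed set of in-context decomposition examples (illustrative).
FIXED_EXAMPLES = [
    "Task: compare two dates -> Decompose: [recurse(find date A), "
    "recurse(find date B), compute(subtract)]",
    "Task: resolve ambiguous entity -> Decompose: "
    "[associate(judge most likely referent)]",
]

def build_prompt(task: str) -> str:
    # The same examples appear verbatim regardless of benchmark or task type;
    # only the final task line varies.
    header = "Construct a decomposition for the task, following these examples:\n"
    return header + "\n".join(FIXED_EXAMPLES) + f"\nTask: {task} -> Decompose:"

p1 = build_prompt("multi-hop question about two rivers")
p2 = build_prompt("aggregate figures from a long report")
# Everything before the final task line is byte-identical across benchmarks.
assert p1[: p1.rfind("Task:")] == p2[: p2.rfind("Task:")]
```

If the prompt were instead adapted per benchmark, that shared prefix would differ, which is exactly the confound the referee flags.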
Circularity Check
No circularity: empirical evaluation of meta-reasoning scaffold
Full rationale
The paper introduces DOLORES as an inference-time method using a formal language for meta-reasoning encoded in fixed in-context examples, then reports direct empirical performance gains (24.8% on average) over scaffolding baselines on four standard benchmarks across model sizes. No equations, fitted parameters, or first-principles derivations are presented whose outputs reduce by construction to the inputs; the central claims rest on benchmark comparisons rather than any self-referential prediction or load-bearing self-citation. The evaluation is therefore self-contained against external benchmarks, consistent with the assessed circularity score of 1.0.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Large language models can execute structured meta-reasoning and construct effective task-specific scaffolds when provided with in-context examples of decomposition principles.
Invented entities (2)
- Deep Reasoning: no independent evidence
- DOLORES: no independent evidence
Reference graph
Works this paper leans on
- [1] Rakefet Ackerman and Valerie A. Thompson. Meta-reasoning: Monitoring and control of thinking and reasoning. Trends in Cognitive Sciences, 21(8): 607-617, 2017.
- [3] Samuel C. Bellini-Leite. Dual process theory: Embodied and predictive; symbolic and classical. Frontiers in Psychology, 13: 805386, 2022.
- [5]
- [6] Ted Byrt, Janet Bishop, and John B. Carlin. Bias, prevalence and kappa. Journal of Clinical Epidemiology, 46(5): 423-429, 1993.
- [7] Jianlyu Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. M3-Embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 2318-2335, 2024.
- [8] Qiguang Chen, Libo Qin, Jiaqi Wang, Jingxuan Zhou, and Wanxiang Che. Unlocking the capabilities of thought: A reasoning boundary framework to quantify and optimize chain-of-thought. In Advances in Neural Information Processing Systems, 2024.
- [9] Xingyu Chen, Jiahao Xu, Tian Liang, Zhiwei He, Jianhui Pang, Dian Yu, Linfeng Song, Qiuzhi Liu, Mengfei Zhou, Zhuosheng Zhang, et al. Do not think that much for 2+3=? On the overthinking of long reasoning models. In Forty-second International Conference on Machine Learning, 2025.
- [10] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv e-prints, 2024.
- [11] John H. Flavell. Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34(10): 906, 1979.
- [12] Dayuan Fu, Keqing He, Yejie Wang, Wentao Hong, Zhuoma GongQue, Weihao Zeng, Wei Wang, Jingang Wang, Xunliang Cai, and Weiran Xu. AgentRefine: Enhancing agent generalization through refinement tuning. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=FDimWzmcWn
- [14] Albert Gong, Kamilė Stankevičiūtė, Chao Wan, Anmol Kabra, Raphael Thesmar, Johann Lee, Julius Klenke, Carla P. Gomes, and Kilian Q. Weinberger. PhantomWiki: On-demand datasets for reasoning and retrieval evaluation. In International Conference on Machine Learning, pages 19964-19995. PMLR, 2025.
- [16] Ken Gu, Advait Bhat, Mike A. Merrill, Robert West, Xin Liu, Daniel McDuff, and Tim Althoff. SynthWorlds: Controlled parallel worlds for disentangling reasoning and knowledge in language models. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=46AQ4qaWqQ
- [21]
- [23] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023.
- [24] Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Om Chabra, Sivaprasad Sudhir, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Mike Cafarella, Lei Cao, Samuel Madden, and Tim Kraska. KramaBench: A benchmark for AI systems on data-to-insight pipelines over data lakes. 2026.
- [28] Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, and Chris Callison-Burch. Faithful chain-of-thought reasoning. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2023.
- [31] Ahmed Masry, Mohammed Saidul Islam, Mahir Ahmed, Aayush Bajaj, Firoz Kabir, Aaryaman Kartha, Md. Tahmid Rahman Laskar, Mizanur Rahman, Shadikur Rahman, Mehrad Shahmohammadi, Megh Thakkar, Md. Rizwan Parvez, Enamul Hoque, and Shafiq Joty. ChartQAPro: A more diverse and challenging benchmark for chart question answering. 2025.
- [32] Sarnoff Mednick. The associative basis of the creative process. Psychological Review, 69(3): 220, 1962.
- [33] Allen Newell, John C. Shaw, and Herbert A. Simon. Report on a general problem solving program. In IFIP Congress, volume 256. Pittsburgh, PA, 1959.
- [34] NVIDIA. NVIDIA AI-Q Blueprint for Intelligent Agents, 2026. URL https://build.nvidia.com/nvidia/aiq. Accessed: 2026-04-14.
- [35] OpenAI. Introducing deep research, February 2025. URL https://openai.com/index/introducing-deep-research/. Accessed: 2026-05-06.
- [36] Lance J. Rips. Cognitive processes in propositional reasoning. Psychological Review, 90(1): 38-71, January 1983.
- [37] Eleanor Rosch. Principles of categorization. In Eleanor Rosch and Barbara Bloom Lloyd, editors, Cognition and Categorization, pages 27-48. Lawrence Erlbaum Associates, 1978.
- [38] J. Rosser and Jakob Nicolaus Foerster. AgentBreeder: Mitigating the AI safety risks of multi-agent scaffolds via self-improvement. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=mlU9KqdZUS
- [39] Aymeric Roucher, Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki. smolagents: a smol library to build great agentic systems. https://github.com/huggingface/smolagents, 2025.
- [40] Aymeric Roucher, Albert Villanova del Moral, Merve, Thomas Wolf, and Clémentine Fourrier. Open-source DeepResearch -- freeing our search agents, February 2025. URL https://huggingface.co/blog/open-deep-research. Accessed: 2026-04-13.
- [42] Steven A. Sloman. The empirical case for two systems of reasoning. Psychological Bulletin, 119(1): 3, 1996.
- [43] Keith Stanovich. Rationality and the Reflective Mind. Oxford University Press, 2011.
- [46] Anne Treisman. Monitoring and storage of irrelevant messages in selective attention. Journal of Verbal Learning and Verbal Behavior, 3(6): 449-459, 1964.
- [47] Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, and Heng Ji. Executable code actions elicit better LLM agents. In Forty-first International Conference on Machine Learning, 2024.
- [48] Peter C. Wason. Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20(3): 273-281, 1968.
- [51] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R. Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations, 2022.
- [55] Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36: 46595-46623, 2023.
- [56] Chunqiu Steven Xia, Zhe Wang, Yan Yang, Yuxiang Wei, and Lingming Zhang. CoRR, arXiv:2511.13646, 2025.
- [57] Guangyi Liu, Haojun Lin, Huan Zeng, Heng Wang, and Quanming Yao. CoRR, arXiv:2602.13671, 2026.
- [58] Shengran Hu, Cong Lu, and Jeff Clune. In The Thirteenth International Conference on Learning Representations, 2025.
- [59] Zhezheng Hao, Hong Wang, Jian Luo, Jianqing Zhang, Yuyan Zhou, Qiang Lin, Can Wang, Hande Dong, and Jiawei Chen. ReCreate: Reasoning and creating domain agents driven by experience. CoRR, arXiv:2601.11100, 2026.
- [61] Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, and Emma Cabal. CoRR, arXiv:2502.01635, 2025.
- [62] Xingyao Wang, Simon Rosenberg, Juan Michelini, Calvin Smith, Hoang H. Tran, Engel Nyst, Rohit Malhotra, Xuhui Zhou, Valerie Chen, Robert Brennan, and Graham Neubig. The OpenHands Software Agent SDK: A composable and extensible foundation for production agents. CoRR, arXiv:2511.03690, 2025.
- [63] Alan Malek, Jiawei Ge, Nevena Lazic, Chi Jin, et al. Frontier LLMs still struggle with simple reasoning tasks. CoRR, arXiv:2507.07313, 2025.
- [64] John Sweller, Jeroen J. G. van Merriënboer, and Fred Paas. Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 2019. doi:10.1007/s10648-019-09465-5
- [65] Thomas L. Griffiths, Frederick Callaway, Michael B. Chang, Erin Grant, Paul M. Krueger, and Falk Lieder. Doing more with less: meta-reasoning and meta-learning in humans and machines. 2019. doi:10.1016/j.cobeha.2019.01.005
- [67] Qihang Fu, Yongbin Qin, Ruizhang Huang, Yanping Chen, Yulin Zhou, and Lintao Long. Exclusion of thought: Mitigating cognitive load in large language models for enhanced reasoning in multiple-choice tasks. 2025. doi:10.18653/V1/2025.ACL-LONG.1051
- [68] Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, and Soroush Vosoughi. Working memory identifies reasoning limits in language models. 2024. doi:10.18653/V1/2024.EMNLP-MAIN.938
- [70] Agent-R1: Training powerful LLM agents with end-to-end reinforcement learning. 2025.
- [71] Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta-Harness: End-to-end optimization of model harnesses. CoRR, arXiv:2603.28052, 2026.
- [72] John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press. SWE-agent: Agent-computer interfaces enable automated software engineering. 2024.
- [73] Frontier AI Trends Report.
- [74] Anthropic raises $30 billion in Series G funding at $380 billion post-money valuation.
- [75] R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, and Thomas L. Griffiths. CoRR, arXiv:2309.13638, 2023.
- [77]
- [78] Carlos E. Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik R. Narasimhan. 2024.
- [79] Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, et al. 2026.
- [80] Ferdinand de Saussure. Course in General Linguistics.
- [81] Yoshihiko Futamura. Higher-Order and Symbolic Computation, 1999. doi:10.1023/A:1010043619517
- [82] Dana Scott and Christopher Strachey. In Proceedings of the Symposium on Computers and Automata, 1971.
- [83] Richard Montague. Formal Philosophy: Selected Papers of Richard Montague, 1974.
- [84] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.
- [85] Measuring faithfulness in chain-of-thought reasoning. arXiv preprint arXiv:2307.13702.
- [86] Dissociation of faithful and unfaithful reasoning in LLMs. arXiv preprint arXiv:2405.15092.
- [88] Chain-of-thought reasoning in the wild is not always faithful. arXiv preprint arXiv:2503.08679.
- [90] Don't overthink it: Preferring shorter thinking chains for improved LLM reasoning. arXiv preprint arXiv:2505.17813.
- [91] Between underthinking and overthinking: An empirical study of reasoning length and correctness in LLMs. arXiv preprint.
- [92] Building machines that learn and think with people. Nature Human Behaviour, 2024.
- [93] Mental models and human reasoning. Proceedings of the National Academy of Sciences, 2010.
- [94]
- [96]
- [98] DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
- [99] OpenAI o1 system card. arXiv preprint arXiv:2412.16720.
- [100] Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang. Does reinforcement learning really incentivize reasoning capacity in ... 2026.