GraphMind: Theorem Selection and Conclusion Generation Framework with Dynamic GNN for LLM Reasoning

Caiyan Qin; GuoChen; Xudong Wang; Yitian Zhou; Yutong Li

arxiv: 2511.19078 · v2 · pith:FO4EHHORnew · submitted 2025-11-24 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-21 18:27 UTC · model grok-4.3

keywords: Graph Neural Network, Large Language Models, Multi-step Reasoning, Theorem Selection, Dynamic Graphs, Question Answering, Conclusion Generation

The pith

Modeling reasoning as an evolving heterogeneous graph with GNN encoding allows LLMs to select theorems and generate conclusions more effectively in multi-step tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces GraphMind to address the lack of explicit dynamic mechanisms in LLMs for representing and evolving intermediate reasoning states. It models the reasoning process as a heterogeneous evolving graph with nodes for conditions, theorems, and conclusions, and edges for logical dependencies. A graph neural network encodes the current state to support semantic matching for theorem selection and iterative conclusion generation. This creates a closed-loop, context-aware reasoning process. The authors report consistent gains over prior methods in multi-step reasoning on multiple question-answering datasets.

Core claim

The central discovery is that integrating a dynamic graph neural network with LLMs through a heterogeneous evolving graph enables context-aware theorem selection and iterative conclusion generation, resulting in improved performance on multi-step reasoning tasks over existing baselines.

What carries the argument

A heterogeneous evolving graph with nodes representing conditions, theorems, and conclusions and edges capturing logical dependencies, encoded dynamically by a GNN to guide theorem selection and conclusion generation.
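
To make the data structure concrete, here is a minimal sketch of a typed reasoning graph with one untrained round of mean-aggregation message passing. It is an illustration under stated assumptions, not the authors' implementation: the `ReasoningGraph` class, its method names, the two-dimensional feature vectors, and the averaging rule are all hypothetical stand-ins for whatever GNN the paper actually uses.

```python
from dataclasses import dataclass, field

# The three node types named in the abstract.
NODE_TYPES = ("condition", "theorem", "conclusion")


@dataclass
class ReasoningGraph:
    """Toy heterogeneous graph: typed nodes, directed dependency edges."""
    node_type: dict = field(default_factory=dict)  # node id -> type
    feature: dict = field(default_factory=dict)    # node id -> list of floats
    edges: list = field(default_factory=list)      # (src, dst) pairs

    def add_node(self, node_id, kind, vec):
        if kind not in NODE_TYPES:
            raise ValueError(f"unknown node type: {kind}")
        self.node_type[node_id] = kind
        self.feature[node_id] = list(vec)

    def add_edge(self, src, dst):
        # src logically supports dst, e.g. condition -> theorem -> conclusion.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((src, dst))

    def propagate(self):
        """One round of mean aggregation over incoming edges (no learned weights)."""
        incoming = {n: [] for n in self.node_type}
        for src, dst in self.edges:
            incoming[dst].append(self.feature[src])
        updated = {}
        for node, own in self.feature.items():
            msgs = incoming[node]
            if not msgs:
                updated[node] = list(own)
                continue
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
            # Average the node's own vector with the mean of its predecessors.
            updated[node] = [(a + b) / 2 for a, b in zip(own, mean)]
        return updated


g = ReasoningGraph()
g.add_node("c1", "condition", [1.0, 0.0])
g.add_node("c2", "condition", [0.0, 1.0])
g.add_node("t1", "theorem", [0.0, 0.0])
g.add_edge("c1", "t1")
g.add_edge("c2", "t1")
state = g.propagate()
```

The graph "evolves" simply by calling `add_node` and `add_edge` between propagation rounds; a trained model would replace the fixed averaging with learned, type-specific transforms.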

If this is right

Provides an explicit mechanism to structurally represent and evolve intermediate reasoning states.
Achieves consistent performance improvements on various QA datasets.
Significantly outperforms existing baselines in multi-step reasoning.
Supports interpretable and structured reasoning in a closed-loop manner.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such graph-based tracking of reasoning dependencies could extend to other complex tasks like automated theorem proving or planning.
Visualizing the evolving graph might help users understand and correct LLM reasoning paths.
Integrating this with symbolic solvers could create more reliable hybrid reasoning systems.

Load-bearing premise

The modeling of the reasoning process as a heterogeneous evolving graph enables the GNN to provide effective context-aware guidance for theorem selection and conclusion generation.
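
The "semantic matching" step this premise depends on can be pictured as a nearest-neighbour lookup between an encoded reasoning state and candidate theorem embeddings. The sketch below assumes cosine similarity as the score and uses made-up three-dimensional vectors and theorem names; the paper's abstract does not specify the scoring function, so treat every name here as hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_theorems(state_vec, theorem_vecs, k=1):
    """Rank candidate theorems by similarity to the encoded state, keep top k."""
    scored = sorted(
        ((cosine(state_vec, vec), name) for name, vec in theorem_vecs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


# Hypothetical embeddings, chosen only to make the ranking easy to follow.
theorems = {
    "pythagoras": [0.9, 0.1, 0.0],
    "triangle_inequality": [0.2, 0.8, 0.1],
    "law_of_cosines": [0.7, 0.3, 0.2],
}
state = [1.0, 0.2, 0.0]
top = select_theorems(state, theorems, k=2)
```

If the state vector came from a plain text encoder instead of a graph encoder, this function would behave identically, which is why the premise needs an ablation to separate the two.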

What would settle it

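A controlled ablation on the same QA datasets, comparing the full GraphMind method against variants with the graph component or the GNN encoding removed. If the ablated variants match or beat the full method, the reported gains do not come from the graph mechanism; if they are clearly worse, the mechanism is doing real work.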
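
One way to score such a comparison is a paired bootstrap over per-question correctness flags. This is a sketch of a standard evaluation technique, not a procedure taken from the paper; the function names, the resample count, and the input format (lists of 0/1 flags aligned by question) are assumptions.

```python
import random


def accuracy(correct_flags):
    """Fraction of questions answered correctly."""
    return sum(correct_flags) / len(correct_flags)


def paired_bootstrap(full, ablated, n_resamples=2000, seed=0):
    """Return (observed accuracy gap, share of resamples where the gap is <= 0).

    `full` and `ablated` are 0/1 correctness flags for the same questions,
    in the same order, from the full system and the ablated variant.
    """
    if len(full) != len(ablated):
        raise ValueError("both runs must score the same questions")
    rng = random.Random(seed)
    n = len(full)
    observed = accuracy(full) - accuracy(ablated)
    not_better = 0
    for _ in range(n_resamples):
        # Resample question indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        gap = sum(full[i] - ablated[i] for i in idx) / n
        if gap <= 0:
            not_better += 1
    return observed, not_better / n_resamples


same = paired_bootstrap([1, 0, 1, 1], [1, 0, 1, 1])
dominant = paired_bootstrap([1, 1, 1, 1], [0, 0, 0, 0])
```

A small second value suggests the full system's advantage is stable across resamples; a value near 1 suggests the removed component contributes little on that dataset.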

Figures

Figures reproduced from arXiv: 2511.19078 by Caiyan Qin, GuoChen, Xudong Wang, Yitian Zhou, Yutong Li.

**Figure 1.** Overview of the proposed GraphMind framework, consisting of four core modules: graph encoding, theorem matching, ...

Original abstract

Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, including multi-step reasoning such as mathematical proving. However, existing approaches often lack an explicit and dynamic mechanism to structurally represent and evolve intermediate reasoning states, which limits their ability to perform context-aware theorem selection and iterative conclusion generation. To address these challenges, we propose GraphMind, a novel dynamic graph-based framework that integrates the graph neural network (GNN) with LLMs to iteratively select theorems and generate intermediate conclusions for multi-step reasoning. Our method models the reasoning process as a heterogeneous evolving graph, where nodes represent conditions, theorems, and conclusions, while edges capture logical dependencies between nodes. By encoding the current reasoning state with GNN and leveraging semantic matching for theorem selection, our framework enables context-aware, interpretable, and structured reasoning in a closed-loop manner. Experiments on various question-answering (QA) datasets demonstrate that our proposed GraphMind method achieves consistent performance improvements and significantly outperforms existing baselines in multi-step reasoning, validating the effectiveness and generalizability of our approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GraphMind adds a dynamic GNN to evolve a heterogeneous reasoning graph for theorem selection, but the QA dataset experiments leave the central mechanism under-tested.

The core idea is to represent multi-step reasoning as an evolving graph with nodes for conditions, theorems, and conclusions, then use a GNN to encode the current state and pick the next theorem in a closed loop with the LLM. That combination is the actual new piece; prior work has used graphs or GNNs for reasoning, but the explicit dynamic evolution tied to theorem selection is a distinct framing here. The paper does a reasonable job laying out the heterogeneous graph construction and the semantic matching step for selection, which at least gives a concrete architecture to discuss. Credit for trying to make the intermediate states more inspectable than pure chain-of-thought prompting.

The main soft spot is the evaluation. The abstract and stress-test note both point to standard QA datasets, which rarely come with an explicit corpus of theorems or logical rules to select from. If the paper does not show a reproducible procedure for populating theorem nodes from the input questions or for ablating the GNN component against plain retrieval-plus-LLM baselines, the reported gains could easily come from the LLM side rather than the claimed graph evolution. That makes the central claim harder to accept at face value.

The work is aimed at researchers working on structured LLM reasoning and interpretability. A reader already interested in graph-augmented agents would get value from the architecture description even if the experiments need tightening. It is coherent enough on its own terms to deserve a serious referee rather than a desk reject; the idea is worth testing properly with the right benchmarks or ablations.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GraphMind, a framework that integrates dynamic Graph Neural Networks (GNNs) with Large Language Models (LLMs) for multi-step reasoning. The reasoning process is modeled as a heterogeneous evolving graph whose nodes represent conditions, theorems, and conclusions, with edges encoding logical dependencies. A GNN encodes the current state to support context-aware theorem selection via semantic matching, followed by iterative conclusion generation in a closed loop. The authors claim that experiments on various question-answering datasets demonstrate consistent performance gains and outperformance of existing baselines.

Significance. If the central empirical claim is substantiated by rigorous experiments that isolate the contribution of the dynamic graph evolution, the work could provide a structured and interpretable alternative to purely prompt-based LLM reasoning. The explicit modeling of evolving states via heterogeneous graphs addresses a recognized limitation in current approaches. However, the significance hinges on demonstrating that observed gains arise from the GNN-driven theorem selection rather than from generic LLM enhancements or retrieval components.

major comments (2)

[Abstract] Abstract: the claim that 'experiments on various question-answering (QA) datasets demonstrate that our proposed GraphMind method achieves consistent performance improvements and significantly outperforms existing baselines' is unsupported by any metrics, statistical tests, dataset names, baseline descriptions, or ablation results, rendering the central performance claim impossible to evaluate.
[Experiments] Experiments section: standard QA benchmarks (e.g., HotpotQA-style multi-hop datasets) do not supply an explicit theorem corpus or logical rules; the manuscript provides no documented procedure for dynamically populating theorem nodes or for constructing the heterogeneous graph from such data. Without this, performance gains cannot be attributed to the claimed GNN-based context-aware selection and graph evolution rather than to LLM prompting or retrieval alone.

minor comments (1)

[Method] The description of how the GNN updates the evolving graph state after each conclusion generation step would benefit from a concise algorithmic outline or pseudocode.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which highlight important areas for improving the clarity and rigor of our manuscript. We address each major comment point by point below and commit to revisions that directly respond to the concerns raised.

Referee: [Abstract] Abstract: the claim that 'experiments on various question-answering (QA) datasets demonstrate that our proposed GraphMind method achieves consistent performance improvements and significantly outperforms existing baselines' is unsupported by any metrics, statistical tests, dataset names, baseline descriptions, or ablation results, rendering the central performance claim impossible to evaluate.

Authors: We agree that the abstract presents the performance claim at a high level without sufficient concrete details. The Experiments section of the manuscript does contain the supporting information, including specific QA datasets, quantitative metrics, baseline comparisons, and ablation studies. To address this, we will revise the abstract to incorporate key details such as dataset names (e.g., HotpotQA), reported performance gains, and references to the baselines and ablations, while preserving conciseness. This change will make the central claim more directly evaluable.

Revision: yes

Referee: [Experiments] Experiments section: standard QA benchmarks (e.g., HotpotQA-style multi-hop datasets) do not supply an explicit theorem corpus or logical rules; the manuscript provides no documented procedure for dynamically populating theorem nodes or for constructing the heterogeneous graph from such data. Without this, performance gains cannot be attributed to the claimed GNN-based context-aware selection and graph evolution rather than to LLM prompting or retrieval alone.

Authors: The referee correctly notes that standard multi-hop QA datasets lack an explicit theorem corpus. In GraphMind, theorem nodes and the heterogeneous graph are constructed dynamically: the LLM extracts conditions from the query, generates candidate theorems via semantic matching against retrieved context, and evolves the graph as conclusions are produced. However, we acknowledge that the current manuscript does not document this procedure with sufficient detail or pseudocode. We will add a dedicated subsection in the revised Experiments section describing the graph construction process step by step, including how nodes and edges are populated and updated. We will also expand the ablation studies to better isolate the GNN's contribution from generic LLM prompting or retrieval effects.

Revision: yes
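
The construction procedure described in this response (extract conditions, propose and select a theorem, derive a conclusion, grow the graph, repeat) can be sketched as a plain loop. Everything below is an assumption made for illustration: the function `reasoning_loop`, its callback parameters, the node id scheme, and the rule of linking every earlier node to the new theorem are hypothetical, and the LLM and GNN calls are replaced by stub lambdas.

```python
def reasoning_loop(question, extract_conditions, propose_theorems,
                   select_theorem, derive_conclusion, is_answer, max_steps=5):
    """Iterate select -> derive -> update until an answer appears or steps run out."""
    nodes = {}  # node id -> (type, text)
    edges = []  # (src id, dst id) dependency pairs
    for i, text in enumerate(extract_conditions(question)):
        nodes[f"c{i}"] = ("condition", text)

    for step in range(max_steps):
        candidates = propose_theorems(nodes)
        if not candidates:
            break
        theorem = select_theorem(nodes, edges, candidates)
        t_id, k_id = f"t{step}", f"k{step}"
        premises = list(nodes)  # simplification: every earlier node is a premise
        nodes[t_id] = ("theorem", theorem)
        conclusion = derive_conclusion(nodes, theorem)
        nodes[k_id] = ("conclusion", conclusion)
        edges += [(p, t_id) for p in premises] + [(t_id, k_id)]
        if is_answer(conclusion):
            return conclusion, nodes, edges
    return None, nodes, edges


# Stub callbacks standing in for the LLM and the GNN-based selector.
answer, nodes, edges = reasoning_loop(
    "If a = 2 and b = 3, what is a + b?",
    extract_conditions=lambda q: ["a = 2", "b = 3"],
    propose_theorems=lambda nodes: ["addition"],
    select_theorem=lambda nodes, edges, cands: cands[0],
    derive_conclusion=lambda nodes, thm: "a + b = 5",
    is_answer=lambda text: text.startswith("a + b"),
)
```

Passing the components as callbacks keeps the loop testable without a model, and it makes the ablation the referee asks for straightforward: swap `select_theorem` for a retrieval-only selector and rerun.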

Circularity Check

0 steps flagged

No circularity: framework and claims rest on external experiments and standard components

The paper proposes GraphMind by describing a heterogeneous evolving graph (nodes for conditions/theorems/conclusions, edges for logical dependencies) encoded via GNN plus semantic matching for theorem selection, then reports performance gains on QA datasets. No equations, fitted parameters, or derivations are presented that reduce by construction to the inputs themselves. The central performance claim is tied to experimental results on external benchmarks rather than self-definition or self-citation chains. The derivation chain is therefore self-contained against the stated assumptions and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on the domain assumption that reasoning states can be usefully represented as heterogeneous evolving graphs and that GNN encoding of those graphs yields better theorem selection than standard LLM methods.

axioms (1)

domain assumption: The reasoning process can be effectively modeled as a heterogeneous evolving graph where nodes represent conditions, theorems, and conclusions, and edges capture logical dependencies.
This modeling choice is invoked as the foundation for context-aware selection and iterative generation.

invented entities (1)

Dynamic GNN for evolving reasoning state (no independent evidence)
purpose: To encode the current reasoning graph and support semantic theorem selection in a closed loop
Introduced as the core novel component of the framework without external independent evidence cited in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: "models the reasoning process as a heterogeneous evolving graph, where nodes represent conditions, theorems, and conclusions, while edges capture logical dependencies"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "By encoding the current reasoning state with GNN and leveraging semantic matching for theorem selection"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 8 internal anchors

[1] Faysal Abdaljalil, Kewen Xu, Yansen Wang, Hongming Zhang, Xiang Zhang, Yuning Zhang, and Muhao Chen. 2025. Theorem-of-Thought: A Multi-Agent Framework for Theorem Reasoning with Language Models. arXiv preprint arXiv:2506.07106 (2025)

[2] Ahmed Abdeljalil, John Smith, and Li Zhao. 2023. Theorem-of-Thought: Reasoning with Language Models through Theorem-Guided Agents. In Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press, 3456–3463

[3] Mohamed Abdeljalil et al. 2023. Theorem-Guided Reasoning with Graph Neural Networks. In ACL

[4] Anthropic. 2023. Claude: Constitutional AI. https://www.anthropic.com/index/claude. Accessed: 2025-07-15

[5] Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan R Routledge, et al. 2021. FinQA: A Dataset of Numerical Reasoning over Financial Data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 3697–3711

[6] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021)

[7] Google DeepMind. 2023. Gemini: Our Most Capable and General AI Yet. https://deepmind.google/technologies/gemini. Accessed: 2025-07-15

[8] Shizhe Diao, Pengcheng Wang, Yong Lin, Rui Pan, Xiang Liu, and Tong Zhang

[9] Active Prompting with Chain-of-Thought for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1330–1350

[10] Neel Guha, Julian Nyarko, Daniel Ho, Christopher Ré, Adam Chilton, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel Rockmore, Diego Zambrano, et al. 2023. Legalbench: A collaboratively built benchmark for measuring legal reasoning in large language models. Advances in neural information processing systems 36 (2023), 44123–44279

[11] Kevin Han, Nidhi Tandon, Peter West, Yejin Yang, and Hannaneh Hajishirzi. 2021. ProofWriter: Generating and Explaining Implicit Knowledge. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 879–894

[12] Zhengbao Han, Yichong Xie, Mihai Surdeanu, Peter Clark, Matt Gardner, and Hannaneh Hajishirzi. 2021. ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language. arXiv preprint arXiv:2105.10823 (2021)

[13] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. 2024. GPT-4o system card. arXiv preprint arXiv:2410.21276 (2024)

[14] Aditya Kalyanpur, Kailash Saravanakumar, Victor Barres, Jennifer Chu-Carroll, David Melville, and David A Ferrucci. 2024. LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic. CoRR (2024)

[15] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. arXiv preprint arXiv:2205.11916 (2022)

[16] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv preprint arXiv:2005.11401 (2020)

[17] Aitor Lewkowycz, Aitor Lewkowycz, Barret Zoph, Daniel M. Freeman, Adams Yu, Yanping Zhao, Xinyun Chen, Sharan Narang, Zihang Dai, Aakanksha Chowdhery, et al. 2022. Solving Quantitative Reasoning Problems with Language Models. arXiv preprint arXiv:2206.14858 (2022)

[18] Jing Ma, Hui Lee, and Ming Wang. 2024. Graph-of-Thought: Reasoning with Language Models through Graph-Based Multi-Path Planning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics

[19] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS)

[20] OpenAI. 2023. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774 (2023)

[21] Yuanhang Tian, Xiang Li, Chuanqi Tan, Shikun Yu, Songfang Zhang, and Fei Chen. 2023. Graph Neural Prompting with Large Language Models. arXiv preprint arXiv:2309.15427 (2023)

[22] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, and Denny Zhou

[23] Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv preprint arXiv:2203.11171 (2022)

[24] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. arXiv preprint arXiv:2201.11903 (2022)

[25] Chenghao Yang, Yuzhong Chen, Xinyun Liu, Yuntian Cao, Bill Yuchen Lin, Xipeng Qiu, Jing Liu, Haixun Shi, and Xiang Ren. 2023. ProofNet: Autoformalizing and Proving under Theorem Libraries. arXiv preprint arXiv:2305.14342 (2023)

[26] Xinyu Yang, Zhiyuan Liu, Yixin Chen, et al. 2023. ProofNet: Neural Theorem Proving with Structured Neural Networks. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 1234–1245

[27] Li Yao, Hao Chen, and Wei Sun. 2023. GraphProgram: Program Synthesis Over Graphs for Neural Reasoning. In Proceedings of the 40th International Conference on Machine Learning (ICML). PMLR

[28] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems 36 (2023), 11809–11822
