Faithful Reasoning Using Large Language Models
6 Pith papers cite this work.
citing papers explorer
- Visual Perceptual to Conceptual First-Order Rule Learning Networks: γILP is a differentiable pipeline for inducing first-order rules from unlabeled image data, with strong performance on symbolic relational datasets, relational images, and pure image datasets such as Kandinsky patterns.
- Measuring Faithfulness in Chain-of-Thought Reasoning: Chain-of-Thought reasoning in LLMs is often unfaithful; reliance on it varies by task and decreases as models grow larger.
- When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel: CoT traces align with internal answer commitment in only 61.9% of steps on average, and are dominated by confabulated continuations after commitment has stabilized.
- LLM Reasoning Is Latent, Not the Chain of Thought: LLM reasoning is primarily mediated by latent-state trajectories rather than by explicit surface chain-of-thought outputs.
- Learning to Draw ASCII Improves Spatial Reasoning in Language Models: Training LLMs on text-to-ASCII spatial layout construction improves text-only spatial reasoning and transfers to external benchmarks.