{"work":{"id":"07adb06e-4ed5-4ec5-a7ae-ff288fd214fb","openalex_id":null,"doi":null,"arxiv_id":"2305.10601","raw_key":null,"title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","authors":null,"authors_text":"Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao","year":2023,"venue":"cs.CL","abstract":"Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. 
Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.","external_url":"https://arxiv.org/abs/2305.10601","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-15T00:08:21.733305+00:00","pith_arxiv_id":"2305.10601","created_at":"2026-05-08T17:08:34.380555+00:00","updated_at":"2026-05-15T00:08:21.733305+00:00","title_quality_ok":true,"display_title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","render_title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models"},"hub":{"state":{"work_id":"07adb06e-4ed5-4ec5-a7ae-ff288fd214fb","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":48,"external_cited_by_count":null,"distinct_field_count":11,"first_pith_cited_at":"2023-04-22T20:34:03+00:00","last_pith_cited_at":"2026-05-13T15:48:16+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-05-15T03:47:26.727745+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":4}],"polarity_counts":[{"context_polarity":"background","n":4}],"runs":{"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T16:32:39.174373+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":15},{"title":"Reflexion: Language Agents with Verbal Reinforcement Learning","work_id":"778f739e-5f55-4961-8a2a-e4736a2757f4","shared_citers":14},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":12},{"title":"Self-Refine: 
Iterative Refinement with Self-Feedback","work_id":"59181e7f-e58e-45d3-8146-4477a9f53d5a","shared_citers":12},{"title":"Self-Consistency Improves Chain of Thought Reasoning in Language Models","work_id":"8c6d5a6b-b5cc-4105-9c84-9c34bb9375bb","shared_citers":11},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":10},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":10},{"title":"Toolformer: Language Models Can Teach Themselves to Use Tools","work_id":"9bce40c8-cfd7-4983-80e0-c3bd4402322a","shared_citers":9},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation","work_id":"92b7eb9c-c3d8-4518-a376-06fa15dd895b","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face","work_id":"f20ed1da-2676-4598-a11b-54549718735b","shared_citers":7},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":7},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":6},{"title":"The Rise and Potential of Large Language Model Based Agents: A Survey","work_id":"985ca219-7e34-4c4f-bdc5-ccd39763ad61","shared_citers":6},{"title":"AgentBench: Evaluating LLMs as Agents","work_id":"a37549b4-4c94-412d-acc4-4efeb08509be","shared_citers":5},{"title":"arXiv preprint arXiv:2305.18323","work_id":"d9334986-6f58-4143-afac-98d9519e7b4e","shared_citers":5},{"title":"A survey on large language model based autonomous 
agents","work_id":"47f7e8a3-3732-4530-b412-d9c984ce99ed","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"Large Language Models Cannot Self-Correct Reasoning Yet","work_id":"f63b261b-ef16-40f5-993b-9d37b1a51b92","shared_citers":5},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":5},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":5},{"title":"Self-Instruct: Aligning Language Models with Self-Generated Instructions","work_id":"d0018767-775d-406e-861d-539ed681ff73","shared_citers":5}],"time_series":[{"n":6,"year":2023},{"n":5,"year":2024},{"n":32,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T16:32:41.723993+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T16:32:46.613261+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","claims":[{"claim_text":"Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. 
This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoug","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Tree of Thoughts: Deliberate Problem Solving with Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T16:32:41.726588+00:00"}},"summary":{"title":"Tree of Thoughts: Deliberate Problem Solving with Large Language Models","claims":[{"claim_text":"Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. 
To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoug","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Tree of Thoughts: Deliberate Problem Solving with Large Language Models because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"ReAct: Synergizing Reasoning and Acting in Language Models","work_id":"407a2351-25f1-497d-b611-f77d0292a8e6","shared_citers":15},{"title":"Reflexion: Language Agents with Verbal Reinforcement Learning","work_id":"778f739e-5f55-4961-8a2a-e4736a2757f4","shared_citers":14},{"title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models","work_id":"d1cf6693-a082-403c-ada9-dac7b96341f9","shared_citers":12},{"title":"Self-Refine: Iterative Refinement with Self-Feedback","work_id":"59181e7f-e58e-45d3-8146-4477a9f53d5a","shared_citers":12},{"title":"Self-Consistency Improves Chain of Thought Reasoning in Language Models","work_id":"8c6d5a6b-b5cc-4105-9c84-9c34bb9375bb","shared_citers":11},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":10},{"title":"Voyager: An Open-Ended Embodied Agent with Large Language Models","work_id":"ffe0d207-86cf-4742-a100-e988ac8b9676","shared_citers":10},{"title":"Toolformer: Language Models Can Teach Themselves to Use Tools","work_id":"9bce40c8-cfd7-4983-80e0-c3bd4402322a","shared_citers":9},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":8},{"title":"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation","work_id":"92b7eb9c-c3d8-4518-a376-06fa15dd895b","shared_citers":7},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement 
Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":7},{"title":"HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face","work_id":"f20ed1da-2676-4598-a11b-54549718735b","shared_citers":7},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":7},{"title":"Measuring Mathematical Problem Solving With the MATH Dataset","work_id":"50652ac6-fb7c-4675-a2c2-159c241feb17","shared_citers":6},{"title":"The Rise and Potential of Large Language Model Based Agents: A Survey","work_id":"985ca219-7e34-4c4f-bdc5-ccd39763ad61","shared_citers":6},{"title":"AgentBench: Evaluating LLMs as Agents","work_id":"a37549b4-4c94-412d-acc4-4efeb08509be","shared_citers":5},{"title":"arXiv preprint arXiv:2305.18323","work_id":"d9334986-6f58-4143-afac-98d9519e7b4e","shared_citers":5},{"title":"A survey on large language model based autonomous agents","work_id":"47f7e8a3-3732-4530-b412-d9c984ce99ed","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":5},{"title":"Language Models are Few-Shot Learners","work_id":"214732c0-2edd-44a0-af9e-28184a2b8279","shared_citers":5},{"title":"Large Language Models Cannot Self-Correct Reasoning Yet","work_id":"f63b261b-ef16-40f5-993b-9d37b1a51b92","shared_citers":5},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":5},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":5},{"title":"Self-Instruct: Aligning Language Models with Self-Generated Instructions","work_id":"d0018767-775d-406e-861d-539ed681ff73","shared_citers":5}],"time_series":[{"n":6,"year":2023},{"n":5,"year":2024},{"n":32,"year":2026}],"dependency_candidates":[]},"authors":[]}}