Decoupled Travel Planning with Behavior Forest
Pith reviewed 2026-05-09 22:48 UTC · model grok-4.3
The pith
Structuring travel planning as a forest of behavior trees lets LLMs reason over local subtasks while coordinating globally.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Behavior sequences form the foundation of travel planning, yet each step is shaped by both local constraints inside its subtask and global constraints that span many subtasks. Prior methods collapse all constraints into one decision space, forcing the model to reason over the full entanglement at every step. The Behavior Forest instead builds a collection of parallel behavior trees, assigns each tree to one subtask, embeds large language models inside the tree nodes to produce subplans from local constraints alone, and adds an explicit global coordination mechanism that passes feedback between trees. The trees supply control structure while the LLMs supply generation, decoupling the problem.
What carries the argument
The Behavior Forest: a collection of parallel behavior trees, each responsible for one subtask, with large language models at the nodes performing localized reasoning and a global coordination mechanism that orchestrates interactions across trees.
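To make the division of labor concrete, here is a minimal Python sketch of the control loop the abstract describes. Everything in it is an illustrative reconstruction rather than the authors' code: `BehaviorTree`, `tick`, `plan_with_forest`, and the `llm` and `check_global` callables are all assumed names.

```python
# Hedged reconstruction of the Behavior Forest loop described above.
# All names are illustrative assumptions, not the authors' actual API.
from dataclasses import dataclass


@dataclass
class BehaviorTree:
    """One tree per subtask (e.g. lodging, transport, dining)."""
    subtask: str
    local_constraints: list[str]

    def tick(self, feedback: list[str], llm) -> dict:
        # The LLM at this node reasons only over local constraints
        # plus coordination feedback addressed to this subtask.
        prompt = {"subtask": self.subtask,
                  "constraints": self.local_constraints + feedback}
        return llm(prompt)  # returns a candidate subplan as a dict


def plan_with_forest(trees, llm, check_global, max_rounds=5):
    """Tick each tree on local context, then let a global pass detect
    cross-tree conflicts and route feedback to the offending trees."""
    feedback = {t.subtask: [] for t in trees}
    subplans = {}
    for _ in range(max_rounds):
        subplans = {t.subtask: t.tick(feedback[t.subtask], llm) for t in trees}
        conflicts = check_global(subplans)  # e.g. budget or date overlaps
        if not conflicts:
            return subplans  # coherent global itinerary
        for subtask, message in conflicts:
            feedback[subtask].append(message)  # only these trees revise
    return subplans  # best effort after max_rounds
```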
If this is right
- Each subtask can be planned with constraints that act only inside its own tree, avoiding the need to track distant global rules at every step.
- Coordination feedback lets one tree revise an earlier decision when later trees reveal conflicts, without restarting the entire plan (a budget-check sketch follows this list).
- The explicit tree structure supplies a readable trace of how local decisions combine into a global itinerary.
- LLM generation is guided at each node by the tree's local context rather than by a single undifferentiated prompt.
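The second bullet is the crux, so here is a hedged example of what a global check might look like: a budget cap that spans several subtasks and so cannot be enforced by any single tree. The `cost` field and the pick-the-costliest-subtask heuristic are assumptions for illustration, written to plug into `plan_with_forest` above.

```python
# Illustrative global check: a budget constraint spans several subtasks,
# so it lives in the coordinator, not in any single tree. The `cost`
# field and the revision heuristic are assumptions, not from the paper.
def check_global(subplans: dict[str, dict], budget: float = 1500.0):
    conflicts = []
    total = sum(p.get("cost", 0.0) for p in subplans.values())
    if total > budget:
        # Feedback targets the costliest subtask; only that tree
        # revises, instead of restarting the entire plan.
        worst = max(subplans, key=lambda s: subplans[s].get("cost", 0.0))
        conflicts.append(
            (worst, f"total cost {total} exceeds budget {budget}; reduce cost"))
    return conflicts
```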
Where Pith is reading between the lines
- The same forest structure could be applied to other multi-stage planning domains such as project scheduling or supply-chain routing where local and global constraints also interact.
- If coordination feedback is made richer, the method might support incremental replanning when new constraints appear mid-execution.
- Embedding smaller specialized models inside individual trees rather than a single large model might further reduce per-step cost while preserving the decoupling benefit.
Load-bearing premise
That the global coordination mechanism can reliably resolve interactions among subtasks without adding overhead that offsets the benefits of local decoupling, and that LLMs embedded in tree nodes can generate coherent subplans conditioned only on task-specific constraints plus coordination feedback.
What would settle it
A controlled ablation that disables the coordination layer entirely while keeping the same forest structure and LLM nodes, then measures whether plan success rate on TravelPlanner or ChinaTravel falls back to the level of entangled baselines.
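Under the same illustrative names as the sketches above (reusing `plan_with_forest`), this ablation is a one-line swap: replace the coordinator with a check that never emits conflicts. `build_forest` and `evaluate` are hypothetical harness hooks, not benchmark APIs.

```python
# Hedged sketch of the proposed ablation: identical forest and LLM nodes,
# with the coordination layer disabled. build_forest/evaluate are
# hypothetical harness hooks, not benchmark APIs.
def no_coordination(subplans):
    return []  # never reports conflicts, so no feedback ever flows


def ablate_coordination(tasks, build_forest, llm, check_global, evaluate):
    scores = {"with_coordination": [], "without_coordination": []}
    for task in tasks:
        full = plan_with_forest(build_forest(task), llm, check_global)
        scores["with_coordination"].append(evaluate(task, full))
        ablated = plan_with_forest(build_forest(task), llm, no_coordination)
        scores["without_coordination"].append(evaluate(task, ablated))
    # Mean success rate per condition; a drop to baseline levels would
    # attribute the reported gains to the coordination layer.
    return {k: sum(v) / len(v) for k, v in scores.items()}
```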
Original abstract
Behavior sequences, composed of executable steps, serve as the operational foundation for multi-constraint planning problems such as travel planning. In such tasks, each planning step is not only constrained locally but also influenced by global constraints spanning multiple subtasks, leading to a tightly coupled and complex decision process. Existing travel planning methods typically rely on a single decision space that entangles all subtasks and constraints, failing to distinguish between locally acting constraints within a subtask and global constraints that span multiple subtasks. Consequently, the model is forced to jointly reason over local and global constraints at each decision step, increasing the reasoning burden and reducing planning efficiency. To address this problem, we propose the Behavior Forest method. Specifically, our approach structures the decision-making process into a forest of parallel behavior trees, where each behavior tree is responsible for a subtask. A global coordination mechanism is introduced to orchestrate the interactions among these trees, enabling modular and coherent travel planning. Within this framework, large language models are embedded as decision engines within behavior tree nodes, performing localized reasoning conditioned on task-specific constraints to generate candidate subplans and adapt decisions based on coordination feedback. The behavior trees, in turn, provide an explicit control structure that guides LLM generation. This design decouples complex tasks and constraints into manageable subspaces, enabling task-specific reasoning and reducing the cognitive load of LLM. Experimental results show that our method outperforms state-of-the-art methods by 6.67% on the TravelPlanner and by 11.82% on the ChinaTravel benchmarks, demonstrating its effectiveness in increasing LLM performance for complex multi-constraint travel planning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Behavior Forest approach for travel planning tasks. It decouples planning into a forest of parallel behavior trees, each managing a subtask with local constraints, while a global coordination mechanism handles interactions across trees. LLMs serve as decision engines in the tree nodes, generating subplans based on task-specific constraints and coordination feedback. The paper claims this reduces the reasoning burden on LLMs and reports empirical improvements of 6.67% on the TravelPlanner benchmark and 11.82% on the ChinaTravel benchmark over state-of-the-art methods.
Significance. If the results hold and the gains are due to the proposed decoupling rather than extraneous factors, the Behavior Forest method could provide a valuable framework for enhancing LLM performance in multi-constraint planning problems by offering modular control structures. The explicit use of behavior trees for guiding LLM generation is a promising direction for addressing complex decision processes.
major comments (2)
- [Method] The global coordination mechanism is introduced to orchestrate interactions among the parallel behavior trees, but the manuscript provides no algorithm, pseudocode, or detailed rules for how conflicts are resolved or how feedback is integrated to maintain coherence. This is a load-bearing omission for the central claim that the approach enables modular and coherent planning without re-entangling constraints.
- [Experiments] The experimental results claim specific percentage improvements (6.67% on TravelPlanner, 11.82% on ChinaTravel) but lack details on the baselines compared against, the precise metrics used to compute these gains, statistical significance, or controls for variables such as total LLM inference budget and prompt variations. Without these, the attribution of gains to the decoupling mechanism cannot be verified.
minor comments (1)
- [Abstract] The abstract refers to 'state-of-the-art methods' without naming them, which reduces clarity for readers unfamiliar with the specific benchmarks.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and completeness.
Point-by-point responses
-
Referee: [Method] The global coordination mechanism is introduced to orchestrate interactions among the parallel behavior trees, but the manuscript provides no algorithm, pseudocode, or detailed rules for how conflicts are resolved or how feedback is integrated to maintain coherence. This is a load-bearing omission for the central claim that the approach enables modular and coherent planning without re-entangling constraints.
Authors: We acknowledge that the current description of the global coordination mechanism, while present in Section 3.2, would benefit from greater formalization. The mechanism uses priority-based merging of subplans and iterative feedback loops to resolve conflicts (e.g., date overlaps or budget violations) without re-entangling all constraints. To address this directly, we will add a dedicated algorithm box with pseudocode and explicit rules for conflict resolution and feedback integration in the revised manuscript. revision: yes
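The rebuttal names priority-based merging without giving the rule, so the following is only one plausible reading: merge subplans in a fixed priority order and convert each clash into feedback for the lower-priority tree. The priority list, the `conflicts_with` test, and the plan fields are all assumptions pending the promised algorithm box.

```python
# One plausible reading of "priority-based merging of subplans", sketched
# under stated assumptions; the actual rule is deferred to the revision.
def merge_by_priority(subplans, priority, conflicts_with):
    merged, feedback = {}, []
    for subtask in priority:           # e.g. ["transport", "lodging", "dining"]
        plan = subplans[subtask]
        clash = next((s for s in merged if conflicts_with(plan, merged[s])), None)
        if clash is None:
            merged[subtask] = plan     # accepted into the itinerary
        else:
            # The lower-priority subplan yields; its tree receives feedback
            # (e.g. a date overlap or budget violation) and re-plans.
            feedback.append((subtask, f"conflicts with accepted {clash} subplan"))
    return merged, feedback
```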
-
Referee: [Experiments] The experimental results claim specific percentage improvements (6.67% on TravelPlanner, 11.82% on ChinaTravel) but lack details on the baselines compared against, the precise metrics used to compute these gains, statistical significance, or controls for variables such as total LLM inference budget and prompt variations. Without these, the attribution of gains to the decoupling mechanism cannot be verified.
Authors: We agree that additional experimental details are necessary for full verifiability. The reported gains are computed using the standard success-rate and constraint-satisfaction metrics from the TravelPlanner and ChinaTravel benchmarks, relative to the strongest prior baselines. In the revision we will include an expanded results table with all baseline scores, paired statistical significance tests across multiple runs, and new ablation experiments that control for total LLM inference budget and prompt variations to better isolate the effect of the Behavior Forest decoupling. revision: yes
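For readers unfamiliar with the benchmarks, the metrics named here typically decompose into two aggregation styles, sketched below. This is a generic illustration, not the benchmarks' official scoring code.

```python
# Generic sketch of the two aggregation styles behind "success rate" and
# "constraint satisfaction": micro averages over individual constraints,
# macro counts a plan as passing only if every constraint passes.
def pass_rates(results: list[list[bool]]) -> dict[str, float]:
    total_constraints = sum(len(r) for r in results)
    micro = sum(sum(r) for r in results) / total_constraints
    macro = sum(all(r) for r in results) / len(results)
    return {"micro_pass_rate": micro, "macro_success_rate": macro}


# Example: plan 1 meets all 3 constraints, plan 2 meets 1 of 2.
print(pass_rates([[True, True, True], [True, False]]))
# -> {'micro_pass_rate': 0.8, 'macro_success_rate': 0.5}
```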
Circularity Check
No significant circularity; empirical claims rest on benchmark comparisons without self-referential reductions
Full rationale
The paper proposes a Behavior Forest architecture for decoupling travel planning into parallel behavior trees with a global coordinator, embedding LLMs for localized reasoning. No mathematical derivations, equations, or first-principles predictions are present in the provided text. The central claims consist of a descriptive method outline followed by reported empirical gains (6.67% on TravelPlanner, 11.82% on ChinaTravel) against baselines. These are external benchmark results rather than quantities fitted or defined in terms of themselves. No self-citations, ansatzes, or uniqueness theorems are invoked to justify core components, and the coordination mechanism is presented as an explicit design choice without reducing to a tautology or prior self-result. The derivation chain is therefore self-contained as an engineering proposal validated externally.