Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

Xinyuan Song; Zekun Cai

arxiv: 2606.27806 · v1 · pith:3XOLTHNInew · submitted 2026-06-26 · 💻 cs.AI

Pith reviewed 2026-06-29 04:43 UTC · model grok-4.3

keywords: world models · LLM agents · hallucination · planning · parameterized models · consistency gate · iterative planning · graph benchmarks

The pith

A small parameterized backbone grounds LLM drafts via consistency checks to cut hallucinated states in planning agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that pure LLM-based world models produce hallucinated state changes that resist ordinary regression scoring, while parameterized models yield measurable errors but lack flexible standalone planning. GILP trains only a compact backbone to supply actions, state-delta predictions, risk, and value estimates, then inserts a consistency gate that forces LLM revisions when drafts disagree with the backbone. This hybrid cuts hallucinated-state rates sharply and lifts task success on graph benchmarks while adding limited extra calls. A sympathetic reader cares because hallucination propagation is a core obstacle to reliable sequential decision-making by language agents.

Core claim

GILP trains only a small parameterized backbone and combines it with API-based agent reasoning. The backbone supplies valid actions, predicted state deltas, risk, and value; the LLM drafts an action and imagined delta; and a consistency gate asks for revision when the two disagree. On real GPT-4o-mini calls, GILP reduces hallucinated-state rate from 0.176 to 0.035. In calibrated simulator ablations, it raises success from 0.668 to 0.838 while adding only ~22% extra LLM calls.
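The draft, check, and revise loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `BackboneHint` and `Draft` schemas, the `gilp_step` function, the exact-match gate rule, and the `max_revisions` cap are all hypothetical, and the LLM is replaced by a stub that mirrors the X:completed versus X:pending case from Figure 3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Delta = Dict[str, str]  # node name -> new status, e.g. {"X": "completed"}


@dataclass
class BackboneHint:
    """Backbone output for one candidate action (hypothetical schema)."""
    action: str
    valid: bool
    delta: Delta
    risk: float
    value: float


@dataclass
class Draft:
    """The LLM's proposed action and the state change it imagines."""
    action: str
    imagined_delta: Delta


def consistent(draft: Draft, hints: Dict[str, BackboneHint]) -> bool:
    """Assumed gate rule: action is valid and the imagined delta matches exactly."""
    hint = hints.get(draft.action)
    return hint is not None and hint.valid and hint.delta == draft.imagined_delta


def gilp_step(
    hints: List[BackboneHint],
    draft_fn: Callable[[List[BackboneHint], Optional[str]], Draft],
    max_revisions: int = 1,
) -> Tuple[Draft, int]:
    """Return the final draft and the number of LLM calls spent on this step."""
    by_action = {h.action: h for h in hints}
    draft = draft_fn(hints, None)
    calls = 1
    # Re-prompt only on disagreement, up to max_revisions extra calls.
    while not consistent(draft, by_action) and calls <= max_revisions:
        hint = by_action.get(draft.action)
        expected = hint.delta if hint else "no such candidate action"
        feedback = (
            f"Draft {draft.action!r} imagined {draft.imagined_delta}; "
            f"backbone predicts {expected}."
        )
        draft = draft_fn(hints, feedback)
        calls += 1
    # What to do if the last draft still disagrees is not specified here.
    return draft, calls


def fake_llm(hints: List[BackboneHint], feedback: Optional[str]) -> Draft:
    """Stub LLM: hallucinates a completion first, corrects itself after feedback."""
    if feedback is None:
        return Draft("run_X", {"X": "completed"})
    return Draft("run_X", {"X": "pending"})


hints = [BackboneHint("run_X", True, {"X": "pending"}, risk=0.1, value=0.6)]
accepted, calls = gilp_step(hints, fake_llm)
print(accepted, calls)
```

Because the gate only triggers a second call on disagreement, the extra LLM cost scales with how often drafts and backbone diverge, which is consistent with the modest overhead the paper reports.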

What carries the argument

The consistency gate that compares an LLM-drafted state delta against the parameterized backbone's prediction and triggers revision on mismatch.

If this is right

Hallucinated-state rate drops from 0.176 to 0.035 on real GPT-4o-mini calls.
Planning success rises from 0.668 to 0.838 in calibrated simulator ablations.
The hybrid adds only about 22 percent extra LLM calls.
The method applies across four graph-structured planning benchmarks.
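The headline figures above can be turned into relative changes with a few lines of arithmetic. The sketch below uses only the numbers quoted in the list. The calls-per-success comparison is an added assumption: it normalizes the baseline to one LLM call per task and treats the ~22% figure as applying per task. Note that the hallucinated-state figures come from real GPT-4o-mini calls, while the success and call figures come from simulator ablations.

```python
# Figures quoted in the summary above.
hsr_agent, hsr_gilp = 0.176, 0.035   # hallucinated-state rate, real GPT-4o-mini calls
sr_base, sr_gilp = 0.668, 0.838      # success rate, calibrated simulator ablations
extra_calls = 0.22                   # approximate extra LLM calls with GILP

# Relative reduction in hallucinated-state rate.
hsr_rel_reduction = (hsr_agent - hsr_gilp) / hsr_agent

# Absolute and relative gain in success rate.
sr_abs_gain = sr_gilp - sr_base
sr_rel_gain = sr_abs_gain / sr_base

# Assumption: baseline cost is normalized to 1 call per task.
calls_per_success_base = 1.0 / sr_base
calls_per_success_gilp = (1.0 + extra_calls) / sr_gilp

print(f"HSR relative reduction: {hsr_rel_reduction:.1%}")
print(f"SR gain: {sr_abs_gain:+.3f} absolute, {sr_rel_gain:.1%} relative")
print(f"Calls per success: {calls_per_success_base:.3f} -> {calls_per_success_gilp:.3f}")
```

Under that normalization, the cost per successful task does not rise despite the extra calls, because the success rate grows slightly faster than the call count.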

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same consistency mechanism could extend planning horizons before errors accumulate.
A sufficiently accurate backbone might allow smaller or cheaper LLMs to reach comparable reliability.
Grounding via external predictions offers a route to safer agent behavior in interactive settings.
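The first extension above, on planning horizons, can be made concrete with a simple model. This sketch assumes the reported hallucinated-state rates are per-step probabilities and that steps fail independently. Neither assumption is stated in the summary, so the output is an illustration of how errors compound, not a reproduction of the paper's empirical curves. The function name `p_any` is hypothetical.

```python
def p_any(per_step_rate: float, horizon: int) -> float:
    """Probability of at least one hallucinated state in `horizon` independent steps."""
    return 1.0 - (1.0 - per_step_rate) ** horizon


# Compare the agent-only and GILP rates quoted in the summary across horizons.
for horizon in (1, 5, 10, 20):
    agent_only = p_any(0.176, horizon)
    grounded = p_any(0.035, horizon)
    print(f"H={horizon:2d}  agent-only={agent_only:.3f}  GILP={grounded:.3f}")
```

Under these assumptions, the gap between the two curves widens as the horizon grows before both saturate toward 1, which is the sense in which a lower per-step rate buys longer usable horizons.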

Load-bearing premise

The small parameterized backbone can reliably supply valid actions, predicted state deltas, risk, and value estimates that correctly identify when the LLM draft is inconsistent without introducing new errors or overly restricting the LLM's flexibility.

What would settle it

Running the same four graph benchmarks and GPT-4o-mini calls with the consistency gate removed and checking whether the hallucinated-state rate stays near 0.176 rather than falling to 0.035.
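The comparison described above amounts to computing the same rate over two sets of step logs. A minimal sketch follows, assuming that a hallucinated state means a step whose imagined delta differs from the environment's true delta. The paper's operational definition may differ, the function name is hypothetical, and the logs are toy data rather than results from the paper.

```python
from typing import Dict, List, Tuple

Delta = Dict[str, str]
Step = Tuple[Delta, Delta]  # (delta the agent imagined, delta that actually occurred)


def hallucinated_state_rate(steps: List[Step]) -> float:
    """Fraction of steps whose imagined delta disagrees with the true delta."""
    if not steps:
        raise ValueError("need at least one logged step")
    wrong = sum(1 for imagined, true in steps if imagined != true)
    return wrong / len(steps)


# Toy logs for the two ablation arms (illustrative only).
gate_removed: List[Step] = [
    ({"A": "completed"}, {"A": "completed"}),
    ({"X": "completed"}, {"X": "pending"}),   # false completion
    ({"B": "completed"}, {"B": "completed"}),
    ({"Y": "completed"}, {"Y": "blocked"}),   # wrong status
    ({"C": "completed"}, {"C": "completed"}),
]
gate_enabled: List[Step] = [
    ({"A": "completed"}, {"A": "completed"}),
    ({"X": "pending"}, {"X": "pending"}),     # corrected after re-prompt
    ({"B": "completed"}, {"B": "completed"}),
    ({"Y": "completed"}, {"Y": "blocked"}),   # missed by the gate
    ({"C": "completed"}, {"C": "completed"}),
]

print("gate removed:", hallucinated_state_rate(gate_removed))
print("gate enabled:", hallucinated_state_rate(gate_enabled))
```

For the test to be informative, both arms would need the same tasks, prompts, and model settings, so that the gate is the only difference between them.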

Figures

Figures reproduced from arXiv: 2606.27806 by Xinyuan Song, Zekun Cai.

**Figure 2.** Pareto frontier of success rate versus tokens per task on a log scale. Parameterized-only planning …

**Figure 3.** A false completion of node X at step 3 makes the agent-based world model take invalid actions at steps 4–6; GILP catches the inconsistency between the agent’s imagined X:completed and the parameterized backbone’s predicted X:pending, and the corrective re-prompt revises the action …

**Figure 4.** Standalone versus hybrid success for each backbone family. The MLP is weak as a planner by …

**Figure 5.** GILP ablation SR and HSR. Removing the correction gate (NoCorrGate) removes the main …

**Figure 6.** Empirical $\hat{P}_{\text{any}}(H)$ for agent-side semantic error. GILP curves lie below Hybrid-Full and agent-only curves, especially at long horizons, showing that the parameterized signal reduces hallucination propagation rather than merely improving final-task ranking.

**Figure 7.** Three planning paradigms. Agent-based world modeling uses the LLM’s own imagined state …

**Figure 8.** How the parametric skeleton $B_t$ is formatted and inserted into the agent prompt. For each candidate action the backbone predicts validity, state delta, affected entities, risk, and short-horizon value.

**Figure 9.** SR and HSR before/after GILP grounding for four LLM APIs. Green annotations: SR gain …

**Figure 10.** Left: cost–quality improvement (Agent→GILP) per API on a log-cost axis. Llama-3-8B (self-hosted) achieves comparable GILP SR to paid APIs at zero marginal token cost. Right: GILP Phase-3 correction rate and agent JSON-fail rate per API; higher correction rate mirrors higher agent-only HSR. Axis labels recovered from the extraction: benchmarks TaskGraph, ToolChain, ResourceAlloc, RepairFlow; APIs GPT-4o-mini, Claude 3 Haiku, Gemini 1.5 Flash, Llama-3-8B.

**Figure 11.** HSR heatmaps per API × benchmark. Left: agent-only HSR; right: GILP HSR reduction (%). GPT-4o-mini achieves > 85% HSR reduction on every benchmark; other APIs achieve 62–69% consistently across all four benchmarks.

**Figure 12.** SR and HSR for Agent-only vs. GILP on FB15k-237 multi-hop KG traversal. HSR annotations …

Original abstract

World models for language agents come in two useful forms. An agent-based world model calls an LLM API and reasons flexibly in language, but its errors appear as hallucinated state changes that are hard to score with ordinary regression losses. A parameterized world model is a trained transition predictor; its errors are easier to measure with quantities such as NodeMSE, delta accuracy, and validity accuracy, but it is usually weaker as a standalone planner. We compare these two families on four graph-structured planning benchmarks and introduce operational hallucination metrics for the agent-based case. The comparison motivates Grounded Iterative Language Planning (GILP), which trains only a small parameterized backbone and combines it with API-based agent reasoning. The backbone supplies valid actions, predicted state deltas, risk, and value; the LLM drafts an action and imagined delta; and a consistency gate asks for revision when the two disagree. On real GPT-4o-mini calls, GILP reduces hallucinated-state rate from 0.176 to 0.035. In calibrated simulator ablations, it raises success from 0.668 to 0.838 while adding only ~22% extra LLM calls.
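The abstract names three backbone metrics without defining them in this summary. The sketch below shows one plausible reading, and every definition in it is an assumption rather than the paper's: NodeMSE as mean squared error over per-node state values, delta accuracy as the exact-match rate over predicted deltas, and validity accuracy as the rate of correct valid/invalid calls. The function names and toy inputs are hypothetical.

```python
from typing import List, Sequence


def node_mse(pred: Sequence[Sequence[float]], true: Sequence[Sequence[float]]) -> float:
    """Mean squared error over every node value in every example (assumed definition)."""
    total, count = 0.0, 0
    for pred_row, true_row in zip(pred, true):
        for p, t in zip(pred_row, true_row):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("no node values to score")
    return total / count


def exact_match_rate(pred: List, true: List) -> float:
    """Fraction of positions where prediction equals ground truth."""
    if not true or len(pred) != len(true):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Toy backbone outputs versus ground truth for two transitions.
pred_states = [[0.0, 1.0], [1.0, 1.0]]
true_states = [[0.0, 1.0], [1.0, 0.0]]
pred_deltas = [{"A": "completed"}, {"B": "completed"}]
true_deltas = [{"A": "completed"}, {"B": "pending"}]
pred_valid = [True, False]
true_valid = [True, False]

print("NodeMSE:", node_mse(pred_states, true_states))
print("delta accuracy:", exact_match_rate(pred_deltas, true_deltas))      # assumed: exact match
print("validity accuracy:", exact_match_rate(pred_valid, true_valid))
```

All three are scored against ground-truth transitions, which is what makes the parameterized model's errors easy to measure compared with free-form LLM state descriptions.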

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

GILP pairs a small parameterized backbone with LLM drafts and a consistency gate to cut hallucinated states on real GPT-4o-mini calls, but the backbone's standalone accuracy is not reported.

The main takeaway is that GILP trains a small parameterized world model and uses it to check LLM-generated actions and state deltas, dropping the hallucinated-state rate from 0.176 to 0.035 on actual GPT-4o-mini calls and lifting success from 0.668 to 0.838 in simulator ablations at roughly 22 percent extra cost.

The paper does a few things cleanly. It separates the two world-model styles, defines usable operational metrics for hallucination in the LLM case, and shows the hybrid beats either approach alone on four graph benchmarks. The consistency gate is simple: the backbone supplies valid actions, predicted deltas, risk, and value; the LLM proposes a step; disagreement triggers revision. The real-API results and modest overhead are the strongest evidence.

The soft spot is the untested reliability of the backbone itself. The method assumes its predictions are accurate enough to catch real LLM errors without high false positives or new mistakes that shrink the search space. The abstract gives no standalone numbers for the backbone's validity accuracy, delta accuracy, or false-positive rate on LLM drafts, so it is hard to tell whether the gains come from better grounding or from an aggressive filter. The stress-test concern about calibration stands until those figures appear. Error bars and fuller ablation controls are also missing from the summary.

This is aimed at people working on reliable LLM agents. It has enough concrete empirical grounding and clear thinking to deserve peer review, though the methods and backbone details will need close checking.

Referee Report

2 major / 1 minor

Summary. The paper claims that a hybrid approach called Grounded Iterative Language Planning (GILP) — training only a small parameterized backbone to supply valid actions, predicted state deltas, risk, and value estimates, then using these in a consistency gate to revise LLM action drafts — reduces hallucinated-state rates from 0.176 to 0.035 on real GPT-4o-mini calls and raises success rates from 0.668 to 0.838 in calibrated simulator ablations on four graph-structured planning benchmarks, at the cost of only ~22% extra LLM calls. It also introduces operational hallucination metrics for agent-based world models.

Significance. If substantiated, the result would show that lightweight parameterized world models can effectively ground flexible LLM reasoning to limit hallucination propagation in planning agents, providing a practical middle ground between pure agent-based and pure parameterized approaches while keeping API overhead modest. The operational metrics for scoring hallucinated state changes in LLM agents would also be a useful addition to evaluation practices in this area.

major comments (2)

Abstract: The headline reductions (hallucinated-state rate 0.176→0.035; success 0.668→0.838) depend on the consistency gate correctly identifying LLM inconsistencies using the backbone's outputs. No standalone metrics are reported for the backbone's validity accuracy, delta accuracy, or false-positive rate when applied to LLM-generated drafts, so it is impossible to determine whether the gains arise from genuine grounding or from an overly restrictive filter that shrinks the LLM's search space.
Abstract / implied methods: The abstract references 'calibrated simulator ablations' and real API calls but supplies no dataset details, error bars, or full ablation tables, which are required to verify the robustness of the reported improvements and to isolate the contribution of the consistency gate.
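The first major comment asks for the gate's false-positive rate on LLM drafts. One way to report it is a confusion matrix over gate decisions, sketched below. This assumes each logged step records whether the gate fired and whether the draft was actually wrong according to the environment's true transition. The `GateStats` type, the function names, and the toy event counts are hypothetical and are not taken from the paper.

```python
from typing import Iterable, NamedTuple, Tuple


class GateStats(NamedTuple):
    true_pos: int   # gate fired, draft was wrong
    false_pos: int  # gate fired, draft was right
    false_neg: int  # gate silent, draft was wrong
    true_neg: int   # gate silent, draft was right


def tally(events: Iterable[Tuple[bool, bool]]) -> GateStats:
    """Count outcomes from (gate_fired, draft_was_wrong) pairs."""
    tp = fp = fn = tn = 0
    for fired, wrong in events:
        if fired and wrong:
            tp += 1
        elif fired:
            fp += 1
        elif wrong:
            fn += 1
        else:
            tn += 1
    return GateStats(tp, fp, fn, tn)


def false_positive_rate(stats: GateStats) -> float:
    """Share of correct drafts that the gate flagged anyway."""
    denom = stats.false_pos + stats.true_neg
    return stats.false_pos / denom if denom else 0.0


def recall(stats: GateStats) -> float:
    """Share of wrong drafts that the gate caught."""
    denom = stats.true_pos + stats.false_neg
    return stats.true_pos / denom if denom else 0.0


# Toy log: 10 steps (illustrative only).
events = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
stats = tally(events)
print(stats, false_positive_rate(stats), recall(stats))
```

A high false-positive rate alongside a low hallucinated-state rate would point toward the overly restrictive filter the referee describes, while a low false-positive rate with high recall would support the grounding interpretation.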

minor comments (1)

The manuscript would benefit from an explicit table comparing the parameterized backbone in isolation against the full GILP hybrid on the same metrics.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's report. We believe the concerns raised can be addressed through clarifications and additions to the manuscript, particularly in the abstract and methods sections. Below we provide point-by-point responses.

Referee: Abstract: The headline reductions (hallucinated-state rate 0.176→0.035; success 0.668→0.838) depend on the consistency gate correctly identifying LLM inconsistencies using the backbone's outputs. No standalone metrics are reported for the backbone's validity accuracy, delta accuracy, or false-positive rate when applied to LLM-generated drafts, so it is impossible to determine whether the gains arise from genuine grounding or from an overly restrictive filter that shrinks the LLM's search space.

Authors: The full manuscript reports the backbone's validity accuracy, delta accuracy, and related metrics in Section 4 along with an analysis of the consistency gate. However, we acknowledge that the abstract does not explicitly present these to contextualize the headline results. We will revise the abstract to include these metrics and add a sentence clarifying that the gate's false-positive rate on LLM drafts is evaluated in the experiments, demonstrating that improvements arise from accurate grounding. Revision: yes.

Referee: Abstract / implied methods: The abstract references 'calibrated simulator ablations' and real API calls but supplies no dataset details, error bars, or full ablation tables, which are required to verify the robustness of the reported improvements and to isolate the contribution of the consistency gate.

Authors: Dataset details appear in Section 3, error bars are included in all reported results, and full ablation tables isolating the consistency gate are in the appendix. The abstract's brevity precludes full details, but we agree this could be improved. We will update the abstract to reference the four benchmarks and ablation studies more explicitly, and ensure the main text isolates the gate's contribution. Revision: yes.

Circularity Check

0 steps flagged

No circularity; empirical results on external benchmarks and real API calls

The paper trains a small parameterized backbone for valid actions, state deltas, risk and value, then applies a consistency gate to LLM drafts. Reported gains (hallucinated-state rate 0.176→0.035 on GPT-4o-mini; success 0.668→0.838 in simulator) are measured directly against real model calls and calibrated external ablations rather than being derived by construction from the fitted parameters or any self-citation chain. No load-bearing step reduces to self-definition, fitted-input-as-prediction, or imported uniqueness; the evaluation remains independent of the method's internal fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities. The trained parameterized backbone implies fitted parameters whose values and training details are not stated.

Reference graph

Works this paper leans on

89 extracted references · 34 canonical work pages · 25 internal anchors

[1]

2023 , eprint =

Yao, Shunyu and Zhao, Jeffrey and Yu, Dian and Du, Nan and Shafran, Izhak and Narasimhan, Karthik and Cao, Yuan , booktitle =. 2023 , eprint =

2023
[2]

Reflexion: Language Agents with Verbal Reinforcement Learning

Reflexion: Language Agents with Verbal Reinforcement Learning , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2303.11366 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2201.11903 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Tree of Thoughts: Deliberate Problem Solving with Large Language Models , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2305.10601 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models , author =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL) , year =. 2305.04091 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Reasoning with Language Model is Planning with World Model

Reasoning with Language Model is Planning with World Model , author =. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) , year =. 2305.14992 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 2023
[7]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Large Language Models as Commonsense Knowledge for Large-Scale Task Planning , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2305.14078 , archivePrefix =

work page arXiv
[8]

International Conference on Machine Learning (ICML) , year =

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents , author =. International Conference on Machine Learning (ICML) , year =. 2201.07207 , archivePrefix =

work page arXiv
[9]

Zhu, Xizhou and Chen, Yuntao and Tian, Hao and Tao, Chenxin and Su, Weijie and Yang, Chenyu and Huang, Gao and Li, Bin and Lu, Lewei and Wang, Xiaogang and Qiao, Yu and Zhang, Zhaoxiang and Dai, Jifeng , journal =
[10]

and Chao, Wei-Lun and Su, Yu , booktitle =

Song, Chan Hee and Wu, Jiaman and Washington, Clayton and Sadler, Brian M. and Chao, Wei-Lun and Su, Yu , booktitle =. 2023 , pages =

2023
[11]

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

Shridhar, Mohit and Yuan, Xingdi and C. International Conference on Learning Representations (ICLR) , year =. 2010.03768 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 2010
[12]

2022 , eprint =

Yao, Shunyu and Chen, Howard and Yang, John and Narasimhan, Karthik , booktitle =. 2022 , eprint =

2022
[13]

2024 , eprint =

Qin, Yujia and Liang, Shihao and Ye, Yining and Zhu, Kunlun and Yan, Lan and Lu, Yaxi and Lin, Yankai and Cong, Xin and Tang, Xiangru and Qian, Bill and Zhao, Sihan and Hong, Lauren and Tian, Runchu and Xie, Ruobing and Zhou, Jie and Gerstein, Mark and Li, Dahai and Liu, Zhiyuan and Sun, Maosong , booktitle =. 2024 , eprint =

2024
[14]

Voyager: An Open-Ended Embodied Agent with Large Language Models

Voyager: An Open-Ended Embodied Agent with Large Language Models , author =. Transactions on Machine Learning Research , year =. 2305.16291 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[15]

2023 , eprint =

Shen, Yongliang and Song, Kaitao and Tan, Xu and Li, Dongsheng and Lu, Weiming and Zhuang, Yueting , booktitle =. 2023 , eprint =

2023
[16]

2024 , eprint =

Liu, Xiao and Yu, Hao and Zhang, Hanchen and Xu, Yifan and Lei, Xuanyu and Lai, Hanyu and Gu, Yu and Ding, Hangliang and Men, Kaiwen and Yang, Kejuan and Zhang, Shudan and Deng, Xiang and Zeng, Aohan and Du, Zhengxiao and Zhang, Chenhui and Shen, Sheng and Zhang, Tianjun and Su, Yu and Sun, Huan and Huang, Minlie and Dong, Yuxiao and Tang, Jie , booktitle...

2024
[17]

Kambhampati, Subbarao and Valmeekam, Karthik and Guan, Lin and Stechly, Kaya and Verma, Mudit and Bhambri, Siddhant and Saldyt, Lucas and Murthy, Anil , journal =
[18]

Advances in Neural Information Processing Systems (NeurIPS) , year =

On the Planning Abilities of Large Language Models -- A Critical Investigation , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2305.15771 , archivePrefix =

work page arXiv
[19]

2023 , eprint =

Lin, Bill Yuchen and Fu, Yicheng and Yang, Karina and Brahman, Faeze and Huang, Shiyu and Bhagavatula, Chandra and Ammanabrolu, Prithviraj and Choi, Yejin and Ren, Xiang , booktitle =. 2023 , eprint =

2023
[20]

Toolformer: Language Models Can Teach Themselves to Use Tools

Toolformer: Language Models Can Teach Themselves to Use Tools , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2302.04761 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[21]

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2302.01560 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[22]

World Models

Recurrent World Models Facilitate Policy Evolution , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 1803.10122 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[23]

Dream to Control: Learning Behaviors by Latent Imagination

Dream to Control: Learning Behaviors by Latent Imagination , author =. International Conference on Learning Representations (ICLR) , year =. 1912.01603 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 1912
[24]

Mastering

Hafner, Danijar and Lillicrap, Timothy and Norouzi, Mohammad and Ba, Jimmy , booktitle =. Mastering. 2021 , eprint =

2021
[25]

Mastering Diverse Domains through World Models

Mastering Diverse Domains through World Models , author =. arXiv preprint arXiv:2301.04104 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[26]

, journal =

Sutton, Richard S. , journal =
[27]

Foundations and Trends in Machine Learning , volume =

Model-Based Reinforcement Learning: A Survey , author =. Foundations and Trends in Machine Learning , volume =
[28]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[29]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Offline Reinforcement Learning as One Big Sequence Modeling Problem , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2106.02039 , archivePrefix =

work page arXiv
[30]

Mastering

Schrittwieser, Julian and Antonoglou, Ioannis and Hubert, Thomas and Simonyan, Karen and Sifre, Laurent and Schmitt, Simon and Guez, Arthur and Lockhart, Edward and Hassabis, Demis and Graepel, Thore and Lillicrap, Timothy and Silver, David , booktitle =. Mastering. 2020 , volume =

2020
[31]

Semi-Supervised Classification with Graph Convolutional Networks

Semi-Supervised Classification with Graph Convolutional Networks , author =. International Conference on Learning Representations (ICLR) , year =. 1609.02907 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[32]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Inductive Representation Learning on Large Graphs , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[33]

International Conference on Machine Learning (ICML) , year =

Neural Message Passing for Quantum Chemistry , author =. International Conference on Machine Learning (ICML) , year =
[34]

Graph Attention Networks

Graph Attention Networks , author =. International Conference on Learning Representations (ICLR) , year =. 1710.10903 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[35]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Recipe for a General, Powerful, Scalable Graph Transformer , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2205.12454 , archivePrefix =

work page arXiv
[36]

European Semantic Web Conference (ESWC) , year =

Modeling Relational Data with Graph Convolutional Networks , author =. European Semantic Web Conference (ESWC) , year =
[37]

How Powerful are Graph Neural Networks?

How Powerful are Graph Neural Networks? , author =. International Conference on Learning Representations (ICLR) , year =. 1810.00826 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[38]

and Andreas, Jacob , journal =

Gu, Yu and Du, Yilun and Tenenbaum, Joshua B. and Andreas, Jacob , journal =. Are
[39]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Language Models Meet World Models: Embodied Experiences Enhance Language Models , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2305.10626 , archivePrefix =

work page arXiv
[40]

International Conference on Machine Learning (ICML) , year =

Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling , author =. International Conference on Machine Learning (ICML) , year =. 2301.12050 , archivePrefix =

work page arXiv
[41]

2024 , eprint =

Zhu, Deyao and Chen, Jun and Shen, Xiaoqian and Li, Xiang and Elhoseiny, Mohamed , booktitle =. 2024 , eprint =

2024
[42]

2024 , eprint =

Zhao, Andrew and Huang, Daniel and Xu, Quentin and Lin, Matthieu and Liu, Yong-Jin and Huang, Gao , booktitle =. 2024 , eprint =

2024
[43]

Siren's Song in the

Zhang, Yue and Li, Yafu and Cui, Leyang and Cai, Deng and Liu, Lemao and Fu, Tingchen and Huang, Xinting and Zhao, Enbo and Zhang, Yu and Chen, Yulong and Wang, Longyue and Luu, Anh Tuan and Bi, Wei and Shi, Freda and Shi, Shuming , journal =. Siren's Song in the
[44]

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions , author =. arXiv preprint arXiv:2311.05232 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[45]

A Survey of Hallucination in Large Foundation Models

A Survey of Hallucination in Large Foundation Models , author =. arXiv preprint arXiv:2309.05922 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[46]

ACM Computing Surveys , volume =

Survey of Hallucination in Natural Language Generation , author =. ACM Computing Surveys , volume =
[47]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) , year =

On Faithfulness and Factuality in Abstractive Summarization , author =. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL) , year =
[48]

Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) , year =

A Novel Iterative Approach to Top-k Planning , author =. Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) , year =
[49]

Annual Review of Control, Robotics, and Autonomous Systems , volume =

Integrated Task and Motion Planning , author =. Annual Review of Control, Robotics, and Autonomous Systems , volume =
[50]

arXiv preprint arXiv:2203.09634 , year =

Inventing Relational State and Action Abstractions for Effective and Efficient Bilevel Planning , author =. arXiv preprint arXiv:2203.09634 , year =

work page arXiv
[51]

Liu, Bo and Jiang, Yuqian and Zhang, Xiaohan and Liu, Qiang and Zhang, Shiqi and Biswas, Joydeep and Stone, Peter , journal =
[52]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-Based Task Planning , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2305.14909 , archivePrefix =

work page arXiv
[53]

and Kaelbling, Leslie Pack and Katz, Michael , booktitle =

Silver, Tom and Dan, Soham and Srinivas, Kavitha and Tenenbaum, Joshua B. and Kaelbling, Leslie Pack and Katz, Michael , booktitle =. Generalized Planning in
[54]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Language Models are Few-Shot Learners , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[55]

arXiv preprint arXiv:2303.08774 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[56]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Training Language Models to Follow Instructions with Human Feedback , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[57]

Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and others , journal =
[58]

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Self-Consistency Improves Chain of Thought Reasoning in Language Models , author =. International Conference on Learning Representations (ICLR) , year =. 2203.11171 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[59]

Advances in Neural Information Processing Systems (NeurIPS) , year =

Large Language Models are Zero-Shot Reasoners , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =
[60]

Self-Refine: Iterative Refinement with Self-Feedback

Self-Refine: Iterative Refinement with Self-Feedback , author =. Advances in Neural Information Processing Systems (NeurIPS) , year =. 2303.17651 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv
[61]

Proceedings of the

Graph of Thoughts: Solving Elaborate Problems with Large Language Models , author =. Proceedings of the. 2024 , eprint =

2024
[62]

Proceedings of the 36th Annual

Generative Agents: Interactive Simulacra of Human Behavior , author =. Proceedings of the 36th Annual
[63]

Let's Verify Step by Step

Let's Verify Step by Step , author =. arXiv preprint arXiv:2305.20050 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[64]

Zhou, Shuyan and Xu, Frank F and Zhu, Hao and others , booktitle=
[65]

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models , author=. arXiv:2310.04406 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[66]

NeurIPS , year=

Toolformer: Language Models Can Teach Themselves to Use Tools , author=. NeurIPS , year=
[67]

Executable Code Actions Elicit Better

Wang, Xingyao and Chen, Yangyi and others , booktitle=. Executable Code Actions Elicit Better
[68]

Jimenez, Carlos E and Yang, John and others , booktitle=
[69]

ICLR , year=

Self-Consistency Improves Chain of Thought Reasoning in Language Models , author=. ICLR , year=
[70]

Chain-of-Verification Reduces Hallucination in Large Language Models

Chain-of-Verification Reduces Hallucination in Large Language Models , author=. arXiv:2309.11495 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[71]

IJCNLP-AACL , year=

Faithful Chain-of-Thought Reasoning , author=. IJCNLP-AACL , year=
[72]

TMLR , year=

Cognitive Architectures for Language Agents , author=. TMLR , year=
[73]

Zeng, Aohan and Liu, Mingdao and others , booktitle=
[74]

Zhao, Jiaan and others , journal=. Is
[75]

Chen, Baian and Shu, Chang and others , journal=
[76]

Gemini: A Family of Highly Capable Multimodal Models

Gemini: A Family of Highly Capable Multimodal Models , author=. arXiv:2312.11805 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[77]

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. arXiv:2302.12813.
[78]

Gou, Zhibin; et al.
[79]

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies. TACL.
[80]

Towards Faithful Explanations for Text Classification with Robustness Improvement. ACL Findings.


