pith.

arxiv: 2606.18636 · v1 · pith:OR6NXVAP · submitted 2026-06-17 · 💻 cs.CL · cs.AI

PEC-Home: Interpretation of Progressively Elliptical Commands in Smart Homes

Pith reviewed 2026-06-26 21:04 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords smart homes · elliptical commands · large language models · referential ambiguity · intention ambiguity · dialogue history · home assistants

The pith

Existing home assistants execute elliptical commands less accurately than complete ones, even with dialogue history tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PEC-Home, the first simulated dataset built to test how home assistants handle commands that grow shorter and more ambiguous as shared context builds across turns. It isolates two specific problems that arise from this progressive omission. The first is referential ambiguity, which occurs when multiple users hold different expectations about the environment. The second is intention ambiguity, which occurs when preferences shift over time or with context. Experiments across several LLMs, including GPT-4o, show that execution accuracy on these elliptical inputs falls below the level achieved with complete commands, even when the models are given explicit tools to store and retrieve prior dialogue. A reader would care because real household conversations naturally drop details once context is established, so any assistant that cannot keep up is of limited use in everyday settings.

Core claim

PEC-Home is presented as the first dataset for progressively elliptical commands in smart homes. Using it, the authors show that current LLMs face two kinds of ambiguity: referential ambiguity from users' differing environmental expectations, and intention ambiguity from evolving preferences. Their execution accuracy on elliptical inputs stays below their accuracy on complete commands, despite access to dialogue-history retrieval tools.
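The comparison in the core claim comes down to execution accuracy (EA), computed separately for complete and elliptical commands. Below is a minimal sketch that assumes EA is the percentage of commands whose predicted device operations exactly match the gold operations. The paper's exact matching rule is not given on this page, and the `Operation` type, device names, and sample predictions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical device operation: which device, what action, what value."""
    device: str
    action: str
    value: str


def execution_accuracy(predictions, golds):
    """Percentage of commands whose predicted operation set equals the gold set.

    Assumes exact set matching with no partial credit (an illustrative choice).
    """
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must align one-to-one")
    if not golds:
        return 0.0
    hits = sum(set(p) == set(g) for p, g in zip(predictions, golds))
    return 100.0 * hits / len(golds)


# Hypothetical gold operations for two commands.
gold = [
    [Operation("thermostat", "set_temperature", "24")],
    [Operation("living_room_light", "turn_on", "")],
]
# With complete commands both operations are resolved correctly ...
complete_pred = [
    [Operation("thermostat", "set_temperature", "24")],
    [Operation("living_room_light", "turn_on", "")],
]
# ... while an elliptical "a bit warmer" is resolved to the wrong temperature.
elliptical_pred = [
    [Operation("thermostat", "set_temperature", "22")],
    [Operation("living_room_light", "turn_on", "")],
]

print(execution_accuracy(complete_pred, gold))    # 100.0
print(execution_accuracy(elliptical_pred, gold))  # 50.0
```

Exact set matching is strict. A benchmark could instead credit partially correct operations, and that choice would change the size of the reported gap.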

What carries the argument

The PEC-Home dataset, which encodes progressive omission across multi-turn, multi-user home dialogues. This omission produces referential and intention ambiguities that must be resolved before a device can be operated correctly.
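To make "progressive omission" concrete, here is a hypothetical record that pairs one intended operation with commands at four ellipsis levels, echoing the Lv1–Lv4 labels in Figure 2. The field names, the example utterances, and the assumption that Lv1 is the most explicit form are illustrative only; they are not the dataset's actual schema. The helper lists which slots a command leaves unstated and therefore must be recovered from context.

```python
from dataclasses import dataclass, field


@dataclass
class EllipticalEpisode:
    """Hypothetical PEC-Home-style record: one intended operation, four phrasings."""
    speaker: str
    # Gold slots the assistant must fill to execute the operation.
    slots: dict[str, str]
    # Ellipsis level -> user utterance (Lv1 assumed to be the most explicit).
    commands: dict[int, str] = field(default_factory=dict)

    def missing_slots(self, level: int) -> list[str]:
        """Slots whose surface value does not appear in the utterance at `level`.

        These must be recovered from dialogue history or stored preferences.
        """
        text = self.commands[level].lower()
        return [name for name, value in self.slots.items() if value.lower() not in text]


episode = EllipticalEpisode(
    speaker="Mom",
    slots={"room": "bedroom", "device": "air conditioner", "value": "26 degrees"},
    commands={
        1: "Set the bedroom air conditioner to 26 degrees.",
        2: "Set the air conditioner to 26 degrees.",
        3: "Set it to 26 degrees.",
        4: "Make it comfortable.",
    },
)

for level in sorted(episode.commands):
    print(level, episode.missing_slots(level))
# 1 []
# 2 ['room']
# 3 ['room', 'device']
# 4 ['room', 'device', 'value']
```

The check uses substring matching, which is a deliberately crude stand-in. It still shows the key property: each level leaves more of the intended operation to context.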

If this is right

  • Assistants must move beyond simple history storage to resolve ambiguities that accumulate with progressive omission.
  • Models need mechanisms to track shifting user intentions across turns and users rather than assuming static preferences.
  • Development of practical home systems should incorporate explicit handling of elliptical forms to match the efficiency of human dialogue.
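The first two implications can be illustrated with a preference store keyed by user and environmental context that keeps every stated value with its turn number. With this design, "comfortable" resolves differently for different family members, which is the referential case in Figure 1. It also follows updates and context changes, which is the intention case. This is a hypothetical design sketch, not a system from the paper. The class name, the context labels, the "any" fallback, and the most-recent-wins policy are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PreferenceMemory:
    """Hypothetical store mapping (user, context, attribute) to (turn, value) pairs."""
    entries: dict = field(default_factory=dict)

    def record(self, user: str, context: str, attribute: str, value: str, turn: int) -> None:
        """Append a stated preference; earlier values are kept, not overwritten."""
        self.entries.setdefault((user, context, attribute), []).append((turn, value))

    def resolve(self, user: str, context: str, attribute: str) -> Optional[str]:
        """Return this user's most recent value for the given context.

        Falls back to the user's context-free ("any") preference, and returns
        None when nothing is known, a cue for the assistant to ask instead.
        """
        for key in ((user, context, attribute), (user, "any", attribute)):
            history = self.entries.get(key)
            if history:
                return max(history, key=lambda pair: pair[0])[1]  # latest turn wins
        return None


memory = PreferenceMemory()
memory.record("Dad", "any", "comfortable_temperature", "22", turn=1)
memory.record("Mom", "any", "comfortable_temperature", "26", turn=2)
memory.record("Mom", "rainy", "comfortable_temperature", "27", turn=5)
memory.record("Dad", "any", "comfortable_temperature", "23", turn=9)

print(memory.resolve("Dad", "sunny", "comfortable_temperature"))  # 23 (updated at turn 9)
print(memory.resolve("Mom", "rainy", "comfortable_temperature"))  # 27 (context-specific)
print(memory.resolve("Mom", "sunny", "comfortable_temperature"))  # 26 (falls back to "any")
print(memory.resolve("Kid", "sunny", "comfortable_temperature"))  # None
```

Keeping the full history, rather than overwriting the old value, leaves room for smarter policies later, such as weighting by context similarity instead of recency alone.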

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same progressive-omission pattern likely appears in other multi-turn dialogue settings such as personal scheduling or customer support, suggesting the dataset design could transfer.
  • Testing whether fine-tuning on PEC-Home closes the accuracy gap would indicate whether the limitation is primarily data-driven or architectural.
  • If the gap persists on real data, it would motivate new architectures that maintain explicit models of shared environmental state rather than relying on implicit context in the prompt.

Load-bearing premise

The simulated home scenarios in PEC-Home accurately capture the referential and intention ambiguities that arise from progressive omission in real multi-user smart-home interactions.

What would settle it

A direct comparison of the same LLMs on PEC-Home versus a corpus of recorded real multi-user home dialogues that exhibit increasing ellipsis would show whether the observed accuracy gap is an artifact of the simulation.

Figures

Figures reproduced from arXiv: 2606.18636 by Boao Qian, HaiFeng Wang, Jiashu Yao, Silin Li, Yingyu Shan, Yuhang Guo, Zeming Liu.

Figure 1: An example of PEC-Home. Multi-User Preferences presents the referential ambiguity from conflicting "comfortable temperature" definitions between family members, and Dynamic User Preferences indicates the intention ambiguity caused by environment changes.
Figure 2: Examples of progressively elliptical user commands across four levels (Lv1–Lv4) illustrate the defined …
Figure 3: Execution Accuracy of Sasha on Qwen2.5-7B-Instruct across varying amounts of preloaded memory in multi-user preferences and dynamic user preferences scenarios. ‘Mem number’ indicates the amount of preloaded irrelevant memory.
Figure 4: Execution Accuracy of Qwen2.5-7B-Instruct …
Figure 5: Execution Accuracy of SAGE on Qwen2.5-7B-Instruct across varying amounts of preloaded memory in multi-user preferences and dynamic user preferences scenarios. ‘Mem number’ indicates the amount of preloaded irrelevant memory.
Figure 6: Execution Accuracy of RAG on Qwen2.5-7B-Instruct across varying amounts of preloaded memory in multi-user preferences and dynamic user preferences scenarios. ‘Mem number’ indicates the amount of preloaded irrelevant memory.
Figure 7: Execution Accuracy of RAG on Gemma2-9b-it across varying amounts of preloaded memory in multi-user preferences and dynamic user preferences scenarios. ‘Mem number’ indicates the amount of preloaded irrelevant memory.
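Several of the figures vary the amount of preloaded irrelevant memory given to a history-augmented assistant. The sketch below shows that kind of stress test. One relevant memory is mixed with n irrelevant ones, and we check whether top-k retrieval still returns it for a terse query. The word-overlap (Jaccard) retriever is a toy stand-in chosen for clarity, not the retriever used by the paper's RAG baseline. All memory strings are hypothetical.

```python
def tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, memory: list[str], k: int = 1) -> list[str]:
    """Top-k memories by Jaccard similarity to the query (ties keep memory order)."""
    q = tokens(query)
    return sorted(memory, key=lambda m: jaccard(q, tokens(m)), reverse=True)[:k]


def hit_at_k(query: str, relevant: str, irrelevant: list[str], k: int = 1) -> bool:
    """Preload the irrelevant memories alongside the relevant one, then retrieve."""
    memory = irrelevant + [relevant]
    return relevant in retrieve(query, memory, k)


RELEVANT = "Mom said 26 degrees feels comfortable in the bedroom."
IRRELEVANT_POOL = [
    "Dad likes the living room light dimmed after dinner.",
    "The kids want the TV volume lower on weekdays.",
    "Grandpa prefers the kitchen fan on low.",
    "The robot vacuum runs at 10 am on Mondays.",
    "Mom wants the bedroom curtains closed at night.",  # lexical near-miss
]
QUERY = "Mom wants the bedroom comfortable."

for n in (0, 3, 5):
    print(n, hit_at_k(QUERY, RELEVANT, IRRELEVANT_POOL[:n]))
# 0 True
# 3 True
# 5 False
```

The failure at n = 5 comes from a single distractor that shares more surface words with the elliptical query than the relevant memory does. More preloaded memory makes such near-misses likelier, which is one plausible reason retrieval tools alone may not close the accuracy gap.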
Original abstract

Recent advancements in Large Language Models (LLMs) have empowered home assistants with natural language interaction capabilities. However, current assistants overlook the progressive omission that occurs in human dialogue as shared context accumulates, leading to more elliptical expressions for efficient communication. Thus, current assistants still struggle to interpret such elliptical expressions accurately, which limits their effectiveness in real-world applications. In practical smart home scenarios, assistants face two major challenges caused by elliptical commands: (1) referential ambiguity caused by different environmental expectations among multiple users; and (2) intention ambiguity resulting from user preferences that evolve over time or change with the environment. To address these challenges, we introduce PEC-Home, the first simulated home dataset specifically designed for interpreting progressively elliptical commands in smart homes. Extensive experiments on various LLMs, including GPT-4o, show that existing home assistants struggle to execute user-intended operations based solely on elliptical commands. Even when equipped with tools for storing and retrieving user dialogue history, execution accuracy remains below that achieved with complete commands.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces PEC-Home, the first simulated home dataset specifically designed for interpreting progressively elliptical commands in smart homes. It identifies two challenges—referential ambiguity from multiple users' differing environmental expectations and intention ambiguity from evolving preferences—and reports that LLMs including GPT-4o achieve lower execution accuracy on elliptical commands than complete ones, even when equipped with dialogue-history storage and retrieval tools.

Significance. If the simulated scenarios faithfully capture real multi-turn ellipsis patterns and multi-user ambiguities, the dataset would provide a useful benchmark for improving context handling in LLM-based home assistants. The emphasis on progressive omission as shared context accumulates addresses a practical gap in current dialogue systems.

major comments (2)
  1. [Abstract] Abstract: the central empirical claim that 'execution accuracy remains below that achieved with complete commands' is stated without any quantitative accuracy numbers, dataset statistics, error analysis, or experimental setup details, leaving the result unsupported by visible evidence.
  2. [Abstract] Abstract: the dataset is described as 'specifically designed' for referential and intention ambiguities, but no construction details, user-model sampling procedure, context-accumulation rules, or external validation against observed human ellipsis patterns are supplied, so it is impossible to assess whether the generated distributions match real multi-user smart-home interactions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for highlighting issues in the abstract. We agree that the abstract should more explicitly support its claims and will revise it accordingly while preserving its brevity. The full manuscript already contains the requested details in later sections.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central empirical claim that 'execution accuracy remains below that achieved with complete commands' is stated without any quantitative accuracy numbers, dataset statistics, error analysis, or experimental setup details, leaving the result unsupported by visible evidence.

    Authors: We accept this observation. The abstract summarizes results from the experiments in Sections 4 and 5 but does not include the actual numbers. In the revision we will insert concise quantitative statements (e.g., the accuracy gap for GPT-4o and other models) and a brief reference to the evaluation protocol so the central claim is directly supported by evidence visible in the abstract. revision: yes

  2. Referee: [Abstract] Abstract: the dataset is described as 'specifically designed' for referential and intention ambiguities, but no construction details, user-model sampling procedure, context-accumulation rules, or external validation against observed human ellipsis patterns are supplied, so it is impossible to assess whether the generated distributions match real multi-user smart-home interactions.

    Authors: We agree the abstract is too terse on this point. Section 3 of the manuscript details the simulation procedure, user-model sampling, context-accumulation rules, and the design choices that produce referential and intention ambiguities. We will add one or two high-level sentences to the abstract that point to these design elements and note that the distributions were derived from observed multi-user dialogue patterns. revision: yes

Circularity Check

0 steps flagged

No circularity: dataset introduction and empirical benchmarking only

full rationale

The paper introduces PEC-Home as a simulated dataset for progressively elliptical commands and reports LLM benchmarking results on it. No mathematical derivations, equations, fitted parameters, predictions from subsets of data, or self-citation chains appear in the provided text. The central claim (lower accuracy on elliptical vs. complete commands) is an empirical observation on the new dataset rather than a reduction to prior inputs by construction. This matches the default expectation of no significant circularity for a straightforward dataset-plus-benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or new theoretical entities; the contribution rests on empirical dataset creation and LLM testing.

pith-pipeline@v0.9.1-grok · 5722 in / 1006 out tokens · 18714 ms · 2026-06-26T21:04:52.967213+00:00 · methodology


Reference graph

Works this paper leans on

62 extracted references · 15 canonical work pages

  1. Can an intelligent personal assistant (IPA) be your friend? Para-friendship development mechanism between IPAs and their users. Comput. Hum. Behav.
  2. Situation models in language comprehension and memory. Psychological Bulletin, 1998.
  3. GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
  4. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia. Attention is All you Need.
  5. Yu, Haoxiang; Hua, Jie; Julien, Christine. Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices. doi:10.1145/3485730.3494115.
  6. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems.
  7. Mistral 7B. 2023.
  8. Qwen2.5 Technical Report. arXiv preprint arXiv:2412.15115.
  9. Gemma 2: Improving Open Language Models at a Practical Size. 2024.
  10. ToolQA: A dataset for LLM question answering with external tools. Advances in Neural Information Processing Systems.
  11. Llama 3 Model Card. 2024.
  12. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  13. LoRA: Low-rank adaptation of large language models. ICLR.
  14. Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 2020.
  15. Referring as a collaborative process. Cognition, 1986.
  16. Naming and describing in social communication. Language and Speech, 1980.
  17. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1996.
  18. Changes in reference phrases as a function of frequency of usage in social interaction: A preliminary study. Psychonomic Science, 1964.
  19. ReAct: Synergizing reasoning and acting in language models. International Conference on Learning Representations (ICLR).
  20. DeepSeek-V3 Technical Report. 2024.
  21. Wang, Hongru; Wang, Rui; Xue, Boyang; Xia, Heming; Cao, Jingtao; Liu, Zeming; Pan, Jeff Z.; Wong, Kam-Fai. AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024. doi:10.18653/v1/2024.emnlp-main.856.
  22. Wang, Huiming; Cheng, Liying; Zhang, Wenxuan; Soh, De Wen; Bing, Lidong. Order-Agnostic Data Augmentation for Few-Shot Named Entity Recognition. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/v1/2024.acl-long.421.
  23. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019. doi:10.18653/v...
  24. Yu, Tao; Zhang, Rui; Yang, Kai; Yasunaga, Michihiro; Wang, Dongxu; Li, Zifan; Ma, James; Li, Irene; Yao, Qingning; Roman, Shanelle; Zhang, Zilin; Radev, Dragomir. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. Proceedings of the 2018 Conference on Empirical...
  25. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
  26. Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios. arXiv preprint arXiv:2401.17167.
  27. Zeng, Li; Shan, Yingyu; Liu, Zeming; Yao, Jiashu; Guo, Yuhang. FAME: Towards Factual Multi-Task Model Editing. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024. doi:10.18653/v1/2024.emnlp-main.894.
  28. Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; K... Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems.
  29. Gui, Liangke; Wang, Borui; Huang, Qiuyuan; Hauptmann, Alexander; Bisk, Yonatan; Gao, Jianfeng. KAT: A Knowledge Augmented Transformer for Vision-and-Language. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022. doi:10.18653/v1/2022.naacl-main.70.
  30. Can Large Language Models Understand Real-World Complex Instructions? Proceedings of the AAAI Conference on Artificial Intelligence.
  31. Human activity prediction in smart home environments with LSTM neural networks. 2018 14th International Conference on Intelligent Environments (IE), 2018.
  32. Video-based human fall detection in smart homes using deep learning. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
  33. Convolutional neural network based on dynamic motion and shape variations for elderly fall detection. International Journal of Machine Learning and Computing.
  34. Audio content analysis for unobtrusive event detection in smart homes. Engineering Applications of Artificial Intelligence, 2020.
  35. Machine learning for smart building applications: Review and taxonomy. ACM Computing Surveys (CSUR), 2019.
  36. A systematic content review of artificial intelligence and the internet of things applications in smart home. Applied Sciences, 2020.
  37. Yu, Ji Yeon; de Antonio, Angélica; Villalba Mora, Elena. Deep Learning (CNN, RNN) Applications for Smart Homes: A Systematic Review. Computers.
  38. Aho, Alfred V.; Ullman, Jeffrey D. 1972.
  39. Publications Manual. 1983.
  40. Chandra, Ashok K.; Kozen, Dexter C.; Stockmeyer, Larry J. 1981. doi:10.1145/322234.322243.
  41. Andrew, Galen; Gao, Jianfeng. Scalable training of ...
  42. Gusfield, Dan. 1997.
  43. Rasooli, Mohammad Sadegh; Tetreault, Joel R. Computing Research Repository, 2015.
  44. Ando, Rie Kubota; Zhang, Tong. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research.
  45. AIoT Smart Home via Autonomous LLM Agents. IEEE Internet of Things Journal.
  46. Practical trigger-action programming in the smart home. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
  47. iCAP: Interactive prototyping of context-aware applications. Pervasive Computing: 4th International Conference, PERVASIVE 2006, Dublin, Ireland, May 7-10, 2006, Proceedings 4, 2006.
  48. Smart home automation using machine learning algorithms. 2020 International Conference for Emerging Technology (INCET), 2020.
  49. An activity-embedding approach for next-activity prediction in a multi-user smart space. 2017 IEEE International Conference on Smart Computing (SMARTCOMP), 2017.
  50. Sen, Amit Prakash; Goyal, Manish Kumar; Shalini. Deploying Reinforcement Learning Approaches for Smart Home Automation.
  51. Suman, Shashi; Etemad, Ali; Rivest, Francois. Potential Impacts of Smart Homes on Human Behavior: A Reinforcement Learning Approach.
  52. Gupta, Saurabh; Bhambri, Siddhant; Dhingra, Karan; Buduru, Arun Balaji; Kumaraguru, Ponnurangam. 2020. doi:10.1109/SMDS49396.2020.00018.
  53. Weiser, Mark. SIGMOBILE Mob. Comput. Commun. Rev., 1999. doi:10.1145/329124.329126.
  54. King, Evan; Yu, Haoxiang; Lee, Sangsu; Julien, Christine. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2024. doi:10.1145/3643505.
  55. Brown, Tom; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared D; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel; Wu, Jeffrey; Winte... Language Models are Few-Shot Learners.
  56. Harmony: A Home Agent for Responsive Management and Action Optimization with a Locally Deployed Large Language Model. 2024.
  57. Bridging the gap between natural user expression with complex automation programming in smart homes. 2024.
  58. Zhang, Kechi; Li, Jia; Li, Ge; Shi, Xianjie; Jin, Zhi. CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/v1/2024.acl-long.737.
  59. Chen, Zehui; Du, Weihua; Zhang, Wenwei; Liu, Kuikun; Liu, Jiangning; Zheng, Miao; Zhuo, Jingming; Zhang, Songyang; Lin, Dahua; Chen, Kai; Zhao, Feng. T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistic...
  60. Qiao, Shuofei; Zhang, Ningyu; Fang, Runnan; Luo, Yujie; Zhou, Wangchunshu; Jiang, Yuchen; Lv, Chengfei; Chen, Huajun. AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/v1/2024.ac...
  61. Unsupervised dense information retrieval with contrastive learning. arXiv preprint arXiv:2112.09118.
  62. Zhang, Saizheng; Dinan, Emily; Urbanek, Jack; Szlam, Arthur; Kiela, Douwe; Weston, Jason. Personalizing Dialogue Agents: I have a dog, do you have pets too? Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018. doi:10.18653/v1/P18-1205.