pith. machine review for the scientific record.

arxiv: 2604.09579 · v1 · submitted 2026-02-25 · 💻 cs.AI · cs.SE

Recognition: no theorem link

Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 19:52 UTC · model grok-4.3

classification 💻 cs.AI cs.SE
keywords proactive agent · on-call support · self-improvement · cloud platform · customer dialogue · AI deployment

The pith

Vigil proactively inserts AI assistance into live human-customer on-call dialogues and learns from resolved cases to improve itself.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Vigil as a system that stays active after a ticket escalates to human analysts in high-volume cloud support environments. Rather than stopping when humans take over, the agent joins the ongoing conversation to offer suggestions without being prompted and then pulls lessons from how the humans close each case to update its own responses. This matters for platforms that generate thousands of tickets daily because it keeps the AI engaged through the full resolution process instead of handing off and forgetting. A ten-month deployment provides the main evidence that the approach can run in production without major reported issues.

Core claim

Vigil operates across the entire on-call life-cycle: it proactively offers assistance during the human-involved phase without explicit invocation, and it incorporates continuous self-improvement by extracting knowledge from human-resolved cases to autonomously update its capabilities. Real-world deployment on a large cloud platform demonstrates practicality.

What carries the argument

Proactive dialogue insertion combined with automated knowledge extraction from human-resolved cases for ongoing capability updates.

If this is right

  • Analysts receive context-aware suggestions mid-dialogue that can shorten resolution times for escalated tickets.
  • The agent accumulates knowledge from every human-closed case, expanding its coverage of issues over successive deployments.
  • Follow-up questions and progress tracking remain supported after escalation, closing a gap left by purely reactive agents.
  • Real-time operation without explicit user calls reduces the friction of invoking help during busy support sessions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same insertion-plus-extraction loop could be tested in other high-volume human-AI collaboration settings such as medical triage or legal review.
  • If knowledge extraction scales cleanly, the fraction of tickets requiring human escalation may decline as the agent improves.
  • Deployment data from one platform leaves open whether the same self-improvement holds when transferred to support teams with different domain knowledge.

Load-bearing premise

Inserting proactive suggestions into live analyst-customer dialogues does not disrupt workflows or add errors, and knowledge taken from human cases improves the agent without creating new problems.

What would settle it

Measure changes in average ticket resolution time, analyst intervention frequency, and introduction of incorrect suggestions when the proactive agent is turned on versus off in the same live support queues.
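The proposed comparison reduces to per-queue summary statistics with the agent on versus off. A minimal sketch, with entirely hypothetical ticket records and field names (the paper reports no such dataset):

```python
# Illustrative on/off comparison over hypothetical ticket records:
# (agent_enabled, resolution_minutes, analyst_interventions, bad_suggestions)
from statistics import mean

tickets = [
    (True, 42, 2, 0),
    (True, 55, 3, 1),
    (False, 61, 4, 0),
    (False, 74, 5, 0),
]

def summarize(enabled: bool) -> dict:
    rows = [t for t in tickets if t[0] == enabled]
    return {
        "mean_resolution_min": mean(r[1] for r in rows),
        "mean_interventions": mean(r[2] for r in rows),
        "bad_suggestion_rate": sum(r[3] for r in rows) / len(rows),
    }

on, off = summarize(True), summarize(False)
print(on["mean_resolution_min"], off["mean_resolution_min"])  # 48.5 67.5
```

With the same live queues randomized between conditions, these three numbers would directly test the load-bearing premise: faster resolution, fewer interventions, and no rise in incorrect suggestions.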

Figures

Figures reproduced from arXiv: 2604.09579 by Fengrui Liu, Tieying Zhang, Xiao He.

Figure 1
Figure 1: Comparison of reactive and proactive agent paradigms in the on-call support process. view at source ↗
Figure 2
Figure 2: Framework of Vigil, with two primary functions: (1) Online Proactive Response; (2) Continuous Self-Improvement. Furthermore, for some operational issues, Vigil’s capability extends by invoking tools to retrieve detailed logs, associated alerts, and diagnostic metadata. This integration allows Vigil to ground its responses in real cloud systems, providing highly specific and actionable assistance. … view at source ↗
Figure 3
Figure 3: Proactive response card of Vigil, including a distinct layout for distinguishing agent from human, explicit citations for verifiability, and an Accept button for collecting feedback. view at source ↗
Figure 4
Figure 4: Continuous self-improvement framework. Vigil learns from answered questions, unanswered questions, and external documents shared in the on-call dialogue. When entries are retrieved and utilized in future on-calls, they become subject to the validation mechanisms in the Learning from Answered Questions module. Through the Update operation described previously, these specific answers are iteratively polished an… view at source ↗
Figure 5
Figure 5: Volcano Engine On-Call Statistics, June 23 – July 22, … view at source ↗
Figure 6
Figure 6: Vigil operates different on-call dialogues for the same critical issue (simplified and anonymized). … Vigil was unable to provide a resolution due to insufficient contextual information. The site reliability engineering team observed that the automatic evacuation protocol had failed to execute as expected. The team proposed a temporary workaround of manually migrating the service to a reserved hos… view at source ↗
Figure 7
Figure 7: Vigil reviews the unaccepted answer and refines its knowledge base (simplified and anonymized). When a second customer later encountered the same issue, Vigil proactively suggested the UTF-8 encoding solution. Nevertheless, the answer was not accepted by the customer. Within our framework, such non-acceptance triggers a review mechanism, preventing potentially incorrect knowledge from being reused in future cases… view at source ↗
read the original abstract

In large-scale cloud service platforms, thousands of customer tickets are generated daily and are typically handled through on-call dialogues. This high volume of on-call interactions imposes a substantial workload on human support analysts. Recent studies have explored reactive agents that leverage large language models as a first line of support to interact with customers directly and resolve issues. However, when issues remain unresolved and are escalated to human support, these agents are typically disengaged. As a result, they cannot assist with follow-up inquiries, track resolution progress, or learn from the cases they fail to address. In this paper, we introduce Vigil, a novel proactive agent system designed to operate throughout the entire on-call life-cycle. Unlike reactive agents, Vigil focuses on providing assistance during the phase in which human support is already involved. It integrates into the dialogue between the customer and the analyst, proactively offering assistance without explicit user invocation. Moreover, Vigil incorporates a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to autonomously update its capabilities. Vigil has been deployed on Volcano Engine, ByteDance's cloud platform, for over ten months, and comprehensive evaluations based on this deployment demonstrate its effectiveness and practicality. The open source version of this work is publicly available at https://github.com/volcengine/veaiops.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Vigil, a proactive LLM-based agent system for on-call support in large-scale cloud platforms. Unlike reactive agents that disengage once issues escalate to humans, Vigil integrates into live customer-analyst dialogues to offer assistance without explicit invocation and includes a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to update its capabilities. The central claim is that its deployment on Volcano Engine (ByteDance's cloud platform) for over ten months, together with evaluations based on this deployment, demonstrates the system's effectiveness and practicality.

Significance. If the deployment results can be shown to isolate the contributions of proactive insertion and the self-improvement loop, the work would provide rare real-world evidence for the viability of proactive agents in high-volume support workflows, with potential to reduce analyst workload while preserving quality. The ten-month production deployment and public open-sourcing of the code are clear strengths that increase the result's credibility beyond typical lab evaluations.

major comments (3)
  1. [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.
  2. [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.
  3. [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.
minor comments (2)
  1. [Abstract] Abstract: quantitative metrics, baselines, and error analysis are absent; adding a concise summary of key deployment statistics (e.g., number of tickets assisted, success-rate lift) would strengthen the abstract.
  2. [Figures] Figure captions and tables: several figures lack axis labels or error bars; ensure all visualizations include statistical significance indicators where comparisons are made.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the Vigil proactive agent system. We address each major comment below with clarifications based on our deployment experience and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.

    Authors: We agree that fully isolating the contribution of proactive insertions from analyst skill or ticket distribution is difficult in a live production system. Our evaluations rely on aggregate metrics collected over the ten-month deployment on Volcano Engine, which demonstrate overall improvements in resolution efficiency. Randomized A/B testing was not performed to avoid risks to customer service quality. We have added a dedicated subsection in the revised manuscript discussing these methodological constraints and the observed temporal trends in the deployment data. revision: partial

  2. Referee: [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.

    Authors: The original manuscript emphasized the overall system outcomes rather than granular self-improvement statistics. We acknowledge this as a gap and have added quantitative results to the revised version, including update frequency from resolved cases, measured error rates in extracted knowledge, and an ablation analysis comparing performance with and without the self-improvement loop to demonstrate net-positive effects. revision: yes

  3. Referee: [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.

    Authors: We have expanded the deployment section in the revised manuscript to include quantitative measurements from the ten-month production run, such as analyst acceptance rates for proactive suggestions, additional latency introduced by insertions, and indicators of customer experience impact derived from support logs. These additions provide direct evidence of the integration's effects on the workflow. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on external deployment observations

full rationale

The paper presents a system description and reports real-world deployment metrics from Volcano Engine over ten months. No equations, parameter fits, predictions, or self-citations appear in the derivation chain. Effectiveness is asserted via observed outcomes rather than any reduction to fitted inputs or self-referential definitions. This matches the default non-circular case for deployment papers grounded in external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivations present; the contribution is a deployed system description. No free parameters, axioms, or invented entities are introduced in a formal sense.

pith-pipeline@v0.9.0 · 5533 in / 957 out tokens · 18667 ms · 2026-05-15T19:52:49.141311+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

53 extracted references · 53 canonical work pages · 4 internal anchors
