Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement
Pith reviewed 2026-05-15 19:52 UTC · model grok-4.3
The pith
Vigil proactively inserts AI assistance into live human-customer on-call dialogues and learns from resolved cases to improve itself.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vigil operates across the entire on-call life-cycle: it proactively offers assistance during the human-involved phase without explicit invocation, and it continuously improves itself by extracting knowledge from human-resolved cases to autonomously update its capabilities. Real-world deployment on a large cloud platform is presented as evidence of practicality.
What carries the argument
Proactive dialogue insertion combined with automated knowledge extraction from human-resolved cases for ongoing capability updates.
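The insertion-plus-extraction loop can be sketched in a few lines of Python. Everything below is an illustrative assumption, not Vigil's implementation: the LLM scope classifier is faked with a keyword check so the example runs standalone, and all names are invented.

```python
# Illustrative sketch of a proactive insertion + knowledge extraction loop.
# Names and logic are assumptions for illustration, not from the paper.

class KnowledgeBase:
    def __init__(self):
        self.entries = {}  # question -> answer learned from resolved cases

    def lookup(self, question):
        return self.entries.get(question)

    def learn(self, question, answer):
        self.entries[question] = answer

def classify(message):
    """Stand-in for the LLM scope classifier; keyword match for illustration."""
    if "?" not in message:
        return "No assistance needed"
    return "Within Scope" if "volcano" in message.lower() else "Out of Scope"

def on_new_message(message, kb):
    """Proactive phase: offer help without being invoked, or stay silent."""
    if classify(message) != "Within Scope":
        return None
    return kb.lookup(message)  # None means the analyst handles it

def on_human_resolution(question, analyst_answer, kb):
    """Self-improvement phase: fold the human-resolved case back into the KB."""
    kb.learn(question, analyst_answer)

kb = KnowledgeBase()
q = "How do I resize a Volcano Engine disk?"
assert on_new_message(q, kb) is None            # not yet learned: stay quiet
on_human_resolution(q, "Use the console's scale-out action.", kb)
assert on_new_message(q, kb) is not None        # next time, suggest proactively
```

The point of the sketch is the asymmetry the paper describes: the agent inserts itself on every message but only mutates its knowledge when a human closes the case.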
If this is right
- Analysts receive context-aware suggestions mid-dialogue that can shorten resolution times for escalated tickets.
- The agent accumulates knowledge from every human-closed case, expanding its coverage of issues over successive deployments.
- Follow-up questions and progress tracking remain supported after escalation, closing a gap left by purely reactive agents.
- Real-time operation without explicit user calls reduces the friction of invoking help during busy support sessions.
Where Pith is reading between the lines
- The same insertion-plus-extraction loop could be tested in other high-volume human-AI collaboration settings such as medical triage or legal review.
- If knowledge extraction scales cleanly, the fraction of tickets requiring human escalation may decline as the agent improves.
- Deployment data from one platform leaves open whether the same self-improvement holds when transferred to support teams with different domain knowledge.
Load-bearing premise
Inserting proactive suggestions into live analyst-customer dialogues does not disrupt workflows or add errors, and knowledge taken from human cases improves the agent without creating new problems.
What would settle it
Measure changes in average ticket resolution time, analyst intervention frequency, and introduction of incorrect suggestions when the proactive agent is turned on versus off in the same live support queues.
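With synthetic numbers standing in for live queue data, that on/off comparison reduces to a few aggregates. All figures below are invented for illustration; only the three metrics mirror the proposal above.

```python
from statistics import mean

# Hypothetical per-ticket records from the same live queue:
# (resolution minutes, analyst interventions, incorrect suggestions)
agent_on  = [(32, 1, 0), (18, 0, 1), (25, 1, 0), (40, 2, 0)]
agent_off = [(45, 3, 0), (30, 2, 0), (38, 2, 0), (52, 4, 0)]

def summarize(tickets):
    minutes, interventions, errors = zip(*tickets)
    return {
        "mean_resolution_min": mean(minutes),
        "mean_interventions": mean(interventions),
        "bad_suggestion_rate": sum(errors) / len(tickets),
    }

on, off = summarize(agent_on), summarize(agent_off)
saved = off["mean_resolution_min"] - on["mean_resolution_min"]
print(f"resolution time saved: {saved:.1f} min/ticket")                  # → 12.5 min/ticket
print(f"bad-suggestion rate with agent on: {on['bad_suggestion_rate']:.0%}")  # → 25%
```

The third metric matters as much as the first: a time saving purchased with a high bad-suggestion rate would not settle the question in Vigil's favor.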
Figures
Original abstract
In large-scale cloud service platforms, thousands of customer tickets are generated daily and are typically handled through on-call dialogues. This high volume of on-call interactions imposes a substantial workload on human support analysts. Recent studies have explored reactive agents that leverage large language models as a first line of support to interact with customers directly and resolve issues. However, when issues remain unresolved and are escalated to human support, these agents are typically disengaged. As a result, they cannot assist with follow-up inquiries, track resolution progress, or learn from the cases they fail to address. In this paper, we introduce Vigil, a novel proactive agent system designed to operate throughout the entire on-call life-cycle. Unlike reactive agents, Vigil focuses on providing assistance during the phase in which human support is already involved. It integrates into the dialogue between the customer and the analyst, proactively offering assistance without explicit user invocation. Moreover, Vigil incorporates a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to autonomously update its capabilities. Vigil has been deployed on Volcano Engine, ByteDance's cloud platform, for over ten months, and comprehensive evaluations based on this deployment demonstrate its effectiveness and practicality. The open source version of this work is publicly available at https://github.com/volcengine/veaiops.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Vigil, a proactive LLM-based agent system for on-call support in large-scale cloud platforms. Unlike reactive agents that disengage once issues escalate to humans, Vigil integrates into live customer-analyst dialogues to offer assistance without explicit invocation and includes a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to update its capabilities. The central claim is that its deployment on Volcano Engine (ByteDance's cloud platform) for over ten months, together with evaluations based on this deployment, demonstrates the system's effectiveness and practicality.
Significance. If the deployment results can be shown to isolate the contributions of proactive insertion and the self-improvement loop, the work would provide rare real-world evidence for the viability of proactive agents in high-volume support workflows, with potential to reduce analyst workload while preserving quality. The ten-month production deployment and public open-sourcing of the code are clear strengths that increase the result's credibility beyond typical lab evaluations.
major comments (3)
- [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.
- [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.
- [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.
minor comments (2)
- [Abstract] Abstract: quantitative metrics, baselines, and error analysis are absent; adding a concise summary of key deployment statistics (e.g., number of tickets assisted, success-rate lift) would strengthen the abstract.
- [Figures] Figure captions and tables: several figures lack axis labels or error bars; ensure all visualizations include statistical significance indicators where comparisons are made.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript describing the Vigil proactive agent system. We address each major comment below with clarifications based on our deployment experience and indicate where revisions will be made.
Point-by-point responses
Referee: [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.
Authors: We agree that fully isolating the contribution of proactive insertions from analyst skill or ticket distribution is difficult in a live production system. Our evaluations rely on aggregate metrics collected over the ten-month deployment on Volcano Engine, which demonstrate overall improvements in resolution efficiency. Randomized A/B testing was not performed to avoid risks to customer service quality. We have added a dedicated subsection in the revised manuscript discussing these methodological constraints and the observed temporal trends in the deployment data. revision: partial
Referee: [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.
Authors: The original manuscript emphasized the overall system outcomes rather than granular self-improvement statistics. We acknowledge this as a gap and have added quantitative results to the revised version, including update frequency from resolved cases, measured error rates in extracted knowledge, and an ablation analysis comparing performance with and without the self-improvement loop to demonstrate net-positive effects. revision: yes
Referee: [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.
Authors: We have expanded the deployment section in the revised manuscript to include quantitative measurements from the ten-month production run, such as analyst acceptance rates for proactive suggestions, additional latency introduced by insertions, and indicators of customer experience impact derived from support logs. These additions provide direct evidence of the integration's effects on the workflow. revision: yes
Circularity Check
No circularity: claims rest on external deployment observations
Full rationale
The paper presents a system description and reports real-world deployment metrics from Volcano Engine over ten months. No equations, parameter fits, predictions, or self-citations appear in the derivation chain. Effectiveness is asserted via observed outcomes rather than any reduction to fitted inputs or self-referential definitions. This matches the default non-circular case for deployment papers grounded in external benchmarks.
Conference'17, July 2017, Washington, DC, USA · Fengrui Liu et al.

A Prompt Template Details

We use the seed-1.6 model, a model with original thinking ability from ByteDance, to perform the different tasks introduced in this paper. The model output is constrained to a spec...

# Role
You are able to answer questions related to the Volcano Engine's product features, usage guidance, configuration instructions, and provide code examples. You can help with explaining the error messages, exceptions, and common troubleshooting steps of Volcano Engine.

# Rules
- If the messages contain a question that you are capable of answering, classify it as "Within Scope".
- If the messages contain a question that is beyond your ability scope, classify it as "Out of Scope".
- If the messages do not contain any question, classify it as "No assistance needed".

You just need to give the classification result, without answering the question. Please analyze the newly added messages from the customer and give your classification result.

A.2 Prompt for Answer Generation

# Role
You are an intelligent assistant. Please combine the historical dialogue with the references to understand and respond to the current questio...
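On the consuming side, a thin validator can pin the model's free-form reply to exactly the three allowed labels ("Within Scope", "Out of Scope", "No assistance needed"). The substring-matching convention below is an assumption for illustration; the paper only fixes the label set, not how replies are parsed.

```python
VALID_LABELS = ("Within Scope", "Out of Scope", "No assistance needed")

def parse_classification(raw: str) -> str:
    """Map a free-form model reply onto one of the three allowed labels.

    Substring matching is an illustrative assumption, not something the
    paper specifies. A reply matching zero or several labels is rejected.
    """
    lowered = raw.strip().lower()
    matches = [label for label in VALID_LABELS if label.lower() in lowered]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unrecognized classification: {raw!r}")
    return matches[0]

assert parse_classification("Result: Within Scope") == "Within Scope"
assert parse_classification("no assistance needed") == "No assistance needed"
```

Rejecting ambiguous replies rather than guessing keeps a misfired proactive insertion from reaching the live dialogue.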
- Keep: If you find the answer from the follow-up dialogue is consistent with the existing answer, or the follow-up dialogue does not discuss the question any more, select "Keep", which means that you need to do nothing to the knowledge base.
- Delete: If you find the answer from the follow-up dialogue differs significantly from the existing answer, and the references are not suitable for this question, select "Delete" so that the inappropriate references can be deleted from the knowledge base.
- Update: Historical references may contain some differences compared to the current on-call. If you find the answer from the follow-up dialogue has only minor differences from the existing answer, you need to distinguish the different background and prerequisites of this problem, and rewrite the question and answer to make them more accurate.
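These three maintenance verbs amount to a small dispatch over knowledge-base entries. A minimal sketch follows, with the dict-shaped store and function names assumed here rather than taken from the paper:

```python
def apply_maintenance(kb, key, action, revised=None):
    """Apply one Keep/Delete/Update decision to a dict-shaped knowledge base.

    `kb` maps a question to its answer; `revised` is a (question, answer)
    pair required only for Update. The structure is illustrative only.
    """
    if action == "Keep":
        return                         # consistent with follow-up: no change
    if action == "Delete":
        kb.pop(key, None)              # unsuitable references: drop the entry
    elif action == "Update":
        if revised is None:
            raise ValueError("Update requires a rewritten question/answer")
        new_question, new_answer = revised
        kb.pop(key, None)              # replace with the refined version
        kb[new_question] = new_answer
    else:
        raise ValueError(f"unknown action: {action}")

kb = {"How to mount a disk?": "Use mount -t ext4 ..."}
apply_maintenance(kb, "How to mount a disk?", "Update",
                  revised=("How to mount an ext4 disk on ECS?",
                           "Run: mount -t ext4 /dev/vdb /mnt"))
assert "How to mount a disk?" not in kb and len(kb) == 1
```

Update is the interesting case: it rewrites the question as well as the answer, which is how the prompt's instruction to "distinguish the different background and prerequisites" would surface in the stored knowledge.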