Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement
Pith reviewed 2026-05-15 19:52 UTC · model grok-4.3
The pith
Vigil proactively inserts AI assistance into live human-customer on-call dialogues and learns from resolved cases to improve itself.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Vigil operates across the entire on-call life-cycle: it proactively offers assistance during the human-involved phase without explicit invocation, and it continuously improves itself by extracting knowledge from human-resolved cases to autonomously update its capabilities. Real-world deployment on a large cloud platform is presented as evidence of practicality.
What carries the argument
Proactive dialogue insertion combined with automated knowledge extraction from human-resolved cases for ongoing capability updates.
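The insertion-plus-extraction loop can be sketched in a few lines of Python. Everything below is an illustrative assumption, not Vigil's implementation: the LLM scope classifier is faked with a keyword check so the example runs standalone, and all names are invented.

```python
# Illustrative sketch of a proactive insertion + knowledge extraction loop.
# Names and logic are assumptions for illustration, not from the paper.

class KnowledgeBase:
    def __init__(self):
        self.entries = {}  # question -> answer learned from resolved cases

    def lookup(self, question):
        return self.entries.get(question)

    def learn(self, question, answer):
        self.entries[question] = answer

def classify(message):
    """Stand-in for the LLM scope classifier; keyword match for illustration."""
    if "?" not in message:
        return "No assistance needed"
    return "Within Scope" if "volcano" in message.lower() else "Out of Scope"

def on_new_message(message, kb):
    """Proactive phase: offer help without being invoked, or stay silent."""
    if classify(message) != "Within Scope":
        return None
    return kb.lookup(message)  # None means the analyst handles it

def on_human_resolution(question, analyst_answer, kb):
    """Self-improvement phase: fold the human-resolved case back into the KB."""
    kb.learn(question, analyst_answer)

kb = KnowledgeBase()
q = "How do I resize a Volcano Engine disk?"
assert on_new_message(q, kb) is None            # not yet learned: stay quiet
on_human_resolution(q, "Use the console's scale-out action.", kb)
assert on_new_message(q, kb) is not None        # next time, suggest proactively
```

The point of the sketch is the asymmetry the paper describes: the agent inserts itself on every message but only mutates its knowledge when a human closes the case.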
If this is right
- Analysts receive context-aware suggestions mid-dialogue that can shorten resolution times for escalated tickets.
- The agent accumulates knowledge from every human-closed case, expanding its coverage of issues over successive deployments.
- Follow-up questions and progress tracking remain supported after escalation, closing a gap left by purely reactive agents.
- Real-time operation without explicit user calls reduces the friction of invoking help during busy support sessions.
Where Pith is reading between the lines
- The same insertion-plus-extraction loop could be tested in other high-volume human-AI collaboration settings such as medical triage or legal review.
- If knowledge extraction scales cleanly, the fraction of tickets requiring human escalation may decline as the agent improves.
- Deployment data from one platform leaves open whether the same self-improvement holds when transferred to support teams with different domain knowledge.
Load-bearing premise
Inserting proactive suggestions into live analyst-customer dialogues does not disrupt workflows or add errors, and knowledge taken from human cases improves the agent without creating new problems.
What would settle it
Measure changes in average ticket resolution time, analyst intervention frequency, and introduction of incorrect suggestions when the proactive agent is turned on versus off in the same live support queues.
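With synthetic numbers standing in for live queue data, that on/off comparison reduces to a few aggregates. All figures below are invented for illustration; only the three metrics mirror the proposal above.

```python
from statistics import mean

# Hypothetical per-ticket records from the same live queue:
# (resolution minutes, analyst interventions, incorrect suggestions)
agent_on  = [(32, 1, 0), (18, 0, 1), (25, 1, 0), (40, 2, 0)]
agent_off = [(45, 3, 0), (30, 2, 0), (38, 2, 0), (52, 4, 0)]

def summarize(tickets):
    minutes, interventions, errors = zip(*tickets)
    return {
        "mean_resolution_min": mean(minutes),
        "mean_interventions": mean(interventions),
        "bad_suggestion_rate": sum(errors) / len(tickets),
    }

on, off = summarize(agent_on), summarize(agent_off)
saved = off["mean_resolution_min"] - on["mean_resolution_min"]
print(f"resolution time saved: {saved:.1f} min/ticket")                  # → 12.5 min/ticket
print(f"bad-suggestion rate with agent on: {on['bad_suggestion_rate']:.0%}")  # → 25%
```

The third metric matters as much as the first: a time saving purchased with a high bad-suggestion rate would not settle the question in Vigil's favor.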
Figures
Original abstract
In large-scale cloud service platforms, thousands of customer tickets are generated daily and are typically handled through on-call dialogues. This high volume of on-call interactions imposes a substantial workload on human support analysts. Recent studies have explored reactive agents that leverage large language models as a first line of support to interact with customers directly and resolve issues. However, when issues remain unresolved and are escalated to human support, these agents are typically disengaged. As a result, they cannot assist with follow-up inquiries, track resolution progress, or learn from the cases they fail to address. In this paper, we introduce Vigil, a novel proactive agent system designed to operate throughout the entire on-call life-cycle. Unlike reactive agents, Vigil focuses on providing assistance during the phase in which human support is already involved. It integrates into the dialogue between the customer and the analyst, proactively offering assistance without explicit user invocation. Moreover, Vigil incorporates a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to autonomously update its capabilities. Vigil has been deployed on Volcano Engine, ByteDance's cloud platform, for over ten months, and comprehensive evaluations based on this deployment demonstrate its effectiveness and practicality. The open source version of this work is publicly available at https://github.com/volcengine/veaiops.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Vigil, a proactive LLM-based agent system for on-call support in large-scale cloud platforms. Unlike reactive agents that disengage once issues escalate to humans, Vigil integrates into live customer-analyst dialogues to offer assistance without explicit invocation and includes a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to update its capabilities. The central claim is that its deployment on Volcano Engine (ByteDance's cloud platform) for over ten months, together with evaluations based on this deployment, demonstrates the system's effectiveness and practicality.
Significance. If the deployment results can be shown to isolate the contributions of proactive insertion and the self-improvement loop, the work would provide rare real-world evidence for the viability of proactive agents in high-volume support workflows, with potential to reduce analyst workload while preserving quality. The ten-month production deployment and public open-sourcing of the code are clear strengths that increase the result's credibility beyond typical lab evaluations.
major comments (3)
- [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.
- [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.
- [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.
minor comments (2)
- [Abstract] Abstract: quantitative metrics, baselines, and error analysis are absent; adding a concise summary of key deployment statistics (e.g., number of tickets assisted, success-rate lift) would strengthen the abstract.
- [Figures] Figure captions and tables: several figures lack axis labels or error bars; ensure all visualizations include statistical significance indicators where comparisons are made.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript describing the Vigil proactive agent system. We address each major comment below with clarifications based on our deployment experience and indicate where revisions will be made.
Point-by-point responses
Referee: [Evaluation] Evaluation section: the reported aggregate success rates do not include controlled A/B tests or before/after comparisons that isolate the effect of proactive insertions versus baseline analyst performance; without such isolation it is difficult to attribute observed improvements specifically to Vigil rather than to analyst skill or ticket distribution.
Authors: We agree that fully isolating the contribution of proactive insertions from analyst skill or ticket distribution is difficult in a live production system. Our evaluations rely on aggregate metrics collected over the ten-month deployment on Volcano Engine, which demonstrate overall improvements in resolution efficiency. Randomized A/B testing was not performed to avoid risks to customer service quality. We have added a dedicated subsection in the revised manuscript discussing these methodological constraints and the observed temporal trends in the deployment data. revision: partial
Referee: [Self-Improvement Mechanism] Self-improvement mechanism description: the paper states that knowledge is extracted from resolved cases to update capabilities, but provides no quantitative metrics on update frequency, error rates introduced by extracted knowledge, or ablation showing net-positive impact versus potential policy drift or hallucinated fixes.
Authors: The original manuscript emphasized the overall system outcomes rather than granular self-improvement statistics. We acknowledge this as a gap and have added quantitative results to the revised version, including update frequency from resolved cases, measured error rates in extracted knowledge, and an ablation analysis comparing performance with and without the self-improvement loop to demonstrate net-positive effects. revision: yes
Referee: [System Architecture and Deployment] Deployment integration: while the architecture for inserting assistance into live dialogues is described, there is no reported measurement of workflow disruption, analyst acceptance rates, or customer-experience impact (e.g., response latency or satisfaction scores) attributable to the proactive component.
Authors: We have expanded the deployment section in the revised manuscript to include quantitative measurements from the ten-month production run, such as analyst acceptance rates for proactive suggestions, additional latency introduced by insertions, and indicators of customer experience impact derived from support logs. These additions provide direct evidence of the integration's effects on the workflow. revision: yes
Circularity Check
No circularity: claims rest on external deployment observations
Full rationale
The paper presents a system description and reports real-world deployment metrics from Volcano Engine over ten months. No equations, parameter fits, predictions, or self-citations appear in the derivation chain. Effectiveness is asserted via observed outcomes rather than any reduction to fitted inputs or self-referential definitions. This matches the default non-circular case for deployment papers grounded in external benchmarks.
Conference'17, July 2017, Washington, DC, USA · Fengrui Liu et al.

A Prompt Template Details

We use the seed-1.6 model, a model with original thinking ability from ByteDance, to perform the different tasks introduced in this paper. The model output is constrained to a spec...

# Role
You are able to answer questions related to the Volcano Engine's product features, usage guidance, configuration instructions, and provide code examples. You can help with explaining the error messages, exceptions, and common troubleshooting steps of Volcano Engine.

# Rules
- If the messages contain a question that you are capable of answering, classify it as "Within Scope".
- If the messages contain a question that is beyond your ability scope, classify it as "Out of Scope".
- If the messages do not contain any question, classify it as "No assistance needed".

You just need to give the classification result, without answering the question. Please analyze the newly added messages from the customer and give your classification result.

A.2 Prompt for Answer Generation

# Role
You are an intelligent assistant. Please combine the historical dialogue with the references to understand and respond to the current questio...
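On the consuming side, a thin validator can pin the model's free-form reply to exactly the three allowed labels ("Within Scope", "Out of Scope", "No assistance needed"). The substring-matching convention below is an assumption for illustration; the paper only fixes the label set, not how replies are parsed.

```python
VALID_LABELS = ("Within Scope", "Out of Scope", "No assistance needed")

def parse_classification(raw: str) -> str:
    """Map a free-form model reply onto one of the three allowed labels.

    Substring matching is an illustrative assumption, not something the
    paper specifies. A reply matching zero or several labels is rejected.
    """
    lowered = raw.strip().lower()
    matches = [label for label in VALID_LABELS if label.lower() in lowered]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unrecognized classification: {raw!r}")
    return matches[0]

assert parse_classification("Result: Within Scope") == "Within Scope"
assert parse_classification("no assistance needed") == "No assistance needed"
```

Rejecting ambiguous replies rather than guessing keeps a misfired proactive insertion from reaching the live dialogue.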
- Keep: If you find the answer from the follow-up dialogue is consistent with the existing answer, or the follow-up dialogue does not discuss the question any more, select "Keep", which means that you need to do nothing to the knowledge base.
- Delete: If you find the answer from the follow-up dialogue differs significantly from the existing answer, and the references are not suitable for this question, select "Delete" so that the inappropriate references can be deleted from the knowledge base.
- Update: Historical references may contain some differences compared to the current on-call. If you find the answer from the follow-up dialogue has only minor differences from the existing answer, you need to distinguish the different background and prerequisites of this problem, and rewrite the question and answer to make them more accurate.
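These three maintenance verbs amount to a small dispatch over knowledge-base entries. A minimal sketch follows, with the dict-shaped store and function names assumed here rather than taken from the paper:

```python
def apply_maintenance(kb, key, action, revised=None):
    """Apply one Keep/Delete/Update decision to a dict-shaped knowledge base.

    `kb` maps a question to its answer; `revised` is a (question, answer)
    pair required only for Update. The structure is illustrative only.
    """
    if action == "Keep":
        return                         # consistent with follow-up: no change
    if action == "Delete":
        kb.pop(key, None)              # unsuitable references: drop the entry
    elif action == "Update":
        if revised is None:
            raise ValueError("Update requires a rewritten question/answer")
        new_question, new_answer = revised
        kb.pop(key, None)              # replace with the refined version
        kb[new_question] = new_answer
    else:
        raise ValueError(f"unknown action: {action}")

kb = {"How to mount a disk?": "Use mount -t ext4 ..."}
apply_maintenance(kb, "How to mount a disk?", "Update",
                  revised=("How to mount an ext4 disk on ECS?",
                           "Run: mount -t ext4 /dev/vdb /mnt"))
assert "How to mount a disk?" not in kb and len(kb) == 1
```

Update is the interesting case: it rewrites the question as well as the answer, which is how the prompt's instruction to "distinguish the different background and prerequisites" would surface in the stored knowledge.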