arxiv: 2604.17234 · v1 · submitted 2026-04-19 · 💻 cs.SE

Recognition: unknown

From Language to Action: Enhancing LLM Task Efficiency with Task-Aware MCP Server Recommendation

Shiyu He, Zhiman Chen, Yuqi Zhao, Neng Zhang, Ran Mo, Yutao Ma


Pith reviewed 2026-05-10 06:16 UTC · model grok-4.3

classification 💻 cs.SE

keywords MCP server recommendation · LLM agents · task-oriented retrieval · semantic relevance · re-ranking · dataset construction · tool integration · software engineering


The pith

T2MRec recommends MCP servers for development tasks in three steps. It first matches on semantic relevance and structural compatibility, then expands the candidate pool, and finally re-ranks that pool with a constrained LLM.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates the problem of selecting suitable MCP servers for LLM agent tasks as a structured retrieval-and-ranking challenge that accounts for both meaning and engineering constraints. It releases the Task2MCP dataset that pairs taxonomy-grounded development tasks with curated servers to supply reproducible supervision. The T2MRec model builds an initial candidate pool from semantic and compatibility signals, then applies centroid-based expansion to increase coverage and constrained LLM re-ranking to improve ordering. An accompanying conversational agent prototype shows how the outputs can be presented with usage guidance in interactive settings. If the pipeline works as described, developers would spend less time manually searching for compatible tools when extending LLM agents.

Core claim

Task-oriented MCP server recommendation is addressed in three steps. An initial candidate set is built by jointly modeling semantic relevance and structural compatibility. Centroid-based candidate expansion then boosts coverage, and constrained LLM-based re-ranking refines quality. All of this is supported by the new Task2MCP dataset, which systematically associates taxonomy-grounded tasks with curated MCP servers.
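The joint candidate-construction step can be pictured as a weighted blend of a semantic score and a structural-compatibility score. The sketch below is a minimal illustration under stated assumptions, not T2MRec's actual implementation:

- It assumes precomputed embeddings.
- It stands in for structural compatibility with Jaccard overlap between hypothetical capability tags.
- It mixes the two scores with a weight `alpha`. Figure 7 indicates that the paper tunes a semantics–structure weight, but this exact formula is our assumption.

`Server`, `initial_candidates`, and the toy data are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical server record: id, precomputed embedding, capability tags.
    server_id: str
    embedding: list[float]
    capabilities: set[str] = field(default_factory=set)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Assumed structural-compatibility proxy: tag overlap in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def initial_candidates(task_emb, task_reqs, servers, alpha=0.7, k=3):
    """Score = alpha * semantic + (1 - alpha) * structural; return the top-k ids."""
    scored = []
    for s in servers:
        semantic = cosine(task_emb, s.embedding)
        structural = jaccard(task_reqs, s.capabilities)
        scored.append((alpha * semantic + (1 - alpha) * structural, s.server_id))
    # Highest score first; ties broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [server_id for _, server_id in scored[:k]]


# Toy data invented for illustration; not taken from Task2MCP.
servers = [
    Server("github", [1.0, 0.0], {"git", "repo"}),
    Server("youtube", [0.0, 1.0], {"video", "captions"}),
    Server("filesystem", [0.7, 0.7], {"files"}),
]
top = initial_candidates([0.0, 1.0], {"video", "captions"}, servers, k=2)
# top == ["youtube", "filesystem"]
```

Setting `alpha=1.0` reduces this to text-only retrieval, which is the comparison the joint-modeling claim implicitly rests on.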

What carries the argument

T2MRec, the task-to-MCP recommendation model that forms candidates from semantic relevance plus structural compatibility, then performs centroid-based expansion and constrained LLM re-ranking.
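One plausible reading of centroid-based candidate expansion works in two steps:

1. Average the embeddings of the initial candidates into a centroid.
2. Add the catalog servers outside the pool that lie closest to that centroid.

The source does not give the exact procedure. The function `expand_by_centroid`, the expansion size `m`, and the use of cosine similarity are therefore assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def expand_by_centroid(candidates, embeddings, m=2):
    """Append the m servers outside the pool that lie closest to the pool's centroid.

    candidates: ordered server ids from the initial retrieval stage (hypothetical input).
    embeddings: dict mapping every catalog server id to its embedding.
    """
    if not candidates:
        return []  # no pool, so no centroid to expand around
    c = centroid([embeddings[sid] for sid in candidates])
    pool = set(candidates)
    outside = [sid for sid in embeddings if sid not in pool]
    # Closest to the centroid first; ties broken by id.
    outside.sort(key=lambda sid: (-cosine(c, embeddings[sid]), sid))
    return list(candidates) + outside[:m]


# Toy embeddings invented for illustration.
embeddings = {
    "youtube": [0.0, 1.0],
    "video-editor": [0.1, 0.9],
    "subtitle-tools": [0.3, 0.9],
    "github": [1.0, 0.0],
    "slack": [0.9, 0.1],
}
expanded = expand_by_centroid(["youtube", "video-editor"], embeddings, m=1)
# expanded == ["youtube", "video-editor", "subtitle-tools"]
```

The centroid summarizes the whole pool rather than any single candidate. It can therefore surface servers that the initial task-to-server scoring ranked too low to include, which is the coverage effect attributed to this step.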

If this is right

The Task2MCP dataset supplies a public, reproducible benchmark for evaluating future MCP recommendation methods.
Centroid-based expansion increases the chance that relevant but initially overlooked servers enter the candidate pool.
Constrained LLM re-ranking produces rankings that respect both relevance and practical engineering limits.
The interactive agent prototype demonstrates how recommendations plus usage guidelines can support real-time developer decisions.
Joint semantic-structural modeling reduces the mismatch between task intent and server capabilities compared with text-only retrieval.
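The source does not define what "constrained" means for the list-wise re-ranker whose prompt template appears in Figure 5. A common constraint, shown here as an assumption rather than the paper's method, accepts the LLM's answer only as a permutation of the candidate IDs:

- Unknown IDs are discarded.
- Duplicates are collapsed.
- Any candidates the model skipped are appended in their original retrieval order.

The LLM call itself is omitted. `raw` stands in for the model's text response, and `constrain_ranking` is a hypothetical helper that assumes server IDs are simple slugs.

```python
import re


def constrain_ranking(raw: str, candidates: list[str]) -> list[str]:
    """Turn free-form LLM output into a valid permutation of `candidates`."""
    allowed = set(candidates)
    ranked: list[str] = []
    # Scan slug-like tokens in order; keep only ids that were offered to the model.
    for token in re.findall(r"[A-Za-z0-9_.-]+", raw):
        if token in allowed and token not in ranked:
            ranked.append(token)
    # Fall back to retrieval order for anything the model skipped.
    ranked.extend(sid for sid in candidates if sid not in ranked)
    return ranked


candidates = ["youtube", "video-editor", "subtitle-tools"]
raw = "1. subtitle-tools\n2. imaginary-server\n3. youtube\n4. youtube"
result = constrain_ranking(raw, candidates)
# result == ["subtitle-tools", "youtube", "video-editor"]
```

This lets the re-ranker reorder the pool but never inject a server that retrieval did not propose, which keeps hallucinated servers out of the final list.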

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pattern of candidate construction, expansion, and re-ranking could be tested on tool recommendation problems outside the MCP ecosystem.
If the centroid step proves effective, analogous expansion techniques might improve recall in other structured retrieval settings such as API or library recommendation.
Providing usage guidelines alongside ranked servers could shorten the time from recommendation to working integration in agent development pipelines.

Load-bearing premise

The Task2MCP dataset supplies representative examples of real development tasks, and the retrieval-ranking pipeline produces measurable gains in task efficiency inside actual developer workflows.

What would settle it

A controlled user study that records task completion time, success rate, and integration effort for developers solving the same Task2MCP tasks with T2MRec recommendations versus baseline manual search or random selection.

Figures

Figures reproduced from arXiv: 2604.17234 by Neng Zhang, Ran Mo, Shiyu He, Yuqi Zhao, Yutao Ma, Zhiman Chen.

**Figure 2.** Two-level task taxonomy based on the NIST human–AI activity framework.

**Figure 3.** Schema and illustrative example of the Task2MCP dataset.

**Figure 4.** Architecture of the T2MRec model. The model progressively narrows the search space of MCP servers…

**Figure 5.** Prompt template used by the constrained list-wise LLM re-ranker.

**Figure 6.** Effect of the two-tower encoder on task-to-MCP alignment in the shared embedding space using…

**Figure 7.** Effect of different semantics–structure weights.

**Figure 8.** Effect of different temperature values in the contrastive objective.

**Figure 9.** Effect of different learning rates. Excessively large and excessively small learning rates lead to inferior performance. In general, the learning rate directly affects the stability and efficiency of optimization and therefore substantially influences the quality of the learned task–MCP server representations. When the learning rate was too large, parameter updates became excessively aggressive, making the…

**Figure 10.** Overall system architecture of T2MAgent for retrieval-centered MCP server recommendation.

**Figure 11.** Interactive interface of the T2MAgent agent showing MCP server recommendations and evidence…

**Figure 12.** Illustrative task used in the case study: generating marketing-oriented captions for YouTube videos.
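Figures 6 and 8 indicate that T2MRec trains a two-tower encoder with a temperature-scaled contrastive objective. The sketch below computes a standard in-batch InfoNCE loss over already-encoded task and server vectors. That the paper uses exactly this form (cosine similarity, in-batch negatives, mean over the batch) is an assumption.

The sketch illustrates what the temperature `tau` controls. Dividing similarities by a small `tau` sharpens the softmax, so on well-separated pairs the loss falls toward zero, and in general it concentrates on the hardest negatives. The function `info_nce` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(task_vecs, server_vecs, tau=0.07):
    """Mean in-batch InfoNCE loss: task i's positive is server i; other servers are negatives."""
    losses = []
    for i, t in enumerate(task_vecs):
        logits = [cosine(t, s) / tau for s in server_vecs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Toy encoder outputs: each task is already closest to its paired server.
tasks = [[1.0, 0.0], [0.0, 1.0]]
servers = [[0.9, 0.1], [0.2, 0.8]]
sharp = info_nce(tasks, servers, tau=0.05)
soft = info_nce(tasks, servers, tau=1.0)
```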

Original abstract

The rapid expansion of the model context protocol (MCP) ecosystem enables large language model (LLM)-based agents to access a wide range of external tools via a standardized interface. However, identifying appropriate MCP servers for a specific development task remains challenging. Existing studies primarily focus on measuring the MCP ecosystem or optimizing tool invocation mechanisms, while systematic recommendation frameworks and reproducible benchmarks for real-world development tasks remain largely unexplored. To address this limitation, we formulate task-oriented MCP server recommendation as a structured retrieval-and-ranking problem that jointly considers semantic relevance and engineering constraints. We first construct Task2MCP, a task-centered dataset that systematically associates taxonomy-grounded development tasks with curated MCP servers. This dataset provides structured supervision and a reproducible evaluation environment for research on MCP tool recommendations. Building on this dataset, we propose T2MRec, a task-to-MCP server recommendation model. It models semantic relevance and structural compatibility to construct an initial candidate set. Then it improves coverage and ranking quality through centroid-based candidate expansion and constrained LLM-based re-ranking. In addition, we design and implement an interactive MCP server recommendation agent prototype that operates in conversational environments to support dynamic decision-making. The agent assists developers in efficiently evaluating and integrating tools by providing recommended MCP servers together with usage guidelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper introduces a new dataset and recommendation pipeline for MCP servers in LLM agents, but its abstract reports no experimental validation of its performance claims.


The punchline is that this work creates a fresh dataset for task-to-MCP server matching and outlines a retrieval-plus-re-ranking pipeline, but the abstract offers no test results to confirm any efficiency gains. The authors constructed Task2MCP by associating taxonomy-based tasks with curated servers, which gives a structured way to supervise and evaluate recommendations. That dataset is the most concrete new contribution. On top of it they define T2MRec, which first finds candidates via semantic relevance and structural compatibility, then expands them using centroids and re-ranks them with a constrained LLM. They round it out with an interactive agent that suggests servers and usage notes in chat.

The pipeline addresses a real gap: tool choice for LLM agents must respect both meaning and engineering constraints. The agent prototype shows how this could fit into actual development workflows.

The weakness is the lack of quantitative support in the abstract. It reports no precision or recall figures and no baseline comparisons. It also reports no ablations of the centroid expansion or the LLM step, and no end-to-end measures of task completion time or success rate. The improvements in coverage and ranking are asserted there but not measured, so the efficiency claim remains untested.

Readers working on LLM tool use or agent systems in software engineering would get the most from the dataset and the problem framing, and the work could spark ideas for similar recommendation setups. This paper deserves a serious referee: the dataset and formulation are worth reviewing, even if the current version needs experiments to strengthen it. I recommend sending it for peer review with a note that solid validation experiments will be needed to make the claims stick.

Referee Report

2 major / 1 minor

Summary. The manuscript formulates task-oriented MCP server recommendation as a structured retrieval-and-ranking problem. It constructs the Task2MCP dataset associating taxonomy-grounded development tasks with curated MCP servers and proposes the T2MRec model, which builds an initial candidate set via semantic relevance and structural compatibility, then applies centroid-based candidate expansion and constrained LLM-based re-ranking to improve coverage and ranking quality. An interactive MCP server recommendation agent prototype for conversational environments is also described.

Significance. If the pipeline's effectiveness is demonstrated, the Task2MCP dataset would provide a valuable reproducible benchmark and the T2MRec approach a practical method for tool selection in the expanding MCP ecosystem, potentially improving LLM agent efficiency in real development tasks. The prototype adds applied value for dynamic decision-making.

major comments (2)

[Method] Method section: The claims that centroid-based candidate expansion and constrained LLM-based re-ranking improve coverage and ranking quality over the initial semantic+structural candidate set are not supported by any quantitative results, ablation studies, baseline comparisons, precision/recall metrics, or task-efficiency measurements.
[Evaluation] Evaluation section: No experimental validation, user studies, or latency metrics on actual development tasks are reported, leaving the central assertion that the two-stage pipeline enhances LLM task efficiency unsubstantiated and untestable from the manuscript alone.

minor comments (1)

[Abstract] Abstract: The abstract does not name the specific taxonomy used to ground the development tasks in Task2MCP. Naming it would help readers understand the dataset's scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential value of the Task2MCP dataset and T2MRec approach for the MCP ecosystem. We agree that the current manuscript requires additional empirical support to substantiate the claims about the two-stage pipeline. We will revise the paper accordingly to address these points.


Referee: [Method] Method section: The claims that centroid-based candidate expansion and constrained LLM-based re-ranking improve coverage and ranking quality over the initial semantic+structural candidate set are not supported by any quantitative results, ablation studies, baseline comparisons, precision/recall metrics, or task-efficiency measurements.

Authors: We acknowledge that the manuscript currently describes the T2MRec components and their intended benefits without providing quantitative ablation results or metrics. In the revised version, we will add an ablation study to the Method and Evaluation sections. This will report precision@K, recall@K, coverage, and ranking quality metrics on the Task2MCP dataset, directly comparing the initial semantic+structural candidate set against the full pipeline that includes centroid-based expansion and constrained LLM re-ranking. We will also include relevant baseline comparisons. (Revision: yes.)
Referee: [Evaluation] Evaluation section: No experimental validation, user studies, or latency metrics on actual development tasks are reported, leaving the central assertion that the two-stage pipeline enhances LLM task efficiency unsubstantiated and untestable from the manuscript alone.

Authors: We agree that the central claim of enhanced LLM task efficiency requires empirical validation. In the revision, we will expand the Evaluation section with experiments on real development tasks drawn from the Task2MCP taxonomy. This will include user studies measuring task completion time, success rates, and developer feedback when using T2MRec recommendations versus baselines, as well as latency metrics for the recommendation pipeline. These results will be reported to make the efficiency improvements testable and substantiated. (Revision: yes.)
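The first response promises precision@K and recall@K on Task2MCP. The standard per-task definitions are sketched below. Macro-averaging over tasks is our assumption about how such numbers would be aggregated, since the rebuttal does not say. Function names and the toy runs are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for sid in ranked[:k] if sid in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant servers that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for sid in ranked[:k] if sid in relevant) / len(relevant)


def macro_average(metric, runs, k):
    """Average a per-task metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Toy runs invented for illustration: (system ranking, ground-truth servers).
runs = [
    (["youtube", "subtitle-tools", "github"], {"youtube", "video-editor"}),
    (["github", "slack", "filesystem"], {"github"}),
]
p2 = macro_average(precision_at_k, runs, 2)  # (0.5 + 0.5) / 2 = 0.5
r2 = macro_average(recall_at_k, runs, 2)     # (0.5 + 1.0) / 2 = 0.75
```

Running the same two functions on the initial candidate set and on the full pipeline's output would give the ablation comparison the response describes.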

Circularity Check

0 steps flagged

No circularity: dataset and pipeline described as independent contributions without self-referential reductions


The paper constructs Task2MCP as a new task-centered dataset providing structured supervision and then describes T2MRec as a retrieval-ranking pipeline that first builds candidate sets via semantic relevance and structural compatibility, followed by centroid-based expansion and constrained LLM re-ranking. No equations, fitted parameters, or predictions are presented that reduce outputs to inputs by construction. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked. The interactive agent prototype is presented as an additional implementation detail. The derivation chain remains self-contained and does not collapse into its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claim rests on the assumption that semantic similarity plus structural constraints can be combined into an effective ranking signal and that the constructed dataset supplies unbiased supervision. No free parameters or invented physical entities are described.

axioms (2)

Domain assumption: development tasks can be reliably taxonomized and paired with MCP servers to create representative training data.
Invoked when constructing Task2MCP as structured supervision.
Ad hoc to paper: centroid-based expansion and LLM re-ranking will improve both coverage and ranking quality over the initial semantic and structural matching.
Stated as the mechanism that improves the candidate set.

invented entities (2)

T2MRec model (no independent evidence)
purpose: Task-to-MCP server recommendation via semantic and structural modeling plus expansion and re-ranking.
New system introduced in the paper; no independent evidence outside the described pipeline.
Task2MCP dataset (no independent evidence)
purpose: Structured supervision linking taxonomy-grounded tasks to curated MCP servers.
New dataset constructed for this work; reproducibility depends on a public release, which the abstract does not mention.
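To make "taxonomy-grounded tasks linked to curated servers" concrete, here is a hypothetical layout for one Task2MCP record. Figure 3 shows the real schema, which is not reproduced on this page, so every field and class name below is an assumption. Only the two-level taxonomy (Figure 2) and the task-to-server association come from the source. The task description is borrowed from the Figure 12 case study; the taxonomy labels and server IDs are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskCategory:
    # Two-level taxonomy (Figure 2): a top-level activity and a sub-activity.
    level1: str
    level2: str


@dataclass
class Task2MCPRecord:
    # All field names are hypothetical; the paper's Figure 3 has the actual schema.
    task_id: str
    description: str
    category: TaskCategory
    relevant_servers: list[str] = field(default_factory=list)

    def missing_servers(self, catalog: set[str]) -> list[str]:
        """Return linked server ids that are absent from the curated server catalog."""
        return [s for s in self.relevant_servers if s not in catalog]


record = Task2MCPRecord(
    task_id="t-0001",  # hypothetical id
    description="Generate marketing-oriented captions for YouTube videos",
    category=TaskCategory(level1="create", level2="generate text"),  # hypothetical labels
    relevant_servers=["youtube", "caption-writer"],  # hypothetical server ids
)
missing = record.missing_servers({"youtube", "github"})
# missing == ["caption-writer"]
```

A consistency check like `missing_servers` is one way such a dataset could guard against links to servers that have left the curated catalog.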

pith-pipeline@v0.9.0 · 5537 in / 1504 out tokens · 52937 ms · 2026-05-10T06:16:48.629574+00:00 · methodology


Reference graph

Works this paper leans on

77 extracted references · 20 canonical work pages

[1]

Shang Liu, Wenji Fang, Yao Lu, Jing Wang, Qijun Zhang, Hongce Zhang, and Zhiyao Xie. 2025. RTLCoder: Fully Open-Source and Efficient LLM-Assisted RTL Code Generation Technique.IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.44, 4 (2025), 1448–1461

2025
[2]

Jinghan Zhang, Xiting Wang, Weijieying Ren, Lu Jiang, Dongjie Wang, and Kunpeng Liu. 2025. RATT: A Thought Structure for Coherent and Correct LLM Reasoning. InProceedings of the 39th AAAI Conference on Artificial Intelligence, AAAI 2025, Toby Walsh, Julie Shah, and Zico Kolter (Eds.). AAAI Press, Philadelphia, PA, USA, 26733–26741

2025
[3]

Xinghua Zhang, Haiyang Yu, Cheng Fu, Fei Huang, and Yongbin Li. 2025. IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Ta...

2025
[4]

Junda He, Christoph Treude, and David Lo. 2025. LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision, and the Road Ahead.ACM Trans. Softw. Eng. Methodol.34, 5 (2025), 124:1–124:30

2025
[5]

Zhiyuan Ma, Zhenya Huang, Jiayu Liu, Minmao Wang, Hongke Zhao, and Xin Li. 2025. Automated Creation of Reusable and Diverse Toolsets for Enhancing LLM Reasoning. InProceedings of the 39th AAAI Conference on Artificial Intelligence, AAAI 2025, Toby Walsh, Julie Shah, and Zico Kolter (Eds.). AAAI Press, Philadelphia, PA, USA, 24821–24830

2025
[6]

Han Xu, Xingyuan Wang, and Shihao Ji. 2025. Reliable and Efficient Container Orchestration of LLMs via MCP. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, CIKM 2025, Meeyoung Cha, Chanyoung Park, Noseong Park, Carl Yang, Senjuti Basu Roy, Jessie Li, Jaap Kamps, Kijung Shin, Bryan Hooi, and Lifang He (Eds.)...

2025
[7]

Zhiwei Lin, Bonan Ruan, Jiahao Liu, and Weibo Zhao. 2025. A Large-Scale Evolvable Dataset for Model Context Protocol Ecosystem and Security Analysis. InProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE 2025. IEEE, Seoul, Republic of Korea, 3985–3988

2025
[8]

Hechuan Guo, Yongle Hao, Yue Zhang, Minghui Xu, Peizhuo Lv, Jiezhi Chen, and Xiuzhen Cheng. 2025. A Measurement Study of Model Context Protocol Ecosystem. arXiv:2509.25292 [cs.CY] https://arxiv.org/abs/2509.25292

work page arXiv 2025
[9]

Xinyi Hou, Yanjie Zhao, Shenao Wang, and Haoyu Wang. 2026. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions.ACM Trans. Softw. Eng. Methodol.(2026). doi:10.1145/3796519 Just Accepted

work page doi:10.1145/3796519 2026
[10]

Zihan Wang, Rui Zhang, Yu Liu, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Hongwei Li, and Guowen Xu. 2026. MPMA: Preference Manipulation Attack Against Model Context Protocol. InProceedings of the 40th AAAI Conference on Artificial Intelligence, AAAI 2026, Sven Koenig, Chad Jenkins, and Matthew E. Taylor (Eds.). AAAI Press, Singapore, 35838–35846

2026
[11]

Mengying Wu, Pei Chen, Geng Hong, Baichao An, Jinsong Chen, Binwang Wan, Xudong Pan, Jiarun Dai, and Min Yang. 2025. MCPZoo: A Large-Scale Dataset of Runnable Model Context Protocol Servers for AI Agent. arXiv:2512.15144 [cs.CR] https://arxiv.org/abs/2512.15144

work page arXiv 2025
[12]

Xiang Fei, Xiawu Zheng, and Hao Feng. 2025. MCP-Zero: Active Tool Discovery for Autonomous LLM Agents. arXiv:2506.01056 [cs.AI] https://arxiv.org/abs/2506.01056

work page arXiv 2025
[13]

Shubh Laddha, Lucas Changbencharoen, Win Kuptivej, Surya Shringla, Archana Vaidheeswaran, and Yash Bhaskar. 2025. HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance. arXiv:2602.23367 [cs.AI] https://arxiv.org/abs/2602.23367

work page arXiv 2025
[14]

Xuanqi Gao, Siyi Xie, Juan Zhai, Shiqing Ma, and Chao Shen. 2025. MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models. arXiv:2505.16700 [cs.AI] https://arxiv.org/abs/2505.16700

work page arXiv 2025
[16]

Zhenting Wang, Qi Chang, Hemani Patel, Shashank Biju, Cheng-En Wu, Quan Liu, Aolin Ding, Alireza Rezazadeh, Ankit Shah, Yujia Bao, and Eugene Siow. 2025. MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers. arXiv:2508.20453 [cs.CL] https://arxiv.org/abs/2508.20453 , Vol. 1, No. 1, Article . Publication date: April 2...

work page arXiv 2025
[17]

Zijian Wu, Xiangyan Liu, Xinyuan Zhang, Lingjun Chen, Fanqing Meng, Lingxiao Du, Yiran Zhao, Fanshi Zhang, Yaoqi Ye, Jiawei Wang, Zirui Wang, Jinjie Ni, Yufan Yang, Arvin Xu, and Michael Qizhe Shieh. 2025. MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use. arXiv:2509.24002 [cs.CL] https://arxiv.org/abs/2509.24002

work page arXiv 2025
[18]

Zikang Guo, Benfeng Xu, Chiwei Zhu, Wentao Hong, Xiaorui Wang, and Zhendong Mao. 2026. MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools. InProceedings of the 40th AAAI Conference on Artificial Intelligence, AAAI 2026, Sven Koenig, Chad Jenkins, and Matthew E. Taylor (Eds.). AAAI Press, Singapore, 30888–30896

2026
[19]

Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, and Xiangyang Li. 2026. MCPTox: A Benchmark for Tool Poisoning on Real-World MCP Servers. InProceedings of the 40th AAAI Conference on Artificial Intelligence, AAAI 2026, Sven Koenig, Chad Jenkins, and Matthew E. Taylor (Eds.). AAAI Press, Singapore, 3...

2026
[20]

Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, and Yiran Chen. 2025. IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol. InProceedings of the ACM Workshop on Wireless Network Testbeds, Experimental evaluation & Characterization, WiNTECH 2025. ACM, Hong Kong, China, 73–80

2025
[21]

Yixuan Yang, Cuifeng Gao, Daoyuan Wu, Yufan Chen, Yingjiu Li, and Shuai Wang. 2026. MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols. arXiv:2508.13220 [cs.CR] https://arxiv.org/ abs/2508.13220

work page arXiv 2026
[22]

Brandon Radosevich and John Halloran. 2025. MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits. arXiv:2504.03767 [cs.CR] https://arxiv.org/abs/2504.03767

work page arXiv 2025
[23]

Mingwei Liu, Chengyuan Zhao, Xin Peng, Simin Yu, Haofen Wang, and Chaofeng Sha. 2023. Task-Oriented ML/DL Library Recommendation Based on a Knowledge Graph.IEEE Trans. Software Eng.49, 8 (2023), 4081–4096

2023
[24]

Jeffrey Brantingham, Nanyun Peng, and Wei Wang

Mingyu Derek Ma, Xiaoxuan Wang, Po-Nien Kung, P. Jeffrey Brantingham, Nanyun Peng, and Wei Wang. 2024. STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Michael J. Wooldridge, Jennifer G. Dy, and Sriraam Natar...

2024
[25]

Junliang Yu, Hongzhi Yin, Xin Xia, Tong Chen, Jundong Li, and Zi Huang. 2024. Self-Supervised Learning for Recommender Systems: A Survey.IEEE Trans. Knowl. Data Eng.36, 1 (2024), 335–355

2024
[26]

Shiyu He, Yuqi Zhao, Qibo Li, and Yutao Ma. 2026. RGPRec: A RAG-Enhanced GNN for Personalized Task Recommen- dations in Open-Source Communities.Softw. Pract. Exp.56, 1 (2026), 3–25

2026
[27]

Shyam Sundar, and Joseph B

Mengqi Liao, S. Shyam Sundar, and Joseph B. Walther. 2022. User Trust in Recommendation Systems: A comparison of Content-Based, Collaborative and Demographic Filtering. InProceedings of the CHI Conference on Human Factors in Computing Systems, CHI 2022, Simone D. J. Barbosa, Cliff Lampe, Caroline Appert, David A. Shamma, Steven Mark Drucker, Julie R. Will...

2022
[28]

Liang Zeng, Jiewen Guan, and Bilian Chen. 2023. MSBPR: A multi-pairwise preference and similarity based Bayesian personalized ranking method for recommendation.Knowl. Based Syst.260 (2023), 110165

2023
[29]

Yidan Wang, Zhaochun Ren, Weiwei Sun, Jiyuan Yang, Zhixiang Liang, Xin Chen, Ruobing Xie, Su Yan, Xu Zhang, Pengjie Ren, Zhumin Chen, and Xin Xin. 2024. Content-Based Collaborative Generation for Recommender Systems. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024, Edoardo Serra and Francesca Spez...

2024
[30]

Moshi Wei, Nima Shiri Harzevili, Yuchao Huang, Junjie Wang, and Song Wang. 2022. CLEAR: Contrastive Learning for API Recommendation. InProceedings of the 44th IEEE/ACM International Conference on Software Engineering, ICSE

2022
[31]

ACM, Pittsburgh, PA, USA, 376–387
[32]

Zhihao Li, Chuanyi Li, Ze Tang, Wanhong Huang, Jidong Ge, Bin Luo, Vincent Ng, Ting Wang, Yucheng Hu, and Xiaopeng Zhang. 2024. PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation.ACM Trans. Softw. Eng. Methodol.33, 3 (2024), 72:1–72:30

2024
[33]

Silva, Sara C

Miguel G. Silva, Sara C. Madeira, and Rui Henriques. 2024. A Comprehensive Survey on Biclustering-based Collaborative Filtering.ACM Comput. Surv.56, 12 (2024), 313:1–313:32

2024
[34]

Sajad Ahmadian, Kamal Berahmand, Mehrdad Rostami, Saman Forouzandeh, Parham Moradi, and Mahdi Jalili. 2025. Recommender Systems Based on Nonnegative Matrix Factorization: A Survey.IEEE Trans. Artif. Intell.6, 10 (2025), 2554–2574

2025
[35]

Yuanhao Pu, Xiaolong Chen, Xu Huang, Jin Chen, Defu Lian, and Enhong Chen. 2024. Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation. InProceedings of the 41st International Conference on Machine Learning, ICML 2024 (Proceedings of Machine Learning Research, Vol. 235), Ruslan Salakhutdinov, Zico Kolter, Katherine A. Heller...

2024
[36]

Xiaoyao Zheng, Zhen Ni, Xiangnan Zhong, and Yonglong Luo. 2024. Kernelized Deep Learning for Matrix Factorization Recommendation System Using Explicit and Implicit Information.IEEE Trans. Neural Networks Learn. Syst.35, 1 (2024), 1205–1216

2024
[37]

Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. 2021. Temporal context-aware task recommendation in crowd- sourcing systems.Knowl. Based Syst.219 (2021), 106770

2021
[38]

Ting Yu, Dongjin Yu, Dongjing Wang, Quanxin Yang, and Xueyou Hu. 2024. Iterative framework based on multi-task learning for service recommendation.J. Syst. Softw.207 (2024), 111873

2024
[39]

Le Wu, Xiangnan He, Xiang Wang, Kun Zhang, and Meng Wang. 2023. A Survey on Accuracy-Oriented Neural Recommendation: From Collaborative Filtering to Information-Rich Recommendation.IEEE Trans. Knowl. Data Eng. 35, 5 (2023), 4425–4445

2023
[40]

Vineeta Anand and Ashish Kumar Maurya. 2025. A Survey on Recommender Systems Using Graph Neural Network. ACM Trans. Inf. Syst.43, 1 (2025), 9:1–9:49

2025
[41]

Yuanhui Tian, Shenquan Huang, Yong Zhang, Zirui Chen, Panfeng Li, and Luchuan Yu. 2025. Crowdsourcing task recommendation model based on collaborative knowledge graph.Comput. Ind. Eng.210 (2025), 111516

2025
[42]

Kang Yang and Ruiyun Yu. 2025. Cybertron: task-adaptive intent attention graph neural networks for few-shot recommendation.Knowl. Inf. Syst.67, 12 (2025), 12029–12053

2025
[43]

Yuwei Cao, Liangwei Yang, Chen Wang, Zhiwei Liu, Hao Peng, Chenyu You, and Philip S. Yu. 2023. Multi-task Item- attribute Graph Pre-training for Strict Cold-start Item Recommendation. InProceedings of the 17th ACM Conference on Recommender Systems, RecSys 2023, Jie Zhang, Li Chen, Shlomo Berkovsky, Min Zhang, Tommaso Di Noia, Justin Basilico, Luiz Pizzato...

2023
[44]

Mintae Kim and Wooju Kim. 2023. Task-Oriented Collaborative Graph Embedding Using Explicit High-Order Proximity for Recommendation.Big Data Res.33 (2023), 100382

2023
[45]

Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. InProceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Benjamin Piwowarski, Max Chevalier, Éric Gaussier, Yoelle Maarek, Jian-Yun Nie, and Falk Scholer (Eds.). ACM, Paris, Fra...

2019
[46]

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. InProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Jimmy X. Huang, Yi Chang, Xueqi Cheng, Jaap Kamps, Vanessa Murdock,...

2020
[47]

Zhensu Sun, Yan Liu, Ziming Cheng, Chen Yang, and Pengyu Che. 2020. Req2Lib: A Semantic Neural Model for Software Library Recommendation. InProceedings of the 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, Kostas Kontogiannis, Foutse Khomh, Alexander Chatzigeorgiou, Marios-Eleftherios Fokaefs, and Minghui...

2020
[48]

Grundy, and Yun Yang

Bo Li, Qiang He, Feifei Chen, Xin Xia, Li Li, John C. Grundy, and Yun Yang. 2021. Embedding app-library graph for neural third party library recommendation. InProceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021, Diomidis Spinellis, Georgios Gousios, Marsha Chech...

2021
[49]

Duantengchuan Li, Yuxuan Gao, Zhihao Wang, Hua Qiu, Pan Liu, Zhuoran Xiong, and Zilong Zhang. 2024. Homoge- neous graph neural networks for third-party library recommendation.Inf. Process. Manag.61, 6 (2024), 103831

2024
[50]

Yu, and Ming Zhang

Wei Ju, Siyu Yi, Yifan Wang, Zhiping Xiao, Zhengyang Mao, Hourun Li, Yiyang Gu, Yifang Qin, Nan Yin, Senzhang Wang, Xinwang Liu, Philip S. Yu, and Ming Zhang. 2026. A Survey of Graph Neural Networks in Real World: Imbalance, Noise, Privacy and OOD Challenges.IEEE Trans. Pattern Anal. Mach. Intell.48, 3 (2026), 3036–3055

2026
[51]

Yuyue Zhao, Jiancan Wu, Xiang Wang, Wei Tang, Dingxian Wang, and Maarten de Rijke. 2024. Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning. InProceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Grace Hui Yang, Hongning Wang, Sam Han, Claudia Hauff, Guido Zu...

2024
[52]

Bowen Jin, Gang Liu, Chi Han, Meng Jiang, Heng Ji, and Jiawei Han. 2024. Large Language Models on Graphs: A Comprehensive Survey.IEEE Trans. Knowl. Data Eng.36, 12 (2024), 8622–8642

2024
[53]

Ne Luo, Aryo Pradipta Gema, Xuanli He, Emile van Krieken, Pietro Lesci, and Pasquale Minervini. 2025. Self-Training Large Language Models for Tool-Use Without Demonstrations. InProceedings of the Association for Computational Lin- guistics: NAACL 2025 (Findings of ACL), Luis Chiruzzo, Alan Ritter, and Lu Wang (Eds.). Association for Computational Linguist...

2025
[54]

Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, and Dimitrios Stamoulis. 2025. Less is More: Optimizing Function Calling for LLM Execution on Edge Devices. InProceedings of the Design, Automation & Test in Europe Conference 2025, DATE 2025. IEEE, Lyon, France, 1–7. , Vol. 1, No. 1, Article . Publication date: April 2026. From Languag...

2025
[55]

Tran, Jonah Samost, Maciej Kula, Ed H

Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, and Mahesh Sathiamoorthy. 2023. Recommender Systems with Generative Retrieval. InProceedings of the 37th Annual Conference on Neural Information Processing Systems, NeurIPS 2023, Alice O...

[56] Shuyuan Xu, Wenyue Hua, and Yongfeng Zhang. 2024. OpenP5: An Open-Source Platform for Developing, Training, and Evaluating LLM-based Recommender Systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024, Grace Hui Yang, Hongning Wang, Sam Han, Claudia Hauff, Guido Zuccon, and Yi ...

[57] Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. 2024. Adapting Large Language Models by Integrating Collaborative Semantics for Recommendation. In Proceedings of the 40th IEEE International Conference on Data Engineering, ICDE 2024. IEEE, Utrecht, The Netherlands, 1435–1448.

[58] Hoang V. Dong, Yuan Fang, and Hady W. Lauw. 2025. A Contrastive Framework with User, Item and Review Alignment for Recommendation. In Proceedings of the 18th ACM International Conference on Web Search and Data Mining, WSDM 2025, Wolfgang Nejdl, Sören Auer, Meeyoung Cha, Marie-Francine Moens, and Marc Najork (Eds.). ACM, Hannover, Germany, 117–126.

[59] Jingbo Yang, Wenjun Wu, and Jian Ren. 2025. DSKIPP: A Prompt Method to Enhance the Reliability in LLMs for Java API Recommendation Task. Softw. Test. Verification Reliab. 35, 2 (2025), e1913.

[60] Fan Zhang, Jinpeng Chen, Tao Wang, Huan Li, Senzhang Wang, Feifei Kou, Ye Ji, Kaimin Wei, and Zhenye Yang

[61] DiMA: Distinguishing Resident and Tourist Preferences via Multi-Modal LLM Alignment for Out-of-Town Cross-Domain Recommendation. In Proceedings of the 40th AAAI Conference on Artificial Intelligence, AAAI 2026, Sven Koenig, Chad Jenkins, and Matthew E. Taylor (Eds.). AAAI Press, Singapore, 16262–16270.

[62] Qiyao Peng, Hongtao Liu, Hua Huang, Jian Yang, Qing Yang, and Minglai Shao. 2025. A Survey on LLM-powered Agents for Recommender Systems. In Proceedings of the Association for Computational Linguistics: EMNLP 2025 (Findings of ACL), Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng (Eds.). Association for Computational Linguist...

[63] Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, Hui Xiong, and Enhong Chen. 2024. A survey on large language models for recommendation. World Wide Web (WWW) 27, 5 (2024), 60.

[64] Mir Rayat Imtiaz Hossain, Leo Feng, Leonid Sigal, and Mohamed Osama Ahmed. 2026. Do LLMs Benefit from User and Item Embeddings in Recommendation Tasks? arXiv:2601.04690 [cs.LG] https://arxiv.org/abs/2601.04690

[65] Elias Lumer, Anmol Gulati, Vamse Kumar Subbiah, Pradeep Honaganahalli Basavaraju, and James A. Burke. 2025. ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents. arXiv:2505.06416 [cs.CL] https://arxiv.org/abs/2505.06416

[66] Wenpeng Xing, Zhipeng Chen, Changting Lin, and Meng Han. 2026. HGMF: A Hierarchical Gaussian Mixture Framework for Scalable Tool Invocation within the Model Context Protocol. arXiv:2508.07602 [cs.AI] https://arxiv.org/abs/2508.07602

[67] Enhan Li, Hongyang Du, and Kaibin Huang. 2025. NetMCP: Network-Aware Model Context Protocol Platform for LLM Capability Extension. arXiv:2510.13467 [cs.NI] https://arxiv.org/abs/2510.13467

[68] Emanuele Antonioni, Stefan Markovic, Anirudha Shankar, Jaime Bernardo, Lovro Markovic, Silvia Pareti, and Benedetto Proietti. 2025. JSPLIT: A Taxonomy-based Solution for Prompt Bloating in Model Context Protocol. arXiv:2510.14537 [cs.AI] https://arxiv.org/abs/2510.14537

[69] Arash Ahmadi, Sarah Sharif, and Yaser M. Banad. 2026. MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers. arXiv:2504.08999 [cs.CR] https://arxiv.org/abs/2504.08999

[70] Shiqing Fan, Xichen Ding, Liang Zhang, and Linjian Mo. 2025. MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark. arXiv:2508.07575 [cs.AI] https://arxiv.org/abs/2508.07575

[71] Ziyang Luo, Zhiqi Shen, Wenzhuo Yang, Zirui Zhao, Prathyusha Jwalapuram, Amrita Saha, Doyen Sahoo, Silvio Savarese, Caiming Xiong, and Junnan Li. 2025. MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers. arXiv:2508.14704 [cs.AI] https://arxiv.org/abs/2508.14704

[72] Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, and Yuekang Li. 2025. Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models. arXiv:2508.12566 [cs.AI] https://arxiv.org/abs/2508.12566

[73] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Rick Barrett, Rick Cummings, Eugene Agichtein, and Evgeniy Gabrilovich (Eds.). ACM, Perth, Australia, 173–182.

[74] Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, and Weinan Zhang. 2025. How Can Recommender Systems Benefit from Large Language Models: A Survey. ACM Trans. Inf. Syst. 43, 2 (2025), 28:1–28:47.

[75] Jiajia Chen, Xin Xin, Xianfeng Liang, Xiangnan He, and Jun Liu. 2023. GDSRec: Graph-Based Decentralized Collaborative Filtering for Social Recommendation. IEEE Trans. Knowl. Data Eng. 35, 5 (2023), 4813–4824.

[76] Shanquan Gao, Yihui Wang, and Zhenwei Ou. 2026. CVH-REC: A novel method for web API recommendation based on cross-view HGNNs. IEEE Trans. Software Eng. (2026), 1–16. doi:10.1109/TSE.2026.3666184 (Early Access)

[77] Ting Bai, Le Huang, Yue Yu, Cheng Yang, Cheng Hou, Zhe Zhao, and Chuan Shi. 2025. Efficient Multi-task Prompt Tuning for Recommendation. ACM Trans. Inf. Syst. 43, 4 (2025), 104:1–104:21.

[78] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019. OpenReview.net, New Orleans, LA, USA, 1–18. https://openreview.net/forum?id=Bkg6RiCqY7