A Taxonomy of Runtime Faults in Model Context Protocol Servers

Antonio Ken Iannillo; Damian Andrew Tamburri; Indika Kumara; Joshua Owotogbe; Roberto Natella; Willem-Jan van den Heuvel

arxiv: 2606.05339 · v1 · pith:XKUJSJFFnew · submitted 2026-06-03 · 💻 cs.SE · cs.AI

Joshua Owotogbe, Indika Kumara, Willem-Jan van den Heuvel, Damian Andrew Tamburri, Antonio Ken Iannillo, Roberto Natella

Pith reviewed 2026-06-28 04:55 UTC · model grok-4.3

keywords Model Context Protocol · runtime faults · taxonomy · empirical study · GitHub repositories · AI software maintenance · LLM tools · fault categorization

The pith

A taxonomy derived from 837 GitHub threads organizes runtime faults in Model Context Protocol servers into 11 categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes the first empirical taxonomy of runtime faults in MCP servers through manual analysis of real issue reports. It uses a bottom-up coding process to group failures observed across protocol handling, tool execution, schema checks, state handling, provider connections, security, and operation cancellations. A follow-up survey of developers confirms that these categories match what practitioners actually encounter in deployed systems. If the taxonomy holds, it supplies a shared vocabulary that can guide debugging and evolution of tool-using AI applications.

Core claim

Runtime faults in MCP servers fall into 11 top-level categories and 27 subcategories that together capture 73 distinct leaf fault types; these were obtained by open coding of 837 MCP-specific threads drawn from 473 GitHub repositories and were found to cover every category reported by 55 surveyed MCP server developers, who on average had experienced 20 of the 27 subcategories.

What carries the argument

The taxonomy of 11 top-level categories and 27 subcategories produced by bottom-up open coding of runtime fault threads.

If this is right

The taxonomy covers recurrent failures across protocol interactions, tool invocations, schema enforcement, state management, model-provider integration, security validation, and timeouts or cancellations.
No top-level category remained unobserved when 55 MCP server developers were asked about their experience.
The taxonomy supplies a structured reference that can support maintenance and evolution tasks in AI software that uses the Model Context Protocol.
Developers encounter an average of 20 of the 27 subcategories, indicating broad practical coverage.
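
The coverage figure in the last point is simple arithmetic on the counts the paper reports. A minimal Python sketch, assuming nothing beyond those counts (the constant and function names are illustrative and do not come from the paper):

```python
# Counts as reported in the paper's abstract.
TOP_LEVEL_CATEGORIES = 11
SUBCATEGORIES = 27
LEAF_FAULT_TYPES = 73
AVG_SUBCATEGORIES_EXPERIENCED = 20  # survey average across 55 respondents


def coverage_ratio(experienced: int, total: int) -> float:
    """Return the fraction of subcategories a respondent reported experiencing."""
    if total <= 0:
        raise ValueError("total must be positive")
    return experienced / total


ratio = coverage_ratio(AVG_SUBCATEGORIES_EXPERIENCED, SUBCATEGORIES)
leaves_per_subcategory = LEAF_FAULT_TYPES / SUBCATEGORIES

print(f"Average subcategory coverage: {ratio:.1%}")
print(f"Average leaf fault types per subcategory: {leaves_per_subcategory:.2f}")
```

On these numbers the average respondent had met roughly 74% of the subcategories. This is an average, so it says nothing about how coverage is distributed across individual respondents or subcategories.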

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Automated checkers or linters could be written to flag code patterns that match the 73 leaf fault types.
The same bottom-up method could be applied to fault reports from other LLM-to-tool protocols to produce comparable taxonomies.
Longitudinal studies of open-source MCP servers could measure how often each subcategory appears in production logs.
Training materials for MCP server developers could be organized around the 11 categories to reduce common mistakes.

Load-bearing premise

The 837 selected threads from 473 repositories are representative of all runtime faults that actually occur in MCP servers.

What would settle it

Discovery of a large set of MCP server runtime faults that cannot be placed into any of the 11 categories would show the taxonomy is incomplete.

Figures

Figures reproduced from arXiv: 2606.05339 by Antonio Ken Iannillo, Damian Andrew Tamburri, Indika Kumara, Joshua Owotogbe, Roberto Natella, Willem-Jan van den Heuvel.

Figure 2: Overview of the methodological pipeline used in this study.

Figure 3: Taxonomy of MCP Server Failure Modes

Original abstract

MCP (Model Context Protocol) enables LLMs (Large Language Models) to interact with external tools and data sources via a standardized protocol. Its rapid adoption in tool-augmented Artificial Intelligence (AI) workflows has introduced new reliability challenges, such as configuration parameters that are accepted but not enforced at runtime, leading to unintended default behavior, whose runtime fault characteristics remain empirically unexamined. We present the first empirical taxonomy of runtime faults in MCP servers. We manually analyzed 837 MCP-specific runtime fault threads from 473 actively maintained MCP server GitHub repositories and derived a taxonomy using a bottom-up open coding procedure. The taxonomy comprises 11 top-level categories and 27 subcategories (73 leaf fault types), covering recurrent failures across protocol interactions, tool invocations, schema enforcement, state management, model-provider integration, security validation, and timeouts or explicit cancellations of in-progress operations. To assess the taxonomy's external validity, we surveyed 55 MCP server developers. Respondents reported experiencing an average of 20 of the 27 fault subcategories, and no category remained unobserved. These results indicate that the taxonomy reflects widely observed runtime failures in MCP-based systems and shall assist AI software maintenance and evolution in the future.
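
The abstract's motivating example, a configuration parameter that is accepted but not enforced at runtime, can be illustrated with a small simulation. The sketch below is hypothetical throughout: `ToolConfig`, `timeout_seconds`, and both `run_tool_*` functions are invented for illustration and are not taken from the paper or from any MCP SDK. Durations are simulated rather than measured so the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class ToolConfig:
    # Hypothetical setting: callers expect tool calls to be cut off after this long.
    timeout_seconds: float = 30.0


def run_tool_faulty(config: ToolConfig, simulated_duration: float) -> str:
    """Accepts the config but never reads timeout_seconds (the fault)."""
    # The limit is silently ignored, so the call behaves as if no limit existed.
    return f"completed after {simulated_duration}s"


def run_tool_enforced(config: ToolConfig, simulated_duration: float) -> str:
    """Checks the configured limit before reporting success."""
    if simulated_duration > config.timeout_seconds:
        raise TimeoutError(
            f"tool exceeded configured limit of {config.timeout_seconds}s"
        )
    return f"completed after {simulated_duration}s"


config = ToolConfig(timeout_seconds=1.0)

# The faulty variant reports success even though the limit was exceeded.
print(run_tool_faulty(config, 5.0))

# The enforced variant surfaces the violation instead of hiding it.
try:
    run_tool_enforced(config, 5.0)
except TimeoutError as exc:
    print(f"TimeoutError: {exc}")
```

The point of the contrast is that the faulty variant raises no error at configuration time or at call time, which is what makes this kind of fault hard to notice from the outside.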

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first empirical taxonomy of MCP server runtime faults from 837 GitHub threads plus a 55-developer survey, but the abstract documents no criteria for the manual selection of those threads.

The main thing here is a new taxonomy of runtime faults in MCP servers. They analyzed 837 threads from 473 GitHub repos with bottom-up open coding, ending up with 11 top-level categories and 27 subcategories covering protocol issues, tool calls, schema problems, state, security, and timeouts. A survey of 55 developers then checked that most had run into nearly all of those subcategories. That is actually new for this protocol, which is still emerging.

The work does a decent job staying close to real issue reports instead of just making up categories. The survey adds a useful external check that the faults are not just one-off observations.

The soft spot is the sampling. The abstract says the threads were manually selected as MCP-specific runtime fault threads but gives no search strings, date range, exclusion rules, or definition of what counted as runtime versus other kinds of issues. That leaves open the chance that easily spotted problems dominate while quieter or private ones are missed. The survey cannot fully fix that upstream gap.

This is for people building or maintaining MCP servers and similar LLM tool integrations who want a practical list of what tends to break. It is not reshaping any core theory.

I would send it to peer review. The empirical base is a reasonable start for a new area and the survey is a positive step, but the methods need more detail on selection and reliability before the representativeness claim can be taken as solid.

Referee Report

3 major / 1 minor

Summary. The paper claims to present the first empirical taxonomy of runtime faults in Model Context Protocol (MCP) servers. It is based on bottom-up open coding of 837 manually selected MCP-specific runtime fault threads drawn from 473 actively maintained GitHub repositories, yielding 11 top-level categories, 27 subcategories, and 73 leaf fault types. External validity is assessed through a survey of 55 MCP server developers, who reported experiencing an average of 20 of the 27 subcategories with no category unobserved.

Significance. If the sampling frame and coding process prove representative and reliable, the taxonomy could provide a practical reference for diagnosing and mitigating runtime failures in tool-augmented LLM systems, supporting maintenance and evolution of MCP-based AI software.

major comments (3)

[Abstract] Abstract: the central claim that the taxonomy 'reflects widely observed runtime failures' rests on the 837 threads being representative, yet the description provides no search strings, inclusion/exclusion criteria, temporal bounds, operational definition of 'MCP-specific' or 'runtime fault', or sampling frame. This selection process is load-bearing for all downstream claims.
[Abstract] Abstract (open coding description): the bottom-up procedure is presented without any information on the number of coders, inter-rater reliability statistics, disagreement resolution protocol, or saturation criteria. These omissions directly affect the trustworthiness of the 11/27/73 category structure.
[Abstract (validation survey)] Survey validation paragraph: while 55 respondents reported experience with an average of 20/27 subcategories, the paper supplies no recruitment method, response rate, or respondent demographics. This limits the survey's ability to compensate for potential upstream selection bias in the GitHub corpus.
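
For readers unfamiliar with the inter-rater reliability statistics requested in the second comment, the following is a minimal sketch of Cohen's kappa for two coders. It assumes each coder independently assigns exactly one category label per thread; the labels and the six-thread sample are invented for illustration and are not data from the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two coders, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)

    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each coder labelled at random with their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label; kappa is undefined, report full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six threads (not from the paper).
coder_a = ["protocol", "tool", "schema", "tool", "state", "protocol"]
coder_b = ["protocol", "tool", "tool", "tool", "state", "schema"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

In this invented sample the coders agree on four of six threads, but kappa comes out noticeably lower than the raw 67% agreement because part of that agreement is expected by chance. The statistic only makes sense when the two coders label independently.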

minor comments (1)

[Abstract] Abstract: the parenthetical '(73 leaf fault types)' appears late; moving the full category counts earlier would improve immediate readability of the contribution size.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on methodological transparency. We agree that the abstract omits key details due to length constraints and will revise it to include concise references to the sampling, coding, and survey procedures described in the methods section. We address each major comment below.

Referee: [Abstract] Abstract: the central claim that the taxonomy 'reflects widely observed runtime failures' rests on the 837 threads being representative, yet the description provides no search strings, inclusion/exclusion criteria, temporal bounds, operational definition of 'MCP-specific' or 'runtime fault', or sampling frame. This selection process is load-bearing for all downstream claims.

Authors: We acknowledge the abstract does not detail these elements. The full manuscript (Section 3.1) specifies the sampling frame: 473 actively maintained GitHub repositories implementing MCP (filtered by stars, recent commits, and protocol relevance), with 837 threads selected via keyword searches combining 'MCP'/'model context protocol' with runtime fault terms (e.g., 'error', 'exception', 'fault', 'timeout'). Inclusion required explicit runtime fault reports in MCP server contexts; exclusion covered non-runtime issues, feature requests, and non-MCP projects. Temporal bounds cover threads from the protocol's initial public release through the collection date. We will revise the abstract to briefly summarize the sampling approach and direct readers to Section 3 for complete criteria. revision: yes
Referee: [Abstract] Abstract (open coding description): the bottom-up procedure is presented without any information on the number of coders, inter-rater reliability statistics, disagreement resolution protocol, or saturation criteria. These omissions directly affect the trustworthiness of the 11/27/73 category structure.

Authors: The abstract omits these details for brevity. In the full paper, open coding was conducted by two researchers using an iterative bottom-up process on the 837 threads, with weekly consensus meetings to resolve disagreements through discussion (no independent parallel coding was performed, so formal IRR metrics such as Cohen's kappa were not calculated). Saturation was reached when analysis of additional threads yielded no new categories or subcategories. We will expand the abstract with a brief methods summary and add an explicit subsection in Section 3.2 describing the coder count, resolution protocol, and saturation assessment. revision: yes
Referee: [Abstract (validation survey)] Survey validation paragraph: while 55 respondents reported experience with an average of 20/27 subcategories, the paper supplies no recruitment method, response rate, or respondent demographics. This limits the survey's ability to compensate for potential upstream selection bias in the GitHub corpus.

Authors: The abstract does not include these survey details. Recruitment occurred via posts in MCP-related GitHub discussions, Discord communities, and direct outreach to repository maintainers; the survey was open for four weeks. We will revise the abstract and methods section to report the recruitment channels, achieved response count, and available respondent characteristics (e.g., self-reported MCP experience levels). Note that exact response rate cannot be computed as the total population size of MCP developers is unknown. revision: partial
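
The keyword-based selection described in the first response can be sketched as a simple candidate filter. This is a minimal illustration under stated assumptions: the two term lists are the examples quoted in the simulated rebuttal, while the function names, the prefix-matching rule, and the sample threads are hypothetical. A match only marks a thread as a candidate; the inclusion and exclusion criteria would still have to be applied by hand.

```python
import re
from typing import Iterable

# Terms quoted in the simulated rebuttal; the real search strings may differ.
PROTOCOL_TERMS = ("mcp", "model context protocol")
FAULT_TERMS = ("error", "exception", "fault", "timeout")


def contains_any(text: str, terms: Iterable[str]) -> bool:
    """True if any term starts at a word boundary, so 'errors' and 'timeouts' also match."""
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(term)}", lowered) for term in terms)


def is_candidate_thread(title: str, body: str) -> bool:
    """A thread is a candidate if it mentions both the protocol and a fault term."""
    text = f"{title}\n{body}"
    return contains_any(text, PROTOCOL_TERMS) and contains_any(text, FAULT_TERMS)


# Hypothetical threads for illustration only.
threads = [
    {"title": "MCP server crashes with timeout on tools/call", "body": "Happens under load."},
    {"title": "Feature request: add dark mode", "body": "Would be nice for the MCP inspector."},
    {"title": "Unhandled exception in parser", "body": "Stack trace attached."},
]

candidates = [t["title"] for t in threads if is_candidate_thread(t["title"], t["body"])]
print(candidates)
```

Matching on a leading word boundary keeps "default" from counting as "fault" while still catching plural forms. It also shows why the referee's concern matters: faults described without any of these words would never enter the candidate pool.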

Circularity Check

0 steps flagged

No circularity; taxonomy derived bottom-up from primary data corpus

The paper conducts an empirical classification study: 837 threads are manually selected from GitHub repositories and subjected to bottom-up open coding to produce the 11-category taxonomy. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain; categories are stated to emerge from the threads rather than being presupposed or imported. The developer survey functions as an independent consistency check on the resulting taxonomy, not as an input that forces the categories. This matches the default expectation of a self-contained empirical taxonomy with no load-bearing reduction to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical qualitative study; the central claim rests on the validity of open coding applied to issue reports and the representativeness of the sampled repositories. No numerical parameters are fitted and no new entities are postulated.

axioms (1)

Domain assumption: bottom-up open coding applied to GitHub issue threads produces a valid and useful taxonomy of runtime faults.
The entire taxonomy construction depends on this standard assumption from qualitative software engineering research.

Acknowledgments

This research was partially funded by the Safeguard project (grant No. 20506020) with Deloitte. Joshua Owotogbe is a Ph.D. candid...


2022

[8] [8]

Toolformer: Language models can teach themselves to use tools,

T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Ham- bro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,”Advances in neural information processing systems, vol. 36, pp. 68 539–68 551, 2023

2023

[9] [9]

Promises and challenges of microservices: an exploratory study,

Y. Wang, H. Kadiyala, and J. Rubin, “Promises and challenges of microservices: an exploratory study,”Empirical Software Engineer- ing, vol. 26, no. 4, p. 63, 2021

2021

[10] [10]

Gray failure: The achilles’ heel of cloud-scale sys- tems,

P . Huang, C. Guo, L. Zhou, J. R. Lorch, Y. Dang, M. Chintalapati, and R. Yao, “Gray failure: The achilles’ heel of cloud-scale sys- tems,” inProceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017, pp. 150–155

2017

[11] [11]

Experience report: An empirical study of api failures in openstack cloud environments,

P . Musavi, B. Adams, and F. Khomh, “Experience report: An empirical study of api failures in openstack cloud environments,” in2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2016, pp. 424–434

2016

[12] [12]

What is the model context protocol (mcp)?

Model Context Protocol, “What is the model context protocol (mcp)?” https://modelcontextprotocol.io/docs/getting-started/i ntro, 2026, official documentation, accessed February 25, 2026

2026

[13] [13]

Specification-model context protocol,

——, “Specification-model context protocol,” https://modelcon textprotocol.io/specification/2025-06-18, Jun. 2025, accessed: 2026-02-26

2025

[14] [14]

Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

M. M. Hasan, H. Li, E. Fallahzadeh, G. K. Rajbahadur, B. Adams, and A. E. Hassan, “Model context protocol (mcp) at first glance: Studying the security and maintainability of mcp servers,”arXiv preprint arXiv:2506.13538, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[15] [15]

Systematization of knowledge: Security and safety in the model context protocol ecosystem,

S. Gaire, S. Gyawali, S. Mishra, S. Niroula, D. Thakur, and U. Yadav, “Systematization of knowledge: Security and safety in the model context protocol ecosystem,”arXiv preprint arXiv:2512.08290, 2025

work page arXiv 2025

[16] [16]

A measurement study of model context protocol ecosystem,

H. Guo, Y. Hao, Y. Zhang, M. Xu, P . Lv, J. Chen, and X. Cheng, “A measurement study of model context protocol ecosystem,”arXiv preprint arXiv:2509.25292, 2025

work page arXiv 2025

[17] [17]

MCPXKIT: The Unified Toolkit for Analyzing Model Context Protocol Security

Y. Guo, P . Liu, W. Ma, Z. Deng, X. Zhu, P . Di, X. Xiao, and S. Wen, “Systematic analysis of mcp security,”arXiv preprint arXiv:2508.12538, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[18] [18]

When mcp servers attack: Tax- onomy, feasibility, and mitigation.arXiv preprint arXiv:2509.24272, 2025

W. Zhao, J. Liu, B. Ruan, S. Li, and Z. Liang, “When mcp servers attack: Taxonomy, feasibility, and mitigation,”arXiv preprint arXiv:2509.24272, 2025

work page arXiv 2025

[19] [19]

Basic concepts and taxonomy of dependable and secure computing,

A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE transactions on dependable and secure computing, vol. 1, no. 1, pp. 11–33, 2004

2004

[20] [20]

Understanding fault-tolerant distributed systems,

F. Cristian, “Understanding fault-tolerant distributed systems,” Communications of the ACM, vol. 34, no. 2, pp. 56–78, 1991

1991

[21] [21]

Fault analysis and debugging of microservice systems: Industrial sur- vey, benchmark system, and empirical study,

X. Zhou, X. Peng, T. Xie, J. Sun, C. Ji, W. Li, and D. Ding, “Fault analysis and debugging of microservice systems: Industrial sur- vey, benchmark system, and empirical study,”IEEE Transactions on Software Engineering, vol. 47, no. 2, pp. 243–260, 2018

2018

[22] [22]

Taxonomy of real faults in deep learning systems,

N. Humbatova, G. Jahangirova, G. Bavota, V . Riccio, A. Stocco, and P . Tonella, “Taxonomy of real faults in deep learning systems,” in Proceedings of the ACM/IEEE 42nd international conference on software engineering, 2020, pp. 1110–1121

2020

[23] [23]

A taxonomy of real faults for hybrid quantum-classical software architectures,

A. Bensoussan, G. Jahangirova, and M. Mousavi, “A taxonomy of real faults for hybrid quantum-classical software architectures,” ACM Transactions on Software Engineering and Methodology, 2025

2025

[24] [24]

Empirical validation of a web fault taxonomy and its usage for fault seeding,

A. Marchetto, F. Ricca, and P . Tonella, “Empirical validation of a web fault taxonomy and its usage for fault seeding,” in2007 9th IEEE International Workshop on Web Site Evolution. IEEE, 2007, pp. 31–38

2007

[25] [25]

Are mutants a valid substitute for real faults in software testing?

R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser, “Are mutants a valid substitute for real faults in software testing?” inProceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, 2014, pp. 654–665

2014

[26] [26]

Crowd- sourced knowledge on stack overflow: A systematic mapping study,

S. Meldrum, S. A. Licorish, and B. T. R. Savarimuthu, “Crowd- sourced knowledge on stack overflow: A systematic mapping study,” inProceedings of the 21st international conference on evaluation and assessment in software engineering, 2017, pp. 180–185

2017

[27] [27]

Sampling methods, types & techniques,

W. Webster, “Sampling methods, types & techniques,” https:// www.qualtrics.com/articles/strategy-research/sampling-metho ds/, Jan. 2023, qualtrics. Accessed: 2026-06-01

2023

[28] [28]

Practitioners’ expectations on log anomaly detection,

X. Ma, Y. Li, J. Keung, X. Yu, H. Zou, Z. Yang, F. Sarro, and E. T. Barr, “Practitioners’ expectations on log anomaly detection,”IEEE Transactions on Software Engineering, 2025

2025

[29] [29]

Pull request governance in open source communities,

A. Alami, R. Pardo, M. L. Cohn, and A. W ˛ asowski, “Pull request governance in open source communities,”IEEE Transactions on Software Engineering, vol. 48, no. 12, pp. 4838–4856, 2021

2021

[30] [30]

Function calling,

OpenAI, “Function calling,” https://platform.openai.com/docs /guides/function-calling, 2025, accessed: 2026-02-26

2025

[31] [31]

Json-rpc 2.0 specification,

JSON-RPC Working Group, “Json-rpc 2.0 specification,” https:// www.jsonrpc.org/specification, 2013, accessed: 2026-02-26

2013

[32] [32]

Experimental evaluation of the fail- silent behavior of a distributed real-time run-time support built from cots components,

P . Chevochot and I. Puaut, “Experimental evaluation of the fail- silent behavior of a distributed real-time run-time support built from cots components,” in2001 International Conference on Depend- able Systems and Networks. IEEE, 2001, pp. 304–313

2001

[33] [33]

An empirical study on api-misuse bugs in open-source c programs,

Z. Gu, J. Wu, J. Liu, M. Zhou, and M. Gu, “An empirical study on api-misuse bugs in open-source c programs,” in2019 IEEE 43rd annual computer software and applications conference (COMPSAC), vol. 1. IEEE, 2019, pp. 11–20. 14

2019

[34] [34]

Root cause analysis of anomalies of multitier services in public clouds,

J. Weng, J. H. Wang, J. Yang, and Y. Yang, “Root cause analysis of anomalies of multitier services in public clouds,”IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1646–1659, 2018

2018

[35] [35]

Failure diagnosis in microservice systems: A comprehensive survey and analysis,

S. Zhang, S. Xia, W. Fan, B. Shi, X. Xiong, Z. Zhong, M. Ma, Y. Sun, and D. Pei, “Failure diagnosis in microservice systems: A comprehensive survey and analysis,”ACM Transactions on Software Engineering and Methodology, 2024

2024

[36] [36]

An empirical study on tensorflow program bugs,

Y. Zhang, Y. Chen, S.-C. Cheung, Y. Xiong, and L. Zhang, “An empirical study on tensorflow program bugs,” inProceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, 2018, pp. 129–140

2018

[37] [37]

An empirical study of issues in large language model training systems,

Y. Gao, R. Lu, H. Lin, and Y. Chen, “An empirical study of issues in large language model training systems,” inProceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, 2025, pp. 122–133

2025

[38] [38]

Defining and detecting the defects of large language model-based autonomous agents,

K. Ning, J. Chen, J. Zhang, W. Li, Z. Wang, Y. Feng, W. Zhang, and Z. Zheng, “Defining and detecting the defects of large language model-based autonomous agents,”IEEE Transactions on Software Engineering, 2026

2026

[39] [39]

Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions

X. Hou, Y. Zhao, S. Wang, and H. Wang, “Model context protocol (mcp): Landscape, security threats, and future research direc- tions,”arXiv preprint arXiv:2503.23278, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[40] [40]

We urgently need privilege management in mcp: A measurement of api usage in mcp ecosystems,

Z. Li, K. Li, B. Ma, M. Xu, Y. Zhang, and X. Cheng, “We urgently need privilege management in mcp: A measurement of api usage in mcp ecosystems,”arXiv preprint arXiv:2507.06250, 2025

work page arXiv 2025

[41] [41]

Parasites in the Toolchain: A Large-Scale Analysis of Attacks on the MCP Ecosystem

S. Zhao, Q. Hou, Z. Zhan, Y. Wang, Y. Xie, Y. Guo, L. Chen, S. Li, and Z. Xue, “Mind your server: A systematic study of parasitic toolchain attacks on the mcp ecosystem,”arXiv preprint arXiv:2509.06572, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[42] [42]

On the use of agentic coding: An empirical study of pull requests on github,

M. Watanabe, H. Li, Y. Kashiwa, B. Reid, H. Iida, and A. E. Hassan, “On the use of agentic coding: An empirical study of pull requests on github,”arXiv preprint arXiv:2509.14745, 2025

work page arXiv 2025

[43] [43]

muprl: A mutation testing pipeline for deep reinforcement learning based on real faults,

D.-G. Thomas, M. Biagiola, N. Humbatova, M. Wardat, G. Ja- hangirova, H. Rajan, and P . Tonella, “muprl: A mutation testing pipeline for deep reinforcement learning based on real faults,” arXiv preprint arXiv:2408.15150, 2024

work page arXiv 2024

[44] [44]

Taxonomy of faults in attention-based neural networks,

S. Jahan, S. Singh Rajput, T. Sharma, and M. Masudur Rahman, “Taxonomy of faults in attention-based neural networks,”arXiv e-prints, pp. arXiv–2508, 2025

2025

[45] [45]

Faults in deep reinforcement learning programs: a taxonomy and a detection approach,

A. Nikanjam, M. M. Morovati, F. Khomh, and H. Ben Braiek, “Faults in deep reinforcement learning programs: a taxonomy and a detection approach,”Automated software engineering, vol. 29, no. 1, p. 8, 2022

2022

[46] [46]

An in-depth study of the promises and perils of mining github,

E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and D. Damian, “An in-depth study of the promises and perils of mining github,”Empirical Software Engineering, vol. 21, no. 5, pp. 2035–2071, 2016

2035

[47] [47]

Curating github for engineered software projects,

N. Munaiah, S. Kroh, C. Cabrey, and M. Nagappan, “Curating github for engineered software projects,”Empirical Software Engi- neering, vol. 22, no. 6, pp. 3219–3253, 2017

2017

[48] [48]

Data quality assessment in the wild: Find- ings from github,

I. Ustunboyacioglu, I. Kumara, D. Di Nucci, D. A. Tamburri, and W.-J. Van Den Heuvel, “Data quality assessment in the wild: Find- ings from github,” inProceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 2024, pp. 120– 129

2024

[49] [49]

Chaos engineering in the wild: Findings from github,

J. Owotogbe, I. Kumara, D. Di Nucci, D. A. Tamburri, and W.- J. v. d. Heuvel, “Chaos engineering in the wild: Findings from github,”arXiv preprint arXiv:2505.13654, 2025

work page arXiv 2025

[50] [50]

Integrating large language models in software engineering education: A pi- lot study through github repositories mining,

M. Khan, M. A. Akbar, and J. Kasurinen, “Integrating large language models in software engineering education: A pi- lot study through github repositories mining,”arXiv preprint arXiv:2509.04877, 2025

work page arXiv 2025

[51] [51]

Towards detecting prompt knowledge gaps for improved llm-guided issue resolution,

R. Ehsani, S. Pathak, and P . Chatterjee, “Towards detecting prompt knowledge gaps for improved llm-guided issue resolution,” in 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR). IEEE, 2025, pp. 699–711

2025

[52] [52]

A first look at the self-admitted technical debt in test code: Taxonomy and detection,

S. Islam, M. N. I. Opu, S. Wang, and S. Chowdhury, “A first look at the self-admitted technical debt in test code: Taxonomy and detection,”arXiv preprint arXiv:2510.22409, 2025

work page arXiv 2025

[53] [53]

Bugs in machine learning-based systems: a faultload benchmark,

M. M. Morovati, A. Nikanjam, F. Khomh, and Z. M. Jiang, “Bugs in machine learning-based systems: a faultload benchmark,”Em- pirical Software Engineering, vol. 28, no. 3, p. 62, 2023

2023

[54] [54]

Sampling in software engineering research: A critical review and guidelines,

S. Baltes and P . Ralph, “Sampling in software engineering research: A critical review and guidelines,”Empirical Software Engineering, vol. 27, no. 4, p. 94, 2022

2022

[55] [55]

Bug taxonomies: Use them to generate better tests,

G. Vijayaraghavan and C. Kaner, “Bug taxonomies: Use them to generate better tests,”Star East, vol. 2003, pp. 1–40, 2003

2003

[56] [56]

A comprehensive study on security bug characteristics,

Y. Wei, X. Sun, L. Bo, S. Cao, X. Xia, and B. Li, “A comprehensive study on security bug characteristics,”Journal of Software: Evolution and Process, vol. 33, no. 10, p. e2376, 2021

2021

[57] [57]

What do users ask in open-source ai repositories? an empirical study of github issues,

Z. Yang, C. Wang, J. Shi, T. Hoang, P . Kochhar, Q. Lu, Z. Xing, and D. Lo, “What do users ask in open-source ai repositories? an empirical study of github issues,” in2023 IEEE/ACM 20th Inter- national Conference on Mining Software Repositories (MSR). IEEE, 2023, pp. 79–91

2023

[58] [58]

Wohlin, P

C. Wohlin, P . Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslénet al.,Experimentation in software engineering. Karl- skrona, Sweden: Springer, 2012, vol. 236

2012

[59] [59]

Are you still working on this? an empirical study on pull request abandon- ment,

Z. Li, Y. Yu, T. Wang, G. Yin, S. Li, and H. Wang, “Are you still working on this? an empirical study on pull request abandon- ment,”IEEE Transactions on Software Engineering, vol. 48, no. 6, pp. 2173–2188, 2021. ACKNOWLEDGMENTS This research was partially funded by the Safeguard project (grant No. 20506020) with Deloitte. Joshua Owotogbeis a Ph.D. candid...

2021