Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers

Mohammed Mehedi Hasan , Hao Li , Emad Fallahzadeh , Gopi Krishnan Rajbahadur , Bram Adams , Ahmed E. Hassan

Authors on Pith no claims yet

classification 💻 cs.SE cs.ET

keywords toolmaintainabilitymcp-specificsecurityserversvulnerabilitiesanalysisecosystem

read the original abstract

Although Foundation Models (FMs), such as GPT-4, are increasingly used in domains like finance and software engineering, reliance on textual interfaces limits these models' real-world interaction. To address this, FM providers introduced a tool called -- triggering a proliferation of frameworks with distinct tool interfaces. In late 2024, Anthropic introduced the Model Context Protocol (MCP) to standardize this tool ecosystem. MCP is rapidly emerging as a de facto industry standard. Despite its adoption, MCP's AI-driven, non-deterministic control flow introduces new risks to sustainability, security, and maintainability, warranting closer examination. Towards this end, we present the first large-scale empirical study of MCP. Using state-of-the-art health metrics and a hybrid analysis pipeline that combines a general-purpose static analysis tool with an MCP-specific scanner, we evaluate 1,899 open-source MCP servers to assess their health, security, and maintainability. Despite MCP servers demonstrating strong health metrics, we identify eight distinct vulnerabilities -- only three of which overlap with traditional software vulnerabilities. Additionally, 7.2% of servers contain general vulnerabilities, and 5.5% exhibit MCP-specific tool poisoning. Regarding maintainability, while 66% exhibit code smells, 14.4% contain ten bug patterns overlapping prior research. These findings highlight the need for MCP-specific vulnerability detection techniques while reaffirming the value of traditional analysis and refactoring practices. Furthermore, we advocate for stronger governance across the MCP ecosystem by incorporating MCP-specific vulnerabilities into standardized vulnerability databases, enabling automated security scanning within MCP registries, and promoting responsible development practices to ensure the long-term safety and sustainability of the MCP ecosystem.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
cs.CR 2026-01 unverdicted novelty 8.0

26.1% of analyzed AI agent skills contain vulnerabilities across 14 patterns, with executable scripts raising risk 2.12x, based on static and LLM analysis of 31k skills.
DADL: A Declarative Description Language for Enterprise Tool Libraries in LLM Agent Systems
cs.SE 2026-05 unverdicted novelty 7.0

DADL is a declarative YAML format that lets a single runtime handle many REST API tools for LLM agents, cutting tool advertisement context cost by 142x from 142,000 to 1,000 tokens on a catalog of 1,833 definitions.
MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security
cs.CR 2026-04 conditional novelty 7.0

MCP-DPT creates a defense-placement taxonomy that organizes MCP threats and defenses across six architectural layers, revealing mostly tool-centric protections and gaps at orchestration, transport, and supply-chain layers.
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
cs.CR 2025-03 unverdicted novelty 7.0

MCP lifecycle is defined with four phases and 16 activities; a threat taxonomy of 16 scenarios is constructed, validated via case studies, and paired with phase-specific safeguards.
OpenAaaS: An Open Agent-as-a-Service Framework for Distributed Materials-Informatics Research
cond-mat.mtrl-sci 2026-05 unverdicted novelty 6.0

OpenAaaS is a hierarchical agent-as-a-service system that enables secure multi-agent collaboration for materials informatics by moving code to data rather than data to code.
When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

EnvTrustBench benchmarks evidence-grounding defects in LLM agents and finds they occur consistently across workflows.
When Agents Overtrust Environmental Evidence: An Extensible Agentic Framework for Benchmarking Evidence-Grounding Defects in LLM Agents
cs.AI 2026-05 unverdicted novelty 6.0

EnvTrustBench is a new agentic benchmark that measures evidence-grounding defects where LLM agents overtrust faulty environmental observations and take incorrect actions.
Unsafe by Flow: Uncovering Bidirectional Data-Flow Risks in MCP Ecosystem
cs.SE 2026-05 unverdicted novelty 6.0

MCP-BiFlow detects 93.8% of known bidirectional data-flow vulnerabilities in MCP servers and identifies 118 confirmed issues across 87 real-world servers from a scan of 15,452 repositories.
CASCADE: A Cascaded Hybrid Defense Architecture for Prompt Injection Detection in MCP-Based Systems
cs.CR 2026-04 unverdicted novelty 4.0

CASCADE is a cascaded hybrid detector that combines fast regex/entropy filtering, BGE embeddings with local LLM fallback, and output pattern checks to achieve 95.85% precision and 6.06% false-positive rate against pro...
Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains
cs.AI 2026-04 unverdicted novelty 3.0

Flowr is an agentic AI framework that decomposes retail supply chain workflows into coordinated LLM-based agents with human-in-the-loop oversight to automate operations in large supermarket chains.