Recognition: unknown
The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale
Pith reviewed 2026-05-07 06:45 UTC · model grok-4.3
The pith
Analysis of a million Polygon blocks shows most atomic arbitrage opportunities stem from single source transactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This work formalizes the problem of attributing maximal extractable value (MEV) opportunities to their origins and provides a framework with four methods for identifying the source transactions of atomic arbitrage on EVM networks. Applying these methods to more than one million Polygon blocks shows that the majority of such opportunities trace back to single source transactions. This supports the hypothesis that MEV markets are competitive and shows that opportunity creation is highly concentrated among a few protocols.
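Once every opportunity carries a set of attributed source transactions, both headline quantities reduce to counting. The sketch below is minimal and makes two assumptions. First, it uses a hypothetical record format in which each opportunity ID maps to one protocol label per attributed source. Second, it uses the Herfindahl-Hirschman index as one possible concentration measure. The paper's own metrics and data are not reproduced here, and the records are invented.

```python
from collections import Counter

# Hypothetical attribution output: opportunity ID -> one protocol label per
# attributed source transaction (invented data, for illustration only).
attributions = {
    "opp1": ["DexA"],
    "opp2": ["DexA"],
    "opp3": ["DexB"],
    "opp4": ["DexA", "DexC"],
    "opp5": ["DexA"],
}

def single_source_fraction(attr):
    """Share of opportunities attributed to exactly one source transaction."""
    return sum(1 for srcs in attr.values() if len(srcs) == 1) / len(attr)

def protocol_shares(attr):
    """Fraction of all attributed sources that belong to each protocol."""
    counts = Counter(p for srcs in attr.values() for p in srcs)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def top_k_share(shares, k):
    """Combined share of the k largest protocols."""
    return sum(sorted(shares.values(), reverse=True)[:k])

def hhi(shares):
    """Herfindahl-Hirschman index: 1/n for an even split, 1.0 for a monopoly."""
    return sum(s * s for s in shares.values())

shares = protocol_shares(attributions)
print(single_source_fraction(attributions), top_k_share(shares, 1), hhi(shares))
```

On these five toy records, four opportunities are single-source (0.8) and DexA holds two thirds of attributed sources. The index is 0.5, whereas an even split over the three protocols would give 1/3.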
What carries the argument
Four attribution methods—bot-data-driven, simulation-based, coefficient-based, and Shapley-based—within a systems framework for tracing atomic arbitrage opportunities to their creating transactions.
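Of the four methods, the simulation-based one is the easiest to picture as code. You replay the block with and without a candidate transaction and measure how much arbitrage profit disappears. The sketch below is a leave-one-out counterfactual under stated assumptions, not the paper's implementation:

- two hypothetical constant-product pools "A" and "B" quote the same pair;
- both charge a Uniswap-v2-style 0.3% input fee;
- swaps are encoded as (pool, token_in, amount) tuples.

```python
import math

FEE = 0.003  # assumed Uniswap-v2-style 0.3% input fee

def swap(pool, token_in, amount, fee=FEE):
    """Constant-product swap of `amount` of token_in; returns new (x, y) reserves."""
    x, y = pool
    g = 1.0 - fee
    if token_in == "x":
        out = y * g * amount / (x + g * amount)
        return (x + amount, y - out)
    out = x * g * amount / (y + g * amount)
    return (x - out, y + amount)

def arb_profit(pools, fee=FEE):
    """Optimal Y-denominated profit of a Y -> X -> Y cycle across pools A and B."""
    g = 1.0 - fee
    best = 0.0
    for buy, sell in (("A", "B"), ("B", "A")):
        (px, py), (qx, qy) = pools[buy], pools[sell]
        # Closed form for two chained constant-product swaps with fees.
        k, r, s = g * g * px * qy, py * qx, g * qx + g * g * px
        if k > r:
            best = max(best, (math.sqrt(k) - math.sqrt(r)) ** 2 / s)
    return best

def replay(init_pools, txs):
    """Apply (pool, token_in, amount) swaps in block order to a copy of the state."""
    pools = dict(init_pools)
    for pool, token_in, amount in txs:
        pools[pool] = swap(pools[pool], token_in, amount)
    return pools

def leave_one_out(init_pools, txs):
    """Arbitrage profit lost when each transaction is dropped from the block."""
    full = arb_profit(replay(init_pools, txs))
    return [full - arb_profit(replay(init_pools, txs[:i] + txs[i + 1:]))
            for i in range(len(txs))]

init = {"A": (1000.0, 1000.0), "B": (1000.0, 1000.0)}
block = [("A", "y", 1.0), ("A", "y", 50.0), ("B", "x", 0.5)]
print(leave_one_out(init, block))
```

In this toy block only the 50-unit swap (index 1) pushes the price gap past the round-trip fee. Removing it erases the whole opportunity, while removing either small swap barely changes the profit.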
If this is right
- Protocol designers can target specific source transactions to reduce maximal extractable value leakage.
- Validators can use origin information to better optimize the order of transactions.
- Ecosystem analysts gain a metric for health based on how opportunities are created.
- The high concentration means efforts can focus on a small set of protocols rather than the entire network.
Where Pith is reading between the lines
- Extending this attribution approach to non-atomic MEV types like liquidations could reveal similar patterns in other extraction activities.
- If the methods prove causal, smart contracts could be designed with built-in safeguards at the point of opportunity creation.
- The concentration finding may indicate that certain protocols have structural advantages in generating extractable value.
- Testing the framework on other EVM chains would check whether the single-source dominance is a general feature of these networks.
Load-bearing premise
The attribution methods correctly identify the transactions that actually cause the arbitrage opportunities, rather than transactions that are merely correlated with them.
What would settle it
Finding an arbitrage opportunity that remains even after the attributed source transaction is removed from the block, or identifying clear cases with multiple contributing sources that the methods still attribute to only one.
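Both falsification tests can be phrased against an oracle that returns the arbitrage profit when only a chosen subset of the block's transactions executes. The sketch below is minimal and assumes such a value function exists; in practice it would be a block replay. It uses a hypothetical toy in which each transaction moves the price by a fixed number of basis points and the arbitrage pays only past a 100 bps cost threshold.

```python
# Hypothetical toy game: each transaction moves the pool price by a fixed number
# of basis points; the backrun pays only once the total clears 100 bps of costs.
IMPACT_BPS = {"t1": 60, "t2": 60, "t3": 10}

def toy_value(subset):
    """Arbitrage profit (bps) when only the transactions in `subset` execute."""
    return max(0, sum(IMPACT_BPS[t] for t in subset) - 100)

def falsification_checks(value, txs, attributed):
    """Run both tests against a single-source attribution `attributed`.

    Assumes the full block yields an opportunity, i.e. value(all txs) > 0.
    """
    full = frozenset(txs)
    # Test 1: does the opportunity survive once the attributed source is removed?
    survives_removal = value(full - {attributed}) > 0
    # Test 2: on which transactions does the opportunity individually depend?
    necessary = sorted(t for t in txs if value(full - {t}) == 0)
    return {
        "survives_removal": survives_removal,
        "necessary": necessary,
        "multi_source": len(necessary) > 1,
    }

print(falsification_checks(toy_value, ["t1", "t2", "t3"], "t1"))
```

In this example the impacts are 60, 60 and 10 bps, and the source is attributed to t1. The first test passes: removing t1 leaves only 70 bps, so the opportunity disappears. The second test flags that t2 is equally necessary. That makes this a two-source opportunity, which a single-source attribution would understate.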
Original abstract
Maximal Extractable Value (MEV) represents billions of dollars in extracted value that fundamentally shapes blockchain network dynamics and participant incentives. While research has focused on MEV extraction and mitigation, we lack systematic methods to attribute MEV opportunities to their on-chain origins. This paper formalizes the MEV opportunity attribution problem and introduces a systems framework for identifying which transactions create arbitrage opportunities and quantifying their contributions. We design and evaluate four attribution methods for atomic arbitrage on EVM-compatible networks: bot-data-driven, simulation-based, coefficient-based, and Shapley-based approaches. Through large-scale retrospective analysis spanning over one million blocks on Polygon, we demonstrate that the majority of atomic arbitrage opportunities can be traced to single source transactions, validating our central hypothesis about competitive MEV markets. We quantify a highly concentrated distribution of MEV creation, where a small subset of protocols generates most opportunities, and provide comparative analysis of method trade-offs in accuracy, cost, and scalability. Our findings offer insights for protocol designers reducing MEV leakage, validators optimizing transaction ordering, and analysts measuring ecosystem health through opportunity creation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to formalize the MEV opportunity attribution problem and introduces four attribution methods for atomic arbitrage on EVM networks: bot-data-driven, simulation-based, coefficient-based, and Shapley-based. Using large-scale retrospective analysis on over one million blocks from Polygon, it demonstrates that the majority of atomic arbitrage opportunities can be traced to single source transactions, validating the hypothesis about competitive MEV markets. It also quantifies a highly concentrated distribution of MEV creation among a small subset of protocols and compares the methods' trade-offs.
Significance. This empirical study provides a novel systems framework for attributing MEV opportunities to their on-chain origins, shifting focus from extraction to creation. If the methods are shown to be causally accurate, the finding of single-source dominance and concentration would offer valuable insights for reducing MEV leakage in protocol design, optimizing transaction ordering by validators, and assessing ecosystem health. The scale of the analysis (>1M blocks) and multi-method comparison are strengths. However, the lack of external validation limits the strength of the conclusions at present.
major comments (2)
- [Attribution Methods] The evaluation of the four attribution methods relies on their internal consistency and the observed concentration in the data. No external benchmark or ground-truth validation is provided (e.g., using synthetic data with injected source transactions or verified real-world bot disclosures). This is a load-bearing issue for the central claim, as the single-source attribution could result from methodological artifacts rather than true competitive dynamics. See the skeptic's note on potential shared bias toward earliest transactions.
- [Experimental Setup] Details on how atomic arbitrage opportunities are detected, any data filtering or exclusion criteria, and the exact definition of 'source transaction' are insufficiently specified. This makes it challenging to assess whether the results are robust or sensitive to analysis choices.
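The benchmark suggested in the first major comment can be prototyped cheaply: inject a known source into synthetic blocks and score each attribution method against it. The sketch below rests on these assumptions:

- two hypothetical constant-product pools with a 0.3% fee;
- small random noise swaps;
- one large injected swap at a random position.

It contrasts a purely positional "earliest transaction" baseline with a leave-one-out counterfactual. Neither stands in for the paper's four methods. A realistic benchmark would also need adversarial bot behavior and gas dynamics.

```python
import math
import random

FEE = 0.003  # assumed constant-product pool fee

def swap(pool, token_in, amount, fee=FEE):
    """Constant-product swap; returns the new (x, y) reserves."""
    x, y = pool
    g = 1.0 - fee
    if token_in == "x":
        return (x + amount, y - y * g * amount / (x + g * amount))
    return (x - x * g * amount / (y + g * amount), y + amount)

def arb_profit(pools, fee=FEE):
    """Optimal profit of a Y -> X -> Y cycle across pools A and B, in Y."""
    g = 1.0 - fee
    best = 0.0
    for buy, sell in (("A", "B"), ("B", "A")):
        (px, py), (qx, qy) = pools[buy], pools[sell]
        k, r, s = g * g * px * qy, py * qx, g * qx + g * g * px
        if k > r:
            best = max(best, (math.sqrt(k) - math.sqrt(r)) ** 2 / s)
    return best

def profit_of(init, txs):
    """Replay the swaps in order and return the resulting arbitrage profit."""
    pools = dict(init)
    for pool, token_in, amount in txs:
        pools[pool] = swap(pools[pool], token_in, amount)
    return arb_profit(pools)

def earliest_heuristic(init, txs):
    """Positional baseline: blame the first transaction in the block."""
    return 0

def leave_one_out(init, txs):
    """Counterfactual: blame the tx whose removal destroys the most profit."""
    full = profit_of(init, txs)
    drops = [full - profit_of(init, txs[:i] + txs[i + 1:]) for i in range(len(txs))]
    return max(range(len(txs)), key=drops.__getitem__)

def synthetic_block(rng, n_noise=4):
    """Tiny noise swaps plus one large injected source at a random position."""
    txs = [(rng.choice("AB"), rng.choice("xy"), rng.uniform(0.01, 0.5))
           for _ in range(n_noise)]
    pos = rng.randrange(n_noise + 1)
    txs.insert(pos, ("A", "y", 50.0))  # ground-truth source
    return txs, pos

def accuracy(method, trials=200, seed=0):
    """Fraction of synthetic blocks where `method` recovers the injected source."""
    rng = random.Random(seed)
    init = {"A": (1000.0, 1000.0), "B": (1000.0, 1000.0)}
    hits = 0
    for _ in range(trials):
        txs, truth = synthetic_block(rng)
        hits += method(init, txs) == truth
    return hits / trials

print(accuracy(leave_one_out), accuracy(earliest_heuristic))
```

Here the noise swaps can never clear the fee on their own, so leave-one-out recovers the injected source every time. The positional baseline is right only when the injected swap happens to come first, which is expected in about one block in five. Blocks where several transactions jointly create the gap would be the harder and more informative test.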
minor comments (3)
- [Abstract] The abstract states the central finding but could more precisely quantify 'the majority' (e.g., what percentage) to set expectations for the results section.
- Ensure that all equations and algorithms for the attribution methods are clearly presented with pseudocode or mathematical formulations for reproducibility.
- [Results] The comparative analysis of method trade-offs should include specific numbers for accuracy, cost, and scalability to allow readers to evaluate them directly.
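The cost side of that comparison is easy to see for the Shapley-based method. Exact Shapley values average each transaction's marginal contribution over all n! join orders, whereas a Monte Carlo estimate samples a fixed number of orders. The sketch below implements both. It uses a hypothetical toy value function in place of real block replays, where each value call would be one simulation. Neither function is claimed to match the paper's implementation.

```python
import random
from fractions import Fraction
from itertools import permutations
from math import factorial

# Hypothetical toy game: per-transaction price impact in basis points; the
# arbitrage pays only once the combined move clears a 100 bps cost threshold.
IMPACT_BPS = {"t1": 60, "t2": 60, "t3": 10}

def toy_value(subset):
    """Arbitrage profit (bps) when only the transactions in `subset` execute."""
    return max(0, sum(IMPACT_BPS[t] for t in subset) - 100)

def exact_shapley(value, players):
    """Average marginal contributions over all n! join orders (exponential cost)."""
    phi = {p: Fraction(0) for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            phi[p] += value(coalition) - before
    return {p: v / factorial(len(players)) for p, v in phi.items()}

def sampled_shapley(value, players, samples, seed=0):
    """Monte Carlo estimate from random join orders: about samples * (n + 1) calls."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(samples):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(coalition)
        for p in order:
            coalition.add(p)
            cur = value(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: t / samples for p, t in totals.items()}

players = ["t1", "t2", "t3"]
print(exact_shapley(toy_value, players))
print(sampled_shapley(toy_value, players, samples=2000))
```

In the exact computation, t1 and t2 each receive 40/3 bps and t3 receives 10/3, which sums to the 30 bps the full block yields. Note that t3 earns credit even though removing it leaves the opportunity intact. Shapley shares and removal tests answer different questions.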
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting areas where additional clarity and discussion would strengthen the manuscript. We address each major comment in turn below, with planned revisions where appropriate.
Point-by-point responses
- Referee: [Attribution Methods] The evaluation of the four attribution methods relies on their internal consistency and the observed concentration in the data. No external benchmark or ground-truth validation is provided (e.g., using synthetic data with injected source transactions or verified real-world bot disclosures). This is a load-bearing issue for the central claim, as the single-source attribution could result from methodological artifacts rather than true competitive dynamics. See the skeptic's note on potential shared bias toward earliest transactions.
Authors: We agree that external ground-truth validation would provide stronger causal support for the attribution results. Generating fully realistic synthetic MEV data that captures adversarial bot strategies, gas price dynamics, and cross-protocol interactions is technically challenging and beyond the scope of the current retrospective analysis. We instead emphasize the convergence across four methodologically distinct approaches (bot-data-driven, which relies on observed searcher actions; simulation-based, which replays execution traces; coefficient-based, which uses regression on opportunity sizes; and Shapley-based, which computes marginal contributions over coalitions). All four independently identify single-source dominance, which reduces the probability of a shared methodological artifact. On the potential bias toward earliest transactions, the bot-data-driven method uses real bot transaction data rather than positional assumptions, the simulation-based method identifies dependency via actual execution outcomes, and the Shapley value explicitly averages over all possible orderings of transactions. We will add a new Limitations subsection that explicitly discusses these validation challenges, the rationale for relying on multi-method agreement, and directions for future work (e.g., collaboration with disclosed bot operators). revision: partial
- Referee: [Experimental Setup] Details on how atomic arbitrage opportunities are detected, any data filtering or exclusion criteria, and the exact definition of 'source transaction' are insufficiently specified. This makes it challenging to assess whether the results are robust or sensitive to analysis choices.
Authors: We acknowledge that the current description of the experimental pipeline is not sufficiently detailed for full reproducibility. In the revised manuscript we will expand the Data Collection and Opportunity Detection subsection to include: (1) the precise heuristic for identifying atomic arbitrage opportunities (including the requirement that the opportunity be profitable after gas and that the backrun transaction appears in the same block); (2) all filtering criteria applied to the >1M Polygon blocks (e.g., exclusion of blocks with missing trace data, non-EVM transactions, or blocks containing only failed transactions); and (3) a formal definition of a source transaction as the earliest transaction in the block whose execution creates a positive expected value for a subsequent atomic arbitrage backrun. We will also add pseudocode for the detection algorithm and a sensitivity analysis appendix examining how results change under alternative filtering thresholds. revision: yes
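The proposed definition of a source transaction can be read as a prefix scan. You replay the block one transaction at a time and return the first position after which a backrun would be profitable net of gas. The sketch below is minimal and assumes:

- two hypothetical constant-product pools with a 0.3% fee;
- a fixed gas cost expressed in the output token;
- swaps encoded as (pool, token_in, amount) tuples.

```python
import math

FEE = 0.003  # assumed constant-product pool fee

def swap(pool, token_in, amount, fee=FEE):
    """Constant-product swap; returns the new (x, y) reserves."""
    x, y = pool
    g = 1.0 - fee
    if token_in == "x":
        return (x + amount, y - y * g * amount / (x + g * amount))
    return (x - x * g * amount / (y + g * amount), y + amount)

def arb_profit(pools, fee=FEE):
    """Optimal profit of a Y -> X -> Y cycle across pools A and B, in Y."""
    g = 1.0 - fee
    best = 0.0
    for buy, sell in (("A", "B"), ("B", "A")):
        (px, py), (qx, qy) = pools[buy], pools[sell]
        k, r, s = g * g * px * qy, py * qx, g * qx + g * g * px
        if k > r:
            best = max(best, (math.sqrt(k) - math.sqrt(r)) ** 2 / s)
    return best

def find_source(init_pools, txs, gas_cost):
    """Index of the earliest tx after which a backrun nets a profit, else None."""
    pools = dict(init_pools)
    for i, (pool, token_in, amount) in enumerate(txs):
        pools[pool] = swap(pools[pool], token_in, amount)
        # Profitable after gas: the detection criterion described above.
        if arb_profit(pools) - gas_cost > 0:
            return i
    return None

init = {"A": (1000.0, 1000.0), "B": (1000.0, 1000.0)}
block = [("B", "x", 0.3), ("A", "y", 2.0), ("A", "y", 40.0), ("A", "y", 5.0)]
print(find_source(init, block, gas_cost=0.05))
```

Here the first two swaps leave the inter-pool gap below the round-trip fee, so the scan returns index 2. The scan stops at the first qualifying prefix. As a result, a transaction that merely completes a gap built by earlier swaps would receive all the credit, which is one concrete form of the positional sensitivity raised in the first major comment.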
Circularity Check
No significant circularity in empirical attribution analysis
The paper's central claim—that the majority of atomic arbitrage opportunities trace to single source transactions—is an empirical statistical observation obtained by applying four attribution methods (bot-data-driven, simulation-based, coefficient-based, and Shapley-based) to over one million blocks of Polygon data. No equations, derivations, or self-citations are presented that reduce the attribution results or the hypothesis about competitive MEV markets to the inputs by construction. The methods are introduced as distinct approaches whose internal consistency and resulting concentration statistics are reported directly from the external blockchain dataset; the outcome is not forced by definitional assumptions, fitted parameters renamed as predictions, or load-bearing self-citations. While independent causal ground truth would strengthen validity, the reported derivation chain remains self-contained against the public data and does not exhibit the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Blockchain state transitions are deterministic given the transaction inputs and the current state.
- [domain assumption] Arbitrage opportunities are created by price discrepancies introduced by prior transactions rather than by simultaneous actions.
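The second assumption can be made concrete for two constant-product pools quoting the same pair X/Y. This is an illustrative assumption, since real pools may use other invariants. Let pool A hold reserves (a_x, a_y) and pool B hold (b_x, b_y), and let γ = 1 − f for a proportional input fee f. Routing Δ units of Y into A for X, then selling that X into B, gives the following.

```latex
\begin{aligned}
\mathrm{out}(\Delta) &= \frac{K\Delta}{R + S\Delta},
  \qquad K = \gamma^2 a_x b_y,\quad R = a_y b_x,\quad S = \gamma b_x + \gamma^2 a_x,\\
\pi(\Delta) &= \mathrm{out}(\Delta) - \Delta,
  \qquad \pi'(\Delta) = \frac{KR}{(R + S\Delta)^2} - 1,\\
\Delta^{*} &= \frac{\sqrt{KR} - R}{S},
  \qquad \pi(\Delta^{*}) = \frac{\left(\sqrt{K} - \sqrt{R}\right)^2}{S},\\
\pi(\Delta^{*}) > 0 &\iff K > R \iff \frac{b_y / b_x}{a_y / a_x} > \frac{1}{\gamma^2}.
\end{aligned}
```

Without fees (γ = 1), any price discrepancy is exploitable. With fees, a prior transaction creates an opportunity only when it pushes the price ratio between the pools past 1/γ². Gas costs, which this derivation ignores, raise the bar further.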