pith. machine review for the scientific record.

arxiv: 2604.27979 · v1 · submitted 2026-04-30 · 💻 cs.DC

Recognition: unknown

The Origins of MEV: Systematic Attribution of Arbitrage Opportunity Creation at Scale

Aleksei Smirnov, Anastasiia Smirnova, Andrei Seoev, Denis Fedyanin, Dmitry Belousov, Ksenia Kurinova, Yury Yanovich

Pith reviewed 2026-05-07 06:45 UTC · model grok-4.3

classification 💻 cs.DC
keywords MEV · arbitrage attribution · blockchain · Polygon · atomic arbitrage · source transaction · opportunity creation · EVM

The pith

Analysis of more than one million Polygon blocks shows that most atomic arbitrage opportunities stem from a single source transaction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors develop methods for identifying which on-chain transactions first create atomic arbitrage opportunities. Applying four attribution methods to more than one million blocks on the Polygon network, they show that most opportunities originate from a single source transaction. This matters because knowing where extractable value originates helps protocol designers reduce value leakage, helps validators order transactions more effectively, and gives analysts a measure of ecosystem health. The results also indicate that a small number of protocols create the bulk of these opportunities.

Core claim

This work formalizes the problem of attributing maximal extractable value opportunities to their origins and provides a framework with four methods to identify the source transactions for atomic arbitrage on EVM networks. Applying these to more than one million blocks on Polygon reveals that the majority of such opportunities trace back to single source transactions. This supports the view of competitive markets and shows that opportunity creation is highly concentrated among a few protocols.

What carries the argument

Four attribution methods—bot-data-driven, simulation-based, coefficient-based, and Shapley-based—within a systems framework for tracing atomic arbitrage opportunities to their creating transactions.

If this is right

  • Protocol designers can target specific source transactions to reduce maximal extractable value leakage.
  • Validators can use origin information to better optimize the order of transactions.
  • Ecosystem analysts gain a metric for health based on how opportunities are created.
  • The high concentration means efforts can focus on a small set of protocols rather than the entire network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Extending this attribution approach to non-atomic MEV types like liquidations could reveal similar patterns in other extraction activities.
  • If the methods prove causal, smart contracts could be designed with built-in safeguards at the point of opportunity creation.
  • The concentration finding may indicate that certain protocols have structural advantages in generating extractable value.
  • Testing the framework on other EVM chains would check whether the single-source dominance is a general feature of these networks.

Load-bearing premise

The attribution methods identify the transactions that actually cause arbitrage opportunities, rather than transactions that are merely correlated with them.

What would settle it

Finding an arbitrage opportunity that remains even after the attributed source transaction is removed from the block, or identifying clear cases with multiple contributing sources that the methods still attribute to only one.
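
The removal test described above can be illustrated with a toy replay. The following is a minimal sketch under stated assumptions, not the paper's pipeline: `toy_profit`, the price-shift lists, and the flat fee are all hypothetical stand-ins for a real block re-execution.

```python
from typing import Callable, Sequence


def toy_profit(price_shifts: Sequence[float], fee: float = 1.0) -> float:
    """Hypothetical profit model: absolute price gap minus a flat fee, floored at 0."""
    gap = abs(sum(price_shifts))
    return max(0.0, gap - fee)


def survives_removal(
    price_shifts: Sequence[float],
    source_idx: int,
    profit_fn: Callable[[Sequence[float]], float] = toy_profit,
) -> bool:
    """Return True if the opportunity is still profitable without the attributed source."""
    remaining = [s for i, s in enumerate(price_shifts) if i != source_idx]
    return profit_fn(remaining) > 0.0


# Case 1: one transaction dominates, so removing it eliminates the opportunity.
single_source = [0.1, 5.0, -0.2]
# Case 2: two comparable contributors, so a single-source label would be misleading.
multi_source = [3.0, 3.0, 0.1]

print(survives_removal(single_source, source_idx=1))  # False
print(survives_removal(multi_source, source_idx=0))   # True
```

A `True` result for an attributed source is the kind of counterexample that would count against the single-source claim. In practice `profit_fn` would need to replay the block on a real execution client.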

Figures

Figures reproduced from arXiv: 2604.27979 by Aleksei Smirnov, Anastasiia Smirnova, Andrei Seoev, Denis Fedyanin, Dmitry Belousov, Ksenia Kurinova, Yury Yanovich.

Figure 1: Example atomic arbitrage on Polygon (block 58,329,504). The arbitrageur exploits a price imbalance between Uniswap V3 (WMATIC/USDC) and Uniswap V2 (USDC/WMATIC) pools. After executing two swaps, the transaction yields a net profit of 0.3121 WMATIC ($0.24 at execution time), satisfying all three atomic arbitrage criteria: 𝑁 = 2 swaps, Δ(𝐴) ≥ 0 for all assets, and Profit > 0 after fees. Transaction hash: 0x…
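
The three criteria listed in the Figure 1 caption can be written as a small check. This is a sketch, not the paper's detector: the `Swap` record, the function names, and the example amounts are hypothetical, and treating the swap-count criterion as "at least two swaps" is an assumption, since the caption only reports N = 2 for this one example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Swap:
    token_in: str
    amount_in: float
    token_out: str
    amount_out: float


def net_deltas(swaps: List[Swap]) -> Dict[str, float]:
    """Net balance change per asset across all swaps in one transaction."""
    deltas: Dict[str, float] = defaultdict(float)
    for s in swaps:
        deltas[s.token_in] -= s.amount_in
        deltas[s.token_out] += s.amount_out
    return dict(deltas)


def is_atomic_arbitrage(
    swaps: List[Swap],
    fee: float,
    profit_token: str,
    min_swaps: int = 2,  # assumption: criterion is "at least two swaps"
    tol: float = 1e-9,
) -> bool:
    """Check swap count, non-negative deltas for every asset, and positive profit after fees."""
    if len(swaps) < min_swaps:
        return False
    deltas = net_deltas(swaps)
    if any(d < -tol for d in deltas.values()):
        return False
    return deltas.get(profit_token, 0.0) - fee > 0.0


# Illustrative amounts only; these are not the values from block 58,329,504.
winning = [Swap("WMATIC", 100.0, "USDC", 80.0), Swap("USDC", 80.0, "WMATIC", 100.5)]
losing = [Swap("WMATIC", 100.0, "USDC", 80.0), Swap("USDC", 80.0, "WMATIC", 99.0)]

print(is_atomic_arbitrage(winning, fee=0.1, profit_token="WMATIC"))  # True
print(is_atomic_arbitrage(losing, fee=0.1, profit_token="WMATIC"))   # False
```

The fee is expressed in the profit token to keep the sketch short. A real detector would convert gas costs at execution-time prices.
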
Figure 2: Simulation-based attribution pipeline. (1) Filter transactions by pool intersection (yellow). (2) Binary search backwards to find the edge transaction T_edge where profit drops below the 5% threshold (blue). (3) Compute marginal impacts via a backward pass; select the source transaction with maximum impact (green). We traverse backwards from T_arb to T_edge, computing Imp_i for each transaction. The source transaction is s…
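
The three pipeline steps in the Figure 2 caption can be sketched on a toy model. Everything concrete here is an assumption made for illustration: the `Tx` tuple layout, the additive price-shift profit model in `profit_after_prefix`, reading the 5% threshold as a fraction of the full profit, and the monotonicity that the binary search relies on. The paper's simulator replays real execution and may differ on each of these points.

```python
from typing import Dict, List, Set, Tuple

# (transaction id, pools touched, toy price shift); all fields are hypothetical.
Tx = Tuple[str, Set[str], float]


def filter_by_pools(txs: List[Tx], arb_pools: Set[str]) -> List[Tx]:
    """Step 1: keep transactions touching at least one pool used by the arbitrage."""
    return [tx for tx in txs if tx[1] & arb_pools]


def profit_after_prefix(candidates: List[Tx], k: int, fee: float = 1.0) -> float:
    """Toy replay: arbitrage profit if only the first k candidates had executed."""
    gap = abs(sum(shift for _, _, shift in candidates[:k]))
    return max(0.0, gap - fee)


def find_edge(candidates: List[Tx], threshold: float = 0.05) -> int:
    """Step 2: binary search for the index of the edge transaction.

    Assumes profit is non-decreasing in the prefix length.
    """
    full = profit_after_prefix(candidates, len(candidates))
    if full <= 0.0:
        raise ValueError("no profitable opportunity to attribute")
    lo, hi = 0, len(candidates)  # profit(lo) is below the cut, profit(hi) is at or above it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if profit_after_prefix(candidates, mid) >= threshold * full:
            hi = mid
        else:
            lo = mid
    return hi - 1


def attribute_source(txs: List[Tx], arb_pools: Set[str]) -> Tuple[str, Dict[str, float]]:
    """Step 3: backward pass over marginal impacts; the source has the largest impact."""
    candidates = filter_by_pools(txs, arb_pools)
    edge = find_edge(candidates)
    impacts: Dict[str, float] = {}
    for i in range(len(candidates) - 1, edge - 1, -1):
        before = profit_after_prefix(candidates, i)
        after = profit_after_prefix(candidates, i + 1)
        impacts[candidates[i][0]] = after - before
    return max(impacts, key=impacts.get), impacts


block = [
    ("t0", {"P9"}, 4.0),        # unrelated pool, removed by the filter
    ("t1", {"P1"}, 0.2),
    ("t2", {"P1", "P2"}, 6.0),  # large swap that opens the price gap
    ("t3", {"P2"}, 0.5),
]
source, impacts = attribute_source(block, arb_pools={"P1", "P2"})
print(source)  # t2
```

The binary search only saves replays when profit grows monotonically along the block. If that assumption fails, a linear backward scan is the safer fallback.
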
Figure 3: Shapley attribution for an arbitrage event (block 82,563,006). Positive values indicate opportunity creation; negative values indicate profit consumption by competing arbitrageurs. The non-arbitrage transaction at index 129 is the primary source (+32.58 MATIC). … agreement computation with Shapley, attributions to S_0 (i.e., "no in-block source") are counted as matches when multiple methods concur on this…
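
The sign convention in the Figure 3 caption (positive for opportunity creation, negative for profit consumption) falls out of the standard Shapley definition. The sketch below computes exact Shapley values by enumerating all orderings of a three-player toy game. The player names, the `SHIFTS` table, and the additive value function `toy_value` are hypothetical, and the S_0 "no in-block source" player is not modeled.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical toy game: each transaction shifts the price gap by a fixed amount.
SHIFTS = {"swap": 10.0, "small_trade": 1.0, "rival_arb": -4.0}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Profit available when only the transactions in `coalition` are included."""
    return max(0.0, sum(SHIFTS[p] for p in coalition))


def shapley_exact(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Average each player's marginal contribution over every ordering of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev
            prev = cur
    n_orderings = factorial(len(players))
    return {p: t / n_orderings for p, t in totals.items()}


phi = shapley_exact(list(SHIFTS), toy_value)
for name, val in phi.items():
    print(f"{name}: {val:+.3f}")
# The values sum to the full-coalition profit (efficiency property).
print(sum(phi.values()))
```

Exact enumeration costs n! value evaluations, so it is only practical for a handful of candidate transactions.
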
Figure 4: Monte Carlo Shapley convergence for transaction 0xb1f2a5bb.. (block 82,554,874). Four subplots show convergence for candidates with varying Shapley values (one near-zero, one exactly zero, two negative). Exact Shapley values shown as horizontal dashed lines; Monte Carlo estimates (mean ±1 std over 100 runs) shown as points with shaded regions. Estimates stabilize within 5% after ∼500 samples.
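
The Monte Carlo estimates in Figure 4 are presumably produced by sampling orderings rather than enumerating them. A minimal sketch of that permutation-sampling estimator follows, assuming a hypothetical three-player toy game (`SHIFTS`, `toy_value`). The sample count and seed are arbitrary, and the paper's estimator may differ in detail.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence

# Hypothetical toy game: each transaction shifts the price gap by a fixed amount.
SHIFTS = {"swap": 10.0, "small_trade": 1.0, "rival_arb": -4.0}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Profit available when only the transactions in `coalition` are included."""
    return max(0.0, sum(SHIFTS[p] for p in coalition))


def shapley_monte_carlo(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    n_samples: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values by averaging marginal contributions over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_samples):
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset())
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return {p: t / n_samples for p, t in totals.items()}


estimate = shapley_monte_carlo(list(SHIFTS), toy_value, n_samples=2000)
for name, val in estimate.items():
    print(f"{name}: {val:+.3f}")
```

Each sampled ordering telescopes to the full-coalition profit, so the estimates always sum to that total even before the individual values have converged.
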
Figure 5: Shapley attribution for complex arbitrage (block 82,554,874, transaction 0xb1f2a5bb..). Seventeen transactions connected by non-zero Shapley values. Despite multiple participants, attribution remains dominated by a single source. … this concentration at the address level, each opportunity-creating transaction attracts only ∼1.6 successfully executed arbitrage transactions on average, a ratio reflecting standa…
Figure 6: MEV concentration: accumulated MEV value by percentile of top arbitrageurs (blue) vs. opportunity-creating transactions (orange). Top 1% of each group accounts for 80% of extracted value, yet the executed arbitrage-to-opportunity ratio remains ∼1.6:1. Analysis based on 220,262 opportunity-creating transactions from the February 2026 dataset. … other MEV categories (liquidations, sandwich attacks, top-of-block op…
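
The concentration statistic in the Figure 6 caption (the share of value held by the top percentile) can be computed as sketched below. The function name `top_share` and the sample values are hypothetical and do not reproduce the paper's dataset.

```python
from typing import Sequence


def top_share(values: Sequence[float], top_fraction: float) -> float:
    """Fraction of total value held by the top `top_fraction` of entities (at least one)."""
    if not values:
        raise ValueError("values must be non-empty")
    ranked = sorted(values, reverse=True)
    k = max(1, int(round(top_fraction * len(ranked))))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Illustrative data: 100 entities, one of which holds most of the value.
mev_by_entity = [400.0] + [1.0] * 99
share = top_share(mev_by_entity, top_fraction=0.01)
print(f"top 1% share: {share:.3f}")
```

Running the same function over a grid of fractions yields the cumulative curve that the figure plots for each group.
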
Original abstract

Maximal Extractable Value (MEV) represents billions of dollars in extracted value that fundamentally shapes blockchain network dynamics and participant incentives. While research has focused on MEV extraction and mitigation, we lack systematic methods to attribute MEV opportunities to their on-chain origins. This paper formalizes the MEV opportunity attribution problem and introduces a systems framework for identifying which transactions create arbitrage opportunities and quantifying their contributions. We design and evaluate four attribution methods for atomic arbitrage on EVM-compatible networks: bot-data-driven, simulation-based, coefficient-based, and Shapley-based approaches. Through large-scale retrospective analysis spanning over one million blocks on Polygon, we demonstrate that the majority of atomic arbitrage opportunities can be traced to single source transactions, validating our central hypothesis about competitive MEV markets. We quantify a highly concentrated distribution of MEV creation, where a small subset of protocols generates most opportunities, and provide comparative analysis of method trade-offs in accuracy, cost, and scalability. Our findings offer insights for protocol designers reducing MEV leakage, validators optimizing transaction ordering, and analysts measuring ecosystem health through opportunity creation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims to formalize the MEV opportunity attribution problem and introduces four attribution methods for atomic arbitrage on EVM networks: bot-data-driven, simulation-based, coefficient-based, and Shapley-based. Using large-scale retrospective analysis on over one million blocks from Polygon, it demonstrates that the majority of atomic arbitrage opportunities can be traced to single source transactions, validating the hypothesis about competitive MEV markets. It also quantifies a highly concentrated distribution of MEV creation among a small subset of protocols and compares the methods' trade-offs.

Significance. This empirical study provides a novel systems framework for attributing MEV opportunities to their on-chain origins, shifting focus from extraction to creation. If the methods are shown to be causally accurate, the finding of single-source dominance and concentration would offer valuable insights for reducing MEV leakage in protocol design, optimizing transaction ordering by validators, and assessing ecosystem health. The scale of the analysis (>1M blocks) and multi-method comparison are strengths. However, the lack of external validation limits the strength of the conclusions at present.

major comments (2)
  1. [Attribution Methods] The evaluation of the four attribution methods relies on their internal consistency and the observed concentration in the data. No external benchmark or ground-truth validation is provided (e.g., using synthetic data with injected source transactions or verified real-world bot disclosures). This is a load-bearing issue for the central claim, as the single-source attribution could result from methodological artifacts rather than true competitive dynamics. See the skeptic's note on potential shared bias toward earliest transactions.
  2. [Experimental Setup] Details on how atomic arbitrage opportunities are detected, any data filtering or exclusion criteria, and the exact definition of 'source transaction' are insufficiently specified. This makes it challenging to assess whether the results are robust or sensitive to analysis choices.
minor comments (3)
  1. [Abstract] The abstract states the central finding but could more precisely quantify 'the majority' (e.g., what percentage) to set expectations for the results section.
  2. Ensure that all equations and algorithms for the attribution methods are clearly presented with pseudocode or mathematical formulations for reproducibility.
  3. [Results] The comparative analysis of method trade-offs should include specific numbers for accuracy, cost, and scalability to allow readers to evaluate them directly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting areas where additional clarity and discussion would strengthen the manuscript. We address each major comment in turn below, with planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Attribution Methods] The evaluation of the four attribution methods relies on their internal consistency and the observed concentration in the data. No external benchmark or ground-truth validation is provided (e.g., using synthetic data with injected source transactions or verified real-world bot disclosures). This is a load-bearing issue for the central claim, as the single-source attribution could result from methodological artifacts rather than true competitive dynamics. See the skeptic's note on potential shared bias toward earliest transactions.

    Authors: We agree that external ground-truth validation would provide stronger causal support for the attribution results. Generating fully realistic synthetic MEV data that captures adversarial bot strategies, gas price dynamics, and cross-protocol interactions is technically challenging and beyond the scope of the current retrospective analysis. We instead emphasize the convergence across four methodologically distinct approaches (bot-data-driven, which relies on observed searcher actions; simulation-based, which replays execution traces; coefficient-based, which uses regression on opportunity sizes; and Shapley-based, which computes marginal contributions over coalitions). All four independently identify single-source dominance, which reduces the probability of a shared methodological artifact. On the potential bias toward earliest transactions, the bot-data-driven method uses real bot transaction data rather than positional assumptions, the simulation-based method identifies dependency via actual execution outcomes, and the Shapley value explicitly averages over all possible orderings of transactions. We will add a new Limitations subsection that explicitly discusses these validation challenges, the rationale for relying on multi-method agreement, and directions for future work (e.g., collaboration with disclosed bot operators). revision: partial

  2. Referee: [Experimental Setup] Details on how atomic arbitrage opportunities are detected, any data filtering or exclusion criteria, and the exact definition of 'source transaction' are insufficiently specified. This makes it challenging to assess whether the results are robust or sensitive to analysis choices.

    Authors: We acknowledge that the current description of the experimental pipeline is not sufficiently detailed for full reproducibility. In the revised manuscript we will expand the Data Collection and Opportunity Detection subsection to include: (1) the precise heuristic for identifying atomic arbitrage opportunities (including the requirement that the opportunity be profitable after gas and that the backrun transaction appears in the same block); (2) all filtering criteria applied to the >1 M Polygon blocks (e.g., exclusion of blocks with missing trace data, non-EVM transactions, or blocks containing only failed transactions); and (3) a formal definition of a source transaction as the earliest transaction in the block whose execution creates a positive expected value for a subsequent atomic arbitrage backrun. We will also add pseudocode for the detection algorithm and a sensitivity analysis appendix examining how results change under alternative filtering thresholds. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical attribution analysis

Full rationale

The paper's central claim—that the majority of atomic arbitrage opportunities trace to single source transactions—is an empirical statistical observation obtained by applying four attribution methods (bot-data-driven, simulation-based, coefficient-based, and Shapley-based) to over one million blocks of Polygon data. No equations, derivations, or self-citations are presented that reduce the attribution results or the hypothesis about competitive MEV markets to the inputs by construction. The methods are introduced as distinct approaches whose internal consistency and resulting concentration statistics are reported directly from the external blockchain dataset; the outcome is not forced by definitional assumptions, fitted parameters renamed as predictions, or load-bearing self-citations. While independent causal ground truth would strengthen validity, the reported derivation chain remains self-contained against the public data and does not exhibit the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the four attribution methods can isolate causal sources from observational blockchain data. No new physical constants or mathematical axioms are introduced; the work relies on standard blockchain execution semantics and game-theoretic fairness notions already present in the literature.

axioms (2)
  • [standard math] Blockchain state transitions are deterministic given transaction inputs and current state.
    Invoked implicitly when using simulation-based attribution to replay blocks without certain transactions.
  • [domain assumption] Arbitrage opportunities are created by price discrepancies introduced by prior transactions rather than by simultaneous actions.
    Central to the single-source hypothesis and the design of all four attribution methods.

pith-pipeline@v0.9.0 · 5516 in / 1546 out tokens · 68386 ms · 2026-05-07T06:45:17.805629+00:00 · methodology

