Dynamic Transaction Scheduling and Pricing in the Ethereum Mempool
Pith reviewed 2026-05-14 19:20 UTC · model grok-4.3
The pith
Dynamic block pricing via an MDP stabilizes Ethereum mempool volume near target capacity and produces EIP-1559-like updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We first give a primal-dual view of static EIP-1559 in which block prices arise as dual variables to a social-welfare maximization problem. Extending to the dynamic setting, we formulate an MDP whose objective balances long-run discounted reward against holding costs and overshoot penalties. Running natural policy gradient on this MDP yields policies under which, as the overshoot weight increases, scheduled transaction volume converges to target block capacity and the price updates resemble the EIP-1559 rule. For homogeneous transactions the optimal policy has a threshold structure; for uniform arrivals a bang-bang mechanism is proposed together with a lower bound on the block capacity needed to ensure stability.
What carries the argument
Markov decision process whose state is the current mempool configuration and whose actions are block prices, solved by natural policy gradient to maximize discounted reward net of holding and overshoot costs.
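One way to write down the objective described above is sketched here. The notation is an assumption introduced for illustration, not the paper's: $s_t$ is the mempool state, $p_t$ the block price, $r$ the reward from scheduled transactions, $h$ the holding cost, $v$ the scheduled volume, $T$ the target capacity, $\lambda$ the overshoot weight, and $\gamma$ the discount factor. The one-sided form of the overshoot term is also an assumption.

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
  \Bigl( r(s_t, p_t) - h(s_t) - \lambda\,\bigl( v(s_t, p_t) - T \bigr)^{+} \Bigr) \right],
\qquad p_t \sim \pi(\cdot \mid s_t), \quad 0 < \gamma < 1 .
```

In this reading, raising $\lambda$ makes exceeding the target progressively more expensive relative to the reward and holding terms.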
If this is right
- Raising the overshoot penalty causes average scheduled transaction volume to converge to target block capacity.
- Natural policy gradient price updates closely resemble the EIP-1559 update rule.
- In the homogeneous-transaction case the optimal policy exhibits a threshold structure.
- For uniform arrivals a bang-bang pricing rule keeps the system stable provided block capacity meets a derived lower bound.
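For orientation, the EIP-1559-style rule referred to in this list has the multiplicative form basefee_{t+1} = basefee_t * (1 + f*(gas_used - target)/target). The sketch below is a floating-point simplification with a hypothetical function name; the default f = 1/8 corresponds to the base fee max change denominator of 8 in EIP-1559, whereas the actual specification uses integer arithmetic.

```python
def eip1559_style_update(base_fee: float, gas_used: float,
                         gas_target: float, f: float = 0.125) -> float:
    """Float sketch of base_fee * (1 + f * (gas_used - target) / target).

    Illustrative only: the real EIP-1559 rule works on integers.
    """
    if gas_target <= 0:
        raise ValueError("gas_target must be positive")
    # Price rises when the block is above target and falls when below.
    return base_fee * (1.0 + f * (gas_used - gas_target) / gas_target)


if __name__ == "__main__":
    target = 15_000_000
    print(eip1559_style_update(100.0, 2 * target, target))  # block twice the target
    print(eip1559_style_update(100.0, target, target))      # block exactly at target
    print(eip1559_style_update(100.0, 0, target))           # empty block
```

With f = 1/8, a block at twice the target raises the price by 12.5% and an empty block lowers it by 12.5%.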
Where Pith is reading between the lines
- Mempool trace data could be used to test whether the MDP state representation matches observed arrival and departure statistics.
- The same MDP-plus-NPG approach could be applied to fee-market design on other chains that set per-block prices.
- The threshold structure found for homogeneous transactions suggests implementable heuristic rules that avoid solving the full MDP online.
Load-bearing premise
The stochastic evolution of the mempool is fully captured by a Markov decision process whose state is the current mempool configuration.
What would settle it
Run a simulation of the mempool with successively larger overshoot penalties and check whether average scheduled transaction volume converges to the target block capacity while the computed price updates match the EIP-1559 formula.
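The check described above could be scored along the following lines. This is a minimal sketch, not the paper's evaluation code: the function name, the input layout (one more price than volumes), and the default f = 1/8 are assumptions, and the simulator that produces the trajectories is left out.

```python
from typing import Sequence, Tuple


def settle_check(volumes: Sequence[float], prices: Sequence[float],
                 target: float, f: float = 0.125) -> Tuple[float, float]:
    """Return (relative gap of mean volume from target,
               worst relative deviation from the EIP-1559-style update).

    prices[t] is the price in force for block t; prices[t + 1] is the
    price chosen after observing volumes[t]. Prices are assumed positive.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if len(volumes) < 1 or len(prices) != len(volumes) + 1:
        raise ValueError("need at least one volume and one more price than volumes")
    mean_volume = sum(volumes) / len(volumes)
    volume_gap = abs(mean_volume - target) / target
    worst = 0.0
    for t, v in enumerate(volumes):
        # Price the multiplicative rule would have produced from prices[t].
        predicted = prices[t] * (1.0 + f * (v - target) / target)
        worst = max(worst, abs(prices[t + 1] - predicted) / predicted)
    return volume_gap, worst
```

Running it on trajectories generated with successively larger overshoot penalties would show whether both numbers shrink, which is the pattern the claim predicts.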
Original abstract
The Ethereum blockchain utilizes the EIP-1559 algorithm to manage transaction inclusion and block assembly. However, EIP-1559 and much of the existing literature study this problem from a static perspective, focusing on price evolution without modelling transaction dynamics within the mempool. Motivated by this limitation, we study a dynamic transaction scheduling problem in which transactions with heterogeneous sizes and per-unit values arrive over time and remain in the mempool until scheduled. To capture the stochastic mempool evolution, we formulate the problem as a Markov Decision Process (MDP) whose state represents the mempool configuration and whose actions correspond to block prices. We first provide a primal-dual interpretation of the static EIP-1559 mechanism, showing that block prices arise naturally as dual variables of a social-welfare maximization problem. Building on this perspective, we extend the framework to the dynamic setting and formulate an objective that maximizes long-run discounted reward while incorporating holding costs and overshoot penalties. We then employ a Natural Policy Gradient (NPG) algorithm to compute the optimal policy. Our results show that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward. In particular, as the overshoot penalty increases, the average scheduled transaction volume converges to the target block capacity, and the resulting NPG updates closely resemble the EIP-1559 price update rule. Finally, we study two special cases of the MDP formulation: homogeneous transactions and uniform arrivals. In the homogeneous setting, where the protocol directly controls scheduled volume, we show that the optimal policy has a threshold structure. We then propose a bang-bang pricing mechanism for uniform arrivals and derive a lower bound on the block capacity needed to ensure system stability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models Ethereum mempool transaction scheduling as a dynamic MDP with state as mempool configuration and actions as block prices. It gives a primal-dual interpretation of static EIP-1559 as dual variables of a social-welfare problem, extends the framework to a discounted-reward objective with holding costs and overshoot penalties, applies Natural Policy Gradient to obtain policies, and reports that increasing the overshoot penalty makes average scheduled volume converge to target capacity while NPG updates resemble the EIP-1559 rule. Special cases are analyzed: homogeneous transactions admit a threshold policy, and uniform arrivals admit a bang-bang mechanism with a derived lower bound on block capacity for stability.
Significance. If the claimed resemblance between NPG updates and EIP-1559 can be placed on an analytical footing rather than simulation, the work would supply a dynamic reinforcement-learning foundation for fee-market design that recovers known static mechanisms as special cases, with direct implications for stability analysis of blockchain transaction inclusion.
major comments (3)
- [Abstract] The assertion that 'the resulting NPG updates closely resemble the EIP-1559 price update rule' is presented without an analytical derivation. The primal-dual view is derived only for the static welfare problem; the dynamic MDP uses a discounted objective with holding costs and overshoot penalties, yet no steps are shown establishing that natural policy gradient on the chosen parameterization exactly reproduces the multiplicative update basefee_{t+1} = basefee_t * (1 + f*(gas_used - target)/target). The observed similarity therefore remains an empirical outcome of specific reward weights and trajectories rather than a structural property of the MDP.
- [MDP formulation and results] The central claim that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward rests on the assumption that the chosen state representation fully captures stochastic mempool evolution and that the objective correctly balances the three cost components. No verification (convergence diagnostics, error bars across random seeds, or optimality-gap bounds) is supplied that the NPG procedure actually solves this MDP to sufficient accuracy, weakening the evidence that the reported policies are optimal or that the resemblance to EIP-1559 is robust.
- [Homogeneous transactions special case] The claim that the optimal policy has a threshold structure is stated without the explicit value-function derivation or Bellman-equation analysis that would confirm the structure is indeed optimal for the discounted objective; this gap is load-bearing because the threshold policy is used to motivate the bang-bang mechanism in the uniform-arrivals case.
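For reference, the Bellman-equation analysis requested in the third comment would start from the standard optimality equation on the first line below. The second line is a hypothetical specialization to a homogeneous queue, not the paper's model: $q$ (mempool size), $u$ (scheduled volume), $C$ (block capacity), $T$ (target), $A$ (random arrivals), $\rho$ (per-unit reward), $h$ (per-unit holding cost), and $\lambda$ (overshoot weight) are all symbols assumed here for illustration.

```latex
V^{*}(s) = \max_{a} \Bigl\{ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Bigr\}

V^{*}(q) = \max_{0 \le u \le \min(q,\, C)} \Bigl\{ \rho\,u - h\,(q-u) - \lambda\,(u-T)^{+}
           + \gamma\, \mathbb{E}\bigl[ V^{*}(q - u + A) \bigr] \Bigr\}
```

Under a formulation of this kind, a threshold result would amount to showing that the maximizing $u$ switches regime at a single critical value of $q$.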
minor comments (2)
- [Results] Simulation figures would be clearer if they reported standard errors or multiple independent runs rather than single trajectories.
- [MDP formulation] Notation for the overshoot penalty coefficient and discount factor should be introduced once with explicit ranges used in experiments.
Simulated Author's Rebuttal
We thank the referee for the constructive report. The comments correctly identify places where the presentation of empirical observations and supporting analysis can be strengthened. We address each point below and commit to revisions that clarify the scope of our claims without altering the core contributions.
Point-by-point responses
- Referee: [Abstract] The assertion that NPG updates closely resemble the EIP-1559 rule lacks analytical derivation. The primal-dual view is only for the static case; the dynamic MDP yields an empirical similarity under specific reward weights rather than a structural property.
  Authors: We agree the resemblance is empirical, arising from simulations with high overshoot penalties that penalize deviation from target capacity. The manuscript does not claim an exact analytical equivalence. In revision we will (i) rephrase the abstract to state that the updates 'empirically resemble' the EIP-1559 rule under the chosen objective, (ii) add a short discussion subsection explaining why the discounted-reward formulation with overshoot penalties produces multiplicative-like updates in the NPG trajectory, and (iii) include a remark that a full structural proof remains open. These changes make the claim's scope explicit.
  Revision: partial
- Referee: [MDP formulation and results] No convergence diagnostics, error bars across seeds, or optimality-gap bounds are supplied, weakening evidence that NPG solves the MDP accurately and that the EIP-1559 resemblance is robust.
  Authors: We accept this criticism. The current version reports only single-run trajectories. In the revised manuscript we will add: (a) learning curves with mean and standard deviation over 10 random seeds, (b) a table of final average reward and volume deviation with 95% confidence intervals, and (c) a brief paragraph invoking standard policy-gradient convergence results to bound the optimality gap for the chosen parameterization. These additions directly address the request for verification.
  Revision: yes
- Referee: [Homogeneous transactions special case] The threshold-structure claim lacks explicit value-function derivation or Bellman-equation analysis, which is load-bearing for the subsequent bang-bang mechanism.
  Authors: We will supply the missing derivation. In a new appendix we will write the Bellman optimality equation for the homogeneous-transaction MDP, solve for the value function under the discounted objective, and prove that the optimal action is indeed a threshold on the current mempool size. This analysis will be referenced in the main text to justify the bang-bang construction for the uniform-arrivals case.
  Revision: yes
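The seed-level reporting promised in the second response (mean, standard deviation, and a 95% confidence interval over independent runs) could be computed roughly as follows. This is a sketch with a hypothetical function name; it uses the normal quantile 1.96 by default, and with only 10 seeds a Student-t quantile (about 2.262 for 9 degrees of freedom) gives a wider, more conservative interval.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_runs(final_values: Sequence[float],
                   quantile: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Return (mean, sample std, confidence interval) over per-seed results.

    quantile=1.96 assumes a normal approximation; pass a t-quantile
    for small numbers of seeds.
    """
    n = len(final_values)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(final_values)
    std = statistics.stdev(final_values)          # sample (n - 1) standard deviation
    half_width = quantile * std / math.sqrt(n)    # half-width of the interval
    return mean, std, (mean - half_width, mean + half_width)
```

Each entry of `final_values` would be one seed's final metric, for example average scheduled volume divided by target capacity.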
Circularity Check
No significant circularity; dynamic MDP formulation and NPG results are independent of static EIP-1559 primal-dual view
Full rationale
The paper first derives a primal-dual interpretation for the static EIP-1559 social-welfare problem, then separately formulates a new dynamic MDP with state as mempool configuration, actions as block prices, and objective incorporating holding costs, overshoot penalties, and discounted reward. Standard NPG is applied to optimize the policy, and the resemblance to EIP-1559 updates is reported as an empirical observation from simulations under increasing overshoot penalty. No equation reduces the dynamic policy or NPG steps to the static dual variables by construction, nor is any parameter fitted to force the resemblance. Special cases (homogeneous transactions with threshold policy, uniform arrivals with bang-bang mechanism) are derived directly from the MDP without importing uniqueness or ansatz from self-citations. The chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- discount factor
- overshoot penalty coefficient
axioms (1)
- domain assumption: Mempool state evolves as a Markov process with actions as block prices