arXiv: 2605.12794 · v1 · submitted 2026-05-12 · cs.GT · cs.CR · cs.DC · cs.NI · cs.SY · eess.SY


Dynamic Transaction Scheduling and Pricing in the Ethereum Mempool

Fatemeh Fardno, S. Rasoul Etesami


Pith reviewed 2026-05-14 19:20 UTC · model grok-4.3


keywords: Ethereum mempool, dynamic pricing, EIP-1559, Markov decision process, natural policy gradient, transaction scheduling, block capacity


The pith

Dynamic block pricing via an MDP stabilizes Ethereum mempool volume near target capacity and produces EIP-1559-like updates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper models transactions arriving over time into the Ethereum mempool, each with size and value, as a Markov decision process whose state tracks the current configuration of waiting transactions. Block prices are chosen as actions to maximize long-run discounted reward while penalizing both holding costs for unscheduled transactions and overshoot beyond block capacity. Natural policy gradient is used to compute the policy, and the results show that raising the overshoot penalty drives average scheduled volume to the target capacity with price updates that closely track the existing EIP-1559 rule. Special cases for homogeneous transactions and uniform arrivals yield explicit threshold policies and a stability bound on capacity.

Core claim

We first give a primal-dual view of static EIP-1559 in which block prices arise as dual variables to a social-welfare maximization problem. Extending to the dynamic setting, we formulate an MDP whose objective balances long-run discounted reward against holding costs and overshoot penalties. Natural policy gradient computation on this MDP produces policies under which increasing the overshoot weight causes scheduled transaction volume to converge to target block capacity and yields price updates that resemble the EIP-1559 rule. For homogeneous transactions the optimal policy has a threshold structure; for uniform arrivals a bang-bang mechanism is proposed together with a lower bound on the block capacity needed to ensure system stability.
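The primal-dual reading can be illustrated with one standard way of writing a static welfare problem. This is a sketch under assumptions, not the paper's formulation: the symbols $v_i$ (per-unit value), $s_i$ (size), $x_i$ (inclusion variable), $B$ (block capacity) and $p$ (price), as well as the fractional relaxation $x_i \in [0,1]$, are chosen here for illustration.

```latex
\begin{aligned}
\text{(primal)}\quad & \max_{x \in [0,1]^n} \ \sum_{i=1}^{n} v_i s_i x_i
  \quad \text{s.t.} \quad \sum_{i=1}^{n} s_i x_i \le B, \\
\text{(Lagrangian)}\quad & L(x, p) = \sum_{i=1}^{n} (v_i - p)\, s_i x_i + p B, \qquad p \ge 0, \\
\text{(dual)}\quad & \min_{p \ge 0} \ \Big[\, p B + \sum_{i=1}^{n} s_i \max(v_i - p,\ 0) \Big].
\end{aligned}
```

In this sketch the multiplier $p$ on the capacity constraint plays the role of a block price: maximizing the Lagrangian over $x$ includes exactly the transactions whose per-unit value exceeds $p$.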

What carries the argument

Markov decision process whose state is the current mempool configuration and whose actions are block prices, solved by natural policy gradient to maximize discounted reward net of holding and overshoot costs.
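One plausible way to write such an objective is sketched below. The page does not give the paper's functional forms, so every symbol is an assumption made for illustration: $x_t$ is the mempool configuration, $p_t$ the block price, $r$ the per-block reward, $c_{\mathrm{hold}}$ the holding cost, $q$ the scheduled volume, $B$ the target capacity, $h$ and $\kappa$ the holding and overshoot weights, and $\gamma \in (0,1)$ the discount factor.

```latex
\max_{\pi} \ \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}
\Big( r(x_t, p_t) - h\, c_{\mathrm{hold}}(x_t, p_t)
      - \kappa \big( q(x_t, p_t) - B \big)^{+} \Big) \right],
\qquad x_{t+1} \sim P(\,\cdot \mid x_t, p_t).
```

Here $(z)^{+} = \max(z, 0)$, so only volume above the target is penalized; the paper's penalty may take a different shape.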

If this is right

Raising the overshoot penalty causes average scheduled transaction volume to converge to target block capacity.
Natural policy gradient price updates closely resemble the EIP-1559 update rule.
In the homogeneous-transaction case the optimal policy exhibits a threshold structure.
For uniform arrivals a bang-bang pricing rule together with a sufficient block-capacity lower bound guarantees stability.
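The threshold claim for homogeneous transactions can be probed numerically on a small example. The following is a minimal sketch, assuming a toy model that is not taken from the paper: unit-size transactions, the scheduled volume as the action, a linear overshoot penalty, a truncated state space, and hypothetical parameter values throughout.

```python
def value_iteration(n_max=20, cap=8, target=4, reward=1.0, hold=0.2,
                    overshoot=0.5, arrival_probs=(0.2, 0.3, 0.3, 0.2),
                    gamma=0.9, tol=1e-9):
    """Value iteration on a toy homogeneous mempool MDP (all values hypothetical).

    State x: number of waiting unit-size transactions (truncated at n_max).
    Action a: number scheduled in this block, 0 <= a <= min(x, cap).
    arrival_probs[k]: probability that k new transactions arrive.
    """
    values = [0.0] * (n_max + 1)
    while True:
        new_values, policy = [], []
        for x in range(n_max + 1):
            best_q, best_a = float("-inf"), 0
            for a in range(min(x, cap) + 1):
                left = x - a  # transactions still waiting after this block
                stage = reward * a - hold * left - overshoot * max(0, a - target)
                # Arrivals that would exceed n_max are dropped (truncation assumption).
                cont = sum(p * values[min(left + k, n_max)]
                           for k, p in enumerate(arrival_probs))
                q = stage + gamma * cont
                if q > best_q + 1e-12:  # near-ties resolve to the smaller action
                    best_q, best_a = q, a
            new_values.append(best_q)
            policy.append(best_a)
        gap = max(abs(u - v) for u, v in zip(new_values, values))
        values = new_values
        if gap < tol:
            return values, policy


values, policy = value_iteration()
print(policy)  # scheduled volume for each mempool size 0..n_max
```

Printing `policy` for a few parameter choices lets one inspect by eye whether the scheduled volume changes regime at a critical mempool size; the sketch itself proves nothing about the paper's model.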

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Mempool trace data could be used to test whether the MDP state representation matches observed arrival and departure statistics.
The same MDP-plus-NPG approach could be applied to fee-market design on other chains that set per-block prices.
The threshold structure found for homogeneous transactions suggests implementable heuristic rules that avoid solving the full MDP online.

Load-bearing premise

The stochastic evolution of the mempool is fully captured by a Markov decision process whose state is the current mempool configuration.

What would settle it

Run a simulation of the mempool with successively larger overshoot penalties and check whether average scheduled transaction volume converges to the target block capacity while the computed price updates match the EIP-1559 formula.
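A much smaller experiment can illustrate the direction of the effect before running the full simulation. The sketch below is a single-block toy, not the paper's MDP: it assumes a quadratic overshoot penalty and a constant per-unit reward, and the function name and all numbers are hypothetical.

```python
def best_volume(target, max_volume, unit_reward, penalty):
    """Volume maximizing unit_reward * a - penalty * max(0, a - target) ** 2
    over the integers a = 0..max_volume (single-block toy objective)."""
    def objective(a):
        over = max(0, a - target)
        return unit_reward * a - penalty * over ** 2
    return max(range(max_volume + 1), key=objective)


# Sweep the overshoot penalty with target capacity 10 and hard cap 20.
penalties = [0.05, 0.1, 0.25, 5.0]
volumes = [best_volume(10, 20, 1.0, k) for k in penalties]
print(volumes)  # [20, 15, 12, 10]
```

In this toy the continuous maximizer is `target + unit_reward / (2 * penalty)`, so the chosen volume falls toward the target as the penalty grows. Whether the dynamic model behaves the same way is what the proposed simulation would have to show.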

Figures

Figures reproduced from arXiv: 2605.12794 by Fatemeh Fardno, S. Rasoul Etesami.

**Figure 2.** Average scheduled transaction volume as a function of the target block capacity for Setting 1.

**Figure 3.** Average scheduled transaction volume as a function of the overshoot penalty in …

**Figure 4.** Periodic state evolution induced by the policy defined in Eq. …

**Figure 5.** Illustration of the feasible region $H$ for $B = 5$. Next, we show that $Q_k(\cdot, \cdot)$ is jointly convex. For any $\lambda \in [0, 1]$, it suffices to show that $\lambda Q_k(x_1, f_1) + (1 - \lambda) Q_k(x_2, f_2) \ge Q_k(\lambda x_1 + (1 - \lambda) x_2,\ \lambda f_1 + (1 - \lambda) f_2)$. We can write $\lambda Q_k(x_1, f_1) + (1 - \lambda) Q_k(x_2, f_2) = \lambda \dots$

Original abstract

The Ethereum blockchain utilizes the EIP-1559 algorithm to manage transaction inclusion and block assembly. However, EIP-1559 and much of the existing literature study this problem from a static perspective, focusing on price evolution without modelling transaction dynamics within the mempool. Motivated by this limitation, we study a dynamic transaction scheduling problem in which transactions with heterogeneous sizes and per-unit values arrive over time and remain in the mempool until scheduled. To capture the stochastic mempool evolution, we formulate the problem as a Markov Decision Process (MDP) whose state represents the mempool configuration and whose actions correspond to block prices. We first provide a primal-dual interpretation of the static EIP-1559 mechanism, showing that block prices arise naturally as dual variables of a social-welfare maximization problem. Building on this perspective, we extend the framework to the dynamic setting and formulate an objective that maximizes long-run discounted reward while incorporating holding costs and overshoot penalties. We then employ a Natural Policy Gradient (NPG) algorithm to compute the optimal policy. Our results show that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward. In particular, as the overshoot penalty increases, the average scheduled transaction volume converges to the target block capacity, and the resulting NPG updates closely resemble the EIP-1559 price update rule. Finally, we study two special cases of the MDP formulation: homogeneous transactions and uniform arrivals. In the homogeneous setting, where the protocol directly controls scheduled volume, we show that the optimal policy has a threshold structure. We then propose a bang-bang pricing mechanism for uniform arrivals and derive a lower bound on the block capacity needed to ensure system stability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper sets up a dynamic MDP for Ethereum mempool scheduling and applies NPG, with clean threshold and bang-bang policies in special cases, but the claimed recovery of EIP-1559 is only a simulation resemblance.


The core contribution is a Markov decision process that treats mempool state as the configuration of pending transactions and lets the action be the block price. They first restate the static EIP-1559 rule as the dual of a welfare problem, then add holding costs and an overshoot penalty to a discounted long-run objective and optimize it with natural policy gradient. In the homogeneous-transaction case they prove a threshold structure for the optimal policy; for uniform arrivals they give a bang-bang mechanism and a stability bound on capacity. Those two derivations look solid and give concrete policy shapes that static models lack. The main empirical claim is that raising the overshoot penalty makes the learned price updates look like the EIP-1559 multiplicative rule and drives scheduled volume to target. That resemblance is shown only in simulation trajectories, with no derivation that the NPG step on their parameterization exactly reproduces basefee_{t+1} = basefee_t * (1 + f*(gas_used - target)/target). The abstract also gives no error bars, convergence diagnostics, or sensitivity checks on the discount factor and penalty coefficient, so the stabilization result is plausible but not yet tightly supported. The modeling choice to capture arrivals and holding costs inside the MDP is reasonable and moves past purely static analysis. This work is aimed at mechanism-design researchers who already know EIP-1559 and want to see a dynamic extension; the special-case results are worth citing even if the general NPG claim needs more verification. I would send it to referees rather than desk-reject.
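For reference, the multiplicative update quoted in the letter can be written as a short function. This is a real-valued sketch only: the deployed EIP-1559 rule uses integer arithmetic, which is ignored here, and the function and parameter names are chosen for illustration. The default `max_change=0.125` corresponds to the 1/8 adjustment factor used on Ethereum mainnet.

```python
def next_base_fee(base_fee, gas_used, gas_target, max_change=0.125):
    """Real-valued form of the update
    basefee_{t+1} = basefee_t * (1 + f * (gas_used - target) / target),
    with f = max_change. Integer rounding in the deployed rule is ignored."""
    return base_fee * (1.0 + max_change * (gas_used - gas_target) / gas_target)


print(next_base_fee(100.0, 15_000_000, 15_000_000))  # 100.0 (block at target)
print(next_base_fee(100.0, 30_000_000, 15_000_000))  # 112.5 (block twice the target)
print(next_base_fee(100.0, 0, 15_000_000))           # 87.5  (empty block)
```

A block at exactly the target leaves the fee unchanged, which is the fixed point that any learned update would have to share to resemble this rule.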

Referee Report

3 major / 2 minor

Summary. The paper models Ethereum mempool transaction scheduling as a dynamic MDP with state as mempool configuration and actions as block prices. It gives a primal-dual interpretation of static EIP-1559 as dual variables of a social-welfare problem, extends the framework to a discounted-reward objective with holding costs and overshoot penalties, applies Natural Policy Gradient to obtain policies, and reports that increasing the overshoot penalty makes average scheduled volume converge to target capacity while NPG updates resemble the EIP-1559 rule. Special cases are analyzed: homogeneous transactions admit a threshold policy, and uniform arrivals admit a bang-bang mechanism with a derived lower bound on block capacity for stability.

Significance. If the claimed resemblance between NPG updates and EIP-1559 can be placed on an analytical footing rather than simulation, the work would supply a dynamic reinforcement-learning foundation for fee-market design that recovers known static mechanisms as special cases, with direct implications for stability analysis of blockchain transaction inclusion.

major comments (3)

[Abstract] The assertion that 'the resulting NPG updates closely resemble the EIP-1559 price update rule' is presented without an analytical derivation. The primal-dual view is derived only for the static welfare problem; the dynamic MDP uses a discounted objective with holding costs and overshoot penalties, yet no steps are shown establishing that natural policy gradient on the chosen parameterization exactly reproduces the multiplicative update basefee_{t+1} = basefee_t * (1 + f*(gas_used - target)/target). The observed similarity therefore remains an empirical outcome of specific reward weights and trajectories rather than a structural property of the MDP.
[MDP formulation and results] The central claim that dynamic pricing stabilizes the mempool while maximizing long-run discounted reward rests on the assumption that the chosen state representation fully captures stochastic mempool evolution and that the objective correctly balances the three cost components. No verification (convergence diagnostics, error bars across random seeds, or optimality-gap bounds) is supplied that the NPG procedure actually solves this MDP to sufficient accuracy, weakening the evidence that the reported policies are optimal or that the resemblance to EIP-1559 is robust.
[Homogeneous transactions special case] The claim that the optimal policy has a threshold structure is stated without the explicit value-function derivation or Bellman-equation analysis that would confirm the structure is indeed optimal for the discounted objective; this gap is load-bearing because the threshold policy is used to motivate the bang-bang mechanism in the uniform-arrivals case.

minor comments (2)

[Results] Simulation figures would be clearer if they reported standard errors or multiple independent runs rather than single trajectories.
[MDP formulation] Notation for the overshoot penalty coefficient and discount factor should be introduced once with explicit ranges used in experiments.
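The reporting asked for in the first minor comment takes only a few lines. The sketch below computes a mean and a two-sided confidence half-width across independent runs; the function name is hypothetical, and the default critical value 2.262 assumes 10 runs (the 95% Student-t quantile with 9 degrees of freedom), so it should be changed for other run counts.

```python
import math
import statistics


def mean_with_ci(samples, critical_value=2.262):
    """Return (mean, half_width) of a two-sided confidence interval.

    critical_value defaults to the 95% Student-t quantile for 9 degrees
    of freedom, i.e. it assumes exactly 10 independent runs.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    half_width = critical_value * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width
```

A figure would then plot `mean` with an error bar of plus or minus `half_width` at each penalty value instead of a single trajectory.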

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive report. The comments correctly identify places where the presentation of empirical observations and supporting analysis can be strengthened. We address each point below and commit to revisions that clarify the scope of our claims without altering the core contributions.


Referee: [Abstract] The assertion that NPG updates closely resemble the EIP-1559 rule lacks analytical derivation. The primal-dual view is only for the static case; the dynamic MDP yields an empirical similarity under specific reward weights rather than a structural property.

Authors: We agree the resemblance is empirical, arising from simulations with high overshoot penalties that penalize deviation from target capacity. The manuscript does not claim an exact analytical equivalence. In revision we will (i) rephrase the abstract to state that the updates 'empirically resemble' the EIP-1559 rule under the chosen objective, (ii) add a short discussion subsection explaining why the discounted-reward formulation with overshoot penalties produces multiplicative-like updates in the NPG trajectory, and (iii) include a remark that a full structural proof remains open. These changes make the claim's scope explicit. revision: partial
Referee: [MDP formulation and results] No convergence diagnostics, error bars across seeds, or optimality-gap bounds are supplied, weakening evidence that NPG solves the MDP accurately and that the EIP-1559 resemblance is robust.

Authors: We accept this criticism. The current version reports only single-run trajectories. In the revised manuscript we will add: (a) learning curves with mean and standard deviation over 10 random seeds, (b) a table of final average reward and volume deviation with 95% confidence intervals, and (c) a brief paragraph invoking standard policy-gradient convergence results to bound the optimality gap for the chosen parameterization. These additions directly address the request for verification. revision: yes
Referee: [Homogeneous transactions special case] The threshold-structure claim lacks explicit value-function derivation or Bellman-equation analysis, which is load-bearing for the subsequent bang-bang mechanism.

Authors: We will supply the missing derivation. In a new appendix we will write the Bellman optimality equation for the homogeneous-transaction MDP, solve for the value function under the discounted objective, and prove that the optimal action is indeed a threshold on the current mempool size. This analysis will be referenced in the main text to justify the bang-bang construction for the uniform-arrivals case. revision: yes
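For orientation, a Bellman optimality equation of the kind promised here might take the following form. It is a sketch under assumptions rather than the authors' equation: $x$ is the number of waiting unit-size transactions, $a$ the scheduled volume, $A$ the random number of new arrivals, $B$ the target capacity, and $r$, $h$, $\kappa$, $\gamma$ the per-unit reward, holding cost, overshoot weight and discount factor.

```latex
V(x) = \max_{0 \le a \le x}
\Big\{ r\,a - h\,(x - a) - \kappa\,(a - B)^{+}
       + \gamma\, \mathbb{E}\big[ V(x - a + A) \big] \Big\}
```

A threshold result would then state that the maximizing $a$ changes regime at a critical mempool size; establishing that is the job of the promised appendix, not of this sketch.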

Circularity Check

0 steps flagged

No significant circularity; dynamic MDP formulation and NPG results are independent of static EIP-1559 primal-dual view


The paper first derives a primal-dual interpretation for the static EIP-1559 social-welfare problem, then separately formulates a new dynamic MDP with state as mempool configuration, actions as block prices, and objective incorporating holding costs, overshoot penalties, and discounted reward. Standard NPG is applied to optimize the policy, and the resemblance to EIP-1559 updates is reported as an empirical observation from simulations under increasing overshoot penalty. No equation reduces the dynamic policy or NPG steps to the static dual variables by construction, nor is any parameter fitted to force the resemblance. Special cases (homogeneous transactions with threshold policy, uniform arrivals with bang-bang mechanism) are derived directly from the MDP without importing uniqueness or ansatz from self-citations. The chain is self-contained against external benchmarks.
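For readers unfamiliar with the method, one natural policy gradient step has a well-known closed form for tabular softmax policies: each action probability is reweighted by the exponential of its scaled advantage. The sketch below applies it to a hypothetical 2-state, 2-action MDP; the transition and reward numbers are invented for illustration, and the paper's parameterization may differ.

```python
import math

# Hypothetical toy MDP (illustrative numbers only).
# States: 0 = light mempool, 1 = congested mempool.
# Actions: 0 = low block price, 1 = high block price.
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [[[0.9, 0.1], [0.6, 0.4]],
     [[0.5, 0.5], [0.2, 0.8]]]
R = [[1.0, 0.4],
     [0.5, 0.8]]
GAMMA = 0.9


def evaluate(policy, P, R, gamma, sweeps=2000):
    """Iterative policy evaluation; returns state values V and action values Q."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(sweeps):
        Q = [[R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(policy[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return V, Q


def npg_step(policy, P, R, gamma, eta):
    """One NPG step for a tabular softmax policy:
    pi_new(a|s) is proportional to pi(a|s) * exp(eta * A(s, a) / (1 - gamma))."""
    V, Q = evaluate(policy, P, R, gamma)
    new_policy = []
    for s, row in enumerate(policy):
        weights = [p * math.exp(eta * (Q[s][a] - V[s]) / (1.0 - gamma))
                   for a, p in enumerate(row)]
        total = sum(weights)
        new_policy.append([w / total for w in weights])
    return new_policy


policy = [[0.5, 0.5], [0.5, 0.5]]  # start from the uniform policy
for _ in range(30):
    policy = npg_step(policy, P, R, GAMMA, eta=0.1)
print(policy)
```

With exact advantages this update is known to improve the policy value monotonically, which is one reason convergence diagnostics of the kind the referee requests are cheap to produce in small examples.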

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The model rests on standard MDP assumptions plus newly introduced holding costs and overshoot penalties in the objective; no new physical entities are postulated.

free parameters (2)

discount factor
Controls the long-run discounted reward objective; value not specified in abstract.
overshoot penalty coefficient
Tuned to produce convergence to target block capacity; directly affects reported policy behavior.

axioms (1)

Domain assumption: mempool state evolves as a Markov process with actions as block prices
Invoked when formulating the scheduling problem as an MDP.

