
arxiv: 2606.08379 · v1 · pith:HJGDJ5W4new · submitted 2026-06-07 · cs.AI · cs.CE · cs.LG · q-fin.CP · q-fin.TR

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

Pith reviewed 2026-06-27 19:06 UTC · model grok-4.3

classification cs.AI · cs.CE · cs.LG · q-fin.CP · q-fin.TR
keywords optimal trade execution · reinforcement learning · actor-critic · implementation shortfall · limit order book · trade impact · policy smoothing

The pith

TT-DAC-PS reduces mean implementation shortfall for large stock sell programs in a simulator built on limit-order-book data for ten U.S. stocks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a deterministic actor-critic model called TT-DAC-PS that combines twin critic targets, pessimistic min backup, target policy smoothing, delayed updates, and conservative Q regularization. It evaluates the model inside a simulator that merges Almgren-Chriss impact with real limit-order-book prices and volumes for ten U.S. stocks. A sympathetic reader would care because lower average shortfall directly reduces the cost of unwinding large positions while keeping variance competitive. The model is tested against PPO, SAC, A2C and the classical TWAP, VWAP, and Almgren-Chriss baselines.

Core claim

TT-DAC-PS integrates twin exponential-moving-average critic targets with pessimistic min backup, TD3-style target policy smoothing noise, delayed actor updates, and conservative Q regularisation. Exploration uses Ornstein-Uhlenbeck noise under a hybrid schedule of deterministic decay, variance-guided adjustment, and a learned temperature. When run on limit-order-book data for ten U.S. stocks, the method consistently lowers mean implementation shortfall percentage with competitive variance and outperforms the listed reinforcement-learning and classical benchmarks.
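The target computation described above can be illustrated with a minimal sketch. It follows the standard TD3 recipe (clipped Gaussian smoothing noise on the target action, minimum over two target critics) and a Polyak-style exponential moving average for the targets. The function names, the action bounds, and all hyperparameter values are assumptions made for illustration; the paper's own equations and settings may differ, and the conservative Q regulariser is not shown.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(target_actor, next_state, sigma=0.2, noise_clip=0.5,
                           a_low=0.0, a_high=1.0, rng=random):
    """TD3-style target policy smoothing: clipped Gaussian noise on the target action.

    sigma, noise_clip and the action bounds are hypothetical values.
    """
    noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
    return clip(target_actor(next_state) + noise, a_low, a_high)


def pessimistic_td_target(reward, done, next_state, target_actor,
                          target_q1, target_q2, gamma=0.99, rng=random):
    """Bootstrap from the smaller of the two target critics (pessimistic min backup)."""
    a_next = smoothed_target_action(target_actor, next_state, rng=rng)
    q_min = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_min


def ema_update(target_params, online_params, tau=0.005):
    """Exponential-moving-average update of target parameters towards online ones."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

With constant toy critics returning 2.0 and 1.0, a reward of 1.0 and `gamma=0.9`, the target is 1.0 + 0.9 × 1.0 = 1.9: the lower critic value is the one that propagates, which is what curbs overestimation.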

What carries the argument

Twin-Target Deterministic Actor-Critic with Policy Smoothing (TT-DAC-PS), which stabilises Q-value estimates through twin targets, pessimistic backup, smoothing noise, and conservative regularisation to support better policy decisions in trade execution.

If this is right

  • Large sell programs can be completed at lower average cost than with time-weighted or volume-weighted schedules.
  • The combination of pessimistic backup and smoothing noise keeps learning stable despite the non-stationary order-book environment.
  • Normalised state features and per-step volume caps allow the same architecture to generalise across the ten tested stocks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Changing the reward function to penalise different risk measures could shift the variance-cost trade-off without altering the core architecture.
  • Applying the same twin-target and smoothing components to buy-side execution or to other impact models would test whether the gains are specific to sell programs.
  • Running the method on out-of-sample periods after the training window would show whether the shortfall reduction persists under new market conditions.

Load-bearing premise

The simulated trading environment based on the Almgren-Chriss impact model and historical LOB data sufficiently represents the dynamics of actual market execution for the tested stocks.
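To make the premise concrete, here is a rough sketch of how a linear Almgren-Chriss style impact could be layered on a historical mid-price path and how implementation shortfall percentage would follow from it. The coefficients `eta` and `gamma`, the unit time step, and the function names are hypothetical; the paper's actual environment (state features, participation caps, reward) is not reproduced here.

```python
def simulate_sell(mid_prices, schedule, eta=1e-6, gamma=1e-7):
    """Cash proceeds from selling schedule[k] shares at step k.

    mid_prices: exogenous historical mid prices, one per step.
    eta: temporary impact per share traded in a step (hypothetical value).
    gamma: permanent impact per share already sold (hypothetical value).
    A unit step length is assumed, so trade size and trade rate coincide.
    """
    proceeds = 0.0
    permanent_shift = 0.0
    for mid, shares in zip(mid_prices, schedule):
        # Selling pushes the price down: past sales permanently, this sale temporarily.
        exec_price = mid - permanent_shift - eta * shares
        proceeds += shares * exec_price
        permanent_shift += gamma * shares
    return proceeds


def implementation_shortfall_pct(mid_prices, schedule, eta=1e-6, gamma=1e-7):
    """Shortfall versus the arrival price, as a percentage of arrival value."""
    arrival_value = sum(schedule) * mid_prices[0]
    proceeds = simulate_sell(mid_prices, schedule, eta=eta, gamma=gamma)
    return 100.0 * (arrival_value - proceeds) / arrival_value
```

For a flat price of 100, two slices of 500 shares, `eta=0.001` and `gamma=0.0005`, the slices execute at 99.50 and 99.25, giving proceeds of 99,375 against an arrival value of 100,000, i.e. a shortfall of 0.625%. Every number reported by such a simulator inherits the choice of `eta` and `gamma`, which is why the fitting procedure matters.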

What would settle it

Deploying the trained policy on live trading data for the same ten stocks and measuring no reduction in mean implementation shortfall would contradict the reported performance advantage.

Figures

Figures reproduced from arXiv: 2606.08379 by Alfonso Dufour, Atta Badii, Ilia Zaznov, Julian Kunkel.

Figure 1: Taxonomy of optimal execution research.
VWAP (Volume-Weighted Average Price) aligns trades with volume: $x_t^{\mathrm{VWAP}} = Q \cdot \frac{v_t}{\sum_{k=0}^{N-1} v_k}$, where $v_t$ is expected or observed volume. No-dynamic-arbitrage constraints [19] require the permanent impact to be linear in aggregate order flow, ruling out price manipulation and ensuring market integrity. Extensions to the AC model include transient impact and resilience…
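The VWAP rule quoted with the figure text allocates the total quantity Q in proportion to each interval's volume. A short sketch, with TWAP alongside for comparison, might look as follows; the function names are illustrative, and the volume profile is assumed to be supplied by the caller (expected or observed).

```python
def twap_schedule(total_shares, n_steps):
    """Time-weighted schedule: equal slices in every interval."""
    if n_steps <= 0:
        raise ValueError("n_steps must be positive")
    return [total_shares / n_steps] * n_steps


def vwap_schedule(total_shares, volumes):
    """Volume-weighted schedule: x_t = Q * v_t / sum_k v_k."""
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("volume profile must have positive total volume")
    return [total_shares * v / total_volume for v in volumes]
```

For example, `vwap_schedule(1000, [1, 3, 1])` places 200, 600 and 200 shares in the three intervals, while `twap_schedule(1000, 4)` places 250 in each.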
Original abstract

This study addresses the optimal execution of large stock sell programs by introducing TT-DAC-PS (Twin-Target Deterministic Actor-Critic with Policy Smoothing), a deterministic actor-critic architecture that combines twin exponential-moving-average critic targets with pessimistic min backup, TD3-style target policy smoothing noise, delayed actor updates, and conservative Q regularisation to curb overestimation. Exploration uses Ornstein-Uhlenbeck (OU) noise with a hybrid schedule: deterministic episode-wise decay, variance-guided adjustment based on recent reward dispersion, and a Soft Actor-Critic (SAC)-style temperature that is learned and mapped to the noise scale. The environment integrates Almgren-Chriss (AC) trade impact with Limit Order Book (LOB) prices and volumes, normalised state features, per-step volume participation caps, and a utility-based reward. The trade execution algorithm is applied to LOB data for ten U.S. stocks. Performance is assessed against reinforcement-learning baseline algorithms, including Proximal Policy Optimisation (PPO), Soft Actor-Critic (SAC), and Advantage Actor-Critic (A2C), as well as alternative trade execution algorithms, including Time-Weighted Average Price (TWAP), Volume-Weighted Average Price (VWAP), and AC. The proposed model consistently reduces mean implementation shortfall percentage with competitive variance, outperforming classical baselines and standard reinforcement-learning benchmark models.
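As a rough illustration of the exploration component, the sketch below implements a discretised Ornstein-Uhlenbeck process and an episode-wise geometric decay of its scale. Only the deterministic decay is shown; the variance-guided adjustment and the learned temperature are omitted because the abstract does not specify them. Class and parameter names and all default values are assumptions.

```python
import math
import random


class OUNoise:
    """Discretised Ornstein-Uhlenbeck process: mean-reverting, temporally correlated noise."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, seed=None):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.rng = random.Random(seed)
        self.x = mu

    def reset(self):
        """Restart the process at its long-run mean (e.g. at episode start)."""
        self.x = self.mu

    def sample(self):
        """Advance one step: x += theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def decayed_sigma(sigma0, episode, decay=0.995, sigma_min=0.01):
    """Deterministic episode-wise decay of the noise scale with a floor (hypothetical values)."""
    return max(sigma_min, sigma0 * decay ** episode)
```

A training loop under these assumptions would set `noise.sigma = decayed_sigma(sigma0, episode)` and call `noise.reset()` at the start of each episode, then add `noise.sample()` to the actor's output before clipping to the participation cap.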

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces TT-DAC-PS, a deterministic actor-critic method for optimal trade execution that augments TD3 with twin exponential-moving-average critic targets, pessimistic min backup, target policy smoothing noise, delayed actor updates, and conservative Q regularization. Exploration employs Ornstein-Uhlenbeck noise under a hybrid schedule combining deterministic decay, variance-guided adjustment, and a learned SAC-style temperature. The environment combines the Almgren-Chriss impact model with historical limit-order-book snapshots; the method is evaluated on ten U.S. stocks and reports lower mean implementation shortfall (with competitive variance) relative to TWAP, VWAP, AC, PPO, SAC, and A2C.

Significance. If the simulator faithfully reproduces real-market impact and liquidity dynamics, the architecture could supply a practical, overestimation-resistant RL baseline for continuous-control execution problems; the hybrid noise schedule and twin-target design are potentially reusable in other noisy-reward domains. The empirical contribution, however, is conditional on validation of the Almgren-Chriss + LOB environment against realized slippage.

major comments (3)
  1. [Environment and Experimental Setup] The headline outperformance claim rests on the Almgren-Chriss + LOB simulator. The manuscript provides no information on how the temporary and permanent impact coefficients were fitted to the ten stocks, whether parameters were stock-specific or fixed, or whether any out-of-sample validation against actual execution slippage was performed. Without these details the reported gains relative to the classical and RL baselines cannot be distinguished from artifacts of the impact model.
  2. [Results section] The tables reporting mean implementation shortfall lack the number of independent training runs, standard errors or confidence intervals, and any statistical significance tests (e.g., paired t-tests or Wilcoxon rank-sum across seeds or stocks). This omission prevents assessment of whether the claimed consistent reductions are statistically reliable or could arise from training variance.
  3. [§4] (or wherever the reward and participation constraints are defined) The utility-based reward and per-step volume caps are central to the policy objective, yet no sensitivity analysis is reported with respect to the choice of utility function or cap values; small changes in these modeling choices could alter the ranking versus the baselines.
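The kind of reporting requested in major comment 2 can be sketched with the standard library alone. The helper below computes the mean paired difference in implementation shortfall, its standard error, the paired t statistic, and a simple win rate; the function name is hypothetical and the sample values are invented for illustration, not taken from the paper. Obtaining a p-value would additionally require the Student-t distribution with n − 1 degrees of freedom, which is not included here.

```python
import math
import statistics


def paired_summary(method_is, baseline_is):
    """Paired comparison of per-seed (or per-stock) IS% values; lower is better."""
    if len(method_is) != len(baseline_is) or len(method_is) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [m - b for m, b in zip(method_is, baseline_is)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    return {
        "mean_diff": mean_diff,
        "std_err": std_err,
        "t_stat": mean_diff / std_err,  # compare with Student-t, n - 1 dof
        "win_rate": sum(d < 0 for d in diffs) / len(diffs),
    }


# Invented numbers purely for illustration, not results from the paper.
method = [0.50, 1.00, 0.50, 1.00]
baseline = [1.00, 2.00, 1.00, 2.00]
summary = paired_summary(method, baseline)
```

Pairing by seed or by stock removes between-pair variation from the comparison, and the win rate addresses the related question of whether "consistently" holds for every stock or only on average.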
minor comments (3)
  1. [Method] Notation for the twin-target and pessimistic-min operators should be introduced with explicit equations rather than prose descriptions only.
  2. [Abstract and Results] The abstract states "consistently reduces" but the results tables do not indicate whether this holds for every stock or only on average; a per-stock breakdown or win-rate statistic would clarify the claim.
  3. [Related Work] Missing references to recent RL-for-execution surveys or to prior work that already combines AC impact with LOB snapshots should be added for context.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the manuscript. Below we respond point-by-point to the major comments, indicating planned revisions where appropriate.

  1. Referee: [Environment and Experimental Setup] the headline outperformance claim rests on the Almgren-Chriss + LOB simulator. The manuscript provides no information on how the temporary and permanent impact coefficients were fitted to the ten stocks, whether parameters were stock-specific or fixed, or whether any out-of-sample validation against actual execution slippage was performed. Without these details the reported gains relative to the classical and RL baselines cannot be distinguished from artifacts of the impact model.

    Authors: We agree that the current manuscript lacks these details. In the revision we will add a subsection describing the stock-specific fitting procedure for the impact coefficients using historical LOB and trade data. We will also explicitly state that comprehensive out-of-sample validation against realized execution slippage was not performed (the study is simulator-based) and discuss this limitation. This provides the requested transparency without altering the core claims. revision: partial

  2. Referee: [Results section] the comparisons lack the number of independent training runs, standard errors or confidence intervals, and any statistical significance tests (e.g., paired t-tests or Wilcoxon rank-sum across seeds or stocks). This omission prevents assessment of whether the claimed consistent reductions are statistically reliable or could arise from training variance.

    Authors: We accept this criticism. The revised manuscript will report the number of independent runs (10 random seeds), include standard errors and confidence intervals in all tables, and add paired t-tests (or Wilcoxon rank-sum where appropriate) across seeds and stocks to establish statistical reliability of the reported improvements. revision: yes

  3. Referee: [§4] the utility-based reward and per-step volume caps are central to the policy objective, yet no sensitivity analysis is reported with respect to the choice of utility function or cap values; small changes in these modeling choices could alter the ranking versus the baselines.

    Authors: We acknowledge the value of such analysis. The revision will include a new sensitivity study (main text or appendix) varying the utility parameters and participation caps, demonstrating that TT-DAC-PS retains its performance advantage under reasonable perturbations of these choices. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external baselines


The manuscript introduces TT-DAC-PS, an RL architecture, and reports its performance on LOB data for ten stocks inside an AC+LOB simulator. All load-bearing results are obtained by direct comparison to independent baselines (TWAP, VWAP, AC, PPO, SAC, A2C). No equations reduce a claimed prediction to a fitted input by construction, no uniqueness theorems are imported from the authors' prior work, and no ansatz is smuggled via self-citation. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review limits identification of all parameters; the main assumption is the fidelity of the market simulation model.

axioms (1)
  • domain assumption: The trading environment is modeled accurately by the Almgren-Chriss impact function combined with LOB data.
    This is invoked in the environment description in the abstract.

pith-pipeline@v0.9.1-grok · 5805 in / 1301 out tokens · 34644 ms · 2026-06-27T19:06:41.347304+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 6 canonical work pages · 3 internal anchors

  1. [1]

    Robert Almgren and Neil Chriss. Optimal execution of portfolio transactions. Journal of Risk, 3(2):5–39, 2001

  2. [2]

    Robert F Almgren. Optimal execution with nonlinear impact functions and trading-enhanced risk. Applied Mathematical Finance, 10(1):1–18, 2003

  3. [3]

    Jim Gatheral and Alexander Schied. Optimal trade execution under geometric Brownian motion in the Almgren and Chriss framework. International Journal of Theoretical and Applied Finance, 14(03):353–368, 2011

  4. [4]

    Weiguan Wang. Optimal Execution Under Nonlinear Transient Market Impact Model. PhD thesis, University College London, 2015

  5. [5]

    Álvaro Cartea, Sebastian Jaimungal, and José Penalva. Algorithmic and High-Frequency Trading. Cambridge University Press, 2015

  6. [6]

    Iacopo Mastromatteo, Bence Toth, and Jean-Philippe Bouchaud. Agent-based models for latent liquidity and concave price impact. Physical Review E, 89(4):042805, 2014

  7. [7]

    Yuriy Nevmyvaka, Yi Feng, and Michael Kearns. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning, pages 673–680. ACM, 2006

  8. [8]

    Dieter Hendricks and Diane Wilcox. A reinforcement learning extension to the Almgren-Chriss framework for optimal trade execution. In 2014 IEEE Conference on computational intelligence for financial engineering & economics (CIFEr), pages 457–464. IEEE, 2014

  9. [9]

    Yuchen Fang, Kan Ren, Weiqing Liu, Dong Zhou, Weinan Zhang, Jiang Bian, Yong Yu, and Tie-Yan Liu. Universal trading for order execution with oracle policy distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 107–115, 2021

  10. [10]

    Álvaro Cartea, Sebastian Jaimungal, and Leandro Sánchez-Betancourt. Deep reinforcement learning for algorithmic trading. Available at SSRN 3812473, 2021

  11. [11]

    Y Hafsi and E Vittori. Optimal execution with reinforcement learning. arXiv, 2024

  12. [12]

    Tommaso Macrì and Fabrizio Lillo. Reinforcement learning for optimal execution when liquidity is time-varying. arXiv preprint arXiv:2402.12049, 2024

  13. [13]

    Isaac Tonkin et al. Benchmarking deep reinforcement learning approaches to trade execution. Pacific-Basin Finance Journal, 94:102876, 2025

  14. [14]

    Aurélien Alfonsi, Antje Fruth, and Alexander Schied. Optimal execution strategies in limit order books with general shape functions. Quantitative Finance, 10(2):143–157, 2009

  15. [15]

    Ben Hambly, Renyuan Xu, and Huining Yang. Recent advances in reinforcement learning in finance. Mathematical Finance, 2021

  16. [16]

    Matteo Micheli and Antoine Monod. Deep reinforcement learning for online optimal execution strategies. arXiv preprint arXiv:2410.13493, 2024

  17. [17]

    Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015. ICLR 2016

  18. [18]

    Dimitris Bertsimas and Andrew W. Lo. Optimal control of execution costs. Journal of Financial Markets, 1(1):1–50, 1998

  19. [19]

    Jim Gatheral. No-dynamic-arbitrage and market impact. Quantitative Finance, 10(7):749–759, 2010

  20. [20]

    Jim Gatheral, Alexander Schied, and Alla Slynko. Transient linear price impact and optimal execution. Mathematical Finance, 25(3):557–592, 2015. Often cited by early preprint year 2013

  21. [21]

    Alexander Schied. Robust strategies for optimal order execution in the Almgren–Chriss framework. Applied Mathematical Finance, 20(3):264–286, 2013

  22. [22]

    Nicolae Gârleanu and Lasse Heje Pedersen. Dynamic trading with predictable returns and transaction costs. The Journal of Finance, 68(6):2309–2340, 2013

  23. [23]

    Martin D. Gould, Mason A. Porter, Stacy Williams, Mark McDonald, Daniel J. Fenn, and Sam D. Howison. Limit order books. Quantitative Finance, 13(11):1709–1742, 2013

  24. [24]

    Jean-Philippe Bouchaud, Marc Mézard, and Marc Potters. Statistical properties of stock order books: empirical results and models. Quantitative Finance, 2(4):251–256, 2002

  25. [25]

    Rama Cont. Statistical modeling of high-frequency financial data. Annual Review of Financial Economics, 3(1):291–310, 2011

  26. [26]

    Jean-Philippe Bouchaud, Yuval Gefen, Marc Potters, and Matthieu Wyart. Fluctuations and response in financial markets: The subtle nature of “random” price changes. Quantitative Finance, 4(2):176–190, 2004

  27. [27]

    Emmanuel Bacry, Iacopo Mastromatteo, and Jean-François Muzy. Hawkes processes in finance. Market Microstructure and Liquidity, 1(1):1550005, 2015

  28. [28]

    Stephen Hardiman, Nicolas Bercot, and Jean-Philippe Bouchaud. Critical reflexivity in financial markets: a Hawkes process analysis. The European Physical Journal B, 86(10):442, 2013

  29. [29]

    Bence Tóth, Yves Lemperiere, Cyril Deremble, Joachim De Lataillade, Julien Kockelkoren, and J-P Bouchaud. Anomalous price impact and the critical nature of liquidity in financial markets. Physical Review X, 1(2):021006, 2011

  30. [30]

    Natalia Bershova and Dmitry Rakhlin. The non-linear market impact of large trades: Evidence from limit order books. The Journal of Trading, 8(3):1–12, 2013

  31. [31]

    Robert Almgren, Chee Thum, Emmanuel Hauptmann, and Hong Li. Direct estimation of equity market impact. Risk, 18(7):58–62, 2005

  32. [32]

    Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018

  33. [33]

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, 2018

  34. [34]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

  35. [35]

    David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning, 2014

  36. [36]

    Bohan Ning, Xiaoteng Wang, Andrew Lim, and Jie Ye. Double deep Q-learning for optimal trade execution. arXiv preprint arXiv:1812.06600, 2018

  37. [37]

    Siyu Lin and Peter A Beling. An end-to-end optimal trade execution framework based on proximal policy optimization. In Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pages 4548–4554, 2021

  38. [38]

    Hongyang Yang, Xiao-Yang Liu, Shan Zhong, and Anwar Walid. Deep reinforcement learning for automated stock trading: An ensemble strategy. In Proceedings of the first ACM international conference on AI in finance, pages 1–8, 2020

  39. [39]

    Siyu Lin and Peter A Beling. A deep reinforcement learning framework for optimal trade execution. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 223–240. Springer, 2020

  40. [40]

    Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, and Marcin Andrychowicz. Parameter space noise for exploration. In International Conference on Learning Representations, 2018

  41. [41]

    Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Rémi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, and Shane Legg. Noisy networks for exploration. In International Conference on Learning Representations (ICLR), 2018

  42. [42]

    Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. Exploration by random network distillation. arXiv preprint arXiv:1810.12894, 2018

  43. [43]

    Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In ICML Workshop on Principled Approaches to Deep Learning, 2017

  44. [44]

    Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, 2016

  45. [45]

    Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. A tutorial on Thompson sampling and the exploration-exploitation tradeoff. Foundations and Trends in Machine Learning, 11(1):1–96, 2018

  46. [46]

    Jonathan Donier, Julius Bonart, Iacopo Mastromatteo, and Jean-Philippe Bouchaud. A fully consistent, minimal model for non-linear market impact. Quantitative Finance, 15(7):1109–1121, 2015

  47. [47]

    Michael Schneider and Fabrizio Lillo. Cross-impact and no-dynamic-arbitrage. Quantitative Finance, 19(1):137–154, 2019

  48. [48]

    Michael Benzaquen, Iacopo Mastromatteo, Zoltan Eisler, and Jean-Philippe Bouchaud. Dissecting cross impact on stock markets: An empirical analysis. Journal of Statistical Mechanics: Theory and Experiment, 2017(2):023406, 2017

  49. [49]

    Iacopo Mastromatteo, Michael Benzaquen, Zoltan Eisler, and Jean-Philippe Bouchaud. Trading lightly: Cross-impact and optimal portfolio execution. 2017

  50. [50]

    David Byrd, Maria Hybinette, and Tucker Hybinette Balch. ABIDES: Towards high-fidelity market simulation for AI research. 2019

  51. [51]

    David Byrd, Maria Hybinette, and Tucker Hybinette Balch. ABIDES: Towards high-fidelity multi-agent market simulation. In Proceedings of the 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS), 2020

  52. [52]

    Ilia Zaznov, Julian Kunkel, Alfonso Dufour, and Atta Badii. Predicting stock price changes based on the limit order book: a survey. Mathematics, 10(8):1234, 2022

  53. [53]

    Ilia Zaznov, Julian Martin Kunkel, Atta Badii, and Alfonso Dufour. The intraday dynamics predictor: a trioflow fusion of convolutional layers and gated recurrent units for high-frequency price movement forecasting. Applied Sciences, 14(7):2984, 2024

  54. [54]

    Justin Sirignano and Rama Cont. Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance, 19(9):1449–1459, 2019

  55. [55]

    Zihao Zhang, Stefan Zohren, and Stephen Roberts. DeepLOB: Deep convolutional neural networks for limit order books. IEEE Access, 7:167692–167705, 2019

  56. [56]

    Jiwon Jung and Kiseop Lee. Attention based reading, highlighting, and forecasting of the limit order book. 2024

  57. [57]

    Matthias Schnaubelt, Jonas Löhner, Bálint Horváth, et al. Optimal execution with price-volume coupling. SSRN 3534315, 2020

  58. [58]

    Álvaro Cartea, Sebastian Jaimungal, and Leandro Sánchez-Betancourt. Latency and liquidity risk. International Journal of Theoretical and Applied Finance, 24(06n07):2150035, 2021

  59. [59]

    André F. Perold. The implementation shortfall: Paper versus reality. The Journal of Portfolio Management, 14(3):4–9, 1988

  60. [60]

    R. Tyrrell Rockafellar and Stanislav Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–41, 2000

  61. [61]

    Ilia Zaznov, Atta Badii, Julian Kunkel, and Alfonso Dufour. Adamz: an enhanced optimisation method for neural network training. Neural Computing and Applications, pages 1–28, 2025