Derivative-Informed Operator Learning for Finance: On-the-Fly Greeks, Surfaces, Hedging, and Control
Pith reviewed 2026-06-27 22:47 UTC · model grok-4.3
The pith
Training financial pricing surrogates on both values and their directional derivatives cuts hedging and sensitivity errors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A learned pricing or risk operator is trained simultaneously to reproduce a high-fidelity map and to reproduce its directional Fréchet derivatives; the resulting error bounds establish that derivative accuracy controls hedging error, local stress error, and optimizer instability, with discrete-time hedging error further governed by second-order accuracy.
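One way to write the training objective described above, as a sketch only. The symbols are notational assumptions rather than the paper's: $G$ is the high-fidelity pricing or risk operator, $G_\theta$ the learned surrogate, $m$ an input (a parameter vector or a volatility curve), $v$ a sampled direction, and $\lambda \ge 0$ the derivative weight. The squared norms are also an assumed choice.

```latex
\mathcal{L}(\theta)
  = \mathbb{E}_{m}\,\bigl\| G_\theta(m) - G(m) \bigr\|^{2}
  + \lambda\,\mathbb{E}_{m,v}\,\bigl\| DG_\theta(m)[v] - DG(m)[v] \bigr\|^{2}
```

Setting $\lambda = 0$ recovers value-only training. The second term compares directional Fréchet derivatives only along sampled directions, so the full Jacobian never has to be formed.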
What carries the argument
Derivative-informed operator learning, which augments standard operator training with on-the-fly matching of directional Fréchet derivatives obtained via adjoint algorithmic differentiation and tangent sensitivity equations.
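To make "directional derivative labels" concrete, here is a minimal sketch for the simplest case, a Black–Scholes call viewed as a map from (spot, volatility) to price. It uses closed-form delta and vega as a stand-in for adjoint or tangent differentiation, which is an assumption of this sketch and not the paper's pipeline. The function names, the strike, maturity and rate defaults, and the direction `v` are all hypothetical. A central finite difference is included only as a sanity check.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(S, sigma, K=100.0, T=1.0, r=0.02):
    """Black-Scholes call price with its delta and vega (illustrative defaults)."""
    sqrt_T = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    delta = norm_cdf(d1)                 # d price / d S
    vega = S * norm_pdf(d1) * sqrt_T     # d price / d sigma
    return price, delta, vega

def directional_label(S, sigma, v):
    """Return (price, derivative of price along v = (dS, dsigma))."""
    price, delta, vega = bs_call(S, sigma)
    return price, delta * v[0] + vega * v[1]

def fd_directional(S, sigma, v, h=1e-5):
    """Central finite-difference check of the same directional derivative."""
    up = bs_call(S + h * v[0], sigma + h * v[1])[0]
    down = bs_call(S - h * v[0], sigma - h * v[1])[0]
    return (up - down) / (2.0 * h)

direction = (1.0, 0.1)   # hypothetical direction in (spot, volatility) space
price, jvp = directional_label(105.0, 0.25, direction)
fd = fd_directional(105.0, 0.25, direction)
print(f"price={price:.4f}  jvp={jvp:.6f}  finite difference={fd:.6f}")
```

Each training sample then carries a pair (value, directional derivative) instead of a value alone. For models without closed-form Greeks, the second entry would have to come from a tangent or adjoint computation.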
If this is right
- In a Black-Scholes network a tuned derivative weight reduces vega error by 40 percent and delta error by 15 percent.
- Heston and Bates random-feature models cut stochastic-volatility and jump-parameter sensitivity errors by 60 to 76 percent.
- A random-feature DeepONet mapping volatility curves to price surfaces lowers out-of-sample JVP error by 44 percent and price RMSE by 23 percent.
- Derivative consistency by itself does not eliminate no-arbitrage violations, so explicit economic constraints must still be imposed.
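The random-feature results in the list above rest on a fit that is linear in the output coefficients, so a derivative-informed version can be sketched as a stacked ridge-regularised least-squares problem. Everything specific below is an assumption made for illustration: the zero-rate Black–Scholes target stands in for Heston or Bates, and the tanh features, their scales, the sample sizes, the ridge term, and the weight `lam` are hypothetical choices, not the paper's setup.

```python
import math
import numpy as np

def bs_labels(S, sigma, K=1.0, T=1.0):
    """Zero-rate Black-Scholes call price, delta and vega for arrays S, sigma."""
    d1 = (np.log(S / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    cdf = lambda x: 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * d1 ** 2) / math.sqrt(2.0 * math.pi)
    return S * cdf(d1) - K * cdf(d2), cdf(d1), S * pdf * math.sqrt(T)

def features(X, W, b):
    """tanh random features and their partial derivatives in S and sigma."""
    H = np.tanh(X @ W + b)
    dH = 1.0 - H ** 2
    return H, dH * W[0], dH * W[1]

def fit(X, y, d_S, d_sig, W, b, lam, ridge=1e-6):
    """Minimise |H c - y|^2 + lam (|H_S c - d_S|^2 + |H_sig c - d_sig|^2) + ridge |c|^2."""
    H, HS, Hsig = features(X, W, b)
    s = math.sqrt(lam)
    A = np.vstack([H, s * HS, s * Hsig])
    rhs = np.concatenate([y, s * d_S, s * d_sig])
    return np.linalg.solve(A.T @ A + ridge * np.eye(W.shape[1]), A.T @ rhs)

def rmse(c, X, y, d_S, d_sig, W, b):
    """Root-mean-square errors of the fitted price, delta and vega."""
    H, HS, Hsig = features(X, W, b)
    r = lambda P, t: float(np.sqrt(np.mean((P @ c - t) ** 2)))
    return {"price": r(H, y), "delta": r(HS, d_S), "vega": r(Hsig, d_sig)}

rng = np.random.default_rng(0)

def sample(n):
    """Draw (spot, volatility) inputs and their price, delta and vega labels."""
    X = np.column_stack([rng.uniform(0.8, 1.2, n), rng.uniform(0.1, 0.4, n)])
    return (X, *bs_labels(X[:, 0], X[:, 1]))

W = rng.normal(scale=3.0, size=(2, 40))   # fixed random inner weights
b = rng.uniform(-3.0, 3.0, size=40)
train, test = sample(200), sample(1000)

c_value = fit(*train, W, b, lam=0.0)      # value-only baseline
c_deriv = fit(*train, W, b, lam=1.0)      # derivative-informed fit
err_value = rmse(c_value, *test, W, b)
err_deriv = rmse(c_deriv, *test, W, b)
print("value-only        :", err_value)
print("derivative-informed:", err_deriv)
```

By construction the derivative-informed fit cannot have a larger combined delta and vega residual on the training set than the baseline. Whether that carries over out of sample, and by how much, depends on the features and on `lam`, which is why the weight appears as a tuned free parameter.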
Where Pith is reading between the lines
- The same training principle could be applied to any surrogate used for gradient-based control or inverse problems outside finance.
- Value-only training may leave residual errors that become visible only when the surrogate is placed inside a hedging or optimization loop.
- The framework invites tests in incomplete-market settings or with alternative discretizations to check how far the derived bounds extend.
Load-bearing premise
The error bounds that connect derivative accuracy to hedging and optimization performance are derived under the modeling assumptions of the chosen pricing operators and the chosen discretization of the hedging problem.
What would settle it
Measure the realized discrete-time hedging error of a Bates-model surrogate trained with versus without the derivative-matching term over an ensemble of paths and rebalancing frequencies.
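A harness for the experiment described above might look like the following sketch. It deliberately simplifies: Black–Scholes dynamics replace the Bates model, and a constant additive delta bias stands in for a surrogate with less accurate Greeks, so it illustrates the measurement procedure, not the paper's result. All names and parameter values are hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_d1(S, tau, K, r, sigma):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_price(S, tau, K, r, sigma):
    d1 = bs_d1(S, tau, K, r, sigma)
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def bs_delta(S, tau, K, r, sigma):
    return norm_cdf(bs_d1(S, tau, K, r, sigma))

def biased_delta(S, tau, K, r, sigma):
    """Hypothetical stand-in for a surrogate whose delta is off by 0.1."""
    return bs_delta(S, tau, K, r, sigma) + 0.1

def hedging_rmse(delta_fn, n_steps, n_paths=2000, S0=100.0, K=100.0,
                 T=1.0, r=0.0, sigma=0.2, seed=0):
    """RMS terminal error of a self-financing delta hedge of one short call."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        S = S0
        cash = bs_price(S0, T, K, r, sigma)      # premium received at inception
        position = 0.0
        for i in range(n_steps):
            target = delta_fn(S, T - i * dt, K, r, sigma)
            cash -= (target - position) * S      # rebalance the stock holding
            position = target
            z = rng.gauss(0.0, 1.0)
            S *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            cash *= math.exp(r * dt)             # accrue interest on cash
        error = cash + position * S - max(S - K, 0.0)
        total += error * error
    return math.sqrt(total / n_paths)

frequencies = (13, 52)   # rebalancing dates per year
exact = {n: hedging_rmse(bs_delta, n) for n in frequencies}
biased = {n: hedging_rmse(biased_delta, n) for n in frequencies}
for n in frequencies:
    print(f"steps={n:3d}  exact-delta RMSE={exact[n]:.3f}  biased-delta RMSE={biased[n]:.3f}")
```

Replacing `bs_delta` and `biased_delta` with the deltas of two trained surrogates, and the path generator with a Bates simulator, would turn this harness into the comparison the section asks for.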
original abstract
Financial decision systems require fast surrogate models for pricing, calibration, hedging, XVA, stress testing, and portfolio optimization. Standard neural surrogates reproduce prices or risk quantities, but downstream tasks depend as much on derivatives: deltas, vegas, curve and credit-spread sensitivities, exposure and objective gradients. We formulate a derivative-informed operator-learning framework in which the learned map (a neural operator, random-feature operator, or finite-dimensional surrogate) is trained both to match a high-fidelity pricing or risk operator and to match directional Fréchet derivatives generated on the fly. The framework combines operator learning, adjoint algorithmic differentiation, tangent sensitivity equations, random sketching of Jacobian actions, and no-arbitrage constraints. We derive error bounds showing derivative accuracy controls local stress errors, hedging error, and optimizer instability, and that discrete-time hedging error is also governed by second-order (gamma) accuracy. A Black–Scholes network over eight seeds shows a tuned derivative weight cuts vega error by 40% and delta error by 15% while modestly improving prices, but not an unsupervised second-order Greek. Heston and Bates random-feature experiments reduce stochastic-volatility and jump-parameter sensitivity errors by 60–76%. A random-feature DeepONet/Galerkin operator mapping instantaneous-volatility curves to dense price surfaces reduces out-of-sample JVP error by 44% and price RMSE by 23% over eight seeds; it also shows derivative consistency alone does not remove no-arbitrage violations, so economic constraints must be imposed explicitly. The framework gives a disciplined route from value-only surrogates to derivative-aware engines that output differentiable instruments for hedging, risk, and control.
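The abstract's "random sketching of Jacobian actions" can be illustrated with a standard identity: for Gaussian directions v with identity covariance, the expected squared norm of (J - J_model) v equals the squared Frobenius norm of J - J_model. The sketch below checks this on small explicit matrices. In a real pipeline only the Jacobian-vector products would be available, and the matrices, sizes, and function names here are hypothetical.

```python
import random

def matvec(A, v):
    """Dense matrix-vector product, standing in for a Jacobian-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sketched_sq_error(J_true, J_model, n_dirs, seed=0):
    """Estimate the squared Frobenius error using only products with random v."""
    rng = random.Random(seed)
    n_in = len(J_true[0])
    total = 0.0
    for _ in range(n_dirs):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        diff = [a - b for a, b in zip(matvec(J_true, v), matvec(J_model, v))]
        total += sum(d * d for d in diff)
    return total / n_dirs

# Small explicit Jacobians so the estimate can be compared with the exact value.
rng = random.Random(1)
J_true = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
J_model = [[a + 0.1 * rng.uniform(-1.0, 1.0) for a in row] for row in J_true]

exact = sum((a - b) ** 2
            for row_t, row_m in zip(J_true, J_model)
            for a, b in zip(row_t, row_m))
estimate = sketched_sq_error(J_true, J_model, n_dirs=5000)
print(f"exact={exact:.6f}  sketched estimate={estimate:.6f}")
```

The practical appeal is that a few random directions per sample give an unbiased estimate of the full Jacobian mismatch, so the derivative loss scales with the number of directions, not with the input dimension.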
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formulates a derivative-informed operator-learning framework in which neural operators, random-feature operators, or finite-dimensional surrogates are trained to match both a high-fidelity pricing/risk operator and its directional Fréchet derivatives (generated via adjoints, tangent equations, or random sketching). It derives error bounds asserting that derivative accuracy controls local stress errors, discrete-time hedging error (via second-order/gamma accuracy), and optimizer instability, and combines these with no-arbitrage constraints. Experiments on Black–Scholes (network, 8 seeds), Heston/Bates (random features), and a DeepONet/Galerkin operator for volatility-curve-to-price-surface mapping report concrete error reductions (40% vega, 15% delta, 60–76% parameter sensitivities, 44% JVP, 23% price RMSE) while noting that derivative consistency alone does not eliminate arbitrage violations.
Significance. If the error bounds can be made rigorous under stated conditions, the framework supplies a disciplined route from value-only surrogates to derivative-aware engines usable for hedging, XVA, stress testing, and control. The multi-model, multi-seed numerical results (including explicit comparison of supervised vs. unsupervised second-order Greeks and the necessity of explicit economic constraints) provide concrete evidence of practical gains in Greek and sensitivity accuracy.
major comments (1)
- [error-bound derivations (abstract)] Abstract and error-bound derivations: the paper asserts that derivative accuracy controls hedging error and optimizer instability for the Black–Scholes, Heston, and Bates operators, yet provides no explicit list of the required regularity conditions (e.g., C^{2,1} regularity of the pricing map, uniform ellipticity or Lipschitz constants on coefficients) or market-completeness hypotheses needed to pass from Fréchet-derivative error to integrated hedging error. Without these hypotheses the quantitative link between the reported 15–76% Greek-error reductions and the claimed control of hedging/optimization error remains formally unsupported outside the tested numerical regimes.
minor comments (2)
- [Abstract] Abstract: the statement that “derivative consistency alone does not remove no-arbitrage violations” is an important negative result; it would benefit from a brief indication of which no-arbitrage constraints were imposed and how they interact with the derivative loss.
- [Abstract] Abstract: the derivative loss weight is described as “tuned”; a short clarification of the tuning procedure (grid search, validation metric, etc.) would help readers assess reproducibility of the reported 40%/15% reductions.
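On the first minor comment: this page does not say which no-arbitrage constraints the paper imposes. As an illustration of what an explicit check can look like, the sketch below counts violations of three standard static conditions on a grid of call prices: monotonicity in strike, convexity in strike (butterfly), and monotonicity in maturity at fixed strike, the last assuming nonnegative rates and no dividends. The grid, tolerance, and function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, r=0.0, sigma=0.2):
    """Black-Scholes call price, used here only to build an arbitrage-free grid."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def count_violations(C, tol=1e-10):
    """C[i][j] is the call price at maturity i and strike j (uniform strike grid)."""
    counts = {"monotone": 0, "butterfly": 0, "calendar": 0}
    n_strikes = len(C[0])
    for i, row in enumerate(C):
        for j in range(n_strikes):
            # Call prices must not increase with strike.
            if j + 1 < n_strikes and row[j + 1] > row[j] + tol:
                counts["monotone"] += 1
            # Call prices must be convex in strike.
            if 0 < j < n_strikes - 1 and row[j - 1] - 2.0 * row[j] + row[j + 1] < -tol:
                counts["butterfly"] += 1
            # Call prices must not decrease with maturity at fixed strike.
            if i + 1 < len(C) and C[i + 1][j] < row[j] - tol:
                counts["calendar"] += 1
    return counts

strikes = [80.0 + 5.0 * j for j in range(9)]
maturities = [0.25, 0.5, 1.0]
clean = [[bs_call(100.0, K, T) for K in strikes] for T in maturities]

bumped = [row[:] for row in clean]
bumped[1][4] += 2.0   # mimic a local surrogate error at one grid node

clean_counts = count_violations(clean)
bumped_counts = count_violations(bumped)
print("clean grid :", clean_counts)
print("bumped grid:", bumped_counts)
```

A check of this kind can be run on any surrogate's output surface. Turning the violation counts into a penalty or a hard constraint is a separate design decision that interacts with the derivative loss, which is the point the referee raises.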
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for explicit regularity conditions in the error-bound derivations. We address the single major comment below and will incorporate the requested clarifications in the revised manuscript.
point-by-point responses
Referee: Abstract and error-bound derivations: the paper asserts that derivative accuracy controls hedging error and optimizer instability for the Black–Scholes, Heston, and Bates operators, yet provides no explicit list of the required regularity conditions (e.g., C^{2,1} regularity of the pricing map, uniform ellipticity or Lipschitz constants on coefficients) or market-completeness hypotheses needed to pass from Fréchet-derivative error to integrated hedging error. Without these hypotheses the quantitative link between the reported 15–76% Greek-error reductions and the claimed control of hedging/optimization error remains formally unsupported outside the tested numerical regimes.
Authors: We agree that an explicit enumeration of the standing assumptions would strengthen the presentation. The error bounds in Section 3 are derived under the standard hypotheses of the parabolic PDE and stochastic-process literature for these models (C^{2,1} regularity of the pricing map, uniform ellipticity and bounded Lipschitz coefficients of the SDEs, and market completeness with respect to the chosen numeraire for the discrete-hedging result). In the revision we will insert a dedicated paragraph (or short subsection) immediately before Theorem 3.1 that lists these conditions verbatim, together with a brief remark on how they are satisfied by the Black–Scholes, Heston, and Bates dynamics used in the experiments. This will make the passage from Fréchet-derivative error to integrated hedging and optimizer error fully rigorous while leaving the numerical claims unchanged.
revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper formulates a derivative-informed operator learning framework and states that error bounds are derived linking Fréchet derivative accuracy to local stress errors, discrete-time hedging error, and optimizer instability under the assumptions of the chosen pricing operators. Training explicitly augments the loss with on-the-fly directional derivatives generated via adjoints or tangent equations; the reported error reductions (e.g., 15-76% in Greeks) are measured outcomes of this augmented training rather than quantities that reduce by construction to the fitted inputs or to any self-citation chain. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the abstract or described claims, and the central mathematical and empirical content remains independent of the inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- derivative loss weight
axioms (2)
- domain assumption: The pricing operators admit directional Fréchet derivatives that can be generated on the fly via adjoints or tangent equations.
- domain assumption: Discrete-time hedging error is governed by second-order (gamma) accuracy of the surrogate.
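A textbook heuristic shows why a second-order Greek appears in the second assumption. It is a sketch under Black–Scholes dynamics with zero rates, not the paper's bound, and all symbols are notational assumptions: $V$ is the true option value, $\Delta_k$ and $\Gamma_k$ its delta and gamma at rebalancing date $t_k$, $\hat\Delta_k$ the surrogate delta actually held, $\Delta S_k = S_{k+1} - S_k$, and $\Delta t$ the rebalancing interval. Expanding the one-step change in $V$ to second order and using the zero-rate Black–Scholes relation $\Theta = -\tfrac12 \sigma^2 S^2 \Gamma$ gives, for a short option hedged with $\hat\Delta_k$ shares:

```latex
\varepsilon_T
  \;\approx\; \sum_{k}
  \Bigl[
     \bigl(\hat\Delta_k - \Delta_k\bigr)\,\Delta S_k
     \;+\; \tfrac12\,\Gamma_k \bigl(\sigma^{2} S_k^{2}\,\Delta t - (\Delta S_k)^{2}\bigr)
  \Bigr]
```

The first term is driven by first-derivative (delta) error and is present at any rebalancing frequency. The second is the discretization term, whose size is set by gamma. How the accuracy of the surrogate's own gamma enters the paper's bound is not shown on this page.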