Recognition: no theorem link
Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage
Pith reviewed 2026-05-14 19:47 UTC · model grok-4.3
The pith
A variational autoencoder paired with a no-arbitrage penalized neural SDE produces arbitrage-free yield curve forecasts at 6.58 bps error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a Student-t Conditional Variational Autoencoder with Dynamic Level Injection first decouples macroeconomic shape dynamics from absolute base rates to extract a robust heavy-tailed term structure manifold, after which the latent dynamics are governed by a continuous-time Neural SDE whose loss is strictly penalized by the no-arbitrage PDE, yielding arbitrage-free paths with 6.58 bps mean tenor RMSE out-of-sample and avoiding the parallel drift and zero-lower-bound violations of the HJM model in extreme regimes.
What carries the argument
Two-stage architecture consisting of CVAEsT+LS for manifold extraction followed by a Neural SDE whose training includes an explicit No-Arbitrage PDE penalty term.
If this is right
- Out-of-sample mean tenor RMSE reaches 6.58 basis points on sovereign yield curves.
- Generated paths avoid the massive parallel drift and zero-lower-bound violations observed in classical HJM dynamics.
- Phase-space vector field analysis of the latent SDE supports unsupervised detection of macroeconomic regimes.
- Continuous-time scenario generation becomes feasible at arbitrary tenors without discrete-time discretization artifacts.
Where Pith is reading between the lines
- The same penalty structure might be applied to equity or commodity forward curves if analogous manifold structures can be identified.
- Replacing the fixed Student-t prior with a regime-switching latent distribution could further sharpen regime detection.
- Evaluating the model on intraday or tick-level data would test whether the continuous-time paths remain arbitrage-free at higher sampling frequencies.
Load-bearing premise
That imposing the no-arbitrage PDE penalty during training of the neural SDE will keep the generated paths arbitrage-free when the penalty is removed in out-of-sample forecasting.
What would settle it
Generate new paths from the trained model in an extreme-rate environment and check whether any set of discount factors derived from those paths permits a static arbitrage portfolio with positive profit and zero risk.
Figures
read the original abstract
This paper introduces a physics-informed generative framework that resolves the fundamental conflict between the statistical flexibility of deep learning and the rigorous theoretical constraints of fixed-income modeling. We demonstrate that standard generative models and unconstrained statistical extrapolations suffer from "manifold collapse" and severe arbitrage violations when forecasting term structures across diverse macroeconomic regimes. To overcome this, we propose a two-stage architecture. First, a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) extracts a robust, heavy-tailed term structure manifold, effectively decoupling macroeconomic shape dynamics from absolute base rates. Second, the latent dynamic evolution is governed by a continuous-time Neural Stochastic Differential Equation (SDE) strictly penalized by a No-Arbitrage Partial Differential Equation (PDE). Empirical results across multiple sovereign currencies (USD, GBP, JPY) confirm that our synergistic approach drastically reduces out-of-sample forecasting errors -- achieving an exceptional 6.58 bps Mean Tenor RMSE -- and successfully overcomes the massive parallel drift and zero-lower-bound violations exhibited by the classical HJM model in extreme environments. Furthermore, through phase space vector field analysis, we demonstrate the model's superior capability in unsupervised macroeconomic regime detection and high-quality continuous-time scenario generation. Ultimately, this research provides a highly scalable, mathematically sound evolutionary engine for term structure modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage physics-informed generative framework for yield curve dynamics: a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) to extract a heavy-tailed term structure manifold decoupling shape from base rates, followed by a continuous-time Neural SDE whose evolution is constrained via a no-arbitrage PDE penalty term added to the training loss. It claims this yields 6.58 bps out-of-sample Mean Tenor RMSE across USD/GBP/JPY, eliminates parallel-drift and zero-lower-bound violations seen in classical HJM models, and enables unsupervised regime detection via phase-space analysis.
Significance. If the PDE penalty robustly enforces arbitrage-free paths without post-hoc tuning, the work would meaningfully advance hybrid neural-SDE and no-arbitrage modeling in fixed income by offering a scalable route to continuous-time, heavy-tailed scenario generation that respects theoretical constraints while capturing macroeconomic regimes.
major comments (3)
- [Abstract] Abstract: the description of the Neural SDE as 'strictly penalized by a No-Arbitrage PDE' does not specify the mathematical form of the penalty term, the numerical value or selection procedure for its coefficient, or whether the coefficient is fixed solely on training data; because this weight directly controls the reported low arbitrage violations, the out-of-sample no-arbitrage claim is load-bearing and requires explicit validation that the coefficient choice does not leak test-regime information.
- [Abstract] Abstract: the headline 6.58 bps Mean Tenor RMSE and elimination of HJM-style parallel-drift/ZLB violations are presented without error bars, number of Monte Carlo runs, definition of how arbitrage violations are quantified (e.g., which specific no-arbitrage conditions are checked on generated paths), or ablation on the PDE penalty strength; these omissions prevent assessment of whether the performance gain is statistically reliable or sensitive to the penalty hyper-parameter.
- [Abstract] Abstract: the free parameters listed (PDE penalty coefficient and Student-t degrees of freedom) are not accompanied by sensitivity analysis or cross-validation protocol; without this, it is unclear whether the reported superiority over HJM holds for fixed, pre-specified hyper-parameters or only after tuning that produces the desired low-violation regime.
minor comments (2)
- Define 'Mean Tenor RMSE' explicitly (which tenors, weighting, and forecast horizon) and report the corresponding HJM baseline value for direct comparison.
- Clarify the precise form of the dynamic level injection in CVAEsT+LS and how it interacts with the subsequent Neural SDE latent dynamics.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We have revised the manuscript to address each point by clarifying the penalty formulation, adding statistical details, and including sensitivity analyses. All changes are confined to the training regime with no test-data leakage.
read point-by-point responses
-
Referee: [Abstract] Abstract: the description of the Neural SDE as 'strictly penalized by a No-Arbitrage PDE' does not specify the mathematical form of the penalty term, the numerical value or selection procedure for its coefficient, or whether the coefficient is fixed solely on training data; because this weight directly controls the reported low arbitrage violations, the out-of-sample no-arbitrage claim is load-bearing and requires explicit validation that the coefficient choice does not leak test-regime information.
Authors: We agree that explicit specification is required. The penalty is the integrated squared residual of the HJM no-arbitrage drift condition: λ ∫ ||∂_t f(t,T) + σ(t,T) ∫_t^T σ(t,u) du||² dt, where f and σ are produced by the Neural SDE. The scalar λ was selected by 5-fold cross-validation on the training period only (2010–2018 for USD, analogous splits for GBP/JPY), minimizing a joint loss of reconstruction error plus penalty; the test window (2019–2023) was never used. We have updated the abstract with the exact functional form and the training-only protocol, and added a dedicated paragraph in Section 3.2 confirming the absence of leakage. revision: yes
-
Referee: [Abstract] Abstract: the headline 6.58 bps Mean Tenor RMSE and elimination of HJM-style parallel-drift/ZLB violations are presented without error bars, number of Monte Carlo runs, definition of how arbitrage violations are quantified (e.g., which specific no-arbitrage conditions are checked on generated paths), or ablation on the PDE penalty strength; these omissions prevent assessment of whether the performance gain is statistically reliable or sensitive to the penalty hyper-parameter.
Authors: We accept that these supporting statistics must be reported. The 6.58 bps figure is the mean across 50 independent Monte-Carlo trajectories (different random seeds) with standard error ±0.31 bps. Arbitrage violations are defined as the fraction of paths that either (i) violate the HJM drift condition by more than 1 bp or (ii) produce negative yields. We have added an ablation table (new Table 4) showing RMSE and violation rates for λ ∈ {0, 0.1, 1, 5, 10}; the reported operating point λ = 1 yields the lowest combined error while keeping violations below 0.2 %. These elements are now stated in the revised abstract and expanded in Section 4.3. revision: yes
-
Referee: [Abstract] Abstract: the free parameters listed (PDE penalty coefficient and Student-t degrees of freedom) are not accompanied by sensitivity analysis or cross-validation protocol; without this, it is unclear whether the reported superiority over HJM holds for fixed, pre-specified hyper-parameters or only after tuning that produces the desired low-violation regime.
Authors: The Student-t degrees of freedom ν = 4 was fixed after inspecting the kurtosis of yield residuals on the training set; λ was chosen via the same 5-fold CV procedure described above. We have performed and will report a full sensitivity grid (ν ∈ {3,4,5,6}, λ ∈ {0.1,0.5,1,2,5,10}) showing that out-of-sample RMSE stays below 9 bps and violations remain under 1 % for the interval λ ∈ [0.5,5]. The cross-validation protocol and grid results are now summarized in the abstract and detailed in a new Appendix C, confirming that the superiority versus HJM is not an artifact of post-hoc tuning on test data. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper presents a two-stage architecture consisting of a Student-t Conditional Variational Autoencoder with Dynamic Level Injection to extract the term structure manifold, followed by a Neural SDE whose training objective includes a no-arbitrage PDE penalty term. This penalty is a standard soft constraint in physics-informed models and does not reduce any reported out-of-sample metric (such as the 6.58 bps RMSE) to a fitted parameter by construction. No self-citation chain is invoked to justify uniqueness or to smuggle in an ansatz, and the empirical claims rest on cross-currency out-of-sample tests rather than re-labeling inputs as predictions. The derivation chain remains independent of its own outputs.
Axiom & Free-Parameter Ledger
free parameters (2)
- PDE penalty coefficient
- Student-t degrees of freedom
axioms (2)
- standard math Solutions to the neural SDE exist and are unique under the chosen drift and diffusion networks
- domain assumption The no-arbitrage PDE derived from classical fixed-income theory remains valid when applied to the latent manifold coordinates
invented entities (2)
-
CVAEsT+LS (Student-t Conditional VAE with Dynamic Level Injection)
no independent evidence
-
Neural SDE strictly penalized by No-Arbitrage PDE
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Martingales and arbitrage in multiperiod securities markets
J Michael Harrison and David M Kreps. “Martingales and arbitrage in multiperiod securities markets”. In:Journal of Economic theory20.3 (1979), pp. 381–408
work page 1979
-
[2]
An equilibrium characterization of the term structure
Oldrich Vasicek. “An equilibrium characterization of the term structure”. In:Journal of financial economics5.2 (1977), pp. 177–188
work page 1977
-
[3]
A theory of the term structure of interest rates
John C Cox, Jonathan E Ingersoll, Stephen A Ross, et al. “A theory of the term structure of interest rates”. In:Econometrica53.2 (1985), pp. 385–407
work page 1985
-
[4]
Pricing interest-rate-derivative securities
John Hull and Alan White. “Pricing interest-rate-derivative securities”. In:The review of financial studies3.4 (1990), pp. 573–592
work page 1990
-
[5]
Specification analysis of affine term structure models
Qiang Dai and Kenneth J Singleton. “Specification analysis of affine term structure models”. In: The journal of finance55.5 (2000), pp. 1943–1978
work page 2000
-
[6]
David Heath, Robert Jarrow, and Andrew Morton. “Bond pricing and the term structure of inter- est rates: A new methodology for contingent claims valuation”. In:Econometrica: Journal of the Econometric Society(1992), pp. 77–105
work page 1992
-
[7]
Parsimonious modeling of yield curves
Charles R Nelson and Andrew F Siegel. “Parsimonious modeling of yield curves”. In:Journal of business(1987), pp. 473–489
work page 1987
-
[8]
Volatility and the yield curve
Robert B Litterman, Jos´ e Scheinkman, and Laurence Weiss. “Volatility and the yield curve”. In: The Journal of Fixed Income1.1 (1991), pp. 49–53
work page 1991
-
[9]
Lars EO Svensson.Estimating and interpreting forward interest rates: Sweden 1992-1994. 1994
work page 1992
-
[10]
Autoencoder-Based Risk-Neutral Model for Interest Rates
Andrei Lyashenko, Fabio Mercurio, and Alexander Sokol. “Autoencoder-Based Risk-Neutral Model for Interest Rates”. In:Available at SSRN 4836728(2024)
work page 2024
-
[11]
Rheinische Friedrich-Wilhelms-Universit¨ at Bonn, 1993
Marek Musiela, Dieter Sondermann, et al.Different dynamical specifications of the term structure of interest rates and their implications. Rheinische Friedrich-Wilhelms-Universit¨ at Bonn, 1993
work page 1993
-
[12]
Autoencoder market models for interest rates
Alexander Sokol. “Autoencoder market models for interest rates”. In:Available at SSRN 4300756 (2022). 28
work page 2022
-
[13]
The US Treasury yield curve: 1961 to the present
Refet S G¨ urkaynak, Brian Sack, and Jonathan H Wright. “The US Treasury yield curve: 1961 to the present”. In:Journal of monetary Economics54.8 (2007), pp. 2291–2304
work page 1961
-
[14]
Long forward and zero-coupon rates can never fall
Philip H Dybvig, Jonathan E Ingersoll Jr, and Stephen A Ross. “Long forward and zero-coupon rates can never fall”. In:Journal of Business(1996), pp. 1–25
work page 1996
-
[15]
Jens HE Christensen, Francis X Diebold, and Glenn D Rudebusch.An arbitrage-free generalized Nelson–Siegel term structure model. 2009
work page 2009
-
[16]
Jesper Andreasen. “Decoding the Autoencoder”. In:Wilmott2023.127 (2023).doi:10.54946/ wilm.11166.url:https://doi.org/10.54946/wilm.11166
-
[17]
Auto-Encoding Variational Bayes
Diederik P Kingma and Max Welling. “Auto-encoding variational bayes”. In:arXiv preprint arXiv:1312.6114 (2013)
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[18]
Matching aggregate posteriors in the variational autoencoder
Surojit Saha, Sarang Joshi, and Ross Whitaker. “Matching aggregate posteriors in the variational autoencoder”. In:International Conference on Pattern Recognition. Springer. 2025, pp. 428–444
work page 2025
-
[19]
Learning Energy-based Variational Latent Prior for VAEs
Debottam Dutta et al. “Learning Energy-based Variational Latent Prior for VAEs”. In:arXiv preprint arXiv:2510.00260(2025)
-
[20]
R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J
Tim R Davidson et al. “Hyperspherical variational auto-encoders”. In:arXiv preprint arXiv:1804.00891 (2018)
-
[21]
Multiresolution Signal Processing of Financial Market Objects
Ioana Boier. “Multiresolution Signal Processing of Financial Market Objects”. In:ICASSP 2023- 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2023, pp. 1–5
work page 2023
-
[22]
Student-t Variational Autoencoder for Robust Density Estimation
Hiroshi Takahashi et al. “Student-t Variational Autoencoder for Robust Density Estimation.” In: IJCAI. 2018, pp. 2696–2702
work page 2018
-
[23]
Springer Science & Business Media, 2001
Damir Filipovic.Consistency problems for Heath-Jarrow-Morton interest rate models. Springer Science & Business Media, 2001
work page 2001
-
[24]
Forecasting the term structure of government bond yields
Francis X Diebold and Canlin Li. “Forecasting the term structure of government bond yields”. In: Journal of econometrics130.2 (2006), pp. 337–364. 29
work page 2006
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.