Universal Time Series Generation with Neural Controlled Differential Equations
Pith reviewed 2026-06-29 14:07 UTC · model grok-4.3
The pith
Maximally expressive SLiCEs can approximate the path laws of any continuous causal pushforward on compact latent sets in the W infinity metric.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in W_∞. This theoretical guarantee underpins the construction of Generative SLiCEs (G-SLiCEs), a continuous-time flow-matching model on path space that empirically outperforms less expressive alternatives on forecasting tasks while generalizing to irregular sampling grids.
What carries the argument
Maximally expressive Structured Linear Controlled Differential Equations (SLiCEs), which serve as the continuous-time backbone whose linear structure and parameterisation allow approximation of arbitrary continuous causal path measures on compact sets.
If this is right
- Expressivity directly improves accuracy in probabilistic forecasting and related downstream tasks.
- The model inherits continuous-time properties and therefore works on arbitrary observation grids without modification.
- Irregular or missing data no longer requires special handling or interpolation that fixed-grid models need.
- Flow matching performed on path space yields a generative model whose samples respect the continuous causal structure of the data.
- The same universality result supplies a design principle for future continuous-time generative architectures.
Where Pith is reading between the lines
- The compactness requirement suggests testing whether performance degrades on latent spaces that are only locally compact or unbounded.
- Because the model operates in continuous time, it may be directly applicable to irregularly sampled multivariate sensor streams without resampling.
- The path-space flow-matching formulation could be combined with existing neural ODE or SDE solvers to produce hybrid generative pipelines.
- If the causality assumption is relaxed, the same machinery might still approximate non-causal maps at the cost of losing the W_∞ guarantee.
Load-bearing premise
The latent sets must be compact and the pushforwards must be continuous and causal.
What would settle it
Exhibit one continuous causal pushforward from a compact latent set whose induced path law lies outside the closure of the set of laws produced by maximally expressive SLiCEs, when distance is measured in W_∞.
Figures
read the original abstract
Recent work on the sequence universality of State Space Models (SSMs) has introduced efficient, maximally expressive continuous-time approaches for time-series modelling. While these works focus on discriminative settings, we extend this perspective to generative time-series modelling by proving that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in $W_\infty$. Building on these theoretical results, we propose Generative SLiCEs (G-SLiCEs), a maximally expressive continuous-time model for flow matching on path-space. Empirically, we show that expressivity improves performance in probabilistic forecasting and downstream tasks, while retaining the advantages of continuous-time models such as generalising to arbitrary observation grids. This is particularly beneficial for irregular grids, where fixed-grid models often struggle.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to prove that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they approximate the induced path laws of continuous causal pushforwards on compact latent sets in the W_∞ metric. It introduces Generative SLiCEs (G-SLiCEs) for flow matching on path-space and reports that increased expressivity improves probabilistic forecasting and downstream tasks while preserving continuous-time advantages such as generalization to arbitrary observation grids.
Significance. If the universality result holds under the stated conditions, the work supplies a qualified theoretical foundation for continuous-time generative models that extends recent SSM universality results from discriminative to generative settings. The explicit restrictions (compact latent sets, continuous causal pushforwards, W_∞ metric) and the proposal of a flow-matching model on path space are strengths. The empirical emphasis on irregular grids addresses a recurring practical limitation of fixed-grid approaches.
major comments (2)
- [Abstract] Abstract: the central universality claim is asserted without any derivation steps, theorem statement, or proof sketch, so the load-bearing mathematical result cannot be assessed for correctness against the listed conditions (compact sets, continuous causal pushforwards).
- [Abstract] Abstract: the empirical section states that 'expressivity improves performance' but supplies no baselines, metrics, datasets, or experimental protocol, preventing verification of the claimed gains for probabilistic forecasting and irregular-grid tasks.
minor comments (1)
- [Abstract] Abstract: the symbol W_∞ is introduced without definition or citation to the underlying path-space metric.
Simulated Author's Rebuttal
We thank the referee for the detailed comments. The universality theorem and its proof appear in full in the body of the manuscript (Section 3), while the abstract follows standard conventions by summarizing the result. The empirical protocol, baselines, and metrics are likewise detailed in Section 5. We address each point below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central universality claim is asserted without any derivation steps, theorem statement, or proof sketch, so the load-bearing mathematical result cannot be assessed for correctness against the listed conditions (compact sets, continuous causal pushforwards).
Authors: The precise theorem statement (including the restrictions to compact latent sets, continuous causal pushforwards, and the W_∞ metric) together with the complete proof is given in Section 3. The abstract is intentionally concise and does not contain derivations, as is conventional; the full mathematical development is available for assessment in the main text. revision: no
-
Referee: [Abstract] Abstract: the empirical section states that 'expressivity improves performance' but supplies no baselines, metrics, datasets, or experimental protocol, preventing verification of the claimed gains for probabilistic forecasting and irregular-grid tasks.
Authors: The abstract summarizes the empirical outcome at a high level. Section 5 supplies the full experimental protocol, including the datasets (with irregular observation grids), metrics (CRPS, log-likelihood, and downstream task accuracy), baselines (both discrete and continuous-time models), and implementation details. We are happy to add a parenthetical mention of the primary metric to the abstract if the editor prefers. revision: partial
Circularity Check
No significant circularity detected in universality claim
full rationale
The paper presents a proof that maximally expressive SLiCEs approximate induced path laws of continuous causal pushforwards on compact latent sets in the W_∞ metric, with conditions stated explicitly in the abstract. This is framed as an extension of prior SSM universality results rather than a redefinition or fit of the target quantity itself. No self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain is visible that would collapse the central claim to its inputs by construction. The derivation is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Compact latent sets and continuous causal pushforwards allow approximation of induced path laws in W_∞
invented entities (1)
-
G-SLiCEs
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Albergo, Mark Goldstein, Nicholas M
Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden. Stochastic interpolants with data-dependent couplings.arXiv preprint arXiv:2310.03725, 2023
-
[2]
Diffusion-based time series imputation and forecasting with structured state space models.Transactions on Machine Learning Research, 2022
Juan Lopez Alcaraz and Nils Strodthoff. Diffusion-based time series imputation and forecasting with structured state space models.Transactions on Machine Learning Research, 2022. ISSN 2835-8856. URLhttps://openreview.net/forum?id=hHiIbk7ApW
2022
-
[3]
Juan Miguel Lopez Alcaraz and Nils Strodthoff. Diffusion-based time series imputation and forecasting with structured state space models.arXiv preprint arXiv:2208.09399, 2022
-
[4]
J. M. Aldaz, O. Kounchev, and H. Render. Bernstein operators for exponential polynomials. Constructive Approximation, 29:345–367, 2009
2009
-
[5]
GluonTS: Probabilistic Time Series Models in Python
A. Alexandrov, K. Benidis, M. Bohlke-Schneider, V . Flunkert, J. Gasthaus, T. Januschowski, D. C. Maddix, S. Rangapuram, D. Salinas, J. Schulz, L. Stella, A. C. Türkmen, and Y . Wang. GluonTS: Probabilistic Time Series Modeling in Python.arXiv preprint arXiv:1906.05264, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1906
-
[6]
Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, and Yuyang Wang
Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, and Yuyang Wang. GluonTS: Probabilistic and Neural Time Series Modeling in Python.Journal of Machine Learning Research, 21(116): 1–6,...
2020
-
[7]
xlstm: Extended long short-term memory
Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xlstm: Extended long short-term memory. InProceedings of the 38th Conference on Neural Information Processing Systems, 2024
2024
-
[8]
Titans: Learning to Memorize at Test Time
Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. arXiv preprint arXiv:2501.00663, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[9]
Permu- tation equivariant neural controlled differential equations for dynamic graph representation learning
Torben Berndt, Benjamin Walker, Tiexin Qin, Jan Stühmer, and Andrey Kormilitzin. Permu- tation equivariant neural controlled differential equations for dynamic graph representation learning. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025
2025
-
[10]
Wiley, 2 edition, 1999
Patrick Billingsley.Convergence of Probability Measures. Wiley, 2 edition, 1999
1999
-
[11]
Modeling temporal data as continuous functions with stochastic process diffusion
Marin Biloš, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, and Stephan Günnemann. Modeling temporal data as continuous functions with stochastic process diffusion. InInterna- tional Conference on Machine Learning, pages 2452–2470. PMLR, 2023
2023
-
[12]
Bogachev.Measure Theory
Vladimir I. Bogachev.Measure Theory. Springer, Berlin, 2007. doi: 10.1007/ 978-3-540-34514-5
2007
-
[13]
Ricky T. Q. Chen and Yaron Lipman. Flow matching on general geometries. InThe Twelfth International Conference on Learning Representations, 2024. 10
2024
-
[14]
Fractional ornstein–uhlenbeck processes.Electronic Journal of Probability, 8, 2003
Patrick Cheridito, Hiroshi Kawaguchi, and Makoto Maejima. Fractional ornstein–uhlenbeck processes.Electronic Journal of Probability, 8, 2003
2003
-
[15]
Theoretical foundations of deep selective state-space models
Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi, and Terry Lyons. Theoretical foundations of deep selective state-space models. InProceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024
2024
-
[16]
Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2:303–314, 1989
George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2:303–314, 1989
1989
-
[17]
Transformers are ssms: Generalized models and efficient algorithms through structured state space duality
Tri Dao and Albert Gu. Transformers are ssms: Generalized models and efficient algorithms through structured state space duality. InProceedings of the 41st International Conference on Machine Learning. JMLR.org, 2024
2024
-
[18]
TimeV AE: A variational auto-encoder for multivariate time series generation,
Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.arXiv preprint arXiv:2111.08095, 2021
-
[19]
Karra Taniskidou
Dua Dheeru and E. Karra Taniskidou. Uci machine learning repository, 2017
2017
-
[20]
Joseph L. Doob. The brownian movement and stochastic equations.Annals of Mathematics, 43 (2), 1942
1942
-
[21]
Uci machine learning repository
Dheeru Dua, Casey Graff, et al. Uci machine learning repository. Beijing, 2017
2017
-
[22]
Advancing regular language rea- soning in linear recurrent neural networks
Ting-Han Fan, Ta-Chung Chi, and Alexander Rudnicky. Advancing regular language rea- soning in linear recurrent neural networks. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2: Short Papers, pages 45–53, Mexico City, Mexico, 2024. Association for Com...
2024
-
[23]
Uber tlc foil response
FiveThirtyEight. Uber tlc foil response. https://github.com/fivethirtyeight/ uber-tlc-foil-response, 2016
2016
-
[24]
Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H
Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong,...
2021
-
[25]
Pot python optimal transport (version 0.9.5), 2024
Rémi Flamary, Cédric Vincent-Cuaz, Nicolas Courty, Alexandre Gramfort, Oleksii Kachaiev, Huy Quang Tran, Laurène David, Clément Bonet, Nathan Cassereau, Théo Gnassounou, Eloi Tanguy, Julie Delon, Antoine Collas, Sonia Mazelet, Laetitia Chapel, Tanguy Kerdoncuff, Xizheng Yu, Matthew Feickert, Paul Krzakala, Tianlin Liu, and Eduardo Fernandes Montesuma. Pot...
2024
-
[26]
Folland.Real Analysis: Modern Techniques and Their Applications
Gerald B. Folland.Real Analysis: Modern Techniques and Their Applications. Wiley, 2 edition, 1999
1999
-
[27]
Probabilistic forecasting with spline quantile function rnns
Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic forecasting with spline quantile function rnns. InThe 22nd international conference on artificial intelligence and statistics, pages 1901–1910. PMLR, 2019
1901
-
[28]
Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
2024
-
[29]
Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007
2007
-
[30]
Webb, Rob J
Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. InNeural Information Processing Systems Track on Datasets and Benchmarks, 2021. 11
2021
-
[31]
Monash time series forecasting archive
Rakshitha Wathsadini Godahewa, Christoph Bergmeir, Geoffrey I Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021
2021
-
[32]
Xavier Gonzalez, Andrew Warrington, Jimmy T. H. Smith, and Scott W. Linderman. Towards scalable and stable parallelization of nonlinear rnns. InAdvances in Neural Information Processing Systems, 2024
2024
-
[33]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[34]
Approximation capabilities of multilayer feedforward networks.Neural Networks, 4(2):251–257, 1991
Kurt Hornik. Approximation capabilities of multilayer feedforward networks.Neural Networks, 4(2):251–257, 1991
1991
-
[35]
Koehler, J
Rob Hyndman, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder.Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008
2008
-
[36]
Functional flow matching.arXiv preprint arXiv:2305.17209, 2023
Gavin Kerrigan, Giosue Migliorini, and Padhraic Smyth. Functional flow matching.arXiv preprint arXiv:2305.17209, 2023
-
[37]
Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting.Advances in Neural Information Processing Systems, 36, 2023
Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Bernie Wang. Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting.Advances in Neural Information Processing Systems, 36, 2023
2023
-
[38]
Flow matching with gaussian process priors for probabilistic time series forecasting.The Thirteenth International Conference on Learning Representations, 2025
Marcel Kollovieh, Marten Lienen, David Lüdke, Leo Schwinn, and Stephan Günnemann. Flow matching with gaussian process priors for probabilistic time series forecasting.The Thirteenth International Conference on Learning Representations, 2025
2025
-
[39]
Modeling long-and short-term temporal patterns with deep neural networks
Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018
2018
-
[40]
Arık, Nicolas Loeff, and Tomas Pfister
Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International Journal of Forecasting, 37(4): 1748–1764, 2021
2021
-
[41]
Parallelizing nonlinear sequential models over the sequence length.International Conference on Learning Representations, 2024
Yi Heng Lim, Qi Zhu, Joshua Selfridge, and Muhammad Firmansyah Kasim. Parallelizing nonlinear sequential models over the sequence length.International Conference on Learning Representations, 2024
2024
-
[42]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023
2023
-
[43]
The m4 competition: 100,000 time series and 61 forecasting methods.International Journal of Forecasting, 36(1): 54–74, 2020
Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The m4 competition: 100,000 time series and 61 forecasting methods.International Journal of Forecasting, 36(1): 54–74, 2020
2020
-
[44]
Fixed-point rnns: From diagonal to dense in a few iterations
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, and Antonio Orvieto. Fixed-point rnns: From diagonal to dense in a few iterations. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025
2025
-
[45]
Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InInternational Conference on Learning Representations, 2023
2023
-
[46]
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio.arXiv preprint arXiv:1609.03499, 2016
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[47]
Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, and Samuel L. Smith. Uni- versality of linear recurrences followed by non-linear projections: Finite-width guarantees and benefits of complex eigenvalues. InProceedings of the 41st International Conference on Machine Learning, 2024. 12
2024
-
[48]
Pedregosa, G
F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011
2011
-
[49]
Wind, Tianyi Wu, Daniel Wuttke, and Christian Zhou-Zheng
Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, and Christian Zhou-Zheng. Rwkv-7 “goose” with expressive dynamic state evolution.arXiv preprint arXiv:2503.14456, 2025
-
[50]
Smith, and Lingpeng Kong
Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, and Lingpeng Kong. Random feature attention. InInternational Conference on Learning Representations, 2021
2021
-
[51]
Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, 2025
Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, et al. Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, 2025
2025
-
[52]
Uncertainty quantification for traffic forecasting: A unified approach
Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, and James JQ Yu. Uncertainty quantification for traffic forecasting: A unified approach. In2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 992–1004. IEEE, 2023
2023
-
[53]
Learning dynamic graph embeddings with neural controlled differential equations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
Tiexin Qin, Benjamin Walker, Terry Lyons, Hong Yan, and Haoliang Li. Learning dynamic graph embeddings with neural controlled differential equations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
2025
-
[54]
Hgrn2: Gated linear rnns with state expansion.arXiv preprint arXiv:2404.07904, 2024
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, and Yiran Zhong. Hgrn2: Gated linear rnns with state expansion.arXiv preprint arXiv:2404.07904, 2024
-
[55]
Carl Edward Rasmussen and Christopher K. I. Williams.Gaussian Processes for Machine Learning. The MIT Press, 2006
2006
-
[56]
Autoregressive denois- ing diffusion models for multivariate probabilistic time series forecasting
Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland V ollgraf. Autoregressive denois- ing diffusion models for multivariate probabilistic time series forecasting. InInternational Conference on Machine Learning, pages 8857–8868. PMLR, 2021
2021
-
[57]
Deepar: Probabilistic forecasting with autoregressive recurrent networks.International journal of forecasting, 36(3): 1181–1191, 2020
David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting with autoregressive recurrent networks.International journal of forecasting, 36(3): 1181–1191, 2020
2020
-
[58]
Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, and Riccardo Grazzi. Deltaproduct: Imprpoving state-tracking in linear rnns via householder products.arXiv preprint arXiv:2502.10297, 2025
-
[59]
Deep unsuper- vised learning using nonequilibrium thermodynamics
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. PMLR, 2015
2015
-
[60]
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, and Carlos Guestrin. Learning to (learn at test time): Rnns with expressive hidden states.arXiv preprint arXiv:2407.04620, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[61]
Probabilistic time series forecasting with boosted additive models: an application to smart meter data.Department of Economics and business statistics, Monash University, 2015
Souhaib Ben Taieb, Raphael Huser, Rob J Hyndman, Marc G Genton, et al. Probabilistic time series forecasting with boosted additive models: an application to smart meter data.Department of Economics and business statistics, Monash University, 2015
2015
-
[62]
Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34:24804–24816, 2021
Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation.Advances in Neural Information Processing Systems, 34:24804–24816, 2021
2021
-
[63]
Improving and generalizing flow-based generative models with minibatch optimal transport
Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian FATRAS, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. InICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023. 13
2023
-
[64]
Springer, Berlin, Heidelberg, 2009
Cédric Villani.Optimal Transport: Old and New. Springer, Berlin, Heidelberg, 2009. doi: 10.1007/978-3-540-71050-9
-
[65]
Struc- tured linear cdes: Maximally expressive and parallel-in-time sequence models
Benjamin Walker, Lingyi Yang, Nicola Muca Cirone, Cristopher Salvi, and Terry Lyons. Struc- tured linear cdes: Maximally expressive and parallel-in-time sequence models. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025
2025
-
[66]
State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory
Shida Wang and Beichen Xue. State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory. InProceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023
2023
-
[67]
Gated Delta Networks: Improving Mamba2 with Delta Rule
Songlin Yang, Jan Kautz, and Ali Hatamizadeh. Gated delta networks: Improving mamba2 with delta rule.arXiv preprint arXiv:2412.06464, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[68]
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, and Yoon Kim. Gated linear attention transformers with hardware-efficient training.arXiv preprint arXiv:2312.06635, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[69]
Parallelizing linear transformers with the delta rule over sequence length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim. Parallelizing linear transformers with the delta rule over sequence length. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
2024
-
[70]
Are transformers effective for time series forecasting? InProceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023
Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? InProceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023
2023
-
[71]
Gated slot attention for efficient linear-time sequence modeling
Yuxuan Zhang, Shiliang Yang, Rong Zhu, Yichong Zhang, Lei Cui, Yongjing Wang, Bin Wang, Feng Shi, Bing Wang, Wei Bi, Ping Zhou, and Guoxin Fu. Gated slot attention for efficient linear-time sequence modeling. InProceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
2024
-
[72]
Informer: Beyond efficient transformer for long sequence time-series forecasting
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106–11115. AAAI Press, 2021. 14 Appendices Appendix Contents A Uni...
2021
-
[73]
The pair(X,Σ)is called a measurable space
if(A n)n≥1 is a sequence of sets inΣ, then S n≥1 An ∈Σ. The pair(X,Σ)is called a measurable space. A σ-algebra specifies which subsets of X are measurable, and therefore which events can be assigned probabilities. Since X is assumed to be a metric space, there is a canonical choice of measurable structure, namely the Borelσ-algebra generated by the open s...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.