Universal Time Series Generation with Neural Controlled Differential Equations

Benjamin Walker; Elyes Farjallah; Jan Stühmer; Leif Seute; Raeid Saqur; Torben Berndt

arxiv: 2605.28507 · v2 · pith:KJQVFUOOnew · submitted 2026-05-27 · 💻 cs.LG

Pith reviewed 2026-06-29 14:07 UTC · model grok-4.3

classification 💻 cs.LG

keywords: time series generation · controlled differential equations · universality · flow matching · path space · probabilistic forecasting · irregular sampling · continuous-time models

The pith

Maximally expressive SLiCEs can approximate the path laws of any continuous causal pushforward on compact latent sets in the W_∞ metric.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proves that Structured Linear Controlled Differential Equations (SLiCEs) become universal time-series generators once they are made maximally expressive: they can approximate the distributions of paths produced by continuous causal maps from compact latent spaces, with the approximation error measured in the Wasserstein-infinity (W_∞) distance. The authors then build Generative SLiCEs (G-SLiCEs), which apply flow matching directly in path space. According to the abstract, the resulting model generalises to arbitrary observation grids, and greater expressivity improves probabilistic forecasting; the benefit is reported to be largest on irregular grids, where fixed-grid models often struggle.

Core claim

Maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in W_∞. This theoretical guarantee underpins the construction of Generative SLiCEs (G-SLiCEs), a continuous-time flow-matching model on path space that empirically outperforms less expressive alternatives on forecasting tasks while generalizing to irregular sampling grids.
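The W_∞ distance in the claim above is not defined on this page. As a reminder, and assuming the standard optimal-transport definition on a metric space $(\mathcal{X}, d)$, it can be written as follows. The choice of $d$ on path space (for example the supremum norm on continuous paths) is an assumption made here for illustration and is not taken from the paper.

```latex
W_\infty(\mu, \nu)
  \;=\; \inf_{\pi \in \Pi(\mu, \nu)} \;
        \operatorname*{ess\,sup}_{(x, y) \sim \pi} d(x, y),
\qquad
W_p(\mu, \nu) \;\le\; W_\infty(\mu, \nu) \quad \text{for all } p \ge 1 .
```

Here $\Pi(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$. Since every $W_p$ is bounded by $W_\infty$, an approximation guarantee in $W_\infty$ also controls the more common $W_1$ and $W_2$ distances.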

What carries the argument

Maximally expressive Structured Linear Controlled Differential Equations (SLiCEs), which serve as the continuous-time backbone whose linear structure and parameterisation allow approximation of arbitrary continuous causal path measures on compact sets.
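To make the "linear structure" concrete, here is a minimal sketch of a generic linear CDE, dh_t = Σ_i A_i h_t dX^i_t, driven by a piecewise-linear interpolation of the observations. It is an illustration under stated assumptions, not the paper's SLiCE parameterisation: the dense matrices `A_time` and `A_value`, the helper names, and the toy path are all hypothetical. On each linear piece the solution is exactly a matrix exponential, so inserting an extra observation that lies on the same interpolant should leave the terminal state unchanged.

```python
# Hypothetical sketch of a linear CDE  dh_t = sum_i A_i h_t dX^i_t.
# Not the paper's implementation; names and matrices are illustrative only.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(m, terms=30):
    """Matrix exponential via a truncated Taylor series (fine for small matrices)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, m)]  # m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def solve_linear_cde(h0, vector_fields, path):
    """Solve the CDE exactly along the piecewise-linear path.

    path: list of observations, each a list of channel values (time is a channel).
    vector_fields: one square matrix A_i per channel.
    """
    n = len(h0)
    h = [[v] for v in h0]  # column vector
    for prev, curr in zip(path, path[1:]):
        # M = sum_i A_i * (X^i_{k+1} - X^i_k) for this linear piece
        m = [[0.0] * n for _ in range(n)]
        for a, x_prev, x_curr in zip(vector_fields, prev, curr):
            dx = x_curr - x_prev
            for i in range(n):
                for j in range(n):
                    m[i][j] += a[i][j] * dx
        h = matmul(mat_exp(m), h)
    return [row[0] for row in h]

A_time = [[-0.5, 0.0], [0.0, -0.1]]   # hypothetical vector field for the time channel
A_value = [[0.0, 1.0], [-1.0, 0.0]]   # hypothetical vector field for the value channel

coarse = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]               # (time, value) pairs
fine = [[0.0, 0.0], [0.25, 0.5], [1.0, 2.0], [3.0, 1.0]]    # extra point on segment 1

h_coarse = solve_linear_cde([1.0, 0.0], [A_time, A_value], coarse)
h_fine = solve_linear_cde([1.0, 0.0], [A_time, A_value], fine)
```

In this toy setting `h_coarse` and `h_fine` agree up to floating-point error, which is one simple way to see why a continuous-time model of this kind does not depend on a fixed observation grid.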

If this is right

Expressivity directly improves accuracy in probabilistic forecasting and related downstream tasks.
The model inherits continuous-time properties and therefore works on arbitrary observation grids without modification.
Irregular or missing data no longer require the special handling or interpolation that fixed-grid models need.
Flow matching performed on path space yields a generative model whose samples respect the continuous causal structure of the data.
The same universality result supplies a design principle for future continuous-time generative architectures.
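The flow-matching objective itself is not described on this page, so the following is only a sketch of the standard conditional flow-matching loss with a linear interpolant, applied to paths sampled at their observation points. It is an assumption that G-SLiCE uses an objective of this general shape; the function names and the `oracle_field` used to exercise the loss are hypothetical.

```python
# Hypothetical sketch of conditional flow matching on sampled paths.
# A path is a list of values observed at some (possibly irregular) grid.

def interpolate_path(x0, x1, s):
    """Linear interpolant X_s = (1 - s) * X_0 + s * X_1, pointwise along the path."""
    return [(1.0 - s) * a + s * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the linear interpolant: d/ds X_s = X_1 - X_0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(vector_field, x0, x1, s):
    """Mean squared error between the model's velocity and the target velocity."""
    xs = interpolate_path(x0, x1, s)
    predicted = vector_field(xs, s)
    target = target_velocity(x0, x1)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Toy example: one noise path and one data path on the same three-point grid.
x0 = [0.0, 0.5, -0.2]
x1 = [1.0, 2.0, 0.3]

def oracle_field(xs, s):
    # For a single (x0, x1) pair the exact velocity is (x1 - x_s) / (1 - s).
    return [(b - x) / (1.0 - s) for x, b in zip(xs, x1)]

def zero_field(xs, s):
    return [0.0 for _ in xs]

loss_oracle = flow_matching_loss(oracle_field, x0, x1, 0.3)
loss_zero = flow_matching_loss(zero_field, x0, x1, 0.3)
```

In a real model the vector field would be a learned network (in the paper, a SLiCE backbone) and the loss would be averaged over random draws of `s`, noise paths, and data paths.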

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The compactness requirement suggests testing whether performance degrades on latent spaces that are only locally compact or unbounded.
Because the model operates in continuous time, it may be directly applicable to irregularly sampled multivariate sensor streams without resampling.
The path-space flow-matching formulation could be combined with existing neural ODE or SDE solvers to produce hybrid generative pipelines.
If the causality assumption is relaxed, the same machinery might still approximate non-causal maps at the cost of losing the W_∞ guarantee.

Load-bearing premise

The latent sets must be compact and the pushforwards must be continuous and causal.

What would settle it

Exhibit one continuous causal pushforward from a compact latent set whose induced path law lies outside the closure of the set of laws produced by maximally expressive SLiCEs, when distance is measured in W_∞.

Figures

Figures reproduced from arXiv: 2605.28507 by Benjamin Walker, Elyes Farjallah, Jan Stühmer, Leif Seute, Raeid Saqur, Torben Berndt.

**Figure 1.** Path-space flow matching with G-SLiCE. G-SLiCE models probabilistic time-series generation as a continuous flow on path space. At inference time, an initial path X(0) ∼ μ_0 is transported by the learned flow φ_{θ,s}, producing a terminal sample X(1) ∼ (φ_{θ,1})_# μ_0. Each inference panel shows samples from the evolving distribution on path space. During training, the SLiCE vector field is trained to match the path-…

**Figure 2.** Expressivity gap on the sequence task (Section …

**Figure 3.** Representative cross-frequency forecasts on ETTSmall1h for models trained at 6-hour resolution and evaluated on different test frequencies. Curves show ground truth and predictive means; shaded regions indicate 95% confidence intervals. G-SLiCE remains stable across changes in the observation grid, while TSFlow is more sensitive to frequency shifts.

**Figure 4.** Example forecasts for TSFlow (left) and G-SLiCEs (right) trained with k_train = 1 and evaluated for k_test = 1 (top) or k_test → ∞ (bottom). Curves show ground truth and predictive means; shaded regions indicate 95% confidence intervals.

**Figure 5.** Example forecasts for G-SLiCEs (left) and TSFlow (right) trained with …

**Figure 6.** Sample-path diagnostics for the hard-core expressivity example. Each block compares …

**Figure 7.** Empirical pushforward distributions for the hard-core expressivity example. Each panel …

Original abstract

Recent work on the sequence universality of State Space Models (SSMs) has introduced efficient, maximally expressive continuous-time approaches for time-series modelling. While these works focus on discriminative settings, we extend this perspective to generative time-series modelling by proving that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they can approximate the induced path laws of continuous causal pushforwards on compact latent sets in $W_\infty$. Building on these theoretical results, we propose Generative SLiCEs (G-SLiCEs), a maximally expressive continuous-time model for flow matching on path-space. Empirically, we show that expressivity improves performance in probabilistic forecasting and downstream tasks, while retaining the advantages of continuous-time models such as generalising to arbitrary observation grids. This is particularly beneficial for irregular grids, where fixed-grid models often struggle.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper proves a qualified universality result for generative SLiCEs and introduces G-SLiCEs for path-space flow matching.

The central new piece is the extension of recent discriminative SSM universality results to the generative case. The authors prove that maximally expressive Structured Linear Controlled Differential Equations can approximate the induced path laws of continuous causal pushforwards on compact latent sets in the W_∞ metric. From that they build G-SLiCEs, a continuous-time model for flow matching directly on path space.

What works is the link to irregular grids. Continuous-time models already handle arbitrary observation times without retraining or padding, and the universality claim supplies a theoretical reason why the model class can still be expressive enough for probabilistic forecasting. That combination is useful for anyone dealing with real sensor or event data.

The soft spots are the missing details. The abstract states the proof and the empirical gains but gives no derivation steps, no description of the flow-matching objective, and no information on baselines or metrics. Without those it is impossible to judge how tight the approximation result actually is or whether the reported improvements are meaningful. The conditions (compact sets, continuous causal pushforwards) are stated explicitly, so there is no hidden overclaim, but they do restrict the scope.

The citations appear to build cleanly on the prior discriminative SLiCE work. This paper is aimed at researchers who want theoretical grounding for continuous-time generative models rather than at practitioners looking for an off-the-shelf forecaster. A reader focused on irregular time series or path-space methods would get value from the continuous-time angle and the stated universality conditions.

It deserves peer review. The theoretical claim is substantive enough that referees should check the proof, and the model direction is a reasonable extension worth testing in full experiments.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to prove that maximally expressive Structured Linear Controlled Differential Equations (SLiCEs) are universal time-series generators, in the sense that they approximate the induced path laws of continuous causal pushforwards on compact latent sets in the W_∞ metric. It introduces Generative SLiCEs (G-SLiCEs) for flow matching on path-space and reports that increased expressivity improves probabilistic forecasting and downstream tasks while preserving continuous-time advantages such as generalization to arbitrary observation grids.

Significance. If the universality result holds under the stated conditions, the work supplies a qualified theoretical foundation for continuous-time generative models that extends recent SSM universality results from discriminative to generative settings. The explicit restrictions (compact latent sets, continuous causal pushforwards, W_∞ metric) and the proposal of a flow-matching model on path space are strengths. The empirical emphasis on irregular grids addresses a recurring practical limitation of fixed-grid approaches.

major comments (2)

[Abstract] The central universality claim is asserted without any derivation steps, theorem statement, or proof sketch, so the load-bearing mathematical result cannot be assessed for correctness against the listed conditions (compact sets, continuous causal pushforwards).
[Abstract] The empirical section states that 'expressivity improves performance' but supplies no baselines, metrics, datasets, or experimental protocol, preventing verification of the claimed gains for probabilistic forecasting and irregular-grid tasks.

minor comments (1)

[Abstract] The symbol W_∞ is introduced without definition or citation to the underlying path-space metric.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed comments. The universality theorem and its proof appear in full in the body of the manuscript (Section 3), while the abstract follows standard conventions by summarizing the result. The empirical protocol, baselines, and metrics are likewise detailed in Section 5. We address each point below.

Referee: [Abstract] The central universality claim is asserted without any derivation steps, theorem statement, or proof sketch, so the load-bearing mathematical result cannot be assessed for correctness against the listed conditions (compact sets, continuous causal pushforwards).

Authors: The precise theorem statement (including the restrictions to compact latent sets, continuous causal pushforwards, and the W_∞ metric) together with the complete proof is given in Section 3. The abstract is intentionally concise and does not contain derivations, as is conventional; the full mathematical development is available for assessment in the main text. Revision: no

Referee: [Abstract] The empirical section states that 'expressivity improves performance' but supplies no baselines, metrics, datasets, or experimental protocol, preventing verification of the claimed gains for probabilistic forecasting and irregular-grid tasks.

Authors: The abstract summarizes the empirical outcome at a high level. Section 5 supplies the full experimental protocol, including the datasets (with irregular observation grids), metrics (CRPS, log-likelihood, and downstream task accuracy), baselines (both discrete and continuous-time models), and implementation details. We are happy to add a parenthetical mention of the primary metric to the abstract if the editor prefers. revision: partial
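For readers unfamiliar with the CRPS metric named in the response above, the following is a minimal sketch of the standard sample-based estimator for a single scalar observation, CRPS ≈ E|X − y| − ½ E|X − X′|. It is not taken from the paper's evaluation code; the function name and toy inputs are hypothetical, and aggregation over time steps and series is left out.

```python
# Hypothetical sketch of the sample-based CRPS estimator for one scalar target.

def crps_from_samples(samples, observation):
    """Estimate CRPS from forecast samples: E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    # Average distance between forecast samples and the observed value.
    term_obs = sum(abs(x - observation) for x in samples) / m
    # Half the average pairwise distance between forecast samples (spread term).
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (2.0 * m * m)
    return term_obs - term_spread

score_point = crps_from_samples([3.0], 1.0)       # one sample: reduces to absolute error
score_spread = crps_from_samples([0.0, 2.0], 1.0)  # two samples straddling the target
```

Lower values are better. With a single forecast sample the estimator reduces to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.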

Circularity Check

0 steps flagged

No significant circularity detected in universality claim

The paper presents a proof that maximally expressive SLiCEs approximate induced path laws of continuous causal pushforwards on compact latent sets in the W_∞ metric, with conditions stated explicitly in the abstract. This is framed as an extension of prior SSM universality results rather than a redefinition or fit of the target quantity itself. No self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain is visible that would collapse the central claim to its inputs by construction. The derivation is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only review; the central claim rests on the mathematical conditions of compact latent sets and continuous causal pushforwards plus the definition of the W_∞ metric on path space. No free parameters or invented entities beyond the model name are stated.

axioms (1)

domain assumption: Compact latent sets and continuous causal pushforwards allow approximation of induced path laws in W_∞.
Explicitly required in the abstract's statement of universality for SLiCEs.

invented entities (1)

G-SLiCEs (no independent evidence)
purpose: Maximally expressive continuous-time model for flow matching on path space.
Proposed in the abstract as the practical instantiation of the theoretical result.

pith-pipeline@v0.9.1-grok · 5697 in / 1337 out tokens · 39800 ms · 2026-06-29T14:07:35.015231+00:00 · methodology

Reference graph

Works this paper leans on

73 extracted references · 16 canonical work pages · 7 internal anchors

[1]

Albergo, Mark Goldstein, Nicholas M

Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden. Stochastic interpolants with data-dependent couplings.arXiv preprint arXiv:2310.03725, 2023

work page arXiv 2023
[2]

Diffusion-based time series imputation and forecasting with structured state space models.Transactions on Machine Learning Research, 2022

Juan Lopez Alcaraz and Nils Strodthoff. Diffusion-based time series imputation and forecasting with structured state space models.Transactions on Machine Learning Research, 2022. ISSN 2835-8856. URLhttps://openreview.net/forum?id=hHiIbk7ApW

2022
[3]

Diffusion-based time series imputation and forecasting with structured state space models.arXiv preprint arXiv:2208.09399, 2022

Juan Miguel Lopez Alcaraz and Nils Strodthoff. Diffusion-based time series imputation and forecasting with structured state space models.arXiv preprint arXiv:2208.09399, 2022

work page arXiv 2022
[4]

J. M. Aldaz, O. Kounchev, and H. Render. Bernstein operators for exponential polynomials. Constructive Approximation, 29:345–367, 2009

2009
[5]

GluonTS: Probabilistic Time Series Models in Python

A. Alexandrov, K. Benidis, M. Bohlke-Schneider, V . Flunkert, J. Gasthaus, T. Januschowski, D. C. Maddix, S. Rangapuram, D. Salinas, J. Schulz, L. Stella, A. C. Türkmen, and Y . Wang. GluonTS: Probabilistic Time Series Modeling in Python.arXiv preprint arXiv:1906.05264, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1906
[6]

Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, and Yuyang Wang

Alexander Alexandrov, Konstantinos Benidis, Michael Bohlke-Schneider, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Danielle C. Maddix, Syama Rangapuram, David Salinas, Jasper Schulz, Lorenzo Stella, Ali Caner Türkmen, and Yuyang Wang. GluonTS: Probabilistic and Neural Time Series Modeling in Python.Journal of Machine Learning Research, 21(116): 1–6,...

2020
[7]

xlstm: Extended long short-term memory

Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, and Sepp Hochreiter. xlstm: Extended long short-term memory. InProceedings of the 38th Conference on Neural Information Processing Systems, 2024

2024
[8]

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, and Vahab Mirrokni. Titans: Learning to memorize at test time. arXiv preprint arXiv:2501.00663, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

Permu- tation equivariant neural controlled differential equations for dynamic graph representation learning

Torben Berndt, Benjamin Walker, Tiexin Qin, Jan Stühmer, and Andrey Kormilitzin. Permu- tation equivariant neural controlled differential equations for dynamic graph representation learning. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025

2025
[10]

Wiley, 2 edition, 1999

Patrick Billingsley.Convergence of Probability Measures. Wiley, 2 edition, 1999

1999
[11]

Modeling temporal data as continuous functions with stochastic process diffusion

Marin Biloš, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, and Stephan Günnemann. Modeling temporal data as continuous functions with stochastic process diffusion. InInterna- tional Conference on Machine Learning, pages 2452–2470. PMLR, 2023

2023
[12]

Bogachev.Measure Theory

Vladimir I. Bogachev.Measure Theory. Springer, Berlin, 2007. doi: 10.1007/ 978-3-540-34514-5

2007
[13]

Ricky T. Q. Chen and Yaron Lipman. Flow matching on general geometries. InThe Twelfth International Conference on Learning Representations, 2024. 10

2024
[14]

Fractional ornstein–uhlenbeck processes.Electronic Journal of Probability, 8, 2003

Patrick Cheridito, Hiroshi Kawaguchi, and Makoto Maejima. Fractional ornstein–uhlenbeck processes.Electronic Journal of Probability, 8, 2003

2003
[15]

Theoretical foundations of deep selective state-space models

Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi, and Terry Lyons. Theoretical foundations of deep selective state-space models. InProceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS), 2024

2024
[16]

Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2:303–314, 1989

George Cybenko. Approximation by superpositions of a sigmoidal function.Mathematics of Control, Signals and Systems, 2:303–314, 1989

1989
[17]

Transformers are ssms: Generalized models and efficient algorithms through structured state space duality

Tri Dao and Albert Gu. Transformers are ssms: Generalized models and efficient algorithms through structured state space duality. InProceedings of the 41st International Conference on Machine Learning. JMLR.org, 2024

2024
[18]

TimeV AE: A variational auto-encoder for multivariate time series generation,

Abhyuday Desai, Cynthia Freeman, Zuhui Wang, and Ian Beaver. Timevae: A variational auto-encoder for multivariate time series generation.arXiv preprint arXiv:2111.08095, 2021

work page arXiv 2021
[19]

Karra Taniskidou

Dua Dheeru and E. Karra Taniskidou. Uci machine learning repository, 2017

2017
[20]

Joseph L. Doob. The brownian movement and stochastic equations.Annals of Mathematics, 43 (2), 1942

1942
[21]

Uci machine learning repository

Dheeru Dua, Casey Graff, et al. Uci machine learning repository. Beijing, 2017

2017
[22]

Advancing regular language rea- soning in linear recurrent neural networks

Ting-Han Fan, Ta-Chung Chi, and Alexander Rudnicky. Advancing regular language rea- soning in linear recurrent neural networks. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2: Short Papers, pages 45–53, Mexico City, Mexico, 2024. Association for Com...

2024
[23]

Uber tlc foil response

FiveThirtyEight. Uber tlc foil response. https://github.com/fivethirtyeight/ uber-tlc-foil-response, 2016

2016
[24]

Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H

Rémi Flamary, Nicolas Courty, Alexandre Gramfort, Mokhtar Z. Alaya, Aurélie Boisbunon, Stanislas Chambon, Laetitia Chapel, Adrien Corenflos, Kilian Fatras, Nemo Fournier, Léo Gautheron, Nathalie T.H. Gayraud, Hicham Janati, Alain Rakotomamonjy, Ievgen Redko, Antoine Rolet, Antony Schutz, Vivien Seguy, Danica J. Sutherland, Romain Tavenard, Alexander Tong,...

2021
[25]

Pot python optimal transport (version 0.9.5), 2024

Rémi Flamary, Cédric Vincent-Cuaz, Nicolas Courty, Alexandre Gramfort, Oleksii Kachaiev, Huy Quang Tran, Laurène David, Clément Bonet, Nathan Cassereau, Théo Gnassounou, Eloi Tanguy, Julie Delon, Antoine Collas, Sonia Mazelet, Laetitia Chapel, Tanguy Kerdoncuff, Xizheng Yu, Matthew Feickert, Paul Krzakala, Tianlin Liu, and Eduardo Fernandes Montesuma. Pot...

2024
[26]

Folland.Real Analysis: Modern Techniques and Their Applications

Gerald B. Folland.Real Analysis: Modern Techniques and Their Applications. Wiley, 2 edition, 1999

1999
[27]

Probabilistic forecasting with spline quantile function rnns

Jan Gasthaus, Konstantinos Benidis, Yuyang Wang, Syama Sundar Rangapuram, David Salinas, Valentin Flunkert, and Tim Januschowski. Probabilistic forecasting with spline quantile function rnns. InThe 22nd international conference on artificial intelligence and statistics, pages 1901–1910. PMLR, 2019

1901
[28]

Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[29]

Tilmann Gneiting and Adrian E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association, 102(477):359–378, 2007

2007
[30]

Webb, Rob J

Rakshitha Godahewa, Christoph Bergmeir, Geoffrey I. Webb, Rob J. Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. InNeural Information Processing Systems Track on Datasets and Benchmarks, 2021. 11

2021
[31]

Monash time series forecasting archive

Rakshitha Wathsadini Godahewa, Christoph Bergmeir, Geoffrey I Webb, Rob Hyndman, and Pablo Montero-Manso. Monash time series forecasting archive. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

2021
[32]

Xavier Gonzalez, Andrew Warrington, Jimmy T. H. Smith, and Scott W. Linderman. Towards scalable and stable parallelization of nonlinear rnns. InAdvances in Neural Information Processing Systems, 2024

2024
[33]

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[34]

Approximation capabilities of multilayer feedforward networks.Neural Networks, 4(2):251–257, 1991

Kurt Hornik. Approximation capabilities of multilayer feedforward networks.Neural Networks, 4(2):251–257, 1991

1991
[35]

Koehler, J

Rob Hyndman, Anne B. Koehler, J. Keith Ord, and Ralph D. Snyder.Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008

2008
[36]

Functional flow matching.arXiv preprint arXiv:2305.17209, 2023

Gavin Kerrigan, Giosue Migliorini, and Padhraic Smyth. Functional flow matching.arXiv preprint arXiv:2305.17209, 2023

work page arXiv 2023
[37]

Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting.Advances in Neural Information Processing Systems, 36, 2023

Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Bernie Wang. Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting.Advances in Neural Information Processing Systems, 36, 2023

2023
[38]

Flow matching with gaussian process priors for probabilistic time series forecasting.The Thirteenth International Conference on Learning Representations, 2025

Marcel Kollovieh, Marten Lienen, David Lüdke, Leo Schwinn, and Stephan Günnemann. Flow matching with gaussian process priors for probabilistic time series forecasting.The Thirteenth International Conference on Learning Representations, 2025

2025
[39]

Modeling long-and short-term temporal patterns with deep neural networks

Guokun Lai, Wei-Cheng Chang, Yiming Yang, and Hanxiao Liu. Modeling long-and short-term temporal patterns with deep neural networks. InThe 41st international ACM SIGIR conference on research & development in information retrieval, pages 95–104, 2018

2018
[40]

Arık, Nicolas Loeff, and Tomas Pfister

Bryan Lim, Sercan Ö. Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International Journal of Forecasting, 37(4): 1748–1764, 2021

2021
[41]

Parallelizing nonlinear sequential models over the sequence length.International Conference on Learning Representations, 2024

Yi Heng Lim, Qi Zhu, Joshua Selfridge, and Muhammad Firmansyah Kasim. Parallelizing nonlinear sequential models over the sequence length.International Conference on Learning Representations, 2024

2024
[42]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023

2023
[43]

The m4 competition: 100,000 time series and 61 forecasting methods.International Journal of Forecasting, 36(1): 54–74, 2020

Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The m4 competition: 100,000 time series and 61 forecasting methods.International Journal of Forecasting, 36(1): 54–74, 2020

2020
[44]

Fixed-point rnns: From diagonal to dense in a few iterations

Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, and Antonio Orvieto. Fixed-point rnns: From diagonal to dense in a few iterations. InProceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025

2025
[45]

Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam

Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. InInternational Conference on Learning Representations, 2023

2023
[46]

WaveNet: A Generative Model for Raw Audio

Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio.arXiv preprint arXiv:1609.03499, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[47]

Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, and Samuel L. Smith. Uni- versality of linear recurrences followed by non-linear projections: Finite-width guarantees and benefits of complex eigenvalues. InProceedings of the 41st International Conference on Machine Learning, 2024. 12

2024
[48]

Pedregosa, G

F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011

2011
[49]

Wind, Tianyi Wu, Daniel Wuttke, and Christian Zhou-Zheng

Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, and Christian Zhou-Zheng. Rwkv-7 “goose” with expressive dynamic state evolution.arXiv preprint arXiv:2503.14456, 2025

work page arXiv 2025
[50]

Smith, and Lingpeng Kong

Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, and Lingpeng Kong. Random feature attention. InInternational Conference on Learning Representations, 2021

2021
[51]

Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, 2025

Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, et al. Probabilistic weather forecasting with machine learning.Nature, 637(8044):84–90, 2025

2025
[52]

Uncertainty quantification for traffic forecasting: A unified approach

Weizhu Qian, Dalin Zhang, Yan Zhao, Kai Zheng, and James JQ Yu. Uncertainty quantification for traffic forecasting: A unified approach. In2023 IEEE 39th International Conference on Data Engineering (ICDE), pages 992–1004. IEEE, 2023

2023
[53]

Learning dynamic graph embeddings with neural controlled differential equations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Tiexin Qin, Benjamin Walker, Terry Lyons, Hong Yan, and Haoliang Li. Learning dynamic graph embeddings with neural controlled differential equations.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

2025
[54]

Hgrn2: Gated linear rnns with state expansion.arXiv preprint arXiv:2404.07904, 2024

Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, and Yiran Zhong. Hgrn2: Gated linear rnns with state expansion.arXiv preprint arXiv:2404.07904, 2024

work page arXiv 2024
[55]

Carl Edward Rasmussen and Christopher K. I. Williams.Gaussian Processes for Machine Learning. The MIT Press, 2006

2006
[56]

Autoregressive denois- ing diffusion models for multivariate probabilistic time series forecasting

Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland V ollgraf. Autoregressive denois- ing diffusion models for multivariate probabilistic time series forecasting. InInternational Conference on Machine Learning, pages 8857–8868. PMLR, 2021

2021
[57]

Deepar: Probabilistic forecasting with autoregressive recurrent networks.International journal of forecasting, 36(3): 1181–1191, 2020

David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. Deepar: Probabilistic forecasting with autoregressive recurrent networks.International journal of forecasting, 36(3): 1181–1191, 2020

2020
[58]

Deltaproduct: Imprpoving state-tracking in linear rnns via householder products.arXiv preprint arXiv:2502.10297, 2025

Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, and Riccardo Grazzi. Deltaproduct: Imprpoving state-tracking in linear rnns via householder products.arXiv preprint arXiv:2502.10297, 2025

work page arXiv 2025
[59]

Deep unsuper- vised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsuper- vised learning using nonequilibrium thermodynamics. InInternational conference on machine learning, pages 2256–2265. PMLR, 2015

2015
[60]

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, and Carlos Guestrin. Learning to (learn at test time): Rnns with expressive hidden states.arXiv preprint arXiv:2407.04620, 2025

[61]

Souhaib Ben Taieb, Raphael Huser, Rob J Hyndman, Marc G Genton, et al. Probabilistic time series forecasting with boosted additive models: an application to smart meter data. Department of Economics and business statistics, Monash University, 2015

[62]

Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation. Advances in Neural Information Processing Systems, 34:24804–24816, 2021

[63]

Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023

[64]

Cédric Villani. Optimal Transport: Old and New. Springer, Berlin, Heidelberg, 2009. doi: 10.1007/978-3-540-71050-9

[65]

Benjamin Walker, Lingyi Yang, Nicola Muca Cirone, Cristopher Salvi, and Terry Lyons. Structured linear cdes: Maximally expressive and parallel-in-time sequence models. In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025

[66]

Shida Wang and Beichen Xue. State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS), 2023

[67]

Songlin Yang, Jan Kautz, and Ali Hatamizadeh. Gated delta networks: Improving mamba2 with delta rule. arXiv preprint arXiv:2412.06464, 2024

[68]

Songlin Yang, Bailin Wang, Yikang Shen, Rameswar Panda, and Yoon Kim. Gated linear attention transformers with hardware-efficient training. arXiv preprint arXiv:2312.06635, 2024

[69]

Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim. Parallelizing linear transformers with the delta rule over sequence length. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

[70]

Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023

[71]

Yuxuan Zhang, Shiliang Yang, Rong Zhu, Yichong Zhang, Lei Cui, Yongjing Wang, Bin Wang, Feng Shi, Bing Wang, Wei Bi, Ping Zhou, and Guoxin Fu. Gated slot attention for efficient linear-time sequence modeling. In Proceedings of the Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

[72]

Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In The Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Conference, volume 35, pages 11106–11115. AAAI Press, 2021

[73]

if (A_n)_{n≥1} is a sequence of sets in Σ, then ⋃_{n≥1} A_n ∈ Σ. The pair (X, Σ) is called a measurable space. A σ-algebra specifies which subsets of X are measurable, and therefore which events can be assigned probabilities. Since X is assumed to be a metric space, there is a canonical choice of measurable structure, namely the Borel σ-algebra generated by the open s...

