pith. sign in

arxiv: 2207.05442 · v8 · submitted 2022-07-12 · 📊 stat.ML · cs.LG

Wasserstein multivariate auto-regressive models for modeling distributional time series

Pith reviewed 2026-05-24 11:48 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords distributional time seriesWasserstein spaceautoregressive modelsmultivariate time seriessecond-order stationaritysparse estimationtemporal dependency graphsprobability measures
0
0 comments X

The pith

A multivariate autoregressive model defined directly in Wasserstein space produces second-order stationary distributional time series whose coefficients are estimated sparsely enough to recover temporal dependency graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats each time slice of multiple probability distributions as a single random element inside the Wasserstein space of measures. It writes down a linear recursion that updates the whole collection of measures at the next time step by taking weighted Wasserstein barycenters. Iterated-random-function theory supplies contraction conditions that guarantee the recursion possesses a unique second-order stationary solution. The coefficients of the recursion are estimated by a constrained procedure whose simplex restrictions force many entries to zero, so the estimated matrix directly supplies a directed graph of temporal influences. The procedure is illustrated on simulated trajectories and on two real collections: national age distributions and station-level bike rental histograms.

Core claim

By embedding a collection of time-indexed probability measures as a random element of the Wasserstein space, one can define a multivariate autoregressive recursion whose solution is second-order stationary whenever the map satisfies the contraction conditions of an iterated random function system. The autoregressive coefficients are recovered by a consistent estimator whose rows are constrained to the probability simplex; the resulting row-wise sparsity pattern is interpreted as a directed graph of temporal dependencies among the component series.

What carries the argument

The Wasserstein multivariate autoregressive recursion that expresses each next vector of measures as a collection of Wasserstein barycenters whose weights lie on the simplex.

If this is right

  • The recursion admits a unique second-order stationary solution under the stated contraction conditions.
  • The simplex-constrained estimator is consistent for the true autoregressive coefficients.
  • The estimated coefficient matrix is automatically sparse and can be read as a directed graph of temporal influences.
  • The fitted model can be applied to observed collections of probability measures such as age histograms or usage histograms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Iterating the fitted recursion supplies a way to generate future distributional forecasts without additional parametric assumptions.
  • The same sparsity mechanism could be used to screen irrelevant series in higher-dimensional collections of histograms.
  • If the underlying measures are supported on a common bounded interval, the stationarity result supplies a long-run average distribution that can be computed from the fixed point of the barycentric map.

Load-bearing premise

The successive random measures must obey contraction conditions that make the iterated random function system possess a unique stationary element with finite second moments.

What would settle it

Simulate long trajectories from the recursion with known coefficient matrix, then check whether the proposed estimator recovers those exact coefficients at a rate that improves with the length of the observed series.

Figures

Figures reproduced from arXiv: 2207.05442 by J\'er\'emie Bigot, Yiye Jiang.

Figure 1
Figure 1. Figure 1: Annual records of age distributions of EU countries. On the top are 27 countries in the European union. A sequence of age distribution is recorded at each country over years. For example, at the bottom we illustrate the sequence of France, where one distribution supported over r0, 1s is observed at each year. On the lower left, we visualize the resulting univariate distributional time series with a surface… view at source ↗
Figure 2
Figure 2. Figure 2: Centering of a random probability measure. On the leftmost are i.i.d. samples of a random probability measure, whose population Fréchet mean is given in the middle. Applying the centering step (3.2) on them gives the centred samples shown on the rightmost. All probability measures are represented by their quantile functions. We can see that the centered samples correspond to the empirical Fréchet mean give… view at source ↗
Figure 3
Figure 3. Figure 3: The function g on the left is given by the natural cubic spline passing through the points p0, 0q,p0.2, 0.1q,p0.6, 0.2q,p1, 1q. On the right is one realization of 30 i.i.d. samples of the resulting g. data. Thus, we have to generate furthermore the population Fréchet mean F ´1 i,‘ of each univariate series in order to finally obtain the synthesized “raw” data, as the inverse of transformation (3.2): F ´1 … view at source ↗
Figure 4
Figure 4. Figure 4: Mean (left) and standard deviation (right) of RMSD for N “ 10. The mean and the variance are calculated over 100 simulations along time T, every 10 time instants. 0 2000 4000 6000 8000 10000 T 1.0 1.5 2.0 2.5 3.0 3.5 alpha = 0.1 alpha = 0.5 0 2000 4000 6000 8000 10000 T 0.000 0.005 0.010 0.015 0.020 0.025 0.030 0.035 alpha = 0.1 alpha = 0.5 [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Mean (left) and standard deviation (right) of RMSD for N “ 100. The mean and the variance are calculated every 100 time instants. Additionally, we can notice that, the RMSD for α “ 0.1 which corresponds to larger `2 norm of A has a smaller mean in both cases, and also a smaller variance for most of the sample sizes T investigated. Also we would like to remark that, during the first few T values, the sample… view at source ↗
Figure 6
Figure 6. Figure 6: Calculation time (in seconds) of Ap with respect to the sample size T for N “ 10 (left) and N “ 100 (right). The calculation time counting starts from the computation of the empirical Fréchet means for Data transformation (4.3), and ends when the accelerated projected gradient descent of Problem (4.5) finishes for the last row i “ N. simulation is 0.01, that is we input/output only the quantile function va… view at source ↗
Figure 7
Figure 7. Figure 7: Inferred age structure graph. The non-zero coefficients Aij are represented by the weighted directed edges from node j to node i. Thicker arrow corresponds to larger weights. The blue circles around nodes represent the weights of self-loop. Firstly, we can notice that for all countries i P t1, ..., 34u, the weight of self-loop Aii dominates the weights of incoming edges Aij , j “ 1, ..., 34, which are boun… view at source ↗
Figure 8
Figure 8. Figure 8: Evolution of age structure from 1996 to 2036 (projected) of Estonia (left) versus Latvia (right). Each curve connects the 101 relative frequencies from 0, 1{100, 2{100, ..., 1, which represents the age structure of a considered year. Lighter curves correspond to more recent years. 0.0 0.2 0.4 0.6 0.8 1.0 Age/100 0.000 0.002 0.004 0.006 0.008 0.010 0.012 0.014 Relative frequency 0.0 0.2 0.4 0.6 0.8 1.0 Age/… view at source ↗
Figure 9
Figure 9. Figure 9: Evolution of age structure from 1996 to 2036 (projected) of Sweden (left) versus Norway (right). 20 [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Evolution of age structure from 1996 to 2036 (projected) of France (left) versus Italy (right). All these observations strongly support the usefulness of our model. Lastly, in [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Subgraph 1 ( 50 nodes chosen randomly ). For graphical meanings see the caption of [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Subgraph 2 ( the first 50 nodes ). This subgraph exhibits a more specific area in Paris, isolating the stations numbered 1 to 50. 6 Conclusion In this paper, we extend the standard VAR models to distributional multivariate AR models, which provides an approach to model a collection of multiple time-dependent probability measures, and to represent their dependency structure by a directed weighted graph at … view at source ↗
read the original abstract

This paper is focused on the statistical analysis of data consisting of a collection of multiple series of probability measures that are indexed by distinct time instants and supported over a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. Using the theory of iterated random function systems, results on the second order stationarity of the solution of such a model are provided. We also propose a consistent estimator for the auto-regressive coefficients of this model. Due to the simplex constraints that we impose on the model coefficients, the proposed estimator that is learned under these constraints, naturally has a sparse structure. The sparsity allows the application of the proposed model in learning a graph of temporal dependency from multivariate distributional time series. We explore the numerical performances of our estimation procedure using simulated data. To shed some light on the benefits of our approach for real data analysis, we also apply this methodology to two data sets, respectively made of observations from age distribution in different countries and those from the bike sharing network in Paris.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a new multivariate autoregressive model for distributional time series, where each time-indexed observation is a probability measure treated as a random object in the 2-Wasserstein space. The model is defined via a recursion using barycentric combinations with coefficients constrained to the simplex; second-order stationarity of the solution is claimed via the theory of iterated random function systems. A consistent estimator for the autoregressive coefficients is introduced that inherits sparsity from the simplex constraint, enabling inference of temporal dependency graphs. The approach is illustrated on simulated data and two real datasets (country-level age distributions and Paris bike-sharing station distributions).

Significance. If the stationarity and consistency results hold without additional unstated restrictions, the framework would provide a principled extension of vector autoregressive models to the Wasserstein space of measures, with the built-in sparsity offering a direct route to interpretable temporal graphs. The simplex constraint and its sparsity consequence are a genuine strength for applications in demography and network data.

major comments (2)
  1. [Abstract (stationarity results) and the section presenting the IRFS argument] The second-order stationarity claim rests on applying iterated random function system theory to the Wasserstein-space recursion. The manuscript must explicitly derive or bound the average contraction factor (typically E[Lip(f_t)] < 1 with respect to the 2-Wasserstein metric) under the simplex constraint on the coefficient vectors; without this step the existence of a unique stationary solution is not guaranteed for arbitrary simplex matrices.
  2. [Section on the estimator and its consistency theorem] Consistency of the proposed estimator is asserted, yet the proof must be shown to hold under the Wasserstein metric and the simplex constraint without post-hoc restrictions on the coefficient space; the current statement leaves open whether the estimator remains consistent when the true coefficients lie on the boundary of the simplex.
minor comments (2)
  1. [Model definition] Notation for the multivariate recursion and the Wasserstein barycenter should be introduced with an explicit equation number in the model-definition section to avoid ambiguity when the number of series exceeds two.
  2. [Applications to real data] The real-data section would benefit from a table reporting the estimated coefficient matrices (or their support graphs) alongside a baseline such as independent univariate Wasserstein AR(1) fits.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below and will revise the paper to strengthen the presentation of the stationarity and consistency results.

read point-by-point responses
  1. Referee: [Abstract (stationarity results) and the section presenting the IRFS argument] The second-order stationarity claim rests on applying iterated random function system theory to the Wasserstein-space recursion. The manuscript must explicitly derive or bound the average contraction factor (typically E[Lip(f_t)] < 1 with respect to the 2-Wasserstein metric) under the simplex constraint on the coefficient vectors; without this step the existence of a unique stationary solution is not guaranteed for arbitrary simplex matrices.

    Authors: We agree that an explicit bound on the contraction factor is required to rigorously invoke the IRFS theory under the simplex constraint. The current manuscript states the stationarity result via IRFS but does not derive the bound E[Lip(f_t)] < 1 explicitly in terms of the simplex coefficients. In the revised version we will add a supporting lemma that establishes this bound (showing the expected Lipschitz constant is at most the maximum row sum of the absolute coefficient matrix, which is strictly less than 1 under the model assumptions), thereby guaranteeing a unique second-order stationary solution for admissible simplex matrices. revision: yes

  2. Referee: [Section on the estimator and its consistency theorem] Consistency of the proposed estimator is asserted, yet the proof must be shown to hold under the Wasserstein metric and the simplex constraint without post-hoc restrictions on the coefficient space; the current statement leaves open whether the estimator remains consistent when the true coefficients lie on the boundary of the simplex.

    Authors: We acknowledge that the consistency theorem and its proof need to be stated more carefully with respect to the simplex constraint and boundary cases. The existing argument is formulated in the Wasserstein metric but does not explicitly address boundary points. We will revise the theorem statement and extend the proof in the appendix to cover the boundary by using continuity of the objective and properties of the Euclidean projection onto the simplex; this will confirm consistency holds for all points in the simplex without additional restrictions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external IRFS theory applied to newly defined model

full rationale

The paper introduces a novel multivariate autoregressive model on the Wasserstein space with simplex-constrained coefficients, then invokes the external theory of iterated random function systems to obtain second-order stationarity results and proposes a consistent estimator. No quoted step reduces the stationarity claim, consistency, or sparsity property to a fitted quantity or self-citation by construction; the central claims remain independent of the paper's own fitted outputs and rest on standard external mathematical machinery.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Wasserstein metric space and on the applicability of iterated random function system theory; the autoregressive coefficients are the primary quantities fitted to data under explicit simplex constraints.

free parameters (1)
  • autoregressive coefficients
    Estimated from data subject to simplex constraints that enforce summation to one and induce sparsity.
axioms (2)
  • domain assumption Probability measures supported on a bounded real interval can be treated as random objects in the Wasserstein space
    Explicit modeling choice stated in the abstract.
  • domain assumption The recursion satisfies the contraction conditions needed for iterated random function systems to guarantee second-order stationarity
    Invoked to obtain the stationarity result.

pith-pipeline@v0.9.0 · 5728 in / 1371 out tokens · 29590 ms · 2026-05-24T11:48:27.847465+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages · 2 internal anchors

  1. [1]

    Log-PCA versus Geodesic PCA of histograms in the Wasserstein space

    E. Cazelles, V. Seguy, J. Bigot, M. Cuturi, and N. Papadakis. Log-PCA versus geodesic PCA of histograms in the wasserstein space.arXiv preprint arXiv:1708.08143,

  2. [2]

    Jiang, J

    Y. Jiang, J. Bigot, and S. Maabout. Sensor selection on graphs via data-driven node sub-sampling in network time series.arXiv preprint arXiv:2004.11815,

  3. [3]

    J. Thai, C. Wu, A. Pozdnukhov, and A. Bayen. Projected sub-gradient withℓ1 or simplex constraints via isotonic regression. In2015 54th IEEE Conference on Decision and Control (CDC), pages 2031–2036. IEEE,

  4. [4]

    X. Zhou. On the fenchel duality between strong convexity and lipschitz continuous gradient.arXiv preprint arXiv:1803.06573,

  5. [5]

    Zhu and H.-G

    C. Zhu and H.-G. Müller. Autoregressive optimal transport models.arXiv preprint arXiv:2105.05439,

  6. [6]

    random objects taking values in a measurable spaceΘ, Φϵp¨q:“ Φp¨,ϵq is the ϵ-section of a jointly measurable functionΦ : Xˆ ΘÑ X

    7 Supplementary Material 7.1 Time series in metric space LetpX,dq be a complete separate metric space with Borel setX, an iterated random function (IRF) system in the state spacepX,dq is defined as Xt“ ΦϵtpXt´1q, t P Z, (7.1) whereϵt,t P Z are i.i.d. random objects taking values in a measurable spaceΘ, Φϵp¨q:“ Φp¨,ϵq is the ϵ-section of a jointly measurabl...

  7. [7]

    Condition

    (Condition 1 in Wu and Shao (2004)) There existsY 0P X and αą 0, such that Ipα,Y 0q :“ EdαpY 0, ΦϵtpY 0qqă8 . Condition

  8. [8]

    (7.2) Note thatϵt,t P Z are i.i.d, thus for any fixedX P X, we haverΦt,mpXq d“ rΦt1,mpXq,@t,t1P Z

    (Theorem 1 in Zhu and Müller (2021); Condition2 in Wu and Shao (2004)) There exists X0P X, αą 0, r“rpαqPp 0, 1q, andC“Cpαqą 0, such that for alltP Z, we have EdαprΦt,mpX0q, rΦt,mpXqqď CrmdαpX0,Xq, @XP X,m P N. (7.2) Note thatϵt,t P Z are i.i.d, thus for any fixedX P X, we haverΦt,mpXq d“ rΦt1,mpXq,@t,t1P Z. Thus Condition 2 is not a uniform requirement imp...

  9. [9]

    Thus, we provide the uniqueness result for the general system in Theorem 7.5

    is no longer valid for the general system(7.1), which is also mentioned in Wu and Shao (2004, Theorem 2). Thus, we provide the uniqueness result for the general system in Theorem 7.5. For the proof we refer to Section 7.3.4. Theorem 7.5.(Uniqueness) Suppose that Conditions 1 and 2 hold, then, if there is another solution St, tP Z, such that E “ dβpSt,Z 0q...

  10. [10]

    Γp1qr Γp0qs´1, whererΓp0qsj,l“ ExrFj,t´1´id, rFl,t´1´idyLeb andrΓp1qsj,l“ ExrFj,t´id, rFl,t´1´idyLeb. Since rA“ rΓp1q

    Otherwise, we would have P « }ϵi,t´1˝Ht´2´id}2 Leb ˇˇˇˇˇHt´2 ff “ 1 ðñ P « ϵi,t´1˝Ht´2“id almost everywhere ˇˇˇˇˇHt´2 ff “ 1 ðñ P « ϵi,t´1“H´1 t´2 a.e. ˇˇˇˇˇHt´2 ff “ 1 ðñ P “ ϵi,t´1“H´1 t´2 a.e. ‰ “ 1, which is a contradiction to the independence betweenϵi,t´1 andH´1 t´2. Thus, we can conclude that Γp0q is nonsingular. Point 2: Process (3.8) writes in terms...

  11. [11]

    It is then left to show thatgjl is stochastic dini-continuous (Wu and Shao, 2004, Equation (9)). For anyY and Y1 identically distributed inpX,x¨,¨yq, we have |gjlpYq´ gjlpY1q|“|x Yj´id, Yl´idyLeb´x Y1 j´id, Y1 l ´idyLeb| ď|x Yj´id, Yl´idyLeb´x Yj´id, Y1 l ´idyLeb| `|x Yj´id, Y1 l ´idyLeb´x Y1 j´id, Y1 l ´idyLeb| ď|x Yj´id, Yl´ Y1 lyLeb|`|x Yj´ Y1 j, Y1 l ...

  12. [12]

    ? T ˆ” rΓp0q ı j,l ´r Γp0qsj,l ˙ dÑσgjl Np0, 1q, which is followed

    Thus, we have furthermore |gjlpYq´ gjlpY1q|ď 1 2 ` }Yl´ Y1 l}`} Yj´ Y1 j} ˘ ď ? 2 2 ` }Yl´ Y1 l}2`} Yj´ Y1 j}2˘ 1 2 . Then sup Y,Y1 " |gjlpYq´ gjlpY1q|1´?řN i“1}Yi´Y1 i}2ăδ ¯ * ď ? 2 2 δ. Thus gjl is stochastic dini-continuous. Since IRF system(3.8) is geometric moment contracting indicated in the proof of Theorem (3.1), then by Theorem3 in Wu and Shao (2...

  13. [13]

    40 7.10 Proof of Theorem 4.4 The Lagrangian of Problem (4.8) is given by Lpβ, Λq“ fTpβq` mÿ i“1 λifipβq` pÿ j“1 νjhjpβq

    However, to show clearly the logic of the proof of Theorem 4.3, and the difference of the random objects involved, especially rFt and pFt, we complete the result forrA and emphasize it in Theorem 4.3. 40 7.10 Proof of Theorem 4.4 The Lagrangian of Problem (4.8) is given by Lpβ, Λq“ fTpβq` mÿ i“1 λifipβq` pÿ j“1 νjhjpβq. Let Λ˚ be a dual solution, then the ...