pith. machine review for the scientific record.

arxiv: 2604.07169 · v2 · submitted 2026-04-08 · 📊 stat.ML · cs.LG · cs.NA · math.NA

Recognition: unknown

FLUID: Flow-based Unified Inference for Dynamics

Chenlong Pei, Tao Zhou, Tiangang Cui, Xiaodong Feng, Xiaoliang Wan


Pith reviewed 2026-05-10 17:17 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · cs.NA · math.NA
keywords Bayesian filtering · smoothing · normalizing flows · amortized inference · dynamical systems · recurrent encoder · state estimation

The pith

FLUID encodes variable-length observations into a fixed summary that conditions coupled forward and backward flows for unified filtering and smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FLUID as a flow-based amortized method that solves Bayesian filtering and smoothing together for high-dimensional nonlinear dynamical systems. A recurrent encoder compresses any-length observation history into a fixed-dimensional vector that is shared by two learned flows. One flow approximates the filtering distribution at each step while the other approximates the backward transition kernel. Full trajectory smoothing is recovered by running the standard backward recursion from the terminal filter using the learned kernel, and the same structure supports extrapolation past the training horizon. The shared summary couples the two flows and thereby regularizes the estimated latent trajectories.
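A minimal sketch of this pipeline, assuming PyTorch. The class and method names (SummaryEncoder, CondGaussianFlow, fluid_smooth, sample) are illustrative, not the authors' code, and the conditional Gaussian stands in for the actual normalizing flows:

```python
import torch
import torch.nn as nn

class SummaryEncoder(nn.Module):
    """Recurrent encoder: maps an observation history y_{1:t} of any length
    to fixed-dimensional summaries s_1..s_t (top-layer LSTM hidden states,
    linearly projected), mirroring the paper's Figures 1-2."""
    def __init__(self, obs_dim, hidden_dim, summary_dim, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, summary_dim)

    def forward(self, y):             # y: (1, T, obs_dim)
        h, _ = self.lstm(y)           # h: (1, T, hidden_dim)
        return self.proj(h)           # summaries s_1..s_T: (1, T, summary_dim)

class CondGaussianFlow(nn.Module):
    """Stand-in for a conditional normalizing flow: a context-conditioned
    affine (Gaussian) sampler. A real flow would stack invertible layers;
    only the sample(context) interface matters for the recursion below."""
    def __init__(self, ctx_dim, state_dim):
        super().__init__()
        self.net = nn.Linear(ctx_dim, 2 * state_dim)

    def sample(self, context):        # context: (..., ctx_dim)
        mu, log_sig = self.net(context).chunk(2, dim=-1)
        return mu + log_sig.exp() * torch.randn_like(mu)

def fluid_smooth(encoder, fwd_flow, bwd_flow, y, n_samples):
    """Filtering at the terminal step, then the backward recursion:
    u_T ~ p(u_T | s_T), and u_t ~ p(u_t | u_{t+1}, s_t) down to t = 1."""
    s = encoder(y)[0]                                    # (T, summary_dim)
    u = fwd_flow.sample(s[-1].expand(n_samples, -1))     # terminal filter
    path = [u]
    for t in range(s.shape[0] - 2, -1, -1):
        ctx = torch.cat([u, s[t].expand(n_samples, -1)], dim=-1)
        u = bwd_flow.sample(ctx)                         # backward kernel
        path.append(u)
    return list(reversed(path))       # n_samples draws of u_1..u_T
```

Because both flows condition on the same s_t, the trajectory samples inherit whatever temporal structure the encoder captures, which is where the claimed implicit regularization enters.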

Core claim

FLUID encodes each observation history into a fixed-dimensional summary statistic via a recurrent encoder. Conditioned on this statistic, a forward flow approximates the filtering distribution while a backward flow approximates the backward transition kernel. The smoothing distribution over an entire trajectory is recovered by combining the terminal filtering distribution with the learned backward flow through the standard backward recursion. By learning the underlying temporal evolution structure, FLUID also supports extrapolation beyond the training horizon.
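Spelled out, the "standard backward recursion" is the textbook factorization of the smoothing posterior in a Markov state-space model; the forward flow targets the terminal factor and the backward flow targets each kernel in the product:

```latex
p(u_{1:T} \mid y_{1:T})
  = p(u_T \mid y_{1:T}) \prod_{t=1}^{T-1} p(u_t \mid u_{t+1},\, y_{1:t}),
\qquad
p(u_t \mid u_{t+1}, y_{1:t})
  = \frac{p(u_{t+1} \mid u_t)\, p(u_t \mid y_{1:t})}{p(u_{t+1} \mid y_{1:t})}.
```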

What carries the argument

The fixed-dimensional recurrent summary statistic that is shared to condition both the forward filtering flow and the backward transition flow.

If this is right

  • Accurate approximations of both filtering distributions and smoothing paths for high-dimensional nonlinear systems.
  • Improved trajectory-level smoothing through implicit regularization induced by coupling the two flows via the shared summary.
  • Extrapolation to time steps beyond the training horizon by learning the temporal evolution structure.
  • An alternative flow-based particle filtering procedure with ESS-based diagnostics when explicit model factors are available (a minimal RESS computation is sketched after this list).
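The ESS diagnostic in the last point is a standard weight-degeneracy check; a minimal sketch of the relative ESS (the quantity plotted in Figures 12 and 17), assuming unnormalized log importance weights from the flow-based proposal:

```python
import torch

def relative_ess(log_w: torch.Tensor) -> torch.Tensor:
    """Relative effective sample size from unnormalized log importance
    weights: RESS = 1 / (N * sum_i w_i^2), with the weights normalized
    to sum to one. Values near 1 indicate well-spread weights; values
    near 1/N indicate weight collapse."""
    w = torch.softmax(log_w, dim=0)   # numerically stable normalization
    return 1.0 / (log_w.numel() * (w ** 2).sum())
```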

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The amortized nature of the shared summary could support repeated inference on new sequences at lower cost than retraining separate filters and smoothers.
  • If the recurrent summary preserves the essential dynamics, the same trained flows might be reused on related systems that differ only in sequence length or noise level.
  • Incremental updating of the recurrent summary as new observations arrive would allow the method to be applied in online settings without re-encoding the full history (a sketch follows this list).
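The third point follows from the recurrence itself: an LSTM summary can be advanced one observation at a time by carrying the hidden and cell states forward. A sketch, reusing the hypothetical SummaryEncoder from the earlier sketch:

```python
import torch

def init_summary(encoder, y_hist):
    """Encode an initial history y_{1:t} once, keeping the LSTM state."""
    h, state = encoder.lstm(y_hist)           # y_hist: (1, t, obs_dim)
    return encoder.proj(h[:, -1]), state      # s_t and the carried (h, c)

def update_summary(encoder, y_new, state):
    """O(1) online refresh: fold in one observation y_{t+1} by stepping
    the recurrence instead of re-encoding y_{1:t+1} from scratch."""
    h, state = encoder.lstm(y_new.view(1, 1, -1), state)
    return encoder.proj(h[:, -1]), state      # s_{t+1} and the new (h, c)
```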

Load-bearing premise

A fixed-dimensional recurrent summary of the observation history is sufficient to condition accurate approximations of the filtering distribution and backward transition kernel for arbitrary-length sequences in high-dimensional nonlinear systems.

What would settle it

A test on a nonlinear system with long-range temporal dependencies, checking whether approximation error in the recovered smoothing paths increases sharply once sequence length exceeds the lengths used to train the recurrent encoder.

Figures

Figures reproduced from arXiv: 2604.07169 by Chenlong Pei, Tao Zhou, Tiangang Cui, Xiaodong Feng, Xiaoliang Wan.

Figure 1. Schematic of a single-layer LSTM cell. In practice, a multi-layer LSTM is used, obtained by stacking L single-layer LSTMs. For each ℓ ∈ {1, …, L} and time t, h_t^(ℓ), c_t^(ℓ) ∈ R^{d_h} denote the hidden and cell states in layer ℓ, and the hidden states h_t^(ℓ) are used as inputs for the next layer. The layerwise inputs are defined recursively by z_t^(1) := y_t and z_t^(ℓ) := h_t^(ℓ−1) for ℓ ≥ 2.

Figure 2. Schematic of the recurrent summary network. The input observation sequence y_{1:t} is processed by a multi-layer LSTM to produce hidden states at each time step. A linear transformation of the top-layer hidden state yields a fixed-dimensional summary statistic s_t that encodes information from the entire history y_{1:t}.

Figure 3. Schematic of FLUID. A shared recurrent summary network encodes the observation history y_{1:t} into a fixed-dimensional summary statistic s_t. Conditioned on the shared summary s_t, the forward and backward flows approximate the filtering distribution p(u_t | y_{1:t}) and the backward transition distribution p(u_t | u_{t+1}, y_{1:t}), respectively.
Figure 4. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for Case 1 of the advection-diffusion problem with state dimension n = 50.

Figure 5. Time evolution of D_KL(p(u_k | y_{1:k}) ‖ p_{θ1,ψ}(u_k | s_k)) (left) and D_KL(p(u_{k−1} | u_k, y_{1:k}) ‖ p_{θ2,ψ}(u_{k−1} | u_k, s_k)) (right) for Case 1 of the advection-diffusion problem with state dimension n = 50.

Figure 6. Time evolution of RMSE and the other error metrics for p_{θ1,ψ}(u_k | s_k) and p_{θ2,ψ}(u_{k−1} | u_k, s_k) for Case 1 of the advection-diffusion problem with n = 50, illustrating the robustness and temporal extrapolation performance of the method.

Figure 7. Spatiotemporal results for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (bottom row) for Case 1 of the advection-diffusion problem with state dimension n = 50. From left to right, the columns show the reference mean, the predicted mean, the absolute error, and the predicted standard deviation.

Figure 8. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for Case 2 of the advection-diffusion problem at state dimension n = 48.

Figure 9. Time evolution of D_KL(p(u_k | y_{1:k}) ‖ p_{θ1,ψ}(u_k | s_k)) (left) and D_KL(p(u_{k−1} | u_k, y_{1:k}) ‖ p_{θ2,ψ}(u_{k−1} | u_k, s_k)) (right) for Case 2 of the advection-diffusion problem at state dimension n = 48.

Figure 10. Time evolution of error metrics (RMSE, MMD, and CRPS) for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the backward kernel distribution p_{θ2,ψ}(u_k | u_{k+1}, s_k) (bottom row) for Case 2 of the advection-diffusion problem at state dimension n = 48.

Figure 11. Spatiotemporal results for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (bottom row) for Case 2 of the advection-diffusion problem at state dimension n = 48. From left to right, the columns display the reference mean, the predicted mean, the absolute error between them, and the predicted standard deviation.
Figure 12. Comparison of the time evolution of RESS for the filtering distributions of the flow-based particle filtering method p^particle_{θ3,θ4}(u_k | y_{1:k}) and the FBF method p_FBF(u_k | y_{1:k}) for the two-factor stochastic volatility model on a test case.

Figure 13. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) for the two-factor stochastic volatility model.

Figure 14. Visualization of the mean and uncertainty of the estimated filtering distribution p^particle_{θ3,θ4}(u_k | y_{1:k}) for the two-factor stochastic volatility model.

Figure 15. Visualization of the mean and uncertainty of the estimated smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) for the two-factor stochastic volatility model with T = 2000.

Figure 16. Time evolution of error metrics (RMSE, MMD, and CRPS) for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the backward kernel distribution p_{θ2,ψ}(u_{k−1} | u_k, s_k) (bottom row) for the two-factor stochastic volatility model.

Figure 17. Time evolution of RESS for the filtering distributions of the flow-based particle filtering method p^particle_{θ3,θ4}(u_k | y_{1:k}) for the two-factor stochastic volatility model on the S&P 500 dataset.

Figure 18. Flow-based particle filtering results for the S&P 500 dataset. The top row visualizes the observation sequence over time. The bottom row displays the mean and the 90% credible interval (5th to 95th percentiles) of the exponentiated state exp(u_k), derived from the estimated filtering distribution p^particle_{θ3,θ4}(u_k | y_{1:k}).
Figure 19. Comparison of the time evolution of error metrics (RMSE, MMD, and CRPS) for the filtering distributions of the proposed FLUID method p_{θ1,ψ}(u_k | s_k) and the FBF method p_FBF(u_k | y_{1:k}) for the Burgers' equation at an observation noise level of r² = 0.25.

Figure 20. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for the Burgers' equation problem at an observation noise level of r² = 0.25.

Figure 21. Time evolution of error metrics (RMSE, MMD, and CRPS) for the predicted backward kernel distribution p_{θ2,ψ}(u_k | u_{k+1}, s_k) for the Burgers' equation problem at an observation noise level of r² = 0.25.

Figure 22. Spatiotemporal results for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (bottom row) for the Burgers' equation problem at an observation noise level of r² = 0.25. From left to right, the columns display the true path (reference), the predicted mean, the absolute error between them, and the predicted standard deviation.
Figure 23. Error metrics (RMSE, MMD, and CRPS) for the filtering distributions of FLUID p_{θ1,ψ}(u_k | s_k) and the FBF method p_FBF(u_k | y_{1:k}) for the single-scale Lorenz-96 model at K = 10, 20, 30, and 40, which further shows that the filtering performance remains stable over long prediction horizons:

K     RMSE (FLUID / FBF)    MMD (FLUID / FBF)    CRPS (FLUID / FBF)
10    0.1632 / 0.2044       0.0647 / 0.0967      0.0738 / 0.1015
20    0.1945 / 0.2439       0.1703 / 0.2530      0.0902 / 0.1272
30    0.2081 / 0.2604       0.2675 / 0.3987      0.0976 / 0.1393
40    0.2255 / 0.2742       0.3844 / 0.5295      0.1131 / 0.14…

Figure 24. Time evolution of error metrics (RMSE, MMD, and CRPS) for the predicted backward kernel distribution p_{θ2,ψ}(u_k | u_{k+1}, s_k) for the single-scale Lorenz-96 model at state dimension K = 50.

Figure 25. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for the single-scale Lorenz-96 model at state dimension n = 50.

Figure 26. Spatiotemporal results for the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) (top row) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (bottom row) for the single-scale Lorenz-96 model at state dimension K = 50. From left to right, the columns display the true path (reference), the predicted mean, the absolute error between them, and the predicted standard deviation.

Figure 27. Comparison of the time evolution of error metrics (RMSE, MMD, and CRPS) for the filtering distributions of the proposed FLUID method p_{θ1,ψ}(u_k | s_k) and the FBF method p_FBF(u_k | y_{1:k}) for the two-scale Lorenz model with forcing term F = 5 (top row) and F = 16 (bottom row).

Figure 28. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for the two-scale Lorenz model at forcing term F = 5.

Figure 29. Visualization of the mean and uncertainty of the estimated filtering distribution p_{θ1,ψ}(u_k | s_k) (left column) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) (right column) for the two-scale Lorenz model at forcing term F = 16.

Figure 30. Time evolution of error metrics (RMSE, MMD, and CRPS) for the predicted backward kernel distribution p_{θ2,ψ}(u_k | u_{k+1}, s_k) for the two-scale Lorenz model with forcing term F = 5 (top row) and F = 16 (bottom row).

Figure 31. Spatiotemporal results for the two-scale Lorenz model under forcing terms F = 5 (top two rows) and F = 16 (bottom two rows). For each forcing term, the predicted filtering distribution p_{θ1,ψ}(u_k | s_k) and the smoothing distribution p^smoothing_{θ1,θ2,ψ}(u_k | y_{1:T}) are displayed in the upper and lower rows, respectively. From left to right, the columns show the true path (reference), the predicted mean, …
Original abstract

Bayesian filtering and smoothing for high-dimensional nonlinear dynamical systems are fundamental yet challenging problems in many areas of science and engineering. In this work, we propose FLUID, a flow-based unified amortized inference framework for filtering and smoothing dynamics. The core idea is to encode each observation history into a fixed-dimensional summary statistic and use this shared representation to learn both a forward flow for the filtering distribution and a backward flow for the backward transition kernel. Specifically, a recurrent encoder maps each observation history to a fixed-dimensional summary statistic whose dimension does not depend on the length of the time series. Conditioned on this shared summary statistic, the forward flow approximates the filtering distribution, while the backward flow approximates the backward transition kernel. The smoothing distribution over an entire trajectory is then recovered by combining the terminal filtering distribution with the learned backward flow through the standard backward recursion. By learning the underlying temporal evolution structure, FLUID also supports extrapolation beyond the training horizon. Moreover, by coupling the two flows through shared summary statistics, FLUID induces an implicit regularization across latent state trajectories and improves trajectory-level smoothing. In addition, we develop a flow-based particle filtering variant that provides an alternative filtering procedure and enables ESS-based diagnostics when explicit model factors are available. Numerical experiments demonstrate that FLUID provides accurate approximations of both filtering distributions and smoothing paths.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces FLUID, a flow-based unified amortized inference framework for Bayesian filtering and smoothing in high-dimensional nonlinear dynamical systems. A recurrent encoder maps observation histories to fixed-dimensional summary statistics independent of sequence length; conditioned on this shared summary, a forward flow approximates the filtering distribution while a backward flow approximates the backward transition kernel. Smoothing distributions are recovered via standard backward recursion, with additional claims of support for extrapolation beyond training horizons, implicit regularization across trajectories from the shared summary, and a flow-based particle filtering variant. Numerical experiments are stated to demonstrate accurate approximations of both filtering distributions and smoothing paths.

Significance. If the central claims hold, FLUID would offer a scalable amortized approach to joint filtering and smoothing that couples the two tasks through shared representations, potentially improving trajectory-level accuracy via implicit regularization while enabling extrapolation. The flow-based formulation provides flexible density modeling without explicit model factors in the base procedure, and the particle-filtering extension adds diagnostic capabilities when factors are available. These features could advance inference in complex state-space models common in science and engineering.

major comments (2)
  1. [§3] §3 (Method): The central claim that a fixed-dimensional recurrent summary serves as a sufficient statistic for both the filtering distribution (forward flow) and the backward transition kernel (backward flow) for arbitrary-length sequences in high-dimensional nonlinear systems is load-bearing for the smoothing recursion and the implicit-regularization argument, yet no theoretical justification, information-loss bounds, or ablation on summary dimension versus sequence length is provided to support sufficiency.
  2. [§5] §5 (Numerical Experiments): The assertion that experiments demonstrate accurate approximations lacks specification of baselines (e.g., standard particle filters, other amortized methods), quantitative metrics (KL, ESS, trajectory error), data regimes (state dimension, sequence lengths, nonlinearity), or controls for hyperparameter selection, which directly undermines evaluation of the claimed improvements in filtering and smoothing.
minor comments (1)
  1. [Abstract and §3] The abstract and method sections use 'shared summary statistic' without an explicit equation defining its dimension or the recurrent encoder architecture (e.g., LSTM vs. GRU, hidden size); stating these would improve reproducibility (see the readout equation sketched below).
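Figure 2's caption pins the missing equation down up to the hidden size: the summary is a linear readout of the top-layer hidden state. In the obvious notation (W, b, and the summary dimension d_s are exactly the undocumented hyperparameters the comment asks for; whether a bias term is used is an assumption):

```latex
s_t = W\, h_t^{(L)} + b, \qquad
s_t \in \mathbb{R}^{d_s} \ \text{with } d_s \text{ fixed, independent of } t.
```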

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review and for recognizing the potential of FLUID as a scalable amortized approach to joint filtering and smoothing. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.

Point-by-point responses
  1. Referee: [§3] §3 (Method): The central claim that a fixed-dimensional recurrent summary serves as a sufficient statistic for both the filtering distribution (forward flow) and the backward transition kernel (backward flow) for arbitrary-length sequences in high-dimensional nonlinear systems is load-bearing for the smoothing recursion and the implicit-regularization argument, yet no theoretical justification, information-loss bounds, or ablation on summary dimension versus sequence length is provided to support sufficiency.

    Authors: We agree that a formal theoretical guarantee of sufficiency would provide stronger support. The recurrent encoder is motivated by the universal approximation capabilities of RNNs for sequence compression, which have been effective in related dynamical modeling tasks. In the revised manuscript we will expand the discussion in §3 to clarify the modeling assumptions and add an ablation study that varies summary dimension against sequence length, reporting effects on filtering and smoothing accuracy. This supplies the requested empirical assessment while preserving the practical emphasis of the work. revision: yes

  2. Referee: [§5] §5 (Numerical Experiments): The assertion that experiments demonstrate accurate approximations lacks specification of baselines (e.g., standard particle filters, other amortized methods), quantitative metrics (KL, ESS, trajectory error), data regimes (state dimension, sequence lengths, nonlinearity), or controls for hyperparameter selection, which directly undermines evaluation of the claimed improvements in filtering and smoothing.

    Authors: We appreciate the referee highlighting the need for clearer experimental reporting. The original manuscript contains comparisons and metrics, but these details were insufficiently explicit. We will revise §5 to list all baselines (including particle filters and other amortized methods), report quantitative results using KL divergence, effective sample size, and trajectory errors, specify data regimes (state dimensions, sequence lengths, nonlinearity levels), and describe hyperparameter selection with controls. These updates will make the evaluation transparent and reproducible. revision: yes
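For reference, the trajectory-level metrics the response commits to have standard sample-based estimators; a sketch for RMSE and CRPS from posterior samples (MMD additionally requires a kernel choice, so it is omitted here):

```python
import torch

def rmse(samples: torch.Tensor, truth: torch.Tensor) -> torch.Tensor:
    """RMSE of the posterior mean; samples: (n, d), truth: (d,)."""
    return (samples.mean(dim=0) - truth).pow(2).mean().sqrt()

def crps(samples: torch.Tensor, truth: torch.Tensor) -> torch.Tensor:
    """Sample-based CRPS in energy form, E|X - y| - 0.5 E|X - X'|,
    averaged over state dimensions. The pairwise mean here includes the
    zero diagonal; an unbiased variant would divide by n * (n - 1)."""
    term1 = (samples - truth).abs().mean()
    term2 = (samples.unsqueeze(0) - samples.unsqueeze(1)).abs().mean()
    return term1 - 0.5 * term2
```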

Circularity Check

0 steps flagged

No circularity: standard flows, recurrent summary, and backward recursion remain independent of fitted outputs

full rationale

The derivation encodes observation histories via a recurrent network into a fixed-dimensional summary, then trains separate normalizing flows for the filtering distribution and backward kernel conditioned on that summary. Smoothing is recovered by applying the textbook backward recursion to the terminal filter and the learned backward kernel. None of these steps re-uses a fitted quantity as its own prediction, invokes a self-citation for a uniqueness theorem, or smuggles an ansatz through prior work. The shared-summary coupling is an architectural choice whose regularization effect is asserted but not derived by re-labeling inputs; numerical validation is external to the algebraic chain. The procedure is therefore self-contained against external benchmarks and receives score 0.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim depends on the recurrent encoder producing a sufficient statistic and on normalizing flows being expressive enough to represent the required distributions; these are learned from data rather than derived from first principles.

free parameters (1)
  • dimension of summary statistic
    Chosen as a fixed hyperparameter independent of sequence length; its specific value is not stated in the abstract but directly affects conditioning of both flows.
axioms (2)
  • domain assumption: Normalizing flows can accurately approximate the filtering distribution and backward transition kernel when conditioned on the summary statistic.
    Invoked implicitly when stating that the learned flows provide accurate approximations.
  • domain assumption: The recurrent encoder compresses arbitrary-length histories into a fixed vector without critical information loss for the inference task.
    Required for the shared representation to support both filtering and smoothing.
invented entities (1)
  • shared summary statistic produced by the recurrent encoder (no independent evidence)
    purpose: Provides a length-independent conditioning variable for both the forward filtering flow and the backward transition flow.
    New representational device introduced to couple the two inference directions.

pith-pipeline@v0.9.0 · 5547 in / 1569 out tokens · 53468 ms · 2026-05-10T17:17:17.024402+00:00 · methodology


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 7.0

    A single neural operator can approximate the map from arbitrary joint densities to their conditionals, backed by new continuity results and illustrated on Gaussian mixtures.

  2. One Operator for Many Densities: Amortized Approximation of Conditioning by Neural Operators

    stat.ML 2026-05 unverdicted novelty 6.0

    A single neural operator can approximate the map from joint densities to conditional densities to arbitrary accuracy, with a proof based on continuity of the conditioning operator and a demonstration on Gaussian mixtures.

Reference graph

Works this paper leans on

49 extracted references · 8 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1] S. Särkkä, L. Svensson, Bayesian filtering and smoothing, volume 17, Cambridge University Press, 2023.
  2. [2] K. Law, A. Stuart, K. Zygalakis, Data assimilation, Cham, Switzerland: Springer 214 (2015) 7.
  3. [3] M. Asch, M. Bocquet, M. Nodet, Data assimilation: methods, algorithms, and applications, SIAM, 2016.
  4. [5] J. Ko, D. Fox, GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models, Autonomous Robots 27 (2009) 75–90.
  5. [6] J. V. Candy, Bayesian signal processing: classical, modern, and particle filtering methods, John Wiley & Sons, 2016.
  6. [7] H. F. Lopes, R. S. Tsay, Particle filters and Bayesian inference in financial econometrics, Journal of Forecasting 30 (2011) 168–209.
  7. [8] X. Zhang, M. L. King, Box-Cox stochastic volatility models with heavy-tails and correlated errors, Journal of Empirical Finance 15 (2008) 549–566.
  8. [9] N. Chopin, Central Limit Theorem for sequential Monte Carlo methods and its application to Bayesian inference, Annals of Statistics 32 (2004) 2385–2411.
  9. [10] T. Bengtsson, P. Bickel, B. Li, Curse-of-dimensionality revisited: Collapse of the particle filter in very large scale systems, in: Probability and statistics: Essays in honor of David A. Freedman, volume 2, Institute of Mathematical Statistics, 2008, pp. 316–335.
  10. [11] P. Rebeschini, R. van Handel, Can local particle filters beat the curse of dimensionality?, The Annals of Applied Probability (2015).
  11. [12] R. E. Kalman, A new approach to linear filtering and prediction problems (1960).
  12. [13] G. A. Einicke, L. B. White, Robust extended Kalman filtering, IEEE Transactions on Signal Processing 47 (2002) 2596–2599.
  13. [14] G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics 53 (2003) 343–367.
  14. [15] E. Calvello, S. Reich, A. M. Stuart, Ensemble Kalman methods: A mean-field perspective, Acta Numerica 34 (2025) 123–291.
  15. [16] A. Doucet, N. De Freitas, N. J. Gordon, et al., Sequential Monte Carlo methods in practice, volume 1, Springer, 2001.
  16. [17] A. Beskos, A. Jasra, E. A. Muzaffer, A. M. Stuart, Sequential Monte Carlo methods for Bayesian elliptic inverse problems, Statistics and Computing 25 (2015) 727–737.
  17. [18] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, J. Miguez, Particle filtering, IEEE Signal Processing Magazine 20 (2003) 19–38.
  18. [19] N. J. Gordon, D. J. Salmond, A. F. Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation, in: IEE Proceedings F (Radar and Signal Processing), volume 140, IET, pp. 107–113.
  19. [20] M. K. Pitt, N. Shephard, Filtering via simulation: Auxiliary particle filters, Journal of the American Statistical Association 94 (1999) 590–599.
  20. [21] J. Carpenter, P. Clifford, P. Fearnhead, Improved particle filter for nonlinear problems, IEE Proceedings - Radar, Sonar and Navigation 146 (1999) 2–7.
  21. [22] G. Kitagawa, Monte Carlo filter and smoother for non-Gaussian nonlinear state space models, Journal of Computational and Graphical Statistics 5 (1996) 1–25.
  22. [23] W. R. Gilks, C. Berzuini, Following a moving target—Monte Carlo inference for dynamic Bayesian models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2001) 127–146.
  23. [24] C. Snyder, T. Bengtsson, P. Bickel, J. Anderson, Obstacles to high-dimensional particle filtering, Monthly Weather Review 136 (2008) 4629–4640.
  24. [25] S. Reich, C. Cotter, Probabilistic forecasting and Bayesian data assimilation, Cambridge University Press, 2015.
  25. [26] P. Del Moral, A. Doucet, A. Jasra, On adaptive resampling strategies for sequential Monte Carlo methods (2012).
  26. [27] S. Cheng, C. Quilodrán-Casas, S. Ouala, A. Farchi, C. Liu, P. Tandeo, R. Fablet, D. Lucor, B. Iooss, J. Brajard, et al., Machine learning with data assimilation and uncertainty quantification for dynamical systems: a review, IEEE/CAA Journal of Automatica Sinica 10 (2023) 1361–1387.
  27. [28] E. Bach, R. Baptista, D. Sanz-Alonso, A. Stuart, Machine learning for inverse problems and data assimilation, arXiv preprint arXiv:2410.10523 (2024).
  28. [29] X.-H. Zhou, Z.-R. Liu, H. Xiao, BI-EqNO: Generalized approximate Bayesian inference with an equivariant neural operator framework, arXiv preprint arXiv:2410.16420 (2024).
  29. [30] M. Bocquet, A. Farchi, T. S. Finn, C. Durand, S. Cheng, Y. Chen, I. Pasmans, A. Carrassi, Accurate deep learning-based filtering for chaotic dynamics by identifying instabilities without an ensemble, Chaos: An Interdisciplinary Journal of Nonlinear Science 34 (2024).
  30. [31] E. Bach, R. Baptista, E. Luk, A. Stuart, Learning optimal filters using variational inference, arXiv preprint arXiv:2406.18066 (2024).
  31. [32] G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. Van Sloun, Y. C. Eldar, KalmanNet: Neural network aided Kalman filtering for partially known dynamics, IEEE Transactions on Signal Processing 70 (2022) 1532–1547.
  32. [33] F. Bao, Z. Zhang, G. Zhang, A score-based filter for nonlinear data assimilation, Journal of Computational Physics 514 (2024) 113207.
  33. [34] F. Bao, Z. Zhang, G. Zhang, An ensemble score filter for tracking high-dimensional nonlinear dynamical systems, Computer Methods in Applied Mechanics and Engineering 432 (2024) 117447.
  34. [35] P. T. Huynh, G. Zhang, F. Bao, et al., Joint State-Parameter Estimation for the Reduced Fracture Model via the United Filter, Journal of Computational Physics (2025) 114159.
  35. [36] E. Bach, R. Baptista, E. Calvello, B. Chen, A. Stuart, Learning Enhanced Ensemble Filters, arXiv preprint arXiv:2504.17836 (2025).
  36. [37] X. T. Tong, Y. Wang, L. Yan, Latent Autoencoder Ensemble Kalman Filter for Data Assimilation, arXiv preprint arXiv:2603.06752 (2026).
  37. [38] A. Taghvaei, P. G. Mehta, An optimal transport formulation of the ensemble Kalman filter, IEEE Transactions on Automatic Control 66 (2020) 3052–3067.
  38. [39] A. Spantini, R. Baptista, Y. Marzouk, Coupling techniques for nonlinear ensemble filtering, SIAM Review 64 (2022) 921–953.
  39. [40] M. Ramgraber, D. Sharp, M. L. Provost, Y. Marzouk, A friendly introduction to triangular transport, arXiv preprint arXiv:2503.21673 (2025).
  40. [41] Y. Zhao, T. Cui, Tensor-train methods for sequential state and parameter learning in state-space models, Journal of Machine Learning Research 25 (2024) 1–51.
  41. [42] I. Kobyzev, S. J. Prince, M. A. Brubaker, Normalizing flows: An introduction and review of current methods, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2020) 3964–3979.
  42. [43] G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mohamed, B. Lakshminarayanan, Normalizing flows for probabilistic modeling and inference, Journal of Machine Learning Research 22 (2021) 1–64.
  43. [44] A. Zammit-Mangion, M. Sainsbury-Dale, R. Huser, Neural methods for amortized inference, Annual Review of Statistics and Its Application 12 (2025) 311–335.
  44. [45] M. Deistler, J. Boelts, P. Steinbach, G. Moss, T. Moreau, M. Gloeckler, P. L. Rodrigues, J. Linhart, J. K. Lappalainen, B. K. Miller, et al., Simulation-based inference: A practical guide, arXiv preprint arXiv:2508.12939 (2025).
  45. [46] X. Wang, X. Guan, L. Guo, H. Wu, Flow-based Bayesian filtering for high-dimensional nonlinear stochastic dynamical systems, arXiv preprint arXiv:2502.16232 (2025).
  46. [47] X. Feng, L. Zeng, T. Zhou, Solving Time Dependent Fokker-Planck Equations via Temporal Normalizing Flow, Communications in Computational Physics 32 (2022) 401–423.
  47. [48] J. He, Q. Liao, X. Wan, Adaptive deep density approximation for stochastic dynamical systems, Journal of Scientific Computing 102 (2025) 57.
  48. [49] A. Doucet, S. Godsill, C. Andrieu, On sequential Monte Carlo sampling methods for Bayesian filtering, Statistics and Computing 10 (2000) 197–208.
  49. [50] D. S. Wilks, Effects of stochastic parametrizations in the Lorenz '96 system, Quarterly Journal of the Royal Meteorological Society 131 (2005) 389–407.