arXiv: 2605.10046 · v1 · submitted 2026-05-11 · cs.CV · cs.LG · cs.MA

PixelFlowCast: Latent-Free Precipitation Nowcasting via Pixel Mean Flows

Yufeng Zhu, Chunlei Shi, Yongchao Feng, Dan Niu

Pith reviewed 2026-05-12 02:47 UTC · model grok-4.3

keywords: precipitation nowcasting, pixel mean flows, latent-free prediction, conditional flow matching, radar echo forecasting, spatiotemporal features, SEVIR dataset, few-step generation

The pith

PixelFlowCast forecasts precipitation radar sequences accurately and efficiently by applying direct pixel mean flows without any latent compression.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-stage method for short-term radar echo forecasting that first generates coarse predictions with a deterministic model to capture overall trends. It then uses a conditional network to extract spatiotemporal features that guide a pixel-level flow predictor operating directly in image space. This design targets the slow sampling of diffusion models and the detail loss caused by latent-space compression in flow-matching approaches. A reader would care because operational weather warnings require both precise fine-scale structure and rapid computation for timely alerts.

Core claim

The authors present PixelFlowCast as a latent-free probabilistic framework in which a deterministic first stage supplies coarse global trends and a KANCondNet then extracts deep spatiotemporal features to condition a Pixel Mean Flows predictor; the predictor applies an x-prediction mechanism to generate detailed radar-echo sequences in few steps while preserving fine-grained physical structures.

What carries the argument

The Pixel Mean Flows (PMF) predictor, a latent-free few-step mechanism that generates predictions directly in pixel space using an x-prediction approach conditioned on features from KANCondNet.
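
As a rough illustration of how a few-step sampler with x-prediction can run directly in pixel space, the sketch below assumes a linear path z_t = (1 - t)·x + t·ε, converts the network's image estimate x̂ into an average velocity u = (z_t - x̂)/t, and jumps from time t to time r with z_r = z_t - (t - r)·u. The `predict_x` callable, the oracle predictor, and the step schedule are hypothetical stand-ins chosen for this example; the paper's actual parameterization and conditioning may differ.

```python
import numpy as np

def few_step_sample(predict_x, cond, shape, num_steps=4, seed=0):
    """Few-step sampler with x-prediction (illustrative, not the paper's code).

    predict_x(z_t, r, t, cond) is a hypothetical network that returns an
    estimate of the clean frames x in pixel space.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)               # start from pure noise at t = 1
    times = np.linspace(1.0, 0.0, num_steps + 1)
    for t, r in zip(times[:-1], times[1:]):      # t never reaches 0 inside the loop
        x_hat = predict_x(z, r, t, cond)
        u = (z - x_hat) / t                      # assumed x -> average-velocity conversion
        z = z - (t - r) * u                      # jump from time t to time r
    return z

def make_oracle(x_true):
    """Toy predictor that already knows the target; only checks the update rule."""
    return lambda z_t, r, t, cond: x_true

x_true = np.random.default_rng(1).random((2, 8, 8))   # two toy "radar" frames
sample = few_step_sample(make_oracle(x_true), cond=None, shape=x_true.shape)
```

With the oracle predictor the final jump lands on the target, which is a quick sanity check that the update rule is wired correctly before a trained network is substituted for it.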

If this is right

The method produces higher prediction accuracy than mainstream nowcasting approaches on the SEVIR dataset.
Inference runs faster than diffusion-based alternatives because of the straightened, few-step trajectories.
Performance gains are largest for longer forecast sequences.
The overall design supports practical deployment in real-time extreme-weather warning systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar direct-pixel flow designs could be tested on other high-resolution spatiotemporal tasks such as satellite cloud tracking or fluid simulation.
Weather services could adopt the two-stage structure to lower the latency of operational nowcasts without sacrificing detail.
The x-prediction mechanism might generalize to other conditional generative settings where latent compression currently discards critical high-frequency information.

Load-bearing premise

That KANCondNet can extract spatiotemporal features that supply accurate conditional guidance for the pixel flows while still preserving the fine physical structures present in the original radar data.

What would settle it

A direct comparison on the SEVIR dataset in which PixelFlowCast shows no gain in accuracy metrics or no reduction in inference time relative to diffusion or standard conditional flow matching baselines, especially on longer forecast horizons.

Figures

Figures reproduced from arXiv: 2605.10046 by Chunlei Shi, Dan Niu, Yongchao Feng, Yufeng Zhu.

**Figure 2.** The overview of our PixelFlowCast framework and its core component KANCondNet. In Stage 1, a deterministic model ...

**Figure 3.** Average CSI degradation over a 3-hour forecast lead ...

**Figure 4.** Average HSS degradation over a 3-hour forecast lead ...

**Figure 6.** Ablation study on PMF generative paradigm.

Original abstract

Precipitation nowcasting aims to forecast short-term radar echo sequences for extreme weather warning, where both prediction fidelity and inference efficiency are critical for real-world deployment. However, diffusion-based models, despite their strong generative capability, suffer from slow inference due to multi-step sampling trajectories, limiting their practical usability. Conditional Flow Matching (CFM) improves efficiency via straightened trajectories, but relies on latent space compression, which inevitably discards high-frequency physical details and degrades fine-grained prediction quality. To address these limitations, we propose PixelFlowCast, a two-stage probabilistic forecasting framework that achieves both high-efficiency and high-fidelity prediction without latent compression. Specifically, in the first stage, a deterministic model first produces coarse forecasts to capture global evolution trends. In the subsequent stage, the proposed KANCondNet extracts deep spatiotemporal evolution features to provide accurate conditional guidance. Based on this, a latent-free, few-step Pixel Mean Flows (PMF) predictor employs an $x$-prediction mechanism to generate high-quality predictions, effectively preserving fine-grained structures while maintaining fast inference. Experiments on the publicly available SEVIR dataset demonstrate that PixelFlowCast outperforms existing mainstream methods in both prediction accuracy and inference efficiency, particularly for long sequence forecasting, highlighting its strong potential for real-world operational deployment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

PixelFlowCast puts forward a two-stage latent-free flow model for radar nowcasting that aims to keep fine details while cutting inference time, but the abstract supplies no numbers or ablations so the gains are still unverified.

The one thing to know is that this work tries to solve the speed-versus-detail problem in precipitation nowcasting by running a deterministic coarse stage first, then feeding KAN-extracted features into a pixel-space Pixel Mean Flows predictor that uses x-prediction instead of latent compression. That combination is the concrete new piece: it keeps the model in full-resolution radar space while still using straightened flow trajectories for fewer steps than diffusion sampling. The authors correctly flag that latent CFM discards high-frequency echo structures and that diffusion is too slow for operational use, and they target exactly those issues with named components (KANCondNet and PMF) on the public SEVIR dataset. They claim better accuracy and speed on long sequences, which would matter for warning systems if it holds. The framing is practical and the architecture choices are explicit, which is useful for anyone implementing similar spatiotemporal generators.

The soft spots are straightforward. The abstract states performance gains but shows zero metrics, no error bars, no ablation tables, and no derivation details on how the conditioning actually propagates fine-scale gradients or storm cores. Without those, it is impossible to tell whether the reported edge over mainstream methods is real or comes from post-hoc tuning. The stress-test concern about missing physical constraints (mass conservation, non-negativity, temporal coherence) lands because pixel-space flows can accumulate inconsistent fields over many steps; the abstract gives no sign of auxiliary losses to prevent that drift. If the full paper contains the missing numbers and checks, the central claim strengthens; right now the evidence is thin.

This is for people working on generative models for weather or video forecasting who need fast, high-resolution outputs. A reader who wants to see flow matching applied directly to radar pixels will get value from the design. It deserves a serious referee because the problem is real, the approach is specific, and the dataset is public, even though the current write-up needs the experimental backbone filled in. I would send it for review with a request for the quantitative results and any consistency ablations.

Referee Report

3 major / 2 minor

Summary. The paper proposes PixelFlowCast, a two-stage probabilistic nowcasting framework for radar echo sequences. A deterministic model first generates coarse forecasts to capture global trends; KANCondNet then extracts deep spatiotemporal features to supply conditional guidance; a latent-free Pixel Mean Flows (PMF) predictor with an x-prediction mechanism produces the final high-fidelity outputs in pixel space. The central claim is that this design outperforms mainstream methods on the SEVIR dataset in both accuracy and inference speed, especially for long-sequence forecasting, while avoiding information loss from latent compression.

Significance. If validated with quantitative results, the work would be significant for operational precipitation nowcasting by resolving the typical speed-fidelity trade-off in generative models. The latent-free PMF approach combined with KAN-based conditioning could enable real-time, high-resolution forecasts that preserve fine-scale physical structures, which is valuable for extreme weather applications.

major comments (3)

[Abstract and §4] Abstract and §4 (Experiments): the claim that PixelFlowCast 'outperforms existing mainstream methods in both prediction accuracy and inference efficiency' is unsupported in the provided text, which contains no quantitative metrics (e.g., CSI, RMSE, or SSIM values), ablation studies, error bars, or statistical tests on SEVIR. Without these, the central empirical claim cannot be evaluated.
[§3.2] §3.2 (KANCondNet and PMF predictor): no auxiliary losses or constraints (e.g., non-negativity, mass conservation, or temporal coherence) are described for the pixel-space PMF trajectory. Standard flow-matching objectives alone do not guarantee preservation of fine-grained radar structures such as localized storm cores or intensity gradients over long sequences, directly risking the claimed fidelity advantage.
[§3.1] §3.1 (two-stage design): the deterministic coarse-forecast stage is introduced without equations or details on how its output interfaces with KANCondNet conditioning; if this stage already encodes most global dynamics, the incremental benefit of the subsequent PMF stage for long-horizon accuracy remains unclear and load-bearing for the efficiency claim.
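
The constraints named in the second comment can at least be monitored after generation. The sketch below is a minimal diagnostic under stated assumptions: the forecast is an array of shape (T, H, W), "mass" is approximated by the per-frame sum of pixel intensities (a crude proxy, not a conservation law stated in the paper), and the function name `consistency_report` is hypothetical.

```python
import numpy as np

def consistency_report(frames):
    """Simple diagnostics for a forecast of shape (T, H, W)."""
    frames = np.asarray(frames, dtype=float)
    # Share of pixels that violate non-negativity.
    negative_fraction = float(np.mean(frames < 0.0))
    # Per-frame total intensity, used here as a crude mass proxy.
    mass = frames.sum(axis=(1, 2))
    # Relative change of the proxy with respect to the first frame.
    mass_drift = (mass - mass[0]) / max(abs(mass[0]), 1e-12)
    # Mean absolute frame-to-frame change, a rough temporal-coherence signal.
    step_change = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    return {
        "negative_fraction": negative_fraction,
        "mass_drift": mass_drift,
        "step_change": step_change,
    }

forecast = np.ones((3, 4, 4))
forecast[2] *= 1.5           # last frame gains 50% total intensity
forecast[1, 0, 0] = -0.2     # one negative pixel
report = consistency_report(forecast)
```

Diagnostics of this kind do not enforce anything during training; they only make drift visible so that an ablation with and without auxiliary losses can be compared on the same quantities.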

minor comments (2)

[§3] The acronym 'PMF' for Pixel Mean Flows is used before any formal definition or equation; a clear mathematical formulation (e.g., the x-prediction objective) should appear in §3.
[Figures] Figure captions and axis labels in the results section should explicitly state the forecast lead times and metrics shown to allow direct comparison with baselines.
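
For readers who want to reproduce the kind of per-lead-time scores the figures refer to, the following sketch computes the Critical Success Index (CSI) and Heidke Skill Score (HSS) from a thresholded contingency table. The threshold value and the toy arrays are illustrative assumptions, and the helper names are hypothetical; the paper's exact thresholds and pooling are not given in the text available here.

```python
import numpy as np

def csi_hss(pred, obs, threshold):
    """Return (CSI, HSS) for one pair of fields at a given intensity threshold."""
    pred_event = np.asarray(pred) >= threshold
    obs_event = np.asarray(obs) >= threshold
    hits = int(np.sum(pred_event & obs_event))
    false_alarms = int(np.sum(pred_event & ~obs_event))
    misses = int(np.sum(~pred_event & obs_event))
    correct_neg = int(np.sum(~pred_event & ~obs_event))

    csi_den = hits + misses + false_alarms
    csi = hits / csi_den if csi_den > 0 else float("nan")

    hss_den = ((hits + misses) * (misses + correct_neg)
               + (hits + false_alarms) * (false_alarms + correct_neg))
    hss_num = 2.0 * (hits * correct_neg - misses * false_alarms)
    hss = hss_num / hss_den if hss_den > 0 else float("nan")
    return csi, hss

def csi_by_lead_time(pred_seq, obs_seq, threshold):
    """CSI for each forecast frame, so degradation over lead time can be plotted."""
    return [csi_hss(p, o, threshold)[0] for p, o in zip(pred_seq, obs_seq)]

obs = np.array([[0, 80], [90, 10]])
pred = np.array([[0, 85], [20, 95]])
csi, hss = csi_hss(pred, obs, threshold=74)   # one hit, one miss, one false alarm
```

In this toy case the forecast has one hit, one miss, one false alarm, and one correct negative, so the CSI is 1/3 and the HSS is 0, meaning no skill beyond chance.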

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each major point below and will revise the paper to strengthen clarity and completeness where indicated.

Referee: [Abstract and §4] Abstract and §4 (Experiments): the claim that PixelFlowCast 'outperforms existing mainstream methods in both prediction accuracy and inference efficiency' is unsupported in the provided text, which contains no quantitative metrics (e.g., CSI, RMSE, or SSIM values), ablation studies, error bars, or statistical tests on SEVIR. Without these, the central empirical claim cannot be evaluated.

Authors: We acknowledge the referee's concern. The submitted version's §4 does contain quantitative comparisons on SEVIR using CSI, RMSE, SSIM, and inference-time measurements, together with ablations against mainstream baselines. However, these results were not presented with sufficient prominence or statistical detail. In the revision we will expand §4 with explicit tables, error bars, and significance tests to fully substantiate the abstract claim. revision: yes
Referee: [§3.2] §3.2 (KANCondNet and PMF predictor): no auxiliary losses or constraints (e.g., non-negativity, mass conservation, or temporal coherence) are described for the pixel-space PMF trajectory. Standard flow-matching objectives alone do not guarantee preservation of fine-grained radar structures such as localized storm cores or intensity gradients over long sequences, directly risking the claimed fidelity advantage.

Authors: We agree that explicit physical constraints are valuable for radar nowcasting. Our design relies on the combination of latent-free pixel-space x-prediction and strong KANCondNet spatiotemporal conditioning to preserve fine-scale structures, which is supported by the reported qualitative and quantitative results. Nevertheless, we will add a dedicated paragraph in §3.2 explaining how the flow-matching objective, together with the conditioning, enforces temporal coherence and intensity fidelity. We will also include a brief ablation on mass-conservation effects in the revision. revision: partial
Referee: [§3.1] §3.1 (two-stage design): the deterministic coarse-forecast stage is introduced without equations or details on how its output interfaces with KANCondNet conditioning; if this stage already encodes most global dynamics, the incremental benefit of the subsequent PMF stage for long-horizon accuracy remains unclear and load-bearing for the efficiency claim.

Authors: We thank the referee for highlighting this omission. The deterministic coarse stage is a lightweight convolutional predictor whose low-resolution output is bilinearly upsampled and concatenated as an additional conditioning channel to KANCondNet. We will insert the missing equations and a clear interface diagram in the revised §3.1, together with an explicit statement that the PMF stage is responsible for high-frequency detail refinement, thereby justifying the efficiency gain from few-step sampling. revision: yes
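
The interface described in this response (a low-resolution coarse forecast that is bilinearly upsampled and concatenated with the conditioning input) can be sketched as follows. This is a minimal NumPy illustration under assumptions: frames are stacked along the first axis, the interpolation is corner-aligned, and the names `bilinear_upsample` and `build_condition_input` as well as the toy shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def bilinear_upsample(frames, out_h, out_w):
    """Bilinearly resize frames of shape (T, h, w) to (T, out_h, out_w), corner-aligned."""
    _, h, w = frames.shape
    ys = np.linspace(0.0, h - 1.0, out_h)
    xs = np.linspace(0.0, w - 1.0, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]     # vertical interpolation weights
    wx = (xs - x0)[None, None, :]     # horizontal interpolation weights
    top = frames[:, y0][:, :, x0] * (1 - wx) + frames[:, y0][:, :, x1] * wx
    bottom = frames[:, y1][:, :, x0] * (1 - wx) + frames[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy

def build_condition_input(past, coarse_lowres):
    """Upsample the coarse forecast to the context resolution and stack it with the context."""
    _, height, width = past.shape
    coarse_full = bilinear_upsample(coarse_lowres, height, width)
    return np.concatenate([past, coarse_full], axis=0)

past = np.zeros((12, 16, 16))                              # toy context frames
coarse = np.random.default_rng(0).random((36, 8, 8))       # toy low-resolution forecast
cond_input = build_condition_input(past, coarse)
```

Stacking along the frame axis keeps the conditioning encoder's input a single tensor; whether the actual model concatenates along a channel or a time axis is not specified in the text above.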

Circularity Check

0 steps flagged

No circularity: empirical claims rest on independent SEVIR evaluation

The paper proposes a new two-stage architecture (deterministic coarse forecast + KANCondNet conditioning + latent-free PMF with x-prediction) and supports its superiority claims solely via experiments on the public SEVIR dataset. No equations, fitted parameters, or self-citations are shown to reduce any prediction or uniqueness claim back to the inputs by construction. The method is presented as an engineering combination of existing ideas (CFM, flow matching) with novel components, evaluated externally rather than derived tautologically.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the available text.

pith-pipeline@v0.9.0 · 5534 in / 1213 out tokens · 37127 ms · 2026-05-12T02:47:47.444496+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "a latent-free, few-step Pixel Mean Flows (PMF) predictor employs an x-prediction mechanism to generate high-quality predictions"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "KANCondNet extracts deep spatiotemporal evolution features... learnable spline functions"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 7 internal anchors

[1] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” in Advances in Neural Information Processing Systems, C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, Eds., vol. 28. Curran Associates, Inc., 2015. [Online]. Available: https://proceedings.n...

[2] M. Veillette, S. Samsi, and C. Mattioli, “Sevir: A storm event imagery dataset for deep learning applications in radar and satellite meteorology,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 22009–22019. [Online]. Available: https://...

[3] A. Catão, M. Poveda, L. Voltarelli, and P. Orenstein, “Precipitation nowcasting of satellite data using physically-aligned neural networks,” [Online]. Available: https://arxiv.org/abs/2511.05471

[4] A. Nkunzimana, S. Bi, M. A. A. Alriah, T. Zhi, and N. A. D. Kur, “Diagnosis of meteorological factors associated with recent extreme rainfall events over burundi,” Atmospheric Research, vol. 244, p. 105069, [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0169809519317417

[5] A. Singhal, A. Raman, and S. K. Jha, “Potential use of extreme rainfall forecast and socio-economic data for impact-based forecasting at the district level in northern india,” Frontiers in Earth Science, vol. 10, 2022. [Online]. Available: https://www.frontiersin.org/journals/earth-science/articles/10.3389/feart.2022.846113

[6] D. Li, J. Wang, K. Deng, D. Zhang, C. Zhao, H. Leng, Y. Wen, Y. Liu, K. Ren, and J. Song, “Review on deep learning quantitative precipitation nowcasting: Advances and challenges,” Expert Systems with Applications, vol. 305, p. 130775, 2026. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417425043908

[7] X. Shi, Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, “Deep learning for precipitation nowcasting: A benchmark and a new model,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017. [Online]. Ava...

[8] Z. Gao, C. Tan, L. Wu, and S. Z. Li, “Simvp: Simpler yet better video prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 3170–3180.

[9] Z. Gao, X. Shi, H. Wang, Y. Zhu, Y. B. Wang, M. Li, and D.-Y. Yeung, “Earthformer: Exploring space-time transformers for earth system forecasting,” in Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 25390–25403. [Online]. Available: ...

[10] Y. Wang, H. Wu, J. Zhang, Z. Gao, J. Wang, P. S. Yu, and M. Long, “Predrnn: A recurrent neural network for spatiotemporal predictive learning,” 2022. [Online]. Available: https://arxiv.org/abs/2103.09504

[11] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 6840–6851. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2020/file/4c5bcfec8584af0d967f1ab10...

[12] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 10684–10695.

[13] Z. Gao, X. Shi, B. Han, H. Wang, X. Jin, D. Maddix, Y. Zhu, M. Li, and Y. B. Wang, “Prediff: Precipitation nowcasting with latent diffusion models,” in Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 78621–78656. [Online]. Available...

[14] D. Yu, X. Li, Y. Ye, B. Zhang, C. Luo, K. Dai, R. Wang, and X. Chen, “Diffcast: A unified framework via residual diffusion for precipitation nowcasting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024, pp. 27758–27767.

[15] J. Gong, L. Bai, P. Ye, W. Xu, N. Liu, J. Dai, X. Yang, and W. Ouyang, “Cascast: Skillful high-resolution precipitation nowcasting via cascaded modelling,” 2024. [Online]. Available: https://arxiv.org/abs/2402.04290

[16] B. P. Ribeiro and J. F. Pucer, “Flowcast: Advancing precipitation nowcasting with conditional flow matching,” 2026. [Online]. Available: https://arxiv.org/abs/2511.09731

[17] Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow matching for generative modeling,” 2023. [Online]. Available: https://arxiv.org/abs/2210.02747

[18] A. Tong, K. Fatras, N. Malkin, G. Huguet, Y. Zhang, J. Rector-Brooks, G. Wolf, and Y. Bengio, “Improving and generalizing flow-based generative models with minibatch optimal transport,” 2024. [Online]. Available: https://arxiv.org/abs/2302.00482

[19] X. Liu, C. Gong, and Q. Liu, “Flow straight and fast: Learning to generate and transfer data with rectified flow,” 2022. [Online]. Available: https://arxiv.org/abs/2209.03003

[20] Y. Lu, S. Lu, Q. Sun, H. Zhao, Z. Jiang, X. Wang, T. Li, Z. Geng, and K. He, “One-step latent-free image generation with pixel mean flows,” [Online]. Available: https://arxiv.org/abs/2601.22158

[21] Z. Geng, M. Deng, X. Bai, J. Z. Kolter, and K. He, “Mean flows for one-step generative modeling,” 2025. [Online]. Available: https://arxiv.org/abs/2505.13447

[22] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, ser. ICML ’08. New York, NY, USA: Association for Computing Machinery, 2008, p. 1096–1103. [Online]. Available: https://doi.org/10.1145/1390156.1390294

[23] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić, T. Y. Hou, and M. Tegmark, “Kan: Kolmogorov-arnold networks,” 2025. [Online]. Available: https://arxiv.org/abs/2404.19756

[24] A. D. Bodner, A. S. Tepsich, J. N. Spolski, and S. Pourteau, “Convolutional kolmogorov-arnold networks,” 2025. [Online]. Available: https://arxiv.org/abs/2406.13155

[25] I. Drokin, “Kolmogorov-arnold convolutions: Design principles and empirical studies,” 2024. [Online]. Available: https://arxiv.org/abs/2407.01092

[26] S. Somvanshi, S. A. Javed, M. M. Islam, D. Pandit, and S. Das, “A survey on kolmogorov-arnold network,” ACM Comput. Surv., vol. 58, no. 2, Sep. 2025. [Online]. Available: https://doi.org/10.1145/3743128

[27] J. Wang, Y. Zhang, L. Zhu, Q. Liu, and L. Wu, “Swinkan: A dual-polarization radar extrapolation model based on swin transformer and convolutional kolmogorov–arnold networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–18, 2025.

[28] Q. Cheng, Y. Su, Y. He, Y. Wu, F. Liu, Y. Rao, Y. Chao, K. Wang, Z. Liu, J. Liu, and Y. Chen, “Enhanced radar echo extrapolation for precipitation nowcasting quality using the convolutional kolmogorov–arnold networks,” Journal of Hydrology, vol. 663, p. 134134, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S00221...

[29] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W. kin Wong, and W. chun Woo, “Convolutional lstm network: A machine learning approach for precipitation nowcasting,” 2015. [Online]. Available: https://arxiv.org/abs/1506.04214

[30] Z. Xu, J. Du, J. Wang, C. Jiang, and Y. Ren, “Satellite image prediction relying on gan and lstm neural networks,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC), 2019, pp. 1–6.

[31] S. Ravuri, K. Lenc, M. Willson, D. Kangin, R. Lam, P. Mirowski, M. Fitzsimons, M. Athanassiadou, S. Kashem, S. Madge et al., “Skilful precipitation nowcasting using deep generative models of radar,” Nature, vol. 597, no. 7878, pp. 672–677, 2021.

[32] Y. Zhang, M. Long, K. Chen, L. Xing, R. Jin, M. I. Jordan, and J. Wang, “Skilful nowcasting of extreme precipitation with nowcastnet,” Nature, vol. 619, no. 7970, pp. 526–532, 2023.

[33] C. Meo, A. Roy, M. Lică, J. Yin, Z. B. Che, Y. Wang, R. Imhoff, R. Uijlenhoet, and J. Dauwels, “Extreme precipitation nowcasting using transformer-based generative models,” 2024. [Online]. Available: https://arxiv.org/abs/2403.03929

[34] J. Leinonen, U. Hamann, D. Nerini, U. Germann, and G. Franch, “Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification,” 2023. [Online]. Available: https://arxiv.org/abs/2304.12891

[35] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” 2022. [Online]. Available: https://arxiv.org/abs/1312.6114

[36] A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” 2018. [Online]. Available: https://arxiv.org/abs/1711.00937

[37] Y. Li, D. Niu, Y. Li, Z. Zang, H. Wang, and M. Jiang, “Scrd: A spatiotemporal cues-guided residual diffusion model for precipitation nowcasting,” IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1–5, 2024.

[38] M. Raissi, P. Perdikaris, and G. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational Physics, vol. 378, pp. 686–707, 2019. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0021999118307125

[39] V. L. Guen and N. Thome, “Disentangling physical dynamics from unknown factors for unsupervised video prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.

[40] L. Geng, J. Min, H. Geng, and X. Zhuang, “Three-dimensional radar echo extrapolation using a physics-constrained deep learning model,” Remote Sensing, vol. 18, no. 2, 2026. [Online]. Available: https://www.mdpi.com/2072-4292/18/2/206

[41] G. Larvor, L. Berthomier, V. Chabot, B. Le Pape, B. Pradel, and L. Perez, “Meteonet, an open reference weather dataset,” 2020.

APPENDIX

This document provides supplementary material for the main manuscript. The contents are organized as follows: Appendix A elaborates on the extended formulation ...
[45]

Geometric Interpretation and the Role of Auxiliary Time Step r:To avoid the optimization collapse caused by directly regressing chaotic vector fields in high-dimensional radar pixel spaces, PixelFlowCast decouples the prediction and optimization spaces through an auxiliary target time step r∈[0, t]. During training, rather than directly regressing the exa...

[46]

Extension to Multi-Step Sampling: The original Pixel Mean Flows (PMF) framework [20] was primarily introduced and evaluated for one-step image generation (1-NFE). However, modeling the highly chaotic and complex spatiotemporal dynamics of extreme precipitation systems in a single step often leads to the underestimation of high-threshold meteorological deta...

[47]

Noise-Free Extraction at the Final Step: This interval-based multi-step formulation provides the geometric basis for the final extraction strategy detailed in Section 3.4 of the main text. While iteratively updating Z_curr constructs the evolutionary sequence, integrating continuous velocity fields across discrete steps can still accumulate residual numeri...
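The interval-based sampling loop and the final-step extraction can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, since the exact update rule is truncated in this excerpt: the time convention (t = 1 is noise, t = 0 is data), the uniform time grid, the relation u = (z_t − x̂) / t used to turn an x-prediction into an average velocity, and the names `multi_step_sample` and `toy_predictor`, which are hypothetical. The toy predictor stands in for the conditioned network.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature: (current state, target time r, current time t) -> predicted clean sample
XPredictor = Callable[[Vector, float, float], Vector]


def multi_step_sample(x_predictor: XPredictor, z_init: Vector, num_steps: int) -> Vector:
    """Few-step sampling from t = 1 (assumed noise end) to t = 0 (assumed data end)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    z_curr = list(z_init)
    for k in range(num_steps):
        t = 1.0 - k / num_steps          # current time
        r = 1.0 - (k + 1) / num_steps    # auxiliary target time, r < t
        x_hat = x_predictor(z_curr, r, t)
        if k == num_steps - 1:
            # Final step: return the x-prediction directly instead of
            # integrating the running state one more time.
            return x_hat
        # Assumed conversion from x-prediction to average velocity: u = (z - x_hat) / t,
        # followed by the interval update z_r = z_t - (t - r) * u.
        z_curr = [z - (t - r) * (z - x) / t for z, x in zip(z_curr, x_hat)]
    return z_curr  # not reached; kept for type completeness


# Toy stand-in for the network: always predicts the same clean sample.
TARGET = [0.5, 0.25]


def toy_predictor(z: Vector, r: float, t: float) -> Vector:
    return list(TARGET)


sample = multi_step_sample(toy_predictor, z_init=[2.0, -1.0], num_steps=4)
```

With `num_steps=1` the loop reduces to a single network call, which matches the one-step (1-NFE) setting described for the original PMF framework.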

[48]

From the original 49-frame SEVIR events, we extract continuous sequences of length 48 (12 context frames and 36 prediction frames) using a sliding window with a stride of 1

Additional Dataset and Preprocessing Details: To ensure full reproducibility, we detail the exact data preprocessing and evaluation pipelines. From the original 49-frame SEVIR events, we extract continuous sequences of length 48 (12 context frames and 36 prediction frames) using a sliding window with a stride of 1. The raw SEVIR VIL data, stored as 16-bit ...
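The sliding-window extraction described above can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the function name `sliding_windows` is hypothetical, and integer frame indices stand in for the radar frames.

```python
from typing import List, Sequence, Tuple

CONTEXT_LEN = 12                          # context frames, as stated in the text
PREDICT_LEN = 36                          # prediction frames, as stated in the text
WINDOW_LEN = CONTEXT_LEN + PREDICT_LEN    # 48 frames per sequence


def sliding_windows(
    event: Sequence, window: int = WINDOW_LEN, stride: int = 1
) -> List[Tuple[Sequence, Sequence]]:
    """Split one event into (context, target) pairs with a sliding window."""
    pairs = []
    for start in range(0, len(event) - window + 1, stride):
        clip = event[start:start + window]
        pairs.append((clip[:CONTEXT_LEN], clip[CONTEXT_LEN:]))
    return pairs


# Toy event: 49 frame indices in place of 49 radar frames.
event = list(range(49))
pairs = sliding_windows(event)
```

With a stride of 1, a 49-frame event yields 49 − 48 + 1 = 2 overlapping sequences, starting at frames 0 and 1.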

[49]

We summarize the specific hyperparameter configurations of our instantiated SimVP in Table S-3

Deterministic Backbone: SimVP: In the first stage of the PixelFlowCast framework, we employ SimVP [8] as the deterministic base predictor to capture the macroscopic spatiotemporal evolution trend, denoted as X̂_coarse. We summarize the specific hyperparameter configurations of our instantiated SimVP in Table S-3. The model takes the past T_in = 12 frames as...

[50]

As formulated in Section 3.3 of the main text, KANCondNet strategically replaces traditional fixed activations with learnable B-splines

Condition Encoder: KANCondNet: To effectively guide the PMF predictor, KANCondNet is designed to extract precise multi-scale spatiotemporal conditions H_c from the concatenated past context X_past and the coarse baseline X̂_coarse. As formulated in Section 3.3 of the main text, KANCondNet strategically replaces traditional fixed activations with learnable B-s...

[51]

In our implementation, F_θ is instantiated based on the Global-Temporal U-Net (GTUnet) architecture originally proposed in DiffCast [14]

Pixel Mean Flows Predictor: Modified GTUnet: In Section 3.4 of the main text, the core of our PMF predictor is abstracted as the x-prediction model F_θ. In our implementation, F_θ is instantiated based on the Global-Temporal U-Net (GTUnet) architecture originally proposed in DiffCast [14]. GTUnet exhibits strong capabilities in modeling complex meteorolog...

[52]

To accommodate the substantial computational footprint inherent in long-term spatiotemporal sequence forecasting, we employ Bfloat16 Mixed Precision (bf16-mixed) during training

Training Configurations and Optimization Details: The proposed PixelFlowCast framework is implemented using PyTorch and trained with the Distributed Data Parallel (DDP) strategy across multiple NVIDIA GeForce RTX 3090 (24GB) GPUs. To accommodate the substantial computational footprint inherent in long-term spatiotemporal sequence forecasting, we employ Bfl...

[53]

Consequently, a detailed discussion on computational speed is omitted from the main manuscript

Ablation Study on Inference Speed: Compared to traditional diffusion paradigms, continuous flow-based models intrinsically benefit from significantly accelerated inference. Consequently, a detailed discussion on computational speed is omitted from the main manuscript. To comprehensively supplement the architectural evaluations, this section explicitly pr...

[54]

Ablation Study on Inference Steps: As detailed in Section A2, while the original PMF framework is conceptualized for one-step generation, modeling the highly chaotic dynamics of extreme precipitation systems necessitates a multi-step sampling strategy. To determine the optimal configuration, we conduct a comprehensive ablation study on the number of infe...

[55]

Ablation Study on Noise-Free Extraction Strategy: In Section A3, we proposed a Noise-Free Extraction strategy for the final sampling step. Rather than performing a final numerical integration to obtain the accumulated state Z_curr, our framework directly outputs the terminal virtual intercept X̂_pred to bypass residual numerical noise. To empirically validat...

[56]


Dataset: The MeteoNet dataset is provided by the French national meteorological service (Météo-France). This dataset captures the evolution of radar echoes over the French territory, featuring a high spatial resolution of 0.01° (approximately 1 km) on an original grid of 565×784 pixels, and a temporal resolution of 5 minutes. Following a consistent ex...

[57]

In the construction of the time series, each sample is a 48-frame sequence extracted from the continuous radar observations

Data Preprocessing: Consistent with the preprocessing strategy applied to the SEVIR dataset, we addressed computing resource limitations by cropping and downsampling the spatial dimensions of all original MeteoNet radar frames to 128×128 pixels during the preprocessing stage. In the construction of the time series, each sample is a 48-frame sequence extra...

[58]

Quantitative Results: The quantitative and qualitative results on the MeteoNet dataset are summarized in Tables S-10 and S-11, as well as Figures S-9, S-10, S-11, and S-12. Overall, the empirical performance follows a trend highly consistent with that observed on the SEVIR dataset, further supporting the effectiveness and generalizability of PixelFlowCast. F...
