arxiv: 2604.12180 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI


CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting

Renlong Hang , Zihao Xu , Jiuwei Zhao , Runling Yu , Leye Cheng , Qingshan Liu


Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3


keywords tropical cyclone forecasting · masked autoencoder · multi-task learning · probabilistic forecasting · deep learning for meteorology · numerical weather prediction · global ocean basins · interpretable weather models


The pith

CycloneMAE pretrains a structure-aware masked autoencoder on multi-modal data to deliver both deterministic and probabilistic tropical cyclone forecasts that outperform numerical weather prediction models in pressure, wind, and short-term track forecasting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CycloneMAE as a multi-task model that first pretrains a tropical-cyclone-structure-aware masked autoencoder on satellite imagery and environmental fields to extract shared representations across variables and basins. It then fine-tunes the model with a discrete probabilistic gridding layer so that a single network produces both point forecasts and full probability distributions for track, central pressure, and wind speed. Evaluated on five global ocean basins, the resulting forecasts improve on leading numerical weather prediction systems for pressure and wind through 120 hours and for track through 24 hours while also exposing how the model shifts attention from internal convective structure at short lead times to external environmental drivers at longer ranges.
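As a rough illustration of the pretraining objective described above, the sketch below masks a fraction of input patches and scores the reconstruction only on the masked ones. It uses uniform random masking as a stand-in, because the rule behind the paper's TC-structure-aware masking is not given here. The names `random_mask` and `masked_mse`, the 0.75 mask ratio, and the toy patches are all assumptions made for illustration.

```python
import random

def random_mask(num_patches, mask_ratio, seed=0):
    """Return a list of booleans; True marks a patch hidden from the encoder.

    Uniform random masking is an assumption here, not the paper's
    structure-aware scheme.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def masked_mse(original, reconstructed, mask):
    """Mean squared error over masked patches only.

    Each patch is a list of floats; unmasked patches do not contribute.
    """
    total, count = 0.0, 0
    for orig, recon, is_masked in zip(original, reconstructed, mask):
        if is_masked:
            total += sum((o - r) ** 2 for o, r in zip(orig, recon))
            count += len(orig)
    return total / count if count else 0.0

# Toy example: 16 patches of 4 values each (hypothetical data).
patches = [[float(i + j) for j in range(4)] for i in range(16)]
mask = random_mask(len(patches), mask_ratio=0.75)
# A stand-in "reconstruction" that is off by 1.0 everywhere.
recon = [[v + 1.0 for v in patch] for patch in patches]
loss = masked_mse(patches, recon, mask)
```

Restricting the loss to masked patches follows the original masked-autoencoder recipe; whether CycloneMAE does the same is not stated in the material summarized here.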

Core claim

By coupling a TC-structure-aware masked autoencoder with a discrete probabilistic gridding mechanism inside a pre-train/fine-tune workflow, CycloneMAE simultaneously produces deterministic forecasts and well-calibrated probability distributions for tropical-cyclone track, pressure, and wind; when tested across five ocean basins these outputs exceed the accuracy of operational numerical weather prediction systems for pressure and wind up to 120 hours and for track up to 24 hours, with integrated-gradients attribution showing a physically consistent progression from reliance on core convective features to reliance on surrounding environmental fields as forecast horizon increases.

What carries the argument

The TC-structure-aware masked autoencoder that reconstructs masked multi-modal inputs (satellite imagery plus environmental fields) to learn transferable representations, paired with a discrete probabilistic gridding layer that converts latent features into both point estimates and probability distributions over a gridded output space.
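One plausible reading of "discrete probabilistic gridding" is a classification head over discretized bins of the target variable, from which both a distribution and a point estimate can be read. The sketch below works under that assumption only. The bin centers, the logits, and the function `grid_forecast` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grid_forecast(logits, bin_centers):
    """Convert per-bin logits into (probabilities, expected value, modal bin center)."""
    if len(logits) != len(bin_centers):
        raise ValueError("logits and bin_centers must have the same length")
    probs = softmax(logits)
    # Point forecast as the probability-weighted mean of bin centers.
    mean = sum(p * c for p, c in zip(probs, bin_centers))
    # Alternative point forecast: the center of the most probable bin.
    mode = bin_centers[max(range(len(probs)), key=probs.__getitem__)]
    return probs, mean, mode

# Hypothetical wind-speed bins (m/s) and hypothetical network outputs.
bin_centers = [20.0, 30.0, 40.0, 50.0]
logits = [0.0, 1.0, 2.0, 1.0]
probs, mean, mode = grid_forecast(logits, bin_centers)
```

Under this reading, the deterministic and probabilistic outputs come from the same forward pass, which would be consistent with the claim that no ensemble post-processing is needed.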

If this is right

A single pretrained backbone can be fine-tuned for multiple forecast variables instead of training separate models for each.
Probabilistic outputs are generated directly by the network rather than by post-processing an ensemble.
Attention maps derived from integrated gradients provide a built-in diagnostic of whether the model is using physically plausible features at different lead times.
Historical multi-modal archives can be leveraged for pretraining without running expensive numerical integrations.
Operational systems could replace or augment select numerical weather prediction runs with the lighter deep-learning inference for the first 24-120 hours.
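For readers unfamiliar with the attribution method named in the list above, here is a minimal sketch of integrated gradients on a toy function with a hand-written gradient. The function `f`, its gradient `grad_f`, the input, and the all-zero baseline are hypothetical; a real application would differentiate through the trained network instead.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * average over alpha in (0, 1) of
                    d f / d x_i evaluated at baseline + alpha * (x - baseline)
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_fn(point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

# Toy differentiable "model": f(x) = x0^2 + 3 * x1 (hypothetical).
def f(x):
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    return [2.0 * x[0], 3.0]

x = [2.0, 1.0]
baseline = [0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)
# Completeness check: attributions should sum to f(x) - f(baseline).
gap = abs(sum(attributions) - (f(x) - f(baseline)))
```

The completeness property checked at the end is what lets attributions be compared across input groups, for example satellite channels versus environmental fields, at each lead time.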

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pretraining recipe could be adapted to other high-impact weather phenomena whose internal structure is visible in satellite imagery, such as extratropical cyclones or atmospheric rivers.
The observed shift from internal to external drivers suggests that future hybrid systems could route short-range forecasts through imagery-heavy branches and long-range forecasts through environment-heavy branches.
If the discrete gridding layer proves portable, it could be grafted onto other weather foundation models to add native probabilistic capability without retraining the entire backbone.

Load-bearing premise

The learned representations remain effective when transferred across ocean basins, forecast variables, and lead times without requiring basin-specific retraining or post-hoc calibration of the probability outputs.

What would settle it

On a held-out basin or variable, CycloneMAE shows no statistically significant improvement over the best numerical weather prediction baseline in root-mean-square error for pressure or wind at 72-120 hours, or the Brier score for the probabilistic track outputs exceeds that of a calibrated ensemble reference.
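The two scores named in this test can be computed as in the sketch below. It assumes a binary event for the Brier score (for instance, whether the storm center falls inside a given grid cell), which is a simplification chosen for illustration. The sample numbers are made up and are not results from the paper.

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between paired forecasts and observations."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("pred and obs must be non-empty and equal in length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect forecast.
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Made-up central-pressure forecasts and observations (hPa).
pressure_rmse = rmse([950.0, 960.0, 970.0], [952.0, 958.0, 975.0])
# Made-up event probabilities and whether the event occurred.
track_brier = brier_score([0.8, 0.3, 0.6], [1, 0, 1])
```

A falsification test of the kind described would compare these scores against the NWP and ensemble baselines on the same held-out cases, with a significance test on the paired differences.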

Original abstract

Tropical cyclones (TCs) rank among the most destructive natural hazards, yet their forecasting faces fundamental trade-offs: numerical weather prediction (NWP) models are computationally prohibitive and struggle to leverage historical data, while existing deep learning (DL)-based intelligent models are variable-specific and deterministic, which fail to generalize across different forecasting variables. Here we present CycloneMAE, a scalable multi-task forecasting model that learns transferable TC representations from multi-modal data using a TC structure-aware masked autoencoder. By coupling a discrete probabilistic gridding mechanism with a pre-train/fine-tune paradigm, CycloneMAE simultaneously delivers deterministic forecasts and probability distributions. Evaluated across five global ocean basins, CycloneMAE outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours. Attribution analysis via integrated gradients reveals physically interpretable learning dynamics: short-term forecasts rely predominantly on the internal core convective structure from satellite imagery, whereas longer-term forecasts progressively shift attention to external environmental factors. Our framework establishes a scalable, probabilistic, and interpretable pathway for operational TC forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CycloneMAE applies a TC-aware masked autoencoder with probabilistic gridding in a multi-task setup and claims gains over NWP, but the abstract gives no metrics or ablations to back the transferability and calibration assumptions.


The main takeaway is that this paper takes the masked autoencoder idea and adapts it for global tropical cyclone forecasting with a structure-aware pretraining step, then adds discrete probabilistic gridding so the model can output both point forecasts and distributions in one multi-task framework. It reports better pressure and wind skill than leading NWP out to 120 hours and better track skill out to 24 hours across five basins, plus an integrated-gradients analysis that shows the model moving from internal convective features at short range to environmental drivers at longer range. That combination of elements is the actual new piece; the underlying MAE and pretrain/fine-tune pattern are already in the literature, but the TC-specific masking and the gridding mechanism for joint deterministic-probabilistic output are a fresh application to this domain. The interpretability section is also a clear positive because it ties the learned representations back to physical intuition rather than leaving them as a black box.

The framing of the problem is straightforward: NWP is too expensive and data-inefficient, while prior DL models are usually single-variable and deterministic. The paper tries to address both issues at once.

The soft spot is exactly the one flagged in the stress test. The central performance claims rest on the assumption that the structure-aware representations transfer across variables, basins, and lead times, and that the discrete gridding produces well-calibrated probabilities without hidden post-processing. The abstract supplies none of the usual checks: no ablation tables, no basin-by-basin breakdowns, no reliability diagrams, no calibration scores, and no error bars. Without those, it is impossible to tell whether the reported gains are real or whether they come from the multi-task construction itself or from dataset quirks. The full manuscript may contain the missing numbers, but on the evidence given the argument is not yet load-bearing.

This work is aimed at the intersection of machine learning and operational meteorology. Anyone building probabilistic environmental models or looking for ways to make MAE-style pretraining useful for gridded physical data could pick up useful architectural details. It is worth sending to peer review because the topic has clear societal value and the approach is coherent on its own terms, but any referee will need to see the quantitative validation and calibration evidence before the outperformance claims can be taken at face value.

Referee Report

2 major / 0 minor

Summary. The manuscript presents CycloneMAE, a scalable multi-task learning model for global tropical cyclone probabilistic forecasting. It uses a TC structure-aware masked autoencoder to learn transferable representations from multi-modal data, combined with a discrete probabilistic gridding mechanism in a pre-train/fine-tune paradigm to deliver both deterministic forecasts and probability distributions. The central claim is that, evaluated across five global ocean basins, CycloneMAE outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours, with attribution analysis via integrated gradients revealing physically interpretable attention shifts from internal convective structures to external environmental factors.

Significance. If the empirical claims hold after verification, this work could be significant for operational TC forecasting by providing a computationally efficient, multi-task DL alternative to NWP that jointly handles multiple variables and outputs calibrated probabilities, along with built-in interpretability that aligns with physical understanding of cyclone dynamics.

major comments (2)

[Abstract] Abstract: The assertion that CycloneMAE 'outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours' across five basins supplies no quantitative metrics, baseline details, error bars, or validation procedures. This is load-bearing for the central claim and leaves it impossible to determine whether the data actually support outperformance.
[Abstract] Abstract and evaluation description: The central claim requires that the masked autoencoder's structure-aware representations transfer across variables, basins, and horizons, and that the discrete probabilistic gridding produces well-calibrated probabilities without post-hoc tuning driving the gains. No ablation results, cross-basin breakdowns, reliability diagrams, or calibration scores are supplied to verify these conditions.
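The calibration evidence requested in the second comment could take the form of a reliability table and a summary score. The sketch below shows one common way to build them, using equal-width probability bins and the expected calibration error. The bin count, function names, and sample data are assumptions made for illustration and say nothing about how the authors evaluate calibration.

```python
def reliability_table(probs, outcomes, n_bins=5):
    """Group forecasts into equal-width probability bins.

    Returns one (mean forecast probability, observed frequency, count)
    row per non-empty bin; plotting column 2 against column 1 gives a
    reliability diagram.
    """
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of probs, sum of outcomes
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += o
    return [(sp / c, so / c, c) for c, sp, so in bins if c]

def expected_calibration_error(rows):
    """Count-weighted mean gap between forecast probability and observed frequency."""
    total = sum(c for _, _, c in rows)
    return sum(c / total * abs(mp - freq) for mp, freq, c in rows)

# Made-up forecast probabilities and binary outcomes.
probs = [0.1, 0.1, 0.9, 0.9]
outcomes = [0, 0, 1, 1]
rows = reliability_table(probs, outcomes)
ece = expected_calibration_error(rows)
```

A well-calibrated forecast has rows close to the diagonal, where forecast probability equals observed frequency, and a small expected calibration error.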

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We address the major comments point by point, agreeing to enhance the abstract and evaluation sections with additional details and analyses to better support our claims.


Referee: [Abstract] Abstract: The assertion that CycloneMAE 'outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours' across five basins supplies no quantitative metrics, baseline details, error bars, or validation procedures. This is load-bearing for the central claim and leaves it impossible to determine whether the data actually support outperformance.

Authors: We agree that the abstract would benefit from including key quantitative metrics to substantiate the outperformance claim. In the revised manuscript, we will update the abstract to include specific metrics, such as percentage improvements in mean sea level pressure (MSLP) and maximum sustained wind (MSW) forecasts up to 120 hours, track errors up to 24 hours, along with references to the NWP baselines used and the validation across the five basins. Error bars and validation procedures will be briefly noted. Full details remain in the main text and supplementary materials. This change will be incorporated. revision: yes
Referee: [Abstract] Abstract and evaluation description: The central claim requires that the masked autoencoder's structure-aware representations transfer across variables, basins, and horizons, and that the discrete probabilistic gridding produces well-calibrated probabilities without post-hoc tuning driving the gains. No ablation results, cross-basin breakdowns, reliability diagrams, or calibration scores are supplied to verify these conditions.

Authors: We agree that to fully verify the transfer of representations and the calibration of probabilities, additional analyses are warranted. In the revised manuscript, we will include ablation studies on the TC structure-aware masked autoencoder, the discrete probabilistic gridding, and the pre-train/fine-tune paradigm. We will also provide cross-basin breakdowns, reliability diagrams, and calibration scores to confirm the claims. These will be added to the evaluation section and supplementary material. revision: yes

Circularity Check

0 steps flagged

Standard pre-train/fine-tune pipeline with no circular reductions


The paper presents CycloneMAE as a masked autoencoder pre-trained on multi-modal TC data then fine-tuned for multi-task probabilistic forecasting. This follows a conventional supervised learning workflow evaluated on held-out data across five basins. No equations or claims reduce any 'prediction' or 'first-principles result' to fitted inputs by construction, nor do self-citations serve as load-bearing justification for the central outperformance claims. Attribution analysis and discrete gridding are presented as architectural choices, not tautological derivations. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond naming the model and its components; full paper would be needed to audit training hyperparameters or architectural choices.

pith-pipeline@v0.9.0 · 5504 in / 1144 out tokens · 54281 ms · 2026-05-10T15:55:32.126459+00:00 · methodology

