Land cover and flood type govern the detection limits of satellite-based flood mapping across diverse global flood events
Pith reviewed 2026-06-27 21:38 UTC · model grok-4.3
The pith
Land cover and flood type jointly determine satellite flood mapping accuracy across global events.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 19 out-of-distribution flood events, Prithvi-EO-2.0 reaches its highest agreement in cropland (IoU 52 percent) and its strongest detection for riverine floods (F1 0.69), yet shows near-zero detection in tree cover and built-up areas (IoU 4 percent) regardless of flood mechanism. Dual-reference validation shows that part of the apparent error reflects definitional inconsistency between the two reference products rather than detection failure. Iterative pipeline testing isolates 23 failure modes, with pipeline engineering rather than model capacity dominating initial error. Together these results establish environment-dependent detection boundaries for operational satellite flood mapping.
What carries the argument
Joint dependence of detection metrics (IoU and F1) on land cover class and flood mechanism
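To make the stratified metrics concrete, here is a minimal sketch of how IoU and F1 can be computed per land-cover class from binary flood masks. It is not the paper's pipeline: the function names, the flat-list mask layout, the class labels, and the toy values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def confusion_by_class(pred, ref, land_cover):
    """Tally true positives, false positives, and false negatives per class.

    All three inputs are flat, equal-length sequences: pred and ref hold
    0/1 flood labels, land_cover holds a class label for each pixel.
    """
    if not (len(pred) == len(ref) == len(land_cover)):
        raise ValueError("pred, ref, and land_cover must have equal length")
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for p, r, c in zip(pred, ref, land_cover):
        cell = counts[c]  # creates the class entry even if it has no flood pixels
        if p and r:
            cell["tp"] += 1
        elif p:
            cell["fp"] += 1
        elif r:
            cell["fn"] += 1
    return dict(counts)


def iou(tp, fp, fn):
    """Intersection over union of the flood class: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


def f1(tp, fp, fn):
    """F1 score of the flood class: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


# Toy masks (hypothetical values, not data from the paper).
pred = [1, 1, 0, 1, 0, 0, 0, 1]
ref = [1, 1, 1, 0, 1, 1, 1, 0]
land_cover = ["cropland"] * 4 + ["tree_cover"] * 4

scores = {
    cls: {"iou": iou(**c), "f1": f1(**c)}
    for cls, c in confusion_by_class(pred, ref, land_cover).items()
}
for cls, s in scores.items():
    print(cls, round(s["iou"], 3), round(s["f1"], 3))
```

Grouping the same tallies by flood mechanism instead of land cover, or by the pair of both, gives the joint breakdown the claim rests on.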
If this is right
- Detection performance reaches its highest levels in cropland and for riverine flood events.
- Tree cover and built-up areas produce near-zero detection irrespective of the flood mechanism.
- Inconsistencies between reference products account for a portion of measured model error.
- Pipeline engineering choices account for more of the initial error than model capacity does.
- Operational satellite flood mapping is bounded by these environment-specific limits.
Where Pith is reading between the lines
- Monitoring systems could prioritize cropland and riverine zones where satellite detection is most reliable.
- Urban and forested flood response may need supplementary ground or radar data sources.
- Model training could incorporate land-cover stratification to reduce systematic blind spots.
Load-bearing premise
The two independent reference products supply sufficiently consistent and accurate benchmarks for true flood extent despite definitional differences between them.
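One way to probe this premise is to measure how well the two references agree with each other in each land-cover class, and how much of the model's error falls on pixels where they disagree. The sketch below is illustrative only: the function names, the flat-list layout, and the toy masks are assumptions, and the contested share is best read as an upper bound on error attributable to reference mismatch, not as a measurement of it.

```python
def reference_agreement(ref_a, ref_b, land_cover):
    """Per-class IoU between two reference flood masks (flat 0/1 sequences)."""
    inter, union = {}, {}
    for a, b, c in zip(ref_a, ref_b, land_cover):
        inter[c] = inter.get(c, 0) + (1 if a and b else 0)
        union[c] = union.get(c, 0) + (1 if a or b else 0)
    return {
        c: (inter[c] / union[c] if union[c] else float("nan"))
        for c in union
    }


def contested_error_fraction(pred, ref_a, ref_b):
    """Share of model-vs-ref_a errors on pixels where ref_b disagrees with ref_a."""
    errors = contested = 0
    for p, a, b in zip(pred, ref_a, ref_b):
        if bool(p) != bool(a):  # the model is wrong according to ref_a
            errors += 1
            if bool(a) != bool(b):  # the references themselves disagree here
                contested += 1
    return contested / errors if errors else float("nan")


# Toy masks (hypothetical values, not data from the paper).
ref_a = [1, 1, 1, 0, 1, 1, 0, 0]
ref_b = [1, 1, 1, 0, 0, 1, 1, 0]
pred = [1, 1, 0, 0, 0, 0, 0, 0]
land_cover = ["cropland"] * 4 + ["tree_cover"] * 4

agreement = reference_agreement(ref_a, ref_b, land_cover)
contested = contested_error_fraction(pred, ref_a, ref_b)
print(agreement, round(contested, 3))
```

If inter-reference agreement were much lower in tree cover and built-up areas than in cropland, the near-zero IoU reported for those classes would be harder to read as a pure detection limit.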
What would settle it
A documented flood event in tree-covered or built-up terrain where the model reaches IoU above 30 percent against both independent references would undermine the near-zero detection claim.
Original abstract
Floods are among the most destructive natural hazards, and their increasing frequency under climate change makes satellite-based inundation mapping essential for disaster response. Geospatial foundation models pretrained on satellite archives offer geographic transferability, but their operational reliability across diverse, unseen events remains uncharacterized. Here we deploy Prithvi-EO-2.0 across 19 out-of-distribution flood events (2017-2025) spanning six continents, eight climate zones, and six flood mechanisms, validating against two independent reference products. Detection accuracy depended jointly on land cover and flood type, with cropland yielding the highest agreement (IoU=52%) and riverine events the strongest detection (F1=0.69), while tree cover and built-up areas showed near-zero detection (IoU=4%) regardless of flood mechanism. Dual-reference validation revealed that apparent model error partly reflects definitional inconsistency between reference products rather than detection failure. Iterative pipeline testing identified 23 failure modes, with pipeline engineering dominating initial error over model capacity. These findings establish environment-dependent detection boundaries for operational satellite flood mapping.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates the Prithvi-EO-2.0 geospatial foundation model on 19 out-of-distribution flood events (2017-2025) spanning six continents, eight climate zones, and six flood mechanisms. It validates model outputs against two independent reference products and reports that detection accuracy depends jointly on land cover and flood type, with cropland yielding the highest agreement (IoU=52%) and riverine events the strongest detection (F1=0.69), while tree cover and built-up areas show near-zero detection (IoU=4%) regardless of flood mechanism. Dual-reference validation indicates that some apparent model errors reflect definitional inconsistencies between references rather than detection failure, and iterative testing identifies 23 failure modes where pipeline engineering dominates over model capacity.
Significance. If the central empirical findings hold after addressing reference-consistency quantification, the work provides valuable environment-dependent detection boundaries for operational satellite flood mapping. The multi-continent, multi-mechanism out-of-distribution evaluation and explicit acknowledgment of reference inconsistencies are strengths that move beyond single-benchmark assessments common in the field.
major comments (2)
- [Abstract] Abstract and validation results: The claim that dual-reference validation shows 'apparent model error partly reflects definitional inconsistency' lacks a quantitative breakdown (e.g., per-land-cover agreement rates between the two references or the fraction of reported IoU/F1 'error' attributable to reference mismatch). This is load-bearing for the central claim because the near-zero IoU=4% in tree cover and built-up areas could be an artifact of higher benchmark disagreement in those classes rather than a true model detection limit.
- [Methods] Methods (event selection and validation protocol): No details are provided on data exclusion criteria, statistical tests for the reported IoU/F1 values, or how the 19 events were chosen from the larger set of possible floods. This affects interpretability of the joint land-cover/flood-type dependence, as post-hoc selection or inconsistent reference handling could inflate the reported environment-specific patterns.
minor comments (1)
- [Abstract] The abstract reports specific metrics (IoU=52%, F1=0.69, IoU=4%) without accompanying confidence intervals or sample sizes per category; adding these would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our results on environment-dependent flood detection limits with Prithvi-EO-2.0. We address each major comment below and will revise the manuscript accordingly to improve transparency and quantification.
Point-by-point responses
- Referee: [Abstract] Abstract and validation results: The claim that dual-reference validation shows 'apparent model error partly reflects definitional inconsistency' lacks a quantitative breakdown (e.g., per-land-cover agreement rates between the two references or the fraction of reported IoU/F1 'error' attributable to reference mismatch). This is load-bearing for the central claim because the near-zero IoU=4% in tree cover and built-up areas could be an artifact of higher benchmark disagreement in those classes rather than a true model detection limit.
  Authors: We agree that the current presentation would benefit from explicit quantification to support the claim. In the revised manuscript, we will add a new table or section reporting per-land-cover agreement rates (e.g., IoU and F1) between the two reference products, along with the estimated fraction of model-reference discrepancies attributable to definitional mismatch versus actual detection failure. This will directly address whether the low performance in tree cover and built-up areas is inflated by reference inconsistency.
  Revision: yes
- Referee: [Methods] Methods (event selection and validation protocol): No details are provided on data exclusion criteria, statistical tests for the reported IoU/F1 values, or how the 19 events were chosen from the larger set of possible floods. This affects interpretability of the joint land-cover/flood-type dependence, as post-hoc selection or inconsistent reference handling could inflate the reported environment-specific patterns.
  Authors: We acknowledge that additional methodological transparency is warranted. We will expand the Methods section to include: (1) explicit data exclusion criteria applied to the initial pool of flood events, (2) the rationale and process for selecting the final 19 events (including any stratification by continent, climate zone, or mechanism to avoid post-hoc bias), and (3) statistical tests or uncertainty measures (e.g., bootstrap confidence intervals) for the reported IoU and F1 scores. These additions will strengthen the interpretability of the land-cover and flood-type dependence findings.
  Revision: yes
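The bootstrap confidence intervals mentioned in the response could look roughly like the sketch below, which resamples whole flood events with replacement and recomputes a pooled IoU each time. This is an assumption about one reasonable design, not the authors' stated procedure: the per-event count dictionaries, the function names, the percentile method, and the toy numbers are all hypothetical.

```python
import random


def pooled_iou(events):
    """IoU over the summed TP/FP/FN counts of a list of events."""
    tp = sum(e["tp"] for e in events)
    fp = sum(e["fp"] for e in events)
    fn = sum(e["fn"] for e in events)
    denom = tp + fp + fn
    return tp / denom if denom else float("nan")


def bootstrap_ci(events, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat, resampling events with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(events)
    draws = []
    for _ in range(n_boot):
        sample = [events[rng.randrange(n)] for _ in range(n)]
        value = stat(sample)
        if value == value:  # skip NaN from resamples with no flood pixels
            draws.append(value)
    if not draws:
        raise ValueError("no valid bootstrap draws")
    draws.sort()
    lo = draws[int((alpha / 2) * (len(draws) - 1))]
    hi = draws[int((1 - alpha / 2) * (len(draws) - 1))]
    return lo, hi


# Toy per-event pixel counts (hypothetical values, not data from the paper).
events = [
    {"tp": 50, "fp": 20, "fn": 30},
    {"tp": 10, "fp": 5, "fn": 45},
    {"tp": 80, "fp": 10, "fn": 10},
    {"tp": 5, "fp": 2, "fn": 60},
]

point = pooled_iou(events)
low, high = bootstrap_ci(events, pooled_iou)
print(round(point, 3), round(low, 3), round(high, 3))
```

Resampling events rather than pixels treats each flood as the independent unit, which avoids the overly narrow intervals that spatially correlated pixels would otherwise produce. With few events in a land-cover or flood-type category, the interval will be wide, and that width is itself useful information for the reader.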
Circularity Check
No circularity: purely empirical evaluation against external references
Full rationale
The paper conducts an empirical deployment and validation of a pretrained geospatial model on 19 out-of-distribution flood events, computing standard agreement metrics (IoU, F1) stratified by land cover and flood type against two independent external reference products. No mathematical derivations, parameter fitting presented as prediction, self-citation load-bearing premises, or ansatz smuggling appear in the described chain. All reported performance numbers are direct comparisons to external benchmarks, with the dual-reference analysis serving as an explicit check on reference inconsistency rather than a reduction to the paper's own inputs. The central claims therefore remain self-contained against outside data.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Liu, Q., Du, M., Wang, Y., Deng, J., Yan, W., Qin, C., Liu, M. and Liu, J. Global, regional and national trends and impacts of natural floods, 1990–2022. Bulletin of the World Health Organization 102 , 410-420 (2024). https://doi.org/10.2471/BLT.23.290243
-
[2]
Xiao, Y., Dai, Q., Liu, R., Ji, D., Lai, X. & Zhang, J. Modeling dynamic flood population exposure in coupled human-water systems: The role of reservoir regulation and population movement. Water Resources Research 61, e2025WR041234 (2025). https://doi.org/10.1029/2025WR041234
-
[3]
Shen, X., Anagnostou, E. N., Allen, G. H., Brakenridge, G. R. & Kettner, A. J. Near-real-time non-obstructed flood inundation mapping using synthetic aperture radar. Remote Sensing of Environment 221 , 302-315 (2019). https://doi.org/10.1016/j.rse.2018.11.008
-
[4]
Wu, H., Kimball, J.S., Zhou, N., Alfieri, L., Luo, L., Du, J. and Huang, Z. Evaluation of real-time global flood modeling with satellite surface inundation observations from SMAP. Remote Sensing of Environment 233 , 111360 (2019). https://doi.org/10.1016/j.rse.2019.111360
-
[5]
Claverie, M., Ju, J., Masek, J.G., Dungan, J.L., Vermote, E.F., Roger, J.C., Skakun, S.V. and Justice, C. The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sensing of Environment 219 , 145-161 (2018). https://doi.org/10.1016/j.rse.2018.09.002
-
[6]
Ban, Y., Jacob, A. & Gamba, P. Spaceborne SAR data for global urban mapping at 30 m resolution using a robust urban extractor. ISPRS Journal of Photogrammetry and Remote Sensing 103, 28-37 (2015). https://doi.org/10.1016/j.isprsjprs.2014.08.004
-
[7]
Tulbure, M. G., Broich, M., Stehman, S. V. & Kommareddy, A. Surface water extent dynamics from three decades of seasonally continuous Landsat time series at subcontinental scale in a semi-arid region. Remote Sensing of Environment 178, 142-157 (2016). https://doi.org/10.1016/j.rse.2016.02.034
-
[8]
Munasinghe, D., Frasson, R. P. D. M., David, C. H., Bonnema, M., Schumann, G. & Brakenridge, G. R. A multi-sensor approach for increased measurements of floods and their societal impacts from space. Communications Earth & Environment 4, 462 (2023). https://doi.org/10.1038/s43247-023-01129-1
-
[9]
Drakonakis, G. I., Tsagkatakis, G., Fotiadou, K. & Tsakalides, P. Ombrianet—supervised flood mapping via convolutional neural networks using multitemporal sentinel-1 and sentinel-2 data fusion. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 15 , 2341-2356 (2022). https://doi.org/10.1109/JSTARS.2022.3155559
-
[10]
Bentivoglio, R., Isufi, E., Jonkman, S. N. & Taormina, R. Deep learning methods for flood mapping: a review of existing applications and future research directions. Hydrology and Earth System Sciences 26 , 4345-4378 (2022). https://doi.org/10.5194/hess-26-4345-2022
-
[11]
Nemni, E., Bullock, J., Belabbes, S. & Bromley, L. Fully convolutional neural network for rapid flood segmentation in synthetic aperture radar imagery. Remote Sensing 12 , 2532 (2020). https://doi.org/10.3390/rs12162532
-
[12]
Portalés-Julià, E., Mateo-García, G. & Gómez-Chova, L. Understanding flood detection models across Sentinel-1 and Sentinel-2 modalities and benchmark datasets. Remote Sensing of Environment 328 , 114882 (2025). https://doi.org/10.1016/j.rse.2025.114882
-
[13]
Shinde, R., Ankur, K., Phillips, C.E., Gupta, A., Pfreundschuh, S., Roy, S., Kirkland, S., Gaur, V., Kolluru, V., Lin, A. and Trital, P. WxC-Bench: A novel dataset for weather and climate downstream tasks. Scientific Data (2026). https://doi.org/10.1038/s41597-026-06839-7
-
[14]
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E. On the opportunities and risks of foundation models. (2021). Preprint at https://arxiv.org/abs/2108.07258
-
[15]
Lu, S., Guo, J., Zimmer-Dauphinee, J.R., Nieusma, J.M., Wang, X., VanValkenburgh, P., Wernke, S.A. and Huo, Y. Vision foundation models in remote sensing: A survey. IEEE Geoscience and Remote Sensing Magazine (2025). https://doi.org/10.1109/MGRS.2025.3541952
-
[16]
Jakubik, J., Roy, S., Phillips, C.E., Fraccaro, P., Godwin, D., Zadrozny, B., Szwarcman, D., Gomes, C., Nyirjesy, G., Edwards, B. and Kimura, D. Foundation models for generalist geospatial artificial intelligence. arXiv preprint. https://doi.org/10.48550/arXiv.2310.18660
-
[17]
Szwarcman, D., Roy, S., Fraccaro, P., Gíslason, O.E., Blumenstiel, B., Ghosal, R., De Oliveira, P.H., de Sousa Almeida, J.L., Sedona, R., Kang, Y. and Chakraborty, S. Prithvi-EO-2.0: A Versatile Multitemporal Foundation Model for Earth Observation Applications. IEEE Transactions on Geoscience and Remote Sensing 64 , 1-20 (2026). https://doi.org/10.1109/TG...
-
[18]
Bonafilia, D., Tellman, B., Anderson, T. & Issenberg, E. Sen1Floods11: a georeferenced dataset to train and test deep learning flood algorithms for Sentinel-1. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (pp. 835–845). IEEE (2020). https://doi.org/10.1109/CVPRW50498.2020.00113
-
[19]
Li, W., Lee, H., Wang, S., Hsu, C. Y. & Arundel, S. T. Assessment of a new GeoAI foundation model for flood inundation mapping. In Proceedings of the 6th ACM SIGSPATIAL International workshop on AI for geographic knowledge discovery (pp. 102-109) (2023). https://doi.org/10.1145/3615886.3627747
-
[20]
Kostejn, J., Janečka, K. & Šatanová, A. U-Prithvi: integrating a foundation model and U-Net for enhanced flood inundation mapping. In GIScience 2025, LIPIcs, Vol. 343, Article 18 (2025). https://doi.org/10.4230/LIPIcs.GIScience.2025.18
-
[21]
Kaushik, S., Maurya, L., Tellman, B. & Zhang, Z. Assessing geo-foundational models for flood inundation mapping: benchmarking models for Sentinel-1, Sentinel-2, and PlanetScope. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 19 , 5649-5665 (2026). https://doi.org/10.1109/JSTARS.2026.3656855
-
[22]
Ghamisi, P., Rasti, B., Yokoya, N., Wang, Q., Hofle, B., Bruzzone, L., Bovolo, F., Chi, M., Anders, K., Gloaguen, R. and Atkinson, P.M. Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art. IEEE Geoscience and Remote Sensing Magazine 7, 6-39 (2019). https://doi.org/10.1109/MGRS.2018.2890023
-
[23]
Zhu, X.X., Xiong, Z., Wang, Y., Stewart, A.J., Heidler, K., Wang, Y., Yuan, Z., Dujardin, T., Xu, Q. and Shi, Y. On the foundations of Earth foundation models. Communications Earth & Environment. 7, 103 (2026). https://doi.org/10.1038/s43247-025-03127-x
-
[24]
Tulbure, M.G., Caineta, J., Broich, M., Gaines, M.D., Rufin, P., Thomas, L.F., Alemohammad, H., Hemmerling, J. and Hostert, P. Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale. arXiv preprint arXiv:2512.02055 . https://doi.org/10.48550/arXiv.2512.02055
-
[25]
Misra, A., White, K., Nsutezo, S.F., Straka III, W. and Lavista, J. Mapping global floods with 10 years of satellite radar data. Nature Communications 16 , 5762 (2025). https://doi.org/10.1038/s41467-025-60973-1
-
[26]
Li, K., Razavi, S., Maier, H.R., Hrachowitz, M., Nabavi, E., Harvey, N., Akhtar, K. and Unduche, F. When are AI models ready for deployment? Reassessing Google's global AI flood forecasting system through the lens of responsible modelling. Journal of Hydrology X, 100215 (2026). https://doi.org/10.1016/j.hydroa.2026.100215
-
[28]
Feffer, M., Sinha, A., Deng, W. H., Lipton, Z. C. & Heidari, H. Red-teaming for generative AI: Silver bullet or security theater? In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society 7 421-437 (2024). https://doi.org/10.1609/aies.v7i1.31647
-
[29]
Bullwinkel, B., Minnich, A., Chawla, S., Lopez, G., Pouliot, M., Maxwell, W., de Gruyter, J., Pratt, K., Qi, S., Chikanov, N. and Lutz, R. Lessons from red teaming 100 generative AI products. arXiv preprint arXiv:2501.07238 (2025). https://doi.org/10.48550/arXiv.2501.07238
-
[30]
Wang, P., Wu, X. & Chen, Y. AI-driven approaches to flood risk management: overcoming data bias and enhancing decision-making. Climate Risk Management 50 , 100752 (2025). https://doi.org/10.1016/j.crm.2025.100752
-
[31]
Liu, Z., Coleman, N., Patrascu, F. I., Yin, K., Li, X. & Mostafavi, A. Artificial intelligence for flood risk management: a comprehensive state-of-the-art review and future directions. International Journal of Disaster Risk Reduction 117 , 105110 (2025). https://doi.org/10.1016/j.ijdrr.2024.105110
-
[32]
Li, Z., Xu, S. and Weng, Q. Beyond clouds: seamless flood mapping using Harmonized Landsat and Sentinel-2 time series imagery and water occurrence data. ISPRS Journal of Photogrammetry and Remote Sensing 216 , 185-199 (2024). https://doi.org/10.1016/j.isprsjprs.2024.07.022
-
[33]
Mohamadiazar, N., Ebrahimian, A. & Hosseiny, H. Integrating deep learning, satellite image processing, and spatial-temporal analysis for urban flood prediction. Journal of Hydrology 639 , 131508 (2024). https://doi.org/10.1016/j.jhydrol.2024.131508
-
[34]
Marsocci, V. et al. PANGAEA: Assessing Geospatial Foundation Models Capabilities through a Global and Inclusive Benchmark. IEEE geoscience and remote sensing magazine (2025). https://doi.org/10.1109/MGRS.2025.3628194
-
[35]
Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim, A., Klotz, D., Kratzert, F., Metzger, A. and Nevo, S. Global prediction of extreme floods in ungauged watersheds. Nature 627, 559-563 (2024). https://doi.org/10.1038/s41586-024-07145-1
-
[36]
Zhu, Z., Wang, S. & Woodcock, C. E. Improvement and expansion of the Fmask algorithm: cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sensing of Environment 159 , 269-277 (2015). https://doi.org/10.1016/j.rse.2014.12.014
-
[37]
Salamon, P., McCormick, N., Reimer, C., Clarke, T., Bauer-Marschallinger, B., Wagner, W., Martinis, S., Chow, C., Böhnke, C., Matgen, P. and Chini, M. The new, systematic global flood monitoring product of the Copernicus Emergency Management Service. In 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 1053–1056 (2021). https://doi...
-
[38]
OPERA. OPERA Dynamic Surface Water Extent from Harmonized Landsat Sentinel-2 Version 1. NASA Physical Oceanography Distributed Active Archive Center (2023). https://doi.org/10.5067/opdsw-pl3v1
-
[39]
He, K., Chen, X., Xie, S., Li, Y., Dollár, P. & Girshick, R. Masked autoencoders are scalable vision learners. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 16000–16009). IEEE (2022). https://doi.org/10.1109/CVPR52688.2022.01553
-
[40]
Kolluru, V., John, R., Chen, J., Henebry, G.M., Xiao, J., Shinde, R., Kussainova, M. and Ganzorig, U., 2026. Leveraging sUAS-Sentinel-2 synergy for cross-scale mapping of canopy cover and aboveground biomass across Mongolia and Kazakhstan. Remote Sensing of Environment, 336, p.115302. https://doi.org/10.1016/j.rse.2026.115302
-
[41]
Kolluru, V., John, R., Chen, J., Konkathi, P., Kolluru, S., Saraf, S., Henebry, G.M., Xiao, J., Jain, K. and Kussainova, M., 2024. Dominant role of grazing and snow cover variability on vegetation shifts in the drylands of Kazakhstan. Communications Earth & Environment, 5(1), p.424. https://doi.org/10.1038/s43247-024-01587-1
-
[42]
Venkatesh, K., John, R., Chen, J., Jarchow, M., Amirkhiz, R.G., Giannico, V., Saraf, S., Jain, K., Kussainova, M. and Yuan, J., 2022. Untangling the impacts of socioeconomic and climatic changes on vegetation greenness and productivity in Kazakhstan. Environmental Research Letters, 17(9), p.095007. https://doi.org/10.1088/1748-9326/ac8c59
-
[43]
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540 , 418-422 (2016). https://doi.org/10.1038/nature20584
-
[44]
Zanaga, D., Van De Kerchove, R., Daems, D., De Keersmaecker, W., Brockmann, C., Kirches, G., Wevers, J., Cartus, O., Santoro, M., Fritz, S. and Lesiv, M. ESA WorldCover 10 m 2021 v200. Zenodo (2022). https://doi.org/10.5281/zenodo.7254221