Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Daniele Molino; Elena Mulero Ayll\'on; Filippo Ruffini; Giulia Romoli; Irene Iele; Matteo Tortora; Paolo Soda

arxiv: 2602.17683 · v3 · pith:6ZIDOKI5new · submitted 2026-02-04 · 💻 cs.LG · cs.CV· stat.ML

Probabilistic NDVI Forecasting from Sparse Satellite Time Series and Weather Covariates

Irene Iele , Giulia Romoli , Daniele Molino , Elena Mulero Ayll\'on , Filippo Ruffini , Paolo Soda , Matteo Tortora This is my paper

Pith reviewed 2026-05-16 07:13 UTC · model grok-4.3

classification 💻 cs.LG cs.CVstat.ML

keywords NDVIprobabilistic forecastingsatellite time seriessparse dataquantile lossweather covariatesvegetation indexprecision agriculture

0 comments

The pith

A probabilistic NDVI forecasting model separates historical encodings from future weather covariates to handle sparse satellite data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework for probabilistic forecasting of the Normalized Difference Vegetation Index from irregular satellite observations and weather data. It encodes past NDVI and meteo history independently from future covariates before fusing them for multi-step quantile predictions. A temporal-distance weighted quantile loss accounts for varying uncertainty over forecast horizons, while cumulative and extreme weather features capture delayed effects on crops. Experiments on European data demonstrate superior performance over statistical, deep learning, and time-series baselines in both point predictions and uncertainty estimates. Ablation analysis highlights that historical target data drives most accuracy, with meteorological inputs providing further improvements.

Core claim

The central discovery is that a multimodal architecture encoding historical NDVI and meteorological observations separately from future exogenous covariates, combined with a temporal-distance weighted quantile loss and engineered cumulative/extreme-weather features, achieves better probabilistic multi-step NDVI predictions under sparse and irregular clear-sky acquisitions than existing baselines.

What carries the argument

The architecture that separates the encoding of historical NDVI and meteorological observations from future exogenous covariates for multi-step quantile prediction, trained with a temporal-distance weighted quantile loss.

If this is right

Probabilistic forecasts quantify uncertainty arising from cloud masking in satellite data.
Feature engineering for cumulative and extreme weather effects improves capture of vegetation response delays.
Target history serves as the primary driver of predictive performance.
Meteorological covariates yield additional gains when integrated in the full multimodal setup.
Outperformance holds across both pointwise accuracy and probabilistic metrics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could support more reliable decision-making in precision agriculture where satellite revisits are infrequent.
Retraining on datasets with different cloud masking statistics might be necessary for global applicability.
Extending the model to incorporate additional data sources like soil moisture could reduce dependence on clear-sky observations.

Load-bearing premise

The temporal-distance weighted quantile loss and engineered cumulative/extreme-weather features will generalize beyond the European dataset and specific cloud-masking patterns used.

What would settle it

Evaluating the model on satellite data from a non-European region with markedly different revisit frequencies and weather patterns, where it fails to outperform baselines, would falsify the performance claims.

Figures

Figures reproduced from arXiv: 2602.17683 by Daniele Molino, Elena Mulero Ayll\'on, Filippo Ruffini, Giulia Romoli, Irene Iele, Matteo Tortora, Paolo Soda.

**Figure 1.** Figure 1: Overview of the proposed pipeline for probabilistic NDVI forecasting. (a) Sentinel-2 cubes are cloud-masked to derive clear-sky NDVI, combined with [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗

**Figure 2.** Figure 2: Ground truth versus predicted NDVI by aggregated K [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Comparison of forecasting models in terms of RMSE (y-axis, lower [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

read the original abstract

Short-term forecasting of vegetation dynamics is a key enabler for data-driven decision support in precision agriculture. Normalized Difference Vegetation Index (NDVI) forecasting from satellite observations, however, remains challenging due to sparse and irregular sampling caused by cloud masking, as well as the heterogeneous climatic conditions under which crops evolve. In this work, we propose a probabilistic forecasting framework for field-level NDVI prediction under sparse, irregular clear-sky acquisitions. The architecture separates the encoding of historical NDVI and meteorological observations from future exogenous covariates, fusing both representations for multi-step quantile prediction. To address irregular revisit patterns and horizon-dependent uncertainty, we introduce a temporal-distance weighted quantile loss that aligns the training objective with the effective forecasting horizon. In addition, we incorporate cumulative and extreme-weather feature engineering to capture delayed meteorological effects relevant to vegetation response. Experiments on European satellite data show that the proposed approach outperforms statistical, deep learning, and time-series baselines on both pointwise and probabilistic evaluation metrics. Ablation studies confirm that target history is the primary driver of performance, with meteorological covariates providing additional gains in the full multimodal setting. The code is available at https://github.com/arco-group/ndvi-forecasting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a clean empirical paper that improves NDVI forecasting on sparse European satellite data via a weighted quantile loss and weather features, but the gains are incremental and the generalization story is thin.

read the letter

The core contribution is an encoder-decoder that separates historical NDVI and weather encoding from future covariates, then uses a temporal-distance weighted quantile loss plus cumulative and extreme-weather features. On the reported European dataset it beats the listed statistical, deep-learning, and time-series baselines on both point forecasts and probabilistic metrics. Ablations confirm that target history drives most of the performance, which matches what one would expect for vegetation indices. Code release is a plus and lets others check the exact loss weighting and splits.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a probabilistic NDVI forecasting framework that encodes historical NDVI and meteorological observations separately from future exogenous weather covariates, fuses the representations for multi-step quantile prediction, introduces a temporal-distance weighted quantile loss to handle irregular sampling and horizon-dependent uncertainty, and incorporates cumulative and extreme-weather feature engineering. Experiments on European satellite data report outperformance over statistical, deep learning, and time-series baselines on both pointwise and probabilistic metrics, with ablation studies identifying target history as the primary performance driver. The code is released at a public repository.

Significance. If the empirical results hold under clarified validation, the work would provide a practical contribution to precision agriculture by improving short-term vegetation forecasting under sparse, cloud-masked satellite observations. The separation of historical and future inputs, horizon-aware loss, and multimodal fusion directly target the challenges of irregular revisit patterns and delayed weather effects. Ablation results and code release add value by highlighting component importance and supporting reproducibility.

major comments (1)

[Experiments] Experiments section: The train/test split methodology is not described in sufficient detail. It remains unclear whether the same agricultural fields appear in both training and test sets, which is load-bearing for confirming that the reported outperformance on pointwise and probabilistic metrics is not due to data leakage or field-specific correlations.

minor comments (2)

[Abstract] Abstract: A brief mention of dataset scale (number of fields, time span) and the temporal split strategy would strengthen the central claim of outperformance.
[Method] Method section: The precise mathematical definition of the temporal-distance weighted quantile loss would benefit from an explicit equation to improve reproducibility.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive comment regarding the train/test split. We have revised the manuscript to provide a clear and detailed description of the methodology, confirming a field-disjoint split that eliminates the possibility of data leakage.

read point-by-point responses

Referee: [Experiments] Experiments section: The train/test split methodology is not described in sufficient detail. It remains unclear whether the same agricultural fields appear in both training and test sets, which is load-bearing for confirming that the reported outperformance on pointwise and probabilistic metrics is not due to data leakage or field-specific correlations.

Authors: We appreciate the referee raising this critical point about potential data leakage. In the revised manuscript, we have expanded the 'Experiments' section (specifically the 'Dataset and Preprocessing' and 'Evaluation Protocol' subsections) to explicitly detail the split procedure. The dataset was partitioned at the field level: individual agricultural fields were randomly assigned to training (70%), validation (15%), and test (15%) sets, with all time series belonging to a given field kept entirely within one split. No field appears in more than one partition. This design ensures that performance gains cannot be attributed to field-specific correlations, repeated observations of the same location, or leakage across train and test sets. We believe this clarification directly addresses the concern and strengthens the validity of the reported results. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents an empirical machine learning framework for probabilistic NDVI forecasting. Its central claims rest on experimental outperformance against statistical, deep learning, and time-series baselines on a European satellite dataset, supported by ablation studies identifying target history as the primary driver. No mathematical derivation, first-principles result, or uniqueness theorem is claimed; the temporal-distance weighted quantile loss and cumulative/extreme-weather features are introduced as design choices to handle irregular sampling and delayed effects, without reducing to fitted parameters by construction. No load-bearing self-citations or ansatz smuggling appear in the provided text. The contribution is self-contained as an empirical comparison.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard supervised learning assumptions (i.i.d. training examples after masking, stationarity of vegetation-weather relationships within the study region) plus the domain assumption that meteorological covariates carry predictive signal beyond target history. No new entities are postulated.

free parameters (2)

quantile levels
Chosen to produce the reported probabilistic outputs; values not stated in abstract.
temporal weighting schedule
Hyperparameter controlling how strongly the loss emphasizes longer horizons; fitted or tuned on validation data.

axioms (2)

domain assumption Vegetation response to weather is sufficiently stationary within the European study region and time period to allow generalization from training to test fields.
Invoked implicitly by training on historical data and evaluating on later periods.
domain assumption Clear-sky NDVI observations are missing at random conditional on the weather covariates.
Required for the irregular-sampling handling to be unbiased.

pith-pipeline@v0.9.0 · 5532 in / 1307 out tokens · 23887 ms · 2026-05-16T07:13:27.785788+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

temporal-distance weighted quantile loss ... wk = 1/(1 + α·Δdays_k) with α=0.5
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

transformer-based architecture ... history and future branches

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VegSim: A Geospatial World Model for Scenario-Conditioned Vegetation Simulation
cs.LG 2026-06 unverdicted novelty 5.0

VegSim uses recurrent latent dynamics to enable both standard NDVI forecasting and user-controlled scenario simulation of vegetation from sparse satellite and weather inputs.