Efficient Spatio-Temporal Vegetation Pixel Classification with Vision Transformers

Alan Gomes; Anderson Gonçalves; Bruna de Costa Alberton; Jurandy Almeida; Leonor Patricia C. Morellato; Magna Soelma Beserra de Moura; Nathan Felipe Alves; Ricardo da Silva Torres; Samuel Felipe dos Santos

arxiv: 2605.00296 · v1 · submitted 2026-04-30 · 💻 cs.CV

Pith reviewed 2026-05-09 19:36 UTC · model grok-4.3

classification 💻 cs.CV

keywords vision transformers · vegetation pixel classification · spatio-temporal analysis · phenology monitoring · computational efficiency · UAV imagery · multi-temporal classification · Cerrado datasets

The pith

Vision Transformers classify vegetation pixels in time-series imagery with an order of magnitude fewer floating-point operations than a multi-temporal CNN baseline, and their parameter count does not grow with the length of the time series.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports that a Vision Transformer can perform pixel-level vegetation classification on multi-temporal aerial and near-surface images more efficiently than existing CNN approaches. It reaches this result by systematically testing design choices for normalization, tokenization, positional encoding, and feature handling on two Cerrado biome datasets. The result matters because longer time series become practical without a matching growth in model size, which supports ongoing monitoring of ecosystem changes. The transformer keeps classification accuracy competitive while its parameter count stays independent of sequence length, in contrast to the CNN baseline, whose parameter count grows linearly.

Core claim

A Vision Transformer optimized across seven design dimensions reduces floating-point operations by an order of magnitude and maintains constant parameter complexity independent of time-series length for spatio-temporal vegetation pixel classification on Serra do Cipó aerial imagery and Itirapina near-surface imagery, while delivering classification performance comparable to multi-temporal CNN baselines.
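The contrast in parameter scaling can be illustrated with a back-of-the-envelope count. The sketch below is not the authors' model: the layer sizes, the per-branch CNN size, and the function names `vit_param_count` and `multibranch_cnn_param_count` are all hypothetical, and it assumes a plain transformer encoder whose token width does not depend on the number of time steps. With a fixed (non-learned) positional encoding the transformer total is the same for any series length; a learned position table would add a term that grows with the number of tokens.

```python
def vit_param_count(num_steps, token_dim, d_model=64, depth=4, d_ff=128,
                    num_classes=4, learned_positions=False):
    """Parameter count of a plain transformer encoder classifier.

    All sizes are hypothetical defaults chosen for illustration only.
    """
    embed = token_dim * d_model + d_model            # linear token projection + bias
    cls_token = d_model                              # learnable class token
    attn = 4 * (d_model * d_model + d_model)         # Q, K, V and output projections
    mlp = d_model * d_ff + d_ff + d_ff * d_model + d_model
    norms = 2 * 2 * d_model                          # two LayerNorms (scale and shift)
    head = d_model * num_classes + num_classes
    # Assumption: one token per time step plus the class token.
    positions = (num_steps + 1) * d_model if learned_positions else 0
    return embed + cls_token + depth * (attn + mlp + norms) + head + positions


def multibranch_cnn_param_count(num_steps, branch_params=50_000, feat_dim=128,
                                num_classes=4):
    """Parameter count of a hypothetical CNN with one branch per time step."""
    branches = num_steps * branch_params             # one copy of the branch per step
    fusion = num_steps * feat_dim * num_classes + num_classes  # concatenate, then classify
    return branches + fusion


if __name__ == "__main__":
    for steps in (8, 16, 32, 64):
        print(steps,
              vit_param_count(steps, token_dim=27),
              multibranch_cnn_param_count(steps))
```

Running the loop prints an unchanged transformer count for every series length and a CNN count that grows in proportion to the number of branches, which is the qualitative pattern the core claim describes.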

What carries the argument

The Vision Transformer architecture with custom tokenization, positional encoding, and aggregation strategies applied to multi-temporal spectral pixel patches.
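To make "tokenization of multi-temporal pixel patches" concrete, here is a minimal NumPy sketch of two ways a patch series could be turned into tokens. Both strategies, the array layout `(T, C, H, W)`, and the function names are assumptions for illustration; this page does not say which tokenizations the paper actually compares.

```python
import numpy as np


def temporal_tokens(patch_series):
    """One token per time step: (T, C, H, W) -> (T, C*H*W)."""
    num_steps = patch_series.shape[0]
    return patch_series.reshape(num_steps, -1)


def spatial_tokens(patch_series):
    """One token per pixel position: (T, C, H, W) -> (H*W, T*C)."""
    num_steps, channels, height, width = patch_series.shape
    # Move the spatial axes to the front so each row holds one pixel's full series.
    return patch_series.transpose(2, 3, 0, 1).reshape(height * width,
                                                       num_steps * channels)


if __name__ == "__main__":
    # Hypothetical input: 13 dates, 3 bands, 5x5 window around the target pixel.
    series = np.zeros((13, 3, 5, 5), dtype=np.float32)
    print(temporal_tokens(series).shape)  # token width is independent of T
    print(spatial_tokens(series).shape)   # token width grows with T
```

The design choice matters for the scaling claim: with one token per time step the token width, and therefore the embedding layer, is unaffected by series length, whereas per-pixel tokens that concatenate all dates make the embedding layer grow with the series.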

If this is right

Phenological monitoring systems can process extended image sequences without a matching growth in model size, and at a fraction of the CNN baseline's FLOPs.
UAV and camera deployments become more feasible for continuous species identification in resource-limited field settings.
Spatio-temporal pixel tasks in remote sensing can shift from rigid multi-branch CNN designs to more scalable transformer models.
The constant parameter complexity opens the door to handling very long observation records that would overwhelm current CNN approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same efficiency pattern could extend to related tasks such as crop-type mapping or forest disturbance detection over time.
Deployment on edge hardware for near-real-time vegetation tracking becomes plausible given the reduced operation count.
The approach suggests that transformer designs may replace CNNs in other sequence-length-sensitive remote-sensing applications without custom multi-branch engineering.

Load-bearing premise

That the ablation results on seven design choices produce configurations that generalize beyond the two Cerrado datasets and that matching CNN accuracy levels suffices for real phenological monitoring needs.

What would settle it

Evaluating both the optimized Vision Transformer and the CNN baseline on a new dataset with substantially longer time series or from a different biome and checking whether the order-of-magnitude FLOPs reduction and constant parameter count persist while accuracy stays competitive.
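One way to phrase that check numerically is to fit a scaling exponent to measured values across several series lengths. The sketch below is only an illustration: the helper names `scaling_exponent` and `order_of_magnitude_gap` are hypothetical, and the measurements passed in would have to come from an actual profiling run, not from this page.

```python
import numpy as np


def scaling_exponent(lengths, values):
    """Slope of log(values) against log(lengths).

    A slope near 0 indicates a quantity that is constant in series length;
    a slope near 1 indicates linear growth.
    """
    slope, _intercept = np.polyfit(np.log(lengths), np.log(values), 1)
    return float(slope)


def order_of_magnitude_gap(cnn_flops, vit_flops):
    """True if the CNN needs at least 10x the FLOPs at every measured length."""
    ratios = np.asarray(cnn_flops, dtype=float) / np.asarray(vit_flops, dtype=float)
    return bool(np.all(ratios >= 10.0))
```

Applied to parameter counts, an exponent close to zero for the transformer and close to one for the CNN would reproduce the reported pattern on the new dataset.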

Figures

Figures reproduced from arXiv: 2605.00296 by Alan Gomes, Anderson Gonçalves, Bruna de Costa Alberton, Jurandy Almeida, Leonor Patricia C. Morellato, Magna Soelma Beserra de Moura, Nathan Felipe Alves, Ricardo da Silva Torres, Samuel Felipe dos Santos.

**Figure 1.** Overview of our method for searching for an optimal setting for applying a ViT architecture to the spatio-temporal vegetation pixel classification task.

**Figure 2.** Sample RGB image from the Serra do Cipó dataset (left) and its ground-truth map (right). Following the protocol of Nogueira et al. [17], the classes and their respective training and test splits are color-coded as: Bowdichia virgilioides (red: train, orange: test); Eremanthus erythropappus (blue: train, purple: test); Vochysia cinnamomea (cyan: train, white: test); and a set of Evergreen species (green: …

**Figure 3.** Sample RGB image from the Itirapina dataset (left) and its ground …

**Figure 4.** Phenological visual rhythms for unnormalized (left) and normalized …

**Figure 5.** Phenological visual rhythms for unnormalized (left) and normalized …

**Figure 6.** Balanced accuracy, computational complexity (FLOPs), and number …

Original abstract

Plant phenology, the study of recurrent life cycle events, is essential for understanding ecosystem dynamics and their responses to climate change impacts. While Unmanned Aerial Vehicles (UAVs) and near-surface cameras enable high-resolution monitoring, identifying plant species across time remains computationally challenging. State-of-the-art approaches, specifically Multi-Temporal Convolutional Networks (CNNs), rely on rigid multi-branch architectures that scale poorly with longer time series and require large spatial context windows. In this paper, we present an extensive study on optimizing Vision Transformers (ViTs) for efficient spatio-temporal vegetation pixel classification. We conducted a comprehensive ablation study analyzing seven key design dimensions: (i) data normalization; (ii) spectral arrangement; (iii) boundary handling; (iv) spatial context window shape and size; (v) tokenization strategies; (vi) positional encoding; and (vii) feature aggregation strategies. Our method was evaluated on two datasets from the Brazilian Cerrado biome, Serra do Cipó (aerial imagery) and Itirapina (near-surface imagery). Experimental results demonstrate that our ViT approach offers a substantial improvement in computational efficiency while maintaining competitive classification performance. Notably, our ViT reduces Floating Point Operations (FLOPs) by an order of magnitude and maintains constant parameter complexity regardless of the time series length, whereas the CNN baseline scales linearly. Our findings confirm that ViTs are a robust, scalable solution for resource-constrained phenological monitoring systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A ViT tuned via ablations over seven design dimensions cuts FLOPs by roughly an order of magnitude for vegetation pixel classification in time series, while its parameter count stays flat as the sequence lengthens.

The main point is that, after ablating seven design choices, the authors' Vision Transformer setup processes spatio-temporal vegetation pixels with roughly an order of magnitude fewer FLOPs than the multi-temporal CNN baseline, and its parameter count does not grow with longer time series. The flat parameter count follows directly from the architecture: the same fixed-depth transformer blocks are reused whatever the sequence length, whereas CNN branches or 3D kernels scale with it. That part of the result needs no extra proof once the architecture is fixed, and the experiments appear to confirm it holds on held-out data.

The paper tests the approach on two Cerrado datasets, one aerial from Serra do Cipó and one near-surface from Itirapina, and reports competitive accuracy alongside the compute savings. The ablation covers normalization, spectral ordering, boundary handling, context window shape, tokenization, positional encoding, and aggregation, which gives readers concrete knobs to turn for similar tasks. That systematic check is the useful addition here; it moves beyond simply swapping a ViT into an existing pipeline.

The work is narrow in scope, limited to these two Brazilian sites and the specific imagery types. Generalization to other biomes, sensors, or resolutions is not shown, so the practical takeaway stays tied to phenology monitoring in similar savanna settings. The abstract claims competitive performance, but without the full tables, error bars, or per-class breakdowns it is hard to judge how robust the accuracy is across seasons or edge cases. The CNN baseline is described as standard, yet details on its exact multi-branch implementation would strengthen the comparison.

This paper is for remote-sensing and phenology researchers who run pixel-level classification on UAV or camera time series and care about keeping compute manageable as sequences lengthen. The ablation results and the clear efficiency contrast make it worth reading for anyone tuning transformers on temporal imagery.

It deserves a serious referee because the central claim is architecture-driven and verifiable, the experiments are grounded, and the design study adds usable guidance even if the datasets stay limited.

Referee Report

0 major / 4 minor

Summary. The paper presents an extensive ablation study optimizing Vision Transformers for spatio-temporal vegetation pixel classification from high-resolution UAV and near-surface imagery. It evaluates the approach on two Cerrado biome datasets (Serra do Cipó aerial and Itirapina near-surface), claiming that the resulting ViT reduces FLOPs by an order of magnitude relative to a multi-temporal CNN baseline while maintaining constant parameter count independent of time-series length.

Significance. If the efficiency results hold under the reported experimental conditions, the work offers a practical, scalable alternative to CNNs for resource-constrained phenological monitoring, directly addressing the linear scaling limitations of multi-branch temporal architectures with longer sequences.

minor comments (4)

The abstract states competitive classification performance but does not specify the exact accuracy, F1, or IoU values achieved by the final ViT configuration versus the CNN baseline; these numbers should appear in a results table with standard deviations across runs.
The section describing the seven-dimensional ablation (data normalization, spectral arrangement, boundary handling, spatial context, tokenization, positional encoding, feature aggregation) should include a summary table showing the performance delta for each dimension rather than only the final selected configuration.
The FLOPs and parameter scaling claims would benefit from an explicit complexity analysis subsection (e.g., big-O notation for sequence length T) accompanied by measured values on both datasets to confirm the order-of-magnitude gap.
Figure captions and axis labels for any efficiency plots should explicitly state the input dimensions (spatial patches × time steps) used for each model to allow direct reproduction of the reported scaling behavior.
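The first comment asks for metrics reported with standard deviations across runs. As a hedged illustration of what such a table entry involves, the sketch below computes balanced accuracy (the metric named in the Figure 6 caption) as the mean of per-class recalls and summarizes several runs as mean and sample standard deviation. The function names and the toy labels are hypothetical; no numbers from the paper are used.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for cls in np.unique(y_true):
        mask = y_true == cls                      # pixels whose true label is cls
        recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))


def summarize_runs(scores):
    """Mean and sample standard deviation (ddof=1) over repeated runs."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1))


if __name__ == "__main__":
    # Toy labels for illustration only.
    score = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])
    print(f"balanced accuracy: {score:.4f}")
```

Reporting the pair returned by `summarize_runs` for each model and dataset would give the reader the spread the referee asks for.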

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive evaluation of our manuscript and for recommending minor revision. We are pleased that the significance of the efficiency gains (an order-of-magnitude FLOPs reduction and a constant parameter count independent of time-series length) is recognized as offering a practical alternative to multi-temporal CNNs for resource-constrained phenological monitoring.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper is an empirical ablation study comparing ViT configurations against a CNN baseline on two Cerrado datasets. The central efficiency claims (order-of-magnitude FLOPs reduction and parameter count independent of time-series length) follow directly from the fixed-depth transformer architecture's standard scaling properties, which are independent of the paper's fitted hyperparameters or results. The seven-dimensional ablation selects a configuration but does not derive or redefine the complexity scaling. No equations, predictions, or load-bearing premises reduce to self-definition, fitted inputs renamed as predictions, or self-citation chains. The work is self-contained against external benchmarks.
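The "standard scaling properties" referred to here can be written out under textbook assumptions. The notation below is introduced for illustration and is not taken from the paper: N is the number of tokens, d the model width, d_ff the feed-forward width, L the depth, T the number of time steps, and P_b the parameter count of one CNN branch. The first line additionally assumes that the token width does not depend on T and that the positional encoding has no learned table.

```latex
% Assumed notation (not from the paper): N tokens, width d, feed-forward width
% d_ff, depth L, T time steps, P_b parameters per CNN branch.
\begin{align}
P_{\mathrm{ViT}} &\approx L\,\bigl(4d^{2} + 2\,d\,d_{\mathrm{ff}}\bigr)
                   + P_{\mathrm{embed}} + P_{\mathrm{head}}
  && \text{(no dependence on } N \text{ or } T\text{)} \\
F_{\mathrm{ViT}}(N) &= O\!\bigl(L\,(N d^{2} + N d\,d_{\mathrm{ff}} + N^{2} d)\bigr)
  && \text{(FLOPs per forward pass)} \\
P_{\mathrm{CNN}}(T) &\approx T\,P_{b} + P_{\mathrm{fuse}}(T) = O(T)
  && \text{(one branch per time step)}
\end{align}
```

Note that only the parameter count is independent of sequence length in this picture; the FLOPs of a standard transformer still grow with the number of tokens, so the order-of-magnitude FLOPs gap is an empirical result rather than a consequence of these formulas.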

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard machine learning assumptions for supervised image classification and transformer architectures; no new entities or heavy free parameters beyond typical hyperparameters are introduced in the abstract.

axioms (2)

domain assumption: Vision Transformers with appropriate tokenization and positional encoding can effectively capture spatio-temporal dependencies in vegetation imagery.
Invoked implicitly when claiming competitive performance after ablation on tokenization and positional encoding strategies.
domain assumption: The two Cerrado datasets are representative for evaluating general efficiency and accuracy in phenological monitoring.
Central to generalizing the efficiency claims beyond the specific Serra do Cipó and Itirapina sites.

pith-pipeline@v0.9.0 · 5605 in / 1261 out tokens · 38909 ms · 2026-05-09T19:36:18.133170+00:00 · methodology

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1] D. B. Clark, “Detecting tropical forests’ responses to global climatic and atmospheric change: Current challenges and a way forward,” Biotropica, vol. 39, no. 1, pp. 4–19, 2007.
[2] R. da S. Torres and A. X. Falcão, “Content-based image retrieval: Theory and applications,” Journal of Theoretical and Applied Informatics, vol. 13, no. 2, pp. 161–185, 2006.
[3] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. A. Riedmiller, and T. Brox, “Discriminative unsupervised feature learning with exemplar convolutional neural networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1734–1747, September 2016.
[4] C. Parmesan and G. A. Yohe, “A globally coherent fingerprint to climate change impacts across natural systems,” Nature, vol. 421, pp. 37–42, 2003.
[5] C. Rosenzweig, D. Karoly, M. Vicarelli, P. Neofotis, Q. Wu, G. Casassa, A. Menzel, T. L. Root, N. Estrella, B. Seguin, P. Tryjanowski, C. Liu, S. Rawlins, and A. Imeson, “Attributing physical and biological impacts to anthropogenic climate change,” Nature, vol. 453, pp. 353–357, 2008.
[6] G. R. Walther, “Plants in a warmer world,” Perspectives in Plant Ecology Evolution and Systematics, vol. 6, pp. 169–185, 2004.
[7] G. R. Walther, E. Post, P. Convey, A. Menzel, C. Parmesan, T. J. C. Beebee, J. M. Fromentin, O. Hoegh-Guldberg, and F. Bairlein, “Ecological responses to recent climate change,” Nature, vol. 416, pp. 389–395, 2002.
[8] Z. Gong, W. Ge, J. Guo, and J. Liu, “Satellite remote sensing of vegetation phenology: Progress, challenges, and opportunities,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 217, pp. 149–164, 2024.
[9] T. M. Aide, “Herbivory as a selective agent on the timing of leaf production in a tropical understory community,” Nature, vol. 336, pp. 574–575, 1988.
[10] J. T. Morisette, A. D. Richardson, A. K. Knapp, J. I. Fisher, E. A. Graham, J. Abatzoglou, B. E. Wilson, D. D. Breshears, G. M. Henebry, J. M. Hanes, and L. Liang, “Tracking the rhythm of the seasons in the face of global change: Phenological research in the 21st century,” Frontiers in Ecology and the Environment, vol. 7, no. 5, pp. 253–260, 2009.
[11] B. Alberton, R. da S. Torres, L. F. Cancian, B. D. Borges, J. Almeida, G. C. Mariano, J. dos Santos, and L. P. C. Morellato, “Introducing digital cameras to monitor plant phenology in the tropics: applications for conservation,” Perspectives in Ecology and Conservation, vol. 15, no. 2, pp. 82–90, 2017.
[12] B. Alberton, T. C. Martin, H. R. Da Rocha, A. D. Richardson, M. S. Moura, R. S. Torres, and L. P. C. Morellato, “Relationship between tropical leaf phenology and ecosystem productivity using phenocameras,” Frontiers in Environmental Science, vol. 11, p. 1223219, 2023.
[13] J. Li, Y. Cai, Q. Li, M. Kou, and T. Zhang, “A review of remote sensing image segmentation by deep learning methods,” International Journal of Digital Earth, vol. 17, no. 1, p. 2328827, 2024.
[14] A. D. Richardson, B. H. Braswell, D. Y. Hollinger, J. P. Jenkins, and S. V. Ollinger, “Near-surface remote sensing of spatial and temporal variation in canopy phenology,” Ecological Applications, vol. 19, no. 6, pp. 1417–1428, 2009.
[15] B. Alberton, J. Almeida, R. Henneken, R. S. Torres, A. Menzel, and L. P. C. Morellato, “Using phenological cameras to track the green up in a cerrado savanna and its on-the-ground validation,” Ecological Informatics, vol. 19, pp. 62–70, 2014.
[16] L. P. C. Morellato, M. G. G. Camargo, and E. Gressler, “A review of plant phenology in south and central america,” in Phenology: An Integrative Environmental Science, M. D. Schwartz, Ed. Springer, 2013, chapter 6, pp. 91–113.
[17] K. Nogueira, J. A. dos Santos, N. Menini, T. S. Silva, L. P. C. Morellato, and R. d. S. Torres, “Spatio-temporal vegetation pixel classification by using convolutional networks,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 10, pp. 1665–1669, 2019.
[18] J. Almeida, J. A. dos Santos, B. Alberton, R. d. S. Torres, and L. P. C. Morellato, “Applying machine learning based on multiscale classifiers to detect remote phenology patterns in cerrado savanna trees,” Ecological Informatics, vol. 23, pp. 49–61, 2014.
[19] J. Almeida, D. C. Pedronette, B. C. Alberton, L. P. C. Morellato, and R. d. S. Torres, “Unsupervised distance learning for plant species identification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 12, pp. 5325–5338, 2016.
[20] J. Almeida, J. A. dos Santos, B. Alberton, L. P. C. Morellato, and R. d. S. Torres, “Phenological visual rhythms: Compact representations for fine-grained plant species identification,” Pattern Recognition Letters, vol. 81, pp. 90–100, 2016.
[21] J. Almeida, J. A. dos Santos, W. O. Miranda, B. Alberton, L. P. C. Morellato, and R. d. S. Torres, “Deriving vegetation indices for phenology analysis using genetic programming,” Ecological Informatics, vol. 26, pp. 61–69, 2015.
[22] F. A. Faria, J. Almeida, B. Alberton, L. P. C. Morellato, A. Rocha, and R. d. S. Torres, “Time series-based classifier fusion for fine-grained plant species recognition,” Pattern Recognition Letters, vol. 81, pp. 101–109, 2016.
[23] F. A. Faria, J. Almeida, B. Alberton, L. P. C. Morellato, and R. d. S. Torres, “Fusion of time series representations for plant recognition in phenology studies,” Pattern Recognition Letters, vol. 83, pp. 205–214, 2016.
[24] W. Li, S. Liang, K. Chen, Y. Chen, H. Ma, J. Xu, Y. Ma, S. Guan, H. Fang, and Z. Shi, “Agrifm: A multi-source temporal remote sensing foundation model for crop mapping,” arXiv preprint arXiv:2505.21357, 2025.
[25] J. G. A. Barbedo, “A review of artificial intelligence techniques for wheat crop monitoring and management,” Agronomy, vol. 15, no. 5, 2025.
[26] A. Dosovitskiy, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[27] B. Victor, A. Nibali, and Z. He, “A systematic review of the use of deep learning in satellite imagery for agriculture,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 18, pp. 2297–2316, 2025.
[28] M. Tarasiou, E. Chavez, and S. Zafeiriou, “Vits for sits: Vision transformers for satellite image time series,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 10418–10428.
[29] D. Li, U. A. Bhatti, M. Huang, L. Bruzzone, and J. Li, “Hypyramamba: A pyramid spectral attention and mamba-based architecture for robust hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing (TGRS), vol. 64, pp. 1–16, 2026.
[30] L. Chen, J. He, H. Shi, J. Yang, and W. Li, “Swdiff: Stage-wise hyperspectral diffusion model for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing (TGRS), vol. 62, pp. 1–17, 2024.
