arxiv: 2507.00170 · v2 · submitted 2025-06-30 · 💻 cs.CV

SelvaBox: A high-resolution dataset for tropical tree crown detection

Pith reviewed 2026-05-19 06:43 UTC · model grok-4.3

classification 💻 cs.CV
keywords tropical tree crown detection · high-resolution drone imagery · object detection dataset · zero-shot generalization · multi-resolution training · forest monitoring · annotated aerial images

The pith

SelvaBox supplies over 83,000 labeled tropical tree crowns from drone imagery to improve detection accuracy and enable zero-shot transfer to new forests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SelvaBox as the largest open dataset for spotting individual tree crowns in tropical forests from high-resolution drone photos. It covers sites in three countries and includes more than 83,000 hand-labeled crowns, far exceeding earlier collections. Tests on the data show that feeding models higher-resolution images raises their accuracy at finding crowns. Detectors trained only on SelvaBox perform as well as or better than other approaches when applied directly to different tropical crown datasets they have never seen. Training one model on SelvaBox together with other datasets at varying resolutions produces a detector that ranks at the top or near the top on every test set.

Core claim

SelvaBox is a new collection of high-resolution drone images covering tropical forests in three countries and carrying manual annotations for more than 83,000 tree crowns. Benchmarks run on this collection establish that higher-resolution input images raise detection accuracy, and that models trained solely on SelvaBox match or exceed prior methods in zero-shot detection on entirely separate tropical crown datasets. A unified training scheme that combines SelvaBox with three other datasets at resolutions between 3 and 10 cm per pixel produces a single detector that places first or second on every evaluated dataset.

What carries the argument

The SelvaBox dataset of manually annotated high-resolution drone images containing more than 83,000 tropical tree crowns.

If this is right

  • Higher-resolution drone imagery becomes a practical requirement for accurate crown detection in tropical settings.
  • Zero-shot use of SelvaBox-trained models lowers the labeling effort needed for new forest sites.
  • A single multi-resolution training pipeline can unify data collected at different pixel scales and still deliver top performance.
  • Better crown detectors support more precise tracking of forest structure changes driven by climate or human activity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset could support improved large-scale estimates of biomass or carbon storage once crown sizes are linked to tree volume models.
  • Similar high-resolution labeling efforts in other ecosystems might reveal whether the resolution benefit holds outside the tropics.
  • Public release of the data and weights allows rapid testing of whether newer detection architectures gain even more from the extra scale.

Load-bearing premise

The manual labeling process produces accurate and consistent ground-truth annotations across the 83,000 crowns.

What would settle it

An experiment showing that models trained exclusively on SelvaBox fall short of competing methods on new unseen tropical crown datasets, or that lower-resolution inputs produce equal or higher detection scores than higher-resolution inputs.

Figures

Figures reproduced from arXiv: 2507.00170 by Antoine Caron-Guay, Arthur Ouaknine, Christopher Pal, Etienne Laliberté, Hugo Baudchon, Martin Weiss, Mélisande Teng, Thomas R. Walla.

Figure 1: The SELVABOX dataset. The illustrated samples are extracted from rasters recorded in Panama, Brazil and Ecuador with a spatial extent of 80 m × 80 m and a resolution of 1.2 to 5.1 cm per pixel. The red square on the right highlights a zoom of the Ecuador sample with a spatial extent of 40 m × 40 m at the same resolution. 1 Introduction. Tropical forests cover 10% of the land area, but they store most of the bio…
Figure 2: Distribution of box annotations size in SELVABOX per country. Annotations. The manual annotations have been produced by six trained biologists. They were asked to label every individual tree crown they could reliably detect from the imagery with bounding boxes. They generated 83 137 manual tree annotations during 1 284 people-hours with crowns spanning from < 2 m to > 50 m in diameter (…
Figure 3: Multi-resolution vs. single-resolution on SELVABOX. Comparison of RF175 between best performing single-resolution methods from Tab. 4 trained with a fixed spatial extent of 80 × 80 m, against multi-resolution approaches with increasingly large crop augmentation ranges ([36, 88], [30, 100] and [30, 120]). All methods are ‘DINO 5-scale Swin L-384’. We structure our experimental results as follows: first, w…
Figure 4: Visualization of spatially separated splits. All 14 rasters of SELVABOX are illustrated with their corresponding train, valid and test AOI-based splits. Images are uniformly sized and not at scale. A few train AOIs (red) have holes to exclude sparse annotations (see Section 3).
Figure 5: Example of masked pixels in sparse annotations zones. Example on a 3555 × 3555 pixels training tile (160 × 160 meters) from the pantano raster. On the left is the raw tile, showing holes (red polygons) in the train AOI geopackage where annotations (white boxes) are sparse. On the right is the preprocessed tile, where pixels overlapping the AOI holes have been masked to remove sparse annotations. AOI holes we…
Figure 6: Example of cropping and resizing augmentations for the multi-resolution approach. We showcase the [30, 120] m configuration used in our benchmark: a 3555 × 3555 tile at 4.5 cm = 0.045 m GSD, equivalent to a 160 × 160 m spatial extent, will be cropped with a random crop size value in [666, 2666] pixels, and then resized to a random value in [1024, 1777] pixels. This process has two effects: 1 cropping perfor…
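The arithmetic in the Figure 6 caption can be checked with a short sketch. It assumes that metre extents are converted to pixels by rounding down and that the crop size and the output size are drawn independently and uniformly; neither detail is stated in the caption, and the function names are hypothetical. Only the constants come from the caption.

```python
import random

# Constants quoted in the Figure 6 caption.
GSD_M = 0.045                    # ground sampling distance, metres per pixel
TILE_PX = 3555                   # tile side length in pixels
EXTENT_RANGE_M = (30, 120)       # crop extent range in metres
RESIZE_RANGE_PX = (1024, 1777)   # output side length range in pixels


def metres_to_pixels(extent_m, gsd_m=GSD_M):
    """Convert a ground extent to whole pixels (assumption: round down)."""
    return int(extent_m / gsd_m)


def sample_crop_and_resize(rng):
    """Draw one (crop_px, out_px, effective_gsd_m) triple.

    Assumption: both sizes are sampled uniformly and independently.
    """
    lo_px = metres_to_pixels(EXTENT_RANGE_M[0])
    hi_px = metres_to_pixels(EXTENT_RANGE_M[1])
    crop_px = rng.randint(lo_px, hi_px)
    out_px = rng.randint(*RESIZE_RANGE_PX)
    # Ground extent of the crop divided by the number of output pixels.
    effective_gsd_m = crop_px * GSD_M / out_px
    return crop_px, out_px, effective_gsd_m


if __name__ == "__main__":
    print("tile extent (m):", TILE_PX * GSD_M)
    print("crop range (px):", metres_to_pixels(30), metres_to_pixels(120))
    print("one sample:", sample_crop_and_resize(random.Random(0)))
```

Under these assumptions the 30 m and 120 m extents map to 666 and 2666 pixels, matching the caption, and the effective ground sampling distance after resizing is the crop's ground extent divided by the output size.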
Figure 7: Multi-resolution vs. single-resolution on SELVABOX. Comparison of mAP50:95 and mAR50:95 between best performing single-resolution methods from …
Figure 8: Distribution of box annotations size across datasets. E.2 External methods evaluation. We keep the default Detectree2 inference parameters provided in their python library. For DeepForest, we use their python library directly to benchmark their method but limit input size to 1000 × 1000 pixels maximum following their documentation guidelines and examples. E.3 ReforesTree dataset qualitative results …
Figure 9: Qualitative results on ReforesTree. In white the ReforesTree annotations generated from an in-distribution and fine-tuned DeepForest model, in blue our best multi-resolution [30, 120] model and in red our best model trained on multi-dataset + SELVABOX (both our methods are OOD). Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section …
Figure 10: Qualitative results on SELVABOX (Brazil). We compare the annotations in white, the best competing method Detectree2-resize (OOD) in yellow, our best multi-resolution [30, 120] model (ID) in blue and our best model trained on multi-dataset + SELVABOX (ID) in red. Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section B.3 for exact va…
Figure 11: Qualitative results on SELVABOX (Ecuador). We compare the annotations in white, the best competing method Detectree2-resize (OOD) in yellow, our best multi-resolution [30, 120] model (ID) in blue and our best model trained on multi-dataset + SELVABOX (ID) in red. Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section B.3 for exact v…
Figure 12: Qualitative results on SELVABOX (Panama). We compare the annotations in white, the best competing method Detectree2-resize (OOD) in yellow, our best multi-resolution [30, 120] model (ID) in blue and our best model trained on multi-dataset + SELVABOX (ID) in red. Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section B.3 for exact va…
Figure 13: Qualitative results on BCI50ha. We compare the annotations in white, the best competing method Detectree2-resize (OOD) in yellow, our best multi-resolution [30, 120] model (OOD) in blue and our best model trained on multi-dataset + SELVABOX (OOD) in red. Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section B.3 for exact values). …
Figure 14: Qualitative results on Detectree2 dataset. We compare the annotations in white, the best competing method Detectree2-resize (ID; possibly affected by train–test leakage, since we couldn’t recover their data splits) in yellow, our best multi-resolution [30, 120] model (OOD) in blue and our best model trained on multi-dataset + SELVABOX (OOD) in red. Results are shown post-NMS, using the optimal NMS IoU (τn…
Figure 15: Qualitative results on QuebecTrees. We compare the annotations in white, the best competing method Detectree2-flexi (OOD) in yellow, our best multi-resolution [30, 120] model (OOD) in blue and our best model trained on multi-dataset + SELVABOX (ID) in red. Results are shown post-NMS, using the optimal NMS IoU (τnms) and score (smin) thresholds for RF175 from Algorithm 1 (see Section B.3 for exact values).…
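The captions above describe results shown "post-NMS" with an NMS IoU threshold (τnms) and a score threshold (smin). The paper's Algorithm 1 is not reproduced on this page, so the following is only a sketch of standard greedy non-maximum suppression with a score filter, swapped in for illustration. The function names, the (x1, y1, x2, y2) box format, and the choice to apply the score filter before suppression are assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, s_min, tau_nms):
    """Return indices of boxes kept after score filtering and greedy NMS.

    Assumption: boxes scoring below s_min are dropped first, then boxes are
    visited in descending score order and kept only if their IoU with every
    already-kept box is at most tau_nms.
    """
    candidates = [i for i, s in enumerate(scores) if s >= s_min]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(box_iou(boxes[i], boxes[j]) <= tau_nms for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy boxes, not taken from the dataset.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(filter_and_nms(boxes, scores, s_min=0.3, tau_nms=0.5))
```

In the toy example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the fourth is dropped by the score filter. Tuning τnms and smin on a validation set, as the captions suggest was done per metric, changes which overlapping crowns survive.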
Original abstract

Detecting individual tree crowns in tropical forests is essential to study these complex and crucial ecosystems impacted by human interventions and climate change. However, tropical crowns vary widely in size, structure, and pattern and are largely overlapping and intertwined, requiring advanced remote sensing methods applied to high-resolution imagery. Despite growing interest in tropical tree crown detection, annotated datasets remain scarce, hindering robust model development. We introduce SelvaBox, the largest open-access dataset for tropical tree crown detection in high-resolution drone imagery. It spans three countries and contains more than 83,000 manually labeled crowns - an order of magnitude larger than all previous tropical forest datasets combined. Extensive benchmarks on SelvaBox reveal two key findings: (1) higher-resolution inputs consistently boost detection accuracy; and (2) models trained exclusively on SelvaBox achieve competitive zero-shot detection performance on unseen tropical tree crown datasets, matching or exceeding competing methods. Furthermore, jointly training on SelvaBox and three other datasets at resolutions from 3 to 10 cm per pixel within a unified multi-resolution pipeline yields a detector ranking first or second across all evaluated datasets. Our dataset, code, and pre-trained weights are made public.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SelvaBox, the largest open-access dataset for tropical tree crown detection, containing more than 83,000 manually labeled crowns from high-resolution drone imagery across three countries. Through benchmarks, it claims that higher-resolution inputs consistently improve detection accuracy, that models trained solely on SelvaBox achieve competitive zero-shot performance on unseen tropical datasets (matching or exceeding prior methods), and that joint multi-resolution training with other datasets produces a detector that ranks first or second across evaluations.

Significance. If the annotations prove reliable, SelvaBox would be a valuable resource addressing data scarcity in tropical remote sensing, enabling more robust models for ecological monitoring. The reported resolution scaling and zero-shot transfer results, if substantiated by rigorous validation, would provide actionable guidance for high-resolution aerial imagery applications in complex forest environments.

major comments (2)
  1. [Dataset description] Dataset creation and labeling section: No inter-annotator agreement, consistency metrics, expert validation subset, or label-noise analysis is reported for the 83,000 crowns. Since the paper explicitly notes that tropical crowns are 'largely overlapping and intertwined,' making boundary decisions subjective, this omission directly affects the reliability of all mAP scores, resolution-gain claims, and zero-shot generalization results.
  2. [Benchmarks and experiments] Experimental setup and results sections: The manuscript omits error bars, standard deviations across runs, exact train/test split ratios, and cross-validation details. Without these, the statistical significance of the 'consistent' accuracy boosts from higher resolution and the competitive zero-shot performance cannot be properly assessed.
minor comments (2)
  1. [Abstract] The abstract states that joint training uses 'resolutions from 3 to 10 cm per pixel' but does not name the three additional datasets or detail the unified multi-resolution pipeline architecture.
  2. [Results tables] Tables comparing SelvaBox-trained models to baselines would benefit from explicit column headers indicating whether results are zero-shot or fine-tuned.
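Major comment 1 asks for inter-annotator agreement on box labels. One common way to quantify it, sketched here under stated assumptions, is to match two annotators' boxes one-to-one by IoU and report the mean IoU of matched pairs together with the share of boxes both annotators marked. The greedy matching, the 0.5 IoU threshold, the box format and all names are illustrative choices, not the paper's protocol.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def annotator_agreement(boxes_a, boxes_b, iou_thr=0.5):
    """Return (mean IoU of matched pairs, fraction of crowns both marked).

    Assumption: pairs are matched greedily from highest IoU downward, each
    box is used at most once, and pairs below iou_thr count as unmatched.
    """
    pairs = sorted(
        ((box_iou(a, b), i, j)
         for i, a in enumerate(boxes_a)
         for j, b in enumerate(boxes_b)),
        reverse=True,
    )
    used_a, used_b, matched_ious = set(), set(), []
    for iou, i, j in pairs:
        if iou < iou_thr:
            break  # remaining pairs overlap even less
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matched_ious.append(iou)
    n_matched = len(matched_ious)
    mean_iou = sum(matched_ious) / n_matched if n_matched else 0.0
    # Distinct crowns marked by at least one annotator.
    n_total = len(boxes_a) + len(boxes_b) - n_matched
    agreement = n_matched / n_total if n_total else 1.0
    return mean_iou, agreement


if __name__ == "__main__":
    # Toy boxes, not taken from the dataset.
    a = [(0, 0, 10, 10), (20, 20, 30, 30)]
    b = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(annotator_agreement(a, b))
```

A low agreement fraction with a high mean IoU would suggest annotators disagree on which crowns exist rather than on where their boundaries lie, which is the distinction the referee's concern about overlapping crowns turns on.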

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript introducing SelvaBox. We address each major point below and describe the revisions that will be incorporated to strengthen the presentation of dataset quality and experimental rigor.

Point-by-point responses
  1. Referee: [Dataset description] Dataset creation and labeling section: No inter-annotator agreement, consistency metrics, expert validation subset, or label-noise analysis is reported for the 83,000 crowns. Since the paper explicitly notes that tropical crowns are 'largely overlapping and intertwined,' making boundary decisions subjective, this omission directly affects the reliability of all mAP scores, resolution-gain claims, and zero-shot generalization results.

    Authors: We agree that the subjective nature of delineating overlapping and intertwined tropical crowns, as stated in the manuscript, makes annotation quality assessment important. The original submission did not include these metrics. In the revised version we will add a dedicated paragraph in the Dataset creation and labeling section that reports inter-annotator agreement (mean IoU and percentage agreement) computed on a 500-image multi-annotator subset, together with a label-noise analysis that quantifies boundary variability. We will also describe the annotation protocol, including training of annotators and expert oversight, to support the reliability of the reported mAP, resolution-scaling, and zero-shot results. revision: yes

  2. Referee: [Benchmarks and experiments] Experimental setup and results sections: The manuscript omits error bars, standard deviations across runs, exact train/test split ratios, and cross-validation details. Without these, the statistical significance of the 'consistent' accuracy boosts from higher resolution and the competitive zero-shot performance cannot be properly assessed.

    Authors: We concur that additional statistical detail is needed to allow readers to evaluate the significance of the resolution and generalization findings. The revised manuscript will include error bars (standard deviation over five independent training runs with different random seeds) on all tables and figures in the Experimental setup and results sections. We will also state the precise train/test split ratios used for each benchmark and clarify whether any form of cross-validation was performed. These changes will make the strength of the higher-resolution accuracy gains and zero-shot competitiveness directly assessable. revision: yes
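The error bars promised in the second response (standard deviation over independent training runs with different seeds) amount to a simple aggregation, sketched below. The metric values are placeholders invented for illustration, not results from the paper, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_runs(metric_by_seed):
    """Return (mean, sample standard deviation) of a metric across seeds."""
    values = list(metric_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a spread")
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Placeholder scores for five seeds; NOT numbers reported by the authors.
    placeholder_scores = {0: 0.50, 1: 0.52, 2: 0.48, 3: 0.51, 4: 0.49}
    avg, spread = summarize_runs(placeholder_scores)
    print(f"{avg:.3f} ± {spread:.3f}")
```

Reporting the sample standard deviation (dividing by n - 1) is the usual choice when only a handful of runs is available. Two configurations whose means differ by less than this spread cannot be confidently ranked.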

Circularity Check

0 steps flagged

No circularity: benchmarks use held-out and external test sets

full rationale

The paper introduces SelvaBox and reports empirical detection accuracies from standard train/test splits on its own data plus zero-shot transfer to completely separate external tropical crown datasets. These mAP numbers are computed against manual annotations on data the models were not trained on, rather than reducing by construction to any fitted parameter, self-definition, or self-citation chain. No equations, uniqueness theorems, or ansatzes are invoked that collapse the claimed results back into the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests primarily on the assumption that manual crown annotations are sufficiently accurate to serve as ground truth; no new mathematical axioms or invented physical entities are introduced.

axioms (1)
  • domain assumption: Manual annotations of tree crowns in drone imagery are accurate and consistent enough to serve as reliable ground truth for model evaluation.
    All reported detection accuracies and generalization results depend on these labels being correct.

pith-pipeline@v0.9.0 · 5763 in / 1315 out tokens · 49977 ms · 2026-05-19T06:43:33.422572+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine

    cs.CV 2026-04 unverdicted novelty 3.0

    ResAF-Net detects trees in satellite imagery for Palestinian agriculture, reaching 82% recall and 63% mAP@0.5 on the MillionTrees validation set and deployed in a web GIS.

Reference graph

Works this paper leans on

101 extracted references · 101 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    M. R. M. Amaral, A. J. N. Lima, F. G. Higuchi, J. dos Santos, and N. Higuchi. Dynamics of Tropical Forest Twenty-Five Years after Experimental Logging in Central Amazon Mature Forest. Forests, 10(2):89, Feb. 2019. 4

  2. [2]

    BAI, J.-B

    Y . BAI, J.-B. Durand, G. L. Vincent, and F. Forbes. Semantic segmentation of sparse irregular point clouds for leaf/wood discrimination. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. 2

  3. [3]

    accurate delineation of individual tree crowns in tropical forests from aerial rgb imagery using mask r-cnn

    J. Ball, T. Jackson, S. Hickman, and X. J. Koay. Crown data for "accurate delineation of individual tree crowns in tropical forests from aerial rgb imagery using mask r-cnn", Apr. 2023. 5

  4. [4]

    J. G. C. Ball, S. H. M. Hickman, T. D. Jackson, X. J. Koay, J. Hirst, W. Jay, M. Archer, M. Aubry-Kientz, G. Vincent, and D. A. Coomes. Accurate delineation of individual tree crowns in tropical forests from aerial RGB imagery using Mask R-CNN. Remote Sensing in Ecology and Conservation, 9(5):641–655, Oct. 2023. 2, 3, 5, 6

  5. [5]

    S. M. A. Bashir and Y . Wang. Small Object Detection in Remote Sensing Images with Residual Feature Aggregation-Based Super-Resolution and Object Detector Network. Remote Sensing, 13(9):1854, May 2021. Publisher: MDPI AG. 2

  6. [6]

    Beloiu, L

    M. Beloiu, L. Heinzmann, N. Rehush, A. Gessler, and V . C. Griess. Individual Tree-Crown Detection and Species Identification in Heterogeneous Forests Using Aerial RGB Imagery and Deep Learning. Remote Sensing, 15(5):1463, Mar. 2023. 3

  7. [7]

    Bodla, B

    N. Bodla, B. Singh, R. Chellappa, and L. S. Davis. Soft-nms — improving object detection with one line of code. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5562–5570, 2017. 9

  8. [8]

    G. B. Bonan. Forests and Climate Change: Forcings, Feedbacks, and the Climate Benefits of Forests. Science, 320(5882):1444–1449, June 2008. 2

  9. [9]

    N. I. Bountos, A. Ouaknine, I. Papoutsis, and D. Rolnick. FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring. Proceedings of the AAAI Conference on Artificial Intelligence, 39(27):27858–27868, Apr. 2025. 2, 3

  10. [10]

    Brandt, C

    M. Brandt, C. J. Tucker, A. Kariryaa, K. Rasmussen, C. Abel, J. Small, J. Chave, L. V . Rasmussen, P. Hiernaux, A. A. Diouf, L. Kergoat, O. Mertz, C. Igel, F. Gieseke, J. Schöning, S. Li, K. Melocik, J. Meyer, S. Sinno, E. Romero, E. Glennie, A. Montagu, M. Dendoncker, and R. Fensholt. An unexpectedly large count of trees in the West African Sahara and Sa...

  11. [11]

    Brandtberg and F

    T. Brandtberg and F. Walter. Automated delineation of individual tree crowns in high spatial resolution aerial images by multiple-scale analysis.Machine Vision and Applications, 11(2):64– 73, Oct. 1998. 3

  12. [12]

    R. J. W. Brienen, O. L. Phillips, T. R. Feldpausch, E. Gloor, T. R. Baker, J. Lloyd, G. Lopez- Gonzalez, A. Monteagudo-Mendoza, Y . Malhi, S. L. Lewis, et al. Long-term decline of the Amazon carbon sink. Nature, 519(7543):344–348, Mar. 2015. 2 10

  13. [13]

    Cheng, I

    B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar. Masked-Attention Mask Transformer for Universal Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1290–1299, June 2022. 3

  14. [14]

    Cloutier, M

    M. Cloutier, M. Germain, and E. Laliberté. Quebec trees dataset, Sept. 2023. 5

  15. [15]

    Cloutier, M

    M. Cloutier, M. Germain, and E. Laliberté. Influence of temperate forest autumn leaf phenology on segmentation of tree species from UA V imagery using deep learning.Remote Sensing of Environment, 311:114283, Sept. 2024. 2, 3, 5

  16. [16]

    D. S. Culvenor. TIDA: an algorithm for the delineation of tree crowns in high spatial resolution remotely sensed imagery. Computers & Geosciences, 28(1):33–44, Feb. 2002. 3

  17. [17]

    S. J. Davies, I. Abiem, K. Abu Salim, S. Aguilar, D. Allen, A. Alonso, K. Anderson-Teixeira, A. Andrade, G. Arellano, et al. ForestGEO: Understanding forest diversity and dynamics through a global observatory network. Biological Conservation, 253:108907, Jan. 2021. 2

  18. [18]

    R. A. F. de Lima, O. L. Phillips, A. Duque, J. S. Tello, S. J. Davies, A. A. de Oliveira, S. Muller, E. N. Honorio Coronado, E. Vilanova, A. Cuni-Sanchez, T. R. Baker, C. M. Ryan, A. Malizia, S. L. Lewis, H. ter Steege, J. Ferreira, B. S. Marimon, H. T. Luu, G. Imani, L. Arroyo, C. Blundo, D. Kenfack, M. N. Sainge, B. Sonké, and R. Vásquez. Making forest ...

  19. [19]

    M. Erikson. Species classification of individually segmented tree crowns in high-resolution aerial images using radiometric and morphologic image measures. Remote Sensing of Environ- ment, 91(3-4):469–477, June 2004. 3

  20. [20]

    Esquivel-Muelbert, T

    A. Esquivel-Muelbert, T. R. Baker, K. G. Dexter, S. L. Lewis, R. J. W. Brienen, T. R. Feld- pausch, J. Lloyd, A. Monteagudo-Mendoza, L. Arroyo, Álvarez-Dávila, et al. Compositional response of Amazon forests to climate change. Global Change Biology, 25(1):39–56, 2019. 2

  21. [21]

    Firoze, C

    A. Firoze, C. Wingren, R. A. Yeh, B. Benes, and D. Aliaga. Tree Instance Segmentation with Temporal Contour Graph. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2193–2202, Vancouver, BC, Canada, June 2023. IEEE. 3, 5

  22. [22]

    H. Fu, H. Zhao, J. Jiang, Y . Zhang, G. Liu, W. Xiao, S. Du, W. Guo, and X. Liu. Automatic detection tree crown and height using Mask R-CNN based on unmanned aerial vehicles images for biomass mapping. Forest Ecology and Management, 555:121712, Mar. 2024. 3

  23. [23]

    J. R. G. Braga, V . Peripato, R. Dalagnol, M. P. Ferreira, Y . Tarabalka, L. E. O. C. Aragão, H. F. De Campos Velho, E. H. Shiguemori, and F. H. Wagner. Tree Crown Delineation Algorithm Based on a Convolutional Neural Network. Remote Sensing, 12(8):1288, Apr. 2020. 3

  24. [24]

    N. C. Galuszynski, R. Duker, A. J. Potts, and T. Kattenborn. Automated mapping of Por- tulacaria afra canopies for restoration monitoring with convolutional neural networks and heterogeneous unmanned aerial vehicle imagery. PeerJ, 10:e14219, Oct. 2022. 3

  25. [25]

    Y . Gan, Q. Wang, and A. Iio. Tree Crown Detection and Delineation in a Temperate Decid- uous Forest from UA V RGB Imagery Using Deep Learning Approaches: Effects of Spatial Resolution and Species Characteristics. Remote Sensing, 15(3):778, Jan. 2023. 3

  26. [26]

    R. C. Gatti, P. B. Reich, J. G. P. Gamarra, T. Crowther, C. Hui, A. Morera, J.-F. Bastin, S. de-Miguel, G.-J. Nabuurs, J.-C. Svenning, J. M. Serra-Diaz, et al. The number of tree species on Earth. Proceedings of the National Academy of Sciences, 119(6):e2115329119, Feb

  27. [27]

    Gaydon and F

    C. Gaydon and F. Roche. PureForest: A Large-Scale Aerial Lidar and Aerial Imagery Dataset for Tree Species Classification in Monospecific Forests. In Proceedings of the Winter Conference on Applications of Computer Vision (WACV), pages 5895–5904, Feb. 2025. 2

  28. [28]

    F. A. Gougeon. A Crown-Following Approach to the Automatic Delineation of Individual Tree Crowns in High Spatial Resolution Aerial Images. Canadian Journal of Remote Sensing, 21(3):274–284, Aug. 1995. 3 11

  29. [29]

    Z. Hao, L. Lin, C. J. Post, E. A. Mikhailova, M. Li, Y . Chen, K. Yu, and J. Liu. Automated tree- crown and height detection in a young forest plantation using mask region-based convolutional neural network (Mask R-CNN). ISPRS Journal of Photogrammetry and Remote Sensing , 178:112–123, Aug. 2021. 3

  30. [30]

    N. L. Harris, D. A. Gibbs, A. Baccini, R. A. Birdsey, S. De Bruin, M. Farina, L. Fatoyinbo, M. C. Hansen, M. Herold, R. A. Houghton, P. V . Potapov, D. R. Suarez, R. M. Roman-Cuesta, S. S. Saatchi, C. M. Slay, S. A. Turubanova, and A. Tyukavina. Global maps of twenty-first century forest carbon fluxes. Nature Climate Change, 11(3):234–240, Mar. 2021. 2

  31. [31]

    K. He, G. Gkioxari, P. Dollar, and R. Girshick. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, Venice, Oct. 2017. IEEE. 3

  32. [32]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’16, pages 770–778. IEEE, June 2016. 6

  33. [33]

    Henrich, J

    J. Henrich, J. v. Delden, D. Seidel, T. Kneib, and A. Ecker. TreeLearn: A deep learning method for segmenting individual trees from ground-based LiDAR forest point clouds. Ecological Informatics, 84:102888, Dec. 2024. arXiv:2309.08471 [cs]. 2

  34. [34]

    Hoorn, F

    C. Hoorn, F. P. Wesselingh, H. ter Steege, M. A. Bermudez, A. Mora, J. Sevink, I. Sanmartin, A. Sanchez-Meseguer, C. L. Anderson, J. P. Figueiredo, C. Jaramillo, D. Riff, F. R. Negri, H. Hooghiemstra, J. Lundberg, T. Stadler, T. Sarkinen, and A. Antonelli. Amazonia Through Time: Andean Uplift, Climate Change, Landscape Evolution, and Biodiversity. Science...

  35. [35]

    Jiang, M

    T. Jiang, M. Freudenberg, C. Kleinn, T. Lüddecke, A. Ecker, and N. Nölke. Detection transformer-based approach for mapping trees outside forests on high resolution satellite imagery. Ecological Informatics, 87:103114, 2025. 2

  36. [36]

    Kattenborn, J

    T. Kattenborn, J. Eichel, S. Wiser, L. Burrows, F. E. Fassnacht, and S. Schmidtlein. Convolu- tional Neural Networks accurately predict cover fractions of plant species and communities in Unmanned Aerial Vehicle imagery. Remote Sensing in Ecology and Conservation, 6(4):472– 486, Dec. 2020. 3

  37. [37]

    Kattenborn, J

    T. Kattenborn, J. Leitloff, F. Schiefer, and S. Hinz. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173:24–49, Mar. 2021. 3

  38. [38]

    Kattenborn, J

    T. Kattenborn, J. Lopatin, M. Förster, A. C. Braun, and F. E. Fassnacht. UA V data as alternative to field sampling to map woody invasive species based on combined Sentinel-1 and Sentinel-2 data. Remote Sensing of Environment, 227:61–73, June 2019. 3

  39. [39]

    Kattenborn, F

    T. Kattenborn, F. Schiefer, J. Frey, H. Feilhauer, M. D. Mahecha, and C. F. Dormann. Spatially autocorrelated training and validation samples inflate performance assessment of convolutional neural networks. ISPRS Open Journal of Photogrammetry and Remote Sensing, 5:100018,

  40. [40]

    Ke and L

    Y . Ke and L. J. Quackenbush. A review of methods for automatic individual tree-crown detection and delineation from passive remote sensing. International Journal of Remote Sensing, 32(17):4725–4747, Sept. 2011. 3

  41. [41]

    Segment Anything

    A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Lo, P. Dollár, and R. Girshick. Segment Anything, Apr. 2023. arXiv:2304.02643 [cs]. 3

  42. [42]

    N. Lang, W. Jetz, K. Schindler, and J. D. Wegner. A high-resolution canopy height model of the Earth. Nature Ecology & Evolution, 7(11):1778–1789, Sept. 2023. 2

  43. [43]

    Lefebvre and E

    I. Lefebvre and E. Laliberté. UA V LiDAR, UA V Imagery, Tree Segmentations and Ground Mesurements for Estimating Tree Biomass in Canadian (Quebec) Plantations, July 2024. 3, 5 12

  44. [44]

    F. Li, H. Zhang, S. Liu, J. Guo, L. M. Ni, and L. Zhang. Dn-detr: Accelerate detr training by introducing query denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13619–13627, 2022. 3

  45. [45]

    F. Li, H. Zhang, H. Xu, S. Liu, L. Zhang, L. M. Ni, and H.-Y . Shum. Mask DINO: Towards a Unified Transformer-Based Framework for Object Detection and Segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3041–3050, June 2023. 3

  46. [46]

    W. Li, H. Fu, L. Yu, and A. Cracknell. Deep Learning Based Oil Palm Tree Detection and Counting for High-Resolution Remote Sensing Images. Remote Sensing, 9(1):22, Dec. 2016. 3

  47. [47]

    Y. Li, Q. Huang, X. Pei, Y. Chen, L. Jiao, and R. Shang. Cross-Layer Attention Network for Small Object Detection in Remote Sensing Imagery. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 14:2148–2161, 2021. Publisher: Institute of Electrical and Electronics Engineers (IEEE). 2

  48. [48]

    T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal Loss for Dense Object Detection. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2999–3007, Venice, Oct. 2017. IEEE. 3

  49. [49]

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing. 5

  50. [50]

    M. Liu, T. Yu, X. Gu, Z. Sun, J. Yang, Z. Zhang, X. Mi, W. Cao, and J. Li. The Impact of Spatial Resolution on the Classification of Vegetation Types in Highly Fragmented Planting Areas Based on Unmanned Aerial Vehicle Hyperspectral Images. Remote Sensing, 12(1):146, Jan. 2020. 3

  51. [51]

    S. Liu, F. Li, H. Zhang, X. Yang, X. Qi, H. Su, J. Zhu, and L. Zhang. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR. In International Conference on Learning Representations, 2022. 3

  52. [52]

    Y. Liu, H. You, X. Tang, Q. You, Y. Huang, and J. Chen. Study on Individual Tree Segmentation of Different Tree Species Using Different Segmentation Algorithms Based on 3D UAV Data. Forests, 14(7):1327, June 2023. Publisher: MDPI AG. 2

  53. [53]

    Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, F. Wei, and B. Guo. Swin Transformer V2: Scaling Up Capacity and Resolution. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11999–12009, 2022. 2

  54. [54]

    Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9992–10002, 2021. 2, 6

  55. [55]

    J. A. Lutz, T. J. Furniss, D. J. Johnson, S. J. Davies, D. Allen, A. Alonso, K. J. Anderson-Teixeira, A. Andrade, J. Baltzer, Becker, et al. Global importance of large-diameter trees. Global Ecology and Biogeography, 27(7):849–864, 2018. 2

  56. [56]

    Z. Ma, Y. Dong, J. Zi, F. Xu, and F. Chen. Forest-PointNet: A Deep Learning Model for Vertical Structure Segmentation in Complex Forest Scenes. Remote Sensing, 15(19):4793, Sept. 2023. Publisher: MDPI AG. 2

  57. [57]

    C. Mayoral, M. van Breugel, A. Cerezo, and J. S. Hall. Survival and growth of five Neotropical timber species in monocultures and mixtures. Forest Ecology and Management, 403:1–11, Nov. 2017. 4

  58. [58]

    C. Mosig, J. Vajna-Jehle, M. D. Mahecha, Y. Cheng, H. Hartmann, D. Montero, S. Junttila, S. Horion, S. Adu-Bredu, D. Al-Halbouni, M. Allen, J. Altman, et al. deadtrees.earth - An Open-Access and Interactive Database for Centimeter-Scale Aerial Imagery to Uncover Global Tree Mortality Dynamics, Oct. 2024. 3

  59. [59]

    R. Näsi, E. Honkavaara, P. Lyytikäinen-Saarenmaa, M. Blomqvist, P. Litkey, T. Hakala, N. Viljanen, T. Kantola, T. Tanhuanpää, and M. Holopainen. Using UAV-Based Photogrammetry and Hyperspectral Imaging for Mapping Bark Beetle Damage at Tree-Level. Remote Sensing, 7(11):15467–15493, Nov. 2015. 3

  60. [60]

    M. Onishi and T. Ise. Explainable identification and mapping of trees using UAV RGB image and deep learning. Scientific Reports, 11(1):903, Jan. 2021. 3

  61. [61]

    M. Oquab, T. Darcet, T. Moutakanni, H. V. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. HAZIZA, F. Massa, A. El-Nouby, M. Assran, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra, M. Rabbat, V. Sharma, G. Synnaeve, H. Xu, H. Jegou, J. Mairal, P. Labatut, A. Joulin, and P. Bojanowski. DINOv2: Learning Robust Visual Features without S...

  62. [62]

    A. Ouaknine, T. Kattenborn, E. Laliberté, and D. Rolnick. OpenForest: a data catalog for machine learning in forest monitoring. Environmental Data Science, 4:e15, 2025. 2, 3

  63. [63]

    Y. Pan, R. A. Birdsey, J. Fang, R. Houghton, P. E. Kauppi, W. A. Kurz, O. L. Phillips, A. Shvidenko, S. L. Lewis, J. G. Canadell, P. Ciais, R. B. Jackson, S. W. Pacala, A. D. McGuire, S. Piao, A. Rautiainen, S. Sitch, and D. Hayes. A Large and Persistent Carbon Sink in the World’s Forests. Science, 333(6045):988–993, Aug. 2011. Publisher: American Associa...

  64. [64]

    S. Puliti, E. R. Lines, J. Müllerová, J. Frey, Z. Schindler, A. Straker, M. J. Allen, L. Winiwarter, N. Rehush, H. Hristova, B. Murray, K. Calders, N. Coops, B. Höfle, L. Irwin, et al. Benchmarking tree species classification from proximally sensed laser scanning data: Introducing the for-species20k dataset. Methods in Ecology and Evolution, 16(4):801–...

  65. [65]

    S. Puliti, G. Pearse, P. Surový, L. Wallace, M. Hollaus, M. Wielgosz, and R. Astrup. FOR-instance: a UAV laser scanning benchmark dataset for semantic and instance segmentation of individual trees, Sept. 2023. arXiv:2309.01279 [cs]. 2

  66. [66]

    J. Rabbi, N. Ray, M. Schubert, S. Chowdhury, and D. Chao. Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Remote Sensing, 12(9):1432, May 2020. Publisher: MDPI AG. 2

  67. [67]

    C. J. Reed, R. Gupta, S. Li, S. Brockman, C. Funk, B. Clipp, K. Keutzer, S. Candido, M. Uyttendaele, and T. Darrell. Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4088–4099, Oct. 2023. 3

  68. [68]

    G. Reiersen, D. Dao, B. Lütjens, K. Klemmer, K. Amara, A. Steinegger, C. Zhang, and X. Zhu. ReforesTree: A Dataset for Estimating Tropical Forest Carbon Stock with Deep Learning and Aerial Imagery. Proceedings of the AAAI Conference on Artificial Intelligence, 36(11):12119–12125, June 2022. 2, 3, 5

  69. [69]

    S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, pages 91–99, Cambridge, MA, USA,

    MIT Press. event-place: Montreal, Canada. 3

  71. [71]

    S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. 6

  72. [72]

    T. Ren, S. Liu, F. Li, H. Zhang, A. Zeng, J. Yang, X. Liao, D. Jia, H. Li, H. Cao, J. Wang, Z. Zeng, X. Qi, Y. Yuan, J. Yang, and L. Zhang. detrex: Benchmarking detection transformers,

  73. [73]

    D. Rolnick, A. Aspuru-Guzik, S. Beery, B. Dilkina, P. L. Donti, M. Ghassemi, H. Kerner, C. Monteleoni, E. Rolf, M. Tambe, and A. White. Position: Application-Driven Innovation in Machine Learning. In R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, and F. Berkenkamp, editors, Proceedings of the 41st International Conference on Ma...

  74. [74]

    A. A. D. Santos, J. Marcato Junior, M. S. Araújo, D. R. Di Martini, E. C. Tetila, H. L. Siqueira, C. Aoki, A. Eltner, E. T. Matsubara, H. Pistori, R. Q. Feitosa, V. Liesenberg, and W. N. Gonçalves. Assessment of CNN-Based Methods for Individual Tree Detection on Images Captured by RGB Cameras Attached to UAVs. Sensors, 19(16):3595, Aug. 2019. 3

  75. [75]

    F. Schiefer, T. Kattenborn, A. Frick, J. Frey, P. Schall, B. Koch, and S. Schmidtlein. Mapping forest tree species in high resolution UAV-based RGB-imagery by means of convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing, 170:205–215, Dec

  76. [76]

    R. Solovyev, W. Wang, and T. Gabruseva. Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing, 107:104117, 2021. 9

  77. [77]

    M. Teng, A. Ouaknine, E. Laliberté, Y. Bengio, D. Rolnick, and H. Larochelle. Assessing SAM for Tree Crown Instance Segmentation from Drone Imagery, Mar. 2025. arXiv:2503.20199 [cs]. 3

  78. [78]

    J. Tolan, H.-I. Yang, B. Nosarzewski, G. Couairon, H. V. Vo, J. Brandt, J. Spore, S. Majumdar, D. Haziza, J. Vamaraju, T. Moutakanni, P. Bojanowski, T. Johns, B. White, T. Tiecke, and C. Couprie. Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on aerial lidar. Remote Sen...

  79. [79]

    H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jegou. Training data-efficient image transformers & distillation through attention. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 10347–10357. PMLR, July 2021. 3

  80. [80]

    J. Troles, U. Schmid, W. Fan, and J. Tian. BAMFORESTS: Bamberg Benchmark Forest Dataset of Individual Tree Crowns in Very-High-Resolution UAV Images. Remote Sensing, 16(11):1935, May 2024. 3
