pith. machine review for the scientific record.

arxiv: 2604.23653 · v1 · submitted 2026-04-26 · 💻 cs.CV · cs.AI

ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine

Rabee Al-Qasem

Pith reviewed 2026-05-08 06:47 UTC · model grok-4.3

keywords: tree detection, satellite imagery, agricultural mapping, anchor-free detection, attention mechanism, GIS integration, crop monitoring, Palestine

The pith

ResAF-Net uses a ResNet encoder with attention and an anchor-free head to detect trees in satellite images for agricultural mapping in Palestine.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops ResAF-Net to address the difficulty of collecting reliable tree and crop data in Palestine, where fragmented land, limited field access, and restrictions on aerial monitoring hamper traditional surveys. The network processes satellite imagery through a ResNet-50 backbone, atrous spatial pyramid pooling, feature fusion, and multi-head self-attention before an anchor-free FCOS head locates individual trees. On the MillionTrees validation split it shows high sensitivity to tree presence (82% recall), and it is then linked to local cadastral records inside a web GIS tool. This setup allows analysis from single scenes to entire communities and creates a starting point for remote, data-driven agricultural monitoring.

Core claim

ResAF-Net is an anchor-free attention-based network for tree detection that combines a ResNet-50 encoder, Atrous Spatial Pyramid Pooling, a feature-fusion stage, a multi-head self-attention refinement module, and an FCOS detection head. Trained on the MillionTrees benchmark, the model records 82 percent recall, 63.03 percent mAP at 0.50 IoU, and 35.47 percent mAP across 0.50 to 0.95 IoU on the validation set. The same model is deployed inside a web-based GIS application that ingests Palestinian cadastral data from GeoMolg, enabling tree analysis at scene, parcel, and community scales and providing a practical route to large-scale agricultural inventorying.
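
To make the reported metrics concrete, the sketch below shows how intersection-over-union (IoU) and recall at a fixed IoU threshold are commonly computed for axis-aligned boxes. It is a minimal illustration, not the paper's evaluation code: the box format `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, and the names `iou`, `recall_at_iou`, and `COCO_THRESHOLDS` are assumptions made here. The threshold list assumes that "mAP across 0.50 to 0.95 IoU" follows the usual COCO convention of ten thresholds in steps of 0.05.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(ground_truth, predictions, threshold=0.5):
    """Fraction of ground-truth boxes matched by a prediction at IoU >= threshold.

    Assumption: greedy matching, each prediction used for at most one
    ground-truth box. Real evaluators also sort predictions by confidence.
    """
    if not ground_truth:
        return 0.0
    used = set()
    matched = 0
    for gt in ground_truth:
        best_index, best_score = None, threshold
        for j, pred in enumerate(predictions):
            if j in used:
                continue
            score = iou(gt, pred)
            if score >= best_score:
                best_index, best_score = j, score
        if best_index is not None:
            used.add(best_index)
            matched += 1
    return matched / len(ground_truth)


# Assumed COCO-style thresholds for mAP@0.50:0.95: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]
```

Under this definition, a recall of 82% at IoU 0.50 means that roughly four in five annotated trees were overlapped by some predicted box with at least half of their combined area in common; it says nothing about how many extra boxes were predicted, which is what the mAP figures capture.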

What carries the argument

ResAF-Net architecture that stacks a ResNet-50 encoder, atrous spatial pyramid pooling, feature fusion, multi-head self-attention refinement, and an anchor-free FCOS head to localize trees in dense or heterogeneous satellite scenes.
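
For readers unfamiliar with anchor-free heads, the sketch below illustrates the FCOS-style box parameterization: each feature-map location predicts four distances to the sides of a box, plus a centerness score that down-weights locations far from the box center. It follows the original FCOS formulation rather than ResAF-Net's own code, which is not shown in the source; the function names and the plain-tuple interface are assumptions for illustration.

```python
import math


def decode_fcos_box(x, y, left, top, right, bottom):
    """Turn a location (x, y) and its four predicted side distances into a box.

    Returns (x_min, y_min, x_max, y_max) in the same units as the inputs.
    """
    return (x - left, y - top, x + right, y + bottom)


def centerness(left, top, right, bottom):
    """FCOS centerness: 1.0 at the box center, approaching 0 near the edges."""
    # A non-positive distance means the location is on or outside the box.
    if min(left, top, right, bottom) <= 0:
        return 0.0
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)
```

Because no anchor shapes are fixed in advance, this parameterization can fit crowns of widely varying size, which is one plausible reason to prefer it for dense or heterogeneous canopy.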

If this is right

  • The GIS integration permits tree inventories at scene, parcel, and community levels using existing cadastral layers.
  • The deployment shows that satellite-based detection can support agricultural monitoring where physical or aerial access is limited.
  • The framework supplies a base for later species-level classification of Mediterranean tree crops.
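
The parcel-level inventory in the first point amounts to a spatial join between detections and cadastral polygons. A minimal sketch follows, assuming detections are reduced to box centers and parcels are simple polygons in the same coordinate system. The parcel IDs, the data layout, and the ray-casting test are illustrative choices made here, not details taken from the paper or from GeoMolg.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test: is (px, py) inside a simple polygon [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def count_trees_per_parcel(boxes, parcels):
    """Count detections per parcel, assigning each box by its center point.

    boxes:   iterable of (x_min, y_min, x_max, y_max)
    parcels: dict mapping a hypothetical parcel ID to its polygon vertices
    """
    counts = {parcel_id: 0 for parcel_id in parcels}
    for x_min, y_min, x_max, y_max in boxes:
        cx = (x_min + x_max) / 2
        cy = (y_min + y_max) / 2
        for parcel_id, polygon in parcels.items():
            if point_in_polygon(cx, cy, polygon):
                counts[parcel_id] += 1
                break  # a tree belongs to at most one parcel
    return counts
```

A production system would more likely rely on a geospatial library with spatial indexing and proper handling of coordinate reference systems; the pure-Python version only shows the logic.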

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar architectures could be tested in other regions that face comparable access or data-collection barriers.
  • Routine use of the system might feed into broader land-use planning and food-security tracking at national scale.
  • Performance could be checked by running the model on fresh Palestinian satellite scenes and measuring agreement with any available local reference data.

Load-bearing premise

That benchmark performance on MillionTrees will transfer to the fragmented and restricted landscapes of Palestine without extra domain adaptation or local validation data.

What would settle it

A direct comparison of model predictions against ground-truth tree counts obtained from field visits or very-high-resolution local imagery in selected Palestinian agricultural parcels.
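
Such a comparison could be summarized with simple per-parcel count-agreement statistics. The sketch below is one hypothetical way to do so; the choice of metrics (mean absolute error, mean signed bias, and relative error of the total) and the dictionary input format are assumptions, and no such figures are reported in the paper.

```python
def count_agreement(predicted, reference):
    """Compare predicted and reference tree counts on the parcels both cover.

    predicted, reference: dicts mapping a parcel ID to a tree count.
    """
    shared = sorted(set(predicted) & set(reference))
    if not shared:
        raise ValueError("no parcels in common between predictions and reference")
    diffs = [predicted[p] - reference[p] for p in shared]
    total_reference = sum(reference[p] for p in shared)
    return {
        "parcels": len(shared),
        # Average size of the per-parcel miscount, ignoring direction.
        "mae": sum(abs(d) for d in diffs) / len(diffs),
        # Positive bias means the model over-counts on average.
        "bias": sum(diffs) / len(diffs),
        # Error of the summed count relative to the reference total.
        "relative_total_error": (
            sum(diffs) / total_reference if total_reference else None
        ),
    }
```

Reporting bias alongside absolute error matters here because over- and under-counts can cancel at community scale while remaining large for individual parcels.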

Figures

Figures reproduced from arXiv: 2604.23653 by Rabee Al-Qasem.

Figure 1: Sample Image with its annotations from different …
Figure 2: The distribution of aspect ratio between different …
Figure 4: Tree detection architecture consisting of an encoder, …
Figure 7: High-level architecture of the self-attention refiner. FPN …
Figure 8: Architecture of the Anchor-Free Detection Head.
Figure 9: Training loss curves for the detection model over …
Figure 10: Visual results from the validation dataset of Model Detections. The images display ground-truth labels (green) and our model’s predictions (purple). The high Recall (82%) is evident as the purple boxes successfully intersect with the vast majority of green boxes.
Figure 11: Visual representation of the 6,184 trees detected by …
Figure 12: Visual representation of the 384 trees detected by the …
Figure 13: Visual representation of the 6,641 trees detected by …

Original abstract

Reliable agricultural data is essential for food security, land-use planning, and economic resilience, yet in Palestine, such data remains difficult to collect at scale because of fragmented landscapes, limited field access, and restrictions on aerial monitoring. This paper presents ResAF-Net, a satellite-based tree detection framework designed for large-scale agricultural monitoring in resource-constrained settings. The proposed architecture combines a ResNet-50 encoder, Atrous Spatial Pyramid Pooling (ASPP), a feature-fusion stage, a multi-head self-attention refinement module, and an anchor-free FCOS detection head to improve tree localization in dense and heterogeneous scenes. Trained on the MillionTrees benchmark, the model achieved 82% Recall, 63.03% mAP@0.50, and 35.47% mAP@0.50:0.95 on the validation split, indicating strong sensitivity to tree presence while maintaining competitive localization quality. Beyond benchmark evaluation, we implemented the model within a web-based GIS application integrated with Palestinian cadastral data from GeoMolg, enabling tree analysis at scene, parcel, and community levels. This deployment demonstrates the practical feasibility of AI-assisted agricultural inventorying in Palestine. It provides a foundation for data-driven monitoring, reporting, and future species-level analysis of Mediterranean tree crops.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ResAF-Net, an anchor-free network for satellite-based tree detection that combines a ResNet-50 encoder, ASPP, feature fusion, multi-head self-attention, and an FCOS head. Trained on the MillionTrees benchmark, it reports 82% recall, 63.03% mAP@0.50 and 35.47% mAP@0.50:0.95 on the validation split, and describes a qualitative web-GIS deployment integrated with Palestinian GeoMolg cadastral data for multi-scale tree inventorying.

Significance. If the reported performance generalizes beyond the MillionTrees benchmark to the target domain, the work could supply a practical monitoring tool for fragmented agricultural landscapes where field access and aerial data are restricted. The cadastral integration is a concrete step toward operational use, but the absence of any local quantitative results limits the immediate significance to a proof-of-concept on a public benchmark.

major comments (2)
  1. [Abstract] Abstract and deployment description: the central claim that the system demonstrates 'practical feasibility of AI-assisted agricultural inventorying in Palestine' rests on an untested generalization assumption. All numeric results (82% Recall, 63.03% mAP@0.50, 35.47% mAP@0.50:0.95) are reported exclusively on the MillionTrees validation split; no accuracy figures, local test set, fine-tuning protocol, or domain-adaptation results on Palestinian satellite scenes or parcels are provided.
  2. [Abstract] The weakest assumption—that benchmark performance will transfer to Palestine’s heterogeneous, restricted-access landscapes—is load-bearing for the application claim yet unsupported by any quantitative evidence in the manuscript.
minor comments (1)
  1. [Abstract] Training protocol details (data splits, augmentation strategy, hyper-parameter search, optimizer, and error analysis) are omitted from the abstract and not referenced in the provided text, preventing assessment of robustness or overfitting risk.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the manuscript. We agree that the claims regarding practical feasibility in Palestine are not quantitatively supported by local data and will revise the abstract and add explicit limitations discussion to align claims with the presented evidence.

Point-by-point responses
  1. Referee: [Abstract] Abstract and deployment description: the central claim that the system demonstrates 'practical feasibility of AI-assisted agricultural inventorying in Palestine' rests on an untested generalization assumption. All numeric results (82% Recall, 63.03% mAP@0.50, 35.47% mAP@0.50:0.95) are reported exclusively on the MillionTrees validation split; no accuracy figures, local test set, fine-tuning protocol, or domain-adaptation results on Palestinian satellite scenes or parcels are provided.

    Authors: We agree that the manuscript reports all numeric results exclusively on the MillionTrees validation split and provides no local test set, fine-tuning protocol, or domain-adaptation results on Palestinian scenes. The web-GIS deployment with GeoMolg data is presented as a qualitative illustration of integration for multi-scale inventorying rather than a validated local deployment. We will revise the abstract to replace the phrase 'demonstrates the practical feasibility' with 'illustrates the potential for' and add a dedicated limitations paragraph stating the absence of local quantitative validation. This addresses the untested generalization directly.

    revision: yes

  2. Referee: [Abstract] The weakest assumption—that benchmark performance will transfer to Palestine’s heterogeneous, restricted-access landscapes—is load-bearing for the application claim yet unsupported by any quantitative evidence in the manuscript.

    Authors: The referee is correct that transferability to Palestinian landscapes is assumed without quantitative support in the current work. No local satellite scenes or parcels were used for testing or adaptation, owing to the access restrictions noted in the introduction. We will revise the abstract and conclusion to temper the language, explicitly framing the Palestine component as a proof-of-concept deployment framework rather than a demonstrated operational system. A new discussion subsection will acknowledge the domain-shift risks and the need for future local fine-tuning and evaluation.

    revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark evaluation with no derivations or self-referential predictions

Full rationale

The manuscript proposes a composite CNN architecture (ResNet-50 encoder, ASPP, feature fusion, multi-head attention, FCOS head) and reports standard detection metrics obtained by training and evaluating on the external MillionTrees validation split. No equations, parameter-fitting procedures, or derivation chains are described that could reduce to self-definition or fitted-input predictions. The Palestine deployment is presented only as a qualitative GIS integration without any local quantitative results, so no circular reduction occurs. This is a standard empirical ML paper whose central claims rest on external benchmark performance rather than internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical transfer of a standard object-detection pipeline to Palestinian agricultural scenes; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5529 in / 1185 out tokens · 55133 ms · 2026-05-08T06:47:43.165959+00:00 · methodology
