arxiv: 2604.23653 · v1 · submitted 2026-04-26 · 💻 cs.CV · cs.AI

Recognition: unknown

ResAF-Net: An Anchor-Free Attention-Based Network for Tree Detection and Agricultural Mapping in Palestine

Rabee Al-Qasem


Pith reviewed 2026-05-08 06:47 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: tree detection · satellite imagery · agricultural mapping · anchor-free detection · attention mechanism · GIS integration · crop monitoring · Palestine


The pith

ResAF-Net uses a ResNet encoder with attention and an anchor-free head to detect trees in satellite images for agricultural mapping in Palestine.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops ResAF-Net to solve the problem of collecting reliable tree and crop data in Palestine, where fragmented land, limited access, and monitoring restrictions make traditional surveys difficult. The network processes satellite imagery through a ResNet-50 backbone, atrous pooling, feature fusion, and self-attention before an anchor-free detector locates individual trees. When tested on a large benchmark, it shows high sensitivity to tree presence and is then linked to local land records inside a web GIS tool. This setup allows analysis from single scenes to entire communities and creates a starting point for remote, data-driven agricultural monitoring.
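The atrous pooling stage rests on dilated convolution. A k×k kernel with dilation rate d samples a grid that spans k + (k − 1)(d − 1) feature cells per side, so it widens context without adding weights. The sketch below tabulates that footprint for the dilated 3×3 branches of an ASPP stage. The rates (6, 12, 18) and the encoder output stride of 16 are assumptions borrowed from the DeepLabv3 defaults (Chen et al., 2017). The configuration ResAF-Net actually uses is not given in this summary.

```python
# Effective footprint of the dilated 3x3 branches in an ASPP stage.
# ASSUMPTION: rates (6, 12, 18) at output stride 16 follow the DeepLabv3
# defaults; ResAF-Net's real rates are not stated in the summary.


def effective_kernel(k: int, dilation: int) -> int:
    """Side length, in feature-map cells, spanned by a k x k kernel at this dilation."""
    return k + (k - 1) * (dilation - 1)


def aspp_footprints(rates=(6, 12, 18), k=3, output_stride=16):
    """Map each dilation rate to (cells spanned, approximate input pixels spanned)."""
    out = {}
    for r in rates:
        cells = effective_kernel(k, r)
        # Each feature cell covers `output_stride` input pixels. This is only the
        # extent of the dilated sampling grid; it ignores the encoder's own receptive field.
        out[r] = (cells, cells * output_stride)
    return out


if __name__ == "__main__":
    for rate, (cells, pixels) in aspp_footprints().items():
        print(f"rate {rate:>2}: {cells:>2} cells ~ {pixels:>3} px")
```

ASPP usually pairs these branches with a 1×1 branch and an image-level pooling branch. Mixing footprints in this way is what lets a single stage respond both to individual crowns and to the surrounding canopy pattern.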

Core claim

ResAF-Net is an anchor-free attention-based network for tree detection that combines a ResNet-50 encoder, Atrous Spatial Pyramid Pooling, a feature-fusion stage, a multi-head self-attention refinement module, and an FCOS detection head. Trained on the MillionTrees benchmark, the model records 82 percent recall, 63.03 percent mAP at 0.50 IoU, and 35.47 percent mAP across 0.50 to 0.95 IoU on the validation set. The same model is deployed inside a web-based GIS application that ingests Palestinian cadastral data from GeoMolg, enabling tree analysis at scene, parcel, and community scales and providing a practical route to large-scale agricultural inventorying.
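The gap between 63.03% mAP@0.50 and 35.47% mAP@0.50:0.95 comes from how the second metric tightens. It averages AP over ten IoU thresholds, from 0.50 to 0.95 in steps of 0.05, so a box that finds a tree but fits its crown loosely loses credit at the stricter thresholds. The minimal single-class sketch below illustrates this. It scores one image at a time, uses greedy matching by score, and applies all-point interpolation instead of COCO's 101-point scheme, so it will not reproduce benchmark numbers exactly. The toy boxes are made up.

```python
# Single-class average precision at one or several IoU thresholds.
# Simplifications (assumptions, not the paper's evaluator): one image,
# greedy matching by descending score, all-point interpolated AP.
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[Box, float]], gts: Sequence[Box], thr: float) -> float:
    """AP of scored predictions against ground-truth boxes at one IoU threshold."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp: List[int] = []
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        # Best-overlapping ground truth that is still unmatched.
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j] and iou(box, gt) > best:
                best, best_j = iou(box, gt), j
        if best_j >= 0 and best >= thr:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    # Precision-recall curve over the ranked predictions.
    precisions, recalls, cum = [], [], 0
    for i, t in enumerate(tp, start=1):
        cum += t
        precisions.append(cum / i)
        recalls.append(cum / len(gts))
    # All-point interpolation: make precision non-increasing, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def map_50_95(preds, gts) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class, so mAP = mean AP)."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((0, 0, 10, 10), 0.9), ((21, 21, 31, 31), 0.8), ((50, 50, 60, 60), 0.3)]
    print(f"AP@0.50      = {average_precision(preds, gts, 0.5):.2f}")  # 1.00
    print(f"AP@0.50:0.95 = {map_50_95(preds, gts):.2f}")               # 0.70
```

The second toy detection overlaps its tree at IoU ≈ 0.68. It therefore counts as a hit at the four thresholds up to 0.65 and as a miss at the six thresholds above that. This is the same mechanism that pulls the averaged score well below the 0.50 score.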

What carries the argument

ResAF-Net architecture that stacks a ResNet-50 encoder, atrous spatial pyramid pooling, feature fusion, multi-head self-attention refinement, and an anchor-free FCOS head to localize trees in dense or heterogeneous satellite scenes.
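In an FCOS-style head (Tian et al., 2019), every feature-map location that falls inside a ground-truth box regresses its distances (l, t, r, b) to the four box edges. A centerness score, sqrt(min(l, r)/max(l, r) · min(t, b)/max(t, b)), then down-weights locations near the edges. The sketch below computes those standard targets for one location and one box. It follows the FCOS paper's definitions rather than ResAF-Net's code, and it leaves out FCOS's multi-level size ranges and ambiguity handling.

```python
# FCOS-style regression targets and centerness for one feature-map location.
# Definitions follow Tian et al. (2019); multi-level assignment and
# ambiguity resolution between overlapping boxes are deliberately omitted.
import math
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input pixels


def location_to_image(ix: int, iy: int, stride: int) -> Tuple[float, float]:
    """Map feature-map cell (ix, iy) at this stride back to input-image coordinates."""
    return stride // 2 + ix * stride, stride // 2 + iy * stride


def fcos_targets(x: float, y: float, box: Box) -> Optional[Tuple[float, float, float, float, float]]:
    """Return (l, t, r, b, centerness) if (x, y) lies inside box, else None (negative sample)."""
    l, t = x - box[0], y - box[1]
    r, b = box[2] - x, box[3] - y
    if min(l, t, r, b) <= 0:
        return None
    centerness = math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
    return l, t, r, b, centerness


def decode(x: float, y: float, l: float, t: float, r: float, b: float) -> Box:
    """Invert the regression: turn predicted edge distances back into a box."""
    return x - l, y - t, x + r, y + b


if __name__ == "__main__":
    x, y = location_to_image(4, 4, stride=8)     # (36, 36)
    crown = (20, 20, 60, 52)                     # hypothetical tree box
    l, t, r, b, c = fcos_targets(x, y, crown)
    print((l, t, r, b), round(c, 3))             # (16, 16, 24, 16) 0.816
    print(decode(x, y, l, t, r, b))              # (20, 20, 60, 52)
```

Because targets are defined per location rather than per anchor, there are no anchor sizes or aspect ratios to tune. This suits tree crowns, whose box shapes vary with species and image resolution.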

If this is right

The GIS integration permits tree inventories at scene, parcel, and community levels using existing cadastral layers.
The deployment shows that satellite-based detection can support agricultural monitoring where physical or aerial access is limited.
The framework supplies a base for later species-level classification of Mediterranean tree crops.
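A parcel- or community-level inventory amounts to assigning each detected tree to the cadastral polygon that contains it. The minimal sketch below assumes two things. First, detections are reduced to box centers. Second, those centers share a projected coordinate system with the parcel polygons. The parcel IDs and coordinates are hypothetical placeholders, and GeoMolg's actual data schema is not described in the summary. A production system would more likely use a spatial library with an index (for example Shapely with an STRtree) than this pure-Python ray-casting test.

```python
# Count detected trees per cadastral parcel with a ray-casting point-in-polygon test.
# ASSUMPTIONS: one shared projected CRS; simple, non-overlapping polygons without
# holes; parcel IDs such as "P-001" are hypothetical placeholders.
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Point:
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Even-odd rule: toggle each time a horizontal ray from pt crosses a polygon edge."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def trees_per_parcel(detections: List[Box], parcels: Dict[str, List[Point]]) -> Counter:
    counts: Counter = Counter()
    for box in detections:
        c = box_center(box)
        for pid, poly in parcels.items():
            if point_in_polygon(c, poly):
                counts[pid] += 1
                break  # parcels are assumed not to overlap
        else:
            counts["unassigned"] += 1  # center falls outside every parcel
    return counts


if __name__ == "__main__":
    parcels = {
        "P-001": [(0, 0), (100, 0), (100, 100), (0, 100)],
        "P-002": [(100, 0), (200, 0), (200, 100), (100, 100)],
    }
    dets = [(10, 10, 20, 20), (40, 40, 50, 50), (150, 50, 160, 60), (300, 300, 310, 310)]
    print(trees_per_parcel(dets, parcels))
```

Community totals then follow by summing parcel counts over whichever parcels belong to a community.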

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar architectures could be tested in other regions that face comparable access or data-collection barriers.
Routine use of the system might feed into broader land-use planning and food-security tracking at national scale.
Performance could be checked by running the model on fresh Palestinian satellite scenes and measuring agreement with any available local reference data.
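Local reference data may arrive as tree positions (for example, GPS points from a field survey) rather than boxes. In that case, agreement could be scored by letting each predicted box claim at most one reference point that lies inside it. The sketch below shows a hypothetical protocol of this kind. It is not taken from the paper. It returns precision, recall, and F1 under one-to-one greedy matching in order of detection score.

```python
# Score predicted boxes against reference tree points (e.g. field GPS locations).
# Hypothetical protocol: a box is a hit if it contains a not-yet-matched point;
# each point and each box is matched at most once.
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]
Point = Tuple[float, float]


def contains(box: Box, pt: Point) -> bool:
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]


def point_agreement(preds: Sequence[Tuple[Box, float]], points: Sequence[Point]) -> Tuple[float, float, float]:
    matched = [False] * len(points)
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        candidates = [j for j, pt in enumerate(points) if not matched[j] and contains(box, pt)]
        if candidates:
            # Among unmatched points inside the box, take the one nearest its center.
            j = min(candidates, key=lambda k: (points[k][0] - cx) ** 2 + (points[k][1] - cy) ** 2)
            matched[j] = True
            tp += 1
    fp = len(preds) - tp
    fn = len(points) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if points else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    preds = [((0, 0, 10, 10), 0.9), ((8, 8, 18, 18), 0.8), ((50, 50, 60, 60), 0.4)]
    points = [(5, 5), (9, 9), (30, 30)]
    print(point_agreement(preds, points))  # two hits, one false box, one missed point
```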

Load-bearing premise

That benchmark performance on MillionTrees will transfer to the fragmented and restricted landscapes of Palestine without extra domain adaptation or local validation data.

What would settle it

A direct comparison of model predictions against ground-truth tree counts obtained from field visits or very-high-resolution local imagery in selected Palestinian agricultural parcels.
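A comparison like this could be summarized per parcel with a few standard error statistics. The minimal sketch below compares only parcels that appear in both the predicted and the field-counted tallies. It reports mean absolute error, RMSE, signed bias, and the relative error of the aggregated total. The parcel IDs and counts in the usage example are invented placeholders, not data from the paper.

```python
# Per-parcel agreement between predicted and field-counted tree totals.
# Inputs are dicts keyed by parcel ID; only parcels present in both are compared.
import math
from typing import Dict


def count_agreement(predicted: Dict[str, int], field: Dict[str, int]) -> Dict[str, float]:
    ids = sorted(set(predicted) & set(field))
    if not ids:
        raise ValueError("no parcels in common")
    errors = [predicted[i] - field[i] for i in ids]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n  # > 0 means the model over-counts on average
    total_field = sum(field[i] for i in ids)
    # Relative error of the summed total, the figure a community-level report would show.
    total_rel_err = (
        (sum(predicted[i] for i in ids) - total_field) / total_field if total_field else float("nan")
    )
    return {"n": float(n), "mae": mae, "rmse": rmse, "bias": bias, "total_rel_err": total_rel_err}


if __name__ == "__main__":
    predicted = {"A": 120, "B": 45, "C": 80}
    field = {"A": 110, "B": 50, "C": 80, "D": 12}  # parcel D has no prediction and is skipped
    print(count_agreement(predicted, field))
```

Reporting both bias and the total's relative error matters here. Per-parcel over- and under-counts can cancel in a community total, so a small total error alone would not show that individual parcels are counted well.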

Figures

Figures reproduced from arXiv: 2604.23653 by Rabee Al-Qasem.

**Figure 1.** Sample Image with its annotations from different …

**Figure 2.** The distribution of aspect ratio between different …

**Figure 4.** Tree detection architecture consisting of an encoder, …

**Figure 7.** High-level architecture of the self-attention refiner. FPN …

**Figure 8.** Architecture of the Anchor-Free Detection Head.

**Figure 9.** Training loss curves for the detection model over …

**Figure 10.** Visual results from the validation dataset of Model Detections. The images display ground-truth labels (green) and our model’s predictions (purple). The high Recall (82%) is evident as the purple boxes successfully intersect with the vast majority of green boxes.

**Figure 11.** Visual representation of the 6,184 trees detected by …

**Figure 12.** Visual representation of the 384 trees detected by the …

**Figure 13.** Visual representation of the 6,641 trees detected by …

Original abstract

Reliable agricultural data is essential for food security, land-use planning, and economic resilience, yet in Palestine, such data remains difficult to collect at scale because of fragmented landscapes, limited field access, and restrictions on aerial monitoring. This paper presents ResAF-Net, a satellite-based tree detection framework designed for large-scale agricultural monitoring in resource-constrained settings. The proposed architecture combines a ResNet-50 encoder, Atrous Spatial Pyramid Pooling (ASPP), a feature-fusion stage, a multi-head self-attention refinement module, and an anchor-free FCOS detection head to improve tree localization in dense and heterogeneous scenes. Trained on the MillionTrees benchmark, the model achieved 82% Recall, 63.03% mAP@0.50, and 35.47% mAP@0.50:0.95 on the validation split, indicating strong sensitivity to tree presence while maintaining competitive localization quality. Beyond benchmark evaluation, we implemented the model within a web-based GIS application integrated with Palestinian cadastral data from GeoMolg, enabling tree analysis at scene, parcel, and community levels. This deployment demonstrates the practical feasibility of AI-assisted agricultural inventorying in Palestine. It provides a foundation for data-driven monitoring, reporting, and future species-level analysis of Mediterranean tree crops.
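The self-attention refinement step treats each spatial position of a feature map as a token, so every location can draw context from every other location, and the result is added back as a residual. The NumPy sketch below shows that mechanism with random, untrained weights. The head count, the channel width, and the residual-plus-layer-norm arrangement are all assumptions, because the abstract does not specify them.

```python
# Multi-head self-attention over a (C, H, W) feature map, applied as a residual refinement.
# ASSUMPTIONS: random (untrained) projections, post-residual layer norm, no positional encoding.
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa_refine(feat: np.ndarray, num_heads: int = 4, seed: int = 0) -> np.ndarray:
    c, h, w = feat.shape
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    rng = np.random.default_rng(seed)
    wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))

    tokens = feat.reshape(c, h * w).T  # (N, C), one token per spatial position
    q = (tokens @ wq).reshape(-1, num_heads, d).transpose(1, 0, 2)  # (heads, N, d)
    k = (tokens @ wk).reshape(-1, num_heads, d).transpose(1, 0, 2)
    v = (tokens @ wv).reshape(-1, num_heads, d).transpose(1, 0, 2)

    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N), rows sum to 1
    mixed = (attn @ v).transpose(1, 0, 2).reshape(-1, c)   # concatenate heads -> (N, C)
    out = layer_norm(tokens + mixed @ wo)                   # residual connection, then norm
    return out.T.reshape(c, h, w)


if __name__ == "__main__":
    feat = np.random.default_rng(1).standard_normal((32, 8, 8))
    print(mhsa_refine(feat).shape)  # (32, 8, 8)
```

Attention cost grows with the square of H×W; a 64×64 map already yields a 4,096 × 4,096 matrix per head. For that reason, a refiner of this kind is typically applied to a coarse feature level rather than to full-resolution features.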

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ResAF-Net is a standard ResNet-ASPP-attention-FCOS pipeline applied to tree detection, with benchmark numbers on MillionTrees but no quantitative results on actual Palestinian imagery to support the deployment claims.


The main thing to know is that this paper takes well-known detection components, runs them on the MillionTrees benchmark, and then describes a GIS wrapper built on Palestinian cadastral data. The benchmark numbers look reasonable (82% recall and 63% mAP at 0.5 IoU), but the Palestine-specific part stays qualitative. No local test set, no fine-tuning results, and no accuracy figures on the target satellite scenes are given, so the transfer story rests on an assumption the paper never checks. That is the central limitation.

The architecture itself is not new: a ResNet-50 encoder, ASPP, multi-head attention, and an FCOS head have all appeared in remote-sensing work before. The authors assemble them cleanly and report the validation metrics, which is fine as far as it goes. The GIS integration with GeoMolg data is the practical piece they add, letting users query at scene, parcel, and community levels. That part is described at a high level and could be useful to someone who already has similar cadastral layers. Training details are thin: data splits, augmentation, and hyper-parameter search are not spelled out, so the reported numbers cannot be fully stress-tested for robustness.

The paper is aimed at practitioners who need a starting template for agricultural mapping in fragmented or access-limited regions. A reader working on Mediterranean crops or in similarly constrained settings might borrow the pipeline or the deployment pattern. It does not introduce new primitives or theory, so it is not for readers looking for algorithmic advances.

I would send it to peer review. The application is concrete, the benchmark evaluation is reported, and the authors are direct about the setting. It would benefit from a local validation section, but the work is coherent enough on its own terms to deserve referee time rather than a desk rejection.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces ResAF-Net, an anchor-free network for satellite-based tree detection that combines a ResNet-50 encoder, ASPP, feature fusion, multi-head self-attention, and an FCOS head. Trained on the MillionTrees benchmark, it reports 82% recall, 63.03% mAP@0.50 and 35.47% mAP@0.50:0.95 on the validation split, and describes a qualitative web-GIS deployment integrated with Palestinian GeoMolg cadastral data for multi-scale tree inventorying.

Significance. If the reported performance generalizes beyond the MillionTrees benchmark to the target domain, the work could supply a practical monitoring tool for fragmented agricultural landscapes where field access and aerial data are restricted. The cadastral integration is a concrete step toward operational use, but the absence of any local quantitative results limits the immediate significance to a proof-of-concept on a public benchmark.

major comments (2)

[Abstract] Abstract and deployment description: the central claim that the system demonstrates 'practical feasibility of AI-assisted agricultural inventorying in Palestine' rests on an untested generalization assumption. All numeric results (82% Recall, 63.03% mAP@0.50, 35.47% mAP@0.50:0.95) are reported exclusively on the MillionTrees validation split; no accuracy figures, local test set, fine-tuning protocol, or domain-adaptation results on Palestinian satellite scenes or parcels are provided.
[Abstract] The weakest assumption—that benchmark performance will transfer to Palestine’s heterogeneous, restricted-access landscapes—is load-bearing for the application claim yet unsupported by any quantitative evidence in the manuscript.

minor comments (1)

[Abstract] Training protocol details (data splits, augmentation strategy, hyper-parameter search, optimizer, and error analysis) are omitted from the abstract and not referenced in the provided text, preventing assessment of robustness or overfitting risk.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the manuscript. We agree that the claims regarding practical feasibility in Palestine are not quantitatively supported by local data and will revise the abstract and add explicit limitations discussion to align claims with the presented evidence.

Point-by-point responses

Referee: [Abstract] Abstract and deployment description: the central claim that the system demonstrates 'practical feasibility of AI-assisted agricultural inventorying in Palestine' rests on an untested generalization assumption. All numeric results (82% Recall, 63.03% mAP@0.50, 35.47% mAP@0.50:0.95) are reported exclusively on the MillionTrees validation split; no accuracy figures, local test set, fine-tuning protocol, or domain-adaptation results on Palestinian satellite scenes or parcels are provided.

Authors: We agree that the manuscript reports all numeric results exclusively on the MillionTrees validation split and provides no local test set, fine-tuning protocol, or domain-adaptation results on Palestinian scenes. The web-GIS deployment with GeoMolg data is presented as a qualitative illustration of integration for multi-scale inventorying rather than a validated local deployment. We will revise the abstract to replace the phrase 'demonstrates the practical feasibility' with 'illustrates the potential for' and add a dedicated limitations paragraph stating the absence of local quantitative validation. This addresses the untested generalization directly. revision: yes
Referee: [Abstract] The weakest assumption—that benchmark performance will transfer to Palestine’s heterogeneous, restricted-access landscapes—is load-bearing for the application claim yet unsupported by any quantitative evidence in the manuscript.

Authors: The referee is correct that transferability to Palestinian landscapes is assumed without quantitative support in the current work. No local satellite scenes or parcels were used for testing or adaptation, owing to the access restrictions noted in the introduction. We will revise the abstract and conclusion to temper the language, explicitly framing the Palestine component as a proof-of-concept deployment framework rather than a demonstrated operational system. A new discussion subsection will acknowledge the domain-shift risks and the need for future local fine-tuning and evaluation. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmark evaluation with no derivations or self-referential predictions

Full rationale

The manuscript proposes a composite CNN architecture (ResNet-50 encoder, ASPP, feature fusion, multi-head attention, FCOS head) and reports standard detection metrics obtained by training and evaluating on the external MillionTrees validation split. No equations, parameter-fitting procedures, or derivation chains are described that could reduce to self-definition or fitted-input predictions. The Palestine deployment is presented only as a qualitative GIS integration without any local quantitative results, so no circular reduction occurs. This is a standard empirical ML paper whose central claims rest on external benchmark performance rather than internal self-reference.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical transfer of a standard object-detection pipeline to Palestinian agricultural scenes; no free parameters, axioms, or invented entities are explicitly introduced in the abstract.

pith-pipeline@v0.9.0 · 5529 in / 1185 out tokens · 55133 ms · 2026-05-08T06:47:43.165959+00:00 · methodology


Reference graph

Works this paper leans on

32 extracted references · 8 canonical work pages · 3 internal anchors

[1] O. M. Alegbeleye, A. J. H. Meddens, Y. O. Rotimi, and K. G. Ibeh, “Urban tree crown detection based on deep learning and high-resolution aerial imagery: Ptcnet for Pullman, WA, USA,” Remote Sensing Applications: Society and Environment, p. 101818, 2025.

[2] T. Wang, Y. Zuo, T. Manda, D. Hwarari, and L. Yang, “Harnessing artificial intelligence, machine learning and deep learning for sustainable forestry management and conservation: Transformative potential and future perspectives,” Plants, vol. 14, no. 7, p. 998, 2025.

[3] FAO, Global Forest Resources Assessment 2025. Rome, Italy: FAO, 2025, 210 p. [Online]. Available: https://doi.org/10.4060/cd6709en

[4] F. Abu Saif, M. Jouili, and E. M. Rashad, “Agroecology as climate justice: A theorotical proposal for transformation development in Palestine,” Journal of Plant Protection and Pathology, vol. 16, no. 11, pp. 567–572, 2025.

[5] S. Al Botmeh and I. Saadeh, “Palestinian agriculture, food security and incomes in the context of genocide,” Priorities for Palestine’s Economy in the Midst of War, p. 38, 2024.

[6] A. Rafeedie and M. Abdellatif, “The impact of settlement expansion on Jerusalem villages: Demographic and economic transformations in Biddu, Beit Iksa, Ar-Ram and Kufr Aqab,” Omran, vol. 14, no. 54-55, pp. 157–190, 2026.

[7] M. Wulder, J. White, K. Niemann, and T. Nelson, “Comparison of airborne and satellite high spatial resolution data for the identification of individual trees with local maxima filtering,” International Journal of Remote Sensing, vol. 25, no. 11, pp. 2225–2232, 2004.

[8] J. Brandt, S. Yi, J. Tolan, X. Li, P. Potapov, J. Ertel, J. Spore, H. V. Vo, M. Ramamonjisoa, P. Labatut et al., “CHMv2: Improvements in global canopy height mapping using DINOv3,” arXiv preprint arXiv:2603.06382, 2026.

[9] Ş. N. Topgül, E. Sertel, S. Aksoy, C. Ünsalan, and J. E. Fransson, “VHRTrees: a new benchmark dataset for tree detection in satellite imagery and performance evaluation with YOLO-based models,” Frontiers in Forests and Global Change, vol. 7, p. 1495544, 2025.

[10] H. Que, H. Gao, W. Shan, M. Liu, J. An, F. Deng, S. Feng, X. Yang, and L. Mu, “FM-SAM: individual tree crown delineation and classification based on segmentation anything model (SAM) and YOLOv10 in UAV imagery for forest monitoring,” Computers and Electronics in Agriculture, vol. 240, p. 111162, 2026.

[11] S. Jarahizadeh and B. Salehi, “Tree-Net: A novel deep learning tree detection architecture using UAV LiDAR data,” Remote Sensing of Environment, vol. 332, p. 115088, 2026.

[12] M. Zand, A. Etemad, and M. Greenspan, “ObjectBox: From centers to boxes for anchor-free object detection,” in European Conference on Computer Vision. Springer, 2022, pp. 390–406.

[13] P. Zamboni, J. M. Junior, J. d. A. Silva, G. T. Miyoshi, E. T. Matsubara, K. Nogueira, and W. N. Gonçalves, “Benchmarking anchor-based and anchor-free state-of-the-art deep learning methods for individual tree detection in RGB high-resolution images,” Remote Sensing, vol. 13, no. 13, p. 2482, 2021.

[14] B. Weinstein, “MillionTrees: A benchmark dataset for airborne tree prediction,” https://milliontrees.idtrees.org/, 2025, accessed: 2026-03-27; includes TreeBoxes, TreePoints, and TreePolygons datasets.

[15] J. Veitch-Michaelis, A. Cottam, D. Schweizer, E. N. Broadbent, D. Dao, C. Zhang, A. A. Zambrano, and S. Max, “OAM-TCD: A globally diverse dataset of high-resolution tree cover maps,” Advances in Neural Information Processing Systems, vol. 37, pp. 49749–49767, 2024.

[16] H. Baudchon, A. Ouaknine, M. Weiss, M. Teng, T. R. Walla, A. Caron-Guay, C. Pal, and E. Laliberté, “SelvaBox: A high-resolution dataset for tropical tree crown detection,” arXiv preprint arXiv:2507.00170, 2025.

[17] A. Radogoshi et al., “Tree detection in aerial imagery,” https://lila.science/datasets/forest-damages-larch-casebearer//, 2021.

[18] B. G. Weinstein, S. J. Graves, S. Marconi, A. Singh, A. Zare, D. Stewart, S. A. Bohlman, and E. P. White, “A benchmark dataset for canopy crown detection and delineation in co-registered airborne RGB, LiDAR and hyperspectral imagery from the National Ecological Observation Network,” PLoS Computational Biology, vol. 17, no. 7, p. e1009180, 2021.

[19] Y. Sun, Z. Li, H. He, L. Guo, X. Zhang, and Q. Xin, “Counting trees in a subtropical mega city using the instance segmentation method,” International Journal of Applied Earth Observation and Geoinformation, vol. 106, p. 102662, 2022.

[20] J. Dumortier, “Annotated tree crown bounding boxes in urban/rural environment,” May 2025. [Online]. Available: https://doi.org/10.5281/zenodo.15155081

[21] B. G. Weinstein, S. Marconi, S. Bohlman, A. Zare, and E. White, “Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks,” Remote Sensing, vol. 11, no. 11, p. 1309, 2019.

[22] B. G. Weinstein et al., “A NEON individual tree crowns dataset with extra-terrestrial intelligence for deep learning,” Scientific Data, vol. 8, no. 1, pp. 1–11, 2021. [Online]. Available: https://par.nsf.gov/biblio/10453016-neon-tree-crowns-dataset

[23] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.

[24] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong et al., “Swin Transformer V2: Scaling up capacity and resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12009–12019.

[25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[26] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.

[27] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.

[28] J. Cai, J. Shi, Y.-B. Leau, S. Meng, X. Zheng, and J. Zhou, “Res50-SimAM-ASPP-UNet: a semantic segmentation model for high-resolution remote sensing images,” IEEE Access, vol. 12, pp. 192301–192316, 2024.

[29] Y. Deng, Y. Cao, S. Chen, and X. Cheng, “Residual attention network with atrous spatial pyramid pooling for soil element estimation in LUCAS hyperspectral data,” Applied Sciences, vol. 15, no. 13, p. 7457, 2025.

[30] B. Kiessling and T. Clérice, “Does context matter? Enhancing handwritten text recognition with metadata in historical manuscripts,” in CHR2024–Computational Humanities Research Conference, 2024.

[31] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu et al., “Conformer: Convolution-augmented transformer for speech recognition,” arXiv preprint arXiv:2005.08100, 2020.

[32] Z. Tian, C. Shen, H. Chen, and T. He, “FCOS: Fully convolutional one-stage object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9627–9636.