Low-Cost Stereo Vision for Robust 3D Positioning of Thin Radiata Pine Branches in Autonomous Drone Pruning
Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3
The pith
A low-cost stereo camera on a drone can locate thin 10 mm pine branches in 3D for autonomous pruning without extra sensors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By pairing real-time segmentation masks from YOLOv8 or YOLOv9 with disparity maps from deep stereo networks such as RAFT-Stereo or ACVNet, then applying centroid extraction and median-absolute-deviation filtering to the resulting 3D points, the system yields a robust per-branch distance estimate that supports pruning operations on branches down to 10 mm thickness at typical drone working distances.
What carries the argument
Centroid-based triangulation with Median-Absolute-Deviation outlier rejection that converts a segmentation mask and disparity map into one reliable branch-to-camera distance.
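The reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a rectified pair, a binary mask and a disparity map aligned with the left image, a focal length `f_px` in pixels, a baseline `baseline_m` in metres, a principal point `(cx, cy)`, and a cutoff of `k = 3` scaled MADs. The function name `branch_distance` and all parameter names are hypothetical.

```python
from statistics import median


def branch_distance(mask, disparity, f_px, baseline_m, cx, cy, k=3.0):
    """Hypothetical sketch: reduce a branch mask and disparity map to one distance.

    mask and disparity are equal-sized 2-D lists indexed [row][column] and aligned
    with the rectified left image. Returns ((X, Y, Z), distance_m), or None when
    no masked pixel has a valid (positive) disparity.
    """
    points = []
    for v, (mask_row, disp_row) in enumerate(zip(mask, disparity)):
        for u, (inside, d) in enumerate(zip(mask_row, disp_row)):
            if inside and d > 0:  # skip background and invalid disparities
                z = f_px * baseline_m / d  # pinhole stereo: Z = f * B / d
                # Back-project the pixel to camera coordinates (X, Y, Z).
                points.append(((u - cx) * z / f_px, (v - cy) * z / f_px, z))
    if not points:
        return None

    # Median-Absolute-Deviation rejection on depth; 1.4826 makes the MAD a
    # consistent estimate of the standard deviation under Gaussian noise.
    depths = [p[2] for p in points]
    med = median(depths)
    sigma = 1.4826 * median(abs(z - med) for z in depths)
    inliers = [p for p in points if abs(p[2] - med) <= k * sigma]
    if not inliers:  # only possible for very small k
        return None

    # Centroid of the surviving 3-D points and its Euclidean distance to the camera.
    n = len(inliers)
    centroid = tuple(sum(p[i] for p in inliers) / n for i in range(3))
    distance = sum(c * c for c in centroid) ** 0.5
    return centroid, distance
```

Rejecting on depth keeps the cutoff in metric terms; applying the MAD test to disparities, or reporting the centroid depth instead of the Euclidean norm, would be equally reasonable choices.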
If this is right
- Autonomous pruning platforms can drop expensive auxiliary depth sensors and still target branches as thin as 10 mm.
- The same segmentation-plus-centroid pipeline can be swapped to newer YOLO releases without redesigning the depth or positioning stages.
- Forestry-specific fine-tuning of stereo networks is required because urban driving benchmarks leave a noticeable domain gap in natural scenes.
- Real-time operation on drones becomes feasible once the chosen stereo and segmentation models run at camera frame rates.
- Sparse-texture handling improves when learning-based disparity methods replace classical block-matching approaches.
Where Pith is reading between the lines
- If the same pipeline were tested on moving branches or under wind, the outlier rejection step might need additional temporal filtering to stay reliable.
- The approach could transfer to other thin linear structures such as power lines or vineyard wires once domain-specific data is collected.
- Coupling the 3D branch positions directly to a pruning end-effector would close the loop from perception to action without separate mapping steps.
- Quantitative error budgets tied to branch diameter would let future work set clear accuracy targets instead of relying on visual inspection.
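The temporal filtering suggested in the first point above could start as simply as a sliding-window median over per-frame distances. The sketch below is hypothetical (the class `TemporalMedian` and its `window` parameter are illustrative names) and assumes the branch stays roughly stationary relative to the camera over the window.

```python
from collections import deque
from statistics import median


class TemporalMedian:
    """Hypothetical sliding-window median over per-frame branch distances (metres)."""

    def __init__(self, window=5):
        # deque(maxlen=...) discards the oldest estimate once the window is full.
        self.buffer = deque(maxlen=window)

    def update(self, distance_m):
        """Add one frame's estimate (None for a missed detection); return the smoothed value."""
        if distance_m is not None:
            self.buffer.append(distance_m)
        return median(self.buffer) if self.buffer else None


# Usage: a single spurious frame (2.40 m) barely moves the smoothed estimate.
smoother = TemporalMedian(window=5)
for d in [1.50, 1.52, 2.40, 1.49, 1.51]:
    smoothed = smoother.update(d)
```

A longer window rejects more spurious frames but lags behind genuine motion; for a branch swaying in wind, a motion model may be needed instead.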
Load-bearing premise
Qualitative visual comparisons of depth maps at 1-2 m distances on a 71-pair custom dataset are sufficient to establish that the positioning accuracy meets the requirements for autonomous pruning of 10 mm branches in real operational conditions.
What would settle it
A quantitative measurement campaign that records average 3D positioning error larger than half the branch diameter (5 mm) or more than 10 percent of range on thin branches at 1-2 m would show the method does not yet meet pruning needs.
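A first-order error budget makes the 5 mm criterion concrete. Differentiating the pinhole stereo relation $Z = fB/d$ relates a disparity error $\delta d$ to a depth error $\delta Z$. The figures below assume the ZED Mini's nominal 63 mm baseline and an illustrative focal length of 700 px; the real focal length depends on the capture resolution and calibration, so the numbers are indicative only.

```latex
\begin{aligned}
Z &= \frac{fB}{d}
\;\Longrightarrow\;
|\delta Z| \approx \frac{Z^{2}}{fB}\,|\delta d|
\;\Longrightarrow\;
|\delta d|_{\max} \approx \frac{fB}{Z^{2}}\,|\delta Z|_{\max},\\[4pt]
fB &= 700\,\text{px} \times 0.063\,\text{m} = 44.1\,\text{px}\cdot\text{m},\\[4pt]
|\delta Z|_{\max} = 5\,\text{mm}: \quad
|\delta d|_{\max} &\approx \frac{44.1 \times 0.005}{1^{2}} \approx 0.22\,\text{px at } Z = 1\,\text{m},
\qquad \frac{44.1 \times 0.005}{2^{2}} \approx 0.055\,\text{px at } Z = 2\,\text{m}.
\end{aligned}
```

Under these assumptions the 5 mm criterion calls for sub-pixel disparity accuracy of roughly 0.05 to 0.2 px over 1-2 m, whereas the 10-percent-of-range alternative (100-200 mm) tolerates disparity errors of roughly 2 to 4 px. The budget covers only the depth component; lateral error from the mask centroid adds to it.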
Original abstract
Manual pruning of radiata pine, a species of major economic importance to New Zealand forestry, is hazardous, labour-intensive, and increasingly constrained by workforce shortages. Existing autonomous pruning platforms typically rely on expensive sensors such as LiDAR and are limited to thick branches, which restricts their wider adoption. This paper investigates whether a single low-cost stereo camera mounted on a drone can provide sufficiently accurate branch detection and three-dimensional positioning to support autonomous pruning of branches as thin as 10 mm, thereby removing the need for auxiliary depth sensors. The proposed pipeline comprises two stages: branch segmentation and depth estimation. For segmentation, Mask R-CNN variants and the YOLOv8 and YOLOv9 families are compared on a custom dataset of 71 stereo image pairs captured with a ZED Mini camera; YOLOv8 and YOLOv9 are selected as representative state-of-the-art real-time segmentors at the time of data collection, and the framework is designed to remain compatible with newer YOLO releases. For depth estimation, a traditional method (SGBM with WLS filtering) and deep-learning-based methods (PSMNet, ACVNet, GWCNet, MobileStereoNet, RAFT-Stereo, and NeRF-Supervised Deep Stereo) are evaluated, including cross-dataset fine-tuning experiments that expose the domain gap between urban driving benchmarks and natural forestry scenes. The main novelty of this work lies in coupling stereo segmentation with a centroid-based triangulation algorithm and Median-Absolute-Deviation outlier rejection that converts a segmentation mask and disparity map into a single robust branch-to-camera distance, addressing the challenges of sparse texture, thin structures, and noisy disparity values typical of forest scenes. Qualitative evaluations at distances of 1-2 m show that the learning-based stereo methods produce more coherent depth es...
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a single low-cost stereo camera (ZED Mini) mounted on a drone can deliver sufficiently accurate branch detection and 3D positioning for thin (10 mm) radiata pine branches to enable autonomous pruning without auxiliary depth sensors such as LiDAR. The proposed pipeline uses instance segmentation (Mask R-CNN, YOLOv8, YOLOv9) on a custom 71-pair stereo dataset, followed by depth estimation (SGBM, PSMNet, RAFT-Stereo and others) and a novel centroid-based triangulation step with Median-Absolute-Deviation outlier rejection to convert masks and disparity maps into a single robust branch-to-camera distance; qualitative visual comparisons at 1-2 m distances are presented to support the accuracy claim.
Significance. If the mm-scale positioning accuracy were quantitatively validated, the work would be significant for low-cost automation in New Zealand forestry by reducing reliance on expensive sensors and extending pruning capability to thinner branches. The comparison of segmentation and stereo models on challenging forest scenes with sparse texture, plus the practical centroid-MAD post-processing, provides a useful baseline for future drone-based systems.
major comments (1)
- [Abstract and Evaluation section] The central claim that the pipeline provides 'sufficiently accurate' 3D positioning for 10 mm branches (thereby removing the need for auxiliary sensors) is not supported by any quantitative evidence. Only qualitative visual comparisons of depth maps and segmentation masks at 1-2 m on the 71-pair custom dataset are reported; no ground-truth branch-to-camera distances, MAE/RMSE on the final triangulated scalar distances, error bars, tolerance analysis relative to branch radius, or drone-mounted repeatability tests appear. This directly undermines the strongest claim.
minor comments (2)
- [Pipeline description (Section 3)] The description of the centroid-MAD triangulation and outlier rejection would be clearer with explicit equations or pseudocode showing how the segmentation mask and disparity map are reduced to a single distance value.
- [Abstract] The abstract sentence on learning-based stereo methods is truncated ('more coherent depth es...'); complete it in the revision.
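In response to the request for explicit equations, one possible formalization of the centroid-MAD reduction is given below. It is an illustration, not the manuscript's own equations. $M$ is the set of mask pixels $(u, v)$ with a valid disparity $d_{uv} > 0$, $f$ the focal length in pixels, $B$ the baseline, $(c_x, c_y)$ the principal point, and $k$ a rejection multiplier (commonly around 3, an assumption here).

```latex
\begin{aligned}
Z_{uv} &= \frac{fB}{d_{uv}}, \qquad
\mathbf{P}_{uv} = \Bigl(\tfrac{(u-c_x)\,Z_{uv}}{f},\ \tfrac{(v-c_y)\,Z_{uv}}{f},\ Z_{uv}\Bigr), \qquad (u,v)\in M,\\
\tilde Z &= \operatorname*{median}_{(u,v)\in M} Z_{uv}, \qquad
\mathrm{MAD} = \operatorname*{median}_{(u,v)\in M} \bigl|Z_{uv}-\tilde Z\bigr|,\\
I &= \bigl\{(u,v)\in M : |Z_{uv}-\tilde Z| \le k \cdot 1.4826\,\mathrm{MAD}\bigr\},\\
\bar{\mathbf{P}} &= \frac{1}{|I|}\sum_{(u,v)\in I}\mathbf{P}_{uv}, \qquad
D = \bigl\lVert \bar{\mathbf{P}} \bigr\rVert_2 .
\end{aligned}
```

Variants are equally plausible, for example applying the MAD test to disparities instead of depths, or reporting the centroid depth $\bar Z$ rather than the Euclidean distance $D$.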
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the major comment below, clarifying our evaluation choices while committing to revisions that temper claims and add context without misrepresenting the work.
Point-by-point responses
- Referee: [Abstract and Evaluation section] The central claim that the pipeline provides 'sufficiently accurate' 3D positioning for 10 mm branches (thereby removing the need for auxiliary sensors) is not supported by any quantitative evidence. Only qualitative visual comparisons of depth maps and segmentation masks at 1-2 m on the 71-pair custom dataset are reported; no ground-truth branch-to-camera distances, MAE/RMSE on the final triangulated scalar distances, error bars, tolerance analysis relative to branch radius, or drone-mounted repeatability tests appear. This directly undermines the strongest claim.
Authors: We agree that quantitative metrics such as MAE/RMSE on the final triangulated distances would provide stronger support for the accuracy claim. The manuscript focuses on qualitative visual comparisons because obtaining precise ground-truth 3D positions for thin 10 mm branches in unstructured forest scenes is practically challenging without auxiliary sensors that would undermine the low-cost premise. The centroid-MAD triangulation is presented as a practical post-processing step that yields coherent positions despite noisy disparities. We will revise the abstract and evaluation sections to explicitly qualify the results as qualitative, moderate the phrasing of 'sufficiently accurate', and add a tolerance discussion relative to branch radius (e.g., treating a positioning error within 5 mm as acceptable for pruning contact). This addresses the concern while preserving the contribution as a baseline for low-cost drone systems. revision: partial
Circularity Check
No circularity: experimental comparison of off-the-shelf models on new data
full rationale
The manuscript evaluates standard segmentation networks (Mask R-CNN, YOLOv8/9) and stereo depth estimators (SGBM, PSMNet, RAFT-Stereo, etc.) on a newly collected 71-pair ZED Mini dataset. The final distance is obtained via a centroid-MAD triangulation step that applies a conventional statistical outlier rule to disparity values; this step is not fitted to the target distances and does not redefine any reported quantity in terms of itself. No equations, uniqueness theorems, or self-citations reduce the claimed positioning accuracy to a tautology or to parameters optimized on the same evaluation set. The work therefore remains an independent empirical comparison.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Disparity maps from stereo matching can be converted to metric distances via known camera intrinsics and baseline.
- domain assumption: A 71-pair custom dataset captured with the ZED Mini is representative of operational forestry conditions for thin branches.
Reference graph
Works this paper leans on
- [1] Aerial pruning mechanism, initial real environment test. Robotics and Biomimetics, 2017.
- [2] MP Fernandez, J Basauri, C Madariaga, M Menendez-Miguelez, R Olea, and A Zubizarreta-Gerendiain. iForest - Biogeosciences and Forestry, 2017. https://iforest.sisef.org/pdf/?id=ifor2037-009
- [3] Impacts of tending on attributes of radiata pine trees and stands in New Zealand: a review. New Zealand Journal of Forestry Science.
- [4] Effects of green pruning on growth of Pinus radiata. Canadian Journal of Forest Research, 2003.
- [5] Effects of thinning and pruning on stem and crown characteristics of radiata pine (Pinus radiata D. Don). iForest - Biogeosciences and Forestry, 2017.
- [6] Tree branch skeleton extraction from drone-based photogrammetric point cloud. Drones, 2023.
- [7] Non-contact type tree branch cutter using drone attached with laser head. Materials Today: Proceedings, 2022.
- [8] An apple tree branch pruning analysis. HortTechnology, 2022.
- [9] Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- [10] Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.
- [11] Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision.
- [12] Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
- [13] Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision.
- [14] You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- [15] Real-time flying object detection with YOLOv8. arXiv preprint arXiv:2305.09972.
- [16] YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616.
- [17] A model of stem growth and wood formation in Pinus radiata. Trees, 2015.
- [18] Smart Agriculture for Developing Nations, 2023.
- [19] Survey of drones for agriculture automation from planting to harvest. 2018 IEEE 22nd International Conference on Intelligent Engineering Systems (INES), 2018.
- [20] Forestry applications of UAVs in Europe: A review. International Journal of Remote Sensing, 2017.
- [21] A survey on deep learning techniques for stereo-based depth estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- [22] Object detection in 20 years: A survey. Proceedings of the IEEE, 2023.
- [23] Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- [24] A survey on object instance segmentation. SN Computer Science, 2022.
- [25] Robotic arm design, development and control for agriculture applications. 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS), 2017.
- [26] Deep learning-based stereopsis and monocular depth estimation techniques: A review. Vehicles, 2024.
- [27] Microsoft COCO: Common objects in context. Computer Vision - ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, 2014.
- [28] A new three-step search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 1994.
- [29] A novel four-step search algorithm for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology, 1996.
- [30] A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 2002.
- [31] Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
- [32] Attention concatenation volume for accurate and efficient stereo matching. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [33] Group-wise correlation stereo network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [34] MobileStereoNet: Towards lightweight deep networks for stereo matching. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision.
- [35] Pyramid stereo matching network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- [36] RAFT-Stereo: Multilevel recurrent field transforms for stereo matching. 2021 International Conference on 3D Vision (3DV), 2021.
- [37] NeRF-supervised deep stereo. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [38] Deep learning for monocular depth estimation: A review. Neurocomputing, 2021.
- [39] Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- [40] Depth Anything: Unleashing the power of large-scale unlabeled data. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
- [41] Accurate camera calibration using iterative refinement of control points. 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009.
- [42] Real-time stereo vision applications. Robot Vision, 2010.
- [43] SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D LiDAR point cloud. 2018 IEEE International Conference on Robotics and Automation (ICRA), 2018.
- [44] Overview of modulation techniques for spatially structured-light 3D imaging. Optics & Laser Technology, 2024.
- [45] Time-of-flight cameras in computer graphics. Computer Graphics Forum, 2010.
- [46] Obtaining depth map from segment-based stereo matching using graph cuts. Journal of Visual Communication and Image Representation, 2011.
- [47] Multiple view geometry in computer vision, 2003.
- [48] Depth map prediction from a single image using a multi-scale deep network. Advances in Neural Information Processing Systems.
- [49] Are we ready for autonomous driving? The KITTI vision benchmark suite. 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012.
- [50] A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- [51] MiDaS v3.1: A model zoo for robust monocular relative depth estimation. arXiv preprint arXiv:2307.14460.
- [52] Understanding deep neural networks with rectified linear units. arXiv preprint arXiv:1611.01491.
- [53] High-density multiple bits-per-cell 1T4R RRAM array with gradual SET/RESET and its effectiveness for deep learning. 2019 IEEE International Electron Devices Meeting (IEDM), 2019.
- [54] Drone stereo vision for radiata pine branch detection and distance measurement: Integrating SGBM and segmentation models. arXiv preprint arXiv:2409.17526.
- [55] Drone stereo vision for radiata pine branch detection and distance measurement: Utilizing deep learning and YOLO integration. arXiv e-prints.