Remote SAMsing: From Segment Anything to Segment Everything
Pith reviewed 2026-05-09 19:48 UTC · model grok-4.3
The pith
A multi-pass SAM2 pipeline with mask painting and tile merging segments remote sensing scenes to 91-98% coverage without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Remote SAMsing runs SAM2 multiple times per tile, painting accepted high-quality masks black between passes so later iterations see a simpler scene; thresholds are relaxed only after coverage gains plateau. Objects fragmented by tiling are reconstructed via contextual padding and a parameter-free best-match merge. Evaluated on seven scenes, the method lifts coverage from 30-68% to 91-98%, achieves 95% detection on buildings and 82-93% on cars at IoU 0.5, and produces boundaries 3-8 times more precise than SLIC or Felzenszwalb. The pipeline generalizes to MNF false-color data at 99.5% ASA and scales to a 1.94-billion-pixel mosaic at 97% coverage.
What carries the argument
The multi-pass algorithm that paints accepted masks black between iterations and relaxes quality thresholds only when coverage stagnates, together with contextual padding and parameter-free best-match merge for reconstructing objects across tile boundaries.
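The loop structure this describes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: `generate_masks` stands in for SAM2's automatic mask generator, masks are simplified to sets of pixel coordinates, and the threshold schedule and stagnation test are assumed values.

```python
def multi_pass_segment(image, generate_masks, thresholds=(0.95, 0.90, 0.80),
                       min_gain=0.01):
    """Sketch of the multi-pass strategy: accept high-quality masks,
    paint them black so later passes see a simpler scene, and relax
    the quality threshold only once coverage gains stagnate.

    `generate_masks(image)` is assumed to return (mask, score) pairs,
    where each mask is a set of (row, col) pixel coordinates.
    """
    h, w = len(image), len(image[0])
    total = h * w
    covered = set()          # pixels claimed by accepted masks
    accepted = []

    for tau in thresholds:   # strict thresholds first
        while True:
            before = len(covered)
            for mask, score in generate_masks(image):
                # skip low-quality masks and masks touching painted areas
                if score < tau or mask & covered:
                    continue
                accepted.append(mask)
                covered |= mask
                for (r, c) in mask:
                    image[r][c] = 0   # "paint black"
            gain = (len(covered) - before) / total
            if gain < min_gain:       # coverage stagnated: relax tau
                break
    return accepted, len(covered) / total
```

Because strict thresholds run first and accepted regions are painted out, relaxed passes can only add masks where nothing precise was found, which is the mechanism behind the no-degradation claim.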
If this is right
- Coverage reaches 91-98% across scenes from 5 cm to 4.78 m GSD while mask quality remains high.
- Buildings achieve 95% and cars 82-93% detection at IoU 0.5 with boundaries 3-8 times more precise than SLIC or Felzenszwalb.
- Tile size functions as an implicit scale parameter: shrinking tiles from 1000 to 250 pixels raises detection from 56% to 85%.
- The method generalizes to MNF false-color imagery at 99.5% ASA without retraining.
- It processes production-scale images such as a 1.94-billion-pixel mosaic at 97% coverage.
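A minimal sketch of what the parameter-free best-match merge could look like, assuming fragments are pixel sets in a shared coordinate frame and that "best match" means maximal IoU between fragments from adjacent padded tiles; the paper's actual criterion is not given here and may differ.

```python
def iou(a, b):
    """Intersection-over-union of two pixel sets (global coordinates)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match_merge(frags_left, frags_right):
    """Sketch of a parameter-free best-match merge: each fragment from
    the left tile is merged with the right-tile fragment it overlaps
    most, with no tunable cutoff. Fragments are sets of (row, col) in a
    shared frame; contextual padding makes adjacent tiles overlap, so
    the two halves of a split object intersect.
    """
    merged, used = [], set()
    for f in frags_left:
        # pick the right-tile fragment with maximal IoU against f
        best_i, best = max(
            enumerate(frags_right),
            key=lambda p: iou(f, p[1]),
            default=(None, None),
        )
        if best is not None and iou(f, best) > 0:
            merged.append(f | best)
            used.add(best_i)
        else:
            merged.append(f)          # no counterpart: keep as-is
    # keep right-tile fragments that matched nothing
    merged += [f for i, f in enumerate(frags_right) if i not in used]
    return merged
```

Because the choice is an argmax rather than a threshold, no overlap cutoff needs tuning, which is what would make such a merge parameter-free.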
Where Pith is reading between the lines
- Inference-time strategies like repeated passes and mask painting can adapt zero-shot models to new visual domains more readily than fine-tuning.
- Tile size serving as a scale control suggests a general way to manage object-size variation in any tiled segmentation pipeline.
- Success on false-color data indicates the approach may tolerate other spectral or radiometric shifts common in remote sensing.
Load-bearing premise
SAM2's zero-shot segmentation performance on natural images transfers sufficiently well to remote sensing imagery that the multi-pass strategy can increase coverage without degrading mask quality.
What would settle it
Running the pipeline on a new remote sensing collection dominated by complex natural textures where multi-pass coverage stays below 70% or where accepted masks show visibly poorer boundaries than single-pass masks would falsify the central claim.
Original abstract
SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5 cm to 4.78 m GSD), the pipeline raises coverage from 30-68% (single-pass SAM2) to 91-98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82-93% Det@0.5) with segment boundaries 3-8× more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from 1,000 to 250 pixels raises Det@0.5 from 56% to 85%, outperforming SAM2's built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94-billion-pixel Potsdam mosaic achieved 97% coverage without quality degradation.
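The tiling step the abstract describes, with contextual padding and tile size acting as an implicit scale parameter, reduces to simple window arithmetic; the `tile` and `pad` values below are illustrative defaults, not the paper's settings.

```python
def padded_tiles(height, width, tile=250, pad=32):
    """Sketch of tiling with contextual padding: each tile window is
    expanded by `pad` pixels on every side (clipped at the image edge)
    so that objects straddling a tile boundary appear whole in at
    least one padded window. Returns (row0, col0, row1, col1) windows.
    """
    windows = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            r0, c0 = max(0, top - pad), max(0, left - pad)
            r1 = min(height, top + tile + pad)
            c1 = min(width, left + tile + pad)
            windows.append((r0, c0, r1, c1))
    return windows
```

Shrinking `tile` effectively magnifies small objects relative to the model's input resolution, which is the mechanism behind the reported jump in Det@0.5 from 56% to 85% when tiles shrink from 1,000 to 250 pixels.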
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Remote SAMsing, an open-source pipeline that applies SAM2 to large remote sensing scenes without retraining or modifying the model. It addresses the quality-coverage trade-off via a multi-pass algorithm that paints accepted masks black between iterations and relaxes thresholds only after coverage stagnates, plus contextual padding and parameter-free best-match merging to handle objects split across tiles. On seven scenes spanning 5 cm to 4.78 m GSD, it reports raising coverage from 30-68% (single-pass SAM2) to 91-98%, with ablations quantifying component contributions, per-class Det@0.5 scores (buildings 95%, cars 82-93%), 3-8x better boundary precision than SLIC/Felzenszwalb, 99.5% ASA on MNF false-color, and successful scaling to a 1.94-billion-pixel mosaic.
Significance. If the no-degradation claim holds, the work supplies a practical, training-free route to high-coverage zero-shot segmentation on remote-sensing data, which could aid large-scale mapping and object detection pipelines. The explicit ablations, cross-GSD evaluation, and demonstration that tile size functions as an implicit scale parameter (outperforming SAM2's built-in multi-scale) are concrete strengths. Generalization to false-color imagery without retraining further broadens applicability.
major comments (2)
- [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.
- [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.
minor comments (2)
- [Abstract / Results] The abstract and results state coverage ranges (30-68% to 91-98%) and Det@0.5 intervals without error bars, standard deviations, or per-scene tables; adding these would strengthen interpretability of the seven-scene aggregate.
- [Method] Implementation details for the 'parameter-free best-match merge' and exact quality-threshold schedule are described at a high level; a short pseudocode block or explicit parameter list would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating where revisions will be made to strengthen the paper.
Point-by-point responses
Referee: [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.
Authors: We agree that a direct comparison using mask-level metrics such as mean IoU would provide stronger evidence for the no-degradation claim. Our current evaluation relies on Det@0.5, which is an IoU-based detection metric, and boundary precision, showing that the multi-pass pipeline maintains high quality while increasing coverage. The design of the algorithm prioritizes strict thresholds first, with black-painting preventing re-processing of high-quality areas. To address this, we will add a direct comparison table of mask quality metrics between single-pass and multi-pass outputs on the annotated scenes in the revised manuscript. revision: yes
Referee: [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.
Authors: Our evaluation includes scenes with varying object densities and GSDs, and the strong performance on cars (82-93% Det@0.5) suggests robustness to dense objects. The successful application to MNF false-color imagery (99.5% ASA) demonstrates generalization across spectral characteristics without retraining. We will expand the discussion and ablations to explicitly address performance in dense/overlapping scenarios and across the GSD range to further quantify this robustness. revision: partial
Circularity Check
No significant circularity in pipeline description or empirical claims.
full rationale
The paper describes an engineering pipeline applying SAM2 via multi-pass black-painting, threshold relaxation, contextual padding, and best-match merge to remote-sensing tiles. Central claims (coverage lift from 30-68% to 91-98%, per-class Det@0.5, 3-8x boundary precision vs. SLIC/Felzenszwalb, 99.5% ASA on MNF) rest on direct evaluation across seven independent scenes spanning 5 cm to 4.78 m GSD, plus ablations that isolate each component. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear; the transfer assumption is tested rather than presupposed by construction. Results remain falsifiable on held-out imagery and do not reduce to quantities defined inside the paper's own steps.
Axiom & Free-Parameter Ledger
free parameters (2)
- quality thresholds
- tile size
axioms (2)
- domain assumption SAM2 produces high-quality zero-shot segmentation on natural images
- standard math Large remote sensing images must be tiled for processing