pith. machine review for the scientific record.

arxiv: 2605.00256 · v1 · submitted 2026-04-30 · 💻 cs.CV · cs.AI

Remote SAMsing: From Segment Anything to Segment Everything

Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva, Osmar Abílio de Carvalho Júnior, Osmar Luiz Ferreira de Carvalho

Pith reviewed 2026-05-09 19:48 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: remote sensing · segment anything model · zero-shot segmentation · multi-pass algorithm · image tiling · mask merging · coverage-quality tradeoff · SAM2

The pith

A multi-pass SAM2 pipeline with mask painting and tile merging segments remote sensing scenes to 91-98% coverage without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that SAM2's quality-coverage tradeoff on large remote sensing images can be resolved by repeatedly running the model on each tile, accepting precise masks first and then blacking them out to simplify the scene for later passes with relaxed thresholds. Large images are tiled, but contextual padding plus a best-match merge reconstructs objects split across boundaries. On seven scenes spanning 5 cm to 4.78 m ground sample distance, this raises coverage from 30-68% in a single pass to 91-98% while preserving mask quality, and it works on false-color imagery as well. Tile size acts as an implicit scale knob that improves small-object detection. A reader cares because the approach turns a natural-image foundation model into a practical tool for earth observation without any training data or model changes.

Core claim

Remote SAMsing runs SAM2 multiple times per tile, painting accepted high-quality masks black between passes so later iterations see a simpler scene; thresholds are relaxed only after coverage gains plateau. Objects fragmented by tiling are reconstructed via contextual padding and a parameter-free best-match merge. Evaluated on seven scenes, the method lifts coverage from 30-68% to 91-98%, achieves 95% detection on buildings and 82-93% on cars at IoU 0.5, and produces boundaries 3-8 times more precise than SLIC or Felzenszwalb. The pipeline generalizes to MNF false-color data at 99.5% ASA and scales to a 1.94-billion-pixel mosaic at 97% coverage.

What carries the argument

The multi-pass algorithm that paints accepted masks black between iterations and relaxes quality thresholds only when coverage stagnates, together with contextual padding and parameter-free best-match merge for reconstructing objects across tile boundaries.
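The multi-pass loop can be sketched as follows. This is a minimal reading of the description above, not the authors' code: `mask_generator` stands in for SAM2's automatic mask generator, and the threshold schedule, stagnation tolerance, and pass limit are illustrative values the paper does not publish here.

```python
import numpy as np

def multi_pass_segment(tile, mask_generator, thresholds=(0.95, 0.90, 0.80),
                       stagnation_eps=0.01, max_passes_per_level=3):
    """Sketch of the multi-pass idea: accept masks at the current quality
    threshold, paint them black so later passes see a simpler scene, and
    relax the threshold only once coverage gains stagnate."""
    work = tile.copy()
    accepted = []
    covered = np.zeros(tile.shape[:2], dtype=bool)
    for thr in thresholds:                          # strict -> relaxed
        for _ in range(max_passes_per_level):
            prev_coverage = covered.mean()
            for m in mask_generator(work):
                if m["quality"] >= thr and not covered[m["segmentation"]].any():
                    accepted.append(m)
                    covered |= m["segmentation"]
                    work[m["segmentation"]] = 0     # paint accepted region black
            if covered.mean() - prev_coverage < stagnation_eps:
                break                               # gains stagnated: relax threshold
    return accepted, covered.mean()
```

Because strict passes run first, the most precise masks are locked in before any relaxed-threshold pass can touch the remaining pixels.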

If this is right

  • Coverage reaches 91-98% across scenes from 5 cm to 4.78 m GSD while mask quality remains high.
  • Buildings achieve 95% and cars 82-93% detection at IoU 0.5 with boundaries 3-8 times more precise than SLIC or Felzenszwalb.
  • Tile size functions as an implicit scale parameter: shrinking tiles from 1000 to 250 pixels raises detection from 56% to 85%.
  • The method generalizes to MNF false-color imagery at 99.5% ASA without retraining.
  • It processes production-scale images such as a 1.94-billion-pixel mosaic at 97% coverage.
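The tiling step behind these numbers can be sketched as below: each tile is cut with a margin of surrounding context so objects near a boundary are segmented with their neighborhood visible, while the tile's unpadded core window stays responsible for the final output. The 250-pixel tile and 32-pixel pad are hypothetical values chosen for illustration; the paper treats tile size itself as the implicit scale knob.

```python
import numpy as np

def padded_tiles(image, tile=250, pad=32):
    """Yield (padded crop, core window) pairs over a large image.
    The crop extends `pad` pixels beyond the core window on each side,
    clipped at the image edges."""
    h, w = image.shape[:2]
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            r0, c0 = max(r - pad, 0), max(c - pad, 0)
            r1, c1 = min(r + tile + pad, h), min(c + tile + pad, w)
            core = (r, c, min(r + tile, h), min(c + tile, w))
            yield image[r0:r1, c0:c1], core
```

Shrinking `tile` effectively zooms SAM2 in, which is why smaller tiles improve small-object (e.g. car) detection.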

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Inference-time strategies like repeated passes and mask painting can adapt zero-shot models to new visual domains more readily than fine-tuning.
  • Tile size serving as a scale control suggests a general way to manage object-size variation in any tiled segmentation pipeline.
  • Success on false-color data indicates the approach may tolerate other spectral or radiometric shifts common in remote sensing.

Load-bearing premise

SAM2's zero-shot segmentation performance on natural images transfers sufficiently well to remote sensing imagery that the multi-pass strategy can increase coverage without degrading mask quality.

What would settle it

The central claim would be falsified by running the pipeline on a new remote sensing collection dominated by complex natural textures and finding either that multi-pass coverage stays below 70% or that accepted masks show visibly poorer boundaries than single-pass masks.
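Coverage, as used in this test, is simply the fraction of pixels claimed by at least one accepted mask; a minimal check of that kind, over boolean masks in a common frame, is:

```python
import numpy as np

def coverage(masks, shape):
    """Fraction of pixels covered by at least one accepted boolean mask."""
    covered = np.zeros(shape, dtype=bool)
    for m in masks:
        covered |= m
    return covered.mean()
```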

Figures

Figures reproduced from arXiv: 2605.00256 by Anesmar Olino de Albuquerque, Daniel Guerreiro e Silva, Osmar Abílio de Carvalho Júnior, Osmar Luiz Ferreira de Carvalho.

Figure 1: Full segmentation results across three datasets. (A) BSB-1: high-resolution urban scene at 24 cm GSD …
Figure 2: Multi-pass segmentation progression across three datasets. Each row shows a different scene; each column …
Figure 3: Visual comparison of pipeline configurations on a BSB-1 crop: (A) original image, (B) SamGeo2, (C) …
Figure 4: Effect of tile size on car detection. Top row: original image and Remote SAMsing segments at three tile …
Figure 5: Boundary merge comparison across three datasets. Each row shows a crop centered on a tile boundary …
Figure 6: Effect of contextual padding on boundary segmentation. Colored segments are affected by the tile boundary …
Figure 7: Visual comparison of segmentation methods on BSB-1: (A) original image, (B) Remote SAMsing ( …
Figure 8: Full Potsdam mosaic segmentation (36,000 × 54,000 pixels), with four zoomed panels (A, B, C, and D) showing segment detail in four regions of the mosaic, with matching colored rectangles indicating the source locations. Per-class results confirm that quality does not degrade with image size: buildings reach 89.4% Det@0.5 (vs. 94.9% on the individual Potsdam-1 patch), cars 93.2% (vs. 93.0%), and BIoU remain…
Original abstract

SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5 cm to 4.78 m GSD), the pipeline raises coverage from 30-68% (single-pass SAM2) to 91-98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82-93% Det@0.5), with segment boundaries 3-8× more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from 1,000 to 250 raises Det@0.5 from 56% to 85%, outperforming SAM2's built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94-billion-pixel Potsdam mosaic achieved 97% coverage without quality degradation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Remote SAMsing, an open-source pipeline that applies SAM2 to large remote sensing scenes without retraining or modifying the model. It addresses the quality-coverage trade-off via a multi-pass algorithm that paints accepted masks black between iterations and relaxes thresholds only after coverage stagnates, plus contextual padding and parameter-free best-match merging to handle objects split across tiles. On seven scenes spanning 5 cm to 4.78 m GSD, it reports raising coverage from 30-68% (single-pass SAM2) to 91-98%, with ablations quantifying component contributions, per-class Det@0.5 scores (buildings 95%, cars 82-93%), 3-8x better boundary precision than SLIC/Felzenszwalb, 99.5% ASA on MNF false-color, and successful scaling to a 1.94-billion-pixel mosaic.

Significance. If the no-degradation claim holds, the work supplies a practical, training-free route to high-coverage zero-shot segmentation on remote-sensing data, which could aid large-scale mapping and object detection pipelines. The explicit ablations, cross-GSD evaluation, and demonstration that tile size functions as an implicit scale parameter (outperforming SAM2's built-in multi-scale) are concrete strengths. Generalization to false-color imagery without retraining further broadens applicability.

major comments (2)
  1. [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.
  2. [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.
minor comments (2)
  1. [Abstract / Results] The abstract and results state coverage ranges (30-68% to 91-98%) and Det@0.5 intervals without error bars, standard deviations, or per-scene tables; adding these would strengthen interpretability of the seven-scene aggregate.
  2. [Method] Implementation details for the 'parameter-free best-match merge' and exact quality-threshold schedule are described at a high level; a short pseudocode block or explicit parameter list would improve reproducibility.
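As one reading of that high-level description (not the authors' algorithm), a parameter-free best-match merge could pair each fragment touching the shared seam with whichever opposite-side fragment agrees best along it, an argmax with no tuned threshold:

```python
import numpy as np

def best_match_merge(left_masks, right_masks, seam_col):
    """Merge mask fragments across a vertical tile seam. Masks are boolean
    arrays in a common coordinate frame; a left fragment that occupies the
    seam column is unioned with the right fragment whose seam occupancy
    overlaps it most (parameter-free: pure argmax, no threshold to tune)."""
    def seam_profile(mask):
        return mask[:, seam_col]              # occupancy along the seam column

    merged, used = [], set()
    for lm in left_masks:
        lp = seam_profile(lm)
        if not lp.any():                      # fragment never touches the seam
            merged.append(lm)
            continue
        scores = [np.logical_and(lp, seam_profile(rm)).sum()
                  for rm in right_masks]
        j = int(np.argmax(scores)) if scores else -1
        if j >= 0 and scores[j] > 0 and j not in used:
            merged.append(np.logical_or(lm, right_masks[j]))
            used.add(j)
        else:
            merged.append(lm)
    merged += [rm for j, rm in enumerate(right_masks) if j not in used]
    return merged
```

The exact matching criterion (seam-pixel agreement here) is an assumption; the point is that a best-match argmax needs no free parameter, which is what the referee is asking the authors to spell out.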

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating where revisions will be made to strengthen the paper.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.

    Authors: We agree that a direct comparison using mask-level metrics such as mean IoU would provide stronger evidence for the no-degradation claim. Our current evaluation relies on Det@0.5, which is an IoU-based detection metric, and boundary precision, showing that the multi-pass pipeline maintains high quality while increasing coverage. The design of the algorithm prioritizes strict thresholds first, with black-painting preventing re-processing of high-quality areas. To address this, we will add a direct comparison table of mask quality metrics between single-pass and multi-pass outputs on the annotated scenes in the revised manuscript. revision: yes

  2. Referee: [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.

    Authors: Our evaluation includes scenes with varying object densities and GSDs, and the strong performance on cars (82-93% Det@0.5) suggests robustness to dense objects. The successful application to MNF false-color imagery (99.5% ASA) demonstrates generalization across spectral characteristics without retraining. We will expand the discussion and ablations to explicitly address performance in dense/overlapping scenarios and across the GSD range to further quantify this robustness. revision: partial

Circularity Check

0 steps flagged

No significant circularity in pipeline description or empirical claims.

full rationale

The paper describes an engineering pipeline applying SAM2 via multi-pass black-painting, threshold relaxation, contextual padding, and best-match merge to remote-sensing tiles. Central claims (coverage lift from 30-68% to 91-98%, per-class Det@0.5, 3-8x boundary precision vs. SLIC/Felzenszwalb, 99.5% ASA on MNF) rest on direct evaluation across seven independent scenes spanning 5 cm to 4.78 m GSD, plus ablations that isolate each component. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear; the transfer assumption is tested rather than presupposed by construction. Results remain falsifiable on held-out imagery and do not reduce to quantities defined inside the paper's own steps.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the transferability of SAM2 to remote sensing data and the effectiveness of the iterative masking strategy in improving coverage without degrading quality. No new physical entities are introduced; the approach is algorithmic.

free parameters (2)
  • quality thresholds
    Relaxed only when coverage gains stagnate; specific initial values and stagnation criterion function as tunable elements even if not explicitly fitted.
  • tile size
    Functions as an implicit scale parameter; different values (1000 to 250) are tested and affect detection performance.
axioms (2)
  • domain assumption SAM2 produces high-quality zero-shot segmentation on natural images
    Invoked as the starting point for applying the model to remote sensing scenes.
  • standard math Large remote sensing images must be tiled for processing
    Practical computational necessity stated in the abstract.

pith-pipeline@v0.9.0 · 5669 in / 1627 out tokens · 51223 ms · 2026-05-09T19:48:00.052289+00:00 · methodology

