Remote SAMsing: From Segment Anything to Segment Everything
Pith reviewed 2026-05-09 19:48 UTC · model grok-4.3
The pith
A multi-pass SAM2 pipeline with mask painting and tile merging segments remote sensing scenes to 91-98% coverage without retraining.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Remote SAMsing runs SAM2 multiple times per tile, painting accepted high-quality masks black between passes so later iterations see a simpler scene; thresholds are relaxed only after coverage gains plateau. Objects fragmented by tiling are reconstructed via contextual padding and a parameter-free best-match merge. Evaluated on seven scenes, the method lifts coverage from 30-68% to 91-98%, achieves 95% detection on buildings and 82-93% on cars at IoU 0.5, and produces boundaries 3-8 times more precise than SLIC or Felzenszwalb. The pipeline generalizes to MNF false-color data at 99.5% ASA and scales to a 1.94-billion-pixel mosaic at 97% coverage.
What carries the argument
The multi-pass algorithm that paints accepted masks black between iterations and relaxes quality thresholds only when coverage stagnates, together with contextual padding and parameter-free best-match merge for reconstructing objects across tile boundaries.
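The loop structure this describes can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: `generate_masks` stands in for SAM2's automatic mask generator, masks are simplified to sets of pixel coordinates, and the threshold schedule and stagnation test are assumed values.

```python
def multi_pass_segment(image, generate_masks, thresholds=(0.95, 0.90, 0.80),
                       min_gain=0.01):
    """Sketch of the multi-pass strategy: accept high-quality masks,
    paint them black so later passes see a simpler scene, and relax
    the quality threshold only once coverage gains stagnate.

    `generate_masks(image)` is assumed to return (mask, score) pairs,
    where each mask is a set of (row, col) pixel coordinates.
    """
    h, w = len(image), len(image[0])
    total = h * w
    covered = set()          # pixels claimed by accepted masks
    accepted = []

    for tau in thresholds:   # strict thresholds first
        while True:
            before = len(covered)
            for mask, score in generate_masks(image):
                # skip low-quality masks and masks touching painted areas
                if score < tau or mask & covered:
                    continue
                accepted.append(mask)
                covered |= mask
                for (r, c) in mask:
                    image[r][c] = 0   # "paint black"
            gain = (len(covered) - before) / total
            if gain < min_gain:       # coverage stagnated: relax tau
                break
    return accepted, len(covered) / total
```

Because strict thresholds run first and accepted regions are painted out, relaxed passes can only add masks where nothing precise was found, which is the mechanism behind the no-degradation claim.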
If this is right
- Coverage reaches 91-98% across scenes from 5 cm to 4.78 m GSD while mask quality remains high.
- Buildings achieve 95% and cars 82-93% detection at IoU 0.5 with boundaries 3-8 times more precise than SLIC or Felzenszwalb.
- Tile size functions as an implicit scale parameter: shrinking tiles from 1000 to 250 pixels raises detection from 56% to 85%.
- The method generalizes to MNF false-color imagery at 99.5% ASA without retraining.
- It processes production-scale images such as a 1.94-billion-pixel mosaic at 97% coverage.
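A minimal sketch of what the parameter-free best-match merge could look like, assuming fragments are pixel sets in a shared coordinate frame and that "best match" means maximal IoU between fragments from adjacent padded tiles; the paper's actual criterion is not given here and may differ.

```python
def iou(a, b):
    """Intersection-over-union of two pixel sets (global coordinates)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match_merge(frags_left, frags_right):
    """Sketch of a parameter-free best-match merge: each fragment from
    the left tile is merged with the right-tile fragment it overlaps
    most, with no tunable cutoff. Fragments are sets of (row, col) in a
    shared frame; contextual padding makes adjacent tiles overlap, so
    the two halves of a split object intersect.
    """
    merged, used = [], set()
    for f in frags_left:
        # pick the right-tile fragment with maximal IoU against f
        best_i, best = max(
            enumerate(frags_right),
            key=lambda p: iou(f, p[1]),
            default=(None, None),
        )
        if best is not None and iou(f, best) > 0:
            merged.append(f | best)
            used.add(best_i)
        else:
            merged.append(f)          # no counterpart: keep as-is
    # keep right-tile fragments that matched nothing
    merged += [f for i, f in enumerate(frags_right) if i not in used]
    return merged
```

Because the choice is an argmax rather than a threshold, no overlap cutoff needs tuning, which is what would make such a merge parameter-free.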
Where Pith is reading between the lines
- Inference-time strategies like repeated passes and mask painting can adapt zero-shot models to new visual domains more readily than fine-tuning.
- Tile size serving as a scale control suggests a general way to manage object-size variation in any tiled segmentation pipeline.
- Success on false-color data indicates the approach may tolerate other spectral or radiometric shifts common in remote sensing.
Load-bearing premise
SAM2's zero-shot segmentation performance on natural images transfers sufficiently well to remote sensing imagery that the multi-pass strategy can increase coverage without degrading mask quality.
What would settle it
Running the pipeline on a new remote sensing collection dominated by complex natural textures where multi-pass coverage stays below 70% or where accepted masks show visibly poorer boundaries than single-pass masks would falsify the central claim.
Original abstract
SAM2 produces high-quality zero-shot segmentation on natural images, but applying it to large remote sensing scenes exposes two problems: (1) its mask generator faces an inherent quality-coverage trade-off: strict thresholds yield precise masks but leave most of the image unsegmented, while relaxed thresholds increase coverage at the cost of mask quality; and (2) large images must be tiled, fragmenting objects across tile boundaries. We propose Remote SAMsing, an open-source pipeline that solves both problems without modifying SAM2 or requiring training data. For coverage, a multi-pass algorithm runs SAM2 repeatedly on each tile, painting accepted masks black between passes to simplify the scene for the next iteration, and relaxing quality thresholds only when coverage gains stagnate, ensuring that the most precise masks are always captured first. For spatial consistency, contextual padding and a parameter-free best-match merge reconstruct objects fragmented across tile boundaries. Evaluated on seven scenes (5 cm to 4.78 m GSD), the pipeline raises coverage from 30-68% (single-pass SAM2) to 91-98%. Ablation experiments quantify the contribution of each component to coverage and detection quality. Per-class evaluation shows that SAM2 transfers well to discrete RS objects (buildings 95%, cars 82-93% Det@0.5) with segment boundaries 3-8× more precise than SLIC and Felzenszwalb baselines. Tile size functions as an implicit scale parameter: reducing it from 1,000 to 250 pixels raises Det@0.5 from 56% to 85%, outperforming SAM2's built-in multi-scale mechanism. The pipeline generalizes to MNF false-color imagery without retraining (99.5% ASA) and scales to production-sized images: a 1.94-billion-pixel Potsdam mosaic achieved 97% coverage without quality degradation.
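The tiling step the abstract describes, with contextual padding and tile size acting as an implicit scale parameter, reduces to simple window arithmetic; the `tile` and `pad` values below are illustrative defaults, not the paper's settings.

```python
def padded_tiles(height, width, tile=250, pad=32):
    """Sketch of tiling with contextual padding: each tile window is
    expanded by `pad` pixels on every side (clipped at the image edge)
    so that objects straddling a tile boundary appear whole in at
    least one padded window. Returns (row0, col0, row1, col1) windows.
    """
    windows = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            r0, c0 = max(0, top - pad), max(0, left - pad)
            r1 = min(height, top + tile + pad)
            c1 = min(width, left + tile + pad)
            windows.append((r0, c0, r1, c1))
    return windows
```

Shrinking `tile` effectively magnifies small objects relative to the model's input resolution, which is the mechanism behind the reported jump in Det@0.5 from 56% to 85% when tiles shrink from 1,000 to 250 pixels.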
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Remote SAMsing, an open-source pipeline that applies SAM2 to large remote sensing scenes without retraining or modifying the model. It addresses the quality-coverage trade-off via a multi-pass algorithm that paints accepted masks black between iterations and relaxes thresholds only after coverage stagnates, plus contextual padding and parameter-free best-match merging to handle objects split across tiles. On seven scenes spanning 5 cm to 4.78 m GSD, it reports raising coverage from 30-68% (single-pass SAM2) to 91-98%, with ablations quantifying component contributions, per-class Det@0.5 scores (buildings 95%, cars 82-93%), 3-8x better boundary precision than SLIC/Felzenszwalb, 99.5% ASA on MNF false-color, and successful scaling to a 1.94-billion-pixel mosaic.
Significance. If the no-degradation claim holds, the work supplies a practical, training-free route to high-coverage zero-shot segmentation on remote-sensing data, which could aid large-scale mapping and object detection pipelines. The explicit ablations, cross-GSD evaluation, and demonstration that tile size functions as an implicit scale parameter (outperforming SAM2's built-in multi-scale) are concrete strengths. Generalization to false-color imagery without retraining further broadens applicability.
major comments (2)
- [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.
- [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.
minor comments (2)
- [Abstract / Results] The abstract and results state coverage ranges (30-68% to 91-98%) and Det@0.5 intervals without error bars, standard deviations, or per-scene tables; adding these would strengthen interpretability of the seven-scene aggregate.
- [Method] Implementation details for the 'parameter-free best-match merge' and exact quality-threshold schedule are described at a high level; a short pseudocode block or explicit parameter list would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback on our manuscript. We address each of the major comments below, providing clarifications and indicating where revisions will be made to strengthen the paper.
Point-by-point responses
Referee: [Evaluation] Evaluation section: the central claim that multi-pass processing raises coverage 'without quality degradation' rests on proxy metrics (Det@0.5, boundary precision, ASA) and ablations, yet the manuscript provides no direct side-by-side comparison of mask-level quality (e.g., mean IoU or precision against ground truth) between strict single-pass SAM2 and the final multi-pass output on the same seven scenes. This direct evidence is load-bearing for the assertion that relaxed-threshold passes do not inject low-quality masks.
Authors: We agree that a direct comparison using mask-level metrics such as mean IoU would provide stronger evidence for the no-degradation claim. Our current evaluation relies on Det@0.5, which is an IoU-based detection metric, and boundary precision, showing that the multi-pass pipeline maintains high quality while increasing coverage. The design of the algorithm prioritizes strict thresholds first, with black-painting preventing re-processing of high-quality areas. To address this, we will add a direct comparison table of mask quality metrics between single-pass and multi-pass outputs on the annotated scenes in the revised manuscript. revision: yes
Referee: [Results] Results and ablation experiments: while per-class Det@0.5 and boundary-precision gains versus SLIC/Felzenszwalb are reported, the manuscript does not quantify robustness to dense/overlapping objects or spectral mismatch across the GSD range; if transfer of SAM2 priors fails on unseen RS characteristics, the later passes could still degrade quality despite the black-painting schedule.
Authors: Our evaluation includes scenes with varying object densities and GSDs, and the strong performance on cars (82-93% Det@0.5) suggests robustness to dense objects. The successful application to MNF false-color imagery (99.5% ASA) demonstrates generalization across spectral characteristics without retraining. We will expand the discussion and ablations to explicitly address performance in dense/overlapping scenarios and across the GSD range to further quantify this robustness. revision: partial
Circularity Check
No significant circularity in pipeline description or empirical claims.
full rationale
The paper describes an engineering pipeline applying SAM2 via multi-pass black-painting, threshold relaxation, contextual padding, and best-match merge to remote-sensing tiles. Central claims (coverage lift from 30-68% to 91-98%, per-class Det@0.5, 3-8x boundary precision vs. SLIC/Felzenszwalb, 99.5% ASA on MNF) rest on direct evaluation across seven independent scenes spanning 5 cm to 4.78 m GSD, plus ablations that isolate each component. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear; the transfer assumption is tested rather than presupposed by construction. Results remain falsifiable on held-out imagery and do not reduce to quantities defined inside the paper's own steps.
Axiom & Free-Parameter Ledger
free parameters (2)
- quality thresholds
- tile size
axioms (2)
- domain assumption SAM2 produces high-quality zero-shot segmentation on natural images
- standard math Large remote sensing images must be tiled for processing