pith. machine review for the scientific record.

arxiv: 2602.13780 · v2 · submitted 2026-02-14 · 💻 cs.CV

Recognition: 2 theorem links

Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 22:04 UTC · model grok-4.3

classification 💻 cs.CV
keywords: semantic change detection · remote sensing · foundation models · change detection · Cascaded Gated Decoder · pseudo-changes · data efficiency

The pith

A modular cascaded decoder lets remote sensing foundation models extract semantic changes while cutting pseudo-changes and training data needs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Remote sensing semantic change detection must distinguish real surface transitions from pseudo-changes caused by lighting, season, or sensor differences. Foundation models such as PerA supply features that stay semantically consistent across these variations. The paper builds PerASCD by attaching a Cascaded Gated Decoder to PerA; the decoder fuses multi-scale features coarsely then refines them adaptively. A Soft Semantic Consistency Loss stabilizes mixed-precision training. On SECOND and LandsatSCD the method reaches new Sek scores of 26.11 percent and 65.21 percent, beats earlier bests, and still wins when trained on only half the data.

Core claim

PerASCD is a unified framework that pairs the PerA remote-sensing foundation model with a Cascaded Gated Decoder and a Soft Semantic Consistency Loss. The decoder accepts multi-scale backbone features, processes them in coarse-to-fine stages, and produces pixel-level change maps for multiple semantic classes. Experiments on SECOND and LandsatSCD establish new state-of-the-art Sek scores of 26.11 percent and 65.21 percent, exceed prior leaders by 0.61 and 4.95 percentage points, maintain superiority at 50 percent data, and generalize across backbones while preserving semantic consistency under radiometric shifts.

What carries the argument

The Cascaded Gated Decoder, a modular coarse-to-fine network that fuses multi-scale features from any backbone and adaptively extracts change information for semantic change detection tasks.
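The coarse-to-fine gated fusion this describes can be made concrete. Below is a minimal NumPy sketch in which the nearest-neighbour upsampling, the scalar gate parameter `w`, and the disagreement-based gate are all hypothetical stand-ins, not the CG-Decoder as published (whose internals are not reproduced on this page):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(f):
    # nearest-neighbour 2x upsampling of a (C, H, W) feature map
    return f.repeat(2, axis=1).repeat(2, axis=2)

def gated_fuse(coarse, fine, w=1.0):
    # Sigmoid gate over the disagreement between scales; `w` stands in for
    # learned gate parameters (hypothetical, for illustration only).
    up = upsample2x(coarse)
    gate = sigmoid(w * (up - fine))
    return gate * up + (1.0 - gate) * fine

def cascaded_decode(feats, w=1.0):
    # Coarse-to-fine pass over multi-scale backbone features, coarsest first.
    out = feats[0]
    for fine in feats[1:]:
        out = gated_fuse(out, fine, w)
    return out

# three scales of a 4-channel feature pyramid, coarsest first
feats = [np.random.randn(4, 8 * 2**i, 8 * 2**i) for i in range(3)]
print(cascaded_decode(feats).shape)  # (4, 32, 32)
```

The sigmoid gate turns each fusion step into a per-pixel convex combination, so a decoder of this shape can lean on coarse semantics wherever the fine-scale features look like pseudo-change; a real implementation would learn the gate rather than fix it.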

If this is right

  • PerASCD outperforms full-data baselines when trained on only 50 percent of the labeled examples from SECOND and LandsatSCD.
  • The same decoder architecture can be attached to different backbone networks without redesign, consistent with the cross-backbone generalization experiments.
  • Interpretability improves because change decisions rest on the foundation model's already consistent semantic embeddings rather than ad-hoc features.
  • Performance remains stable across radiometric and environmental shifts that normally create pseudo-changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The decoder pattern could transfer to other dense-prediction remote-sensing tasks such as semantic segmentation or instance detection.
  • Data efficiency may allow faster model updates when new satellite imagery arrives with limited labels.
  • Pairing the decoder with future, larger foundation models could further suppress pseudo-change errors.
  • The coarse-to-fine gating might extend naturally to multi-date sequences beyond standard bi-temporal pairs.

Load-bearing premise

The chosen foundation model must keep producing semantically stable features even when imaging conditions and dates differ between the two images.

What would settle it

Run PerASCD and the prior best method on a fresh bi-temporal remote-sensing dataset that contains stronger seasonal, atmospheric, and sensor variations; if the new method no longer leads in Sek score, the consistency premise does not hold.
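The consistency premise is also directly measurable at the feature level. A minimal sketch of such a check, where `feat_t1`, `feat_t2`, and `unchanged_mask` are hypothetical stand-ins for backbone features from the two acquisition dates and ground-truth no-change labels:

```python
import numpy as np

def unchanged_pixel_cosine(feat_t1, feat_t2, unchanged_mask, eps=1e-8):
    """Mean cosine similarity between temporal feature pairs, restricted to
    pixels labelled unchanged; values near 1 would indicate the backbone's
    features stay stable across acquisition conditions.

    feat_t1, feat_t2 : (C, H, W) feature maps from the two dates
    unchanged_mask   : (H, W) boolean map of ground-truth no-change pixels
    """
    f1 = feat_t1[:, unchanged_mask].T  # (N, C): one feature vector per pixel
    f2 = feat_t2[:, unchanged_mask].T
    num = (f1 * f2).sum(axis=1)
    den = np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + eps
    return float((num / den).mean())

# sanity check: identical features on unchanged pixels give similarity ~1
f = np.random.randn(16, 32, 32)
mask = np.zeros((32, 32), dtype=bool)
mask[:16] = True
print(round(unchanged_pixel_cosine(f, f, mask), 4))  # 1.0
```

A low score on a dataset with strong seasonal or sensor variation would undercut the premise even before any Sek comparison is run.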

Figures

Figures reproduced from arXiv: 2602.13780 by Fei Tan, Hengtong Shen, Hong Xie, Li Yan, Peixian Lv, Wenfei Shen, Xinhao Li, Yaxuan Wei.

Figure 1
Figure 1. Existing common paradigms of semantic change detection models: (a) Rigid Dual-Branch: a fixed architecture in which change masks are derived from independent semantic branches; (b) Rigid Triple-Branch: a fixed architecture with an explicit change-specific decoder; (c) Adaptive Integration (Ours): a backbone-agnostic paradigm that adaptively fuses multi-scale features from foundation models. view at source ↗
read the original abstract

Remote sensing (RS) change detection is essential for interpreting surface dynamics. Semantic change detection (SCD) further enables pixel-level understanding of multi-class transitions, yet remains sensitive to pseudo-changes induced by imaging conditions. Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes. However, existing SCD methods are often rigid and backbone-specific, lacking the flexibility to integrate diverse multi-scale features from emerging foundation models. To this end, we introduce a modular Cascaded Gated Decoder (CG-Decoder) that bridges various backbones and SCD tasks, processing multi-scale features in a coarse-to-fine manner while enabling adaptive change extraction. Building upon the RS foundation model PerA, we present PerASCD, a unified SCD framework. We further propose a Soft Semantic Consistency Loss (SSCLoss) to mitigate numerical instability in mixed-precision training. Extensive experiments on SECOND and LandsatSCD show that PerASCD achieves new state-of-the-art Sek scores (26.11% and 65.21%), surpassing the previous best by 0.61% and 4.95%, respectively. It also demonstrates exceptional data efficiency (outperforming the full-data baseline with 50% data), seamless cross-backbone generalization, and enhanced interpretability. Our approach maintains robust semantic consistency under radiometric variations, providing a reliable SCD solution. Code: https://github.com/SathShen/PerASCD.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PerASCD, a semantic change detection (SCD) framework for remote sensing imagery that leverages the PerA foundation model. It proposes a modular Cascaded Gated Decoder (CG-Decoder) for coarse-to-fine multi-scale feature processing across backbones and a Soft Semantic Consistency Loss (SSCLoss) to mitigate mixed-precision instability. Experiments on SECOND and LandsatSCD report new SOTA Sek scores of 26.11% and 65.21% (gains of 0.61% and 4.95% over prior best), plus claims of 50%-data efficiency, cross-backbone generalization, and robustness to radiometric variations.

Significance. If the performance deltas are shown to arise from PerA's semantic consistency rather than decoder architecture or training details alone, the work could meaningfully advance foundation-model integration in SCD by reducing pseudo-changes and improving data efficiency. The modular CG-Decoder design offers a practical bridge for future backbones, and code release supports reproducibility.

major comments (2)
  1. [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.
  2. [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.
minor comments (2)
  1. [Abstract] Abstract: The reported Sek scores and data-efficiency claim would be strengthened by explicit mention of error bars, number of runs, and exact baseline configurations.
  2. [Code Availability] The code link is provided but should include a README with exact training hyperparameters and dataset splits to aid verification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to revisions that strengthen the attribution of results and the specification of the architecture.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.

    Authors: We agree that stronger isolation of PerA's contribution would improve the manuscript. In the revision we will add (i) cosine-similarity statistics of PerA features computed exclusively on unchanged pixels across temporal pairs, (ii) an ablation that replaces the CG-Decoder with a standard FPN while keeping PerA features fixed, and (iii) a controlled radiometric-perturbation experiment that measures Sek degradation under simulated illumination and sensor shifts. These additions will directly quantify how much of the reported gains and data-efficiency stem from PerA's semantic stability versus the decoder or loss. revision: yes

  2. Referee: [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.

    Authors: We accept that the gating and fusion operations require formal specification. The revised manuscript will include (a) the exact equations for the cascaded gate computation (including the sigmoid-activated gate weights and the element-wise modulation of multi-scale features), (b) the coarse-to-fine fusion formula that progressively refines change maps, and (c) pseudocode for the full CG-Decoder forward pass. These additions will make explicit how the architecture exploits the semantic consistency already present in PerA features. revision: yes
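For reference, one common shape such gating equations take (a guessed form for illustration, not the authors' actual formulation, which is not reproduced on this page): with $F_l$ the level-$l$ backbone feature, $D_{l+1}$ the decoded map from the next-coarser level, and $\mathrm{Up}(\cdot)$ upsampling,

```latex
G_l = \sigma\!\left(W_l \left[\mathrm{Up}(D_{l+1});\, F_l\right]\right),
\qquad
D_l = G_l \odot \mathrm{Up}(D_{l+1}) + \left(1 - G_l\right) \odot F_l
```

where $\sigma$ is the sigmoid, $\odot$ denotes element-wise multiplication, $[\cdot\,;\cdot]$ channel concatenation, and $W_l$ a learned projection; the cascade runs from the coarsest level down to full resolution.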

Circularity Check

0 steps flagged

No significant circularity; new modular components evaluated on public benchmarks

full rationale

The paper introduces CG-Decoder and SSCLoss as novel modules, treats the PerA foundation model as an external pretrained input, and reports empirical results on the public SECOND and LandsatSCD datasets. No equation or claim reduces by construction to a fitted parameter or self-definition internal to the paper. The assumption that PerA features are semantically consistent is presented as a premise drawn from prior foundation-model literature rather than derived within this work. No self-citations appear load-bearing for the central performance claims, and the reported Sek gains, data-efficiency results, and cross-backbone tests are independent evaluations rather than tautological restatements of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that foundation model features are semantically consistent across variations, plus two newly introduced components whose effectiveness is demonstrated empirically rather than derived from first principles. No explicit free parameters are mentioned in the abstract.

axioms (1)
  • domain assumption Remote sensing foundation models extract semantically consistent features across temporal and environmental variations
    Explicitly stated as critical for mitigating pseudo-changes in the abstract.
invented entities (2)
  • Cascaded Gated Decoder (CG-Decoder) no independent evidence
    purpose: Bridge various backbones and process multi-scale features in a coarse-to-fine manner for adaptive change extraction
    New modular component introduced to integrate foundation model features with SCD tasks
  • Soft Semantic Consistency Loss (SSCLoss) no independent evidence
    purpose: Mitigate numerical instability in mixed-precision training
    New loss function proposed for training stability

pith-pipeline@v0.9.0 · 5579 in / 1360 out tokens · 92489 ms · 2026-05-15T22:04:33.483079+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery

    S. Tian, A. Ma, Z. Zheng, and Y. Zhong, “Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery,” Dec. 28, 2020, arXiv: arXiv:2011.03247. doi: 10.48550/arXiv.2011.03247

  2. [2]

    Threshold- and trend-based vegetation change monitoring algorithm based on the inter-annual multi-temporal normalized difference moisture index series: A case study of the Tatra Mountains

    A. Ochtyra, A. Marcinkowska-Ochtyra, and E. Raczko, “Threshold- and trend-based vegetation change monitoring algorithm based on the inter-annual multi-temporal normalized difference moisture index series: A case study of the Tatra Mountains,” Remote Sensing of Environment, vol. 249, p. 112026, Nov. 2020, doi: 10.1016/j.rse.2020.112026

  3. [3]

    Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters

    Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters,” Remote Sensing of Environment, vol. 265, p. 112636, Nov. 2021, doi: 10.1016/j.rse.2021.112636

  4. [4]

    A Billion-scale Foundation Model for Remote Sensing Images

    K. Cha, J. Seo, and T. Lee, “A Billion-scale Foundation Model for Remote Sensing Images,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, pp. 1–17, 2024, doi: 10.1109/JSTARS.2024.3401772

  5. [5]

    RingMo: A Remote Sensing Foundation Model With Masked Image Modeling

    X. Sun et al., “RingMo: A Remote Sensing Foundation Model With Masked Image Modeling,” IEEE Trans. Geosci. Remote Sensing, vol. 61, pp. 1–22, 2023, doi: 10.1109/TGRS.2022.3194732

  6. [6]

    Change-Aware Sampling and Contrastive Learning for Satellite Images

    U. Mall, B. Hariharan, and K. Bala, “Change-Aware Sampling and Contrastive Learning for Satellite Images,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada: IEEE, Jun. 2023, pp. 5261–5270. doi: 10.1109/CVPR52729.2023.00509

  7. [7]

    A contrastive learning foundation model based on perfectly aligned sample pairs for remote sensing images

    H. Shen, H. Gu, H. Li, Y. Yang, and A. Qiu, “A contrastive learning foundation model based on perfectly aligned sample pairs for remote sensing images,” Geo-spatial Information Science, vol. 0, no. 0, pp. 1–18, Mar. 2026, doi: 10.1080/10095020.2026.2628435

  8. [8]

    Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

    C. J. Reed et al., “Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning,” Sep. 22, 2023, arXiv: arXiv:2212.14532. doi: 10.48550/arXiv.2212.14532

  9. [9]

    Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

    O. Mañas, A. Lacoste, X. Giro-i-Nieto, D. Vazquez, and P. Rodriguez, “Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data,” May 03, 2021, arXiv: arXiv:2103.16607. doi: 10.48550/arXiv.2103.16607

  10. [10]

    SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

    X. Guo et al., “SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery,” Mar. 22, 2024, arXiv: arXiv:2312.10115. doi: 10.48550/arXiv.2312.10115

  11. [11]

    Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere,

    T. Wang and P. Isola, “Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere,” Aug. 15, 2022, arXiv: arXiv:2005.10242. doi: 10.48550/arXiv.2005.10242

  12. [12]

    Masked Autoencoders Are Scalable Vision Learners

    K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked Autoencoders Are Scalable Vision Learners,” Dec. 19, 2021, arXiv: arXiv:2111.06377. doi: 10.48550/arXiv.2111.06377

  13. [13]

    Fully Convolutional Siamese Networks for Change Detection

    R. Caye Daudt, B. Le Saux, and A. Boulch, “Fully Convolutional Siamese Networks for Change Detection,” in 2018 25th IEEE International Conference on Image Processing (ICIP), Oct. 2018, pp. 4063–4067. doi: 10.1109/ICIP.2018.8451652

  14. [14]

    SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images

    S. Fang, K. Li, J. Shao, and Z. Li, “SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3056416

  15. [15]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Jun. 03, 2021, arXiv: arXiv:2010.11929. doi: 10.48550/arXiv.2010.11929

  16. [16]

    Remote Sensing Image Change Detection With Transformers,

    H. Chen, Z. Qi, and Z. Shi, “Remote Sensing Image Change Detection With Transformers,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095166

  17. [17]

    A Transformer-Based Siamese Network for Change Detection

    W. G. C. Bandara and V. M. Patel, “A Transformer-Based Siamese Network for Change Detection,” in IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Jul. 2022, pp. 207–210. doi: 10.1109/IGARSS46834.2022.9883686

  18. [18]

    Kernel Slow Feature Analysis for Scene Change Detection

    C. Wu, L. Zhang, and B. Du, “Kernel Slow Feature Analysis for Scene Change Detection,” IEEE Trans. Geosci. Remote Sensing, vol. 55, no. 4, pp. 2367–2384, Apr. 2017, doi: 10.1109/TGRS.2016.2642125

  19. [19]

    Semantic Change Detection with Hypermaps

    T. Suzuki, S. Shirakabe, Y. Miyashita, A. Nakamura, Y. Satoh, and H. Kataoka, “Semantic Change Detection with Hypermaps,” Mar. 16, 2017, arXiv: arXiv:1604.07513. doi: 10.48550/arXiv.1604.07513

  20. [20]

    ChangeNet: A Deep Learning Architecture for Visual Change Detection

    A. Varghese, J. Gubbi, A. Ramaswamy, and P. Balamuralidhar, “ChangeNet: A Deep Learning Architecture for Visual Change Detection,” presented at the Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content_eccv_2018_workshops/w7/html/Var...

  21. [21]

    Multitask learning for large-scale semantic change detection

    R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, “Multitask learning for large-scale semantic change detection,” Computer Vision and Image Understanding, vol. 187, p. 102783, Oct. 2019, doi: 10.1016/j.cviu.2019.07.003

  22. [22]

    Scene Change Detection VIA Deep Convolution Canonical Correlation Analysis Neural Network,

    Y. Wang, B. Du, L. Ru, C. Wu, and H. Luo, “Scene Change Detection VIA Deep Convolution Canonical Correlation Analysis Neural Network,” in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan: IEEE, Jul. 2019, pp. 198–201. doi: 10.1109/IGARSS.2019.8898211

  23. [23]

    SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery

    D. Peng, L. Bruzzone, Y. Zhang, H. Guan, and P. He, “SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery,” International Journal of Applied Earth Observation and Geoinformation, vol. 103, p. 102465, Dec. 2021, doi: 10.1016/j.jag.2021.102465

  24. [24]

    Semantic-CD: Remote Sensing Image Semantic Change Detection towards Open-vocabulary Setting

    Y. Zhu, L. Li, K. Chen, C. Liu, F. Zhou, and Z. Shi, “Semantic-CD: Remote Sensing Image Semantic Change Detection towards Open-vocabulary Setting,” Jan. 12, 2025, arXiv: arXiv:2501.06808. doi: 10.48550/arXiv.2501.06808

  25. [25]

    Feature-Guided Multitask Change Detection Network

    Y. Deng et al., “Feature-Guided Multitask Change Detection Network,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, vol. 15, pp. 9667–9679, 2022, doi: 10.1109/JSTARS.2022.3215773

  26. [26]

    Dual-Task Semantic Change Detection for Remote Sensing Images Using the Generative Change Field Module

    S. Xiang, M. Wang, X. Jiang, G. Xie, Z. Zhang, and P. Tang, “Dual-Task Semantic Change Detection for Remote Sensing Images Using the Generative Change Field Module,” Remote Sensing, vol. 13, no. 16, p. 3336, Aug. 2021, doi: 10.3390/rs13163336

  27. [27]

    Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images,

    L. Ding, J. Zhang, H. Guo, K. Zhang, B. Liu, and L. Bruzzone, “Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images,” IEEE Trans. Geosci. Remote Sensing, vol. 62, pp. 1–14, 2024, doi: 10.1109/TGRS.2024.3362795

  28. [28]

    MSCD-Net: From Unimodal to Multimodal Semantic Change Detection

    J. Wang et al., “MSCD-Net: From Unimodal to Multimodal Semantic Change Detection,” IEEE Trans. Geosci. Remote Sensing, vol. 63, pp. 1–17, 2025, doi: 10.1109/TGRS.2025.3591814

  29. [29]

    Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector

    Q. Shu et al., “Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector,” Sep. 19, 2025, arXiv: arXiv:2505.13212. doi: 10.48550/arXiv.2505.13212

  30. [30]

    Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing

    B. Wijenayake et al., “Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing,” Aug. 11, 2025, arXiv: arXiv:2508.08232. doi: 10.48550/arXiv.2508.08232

  31. [31]

    FAPMNet: Flow-Aligned Prototype Memory Network for Semantic Change Detection in Remote Sensing Images

    Y. Chen, C. Li, C. Ling, Y. Tan, and P. Wu, “FAPMNet: Flow-Aligned Prototype Memory Network for Semantic Change Detection in Remote Sensing Images,” IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1–5, 2026, doi: 10.1109/LGRS.2026.3652300

  32. [32]

    GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images

    X. Liu et al., “GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 73–91, Dec. 2025, doi: 10.1016/j.isprsjprs.2025.09.003

  33. [33]

    A Review of Deep-Learning Methods for Change Detection in Multispectral Remote Sensing Images

    E. J. Parelius, “A Review of Deep-Learning Methods for Change Detection in Multispectral Remote Sensing Images,” Remote Sensing, vol. 15, no. 8, p. 2092, Apr. 2023, doi: 10.3390/rs15082092

  34. [34]

    Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

    G. Cheng et al., “Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review,” Remote Sensing, vol. 16, no. 13, p. 2355, Jan. 2024, doi: 10.3390/rs16132355

  35. [35]

    ChangeRD: A registration-integrated change detection framework for unaligned remote sensing images

    W. Jing, K. Chi, Q. Li, and Q. Wang, “ChangeRD: A registration-integrated change detection framework for unaligned remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 220, pp. 64–74, Feb. 2025, doi: 10.1016/j.isprsjprs.2024.11.019

  36. [36]

    Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images

    L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images,” IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2022.3154390

  37. [37]

    Vision Transformer Adapter for Dense Predictions,

    Z. Chen et al., “Vision Transformer Adapter for Dense Predictions,” Feb. 13, 2023, arXiv: arXiv:2205.08534. doi: 10.48550/arXiv.2205.08534

  38. [38]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham: Springer Internatio...

  39. [39]

    Asymmetric Siamese Networks for Semantic Change Detection in Aerial Images

    K. Yang et al., “Asymmetric Siamese Networks for Semantic Change Detection in Aerial Images,” IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3113912

  40. [40]

    A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images

    P. Yuan, Q. Zhao, X. Zhao, X. Wang, X. Long, and Y. Zheng, “A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,” International Journal of Digital Earth, vol. 15, no. 1, pp. 1506–1525, Dec. 2022, doi: 10.1080/17538947.2022.2111470

  41. [41]

    Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices,

    B. Rolih, M. Fučka, F. Wolf, and L. Č. Zajc, “Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–11, 2025, doi: 10.1109/TGRS.2025.3585342

  42. [42]

    Deep Residual Learning for Image Recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html

  43. [43]

    Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

    Z. Liu et al., “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows,” presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Swin_Transformer_Hierarchical_Vision_Transformer_U...

  44. [44]

    SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

    Y. Cong et al., “SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery”

  45. [45]

    VMamba: Visual State Space Model

    Y. Liu et al., “VMamba: Visual State Space Model,” Advances in Neural Information Processing Systems, vol. 37, pp. 103031–103063, Dec. 2024, doi: 10.52202/079017-3273
