pith. machine review for the scientific record.

arxiv: 2602.13780 · v2 · submitted 2026-02-14 · 💻 cs.CV

Recognition: 2 theorem links

Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 22:04 UTC · model grok-4.3

classification 💻 cs.CV
keywords: semantic change detection · remote sensing · foundation models · change detection · Cascaded Gated Decoder · pseudo-changes · data efficiency

The pith

A modular cascaded decoder lets remote sensing foundation models extract semantic changes while cutting pseudo-changes and training data needs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Remote sensing semantic change detection must distinguish real surface transitions from pseudo-changes caused by lighting, season, or sensor differences. Foundation models such as PerA supply features that stay semantically consistent across these variations. The paper builds PerASCD by attaching a Cascaded Gated Decoder to PerA; the decoder fuses multi-scale features coarsely then refines them adaptively. A Soft Semantic Consistency Loss stabilizes mixed-precision training. On SECOND and LandsatSCD the method reaches new Sek scores of 26.11 percent and 65.21 percent, beats earlier bests, and still wins when trained on only half the data.

Core claim

PerASCD is a unified framework that pairs the PerA remote-sensing foundation model with a Cascaded Gated Decoder and a Soft Semantic Consistency Loss. The decoder accepts multi-scale backbone features, processes them in coarse-to-fine stages, and produces pixel-level change maps for multiple semantic classes. Experiments on SECOND and LandsatSCD establish new state-of-the-art Sek scores of 26.11 percent and 65.21 percent, exceed prior leaders by 0.61 and 4.95 percentage points, maintain superiority at 50 percent data, and generalize across backbones while preserving semantic consistency under radiometric shifts.

What carries the argument

The Cascaded Gated Decoder, a modular coarse-to-fine network that fuses multi-scale features from any backbone and adaptively extracts change information for semantic change detection tasks.
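The coarse-to-fine gated fusion this describes can be made concrete. Below is a minimal NumPy sketch in which the nearest-neighbour upsampling, the scalar gate parameter `w`, and the disagreement-based gate are all hypothetical stand-ins, not the CG-Decoder as published (whose internals are not reproduced on this page):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(f):
    # nearest-neighbour 2x upsampling of a (C, H, W) feature map
    return f.repeat(2, axis=1).repeat(2, axis=2)

def gated_fuse(coarse, fine, w=1.0):
    # Sigmoid gate over the disagreement between scales; `w` stands in for
    # learned gate parameters (hypothetical, for illustration only).
    up = upsample2x(coarse)
    gate = sigmoid(w * (up - fine))
    return gate * up + (1.0 - gate) * fine

def cascaded_decode(feats, w=1.0):
    # Coarse-to-fine pass over multi-scale backbone features, coarsest first.
    out = feats[0]
    for fine in feats[1:]:
        out = gated_fuse(out, fine, w)
    return out

# three scales of a 4-channel feature pyramid, coarsest first
feats = [np.random.randn(4, 8 * 2**i, 8 * 2**i) for i in range(3)]
print(cascaded_decode(feats).shape)  # (4, 32, 32)
```

The sigmoid gate turns each fusion step into a per-pixel convex combination, so a decoder of this shape can lean on coarse semantics wherever the fine-scale features look like pseudo-change; a real implementation would learn the gate rather than fix it.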

If this is right

  • PerASCD outperforms full-data baselines when trained on only 50 percent of the labeled examples from SECOND and LandsatSCD.
  • The same decoder architecture can be attached to different backbone networks without redesign, consistent with the cross-backbone generalization experiments.
  • Interpretability improves because change decisions rest on the foundation model's already consistent semantic embeddings rather than ad-hoc features.
  • Performance remains stable across radiometric and environmental shifts that normally create pseudo-changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The decoder pattern could transfer to other dense-prediction remote-sensing tasks such as semantic segmentation or instance detection.
  • Data efficiency may allow faster model updates when new satellite imagery arrives with limited labels.
  • Pairing the decoder with future, larger foundation models could further suppress pseudo-change errors.
  • The coarse-to-fine gating might extend naturally to multi-date sequences beyond standard bi-temporal pairs.

Load-bearing premise

The chosen foundation model must keep producing semantically stable features even when imaging conditions and dates differ between the two images.

What would settle it

Run PerASCD and the prior best method on a fresh bi-temporal remote-sensing dataset that contains stronger seasonal, atmospheric, and sensor variations; if the new method no longer leads in Sek score, the consistency premise does not hold.
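The consistency premise is also directly measurable at the feature level. A minimal sketch of such a check, where `feat_t1`, `feat_t2`, and `unchanged_mask` are hypothetical stand-ins for backbone features from the two acquisition dates and ground-truth no-change labels:

```python
import numpy as np

def unchanged_pixel_cosine(feat_t1, feat_t2, unchanged_mask, eps=1e-8):
    """Mean cosine similarity between temporal feature pairs, restricted to
    pixels labelled unchanged; values near 1 would indicate the backbone's
    features stay stable across acquisition conditions.

    feat_t1, feat_t2 : (C, H, W) feature maps from the two dates
    unchanged_mask   : (H, W) boolean map of ground-truth no-change pixels
    """
    f1 = feat_t1[:, unchanged_mask].T  # (N, C): one feature vector per pixel
    f2 = feat_t2[:, unchanged_mask].T
    num = (f1 * f2).sum(axis=1)
    den = np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + eps
    return float((num / den).mean())

# sanity check: identical features on unchanged pixels give similarity ~1
f = np.random.randn(16, 32, 32)
mask = np.zeros((32, 32), dtype=bool)
mask[:16] = True
print(round(unchanged_pixel_cosine(f, f, mask), 4))  # 1.0
```

A low score on a dataset with strong seasonal or sensor variation would undercut the premise even before any Sek comparison is run.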

Figures

Figures reproduced from arXiv: 2602.13780 by Fei Tan, Hengtong Shen, Hong Xie, Li Yan, Peixian Lv, Wenfei Shen, Xinhao Li, Yaxuan Wei.

Figure 1
Figure 1. Existing common paradigms of semantic change detection models: (a) Rigid Dual-Branch: a fixed architecture in which change masks are derived from independent semantic branches; (b) Rigid Triple-Branch: a fixed architecture with an explicit change-specific decoder; (c) Adaptive Integration (Ours): a backbone-agnostic paradigm that adaptively fuses multi-scale features from foundation models. view at source ↗
read the original abstract

Remote sensing (RS) change detection is essential for interpreting surface dynamics. Semantic change detection (SCD) further enables pixel-level understanding of multi-class transitions, yet remains sensitive to pseudo-changes induced by imaging conditions. Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes. However, existing SCD methods are often rigid and backbone-specific, lacking the flexibility to integrate diverse multi-scale features from emerging foundation models. To this end, we introduce a modular Cascaded Gated Decoder (CG-Decoder) that bridges various backbones and SCD tasks, processing multi-scale features in a coarse-to-fine manner while enabling adaptive change extraction. Building upon the RS foundation model PerA, we present PerASCD, a unified SCD framework. We further propose a Soft Semantic Consistency Loss (SSCLoss) to mitigate numerical instability in mixed-precision training. Extensive experiments on SECOND and LandsatSCD show that PerASCD achieves new state-of-the-art Sek scores (26.11% and 65.21%), surpassing the previous best by 0.61% and 4.95%, respectively. It also demonstrates exceptional data efficiency (outperforming the full-data baseline with 50% data), seamless cross-backbone generalization, and enhanced interpretability. Our approach maintains robust semantic consistency under radiometric variations, providing a reliable SCD solution. Code: https://github.com/SathShen/PerASCD.git.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces PerASCD, a semantic change detection (SCD) framework for remote sensing imagery that leverages the PerA foundation model. It proposes a modular Cascaded Gated Decoder (CG-Decoder) for coarse-to-fine multi-scale feature processing across backbones and a Soft Semantic Consistency Loss (SSCLoss) to mitigate mixed-precision instability. Experiments on SECOND and LandsatSCD report new SOTA Sek scores of 26.11% and 65.21% (gains of 0.61% and 4.95% over prior best), plus claims of 50%-data efficiency, cross-backbone generalization, and robustness to radiometric variations.

Significance. If the performance deltas are shown to arise from PerA's semantic consistency rather than decoder architecture or training details alone, the work could meaningfully advance foundation-model integration in SCD by reducing pseudo-changes and improving data efficiency. The modular CG-Decoder design offers a practical bridge for future backbones, and code release supports reproducibility.

major comments (2)
  1. [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.
  2. [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.
minor comments (2)
  1. [Abstract] Abstract: The reported Sek scores and data-efficiency claim would be strengthened by explicit mention of error bars, number of runs, and exact baseline configurations.
  2. [Code Availability] The code link is provided but should include a README with exact training hyperparameters and dataset splits to aid verification.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to revisions that strengthen the attribution of results and the specification of the architecture.

read point-by-point responses
  1. Referee: [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.

    Authors: We agree that stronger isolation of PerA's contribution would improve the manuscript. In the revision we will add (i) cosine-similarity statistics of PerA features computed exclusively on unchanged pixels across temporal pairs, (ii) an ablation that replaces the CG-Decoder with a standard FPN while keeping PerA features fixed, and (iii) a controlled radiometric-perturbation experiment that measures Sek degradation under simulated illumination and sensor shifts. These additions will directly quantify how much of the reported gains and data-efficiency stem from PerA's semantic stability versus the decoder or loss. revision: yes

  2. Referee: [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.

    Authors: We accept that the gating and fusion operations require formal specification. The revised manuscript will include (a) the exact equations for the cascaded gate computation (including the sigmoid-activated gate weights and the element-wise modulation of multi-scale features), (b) the coarse-to-fine fusion formula that progressively refines change maps, and (c) pseudocode for the full CG-Decoder forward pass. These additions will make explicit how the architecture exploits the semantic consistency already present in PerA features. revision: yes
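For reference, one common shape such gating equations take (a guessed form for illustration, not the authors' actual formulation, which is not reproduced on this page): with $F_l$ the level-$l$ backbone feature, $D_{l+1}$ the decoded map from the next-coarser level, and $\mathrm{Up}(\cdot)$ upsampling,

```latex
G_l = \sigma\!\left(W_l \left[\mathrm{Up}(D_{l+1});\, F_l\right]\right),
\qquad
D_l = G_l \odot \mathrm{Up}(D_{l+1}) + \left(1 - G_l\right) \odot F_l
```

where $\sigma$ is the sigmoid, $\odot$ denotes element-wise multiplication, $[\cdot\,;\cdot]$ channel concatenation, and $W_l$ a learned projection; the cascade runs from the coarsest level down to full resolution.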

Circularity Check

0 steps flagged

No significant circularity; new modular components evaluated on public benchmarks

full rationale

The paper introduces CG-Decoder and SSCLoss as novel modules, treats the PerA foundation model as an external pretrained input, and reports empirical results on the public SECOND and LandsatSCD datasets. No equation or claim reduces by construction to a fitted parameter or self-definition internal to the paper. The assumption that PerA features are semantically consistent is presented as a premise drawn from prior foundation-model literature rather than derived within this work. No self-citations appear load-bearing for the central performance claims, and the reported Sek gains, data-efficiency results, and cross-backbone tests are independent evaluations rather than tautological restatements of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the domain assumption that foundation model features are semantically consistent across variations, plus two newly introduced components whose effectiveness is demonstrated empirically rather than derived from first principles. No explicit free parameters are mentioned in the abstract.

axioms (1)
  • domain assumption Remote sensing foundation models extract semantically consistent features across temporal and environmental variations
    Explicitly stated as critical for mitigating pseudo-changes in the abstract.
invented entities (2)
  • Cascaded Gated Decoder (CG-Decoder) no independent evidence
    purpose: Bridge various backbones and process multi-scale features in a coarse-to-fine manner for adaptive change extraction
    New modular component introduced to integrate foundation model features with SCD tasks
  • Soft Semantic Consistency Loss (SSCLoss) no independent evidence
    purpose: Mitigate numerical instability in mixed-precision training
    New loss function proposed for training stability

pith-pipeline@v0.9.0 · 5579 in / 1360 out tokens · 92489 ms · 2026-05-15T22:04:33.483079+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes

    ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

    Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 3 internal anchors

  1. [1]

    Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery

    S. Tian, A. Ma, Z. Zheng, and Y. Zhong, “Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery,” Dec. 28, 2020, arXiv: arXiv:2011.03247. doi: 10.48550/arXiv.2011.03247

  2. [2]

    Threshold- and trend-based vegetation change monitoring algorithm based on the inter-annual multi-temporal normalized difference moisture index series: A case study of the Tatra Mountains

    A. Ochtyra, A. Marcinkowska-Ochtyra, and E. Raczko, “Threshold- and trend-based vegetation change monitoring algorithm based on the inter-annual multi-temporal normalized difference moisture index series: A case study of the Tatra Mountains,” Remote Sensing of Environment, vol. 249, p. 112026, Nov. 2020, doi: 10.1016/j.rse.2020.112026

  3. [3]

    Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters

    Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters,” Remote Sensing of Environment, vol. 265, p. 112636, Nov. 2021, doi: 10.1016/j.rse.2021.112636

  4. [4]

    A Billion-scale Foundation Model for Remote Sensing Images

    K. Cha, J. Seo, and T. Lee, “A Billion-scale Foundation Model for Remote Sensing Images,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, pp. 1–17, 2024, doi: 10.1109/JSTARS.2024.3401772

  5. [5]

    RingMo: A Remote Sensing Foundation Model With Masked Image Modeling

    X. Sun et al., “RingMo: A Remote Sensing Foundation Model With Masked Image Modeling,” IEEE Trans. Geosci. Remote Sensing, vol. 61, pp. 1–22, 2023, doi: 10.1109/TGRS.2022.3194732

  6. [6]

    Change-Aware Sampling and Contrastive Learning for Satellite Images

    U. Mall, B. Hariharan, and K. Bala, “Change-Aware Sampling and Contrastive Learning for Satellite Images,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada: IEEE, Jun. 2023, pp. 5261–5270. doi: 10.1109/CVPR52729.2023.00509

  7. [7]

    A contrastive learning foundation model based on perfectly aligned sample pairs for remote sensing images

    H. Shen, H. Gu, H. Li, Y. Yang, and A. Qiu, “A contrastive learning foundation model based on perfectly aligned sample pairs for remote sensing images,” Geo-spatial Information Science, vol. 0, no. 0, pp. 1–18, Mar. 2026, doi: 10.1080/10095020.2026.2628435

  8. [8]

    Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

    C. J. Reed et al., “Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning,” Sep. 22, 2023, arXiv: arXiv:2212.14532. doi: 10.48550/arXiv.2212.14532

  9. [9]

    Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data

    O. Mañas, A. Lacoste, X. Giro-i-Nieto, D. Vazquez, and P. Rodriguez, “Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data,” May 03, 2021, arXiv: arXiv:2103.16607. doi: 10.48550/arXiv.2103.16607

  10. [10]

    SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery

    X. Guo et al., “SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery,” Mar. 22, 2024, arXiv: arXiv:2312.10115. doi: 10.48550/arXiv.2312.10115

  11. [11]

    Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere,

    T. Wang and P. Isola, “Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere,” Aug. 15, 2022, arXiv: arXiv:2005.10242. doi: 10.48550/arXiv.2005.10242

  12. [12]

    Masked Autoencoders Are Scalable Vision Learners

    K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked Autoencoders Are Scalable Vision Learners,” Dec. 19, 2021, arXiv: arXiv:2111.06377. doi: 10.48550/arXiv.2111.06377

  13. [13]

    Fully Convolutional Siamese Networks for Change Detection

    R. Caye Daudt, B. Le Saux, and A. Boulch, “Fully Convolutional Siamese Networks for Change Detection,” in 2018 25th IEEE International Conference on Image Processing (ICIP), Oct. 2018, pp. 4063–4067. doi: 10.1109/ICIP.2018.8451652

  14. [14]

    SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images

    S. Fang, K. Li, J. Shao, and Z. Li, “SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images,” IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3056416

  15. [15]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Jun. 03, 2021, arXiv: arXiv:2010.11929. doi: 10.48550/arXiv.2010.11929

  16. [16]

    Remote Sensing Image Change Detection With Transformers,

    H. Chen, Z. Qi, and Z. Shi, “Remote Sensing Image Change Detection With Transformers,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095166

  17. [17]

    A Transformer-Based Siamese Network for Change Detection

    W. G. C. Bandara and V. M. Patel, “A Transformer-Based Siamese Network for Change Detection,” in IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Jul. 2022, pp. 207–210. doi: 10.1109/IGARSS46834.2022.9883686

  18. [18]

    Kernel Slow Feature Analysis for Scene Change Detection

    C. Wu, L. Zhang, and B. Du, “Kernel Slow Feature Analysis for Scene Change Detection,” IEEE Trans. Geosci. Remote Sensing, vol. 55, no. 4, pp. 2367–2384, Apr. 2017, doi: 10.1109/TGRS.2016.2642125

  19. [19]

    Semantic Change Detection with Hypermaps

    T. Suzuki, S. Shirakabe, Y. Miyashita, A. Nakamura, Y. Satoh, and H. Kataoka, “Semantic Change Detection with Hypermaps,” Mar. 16, 2017, arXiv: arXiv:1604.07513. doi: 10.48550/arXiv.1604.07513

  20. [20]

    ChangeNet: A Deep Learning Architecture for Visual Change Detection

    A. Varghese, J. Gubbi, A. Ramaswamy, and P. Balamuralidhar, “ChangeNet: A Deep Learning Architecture for Visual Change Detection,” presented at the Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content_eccv_2018_workshops/w7/html/Var...

  21. [21]

    Multitask learning for large-scale semantic change detection

    R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, “Multitask learning for large-scale semantic change detection,” Computer Vision and Image Understanding, vol. 187, p. 102783, Oct. 2019, doi: 10.1016/j.cviu.2019.07.003

  22. [22]

    Scene Change Detection VIA Deep Convolution Canonical Correlation Analysis Neural Network,

    Y. Wang, B. Du, L. Ru, C. Wu, and H. Luo, “Scene Change Detection VIA Deep Convolution Canonical Correlation Analysis Neural Network,” in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan: IEEE, Jul. 2019, pp. 198–201. doi: 10.1109/IGARSS.2019.8898211

  23. [23]

    SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery

    D. Peng, L. Bruzzone, Y. Zhang, H. Guan, and P. He, “SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery,” International Journal of Applied Earth Observation and Geoinformation, vol. 103, p. 102465, Dec. 2021, doi: 10.1016/j.jag.2021.102465

  24. [24]

    Semantic-CD: Remote Sensing Image Semantic Change Detection towards Open-vocabulary Setting

    Y. Zhu, L. Li, K. Chen, C. Liu, F. Zhou, and Z. Shi, “Semantic-CD: Remote Sensing Image Semantic Change Detection towards Open-vocabulary Setting,” Jan. 12, 2025, arXiv: arXiv:2501.06808. doi: 10.48550/arXiv.2501.06808

  25. [25]

    Feature-Guided Multitask Change Detection Network

    Y. Deng et al., “Feature-Guided Multitask Change Detection Network,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, vol. 15, pp. 9667–9679, 2022, doi: 10.1109/JSTARS.2022.3215773

  26. [26]

    Dual-Task Semantic Change Detection for Remote Sensing Images Using the Generative Change Field Module

    S. Xiang, M. Wang, X. Jiang, G. Xie, Z. Zhang, and P. Tang, “Dual-Task Semantic Change Detection for Remote Sensing Images Using the Generative Change Field Module,” Remote Sensing, vol. 13, no. 16, p. 3336, Aug. 2021, doi: 10.3390/rs13163336

  27. [27]

    Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images,

    L. Ding, J. Zhang, H. Guo, K. Zhang, B. Liu, and L. Bruzzone, “Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images,” IEEE Trans. Geosci. Remote Sensing, vol. 62, pp. 1–14, 2024, doi: 10.1109/TGRS.2024.3362795

  28. [28]

    MSCD-Net: From Unimodal to Multimodal Semantic Change Detection

    J. Wang et al., “MSCD-Net: From Unimodal to Multimodal Semantic Change Detection,” IEEE Trans. Geosci. Remote Sensing, vol. 63, pp. 1–17, 2025, doi: 10.1109/TGRS.2025.3591814

  29. [29]

    Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector

    Q. Shu et al., “Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector,” Sep. 19, 2025, arXiv: arXiv:2505.13212. doi: 10.48550/arXiv.2505.13212

  30. [30]

    Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing

    B. Wijenayake et al., “Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing,” Aug. 11, 2025, arXiv: arXiv:2508.08232. doi: 10.48550/arXiv.2508.08232

  31. [31]

    FAPMNet: Flow-Aligned Prototype Memory Network for Semantic Change Detection in Remote Sensing Images

    Y. Chen, C. Li, C. Ling, Y. Tan, and P. Wu, “FAPMNet: Flow-Aligned Prototype Memory Network for Semantic Change Detection in Remote Sensing Images,” IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1–5, 2026, doi: 10.1109/LGRS.2026.3652300

  32. [32]

    GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images

    X. Liu et al., “GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 73–91, Dec. 2025, doi: 10.1016/j.isprsjprs.2025.09.003

  33. [33]

    A Review of Deep-Learning Methods for Change Detection in Multispectral Remote Sensing Images

    E. J. Parelius, “A Review of Deep-Learning Methods for Change Detection in Multispectral Remote Sensing Images,” Remote Sensing, vol. 15, no. 8, p. 2092, Apr. 2023, doi: 10.3390/rs15082092

  34. [34]

    Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review

    G. Cheng et al., “Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review,” Remote Sensing, vol. 16, no. 13, p. 2355, Jan. 2024, doi: 10.3390/rs16132355

  35. [35]

    ChangeRD: A registration-integrated change detection framework for unaligned remote sensing images

    W. Jing, K. Chi, Q. Li, and Q. Wang, “ChangeRD: A registration-integrated change detection framework for unaligned remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 220, pp. 64–74, Feb. 2025, doi: 10.1016/j.isprsjprs.2024.11.019

  36. [36]

    Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images

    L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images,” IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2022.3154390

  37. [37]

    Vision Transformer Adapter for Dense Predictions,

    Z. Chen et al., “Vision Transformer Adapter for Dense Predictions,” Feb. 13, 2023, arXiv: arXiv:2205.08534. doi: 10.48550/arXiv.2205.08534

  38. [38]

    U-Net: Convolutional Networks for Biomedical Image Segmentation

    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham: Springer Internatio...

  39. [39]

    Asymmetric Siamese Networks for Semantic Change Detection in Aerial Images

    K. Yang et al., “Asymmetric Siamese Networks for Semantic Change Detection in Aerial Images,” IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3113912

  40. [40]

    A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images

    P. Yuan, Q. Zhao, X. Zhao, X. Wang, X. Long, and Y. Zheng, “A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,” International Journal of Digital Earth, vol. 15, no. 1, pp. 1506–1525, Dec. 2022, doi: 10.1080/17538947.2022.2111470

  41. [41]

    Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices,

    B. Rolih, M. Fučka, F. Wolf, and L. Č. Zajc, “Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–11, 2025, doi: 10.1109/TGRS.2025.3585342

  42. [42]

    Deep Residual Learning for Image Recognition

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html

  43. [43]

    Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

    Z. Liu et al., “Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows,” presented at the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022. Accessed: Feb. 05, 2026. [Online]. Available: https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Swin_Transformer_Hierarchical_Vision_Transformer_U...

  44. [44]

    SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

    Y. Cong et al., “SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery”

  45. [45]

    VMamba: Visual State Space Model

    Y. Liu et al., “VMamba: Visual State Space Model,” Advances in Neural Information Processing Systems, vol. 37, pp. 103031–103063, Dec. 2024, doi: 10.52202/079017-3273
