SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

Jinxiao Sun; Lei Wang; Meihua Zhou; Xinyu Tong; Yingjie Tang

arxiv: 2606.09772 · v1 · pith:MPOHMUAVnew · submitted 2026-06-08 · 💻 cs.CV

SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection

Xinyu Tong , Meihua Zhou , Jinxiao Sun , Yingjie Tang , Lei Wang This is my paper

Pith reviewed 2026-06-27 17:07 UTC · model grok-4.3

classification 💻 cs.CV

keywords semantic change detectionremote sensingcross-temporal alignmentDINOv3 featureschange detection networkpseudo-change suppressionmulti-scale temporal interaction

0 comments

The pith

SemDINO fuses frozen DINOv3 features with CNNs through gated pyramid fusion and targeted modules to align cross-temporal semantics and suppress pseudo-changes in remote sensing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SemDINO as an end-to-end network that tackles semantic change detection by combining a dual-branch encoder with multi-scale temporal interaction and collaborative purification and enhancement steps. It uses frozen DINOv3 features alongside a CNN backbone to build richer representations, then applies a bidirectional transformer module for global alignment across time. Semantic purification, bidirectional change enhancement, and multi-scale change enhancement modules are introduced to reduce false variations from illumination, seasons, or registration issues while keeping real land-cover transitions. A multi-branch head produces the binary change mask, before-and-after semantic maps, and edge constraints together. If the approach holds, it would produce more reliable change maps on public datasets even when interference factors are present.

Core claim

SemDINO integrates a dual-branch encoder that combines a CNN backbone and frozen DINOv3 features via gated pyramid fusion, enabling rich multi-scale semantic representation. A multi-scale temporal bidirectional transformer interaction module achieves global cross-temporal feature alignment. Semantic purification, bidirectional change enhancement, and multi-scale change enhancement modules then suppress pseudo-variations while preserving genuine changes, and a multi-branch prediction head jointly outputs the binary change mask, bi-temporal semantic maps, and edge constraint.

What carries the argument

The dual-branch encoder that combines a CNN backbone and frozen DINOv3 features via gated pyramid fusion, which supplies the multi-scale semantic representations used by the subsequent temporal interaction and change enhancement modules.

If this is right

SemDINO achieves superior performance and generalization against state-of-the-art methods on public remote sensing change detection datasets.
Performance gains are largest in complex scenarios that contain illumination, seasonal, or registration interference.
The multi-branch head simultaneously produces a binary change mask, bi-temporal semantic maps, and an edge constraint.
The overall framework unifies cross-temporal alignment, semantic purification, and multi-scale enhancement within one trainable network.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The choice to keep DINOv3 frozen implies that large pre-trained vision models can be plugged into remote sensing pipelines without full retraining.
The emphasis on suppressing pseudo-changes may transfer to other multi-temporal tasks such as object tracking or anomaly detection in satellite sequences.
Joint prediction of change masks and semantic labels could reduce error propagation compared with pipelines that treat detection and classification separately.

Load-bearing premise

The semantic purification, bidirectional change enhancement, and multi-scale change enhancement modules effectively suppress pseudo-variations caused by illumination, season, and registration noise while preserving genuine changes.

What would settle it

Running SemDINO on a held-out remote sensing dataset dominated by strong seasonal illumination shifts or registration noise and finding no gain in change detection accuracy or semantic label consistency over prior methods would falsify the robustness claim.

Figures

Figures reproduced from arXiv: 2606.09772 by Jinxiao Sun, Lei Wang, Meihua Zhou, Xinyu Tong, Yingjie Tang.

**Figure 1.** Figure 1: Overview of the proposed SemDINO framework. Given bi-temporal remote sensing images It=1 and It=2, the network first extracts multi-scale features using a CNN backbone with FPN, and enhances them with complementary features from the frozen DINOv3 encoder. The Pyramid Fusion (PyFu) module then fuses the CNN and DINO features at each scale. Next, the Multi-scale Bidirectional Temporal Transformer (M-TBTT) al… view at source ↗

**Figure 2.** Figure 2: Overview of the Pyramid Fusion (PyFu) module and multilevel feature extraction from DINOv3. Given the input image It, the frozen DINOv3 encoder extracts multi-level semantic features, which are then processed by Separate Adaptation Blocks (SepAB) to generate aligned multi-level DINO features Fdino,t. Each SepAB adapts the DINO features via a bottleneck structure: Conv1 × 1 → BN → depth-wise Conv3×3 → BN →… view at source ↗

**Figure 3.** Figure 3: Overview of #FeaCE: Change Enhancement Structure. Given the aligned bi-temporal features f ′ 1 and f ′ 2, the pipeline consists of three sequential modules: a. Bi-Change Enhancement (BCE) computes the absolute difference of the input features to extract initial change information, which is then enhanced by a learnable gating branch derived from the sum of the two features. A residual convolution branch is … view at source ↗

read the original abstract

Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify semantic categories before and after transition. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination, season, and registration noise. To address these issues, we propose a novel end-to-end semantic change detection network named SemDINO, which integrates a dual-branch encoder, multi-scale temporal interaction, semantic purification, change enhancement, and decoupled multi-task prediction into a unified framework. Specifically, we construct a dual-branch encoder that combines a CNN backbone and frozen DINOv3 features via gated pyramid fusion, enabling rich multi-scale semantic representation. Then, a multi-scale temporal bidirectional transformer interaction (M-TBTT) module is proposed to achieve global cross-temporal feature alignment and information interaction. To further enhance genuine changes and suppress pseudo-variations, we introduce semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE) modules collaboratively. Finally, a multi-branch CD prediction head is designed to jointly output binary change mask, bi-temporal semantic maps, and edge constraint. Extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability against state-of-the-art methods, especially in complex scenarios with interference factors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SemDINO pairs frozen DINOv3 with a CNN branch and several new modules for temporal alignment and pseudo-change suppression, but the abstract gives no numbers to check if it works.

read the letter

Hi,

The main thing in this paper is a new end-to-end network for semantic change detection that freezes DINOv3, fuses it with a CNN encoder through gated pyramid fusion, runs a multi-scale temporal bidirectional transformer interaction module for cross-temporal alignment, and then applies semantic purification, bidirectional change enhancement, and multi-scale change enhancement modules before a multi-branch head that predicts change masks, bi-temporal semantics, and edges.

It does a reasonable job naming the practical problems in remote sensing SCD, especially pseudo-changes from illumination, seasons, and registration noise, and it lays out concrete module names and roles to tackle them. Using a strong frozen backbone like DINOv3 is a sensible engineering decision that could help with representation quality without extra training cost.

The soft spot is that the abstract asserts superior performance and better generalization on public datasets but supplies zero metrics, no baseline comparisons, no ablation results, and no dataset details. Without those, there is no way to tell whether the claimed gains come from the new modules or simply from the DINOv3 features. The assumption that the purification and enhancement modules actually suppress noise while preserving real changes remains unverified from the given text.

This is aimed at people working on change detection in remote sensing who might want to try the temporal interaction or purification ideas. A reader could get value from the architecture description even if the results need checking. It deserves a serious referee because the motivation is clear and the framework is specific enough to evaluate once the experiments are in front of someone.

I'd send it to review to see the actual tables and figures.

Referee Report

2 major / 0 minor

Summary. The paper proposes SemDINO, an end-to-end network for semantic change detection (SCD) that uses a dual-branch encoder fusing CNN and frozen DINOv3 features via gated pyramid fusion, a multi-scale temporal bidirectional transformer interaction (M-TBTT) module for cross-temporal alignment, semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE) modules to suppress pseudo-changes from illumination/season/registration noise, plus a multi-branch head predicting binary change masks, bi-temporal semantics, and edges. It claims superior performance and generalization versus SOTA on public remote sensing CD datasets, especially in complex interference scenarios.

Significance. If the experimental claims hold, the work could advance SCD by showing how frozen DINOv3 features combined with targeted temporal interaction and change-enhancement modules improve robustness to pseudo-variations while preserving genuine changes. The design choices around multi-scale fusion and decoupled prediction address recurring practical issues in remote-sensing change detection.

major comments (2)

[Abstract] Abstract: The central claim that 'extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability against state-of-the-art methods' is unsupported by any quantitative metrics, error bars, baseline comparisons, dataset details, ablation results, or statistical tests in the supplied manuscript text. This prevents evaluation of whether the SCP, BiChangeEnhance, and MCE modules actually suppress pseudo-variations as asserted.
[Abstract] Abstract: No equations, architectural diagrams, or implementation specifics are provided for the M-TBTT, SCP, BiChangeEnhance, or MCE modules, nor for the gated pyramid fusion or multi-branch head. Without these, it is impossible to assess whether the claimed cross-temporal alignment and pseudo-change suppression follow from the architecture or are merely asserted.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the comments regarding the abstract. We address each point below, noting that the full manuscript provides the requested details in the body while agreeing that the abstract can be strengthened for clarity.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that 'extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability against state-of-the-art methods' is unsupported by any quantitative metrics, error bars, baseline comparisons, dataset details, ablation results, or statistical tests in the supplied manuscript text. This prevents evaluation of whether the SCP, BiChangeEnhance, and MCE modules actually suppress pseudo-variations as asserted.

Authors: The full manuscript includes Section 4 with quantitative tables (e.g., comparisons on SECOND and HRSCD datasets showing mIoU and F1 gains over SOTA), ablation studies on SCP/BiChangeEnhance/MCE, dataset details, and robustness analysis to pseudo-changes. The abstract is a high-level summary per standard practice and does not embed all metrics. We will revise the abstract to include 1-2 key performance figures and a brief note on the modules' role in suppressing pseudo-variations. revision: yes
Referee: [Abstract] Abstract: No equations, architectural diagrams, or implementation specifics are provided for the M-TBTT, SCP, BiChangeEnhance, or MCE modules, nor for the gated pyramid fusion or multi-branch head. Without these, it is impossible to assess whether the claimed cross-temporal alignment and pseudo-change suppression follow from the architecture or are merely asserted.

Authors: Abstracts conventionally omit equations and diagrams; these appear in the main text (Figure 1 for overall architecture, Sections 3.2-3.5 with equations for M-TBTT bidirectional interaction, gated pyramid fusion, SCP purification, BiChangeEnhance, MCE, and the multi-branch head). The abstract summarizes the framework. We will partially revise the abstract to reference the figure and key design rationale for cross-temporal alignment. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical validation

full rationale

The provided abstract and description outline a standard neural architecture (dual-branch encoder with frozen DINOv3, M-TBTT module, SCP/BiChangeEnhance/MCE modules, multi-branch head) whose performance is asserted via experiments on public remote sensing datasets. No equations, parameter-fitting steps, or self-citations appear in the text that would reduce any claimed prediction or uniqueness result to a definition or input by construction. The derivation chain consists of design choices justified externally by ablation studies and SOTA comparisons rather than tautological reductions. This is the expected non-finding for a methods paper whose central assertions are falsifiable on held-out data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no mathematical derivations, fitted parameters, or new postulated entities; the contribution is described as an empirical network architecture.

pith-pipeline@v0.9.1-grok · 5787 in / 1181 out tokens · 36747 ms · 2026-06-27T17:07:20.179165+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 22 canonical work pages

[1]

Mul- titask learning for large-scale semantic change detection,

R. Caye Daudt, B. Le Saux, A. Boulch, and Y . Gousseau, “Mul- titask learning for large-scale semantic change detection,”Computer Vision and Image Understanding, vol. 187, p. 102783, 2019, doi: 10.1016/j.cviu.2019.07.003

work page doi:10.1016/j.cviu.2019.07.003 2019
[2]

Bi- temporal semantic reasoning for the semantic change detection in HR remote sensing images,

L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi- temporal semantic reasoning for the semantic change detection in HR remote sensing images,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2022.3154390

work page doi:10.1109/tgrs.2022.3154390 2022
[3]

Joint spatio-temporal modeling for semantic change detection in remote sensing images,

L. Ding, J. Zhang, H. Guo, K. Zhang, B. Liu, and L. Bruzzone, “Joint spatio-temporal modeling for semantic change detection in remote sensing images,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–14, 2024, doi: 10.1109/TGRS.2024.3362795

work page doi:10.1109/tgrs.2024.3362795 2024
[4]

ChangeMamba: Remote sensing change detection with spatiotemporal state space model,

H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, “ChangeMamba: Remote sensing change detection with spatiotemporal state space model,” IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–20, 2024, doi: 10.1109/TGRS.2024.3417253

work page doi:10.1109/tgrs.2024.3417253 2024
[5]

Enhanced Smart Contract Vulnerability Detection via Graph Neural Networks: Achieving High Accuracy and Efficiency,

Y . Tang, S. Feng, C. Zhao, Y . Chen, Z. Lv, and W. Sun, “A semantic change detection network based on boundary detection and task inter- action for high-resolution remote sensing images,”IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 9, pp. 17184–17198, Sept. 2025, doi: 10.1109/TNNLS.2025.3570425

work page doi:10.1109/tnnls.2025.3570425 2025
[6]

Cross-difference seman- tic consistency network for semantic change detection,

Q. Wang, W. Jing, K. Chi, and Y . Yuan, “Cross-difference seman- tic consistency network for semantic change detection,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–12, 2024, Art. no. 4406312, doi: 10.1109/TGRS.2024.3386334

work page doi:10.1109/tgrs.2024.3386334 2024
[7]

Semantic- CD: Remote sensing image semantic change detection towards open- vocabulary setting,

Y . Zhu, L. Li, K. Chen, C. Liu, F. Zhou, and Z. Shi, “Semantic- CD: Remote sensing image semantic change detection towards open- vocabulary setting,”arXiv preprint arXiv:2501.06808, 2025

arXiv 2025
[8]

Recurrent semantic change detection in VHR remote sensing images using visual foundation models,

J. Zhang, L. Ding, T. Zhou, J. Wang, P. M. Atkinson, and L. Bruzzone, “Recurrent semantic change detection in VHR remote sensing images using visual foundation models,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–14, 2025, doi: 10.1109/TGRS.2025.3546808

work page doi:10.1109/tgrs.2025.3546808 2025
[9]

Asymmetric Siamese networks for semantic change detection in aerial images,

K. Yanget al., “Asymmetric Siamese networks for semantic change detection in aerial images,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2021

2021
[10]

A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,

P. Yuan, Q. Zhao, X. Zhao, X. Wang, X. Long, and Y . Zheng, “A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,”Int. J. Digit. Earth, vol. 15, no. 1, pp. 1506–1525, Dec. 2022

2022
[11]

Fully convo- lutional Siamese networks for change detection,

R. Caye Daudt, B. Le Saux, and A. Boulch, “Fully convo- lutional Siamese networks for change detection,” inProc. 25th IEEE Int. Conf. Image Process. (ICIP), 2018, pp. 4063–4067, doi: 10.1109/ICIP.2018.8451652

work page doi:10.1109/icip.2018.8451652 2018
[12]

SNUNet-CD: A densely connected Siamese network for change detection of VHR images,

S. Fang, K. Li, J. Shao, and Z. Li, “SNUNet-CD: A densely connected Siamese network for change detection of VHR images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3056416

work page doi:10.1109/lgrs.2021.3056416 2022
[13]

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,

C. Zhang, P. Yue, D. Tapete, L. Jiang, B. Shangguan, L. Huang, and G. Liu, “A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,”ISPRS J. Photogramm. Remote Sens., vol. 166, pp. 183–200, Aug. 2020, doi: 10.1016/j.isprsjprs.2020.06.003

work page doi:10.1016/j.isprsjprs.2020.06.003 2020
[14]

Remote sensing image change detection with transformers,

H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detection with transformers,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095166

work page doi:10.1109/tgrs.2021.3095166 2022
[15]

A transformer-based Siamese network for change detection,

W. G. C. Bandara and V . M. Patel, “A transformer-based Siamese network for change detection,” inProc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2022, pp. 207–210, doi: 10.1109/IGARSS46834.2022.9883686

work page doi:10.1109/igarss46834.2022.9883686 2022
[16]

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

M. Caron, H. Touvron, I. Misra, H. J ´egou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 9650–9660, doi: 10.1109/ICCV48922.2021.00951

work page doi:10.1109/iccv48922.2021.00951 2021
[17]

DINOv2: Learning robust visual features without supervision,

M. Oquabet al., “DINOv2: Learning robust visual features without supervision,”Trans. Mach. Learn. Res., 2024

2024
[18]

Sim ´eoniet al., “DINOv3,”arXiv preprint arXiv:2508.10104, 2025

O. Sim ´eoniet al., “DINOv3,”arXiv preprint arXiv:2508.10104, 2025

Pith/arXiv arXiv 2025
[19]

ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning,

S. Dong, L. Wang, B. Du, and X. Meng, “ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning,”ISPRS J. Photogramm. Remote Sens., vol. 208, pp. 53–69, Feb. 2024, doi: 10.1016/j.isprsjprs.2024.01.004

work page doi:10.1016/j.isprsjprs.2024.01.004 2024
[20]

ChangeDINO: DINOv3-driven building change detection in optical remote sensing imagery,

C.-H. Cheng and C.-C. Hsu, “ChangeDINO: DINOv3-driven building change detection in optical remote sensing imagery,”arXiv preprint arXiv:2511.16322, 2025

arXiv 2025
[21]

Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,

S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 574–586, Jan. 2019

2019
[22]

A spatial–temporal attention-based method and a new dataset for remote sensing image change detection,

H. Chen and Z. Shi, “A spatial–temporal attention-based method and a new dataset for remote sensing image change detection,”Remote Sens., vol. 12, no. 10, p. 1662, May 2020

2020
[23]

ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection,

Z. Zheng, Y . Zhong, S. Tian, A. Ma, and L. Zhang, “ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection,”ISPRS J. Photogramm. Remote Sens., vol. 183, pp. 228–239, Jan. 2022, doi: 10.1016/j.isprsjprs.2021.10.015

work page doi:10.1016/j.isprsjprs.2021.10.015 2022
[24]

SMNet: Symmetric multi- task network for semantic change detection in remote sensing images based on CNN and Transformer,

Y . Niu, H. Guo, J. Lu, L. Ding, and D. Yu, “SMNet: Symmetric multi- task network for semantic change detection in remote sensing images based on CNN and Transformer,”Remote Sens., vol. 15, no. 4, Art. no. 949, 2023, doi: 10.3390/rs15040949

work page doi:10.3390/rs15040949 2023
[25]

The ClearSCD model: Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery,

K. Tang, F. Xu, X. Chen, Q. Dong, Y . Yuan, and J. Chen, “The ClearSCD model: Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery,”ISPRS J. Photogramm. Remote Sens., vol. 211, pp. 299–317, May 2024, doi: 10.1016/j.isprsjprs.2024.04.013

work page doi:10.1016/j.isprsjprs.2024.04.013 2024
[26]

A decoder-focused multitask network for semantic change detection,

Z. Li, X. Wang, S. Fang, J. Zhao, S. Yang, and W. Li, “A decoder-focused multitask network for semantic change detection,” IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–15, 2024, doi: 10.1109/TGRS.2024.3362728

work page doi:10.1109/tgrs.2024.3362728 2024
[27]

Dual- dimension feature interaction for semantic change detection in remote sensing images,

B. Wang, Z. Jiang, W. Ma, X. Xu, P. Zhang, Y . Wu, and H. Yang, “Dual- dimension feature interaction for semantic change detection in remote sensing images,”IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 9595–9605, 2024, doi: 10.1109/JSTARS.2024.3394571

work page doi:10.1109/jstars.2024.3394571 2024
[28]

Semantic enhancement and change consistency network for semantic change detection in remote sensing images,

Z. Jiang, B. Wang, P. Zhang, Y . Wu, W. Ma, X. Xu, and H. Yang, “Semantic enhancement and change consistency network for semantic change detection in remote sensing images,”Int. J. Digit. Earth, vol. 18, no. 1, 2025, doi: 10.1080/17538947.2025.2496790

work page doi:10.1080/17538947.2025.2496790 2025
[29]

SCD-SAM: Adapting Segment Anything Model for semantic change detection in remote sensing imagery,

L. Mei, Z. Ye, C. Xu, H. Wang, Y . Wang, C. Lei, W. Yang, and Y . Li, “SCD-SAM: Adapting Segment Anything Model for semantic change detection in remote sensing imagery,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–13, 2024, doi: 10.1109/TGRS.2024.3407884. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

work page doi:10.1109/tgrs.2024.3407884 2024
[30]

RemoteCLIP: A vision language foundation model for remote sensing,

F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, Q. Ye, L. Fu, and J. Zhou, “RemoteCLIP: A vision language foundation model for remote sensing,” arXiv preprint arXiv:2306.11029, 2023

arXiv 2023
[31]

Foundation model-driven semantic change detection in remote sensing imagery,

H. Shen, L. Yan, H. Xie, Y . Wei, X. Li, W. Shen, P. Lv, and F. Tan, “Foundation model-driven semantic change detection in remote sensing imagery,”arXiv preprint arXiv:2602.13780, 2026

Pith/arXiv arXiv 2026
[32]

ChangeVFM: Unleashing the power of vision foundation models for semantic change detection in remote sensing images,

H. Huang, K. Ding, D. Zhu, Q. Cheng, X. Huang, X. Huang, S. Wang, and Z. Shao, “ChangeVFM: Unleashing the power of vision foundation models for semantic change detection in remote sensing images,”Geo- spatial Information Science, 2026, doi: 10.1080/10095020.2026.2646372

work page doi:10.1080/10095020.2026.2646372 2026
[33]

BT-HRSCD: High-resolution feature is what you need for a semantic change detection network with a triple-decoding branch,

S. Fang, W. Li, S. Yang, Z. Li, J. Zhao, and X. Wang, “BT-HRSCD: High-resolution feature is what you need for a semantic change detection network with a triple-decoding branch,”IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 4416714

2024
[34]

A decoder- focused multitask network for semantic change detection,

Z. Li, X. Wang, S. Fang, J. Zhao, S. Yang, and W. Li, “A decoder- focused multitask network for semantic change detection,”IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 5609115

2024

[1] [1]

Mul- titask learning for large-scale semantic change detection,

R. Caye Daudt, B. Le Saux, A. Boulch, and Y . Gousseau, “Mul- titask learning for large-scale semantic change detection,”Computer Vision and Image Understanding, vol. 187, p. 102783, 2019, doi: 10.1016/j.cviu.2019.07.003

work page doi:10.1016/j.cviu.2019.07.003 2019

[2] [2]

Bi- temporal semantic reasoning for the semantic change detection in HR remote sensing images,

L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi- temporal semantic reasoning for the semantic change detection in HR remote sensing images,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2022.3154390

work page doi:10.1109/tgrs.2022.3154390 2022

[3] [3]

Joint spatio-temporal modeling for semantic change detection in remote sensing images,

L. Ding, J. Zhang, H. Guo, K. Zhang, B. Liu, and L. Bruzzone, “Joint spatio-temporal modeling for semantic change detection in remote sensing images,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–14, 2024, doi: 10.1109/TGRS.2024.3362795

work page doi:10.1109/tgrs.2024.3362795 2024

[4] [4]

ChangeMamba: Remote sensing change detection with spatiotemporal state space model,

H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, “ChangeMamba: Remote sensing change detection with spatiotemporal state space model,” IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–20, 2024, doi: 10.1109/TGRS.2024.3417253

work page doi:10.1109/tgrs.2024.3417253 2024

[5] [5]

Enhanced Smart Contract Vulnerability Detection via Graph Neural Networks: Achieving High Accuracy and Efficiency,

Y . Tang, S. Feng, C. Zhao, Y . Chen, Z. Lv, and W. Sun, “A semantic change detection network based on boundary detection and task inter- action for high-resolution remote sensing images,”IEEE Trans. Neural Netw. Learn. Syst., vol. 36, no. 9, pp. 17184–17198, Sept. 2025, doi: 10.1109/TNNLS.2025.3570425

work page doi:10.1109/tnnls.2025.3570425 2025

[6] [6]

Cross-difference seman- tic consistency network for semantic change detection,

Q. Wang, W. Jing, K. Chi, and Y . Yuan, “Cross-difference seman- tic consistency network for semantic change detection,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–12, 2024, Art. no. 4406312, doi: 10.1109/TGRS.2024.3386334

work page doi:10.1109/tgrs.2024.3386334 2024

[7] [7]

Semantic- CD: Remote sensing image semantic change detection towards open- vocabulary setting,

Y . Zhu, L. Li, K. Chen, C. Liu, F. Zhou, and Z. Shi, “Semantic- CD: Remote sensing image semantic change detection towards open- vocabulary setting,”arXiv preprint arXiv:2501.06808, 2025

arXiv 2025

[8] [8]

Recurrent semantic change detection in VHR remote sensing images using visual foundation models,

J. Zhang, L. Ding, T. Zhou, J. Wang, P. M. Atkinson, and L. Bruzzone, “Recurrent semantic change detection in VHR remote sensing images using visual foundation models,”IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–14, 2025, doi: 10.1109/TGRS.2025.3546808

work page doi:10.1109/tgrs.2025.3546808 2025

[9] [9]

Asymmetric Siamese networks for semantic change detection in aerial images,

K. Yanget al., “Asymmetric Siamese networks for semantic change detection in aerial images,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–18, 2021

2021

[10] [10]

A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,

P. Yuan, Q. Zhao, X. Zhao, X. Wang, X. Long, and Y . Zheng, “A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images,”Int. J. Digit. Earth, vol. 15, no. 1, pp. 1506–1525, Dec. 2022

2022

[11] [11]

Fully convo- lutional Siamese networks for change detection,

R. Caye Daudt, B. Le Saux, and A. Boulch, “Fully convo- lutional Siamese networks for change detection,” inProc. 25th IEEE Int. Conf. Image Process. (ICIP), 2018, pp. 4063–4067, doi: 10.1109/ICIP.2018.8451652

work page doi:10.1109/icip.2018.8451652 2018

[12] [12]

SNUNet-CD: A densely connected Siamese network for change detection of VHR images,

S. Fang, K. Li, J. Shao, and Z. Li, “SNUNet-CD: A densely connected Siamese network for change detection of VHR images,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3056416

work page doi:10.1109/lgrs.2021.3056416 2022

[13] [13]

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,

C. Zhang, P. Yue, D. Tapete, L. Jiang, B. Shangguan, L. Huang, and G. Liu, “A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,”ISPRS J. Photogramm. Remote Sens., vol. 166, pp. 183–200, Aug. 2020, doi: 10.1016/j.isprsjprs.2020.06.003

work page doi:10.1016/j.isprsjprs.2020.06.003 2020

[14] [14]

Remote sensing image change detection with transformers,

H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detection with transformers,”IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095166

work page doi:10.1109/tgrs.2021.3095166 2022

[15] [15]

A transformer-based Siamese network for change detection,

W. G. C. Bandara and V . M. Patel, “A transformer-based Siamese network for change detection,” inProc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), 2022, pp. 207–210, doi: 10.1109/IGARSS46834.2022.9883686

work page doi:10.1109/igarss46834.2022.9883686 2022

[16] [16]

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

M. Caron, H. Touvron, I. Misra, H. J ´egou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” inProc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021, pp. 9650–9660, doi: 10.1109/ICCV48922.2021.00951

work page doi:10.1109/iccv48922.2021.00951 2021

[17] [17]

DINOv2: Learning robust visual features without supervision,

M. Oquabet al., “DINOv2: Learning robust visual features without supervision,”Trans. Mach. Learn. Res., 2024

2024

[18] [18]

Sim ´eoniet al., “DINOv3,”arXiv preprint arXiv:2508.10104, 2025

O. Sim ´eoniet al., “DINOv3,”arXiv preprint arXiv:2508.10104, 2025

Pith/arXiv arXiv 2025

[19] [19]

ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning,

S. Dong, L. Wang, B. Du, and X. Meng, “ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning,”ISPRS J. Photogramm. Remote Sens., vol. 208, pp. 53–69, Feb. 2024, doi: 10.1016/j.isprsjprs.2024.01.004

work page doi:10.1016/j.isprsjprs.2024.01.004 2024

[20] [20]

ChangeDINO: DINOv3-driven building change detection in optical remote sensing imagery,

C.-H. Cheng and C.-C. Hsu, “ChangeDINO: DINOv3-driven building change detection in optical remote sensing imagery,”arXiv preprint arXiv:2511.16322, 2025

arXiv 2025

[21] [21]

Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,

S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 574–586, Jan. 2019

2019

[22] [22]

A spatial–temporal attention-based method and a new dataset for remote sensing image change detection,

H. Chen and Z. Shi, “A spatial–temporal attention-based method and a new dataset for remote sensing image change detection,”Remote Sens., vol. 12, no. 10, p. 1662, May 2020

2020

[23] [23]

ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection,

Z. Zheng, Y . Zhong, S. Tian, A. Ma, and L. Zhang, “ChangeMask: Deep multi-task encoder-transformer-decoder architecture for semantic change detection,”ISPRS J. Photogramm. Remote Sens., vol. 183, pp. 228–239, Jan. 2022, doi: 10.1016/j.isprsjprs.2021.10.015

work page doi:10.1016/j.isprsjprs.2021.10.015 2022

[24] [24]

SMNet: Symmetric multi- task network for semantic change detection in remote sensing images based on CNN and Transformer,

Y . Niu, H. Guo, J. Lu, L. Ding, and D. Yu, “SMNet: Symmetric multi- task network for semantic change detection in remote sensing images based on CNN and Transformer,”Remote Sens., vol. 15, no. 4, Art. no. 949, 2023, doi: 10.3390/rs15040949

work page doi:10.3390/rs15040949 2023

[25] [25]

The ClearSCD model: Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery,

K. Tang, F. Xu, X. Chen, Q. Dong, Y . Yuan, and J. Chen, “The ClearSCD model: Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery,”ISPRS J. Photogramm. Remote Sens., vol. 211, pp. 299–317, May 2024, doi: 10.1016/j.isprsjprs.2024.04.013

work page doi:10.1016/j.isprsjprs.2024.04.013 2024

[26] [26]

A decoder-focused multitask network for semantic change detection,

Z. Li, X. Wang, S. Fang, J. Zhao, S. Yang, and W. Li, “A decoder-focused multitask network for semantic change detection,” IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–15, 2024, doi: 10.1109/TGRS.2024.3362728

work page doi:10.1109/tgrs.2024.3362728 2024

[27] [27]

Dual- dimension feature interaction for semantic change detection in remote sensing images,

B. Wang, Z. Jiang, W. Ma, X. Xu, P. Zhang, Y . Wu, and H. Yang, “Dual- dimension feature interaction for semantic change detection in remote sensing images,”IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 17, pp. 9595–9605, 2024, doi: 10.1109/JSTARS.2024.3394571

work page doi:10.1109/jstars.2024.3394571 2024

[28] [28]

Semantic enhancement and change consistency network for semantic change detection in remote sensing images,

Z. Jiang, B. Wang, P. Zhang, Y . Wu, W. Ma, X. Xu, and H. Yang, “Semantic enhancement and change consistency network for semantic change detection in remote sensing images,”Int. J. Digit. Earth, vol. 18, no. 1, 2025, doi: 10.1080/17538947.2025.2496790

work page doi:10.1080/17538947.2025.2496790 2025

[29] [29]

SCD-SAM: Adapting Segment Anything Model for semantic change detection in remote sensing imagery,

L. Mei, Z. Ye, C. Xu, H. Wang, Y . Wang, C. Lei, W. Yang, and Y . Li, “SCD-SAM: Adapting Segment Anything Model for semantic change detection in remote sensing imagery,”IEEE Trans. Geosci. Remote Sens., vol. 62, pp. 1–13, 2024, doi: 10.1109/TGRS.2024.3407884. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 10

work page doi:10.1109/tgrs.2024.3407884 2024

[30] [30]

RemoteCLIP: A vision language foundation model for remote sensing,

F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, Q. Ye, L. Fu, and J. Zhou, “RemoteCLIP: A vision language foundation model for remote sensing,” arXiv preprint arXiv:2306.11029, 2023

arXiv 2023

[31] [31]

Foundation model-driven semantic change detection in remote sensing imagery,

H. Shen, L. Yan, H. Xie, Y . Wei, X. Li, W. Shen, P. Lv, and F. Tan, “Foundation model-driven semantic change detection in remote sensing imagery,”arXiv preprint arXiv:2602.13780, 2026

Pith/arXiv arXiv 2026

[32] [32]

ChangeVFM: Unleashing the power of vision foundation models for semantic change detection in remote sensing images,

H. Huang, K. Ding, D. Zhu, Q. Cheng, X. Huang, X. Huang, S. Wang, and Z. Shao, “ChangeVFM: Unleashing the power of vision foundation models for semantic change detection in remote sensing images,”Geo- spatial Information Science, 2026, doi: 10.1080/10095020.2026.2646372

work page doi:10.1080/10095020.2026.2646372 2026

[33] [33]

BT-HRSCD: High-resolution feature is what you need for a semantic change detection network with a triple-decoding branch,

S. Fang, W. Li, S. Yang, Z. Li, J. Zhao, and X. Wang, “BT-HRSCD: High-resolution feature is what you need for a semantic change detection network with a triple-decoding branch,”IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 4416714

2024

[34] [34]

A decoder- focused multitask network for semantic change detection,

Z. Li, X. Wang, S. Fang, J. Zhao, S. Yang, and W. Li, “A decoder- focused multitask network for semantic change detection,”IEEE Trans. Geosci. Remote Sens., vol. 62, 2024, Art. no. 5609115

2024