arxiv: 2605.12282 · v1 · submitted 2026-05-12 · 💻 cs.CV

Recognition: unknown

Large-Small Model Collaboration for Farmland Semantic Change Detection

Dengrong Zhang, Haoyu Zhang, Lingfei Ye, Qiurong Peng, Rui Wang, Xinjia Li

Pith reviewed 2026-05-13 05:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords farmland semantic change detectionlarge-small model collaborationFD-MambaCMLApseudo-change suppressionHZNU-FCDremote sensing

0 comments

The pith

A small Mamba model collaborates with a frozen large vision-language model to reach 97.63% F1 on farmland semantic change detection using only 6.65 million trainable parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds HZNU-FCD, a new benchmark of 4,588 bitemporal farmland image pairs with five-class from-to labels, to tackle the lack of fine-grained data for tracking cultivated land conversion. It introduces a framework that pairs a compact task-specific model with a frozen large vision-language model, using textual priors to suppress pseudo-changes from crop cycles and lighting while preserving real boundaries and small patches. A hard-region co-training step lets the small model supervise the large model's uncertain predictions, yielding high accuracy on the new dataset and competitive results on existing change-detection benchmarks.

Core claim

The central claim is that integrating Fine-grained Difference-aware Mamba for dense change features with Cross-modal Logical Arbitration from a frozen CLIP-based model, guided by hard-region co-training on low-confidence pixels, produces accurate semantic change maps for farmland that outperform prior multimodal approaches while requiring far fewer trainable weights.

What carries the argument

The hard-region co-training strategy, which supervises the large model's semantic score map exclusively on low-confidence pixels to enable collaboration between the small visual model and the frozen vision-language model.

If this is right

Farmland conversion can be monitored at fine semantic granularity despite phenology-induced appearance shifts.
Boundary accuracy and small-object localization improve because the small model focuses on dense local differences.
Textual priors from the large model reduce false positives from illumination and crop rotation.
The approach generalizes to standard change-detection datasets such as LEVIR-CD and WHU-CD.
Only 6.65 million parameters need training, enabling deployment on resource-limited hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same collaboration pattern could be tested on non-farmland remote-sensing tasks where semantic priors help distinguish subtle conversions.
Replacing the Mamba backbone with an even lighter architecture might preserve accuracy if co-training remains stable.
Extending the framework to multi-date sequences rather than strict bitemporal pairs would allow tracking gradual land-use shifts.
Public release of the HZNU-FCD benchmark invites direct comparisons that could accelerate progress on farmland-specific SCD.

Load-bearing premise

Low-confidence pixels identified by the small model can safely supervise the large model's semantic scores without injecting bias from the new dataset's annotation rules or causing overfitting.

What would settle it

A controlled experiment on a fresh farmland dataset that uses different seasonal timing or annotation conventions where the proposed method shows no F1 improvement over strong baselines.

Figures

Figures reproduced from arXiv: 2605.12282 by Dengrong Zhang, Haoyu Zhang, Lingfei Ye, Qiurong Peng, Rui Wang, Xinjia Li.

**Figure 2.** Figure 2: Representative examples of the six semantic categories in HZNU-FCD. For each category, we show the bitemporal images ( [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Overall architecture of the proposed large-small collaborative framework for farmland SCD. The framework consists of a lightweight visual small [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison on the HZNU-FCD test set. Color legend: red represents farmland [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison on the LEVIR-CD test set. White denotes correctly detected change (TP), red denotes false positives (FP), blue denotes false [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative comparison on the WHU-CD test set. White denotes correctly detected change (TP), red denotes false positives (FP), blue denotes false [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Qualitative ablation of the CMLA module on LEVIR-CD. Red denotes [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Qualitative ablation of the CMLA module on WHU-CD. Red denotes [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

read the original abstract

Farmland Semantic Change Detection (SCD) is essential for cultivated land protection, yet existing benchmarks and models remain insufficient for fine-grained farmland conversion monitoring. Current datasets often lack dedicated "from-to" annotations, while visual change detection models are easily disturbed by phenology-induced pseudo-changes caused by crop rotation, seasonal variation, and illumination differences. To address these challenges, we construct HZNU-FCD, a large-scale fine-grained farmland SCD benchmark with a unified five-class farmland-to-non-farmland annotation protocol. It contains 4,588 bitemporal image pairs with pixel-level labels for practical farmland protection. Based on this benchmark, we propose a large-small collaborative SCD framework that integrates a task-driven small visual model with a frozen large vision-language model. The small model, Fine-grained Difference-aware Mamba (FD-Mamba), learns dense change representations for boundary preservation and small-region localization. The large-model pathway, Cross-modal Logical Arbitration (CMLA), introduces CLIP-based textual priors for prompt-guided semantic arbitration and pseudo-change suppression. To enable effective collaboration, we design a hard-region co-training strategy that supervises the CMLA semantic score map only on low-confidence pixels. Experiments show that our method achieves 97.63% F1, 96.32% IoU, and 96.35% SCD_IoU_mean on HZNU-FCD with only 6.65M trainable parameters. Compared with the multimodal ChangeCLIP-ViT, which leverages vision-language information for change detection, our method improves F1 by 10.19 percentage points on HZNU-FCD. It also achieves 91.43% F1 and 84.21% IoU on LEVIR-CD, and 93.85% F1 and 88.41% IoU on WHU-CD, demonstrating strong robustness and generalization. The code is available at https://github.com/Lovelymili/FD-Mamba.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

They built a new farmland SCD dataset and a hybrid Mamba-CLIP model that posts high scores with few parameters, but the hard-region supervision needs more validation.

read the letter

They built HZNU-FCD, a dataset of 4,588 bitemporal pairs with a five-class farmland-to-non-farmland labeling scheme. That fills a gap for practical monitoring of cultivated land where phenology and crop cycles create false changes. The model pairs a small FD-Mamba for dense boundary and small-region detection with a frozen CLIP pathway called CMLA that brings in text priors for semantic arbitration. A hard-region co-training step feeds supervision to the large model only on pixels where the small model is uncertain. They report 97.63% F1 and 96.32% IoU on their own set, a 10-point lift over ChangeCLIP-ViT, plus 91.43% F1 on LEVIR-CD and 93.85% F1 on WHU-CD, all with 6.65M trainable parameters and released code. The dataset and the hybrid setup are the concrete new pieces. The numbers look strong on the reported benchmarks and the parameter count is attractive for remote-sensing work. The soft spot sits in the co-training loop. Defining low-confidence regions from the small model itself risks feeding back dataset-specific label habits or boundary conventions rather than clean cross-modal signals. The abstract gives no ablation on the threshold choice or error breakdown, so it is unclear whether the gains survive without that component. Generalization to the public sets is shown, but the details on splits and failure cases are not visible here. This is for remote-sensing groups focused on agricultural land monitoring who want a ready benchmark and an efficient hybrid to try. It deserves a serious referee because the new data and architecture are substantive enough to warrant checking, even if the collaboration mechanism needs tighter evidence in revision.

Referee Report

2 major / 2 minor

Summary. The paper introduces the HZNU-FCD benchmark dataset for fine-grained farmland semantic change detection (SCD) with a five-class 'from-to' annotation protocol and proposes a large-small collaborative framework. A small Fine-grained Difference-aware Mamba (FD-Mamba) model handles dense change representations, while a frozen CLIP-based Cross-modal Logical Arbitration (CMLA) pathway supplies textual priors for semantic arbitration and pseudo-change suppression. These are combined via a hard-region co-training strategy that supervises the large model's semantic scores only on low-confidence pixels. The method reports 97.63% F1, 96.32% IoU, and 96.35% SCD_IoU_mean on HZNU-FCD (6.65M trainable parameters), a 10.19 pp F1 gain over ChangeCLIP-ViT, plus competitive results on LEVIR-CD and WHU-CD; code is released.

Significance. If substantiated, the work offers a practical advance for farmland monitoring by explicitly targeting phenology-induced pseudo-changes through vision-language priors and an efficient hybrid architecture. The new dataset and released code are clear strengths that could support follow-on research; the parameter efficiency and reported generalization across public benchmarks add to the potential impact in remote-sensing change detection.

major comments (2)

[Abstract / Method] The hard-region co-training strategy (described in the abstract and method overview) is load-bearing for the central performance claims yet lacks any specification of the confidence threshold, the precise loss formulation used to supervise CMLA on FD-Mamba-identified low-confidence pixels, or ablation studies that isolate its contribution. Without these, it is impossible to verify that the reported 97.63% F1 and +10.19 pp gain do not arise from circular reinforcement of HZNU-FCD-specific label artifacts rather than robust cross-modal arbitration.
[Experiments] Experimental section provides headline metrics but omits details on train/validation/test splits for HZNU-FCD, any cross-validation protocol, or error analysis (e.g., per-class IoU or confusion matrices for the five farmland-to-non-farmland transitions). These omissions directly affect assessment of whether the generalization claims on LEVIR-CD (91.43% F1) and WHU-CD hold under the same annotation conventions.

minor comments (2)

[Abstract] The metric 'SCD_IoU_mean' is introduced without an explicit definition; clarify whether it denotes the mean IoU over semantic change classes or another aggregation.
[Experiments] Figure and table captions should explicitly state the number of runs or random seeds used to obtain the reported means and standard deviations (if any).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving clarity and reproducibility, and we have revised the manuscript accordingly to strengthen the presentation of the hard-region co-training strategy and the experimental reporting.

read point-by-point responses

Referee: [Abstract / Method] The hard-region co-training strategy (described in the abstract and method overview) is load-bearing for the central performance claims yet lacks any specification of the confidence threshold, the precise loss formulation used to supervise CMLA on FD-Mamba-identified low-confidence pixels, or ablation studies that isolate its contribution. Without these, it is impossible to verify that the reported 97.63% F1 and +10.19 pp gain do not arise from circular reinforcement of HZNU-FCD-specific label artifacts rather than robust cross-modal arbitration.

Authors: We agree that the hard-region co-training strategy is central to the performance claims and that its description in the abstract and method overview requires greater precision for verification. In the revised manuscript, we have expanded the method section with the exact confidence threshold used to select low-confidence pixels, the precise loss formulation (masked supervision of CMLA semantic scores via cross-entropy on those pixels only), and dedicated ablation studies that isolate the co-training component. These additions demonstrate that the reported gains arise from the cross-modal arbitration and pseudo-change suppression rather than dataset-specific artifacts, and we have added a brief discussion of robustness to potential label noise. revision: yes
Referee: [Experiments] Experimental section provides headline metrics but omits details on train/validation/test splits for HZNU-FCD, any cross-validation protocol, or error analysis (e.g., per-class IoU or confusion matrices for the five farmland-to-non-farmland transitions). These omissions directly affect assessment of whether the generalization claims on LEVIR-CD (91.43% F1) and WHU-CD hold under the same annotation conventions.

Authors: We acknowledge that the experimental section would benefit from additional reporting details. In the revised manuscript, we have added a dedicated paragraph specifying the train/validation/test splits for HZNU-FCD, the cross-validation protocol, and comprehensive error analysis including per-class IoU scores and confusion matrices across the five farmland-to-non-farmland transitions. These updates enable direct assessment of the generalization results on LEVIR-CD and WHU-CD under consistent evaluation practices. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation or claims.

full rationale

The paper introduces a new dataset (HZNU-FCD) and a collaborative architecture (FD-Mamba + frozen CLIP-based CMLA with hard-region co-training) whose performance numbers are obtained via standard empirical evaluation on held-out test splits and external public benchmarks (LEVIR-CD, WHU-CD). No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the reported F1/IoU gains are falsifiable measurements rather than tautological renamings or predictions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the effectiveness of the proposed FD-Mamba architecture and the CLIP-based semantic arbitration under the hard-region training regime, plus standard assumptions that the new dataset annotations are reliable and that pseudo-change suppression generalizes.

free parameters (1)

training hyperparameters and hard-region thresholds
Deep learning models require numerous hyperparameters and decision thresholds that are tuned on the training data.

axioms (1)

domain assumption CLIP textual priors provide reliable semantic guidance for distinguishing real farmland changes from phenology-induced pseudo-changes.
Invoked in the CMLA pathway description.

pith-pipeline@v0.9.0 · 5676 in / 1335 out tokens · 31553 ms · 2026-05-13T05:52:38.118585+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

[1]

Land-cover change detection using multi-temporal MODIS NDVI data,

R. S. Lunetta, J. F. Knight, J. Ediriwickrema, J. G. Lyon, and L. D. Worthy, “Land-cover change detection using multi-temporal MODIS NDVI data,”Remote Sensing of Environment, vol. 105, no. 2, pp. 142– 154, 2006

work page 2006
[2]

Z. Sun, Y . Zhong, X. Wang, and L. Zhang, “Identifying cropland non-agriculturalization with high representational consistency from bi- temporal high-resolution remote sensing images: From benchmark datasets to real-world application,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 212, pp. 454–474, 2024

work page 2024
[3]

Multitask learning for large-scale semantic change detection,

R. C. Daudt, B. Le Saux, A. Boulch, and Y . Gousseau, “Multitask learning for large-scale semantic change detection,”Computer Vision and Image Understanding, vol. 187, p. 102783, 2019

work page 2019
[4]

Asymmetric siamese networks for semantic change detection in aerial images,

K. Yang, G.-S. Xia, Z. Liu, B. Du, W. Yang, M. Pelillo, and L. Zhang, “Asymmetric siamese networks for semantic change detection in aerial images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–18, 2022

work page 2022
[5]

A spatial-temporal attention-based method and a new dataset for remote sensing image change detection,

H. Chen and Z. Shi, “A spatial-temporal attention-based method and a new dataset for remote sensing image change detection,”Remote Sensing, vol. 12, no. 10, p. 1662, 2020

work page 2020
[6]

A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection,

Q. Shi, M. Liu, S. Li, X. Liu, F. Wang, and L. Zhang, “A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022

work page 2022
[7]

Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset,

S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, 2018, pp. 574–586

work page 2018
[8]

Jilin-1 cup 2024 remote sensing image intelligent processing competition, track 2: Farmland semantic change detection,

Chang Guang Satellite Technology Co., Ltd., “Jilin-1 cup 2024 remote sensing image intelligent processing competition, track 2: Farmland semantic change detection,” 2024, accessed: 2026-05-

work page 2024
[9]

Available: https://www.jl1mall.com/contest/match/info? id=1645664411716952066

[Online]. Available: https://www.jl1mall.com/contest/match/info? id=1645664411716952066

work page
[10]

Panet: A multi-scale temporal decoupling network and its high-resolution benchmark dataset for detect- ing pseudo changes in cropland non-agriculturalization,

S. Sui, J. Zhang, H. Gu, and Y . Chang, “Panet: A multi-scale temporal decoupling network and its high-resolution benchmark dataset for detect- ing pseudo changes in cropland non-agriculturalization,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 126–143, 2026

work page 2026
[11]

Fully convolutional siamese networks for change detection,

R. C. Daudt, B. Le Saux, and A. Boulch, “Fully convolutional siamese networks for change detection,” in2018 25th IEEE International Con- ference on Image Processing (ICIP), 2018, pp. 4063–4067

work page 2018
[12]

Change detection based on deep siamese convolutional network for optical aerial images,

Y . Zhan, K. Fu, M. Yan, X. Sun, H. Wang, and X. Qiu, “Change detection based on deep siamese convolutional network for optical aerial images,”IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1845–1849, 2017

work page 2017
[13]

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,

C. Zhang, P. Yue, D. Tapete, L. Jiang, B. Shangguan, L. Huang, and G. Liu, “A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 183–200, 2020

work page 2020
[14]

Snunet-cd: A densely connected siamese network for change detection of vhr images,

S. Fang, K. Li, J. Shao, and Z. Li, “Snunet-cd: A densely connected siamese network for change detection of vhr images,”IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2021

work page 2021
[15]

Remote sensing image change detection with transformers,

H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detection with transformers,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022

work page 2022
[16]

A transformer-based siamese network for change detection,

W. G. C. Bandara and V . M. Patel, “A transformer-based siamese network for change detection,” in2022 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2022, pp. 207–210

work page 2022
[17]

Swinsunet: Pure transformer network for remote sensing image change detection,

C. Zhang, L. Wang, S. Cheng, and Y . Li, “Swinsunet: Pure transformer network for remote sensing image change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022

work page 2022
[18]

Relation changes matter: Cross-temporal difference transformer for change detection in remote sensing images,

K. Zhang, L. T. Luppino, X. X. Zhu, and L. Bruzzone, “Relation changes matter: Cross-temporal difference transformer for change detection in remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

work page 2023
[19]

Lightweight structure-aware transformer network for remote sensing image change detection,

T. Lei, Y . Xu, H. Ning, Z. Lv, C. Min, Y . Jin, and A. K. Nandi, “Lightweight structure-aware transformer network for remote sensing image change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 21, no. 6000305, pp. 1–5, 2024

work page 2024
[20]

Changemamba: Re- mote sensing change detection with spatiotemporal state space model,

H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, “Changemamba: Re- mote sensing change detection with spatiotemporal state space model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1– 20, 2024

work page 2024
[21]

Rs-mamba for large remote sensing image dense prediction,

S. Zhao, H. Chen, X. Zhang, P. Xiao, L. Bai, and W. Ouyang, “Rs-mamba for large remote sensing image dense prediction,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024

work page 2024
[22]

Cdmamba: Incorporating local clues into mamba for remote sensing image binary change detection,

H. Zhang, K. Chen, C. Liu, H. Chen, Z. Zou, and Z. Shi, “Cdmamba: Incorporating local clues into mamba for remote sensing image binary change detection,”IEEE Transactions on Geoscience and Remote Sens- ing, vol. 63, pp. 1–16, 2025

work page 2025
[23]

Conmamba: Cnn and ssm high- performance hybrid network for remote sensing change detection,

Z. Dong, G. Yuan, Z. Hua, and J. Li, “Conmamba: Cnn and ssm high- performance hybrid network for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

work page 2024
[24]

Multi-modal medical diagnosis via large-small model collaboration,

W. Chen, Z. Zhao, J. Yao, Y . Zhang, J. Bu, and H. Wang, “Multi-modal medical diagnosis via large-small model collaboration,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30 763–30 773

work page 2025
[25]

Cotuning: A large- small model collaborating distillation framework for better model gen- eralization,

Z. Liu, K. Liu, M. Guo, S. Zhang, and Y . Wang, “Cotuning: A large- small model collaborating distillation framework for better model gen- eralization,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 10 487–10 496

work page 2024
[26]

Collab- orative training of tiny-large vision language models,

S. Lu, L. Guo, W. Wang, Z. Zhao, T. Yue, J. Liu, and S. Liu, “Collab- orative training of tiny-large vision language models,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 4928–4937

work page 2024
[27]

Collaborative enhancement of large and small models for question answering via dual knowledge trans- fer,

S. Wang, Y . Liu, X. Tang, and W. Chen, “Collaborative enhancement of large and small models for question answering via dual knowledge trans- fer,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 40, 2026, pp. 33 630–33 638

work page 2026
[28]

Multimodal adaptive distilla- tion for leveraging unimodal encoders for vision-language tasks,

Z. Wang, N. Codella, Y .-C. Chen, L. Zhou, X. Dai, B. Xiao, J. Yang, H. You, K.-W. Chang, S.-f. Changet al., “Multimodal adaptive distilla- tion for leveraging unimodal encoders for vision-language tasks,”arXiv preprint arXiv:2204.10496, 2022

work page arXiv 2022
[29]

Data shunt: Collaboration of small and large models for lower costs and better performance,

D. Chen, Y . Zhuang, S. Zhang, J. Liu, S. Dong, and S. Tang, “Data shunt: Collaboration of small and large models for lower costs and better performance,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 10, 2024, pp. 11 249–11 257

work page 2024
[30]

FarmCD: Agricultural information- guided gated network for farmland change detection from remote sensing images,

H. Wang, N. Wang, and X. Li, “FarmCD: Agricultural information- guided gated network for farmland change detection from remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–14, 2026

work page 2026
[31]

Bi- Temporal semantic reasoning for the semantic change detection in HR remote sensing images,

L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi- Temporal semantic reasoning for the semantic change detection in HR remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022

work page 2022
[32]

Mamba: Linear-time sequence modeling with selective state spaces,

A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” 2023

work page 2023
[33]

Mamba-fcs: Joint spatio-frequency feature fusion, change-guided attention, and sek in- spired loss for enhanced semantic change detection in remote sensing,

B. Wijenayake, A. Ratnayake, P. Sumanasekara, R. Godaliyadda, P. Ekanayake, V . Herath, and N. Wasalathilaka, “Mamba-fcs: Joint spatio-frequency feature fusion, change-guided attention, and sek in- spired loss for enhanced semantic change detection in remote sensing,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026. ...

work page 2026
[34]

Ham-cd: Hybrid attention mamba for remote sensing change detection,

G. Li, P. Han, W. Wang, T. Mu, Z. Xiao, and X. Li, “Ham-cd: Hybrid attention mamba for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 5609518, 2026

work page 2026
[35]

Lgmm-net: A local–global encoder and mask mamba decoder network for remote sens- ing change detection,

C. Fang, S. Cheng, A. Du, C. Wu, and Y . Ding, “Lgmm-net: A local–global encoder and mask mamba decoder network for remote sens- ing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 2000923, 2026

work page 2026
[36]

HGINet: Hierarchical graph interaction transformer with edge-indicated attention for semantic change detection,

Z. Zheng, Q. Wan, Y . Zhang, Y . Zhong, and L. Zhang, “HGINet: Hierarchical graph interaction transformer with edge-indicated attention for semantic change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–16, 2024

work page 2024
[37]

Landscd: Change detection based on change of land surface characteristics under semantic constraints,

L. Ding, H. Tang, and L. Bruzzone, “Landscd: Change detection based on change of land surface characteristics under semantic constraints,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1– 14, 2022

work page 2022
[38]

Changeclip: Remote sensing change detection with sample-efficient vision-language semantic alignment,

W. Shi, M. Zhang, R. Zhang, S. Chen, and Z. Zhan, “Changeclip: Remote sensing change detection with sample-efficient vision-language semantic alignment,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 208, pp. 1–14, 2024

work page 2024
[39]

Semantic change detection via bidirectional vision-language feature alignment,

Y . Liu, D. Peng, X. Zhang, Q. Guo, Y . Zhong, and L. Zhang, “Semantic change detection via bidirectional vision-language feature alignment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1– 14, 2024

work page 2024
[40]

A novel change detection method based on visual language from high-resolution remote sensing images,

J. Qiu, W. Liu, H. Zhang, E. Li, L. Zhang, and X. Li, “A novel change detection method based on visual language from high-resolution remote sensing images,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4554–4567, 2024

work page 2024
[41]

A remote sensing image change detection network with feature constraints from a visual foundation model,

Z. Wu, L. Zan, Z. Chen, M. Cai, Y . Li, Z. Wang, J. Xie, and X. Shi, “A remote sensing image change detection network with feature constraints from a visual foundation model,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 28 939– 28 956, 2025

work page 2025
[42]

Change-lisa: Language-guided reasoning for remote sensing change detection,

X. Jia, Z. Chen, S. Zhang, and X. Xue, “Change-lisa: Language-guided reasoning for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 3001415, 2026

work page 2026
[43]

Wfcdclip: A clip-based framework for weakly supervised farmland change detection,

Z. Cao, Y . Huang, L. Ma, Y . Zhou, P. Zhou, and W. Shi, “Wfcdclip: A clip-based framework for weakly supervised farmland change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 3684850, 2026

work page 2026
[44]

Synergy of content and style: Enhanced remote sensing change detection via disentanglement and refinement,

S. Dong, C. Lu, S. Fu, and X. Meng, “Synergy of content and style: Enhanced remote sensing change detection via disentanglement and refinement,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 5610316, 2026

work page 2026
[45]

Imea-net: An edge-sensitive network for cropland change detection in high-resolution remote sensing images,

X. Yuan, L. Chen, J. Zhang, G. Zhou, M. Wang, and L. Li, “Imea-net: An edge-sensitive network for cropland change detection in high-resolution remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 236, pp. 175–196, 2026

work page 2026
[46]

A CNN-transformer network with multiscale context aggregation for fine-grained cropland change de- tection,

M. Liu, Z. Chai, H. Deng, and R. Liu, “A CNN-transformer network with multiscale context aggregation for fine-grained cropland change de- tection,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 4297–4306, 2022

work page 2022
[47]

Learning transferable visual models from natural language supervi- sion,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervi- sion,” inProceedings of the International Conference on Machine Learning, 2021, pp. 8748–8763

work page 2021
[48]

Gemini: A Family of Highly Capable Multimodal Models

Gemini Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, D. Silver, M. Johnson, I. Antonoglou, J. Schrittwieser, A. Glaese, J. Chen, E. Pitler, T. Lillicrap, A. Lazaridou, O. Firatet al., “Gemini: A family of highly capable multimodal models,”arXiv preprint arXiv:2312.11805, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[49]

MCTNet: Multi- context transformer network for semantic change detection of remote sensing images,

T. Song, X. Zhang, J. Li, L. Gao, B. Li, and M. Peng, “MCTNet: Multi- context transformer network for semantic change detection of remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

work page 2023
[50]

SCanNet: Joint spatiotempo- ral convolutional-and-attention network for semantic change detection,

Y . Du, J. Xu, X. Zhu, X. Qiu, and Z. Wei, “SCanNet: Joint spatiotempo- ral convolutional-and-attention network for semantic change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1– 15, 2023

work page 2023
[51]

MambaFCS: Selective state space model with frequency-domain cues for remote sensing image change detection,

Z. Zhao, H. Zhang, Y . He, Y . Zhou, and Z. Shi, “MambaFCS: Selective state space model with frequency-domain cues for remote sensing image change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

work page 2024
[52]

Remote sensing change detection with transformers trained from scratch,

M. Noman, M. Fiaz, H. Cholakkal, S. Narayan, R. Muhammad Anwer, S. Khan, and F. Shahbaz Khan, “Remote sensing change detection with transformers trained from scratch,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024

work page 2024
[53]

Cross-difference semantic consistency network for semantic change detection,

Q. Wang, W. Jing, K. Chi, and Y . Yuan, “Cross-difference semantic consistency network for semantic change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–12, 2024

work page 2024
[54]

GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,

X. Liu, C. Dai, L. Ding, Z. Zhang, Y . Li, X. Zuo, M. Li, H. Wang, and Y . Miao, “GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 73–91, 2025

work page 2025
[55]

IFNet: Deep fusion of multi-scale and multi-spectral information for change detection in optical and SAR images,

J. Chen, Z. Yuan, J. Peng, L. Chen, H. Han, J. Chu, X. Fan, and H. Li, “IFNet: Deep fusion of multi-scale and multi-spectral information for change detection in optical and SAR images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022

work page 2022
[56]

PaFormer: Parallel attentional transformer for remote sensing change detection,

X. Liu, Z. Li, W. Zhao, J. Shi, and J. Zhou, “PaFormer: Parallel attentional transformer for remote sensing change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 20, pp. 1–5, 2023

work page 2023
[57]

DARNet: Semantic supervised dense attention retargeting network for change detection in remote sensing images,

Z. Mao, Y . Zhong, X. Hu, L. Cao, J. Gao, and L. Zhang, “DARNet: Semantic supervised dense attention retargeting network for change detection in remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

work page 2023
[58]

ACABFNet: Attentional class-aware background features for remote sensing image change detection,

J. Li, Z. Su, J. Geng, and Y . Yin, “ACABFNet: Attentional class-aware background features for remote sensing image change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022

work page 2022