pith. machine review for the scientific record. sign in

arxiv: 2605.12282 · v1 · submitted 2026-05-12 · 💻 cs.CV

Recognition: unknown

Large-Small Model Collaboration for Farmland Semantic Change Detection

Dengrong Zhang, Haoyu Zhang, Lingfei Ye, Qiurong Peng, Rui Wang, Xinjia Li

Pith reviewed 2026-05-13 05:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords farmland semantic change detectionlarge-small model collaborationFD-MambaCMLApseudo-change suppressionHZNU-FCDremote sensing
0
0 comments X

The pith

A small Mamba model collaborates with a frozen large vision-language model to reach 97.63% F1 on farmland semantic change detection using only 6.65 million trainable parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds HZNU-FCD, a new benchmark of 4,588 bitemporal farmland image pairs with five-class from-to labels, to tackle the lack of fine-grained data for tracking cultivated land conversion. It introduces a framework that pairs a compact task-specific model with a frozen large vision-language model, using textual priors to suppress pseudo-changes from crop cycles and lighting while preserving real boundaries and small patches. A hard-region co-training step lets the small model supervise the large model's uncertain predictions, yielding high accuracy on the new dataset and competitive results on existing change-detection benchmarks.

Core claim

The central claim is that integrating Fine-grained Difference-aware Mamba for dense change features with Cross-modal Logical Arbitration from a frozen CLIP-based model, guided by hard-region co-training on low-confidence pixels, produces accurate semantic change maps for farmland that outperform prior multimodal approaches while requiring far fewer trainable weights.

What carries the argument

The hard-region co-training strategy, which supervises the large model's semantic score map exclusively on low-confidence pixels to enable collaboration between the small visual model and the frozen vision-language model.

If this is right

  • Farmland conversion can be monitored at fine semantic granularity despite phenology-induced appearance shifts.
  • Boundary accuracy and small-object localization improve because the small model focuses on dense local differences.
  • Textual priors from the large model reduce false positives from illumination and crop rotation.
  • The approach generalizes to standard change-detection datasets such as LEVIR-CD and WHU-CD.
  • Only 6.65 million parameters need training, enabling deployment on resource-limited hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same collaboration pattern could be tested on non-farmland remote-sensing tasks where semantic priors help distinguish subtle conversions.
  • Replacing the Mamba backbone with an even lighter architecture might preserve accuracy if co-training remains stable.
  • Extending the framework to multi-date sequences rather than strict bitemporal pairs would allow tracking gradual land-use shifts.
  • Public release of the HZNU-FCD benchmark invites direct comparisons that could accelerate progress on farmland-specific SCD.

Load-bearing premise

Low-confidence pixels identified by the small model can safely supervise the large model's semantic scores without injecting bias from the new dataset's annotation rules or causing overfitting.

What would settle it

A controlled experiment on a fresh farmland dataset that uses different seasonal timing or annotation conventions where the proposed method shows no F1 improvement over strong baselines.

Figures

Figures reproduced from arXiv: 2605.12282 by Dengrong Zhang, Haoyu Zhang, Lingfei Ye, Qiurong Peng, Rui Wang, Xinjia Li.

Figure 1
Figure 1. Figure 1: Illustration of pseudo-changes and real structural changes in HZNU [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Representative examples of the six semantic categories in HZNU-FCD. For each category, we show the bitemporal images ( [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overall architecture of the proposed large-small collaborative framework for farmland SCD. The framework consists of a lightweight visual small [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison on the HZNU-FCD test set. Color legend: red represents farmland [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative comparison on the LEVIR-CD test set. White denotes correctly detected change (TP), red denotes false positives (FP), blue denotes false [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative comparison on the WHU-CD test set. White denotes correctly detected change (TP), red denotes false positives (FP), blue denotes false [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Qualitative ablation of the CMLA module on LEVIR-CD. Red denotes [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative ablation of the CMLA module on WHU-CD. Red denotes [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
read the original abstract

Farmland Semantic Change Detection (SCD) is essential for cultivated land protection, yet existing benchmarks and models remain insufficient for fine-grained farmland conversion monitoring. Current datasets often lack dedicated "from-to" annotations, while visual change detection models are easily disturbed by phenology-induced pseudo-changes caused by crop rotation, seasonal variation, and illumination differences. To address these challenges, we construct HZNU-FCD, a large-scale fine-grained farmland SCD benchmark with a unified five-class farmland-to-non-farmland annotation protocol. It contains 4,588 bitemporal image pairs with pixel-level labels for practical farmland protection. Based on this benchmark, we propose a large-small collaborative SCD framework that integrates a task-driven small visual model with a frozen large vision-language model. The small model, Fine-grained Difference-aware Mamba (FD-Mamba), learns dense change representations for boundary preservation and small-region localization. The large-model pathway, Cross-modal Logical Arbitration (CMLA), introduces CLIP-based textual priors for prompt-guided semantic arbitration and pseudo-change suppression. To enable effective collaboration, we design a hard-region co-training strategy that supervises the CMLA semantic score map only on low-confidence pixels. Experiments show that our method achieves 97.63% F1, 96.32% IoU, and 96.35% SCD_IoU_mean on HZNU-FCD with only 6.65M trainable parameters. Compared with the multimodal ChangeCLIP-ViT, which leverages vision-language information for change detection, our method improves F1 by 10.19 percentage points on HZNU-FCD. It also achieves 91.43% F1 and 84.21% IoU on LEVIR-CD, and 93.85% F1 and 88.41% IoU on WHU-CD, demonstrating strong robustness and generalization. The code is available at https://github.com/Lovelymili/FD-Mamba.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the HZNU-FCD benchmark dataset for fine-grained farmland semantic change detection (SCD) with a five-class 'from-to' annotation protocol and proposes a large-small collaborative framework. A small Fine-grained Difference-aware Mamba (FD-Mamba) model handles dense change representations, while a frozen CLIP-based Cross-modal Logical Arbitration (CMLA) pathway supplies textual priors for semantic arbitration and pseudo-change suppression. These are combined via a hard-region co-training strategy that supervises the large model's semantic scores only on low-confidence pixels. The method reports 97.63% F1, 96.32% IoU, and 96.35% SCD_IoU_mean on HZNU-FCD (6.65M trainable parameters), a 10.19 pp F1 gain over ChangeCLIP-ViT, plus competitive results on LEVIR-CD and WHU-CD; code is released.

Significance. If substantiated, the work offers a practical advance for farmland monitoring by explicitly targeting phenology-induced pseudo-changes through vision-language priors and an efficient hybrid architecture. The new dataset and released code are clear strengths that could support follow-on research; the parameter efficiency and reported generalization across public benchmarks add to the potential impact in remote-sensing change detection.

major comments (2)
  1. [Abstract / Method] The hard-region co-training strategy (described in the abstract and method overview) is load-bearing for the central performance claims yet lacks any specification of the confidence threshold, the precise loss formulation used to supervise CMLA on FD-Mamba-identified low-confidence pixels, or ablation studies that isolate its contribution. Without these, it is impossible to verify that the reported 97.63% F1 and +10.19 pp gain do not arise from circular reinforcement of HZNU-FCD-specific label artifacts rather than robust cross-modal arbitration.
  2. [Experiments] Experimental section provides headline metrics but omits details on train/validation/test splits for HZNU-FCD, any cross-validation protocol, or error analysis (e.g., per-class IoU or confusion matrices for the five farmland-to-non-farmland transitions). These omissions directly affect assessment of whether the generalization claims on LEVIR-CD (91.43% F1) and WHU-CD hold under the same annotation conventions.
minor comments (2)
  1. [Abstract] The metric 'SCD_IoU_mean' is introduced without an explicit definition; clarify whether it denotes the mean IoU over semantic change classes or another aggregation.
  2. [Experiments] Figure and table captions should explicitly state the number of runs or random seeds used to obtain the reported means and standard deviations (if any).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving clarity and reproducibility, and we have revised the manuscript accordingly to strengthen the presentation of the hard-region co-training strategy and the experimental reporting.

read point-by-point responses
  1. Referee: [Abstract / Method] The hard-region co-training strategy (described in the abstract and method overview) is load-bearing for the central performance claims yet lacks any specification of the confidence threshold, the precise loss formulation used to supervise CMLA on FD-Mamba-identified low-confidence pixels, or ablation studies that isolate its contribution. Without these, it is impossible to verify that the reported 97.63% F1 and +10.19 pp gain do not arise from circular reinforcement of HZNU-FCD-specific label artifacts rather than robust cross-modal arbitration.

    Authors: We agree that the hard-region co-training strategy is central to the performance claims and that its description in the abstract and method overview requires greater precision for verification. In the revised manuscript, we have expanded the method section with the exact confidence threshold used to select low-confidence pixels, the precise loss formulation (masked supervision of CMLA semantic scores via cross-entropy on those pixels only), and dedicated ablation studies that isolate the co-training component. These additions demonstrate that the reported gains arise from the cross-modal arbitration and pseudo-change suppression rather than dataset-specific artifacts, and we have added a brief discussion of robustness to potential label noise. revision: yes

  2. Referee: [Experiments] Experimental section provides headline metrics but omits details on train/validation/test splits for HZNU-FCD, any cross-validation protocol, or error analysis (e.g., per-class IoU or confusion matrices for the five farmland-to-non-farmland transitions). These omissions directly affect assessment of whether the generalization claims on LEVIR-CD (91.43% F1) and WHU-CD hold under the same annotation conventions.

    Authors: We acknowledge that the experimental section would benefit from additional reporting details. In the revised manuscript, we have added a dedicated paragraph specifying the train/validation/test splits for HZNU-FCD, the cross-validation protocol, and comprehensive error analysis including per-class IoU scores and confusion matrices across the five farmland-to-non-farmland transitions. These updates enable direct assessment of the generalization results on LEVIR-CD and WHU-CD under consistent evaluation practices. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation or claims.

full rationale

The paper introduces a new dataset (HZNU-FCD) and a collaborative architecture (FD-Mamba + frozen CLIP-based CMLA with hard-region co-training) whose performance numbers are obtained via standard empirical evaluation on held-out test splits and external public benchmarks (LEVIR-CD, WHU-CD). No equations, parameters, or central claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the reported F1/IoU gains are falsifiable measurements rather than tautological renamings or predictions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim depends on the effectiveness of the proposed FD-Mamba architecture and the CLIP-based semantic arbitration under the hard-region training regime, plus standard assumptions that the new dataset annotations are reliable and that pseudo-change suppression generalizes.

free parameters (1)
  • training hyperparameters and hard-region thresholds
    Deep learning models require numerous hyperparameters and decision thresholds that are tuned on the training data.
axioms (1)
  • domain assumption CLIP textual priors provide reliable semantic guidance for distinguishing real farmland changes from phenology-induced pseudo-changes.
    Invoked in the CMLA pathway description.

pith-pipeline@v0.9.0 · 5676 in / 1335 out tokens · 31553 ms · 2026-05-13T05:52:38.118585+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

58 extracted references · 58 canonical work pages · 1 internal anchor

  1. [1]

    Land-cover change detection using multi-temporal MODIS NDVI data,

    R. S. Lunetta, J. F. Knight, J. Ediriwickrema, J. G. Lyon, and L. D. Worthy, “Land-cover change detection using multi-temporal MODIS NDVI data,”Remote Sensing of Environment, vol. 105, no. 2, pp. 142– 154, 2006

  2. [2]

    Z. Sun, Y . Zhong, X. Wang, and L. Zhang, “Identifying cropland non-agriculturalization with high representational consistency from bi- temporal high-resolution remote sensing images: From benchmark datasets to real-world application,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 212, pp. 454–474, 2024

  3. [3]

    Multitask learning for large-scale semantic change detection,

    R. C. Daudt, B. Le Saux, A. Boulch, and Y . Gousseau, “Multitask learning for large-scale semantic change detection,”Computer Vision and Image Understanding, vol. 187, p. 102783, 2019

  4. [4]

    Asymmetric siamese networks for semantic change detection in aerial images,

    K. Yang, G.-S. Xia, Z. Liu, B. Du, W. Yang, M. Pelillo, and L. Zhang, “Asymmetric siamese networks for semantic change detection in aerial images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–18, 2022

  5. [5]

    A spatial-temporal attention-based method and a new dataset for remote sensing image change detection,

    H. Chen and Z. Shi, “A spatial-temporal attention-based method and a new dataset for remote sensing image change detection,”Remote Sensing, vol. 12, no. 10, p. 1662, 2020

  6. [6]

    A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection,

    Q. Shi, M. Liu, S. Li, X. Liu, F. Wang, and L. Zhang, “A deeply supervised attention metric-based network and an open aerial image dataset for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022

  7. [7]

    Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset,

    S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery dataset,” in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, 2018, pp. 574–586

  8. [8]

    Jilin-1 cup 2024 remote sensing image intelligent processing competition, track 2: Farmland semantic change detection,

    Chang Guang Satellite Technology Co., Ltd., “Jilin-1 cup 2024 remote sensing image intelligent processing competition, track 2: Farmland semantic change detection,” 2024, accessed: 2026-05-

  9. [9]

    Available: https://www.jl1mall.com/contest/match/info? id=1645664411716952066

    [Online]. Available: https://www.jl1mall.com/contest/match/info? id=1645664411716952066

  10. [10]

    Panet: A multi-scale temporal decoupling network and its high-resolution benchmark dataset for detect- ing pseudo changes in cropland non-agriculturalization,

    S. Sui, J. Zhang, H. Gu, and Y . Chang, “Panet: A multi-scale temporal decoupling network and its high-resolution benchmark dataset for detect- ing pseudo changes in cropland non-agriculturalization,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 233, pp. 126–143, 2026

  11. [11]

    Fully convolutional siamese networks for change detection,

    R. C. Daudt, B. Le Saux, and A. Boulch, “Fully convolutional siamese networks for change detection,” in2018 25th IEEE International Con- ference on Image Processing (ICIP), 2018, pp. 4063–4067

  12. [12]

    Change detection based on deep siamese convolutional network for optical aerial images,

    Y . Zhan, K. Fu, M. Yan, X. Sun, H. Wang, and X. Qiu, “Change detection based on deep siamese convolutional network for optical aerial images,”IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1845–1849, 2017

  13. [13]

    A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,

    C. Zhang, P. Yue, D. Tapete, L. Jiang, B. Shangguan, L. Huang, and G. Liu, “A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 166, pp. 183–200, 2020

  14. [14]

    Snunet-cd: A densely connected siamese network for change detection of vhr images,

    S. Fang, K. Li, J. Shao, and Z. Li, “Snunet-cd: A densely connected siamese network for change detection of vhr images,”IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2021

  15. [15]

    Remote sensing image change detection with transformers,

    H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detection with transformers,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022

  16. [16]

    A transformer-based siamese network for change detection,

    W. G. C. Bandara and V . M. Patel, “A transformer-based siamese network for change detection,” in2022 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2022, pp. 207–210

  17. [17]

    Swinsunet: Pure transformer network for remote sensing image change detection,

    C. Zhang, L. Wang, S. Cheng, and Y . Li, “Swinsunet: Pure transformer network for remote sensing image change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022

  18. [18]

    Relation changes matter: Cross-temporal difference transformer for change detection in remote sensing images,

    K. Zhang, L. T. Luppino, X. X. Zhu, and L. Bruzzone, “Relation changes matter: Cross-temporal difference transformer for change detection in remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

  19. [19]

    Lightweight structure-aware transformer network for remote sensing image change detection,

    T. Lei, Y . Xu, H. Ning, Z. Lv, C. Min, Y . Jin, and A. K. Nandi, “Lightweight structure-aware transformer network for remote sensing image change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 21, no. 6000305, pp. 1–5, 2024

  20. [20]

    Changemamba: Re- mote sensing change detection with spatiotemporal state space model,

    H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, “Changemamba: Re- mote sensing change detection with spatiotemporal state space model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1– 20, 2024

  21. [21]

    Rs-mamba for large remote sensing image dense prediction,

    S. Zhao, H. Chen, X. Zhang, P. Xiao, L. Bai, and W. Ouyang, “Rs-mamba for large remote sensing image dense prediction,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024

  22. [22]

    Cdmamba: Incorporating local clues into mamba for remote sensing image binary change detection,

    H. Zhang, K. Chen, C. Liu, H. Chen, Z. Zou, and Z. Shi, “Cdmamba: Incorporating local clues into mamba for remote sensing image binary change detection,”IEEE Transactions on Geoscience and Remote Sens- ing, vol. 63, pp. 1–16, 2025

  23. [23]

    Conmamba: Cnn and ssm high- performance hybrid network for remote sensing change detection,

    Z. Dong, G. Yuan, Z. Hua, and J. Li, “Conmamba: Cnn and ssm high- performance hybrid network for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

  24. [24]

    Multi-modal medical diagnosis via large-small model collaboration,

    W. Chen, Z. Zhao, J. Yao, Y . Zhang, J. Bu, and H. Wang, “Multi-modal medical diagnosis via large-small model collaboration,” inProceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 30 763–30 773

  25. [25]

    Cotuning: A large- small model collaborating distillation framework for better model gen- eralization,

    Z. Liu, K. Liu, M. Guo, S. Zhang, and Y . Wang, “Cotuning: A large- small model collaborating distillation framework for better model gen- eralization,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 10 487–10 496

  26. [26]

    Collab- orative training of tiny-large vision language models,

    S. Lu, L. Guo, W. Wang, Z. Zhao, T. Yue, J. Liu, and S. Liu, “Collab- orative training of tiny-large vision language models,” inProceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 4928–4937

  27. [27]

    Collaborative enhancement of large and small models for question answering via dual knowledge trans- fer,

    S. Wang, Y . Liu, X. Tang, and W. Chen, “Collaborative enhancement of large and small models for question answering via dual knowledge trans- fer,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 40, 2026, pp. 33 630–33 638

  28. [28]

    Multimodal adaptive distilla- tion for leveraging unimodal encoders for vision-language tasks,

    Z. Wang, N. Codella, Y .-C. Chen, L. Zhou, X. Dai, B. Xiao, J. Yang, H. You, K.-W. Chang, S.-f. Changet al., “Multimodal adaptive distilla- tion for leveraging unimodal encoders for vision-language tasks,”arXiv preprint arXiv:2204.10496, 2022

  29. [29]

    Data shunt: Collaboration of small and large models for lower costs and better performance,

    D. Chen, Y . Zhuang, S. Zhang, J. Liu, S. Dong, and S. Tang, “Data shunt: Collaboration of small and large models for lower costs and better performance,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 10, 2024, pp. 11 249–11 257

  30. [30]

    FarmCD: Agricultural information- guided gated network for farmland change detection from remote sensing images,

    H. Wang, N. Wang, and X. Li, “FarmCD: Agricultural information- guided gated network for farmland change detection from remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–14, 2026

  31. [31]

    Bi- Temporal semantic reasoning for the semantic change detection in HR remote sensing images,

    L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, “Bi- Temporal semantic reasoning for the semantic change detection in HR remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022

  32. [32]

    Mamba: Linear-time sequence modeling with selective state spaces,

    A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” 2023

  33. [33]

    Mamba-fcs: Joint spatio-frequency feature fusion, change-guided attention, and sek in- spired loss for enhanced semantic change detection in remote sensing,

    B. Wijenayake, A. Ratnayake, P. Sumanasekara, R. Godaliyadda, P. Ekanayake, V . Herath, and N. Wasalathilaka, “Mamba-fcs: Joint spatio-frequency feature fusion, change-guided attention, and sek in- spired loss for enhanced semantic change detection in remote sensing,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2026. ...

  34. [34]

    Ham-cd: Hybrid attention mamba for remote sensing change detection,

    G. Li, P. Han, W. Wang, T. Mu, Z. Xiao, and X. Li, “Ham-cd: Hybrid attention mamba for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 5609518, 2026

  35. [35]

    Lgmm-net: A local–global encoder and mask mamba decoder network for remote sens- ing change detection,

    C. Fang, S. Cheng, A. Du, C. Wu, and Y . Ding, “Lgmm-net: A local–global encoder and mask mamba decoder network for remote sens- ing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 2000923, 2026

  36. [36]

    HGINet: Hierarchical graph interaction transformer with edge-indicated attention for semantic change detection,

    Z. Zheng, Q. Wan, Y . Zhang, Y . Zhong, and L. Zhang, “HGINet: Hierarchical graph interaction transformer with edge-indicated attention for semantic change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–16, 2024

  37. [37]

    Landscd: Change detection based on change of land surface characteristics under semantic constraints,

    L. Ding, H. Tang, and L. Bruzzone, “Landscd: Change detection based on change of land surface characteristics under semantic constraints,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1– 14, 2022

  38. [38]

    Changeclip: Remote sensing change detection with sample-efficient vision-language semantic alignment,

    W. Shi, M. Zhang, R. Zhang, S. Chen, and Z. Zhan, “Changeclip: Remote sensing change detection with sample-efficient vision-language semantic alignment,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 208, pp. 1–14, 2024

  39. [39]

    Semantic change detection via bidirectional vision-language feature alignment,

    Y . Liu, D. Peng, X. Zhang, Q. Guo, Y . Zhong, and L. Zhang, “Semantic change detection via bidirectional vision-language feature alignment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1– 14, 2024

  40. [40]

    A novel change detection method based on visual language from high-resolution remote sensing images,

    J. Qiu, W. Liu, H. Zhang, E. Li, L. Zhang, and X. Li, “A novel change detection method based on visual language from high-resolution remote sensing images,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4554–4567, 2024

  41. [41]

    A remote sensing image change detection network with feature constraints from a visual foundation model,

    Z. Wu, L. Zan, Z. Chen, M. Cai, Y . Li, Z. Wang, J. Xie, and X. Shi, “A remote sensing image change detection network with feature constraints from a visual foundation model,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 28 939– 28 956, 2025

  42. [42]

    Change-lisa: Language-guided reasoning for remote sensing change detection,

    X. Jia, Z. Chen, S. Zhang, and X. Xue, “Change-lisa: Language-guided reasoning for remote sensing change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 3001415, 2026

  43. [43]

    Wfcdclip: A clip-based framework for weakly supervised farmland change detection,

    Z. Cao, Y . Huang, L. Ma, Y . Zhou, P. Zhou, and W. Shi, “Wfcdclip: A clip-based framework for weakly supervised farmland change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 3684850, 2026

  44. [44]

    Synergy of content and style: Enhanced remote sensing change detection via disentanglement and refinement,

    S. Dong, C. Lu, S. Fu, and X. Meng, “Synergy of content and style: Enhanced remote sensing change detection via disentanglement and refinement,”IEEE Transactions on Geoscience and Remote Sensing, vol. 64, p. 5610316, 2026

  45. [45]

    Imea-net: An edge-sensitive network for cropland change detection in high-resolution remote sensing images,

    X. Yuan, L. Chen, J. Zhang, G. Zhou, M. Wang, and L. Li, “Imea-net: An edge-sensitive network for cropland change detection in high-resolution remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 236, pp. 175–196, 2026

  46. [46]

    A CNN-transformer network with multiscale context aggregation for fine-grained cropland change de- tection,

    M. Liu, Z. Chai, H. Deng, and R. Liu, “A CNN-transformer network with multiscale context aggregation for fine-grained cropland change de- tection,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, pp. 4297–4306, 2022

  47. [47]

    Learning transferable visual models from natural language supervi- sion,

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervi- sion,” inProceedings of the International Conference on Machine Learning, 2021, pp. 8748–8763

  48. [48]

    Gemini: A Family of Highly Capable Multimodal Models

    Gemini Team, R. Anil, S. Borgeaud, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, D. Silver, M. Johnson, I. Antonoglou, J. Schrittwieser, A. Glaese, J. Chen, E. Pitler, T. Lillicrap, A. Lazaridou, O. Firatet al., “Gemini: A family of highly capable multimodal models,”arXiv preprint arXiv:2312.11805, 2024

  49. [49]

    MCTNet: Multi- context transformer network for semantic change detection of remote sensing images,

    T. Song, X. Zhang, J. Li, L. Gao, B. Li, and M. Peng, “MCTNet: Multi- context transformer network for semantic change detection of remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

  50. [50]

    SCanNet: Joint spatiotempo- ral convolutional-and-attention network for semantic change detection,

    Y . Du, J. Xu, X. Zhu, X. Qiu, and Z. Wei, “SCanNet: Joint spatiotempo- ral convolutional-and-attention network for semantic change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1– 15, 2023

  51. [51]

    MambaFCS: Selective state space model with frequency-domain cues for remote sensing image change detection,

    Z. Zhao, H. Zhang, Y . He, Y . Zhou, and Z. Shi, “MambaFCS: Selective state space model with frequency-domain cues for remote sensing image change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

  52. [52]

    Remote sensing change detection with transformers trained from scratch,

    M. Noman, M. Fiaz, H. Cholakkal, S. Narayan, R. Muhammad Anwer, S. Khan, and F. Shahbaz Khan, “Remote sensing change detection with transformers trained from scratch,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024

  53. [53]

    Cross-difference semantic consistency network for semantic change detection,

    Q. Wang, W. Jing, K. Chi, and Y . Yuan, “Cross-difference semantic consistency network for semantic change detection,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–12, 2024

  54. [54]

    GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,

    X. Liu, C. Dai, L. Ding, Z. Zhang, Y . Li, X. Zuo, M. Li, H. Wang, and Y . Miao, “GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 73–91, 2025

  55. [55]

    IFNet: Deep fusion of multi-scale and multi-spectral information for change detection in optical and SAR images,

    J. Chen, Z. Yuan, J. Peng, L. Chen, H. Han, J. Chu, X. Fan, and H. Li, “IFNet: Deep fusion of multi-scale and multi-spectral information for change detection in optical and SAR images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022

  56. [56]

    PaFormer: Parallel attentional transformer for remote sensing change detection,

    X. Liu, Z. Li, W. Zhao, J. Shi, and J. Zhou, “PaFormer: Parallel attentional transformer for remote sensing change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 20, pp. 1–5, 2023

  57. [57]

    DARNet: Semantic supervised dense attention retargeting network for change detection in remote sensing images,

    Z. Mao, Y . Zhong, X. Hu, L. Cao, J. Gao, and L. Zhang, “DARNet: Semantic supervised dense attention retargeting network for change detection in remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023

  58. [58]

    ACABFNet: Attentional class-aware background features for remote sensing image change detection,

    J. Li, Z. Su, J. Geng, and Y . Yin, “ACABFNet: Attentional class-aware background features for remote sensing image change detection,”IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022