Recognition: 2 theorem links · Lean Theorem
Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery
Pith reviewed 2026-05-15 22:04 UTC · model grok-4.3
The pith
A modular cascaded decoder lets remote sensing foundation models extract semantic changes while cutting pseudo-changes and training data needs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PerASCD is a unified framework that pairs the PerA remote-sensing foundation model with a Cascaded Gated Decoder and a Soft Semantic Consistency Loss. The decoder accepts multi-scale backbone features, processes them in coarse-to-fine stages, and produces pixel-level change maps for multiple semantic classes. On SECOND and LandsatSCD, experiments establish new state-of-the-art Sek scores of 26.11 percent and 65.21 percent, exceeding the prior leaders by 0.61 and 4.95 percentage points; the method stays ahead when trained on only 50 percent of the data, generalizes across backbones, and preserves semantic consistency under radiometric shifts.
What carries the argument
The Cascaded Gated Decoder, a modular coarse-to-fine network that fuses multi-scale features from any backbone and adaptively extracts change information for semantic change detection tasks.
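The gating idea can be made concrete. The block below is a minimal NumPy sketch of one coarse-to-fine gated fusion step, not the paper's actual CG-Decoder: the layout (nearest-neighbour upsampling, a 1x1 channel-mixing gate, convex blending) and all names such as `gated_fuse` and `w_gate` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upsample2x(feat):
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def gated_fuse(coarse, fine, w_gate):
    """One cascaded-gate step: a sigmoid gate decides, per pixel and
    channel, how much the fine-scale feature overrides the upsampled
    coarse-scale context."""
    up = upsample2x(coarse)                       # (C, H, W)
    stacked = np.concatenate([up, fine], axis=0)  # (2C, H, W)
    # A 1x1 "convolution" written as a channel-mixing matrix of shape (C, 2C).
    gate = sigmoid(np.einsum('cd,dhw->chw', w_gate, stacked))
    return gate * fine + (1.0 - gate) * up

rng = np.random.default_rng(0)
c = 4
coarse = rng.standard_normal((c, 8, 8))
fine = rng.standard_normal((c, 16, 16))
w_gate = rng.standard_normal((c, 2 * c)) * 0.1
fused = gated_fuse(coarse, fine, w_gate)
print(fused.shape)  # (4, 16, 16)
```

Because the gate is a sigmoid, each output value is a convex combination of the fine feature and the upsampled coarse context; stacking such steps from the coarsest scale downward yields a coarse-to-fine cascade over any backbone's feature pyramid.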
If this is right
- PerASCD outperforms full-data baselines when trained on only 50 percent of the labeled examples from SECOND and LandsatSCD.
- The same decoder architecture generalizes across different backbone networks without redesign.
- Interpretability improves because change decisions rest on the foundation model's already consistent semantic embeddings rather than ad-hoc features.
- Performance remains stable across radiometric and environmental shifts that normally create pseudo-changes.
Where Pith is reading between the lines
- The decoder pattern could transfer to other dense-prediction remote-sensing tasks such as semantic segmentation or instance detection.
- Data efficiency may allow faster model updates when new satellite imagery arrives with limited labels.
- Pairing the decoder with future, larger foundation models could further suppress pseudo-change errors.
- The coarse-to-fine gating might extend naturally to multi-date sequences beyond standard bi-temporal pairs.
Load-bearing premise
The chosen foundation model must keep producing semantically stable features even when imaging conditions and dates differ between the two images.
What would settle it
Run PerASCD and the prior best method on a fresh bi-temporal remote-sensing dataset that contains stronger seasonal, atmospheric, and sensor variations; if the new method no longer leads in Sek score, the consistency premise does not hold.
Original abstract
Remote sensing (RS) change detection is essential for interpreting surface dynamics. Semantic change detection (SCD) further enables pixel-level understanding of multi-class transitions, yet remains sensitive to pseudo-changes induced by imaging conditions. Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes. However, existing SCD methods are often rigid and backbone-specific, lacking the flexibility to integrate diverse multi-scale features from emerging foundation models. To this end, we introduce a modular Cascaded Gated Decoder (CG-Decoder) that bridges various backbones and SCD tasks, processing multi-scale features in a coarse-to-fine manner while enabling adaptive change extraction. Building upon the RS foundation model PerA, we present PerASCD, a unified SCD framework. We further propose a Soft Semantic Consistency Loss (SSCLoss) to mitigate numerical instability in mixed-precision training. Extensive experiments on SECOND and LandsatSCD show that PerASCD achieves new state-of-the-art Sek scores (26.11% and 65.21%), surpassing the previous best by 0.61% and 4.95%, respectively. It also demonstrates exceptional data efficiency (outperforming the full-data baseline with 50% data), seamless cross-backbone generalization, and enhanced interpretability. Our approach maintains robust semantic consistency under radiometric variations, providing a reliable SCD solution. Code: https://github.com/SathShen/PerASCD.git.
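The abstract credits SSCLoss with mitigating numerical instability in mixed-precision training, but the exact formulation is not given on this page. The block below is therefore only a generic sketch of the standard stabilization such a loss most likely relies on: computing log-probabilities with a max-shifted log-softmax instead of softmax followed by log, which avoids overflow in exp() and log(0) in low precision. The function name and the unchanged-pixel mask are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def log_softmax(logits, axis=0):
    # Max-shift before exp(): the standard guard against the overflow
    # and log(0) failures that softmax-then-log suffers in low precision.
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def soft_consistency_loss(logits_t1, logits_t2, unchanged_mask):
    """Cross-entropy between the two dates' soft class distributions,
    averaged over pixels marked unchanged (illustrative formulation)."""
    p1 = np.exp(log_softmax(logits_t1))   # soft targets, sum to 1 per pixel
    log_p2 = log_softmax(logits_t2)
    ce = -(p1 * log_p2).sum(axis=0)       # per-pixel cross-entropy, >= 0
    m = unchanged_mask.astype(float)
    return (ce * m).sum() / max(m.sum(), 1.0)

rng = np.random.default_rng(1)
# Deliberately large logits: the naive exp() path is where fp16 breaks down.
logits_t1 = (rng.standard_normal((6, 4, 4)) * 50).astype(np.float32)
logits_t2 = logits_t1 + rng.standard_normal((6, 4, 4)).astype(np.float32)
mask = np.ones((4, 4), dtype=bool)
loss = soft_consistency_loss(logits_t1, logits_t2, mask)
print(float(loss))
```

The design choice is that the loss never materializes raw probabilities in the log argument, so every intermediate stays bounded even when logits are large.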
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces PerASCD, a semantic change detection (SCD) framework for remote sensing imagery that leverages the PerA foundation model. It proposes a modular Cascaded Gated Decoder (CG-Decoder) for coarse-to-fine multi-scale feature processing across backbones and a Soft Semantic Consistency Loss (SSCLoss) to mitigate mixed-precision instability. Experiments on SECOND and LandsatSCD report new SOTA Sek scores of 26.11% and 65.21% (gains of 0.61% and 4.95% over prior best), plus claims of 50%-data efficiency, cross-backbone generalization, and robustness to radiometric variations.
Significance. If the performance deltas are shown to arise from PerA's semantic consistency rather than decoder architecture or training details alone, the work could meaningfully advance foundation-model integration in SCD by reducing pseudo-changes and improving data efficiency. The modular CG-Decoder design offers a practical bridge for future backbones, and code release supports reproducibility.
major comments (2)
- [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.
- [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.
minor comments (2)
- [Abstract] Abstract: The reported Sek scores and data-efficiency claim would be strengthened by explicit mention of error bars, number of runs, and exact baseline configurations.
- [Code Availability] The code link is provided but should include a README with exact training hyperparameters and dataset splits to aid verification.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below and commit to revisions that strengthen the attribution of results and the specification of the architecture.
Point-by-point responses
Referee: [Experimental Results] Experimental Results section: The headline Sek gains (0.61% on SECOND, 4.95% on LandsatSCD) and 50%-data efficiency claim rest on the premise that PerA features remain semantically stable across temporal/radiometric variations. No quantitative isolation (e.g., cosine similarity on unchanged pixels, change-map ablation on pseudo-change regions, or radiometric-perturbation test) is reported to separate PerA's contribution from the CG-Decoder or SSCLoss; without this, attribution of the deltas remains unverified.
Authors: We agree that stronger isolation of PerA's contribution would improve the manuscript. In the revision we will add (i) cosine-similarity statistics of PerA features computed exclusively on unchanged pixels across temporal pairs, (ii) an ablation that replaces the CG-Decoder with a standard FPN while keeping PerA features fixed, and (iii) a controlled radiometric-perturbation experiment that measures Sek degradation under simulated illumination and sensor shifts. These additions will directly quantify how much of the reported gains and data-efficiency stem from PerA's semantic stability versus the decoder or loss. revision: yes
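Proposed check (i) is straightforward to implement. A minimal sketch, assuming features arrive as (C, H, W) arrays with a boolean change mask, might look like the following; the function name and the synthetic perturbation are illustrative, not taken from the paper.

```python
import numpy as np

def unchanged_pixel_cosine(feat_t1, feat_t2, change_mask):
    """Mean cosine similarity between per-pixel feature vectors of the
    two dates, restricted to pixels labeled unchanged (mask == False)."""
    c, h, w = feat_t1.shape
    f1 = feat_t1.reshape(c, -1).T        # (H*W, C)
    f2 = feat_t2.reshape(c, -1).T
    keep = ~change_mask.reshape(-1)      # unchanged pixels only
    f1, f2 = f1[keep], f2[keep]
    num = (f1 * f2).sum(axis=1)
    den = np.linalg.norm(f1, axis=1) * np.linalg.norm(f2, axis=1) + 1e-12
    return float((num / den).mean())

rng = np.random.default_rng(2)
feat = rng.standard_normal((8, 16, 16))
# Stand-in for a second date: same scene plus mild radiometric noise.
noisy = feat + 0.1 * rng.standard_normal((8, 16, 16))
mask = np.zeros((16, 16), dtype=bool)   # all pixels unchanged
sim = unchanged_pixel_cosine(feat, noisy, mask)
print(round(sim, 3))  # close to 1 for temporally stable features
```

Reported per-dataset, this statistic would directly quantify the premise that PerA features stay semantically stable on unchanged regions across acquisition conditions.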
Referee: [Methodology] Methodology section: The CG-Decoder is described as processing multi-scale features in a coarse-to-fine gated manner, yet no equations or pseudocode detail the gating mechanism, feature fusion, or how it specifically exploits PerA consistency; this leaves the load-bearing architectural novelty underspecified relative to the central claim.
Authors: We accept that the gating and fusion operations require formal specification. The revised manuscript will include (a) the exact equations for the cascaded gate computation (including the sigmoid-activated gate weights and the element-wise modulation of multi-scale features), (b) the coarse-to-fine fusion formula that progressively refines change maps, and (c) pseudocode for the full CG-Decoder forward pass. These additions will make explicit how the architecture exploits the semantic consistency already present in PerA features. revision: yes
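For reference, one plausible formalization of the sigmoid-gated, coarse-to-fine step the authors commit to specifying is sketched below; the symbols and the exact fusion rule are assumptions, since the manuscript's equations are not yet published.

```latex
% Illustrative cascaded gate at scale s (all symbols assumed, not from the paper):
%   F_s     multi-scale backbone feature at scale s (finer as s decreases)
%   D_{s+1} decoder state from the coarser stage, U(\cdot) 2x upsampling
%   W_g     learned projection, \sigma sigmoid, \odot elementwise product
\begin{aligned}
  G_s &= \sigma\big(W_g \,[\, U(D_{s+1}) \,;\, F_s \,]\big),\\
  D_s &= G_s \odot F_s + (1 - G_s) \odot U(D_{s+1}).
\end{aligned}
```

Under this reading, each stage blends fine detail with coarse context through a learned per-pixel gate, and the change map is decoded from the finest state.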
Circularity Check
No significant circularity; new modular components evaluated on public benchmarks
Full rationale
The paper introduces CG-Decoder and SSCLoss as novel modules, treats the PerA foundation model as an external pretrained input, and reports empirical results on the public SECOND and LandsatSCD datasets. No equation or claim reduces by construction to a fitted parameter or self-definition internal to the paper. The assumption that PerA features are semantically consistent is presented as a premise drawn from prior foundation-model literature rather than derived within this work. No self-citations appear load-bearing for the central performance claims, and the reported Sek gains, data-efficiency results, and cross-backbone tests are independent evaluations rather than tautological restatements of inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Remote sensing foundation models extract semantically consistent features across temporal and environmental variations.
invented entities (2)
- Cascaded Gated Decoder (CG-Decoder): no independent evidence
- Soft Semantic Consistency Loss (SSCLoss): no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tagged "echoes")
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Passage: "Recent RS foundation models extract semantically consistent features across temporal and environmental variations, which is critical for mitigating pseudo-changes."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] S. Tian, A. Ma, Z. Zheng, and Y. Zhong, "Hi-UCD: A Large-scale Dataset for Urban Semantic Change Detection in Remote Sensing Imagery," Dec. 28, 2020, arXiv:2011.03247. doi: 10.48550/arXiv.2011.03247
- [2] A. Ochtyra, A. Marcinkowska-Ochtyra, and E. Raczko, "Threshold- and trend-based vegetation change monitoring algorithm based on the inter-annual multi-temporal normalized difference moisture index series: A case study of the Tatra Mountains," Remote Sensing of Environment, vol. 249, p. 112026, Nov. 2020, doi: 10.1016/j.rse.2020.112026
- [3] Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, "Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters," Remote Sensing of Environment, vol. 265, p. 112636, Nov. 2021, doi: 10.1016/j.rse.2021.112636
- [4] K. Cha, J. Seo, and T. Lee, "A Billion-scale Foundation Model for Remote Sensing Images," IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, pp. 1–17, 2024, doi: 10.1109/JSTARS.2024.3401772
- [5] X. Sun et al., "RingMo: A Remote Sensing Foundation Model With Masked Image Modeling," IEEE Trans. Geosci. Remote Sensing, vol. 61, pp. 1–22, 2023, doi: 10.1109/TGRS.2022.3194732
- [6] U. Mall, B. Hariharan, and K. Bala, "Change-Aware Sampling and Contrastive Learning for Satellite Images," in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada: IEEE, Jun. 2023, pp. 5261–5270. doi: 10.1109/CVPR52729.2023.00509
- [7] H. Shen, H. Gu, H. Li, Y. Yang, and A. Qiu, "A contrastive learning foundation model based on perfectly aligned sample pairs for remote sensing images," Geo-spatial Information Science, vol. 0, no. 0, pp. 1–18, Mar. 2026, doi: 10.1080/10095020.2026.2628435
- [8] C. J. Reed et al., "Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning," Sep. 22, 2023, arXiv:2212.14532. doi: 10.48550/arXiv.2212.14532
- [9] O. Mañas, A. Lacoste, X. Giro-i-Nieto, D. Vazquez, and P. Rodriguez, "Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data," May 03, 2021, arXiv:2103.16607. doi: 10.48550/arXiv.2103.16607
- [10] X. Guo et al., "SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery," Mar. 22, 2024, arXiv:2312.10115. doi: 10.48550/arXiv.2312.10115
- [11] T. Wang and P. Isola, "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere," Aug. 15, 2022, arXiv:2005.10242. doi: 10.48550/arXiv.2005.10242
- [12] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked Autoencoders Are Scalable Vision Learners," Dec. 19, 2021, arXiv:2111.06377. doi: 10.48550/arXiv.2111.06377
- [13] R. Caye Daudt, B. Le Saux, and A. Boulch, "Fully Convolutional Siamese Networks for Change Detection," in 2018 25th IEEE International Conference on Image Processing (ICIP), Oct. 2018, pp. 4063–4067. doi: 10.1109/ICIP.2018.8451652
- [14] S. Fang, K. Li, J. Shao, and Z. Li, "SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images," IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1–5, 2022, doi: 10.1109/LGRS.2021.3056416
- [15] A. Dosovitskiy et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," Jun. 03, 2021, arXiv:2010.11929. doi: 10.48550/arXiv.2010.11929
- [16] H. Chen, Z. Qi, and Z. Shi, "Remote Sensing Image Change Detection With Transformers," IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3095166
- [17] W. G. C. Bandara and V. M. Patel, "A Transformer-Based Siamese Network for Change Detection," in IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Jul. 2022, pp. 207–210. doi: 10.1109/IGARSS46834.2022.9883686
- [18] C. Wu, L. Zhang, and B. Du, "Kernel Slow Feature Analysis for Scene Change Detection," IEEE Trans. Geosci. Remote Sensing, vol. 55, no. 4, pp. 2367–2384, Apr. 2017, doi: 10.1109/TGRS.2016.2642125
- [19] T. Suzuki, S. Shirakabe, Y. Miyashita, A. Nakamura, Y. Satoh, and H. Kataoka, "Semantic Change Detection with Hypermaps," Mar. 16, 2017, arXiv:1604.07513. doi: 10.48550/arXiv.1604.07513
- [20] A. Varghese, J. Gubbi, A. Ramaswamy, and P. Balamuralidhar, "ChangeNet: A Deep Learning Architecture for Visual Change Detection," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
- [21] R. Caye Daudt, B. Le Saux, A. Boulch, and Y. Gousseau, "Multitask learning for large-scale semantic change detection," Computer Vision and Image Understanding, vol. 187, p. 102783, Oct. 2019, doi: 10.1016/j.cviu.2019.07.003
- [22] Y. Wang, B. Du, L. Ru, C. Wu, and H. Luo, "Scene Change Detection VIA Deep Convolution Canonical Correlation Analysis Neural Network," in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan: IEEE, Jul. 2019, pp. 198–201. doi: 10.1109/IGARSS.2019.8898211
- [23] D. Peng, L. Bruzzone, Y. Zhang, H. Guan, and P. He, "SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery," International Journal of Applied Earth Observation and Geoinformation, vol. 103, p. 102465, Dec. 2021, doi: 10.1016/j.jag.2021.102465
- [24] Y. Zhu, L. Li, K. Chen, C. Liu, F. Zhou, and Z. Shi, "Semantic-CD: Remote Sensing Image Semantic Change Detection towards Open-vocabulary Setting," Jan. 12, 2025, arXiv:2501.06808. doi: 10.48550/arXiv.2501.06808
- [25] Y. Deng et al., "Feature-Guided Multitask Change Detection Network," IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing, vol. 15, pp. 9667–9679, 2022, doi: 10.1109/JSTARS.2022.3215773
- [26] S. Xiang, M. Wang, X. Jiang, G. Xie, Z. Zhang, and P. Tang, "Dual-Task Semantic Change Detection for Remote Sensing Images Using the Generative Change Field Module," Remote Sensing, vol. 13, no. 16, p. 3336, Aug. 2021, doi: 10.3390/rs13163336
- [27] L. Ding, J. Zhang, H. Guo, K. Zhang, B. Liu, and L. Bruzzone, "Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images," IEEE Trans. Geosci. Remote Sensing, vol. 62, pp. 1–14, 2024, doi: 10.1109/TGRS.2024.3362795
- [28] J. Wang et al., "MSCD-Net: From Unimodal to Multimodal Semantic Change Detection," IEEE Trans. Geosci. Remote Sensing, vol. 63, pp. 1–17, 2025, doi: 10.1109/TGRS.2025.3591814
- [29] Q. Shu et al., "Semantic Change Detection of Roads and Bridges: A Fine-grained Dataset and Multimodal Frequency-driven Detector," Sep. 19, 2025, arXiv:2505.13212. doi: 10.48550/arXiv.2505.13212
- [30] B. Wijenayake et al., "Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing," Aug. 11, 2025, arXiv:2508.08232. doi: 10.48550/arXiv.2508.08232
- [31] Y. Chen, C. Li, C. Ling, Y. Tan, and P. Wu, "FAPMNet: Flow-Aligned Prototype Memory Network for Semantic Change Detection in Remote Sensing Images," IEEE Geoscience and Remote Sensing Letters, vol. 23, pp. 1–5, 2026, doi: 10.1109/LGRS.2026.3652300
- [32] X. Liu et al., "GSTM-SCD: Graph-enhanced spatio-temporal state space model for semantic change detection in multi-temporal remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 230, pp. 73–91, Dec. 2025, doi: 10.1016/j.isprsjprs.2025.09.003
- [33] E. J. Parelius, "A Review of Deep-Learning Methods for Change Detection in Multispectral Remote Sensing Images," Remote Sensing, vol. 15, no. 8, p. 2092, Apr. 2023, doi: 10.3390/rs15082092
- [34] G. Cheng et al., "Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review," Remote Sensing, vol. 16, no. 13, p. 2355, Jan. 2024, doi: 10.3390/rs16132355
- [35] W. Jing, K. Chi, Q. Li, and Q. Wang, "ChangeRD: A registration-integrated change detection framework for unaligned remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 220, pp. 64–74, Feb. 2025, doi: 10.1016/j.isprsjprs.2024.11.019
- [36] L. Ding, H. Guo, S. Liu, L. Mou, J. Zhang, and L. Bruzzone, "Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images," IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2022.3154390
- [37] Z. Chen et al., "Vision Transformer Adapter for Dense Predictions," Feb. 13, 2023, arXiv:2205.08534. doi: 10.48550/arXiv.2205.08534
- [38] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds., Cham: Springer International Publishing, 2015.
- [39] K. Yang et al., "Asymmetric Siamese Networks for Semantic Change Detection in Aerial Images," IEEE Trans. Geosci. Remote Sensing, vol. 60, pp. 1–18, 2022, doi: 10.1109/TGRS.2021.3113912
- [40] P. Yuan, Q. Zhao, X. Zhao, X. Wang, X. Long, and Y. Zheng, "A transformer-based Siamese network and an open optical dataset for semantic change detection of remote sensing images," International Journal of Digital Earth, vol. 15, no. 1, pp. 1506–1525, Dec. 2022, doi: 10.1080/17538947.2022.2111470
- [41] B. Rolih, M. Fučka, F. Wolf, and L. Č. Zajc, "Be the Change You Want to See: Revisiting Remote Sensing Change Detection Practices," IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–11, 2025, doi: 10.1109/TGRS.2025.3585342
- [42] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- [43] Z. Liu et al., "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
- [44] Y. Cong et al., "SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery"
- [45] Y. Liu et al., "VMamba: Visual State Space Model," Advances in Neural Information Processing Systems, vol. 37, pp. 103031–103063, Dec. 2024, doi: 10.52202/079017-3273