arxiv: 2604.14540 · v1 · submitted 2026-04-16 · 💻 cs.CV

Recognition: unknown

WILD-SAM: Phase-Aware Expert Adaptation of SAM for Landslide Detection in Wrapped InSAR Interferograms

Yucheng Pan , Heping Li , Zhangle Liu , Sajid Hussain , Bin Pan

Authors on Pith no claims yet

Pith reviewed 2026-05-10 11:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords landslide detectionwrapped InSARSAM adaptationphase-aware adaptermixture of expertswavelet enhancementremote sensing segmentation

0 comments

The pith

WILD-SAM adapts SAM with phase-aware experts and wavelet prompts to detect landslides accurately from wrapped InSAR interferograms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that inserting a Phase-Aware Mixture-of-Experts Adapter into SAM's frozen encoder and adding a Wavelet-Guided Subband Enhancement strategy can align the spectral mismatch between natural images and wrapped phase data. This alignment preserves the high-frequency fringes needed to outline landslide boundaries despite phase ambiguity and coherence noise. If correct, the method would allow reliable detection of slow-moving landslides directly from wrapped interferograms, avoiding the errors and costs of phase unwrapping.

Core claim

WILD-SAM integrates a Phase-Aware Mixture-of-Experts Adapter into the frozen SAM encoder to adaptively align spectral distributions between natural images and interferometric phase data through dynamic routing across convolutional experts, while the Wavelet-Guided Subband Enhancement strategy uses discrete wavelet transforms to disentangle high-frequency subbands and inject refined phase textures as dense prompts, ensuring topological integrity along sharp landslide boundaries and delivering state-of-the-art target completeness and contour fidelity on the ISSLIDE and ISSLIDE+ benchmarks.

What carries the argument

Phase-Aware Mixture-of-Experts (PA-MoE) Adapter that routes across heterogeneous convolutional experts to aggregate multi-scale spectral-textural priors, combined with Wavelet-Guided Subband Enhancement (WGSE) that generates frequency-aware dense prompts from high-frequency subbands.

Load-bearing premise

The spectral domain shift between natural images and wrapped interferometric phase data can be bridged by the PA-MoE Adapter and WGSE strategy while preserving the high-frequency fringes needed for accurate boundary delineation.

What would settle it

A direct comparison on held-out wrapped InSAR datasets where WILD-SAM shows no improvement in boundary precision or completeness over unmodified SAM or conventional segmentation baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.14540 by Bin Pan, Heping Li, Sajid Hussain, Yucheng Pan, Zhangle Liu.

**Figure 2.** Figure 2: Overview of the proposed WILD-SAM architecture. The framework takes a three-channel phase representation as input and consists of a frozen ViT [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Details of our Convolutional Routing Experts (CORE) module. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Wavelet-Domain Feature Rectifier (WDFR) and SE Gate. The high [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison of segmentation results on the ISSLIDE [ [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative comparison of segmentation results on the ISSLIDE+ [ [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: Quantitative comparison of cross-region generalization performance on the Hunza-InSAR [ [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: Cross-region qualitative evaluation on the external Hunza-InSAR [ [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: Qualitative ablation results on the ISSLIDE [ [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 10.** Figure 10: Visualization of feature attention maps input to the Mask Decoder. [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗

read the original abstract

Detecting slow-moving landslides directly from wrapped Interferometric Synthetic Aperture Radar (InSAR) interferograms is crucial for efficient geohazard monitoring, yet it remains fundamentally challenged by severe phase ambiguity and complex coherence noise. While the Segment Anything Model (SAM) offers a powerful foundation for segmentation, its direct transfer to wrapped phase data is hindered by a profound spectral domain shift, which suppresses the high-frequency fringes essential for boundary delineation. To bridge this gap, we propose WILD-SAM, a novel parameter-efficient fine-tuning framework specifically designed to adapt SAM for high-precision landslide detection on wrapped interferograms. Specifically, the architecture integrates a Phase-Aware Mixture-of-Experts (PA-MoE) Adapter into the frozen encoder to align spectral distributions and introduces a Wavelet-Guided Subband Enhancement (WGSE) strategy to generate frequency-aware dense prompts. The PA-MoE Adapter exploits a dynamic routing mechanism across heterogeneous convolutional experts to adaptively aggregate multi-scale spectral-textural priors, effectively aligning the distribution discrepancy between natural images and interferometric phase data. Meanwhile, the WGSE strategy leverages discrete wavelet transforms to explicitly disentangle high-frequency subbands and refine directional phase textures, injecting these structural cues as dense prompts to ensure topological integrity along sharp landslide boundaries. Extensive experiments on the ISSLIDE and ISSLIDE+ benchmarks demonstrate that WILD-SAM achieves state-of-the-art performance, significantly outperforming existing methods in both target completeness and contour fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

WILD-SAM adds a phase-aware MoE adapter and wavelet subband prompts to SAM for wrapped InSAR landslide detection, but the SOTA claims rest on unshown numbers and missing frequency checks.

read the letter

WILD-SAM takes the frozen SAM encoder and inserts a phase-aware mixture-of-experts adapter that routes across convolutional experts to handle the shift from natural images to wrapped interferometric phase. It pairs that with a wavelet-guided subband enhancement step that decomposes the input, refines high-frequency directional textures, and feeds them back as dense prompts. Those two pieces are the concrete additions for keeping fringe detail while doing segmentation on ISSLIDE and ISSLIDE+ data.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes WILD-SAM, a parameter-efficient fine-tuning framework adapting the Segment Anything Model (SAM) for landslide detection directly from wrapped InSAR interferograms. It integrates a Phase-Aware Mixture-of-Experts (PA-MoE) Adapter into the frozen encoder for spectral alignment via dynamic routing across convolutional experts and introduces a Wavelet-Guided Subband Enhancement (WGSE) strategy that applies discrete wavelet transforms to disentangle and refine high-frequency subbands, injecting them as dense prompts. The paper claims this yields state-of-the-art performance on the ISSLIDE and ISSLIDE+ benchmarks, with gains in target completeness and contour fidelity over prior methods.

Significance. If the experimental claims hold with proper validation, the work could meaningfully advance operational geohazard monitoring by enabling segmentation on wrapped phase data without error-prone unwrapping steps. The targeted use of wavelet subband refinement and MoE-based domain adaptation for preserving fringe structures in noisy interferograms addresses a genuine spectral mismatch problem in remote-sensing foundation-model transfer. The parameter-efficient design is a practical strength for deployment on limited InSAR datasets.

major comments (2)

[WGSE description and experimental validation] The central SOTA claim on contour fidelity rests on the WGSE strategy's ability to preserve high-frequency phase fringes, yet the manuscript supplies no frequency-domain diagnostics (e.g., power-spectrum ratios before/after WGSE, fringe-visibility scores, or edge-preservation metrics) comparing input interferograms to WGSE outputs. This leaves open whether boundary improvements arise from true high-frequency recovery or from the PA-MoE adapter and prompt engineering alone.
[Abstract and §4 (Experiments)] The abstract asserts SOTA results on named benchmarks but supplies no quantitative numbers, error bars, ablation studies, or experimental protocol details. Without these, the strength of the performance claims cannot be evaluated.

minor comments (1)

[Method section] Notation for the PA-MoE routing mechanism and wavelet subband indices could be clarified with explicit equations or a diagram to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and outline the revisions we will make to improve the manuscript.

read point-by-point responses

Referee: [WGSE description and experimental validation] The central SOTA claim on contour fidelity rests on the WGSE strategy's ability to preserve high-frequency phase fringes, yet the manuscript supplies no frequency-domain diagnostics (e.g., power-spectrum ratios before/after WGSE, fringe-visibility scores, or edge-preservation metrics) comparing input interferograms to WGSE outputs. This leaves open whether boundary improvements arise from true high-frequency recovery or from the PA-MoE adapter and prompt engineering alone.

Authors: We appreciate this observation on the need for direct validation of WGSE's frequency-domain effects. The manuscript describes how WGSE uses discrete wavelet transforms to disentangle high-frequency subbands and inject refined phase textures as dense prompts. However, we acknowledge that explicit diagnostics such as power-spectrum ratios, fringe-visibility scores, or edge-preservation metrics comparing inputs to WGSE outputs are not currently provided. In the revised manuscript we will add these analyses, including before/after comparisons on sample interferograms, to demonstrate high-frequency recovery and to help isolate WGSE's contribution from that of the PA-MoE adapter and prompt engineering. revision: yes
Referee: [Abstract and §4 (Experiments)] The abstract asserts SOTA results on named benchmarks but supplies no quantitative numbers, error bars, ablation studies, or experimental protocol details. Without these, the strength of the performance claims cannot be evaluated.

Authors: We agree that the current abstract and experimental section do not supply the requested quantitative details. The manuscript states that WILD-SAM achieves state-of-the-art results on ISSLIDE and ISSLIDE+ but does not include specific metrics, error bars, or protocol information in the abstract, and §4 lacks comprehensive ablation tables and repeated-run statistics. We will revise the abstract to include key quantitative results (e.g., IoU and F1 improvements with error bars) and a brief protocol summary. We will also expand §4 with full ablation studies, experimental protocol details, and error bars from multiple runs to allow proper evaluation of the performance claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces an architectural adaptation of SAM via the PA-MoE Adapter and WGSE strategy for spectral alignment and frequency-aware prompting on wrapped InSAR data. No equations, fitted parameters, or performance metrics are defined in terms of themselves or prior self-citations in the abstract or method description. The SOTA claims rest on benchmark experiments rather than any self-definitional reduction, fitted-input prediction, or load-bearing self-citation chain. The derivation chain is self-contained as an independent fine-tuning framework without tautological constructions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Assessment is limited to the abstract; no numerical free parameters or external benchmarks are described. The ledger records the two explicitly introduced components and the background transfer-learning assumption.

axioms (1)

domain assumption The Segment Anything Model provides a transferable foundation for segmentation that can be adapted to new domains via parameter-efficient modules.
This premise underpins the decision to start from frozen SAM rather than training from scratch.

invented entities (2)

Phase-Aware Mixture-of-Experts (PA-MoE) Adapter no independent evidence
purpose: Dynamically route and aggregate multi-scale spectral-textural features to align natural-image and interferometric-phase distributions.
New module introduced inside the encoder; no independent evidence outside this work is supplied.
Wavelet-Guided Subband Enhancement (WGSE) strategy no independent evidence
purpose: Disentangle high-frequency subbands via discrete wavelet transform and inject them as dense prompts to preserve boundary topology.
New prompting strategy proposed in the paper; no external validation is given.

pith-pipeline@v0.9.0 · 5576 in / 1453 out tokens · 64303 ms · 2026-05-10T11:49:01.435330+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

71 extracted references · 5 canonical work pages · 3 internal anchors

[1]

Sar monitoring of progressive and seasonal ground deformation using the permanent scatterers technique,

C. Colesanti, A. Ferretti, F. Novali, C. Prati, and F. Rocca, “Sar monitoring of progressive and seasonal ground deformation using the permanent scatterers technique,”IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 7, pp. 1685–1701, 2003

2003
[2]

A small-baseline approach for investigating deformations on full-resolution differential sar interferograms,

R. Lanari, O. Mora, M. Manunta, J. Mallorqui, P. Berardino, and E. Sansosti, “A small-baseline approach for investigating deformations on full-resolution differential sar interferograms,”IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 7, pp. 1377–1386, 2004

2004
[3]

A new algorithm for processing interferometric data-stacks: Squeesar,

A. Ferretti, A. Fumagalli, F. Novali, C. Prati, F. Rocca, and A. Rucci, “A new algorithm for processing interferometric data-stacks: Squeesar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 9, pp. 3460–3470, 2011

2011
[4]

Automatic detection and update of landslide inventory before and after impoundments at the lianghekou reservoir using sentinel-1 insar,

Y . Wang, J. Dong, L. Zhang, S. Deng, G. Zhang, M. Liao, and J. Gong, “Automatic detection and update of landslide inventory before and after impoundments at the lianghekou reservoir using sentinel-1 insar,”International Journal of Applied Earth Observation and Geoinformation, vol. 118, p. 103224, 2023. [Online]. Available: https://www.sciencedirect.com/s...

2023
[5]

Landslide detection in long-term and low-coherence scenario using faster intermittent stacking insar method,

H. Dai, L. Wu, Y . Liao, L. Chen, and Y . Yang, “Landslide detection in long-term and low-coherence scenario using faster intermittent stacking insar method,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 19 245–19 259, 2025

2025
[6]

Remote sensing-based detection and analysis of slow-moving landslides in aba prefecture, southwest china,

J. Ren, W. Yang, Z. Ma, W. Li, S. Zeng, H. Fu, Y . Wen, and J. He, “Remote sensing-based detection and analysis of slow-moving landslides in aba prefecture, southwest china,”Remote Sensing, vol. 17, no. 8,
[7]

Available: https://www.mdpi.com/2072-4292/17/8/1462

[Online]. Available: https://www.mdpi.com/2072-4292/17/8/1462

2072
[8]

An embedding swin transformer model for automatic slow-moving landslide detection based on insar products,

X. Chen, C. Zhao, X. Liu, S. Zhang, J. Xi, and B. A. Khan, “An embedding swin transformer model for automatic slow-moving landslide detection based on insar products,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

2024
[9]

A new deep learning neural network model for the identification of insar anomalous deformation areas,

T. Zhang, W. Zhang, D. Cao, Y . Yi, and X. Wu, “A new deep learning neural network model for the identification of insar anomalous deformation areas,”Remote Sensing, vol. 14, no. 11, 2022. [Online]. Available: https://www.mdpi.com/2072-4292/14/11/2690

2022
[10]

Automatic identification of active landslides over wide areas from time-series insar measurements using faster rcnn,

J. Cai, L. Zhang, J. Dong, J. Guo, Y . Wang, and M. Liao, “Automatic identification of active landslides over wide areas from time-series insar measurements using faster rcnn,”International Journal of Applied Earth Observation and Geoinformation, vol. 124, p. 103516, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S1569843223003400

2023
[11]

Drs-unet: A deep semantic segmentation network for the recognition of active landslides from insar imagery in the three rivers region of the qinghai–tibet plateau,

X. Chen, X. Yao, Z. Zhou, Y . Liu, C. Yao, and K. Ren, “Drs-unet: A deep semantic segmentation network for the recognition of active landslides from insar imagery in the three rivers region of the qinghai–tibet plateau,”Remote Sensing, vol. 14, no. 8, 2022. [Online]. Available: https://www.mdpi.com/2072-4292/14/8/1848

2022
[12]

A lightweight context-aware adaptive fusion network for automatic identification of active landslides,

X. Cai, C. Song, Z. Li, Y . Chen, B. Chen, J. Du, C. Yu, W. Zhu, and J. Peng, “A lightweight context-aware adaptive fusion network for automatic identification of active landslides,”International Journal of Applied Earth Observation and Geoinformation, vol. 144, p. 104882, 2025. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S1569...

2025
[13]

Change detection of slow-moving landslide with multi- source sbas-insar and light-u2net,

J. Cai, D. Ming, F. Liu, X. Ling, N. Liu, L. Zhang, L. Xu, Y . Li, and M. Zhu, “Change detection of slow-moving landslide with multi- source sbas-insar and light-u2net,”International Journal of Applied Earth Observation and Geoinformation, vol. 136, p. 104387, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S1569843225000342

2025
[14]

Zero-shot detection for insar-based land displacement by the deformation-prompt-based sam method,

Y . He, B. Chen, M. Motagh, Y . Zhu, S. Shao, J. Li, B. Zhang, and H. Kaufmann, “Zero-shot detection for insar-based land displacement by the deformation-prompt-based sam method,”International Journal of Applied Earth Observation and Geoinformation, vol. 136, p. 104407, 2025. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S1569843...

2025
[15]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Lo, P. Doll ´ar, and R. B. Girshick, “Segment anything,”2023 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3992–4003, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257952310

2023
[16]

Isslide: A new insar dataset for slow sliding area detection with machine learning,

A. Bralet, E. Trouv ´e, J. Chanussot, and A. M. Atto, “Isslide: A new insar dataset for slow sliding area detection with machine learning,” IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1–5, 2024

2024
[17]

Mb-net: A network for accurately identifying creeping landslides from wrapped interferograms,

R. Zhang, W. Zhu, B. Fan, Q. He, J. Zhan, C. Wang, and B. Zhang, “Mb-net: A network for accurately identifying creeping landslides from wrapped interferograms,”International Journal of Applied Earth Observation and Geoinformation, vol. 135, p. 104300, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/ S1569843224006587

2024
[18]

Ecsplain: Explainability-constrained classifier for pairing the detection and the lo- calization of moving areas from sar interferograms,

A. Bralet, A. M. Atto, J. Chanussot, and E. Trouv ´e, “Ecsplain: Explainability-constrained classifier for pairing the detection and the lo- calization of moving areas from sar interferograms,”IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–18, 2025

2025
[19]

Medical sam adapter: Adapting segment anything model for medical image segmentation,

J. Wu, Z. Wang, M. Hong, W. Ji, H. Fu, Y . Xu, M. Xu, and Y . Jin, “Medical sam adapter: Adapting segment anything model for medical image segmentation,”Medical Image Analysis, vol. 102, p. 103547, 2025. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S1361841525000945

2025
[20]

Samrs: Scaling-up remote sensing segmentation dataset with segment anything model,

D. Wang, J. Zhang, B. Du, M. Xu, L. Liu, D. Tao, and L. Zhang, “Samrs: Scaling-up remote sensing segmentation dataset with segment anything model,” inAdvances in Neural Information Processing Systems, vol. 36, 2023, pp. 8815–8827

2023
[21]

Segment anything meets point tracking,

F. Raji ˇc, L. Ke, Y . Tai, C. Tang, M. Danelljan, and F. Yu, “Segment anything meets point tracking,” in2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 9302–9311

2025
[22]

ASTRA: Let Arbitrary Subjects Transform in Video Editing

F. Shen, W. Xu, R. Yan, D. Zhang, X. Shu, and J. Tang, “Imagedit: Let any subject transform,”arXiv preprint arXiv:2510.01186, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[23]

IMAGHar- mony: Controllable image editing with consistent object quantity and layout,

F. Shen, X. Du, Y . Gao, J. Yu, Y . Cao, X. Lei, and J. Tang, “Imaghar- mony: Controllable image editing with consistent object quantity and layout,”arXiv preprint arXiv:2506.01949, 2025

work page arXiv 2025
[24]

Imagpose: A unified conditional framework for pose-guided person generation,

F. Shen and J. Tang, “Imagpose: A unified conditional framework for pose-guided person generation,”Advances in neural information processing systems, vol. 37, pp. 6246–6266, 2024

2024
[25]

The segment anything model (sam) for remote sensing applications: From zero to one shot,

L. P. Osco, Q. Wu, E. L. de Lemos, W. N. Gonc ¸alves, A. P. M. Ramos, J. Li, and J. Marcato, “The segment anything model (sam) for remote sensing applications: From zero to one shot,”International Journal of Applied Earth Observation and Geoinformation, vol. 124, p. 103540, 2023. [Online]. Available: https://www.sciencedirect.com/ science/article/pii/S156...

2023
[26]

Parameter-efficient transfer learning for NLP,

N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for NLP,” inProceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 09–15 Jun 2019, pp. 2...

2019
[27]

LoRA: Low-rank adaptation of large language models,

E. J. Hu, Y . Shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=nZeVKeeFYf9

2022
[28]

Multi-lora fine-tuned segment anything model for urban man-made object extraction,

X. Lu and Q. Weng, “Multi-lora fine-tuned segment anything model for urban man-made object extraction,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–19, 2024

2024
[29]

A multispectral remote sensing crop segmentation method based on seg- ment anything model using multistage adaptation fine-tuning,

B. Song, H. Yang, Y . Wu, P. Zhang, B. Wang, and G. Han, “A multispectral remote sensing crop segmentation method based on seg- ment anything model using multistage adaptation fine-tuning,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–18, 2024

2024
[30]

Sam-adapter: Adapting segment anything in underperformed scenes,

T. Chen, L. Zhu, C. Deng, R. Cao, Y . Wang, S. Zhang, Z. Li, L. Sun, Y . Zang, and P. Mao, “Sam-adapter: Adapting segment anything in underperformed scenes,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3367–3375

2023
[31]

Rs-sam: Integrating multi-scale information for enhanced remote sensing image segmentation,

E. Zhang, J. Liu, A. Cao, Z. Sun, H. Zhang, H. Wang, L. Sun, and M. Song, “Rs-sam: Integrating multi-scale information for enhanced remote sensing image segmentation,” inComputer Vision – ACCV 2024, M. Cho, I. Laptev, D. Tran, A. Yao, and H. Zha, Eds. Singapore: Springer Nature Singapore, 2025, pp. 280–296

2024
[32]

Scd-sam: Adapting segment anything model for semantic change detection in remote sensing imagery,

L. Mei, Z. Ye, C. Xu, H. Wang, Y . Wang, C. Lei, W. Yang, and Y . Li, “Scd-sam: Adapting segment anything model for semantic change detection in remote sensing imagery,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–13, 2024

2024
[33]

A universal adapter in segmentation models for transferable landslide mapping,

R. Wei, Y . Li, Y . Li, B. Zhang, J. Wang, C. Wu, S. Yao, and C. Ye, “A universal adapter in segmentation models for transferable landslide mapping,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 218, pp. 446–465, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0924271624004143 13

2024
[34]

Classwise-sam-adapter: Parameter-efficient fine-tuning adapts segment anything to sar domain for semantic segmentation,

X. Pu, H. Jia, L. Zheng, F. Wang, and F. Xu, “Classwise-sam-adapter: Parameter-efficient fine-tuning adapts segment anything to sar domain for semantic segmentation,”IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4791–4804, 2025

2025
[35]

A vision foundation model-based method for large-scale forest disturbance mapping using time series sentinel-1 sar data,

Y . Tian, F. Zhao, R. Meng, R. Sun, Y . Zhang, Y . Shen, B. Wang, J. Liu, and M. Li, “A vision foundation model-based method for large-scale forest disturbance mapping using time series sentinel-1 sar data,”Remote Sensing of Environment, vol. 325, p. 114775, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0034425725001798

2025
[36]

Mesam: Multiscale enhanced segment anything model for optical remote sensing images,

X. Zhou, F. Liang, L. Chen, H. Liu, Q. Song, G. Vivone, and J. Chanus- sot, “Mesam: Multiscale enhanced segment anything model for optical remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–15, 2024

2024
[37]

Bsdsnet: Dual-stream feature extraction network based on segment anything model for synthetic aperture radar land cover classification,

Y . Wang, W. Zhang, W. Chen, and C. Chen, “Bsdsnet: Dual-stream feature extraction network based on segment anything model for synthetic aperture radar land cover classification,”Remote Sensing, vol. 16, no. 7, 2024. [Online]. Available: https://www.mdpi.com/ 2072-4292/16/7/1150

2024
[38]

Sam-cffnet: Sam-based cross-feature fusion network for intelligent identification of landslides,

L. Xi, J. Yu, D. Ge, Y . Pang, P. Zhou, C. Hou, Y . Li, Y . Chen, and Y . Dong, “Sam-cffnet: Sam-based cross-feature fusion network for intelligent identification of landslides,”Remote Sensing, vol. 16, no. 13, 2024. [Online]. Available: https://www.mdpi.com/2072-4292/ 16/13/2334

2024
[39]

Adaptive mixtures of local experts,

R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,”Neural Computation, vol. 3, no. 1, pp. 79–87, 1991

1991
[40]

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V . Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,”ArXiv, vol. abs/1701.06538, 2017. [Online]. Available: https://api.semanticscholar.org/CorpusID:12462234

work page internal anchor Pith review Pith/arXiv arXiv 2017
[41]

Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,

W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,”Journal of Machine Learning Research, vol. 23, no. 120, pp. 1–39, 2022

2022
[42]

Mixture-of-experts with expert choice routing,

Y . Zhou, T. Lei, H. Liu, N. Du, Y . Huang, V . Zhao, A. M. Dai, z. Chen, Q. V . Le, and J. Laudon, “Mixture-of-experts with expert choice routing,” inAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, Eds., vol. 35. Curran Associates, Inc., 2022, pp. 7103–7114. [Online]. Available: https:...

2022
[43]

Tutel: Adaptive mixture-of-experts at scale,

C. Hwang, W. Cui, Y . Xiong, Z. Yang, Z. Liu, H. Hu, Z. Wang, R. Salas, J. Jose, P. Ramet al., “Tutel: Adaptive mixture-of-experts at scale,” Proceedings of Machine Learning and Systems, vol. 5, pp. 269–287, 2023

2023
[44]

Scaling vision with sparse mixture of experts,

C. Riquelme, J. Puigcerver, B. Mustafa, M. Neumann, R. Jenatton, A. Susano Pinto, D. Keysers, and N. Houlsby, “Scaling vision with sparse mixture of experts,”Advances in Neural Information Processing Systems, vol. 34, pp. 8583–8595, 2021

2021
[45]

Patch-level routing in mixture-of-experts is provably sample-efficient for convolutional neural networks,

M. N. R. Chowdhury, S. Zhang, M. Wang, S. Liu, and P.-Y . Chen, “Patch-level routing in mixture-of-experts is provably sample-efficient for convolutional neural networks,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 6074–6114

2023
[46]

From sparse to soft mixtures of experts,

J. Puigcerver, C. R. Ruiz, B. Mustafa, and N. Houlsby, “From sparse to soft mixtures of experts,” inThe Twelfth International Conference on Learning Representations, 2024. [Online]. Available: https://openreview.net/forum?id=jxpsAj7ltE

2024
[47]

DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models,

D. Dai, C. Deng, C. Zhao, R. Xu, H. Gao, D. Chen, J. Li, W. Zeng, X. Yu, Y . Wu, Z. Xie, Y . Li, P. Huang, F. Luo, C. Ruan, Z. Sui, and W. Liang, “DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), L....

2024
[48]

Sparse upcycling: Training mixture-of-experts from dense checkpoints,

A. Komatsuzaki, J. Puigcerver, J. Lee-Thorp, C. R. Ruiz, B. Mustafa, J. Ainslie, Y . Tay, M. Dehghani, and N. Houlsby, “Sparse upcycling: Training mixture-of-experts from dense checkpoints,” inThe Eleventh International Conference on Learning Representations, 2023. [Online]. Available: https://openreview.net/forum?id=T5nUQDrM4u

2023
[49]

LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin,

S. Dou, E. Zhou, Y . Liu, S. Gao, W. Shen, L. Xiong, Y . Zhou, X. Wang, Z. Xi, X. Fan, S. Pu, J. Zhu, R. Zheng, T. Gui, Q. Zhang, and X. Huang, “LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin,” in Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ...

2024
[50]

Dselect-k: Differentiable selection in the mixture of experts with applications to multi-task learning,

H. Hazimeh, Z. Zhao, A. Chowdhery, M. Sathiamoorthy, Y . Chen, R. Mazumder, L. Hong, and E. Chi, “Dselect-k: Differentiable selection in the mixture of experts with applications to multi-task learning,”Ad- vances in Neural Information Processing Systems, vol. 34, pp. 29 335– 29 347, 2021

2021
[51]

Robust mixture-of-expert training for convolu- tional neural networks,

Y . Zhang, R. Cai, T. Chen, G. Zhang, H. Zhang, P.-Y . Chen, S. Chang, Z. Wang, and S. Liu, “Robust mixture-of-expert training for convolu- tional neural networks,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 90–101

2023
[52]

Boost- ing consistency in story visualization with rich-contextual conditional diffusion models,

F. Shen, H. Ye, S. Liu, J. Zhang, C. Wang, X. Han, and Y . Wei, “Boost- ing consistency in story visualization with rich-contextual conditional diffusion models,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 6785–6794

2025
[53]

Transformer tracking via frequency fusion,

X. Hu, B. Zhong, Q. Liang, S. Zhang, N. Li, X. Li, and R. Ji, “Transformer tracking via frequency fusion,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 2, pp. 1020– 1031, 2024

2024
[54]

Towards building more robust models with frequency bias,

Q. Bu, D. Huang, and H. Cui, “Towards building more robust models with frequency bias,” in2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 4379–4388

2023
[55]

Wave-vit: Unifying wavelet and transformers for visual representation learning,

T. Yao, Y . Pan, Y . Li, C.-W. Ngo, and T. Mei, “Wave-vit: Unifying wavelet and transformers for visual representation learning,” inCom- puter Vision – ECCV 2022, S. Avidan, G. Brostow, M. Ciss ´e, G. M. Farinella, and T. Hassner, Eds. Cham: Springer Nature Switzerland, 2022, pp. 328–345

2022
[56]

Uncertainty-aware source-free adaptive image super-resolution with wavelet augmentation transformer,

Y . Ai, X. Zhou, H. Huang, L. Zhang, and R. He, “Uncertainty-aware source-free adaptive image super-resolution with wavelet augmentation transformer,” in2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 8142–8152

2024
[57]

Frequency- aware feature fusion for dense image prediction,

L. Chen, Y . Fu, L. Gu, C. Yan, T. Harada, and G. Huang, “Frequency- aware feature fusion for dense image prediction,”IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 10 763– 10 780, 2024

2024
[58]

Momfnet: A deep learning approach for insar phase filtering based on multi-objective multi-kernel feature extraction,

X. Zhang, C. Peng, Z. Li, Y . Zhang, Y . Liu, and Y . Wang, “Momfnet: A deep learning approach for insar phase filtering based on multi-objective multi-kernel feature extraction,”Sensors, vol. 24, no. 23, 2024. [Online]. Available: https://www.mdpi.com/1424-8220/24/23/7821

2024
[59]

Synthetic aperture radar image despeckling based on a deep learning network employing frequency domain decomposition,

X. Zhao, F. Ren, H. Sun, and Q. Qi, “Synthetic aperture radar image despeckling based on a deep learning network employing frequency domain decomposition,”Electronics, vol. 13, no. 3, 2024. [Online]. Available: https://www.mdpi.com/2079-9292/13/3/490

2024
[60]

IMAGGarment-1: Fine-grained garment generation for controllable fashion design,

F. Shen, J. Yu, C. Wang, X. Jiang, X. Du, and J. Tang, “Imaggarment-1: Fine-grained garment generation for controllable fashion design,”arXiv preprint arXiv:2504.13176, 2025

work page arXiv 2025
[61]

Imagdressing-v1: Customizable virtual dressing,

F. Shen, X. Jiang, X. He, H. Ye, C. Wang, X. Du, Z. Li, and J. Tang, “Imagdressing-v1: Customizable virtual dressing,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 7, 2025, pp. 6795–6804

2025
[62]

Long-term talkingface generation via motion-prior conditional diffusion model,

F. Shen, C. Wang, J. Gao, Q. Guo, J. Dang, J. Tang, and T.-S. Chua, “Long-term talkingface generation via motion-prior conditional diffusion model,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id= aINERD9MzJ

2025
[63]

U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” inInternational Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241

2015
[64]

Resunet++: An advanced archi- tecture for medical image segmentation,

D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. D. Lange, P. Halvorsen, and H. D. Johansen, “Resunet++: An advanced archi- tecture for medical image segmentation,” in2019 IEEE International Symposium on Multimedia (ISM), 2019, pp. 225–2255

2019
[65]

Rethinking Atrous Convolution for Semantic Image Segmentation

L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,”arXiv preprint arXiv:1706.05587, 2017

work page internal anchor Pith review arXiv 2017
[66]

Fully convolutional networks for semantic segmentation,

J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440

2015
[67]

Segformer: Simple and efficient design for semantic segmentation with transformers,

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,”Advances in neural information processing systems, vol. 34, pp. 12 077–12 090, 2021

2021
[68]

Per-pixel classification is not all you need for semantic segmentation,

B. Cheng, A. Schwing, and A. Kirillov, “Per-pixel classification is not all you need for semantic segmentation,”Advances in neural information processing systems, vol. 34, pp. 17 864–17 875, 2021. 14

2021
[69]

A dual-stream coding fusion network for landslide mapping based on multisource remote sensing images,

Y . He, H. Chen, Q. Zhu, Q. Zhang, W. Yang, J. Yang, W. Shi, L. Gao, and M. Filonchyk, “A dual-stream coding fusion network for landslide mapping based on multisource remote sensing images,”IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–21, 2025

2025
[70]

Rsprompter: Learning to prompt for remote sensing instance seg- mentation based on visual foundation model,

K. Chen, C. Liu, H. Chen, H. Zhang, W. Li, Z. Zou, and Z. Shi, “Rsprompter: Learning to prompt for remote sensing instance seg- mentation based on visual foundation model,”IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–17, 2024

2024
[71]

Integrated psinsar and sbas-insar analysis for landslide detection and monitoring,

S. Hussain, B. Pan, W. Hussain, M. M. Sajjad, M. Ali, Z. Afzal, M. Abdullah-Al-Wadud, and A. Tariq, “Integrated psinsar and sbas-insar analysis for landslide detection and monitoring,”Physics and Chemistry of the Earth, Parts A/B/C, vol. 139, p. 103956, 2025. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1474706525001068

2025