pith. machine review for the scientific record.

arxiv: 2604.03342 · v1 · submitted 2026-04-03 · 💻 cs.CV

Recognition: no theorem link

Mixture-of-Experts in Remote Sensing: A Survey


Pith reviewed 2026-05-13 20:01 UTC · model grok-4.3

classification 💻 cs.CV
keywords mixture of experts · remote sensing · survey · dynamic routing · earth observation · model specialization · spatiotemporal data

The pith

Mixture-of-Experts routes remote sensing inputs to specialized experts to manage sensor diversity and dynamics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper provides the first systematic overview of Mixture-of-Experts applications in remote sensing, reviewing core principles, architectural variants, and uses across tasks. It highlights how MoE addresses the challenges of multimodal inputs and changing Earth conditions through selective activation of experts rather than uniform processing. A sympathetic reader would value this because remote sensing data spans optical, radar, and other modalities with high spatiotemporal variability, where dynamic routing can improve both accuracy and efficiency. The survey closes by mapping trends that point toward broader adoption in data-heavy observation workflows.

Core claim

Mixture-of-Experts models address remote sensing challenges by employing a routing mechanism that directs each input to the most relevant subset of specialized expert sub-networks, and the survey synthesizes existing designs and applications to demonstrate this approach across classification, segmentation, detection, and change-analysis tasks.

What carries the argument

The Mixture-of-Experts (MoE) model, which uses a gating or routing network to activate only a sparse subset of specialized expert networks for each input.
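The gating-plus-experts mechanism can be sketched in a few lines. This is a generic, illustrative top-k routed MoE forward pass, not code from the surveyed papers; the dimensions, the single-linear-map experts, and the renormalization over the selected experts are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes -- hypothetical, chosen only for illustration.
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a single linear map here; real experts are MLPs.
expert_weights = rng.standard_normal((n_experts, d_model, d_model)) * 0.1
gate_weights = rng.standard_normal((d_model, n_experts)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    probs = softmax(x @ gate_weights)             # (n_tokens, n_experts)
    top = np.argsort(probs, axis=-1)[:, -top_k:]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = probs[t, top[t]]
        gates = gates / gates.sum()               # renormalize over the top-k
        for g, e in zip(gates, top[t]):
            out[t] += g * (x[t] @ expert_weights[e])
    return out

tokens = rng.standard_normal((5, d_model))
y = moe_forward(tokens)
print(y.shape)  # (5, 8): only 2 of the 4 experts ran for each token
```

Per token, only `top_k` expert matrices are multiplied, which is the sparse-activation property the survey credits for handling diverse sensor inputs efficiently.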

If this is right

  • Sparse activation in MoE variants lowers compute demands for processing high-volume satellite imagery.
  • Routing strategies can be tuned separately for optical, SAR, and hyperspectral data modalities.
  • Applications extend to time-series analysis and multimodal fusion without requiring full model activation.
  • Future designs may incorporate MoE layers into larger foundation models for Earth observation.
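The compute claim in the first bullet can be made concrete with back-of-envelope arithmetic. A hedged sketch, not figures from the paper: the layer widths and expert counts below are hypothetical, and real savings also depend on routing overhead and load balancing.

```python
# Per-token FLOPs for one feed-forward sublayer, comparing a dense layer that
# carries all capacity versus an MoE layer activating top_k of n_experts.
# All sizes are hypothetical.
d_model, d_ff, n_experts, top_k = 1024, 4096, 8, 2

dense_flops = 2 * d_model * d_ff * n_experts  # every token touches all capacity
moe_flops = 2 * d_model * d_ff * top_k        # same total capacity, sparse use

print(moe_flops / dense_flops)  # 0.25 -> 4x fewer FLOPs per token
```

The ratio is simply top_k / n_experts, which is why MoE scales total parameter count without scaling per-token compute.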

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • MoE routing patterns could transfer to real-time stream processing of incoming satellite feeds.
  • The surveyed techniques suggest a path for parameter-efficient adaptation across multiple remote sensing sensors.
  • Neighboring domains such as climate data analysis may adopt similar expert specialization for regional variability.

Load-bearing premise

The published body of MoE work in remote sensing is large enough and distinct enough to support a complete, representative survey.

What would settle it

Discovery of several major MoE-based remote sensing methods or papers that the survey omits or fails to synthesize.

Figures

Figures reproduced from arXiv: 2604.03342 by Lajiao Chen, Peng Liu, Yongchuan Cui.

Figure 1. Word cloud of the most frequent words appearing in MoE-related remote sensing papers.
Figure 2. Basic architecture of Mixture-of-Experts (MoE).
Figure 3. Overview of Mixture-of-Experts applications in remote sensing.
Figure 4. MoE in adaptive mixture-of-experts distillation (AMoED) [30] for cross-satellite generalizable incremental scene classification.
Figure 5. MoE in mixture-of-spectral-spatial-experts state space model (MambaMoE) [148] for hyperspectral image classification.
Figure 6. Grid-level MoE backbone used in single model for multi-modal datasets and multi-task object detection (SM3Det) [77] for multi-modal remote sensing object detection.
Figure 7. Sparse mixture-of-experts for hyperspectral object tracking (HotMoE) [132] framework.
Figure 8. Modality-aware pruning of experts (MAPEX) [43] for multi-modal remote sensing foundation models.
Figure 9. Phytoplankton absorption mixture-of-experts (PhA-MoE) [141] architecture for hyperspectral retrieval of phytoplankton absorption coefficients.
read the original abstract

Remote sensing data analysis and interpretation present unique challenges due to the diversity in sensor modalities and spatiotemporal dynamics of Earth observation data. Mixture-of-Experts (MoE) model has emerged as a powerful paradigm that addresses these challenges by dynamically routing inputs to specialized experts designed for different aspects of a task. However, despite rapid progress, the community still lacks a comprehensive review of MoE for remote sensing. This survey provides the first systematic overview of MoE applications in remote sensing, covering fundamental principles, architectural designs, and key applications across a variety of remote sensing tasks. The survey also outlines future trends to inspire further research and innovation in applying MoE to remote sensing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to deliver the first systematic survey of Mixture-of-Experts (MoE) models applied to remote sensing, covering fundamental principles, architectural designs, and applications across diverse remote sensing tasks such as image classification, segmentation, and change detection, while also discussing future trends.

Significance. If the survey is comprehensive and representative, it would fill an important gap by synthesizing MoE techniques tailored to remote sensing challenges including multi-modal sensor data and spatiotemporal variability, potentially serving as a reference for researchers bridging computer vision and Earth observation.

major comments (2)
  1. [Abstract and Introduction] The central claim that this is the 'first systematic overview' is load-bearing but unsupported by any description of the literature search protocol, including databases searched (e.g., IEEE Xplore, Google Scholar), search terms, date range, or inclusion/exclusion criteria. Without this information, it is impossible to evaluate completeness or selection bias.
  2. [Literature Overview] Section 2 or 3: The manuscript should include a quantitative summary (e.g., a table or figure) of the number of relevant peer-reviewed papers identified per task category to demonstrate that the MoE-remote sensing corpus is large and distinct enough from generic MoE vision literature to justify a dedicated survey.
minor comments (2)
  1. [Figures] Figure captions and notation: Ensure consistent use of symbols for routing gates and expert outputs across all architectural diagrams to improve readability.
  2. [References] References: Verify that all cited works are from peer-reviewed venues and that recent 2023-2024 publications are included where relevant.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the transparency and justification of our survey. We will revise the manuscript accordingly to address both major points.

read point-by-point responses
  1. Referee: [Abstract and Introduction] The central claim that this is the 'first systematic overview' is load-bearing but unsupported by any description of the literature search protocol, including databases searched (e.g., IEEE Xplore, Google Scholar), search terms, date range, or inclusion/exclusion criteria. Without this information, it is impossible to evaluate completeness or selection bias.

    Authors: We agree that a detailed description of the literature search protocol is necessary to support the claim of providing the first systematic overview. In the revised manuscript, we will add a dedicated subsection (likely in the Introduction) that explicitly describes the databases searched (IEEE Xplore, Google Scholar, arXiv, Web of Science), the search terms and Boolean combinations used (e.g., 'Mixture-of-Experts' OR MoE AND 'remote sensing' OR 'Earth observation'), the date range covered, and the inclusion/exclusion criteria applied to select papers. This will allow readers to assess completeness and potential selection bias. Revision: yes.

  2. Referee: [Literature Overview] Section 2 or 3: The manuscript should include a quantitative summary (e.g., a table or figure) of the number of relevant peer-reviewed papers identified per task category to demonstrate that the MoE-remote sensing corpus is large and distinct enough from generic MoE vision literature to justify a dedicated survey.

    Authors: We accept this recommendation. In the revised version, we will insert a new table (or figure) in the Literature Overview section that provides a quantitative breakdown of the number of peer-reviewed papers identified per remote sensing task category (e.g., classification, segmentation, change detection, object detection, and multimodal fusion). The table will also note the proportion of papers that focus specifically on remote sensing challenges versus generic vision applications, thereby demonstrating the size and distinctiveness of the MoE-remote sensing corpus. Revision: yes.

Circularity Check

0 steps flagged

No circularity: survey asserts literature gap without derivations or self-referential reductions

full rationale

The paper is a literature survey containing no equations, predictions, fitted parameters, or first-principles derivations. Its central claim—that it supplies the 'first systematic overview' of MoE in remote sensing—is a descriptive assertion about the external corpus rather than a result obtained by reducing any quantity to its own inputs. No self-citation chains, ansatzes, or uniqueness theorems are invoked to justify internal results. The absence of prior reviews is stated without reference to the authors' own prior work as load-bearing evidence. This matches the default expectation for non-circular survey papers whose claims rest on external literature rather than internal construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey paper, the central claim rests on the completeness and representativeness of the reviewed literature rather than on any mathematical derivations, fitted parameters, or newly postulated entities. No free parameters, axioms, or invented entities are introduced in the provided abstract.

pith-pipeline@v0.9.0 · 5402 in / 1141 out tokens · 47178 ms · 2026-05-13T20:01:12.639387+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

162 extracted references · 162 canonical work pages · 1 internal anchor

  1. [1]

    Aggarwal, V., Nagarajan, K., and Slatton, K. C. (2004). Multiple-model multiscale data fusion regulated by a mixture-of-experts network. In IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium, volume 1. IEEE

  2. [2]

    Albughdadi, M. (2025). Lightweight metadata-aware mixture-of-experts masked autoencoder for earth observation

  3. [3]

    Bi, H., Feng, Y., Tong, B., Wang, M., Yu, H., Mao, Y., Chang, H., Diao, W., Wang, P., Yu, Y., Peng, H., Zhang, Y., Fu, K., and Sun, X. (2025). RingMoE: Mixture-of-modality-experts multi-modal foundation models for universal remote sensing image interpretation

  4. [4]

    Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., and Huang, J. (2024). A survey on mixture of experts. arXiv preprint

  5. [5]

    Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., and Huang, J. (2025). A survey on mixture of experts in large language models. IEEE Transactions on Knowledge and Data Engineering, 37(7):3896–3915

  6. [6]

    Chai, B., Zhou, Q., Nie, X., Qiao, Q., Wu, W., Shi, Y., and Li, X. (2025). Scalable mixture-of-experts attention feature pyramid network for detection and segmentation

  7. [7]

    Chamroukhi, F. (2017). Skew t mixture of experts. Neurocomputing, 266:390–408

  8. [8]

    Chen, B., Chen, K., Yang, M., Zou, Z., and Shi, Z. (2025a). Heterogeneous mixture of experts for remote sensing image super-resolution. IEEE Geoscience and Remote Sensing Letters, 22:1–5

  9. [9]

    Chen, T., Zhang, Z., Jaiswal, A. K., Liu, S., and Wang, Z. (2023a). Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers. In The Eleventh International Conference on Learning Representations

  10. [10]

    Chen, X., Yan, S., Zhu, J., Chen, C., Liu, Y., and Zhang, M. (2025b). Generalizable multispectral land cover classification via frequency-aware mixture of low-rank token experts

  11. [11]

    Chen, Y., Cui, H., Zhang, G., Li, X., Xie, Z., Li, H., and Li, D. (2025c). SparseFormer: A credible dual-cnn expert-guided transformer for remote sensing image segmentation with sparse point annotation. IEEE Transactions on Geoscience and Remote Sensing, 63:1–16

  12. [12]

    Chen, Y., Jiang, W., and Wang, Y. (2025d). FAMHE-Net: Multi-scale feature augmentation and mixture of heterogeneous experts for oriented object detection. Remote Sensing, 17(2):205

  13. [13]

    Chen, Z., Deng, Y., Wu, Y., Gu, Q., and Li, Y. (2022). Towards understanding the mixture-of-experts layer in deep learning. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A., editors, Advances in Neural Information Processing Systems, volume 35, pages 23049–23062. Curran Associates, Inc

  14. [14]

    Chen, Z., Shen, Y., Ding, M., Chen, Z., Zhao, H., Learned-Miller, E., and Gan, C. (2023b). Mod-Squad: Designing mixtures of experts as modular multi-task learners. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11828–11837

  15. [15]

    Cheng, G., Han, J., and Lu, X. (2017). Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 105(10):1865–1883

  16. [16]

    Dai, D., Deng, C., Zhao, C., Xu, R., Gao, H., Chen, D., Li, J., Zeng, W., Yu, X., Wu, Y., Xie, Z., Li, Y., Huang, P., Luo, F., Ruan, C., Sui, Z., and Liang, W. (2024). DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of th...

  17. [17]

    Dai, X., Li, Z., Li, L., Xue, S., Huang, X., and Yang, X. (2025). HyperTransXNet: learning both global and local dynamics with a dual dynamic token mixer for hyperspectral image classification. Remote Sensing, 17(14):2361

  18. [18]

    Dao, T. and Gu, A. (2024). Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. In International Conference on Machine Learning (ICML)

  19. [19]

    Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30

  20. [20]

    Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human...

  21. [21]

    Dimitri, V., Regina, B., and Alfonz, M. (2025). A survey on mixture of experts: Advancements, challenges, and future directions. TechRxiv Preprints

  22. [22]

    Ding, L., Hong, D., Zhao, M., Chen, H., Li, C., Deng, J., Yokoya, N., Bruzzone, L., and Chanussot, J. (2025). A survey of sample-efficient deep learning for change detection in remote sensing: Tasks, strategies, and challenges. IEEE Geoscience and Remote Sensing Magazine, 13(3):164–189

  23. [23]

    Do, G., Le, H., and Tran, T. (2025). SimSMoE: Toward efficient training mixture of experts via solving representational collapse. In Chiruzzo, L., Ritter, A., and Wang, L., editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 2012–2025, Albuquerque, New Mexico. Association for Computational Linguistics

  24. [24]

    Dong, Z., Sun, Y., Jiang, H., Liu, T., and Gu, Y. (2025). PhyDAE: Physics-guided degradation-adaptive experts for all-in-one remote sensing image restoration

  25. [25]

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR)

  26. [26]

    Dou, P., Shen, H., Li, Z., and Guan, X. (2021). Time series remote sensing image classification framework using combination of deep learning and multiple classifiers system. International Journal of Applied Earth Observation and Geoinformation, 103:102477

  27. [27]

    Dror, R., Baumer, G., Shlomov, S., and Reichart, R. (2018). The hitchhiker’s guide to testing statistical significance in natural language processing. In Gurevych, I. and Miyao, Y., editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1383–1392, Melbourne, Australia. Association for Compu...

  28. [28]

    Du, N., Huang, Y., Dai, A. M., Tong, S., Lepikhin, D., Xu, Y., Krikun, M., Zhou, Y., Yu, A. W., Firat, O., Zoph, B., Fedus, L., Bosma, M. P., Zhou, Z., Wang, T., Wang, E., Webster, K., Pellat, M., Robinson, K., Meier-Hellstern, K., Duke, T., Dixon, L., Zhang, K., Le, Q., Wu, Y., Chen, Z., and Cui, C. (2022). GLaM: Efficient scaling of language models with...

  29. [29]

    Fedus, W., Zoph, B., and Shazeer, N. (2022). Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39

  30. [30]

    Fu, Y., Yang, R., Liu, Z., and Ng, M. K. (2025). Adaptive mixture-of-experts distillation for cross-satellite generalizable incremental remote sensing scene classification. IEEE Transactions on Circuits and Systems for Video Technology, pages 1–1

  31. [31]

    Fung, T. C. and Tseung, S. C. (2025). Mixture of experts models for multilevel data: Modeling framework and approximation theory. Neurocomputing, 626:129357

  32. [32]

    Gale, T., Elsen, E., and Hooker, S. (2023). MegaBlocks: Efficient sparse training with mixture-of-experts. arXiv preprint

  33. [33]

    Gan, W., Ning, Z., Qi, Z., and Yu, P. S. (2025). Mixture of experts (MoE): A big data perspective. arXiv preprint

  34. [34]

    Gao, J., Li, P., Chen, Z., and Zhang, J. (2020). A survey on deep learning for multimodal data fusion. Neural Computation, 32(5):829–864

  35. [35]

    Gao, Q., Qu, J., Li, Y., and Dong, W. (2025a). Rethinking efficient mixture-of-experts for remote sensing modality-missing classification

  36. [36]

    Gao, S., Hua, T., Shirkavand, R., Lin, C.-H., Tang, Z., Li, Z., Yuan, L., Li, F., Zhang, Z., Ganjdanesh, A., Qian, L., Jie, X., and Hsu, Y.-C. (2025b). ToMoE: Converting dense large language models to mixture-of-experts through dynamic structural pruning

  37. [37]

    Gu, A. and Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752

  38. [38]

    Gu, N., Zhang, Z., Feng, Y., Chen, Y., Fu, P., Lin, Z., Wang, S., Sun, Y., Wu, H., Wang, W., and Wang, H. (2025). Elastic MoE: Unlocking the inference-time scalability of mixture-of-experts

  39. [39]

    Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 1321–1330. JMLR.org

  40. [40]

    Guo, S., Chen, T., Wang, P., Yan, J., and Liu, H. (2025). Confidence fusion with representation distribution and mixture of experts for multimodal radar target recognition. IEEE Transactions on Aerospace and Electronic Systems, 61(5):13251–13268

  41. [41]

    Gupta, S., Mukherjee, S., Subudhi, K., Gonzalez, E., Jose, D., Awadallah, A. H., and Gao, J. (2022). Sparsely activated mixture-of-experts are robust multi-task learners. arXiv preprint

  42. [42]

    Gururangan, S., Li, M., Lewis, M., Shi, W., Althoff, T., Smith, N. A., and Zettlemoyer, L. (2023). Scaling expert language models with unsupervised domain discovery

  43. [43]

    Hanna, J., Scheibenreif, L., and Borth, D. (2025). MAPEX: Modality-aware pruning of experts for remote sensing foundation models

  44. [44]

    Hazimeh, H., Zhao, Z., Chowdhery, A., Sathiamoorthy, M., Chen, Y., Mazumder, R., Hong, L., and Chi, E. (2021). DSelect-k: Differentiable selection in the mixture of experts with applications to multi-task learning. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural ...

  45. [45]

    He, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., and Tang, J. (2021). FastMoE: A fast mixture-of-expert training system. arXiv preprint

  46. [46]

    He, J., Zhai, J., Antunes, T., Wang, H., Luo, F., Shi, S., and Li, Q. (2022). FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 120–134

  47. [47]

    He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778

  48. [48]

    He, S., Cheng, Q., Huai, Y., Zhu, Z., and Ding, J. (2024a). Mixture-of-experts for semantic segmentation of remoting sensing image. In Qin, C. and Zhou, H., editors, International Conference on Image Processing and Artificial Intelligence (ICIPAl 2024), volume 13213, page 132131Z. International Society for Optics and Photonics, SPIE

  49. [49]

    He, W., Cai, Y., Ren, Q., Ruze, A., and Jia, S. (2025). Adaptive expert learning for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing, 63:1–15

  50. [50]

    He, X., Yan, K., Li, R., Xie, C., Zhang, J., and Zhou, M. (2024b). Frequency-adaptive pan-sharpening with mixture of experts. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2121–2129

  51. [51]

    Ho, N., Yang, C.-Y., and Jordan, M. I. (2022). Convergence rates for gaussian mixtures of experts. Journal of Machine Learning Research, 23(323):1–81

  52. [52]

    Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780

  53. [53]

    Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations

  54. [54]

    Huang, Q., An, Z., Zhuang, N., Tao, M., Zhang, C., Jin, Y., Xu, K., Xu, K., Chen, L., Huang, S., and Feng, Y. (2024). Harder task needs more experts: Dynamic routing in MoE models. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12883–12895, Bang...

  55. [55]

    Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., Wang, Z., Salas, R., Jose, J., Ram, P., Chau, H., Cheng, P., Yang, F., Yang, M., and Xiong, Y. (2023). Tutel: Adaptive mixture-of-experts at scale. In Song, D., Carbin, M., and Chen, T., editors, Proceedings of Machine Learning and Systems, volume 5, pages 269–287. Curran

  56. [56]

    Hwang, R., Wei, J., Cao, S., Hwang, C., Tang, X., Cao, T., and Yang, M. (2024). Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pages 1018–1031

  57. [57]

    Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1):79–87

  58. [58]

    Jawahar, G., Mukherjee, S., Liu, X., Kim, Y. J., Abdul-Mageed, M., Lakshmanan, V. S. L., Awadallah, A. H., Bubeck, S., and Gao, J. (2023). AutoMoE: Heterogeneous mixture-of-experts with adaptive computation for efficient neural machine translation. In Rogers, A., Boyd-Graber, J., and Okazaki, N., editors, Findings of the Association for Computational Lingu...

  59. [59]

    Jia, Y., Ge, Y., Ling, F., Guo, X., Wang, J., Wang, L., Chen, Y., and Li, X. (2018). Urban land use mapping by combining remote sensing imagery and mobile phone positioning data. Remote Sensing, 10(3)

  60. [60]

    Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Chaplot, D. S., de las Casas, D., Bou Hanna, E., Bressand, F., et al. (2024). Mixtral of experts. arXiv preprint

  61. [61]

    Jiang, C., Osei, K., Yeddula, S. D., Feng, D., and Ku, W.-S. (2025). Knowledge-guided adaptive mixture of experts for precipitation prediction

  62. [62]

    Jiang, H., Peng, M., Zhong, Y., Xie, H., Hao, Z., Lin, J., Ma, X., and Hu, X. (2022). A survey on deep learning-based change detection from high-resolution remote sensing images. Remote Sensing, 14(7)

  63. [63]

    Jiang, W. and Tanner, M. A. (1999). Hierarchical mixtures-of-experts for generalized linear models: Some results on denseness and consistency. In Heckerman, D. and Whittaker, J., editors, Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, volume R2 of Proceedings of Machine Learning Research. PMLR. Reissued by PMLR ...

  64. [64]

    Jordan, M. I. and Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214

  65. [65]

    Komatsuzaki, A., Puigcerver, J., Lee-Thorp, J., Ruiz, C. R., Mustafa, B., Ainslie, J., Tay, Y., Dehghani, M., and Houlsby, N. (2023). Sparse Upcycling: Training mixture-of-experts from dense checkpoints. In The Eleventh International Conference on Learning Representations

  66. [66]

    Kong, Y., Yu, S., Cheng, Y., Philip Chen, C. L., and Wang, X. (2025). Joint classification of hyperspectral images and lidar data based on candidate pseudo labels pruning and dual mixture of experts. IEEE Transactions on Geoscience and Remote Sensing, 63:1–12

  67. [67]

    Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25

  68. [68]

    Kudugunta, S., Huang, Y., Bapna, A., Krikun, M., Lepikhin, D., Luong, M.-T., and Firat, O. (2021). Beyond Distillation: Task-level mixture-of-experts for efficient inference. In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t., editors, Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3577–3599, Punta Cana, Dominican Republic....

  69. [69]

    Kunwar, P., Vu, M. N., Gupta, M., Abdelsalam, M., and Bhattarai, M. (2025). TT-LoRA MoE: Unifying parameter-efficient fine-tuning and sparse mixture-of-experts. arXiv preprint arXiv:2504.21190

  70. [70]

    Kussul, N., Shelestov, A., Lavreniuk, M., Butko, I., and Skakun, S. (2016). Deep learning approach for large scale land cover mapping based on remote sensing data fusion. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 198–201

  71. [71]

    Lee, S., Park, S., Yang, J., Kim, J., and Cha, M. (2025). Generalizable slum detection from satellite imagery with mixture-of-experts

  72. [72]

    Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., and Chen, Z. (2021). GShard: Scaling giant models with conditional computation and automatic sharding. In International Conference on Learning Representations

  73. [73]

    Lewis, M., Bhosale, S., Dettmers, T., Goyal, N., and Zettlemoyer, L. (2021). BASE layers: Simplifying training of large, sparse models. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 6265–6274. PMLR

  74. [74]

    Li, J., Kang, J., Lu, J., Fu, H., Li, Z., Liu, B., Lin, X., Zhao, J., Guan, H., Liu, H., and Liu, Z. (2025a). Dynamic gating-enhanced deep learning model with multi-source remote sensing synergy for optimizing wheat yield estimation. Frontiers in Plant Science, Volume 16 - 2025

  75. [75]

    Li, J., Li, D., Savarese, S., and Hoi, S. (2023). BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org

  76. [76]

    Li, R., Ding, X., Peng, S., and Cai, F. (2025b). U-MoEMamba: A hybrid expert segmentation model for cabbage heads in complex uav low-altitude remote sensing scenarios. Agriculture, 15(16):1723

  77. [77]

    Li, Y., Li, X., Li, Y., Zhang, Y., Dai, Y., Hou, Q., Cheng, M.-M., and Yang, J. (2025c). SM3Det: A unified model for multi-modal remote sensing object detection

  78. [78]

    Li, Z., Chen, X., Li, J., and Zhang, J. (2022). Pertinent multigate mixture-of-experts-based prestack three-parameter seismic inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1–15

  79. [79]

    Liang, H., Fan, Z., Sarkar, R., Jiang, Z., Chen, T., Zou, K., Cheng, Y., Hao, C., and Wang, Z. (2022). M3ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS ’22, Red Hook, NY, USA. Curran Associates Inc

  80. [80]

    Liao, M., Chen, W., Shen, J., Guo, S., and Wan, H. (2025). HMoRA: Making LLMs more effective with hierarchical mixture of LoRA experts. In The Thirteenth International Conference on Learning Representations

Showing first 80 references.