Mixture-of-Experts in Remote Sensing: A Survey
Pith reviewed 2026-05-13 20:01 UTC · model grok-4.3
The pith
Mixture-of-Experts routes remote sensing inputs to specialized experts to manage sensor diversity and dynamics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mixture-of-Experts models address remote sensing challenges by employing a routing mechanism that directs each input to the most relevant subset of specialized expert sub-networks, and the survey synthesizes existing designs and applications to demonstrate this approach across classification, segmentation, detection, and change-analysis tasks.
What carries the argument
The Mixture-of-Experts (MoE) architecture, in which a gating or routing network activates only a sparse subset of specialized expert networks for each input.
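To make that load-bearing machinery concrete, here is a minimal sketch of a sparse top-k MoE layer in PyTorch. Everything here is illustrative: the class name, expert sizes, and the dense per-expert loop are assumptions for readability, not the design of any surveyed model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy sparse MoE layer: a gate scores experts, only the top-k run per token."""

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Gating (routing) network: one score per expert for each token.
        self.gate = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Only top_k of n_experts run per token.
        scores = self.gate(x)                                  # (tokens, n_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # keep the k best experts
        weights = F.softmax(weights, dim=-1)                   # renormalize over the k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e   # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

# 16 tokens of a 64-dim feature map; only 2 of 8 experts fire per token.
tokens = torch.randn(16, 64)
print(SparseMoE(d_model=64)(tokens).shape)  # torch.Size([16, 64])
```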
If this is right
- Sparse activation in MoE variants lowers compute demands for processing high-volume satellite imagery.
- Routing strategies can be tuned separately for optical, SAR, and hyperspectral data modalities (a sketch follows this list).
- Applications extend to time-series analysis and multimodal fusion without requiring full model activation.
- Future designs may incorporate MoE layers into larger foundation models for Earth observation.
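As a concrete reading of the per-modality bullet above, the sketch below conditions a router on a modality tag so that optical, SAR, and hyperspectral inputs can learn different gating behavior over a shared expert pool. The class name, the embedding-based conditioning, and all sizes are assumptions for illustration, not a design reported by the survey.

```python
import torch
import torch.nn as nn

MODALITIES = {"optical": 0, "sar": 1, "hyperspectral": 2}

class ModalityAwareGate(nn.Module):
    """Toy router whose expert scores depend on the input's sensor modality."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        # One learned embedding per modality shifts the gate's input, letting
        # routing specialize per sensor without duplicating the expert pool.
        self.modality_emb = nn.Embedding(len(MODALITIES), d_model)
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        tag = torch.tensor(MODALITIES[modality])
        # Add the modality embedding before scoring the experts.
        return torch.softmax(self.gate(x + self.modality_emb(tag)), dim=-1)

gate = ModalityAwareGate(d_model=64, n_experts=8)
x = torch.randn(4, 64)
print(gate(x, "sar").shape)  # torch.Size([4, 8]): one routing distribution per token
```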
Where Pith is reading between the lines
- MoE routing patterns could transfer to real-time stream processing of incoming satellite feeds.
- The surveyed techniques suggest a path for parameter-efficient adaptation across multiple remote sensing sensors.
- Neighboring domains such as climate data analysis may adopt similar expert specialization for regional variability.
Load-bearing premise
The published body of MoE work in remote sensing is large enough and distinct enough to support a complete, representative survey.
What would settle it
Discovery of several major MoE-based remote sensing methods or papers that the survey omits or fails to synthesize.
Original abstract
Remote sensing data analysis and interpretation present unique challenges due to the diversity in sensor modalities and spatiotemporal dynamics of Earth observation data. Mixture-of-Experts (MoE) model has emerged as a powerful paradigm that addresses these challenges by dynamically routing inputs to specialized experts designed for different aspects of a task. However, despite rapid progress, the community still lacks a comprehensive review of MoE for remote sensing. This survey provides the first systematic overview of MoE applications in remote sensing, covering fundamental principles, architectural designs, and key applications across a variety of remote sensing tasks. The survey also outlines future trends to inspire further research and innovation in applying MoE to remote sensing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to deliver the first systematic survey of Mixture-of-Experts (MoE) models applied to remote sensing, covering fundamental principles, architectural designs, and applications across diverse remote sensing tasks such as image classification, segmentation, and change detection, while also discussing future trends.
Significance. If the survey is comprehensive and representative, it would fill an important gap by synthesizing MoE techniques tailored to remote sensing challenges including multi-modal sensor data and spatiotemporal variability, potentially serving as a reference for researchers bridging computer vision and Earth observation.
Major comments (2)
- [Abstract and Introduction] The central claim that this is the 'first systematic overview' is load-bearing but unsupported by any description of the literature search protocol, including databases searched (e.g., IEEE Xplore, Google Scholar), search terms, date range, or inclusion/exclusion criteria. Without this information, it is impossible to evaluate completeness or selection bias.
- [Literature Overview] Section 2 or 3: The manuscript should include a quantitative summary (e.g., a table or figure) of the number of relevant peer-reviewed papers identified per task category to demonstrate that the MoE-remote sensing corpus is large and distinct enough from generic MoE vision literature to justify a dedicated survey.
Minor comments (2)
- [Figures] Figure captions and notation: Ensure consistent use of symbols for routing gates and expert outputs across all architectural diagrams to improve readability.
- [References] References: Verify that all cited works are from peer-reviewed venues and that recent 2023-2024 publications are included where relevant.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the transparency and justification of our survey. We will revise the manuscript accordingly to address both major points.
Point-by-point responses
Referee: [Abstract and Introduction] The central claim that this is the 'first systematic overview' is load-bearing but unsupported by any description of the literature search protocol, including databases searched (e.g., IEEE Xplore, Google Scholar), search terms, date range, or inclusion/exclusion criteria. Without this information, it is impossible to evaluate completeness or selection bias.
Authors: We agree that a detailed description of the literature search protocol is necessary to support the claim of providing the first systematic overview. In the revised manuscript, we will add a dedicated subsection (likely in the Introduction) that explicitly describes the databases searched (IEEE Xplore, Google Scholar, arXiv, Web of Science), the search terms and Boolean combinations used (e.g., 'Mixture-of-Experts' OR MoE AND 'remote sensing' OR 'Earth observation'), the date range covered, and the inclusion/exclusion criteria applied to select papers. This will allow readers to assess completeness and potential selection bias. Revision: yes.
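For illustration only, the protocol the authors promise can be pinned down as a small, reproducible query spec. The Boolean string follows the rebuttal's wording; the date bounds are hypothetical placeholders, and real databases each have their own field syntax that is not modeled here.

```python
# Sketch of the promised search protocol as data, not a working API client.
QUERY = '("Mixture-of-Experts" OR "MoE") AND ("remote sensing" OR "Earth observation")'
DATABASES = ["IEEE Xplore", "Google Scholar", "arXiv", "Web of Science"]
DATE_RANGE = ("2004-01-01", "2025-12-31")  # placeholder bounds, not from the paper

for db in DATABASES:
    print(f"{db}: {QUERY} within {DATE_RANGE[0]}..{DATE_RANGE[1]}")
```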
Referee: [Literature Overview] Section 2 or 3: The manuscript should include a quantitative summary (e.g., a table or figure) of the number of relevant peer-reviewed papers identified per task category to demonstrate that the MoE-remote sensing corpus is large and distinct enough from generic MoE vision literature to justify a dedicated survey.
Authors: We accept this recommendation. In the revised version, we will insert a new table (or figure) in the Literature Overview section that provides a quantitative breakdown of the number of peer-reviewed papers identified per remote sensing task category (e.g., classification, segmentation, change detection, object detection, and multimodal fusion). The table will also note the proportion of papers that focus specifically on remote sensing challenges versus generic vision applications, thereby demonstrating the size and distinctiveness of the MoE-remote sensing corpus. Revision: yes.
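A minimal way to produce the proposed per-category tally, assuming the curated corpus is kept as (paper, category) records; the records below are fabricated stand-ins purely for illustration.

```python
from collections import Counter

# Fake corpus records; the survey's real paper list would go here.
corpus = [
    ("paper A", "classification"),
    ("paper B", "segmentation"),
    ("paper C", "classification"),
    ("paper D", "change detection"),
    ("paper E", "multimodal fusion"),
]
counts = Counter(category for _, category in corpus)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```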
Circularity Check
No circularity: the survey asserts a literature gap without derivations or self-referential reductions.
Full rationale
The paper is a literature survey containing no equations, predictions, fitted parameters, or first-principles derivations. Its central claim—that it supplies the 'first systematic overview' of MoE in remote sensing—is a descriptive assertion about the external corpus rather than a result obtained by reducing any quantity to its own inputs. No self-citation chains, ansatzes, or uniqueness theorems are invoked to justify internal results. The absence of prior reviews is stated without reference to the authors' own prior work as load-bearing evidence. This matches the default expectation for non-circular survey papers whose claims rest on external literature rather than internal construction.
Reference graph
Works this paper leans on
- [1] Aggarwal, V., Nagarajan, K., and Slatton, K. C. (2004). Multiple-model multiscale data fusion regulated by a mixture-of-experts network. In IGARSS 2004. 2004 IEEE International Geoscience and Remote Sensing Symposium, volume 1. IEEE.
- [2] Albughdadi, M. (2025). Lightweight metadata-aware mixture-of-experts masked autoencoder for Earth observation.
- [3] Bi, H., Feng, Y., Tong, B., Wang, M., Yu, H., Mao, Y., Chang, H., Diao, W., Wang, P., Yu, Y., Peng, H., Zhang, Y., Fu, K., and Sun, X. (2025). RingMoE: Mixture-of-modality-experts multi-modal foundation models for universal remote sensing image interpretation.
- [4] Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., and Huang, J. (2024). A survey on mixture of experts. arXiv preprint.
- [5] Cai, W., Jiang, J., Wang, F., Tang, J., Kim, S., and Huang, J. (2025). A survey on mixture of experts in large language models. IEEE Transactions on Knowledge and Data Engineering, 37(7):3896–3915.
- [6] Chai, B., Zhou, Q., Nie, X., Qiao, Q., Wu, W., Shi, Y., and Li, X. (2025). Scalable mixture-of-experts attention feature pyramid network for detection and segmentation.
- [7] Chamroukhi, F. (2017). Skew t mixture of experts. Neurocomputing, 266:390–408.
- [8] Chen, B., Chen, K., Yang, M., Zou, Z., and Shi, Z. (2025a). Heterogeneous mixture of experts for remote sensing image super-resolution. IEEE Geoscience and Remote Sensing Letters, 22:1–5.
- [9] Chen, T., Zhang, Z., Jaiswal, A. K., Liu, S., and Wang, Z. (2023a). Sparse MoE as the new dropout: Scaling dense and self-slimmable transformers. In The Eleventh International Conference on Learning Representations.
- [10] Chen, X., Yan, S., Zhu, J., Chen, C., Liu, Y., and Zhang, M. (2025b). Generalizable multispectral land cover classification via frequency-aware mixture of low-rank token experts.
- [11] Chen, Y., Cui, H., Zhang, G., Li, X., Xie, Z., Li, H., and Li, D. (2025c). SparseFormer: A credible dual-CNN expert-guided transformer for remote sensing image segmentation with sparse point annotation. IEEE Transactions on Geoscience and Remote Sensing, 63:1–16.
- [12] Chen, Y., Jiang, W., and Wang, Y. (2025d). FAMHE-Net: Multi-scale feature augmentation and mixture of heterogeneous experts for oriented object detection. Remote Sensing, 17(2):205.
- [13] Chen, Z., Deng, Y., Wu, Y., Gu, Q., and Li, Y. (2022). Towards understanding the mixture-of-experts layer in deep learning. In Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., and Oh, A., editors, Advances in Neural Information Processing Systems, volume 35, pages 23049–23062. Curran Associates, Inc.
- [14] Chen, Z., Shen, Y., Ding, M., Chen, Z., Zhao, H., Learned-Miller, E., and Gan, C. (2023b). Mod-Squad: Designing mixtures of experts as modular multi-task learners. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11828–11837.
- [15] Cheng, G., Han, J., and Lu, X. (2017). Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE, 105(10):1865–1883.
- [16] Dai, D., Deng, C., Zhao, C., Xu, R., Gao, H., Chen, D., Li, J., Zeng, W., Yu, X., Wu, Y., Xie, Z., Li, Y., Huang, P., Luo, F., Ruan, C., Sui, Z., and Liang, W. (2024). DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of th...
- [17] Dai, X., Li, Z., Li, L., Xue, S., Huang, X., and Yang, X. (2025). HyperTransXNet: Learning both global and local dynamics with a dual dynamic token mixer for hyperspectral image classification. Remote Sensing, 17(14):2361.
- [18]
- [19] Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res., 7:1–30.
- [20] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., and Solorio, T., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human...
- [21] Dimitri, V., Regina, B., and Alfonz, M. (2025). A survey on mixture of experts: Advancements, challenges, and future directions. TechRxiv Preprints.
- [22] Ding, L., Hong, D., Zhao, M., Chen, H., Li, C., Deng, J., Yokoya, N., Bruzzone, L., and Chanussot, J. (2025). A survey of sample-efficient deep learning for change detection in remote sensing: Tasks, strategies, and challenges. IEEE Geoscience and Remote Sensing Magazine, 13(3):164–189.
- [23] Do, G., Le, H., and Tran, T. (2025). SimSMoE: Toward efficient training mixture of experts via solving representational collapse. In Chiruzzo, L., Ritter, A., and Wang, L., editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 2012–2025, Albuquerque, New Mexico. Association for Computational Linguistics.
- [24] Dong, Z., Sun, Y., Jiang, H., Liu, T., and Gu, Y. (2025). PhyDAE: Physics-guided degradation-adaptive experts for all-in-one remote sensing image restoration.
- [25] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR).
- [26] Dou, P., Shen, H., Li, Z., and Guan, X. (2021). Time series remote sensing image classification framework using combination of deep learning and multiple classifiers system. International Journal of Applied Earth Observation and Geoinformation, 103:102477.
- [27] Dror, R., Baumer, G., Shlomov, S., and Reichart, R. (2018). The hitchhiker's guide to testing statistical significance in natural language processing. In Gurevych, I. and Miyao, Y., editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1383–1392, Melbourne, Australia. Association for Compu...
- [28] Du, N., Huang, Y., Dai, A. M., Tong, S., Lepikhin, D., Xu, Y., Krikun, M., Zhou, Y., Yu, A. W., Firat, O., Zoph, B., Fedus, L., Bosma, M. P., Zhou, Z., Wang, T., Wang, E., Webster, K., Pellat, M., Robinson, K., Meier-Hellstern, K., Duke, T., Dixon, L., Zhang, K., Le, Q., Wu, Y., Chen, Z., and Cui, C. (2022). GLaM: Efficient scaling of language models with...
- [29] Fedus, W., Zoph, B., and Shazeer, N. (2022). Switch Transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39.
- [30] Fu, Y., Yang, R., Liu, Z., and Ng, M. K. (2025). Adaptive mixture-of-experts distillation for cross-satellite generalizable incremental remote sensing scene classification. IEEE Transactions on Circuits and Systems for Video Technology, pages 1–1.
- [31] Fung, T. C. and Tseung, S. C. (2025). Mixture of experts models for multilevel data: Modeling framework and approximation theory. Neurocomputing, 626:129357.
- [32] Gale, T., Elsen, E., and Hooker, S. (2023). MegaBlocks: Efficient sparse training with mixture-of-experts. arXiv preprint.
- [33] Gan, W., Ning, Z., Qi, Z., and Yu, P. S. (2025). Mixture of experts (MoE): A big data perspective. arXiv preprint.
- [34] Gao, J., Li, P., Chen, Z., and Zhang, J. (2020). A survey on deep learning for multimodal data fusion. Neural Computation, 32(5):829–864.
- [35] Gao, Q., Qu, J., Li, Y., and Dong, W. (2025a). Rethinking efficient mixture-of-experts for remote sensing modality-missing classification.
- [36] Gao, S., Hua, T., Shirkavand, R., Lin, C.-H., Tang, Z., Li, Z., Yuan, L., Li, F., Zhang, Z., Ganjdanesh, A., Qian, L., Jie, X., and Hsu, Y.-C. (2025b). ToMoE: Converting dense large language models to mixture-of-experts through dynamic structural pruning.
- [37] Gu, A. and Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752.
- [38] Gu, N., Zhang, Z., Feng, Y., Chen, Y., Fu, P., Lin, Z., Wang, S., Sun, Y., Wu, H., Wang, W., and Wang, H. (2025). Elastic MoE: Unlocking the inference-time scalability of mixture-of-experts.
- [39] Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 1321–1330. JMLR.org.
- [40] Guo, S., Chen, T., Wang, P., Yan, J., and Liu, H. (2025). Confidence fusion with representation distribution and mixture of experts for multimodal radar target recognition. IEEE Transactions on Aerospace and Electronic Systems, 61(5):13251–13268.
- [41] Gupta, S., Mukherjee, S., Subudhi, K., Gonzalez, E., Jose, D., Awadallah, A. H., and Gao, J. (2022). Sparsely activated mixture-of-experts are robust multi-task learners. arXiv preprint.
- [42] Gururangan, S., Li, M., Lewis, M., Shi, W., Althoff, T., Smith, N. A., and Zettlemoyer, L. (2023). Scaling expert language models with unsupervised domain discovery.
- [43] Hanna, J., Scheibenreif, L., and Borth, D. (2025). MAPEX: Modality-aware pruning of experts for remote sensing foundation models.
- [44] Hazimeh, H., Zhao, Z., Chowdhery, A., Sathiamoorthy, M., Chen, Y., Mazumder, R., Hong, L., and Chi, E. (2021). DSelect-k: Differentiable selection in the mixture of experts with applications to multi-task learning. In Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., editors, Advances in Neural ...
- [45] He, J., Qiu, J., Zeng, A., Yang, Z., Zhai, J., and Tang, J. (2021). FastMoE: A fast mixture-of-expert training system. arXiv preprint.
- [46] He, J., Zhai, J., Antunes, T., Wang, H., Luo, F., Shi, S., and Li, Q. (2022). FasterMoE: Modeling and optimizing training of large-scale dynamic pre-trained models. In Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 120–134.
- [47] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778.
- [48] He, S., Cheng, Q., Huai, Y., Zhu, Z., and Ding, J. (2024a). Mixture-of-experts for semantic segmentation of remote sensing image. In Qin, C. and Zhou, H., editors, International Conference on Image Processing and Artificial Intelligence (ICIPAI 2024), volume 13213, page 132131Z. International Society for Optics and Photonics, SPIE.
- [49] He, W., Cai, Y., Ren, Q., Ruze, A., and Jia, S. (2025). Adaptive expert learning for hyperspectral and multispectral image fusion. IEEE Transactions on Geoscience and Remote Sensing, 63:1–15.
- [50] He, X., Yan, K., Li, R., Xie, C., Zhang, J., and Zhou, M. (2024b). Frequency-adaptive pan-sharpening with mixture of experts. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2121–2129.
- [51] Ho, N., Yang, C.-Y., and Jordan, M. I. (2022). Convergence rates for Gaussian mixtures of experts. Journal of Machine Learning Research, 23(323):1–81.
- [52] Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
- [53] Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022). LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations.
- [54] Huang, Q., An, Z., Zhuang, N., Tao, M., Zhang, C., Jin, Y., Xu, K., Xu, K., Chen, L., Huang, S., and Feng, Y. (2024). Harder task needs more experts: Dynamic routing in MoE models. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12883–12895, Bang...
- [55] Hwang, C., Cui, W., Xiong, Y., Yang, Z., Liu, Z., Hu, H., Wang, Z., Salas, R., Jose, J., Ram, P., Chau, H., Cheng, P., Yang, F., Yang, M., and Xiong, Y. (2023). Tutel: Adaptive mixture-of-experts at scale. In Song, D., Carbin, M., and Chen, T., editors, Proceedings of Machine Learning and Systems, volume 5, pages 269–287. Curran.
- [56] Hwang, R., Wei, J., Cao, S., Hwang, C., Tang, X., Cao, T., and Yang, M. (2024). Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference. In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pages 1018–1031.
- [57] Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1):79–87.
- [58] Jawahar, G., Mukherjee, S., Liu, X., Kim, Y. J., Abdul-Mageed, M., Lakshmanan, L. V. S., Awadallah, A. H., Bubeck, S., and Gao, J. (2023). AutoMoE: Heterogeneous mixture-of-experts with adaptive computation for efficient neural machine translation. In Rogers, A., Boyd-Graber, J., and Okazaki, N., editors, Findings of the Association for Computational Lingu...
- [59] Jia, Y., Ge, Y., Ling, F., Guo, X., Wang, J., Wang, L., Chen, Y., and Li, X. (2018). Urban land use mapping by combining remote sensing imagery and mobile phone positioning data. Remote Sensing, 10(3).
- [60] Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Chaplot, D. S., de las Casas, D., Bou Hanna, E., Bressand, F., et al. (2024). Mixtral of experts. arXiv preprint.
- [61] Jiang, C., Osei, K., Yeddula, S. D., Feng, D., and Ku, W.-S. (2025). Knowledge-guided adaptive mixture of experts for precipitation prediction.
- [62] Jiang, H., Peng, M., Zhong, Y., Xie, H., Hao, Z., Lin, J., Ma, X., and Hu, X. (2022). A survey on deep learning-based change detection from high-resolution remote sensing images. Remote Sensing, 14(7).
- [63] Jiang, W. and Tanner, M. A. (1999). Hierarchical mixtures-of-experts for generalized linear models: Some results on denseness and consistency. In Heckerman, D. and Whittaker, J., editors, Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, volume R2 of Proceedings of Machine Learning Research. PMLR. Reissued by PMLR ...
- [64] Jordan, M. I. and Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2):181–214.
- [65] Komatsuzaki, A., Puigcerver, J., Lee-Thorp, J., Ruiz, C. R., Mustafa, B., Ainslie, J., Tay, Y., Dehghani, M., and Houlsby, N. (2023). Sparse Upcycling: Training mixture-of-experts from dense checkpoints. In The Eleventh International Conference on Learning Representations.
- [66] Kong, Y., Yu, S., Cheng, Y., Philip Chen, C. L., and Wang, X. (2025). Joint classification of hyperspectral images and lidar data based on candidate pseudo labels pruning and dual mixture of experts. IEEE Transactions on Geoscience and Remote Sensing, 63:1–12.
- [67] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25.
- [68] Kudugunta, S., Huang, Y., Bapna, A., Krikun, M., Lepikhin, D., Luong, M.-T., and Firat, O. (2021). Beyond Distillation: Task-level mixture-of-experts for efficient inference. In Moens, M.-F., Huang, X., Specia, L., and Yih, S. W.-t., editors, Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3577–3599, Punta Cana, Dominican Republic....
- [69] Kunwar, P., Vu, M. N., Gupta, M., Abdelsalam, M., and Bhattarai, M. (2025). TT-LoRA MoE: Unifying parameter-efficient fine-tuning and sparse mixture-of-experts. arXiv preprint arXiv:2504.21190.
- [70] Kussul, N., Shelestov, A., Lavreniuk, M., Butko, I., and Skakun, S. (2016). Deep learning approach for large scale land cover mapping based on remote sensing data fusion. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 198–201.
- [71] Lee, S., Park, S., Yang, J., Kim, J., and Cha, M. (2025). Generalizable slum detection from satellite imagery with mixture-of-experts.
- [72] Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., and Chen, Z. (2021). GShard: Scaling giant models with conditional computation and automatic sharding. In International Conference on Learning Representations.
- [73] Lewis, M., Bhosale, S., Dettmers, T., Goyal, N., and Zettlemoyer, L. (2021). BASE Layers: Simplifying training of large, sparse models. In Meila, M. and Zhang, T., editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 6265–6274. PMLR.
- [74] Li, J., Kang, J., Lu, J., Fu, H., Li, Z., Liu, B., Lin, X., Zhao, J., Guan, H., Liu, H., and Liu, Z. (2025a). Dynamic gating-enhanced deep learning model with multi-source remote sensing synergy for optimizing wheat yield estimation. Frontiers in Plant Science, volume 16.
- [75] Li, J., Li, D., Savarese, S., and Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In Proceedings of the 40th International Conference on Machine Learning, ICML'23. JMLR.org.
- [76] Li, R., Ding, X., Peng, S., and Cai, F. (2025b). U-MoEMamba: A hybrid expert segmentation model for cabbage heads in complex UAV low-altitude remote sensing scenarios. Agriculture, 15(16):1723.
- [77] Li, Y., Li, X., Li, Y., Zhang, Y., Dai, Y., Hou, Q., Cheng, M.-M., and Yang, J. (2025c). SM3Det: A unified model for multi-modal remote sensing object detection.
- [78] Li, Z., Chen, X., Li, J., and Zhang, J. (2022). Pertinent multigate mixture-of-experts-based prestack three-parameter seismic inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1–15.
- [79] Liang, H., Fan, Z., Sarkar, R., Jiang, Z., Chen, T., Zou, K., Cheng, Y., Hao, C., and Wang, Z. (2022). M3ViT: Mixture-of-experts vision transformer for efficient multi-task learning with model-accelerator co-design. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA. Curran Associates Inc.
- [80] Liao, M., Chen, W., Shen, J., Guo, S., and Wan, H. (2025). HMoRA: Making LLMs more effective with hierarchical mixture of LoRA experts. In The Thirteenth International Conference on Learning Representations.