
arxiv: 2606.23441 · v1 · pith:3WVUZDL2new · submitted 2026-06-22 · 💻 cs.AI

Cross-Architectural Mixture-of-Experts with Adaptive Soft Routing for Plant Leaf Disease Classification

Pith reviewed 2026-06-26 08:09 UTC · model grok-4.3

classification 💻 cs.AI
keywords mixture of experts · plant leaf disease classification · adaptive soft routing · imbalanced image data · EfficientNet · DenseNet · Swin Transformer · cross-architecture ensemble

The pith

An adaptive soft Mixture-of-Experts framework routes among EfficientNet-B0, DenseNet-121 and Swin-Tiny to improve plant leaf disease classification on imbalanced data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an adaptive soft Mixture-of-Experts setup that dynamically weights three different architectures—EfficientNet-B0, DenseNet-121, and Swin-Tiny—to combine their complementary multi-scale, local, and global features for classifying plant leaf diseases. Single-architecture models often fail to handle complex backgrounds, illumination changes, and severe class imbalance at the same time. A soft gating network assigns input-dependent expert weights, and a two-stage refinement training process stabilizes optimization. On a highly imbalanced potato leaf dataset the combined model reaches 91.68 percent recall and 92.62 percent F1-score, exceeding the strongest individual expert, with comparable gains on durian and sesame datasets.

Core claim

The authors establish that cross-architectural Mixture-of-Experts with adaptive soft routing and two-stage refinement training integrates complementary features from EfficientNet-B0, DenseNet-121, and Swin-Tiny to outperform any single expert on imbalanced leaf-disease classification tasks.

What carries the argument

The adaptive soft gating mechanism that computes input-dependent weights for the three expert models.
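A minimal sketch of what input-dependent soft routing could look like. It assumes the gate emits one logit per expert and that expert outputs are mixed at the class-probability level; the paper may instead mix features. The names `softmax` and `mix_experts` and all numbers are illustrative, not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_experts(gate_logits, expert_probs):
    """Combine per-expert class probabilities with soft gate weights.

    gate_logits: one score per expert (hypothetical gate output).
    expert_probs: one class-probability list per expert.
    """
    weights = softmax(gate_logits)
    n_classes = len(expert_probs[0])
    mixed = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return weights, mixed

# Illustrative values only: three experts, three classes.
gate_logits = [2.0, 0.5, 1.0]
expert_probs = [
    [0.7, 0.2, 0.1],  # stand-in for EfficientNet-B0
    [0.3, 0.5, 0.2],  # stand-in for DenseNet-121
    [0.6, 0.1, 0.3],  # stand-in for Swin-Tiny
]
weights, mixed = mix_experts(gate_logits, expert_probs)
```

Because the weights are a softmax, every expert contributes to every prediction; the gate only changes how much.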

If this is right

  • The soft-routing approach captures both local and global representations more effectively than any one architecture alone.
  • Two-stage refinement training improves stability when class distributions are heavily skewed.
  • The same framework produces strong results across potato, durian, and sesame leaf datasets without dataset-specific redesign.
  • Dynamic expert weighting removes the need to manually select or ensemble architectures for each new crop.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The routing mechanism may generalize to other image domains that combine local texture and global structure cues.
  • Inference cost could remain close to a single model if only a subset of experts is activated per image.
  • Extending the set of experts or conditioning the gate on metadata such as crop type could further improve robustness.
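The subset-activation idea in the second bullet can be sketched as top-k sparsification of the gate weights. This is an editorial illustration, not something the paper describes; `top_k_gate` is a hypothetical helper.

```python
def top_k_gate(weights, k):
    """Keep the k largest gate weights, zero the rest, and renormalize."""
    if not 1 <= k <= len(weights):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(order[:k])
    kept_total = sum(weights[i] for i in keep)
    return [
        weights[i] / kept_total if i in keep else 0.0
        for i in range(len(weights))
    ]

# Illustrative gate output for three experts; only two stay active.
sparse = top_k_gate([0.6, 0.1, 0.3], k=2)
```

Any saving would come from skipping the forward pass of experts whose weight is zero; the gate itself still has to run on every image.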

Load-bearing premise

That the three chosen architectures supply sufficiently complementary features that soft routing can combine without introducing instability or overfitting on imbalanced data.

What would settle it

Re-training the three experts independently on the potato dataset and checking whether the MoE still exceeds the best single expert by at least 5 percent in F1-score.
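One way to run that comparison is to compute per-class recall and F1 from each model's confusion matrix and compare the averages. The sketch below assumes macro averaging, which the abstract does not specify, and uses toy confusion matrices rather than the paper's results.

```python
def per_class_recall_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    recalls, f1s = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    return recalls, f1s

def macro(values):
    """Unweighted mean, so minority classes count as much as majority ones."""
    return sum(values) / len(values)

# Toy two-class, imbalanced example (100 vs 10 samples); not the paper's data.
expert_cm = [[90, 10], [5, 5]]
moe_cm = [[92, 8], [2, 8]]
expert_recalls, expert_f1s = per_class_recall_f1(expert_cm)
moe_recalls, moe_f1s = per_class_recall_f1(moe_cm)
f1_gap = macro(moe_f1s) - macro(expert_f1s)
```

On imbalanced data the choice between macro and weighted averaging can change the size of the reported gap, so the averaging mode should be fixed before comparing.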

Figures

Figures reproduced from arXiv: 2606.23441 by Phi-Hung Hoang, Thi-Thu-Hong Phan.

Figure 2: Architecture of the MobileNet-V2.
  3.2.2. EfficientNet-B0. EfficientNet-B0 is designed for efficient feature extraction through compound scaling of network depth, width, and input resolution [27]. As shown in …

Figure 3: Architecture of the EfficientNet-B0.
  3.2.3. DenseNet-121. DenseNet-121 employs dense connectivity, where each layer receives feature maps from all preceding layers to encourage feature reuse and improved information flow [28]. As shown in …

Figure 4: Architecture of the DenseNet-121.
  3.2.4. ResNet-50. ResNet-50 leverages residual learning through bottleneck residual blocks with shortcut connections to facilitate deep feature extraction [29]. Each block combines 1 × 1 convolutions for channel compression and restoration with a central 3 × 3 convolution for spatial feature extraction, as illustrated in …

Figure 5: Architecture of the ResNet-50.
  3.2.5. Swin-Tiny. Swin-Tiny is a hierarchical Vision Transformer that captures visual representations using localized self-attention with linear computational complexity [30]. The architecture employs window-based self-attention with a shifted window mechanism to enable cross-window interaction. A hierarchical pyramid structure progressively reduces spatial resolution while…

Figure 6: Architecture of the Swin-Tiny.
  3.3. Adaptive soft mixture of experts (MoE) framework. Building upon the heterogeneous feature representations extracted by the selected expert networks, the proposed framework adopts an adaptive soft MoE architecture to integrate complementary visual information. As illustrated in …

Figure 7: Detailed architecture of the proposed MoE framework.

Figure 8: Sample images representing different classes in the potato leaf disease dataset.

Figure 9: Sample images representing different classes in the durian disease dataset.

Figure 10: Sample images representing different classes in the sesame leaf disease dataset.

Figure 11: Confusion matrices on the test set of EfficientNet-B0, DenseNet-121, …

Figure 12: Global expert importance and decision-level selection patterns in the …

Figure 13: Class-wise expert specialization patterns across disease categories.

Figure 14: Gating entropy distribution characterizing adaptive expert routing behavior.

Figure 15: Grad-CAM visualizations for representative samples across five potato leaf …

Figure 16: Performance comparison of the proposed MoE framework with existing …

Figure 17: Training and validation accuracy and loss curves of the proposed MoE …

Original abstract

Plant leaf disease classification is crucial for crop protection and precision agriculture but remains challenging under complex backgrounds, illumination variations, and severe class imbalance. Moreover, single-architecture models often fail to effectively capture both local and global representations. To address these challenges, this study proposes an adaptive soft Mixture-of-Experts (MoE) framework with cross-architectural routing that integrates EfficientNet-B0, DenseNet-121, and Swin-Tiny to exploit complementary multi-scale, local, and global features. A soft gating mechanism dynamically assigns input-dependent expert weights, while a two-stage refinement training strategy improves optimization stability and generalization. Experiments on a highly imbalanced potato leaf disease dataset achieve 91.68% recall and 92.62% F1-score, surpassing the strongest individual expert by 5.91% and 5.03%, respectively. Additional evaluations on durian and sesame leaf disease datasets yield F1-scores of 94.03% and 97.04%, demonstrating robust cross-dataset generalization and the potential of the proposed framework for reliable real-world crop health monitoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes an adaptive soft Mixture-of-Experts framework with cross-architectural routing that integrates EfficientNet-B0, DenseNet-121, and Swin-Tiny to capture complementary multi-scale, local, and global features for plant leaf disease classification. A two-stage refinement training strategy is used to improve stability. On a highly imbalanced potato leaf disease dataset the model reports 91.68% recall and 92.62% F1-score, exceeding the strongest single expert by 5.91% and 5.03% respectively; additional F1-scores of 94.03% and 97.04% are reported on durian and sesame datasets.

Significance. If the performance gains are shown to arise from input-dependent combination of complementary features rather than routing collapse or uncontrolled training effects, the work would supply a concrete example of cross-architecture MoE for imbalanced agricultural imagery. The evaluation across three datasets supplies modest evidence of generalization beyond a single collection.

major comments (2)
  1. [Abstract] Abstract: the reported metrics (91.68% recall, 92.62% F1) are presented without any description of data partitioning, imbalance handling, statistical testing, or ablation controls, preventing evaluation of whether the 5.91% and 5.03% lifts over the strongest expert are attributable to the proposed routing.
  2. [Experiments] Experiments (or equivalent results section): no statistics on gating entropy, expert utilization frequencies, or per-class routing distributions are supplied, leaving open the possibility that soft routing collapses to near-one-hot selection on the majority class and that the observed gains are not produced by genuine cross-architectural mixing.
minor comments (1)
  1. [Method] The description of the two-stage refinement training would be clearer if the loss terms, learning-rate schedules, and any explicit entropy or load-balancing regularizers were stated explicitly.
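The routing diagnostics requested in major comment 2 could be computed along the following lines. This is a sketch under the assumption that per-sample gate weights have been logged as lists of floats; `routing_diagnostics` and the example weights are hypothetical.

```python
import math

def gate_entropy(weights):
    """Shannon entropy in nats of one gate distribution; 0 means one-hot."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def routing_diagnostics(gate_weights):
    """gate_weights: one list of expert weights per sample."""
    n = len(gate_weights)
    n_experts = len(gate_weights[0])
    top1_counts = [0] * n_experts
    for w in gate_weights:
        top1_counts[w.index(max(w))] += 1
    return {
        "mean_entropy": sum(gate_entropy(w) for w in gate_weights) / n,
        "max_entropy": math.log(n_experts),  # reached by uniform weights
        "mean_weight": [sum(w[e] for w in gate_weights) / n for e in range(n_experts)],
        "top1_share": [c / n for c in top1_counts],
    }

# Illustrative gate outputs: near one-hot routing versus genuinely mixed routing.
collapsed = routing_diagnostics([[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]])
spread = routing_diagnostics([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
```

Running the same function on the samples of each class separately would give the per-class routing distributions. Mean entropy near zero together with a single dominant `top1_share` entry would indicate the collapse the referee describes.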

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our results. We address each major comment below and indicate the revisions we will make.

  1. Referee: [Abstract] Abstract: the reported metrics (91.68% recall, 92.62% F1) are presented without any description of data partitioning, imbalance handling, statistical testing, or ablation controls, preventing evaluation of whether the 5.91% and 5.03% lifts over the strongest expert are attributable to the proposed routing.

    Authors: We agree the abstract is too concise on these points. The full manuscript details stratified 5-fold cross-validation for partitioning, class-weighted cross-entropy loss for imbalance, and ablation studies comparing the MoE against single experts. We will revise the abstract to include a brief clause noting these elements and the use of statistical testing, while staying within length limits. revision: yes

  2. Referee: [Experiments] Experiments (or equivalent results section): no statistics on gating entropy, expert utilization frequencies, or per-class routing distributions are supplied, leaving open the possibility that soft routing collapses to near-one-hot selection on the majority class and that the observed gains are not produced by genuine cross-architectural mixing.

    Authors: This is a valid concern. The current manuscript does not report these diagnostics. In the revision we will add a dedicated subsection with gating entropy values, expert utilization histograms, and per-class routing distributions across the potato dataset (and the other two datasets) to demonstrate that routing remains input-dependent and does not collapse to majority-class one-hot selection. revision: yes
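For readers unfamiliar with the class-weighted cross-entropy mentioned in the first response, a minimal sketch follows. It assumes inverse-frequency class weights and normalization by the summed weights; the rebuttal does not state which weighting scheme is used, so both choices are assumptions.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class), normalized by the summed weights."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum

# Toy imbalanced batch: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
class_weights = inverse_frequency_weights(labels)
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
loss = weighted_cross_entropy(probs, labels, class_weights)
```

In this toy batch the single minority sample carries three times the weight of each majority sample, so its error contributes as much to the loss as the three majority samples combined.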

Circularity Check

0 steps flagged

No circularity: empirical framework with direct experimental claims


The paper proposes an adaptive soft MoE architecture combining three backbones and reports performance metrics (recall, F1) on three leaf-disease datasets. No equations, derivations, or parameter-fitting steps are described that would reduce a claimed prediction to a quantity defined by the same data or by self-citation. All reported gains are presented as outcomes of training and evaluation rather than analytic results derived from fitted inputs. The derivation chain is therefore self-contained and contains no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the framework implicitly rests on standard deep-learning assumptions such as the trainability of gating networks and the complementarity of the chosen backbones.

pith-pipeline@v0.9.1-grok · 5723 in / 1196 out tokens · 31889 ms · 2026-06-26T08:09:22.354387+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 28 canonical work pages · 3 internal anchors

  [1] J. Yao, S. N. Tran, S. Sawyer, et al., Machine learning for leaf disease classification: data, techniques and applications, Artificial Intelligence Review 56 (Suppl 3) (2023) 3571–3616. doi:10.1007/s10462-023-10610-4

  [2] I. Pacal, I. Kunduracioglu, M. H. Alma, et al., A systematic review of deep learning techniques for plant diseases, Artificial Intelligence Review 57 (2024) 304. doi:10.1007/s10462-024-10944-7

  [3] P. Sridhar, P. Angamuthu, Enhancing image based classification for crop disease detection using a multiclass svm approach with kernel comparison, Scientific Reports 15 (2025) 40055. doi:10.1038/s41598-025-23568-w

  [4] P. H. Hoang, T. T. H. Phan, Potato leaf disease classification in uncontrolled environments: Leveraging the synergy of handcrafted features, in: T. T. Quan, C. Sombattheera, H. A. Pham, N. T. Tran (Eds.), Multi-disciplinary Trends in Artificial Intelligence, Vol. 16354 of Lecture Notes in Computer Science, Springer, Singapore, 2026. doi:10.1007/978-98...

  [5] P. H. Hoang, T. T. H. Phan, Toward robust potato leaf disease identification: Optimizing performance via comparative feature selection, in: N. N. Dao, H. A. Le, R. Vadivel, N. T. Nguyen (Eds.), Intelligence of Things: Technologies and Applications, Vol. 281 of Lecture Notes on Data Engineering and Communications Technologies, Springer, Cham, 2026. doi:10...

  [6] N. H. Shabrina, S. Indarti, R. Maharani, D. A. Kristiyanti, Irmawati, N. Prastomo, T. Adilah M, A novel dataset of potato leaf disease in uncontrolled environment, Data in Brief 52 (2024) 109955. doi:10.1016/j.dib.2023.109955

  [7] M. Rivaldo, D. Udjulawa, Performance comparison of efficientnetb0 in potato leaf disease classification with adam and sgd, Brilliance: Research of Artificial Intelligence 5 (2) (2025) 1224–1231. doi:10.47709/brilliance.v5i2.7482

  [8] P. Mhala, A. Bilandani, S. Sharma, Enhancing crop productivity with fined-tuned deep convolution neural network for potato leaf disease detection, Expert Systems with Applications 267 (2025) 126066. doi:10.1016/j.eswa.2024.126066

  [9] V. Meghana, S. Akanksha, P. K. Reddy, M. A. Jabbar, Potato leaf disease classification using vision transformers, in: Proceedings of the 2025 International Conference on Computing and Communications (COMPUTINGCON), 2025, pp. 1–7. doi:10.1109/COMPUTINGCON64838.2025.11377413

  [10] S. Murugavalli, R. Gopi, Plant leaf disease detection using vision transformers for precision agriculture, Scientific Reports 15 (2025) 22361. doi:10.1038/s41598-025-05102-0

  [11] I. Tabassum, V. Nunavath, Transformer-based multi-class classification of bangladeshi rice varieties using image data, Applied Sciences 16 (3) (2026) 1279. doi:10.3390/app16031279

  [12] T. Apleni, F. O. Isinkaye, M. O. Olusanya, Ensemble-based feature fusion for accurate plant disease classification using pre-trained models, Scientific Reports 15 (2025) 41925. doi:10.1038/s41598-025-25927-z

  [13] J. H. Sinamenye, A. Chatterjee, R. Shrestha, Potato plant disease detection: Leveraging hybrid deep learning models, BMC Plant Biology 25 (2025) 647. doi:10.1186/s12870-025-06679-4

  [14] B. Ahmad, Alamsyah, Ensemble learning-based potato leaf disease classification using densenet201 and mobilenetv2, Journal of Information System Exploration and Research 4 (1) (2026) 1–8

  [15] A. R. Al-Shamasneh, Potato leaves disease classification based on generalized jones polynomials image features, MethodsX 14 (2025) 103421. doi:10.1016/j.mex.2025.103421

  [16] Z. Li, S. M. Javidan, Ml-based approach to potato diseases diagnosis using image processing and whale optimization algorithm for feature selection, Smart Agricultural Technology 12 (2025) 101282. doi:10.1016/j.atech.2025.101282

  [17] P.-H. Hoang, N.-T. Trinh, V.-M. Tran, T.-T.-H. Phan, Multi-objective hybrid knowledge distillation for efficient deep learning in smart agriculture (2025). arXiv:2512.22239

  [18] J. Zhang, X. Yang, X. Fu, B. Wang, H. Li, Ldl-mobilenetv3s: an enhanced lightweight mobilenetv3-small model for potato leaf disease diagnosis through multi-module fusion, Frontiers in Plant Science 16 (2025). doi:10.3389/fpls.2025.1656731

  [19] N. Aishwarya, S. Cheran, S. S. Gnaneswar, V. Rathinasamy, Transformer-based deep learning approach for potato leaf disease classification, in: Proceedings of the 2025 International Conference on Sustainability, Innovation & Technology (ICSIT), 2025, pp. 1–6. doi:10.1109/ICSIT65336.2025.11295448

  [20] H. K. Rofiqi, E. Noersasongko, S. Winarno, M. A. Soeleman, Augmentation strategy and hyperparameter optimization using optuna for potato leaf disease classification in uncontrolled environment, Jurnal Teknik Informatika (JUTIF) 7 (2) (2026) 743–759

  [21] K. Mandhani, S. Singh, A. Chandrawanshi, Multi-crop disease detection using deep learning with class imbalance handling, International Journal of Research Publication and Reviews 7 (4) (2026) 2906–2913

  [22] L. E. Raya-González, V. A. Alcántar-Camarena, J. Cepeda-Negrete, A. Bustos-Gaytán, M. del Rosario Abraham-Juárez, N. Saldaña-Robles, Application of mixture of experts models for the recognition of pests and diseases in maize, Array 27 (2025) 100502. doi:10.1016/j.array.2025.100502

  [23] Z. Salman, A. Muhammad, D. Han, Plant disease classification in the wild using vision transformers and mixture of experts, Frontiers in Plant Science 16 (2025). doi:10.3389/fpls.2025.1522985

  [24] Q. Lu, W. Zhao, J. Chen, X. Chen, L. Zhang, Uncertainty mixture of experts model for long tail crop type mapping, Remote Sensing 17 (22) (2025) 3752. doi:10.3390/rs17223752

  [25] S. Xu, Z. He, X. Liang, H. Lu, Robust prediction of soluble solids content in pomelo across storage time using a gated mixture-of-experts model with near-infrared transmittance spectra, Talanta 306 (2026) 129758. doi:10.1016/j.talanta.2026.129758

  [26] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, Mobilenetv2: Inverted residuals and linear bottlenecks (2019). arXiv:1801.04381

  [27] M. Tan, Q. V. Le, Efficientnet: Rethinking model scaling for convolutional neural networks (2020). arXiv:1905.11946

  [28] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected convolutional networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2261–2269. doi:10.1109/CVPR.2017.243

  [29] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition (2015). arXiv:1512.03385

  [30] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9992–10002

  [31] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, International Journal of Computer Vision 128 (2) (2019) 336–359. doi:10.1007/s11263-019-01228-7

  [32] T. Nguyen, Image dataset of ten durian diseases captured in real-field conditions from a family orchard in vinh long, vietnam, Mendeley Data, V1 (2025). doi:10.17632/mhjwyb5p48.1

  [33] S. A. Rahman, M. H. Hena, Applying convolutional neural networks for early detection of diseases in sesame leaf, Mendeley Data (2025). doi:10.17632/c64jt5gkzm.1

  [34] C.-Y. Chang, C.-C. Lai, Potato leaf disease detection based on a lightweight deep learning model, Machine Learning and Knowledge Extraction 6 (4) (2024) 2321–2335. doi:10.3390/make6040114

  [35] M. H. Tariq, H. Sultan, R. Akram, S. G. Kim, J. S. Kim, M. Usman, H. A. H. Gondal, J. Seo, Y. H. Lee, K. R. Park, Estimation of fractal dimensions and classification of plant disease with complex backgrounds, Fractal and Fractional 9 (5) (2025) 315. doi:10.3390/fractalfract9050315

  [36] A. Mondal, A. Chatterjee, N. Avazov, A hybrid cnn-transformer model with adaptive activation function for potato leaf disease classification, Scientific Reports 16 (2026) 4282. doi:10.1038/s41598-025-34406-4

  [37] G. Sangar, V. Rajasekar, Optimized classification of potato leaf disease using efficientnet-lite and ke-svm in diverse environments, Frontiers in Plant Science 16 (2025) 1499909. doi:10.3389/fpls.2025.1499909