Expert Routing for Communication-Efficient MoE via Finite Expert Banks

2026 · cs.LG · arXiv 2605.05278

1 Pith paper cites this work.

abstract

Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we develop a finite-bank MNIST construction using pretrained CNN experts and a discrete, data-dependent selection rule. Since the selected model belongs to a finite candidate set, the algorithmic mutual information $I(S;W)$ admits a closed-form discrete-entropy estimator from the empirical posterior $q(W|S)$. Sweeping a data-dependence parameter $\alpha$, we observe that $\widehat I(S;W)$ monotonically tracks the generalization gap, while the Xu-Raginsky bound exhibits the expected looseness. We also compare with a uniform union-bound baseline and introduce an empirical estimator of $I(X;T)$ together with a Blahut-Arimoto procedure for tracing an accuracy-rate curve over the expert bank. The proposed framework provides a practical tool for analyzing resource-aware MoE inference systems and for interpreting $I(X;T)$ and $D(R_g)$ as design proxies for efficient expert routing.
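The two estimation steps named in the abstract can be illustrated with a short NumPy sketch. It rests on two assumptions. First, the empirical posterior $q(W|S)$ is stored as a matrix whose row $i$ is the selection distribution over the expert bank for a resampled training set $s_i$, with rows weighted uniformly, so that $\widehat I(S;W) = H(\bar q(W)) - \frac{1}{n}\sum_i H(q(W|s_i))$. Second, the accuracy-rate curve is traced by the textbook Blahut-Arimoto rate-distortion iteration with 0/1 error as the distortion. The function names and the `correct` matrix are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def entropy_bits(p, axis=-1):
    """Shannon entropy in bits along `axis`, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    logp = np.log2(np.where(p > 0, p, 1.0))  # zero entries contribute log2(1) = 0
    return -np.sum(p * logp, axis=axis)


def plugin_mutual_information(q_w_given_s):
    """Plug-in estimate of I(S;W) = H(W) - H(W|S) over a finite expert bank.

    Assumption: row i of `q_w_given_s` (shape n_samples x n_experts) is the
    selection distribution q(W | s_i), and the rows are weighted uniformly.
    """
    q = np.asarray(q_w_given_s, dtype=float)
    q_w = q.mean(axis=0)  # marginal distribution over experts
    return float(entropy_bits(q_w) - entropy_bits(q, axis=1).mean())


def blahut_arimoto(p_x, dist, beta, n_iter=500, tol=1e-10):
    """One rate-distortion point (rate in bits) for Lagrange multiplier `beta`.

    p_x: source distribution, shape (n_x,); dist: distortion d(x, t), shape (n_x, n_t).
    Returns (rate, expected_distortion, q_t_given_x).
    """
    p_x = np.asarray(p_x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    tiny = np.finfo(float).tiny
    r_t = np.full(dist.shape[1], 1.0 / dist.shape[1])  # uniform initial marginal
    for _ in range(n_iter):
        # q(t|x) proportional to r(t) * exp(-beta * d(x, t)), computed in log space
        logits = np.log(np.maximum(r_t, tiny))[None, :] - beta * dist
        logits -= logits.max(axis=1, keepdims=True)
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
        r_new = p_x @ q  # r(t) = sum_x p(x) q(t|x)
        converged = np.max(np.abs(r_new - r_t)) < tol
        r_t = r_new
        if converged:
            break
    # Rate I(X;T) in bits and expected distortion under the final (q, r) pair
    ratio = np.where(q > 0, q / np.maximum(r_t, tiny)[None, :], 1.0)
    rate = float(np.sum(p_x[:, None] * q * np.log2(ratio)))
    distortion = float(np.sum(p_x[:, None] * q * dist))
    return rate, distortion, q


def accuracy_rate_curve(correct, betas):
    """(rate, accuracy) pairs for a hypothetical 0/1 correctness matrix.

    correct[x, t] = 1 if expert t classifies input x correctly; inputs weighted uniformly.
    """
    correct = np.asarray(correct, dtype=float)
    p_x = np.full(correct.shape[0], 1.0 / correct.shape[0])
    curve = []
    for beta in betas:
        rate, distortion, _ = blahut_arimoto(p_x, 1.0 - correct, beta)
        curve.append((rate, 1.0 - distortion))
    return curve
```

Larger `beta` trades rate for accuracy, and each returned pair is one point of an empirical distortion-rate curve with accuracy $= 1 - D$. For example, `accuracy_rate_curve(np.eye(2), [0.0, 2.0])` reduces to the binary Hamming case, whose first point is 0 bits at accuracy 0.5.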

representative citing papers

Sparse In-Network Learning via Shortest-Path Backpropagation and Finite-Rate Gating

cs.IT · 2026-05-22 · unverdicted · novelty 5.0

D-INL reduces training exchange by 70.4% while keeping accuracy within the standard deviation of dense INL, with finite-rate regularization cutting the estimated latent rate by 45.7% in a distributed classification experiment.
