The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.
International Conference on Learning Representations (ICLR)
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 5
verdicts
UNVERDICTED: 5
representative citing papers
ViT-K uses Vision Transformers and Koopman operators to learn stable long-term spatiotemporal dynamics of coupled fluid-porous media flows from sparse data.
Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
A pipeline derives continuous signed edges from LLM stance scores on text and links discourse signals such as toxicity and extreme claims to changes in structural polarization measured by spectral and frustration scores on Reddit Brexit data.
A perturbation-based metric for XAI quality that formalizes sufficiency and necessity, paired with an adapter trained via differentiable supervision to generate causal explanations on black-box models.
citing papers explorer
- How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization
  The authors derive a Maximally Scale-Stable Parameterization (MSSP) for MoE models that achieves robust learning-rate transfer and monotonic performance gains with scale across co-scaling regimes of width, experts, and sparsity.
- ViT-K: A Few-Shot Learning Model for Coupled Fluid-Porous Media Flows with Interface Conditions
  ViT-K uses Vision Transformers and Koopman operators to learn stable long-term spatiotemporal dynamics of coupled fluid-porous media flows from sparse data.
- The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?
  Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
- Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks
  A pipeline derives continuous signed edges from LLM stance scores on text and links discourse signals such as toxicity and extreme claims to changes in structural polarization measured by spectral and frustration scores on Reddit Brexit data.
- Learning Quantifiable Visual Explanations Without Ground-Truth
  A perturbation-based metric for XAI quality that formalizes sufficiency and necessity, paired with an adapter trained via differentiable supervision to generate causal explanations on black-box models.