Outrageously large neural networks: The sparsely-gated mixture-of-experts layer
2 Pith papers cite this work. Polarity classification is still indexing.
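For context on the cited work itself: the sparsely-gated mixture-of-experts layer routes each input through only the top-k of many experts, chosen by a learned gating network, so compute grows far more slowly than parameter count. Below is a minimal numpy sketch of that routing; the linear experts, shapes, and function names are illustrative assumptions, and the paper's noisy gating and load-balancing losses are omitted.

```python
import numpy as np

def sparse_moe_layer(x, expert_weights, gate_weights, k=2):
    """Sparsely-gated MoE: route the input to its top-k experts only.

    x:              (d,) input vector
    expert_weights: list of (d, d_out) matrices, one per toy linear expert
    gate_weights:   (d, num_experts) gating matrix
    k:              number of experts evaluated per input
    """
    logits = x @ gate_weights            # one gating score per expert
    top_k = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    # Softmax over the selected logits only; unselected experts get weight 0.
    g = np.exp(logits[top_k] - logits[top_k].max())
    g /= g.sum()
    # Output is the gate-weighted sum of the k chosen experts' outputs; the
    # remaining experts are never evaluated, which is the source of sparsity.
    return sum(gi * (x @ expert_weights[i]) for gi, i in zip(g, top_k))

rng = np.random.default_rng(0)
d, d_out, n_experts = 8, 4, 6
experts = [rng.normal(size=(d, d_out)) for _ in range(n_experts)]
gate = rng.normal(size=(d, n_experts))
print(sparse_moe_layer(rng.normal(size=d), experts, gate, k=2))
```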
Citing papers
- Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration: A new MoE training method integrates expert-level losses and partial online updates to improve forecasting accuracy and efficiency over standard statistical and neural models (a loss sketch follows this list).
- Information Extraction from Electricity Invoices with General-Purpose Large Language Models: Few-shot prompting lifts F1 scores above 96 percent on electricity-invoice extraction for Gemini 1.5 Pro and Mistral-small, while hyperparameter changes produce only marginal gains (a prompt-assembly sketch also follows).
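The first listing gives only a one-line summary, so the sketch below shows one plausible reading of expert-level loss integration, not the paper's actual formulation: alongside the usual gate-weighted mixture loss, each expert is also penalized on its own forecast error, so every expert receives a training signal even when it is rarely selected. The function name, array shapes, and aux_weight hyperparameter are all assumptions.

```python
import numpy as np

def moe_forecast_loss(y_true, expert_preds, gate_probs, aux_weight=0.1):
    """Combined MoE loss with per-expert terms (illustrative reading only).

    y_true:       (T,) target series values
    expert_preds: (E, T) each expert's forecast of the series
    gate_probs:   (E,) mixture weights from the gate, summing to 1
    aux_weight:   strength of the auxiliary expert-level term (assumed)
    """
    mixture = gate_probs @ expert_preds              # gate-weighted forecast
    mixture_loss = np.mean((y_true - mixture) ** 2)  # standard MoE objective
    # Expert-level integration: every expert is also trained on its own
    # error, giving each a gradient signal regardless of gate usage.
    expert_losses = np.mean((expert_preds - y_true) ** 2, axis=1)
    return mixture_loss + aux_weight * expert_losses.sum()

rng = np.random.default_rng(1)
y = rng.normal(size=24)
preds = rng.normal(size=(3, 24))
gates = np.array([0.5, 0.3, 0.2])
print(moe_forecast_loss(y, preds, gates))
```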
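The invoice paper's few-shot prompting can likewise be illustrated generically: the sketch assembles a prompt from worked extraction examples followed by the query invoice. The field schema, example invoices, and helper name are hypothetical, and no claim is made about the prompts actually used with Gemini 1.5 Pro or Mistral-small.

```python
# Hypothetical field names and example invoices; the listing does not give
# the paper's actual schema, prompts, or model API calls.
FEW_SHOT_EXAMPLES = [
    ("Invoice 2024-113 ... Total due: 84.20 EUR, period 01/03-31/03",
     '{"invoice_id": "2024-113", "total": "84.20 EUR", "period": "01/03-31/03"}'),
    ("Invoice 0456 ... Total due: 102.75 EUR, period 01/04-30/04",
     '{"invoice_id": "0456", "total": "102.75 EUR", "period": "01/04-30/04"}'),
]

def build_prompt(invoice_text: str) -> str:
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    parts = ["Extract invoice_id, total, and period from the invoice as JSON."]
    for text, answer in FEW_SHOT_EXAMPLES:
        parts.append(f"Invoice:\n{text}\nJSON:\n{answer}")
    parts.append(f"Invoice:\n{invoice_text}\nJSON:")
    return "\n\n".join(parts)

print(build_prompt("Invoice 2024-114 ... Total due: 91.00 EUR, period 01/05-31/05"))
```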