citation dossier
Michael I. Jordan and Robert A. Jacobs
why this work matters in Pith
Pith has found this work cited in 4 reviewed papers. Its strongest current cluster is cs.LG (2 papers). The largest review-status bucket among citing papers is UNVERDICTED (4 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.
verdicts
UNVERDICTED: 4

representative citing papers
Multimodal contrastive learning using multilinear products is fragile to a single bad modality, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
A new MoE training method integrates expert-level losses and partial online updates to improve forecasting accuracy and efficiency over standard statistical and neural models.
DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% compute by using fine-grained expert segmentation plus shared experts.
citing papers explorer
- Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts
  Boundary mass in MoE is linear in slab width under smoothness and transversality, so the zero-temperature limit is governed by a thin geometric layer around routing interfaces rather than the full input space.
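A minimal formalization of the linearity claim, under assumed notation (router logits g_i, g_j, slab S_ε, input measure μ; none of this is taken verbatim from the paper):

```latex
% Sketch, not the paper's exact statement. Assume C^1 router logits
% g_i, g_j : R^d -> R (smoothness) and transversality:
% \nabla(g_i - g_j) \neq 0 on the interface \{g_i = g_j\}, so the
% interface is a (d-1)-dimensional hypersurface.
\[
  S_\varepsilon \;=\; \bigl\{\, x \in \mathbb{R}^d :
      \lvert g_i(x) - g_j(x) \rvert \le \varepsilon \,\bigr\}
\]
% By the coarea formula, for an input density p,
\[
  \mu(S_\varepsilon)
  \;=\; \int_{-\varepsilon}^{\varepsilon}
        \int_{\{g_i - g_j = t\}}
        \frac{p(x)}{\lVert \nabla (g_i - g_j)(x) \rVert}
        \, d\mathcal{H}^{d-1}(x)\, dt
  \;=\; \Theta(\varepsilon).
\]
% Boundary mass is linear in slab width, so as the routing temperature
% goes to zero, soft and hard routing differ only on this thin layer.
```

Smoothness makes the coarea formula applicable, and transversality keeps the inner surface integral finite and nonzero, which is where the Θ(ε) rate comes from.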
- Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning
  Multimodal contrastive learning using multilinear products is fragile to a single bad modality, and a gated version improves top-1 retrieval accuracy on synthetic and real trimodal data.
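A minimal numpy sketch of the fragility argument; trilinear_score, gated_score, and the norm-based gate are illustrative assumptions, not the paper's construction:

```python
# Sketch, not the paper's code: why a trilinear (multiplicative) score
# collapses under one bad modality, and a norm-based gate as a fallback.
import numpy as np

rng = np.random.default_rng(0)
d = 64
a, b = rng.normal(size=d), rng.normal(size=d)
c_good = a + b + 0.1 * rng.normal(size=d)  # informative third modality
c_bad = np.zeros(d)                        # corrupted/missing modality

def trilinear_score(a, b, c):
    # Multilinear product: sum_i a_i * b_i * c_i. A near-zero modality
    # drives the whole score to zero, regardless of the other two.
    return float(np.sum(a * b * c))

def gated_score(a, b, c, tau=1e-3):
    # Hypothetical gate: when modality c looks degenerate (tiny norm),
    # fall back to the bimodal dot product a . b.
    gate = 1.0 - np.exp(-np.linalg.norm(c) / (tau * np.sqrt(len(c))))
    return gate * trilinear_score(a, b, c) + (1.0 - gate) * float(a @ b)

print(trilinear_score(a, b, c_good))  # nonzero: all three contribute
print(trilinear_score(a, b, c_bad))   # exactly 0.0: the collapse
print(gated_score(a, b, c_bad))       # falls back to a . b, signal kept
```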
- Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
  A new MoE training method integrates expert-level losses and partial online updates to improve forecasting accuracy and efficiency over standard statistical and neural models.
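A hedged sketch of one way to combine a gate-weighted mixture loss with expert-level losses and a partial online step; the loss forms and the "update only the routed expert" reading are assumptions, not the paper's algorithm:

```python
# Sketch, not the paper's algorithm: a gate-weighted mixture loss plus
# expert-level losses, and a "partial online" step that adapts only the
# routed expert.
import torch
import torch.nn.functional as F

d_in, d_out, n_experts = 8, 1, 4
experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_in, d_out) for _ in range(n_experts)])
gate = torch.nn.Linear(d_in, n_experts)
opt = torch.optim.SGD(
    [*experts.parameters(), *gate.parameters()], lr=1e-2)

def training_step(x, y, aux_weight=0.5):
    w = torch.softmax(gate(x), dim=-1)                  # (B, K) routing
    outs = torch.stack([e(x) for e in experts], dim=1)  # (B, K, d_out)
    mix = (w.unsqueeze(-1) * outs).sum(dim=1)           # mixture forecast
    mixture_loss = F.mse_loss(mix, y)
    # Expert-level losses: each expert is scored on its own forecast,
    # weighted by how much the gate trusted it for this input.
    per_expert = ((outs - y.unsqueeze(1)) ** 2).mean(dim=-1)  # (B, K)
    expert_loss = (w.detach() * per_expert).mean()
    opt.zero_grad()
    (mixture_loss + aux_weight * expert_loss).backward()
    opt.step()

def partial_online_update(x, y, lr=1e-2):
    # One reading of "partial online updates": when a new point arrives,
    # adapt only the top-1 expert; the gate and the rest stay frozen.
    with torch.no_grad():
        k = int(torch.softmax(gate(x), dim=-1).argmax())
    expert = experts[k]
    loss = F.mse_loss(expert(x), y)
    grads = torch.autograd.grad(loss, list(expert.parameters()))
    with torch.no_grad():
        for p, g in zip(expert.parameters(), grads):
            p -= lr * g

training_step(torch.randn(16, d_in), torch.randn(16, d_out))
partial_online_update(torch.randn(1, d_in), torch.randn(1, d_out))
```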
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% of the compute by combining fine-grained expert segmentation with shared experts.
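A toy forward pass showing the two named ingredients, shared always-on experts plus many small routed experts with a larger top-k; all sizes, names, and the dense per-row loop are illustrative, not DeepSeekMoE's implementation:

```python
# Toy sketch of fine-grained expert segmentation plus shared experts;
# configuration here is assumed, not DeepSeekMoE's.
import torch

d_model, n_routed, n_shared, top_k = 32, 16, 2, 4
hidden = d_model // 2  # fine-grained: each expert is a narrow FFN

def make_expert():
    return torch.nn.Sequential(
        torch.nn.Linear(d_model, hidden), torch.nn.GELU(),
        torch.nn.Linear(hidden, d_model))

routed = torch.nn.ModuleList([make_expert() for _ in range(n_routed)])
shared = torch.nn.ModuleList([make_expert() for _ in range(n_shared)])
router = torch.nn.Linear(d_model, n_routed)

def moe_forward(x):                                # x: (B, d_model)
    # Shared experts run on every token, absorbing common knowledge so
    # the routed experts are free to specialize.
    out = sum(e(x) for e in shared)
    # Fine-grained segmentation: many small experts with a larger top-k
    # gives far more expert combinations than a few big experts.
    scores = torch.softmax(router(x), dim=-1)      # (B, n_routed)
    topv, topi = scores.topk(top_k, dim=-1)        # (B, top_k) each
    picked = []
    for b in range(x.shape[0]):
        row = x[b:b + 1]
        picked.append(sum(topv[b, j] * routed[int(topi[b, j])](row)[0]
                          for j in range(top_k)))
    return out + torch.stack(picked)

print(moe_forward(torch.randn(3, d_model)).shape)  # torch.Size([3, 32])
```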