Grouter: Decoupling routing from representation for accelerated MoE training

Yuqi Xu, Rizhen Hu, Zihan Liu, Mou Sun, Kun Yuan · 2026 · arXiv 2603.06626

4 Pith papers cite this work.



citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Routers in SMoE models form geometric alignments with their experts through shared gradient directions, enabling effective specialization that auxiliary load-balancing losses tend to disrupt.

Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality

cs.AI · 2026-04-15 · conditional · novelty 7.0

Routing topology in sparse Mixture-of-Experts models does not determine asymptotic language modeling perplexity; multiple variants including cosine-similarity routing achieve statistically equivalent performance.

Step-wise Rubric Rewards for LLM Reasoning

cs.LG · 2026-05-17 · conditional · novelty 6.0

SRaR attributes rubric items to specific steps via an LLM judge, normalizes per-step scores across rollouts, and combines them with outcome rewards via a decoupled advantage estimator, yielding 3.57-point accuracy gains on Qwen3-8B across math benchmarks.

Leveraging Error Diversity in Group Rollouts for Reinforcement Learning

cs.LG · 2026-05-17

citing papers explorer


Routers Learn the Geometry of Their Experts: Geometric Coupling in Sparse Mixture-of-Experts · cs.LG · 2026-05-12 · unverdicted · none · ref 27
Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality · cs.AI · 2026-04-15 · conditional · none · ref 32
Step-wise Rubric Rewards for LLM Reasoning · cs.LG · 2026-05-17 · conditional · none · ref 22
Leveraging Error Diversity in Group Rollouts for Reinforcement Learning · cs.LG · 2026-05-17 · unreviewed · ref 30
