Unveiling super experts in mixture-of-experts large language models

Zunhai Su, Qingyuan Li, Hao Zhang, YuLei Qian, Yuchen Xie, Kehong Yuan · 2025 · arXiv 2507.23279

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.

BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

BitsMoE uses SVD and activation-aware ILP bit allocation to quantize MoE LLMs at ultra-low bit widths with less accuracy degradation than GPTQ.

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

OScaR mitigates token norm imbalance via canalized rotation and omni-token scaling to enable near-lossless INT2 KV cache quantization with up to 3x decoding speedup and 5.3x memory reduction.

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

cs.AR · 2026-05-07 · unverdicted · novelty 6.0

DySHARP accelerates MoE expert parallelism via dynamic multimem addressing and token-centric kernel fusion to cut redundant traffic and deliver up to 1.79x speedup over prior in-switch solutions.

Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression

cs.LG · 2026-06-02 · unverdicted · novelty 5.0

Tensor decompositions face practical limits in large-scale LLM compression due to a mismatch between the assumed shared subspaces and heterogeneous model representations.

SnapMLA: Efficient Long-Context MLA Decoding via Hardware-Aware FP8 Quantized Pipelining

cs.LG · 2026-02-11 · conditional · novelty 5.0

SnapMLA achieves up to 1.91x higher throughput in long-output MLA decoding using FP8 quantization and specialized kernels while keeping benchmark quality near the BF16 baseline.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.
