org/CorpusID:269804400

Toward inference-optimal mixture-of-expert large language models · 2024 · arXiv 2404.02852

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

HELLoRA selectively applies LoRA adapters to hot experts in MoE layers, using as little as 15.7% of standard LoRA parameters while improving accuracy by 9.2% on OlMoE across math, code, and alignment tasks.

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection

cs.LG · 2024-11-13 · unverdicted · novelty 6.0

Lynx exploits training-induced batch-level expert activation skews via AffinityBinning to reduce invoked experts per batch, delivering up to 1.30x throughput with under 1% accuracy loss across four model families.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

cs.CL · 2026-06-13 · unverdicted · novelty 4.0

Technical report announcing Ling-2.6 and Ring-2.6 models with hybrid linear attention, evolutionary CoT, and KPop RL for efficient agentic intelligence at scale.

citing papers explorer


HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models

cs.LG · 2026-05-11 · unverdicted · none · ref 7

Lynx: Enabling Efficient MoE Inference through Dynamic Batch-Aware Expert Selection

cs.LG · 2024-11-13 · unverdicted · none · ref 19

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

cs.CL · 2026-06-13 · unverdicted · none · ref 80
