Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer

Patrick Huber, Ernie Chang, Wei Wen, Igor Fedorov, Tarek Elgamal, Hanxian Huang, Naveen Suda, Chinnadhurai Sankar, Vish Vogeti, Yanghan Wang, et al · 2025 · arXiv 2511.06719

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MobileMoE: Scaling On-Device Mixture of Experts

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

MobileMoE introduces on-device MoE LLMs that match dense models with 2-4x fewer FLOPs and provide efficient smartphone inference.

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · novelty 6.0

MobileLLM-Flash creates 350M-1.4B parameter LLMs via latency-guided search and attention skipping, delivering up to 1.8x faster prefill and 1.6x faster decode on mobile CPUs with comparable or better quality.

citing papers explorer

Showing 2 of 2 citing papers.

MobileMoE: Scaling On-Device Mixture of Experts

cs.LG · 2026-05-26 · unverdicted · none · ref 24

MobileMoE introduces on-device MoE LLMs that match dense models with 2-4x fewer FLOPs and provide efficient smartphone inference.

MobileLLM-Flash: Latency-Guided On-Device LLM Design for Industry Scale Deployment

cs.LG · 2026-03-16 · unverdicted · none · ref 7

MobileLLM-Flash creates 350M-1.4B parameter LLMs via latency-guided search and attention skipping, delivering up to 1.8x faster prefill and 1.6x faster decode on mobile CPUs with comparable or better quality.
