A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
6 Pith papers cite this work.
representative citing papers
Gen-SSD improves chain-of-thought distillation by letting the student model guide the teacher's generation process through real-time selection of learnable reasoning branches, yielding 5.9-point gains over standard KD on math benchmarks.
Chain-in-Tree cuts token use, model calls, and runtime by 75-85% in LLM tree search on GSM8K and Math500 by using simple branching-necessity checks, with little accuracy loss in most cases.
StableToken introduces a multi-branch architecture with bit-wise voting to create noise-robust semantic speech tokens, achieving lower Unit Edit Distance and better SpeechLLM robustness than prior single-path tokenizers.
Discovery, via visual inspection and griz photometry, of a new collisional ring galaxy, the Eridanus Wheel (PGC 1112751), which resembles the Cartwheel and shows rings, spokes, and a bridge connecting it to an early-type galaxy at a projected separation of ~60 kpc.
citing papers explorer
- A New Cartwheel-like Collisional Ring Galaxy