Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
Pith reviewed 2026-05-10 14:19 UTC · model grok-4.3
The pith
Counterfactual routing activates dormant experts in MoE models to improve factual accuracy by 3.1 percent without added compute.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that layer-wise perturbation analysis combined with the Counterfactual Expert Impact metric can identify and activate dormant experts that are causally decisive for long-tail factual associations, thereby mitigating hallucinations in MoE models while preserving the original total activation count and inference budget.
What carries the argument
Counterfactual Expert Impact (CEI) metric obtained via layer-wise virtual ablation to measure each expert's causal contribution to the output.
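The virtual-ablation idea behind CEI can be sketched in a few lines: score each expert by how much a layer's output shifts when that expert's gate is zeroed and the remaining gates renormalized. This is a toy illustration under our own assumptions; the function name, the renormalization step, and the L2 distance are stand-ins, not the paper's exact definitions.

```python
# Hypothetical sketch of a CEI-style score: the impact of virtually
# ablating one expert from a layer's gated mixture of expert outputs.
import numpy as np

def cei_scores(expert_outputs: np.ndarray, gates: np.ndarray) -> np.ndarray:
    """expert_outputs: (n_experts, d); gates: (n_experts,), summing to 1."""
    mixture = gates @ expert_outputs                   # original layer output
    scores = np.zeros(len(gates))
    for e in range(len(gates)):
        g = gates.copy()
        g[e] = 0.0                                     # virtually ablate expert e
        if g.sum() > 0:
            g = g / g.sum()                            # renormalize the rest
        ablated = g @ expert_outputs                   # counterfactual output
        scores[e] = np.linalg.norm(mixture - ablated)  # output shift = impact
    return scores

rng = np.random.default_rng(0)
outs = rng.normal(size=(4, 8))
gates = np.array([0.5, 0.3, 0.15, 0.05])
print(cei_scores(outs, gates))
```

Note that an expert with a low gating score can still have a large ablation impact, which is exactly the "dormant specialist" signature the paper targets.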
If this is right
- Factual accuracy rises by 3.1 percent on average across TruthfulQA, FACTOR, and TriviaQA.
- Total expert activations per token stay constant, so inference cost remains unchanged.
- The method yields a superior performance-compute frontier compared with static increases in the number of active experts.
- Existing trained MoE models can use the technique at inference time with no retraining required.
Where Pith is reading between the lines
- Routing biases may be a broader source of unreliability across other sparse conditional-computation architectures.
- The approach could be combined with targeted continued pretraining on long-tail data for additive gains.
- Similar dormant-expert effects may appear in non-factual tasks such as reasoning or code generation.
- The technique suggests that hallucinations often reflect under-use of existing knowledge rather than absence of that knowledge.
Load-bearing premise
Layer-wise perturbation analysis and the Counterfactual Expert Impact metric reliably identify causally important dormant experts for long-tail knowledge without introducing new errors or degrading other capabilities.
What would settle it
The claim would be undermined if, on a held-out factual benchmark, applying the suggested expert activations produced no accuracy gain or caused measurable degradation in fluency or non-factual tasks.
Original abstract
Sparse Mixture-of-Experts (MoE) models have achieved remarkable scalability, yet they remain vulnerable to hallucinations, particularly when processing long-tail knowledge. We identify that this fragility stems from static Top-$k$ routing: routers tend to favor high-frequency patterns over rare factual associations. Consequently, ``specialist experts'' possessing critical long-tail knowledge are often assigned low gating scores and remain ``dormant'' -- under-prioritized for specific tokens despite their proven causal importance on other inputs. To address this, we propose Counterfactual Routing (CoR), a training-free inference framework designed to awaken these dormant experts. CoR integrates layer-wise perturbation analysis with the Counterfactual Expert Impact (CEI) metric to dynamically shift computational resources from syntax-dominant to knowledge-intensive layers while maintaining a constant total activation count, effectively retrieving causally decisive experts via virtual ablation. Extensive experiments on TruthfulQA, FACTOR, and TriviaQA demonstrate that CoR improves factual accuracy by 3.1\% on average without increasing the inference budget, establishing a superior Pareto frontier compared to static scaling strategies.
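The constant-budget reallocation the abstract describes can be sketched as follows: instead of a fixed top-k per layer, spend the same total budget (num_layers × k) on the globally highest-priority (layer, expert) pairs, so knowledge-intensive layers may activate more experts while syntax-dominant layers activate fewer. The global-top-k selection rule here is our assumption; the priorities stand in for CEI-adjusted gating scores, which the page does not reproduce.

```python
# Minimal sketch: reallocate a fixed activation budget across layers
# while keeping the total number of activated experts unchanged.
import numpy as np

def reallocate(priorities: np.ndarray, k: int) -> np.ndarray:
    """priorities: (num_layers, n_experts) scores. Returns a boolean
    activation mask with exactly num_layers * k activations in total."""
    num_layers, n_experts = priorities.shape
    budget = num_layers * k                    # same total as per-layer top-k
    flat = priorities.ravel()
    top = np.argsort(flat)[-budget:]           # globally best (layer, expert) pairs
    mask = np.zeros_like(flat, dtype=bool)
    mask[top] = True
    return mask.reshape(num_layers, n_experts)

p = np.array([[0.9, 0.8, 0.5, 0.1],    # knowledge-heavy layer
              [0.4, 0.2, 0.1, 0.05]])  # syntax-dominant layer
mask = reallocate(p, k=2)
print(mask.sum())         # → 4 (total activations unchanged)
print(mask.sum(axis=1))   # → [3 1] (per-layer counts shift)
```

This makes concrete why the Pareto claim hinges on the accounting: the budget is preserved only if the priority scores themselves cost no extra forward passes to compute.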
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Counterfactual Routing (CoR), a training-free inference-time framework for sparse MoE models. It identifies dormant specialist experts for long-tail knowledge via layer-wise perturbation analysis and the Counterfactual Expert Impact (CEI) metric, then dynamically reallocates a fixed activation budget away from syntax-dominant layers to activate those experts. Experiments on TruthfulQA, FACTOR, and TriviaQA are reported to yield a 3.1% average factual-accuracy gain while preserving inference budget, thereby improving the accuracy-compute Pareto frontier relative to static scaling.
Significance. If the central empirical claim holds under rigorous controls, the work would be significant: it offers a practical, post-training intervention that improves factual reliability in already-deployed MoE models without extra compute or retraining. The training-free character and explicit constant-activation-count constraint are notable strengths that distinguish it from scaling or fine-tuning approaches.
major comments (3)
- [§3] §3 (Method), paragraph on virtual ablation and CEI computation: the claim that layer-wise perturbation analysis can be performed while strictly preserving the total activation count is load-bearing for the Pareto-superiority assertion, yet the text supplies no concrete accounting of forward-pass cost, amortization across tokens, or elimination of extra passes. If any per-layer analysis requires even one additional forward pass, the constant-budget guarantee collapses.
- [§4] §4 (Experiments), results on TruthfulQA/FACTOR/TriviaQA: the reported 3.1% average gain is presented without baseline details, number of runs, statistical significance tests, or controls for routing variance. This information is required to evaluate whether the observed improvement is attributable to CEI-guided expert activation rather than incidental routing changes.
- [§3.1] §3.1, definition of CEI: the metric is introduced as identifying 'causally decisive' experts, but no ablation or correlation analysis is shown demonstrating that CEI scores specifically predict long-tail factual recall (as opposed to general performance shifts). This correlation is necessary to support the causal interpretation underlying the method.
minor comments (2)
- [Abstract] Abstract and §1: the phrase 'without increasing the inference budget' is repeated but never quantified (e.g., FLOPs or activation count per token); a single sentence or table entry stating the exact constant activation count used would improve clarity.
- [§3] Notation: the manuscript introduces 'Counterfactual Expert Impact (CEI)' without an explicit equation number or boxed definition; adding Eq. (X) for CEI would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment in detail below, indicating the revisions we plan to make.
Point-by-point responses
-
Referee: [§3] §3 (Method), paragraph on virtual ablation and CEI computation: the claim that layer-wise perturbation analysis can be performed while strictly preserving the total activation count is load-bearing for the Pareto-superiority assertion, yet the text supplies no concrete accounting of forward-pass cost, amortization across tokens, or elimination of extra passes. If any per-layer analysis requires even one additional forward pass, the constant-budget guarantee collapses.
Authors: We acknowledge that the original manuscript does not provide a detailed breakdown of the forward-pass costs for the layer-wise perturbation analysis. The virtual ablation is computed by perturbing the router decisions post-forward-pass without requiring additional model evaluations, allowing us to maintain the exact activation count by reallocating the fixed budget of k experts per layer. To fully address this concern and support the constant-budget claim, we will add a dedicated paragraph and pseudocode in §3 explaining the amortization across tokens and confirming zero extra passes. This will include a table comparing the computational profile to standard inference. revision: yes
-
Referee: [§4] §4 (Experiments), results on TruthfulQA/FACTOR/TriviaQA: the reported 3.1% average gain is presented without baseline details, number of runs, statistical significance tests, or controls for routing variance. This information is required to evaluate whether the observed improvement is attributable to CEI-guided expert activation rather than incidental routing changes.
Authors: We agree with the referee that more rigorous experimental details are needed. In the revised manuscript, we will update §4 to report: full baseline comparisons including vanilla Top-k MoE, results averaged over 5 runs with standard deviations, p-values from statistical tests (paired t-test), and a control experiment with random expert reallocation to rule out incidental effects. These changes will allow readers to better assess the source of the 3.1% improvement. revision: yes
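The control the authors propose here (CEI-guided routing versus a random-reallocation baseline over matched runs) can be sketched with a paired t-test. The accuracies below are illustrative numbers, not results from the paper.

```python
# Sketch of the proposed significance check: paired t-statistic between
# per-run accuracies of CEI-guided and random expert reallocation,
# matched by run/seed, to rule out incidental routing effects.
import math

def paired_t(a, b):
    """Paired t-statistic for matched samples a and b (same runs)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

cei_runs    = [0.712, 0.705, 0.718, 0.709, 0.714]  # hypothetical accuracies
random_runs = [0.684, 0.689, 0.681, 0.690, 0.686]
t = paired_t(cei_runs, random_runs)
print(round(t, 2))  # → 6.88
```

A large |t| over five paired runs would support attributing the gain to CEI-guided activation rather than routing noise, which is the point of the referee's request.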
-
Referee: [§3.1] §3.1, definition of CEI: the metric is introduced as identifying 'causally decisive' experts, but no ablation or correlation analysis is shown demonstrating that CEI scores specifically predict long-tail factual recall (as opposed to general performance shifts). This correlation is necessary to support the causal interpretation underlying the method.
Authors: We recognize the importance of validating that CEI specifically targets long-tail factual recall. The current text derives CEI from perturbation analysis but lacks supporting ablations. We will revise §3.1 to include correlation plots between CEI scores and factual accuracy on long-tail subsets, as well as an ablation study contrasting CEI-based routing with frequency-based and random routing. This will provide evidence for the causal role in mitigating hallucinations on rare knowledge. revision: yes
Circularity Check
Training-free empirical method with no self-referential derivations or fitted predictions
Full rationale
The paper presents CoR as a training-free inference-time framework that uses layer-wise perturbation analysis and the CEI metric to reallocate a fixed activation budget toward dormant experts. The 3.1% factual accuracy gain is reported as an empirical outcome on external benchmarks (TruthfulQA, FACTOR, TriviaQA). No equations, predictions, or first-principles claims are shown to reduce by construction to fitted parameters, self-citations, or renamed inputs. The central claims rest on benchmark measurements rather than tautological definitions or load-bearing self-citations. This is the expected non-circular outcome for a method paper whose value is demonstrated through independent evaluation.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Static top-k routing systematically under-activates experts that hold long-tail factual knowledge.
- Domain assumption: Layer-wise virtual ablation can isolate the causal contribution of individual experts to factual outputs.
invented entities (1)
- Counterfactual Expert Impact (CEI) metric (no independent evidence)
Forward citations
Cited by 1 Pith paper
-
When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models
Standard top-k routers in MoE language models often select suboptimal routes for difficult tokens, and updating only the final router layer raises pass@K on AIME and HMMT benchmarks across multiple models.
Reference graph
Works this paper leans on
-
[1]
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
arXiv:2405.04434
-
[2]
Measuring Massive Multitask Language Understanding
arXiv:2009.03300
-
[3]
Training Report of TeleChat3-MoE
arXiv:2512.24157
-
[4]
gpt-oss-120b & gpt-oss-20b Model Card
-
[5]
Technical Report of TeleChat2, TeleChat2.5 and T1
arXiv:2507.18013
-
[6]
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
arXiv:2309.01219