arXiv: 2606.25092 · v1 · pith:3W4ANZKF · submitted 2026-06-23 · 💻 cs.LG

How Modular Is a Frontier Mixture-of-Experts? A Pre-registered Causal Test in Which Apparent Expert Modularity Mostly Dissolves

Pith reviewed 2026-06-26 00:05 UTC · model grok-4.3

classification 💻 cs.LG
keywords: mixture of experts · modularity · ablation · causal testing · language models · expert routing · selectivity

The pith

Causal ablation tests on a frontier MoE model show that only one of six pre-registered expert families acts as a robust selective module.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether experts in a large sparse Mixture-of-Experts model form functional modules linked to specific languages or capabilities. Researchers first map token routing to build an atlas of expert families, then pre-register six hypotheses that tie each family to a performance axis. They ablate each family at inference time and compare the effect to a size-matched random-expert control, checking whether the drop is selective to the matching axis. Only the Arabic-language family meets the strict selectivity criterion across metrics and a held-out corpus; the other families show causal effects that change with the chosen measurement and statistical threshold.

Core claim

Robust functional modularity is rare and measurement-dependent. Of six pre-registered families, only one, the Arabic-language family, is a clean selective module that survives an independent corpus and a conservative statistical bar (1/6; a more permissive pre-registered point rule admits 3/6, but that count is threshold-sensitive). Every other family has a real causal effect yet fails selectivity, and its apparent modularity flips with the measurement: with the corpus, the metric, and the statistical bar.

What carries the argument

Pre-registered causal ablation of expert families identified via a routing-mass atlas, tested for selective performance drops on hypothesized axes against size-matched random controls.
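
The selectivity rule quoted in the abstract (worst off-target effect at most one third of the on-target effect) can be sketched as a small check. This is a minimal illustration, not the authors' code: the function name `selectivity_verdict`, the axis names, and all numbers are hypothetical, and the sketch assumes each effect has already been measured relative to the size-matched random-expert control.

```python
from typing import Dict


def selectivity_verdict(effects: Dict[str, float], target_axis: str,
                        max_ratio: float = 1.0 / 3.0) -> bool:
    """Return True if ablating a family selectively breaks its own axis.

    `effects` maps axis name -> performance drop caused by the ablation.
    Assumption of this sketch: drops are already net of the random control.
    """
    on_target = effects[target_axis]
    if on_target <= 0:
        return False  # no on-target damage, so nothing to be selective about
    off_target = [v for axis, v in effects.items() if axis != target_axis]
    worst_off = max(off_target, default=0.0)
    # Selective only if the worst collateral damage is small relative to
    # the damage on the hypothesized axis.
    return worst_off <= max_ratio * on_target


# Hypothetical effect sizes, for illustration only.
selective_family = {"arabic": 0.90, "code": 0.10, "math": 0.05}
diffuse_family = {"code": 0.60, "arabic": 0.40, "math": 0.30}

print(selectivity_verdict(selective_family, "arabic"))  # True
print(selectivity_verdict(diffuse_family, "code"))      # False
```

The second family has a real on-target effect but still fails, which is the pattern the paper reports for five of the six families.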

If this is right

  • Ablation-based claims of expert modularity are reliable only when the corpus, metric, and statistical bar are held fixed.
  • The Arabic-language family produces selective effects on Arabic-related tasks that survive multiple controls.
  • Apparent modularity for the other five families reverses when the evaluation setup changes.
  • The method recovers published disjoint structure in a positive-control model, confirming it can detect modularity when present.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Existing analyses of MoE expert specialization may need re-examination if they rely on single-metric or single-corpus ablations.
  • The pattern observed here could be tested on additional frontier MoE models to determine how widespread the measurement dependence is.

Load-bearing premise

The routing-mass atlas accurately identifies candidate functional families whose ablation will produce measurable, selective effects on the corresponding axes.
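
To make the premise concrete, here is one way a routing-mass atlas could be built. The page does not give the paper's exact definition, so this sketch assumes routing mass is the share of top-k routing slots each expert receives within a category; `routing_mass_atlas`, `candidate_family`, and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def routing_mass_atlas(records: Iterable[Tuple[str, List[int]]],
                       num_experts: int) -> Dict[str, List[float]]:
    """Per-category share of routing slots that went to each expert.

    Each record is (category, selected_expert_ids) for one token.
    """
    counts = defaultdict(lambda: [0] * num_experts)
    for category, experts in records:
        for expert in experts:
            counts[category][expert] += 1
    atlas = {}
    for category, per_expert in counts.items():
        total = sum(per_expert)
        atlas[category] = [c / total for c in per_expert]
    return atlas


def candidate_family(atlas: Dict[str, List[float]], category: str,
                     top_n: int) -> List[int]:
    """Experts with the largest routing-mass share for one category."""
    mass = atlas[category]
    return sorted(range(len(mass)), key=lambda e: mass[e], reverse=True)[:top_n]


# Toy data: 4 experts, 2 active per token (the real model has 128 and 8).
records = [
    ("arabic", [0, 1]), ("arabic", [0, 2]),
    ("code", [2, 3]), ("code", [3, 1]),
]
atlas = routing_mass_atlas(records, num_experts=4)
print(atlas["arabic"])                        # [0.5, 0.25, 0.25, 0.0]
print(candidate_family(atlas, "arabic", 1))   # [0]
```

An atlas like this is purely observational, which is why the premise matters: high routing mass for a category does not by itself show that ablating those experts harms that category selectively.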

What would settle it

Finding two or more additional families that meet the selective criterion on the independent corpus under the conservative statistical bar would falsify the claim that robust modularity is rare.

Figures

Figures reproduced from arXiv: 2606.25092 by Ali Asaria, Deep Gandhi, Tony Salomone.

Figure 1: Family separation. Qwen3 (positive control) shows a large within-vs-cross-group gap
Figure 2: Selectivity scatter. Each family’s on-target effect (x) vs. its worst off-target effect (y);
Figure 3: The verdict depends on the measurement. Each cell is a family’s modularity verdict

Original abstract

Sparse Mixture-of-Experts (MoE) models route each token to a few of many experts, inviting the hypothesis that experts form functional modules tied to capabilities or languages. We test this causally on Command A+, a frontier open-weights MoE (218B total / 25B active; 128 experts, 8 active, +1 shared). We build a routing-mass atlas, pre-register six family-to-axis hypotheses before any intervention, and ablate each family at inference time against a size-matched random-expert null, measuring whether it selectively breaks its own axis (worst off-target effect at most one third of on-target). Crucially, we test the same families under four metrics and a held-out, independent-corpus run with bootstrap confidence intervals. Our finding is cautionary: robust functional modularity is rare and measurement-dependent. Of six pre-registered families, only one, the Arabic-language family, is a clean selective module that survives an independent corpus and a conservative statistical bar (1/6; a more permissive pre-registered point rule admits 3/6, but that count is threshold-sensitive). Every other family has a real causal effect yet fails selectivity, and its apparent modularity flips with the measurement: with the corpus, the metric, and the statistical bar. A positive control on Qwen3-30B-A3B recovers its published disjoint structure, confirming the method detects modularity when present. The verdict reproduces on the un-quantized BF16 model, ruling out a 4-bit quantization artifact. We conclude that ablation-based modularity verdicts are not safe unless the corpus, metric, and statistical bar are controlled. We release the atlas and ablation data.
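
Inference-time ablation of an expert family can be pictured as removing those experts from the router's choices. The abstract does not say how the ablation is implemented, so this is only a sketch under two assumptions: ablated experts are masked out before top-k selection (so other experts take their slots), and gate weights are a softmax over the selected experts. The function `route_top_k` is hypothetical.

```python
import math
from typing import AbstractSet, List, Tuple


def route_top_k(logits: List[float], k: int,
                ablated: AbstractSet[int] = frozenset()
                ) -> List[Tuple[int, float]]:
    """Select the top-k experts, skipping ablated ones.

    Returns (expert_id, gate_weight) pairs; weights are a softmax over the
    surviving selection. Assumes at least one expert is not ablated.
    """
    masked = [-math.inf if i in ablated else x for i, x in enumerate(logits)]
    order = sorted(range(len(masked)), key=lambda i: masked[i], reverse=True)
    chosen = [i for i in order[:k] if masked[i] != -math.inf]
    top = max(masked[i] for i in chosen)
    exps = [math.exp(masked[i] - top) for i in chosen]  # stable softmax
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]


logits = [2.0, 1.0, 0.5, -1.0]
print([i for i, _ in route_top_k(logits, k=2)])               # [0, 1]
print([i for i, _ in route_top_k(logits, k=2, ablated={0})])  # [1, 2]
```

A design note: masking before selection lets the router reroute tokens, while zeroing an expert's output after selection does not. The two choices can give different effect sizes, so an ablation study should state which one it uses.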

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper conducts a pre-registered causal ablation study on expert modularity in the Command A+ MoE model (218B total parameters). It constructs a routing-mass atlas to define six expert families, tests pre-registered family-to-axis hypotheses by ablating each family versus size-matched random nulls, and evaluates selectivity (on-target effect at least three times any off-target) across four metrics, a held-out independent corpus, and bootstrap intervals. Only the Arabic-language family meets the conservative bar; others show causal effects but fail selectivity or are sensitive to corpus/metric/threshold. A positive control recovers Qwen3's published structure, and results reproduce on the BF16 model. The conclusion is that robust functional modularity is rare and measurement-dependent; data and atlas are released.

Significance. If the central claim holds, the work demonstrates that ablation-based claims of expert specialization in frontier MoEs require strict controls on corpus, metric, and statistical threshold, as apparent modularity often dissolves under them. Strengths include the pre-registered design with explicit hypotheses, positive control recovering known structure, independent-corpus replication, bootstrap CIs, reproduction on unquantized weights, and full data release. These elements make the finding that only 1/6 families (Arabic) survives the conservative test internally consistent and falsifiable, providing a cautionary benchmark for future MoE interpretability studies.

minor comments (2)
  1. Abstract: the description of the routing-mass atlas construction and the exact definition of the six families could be expanded with one additional sentence to clarify how candidate families were identified from the atlas before pre-registration.
  2. The 1/3 off-target threshold is pre-registered, but a brief justification or sensitivity table in the methods would help readers understand why this value and the conservative statistical bar (which 1/6 families pass) were chosen over alternatives.
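
The sensitivity table requested in the second comment could be produced by a sweep like the one below. It is a sketch only: the family labels and effect sizes are invented for illustration and are not the paper's data, and `passing_families` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def passing_families(effects: Dict[str, Tuple[float, float]],
                     threshold: float) -> List[str]:
    """Families whose worst off-target / on-target ratio is within threshold.

    `effects` maps family name -> (on_target_effect, worst_off_target_effect).
    """
    passed = []
    for name, (on_target, worst_off) in effects.items():
        if on_target > 0 and worst_off / on_target <= threshold:
            passed.append(name)
    return sorted(passed)


# Invented numbers, for illustration only.
effects = {
    "A": (0.90, 0.10),
    "B": (0.60, 0.25),
    "C": (0.50, 0.20),
    "D": (0.40, 0.40),
}

for threshold in (0.25, 1 / 3, 0.5):
    names = passing_families(effects, threshold)
    print(f"threshold={threshold:.3f}  passing={len(names)}/{len(effects)}  {names}")
```

With these toy values the count of passing families jumps from 1 to 3 once the threshold is relaxed to 0.5, the kind of threshold sensitivity a table would make visible.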

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, the recognition of its methodological strengths (pre-registration, positive control, independent corpus, bootstrap CIs, BF16 reproduction, and data release), and the recommendation to accept. No major comments were raised.

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central claim rests on pre-registered ablation experiments that compare selective effects against size-matched random-expert nulls, an independent held-out corpus, bootstrap intervals, and a positive control recovering known structure in Qwen3. The routing-mass atlas is an input constructed from routing data, but the family-to-axis hypotheses are stated before any ablation, and the selectivity criterion (on-target effect at least three times any off-target) is applied uniformly; no result is forced by re-using the same fitted values or by a self-citation chain that itself lacks external verification. The design is therefore self-contained against the stated benchmarks and does not reduce any prediction to its own inputs by construction.
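
The size-matched random-expert null mentioned above can be sketched as repeated random draws of expert sets with the same size as the family under test. This sketch assumes the control sets exclude the family's own experts and uses a fixed seed for reproducibility; neither detail is confirmed by the page, and `size_matched_null` and the example family are hypothetical.

```python
import random
from typing import List, Sequence


def size_matched_null(family: Sequence[int], num_experts: int,
                      n_draws: int, seed: int = 0) -> List[List[int]]:
    """Draw random expert sets of the same size as `family`.

    Assumption of this sketch: control sets never contain family members.
    """
    rng = random.Random(seed)
    members = set(family)
    pool = [e for e in range(num_experts) if e not in members]
    return [sorted(rng.sample(pool, len(family))) for _ in range(n_draws)]


# Hypothetical 6-expert family in a 128-expert layer.
family = [3, 17, 42, 64, 90, 101]
controls = size_matched_null(family, num_experts=128, n_draws=5)
for control in controls:
    print(control)
```

Each control set is then ablated in the same way as the family, so the family's effect is compared with what removing any equally sized group of experts would do.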

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces no free parameters, new entities, or ad-hoc axioms beyond standard statistical assumptions for bootstrap intervals and null-model comparisons; all central quantities are measured directly from the model under pre-registered rules.

axioms (1)
  • [standard math] Bootstrap confidence intervals provide valid uncertainty estimates for the selectivity ratios
    Used to decide whether on-target effect exceeds off-target effects by the pre-registered margin.
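
A percentile bootstrap for the selectivity ratio might look like the sketch below. The page does not specify the paper's resampling scheme or its conservative bar, so this assumes per-example effects are resampled independently on each axis and that a family passes only if the upper bound of the interval stays at or below 1/3. The function `bootstrap_ratio_ci` and the data are hypothetical.

```python
import random
from typing import List, Tuple


def bootstrap_ratio_ci(on_target: List[float], off_target: List[float],
                       n_boot: int = 2000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(off_target) / mean(on_target)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each axis with replacement, keeping the sample size.
        on = [rng.choice(on_target) for _ in on_target]
        off = [rng.choice(off_target) for _ in off_target]
        mean_on = sum(on) / len(on)
        mean_off = sum(off) / len(off)
        if mean_on > 0:  # the ratio is undefined without on-target damage
            ratios.append(mean_off / mean_on)
    ratios.sort()
    lower = ratios[int((alpha / 2) * len(ratios))]
    upper = ratios[min(len(ratios) - 1, int((1 - alpha / 2) * len(ratios)))]
    return lower, upper


# Invented per-example effects, for illustration only.
on_effects = [0.80, 0.90, 1.00, 0.85, 0.95]
off_effects = [0.10, 0.05, 0.15, 0.10, 0.08]
lower, upper = bootstrap_ratio_ci(on_effects, off_effects)
print(f"95% CI for off/on ratio: [{lower:.3f}, {upper:.3f}]")
print("passes conservative bar:", upper <= 1 / 3)
```

Requiring the whole interval to clear the threshold is stricter than comparing point estimates, which is one way the same family can pass a point rule and fail a conservative one.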
