Does the Same Token Mean the Same State? MoE Routing as Signal for Reasoning Control
Pith reviewed 2026-06-26 08:57 UTC · model grok-4.3
The pith
In MoE models the same emitted token can be produced by different experts depending on context, so routing states at fixed anchors can be used to select among rollouts without reading answer strings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Holding the emitted token id fixed at repeated anchors, the experts that produce it still separate task context, trajectory history, and reasoning-effort mode. Near boundary anchors and delimiter anchors, routing neighborhoods already align with final-answer basins at a marker-only readout, strongest when read at the answer opening.
What carries the argument
RAD (Routing Agreement Decoding): locate a fixed anchor, represent each rollout by anchor-window MoE routing states, return the densest Weighted-Jaccard K-NN route-basin center.
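The selector described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each rollout has already been reduced to a nonnegative vector of aggregated router weights over the anchor window (how the paper builds that vector is not stated here), it uses the standard weighted Jaccard similarity (sum of element-wise minima over sum of element-wise maxima), and it scores each rollout by its mean similarity to its K nearest neighbors. The function names, the density definition, and the default K are hypothetical.

```python
from typing import List, Sequence


def weighted_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Weighted Jaccard similarity of two nonnegative vectors: sum(min) / sum(max)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # Assumption: two all-zero vectors are treated as identical.
    return num / den if den > 0 else 1.0


def knn_density(features: Sequence[Sequence[float]], k: int) -> List[float]:
    """Mean similarity of each rollout to its k most similar other rollouts."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two rollouts")
    k = max(1, min(k, n - 1))
    densities = []
    for i in range(n):
        sims = sorted(
            (weighted_jaccard(features[i], features[j]) for j in range(n) if j != i),
            reverse=True,
        )
        densities.append(sum(sims[:k]) / k)
    return densities


def select_rollout(features: Sequence[Sequence[float]], k: int = 3) -> int:
    """Return the index of the rollout with the highest K-NN density."""
    densities = knn_density(features, k)
    return max(range(len(densities)), key=densities.__getitem__)
```

Note that nothing in this sketch reads the answer string: the only input is the routing feature per rollout. It also makes the stated limitation concrete, since a tight cluster of wrong rollouts would win just as easily as a tight cluster of correct ones.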
If this is right
- RAD performs on par with majority voting (73.9 vs 73.6) across 10 MoE configs and 6 datasets without using answer strings.
- It provides a direct pass@1 selector for code generation where exact-string voting is ill-defined.
- Re-anchoring the routing-density principle to the agentic boundary improves best-of-16 patch selection on SWE-bench Verified over random.
- RAD is not a verifier and can still select a dense wrong basin.
Where Pith is reading between the lines
- Routing states might allow internal monitoring of reasoning effort without external tools.
- The approach could extend to selecting among multiple agent trajectories or plans in multi-step tasks.
- Testing RAD on non-MoE models or other routing mechanisms would check if the signal is specific to sparse experts.
Load-bearing premise
MoE routing states observed at a fixed anchor window are stable and task-discriminative enough to identify the correct answer basin without parsing the generated answer string or using external verification.
What would settle it
Finding a set of rollouts where the routing-based selector picks the wrong basin more frequently than string majority voting across the tested datasets would show the alignment does not hold.
Original abstract
In sparse Mixture-of-Experts language models, does the same token id imply the same router state and the same experts producing it? Holding the emitted token id fixed at repeated anchors, we find it does not: the experts that produce it still separate task context, trajectory history, and reasoning-effort mode. This residual structure supports test-time control: near boundary anchors (the final-response transition) and delimiter anchors (which open the answer, e.g. \boxed{ or code fences), routing neighborhoods already align with final-answer basins at a marker-only readout and strongest when the routing is read at the answer opening. We operationalize this as RAD (Routing Agreement Decoding), an answer-string-free multi-rollout selector: it locates a fixed anchor, represents each rollout by its anchor-window MoE routing states, and returns the densest Weighted-Jaccard K-NN route-basin center, without parsing, normalizing, executing, or voting over answer strings. Across 10 sparse-MoE configurations (gpt-oss, Qwen3-MoE) and 6 datasets spanning math, GPQA, and code, RAD is on par with Majority where string voting is well-posed, with small positive paired deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6). Like majority voting, RAD is not a verifier: a dense wrong basin can still win. Its value is the interface: the same selector gives direct pass@1 on code, where exact-string voting is ill-defined, and the same routing-density principle, re-anchored to the agentic boundary, improves best-of-16 patch selection on SWE-bench Verified over random, where patches have no answer string to vote on.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in sparse MoE language models the same token ID does not imply the same router state: routing at repeated boundary and delimiter anchors separates task context, trajectory history, and reasoning-effort mode. It introduces RAD (Routing Agreement Decoding), a string-free multi-rollout selector that represents each rollout by its anchor-window routing states and returns the densest Weighted-Jaccard K-NN route-basin center; across 10 MoE configurations and 6 datasets RAD reports 73.9 (RAD+DC 74.2) versus Majority 73.6 and extends the same principle to code and SWE-bench patch selection.
Significance. If the routing states at fixed anchors prove stable and sufficiently discriminative, RAD supplies a practical interface for test-time control that does not require answer-string parsing or voting, which is directly useful for code generation and agentic settings where exact-string majority is ill-defined. The multi-configuration, multi-dataset evaluation is a concrete strength.
major comments (2)
- [Abstract] The aggregate claim of small positive deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6) across 10 configurations and 6 datasets supplies no variance, statistical tests, data-split details, or confirmation that anchor selection was pre-specified rather than post-hoc; this information is load-bearing for the assertion that routing neighborhoods already align with correct final-answer basins at a marker-only readout.
- [RAD definition and experimental section] The selector is defined directly from the observed routing vectors and their Weighted-Jaccard density; the manuscript does not report independent metrics of cluster stability across rollouts or correlation between routing basins and correctness independent of the final answer string, leaving the central assumption that fixed-anchor states are task-discriminative without token access unverified.
minor comments (1)
- [Method] Notation for the Weighted-Jaccard K-NN distance and the precise window size around boundary/delimiter anchors should be stated explicitly in the method section for reproducibility.
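One way to supply the variance and paired test requested in the first major comment is a paired bootstrap over items. The sketch below is illustrative only: it assumes per-item 0/1 correctness for two selectors evaluated on the same items, and the function name, resample count, and seed are hypothetical rather than taken from the paper.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_delta(
    correct_a: Sequence[int],
    correct_b: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean paired delta A - B, lower bound, upper bound).

    The bounds are a percentile bootstrap interval obtained by resampling
    items with replacement, keeping each item's A/B pair together.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("inputs must be non-empty and aligned item by item")
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = []
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()

    lo = means[int((alpha / 2) * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return observed, lo, hi
```

An interval that contains zero would be consistent with the "on par" reading of the aggregate numbers, while an interval strictly above zero would support the "small positive delta" reading.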
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback. We address the two major comments point by point below, indicating where revisions will be made.
- Referee: [Abstract] The aggregate claim of small positive deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6) across 10 configurations and 6 datasets supplies no variance, statistical tests, data-split details, or confirmation that anchor selection was pre-specified rather than post-hoc; this information is load-bearing for the assertion that routing neighborhoods already align with correct final-answer basins at a marker-only readout.
  Authors: The experimental section of the manuscript already reports per-configuration and per-dataset breakdowns together with the exact data splits and model configurations used. Anchor selection (boundary and delimiter tokens) was fixed in advance on the basis of earlier pilot observations of MoE routing behavior and was not tuned on the final test sets. Nevertheless, the abstract itself presents only aggregate figures. In the revision we will (i) add a parenthetical note on variance and the paired statistical tests that were performed, (ii) explicitly state that anchor positions were pre-specified, and (iii) reference the supplementary tables that contain the full per-run statistics.
  Revision: partial
- Referee: [RAD definition and experimental section] The selector is defined directly from the observed routing vectors and their Weighted-Jaccard density; the manuscript does not report independent metrics of cluster stability across rollouts or correlation between routing basins and correctness independent of the final answer string, leaving the central assumption that fixed-anchor states are task-discriminative without token access unverified.
  Authors: The primary evidence offered is that RAD, which uses only routing states at fixed anchors, matches or slightly exceeds string-based majority voting across ten model configurations and six tasks. This performance parity supplies indirect support for the claim that routing neighborhoods align with answer correctness. We agree, however, that direct, answer-string-independent diagnostics would strengthen the argument. In the revised experimental section we will therefore add (a) intra- and inter-cluster similarity statistics on the routing vectors themselves and (b) a correlation analysis between basin density and correctness computed after removing any reference to the generated answer strings.
  Revision: yes
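The two diagnostics promised in this response could take roughly the following form. This is a hedged sketch under stated assumptions, not the authors' analysis: it assumes a precomputed pairwise similarity matrix over rollouts, a per-rollout density score, and correctness labels obtained from an external grader that the selector itself never sees. All names are hypothetical. The correlation is a plain Pearson coefficient, which for a 0/1 label is the point-biserial correlation.

```python
import math
from typing import Sequence, Tuple


def intra_inter_similarity(
    sim: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Mean pairwise similarity within a label group and across label groups."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (intra if labels[i] == labels[j] else inter).append(sim[i][j])
    if not intra or not inter:
        raise ValueError("need at least one same-label and one cross-label pair")
    return sum(intra) / len(intra), sum(inter) / len(inter)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; with a 0/1 y this is the point-biserial correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must be aligned and have at least two points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)
```

A clearly higher intra-group than inter-group similarity, together with a positive density-correctness correlation, would be the kind of direct evidence the referee asks for; neither quantity depends on parsing the answer strings at selection time.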
Circularity Check
No significant circularity; RAD is an empirical definition from observed routing vectors
The paper reports an empirical observation that routing states at fixed anchors separate task context/trajectory/mode, then directly defines RAD as the densest Weighted-Jaccard K-NN center in that routing space. No equations, fitted parameters, or self-citations reduce the selector to its own inputs by construction. The method is presented as an operationalization of the observed alignment, with performance compared to majority voting on external datasets. This matches the default expectation of a self-contained empirical contribution.