pith. sign in

arXiv preprint arXiv:2507.17702 , year=

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

years

2026 10

roles

background 2

polarities

background 2

representative citing papers

RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

ZONOS2 Technical Report

cs.SD · 2026-06-23 · unverdicted · novelty 4.0

ZONOS2 8B is a scaled MoE TTS model with 900M active parameters trained on 6M hours of data that reports competitive SOTA results on naturalness, speaker similarity, WER, and a new ZTTS1-Eval benchmark while releasing weights and code.

citing papers explorer

Showing 10 of 10 citing papers.

  • RouteHijack: Routing-Aware Attack on Mixture-of-Experts LLMs cs.LG · 2026-05-01 · unverdicted · none · ref 62

    RouteHijack is a routing-aware jailbreak that identifies safety-critical experts via activation contrast and optimizes suffixes to suppress them, reaching 69.3% average attack success rate on seven MoE LLMs with strong transfer to variants and VLMs.

  • Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts cs.LG · 2026-04-21 · unverdicted · none · ref 51 · 2 links

    Expert upcycling duplicates experts in an existing MoE checkpoint and continues pre-training to match fixed-size baseline performance with 32% less compute.

  • Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality cs.AI · 2026-04-15 · conditional · none · ref 22

    Routing topology in sparse Mixture-of-Experts models does not determine asymptotic language modeling perplexity; multiple variants including cosine-similarity routing achieve statistically equivalent performance.

  • Optimus: Elastic Decoding for Efficient Diffusion LLM Serving cs.DC · 2026-05-24 · unverdicted · none · ref 42

    Optimus enables elastic decoding granularity adaptation in diffusion LLMs via chunked decoding and load-based scheduling to raise throughput under dynamic conditions.

  • DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices cs.LG · 2026-05-11 · unverdicted · none · ref 176 · 3 links

    DECO is a sparse MoE architecture with ReLU-based routing, learnable expert scaling, and NormSiLU activation that matches dense Transformer performance at 20% expert activation and delivers 2.93x speedup on Jetson AGX Orin.

  • ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 191

    ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

  • Geometric Routing Enables Causal Expert Control in Mixture of Experts cs.AI · 2026-04-15 · unverdicted · none · ref 11

    Cosine-similarity routing in low-dimensional space makes MoE experts monosemantic by construction and enables direct causal control via centroid interventions.

  • Generalization and Scaling Laws for Mixture-of-Experts Transformers cs.LG · 2026-04-10 · unverdicted · none · ref 3

    A covering-number bound and ERM analysis for MoE Transformers under manifold data produces generalization and scaling laws that treat active capacity like dense networks while adding routing overhead.

  • DAG-MoE: From Simple Mixture to Structural Aggregation in Mixture-of-Experts cs.AI · 2026-05-31 · unverdicted · none · ref 26

    DAG-MoE uses a lightweight module to learn DAG-based structural aggregation of selected experts, expanding combination space and enabling intra-layer multi-step reasoning compared to standard weighted-sum MoE.

  • ZONOS2 Technical Report cs.SD · 2026-06-23 · unverdicted · none · ref 232

    ZONOS2 8B is a scaled MoE TTS model with 900M active parameters trained on 6M hours of data that reports competitive SOTA results on naturalness, speaker similarity, WER, and a new ZTTS1-Eval benchmark while releasing weights and code.