pith. machine review for the scientific record.

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training. arXiv preprint arXiv:2506.05233

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

fields

cs.LG 5 · cs.CL 1

years

2026 5 · 2025 1

verdicts

UNVERDICTED 6

representative citing papers

Priming: Hybrid State Space Models From Pre-trained Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional training tokens; Gated KalmaNet outperforms Mamba-2 on long-context reasoning at the 32B scale.

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

cs.CL · 2025-06-16 · unverdicted · novelty 6.0

MiniMax-M1 is a 456B-parameter hybrid-attention MoE model trained with CISPO reinforcement learning; it matches or exceeds DeepSeek-R1 and Qwen3-235B on reasoning and software-engineering tasks, with RL training completed in three weeks on 512 GPUs.

citing papers explorer

Showing 6 of 6 citing papers.