pith — machine review for the scientific record

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

34 Pith papers cite this work. Polarity classification is still indexing.
abstract

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is developed based on our previous MiniMax-Text-01 model, which contains a total of 456 billion parameters with 45.9 billion parameters activated per token. The M1 model natively supports a context length of 1 million tokens, 8x the context size of DeepSeek R1. Furthermore, the lightning attention mechanism in MiniMax-M1 enables efficient scaling of test-time compute. These properties make M1 particularly suitable for complex tasks that require processing long inputs and thinking extensively. MiniMax-M1 is trained using large-scale reinforcement learning (RL) on diverse problems, including sandbox-based, real-world software engineering environments. In addition to M1's inherent efficiency advantage for RL training, we propose CISPO, a novel RL algorithm to further enhance RL efficiency. CISPO clips importance sampling weights rather than token updates, outperforming other competitive RL variants. Combining hybrid attention and CISPO enables MiniMax-M1's full RL training on 512 H800 GPUs to complete in only three weeks, with a rental cost of just $534,700. We release two versions of MiniMax-M1 models with 40K and 80K thinking budgets respectively, where the 40K model represents an intermediate phase of the 80K training. Experiments on standard benchmarks show that our models are comparable to or better than strong open-weight models such as the original DeepSeek-R1 and Qwen3-235B, with particular strengths in complex software engineering, tool utilization, and long-context tasks. We publicly release MiniMax-M1 at https://github.com/MiniMax-AI/MiniMax-M1.
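
The abstract's one-line description of CISPO — clip the importance sampling weight rather than the token update — can be sketched as a loss function. This is a minimal illustration of that idea, not the paper's implementation: the function name, the default clip range, and the REINFORCE-style surrogate form are my assumptions. The key contrast with PPO-style clipping is that the clipped weight is detached, so every token (including those whose ratio falls outside the clip range) still contributes a gradient through its log-probability.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Sketch of a CISPO-style objective (illustrative, not the paper's code).

    PPO clips the ratio inside the surrogate, which zeroes the gradient for
    clipped tokens. Here we instead clip the importance-sampling weight and
    detach it, so it acts as a fixed per-token coefficient and the gradient
    still flows through logp_new for all tokens.
    """
    ratio = torch.exp(logp_new - logp_old)  # per-token IS weight
    clipped_w = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    # REINFORCE-style surrogate weighted by the clipped, detached IS weight
    return -(clipped_w * advantages * logp_new).mean()
```

With a positive advantage and a ratio above the upper clip, a PPO-clipped surrogate would contribute zero gradient for that token; the sketch above still does, which is the property the abstract highlights.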

citation summary

citation roles: dataset (1)

citation polarities: use (1)

citing papers by year: 2026 (31), 2025 (3)

representative citing papers

KL for a KL: On-Policy Distillation with Control Variate Baseline

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

vOPD stabilizes on-policy distillation gradients by subtracting a closed-form per-token negative reverse KL baseline as a detached control variate, preserving unbiasedness while lowering variance and matching expensive full-vocabulary methods.

SAGE: A Service Agent Graph-guided Evaluation Benchmark

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

SAGE is a new multi-agent benchmark that formalizes service SOPs as dynamic dialogue graphs to measure LLM agents on logical compliance and path coverage, uncovering an execution gap and empathy resilience across 27 models in 6 scenarios.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.

Internalizing Curriculum Judgment for LLM Reinforcement Fine-Tuning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

METIS internalizes curriculum judgment in LLM reinforcement fine-tuning by predicting within-prompt reward variance via in-context learning and jointly optimizing with a self-judgment reward, yielding superior performance and up to 67% faster convergence across math, code, and agent benchmarks.

Priming: Hybrid State Space Models From Pre-trained Transformers

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Priming transfers knowledge from pre-trained Transformers to hybrid SSM-attention models, recovering performance with minimal additional tokens and showing Gated KalmaNet outperforming Mamba-2 on long-context reasoning at 32B scale.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

Cost-Aware Learning

cs.LG · 2026-04-30 · unverdicted · novelty 6.0

Cost-aware SGD achieves target error with lower total sampling cost than standard methods, and Cost-Aware GRPO reduces token usage by up to 30% in LLM reinforcement learning while matching baseline performance.

Building a Precise Video Language with Human-AI Oversight

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

The CHAI framework pairs AI pre-captions with expert human critiques to produce precise video descriptions, enabling open models to outperform closed ones like Gemini-3.1-Pro and improve fine-grained control in video generation models.

Scaling Self-Play with Self-Guidance

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.

Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Balanced Aggregation fixes sign-length coupling and length downweighting in GRPO by computing separate token means for positive and negative subsets and combining them with sequence-count weights, yielding more stable training and higher benchmark scores.
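
The aggregation described in this blurb — separate token means over positive- and negative-advantage sequences, recombined with sequence-count weights — can be sketched as follows. The function name, argument layout, and zero fallback for an empty subset are my assumptions; only the split-then-reweight scheme comes from the summary above.

```python
import torch

def balanced_aggregate(token_loss, seq_id, seq_advantage):
    """Sketch of a balanced aggregation for GRPO-style losses (illustrative).

    Instead of one global token mean (which couples loss sign with sequence
    length), average token losses separately over positive- and negative-
    advantage sequences, then combine the two means weighted by how many
    sequences fall in each subset.
    """
    pos_seqs = seq_advantage > 0                    # [num_seqs] bool mask
    n_pos = pos_seqs.sum()
    n_neg = (~pos_seqs).sum()
    tok_pos = pos_seqs[seq_id]                      # per-token mask via sequence id
    zero = token_loss.new_zeros(())
    mean_pos = token_loss[tok_pos].mean() if n_pos > 0 else zero
    mean_neg = token_loss[~tok_pos].mean() if n_neg > 0 else zero
    return (n_pos * mean_pos + n_neg * mean_neg) / (n_pos + n_neg)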
