Recursive self-aggregation unlocks deep thinking in large language models

Recursive self-aggregation unlocks deep thinking in large language models · 2025 · arXiv 2509.26626

10 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

On Test-Time Scaling for Vision-Language Models

cs.CV · 2026-06-27 · conditional · novelty 7.0 · 2 refs

Small, well-performing LVLMs gain the largest benefits from test-time scaling (up to ~30% improvement), often matching or exceeding larger models, while visual tokens contribute mainly early in the reasoning chain.

SPIRAL: Learning to Search and Aggregate

cs.AI · 2026-06-22 · unverdicted · novelty 7.0

SPIRAL is a reinforcement learning framework that jointly optimizes sequential reasoning, parallel trace generation, and aggregation in language models for improved test-time performance.

ATLAS: Agentic Test-time Learning-to-Allocate Scaling

cs.LG · 2026-06-01 · unverdicted · novelty 7.0

ATLAS introduces an LLM-orchestrated agentic framework for dynamic test-time scaling via extensible 'explore' actions, achieving higher accuracy with fewer API calls than fixed-workflow baselines on four benchmarks.

CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

CAPS is a four-stage inference-only cascade that adapts how much of each solution the verifier sees and how comparisons are distributed, halving per-candidate verifier tokens while outperforming uniform pairwise verification on most benchmarks.

Test-Time Learning with an Evolving Library

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

Comprehensive AI governance requires addressing non-model gains

cs.CY · 2026-05-01 · unverdicted · novelty 6.0

Non-model gains via inference, systems, and assets can drive AI capabilities independently of base models, requiring governance beyond model-level evaluation and mitigation.

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

cs.AI · 2026-04-06 · unverdicted · novelty 6.0

A 4B model post-trained with SFT, RL, and a reasoning cache surpasses larger open models and approaches proprietary ones on Olympiad proof generation.

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Lack of exploration from conditioning on prior answers is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.

ZONOS2 Technical Report

cs.SD · 2026-06-23 · unverdicted · novelty 4.0

ZONOS2 8B is a scaled MoE TTS model with 900M active parameters trained on 6M hours of data that reports competitive SOTA results on naturalness, speaker similarity, WER, and a new ZTTS1-Eval benchmark while releasing weights and code.

citing papers explorer

On Test-Time Scaling for Vision-Language Models cs.CV · 2026-06-27 · conditional · none · ref 33 · 2 links
SPIRAL: Learning to Search and Aggregate cs.AI · 2026-06-22 · unverdicted · none · ref 14
ATLAS: Agentic Test-time Learning-to-Allocate Scaling cs.LG · 2026-06-01 · unverdicted · none · ref 51
CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning cs.AI · 2026-05-15 · unverdicted · none · ref 42
Test-Time Learning with an Evolving Library cs.LG · 2026-05-14 · unverdicted · none · ref 6
ZAYA1-8B Technical Report cs.AI · 2026-05-06 · unverdicted · none · ref 152
Comprehensive AI governance requires addressing non-model gains cs.CY · 2026-05-01 · unverdicted · none · ref 89
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems cs.AI · 2026-04-06 · unverdicted · none · ref 3
Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models cs.CL · 2026-04-07 · unverdicted · none · ref 25
ZONOS2 Technical Report cs.SD · 2026-06-23 · unverdicted · none · ref 177
