hub

arXiv preprint arXiv:2508.13167 , year=

Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, ShuyingFan,XiaowanLi,TiannanWang,PaiLiu,KingZhu,HeZhu,D · 2025 · arXiv 2508.13167

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate

cs.CL · 2026-05-02 · unverdicted · novelty 7.0

MAD-OPD recasts on-policy distillation teachers as a debating collective to supply better supervision, lifting agentic and code performance over single-teacher OPD across multiple model sizes.

Efficient Agentic Reasoning Through Self-Regulated Simulative Planning

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

ICRL uses joint RL training of solver and critic with distribution-calibration re-weighting and role-wise advantage estimation to internalize critique into unassisted LLM performance, yielding 6.4-point gains on agentic tasks and 7.0 on math reasoning with Qwen3 models.

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

cs.CL · 2026-04-07 · unverdicted · novelty 6.0

Paper Circle is an open-source multi-agent system that retrieves papers via offline and online sources, applies multi-criteria scoring and diversity ranking, and converts papers into typed knowledge graphs for structured analysis and question answering.

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

cs.CL · 2025-11-14 · unverdicted · novelty 6.0

MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.

Scaling Mobile Agent Systems: From Capability Density to Collective Intelligence

cs.DC · 2026-04-29 · unverdicted · novelty 3.0

A vision paper outlining a two-pronged research agenda for scaling mobile agents from isolated devices to distributed intelligent systems.

A Survey of Reinforcement Learning for Large Reasoning Models

cs.CL · 2025-09-10 · accept · novelty 3.0

A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

citing papers explorer

Showing 10 of 10 citing papers.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States cs.CL · 2026-05-06 · unverdicted · none · ref 40
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
MAD-OPD: Breaking the Ceiling in On-Policy Distillation via Multi-Agent Debate cs.CL · 2026-05-02 · unverdicted · none · ref 29
MAD-OPD recasts on-policy distillation teachers as a debating collective to supply better supervision, lifting agentic and code performance over single-teacher OPD across multiple model sizes.
Efficient Agentic Reasoning Through Self-Regulated Simulative Planning cs.AI · 2026-05-21 · unverdicted · none · ref 48
SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.
ICRL: Learning to Internalize Self-Critique with Reinforcement Learning cs.AI · 2026-05-13 · unverdicted · none · ref 11
ICRL uses joint RL training of solver and critic with distribution-calibration re-weighting and role-wise advantage estimation to internalize critique into unassisted LLM performance, yielding 6.4-point gains on agentic tasks and 7.0 on math reasoning with Qwen3 models.
SOD: Step-wise On-policy Distillation for Small Language Model Agents cs.CL · 2026-05-08 · unverdicted · none · ref 12
SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence cs.AI · 2026-04-20 · unverdicted · none · ref 46
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework cs.CL · 2026-04-07 · unverdicted · none · ref 2
Paper Circle is an open-source multi-agent system that retrieves papers via offline and online sources, applies multi-criteria scoring and diversity ranking, and converts papers into typed knowledge graphs for structured analysis and question answering.
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling cs.CL · 2025-11-14 · unverdicted · none · ref 16
MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.
Scaling Mobile Agent Systems: From Capability Density to Collective Intelligence cs.DC · 2026-04-29 · unverdicted · none · ref 9
A vision paper outlining a two-pronged research agenda for scaling mobile agents from isolated devices to distributed intelligent systems.
A Survey of Reinforcement Learning for Large Reasoning Models cs.CL · 2025-09-10 · accept · none · ref 283
A survey compiling RL methods, challenges, data resources, and applications for enhancing reasoning in large language models and large reasoning models since DeepSeek-R1.

arXiv preprint arXiv:2508.13167 , year=

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer