hub

Llms can easily learn to reason from demonstrations structure, not content, is what matters!

Dacheng Li, Shiyi Cao, Tyler Griggs, Shu Liu, Xiangxi Mo, Shishir G Patil, Matei Zaharia, Joseph E Gonzalez, Ion Stoica · 2025 · arXiv 2502.07374

11 Pith papers cite this work. Polarity classification is still indexing.

11 Pith papers citing it

read on arXiv browse 11 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

cs.CL · 2025-07-21 · unverdicted · novelty 6.0

Archer introduces response-level entropy normalization and differentiated clipping/KL regularization in RLVR to encourage exploration on reasoning tokens while stabilizing knowledge tokens, yielding gains in pass@1 and pass@K on reasoning benchmarks.

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

cs.CL · 2025-06-23 · unverdicted · novelty 6.0

LongWriter-Zero applies RL from a base model with specialized rewards for length, quality, and structure to outperform SFT baselines and larger models on long-writing benchmarks.

The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity

cs.AI · 2025-06-07 · unverdicted · novelty 6.0

LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

cs.CL · 2025-06-02 · conditional · novelty 6.0

High-entropy minority tokens drive RLVR gains, so restricting gradients to the top 20% maintains or improves performance over full updates on Qwen3 models, especially larger ones.

Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

cs.AI · 2026-05-10 · unverdicted · novelty 5.0

Absurd World automatically converts real-world problems into absurd yet logically coherent scenarios to test whether LLMs can reason without depending on familiar patterns.

How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.

A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning

cs.LG · 2025-10-21 · unverdicted · novelty 5.0 · 2 refs

SePT alternates self-generation of responses at controlled temperatures with training on the latest model outputs, yielding gains over a strong no-training baseline on six math reasoning benchmarks.

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding

cs.CL · 2025-04-02 · unverdicted · novelty 4.0

A new open SFT dataset for reasoning distillation lets coding models hit state-of-the-art scores on LiveCodeBench and CodeContests with supervised fine-tuning alone, outperforming RL-trained baselines.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19

Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment

cs.CL · 2026-01-20

citing papers explorer

Showing 11 of 11 citing papers.

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR cs.CL · 2025-07-21 · unverdicted · none · ref 19
Archer introduces response-level entropy normalization and differentiated clipping/KL regularization in RLVR to encourage exploration on reasoning tokens while stabilizing knowledge tokens, yielding gains in pass@1 and pass@K on reasoning benchmarks.
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning cs.CL · 2025-06-23 · unverdicted · none · ref 17
LongWriter-Zero applies RL from a base model with specialized rewards for length, quality, and structure to outperform SFT baselines and larger models on long-writing benchmarks.
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity cs.AI · 2025-06-07 · unverdicted · none · ref 29
LRMs exhibit complete accuracy collapse beyond certain puzzle complexities, with reasoning effort rising then declining, outperforming standard LLMs only on medium-complexity tasks.
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning cs.CL · 2025-06-02 · conditional · none · ref 13
High-entropy minority tokens drive RLVR gains, so restricting gradients to the top 20% maintains or improves performance over full updates on Qwen3 models, especially larger ones.
Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities cs.AI · 2026-05-10 · unverdicted · none · ref 7
Absurd World automatically converts real-world problems into absurd yet logically coherent scenarios to test whether LLMs can reason without depending on familiar patterns.
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem cs.AI · 2026-05-07 · unverdicted · none · ref 39
Non-reasoning LLMs fail the equivalence class problem while reasoning LLMs perform better but remain incomplete, with difficulty peaking at phase transition for the former and maximum diameter for the latter.
A Model Can Help Itself: Reward-Free Self-Training for LLM Reasoning cs.LG · 2025-10-21 · unverdicted · none · ref 11 · 2 links
SePT alternates self-generation of responses at controlled temperatures with training on the latest model outputs, yielding gains over a strong no-training baseline on six math reasoning benchmarks.
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding cs.CL · 2025-04-02 · unverdicted · none · ref 12
A new open SFT dataset for reasoning distillation lets coding models hit state-of-the-art scores on LiveCodeBench and CodeContests with supervised fine-tuning alone, outperforming RL-trained baselines.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 156
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unreviewed · ref 81
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment cs.CL · 2026-01-20 · unreviewed · ref 21

Llms can easily learn to reason from demonstrations structure, not content, is what matters!

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer