hub

V-star: Training verifiers for self-taught reasoners

· 2024 · arXiv 2402.06457

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it

read on arXiv browse 12 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Self-Policy Distillation via Capability-Selective Subspace Projection

cs.CL · 2026-05-21 · unverdicted · novelty 7.0

Self-Policy Distillation extracts a capability subspace from model gradients on correctness tokens, projects KV activations into it for self-generation, and fine-tunes LLMs to achieve up to 13-16% gains over baselines without external signals.

Residual Skill Optimization for Text-to-SQL Ensembles

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

cs.LG · 2026-05-14 · conditional · novelty 7.0

FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.

Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs

cs.CL · 2024-12-30 · unverdicted · novelty 7.0

o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.

Grad Detect: Gradient-Based Hallucination Detection in LLMs

cs.LG · 2026-06-23 · unverdicted · novelty 6.0

Grad Detect uses internal gradient patterns from one inference pass to predict LLM hallucinations and abstention, outperforming confidence and sampling baselines on Q&A benchmarks with most signal in the final five layers.

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

cs.CV · 2025-03-21 · conditional · novelty 6.0

Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

cs.LG · 2024-10-10 · unverdicted · novelty 6.0

Process advantage verifiers trained to predict step-level progress under a distinct prover policy improve LLM reasoning accuracy by over 8% and sample efficiency by 5-6x over outcome reward models.

ReMedi: Reasoner for Medical Clinical Prediction

cs.CL · 2026-05-02 · unverdicted · novelty 5.0

ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

From System 1 to System 2: A Survey of Reasoning Large Language Models

cs.AI · 2025-02-24 · accept · novelty 3.0

The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

cs.AI · 2025-03-31 · unverdicted · novelty 2.0

This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

citing papers explorer

Showing 12 of 12 citing papers.

Self-Policy Distillation via Capability-Selective Subspace Projection cs.CL · 2026-05-21 · unverdicted · none · ref 8
Self-Policy Distillation extracts a capability subspace from model gradients on correctness tokens, projects KV activations into it for self-generation, and fine-tunes LLMs to achieve up to 13-16% gains over baselines without external signals.
Residual Skill Optimization for Text-to-SQL Ensembles cs.CL · 2026-05-20 · unverdicted · none · ref 15
Residual skill optimization creates complementary Text-to-SQL agents by training each new skill on prior ensemble failures, yielding accuracy gains on Spider2-Lite and transfer to other dialects and tasks.
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale cs.LG · 2026-05-14 · conditional · none · ref 11
FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs cs.CL · 2024-12-30 · unverdicted · none · ref 83
o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
Grad Detect: Gradient-Based Hallucination Detection in LLMs cs.LG · 2026-06-23 · unverdicted · none · ref 60
Grad Detect uses internal gradient patterns from one inference pass to predict LLM hallucinations and abstention, outperforming confidence and sampling baselines on Q&A benchmarks with most signal in the final five layers.
Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems cs.RO · 2026-04-22 · unverdicted · none · ref 83
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles cs.CV · 2025-03-21 · conditional · none · ref 24
Iterative SFT-RL cycles enable a 7B LVLM to develop sophisticated visual chain-of-thought reasoning and improve performance on math and general reasoning benchmarks.
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning cs.LG · 2024-10-10 · unverdicted · none · ref 8
Process advantage verifiers trained to predict step-level progress under a distinct prover policy improve LLM reasoning accuracy by over 8% and sample efficiency by 5-6x over outcome reward models.
ReMedi: Reasoner for Medical Clinical Prediction cs.CL · 2026-05-02 · unverdicted · none · ref 30
ReMedi boosts LLM performance on EHR clinical predictions by up to 19.9% F1 through ground-truth-guided rationale regeneration and fine-tuning.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence cs.AI · 2025-07-28 · accept · none · ref 206
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
From System 1 to System 2: A Survey of Reasoning Large Language Models cs.AI · 2025-02-24 · accept · none · ref 189
The survey organizes the shift of LLMs toward deliberate System 2 reasoning, covering model construction techniques, performance on math and coding benchmarks, and future research directions.
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems cs.AI · 2025-03-31 · unverdicted · none · ref 157
This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.

V-star: Training verifiers for self-taught reasoners

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer