Prune as you generate: Online rollout pruning for faster and better RLVR

2026 · arXiv 2603.24840

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · baseline 1

citation-polarity summary

background 1 · baseline 1

representative citing papers

Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning

cs.AI · 2026-06-24 · conditional · novelty 7.0 · 2 refs

Cliff tokens are single tokens triggering LLM math reasoning failures, identified via adaptive z-test threshold on token potential; a taxonomy and Cliff-DPO optimization yield up to +6.6 accuracy gains.

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

DUET improves RLVR by allocating tokens across both prompt selection and rollout length, outperforming full-budget baselines even when using only half the tokens.

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation

cs.CV · 2026-06-23 · unverdicted · novelty 6.0

IV-CoT introduces an implicit chain-of-thought framework that decomposes visual queries into a structural-to-semantic cascade with training-only sketch supervision to improve structure-aware text-to-image generation.

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

TRACE is a rollout budget allocation framework that models ReAct turns as tree nodes and uses a predictor to allocate samples to informative prefixes, yielding a 2.8-point accuracy gain on Multi-Hop QA at equal cost.

Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering

cs.CL · 2026-05-11 · unverdicted · novelty 5.0

Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

citing papers explorer


Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
cs.AI · 2026-06-24 · conditional · none · ref 26 · 2 links

DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
cs.LG · 2026-05-08 · unverdicted · none · ref 37

IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
cs.CV · 2026-06-23 · unverdicted · none · ref 46

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
cs.LG · 2026-06-09 · unverdicted · none · ref 72

Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering
cs.CL · 2026-05-11 · unverdicted · none · ref 21

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization
cs.CV · 2026-04-20 · unverdicted · none · ref 21
