Cliff tokens are single tokens triggering LLM math reasoning failures, identified via adaptive z-test threshold on token potential; a taxonomy and Cliff-DPO optimization yield up to +6.6 accuracy gains.
Prune as you generate: Online rollout pruning for faster and better RLVR
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6representative citing papers
DUET improves RLVR by allocating tokens across both prompt selection and rollout length, outperforming full-budget baselines even when using only half the tokens.
IV-CoT introduces an implicit chain-of-thought framework that decomposes visual queries into a structural-to-semantic cascade with training-only sketch supervision to improve structure-aware text-to-image generation.
TRACE is a rollout budget allocation framework that models ReAct turns as tree nodes and uses a predictor to allocate samples to informative prefixes, yielding a 2.8-point accuracy gain on Multi-Hop QA at equal cost.
Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.
DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.
citing papers explorer
-
Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
Cliff tokens are single tokens triggering LLM math reasoning failures, identified via adaptive z-test threshold on token potential; a taxonomy and Cliff-DPO optimization yield up to +6.6 accuracy gains.
-
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
DUET improves RLVR by allocating tokens across both prompt selection and rollout length, outperforming full-budget baselines even when using only half the tokens.
-
IV-CoT: Implicit Visual Chain-of-Thought for Structure-Aware Text-to-Image Generation
IV-CoT introduces an implicit chain-of-thought framework that decomposes visual queries into a structural-to-semantic cascade with training-only sketch supervision to improve structure-aware text-to-image generation.
-
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
TRACE is a rollout budget allocation framework that models ReAct turns as tree nodes and uses a predictor to allocate samples to informative prefixes, yielding a 2.8-point accuracy gain on Multi-Hop QA at equal cost.
-
Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering
Post-generation grammar and schema filtering on top of confidence scoring raises syntactic validity and execution success for Text2Cypher but increases empty outputs and lowers coverage.
-
DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization
DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.