hub Canonical reference

Conrft: A reinforced fine-tuning method for vla models via con- sistency policy.arXiv preprint arXiv:2502.05450

· 2025 · arXiv 2502.05450

Canonical reference. 89% of citing Pith papers cite this work as background.

33 Pith papers citing it

Background 89% of classified citations

read on arXiv browse 33 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 8 other 1

citation-polarity summary

background 8 unclear 1

representative citing papers

Adapting Generalist Robot Policies with Semantic Reinforcement Learning

cs.RO · 2026-06-30 · unverdicted · novelty 7.0

SARL optimizes language prompt inputs to generalist vision-language-action policies through online RL to solve complex long-horizon tasks by composing existing skills.

One Demonstration Is Enough for Real-World Robotic Reinforcement Learning

cs.RO · 2026-07-02 · unverdicted · novelty 6.0

AutoSERL achieves strong performance on six real-world robot manipulation tasks using RL guided by a single demonstration via sliding-window intervention, safety recovery, and automatic termination.

Improving Vision-Language-Action Model Fine-Tuning with Structured Stage and Keyframe Supervision

cs.RO · 2026-06-25 · unverdicted · novelty 6.0

StaKe adds lightweight auxiliary heads for manipulation stage identification and next-gripper-transition keyframe prediction to VLA fine-tuning, reporting relative success rate gains of 14% in bimanual simulation and 56% on single-arm real-robot tasks.

Learning Process Rewards via Success Visitation Matching for Efficient RL

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Success Visitation Matching uses a discriminator to turn sparse outcome rewards into dense process rewards by matching visitations of successful episodes, provably preserving the optimal policy and speeding up robotic RL finetuning.

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

cs.RO · 2026-06-09 · unverdicted · novelty 6.0

SARM2 presents RM, a multi-task stage-aware reward model achieving 80% lower value-estimation MSE, which when used in SPIRAL boosts manipulation task success from ~50% to near-perfect on several benchmarks.

FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning

cs.CV · 2026-06-07 · unverdicted · novelty 6.0

FiberTune is a new fine-tuning objective that preserves action-fiber visual residuals in VLA policies, yielding performance gains on simulation and physical robot tasks.

Preference-Calibrated Human-in-the-Loop Reinforcement Learning for Robotic Manipulation

cs.RO · 2026-06-02 · unverdicted · novelty 6.0

PACT calibrates credit assignment in HIL-RL by penalizing Bellman targets on suboptimal segments using counterfactual advantages from human-policy preference pairs, yielding 24.5% higher success rates and 1.3x faster convergence on five real-robot tasks.

Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention

cs.RO · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

HandITL enables seamless human intervention in VLA policies for bimanual dexterous manipulation, cutting jitter by 99.8% and improving refined policies by 19% over standard teleoperation.

Unified Noise Steering for Efficient Human-Guided VLA Adaptation

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.

Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

cs.RO · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

A retrieve-then-steer method stores successful robot actions in memory and uses them to steer a frozen VLA's flow-matching sampler for better test-time reliability without parameter updates.

Escaping the Diversity Trap in Robotic Manipulation via Anchor-Centric Adaptation

cs.RO · 2026-05-08 · unverdicted · novelty 6.0

Anchor-Centric Adaptation escapes the diversity trap by prioritizing repeated demonstrations at core anchors over broad coverage, yielding higher success rates under fixed data budgets in robotic manipulation.

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

LWD is a fleet-scale offline-to-online RL framework that continually improves pretrained VLA policies using autonomous rollouts and human interventions, reaching 95% average success on real-world manipulation tasks.

LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning

cs.RO · 2026-04-30 · unverdicted · novelty 6.0 · 2 refs

LaST-R1 introduces a RL post-training method called LAPO that optimizes latent Chain-of-Thought reasoning in vision-language-action models, yielding 99.9% success on LIBERO and up to 22.5% real-world gains.

RL Token: Bootstrapping Online RL with Vision-Language-Action Models

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.

MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks

cs.RO · 2026-04-11 · unverdicted · novelty 6.0

MoRI dynamically mixes RL and IL experts with variance-based switching and IL regularization to reach 97.5% success in four real-world robotic tasks while cutting human intervention by 85.8%.

Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning

cs.RO · 2026-02-11 · unverdicted · novelty 6.0

LifeLong-RFT applies chunking-level on-policy reinforcement learning with Quantized Action Consistency Reward, Continuous Trajectory Alignment Reward, and Format Compliance Reward to fine-tune VLA models, achieving a 22% average success rate gain over supervised fine-tuning on the LIBERO benchmark's

TwinRL: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation

cs.RO · 2026-02-09 · unverdicted · novelty 6.0

TwinRL expands RL exploration via digital twin reconstruction and twin RL warm-up to guide real-world learning, reaching near-100% success with 20 minutes of on-robot time across four tasks.

$\pi^{*}_{0.6}$: a VLA That Learns From Experience

cs.LG · 2025-11-18 · unverdicted · novelty 6.0

RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models

cs.LG · 2025-10-31 · unverdicted · novelty 6.0

DeepThinkVLA shows CoT improves VLA models only under decoding and causal alignment, delivering 97% success on LIBERO and 21.7-point gains via hybrid attention and SFT-RL training.

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

cs.RO · 2025-09-11 · conditional · novelty 6.0

SimpleVLA-RL applies tailored reinforcement learning to VLA models, reaching SoTA on LIBERO, outperforming π₀ on RoboTwin, and surpassing SFT in real-world tasks while reducing data needs and identifying a 'pushcut' phenomenon.

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation

cs.RO · 2025-08-19 · conditional · novelty 6.0

Embodied-R1 uses a pointing-centric representation and reinforced fine-tuning on a 200K dataset to achieve state-of-the-art results on embodied benchmarks plus 56.2% success in SIMPLEREnv and 87.5% on real XArm tasks without task-specific training.

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

cs.LG · 2025-06-02 · unverdicted · novelty 6.0

SmolVLA is a small efficient VLA model that achieves performance comparable to 10x larger models while training on one GPU and deploying on consumer hardware via community data and chunked asynchronous action prediction.

WorldSample: Closed-loop Real-robot RL with World Modelling

cs.RO · 2026-07-02 · unverdicted · novelty 5.0

WorldSample generates synthetic transitions from a post-trained world model grounded in real rollouts and uses Policy-Paced Learning to improve RL policies, reporting 28% higher success rates and 59% fewer training steps on contact-rich robot tasks.

AllDayNav: Lifelong Navigation via Real-World Reinforcement Learning

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

AllDayNav encodes scene dynamics into a large model's parameters via RL and a multimodal memory, achieving near-100% success rates in lifelong navigation and outperforming map-based and VLM baselines.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

Conrft: A reinforced fine-tuning method for vla models via con- sistency policy.arXiv preprint arXiv:2502.05450

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer