pith. machine review for the scientific record. sign in

arXiv preprint arXiv:2507.21053 , year=

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

fields

cs.LG 6 cs.RO 3

years

2026 9

verdicts

UNVERDICTED 9

representative citing papers

Generative Actor-Critic with Soft Bridge Policies

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

SoftGAC defines a stochastic bridge from base to action latent that converts the MaxEnt objective into a tractable relative-entropy term reducible to control energy, achieving competitive returns with one-pass sampling.

Driving Intents Amplify Planning-Oriented Reinforcement Learning

cs.RO · 2026-05-12 · unverdicted · novelty 6.0

DIAL uses intent-conditioned CFG and multi-intent GRPO to expand and preserve diverse modes in continuous-action preference RL, lifting RFS to 9.14 and surpassing both prior best (8.5) and human demonstration (8.13).

Unified Noise Steering for Efficient Human-Guided VLA Adaptation

cs.RO · 2026-05-11 · unverdicted · novelty 6.0

UniSteer unifies human corrective actions and noise-space RL for VLA adaptation by inverting actions to noise targets, raising success rates from 20% to 90% in 66 minutes across four real-world manipulation tasks.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO is a sample-efficient off-policy method for full finetuning of generative control policies that reaches SOTA on robotic manipulation tasks and can recover from poor behavior-cloning initializations without expert data.

Positive-Only Drifting Policy Optimization

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

PODPO is a likelihood-free generative policy optimization method for online RL that steers actions to high-return regions using only positive-advantage samples and local contrastive drifting.

citing papers explorer

Showing 9 of 9 citing papers.