Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
classification
💻 cs.AI
cs.LG
keywords
action-dependent, approximation, baselines, function, gradient, policy, action-independent, baseline
abstract
We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which Sutton et al. (2000) originally presented with action-independent baselines.
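The distinction between the two kinds of baseline can be checked numerically. The sketch below is an illustration, not the paper's construction: it assumes a single-state problem with a softmax policy, and the names `theta`, `q`, `b_state`, and `b_action` and their values are hypothetical. It relies only on the standard identity that the policy gradients sum to zero over actions, so subtracting a baseline that ignores the action leaves the exact gradient unchanged, while subtracting an action-dependent baseline shifts it by the sum over actions of grad pi(a) * b(a), which has to be added back.

```python
import math

def softmax(theta):
    """Softmax probabilities over actions, shifted by the max for numerical stability."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_pi(theta):
    """Jacobian jac[a][k] = d pi(a) / d theta_k of the softmax policy."""
    pi = softmax(theta)
    n = len(theta)
    return [[pi[a] * ((1.0 if a == k else 0.0) - pi[k]) for k in range(n)]
            for a in range(n)]

def weighted_grad(theta, weights):
    """Exact single-state gradient form: sum_a grad pi(a) * weights[a]."""
    jac = grad_pi(theta)
    n = len(theta)
    return [sum(jac[a][k] * weights[a] for a in range(n)) for k in range(n)]

# Hypothetical values chosen only for illustration.
theta = [0.2, -0.5, 1.0]       # policy parameters, one per action
q = [1.0, 3.0, -2.0]           # action values Q(a)
b_state = 0.7                  # action-independent baseline
b_action = [0.5, 2.0, -1.0]    # action-dependent baseline b(a)

true_grad = weighted_grad(theta, q)

# Action-independent baseline: the exact gradient is unchanged.
grad_state_baseline = weighted_grad(theta, [qa - b_state for qa in q])

# Action-dependent baseline: the gradient is shifted by a nonzero term ...
grad_action_baseline = weighted_grad(theta, [qa - ba for qa, ba in zip(q, b_action)])

# ... which equals sum_a grad pi(a) * b(a) and must be added back.
correction = weighted_grad(theta, b_action)
corrected = [g + c for g, c in zip(grad_action_baseline, correction)]

print("true gradient:          ", true_grad)
print("state baseline:         ", grad_state_baseline)
print("action baseline (raw):  ", grad_action_baseline)
print("action baseline (fixed):", corrected)
```

In this exact, single-state setting the correction just undoes the subtraction. The practical interest of an action-dependent baseline lies in sampled estimators, where the subtracted term and the correction term can be estimated differently; this sketch does not cover that case.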
Forward citations
Cited by 1 Pith paper
- OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL
OmniVLA-RL uses a mix-of-transformers architecture and flow-matching reformulated as an SDE with group segmented policy optimization to surpass prior VLA models on LIBERO benchmarks.