pith. machine review for the scientific record.

arxiv: 1706.06643 · v1 · submitted 2017-06-20 · 💻 cs.AI · cs.LG

Recognition: unknown

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

keywords: action-dependent, approximation, baselines, function, gradient, policy, action-independent, baseline

We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).
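
For context, here is a sketch of the action-independent case that the abstract refers to, written in standard notation. The symbols $d^{\pi}$ (state weighting under the policy), $Q^{\pi}$ (action-value function), $b$ (baseline), and $\theta$ (policy parameters) are assumptions of this sketch and do not appear on this page. Subtracting a baseline $b(s)$ leaves the gradient unchanged because the action probabilities sum to one:

```latex
\begin{align}
\nabla_\theta J(\theta)
  &= \sum_s d^{\pi}(s) \sum_a \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi}(s,a) \\
  &= \sum_s d^{\pi}(s) \sum_a \nabla_\theta \pi_\theta(a \mid s)\,
     \bigl[ Q^{\pi}(s,a) - b(s) \bigr],
\end{align}
% The second line follows because, for every state s,
\begin{equation}
\sum_a \nabla_\theta \pi_\theta(a \mid s)\, b(s)
  = b(s)\, \nabla_\theta \sum_a \pi_\theta(a \mid s)
  = b(s)\, \nabla_\theta 1
  = 0 .
\end{equation}
```

When the baseline depends on the action, $b(s,a)$ cannot be pulled out of the sum over $a$, so this cancellation does not hold in general. That is the case the paper addresses; its argument is not reproduced here.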



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

    cs.RO · 2026-04 · unverdicted · novelty 4.0

    OmniVLA-RL uses a mix-of-transformers architecture and flow matching reformulated as an SDE, with group segmented policy optimization, to surpass prior VLA models on LIBERO benchmarks.