In policy gradient RL, careful variance control and simple estimator switching frequently outperform explicit discontinuity detection even when using differentiable simulators.
Conversely, c = 0 imposes a strong smoothness assumption, frequently falling back to the 0th-order estimator and leading to more conservative updates.
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?