(2024). Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.LG (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Not All Transitions Matter: Evidence from PPO
Randomly dropping 25% of transitions in PPO rollouts stabilizes training dynamics across five environments while matching vanilla PPO reward performance.