arXiv preprint arXiv:2009.10897
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3
years
2026 3
representative citing papers
Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
LogNEO applies PPO to GPT-Neo with a partial-credit exponentially decaying position-aware reward to reach F1 scores of 0.927/0.913/0.984 on HDFS/BGL/Thunderbird while running at production speeds.
citing papers explorer
- KLip-PPO: A per-sample KL perspective on PPO-Clip
  PPO-Clip gradient equals a per-sample KL surrogate with closed-form coefficient on importance ratio and advantage, yielding identical curves on five MuJoCo tasks.
- Does Synthetic Data Help? Empirical Evidence from Deep Learning Time Series Forecasters
  Synthetic data augmentation helps channel-mixing time series models but degrades channel-independent ones, with reliable gains only from seasonal-trend generators and gradual schedules in low-resource settings.
- LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection
  LogNEO applies PPO to GPT-Neo with a partial-credit exponentially decaying position-aware reward to reach F1 scores of 0.927/0.913/0.984 on HDFS/BGL/Thunderbird while running at production speeds.