Proximal Policy Optimization Algorithms

2017

3 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators

cs.RO · 2026-05-04 · unverdicted · novelty 7.0

Combined procedural generators yield 91.5% mean success for RL navigation policies across map types, A* subgoals raise it to 98.9%, and the policies outperform classical controllers at higher speeds with partial sim-to-real transfer.

MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation

cs.RO · 2024-10-18 · unverdicted · novelty 6.0

MARLIN hybridizes multi-agent RL with LLM-based inter-robot negotiation to improve early training performance in simulated and physical robot teams without harming final results.

Constrained Policy Optimization via Sampling-Based Weight-Space Projection

cs.LG · 2025-12-15 · unverdicted · novelty 5.0

SCPO is a sampling-based weight-space projection algorithm that enforces rollout-evaluated safety constraints in policy optimization and provides a safe-by-induction guarantee from any safe starting point.

citing papers explorer

Showing 3 of 3 citing papers.

Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators · cs.RO · 2026-05-04 · unverdicted · none · ref 21
MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation · cs.RO · 2024-10-18 · unverdicted · none · ref 4
Constrained Policy Optimization via Sampling-Based Weight-Space Projection · cs.LG · 2025-12-15 · unverdicted · none · ref 9
