Proximal Policy Optimization Algorithms
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
MARLIN hybridizes multi-agent RL with LLM-based inter-robot negotiation to improve early training performance in simulated and physical robot teams without harming final results.
SCPO is a sampling-based weight-space projection algorithm that enforces rollout-evaluated safety constraints in policy optimization and provides a safe-by-induction guarantee from any safe starting point.
citing papers explorer
- Beyond Specialization: Robust Reinforcement Learning Navigation via Procedural Map Generators
  Combined procedural generators yield 91.5% mean success for RL navigation policies across map types, A* subgoals raise it to 98.9%, and the policies outperform classical controllers at higher speeds with partial sim-to-real transfer.
- MARLIN: Multi-Agent Reinforcement Learning Guided by Language-Based Inter-Robot Negotiation
  MARLIN hybridizes multi-agent RL with LLM-based inter-robot negotiation to improve early training performance in simulated and physical robot teams without harming final results.
- Constrained Policy Optimization via Sampling-Based Weight-Space Projection
  SCPO is a sampling-based weight-space projection algorithm that enforces rollout-evaluated safety constraints in policy optimization and provides a safe-by-induction guarantee from any safe starting point.