Challenges of Real-World Reinforcement Learning

Gabriel Dulac-Arnold , Daniel Mankowitz , Todd Hester

classification cs.LG cs.AI cs.RO stat.ML

keywords challenges real-world some challenge learning nine problems real

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many research advances in RL are hard to leverage in real-world systems because they rely on a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating that challenge. An approach that addresses all nine challenges would be applicable to a large number of real-world problems. We also present an example domain that has been modified to present these challenges as a testbed for practical RL research.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

D4RL: Datasets for Deep Data-Driven Reinforcement Learning
cs.LG 2020-04 accept novelty 8.0

D4RL supplies new offline RL benchmarks and datasets from expert and mixed sources to expose weaknesses in existing algorithms and standardize evaluation.

Scalar Federated Learning for Linear Quadratic Regulator
eess.SY 2026-04 unverdicted novelty 7.0

A scalar-projection federated zeroth-order method for model-free LQR policy learning that reduces per-agent communication from O(d) to O(1) with convergence rate improving in the number of agents.

Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data
cs.LG 2026-05 unverdicted novelty 6.0

PROCO generates synthetic unsafe samples via model-based rollouts and LLM-grounded costs to enable safer policy learning from offline datasets containing few or no violations.

Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 5.0

Higher-resolution observations with global-average-pooling encoders improve RL performance and generalization by enabling more localized visual attention, yielding up to 28% gains over standard Impala encoders.

LANTERN: LLM-Augmented Neurosymbolic Transfer with Experience-Gated Reasoning Networks
cs.AI 2026-05 unverdicted novelty 5.0

LANTERN improves RL sample efficiency by 40-60% via LLM-generated task automata, semantic multi-source policy aggregation, and experience-gated adaptive transfer.

Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation
cs.AI 2026-04 unverdicted novelty 5.0

A hierarchical RL policy paired with a runtime safety shield using forward simulation achieves longer survival, lower line loading, and zero-shot generalization on Grid2Op benchmarks including stress tests and unseen ...

UAV Trajectory and Bandwidth Allocation for Efficient Data Collection in Low-Altitude Intelligent IoT: A Hierarchical DRL Approach
cs.CE 2026-04 unverdicted novelty 3.0

Hierarchical DRL optimizes UAV trajectories and bandwidth allocation to increase IoT data collection volume, with simulations showing 44% faster convergence and 58% lower compute cost than flat DRL.