Introduces replicable random design regression and covariance estimation tools to enable the first provably efficient replicable RL algorithms for linear MDPs in generative and episodic settings.
Siyuan Zhang and Nan Jiang
4 Pith papers cite this work.
fields
cs.LG 4
representative citing papers
SOPE dynamically controls offline training length in online RL using actor-aligned OPE on validation data to stop when benefits saturate, achieving up to 45.6% better performance and 22x less computation on Minari tasks.
Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
Gymnasium establishes a standardized API for RL environments to improve interoperability, reproducibility, and ease of development in reinforcement learning.
citing papers explorer
- Replicable Reinforcement Learning with Linear Function Approximation
  Introduces replicable random design regression and covariance estimation tools to enable the first provably efficient replicable RL algorithms for linear MDPs in generative and episodic settings.
- SOPE: Stabilizing Off-Policy Evaluation for Online RL with Prior Data
  SOPE dynamically controls offline training length in online RL using actor-aligned OPE on validation data to stop when benefits saturate, achieving up to 45.6% better performance and 22x less computation on Minari tasks.
- Perturb and Correct: Post-Hoc Ensembles using Affine Redundancy
  Perturb-and-Correct generates epistemically diverse predictors from a single pretrained network via hidden-layer perturbations followed by affine least-squares corrections that enforce agreement on calibration data.
- Gymnasium: A Standard Interface for Reinforcement Learning Environments
  Gymnasium establishes a standardized API for RL environments to improve interoperability, reproducibility, and ease of development in reinforcement learning.