Characterizes utility functions making recursive OCE objectives PAC-learnable and derives matching upper and lower PAC sample complexity bounds for value and policy learning, with improved tau dependence for CVaR.
Efficient Risk-sensitive Planning via Entropic Risk Measures
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3verdicts
UNVERDICTED 3representative citing papers
New concentration bounds and stopping rule close the exponential gap to match the lower bound for entropic best policy identification.
Derives PAC-type upper bounds and matching lower bounds on sample complexity for value and policy learning under recursive entropic risk measures, with exponential dependence on |β|/(1-γ).
citing papers explorer
-
On the Sample Complexity of Discounted Reinforcement Learning with Optimized Certainty Equivalents
Characterizes utility functions making recursive OCE objectives PAC-learnable and derives matching upper and lower PAC sample complexity bounds for value and policy learning, with improved tau dependence for CVaR.
-
Tight Sample Complexity Bounds for Entropic Best Policy Identification
New concentration bounds and stopping rule close the exponential gap to match the lower bound for entropic best policy identification.
-
Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model
Derives PAC-type upper bounds and matching lower bounds on sample complexity for value and policy learning under recursive entropic risk measures, with exponential dependence on |β|/(1-γ).