Estimator selection with respect to Hellinger-type risks. Probability Theory and Related Fields, 151(1–2):353–401
2 Pith papers cite this work.
Years: 2026 (2)
Representative citing papers
Total variation between products of measures is at least a universal constant times that between the averaged measures' products.
citing papers explorer
- Adaptive Estimation and Optimal Control in Offline Contextual MDPs without Stationarity
  A T-estimation-based procedure for adaptive density estimation and optimal control in offline contextual MDPs without stationarity, providing oracle risk bounds under two loss functions and finite-sample cost guarantees.
- A homogenization principle for total variation
  Total variation between products of measures is at least a universal constant times that between the averaged measures' products.