arXiv preprint arXiv:2202.05567
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
verdicts: UNVERDICTED (3)
representative citing papers
citing papers explorer
- Towards Differentially Private Reinforcement Learning with General Function Approximation
  The work establishes the first DP regret bound of order O(K^{3/5}) for model-free online RL under general function approximation and the first coverability-based regret bound for batched non-private RL.
- On the Sample Complexity of Differentially Private Policy Optimization
  Differential privacy in policy optimization adds sample complexity costs that often appear as lower-order terms rather than dominating the bounds.
- When Determinants Are Not Enough: Private Rare Switching
  Replaces determinant growth with a generalized Rayleigh quotient for rare switching in private linear bandits, controlling worst-direction volume even though noise makes the design matrices non-monotonic.