Learning to predict by the methods of temporal differences
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.LG 1
years: 2026 1
verdicts: UNVERDICTED 1
roles: background 1
polarities: background 1
representative citing papers:
A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning
Covariance-based entropy control selectively regularizes high-covariance tokens in softmax policies and achieves asymptotic unbiasedness upon annealing, unlike traditional regularization which introduces dense bias and alters the stationary distribution.