2 Pith papers cite this work.
Fields: cs.LG (2)
Years: 2026 (2)
Representative citing papers:
The note claims linear convergence of Wasserstein policy optimization (WPO) in entropy-regularized MDPs by combining mean-field gradient flow analysis with a local log-Sobolev inequality under a regularity assumption.
Citing papers explorer
- Long-term Fairness with Selective Labels
A framework and RL algorithm for long-term fairness under selective labels that decomposes the true fairness measure into observed fairness plus prediction bias and provides sufficient conditions based on predictor confidence.
- A note on convergence of Wasserstein policy optimization
The note claims linear convergence of WPO in entropy-regularized MDPs by combining mean-field gradient flow analysis with a local log-Sobolev inequality under a regularity assumption.