A support-aware offline decision framework for reserve-policy selection that outputs certified policies and shortlists instead of rankings, with a finite-catalog guarantee preserving the best supported policy.
Corresponding author: shekharp@erau.edu

A Proofs

A.1 Proof of Theorem 4.1

Proof. For each $\pi \in \mathcal{P}$, define the centered replay difference $Z^{\pi}_i := Y^{\pi}_i - Y^{0}_i$ and its mean $\mu_{Z,\pi} := \mathbb{E}[Z^{\pi}_i]$.
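The definition above pairs each logged unit $i$ with two replayed outcomes: one under candidate policy $\pi$ and one under the policy indexed $0$, presumably the reference or incumbent reserve policy. The following is a minimal sketch of how $Z^{\pi}_i$ and a plug-in estimate of $\mu_{Z,\pi}$ could be computed. Several details are assumptions and do not come from the source. The outcomes are taken to be already-replayed per-unit values stored as paired lists. The sample mean is used as the estimator, since the proof fragment stops before stating one. The standard error is an illustrative extra, not the paper's certification rule. The function names, policy names, and numbers are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def centered_replay_differences(y_policy: List[float], y_baseline: List[float]) -> List[float]:
    """Return Z_i^pi = Y_i^pi - Y_i^0 for each logged unit i (paired by index)."""
    if len(y_policy) != len(y_baseline):
        raise ValueError("replay outcomes must be paired unit by unit")
    return [yp - y0 for yp, y0 in zip(y_policy, y_baseline)]


def estimate_mu(z: List[float]) -> Tuple[float, float]:
    """Sample mean of Z (assumed plug-in estimate of mu_{Z,pi}) and its standard error."""
    n = len(z)
    if n < 2:
        raise ValueError("need at least two units to estimate a standard error")
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)  # unbiased sample variance
    return mean, math.sqrt(var / n)


# Hypothetical replayed revenues per logged unit under the reference policy (index 0).
baseline = [1.0, 2.0, 0.5, 1.5]
# Hypothetical candidate reserve policies from a finite catalog P.
candidates: Dict[str, List[float]] = {
    "reserve_low": [1.2, 2.1, 0.5, 1.6],
    "reserve_high": [0.8, 2.5, 0.0, 1.9],
}

if __name__ == "__main__":
    for name, y in candidates.items():
        z = centered_replay_differences(y, baseline)
        mu_hat, se = estimate_mu(z)
        print(f"{name}: mu_hat={mu_hat:.3f}, se={se:.3f}")
```

Centering every candidate against the same reference outcomes means that per-unit variation shared by both policies cancels in $Z^{\pi}_i$. This is one common reason to work with paired differences rather than raw replayed values.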
Cited by 1 paper (field: stat.ML; year: 2026; verdict: UNVERDICTED). Representative citing paper:
Support-aware offline policy selection for advertising marketplaces