5 Pith papers cite this work.
Citing papers
- Tight Sample Complexity Bounds for Entropic Best Policy Identification
  New concentration bounds and a stopping rule close the exponential gap, matching the lower bound for entropic best policy identification.
- Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
  Aggregation distorts the peaks of parametric behavioral curves by factors of 3-5x through Simpson's paradox and survivorship bias, demonstrated by individual-versus-aggregate comparisons on Goodreads and Amazon datasets with a negative control.
- Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
  Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.
- InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees
  InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep-learning methods on synthetic and real retail data.
- Learning Minimally Rigid Graphs with High Realization Counts
  Reinforcement learning with graph neural networks finds minimally rigid graphs that match known planar realization optima and sets new records for spherical realization counts.
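The Simpson's paradox distortion described in the second entry can be illustrated with a minimal toy sketch. All cohort names, curves, and sizes below are invented for illustration and are not taken from the cited paper; the sketch only shows the general mechanism, where shifting cohort composition (survivorship) moves the peak of an aggregate curve even though every individual curve peaks at the start.

```python
# Toy illustration of a Simpson's-paradox-style distortion in aggregated
# behavioral curves. All numbers are made up for this sketch.

# Per-user engagement at sessions 1..5 for two hypothetical cohorts.
# Both individual curves are monotonically decreasing: they peak at session 1.
casual = [2.0, 1.5, 1.0, 0.5, 0.2]
avid = [10.0, 9.0, 8.0, 7.0, 6.0]

# Cohort sizes per session: casual users churn quickly (survivorship),
# so later sessions are increasingly dominated by the avid cohort.
n_casual = [100, 50, 20, 5, 1]
n_avid = [10, 40, 80, 90, 95]

# Aggregate curve: the size-weighted mean across cohorts at each session.
aggregate = [
    (c * nc + a * na) / (nc + na)
    for c, a, nc, na in zip(casual, avid, n_casual, n_avid)
]

# Every individual curve peaks at session 1, but the aggregate peaks later,
# because the weighting drifts toward the higher-engagement cohort.
peak_session = max(range(len(aggregate)), key=lambda t: aggregate[t]) + 1
print(peak_session)  # prints 4: the aggregate peak has moved from session 1 to 4
```

The distortion here comes purely from composition: no individual user becomes more engaged over time, yet the aggregate curve rises for several sessions before falling.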