5 Pith papers cite this work.
Citing papers
- Tight Sample Complexity Bounds for Entropic Best Policy Identification
  New concentration bounds and a stopping rule close the exponential gap, matching the lower bound for entropic best policy identification.
- Simpson's Paradox in Behavioral Curves: How Aggregation Distorts Parametric Models of User Dynamics
  Aggregation distorts the peaks of parametric behavioral curves by factors of 3-5x through Simpson's paradox and survivorship bias, demonstrated by individual-versus-aggregate comparisons on Goodreads and Amazon datasets with a negative control.
- Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
  Derives contraction-based Q-value extensions for exponential utility and proves almost-sure convergence of two-timescale and one-timescale model-free algorithms in discounted MDPs.
- InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees
  InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep-learning methods on synthetic and real retail data.
- Learning Minimally Rigid Graphs with High Realization Counts
  Reinforcement learning with graph neural networks finds minimally rigid graphs that match known planar realization optima and sets new records for spherical realization counts.
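The Simpson's paradox distortion described in the second entry can be illustrated with a minimal toy sketch. All cohort names, curves, and sizes below are invented for illustration and are not taken from the cited paper; the sketch only shows the general mechanism, where shifting cohort composition (survivorship) moves the peak of an aggregate curve even though every individual curve peaks at the start.

```python
# Toy illustration of a Simpson's-paradox-style distortion in aggregated
# behavioral curves. All numbers are made up for this sketch.

# Per-user engagement at sessions 1..5 for two hypothetical cohorts.
# Both individual curves are monotonically decreasing: they peak at session 1.
casual = [2.0, 1.5, 1.0, 0.5, 0.2]
avid = [10.0, 9.0, 8.0, 7.0, 6.0]

# Cohort sizes per session: casual users churn quickly (survivorship),
# so later sessions are increasingly dominated by the avid cohort.
n_casual = [100, 50, 20, 5, 1]
n_avid = [10, 40, 80, 90, 95]

# Aggregate curve: the size-weighted mean across cohorts at each session.
aggregate = [
    (c * nc + a * na) / (nc + na)
    for c, a, nc, na in zip(casual, avid, n_casual, n_avid)
]

# Every individual curve peaks at session 1, but the aggregate peaks later,
# because the weighting drifts toward the higher-engagement cohort.
peak_session = max(range(len(aggregate)), key=lambda t: aggregate[t]) + 1
print(peak_session)  # prints 4: the aggregate peak has moved from session 1 to 4
```

The distortion here comes purely from composition: no individual user becomes more engaged over time, yet the aggregate curve rises for several sessions before falling.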