-
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control
Derives an entropy-polarity signal from a first-order approximation of the entropy change induced by each policy update, and builds on it Polarity-Aware Policy Optimization (PAPO), which preserves complementary polarity branches and outperforms baselines on math and agentic RL fine-tuning tasks.
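The polarity idea can be illustrated with a toy sketch: for a softmax policy, the first-order entropy change of a policy-gradient step is the inner product of the entropy gradient with the parameter update, and its sign is the update's polarity. This is a minimal illustration on a three-armed bandit under assumed dynamics, not the paper's actual method or implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -(p * np.log(p)).sum()

def first_order_entropy_change(theta, action, advantage, lr):
    """First-order estimate of the entropy change caused by one
    policy-gradient step on a softmax policy:
        dH ~= grad_theta H . delta_theta
    The sign of this estimate is the update's "entropy polarity"."""
    p = softmax(theta)
    H = entropy(p)
    grad_H = -p * (np.log(p) + H)        # gradient of entropy w.r.t. logits
    grad_logpi = -p.copy()
    grad_logpi[action] += 1.0            # gradient of log pi(action) w.r.t. logits
    delta_theta = lr * advantage * grad_logpi
    return grad_H @ delta_theta, delta_theta

theta = np.array([0.5, 0.0, -0.5])       # toy 3-armed bandit logits (assumed)
dH_est, dtheta = first_order_entropy_change(theta, action=0, advantage=1.0, lr=0.1)
dH_true = entropy(softmax(theta + dtheta)) - entropy(softmax(theta))
print(dH_est, dH_true)   # both negative: rewarding the already-likely action lowers entropy
```

For a small step the first-order estimate tracks the exact entropy change closely, so its sign can be used to classify updates into entropy-raising and entropy-lowering branches.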
-
Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production
PF-CD3Q formulates fatigue-aware human-robot task planning as a constrained Markov decision process (CMDP), using online particle filtering to estimate fatigue parameters and those estimates to constrain a deep Q-learning agent.
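The online-filtering component can be sketched in miniature: a bootstrap particle filter that estimates an unknown fatigue-accumulation rate from noisy observations. The linear fatigue model, constants, and names below are illustrative assumptions, not PF-CD3Q's actual dynamics or its CMDP constraint handling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar fatigue model (an assumption for illustration):
# fatigue accumulates linearly at an unknown rate; we observe it with noise.
TRUE_RATE, OBS_STD, N = 0.05, 0.02, 2000

particles = rng.uniform(0.0, 0.2, N)     # particle cloud over the unknown rate
weights = np.ones(N) / N
fatigue = 0.0

for t in range(50):
    fatigue += TRUE_RATE                             # latent fatigue evolves
    obs = fatigue + rng.normal(0.0, OBS_STD)         # noisy measurement
    pred = (t + 1) * particles                       # fatigue each particle implies
    lik = np.exp(-0.5 * ((obs - pred) / OBS_STD) ** 2) + 1e-300
    weights *= lik                                   # Bayesian reweighting
    weights /= weights.sum()
    if 1.0 / (weights ** 2).sum() < N / 2:           # effective sample size low?
        idx = rng.choice(N, size=N, p=weights)       # resample
        particles = particles[idx] + rng.normal(0.0, 1e-3, N)  # jitter for diversity
        weights = np.ones(N) / N

rate_est = float(weights @ particles)
print(rate_est)   # converges toward TRUE_RATE = 0.05
```

Resampling only when the effective sample size drops below half the particle count is a standard way to balance weight degeneracy against sample impoverishment; a planner could then treat the filtered estimate as the fatigue parameter constraining its action choices.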