The in-sample softmax for offline reinforcement learning. arXiv preprint arXiv:2302.14372

Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White · arXiv 2302.14372

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

CPQL adapts the multi-step Peng's Q(λ) operator for conservative offline value estimation, providing performance guarantees and showing empirical gains over single-step baselines on D4RL while supporting offline-to-online fine-tuning.

SPAR: Support-Preserving Action Rectification

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

SPAR anchors policy learning to a frozen BC policy for residual rectification and introduces latent self-imitation to eliminate manifold drift, achieving SOTA on D4RL.

citing papers explorer


Peng's Q($\lambda$) for Conservative Value Estimation in Offline Reinforcement Learning
cs.LG · 2026-05-14 · unverdicted · none · ref 17
SPAR: Support-Preserving Action Rectification
cs.LG · 2026-05-27 · unverdicted · none · ref 17
