4 representative Pith papers (2026) citing this work by Sutton:
- Offline Policy Evaluation for Manipulation Policies via Discounted Liveness Formulation
  A liveness-based Bellman operator enables conservative offline policy evaluation for manipulation tasks by encoding task progression and reducing the truncation bias introduced by finite horizons (a toy sketch of such a backup follows this list).
- A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
  A MetaRL policy pre-trained on goals-based wealth management (GBWM) problems delivers near-optimal dynamic strategies in 0.01 s, achieving 97.8% of the DP-optimal utility, and handles larger problems where exact dynamic programming fails (a DP baseline sketch follows this list).
- PYTHALAB-MERA: Validation-Grounded Memory, Retrieval, and Acceptance Control for Frozen-LLM Coding Agents
  An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces (a trace-update sketch follows this list).
- Revisiting Adam for Streaming Reinforcement Learning
  C51 matches StreamQ in streaming RL on 55 Atari games, while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline (an update-rule sketch follows this list).
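
The first entry names a liveness-based Bellman operator without defining it here. As a rough illustration only, the sketch below applies a discounted, pessimistically biased backup in which a hypothetical `liveness` progress signal plays the role of the reward; the function, its arguments, and the fixed `pessimism` penalty are assumptions, not the paper's operator.

```python
def liveness_ope_backup(V, transitions, liveness, gamma=0.99, pessimism=0.1):
    """One conservative, discounted Bellman backup for offline policy
    evaluation, using a task-progression ("liveness") signal as the
    per-step reward. Hypothetical sketch; names and the pessimism rule
    are illustrative assumptions.

    V           : dict state -> current value estimate
    transitions : list of (s, s_next, done) logged under the target policy
    liveness    : dict state -> progress signal in [0, 1]
    """
    V_new = dict(V)
    for s, s_next, done in transitions:
        bootstrap = 0.0 if done else V.get(s_next, 0.0)
        # Discounting bounds the contribution of the truncated tail,
        # which is how a discounted formulation reduces finite-horizon
        # truncation bias relative to an undiscounted finite-horizon sum.
        target = liveness.get(s_next, 0.0) + gamma * bootstrap
        # Crude stand-in for a conservative operator: bias estimates down.
        V_new[s] = target - pessimism
    return V_new
```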
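
For the GBWM entry, the relevant comparison point is a backward-induction dynamic-programming baseline. The toy sketch below (hypothetical `gbwm_dp`, with an assumed lognormal wealth process, a goal-indicator utility, and a Monte Carlo transition estimate) shows why DP is the reference: its cost grows with horizon, grid size, portfolio count, and samples per backup, which is the scaling wall the pre-trained MetaRL policy reportedly avoids.

```python
import numpy as np

def gbwm_dp(wealth_grid, portfolios, goal, T, n_mc=1000, seed=0):
    """Backward-induction DP for a toy goals-based wealth problem:
    at each step pick the (mu, sigma) portfolio that maximizes the
    probability of finishing at or above `goal`. Grid, dynamics, and
    utility are illustrative assumptions, not the paper's setup."""
    rng = np.random.default_rng(seed)
    V = (wealth_grid >= goal).astype(float)            # terminal utility
    policy = np.zeros((T, len(wealth_grid)), dtype=int)
    for t in reversed(range(T)):
        V_next = np.empty_like(V)
        for i, w in enumerate(wealth_grid):
            best_val, best_a = -np.inf, 0
            for a, (mu, sigma) in enumerate(portfolios):
                # Monte Carlo estimate of E[V(w')] under lognormal growth,
                # snapped back onto the wealth grid.
                w_next = w * np.exp(rng.normal(mu, sigma, n_mc))
                idx = np.clip(np.searchsorted(wealth_grid, w_next),
                              0, len(wealth_grid) - 1)
                val = V[idx].mean()
                if val > best_val:
                    best_val, best_a = val, a
            V_next[i], policy[t, i] = best_val, best_a
        V = V_next
    return V, policy

# Example: 20-step horizon, conservative vs. aggressive portfolio.
grid = np.linspace(50.0, 400.0, 80)
V0, pol = gbwm_dp(grid, [(0.02, 0.05), (0.06, 0.20)], goal=200.0, T=20)
```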
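
The PYTHALAB-MERA summary credits validation outcomes to memory records via eligibility traces. The sketch below is one generic accumulating-trace update for that idea; the dictionaries, the score update, and the +/-1 reward are assumptions rather than the system's actual bookkeeping.

```python
def update_record_scores(scores, traces, used_records, passed,
                         alpha=0.1, lam=0.9):
    """Propagate a pass/fail validation outcome back to memory records
    via accumulating eligibility traces. Generic sketch, not the
    system's actual rule.

    scores       : dict record_id -> usefulness score used for selection
    traces       : dict record_id -> eligibility trace
    used_records : record ids injected into this coding attempt
    passed       : True iff the strict (fail-fast) validation succeeded
    """
    reward = 1.0 if passed else -1.0
    for rid in traces:                       # decay every existing trace
        traces[rid] *= lam
    for rid in used_records:                 # bump traces of records just used
        traces[rid] = traces.get(rid, 0.0) + 1.0
    # Credit each record in proportion to its trace, so records used
    # recently and often absorb most of the outcome.
    for rid, e in traces.items():
        scores[rid] = scores.get(rid, 0.0) + alpha * reward * e
    return scores, traces
```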
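
For the streaming-RL entry, "bounded derivatives and variance-adjusted updates" reads like an Adam-style second-moment normalization of the TD error combined with a hard clip. The tabular step below is one hypothetical rendering of that combination, not the published Adaptive Q(λ); in particular it keeps traces on exploratory actions, unlike Watkins's Q(λ).

```python
import numpy as np

def adaptive_q_lambda_step(Q, e, v, s, a, r, s_next, done,
                           alpha=0.5, gamma=0.99, lam=0.9,
                           beta=0.999, clip=1.0, eps=1e-8):
    """One tabular Q(lambda) step with a variance-adjusted, bounded update.
    Hypothetical reading; the published algorithm's exact rule may differ.

    Q, e : [n_states, n_actions] arrays of values and eligibility traces
    v    : running second moment of TD errors (Adam-style normalizer)
    """
    td = r + (0.0 if done else gamma * Q[s_next].max()) - Q[s, a]
    v = beta * v + (1.0 - beta) * td * td                 # track TD-error power
    step = np.clip(td / (np.sqrt(v) + eps), -clip, clip)  # bounded, normalized
    # Simplification: traces are not cut on exploratory actions here.
    e *= gamma * lam                                      # decay all traces
    e[s, a] += 1.0                                        # accumulate current one
    Q += alpha * step * e                                 # backup along traces
    return Q, e, v
```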