Title resolution pending

Sutton, R · 2018

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver will retry DOI and OpenAlex lookups on its next pass.

representative citing papers

TRAM: Test-Time Risk Adaptation with Mixture of Agents

cs.LG · 2024-08-16 · unverdicted · novelty 7.0 · ref 38

TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.

Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories

cs.LG · 2024-11-07 · unverdicted · novelty 6.0 · ref 39

A Bayesian method uses near-optimality constraints from expert trajectories to estimate transition dynamics in offline model-based reinforcement learning.
