Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2024 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- TRAM: Test-Time Risk Adaptation with Mixture of Agents
  TRAM is a test-time mixture method that scores and composes risk-neutral source policies using reward and occupancy-based risk to achieve new reward-risk tradeoffs without parameter updates.
- Bayesian Inverse Transition Learning: Learning Dynamics From Near-Optimal Trajectories
  A Bayesian method uses near-optimality constraints from expert trajectories to estimate transition dynamics in offline model-based reinforcement learning.