A new Gym environment for medical AI agents reveals collapse in multi-turn RL due to sparse rewards, addressed by Turn-level Truncated On-Policy Distillation yielding +3.9 pp gains on clinical benchmarks.
It eliminates the rotator cuff (too lateral), the spinal accessory nerve (too posterior), and the internal carotid (too lateral/deep).
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Healthcare AI GYM for Medical Agents