
arXiv: 2001.04032 · v2 · pith:V7FQN5EB · submitted 2020-01-13 · stat.ML · cs.LG

POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

keywords: data, decision-making, medical, observed, partially, planning, when, approach

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. An adaptive variance estimator for relative sparsity

    stat.ME · 2026-05 · unverdicted · novelty 6.0

    A new adaptive variance estimator for relative sparsity coefficients is introduced that fully utilizes the prior asymptotic normality theorem and incorporates variable selection effects.

  2. VentAgent: When LLMs Learn to Breathe -- Multi-Objective Arbitration for ARDS Ventilation

    cs.LG · 2026-06 · unverdicted · novelty 5.0

    VentAgent uses LLMs in a three-stage Perception-Planning-Orchestration hierarchy to perform multi-objective arbitration for mechanical ventilation in ARDS, outperforming RL baselines on a simulator while producing hum...

  3. Treatment, evidence, imitation, and chat

    stat.OT · 2025-06 · unverdicted · novelty 4.0

    LLMs cannot solve the medical treatment problem through imitation alone because it requires evidence from experiments or observations, posing ethical challenges for training such systems.