Reinforcement learning for optimization of COVID-19 mitigation policies
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
citing papers explorer
- When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?
  PromptPO shows LLMs can act as black-box policy optimizers for sequential RL when leveraging prior knowledge, matching baselines in exploration and robotics but underperforming in MuJoCo.
- Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning
  Hierarchical RL with a global cost controller and local marginal-value policies outperforms RMAB and heuristic baselines by 20-30% in simulated multi-cluster SARS-CoV-2 control.
- A Multi-Agent Reinforcement Learning Framework for Public Health Decision Analysis
  MARL framework for jurisdiction-specific HIV intervention allocation accounting for cross-jurisdictional interactions outperforms single-agent RL in CA/FL simulations under fixed budgets.