Multiple-Environment Markov Decision Processes
We introduce Multi-Environment Markov Decision Processes (MEMDPs), which are MDPs with a set of probabilistic transition functions. The goal in a MEMDP is to synthesize a single controller with guaranteed performance against all environments, even though the environment is unknown a priori. While MEMDPs can be seen as a special class of partially observable MDPs, we show that several verification problems that are undecidable for partially observable MDPs are decidable for MEMDPs and sometimes even have efficient solutions.
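To make the definition concrete, here is a minimal Python sketch of a MEMDP as a data structure: one shared set of states and actions, plus one transition function per environment. Everything in it is an illustrative assumption rather than the paper's construction: the names `MEMDP`, `reach_probability`, and `worst_case` are hypothetical, the three-state example is made up, and the sketch evaluates only memoryless policies against a reachability objective, a simplification chosen for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str
# One transition function: (state, action) -> distribution over successor states.
Transition = Dict[Tuple[State, Action], Dict[State, float]]


@dataclass
class MEMDP:
    """Hypothetical container: shared states/actions, one transition function per environment."""
    states: List[State]
    actions: List[Action]
    environments: List[Transition]


def reach_probability(memdp: MEMDP, env_index: int, policy: Dict[State, Action],
                      target: State, sweeps: int = 100) -> Dict[State, float]:
    """Approximate Pr(eventually reach target) in one environment by fixed-point iteration."""
    delta = memdp.environments[env_index]
    prob = {s: 1.0 if s == target else 0.0 for s in memdp.states}
    for _ in range(sweeps):
        prob = {
            s: 1.0 if s == target
            else sum(p * prob[t] for t, p in delta[(s, policy[s])].items())
            for s in memdp.states
        }
    return prob


def worst_case(memdp: MEMDP, policy: Dict[State, Action],
               start: State, target: State) -> float:
    """Guarantee of a fixed policy: its reachability probability in the least favourable environment."""
    return min(
        reach_probability(memdp, i, policy, target)[start]
        for i in range(len(memdp.environments))
    )


def absorbing(state: State) -> Transition:
    """Self-loops under both actions of the made-up example."""
    return {(state, a): {state: 1.0} for a in ("a", "b")}


# Made-up example: action "a" is good in environment 0 but poor in environment 1,
# while action "b" behaves the same in both.
env0: Transition = {("s0", "a"): {"goal": 0.9, "sink": 0.1},
                    ("s0", "b"): {"goal": 0.5, "sink": 0.5},
                    **absorbing("goal"), **absorbing("sink")}
env1: Transition = {("s0", "a"): {"goal": 0.2, "sink": 0.8},
                    ("s0", "b"): {"goal": 0.5, "sink": 0.5},
                    **absorbing("goal"), **absorbing("sink")}
example = MEMDP(["s0", "goal", "sink"], ["a", "b"], [env0, env1])

always_a = {"s0": "a", "goal": "a", "sink": "a"}
always_b = {"s0": "b", "goal": "a", "sink": "a"}
guarantee_a = worst_case(example, always_a, "s0", "goal")
guarantee_b = worst_case(example, always_b, "s0", "goal")
```

In this toy instance the policy that always plays `b` has the better guarantee, because the policy that plays `a` is only as good as its performance in the second environment. A controller with memory could in general do more, for example by adapting to what it observes about the environment, which this sketch does not model.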
Forward citations
Cited by 1 Pith paper
- Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari
Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score ...