Pith · machine review for the scientific record

arXiv: 1502.02259 · v1 · submitted 2015-02-08 · stat.ML · cs.LG

Recognition: unknown

Contextual Markov Decision Processes

Authors on Pith: no claims yet
Classification: stat.ML, cs.LG
Keywords: behavior, contexts, customer, contextual, decision, learn, Markov, model
Original abstract

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts. The new model, called the Contextual Markov Decision Process (CMDP), can model a customer's behavior when interacting with a website (the learner). The customer's behavior depends on gender, age, location, device, etc. Based on that behavior, the website's objective is to determine the customer's characteristics and to optimize the interaction with them. Our work focuses on one basic scenario: a finite horizon with a small, known number of possible contexts. We suggest a family of algorithms with provable guarantees that learn the underlying models and the latent contexts, and optimize the CMDPs. Bounds are obtained for specific naive implementations, and extensions of the framework are discussed, laying the ground for future research.
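To make the model concrete, below is a minimal sketch (not from the paper) of the CMDP abstraction the abstract describes: a family of MDPs over a shared state and action space, indexed by context, where each context's MDP can be planned by standard finite-horizon backward induction once that context is known. All names here (`CMDP`, `plan_per_context`) are illustrative assumptions; the paper's algorithms additionally learn the models and infer the hidden context from observed behavior, which this sketch omits.

```python
import numpy as np

class CMDP:
    """Contexts share a state/action space; dynamics and rewards vary by context."""
    def __init__(self, transitions, rewards, horizon):
        # transitions[c]: array of shape (S, A, S), row-stochastic in the last axis
        # rewards[c]:     array of shape (S, A)
        self.transitions = transitions
        self.rewards = rewards
        self.horizon = horizon

def plan_per_context(cmdp, c):
    """Finite-horizon backward induction for the MDP of a single context c."""
    P, R, H = cmdp.transitions[c], cmdp.rewards[c], cmdp.horizon
    S, A, _ = P.shape
    V = np.zeros(S)                      # value at the end of the horizon
    policy = np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):
        Q = R + P @ V                    # Q[s, a] = r(s, a) + sum_s' p(s'|s, a) V(s')
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V                     # V[s]: optimal H-step return from state s

# Tiny example: two contexts, two states, two actions, horizon 3.
rng = np.random.default_rng(0)
P = {c: rng.dirichlet(np.ones(2), size=(2, 2)) for c in (0, 1)}
R = {c: rng.random((2, 2)) for c in (0, 1)}
cmdp = CMDP(P, R, horizon=3)
for c in (0, 1):
    pi, V = plan_per_context(cmdp, c)
    print(f"context {c}: first-step policy {pi[0]}, values {V.round(2)}")
```

Run as-is, this prints each context's first-step greedy policy. The planning step itself is standard; the difficulty the paper addresses is that the context is hidden, so the learner must identify the latent context and its model while acting.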

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

    cs.RO · 2026-04 · unverdicted · novelty 6.0

    A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

  2. Task-specific Subnetwork Discovery in Reinforcement Learning for Autonomous Underwater Navigation

    cs.LG · 2026-04 · unverdicted · novelty 5.0

    Contextual multi-task RL for underwater navigation uses just 1.5% of network weights for task differentiation, mostly from context-variable connections to the first hidden layer.

  3. Contextual Intelligence: The Next Leap for Reinforcement Learning

    cs.LG · 2026-02 · unverdicted · novelty 5.0

    Reinforcement learning agents can generalize better by treating context as a first-class primitive that distinguishes slow-changing external factors from fast-changing internal ones and incorporates abstract high-level...

  4. Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

    cs.RO · 2026-04 · unverdicted · novelty 4.0

    A context-dependent multi-task RL policy is trained and evaluated in HoloOcean simulation to solve multiple reef monitoring tasks with claimed improvements in sample efficiency, zero-shot generalization, and robustness...