Available: https://arxiv.org/abs/1301.6720

[Online] · 2013 · cs.AI · arXiv 1301.6720

2 Pith papers cite this work. Polarity classification is still indexing.

abstract

Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.

representative citing papers

Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers

eess.SY · 2025-04-17 · unverdicted · novelty 6.0

A switching gradient-based algorithm jointly optimizes a backdoor policy and finite-memory observation-based trigger for stealthy attacks in MDPs under partial observations.

Discrete Diffusion for Codebook-Based Beam Candidate Generation

eess.SP · 2026-04-09 · unverdicted · novelty 6.0

A discrete denoising diffusion model learns from probing histories to generate promising beam candidates, yielding better SNR, lower beam-miss probability, and reduced probe regret than baselines under tight probing budgets.

citing papers explorer

Showing 2 of 2 citing papers.

Planning Stealthy Backdoor Attacks in MDPs with Observation-Based Triggers

eess.SY · 2025-04-17 · unverdicted · none · ref 11 · internal anchor

A switching gradient-based algorithm jointly optimizes a backdoor policy and finite-memory observation-based trigger for stealthy attacks in MDPs under partial observations.

Discrete Diffusion for Codebook-Based Beam Candidate Generation

eess.SP · 2026-04-09 · unverdicted · none · ref 46

A discrete denoising diffusion model learns from probing histories to generate promising beam candidates, yielding better SNR, lower beam-miss probability, and reduced probe regret than baselines under tight probing budgets.
