
arxiv: 1709.04047 · v1 · pith:FCT6Q3LBnew · submitted 2017-09-12 · 💻 cs.SY

Learning-based Control of Unknown Linear Systems with Thompson Sampling

classification 💻 cs.SY
keywords thompson, algorithm, control, sampling, stopping, tsde, criterion, dynamic

We propose a Thompson sampling-based learning algorithm for the Linear Quadratic (LQ) control problem with unknown system parameters. The algorithm, called Thompson sampling with dynamic episodes (TSDE), uses two stopping criteria to determine the lengths of the dynamic episodes in Thompson sampling. The first stopping criterion controls the growth rate of episode length. The second stopping criterion is triggered when the determinant of the sample covariance matrix is less than half of the previous value. We show under some conditions on the prior distribution that the expected (Bayesian) regret of TSDE accumulated up to time $T$ is bounded by $\tilde{O}(\sqrt{T})$. Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors. This is the first $\tilde{O}(\sqrt{T})$ bound on expected regret of learning in LQ control. By introducing a reinitialization schedule, we also show that the algorithm is robust to time-varying drift in model parameters. Numerical simulations are provided to illustrate the performance of TSDE.
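The two stopping criteria can be illustrated with a minimal sketch. The abstract does not give the exact form of the first criterion; the code below assumes it lets an episode run at most one step longer than the previous one. The second criterion follows the abstract's wording, taking "the previous value" to mean the determinant at the start of the current episode. The function names and the determinant trace are hypothetical, and the sampling and control steps of TSDE are omitted.

```python
def should_start_new_episode(t, episode_start, prev_episode_len, det_now, det_at_start):
    """Return True if either stopping criterion fires at time t."""
    # Criterion 1 (assumed form): the current episode may be at most
    # one step longer than the previous episode.
    too_long = (t - episode_start) > prev_episode_len
    # Criterion 2 (from the abstract): the covariance determinant has
    # dropped below half of its value at the start of the episode.
    det_halved = det_now < 0.5 * det_at_start
    return too_long or det_halved


def episode_starts(dets):
    """Split time steps 0..len(dets)-1 into episodes, given a trace of
    covariance determinants (one value per time step)."""
    starts = [0]
    episode_start, prev_len = 0, 0
    for t in range(1, len(dets)):
        if should_start_new_episode(t, episode_start, prev_len,
                                    dets[t], dets[episode_start]):
            prev_len = t - episode_start  # length of the episode just ended
            episode_start = t
            starts.append(t)
    return starts


# With a constant determinant only criterion 1 fires, so episode
# lengths grow by one each time: 1, 2, 3, ...
print(episode_starts([1.0] * 10))
# A sharp drop in the determinant at t = 4 ends that episode early.
print(episode_starts([1.0, 1.0, 1.0, 1.0, 0.4, 0.4, 0.4]))
```

In a full implementation, a new parameter sample would be drawn from the posterior at each episode start and the corresponding optimal LQ controller applied until the next stop.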



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning

    eess.SY 2019-06 unverdicted novelty 5.0

    A two-layer DP-PHA scheme approximates optimal policies in Bayesian RL by separating reducible from irreducible uncertainty, demonstrated on the LQG problem with unknown gain.